
Showing papers in "BMC Genomics in 2014"


Journal ArticleDOI
TL;DR: Ngs.plot is a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data and is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.
Abstract: Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. We have developed ngs.plot – a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.

809 citations


Journal ArticleDOI
TL;DR: The proposed method efficiently makes use of information from close and distant relatives for accurate genotype imputation and is fast, owing to its deterministic nature and, therefore, it can easily be used in large data sets where the use of other methods is impractical.
Abstract: Genotype imputation can help reduce genotyping costs particularly for implementation of genomic selection. In applications entailing large populations, recovering the genotypes of untyped loci using information from reference individuals that were genotyped with a higher density panel is computationally challenging. Popular imputation methods are based upon the Hidden Markov model and have computational constraints due to an intensive sampling process. A fast, deterministic approach, which makes use of both family and population information, is presented here. All individuals are related and, therefore, share haplotypes which may differ in length and frequency based on their relationships. The method starts with family imputation if pedigree information is available, and then exploits close relationships by searching for long haplotype matches in the reference group using overlapping sliding windows. The search continues as the window size is shrunk in each chromosome sweep in order to capture more distant relationships. The proposed method gave higher or similar imputation accuracy than Beagle and Impute2 in cattle data sets when all available information was used. When close relatives of target individuals were present in the reference group, the method resulted in higher accuracy compared to the other two methods even when the pedigree was not used. Rare variants were also imputed with higher accuracy. Finally, computing requirements were considerably lower than those of Beagle and Impute2. The presented method took 28 minutes to impute from 6 k to 50 k genotypes for 2,000 individuals with a reference size of 64,429 individuals. The proposed method efficiently makes use of information from close and distant relatives for accurate genotype imputation. In addition to its high imputation accuracy, the method is fast, owing to its deterministic nature and, therefore, it can easily be used in large data sets where the use of other methods is impractical.

766 citations
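
The imputation abstract above describes searching for haplotype matches with overlapping sliding windows that shrink on each chromosome sweep. The sketch below is only a toy illustration of that idea, not the authors' implementation: the function name `impute_by_window_matching`, the half-window overlap, the default window sizes and the first-match rule are all assumptions made for the example, and family (pedigree) imputation is not modelled.

```python
def impute_by_window_matching(target, references, window_sizes=(8, 4, 2)):
    """Fill None entries of `target` from fully typed reference haplotypes.

    target: list of alleles (0/1) with None at untyped loci.
    references: list of haplotypes (lists of 0/1) of the same length.
    window_sizes: decreasing window lengths (hypothetical defaults);
        longer windows are tried first, standing in for closer relatives.
    """
    result = list(target)
    n = len(result)
    for size in window_sizes:           # one sweep per window size
        step = max(1, size // 2)        # assumed overlap: half a window
        for start in range(0, n, step):
            end = min(start + size, n)
            typed = [i for i in range(start, end) if target[i] is not None]
            if not typed:
                continue                # nothing to match on in this window
            for ref in references:
                # A reference matches if it agrees at every typed locus.
                if all(ref[i] == target[i] for i in typed):
                    for i in range(start, end):
                        if result[i] is None:
                            result[i] = ref[i]
                    break               # take the first matching reference
    return result


if __name__ == "__main__":
    refs = [[0, 0, 1, 1, 0, 0, 1, 1], [1, 1, 0, 0, 1, 1, 0, 0]]
    sparse = [0, None, 1, None, 0, None, 1, None]
    print(impute_by_window_matching(sparse, refs))
```

Loci already filled by a long window are never overwritten by a shorter one, which loosely mirrors the priority given to long matches in the abstract.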


Journal ArticleDOI
TL;DR: This research indicated that GWAS not only identified most of the previously reported QTL controlling seed protein and oil, but also resulted in narrower genomic regions than the regions reported as containing these QTL.
Abstract: Background: Association analysis is an alternative to conventional family-based methods to detect the location of gene(s) or quantitative trait loci (QTL) and provides relatively high resolution in terms of defining the genome position of a gene or QTL. Seed protein and oil concentration are quantitative traits which are determined by the interaction among many genes with small to moderate genetic effects and their interaction with the environment. In this study, a genome-wide association study (GWAS) was performed to identify quantitative trait loci (QTL) controlling seed protein and oil concentration in 298 soybean germplasm accessions exhibiting a wide range of seed protein and oil content. Results: A total of 55,159 single nucleotide polymorphisms (SNPs) were genotyped using various methods including Illumina Infinium and GoldenGate assays and 31,954 markers with minor allele frequency >0.10 were used to estimate linkage disequilibrium (LD) in heterochromatic and euchromatic regions. In euchromatic regions, the mean LD (r²) rapidly declined to 0.2 within 360 Kbp, whereas the mean LD declined to 0.2 at 9,600 Kbp in heterochromatic regions. The GWAS results identified 40 SNPs in 17 different genomic regions significantly associated with seed protein. Of these, the five SNPs with the highest associations and seven adjacent SNPs were located in the 27.6-30.0 Mbp region of Gm20. A major seed protein QTL has been previously mapped to the same location and potential candidate genes have recently been identified in this region. The GWAS results also detected 25 SNPs in 13 different genomic regions associated with seed oil. Of these markers, seven SNPs had a significant association with both protein and oil. Conclusions: This research indicated that GWAS not only identified most of the previously reported QTL controlling seed protein and oil, but also resulted in narrower genomic regions than the regions reported as containing these QTL. The narrower GWAS-defined genome regions will allow more precise marker-assisted allele selection and will expedite positional cloning of the causal gene(s).

515 citations


Journal ArticleDOI
TL;DR: A new pipeline, SNPhylo, to construct phylogenetic trees based on large SNP datasets, which can help a researcher focus more on interpretation of the results of analysis of voluminous data sets, rather than manipulations necessary to accomplish the analysis.
Abstract: Phylogenetic trees are widely used for genetic and evolutionary studies in various organisms. Advanced sequencing technology has dramatically enriched data available for constructing phylogenetic trees based on single nucleotide polymorphisms (SNPs). However, massive SNP data makes it difficult to perform reliable analysis, and there has been no ready-to-use pipeline to generate phylogenetic trees from these data. We developed a new pipeline, SNPhylo, to construct phylogenetic trees based on large SNP datasets. The pipeline may enable users to construct a phylogenetic tree from three representative SNP data file formats. In addition, in order to increase reliability of a tree, the pipeline has steps such as removing low quality data and considering linkage disequilibrium. A maximum likelihood method for the inference of phylogeny is also adopted in generation of a tree in our pipeline. Using SNPhylo, users can easily produce a reliable phylogenetic tree from a large SNP data file. Thus, this pipeline can help a researcher focus more on interpretation of the results of analysis of voluminous data sets, rather than manipulations necessary to accomplish the analysis.

393 citations


Journal ArticleDOI
TL;DR: This work describes a further improved and refined version of the M. truncatula genome (Mt4.0) based on de novo whole genome shotgun assembly of a majority of Illumina and 454 reads using ALLPATHS-LG, and re-annotates the genome through the gene prediction pipeline, which integrates EST, RNA-seq, protein and gene prediction evidences.
Abstract: Medicago truncatula, a close relative of alfalfa, is a preeminent model for studying nitrogen fixation, symbiosis, and legume genomics. The Medicago sequencing project began in 2003 with the goal to decipher sequences originated from the euchromatic portion of the genome. The initial sequencing approach was based on a BAC tiling path, culminating in a BAC-based assembly (Mt3.5) as well as an in-depth analysis of the genome published in 2011. Here we describe a further improved and refined version of the M. truncatula genome (Mt4.0) based on de novo whole genome shotgun assembly of a majority of Illumina and 454 reads using ALLPATHS-LG. The ALLPATHS-LG scaffolds were anchored onto the pseudomolecules on the basis of alignments to both the optical map and the genotyping-by-sequencing (GBS) map. The Mt4.0 pseudomolecules encompass ~360 Mb of actual sequences spanning 390 Mb of which ~330 Mb align perfectly with the optical map, presenting a drastic improvement over the BAC-based Mt3.5 which only contained 70% sequences (~250 Mb) of the current version. Most of the sequences and genes that previously resided on the unanchored portion of Mt3.5 have now been incorporated into the Mt4.0 pseudomolecules, with the exception of ~28 Mb of unplaced sequences. With regard to gene annotation, the genome has been re-annotated through our gene prediction pipeline, which integrates EST, RNA-seq, protein and gene prediction evidences. A total of 50,894 genes (31,661 high confidence and 19,233 low confidence) are included in Mt4.0 which overlapped with ~82% of the gene loci annotated in Mt3.5. Of the remaining genes, 14% of the Mt3.5 genes have been deprecated to an “unsupported” status and 4% are absent from the Mt4.0 predictions. Mt4.0 and its associated resources, such as genome browsers, BLAST-able datasets and gene information pages, can be found on the JCVI Medicago web site ( http://www.jcvi.org/medicago ). 
The assembly and annotation has been deposited in GenBank (BioProject: PRJNA10791). The heavily curated chromosomal sequences and associated gene models of Medicago will serve as a better reference for legume biology and comparative genomics.

373 citations


Journal ArticleDOI
TL;DR: Improved honey bee genome assembly with a new gene annotation set and a number of genes similar to that of other insect genomes are reported, contrary to what was suggested in OGSv1.0.
Abstract: The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.

370 citations


Journal ArticleDOI
TL;DR: This work developed an analysis flow that uses sequence-based strategies to predict novel GTs, but also exploits a network-based approach to infer the putative substrate classes of these predicted GTs.
Abstract: Bacterial interactions with the environment- and/or host largely depend on the bacterial glycome. The specificities of a bacterial glycome are largely determined by glycosyltransferases (GTs), the enzymes involved in transferring sugar moieties from an activated donor to a specific substrate. Of these GTs their coding regions, but mainly also their substrate specificity are still largely unannotated as most sequence-based annotation flows suffer from the lack of characterized sequence motifs that can aid in the prediction of the substrate specificity. In this work, we developed an analysis flow that uses sequence-based strategies to predict novel GTs, but also exploits a network-based approach to infer the putative substrate classes of these predicted GTs. Our analysis flow was benchmarked with the well-documented GT-repertoire of Campylobacter jejuni NCTC 11168 and applied to the probiotic model Lactobacillus rhamnosus GG to expand our insights in the glycosylation potential of this bacterium. In L. rhamnosus GG we could predict 48 GTs of which eight were not previously reported. For at least 20 of these GTs a substrate relation was inferred. We confirmed through experimental validation our prediction of WelI acting upstream of WelE in the biosynthesis of exopolysaccharides. We further hypothesize to have identified in L. rhamnosus GG the yet undiscovered genes involved in the biosynthesis of glucose-rich glycans and novel GTs involved in the glycosylation of proteins. Interestingly, we also predict GTs with well-known functions in peptidoglycan synthesis to also play a role in protein glycosylation.

302 citations


Journal ArticleDOI
TL;DR: Ribo-Zero-Seq provides equivalent rRNA removal efficiency, coverage uniformity, genome-based mapped reads, and consistently high quality quantification of transcripts, suggesting that RNA-Seq can be used with FFPE-derived RNAs for gene expression profiling.
Abstract: RNA sequencing (RNA-Seq) is often used for transcriptome profiling as well as the identification of novel transcripts and alternative splicing events. Typically, RNA-Seq libraries are prepared from total RNA using poly(A) enrichment of the mRNA (mRNA-Seq) to remove ribosomal RNA (rRNA), however, this method fails to capture non-poly(A) transcripts or partially degraded mRNAs. Hence, a mRNA-Seq protocol will not be compatible for use with RNAs coming from Formalin-Fixed and Paraffin-Embedded (FFPE) samples. To address the desire to perform RNA-Seq on FFPE materials, we evaluated two different library preparation protocols that could be compatible for use with small RNA fragments. We obtained paired Fresh Frozen (FF) and FFPE RNAs from multiple tumors and subjected these to different gene expression profiling methods. We tested 11 human breast tumor samples using: (a) FF RNAs by microarray, mRNA-Seq, Ribo-Zero-Seq and DSN-Seq (Duplex-Specific Nuclease) and (b) FFPE RNAs by Ribo-Zero-Seq and DSN-Seq. We also performed these different RNA-Seq protocols using 10 TCGA tumors as a validation set. The data from paired RNA samples showed high concordance in transcript quantification across all protocols and between FF and FFPE RNAs. In both FF and FFPE, Ribo-Zero-Seq removed rRNA with comparable efficiency as mRNA-Seq, and it provided an equivalent or less biased coverage on gene 3′ ends. Compared to mRNA-Seq where 69% of bases were mapped to the transcriptome, DSN-Seq and Ribo-Zero-Seq contained significantly fewer reads mapping to the transcriptome (20-30%); in these RNA-Seq protocols, many if not most reads mapped to intronic regions. Approximately 14 million reads in mRNA-Seq and 45–65 million reads in Ribo-Zero-Seq or DSN-Seq were required to achieve the same gene detection levels as a standard Agilent DNA microarray. 
Our results demonstrate that compared to mRNA-Seq and microarrays, Ribo-Zero-Seq provides equivalent rRNA removal efficiency, coverage uniformity, genome-based mapped reads, and consistently high quality quantification of transcripts. Moreover, Ribo-Zero-Seq and DSN-Seq have consistent transcript quantification using FFPE RNAs, suggesting that RNA-Seq can be used with FFPE-derived RNAs for gene expression profiling.

279 citations


Journal ArticleDOI
TL;DR: An integrated map of the genome, transcriptome and immunome of an epithelial mouse tumor, the CT26 colon carcinoma cell line, which predicts that CT26 is refractory to anti-EGFR mAbs and sensitive to MEK and MET inhibitors, as has been previously reported.
Abstract: Tumor models are critical for our understanding of cancer and the development of cancer therapeutics. Here, we present an integrated map of the genome, transcriptome and immunome of an epithelial mouse tumor, the CT26 colon carcinoma cell line. We found that Kras is homozygously mutated at p.G12D, Apc and Tp53 are not mutated, and Cdkn2a is homozygously deleted. Proliferation and stem-cell markers, including Top2a, Birc5 (Survivin), Cldn6 and Mki67, are highly expressed while differentiation and top-crypt markers Muc2, Ms4a8a (MS4A8B) and Epcam are not. Myc, Trp53 (tp53), Mdm2, Hif1a, and Nras are highly expressed while Egfr and Flt1 are not. MHC class I but not MHC class II is expressed. Several known cancer-testis antigens are expressed, including Atad2, Cep55, and Pbk. The highest expressed gene is a mutated form of the mouse tumor antigen gp70. Of the 1,688 non-synonymous point variations, 154 are both in expressed genes and in peptides predicted to bind MHC and thus potential targets for immunotherapy development. Based on its molecular signature, we predicted that CT26 is refractory to anti-EGFR mAbs and sensitive to MEK and MET inhibitors, as has been previously reported. CT26 cells share molecular features with aggressive, undifferentiated, refractory human colorectal carcinoma cells. As CT26 is one of the most extensively used syngeneic mouse tumor models, our data provide a map for the rational design of mode-of-action studies for pre-clinical evaluation of targeted- and immunotherapies.

262 citations


Journal ArticleDOI
TL;DR: The comprehensive mouse alpha and beta cell transcriptomes complemented by the comparison of the global (dis)similarities between mouse and human beta cells represent invaluable resources to boost the accuracy by which rodent models offer guidance in finding cures for human diabetes.
Abstract: Insulin-producing beta cells and glucagon-producing alpha cells are colocalized in pancreatic islets in an arrangement that facilitates the coordinated release of the two principal hormones that regulate glucose homeostasis and prevent both hypoglycemia and diabetes. However, this intricate organization has also complicated the determination of the cellular source(s) of the expression of genes that are detected in the islet. This reflects a significant gap in our understanding of mouse islet physiology, which reduces the effectiveness by which mice model human islet disease. To overcome this challenge, we generated a bitransgenic reporter mouse that faithfully labels all beta and alpha cells in mouse islets to enable FACS-based purification and the generation of comprehensive transcriptomes of both populations. This facilitates systematic comparison across thousands of genes between the two major endocrine cell types of the islets of Langerhans whose principal hormones are of cardinal importance for glucose homeostasis. Our data leveraged against similar data for human beta cells reveal a core common beta cell transcriptome of 9900+ genes. Against the backdrop of overall similar beta cell transcriptomes, we describe marked differences in the repertoire of receptors and long non-coding RNAs between mouse and human beta cells. The comprehensive mouse alpha and beta cell transcriptomes complemented by the comparison of the global (dis)similarities between mouse and human beta cells represent invaluable resources to boost the accuracy by which rodent models offer guidance in finding cures for human diabetes.

249 citations


Journal ArticleDOI
TL;DR: The results support increased transcription of retrotransposons in transformed cells, which may explain the somatic retrotransposition events recently reported in several types of cancers.
Abstract: Repetitive elements comprise at least 55% of the human genome with more recent estimates as high as two-thirds. Most of these elements are retrotransposons, DNA sequences that can insert copies of themselves into new genomic locations by a “copy and paste” mechanism. These mobile genetic elements play important roles in shaping genomes during evolution, and have been implicated in the etiology of many human diseases. Despite their abundance and diversity, few studies have investigated the regulation of endogenous retrotransposons at the genome-wide scale, primarily because of the technical difficulties of uniquely mapping high-throughput sequencing reads to repetitive DNA. Here we develop a new computational method called RepEnrich to study genome-wide transcriptional regulation of repetitive elements. We show that many of the Long Terminal Repeat retrotransposons in humans are transcriptionally active in a cell line-specific manner. Cancer cell lines display more RNA Polymerase II binding to retrotransposons than cell lines derived from normal tissue. Consistent with increased transcriptional activity of retrotransposons in cancer cells, we found significantly higher levels of L1 retrotransposon RNA expression in prostate tumors compared to normal-matched controls. Our results support increased transcription of retrotransposons in transformed cells, which may explain the somatic retrotransposition events recently reported in several types of cancers.

Journal ArticleDOI
TL;DR: The differences between these four varieties of A. pullulans are large enough to justify their redefinition as separate species and the availability of the genome sequences of the four Aureobasidium species should improve their biotechnological exploitation and promote the understanding of their stress-tolerance mechanisms, diverse lifestyles, and pathogenic potential.
Abstract: Aureobasidium pullulans is a black-yeast-like fungus used for production of the polysaccharide pullulan and the antimycotic aureobasidin A, and as a biocontrol agent in agriculture. It can cause opportunistic human infections, and it inhabits various extreme environments. To promote the understanding of these traits, we performed de-novo genome sequencing of the four varieties of A. pullulans. The 25.43-29.62 Mb genomes of these four varieties of A. pullulans encode between 10266 and 11866 predicted proteins. Their genomes encode most of the enzyme families involved in degradation of plant material and many sugar transporters, and they have genes possibly associated with degradation of plastic and aromatic compounds. Proteins believed to be involved in the synthesis of pullulan and siderophores, but not of aureobasidin A, are predicted. Putative stress-tolerance genes include several aquaporins and aquaglyceroporins, large numbers of alkali-metal cation transporters, genes for the synthesis of compatible solutes and melanin, all of the components of the high-osmolarity glycerol pathway, and bacteriorhodopsin-like proteins. All of these genomes contain a homothallic mating-type locus. The differences between these four varieties of A. pullulans are large enough to justify their redefinition as separate species: A. pullulans, A. melanogenum, A. subglaciale and A. namibiae. The redundancy observed in several gene families can be linked to the nutritional versatility of these species and their particular stress tolerance. The availability of the genome sequences of the four Aureobasidium species should improve their biotechnological exploitation and promote our understanding of their stress-tolerance mechanisms, diverse lifestyles, and pathogenic potential.

Journal ArticleDOI
TL;DR: Examination of gut microbial composition in Obese rats, non-obese Wistar rats and Spontaneously Hypertensive rats indicated that non-obese and hypertensive rats harbor a different gut microbiota from obese rats and that exercise training alters gut microbiota from an obese and hypertensive genotype background.
Abstract: Obesity is a multifactor disease associated with cardiovascular disorders such as hypertension. Recently, gut microbiota was linked to obesity pathogenesis and shown to influence the host metabolism. Moreover, several factors such as host-genotype and life-style have been shown to modulate gut microbiota composition. Exercise is a well-known agent used for the treatment of numerous pathologies, such as obesity and hypertension; it has recently been demonstrated to shape gut microbiota consortia. Since exercise-altered microbiota could possibly improve the treatment of diseases related to dysfunctional microbiota, this study aimed to examine the effect of controlled exercise training on gut microbial composition in Obese rats (n = 3), non-obese Wistar rats (n = 3) and Spontaneously Hypertensive rats (n = 3). Pyrosequencing of 16S rRNA genes from fecal samples collected before and after exercise training was used for this purpose. Exercise altered the composition and diversity of gut bacteria at genus level in all rat lineages. Allobaculum (Hypertensive rats), Pseudomonas and Lactobacillus (Obese rats) were shown to be enriched after exercise, while Streptococcus (Wistar rats), Aggregatibacter and Sutturella (Hypertensive rats) were more enhanced before exercise. A significant correlation was seen in the Clostridiaceae and Bacteroidaceae families and Oscillospira and Ruminococcus genera with blood lactate accumulation. Moreover, Wistar and Hypertensive rats were shown to share a similar microbiota composition, as opposed to Obese rats. Finally, Streptococcus alactolyticus, Bifidobacterium animalis, Ruminococcus gnavus, Aggregatibacter pneumotropica and Bifidobacterium pseudolongum were enriched in Obese rats. These data indicate that non-obese and hypertensive rats harbor a different gut microbiota from obese rats and that exercise training alters gut microbiota from an obese and hypertensive genotype background.

Journal ArticleDOI
TL;DR: This study provides the first insights into the fish intestinal microbiome and its changes under starvation, with dramatic enrichment of Bacteroidetes, but significant depletion of Betaproteobacteria in starved intestines.
Abstract: Starvation not only affects the nutritional and health status of the animals, but also the microbial composition in the host’s intestine. Next-generation sequencing provides a unique opportunity to explore gut microbial communities and their interactions with hosts. However, studies on gut microbiomes have been conducted predominantly in humans and land animals. Not much is known on gut microbiomes of aquatic animals and their changes under changing environmental conditions. To address this shortcoming, we determined the microbial gene catalogue, and investigated changes in the microbial composition and host-microbe interactions in the intestine of Asian seabass in response to starvation. We found 33 phyla, 66 classes, 130 orders and 278 families in the intestinal microbiome. Proteobacteria (48.8%), Firmicutes (15.3%) and Bacteroidetes (8.2%) were the three most abundant bacteria taxa. Comparative analyses of the microbiome revealed shifts in bacteria communities, with dramatic enrichment of Bacteroidetes, but significant depletion of Betaproteobacteria in starved intestines. In addition, significant differences in clusters of orthologous groups (COG) functional categories and orthologous groups were observed. Genes related to antibiotic activity in the microbiome were significantly enriched in response to starvation, and host genes related to the immune response were generally up-regulated. This study provides the first insights into the fish intestinal microbiome and its changes under starvation. Further detailed study on interactions between intestinal microbiomes and hosts under dynamic conditions will shed new light on how the hosts and microbes respond to the changing environment.

Journal ArticleDOI
TL;DR: The workflow presented proved to be a cost-efficient alternative to Sanger sequencing for high-throughput HLA typing and “neXtype” for streamlined data analysis and HLA allele assignment was developed.
Abstract: A close match of the HLA alleles between donor and recipient is an important prerequisite for successful unrelated hematopoietic stem cell transplantation. To increase the chances of finding an unrelated donor, registries recruit many hundred thousands of volunteers each year. Many registries with limited resources have had to find a trade-off between cost and resolution and extent of typing for newly recruited donors in the past. Therefore, we have taken advantage of recent improvements in NGS to develop a workflow for low-cost, high-resolution HLA typing. We have established a straightforward three-step workflow for high-throughput HLA typing: Exons 2 and 3 of HLA-A, -B, -C, -DRB1, -DQB1 and -DPB1 are amplified by PCR on Fluidigm Access Array microfluidic chips. Illumina sequencing adapters and sample specific tags are directly incorporated during PCR. Upon pooling and cleanup, 384 samples are sequenced in a single Illumina MiSeq run. We developed “neXtype” for streamlined data analysis and HLA allele assignment. The workflow was validated with 1140 samples typed at 6 loci. All neXtype results were concordant with the Sanger sequences, demonstrating error-free typing of more than 6000 HLA loci. Current capacity in routine operation is 12,000 samples per week. The workflow presented proved to be a cost-efficient alternative to Sanger sequencing for high-throughput HLA typing. Despite the focus on cost efficiency, resolution exceeds the current standards of Sanger typing for donor registration.

Journal ArticleDOI
TL;DR: This study represents the first insights into the genomic organization and metabolic traits of the seventh order of methanogens and suggests a different handling of the Pyl-encoding capacity among the three analyzed Methanomassiliicoccales representatives.
Abstract: A seventh order of methanogens, the Methanomassiliicoccales, has been identified in diverse anaerobic environments including the gastrointestinal tracts (GIT) of humans and other animals and may contribute significantly to methane emission and global warming. Methanomassiliicoccales are phylogenetically distant from all other orders of methanogens and belong to a large evolutionary branch composed by lineages of non-methanogenic archaea such as Thermoplasmatales, the Deep Hydrothermal Vent Euryarchaeota-2 (DHVE-2, Aciduliprofundum boonei) and the Marine Group-II (MG-II). To better understand this new order and its relationship to other archaea, we manually curated and extensively compared the genome sequences of three Methanomassiliicoccales representatives derived from human GIT microbiota, “Candidatus Methanomethylophilus alvus”, “Candidatus Methanomassiliicoccus intestinalis” and Methanomassiliicoccus luminyensis. Comparative analyses revealed atypical features, such as the scattering of the ribosomal RNA genes in the genome and the absence of eukaryotic-like histone gene otherwise present in most of Euryarchaeota genomes. Previously identified in Thermoplasmatales genomes, these features are presently extended to several completely sequenced genomes of this large evolutionary branch, including MG-II and DHVE2. The three Methanomassiliicoccales genomes share a unique composition of genes involved in energy conservation suggesting an original combination of two main energy conservation processes previously described in other methanogens. They also display substantial differences with each other, such as their codon usage, the nature and origin of their CRISPRs systems and the genes possibly involved in particular environmental adaptations. The genome of M. luminyensis encodes several features to thrive in soil and sediment conditions suggesting its larger environmental distribution than GIT. Conversely, “Ca. M. alvus” and “Ca. M. intestinalis” do not present these features and could be more restricted and specialized on GIT. Prediction of the amber codon usage, either as a termination signal of translation or coding for pyrrolysine revealed contrasted patterns among the three genomes and suggests a different handling of the Pyl-encoding capacity. This study represents the first insights into the genomic organization and metabolic traits of the seventh order of methanogens. It suggests contrasted evolutionary history among the three analyzed Methanomassiliicoccales representatives and provides information on conserved characteristics among the overall methanogens and among Thermoplasmata.

Journal ArticleDOI
TL;DR: Insight is given into how SNPs impact gene regulation and the notion that peripheral blood may be a reliable correlate of physiological processes in other tissues is supported.
Abstract: Individual genotypes at specific loci can result in different patterns of DNA methylation. These methylation quantitative trait loci (meQTLs) influence methylation across extended genomic regions and may underlie direct SNP associations or gene-environment interactions. We hypothesized that the detection of meQTLs varies with ancestral population, developmental stage, and tissue type. We explored this by analyzing seven datasets that varied by ancestry (African American vs. Caucasian), developmental stage (neonate vs. adult), and tissue type (blood vs. four regions of postmortem brain) with genome-wide DNA methylation and SNP data. We tested for meQTLs by constructing linear regression models of methylation levels at each CpG site on SNP genotypes within 50 kb under an additive model controlling for multiple tests. Most meQTLs mapped to intronic regions, although a limited number appeared to occur in synonymous or nonsynonymous coding SNPs. We saw significant overlap of meQTLs between ancestral groups, developmental stages, and tissue types, with the highest rates of overlap within the four brain regions. Compared with a random group of SNPs with comparable frequencies, meQTLs were more likely to be 1) represented among the most associated SNPs in the WTCCC bipolar disorder results and 2) located in microRNA binding sites. These data give us insight into how SNPs impact gene regulation and support the notion that peripheral blood may be a reliable correlate of physiological processes in other tissues.
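The windowed additive test described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the toy data, and the 0/1/2 allele-count coding are all hypothetical, and the significance testing and multiple-test correction used in the study are omitted.

```python
from statistics import mean

WINDOW_BP = 50_000  # from the abstract: SNPs within 50 kb of the CpG site


def snps_near_cpg(cpg_pos, snp_positions, window=WINDOW_BP):
    """Return names of SNPs located within `window` bp of the CpG position."""
    return [name for name, pos in snp_positions.items()
            if abs(pos - cpg_pos) <= window]


def additive_slope(genotypes, methylation):
    """Least-squares slope of methylation on allele count (0, 1 or 2)."""
    g_bar, m_bar = mean(genotypes), mean(methylation)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    if sxx == 0:  # monomorphic SNP: the slope is undefined
        return None
    sxy = sum((g - g_bar) * (m - m_bar)
              for g, m in zip(genotypes, methylation))
    return sxy / sxx


# Hypothetical toy data: one CpG at position 100,000 and two SNPs.
snp_positions = {"rsA": 120_000, "rsB": 400_000}
genotypes = {"rsA": [0, 1, 2, 1, 0, 2], "rsB": [0, 0, 1, 1, 2, 2]}
methylation = [0.20, 0.35, 0.50, 0.35, 0.20, 0.50]  # invented beta values

candidates = snps_near_cpg(100_000, snp_positions)
slopes = {name: additive_slope(genotypes[name], methylation)
          for name in candidates}
```

Only `rsA` falls inside the window here, so only that SNP is regressed against the CpG; a real analysis would repeat this for every CpG site and then correct for the number of tests.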

Journal ArticleDOI
TL;DR: The high density Affymetrix® Axiom® Maize Genotyping Array is optimized for European and American temperate maize and was developed based on a diverse sample panel by applying stringent quality filter criteria to ensure its suitability for a broad range of applications.
Abstract: High density genotyping data are indispensable for genomic analyses of complex traits in animal and crop species. Maize is one of the most important crop plants worldwide, however a high density SNP genotyping array for analysis of its large and highly dynamic genome was not available so far. We developed a high density maize SNP array composed of 616,201 variants (SNPs and small indels). Initially, 57 M variants were discovered by sequencing 30 representative temperate maize lines and then stringently filtered for sequence quality scores and predicted conversion performance on the array resulting in the selection of 1.2 M polymorphic variants assayed on two screening arrays. To identify high-confidence variants, 285 DNA samples from a broad genetic diversity panel of worldwide maize lines including the samples used for sequencing, important founder lines for European maize breeding, hybrids, and proprietary samples with European, US, semi-tropical, and tropical origin were used for experimental validation. We selected 616 k variants according to their performance during validation, support of genotype calls through sequencing data, and physical distribution for further analysis and for the design of the commercially available Affymetrix® Axiom® Maize Genotyping Array. This array is composed of 609,442 SNPs and 6,759 indels. Among these are 116,224 variants in coding regions and 45,655 SNPs of the Illumina® MaizeSNP50 BeadChip for study comparison. In a subset of 45,974 variants, apart from the target SNP additional off-target variants are detected, which show only a minor bias towards intermediate allele frequencies. We performed principal coordinate and admixture analyses to determine the ability of the array to detect and resolve population structure and investigated the extent of LD within a worldwide validation panel. 
The high density Affymetrix® Axiom® Maize Genotyping Array is optimized for European and American temperate maize and was developed based on a diverse sample panel by applying stringent quality filter criteria to ensure its suitability for a broad range of applications. With 600 k variants it is the largest currently publicly available genotyping array in crop species.

Journal ArticleDOI
TL;DR: The results indicate that direct and selective methylation of certain TFBS that prevents TF binding is restricted to special cases and cannot be considered as a general regulatory mechanism of transcription.
Abstract: DNA methylation in promoters is closely linked to downstream gene repression. However, whether DNA methylation is a cause or a consequence of gene repression remains an open question. If it is a cause, then DNA methylation may affect the affinity of transcription factors (TFs) for their binding sites (TFBSs). If it is a consequence, then gene repression caused by chromatin modification may be stabilized by DNA methylation. Until now, these two possibilities have been supported only by non-systematic evidence and they have not been tested on a wide range of TFs. An average promoter methylation is usually used in studies, whereas recent results suggested that methylation of individual cytosines can also be important. We found that the methylation profiles of 16.6% of cytosines and the expression profiles of neighboring transcriptional start sites (TSSs) were significantly negatively correlated. We called the CpGs corresponding to such cytosines “traffic lights”. We observed a strong selection against CpG “traffic lights” within TFBSs. The negative selection was stronger for transcriptional repressors as compared with transcriptional activators or multifunctional TFs as well as for core TFBS positions as compared with flanking TFBS positions. Our results indicate that direct and selective methylation of certain TFBS that prevents TF binding is restricted to special cases and cannot be considered as a general regulatory mechanism of transcription.
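The per-cytosine comparison described in this abstract, correlating a methylation profile with the expression profile of a neighboring TSS, might look something like the sketch below. It is illustrative only: the abstract does not state which correlation measure or significance test was used, so Pearson correlation and the fixed cutoff `r_cutoff` are assumptions standing in for the study's test, and all names and values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length profiles (None if constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def is_traffic_light_candidate(methylation, expression, r_cutoff=-0.5):
    """Flag a cytosine whose methylation falls as expression rises.

    `r_cutoff` is a hypothetical stand-in for a proper significance test.
    """
    r = pearson_r(methylation, expression)
    return r is not None and r <= r_cutoff


# Invented profiles across five samples.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
falling_methylation = [0.9, 0.7, 0.5, 0.3, 0.1]
rising_methylation = [0.1, 0.3, 0.5, 0.7, 0.9]

flag_falling = is_traffic_light_candidate(falling_methylation, expression)
flag_rising = is_traffic_light_candidate(rising_methylation, expression)
```

A fixed cutoff keeps the example short; in practice the decision would rest on a p-value adjusted for the number of cytosines tested.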

Journal ArticleDOI
TL;DR: An algorithm for identifying structural variation from DNA resequencing data is implemented as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes and is able to reliably predict structural variation with modest read-depth coverage of the reference genome.
Abstract: Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. 
In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold). Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.

Journal ArticleDOI
TL;DR: This study provided a comprehensive view of AS under salt stress and revealed novel insights into the potential roles of AS in plant response to salt stress, suggesting a complex loop in AS regulation for stress adaptation.
Abstract: Alternative splicing (AS) of precursor mRNA (pre-mRNA) is an important gene regulation process that potentially regulates many physiological processes in plants, including the response to abiotic stresses such as salt stress. To analyze global changes in AS under salt stress, we obtained high-coverage (~200 times) RNA sequencing data from Arabidopsis thaliana seedlings that were treated with different concentrations of NaCl. We detected that ~49% of all intron-containing genes were alternatively spliced under salt stress, 10% of which experienced significant differential alternative splicing (DAS). Furthermore, AS increased significantly under salt stress compared with under unstressed conditions. We demonstrated that most DAS genes were not differentially regulated by salt stress, suggesting that AS may represent an independent layer of gene regulation in response to stress. Our analysis of functional categories suggested that DAS genes were associated with specific functional pathways, such as the pathways for the responses to stresses and RNA splicing. We revealed that serine/arginine-rich (SR) splicing factors were frequently and specifically regulated in AS under salt stresses, suggesting a complex loop in AS regulation for stress adaptation. We also showed that alternative splicing site selection (SS) occurred most frequently at 4 nucleotides upstream or downstream of the dominant sites and that exon skipping tended to link with alternative SS. Our study provided a comprehensive view of AS under salt stress and revealed novel insights into the potential roles of AS in plant response to salt stress.

Journal ArticleDOI
TL;DR: It is concluded that chronic regular smoking is associated with changes in peripheral mononuclear cell methylation signature which perturb inflammatory and immune function pathways and may contribute to increased vulnerability for complex illnesses with inflammatory components.
Abstract: Background: Regular smoking is associated with a wide variety of syndromes with prominent inflammatory components such as cancer, obesity and type 2 diabetes. Heavy regular smoking is also associated with changes in the DNA methylation of peripheral mononuclear cells. However, in younger smokers, inflammatory epigenetic findings are largely absent, which suggests the inflammatory response(s) to smoking may be dose dependent. To help understand whether peripheral mononuclear cells have a role in mediating these responses in older smokers with higher cumulative smoke exposure, we examined genome-wide DNA methylation in a group of well characterized adult African American subjects informative for smoking, as well as serum C-reactive protein (CRP) and interleukin-6 receptor (IL6R) levels. In addition, complementary bioinformatic analyses were conducted to delineate possible pathways affected by long-term smoking.

Journal ArticleDOI
TL;DR: This manuscript describes the first high-density SNP genotyping array for Atlantic salmon, likely to be used as a platform for high-resolution genetics research into traits of evolutionary and economic importance in salmonids and in aquaculture breeding programs via genomic selection.
Abstract: Dense single nucleotide polymorphism (SNP) genotyping arrays provide extensive information on polymorphic variation across the genome of species of interest. Such information can be used in studies of the genetic architecture of quantitative traits and to improve the accuracy of selection in breeding programs. In Atlantic salmon (Salmo salar), these goals are currently hampered by the lack of a high-density SNP genotyping platform. Therefore, the aim of the study was to develop and test a dense Atlantic salmon SNP array. SNP discovery was performed using extensive deep sequencing of Reduced Representation (RR-Seq), Restriction site-Associated DNA (RAD-Seq) and mRNA (RNA-Seq) libraries derived from farmed and wild Atlantic salmon samples (n = 283) resulting in the discovery of > 400 K putative SNPs. An Affymetrix Axiom® myDesign Custom Array was created and tested on samples of animals of wild and farmed origin (n = 96) revealing a total of 132,033 polymorphic SNPs with high call rate, good cluster separation on the array and stable Mendelian inheritance in our sample. At least 38% of these SNPs are from transcribed genomic regions and therefore more likely to include functional variants. Linkage analysis utilising the lack of male recombination in salmonids allowed the mapping of 40,214 SNPs distributed across all 29 pairs of chromosomes, highlighting the extensive genome-wide coverage of the SNPs. An identity-by-state clustering analysis revealed that the array can clearly distinguish between fish of different origins, within and between farmed and wild populations. Finally, Y-chromosome-specific probes included on the array provide an accurate molecular genetic test for sex. This manuscript describes the first high-density SNP genotyping array for Atlantic salmon. 
This array will be publicly available and is likely to be used as a platform for high-resolution genetics research into traits of evolutionary and economic importance in salmonids and in aquaculture breeding programs via genomic selection.

Journal ArticleDOI
TL;DR: Diverse predicted functions and expression patterns in the repertoire of S. sclerotiorum effector candidates will facilitate the functional analysis of fungal pathogenicity determinants and should prove useful in the search for plant quantitative disease resistance components active against the white mold.
Abstract: The white mold fungus Sclerotinia sclerotiorum is a devastating necrotrophic plant pathogen with a remarkably broad host range. The interaction of necrotrophs with their hosts is more complex than initially thought, and still poorly understood. We combined bioinformatics approaches to determine the repertoire of S. sclerotiorum effector candidates and conducted detailed sequence and expression analyses on selected candidates. We identified 486 S. sclerotiorum secreted protein genes expressed in planta, many of which have no predicted enzymatic activity and may be involved in the interaction between the fungus and its hosts. We focused on those showing (i) protein domains and motifs found in known fungal effectors, (ii) signatures of positive selection, (iii) recent gene duplication, or (iv) being S. sclerotiorum-specific. We identified 78 effector candidates based on these properties. We analyzed the expression pattern of 16 representative effector candidate genes on four host plants and revealed diverse expression patterns. These results reveal diverse predicted functions and expression patterns in the repertoire of S. sclerotiorum effector candidates. They will facilitate the functional analysis of fungal pathogenicity determinants and should prove useful in the search for plant quantitative disease resistance components active against the white mold.

Journal ArticleDOI
TL;DR: DLD1 and SW480 colon carcinoma cell lines are suitable model systems to study Wnt/β-catenin signaling and associated colorectal carcinogenesis, and the confirmed and newly identified potential β-catenin target genes are useful starting points for further studies.
Abstract: Deregulation of Wnt/β-catenin signaling is a hallmark of the majority of sporadic forms of colorectal cancer and results in increased stability of the protein β-catenin. β-catenin is then shuttled into the nucleus where it activates the transcription of its target genes, including the proto-oncogenes MYC and CCND1 as well as the genes encoding the basic helix-loop-helix (bHLH) proteins ASCL2 and ITF-2B. To identify genes commonly regulated by β-catenin in colorectal cancer cell lines, we analyzed β-catenin target gene expression in two non-isogenic cell lines, DLD1 and SW480, using DNA microarrays and compared these genes to β-catenin target genes published in the PubMed database and DNA microarray data presented in the Gene Expression Omnibus (GEO) database. Treatment of DLD1 and SW480 cells with β-catenin siRNA resulted in differential expression of 1501 and 2389 genes, respectively. 335 of these genes were regulated in the same direction in both cell lines. Comparison of these data with published β-catenin target genes for the colon carcinoma cell line LS174T revealed 193 genes that are regulated similarly in all three cell lines. The overlapping gene set includes confirmed β-catenin target genes like AXIN2, MYC, and ASCL2. We also identified 11 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways that are regulated similarly in DLD1 and SW480 cells and one pathway – the steroid biosynthesis pathway – was regulated in all three cell lines. Based on the large number of potential β-catenin target genes found to be similarly regulated in DLD1, SW480 and LS174T cells as well as the large overlap with confirmed β-catenin target genes, we conclude that DLD1 and SW480 colon carcinoma cell lines are suitable model systems to study Wnt/β-catenin signaling and associated colorectal carcinogenesis. Furthermore, the confirmed and the newly identified potential β-catenin target genes are useful starting points for further studies.
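The "regulated in the same direction in both cell lines" comparison in this abstract amounts to intersecting two differential-expression lists and keeping genes whose changes share a sign. A minimal sketch, assuming each list is a mapping from gene symbol to log fold change; the function name and all numeric values are invented for illustration and are not taken from the study:

```python
def same_direction(lfc_a, lfc_b):
    """Genes present in both mappings whose log fold changes share a sign."""
    shared = lfc_a.keys() & lfc_b.keys()
    return {gene for gene in shared if lfc_a[gene] * lfc_b[gene] > 0}


# Hypothetical log fold changes after siRNA treatment (values invented).
dld1 = {"AXIN2": -1.8, "MYC": -1.1, "GENE_X": 0.9}
sw480 = {"AXIN2": -2.2, "MYC": -0.7, "GENE_X": -0.5, "GENE_Y": 1.2}

common = same_direction(dld1, sw480)
```

`GENE_X` is dropped because it changes in opposite directions, and `GENE_Y` because it is differentially expressed in only one line. Extending the comparison to a third cell line is a matter of applying the same function to the result and the third mapping.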

Journal ArticleDOI
TL;DR: The data should help researchers who are planning exome sequencing to select appropriate exome capture technology for their particular application and show key differences in performance between the four technologies.
Abstract: Recent developments in deep (next-generation) sequencing technologies are significantly impacting medical research. The global analysis of protein coding regions in genomes of interest by whole exome sequencing is a widely used application. Many technologies for exome capture are commercially available; here we compare the performance of four of them: NimbleGen’s SeqCap EZ v3.0, Agilent’s SureSelect v4.0, Illumina’s TruSeq Exome, and Illumina’s Nextera Exome, all applied to the same human tumor DNA sample. Each capture technology was evaluated for its coverage of different exome databases, target coverage efficiency, GC bias, sensitivity in single nucleotide variant detection, sensitivity in small indel detection, and technical reproducibility. In general, all technologies performed well; however, our data demonstrated small, but consistent differences between the four capture technologies. Illumina technologies cover more bases in coding and untranslated regions. Furthermore, whereas most of the technologies provide reduced coverage in regions with low or high GC content, the Nextera technology tends to bias towards target regions with high GC content. We show key differences in performance between the four technologies. Our data should help researchers who are planning exome sequencing to select appropriate exome capture technology for their particular application.

Journal ArticleDOI
TL;DR: The first high-quality genome sequence of industrial high-penicillin strain of P. chrysogenum NCPC10086 is provided and some non-synonymous mutations in the genes participating in homogentisate pathway or working as regulators of penicillin biosynthesis are found.
Abstract: Because of the importance of Penicillium chrysogenum in medicine, the genome of the low-penicillin-producing laboratory strain Wisconsin54-1255 has been sequenced and fully annotated. Through classical mutagenesis of Wisconsin54-1255, product titers and productivities of penicillin have increased dramatically, but the underlying genome structural variations are still little known. Genome sequencing of a high-penicillin-producing industrial strain is therefore very meaningful. To gain more insight into the genome structural variations of a high-penicillin-producing strain, we sequenced the industrial strain P. chrysogenum NCPC10086. By whole-genome comparative analysis, we observed a large number of mutations, insertions and deletions, and structural variations. There are 69 new genes that do not exist in the genome sequence of Wisconsin54-1255, and some of them are involved in energy metabolism, nitrogen metabolism and glutathione metabolism. Most importantly, we discovered a 53.7 Kb "new shift fragment" in the seven copies of the determinative penicillin biosynthesis cluster in NCPC10086, and the arrangement type of the amplified region is unique. Moreover, we identified two large-scale translocations in NCPC10086, containing genes involved in energy metabolism, nitrogen metabolism and the peroxisome pathway. Finally, we found some non-synonymous mutations in the genes participating in the homogentisate pathway or working as regulators of penicillin biosynthesis. We provide the first high-quality genome sequence of an industrial high-penicillin strain of P. chrysogenum and carried out a comparative genome analysis with a low-producing experimental strain. The genomic variations we discovered are related to energy metabolism, nitrogen metabolism and other processes. These findings offer potential insights into the high-penicillin yielding mechanism and into future metabolic engineering.

Journal ArticleDOI
TL;DR: This largest GWAS ever performed in beef cattle led to the discovery of several novel across-breed and breed-specific large-effect pleiotropic QTL that cumulatively account for a significant percentage of additive genetic variance (e.g. more than a third of additive genetic variance of birth and mature weights; and calving ease direct in Hereford).
Abstract: The availability of high-density SNP assays including the BovineSNP50 (50 K) enables the identification of novel quantitative trait loci (QTL) and improvement of the resolution of the locations of previously mapped QTL. We performed a series of genome-wide association studies (GWAS) using 50 K genotypes scored in 18,274 animals from 10 US beef cattle breeds with observations for twelve body weights, calving ease and carcass traits. A total of 159 large-effect QTL (defined as 1-Mb genome windows explaining more than 1% of additive genetic variance) were identified. In general, more QTL were identified in analyses with bigger sample sizes. Four large-effect pleiotropic or closely linked QTL located on BTA6 at 37–42 Mb (primarily at 38 Mb), on BTA7 at 93 Mb, on BTA14 at 23–26 Mb (primarily at 25 Mb) and on BTA20 at 4 Mb were identified in more than one breed. Several breed-specific large-effect pleiotropic or closely linked QTL were also identified. Some identified QTL regions harbor genes known to have large effects on a variety of traits in cattle such as PLAG1 and MSTN and others harbor promising candidate genes including NCAPG, ARRDC3, ERGIC1, SH3PXD2B, HMGA2, MSRB3, LEMD3, TIGAR, SEPT7, and KIRREL3. Gene ontology analysis revealed that genes involved in ossification and in adipose tissue development were over-represented in the identified pleiotropic QTL. Also, the MAPK signaling pathway was identified as a common pathway affected by the genes located near the pleiotropic QTL. This largest GWAS ever performed in beef cattle led us to discover several novel across-breed and breed-specific large-effect pleiotropic QTL that cumulatively account for a significant percentage of additive genetic variance (e.g. more than a third of additive genetic variance of birth and mature weights; and calving ease direct in Hereford). These results will improve our understanding of the biology of growth and body composition in cattle.
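The abstract's working definition of a large-effect QTL (a 1-Mb genome window explaining more than 1% of additive genetic variance) reduces to a simple filter once per-window variance fractions are available. The sketch below assumes those fractions have already been estimated by some upstream model; the function names and the numbers are hypothetical and not taken from the study.

```python
WINDOW_BP = 1_000_000     # 1-Mb windows, as defined in the abstract
VARIANCE_THRESHOLD = 0.01  # more than 1% of additive genetic variance


def window_of(chrom, position_bp):
    """Map a SNP position to its (chromosome, window start in Mb) key."""
    return (chrom, position_bp // WINDOW_BP)


def large_effect_windows(window_variance, threshold=VARIANCE_THRESHOLD):
    """Return sorted window keys whose variance fraction exceeds the threshold."""
    return sorted(key for key, fraction in window_variance.items()
                  if fraction > threshold)


# Invented fractions of additive genetic variance per window.
window_variance = {
    ("BTA6", 38): 0.052,
    ("BTA14", 25): 0.031,
    ("BTA1", 10): 0.004,
}

qtl_windows = large_effect_windows(window_variance)
```

The strict `>` comparison mirrors the "more than 1%" wording; a window at exactly 1% would not be reported.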

Journal ArticleDOI
TL;DR: The genome-wide identification, chromosome organization, gene structures, evolutionary and expression analyses of grapevine bZIP genes provide an overall insight of this gene family and their potential involvement in growth, development and stress responses.
Abstract: The basic leucine zipper (bZIP) transcription factor gene family is one of the largest and most diverse families in plants. Current studies have shown that the bZIP proteins regulate numerous growth and developmental processes and biotic and abiotic stress responses. Nonetheless, knowledge concerning the specific expression patterns and evolutionary history of plant bZIP family members remains very limited. We identified 55 bZIP transcription factor-encoding genes in the grapevine (Vitis vinifera) genome, and divided them into 10 groups according to their phylogenetic relationship with those in Arabidopsis. The chromosome distribution and the collinearity analyses suggest that segment/chromosomal duplications contributed greatly to the expansion of the grapevine bZIP (VvbZIP) transcription factor family, which may be associated with the grapevine genome fusion events. Nine intron/exon structural patterns within the bZIP domain and the additional conserved motifs were identified among all VvbZIP proteins, and showed a high group-specificity. The predicted specificities on DNA-binding domains indicated that some highly conserved amino acid residues exist across each major group in the tree of land plant life. The expression patterns of VvbZIP genes across the grapevine gene expression atlas, based on microarray technology, suggest that VvbZIP genes are involved in grapevine organ development, especially seed development. Expression analysis based on qRT-PCR indicated that VvbZIP genes are extensively involved in drought and heat responses, with possibly different mechanisms. The genome-wide identification, chromosome organization, gene structures, evolutionary and expression analyses of grapevine bZIP genes provide an overall insight into this gene family and their potential involvement in growth, development and stress responses. This will facilitate further research on the bZIP gene family regarding their evolutionary history and biological functions.

Journal ArticleDOI
TL;DR: This work combines metabolomics and transcriptomics to investigate the inducible biosynthesis of the bioactive diterpenoid tanshinones from the Chinese medicinal herb, Salvia miltiorrhiza (Danshen), and indicates a biphasic response of Danshen terpenoid metabolism to elicitation.
Abstract: Plant natural products have been co-opted for millennia by humans for various uses such as flavor, fragrances, and medicines. These compounds often are only produced in relatively low amounts and are difficult to chemically synthesize, limiting access. While elucidation of the underlying biosynthetic processes might help alleviate these issues (e.g., via metabolic engineering), investigation of this is hindered by the low levels of relevant gene expression and expansion of the corresponding enzymatic gene families. However, the often-inducible nature of such metabolic processes enables selection of those genes whose expression pattern indicates a role in production of the targeted natural product. Here, we combine metabolomics and transcriptomics to investigate the inducible biosynthesis of the bioactive diterpenoid tanshinones from the Chinese medicinal herb, Salvia miltiorrhiza (Danshen). Untargeted metabolomics investigation of elicited hairy root cultures indicated that tanshinone production was a dominant component of the metabolic response, increasing at later time points. A transcriptomic approach was applied to not only define a comprehensive transcriptome (comprised of 20,972 non-redundant genes), but also its response to induction, revealing 6,358 genes that exhibited differential expression, with significant enrichment for up-regulation of genes involved in stress, stimulus and immune response processes. Consistent with our metabolomics analysis, there appears to be a slower but more sustained increase in transcript levels of known genes from diterpenoid and, more specifically, tanshinone biosynthesis. Among the co-regulated genes were 70 transcription factors and 8 cytochromes P450, providing targets for future investigation.
Our results indicate a biphasic response of Danshen terpenoid metabolism to elicitation, with early induction of sesqui- and tri-terpenoid biosynthesis, followed by later and more sustained production of the diterpenoid tanshinones. Our data provide a firm foundation for further elucidation of tanshinone and other inducible natural product metabolism in Danshen.