Showing papers by "Chad Nusbaum" published in 2010


Journal ArticleDOI
TL;DR: In this paper, the authors sequenced 60 million small RNAs from mouse brain, ovary, testes, embryonic stem cells, three embryonic stages, and whole newborns.
Abstract: MicroRNAs (miRNAs) are small regulatory RNAs that derive from distinctive hairpin transcripts. To learn more about the miRNAs of mammals, we sequenced 60 million small RNAs from mouse brain, ovary, testes, embryonic stem cells, three embryonic stages, and whole newborns. Analysis of these sequences confirmed 398 annotated miRNA genes and identified 108 novel miRNA genes. More than 150 previously annotated miRNAs and hundreds of candidates failed to yield sequenced RNAs with miRNA-like features. Ectopically expressing these previously proposed miRNA hairpins also did not yield small RNAs, whereas ectopically expressing the confirmed and newly identified hairpins usually did yield small RNAs with the classical miRNA features, including dependence on the Drosha endonuclease for processing. These experiments, which suggest that previous estimates of conserved mammalian miRNAs were inflated, provide a substantially revised list of confidently identified murine miRNAs from which to infer the general features of mammalian miRNAs. Our analyses also revealed new aspects of miRNA biogenesis and modification, including tissue-specific strand preferences, sequential Dicer cleavage of a metazoan precursor miRNA (pre-miRNA), consequential 5' heterogeneity, newly identified instances of miRNA editing, and evidence for widespread pre-miRNA uridylation reminiscent of miRNA regulation by Lin28.

813 citations


Journal ArticleDOI
TL;DR: In this paper, the authors developed a comprehensive computational pipeline to compare library quality metrics from any RNA-seq method, using the well-annotated Saccharomyces cerevisiae transcriptome as a benchmark.
Abstract: Strand-specific, massively parallel cDNA sequencing (RNA-seq) is a powerful tool for transcript discovery, genome annotation and expression profiling. There are multiple published methods for strand-specific RNA-seq, but no consensus exists as to how to choose between them. Here we developed a comprehensive computational pipeline to compare library quality metrics from any RNA-seq method. Using the well-annotated Saccharomyces cerevisiae transcriptome as a benchmark, we compared seven library-construction protocols, including both published and our own methods. We found marked differences in strand specificity, library complexity, evenness and continuity of coverage, agreement with known annotations and accuracy for expression profiling. Weighing each method's performance and ease, we identified the dUTP second-strand marking and the Illumina RNA ligation methods as the leading protocols, with the former benefitting from the current availability of paired-end sequencing. Our analysis provides a comprehensive benchmark, and our computational pipeline is applicable for assessment of future protocols in other organisms.

714 citations


01 Aug 2010
TL;DR: A comprehensive computational pipeline is developed to compare library quality metrics from any RNA-seq method and identified the dUTP second-strand marking and the Illumina RNA ligation methods as the leading protocols, with the former benefitting from the current availability of paired-end sequencing.
Abstract: Strand-specific, massively parallel cDNA sequencing (RNA-seq) is a powerful tool for transcript discovery, genome annotation and expression profiling. There are multiple published methods for strand-specific RNA-seq, but no consensus exists as to how to choose between them. Here we developed a comprehensive computational pipeline to compare library quality metrics from any RNA-seq method. Using the well-annotated Saccharomyces cerevisiae transcriptome as a benchmark, we compared seven library-construction protocols, including both published and our own methods. We found marked differences in strand specificity, library complexity, evenness and continuity of coverage, agreement with known annotations and accuracy for expression profiling. Weighing each method's performance and ease, we identified the dUTP second-strand marking and the Illumina RNA ligation methods as the leading protocols, with the former benefitting from the current availability of paired-end sequencing. Our analysis provides a comprehensive benchmark, and our computational pipeline is applicable for assessment of future protocols in other organisms.

675 citations


Journal ArticleDOI
21 May 2010-Science
TL;DR: Results from an initial reference genome sequencing of 178 microbial genomes allow for ~40% of random sequences from the microbiome of the gastrointestinal tract to be associated with organisms based on the match criteria used, suggesting that the authors are still far from saturating microbial species genetic data sets.
Abstract: The human microbiome refers to the community of microorganisms, including prokaryotes, viruses, and microbial eukaryotes, that populate the human body. The National Institutes of Health launched an initiative that focuses on describing the diversity of microbial species that are associated with health and disease. The first phase of this initiative includes the sequencing of hundreds of microbial reference genomes, coupled to metagenomic sequencing from multiple body sites. Here we present results from an initial reference genome sequencing of 178 microbial genomes. From 547,968 predicted polypeptides that correspond to the gene complement of these strains, previously unidentified ("novel") polypeptides that had both unmasked sequence length greater than 100 amino acids and no BLASTP match to any nonreference entry in the nonredundant subset were defined. This analysis resulted in a set of 30,867 polypeptides, of which 29,987 (approximately 97%) were unique. In addition, this set of microbial genomes allows for approximately 40% of random sequences from the microbiome of the gastrointestinal tract to be associated with organisms based on the match criteria used. Insights into pan-genome analysis suggest that we are still far from saturating microbial species genetic data sets. In addition, the associated metrics and standards used by our group for quality assurance are presented.

649 citations


Journal ArticleDOI
10 Dec 2010-Science
TL;DR: A group of papers analyzes pathogen genomes to find the roots of virulence, opportunism, and life-style determinants, demonstrating that dynamic repeat-rich genome compartments underpin accelerated gene evolution following host jumps in this pathogen lineage.
Abstract: Many plant pathogens, including those in the lineage of the Irish potato famine organism Phytophthora infestans, evolve by host jumps followed by specialization. However, how host jumps affect genome evolution remains largely unknown. To determine the patterns of sequence variation in the P. infestans lineage, we resequenced six genomes of four sister species. This revealed uneven evolutionary rates across genomes with genes in repeat-rich regions showing higher rates of structural polymorphisms and positive selection. These loci are enriched in genes induced in planta, implicating host adaptation in genome evolution. Unexpectedly, genes involved in epigenetic processes formed another class of rapidly evolving residents of the gene-sparse regions. These results demonstrate that dynamic repeat-rich genome compartments underpin accelerated gene evolution following host jumps in this pathogen lineage.

409 citations


Journal ArticleDOI
TL;DR: In the version of this article initially published, the fourth sentence in the Online Methods section “RNA extraction and library preparation,” that read in part “procedure that combines a random priming step with a shearing step and results in fragments of ∼700 bp in size,” should have read, “procedure that combines fragmentation of mRNA to a peak size of ∼750 nucleotides by heating6 followed by random-primed reverse transcription
Abstract: Nat. Biotechnol. 28, 503–510 (2010); published online 02 May 2010; corrected after print 9 July 2010. In the version of this article initially published, the fourth sentence in the Online Methods section “RNA extraction and library preparation,” that read in part “procedure that combines a random priming step with a shearing step8,9, 28 and results in fragments of ∼700 bp in size,” should have read, “procedure that combines fragmentation of mRNA to a peak size of ∼750 nucleotides by heating6 followed by random-primed reverse transcription8.

20 citations


Journal ArticleDOI
TL;DR: This article is part of the supplement: Beyond the Genome: The true gene count, human evolution and disease genomics.
Abstract: This article is part of the supplement: Beyond the Genome: The true gene count, human evolution and disease genomics

7 citations


01 Jan 2010
TL;DR: An automated, high-throughput library construction process for 454 technology, using automation-friendly magnetic bead-based size selection and cleanup steps, to create sequence-ready 454 libraries in 2 days, a dramatic improvement over the standard method.
Abstract: We present an automated, high-throughput library construction process for 454 technology. Sample handling errors and cross-contamination are minimized via end-to-end barcoding of plasticware, along with molecular DNA barcoding of constructs. Automation-friendly magnetic bead-based size selection and cleanup steps have been devised, eliminating major bottlenecks and significant sources of error. Using this methodology, one technician can create 96 sequence-ready 454 libraries in 2 days, a dramatic improvement over the standard method. Background: The emergence of next-generation sequencing technologies, such as the Roche/454 Genome Sequencer, the Illumina Genome Analyzer, the Applied Biosystems SOLiD sequencer and others, has provided the opportunity for both large genome centers and individual labs to generate DNA sequence data at an unprecedented scale [1]. However, as sequence output continues to increase dramatically, processes to generate sequence-ready libraries lag behind in scale. The minimum unit of sequence data (for example, lane or channel) already exceeds the amount required for small projects, such as viral or bacterial genomes, and will continue to increase. As a result, projects with large numbers of samples but small sequence per sample requirements become increasingly challenging to undertake in a cost-effective manner. The 454 Genome Sequencer uses bead-in-emulsion amplification and a pyrosequencing chemistry to generate DNA sequence reads by synthesis [2]. Longer reads and shorter sequencing run times make the 454 platform a powerful tool for

1 citation