Author

Chad Nusbaum

Bio: Chad Nusbaum is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics: Genome & Gene. The author has an h-index of 48 and has co-authored 69 publications receiving 62,980 citations. Previous affiliations of Chad Nusbaum include Barts Health NHS Trust & Uniformed Services University of the Health Sciences.


Papers
Journal Article
TL;DR: For the first time, proteomic data are used in the primary annotation of a new genome, validating expression of many of the predicted proteins; novel features include a long repeating unit of DNA of approximately 2435 bp present in five complete copies.
Abstract: Although often considered “minimal” organisms, mycoplasmas show a wide range of diversity with respect to host environment, phenotypic traits, and pathogenicity. Here we report the complete genomic sequence and proteogenomic map for the piscine mycoplasma Mycoplasma mobile, noted for its robust gliding motility. For the first time, proteomic data are used in the primary annotation of a new genome, providing validation of expression for many of the predicted proteins. Several novel features were discovered including a long repeating unit of DNA of ∼2435 bp present in five complete copies that are shown to code for nearly identical yet uniquely expressed proteins. M. mobile has among the lowest DNA GC contents (24.9%) and most reduced set of tRNAs of any organism yet reported (28). Numerous instances of tandem duplication as well as lateral gene transfer are evident in the genome. The multiple available complete genome sequences for other motile and immotile mycoplasmas enabled us to use comparative genomic and phylogenetic methods to suggest several candidate genes that might be involved in motility. The results of these analyses leave open the possibility that gliding motility might have arisen independently more than once in the mycoplasma lineage.

248 citations

Journal Article
TL;DR: This work presents a general approach for ab initio discovery of the complete transcriptome of the budding yeast, based only on the unannotated genome sequence and millions of short reads from a single massively parallel sequencing run.
Abstract: Defining the transcriptome, the repertoire of transcribed regions encoded in the genome, is a challenging experimental task. Current approaches, relying on sequencing of ESTs or cDNA libraries, are expensive and labor-intensive. Here, we present a general approach for ab initio discovery of the complete transcriptome of the budding yeast, based only on the unannotated genome sequence and millions of short reads from a single massively parallel sequencing run. Using novel algorithms, we automatically construct a highly accurate transcript catalog. Our approach automatically and fully defines 86% of the genes expressed under the given conditions, and discovers 160 previously undescribed transcription units of 250 bp or longer. It correctly demarcates the 5' and 3' UTR boundaries of 86 and 77% of expressed genes, respectively. The method further identifies 83% of known splice junctions in expressed genes, and discovers 25 previously uncharacterized introns, including 2 cases of condition-dependent intron retention. Our framework is applicable to poorly understood organisms, and can lead to greater understanding of the transcribed elements in an explored genome.

239 citations

Journal Article
TL;DR: This method was developed to assemble the genome of the sea squirt Ciona savignyi, which was sequenced to a depth of 12.7× from a single wild individual that proved to have an extremely high heterozygosity rate.
Abstract: Whole-genome assembly is now used routinely to obtain high-quality draft sequence for the genomes of species with low levels of polymorphism. However, genome assembly remains extremely challenging for highly polymorphic species. The difficulty arises because two divergent haplotypes are sequenced together, making it difficult to distinguish alleles at the same locus from paralogs at different loci. We present here a method for assembling highly polymorphic diploid genomes that involves assembling the two haplotypes separately and then merging them to obtain a reference sequence. Our method was developed to assemble the genome of the sea squirt Ciona savignyi, which was sequenced to a depth of 12.7 x from a single wild individual. By comparing finished clones of the two haplotypes we determined that the sequenced individual had an extremely high heterozygosity rate, averaging 4.6% with significant regional variation and rearrangements at all physical scales. Applied to these data, our method produced a reference assembly covering 157 Mb, with N50 contig and scaffold sizes of 47 kb and 989 kb, respectively. Alignment of ESTs indicates that 88% of loci are present at least once and 81% exactly once in the reference assembly. Our method represented loci in a single copy more reliably and achieved greater contiguity than a conventional whole-genome assembly method.

201 citations
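
The N50 contig and scaffold sizes quoted in the Ciona savignyi abstract above (47 kb and 989 kb) are a standard contiguity statistic: the length L such that sequences of length L or longer contain at least half of the total assembled bases. The following is a minimal sketch of that definition, not code from the paper; the function name `n50` and the example lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Hypothetical helper for illustration: sorts lengths from longest to
    shortest and returns the length at which the running total first
    reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths (in kb), not data from the assembly.
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = n50(example_lengths)
```

A larger N50 means that half of the assembly sits in longer pieces, which is why the abstract reports it as evidence of contiguity.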

Journal Article
TL;DR: The results indicate that much of the increase in genome size of maize relative to rice and Arabidopsis is attributable to an increase in number of both repetitive elements and genes.
Abstract: Maize (Zea mays or corn) plays many varied and important roles in society. It is not only an important experimental model plant, but also a major livestock feed crop and a significant source of industrial products such as sweeteners and ethanol. In this study we report the systematic analysis of contiguous sequences of the maize genome. We selected 100 random regions averaging 144 kb in size, representing about 0.6% of the genome, and generated a high-quality dataset for sequence analysis. This sampling contains 330 annotated genes, 91% of which are supported by expressed sequence tag data from maize and other cereal species. Genes averaged 4 kb in size with five exons, although the largest was over 59 kb with 31 exons. Gene density varied over a wide range from 0.5 to 10.7 genes per 100 kb and genes did not appear to cluster significantly. The total repetitive element content we observed (66%) was slightly higher than previous whole-genome estimates (58%-63%) and consisted almost exclusively of retroelements. The vast majority of genes can be aligned to at least one sequence read derived from gene-enrichment procedures, but only about 30% are fully covered. Our results indicate that much of the increase in genome size of maize relative to rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana) is attributable to an increase in number of both repetitive elements and genes.

169 citations

01 Nov 2008
TL;DR: In this paper, the authors performed full genome analyses on four Listeria monocytogenes isolates, including human and food isolates from both a 1988 case of sporadic listeriosis and a 2000 listeriosis outbreak, which had been linked to contaminated food from a single processing facility.
Abstract: While increasing data on bacterial evolution in controlled environments are available, our understanding of bacterial genome evolution in natural environments is limited. We thus performed full genome analyses on four Listeria monocytogenes, including human and food isolates from both a 1988 case of sporadic listeriosis and a 2000 listeriosis outbreak, which had been linked to contaminated food from a single processing facility. All four isolates had been shown to have identical subtypes, suggesting that a specific L. monocytogenes strain persisted in this processing plant over at least 12 years. While a genome sequence for the 1988 food isolate has been reported, we sequenced the genomes of the 1988 human isolate as well as a human and a food isolate from the 2000 outbreak to allow for comparative genome analyses. The two L. monocytogenes isolates from 1988 and the two isolates from 2000 had highly similar genome backbone sequences with very few single nucleotide (nt) polymorphisms (1 – 8 SNPs/isolate; confirmed by re-sequencing). While no genome rearrangements were identified in the backbone genome of the four isolates, a 42 kb prophage inserted in the chromosomal comK gene showed evidence for major genome rearrangements. The human-food isolate pair from each 1988 and 2000 had identical prophage sequence; however, there were significant differences in the prophage sequences between the 1988 and 2000 isolates. Diversification of this prophage appears to have been caused by multiple homologous recombination events or possibly prophage replacement. In addition, only the 2000 human isolate contained a plasmid, suggesting plasmid loss or acquisition events. Surprisingly, besides the polymorphisms found in the comK prophage, a single SNP in the tRNA Thr-4 prophage represents the only SNP that differentiates the 1988 isolates from the 2000 isolates. Our data support the hypothesis that the 2000 human listeriosis outbreak was caused by a L. monocytogenes strain that persisted in a food processing facility over 12 years and show that genome sequencing is a valuable and feasible tool for retrospective epidemiological analyses. Short-term evolution of L. monocytogenes in non-controlled environments appears to involve limited diversification beyond plasmid gain or loss and prophage diversification, highlighting the importance of phages in bacterial evolution.

149 citations


Cited by
Journal Article
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, +245 more (29 institutions)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

28 Jul 2005
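TL;DR: Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1) interacts with single or multiple receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1), expressed on the surface of infected erythrocytes, interacts with single or multiple receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family of each haploid genome encodes about 60 members, and switching transcription among different var gene variants provides the molecular basis for antigenic variation.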

18,940 citations

Journal Article
TL;DR: SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies.
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V−SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

16,859 citations

Book Chapter
TL;DR: This chapter assumes acquaintance with the principles and practice of PCR, as outlined in, for example, refs.
Abstract: 1. Introduction Designing PCR and sequencing primers are essential activities for molecular biologists around the world. This chapter assumes acquaintance with the principles and practice of PCR, as outlined in, for example, refs. 1–4. Primer3 is a computer program that suggests PCR primers for a variety of applications, for example to create STSs (sequence tagged sites) for radiation hybrid mapping (5), or to amplify sequences for single nucleotide polymorphism discovery (6). Primer3 can also select single primers for sequencing reactions and can design oligonucleotide hybridization probes. In selecting oligos for primers or hybridization probes, Primer3 can consider many factors. These include oligo melting temperature, length, GC content, 3′ stability, estimated secondary structure, the likelihood of annealing to or amplifying undesirable sequences (for example interspersed repeats), the likelihood of primer–dimer formation between two copies of the same primer, and the accuracy of the source sequence. In the design of primer pairs Primer3 can consider product size and melting temperature, the likelihood of primer–dimer formation between the two primers in the pair, the difference between primer melting temperatures, and primer location relative to particular regions of interest or to be avoided.

16,407 citations

Journal Article
TL;DR: This work presents the Trinity method for de novo assembly of full-length transcripts and evaluates it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available, providing a unified solution for transcriptome reconstruction in any sample.
Abstract: Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.

15,665 citations
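
The Trinity abstract above mentions constructing and analyzing sets of de Bruijn graphs. As a rough illustration of the underlying data structure only, and not of Trinity's actual implementation, the sketch below builds a de Bruijn graph from short reads: each k-mer contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function name `de_bruijn_edges`, the value of k, and the toy read are assumptions made for this example.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a simple de Bruijn graph from reads (illustrative sketch).

    Nodes are (k-1)-mers; every k-mer found in a read adds one edge from
    its prefix node to its suffix node. Repeated k-mers add repeated
    edges, so edge multiplicity reflects how often a k-mer was seen.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k across the read.
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy input, not real sequencing data.
toy_graph = de_bruijn_edges(["ACGT"], k=3)
```

Walking paths through such a graph is what lets an assembler stitch overlapping reads into longer sequences without aligning them to a reference genome.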