Institution

J. Craig Venter Institute

Nonprofit, La Jolla, California, United States
About: J. Craig Venter Institute is a nonprofit organization based in La Jolla, California, United States. It is known for research contributions in the topics Genome and Gene. The organization has 1268 authors who have published 2300 publications receiving 304083 citations. The organization is also known as JCVI and The Institute for Genomic Research.
Topics: Genome, Gene, Genomics, Population, Microbiome


Papers
Journal Article
TL;DR: This work describes a further improved and refined version of the M. truncatula genome (Mt4.0) based on de novo whole genome shotgun assembly of a majority of Illumina and 454 reads using ALLPATHS-LG, and re-annotates the genome through the gene prediction pipeline, which integrates EST, RNA-seq, protein and gene prediction evidence.
Abstract: Medicago truncatula, a close relative of alfalfa, is a preeminent model for studying nitrogen fixation, symbiosis, and legume genomics. The Medicago sequencing project began in 2003 with the goal of deciphering sequences originating from the euchromatic portion of the genome. The initial sequencing approach was based on a BAC tiling path, culminating in a BAC-based assembly (Mt3.5) as well as an in-depth analysis of the genome published in 2011. Here we describe a further improved and refined version of the M. truncatula genome (Mt4.0) based on de novo whole genome shotgun assembly of a majority of Illumina and 454 reads using ALLPATHS-LG. The ALLPATHS-LG scaffolds were anchored onto the pseudomolecules on the basis of alignments to both the optical map and the genotyping-by-sequencing (GBS) map. The Mt4.0 pseudomolecules encompass ~360 Mb of actual sequences spanning 390 Mb, of which ~330 Mb align perfectly with the optical map, presenting a drastic improvement over the BAC-based Mt3.5, which contained only 70% of the sequences (~250 Mb) of the current version. Most of the sequences and genes that previously resided on the unanchored portion of Mt3.5 have now been incorporated into the Mt4.0 pseudomolecules, with the exception of ~28 Mb of unplaced sequences. With regard to gene annotation, the genome has been re-annotated through our gene prediction pipeline, which integrates EST, RNA-seq, protein and gene prediction evidence. A total of 50,894 genes (31,661 high confidence and 19,233 low confidence) are included in Mt4.0, overlapping ~82% of the gene loci annotated in Mt3.5. Of the remaining genes, 14% of the Mt3.5 genes have been deprecated to an “unsupported” status and 4% are absent from the Mt4.0 predictions. Mt4.0 and its associated resources, such as genome browsers, BLAST-able datasets and gene information pages, can be found on the JCVI Medicago web site (http://www.jcvi.org/medicago). The assembly and annotation have been deposited in GenBank (BioProject: PRJNA10791). The heavily curated chromosomal sequences and associated gene models of Medicago will serve as a better reference for legume biology and comparative genomics.

373 citations

Journal Article
09 Oct 2009-Science
TL;DR: New sequencing technologies have produced an explosion of "draft" genomes of widely varying quality. Because that quality can currently only be inferred from the reads, contigs or genome fragments deposited in the databases, there is an urgent need to distinguish good from poor data sets.
Abstract: For over a decade, genome sequences have adhered to only two standards that are relied on for purposes of sequence analysis by interested third parties (1, 2). However, ongoing developments in revolutionary sequencing technologies have resulted in a redefinition of traditional whole-genome sequencing that requires reevaluation of such standards. With commercially available 454 pyrosequencing (followed by Illumina, SOLiD, and now Helicos), there has been an explosion of genomes sequenced under the moniker “draft”; however, these can be very poor quality genomes (due to inherent errors in the sequencing technologies, and the inability of assembly programs to fully address these errors). Further, one can only infer that such draft genomes may be of poor quality by navigating through the databases to find the number and type of reads deposited in sequence trace repositories (and not all genomes have this available), or to identify the number of contigs or genome fragments deposited to the database. The difficulty in assessing the quality of such deposited genomes has created some havoc for genome analysis pipelines and has contributed to many wasted hours. Exponential leaps in raw sequencing capability and greatly reduced prices have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome. The result is an ever-widening gap between drafted and finished genomes that only promises to continue (see the figure, page 236); hence, there is an urgent need to distinguish good from poor data sets.

370 citations

Journal Article
29 Mar 2012-PLOS ONE
TL;DR: The web tool Natural Product Domain Seeker (NaPDoS) provides an automated method to assess the secondary metabolite biosynthetic gene diversity and novelty of strains or environments, and a rapid method to identify genes that may be associated with uncharacterized biochemistry.
Abstract: New bioinformatic tools are needed to analyze the growing volume of DNA sequence data. This is especially true in the case of secondary metabolite biosynthesis, where the highly repetitive nature of the associated genes creates major challenges for accurate sequence assembly and analysis. Here we introduce the web tool Natural Product Domain Seeker (NaPDoS), which provides an automated method to assess the secondary metabolite biosynthetic gene diversity and novelty of strains or environments. NaPDoS analyses are based on the phylogenetic relationships of sequence tags derived from polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) genes, respectively. The sequence tags correspond to PKS-derived ketosynthase domains and NRPS-derived condensation domains and are compared to an internal database of experimentally characterized biosynthetic genes. NaPDoS provides a rapid mechanism to extract and classify ketosynthase and condensation domains from PCR products, genomes, and metagenomic datasets. Close database matches provide a mechanism to infer the generalized structures of secondary metabolites while new phylogenetic lineages provide targets for the discovery of new enzyme architectures or mechanisms of secondary metabolite assembly. Here we outline the main features of NaPDoS and test it on four draft genome sequences and two metagenomic datasets. The results provide a rapid method to assess secondary metabolite biosynthetic gene diversity and richness in organisms or environments and a mechanism to identify genes that may be associated with uncharacterized biochemistry.

369 citations

Journal Article
TL;DR: Variation in simple sequence repeats in key genes can provide a mechanism for generating antigenic variation that may account for the mammalian host's inability to mount a durable adaptive immune response to a B. mallei infection.
Abstract: The complete genome sequence of Burkholderia mallei ATCC 23344 provides insight into this highly infectious bacterium's pathogenicity and evolutionary history. B. mallei, the etiologic agent of glanders, has come under renewed scientific investigation as a result of recent concerns about its past and potential future use as a biological weapon. Genome analysis identified a number of putative virulence factors whose function was supported by comparative genome hybridization and expression profiling of the bacterium in hamster liver in vivo. The genome contains numerous insertion sequence elements that have mediated extensive deletions and rearrangements of the genome relative to Burkholderia pseudomallei. The genome also contains a vast number (>12,000) of simple sequence repeats. Variation in simple sequence repeats in key genes can provide a mechanism for generating antigenic variation that may account for the mammalian host's inability to mount a durable adaptive immune response to a B. mallei infection.

369 citations

Journal Article
26 Dec 2018-PLOS ONE
TL;DR: It is demonstrated that closely related neuronal cell types can be similarly discriminated with both methods if intronic sequences are included in snRNA-seq analysis, and the high information content of nuclear RNA for characterization of cellular diversity in brain tissues is illustrated.
Abstract: Transcriptomic profiling of complex tissues by single-nucleus RNA-sequencing (snRNA-seq) affords some advantages over single-cell RNA-sequencing (scRNA-seq). snRNA-seq provides less biased cellular coverage, does not appear to suffer cell isolation-based transcriptional artifacts, and can be applied to archived frozen specimens. We used well-matched snRNA-seq and scRNA-seq datasets from mouse visual cortex to compare cell type detection. Although more transcripts are detected in individual whole cells (~11,000 genes) than nuclei (~7,000 genes), we demonstrate that closely related neuronal cell types can be similarly discriminated with both methods if intronic sequences are included in snRNA-seq analysis. We estimate that the nuclear proportion of total cellular mRNA varies from 20% to over 50% for large and small pyramidal neurons, respectively. Together, these results illustrate the high information content of nuclear RNA for characterization of cellular diversity in brain tissues.

368 citations


Authors

Showing all 1274 results

Name | H-index | Papers | Citations
John R. Yates | 177 | 1036 | 129029
Anders M. Dale | 156 | 823 | 133891
Ronald W. Davis | 155 | 644 | 151276
Steven L. Salzberg | 147 | 407 | 231756
Mark Raymond Adams | 147 | 1187 | 135038
Nicholas J. Schork | 125 | 587 | 62131
William R. Jacobs | 118 | 490 | 48638
Ian T. Paulsen | 112 | 354 | 69460
Michael B. Brenner | 111 | 393 | 44771
Kenneth H. Nealson | 108 | 483 | 51100
Claire M. Fraser | 108 | 352 | 76292
Stephen L. Hoffman | 104 | 458 | 38597
Michael J. Brownstein | 102 | 274 | 47929
Amalio Telenti | 102 | 421 | 40509
John Quackenbush | 99 | 427 | 67029

Network Information
Related Institutions (5)
Wellcome Trust Sanger Institute: 9.6K papers, 1.2M citations, 94% related
Broad Institute: 11.6K papers, 1.5M citations, 92% related
Cold Spring Harbor Laboratory: 6.6K papers, 1M citations, 92% related
Pasteur Institute: 50.3K papers, 2.5M citations, 92% related
Howard Hughes Medical Institute: 34.6K papers, 5.2M citations, 92% related

Performance
Metrics
No. of papers from the Institution in previous years
Year | Papers
2023 | 3
2022 | 11
2021 | 116
2020 | 141
2019 | 154
2018 | 157