Institution

J. Craig Venter Institute

Nonprofit, La Jolla, California, United States
About: J. Craig Venter Institute is a nonprofit organization based in La Jolla, California, United States. It is known for research contributions in the topics Genome and Gene. The organization has 1268 authors who have published 2300 publications, which have received 304083 citations. The organization is also known as JCVI and The Institute for Genomic Research.
Topics: Genome, Gene, Genomics, Population, Microbiome


Papers
Journal Article
TL;DR: It is demonstrated that even when the purpose is to understand complex structural variation at a single region of the genome, complete genome assembly is becoming the simplest way to achieve this goal.
Abstract: The handheld Oxford Nanopore MinION sequencer generates ultra-long reads with minimal cost and time requirements, which makes sequencing genomes at the bench feasible. Here, we sequence the gold standard Arabidopsis thaliana genome (KBS-Mac-74 accession) on the bench with the MinION sequencer, and assemble the genome using typical consumer computing hardware (4 Cores, 16 Gb RAM) into chromosome arms (62 contigs with an N50 length of 12.3 Mb). We validate the contiguity and quality of the assembly with two independent single-molecule technologies, Bionano optical genome maps and Pacific Biosciences Sequel sequencing. The new A. thaliana KBS-Mac-74 genome enables resolution of a quantitative trait locus that had previously been recalcitrant to a Sanger-based BAC sequencing approach. In summary, we demonstrate that even when the purpose is to understand complex structural variation at a single region of the genome, complete genome assembly is becoming the simplest way to achieve this goal.

250 citations

Journal Article
TL;DR: Using whole-genome comparisons of five diverse strains of Bacillus anthracis to facilitate SNP discovery shows that only polymorphisms lying along the evolutionary pathway between reference strains will be observed, and shows how divergent branches in topologies collapse to single points but provide accurate information on internodal distances and points of origin for ancestral clades.
Abstract: Phylogenetic reconstruction using molecular data is often subject to homoplasy, leading to inaccurate conclusions about phylogenetic relationships among operational taxonomic units. Compared with other molecular markers, single-nucleotide polymorphisms (SNPs) exhibit extremely low mutation rates, making them rare in recently emerged pathogens, but they are less prone to homoplasy and thus extremely valuable for phylogenetic analyses. Despite their phylogenetic potential, ascertainment bias occurs when SNP characters are discovered through biased taxonomic sampling; by using whole-genome comparisons of five diverse strains of Bacillus anthracis to facilitate SNP discovery, we show that only polymorphisms lying along the evolutionary pathway between reference strains will be observed. We illustrate this in theoretical and simulated data sets in which complex phylogenetic topologies are reduced to linear evolutionary models. Using a set of 990 SNP markers, we also show how divergent branches in our topologies collapse to single points but provide accurate information on internodal distances and points of origin for ancestral clades. These data allowed us to determine the ancestral root of B. anthracis, showing that it lies closer to a newly described "C" branch than to either of two previously described "A" or "B" branches. In addition, subclade rooting of the C branch revealed unequal evolutionary rates that seem to be correlated with ecological parameters and strain attributes. Our use of nonhomoplastic whole-genome SNP characters allows branch points and clade membership to be estimated with great precision, providing greater insight into epidemiological, ecological, and forensic questions.

249 citations

Journal Article
TL;DR: Due to the large number of reads afforded by the 454 DNA sequencing technology, it is effective in revealing the expression of transcripts from a broad range of GO categories and contains many rare transcripts in normalized cDNA libraries, although only a limited portion of their sequence is uncovered.
Abstract: In this study, we addressed whether a single 454 Life Science GS20 sequencing run provides new gene discovery from a normalized cDNA library, and whether the short reads produced via this technology are of value in gene structure annotation. A single 454 GS20 sequencing run on adapter-ligated cDNA, from a normalized cDNA library, generated 292,465 reads that were reduced to 252,384 reads with an average read length of 92 nucleotides after cleaning. After clustering and assembly, a total of 184,599 unique sequences were generated containing over 400 SSRs. The 454 sequences generated hits to more genes than a comparable amount of sequence from MtGI. Although short, the 454 reads are of sufficient length to map to a unique genome location as effectively as longer ESTs produced by conventional sequencing. Functional interpretation of the sequences was carried out by Gene Ontology assignments from matches to Arabidopsis and was shown to cover a broad range of GO categories. 53,796 assemblies and singletons (29%) had no match in the existing MtGI. Within the previously unobserved Medicago transcripts, thousands had matches in a comprehensive protein database and one or more of the TIGR Plant Gene Indices. Approximately 20% of these novel sequences could be found in the Medicago genome sequence. A total of 70,026 reads generated by the 454 technology were mapped to 785 Medicago finished BACs using PASA and over 1,000 gene models required modification. In parallel to 454 sequencing, 4,445 5'-prime reads were generated by conventional sequencing using the same library and from the assembled sequences it was shown to contain about 52% full length cDNAs encoding proteins from 50 to over 500 amino acids in length. 
Due to the large number of reads afforded by the 454 DNA sequencing technology, it is effective in revealing the expression of transcripts from a broad range of GO categories and contains many rare transcripts in normalized cDNA libraries, although only a limited portion of their sequence is uncovered. As with longer ESTs, 454 reads can be mapped uniquely onto genomic sequence to provide support for, and modifications of, gene predictions.

248 citations

Journal Article
TL;DR: The development of ePlant is described and several examples illustrating its integrative features for hypothesis generation are presented, including the process of deploying ePlant as an “app” on Araport.
Abstract: A big challenge in current systems biology research arises when different types of data must be accessed from separate sources and visualized using separate tools. The high cognitive load required to navigate such a workflow is detrimental to hypothesis generation. Accordingly, there is a need for a robust research platform that incorporates all data and provides integrated search, analysis, and visualization features through a single portal. Here, we present ePlant (http://bar.utoronto.ca/eplant), a visual analytic tool for exploring multiple levels of Arabidopsis thaliana data through a zoomable user interface. ePlant connects to several publicly available web services to download genome, proteome, interactome, transcriptome, and 3D molecular structure data for one or more genes or gene products of interest. Data are displayed with a set of visualization tools that are presented using a conceptual hierarchy from big to small, and many of the tools combine information from more than one data type. We describe the development of ePlant in this article and present several examples illustrating its integrative features for hypothesis generation. We also describe the process of deploying ePlant as an “app” on Araport. Building on readily available web services, the code for ePlant is freely available for any other biological species research.

247 citations

Journal Article
TL;DR: The genome sequence of ExPEC IHE3034 (ST95) isolated from a case of neonatal meningitis is determined and the gene encoding the most protective antigen was detected in most of the E. coli isolates, highly conserved in sequence and found to be exported by a type II secretion system which seems to be nonfunctional in nonpathogenic strains.
Abstract: Extraintestinal pathogenic Escherichia coli (ExPEC) are a common cause of disease in both mammals and birds. A vaccine to prevent such infections would be desirable given the increasing antibiotic resistance of these bacteria. We have determined the genome sequence of ExPEC IHE3034 (ST95) isolated from a case of neonatal meningitis and compared this to available genome sequences of other ExPEC strains and a few nonpathogenic E. coli. We found 19 genomic islands present in the genome of IHE3034, which are absent in the nonpathogenic E. coli isolates. By using subtractive reverse vaccinology we identified 230 antigens present in ExPEC but absent (or present with low similarity) in nonpathogenic strains. Nine antigens were protective in a mouse challenge model. Some of them were also present in other pathogenic non-ExPEC strains, suggesting that a broadly protective E. coli vaccine may be possible. The gene encoding the most protective antigen was detected in most of the E. coli isolates, highly conserved in sequence and found to be exported by a type II secretion system which seems to be nonfunctional in nonpathogenic strains.

245 citations


Authors


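| Name | H-index | Papers | Citations |
| --- | --- | --- | --- |
| John R. Yates | 177 | 1036 | 129029 |
| Anders M. Dale | 156 | 823 | 133891 |
| Ronald W. Davis | 155 | 644 | 151276 |
| Steven L. Salzberg | 147 | 407 | 231756 |
| Mark Raymond Adams | 147 | 1187 | 135038 |
| Nicholas J. Schork | 125 | 587 | 62131 |
| William R. Jacobs | 118 | 490 | 48638 |
| Ian T. Paulsen | 112 | 354 | 69460 |
| Michael B. Brenner | 111 | 393 | 44771 |
| Kenneth H. Nealson | 108 | 483 | 51100 |
| Claire M. Fraser | 108 | 352 | 76292 |
| Stephen L. Hoffman | 104 | 458 | 38597 |
| Michael J. Brownstein | 102 | 274 | 47929 |
| Amalio Telenti | 102 | 421 | 40509 |
| John Quackenbush | 99 | 427 | 67029 |
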
Network Information
Related Institutions (5)
Wellcome Trust Sanger Institute: 9.6K papers, 1.2M citations, 94% related
Broad Institute: 11.6K papers, 1.5M citations, 92% related
Cold Spring Harbor Laboratory: 6.6K papers, 1M citations, 92% related
Pasteur Institute: 50.3K papers, 2.5M citations, 92% related
Howard Hughes Medical Institute: 34.6K papers, 5.2M citations, 92% related

Performance
Metrics
No. of papers from the Institution in previous years
| Year | Papers |
| --- | --- |
| 2023 | 3 |
| 2022 | 11 |
| 2021 | 116 |
| 2020 | 141 |
| 2019 | 154 |
| 2018 | 157 |