Author

Tom Slezak

Bio: Tom Slezak is an academic researcher at Lawrence Livermore National Laboratory. The author has contributed to research on topics including genomes and metagenomics. The author has an h-index of 28 and has co-authored 51 publications that have received 24,078 citations. Previous affiliations of Tom Slezak include the Joint Genome Institute and the University of Missouri–Kansas City.


Papers
Journal Article
Eric S. Lander1, Lauren Linton1, Bruce W. Birren1, Chad Nusbaum1, +245 more. Institutions (29)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article
TL;DR: kSNP3.0 is a significantly improved version of kSNP v2.0, a program for SNP identification and phylogenetic analysis without genome alignment or the requirement for reference genomes.
Abstract: We announce the release of kSNP3.0, a program for SNP identification and phylogenetic analysis without genome alignment or the requirement for reference genomes. kSNP3.0 is a significantly improved version of kSNP v2. Availability and implementation: kSNP3.0 is implemented as a package of stand-alone executables for Linux and Mac OS X under the open-source BSD license. The executable packages, source code and a full User Guide are freely available at https://sourceforge.net/projects/ksnp/files/ Contact: barryghall@gmail.com.

447 citations

Journal Article
Jane Grimwood1, Laurie Gordon2,3, Anne S. Olsen2,3, Astrid Terry2, Jeremy Schmutz1, Jane Lamerdin2,3, Uffe Hellsten2, David Goodstein2, Olivier Couronne2, Mary Bao Tran-Gyamfi2,3, Andrea Aerts2, Michael R. Altherr2,4, Linda K. Ashworth2,3, Eva Bajorek1, Stacey Black1, Elbert Branscomb2,3, Sean Caenepeel2, Anthony V. Carrano2,3, Chenier Caoile1, Yee Man Chan1, Mari Christensen2,3, Catherine A. Cleland2,4, Alex Copeland2, Eileen Dalin2, Paramvir S. Dehal2, Mirian Denys1, John C. Detter2, Julio Escobar1, Dave Flowers1, Dea Fotopulos1, Carmen Rosa Albacete García1, Anca M. Georgescu2,3, Tijana Glavina2, Maria Gomez1, Eidelyn Gonzales1, Matthew Groza2,3, Nancy Hammon2, Trevor Hawkins2, Lauren Haydu1, Isaac Ho2, Wayne Huang2, Sanjay Israni2, Jamie Jett2, Kristen Kadner2, Heather Kimball2, Arthur Kobayashi2,3, Vladimer Larionov, Sun-Hee Leem, Frederick Lopez1, Yunian Lou2, Steve Lowry2, Stephanie Malfatti2,3, Diego Martinez2, Paula McCready2,3, Catherine Medina1, Jenna Morgan2, Kathryn Nelson2,4, Matt Nolan2, Ivan Ovcharenko2,3, Sam Pitluck2, Martin Pollard2, Anthony P. Popkie5, Paul Predki2, Glenda Quan2,3, Lucía Ramírez1, Sam Rash2, James Retterer1, Alex Rodriguez1, Stephanine Rogers1, Asaf Salamov2, Angelica Salazar1, Xinwei She5, Doug Smith2, Tom Slezak2,3, Victor V. Solovyev2, Nina Thayer2,4, Hope Tice2, Ming Tsai1, Anna Ustaszewska2, Nu Vo1, Mark C. Wagner2,3, Jeremy Wheeler1, Kevin Wu1, Gary Xie2,4, Joan Yang1, Inna Dubchak2, Terrence S. Furey6, Pieter J. deJong7, Mark Dickson1, David Gordon8, Evan E. Eichler5, Len A. Pennacchio2, Paul G. Richardson2, Lisa Stubbs2,3, Daniel S. Rokhsar2, Richard M. Myers1, Edward M. Rubin2, Susan Lucas2
01 Apr 2004-Nature
TL;DR: Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.
Abstract: Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high G + C content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9% of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in mendelian disorders, including familial hypercholesterolaemia and insulin-resistant diabetes. Nearly one-quarter of these genes belong to tandemly arranged families, encompassing more than 25% of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.

307 citations

Journal Article
TL;DR: This work describes FDA-ARGOS, a reference database of high-quality microbial reference genomes, demonstrates its utility in two use cases, provides quality control metrics for the FDA-ARGOS genomic database resource, and outlines the need for genome quality gap filling in the public domain.
Abstract: FDA proactively invests in tools to support innovation of emerging technologies, such as infectious disease next generation sequencing (ID-NGS). Here, we introduce FDA-ARGOS quality-controlled reference genomes as a public database for diagnostic purposes and demonstrate its utility on the example of two use cases. We provide quality control metrics for the FDA-ARGOS genomic database resource and outline the need for genome quality gap filling in the public domain. In the first use case, we show more accurate microbial identification of Enterococcus avium from metagenomic samples with FDA-ARGOS reference genomes compared to non-curated GenBank genomes. In the second use case, we demonstrate the utility of FDA-ARGOS reference genomes for Ebola virus target sequence comparison as part of a composite validation strategy for ID-NGS diagnostic tests. The use of FDA-ARGOS as an in silico target sequence comparator tool combined with representative clinical testing could reduce the burden for completing ID-NGS clinical trials. To be able to use infectious disease next generation sequencing as a diagnostic tool, appropriate reference datasets are required. Here, Sichtig et al. describe FDA-ARGOS, a reference database for high-quality microbial reference genomes, and demonstrate its utility on the example of two use cases.

264 citations

01 Jan 2015
TL;DR: kSNP3.0 is a significantly improved version of kSNP v2.0, a program for SNP identification and phylogenetic analysis without genome alignment or the requirement for reference genomes.
Abstract: Summary: We announce the release of kSNP3.0, a program for SNP identification and phylogenetic analysis without genome alignment or the requirement for reference genomes. kSNP3.0 is a significantly improved version of kSNP v2. Availability and implementation: kSNP3.0 is implemented as a package of stand-alone executables for Linux and Mac OS X under the open-source BSD license. The executable packages, source code and a full User Guide are freely available at https://sourceforge.net/projects/ksnp/files/

263 citations


Cited by
Journal Article
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

14,075 citations

Journal Article
TL;DR: An overview of the statistical methods, computational tools, and visual exploration modules for data input and the results obtainable in MEGA is provided.
Abstract: With its theoretical basis firmly established in molecular evolutionary and population genetics, the comparative DNA and protein sequence analysis plays a central role in reconstructing the evolutionary histories of species and multigene families, estimating rates of molecular evolution, and inferring the nature and extent of selective forces shaping the evolution of genes and genomes. The scope of these investigations has now expanded greatly owing to the development of high-throughput sequencing techniques and novel statistical and computational methods. These methods require easy-to-use computer programs. One such effort has been to produce Molecular Evolutionary Genetics Analysis (MEGA) software, with its focus on facilitating the exploration and analysis of the DNA and protein sequence variation from an evolutionary perspective. Currently in its third major release, MEGA3 contains facilities for automatic and manual sequence alignment, web-based mining of databases, inference of the phylogenetic trees, estimation of evolutionary distances and testing evolutionary hypotheses. This paper provides an overview of the statistical methods, computational tools, and visual exploration modules for data input and the results obtainable in MEGA.

12,124 citations

Journal Article
J. Craig Venter1, Mark Raymond Adams1, Eugene W. Myers1, Peter W. Li1, +269 more. Institutions (12)
16 Feb 2001-Science
TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.
Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

12,098 citations

Journal Article
14 Jan 2005-Cell
TL;DR: In a four-genome analysis of 3' UTRs, approximately 13,000 regulatory relationships were detected above the estimate of false-positive predictions, thereby implicating as miRNA targets more than 5300 human genes, which represented 30% of the gene set.

11,624 citations

01 Jun 2012
TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly that improves on the recently released E+V-SC assembler (specialized for single-cell data) and on the popular assemblers Velvet and SoapDeNovo (for multicell data).
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

10,124 citations