Author

Frank Korzeniewski

Bio: Frank Korzeniewski is an academic researcher from the Joint Genome Institute. The author has contributed to research in topics: Metagenomics & Sequence assembly. The author has an h-index of 7 and has co-authored 8 publications receiving 1740 citations. Previous affiliations of Frank Korzeniewski include United States Department of Energy & Lawrence Berkeley National Laboratory.
Topics: Metagenomics, Sequence assembly, Phrap, IMG, Genomics

Papers
Journal ArticleDOI
TL;DR: MycoCosm is a fungal genomics portal developed by the US Department of Energy Joint Genome Institute to support integration, analysis and dissemination of fungal genome sequences and other 'omics' data by providing interactive web-based tools.
Abstract: MycoCosm is a fungal genomics portal (http://jgi.doe.gov/fungi), developed by the US Department of Energy Joint Genome Institute to support integration, analysis and dissemination of fungal genome sequences and other 'omics' data by providing interactive web-based tools. MycoCosm also promotes and facilitates user community participation through the nomination of new species of fungi for sequencing, and the annotation and analysis of resulting data. By efficiently filling gaps in the Fungal Tree of Life, MycoCosm will help address important problems associated with energy and the environment, taking advantage of growing fungal genomics resources.

1,037 citations

Journal ArticleDOI
TL;DR: The integrated microbial genomes system is a new data management and analysis platform for microbial genomes provided by the Joint Genome Institute that contains both draft and complete JGI genomes integrated with other publicly available microbial genomes of all three domains of life.
Abstract: The integrated microbial genomes (IMG) system is a new data management and analysis platform for microbial genomes provided by the Joint Genome Institute (JGI). IMG contains both draft and complete JGI genomes integrated with other publicly available microbial genomes of all three domains of life. IMG provides tools and viewers for analyzing genomes, genes and functions, individually or in a comparative context. IMG allows users to focus their analysis on subsets of genes and genomes of interest and to save the results of their analysis. IMG is available at http://img.jgi.doe.gov.

413 citations

Journal ArticleDOI
TL;DR: Three simulated data sets, designed to model real metagenomes in terms of complexity and phylogenetic composition, were used to explore the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes.
Abstract: Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (BLAST hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.

354 citations
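
The abstract above describes building simulated data sets by combining reads randomly selected from isolate genomes. A minimal sketch of that sampling idea follows, assuming error-free, fixed-length reads and a per-genome abundance weight. The function name `sample_reads` and its parameters are hypothetical and are not taken from the paper, which does not describe its sampling procedure at this level of detail.

```python
import random


def sample_reads(genomes, abundances, n_reads, read_len, seed=0):
    """Draw fixed-length reads from isolate genomes (toy model, no sequencing errors).

    genomes:    dict mapping genome name -> sequence string
    abundances: dict mapping genome name -> relative weight (assumed, illustrative)
    Returns a list of (source_genome_name, read_sequence) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the simulated set is reproducible
    names = list(genomes)
    weights = [abundances[name] for name in names]
    reads = []
    for _ in range(n_reads):
        # Pick the source genome in proportion to its assumed abundance.
        name = rng.choices(names, weights=weights, k=1)[0]
        seq = genomes[name]
        # Pick a uniformly random start position that leaves room for a full read.
        start = rng.randrange(len(seq) - read_len + 1)
        reads.append((name, seq[start:start + read_len]))
    return reads
```

Because each read keeps the name of its source genome, downstream assembly, gene prediction, or binning results could in principle be scored against the known origin, which is the kind of comparison to isolate genomes the abstract refers to.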

Journal ArticleDOI
10 Jul 2006
TL;DR: IMG/M is an experimental metagenome data management and analysis system based on the Integrated Microbial Genomes (IMG) system; it provides tools and viewers for analyzing both metagenomes and isolate genomes individually or in a comparative context.
Abstract: The application of shotgun sequencing to environmental samples has revealed a new universe of microbial community genomes (metagenomes) involving previously uncultured organisms. Metagenome analysis, which is expected to provide a comprehensive picture of the gene functions and metabolic capacity for microbial communities, needs to be conducted in the context of a comprehensive data management and analysis system. We present in this paper IMG/M, an experimental metagenome data management and analysis system that is based on the Integrated Microbial Genomes (IMG) system. IMG/M provides tools and viewers for analyzing both metagenomes and isolate genomes individually or in a comparative context. IMG/M is available at. Contact: vmmarkowitz@lbl.gov

93 citations

01 Mar 2006
TL;DR: IMG/M is presented, an experimental metagenome data management and analysis system that is based on the Integrated Microbial Genomes (IMG) system and provides tools and viewers for analyzing both metagenomes and isolate genomes individually or in a comparative context.
Abstract: An Experimental Metagenome Data Management and Analysis System
Victor M. Markowitz¹, Natalia N. Ivanova², Krishna Palaniappan¹, Ernest Szeto¹, Frank Korzeniewski¹, Nikos C. Kyrpides², and Philip Hugenholtz²
¹ Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, USA
² Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, California 94598, USA
ABSTRACT
The application of shotgun sequencing to environmental samples has revealed a new universe of microbial community genomes (metagenomes) involving previously uncultured organisms. Metagenome analysis, which is expected to provide a comprehensive picture of the gene functions and metabolic capacity of microbial communities, needs to be conducted in the context of a comprehensive data management and analysis system. We present in this paper IMG/M, an experimental metagenome data management and analysis system that is based on the Integrated Microbial Genomes (IMG) system. IMG/M provides tools and viewers for analyzing both metagenomes and isolate genomes individually or in a comparative context.
1. INTRODUCTION
Environmental microbial community (microbiome) genome analysis (also known as metagenome analysis [17]) is expected to lead to advances in environmental cleanup, agriculture, industrial processes, and alternative energy production. Similarly, human metagenome analysis could provide new insights into the variation of microbial populations associated with the human body, ascribe qualitative and quantitative changes in human microbiota as risk/causative factors of disease and develop new treatment strategies [9]. The application of shotgun sequencing to microbiome samples has enabled the study of metagenomes involving previously uncultured and unculturable organisms.
Comparative analysis of the metagenomes in the context of available reference isolate genomes could potentially reveal large-scale patterns of biochemical interactions and habitat-specific correlations in the host environment that might otherwise be missed [6]. Studies of environmental microbiomes, such as acid mine drainage biofilms [21] and Sargasso Sea samples [22], as well as studies of human microbiomes, such as the human gut microbiome [9], are examples of a rapidly expanding area of metagenome analysis applications. Unlike microbial genome data from isolate organisms, the generation and interpretation of metagenome data are in the early stages of development. Metagenomes sequenced by organizations such as the Joint Genome Institute (JGI), TIGR, and the Venter Institute follow an assembly and annotation process that is specific to each sequencing center. Although traditional assembly and annotation algorithms do not perform as well on metagenome sequences as they do on isolate microbial genomes (see [4] for an overview of microbiome sequence assembly and gene prediction problems), they yield data that are amenable to valuable comparative analysis and interpretation as illustrated by the studies published in [20] and [21]. Thus, the metagenomes of simple microbiomes can be assembled into sizable scaffolds and for highly abundant organisms the quality of the assembly and annotation may approach that of draft isolate genomes. For such metagenomes, it is possible to infer the metabolic capabilities of dominant organisms and identify the key member organisms that perform community-essential tasks. Although metagenomic sequence data processing poses numerous challenges due to the complex nature and inherent incompleteness of the data, and the lack of methods designed specifically for processing such data, successful analysis can be carried out on existing metagenomic data.
As initial methods are improved or new methods emerge, metagenome data sets will be revised, thus leading to better quality data and annotations. However, metagenome data analysis needs to be conducted in the context of a comprehensive data management and analysis system that provides support for data review and revision. We have addressed this need by developing an experimental metagenome data management and analysis system, IMG/M, based on the Integrated Microbial Genomes (IMG) system [12]. Like IMG, IMG/M is based on the principle that integration of available genomic data is essential for understanding the biology of newly sequenced genomes, as the efficiency of genome analysis increases substantially when it is conducted in a comparative context. Such an integrated context is even more critical for analyzing the inherently incomplete metagenome data. IMG/M has been successfully used for the study of enhanced biological phosphorus removing (EBPR) sludge communities [13], and is currently used for analyzing several metagenomes sequenced at JGI. In the following sections, we first discuss the main metagenome data processing challenges. Next, we

88 citations


Cited by
Journal ArticleDOI
TL;DR: Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences that achieves classification accuracy comparable to the fastest BLAST program.
Abstract: Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at http://ccb.jhu.edu/software/kraken/.

3,317 citations
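
The Kraken abstract above attributes its speed to exact matching of k-mers. The toy sketch below illustrates only that general idea: index every k-mer of a few reference sequences, then label a read by the taxon that most of its k-mers hit exactly. It is not Kraken's implementation. The function names, the majority-vote rule, and the handling of shared k-mers (simply ignored here) are simplifying assumptions made for illustration.

```python
from collections import Counter


def build_kmer_index(references, k):
    """Map each k-mer to the single taxon it occurs in.

    references: dict mapping taxon label -> reference sequence string.
    K-mers found in more than one taxon are stored as None (treated as
    uninformative in this toy version).
    """
    index = {}
    for taxon, seq in references.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in index and index[kmer] != taxon:
                index[kmer] = None  # shared between taxa: mark ambiguous
            else:
                index[kmer] = taxon
    return index


def classify_read(read, index, k):
    """Return the taxon with the most exact k-mer hits, or None if there are none."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        taxon = index.get(read[i:i + k])  # exact dictionary lookup, no alignment
        if taxon is not None:
            votes[taxon] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Each k-mer costs one hash lookup rather than an alignment, which gives a rough intuition for why exact k-mer matching can be much faster than alignment-based search on large read sets.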

Journal ArticleDOI
TL;DR: This study illustrates how combining comparative metagenomics with gnotobiotic mouse models and specific dietary manipulations can disclose the niches of previously uncharacterized members of the gut microbiota.

2,578 citations

Journal ArticleDOI
TL;DR: metaSPAdes addresses various challenges of metagenomic assembly by capitalizing on computational ideas that proved to be useful in assemblies of single cells and highly polymorphic diploid genomes.
Abstract: While metagenomics has emerged as a technology of choice for analyzing bacterial populations, the assembly of metagenomic data remains challenging, thus stifling biological discoveries. Moreover, recent studies revealed that complex bacterial populations may be composed from dozens of related strains, thus further amplifying the challenge of metagenomic assembly. metaSPAdes addresses various challenges of metagenomic assembly by capitalizing on computational ideas that proved to be useful in assemblies of single cells and highly polymorphic diploid genomes. We benchmark metaSPAdes against other state-of-the-art metagenome assemblers and demonstrate that it results in high-quality assemblies across diverse data sets.

2,295 citations

Journal ArticleDOI
TL;DR: The RAST tool kit (RASTtk) is a modular version of RAST that enables researchers to build custom annotation pipelines; it offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job.
Abstract: The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

1,666 citations

Journal ArticleDOI
TL;DR: This work presents an approach that uses clade-specific marker genes to unambiguously assign reads to microbial clades more accurately and >50× faster than current approaches, and validated the metagenomic phylogenetic analysis tool, MetaPhlAn, on terabases of short reads.
Abstract: Metagenomic shotgun sequencing data can identify microbes populating a microbial community and their proportions, but existing taxonomic profiling methods are inefficient for increasingly large data sets. We present an approach that uses clade-specific marker genes to unambiguously assign reads to microbial clades more accurately and >50× faster than current approaches. We validated our metagenomic phylogenetic analysis tool, MetaPhlAn, on terabases of short reads and provide the largest metagenomic profiling to date of the human gut. It can be accessed at http://huttenhower.sph.harvard.edu/metaphlan/.

1,566 citations