Showing papers by "Zhong Wang" published in 2017

Open Access

Journal Article

Critical Assessment of Metagenome Interpretation - A benchmark of metagenomics software

Alexander Sczyrba¹, Peter Hofmann²,³, Peter Belmann, David Koslicki⁴, Stefan Janssen⁵, Johannes Dröge²,³, Ivan Gregor²,³, Stephan Majda³, Jessika Fiedler³, Eik Dahms²,³, Andreas Bremges, Adrian Fritz², Ruben Garrido-Oter, Tue Sparholt Jørgensen⁶,⁷,⁸, Nicole Shapiro⁹, Philip D. Blood¹⁰, Alexey Gurevich¹¹, Yang Bai¹², Dmitrij Turaev¹³, Matthew Z. DeMaere¹⁴, Rayan Chikhi¹⁵, Niranjan Nagarajan¹⁶, Christopher Quince¹⁷, Fernando Meyer², Monika Balvočiūtė¹⁸, Lars Hestbjerg Hansen⁷, Søren J. Sørensen⁶, Burton Kuan Hui Chia¹⁶, Bertrand Denis¹⁶, Jeff Froula⁹, Zhong Wang⁹, Robert Egan⁹, Dongwan Don Kang⁹, Jeffrey J. Cook¹⁹, Charles Deltel²⁰, Michael Beckstette, Claire Lemaitre²⁰, Pierre Peterlongo²⁰, Guillaume Rizk, Dominique Lavenier¹⁵, Yu Wei Wu²¹,²², Steven W. Singer²²,²³, Chirag Jain²⁴, Marc Strous²⁵, Heiner Klingenberg²⁶, Peter Meinicke²⁶, Michael D. Barton⁹, Thomas Lingner, Hsin-Hung Lin²⁷, Yu-Chieh Liao²⁷, Genivaldo G. Z. Silva²⁸, Daniel A. Cuevas²⁸, Robert Edwards²⁸, Surya Saha²⁹, Vitor C. Piro³⁰,³¹, Bernhard Y. Renard³¹, Mihai Pop³², Hans-Peter Klenk³³, Markus Göker³⁴, Nikos C. Kyrpides⁹, Tanja Woyke⁹, Julia A. Vorholt³⁵, Paul Schulze-Lefert¹², Edward M. Rubin⁹, Aaron E. Darling¹⁴, Thomas Rattei¹³, Alice C. McHardy•Institutions (35)

Bielefeld University¹, BRICS², University of Düsseldorf³, Oregon State University⁴, University of California, San Diego⁵, University of Copenhagen⁶, Aarhus University⁷, Roskilde University⁸, Joint Genome Institute⁹, Pittsburgh Supercomputing Center¹⁰, Saint Petersburg State University¹¹, Max Planck Society¹², University of Vienna¹³, University of Technology, Sydney¹⁴, Centre national de la recherche scientifique¹⁵, Genome Institute of Singapore¹⁶, University of Warwick¹⁷, University of Tübingen¹⁸, Intel¹⁹, French Institute for Research in Computer Science and Automation²⁰, Taipei Medical University²¹, Joint BioEnergy Institute²², Lawrence Berkeley National Laboratory²³, Georgia Institute of Technology²⁴, University of Calgary²⁵, University of Göttingen²⁶, National Health Research Institutes²⁷, San Diego State University²⁸, Boyce Thompson Institute for Plant Research²⁹, Coordenadoria de Aperfeiçoamento de Pessoal de Nível Superior³⁰, Robert Koch Institute³¹, University of Maryland, College Park³², Newcastle University³³, Leibniz Association³⁴, ETH Zurich³⁵

02 Oct 2017 - Nature Methods

TL;DR: The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups.

Abstract: Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.

593 citations

Posted Content

Critical Assessment of Metagenome Interpretation − a benchmark of computational metagenomics software

Alexander Sczyrba¹, Peter Hofmann², Peter Belmann¹, David Koslicki³, Stefan Janssen⁴, Johannes Dröge², Ivan Gregor², Stephan Majda², Jessika Fiedler², Eik Dahms², Andreas Bremges¹, Adrian Fritz⁵, Ruben Garrido-Oter², Tue Sparholt Jørgensen⁶, Nicole Shapiro⁷, Philip D. Blood⁸, Alexey Gurevich⁹, Yang Bai¹⁰, Dmitrij Turaev¹¹, Matthew Z. DeMaere¹², Rayan Chikhi¹³, Niranjan Nagarajan¹⁴, Christopher Quince¹⁵, Lars Hestbjerg Hansen¹⁶, Søren J. Sørensen⁶, Burton Kuan Hui Chia¹⁴, Bertrand Denis¹⁴, Jeff Froula⁷, Zhong Wang⁷, Robert Egan⁷, Dongwan Don Kang⁷, Jeffrey J. Cook¹⁷, Charles Deltel¹⁸, Michael Beckstette, Claire Lemaitre¹⁸, Pierre Peterlongo¹⁸, Guillaume Rizk, Dominique Lavenier¹³, Yu Wei Wu¹⁹, Steven W. Singer²⁰, Chirag Jain²¹, Marc Strous²², Heiner Klingenberg²³, Peter Meinicke²³, Michael D. Barton⁷, Thomas Lingner, Hsin-Hung Lin²⁴, Yu-Chieh Liao²⁴, Genivaldo G. Z. Silva²⁵, Daniel A. Cuevas²⁵, Robert Edwards²⁵, Surya Saha²⁶, Vitor C. Piro²⁷, Bernhard Y. Renard²⁷, Mihai Pop²⁸, Hans-Peter Klenk²⁹, Markus Göker³⁰, Nikos C. Kyrpides⁷, Tanja Woyke⁷, Julia A. Vorholt³¹, Paul Schulze-Lefert²¹, Edward M. Rubin⁷, Aaron E. Darling¹², Thomas Rattei¹¹, Alice C. McHardy²•Institutions (31)

Bielefeld University¹, University of Düsseldorf², Oregon State University³, University of California, Berkeley⁴, BRICS⁵, University of Copenhagen⁶, Joint Genome Institute⁷, Pittsburgh Supercomputing Center⁸, Saint Petersburg State University⁹, Chinese Academy of Sciences¹⁰, University of Vienna¹¹, University of Technology, Sydney¹², Centre national de la recherche scientifique¹³, Genome Institute of Singapore¹⁴, University of Warwick¹⁵, Aarhus University¹⁶, Intel¹⁷, French Institute for Research in Computer Science and Automation¹⁸, Taipei Medical University¹⁹, Lawrence Berkeley National Laboratory²⁰, Max Planck Society²¹, University of Calgary²², University of Göttingen²³, National Health Research Institutes²⁴, San Diego State University²⁵, Boyce Thompson Institute for Plant Research²⁶, Robert Koch Institute²⁷, University of Maryland, College Park²⁸, Newcastle University²⁹, Leibniz Association³⁰, ETH Zurich³¹

09 Jan 2017 - bioRxiv

TL;DR: Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones and representing common experimental setups.

Abstract: In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performance, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions.

59 citations

Journal Article

A case study of tuning MapReduce for efficient Bioinformatics in the cloud

Lizhen Shi¹, Zhong Wang², Weikuan Yu¹, Xiandong Meng²•Institutions (2)

Florida State University¹, Lawrence Berkeley National Laboratory²

01 Jan 2017

TL;DR: This paper presents an exemplary case for tuning MapReduce-based bioinformatics applications in the cloud and documents the key parameters that could lead to significant performance benefits.

...read moreread less

Abstract: The combination of the Hadoop MapReduce programming model and cloud computing allows biological scientists to analyze next-generation sequencing (NGS) data in a timely and cost-effective manner. Cloud computing platforms remove the burden of IT facility procurement and management from end users and provide easy access to Hadoop clusters. However, biological scientists are still expected to choose appropriate Hadoop parameters for running their jobs. More importantly, the available Hadoop tuning guidelines are either obsolete or too general to capture the particular characteristics of bioinformatics applications. In this study, we aim to minimize the cloud computing cost of bioinformatics data analysis by optimizing the significant Hadoop parameters we extracted. When MapReduce-based bioinformatics tools are used in the cloud, the default settings often lead to resource underutilization and wasteful expenses. We chose k-mer counting, a representative application used in a large number of NGS data analysis tools, as our case study. Experimental results show that, with the fine-tuned parameters, we achieve an overall 4× speedup compared with the original performance using the default settings. This paper presents an exemplary case for tuning MapReduce-based bioinformatics applications in the cloud and documents the key parameters that could lead to significant performance benefits.
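For readers unfamiliar with the case-study workload, k-mer counting maps naturally onto MapReduce: the map phase emits every length-k substring of each read, the framework groups identical keys, and the reduce phase sums them. The following is a minimal single-process sketch of that data flow in plain Python, not Hadoop code and not the implementation used in the paper; the function names are hypothetical, and it counts k-mers exactly as written (it does not merge reverse complements into canonical k-mers, as many real counters do).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_kmers(read: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map step (hypothetical): emit (k-mer, 1) for every length-k substring of a read."""
    # Reads shorter than k yield nothing because the range is empty.
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group values by key, mimicking what the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: sum the partial counts emitted for each k-mer."""
    return {kmer: sum(values) for kmer, values in groups.items()}


def count_kmers(reads: Iterable[str], k: int) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over all reads in a single process."""
    pairs = (pair for read in reads for pair in map_kmers(read, k))
    return reduce_counts(shuffle(pairs))


if __name__ == "__main__":
    # Two toy reads that share all four of their 3-mers.
    print(count_kmers(["ACGTAC", "GTACGT"], 3))
    # prints {'ACG': 2, 'CGT': 2, 'GTA': 2, 'TAC': 2}
```

In a real Hadoop job, the shuffle is performed by the framework across the network, and a combiner is commonly used to pre-aggregate counts on the map side to shrink that intermediate data; the volume of this intermediate data is one reason job configuration matters for workloads like this one.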

15 citations

Reference Entry

Transcriptomics: Next Generation Transcriptome

Nicole V. Johnson¹,², Zhong Wang¹,²,³•Institutions (3)

Lawrence Berkeley National Laboratory¹, Joint Genome Institute², University of California, Merced³

26 Oct 2017