cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate
Günter Klambauer, Karin Schwarzbauer, Andreas Mayr, Djork-Arné Clevert, Andreas Mitterecker, Ulrich Bodenhofer, Sepp Hochreiter
TLDR
‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data, outperformed its five competitors in terms of precision (1–FDR) and recall for both gains and losses in all benchmark data sets.
Abstract:
Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose ‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1–FDR) and recall for both gains and losses in all benchmark data sets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
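To make the core idea concrete, the following is a minimal Python sketch of a mixture of Poissons fitted across samples at a single genomic segment. It is an illustration under stated assumptions, not the cn.MOPS implementation (which is distributed as an R package): the function names, the candidate copy numbers 0 to 4, the small rate multiplier `eps` for copy number 0, the prior weight favouring copy number 2, and the plain EM updates are all choices made for this example. Each mixture component stands for one integer copy number, and the Poisson distribution around each component's rate absorbs the read count noise.

```python
import math


def poisson_logpmf(x, rate):
    """Log-probability of observing count x under a Poisson with the given rate."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def fit_poisson_mixture(counts, copy_numbers=(0, 1, 2, 3, 4),
                        prior_weight=1.0, eps=0.05, n_iter=50):
    """Fit a Poisson mixture to read counts of several samples at one segment.

    Returns (lam, alpha, calls): the expected read count for copy number 2,
    the mixture weights, and the most probable copy number per sample.
    All parameter defaults are assumptions made for this sketch.
    """
    n = len(counts)
    n_comp = len(copy_numbers)
    # Rate multiplier per component: copy number / 2, so copy number 2 maps to lam.
    # Copy number 0 gets a small positive multiplier (eps) to avoid a zero rate.
    scale = [max(cn, eps) / 2.0 for cn in copy_numbers]
    # Pseudo-counts from a Dirichlet-style prior that favours copy number 2.
    pseudo = [prior_weight * n if cn == 2 else 0.0 for cn in copy_numbers]

    # Initialise lam with the median count, assuming most samples have copy number 2.
    ordered = sorted(counts)
    mid = n // 2
    lam = ordered[mid] if n % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])
    lam = max(lam, 1e-3)
    alpha = [1.0 / n_comp] * n_comp

    post = []
    for _ in range(n_iter):
        # E-step: posterior probability of each copy number for each sample,
        # normalised in log space for numerical stability.
        post = []
        for x in counts:
            logp = [math.log(max(a, 1e-300)) + poisson_logpmf(x, s * lam)
                    for a, s in zip(alpha, scale)]
            top = max(logp)
            weights = [math.exp(v - top) for v in logp]
            total = sum(weights)
            post.append([w / total for w in weights])

        # M-step: update mixture weights (with prior pseudo-counts) and lam.
        resp = [sum(p[i] for p in post) for i in range(n_comp)]
        alpha = [(r + g) / (n + sum(pseudo)) for r, g in zip(resp, pseudo)]
        expected_scale = sum(p[i] * scale[i] for p in post for i in range(n_comp))
        lam = max(sum(counts) / expected_scale, 1e-3)

    # Call the copy number with the highest posterior for each sample.
    calls = [copy_numbers[max(range(n_comp), key=p.__getitem__)] for p in post]
    return lam, alpha, calls


# Hypothetical read counts for ten samples at one segment.
counts = [100, 98, 105, 95, 102, 99, 101, 150, 52, 97]
lam, alpha, calls = fit_poisson_mixture(counts)
print(calls)
```

Because the model is fitted across samples at a fixed position, a segment that is uniformly over- or under-covered in every sample only shifts `lam` and does not produce a call, which mirrors the abstract's point that read count variation along chromosomes does not affect the result. The sketch omits the noise-based filtering of detections described in the abstract.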