Open Access Journal Article

cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate

TLDR
‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data, outperformed its five competitors in terms of precision (1–FDR) and recall for both gains and losses in all benchmark datasets.
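The TL;DR reports precision as 1–FDR. The short sketch below shows how the two metrics relate, computed from counts of true-positive (TP), false-positive (FP) and false-negative (FN) CNV calls. The function name `precision_recall` and the example counts are hypothetical. How a call is matched to a true CNV (for example, an overlap criterion) is not specified here and would have to be defined separately.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for a set of CNV calls.

    precision = TP / (TP + FP) = 1 - FDR, where FDR = FP / (TP + FP)
    recall    = TP / (TP + FN)
    Returns 0.0 when a denominator is zero (a convention of this sketch;
    FDR is undefined when nothing is called).
    """
    called = true_pos + false_pos   # all detections made
    actual = true_pos + false_neg   # all CNVs that truly exist
    precision = true_pos / called if called else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for illustration only: 90 correct calls,
# 10 false calls, 30 missed CNVs.
p, r = precision_recall(90, 10, 30)
print(f"precision={p:.2f} FDR={1 - p:.2f} recall={r:.2f}")
# prints: precision=0.90 FDR=0.10 recall=0.75
```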
Abstract
Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, which decreases the discovery power of the study after correction for multiple testing. We propose ‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS models the depth of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate makes it possible to reduce the FDR by filtering out high-noise detections, which are likely to be false. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, and (iv) high-coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1–FDR) and recall for both gains and losses in all benchmark datasets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
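The core idea of the abstract can be illustrated with a small expectation-maximization (EM) loop. The read counts of one genomic segment across samples are modeled as a mixture of Poissons, and each mixture component corresponds to an integer copy number. This is a simplified sketch under stated assumptions. It is not the cn.MOPS implementation or the R package's API. The copy-number classes CN0 to CN4, the small CN0 rate factor `EPS_CN0`, the CN2 pseudo-count `prior_cn2` and the posterior-threshold filter `min_posterior` are all hypothetical choices. Component k has rate (k/2)·λ, where λ is the expected count of a diploid (CN2) sample. The mixture weights therefore capture copy-number differences between samples, while the Poisson spread captures noise.

```python
import math

# Hypothetical copy-number classes; class k scales the CN2 rate by k/2.
CLASSES = [0, 1, 2, 3, 4]
EPS_CN0 = 0.025  # assumption: keeps the CN0 Poisson rate strictly positive


def fold(k):
    """Expected read-count factor of copy number k relative to CN2."""
    return EPS_CN0 if k == 0 else k / 2.0


def log_poisson(x, rate):
    """Log of the Poisson pmf P(X = x) for the given rate."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def fit_segment(counts, prior_cn2=1.0, n_iter=100):
    """EM for a Poisson mixture over samples at one segment.

    counts: read counts of the same segment in N samples.
    prior_cn2: hypothetical pseudo-count pulling weight toward CN2.
    Returns (mixture weights alpha, CN2 rate lam, per-sample posteriors).
    """
    n, K, cn2 = len(counts), len(CLASSES), CLASSES.index(2)
    alpha = [0.1 / (K - 1)] * K   # start with most weight on CN2
    alpha[cn2] = 0.9
    lam = max(sum(counts) / n, 1e-6)
    post = []
    for _ in range(n_iter):
        # E-step: posterior of each copy-number class for each sample,
        # normalized with the log-sum-exp trick for numerical stability.
        post = []
        for x in counts:
            logs = [math.log(a) + log_poisson(x, fold(k) * lam)
                    for a, k in zip(alpha, CLASSES)]
            m = max(logs)
            w = [math.exp(v - m) for v in logs]
            s = sum(w)
            post.append([v / s for v in w])
        # M-step (weights): posterior mass per class plus the CN2
        # pseudo-count, floored so log(alpha) stays finite, then normalized.
        totals = [sum(p[j] for p in post) for j in range(K)]
        totals[cn2] += prior_cn2
        z = sum(totals)
        alpha = [max(t / z, 1e-12) for t in totals]
        s = sum(alpha)
        alpha = [a / s for a in alpha]
        # M-step (rate): closed form lam = sum_i x_i / sum_i sum_k r_ik*fold(k).
        denom = sum(sum(p[j] * fold(k) for j, k in enumerate(CLASSES))
                    for p in post)
        lam = max(sum(counts) / denom, 1e-6)
    return alpha, lam, post


def call_copy_numbers(counts, min_posterior=0.9, **kwargs):
    """Per-sample most probable copy number. Non-CN2 calls whose posterior
    is below min_posterior revert to CN2 (a hypothetical noise filter,
    not cn.MOPS's own criterion)."""
    _, lam, post = fit_segment(counts, **kwargs)
    calls = []
    for p in post:
        j = max(range(len(CLASSES)), key=lambda i: p[i])
        k = CLASSES[j]
        if k != 2 and p[j] < min_posterior:
            k = 2
        calls.append(k)
    return calls, lam


# Made-up counts for one segment in ten samples: one loss, one gain.
counts = [100, 98, 105, 97, 102, 99, 51, 200, 103, 96]
calls, lam = call_copy_numbers(counts)
print(calls)  # expected: [2, 2, 2, 2, 2, 2, 1, 4, 2, 2]
```

Because λ is shared by all samples at a segment, coverage that is uniformly low or high in every sample shifts λ instead of producing calls. This mirrors the abstract's point that modeling across samples avoids relying on coverage changes along the chromosome.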

