False Negatives Are a Significant Feature of Next Generation Sequencing Callsets
Citations
The presence and impact of reference bias on population genomic studies of prehistoric human populations.
Ultrarare variants drive substantial cis heritability of human gene expression.
No Evidence for Recent Selection at FOXP2 among Diverse Human Populations
Advances and Trends in Omics Technology Development
References
Fast and accurate short read alignment with Burrows–Wheeler transform
The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data
A global reference for human genetic variation.
A framework for variation discovery and genotyping using next-generation DNA sequencing data
Sequence and organization of the human mitochondrial genome
Related Papers (5)
A remark on copy number variation detection methods.
Frequently Asked Questions (12)
Q2. What is the effect of such pipelines?
Such pipelines tend to optimize filtering out false positive variants, which are highly prevalent in raw 2nd generation sequencing data (DePristo et al.
Q3. How can the authors model a mtDNA variant as being inherited identically by descent?
In the absence of recombination, any given contiguous sequence of nucleotides can be modeled as being inherited identically by descent (IBD) by creating a phylogenetic tree of shared and derived mutations.
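The idea can be sketched in a few lines: once a sample is assigned to a branch of the tree, every mutation on the path from that branch back to the root is predicted to be present, and any predicted variant absent from the callset is a candidate false negative. This is a minimal illustration, not the authors' implementation. The tree, haplogroup labels, and mutation names below are hypothetical, and the sketch assumes no back mutations or recurrent mutations.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy tree: haplogroup -> (parent haplogroup, mutations defining that branch).
# Labels and mutation names are invented for illustration only.
Tree = Dict[str, Tuple[Optional[str], Set[str]]]

TOY_TREE: Tree = {
    "root": (None, set()),
    "H1": ("root", {"100A", "250T"}),
    "H1a": ("H1", {"900G"}),
    "H1a2": ("H1a", {"1500C", "4200T"}),
}


def predicted_variants(tree: Tree, haplogroup: str) -> Set[str]:
    """Collect every mutation on the path from the given haplogroup to the root."""
    variants: Set[str] = set()
    node: Optional[str] = haplogroup
    while node is not None:
        parent, mutations = tree[node]
        variants |= mutations
        node = parent
    return variants


def candidate_false_negatives(tree: Tree, haplogroup: str, called: Set[str]) -> Set[str]:
    """Variants predicted by lineage but missing from the sample's callset."""
    return predicted_variants(tree, haplogroup) - called


# A sample assigned to H1a2 whose callset lacks two of the five predicted variants.
called_in_sample = {"100A", "900G", "4200T"}
missing = candidate_false_negatives(TOY_TREE, "H1a2", called_in_sample)
```

Here `missing` holds the two lineage-predicted variants that the callset did not report.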
Q4. How did the authors compute the depth of coverage for each base pair location in each sample?
To compute the depth of coverage for each base pair location in each sample in their Illumina data, the authors used GATK’s DepthOfCoverage (McKenna et al. 2010).
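For readers unfamiliar with the quantity, depth of coverage at a position is simply the number of aligned reads overlapping it. The sketch below computes it from read intervals with a difference array. It is not GATK's DepthOfCoverage and does not reproduce its filters; the function name and the 1-based inclusive interval convention are assumptions made for this example, and it ignores clipping, gaps within reads, mapping quality, and the circularity of the mitochondrial genome.

```python
from typing import List, Tuple


def depth_per_base(reads: List[Tuple[int, int]], ref_length: int) -> List[int]:
    """Return depth at positions 1..ref_length for reads given as (start, end), 1-based inclusive."""
    # diff[p] holds the change in depth when moving onto position p.
    diff = [0] * (ref_length + 2)
    for start, end in reads:
        diff[start] += 1      # a read begins covering at its start
        diff[end + 1] -= 1    # and stops covering just after its end
    depth: List[int] = []
    running = 0
    for pos in range(1, ref_length + 1):
        running += diff[pos]
        depth.append(running)
    return depth


# Three toy reads on a 6 bp reference.
toy_depth = depth_per_base([(1, 3), (2, 5), (5, 5)], ref_length=6)
```

The difference array keeps the work linear in the number of reads plus the reference length, instead of incrementing every covered position for every read.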
Q5. How many variants were excluded from the mother’s sequence?
For 9 of the remaining candidate mutations, the variants in the mother’s sequence were predicted to be present based on the mother’s phylogenetic lineage, so the corresponding candidate mutations were excluded.
Q6. How can false negatives be evaluated in haploid systems?
While PhyloFaN can be used to systematically explore the effect of pipeline parameters on the false negative rate in haploid systems, it is an imperfect proxy for assaying autosomal data.
Q7. How many variants were missing from the NGS dataset?
In the Complete Genomics dataset, their algorithm estimates that 2,313 out of 11,429 predicted variants were missing from the NGS variant callset.
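As a quick check of what those counts imply, the false negative rate is the number of missing variants divided by the number of predicted variants. The function name below is chosen for this example only.

```python
def false_negative_rate(missing: int, predicted: int) -> float:
    """Fraction of lineage-predicted variants that are absent from the callset."""
    if predicted <= 0:
        raise ValueError("predicted must be positive")
    return missing / predicted


# Counts reported for the Complete Genomics dataset.
cg_rate = false_negative_rate(2313, 11429)
print(f"{cg_rate:.1%}")  # prints 20.2%
```

In other words, roughly one in five predicted variants was not called.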
Q8. What is the importance of a balanced assessment of false positive and false negative error rates?
A balanced assessment of both false positive and false negative error rates is necessary for Mendelian and complex disease identification approaches, but also crucial for evolutionary studies of mutation rates (Ségurel et al. 2014).
Q9. What is the reason for the high number of candidate de novo mutations in the human genome?
There is often a high number of candidate de novo mutations identified in trio/duo, but most candidates are a result of either a false positive in the offspring or a false negative in a parent (Girard et al.
Q10. Why did the authors exclude indels from the analysis?
For consistency, the authors excluded indels from this analysis so the autosomal and mitochondrial false negative rates could be compared.
Q11. How does the logit model predict the probability of false negative status?
A logit model with these parameters predicts that an increase in coverage from 2,000 to 3,000 reads leads to a decrease in the probability of false negative status from 17.3% to 15.8%.
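To see how small that effect is on the log-odds scale, the two reported probabilities can be converted back to logits. The sketch below assumes, purely for illustration, that coverage enters the model as a single linear term and that every other predictor is held fixed; the fitted coefficients of the authors' model are not given here, so the slope and intercept are back-calculated from the two reported points and should not be read as the published parameters.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-x))


# The two points reported in the text: (coverage in reads, probability of false negative).
cov_lo, p_lo = 2000, 0.173
cov_hi, p_hi = 3000, 0.158

# Assumed one-predictor model: logit(p) = intercept + slope * coverage.
slope = (logit(p_hi) - logit(p_lo)) / (cov_hi - cov_lo)
intercept = logit(p_lo) - slope * cov_lo


def predicted_fn_probability(coverage: float) -> float:
    """False negative probability under the two-point illustrative fit."""
    return inv_logit(intercept + slope * coverage)


# Change in log-odds, and the odds ratio, per additional 1,000 reads.
log_odds_per_1000 = slope * 1000
odds_ratio_per_1000 = math.exp(log_odds_per_1000)
```

Under these assumptions the log-odds fall by about 0.11 per additional 1,000 reads, an odds ratio of about 0.90, so even a 50% increase in coverage lowers the predicted false negative probability by only 1.5 percentage points.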
Q12. What assumption was made about mtDNA alleles when the callset was prepared?
During preparation of the callset, it was assumed that for any given locus the mtDNA has only one allele in a particular individual and heterozygous sites were removed.