Open Access | Journal Article

A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data

Heng Li
- 01 Nov 2011
- Vol. 27, Iss: 21, pp 2987-2993
Abstract

Motivation: Most existing methods for DNA sequence analysis rely on accurate sequences or genotypes. However, in applications of next-generation sequencing (NGS), accurate genotypes may not be easily obtained (e.g. multi-sample low-coverage sequencing or somatic mutation discovery). These applications press for the development of new methods for analyzing sequence data with uncertainty.

Results: We present a statistical framework for calling SNPs, discovering somatic mutations, inferring population genetical parameters and performing association tests directly based on sequencing data without explicit genotyping or linkage-based imputation. On real data, we demonstrate that our method achieves comparable accuracy to alternative methods for estimating site allele count, for inferring allele frequency spectrum and for association mapping. We also highlight the necessity of using symmetric datasets for finding somatic mutations and confirm that for discovering rare events, mismapping is frequently the leading source of errors.

Availability: http://samtools.sourceforge.net
Contact: hengli@broadinstitute.org
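The abstract states that the framework works "directly based on sequencing data without explicit genotyping". As a hedged illustration of what that can look like, the sketch below computes per-sample genotype likelihoods under a simplified biallelic model and then combines samples with a dynamic program to get the likelihood of the total allele count at a site, without ever calling an individual genotype. Everything here is an assumption for illustration: the function names are hypothetical, g counts reference alleles, bases are treated as independent, and a sequencing error is assumed to always turn one allele into the other. This is not SAMtools/BCFtools code and may differ from the paper's exact formulation.

```python
from math import comb, prod


def genotype_likelihoods(is_ref, errors, ploidy=2):
    """Per-sample likelihoods L(g) for g = 0..ploidy reference alleles.

    Simplified model (an assumption for illustration): read bases are
    independent, each base comes from one of the `ploidy` chromosome
    copies chosen uniformly, and a sequencing error with probability
    errors[j] always turns the true allele into the other allele.
    """
    m = ploidy
    liks = []
    for g in range(m + 1):
        terms = []
        for ref, e in zip(is_ref, errors):
            if ref:
                # Observed reference base: a correct read of one of the g
                # reference copies, or an error on one of the m - g alt copies.
                terms.append(g * (1 - e) + (m - g) * e)
            else:
                # Observed alternate base: the mirror image of the above.
                terms.append((m - g) * (1 - e) + g * e)
        # Each base picks a copy with probability 1/m, hence the m ** -k factor.
        liks.append(prod(terms) / m ** len(terms))
    return liks


def allele_count_likelihoods(per_sample, ploidies):
    """P(data | K) for K = 0..M reference alleles summed over all samples.

    Assumes that, given K, the K reference alleles are spread uniformly at
    random over the M = sum(ploidies) chromosomes, so that
        P(data | K) = sum over (g_1..g_n) with sum g_i = K of
                      prod_i C(m_i, g_i) * L_i(g_i) / C(M, K).
    The sum is accumulated one sample at a time instead of enumerated.
    """
    z = [1.0]  # z[l]: weighted sum over configurations with l ref alleles
    for liks, m in zip(per_sample, ploidies):
        nxt = [0.0] * (len(z) + m)
        for l, zl in enumerate(z):
            for g in range(m + 1):
                nxt[l + g] += zl * comb(m, g) * liks[g]
        z = nxt
    total = sum(ploidies)
    return [z[k] / comb(total, k) for k in range(total + 1)]


# Usage: two diploid samples, one with three reference-matching bases and
# one with a reference and an alternate base.
samples = [
    genotype_likelihoods([True, True, True], [0.01, 0.01, 0.01]),
    genotype_likelihoods([True, False], [0.01, 0.02]),
]
site = allele_count_likelihoods(samples, [2, 2])
```

The dynamic program costs O(M^2) per site for M chromosomes in this naive form, whereas enumerating every genotype combination grows exponentially with the number of samples. The resulting vector of site allele-count likelihoods can then feed, for example, a prior over allele frequencies to estimate the allele count or the allele frequency spectrum.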

