SweeD: Likelihood-Based Detection of Selective Sweeps in Thousands of Genomes

doi:10.1093/MOLBEV/MST112

Open Access Journal Article

Pavlos Pavlidis, +3 more

01 Sep 2013

Molecular Biology and Evolution

Vol. 30, Iss. 9, pp. 2224-2234

TLDR

Increasing the sample size results in more precise detection of positive selection, so the ability to analyze substantially larger samples with SweeD leads to more accurate sweep detection.

Abstract:

The advent of modern DNA sequencing technology is the driving force in obtaining complete intraspecific genomes that can be used to detect loci that have been subject to positive selection in the recent past. Based on selective sweep theory, beneficial loci can be detected by examining the single nucleotide polymorphism patterns in intraspecific genome alignments. In the last decade, a plethora of algorithms for identifying selective sweeps have been developed. However, the majority of these algorithms have not been designed for analyzing whole-genome data. We present SweeD (Sweep Detector), an open-source tool for the rapid detection of selective sweeps in whole genomes. It analyzes site frequency spectra and represents a substantial extension of the widely used SweepFinder program. The sequential version of SweeD is up to 22 times faster than SweepFinder and, more importantly, is able to analyze thousands of sequences. We also provide a parallel implementation of SweeD for multi-core processors. Furthermore, we implemented a checkpointing mechanism that allows SweeD to be deployed on cluster systems with queue execution time restrictions, and long-running analyses to be resumed after processor failures. In addition, the user can specify various demographic models via the command line to calculate their theoretically expected site frequency spectra. Therefore, in contrast to SweepFinder, the neutral site frequencies can optionally be calculated directly from a given demographic model. We show that an increase in sample size results in more precise detection of positive selection. Thus, the ability to analyze substantially larger sample sizes by using SweeD leads to more accurate sweep detection. We validate SweeD via simulations and by scanning the first chromosome from the human 1000 Genomes Project for selective sweeps. We compare SweeD results with results from a linkage-disequilibrium-based approach and identify common outliers.
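To make the term "site frequency spectrum" concrete, the following is a minimal sketch and not SweeD's implementation. It assumes haplotypes are given as equal-length strings of `0` (ancestral) and `1` (derived), and the function names `unfolded_sfs` and `expected_neutral_sfs` are hypothetical. The second function uses the textbook expectation for a constant-size neutral population, where the number of sites with derived-allele count i is proportional to 1/i. The demographic models mentioned in the abstract would yield different expectations, which this sketch does not attempt to reproduce.

```python
from collections import Counter


def unfolded_sfs(haplotypes):
    """Count segregating sites by derived-allele count.

    haplotypes: list of equal-length strings over {'0', '1'},
    where '1' marks the derived allele (an assumption of this sketch).
    Returns a list whose entry i-1 is the number of sites at which
    exactly i of the n sequences carry the derived allele (0 < i < n).
    """
    n = len(haplotypes)
    counts = Counter()
    for site in zip(*haplotypes):  # iterate over alignment columns
        k = sum(1 for allele in site if allele == "1")
        if 0 < k < n:  # keep only polymorphic sites
            counts[k] += 1
    return [counts.get(i, 0) for i in range(1, n)]


def expected_neutral_sfs(n):
    """Relative expected SFS under a constant-size neutral model.

    The expected number of sites with derived count i is proportional
    to 1/i; the result is normalized to sum to 1.
    """
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy alignment: 4 sequences, 4 sites (illustrative data only).
sample = ["0010", "0110", "0111", "0100"]
observed = unfolded_sfs(sample)
expected = expected_neutral_sfs(len(sample))
```

A sweep scan in the spirit of the abstract compares the local spectrum along the genome against a neutral reference of this kind, whether that reference is estimated from the data or derived from a model.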
