Interrogating a High-Density SNP Map for Signatures of Natural Selection

doi:10.1101/GR.631202

Joshua M. Akey, +5 more

Genome Research, Vol. 12, Iss. 12, pp. 1805-1814 (01 Dec 2002)

TLDR

An analysis of single nucleotide polymorphisms whose allele frequencies were determined in three populations yields a first-generation natural selection map of the human genome and provides compelling evidence that selection has shaped extant patterns of human genomic variation.

Abstract:

Natural selection, which can be defined as the differential contribution of genetic variants to future generations (Aquadro et al. 2001), is the driving force of Darwinian evolution. Despite intense research, only a relatively small number of regions and genes have been directly implicated as targets of selection in the human genome (Kitano and Saitou 1999; Rana et al. 1999; Huttley et al. 2000; Hollox et al. 2001; Hull et al. 2001; Hurst and Pal 2001; Koda et al. 2001; Sullivan et al. 2001; Tishkoff et al. 2001; Baum et al. 2002; Fullerton et al. 2002; Gilad et al. 2002; Hamblin et al. 2002). A more comprehensive and genomic understanding of how and where natural selection has shaped patterns of genetic variation may provide important insights into the mechanisms of evolutionary change (Otto 2000), guide selection of loci for inclusion in population genetic studies (Vitalis et al. 2001), facilitate the annotation of functionally significant genomic regions (Nielsen 2001), and help elucidate genotype-phenotype correlations in complex diseases (Przeworski et al. 2000; Nielsen 2001). Detecting unambiguous evidence for natural selection remains challenging because the effect of selection on the distribution of genetic variation can be mimicked by population demographic history (i.e., the size, structure, and mating pattern of a population). For instance, both adaptive hitchhiking and population expansion can cause an excess of rare variants observed in DNA sequence data compared with what is expected under a standard neutral model (Tajima 1989; Przeworski et al. 2000). Despite these difficulties, the recent deluge of publicly available single nucleotide polymorphisms (SNPs) provides an exciting opportunity to identify genome-wide signatures of selection (Sunyaev et al. 2000; Fay et al. 2001; Sachidanandam et al. 2001). 
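The "excess of rare variants" signal mentioned above is commonly summarized with Tajima's D (Tajima 1989), which compares two estimates of diversity: one from the average number of pairwise differences and one from the number of segregating sites. The following is a minimal sketch for illustration, not part of the authors' analysis; the function name `tajimas_d` and its input layout (one allele count per segregating site) are assumptions made here.

```python
import math

def tajimas_d(allele_counts, n):
    """Tajima's D from per-site allele counts.

    allele_counts: for each segregating site, the number of the n sampled
    sequences that carry one of the two alleles (hypothetical input layout).
    n: number of sampled sequences. The variance term is zero for n < 4,
    so NaN is returned in that case.
    """
    s = len(allele_counts)
    if s == 0 or n < 4:
        return float("nan")

    # Diversity estimate from the average number of pairwise differences.
    pi = sum(2.0 * k * (n - k) for k in allele_counts) / (n * (n - 1))

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between the two diversity estimates, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Ten sites that are all singletons (rare variants) versus ten sites at
# intermediate frequency, in a sample of 10 sequences.
d_rare = tajimas_d([1] * 10, n=10)
d_intermediate = tajimas_d([5] * 10, n=10)
```

With these toy inputs, `d_rare` is negative and `d_intermediate` is positive. A negative value alone does not distinguish hitchhiking from population expansion, which is the ambiguity the passage describes.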
To this end, examining the variation in SNP allele frequencies between populations, which can be quantified by the statistic FST, is a promising strategy for detecting signatures of natural selection (Lewontin and Krakauer 1973; Rana et al. 1999; Hollox et al. 2001; Fullerton et al. 2002; Gilad et al. 2002; Hamblin et al. 2002). Under selective neutrality, FST is determined by genetic drift, which will affect all loci across the genome in a similar and predictable fashion. On the other hand, natural selection is a locus-specific force that can cause systematic deviations in FST values for a selected gene and nearby genetic markers. For example, geographically restricted directional selection may lead to an increase in FST of a selected locus, whereas balancing or species-wide directional selection may lead to a decrease in FST compared with neutrally evolving loci (Cavalli-Sforza 1966; Bowcock et al. 1991; Andolfatto 2001). Previous studies that have attempted to identify natural selection based on patterns of population differentiation relied on simulations to obtain the expected distribution of FST under selective neutrality (Lewontin and Krakauer 1973; Bowcock et al. 1991; Beaumont and Nichols 1996). However, the simulated distribution of FST strongly depends on the assumed population demographic history, which is rarely known with any degree of certainty. As an expanding number of SNPs are genotyped across multiple populations, a complementary approach that does not require tenuous assumptions about population demographic history is now becoming feasible. Specifically, by sampling a large number of SNPs throughout the genome, loci that have been affected by natural selection can simply be identified as outliers in the extreme tails of the empirical distribution of FST (Cavalli-Sforza 1966; Black et al. 2001; Goldstein and Chikhi 2002).
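The empirical-outlier idea can be sketched in a few lines. The example below is illustrative only and is not the estimator used in the paper. It assumes biallelic SNPs, equal weighting of populations, and the simple heterozygosity-based definition FST = (HT - HS) / HT, with no sample-size correction. The function names and the 2.5% and 97.5% tail cutoffs are likewise assumptions chosen for illustration.

```python
import numpy as np

def fst_per_snp(freqs):
    """Per-SNP FST from allele frequencies.

    freqs: array of shape (n_snps, n_pops) holding the frequency of one
    allele of each biallelic SNP in each population.
    Assumes equal population weights and no sample-size correction.
    """
    freqs = np.asarray(freqs, dtype=float)
    p_bar = freqs.mean(axis=1)
    h_t = 2.0 * p_bar * (1.0 - p_bar)                 # total expected heterozygosity
    h_s = (2.0 * freqs * (1.0 - freqs)).mean(axis=1)  # mean within-population heterozygosity
    with np.errstate(divide="ignore", invalid="ignore"):
        # SNPs that are monomorphic across all populations have h_t == 0
        # and are reported as NaN.
        return np.where(h_t > 0, (h_t - h_s) / h_t, np.nan)

def flag_tail_outliers(fst, lower_q=0.025, upper_q=0.975):
    """Flag SNPs in the tails of the empirical FST distribution.

    The quantile cutoffs are illustrative assumptions, not values from the paper.
    Returns two boolean arrays: (low-FST outliers, high-FST outliers).
    """
    lo, hi = np.nanquantile(fst, [lower_q, upper_q])
    return fst <= lo, fst >= hi

# Toy data: three SNPs, three populations.
example_freqs = [
    [0.5, 0.5, 0.5],  # identical frequencies: no differentiation
    [0.1, 0.5, 0.9],  # moderate differentiation
    [1.0, 0.0, 0.0],  # fixed difference in one population
]
example_fst = fst_per_snp(example_freqs)
low_flags, high_flags = flag_tail_outliers(example_fst)
```

Because the cutoffs come from the observed genome-wide distribution rather than from a simulated neutral model, this approach avoids committing to a specific demographic history. It does, however, always flag a fixed fraction of loci, so flagged SNPs are candidates rather than confirmed targets of selection.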
Recently, this strategy has been used to infer natural selection in the CAPN10 gene; however, the empirical distribution of FST contained <100 loci (Fullerton et al. 2002). In this work, we describe an analysis of 26,530 SNPs with allele frequencies that were determined in three populations: African-American, East Asian, and European-American. The density of this SNP allele frequency map provides a unique and powerful opportunity to interrogate the genome for signatures of natural selection. Through a variety of analyses, we have found statistically significant evidence supporting the hypothesis that selection has influenced extant patterns of human genetic variation. Furthermore, we have identified 174 candidate genes that demonstrate signatures of selection when contrasted with the empirical genome-wide distribution of FST. This analysis provides the conceptual foundation for constructing a high-resolution natural selection map, which will be an important resource in understanding the recent evolutionary history of our species and will facilitate detailed studies of the identified candidate genes.

Citations

A second generation human haplotype map of over 3.1 million SNPs

Molecular Signatures of Natural Selection

How to track and assess genotyping errors in population genetics studies.

The power and promise of population genomics: from genotyping to genome typing

Whole-Genome Patterns of Common DNA Variation in Three Human Populations

References

Gene Ontology: tool for the unification of biology

Estimating F-statistics for the analysis of population structure.

Molecular Evolutionary Genetics

The genetical structure of populations

A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms
