Journal ArticleDOI

Estimating F-statistics for the analysis of population structure.

01 Nov 1984-Evolution (Wiley)-Vol. 38, Iss: 6, pp 1358-1370
TL;DR: The purpose of this discussion is to offer some unity to various estimation formulae and to point out that correlations of genes in structured populations, with which F-statistics are concerned, are expressed very conveniently with a set of parameters treated by Cockerham (1969, 1973).
Abstract: This journal frequently contains papers that report values of F-statistics estimated from genetic data collected from several populations. These parameters, FST, FIT, and FIS, were introduced by Wright (1951), and offer a convenient means of summarizing population structure. While there is some disagreement about the interpretation of the quantities, there is considerably more disagreement on the method of evaluating them. Different authors make different assumptions about sample sizes or numbers of populations and handle the difficulties of multiple alleles and unequal sample sizes in different ways. Wright himself, for example, did not consider the effects of finite sample size. The purpose of this discussion is to offer some unity to various estimation formulae and to point out that correlations of genes in structured populations, with which F-statistics are concerned, are expressed very conveniently with a set of parameters treated by Cockerham (1969, 1973). We start with the parameters and construct appropriate estimators for them, rather than beginning the discussion with various data functions. The extension of Cockerham's work to multiple alleles and loci will be made explicit, and the use of jackknife procedures for estimating variances will be advocated. All of this may be regarded as an extension of a recent treatment of estimating the coancestry coefficient to serve as a mea-
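The jackknife mentioned in the abstract can be illustrated with a generic delete-one sketch. This is not the paper's procedure: the function names, the choice to jackknife over loci, the ratio-of-sums statistic, and the per-locus numbers are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple


def jackknife(values: Sequence, estimator: Callable[[Sequence], float]) -> Tuple[float, float]:
    """Delete-one jackknife: return (bias-corrected estimate, variance estimate)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    full = estimator(values)
    # Re-estimate with each observation left out in turn.
    leave_one_out = [
        estimator(list(values[:i]) + list(values[i + 1:])) for i in range(n)
    ]
    # Pseudovalues: n * full estimate - (n - 1) * leave-one-out estimate.
    pseudo = [n * full - (n - 1) * t for t in leave_one_out]
    mean = sum(pseudo) / n
    variance = sum((p - mean) ** 2 for p in pseudo) / (n * (n - 1))
    return mean, variance


def ratio_of_sums(loci: Sequence[Tuple[float, float]]) -> float:
    """Hypothetical multilocus statistic: sum of numerators over sum of denominators."""
    return sum(a for a, _ in loci) / sum(b for _, b in loci)


# Invented (numerator, denominator) components for four loci.
loci = [(0.02, 0.20), (0.10, 0.25), (0.05, 0.30), (0.04, 0.22)]
estimate, variance = jackknife(loci, ratio_of_sums)
```

Treating each locus as one deleted unit is only one possible choice; the same function could be applied to populations instead.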
Citations
Journal ArticleDOI
TL;DR: Arlequin ver 3.0 is a software package integrating several basic and advanced methods for population genetics data analysis, like the computation of standard genetic diversity indices, the estimation of allele and haplotype frequencies, tests of departure from linkage equilibrium, departure from selective neutrality and demographic equilibrium, estimation of parameters from past population expansions, and thorough analyses of population subdivision under the AMOVA framework.
Abstract: Arlequin ver 3.0 is a software package integrating several basic and advanced methods for population genetics data analysis, like the computation of standard genetic diversity indices, the estimation of allele and haplotype frequencies, tests of departure from linkage equilibrium, departure from selective neutrality and demographic equilibrium, estimation of parameters from past population expansions, and thorough analyses of population subdivision under the AMOVA framework. Arlequin 3 introduces a completely new graphical interface written in C++, a more robust semantic analysis of input files, and two new methods: a Bayesian estimation of gametic phase from multi-locus genotypes, and an estimation of the parameters of an instantaneous spatial expansion from DNA sequence polymorphism. Arlequin can handle several data types like DNA sequences, microsatellite data, or standard multi-locus genotypes. A Windows version of the software is freely available on http://cmpg.unibe.ch/software/arlequin3.

14,271 citations

Journal ArticleDOI
01 Jun 1992-Genetics
TL;DR: In this article, a framework for the study of molecular variation within a single species is presented, where information on DNA haplotype divergence is incorporated into an analysis of variance format, derived from a matrix of squared-distances among all pairs of haplotypes.
Abstract: We present here a framework for the study of molecular variation within a single species. Information on DNA haplotype divergence is incorporated into an analysis of variance format, derived from a matrix of squared-distances among all pairs of haplotypes. This analysis of molecular variance (AMOVA) produces estimates of variance components and F-statistic analogs, designated here as phi-statistics, reflecting the correlation of haplotypic diversity at different levels of hierarchical subdivision. The method is flexible enough to accommodate several alternative input matrices, corresponding to different types of molecular data, as well as different types of evolutionary assumptions, without modifying the basic structure of the analysis. The significance of the variance components and phi-statistics is tested using a permutational approach, eliminating the normality assumption that is conventional for analysis of variance but inappropriate for molecular data. Application of AMOVA to human mitochondrial DNA haplotype data shows that population subdivisions are better resolved when some measure of molecular differences among haplotypes is introduced into the analysis. At the intraspecific level, however, the additional information provided by knowing the exact phylogenetic relations among haplotypes or by a nonlinear translation of restriction-site change into nucleotide diversity does not significantly modify the inferred population genetic structure. Monte Carlo studies show that site sampling does not fundamentally affect the significance of the molecular variance components. The AMOVA treatment is easily extended in several different directions and it constitutes a coherent and flexible framework for the statistical analysis of molecular data.
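The variance decomposition described in this abstract can be sketched for the simplest case. The code below assumes a single level of subdivision (individuals within populations) and the usual sums-of-squared-deviations partition of a squared-distance matrix; it is an illustration, not the implementation used in any AMOVA software, and the function name and toy data are invented. The permutation test of significance is not included.

```python
from typing import Dict, List, Sequence


def phi_st(sq_dist: Sequence[Sequence[float]], groups: Sequence[int]) -> float:
    """One-level AMOVA-style Phi_ST from a symmetric matrix of squared distances.

    sq_dist[i][j] is the squared distance between individuals i and j;
    groups[i] is the population label of individual i.
    """
    n_total = len(groups)
    labels = sorted(set(groups))
    n_groups = len(labels)
    if n_groups < 2 or n_total <= n_groups:
        raise ValueError("need at least two groups and more individuals than groups")

    def ssd(members: List[int]) -> float:
        # Sum of squared deviations: pairwise squared distances / group size.
        total = sum(
            sq_dist[i][j] for a, i in enumerate(members) for j in members[a + 1:]
        )
        return total / len(members)

    by_group: Dict[int, List[int]] = {
        g: [i for i, lab in enumerate(groups) if lab == g] for g in labels
    }
    ssd_total = ssd(list(range(n_total)))
    ssd_within = sum(ssd(m) for m in by_group.values())
    ssd_among = ssd_total - ssd_within

    msd_among = ssd_among / (n_groups - 1)
    msd_within = ssd_within / (n_total - n_groups)
    # Average group size adjusted for unequal sample sizes.
    n0 = (n_total - sum(len(m) ** 2 for m in by_group.values()) / n_total) / (n_groups - 1)

    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    # Raises ZeroDivisionError if all distances are zero.
    return var_among / (var_among + var_within)


# Toy data: two populations of two identical haplotypes each,
# with squared distance 1 between populations.
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
phi = phi_st(toy, [0, 0, 1, 1])
```

In this toy case all variation lies among populations, so the statistic takes its maximum value.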

12,835 citations

Journal ArticleDOI
TL;DR: This note summarizes developments of the genepop software since its first description in 1995, and in particular those new to version 4.0: an extended input format, several estimators of neighbourhood size under isolation by distance, new estimators and confidence intervals for null allele frequency, and less important extensions to previous options.
Abstract: This note summarizes developments of the genepop software since its first description in 1995, and in particular those new to version 4.0: an extended input format, several estimators of neighbourhood size under isolation by distance, new estimators and confidence intervals for null allele frequency, and less important extensions to previous options. genepop now runs under Linux as well as under Windows, and can be entirely controlled by batch calls.

8,171 citations


Cites background or methods from "Estimating F-statistics for the ana..."

  • ...As further detailed in the genepop documentation, while the single locus estimators are identical, these multilocus estimators differ from the ones described in Weir & Cockerham (1984) and Weir (1996)....


  • ...…of Weir (1996) give the same weight to estimates of the Q’s for a locus typed at five individuals in each subpopulation as for a locus typed at 50 individuals in each subpopulation, while the estimators of Weir & Cockerham (1984) give less weight to the Q estimates from loci with larger samples....
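The weighting issue raised in the excerpt above can be illustrated generically. The sketch below is not the estimator of Weir & Cockerham (1984), Weir (1996), or genepop: it takes hypothetical per-locus numerator and denominator components of a ratio statistic and combines them across loci with arbitrary weights, so that equal weighting can be compared with weighting by sample size. All names and numbers are invented.

```python
from typing import Sequence


def combine_loci(
    num: Sequence[float], den: Sequence[float], weights: Sequence[float]
) -> float:
    """Weighted ratio of sums across loci: sum(w * num) / sum(w * den)."""
    if not (len(num) == len(den) == len(weights)):
        raise ValueError("num, den and weights must have the same length")
    top = sum(w * a for w, a in zip(weights, num))
    bottom = sum(w * b for w, b in zip(weights, den))
    if bottom == 0:
        raise ZeroDivisionError("weighted denominator is zero")
    return top / bottom


# Invented components for two loci: one typed at 5 individuals per
# subpopulation, one typed at 50.
num = [0.02, 0.10]
den = [0.20, 0.25]
sizes = [5, 50]

equal_weight = combine_loci(num, den, [1, 1])   # both loci count equally
size_weight = combine_loci(num, den, sizes)     # larger sample dominates
```

With these numbers the two schemes give visibly different multilocus values, which is why the choice of weights matters when sample sizes differ between loci.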


Journal ArticleDOI
18 Oct 2007-Nature
TL;DR: The Phase II HapMap is described, which characterizes over 3.1 million human single nucleotide polymorphisms genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed, and increased differentiation at non-synonymous, compared to synonymous, SNPs is demonstrated.
Abstract: We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.

4,565 citations

References
Journal ArticleDOI
TL;DR: The primary objectives of this study are to characterize each taxon in terms of gene frequencies, genetic diversity and genotype frequencies relative to Hardy-Weinberg expectation, and to apportion genetic diversity within and between subspecies.
Abstract: thousands of individuals. The species is divisible into six morphologically distinct subspecies which are largely allopatric or parapatric (Erbe and Turner, 1962). Their ranges are depicted in Figure 1. Subspecies glabriflora grows principally on stabilized sandy dune soils in open fields of the Coastal Plain, ssp. littoralis on coastal beach sands, ssp. tharpii on sandy soils in open prairies of the Rio Grande Plain, ssp. goldsmithii and ssp. drummondii on sandy loams in open or cleared forest areas, and ssp. mcallisteri in open fields on shallow sandy soils of granitic origin on the Edwards Plateau. Reproductive barriers to gene exchange are weak (Erbe and Turner, 1962; Levin, 1976), and the integrity of the subspecies seems due to their ecogeographical isolation. All of the subspecies exhibit gametophytic self-incompatibility, and have the same chromosome number, n = 7. The primary objectives of this study are (1) to characterize each taxon in terms of gene frequencies, genetic diversity and genotype frequencies relative to Hardy-Weinberg expectation, (2) to apportion genetic diversity within and between subspecies, (3) to describe gene frequency heterogeneity between populations of the same subspecies and the species as a whole, (4) to describe geographical variation patterns for alleles and genetic diversity, and (5) to determine the genetic affinities of the subspecies. In doing this an attempt is made to discern the roles of gene flow restriction within and among populations and stochastic processes in fostering and maintaining the organization of genetic variability in P. drummondii. The concordance between morphological and allozymic variation also is considered. EVOLUTION 31:477-494. September 1977

77 citations


"Estimating F-statistics for the ana..." refers methods in this paper

  • ...Such an approach is suggested by Nei and Imaizumi (1966), made explicit by Workman and Niswander (1970), discussed by Wright (1978), and used by Chesser (1983), Ellstrand and Levin (1980), Levin (1977, 1978), Levin et al. (1979), Pamilo (1983), and Ryman et al. (1980)....


Journal ArticleDOI
TL;DR: The level of genetic differentiation among partially isolated subpopulations will, in the absence of selection (and ignoring mutation), depend on effective sizes of demes and the migration rates between them.
Abstract: A population structure in which the metapopulation is divided into small breeding units is often considered most favorable for evolution, because it facilitates genetic change and interpopulation differentiation (Wright, 1931, 1940). Such a structure may be common in nature, as electrophoretic studies have revealed considerable microgeographic differentiation in natural populations of many animals, i.e., genetic differentiation in an area which might be thought to be within the dispersal range of single individuals. This has been detected in both vertebrates (e.g., Selander, 1970; Patton and Feder, 1981) and invertebrates (e.g., Selander and Kaufman, 1975; Richmond, 1978; Jones et al., 1980; Varvio-Aho and Pamilo, 1980). One cause of such microgeographic differentiation in allele frequencies could be locally differing selection pressures which may be further enhanced by habitat choice by individuals. Alternatively, the differentiation may arise from stochastic processes. The level of genetic differentiation among partially isolated subpopulations will, in the absence of selection (and ignoring mutation), depend on effective sizes of demes and the migration rates between them. In continuous populations, local allele frequency differences may arise because of isolation by distance, and the basic conceptual unit is a neighborhood, the size of population from which the parents may be treated as if drawn at random (Wright, 1943, 1951). The effects of various population structures on genetic differentiation have been examined by using

73 citations

Journal ArticleDOI
TL;DR: Studies of the genetic and population structure in house mice that are reported here grew out of the need to resolve a paradox.
Abstract: It is a widespread view that house mice, Mus musculus, live in behaviorally isolated tribes or demes often simply referred to as social groups. However, several problems with the basis for this view may be identified: (1) no trapping system can show that mice restrict their breeding to a particular tribe; (2) low vagility is not equivalent to an exclusive home range: an occasional copulation between groups is very hard to disprove; and (3) sampling by removal disrupts existing social groups by selectively removing dominant mice which causes immigration into the area where removal occurred (Adamczyk and Walkowa, 1971; Baker, 1980, provides extensive references that are omitted throughout this report). Studies of the genetic and population structure in house mice that are reported here grew out of the need to resolve a paradox. Whereas the observed segregation ratio of the lethal alleles (tlu) at the T locus predicted high frequencies of tl' in natural populations, field data revealed low frequencies (Bruck, 1957; Bennett, 1975). This difference between observed and expected gene frequencies was explained by assuming that house mice lived in semiisolated, small, tribal groups, in which genetic drift would play a major role in determining gene frequencies bringing the expected gene frequencies into concordance with observations (Lewontin and Dunn, 1960). Levin et al. (1969) simulated

73 citations

Journal ArticleDOI
TL;DR: It is hypothesized that Oe. organensis is monomorphic at most allozyme loci and has few alleles at polymorphic loci.
Abstract: Oenothera organensis Munz has been an enigma because of its small population size but rich self-incompatibility (S) gene polymorphism. This perennial is known solely from the Organ Mountains, New Mexico, at elevations between 6,000 and 7,500 ft; and occupies an area of about 18 sq. mi. Emerson (1939, 1940) sampled four populations and found over 45 S-alleles whose distribution was rather homogeneous. Forty-five alleles is very high in view of Emerson's notion that the total population of Oe. organensis was less than 500 plants. There have been several attempts to explain how so many alleles each could be maintained at a low frequency in a species, with so few individuals (Wright, 1939, 1960, 1965; Fisher, 1958; Moran, 1962; Crosby, 1966). Is this very narrow endemic (rich in S-alleles) rich or depauperate in genetic variation at other loci? Moreover is the spatial organization of variation at other loci similar to that at the S-locus? The purpose of our study was to obtain answers to these questions. As a prelude to stating our expectations and considering our findings, it is important to recognize a number of aspects of the ecology of the species. Recent exploration of the Organ Mountains by Ritter revealed that Oe. organensis is abundant in canyons that have good drainage. All of the populations known to date, and locations where others are likely, are noted in Figure 1. The species may be composed of 5,000 plants. The largest number of plants occur on the east side of the mountains which is rather mesic in many areas. The dispersal of pollen within and among populations is accomplished by strong-flying hawkmoths (Hyles lineata, Manduca quinquemaculata, Sphinx chersis). Seeds have no obvious special dispersal adaptations. Local dispersal presumably is by water and small mammals or birds. Interpopulation seed dispersal is probably accomplished by deer, which heavily browse inflorescences and capsules. The sites of Oenothera are the only watering holes for deer in their peregrinations over ridges between canyons. Given the information on S-locus polymorphism, the distribution of populations and the type of pollen and seed vectors, what may we expect to find for the genome as a whole? Clearly S-allele polymorphism is not indicative of overall genetic variation. Polymorphism at the S-locus is protected in part through frequency-dependent selection; the rare alleles tend to be at an advantage (Wright, 1969). This form of selection tends to retard the decay of genetic variability and insure the retention of at least a few alleles. At other loci, the decay of variability is likely to be rapid, being dictated by the size and number of populations and the amount of isolation and differential selection between the populations. If there is little genetic subdivision of the species, as is suggested by the S-alleles, then the species would be a rather poor repository for genetic variation (Wright, 1948, 1951). In view of its narrow endemism, we hypothesize that Oe. organensis is monomorphic at most allozyme loci and has few alleles at polymorphic loci. The spatial distribution of alleles at other loci is likely to be governed by the same factors operative on S-alleles, especially the pollen and seed flow distributions. Since pollen is carried by strong

72 citations


"Estimating F-statistics for the ana..." refers methods in this paper

  • ...Such an approach is suggested by Nei and Imaizumi (1966), made explicit by Workman and Niswander (1970), discussed by Wright (1978), and used by Chesser (1983), Ellstrand and Levin (1980), Levin (1977, 1978), Levin et al. (1979), Pamilo (1983), and Ryman et al. (1980)....


Journal ArticleDOI
01 May 1966-Heredity
TL;DR: Genetic structure of human populations II: differentiation of blood group gene frequencies among isolated populations.
Abstract: Genetic structure of human populations II. Differentiation of blood group gene frequencies among isolated populations

70 citations


"Estimating F-statistics for the ana..." refers methods in this paper

  • ...Such an approach is suggested by Nei and Imaizumi (1966), made explicit by Workman and Niswander (1970), discussed by Wright (1978), and used by Chesser (1983), Ellstrand and Levin (1980), Levin (1977, 1978), Levin et al. (1979), Pamilo (1983), and Ryman et al. (1980)....
