Journal Article

Estimating F-statistics for the analysis of population structure.

01 Nov 1984 - Evolution (Wiley), Vol. 38, Iss. 6, pp. 1358-1370
TL;DR: The purpose of this discussion is to offer some unity to various estimation formulae and to point out that correlations of genes in structured populations, with which F-statistics are concerned, are expressed very conveniently with a set of parameters treated by Cockerham (1969, 1973).
Abstract: This journal frequently contains papers that report values of F-statistics estimated from genetic data collected from several populations. These parameters, FST, FIT, and FIS, were introduced by Wright (1951), and offer a convenient means of summarizing population structure. While there is some disagreement about the interpretation of the quantities, there is considerably more disagreement on the method of evaluating them. Different authors make different assumptions about sample sizes or numbers of populations and handle the difficulties of multiple alleles and unequal sample sizes in different ways. Wright himself, for example, did not consider the effects of finite sample size. The purpose of this discussion is to offer some unity to various estimation formulae and to point out that correlations of genes in structured populations, with which F-statistics are concerned, are expressed very conveniently with a set of parameters treated by Cockerham (1969, 1973). We start with the parameters and construct appropriate estimators for them, rather than beginning the discussion with various data functions. The extension of Cockerham's work to multiple alleles and loci will be made explicit, and the use of jackknife procedures for estimating variances will be advocated. All of this may be regarded as an extension of a recent treatment of estimating the coancestry coefficient to serve as a mea-
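The abstract builds estimators from Cockerham's parameters and extends them to multiple alleles and loci. As a hedged illustration, the Python sketch below computes the per-allele variance components a, b and c in the form commonly quoted from Weir & Cockerham (1984). The formulas do not appear in the abstract, and the input layout (per-population sample sizes, allele frequencies and heterozygote proportions) and the function names are assumptions made for this example. The sketch handles multiple alleles and loci by summing the components before taking ratios, so theta = a/(a+b+c) estimates FST.

```python
import math
from dataclasses import dataclass


@dataclass
class Components:
    """Variance components for one allele: a (among populations),
    b (among individuals within populations), c (within individuals)."""
    a: float
    b: float
    c: float


def wc_components(n, p, h):
    """Components for one allele at one locus (sketch of the Weir & Cockerham form).

    n: number of sampled individuals in each population
    p: sample frequency of the allele in each population
    h: proportion of sampled individuals heterozygous for the allele
    """
    r = len(n)
    total = sum(n)
    n_bar = total / r
    # n_c adjusts for unequal sample sizes (equals n_bar when all n_i are equal).
    n_c = (total - sum(ni * ni for ni in n) / total) / (r - 1)
    p_bar = sum(ni * pi for ni, pi in zip(n, p)) / total
    s2 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / ((r - 1) * n_bar)
    h_bar = sum(ni * hi for ni, hi in zip(n, h)) / total

    pq = p_bar * (1 - p_bar)
    a = (n_bar / n_c) * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = (n_bar / (n_bar - 1)) * (pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return Components(a, b, c)


def wc_fstats(components):
    """Sum components over alleles (and loci), then take ratios.

    Returns (F, theta, f), estimating FIT, FST and FIS respectively.
    """
    a = sum(x.a for x in components)
    b = sum(x.b for x in components)
    c = sum(x.c for x in components)
    F = 1 - c / (a + b + c)
    theta = a / (a + b + c)
    # FIS is undefined when b + c == 0 (no variation within populations).
    f = 1 - c / (b + c) if b + c != 0 else math.nan
    return F, theta, f


# Two samples of 10 individuals fixed for different alleles at a biallelic
# locus: one Components entry per allele.
fixed = [wc_components([10, 10], [1.0, 0.0], [0.0, 0.0]),
         wc_components([10, 10], [0.0, 1.0], [0.0, 0.0])]
print(wc_fstats(fixed)[1])  # prints 1.0
```

Summing numerators and denominators before dividing, rather than averaging per-allele ratios, is one design choice among several. When populations do not differ, theta can come out slightly negative because of the sampling corrections.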
Citations
Journal Article
TL;DR: Arlequin ver 3.0 is a software package integrating several basic and advanced methods for population genetics data analysis, like the computation of standard genetic diversity indices, the estimation of allele and haplotype frequencies, tests of departure from linkage equilibrium, departure from selective neutrality and demographic equilibrium, estimation of parameters from past population expansions, and thorough analyses of population subdivision under the AMOVA framework.
Abstract: Arlequin ver 3.0 is a software package integrating several basic and advanced methods for population genetics data analysis, like the computation of standard genetic diversity indices, the estimation of allele and haplotype frequencies, tests of departure from linkage equilibrium, departure from selective neutrality and demographic equilibrium, estimation of parameters from past population expansions, and thorough analyses of population subdivision under the AMOVA framework. Arlequin 3 introduces a completely new graphical interface written in C++, a more robust semantic analysis of input files, and two new methods: a Bayesian estimation of gametic phase from multi-locus genotypes, and an estimation of the parameters of an instantaneous spatial expansion from DNA sequence polymorphism. Arlequin can handle several data types like DNA sequences, microsatellite data, or standard multi-locus genotypes. A Windows version of the software is freely available on http://cmpg.unibe.ch/software/arlequin3.

14,271 citations

Journal Article
01 Jun 1992 - Genetics
TL;DR: In this article, a framework for the study of molecular variation within a single species is presented, where information on DNA haplotype divergence is incorporated into an analysis of variance format, derived from a matrix of squared-distances among all pairs of haplotypes.
Abstract: We present here a framework for the study of molecular variation within a single species. Information on DNA haplotype divergence is incorporated into an analysis of variance format, derived from a matrix of squared-distances among all pairs of haplotypes. This analysis of molecular variance (AMOVA) produces estimates of variance components and F-statistic analogs, designated here as phi-statistics, reflecting the correlation of haplotypic diversity at different levels of hierarchical subdivision. The method is flexible enough to accommodate several alternative input matrices, corresponding to different types of molecular data, as well as different types of evolutionary assumptions, without modifying the basic structure of the analysis. The significance of the variance components and phi-statistics is tested using a permutational approach, eliminating the normality assumption that is conventional for analysis of variance but inappropriate for molecular data. Application of AMOVA to human mitochondrial DNA haplotype data shows that population subdivisions are better resolved when some measure of molecular differences among haplotypes is introduced into the analysis. At the intraspecific level, however, the additional information provided by knowing the exact phylogenetic relations among haplotypes or by a nonlinear translation of restriction-site change into nucleotide diversity does not significantly modify the inferred population genetic structure. Monte Carlo studies show that site sampling does not fundamentally affect the significance of the molecular variance components. The AMOVA treatment is easily extended in several different directions and it constitutes a coherent and flexible framework for the statistical analysis of molecular data.

12,835 citations
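The AMOVA partition described in this abstract can be sketched for the simplest design, populations only with no higher grouping level. This is a minimal sketch under stated assumptions: the number of differing sites (Hamming distance) stands in for the squared distance between haplotypes, sums of squares come from the identity that the sum of squared deviations equals the sum of pairwise squared distances divided by the number of haplotypes, and significance is assessed by reassigning haplotypes among populations at random. All function names are hypothetical.

```python
import random
from itertools import combinations


def hamming(x, y):
    """Number of differing sites; used here as the squared distance delta^2."""
    return sum(a != b for a, b in zip(x, y))


def ssd(haps):
    """Sum of squared deviations: (1/N) * sum over unordered pairs of delta^2."""
    if not haps:
        return 0.0
    return sum(hamming(x, y) for x, y in combinations(haps, 2)) / len(haps)


def phi_st(pops):
    """PhiST for a one-level AMOVA (among populations / within populations).

    Assumes at least two populations and more haplotypes than populations.
    """
    all_haps = [h for pop in pops for h in pop]
    N, P = len(all_haps), len(pops)
    ssd_total = ssd(all_haps)
    ssd_within = sum(ssd(pop) for pop in pops)
    ssd_among = ssd_total - ssd_within
    msd_among = ssd_among / (P - 1)
    msd_within = ssd_within / (N - P)
    # Average population size adjusted for unequal sizes.
    n = (N - sum(len(pop) ** 2 for pop in pops) / N) / (P - 1)
    var_among = (msd_among - msd_within) / n
    total_var = var_among + msd_within
    return var_among / total_var if total_var else 0.0


def permutation_test(pops, n_perm=999, seed=1):
    """Return (observed PhiST, p-value), where the p-value is the share of random
    reassignments of haplotypes to populations (sizes fixed) with PhiST >= observed."""
    rng = random.Random(seed)
    observed = phi_st(pops)
    pooled = [h for pop in pops for h in pop]
    sizes = [len(pop) for pop in pops]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if phi_st(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

For example, `phi_st([["000", "000"], ["111", "111"]])` gives 1.0, because all of the molecular variance lies between the two populations. Using permutations instead of an F distribution follows the abstract's point that the normality assumption is inappropriate for molecular data.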

Journal Article
TL;DR: This note summarizes developments of the genepop software since its first description in 1995, and in particular those new to version 4.0: an extended input format, several estimators of neighbourhood size under isolation by distance, new estimators and confidence intervals for null allele frequency, and less important extensions to previous options.
Abstract: This note summarizes developments of the genepop software since its first description in 1995, and in particular those new to version 4.0: an extended input format, several estimators of neighbourhood size under isolation by distance, new estimators and confidence intervals for null allele frequency, and less important extensions to previous options. genepop now runs under Linux as well as under Windows, and can be entirely controlled by batch calls.

8,171 citations


Cites background or methods from "Estimating F-statistics for the ana..."

  • ...As further detailed in the genepop documentation, while the single locus estimators are identical, these multilocus estimators differ from the ones described in Weir & Cockerham (1984) and Weir (1996)....


  • ...of Weir (1996) give the same weight to estimates of the Q’s for a locus typed at five individuals in each subpopulation as for a locus typed at 50 individuals in each subpopulation, while the estimators of Weir & Cockerham (1984) give less weight to the Q estimates from loci with larger samples....


Journal Article
18 Oct 2007-Nature
TL;DR: The Phase II HapMap is described, which characterizes over 3.1 million human single nucleotide polymorphisms genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed, and increased differentiation at non-synonymous, compared to synonymous, SNPs is demonstrated.
Abstract: We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.

4,565 citations

References
Journal Article
TL;DR: A method is presented by which the gene diversity (heterozygosity) of a subdivided population can be analyzed into its components, i.e., the gene diversities within and between subpopulations.
Abstract: A method is presented by which the gene diversity (heterozygosity) of a subdivided population can be analyzed into its components, i.e., the gene diversities within and between subpopulations. This method is applicable to any population without regard to the number of alleles per locus, the pattern of evolutionary forces such as mutation, selection, and migration, and the reproductive method of the organism used. Measures of the absolute and relative magnitudes of gene differentiation among subpopulations are also proposed.

8,465 citations


"Estimating F-statistics for the ana..." refers background in this paper

  • ...Many papers do not give computational formulae, but generally refer to work by Wright (1943, 1951, 1965, 1973) or Nei (1973, 1977), and any assumptions made about sample sizes are not stated....

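The gene-diversity decomposition described in the abstract above splits total diversity H_T into the mean within-subpopulation diversity H_S plus a between-subpopulation part D_ST = H_T - H_S. The relative measure G_ST = D_ST / H_T then expresses differentiation. A minimal single-locus sketch follows. It assumes subpopulations are weighted equally and allele frequencies are known without sampling error, which is exactly the finite-sample issue the F-statistics article addresses. The function name is hypothetical.

```python
def nei_gst(freqs):
    """Return (H_S, H_T, D_ST, G_ST) for one locus.

    freqs: one list of allele frequencies per subpopulation (each sums to 1);
    subpopulations are weighted equally.
    """
    k = len(freqs)
    n_alleles = len(freqs[0])
    # Mean gene diversity (expected heterozygosity) within subpopulations.
    h_s = sum(1 - sum(x * x for x in sub) for sub in freqs) / k
    # Gene diversity of the whole population, from mean allele frequencies.
    mean_freqs = [sum(sub[j] for sub in freqs) / k for j in range(n_alleles)]
    h_t = 1 - sum(x * x for x in mean_freqs)
    d_st = h_t - h_s
    g_st = d_st / h_t if h_t > 0 else 0.0
    return h_s, h_t, d_st, g_st


print(nei_gst([[1.0, 0.0], [0.0, 1.0]]))  # prints (0.0, 0.5, 0.5, 1.0)
```

Because the method works directly with allele frequencies, it applies regardless of the number of alleles per locus, as the abstract notes.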

Book
01 Jan 1987
TL;DR: Covers the delta method and the influence function; cross-validation, jackknife and bootstrap; balanced repeated replication (half-sampling); random subsampling; and nonparametric confidence intervals.
Abstract: The Jackknife Estimate of Bias; The Jackknife Estimate of Variance; Bias of the Jackknife Variance Estimate; The Bootstrap; The Infinitesimal Jackknife; The Delta Method and the Influence Function; Cross-Validation, Jackknife and Bootstrap; Balanced Repeated Replications (Half-Sampling); Random Subsampling; Nonparametric Confidence Intervals.

7,007 citations
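The delete-one jackknife estimates of bias and variance named in these chapter titles can be sketched as below. For F-statistics, whose abstract advocates jackknife variance estimates, one common choice (assumed here, not stated in the abstract) is to leave out one locus at a time and recompute a multilocus ratio estimator. The function names and the per-locus component values in the example are hypothetical, made up for illustration only.

```python
from statistics import mean


def jackknife(units, statistic):
    """Delete-one jackknife: returns (estimate, bias-corrected estimate, variance).

    units: list of sampling units (e.g. per-locus data)
    statistic: function mapping a list of units to a number
    """
    m = len(units)
    full = statistic(units)
    # Recompute the statistic with each unit left out in turn.
    loo = [statistic(units[:i] + units[i + 1:]) for i in range(m)]
    loo_mean = mean(loo)
    variance = (m - 1) / m * sum((x - loo_mean) ** 2 for x in loo)
    bias_corrected = m * full - (m - 1) * loo_mean
    return full, bias_corrected, variance


def ratio_theta(loci):
    """Multilocus ratio estimator: sum of a over sum of (a + b + c)."""
    return sum(a for a, b, c in loci) / sum(a + b + c for a, b, c in loci)


# Hypothetical (a, b, c) components for four loci, for illustration only.
loci = [(0.02, 0.01, 0.20), (0.05, 0.00, 0.22), (0.01, 0.02, 0.18), (0.03, 0.01, 0.25)]
estimate, corrected, variance = jackknife(loci, ratio_theta)
```

As a sanity check, for the sample mean this jackknife variance reduces to the usual s²/m.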

Journal Article
29 Mar 1943 - Genetics

5,446 citations


"Estimating F-statistics for the ana..." refers background in this paper

  • ...Many papers do not give computational formulae, but generally refer to work by Wright (1943, 1951, 1965, 1973) or Nei (1973, 1977), and any assumptions made about sample sizes are not stated....


Journal Article
TL;DR: It was found that there is no equilibrium in either case short of complete fixation locally, in spite of the linear increase in number of different ancestors with increasing number of ancestral generations, in contrast to systems (half first cousin or second cousin) in which this increase is more than linear and a steady state is rapidly attained with respect to heterozygosis.
Abstract: Kimura and Crow (1963b) have recently made an interesting comparison between two classes of systems of mating within populations of constant size: ones in which there is maximum avoidance of consanguine mating and ones in which all matings are between close relatives around an unbroken circle. These are illustrated in Figs. 1 and 2 in populations of eight. The rate of decrease of heterozygosis in the former class had, as they note, been found long before to approach 1/(4N) asymptotically with increasing size of population, N (Wright, 1921, 1933a). Two cases with patterns of mating similar to those of Kimura and Crow's second class, except that the matings were between neighbors along infinitely extended lines instead of around a circle, had also been considered in these papers. These systems consisted of exclusive mating of half-sibs or of first cousins, otherwise with a minimum of relationship. It was found that there is no equilibrium in either case short of complete fixation locally, in spite of the linear increase in number of different ancestors with increasing number of ancestral generations. This was in contrast to systems (half first cousin or second cousin) in which this increase is more than linear and a steady state is rapidly attained with respect to heterozygosis. Kimura and Crow were surprised to find that the limiting rates of decrease of heterozygosis in their circular systems are much less than under maximum avoidance, approaching [π/(2N + 4)]² in the case of half-sib matings and [π/(N + 12)]² under first-cousin matings with large N. Maxi-

3,305 citations