Estimating F-statistics for the analysis of population structure.

doi:10.1111/J.1558-5646.1984.TB05657.X

Journal Article•DOI•

Bruce S. Weir¹, C. Clark Cockerham¹•Institutions (1)

North Carolina State University¹

01 Nov 1984-Evolution (Wiley)-Vol. 38, Iss: 6, pp 1358-1370

TL;DR: The purpose of this discussion is to offer some unity to various estimation formulae and to point out that correlations of genes in structured populations, with which F-statistics are concerned, are expressed very conveniently with a set of parameters treated by Cockerham (1969, 1973).

Abstract: This journal frequently contains papers that report values of F-statistics estimated from genetic data collected from several populations. These parameters, FST, FIT, and FIS, were introduced by Wright (1951), and offer a convenient means of summarizing population structure. While there is some disagreement about the interpretation of the quantities, there is considerably more disagreement on the method of evaluating them. Different authors make different assumptions about sample sizes or numbers of populations and handle the difficulties of multiple alleles and unequal sample sizes in different ways. Wright himself, for example, did not consider the effects of finite sample size. The purpose of this discussion is to offer some unity to various estimation formulae and to point out that correlations of genes in structured populations, with which F-statistics are concerned, are expressed very conveniently with a set of parameters treated by Cockerham (1969, 1973). We start with the parameters and construct appropriate estimators for them, rather than beginning the discussion with various data functions. The extension of Cockerham's work to multiple alleles and loci will be made explicit, and the use of jackknife procedures for estimating variances will be advocated. All of this may be regarded as an extension of a recent treatment of estimating the coancestry coefficient to serve as a mea-
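
The abstract does not reproduce the estimators themselves. As a minimal sketch, assuming the single-allele variance-component form in which they are usually quoted (components a, b, c for between populations, between individuals within populations, and between genes within individuals), the three F-statistics could be computed as below. The function and variable names are illustrative, not from the paper, and the expressions should be checked against the paper's own equations before use.

```python
import math

def wc_components(n, p, h):
    """Variance components (a, b, c) for one allele at one locus.

    n: number of individuals sampled in each population
    p: sample frequency of the allele in each population
    h: observed proportion of heterozygotes carrying the allele
       in each population
    Assumes at least two populations and a mean sample size above 1.
    """
    r = len(n)                       # number of populations
    n_tot = sum(n)
    n_bar = n_tot / r                # mean sample size
    n_c = (n_tot - sum(ni * ni for ni in n) / n_tot) / (r - 1)
    p_bar = sum(ni * pi for ni, pi in zip(n, p)) / n_tot
    s2 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / ((r - 1) * n_bar)
    h_bar = sum(ni * hi for ni, hi in zip(n, h)) / n_tot
    core = p_bar * (1 - p_bar) - (r - 1) / r * s2
    a = (n_bar / n_c) * (s2 - (core - h_bar / 4) / (n_bar - 1))
    b = (n_bar / (n_bar - 1)) * (core - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a, b, c

def f_statistics(a, b, c):
    """Return (F, theta, f), the estimates of FIT, FST and FIS."""
    total = a + b + c                # assumed non-zero (a polymorphic sample)
    theta = a / total
    F = 1 - c / total
    # f is undefined when there is no variation within populations
    f = 1 - c / (b + c) if (b + c) != 0 else math.nan
    return F, theta, f

# Two samples of 10 individuals, fixed for different alleles
F, theta, f = f_statistics(*wc_components([10, 10], [1.0, 0.0], [0.0, 0.0]))
print(F, theta, f)
```

By construction these ratios satisfy Wright's relation (1 - FIT) = (1 - FIS)(1 - FST). The estimate of FST can be slightly negative when the samples have identical allele frequencies, which is the effect of correcting for finite sample size.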

Citations

Journal Article•DOI•

Arlequin (version 3.0): An integrated software package for population genetics data analysis

Laurent Excoffier¹, Guillaume Laval¹, Stefan W. Schneider¹•Institutions (1)

University of Bern¹

01 Jan 2005-Evolutionary Bioinformatics

TL;DR: Arlequin ver 3.0 is a software package integrating several basic and advanced methods for population genetics data analysis, like the computation of standard genetic diversity indices, the estimation of allele and haplotype frequencies, tests of departure from linkage equilibrium, departure from selective neutrality and demographic equilibrium, estimation of parameters from past population expansions, and thorough analyses of population subdivision under the AMOVA framework.

Abstract: Arlequin ver 3.0 is a software package integrating several basic and advanced methods for population genetics data analysis, like the computation of standard genetic diversity indices, the estimation of allele and haplotype frequencies, tests of departure from linkage equilibrium, departure from selective neutrality and demographic equilibrium, estimation of parameters from past population expansions, and thorough analyses of population subdivision under the AMOVA framework. Arlequin 3 introduces a completely new graphical interface written in C++, a more robust semantic analysis of input files, and two new methods: a Bayesian estimation of gametic phase from multi-locus genotypes, and an estimation of the parameters of an instantaneous spatial expansion from DNA sequence polymorphism. Arlequin can handle several data types like DNA sequences, microsatellite data, or standard multi-locus genotypes. A Windows version of the software is freely available at http://cmpg.unibe.ch/software/arlequin3.

14,271 citations

Journal Article•DOI•

Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data.

Laurent Excoffier¹, Peter E. Smouse¹, Joseph M. Quattro¹•Institutions (1)

Rutgers University¹

01 Jun 1992-Genetics

TL;DR: In this article, a framework for the study of molecular variation within a single species is presented, where information on DNA haplotype divergence is incorporated into an analysis of variance format, derived from a matrix of squared-distances among all pairs of haplotypes.

Abstract: We present here a framework for the study of molecular variation within a single species. Information on DNA haplotype divergence is incorporated into an analysis of variance format, derived from a matrix of squared-distances among all pairs of haplotypes. This analysis of molecular variance (AMOVA) produces estimates of variance components and F-statistic analogs, designated here as phi-statistics, reflecting the correlation of haplotypic diversity at different levels of hierarchical subdivision. The method is flexible enough to accommodate several alternative input matrices, corresponding to different types of molecular data, as well as different types of evolutionary assumptions, without modifying the basic structure of the analysis. The significance of the variance components and phi-statistics is tested using a permutational approach, eliminating the normality assumption that is conventional for analysis of variance but inappropriate for molecular data. Application of AMOVA to human mitochondrial DNA haplotype data shows that population subdivisions are better resolved when some measure of molecular differences among haplotypes is introduced into the analysis. At the intraspecific level, however, the additional information provided by knowing the exact phylogenetic relations among haplotypes or by a nonlinear translation of restriction-site change into nucleotide diversity does not significantly modify the inferred population genetic structure. Monte Carlo studies show that site sampling does not fundamentally affect the significance of the molecular variance components. The AMOVA treatment is easily extended in several different directions and it constitutes a coherent and flexible framework for the statistical analysis of molecular data.

12,835 citations

Journal Article•

Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data.

Peter E. Smouse, Laurent Excoffier, Joseph M. Quattro

30 May 1992-Genomics

12,252 citations

Journal Article•DOI•

genepop'007: a complete re-implementation of the genepop software for Windows and Linux

François Rousset¹•Institutions (1)

University of Montpellier¹

01 Jan 2008-Molecular Ecology Resources

TL;DR: This note summarizes developments of the genepop software since its first description in 1995, and in particular those new to version 4.0: an extended input format, several estimators of neighbourhood size under isolation by distance, new estimators and confidence intervals for null allele frequency, and less important extensions to previous options.

Abstract: This note summarizes developments of the genepop software since its first description in 1995, and in particular those new to version 4.0: an extended input format, several estimators of neighbourhood size under isolation by distance, new estimators and confidence intervals for null allele frequency, and less important extensions to previous options. genepop now runs under Linux as well as under Windows, and can be entirely controlled by batch calls.

8,171 citations

Cites background or methods from "Estimating F-statistics for the ana..."

...As further detailed in the genepop documentation, while the single locus estimators are identical, these multilocus estimators differ from the ones described in Weir & Cockerham (1984) and Weir (1996)....
[...]
...of Weir (1996) give the same weight to estimates of the Q’s for a locus typed at five individuals in each subpopulation as for a locus typed at 50 individuals in each subpopulation, while the estimators of Weir & Cockerham (1984) give less weight to the Q estimates from loci with larger samples....

Journal Article•DOI•

A second generation human haplotype map of over 3.1 million SNPs

Kelly A. Frazer¹, Dennis G. Ballinger, David R. Cox, David A. Hinds +234 more•Institutions (48)

18 Oct 2007-Nature

TL;DR: The Phase II HapMap is described, which characterizes over 3.1 million human single nucleotide polymorphisms genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed, and increased differentiation at non-synonymous, compared to synonymous, SNPs is demonstrated.

Abstract: We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.

4,565 citations

References

Journal Article•DOI•

Analysis of Gene Diversity in Subdivided Populations

Masatoshi Nei

01 Dec 1973-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: A method is presented by which the gene diversity (heterozygosity) of a subdivided population can be analyzed into its components, i.e., the gene diversities within and between subpopulations.

Abstract: A method is presented by which the gene diversity (heterozygosity) of a subdivided population can be analyzed into its components, i.e., the gene diversities within and between subpopulations. This method is applicable to any population without regard to the number of alleles per locus, the pattern of evolutionary forces such as mutation, selection, and migration, and the reproductive method of the organism used. Measures of the absolute and relative magnitudes of gene differentiation among subpopulations are also proposed.
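
The decomposition described in this abstract can be illustrated with a short sketch. It assumes equal weights for all subpopulations and works directly on allele frequencies with no correction for sample size; the function names are hypothetical and the code is an illustration, not Nei's own computational procedure.

```python
def gene_diversity(freqs):
    """Gene diversity (expected heterozygosity): 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def nei_gst(pop_freqs):
    """Split total gene diversity into within- and between-subpopulation parts.

    pop_freqs: one list of allele frequencies per subpopulation,
               all in the same allele order (equal weights assumed).
    Returns (H_S, H_T, D_ST, G_ST).
    """
    s = len(pop_freqs)
    k = len(pop_freqs[0])
    # Average within-subpopulation diversity
    h_s = sum(gene_diversity(f) for f in pop_freqs) / s
    # Diversity of the pooled population, from mean allele frequencies
    mean_freqs = [sum(f[j] for f in pop_freqs) / s for j in range(k)]
    h_t = gene_diversity(mean_freqs)
    d_st = h_t - h_s                 # absolute differentiation
    g_st = d_st / h_t if h_t > 0 else float("nan")  # relative differentiation
    return h_s, h_t, d_st, g_st

# Two subpopulations fixed for different alleles: all diversity is between them
print(nei_gst([[1.0, 0.0], [0.0, 1.0]]))
```

With identical frequencies in every subpopulation the between-subpopulation component is zero, so the relative measure is zero as well.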

8,465 citations

"Estimating F-statistics for the ana..." refers background in this paper

...Many papers do not give computational formulae, but generally refer to work by Wright (1943, 1951, 1965, 1973) or Nei (1973, 1977), and any assumptions made about sample sizes are not stated....

Book•

The jackknife, the bootstrap, and other resampling plans

Bradley Efron

01 Jan 1987

TL;DR: The monograph covers the delta method and the influence function; cross-validation, the jackknife and the bootstrap; balanced repeated replications (half-sampling); random subsampling; and nonparametric confidence intervals.

Abstract: The Jackknife Estimate of Bias; The Jackknife Estimate of Variance; Bias of the Jackknife Variance Estimate; The Bootstrap; The Infinitesimal Jackknife; The Delta Method and the Influence Function; Cross-Validation, Jackknife and Bootstrap; Balanced Repeated Replications (Half-Sampling); Random Subsampling; Nonparametric Confidence Intervals.
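
The jackknife estimate of variance named in this list can be sketched in a few lines. This is the standard delete-one form, shown here on a plain list of numbers with an arbitrary estimator; the function name is illustrative, and applying it over loci or over populations for F-statistics, as the citing paper advocates, is left as an assumption about usage.

```python
from statistics import mean

def jackknife_variance(data, estimator):
    """Delete-one jackknife estimate of the variance of estimator(data).

    data: list of observations (for example, one entry per locus)
    estimator: function mapping a list of observations to a number
    """
    n = len(data)
    # Recompute the estimate with each observation left out in turn
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    return (n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out)

# For the sample mean, the jackknife variance equals s^2 / n
print(jackknife_variance([1.0, 2.0, 3.0, 4.0], mean))
```

For the sample mean the result coincides with the usual sample variance divided by n, which gives a convenient sanity check before applying the same function to a more complicated statistic.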

7,007 citations

Journal Article•DOI•

The genetical structure of populations

Sewall Wright¹•Institutions (1)

University College London¹

01 Jan 1949-Annals of Human Genetics

6,139 citations

Journal Article•DOI•

Isolation by Distance.

Sewall Wright¹•Institutions (1)

University of Chicago¹

29 Mar 1943-Genetics

5,446 citations

"Estimating F-statistics for the ana..." refers background in this paper

...Many papers do not give computational formulae, but generally refer to work by Wright (1943, 1951, 1965, 1973) or Nei (1973, 1977), and any assumptions made about sample sizes are not stated....

Journal Article•DOI•

The interpretation of population structure by F-statistics with special regard to systems of mating

Sewall Wright¹•Institutions (1)

University of Wisconsin-Madison¹

01 Sep 1965-Evolution

TL;DR: It was found that there is no equilibrium in either case short of complete fixation locally, in spite of the linear increase in number of different ancestors with increasing number of ancestral generations, in contrast to systems (half first cousin or second cousin) in which this increase is more than linear and a steady state is rapidly attained with respect to heterozygosis.

Abstract: Kimura and Crow (1963b) have recently made an interesting comparison between two classes of systems of mating within populations of constant size: ones in which there is maximum avoidance of consanguine mating and ones in which all matings are between close relatives around an unbroken circle. These are illustrated in Figs. 1 and 2 in populations of eight. The rate of decrease of heterozygosis in the former class had, as they note, been found long before to approach 1/(4N) asymptotically with increasing size of population, N (Wright, 1921, 1933a). Two cases with patterns of mating similar to those of Kimura and Crow's second class, except that the matings were between neighbors along infinitely extended lines instead of around a circle, had also been considered in these papers. These systems consisted of exclusive mating of half-sibs or of first cousins, otherwise with a minimum of relationship. It was found that there is no equilibrium in either case short of complete fixation locally, in spite of the linear increase in number of different ancestors with increasing number of ancestral generations. This was in contrast to systems (half first cousin or second cousin) in which this increase is more than linear and a steady state is rapidly attained with respect to heterozygosis. Kimura and Crow were surprised to find that the limiting rates of decrease of heterozygosis in their circular systems are much less than under maximum avoidance, approaching [π/(2N + 4)]² in the case of half-sib matings and [π/(N + 12)]² under first-cousin matings with large N. Maxi-

3,305 citations