FSTAT (Version 1.2): A Computer Program to Calculate F-Statistics

01 Nov 1995-Journal of Heredity (Oxford University Press)-Vol. 86, Iss: 6, pp 485-486
About: This article is published in Journal of Heredity. The article was published on 1995-11-01 and is currently open access. It has received 7585 citations to date.


Computer Note
FSTAT (Version 1.2): A Computer Program to Calculate F-Statistics
J. Goudet
Computation of Wright's (1943, 1951) fixation indices ($F_{IS}$, $F_{ST}$, and $F_{IT}$) is widespread among population biologists to assess genetic differentiation of populations. $F_{IS}$ is a measure of the within-population heterozygote deficit, $F_{ST}$ is a measure of the among-population heterozygote deficit (the Wahlund effect), and $F_{IT}$ is a measure of the global heterozygote deficit. These indices can be defined as the correlation between uniting gametes (Wright 1943, 1951), as a function of gene diversity in the total population (Nei 1973, 1977; Nei and Chesser 1983), or as a function of variance components from a nested analysis of variance (Cockerham 1969, 1973; Cockerham and Weir 1986; Weir and Cockerham 1984). Deriving unbiased estimators of these quantities has been at the center of an argument between Nei on the one hand (Nei 1986; Nei and Chesser 1983) and Weir and Cockerham (Cockerham 1973; Cockerham and Weir 1986, 1987) on the other. An in-depth review of the properties of both families of estimators can be found in Chakraborty and Danker-Hopfe (1991). Cockerham and Weir (1993) presented an interpretation of the differences between Nei's estimator of $F_{ST}$, $G_{ST}$, and their own, $\theta$, in terms of the probabilities of identity by descent.
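The gene-diversity definition mentioned above can be illustrated with a minimal sketch. It assumes the parametric relations $F_{IS} = 1 - H_I/H_S$, $F_{ST} = 1 - H_S/H_T$, and $F_{IT} = 1 - H_I/H_T$, where the names `h_i` (observed heterozygosity), `h_s` (expected heterozygosity within samples), and `h_t` (expected heterozygosity in the total population) are introduced here for illustration only. It applies no sample-size correction and is not the estimator implemented in FSTAT.

```python
def f_from_heterozygosity(h_i, h_s, h_t):
    """Return (F_IS, F_ST, F_IT) from three heterozygosity values.

    h_i: observed heterozygosity averaged over samples (hypothetical name)
    h_s: expected heterozygosity within samples (hypothetical name)
    h_t: expected heterozygosity in the total population (hypothetical name)
    No correction for sample size is applied.
    """
    f_is = 1.0 - h_i / h_s
    f_st = 1.0 - h_s / h_t
    f_it = 1.0 - h_i / h_t
    return f_is, f_st, f_it


# Illustrative values only, not data from the article.
f_is, f_st, f_it = f_from_heterozygosity(0.30, 0.40, 0.50)

# The three indices are linked by (1 - F_IT) = (1 - F_IS)(1 - F_ST).
identity_gap = abs((1.0 - f_it) - (1.0 - f_is) * (1.0 - f_st))
```

With these definitions the three indices are not independent: any two of them determine the third through the product relation checked in `identity_gap`.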
The most often used package for analysis of allele and genotype frequencies is Swofford and Selander's (1981) BIOSYS-1. Although very useful for calculation of genetic distances and building of phylogenetic trees, its module on F statistics estimation is outdated, because it is based on Nei's (1977) article, which does not take sampling effects into account. I propose a PASCAL program that performs the calculation of Weir and Cockerham's (1984) estimators of F statistics, based on the Fortran listing published in Weir's (1990) Genetic Data Analysis, with several new features.
The Program FSTAT
The program FSTAT performs the following:
• Estimation of allele frequencies per sample and overall (from which indices such as Nei's gene diversity can be directly calculated).
• Observed and expected heterozygosity per allele and sample. Expected heterozygosity is calculated using Levene's (1949) correction for small samples.
• $F_{IS}$ ($f$ in Weir and Cockerham's notation) estimated per sample over loci.
• $F_{IT}$ ($F$), $F_{ST}$ ($\theta$), and $F_{IS}$ ($f$) estimated per allele, locus, and globally over all samples. A fourth index has been added, a measure of Hamilton's (1971) relatedness, $r = 2F_{ST}/(1 + F_{IT})$, using the estimator given in Queller and Goodnight (1988). This measure is the average relatedness of individuals within samples when compared to the whole. It is often used in studies of social insects.
• Confidence intervals based on resampling schemes are provided: (1) Mean and variance of F statistics per locus, estimated from jackknifing over samples. (2) Mean and variance of F statistics over loci, estimated from jackknifing over loci. (3) Bootstrap confidence intervals of F statistics performed on the loci.
• Calculation of $F_{ST}$ per pair of samples. The output is a matrix of $F_{ST}$ values that can be used to carry out Slatkin's (1993) method to test for isolation by distance because $F_{ST}$ is closely related to a genetic distance (Reynolds et al. 1983). This matrix could also be used in Mantel tests (e.g., Manly 1985).
• Test of the significance of $F_{IS}$, $F_{ST}$, and $F_{IT}$ per locus and over all loci using permutations (Excoffier et al. 1992; Hudson et al. 1992; Manly 1991). The aim is to obtain the distribution of the statistic under the null hypothesis, namely that the F statistic under test is not greater than zero, and to compare this null distribution with the observed value. The probability of obtaining by chance a value as large or larger than the observed is given: (1) For $F_{IS}$, alleles are permuted among individuals within samples. (2) For $F_{IT}$, alleles are permuted among samples. (3) For $F_{ST}$, two types of tests can be carried out, depending on the results of the test on $F_{IS}$. If $F_{IS}$ is not significantly different from zero, it is valid to permute alleles among samples to test $F_{ST}$ because alleles can be considered as independent. If $F_{IS}$ is different from zero, however, alleles within individuals are not independent anymore, and the appropriate permutation units are the genotypes, to be permuted among samples.
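The multilocus estimates, the relatedness index, and the resampling over loci listed above can be sketched as follows. This is a minimal illustration and not the FSTAT code. It assumes that per-locus variance components `(a, b, c)` in Weir and Cockerham's (1984) notation have already been computed, that multilocus estimates are ratios of sums over loci, and that a simple percentile interval is used for the bootstrap. The numbers in `COMPONENTS` and all function names are hypothetical.

```python
import random

# Hypothetical per-locus variance components (a, b, c): among samples,
# among individuals within samples, and within individuals.
COMPONENTS = [
    (0.020, 0.010, 0.300),
    (0.035, 0.005, 0.280),
    (0.010, 0.020, 0.350),
    (0.050, 0.000, 0.250),
    (0.015, 0.012, 0.310),
]


def f_stats(components):
    """Return (F_IT, F_ST, F_IS) as ratios of sums over loci."""
    a = sum(x[0] for x in components)
    b = sum(x[1] for x in components)
    c = sum(x[2] for x in components)
    return (a + b) / (a + b + c), a / (a + b + c), b / (b + c)


def relatedness(f_st, f_it):
    """Relatedness index r = 2 F_ST / (1 + F_IT), as given in the text."""
    return 2.0 * f_st / (1.0 + f_it)


def jackknife_fst(components):
    """Jackknife over loci: return (bias-corrected estimate, variance)."""
    n = len(components)
    full = f_stats(components)[1]
    # Recompute F_ST with each locus left out in turn.
    loo = [f_stats(components[:i] + components[i + 1:])[1] for i in range(n)]
    mean_loo = sum(loo) / n
    estimate = n * full - (n - 1) * mean_loo
    variance = (n - 1) / n * sum((x - mean_loo) ** 2 for x in loo)
    return estimate, variance


def bootstrap_fst(components, n_boot=1000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for F_ST, resampling loci with replacement."""
    rng = random.Random(seed)
    n = len(components)
    reps = sorted(
        f_stats([rng.choice(components) for _ in range(n)])[1]
        for _ in range(n_boot)
    )
    lower = reps[int((alpha / 2) * n_boot)]
    upper = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


f_it, f_st, f_is = f_stats(COMPONENTS)
r = relatedness(f_st, f_it)
jack_estimate, jack_variance = jackknife_fst(COMPONENTS)
ci_low, ci_high = bootstrap_fst(COMPONENTS)
```

Resampling whole loci, rather than individual alleles, keeps each locus's variance components together, which is why the bootstrap and the second jackknife are only informative when several loci are available.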
These tests were developed to avoid the caveats of existing tests, such as those of Workman and Niswander (1970), which are based on $\chi^2$ and therefore rely on large samples (expected classes larger than 5). Raymond and Rousset (in press) generalized Fisher's exact test for Hardy-Weinberg equilibrium to among-sample differentiation. Some problems remain, however, with this test, because combining information from different loci is carried out using Fisher's procedure (Fisher 1954; Sokal and Rohlf 1981), which does not weight loci. Furthermore, their test for between-sample differentiation is based on the assumption that there is Hardy-Weinberg equilibrium within samples. If there is departure from it, then alleles within individuals are not independent, and the exact test for differentiation would lead to erroneous results. Permutations eliminate those caveats. Slatkin (1994) pointed out that, when studying population differentiation, it may be more appropriate to use $F_{ST}$, a statistic arising naturally, as a test statistic.
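A permutation test of differentiation with genotypes as the permutation units can be sketched as below. This is an illustration under stated assumptions, not the procedure as coded in FSTAT: to stay short, the test statistic is an uncorrected $1 - H_S/H_T$ rather than Weir and Cockerham's (1984) estimator, the p-value is the plain proportion of permuted values at least as large as the observed one, and the function names and the two-sample data set are hypothetical.

```python
import random
from collections import Counter


def gene_diversity(genotypes):
    """Expected heterozygosity 1 - sum(p_i^2) from a list of diploid genotypes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def fst(samples):
    """Uncorrected F_ST = 1 - H_S / H_T, with H_S weighted by sample size."""
    n_total = sum(len(s) for s in samples)
    h_s = sum(len(s) * gene_diversity(s) for s in samples) / n_total
    h_t = gene_diversity([g for s in samples for g in s])
    return 1.0 - h_s / h_t if h_t > 0 else 0.0


def permutation_test(samples, n_perm=999, seed=42):
    """Permute whole genotypes among samples; return (observed, p-value)."""
    rng = random.Random(seed)
    observed = fst(samples)
    pooled = [g for s in samples for g in s]
    sizes = [len(s) for s in samples]
    as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the shuffled genotypes into samples of the original sizes.
        permuted, start = [], 0
        for size in sizes:
            permuted.append(pooled[start:start + size])
            start += size
        if fst(permuted) >= observed:
            as_large += 1
    return observed, as_large / n_perm


# Hypothetical data: two samples of 10 diploid individuals at one locus.
sample_1 = [("A", "A")] * 8 + [("A", "B")] * 2
sample_2 = [("B", "B")] * 8 + [("A", "B")] * 2
observed_fst, p_value = permutation_test([sample_1, sample_2])
```

Because whole genotypes are shuffled, any within-individual association between alleles is preserved in every permuted data set, so the null distribution does not depend on Hardy-Weinberg equilibrium within samples.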
The random number generator proposed by L'Ecuyer (1988) was chosen for the bootstrap and permutation procedures. It combines two of the best multiplicative linear congruential generators known and has passed all the tests for random number generators.
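The idea of combining two multiplicative linear congruential generators can be sketched as follows. The moduli and multipliers are those commonly quoted for L'Ecuyer's (1988) 32-bit combined generator and should be checked against that paper before any serious use. The class name, default seeds, and the scaling by `M1` are assumptions made for this illustration, and the sketch is not taken from the FSTAT source.

```python
class CombinedMLCG:
    """Two multiplicative LCGs whose states are combined by subtraction."""

    M1, A1 = 2147483563, 40014  # modulus and multiplier of the first generator
    M2, A2 = 2147483399, 40692  # modulus and multiplier of the second generator

    def __init__(self, seed1=12345, seed2=67890):
        # Each seed must be a nonzero residue of its own modulus.
        if not (1 <= seed1 < self.M1 and 1 <= seed2 < self.M2):
            raise ValueError("seeds must lie in [1, M1 - 1] and [1, M2 - 1]")
        self.s1, self.s2 = seed1, seed2

    def random(self):
        """Return a pseudo-random float strictly between 0 and 1."""
        # Python integers do not overflow, so no Schrage decomposition is needed.
        self.s1 = (self.A1 * self.s1) % self.M1
        self.s2 = (self.A2 * self.s2) % self.M2
        z = self.s1 - self.s2
        if z < 1:
            z += self.M1 - 1
        return z / self.M1


generator = CombinedMLCG()
draws = [generator.random() for _ in range(1000)]
```

Combining the two streams gives a much longer period than either generator alone, which matters when many thousands of permutations are drawn in a single run.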
The format of the output file (tab separators) allows direct reading into many commercially available spreadsheets, facilitating printing and graphical representation of the data.
A real mode version of the program runs on 80286 (and above) PC compatibles. No coprocessor is required, but one will speed up calculations. A protected mode version will run on 80386 (and above) PC compatibles. Again, no coprocessor is required. This version uses all the available extended memory, therefore allowing the processing of larger data sets. The actual limits are:
• Number of samples: 200
• Number of loci: 50
• Number of alleles at the most polymorphic locus: 99
• Maximum number of individuals: 5,000
• Maximum number of permutations: 15,000
The program is also suited for haploid data and appropriately handles missing data, such as a locus missing completely from one sample. The program is distributed free of charge. It can be sent electronically in a Binhexed or unencoded format (requests should be sent to jerome.goudet@izea.unil.ch). Alternatively, it can be retrieved from the ftp server oracle.bangor.ac.uk after anonymous login, in the directory pub/fstat.
From the School of Biological Sciences, University of Wales, Bangor, U.K., and Institut de Zoologie et d'Ecologie Animale, Bat. Biologie, Université de Lausanne, Lausanne CH-1015, Switzerland. This work forms part of a research program into the evolutionary effect of gene flow and has been partly funded by the University of Wales, Bangor, the Department of Environment, U.K., and the Swiss National Science Foundation. It is a contribution to the Biodiversity Module of the Swiss Priority Program on the environment. I am indebted to Thierry DeMeeus for thorough checking of the program capabilities. Many thanks are due to Chris Gliddon, Michel Raymond, and Francois Rousset.
The Journal of Heredity 1995:86(6)
References
Chakraborty R and Danker-Hopfe H, 1991. Analysis of population structure: a comparative study of different estimators of Wright's fixation indices. In: Statistical methods in biological and medical sciences (Rao CR and Chakraborty R, eds). North Holland: Elsevier; 203-254.
Cockerham CC, 1969. Variance of gene frequencies. Evolution 23:72-84.
Cockerham CC, 1973. Analysis of gene frequencies. Genetics 74:679-700.
Cockerham CC and Weir BS, 1986. Estimation of inbreeding parameters in stratified populations. Ann Hum Genet 50:271-281.
Cockerham CC and Weir BS, 1987. Correlations, descent measures: drift with migration and mutation. Proc Natl Acad Sci USA 84:8512-8514.
Cockerham CC and Weir BS, 1993. Estimation of gene-flow from F-statistics. Evolution 47:855-863.
Excoffier L, Smouse PE, and Quattro JM, 1992. Analysis of molecular variance inferred from metric distances among DNA haplotypes. Application to human mitochondrial DNA restriction data. Genetics 131:479-491.
Fisher RA, 1954. Statistical methods for research workers, 12th ed. Edinburgh: Oliver and Boyd.
Hamilton WD, 1971. Selection of selfish and altruistic behavior in some extreme models. In: Man and beast: comparative social behavior (Eisenberg JF and Dillon WS, eds). Washington, D.C.: Smithsonian Institute Press; 57-91.
Hudson RR, Boos DD, and Kaplan NL, 1992. A statistical test to detect geographic subdivision. Mol Biol Evol 9:138-151.
L'Ecuyer P, 1988. Efficient and portable random number generators. Commun ACM 31:147-157.
Levene H, 1949. On a matching problem arising in genetics. Ann Math Stat 20:91-94.
Manly BJF, 1985. The statistics of natural selection. London: Chapman and Hall.
Manly BJF, 1991. Randomization and Monte Carlo methods in biology. London: Chapman and Hall.
Nei M, 1973. Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci USA 70:3321-3323.
Nei M, 1977. F-statistics and analysis of gene diversity in subdivided populations. Ann Hum Genet 41:225-233.
Nei M, 1986. Definition and estimation of fixation indices. Evolution 40:643-645.
Nei M and Chesser RK, 1983. Estimation of fixation indices and gene diversities. Ann Hum Genet 47:253-259.
Queller DC and Goodnight KF, 1988. Estimating relatedness using genetic markers. Evolution 43:258-275.
Raymond M and Rousset F, in press. An exact test for population differentiation. Evolution.
Reynolds J, Weir BS, and Cockerham CC, 1983. Estimation of the coancestry coefficient: basis for a short-term genetic distance. Genetics 105:767-779.
Slatkin M, 1993. Isolation by distance in equilibrium and non-equilibrium populations. Evolution 47:264-279.
Slatkin M, 1994. An exact test for neutrality based on the Ewens sampling distribution. Genet Res 64:71-74.
Sokal RR and Rohlf FJ, 1981. Biometry. New York: Freeman.
Swofford DL and Selander RB, 1981. BIOSYS-1: a FORTRAN program for the comprehensive analysis for electrophoretic data in population genetics and systematics. J Hered 72:281-283.
Weir BS, 1990. Genetic data analysis. Sunderland, Massachusetts: Sinauer.
Weir BS and Cockerham CC, 1984. Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370.
Workman PL and Niswander JD, 1970. Population studies on Southwestern Indian tribes. II. Local genetic differentiation in the Papago. Am J Hum Genet 22:24-49.
Wright S, 1943. Isolation by distance. Genetics 28:114-138.
Wright S, 1951. The genetical structure of populations. Ann Eugen 15:323-354.
Received August 30, 1994
Accepted March 14, 1995
Corresponding Editor: Stephen O'Brien