
FSTAT (Version 1.2): A Computer Program to Calculate F-Statistics

01 Nov 1995-Journal of Heredity (Oxford University Press)-Vol. 86, Iss: 6, pp 485-486

About: This article was published in the Journal of Heredity on 1995-11-01 and is currently open access. It has received 7585 citations to date.


Computer Note


J. Goudet

Computation of Wright's (1943, 1951) fixation indices ($F_{IS}$, $F_{ST}$, and $F_{IT}$) is widespread among population biologists to assess genetic differentiation of populations. $F_{IS}$ is a measure of the within-population heterozygote deficit, $F_{ST}$ is a measure of the among-population heterozygote deficit (the Wahlund effect), and $F_{IT}$ is a measure of the global heterozygote deficit. These indices can be defined as the correlation between uniting gametes (Wright 1943, 1951), as a function of gene diversity in the total population (Nei 1973, 1977; Nei and Chesser 1983), or as a function of variance components from a nested analysis of variance (Cockerham 1969, 1973; Cockerham and Weir 1986; Weir and Cockerham 1984). Deriving unbiased estimators of these quantities has been at the center of an argument between Nei on the one hand (Nei 1986; Nei and Chesser 1983) and Weir and Cockerham (Cockerham 1973; Cockerham and Weir 1986, 1987) on the other. An in-depth review of the properties of both families of estimators can be found in Chakraborty and Danker-Hopfe (1991). Cockerham and Weir (1993) presented an interpretation of the differences between Nei's estimator of $F_{ST}$, $G_{ST}$, and their own, $\theta$, in terms of the probabilities of identity by descent.
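As a rough illustration of how the three indices relate to one another, the following Python sketch uses the gene-diversity formulation (Nei 1977), in which each index is a proportional heterozygote deficit. It is not the Weir and Cockerham (1984) estimator and applies no sampling correction; the function name `f_statistics` and the heterozygosity values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def f_statistics(h_obs, h_within, h_total):
    """Fixation indices as proportional heterozygote deficits.

    h_obs    : mean observed heterozygosity within samples (H_O)
    h_within : mean expected heterozygosity within samples (H_S)
    h_total  : expected heterozygosity in the pooled population (H_T)
    """
    f_is = 1.0 - h_obs / h_within    # deficit within samples
    f_st = 1.0 - h_within / h_total  # deficit among samples (Wahlund effect)
    f_it = 1.0 - h_obs / h_total     # global deficit
    return f_is, f_st, f_it


# Hypothetical heterozygosities, for illustration only
f_is, f_st, f_it = f_statistics(h_obs=0.30, h_within=0.40, h_total=0.50)
print(f"F_IS = {f_is:.2f}, F_ST = {f_st:.2f}, F_IT = {f_it:.2f}")

# In this formulation the indices are linked by (1 - F_IT) = (1 - F_IS)(1 - F_ST)
assert abs((1 - f_it) - (1 - f_is) * (1 - f_st)) < 1e-12
```

With these inputs the indices come out near 0.25, 0.20, and 0.40, so a global deficit of 40% decomposes into a within-sample and an among-sample component.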

The most often used package for analysis of allele and genotype frequencies is Swofford and Selander's (1981) BIOSYS-1. Although very useful for calculation of genetic distances and building of phylogenetic trees, its module on F statistics estimation is outdated, because it is based on Nei's (1977) article, which does not take sampling effects into account. I propose a PASCAL program that performs the calculation of Weir and Cockerham's (1984) estimators of F statistics, based on the Fortran listing published in Weir's (1990) Genetic Data Analysis, with several new features.

The Program FSTAT

The program FSTAT performs the following:

• Estimated frequency of alleles per sample and overall (from which indices such as Nei's gene diversity can be directly calculated).

• Observed and expected heterozygosity per allele and sample. Expected heterozygosity is calculated using Levene's (1949) correction for small samples.

• $F_{IS}$ ($f$ in Weir and Cockerham's notation) estimated per sample over loci.

• $F_{IT}$ ($F$), $F_{ST}$ ($\theta$), and $F_{IS}$ ($f$) estimated per allele, locus, and globally over all samples. A fourth index has been added, a measure of Hamilton's (1971) relatedness, $r = 2F_{ST}/(1 + F_{IT})$, using the estimator given in Queller and Goodnight (1988). This measure is the average relatedness of individuals within samples when compared to the whole. It is often used in studies of social insects.

• Confidence intervals based on resampling schemes are provided: (1) Mean and variance of F statistics per locus, estimated from jackknifing over samples. (2) Mean and variance of F statistics over loci, estimated from jackknifing over loci. (3) Bootstrap confidence intervals of F statistics performed on the loci.

• Calculation of $F_{ST}$ per pair of samples. The output is a matrix of $F_{ST}$ values that can be used to carry out Slatkin's (1993) method to test for isolation by distance because $F_{ST}$ is closely related to a genetic distance (Reynolds et al. 1983). This matrix could also be used in Mantel tests (e.g., Manly 1985).

• Test of the significance of $F_{IT}$, $F_{ST}$, and $F_{IS}$ per locus and over all loci using permutations (Excoffier et al. 1992; Hudson et al. 1992; Manly 1991). The aim is to obtain the distribution of the statistic under the null hypothesis, namely that the F statistic is not greater than 0, and to compare this null distribution with the observed value. The probability of obtaining by chance a value as large or larger than the observed is given: (1) For $F_{IS}$, alleles are permuted among individuals within samples. (2) For $F_{IT}$, alleles are permuted among samples. (3) For $F_{ST}$, two types of tests can be carried out, depending on the results of the test on $F_{IS}$. If $F_{IS}$ is not significantly different from zero, it is valid to permute alleles among samples to test $F_{ST}$, because alleles can be considered as independent. If $F_{IS}$ is different from zero, however, alleles within individuals are no longer independent, and the appropriate permutation units are the genotypes, to be permuted among samples.
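The permutation logic for among-sample differentiation, in the case where alleles can be treated as independent, can be sketched in Python as follows. This is an illustration under stated assumptions, not FSTAT's Pascal code: the statistic `simple_fst` is a stand-in (1 - H_S/H_T, without sampling corrections) rather than Weir and Cockerham's estimator, all function names are hypothetical, and counting the observed arrangement among the permutations is one common convention that the text does not specify.

```python
import random
from collections import Counter


def gene_diversity(alleles):
    """Return 1 - sum(p_i^2) for a list of allele labels."""
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())


def simple_fst(samples):
    """Stand-in statistic 1 - H_S / H_T (not Weir and Cockerham's theta)."""
    pooled = [a for s in samples for a in s]
    h_total = gene_diversity(pooled)
    if h_total == 0.0:  # a single allele overall: nothing to measure
        return 0.0
    # Within-sample diversity, weighted by sample size
    h_within = sum(len(s) * gene_diversity(s) for s in samples) / len(pooled)
    return 1.0 - h_within / h_total


def permutation_test(samples, n_perm=1000, seed=1):
    """Permute alleles among samples; return (observed, one-sided P value)."""
    rng = random.Random(seed)
    observed = simple_fst(samples)
    pooled = [a for s in samples for a in s]
    sizes = [len(s) for s in samples]
    as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        start, shuffled = 0, []
        for size in sizes:  # rebuild samples with their original sizes
            shuffled.append(pooled[start:start + size])
            start += size
        if simple_fst(shuffled) >= observed:
            as_large += 1
    # Assumption: the observed arrangement counts as one of the permutations
    return observed, (as_large + 1) / (n_perm + 1)


# Hypothetical data: allele labels at one locus in two samples
sample_a = ["1"] * 18 + ["2"] * 2
sample_b = ["1"] * 4 + ["2"] * 16
observed, p_value = permutation_test([sample_a, sample_b])
print(f"statistic = {observed:.3f}, P = {p_value:.4f}")
```

When the within-sample test indicates a departure from zero, the same loop would shuffle whole genotypes (pairs of alleles) instead of single alleles, so that the association of alleles within individuals is preserved.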

These tests were developed to avoid the caveats of existing tests, such as those of Workman and Niswander (1970), based on $\chi^2$ and therefore relying on large samples (expected classes larger than 5). Raymond and Rousset (in press) generalized Fisher's exact test for Hardy-Weinberg equilibrium to among-sample differentiation. Some problems remain, however, with this test, because combining information from different loci is carried out using Fisher's procedure (Fisher 1954; Sokal and Rohlf 1981), which does not weight loci. Furthermore, their test for between-sample differentiation is based on the assumption that there is Hardy-Weinberg equilibrium within samples. If there is departure from it, then alleles within individuals are not independent, and the exact test for differentiation would lead to erroneous results. Permutations eliminate those caveats.
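For readers unfamiliar with Fisher's procedure for combining probabilities, a minimal Python sketch follows. It combines k independent P values through X2 = -2 Σ ln(p_i), referred to a chi-square distribution with 2k degrees of freedom; every locus enters with the same weight whatever its sample size or polymorphism, which is the limitation noted above. The function name and the example P values are hypothetical.

```python
import math


def fisher_combined(p_values):
    """Return (X2, combined P) for independent P values, each in (0, 1]."""
    k = len(p_values)
    x2 = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom 2k the chi-square upper tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = x2 / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return x2, math.exp(-half) * total


# Hypothetical per-locus P values from three loci
x2, p_combined = fisher_combined([0.04, 0.20, 0.65])
print(f"X2 = {x2:.3f} on 6 df, combined P = {p_combined:.4f}")
```

Because only the P values are passed in, a locus typed in 10 individuals and one typed in 1,000 contribute identically, which is what "does not weight loci" means in practice.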

Slatkin (1994) pointed out that, when studying population differentiation, it may be more appropriate to use $F_{ST}$, a statistic arising naturally, as a test statistic.

The random number generator proposed by L'Ecuyer (1988) was chosen for the bootstrap and permutation procedures. It combines two of the best multiplicative linear congruential generators known and has passed all the tests for random number generators.
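The idea of combining two multiplicative linear congruential generators can be sketched in Python as below. The moduli and multipliers are those commonly quoted for L'Ecuyer's (1988) 32-bit combined generator; whether FSTAT uses exactly these constants, and how it seeds them, is not stated in the text, so the class name, default seeds, and seeding checks are assumptions made for illustration.

```python
class CombinedMLCG:
    """Sketch of a combined generator built from two multiplicative LCGs."""

    M1, A1 = 2147483563, 40014  # first generator: s1 <- A1 * s1 mod M1
    M2, A2 = 2147483399, 40692  # second generator: s2 <- A2 * s2 mod M2

    def __init__(self, seed1=12345, seed2=67890):
        # Seeds must be nonzero residues, otherwise a stream sticks at zero
        if not (1 <= seed1 < self.M1 and 1 <= seed2 < self.M2):
            raise ValueError("seeds must satisfy 1 <= seed < modulus")
        self.s1, self.s2 = seed1, seed2

    def random(self):
        """Return a pseudo-random float strictly between 0 and 1."""
        # Python integers do not overflow, so no Schrage decomposition is needed
        self.s1 = (self.A1 * self.s1) % self.M1
        self.s2 = (self.A2 * self.s2) % self.M2
        z = self.s1 - self.s2
        if z < 1:
            z += self.M1 - 1  # fold the difference back into 1 .. M1 - 1
        return z / self.M1


gen = CombinedMLCG()
draws = [gen.random() for _ in range(5)]
print(draws)
```

Combining the two streams yields a period far longer than either component alone, which matters when thousands of permutations each consume many random draws.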

The format of the output file (tab separators) allows direct reading into many commercially available spreadsheets, facilitating printing and graphical representation of the data.

A real mode version of the program runs on 80286 (and above) PC compatibles. No coprocessor is required, but one will speed up calculations. A protected mode version will run on 80386 (and above) PC compatibles. Again, no coprocessor is required. This version uses all the available extended memory, therefore allowing the processing of larger data sets. The actual limits are:

• Number of samples: 200

• Number of loci: 50

• Number of alleles at the most polymorphic locus: 99

• Maximum number of individuals: 5,000

• Maximum number of permutations: 15,000

The program is also suited for haploid data and appropriately handles missing data, such as a locus missing completely from one sample. The program is distributed free of charge. It can be sent electronically in a Binhexed or uuencoded format (requests should be sent to jerome.goudet@izea.unil.ch). Alternatively, it can be retrieved from the ftp server oracle.bangor.ac.uk after anonymous login, in the directory pub/fstat.

From the School of Biological Sciences, University of Wales, Bangor, U.K., and Institut de Zoologie et d'Ecologie Animale, Bât. Biologie, Université de Lausanne, Lausanne CH-1015, Switzerland. This work forms part of a research program into the evolutionary effect of gene flow and has been partly funded by the University of Wales, Bangor, the Department of Environment, U.K., and the Swiss National Science Foundation. It is a contribution to the Biodiversity Module of the Swiss Priority Program on the environment. I am indebted to Thierry DeMeeus for thorough checking of the program capabilities. Many thanks are due to Chris Gliddon, Michel Raymond, and Francois Rousset.

The Journal of Heredity 1995:86(6)

References

Chakraborty R and Danker-Hopfe H, 1991. Analysis of population structure: a comparative study of different estimators of Wright's fixation indices. In: Statistical methods in biological and medical sciences (Rao CR and Chakraborty R, eds). North Holland: Elsevier; 203-254.

Cockerham CC, 1969. Variance of gene frequencies. Evolution 23:72-84.

Cockerham CC, 1973. Analysis of gene frequencies. Genetics 74:679-700.

Cockerham CC and Weir BS, 1986. Estimation of inbreeding parameters in stratified populations. Ann Hum Genet 50:271-281.

Cockerham CC and Weir BS, 1987. Correlations, descent measures: drift with migration and mutation. Proc Natl Acad Sci USA 84:8512-8514.

Cockerham CC and Weir BS, 1993. Estimation of gene flow from F-statistics. Evolution 47:855-863.

Excoffier L, Smouse PE, and Quattro JM, 1992. Analysis of molecular variance inferred from metric distances among DNA haplotypes. Application to human mitochondrial DNA restriction data. Genetics 131:479-491.

Fisher RA, 1954. Statistical methods for research workers, 12th ed. Edinburgh: Oliver and Boyd.

Hamilton WD, 1971. Selection of selfish and altruistic behavior in some extreme models. In: Man and beast: comparative social behavior (Eisenberg JF and Dillon WS, eds). Washington, D.C.: Smithsonian Institute Press; 57-91.

Hudson RR, Boos DD, and Kaplan NL, 1992. A statistical test to detect geographic subdivision. Mol Biol Evol 9:138-151.

L'Ecuyer P, 1988. Efficient and portable random number generators. Commun ACM 31:147-157.

Levene H, 1949. On a matching problem arising in genetics. Ann Math Stat 20:91-94.

Manly BJF, 1985. The statistics of natural selection. London: Chapman and Hall.

Manly BJF, 1991. Randomization and Monte Carlo methods in biology. London: Chapman and Hall.

Nei M, 1973. Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci USA 70:3321-3323.

Nei M, 1977. F-statistics and analysis of gene diversity in subdivided populations. Ann Hum Genet 41:225-233.

Nei M, 1986. Definition and estimation of fixation indices. Evolution 40:643-645.

Nei M and Chesser RK, 1983. Estimation of fixation indices and gene diversities. Ann Hum Genet 47:253-259.

Queller DC and Goodnight KF, 1988. Estimating relatedness using genetic markers. Evolution 43:258-275.

Raymond M and Rousset F, in press. An exact test for population differentiation. Evolution.

Reynolds J, Weir BS, and Cockerham CC, 1983. Estimation of the coancestry coefficient: basis for a short-term genetic distance. Genetics 105:767-779.

Slatkin M, 1993. Isolation by distance in equilibrium and non-equilibrium populations. Evolution 47:264-279.

Slatkin M, 1994. An exact test for neutrality based on the Ewens sampling distribution. Genet Res 64:71-74.

Sokal RR and Rohlf FJ, 1981. Biometry. New York: Freeman.

Swofford DL and Selander RB, 1981. BIOSYS-1: a FORTRAN program for the comprehensive analysis of electrophoretic data in population genetics and systematics. J Hered 72:281-283.

Weir BS, 1990. Genetic data analysis. Sunderland, Massachusetts: Sinauer.

Weir BS and Cockerham CC, 1984. Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370.

Workman PL and Niswander JD, 1970. Population studies on Southwestern Indian tribes. II. Local genetic differentiation in the Papago. Am J Hum Genet 22:24-49.

Wright S, 1943. Isolation by distance. Genetics 28:114-138.

Wright S, 1951. The genetical structure of populations. Ann Eugen 15:323-354.

Received August 30, 1994

Accepted March 14, 1995

Corresponding Editor: Stephen O'Brien
