Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study.
Citations
STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method
CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure
Discriminant analysis of principal components: a new method for the analysis of genetically structured populations
Inferring weak population structure with the assistance of sample group information.
Clumpak: a program for identifying clustering modes and packaging population structure inferences across K
References
The neighbor-joining method: a new method for reconstructing phylogenetic trees.
Inference of population structure using multilocus genotype data
AFLP: a new technique for DNA fingerprinting.
Evolution in Mendelian Populations.
Inference of Population Structure Using Multilocus Genotype Data: Linked Loci and Correlated Allele Frequencies
Related Papers (5)
Inference of population structure using multilocus genotype data
STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method
GenAlEx 6: genetic analysis in Excel. Population genetic software for teaching and research
Frequently Asked Questions (13)
Q2. What is the predictor of the real number of clusters?
The authors find that ∆K, an ad hoc quantity related to the second-order rate of change of the log probability of the data with respect to the number of clusters, is a good predictor of the real number of clusters.
Q3. What other markers are commonly used in population studies?
An alternative family of markers also commonly used in population studies is amplified fragment length polymorphisms (AFLPs) (Vos et al. 1995).
Q4. What is the expected value of FST between archipelagos?
The expected value of FST is 0.30 between archipelagos (FArchipelago-Total), 0.16 between islands within archipelagos (FIsland-Archipelago), and 0.41 overall (FIsland-Total).
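As a consistency check on these three values, and assuming the standard multiplicative relation between hierarchical F-statistics (an assumption brought in here, not something stated in the answer above), the overall value can be recovered from the two nested ones:

```latex
\[
\begin{aligned}
1 - F_{\text{Island-Total}}
  &= \bigl(1 - F_{\text{Island-Archipelago}}\bigr)\bigl(1 - F_{\text{Archipelago-Total}}\bigr) \\
  &= (1 - 0.16)(1 - 0.30) = 0.588, \\
F_{\text{Island-Total}} &= 1 - 0.588 = 0.412 \approx 0.41 .
\end{aligned}
\]
```

Under that assumption the three reported values agree to two decimal places.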
Q5. What is the goal of this study?
The goal of this study is to test the ability of the algorithm underlying the software STRUCTURE to detect the number of clusters in situations including more than two populations.
Q6. Why did the authors divide m(|L′′(K)|) by s[L(K)]?
The authors divided m(|L′′(K)|) by s[L(K)] because they found a clear and general trend toward an increase in the variance of L(K) between runs as K increased.
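The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes m(.) is the mean over replicate runs, s[.] is the sample standard deviation over runs, and L′′(K) is the second difference L(K+1) − 2L(K) + L(K−1). The function name `delta_k`, the pairing of runs by index across values of K, and the log-probability values are all hypothetical.

```python
from statistics import mean, stdev

def delta_k(lnp):
    """Return {K: Delta K} for every K that has both neighbours K-1 and K+1.

    lnp maps K -> list of L(K) values, one per replicate run.
    Assumption: run i at K-1, K and K+1 are paired by list position.
    """
    result = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue  # the second difference is undefined at the ends
        prev, cur, nxt = lnp[k - 1], lnp[k], lnp[k + 1]
        # |L''(K)| per run: absolute second difference of L(K)
        second = [abs(n - 2 * c + p) for p, c, n in zip(prev, cur, nxt)]
        s = stdev(cur)  # sample standard deviation of L(K) across runs
        if s == 0:
            continue  # ratio undefined when all runs give the same L(K)
        result[k] = mean(second) / s
    return result

# Hypothetical L(K) values for three replicate runs at K = 1..5.
runs = {
    1: [-5000.0, -5010.0, -4990.0],
    2: [-4500.0, -4505.0, -4495.0],
    3: [-4000.0, -4002.0, -3998.0],
    4: [-3990.0, -4010.0, -3970.0],
    5: [-3985.0, -4030.0, -3950.0],
}

dk = delta_k(runs)
best_k = max(dk, key=dk.get)  # K at the modal value of Delta K
print(dk, best_k)
```

In this made-up data L(K) keeps creeping upward after K = 3 while its spread between runs grows, so the maximum of L(K) alone would be ambiguous; dividing by the between-run standard deviation leaves a single clear peak at K = 3.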
Q7. What is the name of the alternative model-based method?
An alternative model-based method developed recently by Pritchard et al. (2000) and implemented in the software STRUCTURE aims at delineating clusters of individuals on the basis of their genotypes at multiple loci using a Bayesian approach.
Q8. Why did the authors restrict their simulations to cases of moderate to strong structure?
The authors restricted their simulations to cases of moderate to strong structure at different hierarchical levels because their goal was to test the ability of the algorithm to detect the number of groups of individuals in situations when different layers of population structure exist, as is often the case in real situations.
Q9. What is the common criterion used to identify the true number of populations?
The true number of populations (K) is often identified using the maximal value of L(K) returned by STRUCTURE (Zeisset & Beebee 2001; Ciofi et al.
Q10. How many clusters of individuals can be detected with the AFLP?
Few nonhuman species could be genotyped with such intensity, but this study indicates that the correct number of clusters can still be detected when differentiation is weaker than in the main simulations, and this was confirmed by further limited simulations with FST among archipelagos as low as 3.8% (see above).
Q11. What was the effect of subsampling of individuals or loci?
Subsampling of individuals or loci reduced the height of the modal value of ∆K (Fig. 4G, H), and 10 AFLPs produced a weaker signal than one microsatellite: on average, the height of the modal value of ∆K for the former was about half that for the latter.
Q12. What is the expected value of FST for this model?
The expected value of FST for this model cannot easily be derived analytically, but the global FST estimated over the 10 replicates (10 times 100 microsatellite loci) is 0.33, and pairwise FST values range from 0.16 to 0.43.
Q13. How many individuals did Rosenberg et al. (2002) find in the data set?
Rosenberg et al. (2002) showed empirically, on a very large microsatellite data set (377 loci) encompassing 1026 individuals from the five continents, that humans cluster into five groups, loosely corresponding to the five continents.