Open Access Journal Article

Clumpak: a program for identifying clustering modes and packaging population structure inferences across K

TLDR
Clumpak, available at http://clumpak.tau.ac.il, automates the postprocessing of results from model-based population structure analyses, simplifying their use in population genetics and molecular ecology.
Abstract
The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modelling assumptions, compares results across different predetermined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present CLUMPAK (Cluster Markov Packager Across K), a method that automates the postprocessing of results of model-based population structure analyses. For analysing multiple independent runs at a single K value, CLUMPAK identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software CLUMPP. Next, CLUMPAK identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in CLUMPP and simplifying the comparison of clustering results across different K values. CLUMPAK incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. CLUMPAK, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology.
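The mode-identification step described above can be illustrated with a minimal sketch of Markov clustering applied to a run-by-run similarity matrix. This is not CLUMPAK's own code: the function names, the similarity cutoff (`threshold`), the inflation value, the fixed iteration count, and the example matrix are all assumptions made for illustration. In practice the pairwise similarities would come from CLUMPP rather than being typed in by hand.

```python
def normalize_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic form)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(similarity, threshold=0.8, inflation=2.0, iterations=50):
    """Group runs into modes with a basic Markov clustering loop.

    `similarity` is a symmetric matrix of pairwise run similarities.
    The cutoff, inflation, and iteration count are illustrative
    assumptions, not values taken from CLUMPAK.
    """
    n = len(similarity)
    # Build the graph: keep only edges at or above the cutoff, add self-loops.
    graph = [[1.0 if i == j
              else (similarity[i][j] if similarity[i][j] >= threshold else 0.0)
              for j in range(n)] for i in range(n)]
    flow = normalize_columns(graph)
    for _ in range(iterations):
        flow = matmul(flow, flow)  # expansion: spread flow along paths
        flow = normalize_columns(  # inflation: strengthen strong flows
            [[x ** inflation for x in row] for row in flow])
    # Runs that still exchange flow belong to the same mode.
    label = list(range(n))
    for i in range(n):
        for j in range(n):
            if flow[i][j] > 1e-6 and label[i] != label[j]:
                old, new = label[j], label[i]
                label = [new if l == old else l for l in label]
    groups = {}
    for run, l in enumerate(label):
        groups.setdefault(l, []).append(run)
    return sorted(groups.values())


# Hypothetical similarities for five replicate runs at one value of K.
example = [
    [1.00, 0.95, 0.93, 0.40, 0.42],
    [0.95, 1.00, 0.96, 0.41, 0.39],
    [0.93, 0.96, 1.00, 0.43, 0.40],
    [0.40, 0.41, 0.43, 1.00, 0.97],
    [0.42, 0.39, 0.40, 0.97, 1.00],
]
print(markov_cluster(example))  # [[0, 1, 2], [3, 4]]
```

In this toy input, runs 0 to 2 and runs 3 to 4 form two modes. A consensus solution for each mode could then be obtained by averaging the aligned membership matrices of the runs in that group.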

