Journal ArticleDOI

The K = 2 conundrum.

TL;DR: This review suggests that many studies may have been over‐ or underestimating population genetic structure; both scenarios have serious consequences, particularly with respect to conservation and management.
Abstract: Assessments of population genetic structure have become an increasing focus as they can provide valuable insight into patterns of migration and gene flow. STRUCTURE, the most highly cited of several clustering-based methods, was developed to provide robust estimates without the need for populations to be determined a priori. STRUCTURE introduces the problem of selecting the optimal number of clusters, and as a result, the ΔK method was proposed to assist in the identification of the "true" number of clusters. In our review of 1,264 studies using STRUCTURE to explore population subdivision, studies that used ΔK were more likely to identify K = 2 (54%, 443/822) than studies that did not use ΔK (21%, 82/386). A troubling finding was that very few studies performed the hierarchical analysis recommended by the authors of both ΔK and STRUCTURE to fully explore population subdivision. Furthermore, extensions of earlier simulations indicate that, with a representative number of markers, ΔK frequently identifies K = 2 as the top level of hierarchical structure, even when more subpopulations are present. This review suggests that many studies may have been over- or underestimating population genetic structure; both scenarios have serious consequences, particularly with respect to conservation and management. We recommend publication standards for population structure results so that readers can assess the implications of the results given their own understanding of the species biology.
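The two percentages quoted in the abstract can be reproduced from the counts it gives. The short sketch below is only an arithmetic check; the function name `k2_share` and the dictionary labels are illustrative and do not come from the paper.

```python
# Counts taken from the abstract: (studies reporting K = 2, studies in group).
counts = {
    "used ΔK": (443, 822),
    "did not use ΔK": (82, 386),
}


def k2_share(hits, total):
    """Return the percentage of studies in a group that reported K = 2."""
    return 100.0 * hits / total


for label, (hits, total) in counts.items():
    print(f"{label}: {k2_share(hits, total):.1f}% ({hits}/{total})")

# Ratio of the two shares, as a rough measure of how much more often
# K = 2 was reported when ΔK was used.
ratio = k2_share(443, 822) / k2_share(82, 386)
print(f"ratio of shares: {ratio:.2f}")
```

Rounded to whole numbers, the two shares match the 54% and 21% reported in the abstract.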
Citations
Journal ArticleDOI
TL;DR: An approach is implemented to assess the goodness of fit of the model using the ancestry “palettes” estimated by CHROMOPAINTER and apply it to both simulated data and real case studies, allowing a richer and more robust analysis of recent demographic history.
Abstract: Genetic clustering algorithms, implemented in programs such as STRUCTURE and ADMIXTURE, have been used extensively in the characterisation of individuals and populations based on genetic data. A successful example is the reconstruction of the genetic history of African Americans as a product of recent admixture between highly differentiated populations. Histories can also be reconstructed using the same procedure for groups that do not have admixture in their recent history, where recent genetic drift is strong or that deviate in other ways from the underlying inference model. Unfortunately, such histories can be misleading. We have implemented an approach, badMIXTURE, to assess the goodness of fit of the model using the ancestry “palettes” estimated by CHROMOPAINTER and apply it to both simulated data and real case studies. Combining these complementary analyses with additional methods that are designed to test specific hypotheses allows a richer and more robust analysis of recent demographic history.

324 citations

Journal ArticleDOI
TL;DR: The results suggest that hybridization with P. theophrasti was of central importance in the diversification history of the cultivated date palm and a survey of Phoenix remains in the archaeobotanical record supports a late arrival of date palm to North Africa.
Abstract: Date palm (Phoenix dactylifera L.) is a major fruit crop of arid regions that was domesticated ∼7,000 y ago in the Near or Middle East. This species is cultivated widely in the Middle East and North Africa, and previous population genetic studies have shown genetic differentiation between these regions. We investigated the evolutionary history of P. dactylifera and its wild relatives by resequencing the genomes of date palm varieties and five of its closest relatives. Our results indicate that the North African population has mixed ancestry with components from Middle Eastern P. dactylifera and Phoenix theophrasti, a wild relative endemic to the Eastern Mediterranean. Introgressive hybridization is supported by tests of admixture, reduced subdivision between North African date palm and P. theophrasti, sharing of haplotypes in introgressed regions, and a population model that incorporates gene flow between these populations. Analysis of ancestry proportions indicates that as much as 18% of the genome of North African varieties can be traced to P. theophrasti and a large percentage of loci in this population are segregating for single-nucleotide polymorphisms (SNPs) that are fixed in P. theophrasti and absent from date palm in the Middle East. We present a survey of Phoenix remains in the archaeobotanical record which supports a late arrival of date palm to North Africa. Our results suggest that hybridization with P. theophrasti was of central importance in the diversification history of the cultivated date palm.

93 citations


Cites methods from "The K = 2 conundrum."

  • ...Finally, we conducted a set of analyses restricted to species pairs (“hierarchical” analysis) (31)....

    [...]

Journal ArticleDOI
04 Aug 2020-Heredity
TL;DR: It was found that with a priori groupings, distance between genetic clusters reflected underlying FST, and when migration rates were high and groups were described de novo there was considerable inaccuracy, both in terms of the number of genetic clusters suggested and placement of individuals into those clusters.
Abstract: Inference of genetic clusters is a key aim of population genetics, sparking development of numerous analytical methods. Within these, there is a conceptual divide between finding de novo structure versus assessment of a priori groups. The recently developed Discriminant Analysis of Principal Components (DAPC) combines discriminant analysis (DA) with principal component (PC) analysis. When applying DAPC, the groups used in the DA (specified a priori or described de novo) need to be carefully assessed. While DAPC has rapidly become a core technique, the sensitivity of the method to misspecification of groups, and how it is being applied empirically, are unknown. To address this, we conducted a simulation study examining the influence of a priori versus de novo group designations, and a literature review of how DAPC is being applied. We found that with a priori groupings, distance between genetic clusters reflected underlying FST. However, when migration rates were high and groups were described de novo there was considerable inaccuracy, both in terms of the number of genetic clusters suggested and placement of individuals into those clusters. Nearly all (90.1%) of 224 studies surveyed used DAPC to find de novo clusters, and for the majority (62.5%) the stated goal matched the results. However, most studies (52.3%) omit key run parameters, preventing repeatability and transparency. Therefore, we present recommendations for standard reporting of parameters used in DAPC analyses. The influence of groupings in genetic clustering is not unique to DAPC, and researchers need to consider their goal and which methods will be most appropriate.

65 citations


Cites background or methods from "The K = 2 conundrum."

  • ...…being introduced (e.g., Bradburd et al. 2018; Wang 2019) and best practices for others refined (e.g., Gilbert et al. 2012; Verity and Nichols 2016; Janes et al. 2017; Cullingham et al. 2020) researchers are turning to a “total evidence approach,” using multiple analysis methods on their data....

    [...]

  • ...This number of loci was chosen as it was the average number seen in a previous review of papers applying STRUCTURE for determining genetic clusters (Janes et al. 2017)....

    [...]

  • ...However, similar to what has been done for other clustering methods (Latch et al. 2006; Patterson et al. 2006; Janes et al. 2017; Cullingham et al. 2020), exploration of more migration scenarios with different numbers of sampled individuals and loci will be needed to firmly establish a detection threshold....

    [...]

  • ...This reporting has likely been spurred after a period where best practices were developed and discussed in the literature (Pritchard et al. 2000; Evanno et al. 2005; Gilbert et al. 2012; Puechmaille 2016; Janes et al. 2017; Wang 2017; Cullingham et al. 2020)....

    [...]

Journal ArticleDOI
TL;DR: The findings indicate that invasive species might be repeatedly introduced from their native range, and they emphasize the importance of multiple, human‐mediated introductions in successful invasions.
Abstract: Retracing introduction routes is crucial for understanding the evolutionary processes involved in an invasion, as well as for highlighting the invasion history of a species at the global scale. The Asian long-horned beetle (ALB) Anoplophora glabripennis is a xylophagous pest native to Asia and invasive in North America and Europe. It is responsible for severe losses of urban trees, in both its native and invaded ranges. Based on historical and genetic data, several hypotheses have been formulated concerning its invasion history, including the possibility of multiple introductions from the native zone and secondary dispersal within the invaded areas, but none have been formally tested. In this study, we characterized the genetic structure of ALB in both its native and invaded ranges using microsatellites. In order to test different invasion scenarios, we used an approximate Bayesian "random forest" algorithm together with traditional population genetics approaches. The strong population differentiation observed in the native area was not geographically structured, suggesting complex migration events that were probably human-mediated. Both native and invasive populations had low genetic diversity, but this characteristic did not prevent the success of the ALB invasions. Our results highlight the complexity of invasion pathways for insect pests. Specifically, our findings indicate that invasive species might be repeatedly introduced from their native range, and they emphasize the importance of multiple, human-mediated introductions in successful invasions. Finally, our results demonstrate that invasive species can spread across continents following a bridgehead path, in which an invasive population may have acted as a source for another invasion.

60 citations

Journal ArticleDOI
TL;DR: In this paper, an optimal value of m can be inferred from the second-order rate of change in likelihood (Δm) across incremental values of m. This method has been implemented in a freely available R package called "OptM" and as a web application (https://rfitak.shinyapps.io/OptM/) to interface directly with the output files of Treemix.
Abstract: The software Treemix has become extensively used to estimate the number of migration events, or edges (m), on population trees from genome-wide allele frequency data. However, the appropriate number of edges to include remains unclear. Here, I show that an optimal value of m can be inferred from the second-order rate of change in likelihood (Δm) across incremental values of m. Repurposed from its original use to estimate the number of population clusters in the software Structure (ΔK), I show using simulated populations that Δm performs as well as current recommendations for Treemix. A demonstration on an empirical dataset from domestic dogs indicates that this method may be preferable in large, complex population histories and can prioritize migration events for subsequent investigation. The method has been implemented in a freely available R package called "OptM" and as a web application (https://rfitak.shinyapps.io/OptM/) to interface directly with the output files of Treemix.

58 citations

References
Journal ArticleDOI
01 Jun 2000-Genetics
TL;DR: Pritchard et al. proposed a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations; the method can be applied to most of the commonly used genetic markers, provided that they are not closely linked.
Abstract: We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci, e.g., seven microsatellite loci in an example using genotype data from an endangered bird species. The software used for this article is available from http://www.stats.ox.ac.uk/~pritch/home.html.

27,454 citations

Journal ArticleDOI
TL;DR: It is found that in most cases the estimated ‘log probability of data’ does not provide a correct estimation of the number of clusters, K, and using an ad hoc statistic ΔK based on the rate of change in the log probability between successive K values, structure accurately detects the uppermost hierarchical level of structure for the scenarios the authors tested.
Abstract: The identification of genetically homogeneous groups of individuals is a long-standing issue in population genetics. A recent Bayesian algorithm implemented in the software STRUCTURE allows the identification of such groups. However, the ability of this algorithm to detect the true number of clusters (K) in a sample of individuals when patterns of dispersal among populations are not homogeneous has not been tested. The goal of this study is to carry out such tests, using various dispersal scenarios from data generated with an individual-based model. We found that in most cases the estimated 'log probability of data' does not provide a correct estimation of the number of clusters, K. However, using an ad hoc statistic ΔK based on the rate of change in the log probability of data between successive K values, we found that STRUCTURE accurately detects the uppermost hierarchical level of structure for the scenarios we tested. As might be expected, the results are sensitive to the type of genetic marker used (AFLP vs. microsatellite), the number of loci scored, the number of populations sampled, and the number of individuals typed in each sample.
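To make the "rate of change in the log probability of data between successive K values" concrete, here is a minimal sketch of one common reading of the statistic: the absolute second-order difference of the mean Ln Pr(X|K) across replicate runs, divided by the standard deviation of Ln Pr(X|K) at that K. The function name `delta_k`, the choice to average replicates before differencing, and the log-probability values are all assumptions made for illustration; they are not taken from the paper or from any specific software.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute a ΔK-style statistic from replicate Ln Pr(X|K) values.

    lnp_by_k maps each tested K to a list of Ln Pr(X|K) values, one per
    replicate run. Returns a dict mapping K to ΔK for every K that has
    both neighbours (K - 1 and K + 1) in the input.
    """
    means = {k: mean(vals) for k, vals in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # second-order difference needs both neighbours
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # statistic is undefined when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / spread
    return result


# Hypothetical replicate log probabilities for K = 1..4 (invented values).
example = {
    1: [-5000, -5002, -4998],
    2: [-4500, -4501, -4499],
    3: [-4400, -4405, -4395],
    4: [-4390, -4380, -4400],
}
dk = delta_k(example)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because the statistic needs both neighbours, it yields no value for the smallest or largest K tested. When testing starts at K = 1, this means K = 1 can never be selected, so K = 2 is the smallest value the statistic can return.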

18,572 citations


"The K = 2 conundrum." refers background or methods or result in this paper

  • ...Using a conservative number of samples and loci simulated under different demographic scenarios, Evanno et al. (2005) explored the change in the slope of the Ln Pr(X|K) curve as a means to select K....

    [...]

  • ...These results suggest that some of the studies we reviewed might have been improved by following the clear recommendations outlined by Pritchard et al. (2000), Pritchard and Wen (2003) and Evanno et al. (2005)....

    [...]

  • ...Populations were generated comprising two separate sexes, in contrast to the hermaphroditic individuals simulated by Evanno et al. (2005)....

    [...]

  • ...2017;26:3594–3602. software, such as STRUCTURE (Pritchard, Stephens, & Donnelly, 2000), and particularly when the commonly used Evanno, Regnaut, and Goudet (2005) method of identifying the “optimal” number of clusters (hereafter referred to as the ΔK method) is applied....

    [...]

  • ...Understandably, several others have commented on the difficulty in determining the point of plateau (e.g., Evanno et al., 2005; Falush, Stephens, & Pritchard, 2003; Francois & Durand, 2010; Latch, Dharmarajan, Glaubitz, & Rhodes, 2006) and a number of factors, in addition to the inherent Bayesian…...

    [...]

Journal ArticleDOI
TL;DR: The purpose of this discussion is to offer some unity to various estimation formulae and to point out that correlations of genes in structured populations, with which F-statistics are concerned, are expressed very conveniently with a set of parameters treated by Cockerham (1969, 1973).
Abstract: This journal frequently contains papers that report values of F-statistics estimated from genetic data collected from several populations. These parameters, FST, FIT, and FIS, were introduced by Wright (1951), and offer a convenient means of summarizing population structure. While there is some disagreement about the interpretation of the quantities, there is considerably more disagreement on the method of evaluating them. Different authors make different assumptions about sample sizes or numbers of populations and handle the difficulties of multiple alleles and unequal sample sizes in different ways. Wright himself, for example, did not consider the effects of finite sample size. The purpose of this discussion is to offer some unity to various estimation formulae and to point out that correlations of genes in structured populations, with which F-statistics are concerned, are expressed very conveniently with a set of parameters treated by Cockerham (1969, 1973). We start with the parameters and construct appropriate estimators for them, rather than beginning the discussion with various data functions. The extension of Cockerham's work to multiple alleles and loci will be made explicit, and the use of jackknife procedures for estimating variances will be advocated. All of this may be regarded as an extension of a recent treatment of estimating the coancestry coefficient to serve as a mea-

17,890 citations


"The K = 2 conundrum." refers background in this paper

  • ...The applied value of such information has been demonstrated numerous times, for example, in conservation (Cullingham et al., 2016; Worth et al., 2014), pest management (Chapuis et al., 2008; Kerdelhue, Boivin, & Burban, 2014), the identification of species boundaries (Hamlin & Arnold, 2014; Janes,…...

    [...]

Journal ArticleDOI
TL;DR: STRUCTURE HARVESTER is presented, a web-based program for collating results generated by the program STRUCTURE, which provides a fast way to assess and visualize likelihood values across multiple values of K and hundreds of iterations for easier detection of the number of genetic groups that best fit the data.
Abstract: We present STRUCTURE HARVESTER (available at http://taylor0.biology.ucla.edu/structureHarvester/ ), a web-based program for collating results generated by the program STRUCTURE. The program provides a fast way to assess and visualize likelihood values across multiple values of K and hundreds of iterations for easier detection of the number of genetic groups that best fit the data. In addition, STRUCTURE HARVESTER will reformat data for use in downstream programs, such as CLUMPP.

9,960 citations


"The K = 2 conundrum." refers methods in this paper

  • ...Web-based pipelines, such as Structure Harvester (Earl & vonHoldt, 2012) and Clumpak (Kopelman et al., 2015), have significantly streamlined the process of obtaining both Ln Pr(X|K) and ΔK plots....

    [...]
