Journal Article

Genetic Variation and Population Structure in Native Americans

TL;DR: Evidence is observed of a higher level of diversity and lower level of population structure in western South America compared to eastern South America, a relative lack of differentiation between Mesoamerican and Andean populations, and a partial agreement on a local scale between genetic similarity and the linguistic classification of populations.
Abstract: We examined genetic diversity and population structure in the American landmass using 678 autosomal microsatellite markers genotyped in 422 individuals representing 24 Native American populations sampled from North, Central, and South America. These data were analyzed jointly with similar data available in 54 other indigenous populations worldwide, including an additional five Native American groups. The Native American populations have lower genetic diversity and greater differentiation than populations from other continental regions. We observe gradients both of decreasing genetic diversity as a function of geographic distance from the Bering Strait and of decreasing genetic similarity to Siberians—signals of the southward dispersal of human populations from the northwestern tip of the Americas. We also observe evidence of: (1) a higher level of diversity and lower level of population structure in western South America compared to eastern South America, (2) a relative lack of differentiation between Mesoamerican and Andean populations, (3) a scenario in which coastal routes were easier for migrating peoples to traverse in comparison with inland routes, and (4) a partial agreement on a local scale between genetic similarity and the linguistic classification of populations. These findings offer new insights into the process of population dispersal and differentiation during the peopling of the Americas.
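The abstract relates diversity to geographic distance from the Bering Strait. The Python below is a minimal sketch of one way to measure such a distance: a straight-line great-circle (haversine) distance. The Bering Strait coordinates, the mean Earth radius, and the function names are illustrative assumptions. The paper's own distance calculation may differ, for example by routing distances through intermediate waypoints instead of using straight lines.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

# Hypothetical reference point near the Bering Strait (illustrative coordinates, degrees).
BERING_STRAIT = (66.0, -169.0)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given as (lat, lon) in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing sqrt(a) slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_from_bering(lat, lon):
    """Straight-line distance from the assumed Bering Strait point to a sampling location."""
    return haversine_km(BERING_STRAIT[0], BERING_STRAIT[1], lat, lon)


# Usage with a made-up sampling location in western South America:
print(round(distance_from_bering(-12.0, -77.0)))
```

Regressing per-population diversity on distances computed this way is one way to look for the north-to-south gradient the abstract describes.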

Citations
Journal Article
TL;DR: Introduces the Discriminant Analysis of Principal Components (DAPC), a multivariate method designed to identify and describe clusters of genetically related individuals, which generally performs better than STRUCTURE at characterizing population subdivision.
Abstract: The dramatic progress in sequencing technologies offers unprecedented prospects for deciphering the organization of natural populations in space and time. However, the size of the datasets generated also poses some daunting challenges. In particular, Bayesian clustering algorithms based on pre-defined population genetics models such as the STRUCTURE or BAPS software may not be able to cope with this unprecedented amount of data. Thus, there is a need for less computer-intensive approaches. Multivariate analyses seem particularly appealing as they are specifically devoted to extracting information from large datasets. Unfortunately, currently available multivariate methods still lack some essential features needed to study the genetic structure of natural populations. We introduce the Discriminant Analysis of Principal Components (DAPC), a multivariate method designed to identify and describe clusters of genetically related individuals. When group priors are lacking, DAPC uses sequential K-means and model selection to infer genetic clusters. Our approach extracts rich information from genetic data, providing assignment of individuals to groups, a visual assessment of between-population differentiation, and the contribution of individual alleles to population structuring. We evaluate the performance of our method using simulated data, which were also analyzed using STRUCTURE as a benchmark. Additionally, we illustrate the method by analyzing microsatellite polymorphism in worldwide human populations and hemagglutinin gene sequence variation in seasonal influenza. Analysis of simulated data revealed that our approach performs generally better than STRUCTURE at characterizing population subdivision. The tools implemented in DAPC for the identification of clusters and graphical representation of between-group structures make it possible to unravel complex population structures. Our approach is also faster than Bayesian clustering algorithms by several orders of magnitude, and may be applicable to a wider range of datasets.

3,770 citations


Cites methods or result from "Genetic Variation and Population St..."

  • ...The resulting colorplot (Figure 7) defines clear-cut patterns which are strikingly similar to results previously obtained under a four clusters population genetics model with STRUCTURE [25,26,29]....

  • ...The results obtained are remarkably clear and consistent with previous findings [25,26]....

  • ...The subdivision inferred by DAPC is strikingly similar to the four clusters identified by the STRUCTURE software [25,26,29]....

  • ...This dataset was extended by adding genotypes from 24 Native American and Siberian populations [26]....

  • ...First, we analyse worldwide structuring of native human populations using the HGDP-CEPH cell line panel typed for microsatellite markers [23-25], enriched with additional populations of Native Americans [26]....

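The DAPC abstract describes two stages: a principal component analysis that reduces the data, followed by a discriminant analysis that separates groups. The NumPy sketch below assumes group labels are already known (without priors, DAPC infers clusters with sequential K-means and model selection, which is not reproduced here) and applies a textbook Fisher linear discriminant analysis to the retained components. It is an illustration, not the reference implementation, which the authors distribute in the R package adegenet; the function name and the choice of `n_pca` are assumptions.

```python
import numpy as np


def dapc(X, groups, n_pca, n_da=None):
    """Minimal DAPC-style sketch: PCA on X, then Fisher LDA on the retained PCs.

    X: (n_individuals, n_variables) array, e.g. allele counts per individual.
    groups: length-n labels (known groups, or clusters found beforehand).
    n_pca: number of principal components to keep (must not exceed min(X.shape)).
    Returns discriminant scores with shape (n_individuals, n_da).
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    if n_da is None:
        n_da = len(labels) - 1
    # Stage 1: PCA via SVD of the column-centred matrix.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pcs = Xc @ Vt[:n_pca].T
    # Stage 2: within-group (Sw) and between-group (Sb) scatter in PC space.
    grand_mean = pcs.mean(axis=0)
    Sw = np.zeros((n_pca, n_pca))
    Sb = np.zeros((n_pca, n_pca))
    for g in labels:
        P = pcs[groups == g]
        m = P.mean(axis=0)
        Sw += (P - m).T @ (P - m)
        d = (m - grand_mean)[:, None]
        Sb += len(P) * (d @ d.T)
    # Discriminant axes: leading eigenvectors of Sw^-1 Sb (between- vs within-group variance).
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1][:n_da]
    W = evecs[:, order].real
    return pcs @ W


# Usage on simulated data: two groups of 20 individuals with shifted means.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 10)), rng.normal(5.0, 1.0, (20, 10))])
g = np.array([0] * 20 + [1] * 20)
scores = dapc(X, g, n_pca=5)
```

Running the discriminant step on a few PCs, instead of on all variables, keeps the within-group scatter matrix invertible when variables outnumber individuals, which is the usual motivation for the PCA step.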

Journal Article
TL;DR: CLUMPAK, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology by automating the postprocessing of their results.
Abstract: The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modelling assumptions, compares results across different predetermined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present CLUMPAK (Cluster Markov Packager Across K), a method that automates the postprocessing of results of model-based population structure analyses. For analysing multiple independent runs at a single K value, CLUMPAK identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software CLUMPP. Next, CLUMPAK identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in CLUMPP and simplifying the comparison of clustering results across different K values. CLUMPAK incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. CLUMPAK, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology.

2,252 citations


Cites background or methods from "Genetic Variation and Population St..."

  • ...For example, in examining how distinct solutions – both major and minor modes – change across a range of K values, Wang et al. (2007) observed multimodality at certain values of K, but found that solutions for larger K could be viewed as refinements of several of the modes observed at lower K....

  • ...Keywords: admixture, ancestry, clustering, population structure Received 30 August 2014; revision received 19 January 2015; accepted 28 January 2015...

  • ...…in cases in which simultaneous examination of multiple modes at multiple K values is of interest, such as when the clustering pattern in the most frequently occurring mode for a given K does not provide a refinement of the corresponding solution for a smaller choice of K (Wang et al. 2007)....

  • ...First, we modified the similarity threshold, using either a fixed similarity threshold in the range of 0.6–0.9, reasonable in light of past choices (Wang et al. 2007; Jakobsson et al. 2008), or by the default approach in which the threshold was determined dynamically....

  • ...For example, Wang et al. (2007) and Jakobsson et al. (2008) identified as modes all sets of replicates for which the pairwise similarity score for each pair of runs exceeded a specific threshold....

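The excerpts above describe grouping replicate clustering runs into modes: runs are compared pairwise (CLUMPAK relies on CLUMPP for the similarity matrix), and Wang et al. (2007) treated as a mode any set of runs whose pairwise similarities all exceed a threshold. The Python below is a minimal sketch of that idea under stated assumptions. The similarity is a G'-style Frobenius-norm score, 1 - ||Q1 - Q2|| / sqrt(||Q1 - W|| * ||Q2 - W||) with every entry of W equal to 1/K, which we believe matches CLUMPP's G' but should be checked against its documentation. Columns are aligned by exhaustive permutation search, which is practical only for small K, and modes are formed greedily, which is simpler than both an exhaustive search for such sets and CLUMPAK's Markov clustering. All function names are hypothetical.

```python
import itertools
import math


def frob(A, B):
    """Frobenius norm of A - B for two equally sized matrices stored as lists of rows."""
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)))


def g_prime(Q1, Q2):
    """G'-style similarity of two membership matrices (individuals x K clusters)."""
    K = len(Q1[0])
    W = [[1.0 / K] * K for _ in Q1]  # reference matrix with every entry 1/K
    return 1.0 - frob(Q1, Q2) / math.sqrt(frob(Q1, W) * frob(Q2, W))


def aligned_similarity(Q1, Q2):
    """Best similarity over all column permutations of Q2 (cluster labels are arbitrary)."""
    K = len(Q1[0])
    return max(
        g_prime(Q1, [[row[j] for j in perm] for row in Q2])
        for perm in itertools.permutations(range(K))
    )


def threshold_modes(runs, threshold=0.9):
    """Greedily group runs so that every pair within a group has similarity above threshold."""
    n = len(runs)
    sim = [[aligned_similarity(runs[i], runs[j]) for j in range(n)] for i in range(n)]
    modes = []
    for i in range(n):
        for mode in modes:
            if all(sim[i][j] > threshold for j in mode):
                mode.append(i)
                break
        else:
            modes.append([i])  # no compatible mode found: start a new one
    return sorted(modes, key=len, reverse=True)
```

The default threshold of 0.9 sits at the top of the 0.6–0.9 range quoted above; a lower value merges more runs into each mode.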

Journal Article
01 Nov 2012-Genetics
TL;DR: A suite of methods for learning about population mixtures are presented, implemented in a software package called ADMIXTOOLS, that support formal tests for whether mixture occurred and make it possible to infer proportions and dates of mixture.
Abstract: Population mixture is an important process in biology. We present a suite of methods for learning about population mixtures, implemented in a software package called ADMIXTOOLS, that support formal tests for whether mixture occurred and make it possible to infer proportions and dates of mixture. We also describe the development of a new single nucleotide polymorphism (SNP) array consisting of 629,433 sites with clearly documented ascertainment that was specifically designed for population genetic analyses and that we genotyped in 934 individuals from 53 diverse populations. To illustrate the methods, we give a number of examples that provide new insights about the history of human admixture. The most striking finding is a clear signal of admixture into northern Europe, with one ancestral population related to present-day Basques and Sardinians and the other related to present-day populations of northeast Asia and the Americas. This likely reflects a history of admixture between Neolithic migrants and the indigenous Mesolithic population of Europe, consistent with recent analyses of ancient bones from Sweden and the sequencing of the genome of the Tyrolean "Iceman."

1,877 citations
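ADMIXTOOLS supports formal tests of whether mixture occurred. One widely used statistic of this family is f3(C; A, B), the average over SNPs of (c - a)(c - b) for allele frequencies a, b, c in populations A, B, C; a significantly negative value indicates that C is a mixture of populations related to A and B. The Python below computes only a naive point estimate under that definition. It omits the finite-sample bias corrections and block-jackknife standard errors that a real analysis needs, and the function name is an assumption.

```python
def f3(c, a, b):
    """Naive f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    c, a, b are equal-length sequences of allele frequencies (0 to 1) at the same SNPs
    in populations C, A and B. No finite-sample bias correction or standard error is
    computed, so this is only a point estimate for illustration.
    """
    if not (len(c) == len(a) == len(b)) or len(c) == 0:
        raise ValueError("need equal-length, non-empty frequency vectors")
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)) / len(c)


# Usage with made-up frequencies: C lies halfway between A and B at every SNP,
# so every term is negative, the pattern the 3-population test looks for.
a = [0.1, 0.8, 0.3]
b = [0.9, 0.2, 0.7]
c = [(x + y) / 2 for x, y in zip(a, b)]
print(f3(c, a, b))
```

Without admixture, C tends to drift away from both A and B, making the two differences share a sign and the average positive; that is why a negative value is informative.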

Journal Article
22 May 2009-Science
TL;DR: A detailed genetic analysis of most major groups of African populations is provided, suggesting that Africans represent 14 ancestral populations that correlate with self-described ethnicity and shared cultural and/or linguistic properties.
Abstract: Africa is the source of all modern humans, but characterization of genetic variation and of relationships among populations across the continent has been enigmatic. We studied 121 African populations, four African American populations, and 60 non-African populations for patterns of variation at 1327 nuclear microsatellite and insertion/deletion markers. We identified 14 ancestral population clusters in Africa that correlate with self-described ethnicity and shared cultural and/or linguistic properties. We observed high levels of mixed ancestry in most populations, reflecting historical migration events across the continent. Our data also provide evidence for shared ancestry among geographically diverse hunter-gatherer populations (Khoesan speakers and Pygmies). The ancestry of African Americans is predominantly from Niger-Kordofanian (approximately 71%), European (approximately 13%), and other African (approximately 8%) populations, although admixture levels varied considerably among individuals. This study helps tease apart the complex evolutionary history of Africans and African Americans, aiding both anthropological and genetic epidemiologic studies.

1,376 citations

Journal Article
TL;DR: The level of genetic diversity within a society is found to have a hump-shaped effect on development outcomes in both the pre-colonial and the modern era, reflecting the trade-off between the beneficial and the detrimental effects of diversity on productivity.
Abstract: This research advances and empirically establishes the hypothesis that, in the course of the prehistoric exodus of Homo sapiens out of Africa, variation in migratory distance to various settlements across the globe affected genetic diversity and has had a persistent hump-shaped effect on comparative economic development, reflecting the trade-off between the beneficial and the detrimental effects of diversity on productivity. While the low diversity of Native American populations and the high diversity of African populations have been detrimental for the development of these regions, the intermediate levels of diversity associated with European and Asian populations have been conducive for development. (JEL N10, N30, N50, O10, O50, Z10) Prevailing hypotheses of comparative economic development highlight various determinants of the remarkable inequality in income per capita across the globe. The significance of geographical, institutional, and cultural factors, human capital, ethnolinguistic fractionalization, colonialism, and globalization has been at the heart of a debate concerning the genesis of the astounding transformation in the pattern of comparative development over the past few centuries. While early research focused on the proximate forces that contributed to the divergence in living

870 citations
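The abstract describes a hump-shaped effect of genetic diversity on development but does not state the estimating equation. A common way to encode such a shape, assumed here for illustration, is a quadratic in diversity H with a positive linear and a negative squared coefficient; the diversity level that maximizes the outcome y then follows from the first-order condition:

```latex
\begin{aligned}
y &= \beta_0 + \beta_1 H + \beta_2 H^2 + \varepsilon, \qquad \beta_1 > 0,\ \beta_2 < 0 \\
\frac{\partial y}{\partial H} &= \beta_1 + 2\beta_2 H = 0
  \;\Longrightarrow\; H^{*} = -\frac{\beta_1}{2\beta_2} \\
\frac{\partial^2 y}{\partial H^2} &= 2\beta_2 < 0
\end{aligned}
```

The negative second derivative confirms that H* is a maximum, so populations with diversity well below or well above H* are predicted to fare worse, matching the abstract's contrast between intermediate and extreme diversity levels.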


Cites background or methods or result from "Genetic Variation and Population St..."

  • ...In estimating the impact on economic development of migratory distance from east Africa, via its effect on genetic diversity, this research overcomes limitations and potential concerns that are presented by the existing data on genetic diversity across the globe (i.e., measurement error, data…...

  • ...…since random sampling errors are more prevalent in circumstances in population genetics (e.g., Prugnolle et al., 2005; Ramachandran et al., 2005; Wang et al., 2007) have found strong empirical evidence in support of this prediction.12 The present study exploits the explanatory power of…...

  • ...On the other hand, using an expanded data set comprised of the 53 HGDP-CEPH ethnic groups and an additional 24 Native American populations, Wang et al. (2007) find that migratory distance explains a more modest 74% of the variation in genetic diversity, based on allelic frequencies for 678 loci....

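The last excerpt reports that migratory distance "explains" 74% of the variation in genetic diversity, which is the R² of a regression of diversity on distance. The Python below is a minimal sketch of a simple least-squares fit and its R² = 1 - SS_res / SS_tot. The function name and the example data are hypothetical, and the cited studies' exact regression specifications are not reproduced.

```python
def ols_r2(x, y):
    """Fit y = intercept + slope * x by least squares; return (intercept, slope, R^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual and total sums of squares give the share of variance explained.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Usage with made-up values: distance (1000s of km) vs. expected heterozygosity.
print(ols_r2([2, 8, 14, 20, 25], [0.76, 0.72, 0.70, 0.64, 0.61]))
```

With diversity as y and distance as x, a negative slope corresponds to the decline in diversity with migratory distance, and R² measures how much of the between-population variation that single predictor accounts for.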

References
Journal Article
TL;DR: The neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods for reconstructing phylogenetic trees from evolutionary distance data.
Abstract: A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.

57,055 citations


Additional excerpts

  • ...An unrooted neighbor-joining [120] population...

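The neighbor-joining abstract describes repeatedly joining the pair of taxa that minimizes total branch length, starting from a star-like tree. The Python below sketches this using the now-standard Q-criterion, Q(i, j) = (n - 2) d(i, j) - r_i - r_j with r_i the row sum of distances, which is known to select the same pair as the original total-branch-length criterion. The function name and the output format (a list of joins plus the final edge) are assumptions, and ties are broken by input order.

```python
def neighbor_joining(names, matrix):
    """Neighbor-joining on a symmetric distance matrix.

    Returns (joins, last_edge): joins is a list of (a, b, len_a, len_b, new_node)
    and last_edge is (x, y, length) connecting the final two nodes.
    """
    # Distances keyed by node label so new internal nodes can be added easily.
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    joins = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node u to the joined pair.
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"u{next_id}"
        next_id += 1
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = (d[a][c] + d[b][c] - d[a][b]) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [u]
        joins.append((a, b, la, lb, u))
    x, y = nodes
    return joins, (x, y, d[x][y])


# Usage on a small five-taxon distance matrix.
names = ["a", "b", "c", "d", "e"]
M = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
joins, last = neighbor_joining(names, M)
print(joins[0])  # ('a', 'b', 2.0, 3.0, 'u0')
```

Each iteration removes two nodes and adds one, so n taxa need n - 3 joins before the last two nodes are connected; the result is an unrooted tree, as in the excerpt above.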

Journal Article
01 Jun 2000-Genetics
TL;DR: Pritchard et al. propose a model-based clustering method that uses multilocus genotype data to infer population structure and assign individuals to populations; it can be applied to most commonly used genetic markers, provided that they are not closely linked.
Abstract: We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci (e.g., seven microsatellite loci in an example using genotype data from an endangered bird species). The software used for this article is available from http://www.stats.ox.ac.uk/~pritch/home.html.

27,454 citations
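The STRUCTURE abstract describes populations characterized by allele frequencies at each locus, with individuals assigned probabilistically. The Python below sketches only the assignment step of the no-admixture case, assuming the allele frequencies are already known; STRUCTURE itself estimates frequencies and assignments jointly by MCMC. It also assumes Hardy-Weinberg and linkage equilibrium, a uniform prior over populations, and a small frequency floor for alleles unseen in a population. The function name and data layout are hypothetical.

```python
import math

MIN_FREQ = 1e-6  # assumed floor for alleles absent from a population (avoids log(0))


def assign_probs(genotype, freqs):
    """Posterior probability that an individual comes from each of K populations.

    genotype: list of (allele1, allele2) pairs, one per locus (diploid, unphased).
    freqs: list over populations of lists over loci of {allele: frequency} dicts.
    """
    logliks = []
    for pop in freqs:
        ll = 0.0
        for (x, y), locus in zip(genotype, pop):
            # Product of the two allele frequencies; the factor 2 for heterozygotes
            # is the same for every population and cancels, so it is omitted.
            ll += math.log(max(locus.get(x, 0.0), MIN_FREQ))
            ll += math.log(max(locus.get(y, 0.0), MIN_FREQ))
        logliks.append(ll)
    m = max(logliks)  # subtract the maximum before exponentiating for stability
    w = [math.exp(l - m) for l in logliks]
    total = sum(w)
    return [wi / total for wi in w]


# Usage: two populations, one locus, a homozygote for the allele common in population 0.
freqs = [[{"A": 0.9, "B": 0.1}], [{"A": 0.1, "B": 0.9}]]
print(assign_probs([("A", "A")], freqs))
```

Working in log space matters with hundreds of loci, where the raw likelihood products would underflow to zero.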

Book
01 Feb 1987
TL;DR: Recent developments of statistical methods in molecular phylogenetics are reviewed and it is shown that the mathematical foundations of these methods are not well established, but computer simulations and empirical data indicate that currently used methods produce reasonably good phylogenetic trees when a sufficiently large number of nucleotides or amino acids are used.
Abstract: Recent developments of statistical methods in molecular phylogenetics are reviewed. It is shown that the mathematical foundations of these methods are not well established, but computer simulations and empirical data indicate that currently used methods such as neighbor joining, minimum evolution, likelihood, and parsimony methods produce reasonably good phylogenetic trees when a sufficiently large number of nucleotides or amino acids are used. However, when the rate of evolution varies extensively from branch to branch, many methods may fail to recover the true topology. Solid statistical tests for examining the accuracy of trees obtained by neighbor-joining, minimum evolution, and least-squares methods are available, but the methods for likelihood and parsimony trees are yet to be refined. Parsimony, likelihood, and distance methods can all be used for inferring amino acid sequences of the proteins of ancestral organisms that have become extinct.

15,840 citations


"Genetic Variation and Population St..." refers methods in this paper

  • ...For each population, expected heterozygosity was computed for each locus using an unbiased estimator [115], and the average across loci was taken as the population estimate....

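The excerpt computes expected heterozygosity per locus with an unbiased estimator and averages across loci. Assuming the estimator meant is the standard unbiased form h = n/(n - 1) * (1 - sum of p_u²), where p_u are sample allele frequencies and n is the number of sampled gene copies (twice the number of individuals for diploids), a minimal Python sketch looks like this; the function names and input layout are hypothetical.

```python
from collections import Counter


def unbiased_het(alleles):
    """Unbiased expected heterozygosity at one locus from the sampled allele copies."""
    n = len(alleles)  # number of gene copies (2 x individuals for diploids)
    if n < 2:
        raise ValueError("need at least two allele copies")
    counts = Counter(alleles)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)


def population_het(loci):
    """Population estimate: mean of per-locus values; loci is a list of allele-copy lists."""
    if not loci:
        raise ValueError("need at least one locus")
    return sum(unbiased_het(a) for a in loci) / len(loci)


# Usage: one biallelic locus and one monomorphic locus, two diploid individuals each.
print(population_het([["A", "A", "B", "B"], ["A", "A", "A", "A"]]))
```

The n/(n - 1) factor corrects the downward bias of the plug-in estimate 1 - sum of p_u², which matters most for small samples such as the few individuals typed in some populations.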

Journal Article
TL;DR: An overview of the statistical methods, computational tools, and visual exploration modules for data input and the results obtainable in MEGA is provided.
Abstract: With its theoretical basis firmly established in molecular evolutionary and population genetics, the comparative DNA and protein sequence analysis plays a central role in reconstructing the evolutionary histories of species and multigene families, estimating rates of molecular evolution, and inferring the nature and extent of selective forces shaping the evolution of genes and genomes. The scope of these investigations has now expanded greatly owing to the development of high-throughput sequencing techniques and novel statistical and computational methods. These methods require easy-to-use computer programs. One such effort has been to produce Molecular Evolutionary Genetics Analysis (MEGA) software, with its focus on facilitating the exploration and analysis of the DNA and protein sequence variation from an evolutionary perspective. Currently in its third major release, MEGA3 contains facilities for automatic and manual sequence alignment, web-based mining of databases, inference of the phylogenetic trees, estimation of evolutionary distances and testing evolutionary hypotheses. This paper provides an overview of the statistical methods, computational tools, and visual exploration modules for data input and the results obtainable in MEGA.

12,124 citations


"Genetic Variation and Population St..." refers methods in this paper

  • ...The computation of bootstrap distances was performed using PowerMarker [111], and the consensus tree was obtained and plotted using MEGA3 [122]....

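The excerpt mentions bootstrap distances computed in PowerMarker. As a hedged sketch of the general idea, the Python below resamples loci with replacement and recomputes a distance for each replicate. It uses Nei's D_A, 1 minus the mean over loci of the sum over alleles of sqrt(x_u * y_u), only because the paper uses D_A distances elsewhere; this excerpt does not say which distance underlies the tree. Function names and the data layout (one allele-frequency dictionary per locus) are assumptions, and PowerMarker's own procedure is not reproduced.

```python
import math
import random


def locus_similarity(fx, fy):
    """Sum over alleles of sqrt(x_u * y_u) at one locus; fx, fy map allele -> frequency."""
    return sum(math.sqrt(fx[a] * fy.get(a, 0.0)) for a in fx)


def nei_da(pop_x, pop_y, loci=None):
    """Nei's D_A over the given locus indices (default: all), each locus weighted equally."""
    loci = list(range(len(pop_x)) if loci is None else loci)
    return 1.0 - sum(locus_similarity(pop_x[l], pop_y[l]) for l in loci) / len(loci)


def bootstrap_da(pop_x, pop_y, n_reps=100, seed=0):
    """D_A recomputed on loci resampled with replacement, one value per replicate."""
    rng = random.Random(seed)
    r = len(pop_x)
    return [nei_da(pop_x, pop_y, [rng.randrange(r) for _ in range(r)]) for _ in range(n_reps)]


# Usage: two populations typed at two loci.
x = [{"A": 0.5, "B": 0.5}, {"C": 1.0}]
y = [{"A": 0.5, "B": 0.5}, {"D": 1.0}]
print(nei_da(x, y), len(bootstrap_da(x, y)))
```

For a tree, the same resampled loci would be used for every population pair within a replicate, giving one distance matrix per replicate; each matrix yields a tree, and the trees are summarized as a consensus, the step the excerpt assigns to MEGA3.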

Journal Article
TL;DR: The technic to be given below for imparting statistical validity to the procedures already in vogue can be viewed as a generalized form of regression with possible useful application to problems arising in quite different contexts.
Abstract: The problem of identifying subtle time-space clustering of disease, as may be occurring in leukemia, is described and reviewed. Published approaches, generally associated with studies of leukemia, not dependent on knowledge of the underlying population for their validity, are directed towards identifying clustering by establishing a relationship between the temporal and the spatial separations for the n(n − 1)/2 possible pairs which can be formed from the n observed cases of disease. Here it is proposed that statistical power can be improved by applying a reciprocal transform to these separations. While a permutational approach can give valid probability levels for any observed association, for reasons of practicability, it is suggested that the observed association be tested relative to its permutational variance. Formulas and computational procedures for doing so are given. While the distance measures between points represent symmetric relationships subject to mathematical and geometric regularities, the variance formula developed is appropriate for arbitrary relationships. Simplified procedures are given for the case of symmetric and skew-symmetric relationships. The general procedure is indicated as being potentially useful in other situations as, for example, the study of interpersonal relationships. Viewing the procedure as a regression approach, the possibility for extending it to nonlinear and multivariate situations is suggested. Other aspects of the problem and of the procedure developed are discussed. Similarly, pure temporal clustering can be identified by a study of incidence rates in periods of widespread epidemics. In point of fact, many epidemics of communicable diseases are somewhat local in nature and so these do actually constitute temporal-spatial clusters. 
For leukemia and similar diseases in which cases seem to arise substantially at random rather than as clear-cut epidemics, it is necessary to devise sensitive and efficient procedures for detecting any nonrandom component of disease occurrence. Various ingenious procedures which statisticians have developed for the detection of disease clustering are reviewed here. These procedures can be generalized so as to increase their statistical validity and efficiency. The technic to be given below for imparting statistical validity to the procedures already in vogue can be viewed as a generalized form of regression with possible useful application to problems arising in quite different contexts.

11,408 citations


"Genetic Variation and Population St..." refers background or methods in this paper

  • ...For comparison with linguistic distances, Da genetic distances were used (Table S27), and the Mantel correlation coefficients [38] between pairs of distance matrices (among genetic, geographic, and linguistic) were obtained, with significance assessed using 10,000 permutations of rows and columns....

  • ...ures, despite the high Mantel correlation coefficients [38]...

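The first excerpt describes Mantel correlations between pairs of distance matrices, with significance assessed by 10,000 permutations of rows and columns. The Python below is a minimal sketch of that procedure: the Pearson correlation of the upper-triangle entries, compared against correlations obtained after permuting the rows and columns of one matrix together. The one-sided p-value with a +1 correction is a common convention assumed here; the paper's exact convention is not stated in the excerpt, and the function names are hypothetical.

```python
import math
import random


def _upper(M):
    """Upper-triangle entries (i < j) of a square matrix stored as lists of rows."""
    n = len(M)
    return [M[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(A, B, n_perm=10000, seed=0):
    """Mantel correlation of two symmetric distance matrices and a one-sided p-value."""
    rng = random.Random(seed)
    n = len(A)
    a = _upper(A)
    r_obs = _pearson(a, _upper(B))
    hits = 0
    idx = list(range(n))
    for _ in range(n_perm):
        rng.shuffle(idx)
        # Permute rows and columns of B together so each permuted matrix is still
        # a valid distance matrix over relabelled populations.
        Bp = [[B[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if _pearson(a, _upper(Bp)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Usage: six populations on a line; B is a rescaled copy of A.
A = [[abs(i - j) for j in range(6)] for i in range(6)]
B = [[2 * abs(i - j) for j in range(6)] for i in range(6)]
print(mantel(A, B, n_perm=999))
```

Permuting whole rows and columns, rather than individual entries, keeps the non-independence among pairwise distances that share a population, which is why the Mantel test is used instead of an ordinary correlation test.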
