Journal ArticleDOI

Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data

15 Nov 2012-PLOS Genetics (Public Library of Science)-Vol. 8, Iss: 11
TL;DR: Presents a statistical model for inferring the patterns of population splits and mixtures in multiple populations; applied to human and dog data, it shows that a simple bifurcating tree does not fully describe the data and that many migration events are inferred.
Abstract: Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In our model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data, we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and “ancient” Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com.
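The "Gaussian approximation to genetic drift" mentioned in the abstract can be illustrated with a small sketch. It assumes a standard Wright-Fisher population (not stated in the abstract) in which an allele at frequency x0 drifts for a number of generations in a population of N diploid individuals, and compares the exact drift variance with the linear approximation c * x0 * (1 - x0), where c = generations / (2N). The function names and the parameter values are hypothetical and chosen only for illustration; this is not the TreeMix implementation.

```python
def exact_drift_variance(x0, n_diploid, generations):
    """Variance of an allele frequency after Wright-Fisher drift.

    Assumes a constant population of `n_diploid` diploid individuals,
    so heterozygosity decays by (1 - 1/(2N)) each generation.
    """
    decay = (1.0 - 1.0 / (2 * n_diploid)) ** generations
    return x0 * (1.0 - x0) * (1.0 - decay)


def gaussian_drift_variance(x0, n_diploid, generations):
    """Variance used by a Gaussian drift approximation: c * x0 * (1 - x0).

    The drift parameter c = generations / (2N) is the small-drift
    approximation (an assumption of this sketch).
    """
    c = generations / (2.0 * n_diploid)
    return c * x0 * (1.0 - x0)


# Hypothetical values: frequency 0.3, N = 1000 diploids, 20 generations.
exact = exact_drift_variance(0.3, 1000, 20)
approx = gaussian_drift_variance(0.3, 1000, 20)
relative_error = abs(exact - approx) / exact
print(f"exact={exact:.6f} approx={approx:.6f} rel_err={relative_error:.4f}")
```

For short branches (generations much smaller than 2N) the two variances agree closely, which is the regime in which a Gaussian model of drift is reasonable; the approximation degrades as drift accumulates.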


Citations
Journal ArticleDOI
01 Nov 2012-Genetics
TL;DR: A suite of methods for learning about population mixtures are presented, implemented in a software package called ADMIXTOOLS, that support formal tests for whether mixture occurred and make it possible to infer proportions and dates of mixture.
Abstract: Population mixture is an important process in biology. We present a suite of methods for learning about population mixtures, implemented in a software package called ADMIXTOOLS, that support formal tests for whether mixture occurred and make it possible to infer proportions and dates of mixture. We also describe the development of a new single nucleotide polymorphism (SNP) array consisting of 629,433 sites with clearly documented ascertainment that was specifically designed for population genetic analyses and that we genotyped in 934 individuals from 53 diverse populations. To illustrate the methods, we give a number of examples that provide new insights about the history of human admixture. The most striking finding is a clear signal of admixture into northern Europe, with one ancestral population related to present-day Basques and Sardinians and the other related to present-day populations of northeast Asia and the Americas. This likely reflects a history of admixture between Neolithic migrants and the indigenous Mesolithic population of Europe, consistent with recent analyses of ancient bones from Sweden and the sequencing of the genome of the Tyrolean "Iceman."

1,877 citations


Cites methods from "Inference of Population Splits and ..."

  • ...Admixture graph fitting has some similarities to the TreeMix method of Pickrell and Pritchard (2012) but differs in that TreeMix allows users to automatically explore the space of possible models and to find the one that best fits the data (our method does not), while our method provides a rigorous…...


Journal ArticleDOI
12 Oct 2012-Science
TL;DR: The genomic sequence provides evidence for very low rates of heterozygosity in the Denisova, probably not because of recent inbreeding, but instead because of a small population size, and illuminates the relationships between humans and archaics, including Neandertals, and establishes a catalog of genetic changes within the human lineage.
Abstract: We present a DNA library preparation method that has allowed us to reconstruct a high-coverage (30×) genome sequence of a Denisovan, an extinct relative of Neandertals. The quality of this genome allows a direct estimation of Denisovan heterozygosity indicating that genetic diversity in these archaic hominins was extremely low. It also allows tentative dating of the specimen on the basis of “missing evolution” in its genome, detailed measurements of Denisovan and Neandertal admixture into present-day human populations, and the generation of a near-complete catalog of genetic changes that swept to high frequency in modern humans since their divergence from Denisovans.

1,690 citations

Journal ArticleDOI
11 Jun 2015-Nature
TL;DR: In this paper, the authors generated genome-wide data from 69 Europeans who lived between 8,000-3,000 years ago by enriching ancient DNA libraries for a target set of almost 400,000 polymorphisms.
Abstract: We generated genome-wide data from 69 Europeans who lived between 8,000-3,000 years ago by enriching ancient DNA libraries for a target set of almost 400,000 polymorphisms. Enrichment of these positions decreases the sequencing required for genome-wide ancient DNA analysis by a median of around 250-fold, allowing us to study an order of magnitude more individuals than previous studies and to obtain new insights about the past. We show that the populations of Western and Far Eastern Europe followed opposite trajectories between 8,000-5,000 years ago. At the beginning of the Neolithic period in Europe, ∼8,000-7,000 years ago, closely related groups of early farmers appeared in Germany, Hungary and Spain, different from indigenous hunter-gatherers, whereas Russia was inhabited by a distinctive population of hunter-gatherers with high affinity to a ∼24,000-year-old Siberian. By ∼6,000-5,000 years ago, farmers throughout much of Europe had more hunter-gatherer ancestry than their predecessors, but in Russia, the Yamnaya steppe herders of this time were descended not only from the preceding eastern European hunter-gatherers, but also from a population of Near Eastern ancestry. Western and Eastern Europe came into contact ∼4,500 years ago, as the Late Neolithic Corded Ware people from Germany traced ∼75% of their ancestry to the Yamnaya, documenting a massive migration into the heartland of Europe from its eastern periphery. This steppe ancestry persisted in all sampled central Europeans until at least ∼3,000 years ago, and is ubiquitous in present-day Europeans. These results provide support for a steppe origin of at least some of the Indo-European languages of Europe.

1,332 citations

Journal ArticleDOI
01 Jun 2014-Genetics
TL;DR: Efficient algorithms are developed for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework, along with heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data.
Abstract: Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH-Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.

1,266 citations


Cites background from "Inference of Population Splits and ..."

  • ...A natural extension of this hierarchical prior would be to allow for a full locus-independent variance–covariance matrix (Pickrell and Pritchard 2012)....


Journal ArticleDOI
Iosif Lazaridis1, Iosif Lazaridis2, Nick Patterson2, Alissa Mittnik3, Gabriel Renaud4, Swapan Mallick1, Swapan Mallick2, Karola Kirsanow5, Peter H. Sudmant6, Joshua G. Schraiber7, Joshua G. Schraiber6, Sergi Castellano4, Mark Lipson8, Bonnie Berger2, Bonnie Berger8, Christos Economou9, Ruth Bollongino5, Qiaomei Fu4, Kirsten I. Bos3, Susanne Nordenfelt2, Susanne Nordenfelt1, Heng Li1, Heng Li2, Cesare de Filippo4, Kay Prüfer4, Susanna Sawyer4, Cosimo Posth3, Wolfgang Haak10, Fredrik Hallgren11, Elin Fornander11, Nadin Rohland2, Nadin Rohland1, Dominique Delsate12, Michael Francken3, Jean-Michel Guinet12, Joachim Wahl, George Ayodo, Hamza A. Babiker13, Hamza A. Babiker14, Graciela Bailliet, Elena Balanovska, Oleg Balanovsky, Ramiro Barrantes15, Gabriel Bedoya16, Haim Ben-Ami17, Judit Bene18, Fouad Berrada19, Claudio M. Bravi, Francesca Brisighelli20, George B.J. Busby21, Francesco Calì, Mikhail Churnosov22, David E. C. Cole23, Daniel Corach24, Larissa Damba, George van Driem25, Stanislav Dryomov26, Jean-Michel Dugoujon27, Sardana A. Fedorova28, Irene Gallego Romero29, Marina Gubina, Michael F. Hammer30, Brenna M. Henn31, Tor Hervig32, Ugur Hodoglugil33, Aashish R. Jha29, Sena Karachanak-Yankova34, Rita Khusainova35, Elza Khusnutdinova35, Rick A. Kittles30, Toomas Kivisild36, William Klitz7, Vaidutis Kučinskas37, Alena Kushniarevich38, Leila Laredj39, Sergey Litvinov38, Theologos Loukidis40, Theologos Loukidis41, Robert W. Mahley42, Béla Melegh18, Ene Metspalu43, Julio Molina, Joanna L. Mountain, Klemetti Näkkäläjärvi44, Desislava Nesheva34, Thomas B. Nyambo45, Ludmila P. Osipova, Jüri Parik43, Fedor Platonov28, Olga L. Posukh, Valentino Romano46, Francisco Rothhammer47, Francisco Rothhammer48, Igor Rudan13, Ruslan Ruizbakiev49, Hovhannes Sahakyan50, Hovhannes Sahakyan38, Antti Sajantila51, Antonio Salas52, Elena B. 
Starikovskaya26, Ayele Tarekegn, Draga Toncheva34, Shahlo Turdikulova49, Ingrida Uktveryte37, Olga Utevska53, René Vasquez54, Mercedes Villena54, Mikhail Voevoda55, Cheryl A. Winkler56, Levon Yepiskoposyan50, Pierre Zalloua1, Pierre Zalloua57, Tatijana Zemunik58, Alan Cooper10, Cristian Capelli21, Mark G. Thomas41, Andres Ruiz-Linares41, Sarah A. Tishkoff59, Lalji Singh60, Kumarasamy Thangaraj61, Richard Villems62, Richard Villems43, Richard Villems38, David Comas63, Rem I. Sukernik26, Mait Metspalu38, Matthias Meyer4, Evan E. Eichler6, Joachim Burger5, Montgomery Slatkin7, Svante Pääbo4, Janet Kelso4, David Reich2, David Reich64, David Reich1, Johannes Krause3, Johannes Krause4 
Harvard University1, Broad Institute2, University of Tübingen3, Max Planck Society4, University of Mainz5, University of Washington6, University of California, Berkeley7, Massachusetts Institute of Technology8, Stockholm University9, University of Adelaide10, The Heritage Foundation11, National Museum of Natural History12, University of Edinburgh13, Sultan Qaboos University14, University of Costa Rica15, University of Antioquia16, Rambam Health Care Campus17, University of Pécs18, Al Akhawayn University19, Catholic University of the Sacred Heart20, University of Oxford21, Belgorod State University22, University of Toronto23, University of Buenos Aires24, University of Bern25, Russian Academy of Sciences26, Paul Sabatier University27, North-Eastern Federal University28, University of Chicago29, University of Arizona30, Stony Brook University31, University of Bergen32, Illumina33, Sofia Medical University34, Bashkir State University35, University of Cambridge36, Vilnius University37, Estonian Biocentre38, University of Strasbourg39, Amgen40, University College London41, Gladstone Institutes42, University of Tartu43, University of Oulu44, Muhimbili University of Health and Allied Sciences45, University of Palermo46, University of Tarapacá47, University of Chile48, Academy of Sciences of Uzbekistan49, Armenian National Academy of Sciences50, University of North Texas51, University of Santiago de Compostela52, University of Kharkiv53, Higher University of San Andrés54, Novosibirsk State University55, Leidos56, Lebanese American University57, University of Split58, University of Pennsylvania59, Banaras Hindu University60, Centre for Cellular and Molecular Biology61, Estonian Academy of Sciences62, Pompeu Fabra University63, Howard Hughes Medical Institute64
18 Sep 2014-Nature
TL;DR: It is shown that most present-day Europeans derive from at least three highly differentiated populations: west European hunter-gatherers, who contributed ancestry to all Europeans but not to Near Easterners; ancient north Eurasians related to Upper Palaeolithic Siberians; and early European farmers, who were mainly of Near Eastern origin but also harboured west European hunter-gatherer related ancestry.
Abstract: We sequenced the genomes of a ∼7,000-year-old farmer from Germany and eight ∼8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analysed these and other ancient genomes with 2,345 contemporary humans to show that most present-day Europeans derive from at least three highly differentiated populations: west European hunter-gatherers, who contributed ancestry to all Europeans but not to Near Easterners; ancient north Eurasians related to Upper Palaeolithic Siberians, who contributed to both Europeans and Near Easterners; and early European farmers, who were mainly of Near Eastern origin but also harboured west European hunter-gatherer related ancestry. We model these populations' deep relationships and show that early European farmers had ∼44% ancestry from a 'basal Eurasian' population that split before the diversification of other non-African lineages.

1,077 citations

References
Journal ArticleDOI
TL;DR: The neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods for reconstructing phylogenetic trees from evolutionary distance data.
Abstract: A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
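The pair-selection step described in this abstract can be sketched as follows. The sketch uses the widely taught Q-criterion formulation of neighbor-joining, Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), which is an assumption here: the abstract only states that the method picks the pair minimizing total branch length at each stage. The function names and the five-taxon distance matrix are hypothetical illustration data.

```python
def row_sums(dist):
    """Total distance from each taxon to all other taxa."""
    return [sum(row) for row in dist]


def pick_pair(dist):
    """Return the pair (i, j) with the smallest Q value, and that value."""
    n = len(dist)
    totals = row_sums(dist)
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


def branch_lengths(dist, i, j):
    """Branch lengths from taxa i and j to the new internal node joining them."""
    n = len(dist)
    totals = row_sums(dist)
    length_i = 0.5 * dist[i][j] + (totals[i] - totals[j]) / (2.0 * (n - 2))
    return length_i, dist[i][j] - length_i


# Hypothetical symmetric distance matrix for five taxa.
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value = pick_pair(example)
lengths = branch_lengths(example, *pair)
print(pair, q_value, lengths)
```

A full implementation would then replace the joined pair with the new node, recompute distances to it, and repeat until the tree is resolved; only the first stage is shown here.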

57,055 citations

Journal ArticleDOI
01 Jun 2000-Genetics
TL;DR: A model-based clustering method is described for using multilocus genotype data to infer population structure and assign individuals to populations; it can be applied to most of the commonly used genetic markers, provided that they are not closely linked.
Abstract: We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci— e.g. , seven microsatellite loci in an example using genotype data from an endangered bird species. The software used for this article is available from http://www.stats.ox.ac.uk/~pritch/home.html.

27,454 citations

Journal ArticleDOI
20 Jul 1978-Genetics
TL;DR: It is shown that the number of individuals to be used for estimating average heterozygosity can be very small if a large number of loci are studied and the average heterozygosity is low.
Abstract: The magnitudes of the systematic biases involved in sample heterozygosity and sample genetic distances are evaluated, and formulae for obtaining unbiased estimates of average heterozygosity and genetic distance are developed. It is also shown that the number of individuals to be used for estimating average heterozygosity can be very small if a large number of loci are studied and the average heterozygosity is low. The number of individuals to be used for estimating genetic distance can also be very small if the genetic distance is large and the average heterozygosity of the two species compared is low.

11,137 citations


"Inference of Population Splits and ..." refers background in this paper

  • ...e., the mean across all SNPs of $X_{ik}(1 - X_{ik})$). A natural estimator of $B_i$ is then: $B_i = \frac{h_i}{4N_i}$ (4), where $h_i$ is an unbiased estimate of the heterozygosity in population $i$ averaged over all SNPs [Nei, 1978]: $h_i = \frac{1}{n}\sum_{k=1}^{n}\frac{n_{ik}(2N_i - n_{ik})}{N_i(2N_i - 1)}$ (5). As derived in the main text, the sample covariance of populations $i$ and $j$, $W_{ij}$, is: $W_{ij} = V_{ij} - \frac{1}{m}\sum_{k=1}^{m}V_{ik} - \frac{1}{m}\sum_{k=1}^{m}V_{jk} + \frac{1}{m^2}\sum$...


  • ...A natural estimator of $B_i$ is then: $B_i = \frac{h_i}{4N_i}$ (4), where $h_i$ is an unbiased estimate of the heterozygosity in population $i$ averaged over all SNPs [Nei, 1978]: $h_i = \frac{1}{n}\sum_{k=1}^{n}\frac{n_{ik}(2N_i - n_{ik})}{N_i(2N_i - 1)}$....
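The two estimators quoted in these excerpts can be computed directly. The sketch below is a minimal illustration, assuming that n_ik is the count of one allele at SNP k in population i and that N_i diploid individuals (2 N_i chromosomes) are sampled at every SNP, with no missing data. The function names and example counts are hypothetical and not taken from the paper or its software.

```python
def unbiased_heterozygosity(allele_counts, n_individuals):
    """Average over SNPs of n_ik * (2N_i - n_ik) / (N_i * (2N_i - 1)).

    allele_counts: counts n_ik of one allele, one entry per SNP.
    n_individuals: number N_i of sampled diploid individuals (assumed
    identical at every SNP in this sketch).
    """
    two_n = 2 * n_individuals
    denominator = n_individuals * (two_n - 1)
    per_snp = [n_ik * (two_n - n_ik) / denominator for n_ik in allele_counts]
    return sum(per_snp) / len(per_snp)


def sampling_bias(allele_counts, n_individuals):
    """Estimator B_i = h_i / (4 * N_i) from equation (4) of the excerpt."""
    h_i = unbiased_heterozygosity(allele_counts, n_individuals)
    return h_i / (4 * n_individuals)


# Hypothetical data: three SNPs typed in N_i = 2 diploid individuals.
counts = [0, 2, 4]
h_estimate = unbiased_heterozygosity(counts, 2)
b_estimate = sampling_bias(counts, 2)
print(h_estimate, b_estimate)
```

In this toy example only the middle SNP is polymorphic in the sample, so it alone contributes to the average; fixed sites (counts of 0 or 2 N_i) contribute zero.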


Journal ArticleDOI
TL;DR: This article reviews the terminology used for phylogenetic networks and covers both split networks and reticulate networks, how they are defined, and how they can be interpreted and outlines the beginnings of a comprehensive statistical framework for applying split network methods.
Abstract: The evolutionary history of a set of taxa is usually represented by a phylogenetic tree, and this model has greatly facilitated the discussion and testing of hypotheses. However, it is well known that more complex evolutionary scenarios are poorly described by such models. Further, even when evolution proceeds in a tree-like manner, analysis of the data may not be best served by using methods that enforce a tree structure but rather by a richer visualization of the data to evaluate its properties, at least as an essential first step. Thus, phylogenetic networks should be employed when reticulate events such as hybridization, horizontal gene transfer, recombination, or gene duplication and loss are believed to be involved, and, even in the absence of such events, phylogenetic networks have a useful role to play. This article reviews the terminology used for phylogenetic networks and covers both split networks and reticulate networks, how they are defined, and how they can be interpreted. Additionally, the article outlines the beginnings of a comprehensive statistical framework for applying split network methods. We show how split networks can represent confidence sets of trees and introduce a conservative statistical test for whether the conflicting signal in a network is treelike. Finally, this article describes a new program, SplitsTree4, an interactive and comprehensive tool for inferring different types of phylogenetic networks from sequences, distances, and trees.

7,273 citations


"Inference of Population Splits and ..." refers methods in this paper

  • ...Graph-based models are of growing interest in phylogenetics [45,46], but have been rarely used in population genetics (with some exceptions [37,40,47])....


Book
01 Jun 1974
TL;DR: Because the lm function provides many features, it is rather complicated; the function lsfit, which computes only the coefficient estimates and the residuals, is used as a model instead.
Abstract: Since the lm function provides a lot of features it is rather complicated. So we are going to instead use the function lsfit as a model. It computes only the coefficient estimates and the residuals. Now would be a good time to read the help file for lsfit. Note that lsfit supports the fitting of multiple least squares models and weighted least squares. Our function will not, hence we can omit the arguments wt, weights and yname. Also, changing tolerances is a little advanced so we will trust the default values and omit the argument tolerance as well.

6,956 citations