Posted Content

Dated ancestral trees from binary trait data and its application to the diversification of languages

TL;DR: This work proposes a model‐based analysis of binary trait data and presents a Markov chain Monte Carlo algorithm that can sample from the resulting posterior distribution, based on using a birth–death process for the evolution of the elements of sets of traits.
Abstract: Binary trait data record the presence or absence of distinguishing traits in individuals. We treat the problem of estimating ancestral trees with time depth from binary trait data. Simple analysis of such data is problematic. Each homology class of traits has a unique birth event on the tree, and the birth event of a trait visible at the leaves is biased towards the leaves. We propose a model-based analysis of such data, and present an MCMC algorithm that can sample from the resulting posterior distribution. Our model is based on using a birth-death process for the evolution of the elements of sets of traits. Our analysis correctly accounts for the removal of singleton traits, which are commonly discarded in real data sets. We illustrate Bayesian inference for two binary-trait data sets which arise in historical linguistics. The Bayesian approach allows for the incorporation of information from ancestral languages. The marginal prior distribution of the root time is uniform. We present a thorough analysis of the robustness of our results to model misspecification, through analysis of predictive distributions for external data, and fitting data simulated under alternative observation models. The reconstructed ages of tree nodes are relatively robust, whilst posterior probabilities for topology are not reliable.
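The birth-death trait process and the removal of singleton traits described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes a hypothetical two-leaf tree whose leaves share a root `T` time units ago, a root that carries a Poisson number of traits with mean `lam / mu`, and illustrative function names (`poisson_times`, `simulate_cherry`, `registered`).

```python
import math
import random


def poisson_times(rate, length, rng):
    """Event times of a homogeneous Poisson process with the given rate on [0, length]."""
    times = []
    t = rng.expovariate(rate)
    while t < length:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def simulate_cherry(lam, mu, T, rng):
    """Simulate trait columns (leaf0, leaf1) for two leaves that split T time units ago.

    Assumption: traits are born at rate lam per lineage, each copy dies at rate mu,
    and the root starts with a Poisson(lam / mu) number of traits.
    """
    columns = []
    survive = math.exp(-mu * T)
    # Traits present at the root: the number of events of a rate lam/mu process on [0, 1].
    for _ in poisson_times(lam / mu, 1.0, rng):
        # Each leaf inherits a copy that survives its own branch independently.
        columns.append((int(rng.random() < survive), int(rng.random() < survive)))
    # Traits born on one branch can only ever be displayed at that branch's leaf.
    for leaf in (0, 1):
        for age in poisson_times(lam, T, rng):
            alive = int(rng.random() < math.exp(-mu * age))
            columns.append((alive, 0) if leaf == 0 else (0, alive))
    return columns


def registered(columns, min_leaves=1):
    """Keep traits displayed at min_leaves or more leaves (min_leaves=2 drops singletons)."""
    return [c for c in columns if sum(c) >= min_leaves]


rng = random.Random(1)
all_traits = simulate_cherry(lam=0.01, mu=0.0005, T=1000.0, rng=rng)
observed = registered(all_traits, min_leaves=1)
without_singletons = registered(all_traits, min_leaves=2)
```

On this two-leaf tree every trait that survives singleton removal must have been born at or above the root, which gives a concrete picture of why discarding singletons changes which birth events are visible and has to be accounted for in the likelihood.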
Citations
Journal ArticleDOI
23 Jan 2009-Science
TL;DR: The results are robust to assumptions about the rooting and calibration of the trees and demonstrate the combined power of linguistic scholarship, database technologies, and computational phylogenetic methods for resolving questions about human prehistory.
Abstract: Debates about human prehistory often center on the role that population expansions play in shaping biological and cultural diversity. Hypotheses on the origin of the Austronesian settlers of the Pacific are divided between a recent “pulse-pause” expansion from Taiwan and an older “slow-boat” diffusion from Wallacea. We used lexical data and Bayesian phylogenetic methods to construct a phylogeny of 400 languages. In agreement with the pulse-pause scenario, the language trees place the Austronesian origin in Taiwan approximately 5230 years ago and reveal a series of settlement pauses and expansion pulses linked to technological and social innovations. These results are robust to assumptions about the rooting and calibration of the trees and demonstrate the combined power of linguistic scholarship, database technologies, and computational phylogenetic methods for resolving questions about human prehistory.

632 citations


Cites background from "Dated ancestral trees from binary t..."

  • ...Nothofer, B. (1986) The Barrier island languages in the Austronesian language family....

    [...]

Journal ArticleDOI
24 Aug 2012-Science
TL;DR: Both the inferred timing and root location of the Indo-European language trees fit with an agricultural expansion from Anatolia beginning 8000 to 9500 years ago, which supports the suggestion that the origin of the language family was indeed Anatolia 7 to 10 thousand years ago—contemporaneous with the spread of agriculture.
Abstract: There are two competing hypotheses for the origin of the Indo-European language family. The conventional view places the homeland in the Pontic steppes about 6000 years ago. An alternative hypothesis claims that the languages spread from Anatolia with the expansion of farming 8000 to 9500 years ago. We used Bayesian phylogeographic approaches, together with basic vocabulary data from 103 ancient and contemporary Indo-European languages, to explicitly model the expansion of the family and test these hypotheses. We found decisive support for an Anatolian origin over a steppe origin. Both the inferred timing and root location of the Indo-European language trees fit with an agricultural expansion from Anatolia beginning 8000 to 9500 years ago. These results highlight the critical role that phylogeographic inference can play in resolving debates about human prehistory.

487 citations

Book
08 Oct 2015
TL;DR: This book gives a comprehensive overview of Bayesian approaches to phylogenetics with BEAST 2, covering the underlying models, the choice of parameterisations and priors, and the diagnosis of Bayesian MCMC analyses.
Abstract: What are the models used in phylogenetic analysis and what exactly is involved in Bayesian evolutionary analysis using Markov chain Monte Carlo (MCMC) methods? How can you choose and apply these models, which parameterisations and priors make sense, and how can you diagnose Bayesian MCMC when things go wrong? These are just a few of the questions answered in this comprehensive overview of Bayesian approaches to phylogenetics. This practical guide: addresses the theoretical aspects of the field; advises on how to prepare and perform phylogenetic analysis; helps with interpreting analyses and visualisation of phylogenies; describes the software architecture; helps developing BEAST 2.2 extensions to allow these models to be extended further. With an accompanying website providing example files and tutorials (http://beast2.org/), this one-stop reference to applying the latest phylogenetic models in BEAST 2 will provide essential guidance for all users – from those using phylogenetic tools, to computational biologists and Bayesian statisticians.

390 citations

Journal ArticleDOI
TL;DR: Current computational methods derived from evolutionary biology are used to compare the extent to which lexical evolution is tree-like in different parts of the world and to evaluate the coherence of cultural and linguistic lineages.
Abstract: In this paper we outline two debates about the nature of human cultural history. The first focuses on the extent to which human history is tree-like (its shape), and the second on the unity of that history (its fabric). Proponents of cultural phylogenetics are often accused of assuming that human history has been both highly tree-like and consisting of tightly linked lineages. Critics have pointed out obvious exceptions to these assumptions. Instead of a priori dichotomous disputes about the validity of cultural phylogenetics, we suggest that the debate is better conceptualized as involving positions along continuous dimensions. The challenge for empirical research is, therefore, to determine where particular aspects of culture lie on these dimensions. We discuss the ability of current computational methods derived from evolutionary biology to address these questions. These methods are then used to compare the extent to which lexical evolution is tree-like in different parts of the world and to evaluate the coherence of cultural and linguistic lineages.

157 citations


Cites background from "Dated ancestral trees from binary t..."

  • ...According to the recent phylogenetic estimates (Gray & Atkinson 2003; Nicholls & Gray 2008), the initial divergence of Indo-European languages dates back to approximately 8500 years, whereas Polynesian languages date back to only 3000 years (Gray et al. 2009; Spriggs 2010)....

    [...]

  • ...…in the last 3000 years (Spriggs 2010), whereas the Indo-European languages started to disperse across continental Europe approximately 8500 years ago, with the major radiation of the language families occurring around 6000 years BP (Gray & Atkinson 2003; Atkinson et al. 2005; Nicholls & Gray 2008)....

    [...]

Journal ArticleDOI
01 Mar 2015-Language
TL;DR: A phylogenetic analysis in which ancestry constraints permit more accurate inference of rates of change, based on observed changes between ancient or medieval languages and their modern descendants, shows that lexical traits undergo recurrent evolution due to recurring patterns of semantic and morphological change.
Abstract: Ancestry-constrained phylogenetic analysis supports the Indo-European steppe hypothesis. Will Chang, Chundra Cathcart, David Hall, Andrew Garrett. Language, Volume 91, Number 1, March 2015, pp. 194-244 (Article). Published by Linguistic Society of America. DOI: 10.1353/lan.2015.0005. For additional information about this article: http://muse.jhu.edu/journals/lan/summary/v091/91.1.chang.html

155 citations

References
Journal ArticleDOI
TL;DR: The neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods for reconstructing phylogenetic trees from evolutionary distance data.
Abstract: A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
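The pair-selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's own notation: it assumes the commonly used Q-criterion form of neighbor-joining, in which the pair minimizing `(n - 2) * d(i, j) - r_i - r_j` is joined (with `r_i` the row sum of distances for taxon `i`), and the function name `nj_first_join` and the 5-taxon matrix `EXAMPLE` are made up for the example.

```python
def nj_first_join(dist):
    """Return the first pair joined by neighbor-joining and its two limb lengths.

    dist is a symmetric matrix (list of lists) with zero diagonal and n >= 3 taxa.
    """
    n = len(dist)
    row_sum = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            # Q-criterion: smaller values mean joining (i, j) gives a shorter total tree.
            q = (n - 2) * dist[i][j] - row_sum[i] - row_sum[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    i, j = best_pair
    # Branch lengths from i and j to the new internal node that replaces them.
    limb_i = dist[i][j] / 2 + (row_sum[i] - row_sum[j]) / (2 * (n - 2))
    limb_j = dist[i][j] - limb_i
    return best_pair, (limb_i, limb_j)


# Hypothetical distances between five taxa, used only for illustration.
EXAMPLE = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, limbs = nj_first_join(EXAMPLE)
```

A full implementation would replace the joined pair by a new node, recompute its distances to the remaining taxa, and repeat until the tree is resolved; only the first stage is shown here.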

57,055 citations


"Dated ancestral trees from binary t..." refers background or methods or result in this paper

  • ...Readers interested in the application only should read the first paragraph of Section 2, and all of Section 4, before jumping to the data analysis in Sections 7 and 8. Graphics illustrating this application make up the bulk of the supplement Nicholls and Gray (2007), see http://www.stats.ox.ac.uk/~nicholls/linkfiles/papers/NichollsGray06-SUPP.pdf....

    [...]

  • ...In the supplement, Nicholls and Gray (2007), we define and display consensus trees, a central point estimate for topology and branch length....

    [...]

  • ...A graphical illustration of the process and notation is given in the supplement, Nicholls and Gray (2007)....

    [...]

  • ...The branching at the top of the superclade s-BCIG is poorly resolved as the s-BCIG, s-CIG and s-IG branches are separated by just 1000 years in the tree on which the synthetic data was simulated (see Nicholls and Gray (2007)), which is small compared to $\mu^{-1} \simeq 3000$....

    [...]

  • ...In Huson and Steel (2004) the traits are distinct genes which are present or absent in an individual, and trees are built using a maximum-likelihood pairwise-distance, and the neighbor-joining methods of Saitou and Nei (1987)....

    [...]

Journal ArticleDOI
TL;DR: A computationally feasible method for finding such maximum likelihood estimates is developed, and a computer program is available that allows the testing of hypotheses about the constancy of evolutionary rates by likelihood ratio tests.
Abstract: The application of maximum likelihood techniques to the estimation of evolutionary trees from nucleic acid sequence data is discussed. A computationally feasible method for finding such maximum likelihood estimates is developed, and a computer program is available. This method has advantages over the traditional parsimony algorithms, which can give misleading results if rates of evolution differ in different lineages. It also allows the testing of hypotheses about the constancy of evolutionary rates by likelihood ratio tests, and gives rough indication of the error of the estimate of the tree.
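The likelihood computation behind such maximum likelihood estimates is usually organised as a pruning recursion over the tree. The sketch below is a simplified illustration under stated assumptions, not the program mentioned in the abstract: it uses a hypothetical two-state symmetric substitution model instead of nucleic acid sequences, a uniform distribution over states at the root, and made-up names (`transition`, `partial`, `tree_likelihood`).

```python
import math
from itertools import product


def transition(rate, t):
    """Transition probabilities of a symmetric two-state model over a branch of length t."""
    same = 0.5 + 0.5 * math.exp(-2.0 * rate * t)
    return [[same, 1.0 - same], [1.0 - same, same]]


def partial(node, rate):
    """Partial likelihoods [L(state 0), L(state 1)] of the data below a node.

    A node is ('leaf', observed_state) or ('node', [(child, branch_length), ...]).
    """
    kind, payload = node
    if kind == "leaf":
        return [1.0 if s == payload else 0.0 for s in (0, 1)]
    like = [1.0, 1.0]
    for child, length in payload:
        P = transition(rate, length)
        child_like = partial(child, rate)
        for s in (0, 1):
            # Sum over the unknown state at the child end of the branch.
            like[s] *= sum(P[s][t] * child_like[t] for t in (0, 1))
    return like


def tree_likelihood(root, rate):
    """Likelihood of the leaf states, assuming a uniform distribution at the root."""
    return sum(0.5 * value for value in partial(root, rate))


def cherry(a, b, length=0.3):
    """Hypothetical two-leaf tree with leaf states a and b and equal branch lengths."""
    return ("node", [(("leaf", a), length), (("leaf", b), length)])


# The likelihoods of all four possible leaf patterns sum to one.
total = sum(tree_likelihood(cherry(a, b), 1.0) for a, b in product((0, 1), repeat=2))
```

Maximising `tree_likelihood` over branch lengths and topologies is the expensive part that the recursion makes feasible, since each node is visited once per site rather than once per assignment of ancestral states.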

13,111 citations


"Dated ancestral trees from binary t..." refers background or methods in this paper

  • ...Felsenstein (1992) gave the likelihood for a Poisson process acting on a finite state space, along the branches of a tree, conditioned to show states other than the zero state at the leaves. Lewis (2001) proposed to apply certain trait models of this kind (so-called Jukes–Cantor models) to morphological character data, in a maximum likelihood analysis. Lewis (2001) mentioned the problem of thinning traits that are displayed at a single taxon and treated it by ensuring that the data are not so thinned. Nylander et al. (2004) fitted models from the same family, allowing for the thinning of all parsimony uninformative characters (traits that are displayed at 0, 1, L−1 or L leaves). These models do not constrain a trait to be generated at a single birth event. They modelled a fixed number of traits which move back and forwards between different categorical values indefinitely. The number of distinct traits is fixed for all time. We impose a single birth event for a trait and an evolution which proceeds from absence to presence to absence only. The number of distinct traits that are generated by our process is random, so the total number is informative of the relative rates of birth and death. The model that we have described resembles the Watterson (1975) infinite sites model, but here trait death is in effect back-mutation. Our model is similar to the infinite alleles model of Kimura and Crow (1964), though the number of alleles is not random, whereas the number of traits is random....

    [...]

  • ...…μ}, and consequently the likelihood is $$P(D \mid g, \mu, \lambda) = \frac{1}{N!} \exp\left\{ -\int_{[g]} \lambda(z)\, dz \right\} \prod_{a=1}^{N} \lambda \int_{[g]} \Pr\{M_a = m_a \mid x_a, g, \mu\}\, dx_a. \qquad (3)$$ We compute $\lambda \int_{[g]} \Pr\{O(z) > d \mid z, g, \mu\}\, dz$ and the factors $\lambda \int_{[g]} \Pr\{M_a = m_a \mid x_a, g, \mu\}\, dx_a$ using recursions that are related to the pruning recursion of Felsenstein (1981)....

    [...]

  • ...The model, which is described in Huson and Steel (2004) and Atkinson et al....

    [...]

Journal ArticleDOI
TL;DR: Analysis of Phylogenetics and Evolution (APE) is a package written in the R language for use in molecular evolution and phylogenetics that provides both utility functions for reading and writing data and manipulating phylogenetic trees.
Abstract: Summary: Analysis of Phylogenetics and Evolution (APE) is a package written in the R language for use in molecular evolution and phylogenetics. APE provides both utility functions for reading and writing data and manipulating phylogenetic trees, as well as several advanced methods for phylogenetic and evolutionary analysis (e.g. comparative and population genetic methods). APE takes advantage of the many R functions for statistics and graphics, and also provides a flexible framework for developing and implementing further statistical methods for the analysis of evolutionary processes. Availability: The program is free and available from the official R package archive at http://cran.r-project.org/src/contrib/PACKAGES.html#ape. APE is licensed under the GNU General Public License.

10,818 citations

Journal ArticleDOI
TL;DR: This article reviews the terminology used for phylogenetic networks and covers both split networks and reticulate networks, how they are defined, and how they can be interpreted and outlines the beginnings of a comprehensive statistical framework for applying split network methods.
Abstract: The evolutionary history of a set of taxa is usually represented by a phylogenetic tree, and this model has greatly facilitated the discussion and testing of hypotheses. However, it is well known that more complex evolutionary scenarios are poorly described by such models. Further, even when evolution proceeds in a tree-like manner, analysis of the data may not be best served by using methods that enforce a tree structure but rather by a richer visualization of the data to evaluate its properties, at least as an essential first step. Thus, phylogenetic networks should be employed when reticulate events such as hybridization, horizontal gene transfer, recombination, or gene duplication and loss are believed to be involved, and, even in the absence of such events, phylogenetic networks have a useful role to play. This article reviews the terminology used for phylogenetic networks and covers both split networks and reticulate networks, how they are defined, and how they can be interpreted. Additionally, the article outlines the beginnings of a comprehensive statistical framework for applying split network methods. We show how split networks can represent confidence sets of trees and introduce a conservative statistical test for whether the conflicting signal in a network is treelike. Finally, this article describes a new program, SplitsTree4, an interactive and comprehensive tool for inferring different types of phylogenetic networks from sequences, distances, and trees.

7,273 citations


"Dated ancestral trees from binary t..." refers methods in this paper

  • ...8-2 of Paradis et al. (2004). Monte Carlo simulations were carried out using TraitLab, a freely available MatLab package written by Geoff Nicholls and David Welch....

    [...]

Journal ArticleDOI
TL;DR: The distribution is obtained for the number of segregating sites observed in a sample from a population which is subject to recurring, new, mutations but not subject to recombination, and applies approximately to three population models.

3,870 citations