Showing papers in "Systematic Biology in 2014"

A Linear-Time Algorithm for Gaussian and Non-Gaussian Trait Evolution Models

TL;DR: The Dispersal-Extinction-Cladogenesis (DEC) model of LAGRANGE is re-implemented and modified to create a new model, DEC+J, which adds founder-event speciation, the importance of which is governed by a new free parameter, j; the results indicate that the assumptions of historical biogeography models can have large impacts on inference and require testing and comparison with statistical methods.

Abstract: Founder-event speciation, where a rare jump dispersal event founds a new genetically isolated lineage, has long been considered crucial by many historical biogeographers, but its importance is disputed within the vicariance school. Probabilistic modeling of geographic range evolution creates the potential to test different biogeographical models against data using standard statistical model choice procedures, as long as multiple models are available. I re-implement the Dispersal-Extinction-Cladogenesis (DEC) model of LAGRANGE in the R package BioGeoBEARS, and modify it to create a new model, DEC+J, which adds founder-event speciation, the importance of which is governed by a new free parameter, j. The identifiability of DEC and DEC+J is tested on data sets simulated under a wide range of macroevolutionary models where geography evolves jointly with lineage birth/death events. The results confirm that DEC and DEC+J are identifiable even though these models ignore the fact that molecular phylogenies are missing many cladogenesis and extinction events. The simulations also indicate that DEC will have substantially increased errors in ancestral range estimation and parameter inference when the true model includes +J. DEC and DEC+J are compared on 13 empirical data sets drawn from studies of island clades. Likelihood-ratio tests indicate that all clades reject DEC, and AICc model weights show large to overwhelming support for DEC+J, for the first time verifying the importance of founder-event speciation in island clades via statistical model choice. Under DEC+J, ancestral nodes are usually estimated to have ranges occupying only one island, rather than the widespread ancestors often favored by DEC. These results indicate that the assumptions of historical biogeography models can have large impacts on inference and require testing and comparison with statistical methods. 
(BioGeoBEARS; cladogenesis; extinction; founder-event speciation; GeoSSE; historical biogeography; jump dispersal; LAGRANGE.)
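The model comparison described in this abstract (a likelihood-ratio test plus AICc model weights for DEC versus DEC+J) can be sketched in a few lines of Python. This is a minimal illustration, not the BioGeoBEARS implementation: the log-likelihood values and sample size below are hypothetical, and the parameter counts (2 for DEC, 3 for DEC+J) are an assumption based on DEC+J adding the single free parameter j.

```python
import math


def lrt_pvalue_1df(lnl_null, lnl_alt):
    """Likelihood-ratio test for nested models that differ by one free parameter.

    Returns the test statistic and its p-value under a chi-square
    distribution with 1 degree of freedom.
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def aicc(lnl, k, n):
    """Small-sample corrected AIC for log-likelihood lnl, k parameters, n samples."""
    return -2.0 * lnl + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert a list of AICc scores into relative model weights that sum to 1."""
    best = min(scores)
    rel = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical values for illustration only (not taken from the paper).
lnl_dec, lnl_decj = -40.0, -30.0
n_samples = 20  # assumed sample size used in the AICc correction

stat, p_value = lrt_pvalue_1df(lnl_dec, lnl_decj)
scores = [aicc(lnl_dec, 2, n_samples), aicc(lnl_decj, 3, n_samples)]
weights = akaike_weights(scores)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.2e}")
print(f"AICc weights: DEC = {weights[0]:.4f}, DEC+J = {weights[1]:.4f}")
```

A small p-value together with a DEC+J weight near 1 would correspond to the pattern the abstract reports for the island clades.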

932 citations

Journal Article•DOI•

[...]

Lam Si Tung Ho¹, Cécile Ané¹•Institutions (1)

University of Wisconsin-Madison¹

A Generalized K Statistic for Estimating Phylogenetic Signal from Shape and Other High-Dimensional Multivariate Data

TL;DR: A linear-time algorithm applicable to a large class of trait evolution models, for efficient likelihood calculations and parameter inference on very large trees, which solves the traditional computational burden associated with two key terms, namely the determinant of the phylogenetic covariance matrix V and quadratic products involving the inverse of V.

Abstract: We developed a linear-time algorithm applicable to a large class of trait evolution models, for efficient likelihood calculations and parameter inference on very large trees. Our algorithm solves the traditional computational burden associated with two key terms, namely the determinant of the phylogenetic covariance matrix V and quadratic products involving the inverse of V. Applications include Gaussian models such as Brownian motion-derived models like Pagel's lambda, kappa, delta, and the early-burst model; Ornstein-Uhlenbeck models to account for natural selection with possibly varying selection parameters along the tree; as well as non-Gaussian models such as phylogenetic logistic regression, phylogenetic Poisson regression, and phylogenetic generalized linear mixed models. Outside of phylogenetic regression, our algorithm also applies to phylogenetic principal component analysis, phylogenetic discriminant analysis or phylogenetic prediction. The computational gain opens up new avenues for complex models or extensive resampling procedures on very large trees. We identify the class of models that our algorithm can handle as all models whose covariance matrix has a 3-point structure. We further show that this structure uniquely identifies a rooted tree whose branch lengths parametrize the trait covariance matrix, which acts as a similarity matrix. The new algorithm is implemented in the R package phylolm, including functions for phylogenetic linear regression and phylogenetic logistic regression.
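The abstract identifies the supported models as those whose covariance matrix has a 3-point structure. The sketch below checks a matrix for that structure under an assumed reading of the condition: the matrix is symmetric with nonnegative entries, no off-diagonal entry exceeds the corresponding diagonal entries, and for every triple of tips the two smallest of the three pairwise covariances are equal. The function name and the example matrices are hypothetical, and this brute-force check is not the linear-time algorithm itself.

```python
from itertools import combinations


def has_three_point_structure(v, tol=1e-9):
    """Return True if the square matrix v (list of lists) satisfies the
    assumed 3-point condition described in the lead-in."""
    n = len(v)
    for i in range(n):
        for j in range(n):
            # Symmetry and nonnegativity.
            if abs(v[i][j] - v[j][i]) > tol or v[i][j] < -tol:
                return False
            # Shared covariance cannot exceed either tip's own variance.
            if v[i][j] > min(v[i][i], v[j][j]) + tol:
                return False
    for i, j, k in combinations(range(n), 3):
        smallest, middle, _ = sorted((v[i][j], v[i][k], v[j][k]))
        # The two smallest pairwise covariances must coincide.
        if abs(smallest - middle) > tol:
            return False
    return True


# Brownian-motion covariance for the tree ((A:1,B:1):1,(C:1.5,D:0.5):0.5):
# each entry is the branch length shared by two tips from the root.
tree_like = [
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]

# A symmetric matrix that no rooted tree can generate.
not_tree_like = [
    [2.0, 1.0, 0.5],
    [1.0, 2.0, 0.0],
    [0.5, 0.0, 2.0],
]

print(has_three_point_structure(tree_like))
print(has_three_point_structure(not_tree_like))
```

The check is cubic in the number of tips, so it is useful only for small illustrative matrices.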

728 citations

Journal Article•DOI•

[...]

Dean C. Adams¹•Institutions (1)

Iowa State University¹

30 Apr 2014-Systematic Biology

TL;DR: A generalization of the K statistic of Blomberg et al. that is useful for quantifying and evaluating phylogenetic signal in highly dimensional multivariate data is described, and the utility of the new approach is illustrated by evaluating the strength of phylogenetic signal for head shape in a lineage of Plethodon salamanders.

Abstract: Phylogenetic signal is the tendency for closely related species to display similar trait values due to their common ancestry. Several methods have been developed for quantifying phylogenetic signal in univariate traits and for sets of traits treated simultaneously, and the statistical properties of these approaches have been extensively studied. However, methods for assessing phylogenetic signal in high-dimensional multivariate traits like shape are less well developed, and their statistical performance is not well characterized. In this article, I describe a generalization of the K statistic of Blomberg et al. that is useful for quantifying and evaluating phylogenetic signal in highly dimensional multivariate data. The method (Kmult) is found from the equivalency between statistical methods based on covariance matrices and those based on distance matrices. Using computer simulations based on Brownian motion, I demonstrate that the expected value of Kmult remains at 1.0 as trait variation among species is increased or decreased, and as the number of trait dimensions is increased. By contrast, estimates of phylogenetic signal found with a squared-change parsimony procedure for multivariate data change with increasing trait variation among species and with increasing numbers of trait dimensions, confounding biological interpretations. I also evaluate the statistical performance of hypothesis testing procedures based on Kmult and find that the method displays appropriate Type I error and high statistical power for detecting phylogenetic signal in high-dimensional data. Statistical properties of Kmult were consistent for simulations using bifurcating and random phylogenies, for simulations using different numbers of species, for simulations that varied the number of trait dimensions, and for different underlying models of trait covariance structure.
Overall these findings demonstrate that Kmult provides a useful means of evaluating phylogenetic signal in high-dimensional multivariate traits. Finally, I illustrate the utility of the new approach by evaluating the strength of phylogenetic signal for head shape in a lineage of Plethodon salamanders. (Geometric morphometrics; macroevolution; morphological evolution; phylogenetic comparative method.)

452 citations

Journal Article•DOI•

Species delimitation using genome-wide SNP data.

[...]

Adam D. Leaché¹, Matthew K. Fujita², Vladimir N. Minin¹, Remco R. Bouckaert³•Institutions (3)

University of Washington¹, University of Texas at Arlington², University of Auckland³

The Influence of Gene Flow on Species Tree Estimation: A Simulation Study

TL;DR: A recently introduced dynamic programming algorithm for estimating species trees, which bypasses MCMC integration over gene trees, is combined with sophisticated methods for estimating marginal likelihoods, needed for Bayesian model selection, to provide a rigorous and computationally tractable technique for genome-wide species delimitation.

Abstract: The multispecies coalescent has provided important progress for evolutionary inferences, including increasing the statistical rigor and objectivity of comparisons among competing species delimitation models. However, Bayesian species delimitation methods typically require brute force integration over gene trees via Markov chain Monte Carlo (MCMC), which introduces a large computation burden and precludes their application to genomic-scale data. Here we combine a recently introduced dynamic programming algorithm for estimating species trees that bypasses MCMC integration over gene trees with sophisticated methods for estimating marginal likelihoods, needed for Bayesian model selection, to provide a rigorous and computationally tractable technique for genome-wide species delimitation. We provide a critical yet simple correction that brings the likelihoods of different species trees, and more importantly their corresponding marginal likelihoods, to the same common denominator, which enables direct and accurate comparisons of competing species delimitation models using Bayes factors. We test this approach, which we call Bayes factor delimitation (*with genomic data; BFD*), using common species delimitation scenarios with computer simulations. Varying the numbers of loci and the number of samples suggest that the approach can distinguish the true model even with few loci and limited samples per species. Misspecification of the prior for population size θ has little impact on support for the true model. We apply the approach to West African forest geckos (Hemidactylus fasciatus complex) using genome-wide SNP data. This new Bayesian method for species delimitation builds on a growing trend for objective species delimitation methods with explicit model assumptions that are easily tested. [Bayes factor; model testing; phylogeography; RADseq; simulation; speciation.].
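The comparison of competing delimitation models by Bayes factors, as described in this abstract, reduces to simple arithmetic once marginal log-likelihoods are available. The following sketch ranks models and reports 2 ln BF against the best one. The model names and marginal log-likelihood values are hypothetical, the function names are illustrative, and the verbal labels follow the commonly used Kass and Raftery thresholds rather than anything stated in the abstract.

```python
def two_ln_bf(mle_model, mle_reference):
    """2 * ln(Bayes factor) of a model against a reference model,
    given their marginal log-likelihood estimates."""
    return 2.0 * (mle_model - mle_reference)


def support_label(stat):
    """Verbal strength of evidence for |2 ln BF| (Kass and Raftery scale)."""
    s = abs(stat)
    if s > 10.0:
        return "very strong"
    if s > 6.0:
        return "strong"
    if s > 2.0:
        return "positive"
    return "weak"


def rank_models(mles):
    """Sort models from best to worst marginal log-likelihood and attach
    2 ln BF relative to the best model (0 for the best, negative otherwise)."""
    ordered = sorted(mles.items(), key=lambda item: item[1], reverse=True)
    best = ordered[0][1]
    return [(name, mle, two_ln_bf(mle, best)) for name, mle in ordered]


# Hypothetical marginal log-likelihoods for three delimitation models.
marginal_lnl = {
    "current taxonomy": -1520.4,
    "split lineage": -1498.1,
    "lump lineages": -1560.9,
}

for name, mle, stat in rank_models(marginal_lnl):
    print(f"{name:18s} lnML = {mle:9.1f}  2lnBF vs best = {stat:7.1f}  ({support_label(stat)})")
```

Because every model is compared with the same best-ranked model, the table can be read top to bottom as decreasing support.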

376 citations

Journal Article•DOI•

[...]

Adam D. Leaché¹, Rebecca Harris¹, Bruce Rannala², Bruce Rannala³, Ziheng Yang⁴, Ziheng Yang³•Institutions (4)

University of Washington¹, University of California, Davis², Beijing Institute of Genomics³, University College London⁴

01 Jan 2014-Systematic Biology

TL;DR: The results highlight the need for careful sampling design in phylogeographic and species delimitation studies as gene flow, introgression, or incorrect sample assignments can bias the estimation of the species tree topology and of parameter estimates such as population sizes and divergence times.

Abstract: Gene flow among populations or species and incomplete lineage sorting (ILS) are two evolutionary processes responsible for generating gene tree discordance and therefore hindering species tree estimation. Numerous studies have evaluated the impacts of ILS on species tree inference, yet the ramifications of gene flow on species trees remain less studied. Here, we simulate and analyse multilocus sequence data generated with ILS and gene flow to quantify their impacts on species tree inference. We characterize species tree estimation errors under various models of gene flow, such as the isolation-migration model, the n-island model, and gene flow between non-sister species or involving ancestral species, and species boundaries crossed by a single gene copy (allelic introgression) or by a single migrant individual. These patterns of gene flow are explored on species trees of different sizes (4 vs. 10 species), at different time scales (shallow vs. deep), and with different migration rates. Species trees are estimated with the multispecies coalescent model using Bayesian methods (BEST and *BEAST) and with a summary statistic approach (MPEST) that facilitates phylogenomic-scale analysis. Even in cases where the topology of the species tree is estimated with high accuracy, we find that gene flow can result in overestimates of population sizes (species tree dilation) and underestimates of species divergence times (species tree compression). Signatures of migration events remain present in the distribution of coalescent times for gene trees, and with sufficient data it is possible to identify those loci that have crossed species boundaries. These results highlight the need for careful sampling design in phylogeographic and species delimitation studies as gene flow, introgression, or incorrect sample assignments can bias the estimation of the species tree topology and of parameter estimates such as population sizes and divergence times. 
(*BEAST; BEST; coalescence; compression; dilation; introgression; MPEST; migration; simulation.)

297 citations

Journal Article•DOI•

Target Capture and Massively Parallel Sequencing of Ultraconserved Elements for Comparative Studies at Shallow Evolutionary Time Scales

[...]

Brian Tilston Smith¹, Michael G. Harvey¹, Brant C. Faircloth², Travis C. Glenn³, Robb T. Brumfield¹•Institutions (3)

Louisiana State University¹, University of California, Los Angeles², University of Georgia³

01 Jan 2014-Systematic Biology

TL;DR: Using sequence capture and massively parallel sequencing to generate UCE data for five co-distributed Neotropical rainforest bird species, it is found that orthologous UCEs are an effective genetic marker for studies investigating evolutionary patterns and processes at shallow timescales.

Abstract: Comparative genetic studies of non-model organisms are transforming rapidly due to major advances in sequencing technology. A limiting factor in these studies has been the identification and screening of orthologous loci across an evolutionarily distant set of taxa. Here, we evaluate the efficacy of genomic markers targeting ultraconserved DNA elements (UCEs) for analyses at shallow evolutionary timescales. Using sequence capture and massively parallel sequencing to generate UCE data for five co-distributed Neotropical rainforest bird species, we recovered 776-1516 UCE loci across the five species. Across species, 53-77% of the loci were polymorphic, containing between 2.0 and 3.2 variable sites per polymorphic locus, on average. We performed species tree construction, coalescent modeling, and species delimitation, and we found that the five co-distributed species exhibited discordant phylogeographic histories. We also found that species trees and divergence times estimated from UCEs were similar to the parameters obtained from mtDNA. The species that inhabit the understory had older divergence times across barriers, contained a higher number of cryptic species, and exhibited larger effective population sizes relative to the species inhabiting the canopy. Because orthologous UCEs can be obtained from a wide array of taxa, are polymorphic at shallow evolutionary timescales, and can be generated rapidly at low cost, they are an effective genetic marker for studies investigating evolutionary patterns and processes at shallow timescales. (Birds; coalescent theory; isolation-with-migration; massively parallel sequencing; Neotropics; next-generation sequencing; phylogeography; SNPs.)

290 citations

Journal Article•DOI•

Borneo and Indochina are Major Evolutionary Hotspots for Southeast Asian Biodiversity

[...]

Mark de Bruyn¹, Björn Stelbrink², Robert J. Morley³, Robert Hall³, Gary R. Carvalho¹, Charles H. Cannon⁴, Charles H. Cannon⁵, Gerrit D. van den Bergh⁶, Erik Meijaard⁷, Erik Meijaard⁸, Ian Metcalfe⁹, Ian Metcalfe¹⁰, Luigi Boitani¹¹, Luigi Maiorano¹¹, Robert Shoup, Thomas von Rintelen²•Institutions (11)

Bangor University¹, Humboldt University of Berlin², Royal Holloway, University of London³, Texas Tech University⁴, Chinese Academy of Sciences⁵, University of Wollongong⁶, University of Queensland⁷, Australian National University⁸, Macquarie University⁹, University of New England (Australia)¹⁰, Sapienza University of Rome¹¹

Biogeographic Analysis Reveals Ancient Continental Vicariance and Recent Oceanic Dispersal in Amphibians

TL;DR: Meta-analyses of geological, climatic, and biological data sets are conducted to test which areas have been the sources of long-term biological diversity in SE Asia, particularly in the pre-Miocene, Miocene, and Plio-Pleistocene and whether the respective biota have been dominated by in situ diversification, immigration and/or emigration, or equilibrium dynamics.

Abstract: Tropical Southeast (SE) Asia harbors extraordinary species richness and in its entirety comprises four of the Earth's 34 biodiversity hotspots. Here, we examine the assembly of the SE Asian biota through time and space. We conduct meta-analyses of geological, climatic, and biological (including 61 phylogenetic) data sets to test which areas have been the sources of long-term biological diversity in SE Asia, particularly in the pre-Miocene, Miocene, and Plio-Pleistocene, and whether the respective biota have been dominated by in situ diversification, immigration and/or emigration, or equilibrium dynamics. We identify Borneo and Indochina, in particular, as major "evolutionary hotspots" for a diverse range of fauna and flora. Although most of the region's biodiversity is a result of both the accumulation of immigrants and in situ diversification, within-area diversification and subsequent emigration have been the predominant signals characterizing Indochina and Borneo's biota since at least the early Miocene. In contrast, colonization events are comparatively rare from younger volcanically active emergent islands such as Java, which show increased levels of immigration events. Few dispersal events were observed across the major biogeographic barrier of Wallace's Line. Accelerated efforts to conserve Borneo's flora and fauna in particular, currently housing the highest levels of SE Asian plant and mammal species richness, are critically required.

272 citations

Journal Article•DOI•

[...]

R. Alexander Pyron¹•Institutions (1)

George Washington University¹

01 Sep 2014-Systematic Biology

TL;DR: It is found that Pangaean origin and subsequent Laurasian and Gondwanan fragmentation explain a large proportion of patterns in the distribution of extant species, and dispersal during the Cenozoic has also exerted a strong influence.

Abstract: Amphibia comprises over 7000 extant species distributed in almost every ecosystem on every continent except Antarctica. Most species also show high specificity for particular habitats, biomes, or climatic niches, seemingly rendering long-distance dispersal unlikely. Indeed, many lineages still seem to show the signature of their Pangaean origin, approximately 300 Ma later. To date, no study has attempted a large-scale historical-biogeographic analysis of the group to understand the distribution of extant lineages. Here, I use an updated chronogram containing 3309 species (~45% of extant diversity) to reconstruct their movement between 12 global ecoregions. I find that Pangaean origin and subsequent Laurasian and Gondwanan fragmentation explain a large proportion of patterns in the distribution of extant species. However, dispersal during the Cenozoic, likely across land bridges or short distances across oceans, has also exerted a strong influence. Finally, there are at least three strongly supported instances of long-distance oceanic dispersal between former Gondwanan landmasses during the Cenozoic. Extinction from intervening areas seems to be a strong factor in shaping present-day distributions. Dispersal and extinction from and between ecoregions are apparently tied to the evolution of extraordinarily adaptive expansion-oriented phenotypes that allow lineages to easily colonize new areas and diversify, or conversely, to extremely specialized phenotypes or heavily relictual climatic niches that result in strong geographic localization and limited diversification. (Amphibians; caecilians; dispersal; frogs; historical biogeography; oceanic dispersal; salamanders; vicariance.)

269 citations

Journal Article•DOI•

Species Delimitation Using Bayes Factors: Simulations and Application to the Sceloporus scalaris Species Group (Squamata: Phrynosomatidae)

[...]

Jared A. Grummer¹, Robert W. Bryson¹, Tod W. Reeder¹•Institutions (1)

San Diego State University¹

A novel Bayesian method for inferring and interpreting the dynamics of adaptive landscapes from phylogenetic comparative data.

TL;DR: Bayes factor delimitation of species showed improved performance when species limits are tested by reassigning individuals between species, as opposed to either lumping or splitting lineages, and marginal-likelihood estimates via PS or SS analyses provide a useful and complementary alternative to existing species delimitation methods.

Abstract: Current molecular methods of species delimitation are limited by the types of species delimitation models and scenarios that can be tested. Bayes factors allow for more flexibility in testing non-nested species delimitation models and hypotheses of individual assignment to alternative lineages. Here, we examined the efficacy of Bayes factors in delimiting species through simulations and empirical data from the Sceloporus scalaris species group. Marginal-likelihood scores of competing species delimitation models, from which Bayes factor values were compared, were estimated with four different methods: harmonic mean estimation (HME), smoothed harmonic mean estimation (sHME), path-sampling/thermodynamic integration (PS), and stepping-stone (SS) analysis. We also performed model selection using a posterior simulation-based analog of the Akaike information criterion through Markov chain Monte Carlo analysis (AICM). Bayes factor species delimitation results from the empirical data were then compared with results from the reversible-jump MCMC (rjMCMC) coalescent-based species delimitation method Bayesian Phylogenetics and Phylogeography (BPP *BEAST; BPP incomplete lineage sorting; marginal-likelihood estimation; Mexico; model choice.)
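Of the marginal-likelihood estimators named in this abstract, harmonic mean estimation (HME) is the simplest to sketch: it is the harmonic mean of the likelihoods sampled from the posterior, computed here in log space to avoid overflow. The function names and the sample values are hypothetical. The estimator is shown only to illustrate the idea; it is widely regarded as unstable, which is consistent with the summary above favoring path sampling and stepping-stone estimates.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def harmonic_mean_log_marginal(log_likelihoods):
    """Harmonic mean estimate of the log marginal likelihood from
    log-likelihoods sampled from the posterior:
    ln ML ~ -ln( mean( exp(-lnL_i) ) )."""
    return -log_mean_exp([-x for x in log_likelihoods])


# Hypothetical posterior log-likelihood samples from an MCMC run.
samples = [-1502.3, -1499.8, -1501.1, -1500.4, -1503.0]
estimate = harmonic_mean_log_marginal(samples)
print(f"HME log marginal likelihood: {estimate:.3f}")
```

The result always lies between the smallest and largest sampled log-likelihood and is pulled toward the smallest values, which is one source of the estimator's high variance.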

268 citations

Journal Article•DOI•

[...]

Josef C. Uyeda¹, Luke J. Harmon¹•Institutions (1)

University of Idaho¹

30 Jul 2014-Systematic Biology

TL;DR: It is argued that Bayesian model fitting of OU models to comparative data provides a framework for integrating multiple sources of biological data, such as microevolutionary estimates of selection parameters and paleontological time series, allowing inference of adaptive landscape dynamics with explicit, process-based biological interpretations.

Abstract: Our understanding of macroevolutionary patterns of adaptive evolution has greatly increased with the advent of large-scale phylogenetic comparative methods. Widely used Ornstein-Uhlenbeck (OU) models can describe an adaptive process of divergence and selection. However, inference of the dynamics of adaptive landscapes from comparative data is complicated by interpretational difficulties, lack of identifiability among parameter values and the common requirement that adaptive hypotheses must be assigned a priori. Here, we develop a reversible-jump Bayesian method of fitting multi-optima OU models to phylogenetic comparative data that estimates the placement and magnitude of adaptive shifts directly from the data. We show how biologically informed hypotheses can be tested against this inferred posterior of shift locations using Bayes Factors to establish whether our a priori models adequately describe the dynamics of adaptive peak shifts. Furthermore, we show how the inclusion of informative priors can be used to restrict models to biologically realistic parameter space and test particular biological interpretations of evolutionary models. We argue that Bayesian model fitting of OU models to comparative data provides a framework for integrating multiple sources of biological data, such as microevolutionary estimates of selection parameters and paleontological time series, allowing inference of adaptive landscape dynamics with explicit, process-based biological interpretations.
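The multi-optima OU models mentioned in this abstract rest on the standard moments of an Ornstein-Uhlenbeck process, which the sketch below computes for a single lineage. It illustrates the underlying process only and is not the reversible-jump method from the paper. The symbols (alpha for the strength of pull toward the optimum, theta for the optimum, sigma2 for the diffusion variance), the function names, and the numeric values are assumptions chosen for the example.

```python
import math


def ou_mean(x0, theta, alpha, t):
    """Expected trait value after time t, starting at x0 and pulled toward theta."""
    return theta + (x0 - theta) * math.exp(-alpha * t)


def ou_variance(sigma2, alpha, t):
    """Trait variance after time t; approaches sigma2 / (2 * alpha) as t grows."""
    return sigma2 / (2.0 * alpha) * (1.0 - math.exp(-2.0 * alpha * t))


def half_life(alpha):
    """Time needed for the expected trait to move halfway to the optimum."""
    return math.log(2.0) / alpha


def ou_mean_along_path(x0, alpha, segments):
    """Expected trait value along one lineage that passes through several
    adaptive regimes, given as (theta, duration) pairs from root to tip."""
    x = x0
    for theta, duration in segments:
        x = ou_mean(x, theta, alpha, duration)
    return x


# Hypothetical lineage: 3 time units under optimum 0, then a shift to optimum 10.
alpha = 0.5
tip_expectation = ou_mean_along_path(0.0, alpha, [(0.0, 3.0), (10.0, 2.0)])
print(f"half-life = {half_life(alpha):.3f}")
print(f"expected tip value = {tip_expectation:.3f}")
print(f"stationary variance = {ou_variance(1.0, alpha, 1e6):.3f}")
```

Chaining the expectation through successive regimes works because the OU mean is linear in the starting value, so each segment can take the previous segment's expectation as its starting point.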

255 citations

Journal Article•DOI•

Analysis and Visualization of Complex Macroevolutionary Dynamics: An Example from Australian Scincid Lizards

[...]

Daniel L. Rabosky¹, Stephen C. Donnellan², Stephen C. Donnellan³, Michael C. Grundler¹, Irby J. Lovette¹•Institutions (3)

University of Michigan¹, University of Adelaide², South Australian Museum³

A Gateway for Phylogenetic Analysis Powered by Grid Computing Featuring GARLI 2.0

TL;DR: It is demonstrated that major axes of morphological variation can be decoupled from species diversification and the Bayesian framework described here can be used to identify and characterize complex mixtures of dynamic processes on phylogenetic trees.

Abstract: The correlation between species diversification and morphological evolution has long been of interest in evolutionary biology. We investigated the relationship between these processes during the radiation of 250+ scincid lizards that constitute Australia's most species-rich clade of terrestrial vertebrates. We generated a time-calibrated phylogenetic tree for the group that was more than 85% complete at the species level and collected multivariate morphometric data for 183 species. We reconstructed the dynamics of species diversification and trait evolution using a Bayesian statistical framework (BAMM) that simultaneously accounts for variation in evolutionary rates through time and among lineages. We extended the BAMM model to accommodate time-dependent phenotypic evolution, and we describe several new methods for summarizing and visualizing macroevolutionary rate heterogeneity on phylogenetic trees. Two major clades (Lerista, Ctenotus; >90 spp. each) are associated with high rates of species diversification relative to the background rate across Australian sphenomorphine skinks. The Lerista clade is characterized by relatively high lability of body form and has undergone repeated instances of limb reduction, but Ctenotus is characterized by an extreme deceleration in the rate of body shape evolution. We estimate that rates of phenotypic evolution decreased by more than an order of magnitude in the common ancestor of the Ctenotus clade. These results provide evidence for a modal shift in phenotypic evolutionary dynamics and demonstrate that major axes of morphological variation can be decoupled from species diversification. More generally, the Bayesian framework described here can be used to identify and characterize complex mixtures of dynamic processes on phylogenetic trees. [Bayesian; diversification; evolvability; lizard; macroevolution; punctuated equilibrium; speciation.]

Journal Article•DOI•

[...]

Adam L. Bazinet¹, Derrick J. Zwickl¹, Michael P. Cummings¹•Institutions (1)

University of Maryland, College Park¹

01 Sep 2014-Systematic Biology

TL;DR: Molecularevolution.org, a publicly available gateway for high-throughput, maximum-likelihood phylogenetic analysis powered by grid computing, is introduced and details about how the grid system efficiently delivers high-quality phylogenetic results are provided.

Abstract: We introduce molecularevolution.org, a publicly available gateway for high-throughput, maximum-likelihood phylogenetic analysis powered by grid computing. The gateway features a GARLI 2.0 web service that enables a user to quickly and easily submit thousands of maximum likelihood tree searches or bootstrap searches that are executed in parallel on distributed computing resources. The GARLI web service allows one to easily specify partitioned substitution models using a graphical interface, and it performs sophisticated post-processing of phylogenetic results. Although the GARLI web service has been used by the research community for over three years, here we formally announce the availability of the service, describe its capabilities, highlight new features and recent improvements, and provide details about how the grid system efficiently delivers high-quality phylogenetic results. (GARLI, gateway, grid computing, maximum likelihood, molecular evolution portal, phylogenetics, web service.)

Journal Article•DOI•

Chloroplast Phylogenomic Analyses Resolve Deep-Level Relationships of an Intractable Bamboo Tribe Arundinarieae (Poaceae)

[...]

Peng-Fei Ma¹, Yu Xiao Zhang¹, Chun-Xia Zeng¹, Zhen-Hua Guo¹, De-Zhu Li¹•Institutions (1)

Chinese Academy of Sciences¹

Coalescent versus Concatenation Methods and the Placement of Amborella as Sister to Water Lilies

TL;DR: It is illuminated how chloroplast phylogenomics can be used for elucidating difficult phylogeny at low taxonomic levels in intractable plant groups.

Abstract: The temperate woody bamboos constitute a distinct tribe Arundinarieae (Poaceae: Bambusoideae) with high species diversity. Estimating phylogenetic relationships among the 11 major lineages of Arundinarieae has been particularly difficult, owing to a possible rapid radiation and the extremely low rate of sequence divergence. Here, we explore the use of chloroplast genome sequencing for phylogenetic inference. We sampled 25 species (22 temperate bamboos and 3 outgroups) for the complete genome representing eight major lineages of Arundinarieae in an attempt to resolve backbone relationships. Phylogenetic analyses of coding versus noncoding sequences, and of different regions of the genome (large single copy and small single copy, and inverted repeat regions) yielded no well-supported contradicting topologies but potential incongruence was found between the coding and noncoding sequences. The use of various data partitioning schemes in analysis of the complete sequences resulted in nearly identical topologies and node support values, although the partitioning schemes were decisively different from each other as to the fit to the data. Our full genomic data set substantially increased resolution along the backbone and provided strong support for most relationships despite the very short internodes and long branches in the tree. The inferred relationships were also robust to potential confounding factors (e.g., long-branch attraction) and received support from independent indels in the genome. We then added taxa from the three Arundinarieae lineages that were not included in the full-genome data set; each of these were sampled for more than 50% genome sequences. The resulting trees not only corroborated the reconstructed deep-level relationships but also largely resolved the phylogenetic placements of these three additional lineages. 
Furthermore, adding 129 additional taxa sampled for only eight chloroplast loci to the combined data set yielded almost identical relationships, albeit with low support values. We believe that the inferred phylogeny is robust to taxon sampling. Having resolved the deep-level relationships of Arundinarieae, we illuminate how chloroplast phylogenomics can be used for elucidating difficult phylogeny at low taxonomic levels in intractable plant groups. (Branch length; chloroplast phylogenomics; data partitioning; low sequence divergence; sampling; temperate woody bamboos.)

Journal Article•DOI•

[...]

Zhenxiang Xi¹, Liang-Liang Liu², Joshua S. Rest³, Charles C. Davis¹•Institutions (3)

Harvard University¹, University of Georgia², Stony Brook University³

Conflicting Phylogenies for Early Land Plants are Caused by Composition Biases among Synonymous Substitutions

TL;DR: A broad coalescent-based species tree estimation of 45 seed plants is provided, and it is suggested that the Amborella-alone placement inferred using concatenation methods is likely misled by fast-evolving sites, whereas coalescent methods appear to be more robust to elevated substitution rates.

Abstract: The molecular era has fundamentally reshaped our knowledge of the evolution and diversification of angiosperms. One outstanding question is the phylogenetic placement of Amborella trichopoda Baill., commonly thought to represent the first lineage of extant angiosperms. Here, we leverage publicly available data and provide a broad coalescent-based species tree estimation of 45 seed plants. By incorporating 310 nuclear genes, our coalescent analyses strongly support a clade containing Amborella plus water lilies (i.e., Nymphaeales) that is sister to all other angiosperms across different nucleotide rate partitions. Our results also show that commonly applied concatenation methods produce strongly supported, but incongruent placements of Amborella: slow-evolving nucleotide sites corroborate results from coalescent analyses, whereas fast-evolving sites place Amborella alone as the first lineage of extant angiosperms. We further explored the performance of coalescent versus concatenation methods using nucleotide sequences simulated on (i) the two alternate placements of Amborella with branch lengths and substitution model parameters estimated from each of the 310 nuclear genes and (ii) three hypothetical species trees that are topologically identical except with respect to the degree of deep coalescence and branch lengths. Our results collectively suggest that the Amborella alone placement inferred using concatenation methods is likely misled by fast-evolving sites. This appears to be exacerbated by the combination of long branches in stem group angiosperms, Amborella, and Nymphaeales with the short internal branch separating Amborella and Nymphaeales. In contrast, coalescent methods appear to be more robust to elevated substitution rates.

Journal Article•DOI•

Cymon J. Cox¹, Blaise Li¹, Peter G. Foster², T. Martin Embley³, Peter Civáň¹•Institutions (3)

University of the Algarve¹, Natural History Museum², University of Newcastle³

Bayesian Estimation of Speciation and Extinction from Incomplete Fossil Occurrence Data

TL;DR: Despite the similarity among land–plant life cycles, they differ in one significant aspect: in the three bryophyte groups, the haploid gametophytic stage is the dominant vegetative stage, whereas in vascular plants the diploid sporophyte dominates.

Abstract: Plants are the primary producers of the terrestrial ecosystems that dominate much of the natural environment. Occurring approximately 480 Ma (Sanderson 2003; Kenrick et al. 2012), the evolutionary transition of plants from an aquatic to a terrestrial environment was accompanied by several major developmental innovations. The freshwater charophyte ancestors of land plants have a haplobiontic life cycle with a single haploid multicellular stage, whereas land plants, which include the bryophytes (liverworts, hornworts, and mosses) and tracheophytes (also called vascular plants, namely, lycopods, ferns, and seed plants), exhibit a marked alternation of generations with a diplobiontic life cycle with both haploid and diploid multicellular stages and where the embryo remains attached to, and is nourished by, the gametophyte (Haig 2008). The interjection of a multicellular diploid phase into the land–plant life cycle was an important adaptation that enabled long-distance dispersal via mitotic spores where waterborne male gametes have restricted motility in dry terrestrial environments. Despite the similarity among land–plant life cycles, they differ in one significant aspect: in the three bryophyte groups, the haploid gametophytic stage is the dominant vegetative stage, whereas in vascular plants the diploid sporophyte dominates. A common assumption, and one implied by the tradition of referring to bryophytes as “lower plants”—in contrast to the “higher” tracheophytes—is that the bryophytes and their life cycle are primitive (Kato and Akiyama 2005). However, without a strong phylogenetic hypothesis of land–plant relationships, it is not clear which (if either) of the gametophyte or sporophyte was the dominant ancestral vegetative state present in the earliest land plants (Renzaglia et al. 2007; Qiu et al. 2012).

Journal Article•DOI•

Daniele Silvestro¹, Jan Schnitzler², Lee Hsiang Liow³, Alexandre Antonelli⁴, Nicolas Salamin⁵, Nicolas Salamin¹•Institutions (5)

University of Lausanne¹, Goethe University Frankfurt², University of Oslo³, University of Gothenburg⁴, Swiss Institute of Bioinformatics⁵

08 Feb 2014-Systematic Biology

TL;DR: A new probabilistic framework to jointly estimate species-specific times of speciation and extinction and the rates of the underlying birth-death process based on the fossil record is presented and represents a step towards integrating phylogenetic and fossil information to infer macroevolutionary processes.

Abstract: The temporal dynamics of species diversity are shaped by variations in the rates of speciation and extinction, and there is a long history of inferring these rates using first and last appearances of taxa in the fossil record. Understanding diversity dynamics critically depends on unbiased estimates of the unobserved times of speciation and extinction for all lineages, but the inference of these parameters is challenging due to the complex nature of the available data. Here, we present a new probabilistic framework to jointly estimate species-specific times of speciation and extinction and the rates of the underlying birth-death process based on the fossil record. The rates are allowed to vary through time independently of each other, and the probability of preservation and sampling is explicitly incorporated in the model to estimate the true lifespan of each lineage. We implement a Bayesian algorithm to assess the presence of rate shifts by exploring alternative diversification models. Tests on a range of simulated data sets reveal the accuracy and robustness of our approach against violations of the underlying assumptions and various degrees of data incompleteness. Finally, we demonstrate the application of our method with the diversification of the mammal family Rhinocerotidae and reveal a complex history of repeated and independent temporal shifts of both speciation and extinction rates, leading to the expansion and subsequent decline of the group. The estimated parameters of the birth-death process implemented here are directly comparable with those obtained from dated molecular phylogenies. Thus, our model represents a step towards integrating phylogenetic and fossil information to infer macroevolutionary processes. (BDMCMC; biodiversity trends; Birth-death process; incomplete fossil sampling; macroevolution; species rise and fall.)
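
A small sketch may help show why preservation has to be modeled explicitly. It assumes, purely for illustration, that fossil occurrences arise as a homogeneous Poisson process with rate q along a lineage's true lifespan; the abstract does not state the preservation model, and the function names below are hypothetical rather than the authors' implementation. Under that assumption, the observed first and last appearances always fall inside the true speciation and extinction times, and a short-lived lineage may leave no fossils at all.

```python
import math
import random


def prob_at_least_one_fossil(q, lifespan):
    """Probability that a lineage leaves one or more fossils.

    Assumes a homogeneous Poisson preservation process with rate q
    (fossils per lineage per Myr) over a true lifespan in Myr.
    """
    # 1 - exp(-q * lifespan), written with expm1 for numerical accuracy
    return -math.expm1(-q * lifespan)


def simulate_observed_range(speciation, extinction, q, rng):
    """Simulate fossil occurrences for one lineage.

    Times are in Ma before present, so speciation > extinction.
    Returns (first_appearance, last_appearance), or None if the
    lineage left no fossils.
    """
    occurrences = []
    t = speciation
    while True:
        # Waiting times between Poisson events are exponential
        t -= rng.expovariate(q)
        if t < extinction:
            break
        occurrences.append(t)
    if not occurrences:
        return None
    # Occurrences were generated from oldest to youngest
    return occurrences[0], occurrences[-1]


if __name__ == "__main__":
    rng = random.Random(2014)
    print(prob_at_least_one_fossil(0.5, 2.0))
    print(simulate_observed_range(10.0, 2.0, 1.5, rng))
```

Because the observed range can only be shorter than the true lifespan, treating first and last appearances as speciation and extinction times would bias rate estimates, which is the gap a joint model of preservation and diversification is meant to close.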

Journal Article•DOI•

Quantifying and Comparing Phylogenetic Evolutionary Rates for Shape and other High-Dimensional Phenotypic Data

Dean C. Adams

Molecular Dating, Evolutionary Rates, and the Age of the Grasses

TL;DR: A method to quantify phylogenetic evolutionary rates for high-dimensional multivariate data (σ²_mult) is described, found from the equivalency between statistical methods based on covariance matrices and those based on distance matrices (R-mode and Q-mode methods), and the utility is illustrated by evaluating rates of head shape evolution in a lineage of Plethodon salamanders.

Abstract: Many questions in evolutionary biology require the quantification and comparison of rates of phenotypic evolution. Recently, phylogenetic comparative methods have been developed for comparing evolutionary rates on a phylogeny for single, univariate traits (σ²), and evolutionary rate matrices (R) for sets of traits treated simultaneously. However, high-dimensional traits like shape remain under-examined with this framework, because methods suited for such data have not been fully developed. In this article, I describe a method to quantify phylogenetic evolutionary rates for high-dimensional multivariate data (σ²_mult), found from the equivalency between statistical methods based on covariance matrices and those based on distance matrices (R-mode and Q-mode methods). I then use simulations to evaluate the statistical performance of hypothesis-testing procedures that compare σ²_mult for two or more groups of species on a phylogeny. Under both isotropic and non-isotropic conditions, and for differing numbers of trait dimensions, the proposed method displays appropriate Type I error and high statistical power for detecting known differences in σ²_mult among groups. In contrast, the Type I error rate of likelihood tests based on the evolutionary rate matrix (R) increases as the number of trait dimensions (p) increases, and becomes unacceptably large when only a few trait dimensions are considered. Further, likelihood tests based on R cannot be computed when the number of trait dimensions equals or exceeds the number of taxa in the phylogeny (i.e., when p ≥ N). These results demonstrate that tests based on σ²_mult provide a useful means of comparing evolutionary rates for high-dimensional data that are otherwise not analytically accessible to methods based on the evolutionary rate matrix. This advance thus expands the phylogenetic comparative toolkit for high-dimensional phenotypic traits like shape.
Finally, I illustrate the utility of the new approach by evaluating rates of head shape evolution in a lineage of Plethodon salamanders. (Evolutionary rates; geometric morphometrics; macroevolution; morphological evolution; phylogenetic comparative method.)

Journal Article•DOI•

Pascal-Antoine Christin¹, Pascal-Antoine Christin², Elizabeth L. Spriggs², Colin P. Osborne¹, Caroline A.E. Strömberg³, Nicolas Salamin⁴, Nicolas Salamin⁵, Erika J. Edwards²•Institutions (5)

University of Sheffield¹, Brown University², University of Washington³, Swiss Institute of Bioinformatics⁴, University of Lausanne⁵

Global biodiversity assessment and hyper-cryptic species complexes: more than one species of elephant in the room?

TL;DR: It is shown that molecular dating based on a data set of plastid markers is strongly dependent on the model assumptions, and that phylogenetic markers extracted from complete nuclear genomes can be a useful complement to the more commonly used plastid markers.

Abstract: Many questions in evolutionary biology require an estimate of divergence times but, for groups with a sparse fossil record, such estimates rely heavily on molecular dating methods. The accuracy of these methods depends on both an adequate underlying model and the appropriate implementation of fossil evidence as calibration points. We explore the effect of these in Poaceae (grasses), a diverse plant lineage with a very limited fossil record, focusing particularly on dating the early divergences in the group. We show that molecular dating based on a data set of plastid markers is strongly dependent on the model assumptions. In particular, an acceleration of evolutionary rates at the base of Poaceae followed by a deceleration in the descendants strongly biases methods that assume an autocorrelation of rates. This problem can be circumvented by using markers that have lower rate variation, and we show that phylogenetic markers extracted from complete nuclear genomes can be a useful complement to the more commonly used plastid markers. However, estimates of divergence times remain strongly affected by different implementations of fossil calibration points. Analyses calibrated with only macrofossils lead to estimates for the age of core Poaceae ∼51-55 Ma, but the inclusion of microfossil evidence pushes this age to 74-82 Ma and leads to lower estimated evolutionary rates in grasses. These results emphasize the importance of considering markers from multiple genomes and alternative fossil placements when addressing evolutionary issues that depend on ages estimated for important groups. (divergence time; molecular dating; mutation rate; phylogeny; Poaceae.)

Journal Article•DOI•

Mark Adams¹, Mark Adams², Tarmo A. Raadik³, Tarmo A. Raadik⁴, Christopher P. Burridge⁵, Arthur Georges⁴•Institutions (5)

University of Adelaide¹, South Australian Museum², Arthur Rylah Institute for Environmental Research³, University of Canberra⁴, University of Tasmania⁵

The emergence of lobsters: phylogenetic relationships, morphological evolution and divergence time comparisons of an ancient group (decapoda: achelata, astacidea, glypheidea, polychelida).

TL;DR: This work explores the significance and extent of so-called "hyper-cryptic" species complexes, using the Australian freshwater fish Galaxias olidus as a proxy for any organism whose taxonomy ought to be largely finalized when compared to those in little-studied or morphologically undifferentiated groups.

Abstract: Several recent estimates of global biodiversity have concluded that the total number of species on Earth lies near the lower end of the wide range touted in previous decades. However, none of these recent estimates formally explore the real "elephant in the room", namely, what proportion of species are taxonomically invisible to conventional assessments, and thus, as undiagnosed cryptic species, remain uncountable until revealed by multi-gene molecular assessments. Here we explore the significance and extent of so-called "hyper-cryptic" species complexes, using the Australian freshwater fish Galaxias olidus as a proxy for any organism whose taxonomy ought to be largely finalized when compared to those in little-studied or morphologically undifferentiated groups. Our comprehensive allozyme (838 fish for 54 putative loci), mtDNA (557 fish for 605 bp of cytb), and morphological (1963-3389 vouchers for 17-58 characters) assessment of this species across its broad geographic range revealed a 1500% increase in species-level biodiversity, and suggested that additional taxa may remain undiscovered. Importantly, while all 15 candidate species were morphologically diagnosable a posteriori from one another, single-gene DNA barcoding proved largely unsuccessful as an a priori method for species identification. These results lead us to draw two strong inferences of relevance to estimates of global biodiversity. First, hyper-cryptic complexes are likely to be common in many organismal groups. Second, no assessment of species numbers can be considered "best practice" in the molecular age unless it explicitly includes estimates of the extent of cryptic and hyper-cryptic biodiversity. [Galaxiidae; global estimates; hyper-diverse; mountain galaxias; species counts; species richness.].

Journal Article•DOI•

Heather D. Bracken-Grissom¹, Shane T. Ahyong², Richard D. Wilkinson³, Rodney M. Feldmann⁴, Carrie E. Schweitzer⁵, Jesse W. Breinholt⁶, Matthew L. Bendall⁷, Ferran Palero⁸, Tin-Yam Chan⁹, Darryl L. Felder¹⁰, Rafael Robles¹¹, Ka Hou Chu¹², Ling Ming Tsang¹², Dohyup Kim⁷, Joel W. Martin¹³, Keith A. Crandall¹⁴•Institutions (14)

Florida International University¹, University of New South Wales², University of Nottingham³, Kent State University⁴, Kent State University at Stark⁵, Florida Museum of Natural History⁶, Brigham Young University⁷, University of Valencia⁸, National Taiwan Ocean University⁹, University of Louisiana at Lafayette¹⁰, University of São Paulo¹¹, The Chinese University of Hong Kong¹², Natural History Museum of Los Angeles County¹³, George Washington University¹⁴

Coalescent Species Delimitation in Milksnakes (Genus Lampropeltis) and Impacts on Phylogenetic Comparative Analyses

TL;DR: This study estimated phylogenetic relationships among the major groups of all lobster families and 94% of the genera using six genes (mitochondrial and nuclear) and 195 morphological characters across 173 species of lobsters for the most comprehensive sampling to date.

Abstract: Lobsters are a ubiquitous and economically important group of decapod crustaceans that include the infraorders Polychelida, Glypheidea, Astacidea and Achelata. They include familiar forms such as the spiny, slipper, clawed lobsters and crayfish and unfamiliar forms such as the deep-sea and “living fossil” species. The high degree of morphological diversity among these infraorders has led to a dynamic classification and conflicting hypotheses of evolutionary relationships. In this study, we estimated phylogenetic relationships among the major groups of all lobster families and 94% of the genera using six genes (mitochondrial and nuclear) and 195 morphological characters across 173 species of lobsters for the most comprehensive sampling to date. Lobsters were recovered as a non-monophyletic assemblage in the combined (molecular + morphology) analysis. All families were monophyletic, with the exception of Cambaridae, and 7 of 79 genera were recovered as poly- or paraphyletic. A rich fossil history coupled with dense taxon coverage allowed us to estimate and compare divergence times and origins of major lineages using two drastically different approaches. Age priors were constructed and/or included based on fossil age information or fossil discovery, age, and extant species count data. Results from the two approaches were largely congruent across deep to shallow taxonomic divergences across major lineages. The origin of the first lobster-like decapod (Polychelida) was estimated in the Devonian (∼409–372 Ma) with all infraorders present in the Carboniferous (∼353–318 Ma). Fossil calibration subsampling studies examined the influence of sampling density (number of fossils) and placement (deep, middle, and shallow) on divergence time estimates. Results from our study suggest including at least 1 fossil per 10 operational taxonomic units (OTUs) in divergence dating analyses. [Dating; decapods; divergence; lobsters; molecular; morphology; phylogenetics.]

Journal Article•DOI•

Sara Ruane¹, Sara Ruane², Robert W. Bryson³, R. Alexander Pyron⁴, Frank T. Burbrink¹, Frank T. Burbrink²•Institutions (4)

College of Staten Island¹, City University of New York², American Museum of Natural History³, George Washington University⁴

Morphological Clocks in Paleontology, and a Mid-Cretaceous Origin of Crown Aves

TL;DR: This study examines how delimitation of previously unrecognized diversity in the milksnake (Lampropeltis triangulum) and use of a species-tree approach affect both estimation of the Lampropeltis phylogeny and comparative analyses with respect to the timing of diversification.

Abstract: Both gene-tree discordance and unrecognized diversity are sources of error for accurate estimation of species trees, and can affect downstream diversification analyses by obscuring the correct number of nodes, their density, and the lengths of the branches subtending them. Although the theoretical impact of gene-tree discordance on evolutionary analyses has been examined previously, the effect of unsampled and cryptic diversity has not. Here, we examine how delimitation of previously unrecognized diversity in the milksnake (Lampropeltis triangulum) and use of a species-tree approach affects both estimation of the Lampropeltis phylogeny and comparative analyses with respect to the timing of diversification. Coalescent species delimitation indicates that L. triangulum is not monophyletic and that there are multiple species of milksnake, which increases the known species diversity in the genus Lampropeltis by 40%. Both genealogical and temporal discordance occur between gene trees and the species tree, with evidence that mitochondrial DNA (mtDNA) introgression is a main factor. This discordance is further manifested in the preferred models of diversification, where the concatenated gene tree strongly supports an early burst of speciation during the Miocene, in contrast to species-tree estimates where diversification follows a birth-death model and speciation occurs mostly in the Pliocene and Pleistocene. This study highlights the crucial interaction among coalescent-based phylogeography and species delimitation, systematics, and species diversification analyses. (Divergence-time estimation; diversification rates; Lampropeltini; gene-tree/species-tree discordance; mtDNA introgression; Pleistocene diversification.)

Journal Article•DOI•

Michael S. Y. Lee¹, Andrea Cau², Darren Naish, Gareth J. Dyke³•Institutions (3)

South Australian Museum¹, University of Adelaide², University of Southampton³

Robust Regression and Posterior Predictive Simulation Increase Power to Detect Early Bursts of Trait Evolution

TL;DR: The oldest molecular dates further imply an extraordinarily rapid early bird evolution, with the modern birds appearing only 20 myr after the KPg boundary.

Abstract: Birds are among the most diverse and intensively studied vertebrate groups, but many aspects of their higher-level phylogeny and evolution still remain controversial. One contentious issue concerns the antiquity of modern birds (=crown Aves): the age of the most recent common ancestor of all living birds (Gauthier 1986). Very few Mesozoic fossils are attributable to modern birds (e.g., Clarke et al. 2005; Dyke and Kaiser 2011; Brocklehurst et al. 2012; Ksepka and Boyd 2012) suggesting that they diversified largely or entirely in the early Paleogene, perhaps in the ecological vacuum created by the extinction of non-avian dinosaurs, pterosaurs, and many archaic (stem) birds (e.g., Longrich et al. 2011). In contrast, molecular studies indicate that modern birds commenced radiating deep within the Mesozoic, for example ∼130 Ma (Cooper and Penny 1997; Haddrath and Baker 2012) or ∼113 Ma (Jetz et al. 2012), with ratites, galliforms, anseriforms, shorebirds, and even passerines surviving across the KPg boundary (∼66 Ma). The oldest molecular dates further imply an extraordinarily rapid early bird evolution, with the modern birds appearing only 20 myr after

Journal Article•DOI•

Graham J. Slater¹, Graham J. Slater², Matthew W. Pennell³, Matthew W. Pennell⁴•Institutions (4)

National Museum of Natural History¹, University of California, Los Angeles², University of Idaho³, National Evolutionary Synthesis Center⁴

Mitochondrial phylogenomics of early land plants: mitigating the effects of saturation, compositional heterogeneity, and codon-usage bias.

TL;DR: This work develops posterior predictive simulation approaches and shows that they outperform maximum likelihood approaches at identifying early bursts at moderate strength, and advocates the adoption of similar posterior predictive approaches to improve the fit and to assess the adequacy of macroevolutionary models in general.

Abstract: A central prediction of much theory on adaptive radiations is that traits should evolve rapidly during the early stages of a clade's history and subsequently slow down in rate as niches become saturated, a so-called "Early Burst." Although a common pattern in the fossil record, evidence for early bursts of trait evolution in phylogenetic comparative data has been equivocal at best. We show here that this may not necessarily be due to the absence of this pattern in nature. Rather, commonly used methods to infer its presence perform poorly when the strength of the burst (the rate at which phenotypic evolution declines) is small, and when some morphological convergence is present within the clade. We present two modifications to existing comparative methods that allow greater power to detect early bursts in simulated datasets. First, we develop posterior predictive simulation approaches and show that they outperform maximum likelihood approaches at identifying early bursts at moderate strength. Second, we use a robust regression procedure that allows for the identification and down-weighting of convergent taxa, leading to moderate increases in method performance. We demonstrate the utility and power of these approaches by investigating the evolution of body size in cetaceans. Model fitting using maximum likelihood is equivocal with regard to the mode of cetacean body size evolution. However, posterior predictive simulation combined with a robust node height test return low support for Brownian motion or rate shift models, but not the early burst model. While the jury is still out on whether early bursts are actually common in nature, our approach will hopefully facilitate more robust testing of this hypothesis. We advocate the adoption of similar posterior predictive approaches to improve the fit and to assess the adequacy of macroevolutionary models in general. (Adaptive Radiations, Early Burst, Posterior Predictive Simulations, Quantitative Characters)
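
To make the "strength of the burst" concrete, the sketch below uses a common parameterization of the early burst model in which the evolutionary rate declines exponentially from the root, rate(t) = sigma0_sq * exp(a * t) with a < 0. This parameterization is an assumption brought in for illustration; the abstract does not give a formula, and the function names are hypothetical. Integrating the rate gives the trait variance expected to accumulate by time t, which reduces to the Brownian motion value sigma0_sq * t when a = 0.

```python
import math


def eb_rate(sigma0_sq, a, t):
    """Instantaneous evolutionary rate at time t since the root.

    sigma0_sq is the initial rate; a < 0 gives a declining rate
    (early burst), a = 0 gives constant-rate Brownian motion.
    """
    return sigma0_sq * math.exp(a * t)


def eb_variance(sigma0_sq, a, t):
    """Trait variance accumulated from the root to time t.

    This is the integral of eb_rate from 0 to t:
    sigma0_sq * (exp(a * t) - 1) / a, or sigma0_sq * t when a = 0.
    """
    if a == 0.0:
        return sigma0_sq * t
    return sigma0_sq * math.expm1(a * t) / a


if __name__ == "__main__":
    # Compare a weak and a strong burst against Brownian motion (a = 0)
    for a in (0.0, -0.05, -0.5):
        print(a, eb_rate(1.0, a, 10.0), eb_variance(1.0, a, 10.0))
```

When a is close to zero, the accumulated variance is nearly indistinguishable from the Brownian motion expectation, which is one way to read the abstract's point that weak bursts are hard to detect.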

Journal Article•DOI•

Yang Liu¹, Cymon J. Cox¹, Wei Wang¹, Bernard Goffinet¹•Institutions (1)

University of Connecticut¹

Probabilistic Graphical Model Representation in Phylogenetics

TL;DR: It is determined that land plant lineages differ in their nucleotide composition, and in their usage of synonymous codon variants, and it is concluded that while genomic data may generate highly supported phylogenetic trees, these inferences may be artifacts.

Abstract: Phylogenetic analyses using concatenation of genomic-scale data have been seen as the panacea for resolving the incongruences among inferences from few or single genes. However, phylogenomics may also suffer from systematic errors, due to the, perhaps cumulative, effects of saturation, among-taxa compositional (GC content) heterogeneity, or codon-usage bias plaguing the individual nucleotide loci that are concatenated. Here, we provide an example of how these factors affect the inferences of the phylogeny of early land plants based on mitochondrial genomic data. Mitochondrial sequences evolve slowly in plants and hence are thought to be suitable for resolving deep relationships. We newly assembled mitochondrial genomes from 20 bryophytes, complemented these with 40 other streptophytes (land plants plus algal outgroups), compiling a data matrix of 60 taxa and 41 mitochondrial genes. Homogeneous analyses of the concatenated nucleotide data resolve mosses as sister-group to the remaining land plants. However, the corresponding translated amino acid data support the liverwort lineage in this position. Both results receive weak to moderate support in maximum-likelihood analyses, but strong support in Bayesian inferences. Tests of alternative hypotheses using either nucleotide or amino acid data provide implicit support for their respective optimal topologies, and clearly reject the hypotheses that bryophytes are monophyletic, liverworts and mosses share a unique common ancestor, or hornworts are sister to the remaining land plants. We determined that land plant lineages differ in their nucleotide composition, and in their usage of synonymous codon variants. 
Composition heterogeneous Bayesian analyses employing a nonstationary model that accounts for variation in among-lineage composition, and inferences from degenerated nucleotide data that avoid the effects of synonymous substitutions that underlie codon-usage bias, again recovered liverworts being sister to the remaining land plants but without support. These analyses indicate that the inference of an early-branching moss lineage based on the nucleotide data is caused by convergent compositional biases. Accommodating among-site amino acid compositional heterogeneity (CAT model) yields no support for the optimal resolution of liverwort as sister to the rest of land plants, suggesting that the robust inference of the liverwort position in homogeneous analyses may be due in part to compositional biases among sites. All analyses support a paraphyletic bryophytes with hornworts composing the sister-group to tracheophytes. We conclude that while genomic data may generate highly supported phylogenetic trees, these inferences may be artifacts. We suggest that phylogenomic analyses should assess the possible impact of potential biases through comparisons of protein-coding gene data and their amino acid translations by evaluating the impact of substitutional saturation, synonymous substitutions, and compositional biases through data deletion strategies and by analyzing the data using heterogeneous composition models. We caution against relying on any one presentation of the data (nucleotide or amino acid) or any one type of analysis even when analyzing large-scale data sets, no matter how well-supported, without fully exploring the effects of substitution models. (Compositional heterogeneity; early land plants; evolutionary saturation; mitochondrial genome; phylogenomics; synonymous codon-usage bias.)

Journal Article•DOI•

Sebastian Höhna¹, Sebastian Höhna², Tracy A. Heath³, Tracy A. Heath⁴, Bastien Boussau⁵, Bastien Boussau⁴, Michael J. Landis⁶, Fredrik Ronquist⁶, John P. Huelsenbeck⁷, John P. Huelsenbeck⁴•Institutions (7)

Stockholm University¹, University of California, Davis², University of Kansas³, University of California, Berkeley⁴, University of Lyon⁵, Swedish Museum of Natural History⁶, King Abdulaziz University⁷

20 Jun 2014-Systematic Biology

TL;DR: An introduction to graphical models for phylogeneticists is provided, the standard graphical model representation is extended to the realm of phylogenetics, and a new graphical model component, tree plates, is introduced to capture the changing structure of the subgraph corresponding to a phylogenetic tree.

Abstract: Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis-Hastings or Gibbs sampling of the posterior distribution.
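
The core idea of breaking a model into conditionally independent distributions can be sketched with a toy two-level model. Everything here is an illustrative assumption rather than a model from the paper: a rate parameter lam has an exponential prior, and each branch length is exponentially distributed given lam. Because the branch lengths are conditionally independent given lam, the joint log density is just the sum of one term per node in the graph.

```python
import math


def log_exponential_pdf(x, rate):
    """Log density of an exponential distribution with the given rate."""
    if x < 0 or rate <= 0:
        return -math.inf
    return math.log(rate) - rate * x


def log_joint(lam, branch_lengths, hyper_rate=1.0):
    """Joint log density of a toy graphical model.

    Graph: hyper_rate (constant) -> lam -> each branch length.
    Each stochastic node contributes its own conditional density,
    evaluated given the current values of its parent nodes.
    """
    # Node 1: lam ~ Exponential(hyper_rate)
    total = log_exponential_pdf(lam, hyper_rate)
    # Nodes 2..n: branch length ~ Exponential(lam), independent given lam
    for bl in branch_lengths:
        total += log_exponential_pdf(bl, lam)
    return total


if __name__ == "__main__":
    print(log_joint(2.0, [0.5, 1.0]))
```

The same per-node decomposition is what allows generic software to simulate from a model by visiting nodes in parent-before-child order, or to update one node at a time during sampling.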

Journal Article•DOI•

Introgression and phenotypic assimilation in Zimmerius flycatchers (Tyrannidae): population genetic and phylogenetic inferences from genome-wide SNPs.

Frank E. Rheindt¹, Matthew K. Fujita², Peter R. Wilton, Scott V. Edwards³•Institutions (3)

National University of Singapore¹, Harvard University², University of Texas at Arlington³

Supertrees Based on the Subtree Prune-and-Regraft Distance

TL;DR: Introgression of key alleles may have led to phenotypic assimilation in the plumage of mosaic birds, suggesting that selection may have been a key factor facilitating introgression.

Abstract: Genetic introgression is pervasive in nature and may lead to large-scale phenotypic assimilation and/or admixture of populations, but there is limited knowledge on whether large phenotypic changes are typically accompanied by high levels of introgression throughout the genome. Using bioacoustic, biometric, and spectrophotometric data from a flycatcher (Tyrannidae) system in the Neotropical genus Zimmerius, we document a mosaic pattern of phenotypic admixture in which a population of Zimmerius viridiflavus in northern Peru (henceforth "mosaic") is vocally and biometrically similar to conspecifics to the south but shares plumage characteristics with a different species (Zimmerius chrysops) to the north. To clarify the origins of the mosaic population, we used the RAD-seq approach to generate a data set of 37,361 genome-wide single nucleotide polymorphisms (SNPs). A range of population-genetic diagnostics shows that the genome of the mosaic population is largely indistinguishable from southern Z. viridiflavus and distinct from northern Z. chrysops, and the application of parsimony and species tree methods to the genome-wide SNP data set confirms the close affinity of the mosaic population with southern Z. viridiflavus. Even so, using a subset of 2710 SNPs found across all sampled lineages in configurations appropriate for a recently proposed statistical ("ABBA/BABA") test that distinguishes gene flow from incomplete lineage sorting, we detected low levels of gene flow from northern Z. chrysops into the mosaic population. Mapping the candidate loci for introgression from Z. chrysops into the mosaic population to the zebra finch genome reveals close linkage with genes significantly enriched in functions involving cell projection and plasma membranes. Introgression of key alleles may have led to phenotypic assimilation in the plumage of mosaic birds, suggesting that selection may have been a key factor facilitating introgression.
(gene flow; hybridization; Zimmerius; tyrant-flycatcher.)
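The "ABBA/BABA" test mentioned in this abstract is commonly summarized by Patterson's D statistic, D = (nABBA - nBABA) / (nABBA + nBABA). The abstract does not describe the authors' implementation or which population fills which position, so the sketch below is only a minimal illustration under stated assumptions: each site is a hypothetical four-character string giving one allele for the taxa in the order (P1, P2, P3, outgroup), and the function name `patterson_d` is invented here. Assessing significance (typically by a block jackknife) is not shown.

```python
def patterson_d(sites):
    """Compute Patterson's D from four-taxon site patterns.

    Each site lists alleles in the assumed order (P1, P2, P3, outgroup).
    ABBA: P2 and P3 share the derived allele; BABA: P1 and P3 share it.
    """
    abba = 0
    baba = 0
    for p1, p2, p3, out in sites:
        if p1 == out and p2 == p3 and p2 != out:
            abba += 1
        elif p2 == out and p1 == p3 and p1 != out:
            baba += 1
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites; D is undefined")
    return (abba - baba) / (abba + baba)


# Three ABBA sites, one BABA site, two uninformative sites.
example_sites = ["ATTA", "ATTA", "ATTA", "TATA", "AAAA", "ATAA"]
print(patterson_d(example_sites))  # 0.5
```

Under incomplete lineage sorting alone the two patterns are expected in equal numbers, so D is expected to be near zero; an excess of one pattern is the signal the test looks for.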

Journal Article•DOI•

Chris Whidden¹, Norbert Zeh¹, Robert G. Beiko¹•Institutions (1)

Dalhousie University¹

A Bayesian method for analyzing lateral gene transfer.

TL;DR: This work successfully constructed an SPR supertree from a phylogenomic dataset of 40,631 gene trees that covered 244 genomes representing several major bacterial phyla, and allowed direct inference of highways of gene transfer between bacterial classes and genera.

Abstract: Supertree methods reconcile a set of phylogenetic trees into a single structure that is often interpreted as a branching history of species. A key challenge is combining conflicting evolutionary histories that are due to artifacts of phylogenetic reconstruction and phenomena such as lateral gene transfer (LGT). Many supertree approaches use optimality criteria that do not reflect underlying processes, have known biases, and may be unduly influenced by LGT. We present the first method to construct supertrees by using the subtree prune-and-regraft (SPR) distance as an optimality criterion. Although calculating the rooted SPR distance between a pair of trees is NP-hard, our new maximum agreement forest-based methods can reconcile trees with hundreds of taxa and > 50 transfers in fractions of a second, which enables repeated calculations during the course of an iterative search. Our approach can accommodate trees in which uncertain relationships have been collapsed to multifurcating nodes. Using a series of benchmark datasets simulated under plausible rates of LGT, we show that SPR supertrees are more similar to correct species histories than supertrees based on parsimony or Robinson-Foulds distance criteria. We successfully constructed an SPR supertree from a phylogenomic dataset of 40,631 gene trees that covered 244 genomes representing several major bacterial phyla. Our SPR-based approach also allowed direct inference of highways of gene transfer between bacterial classes and genera. A small number of these highways connect genera in different phyla and can highlight specific genes implicated in long-distance LGT. (Lateral gene transfer; matrix representation with parsimony; phylogenomics; prokaryotic phylogeny; Robinson-Foulds; subtree prune-and-regraft; supertrees.)
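For readers unfamiliar with the Robinson-Foulds criterion that this abstract uses as a comparison, a minimal sketch of the rooted, clade-based variant follows. It is not the authors' software and it does not compute the SPR distance, which the abstract notes is NP-hard. The nested-tuple tree encoding and the function names are assumptions made for illustration only.

```python
def clades(tree):
    """Return the set of clades (as frozensets of leaf names) in a rooted tree.

    A tree is assumed to be a nested tuple whose leaves are strings.
    """
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves_below(child) for child in node))
        found.add(members)
        return members

    leaves_below(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Count the clades present in exactly one of the two rooted trees."""
    return len(clades(tree_a) ^ clades(tree_b))


tree_1 = ((("A", "B"), "C"), "D")
tree_2 = ((("A", "C"), "B"), "D")
print(robinson_foulds(tree_1, tree_2))  # 2
```

In this example the two trees differ only in the clades {A, B} and {A, C}, so the distance is 2; the shared clades {A, B, C} and {A, B, C, D} do not contribute.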

Journal Article•DOI•

Joel Sjöstrand¹,², Ali Tofigh³, Vincent Daubin⁴, Lars Arvestad¹,², Bengt Sennblad¹,⁵, Jens Lagergren¹,⁶•Institutions (6)

Science for Life Laboratory¹, Stockholm University², McGill University³, University of Lyon⁴, Karolinska Institutet⁵, Royal Institute of Technology⁶

20 Feb 2014-Systematic Biology

TL;DR: A Bayesian Markov-chain Monte Carlo-based method is presented that integrates GD, gene loss, LGT, and sequence evolution, and is applied in a genome-wide analysis of two groups of bacteria: Mollicutes and Cyanobacteria.

Abstract: Lateral gene transfer (LGT)—which transfers DNA between two non-vertically related individuals belonging to the same or different species—is recognized as a major force in prokaryotic evolution, and evidence of its impact on eukaryotic evolution is ever increasing. LGT has attracted much public attention for its potential to transfer pathogenic elements and antibiotic resistance in bacteria, and to transfer pesticide resistance from genetically modified crops to other plants. In a wider perspective, there is a growing body of studies highlighting the role of LGT in enabling organisms to occupy new niches or adapt to environmental changes. The challenge LGT poses to the standard tree-based conception of evolution is also being debated. Studies of LGT have, however, been severely limited by a lack of computational tools. The best currently available LGT algorithms are parsimony-based phylogenetic methods, which require a pre-computed gene tree and cannot choose between sometimes wildly differing most parsimonious solutions. Moreover, in many studies, simple heuristics are applied that can only handle putative orthologs and completely disregard gene duplications (GDs). Consequently, proposed LGT among specific gene families, and the rate of LGT in general, remain debated. We present a Bayesian Markov-chain Monte Carlo-based method that integrates GD, gene loss, LGT, and sequence evolution, and apply the method in a genome-wide analysis of two groups of bacteria: Mollicutes and Cyanobacteria. Our analyses show that although the LGT rate between distant species is high, the net combined rate of duplication and close-species LGT is on average higher. We also show that the common practice of disregarding reconcilability in gene tree inference overestimates the number of LGT and duplication events. (Bayesian; gene duplication; gene loss; horizontal gene transfer; lateral gene transfer; MCMC; phylogenetics.)

Journal Article•DOI•

Poor Fit to the Multispecies Coalescent is Widely Detectable in Empirical Data

Noah M. Reid¹, Sarah M. Hird², Jeremy M. Brown, Tara A. Pelletier, John D. McVay, Jordan D. Satler, Bryan C. Carstens•Institutions (2)

Louisiana State University¹, Ohio State University²

Integrating Incomplete Fossils by Isolating Conflicting Signal in Saturated and Non-Independent Morphological Characters

TL;DR: It is shown that poor model fit is detectable in the majority of data sets; that this poor fit can mislead phylogenetic estimation; and that in some cases it stems from processes of inherent interest to systematists.

Abstract: Model checking is a critical part of Bayesian data analysis, yet it remains largely unused in systematic studies. Although phylogeny estimation has recently moved into an era of increasingly complex models that simultaneously account for multiple evolutionary processes, the statistical fit of these models to the data has rarely been tested. Here we develop a posterior predictive simulation-based model check for a commonly used multispecies coalescent model, implemented in *BEAST, and apply it to 25 published data sets. We show that poor model fit is detectable in the majority of data sets; that this poor fit can mislead phylogenetic estimation; and that in some cases it stems from processes of inherent interest to systematists. We suggest that as systematists scale up to phylogenomic data sets, which will be subject to a heterogeneous array of evolutionary processes, critically evaluating the fit of models to data is an analytical step that can no longer be ignored. (Gene duplication and extinction; gene tree; hybridization; model fit; multispecies coalescent; next-generation sequencing; posterior predictive simulation; species delimitation; species tree.)
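The general shape of a posterior predictive check can be sketched in a few lines. The abstract does not specify the test statistics or simulation machinery used with *BEAST, so everything below is a generic illustration under assumptions: `simulate` and `statistic` are hypothetical user-supplied callables, and `posterior_draws` stands for parameter values already sampled from a posterior. For each draw, a replicate data set of the same size as the observed data is simulated, and the function reports the fraction of replicates whose statistic is at least as large as the observed one.

```python
import random


def posterior_predictive_p(observed, posterior_draws, simulate, statistic, rng=None):
    """Fraction of simulated replicates with statistic >= the observed statistic.

    simulate(theta, n, rng) must return a replicate data set of size n;
    statistic(data) must return a number. Both are assumed, not prescribed.
    """
    rng = rng or random.Random(0)
    observed_value = statistic(observed)
    at_least = 0
    for theta in posterior_draws:
        replicate = simulate(theta, len(observed), rng)
        if statistic(replicate) >= observed_value:
            at_least += 1
    return at_least / len(posterior_draws)
```

Values near 0 or 1 indicate that the observed statistic lies in a tail of the posterior predictive distribution, which is the kind of signal usually read as poor fit for the aspect of the data that the statistic measures.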

Journal Article•DOI•

Liliana M. Dávalos, Paúl M. Velazco¹, Omar Warsi¹, Peter D. Smits¹, Nancy B. Simmons¹•Institutions (1)

American Museum of Natural History¹