Showing papers in "Systematic Biology in 2012"


Journal Article
TL;DR: The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly, and provides more output options than previously, including samples of ancestral states, site rates, site dN/dS ratios, branch rates, and node dates.
Abstract: Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade to the latest official release presented in 2003. The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly. The introduction of new proposals and automatic optimization of tuning parameters has improved convergence for many problems. The new version also sports significantly faster likelihood calculations through streaming single-instruction-multiple-data extensions (SSE) and support of the BEAGLE library, allowing likelihood calculations to be delegated to graphics processing units (GPUs) on compatible hardware. Speedup factors range from around 2 with SSE code to more than 50 with BEAGLE for codon problems. Checkpointing across all models allows long runs to be completed even when an analysis is prematurely terminated. New models include relaxed clocks, dating, model averaging across time-reversible substitution models, and support for hard, negative, and partial (backbone) tree constraints. Inference of species trees from gene trees is supported by full incorporation of the Bayesian estimation of species trees (BEST) algorithms. Marginal model likelihoods for Bayes factor tests can be estimated accurately across the entire model space using the stepping stone method. The new version provides more output options than previously, including samples of ancestral states, site rates, site d(N)/d(S) ratios, branch rates, and node dates. A wide range of statistics on tree parameters can also be output for visualization in FigTree and compatible software.

18,718 citations
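
The abstract above mentions convergence diagnostics for parallel runs without naming one. A diagnostic often used for this purpose is the average standard deviation of split frequencies across runs. The following is a minimal sketch of that idea, not MrBayes code: the function name `asdsf`, the dictionary input format, the `min_freq` cutoff, and the use of the sample standard deviation are all assumptions made for illustration.

```python
import math

def asdsf(run_freqs, min_freq=0.1):
    """Average standard deviation of split frequencies across runs.

    run_freqs: list of dicts, one per run, mapping a split label to its
    sampled frequency in that run (hypothetical input format).
    min_freq: splits whose frequency stays below this in every run are
    skipped (an assumed cutoff).
    """
    splits = set().union(*run_freqs)  # every split seen in any run
    deviations = []
    for split in splits:
        values = [run.get(split, 0.0) for run in run_freqs]
        if max(values) < min_freq:
            continue
        mean = sum(values) / len(values)
        # Sample variance across runs (assumed convention).
        var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        deviations.append(math.sqrt(var))
    return sum(deviations) / len(deviations) if deviations else 0.0

# Two toy runs that mostly agree on which splits are supported.
run_a = {"AB|CDE": 0.98, "ABC|DE": 0.61, "AC|BDE": 0.02}
run_b = {"AB|CDE": 0.97, "ABC|DE": 0.55, "AC|BDE": 0.03}
print(asdsf([run_a, run_b]))
```

Values near zero indicate that independent runs are sampling similar tree distributions; the threshold at which to stop a run is a judgment call and is not specified here.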


Journal Article
TL;DR: Dendroscope 3 is a new program for working with rooted phylogenetic trees and networks that provides a number of methods for drawing and comparing rooted phylogenetic networks, and for computing them from rooted trees.
Abstract: Dendroscope 3 is a new program for working with rooted phylogenetic trees and networks. It provides a number of methods for drawing and comparing rooted phylogenetic networks, and for computing them from rooted trees. The program can be used interactively or in command-line mode. The program is written in Java, use of the software is free, and installers for all 3 major operating systems can be downloaded from www.dendroscope.org. [Phylogenetic trees; phylogenetic networks; software.].

1,396 citations


Journal Article
TL;DR: A new class of molecular marker, anchored by ultraconserved genomic elements (UCEs), that universally enable target enrichment and sequencing of thousands of orthologous loci across species separated by hundreds of millions of years of evolution is introduced.
Abstract: Although massively parallel sequencing has facilitated large-scale DNA sequencing, comparisons among distantly related species rely upon small portions of the genome that are easily aligned. Methods are needed to efficiently obtain comparable DNA fragments prior to massively parallel sequencing, particularly for biologists working with non-model organisms. We introduce a new class of molecular marker, anchored by ultraconserved genomic elements (UCEs), that universally enable target enrichment and sequencing of thousands of orthologous loci across species separated by hundreds of millions of years of evolution. Our analyses here focus on use of UCE markers in Amniota because UCEs and phylogenetic relationships are well-known in some amniotes. We perform an in silico experiment to demonstrate that sequence flanking 2030 UCEs contains information sufficient to enable unambiguous recovery of the established primate phylogeny. We extend this experiment by performing an in vitro enrichment of 2386 UCE-anchored loci from nine, non-model avian species. We then use alignments of 854 of these loci to unambiguously recover the established evolutionary relationships within and among three ancient bird lineages. Because many organismal lineages have UCEs, this type of genetic marker and the analytical framework we outline can be applied across the tree of life, potentially reshaping our understanding of phylogeny at many taxonomic levels. (Flanking sequence; genetic markers; phylogenomics; sequence capture; target enrichment; ultraconserved elements.)

979 citations


Journal Article
TL;DR: The results suggest that the crown group dates back to the Carboniferous, ∼309 Ma (95% interval: 291-347 Ma), and diversified into major extant lineages much earlier than previously thought, well before the Triassic.
Abstract: Phylogenies are usually dated by calibrating interior nodes against the fossil record. This relies on indirect methods that, in the worst case, misrepresent the fossil information. Here, we contrast such node dating with an approach that includes fossils along with the extant taxa in a Bayesian total-evidence analysis. As a test case, we focus on the early radiation of the Hymenoptera, mostly documented by poorly preserved impression fossils that are difficult to place phylogenetically. Specifically, we compare node dating using nine calibration points derived from the fossil record with total-evidence dating based on 343 morphological characters scored for 45 fossil (4-20% complete) and 68 extant taxa. In both cases we use molecular data from seven markers (∼5 kb) for the extant taxa. Because it is difficult to model speciation, extinction, sampling, and fossil preservation realistically, we develop a simple uniform prior for clock trees with fossils, and we use relaxed clock models to accommodate rate variation across the tree. Despite considerable uncertainty in the placement of most fossils, we find that they contribute significantly to the estimation of divergence times in the total-evidence analysis. In particular, the posterior distributions on divergence times are less sensitive to prior assumptions and tend to be more precise than in node dating. The total-evidence analysis also shows that four of the seven Hymenoptera calibration points used in node dating are likely to be based on erroneous or doubtful assumptions about the fossil placement. With respect to the early radiation of Hymenoptera, our results suggest that the crown group dates back to the Carboniferous, ∼309 Ma (95% interval: 291-347 Ma), and diversified into major extant lineages much earlier than previously thought, well before the Triassic. (Bayesian inference; fossil dating; morphological evolution; relaxed clock; statistical phylogenetics.)

706 citations


Journal Article
TL;DR: Presents a new, cost-efficient, and rapid approach to obtaining data from hundreds of loci for potentially hundreds of individuals for deep and shallow phylogenetic studies, and finds that hybrid enrichment using conserved probes (anchored enrichment) can recover a large number of unlinked loci that are useful at a diversity of phylogenetic timescales.
Abstract: The field of phylogenetics is on the cusp of a major revolution, enabled by new methods of data collection that leverage both genomic resources and recent advances in DNA sequencing. Previous phylogenetic work has required labor-intensive marker development coupled with single-locus polymerase chain reaction and DNA sequencing on a clade-by-clade and locus-by-locus basis. Here, we present a new, cost-efficient, and rapid approach to obtaining data from hundreds of loci for potentially hundreds of individuals for deep and shallow phylogenetic studies. Specifically, we designed probes for target enrichment of >500 loci in highly conserved anchor regions of vertebrate genomes (flanked by less conserved regions) from five model species and tested enrichment efficiency in nonmodel species up to 508 million years divergent from the nearest model. We found that hybrid enrichment using conserved probes (anchored enrichment) can recover a large number of unlinked loci that are useful at a diversity of phylogenetic timescales. This new approach has the potential not only to expedite resolution of deep-scale portions of the Tree of Life but also to greatly accelerate resolution of the large number of shallow clades that remain unresolved. The combination of low cost (~1% of the cost of traditional Sanger sequencing and ~3.5% of the cost of high-throughput amplicon sequencing for projects on the scale of 500 loci × 100 individuals) and rapid data collection (~2 weeks of laboratory time) is expected to make this approach tractable even for researchers working on systems with limited or nonexistent genomic resources.

681 citations


Journal Article
TL;DR: A specimen-based protocol for selecting and documenting relevant fossils is presented and future directions for evaluating and utilizing phylogenetic and temporal data from the fossil record are discussed, to establish the best practices for justifying fossils used for the temporal calibration of molecular phylogenies.
Abstract: At this time, no abstract is available.

589 citations


Journal Article
TL;DR: BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference, is presented, which provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms.
Abstract: Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.

556 citations
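
To make the phrase "phylogenetic likelihood calculations" in the abstract above concrete, here is a small pure-Python sketch of the standard pruning recursion for one alignment site under the Jukes-Cantor (JC69) model. It does not use or imitate the BEAGLE API; the tree encoding, the taxon names `t1` to `t3`, and every function name are assumptions chosen for illustration.

```python
import math
from itertools import product

STATES = "ACGT"

def jc69_matrix(t):
    """4x4 JC69 transition probabilities for branch length t (subs/site)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partials(node, site):
    """Conditional likelihoods of the data below `node`, one per state.

    A leaf is a taxon name (str); an internal node is a list of
    (child, branch_length) pairs. `site` maps taxon name -> base.
    """
    if isinstance(node, str):
        return [1.0 if STATES[s] == site[node] else 0.0 for s in range(4)]
    result = [1.0] * 4
    for child, length in node:
        p = jc69_matrix(length)
        child_l = partials(child, site)
        for s in range(4):
            result[s] *= sum(p[s][x] * child_l[x] for x in range(4))
    return result

def site_likelihood(tree, site):
    """Likelihood of one site, with uniform base frequencies at the root."""
    return sum(0.25 * value for value in partials(tree, site))

# Hypothetical three-taxon tree: (t1:0.1, (t2:0.2, t3:0.3):0.05)
example_tree = [("t1", 0.1), ([("t2", 0.2), ("t3", 0.3)], 0.05)]
taxa = ("t1", "t2", "t3")

# Sanity check: likelihoods over all 64 possible site patterns sum to 1.
total = sum(site_likelihood(example_tree, dict(zip(taxa, pattern)))
            for pattern in product(STATES, repeat=3))
print(total)
```

The inner loops over states and children are the part that is repeated for every site and every node, which is why this computation is a natural target for vectorized CPU code and GPUs.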


Journal Article
TL;DR: A CO1 data set of aquatic predaceous diving beetles of the tribe Agabini is presented and it is shown that even if samples are collected to maximize the geographical coverage, up to 70 individuals are required to sample 95% of intraspecific variation, showing that the geographical scale of sampling has a critical impact on the global application of DNA barcoding.
Abstract: Eight years after DNA barcoding was formally proposed on a large scale, CO1 sequences are rapidly accumulating from around the world. While studies to date have mostly targeted local or regional species assemblages, the recent launch of the global iBOL project (International Barcode of Life) highlights the need to understand the effects of geographical scale on Barcoding's goals. Sampling has been central in the debate on DNA Barcoding, but the effect of the geographical scale of sampling has not yet been thoroughly and explicitly tested with empirical data. Here, we present a CO1 data set of aquatic predaceous diving beetles of the tribe Agabini, sampled throughout Europe, and use it to investigate how the geographic scale of sampling affects 1) the estimated intraspecific variation of species, 2) the genetic distance to the most closely related heterospecific, 3) the ratio of intraspecific and interspecific variation, 4) the frequency of taxonomically recognized species found to be monophyletic, and 5) query identification performance based on 6 different species assignment methods. Intraspecific variation was significantly correlated with the geographical scale of sampling (R-square = 0.7), and more than half of the species with 10 or more sampled individuals (N = 29) showed higher intraspecific variation than 1% sequence divergence. In contrast, the distance to the closest heterospecific showed a significant decrease with increasing geographical scale of sampling. The average genetic distance dropped from >7% for samples within 1 km, to 6000 km apart. Over a third of the species were not monophyletic, and the proportion increased through locally, nationally, regionally, and continentally restricted subsets of the data. The success of identifying queries decreased with increasing spatial scale of sampling; liberal methods declined from 100% to around 90%, whereas strict methods dropped to below 50% at continental scales.
The proportion of query identifications considered uncertain (more than one species <1% distance from query) escalated from zero at local, to 50% at continental scale. Finally, by resampling the most widely sampled species we show that even if samples are collected to maximize the geographical coverage, up to 70 individuals are required to sample 95% of intraspecific variation. The results show that the geographical scale of sampling has a critical impact on the global application of DNA barcoding. Scale-effects result from the relative importance of different processes determining the composition of regional species assemblages (dispersal and ecological assembly) and global clades (demography, speciation, and extinction). The incorporation of geographical information, where available, will be required to obtain identification rates at global scales equivalent to those in regional barcoding studies. Our result hence provides an impetus for both smarter barcoding tools and sprouting national barcoding initiatives; smaller geographical scales deliver higher accuracy. (Agabini; diving beetles; DNA barcoding; Dytiscidae; iBOL; identification methods; sampling; scale effect; species monophyly)

397 citations
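
Two of the quantities compared in the abstract above, intraspecific variation and the distance to the closest heterospecific, can be illustrated with a few lines of Python. This is a sketch under simplifying assumptions: it uses uncorrected p-distances on pre-aligned sequences, and the function names and toy sequences are invented for the example rather than taken from the study.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def barcode_summary(records):
    """Summarize barcode distances per species.

    records: list of (species, aligned_sequence) tuples.
    Returns {species: (max intraspecific distance,
                       distance to nearest heterospecific or None)}.
    """
    summary = {}
    for sp in {name for name, _ in records}:
        own = [seq for name, seq in records if name == sp]
        other = [seq for name, seq in records if name != sp]
        intra = max((p_distance(a, b) for a, b in combinations(own, 2)),
                    default=0.0)
        inter = min((p_distance(a, b) for a in own for b in other),
                    default=None)
        summary[sp] = (intra, inter)
    return summary

# Toy data (hypothetical 10-bp "barcodes").
records = [
    ("sp1", "ACGTACGTAC"),
    ("sp1", "ACGTACGTAT"),
    ("sp2", "ACGTTCGTAT"),
    ("sp2", "ACGTTCGTAA"),
]
summary = barcode_summary(records)
print(summary)
```

When the maximum intraspecific distance approaches or exceeds the nearest heterospecific distance, as in this toy data, distance-based identification becomes ambiguous, which is the pattern the abstract reports at larger sampling scales.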


Journal Article
TL;DR: It is suggested that increased background research should be made at all stages of the calibration process to reduce errors wherever possible, from verifying the geochronological data on the fossils to critical reassessment of their phylogenetic position.
Abstract: Although temporal calibration is widely recognized as critical for obtaining accurate divergence-time estimates using molecular dating methods, few studies have evaluated the variation resulting from different calibration strategies. Depending on the information available, researchers have often used primary calibrations from the fossil record or secondary calibrations from previous molecular dating studies. In analyses of flowering plants, primary calibration data can be obtained from macro- and mesofossils (e.g., leaves, flowers, and fruits) or microfossils (e.g., pollen). Fossil data can vary substantially in accuracy and precision, presenting a difficult choice when selecting appropriate calibrations. Here, we test the impact of eight plausible calibration scenarios for Nothofagus (Nothofagaceae, Fagales), a plant genus with a particularly rich and well-studied fossil record. To do so, we reviewed the phylogenetic placement and geochronology of 38 fossil taxa of Nothofagus and other Fagales, and we identified minimum age constraints for up to 18 nodes of the phylogeny of Fagales. Molecular dating analyses were conducted for each scenario using maximum likelihood (RAxML + r8s) and Bayesian (BEAST) approaches on sequence data from six regions of the chloroplast and nuclear genomes. Using either ingroup or outgroup constraints, or both, led to similar age estimates, except near strongly influential calibration nodes. Using "early but risky" fossil constraints in addition to "safe but late" constraints, or using assumptions of vicariance instead of fossil constraints, led to older age estimates. In contrast, using secondary calibration points yielded drastically younger age estimates. This empirical study highlights the critical influence of calibration on molecular dating analyses. Even in a best-case situation, with many thoroughly vetted fossils available, substantial uncertainties can remain in the estimates of divergence times. 
For example, our estimates for the crown group age of Nothofagus varied from 13 to 113 Ma across our full range of calibration scenarios. We suggest that increased background research should be made at all stages of the calibration process to reduce errors wherever possible, from verifying the geochronological data on the fossils to critical reassessment of their phylogenetic position.

369 citations


Journal Article
TL;DR: Presents a modification to the original SATé algorithm that improves upon SATé (which is now called SATé-I) in terms of speed and of phylogenetic and alignment accuracy, along with two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results.
Abstract: Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted but currently only SATe estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATe algorithm that improves upon SATe (which we now call SATe-I) in terms of speed and of phylogenetic and alignment accuracy. SATe-II uses a different divide-and-conquer strategy than SATe-I and so produces smaller more closely related subsets than SATe-I; as a result, SATe-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATe-I. Generally, SATe is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATe-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATe-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATe's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results. 
First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of those sequences that maximize likelihood under the Jukes-Cantor model is uninformative in the worst possible sense. For all inputs, all trees optimize the likelihood score. Second, we show that a greedy heuristic that uses GTR+Gamma ML to optimize the alignment and the tree can produce very poor alignments and trees. Therefore, the excellent performance of SATe-II and SATe-I is not because ML is used as an optimization criterion for choosing the best tree/alignment pair but rather due to the particular divide-and-conquer realignment techniques employed.

340 citations


Journal Article
TL;DR: Biogeographic analyses indicate that the present-day distribution of fig and pollinator lineages is consistent with a Eurasian origin and subsequent dispersal, rather than with Gondwanan vicariance.
Abstract: It is thought that speciation in phytophagous insects is often due to colonization of novel host plants, because radiations of plant and insect lineages are typically asynchronous. Recent phylogenetic comparisons have supported this model of diversification for both insect herbivores and specialized pollinators. An exceptional case where contemporaneous plant-insect diversification might be expected is the obligate mutualism between fig trees (Ficus species, Moraceae) and their pollinating wasps (Agaonidae, Hymenoptera). The ubiquity and ecological significance of this mutualism in tropical and subtropical ecosystems has long intrigued biologists, but the systematic challenge posed by >750 interacting species pairs has hindered progress toward understanding its evolutionary history. In particular, taxon sampling and analytical tools have been insufficient for large-scale cophylogenetic analyses. Here, we sampled nearly 200 interacting pairs of fig and wasp species from across the globe. Two supermatrices were assembled: on an average, wasps had sequences from 77% of 6 genes (5.6 kb), figs had sequences from 60% of 5 genes (5.5 kb), and overall 850 new DNA sequences were generated for this study. We also developed a new analytical tool, Jane 2, for event-based phylogenetic reconciliation analysis of very large data sets. Separate Bayesian phylogenetic analyses for figs and fig wasps under relaxed molecular clock assumptions indicate Cretaceous diversification of crown groups and contemporaneous divergence for nearly half of all fig and pollinator lineages. Event-based cophylogenetic analyses further support the codiversification hypothesis. Biogeographic analyses indicate that the present-day distribution of fig and pollinator lineages is consistent with a Eurasian origin and subsequent dispersal, rather than with Gondwanan vicariance. 
Overall, our findings indicate that the fig-pollinator mutualism represents an extreme case among plant-insect interactions of coordinated dispersal and long-term codiversification. [Biogeography; coevolution; cospeciation; host switching; long-branch attraction; phylogeny.].

Journal Article
TL;DR: It is clarified that calibration densities, such as those defined in BEAST 1.5, do not represent the marginal prior distribution of the calibration node, and an alternative construction for a calibrated Yule prior on trees is described that allows direct specification of the marginal prior of the calibrated divergence time.
Abstract: The use of fossil evidence to calibrate divergence time estimation has a long history. More recently, Bayesian Markov chain Monte Carlo has become the dominant method of divergence time estimation, and fossil evidence has been reinterpreted as the specification of prior distributions on the divergence times of calibration nodes. These so-called "soft calibrations" have become widely used but the statistical properties of calibrated tree priors in a Bayesian setting have not been carefully investigated. Here, we clarify that calibration densities, such as those defined in BEAST 1.5, do not represent the marginal prior distribution of the calibration node. We illustrate this with a number of analytical results on small trees. We also describe an alternative construction for a calibrated Yule prior on trees that allows direct specification of the marginal prior distribution of the calibrated divergence time, with or without the restriction of monophyly. This method requires the computation of the Yule prior conditional on the height of the divergence being calibrated. Unfortunately, a practical solution for multiple calibrations remains elusive. Our results suggest that direct estimation of the prior induced by specifying multiple calibration densities should be a prerequisite of any divergence time dating analysis.

Journal Article
TL;DR: Bayesian ancestral state reconstructions and BiSSE likelihood analyses of correlated diversification indicated that increased rates of speciation are strongly associated with the derived evolution of perennial life history and invasion of montane ecosystems.
Abstract: Replicate radiations provide powerful comparative systems to address questions about the interplay between opportunity and innovation in driving episodes of diversification and the factors limiting their subsequent progression. However, such systems have been rarely documented at intercontinental scales. Here, we evaluate the hypothesis of multiple radiations in the genus Lupinus (Leguminosae), which exhibits some of the highest known rates of net diversification in plants. Given that incomplete taxon sampling, background extinction, and lineage-specific variation in diversification rates can confound macroevolutionary inferences regarding the timing and mechanisms of cladogenesis, we used Bayesian relaxed clock phylogenetic analyses as well as MEDUSA and BiSSE birth–death likelihood models of diversification, to evaluate the evolutionary patterns of lineage accumulation in Lupinus. We identified 3 significant shifts to increased rates of net diversification (r) relative to background levels in the genus (r = 0.18–0.48 lineages/myr). The primary shift occurred approximately 4.6 Ma (r = 0.48–1.76) in the montane regions of western North America, followed by a secondary shift approximately 2.7 Ma (r = 0.89–3.33) associated with range expansion and diversification of allopatrically distributed sister clades in the Mexican highlands and Andes. We also recovered evidence for a third independent shift approximately 6.5 Ma at the base of a lower elevation eastern South American grassland and campo rupestre clade (r = 0.36–1.33). Bayesian ancestral state reconstructions and BiSSE likelihood analyses of correlated diversification indicated that increased rates of speciation are strongly associated with the derived evolution of perennial life history and invasion of montane ecosystems. 
Although we currently lack hard evidence for “replicate adaptive radiations” in the sense of convergent morphological and ecological trajectories among species in different clades, these results are consistent with the hypothesis that iteroparity functioned as an adaptive key innovation, providing a mechanism for range expansion and rapid divergence in upper elevation regions across much of the New World.
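
The net diversification rates (r, in lineages/myr) quoted in the abstract above come from MEDUSA and BiSSE birth-death models. As a much simpler point of reference, the sketch below computes r with a pure-birth estimator that assumes no extinction, based only on clade size and age. It is a swapped-in simplification and not the method used in the study; the function name and the example clade are hypothetical.

```python
import math

def net_diversification_rate(n_species, age_myr, crown=True):
    """Pure-birth (zero extinction) estimate of net diversification.

    crown=True: age is the crown age, so the clade starts from 2 lineages.
    crown=False: age is the stem age, so the clade starts from 1 lineage.
    Returns r in lineages per million years.
    """
    start = 2 if crown else 1
    if n_species < start:
        raise ValueError("clade has fewer species than starting lineages")
    if age_myr <= 0:
        raise ValueError("age must be positive")
    return (math.log(n_species) - math.log(start)) / age_myr

# Hypothetical clade: 50 extant species with a crown age of 5 Myr.
print(net_diversification_rate(50, 5.0))
```

Because extinction and incomplete sampling are ignored, this estimator is only a rough baseline; the model-based analyses described in the abstract exist precisely to handle those complications.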

Journal Article
TL;DR: It is considered that marine species comprise only 16% of all species on Earth although the oceans contain a greater phylogenetic diversity than occurs on land, and it is predicted that there may be 1.8-2.0 million species on Earth, significantly less than some previous estimates.
Abstract: We found that trends in the rate of description of 580,000 marine and terrestrial species, in the taxonomically authoritative World Register of Marine Species and Catalogue of Life databases, were similar until the 1950s. Since then, the relative number of marine to terrestrial species described per year has increased, reflecting the less explored nature of the oceans. From the mid-19th century, the cumulative number of species described has been linear, with the highest number of species described in the decade of 1900, and fewer species described and fewer authors active during the World Wars. There were more authors describing species since the 1960s, indicating greater taxonomic effort. There were fewer species described per author since the 1920s, suggesting it has become more difficult to discover new species. There was no evidence of any change in individual effort by taxonomists. Using a nonhomogeneous renewal process model we predicted that 24-31% to 21-29% more marine and terrestrial species remain to be discovered, respectively. We discuss why we consider that marine species comprise only 16% of all species on Earth although the oceans contain a greater phylogenetic diversity than occurs on land. We predict that there may be 1.8-2.0 million species on Earth, of which about 0.3 million are marine, significantly less than some previous estimates. (Biodiversity; biogeography; deep-sea modeling; macroecology; marine; taxonomy; terrestrial.)

Journal Article
TL;DR: It is argued that all existing techniques need to be modified to accommodate the commonness of rarity and that all future techniques should be explicit about how rare species can be discovered and treated.
Abstract: Singletons—species only known from a single specimen—and uniques—species that have only been collected once—are very common in biodiversity samples. Recent reviews suggest that in tropical arthropod samples, 30% of all species are represented by only one specimen (Bickel 1999; Novotny and Basset 2000; Coddington et al. 2009), with additional sampling helping little with eliminating rarity. Usually, such sampling only converts some of the singleton species to doubletons, with new singleton species being discovered in the process (Scharff et al. 2003; Coddington et al. 2009). Here, we first demonstrate that rare species are similarly common in specimen samples used for taxonomic research before we argue that the phenomenon of rarity has been insufficiently considered by the new quantitative techniques for species delimitation. Addressing this disconnect between theory and reality is pressing given that the last decade has seen a renewed interest in methods for species identification and delimitation (Sites and Marshall 2004; O’Meara 2010). Much of this interest has been fuelled by the availability of DNA sequences (Meier 2008). However, many newly proposed techniques implicitly or explicitly assume that all populations and species can be well sampled. But what is the value of these techniques if many species have only been collected once and/or are only known from one specimen? Here, we argue that all existing techniques need to be modified to accommodate the commonness of rarity and that all future techniques should be explicit about how rare species can be discovered and treated.

Journal Article
TL;DR: This study provides a surprising, but well-supported, hypothesis for a convict-blenny sister group to the charismatic cichlids and new insights into the evolution of pharyngognathy.
Abstract: The perciform group Labroidei includes approximately 2600 species and comprises some of the most diverse and successful lineages of teleost fishes. Composed of four major clades, Cichlidae, Labridae (wrasses, parrotfishes, and weed whitings), Pomacentridae (damselfishes), and Embiotocidae (surfperches); labroids have been an icon for studies of biodiversity, adaptive radiation, and sexual selection. The success and diversification of labroids have been largely attributed to the presence of a major innovation in the pharyngeal jaw apparatus, pharyngognathy, which is hypothesized to increase feeding capacity and versatility. We present results of large-scale phylogenetic analyses and a survey of pharyngeal jaw functional morphology that allow us to examine the evolution of pharyngognathy in a historical context. Phylogenetic analyses were based on a sample of 188 acanthomorph (spiny-rayed fish) species, primarily percomorphs (perch-like fishes), and DNA sequence data collected from 10 nuclear loci that have been previously used to resolve higher level ray-finned fish relationships. Phylogenies inferred from this dataset using maximum likelihood, Bayesian, and species tree analyses indicate polyphyly of the traditional Labroidei and clearly separate Labridae from the remainder of the traditional labroid lineages (Cichlidae, Embiotocidae, and Pomacentridae). These three "chromide" families grouped within a newly discovered clade of 40 families and more than 4800 species (>27% of percomorphs and >16% of all ray-finned fishes), which we name Ovalentaria for its characteristic demersal, adhesive eggs with chorionic filaments. 
This fantastically diverse clade includes some of the most species-rich lineages of marine and freshwater fishes, including all representatives of the Cichlidae, Embiotocidae, Pomacentridae, Ambassidae, Gobiesocidae, Grammatidae, Mugilidae, Opistognathidae, Pholidichthyidae, Plesiopidae (including Notograptus), Polycentridae, Pseudochromidae, Atherinomorpha, and Blennioidei. Beyond the discovery of Ovalentaria, this study provides a surprising, but well-supported, hypothesis for a convict-blenny (Pholidichthys) sister group to the charismatic cichlids and new insights into the evolution of pharyngognathy. Bayesian stochastic mapping ancestral state reconstructions indicate that pharyngognathy has evolved at least six times in percomorphs, including four separate origins in members of the former Labroidei, one origin in the Centrogenyidae, and one origin within Beloniformes. Our analyses indicate that all pharyngognathous fishes have a mechanically efficient biting mechanism enabled by the muscular sling and a single lower jaw element. However, a major distinction exists between Labridae, which lacks the widespread, generalized percomorph pharyngeal biting mechanism, and all other pharyngognathous clades, which possess this generalized biting mechanism in addition to pharyngognathy. Our results reveal a remarkable history of pharyngognathy: far from a single origin, it appears to have evolved at least six times, and its status as a major evolutionary innovation is reinforced by it being a synapomorphy for several independent major radiations, including some of the most species rich and ecologically diverse percomorph clades of coral reef and tropical freshwater fishes, Labridae and Cichlidae. (Acanthomorpha; Beloniformes; Centrogenyidae; key innovation; Labroidei; Ovalentaria; pharyngeal jaws; Perciformes.)

Journal ArticleDOI
TL;DR: In this article, the authors investigate the performance of common MCMC proposal distributions in terms of median and variance of run time to convergence on 11 data sets and introduce two new Metropolized Gibbs Samplers for moving through "tree space."
Abstract: Increasingly, large data sets pose a challenge for computationally intensive phylogenetic methods such as Bayesian Markov chain Monte Carlo (MCMC). Here, we investigate the performance of common MCMC proposal distributions in terms of median and variance of run time to convergence on 11 data sets. We introduce two new Metropolized Gibbs Samplers for moving through "tree space." MCMC simulation using these new proposals shows faster average run time and dramatically improved predictability in performance, with a 20-fold reduction in the variance of the time to estimate the posterior distribution to a given accuracy. We also introduce conditional clade probabilities and demonstrate that they provide a superior means of approximating tree topology posterior probabilities from samples recorded during MCMC.
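
To make the idea of conditional clade probabilities concrete, the following is a minimal sketch of one common formulation: the probability of a tree is approximated as the product, over its internal nodes, of the sampled frequency of each split given its parent clade. The tree encoding (nested tuples of tip names), the function names, and the toy samples are illustrative assumptions, and the paper's exact estimator may differ.

```python
from collections import Counter
from math import prod


def clade_splits(tree):
    """Return (tip set, [(clade, split), ...]) for a rooted binary tree.

    A tip is a string; an internal node is a 2-tuple of subtrees
    (hypothetical encoding chosen for this sketch).
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    left, right = tree
    left_tips, left_splits = clade_splits(left)
    right_tips, right_splits = clade_splits(right)
    clade = left_tips | right_tips
    split = frozenset([left_tips, right_tips])
    return clade, left_splits + right_splits + [(clade, split)]


def ccp_table(samples):
    """Count how often each clade, and each split of that clade, was sampled."""
    clade_counts, split_counts = Counter(), Counter()
    for tree in samples:
        for clade, split in clade_splits(tree)[1]:
            clade_counts[clade] += 1
            split_counts[(clade, split)] += 1
    return clade_counts, split_counts


def ccp_probability(tree, clade_counts, split_counts):
    """Product of conditional split frequencies; 0.0 if a clade was never sampled."""
    return prod(
        split_counts[(clade, split)] / clade_counts[clade] if clade_counts[clade] else 0.0
        for clade, split in clade_splits(tree)[1]
    )


# Toy "MCMC sample" of two distinct 5-taxon topologies (made-up data).
SAMPLES = [
    ((("A", "B"), "C"), ("D", "E")),
    (((("A", "C"), "B"), "D"), "E"),
]
CLADES, SPLITS = ccp_table(SAMPLES)

# A topology that was never sampled but is built from sampled clades and splits.
UNSEEN = ((("A", "C"), "B"), ("D", "E"))
print(ccp_probability(UNSEEN, CLADES, SPLITS))
```

In this toy sample the unseen topology receives a nonzero estimate because each of its splits was observed in some sampled tree, whereas a raw frequency count would assign it zero.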

Journal ArticleDOI
TL;DR: It is shown that the idea of “protracted” speciation can be incorporated in the standard birth–death model of diversification and provides a compelling fit to four bird phylogenies with realistic parameter values.
Abstract: Phylogenetic trees show a remarkable slowdown in the increase of number of lineages towards the present, a phenomenon which cannot be explained by the standard birth-death model of diversification with constant speciation and extinction rates. The birth-death model instead predicts a constant or accelerating increase in the number of lineages, which has been called the pull of the present. The observed slowdown has been attributed to nonconstancy of the speciation and extinction rates due to some form of diversity dependence (i.e., species-level density dependence), but the mechanisms underlying this are still unclear. Here, we propose an alternative explanation based on the simple concept that speciation takes time to complete. We show that this idea of "protracted" speciation can be incorporated in the standard birth-death model of diversification. The protracted birth-death model predicts a realistic slowdown in the rate of increase of number of lineages in the phylogeny and provides a compelling fit to four bird phylogenies with realistic parameter values. Thus, the effect of recognizing the generally accepted fact that speciation is not an instantaneous event is significant; even if it cannot account for all the observed patterns, it certainly contributes substantially and should therefore be incorporated into future studies.

Journal ArticleDOI
TL;DR: It is shown how bias in estimated evolutionary regressions can arise from several sources, including phylogenetic inertia and either observational or biological error in the predictor variables, and how all these biases can be estimated and corrected for in the presence of phylogenetic correlations.
Abstract: Regressions of biological variables across species are rarely perfect. Usually, there are residual deviations from the estimated model relationship, and such deviations commonly show a pattern of phylogenetic correlations indicating that they have biological causes. We discuss the origins and effects of phylogenetically correlated biological variation in regression studies. In particular, we discuss the interplay of biological deviations with deviations due to observational or measurement errors, which are also important in comparative studies based on estimated species means. We show how bias in estimated evolutionary regressions can arise from several sources, including phylogenetic inertia and either observational or biological error in the predictor variables. We show how all these biases can be estimated and corrected for in the presence of phylogenetic correlations. We present general formulas for incorporating measurement error in linear models with correlated data. We also show how alternative regression models, such as major axis and reduced major axis regression, which are often recommended when there is error in predictor variables, are strongly biased when there is biological variation in any part of the model. We argue that such methods should never be used to estimate evolutionary or allometric regression slopes.
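
For intuition about the bias caused by error in a predictor, the textbook case without phylogenetic correlation can be written out. This is the classical attenuation result for independent errors, not the paper's general formulas; the symbols $x^{*}$, $u$, $\sigma_x^2$, and $\sigma_u^2$ are notation introduced here for illustration.

```latex
% Assumed setup: true model y = \alpha + \beta x + \varepsilon,
% observed predictor x^{*} = x + u, with u independent of x and \varepsilon,
% \operatorname{Var}(x) = \sigma_x^2 and \operatorname{Var}(u) = \sigma_u^2.
\[
  \hat{\beta}_{\mathrm{OLS}}
  \;\xrightarrow{\;p\;}\;
  \frac{\operatorname{Cov}(x^{*}, y)}{\operatorname{Var}(x^{*})}
  = \beta \, \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}
\]
```

Because the ratio is below one whenever $\sigma_u^2 > 0$, the estimated slope is pulled toward zero; dividing the estimate by this ratio corrects it when the error variance is known.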

Journal ArticleDOI
TL;DR: A new biogeographical model for late Mesozoic terrestrial ecosystems is proposed in which Europe and "Gondwanan" territories possessed a common Eurogondwanan fauna during the earliest Cretaceous, and tree reconciliation analyses (TRAs) based on biogeographical signals provided by a supertree of late Mesozoic archosaurs were performed to test it.
Abstract: Late Mesozoic palaeobiogeography has been characterized by a distinction between the northern territories of Laurasia and the southern landmasses of Gondwana. The repeated discovery of Gondwanan lineages in Laurasia has led to the proposal of alternative scenarios to explain these anomalous occurrences. A new biogeographical model for late Mesozoic terrestrial ecosystems is here proposed in which Europe and "Gondwanan" territories possessed a common Eurogondwanan fauna during the earliest Cretaceous. Subsequently, following the Hauterivian, the European territories severed from Africa and then connected to Asiamerica resulting in a faunal interchange. This model explains the presence of Gondwanan taxa in Laurasia and the absence of Laurasian forms in the southern territories during the Cretaceous. In order to test this new palaeobiogeographical model, tree reconciliation analyses (TRAs) were performed based on biogeographical signals provided by a supertree of late Mesozoic archosaurs. The TRAs found significant evidence for the presence of an earliest Cretaceous Eurogondwanan fauna followed by a relatively short-term Gondwana-Laurasia dichotomy. The analysis recovered evidence for a biogeographical reconnection of the European territories with Africa and South America-Antarctica during the Campanian to Maastrichtian time-slice. This biogeographical scenario appears to continue through the early Tertiary and sheds light on the trans-Atlantic disjunct distributions of several extant plant and animal groups. (Archosauria; Atlantogea; Cretaceous; Eurogondwana; palaeobiogeography; Tertiary.)


Journal ArticleDOI
TL;DR: A Monte Carlo approach to estimating the power to resolve internodes, together with a nearly equivalent but faster deterministic calculation, is developed and implemented; the predicted power of resolution for the loci analyzed is consistent with the historic use of these genes in phylogenetics.
Abstract: A principal objective for phylogenetic experimental design is to predict the power of a data set to resolve nodes in a phylogenetic tree. However, proactively assessing the potential for phylogenetic noise compared with signal in a candidate data set has been a formidable challenge. Understanding the impact of collection of additional sequence data to resolve recalcitrant internodes at diverse historical times will facilitate increasingly accurate and cost-effective phylogenetic research. Here, we derive theory based on the fundamental unit of the phylogenetic tree, the quartet, that applies estimates of the state space and the rates of evolution of characters in a data set to predict phylogenetic signal and phylogenetic noise and therefore to predict the power to resolve internodes. We develop and implement a Monte Carlo approach to estimating power to resolve as well as deriving a nearly equivalent faster deterministic calculation. These approaches are applied to describe the distribution of potential signal, polytomy, or noise for two example data sets, one recent (cytochrome c oxidase I and 28S ribosomal rRNA sequences from Diplazontinae parasitoid wasps) and one deep (eight nuclear genes and a phylogenomic sequence for diverse microbial eukaryotes including Stramenopiles, Alveolata, and Rhizaria). The predicted power of resolution for the loci analyzed is consistent with the historic use of the genes in phylogenetics.

Journal ArticleDOI
TL;DR: In this article, a statistical model with two original advantages is proposed: it incorporates information about the spatial distribution of the samples, with the aim of increasing inference power and relating observed patterns more explicitly to geography, and it allows one to analyze genetic and phenotypic data within a unified model and inference framework, thus opening the way to robust comparisons between markers and possibly combined analyses.
Abstract: Recognition of evolutionary units (species, populations) requires integrating several kinds of data, such as genetic or phenotypic markers or spatial information in order to get a comprehensive view concerning the differentiation of the units. We propose a statistical model with a double original advantage: (i) it incorporates information about the spatial distribution of the samples, with the aim to increase inference power and to relate more explicitly observed patterns to geography and (ii) it allows one to analyze genetic and phenotypic data within a unified model and inference framework, thus opening the way to robust comparisons between markers and possibly combined analyses. We show from simulated data as well as real data that our method estimates parameters accurately and is an improvement over alternative approaches in many situations. The power of this method is exemplified using an intricate case of inter- and intraspecies differentiation based on an original data set of georeferenced genetic and morphometric markers obtained on Myodes voles from Sweden. A computer program is made available as an extension of the R package Geneland. (Bayesian model; biogeography; clustering; Markov chain Monte Carlo; molecular markers; morphometrics; Myodes; R package; spatial data.)

Journal ArticleDOI
TL;DR: These findings support fragmentation of moist tropical forest in the eastern Guiana Shield (GS) during a period when the refuge hypothesis would have the region serving as a contiguous wet-forest refuge; substructure observed throughout the GS suggests further Quaternary fragmentation and a role for rivers.
Abstract: The Guiana Shield (GS) is one of the most pristine regions of Amazonia and biologically one of the richest areas on Earth. How and when this massive diversity arose remains the subject of considerable debate. The prevailing hypothesis of Quaternary glacial refugia suggests that a part of the eastern GS, among other areas in Amazonia, served as stable forested refugia during periods of aridity. However, the recently proposed disturbance-vicariance hypothesis proposes that fluctuations in temperature on orbital timescales, with some associated aridity, have driven Neotropical diversification. The expectations of the temporal and spatial organization of biodiversity differ between these two hypotheses. Here, we compare the genetic structure of 12 leaf-litter inhabiting frog species from the GS lowlands using a combination of mitochondrial and nuclear sequences in an integrative analytical approach that includes phylogenetic reconstructions, molecular dating, and Geographic Information System methods. This comparative and integrated approach overcomes the well-known limitations of phylogeographic inference based on single species and single loci. All of the focal species exhibit distinct phylogeographic patterns highlighting taxon-specific historical distributions, ecological tolerances to climatic disturbance, and dispersal abilities. Nevertheless, all but one species exhibit a history of fragmentation/isolation within the eastern GS during the Quaternary with spatial and temporal concordance among species. The signature of isolation in northern French Guiana (FG) during the early Pleistocene is particularly clear. Approximate Bayesian Computation supports the synchrony of the divergence between northern FG and other GS lineages. Substructure observed throughout the GS suggests further Quaternary fragmentation and a role for rivers. 
Our findings support fragmentation of moist tropical forest in the eastern GS during this period when the refuge hypothesis would have the region serving as a contiguous wet-forest refuge.

Journal ArticleDOI
TL;DR: Among the significant phylogenetic results is the near-complete support along the eupolypod II backbone, the demonstrated paraphyly of Woodsiaceae as currently circumscribed, and the well-supported placement of the enigmatic genera Homalosorus, Diplaziopsis, and Woodsia.
Abstract: Backbone relationships within the large eupolypod II clade, which includes nearly a third of extant fern species, have resisted elucidation by both molecular and morphological data. Earlier studies suggest that much of the phylogenetic intractability of this group is due to three factors: (i) a long root that reduces apparent levels of support in the ingroup; (ii) long ingroup branches subtended by a series of very short backbone internodes (the "ancient rapid radiation" model); and (iii) significantly heterogeneous lineage-specific rates of substitution. To resolve the eupolypod II phylogeny, with a particular emphasis on the backbone internodes, we assembled a data set of five plastid loci (atpA, atpB, matK, rbcL, and trnG-R) from a sample of 81 accessions selected to capture the deepest divergences in the clade. We then evaluated our phylogenetic hypothesis against potential confounding factors, including those induced by rooting, ancient rapid radiation, rate heterogeneity, and the Bayesian star-tree paradox artifact. While the strong support we inferred for the backbone relationships proved robust to these potential problems, their investigation revealed unexpected model-mediated impacts of outgroup composition, divergent effects of methods for countering the star-tree paradox artifact, and gave no support to concerns about the applicability of the unrooted model to data sets with heterogeneous lineage-specific rates of substitution. This study is among few to investigate these factors with empirical data, and the first to compare the performance of the two primary methods for overcoming the Bayesian star-tree paradox artifact. Among the significant phylogenetic results is the near-complete support along the eupolypod II backbone, the demonstrated paraphyly of Woodsiaceae as currently circumscribed, and the well-supported placement of the enigmatic genera Homalosorus, Diplaziopsis, and Woodsia. 
(Moderate data; outgroup rooting; Phycas; phylogeny evaluation; rate heterogeneity; reduced consensus; star-tree paradox; Woodsiaceae.)

Journal ArticleDOI
TL;DR: It is remarkable that mtDNA introgression in hares is frequent, extensive, and always from the same donor arctic species, and possible explanations for the phenomenon are discussed in relation to the dynamics of range expansions and species replacements during the climatic oscillations of the Pleistocene.
Abstract: Understanding recent speciation history requires merging phylogenetic and population genetics approaches, taking into account the persistence of ancestral polymorphism and possible introgression. The emergence of a clear phylogeny of hares (genus Lepus) has been hampered by poor genomic sampling and possible occurrence of mitochondrial DNA (mtDNA) introgression from the arctic/boreal Lepus timidus into several European temperate and possibly American boreal species. However, no formal test of introgression, taking also incomplete lineage sorting into account, has been done. Here, to clarify the yet poorly resolved species phylogeny of hares and test hypotheses of mtDNA introgression, we sequenced 14 nuclear DNA and 2 mtDNA fragments (8205 and 1113 bp, respectively) in 50 specimens from 11 hare species from Eurasia, North America, and Africa. By applying an isolation-with-migration model to the nuclear data on subsets of species, we find evidence for very limited gene flow from L. timidus into most temperate European species, and not into the American boreal ones. Using a multilocus coalescent-based method, we infer the species phylogeny, which we find highly incongruent with mtDNA phylogeny using parametric bootstrap. Simulations of mtDNA evolution under the speciation history inferred from nuclear genes did not support the hypothesis of mtDNA introgression from L. timidus into the American L. townsendii but did suggest introgression from L. timidus into 4 temperate European species. One such event likely resulted in the complete replacement of the aboriginal mtDNA of L. castroviejoi and of its sister species L. corsicanus. It is remarkable that mtDNA introgression in hares is frequent, extensive, and always from the same donor arctic species. We discuss possible explanations for the phenomenon in relation to the dynamics of range expansions and species replacements during the climatic oscillations of the Pleistocene. 
(Coalescent simulations; discordant phylogenies; introgression; Lepus; rapid radiation; species-tree inference.)

Journal ArticleDOI
TL;DR: The most parsimonious species network and the fossil-based calibration of the homoeolog tree favored monophyly of the high polyploids, which has resulted from allodecaploidization 9–14 Ma, involving sympatric ancestors from the extant Viola sections Chamaemelanium, Plagiostigma, and Viola.
Abstract: The phylogenies of allopolyploids take the shape of networks and cannot be adequately represented as bifurcating trees. Especially for high polyploids (i.e., organisms with more than six sets of nuclear chromosomes), the signatures of gene homoeolog loss, deep coalescence, and polyploidy may become confounded, with the result that gene trees may be congruent with more than one species network. Herein, we obtained the most parsimonious species network by objective comparison of competing scenarios involving polyploidization and homoeolog loss in a high-polyploid lineage of violets (Viola, Violaceae) mostly or entirely restricted to North America, Central America, or Hawaii. We amplified homoeologs of the low-copy nuclear gene, glucose-6-phosphate isomerase (GPI), by single-molecule polymerase chain reaction (PCR) and the chloroplast trnL-F region by conventional PCR for 51 species and subspecies. Topological incongruence among GPI homoeolog subclades, owing to deep coalescence and two instances of putative loss (or lack of detection) of homoeologs, was reconciled by applying the maximum tree topology for each subclade. The most parsimonious species network and the fossil-based calibration of the homoeolog tree favored monophyly of the high polyploids, which has resulted from allodecaploidization 9-14 Ma, involving sympatric ancestors from the extant Viola sections Chamaemelanium (diploid), Plagiostigma (paleotetraploid), and Viola (paleotetraploid). Although two of the high-polyploid lineages (Boreali-Americanae, Pedatae) remained decaploid, recurrent polyploidization with tetraploids of section Plagiostigma within the last 5 Ma has resulted in two 14-ploid lineages (Mexicanae, Nosphinium) and one 18-ploid lineage (Langsdorffianae). 
This implies a more complex phylogenetic and biogeographic origin of the Hawaiian violets (Nosphinium) than that previously inferred from rDNA data and illustrates the necessity of considering polyploidy in phylogenetic and biogeographic reconstruction.

Journal ArticleDOI
TL;DR: Although phylogenetic analysis required extensive adjustment of program settings, it ultimately produced a well-resolved phylogeny for the Timaliidae, which provided strong support for major subclades within the family but extensive paraphyly of genera.
Abstract: The avian family Timaliidae is a species rich and morphologically diverse component of African and Asian tropical forests. The morphological diversity within the family has attracted interest from ecologists and evolutionary biologists, but systematists have long suspected that this diversity might also mislead taxonomy, and recent molecular phylogenetic work has supported this hypothesis. We produced and analyzed a data set of 6 genes and almost 300 individuals to assess the evolutionary history of the family. Although phylogenetic analysis required extensive adjustment of program settings, we ultimately produced a well-resolved phylogeny for the family. The resulting phylogeny provided strong support for major subclades within the family but extensive paraphyly of genera. Only 3 genera represented by more than 3 species were monophyletic. Biogeographic reconstruction indicated a mainland Asian origin for the family and most major clades. Colonization of Africa, Sundaland, and the Philippines occurred relatively late in the family's history and was mostly unidirectional. Several putative babbler genera, such as Robsonius, Malia, Leonardina, and Micromacronus are only distantly related to the Timaliidae. (Babbler; biogeography; convergence; parameter interaction; Timaliidae.) The Timaliidae, generally known as the babblers, is a diverse family of oscine passerine birds that traditionally includes about 275 species in 50 genera (Dickinson 2003). These Old World insectivores are strikingly diverse, both in species richness and breadth of morphological and behavioral adaptations. Babblers are highly social forest birds that often are found in mixed-species flocks. 
Their diversity of forms and behaviors, which has led to comparisons with Neotropical antbirds (Thamnophilidae) and antthrushes (Formicariidae) in their ecological diversity (Collar 2003), is reflected in the English names of some babbler genera: wren-babblers, jungle-babblers, tit-babblers, thrush-babblers, parrotbills, scimitar-babblers, etc. Babblers are a major component of the tropical Asian avifauna and a model system to study the biogeography of SE Asia. This species-rich family reaches its highest diversity in SE Asia and is almost entirely restricted to the Old World (one species occurs in North America). Babblers are a significant part of the forest community in Asia, with a dozen or more species co-occurring in most areas. This high level of sympatry suggests that they are ideal for assessing general diversification patterns and testing biogeographic congruence among multiple codistributed groups. Most species of babblers are restricted to the interior of tropical forests, have relatively limited distributions, and are not migratory. These at-

Journal ArticleDOI
Xuming Zhou, Shixia Xu, Junxiao Xu, Bingyao Chen, Kaiya Zhou, Guang Yang
TL;DR: Pegasoferae (Perissodactyla + Carnivora + Pholidota + Chiroptera) does not appear to be a natural group, and divergence time estimates were similar to those of several studies and suggest that the divergences among these orders occurred within just a few million years.
Abstract: Although great progress has been made in resolving the relationships of placental mammals, the position of several clades in Laurasiatheria remains controversial. In this study, we performed a phylogenetic analysis of 97 orthologs (46,152 bp) for 15 taxa, representing all laurasiatherian orders. Additionally, phylogenetic trees of laurasiatherian mammals with draft genome sequences were reconstructed based on 1608 exons (2,175,102 bp). Our reconstructions resolve the interordinal relationships within Laurasiatheria and corroborate the clades Scrotifera, Fereuungulata, and Cetartiodactyla. Furthermore, we tested alternative topologies within Laurasiatheria, and among alternatives for the phylogenetic position of Perissodactyla, a sister-group relationship with Cetartiodactyla receives the highest support. Thus, Pegasoferae (Perissodactyla + Carnivora + Pholidota + Chiroptera) does not appear to be a natural group. Divergence time estimates from these genes were compared with published estimates for splits within Laurasiatheria. Our estimates were similar to those of several studies and suggest that the divergences among these orders occurred within just a few million years.

Journal ArticleDOI
TL;DR: It is proved that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the generalized least squares (GLS) regression under a Brownian motion model of evolution.
Abstract: We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLSs) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether or not and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.
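
The equivalence stated in this abstract can be checked numerically on a small example. The sketch below assumes a hypothetical four-taxon tree, made-up trait values, and illustrative function names; it computes standardized contrasts with the usual pruning recursion, regresses them through the origin, and compares the slope with a GLS fit that uses the Brownian motion covariance matrix (shared root-to-ancestor path lengths).

```python
import numpy as np

# Hypothetical tree ((A:1,B:2):1,(C:0.5,D:1.5):2).
# A tip is (name, branch_length); an internal node is (left, right, branch_length).
TREE = ((("A", 1.0), ("B", 2.0), 1.0), (("C", 0.5), ("D", 1.5), 2.0), 0.0)
TIPS = ["A", "B", "C", "D"]

# Brownian motion covariance: shared path length from the root for each tip pair.
V = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 2.5, 2.0],
    [0.0, 0.0, 2.0, 3.5],
])

# Made-up trait values for illustration only.
X_TRAIT = {"A": 1.0, "B": 2.5, "C": 0.3, "D": 4.0}
Y_TRAIT = {"A": 0.7, "B": 3.1, "C": 1.2, "D": 2.9}


def contrasts(node, trait, out):
    """Append standardized contrasts to `out`; return (node value, adjusted branch length)."""
    if len(node) == 2:  # tip
        name, length = node
        return trait[name], length
    left, right, length = node
    x_left, v_left = contrasts(left, trait, out)
    x_right, v_right = contrasts(right, trait, out)
    out.append((x_left - x_right) / np.sqrt(v_left + v_right))
    # Weighted ancestral value and the extra variance added to the parent branch.
    value = (x_left / v_left + x_right / v_right) / (1.0 / v_left + 1.0 / v_right)
    return value, length + v_left * v_right / (v_left + v_right)


def pic_slope(x, y):
    """OLS slope through the origin of the y contrasts on the x contrasts."""
    cx, cy = [], []
    contrasts(TREE, x, cx)
    contrasts(TREE, y, cy)
    cx, cy = np.array(cx), np.array(cy)
    return float(cx @ cy / (cx @ cx))


def gls_slope(x, y):
    """GLS slope (with intercept) using the Brownian motion covariance V."""
    design = np.column_stack([np.ones(len(TIPS)), [x[t] for t in TIPS]])
    response = np.array([y[t] for t in TIPS])
    v_inv = np.linalg.inv(V)
    beta = np.linalg.solve(design.T @ v_inv @ design, design.T @ v_inv @ response)
    return float(beta[1])


print(pic_slope(X_TRAIT, Y_TRAIT), gls_slope(X_TRAIT, Y_TRAIT))
```

Both contrast vectors are built by the same traversal of the same tree, so their signs and order match; using different branch lengths for the two variables would break the equivalence, as point 4 of the abstract notes.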