Author
Claudia C. Weber
Other affiliations: Uppsala University, University of Bath, Temple University
Bio: Claudia C. Weber is an academic researcher from the European Bioinformatics Institute. The author has contributed to research on topics including fixation (population genetics) and molecular evolution. The author has an h-index of 13 and has co-authored 19 publications receiving 1,912 citations. Previous affiliations of Claudia C. Weber include Uppsala University and the University of Bath.
Papers
••
Duke University; University of Texas at Austin; Heidelberg Institute for Theoretical Studies; American Museum of Natural History; Xi'an Jiaotong University; Beijing Genomics Institute; New Mexico State University; University of Sydney; University of California; Uppsala University; University of Copenhagen; Okinawa Institute of Science and Technology; University of Georgia; Griffith University; Catalan Institution for Research and Advanced Studies; Joint Institute for Nuclear Research; Oak Ridge National Laboratory; Aarhus University; Washington University in St. Louis; University of California, Santa Cruz; Cardiff University; Kunming Institute of Zoology; China Agricultural University; Tulane University; Louisiana State University; Copenhagen Zoo; Oregon Health & Science University; Federal University of Pará; Technical University of Denmark; Canterbury Museum; Curtin University; Novosibirsk State University; Smithsonian Institution; National University of Singapore; National Museum of Natural History; Nova Southeastern University; Occidental College; University of Edinburgh; Harvard University; University of California, San Francisco; University of Florida; University of Illinois at Urbana–Champaign
TL;DR: A genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves recovered a highly resolved tree that confirms previously controversial sister or close relationships and identifies the first divergence in Neoaves, two groups the authors named Passerea and Columbea.
Abstract: To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago.
1,624 citations
••
Howard Hughes Medical Institute; University of Texas at Austin; Heidelberg Institute for Theoretical Studies; Xi'an Jiaotong University; University of Copenhagen; New Mexico State University; University of Sydney; Louisiana State University; University of California, Los Angeles; University of Montpellier; Uppsala University; Okinawa Institute of Science and Technology; University of Georgia; University of Edinburgh; Harvard University; Karlsruhe Institute of Technology; University of California, San Francisco; American Museum of Natural History; University of Florida; Curtin University
TL;DR: The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date and the sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.
Abstract: Background: Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. Findings: Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. Conclusions: The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.
84 citations
••
TL;DR: It is found that lineages with large populations and short generations exhibit higher GC content, and the effect extends to both coding and non-coding sites, indicating that it is not due to selection on codon usage.
Abstract: While effective population size (Ne) and life history traits such as generation time are known to impact substitution rates, their potential effects on base composition evolution are less well understood. GC content increases with decreasing body mass in mammals, consistent with recombination-associated GC biased gene conversion (gBGC) more strongly impacting these lineages. However, shifts in chromosomal architecture and recombination landscapes between species may complicate the interpretation of these results. In birds, interchromosomal rearrangements are rare and the recombination landscape is conserved, suggesting that this group is well suited to assess the impact of life history on base composition. Employing data from 45 newly and 3 previously sequenced avian genomes covering a broad range of taxa, we found that lineages with large populations and short generations exhibit higher GC content. The effect extends to both coding and non-coding sites, indicating that it is not due to selection on codon usage. Consistent with recombination driving base composition, GC content and heterogeneity were positively correlated with the rate of recombination. Moreover, we observed ongoing increases in GC in the majority of lineages. Our results provide evidence that gBGC may drive patterns of nucleotide composition in avian genomes and are consistent with more effective gBGC in large populations and a greater number of meioses per unit time; that is, a shorter generation time. Thus, in accord with theoretical predictions, base composition evolution is substantially modulated by species life history.
82 citations
••
TL;DR: The relationship between demography and genomic base composition is in agreement with the gBGC hypothesis: organisms with larger populations have higher GC content than those with smaller populations.
Abstract: The origin and evolutionary dynamics of the spatial heterogeneity in genomic base composition have been debated since its discovery in the 1970s. With the recent availability of numerous genome sequences from a wide range of species it has been possible to address this question from a comparative perspective, and similarities and differences in base composition between groups of organisms are becoming evident. Ample evidence suggests that the contrasting dynamics of base composition are driven by GC-biased gene conversion (gBGC), a process that is associated with meiotic recombination. In line with this hypothesis, base composition is associated with the rate of recombination, and the evolutionary dynamics of the recombination landscape therefore govern base composition. In addition, and at first sight perhaps surprisingly, the relationship between demography and genomic base composition is in agreement with the gBGC hypothesis: organisms with larger populations have higher GC content than those with smaller populations.
70 citations
••
TL;DR: A negative correlation between dN/dS and body mass is found, contrary to nearly neutral expectation, and the ratio of radical to conservative amino acid substitutions (Kr/Kc) correlates positively with body mass.
Abstract: Background: The ratio of the rates of non-synonymous and synonymous substitution (dN/dS) is commonly used to estimate selection in coding sequences. It is often suggested that, all else being equal, dN/dS should be lower in populations with large effective size (Ne) due to increased efficacy of purifying selection. As Ne is difficult to measure directly, life history traits such as body mass, which is typically negatively associated with population size, have commonly been used as proxies in empirical tests of this hypothesis. However, evidence of whether the expected positive correlation between body mass and dN/dS is consistently observed is conflicting. Results: Employing whole genome sequence data from 48 avian species, we assess the relationship between rates of molecular evolution and life history in birds. We find a negative correlation between dN/dS and body mass, contrary to nearly neutral expectation. This raises the question whether the correlation might be a method artefact. We therefore in turn consider non-stationary base composition, divergence time and saturation as possible explanations, but find no clear patterns. However, in striking contrast to dN/dS, the ratio of radical to conservative amino acid substitutions (Kr/Kc) correlates positively with body mass. Conclusions: Our results in principle accord with the notion that non-synonymous substitutions causing radical amino acid changes are more efficiently removed by selection in large populations, consistent with nearly neutral theory. These findings have implications for the use of dN/dS and suggest that caution is warranted when drawing conclusions about lineage-specific modes of protein evolution using this metric.
59 citations
Cited by
••
TL;DR: The approach to utilizing available RNA-Seq and other data types in the authors' manual curation process for vertebrate, plant, and other species is summarized, and a new direction for prokaryotic genomes and protein name management is described.
Abstract: The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
4,104 citations
••
TL;DR: PartitionFinder 2 is a program for automatically selecting best-fit partitioning schemes and models of evolution for phylogenetic analyses that includes the ability to analyze morphological datasets, new methods to analyze genome-scale datasets, and new output formats to facilitate interoperability with downstream software.
Abstract: PartitionFinder 2 is a program for automatically selecting best-fit partitioning schemes and models of evolution for phylogenetic analyses. PartitionFinder 2 is substantially faster and more efficient than version 1, and incorporates many new methods and features. These include the ability to analyze morphological datasets, new methods to analyze genome-scale datasets, new output formats to facilitate interoperability with downstream software, and many new models of molecular evolution. PartitionFinder 2 is freely available under an open source license and works on Windows, OSX, and Linux operating systems. It can be downloaded from www.robertlanfear.com/partitionfinder. The source code is available at https://github.com/brettc/partitionfinder.
3,445 citations
••
Duke University; University of Texas at Austin; Heidelberg Institute for Theoretical Studies; Xi'an Jiaotong University; American Museum of Natural History; Beijing Genomics Institute; New Mexico State University; University of Sydney; University of California; Uppsala University; University of Copenhagen; Okinawa Institute of Science and Technology; University of Georgia; Griffith University; Catalan Institution for Research and Advanced Studies; Oak Ridge National Laboratory; Joint Institute for Nuclear Research; Aarhus University; Washington University in St. Louis; University of California, Santa Cruz; Cardiff University; Kunming Institute of Zoology; China Agricultural University; Tulane University; Louisiana State University; Copenhagen Zoo; Oregon Health & Science University; Federal University of Pará; Technical University of Denmark; Canterbury Museum; Curtin University; Novosibirsk State University; Smithsonian Institution; National University of Singapore; National Museum of Natural History; Nova Southeastern University; Occidental College; University of Edinburgh; Harvard University; University of California, San Francisco; University of Florida; University of Illinois at Urbana–Champaign
TL;DR: A genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves recovered a highly resolved tree that confirms previously controversial sister or close relationships and identifies the first divergence in Neoaves, two groups the authors named Passerea and Columbea.
Abstract: To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago.
1,624 citations
••
TL;DR: This work presents BUSCO v3 with example analyses that highlight the wide‐ranging utility of BUSCO assessments, which extend beyond quality control of genomics data sets to applications in comparative genomics analyses, gene predictor training, metagenomics, and phylogenomics.
Abstract: Genomics promises comprehensive surveying of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluation of completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying completeness of genomic data sets in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). The latest software release implements a complete refactoring of the code to make it more flexible and extendable to facilitate high-throughput assessments. The original six lineage assessment data sets have been updated with improved species sampling, 34 new subsets have been built for vertebrates, arthropods, fungi, and prokaryotes that greatly enhance resolution, and data sets are now also available for nematodes, protists, and plants. Here, we present BUSCO v3 with example analyses that highlight the wide-ranging utility of BUSCO assessments, which extend beyond quality control of genomics data sets to applications in comparative genomics analyses, gene predictor training, metagenomics, and phylogenomics.
1,575 citations
••
TL;DR: The Environment for Tree Exploration v3 is presented, featuring numerous improvements in the underlying library of methods, and providing a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics.
Abstract: The Environment for Tree Exploration (ETE) is a computational framework that simplifies the reconstruction, analysis, and visualization of phylogenetic trees and multiple sequence alignments. Here, we present ETE v3, featuring numerous improvements in the underlying library of methods, and providing a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics. The new features include (i) building gene-based and supermatrix-based phylogenies using a single command, (ii) testing and visualizing evolutionary models, (iii) calculating distances between trees of different size or including duplications, and (iv) providing seamless integration with the NCBI taxonomy database. ETE is freely available at http://etetoolkit.org.
1,452 citations