Journal ArticleDOI

A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System

08 Jul 2013-PLOS ONE (Public Library of Science)-Vol. 8, Iss: 7, pp 1-16
TL;DR: A persistent, species-level taxonomic registry for the animal kingdom is developed based on the analysis of patterns of nucleotide variation in the barcode region of the cytochrome c oxidase I (COI) gene.
Abstract: Because many animal species are undescribed, and because the identification of known species is often difficult, interim taxonomic nomenclature has often been used in biodiversity analysis. By assigning individuals to presumptive species, called operational taxonomic units (OTUs), these systems speed investigations into the patterning of biodiversity and enable studies that would otherwise be impossible. Although OTUs have conventionally been separated through their morphological divergence, DNA-based delineations are not only feasible, but have important advantages. OTU designation can be automated, data can be readily archived, and results can be easily compared among investigations. This study exploits these attributes to develop a persistent, species-level taxonomic registry for the animal kingdom based on the analysis of patterns of nucleotide variation in the barcode region of the cytochrome c oxidase I (COI) gene. It begins by examining the correspondence between groups of specimens identified to a species through prior taxonomic work and those inferred from the analysis of COI sequence variation using one new (RESL) and four established (ABGD, CROP, GMYC, jMOTU) algorithms. It subsequently describes the implementation, and structural attributes of the Barcode Index Number (BIN) system. Aside from a pragmatic role in biodiversity assessments, BINs will aid revisionary taxonomy by flagging possible cases of synonymy, and by collating geographical information, descriptive metadata, and images for specimens that are likely to belong to the same species, even if it is undescribed. More than 274,000 BIN web pages are now available, creating a biodiversity resource that is positioned for rapid growth.


Citations
Journal ArticleDOI
30 May 2014-Science
TL;DR: This review covers the biodiversity of eukaryote species and their extinction rates, distributions, and protection, and considers what future rates of species extinction will be, how well protected areas will slow extinction rates, and how the remaining gaps in knowledge might be filled.
Abstract:
Background: A principal function of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) is to “perform regular and timely assessments of knowledge on biodiversity.” In December 2013, its second plenary session approved a program to begin a global assessment in 2015. The Convention on Biological Diversity (CBD) and five other biodiversity-related conventions have adopted IPBES as their science-policy interface, so these assessments will be important in evaluating progress toward the CBD’s Aichi Targets of the Strategic Plan for Biodiversity 2011–2020. As a contribution toward such assessment, we review the biodiversity of eukaryote species and their extinction rates, distributions, and protection. We document what we know, how it likely differs from what we do not, and how these differences affect biodiversity statistics. Interestingly, several targets explicitly mention “known species”—a strong, if implicit, statement of incomplete knowledge. We start by asking how many species are known and how many remain undescribed. We then consider by how much human actions inflate extinction rates. Much depends on where species are, because different biomes contain different numbers of species of different susceptibilities. Biomes also suffer different levels of damage and have unequal levels of protection. How extinction rates will change depends on how and where threats expand and whether greater protection counters them.
Figure caption: Different visualizations of species biodiversity. (A) The distributions of 9927 bird species. (B) The 4964 species with smaller than the median geographical range size. (C) The 1308 species assessed as threatened with a high risk of extinction by BirdLife International for the Red List of Threatened Species of the International Union for Conservation of Nature. (D) The 1080 threatened species with less than the median range size. (D) provides a strong geographical focus on where local conservation actions can have the greatest global impact. Additional biodiversity maps are available at www.biodiversitymapping.org.
Advances: Recent studies have clarified where the most vulnerable species live, where and how humanity changes the planet, and how this drives extinctions. These data are increasingly accessible, bringing greater transparency to science and governance. Taxonomic catalogs of plants, terrestrial vertebrates, freshwater fish, and some marine taxa are sufficient to assess their status and the limitations of our knowledge. Most species are undescribed, however. The species we know best have large geographical ranges and are often common within them. Most known species have small ranges, however, and such species are typically newer discoveries. The numbers of known species with very small ranges are increasing quickly, even in well-known taxa. They are geographically concentrated and are disproportionately likely to be threatened or already extinct. We expect unknown species to share these characteristics. Current rates of extinction are about 1000 times the background rate of extinction. These are higher than previously estimated and likely still underestimated. Future rates will depend on many factors and are poised to increase. Finally, although there has been rapid progress in developing protected areas, such efforts are not ecologically representative, nor do they optimally protect biodiversity.
Outlook: Progress on assessing biodiversity will emerge from continued expansion of the many recently created online databases, combining them with new global data sources on changing land and ocean use and with increasingly crowdsourced data on species’ distributions. Examples of practical conservation that follow from using combined data in Colombia and Brazil can be found at www.savingspecies.org and www.youtube.com/watch?v=R3zjeJW2NVk.

2,360 citations


Cites background from "A DNA-Based Registry for All Animal..."

  • ...It raises the controversial idea that many species may become known by a number derived from barcoding and not—or not only—from conventional descriptions (123)....

    [...]

Journal ArticleDOI
TL;DR: Clumpak, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology by automating the postprocessing of results of model‐based population structure analyses.
Abstract: The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modelling assumptions, compares results across different predetermined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present CLUMPAK (Cluster Markov Packager Across K), a method that automates the postprocessing of results of model-based population structure analyses. For analysing multiple independent runs at a single K value, CLUMPAK identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software CLUMPP. Next, CLUMPAK identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in CLUMPP and simplifying the comparison of clustering results across different K values. CLUMPAK incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. CLUMPAK, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology.

2,252 citations


Cites background from "A DNA-Based Registry for All Animal..."

  • ...…identifying clusters, and it has previously been adapted for diverse biological problems, including orthology assignments (Enright et al. 2002), detection of operational taxonomic units (Ratnasingham & Hebert 2013), and identification of co-occurring associations among microbes (Faust et al. 2012)....

    [...]

Journal ArticleDOI
01 Jan 2020-Database
TL;DR: The National Center for Biotechnology Information (NCBI) Taxonomy includes organism names and classifications for every sequence in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration.
Abstract: The National Center for Biotechnology Information (NCBI) Taxonomy includes organism names and classifications for every sequence in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration. Since the last review of this resource in 2012, it has undergone several improvements. Most notable is the shift from a single SQL database to a series of linked databases tied to a framework of data called NameBank. This means that relations among data elements can be adjusted in more detail, resulting in expanded annotation of synonyms, the ability to flag names with specific nomenclatural properties, enhanced tracking of publications tied to names and improved annotation of scientific authorities and types. Additionally, practices utilized by NCBI Taxonomy curators specific to major taxonomic groups are described, terms peculiar to NCBI Taxonomy are explained, external resources are acknowledged and updates to tools and other resources are documented. Database URL: https://www.ncbi.nlm.nih.gov/taxonomy.

685 citations

Journal ArticleDOI
TL;DR: Wagner et al. note that about half of all amphibians are imperiled and that more than a million species are estimated to be in danger of extinction over the next few decades, while it remains unknown whether rates of decline for insects are on par with or exceed those for other groups.
Abstract: Nature is under siege. In the last 10,000 y the human population has grown from 1 million to 7.8 billion. Much of Earth’s arable lands are already in agriculture (1), millions of acres of tropical forest are cleared each year (2, 3), atmospheric CO2 levels are at their highest concentrations in more than 3 million y (4), and climates are erratically and steadily changing from pole to pole, triggering unprecedented droughts, fires, and floods across continents. Indeed, most biologists agree that the world has entered its sixth mass extinction event, the first since the end of the Cretaceous Period 66 million y ago, when more than 80% of all species, including the nonavian dinosaurs, perished. Ongoing losses have been clearly demonstrated for better-studied groups of organisms. Terrestrial vertebrate population sizes and ranges have contracted by one-third, and many mammals have experienced range declines of at least 80% over the last century (5). A 2019 assessment suggests that half of all amphibians are imperiled (2.5% of which have recently gone extinct) (6). Bird numbers across North America have fallen by 2.9 billion since 1970 (7). Prospects for the world’s coral reefs, beyond the middle of this century, could scarcely be more dire (8). A 2020 United Nations report estimated that more than a million species are in danger of extinction over the next few decades (9), but also see the more bridled assessments in refs. 10 and 11. Although a flurry of reports has drawn attention to declines in insect abundance, biomass, species richness, and range sizes (e.g., refs. 12–18; for reviews see refs. 19 and 20), whether the rates of declines for insects are on par with or exceed those for other groups remains unknown. There are still too …

609 citations

Journal ArticleDOI
TL;DR: The multi-rate PTP (mPTP) is introduced, an improved method that alleviates the theoretical and technical shortcomings of PTP and consistently yields more accurate delimitations with respect to the taxonomy (i.e., it identifies more taxonomic species and infers species numbers closer to the taxonomy).
Abstract: Motivation: In recent years, molecular species delimitation has become a routine approach for quantifying and classifying biodiversity. Barcoding methods are of particular importance in large-scale surveys as they promote fast species discovery and biodiversity estimates. Among those, distance-based methods are the most common choice as they scale well with large datasets; however, they are sensitive to similarity threshold parameters and they ignore evolutionary relationships. The recently introduced "Poisson Tree Processes" (PTP) method is a phylogeny-aware approach that does not rely on such thresholds. Yet, two weaknesses of PTP impact its accuracy and practicality when applied to large datasets; it does not account for divergent intraspecific variation and is slow for a large number of sequences. Results: We introduce the multi-rate PTP (mPTP), an improved method that alleviates the theoretical and technical shortcomings of PTP. It incorporates different levels of intraspecific genetic diversity deriving from differences in either the evolutionary history or sampling of each species. Results on empirical data suggest that mPTP is superior to PTP and popular distance-based methods as it, consistently yields more accurate delimitations with respect to the taxonomy (i.e., identifies more taxonomic species, infers species numbers closer to the taxonomy). Moreover, mPTP does not require any similarity threshold as input. The novel dynamic programming algorithm attains a speedup of at least five orders of magnitude compared to PTP, allowing it to delimit species in large (meta-) barcoding data. In addition, Markov Chain Monte Carlo sampling provides a comprehensive evaluation of the inferred delimitation in just a few seconds for millions of steps, independently of tree size. Availability and Implementation: mPTP is implemented in C and is available for download at http://github.com/Pas-Kapli/mptp under the GNU Affero 3 license. 
A web-service is available at http://mptp.h-its.org. Contact: paschalia.kapli@h-its.org or alexandros.stamatakis@h-its.org or tomas.flouri@h-its.org. Supplementary information: Supplementary data are available at Bioinformatics online.

535 citations


Cites background from "A DNA-Based Registry for All Animal..."

  • ...This and previous studies (Monaghan et al., 2009; Esselstyn et al., 2012; Ratnasingham and Hebert, 2013; Tang et al., 2014) suggest that single-locus barcoding methods provide meaningful clusters, close to taxonomically acknowledged species....

    [...]

References
Journal ArticleDOI
TL;DR: The neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods for reconstructing phylogenetic trees from evolutionary distance data.
Abstract: A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
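
The pair-selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it uses the Q-criterion form of the selection rule, Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), which is the commonly used reformulation of the minimum-total-branch-length principle. The function name `nj_pick_pair` and the example matrix are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def nj_pick_pair(dist: Sequence[Sequence[float]]) -> Tuple[Tuple[int, int], float]:
    """Return the pair of OTUs that neighbor-joining would merge first.

    `dist` is a symmetric distance matrix with a zero diagonal.
    The pair minimizing Q(i, j) is chosen (assumed Q-criterion form).
    """
    n = len(dist)
    if n < 3:
        raise ValueError("need at least three OTUs")
    # Sum of distances from each OTU to all others.
    row_sums: List[float] = [sum(row) for row in dist]
    best_pair = (0, 1)
    best_q = float("inf")
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        if q < best_q:
            best_pair, best_q = (i, j), q
    return best_pair, best_q


# Hypothetical five-OTU distance matrix, for illustration only.
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value = nj_pick_pair(example)
```

A full implementation would then replace the chosen pair with a new internal node, recompute distances to it, and repeat until the tree is resolved; only the first selection is shown here.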

57,055 citations


"A DNA-Based Registry for All Animal..." refers methods in this paper

  • ...Sequences: Sequences are represented in three ways: 1) as a histogram of distances generated from all pairwise comparisons within the BIN together with a representative of the nearest neighbouring BIN, 2) as a neighbor-joining tree [53], and 3) as a haplotype similarity network diagram....

    [...]

Journal ArticleDOI
TL;DR: mothur is used as a case study to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the α and β diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments.
Abstract: mothur aims to be a comprehensive software package that allows users to use a single piece of software to analyze community sequence data. It builds upon previous tools to provide a flexible and powerful software package for analyzing sequencing data. As a case study, we used mothur to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the alpha and beta diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments. This analysis of more than 222,000 sequences was completed in less than 2 h with a laptop computer.

17,350 citations


Additional excerpts

  • ...Algorithms based on single linkage clustering [20–23] have been widely used to quantify microbial diversity....

    [...]
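
As a rough illustration of the single linkage clustering mentioned in this excerpt, the sketch below groups sequences whenever a chain of pairwise distances at or below a threshold connects them. It is a generic union-find version written for this page, not the code of any cited tool; the function name and the threshold value are assumptions.

```python
from typing import Dict, List, Sequence


def single_linkage_clusters(
    dist: Sequence[Sequence[float]], threshold: float
) -> List[List[int]]:
    """Cluster items so that any two items joined by a chain of
    pairwise distances <= threshold end up in the same cluster."""
    n = len(dist)
    parent = list(range(n))

    def find(x: int) -> int:
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical divergences laid out on a line, for illustration only.
positions = [0.0, 0.01, 0.02, 0.10]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
clusters = single_linkage_clusters(toy_dist, threshold=0.015)
```

Items 0 and 2 are 0.02 apart, above the threshold, yet they share a cluster because item 1 links them. This chaining behaviour is characteristic of single linkage.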

Journal ArticleDOI
TL;DR: A new graphical display is proposed for partitioning techniques, where each cluster is represented by a so-called silhouette, which is based on the comparison of its tightness and separation, and provides an evaluation of clustering validity.

14,144 citations


"A DNA-Based Registry for All Animal..." refers methods in this paper

  • ...RESL defines the boundaries of each OTU selected for analysis by generating clusters using a range of values for the inflation parameter in MCL and then selects that which maximizes the Silhouette index [37]....

    [...]

  • ...The final step selects the optimal partitions for OTUs based on the Silhouette index, a cluster validation method that measures how tightly clusters are integrated [37]....

    [...]
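
The selection step in these excerpts, choosing among candidate partitions by the Silhouette index, can be sketched as below. This is a minimal illustration under stated assumptions: the candidate partitions are supplied by hand rather than produced by MCL at different inflation values, the function names are hypothetical, and the conventions for singleton clusters (s = 0) and single-cluster partitions (index 0.0) are choices made here, not details taken from RESL.

```python
from typing import Dict, List, Sequence


def silhouette_index(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette width s(i) = (b - a) / max(a, b) over all items."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return 0.0  # assumed convention: undefined for a single cluster
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton contributes s(i) = 0
        # a: mean distance to the other members of the item's own cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(labels)


def pick_best_partition(
    dist: Sequence[Sequence[float]], candidates: Sequence[Sequence[int]]
) -> Sequence[int]:
    """Return the candidate labelling with the highest silhouette index."""
    return max(candidates, key=lambda labels: silhouette_index(dist, labels))


# Hypothetical data: four items on a line, forming two tight pairs.
points = [0.0, 1.0, 10.0, 11.0]
toy_dist = [[abs(a - b) for b in points] for a in points]
candidates = [[0, 0, 0, 1], [0, 0, 1, 1], [0, 1, 2, 2]]
best = pick_best_partition(toy_dist, candidates)
```

In this toy case the two-pair partition scores highest because each item is close to its own cluster and far from the other.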

Journal ArticleDOI
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

14,075 citations


"A DNA-Based Registry for All Animal..." refers background in this paper

  • ...A profile Hidden Markov Model [43] of the COI protein [44] aligns the input sequences....

    [...]

  • ...If a sequence meets these quality requirements, it is then checked for reading frame shifts as indicated by stop codons or improbable peptides given the COI profile [44]....

    [...]
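
The stop-codon part of the quality check in the second excerpt can be sketched as follows. This is an illustration only, not the BOLD pipeline: it assumes the reading frame is already known, and it assumes the invertebrate mitochondrial genetic code (NCBI translation table 5), in which TAA and TAG are stop codons while TGA encodes tryptophan. Other animal groups use other tables, so the stop set is a parameter. The profile-based check for improbable peptides is not covered.

```python
from typing import FrozenSet, List

# Assumption: invertebrate mitochondrial code (NCBI translation table 5).
INVERTEBRATE_MITO_STOPS: FrozenSet[str] = frozenset({"TAA", "TAG"})


def in_frame_stop_positions(
    seq: str,
    frame: int = 0,
    stops: FrozenSet[str] = INVERTEBRATE_MITO_STOPS,
) -> List[int]:
    """Return 0-based start positions of in-frame stop codons.

    `frame` (0, 1, or 2) is the offset of the first complete codon.
    A non-empty result in a coding region such as COI suggests a
    frame shift, a sequencing error, or a pseudogene.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    seq = seq.upper()
    return [
        pos
        for pos in range(frame, len(seq) - 2, 3)
        if seq[pos:pos + 3] in stops
    ]


# Hypothetical fragments, for illustration only.
flagged = in_frame_stop_positions("ATGTAAGGC")   # codons ATG, TAA, GGC
clean = in_frame_stop_positions("ATGGGCTGA")     # TGA is not a stop in table 5
```

Passing a different `stops` set, for example one that adds AGA and AGG for the vertebrate mitochondrial code, adapts the check to other lineages.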

Journal ArticleDOI
TL;DR: jModelTest 2: more models, new heuristics and parallel computing Diego Darriba, Guillermo L. Taboada, Ramón Doallo and David Posada.
Abstract: jModelTest 2: more models, new heuristics and parallel computing Diego Darriba, Guillermo L. Taboada, Ramón Doallo and David Posada Supplementary Table 1. New features in jModelTest 2 Supplementary Table 2. Model selection accuracy Supplementary Table 3. Mean square errors for model averaged estimates Supplementary Note 1. Hill-climbing hierarchical clustering algorithm Supplementary Note 2. Heuristic filtering Supplementary Note 3. Simulations from prior distributions Supplementary Note 4. Speed-up benchmark on real and simulated datasets

13,100 citations


"A DNA-Based Registry for All Animal..." refers methods in this paper

  • ...Prior to phylogeny reconstruction, the most appropriate model of evolution was separately estimated for each dataset from alignments using jModelTest [50]....

    [...]