Author

Guillaume Achaz

Bio: Guillaume Achaz is an academic researcher from the University of Paris. The author has contributed to research on the topics Population and Coalescent theory. The author has an h-index of 27 and has co-authored 72 publications receiving 4039 citations. Previous affiliations of Guillaume Achaz include University of the French West Indies and Guiana and Centre national de la recherche scientifique.


Papers
Journal ArticleDOI
TL;DR: Automatic Barcode Gap Discovery is a fast, simple method to split a sequence alignment data set into candidate species that should be complemented with other evidence in an integrative taxonomic approach.
Abstract: Within uncharacterized groups, DNA barcodes, short DNA sequences that are present in a wide range of species, can be used to assign organisms into species. We propose an automatic procedure that sorts the sequences into hypothetical species based on the barcode gap, which can be observed whenever the divergence among organisms belonging to the same species is smaller than divergence among organisms from different species. We use a range of prior intraspecific divergence to infer from the data a model-based one-sided confidence limit for intraspecific divergence. The method, called Automatic Barcode Gap Discovery (ABGD), then detects the barcode gap as the first significant gap beyond this limit and uses it to partition the data. Inference of the limit and gap detection are then recursively applied to previously obtained groups to get finer partitions until there is no further partitioning. Using six published data sets of metazoans, we show that ABGD is computationally efficient and performs well for standard prior maximum intraspecific divergences (a few per cent of divergence for the five data sets), except for one data set where fewer than three sequences per species were sampled. We further explore the theoretical limitations of ABGD through simulation of explicit speciation and population genetics scenarios. Our results emphasize in particular the sensitivity of the method to the presence of recent speciation events, via (unrealistically) high rates of speciation or large numbers of species. In conclusion, ABGD is a fast, simple method to split a sequence alignment data set into candidate species that should be complemented with other evidence in an integrative taxonomic approach.
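
The barcode-gap idea in the abstract above can be illustrated with a toy sketch. This is not the ABGD implementation: it assumes a fixed, user-supplied prior limit instead of the model-based confidence limit, picks the widest gap rather than the first significant one, and does not recurse. The function names (`p_distance`, `find_gap`, `partition`) and the example sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def find_gap(distances, prior_max):
    """Widest gap between consecutive sorted distances whose upper
    bound lies beyond prior_max; returns (lower, upper) or None."""
    ranked = sorted(distances)
    best = None
    for lo, hi in zip(ranked, ranked[1:]):
        if hi > prior_max and (best is None or hi - lo > best[1] - best[0]):
            best = (lo, hi)
    return best


def partition(seqs, threshold):
    """Single-linkage grouping: link every pair closer than threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) < threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical toy alignment: two tight pairs separated by a wide gap.
seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCAAAA", "CCCCCCAAAT"]
dists = [p_distance(a, b) for a, b in combinations(seqs, 2)]
gap = find_gap(dists, prior_max=0.05)
print(gap)                                          # (0.1, 0.6)
print(partition(seqs, (gap[0] + gap[1]) / 2))       # [[0, 1], [2, 3]]
```

Cutting at the midpoint of the gap is an arbitrary choice made here for illustration; any threshold inside the gap yields the same two groups on this data.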

2,336 citations

Journal ArticleDOI
TL;DR: It is demonstrated that ASAP has the potential to become a major tool for taxonomists, as it rapidly proposes relevant species hypotheses in a fully graphical exploratory interface as a first step of the integrative taxonomy process.
Abstract: Here, we describe Assemble Species by Automatic Partitioning (ASAP), a new method to build species partitions from single locus sequence alignments (i.e., barcode data sets). ASAP is efficient enough to split data sets as large as 10^4 sequences into putative species in several minutes. Although grounded in evolutionary theory, ASAP is the implementation of a hierarchical clustering algorithm that only uses pairwise genetic distances, avoiding the computational burden of phylogenetic reconstruction. Importantly, ASAP proposes species partitions ranked by a new scoring system that uses no biological prior insight of intraspecific diversity. ASAP is a stand-alone program that can be used either through a graphical web-interface or that can be downloaded and compiled for local usage. We have assessed its power along with three other programs (ABGD, PTP and GMYC) on 10 real COI barcode data sets representing various degrees of challenge (from small and easy cases to large and complicated data sets). We also used Monte-Carlo simulations of a multispecies coalescent framework to assess the strengths and weaknesses of ASAP and the other programs. Through these analyses, we demonstrate that ASAP has the potential to become a major tool for taxonomists, as it rapidly proposes relevant species hypotheses in a fully graphical exploratory interface as a first step of the integrative taxonomy process.
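
To make "hierarchical clustering that only uses pairwise genetic distances" concrete, here is a minimal sketch that merges groups in order of increasing distance and records the partition after each merge. It is an assumption-laden illustration, not ASAP itself: single linkage is assumed, the scoring system that ranks partitions is not reproduced, and `nested_partitions` and the distance matrix are hypothetical.

```python
from itertools import combinations


def nested_partitions(dist):
    """Given a symmetric distance matrix (list of lists), merge clusters
    in order of increasing pairwise distance (single linkage) and return
    a list of (merge_distance, partition) pairs, one per merge."""
    n = len(dist)
    clusters = [{i} for i in range(n)]
    pairs = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    history = []
    for d, i, j in pairs:
        ci = next(c for c in clusters if i in c)
        cj = next(c for c in clusters if j in c)
        if ci is cj:
            continue  # already in the same group, nothing to merge
        clusters = [c for c in clusters if c is not ci and c is not cj]
        clusters.append(ci | cj)
        history.append((d, sorted(sorted(c) for c in clusters)))
    return history


# Hypothetical distances for four sequences forming two tight pairs.
dist = [
    [0.0, 0.1, 0.6, 0.7],
    [0.1, 0.0, 0.7, 0.6],
    [0.6, 0.7, 0.0, 0.1],
    [0.7, 0.6, 0.1, 0.0],
]
for d, part in nested_partitions(dist):
    print(d, part)
```

Each recorded partition is a candidate species hypothesis; a method such as ASAP would then need a criterion to rank them, which this sketch leaves out.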

393 citations

Journal ArticleDOI
TL;DR: These findings provided evidence of frequent (37%) loss-of-function mutations in DEPDC5 associated with a broad spectrum of focal epilepsies and the implication of a DEP domain–containing protein that may be involved in membrane trafficking and/or G protein signaling opens new avenues for research.
Abstract: The main familial focal epilepsies are autosomal dominant nocturnal frontal lobe epilepsy, familial temporal lobe epilepsy and familial focal epilepsy with variable foci. A frameshift mutation in the DEPDC5 gene (encoding DEP domain-containing protein 5) was identified in a family with focal epilepsy with variable foci by linkage analysis and exome sequencing. Subsequent pyrosequencing of DEPDC5 in a cohort of 15 additional families with focal epilepsies identified 4 nonsense mutations and 1 missense mutation. Our findings provided evidence of frequent (37%) loss-of-function mutations in DEPDC5 associated with a broad spectrum of focal epilepsies. The implication of a DEP (Dishevelled, Egl-10 and Pleckstrin) domain-containing protein that may be involved in membrane trafficking and/or G protein signaling opens new avenues for research.

208 citations

Journal ArticleDOI
TL;DR: It is estimated, based on extrapolation from a random sample of land snail species via two independent approaches, that 7% of the species on Earth may already have been lost and that the biodiversity crisis is real.
Abstract: Since the 1980s, many have suggested we are in the midst of a massive extinction crisis, yet only 799 (0.04%) of the 1.9 million known recent species are recorded as extinct, questioning the reality of the crisis. This low figure is due to the fact that the status of very few invertebrates, which represent the bulk of biodiversity, has been evaluated. Here we show, based on extrapolation from a random sample of land snail species via two independent approaches, that we may already have lost 7% (130,000 extinctions) of the species on Earth. However, this loss is masked by the emphasis on terrestrial vertebrates, the target of most conservation actions. Projections of species extinction rates are controversial because invertebrates are essentially excluded from these scenarios. Invertebrates can and must be assessed if we are to obtain a more realistic picture of the sixth extinction crisis.

167 citations

Journal ArticleDOI
01 Sep 2009-Genetics
TL;DR: The framework presented here paves the way for constructing novel tests optimized for specific violations of the standard model that ultimately will help to unravel scenarios of evolution.
Abstract: Neutrality tests based on the frequency spectrum (e.g., Tajima's D or Fu and Li's F) are commonly used by population geneticists as routine tests to assess the goodness-of-fit of the standard neutral model on their data sets. Here, I show that these neutrality tests are specific instances of a general model that encompasses them all. I illustrate how this general framework can be taken advantage of to devise new, more powerful tests that better detect deviations from the standard model. Finally, I exemplify the usefulness of the framework on SNP data by showing how it supports the selection hypothesis in the lactase human gene by overcoming the ascertainment bias. The framework presented here paves the way for constructing novel tests optimized for specific violations of the standard model that ultimately will help to unravel scenarios of evolution.
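
As a concrete instance of the frequency-spectrum tests named in the abstract above, the sketch below computes the classical Tajima's D from an alignment using the standard published constants. It is not the general framework of the paper, which is not reproduced here; the function name `tajimas_d` and the toy alignments are hypothetical, and every variable site is assumed to count as one segregating site.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Classical Tajima's D: compares the mean number of pairwise
    differences (pi) with the number of segregating sites (S) scaled
    by the harmonic number a1."""
    n = len(seqs)
    if n < 4:
        # For n = 2 or 3 the variance constants c1 and c2 are both zero.
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of alignment columns with more than one state.
    s = sum(len(set(col)) > 1 for col in zip(*seqs))
    if s == 0:
        raise ValueError("no segregating sites")

    # pi: average number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Only singletons (excess of rare variants): D is negative.
print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))
# Two balanced haplotypes (excess of intermediate frequencies): D is positive.
print(tajimas_d(["AA", "AA", "TT", "TT"]))
```

The sign of D reflects which part of the frequency spectrum is over-represented, which is the property a generalized family of such tests can reweight.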

148 citations


Cited by
Journal Article
Fumio Tajima
30 Oct 1989-Genomics
TL;DR: It is suggested that the natural selection against large insertion/deletion is so weak that a large amount of variation is maintained in a population.

11,521 citations

01 Jun 2012
TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly that improves on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.

10,124 citations

Journal ArticleDOI
25 Jun 2010-PLOS ONE
TL;DR: A new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss is described, demonstrating high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement and segmental gain and loss.
Abstract: Background: Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms.

3,302 citations

Journal ArticleDOI
TL;DR: Version 6 of the DNA Sequence Polymorphism (DnaSP) software is a new version of the popular tool for performing exhaustive population genetic analyses on multiple sequence alignments, and includes novel modules for single- and multi-locus coalescent simulations under a wide range of demographic scenarios.
Abstract: We present version 6 of the DNA Sequence Polymorphism (DnaSP) software, a new version of the popular tool for performing exhaustive population genetic analyses on multiple sequence alignments. This major upgrade incorporates novel functionalities to analyze large data sets, such as those generated by high-throughput sequencing technologies. Among other features, DnaSP 6 implements: 1) modules for reading and analyzing data from genomic partitioning methods, such as RADseq or hybrid enrichment approaches, 2) faster methods scalable for high-throughput sequencing data, and 3) summary statistics for the analysis of multi-locus population genetics data. Furthermore, DnaSP 6 includes novel modules to perform single- and multi-locus coalescent simulations under a wide range of demographic scenarios. The DnaSP 6 program, with extensive documentation, is freely available at http://www.ub.edu/dnasp.

3,277 citations

Journal Article
TL;DR: FastTree uses sequence profiles of internal nodes in the tree to implement neighbor-joining, uses heuristics to quickly identify candidate joins, and then uses nearest-neighbor interchanges to reduce the length of the tree.
Abstract: Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O( NLa + N sqrt(N) ) memory and O( N sqrt(N) log(N) L a ) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
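
The memory figure quoted in the abstract above can be sanity-checked with a short calculation, alongside the Jukes-Cantor correction it mentions. The storage layout is an assumption made here, not something the abstract states: one 4-byte value per unordered pair of sequences. The function names are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance from the proportion p of differing sites:
    d = -(3/4) * ln(1 - 4p/3), defined for 0 <= p < 0.75."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix_bytes(n: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store one value per unordered pair of n sequences
    (assumed layout: upper triangle only, 4-byte floats by default)."""
    return n * (n - 1) // 2 * bytes_per_value


# Observed divergence of 10% corrects to a slightly larger distance.
print(jukes_cantor(0.1))

# 158,022 sequences, as in the 16S example from the abstract.
print(distance_matrix_bytes(158_022) / 1e9)  # about 49.9 (gigabytes)
```

Under these assumed storage conventions the result is consistent with the roughly 50 gigabytes cited for the pairwise distances, which shows how the O(N^2) space term dominates at this scale.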

2,436 citations