Author

Stefan Engelen

Bio: Stefan Engelen is an academic researcher from the French Alternative Energies and Atomic Energy Commission. The author has contributed to research in topics: Nanopore sequencing & Genome. The author has an h-index of 19 and has co-authored 40 publications receiving 5061 citations. Previous affiliations of Stefan Engelen include Commissariat à l'énergie atomique et aux énergies alternatives & University of Évry Val d'Essonne.

Papers
Journal Article
22 May 2015-Science
TL;DR: This work identifies ocean microbial core functionality and reveals that >73% of its abundance is shared with the human gut microbiome despite the physicochemical differences between these two ecosystems.
Abstract: Microbes are dominant drivers of biogeochemical processes, yet drawing a global picture of functional diversity, microbial community structure, and their ecological determinants remains a grand challenge. We analyzed 7.2 terabases of metagenomic data from 243 Tara Oceans samples from 68 locations in epipelagic and mesopelagic waters across the globe to generate an ocean microbial reference gene catalog with >40 million nonredundant, mostly novel sequences from viruses, prokaryotes, and picoeukaryotes. Using 139 prokaryote-enriched samples, containing >35,000 species, we show vertical stratification with epipelagic community composition mostly driven by temperature rather than other environmental factors or geography. We identify ocean microbial core functionality and reveal that >73% of its abundance is shared with the human gut microbiome despite the physicochemical differences between these two ecosystems.

1,934 citations

Journal Article
Colomban de Vargas, Stéphane Audic, Nicolas Henry, Johan Decelle, Frédéric Mahé, Ramiro Logares, Enrique Lara, Cédric Berney, Noan Le Bescot, Ian Probert, Margaux Carmichael, Julie Poulain, Sarah Romac, Sébastien Colin, Jean-Marc Aury, Lucie Bittner, Samuel Chaffron, Micah Dunthorn, Stefan Engelen, Olga Flegontova, Lionel Guidi, Aleš Horák, Olivier Jaillon, Gipsi Lima-Mendez, Julius Lukeš, Shruti Malviya, Raphael Morard, Matthieu Mulot, Eleonora Scalco, Raffaele Siano, Flora Vincent, Adriana Zingone, Céline Dimier, Marc Picheral, Sarah Searson, Stefanie Kandels-Lewis, Tara Oceans Coordinators, Silvia G. Acinas, Peer Bork, Chris Bowler, Gabriel Gorsky, Nigel Grimsley, Pascal Hingamp, Daniele Iudicone, Fabrice Not, Hiroyuki Ogata, Stephane Pesant, Jeroen Raes, Michael E. Sieracki, Sabrina Speich, Lars Stemmann, Shinichi Sunagawa, Jean Weissenbach, Patrick Wincker, Eric Karsenti
22 May 2015-Science
TL;DR: Diversity emerged at all taxonomic levels, both within the groups comprising the ~11,200 cataloged morphospecies of eukaryotic plankton and among twice as many other deep-branching lineages of unappreciated importance in plankton ecology studies.
Abstract: Marine plankton support global biological and geochemical processes. Surveys of their biodiversity have hitherto been geographically restricted and have not accounted for the full range of plankton size. We assessed eukaryotic diversity from 334 size-fractionated photic-zone plankton communities collected across tropical and temperate oceans during the circumglobal Tara Oceans expedition. We analyzed 18S ribosomal DNA sequences across the intermediate plankton-size spectrum from the smallest unicellular eukaryotes (protists, >0.8 micrometers) to small animals of a few millimeters. Eukaryotic ribosomal diversity saturated at ~150,000 operational taxonomic units, about one-third of which could not be assigned to known eukaryotic groups. Diversity emerged at all taxonomic levels, both within the groups comprising the ~11,200 cataloged morphospecies of eukaryotic plankton and among twice as many other deep-branching lineages of unappreciated importance in plankton ecology studies. Most eukaryotic plankton biodiversity belonged to heterotrophic protistan groups, particularly those known to be parasites or symbiotic hosts.

1,378 citations

Journal Article
11 Apr 2018-Nature
TL;DR: Whole-genome sequencing and phenotyping of 1,011 natural isolates of the yeast Saccharomyces cerevisiae reveal its evolutionary history, including a single out-of-China origin and multiple domestication events, and provides a framework for genotype–phenotype studies in this model organism.
Abstract: Large-scale population genomic surveys are essential to explore the phenotypic diversity of natural populations. Here we report the whole-genome sequencing and phenotyping of 1,011 Saccharomyces cerevisiae isolates, which together provide an accurate evolutionary picture of the genomic variants that shape the species-wide phenotypic landscape of this yeast. Genomic analyses support a single ‘out-of-China’ origin for this species, followed by several independent domestication events. Although domesticated isolates exhibit high variation in ploidy, aneuploidy and genome content, genome evolution in wild isolates is mainly driven by the accumulation of single nucleotide polymorphisms. A common feature is the extensive loss of heterozygosity, which represents an essential source of inter-individual variation in this mainly asexual species. Most of the single nucleotide polymorphisms, including experimentally identified functional polymorphisms, are present at very low frequencies. The largest numbers of variants identified by genome-wide association are copy-number changes, which have a greater phenotypic effect than do single nucleotide polymorphisms. This resource will guide future population genomics and genotype–phenotype studies in this classic model system.

727 citations

Journal Article
TL;DR: Improved data browsing and searching tools have been added, original tools useful in the context of expert annotation have been developed and integrated and the website has been significantly redesigned to be more user-friendly.
Abstract: MicroScope is an integrated platform dedicated to both the methodical updating of microbial genome annotation and to comparative analysis. The resource provides data from completed and ongoing genome projects (automatic and expert annotations), together with data sources from post-genomic experiments (i.e. transcriptomics, mutant collections) allowing users to perfect and improve the understanding of gene functions. MicroScope (http://www.genoscope.cns.fr/agc/microscope) combines tools and graphical interfaces to analyse genomes and to perform the manual curation of gene annotations in a comparative context. Since its first publication in January 2006, the system (previously named MaGe for Magnifying Genomes) has been continuously extended both in terms of data content and analysis tools. The last update of MicroScope was published in 2009 in the Database journal. Today, the resource contains data for >1600 microbial genomes, of which ∼300 are manually curated and maintained by biologists (1200 personal accounts today). Expert annotations are continuously gathered in the MicroScope database (∼50 000 a year), contributing to the improvement of the quality of microbial genomes annotations. Improved data browsing and searching tools have been added, original tools useful in the context of expert annotation have been developed and integrated and the website has been significantly redesigned to be more user-friendly. Furthermore, in the context of the European project Microme (Framework Program 7 Collaborative Project), MicroScope is becoming a resource providing for the curation and analysis of both genomic and metabolic data. An increasing number of projects are related to the study of environmental bacterial (meta)genomes that are able to metabolize a large variety of chemical compounds that may be of high industrial interest.

376 citations

Journal Article
01 Jan 2009-Database
TL;DR: This article emphasizes the essential role of expert annotation as a complement of automatic annotation in microbial genome annotation, especially for genomes initially analyzed by automatic procedures alone.
Abstract: The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope’s rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of microbial genome annotation, especially for genomes initially analyzed by automatic procedures alone. Database URLs: http://www.genoscope.cns.fr/agc/mage and http://www.genoscope.cns.fr/agc/microcyc

284 citations


Cited by
01 Jun 2012
TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly that improves on the recently released E+V-SC assembler (specialized for single-cell data) and on the popular assemblers Velvet and SoapDeNovo (for multicell data).
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.

10,124 citations

Journal Article
TL;DR: Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences, is presented, demonstrating that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences or Oxford Nanopore technologies.
Abstract: Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences (PacBio) or Oxford Nanopore technologies and achieves a contig NG50 of >21 Mbp on both human and Drosophila melanogaster PacBio data sets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.

4,806 citations

Journal Article

3,734 citations

Journal Article
TL;DR: Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low.
Abstract: The Illumina DNA sequencing platform generates accurate but short reads, which can be used to produce accurate but fragmented genome assemblies. Pacific Biosciences and Oxford Nanopore Technologies DNA sequencing platforms generate long reads that can produce complete genome assemblies, but the sequencing is more expensive and error-prone. There is significant interest in combining data from these complementary sequencing technologies to generate more accurate "hybrid" assemblies. However, few tools exist that truly leverage the benefits of both types of data, namely the accuracy of short reads and the structural resolving power of long reads. Here we present Unicycler, a new tool for assembling bacterial genomes from a combination of short and long reads, which produces assemblies that are accurate, complete and cost-effective. Unicycler builds an initial assembly graph from short reads using the de novo assembler SPAdes and then simplifies the graph using information from short and long reads. Unicycler uses a novel semi-global aligner to align long reads to the assembly graph. Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low. Unicycler is open source (GPLv3) and available at github.com/rrwick/Unicycler.

2,245 citations

01 Jan 2011
TL;DR: The sheer volume and scope of diverse genome-wide data pose a significant challenge to the development of efficient and intuitive visualization tools that can scale to very large data sets and flexibly integrate multiple data types, including clinical data.
Abstract: Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.

2,187 citations