scispace - formally typeset

Showing papers by "Nikos C. Kyrpides" published in 2010


Journal Article
21 May 2010-Science
TL;DR: Results from an initial reference genome sequencing of 178 microbial genomes allow for ~40% of random sequences from the microbiome of the gastrointestinal tract to be associated with organisms based on the match criteria used, suggesting that the authors are still far from saturating microbial species genetic data sets.
Abstract: The human microbiome refers to the community of microorganisms, including prokaryotes, viruses, and microbial eukaryotes, that populate the human body. The National Institutes of Health launched an initiative that focuses on describing the diversity of microbial species that are associated with health and disease. The first phase of this initiative includes the sequencing of hundreds of microbial reference genomes, coupled to metagenomic sequencing from multiple body sites. Here we present results from an initial reference genome sequencing of 178 microbial genomes. From 547,968 predicted polypeptides that correspond to the gene complement of these strains, previously unidentified ("novel") polypeptides that had both unmasked sequence length greater than 100 amino acids and no BLASTP match to any nonreference entry in the nonredundant subset were defined. This analysis resulted in a set of 30,867 polypeptides, of which 29,987 (approximately 97%) were unique. In addition, this set of microbial genomes allows for approximately 40% of random sequences from the microbiome of the gastrointestinal tract to be associated with organisms based on the match criteria used. Insights into pan-genome analysis suggest that we are still far from saturating microbial species genetic data sets. In addition, the associated metrics and standards used by our group for quality assurance are presented.
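The definition of a "novel" polypeptide above is a two-part filter. The following is a minimal Python sketch, assuming each predicted polypeptide has already been low-complexity masked and searched with BLASTP upstream. The `Polypeptide` record and its field names are hypothetical. The sketch applies that filter and reproduces the ~97% uniqueness figure from the reported counts:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Polypeptide:
    """Hypothetical record; masking and BLASTP are assumed to have run upstream."""
    pid: str
    unmasked_length: int        # residues left after low-complexity masking
    has_nonreference_hit: bool  # any BLASTP match to a non-reference entry in the NR subset


def novel(peptides: list[Polypeptide], min_length: int = 100) -> list[Polypeptide]:
    """Keep peptides strictly longer than min_length with no non-reference BLASTP match."""
    return [p for p in peptides
            if p.unmasked_length > min_length and not p.has_nonreference_hit]


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    demo = [
        Polypeptide("a", 250, False),  # long, no hit: novel
        Polypeptide("b", 250, True),   # has a non-reference hit: not novel
        Polypeptide("c", 100, False),  # not greater than 100 residues: excluded
    ]
    print([p.pid for p in novel(demo)])
    print(percent(29_987, 30_867))     # unique share of the 30,867 novel polypeptides
```

Run as a script, this prints `['a']` and `97.1`, matching the "approximately 97%" in the abstract.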

649 citations


Journal Article
TL;DR: GenePRIMP is a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies, including inconsistent start sites, missed genes and split genes.
Abstract: We present 'gene prediction improvement pipeline' (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome-sequencing and annotation technologies.
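One of the anomalies GenePRIMP reports is a split gene. The sketch below is not GenePRIMP's code. It is a hypothetical heuristic that assumes the best BLASTP subject and its aligned subject coordinates are already known for each gene. It flags adjacent same-strand genes whose hits together tile one reference protein that neither covers alone. All names and thresholds are illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneHit:
    """Hypothetical per-gene evidence: the gene's best BLASTP subject and alignment."""
    gene_id: str
    strand: str       # "+" or "-"
    order: int        # position of the gene along the contig
    subject_id: str   # best-hit reference protein
    s_start: int      # aligned region on the subject, 1-based inclusive
    s_end: int
    s_len: int        # full length of the subject protein


def _coverage(h: GeneHit) -> float:
    """Fraction of the subject protein covered by this gene's alignment."""
    return (h.s_end - h.s_start + 1) / h.s_len


def flag_split_genes(hits: list[GeneHit], max_overlap: int = 10,
                     min_coverage: float = 0.8) -> list[tuple[str, str]]:
    """Return (gene_a, gene_b) pairs that look like one gene broken in two."""
    ordered = sorted(hits, key=lambda h: h.order)
    flagged = []
    for a, b in zip(ordered, ordered[1:]):
        if (b.order != a.order + 1 or a.strand != b.strand
                or a.subject_id != b.subject_id):
            continue
        # Order the two fragments along the subject, independent of strand.
        first, second = sorted((a, b), key=lambda h: h.s_start)
        if second.s_start < first.s_end - max_overlap:
            continue  # hits overlap heavily: more like a duplication than a split
        if _coverage(a) >= min_coverage or _coverage(b) >= min_coverage:
            continue  # one gene already explains the reference on its own
        if _coverage(a) + _coverage(b) >= min_coverage:
            flagged.append((a.gene_id, b.gene_id))
    return flagged
```

A real pipeline would combine this kind of check with the start-site and missed-gene evidence the abstract mentions. The thresholds here are arbitrary placeholders.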

459 citations


Journal Article
TL;DR: The integrated microbial genomes (IMG) system is a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context; it contains both draft and complete microbial genomes.
Abstract: The integrated microbial genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG contains both draft and complete microbial genomes integrated with other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. Since its first release in 2005, IMG’s data content and analytical capabilities have been constantly expanded through regular releases. Several companion IMG systems have been set up in order to serve domain specific needs, such as expert review of genome annotations. IMG is available at http://img.jgi.doe.gov.

264 citations


Journal Article
TL;DR: Dinoroseobacter shibae DFL12T, a member of the globally important marine Roseobacter clade, comprises symbionts of cosmopolitan marine microalgae, including toxic dinoflagellates, and shows the most complex viral defense system of all Rhodobacterales sequenced to date.
Abstract: Dinoroseobacter shibae DFL12T, a member of the globally important marine Roseobacter clade, comprises symbionts of cosmopolitan marine microalgae, including toxic dinoflagellates. Its annotated 4 417 868 bp genome sequence revealed a possible advantage of this symbiosis for the algal host. D. shibae DFL12T is able to synthesize the vitamins B1 and B12 for which its host is auxotrophic. Two pathways for the de novo synthesis of vitamin B12 are present, one requiring oxygen and the other an oxygen-independent pathway. The de novo synthesis of vitamin B12 was confirmed to be functional, and D. shibae DFL12T was shown to provide the growth-limiting vitamins B1 and B12 to its dinoflagellate host. The Roseobacter clade has been considered to comprise obligate aerobic bacteria. However, D. shibae DFL12T is able to grow anaerobically using the alternative electron acceptors nitrate and dimethylsulfoxide; it has the arginine deiminase survival fermentation pathway and a complex oxygen-dependent Fnr (fumarate and nitrate reduction) regulon. Many of these traits are shared with other members of the Roseobacter clade. D. shibae DFL12T has five plasmids, showing examples for vertical recruitment of chromosomal genes (thiC) and horizontal gene transfer (cox genes, gene cluster of 47 kb) possibly by conjugation (vir gene cluster). The long-range (80%) synteny between two sister plasmids provides insights into the emergence of novel plasmids. D. shibae DFL12T shows the most complex viral defense system of all Rhodobacterales sequenced to date.

248 citations


Journal Article
TL;DR: The meeting's main outcome was the Earth Microbiome Project (EMP), a concept and practical approach to exploring microbial life on Earth; the report gives an overview of the EMP concept and how it can be applied to exploring the microbiome of each ecosystem on the planet.
Abstract: Between July 18th and 24th 2010, 26 leading microbial ecology, computation, bioinformatics and statistics researchers came together in Snowbird, Utah (USA) to discuss the challenge of how to best characterize the microbial world using next-generation sequencing technologies. The meeting was entitled “Terabase Metagenomics” and was sponsored by the Institute for Computing in Science (ICiS) summer 2010 workshop program. The aim of the workshop was to explore the fundamental questions relating to microbial ecology that could be addressed using advances in sequencing potential. Technological advances in next-generation sequencing platforms such as the Illumina HiSeq 2000 can generate in excess of 250 billion base pairs of genetic information in 8 days. Thus, the generation of a trillion base pairs of genetic information is becoming a routine matter. The main outcome from this meeting was the birth of a concept and practical approach to exploring microbial life on earth, the Earth Microbiome Project (EMP). Here we briefly describe the highlights of this meeting and provide an overview of the EMP concept and how it can be applied to exploration of the microbiome of each ecosystem on this planet.
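The "trillion base pairs" claim follows from the quoted throughput: 10^12 / (2.5 × 10^11) = 4 runs, or about 32 days on a single instrument. The small Python helper below makes the arithmetic explicit. It assumes every run yields exactly 250 billion bp and that separate instruments run in parallel. The function name is illustrative:

```python
def days_to_sequence(target_bp: int, bp_per_run: int = 250 * 10**9,
                     days_per_run: int = 8, instruments: int = 1) -> int:
    """Wall-clock days to reach target_bp when each instrument runs back-to-back.

    Assumes a fixed yield per run and that runs on different instruments
    proceed in parallel.
    """
    runs_needed = -(-target_bp // bp_per_run)   # ceiling division: whole runs only
    rounds = -(-runs_needed // instruments)     # parallel rounds of runs
    return rounds * days_per_run


if __name__ == "__main__":
    print(days_to_sequence(10**12))                 # one instrument
    print(days_to_sequence(10**12, instruments=4))  # four instruments in parallel
```

With one instrument a terabase takes four 8-day runs (32 days). With four instruments in parallel it fits into a single 8-day round.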

245 citations


Journal Article
TL;DR: The first meeting of the Earth Microbiome Project to discuss sample selection and acquisition focused on discussion of how to prioritize environmental samples for sequencing and metagenomic analysis as part of the global effort of the EMP.
Abstract: This report details the outcome of the first meeting of the Earth Microbiome Project to discuss sample selection and acquisition. The meeting, held at the Argonne National Laboratory on Wednesday October 6th 2010, focused on discussion of how to prioritize environmental samples for sequencing and metagenomic analysis as part of the global effort of the EMP to systematically determine the functional and phylogenetic diversity of microbial communities across the world.

188 citations


Journal Article
22 Mar 2010-PLOS ONE
TL;DR: A broad metabolic capability appears to be a general trait of the genus Cupriavidus; however, certain specialization towards a nutritional niche seems to be shaped mostly by the acquisition of “specialized” plasmids.
Abstract: Background: Cupriavidus necator JMP134 is a Gram-negative β-proteobacterium able to grow on a variety of aromatic and chloroaromatic compounds as its sole carbon and energy source.

104 citations


Journal Article
TL;DR: This is the first report of a complete genome sequence for a microsymbiont of the group of annual medic species adapted to acid soils and it is revealed that its genome size is 6,817,576 bp encoding 6,518 protein-coding genes and 81 RNA only encoding genes.
Abstract: Ensifer (Sinorhizobium) medicae is an effective nitrogen fixing microsymbiont of a diverse range of annual Medicago (medic) species. Strain WSM419 is an aerobic, motile, non-spore forming, Gram-negative rod isolated from a M. murex root nodule collected in Sardinia, Italy in 1981. WSM419 was manufactured commercially in Australia as an inoculant for annual medics during 1985 to 1993 due to its nitrogen fixation, saprophytic competence and acid tolerance properties. Here we describe the basic features of this organism, together with the complete genome sequence, and annotation. This is the first report of a complete genome sequence for a microsymbiont of the group of annual medic species adapted to acid soils. We reveal that its genome size is 6,817,576 bp encoding 6,518 protein-coding genes and 81 RNA only encoding genes. The genome contains a chromosome of size 3,781,904 bp and 3 plasmids of size 1,570,951 bp, 1,245,408 bp and 219,313 bp. The smallest plasmid is a feature unique to this medic microsymbiont.

100 citations


Journal Article
TL;DR: The sequenced strains significantly increase the number of noncommensal/nonpathogenic clostridial species and provide a key foundation for future studies of biomass conversion, cellulosome composition, and clostridial systems biology.
Abstract: Modern methods to develop microbe-based biomass conversion processes require a system-level understanding of the microbes involved. Clostridium species have long been recognized as ideal candidates for processes involving biomass conversion and production of various biofuels and other industrial products. To expand the knowledge base for clostridial species relevant to current biofuel production efforts, we have sequenced the genomes of 20 species spanning multiple genera. The majority of species sequenced fall within the class III cellulosome-encoding Clostridium and the class V saccharolytic Thermoanaerobacteraceae. Species were chosen based on representation in the experimental literature as model organisms, ability to degrade cellulosic biomass either by free enzymes or by cellulosomes, ability to rapidly ferment hexose and pentose sugars to ethanol, and ability to ferment synthesis gas to ethanol. The sequenced strains significantly increase the number of noncommensal/nonpathogenic clostridial species and provide a key foundation for future studies of biomass conversion, cellulosome composition, and clostridial systems biology.

85 citations


Journal Article
TL;DR: The features of this organism are described, together with the complete genome sequence and annotation; the 9,446,314 bp long single-replicon genome with its 6,898 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Abstract: Haliangium ochraceum Fudou et al. 2002 is the type species of the genus Haliangium in the myxococcal family ‘Haliangiaceae’. Members of the genus Haliangium are the first halophilic myxobacterial taxa described. The cells of the species follow a multicellular lifestyle in highly organized biofilms, called swarms, and, as most myxobacteria do, they decompose bacterial and yeast cells. The fruiting bodies contain particularly small coccoid myxospores. H. ochraceum encodes the first actin homologue identified in a bacterial genome. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the myxococcal suborder Nannocystineae, and the 9,446,314 bp long single replicon genome with its 6,898 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

63 citations


Journal Article
TL;DR: M. luteus is capable of long-chain alkene biosynthesis, which is of interest for advanced biofuel production; a three-gene cluster essential for this metabolism has been identified in the genome.
Abstract: Micrococcus luteus (NCTC2665, "Fleming strain") has one of the smallest genomes of free-living actinobacteria sequenced to date, comprising a single circular chromosome of 2,501,097 bp (G+C content 73%) predicted to encode 2403 proteins. The genome shows extensive synteny with that of the closely related organism, Kocuria rhizophila, from which it was taxonomically separated relatively recently. Despite its small size, the genome harbors 73 IS elements, almost all of which are closely related to elements found in other actinobacteria. An IS element is inserted into the rrs gene of one of only two rrn operons found in M. luteus. The genome encodes only four sigma factors and fourteen response regulators, indicative of adaptation to a rather strict ecological niche (mammalian skin). The high sensitivity of M. luteus to beta-lactam antibiotics may result from the presence of a reduced set of penicillin-binding proteins and the absence of a wblC gene, which plays an important role in antibiotic resistance in other actinobacteria. Consistent with the restricted range of compounds it can use as a sole source of carbon for energy and growth, M. luteus has a minimal complement of genes concerned with carbohydrate transport and metabolism and its inability to utilize glucose as a sole carbon source may be due to the apparent absence of a gene encoding glucokinase. Uniquely among characterized bacteria, M. luteus appears to be able to metabolize glycogen only via trehalose, and to make trehalose only via glycogen. It has very few genes associated with secondary metabolism. In contrast to other actinobacteria, M. luteus encodes only one resuscitation-promoting factor (Rpf) required for emergence from dormancy and its complement of other dormancy-related proteins is also much reduced. M. luteus is capable of long-chain alkene biosynthesis, which is of interest for advanced biofuel production; a three-gene cluster essential for this metabolism has been identified in the genome.

Journal Article
TL;DR: This is the first completed genome sequence for a microsymbiont of annual clovers, and it is revealed that its genome size is 7,418,122 bp encoding 7,232 protein-coding genes and 61 RNA-only encoding genes.
Abstract: Rhizobium leguminosarum bv trifolii is a soil-inhabiting bacterium that has the capacity to be an effective nitrogen fixing microsymbiont of a diverse range of annual Trifolium (clover) species. Strain WSM1325 is an aerobic, motile, non-spore forming, Gram-negative rod isolated from root nodules collected in 1993 from the Greek Island of Serifos. WSM1325 is produced commercially in Australia as an inoculant for a broad range of annual clovers of Mediterranean origin due to its superior attributes of saprophytic competence, nitrogen fixation and acid-tolerance. Here we describe the basic features of this organism, together with the complete genome sequence, and annotation. This is the first completed genome sequence for a microsymbiont of annual clovers. We reveal that its genome size is 7,418,122 bp encoding 7,232 protein-coding genes and 61 RNA-only encoding genes. This multipartite genome contains 6 distinct replicons: a chromosome of size 4,767,043 bp and 5 plasmids of size 828,924 bp, 660,973 bp, 516,088 bp, 350,312 bp and 294,782 bp.

Journal Article
TL;DR: This is the first completed genome sequence for a nitrogen fixing microsymbiont of a clover species from the American center of origin and it is revealed that its genome size is 6,872,702 bp encoding 6,643 protein-coding genes and 62 RNA only encoding genes.
Abstract: Rhizobium leguminosarum bv trifolii is the effective nitrogen fixing microsymbiont of a diverse range of annual and perennial Trifolium (clover) species. Strain WSM2304 is an aerobic, motile, non-spore forming, Gram-negative rod, isolated from Trifolium polymorphum in Uruguay in 1998. This microsymbiont predominated in the perennial grasslands of Glencoe Research Station, in Uruguay, to competitively nodulate its host, and fix atmospheric nitrogen. Here we describe the basic features of WSM2304, together with the complete genome sequence, and annotation. This is the first completed genome sequence for a nitrogen fixing microsymbiont of a clover species from the American center of origin. We reveal that its genome size is 6,872,702 bp encoding 6,643 protein-coding genes and 62 RNA only encoding genes. This multipartite genome was found to contain 5 distinct replicons: a chromosome of size 4,537,948 bp and four circular plasmids of size 1,266,105 bp, 501,946 bp, 308,747 bp and 257,956 bp.
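In each of the three rhizobial genome reports in this listing (E. medicae WSM419 and R. leguminosarum bv. trifolii WSM1325 and WSM2304), the chromosome and plasmid sizes should add up to the stated genome size. The short Python check below sums the replicons and also reports the share of each genome carried on plasmids. The dictionary keys and function name are illustrative only:

```python
# Replicon sizes (bp) as given in the three rhizobial abstracts: chromosome first, then plasmids.
REPLICONS = {
    "E. medicae WSM419": [3_781_904, 1_570_951, 1_245_408, 219_313],
    "R. leguminosarum bv. trifolii WSM1325": [4_767_043, 828_924, 660_973, 516_088, 350_312, 294_782],
    "R. leguminosarum bv. trifolii WSM2304": [4_537_948, 1_266_105, 501_946, 308_747, 257_956],
}
REPORTED_SIZE = {
    "E. medicae WSM419": 6_817_576,
    "R. leguminosarum bv. trifolii WSM1325": 7_418_122,
    "R. leguminosarum bv. trifolii WSM2304": 6_872_702,
}


def summarize(replicons: list[int]) -> tuple[int, float]:
    """Return (total genome size in bp, fraction of that total carried on plasmids)."""
    total = sum(replicons)
    return total, sum(replicons[1:]) / total


if __name__ == "__main__":
    for strain, sizes in REPLICONS.items():
        total, plasmid_share = summarize(sizes)
        verdict = "matches" if total == REPORTED_SIZE[strain] else "does not match"
        print(f"{strain}: {total:,} bp ({verdict} the reported size), "
              f"{plasmid_share:.1%} on plasmids")
```

All three totals match the reported genome sizes. Plasmids carry roughly 44.5%, 35.7% and 34.0% of the WSM419, WSM1325 and WSM2304 genomes respectively.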

Journal Article
TL;DR: This is the first completed genome sequence of the genus Meiothermus and only the third genome sequence to be published from a member of the family Thermaceae.
Abstract: Meiothermus ruber (Loginova et al. 1984) Nobre et al. 1996 is the type species of the genus Meiothermus. This thermophilic genus is of special interest, as its members share relatively low degrees of 16S rRNA gene sequence similarity and constitute a separate evolutionary lineage from members of the genus Thermus, from which they can generally be distinguished by their slightly lower temperature optima. The temperature-related split is in accordance with the chemotaxonomic feature of the polar lipids. M. ruber is a representative of the low-temperature group. This is the first completed genome sequence of the genus Meiothermus and only the third genome sequence to be published from a member of the family Thermaceae. The 3,097,457 bp long genome with its 3,052 protein-coding and 53 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Journal Article
TL;DR: This is the first complete genome sequence of a member of the genus Cellulomonas and the second complete genome sequence within the actinobacterial family Cellulomonadaceae, after that of the human pathogen Tropheryma whipplei.
Abstract: Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Journal Article
TL;DR: The genus Sulfurimonas is of interest because of its significant contribution to the global sulfur cycle, as it oxidizes sulfur compounds to sulfate, and because of its apparent habitation of deep-sea hydrothermal and marine sulfidic environments; the features of its type species are described, together with the complete genome sequence and annotation.
Abstract: Sulfurimonas autotrophica Inagaki et al. 2003 is the type species of the genus Sulfurimonas. This genus is of interest because of its significant contribution to the global sulfur cycle as it oxidizes sulfur compounds to sulfate and by its apparent habitation of deep-sea hydrothermal and marine sulfidic environments as potential ecological niche. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the second complete genome sequence of the genus Sulfurimonas and the 15th genome in the family Helicobacteraceae. The 2,153,198 bp long genome with its 2,165 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Journal Article
23 Dec 2010-Archaea
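TL;DR: A comparison of the reconstructed energy metabolism in the halophilic species Mhp. mahii with other representatives of the Methanosarcinaceae reveals some interesting differences from freshwater species.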
Abstract: Methanohalophilus mahii is the type species of the genus Methanohalophilus, which currently comprises three distinct species with validly published names. Mhp. mahii represents moderately halophilic methanogenic archaea with a strictly methylotrophic metabolism. The type strain SLPT was isolated from hypersaline sediments collected from the southern arm of Great Salt Lake, Utah. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,012,424 bp genome is a single replicon with 2032 protein-coding and 63 RNA genes and part of the Genomic Encyclopedia of Bacteria and Archaea project. A comparison of the reconstructed energy metabolism in the halophilic species Mhp. mahii with other representatives of the Methanosarcinaceae reveals some interesting differences to freshwater species.

Journal Article
TL;DR: This is the first complete genome sequence of a type strain of the genus Arcobacter; the species is of phylogenetic interest because of its lifestyle as a symbiotic organism in a marine environment, in contrast to many other Arcobacter species, which are associated with warm-blooded animals and tend to be pathogenic.
Abstract: Arcobacter nitrofigilis (McClung et al. 1983) Vandamme et al. 1991 is the type species of the genus Arcobacter in the family Campylobacteraceae within the Epsilonproteobacteria. The species was first described in 1983 as Campylobacter nitrofigilis [1] after its detection as a free-living, nitrogen-fixing Campylobacter species associated with Spartina alterniflora Loisel roots [2]. It is of phylogenetic interest because of its lifestyle as a symbiotic organism in a marine environment in contrast to many other Arcobacter species which are associated with warm-blooded animals and tend to be pathogenic. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a type strain of the genus Arcobacter. The 3,192,235 bp genome with its 3,154 protein-coding and 70 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Journal Article
TL;DR: The features of Spirosoma linguale are described, together with the complete genome sequence and annotation, which is only the third completed genome sequence of a member of the family Cytophagaceae.
Abstract: Spirosoma linguale Migula 1894 is the type species of the genus Spirosoma. S. linguale is a free-living and non-pathogenic organism, known for its peculiar ringlike and horseshoe-shaped cell morphology. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the third completed genome sequence of a member of the family Cytophagaceae. The 8,491,258 bp long genome with its eight plasmids, 7,069 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Journal Article
TL;DR: The authors present a five-tiered metagenome naming and classification scheme in which the top level includes the broad NCBI categories plus a third ‘engineered’ category that separates manipulated communities, such as bioreactors or treatment plants, from natural environmental communities.
Abstract: Everyone would agree that metagenomics has been a great boon to the field of environmental microbiology. Fuelled by major advances in sequencing technology, the number of metagenome projects has exploded in recent years, with hundreds of environmental samples having been interrogated by shotgun sequencing (Markowitz et al., 2008; Meyer et al., 2008; Liolios et al., 2009). As a result, while just a few years ago it was possible for an individual investigator to be familiar with the major shotgun metagenomic data sets, today there are far too many to easily recite. Therefore we argue that the time is ripe for developing and implementing a metagenome classification system. Why classify metagenomes? The ability to extract, study and understand information from genomic data depends heavily on comparative analysis, and metagenomic data are no exception. Yet the appropriate comparisons to make are much less clear for metagenomes than for genomes, where the choice of comparison can be guided by phylogenetic classification. Moreover, even if the type of environmental studies one would want to compare is known, it still remains difficult to know how many and which are available given the lack of systematic nomenclature describing these projects (i.e. standardized naming) or categorization. For example, if you were looking for metagenomes from organisms in the digestive tracts of various animals, they might be named ‘gut’ but could also be ‘rumen’, ‘forestomach’, ‘caecum’ or ‘faecal’ communities. Currently metagenomic projects are not systematically classified. NCBI’s metagenomic project catalogue has implemented a simple and general project type distinction between ‘environmental’ and ‘host-associated’ projects (named correspondingly as Ecological and Organismal). This shallow classification is a starting point but does not address the many other environmental features potentially of interest for comparison. 
In order to circumvent the present difficulty in identifying appropriate metagenomic projects for comparative analysis, we present here a five-tiered metagenome naming and classification scheme. The top level includes the broad NCBI categories, but we also add a third ‘engineered’ category that separates out manipulated communities such as bioreactors or treatment plants from natural environmental communities (Fig. 1). Each of these is then subcategorized according to a variety of criteria, taking into account knowledge of key variables that influence community composition [e.g. salinity (Lozupone and Knight, 2007) or soil pH (Lauber et al., 2009)]. Where possible, we have taken advantage of existing classification systems such as the Environment Ontology (EnvO; http://www.environmentontology.org/). Environmental communities are separated by the ecosystem category (aquatic, terrestrial, air) and ecosystem type (e.g. freshwater, marine) with more detailed categorizations based on specific features (e.g. salinity, pH). Host-associated communities are defined by host phylogeny, then sampling site; and finally engineered communities are classified by their function (e.g. bioremediation or food production) with further levels based on specific substrates or features. In some cases an individual ‘project’ may span multiple categories because it includes samples from different habitat types. A sampling of the higher-level categories is shown in Table 1, and the complete proposed schema is available from GOLD (Genomes OnLine Database, http://www.genomesonline.org/cgi-bin/GOLD/bin/metagenomic_classification.cgi) and IMG/M (http://img.jgi.doe.gov/m/). Although we developed this schema to address an immediate need within these databases, we hope that it will provide the basis for a broadly …
*For correspondence. E-mail nckyrpides@lbl.gov; Tel. (+1) 925 296 5718; Fax (+1) 925 296 5720.
Environmental Microbiology (2010) 12(7), 1803–1805, doi:10.1111/j.1462-2920.2010.02270.x
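One way to picture the five-tiered scheme is as a path from the top-level category down to increasingly specific features. Under that view, two projects are comparable to the extent that they share leading tiers. In the Python sketch below, only the three top-level categories come from the text. The lower tier labels and the project names are hypothetical, not the published GOLD vocabulary:

```python
from dataclasses import dataclass

# Top-level categories from the text: the two NCBI project types plus 'engineered'.
TOP_LEVELS = ("Environmental", "Host-associated", "Engineered")
MAX_TIERS = 5


@dataclass(frozen=True)
class MetagenomeClass:
    """A classification path, most general tier first (at most five tiers)."""
    path: tuple[str, ...]

    def __post_init__(self) -> None:
        if not 1 <= len(self.path) <= MAX_TIERS:
            raise ValueError(f"expected 1-{MAX_TIERS} tiers, got {len(self.path)}")
        if self.path[0] not in TOP_LEVELS:
            raise ValueError(f"unknown top-level category: {self.path[0]!r}")

    def shares_prefix(self, other: "MetagenomeClass", depth: int) -> bool:
        """True if both paths have at least `depth` tiers and agree on all of them."""
        return (len(self.path) >= depth and len(other.path) >= depth
                and self.path[:depth] == other.path[:depth])


def comparable_projects(projects: dict[str, MetagenomeClass], query: str,
                        depth: int) -> list[str]:
    """Names of other projects classified like `query` down to `depth` tiers."""
    target = projects[query]
    return sorted(name for name, cls in projects.items()
                  if name != query and target.shares_prefix(cls, depth))


# Hypothetical projects: free-text names differ ('rumen', 'faecal', 'hindgut'),
# but shared classification paths make the gut communities findable together.
PROJECTS = {
    "cow rumen": MetagenomeClass(("Host-associated", "Mammals", "Digestive system", "Rumen")),
    "human faecal": MetagenomeClass(("Host-associated", "Mammals", "Digestive system",
                                     "Large intestine", "Faeces")),
    "termite hindgut": MetagenomeClass(("Host-associated", "Insects", "Digestive system", "Hindgut")),
    "acid mine drainage": MetagenomeClass(("Environmental", "Aquatic", "Freshwater",
                                           "Acid mine drainage")),
    "activated sludge": MetagenomeClass(("Engineered", "Wastewater", "Activated sludge")),
}
```

For example, `comparable_projects(PROJECTS, "cow rumen", 3)` returns only the human faecal project, while a depth of 1 also picks up the termite hindgut. Comparing on a shared prefix instead of on free-text names is the point of a systematic naming scheme.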

Journal Article
TL;DR: The features of Aminobacterium colombiense are described, together with the complete genome sequence and annotation; this is the second completed genome sequence of a member of the family Synergistaceae and the first genome sequence of a member of the genus Aminobacterium.
Abstract: Aminobacterium colombiense Baena et al. 1999 is the type species of the genus Aminobacterium. This genus is of large interest because of its isolated phylogenetic location in the family Synergistaceae, its strictly anaerobic lifestyle, and its ability to grow by fermentation of a limited range of amino acids but not carbohydrates. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the second completed genome sequence of a member of the family Synergistaceae and the first genome sequence of a member of the genus Aminobacterium. The 1,980,592 bp long genome with its 1,914 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Journal Article
TL;DR: The strain described in this study was isolated from human gingival crevices; its genome is the first completed sequence of the genus Olsenella and the fifth from a member of the family Coriobacteriaceae.
Abstract: Olsenella uli (Olsen et al. 1991) Dewhirst et al. 2001 is the type species of the genus Olsenella, which belongs to the actinobacterial family Coriobacteriaceae. The species is of interest because it is frequently isolated from dental plaque in periodontitis patients and can cause primary endodontic infection. The species is a Gram-positive, non-motile and non-sporulating bacterium. The strain described in this study was isolated from human gingival crevices. This is the first completed sequence of the genus Olsenella and the fifth sequence from a member of the family Coriobacteriaceae. The 2,051,896 bp long genome with its 1,795 protein-coding and 55 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Journal Article
TL;DR: This is the first complete genome sequence of a member of the large clostridial family Veillonellaceae, and the 2,132,142 bp long single-replicon genome with its 1,859 protein-coding and 61 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Abstract: Veillonella parvula (Veillon and Zuber 1898) Prevot 1933 is the type species of the genus Veillonella in the family Veillonellaceae within the order Clostridiales. The species V. parvula is of interest because it is frequently isolated from dental plaque in the human oral cavity and can cause opportunistic infections. The species is strictly anaerobic and grows as small cocci which usually occur in pairs. Veillonellae are characterized by their unusual metabolism which is centered on the activity of the enzyme methylmalonyl-CoA decarboxylase. Strain Te3T, the type strain of the species, was isolated from the human intestinal tract. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the large clostridial family Veillonellaceae, and the 2,132,142 bp long single replicon genome with its 1,859 protein-coding and 61 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Journal Article
TL;DR: A gamma distribution is employed to model a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads, and the number of bins that were not sequenced and that could potentially be revealed by additional sequencing is estimated.
Abstract: Motivation: Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled based on overlapping sequences into larger, contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length and the relative abundances of their source genomes in the microbial community. A low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population due to its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we may estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluated the performance of this model using simulated metagenomes and demonstrate its applicability on three recent metagenomic datasets. Contact: sean.d.hooper@genpat.uu.se Supplementary information: Supplementary data are available at Bioinformatics online.
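The gamma model can be sketched as a gamma-Poisson mixture. Each bin's read count is assumed Poisson with a gamma-distributed rate (shape k, scale θ), so counts follow a negative binomial, and the probability that a bin received no reads is p₀ = (1 + θ)^(−k). The Python sketch below fits k and θ to the zero-truncated counts of the observed bins by brute-force grid maximum likelihood. It then scales the number of observed bins by p₀/(1 − p₀). The fitting procedure, grid bounds and function names are assumptions, not the authors' implementation:

```python
import math
from collections import Counter


def _log_grid(lo: float, hi: float, n: int) -> list[float]:
    """n values evenly spaced on a log scale from lo to hi inclusive."""
    step = (math.log(hi) - math.log(lo)) / (n - 1)
    return [math.exp(math.log(lo) + i * step) for i in range(n)]


def estimate_unseen_bins(coverage: list[int], grid: int = 60) -> tuple[float, float, float]:
    """Fit a zero-truncated gamma-Poisson (negative binomial) model to per-bin counts.

    coverage: number of reads hitting each *observed* bin, so every value is >= 1.
    Returns (gamma shape k, gamma scale theta, estimated number of unseen bins).
    """
    if not coverage or min(coverage) < 1:
        raise ValueError("coverage must be non-empty and every count must be >= 1")
    hist = Counter(coverage)  # n -> number of bins covered by exactly n reads
    best_ll, best_k, best_theta = -math.inf, 1.0, 1.0
    for k in _log_grid(0.05, 100.0, grid):
        for theta in _log_grid(0.01, 100.0, grid):
            log_q = -math.log1p(theta)        # log(1 / (1 + theta))
            log_r = math.log(theta) + log_q   # log(theta / (1 + theta))
            # log P(n >= 1) = log(1 - (1 + theta)^-k), computed without cancellation
            log_seen = math.log(-math.expm1(k * log_q))
            ll = sum(
                f * (math.lgamma(n + k) - math.lgamma(k) - math.lgamma(n + 1)
                     + k * log_q + n * log_r - log_seen)
                for n, f in hist.items()
            )
            if ll > best_ll:
                best_ll, best_k, best_theta = ll, k, theta
    log_q = -math.log1p(best_theta)
    p_zero = math.exp(best_k * log_q)   # P(a bin receives no reads)
    p_seen = -math.expm1(best_k * log_q)
    return best_k, best_theta, len(coverage) * p_zero / p_seen
```

It is called as `k, theta, unseen = estimate_unseen_bins(coverage)` on a list of reads per observed bin. A grid search is crude, but it needs no dependencies and cannot diverge; a numerical optimizer would be the usual replacement. If the best fit lands on a grid edge, treat the estimate as unreliable. This echoes the abstract's caveat that the estimate applies only when a gamma approximation fits the data adequately.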

Journal ArticleDOI
TL;DR: This is the first complete genome sequence of a member of the family Nakamurellaceae, and the 6,060,298 bp long single replicon genome with its 5415 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Abstract: Nakamurella multipartita (Yoshimi et al. 1996) Tao et al. 2004 is the type species of the monospecific genus Nakamurella in the actinobacterial suborder Frankineae. The nonmotile, coccus-shaped strain was isolated from activated sludge acclimated with sugar-containing synthetic wastewater, and is capable of accumulating large amounts of polysaccharides in its cells. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Nakamurellaceae. The 6,060,298 bp long single replicon genome with its 5415 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Journal ArticleDOI
TL;DR: Haloterrigena turkmenica, which was isolated from sulfate saline soil in Turkmenistan, is a relatively fast-growing, chemoorganotrophic, carotenoid-containing extreme halophile that requires at least 2 M NaCl for growth.
Abstract: Haloterrigena turkmenica (Zvyagintseva and Tarasov 1987) Ventosa et al. 1999, comb. nov. is the type species of the genus Haloterrigena in the euryarchaeal family Halobacteriaceae. It is of phylogenetic interest because of the still unclear position of the genera Haloterrigena and Natrinema within the Halobacteriaceae, which has historically caused taxonomic problems. H. turkmenica, which was isolated from sulfate saline soil in Turkmenistan, is a relatively fast-growing, chemoorganotrophic, carotenoid-containing extreme halophile that requires at least 2 M NaCl for growth. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the genus Haloterrigena and the eighth genome sequence from a member of the family Halobacteriaceae. The 5,440,782 bp genome (including six plasmids) with its 5,287 protein-coding and 63 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Journal ArticleDOI
TL;DR: This species is a mesophilic sulfate-reducing bacterium capable of completely oxidizing acetate and fatty acids of up to 18 carbon atoms to CO2; it uses the acetyl-CoA/CODH pathway for the complete oxidation of carbon sources and for autotrophic growth on formate.
Abstract: Desulfarculus baarsii (Widdel 1981) Kuever et al. 2006 is the type and only species of the genus Desulfarculus, which represents the family Desulfarculaceae and the order Desulfarculales. This species is a mesophilic sulfate-reducing bacterium with the capability to oxidize acetate and fatty acids of up to 18 carbon atoms completely to CO2. The acetyl-CoA/CODH (Wood-Ljungdahl) pathway is used by this species for the complete oxidation of carbon sources and autotrophic growth on formate. The type strain 2st14T was isolated from a ditch sediment collected near the University of Konstanz, Germany. This is the first completed genome sequence of a member of the order Desulfarculales. The 3,655,731 bp long single replicon genome with its 3,303 protein-coding and 52 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
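To make "complete oxidation of acetate" concrete for a sulfate reducer, the textbook overall stoichiometry for acetate oxidation to CO2 (present as bicarbonate) coupled to sulfate reduction is shown below. It is not given in the abstract and is included only as a general illustration, not as a measured result for D. baarsii:

```latex
\[
\mathrm{CH_3COO^- + SO_4^{2-} \longrightarrow 2\,HCO_3^- + HS^-}
\]
```

The equation balances in carbon, hydrogen, oxygen, sulfur and charge (-3 on each side). Eight electrons are released per acetate (its two carbons go from an average oxidation state of 0 to +4) and are accepted by sulfate (sulfur goes from +6 to -2).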

Journal ArticleDOI
TL;DR: This is the first complete genome sequence of a member of the family Acidaminococcaceae, and the 2,329,769 bp long genome with its 2,101 protein-coding and 81 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Abstract: Acidaminococcus fermentans (Rogosa 1969) is the type species of the genus Acidaminococcus and is of phylogenetic interest because of its isolated placement in a genomically little-characterized region of the Firmicutes. A. fermentans is known for its habitation of the gastrointestinal tract and its ability to oxidize trans-aconitate. Its anaerobic fermentation of glutamate has been studied intensively, and the genome sequence now provides a genomic basis for that work. The strain described in this report is a nonsporulating, nonmotile, Gram-negative coccus, originally isolated from a pig alimentary tract. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Acidaminococcaceae, and the 2,329,769 bp long genome with its 2,101 protein-coding and 81 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Journal ArticleDOI
TL;DR: The 5,460,085 bp long genome with its 4,304 protein-coding and 66 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
Abstract: Planctomyces limnophilus Hirsch and Muller 1986 belongs to the order Planctomycetales, which differs from other bacterial taxa by several distinctive features, such as internal cell compartmentalization, multiplication by budding directly from the spherical, ovoid or pear-shaped mother cell, and a cell wall stabilized by a proteinaceous layer rather than a peptidoglycan layer. After Pirellula staleyi, this is the second completed genome sequence from the family Planctomycetaceae. P. limnophilus is of interest because it differs from Pirellula by the presence of a stalk and its structure of fibril bundles, its cell shape and size, the formation of multicellular rosettes, low salt tolerance and red pigmented colonies. The 5,460,085 bp long genome with its 4,304 protein-coding and 66 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Journal ArticleDOI
TL;DR: This is the first completed genome sequence of the genus Sulfurospirillum; its type species, S. deleyianum, is a model organism for studying sulfur reduction and dissimilatory nitrate reduction as an energy source for growth, and the genome is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Abstract: Sulfurospirillum deleyianum Schumacher et al. 1993 is the type species of the genus Sulfurospirillum. S. deleyianum is a model organism for studying sulfur reduction and dissimilatory nitrate reduction as an energy source for growth. Also, it is a prominent model organism for studying the structural and functional characteristics of cytochrome c nitrite reductase. Here, we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the genus Sulfurospirillum. The 2,306,351 bp long genome with its 2,291 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.