Author
Stephanie Malfatti
Other affiliations: United States Department of Energy, Lawrence Livermore National Laboratory
Bio: Stephanie Malfatti is an academic researcher at the Joint Genome Institute. The author has contributed to research on the topics Metagenomics and Genome, has an h-index of 26, and has co-authored 38 publications receiving 7,106 citations. Previous affiliations of Stephanie Malfatti include the United States Department of Energy and Lawrence Livermore National Laboratory.
Topics: Metagenomics, Genome, Population, Phylogenetic tree, Genomics
Papers
••
TL;DR: The pyrosequencing of the bacterial 16S ribosomal RNA gene of more than 600 Arabidopsis thaliana plants is reported to test the hypotheses that the root rhizosphere and endophytic compartment microbiota of plants grown under controlled conditions in natural soils are sufficiently dependent on the host to remain consistent across different soil types and developmental stages.
Abstract: Sequencing of the Arabidopsis thaliana root microbiome shows that its composition is strongly influenced by location, inside or outside the root, and by soil type. The association between a land plant and the soil microbes of the root microbiome is important for the plant's well-being. A deeper understanding of these microbial communities will offer opportunities to control plant growth and susceptibility to pathogens, particularly in sustainable agricultural regimes. Two groups, working separately but developing best-practice protocols in parallel, have characterized the root microbiota of the model plant Arabidopsis thaliana. Working on two continents and with five different soil types, they reach similar general conclusions. The bacterial communities in each root compartment (the rhizosphere immediately surrounding the root and the endophytic compartment within the root) are most strongly influenced by soil type, and to a lesser degree by host genotype. In natural soils, Arabidopsis plants are preferentially colonized by Actinobacteria, Proteobacteria, Bacteroidetes and Chloroflexi species. And, an important point for future work, Arabidopsis root selectivity for soil bacteria under controlled environmental conditions mimics that of plants grown in a natural environment. Land plants associate with a root microbiota distinct from the complex microbial community present in surrounding soil. The microbiota colonizing the rhizosphere (immediately surrounding the root) and the endophytic compartment (within the root) contribute to plant growth, productivity, carbon sequestration and phytoremediation1,2,3. Colonization of the root occurs despite a sophisticated plant immune system4,5, suggesting finely tuned discrimination of mutualists and commensals from pathogens. Genetic principles governing the derivation of host-specific endophyte communities from soil communities are poorly understood.
Here we report the pyrosequencing of the bacterial 16S ribosomal RNA gene of more than 600 Arabidopsis thaliana plants to test the hypotheses that the root rhizosphere and endophytic compartment microbiota of plants grown under controlled conditions in natural soils are sufficiently dependent on the host to remain consistent across different soil types and developmental stages, and sufficiently dependent on host genotype to vary between inbred Arabidopsis accessions. We describe different bacterial communities in two geochemically distinct bulk soils and in rhizosphere and endophytic compartments prepared from roots grown in these soils. The communities in each compartment are strongly influenced by soil type. Endophytic compartments from both soils feature overlapping, low-complexity communities that are markedly enriched in Actinobacteria and specific families from other phyla, notably Proteobacteria. Some bacteria vary quantitatively between plants of different developmental stage and genotype. Our rigorous definition of an endophytic compartment microbiome should facilitate controlled dissection of plant–microbe interactions derived from complex soil communities.
2,097 citations
••
Joint Genome Institute1, Bielefeld University2, University of Technology, Sydney3, University of California, Davis4, Bigelow Laboratory For Ocean Sciences5, University of British Columbia6, University of Nevada, Las Vegas7, University of Patras8, Woods Hole Oceanographic Institution9, University of Illinois at Urbana–Champaign10, University of Queensland11
TL;DR: This study applies single-cell genomics to target and sequence 201 archaeal and bacterial cells from nine diverse habitats belonging to 29 major mostly uncharted branches of the tree of life and provides a systematic step towards a better understanding of biological evolution on the authors' planet.
Abstract: Genome sequencing enhances our understanding of the biological world by providing blueprints for the evolutionary and functional diversity that shapes the biosphere. However, microbial genomes that are currently available are of limited phylogenetic breadth, owing to our historical inability to cultivate most microorganisms in the laboratory. We apply single-cell genomics to target and sequence 201 uncultivated archaeal and bacterial cells from nine diverse habitats belonging to 29 major mostly uncharted branches of the tree of life, so-called 'microbial dark matter'. With this additional genomic information, we are able to resolve many intra- and inter-phylum-level relationships and to propose two new superphyla. We uncover unexpected metabolic features that extend our understanding of biology and challenge established boundaries between the three domains of life. These include a novel amino acid use for the opal stop codon, an archaeal-type purine synthesis in Bacteria and complete sigma factors in Archaea similar to those in Bacteria. The single-cell genomes also served to phylogenetically anchor up to 20% of metagenomic reads in some habitats, facilitating organism-level interpretation of ecosystem function. This study greatly expands the genomic representation of the tree of life and provides a systematic step towards a better understanding of biological evolution on our planet.
1,856 citations
••
TL;DR: The genomes of two Prochlorococcus strains that span the largest evolutionary distance within the Prochlorococcus lineage are compared and reveal dynamic genomes that are constantly changing in response to myriad selection pressures.
Abstract: The marine unicellular cyanobacterium Prochlorococcus is the smallest-known oxygen-evolving autotroph1. It numerically dominates the phytoplankton in the tropical and subtropical oceans2,3, and is responsible for a significant fraction of global photosynthesis. Here we compare the genomes of two Prochlorococcus strains that span the largest evolutionary distance within the Prochlorococcus lineage4 and that have different minimum, maximum and optimal light intensities for growth5. The high-light-adapted ecotype has the smallest genome (1,657,990 base pairs, 1,716 genes) of any known oxygenic phototroph, whereas the genome of its low-light-adapted counterpart is significantly larger, at 2,410,873 base pairs (2,275 genes). The comparative architectures of these two strains reveal dynamic genomes that are constantly changing in response to myriad selection pressures. Although the two strains have 1,350 genes in common, a significant number are not shared, and these have been differentially retained from the common ancestor, or acquired through duplication or lateral transfer. Some of these genes have obvious roles in determining the relative fitness of the ecotypes in response to key environmental variables, and hence in regulating their distribution and abundance in the oceans.
1,106 citations
••
TL;DR: The complete genomic sequence of Pseudomonas syringae pv. syringae B728a (Pss B728a) has been determined and is compared with that of P. syringae pv. tomato DC3000 (Pst DC3000).
Abstract: The complete genomic sequence of Pseudomonas syringae pv. syringae B728a (Pss B728a) has been determined and is compared with that of P. syringae pv. tomato DC3000 (Pst DC3000). The two pathovars of this economically important species of plant pathogenic bacteria differ in host range and other interactions with plants, with Pss having a more pronounced epiphytic stage of growth and higher abiotic stress tolerance and Pst DC3000 having a more pronounced apoplastic growth habitat. The Pss B728a genome (6.1 Mb) contains a circular chromosome and no plasmid, whereas the Pst DC3000 genome is 6.5 Mbp in size, composed of a circular chromosome and two plasmids. Although a high degree of similarity exists between the two sequenced Pseudomonads, 976 protein-encoding genes are unique to Pss B728a when compared with Pst DC3000, including large genomic islands likely to contribute to virulence and host specificity. Over 375 repetitive extragenic palindromic sequences unique to Pss B728a when compared with Pst DC3000 are widely distributed throughout the chromosome except in 14 genomic islands, which generally had lower GC content than the genome as a whole. Content of the genomic islands varies, with one containing a prophage and another the plasmid pKLC102 of Pseudomonas aeruginosa PAO1. Among the 976 genes of Pss B728a with no counterpart in Pst DC3000 are those encoding syringopeptin, syringomycin, indole acetic acid biosynthesis, arginine degradation, and production of ice nuclei. The genomic comparison suggests that several unique genes for Pss B728a such as ectoine synthase, DNA repair, and antibiotic production may contribute to the epiphytic fitness and stress tolerance of this organism.
439 citations
••
TL;DR: This work combines two preassembly filtering approaches—digital normalization and partitioning—to generate previously intractable large metagenome assemblies, which result in assemblies nearly identical to assemblies from unprocessed data.
Abstract: The large volumes of sequencing data required to sample deeply the microbial communities of complex environments pose new challenges to sequence analysis. De novo metagenomic assembly effectively reduces the total amount of data to be analyzed but requires substantial computational resources. We combine two preassembly filtering approaches—digital normalization and partitioning—to generate previously intractable large metagenome assemblies. Using a human-gut mock community dataset, we demonstrate that these methods result in assemblies nearly identical to assemblies from unprocessed data. We then assemble two large soil metagenomes totaling 398 billion bp (equivalent to 88,000 Escherichia coli genomes) from matched Iowa corn and native prairie soils. The resulting assembled contigs could be used to identify molecular interactions and reaction networks of known metabolic pathways using the Kyoto Encyclopedia of Genes and Genomes Orthology database. Nonetheless, more than 60% of predicted proteins in assemblies could not be annotated against known databases. Many of these unknown proteins were abundant in both corn and prairie soils, highlighting the benefits of assembly for the discovery and characterization of novelty in soil biodiversity. Moreover, 80% of the sequencing data could not be assembled because of low coverage, suggesting that considerably more sequencing data are needed to characterize the functional content of soil.
303 citations
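The abstract above names digital normalization but does not describe it. The following is a minimal sketch of the general idea as it is commonly described: a read is kept only while the median abundance of its k-mers, counted over reads kept so far, is below a coverage cutoff. The function names, the exact `Counter`-based counting, and the tiny `k` and `cutoff` values are illustrative assumptions and are not taken from the paper; production tools use memory-efficient approximate counting instead.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """Return all overlapping k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def digital_normalize(reads, k=4, cutoff=2):
    """Keep a read only if its median k-mer count is below `cutoff`.

    Counts are accumulated from kept reads only, so redundant
    high-coverage reads are discarded in a single pass.
    (Hypothetical helper: exact counting, toy parameter defaults.)
    """
    counts = Counter()
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if not ks:
            continue  # read shorter than k: nothing to count
        if median(counts[km] for km in ks) < cutoff:
            kept.append(read)
            counts.update(ks)
    return kept


if __name__ == "__main__":
    reads = ["ACGTACGT"] * 5 + ["TTTTGGGG"]
    print(digital_normalize(reads, k=4, cutoff=2))
```

In this toy input the five identical reads are reduced to two, while the single low-coverage read is retained, which is the behavior that makes downstream assembly cheaper.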
Cited by
••
TL;DR: The UPARSE pipeline reports operational taxonomic unit (OTU) sequences with ≤1% incorrect bases in artificial microbial community tests, compared with >3% incorrect bases commonly reported by other methods.
Abstract: Amplified marker-gene sequences can be used to understand microbial community structure, but they suffer from a high level of sequencing and amplification artifacts. The UPARSE pipeline reports operational taxonomic unit (OTU) sequences with ≤1% incorrect bases in artificial microbial community tests, compared with >3% incorrect bases commonly reported by other methods. The improved accuracy results in far fewer OTUs, consistently closer to the expected number of species in a community.
11,329 citations
••
TL;DR: An objective measure of genome quality is proposed that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities and is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches.
Abstract: Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of “marker” genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities.
5,788 citations
01 Aug 2000
TL;DR: A Bioentrepreneur course on the assessment of medical technology in the context of commercialization, addressing many issues unique to biomedical products.
Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.
4,833 citations
••
TL;DR: Over 1.2 million previously unknown genes represented in these samples, including more than 782 new rhodopsin-like photoreceptors, are identified, suggesting substantial oceanic microbial diversity.
Abstract: We have applied “whole-genome shotgun sequencing” to microbial populations collected en masse on tangential flow and impact filters from seawater samples collected from the Sargasso Sea near Bermuda. A total of 1.045 billion base pairs of nonredundant sequence was generated, annotated, and analyzed to elucidate the gene content, diversity, and relative abundance of the organisms within these environmental samples. These data are estimated to derive from at least 1800 genomic species based on sequence relatedness, including 148 previously unknown bacterial phylotypes. We have identified over 1.2 million previously unknown genes represented in these samples, including more than 782 new rhodopsin-like photoreceptors. Variation in species present and stoichiometry suggests substantial oceanic microbial diversity. Microorganisms are responsible for most of the biogeochemical cycles that shape the environment of Earth and its oceans. Yet, these organisms are the least well understood on Earth, as the ability to study and understand the metabolic potential of microorganisms has been hampered by the inability to generate pure cultures. Recent studies have begun to explore environ
4,210 citations
••
TL;DR: MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner; on a soil dataset it generated a threefold larger assembly, with longer contig N50 and average contig length, than previous methods.
Abstract: Summary: MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a soil metagenomics dataset with 252Gbps in 44.1 hours and 99.6 hours on a single computing node with and without a GPU, respectively. MEGAHIT assembles the data as a whole, i.e., no pre-processing like partitioning and normalization was needed. When compared with previous methods (Chikhi and Rizk, 2012; Howe, et al., 2014) on assembling the soil data, MEGAHIT generated a threefold larger assembly, with longer contig N50 and average contig length; furthermore, 55.8% of the reads were aligned to the assembly, giving a 4-fold improvement. Availability: The source code of MEGAHIT is freely available at https://github.com/voutcn/megahit under GPLv3 license. Contact: rb@l3-bioinfo.com, twlam@cs.hku.hk
3,634 citations
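The MEGAHIT abstract compares assemblies by contig N50 without defining it. As a hedged illustration, the sketch below uses the standard definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` and the sample lengths are hypothetical and are not taken from MEGAHIT.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the
    length of the contig at which the running total first reaches
    half of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("n50 requires at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe check for "at least half"
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A larger N50 means more of the assembly sits in long contigs, which is why it is reported alongside total assembly size and average contig length.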