Genomic Expansion of Domain Archaea Highlights Roles for Organisms from New Phyla in Anaerobic Carbon Cycling

doi:10.1016/J.CUB.2015.01.014

Cindy J. Castelle¹, Kelly C. Wrighton², Brian C. Thomas¹, Laura A. Hug¹, Christopher T. Brown¹, Michael J. Wilkins², Kyle R. Frischkorn³, Susannah G. Tringe⁴, Andrea Singh¹, Lye Meng Markillie⁵, Ronald C. Taylor⁵, Kenneth H. Williams⁶, Jillian F. Banfield¹

University of California, Berkeley¹, Ohio State University², Lamont–Doherty Earth Observatory³, Joint Genome Institute⁴, Environmental Molecular Sciences Laboratory⁵, Lawrence Berkeley National Laboratory⁶

16 Mar 2015 • Current Biology (Elsevier) • Vol. 25, Iss. 6, pp. 690-701

TL;DR: This study sequenced DNA from complex sediment and planktonic consortia from an aquifer adjacent to the Colorado River and reconstructed the first complete genomes for Archaea using cultivation-independent methods, which dramatically expand genomic sampling of the domain Archaea and clarify taxonomic designations within a major superphylum.

About: This article is published in Current Biology. It was published on 2015-03-16 and is open access. It has received 463 citations to date. Its listed topics are Nanohaloarchaea and Phylum.

Citations

A new view of the tree of life

[...]

Laura A. Hug¹˒², Brett J. Baker³, Karthik Anantharaman², Christopher T. Brown⁴, Alexander J. Probst², Cindy J. Castelle², Cristina N. Butterfield², Alex W. Hernsdorf⁴, Yuki Amano⁵, Kotaro Ise⁵, Yohey Suzuki⁶, Natasha Dudek⁷, David A. Relman⁸˒⁹, Kari M. Finstad⁴, Ronald Amundson⁴, Brian C. Thomas², Jillian F. Banfield²˒⁴

University of Waterloo¹, Planetary Science Institute², University of Texas at Austin³, University of California, Berkeley⁴, Japan Atomic Energy Agency⁵, University of Tokyo⁶, University of California, Santa Cruz⁷, Veterans Health Administration⁸, Stanford University⁹

11 Apr 2016 • Nature Microbiology

TL;DR: New genomic data from over 1,000 uncultivated and little known organisms, together with published sequences, are used to infer a dramatically expanded version of the tree of life, with Bacteria, Archaea and Eukarya included.

Abstract: The tree of life is one of the most important organizing principles in biology [1]. Gene surveys suggest the existence of an enormous number of branches [2], but even an approximation of the full scale of the tree has remained elusive. Recent depictions of the tree of life have focused either on the nature of deep evolutionary relationships [3–5] or on the known, well-classified diversity of life with an emphasis on eukaryotes [6]. These approaches overlook the dramatic change in our understanding of life's diversity resulting from genomic sampling of previously unexamined environments. New methods to generate genome sequences illuminate the identity of organisms and their metabolic capacities, placing them in community and ecosystem contexts [7,8]. Here, we use new genomic data from over 1,000 uncultivated and little known organisms, together with published sequences, to infer a dramatically expanded version of the tree of life, with Bacteria, Archaea and Eukarya included. The depiction is both a global overview and a snapshot of the diversity within each major lineage. The results reveal the dominance of bacterial diversification and underline the importance of organisms lacking isolated representatives, with substantial evolution concentrated in a major radiation of such organisms. This tree highlights major lineages currently underrepresented in biogeochemical models and identifies radiations that are probably important for future evolutionary analyses. An update to the ‘tree of life’ has revealed a dominance of bacterial diversity in many ecosystems and extensive evolution in some branches of the tree. It also highlights how few organisms we have been able to cultivate for further investigation.

1,614 citations

Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life

[...]

Donovan H. Parks¹, Christian Rinke¹, Maria Chuvochina¹, Pierre-Alain Chaumeil¹, Ben J. Woodcroft¹, Paul N. Evans¹, Philip Hugenholtz¹, Gene W. Tyson¹

University of Queensland¹

11 Sep 2017 • Nature Microbiology

TL;DR: The recovery of 7,903 bacterial and archaeal metagenome-assembled genomes increases the phylogenetic diversity represented by public genome repositories and provides the first representatives from 20 candidate phyla.

Abstract: Challenges in cultivating microorganisms have limited the phylogenetic diversity of currently available microbial genomes. This is being addressed by advances in sequencing throughput and computational techniques that allow for the cultivation-independent recovery of genomes from metagenomes. Here, we report the reconstruction of 7,903 bacterial and archaeal genomes from >1,500 public metagenomes. All genomes are estimated to be ≥50% complete and nearly half are ≥90% complete with ≤5% contamination. These genomes increase the phylogenetic diversity of bacterial and archaeal genome trees by >30% and provide the first representatives of 17 bacterial and three archaeal candidate phyla. We also recovered 245 genomes from the Patescibacteria superphylum (also known as the Candidate Phyla Radiation) and find that the relative diversity of this group varies substantially with different protein marker sets. The scale and quality of this data set demonstrate that recovering genomes from metagenomes provides an expedient path forward to exploring microbial dark matter.
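
The abstract quotes two quality thresholds: every genome is estimated to be ≥50% complete, and nearly half are ≥90% complete with ≤5% contamination. A minimal Python sketch of applying those two cut-offs to a list of genome bins follows. The bin names and the completeness and contamination values are hypothetical; in practice such estimates come from a marker-gene tool, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class GenomeBin:
    name: str             # hypothetical bin identifier
    completeness: float   # estimated completeness, percent
    contamination: float  # estimated contamination, percent


def meets_minimum(g: GenomeBin) -> bool:
    """Inclusion cut-off quoted in the abstract: at least 50% complete."""
    return g.completeness >= 50.0


def near_complete(g: GenomeBin) -> bool:
    """Stricter tier quoted in the abstract: >=90% complete and <=5% contamination."""
    return g.completeness >= 90.0 and g.contamination <= 5.0


# Hypothetical estimates, for illustration only.
bins = [
    GenomeBin("bin_a", 95.2, 1.1),
    GenomeBin("bin_b", 72.0, 3.4),
    GenomeBin("bin_c", 41.5, 0.8),
    GenomeBin("bin_d", 93.0, 7.5),
]
kept = [g for g in bins if meets_minimum(g)]
tier = {g.name: near_complete(g) for g in kept}
print([g.name for g in kept])  # ['bin_a', 'bin_b', 'bin_d']
print(tier)  # {'bin_a': True, 'bin_b': False, 'bin_d': False}
```

Here `bin_d` passes the completeness cut-off but misses the near-complete tier because its contamination exceeds 5%.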

1,248 citations

Cites background from "Genomic Expansion of Domain Archaea..."

...Ignavibacteriae (29) Calditrichaeota (5) Marinimicrobia (46), UBP11 (1) Fibrobacteres (17), Gemmatimonadetes (39)...
[...]

Unusual biology across a group comprising more than 15% of domain Bacteria

[...]

Christopher T. Brown¹, Laura A. Hug¹, Brian C. Thomas¹, Itai Sharon¹, Cindy J. Castelle¹, Andrea Singh¹, Michael J. Wilkins², Kelly C. Wrighton², Kenneth H. Williams³, Jillian F. Banfield³

University of California, Berkeley¹, Ohio State University², Lawrence Berkeley National Laboratory³

09 Jul 2015 • Nature

TL;DR: This work reconstructed 8 complete and 789 draft genomes from bacteria representing >35 phyla and documented features that consistently distinguish these organisms from other bacteria, infer that this group, which may comprise >15% of the bacterial domain, has shared evolutionary history, and describe it as the candidate phyla radiation (CPR).

Abstract: A prominent feature of the bacterial domain is a radiation of major lineages that are defined as candidate phyla because they lack isolated representatives. Bacteria from these phyla occur in diverse environments and are thought to mediate carbon and hydrogen cycles. Genomic analyses of a few representatives suggested that metabolic limitations have prevented their cultivation. Here we reconstructed 8 complete and 789 draft genomes from bacteria representing >35 phyla and documented features that consistently distinguish these organisms from other bacteria. We infer that this group, which may comprise >15% of the bacterial domain, has shared evolutionary history, and describe it as the candidate phyla radiation (CPR). All CPR genomes are small and most lack numerous biosynthetic pathways. Owing to divergent 16S ribosomal RNA (rRNA) gene sequences, 50-100% of organisms sampled from specific phyla would evade detection in typical cultivation-independent surveys. CPR organisms often have self-splicing introns and proteins encoded within their rRNA genes, a feature rarely reported in bacteria. Furthermore, they have unusual ribosome compositions. All are missing a ribosomal protein often absent in symbionts, and specific lineages are missing ribosomal proteins and biogenesis factors considered universal in bacteria. This implies different ribosome structures and biogenesis mechanisms, and underlines unusual biology across a large part of the bacterial domain.

923 citations

Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system

[...]

Karthik Anantharaman¹, Christopher T. Brown¹, Laura A. Hug¹, Itai Sharon¹, Cindy J. Castelle¹, Alexander J. Probst¹, Brian C. Thomas¹, Andrea Singh¹, Michael J. Wilkins², Ulas Karaoz³, Eoin L. Brodie³, Kenneth H. Williams³, Susan S. Hubbard³, Jillian F. Banfield¹˒³

University of California, Berkeley¹, Ohio State University², Lawrence Berkeley National Laboratory³

24 Oct 2016 • Nature Communications

TL;DR: Terabase-scale cultivation-independent metagenomics is applied to aquifer sediments and groundwater and 2,540 draft-quality, near-complete and complete strain-resolved genomes are reconstructed, finding that few organisms within the community can conduct multiple sequential redox transformations.

Abstract: The subterranean world hosts up to one-fifth of all biomass, including microbial communities that drive transformations central to Earth’s biogeochemical cycles. However, little is known about how complex microbial communities in such environments are structured, and how inter-organism interactions shape ecosystem function. Here we apply terabase-scale cultivation-independent metagenomics to aquifer sediments and groundwater, and reconstruct 2,540 draft-quality, near-complete and complete strain-resolved genomes that represent the majority of known bacterial phyla as well as 47 newly discovered phylum-level lineages. Metabolic analyses spanning this vast phylogenetic diversity and representing up to 36% of organisms detected in the system are used to document the distribution of pathways in coexisting organisms. Consistent with prior findings indicating metabolic handoffs in simple consortia, we find that few organisms within the community can conduct multiple sequential redox transformations. As environmental conditions change, different assemblages of organisms are selected for, altering linkages among the major biogeochemical cycles. Microorganisms from the terrestrial subsurface are understudied. Here, Anantharaman et al. analyse aquifer sediments and groundwater by genome-resolved metagenomics and reconstruct 2,540 genomes representing the majority of known bacterial phyla as well as 47 new phylum-level lineages.

845 citations

Asgard archaea illuminate the origin of eukaryotic cellular complexity

[...]

Katarzyna Zaremba-Niedzwiedzka¹, Eva F. Caceres¹, Jimmy H. Saw¹, Disa Bäckström¹, Lina Juzokaite¹, Emmelien Vancaester¹, Kiley W. Seitz², Karthik Anantharaman³, Piotr Starnawski⁴, Kasper Urup Kjeldsen⁴, Matthew B. Stott⁵, Takuro Nunoura⁶, Jillian F. Banfield³, Andreas Schramm⁴, Brett J. Baker², Anja Spang¹, Thijs J. G. Ettema¹

Science for Life Laboratory¹, University of Texas at Austin², University of California, Berkeley³, Aarhus University⁴, GNS Science⁵, Japan Agency for Marine-Earth Science and Technology⁶

19 Jan 2017 • Nature

TL;DR: The results expand the known repertoire of ‘eukaryote-specific’ proteins in Archaea, indicating that the archaeal host cell already contained many key components that govern eukaryotic cellular complexity.

Abstract: The origin and cellular complexity of eukaryotes represent a major enigma in biology. Current data support scenarios in which an archaeal host cell and an alphaproteobacterial (mitochondrial) endosymbiont merged together, resulting in the first eukaryotic cell. The host cell is related to Lokiarchaeota, an archaeal phylum with many eukaryotic features. The emergence of the structural complexity that characterizes eukaryotic cells remains unclear. Here we describe the 'Asgard' superphylum, a group of uncultivated archaea that, as well as Lokiarchaeota, includes Thor-, Odin- and Heimdallarchaeota. Asgard archaea affiliate with eukaryotes in phylogenomic analyses, and their genomes are enriched for proteins formerly considered specific to eukaryotes. Notably, thorarchaeal genomes encode several homologues of eukaryotic membrane-trafficking machinery components, including Sec23/24 and TRAPP domains. Furthermore, we identify thorarchaeal proteins with similar features to eukaryotic coat proteins involved in vesicle biogenesis. Our results expand the known repertoire of 'eukaryote-specific' proteins in Archaea, indicating that the archaeal host cell already contained many key components that govern eukaryotic cellular complexity.

789 citations

References

Basic Local Alignment Search Tool

[...]

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990 • Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
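
The maximal segment pair (MSP) is the highest-scoring pair of equal-length, ungapped segments taken from two sequences. The Python sketch below computes it exhaustively by running a maximum-subarray (Kadane) scan along every diagonal of the comparison matrix. It is not BLAST's algorithm, whose point is to approximate this score much faster, and the +1/-1 match/mismatch scores are an assumption chosen for illustration.

```python
def msp_score(s: str, t: str, match: int = 1, mismatch: int = -1) -> tuple[int, int, int, int]:
    """Return (score, start_in_s, start_in_t, length) of the maximal segment pair.

    Exhaustive O(len(s) * len(t)) search; ties keep the first pair found.
    """
    best = (0, 0, 0, 0)
    # Each diagonal holds the pairs (i, j) with i - j == offset.
    for offset in range(-(len(t) - 1), len(s)):
        i0 = max(offset, 0)
        j0 = i0 - offset
        run, run_start, k = 0, 0, 0
        while i0 + k < len(s) and j0 + k < len(t):
            pair = match if s[i0 + k] == t[j0 + k] else mismatch
            if run <= 0:
                run, run_start = pair, k  # restart the segment here
            else:
                run += pair               # extend the current segment
            if run > best[0]:
                best = (run, i0 + run_start, j0 + run_start, k - run_start + 1)
            k += 1
    return best


print(msp_score("GGACGTACC", "TTACGTATT"))  # (5, 2, 2, 5): the shared "ACGTA"
```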

88,255 citations

Fast gapped-read alignment with Bowtie 2

[...]

Ben Langmead¹, Steven L. Salzberg¹˒²˒³

University of Maryland, College Park¹, Johns Hopkins University School of Medicine², Johns Hopkins University³

01 Apr 2012 • Nature Methods

TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
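
The abstract credits the full-text minute index (FM index) for fast, memory-efficient alignment. As a hedged illustration of the core idea only, the Python sketch below builds a toy FM index (Burrows-Wheeler transform, C table, occurrence counts) and runs exact-match backward search. It uses a naive suffix sort and uncompressed count tables, and it omits the gapped dynamic-programming extension that Bowtie 2 adds; it is not Bowtie 2's implementation.

```python
def build_fm_index(text: str):
    """Build a toy FM index: suffix array, BWT, C table and occurrence counts."""
    text += "$"  # unique sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix sort
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps to the sentinel
    alphabet = sorted(set(text))
    # C[c] = number of characters in text strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][k] = occurrences of c in bwt[:k]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return sa, bwt, C, occ


def backward_search(pattern: str, sa, C, occ) -> list[int]:
    """Return the sorted start positions of pattern in the indexed text."""
    lo, hi = 0, len(sa)  # half-open interval of suffix-array rows
    for ch in reversed(pattern):  # extend the match one character leftwards
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, bwt, C, occ = build_fm_index("ACAACG")
print(bwt, backward_search("AC", sa, C, occ))  # GC$AAAC [0, 3]
```

Each pattern character narrows the suffix-array interval with two count lookups, so the search cost grows with pattern length rather than text length, which is what makes this kind of index attractive for read alignment.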

...read moreread less

37,898 citations

MUSCLE: multiple sequence alignment with high accuracy and high throughput

[...]

Robert C. Edgar

01 Mar 2004 • Nucleic Acids Research

TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.

Abstract: We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
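
The first stage named in the abstract is "fast distance estimation using kmer counting". The Python sketch below shows one such measure under a stated assumption: the distance is 1 minus the number of shared k-mers (counted with multiplicity) divided by the number of k-mers in the shorter sequence. The abstract does not give MUSCLE's exact formula, alphabet or k, so treat this as an illustration of the idea rather than MUSCLE's definition; the sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (word of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x: str, y: str, k: int = 3) -> float:
    """Return 1 - F, where F is the number of shared k-mers (with multiplicity)
    divided by the number of k-mers in the shorter sequence."""
    denom = min(len(x), len(y)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must be at least k residues long")
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    shared = sum(min(n, cy[word]) for word, n in cx.items())
    return 1.0 - shared / denom


# Hypothetical protein fragments; no alignment is needed to compare them.
seqs = {"s1": "MKVLAAGIL", "s2": "MKVLSAGIL", "s3": "PQRSTWYHE"}
names = sorted(seqs)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(kmer_distance(seqs[a], seqs[b]), 3))
```

Here `s1` and `s2` share 4 of their 7 three-letter words (distance about 0.429), while `s3` shares none with either (distance 1.0). Distances like these are cheap enough to compute for thousands of sequences before any alignment is attempted.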

37,524 citations

A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.

[...]

Stéphane Guindon¹, Olivier Gascuel¹

Centre national de la recherche scientifique¹

01 Oct 2003 • Systematic Biology

TL;DR: This work has used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches.

Abstract: The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www.lirmm.fr/w3ifa/MAAS/. (Algorithm; computer simulations; maximum likelihood; phylogeny; rbcL; RDPII project.)

The size of homologous sequence data sets has increased dramatically in recent years, and many of these data sets now involve several hundreds of taxa.
Moreover, current probabilistic sequence evolution models (Swofford et al., 1996; Page and Holmes, 1998), notably those including rate variation among sites (Uzzell and Corbin, 1971; Jin and Nei, 1990; Yang, 1996), require an increasing number of calculations. Therefore, the speed of phylogeny reconstruction methods is becoming a significant requirement and good compromises between speed and accuracy must be found. The maximum likelihood (ML) approach is especially accurate for building molecular phylogenies. Felsenstein (1981) brought this framework to nucleotide-based phylogenetic inference, and it was later also applied to amino acid sequences (Kishino et al., 1990). Several variants were proposed, most notably the Bayesian methods (Rannala and Yang 1996; and see below), and the discrete Fourier analysis of Hendy et al. (1994), for example. Numerous computer studies (Huelsenbeck and Hillis, 1993; Kuhner and Felsenstein, 1994; Huelsenbeck, 1995; Rosenberg and Kumar, 2001; Ranwez and Gascuel, 2002) have shown that ML programs can recover the correct tree from simulated data sets more frequently than other methods can. Another important advantage of the ML approach is the ability to compare different trees and evolutionary models within a statistical framework (see Whelan et al., 2001, for a review). However, like all optimality criterion-based phylogenetic reconstruction approaches, ML is hampered by computational difficulties, making it impossible to obtain the optimal tree with certainty from even moderate data sets (Swofford et al., 1996). Therefore, all practical methods rely on heuristics that obtain near-optimal trees in reasonable computing time. Moreover, the computation problem is especially difficult with ML, because the tree likelihood not only depends on the tree topology but also on numerical parameters, including branch lengths.
Even computing the optimal values of these parameters on a single tree is not an easy task, particularly because of possible local optima (Chor et al., 2000). The usual heuristic method, implemented in the popular PHYLIP (Felsenstein, 1993) and PAUP* (Swofford, 1999) packages, is based on hill climbing. It combines stepwise insertion of taxa in a growing tree and topological rearrangement. For each possible insertion position and rearrangement, the branch lengths of the resulting tree are optimized and the tree likelihood is computed. When the rearrangement improves the current tree or when the position insertion is the best among all possible positions, the corresponding tree becomes the new current tree. Simple rearrangements are used during tree growing, namely "nearest neighbor interchanges" (see below), while more intense rearrangements can be used once all taxa have been inserted. The procedure stops when no rearrangement improves the current best tree. Despite significant decreases in computing times, notably in fastDNAml (Olsen et al., 1994), this heuristic becomes impracticable with several hundreds of taxa. This is mainly due to the two-level strategy, which separates branch lengths and tree topology optimization. Indeed, most calculations are done to optimize the branch lengths and evaluate the likelihood of trees that are finally rejected. New methods have thus been proposed. Strimmer and von Haeseler (1996) and others have assembled four-taxon (quartet) trees inferred by ML, in order to reconstruct a complete tree. However, the results of this approach have not been very satisfactory to date (Ranwez and Gascuel, 2001). Ota and Li (2000, 2001) described
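
The passages above stress that ML must optimize numerical parameters such as branch lengths, and that PHYML relies on hill climbing. As a deliberately tiny illustration, the Python sketch below hill-climbs the single branch length separating two aligned DNA sequences under the Jukes-Cantor (JC69) model, an assumption chosen because its optimum has a closed form to check against. PHYML itself adjusts the topology and all branch lengths of a full tree simultaneously, which this sketch does not attempt.

```python
import math


def jc69_log_likelihood(t: float, n_sites: int, n_diff: int) -> float:
    """Log-likelihood (up to an additive constant) of branch length t, in expected
    substitutions per site, for two aligned DNA sequences under Jukes-Cantor."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e       # P(site unchanged along the branch)
    p_specific = 0.25 - 0.25 * e   # P(change to one particular other base)
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_specific)


def hill_climb_branch_length(n_sites: int, n_diff: int, t: float = 0.1,
                             step: float = 0.05, tol: float = 1e-9) -> float:
    """1-D hill climbing: accept a step whenever it raises the likelihood,
    otherwise halve the step, until the step falls below tol."""
    best = jc69_log_likelihood(t, n_sites, n_diff)
    while step > tol:
        for cand in (t + step, t - step):
            if cand > 0:  # branch lengths must stay positive
                ll = jc69_log_likelihood(cand, n_sites, n_diff)
                if ll > best:
                    t, best = cand, ll
                    break
        else:
            step /= 2.0  # neither direction improved the likelihood
    return t


def jc69_distance(n_sites: int, n_diff: int) -> float:
    """Closed-form ML estimate d = -(3/4) ln(1 - 4p/3); valid for p < 0.75."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


t_hat = hill_climb_branch_length(1000, 200)
print(round(t_hat, 5), round(jc69_distance(1000, 200), 5))
```

Because this one-parameter likelihood has a single peak, the climber should land on the closed-form value; on a real tree, many such parameters interact with topology, which is why the text notes the risk of local optima.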

16,261 citations

RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models

[...]

Alexandros Stamatakis¹

École Polytechnique Fédérale de Lausanne¹

01 Oct 2006 • Bioinformatics

TL;DR: RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML) that has been used to compute ML trees on two of the largest alignments to date.

Abstract: Summary: RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML). Low-level technical optimizations, a modification of the search algorithm, and the use of the GTR+CAT approximation as replacement for GTR+Γ yield a program that is between 2.7 and 52 times faster than the previous version of RAxML. A large-scale performance comparison with GARLI, PHYML, IQPNNI and MrBayes on real data containing 1,000 to 6,722 taxa shows that RAxML requires at least 5.6 times less main memory and yields better trees in similar times than the best competing program (GARLI) on datasets of up to 2,500 taxa. On datasets of ≥4,000 taxa it also runs 2-3 times faster than GARLI. RAxML has been parallelized with MPI to conduct parallel multiple bootstraps and inferences on distinct starting trees. The program has been used to compute ML trees on two of the largest alignments to date, containing 25,057 (1,463 bp) and 2,182 (51,089 bp) taxa, respectively. Availability: icwww.epfl.ch/~stamatak Contact: Alexandros.Stamatakis@epfl.ch Supplementary information: Supplementary data are available at Bioinformatics online.
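
The abstract mentions that RAxML runs multiple bootstraps in parallel with MPI. In the standard nonparametric bootstrap, each replicate is an alignment of the same size whose columns are drawn with replacement from the original, and a tree is inferred from each replicate. The Python sketch below shows only that resampling step, serially; the taxon names and sequences are hypothetical, and no tree inference is performed.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return one nonparametric bootstrap replicate of an alignment.

    Columns are drawn with replacement, and the same column indices are used
    for every taxon so that the rows stay aligned.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    n_cols = lengths.pop()
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


# Hypothetical three-taxon alignment.
aln = {"taxonA": "ACGTAC", "taxonB": "ACGTTC", "taxonC": "AGGTTA"}
rng = random.Random(42)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(aln, rng) for _ in range(100)]
print(replicates[0])
```

In a full analysis each replicate would be handed to the tree-inference program, and the fraction of replicate trees containing a given split is reported as its bootstrap support. Since replicates are independent, they parallelize naturally, which is the property the abstract's MPI parallelization exploits.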

14,847 citations