Structure and function of the global ocean microbiome

Revised Manuscript: Confidential 29 January 2015
Colors used throughout the revision of pre-edited manuscript
Edits by Science editor in blue
Edits by the Authors in green
Title: Structure and Function of the Global Ocean Microbiome
Authors:
Shinichi Sunagawa (1,†,*), Luis Pedro Coelho (1,†), Samuel Chaffron (2,3,4,†), Jens Roat Kultima (1),
Karine Labadie (5), Guillem Salazar (6), Bardya Djahanschiri (1), Georg Zeller (1), Daniel R. Mende (1),
Adriana Alberti (5), Francisco M. Cornejo-Castillo (6), Paul I. Costea (1), Corinne Cruaud (5),
Francesco d'Ovidio (7), Stefan Engelen (5), Isabel Ferrera (6), Josep M. Gasol (6), Lionel Guidi (8,9),
Falk Hildebrand (1), Florian Kokoszka (10,11), Cyrille Lepoivre (12), Gipsi Lima-Mendez (2,3,4),
Julie Poulain (5), Bonnie T. Poulos (13), Marta Royo-Llonch (6), Hugo Sarmento (6,14),
Sara Vieira-Silva (2,3,4), Céline Dimier (10,15,16), Marc Picheral (8,9), Sarah Searson (8,9),
Stefanie Kandels-Lewis (1,17), Tara Oceans coordinators, Chris Bowler (10), Colomban de Vargas (15,16),
Gabriel Gorsky (8,9), Nigel Grimsley (18,19), Pascal Hingamp (12), Daniele Iudicone (20),
Olivier Jaillon (5,26,27), Fabrice Not (15,16), Hiroyuki Ogata (21), Stephane Pesant (22,23),
Sabrina Speich (24,25), Lars Stemmann (8,9), Matthew B. Sullivan (13), Jean Weissenbach (5,26,27),
Patrick Wincker (5,26,27), Eric Karsenti (10,17,*), Jeroen Raes (2,3,4,*), Silvia G. Acinas (6,*),
Peer Bork (1,28,*)
Affiliations:
1 Structural and Computational Biology, European Molecular Biology Laboratory, Meyerhofstr. 1, 69117 Heidelberg, Germany.
2 Department of Microbiology and Immunology, Rega Institute, KU Leuven, Herestraat 49, 3000 Leuven, Belgium.
3 Center for the Biology of Disease, VIB, Herestraat 49, 3000 Leuven, Belgium.
4 Department of Applied Biological Sciences, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium.
5 CEA - Institut de Génomique, GENOSCOPE, 2 rue Gaston Crémieux, 91057 Evry, France.
6 Department of Marine Biology and Oceanography, Institute of Marine Science (ICM)-CSIC, Pg. Marítim de la Barceloneta, 37-49, Barcelona E08003, Spain.
7 Sorbonne Universités, UPMC, Univ Paris 06, CNRS-IRD-MNHN, LOCEAN Laboratory, 4 Place Jussieu, 75005, Paris, France.
8 CNRS, UMR 7093, LOV, Observatoire Océanologique, F-06230, Villefranche-sur-mer, France.
9 Sorbonne Universités, UPMC Univ Paris 06, UMR 7093, LOV, Observatoire Océanologique, F-06230, Villefranche-sur-mer, France.
10 Ecole Normale Supérieure, Institut de Biologie de l’ENS (IBENS), and Inserm U1024, and CNRS UMR 8197, Paris, F-75005 France.
11 Laboratoire de Physique des Océans, UBO-IUEM, Place Copernic, 29820 Plouzané, France.
12 Aix Marseille Université, CNRS, IGS UMR 7256, 13288 Marseille, France.
13 Department of Ecology and Evolutionary Biology, University of Arizona, 1007 E Lowell Street, Tucson, AZ, 85721, USA.
14 Department of Hydrobiology, Federal University of São Carlos (UFSCar), Rodovia Washington Luiz, 13565-905 São Carlos, SP, Brazil.
15 CNRS, UMR 7144, Station Biologique de Roscoff, Place Georges Teissier, 29680 Roscoff, France.
16 Sorbonne Universités, UPMC Univ Paris 06, UMR 7144, Station Biologique de Roscoff, Place Georges Teissier, 29680 Roscoff, France.
17 Directors’ Research, European Molecular Biology Laboratory, Heidelberg, Germany.
18 CNRS UMR 7232, BIOM, Avenue du Fontaulé, 66650 Banyuls-sur-Mer, France.
19 Sorbonne Universités Paris 06, OOB UPMC, Avenue du Fontaulé, 66650 Banyuls-sur-Mer, France.
20 Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy.
21 Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, 611-001, Japan.
22 PANGAEA, Data Publisher for Earth and Environmental Science, University of Bremen, Bremen, Germany.
23 MARUM, Center for Marine Environmental Sciences, University of Bremen, Bremen, Germany.
24 Department of Geosciences, Laboratoire de Météorologie Dynamique (LMD), Ecole Normale Supérieure, 24 rue Lhomond, 75231 Paris Cedex 05, France.
25 Laboratoire de Physique des Océans, UBO-IUEM, Place Copernic, 29820 Plouzané, France.
26 CNRS, UMR 8030, CP5706, Evry, France.
27 Université d'Evry, UMR 8030, CP5706, Evry, France.
28 Max-Delbrück-Centre for Molecular Medicine, 13092 Berlin, Germany.
Tara Oceans coordinators and affiliations are listed at the end of this manuscript.
† These authors contributed equally to this work.
* Correspondence to: sunagawa@embl.de; karsenti@embl.de; jeroen.raes@vib-kuleuven.be;
sacinas@icm.csic.es; bork@embl.de

Abstract: Microbes are dominant drivers of biogeochemical processes, yet drawing a global picture of
functional diversity, microbial community structure and their ecological determinants remains a grand
challenge. We analyzed 7.2 terabases of metagenomic data from 243 Tara Oceans samples from 68
locations in epipelagic and mesopelagic waters across the globe to generate an ocean microbial reference
gene catalog with >40 million non-redundant, mostly novel sequences from viruses, prokaryotes and
picoeukaryotes. Using 139 prokaryote-enriched samples, containing >35,000 species, we show vertical
stratification with epipelagic community composition mostly driven by temperature rather than other
environmental factors or geography. We identify ocean microbial core functionality and reveal that a
surprisingly high fraction of its abundance (>73%) is shared with the human gut microbiome, given the
physicochemical differences between these two ecosystems.
One Sentence Summary: Tara Oceans provides a gene catalogue and analysis of ocean microbes in their
environmental context across three depth layers at global scale.
Main Text: Microorganisms are ubiquitous in the ocean environment, where they play key roles in
biogeochemical processes, such as carbon and nutrient cycling (1). With an estimated 10^4 to 10^6 cells per
milliliter, their biomass, combined with high turnover rates and environmental complexity, provides the
grounds for immense genetic diversity (2). These microorganisms, and the communities they form, drive
and respond to changes in the environment, including climate change-associated shifts in temperature,
carbon chemistry, nutrient and oxygen content, and alterations in ocean stratification and currents (3).
With recent advances in community DNA shotgun sequencing (metagenomics) and computational
analysis it is now possible to access the taxonomic and genomic content (microbiome) of ocean microbial
communities, and thus, to study their structural patterns, diversity and functional potential (4, 5). The
Sorcerer II Global Ocean Sampling (GOS) expedition, for example, collected, sequenced and analyzed
6.3 gigabases (Gb) of DNA from surface water samples along a transect from the Northwest Atlantic to
the Eastern Tropical Pacific (6, 7), which indicated that the vast majority of the global ocean microbiome
still remained to be uncovered (7). Nevertheless, the GOS project facilitated the study of surface
picoplanktonic communities from these regions by providing a large-scale ocean metagenomic data set to
the scientific community. Several studies have demonstrated that such data could, in principle, identify
relationships between gene functional compositions and environmental factors (8-10). However, an
extended breadth of sampling (e.g., across depth layers, domains of life, organismal size-classes, and
around the globe) combined with in situ measured environmental data could provide a global context and
minimize potential confounders.
To this end, Tara Oceans systematically collected ca. 35,000 samples for morphological, genetic and
environmental analyses using standardized protocols across multiple depths at global scale, aiming to
facilitate a holistic study of how environmental factors and biogeochemical cycles affect oceanic life
(11). Here we report the initial analysis of 243 ocean microbiome samples, collected at 68 locations
representing all main oceanic regions (except for the Arctic) from three depth layers, which were
subjected to metagenomic Illumina sequencing. By integrating these data with those from publicly
available ocean metagenomes and reference genomes, we assembled and annotated a reference gene
catalog, which we use in combination with phylogenetic marker genes (12, 13) to derive global patterns
of functional and taxonomic microbial community structures. The vast majority of genes uncovered in
Tara Oceans samples had not previously been identified, with particularly high fractions of novel genes in
the Southern Ocean and in the twilight (mesopelagic) zone. By correlating genomic and environmental
features, we infer that temperature, which we decoupled from dissolved oxygen, is the strongest
environmental factor shaping microbiome composition in the sunlit, epipelagic ocean layer. Furthermore,
we define a core set of gene families that are ubiquitous in the ocean and differentiate variable, adaptive
functions from stable core functions, which are compared between ocean depth layers and to those in the
human gut microbiome.

Ocean Microbial Reference Gene Catalog
To capture the genomic content of prevalent microbiota across major oceanic regions (Fig. 1A), Tara
Oceans collected seawater samples within the epipelagic layer, both from the surface water and the deep
chlorophyll maximum (DCM) layers, as well as the mesopelagic zone (14). From 68 selected locations,
243 size-fractionated samples targeting organisms up to 3 µm (virus-enriched fraction (<0.2 µm): n=45;
girus/prokaryote-enriched fractions (0.1-0.2 µm, 0.2-0.45 µm, 0.45-0.8 µm): n=59; prokaryote-enriched
fractions (0.2-1.6 µm, 0.2-3 µm): n=139) were paired-end shotgun Illumina sequenced to generate a total
of more than 7.2 terabases (29.6 ± 12.7 Gb per sample) of metagenomic data (14), which is of the
same order of magnitude as the data from the US Human Microbiome Project (phase I) and the
European Metagenomics of the Human Intestinal Tract project combined (15-17).
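As a quick consistency check of the numbers above, the three size-fraction counts should sum to the 243 sequenced samples, and the mean sequencing depth per sample times the number of samples should land near the reported total. This is only arithmetic on the reported values; because the mean (29.6 Gb) is rounded, the product is expected to match 7.2 Tb only approximately.

```python
# Samples per size fraction, as reported in the text
fraction_counts = {
    "virus-enriched (<0.2 um)": 45,
    "girus/prokaryote-enriched (0.1-0.8 um)": 59,
    "prokaryote-enriched (0.2-1.6 um, 0.2-3 um)": 139,
}
n_samples = sum(fraction_counts.values())

mean_gb_per_sample = 29.6  # reported mean sequencing depth per sample, in Gb
total_tb = n_samples * mean_gb_per_sample / 1000  # 1 Tb = 1,000 Gb

print(f"{n_samples} samples, ~{total_tb:.2f} Tb")  # prints: 243 samples, ~7.19 Tb
```

The result (about 7.19 Tb) agrees with the reported total of roughly 7.2 Tb within the rounding of the per-sample mean.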
To generate an ocean microbial reference gene catalog (see also (17, 18)), we first reconstructed the
genomic content of these new data by metagenomic assembly and gene prediction (19), and combined
these results with publicly available ocean metagenomic data and reference genomes (14). Specifically,
approximately 111.5 million (M) protein-coding nucleotide sequences were predicted from Tara Oceans
metagenomes, which were clustered at 95% nucleotide sequence identity with 24.4 M sequences from
other ocean metagenomes and 1.6 M sequences from ocean prokaryotic (n=433) and viral (n=121)
reference genomes (14). This resulted in a global Ocean Microbial Reference Gene Catalog (OM-RGC),
which comprises >40 M non-redundant representative genes from viruses, prokaryotes and
picoeukaryotes (Fig. 1B).
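The clustering step can be illustrated with a minimal sketch of greedy, centroid-based clustering at 95% nucleotide identity, the general strategy used by tools such as CD-HIT and UCLUST. The text does not specify the tool or its parameters here, so everything below is an assumption for illustration: in particular, the hypothetical `ungapped_identity` compares sequences position by position without alignment, whereas real tools compute identity from alignments and usually also require a minimum coverage. `greedy_cluster` is likewise a hypothetical name.

```python
from typing import Dict, List


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter sequence, with both
    sequences anchored at position 0 (no alignment; a simplification)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / n


def greedy_cluster(seqs: Dict[str, str], min_identity: float = 0.95) -> Dict[str, List[str]]:
    """Process sequences longest-first; each joins the first existing centroid it
    matches at >= min_identity, otherwise it becomes a new centroid
    (i.e. a new non-redundant representative)."""
    clusters: Dict[str, List[str]] = {}
    for seq_id in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for centroid_id in clusters:
            if ungapped_identity(seqs[seq_id], seqs[centroid_id]) >= min_identity:
                clusters[centroid_id].append(seq_id)
                break
        else:  # no centroid matched: start a new cluster
            clusters[seq_id] = [seq_id]
    return clusters


# Toy input: two near-identical genes (99% identity) and one unrelated gene
genes = {
    "tara_1": "ATGC" * 25,
    "tara_2": "ATGC" * 24 + "ATGA",
    "ref_1": "GGCC" * 25,
}
clusters = greedy_cluster(genes)
print(len(clusters))  # 2 non-redundant representatives
```

Each key of the returned dictionary plays the role of a non-redundant representative gene. Processing longer sequences first lets them act as centroids, which is the usual choice in greedy clustering.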
Compared to a human gut microbial reference gene catalog (18), the OM-RGC comprises more than four
times the number of genes, most of which (72.3% of the annotated fraction) appear prokaryotic (Fig. 1B).
In total, 81.4% of the genes were exclusive to Tara Oceans samples, with only 5.11% and 0.44%
overlapping with GOS sequences and reference genomes, respectively (Fig. 1B), which highlights the
extent of the unexplored genomic potential in our oceans. Rarefaction analysis showed that the rate of
new gene detection decreased to 0.01% by the end of sampling (Fig. 1C), suggesting that the abundant
microbial sequence space is well represented, at least for the targeted size ranges, sampling
locations and depths. Genes found in only one sample amounted to 3.6% and may originate from
localized specialists.
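The rarefaction and singleton statistics can be sketched as follows, assuming each sample is represented as the set of catalog gene IDs detected in it. How the "rate of new gene detection" is normalized is not stated here; this sketch assumes it is the number of newly detected genes divided by the size of the accumulated catalog after each added sample, averaged over random sample orders. The function names are hypothetical.

```python
import random
from typing import Dict, List, Set


def new_gene_rate_curve(samples: List[Set[str]], n_permutations: int = 100,
                        seed: int = 0) -> List[float]:
    """Mean fraction of the accumulated catalog contributed by the k-th added
    sample, averaged over random sample orders (assumed normalization)."""
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    for _ in range(n_permutations):
        order = samples[:]
        rng.shuffle(order)
        seen: Set[str] = set()
        for k, genes in enumerate(order):
            new_genes = genes - seen
            seen |= genes
            totals[k] += len(new_genes) / len(seen) if seen else 0.0
    return [t / n_permutations for t in totals]


def singleton_fraction(samples: List[Set[str]]) -> float:
    """Fraction of distinct genes detected in exactly one sample."""
    counts: Dict[str, int] = {}
    for genes in samples:
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


# Toy example: three samples that each share genes with the others
toy = [{"g1", "g2"}, {"g2", "g3"}, {"g3", "g1"}]
print(new_gene_rate_curve(toy, n_permutations=10))
print(singleton_fraction([{"g1", "g2"}, {"g2"}]))  # g1 is a singleton -> 0.5
```

A curve that flattens towards zero, as reported for Fig. 1C, indicates that additional samples mostly re-detect genes already in the catalog.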
To complement the work of Tara Oceans Consortium partners who analyzed viral and protist-enriched
size fractions (20, 21) and integrated data across domains of life (22, 23), we focused our analyses on 139
prokaryote-enriched samples, which included: 63 surface water samples (5 m; s.d. 0 m), 46 epipelagic
subsurface water samples mostly from the DCM (71 m; s.d. 41 m), and 30 mesopelagic samples (600 m;
s.d. 220 m). Using this set, we found that gene novelty generally increased from surface to DCM
waters and remained relatively stable across ocean regions, with about half of the genes being novel
overall. The exceptions to this pattern were Southern Ocean (SO) and mesopelagic samples, with about 80%
and 90% novelty, respectively. In addition to higher novelty in hitherto uncharted regions, these
patterns likely reflect the detection of rare organisms by deep sequencing, although they could also be due to
seasonal and locational differences in sampling within relatively well-studied regions.
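Per-layer gene novelty can be summarized in a similar way. This sketch assumes that a gene counts as novel if it was not clustered with any sequence from previously available metagenomes or reference genomes; those genes are passed in as a hypothetical set `known`. The per-sample novel fraction is then averaged within each depth layer. The function name and the layer labels are illustrative.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Set, Tuple


def novelty_by_layer(samples: List[Tuple[str, Set[str]]],
                     known: Set[str]) -> Dict[str, float]:
    """samples: (depth layer, gene IDs detected in that sample) pairs.
    known: hypothetical set of catalog genes that also occur in previously
    available metagenomes or reference genomes.
    Returns the mean per-sample fraction of novel genes for each layer."""
    per_layer: Dict[str, List[float]] = defaultdict(list)
    for layer, genes in samples:
        if genes:  # skip empty samples to avoid division by zero
            per_layer[layer].append(len(genes - known) / len(genes))
    return {layer: mean(fractions) for layer, fractions in per_layer.items()}


known_genes = {"g1", "g2"}
toy_samples = [
    ("SRF", {"g1", "g2", "g3", "g4"}),  # 2 of 4 genes novel
    ("SRF", {"g1", "g5"}),              # 1 of 2 genes novel
    ("MES", {"g1", "g6", "g7", "g8"}),  # 3 of 4 genes novel
]
print(novelty_by_layer(toy_samples, known_genes))  # {'SRF': 0.5, 'MES': 0.75}
```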
To put the degree of taxonomic novelty into context, we extracted a total of >14 M metagenomic 16S
rRNA gene tags (16S miTags; (12)) and mapped these to operational taxonomic units (OTUs) based on
97% sequence identity clustering of reference 16S sequences (24). This cutoff has been commonly used
to group taxa at the species level, although it may rather represent clades somewhere between the species
and genus levels (25). The fraction of 16S miTags not matching any reference OTU also increased with
depth, but was on average only 5.5% (14). Thus, although the vast majority of prokaryotic clades detected
in Tara Oceans metagenomes had already been captured by 16S rRNA sequencing, the OM-RGC now
provides a link to their genomic content.
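Mapping miTags to reference OTUs amounts to a closed-reference assignment: each tag is assigned to its best-matching reference OTU only if the identity reaches the 97% cutoff, and otherwise counts as unmatched. The sketch below assumes pre-trimmed tags and uses ungapped identity as a stand-in for the alignment-based identity a real pipeline would compute; it does not reproduce the miTag method of (12). All names are hypothetical.

```python
from typing import Dict, List, Optional


def identity(a: str, b: str) -> float:
    """Ungapped identity over the shorter sequence (stand-in for an alignment)."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0


def assign_otu(tag: str, otu_refs: Dict[str, str],
               min_identity: float = 0.97) -> Optional[str]:
    """Best-matching reference OTU if it reaches min_identity, else None."""
    best_id, best_score = None, 0.0
    for otu_id, ref_seq in otu_refs.items():
        score = identity(tag, ref_seq)
        if score > best_score:
            best_id, best_score = otu_id, score
    return best_id if best_score >= min_identity else None


def unmatched_fraction(tags: List[str], otu_refs: Dict[str, str]) -> float:
    """Fraction of tags that do not match any reference OTU at the cutoff."""
    if not tags:
        return 0.0
    return sum(assign_otu(t, otu_refs) is None for t in tags) / len(tags)


refs = {"OTU_1": "A" * 100, "OTU_2": "C" * 100}
tags = ["A" * 100, "A" * 98 + "GG", "G" * 100, "C" * 99 + "T"]
print(unmatched_fraction(tags, refs))  # only the all-G tag is unmatched -> 0.25
```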

Diversity and Stratification of the Ocean Microbiome
Given the global scale of Tara Oceans samples, we interrogated our data set for the composition and
stratifying factors of ocean microbial communities. Taxonomic and phylogenetic diversity were highly
correlated (R² = 0.96) (14), and 16S miTags identified in our metagenomic data set mapped to a total of
35,650 OTUs (on average 2,937 OTUs per sample; s.d. 585 OTUs). The total richness estimate of 37,470 is
comparable to the numbers from a previous study, which detected about 44,500 OTUs based on PCR-amplified
16S tags from 356 globally distributed pelagic samples (26) that were collected in the context of the
International Census of Marine Microbes (ICoMM) project (27). At the phylum level, more than 93% of
16S miTags could be annotated. We found typical members of Proteobacteria, including the ubiquitous clades SAR11
(Alphaproteobacteria) and SAR86 (Gammaproteobacteria), to dominate the sampled areas of the ocean
both in terms of relative abundance and taxonomic richness (28, 29). Cyanobacteria, Deferribacteres and
Thaumarchaeota were also abundant, although the taxonomic richness within these phyla was smaller
(Fig. 2). Photosynthetic cyanobacterial taxa such as Prochlorococcus and Synechococcus were detected in
all mesopelagic samples and contributed about 1% of the abundance (Fig. 2), which is in line with
previous reports suggesting a role for cyanobacteria in sinking particle flux (30).
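The text does not state which richness estimator yields the total of 37,470 OTUs. As an illustration only, the bias-corrected Chao1 estimator, a common choice for extrapolating total richness from the number of rarely observed OTUs, is sketched below; treat it as an assumption rather than the method used here. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def chao1_bias_corrected(otu_counts: Iterable[int]) -> float:
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1)),
    where F1 and F2 are the numbers of OTUs seen exactly once and twice."""
    observed = [c for c in otu_counts if c > 0]
    freq = Counter(observed)
    f1, f2 = freq[1], freq[2]
    return len(observed) + f1 * (f1 - 1) / (2 * (f2 + 1))


# Five observed OTUs with two singletons and one doubleton: 5 + 2*1/(2*2) = 5.5
print(chao1_bias_corrected([1, 1, 2, 3, 5]))  # 5.5
```

The estimate exceeds the observed count only when singletons are present, which is why an extrapolated total such as 37,470 can be somewhat larger than the 35,650 OTUs actually observed.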
To explore the overall variability in community composition, we performed a principal coordinate
analysis (PCoA), which revealed that depth explained 73% of the variance (PC1 in Fig. 3A). This is
consistent with several studies that have reported a vertical stratification of microbial taxa and viruses
according to changes in physico-chemical parameters, such as light, temperature and nutrients (31, 32).
Given the vertical stratification, we further characterized taxonomic and functional richness, between-sample
dissimilarity (β-diversity), total cell abundance and potential growth rates across the three depth
layers. Our results revealed an increase in both taxonomic and functional richness with depth, while cell
abundance, as measured by flow cytometry, and potential maximum growth rates (33) decreased with
depth (Fig. 3B).
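The principal coordinate analysis mentioned above can be sketched with NumPy. The text does not name the dissimilarity measure, so Bray-Curtis is assumed here as a common choice for abundance data. In classical PCoA, the share of variance on PC1 (73% for depth in Fig. 3A) is the first positive eigenvalue of the double-centered matrix divided by the sum of all positive eigenvalues. Function names are hypothetical.

```python
import numpy as np


def bray_curtis(abund: np.ndarray) -> np.ndarray:
    """Pairwise Bray-Curtis dissimilarity between rows (samples) of an abundance matrix."""
    n = abund.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(abund[i] - abund[j]).sum()
            den = (abund[i] + abund[j]).sum()
            d[i, j] = d[j, i] = num / den if den > 0 else 0.0
    return d


def pcoa(dist: np.ndarray, k: int = 2):
    """Classical PCoA: double-center -0.5 * D^2, eigendecompose, and return the
    first k sample coordinates plus the fraction of variance per axis
    (computed over positive eigenvalues only)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)          # ascending order
    order = np.argsort(eigvals)[::-1]             # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = eigvals > 0
    explained = eigvals[positive] / eigvals[positive].sum()
    coords = eigvecs[:, :k] * np.sqrt(np.maximum(eigvals[:k], 0.0))
    return coords, explained[:k]


# Toy data: two "surface" and two "mesopelagic" samples over three taxa
abund = np.array([[10, 0, 1], [9, 1, 1], [0, 10, 1], [1, 9, 1]], dtype=float)
coords, explained = pcoa(bray_curtis(abund))
print(coords[:, 0], explained)
```

In this toy example, PC1 separates the two groups and carries most of the variance, which is the kind of pattern the depth-driven PC1 in Fig. 3A summarizes.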
Although increasing species richness from the surface to the mesopelagic has been reported locally, e.g.,
in the Mediterranean Sea (34), our findings emphasize the global relevance of this pattern. The observed
increase in taxonomic and functional richness may reflect diversified species adapted to a wider range of
niches, such as particle-associated micro-environments in the mesopelagic zone (35). In addition, slower
growth, due to more limited carbon sources in the mesopelagic zone, and higher motility have been
suggested to reduce predation by flagellates and ciliates as well as viral infection rates (36). Indeed, our
metagenomic analysis provides molecular support for these models by identifying a significant
enrichment of chemotaxis and motility genes in the mesopelagic zone (see below).

Environmental Drivers of Community Composition
A key question in ocean microbial ecology is the extent to which limited dispersal and historical contingency
on the one hand, and global dispersal combined with selection by environmental factors on the other, are
responsible for contemporary biogeographic patterns (4, 5). The relationship between absolute latitude
and biodiversity is an example of such a pattern, although it is still controversial: while some authors
found a negative correlation (37), others reported maxima at intermediate latitudes (10, 38). The
latter is supported by our findings (Fig. 4A): richness increased with temperature from 4 °C to about
12 °C, followed by a negative correlation for the remainder of the sampled temperature range
(up to 30 °C). This is also congruent with previous reports on oceanic groups of eukaryotes (39). A
modeling study predicted season as a driver of biodiversity (40). For our data, however, the association of
richness with temperature and latitude is robust to the confounding effect of seasonality (partial Mantel
test, p-value < 0.01), although more data are needed for a rigorous statistical evaluation of such questions,
for example, by periodically sampling the ocean across the globe on the same day (41). In addition to
latitudinal biodiversity patterns, we found taxonomic community dissimilarity to increase with geographic
distance up to about 5,000 km within an ocean region (Fig. 4B). Together, these findings support
biogeographic patterns of microbial communities in line with a number of previous studies (10, 37, 38).
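A distance-decay analysis like the one in Fig. 4B can be sketched by computing great-circle (haversine) distances between sampling sites and averaging community dissimilarity within distance bins, optionally restricted to pairs of samples from the same ocean region. The bin width, the mean Earth radius of 6,371 km and the function names are assumptions for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float,
                 radius_km: float = 6371.0) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def distance_decay(coords: List[Tuple[float, float]], dissim: List[List[float]],
                   regions: Optional[List[str]] = None,
                   bin_km: float = 1000.0) -> Dict[float, float]:
    """Mean pairwise community dissimilarity per distance bin, keyed by the bin's
    lower edge in km; if regions is given, only within-region pairs are used."""
    bins: Dict[int, List[float]] = {}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if regions is not None and regions[i] != regions[j]:
                continue  # skip pairs from different ocean regions
            d = haversine_km(*coords[i], *coords[j])
            bins.setdefault(int(d // bin_km), []).append(dissim[i][j])
    return {b * bin_km: sum(v) / len(v) for b, v in sorted(bins.items())}


sites = [(0.0, 0.0), (0.0, 1.0), (0.0, 20.0)]        # (latitude, longitude)
dissim = [[0.0, 0.1, 0.6], [0.1, 0.0, 0.5], [0.6, 0.5, 0.0]]
print(distance_decay(sites, dissim))
```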
To further investigate the underlying mechanisms, we focused on surface samples only and tested whether
samples were more similar within than across ocean regions. If dispersal limitation rather than
environmental selection dominated, we would expect a higher similarity within than across ocean regions.
On the other hand, if environmental selection explained biogeographic patterns, we would expect
environmental factors to correlate with community similarity. Previous studies on selected ocean
microbial taxa have shown a strong impact of light and temperature (42). For entire community
assemblages, however, expectations are less clear. In a large-scale meta-analysis, salinity has been
suggested as the major determinant across many (including ocean) ecosystems, exceeding the influence of
temperature (43). In contrast to this, an analysis of functional trait composition in ocean environments
suggested temperature and light to have stronger effects than nutrients or salinity (10, 44).
A PCoA of taxonomic compositions of surface samples does not show a clear separation by regional
origin, despite showing on average a higher similarity of communities within than across ocean regions
(Fig. 5A). Instead, temperature was found to strongly correlate with PC1 (R² = 0.76). Thus, to identify
environmental drivers in our data set, we correlated geographic distance-corrected dissimilarities of
taxonomic and functional community composition with those of environmental factors (Fig. 5B). Overall,
temperature and dissolved oxygen were the strongest correlates of both taxonomic and functional
composition in the epipelagic layer (Fig. 5B and below), while no significant correlation was found for
salinity. Nutrients were only weakly correlated, and after the removal of a few extreme locations with very
low temperatures, these correlations were no longer statistically significant, except for silicate.
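Geographic distance-corrected correlations of this kind are commonly computed with a partial Mantel test. The sketch below shows one standard formulation, assumed here rather than taken from the paper: it computes the partial Pearson correlation between community and environmental distances given geographic distance, and estimates a one-sided p-value by jointly permuting the rows and columns of the community matrix. The default of 9,999 permutations is an illustrative choice, and the function names are hypothetical.

```python
import numpy as np


def _upper(m: np.ndarray) -> np.ndarray:
    """Upper-triangle entries of a square distance matrix (diagonal excluded)."""
    return m[np.triu_indices_from(m, k=1)]


def _partial_r(x: np.ndarray, y: np.ndarray, z: np.ndarray) -> float:
    """Partial Pearson correlation of x and y, controlling for z."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return float((rxy - rxz * ryz) / np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2)))


def partial_mantel(d_comm: np.ndarray, d_env: np.ndarray, d_geo: np.ndarray,
                   n_perm: int = 9999, seed: int = 0):
    """Partial Mantel r between community and environmental distances given
    geographic distance, with a one-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    y, z = _upper(d_env), _upper(d_geo)
    r_obs = _partial_r(_upper(d_comm), y, z)
    n = d_comm.shape[0]
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permute sample labels of the community matrix (rows and columns together)
        if _partial_r(_upper(d_comm[np.ix_(p, p)]), y, z) >= r_obs:
            exceed += 1
    return r_obs, (exceed + 1) / (n_perm + 1)
```

For example, `partial_mantel(d_taxa, d_temperature, d_geo)` would test whether taxonomic dissimilarity tracks temperature differences beyond what geographic distance explains; all three inputs are square, symmetric matrices over the same samples.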
Finally, we tackled the challenge of disentangling the high correlation between temperature and dissolved
oxygen (R² = 0.87) in surface waters. To this end, we first used a machine learning-based approach (45) to
independently model associations of each of these two factors with taxonomic/functional composition
within surface samples (Fig. 6A). We then tested the strength of these associations in DCM layers, where
correlations between the two factors were much weaker (R² = 0.16), which allowed us to effectively
decouple dissolved oxygen from temperature. The surface-fitted model of temperature continued to
achieve high prediction accuracy when applied to the DCM layers, whereas the oxygen model did not
generalize across depths. To illustrate the strength of these associations, we show that temperature
could be predicted with an explained variance of 86% using only species abundances as input (Fig.
6B). These results were validated using data from the GOS project (R² = 0.66) despite a number of
differences in sampling and sequencing procedures between these two studies (Fig. 6B).
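The decoupling strategy (fit a model on surface samples, then test it on DCM samples, where temperature and oxygen are less correlated) can be sketched as follows. The text only refers to a machine learning-based approach (45); the reference list includes scikit-learn and the elastic net, but the exact model and settings are not confirmed here, so the choice of model is an assumption. This sketch implements an elastic net by coordinate descent in NumPy, using the same objective parameterization (`alpha`, `l1_ratio`) as scikit-learn's `ElasticNet`, and reports explained variance (R²) on held-out samples. The synthetic data are invented stand-ins for species abundance profiles, and all function names are hypothetical.

```python
import numpy as np


def soft_threshold(x: float, t: float) -> float:
    """Lasso shrinkage operator: sign(x) * max(|x| - t, 0)."""
    return float(np.sign(x)) * max(abs(x) - t, 0.0)


def fit_elastic_net(X: np.ndarray, y: np.ndarray, alpha: float = 0.01,
                    l1_ratio: float = 0.5, n_iter: int = 500) -> dict:
    """Coordinate descent for
    (1/2n)||y - b - Xs w||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2,
    where Xs is X standardized with training-set means and standard deviations."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    Xs = (X - mu) / sd
    n, p = Xs.shape
    b = y.mean()               # exact intercept because Xs columns are centered
    w = np.zeros(p)
    r = y - b                  # residual for w = 0
    col_sq = (Xs ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r += Xs[:, j] * w[j]                   # remove feature j from the fit
            rho = Xs[:, j] @ r / n
            denom = col_sq[j] + alpha * (1 - l1_ratio)
            w[j] = soft_threshold(rho, alpha * l1_ratio) / denom if denom > 0 else 0.0
            r -= Xs[:, j] * w[j]                   # add the updated feature back
    return {"w": w, "b": b, "mu": mu, "sd": sd}


def predict(model: dict, X: np.ndarray) -> np.ndarray:
    return ((X - model["mu"]) / model["sd"]) @ model["w"] + model["b"]


def r_squared(y: np.ndarray, y_hat: np.ndarray) -> float:
    return 1.0 - float(((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum())


# Synthetic stand-in data: 20 "species" whose abundances respond linearly to temperature
rng = np.random.default_rng(0)
loadings = rng.normal(size=20)
t_surface = rng.uniform(5, 30, 80)
X_surface = np.outer(t_surface, loadings) + rng.normal(scale=2.0, size=(80, 20))
t_dcm = rng.uniform(5, 25, 40)
X_dcm = np.outer(t_dcm, loadings) + rng.normal(scale=2.0, size=(40, 20))

model = fit_elastic_net(X_surface, t_surface)      # train on surface samples only
print(r_squared(t_dcm, predict(model, X_dcm)))     # evaluate on DCM samples
```

The key design choice is the split: because the model never sees DCM samples during fitting, a high R² there suggests that the learned association carries over across depth layers, which is the argument made for temperature (and not for oxygen) in the text.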
Taken together, our data suggest that geographic distance plays a subordinate role and reveal temperature to
be the major environmental factor shaping taxonomic and functional microbial community
compositions in the photic open ocean. Thus, a global dispersal potential for microorganisms (46) and
subsequent environmental selection may, at least for some taxa, represent a mechanism for driving
patterns of microbial biogeography. At the same time, localized adaptations by natural selection will lead
to differences in spatially distant populations of phylogenetically similar organisms, and characterizing
these variations at strain-level resolution represents an important challenge for the future.

Core Functional Analysis Between Ecosystems
The generation of non-redundant gene abundance profiles from a large number (e.g., >100) of samples
can be used to define a set of gene families, as a proxy for gene-encoded functions, which are
ubiquitously found (core) in microbial communities. Such an analysis was performed for the human gut
(17), which represents a fundamentally different microbial ecosystem (anoxic, host-associated, dominated
by heterotrophs). However, due to the lack of other large-scale, ecosystem-wide metagenomic data sets, it
has remained unknown how many of these core functions are shared with any other ecosystem. Thus, we first

References
Scikit-learn: Machine Learning in Python.
The SILVA ribosomal RNA gene database project: improved data processing and web-based tools.
Robert C. Edgar, Search and clustering orders of magnitude faster than BLAST (01 Oct 2010).
Regularization and variable selection via the elastic net.

Genes identified in their study were clustered together with >26 M sequences from publicly available data (external genes; see (14)) to yield a set of >40 M reference genes (top left), which equals more than four times the number of genes in the human gut microbial reference gene catalog (top right). 

Edge width corresponds to the Mantel’s r statistic for the corresponding distance correlations and edge color denotes the statistical significance based on 9,999 permutations. 

With increasing sample size, the number of shared orthologous groups decreased first rapidly, then more gradually to a minimum of 5,755 OGs at 139 samples, which was considered the set of ocean core OGs. 

Taxonomic (based on two independent methods: mitags (12) and mOTUs (13)) and functional (based on biochemical KEGG modules) community composition was related to each environmental factor by partial (geographic distance-corrected) 

Using temperature prediction models trained at genus level using Tara Oceans data, the authors show (inset) that the results could be validated at relatively high accuracy given the large differences in sampling and sequencing methods between these two studies. 

The authors further declare that all data reported herein are fully and freely available from the date of publication, with no restrictions, and that all of the samples, analyses, publications, and ownership of data are free from legal entanglement or restriction of any sort by the various nations whose waters the Tara Oceans expedition sampled in.