
Reply to: Examining microbe-metabolite correlations by linear methods

Authors: Morton, James T; McDonald, Daniel; Aksenov, Alexander A; Nothias, Louis Felix; Foulds, James R; Quinn, Robert A; Badri, Michelle H; Swenson, Tami L; Van Goethem, Marc W; Northen, Trent R; Vazquez-Baeza, Yoshiki; Wang, Mingxun; Bokulich, Nicholas A; Watters, Aaron; Song, Se Jin; Bonneau, Richard; Dorrestein, Pieter C; Knight, Rob

Summary

Introduction

  • The authors have found that MMvec is a powerful discovery tool, as demonstrated by the other real datasets they evaluated in the original article.
  • It is critical that the authors provide accurate guidance to the community so that scenarios where one method works better than others are well understood.
  • While there may be scenarios where linear methods outperform neural networks, the authors show that there are scenarios where neural networks outperform linear methods.
  • Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41592-020-01007-0.

Methods

  • The simulations were created by using the generative form of MMvec; the microbe and metabolite factor loadings were randomly generated from a normal distribution to parameterize the MMvec parameters.
  • Microbial counts were then drawn from a multinomial logistic normal distribution and fed into MMvec to generate the metabolite counts.
  • To identify scenarios where CLR correlations underperformed in comparison to MMvec, the authors used Bayesian Optimization to tune the distributions used to generate the simulations.
  • The CLR-transformed correlations suggested by Quinn and Erb were benchmarked on the desert biocrust soils dataset using the R scripts provided in ref. 1.
  • Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Author contributions

  • J.T.M. performed all analyses and wrote the manuscript.
  • All authors have contributed edits to the manuscript.

Additional information

  • Supplementary information is available for this paper at https://doi.org/10.1038/s41592-020-01007-0.
  • Randomization was not necessary, since the data were simulated, not collected.


Lawrence Berkeley National Laboratory
Recent Work
Title: Reply to: Examining microbe-metabolite correlations by linear methods.
Permalink: https://escholarship.org/uc/item/3827k7p9
Journal: Nature Methods, 18(1)
ISSN: 1548-7091
Authors: Morton, James T; McDonald, Daniel; Aksenov, Alexander A; et al.
Publication Date: 2021
DOI: 10.1038/s41592-020-01007-0
Peer reviewed

Matters arising
https://doi.org/10.1038/s41592-020-01007-0
¹Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA.
²Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA.
³Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA.
⁴Collaborative Mass Spectrometry Innovation Center, University of California, San Diego, La Jolla, CA, USA.
⁵Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA.
⁶Department of Information Systems, University of Maryland–Baltimore County, Baltimore, MD, USA.
⁷Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA.
⁸Department of Biology, New York University, New York, NY, USA.
⁹Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
¹⁰DOE Joint Genome Institute, Walnut Creek, CA, USA.
¹¹Jacobs School of Engineering, University of California, San Diego, La Jolla, CA, USA.
¹²The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA.
¹³Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA.
¹⁴Flatiron Institute, Simons Foundation, New York, NY, USA.
¹⁵Department of Computer Science, Courant Institute, New York, NY, USA.
¹⁶Center for Data Science, New York University, New York, NY, USA.
¹⁷Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA.
¹⁸Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA.
e-mail: rknight@ucsd.edu
Quinn and Erb¹ propose to apply a centered log-ratio (CLR) transform before performing correlation analysis and make the case that, when used correctly, correlation and proportionality can outperform MMvec in identifying microbe–metabolite interactions. While this may be an appealing strategy, it is important to note that the correlations estimated from CLR-transformed data will have a fundamentally different interpretation than the true correlations in the environment, namely:
$$\operatorname{Cov}(x_i, y_j) \neq \operatorname{Cov}\big(\operatorname{clr}(x)_i, \operatorname{clr}(y)_j\big)$$
where $x_i$ and $y_j$ are the absolute abundances of taxon $i$ and metabolite $j$, taken from the microbe abundances $x$ and the metabolite abundances $y$. Because the absolute abundances are often not available, inferring the true correlations between microbes and metabolites is not tractable (Supplementary Note 1). This phenomenon has been extensively studied in refs. 2–4, and one of our recent studies provides the intuition behind this in the case of differential abundance⁵. Because of this discrepancy, we proposed to use co-occurrence probabilities instead of correlation.
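
The inequality above can be illustrated with a minimal sketch in standard-library Python. The log-normal abundances, the table sizes and the helper names `clr` and `cross_cov` are assumptions made for illustration only; none of them come from the paper. The sketch relies on one exact property: every CLR-transformed sample sums to zero, so the CLR cross-covariances with any metabolite must sum to zero across taxa, whereas the covariances of absolute abundances carry no such constraint.

```python
import math
import random


def clr(row):
    """Centered log-ratio transform of one strictly positive sample."""
    logs = [math.log(v) for v in row]
    center = sum(logs) / len(logs)
    return [v - center for v in logs]


def cross_cov(a, b):
    """Cross-covariance matrix between the columns of a and the columns of b."""
    n = len(a)
    mean_a = [sum(col) / n for col in zip(*a)]
    mean_b = [sum(col) / n for col in zip(*b)]
    return [
        [
            sum((ra[i] - mean_a[i]) * (rb[j] - mean_b[j]) for ra, rb in zip(a, b))
            / (n - 1)
            for j in range(len(mean_b))
        ]
        for i in range(len(mean_a))
    ]


random.seed(0)
n_samples, n_taxa, n_metabolites = 200, 5, 4

# Hypothetical absolute abundances (illustrative, not data from the paper).
x_abs = [[random.lognormvariate(2.0, 1.0) for _ in range(n_taxa)]
         for _ in range(n_samples)]
y_abs = [[random.lognormvariate(1.0, 0.5) for _ in range(n_metabolites)]
         for _ in range(n_samples)]
for xs, ys in zip(x_abs, y_abs):
    ys[0] *= xs[0]  # metabolite 0 scales with the absolute abundance of taxon 0

cov_abs = cross_cov(x_abs, y_abs)
cov_clr = cross_cov([clr(r) for r in x_abs], [clr(r) for r in y_abs])

# Sum over taxa of the covariance with each metabolite.
col_sums_abs = [sum(cov_abs[i][j] for i in range(n_taxa)) for j in range(n_metabolites)]
col_sums_clr = [sum(cov_clr[i][j] for i in range(n_taxa)) for j in range(n_metabolites)]
```

Here `col_sums_clr` is zero up to rounding error for every metabolite, while `col_sums_abs` is not, so the two covariance matrices cannot agree in general. The CLR transform is also unchanged when a sample is rescaled, which is why it cannot recover information about absolute abundance.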
We relied on simulated data in the original paper⁶ as an artificial ground truth, as is common in the evaluation of omics tools. However, simulated data will always have limitations because of the inability to model unknown features of the real system or because of deliberate simplifications that clarify key points in the model system. Furthermore, it is possible to identify simulations where a proposed model is optimal. In Fig. 1, we used Bayesian Optimization⁷ to identify simulations where MMvec was able to accurately estimate the correct parameters and Pearson underperformed. If the appropriate assumptions are satisfied, MMvec can correctly estimate the co-occurrence probabilities with machine precision.
Therefore, a crucial aspect of the MMvec manuscript was to test performance both on simulations and on real data. Performance on real data is the ultimate test of methods, and we recommend that simulated datasets be complemented with experimentally validated datasets where possible. Accordingly, we applied the same proportionality-based scripts described by Quinn and Erb¹ and evaluated them on one of the real datasets we used in the MMvec paper.
A major obstacle to analyzing real-world microbiome and metabolomics data is sparsity. Traditional compositional methods such as the proposed CLR transform cannot automatically deal with zeros and require imputation as a preprocessing step. This imputation adds bias and is impractical for the sparse datasets typically encountered⁸,⁹. Microbiome and untargeted metabolomics datasets are generally sparse: in large studies, such as the American Gut Project¹⁰, the sparsity for stool samples alone is 99.946%. MMvec was designed to handle sparse data. In the desert biocrust soils dataset (sparsity of 51%; ref. 11) that was used in the MMvec publication, we observe that MMvec dramatically outperformed the newly proposed linear methods (Fig. 2).
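
The zero problem can be seen in a few lines of standard-library Python. This is a minimal sketch: the toy count table, the `pseudocount` parameter and the helper names are hypothetical, and adding a pseudocount is only one simple form of imputation, not necessarily the one used in any of the cited scripts.

```python
import math


def sparsity(table):
    """Fraction of entries in a count table that are exactly zero."""
    cells = [v for row in table for v in row]
    return sum(1 for v in cells if v == 0) / len(cells)


def clr(row, pseudocount=0.0):
    """Centered log-ratio transform; rows containing zeros need pseudocount > 0."""
    logs = [math.log(v + pseudocount) for v in row]
    center = sum(logs) / len(logs)
    return [v - center for v in logs]


# Toy count table (samples x features), made up for illustration.
counts = [
    [12, 0, 3, 0],
    [0, 0, 7, 1],
    [5, 2, 0, 0],
]

table_sparsity = sparsity(counts)  # 6 zeros out of 12 entries, i.e. 0.5

# Without imputation the transform is undefined, because log(0) does not exist.
try:
    clr(counts[0])
    zero_fails = False
except ValueError:
    zero_fails = True

# The result depends on the arbitrary choice of pseudocount.
clr_with_one = clr(counts[0], pseudocount=1.0)
clr_with_small = clr(counts[0], pseudocount=0.01)
```

The two imputed results differ substantially for the same sample, which is one way to see how the imputation step can introduce bias when most entries are zero.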
Contrary to the argument by Quinn and Erb¹ regarding the complexity of neural networks, the MMvec model⁶ is not much more complex than the proposed regression techniques. It is a simple one-layer neural network, which is in effect a two-stage log–bilinear regression.
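
To show what a log-bilinear model of this kind looks like, here is a minimal sketch in standard-library Python. It assumes the conditional probability of each metabolite given a microbe is a softmax over inner products of low-dimensional embedding vectors plus a per-metabolite bias. The names `u_i`, `V` and `b` are illustrative, and the exact parameterization of MMvec is the one given in ref. 6, not this sketch.

```python
import math


def cooccurrence_probs(u_i, V, b):
    """Softmax over metabolites of (u_i . v_j + b_j) for a single microbe i."""
    scores = [sum(a * c for a, c in zip(u_i, v_j)) + b_j for v_j, b_j in zip(V, b)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-dimensional embeddings for one microbe and three metabolites.
u_i = [1.0, 0.0]
V = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]
b = [0.0, 0.0, 0.0]

probs = cooccurrence_probs(u_i, V, b)

# The log-ratio of two probabilities is the difference of their bilinear scores:
# here (u_i . v_0) - (u_i . v_1) = 2 - 0 = 2.
log_ratio_0_vs_1 = math.log(probs[0] / probs[1])
```

The only nonlinearity is the softmax, so log-ratios of the predicted probabilities are linear in each set of embeddings when the other is held fixed, which is the sense in which such a model is log-bilinear.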
Methods similar to MMvec have been successful at the task of learning word co-occurrences. Since Mikolov et al.¹², these models have been designed with an emphasis on practical methods for learning useful word representations at scale, rather than on perfectly modeling the data distribution.
MMvec is only one tool in the arsenal of correlative methods. It is not perfect for every correlation type or dataset and is not a one-size-fits-all solution. However, we have found that MMvec is a powerful discovery tool, as demonstrated by the other real datasets we evaluated in the original article. It is critical that we provide accurate guidance to the community so that scenarios where one method works better than others are well understood. While there may be scenarios where linear methods outperform neural networks, we show that there are scenarios where neural networks outperform linear methods. We appreciate the communication on the topic to the extent that it helps the community better understand the advantages and limitations of the different approaches and prompts the community to continue to innovate in this area.
Reply to: Examining microbe–metabolite correlations by linear methods
James T. Morton¹,², Daniel McDonald¹,³, Alexander A. Aksenov⁴,⁵, Louis Felix Nothias⁴,⁵, James R. Foulds⁶, Robert A. Quinn⁷, Michelle H. Badri⁸, Tami L. Swenson⁹, Marc W. Van Goethem⁹, Trent R. Northen⁹,¹⁰, Yoshiki Vazquez-Baeza³,¹¹, Mingxun Wang⁴,⁵, Nicholas A. Bokulich¹²,¹³, Aaron Watters¹⁴, Se Jin Song¹,³, Richard Bonneau⁸,¹⁴,¹⁵,¹⁶, Pieter C. Dorrestein⁴,⁵ and Rob Knight¹,²,¹⁷,¹⁸ ✉
replying to T. P. Quinn & I. Erb Nature Methods https://doi.org/10.1038/s41592-020-01006-1 (2020)
NATURE METHODS | www.nature.com/naturemethods
Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41592-020-01007-0.
Received: 17 March 2020; Accepted: 27 October 2020;
Published: xx xx xxxx
References
1. Quinn, T. P. & Erb, I. Examining microbe–metabolite correlations by linear methods. Nat. Methods https://doi.org/10.1038/s41592-020-01006-1 (2020).
2. Aitchison, J. A concise guide to compositional data analysis. http://www.leg.ufpr.br/lib/exe/fetch.php/pessoais:abtmartins:a_concise_guide_to_compositional_data_analysis.pdf (2003).
3. Filzmoser, P. & Hron, K. Correlation analysis for compositional data. Math. Geosci. 41, 905 (2009).
4. Friedman, J. & Alm, E. J. Inferring correlation networks from genomic survey data. PLoS Comput. Biol. 8, e1002687 (2012).
5. Morton, J. T. et al. Establishing microbial composition measurement standards with reference frames. Nat. Commun. 10, 2719 (2019).
6. Morton, J. T. et al. Learning representations of microbe–metabolite interactions. Nat. Methods 16, 1306–1314 (2019).
7. Nogueira, F. Bayesian Optimization: open source constrained global optimization tool for Python. https://github.com/fmfn/BayesianOptimization (2014).
8. Martín-Fernández, J. A., Barceló-Vidal, C. & Pawlowsky-Glahn, V. Dealing with zeros and missing values in compositional data sets using nonparametric imputation. Math. Geol. 35, 253–278 (2003).
9. Silverman, J. D., Roche, K., Mukherjee, S. & David, L. A. Naught all zeros in sequence count data are the same. Preprint at bioRxiv https://doi.org/10.1101/477794 (2018).
10. McDonald, D. et al. American Gut: an open platform for citizen science microbiome research. mSystems 3, e00031-18 (2018).
11. Swenson, T. L., Karaoz, U., Swenson, J. M., Bowen, B. P. & Northen, T. R. Linking soil biology and chemistry in biological soil crust using isolate exometabolomics. Nat. Commun. 9, 19 (2018).
12. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. & Dean, J. Distributed representations of words and phrases and their compositionality. in Advances in Neural Information Processing Systems 3111–3119 (2013).
13. Aitchison, J. The statistical analysis of compositional data. J. R. Stat. Soc. B 44, 139–160 (1982).
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
© The Author(s), under exclusive licence to Springer Nature America, Inc. 2021
[Figure 1: six scatter panels, a–f. x axis: estimated co-occurrences; y axis: ground truth co-occurrences. Panel titles: MMvec (R = 0.999, P = 0.000); MMvec (R = 0.947, P = 0.000); Pearson (R = 0.002, P = 0.809); Pearson (R = 0.150, P = 0.000); CLR-transformed Pearson (R = 0.004, P = 0.714); CLR-transformed Pearson (R = 0.050, P = 0.000).]
Fig. 1 | A simulation benchmark comparing MMvec to Pearson. Simulations were obtained through Bayesian Optimization⁷ to showcase scenarios where MMvec outperforms Pearson. a–c, Simulation of a scenario where the microbiome dataset is 99% dense. d–f, Simulation of a scenario where the microbiome dataset is 60% dense. All axes are represented on a log scale. Pearson's R is used to measure the agreement between the simulated ground truth co-occurrences and the estimated co-occurrences.
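
The agreement measure named in the caption can be sketched as follows in standard-library Python. Computing Pearson's R on log-transformed values is an assumption suggested by the log-scale axes; the caption does not state it explicitly. The toy vectors and function names are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log_scale_agreement(truth, estimate):
    """Pearson's R between log ground truth and log estimated co-occurrences."""
    return pearson_r([math.log(t) for t in truth], [math.log(e) for e in estimate])


# Toy co-occurrence values (made up for illustration).
truth = [1e-4, 1e-3, 1e-2, 1e-1]
rescaled = [2 * t for t in truth]      # right ordering, constant multiplicative error
scrambled = [1e-2, 1e-4, 1e-1, 1e-3]   # same values assigned to the wrong pairs

r_rescaled = log_scale_agreement(truth, rescaled)    # equals 1 up to rounding
r_scrambled = log_scale_agreement(truth, scrambled)  # equals 0 up to rounding
```

On a log scale a constant multiplicative error leaves R at 1, so under this assumption the measure rewards recovering the relative structure of the co-occurrences rather than their absolute level.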
[Figure 2: Microcoleus molecular detection rate. x axis: top K hits; y axis: true positives; series: MMvec, Spearman, Pearson, φ, ρ.]
Fig. 2 | Biocrust soils benchmark. A comparison of MMvec to metrics proposed by Quinn and Erb¹. These proposed metrics include Spearman, Pearson, φ and ρ applied after a CLR transformation¹³.
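
The quantity plotted in Fig. 2, true positives among the top K hits, can be sketched as below. This is a minimal illustration in standard-library Python; the metabolite names, scores and the set of known positives are hypothetical, and the actual evaluation protocol is the one in the analysis repository, not this sketch.

```python
def true_positives_at_k(scores, known_positives, k):
    """Count known positives among the k highest-scoring candidates."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sum(1 for name in ranked[:k] if name in known_positives)


# Hypothetical association scores between one microbe and four metabolites.
scores = {"m1": 0.9, "m2": 0.1, "m3": 0.7, "m4": 0.4}
known_positives = {"m1", "m4"}

# Detection curve: true positives as a function of K.
curve = [true_positives_at_k(scores, known_positives, k) for k in range(1, 5)]
# curve == [1, 1, 2, 2]
```

Any method that assigns a score to each microbe and metabolite pair (a correlation, a proportionality metric or a co-occurrence probability) can be compared on the same curve this way.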
Methods
The simulations were created by using the generative form of MMvec; the
microbe and metabolite factor loadings were randomly generated from a normal
distribution to parameterize the MMvec parameters. Microbial counts were then
drawn from a multinomial logistic normal distribution and fed into MMvec to
generate the metabolite counts. To identify scenarios where CLR correlations
underperformed in comparison to MMvec, we used Bayesian Optimization to tune
the distributions used to generate the simulations.
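
The generative procedure described above can be sketched in standard-library Python as follows. This is a minimal illustration under stated assumptions: the dimensions, sequencing depths and function names are hypothetical, the conditional probabilities are taken to be a softmax over inner products of the loadings, and the way microbial counts are mixed into a per-sample metabolite distribution is an assumption rather than the authors' exact simulation code (which is in the repository listed under Code availability).

```python
import math
import random
from collections import Counter


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multinomial(n, probs, rng):
    """Draw n items from a categorical distribution and return counts per category."""
    draws = Counter(rng.choices(range(len(probs)), weights=probs, k=n))
    return [draws.get(k, 0) for k in range(len(probs))]


def simulate(n_samples=10, n_microbes=6, n_metabolites=4, latent_dim=2,
             microbe_depth=500, metabolite_depth=300, seed=0):
    rng = random.Random(seed)
    # Factor loadings drawn from a standard normal distribution.
    U = [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(n_microbes)]
    V = [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(n_metabolites)]
    # Assumed ground truth: P(metabolite j | microbe i) = softmax_j(u_i . v_j).
    cond = [softmax([sum(a * b for a, b in zip(u, v)) for v in V]) for u in U]

    microbes, metabolites = [], []
    for _ in range(n_samples):
        # Multinomial logistic normal: softmax of a Gaussian vector, then counts.
        proportions = softmax([rng.gauss(0, 1) for _ in range(n_microbes)])
        x = multinomial(microbe_depth, proportions, rng)
        # Assumed mixing step: weight each microbe's conditional distribution
        # by its share of the observed microbial counts.
        q = [sum(x[i] / microbe_depth * cond[i][j] for i in range(n_microbes))
             for j in range(n_metabolites)]
        y = multinomial(metabolite_depth, q, rng)
        microbes.append(x)
        metabolites.append(y)
    return cond, microbes, metabolites


cond, microbes, metabolites = simulate()
```

Because `cond` is known exactly, any estimator of co-occurrence can be scored against it; tuning the simulation settings (for example the variance of the loadings or the depths) is where an optimizer such as the one in ref. 7 would come in.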
The CLR-transformed correlations suggested by Quinn and Erb were benchmarked on the desert biocrust soils dataset using the R scripts provided in ref. 1.
Reporting Summary. Further information on research design is available in the
Nature Research Reporting Summary linked to this article.
Data availability
The datasets to reproduce the results presented here can be found at https://github.com/knightlab-analyses/multiomic-cooccurrences.
Code availability
The analysis software to reproduce the results presented here can be found at
https://github.com/knightlab-analyses/multiomic-cooccurrences.
Author contributions
J.T.M. performed all analyses and wrote the manuscript. All authors have contributed
edits to the manuscript.
Competing interests
M.W. is the founder of Ometa Labs. The remaining authors declare no competing interests.
Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41592-020-01007-0.
Correspondence and requests for materials should be addressed to R.K.
Reprints and permissions information is available at www.nature.com/reprints.

nature research | reporting summary
October 2018
Corresponding author(s): Rob Knight
Last updated by author(s): 9/10/2020
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.
Software and code
Policy information about availability of computer code
Data collection: Only simulation data was used.
Data analysis: All data analysis scripts can be found here: https://github.com/knightlab-analyses/multiomic-cooccurences
Data
Policy information about availability of data
The biocrust soils data was retrieved from the supplemental section in Swenson et al.


Frequently Asked Questions (9)
Q1. How many samples were used in the biocrust soils study?

For the biocrust soils study, there were 19 samples and after filtering there were 466 unique microbial taxa and 85 metabolite features. 

Taxa that appeared in fewer than 10 samples in each study were removed, since there are fewer samples than degrees of freedom in the model to infer these microbes' co-occurrence patterns.
