
Combined effects of host genetics and diet on human gut microbiota and incident disease in a single population cohort

TL;DR: A combination of genetics and dietary habits was shown to strongly shape the abundances of certain key bacterial members of the gut microbiota and to explain their genetic associations; the work also identifies putative causal relationships between gut microbes and complex diseases using Mendelian randomization (MR).
Abstract: Co-evolution between humans and the microbial communities colonizing them has resulted in an intimate assembly of thousands of microbial species mutualistically living on and in their body and impacting multiple aspects of host physiology and health. Several studies examining whether human genetic variation can affect gut microbiota suggest a complex combination of environmental and host factors. Here, we leverage a single large-scale population-based cohort of 5,959 genotyped individuals with matched gut microbial shotgun metagenomes, dietary information and health records up to 16 years post-sampling, to characterize human genetic variations associated with microbial abundances, and predict possible causal links with various diseases using Mendelian randomization (MR). Genome-wide association study (GWAS) identified 583 independent SNP-taxon associations at genome-wide significance (p

Summary

Introduction

  • Humans have co-evolved with the microbial communities that colonize them, resulting in a complex assembly of thousands of microbial species mutualistically living in their gastrointestinal tract.
  • Nonetheless, a well-described association between Bifidobacterium levels and LCT-MCM6, governing the phenotype of lactase persistence throughout adulthood in Europeans, was uncovered in 2015 [4] and subsequently replicated by later studies [6,7,9–12], suggesting a very strong influence of the evolution of dairy diet in modern humans on their gut bacteria.
  • The individual gut microbiota is largely influenced by environmental variables, mostly diet and medication [21–23], which could explain a larger proportion of microbiome variance than identifiable host genetic factors [9,10].
  • Biological factors could also influence the cross-study reproducibility of results.

Genome-wide association analysis of gut microbial taxa

  • Genome-wide association tests were applied to 2,801 microbial taxa and 7,979,834 human genetic variants from 5,959 individuals enrolled in the FR02 cohort; the tested taxa comprised all taxa found to be prevalent in >25% of the cohort.
  • Conditional analysis found 583 independent SNP-taxon associations at genome-wide significance (Table S1).
  • Heritability across the 2,801 taxa ranged from h²=0.001 to 0.214, with the highest values observed for taxa belonging to the Firmicutes and Firmicutes_A GTDB phyla, which together encompassed half (241/476, 50.4%) of all taxa associated with genetic variation.
  • The association of three Firmicutes_A taxa with LCT was still genome-wide significant after adjusting for Bifidobacterium abundances (Table S2).
  • A variant in ABO (rs545971), the gene encoding the histo-blood group ABO system transferase, was strongly associated (p=1.1×10⁻¹²) with levels of Faecalicatena lactaris.

Human gut microbiome keystone taxa are associated with genetic variation

  • Only one documented keystone species from Banerjee et al. [29], Bacteroides fragilis [39], was not associated with genetic variation in the study.
  • This observation suggests that keystone species, although defined as exerting selective modulation and not broad effects on microbiome composition variation, generally associate with human genetic variation, which points to an intimate association with the human gut niche, in line with their reported key ecological roles in microbiome modulation and functioning.
  • The authors' work highlights novel human genotypes possibly associated with keystone taxa (Table S1), which could further improve understanding of their ecology.

Combined effect of host genetics and dietary dairy intake on gut levels of LCT-associated bacteria

  • The authors compared the abundances of 4 bacterial taxa strongly associated with the LCT locus (Bifidobacterium genus, Negativibacillus genus, UBA3855 sp900316885 and CAG-81 sp000435795) in individuals with different rs4988235 genotypes and dairy diets.
  • Lactose-intolerant individuals (rs4988235:CC) self-reporting a regular dairy diet had a significant increase in Bifidobacterium abundance (p=1.75×10⁻¹³; Wilcoxon rank test).
  • A clustering of carbohydrate-active enzyme profiles from reference genomes of all 11 Bifidobacterium species revealed that B. dentium clustered apart from the 10 other species, which grouped consistently with their co-abundance patterns.

Functionally distinct ABO-associated bacteria are impacted differently by genotype and dietary fiber intake

  • A variety of bacteria metabolize blood antigens, with potential applications in synthetic universal donor blood production [42,43].
  • Levels of both F. lactaris and Collinsella were significantly higher when individuals were predicted to secrete A-, B- and AB-antigens in their gut mucosa (p<2.2×10⁻¹⁶ and p=1.3×10⁻⁸, respectively).

Causal inference predictions between microbes and diseases highlight causal effect of Morganella on MDD

  • When MR was performed in the reverse direction, using disease risk as an exposure and microbial levels as an outcome, most predicted causal effects involved autoimmune and inflammatory diseases but the strongest predicted causal effect involved type 2 diabetes (T2D) (Table S6).
  • Doubling the genetic risk of T2D (possibly accompanied by external factors such as hypoglycaemic medications or metformin intake) was predicted to reduce levels of the uncultured CAG-345 sp000433315 species (Firmicutes phylum) by 0.14 SD (SE=0.04, p=3.0×10⁻⁴, MR method IVW).
  • Finally, a higher genetic risk for multiple sclerosis (MS) was predicted to cause a reduction in the abundance of Lactobacillus_B ruminis, consistent with the report that Lactobacillus sp. can reduce symptom severity in an animal model of MS [61].

Discussion

  • Here, through GWAS and the subsequent investigation of functional and ecological factors contributing to the most robust human-microbe associations, the authors present a diverse and global picture of human-microbe interactions in a single cohort of ~6,000 European individuals.
  • Two of these loci, LCT and ABO, are well-known and very segregated in human populations, possibly explaining why their homogenous European cohort identified them as being associated so strongly.
  • A third more mysterious association with the MED13L locus highlights possible links with cancer while predictive causal inference highlights several diseases as being causally linked to gut microbes.

Lactase persistence as a recently evolved strong modulator of gut bacterial abundances

  • Lactase persistence, or the continued ability to digest lactose into adulthood, is the most strongly selected single-gene trait over the last 10,000 years in multiple human populations [65], believed to have spread amongst humans with the advent of animal domestication and the culturally transmitted practice of dairying [66].
  • While self-reported dietary information is not entirely reliable due to various social reasons [67,68], their study population was large, and the differences were significant enough to consider this a robust observation.
  • Hints of a possible competitive relationship between Bifidobacterium and Negativibacillus, another LCT-associated taxon, were revealed; this relationship could be mediated by lactose intake and will need to be investigated further in functional studies.
  • Two interesting questions stem from their findings.
  • Secondly, despite recent progress, lactose intolerance is still largely underdiagnosed, and genetic prediction rates from large population studies exceed lactose intolerance prevalence rates obtained using physical tests [70].

Blood antigen secretion can influence levels of specific gut microbial commensals

  • The ABO gene expresses a glycosyltransferase in many cell types, which determines the ABO blood group of an individual by modifying the oligosaccharides on cell surface glycoproteins.
  • Indeed, many infectious diseases such as norovirus infection, bacterial meningitis, malaria, cholera [76], or even more recently SARS-CoV-2 [77,78] are associated with host blood type and secretor status [76], suggesting that infection could be a driver of a strong balancing selection that has maintained ABO polymorphisms.
  • An important research effort aiming to enzymatically produce synthetic universal donor blood has driven a push for screening a large diversity of CAZymes, including those from bacteria, revealing substrate affinities for blood antigens across various microbes [42,43].
  • F. lactaris is strongly associated with ABO genetic variation in their European cohort, and is differentially abundant in people according to their predicted gut mucosal secretion of A/B/AB-antigens.
  • Interestingly, their findings are not consistent with F. lactaris switching to a fiber-degrading activity in individuals reporting a high-fiber diet, unlike Collinsella, another ABO-associated taxon, and other mucin-degrading bacteria in their study and in the literature [48].

The case for larger datasets and including uncultured novel species in metagenomic studies

  • The authors' study highlights the benefits of increasing sample size to increase the statistical power for discovery.
  • ABO allelic variation is also notoriously affected by geography [92], which could explain why some meta-analyses in non-homogenous populations could miss it.
  • Importantly, metagenomic sequencing with standardized, robust taxonomic definitions [93,94] can provide species-level characterization of microbial profiles in the gut of individuals, which is challenging in 16S rRNA-based studies.
  • An example from their work is the observation that Bifidobacterium dentium was prevalent but not associated with the LCT locus like all other Bifidobacterium species in the population.
  • Furthermore, GTDB taxonomic standardization results in greater taxon granularity, i.e. smaller, more discrete clades of similar phylogenetic depth than commonly known lineages or species [93,94].

Study population

  • The FINRISK study population has been extensively described elsewhere [98].
  • FINRISK population surveys have been performed every 5 years since 1972 to monitor trends in cardiovascular disease risk factors in the Finnish population [98,99].
  • The sampling was stratified by sex, region and 10-year age group so that each stratum had 250 participants.
  • Participants also received a sampling kit and instructions to donate a stool sample at home and mail it to the Finnish Institute for Health and Welfare by overnight mail.
  • The study was conducted according to the World Medical Association's Declaration of Helsinki on ethical principles.

Cohort phenotype metadata and specific dietary information

  • The phenotype data in this study comprised demographic characteristics, life habits, disease history, laboratory test results and follow-up electronic health records (EHRs).
  • More specifically, baseline dietary factors were collected.
  • Participants were asked to provide answers to exhaustive diet questionnaires when they were enrolled in the study.
  • To broadly assess diet information across cohort participants, a binary variable was used to indicate whether individuals self-reported following various possible dietary restrictions.
  • Dietary consumption of specific food product categories was also reported.

Self-reporting of lactose-free diet and dietary fibre consumption

  • The genotype distribution at the LCT-MCM6:rs4988235 variant responsible for lactase persistence in Europeans was as follows in their study population: 1,936 (35%) individuals had the T/T genotype conferring a lactase persistence phenotype through adulthood, allowing them to digest lactose, while 981 (18%) individuals had the C/C genotype conferring lactose intolerance.
  • Most individuals (n=2,611, 47%) had the heterozygous C/T genotype, making them likely to be able to digest lactose.
  • A total fiber consumption score was calculated from the questionnaires, reflecting the overall consumption of a combination of various fiber-rich foods such as high-fiber bread, vegetables (vegetable foods, fresh and boiled) and berries (fruits, berries and natural juices).
  • The resulting total fiber index values ranged from 9 (low dietary fiber intake) to 48 (high dietary fiber intake), with a median of 33.
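
The genotype percentages reported above for rs4988235 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the counts are taken from the text, while the function name and the derived T allele frequency are additions that do not appear in the original. Note that the three counts sum to 5,528 individuals.

```python
# Genotype counts at rs4988235 as reported in the text.
genotype_counts = {"TT": 1936, "CT": 2611, "CC": 981}


def genotype_summary(counts):
    """Return the total, genotype proportions and the T allele frequency."""
    total = sum(counts.values())
    proportions = {g: n / total for g, n in counts.items()}
    # Each T/T carrier contributes two T alleles, each C/T carrier one.
    t_alleles = 2 * counts["TT"] + counts["CT"]
    t_freq = t_alleles / (2 * total)
    return total, proportions, t_freq


total, proportions, t_freq = genotype_summary(genotype_counts)
for genotype, share in proportions.items():
    print(f"{genotype}: {share:.1%}")
print(f"n = {total}, T allele frequency = {t_freq:.3f}")
```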

Genotyping, imputation and quality control

  • The genotyping was performed on Illumina genome-wide SNP arrays (the HumanCoreExome BeadChip, the Human610-Quad BeadChip and the HumanOmniExpress) and has been described previously [102].
  • Stringent criteria were applied to remove samples and variants of low quality.
  • To evaluate the imputation quality, the authors compared the sample allele frequencies with reference populations and examined imputation quality (INFO scores) distributions.
  • Both genotyped and imputed SNPs were kept for analysis if they met the following criteria: call rate >90%, no significant deviation from Hardy-Weinberg equilibrium (p>1.0×10⁻⁶), and minor allele frequency >1%.
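
The three variant-level criteria listed above (call rate, Hardy-Weinberg equilibrium and minor allele frequency) can be sketched as a simple filter on genotype counts. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the one-degree-of-freedom chi-square test used for Hardy-Weinberg equilibrium is an assumption, since the text does not say which test was applied.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for deviation from Hardy-Weinberg proportions.

    Assumption: a chi-square test is used here for simplicity; the source
    does not state which HWE test was applied.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_call_rate=0.90, min_hwe_p=1.0e-6, min_maf=0.01):
    """Apply the three thresholds quoted in the text to one variant."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    if call_rate <= min_call_rate or maf <= min_maf:
        return False
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) > min_hwe_p


# Example with the rs4988235 genotype counts reported in this summary.
print(passes_qc(1936, 2611, 981, n_missing=0))
```

Imputed variants carry genotype probabilities rather than hard calls, so a real pipeline would apply equivalent thresholds on dosages; that step is outside the scope of this sketch.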

Metagenomic sequencing from stool samples

  • Stool samples were collected by participants and mailed overnight to the Finnish Institute for Health and Welfare for storage at -20°C; the samples were sequenced at the University of California San Diego in 2017.
  • The gut microbiome was characterized by shallow shotgun metagenomics sequencing with Illumina HiSeq 4000 Systems.
  • The authors successfully performed stool shotgun sequencing in n=7,231 individuals.
  • The detailed procedures for DNA extraction, library preparation and sequence processing have been previously described [101].
  • To preserve the quality of data while retaining most of the disease cases, samples with a total number of sequenced reads lower than 400,000 were removed.

Taxonomic profiling, quality filtering and data transformation

  • Taxonomic profiling of FR02 metagenomes has been described elsewhere [100,106].
  • For each metagenome at phylum, class, order, family, genus and species levels, the relative abundance of a taxon was computed as the proportion of reads assigned to the clade rooted at this taxon among total classified reads.
  • For the purpose of this association study and because of reduced accuracy and power when considering rare taxa, the authors focused on common and relatively abundant microbial taxa, defined as prevalent in >25% of studied individuals and supported by at least 10 mapped reads per individual.
  • Centred log-ratio (CLR) transformed data can vary in real space and better fit the normality assumption of linear regression.
  • This process was performed using the R package compositions.
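
The prevalence filter and the CLR transformation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code (they used the R package compositions): the function names are hypothetical, the reading that a taxon counts as present in a sample when it has at least 10 mapped reads is an interpretation of the text, and the pseudocount used to handle zeros is an assumption.

```python
import math


def is_common_taxon(read_counts, min_reads=10, min_prevalence=0.25):
    """True if the taxon has >= min_reads reads in > min_prevalence of samples.

    Assumption: "prevalent" is read here as "detected with at least
    min_reads mapped reads" in a sample.
    """
    detected = sum(1 for c in read_counts if c >= min_reads)
    return detected / len(read_counts) > min_prevalence


def clr(abundances, pseudocount=1.0):
    """Centred log-ratio transform of one sample's taxon abundances.

    Assumption: a pseudocount is added so that zero counts can be logged.
    """
    logs = [math.log(a + pseudocount) for a in abundances]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


sample = [10, 200, 0, 45]  # hypothetical read counts for four taxa
print([round(v, 3) for v in clr(sample)])
```

CLR values within a sample sum to zero by construction, which is why they are no longer constrained to the simplex and can take any real value.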

Genome-wide association analysis

  • The protocol followed in this study was described elsewhere [111].
  • As microbes interact with each other in the gut as part of larger ecological and functional communities, their abundances are not independent; matSpDlite [115,116] was therefore used to estimate the number of independent tests based on eigenvalue variance: the larger the eigenvalue variance, the smaller the number of effective tests.

Prediction of ABO blood groups and secretor status

  • SNP-based typing of ABO histo-blood group was performed.
  • The two predictions were highly consistent, with over 99.9% concordance.
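
Returning to the multiple-testing correction in the genome-wide association analysis: the relationship between eigenvalue variance and the effective number of tests can be illustrated with one published estimator (Nyholt, 2004), Meff = 1 + (M - 1)(1 - Var(λ)/M), where λ are the eigenvalues of the correlation matrix of the M tested traits. Whether this is the exact estimator reported by matSpDlite in this study is an assumption; the function name is hypothetical and the eigenvalues are taken as given rather than computed.

```python
import statistics


def effective_number_of_tests(eigenvalues):
    """Nyholt-style effective number of independent tests.

    eigenvalues: eigenvalues of the correlation matrix of the M traits
    (they sum to M). Uses the sample variance of the eigenvalues.
    """
    m = len(eigenvalues)
    var = statistics.variance(eigenvalues)
    return 1 + (m - 1) * (1 - var / m)


# Five uncorrelated traits: all eigenvalues equal 1, so all 5 tests count.
print(effective_number_of_tests([1.0, 1.0, 1.0, 1.0, 1.0]))
# Five perfectly correlated traits: one eigenvalue carries everything.
print(effective_number_of_tests([5.0, 0.0, 0.0, 0.0, 0.0]))
```

The two limiting cases show the behaviour stated in the text: zero eigenvalue variance leaves the number of tests unchanged, while maximal variance collapses it to a single effective test.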

Bidirectional two-sample Mendelian randomization (MR) analysis

  • Causal relationships between diseases and gut microbiota were investigated at genus and species levels only to maximise interpretability.
  • SNP instruments for disease exposures were selected at the genome-wide significance threshold (p<5×10⁻⁸).
  • Inverse-variance weighting (IVW) is the most sensitive method but requires that all instruments be valid.
  • MR-Egger allows instruments to have non-zero pleiotropy and provides a way to test for and estimate the pleiotropic effect in addition to the causal estimate.
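
To make the IVW method mentioned above concrete, the sketch below computes the standard fixed-effect IVW estimate from per-SNP summary statistics: each instrument yields a Wald ratio (SNP-outcome effect divided by SNP-exposure effect), and the ratios are combined with weights proportional to their precision. The function name and the toy numbers are hypothetical, and the software and exact options used by the authors are not stated in this summary.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate and its SE.

    beta_exposure: SNP effects on the exposure
    beta_outcome:  SNP effects on the outcome
    se_outcome:    standard errors of the SNP-outcome effects
    """
    # Weight of each Wald ratio = 1 / Var(ratio) ~ beta_x^2 / se_y^2.
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    standard_error = math.sqrt(1.0 / total)
    return estimate, standard_error


# Toy instruments (hypothetical values) with a true ratio of 0.5.
estimate, se = ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.01, 0.01])
print(f"IVW estimate = {estimate:.3f}, SE = {se:.4f}")
```

Because every instrument contributes to the weighted average, a single invalid (pleiotropic) SNP can bias the estimate, which is the validity requirement noted in the list above.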

Cox proportional hazards regression

  • Cox proportional hazards regression was conducted to test the association between baseline abundance of gut microbes and incident major depression (16 years of follow-up, n=181 incident events).
  • Microbial abundances were CLR-transformed and standardized to zero mean and unit variance.
  • The Cox models were stratified by sex and adjusted for age and log-transformed BMI, with time-on-study as the time scale.
  • Participants with prevalent major depression at baseline were excluded.
  • R function coxph in the R package survival was used for this analysis.

Profiling of carbohydrate-active enzymes (CAZymes) in bacterial genomes

  • The standalone run_dbCAN2 v2.0.11 tool [127] (https://github.com/linnabrown/run_dbcan) was used to scan for the presence of CAZyme genes in public assembled bacterial genomes taken from the GTDB release 89 reference.
  • In total, the authors scanned 327 Bifidobacterium sp., 2 Faecalicatena lactaris and 15 Collinsella sp. reference genomes included in GTDB release 89.
  • Three methods were compared as part of the run_dbCAN2 procedure (HMMER, DIAMOND, and Hotpep).
  • The authors considered a positive detection result when all three methods agreed on a CAZyme family identification.
  • Identification of preferred reported substrates for the various CAZyme families was done manually from key publications [48,129], from literature searches and from the CAZypedia website [130].
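
The three-way agreement rule described above amounts to a set intersection per gene. The sketch below assumes the predictions have already been parsed into a dictionary keyed by gene; that data structure, the gene identifiers and the function names are hypothetical, and the code does not read run_dbCAN2 output files.

```python
def consensus_families(hmmer_hits, diamond_hits, hotpep_hits):
    """CAZyme families assigned to a gene by all three tools."""
    return set(hmmer_hits) & set(diamond_hits) & set(hotpep_hits)


def confirmed_cazymes(predictions):
    """Keep only genes with at least one family supported by all three tools.

    predictions: {gene_id: {"HMMER": [...], "DIAMOND": [...], "Hotpep": [...]}}
    """
    confirmed = {}
    for gene, tools in predictions.items():
        agreed = consensus_families(
            tools.get("HMMER", ()),
            tools.get("DIAMOND", ()),
            tools.get("Hotpep", ()),
        )
        if agreed:
            confirmed[gene] = agreed
    return confirmed


# Hypothetical predictions for two genes.
example = {
    "gene_0001": {"HMMER": ["GH2"], "DIAMOND": ["GH2"], "Hotpep": ["GH2", "GH35"]},
    "gene_0002": {"HMMER": ["GH33"], "DIAMOND": [], "Hotpep": ["GH33"]},
}
print(confirmed_cazymes(example))
```

Requiring agreement across all tools trades sensitivity for specificity: a gene such as the second one, supported by only two of the three methods, is not counted.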


Combined effects of host genetics and diet on human gut microbiota and incident disease in a single population cohort

Youwen Qin (1,2), Aki S. Havulinna (3), Yang Liu (1,4), Pekka Jousilahti (3), Scott C. Ritchie (1,5-7), Alex Tokolyi (8), Jon G. Sanders (9,10), Liisa Valsta (3), Marta Brożyńska (1), Qiyun Zhu (11), Anupriya Tripathi (11,12), Yoshiki Vazquez-Baeza (13,14), Rohit Loomba (15), Susan Cheng (16), Mohit Jain (11,13), Teemu Niiranen (3,17), Leo Lahti (18), Rob Knight (11,13,14), Veikko Salomaa (3), Michael Inouye (1,2,5-7,19-21)*§, Guillaume Méric (1,22)*§

1 Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
2 School of BioSciences, The University of Melbourne, Melbourne, Victoria, Australia
3 Department of Public Health Solutions, Finnish Institute for Health and Welfare, Helsinki, Finland
4 Department of Clinical Pathology, The University of Melbourne, Melbourne, Victoria, Australia
5 Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, UK
6 British Heart Foundation Centre of Research Excellence, University of Cambridge, UK
7 National Institute for Health Research Cambridge Biomedical Research Centre, University of Cambridge and Cambridge University Hospitals, Cambridge, UK
8 Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
9 Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA
10 Cornell Institute for Host-Microbe Interaction and Disease, Cornell University, Ithaca, NY, USA
11 Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
12 Division of Biological Sciences, University of California San Diego, La Jolla, California, USA
13 Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
14 Department of Computer Science & Engineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
15 NAFLD Research Center, Department of Medicine, University of California San Diego, La Jolla, CA, USA
16 Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
17 Department of Medicine, Turku University Hospital and University of Turku, Turku, Finland
18 Department of Future Technologies, University of Turku, Turku, Finland
19 British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, UK
20 Health Data Research UK Cambridge, Wellcome Genome Campus & University of Cambridge, UK
21 The Alan Turing Institute, London, UK
22 Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria, Australia

§ These authors contributed equally
*Corresponding authors: Michael Inouye: mi336@medschl.cam.ac.uk; Guillaume Méric: guillaume.meric@baker.edu.au

medRxiv preprint doi: https://doi.org/10.1101/2020.09.12.20193045; this version posted September 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

Abstract

Co-evolution between humans and the microbial communities colonizing them has resulted in an intimate assembly of thousands of microbial species mutualistically living on and in their body and impacting multiple aspects of host physiology and health. Several studies examining whether human genetic variation can affect gut microbiota suggest a complex combination of environmental and host factors. Here, we leverage a single large-scale population-based cohort of 5,959 genotyped individuals with matched gut microbial shotgun metagenomes, dietary information and health records up to 16 years post-sampling, to characterize human genetic variations associated with microbial abundances, and predict possible causal links with various diseases using Mendelian randomization (MR). Genome-wide association study (GWAS) identified 583 independent SNP-taxon associations at genome-wide significance (p<5.0×10⁻⁸), which included notable strong associations with LCT (p=5.02×10⁻³⁵), ABO (p=1.1×10⁻¹²), and MED13L (p=1.84×10⁻¹²). A combination of genetics and dietary habits was shown to strongly shape the abundances of certain key bacterial members of the gut microbiota, and explain their genetic association. Genetic effects from the LCT locus on Bifidobacterium and three other associated taxa significantly differed according to dairy intake. Variation in mucin-degrading Faecalicatena lactaris abundances were associated with ABO, highlighting a preferential utilization of secreted A/B/AB-antigens as energy source in the gut, irrespectively of fibre intake. Enterococcus faecalis levels showed a robust association with a variant in MED13L, with putative links to colorectal cancer. Finally, we identified putative causal relationships between gut microbes and complex diseases using MR, with a predicted effect of Morganella on major depressive disorder that was consistent with observational incident disease analysis. Overall, we present striking examples of the intricate relationship between humans and their gut microbial communities, and highlight important health implications.

Introduction

Humans have co-evolved with the microbial communities that colonize them, resulting in a complex assembly of thousands of microbial species mutualistically living in their gastrointestinal tract. A fine-tuned interplay between microbial and human physiologies can impact multiple aspects of development and health to the point that dysbiosis is often associated with disease [1–3]. As such, increasing evidence points to the influence of human genetic variation on the composition and modulation of their gut microbiota.

Past genetic studies have collectively revealed important host-microbe interactions [4–14]. Previous twin studies detected significant heritability signal from the presence and abundance of only a few microbial taxa, such as some Firmicutes [15], suggesting a strong transientness and variability in gut microbial composition, as well as an important influence from external factors [6,15–18]. Nonetheless, a well-described association between Bifidobacterium levels and LCT-MCM6, governing the phenotype of lactase persistence throughout adulthood in Europeans, was uncovered in 2015 [4] and subsequently replicated by later studies [6,7,9–12], suggesting a very strong influence of the evolution of dairy diet in modern humans on their gut bacteria. Additionally, genes involved in immune and metabolic processes [9] but also disease [19] were also associated with gut microbial variation. Despite several promising findings, reproducibility across studies varying in sampling and methods is generally poor, and most previously reported associations lose significance after multiple testing corrections [20]. The individual gut microbiota is largely influenced by environmental variables, mostly diet and medication [21–23], which could explain a larger proportion of microbiome variance than identifiable host genetic factors [9,10]. Biological factors could also influence the cross-study reproducibility of results. GWAS would typically not reproducibly identify genetic associations with taxa harbouring microbial functions potentially shared by multiple unrelated species [24,25]. Indeed, a certain degree of functional redundancy has been observed in human gut microbial communities [25], which is believed to play a role in the resistance and resilience to perturbations [26–28]. However, both assembly and functioning in human gut microbial communities seem to be driven by the presence of a few particular and identifiable keystone taxa [29], which exert key ecological and modulatory roles on gut microbial composition independently of their abundance [30,31]. Such taxa are relatively prevalent across individuals and thought to be part of the human “core” microbiota [30,31], which makes them potentially identifiable through GWAS.

Increasing sample size in studied populations could yield novel and robustly associated results, and alleviate the effect of confounding technical or biological factors. This could be achieved either by performing meta-analyses of GWAS conducted in various populations [12], or by using larger cohort datasets. In this study, we used a large single homogenous population cohort with matching human genotypes and shotgun faecal metagenomes (N=5959; FINRISK 2002 (FR02)) to identify novel genome-wide associations between human genotypes and gut microbial abundances (Figure S1). We further leveraged additional and extensive health registry and dietary individual data to investigate the effects of diet and genotype on particular host-microbial associations, and to predict incident disease linked to gut microbial variation.

Results

Genome-wide association analysis of gut microbial taxa

Genome-wide association tests were applied to 2,801 microbial taxa and 7,979,834 human genetic variants from 5,959 individuals enrolled in the FR02 cohort, which includes all taxa discovered to be prevalent in >25% of the cohort (Methods). Using a genome-wide significance threshold (p<5.0×10⁻⁸), a total of 478 distinct GTDB taxa, which represented 17% of all tested taxa and included 11 phyla, 19 classes, 24 orders, 63 families, 148 genera and 213 species, were found to be associated with at least one genetic variant (Figure 1, Table S1). Conditional analysis found 583 independent SNP-taxon associations at genome-wide significance (Table S1). Heritability across the 2,801 taxa ranged between h²=0.001 to 0.214, with the highest values observed for taxa belonging to the Firmicutes and Firmicutes_A GTDB phyla, both of which encompassed half (241/476, 50.4%) of all associated taxa with genetic variation (Figure S2). There were no differences in SNP heritability between groups of associated or non-associated taxa at genome-wide significance (p=0.23).

Three loci were strongly associated with microbial variation at study-wide significance, as shown on a Manhattan plot showing the lowest resulting p-value for each SNP tested against each of the 2,801 taxa (Figure 1, Table 1). There was no evidence of excess false positive rate in the GWAS (median λGC=1.0051) (Figure 1B). After conditional analysis, the strongest association by far (p=5.0×10⁻³⁵) involved members of class Actinobacteria and rs3940549, a variant in the LCT-MCM6-ZRANB3 locus region which is in high LD (r²=0.87) with the well-described LCT variant rs4988235 causing lactase persistence in adults of European ancestry (Figure S3). In total, 29 taxa were associated with the LCT-MCM6 region, including 18 below study-wide significance (Figure 1, Table S1). These involved Bifidobacterium-related Actinobacteriota and three taxa from the GTDB Firmicutes_A phylum which included 2 uncultured species defined from metagenome-assembled reference genomes (UBA3855 sp900316885 and CAG-81 sp000435795) (Table 1). The association of these three Firmicutes_A with LCT was still genome-wide significant after adjusting for Bifidobacterium abundances (Table S2). A variant in ABO (rs545971), expressing the histo-blood group ABO system transferase, was strongly associated (p=1.1×10⁻¹²) with levels of Faecalicatena lactaris. There was evidence for a second independent signal at ABO associated with the Collinsella genus (chr9:133271182; p=2.5×10⁻⁸) (Table S1, Figure 1). Rs187309577 and rs143507801 in MED13L, expressing the Mediator complex subunit 13L, were found to be associated with genus Enterococcus (p=1.8×10⁻¹²) and the Enterococcus faecalis species (p=7.26×10⁻¹¹), respectively (Table S1, Figure 1).

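The genomic inflation factor λGC quoted above is conventionally computed as the median of the observed one-degree-of-freedom chi-square statistics divided by the median expected under the null (about 0.455). The sketch below derives the statistics from two-sided p-values; it is a generic illustration with hypothetical names, not the authors' code, and the reported value is a median of such factors across taxa.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Genomic inflation factor from two-sided p-values in (0, 1]."""
    norm = NormalDist()
    # A two-sided p-value p corresponds to z = inv_cdf(p / 2), chi2 = z^2.
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Uniformly spread p-values, as expected under the null, give a value near 1.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_p), 3))
```

Values well above 1 would indicate systematic inflation of test statistics, for example from population stratification; a value close to 1, as reported here, argues against it.
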
Human gut microbiome keystone taxa are associated with genetic variation

In total, we identified 31 distinct genetic variants associated (p<5.0×10⁻⁸) with 39 microbial taxa related to identified keystone species as listed by Banerjee et al. (2018) [29,32], which included the Actinobacteria class [30], Helicobacter pylori [29], Bacteroides stercoris [33], Bacteroides thetaiotaomicron [34], Ruminococcus bromii [35], Klebsiella pneumoniae [36], Proteus mirabilis [36], Akkermansia muciniphila [31], and the archaeon Methanobrevibacter smithii [37,38] (Figure 1C, Table S1). Only one documented keystone species from Banerjee et al. [29], Bacteroides fragilis [39], was not associated with genetic variation in our study. This observation suggests that keystone species, although defined as exerting selective modulation and not broad effects on microbiome composition variation, generally associates with human genetic variation, suggesting an intimate association with the human gut niche, in line with their reported key ecological roles in microbiome modulation and functioning. Our work highlights novel human genotypes possibly associated with keystone taxa (Table S1), which could further improve our understanding of their ecology.

Combined effect of host genetics and dietary dairy intake on gut levels of LCT-associated bacteria

medRxiv preprint doi: https://doi.org/10.1101/2020.09.12.20193045; this version posted September 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

We compared the abundances of four bacterial taxa strongly associated with the LCT locus (Bifidobacterium genus, Negativibacillus genus, UBA3855 sp900316885 and CAG-81 sp000435795) in individuals with different rs4988235 genotypes and dairy diets (Figure 2A). The abundance of Bifidobacterium in individuals producing lactase through adulthood (rs4988235:TT) was unaffected by dairy intake. However, lactose-intolerant individuals (rs4988235:CC) self-reporting a regular dairy diet had a significant increase in Bifidobacterium abundance (p=1.75×10⁻¹³; Wilcoxon rank test). An intermediate genotype (rs4988235:CT) was linked to an intermediate increase (Figure 2A). This trend did not seem to be affected by age [40] (Figure S4).

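The genotype-stratified comparison above can be sketched as follows. This is a minimal illustration, assuming that the reported Wilcoxon rank test is the two-sample rank-sum (Mann-Whitney U) form. It uses a normal approximation with tie correction and no continuity correction. The function name `rank_sum_test` and all abundance values are hypothetical and are not data from the cohort.

```python
import math

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Returns (U statistic of sample a, p-value from the normal approximation).
    """
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    n1, n2 = len(a), len(b)
    n = n1 + n2
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2          # mean of ranks i+1 .. j for tied values
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_a - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0
    z = (u - mu) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical transformed abundances, split by rs4988235 genotype and dairy intake.
cc_dairy = [5.1, 6.3, 5.8, 7.0, 6.1, 6.6, 5.9, 7.2]
cc_no_dairy = [2.0, 3.1, 2.7, 3.5, 2.2, 3.0, 2.9, 3.3]
tt_dairy = [3.0, 2.5, 3.4, 2.8, 3.1, 2.6, 3.3, 2.9]
tt_no_dairy = [2.9, 2.6, 3.2, 2.7, 3.0, 2.4, 3.5, 3.1]

u_cc, p_cc = rank_sum_test(cc_dairy, cc_no_dairy)
u_tt, p_tt = rank_sum_test(tt_dairy, tt_no_dairy)
print("CC: U =", u_cc, "p =", p_cc)
print("TT: U =", u_tt, "p =", p_tt)
```

Running the test separately within each genotype group, rather than on the pooled sample, is what allows a diet effect present only in rs4988235:CC individuals to be seen.
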
An inverse pattern was observed for the abundance distributions of Negativibacillus and uncultured CAG-81 sp000435795, for which abundances decreased in lactose-intolerant individuals reporting dairy intake, as compared to rs4988235:TT individuals consuming dairy products (p=0.049 and p=0.041, respectively) (Figure 2A). Levels of UBA3855 sp900316885 were unaffected by a dairy diet in lactose-intolerant individuals but were surprisingly lower in rs4988235:TT individuals who reported dairy intake (p=8.23×10⁻⁵) (Figure 2A). These contrasting effects of dairy intake on associated bacterial abundances in lactose-intolerant individuals could reflect competition for lactose in the gut. Genus CAG-81 abundances were the most negatively correlated with those of the other LCT-associated taxa (Figure S5), which suggests that this competition could be strong and prevalent enough to drive co-association at the LCT locus, possibly mediated by lactose intake (Figure 2B).

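A correlation screen of the kind summarized above can be sketched in a few lines. The example assumes Spearman rank correlation, which is not stated in this section, and ranks each taxon by its mean correlation with the others. The abundance values are arbitrary toy numbers chosen only to exercise the code and do not reproduce Figure S5.

```python
from statistics import mean

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2
        i = j
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical abundances of the four LCT-associated taxa in six samples.
abundance = {
    "Bifidobacterium": [1, 2, 3, 4, 5, 6],
    "Negativibacillus": [2, 1, 4, 3, 6, 5],
    "UBA3855": [3, 1, 2, 6, 4, 5],
    "CAG-81": [6, 5, 4, 3, 2, 1],
}

# Mean correlation of each taxon with the three others.
mean_rho = {
    taxon: mean(spearman(values, other_values)
                for other, other_values in abundance.items() if other != taxon)
    for taxon, values in abundance.items()
}
most_negative = min(mean_rho, key=mean_rho.get)
print(most_negative, mean_rho[most_negative])
```
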
Functional profiling of CAZymes in 11 Bifidobacterium species

Of all 11 Bifidobacterium species prevalent enough in our study population to be included in the GWAS, only B. dentium was not associated with the LCT locus (p=1.70×10⁻²), nor was it co-abundant with any other Bifidobacterium species (Figure S6A). B. dentium has previously been suggested to have different metabolic abilities [41]. A clustering of carbohydrate-active enzyme (CAZyme) profiles from reference genomes of all 11 Bifidobacterium species revealed that B. dentium clustered apart from the 10 other species, which grouped consistently with their co-abundance patterns (Figure S6B). B. dentium harboured more genes encoding CAZyme families with preferred fiber/plant-related substrates (GH94, GH26, GH53) than other Bifidobacterium species, which in turn seemed to harbour more milk oligosaccharide-targeting CAZyme families (GH129, GH112) than B. dentium (Figure S6B). These differences could relate to the observed association differences. This suggests that bacterial metabolic abilities can be strong drivers of co-abundance, and of association with human genetic variation.

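The clustering of CAZyme profiles described above can be illustrated with a small sketch. It assumes Euclidean distance and average-linkage agglomerative clustering, neither of which is specified in this section. The gene counts are toy values, not those of Figure S6B, and `species_A` to `species_C` are placeholders for the other Bifidobacterium species.

```python
from itertools import combinations

# CAZyme families named in the text; counts below are hypothetical toy values.
FAMILIES = ["GH94", "GH26", "GH53", "GH129", "GH112"]
profiles = {
    "B. dentium": [3, 2, 3, 0, 0],
    "species_A": [0, 0, 1, 2, 1],
    "species_B": [1, 0, 0, 1, 2],
    "species_C": [0, 1, 0, 2, 2],
}

def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def average_linkage(profiles, k):
    """Merge the two closest clusters (mean pairwise distance) until k clusters remain."""
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]   # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters

clusters = average_linkage(profiles, k=2)
print(clusters)
```

With these toy counts, the profile enriched in fiber/plant-related families ends up alone in its own cluster, mirroring the qualitative pattern reported for B. dentium.
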
Functionally distinct ABO-associated bacteria are impacted differently by genotype and dietary fiber intake

A variety of bacteria metabolize blood antigens, with potential applications in synthetic universal donor blood production [42,43]. Gut bacteria are particularly exposed to A- and B-antigens in the gut mucosa of secretor individuals [44]. Our associations of Faecalicatena lactaris (p=1.10×10⁻¹²) and Collinsella (p=2.59×10⁻⁸) with ABO suggest a possible metabolic link with blood antigens. A comparison of CAZyme profiles across a set of reference genomes revealed 3 CAZymes with blood-related activities in F. lactaris (GH110 [45], GH136 [46], CBM32 [47]), but none in any of 9 Collinsella species (Figure 3A). More mucus-targeting and fewer fiber-degrading enzymes were found in F. lactaris than in Collinsella (Figure 3A), suggesting distinct functions in the gut.

