scispace - formally typeset
Open AccessPosted ContentDOI

Linking common and rare disease genetics through gene regulatory networks

Reads0
Chats0
TLDR
In this paper, the authors demonstrate that common and rare genetic diseases can be linked by studying the gene regulatory networks impacted by common disease-associated variants, and they implemented this in the downstreamer method and applied it to 44 GWAS traits and find that predicted downstream "key genes" are enriched with Mendelian disease genes.
Abstract
Genetic variants identified through genome-wide association studies (GWAS) are typically non-coding and exert small regulatory effects on downstream genes, but which downstream genes are ultimately impacted and how they confer risk remains mostly unclear. Conversely, variants that cause rare Mendelian diseases are often coding and have a more direct impact on disease development. We demonstrate that common and rare genetic diseases can be linked by studying the gene regulatory networks impacted by common disease-associated variants. We implemented this in the ‘Downstreamer’ method and applied it to 44 GWAS traits and find that predicted downstream “key genes” are enriched with Mendelian disease genes, e.g. key genes for height are enriched for genes that cause skeletal abnormalities and Ehlers-Danlos syndromes. We find that 82% of these key genes are located outside of GWAS loci, suggesting that they result from complex trans regulation rather than being impacted by disease-associated variants in cis. Finally, we discuss the challenges in reconstructing gene regulatory networks and provide a roadmap to improve identification of these highly connected genes for common traits and diseases.

read more

Content maybe subject to copyright    Report

1
Linking common and rare disease genetics through gene
regulatory networks
Olivier B. Bakker
1*
, Annique Claringbould
2*
, Harm-Jan Westra
1
, Henry Wiersma
1
, Floranne
Boulogne
1
, Urmo Võsa
3
, Sophie Mulcahy Symmons
1
, Iris H. Jonkers
1
, Lude Franke
1,4#
,
Patrick Deelen
1,4#
* These authors contributed equally
#
These authors contributed equally
1: University of Groningen, University Medical Center Groningen, Department of Genetics,
Groningen, The Netherlands
2: Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
3: Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
4: Oncode Institute, Utrecht, The Netherlands
Abstract
Genetic variants identified through genome-wide association studies (GWAS) are
typically non-coding and exert small regulatory effects on downstream genes, but which
downstream genes are ultimately impacted and how they confer risk remains mostly unclear.
Conversely, variants that cause rare Mendelian diseases are often coding and have a more
direct impact on disease development. We demonstrate that common and rare genetic
diseases can be linked by studying the gene regulatory networks impacted by common
disease-associated variants. We implemented this in the ‘Downstreamer’ method and
applied it to 44 GWAS traits and find that predicted downstream “key genes” are enriched
with Mendelian disease genes, e.g. key genes for height are enriched for genes that cause
skeletal abnormalities and Ehlers-Danlos syndromes. We find that 82% of these key genes
are located outside of GWAS loci, suggesting that they result from complex trans regulation
rather than being impacted by disease-associated variants in cis. Finally, we discuss the
challenges in reconstructing gene regulatory networks and provide a roadmap to improve
identification of these highly connected genes for common traits and diseases.
. CC-BY-NC-ND 4.0 International licenseIt is made available under a
perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint
The copyright holder for thisthis version posted October 27, 2021. ; https://doi.org/10.1101/2021.10.21.21265342doi: medRxiv preprint
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

2
Introduction
Genetic variation plays a major role in the development of both common and rare
diseases, yet the genetic architectures of these disease types are usually considered quite
different. Rare genetic disorders are thought to primarily be caused by a single, mostly
protein-coding genetic variant that has a large effect on disease risk. As a consequence, the
causal genes for a rare disorder can often be identified by sequencing individual patients or
families. In contrast, the genetic risks for common diseases are modulated by a large
number of mostly non-coding variants that individually exert small effects. These variants are
typically identified through genome-wide association studies (GWASs). However,
identification of the causal variants and genes affected by GWAS loci remains challenging,
in part due to linkage disequilibrium (LD) and small effect-sizes
1,2
.
Despite the differences between rare and complex diseases, it has been shown that
GWAS loci for multiple traits are enriched for genes that can cause related rare diseases
when damaged
3,4
. For instance, common variants associated to PR interval, a
measurement of heart function, have been found within the MYH6 gene
5
, which is known to
harbour mutations in individuals with familial dilated cardiomyopathy
6
. Moreover, eQTL
studies have found examples of rare disease genes that are affected by distal common
variants in trans, such as the immunodeficiency gene ISG15, which is affected by multiple
systemic lupus erythematosusassociated variants
7
. These results indicate that common
and rare diseases can result from damage to or altered regulation of the same genes,
suggesting that the same biological pathways underlie these conditions
4
. However, it is not
fully known to what extent specific genes and pathways are shared between rare and
common diseases.
Over the years, many pathway-enrichment methods have been developed that can
identify which biological pathways are enriched for common diseases
8–10
as well as
highlighting their most likely cellular context(s)
11,12
. In addition, several methods can
prioritize individual genes within GWAS susceptibility loci by studying how they are
functionally related to genes in other susceptibility loci
8,1316
. However, these methods
confine themselves to genes in GWAS loci, potentially missing relevant trans-regulated up-
or downstream effects. In blood, expression quantitative trait locus (eQTL) mapping has
been successful in identifying the downstream trans regulatory consequences of GWAS-
associated variants (i.e. trans-eQTLs and eQTSs, where polygenic scores are linked to
expression levels)
7
. Unfortunately, large eQTL sample sizes are required to detect such
effects, and such datasets are not yet available for most tissues.
. CC-BY-NC-ND 4.0 International licenseIt is made available under a
perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint
The copyright holder for thisthis version posted October 27, 2021. ; https://doi.org/10.1101/2021.10.21.21265342doi: medRxiv preprint

3
Here we build upon the ‘omnigenic model’ hypothesis, which states that the genes that
are most important in complex diseases are those that are modulated by many different
common variants through gene regulatory networks
17,18
. The omnigenic model postulates
that a limited number of core genes exist that drive diseases, but that many peripheral
genes, which contain associated variants, contribute indirectly to disease development by
modulating the activity of the core genes. Since the omnigenic model predicts that many
core genes map outside GWAS loci, these genes will be missed by methods that prioritize
genes within GWAS loci. The omnigenic model hypothesis is supported by recent works
assessing RNA levels of blood cells
19
and molecular traits
20
and a large-scale in vitro
knockdown experiment
21
. However, these studies were performed in blood, limiting their
conclusions to GWAS studies on blood-related traits and immunological disorders.
To take this work further, we integrated (mRNA level) gene regulatory networks with
GWAS summary statistics to prioritize key genes that we suspect are more likely to directly
contribute to disease predisposition than genes in GWAS loci. We have implemented this
strategy in a software package calledDownstreamerthat uses GWAS summary statistics
and gene co-regulation based on 31,499 multi-tissue RNA-seq samples in order to prioritize
these key genes. We also provide pathway, rare disease phenotype (coded by HPO terms)
and tissue enrichments to aid in the comprehensive interpretation of GWAS results.
We applied Downstreamer to 44 GWASs for a wide variety of traits (Table S1) and
show that the identified key genes are enriched for intolerance to loss-of-function (LoF) and
missense (MiS) mutations and for Mendelian disease genes that lead to similar phenotypic
outcomes as the GWAS trait. Specifically, we find that key genes for height are strongly
enriched for severe growth defects and skeletal abnormalities in humans and mice.
Additionally, key genes for auto-immune diseases point to lymphocyte checkpoints and
regulators and those for glomerular filtration rate (GFR; a measure of kidney function) are
transporters for several metabolites.
Key genes that cause Mendelian disease can therefore highlight the molecular
pathways driving the complex disease. Conversely, predicted key genes may aid in
identifying new Mendelian disease genes.
Results
To enable identification of GWAS key genes, we developed Downstreamer (Methods),
a tool that integrates GWAS summary statistics with gene expressionbased co-regulation
networks. Downstreamer first converts individual variant associations to gene p-values by
aggregating associations within a 25kb window around the gene body for all protein-coding
. CC-BY-NC-ND 4.0 International licenseIt is made available under a
perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint
The copyright holder for thisthis version posted October 27, 2021. ; https://doi.org/10.1101/2021.10.21.21265342doi: medRxiv preprint

4
genes while correcting for LD between variants
9
(Fig. 1A). These gene p-values are then
converted to z-scores. We calculated gene z-scores for 44 GWAS summary statistics
reflecting a wide variety of disorders and complex traits (Table S1).
Association signals for polygenic traits cluster around transcription factors
We observed that the z-scores of individual GWASs were often weakly positively
correlated (Fig. S1A), especially for traits for which many loci have been identified (Fig. S1
B). For instance, the gene-level z-scores for height correlated positively with the gene-level
z-scores of all other traits. To investigate the source of this shared signal, we calculated the
average gene-level significance across all 44 traits while correcting for bias that might be
introduced for traits that are strongly correlated (see Methods).
We observed that 30% of the variation in this average GWAS’ signal could be
explained by both the extent of LD around a gene and by the local gene density (Fig. S2).
The more extensive the LD around a gene, the higher the chance that genetic variants within
the gene are associated, especially for highly polygenic traits
22,23
. Consequently, the gene-
level z-score for these genes increases. Hence, when collapsing GWAS summary statistics
into gene z-scores, some amount of correlation between well-powered GWAS studies is to
be expected. However, this is unwanted when using gene z-scores in a pathway-enrichment
analysis. We next evaluated if the remaining 70% of the average signal was enriched for any
biological processes. After correcting for LD and gene density, we observed that 79 of the
top 500 genes are transcription factors (OR: 2.22, p-value: 4.25×10
-9
). We also saw
enrichment among the top 500 genes for pathways related to DNA binding and transcription,
for example, transcription regulator activity (OR: 1.98, p-value: 2.24×10
-11
) (Table S2).
Additionally, genes with a higher average gene z-score were enriched for intolerance to LoF
(Fig. S3). These enrichments suggest that there is a set of genes, enriched for GWAS hits,
that confer risk to many different types of traits. This is consistent with previous observations
that broad functional categories tend to be enriched for many traits
17
.
However, as these often-associated genes obscure the specific pathways and key
genes for a trait, we corrected the individual gene z-scores for the average gene z-score in
order to get disease-specific gene-level significance scores that were as specific as
possible. Downstreamer then correlates these corrected gene z-scores to gene expression
patterns, pathway memberships and tissue expression through a
generalized least squares
(G
LS) model that accounts for genegene correlations resulting from the relationship
between LD and sharing of biological functionality (Fig. 1B, Methods). This results in a z-
score that represents the significance of the association of a gene, pathway or tissue
. CC-BY-NC-ND 4.0 International licenseIt is made available under a
perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint
The copyright holder for thisthis version posted October 27, 2021. ; https://doi.org/10.1101/2021.10.21.21265342doi: medRxiv preprint

5
. CC-BY-NC-ND 4.0 International licenseIt is made available under a
perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint
The copyright holder for thisthis version posted October 27, 2021. ; https://doi.org/10.1101/2021.10.21.21265342doi: medRxiv preprint

Figures
Citations
More filters
Journal ArticleDOI

Brain expression quantitative trait locus and network analyses reveal downstream effects and putative drivers for brain-related diseases

TL;DR: The MetaBrain this article meta-analyses performed in multiple brain region-and ancestry-specific datasets (n ≤ 2,759) were conducted to identify therapeutic targets from genome-wide association studies (GWAS) requires insights into downstream functional consequences.
Posted ContentDOI

An efficient not-only-linear correlation coefficient based on machine learning

TL;DR: Clustermatch Correlation Coefficient (CCC), an efficient, easy-to-use and not-only-linear coefficient based on machine learning models, is introduced, revealing biologically meaningful linear and nonlinear patterns missed by standard, linear-only correlation coefficients.
Posted ContentDOI

A multi-ancestry genome-wide meta-analysis, fine-mapping, and target gene prioritization to characterize the genetic architecture of adiponectin.

TL;DR: This article proposed a combinatorial likelihood scoring formalism (GPScore) based on measures derived from 11 gene prioritization strategies and the physical distance to the transcription start site to prioritize the 30 most probable target genes underlying the adiponectin-associated variants in the multi-ancestry analysis.
Peer ReviewDOI

“Mendelian Code” in the Genetic Structure of Common Multifactorial Diseases

TL;DR: It is important to analyze the Mendelian code of multifactorial diseases using the multiomic approach, which will allow one to identify driver genes and biological pathways associated with the development of diseases.
References
More filters
Journal ArticleDOI

Gene Ontology: tool for the unification of biology

TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Journal ArticleDOI

KEGG: Kyoto Encyclopedia of Genes and Genomes

TL;DR: The Kyoto Encyclopedia of Genes and Genomes (KEGG) as discussed by the authors is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules.
Journal ArticleDOI

A global reference for human genetic variation.

Adam Auton, +517 more
- 01 Oct 2015 - 
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-generation sequencing, deep exome sequencing, and dense microarray genotyping.
Journal ArticleDOI

The mutational constraint spectrum quantified from variation in 141,456 humans

TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.
Related Papers (5)