
Identification of context-dependent expression quantitative trait loci in whole blood

TL;DR: This work generated peripheral blood RNA–seq data from 2,116 unrelated individuals and systematically identified context-dependent eQTLs using a hypothesis-free strategy that does not require previous knowledge of the identity of the modifiers.
Abstract: Genetic risk factors often localize to noncoding regions of the genome with unknown effects on disease etiology. Expression quantitative trait loci (eQTLs) help to explain the regulatory mechanisms underlying these genetic associations. Knowledge of the context that determines the nature and strength of eQTLs may help identify cell types relevant to pathophysiology and the regulatory networks underlying disease. Here we generated peripheral blood RNA-seq data from 2,116 unrelated individuals and systematically identified context-dependent eQTLs using a hypothesis-free strategy that does not require previous knowledge of the identity of the modifiers. Of the 23,060 significant cis-regulated genes (false discovery rate (FDR) ≤ 0.05), 2,743 (12%) showed context-dependent eQTL effects. The majority of these effects were influenced by cell type composition. A set of 145 cis-eQTLs depended on type I interferon signaling. Others were modulated by specific transcription factors binding to the eQTL SNPs.

Summary

Introduction

  • The molecular mechanisms underlying the association of genetic risk factors with disease and complex traits are still largely elusive.
  • Many disease-associated genetic variants are found in non-coding parts of the genome 1,2 and thus must have a regulatory effect on expression.
  • Mapping single nucleotide polymorphisms (SNPs) with an effect on the regulation of gene expression (expression quantitative trait loci, eQTLs) helps to unravel the regulatory networks that underlie physiological traits and diseases 3–8.
  • A subset of eQTLs in immune cells may only be observed after activation of these cells by immunological triggers 15–20.
  • Additionally, insights into the activity of signaling pathways modifying eQTL effects help to unravel the regulatory networks underlying disease.

Results

  • The authors generated a comprehensive set of cis-eQTLs by sequencing whole peripheral blood mRNA of 2,176 healthy adults from four Dutch cohorts 21–24 (2,116 individuals remaining after stringent quality control (Table S1, Supplementary material)).
  • The authors quantified gene and exon expression, as well as exon ratios (the proportion of expression of an exon relative to the total expression of all exons of a gene) and polyA ratios (the ratio of the expression in upstream and downstream parts of the 3’-UTRs separated by annotated polyadenylation (polyA) sites), and performed cis-eQTL mapping for all of these.
  • A complete catalogue of all their eQTLs can be downloaded and explored via a dedicated browser at http://genenetwork.nl/biosqtlbrowser.
  • More than half of the cis-regulated genes showed evidence for multiple independent eQTL effects (Figure 1a, Figure S1).
  • As expected, eQTL effects were predominantly found for SNPs associated with hematological, lipid or immune-related traits.
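The two ratio phenotypes defined above can be computed per sample from read counts. The sketch below is a minimal illustration, assuming counts have already been quantified per exon and for the 3’-UTR segments upstream and downstream of an annotated polyA site. The function names, the upstream/downstream orientation and the pseudocount are hypothetical choices, and any normalization applied in the study is not shown.

```python
from typing import Dict


def exon_ratios(exon_counts: Dict[str, float]) -> Dict[str, float]:
    """Share of a gene's total exon expression contributed by each exon."""
    total = sum(exon_counts.values())
    if total == 0:
        # Gene not expressed in this sample: ratios are undefined.
        return {exon: float("nan") for exon in exon_counts}
    return {exon: count / total for exon, count in exon_counts.items()}


def polya_ratio(upstream: float, downstream: float, pseudocount: float = 1.0) -> float:
    """Upstream / downstream expression around one annotated polyA site.

    The orientation and the pseudocount (to avoid division by zero) are
    illustrative assumptions, not the study's definition.
    """
    return (upstream + pseudocount) / (downstream + pseudocount)


# Hypothetical read counts for one gene in one sample
print(exon_ratios({"exon1": 50, "exon2": 30, "exon3": 20}))  # {'exon1': 0.5, 'exon2': 0.3, 'exon3': 0.2}
print(polya_ratio(99, 49))  # 2.0
```

Exon ratios sum to 1 per gene and sample, so they capture changes in exon usage independently of the gene's overall expression level.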

Context-dependent eQTLs

  • The effects of SNPs on gene expression often depend on the cell type or tissue under investigation 9–12, and may be modified by external and environmental factors 15–19.
  • The authors first identified the proxy gene acting on the highest number of eQTLs.
  • There was a significant imbalance in the direction of regulation within the T-cell cluster: 54 genes were up-regulated by the IBD risk allele whereas only 29 were down-regulated (binomial test p-value: 0.003), suggesting increased T-cell activity in IBD.
  • Five of these eQTLs were strongest in neutrophils (positive interaction score for module 1) and the genes containing these eQTLs were present in the neutrophil cluster (Figure 3d).
  • The authors therefore conclude that the effect of these 145 eQTL genes is dependent on stimulation with type I interferon.
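The direction-of-effect imbalance reported above (54 genes up-regulated versus 29 down-regulated by the IBD risk allele) is the kind of count an exact binomial test evaluates against a 50/50 null. A minimal standard-library sketch follows. This summary does not say whether the reported p-value was one- or two-sided, so the sketch computes both and does not claim to reproduce the published value.

```python
from math import comb


def binomial_test(k: int, n: int, two_sided: bool = True) -> float:
    """Exact binomial test of k 'successes' out of n against p = 0.5."""
    total = 2 ** n
    upper = sum(comb(n, i) for i in range(k, n + 1)) / total  # P(X >= k)
    if not two_sided:
        return upper
    lower = sum(comb(n, i) for i in range(0, k + 1)) / total  # P(X <= k)
    # Binomial(n, 0.5) is symmetric, so double the smaller tail (capped at 1).
    return min(1.0, 2 * min(upper, lower))


up, down = 54, 29  # counts taken from the text above
print("one-sided:", binomial_test(up, up + down, two_sided=False))
print("two-sided:", binomial_test(up, up + down))
```

Working with exact integer binomial coefficients avoids the normal approximation, which can be loose for counts of this size.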

Regulatory network discovery

  • Each of the aforementioned ten modules demonstrated effects on many (>120) eQTLs.
  • To identify these, the authors first corrected the expression data for the 10 module interaction effects and then ascertained for each gene-level eQTL whether the eQTL effect size was significantly dependent on the expression of any other gene.
  • The authors propose a model in which extracellular (HDL) cholesterol levels modify SREBF2 binding to the FADS2 promoter, which in turn affects FADS2 expression and lipid desaturase activity in the cell.
  • This eQTL-activating cluster was strongly enriched for “positive regulation of B cell proliferation” (p-value = 1 × 10^-7), and the strongest proxy gene in this cluster was FCRLA, which is known to be highly expressed in proliferating B cells residing in the germinal centers of lymph nodes 44.
  • EBF1 thus influences MYBL2 gene expression, and because EBF1 binds at SNP rs285205, this SNP likely affects EBF1 binding affinity.
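Testing whether an eQTL effect size depends on the expression of another gene is commonly framed as a linear model with a genotype-by-proxy interaction term, y = b0 + b1·g + b2·c + b3·(g·c) + e, where g is the SNP dosage, c the expression of the candidate proxy gene and b3 the interaction effect. The sketch below fits that model by ordinary least squares in pure Python. It is an illustration under assumptions: the covariates, normalization and test statistic used in the study are not given here, the p-value uses a normal approximation to the t distribution, and the toy data are synthetic.

```python
import math
from typing import List, Sequence, Tuple


def invert(matrix: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting (fine for tiny matrices)."""
    size = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def interaction_test(y: Sequence[float], g: Sequence[float],
                     c: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y ~ 1 + g + c + g*c by OLS; return (b3, t, approx. two-sided p)."""
    X = [[1.0, gi, ci, gi * ci] for gi, ci in zip(g, c)]
    n, k = len(y), 4
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(r * r for r in resid) / (n - k)
    se = math.sqrt(sigma2 * xtx_inv[3][3])
    t = beta[3] / se
    p = math.erfc(abs(t) / math.sqrt(2))  # normal approximation to the t test
    return beta[3], t, p


# Synthetic toy data (not from the study): g = dosage, c = proxy-gene expression
n = 231
g = [i % 3 for i in range(n)]
c = [((i * 7) % 11 - 5) / 5 for i in range(n)]
noise = [0.1 * ((i * 13) % 7 - 3) for i in range(n)]
y_context = [0.5 * gi + 0.3 * ci + 0.8 * gi * ci + e for gi, ci, e in zip(g, c, noise)]
y_plain = [0.5 * gi + 0.3 * ci + e for gi, ci, e in zip(g, c, noise)]
print(interaction_test(y_context, g, c))  # interaction present
print(interaction_test(y_plain, g, c))    # no interaction
```

In a genome-wide setting this test would be repeated for every eQTL and candidate proxy gene, so a multiple-testing correction is needed on top of the per-test p-values.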

Discussion

  • Using whole blood RNA-seq data the authors greatly expanded the catalog of SNPs that have a known regulatory function.
  • To gain a better understanding of the biology behind these regulatory variants, the authors identified 2,743 context-dependent eQTLs (1,842 in the first 10 modules and 901 in the remainder) and identified many of the determinants that modify these eQTLs.
  • These provide further insight into the cell types in which the genetic risk factors are regulating gene expression and the regulatory networks in which they participate, further refining their findings on GWAS risk loci.
  • Unlike other approaches (15,16,20), their method does not rely on any prior knowledge of, or assumptions about, differences in cell type composition or naturally occurring stimulations acting on their whole tissue data.
  • As such, their approach complements perturbation experiments to gain better insight into regulatory networks and their stimuli.

Figures

  • Over 20,000 genes are regulated by cis-eQTLs overlapping with 33% of the entries in the GWAS catalog.
  • (B) Gene function enrichment per cluster showed T-cell biology for the yellow cluster and neutrophil biology for the blue cluster.
  • (D) All positive eQTL interaction effects for IBD eQTLs.
  • Genes positively correlated with the top covariate (SP140) are indicated in blue and those negatively correlated with SP140 in red.

Data availability

  • All results can be queried using their dedicated QTL browser: http://genenetwork.nl/biosqtlbrowser/.
  • Raw data were submitted to the European Genome-phenome Archive (EGA, accession number EGAS00001001077).

Author contributions

  • BTH, PACtH, JBJvM, AI, RJ and LF formed the management team of the BIOS consortium.
  • JBJvM, PMJ, MV, JvR and NL generated RNA-seq data.
  • HM, MvI, MvG, WA, JB, DVZ, RJ, PvtH, PD, MV, IN, MaS, PACtH, BTH and MM were responsible for data management and the computational infrastructure.
  • DVZ, PD, PACtH and LF drafted the manuscript.

Hypothesis-free identification of modulators of genetic risk factors
Daria V. Zhernakova [1*], Patrick Deelen [1,2*], Martijn Vermaat [3*], Maarten van Iterson [4*], Michiel van Galen [3], Wibowo Arindrarto [5], Peter van ’t Hof [5], Hailiang Mei [5], Freerk van Dijk [1,2], Harm-Jan Westra [6,7,8], Marc Jan Bonder [1], Jeroen van Rooij [9], Marijn Verkerk [9], P. Mila Jhamai [9], Matthijs Moed [4], Szymon M. Kielbasa [4], Jan Bot [10], Irene Nooren [10], René Pool [11], Jenny van Dongen [11], Jouke J. Hottenga [11], Coen D.A. Stehouwer [12], Carla J.H. van der Kallen [12], Casper G. Schalkwijk [12], Alexandra Zhernakova [1], Yang Li [1], Ettje F. Tigchelaar [1], Marian Beekman [4], Joris Deelen [4], Diana van Heemst [13], Leonard H. van den Berg [14], Albert Hofman [15], André G. Uitterlinden [9], Marleen M.J. van Greevenbroek [12], Jan H. Veldink [16], Dorret I. Boomsma [11], Cornelia M. van Duijn [17], Cisca Wijmenga [1], P. Eline Slagboom [4], Morris A. Swertz [1,2], Aaron Isaacs [17,18], Joyce B.J. van Meurs [9], Rick Jansen [19], Bastiaan T. Heijmans [4#], Peter A.C. ’t Hoen [3#], Lude Franke [1#]
* Shared first; # Shared last
[1] University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, the Netherlands
[2] University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands
[3] Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
[4] Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands
[5] Sequence Analysis Support Core, Leiden University Medical Center, Leiden, the Netherlands
[6] Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
[7] Partners Center for Personalized Genetic Medicine, Boston, USA
[8] Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, USA
[9] Department of Internal Medicine, ErasmusMC, Rotterdam, the Netherlands
[10] SURFsara, Amsterdam, the Netherlands
bioRxiv preprint doi: https://doi.org/10.1101/033217; this version posted November 30, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

[11] Department of Biological Psychology, VU Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, the Netherlands
[12] Department of Internal Medicine and School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, the Netherlands
[13] Department of Gerontology and Geriatrics, Leiden University Medical Center, Leiden, the Netherlands
[14] Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
[15] Department of Epidemiology, ErasmusMC, Rotterdam, The Netherlands
[16] Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
[17] Genetic Epidemiology Unit, Department of Epidemiology, ErasmusMC, Rotterdam, the Netherlands
[18] CARIM School for Cardiovascular Diseases and Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
[19] Department of Psychiatry, VU University Medical Center, Neuroscience Campus Amsterdam, Amsterdam, the Netherlands
Abstract
Genetic risk factors often localize in non-coding regions of the genome with unknown effects on
disease etiology. Expression quantitative trait loci (eQTLs) help to explain the regulatory
mechanisms underlying the association of genetic risk factors with disease. More mechanistic
insights can be derived from knowledge of the context, such as cell type or the activity of
signaling pathways, influencing the nature and strength of eQTLs. Here, we generated
peripheral blood RNA-seq data from 2,116 unrelated Dutch individuals and systematically
identified these context-dependent eQTLs using a hypothesis-free strategy that does not require
prior knowledge of the identity of the modifiers. Out of the 23,060 significant cis-regulated
genes (false discovery rate ≤ 0.05), 2,743 genes (12%) show context-dependent eQTL effects.
The majority of those were influenced by cell type composition, revealing eQTLs that are
particularly strong in cell types such as CD4+ T-cells, erythrocytes, and even lowly abundant
eosinophils. A set of 145 cis-eQTLs were influenced by the activity of the type I interferon
signaling pathway and we identified several cis-eQTLs that are modulated by specific
transcription factors that bind to the eQTL SNPs. This demonstrates that large-scale eQTL
studies in unchallenged individuals can complement perturbation experiments to gain better
insight into regulatory networks and their stimuli.
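The abstract reports significance at FDR ≤ 0.05, but this section does not state how the FDR was estimated (eQTL studies often use permutation-based FDR instead). As a generic illustration only, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of p-values. It is a swapped-in, textbook technique, not necessarily the procedure used in this study, and the p-values are made up.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """Return a rejection flag per test using the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k / m * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))
# [True, True, False, False, False, False, False, False, False, False]
```

Because the rule is step-up, a p-value that fails its own rank threshold can still be rejected if a larger p-value further down the ranking passes.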
Introduction
The molecular mechanisms underlying the association of genetic risk factors with disease and
complex traits are still largely elusive. Many disease-associated genetic variants are found in
non-coding parts of the genome (1,2) and thus must have a regulatory effect on expression.
Mapping single nucleotide polymorphisms (SNPs) with an effect on the regulation of gene
expression (expression quantitative trait loci, eQTLs) helps to unravel the regulatory networks
that underlie physiological traits and diseases (3–8). Given differences between the regulatory
networks of different cell types, it is not surprising that a substantial fraction of eQTLs are only
apparent in specific cell types or tissues (9–14). The presence of external stimuli and the activity of
internal signaling pathways may also determine the presence and strength of the regulatory
effects of eQTLs. For example, a subset of eQTLs in immune cells may only be observed after
activation of these cells by immunological triggers (15–20). Knowledge of the cellular context in
which disease-associated eQTLs are active can help to identify the cell types that are relevant
to the pathophysiology; identifying the cell type in which a risk locus shows the most
profound effects allows variants to be prioritized for functional experiments. Additionally, insights
into the activity of signaling pathways that modify eQTL effects help to unravel the regulatory
networks underlying disease. Here, we developed and applied a strategy to identify the most
important intrinsic and extrinsic factors that modify eQTL effects in blood cells, without making
any prior assumptions about the identity of these modifiers. We demonstrate how the eQTLs and
their modifiers help us better understand the molecular basis of disease.
Results
Main-effect cis-eQTLs
We generated a comprehensive set of cis-eQTLs by sequencing whole peripheral blood mRNA
of 2,176 healthy adults from four Dutch cohorts (21–24) (2,116 individuals remaining after stringent
quality control; Table S1, Supplementary material). We quantified gene and exon expression,
as well as exon ratios (the proportion of expression of an exon relative to the total expression of
all exons of a gene) and polyA ratios (the ratio of the expression in upstream and downstream
parts of the 3’-UTRs separated by annotated polyadenylation (polyA) sites), and performed
cis-eQTL mapping for all of these. We detected cis-eQTL effects for 66% of the protein-coding
genes tested and 19% of the non-coding genes tested. In total, we found eQTL effects for
23,060 different genes (false discovery rate (FDR) ≤ 0.05). We replicated 84% of the 6,418
cis-eQTL genes that we had previously detected in a meta-analysis of 5,311
array-based blood samples (4) (90% with the same allelic direction; Table S2). This
demonstrates the superior statistical power of RNA-seq data for detecting eQTLs (Tables
S2 and S3). We also observed strong overlap with RNA-seq-based cis-eQTLs from
EBV-transformed lymphoblastoid cell lines (LCLs) (5): 78% of the LCL cis-eQTLs replicated,
88% with the same allelic direction. At the same time, we substantially extended the list of genes
known to be under genetic regulation (replication results in Supplementary material online, Table S2).
In addition to gene-level eQTLs, we identified 21,888 genes with one or more exon-level QTL
effects, and 9,777 and 2,322 genes in which SNPs affected the inclusion rate of exons and the
usage of polyA sites, respectively (Table S3). A complete catalogue of all our
eQTLs can be downloaded and explored via a dedicated browser at
http://genenetwork.nl/biosqtlbrowser.
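The replication figures above ("with the same allelic direction") require comparing effect signs only after aligning the assessed allele between studies. The sketch below shows one way to do that, assuming each study reports a signed effect (z-score or beta) with the assessed and other allele. The record layout, the (SNP, gene) keys and the choice to skip strand-ambiguous A/T and C/G SNPs are illustrative assumptions, not necessarily how the study handled them.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class EQTL(NamedTuple):
    assessed: str  # allele the effect direction refers to
    other: str
    z: float       # signed effect size (z-score or beta)


# Strand-ambiguous SNPs: the allele pair looks the same on both strands.
AMBIGUOUS = {frozenset("AT"), frozenset("CG")}


def aligned_sign(ref: EQTL, rep: EQTL) -> Optional[int]:
    """+1 same direction, -1 opposite, None if the pair cannot be compared."""
    if frozenset((ref.assessed, ref.other)) in AMBIGUOUS:
        return None
    if (rep.assessed, rep.other) == (ref.assessed, ref.other):
        z = rep.z
    elif (rep.assessed, rep.other) == (ref.other, ref.assessed):
        z = -rep.z  # assessed allele is swapped, so flip the sign
    else:
        return None  # alleles do not match (e.g. strand flip); skip
    if z == 0 or ref.z == 0:
        return None
    return 1 if (z > 0) == (ref.z > 0) else -1


def concordance(ref: Dict[Tuple[str, str], EQTL],
                rep: Dict[Tuple[str, str], EQTL]) -> float:
    """Fraction of shared (SNP, gene) pairs whose effects point the same way."""
    signs = [aligned_sign(ref[key], rep[key]) for key in ref.keys() & rep.keys()]
    signs = [s for s in signs if s is not None]
    if not signs:
        return float("nan")
    return sum(1 for s in signs if s == 1) / len(signs)


# Hypothetical records: (SNP, gene) -> effect; all IDs are made up
discovery = {
    ("rs1", "GENE1"): EQTL("A", "G", 5.2),
    ("rs2", "GENE2"): EQTL("C", "T", -4.1),
    ("rs3", "GENE3"): EQTL("A", "T", 3.0),   # A/T: strand-ambiguous, skipped
    ("rs4", "GENE4"): EQTL("G", "A", 6.0),
}
replication = {
    ("rs1", "GENE1"): EQTL("A", "G", 3.3),   # same allele, same sign
    ("rs2", "GENE2"): EQTL("T", "C", 2.0),   # swapped allele: -2.0 after flip
    ("rs3", "GENE3"): EQTL("A", "T", -1.0),
    ("rs4", "GENE4"): EQTL("G", "A", -2.5),  # opposite direction
}
print(concordance(discovery, replication))  # 2 of 3 comparable pairs agree
```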
Multiple unlinked SNPs in the same locus may independently influence expression or mRNA
processing of the same gene (25). We analyzed this using stepwise regression of the effects of
the top eQTL SNPs. More than half of the cis-regulated genes showed evidence for multiple
independent eQTL effects (Figure 1a, Figure S1).
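One way to implement the stepwise analysis described above is forward conditional selection: test all cis-SNPs, keep the strongest, re-test the remaining SNPs conditional on those already selected, and stop when nothing passes a threshold. The sketch below does this in pure Python, conditioning via Gram-Schmidt residualization (which gives the same partial effect as adding the selected SNPs as OLS covariates). The threshold `alpha`, the normal approximation for p-values and the synthetic genotypes are assumptions for illustration, not the study's settings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _center(v: Sequence[float]) -> List[float]:
    m = sum(v) / len(v)
    return [x - m for x in v]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _project_out(v: List[float], basis: List[List[float]]) -> List[float]:
    """Remove the components of v along mutually orthogonal basis vectors."""
    out = list(v)
    for b in basis:
        coef = _dot(out, b) / _dot(b, b)
        out = [o - coef * bi for o, bi in zip(out, b)]
    return out


def stepwise_eqtl(expr: Sequence[float], genotypes: Dict[str, Sequence[float]],
                  alpha: float = 1e-5, max_steps: int = 5) -> List[Tuple[str, float]]:
    """Forward stepwise conditional analysis of cis-SNPs for one gene."""
    n = len(expr)
    y = _center(expr)
    basis: List[List[float]] = []  # residualized genotypes of selected SNPs
    selected: List[Tuple[str, float]] = []
    for _ in range(max_steps):
        y_res = _project_out(y, basis)
        df = n - 2 - len(basis)  # intercept + tested SNP + selected SNPs
        if df <= 0:
            break
        best = None  # (t, snp, residualized genotype)
        for snp, g in genotypes.items():
            if any(snp == s for s, _ in selected):
                continue
            x_res = _project_out(_center(g), basis)
            sxx, syy = _dot(x_res, x_res), _dot(y_res, y_res)
            if sxx < 1e-12 or syy < 1e-12:
                continue  # SNP explained by selected SNPs, or nothing left to explain
            r2 = min(_dot(x_res, y_res) ** 2 / (sxx * syy), 1 - 1e-15)
            t = math.sqrt(df * r2 / (1 - r2))
            if best is None or t > best[0]:
                best = (t, snp, x_res)
        if best is None:
            break
        t, snp, x_res = best
        p = math.erfc(t / math.sqrt(2))  # two-sided, normal approximation
        if p >= alpha:
            break
        selected.append((snp, p))
        basis.append(x_res)
    return selected


# Synthetic toy data: A and B are independent causal SNPs, C is null
n = 198
genotypes = {
    "A": [i % 3 for i in range(n)],
    "B": [(i // 3) % 3 for i in range(n)],
    "C": [((i * 2) % 5) // 2 for i in range(n)],
}
noise = [0.1 * ((i * 13) % 7 - 3) for i in range(n)]
expr = [a + 0.5 * b + e for a, b, e in zip(genotypes["A"], genotypes["B"], noise)]
print(stepwise_eqtl(expr, genotypes))  # expected to select A, then B
```

A gene with more than one selected SNP would count as having multiple independent eQTL effects under this scheme.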
The gene cis-eQTL SNPs are strongly enriched for DNase I footprints, various histone marks
and binding sites of multiple transcription factors (26) (Table S4), suggesting that our substantial
sample size enabled us to pinpoint likely causal regulatory variants. Moreover, top eQTL SNPs
were significantly enriched for general and blood-cell-type-specific enhancers (as defined by
Andersson et al., 2014 (27)), but not for non-blood tissue-specific enhancers (Table S5). Evidence
for the functionality of exon ratio and polyA ratio QTLs in mRNA splicing and polyadenylation is
presented in the supplementary material.
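The enrichment claims above compare how often top eQTL SNPs fall in an annotation (DNase I footprints, enhancers) against a background set. The statistical test is not described in this section; one common choice is a one-sided Fisher's exact test on a 2×2 table, sketched below with the standard library. The counts are hypothetical, and real analyses usually need a background matched on properties such as allele frequency and distance to the transcription start site, which this sketch does not do.

```python
from math import comb
from typing import Tuple


def fisher_enrichment(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

    a: eQTL SNPs in the annotation       b: eQTL SNPs outside it
    c: background SNPs in the annotation d: background SNPs outside it
    Returns (odds_ratio, p) with p = P(X >= a) under the hypergeometric null.
    """
    drawn, in_annot, total = a + b, a + c, a + b + c + d
    denom = comb(total, drawn)
    p = sum(comb(in_annot, x) * comb(total - in_annot, drawn - x)
            for x in range(a, min(drawn, in_annot) + 1)) / denom
    odds = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds, p


# Hypothetical counts, not taken from Tables S4 or S5
print(fisher_enrichment(120, 880, 800, 9200))
```

Exact integer arithmetic keeps the hypergeometric tail accurate even for large tables, at the cost of speed.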
One third (2,064; 32.7%) of previously established genetic risk factors for disease or complex
traits (derived from the NHGRI GWAS catalog and a set of reported ImmunoChip associations,
P ≤ 5 × 10^-8) were in strong linkage disequilibrium (LD r² ≥ 0.8) with a top eQTL SNP (Table S6,
Figure 1b). As expected, eQTL effects were predominantly found for SNPs associated with
hematological, lipid or immune-related traits. We observed a highly significant enrichment of
co-localization of eQTL and GWAS SNPs (LD r² ≥ 0.8) for many immune disorders, as compared to
height (see supplementary material for details), indicating that our blood cis-eQTLs are highly
informative for diseases such as inflammatory bowel disease, multiple sclerosis and rheumatoid
arthritis (Figure 1c).
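The overlap criterion above (a GWAS SNP in LD r² ≥ 0.8 with a top eQTL SNP) can be illustrated with a small sketch. Here r² is approximated by the squared Pearson correlation of genotype dosages across the same individuals, a common proxy for unphased data. The study may instead have used phased haplotypes or a reference panel, which this section does not specify; the SNP names and dosages below are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def dosage_r2(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Squared Pearson correlation of two dosage vectors (unphased LD proxy)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # monomorphic SNP: LD undefined, treated as no overlap
    return cov * cov / (v1 * v2)


def colocalized(gwas_snps: Iterable[str], eqtl_snps: Iterable[str],
                dosages: Dict[str, Sequence[float]],
                threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """All (GWAS SNP, eQTL SNP, r2) pairs with r2 >= threshold."""
    eqtl_snps = list(eqtl_snps)
    hits = []
    for gs in gwas_snps:
        for es in eqtl_snps:
            r2 = dosage_r2(dosages[gs], dosages[es])
            if r2 >= threshold:
                hits.append((gs, es, r2))
    return hits


# Hypothetical dosages (0, 1, 2 copies of the alternative allele) in 8 people
dosages = {
    "gwas1": [0, 1, 2, 1, 0, 2, 1, 0],
    "eqtl1": [0, 1, 2, 1, 0, 2, 1, 1],
    "eqtl2": [2, 0, 1, 0, 2, 1, 0, 1],
}
print(colocalized(["gwas1"], ["eqtl1", "eqtl2"], dosages))  # only eqtl1 passes
```

In practice only eQTL SNPs within a window around each GWAS SNP would be compared, rather than all pairs.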


References
More filters
Journal ArticleDOI
TL;DR: SAMtools as discussed by the authors implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations

Journal ArticleDOI
TL;DR: The Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure outperforms other aligners by a factor of >50 in mapping speed.
Abstract: Motivation Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. Results To align our large (>80 billon reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. Availability and implementation STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

30,684 citations

Journal ArticleDOI
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing webbased methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations
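The basic question behind these tools is whether two genomic features overlap. The sketch below is a naive illustration of that test, not BEDTools' own algorithm or command-line interface. It assumes BED's 0-based, half-open coordinates, so [100, 200) and [200, 300) share no base, and it compares every pair of intervals, which real tools avoid on large inputs by sorting or binning. The `Interval` type and the function names are hypothetical.

```python
# Illustrative sketch only (hypothetical names); not the BEDTools implementation.
from __future__ import annotations

from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)


def parse_bed_line(line: str) -> Interval:
    """Read the first three BED columns (chrom, start, end); extra columns are ignored."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return Interval(chrom, int(start), int(end))


def overlaps(a: Interval, b: Interval) -> bool:
    # Half-open intervals overlap iff each one starts before the other ends.
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Report the shared segment of every overlapping (a, b) pair; O(len(a) * len(b))."""
    return [
        Interval(x.chrom, max(x.start, y.start), min(x.end, y.end))
        for x in a
        for y in b
        if overlaps(x, y)
    ]


genes = [parse_bed_line("chr1\t100\t200\tgeneA\n"), parse_bed_line("chr2\t0\t50\tgeneB\n")]
peaks = [Interval("chr1", 150, 250), Interval("chr1", 200, 300), Interval("chr2", 10, 20)]
# chr1:150-200 and chr2:10-20 are reported; the book-ended chr1:200-300 peak is not.
print(intersect(genes, peaks))
```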

Journal Article
16 Feb 2007-Science
TL;DR: A method called “affinity propagation,” which takes as input measures of similarity between pairs of data points, found clusters with much lower error than other methods, and did so in less than one-hundredth the amount of time.
Abstract: Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such "exemplars" can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initial choice is close to a good solution. We devised a method called "affinity propagation," which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. We used affinity propagation to cluster images of faces, detect genes in microarray data, identify representative sentences in this manuscript, and identify cities that are efficiently accessed by airline travel. Affinity propagation found clusters with much lower error than other methods, and it did so in less than one-hundredth the amount of time.

6,429 citations
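Affinity propagation exchanges two kinds of real-valued messages between data points: responsibilities r(i, k), how well suited k is to be the exemplar of i, and availabilities a(i, k), how appropriate it is for i to choose k. A minimal pure-Python sketch of these damped updates follows. It assumes a dense similarity matrix whose diagonal holds the "preference" that controls how many exemplars emerge, runs a fixed number of iterations instead of testing for convergence, and costs O(n³) per iteration, so it suits toy inputs only; the function name is hypothetical.

```python
# Illustrative sketch only (hypothetical name); fixed iteration count, no convergence check.
from __future__ import annotations


def affinity_propagation(S: list[list[float]], damping: float = 0.5, iterations: int = 200) -> list[int]:
    """Return the exemplar index chosen by each point.

    S[i][k] is the similarity of point i to candidate exemplar k (needs at least 2 points);
    the diagonal S[k][k] is the preference: larger values yield more exemplars.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                competitor = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))  for i != k
        for k in range(n):
            support = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(support) - support[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - support[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point takes the k that maximises a(i,k) + r(i,k); k == i marks an exemplar.
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
preference = -64.0  # median of the off-diagonal similarities for these points
S = [[preference if i == k else -(points[i] - points[k]) ** 2 for k in range(len(points))]
     for i in range(len(points))]
labels = affinity_propagation(S)
print(labels)
```

Using negative squared distance as the similarity and the median similarity as the preference is a common default; raising the preference tends to produce more, smaller clusters.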

Journal Article
TL;DR: A simple and effective method for performing normalization is outlined and dramatically improved results for inferring differential expression in simulated and publicly available data sets are shown.
Abstract: The fine detail provided by sequencing-based transcriptome surveys suggests that RNA-seq is likely to become the platform of choice for interrogating steady state RNA. In order to discover biologically important changes in expression, we show that normalization continues to be an essential step in the analysis. We outline a simple and effective method for performing normalization and show dramatically improved results for inferring differential expression in simulated and publicly available data sets.

6,042 citations
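The abstract does not spell out the normalization itself. The method it describes is known as the trimmed mean of M-values (TMM); the sketch below is a simplified version of that idea, not edgeR's `calcNormFactors`. For genes with nonzero counts in both libraries it computes log-ratios M and average log-expression A, trims the extremes of both (30% and 5% per side here, chosen as assumptions), and returns 2 raised to the unweighted mean of the remaining M values. TMM's precision weights and any rescaling of factors across samples are omitted, and the function name is hypothetical.

```python
# Simplified, unweighted TMM-style factor (hypothetical name); not edgeR's implementation.
import math


def tmm_factor(sample: list[int], ref: list[int], m_trim: float = 0.3, a_trim: float = 0.05) -> float:
    """Scaling factor of `sample` relative to `ref` from a trimmed mean of log-ratios."""
    n_s, n_r = sum(sample), sum(ref)
    m_vals, a_vals = [], []
    for y_s, y_r in zip(sample, ref):
        if y_s > 0 and y_r > 0:  # log-ratios are undefined for zero counts
            p_s, p_r = y_s / n_s, y_r / n_r
            m_vals.append(math.log2(p_s / p_r))        # M: log fold change
            a_vals.append(0.5 * math.log2(p_s * p_r))  # A: average log expression
    n = len(m_vals)
    by_m = sorted(range(n), key=lambda g: m_vals[g])
    by_a = sorted(range(n), key=lambda g: a_vals[g])
    cut_m, cut_a = int(n * m_trim), int(n * a_trim)
    # Keep genes that survive trimming on both M and A.
    keep = set(by_m[cut_m:n - cut_m]) & set(by_a[cut_a:n - cut_a])
    mean_m = sum(m_vals[g] for g in keep) / len(keep)
    return 2 ** mean_m


ref = [10, 10, 10, 10, 10]
sample = [10, 10, 10, 10, 60]  # one gene takes up 60% of the sample library
f = tmm_factor(sample, ref)
print(f, sum(sample) * f)      # 0.5 50.0
```

Multiplying the library size by the factor gives an effective library size of 50, at which the four unchanged genes no longer appear down-regulated merely because one gene dominates the sample.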

Related Papers (5)
08 May 2015-Science
Kristin G. Ardlie, David S. DeLuca, Ayellet V. Segrè, Timothy J. Sullivan, Taylor Young, Ellen Gelfand, Casandra A. Trowbridge, Julian Maller, Taru Tukiainen, Monkol Lek, Lucas D. Ward, Pouya Kheradpour, Benjamin Iriarte, Yan Meng, Cameron D. Palmer, Tõnu Esko, Wendy Winckler, Joel N. Hirschhorn, Manolis Kellis, Daniel G. MacArthur, Gad Getz, Andrey A. Shabalin, Gen Li, Yi-Hui Zhou, Andrew B. Nobel, Ivan Rusyn, Fred A. Wright, Tuuli Lappalainen, Pedro G. Ferreira, Halit Ongen, Manuel A. Rivas, Alexis Battle, Sara Mostafavi, Jean Monlong, Michael Sammeth, Marta Melé, Ferran Reverter, Jakob M. Goldmann, Daphne Koller, Roderic Guigó, Mark I. McCarthy, Emmanouil T. Dermitzakis, Eric R. Gamazon, Hae Kyung Im, Anuar Konkashbaev, Dan L. Nicolae, Nancy J. Cox, Timothée Flutre, Xiaoquan Wen, Matthew Stephens, Jonathan K. Pritchard, Zhidong Tu, Bin Zhang, Tao Huang, Quan Long, Luan Lin, Jialiang Yang, Jun Zhu, Jun Liu, Amanda Brown, Bernadette Mestichelli, Denee Tidwell, Edmund Lo, Mike Salvatore, Saboor Shad, Jeffrey A. Thomas, John T. Lonsdale, Michael T. Moser, Bryan Gillard, Ellen Karasik, Kimberly Ramsey, Christopher Choi, Barbara A. Foster, John Syron, Johnell Fleming, Harold Magazine, Rick Hasz, Gary Walters, Jason Bridge, Mark Miklos, Susan L. Sullivan, Laura Barker, Heather M. Traino, Maghboeba Mosavel, Laura A. Siminoff, Dana R. Valley, Daniel C. Rohrer, Scott D. Jewell, Philip A. Branton, Leslie H. Sobin, Mary Barcus, Liqun Qi, Jeffrey McLean, Pushpa Hariharan, Ki Sung Um, Shenpei Wu, David Tabor, Charles Shive, Anna M. Smith, Stephen A. Buia, Anita H. Undale, Karna Robinson, Nancy Roche, Kimberly M. Valentino, Angela Britton, Robin Burges, Debra Bradbury, Kenneth W. Hambright, John Seleski, Greg E. Korzeniewski, Kenyon Erickson, Yvonne Marcus, Jorge Tejada, Mehran Taherian, Chunrong Lu, Margaret J. Basile, Deborah C. Mash, Simona Volpi, Jeffery P. Struewing, Gary F. Temple, Joy T. Boyer, Deborah Colantuoni, Roger Little, Susan E. Koester, Latarsha J. Carithers, Helen M. Moore, Ping Guan, Carolyn C. Compton, Sherilyn Sawyer, Joanne P. Demchok, Jimmie B. Vaught, Chana A. Rabiner, Nicole C. Lockhart
19 Feb 2015-Nature
Anshul Kundaje, Wouter Meuleman, Jason Ernst, Misha Bilenky, Angela Yen, Alireza Heravi-Moussavi, Pouya Kheradpour, Zhizhuo Zhang, Jianrong Wang, Michael J. Ziller, Viren Amin, John W. Whitaker, Matthew D. Schultz, Lucas D. Ward, Abhishek Sarkar, Gerald Quon, Richard Sandstrom, Matthew L. Eaton, Yi-Chieh Wu, Andreas R. Pfenning, Xinchen Wang, Melina Claussnitzer, Yaping Liu, Cristian Coarfa, R. Alan Harris, Noam Shoresh, Charles B. Epstein, Elizabeta Gjoneska, Danny Leung, Wei Xie, R. David Hawkins, Ryan Lister, Chibo Hong, Philippe Gascard, Andrew J. Mungall, Richard A. Moore, Eric Chuah, Angela Tam, Theresa K. Canfield, R. Scott Hansen, Rajinder Kaul, Peter J. Sabo, Mukul S. Bansal, Annaick Carles, Jesse R. Dixon, Kai How Farh, Soheil Feizi, Rosa Karlic, Ah Ram Kim, Ashwinikumar Kulkarni, Daofeng Li, Rebecca F. Lowdon, Ginell Elliott, Tim R. Mercer, Shane Neph, Vitor Onuchic, Paz Polak, Nisha Rajagopal, Pradipta R. Ray, Richard C Sallari, Kyle Siebenthall, Nicholas A Sinnott-Armstrong, Michael Stevens, Robert E. Thurman, Jie Wu, Bo Zhang, Xin Zhou, Arthur E. Beaudet, Laurie A. Boyer, Philip L. De Jager, Peggy J. Farnham, Susan J. Fisher, David Haussler, Steven J.M. Jones, Wei Li, Marco A. Marra, Michael T. McManus, Shamil R. Sunyaev, James A. Thomson, Thea D. Tlsty, Li-Huei Tsai, Wei Wang, Robert A. Waterland, Michael Q. Zhang, Lisa Helbling Chadwick, Bradley E. Bernstein, Joseph F. Costello, Joseph R. Ecker, Martin Hirst, Alexander Meissner, Aleksandar Milosavljevic, Bing Ren, John A. Stamatoyannopoulos, Ting Wang, Manolis Kellis
Frequently Asked Questions (18)
Q1. What are the contributions mentioned in the paper "Hypothesis-free identification of modulators of genetic risk factors"?

In this paper, the most important intrinsic and extrinsic factors that modify eQTL effects in blood cells are identified. 

These findings provide further insight into the cell types in which the genetic risk factors regulate gene expression and into the regulatory networks in which they participate, refining the interpretation of GWAS risk loci.

The positively correlated genes are enriched for genes upregulated upon rhinovirus stimulation 16 (Fisher exact p-value = 1.14 × 10⁻⁹), in line with their involvement in the type I interferon response.
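The enrichment above is reported as a Fisher exact p-value without the underlying 2 × 2 table. As a generic sketch, not the authors' code, a one-sided Fisher exact test for over-representation sums the hypergeometric probabilities of all tables at least as extreme as the observed one, with row and column totals held fixed. The table layout, the function name and the counts in the usage line are assumptions, and the summary does not say whether a one- or two-sided test was used.

```python
# Generic one-sided Fisher exact test (hypothetical name), exact integer arithmetic.
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for over-representation in the 2x2 table

                     in gene set   not in set
        hit               a             b
        not a hit         c             d

    i.e. P(X >= a) for X hypergeometric with all row and column totals fixed.
    """
    hits = a + b    # row total: genes with the property of interest
    in_set = a + c  # column total: genes in the tested set
    n = a + b + c + d
    top = min(hits, in_set)
    numer = sum(comb(hits, x) * comb(n - hits, in_set - x) for x in range(a, top + 1))
    return numer / comb(n, in_set)


print(fisher_exact_greater(3, 1, 1, 3))  # hypothetical counts; 17/70, about 0.243
```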

In support of the modifying effects of viral cues on this set of eQTLs, eQTL genes that have recently been reported as rhinovirus-response QTLs 16 typically have higher interaction z-scores for module 7 than other eQTL genes (Wilcoxon p-value = 0.02). 
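The Wilcoxon comparison above asks whether interaction z-scores tend to be higher in one gene set than in another. A minimal rank-sum (Mann-Whitney U) sketch follows, using mid-ranks for ties and a two-sided normal approximation. It is a textbook version that omits the tie correction to the variance and any continuity correction, it is not the implementation used in the paper, and the function names and input values are hypothetical.

```python
# Textbook Wilcoxon rank-sum sketch (hypothetical names); normal approximation only.
import math


def midranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: list[float], y: list[float]) -> tuple:
    """Return (U for x, z, two-sided p) for H0: x and y come from the same distribution."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, z, math.erfc(abs(z) / math.sqrt(2))  # erfc(|z|/sqrt 2) = 2 * (1 - Phi(|z|))


# Hypothetical interaction z-scores for two gene sets
u, z, p = rank_sum_test([2.1, 3.4, 1.8, 2.9, 3.7], [0.4, 1.2, -0.3, 0.9, 1.5, 0.1])
print(u, round(z, 2))  # 30.0 2.74 (every x exceeds every y); p is about 0.006
```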

(C) Expression levels in the cell-sorted BLUEPRINT data show that the genes in the yellow cluster are more highly expressed in T-cells and the genes in the blue cluster are more highly expressed in neutrophils.

Samples with very low STX3 expression showed only a very weak eQTL effect on NOD2, whereas samples with very high STX3 expression showed a stronger eQTL effect.
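The STX3 example can be pictured by splitting samples on the proxy gene's expression and fitting the eQTL slope (expression regressed on genotype dosage coded 0/1/2) separately within each stratum. The sketch below does exactly that on synthetic data. It only illustrates the idea; this summary does not describe how the authors modelled the effect, and the function names are hypothetical.

```python
# Stratified eQTL slopes on synthetic data (hypothetical names); illustration only.
def ols_slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def stratified_eqtl_slopes(genotype, expression, proxy, n_bins=2):
    """Split samples into proxy-expression bins and fit the eQTL slope within each bin."""
    order = sorted(range(len(proxy)), key=lambda i: proxy[i])
    size = len(order) // n_bins
    slopes = []
    for b in range(n_bins):
        idx = order[b * size:(b + 1) * size] if b < n_bins - 1 else order[b * size:]
        slopes.append(ols_slope([genotype[i] for i in idx], [expression[i] for i in idx]))
    return slopes


genotype = [0, 1, 2, 0, 1, 2]
proxy = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]  # hypothetical proxy-gene expression
expression = [g * c for g, c in zip(genotype, proxy)]
print(stratified_eqtl_slopes(genotype, expression, proxy))  # [1.0, 5.0]
```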

When these genes were also included, the authors observed that this cluster of genes is strongly co-expressed with EBF1, a transcription factor that drives B-cell differentiation and proliferation, suggesting that EBF1 might drive the eQTL interaction effect for MYBL2.

The authors expect that the genes whose expression levels modify eQTLs are proxies for cell types or other intrinsic or extrinsic factors, and they call these genes 'proxy genes'.
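One common way to test whether a proxy gene modifies an eQTL, assumed here because this summary does not give the model, is ordinary least squares with a genotype × proxy interaction term,

expression = β0 + β1·genotype + β2·proxy + β3·genotype·proxy + ε,

followed by a Wald z-score β̂3 / SE(β̂3) for the interaction. The dependency-free sketch below fits this model using a small Gauss-Jordan matrix inverse; covariates are ignored, the data are synthetic, and the function names are hypothetical.

```python
# Interaction-eQTL sketch (hypothetical names): OLS with a genotype x proxy term.
import math


def invert(m: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inverse with partial pivoting (small, well-conditioned matrices only)."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def interaction_eqtl(genotype, proxy, expression):
    """OLS fit of expression ~ 1 + g + c + g*c; returns (beta_3, z) for the g*c term."""
    X = [[1.0, g, c, g * c] for g, c in zip(genotype, proxy)]
    n, k = len(X), 4
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, expression)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y - sum(b * x for b, x in zip(beta, row)) for row, y in zip(X, expression)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = math.sqrt(sigma2 * inv[3][3])            # standard error of beta_3
    return beta[3], beta[3] / se


# Synthetic data: the genotype effect grows with proxy expression (true beta_3 = 2).
rows = [(g, c, 0.5 * g + 2.0 * g * c + noise)
        for noise in (0.1, -0.1) for g in (0, 1, 2) for c in (-1.0, 0.0, 1.0, 2.0)]
G, C, Y = (list(col) for col in zip(*rows))
beta3, z = interaction_eqtl(G, C, Y)
print(round(beta3, 6), z > 10)  # 2.0 True
```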

The gene-level cis-eQTL SNPs are strongly enriched for DNase I footprints, various histone marks and binding sites of multiple transcription factors 26 (Table S4), suggesting that the large sample size enabled the authors to pinpoint likely causal regulatory variants.

In this example, the eQTL effect is more prominent in neutrophils than in other blood cell types, and NOD2 expression is lower in carriers of the risk allele than in carriers of the protective allele.

They were also enriched in binding sites for transcription factors involved in erythrocyte development based on ENCODE ChIP-seq data (GATA1, TAL1, GATA2 and MafK, each with enrichment p-values ≤ 10⁻⁵) 30–32.

The other exons were downregulated by the risk allele, suggesting that a shift toward the nonsense-mediated decay (NMD) isoform lowers overall gene expression levels (Figure S4).

Of the 232 top SNPs reported in this meta-analysis, 95 loci (41%) are in strong LD (r² ≥ 0.8) with a top eQTL SNP (median r² = 0.96 and median D′ = 0.996).
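For two biallelic SNPs, r² and Lewontin's D′ both start from D = p_AB − p_A·p_B but normalize it differently, which is why a high D′ can accompany a modest r². The sketch below assumes phased haplotype frequencies are already known (the summary does not say how LD was estimated); the function name is hypothetical.

```python
# LD statistics from haplotype frequencies (hypothetical name); assumes phased data.
from __future__ import annotations


def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """r^2 and |D'| for two biallelic loci from the A-B haplotype and allele frequencies."""
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # Lewontin's normalization: the largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


r2, d_prime = ld_stats(0.2, 0.5, 0.2)
print(round(r2, 3), round(d_prime, 3))  # 0.25 1.0
```

Here allele B only ever occurs on A haplotypes, so D′ = 1, yet r² = 0.25 falls well short of an r² ≥ 0.8 threshold.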

The authors also observed negative interactions, where the eQTL effect becomes smaller in a specific module, e.g. the eQTL effect of rs1728801 regulating ZFP90 (Figure 3f), a gene known to be important in T-helper cells 36.

EBF1 thus influences MYBL2 gene expression, and because EBF1 binds at SNP rs285205, this SNP likely affects EBF1's binding affinity.

Gene function enrichment analysis of the exon-level and exon-ratio QTLs gave results similar to those for eQTL genes (Table S8), indicating that the proxy genes represent not only the factors modulating gene-level eQTLs but also those affecting alternative-splicing eQTLs.

bioRxiv preprint doi: https://doi.org/10.1101/033217

Figure 3. eQTLs associated with inflammatory bowel disease are predominantly active in neutrophils and T-cells.

EBF1 is a known player in B-cell differentiation and proliferation and is positively correlated with both MYBL2 (r = 0.11, p-value = 6.99 × 10⁻⁷) and FCRLA (r = 0.8, p-value ≤ 2.2 × 10⁻¹⁶).
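These correlations are reported together with p-values. The usual test of H0: ρ = 0 converts r into t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sketch below computes r, that t statistic, and a normal approximation to the two-sided p-value that is reasonable only for large n. It is not the paper's code, and the function names are hypothetical.

```python
# Pearson correlation and its t statistic (hypothetical names); large-n p-value approximation.
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r: float, n: int) -> float:
    """t statistic with n - 2 degrees of freedom for H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def approx_two_sided_p(r: float, n: int) -> float:
    # Normal approximation to the t distribution; only sensible for large n.
    return math.erfc(abs(r_to_t(r, n)) / math.sqrt(2))


r = pearson_r([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0])
print(r)  # 0.8
```

Because t grows with √n, a weak correlation such as r = 0.11 already gives p-values far below 10⁻⁵ at around two thousand samples.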