Expanding Parkinson’s disease genetics: novel risk loci, genomic context, causal insights and heritable risk

Mike A. Nalls, +65 more
- 04 Mar 2019 - 
- pp 388165
TLDR
These data provide the most comprehensive understanding of the genetic architecture of PD to date by revealing many additional PD risk loci, providing a biological context for these risk factors, and demonstrating that a considerable genetic component of this disease remains unidentified.

TITLE
Expanding Parkinson’s disease genetics: novel risk loci, genomic context, causal insights and heritable risk.
AUTHORS
Mike A. Nalls (1,2,CA)*, Cornelis Blauwendraat (1)*, Costanza L. Vallerga (3,4)*, Karl Heilbron (5)*,
Sara Bandres-Ciga (1)*, Diana Chang (6)*, Manuela Tan (7), Demis A. Kia (7), Alastair J. Noyce (7,8),
Angli Xue (3,4), Jose Bras (9,10), Emily Young (11), Rainer von Coelln (12),
Javier Simón-Sánchez (13,14), Claudia Schulte (13,14), Manu Sharma (15), Lynne Krohn (16,17),
Lasse Pihlstrom (18), Ari Siitonen (19,20), Hirotaka Iwaki (1,2,21), Hampton Leonard (1,2),
Faraz Faghri (1,22), J. Raphael Gibbs (1), Dena G. Hernandez (1), Sonja W. Scholz (23,24),
Juan A. Botia (7,25), Maria Martinez (26), Jean-Christophe Corvol (27), Suzanne Lesage (27),
Joseph Jankovic (11), Lisa M. Shulman (11), The 23andMe Research Team (5),
System Genomics of Parkinson's Disease (SGPD) Consortium, Margaret Sutherland (28),
Pentti Tienari (29,30), Kari Majamaa (19,20), Mathias Toft (18,31), Ole A. Andreassen (32,33),
Tushar Bangale (6), Alexis Brice (27), Jian Yang (3,4), Ziv Gan-Or (16,17,34),
Thomas Gasser (13,14), Peter Heutink (13,14), Joshua M Shulman (11,35,36), Nicolas Wood (7),
David A. Hinds (5), John A. Hardy (7), Huw R Morris (37,38), Jacob Gratten (3,4),
Peter M. Visscher (3,4), Robert R. Graham (6), Andrew B. Singleton (1)
for the International Parkinson’s Disease Genomics Consortium.
AFFILIATIONS
1. Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, 20892 USA
2. Data Tecnica International, Glen Echo, MD, 20812 USA
3. Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072 Australia
4. Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072 Australia
5. 23andMe, Inc., Mountain View, California 94041 USA
6. Department of Human Genetics, Genentech, South San Francisco, 94080, CA, USA
7. Department of Molecular Neuroscience, UCL Institute of Neurology, London, UK
8. Preventive Neurology Unit, Wolfson Institute of Preventive Medicine, Queen Mary University of London, London, UK
9. Center for Neurodegenerative Science, Van Andel Research Institute, Grand Rapids, Michigan, USA
10. Department of Neurodegenerative Diseases, UCL Institute of Neurology, University College London, London, UK
11. Department of Neurology, Baylor College of Medicine, Houston, USA
12. Department of Neurology, University of Maryland School of Medicine, Baltimore, MD, USA
13. Department for Neurodegenerative Diseases, Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
14. German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
15. Centre for Genetic Epidemiology, Institute for Clinical Epidemiology and Applied Biometry, University of Tubingen, Germany
16. Department of Human Genetics, McGill University, Montreal, Quebec, Canada
17. Montreal Neurological Institute, McGill University, Montreal, Quebec, Canada
18. Department of Neurology, Oslo University Hospital, Oslo, Norway
19. Institute of Clinical Medicine, Department of Neurology, University of Oulu, Oulu, Finland
20. Department of Neurology and Medical Research Center, Oulu University Hospital, Oulu, Finland
21. The Michael J. Fox Foundation, New York, New York, 10036 USA
22. Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, IL, 61820, USA
23. Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
24. Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD 21287, USA
25. Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Spain
26. INSERM UMR 1220; and Paul Sabatier University, Toulouse, France
27. INSERM U1127, CNRS UMR 7225, Sorbonne Université UMR S1127, APHP, Institut du Cerveau et de la Moelle épinière, ICM, Paris F-75013, France
28. National Institute on Neurological Diseases and Stroke, National Institutes of Health, Bethesda, MD 20892 USA
29. Clinical Neurosciences, Neurology, University of Helsinki, Helsinki, Finland
30. Helsinki University Hospital, Helsinki, Finland
31. Institute of Clinical Medicine, University of Oslo, Oslo, Norway
32. Jebsen Centre for Psychosis Research, University of Oslo, Oslo, Norway
33. Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
34. Department of Neurology & Neurosurgery, McGill University, Montreal, Quebec, Canada
35. Departments of Molecular & Human Genetics and Neuroscience, Baylor College of Medicine, Houston, USA
36. Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston
37. Department of Clinical Neuroscience, UCL Institute of Neurology, London UK
38. UCL Movement Disorders Centre, UCL Institute of Neurology, London, UK
* denotes shared first authorship.
CA denotes corresponding author, mike[at]datatecnica[dot]com
Full consortia membership (PubMed indexed) is available in the supplemental materials (Text S1).

bioRxiv preprint doi: https://doi.org/10.1101/388165; this version posted March 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.
ACKNOWLEDGEMENTS AND FUNDING
See supplemental materials (Text S2).
ABSTRACT
We performed the largest genome-wide association study of PD to date, involving the analysis
of 7.8M SNPs in 37.7K cases, 18.6K UK Biobank proxy-cases, and 1.4M controls. We identified
90 independent genome-wide significant signals across 78 loci, including 38 independent risk
signals in 37 novel loci. These variants explained 26-36% of the heritable risk of PD. Tests of
causality within a Mendelian randomization framework identified putatively causal genes for 70
risk signals. Tissue expression enrichment analysis suggested that signatures of PD loci were
heavily brain-enriched, consistent with specific neuronal cell types being implicated from single
cell expression data. We found significant genetic correlations with brain volumes, smoking
status, and educational attainment. In sum, these data provide the most comprehensive
understanding of the genetic architecture of PD to date by revealing many additional PD risk
loci, providing a biological context for these risk factors, and demonstrating that a considerable
genetic component of this disease remains unidentified.
INTRODUCTION
Parkinson’s disease (PD) is a neurodegenerative disorder, affecting up to 2% of the population
older than 60 years, an estimated 1 million individuals in the United States alone. PD patients
suffer from a combination of progressive motor and non-motor symptoms that increasingly
impair daily function and quality of life. There are no treatments that delay or alter PD [1]. As the
global population continues to age, the prevalence of PD is projected to double in some age
groups by 2030, creating a substantial burden on healthcare systems [1,2,3].
Early investigations into the role of genetic factors in PD focused on the identification of rare
mutations underlying familial forms of the disease [4–6], but over the past decade there has been a
growing appreciation for the important contribution of genetics in sporadic disease [7,8]. Genetic
studies of sporadic PD have altered the foundational view of disease etiology, as much of
sporadic disease was formerly thought to be environmental.
With this in mind, we executed a series of experiments to further explore the genetics of PD
(summarized in Figure 1). We performed the largest-to-date GWAS for PD, including 7.8M
SNPs, 37.7K cases, 18.6K UK Biobank (UKB) “proxy-cases” and 1.4M controls. We identified
putatively causal genes for PD, providing valuable targets for therapeutic research. We
assessed the function of these putatively causal genes on a larger scale than in previous
studies of PD via Mendelian randomization (MR), expression enrichment, and protein-protein
interaction network analysis [9,10,11]. We estimated PD heritability, developed a polygenic risk
score that predicted a substantial proportion of this heritability, and leveraged these results to
inform future studies of PD genetics. Finally, we identified putative PD biomarkers and risk
factors using genetic correlation and Mendelian randomization.
METHODS
See Supplementary Methods
RESULTS
Novel loci and multiple signals in known loci identified
To maximize our power for locus discovery we used a single stage design, meta-analyzing all
available GWAS summary statistics. In support of this design, we found strong genetic
correlations between GWAS using PD cases ascertained by clinicians compared to 23andMe
self-reported cases (rG = 0.85, SE = 0.06) and UKB proxy cases (rG = 0.84, SE = 0.134).
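
To illustrate what meta-analyzing GWAS summary statistics involves, the sketch below combines per-study log odds ratios for a single SNP by inverse-variance weighting. This is an assumption made for illustration only: this section does not state which meta-analysis software or weighting scheme was used, and the function name and the example estimates are hypothetical.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis for one SNP.

    betas: per-study log odds ratios; ses: their standard errors.
    Returns the pooled beta, its standard error, the Z score and a
    two-sided P value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # erfc keeps precision for the very small P values typical of GWAS.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p

# Hypothetical estimates from three studies (not taken from the paper).
beta, se, z, p = fixed_effects_meta([0.10, 0.08, 0.12], [0.03, 0.04, 0.05])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

Studies with smaller standard errors dominate the pooled estimate, which is why adding large cohorts sharpens the signal even when their individual effects are modest.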
We identified a total of 90 independent genome-wide significant association signals through our
meta-analysis and conditional analyses of 37,688 cases, 18,618 UKB proxy-cases and
1,417,791 controls at 7,784,415 SNPs (Figure 2, Table 1, Supplementary Appendices, Table
S1, Table S2). Of these, 38 signals are new and more than 1 Mb from loci described in a
previous report by Chang et al. 2017 (Table S3).
In an attempt to detect multiple independent signals within loci we implemented conditional and
joint analysis (GCTA-COJO, http://cnsgenomics.com/software/gcta/) with a large study-specific
reference genotype series, as well as a participant-level conditional analysis using 23andMe
data [12]. We considered independent risk signals from conditional analyses to share the same
locus if they were within 250 kb of each other. We detected 10 loci containing more than one
independent risk signal (22 risk SNPs in total across these loci), of which nine had been
identified by previous GWAS, including multi-signal loci in the vicinity of GBA, NUCKS1 /
RAB29, GAK / TMEM175, SNCA and LRRK2. The novel multi-signal locus comprised
independent risk variants rs2269906 (UBTF / GRN) and rs850738 (FAM171A2). Detailed
summary statistics on all nominated loci can be found in Table S2.
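
The 250 kb rule for assigning independent signals to a shared locus can be sketched as follows. This is a minimal illustration, assuming that signals are chained along a chromosome (each signal is compared with the previous one in position order); the text does not specify how chains of nearby signals were handled, and the SNP identifiers and coordinates below are placeholders, not real data.

```python
def group_into_loci(signals, window_bp=250_000):
    """Group (chrom, pos, snp_id) signals into loci.

    Signals on the same chromosome that lie within window_bp of the
    previous signal (in position order) are placed in the same locus.
    """
    loci = []
    for chrom, pos, snp in sorted(signals):
        last = loci[-1][-1] if loci else None
        if last and last[0] == chrom and pos - last[1] <= window_bp:
            loci[-1].append((chrom, pos, snp))
        else:
            loci.append([(chrom, pos, snp)])
    return loci

# Placeholder signals: two lie 140 kb apart on chromosome 17.
signals = [
    (17, 42_294_000, "snpA"),
    (17, 42_434_000, "snpB"),
    (4, 900_000, "snpC"),
    (17, 50_000_000, "snpD"),
]
loci = group_into_loci(signals)
for locus in loci:
    print([snp for _, _, snp in locus])
```

With these placeholder inputs, the two nearby chromosome 17 signals form one multi-signal locus and the other two signals each form their own.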
Refining heritability estimates and determining extant genetic risk
To quantify how much of the genetic liability we have explained and what direction to take with
future PD GWAS we calculated updated heritability estimates and polygenic risk scores (PRS).
We used LD score regression (LDSC) on a meta-analysis of all 11 clinically-ascertained datasets
from our GWAS and estimated the liability-scale narrow-sense heritability of PD as 0.22 (95%
CI 0.18 - 0.26), only slightly lower than a previous estimate derived using GCTA (0.27, 95% CI
0.17 - 0.38) [10,13,14]. This may be because LDSC is known to be more conservative than GCTA;
however, our LDSC heritability estimate does fall within the 95% confidence interval of the
GCTA estimate.
Next, we sought to determine the proportion of SNP-based heritability explained by our PD
GWAS results using polygenic risk scores (PRSs). We utilized a two-stage design for our PRS
analyses, with variant selection and training in the NeuroX-dbGaP dataset (5,851 cases and
5,866 controls) and then validation in the Harvard Biomarker Study (HBS, 527 cases and 472
controls). We focused on the NeuroX-dbGaP and HBS cohorts as both of these clinically
characterized cohorts were genotyped on the same PD-focused array (NeuroX) and have been
used in previous studies of PRSs [8,15–18]. In addition, both of these studies directly genotyped
larger effect, rare variants within LRRK2 (rs34637584, G2019S) and GBA (rs76763715, N370S)
of great interest in previous PRS analyses.
In order to prevent bias, we estimated the effect size of each SNP contributing to the PRS using
a meta-analysis of all PD GWAS datasets except NeuroX-dbGaP and HBS. Using permutation
testing in the NeuroX-dbGaP training cohort, we found that the optimal P threshold for variant
inclusion was 1.35E-03, which included 1809 variants. Two PRSs were tested in HBS, one
limited to 88 of the 90 genome-wide significant variants (two variants failed to pass quality
control in the HBS study), and the other incorporating 1805 variants from the training phase
(four variants failed to pass quality control in HBS due to low imputation quality). The 88 variant
PRS had an area under the curve (AUC) of 0.651 (95% CI 0.617 - 0.684), while the 1805 variant
PRS had an AUC of 0.692 (95% CI 0.660 - 0.725). The AUCs from our 88 variant PRS in both
the NeuroX-dbGaP cohort and the HBS cohort were significantly larger than the AUCs in those
same cohorts using a published PRS (Chang et al. 2017, AUC = 0.624, P < 0.002 from
DeLong’s test). Although the HBS cohort was used to discover the 90 PD GWAS risk variants,
therefore potentially biasing our 88 variant PRS, all 90 variants remained genome-wide
significant in a meta-analysis of all GWAS datasets excluding the HBS study. Extended results
for all included studies at all P-value thresholds can be found in the Supplementary Appendix.
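
As a rough sketch of the quantities discussed above, the code below computes a polygenic risk score as a weighted sum of risk-allele dosages and then the AUC as the probability that a randomly chosen case outscores a randomly chosen control. The function names, the three-variant weights and the genotypes are hypothetical; the actual analysis used 88 or 1805 variants and dedicated software that is not described in this section.

```python
def polygenic_risk_score(dosages, betas):
    """Weighted sum of allele dosages (0-2) using per-SNP log odds ratios."""
    return sum(d * b for d, b in zip(dosages, betas))

def area_under_curve(case_scores, control_scores):
    """AUC as the fraction of case/control pairs in which the case has
    the higher score; tied pairs count as one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical effect sizes and genotype dosages for three variants.
betas = [0.08, -0.05, 0.12]
case_genotypes = [[2, 0, 1], [1, 1, 2], [1, 0, 0]]
control_genotypes = [[0, 2, 0], [1, 1, 0], [2, 1, 1]]

case_scores = [polygenic_risk_score(g, betas) for g in case_genotypes]
control_scores = [polygenic_risk_score(g, betas) for g in control_genotypes]
auc_value = area_under_curve(case_scores, control_scores)
print(f"AUC = {auc_value:.3f}")
```

An AUC of 0.5 corresponds to a score that separates cases from controls no better than chance, which gives a reference point for the reported values of 0.651 and 0.692.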
Using equations from Wray et al. 2010 and our current heritability estimates, the 88 variant PRS
explained approximately 16% of the genetic liability of PD assuming a global prevalence of
0.5% [13,19]. The 1805 variant PRS explained roughly 26% of PD heritability. In a high-risk
population with a prevalence of 2%, the 1805 variant PRS explained 36% of PD heritable risk
[13,19] (Table S4).
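
The conversion from AUC to the share of heritability explained can be sketched as below. The formula is the AUC-to-liability-variance relation as we recall it from Wray et al. 2010 and should be checked against that paper before reuse. Dividing by the LDSC point estimate of 0.22 is our assumption about how the percentages were derived, and the function and variable names are illustrative.

```python
from statistics import NormalDist

def liability_variance_from_auc(auc, prevalence):
    """Variance in liability explained by a predictor with the given AUC,
    for a disease with the given population prevalence."""
    norm = NormalDist()
    q = norm.inv_cdf(auc)                      # normal deviate of the AUC
    t = norm.inv_cdf(1.0 - prevalence)         # liability threshold
    z = norm.pdf(t)                            # density at the threshold
    i = z / prevalence                         # mean liability of cases
    v = -i * prevalence / (1.0 - prevalence)   # mean liability of controls
    return 2.0 * q ** 2 / ((v - i) ** 2 + q ** 2 * i * (i - t) + v * (v - t))

H2_LIABILITY = 0.22  # LDSC heritability estimate reported above

frac_88 = liability_variance_from_auc(0.651, 0.005) / H2_LIABILITY
frac_1805 = liability_variance_from_auc(0.692, 0.005) / H2_LIABILITY
frac_1805_high_risk = liability_variance_from_auc(0.692, 0.02) / H2_LIABILITY

for label, frac in [("88 variants, K=0.5%", frac_88),
                    ("1805 variants, K=0.5%", frac_1805),
                    ("1805 variants, K=2%", frac_1805_high_risk)]:
    print(f"{label}: {100 * frac:.1f}% of heritability")
```

Under these assumptions the three fractions should land close to the reported 16%, 26% and 36%, which also shows how strongly the result depends on the assumed prevalence.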
We then attempted to quantify strata of risk in our more inclusive PRS. Compared to individuals
with PRS values in the lowest quartile, the PD odds ratio for individuals with PRS values in the
highest quartile was 3.74 (95% CI = 3.35 - 4.18) in the NeuroX-dbGaP cohort and 6.25 (95% CI
= 4.26 - 9.28) in the HBS cohort (Table 2, Figure 3, Figure S1).
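
A quartile comparison of this kind can be illustrated with a simple odds ratio and Wald confidence interval computed from a 2x2 table. This is a simplified sketch: the reported odds ratios may well come from regression models with covariates, and the counts below are hypothetical rather than taken from either cohort.

```python
import math

def odds_ratio_ci(cases_high, controls_high, cases_low, controls_low, z=1.96):
    """Odds ratio for the highest versus lowest PRS quartile, with a
    Wald confidence interval computed on the log odds scale."""
    odds_ratio = (cases_high * controls_low) / (controls_high * cases_low)
    se_log_or = math.sqrt(1.0 / cases_high + 1.0 / controls_high
                          + 1.0 / cases_low + 1.0 / controls_low)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts of cases and controls in the top and bottom quartiles.
or_value, ci_low, ci_high = odds_ratio_ci(200, 50, 80, 170)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f} - {ci_high:.2f})")
```

The interval widens as any cell count shrinks, which is consistent with the broader interval reported for the smaller HBS cohort.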
Variants in the range of 5E-08 < P < 1.35E-03 (used in the 1805 variant PRS) were rarer and
had smaller effect estimates than variants reaching genome-wide significance. These sub-
significant variants had a median minor allele frequency of 21.3% and a median effect estimate
(absolute value of the log odds ratio of the SNP parameter from regression) of 0.047. Genome-
wide significant risk variants were more common with a median minor allele frequency of 25.1%,
and had a median effect estimate of 0.081. We performed power calculations to forecast the
number of additional PD cases needed to achieve genome-wide significance at 80% power for a
variant with a minor allele frequency of 21.3% and an effect estimate of 0.047 [20]. Assuming that
all incoming data is well harmonized with current data and that disease prevalence is 0.5%, we
estimated that we would need a total of ~99K cases, ~2.3 times as many as our current
analysis. Variant discovery at this point will help us work towards the maximum achievable AUC
for a genetic predictor in PD (estimated 85%). Past this point it is possible that effect estimates
get too marginal, variants get too rare and they are no longer useful in predictions or in
estimating heritability [21].
Frequently Asked Questions (17)
Q1. What contributions have the authors mentioned in the paper "Expanding Parkinson’s disease genetics: novel risk loci, genomic context, causal insights and heritable risk"?

Altogether, the data presented here has significantly expanded the resources available for future investigations into potential PD interventions. Power estimates suggest that expansions of case numbers to 99K cases will continue to reveal additional insights into PD genetics. While these yet-to-be defined risk variants will have relatively small effects, cumulatively they will improve their ability to predict PD and will help to further expand their knowledge of the genes and pathways that drive PD risk. Their bi-directional GSMR results suggest a complex etiological connection between smoking initiation and PD that will require further follow-up and should be viewed with some caution. 

Adding datasets from non-European populations would be helpful to further improve their granularity in association testing and ability to fine-map loci through integration of more variable LD signatures while also evaluating population specific associations. 

Allowing researchers to share participant-level data in a secure environment would facilitate inclusiveness and uniformity in analyses while maintaining the confidentiality of study participants. 

In addition to studies of the genetics of PD risk, studies of disease onset, progression, and subtype will be important and will require large series of well-characterized patients. 

Larger QTL studies and PD-specific network data from large scale cellular screens would allow us to build a more robust functional inference framework. 

Smoking initiation (the act of ever starting smoking) did not have a causal effect on PD risk (MR effect = - 0.063, SE = 0.034, Bonferroni-adjusted P = 0.315), whereas PD had a small, but significantly positive causal effect on smoking initiation (MR effect = 0.027, SE = 0.006, Bonferroni-adjusted P = 1.62E-05). 

The authors analyzed protein-protein interaction networks using webgestaltR [29] and found that the genes highlighted by their PD GWAS were enriched in six functional ontological networks (FDR-adjusted P < 0.1). 

The authors found 10 significantly enriched pathways (false discovery rate [FDR]-adjusted P < 0.05, Table S8), including four related to vacuolar function and three related to known drug targets (calcium transporters: ikeda_mir1_targets_dn and ikeda_mir30_targets_up, kinase signaling: kim_pten_targets_dn). 

The odds ratio (OR) column is the exponent of the regression coefficient (beta) from logistic regression of the polygenic risk score (PRS) on case status, with the standard error (SE) representing the precision of these estimates. 

Of the 90 PD GWAS risk variants, 70 were in loci containing at least one of these putatively causal genes after multiple test correction (Table 3 summarizes top QTL per gene). 

To a degree, the fact that the authors filtered their variants with a secondary random-effects meta-analysis may make their 90 PD GWAS hits somewhat more robust due to the conservative nature of random-effects.