Author
Allan Motyer
Other affiliations: University of New South Wales
Bio: Allan Motyer is an academic researcher from the University of Melbourne. The author has contributed to research in topics: Genome-wide association study & Biobank. The author has an h-index of 9 and has co-authored 17 publications receiving 3,106 citations. Previous affiliations of Allan Motyer include the University of New South Wales.
Papers
••
TL;DR: Deep phenotype and genome-wide genetic data from 500,000 individuals in the UK Biobank are described, together with population structure and relatedness in the cohort and imputation that increases the number of testable variants to around 96 million.
Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.
4,489 citations
••
TL;DR: The UK Biobank project is a large prospective cohort study of ~500,000 individuals from across the United Kingdom, aged 40-69 at recruitment. The authors conduct a set of analyses that reveal properties of the genetic data, such as population structure and relatedness, that can be important for downstream analyses.
Abstract: The UK Biobank project is a large prospective cohort study of ~500,000 individuals from across the United Kingdom, aged between 40-69 at recruitment. A rich variety of phenotypic and health-related information is available on each participant, making the resource unprecedented in its size and scope. Here we describe the genome-wide genotype data (~805,000 markers) collected on all individuals in the cohort and its quality control procedures. Genotype data on this scale offers novel opportunities for assessing quality issues, although the wide range of ancestries of the individuals in the cohort also creates particular challenges. We also conducted a set of analyses that reveal properties of the genetic data – such as population structure and relatedness – that can be important for downstream analyses. In addition, we phased and imputed genotypes into the dataset, using computationally efficient methods combined with the Haplotype Reference Consortium (HRC) and UK10K haplotype resource. This increases the number of testable variants by over 100-fold to ~96 million variants. We also imputed classical allelic variation at 11 human leukocyte antigen (HLA) genes, and as a quality control check of this imputation, we replicate signals of known associations between HLA alleles and many common diseases. We describe tools that allow efficient genome-wide association studies (GWAS) of multiple traits and fast phenome-wide association studies (PheWAS), which work together with a new compressed file format that has been used to distribute the dataset. As a further check of the genotyped and imputed datasets, we performed a test-case genome-wide association scan on a well-studied human trait, standing height.
659 citations
••
TL;DR: A new Bayesian analysis framework is developed that exploits the hierarchical structure of diagnosis classifications to analyze genetic variants against UK Biobank disease phenotypes derived from self-reporting and hospital episode statistics and identifies new associations between classical human leukocyte antigen (HLA) alleles and common immune-mediated diseases (IMDs).
Abstract: Genetic discovery from the multitude of phenotypes extractable from routine healthcare data can transform understanding of the human phenome and accelerate progress toward precision medicine. However, a critical question when analyzing high-dimensional and heterogeneous data is how best to interrogate increasingly specific subphenotypes while retaining statistical power to detect genetic associations. Here we develop and employ a new Bayesian analysis framework that exploits the hierarchical structure of diagnosis classifications to analyze genetic variants against UK Biobank disease phenotypes derived from self-reporting and hospital episode statistics. Our method displays a more than 20% increase in power to detect genetic effects over other approaches and identifies new associations between classical human leukocyte antigen (HLA) alleles and common immune-mediated diseases (IMDs). By applying the approach to genetic risk scores (GRSs), we show the extent of genetic sharing among IMDs and expose differences in disease perception or diagnosis with potential clinical implications.
58 citations
••
TL;DR: Genetic variants for IgE-mediated peanut allergy are yet to be fully characterized, and to date only one genome-wide association study (GWAS) has been published.
Abstract: Background
Genetic variants for IgE-mediated peanut allergy are yet to be fully characterized and to date only one genome-wide association study (GWAS) has been published.
Objective
To identify genetic variants associated with challenge-proven peanut allergy.
Methods
We carried out a GWAS comparing 73 infants with challenge-proven IgE-mediated peanut allergy against 148 non-allergic infants (all ~1 year old). We tested a total of 3.8 million single nucleotide polymorphisms (SNPs), as well as imputed HLA alleles and amino acids. Replication was assessed by de novo genotyping in an additional panel of 117 cases and 380 controls, and by in silico testing in two independent GWAS cohorts.
Results
We identified 21 independent associations at P ≤ 5 × 10⁻⁵ but were unable to replicate these. The most significant HLA association was the previously reported amino acid variant located at position 71, within the peptide-binding groove of HLA-DRB1 (P = 2 × 10⁻⁴). Our study therefore reproduced previous findings for the association between peanut allergy and HLA-DRB1 in this Australian population.
Conclusions & Clinical Relevance
Genetic determinants for challenge-proven peanut allergy include alleles at the HLA-DRB1 locus.
40 citations
••
TL;DR: In this paper, the authors considered the class of level-independent quasi-birth-and-death (QBD) processes and derived simple conditions for possible decay rates of the stationary distribution of the 'level' process.
Abstract: We consider the class of level-independent quasi-birth-and-death (QBD) processes that have countably many phases and generator matrices with tridiagonal blocks that are themselves tridiagonal and phase independent. We derive simple conditions for possible decay rates of the stationary distribution of the 'level' process. It may be possible to obtain decay rates satisfying these conditions by varying only the transition structure at level 0. Our results generalize those of Kroese, Scheinhardt, and Taylor, who studied in detail a particular example, the tandem Jackson network, from the class of QBD processes studied here. The conditions derived here are applied to three practical examples.
32 citations
Cited by
••
TL;DR: MR-Base is a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR, and includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions.
Abstract: Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as 2-sample Mendelian randomization (2SMR) and bypassing the need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, undermining efficient implementation of the approach. We therefore developed MR-Base (http://www.mrbase.org): a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be efficiently evaluated in phenome-wide association studies.
2,520 citations
••
TL;DR: Genome-wide polygenic risk scores derived from GWAS data for five common diseases can identify subgroups of the population with risk approaching or exceeding that of a monogenic mutation.
Abstract: A key public health need is to identify individuals at high risk for a given disease to enable enhanced screening or preventive therapies. Because most common diseases have a genetic component, one important approach is to stratify individuals based on inherited DNA variation [1]. Proposed clinical applications have largely focused on finding carriers of rare monogenic mutations at several-fold increased risk. Although most disease risk is polygenic in nature [2-5], it has not yet been possible to use polygenic predictors to identify individuals at risk comparable to monogenic mutations. Here, we develop and validate genome-wide polygenic scores for five common diseases. The approach identifies 8.0, 6.1, 3.5, 3.2, and 1.5% of the population at greater than threefold increased risk for coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer, respectively. For coronary artery disease, this prevalence is 20-fold higher than the carrier frequency of rare monogenic mutations conferring comparable risk [6]. We propose that it is time to contemplate the inclusion of polygenic risk prediction in clinical care, and discuss relevant issues.
1,962 citations
••
1,756 citations
••
University of Minnesota, University of Colorado Boulder, VU University Amsterdam, Harvard University, University of Southern California, University of Tartu, University of Queensland, Erasmus University Rotterdam, Hospital for Special Surgery, Statens Serum Institut, University of Copenhagen, Broad Institute, University of Essex, University of Edinburgh, University of Cambridge, University Hospital of Lausanne, Geisinger Health System, Wenzhou Medical College, Stanford University, University of North Carolina at Chapel Hill, University of Wisconsin-Madison, The Feinstein Institute for Medical Research, Hofstra University, University of Dundee, University of Toronto, Princeton University, New York University Shanghai, Queen's University, National Bureau of Economic Research, Karolinska Institutet, Uppsala University, University of Lausanne, New York University, Stockholm School of Economics
TL;DR: A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11–13% of the variance in educational attainment and 7–10% of the variance in cognitive performance, which substantially increases the utility of polygenic scores as tools in research.
Abstract: Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11-13% of the variance in educational attainment and 7-10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.
1,658 citations
••
Royal Edinburgh Hospital, King's College London, Centre for Mental Health, University of Glasgow, University of Edinburgh, Medical Research Council, University of Münster, University of Melbourne, University of Freiburg, University of Queensland, Charité, Broad Institute, Harvard University, Karolinska Institutet, University of North Carolina at Chapel Hill
TL;DR: A genetic meta-analysis of depression found 269 associated genes that highlight several potential drug repositioning opportunities, and relationships with depression were found for neuroticism and smoking.
Abstract: Major depression is a debilitating psychiatric illness that is typically associated with low mood and anhedonia. Depression has a heritable component that has remained difficult to elucidate with current sample sizes due to the polygenic nature of the disorder. To maximize sample size, we meta-analyzed data on 807,553 individuals (246,363 cases and 561,190 controls) from the three largest genome-wide association studies of depression. We identified 102 independent variants, 269 genes, and 15 genesets associated with depression, including both genes and gene pathways associated with synaptic structure and neurotransmission. An enrichment analysis provided further evidence of the importance of prefrontal brain regions. In an independent replication sample of 1,306,354 individuals (414,055 cases and 892,299 controls), 87 of the 102 associated variants were significant after multiple testing correction. These findings advance our understanding of the complex genetic architecture of depression and provide several future avenues for understanding etiology and developing new treatment approaches.
1,312 citations