Showing papers in "Nature Genetics in 2018"


Journal ArticleDOI
TL;DR: The MR-PRESSO test detects and corrects for horizontal pleiotropic outliers in multi-instrument Mendelian randomization (MR) analyses; horizontal pleiotropy introduced distortions in the MR causal estimates that ranged on average from –131% to 201%, and simulations show that the MR-PRESSO test is best suited when horizontal pleiotropy occurs in <50% of instruments.
Abstract: Horizontal pleiotropy occurs when the variant has an effect on disease outside of its effect on the exposure in Mendelian randomization (MR). Violation of the ‘no horizontal pleiotropy’ assumption can cause severe bias in MR. We developed the Mendelian randomization pleiotropy residual sum and outlier (MR-PRESSO) test to identify horizontal pleiotropic outliers in multi-instrument summary-level MR testing. We showed using simulations that the MR-PRESSO test is best suited when horizontal pleiotropy occurs in <50% of instruments.

2,362 citations
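
The idea of flagging outlying instruments by their residuals, as described in the MR-PRESSO abstract above, can be illustrated with a minimal sketch. This is not the published MR-PRESSO procedure (which also simulates a null distribution for the residual sum of squares); it only shows an inverse-variance-weighted (IVW) slope and leave-one-out squared residuals. The function names and the five-instrument data set are hypothetical.

```python
import numpy as np

def ivw_estimate(bx, by, se_by):
    """IVW slope of outcome effects (by) on exposure effects (bx), through the origin."""
    w = 1.0 / se_by**2
    return np.sum(w * bx * by) / np.sum(w * bx**2)

def leave_one_out_rss(bx, by, se_by):
    """Standardized squared residual of each instrument against the IVW
    estimate fitted without that instrument, plus the sum of those residuals."""
    n = len(bx)
    sq_resid = np.empty(n)
    for j in range(n):
        keep = np.arange(n) != j
        beta_minus_j = ivw_estimate(bx[keep], by[keep], se_by[keep])
        sq_resid[j] = (by[j] - beta_minus_j * bx[j]) ** 2 / se_by[j] ** 2
    return sq_resid, sq_resid.sum()

# Hypothetical summary statistics: true causal effect 0.5, one pleiotropic instrument.
bx = np.array([0.10, 0.20, 0.15, 0.30, 0.25])
by = 0.5 * bx
by[3] += 0.2                      # instrument 3 has an extra direct effect on the outcome
se_by = np.full(5, 0.02)

sq_resid, rss = leave_one_out_rss(bx, by, se_by)
outlier_index = int(np.argmax(sq_resid))   # instrument with the largest residual
```

In this toy setting the instrument with the added direct effect stands out because the estimate fitted without it is unaffected by its pleiotropy.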


Journal ArticleDOI
TL;DR: Genome-wide polygenic risk scores derived from GWAS data for five common diseases can identify subgroups of the population with risk approaching or exceeding that of a monogenic mutation.
Abstract: A key public health need is to identify individuals at high risk for a given disease to enable enhanced screening or preventive therapies. Because most common diseases have a genetic component, one important approach is to stratify individuals based on inherited DNA variation. Proposed clinical applications have largely focused on finding carriers of rare monogenic mutations at several-fold increased risk. Although most disease risk is polygenic in nature, it has not yet been possible to use polygenic predictors to identify individuals at risk comparable to monogenic mutations. Here, we develop and validate genome-wide polygenic scores for five common diseases. The approach identifies 8.0, 6.1, 3.5, 3.2, and 1.5% of the population at greater than threefold increased risk for coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer, respectively. For coronary artery disease, this prevalence is 20-fold higher than the carrier frequency of rare monogenic mutations conferring comparable risk. We propose that it is time to contemplate the inclusion of polygenic risk prediction in clinical care, and discuss relevant issues.

1,962 citations
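
A polygenic score of the kind discussed in the abstract above is, in its simplest form, a weighted sum of an individual's risk-allele counts. The sketch below assumes that simple additive form; the weights, genotype dosages, and threshold are hypothetical, and the study's actual score construction and risk calibration are not reproduced here.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Additive score: for each individual, sum over variants of dosage * weight."""
    return dosages @ weights

def fraction_above(scores, threshold):
    """Share of individuals whose score exceeds a chosen threshold."""
    return float(np.mean(scores > threshold))

# Hypothetical data: 4 individuals x 3 variants, dosages are risk-allele counts (0, 1, 2).
dosages = np.array([
    [0, 1, 2],
    [2, 0, 1],
    [1, 1, 1],
    [2, 2, 0],
])
weights = np.array([0.20, -0.10, 0.05])   # hypothetical per-allele effect sizes

scores = polygenic_score(dosages, weights)
high_risk_share = fraction_above(scores, 0.18)
```

Turning such a score into a statement like "x% of the population is at more than threefold risk" additionally requires relating score strata to observed disease rates, which this sketch does not attempt.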


Journal ArticleDOI
Naomi R. Wray1, Stephan Ripke2, Stephan Ripke3, Stephan Ripke4, +259 more; Institutions (79)
TL;DR: A genome-wide association meta-analysis of individuals with clinically assessed or self-reported depression identifies 44 independent and significant loci and finds important relationships of genetic risk for major depression with educational attainment, body mass, and schizophrenia.
Abstract: Major depressive disorder (MDD) is a common illness accompanied by considerable morbidity, mortality, costs, and heightened risk of suicide. We conducted a genome-wide association meta-analysis based in 135,458 cases and 344,901 controls and identified 44 independent and significant loci. The genetic findings were associated with clinical features of major depression and implicated brain regions exhibiting anatomical differences in cases. Targets of antidepressant medications and genes involved in gene splicing were enriched for smaller association signal. We found important relationships of genetic risk for major depression with educational attainment, body mass, and schizophrenia: lower educational attainment and higher body mass were putatively causal, whereas major depression and schizophrenia reflected a partly shared biological etiology. All humans carry lesser or greater numbers of genetic risk factors for major depression. These findings help refine the basis of major depression and imply that a continuous measure of risk underlies the clinical phenotype.

1,898 citations


Journal ArticleDOI
James J. Lee1, Robbee Wedow2, Aysu Okbay3, Edward Kong4, Omeed Maghzian4, Meghan Zacher4, Tuan Anh Nguyen-Viet5, Peter Bowers4, Julia Sidorenko6, Julia Sidorenko7, Richard Karlsson Linnér3, Richard Karlsson Linnér8, Mark Alan Fontana9, Mark Alan Fontana5, Tushar Kundu5, Chanwook Lee4, Hui Li4, Ruoxi Li5, Rebecca Royer5, Pascal Timshel10, Pascal Timshel11, Raymond K. Walters4, Raymond K. Walters12, Emily A. Willoughby1, Loic Yengo6, Maris Alver7, Yanchun Bao13, David W. Clark14, Felix R. Day15, Nicholas A. Furlotte, Peter K. Joshi16, Peter K. Joshi14, Kathryn E. Kemper6, Aaron Kleinman, Claudia Langenberg15, Reedik Mägi7, Joey W. Trampush5, Shefali S. Verma17, Yang Wu6, Max Lam, Jing Hua Zhao15, Zhili Zheng18, Zhili Zheng6, Jason D. Boardman2, Harry Campbell14, Jeremy Freese19, Kathleen Mullan Harris20, Caroline Hayward14, Pamela Herd21, Pamela Herd13, Meena Kumari13, Todd Lencz22, Todd Lencz23, Jian'an Luan15, Anil K. Malhotra22, Anil K. Malhotra23, Andres Metspalu7, Lili Milani7, Ken K. Ong15, John R. B. Perry15, David J. Porteous14, Marylyn D. Ritchie17, Melissa C. Smart14, Blair H. Smith24, Joyce Y. Tung, Nicholas J. Wareham15, James F. Wilson14, Jonathan P. Beauchamp25, Dalton Conley26, Tõnu Esko7, Steven F. Lehrer27, Steven F. Lehrer28, Steven F. Lehrer29, Patrik K. E. Magnusson30, Sven Oskarsson31, Tune H. Pers11, Tune H. Pers10, Matthew R. Robinson6, Matthew R. Robinson32, Kevin Thom33, Chelsea Watson5, Christopher F. Chabris17, Michelle N. Meyer17, David Laibson4, Jian Yang6, Magnus Johannesson34, Philipp Koellinger8, Philipp Koellinger3, Patrick Turley4, Patrick Turley12, Peter M. Visscher6, Daniel J. Benjamin27, Daniel J. Benjamin5, David Cesarini27, David Cesarini33 
TL;DR: A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11–13% of the variance in educational attainment and 7–10% of the variance in cognitive performance, which substantially increases the utility of polygenic scores as tools in research.
Abstract: Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11-13% of the variance in educational attainment and 7-10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.

1,658 citations


Journal ArticleDOI
Antonio F. Pardiñas1, Peter Holmans1, Andrew Pocklington1, Valentina Escott-Price1, Stephan Ripke2, Stephan Ripke3, Noa Carrera1, Sophie E. Legge1, Sophie Bishop1, D. F. Cameron1, Marian L. Hamshere1, Jun Han1, Leon Hubbard1, Amy Lynham1, Kiran Kumar Mantripragada1, Elliott Rees1, James H. MacCabe4, Steven A. McCarroll5, Bernhard T. Baune6, Gerome Breen4, Gerome Breen7, Enda M. Byrne8, Udo Dannlowski9, Thalia C. Eley4, Caroline Hayward10, Nicholas G. Martin8, Nicholas G. Martin11, Andrew M. McIntosh10, Robert Plomin4, David J. Porteous10, Naomi R. Wray8, Armando Caballero12, Daniel H. Geschwind13, Laura M. Huckins14, Douglas M. Ruderfer14, Enrique Santiago15, Pamela Sklar14, Eli A. Stahl14, Hyejung Won13, Esben Agerbo16, Esben Agerbo17, Thomas Damm Als17, Thomas Damm Als16, Ole A. Andreassen18, Ole A. Andreassen19, Marie Bækvad-Hansen16, Marie Bækvad-Hansen20, Preben Bo Mortensen17, Preben Bo Mortensen16, Carsten Bøcker Pedersen16, Carsten Bøcker Pedersen17, Anders D. Børglum16, Anders D. Børglum17, Jonas Bybjerg-Grauholm20, Jonas Bybjerg-Grauholm16, Srdjan Djurovic21, Srdjan Djurovic19, Naser Durmishi, Marianne Giørtz Pedersen16, Marianne Giørtz Pedersen17, Vera Golimbet, Jakob Grove, David M. Hougaard16, David M. Hougaard20, Manuel Mattheisen17, Manuel Mattheisen16, Espen Molden, Ole Mors22, Ole Mors16, Merete Nordentoft23, Merete Nordentoft16, Milica Pejovic-Milovancevic24, Engilbert Sigurdsson, Teimuraz Silagadze25, Christine Søholm Hansen20, Christine Søholm Hansen16, Kari Stefansson26, Hreinn Stefansson26, Stacy Steinberg26, Sarah Tosato27, Thomas Werge28, Thomas Werge16, Thomas Werge23, David A. Collier29, David A. Collier4, Dan Rujescu30, Dan Rujescu31, George Kirov1, Michael J. Owen1, Michael Conlon O'Donovan1, James T.R. Walters1 
TL;DR: A new genome-wide association study of schizophrenia is reported; through meta-analysis with existing data, 50 novel associated loci and 145 loci in total are identified, and integrating genomic fine-mapping with brain expression and chromosome conformation data identifies candidate causal genes within 33 loci.
Abstract: Schizophrenia is a debilitating psychiatric condition often associated with poor quality of life and decreased life expectancy. Lack of progress in improving treatment outcomes has been attributed to limited knowledge of the underlying biology, although large-scale genomic studies have begun to provide insights. We report a new genome-wide association study of schizophrenia (11,260 cases and 24,542 controls), and through meta-analysis with existing data we identify 50 novel associated loci and 145 loci in total. Through integrating genomic fine-mapping with brain expression and chromosome conformation data, we identify candidate causal genes within 33 loci. We also show for the first time that the common variant association signal is highly enriched among genes that are under strong selective pressures. These findings provide new insights into the biology and genetic architecture of schizophrenia, highlight the importance of mutation-intolerant genes and suggest a mechanism by which common risk variants persist in the population.

1,259 citations


Journal ArticleDOI
Anubha Mahajan1, Daniel Taliun2, Matthias Thurner1, Neil R. Robertson1, Jason M. Torres1, N. William Rayner1, N. William Rayner3, Anthony Payne1, Valgerdur Steinthorsdottir4, Robert A. Scott5, Niels Grarup6, James P. Cook7, Ellen M. Schmidt2, Matthias Wuttke8, Chloé Sarnowski9, Reedik Mägi10, Jana Nano11, Christian Gieger, Stella Trompet12, Cécile Lecoeur13, Michael Preuss14, Bram P. Prins3, Xiuqing Guo15, Lawrence F. Bielak2, Jennifer E. Below16, Donald W. Bowden17, John C. Chambers, Young-Jin Kim, Maggie C.Y. Ng17, Lauren E. Petty16, Xueling Sim18, Weihua Zhang19, Weihua Zhang20, Amanda J. Bennett1, Jette Bork-Jensen6, Chad M. Brummett2, Mickaël Canouil13, Kai-Uwe Eckardt21, Krista Fischer10, Sharon L.R. Kardia2, Florian Kronenberg22, Kristi Läll10, Ching-Ti Liu9, Adam E. Locke23, Jian'an Luan5, Ioanna Ntalla24, Vibe Nylander1, Sebastian Schönherr22, Claudia Schurmann14, Loic Yengo13, Erwin P. Bottinger14, Ivan Brandslund25, Cramer Christensen, George Dedoussis26, Jose C. Florez, Ian Ford27, Oscar H. Franco11, Timothy M. Frayling28, Vilmantas Giedraitis29, Sophie Hackinger3, Andrew T. Hattersley28, Christian Herder30, M. Arfan Ikram11, Martin Ingelsson29, Marit E. Jørgensen25, Marit E. Jørgensen31, Torben Jørgensen6, Torben Jørgensen32, Jennifer Kriebel, Johanna Kuusisto33, Symen Ligthart11, Cecilia M. Lindgren1, Cecilia M. Lindgren34, Allan Linneberg35, Allan Linneberg6, Valeriya Lyssenko36, Valeriya Lyssenko37, Vasiliki Mamakou26, Thomas Meitinger38, Karen L. Mohlke39, Andrew D. Morris40, Andrew D. Morris41, Girish N. Nadkarni14, James S. Pankow42, Annette Peters, Naveed Sattar43, Alena Stančáková33, Konstantin Strauch44, Kent D. Taylor15, Barbara Thorand, Gudmar Thorleifsson4, Unnur Thorsteinsdottir45, Unnur Thorsteinsdottir4, Jaakko Tuomilehto, Daniel R. Witte46, Josée Dupuis9, Patricia A. Peyser2, Eleftheria Zeggini3, Ruth J. F. Loos14, Philippe Froguel20, Philippe Froguel13, Erik Ingelsson47, Erik Ingelsson48, Lars Lind29, Leif Groop49, Leif Groop37, Markku Laakso33, Francis S. Collins50, J. Wouter Jukema12, Colin N. A. Palmer51, Harald Grallert, Andres Metspalu10, Abbas Dehghan11, Abbas Dehghan20, Anna Köttgen8, Gonçalo R. Abecasis2, James B. Meigs52, Jerome I. Rotter15, Jonathan Marchini1, Oluf Pedersen6, Torben Hansen6, Torben Hansen25, Claudia Langenberg5, Nicholas J. Wareham5, Kari Stefansson45, Kari Stefansson4, Anna L. Gloyn1, Andrew P. Morris1, Andrew P. Morris10, Andrew P. Morris7, Michael Boehnke2, Mark I. McCarthy1 
TL;DR: Combining 32 genome-wide association studies with high-density imputation provides a comprehensive view of the genetic contribution to type 2 diabetes in individuals of European ancestry with respect to locus discovery, causal-variant resolution, and mechanistic insight.
Abstract: We expanded GWAS discovery for type 2 diabetes (T2D) by combining data from 898,130 European-descent individuals (9% cases), after imputation to high-density reference panels. With these data, we (i) extend the inventory of T2D-risk variants (243 loci, 135 newly implicated in T2D predisposition, comprising 403 distinct association signals); (ii) enrich discovery of lower-frequency risk alleles (80 index variants with minor allele frequency <5%, 14 with estimated allelic odds ratio >2); (iii) substantially improve fine-mapping of causal variants (at 51 signals, one variant accounted for >80% posterior probability of association (PPA)); (iv) extend fine-mapping through integration of tissue-specific epigenomic information (islet regulatory annotations extend the number of variants with PPA >80% to 73); (v) highlight validated therapeutic targets (18 genes with associations attributable to coding variants); and (vi) demonstrate enhanced potential for clinical translation (genome-wide chip heritability explains 18% of T2D risk; individuals in the extremes of a T2D polygenic risk score differ more than ninefold in prevalence).

1,136 citations


Journal ArticleDOI
TL;DR: A multiancestry genome-wide-association meta-analysis in 521,612 individuals discovered 22 new stroke risk loci; eleven new susceptibility loci indicate mechanisms not previously implicated in stroke pathophysiology, with prioritization of risk variants and genes accomplished through bioinformatics analyses using extensive functional datasets.
Abstract: Stroke has multiple etiologies, but the underlying genes and pathways are largely unknown. We conducted a multiancestry genome-wide-association meta-analysis in 521,612 individuals (67,162 cases and 454,450 controls) and discovered 22 new stroke risk loci, bringing the total to 32. We further found shared genetic variation with related vascular traits, including blood pressure, cardiac traits, and venous thromboembolism, at individual loci (n = 18), and using genetic risk scores and linkage-disequilibrium-score regression. Several loci exhibited distinct association and pleiotropy patterns for etiological stroke subtypes. Eleven new susceptibility loci indicate mechanisms not previously implicated in stroke pathophysiology, with prioritization of risk variants and genes accomplished through bioinformatics analyses using extensive functional datasets. Stroke risk loci were significantly enriched in drug targets for antithrombotic therapy.

881 citations


Journal ArticleDOI
Jeanne E. Savage1, Philip R. Jansen1, Philip R. Jansen2, Sven Stringer1, Kyoko Watanabe1, Julien Bryois3, Christiaan de Leeuw1, Mats Nagel, Swapnil Awasthi4, Peter B. Barr5, Jonathan R. I. Coleman6, Katrina L. Grasby7, Anke R. Hammerschlag1, Jakob Kaminski4, Robert Karlsson3, Eva Krapohl8, Max Lam, Marianne Nygaard9, Chandra A. Reynolds10, Joey W. Trampush11, Hannah Young12, Delilah Zabaneh8, Sara Hägg3, Narelle K. Hansell13, Ida K. Karlsson3, Sten Linnarsson3, Grant W. Montgomery7, Grant W. Montgomery13, Ana B. Muñoz-Manchado3, Erin Burke Quinlan8, Gunter Schumann8, Nathan G. Skene14, Nathan G. Skene3, Bradley T. Webb5, Tonya White2, Dan E. Arking15, Dimitrios Avramopoulos15, Robert M. Bilder16, Panos Bitsios17, Katherine E. Burdick18, Katherine E. Burdick19, Katherine E. Burdick20, Tyrone D. Cannon21, Ornit Chiba-Falek, Andrea Christoforou22, Elizabeth T. Cirulli, Eliza Congdon16, Aiden Corvin23, Gail Davies24, Ian J. Deary24, Pamela DeRosse25, Pamela DeRosse26, Dwight Dickinson27, Srdjan Djurovic28, Srdjan Djurovic29, Gary Donohoe30, Emily Drabant Conley, Johan G. Eriksson31, Thomas Espeseth32, Nelson A. Freimer16, Stella G. Giakoumaki17, Ina Giegling33, Michael Gill23, David C. Glahn21, Ahmad R. Hariri34, Alex Hatzimanolis35, Alex Hatzimanolis36, Matthew C. Keller37, Emma Knowles21, Deborah C. Koltai34, Bettina Konte33, Jari Lahti31, Stephanie Le Hellard28, Todd Lencz26, Todd Lencz25, David C. Liewald24, Edythe D. London16, Astri J. Lundervold28, Anil K. Malhotra26, Anil K. Malhotra25, Ingrid Melle28, Ingrid Melle32, Derek W. Morris30, Anna C. Need38, William Ollier39, Aarno Palotie19, Aarno Palotie40, Aarno Palotie31, Antony Payton39, Neil Pendleton41, Russell A. Poldrack42, Katri Räikkönen31, Ivar Reinvang32, Panos Roussos18, Panos Roussos20, Dan Rujescu33, Fred W. Sabb43, Matthew A. Scult34, Olav B. Smeland32, Nikolaos Smyrnis36, Nikolaos Smyrnis35, John M. Starr24, Vidar M. Steen28, Nikos C. Stefanis36, Nikos C. Stefanis35, Richard E. Straub15, Kjetil Sundet32, Henning Tiemeier2, Aristotle N. Voineskos44, Daniel R. Weinberger15, Elisabeth Widen31, Jin Yu, Gonçalo R. Abecasis45, Ole A. Andreassen32, Gerome Breen6, Lene Christiansen9, Birgit Debrabant9, Danielle M. Dick5, Andreas Heinz4, Jens Hjerling-Leffler3, M. Arfan Ikram46, Kenneth S. Kendler5, Nicholas G. Martin7, Sarah E. Medland7, Nancy L. Pedersen3, Robert Plomin8, Tinca J. C. Polderman1, Stephan Ripke4, Stephan Ripke47, Stephan Ripke19, Sophie van der Sluis, Patrick Sullivan3, Patrick Sullivan48, Scott I. Vrieze12, Margaret J. Wright13, Danielle Posthuma1 
TL;DR: A large-scale genetic association study of intelligence identifies 190 new loci and implicates 939 new genes related to neurogenesis, neuron differentiation and synaptic structure, a major step forward in understanding the neurobiology of cognitive function as well as genetically related neurological and psychiatric disorders.
Abstract: Intelligence is highly heritable and a major determinant of human health and well-being. Recent genome-wide meta-analyses have identified 24 genomic loci linked to variation in intelligence, but much about its genetic underpinnings remains to be discovered. Here, we present a large-scale genetic association study of intelligence (n = 269,867), identifying 205 associated genomic loci (190 new) and 1,016 genes (939 new) via positional mapping, expression quantitative trait locus (eQTL) mapping, chromatin interaction mapping, and gene-based association analysis. We find enrichment of genetic effects in conserved and coding regions and associations with 146 nonsynonymous exonic variants. Associated genes are strongly expressed in the brain, specifically in striatal medium spiny neurons and hippocampal pyramidal neurons. Gene set analyses implicate pathways related to nervous system development and synaptic structure. We confirm previous strong genetic correlations with multiple health-related outcomes, and Mendelian randomization analysis results suggest protective effects of intelligence for Alzheimer's disease and ADHD and bidirectional causation with pleiotropic effects for schizophrenia. These results are a major step forward in understanding the neurobiology of cognitive function as well as genetically related neurological and psychiatric disorders.

800 citations


Journal ArticleDOI
TL;DR: SAIGE is a scalable and accurate generalized mixed model association test that can efficiently analyze large data sets while controlling for unbalanced case-control ratios and sample relatedness, as shown by applying SAIGE to the UK Biobank data for > 1,400 binary phenotypes.
Abstract: In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly; they produce large type I error rates when used to analyze unbalanced case-control phenotypes. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation to calibrate the distribution of score test statistics. This method, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), provides accurate P values even when case-control ratios are extremely unbalanced. SAIGE uses state-of-art optimization strategies to reduce computational costs; hence, it is applicable to GWAS for thousands of phenotypes by large biobanks. Through the analysis of UK Biobank data of 408,961 samples from white British participants with European ancestry for > 1,400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness.

773 citations


Journal ArticleDOI
TL;DR: In this article, the largest genetic association study of blood pressure traits (systolic, diastolic and pulse pressure) to date in over 1 million people of European ancestry was conducted.
Abstract: High blood pressure is a highly heritable and modifiable risk factor for cardiovascular disease. We report the largest genetic association study of blood pressure traits (systolic, diastolic and pulse pressure) to date in over 1 million people of European ancestry. We identify 535 novel blood pressure loci that not only offer new biological insights into blood pressure regulation but also highlight shared genetic architecture between blood pressure and lifestyle exposures. Our findings identify new biological pathways for blood pressure regulation with potential for improved cardiovascular disease prevention in the future.

728 citations


Journal ArticleDOI
TL;DR: An approach is introduced to identify disease-relevant tissues and cell types by analyzing gene expression data together with genome-wide association study (GWAS) summary statistics; it found significant tissue-specific enrichments for 34 traits.
Abstract: We introduce an approach to identify disease-relevant tissues and cell types by analyzing gene expression data together with genome-wide association study (GWAS) summary statistics. Our approach uses stratified linkage disequilibrium (LD) score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We applied our approach to gene expression data from several sources together with GWAS summary statistics for 48 diseases and traits (average N = 169,331) and found significant tissue-specific enrichments (false discovery rate (FDR) < 5%) for 34 traits. In our analysis of multiple tissues, we detected a broad range of enrichments that recapitulated known biology. In our brain-specific analysis, significant enrichments included an enrichment of inhibitory over excitatory neurons for bipolar disorder, and excitatory over inhibitory neurons for schizophrenia and body mass index. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signals.

Journal ArticleDOI
TL;DR: Applying MTAG to summary statistics for depressive symptoms, neuroticism and subjective well-being increased discovery of associated loci as compared to single-trait analyses, yielding more informative bioinformatics analyses and increasing the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.
Abstract: We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (N_eff = 354,862), neuroticism (N = 168,105), and subjective well-being (N = 388,538). As compared to the 32, 9, and 13 genome-wide significant loci identified in the single-trait GWAS (most of which are themselves novel), MTAG increases the number of associated loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.

Journal ArticleDOI
TL;DR: A large meta-analysis combining genome-wide and custom high-density genotyping array data identifies 63 new susceptibility loci for prostate cancer, enhancing fine-mapping efforts and providing insights into the underlying biology of PrCa.
Abstract: Genome-wide association studies (GWAS) and fine-mapping efforts to date have identified more than 100 prostate cancer (PrCa)-susceptibility loci. We meta-analyzed genotype data from a custom high-density array of 46,939 PrCa cases and 27,910 controls of European ancestry with previously genotyped data of 32,255 PrCa cases and 33,202 controls of European ancestry. Our analysis identified 62 novel loci associated (P […] C, p.Pro1054Arg) in ATM and rs2066827 (OR = 1.06; P = 2.3 × 10−9; T>G, p.Val109Gly) in CDKN1B. The combination of all loci captured 28.4% of the PrCa familial relative risk, and a polygenic risk score conferred an elevated PrCa risk for men in the ninetieth to ninety-ninth percentiles (relative risk = 2.69; 95% confidence interval (CI): 2.55–2.82) and first percentile (relative risk = 5.71; 95% CI: 5.04–6.48) risk stratum compared with the population average. These findings improve risk prediction, enhance fine-mapping, and provide insight into the underlying biology of PrCa. A large meta-analysis combining genome-wide and custom high-density genotyping array data identifies 63 new susceptibility loci for prostate cancer, enhancing fine-mapping efforts and providing insights into the underlying biology.

Journal ArticleDOI
TL;DR: A new class of E26 transformation-specific (ETS)-fusion-negative tumors defined by mutations in epigenetic regulators, as well as alterations in pathways not previously implicated in prostate cancer, such as the spliceosome pathway are identified.
Abstract: Comprehensive genomic characterization of prostate cancer has identified recurrent alterations in genes involved in androgen signaling, DNA repair, and PI3K signaling, among others. However, larger and uniform genomic analysis may identify additional recurrently mutated genes at lower frequencies. Here we aggregate and uniformly analyze exome sequencing data from 1,013 prostate cancers. We identify and validate a new class of E26 transformation-specific (ETS)-fusion-negative tumors defined by mutations in epigenetic regulators, as well as alterations in pathways not previously implicated in prostate cancer, such as the spliceosome pathway. We find that the incidence of significantly mutated genes (SMGs) follows a long-tail distribution, with many genes mutated in less than 3% of cases. We identify a total of 97 SMGs, including 70 not previously implicated in prostate cancer, such as the ubiquitin ligase CUL3 and the transcription factor SPEN. Finally, comparing primary and metastatic prostate cancer identifies a set of genomic markers that may inform risk stratification.

Journal ArticleDOI
TL;DR: It is demonstrated that even without prior biological knowledge of cross-phenotype relationships, genetics corresponding to clinical measurements successfully recapture those measurements’ relevance to diseases, and thus can contribute to the elucidation of unknown etiology and pathogenesis.
Abstract: Clinical measurements can be viewed as useful intermediate phenotypes to promote understanding of complex human diseases. To acquire comprehensive insights into the underlying genetics, here we conducted a genome-wide association study (GWAS) of 58 quantitative traits in 162,255 Japanese individuals. Overall, we identified 1,407 trait-associated loci (P < 5.0 × 10−8), 679 of which were novel. By incorporating 32 additional GWAS results for complex diseases and traits in Japanese individuals, we further highlighted pleiotropy, genetic correlations, and cell-type specificity across quantitative traits and diseases, which substantially expands the current understanding of the associated genetics and biology. This study identified both shared polygenic effects and cell-type specificity, represented by the genetic links among clinical measurements, complex diseases, and relevant cell types. Our findings demonstrate that even without prior biological knowledge of cross-phenotype relationships, genetics corresponding to clinical measurements successfully recapture those measurements’ relevance to diseases, and thus can contribute to the elucidation of unknown etiology and pathogenesis. A genome-wide association study (GWAS) of 58 traits using data from the Biobank Japan Project identifies 1,407 loci, 679 of which are novel. Comparison with disease GWASs and analysis of genetic correlations and cell-type enrichment show that these clinical measurements are relevant to human disease.

Journal ArticleDOI
TL;DR: A much faster version of the BOLT-LMM Bayesian mixed model association method is introduced—capable of running analyses of the full UK Biobank cohort in a few days on a single compute node—and it is shown that it produces highly powered, robust test statistics when run on all 459K European samples (retaining related individuals).
Abstract: Biobank-based genome-wide association studies are enabling exciting insights in complex trait genetics, but much uncertainty remains over best practices for optimizing statistical power and computational efficiency in GWAS while controlling confounders. Here, we introduce a much faster version of our BOLT-LMM Bayesian mixed model association method—capable of running analyses of the full UK Biobank cohort in a few days on a single compute node—and show that it produces highly powered, robust test statistics when run on all 459K European samples (retaining related individuals). When used to conduct a GWAS for height in UK Biobank, BOLT-LMM achieved power equivalent to linear regression on 650K samples—a 93% increase in effective sample size versus the common practice of analyzing unrelated British samples using linear regression (UK Biobank documentation; Bycroft et al. bioRxiv). Across a broader set of 23 highly heritable traits, the total number of independent GWAS loci detected increased from 5,839 to 10,759, an 84% increase. We recommend the use of BOLT-LMM (retaining related individuals) for biobank-scale analyses, and we have publicly released BOLT-LMM summary association statistics for the 23 traits analyzed as a resource for all researchers.
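
The percentages quoted in the BOLT-LMM abstract above can be checked with simple arithmetic. The sketch below recomputes the relative increase in detected loci and backs out the baseline sample size implied by a 93% gain to 650K effective samples; that baseline is derived here and is not stated in the abstract, and the helper name is hypothetical.

```python
def percent_increase(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before

# Independent GWAS loci across the 23 traits, as quoted in the abstract.
loci_gain = percent_increase(5839, 10759)      # roughly 84%

# Baseline implied by "a 93% increase" reaching an effective sample size of 650K.
# This is an inference from the two quoted figures, not a number from the abstract.
implied_baseline = 650_000 / 1.93
```

The recomputed loci gain agrees with the 84% figure in the abstract.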

Journal ArticleDOI
TL;DR: It is shown that neuroticism’s genetic signal partly originates in two genetically distinguishable subclusters (‘depressed affect’ and ‘worry’), suggesting distinct causal mechanisms for subtypes of individuals.
Abstract: Neuroticism is an important risk factor for psychiatric traits, including depression, anxiety, and schizophrenia. At the time of analysis, previous genome-wide association studies (GWAS) reported 16 genomic loci associated to neuroticism. Here we conducted a large GWAS meta-analysis (n = 449,484) of neuroticism and identified 136 independent genome-wide significant loci (124 new at the time of analysis), which implicate 599 genes. Functional follow-up analyses showed enrichment in several brain regions and involvement of specific cell types, including dopaminergic neuroblasts (P = 3.49 × 10−8), medium spiny neurons (P = 4.23 × 10−8), and serotonergic neurons (P = 1.37 × 10−7). Gene set analyses implicated three specific pathways: neurogenesis (P = 4.43 × 10−9), behavioral response to cocaine processes (P = 1.84 × 10−7), and axon part (P = 5.26 × 10−8). We show that neuroticism's genetic signal partly originates in two genetically distinguishable subclusters ('depressed affect' and 'worry'), suggesting distinct causal mechanisms for subtypes of individuals. Mendelian randomization analysis showed unidirectional and bidirectional effects between neuroticism and multiple psychiatric traits. These results enhance neurobiological understanding of neuroticism and provide specific leads for functional follow-up experiments.

Journal ArticleDOI
Carolina Roselli1, Mark Chaffin1, Lu-Chen Weng1,2, +257 more; Institutions (82)
TL;DR: This large, multi-ethnic genome-wide association study identifies 97 loci significantly associated with atrial fibrillation that are enriched for genes involved in cardiac development, electrophysiology, structure and contractile function.
Abstract: Atrial fibrillation (AF) affects more than 33 million individuals worldwide1 and has a complex heritability2. We conducted the largest meta-analysis of genome-wide association studies (GWAS) for AF to date, consisting of more than half a million individuals, including 65,446 with AF. In total, we identified 97 loci significantly associated with AF, including 67 that were novel in a combined-ancestry analysis, and 3 that were novel in a European-specific analysis. We sought to identify AF-associated genes at the GWAS loci by performing RNA-sequencing and expression quantitative trait locus analyses in 101 left atrial samples, the most relevant tissue for AF. We also performed transcriptome-wide analyses that identified 57 AF-associated genes, 42 of which overlap with GWAS loci. The identified loci implicate genes enriched within cardiac developmental, electrophysiological, contractile and structural pathways. These results extend our understanding of the biological pathways underlying AF and may facilitate the development of therapeutics for AF.

Journal ArticleDOI
TL;DR: An atlas of genetic associations for 118 non-binary and 660 binary traits of 452,264 UK Biobank participants of European ancestry is presented; it allows researchers to query these results without incurring high computational costs.
Abstract: Genome-wide association studies (GWAS) have identified many loci contributing to variation in complex traits, yet the majority of loci that contribute to the heritability of complex traits remain elusive. Large study populations with sufficient statistical power are required to detect the small effect sizes of the yet unidentified genetic variants. However, the analysis of huge cohorts, like UK Biobank, is challenging. Here, we present an atlas of genetic associations for 118 non-binary and 660 binary traits of 452,264 UK Biobank participants of European ancestry. Results are compiled in a publicly accessible database that allows querying genome-wide association results for 9,113,133 genetic variants, as well as downloading GWAS summary statistics for over 30 million imputed genetic variants (>23 billion phenotype–genotype pairs). Our atlas of associations (GeneATLAS, http://geneatlas.roslin.ed.ac.uk) will help researchers to query UK Biobank results in an easy and uniform way without the need to incur high computational costs.

Journal ArticleDOI
TL;DR: LeafCutter is a new tool that identifies variable intron splicing events from RNA-seq data for analysis of complex alternative splicing; it does not require transcript annotation and can be used to map splicing quantitative trait loci.
Abstract: The excision of introns from pre-mRNA is an essential step in mRNA processing. We developed LeafCutter to study sample and population variation in intron splicing. LeafCutter identifies variable splicing events from short-read RNA-seq data and finds events of high complexity. Our approach obviates the need for transcript annotations and circumvents the challenges in estimating relative isoform or exon usage in complex splicing events. LeafCutter can be used both to detect differential splicing between sample groups and to map splicing quantitative trait loci (sQTLs). Compared with contemporary methods, our approach identified 1.4-2.1 times more sQTLs, many of which helped us ascribe molecular effects to disease-associated variants. Transcriptome-wide associations between LeafCutter intron quantifications and 40 complex traits increased the number of associated disease genes at a 5% false discovery rate by an average of 2.1-fold compared with that detected through the use of gene expression levels alone. LeafCutter is fast, scalable, easy to use, and available online.

Journal ArticleDOI
TL;DR: It is shown that targeted inactivation of the Malat1 gene in a transgenic mouse model of breast cancer, without altering the expression of its adjacent genes, promotes lung metastasis, and that this phenotype can be reversed by genetic add-back of Malat1.
Abstract: MALAT1 has previously been described as a metastasis-promoting long noncoding RNA (lncRNA). We show here, however, that targeted inactivation of the Malat1 gene in a transgenic mouse model of breast cancer, without altering the expression of its adjacent genes, promotes lung metastasis, and that this phenotype can be reversed by genetic add-back of Malat1. Similarly, knockout of MALAT1 in human breast cancer cells induces their metastatic ability, which is reversed by re-expression of Malat1. Conversely, overexpression of Malat1 suppresses breast cancer metastasis in transgenic, xenograft, and syngeneic models. Mechanistically, the MALAT1 lncRNA binds and inactivates the prometastatic transcription factor TEAD, preventing TEAD from associating with its co-activator YAP and target gene promoters. Moreover, MALAT1 levels inversely correlate with breast cancer progression and metastatic ability. These findings demonstrate that MALAT1 is a metastasis-suppressing lncRNA rather than a metastasis promoter in breast cancer, calling for rectification of the model for this highly abundant and conserved lncRNA.

Journal ArticleDOI
TL;DR: Analysis of genetic data and blood lipid measurements from over 300,000 participants in the Million Veteran Program identifies new associations for blood lipid traits and proposes novel indications for pharmaceutical inhibitors targeting PCSK9 (abdominal aortic aneurysm), ANGPTL4 (type 2 diabetes) and PDE3B (triglycerides and coronary disease).
Abstract: The Million Veteran Program (MVP) was established in 2011 as a national research initiative to determine how genetic variation influences the health of US military veterans. Here we genotyped 312,571 MVP participants using a custom biobank array and linked the genetic data to laboratory and clinical phenotypes extracted from electronic health records covering a median of 10.0 years of follow-up. Among 297,626 veterans with at least one blood lipid measurement, including 57,332 black and 24,743 Hispanic participants, we tested up to around 32 million variants for association with lipid levels and identified 118 novel genome-wide significant loci after meta-analysis with data from the Global Lipids Genetics Consortium (total n > 600,000). Through a focus on mutations predicted to result in a loss of gene function and a phenome-wide association study, we propose novel indications for pharmaceutical inhibitors targeting PCSK9 (abdominal aortic aneurysm), ANGPTL4 (type 2 diabetes) and PDE3B (triglycerides and coronary disease).

Journal ArticleDOI
TL;DR: It is suggested that many of the putative atrial fibrillation genes act via cardiac structural remodeling, potentially in the form of an ‘atrial cardiomyopathy’2, either during fetal heart development or as a response to stress in the adult heart.
Abstract: To identify genetic variation underlying atrial fibrillation, the most common cardiac arrhythmia, we performed a genome-wide association study of >1,000,000 people, including 60,620 atrial fibrillation cases and 970,216 controls. We identified 142 independent risk variants at 111 loci and prioritized 151 functional candidate genes likely to be involved in atrial fibrillation. Many of the identified risk variants fall near genes where more deleterious mutations have been reported to cause serious heart defects in humans (GATA4, MYH6, NKX2-5, PITX2, TBX5)1, or near genes important for striated muscle function and integrity (for example, CFL2, MYH7, PKP2, RBM20, SGCG, SSPN). Pathway and functional enrichment analyses also suggested that many of the putative atrial fibrillation genes act via cardiac structural remodeling, potentially in the form of an 'atrial cardiomyopathy'2, either during fetal heart development or as a response to stress in the adult heart.

Journal ArticleDOI
TL;DR: A pan-genome dataset of the Oryza sativa–Oryza rufipogon species complex generated through deep sequencing and de novo genome assembly of 66 divergent accessions will be helpful in pinpointing new causal variants underlying complex traits and in promoting evolutionary and functional studies in rice.
Abstract: The rich genetic diversity in Oryza sativa and Oryza rufipogon serves as the main source for rice breeding. Large-scale resequencing has been undertaken to discover allelic variants in rice, but much of the information for genetic variation is often lost by direct mapping of short sequence reads onto the O. sativa japonica Nipponbare reference genome. Here we constructed a pan-genome dataset of the O. sativa–O. rufipogon species complex through deep sequencing and de novo assembly of 66 divergent accessions. Intergenomic comparisons identified 23 million sequence variants in the rice genome. This catalog of sequence variations includes many known quantitative trait nucleotides and will be helpful in pinpointing new causal variants that underlie complex traits. In particular, we systematically investigated the whole set of coding genes using this pan-genome data, which revealed extensive presence–absence variation among rice accessions. This pan-genome resource will further promote evolutionary and functional studies in rice.

Journal ArticleDOI
TL;DR: This study uniformly analyzed whole-exome sequencing of 249 tumors and matched normal tissue from patients with clinically annotated outcomes to immune checkpoint therapy across multiple cancer types to examine additional tumor genomic features that contribute to selective response.
Abstract: Tumor mutational burden correlates with response to immune checkpoint blockade in multiple solid tumors, although in microsatellite-stable tumors this association is of uncertain clinical utility. Here we uniformly analyzed whole-exome sequencing (WES) of 249 tumors and matched normal tissue from patients with clinically annotated outcomes to immune checkpoint therapy, including radiographic response, across multiple cancer types to examine additional tumor genomic features that contribute to selective response. Our analyses identified genomic correlates of response beyond mutational burden, including somatic events in individual driver genes, certain global mutational signatures, and specific HLA-restricted neoantigens. However, these features were often interrelated, highlighting the complexity of identifying genetic driver events that generate an immunoresponsive tumor environment. This study lays a path forward in analyzing large clinical cohorts in an integrated and multifaceted manner to enhance the ability to discover clinically meaningful predictive features of response to immune checkpoint blockade.

Journal ArticleDOI
Jisen Zhang1, Xingtan Zhang2, Haibao Tang2, Qing Zhang2, Xiuting Hua2, Xiaokai Ma2, Fan Zhu2, Tyler Jones, Xin-Guang Zhu3, John E. Bowers4, Ching Man Wai5, Chunfang Zheng6, Yan Shi2, Shuai Chen2, Xiuming Xu2, Jingjing Yue2, David R. Nelson7, Lixian Huang2, Zhen Li2, Huimin Xu2, Dong Zhou2, Yongjun Wang2, Weichang Hu2, Jishan Lin2, Youjin Deng2, Neha Pandey2, Melina Cristina Mancini2, Dessireé Zerpa2, Julie K. Nguyen2, Liming Wang2, Liang Yu2, Yinghui Xin2, Liangfa Ge2, Jie Arro2, Jennifer Han2, Setu Chakrabarty2, Marija Pushko2, Wenping Zhang2, Yanhong Ma2, Panpan Ma2, Mingju Lv3, Faming Chen8, Guangyong Zheng8, Jingsheng Xu2, Zhenhui Yang2, Fang Deng2, Xuequn Chen2, Zhenyang Liao2, Xunxiao Zhang2, Zhicong Lin2, Hai Lin2, Hansong Yan2, Zheng Kuang2, Weimin Zhong2, Pingping Liang2, Guofeng Wang2, Yuan Yuan2, Jiaxian Shi2, Jinxiang Hou2, Jingxian Lin2, Jingjing Jin, Peijian Cao, Qiaochu Shen2, Qing Jiang2, Ping Zhou2, Yaying Ma2, Xiaodan Zhang2, Rongrong Xu2, Juan Liu2, Yongmei Zhou2, Haifeng Jia2, Qing Ma2, Rui Qi2, Zhiliang Zhang2, Jingping Fang2, Hongkun Fang2, Jinjin Song2, Mengjuan Wang2, Guangrui Dong2, Gang Wang2, Zheng Chen2, Teng Ma2, Hong Liu2, Singha R. Dhungana9, Sarah E. Huss2, Xiping Yang10, Anupma Sharma11, Jhon H. Trujillo, Maria C. Martinez, Matthew E. Hudson2, John J. Riascos, Mary A. Schuler2, Li Qing Chen2, David M. Braun9, Lei Li2, Qingyi Yu11, Jianping Wang1,10, Kai Wang2, Michael C. Schatz12, David Heckerman13, Marie-Anne Van Sluys14, Glaucia Mendes Souza14, Paul H. Moore, David Sankoff6, Robert VanBuren5, Andrew H. Paterson4, Chifumi Nagai, Ray Ming1,2
TL;DR: Sequencing of a haploid S. spontaneum, AP85-441, facilitated the assembly of 32 pseudo-chromosomes comprising 8 homologous groups of 4 members each, bearing 35,525 genes with alleles defined.
Abstract: Modern sugarcanes are polyploid interspecific hybrids, combining high sugar content from Saccharum officinarum with hardiness, disease resistance and ratooning of Saccharum spontaneum. Sequencing of a haploid S. spontaneum, AP85-441, facilitated the assembly of 32 pseudo-chromosomes comprising 8 homologous groups of 4 members each, bearing 35,525 genes with alleles defined. The reduction of basic chromosome number from 10 to 8 in S. spontaneum was caused by fissions of 2 ancestral chromosomes followed by translocations to 4 chromosomes. Surprisingly, 80% of nucleotide binding site-encoding genes associated with disease resistance are located in 4 rearranged chromosomes and 51% of those in rearranged regions. Resequencing of 64 S. spontaneum genomes identified balancing selection in rearranged regions, maintaining their diversity. Introgressed S. spontaneum chromosomes in modern sugarcanes are randomly distributed in AP85-441 genome, indicating random recombination among homologs in different S. spontaneum accessions. The allele-defined Saccharum genome offers new knowledge and resources to accelerate sugarcane improvement.

Journal ArticleDOI
TL;DR: A transcriptome-wide association study integrating genome-wide association data with expression data from brain, blood and adipose tissues identifies new candidate susceptibility genes for schizophrenia, providing a step toward understanding the underlying biology.
Abstract: Genome-wide association studies (GWAS) have identified over 100 risk loci for schizophrenia, but the causal mechanisms remain largely unknown. We performed a transcriptome-wide association study (TWAS) integrating a schizophrenia GWAS of 79,845 individuals from the Psychiatric Genomics Consortium with expression data from brain, blood, and adipose tissues across 3,693 primarily control individuals. We identified 157 TWAS-significant genes, of which 35 did not overlap a known GWAS locus. Of these 157 genes, 42 were associated with specific chromatin features measured in independent samples, thus highlighting potential regulatory targets for follow-up. Suppression of one identified susceptibility gene, mapk3, in zebrafish showed a significant effect on neurodevelopmental phenotypes. Expression and splicing from the brain captured most of the TWAS effect across all genes. This large-scale connection of associations to target genes, tissues, and regulatory features is an essential step in moving toward a mechanistic understanding of GWAS.

Journal ArticleDOI
TL;DR: This work sequenced 484 genomes of bacterial isolates from roots of Brassicaceae, poplar, and maize and validated candidates from two sets of plant-associated genes, one involved in plant colonization and the other serving in microbe–microbe competition between plant-associated bacteria.
Abstract: Plants intimately associate with diverse bacteria. Plant-associated bacteria have ostensibly evolved genes that enable them to adapt to plant environments. However, the identities of such genes are mostly unknown, and their functions are poorly characterized. We sequenced 484 genomes of bacterial isolates from roots of Brassicaceae, poplar, and maize. We then compared 3,837 bacterial genomes to identify thousands of plant-associated gene clusters. Genomes of plant-associated bacteria encode more carbohydrate metabolism functions and fewer mobile elements than related non-plant-associated genomes do. We experimentally validated candidates from two sets of plant-associated genes: one involved in plant colonization, and the other serving in microbe-microbe competition between plant-associated bacteria. We also identified 64 plant-associated protein domains that potentially mimic plant domains; some are shared with plant-associated fungi and oomycetes. This work expands the genome-based understanding of plant-microbe interactions and provides potential leads for efficient and sustainable agriculture through microbiome engineering.

Journal ArticleDOI
TL;DR: The fecal metabolome largely reflects gut microbial composition and is strongly associated with visceral-fat mass, thereby illustrating potential mechanisms underlying the well-established microbial influence on abdominal obesity.
Abstract: The human gut microbiome plays a key role in human health1, but 16S characterization lacks quantitative functional annotation2. The fecal metabolome provides a functional readout of microbial activity and can be used as an intermediate phenotype mediating host–microbiome interactions3. In this comprehensive description of the fecal metabolome, examining 1,116 metabolites from 786 individuals from a population-based twin study (TwinsUK), the fecal metabolome was found to be only modestly influenced by host genetics (heritability (H²) = 17.9%). One replicated locus at the NAT2 gene was associated with fecal metabolic traits. The fecal metabolome largely reflects gut microbial composition, explaining on average 67.7% (±18.8%) of its variance. It is strongly associated with visceral-fat mass, thereby illustrating potential mechanisms underlying the well-established microbial influence on abdominal obesity. Fecal metabolic profiling thus is a novel tool to explore links among microbiome composition, host phenotypes, and heritable complex traits. Comprehensive fecal metabolic profiling in 786 individuals from TwinsUK provides insights into the influence of host genetics and gut microbial composition on metabolites that may mediate microbiome-associated phenotypes.

Journal ArticleDOI
TL;DR: Whole-genome doubling (WGD) predicted increased morbidity across cancer types, including KRAS-mutant colorectal cancers and estrogen receptor-positive breast cancers, independently of established clinical prognostic factors.
Abstract: Ploidy abnormalities are a hallmark of cancer, but their impact on the evolution and outcomes of cancers is unknown. Here, we identified whole-genome doubling (WGD) in the tumors of nearly 30% of 9,692 prospectively sequenced advanced cancer patients. WGD varied by tumor lineage and molecular subtype, and arose early in carcinogenesis after an antecedent transforming driver mutation. While associated with TP53 mutations, 46% of all WGD arose in TP53-wild-type tumors and in such cases was associated with an E2F-mediated G1 arrest defect, although neither aberration was obligate in WGD tumors. The variability of WGD across cancer types can be explained in part by cancer cell proliferation rates. WGD predicted increased morbidity across cancer types, including KRAS-mutant colorectal cancers and estrogen receptor-positive breast cancers, independently of established clinical prognostic factors. We conclude that WGD is highly common in cancer and is a macro-evolutionary event associated with poor prognosis across cancer types.