Showing papers in "Nature Genetics in 2018"
••
TL;DR: The MR-PRESSO test detects and corrects for horizontal pleiotropy in multi-instrument Mendelian randomization (MR) analyses; horizontal pleiotropy introduced distortions in the causal estimates in MR that ranged on average from –131% to 201%, and simulations show that the MR-PRESSO test is best suited when horizontal pleiotropy occurs in <50% of instruments.
Abstract: Horizontal pleiotropy occurs when the variant has an effect on disease outside of its effect on the exposure in Mendelian randomization (MR). Violation of the ‘no horizontal pleiotropy’ assumption can cause severe bias in MR. We developed the Mendelian randomization pleiotropy residual sum and outlier (MR-PRESSO) test to identify horizontal pleiotropic outliers in multi-instrument summary-level MR testing. We showed using simulations that the MR-PRESSO test is best suited when horizontal pleiotropy occurs in <50% of instruments.
2,362 citations
••
TL;DR: Genome-wide polygenic risk scores derived from GWAS data for five common diseases can identify subgroups of the population with risk approaching or exceeding that of a monogenic mutation.
Abstract: A key public health need is to identify individuals at high risk for a given disease to enable enhanced screening or preventive therapies. Because most common diseases have a genetic component, one important approach is to stratify individuals based on inherited DNA variation1. Proposed clinical applications have largely focused on finding carriers of rare monogenic mutations at several-fold increased risk. Although most disease risk is polygenic in nature2-5, it has not yet been possible to use polygenic predictors to identify individuals at risk comparable to monogenic mutations. Here, we develop and validate genome-wide polygenic scores for five common diseases. The approach identifies 8.0, 6.1, 3.5, 3.2, and 1.5% of the population at greater than threefold increased risk for coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer, respectively. For coronary artery disease, this prevalence is 20-fold higher than the carrier frequency of rare monogenic mutations conferring comparable risk6. We propose that it is time to contemplate the inclusion of polygenic risk prediction in clinical care, and discuss relevant issues.
1,962 citations
••
TL;DR: A genome-wide association meta-analysis of individuals with clinically assessed or self-reported depression identifies 44 independent and significant loci and finds important relationships of genetic risk for major depression with educational attainment, body mass, and schizophrenia.
Abstract: Major depressive disorder (MDD) is a common illness accompanied by considerable morbidity, mortality, costs, and heightened risk of suicide. We conducted a genome-wide association meta-analysis based in 135,458 cases and 344,901 controls and identified 44 independent and significant loci. The genetic findings were associated with clinical features of major depression and implicated brain regions exhibiting anatomical differences in cases. Targets of antidepressant medications and genes involved in gene splicing were enriched for smaller association signal. We found important relationships of genetic risk for major depression with educational attainment, body mass, and schizophrenia: lower educational attainment and higher body mass were putatively causal, whereas major depression and schizophrenia reflected a partly shared biological etiology. All humans carry lesser or greater numbers of genetic risk factors for major depression. These findings help refine the basis of major depression and imply that a continuous measure of risk underlies the clinical phenotype.
1,898 citations
••
University of Minnesota1, University of Colorado Boulder2, VU University Amsterdam3, Harvard University4, University of Southern California5, University of Queensland6, University of Tartu7, Erasmus University Rotterdam8, Hospital for Special Surgery9, University of Copenhagen10, Statens Serum Institut11, Broad Institute12, University of Essex13, University of Edinburgh14, University of Cambridge15, University Hospital of Lausanne16, Geisinger Health System17, Wenzhou Medical College18, Stanford University19, University of North Carolina at Chapel Hill20, University of Wisconsin-Madison21, Hofstra University22, The Feinstein Institute for Medical Research23, University of Dundee24, University of Toronto25, Princeton University26, National Bureau of Economic Research27, New York University Shanghai28, Queen's University29, Karolinska Institutet30, Uppsala University31, University of Lausanne32, New York University33, Stockholm School of Economics34
TL;DR: A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11–13% of the variance in educational attainment and 7–10% of the variance in cognitive performance, which substantially increases the utility of polygenic scores as tools in research.
Abstract: Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11-13% of the variance in educational attainment and 7-10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.
1,658 citations
••
Cardiff University1, Harvard University2, Charité3, King's College London4, Broad Institute5, University of Adelaide6, Centre for Mental Health7, University of Queensland8, University of Münster9, University of Edinburgh10, QIMR Berghofer Medical Research Institute11, University of Vigo12, University of California, Los Angeles13, Icahn School of Medicine at Mount Sinai14, University of Oviedo15, Lundbeck16, Aarhus University17, University of Oslo18, Oslo University Hospital19, Statens Serum Institut20, University of Bergen21, Aarhus University Hospital22, University of Copenhagen23, University of Belgrade24, Tbilisi State Medical University25, deCODE genetics26, University of Verona27, Mental Health Services28, Eli Lilly and Company29, Martin Luther University of Halle-Wittenberg30, Ludwig Maximilian University of Munich31
TL;DR: A new genome-wide association study of schizophrenia is reported; meta-analysis with existing data identifies 50 novel associated loci and 145 loci in total, and integrating genomic fine-mapping with brain expression and chromosome conformation data identifies candidate causal genes.
Abstract: Schizophrenia is a debilitating psychiatric condition often associated with poor quality of life and decreased life expectancy. Lack of progress in improving treatment outcomes has been attributed to limited knowledge of the underlying biology, although large-scale genomic studies have begun to provide insights. We report a new genome-wide association study of schizophrenia (11,260 cases and 24,542 controls), and through meta-analysis with existing data we identify 50 novel associated loci and 145 loci in total. Through integrating genomic fine-mapping with brain expression and chromosome conformation data, we identify candidate causal genes within 33 loci. We also show for the first time that the common variant association signal is highly enriched among genes that are under strong selective pressures. These findings provide new insights into the biology and genetic architecture of schizophrenia, highlight the importance of mutation-intolerant genes and suggest a mechanism by which common risk variants persist in the population.
1,259 citations
••
University of Oxford1, University of Michigan2, Wellcome Trust Sanger Institute3, Amgen4, University of Cambridge5, University of Copenhagen6, University of Liverpool7, University of Freiburg8, Boston University9, University of Tartu10, Erasmus University Medical Center11, Leiden University Medical Center12, Pasteur Institute13, Icahn School of Medicine at Mount Sinai14, UCLA Medical Center15, Vanderbilt University Medical Center16, Wake Forest University17, National University of Singapore18, London North West Healthcare NHS Trust19, Imperial College London20, Charité21, Innsbruck Medical University22, Washington University in St. Louis23, Queen Mary University of London24, University of Southern Denmark25, National and Kapodistrian University of Athens26, Robertson Centre for Biostatistics27, University of Exeter28, Uppsala University29, University of Düsseldorf30, Steno Diabetes Center31, Aalborg University32, University of Eastern Finland33, Broad Institute34, Frederiksberg Hospital35, University of Bergen36, Lund University37, Technische Universität München38, University of North Carolina at Chapel Hill39, Ninewells Hospital40, University of Edinburgh41, University of Minnesota42, University of Glasgow43, Ludwig Maximilian University of Munich44, University of Iceland45, Aarhus University46, Stanford University47, Science for Life Laboratory48, University of Helsinki49, National Institutes of Health50, University of Dundee51, Harvard University52
TL;DR: Combining 32 genome-wide association studies with high-density imputation provides a comprehensive view of the genetic contribution to type 2 diabetes in individuals of European ancestry with respect to locus discovery, causal-variant resolution, and mechanistic insight.
Abstract: We expanded GWAS discovery for type 2 diabetes (T2D) by combining data from 898,130 European-descent individuals (9% cases), after imputation to high-density reference panels. With these data, we (i) extend the inventory of T2D-risk variants (243 loci, 135 newly implicated in T2D predisposition, comprising 403 distinct association signals); (ii) enrich discovery of lower-frequency risk alleles (80 index variants with minor allele frequency <5%, 14 with estimated allelic odds ratio >2); (iii) substantially improve fine-mapping of causal variants (at 51 signals, one variant accounted for >80% posterior probability of association (PPA)); (iv) extend fine-mapping through integration of tissue-specific epigenomic information (islet regulatory annotations extend the number of variants with PPA >80% to 73); (v) highlight validated therapeutic targets (18 genes with associations attributable to coding variants); and (vi) demonstrate enhanced potential for clinical translation (genome-wide chip heritability explains 18% of T2D risk; individuals in the extremes of a T2D polygenic risk score differ more than ninefold in prevalence).
1,136 citations
••
TL;DR: A multiancestry genome-wide-association meta-analysis in 521,612 individuals discovered 22 new stroke risk loci; eleven of the new susceptibility loci indicate mechanisms not previously implicated in stroke pathophysiology, with prioritization of risk variants and genes accomplished through bioinformatics analyses using extensive functional datasets.
Abstract: Stroke has multiple etiologies, but the underlying genes and pathways are largely unknown. We conducted a multiancestry genome-wide-association meta-analysis in 521,612 individuals (67,162 cases and 454,450 controls) and discovered 22 new stroke risk loci, bringing the total to 32. We further found shared genetic variation with related vascular traits, including blood pressure, cardiac traits, and venous thromboembolism, at individual loci (n = 18), and using genetic risk scores and linkage-disequilibrium-score regression. Several loci exhibited distinct association and pleiotropy patterns for etiological stroke subtypes. Eleven new susceptibility loci indicate mechanisms not previously implicated in stroke pathophysiology, with prioritization of risk variants and genes accomplished through bioinformatics analyses using extensive functional datasets. Stroke risk loci were significantly enriched in drug targets for antithrombotic therapy.
881 citations
••
VU University Amsterdam1, Erasmus University Rotterdam2, Karolinska Institutet3, Charité4, Virginia Commonwealth University5, South London and Maudsley NHS Foundation Trust6, QIMR Berghofer Medical Research Institute7, King's College London8, University of Southern Denmark9, University of California, Riverside10, University of Southern California11, University of Minnesota12, University of Queensland13, University College London14, Johns Hopkins University15, University of California, Los Angeles16, University of Crete17, Icahn School of Medicine at Mount Sinai18, Harvard University19, Veterans Health Administration20, Yale University21, Haukeland University Hospital22, Trinity College, Dublin23, University of Edinburgh24, Hofstra University25, North Shore-LIJ Health System26, National Institutes of Health27, University of Bergen28, Oslo University Hospital29, National University of Ireland, Galway30, University of Helsinki31, University of Oslo32, Martin Luther University of Halle-Wittenberg33, Duke University34, National and Kapodistrian University of Athens35, Mental Health Research Institute36, University of Colorado Boulder37, Imperial College London38, University of Manchester39, Wellcome Trust40, Manchester Academic Health Science Centre41, Stanford University42, University of Oregon43, University of Toronto44, University of Michigan45, Erasmus University Medical Center46, Broad Institute47, University of North Carolina at Chapel Hill48
TL;DR: A large-scale genetic association study of intelligence identifies 190 new loci and implicates 939 new genes related to neurogenesis, neuron differentiation and synaptic structure, a major step forward in understanding the neurobiology of cognitive function as well as genetically related neurological and psychiatric disorders.
Abstract: Intelligence is highly heritable1 and a major determinant of human health and well-being2. Recent genome-wide meta-analyses have identified 24 genomic loci linked to variation in intelligence3-7, but much about its genetic underpinnings remains to be discovered. Here, we present a large-scale genetic association study of intelligence (n = 269,867), identifying 205 associated genomic loci (190 new) and 1,016 genes (939 new) via positional mapping, expression quantitative trait locus (eQTL) mapping, chromatin interaction mapping, and gene-based association analysis. We find enrichment of genetic effects in conserved and coding regions and associations with 146 nonsynonymous exonic variants. Associated genes are strongly expressed in the brain, specifically in striatal medium spiny neurons and hippocampal pyramidal neurons. Gene set analyses implicate pathways related to nervous system development and synaptic structure. We confirm previous strong genetic correlations with multiple health-related outcomes, and Mendelian randomization analysis results suggest protective effects of intelligence for Alzheimer's disease and ADHD and bidirectional causation with pleiotropic effects for schizophrenia. These results are a major step forward in understanding the neurobiology of cognitive function as well as genetically related neurological and psychiatric disorders.
800 citations
••
TL;DR: SAIGE is a scalable and accurate generalized mixed model association test that can efficiently analyze large data sets while controlling for unbalanced case-control ratios and sample relatedness, as shown by applying SAIGE to the UK Biobank data for > 1,400 binary phenotypes.
Abstract: In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly; they produce large type I error rates when used to analyze unbalanced case-control phenotypes. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation to calibrate the distribution of score test statistics. This method, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), provides accurate P values even when case-control ratios are extremely unbalanced. SAIGE uses state-of-art optimization strategies to reduce computational costs; hence, it is applicable to GWAS for thousands of phenotypes by large biobanks. Through the analysis of UK Biobank data of 408,961 samples from white British participants with European ancestry for > 1,400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness.
773 citations
••
Evangelos Evangelou1, Evangelos Evangelou2, Helen R. Warren3, Helen R. Warren4 +338 more•Institutions (93)
TL;DR: In this article, the largest genetic association study of blood pressure traits (systolic, diastolic and pulse pressure) to date in over 1 million people of European ancestry was conducted.
Abstract: High blood pressure is a highly heritable and modifiable risk factor for cardiovascular disease. We report the largest genetic association study of blood pressure traits (systolic, diastolic and pulse pressure) to date in over 1 million people of European ancestry. We identify 535 novel blood pressure loci that not only offer new biological insights into blood pressure regulation but also highlight shared genetic architecture between blood pressure and lifestyle exposures. Our findings identify new biological pathways for blood pressure regulation with potential for improved cardiovascular disease prevention in the future.
728 citations
••
TL;DR: An approach that identifies disease-relevant tissues and cell types by analyzing gene expression data together with genome-wide association study (GWAS) summary statistics found significant tissue-specific enrichments for 34 traits.
Abstract: We introduce an approach to identify disease-relevant tissues and cell types by analyzing gene expression data together with genome-wide association study (GWAS) summary statistics. Our approach uses stratified linkage disequilibrium (LD) score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We applied our approach to gene expression data from several sources together with GWAS summary statistics for 48 diseases and traits (average N = 169,331) and found significant tissue-specific enrichments (false discovery rate (FDR) < 5%) for 34 traits. In our analysis of multiple tissues, we detected a broad range of enrichments that recapitulated known biology. In our brain-specific analysis, significant enrichments included an enrichment of inhibitory over excitatory neurons for bipolar disorder, and excitatory over inhibitory neurons for schizophrenia and body mass index. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signals.
••
Harvard University1, Broad Institute2, VU University Amsterdam3, University of Minnesota4, Hospital for Special Surgery5, University of Southern California6, University of Colorado Boulder7, Karolinska Institutet8, Uppsala University9, Stockholm School of Economics10, University of Queensland11, National Bureau of Economic Research12, New York University13, Research Institute of Industrial Economics14
TL;DR: Applying MTAG to summary statistics for depressive symptoms, neuroticism and subjective well-being increased discovery of associated loci as compared to single-trait analyses, yielding more informative bioinformatics analyses and increasing the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.
Abstract: We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (N eff = 354,862), neuroticism (N = 168,105), and subjective well-being (N = 388,538). As compared to the 32, 9, and 13 genome-wide significant loci identified in the single-trait GWAS (most of which are themselves novel), MTAG increases the number of associated loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.
••
Fredrick R. Schumacher1, Ali Amin Al Olama2, Sonja I. Berndt3, Sara Benlloch2 +204 more•Institutions (79)
TL;DR: A large meta-analysis combining genome-wide and custom high-density genotyping array data identifies 63 new susceptibility loci for prostate cancer (PrCa), enhancing fine-mapping efforts and providing insights into the underlying biology of PrCa.
Abstract: Genome-wide association studies (GWAS) and fine-mapping efforts to date have identified more than 100 prostate cancer (PrCa)-susceptibility loci. We meta-analyzed genotype data from a custom high-density array of 46,939 PrCa cases and 27,910 controls of European ancestry with previously genotyped data of 32,255 PrCa cases and 33,202 controls of European ancestry. Our analysis identified 62 novel loci associated (P < 5.0 × 10−8) with PrCa and one locus significantly associated with early-onset PrCa (≤55 years). Our findings include missense variants rs1800057 (odds ratio (OR) = 1.16; P = 8.2 × 10−9; G>C, p.Pro1054Arg) in ATM and rs2066827 (OR = 1.06; P = 2.3 × 10−9; T>G, p.Val109Gly) in CDKN1B. The combination of all loci captured 28.4% of the PrCa familial relative risk, and a polygenic risk score conferred an elevated PrCa risk for men in the ninetieth to ninety-ninth percentiles (relative risk = 2.69; 95% confidence interval (CI): 2.55–2.82) and first percentile (relative risk = 5.71; 95% CI: 5.04–6.48) risk stratum compared with the population average. These findings improve risk prediction, enhance fine-mapping, and provide insight into the underlying biology of PrCa1. A large meta-analysis combining genome-wide and custom high-density genotyping array data identifies 63 new susceptibility loci for prostate cancer, enhancing fine-mapping efforts and providing insights into the underlying biology.
••
TL;DR: A new class of E26 transformation-specific (ETS)-fusion-negative tumors defined by mutations in epigenetic regulators, as well as alterations in pathways not previously implicated in prostate cancer, such as the spliceosome pathway are identified.
Abstract: Comprehensive genomic characterization of prostate cancer has identified recurrent alterations in genes involved in androgen signaling, DNA repair, and PI3K signaling, among others. However, larger and uniform genomic analysis may identify additional recurrently mutated genes at lower frequencies. Here we aggregate and uniformly analyze exome sequencing data from 1,013 prostate cancers. We identify and validate a new class of E26 transformation-specific (ETS)-fusion-negative tumors defined by mutations in epigenetic regulators, as well as alterations in pathways not previously implicated in prostate cancer, such as the spliceosome pathway. We find that the incidence of significantly mutated genes (SMGs) follows a long-tail distribution, with many genes mutated in less than 3% of cases. We identify a total of 97 SMGs, including 70 not previously implicated in prostate cancer, such as the ubiquitin ligase CUL3 and the transcription factor SPEN. Finally, comparing primary and metastatic prostate cancer identifies a set of genomic markers that may inform risk stratification.
••
TL;DR: It is demonstrated that even without prior biological knowledge of cross-phenotype relationships, genetics corresponding to clinical measurements successfully recapture those measurements’ relevance to diseases, and thus can contribute to the elucidation of unknown etiology and pathogenesis.
Abstract: Clinical measurements can be viewed as useful intermediate phenotypes to promote understanding of complex human diseases. To acquire comprehensive insights into the underlying genetics, here we conducted a genome-wide association study (GWAS) of 58 quantitative traits in 162,255 Japanese individuals. Overall, we identified 1,407 trait-associated loci (P < 5.0 × 10−8), 679 of which were novel. By incorporating 32 additional GWAS results for complex diseases and traits in Japanese individuals, we further highlighted pleiotropy, genetic correlations, and cell-type specificity across quantitative traits and diseases, which substantially expands the current understanding of the associated genetics and biology. This study identified both shared polygenic effects and cell-type specificity, represented by the genetic links among clinical measurements, complex diseases, and relevant cell types. Our findings demonstrate that even without prior biological knowledge of cross-phenotype relationships, genetics corresponding to clinical measurements successfully recapture those measurements’ relevance to diseases, and thus can contribute to the elucidation of unknown etiology and pathogenesis. A genome-wide association study (GWAS) of 58 traits using data from the Biobank Japan Project identifies 1,407 loci, 679 of which are novel. Comparison with disease GWASs and analysis of genetic correlations and cell-type enrichment show that these clinical measurements are relevant to human disease.
••
TL;DR: A much faster version of the BOLT-LMM Bayesian mixed model association method is introduced—capable of running analyses of the full UK Biobank cohort in a few days on a single compute node—and it is shown that it produces highly powered, robust test statistics when run on all 459K European samples (retaining related individuals).
Abstract: Biobank-based genome-wide association studies are enabling exciting insights in complex trait genetics, but much uncertainty remains over best practices for optimizing statistical power and computational efficiency in GWAS while controlling confounders. Here, we introduce a much faster version of our BOLT-LMM Bayesian mixed model association method—capable of running analyses of the full UK Biobank cohort in a few days on a single compute node—and show that it produces highly powered, robust test statistics when run on all 459K European samples (retaining related individuals). When used to conduct a GWAS for height in UK Biobank, BOLT-LMM achieved power equivalent to linear regression on 650K samples—a 93% increase in effective sample size versus the common practice of analyzing unrelated British samples using linear regression (UK Biobank documentation; Bycroft et al. bioRxiv). Across a broader set of 23 highly heritable traits, the total number of independent GWAS loci detected increased from 5,839 to 10,759, an 84% increase. We recommend the use of BOLT-LMM (retaining related individuals) for biobank-scale analyses, and we have publicly released BOLT-LMM summary association statistics for the 23 traits analyzed as a resource for all researchers.
••
TL;DR: It is shown that neuroticism’s genetic signal partly originates in two genetically distinguishable subclusters (‘depressed affect’ and ‘worry’), suggesting distinct causal mechanisms for subtypes of individuals.
Abstract: Neuroticism is an important risk factor for psychiatric traits, including depression1, anxiety2,3, and schizophrenia4-6. At the time of analysis, previous genome-wide association studies7-12 (GWAS) reported 16 genomic loci associated to neuroticism10-12. Here we conducted a large GWAS meta-analysis (n = 449,484) of neuroticism and identified 136 independent genome-wide significant loci (124 new at the time of analysis), which implicate 599 genes. Functional follow-up analyses showed enrichment in several brain regions and involvement of specific cell types, including dopaminergic neuroblasts (P = 3.49 × 10−8), medium spiny neurons (P = 4.23 × 10−8), and serotonergic neurons (P = 1.37 × 10−7). Gene set analyses implicated three specific pathways: neurogenesis (P = 4.43 × 10−9), behavioral response to cocaine processes (P = 1.84 × 10−7), and axon part (P = 5.26 × 10−8). We show that neuroticism's genetic signal partly originates in two genetically distinguishable subclusters13 ('depressed affect' and 'worry'), suggesting distinct causal mechanisms for subtypes of individuals. Mendelian randomization analysis showed unidirectional and bidirectional effects between neuroticism and multiple psychiatric traits. These results enhance neurobiological understanding of neuroticism and provide specific leads for functional follow-up experiments.
••
TL;DR: This large, multi-ethnic genome-wide association study identifies 97 loci significantly associated with atrial fibrillation that are enriched for genes involved in cardiac development, electrophysiology, structure and contractile function.
Abstract: Atrial fibrillation (AF) affects more than 33 million individuals worldwide1 and has a complex heritability2. We conducted the largest meta-analysis of genome-wide association studies (GWAS) for AF to date, consisting of more than half a million individuals, including 65,446 with AF. In total, we identified 97 loci significantly associated with AF, including 67 that were novel in a combined-ancestry analysis, and 3 that were novel in a European-specific analysis. We sought to identify AF-associated genes at the GWAS loci by performing RNA-sequencing and expression quantitative trait locus analyses in 101 left atrial samples, the most relevant tissue for AF. We also performed transcriptome-wide analyses that identified 57 AF-associated genes, 42 of which overlap with GWAS loci. The identified loci implicate genes enriched within cardiac developmental, electrophysiological, contractile and structural pathways. These results extend our understanding of the biological pathways underlying AF and may facilitate the development of therapeutics for AF.
••
TL;DR: An atlas of genetic associations for 118 non-binary and 660 binary traits of 452,264 UK Biobank participants of European ancestry is presented; it allows researchers to query these results without incurring high computational costs.
Abstract: Genome-wide association studies (GWAS) have identified many loci contributing to variation in complex traits, yet the majority of loci that contribute to the heritability of complex traits remain elusive. Large study populations with sufficient statistical power are required to detect the small effect sizes of the yet unidentified genetic variants. However, the analysis of huge cohorts, like UK Biobank, is challenging. Here, we present an atlas of genetic associations for 118 non-binary and 660 binary traits of 452,264 UK Biobank participants of European ancestry. Results are compiled in a publicly accessible database that allows querying genome-wide association results for 9,113,133 genetic variants, as well as downloading GWAS summary statistics for over 30 million imputed genetic variants (>23 billion phenotype–genotype pairs). Our atlas of associations (GeneATLAS, http://geneatlas.roslin.ed.ac.uk) will help researchers to query UK Biobank results in an easy and uniform way without the need to incur high computational costs. GeneATLAS is a web resource that presents genetic association results for 118 non-binary and 660 binary traits using UK Biobank data. This atlas allows researchers to query these results without incurring high computational costs.
••
TL;DR: LeafCutter is a new tool that identifies variable intron splicing events from RNA-seq data for analysis of complex alternative splicing; it does not require transcript annotation and can be used to map splicing quantitative trait loci.
Abstract: The excision of introns from pre-mRNA is an essential step in mRNA processing. We developed LeafCutter to study sample and population variation in intron splicing. LeafCutter identifies variable splicing events from short-read RNA-seq data and finds events of high complexity. Our approach obviates the need for transcript annotations and circumvents the challenges in estimating relative isoform or exon usage in complex splicing events. LeafCutter can be used both to detect differential splicing between sample groups and to map splicing quantitative trait loci (sQTLs). Compared with contemporary methods, our approach identified 1.4-2.1 times more sQTLs, many of which helped us ascribe molecular effects to disease-associated variants. Transcriptome-wide associations between LeafCutter intron quantifications and 40 complex traits increased the number of associated disease genes at a 5% false discovery rate by an average of 2.1-fold compared with that detected through the use of gene expression levels alone. LeafCutter is fast, scalable, easy to use, and available online.
••
TL;DR: It is shown that targeted inactivation of the Malat1 gene in a transgenic mouse model of breast cancer, without altering the expression of its adjacent genes, promotes lung metastasis, and that this phenotype can be reversed by genetic add-back of Malat1.
Abstract: MALAT1 has previously been described as a metastasis-promoting long noncoding RNA (lncRNA). We show here, however, that targeted inactivation of the Malat1 gene in a transgenic mouse model of breast cancer, without altering the expression of its adjacent genes, promotes lung metastasis, and that this phenotype can be reversed by genetic add-back of Malat1. Similarly, knockout of MALAT1 in human breast cancer cells induces their metastatic ability, which is reversed by re-expression of Malat1. Conversely, overexpression of Malat1 suppresses breast cancer metastasis in transgenic, xenograft, and syngeneic models. Mechanistically, the MALAT1 lncRNA binds and inactivates the prometastatic transcription factor TEAD, preventing TEAD from associating with its co-activator YAP and target gene promoters. Moreover, MALAT1 levels inversely correlate with breast cancer progression and metastatic ability. These findings demonstrate that MALAT1 is a metastasis-suppressing lncRNA rather than a metastasis promoter in breast cancer, calling for rectification of the model for this highly abundant and conserved lncRNA.
••
Harvard University1, Broad Institute2, Veterans Health Administration3, University of Pennsylvania4, VA Boston Healthcare System5, Emory University6, Boston University7, University of Utah8, VA Palo Alto Healthcare System9, Stanford University10, Yale University11, University of Massachusetts Amherst12, Pennsylvania State University13, University of Cambridge14, Brigham and Women's Hospital15, University of Michigan16, Geisinger Health System17
TL;DR: Analysis of genetic data and blood lipid measurements from over 300,000 participants in the Million Veteran Program identifies new associations for blood lipid traits and proposes novel indications for pharmaceutical inhibitors targeting PCSK9 (abdominal aortic aneurysm), ANGPTL4 (type 2 diabetes) and PDE3B (triglycerides and coronary disease).
Abstract: The Million Veteran Program (MVP) was established in 2011 as a national research initiative to determine how genetic variation influences the health of US military veterans. Here we genotyped 312,571 MVP participants using a custom biobank array and linked the genetic data to laboratory and clinical phenotypes extracted from electronic health records covering a median of 10.0 years of follow-up. Among 297,626 veterans with at least one blood lipid measurement, including 57,332 black and 24,743 Hispanic participants, we tested up to around 32 million variants for association with lipid levels and identified 118 novel genome-wide significant loci after meta-analysis with data from the Global Lipids Genetics Consortium (total n > 600,000). Through a focus on mutations predicted to result in a loss of gene function and a phenome-wide association study, we propose novel indications for pharmaceutical inhibitors targeting PCSK9 (abdominal aortic aneurysm), ANGPTL4 (type 2 diabetes) and PDE3B (triglycerides and coronary disease). Analysis of genetic data and blood lipid measurements from over 300,000 participants in the Million Veteran Program identifies new associations for blood lipid traits.
••
TL;DR: It is suggested that many of the putative atrial fibrillation genes act via cardiac structural remodeling, potentially in the form of an ‘atrial cardiomyopathy’2, either during fetal heart development or as a response to stress in the adult heart.
Abstract: To identify genetic variation underlying atrial fibrillation, the most common cardiac arrhythmia, we performed a genome-wide association study of >1,000,000 people, including 60,620 atrial fibrillation cases and 970,216 controls. We identified 142 independent risk variants at 111 loci and prioritized 151 functional candidate genes likely to be involved in atrial fibrillation. Many of the identified risk variants fall near genes where more deleterious mutations have been reported to cause serious heart defects in humans (GATA4, MYH6, NKX2-5, PITX2, TBX5)1, or near genes important for striated muscle function and integrity (for example, CFL2, MYH7, PKP2, RBM20, SGCG, SSPN). Pathway and functional enrichment analyses also suggested that many of the putative atrial fibrillation genes act via cardiac structural remodeling, potentially in the form of an 'atrial cardiomyopathy'2, either during fetal heart development or as a response to stress in the adult heart.
••
TL;DR: A pan-genome dataset of the Oryza sativa–Oryza rufipogon species complex generated through deep sequencing and de novo genome assembly of 66 divergent accessions will be helpful in pinpointing new causal variants underlying complex traits and in promoting evolutionary and functional studies in rice.
Abstract: The rich genetic diversity in Oryza sativa and Oryza rufipogon serves as the main source in rice breeding. Large-scale resequencing has been undertaken to discover allelic variants in rice, but much of the information for genetic variation is often lost by direct mapping of short sequence reads onto the O. sativa japonica Nipponbare reference genome. Here we constructed a pan-genome dataset of the O. sativa–O. rufipogon species complex through deep sequencing and de novo assembly of 66 divergent accessions. Intergenomic comparisons identified 23 million sequence variants in the rice genome. This catalog of sequence variations includes many known quantitative trait nucleotides and will be helpful in pinpointing new causal variants that underlie complex traits. In particular, we systemically investigated the whole set of coding genes using this pan-genome data, which revealed extensive presence and absence of variation among rice accessions. This pan-genome resource will further promote evolutionary and functional studies in rice.
••
TL;DR: This study uniformly analyzed whole-exome sequencing of 249 tumors and matched normal tissue from patients with clinically annotated outcomes to immune checkpoint therapy across multiple cancer types to examine additional tumor genomic features that contribute to selective response.
Abstract: Tumor mutational burden correlates with response to immune checkpoint blockade in multiple solid tumors, although in microsatellite-stable tumors this association is of uncertain clinical utility. Here we uniformly analyzed whole-exome sequencing (WES) of 249 tumors and matched normal tissue from patients with clinically annotated outcomes to immune checkpoint therapy, including radiographic response, across multiple cancer types to examine additional tumor genomic features that contribute to selective response. Our analyses identified genomic correlates of response beyond mutational burden, including somatic events in individual driver genes, certain global mutational signatures, and specific HLA-restricted neoantigens. However, these features were often interrelated, highlighting the complexity of identifying genetic driver events that generate an immunoresponsive tumor environment. This study lays a path forward in analyzing large clinical cohorts in an integrated and multifaceted manner to enhance the ability to discover clinically meaningful predictive features of response to immune checkpoint blockade.
••
Fujian Agriculture and Forestry University1, University of Illinois at Urbana–Champaign2, Chinese Academy of Sciences3, University of Georgia4, Michigan State University5, University of Ottawa6, University of Tennessee7, CAS-MPG Partner Institute for Computational Biology8, University of Missouri9, University of Florida10, Texas A&M University System11, Johns Hopkins University12, Microsoft13, University of São Paulo14
TL;DR: In this article, sequencing of a haploid S. spontaneum, AP85-441, facilitated the assembly of 32 pseudo-chromosomes comprising 8 homologous groups of 4 members each, bearing 35,525 genes with alleles defined.
Abstract: Modern sugarcanes are polyploid interspecific hybrids, combining high sugar content from Saccharum officinarum with hardiness, disease resistance and ratooning of Saccharum spontaneum. Sequencing of a haploid S. spontaneum, AP85-441, facilitated the assembly of 32 pseudo-chromosomes comprising 8 homologous groups of 4 members each, bearing 35,525 genes with alleles defined. The reduction of basic chromosome number from 10 to 8 in S. spontaneum was caused by fissions of 2 ancestral chromosomes followed by translocations to 4 chromosomes. Surprisingly, 80% of nucleotide binding site-encoding genes associated with disease resistance are located in 4 rearranged chromosomes and 51% of those in rearranged regions. Resequencing of 64 S. spontaneum genomes identified balancing selection in rearranged regions, maintaining their diversity. Introgressed S. spontaneum chromosomes in modern sugarcanes are randomly distributed in AP85-441 genome, indicating random recombination among homologs in different S. spontaneum accessions. The allele-defined Saccharum genome offers new knowledge and resources to accelerate sugarcane improvement.
••
TL;DR: A transcriptome-wide association study integrating genome-wide association data with expression data from brain, blood and adipose tissues identifies new candidate susceptibility genes for schizophrenia, providing a step toward understanding the underlying biology.
Abstract: Genome-wide association studies (GWAS) have identified over 100 risk loci for schizophrenia, but the causal mechanisms remain largely unknown. We performed a transcriptome-wide association study (TWAS) integrating a schizophrenia GWAS of 79,845 individuals from the Psychiatric Genomics Consortium with expression data from brain, blood, and adipose tissues across 3,693 primarily control individuals. We identified 157 TWAS-significant genes, of which 35 did not overlap a known GWAS locus. Of these 157 genes, 42 were associated with specific chromatin features measured in independent samples, thus highlighting potential regulatory targets for follow-up. Suppression of one identified susceptibility gene, mapk3, in zebrafish showed a significant effect on neurodevelopmental phenotypes. Expression and splicing from the brain captured most of the TWAS effect across all genes. This large-scale connection of associations to target genes, tissues, and regulatory features is an essential step in moving toward a mechanistic understanding of GWAS.
••
Joint Genome Institute1, Howard Hughes Medical Institute2, University of North Carolina at Chapel Hill3, ETH Zurich4, Stanford University5, Virginia Tech6, Gansu Agricultural University7, International Centre for Genetic Engineering and Biotechnology8, Oak Ridge National Laboratory9, University of Tennessee10, Cornell University11, Research Triangle Park12, University of Washington13, Max Planck Society14, University of California, Merced15
TL;DR: This work sequenced 484 genomes of bacterial isolates from roots of Brassicaceae, poplar, and maize and validated candidates from two sets of plant-associated genes, one involved in plant colonization and the other serving in microbe–microbe competition between plant-associated bacteria.
Abstract: Plants intimately associate with diverse bacteria. Plant-associated bacteria have ostensibly evolved genes that enable them to adapt to plant environments. However, the identities of such genes are mostly unknown, and their functions are poorly characterized. We sequenced 484 genomes of bacterial isolates from roots of Brassicaceae, poplar, and maize. We then compared 3,837 bacterial genomes to identify thousands of plant-associated gene clusters. Genomes of plant-associated bacteria encode more carbohydrate metabolism functions and fewer mobile elements than related non-plant-associated genomes do. We experimentally validated candidates from two sets of plant-associated genes: one involved in plant colonization, and the other serving in microbe-microbe competition between plant-associated bacteria. We also identified 64 plant-associated protein domains that potentially mimic plant domains; some are shared with plant-associated fungi and oomycetes. This work expands the genome-based understanding of plant-microbe interactions and provides potential leads for efficient and sustainable agriculture through microbiome engineering.
••
TL;DR: The fecal metabolome largely reflects gut microbial composition and is strongly associated with visceral-fat mass, thereby illustrating potential mechanisms underlying the well-established microbial influence on abdominal obesity.
Abstract: The human gut microbiome plays a key role in human health1, but 16S characterization lacks quantitative functional annotation2. The fecal metabolome provides a functional readout of microbial activity and can be used as an intermediate phenotype mediating host–microbiome interactions3. In this comprehensive description of the fecal metabolome, examining 1,116 metabolites from 786 individuals from a population-based twin study (TwinsUK), the fecal metabolome was found to be only modestly influenced by host genetics (heritability (H2) = 17.9%). One replicated locus at the NAT2 gene was associated with fecal metabolic traits. The fecal metabolome largely reflects gut microbial composition, explaining on average 67.7% (±18.8%) of its variance. It is strongly associated with visceral-fat mass, thereby illustrating potential mechanisms underlying the well-established microbial influence on abdominal obesity. Fecal metabolic profiling thus is a novel tool to explore links among microbiome composition, host phenotypes, and heritable complex traits. Comprehensive fecal metabolic profiling in 786 individuals from TwinsUK provides insights into the influence of host genetics and gut microbial composition on metabolites that may mediate microbiome-associated phenotypes.
••
TL;DR: WGD predicted for increased morbidity across cancer types, including KRAS-mutant colorectal cancers and estrogen receptor-positive breast cancers, independently of established clinical prognostic factors.
Abstract: Ploidy abnormalities are a hallmark of cancer, but their impact on the evolution and outcomes of cancers is unknown. Here, we identified whole-genome doubling (WGD) in the tumors of nearly 30% of 9,692 prospectively sequenced advanced cancer patients. WGD varied by tumor lineage and molecular subtype, and arose early in carcinogenesis after an antecedent transforming driver mutation. While associated with TP53 mutations, 46% of all WGD arose in TP53-wild-type tumors and in such cases was associated with an E2F-mediated G1 arrest defect, although neither aberration was obligate in WGD tumors. The variability of WGD across cancer types can be explained in part by cancer cell proliferation rates. WGD predicted for increased morbidity across cancer types, including KRAS-mutant colorectal cancers and estrogen receptor-positive breast cancers, independently of established clinical prognostic factors. We conclude that WGD is highly common in cancer and is a macro-evolutionary event associated with poor prognosis across cancer types.