The genetic architecture of type 2 diabetes

TL;DR: Large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes; the associated variants were overwhelmingly common, and most fell within regions previously identified by genome-wide association studies.
Abstract: The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.
Citations
Journal Article
11 Oct 2018-Nature
TL;DR: Deep phenotype and genome-wide genetic data from 500,000 individuals in the UK Biobank are described, including population structure and relatedness in the cohort, and imputation that increases the number of testable variants to around 96 million.
Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

4,489 citations

Journal Article
TL;DR: This Review provides an updated view of the global epidemiology of type 2 diabetes mellitus (T2DM), as well as dietary, lifestyle and other risk factors for T2DM and its complications.
Abstract: Globally, the number of people with diabetes mellitus has quadrupled in the past three decades, and diabetes mellitus is the ninth major cause of death. About 1 in 11 adults worldwide now have diabetes mellitus, 90% of whom have type 2 diabetes mellitus (T2DM). Asia is a major area of the rapidly emerging T2DM global epidemic, with China and India the top two epicentres. Although genetic predisposition partly determines individual susceptibility to T2DM, an unhealthy diet and a sedentary lifestyle are important drivers of the current global epidemic; early developmental factors (such as intrauterine exposures) also have a role in susceptibility to T2DM later in life. Many cases of T2DM could be prevented with lifestyle changes, including maintaining a healthy body weight, consuming a healthy diet, staying physically active, not smoking and drinking alcohol in moderation. Most patients with T2DM have at least one complication, and cardiovascular complications are the leading cause of morbidity and mortality in these patients. This Review provides an updated view of the global epidemiology of T2DM, as well as dietary, lifestyle and other risk factors for T2DM and its complications.

2,763 citations

Journal Article
TL;DR: This review covers the remarkable range of discoveries that GWASs have facilitated in population and complex-trait genetics, the biology of diseases, and translation toward new therapeutics.
Abstract: Application of the experimental design of genome-wide association studies (GWASs) is now 10 years old (young), and here we review the remarkable range of discoveries it has facilitated in population and complex-trait genetics, the biology of diseases, and translation toward new therapeutics. We predict the likely discoveries in the next 10 years, when GWASs will be based on millions of samples with array data imputed to a large fully sequenced reference panel and on hundreds of thousands of samples with whole-genome sequencing data.

2,669 citations


Cites background or result from "The genetic architecture of type 2 ..."

  • ...For others, such as RREB1 (MIM: 602209), identification of T2D-associated coding variants, statistically independent of the original GWAS signal, flags the likely effector transcripts.(74) All in all, it is possible to point to a compelling effector transcript at around one-third of the 100 T2D loci identified by GWASs....

    [...]

  • ...Recent efforts to extend GWASs beyond array-based genotyping and to access a broader range of variants through sequencing (particularly those of lower frequency) have revealed that most genetic variation influencing T2D appears to reside at common variant sites.(74,77) This chimes with the view of T2D as a largely post-reproductive trait and is consistent with a failure to detect compelling empirical evidence that T2D risk alleles have been subject to marked purifying selection....

    [...]

Journal Article
TL;DR: Genome-wide polygenic risk scores derived from GWAS data for five common diseases can identify subgroups of the population with risk approaching or exceeding that of a monogenic mutation.
Abstract: A key public health need is to identify individuals at high risk for a given disease to enable enhanced screening or preventive therapies. Because most common diseases have a genetic component, one important approach is to stratify individuals based on inherited DNA variation.(1) Proposed clinical applications have largely focused on finding carriers of rare monogenic mutations at several-fold increased risk. Although most disease risk is polygenic in nature,(2-5) it has not yet been possible to use polygenic predictors to identify individuals at risk comparable to monogenic mutations. Here, we develop and validate genome-wide polygenic scores for five common diseases. The approach identifies 8.0, 6.1, 3.5, 3.2, and 1.5% of the population at greater than threefold increased risk for coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer, respectively. For coronary artery disease, this prevalence is 20-fold higher than the carrier frequency of rare monogenic mutations conferring comparable risk.(6) We propose that it is time to contemplate the inclusion of polygenic risk prediction in clinical care, and discuss relevant issues.
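Note: a polygenic score of the kind this abstract refers to is commonly computed as a weighted sum of risk-allele dosages across variants. The Python sketch below shows only that core arithmetic. The function name, the weights, and the genotypes are hypothetical illustrations, not values or methods taken from the paper, which does not specify its scoring procedure in the abstract.

```python
def polygenic_score(dosages, weights):
    """Return the weighted sum of risk-allele dosages (0, 1 or 2 per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-variant effect sizes (log odds ratios); not from the paper.
weights = [0.10, 0.05, -0.02]
# Hypothetical risk-allele counts for one individual at the same three variants.
dosages = [2, 1, 0]

score = polygenic_score(dosages, weights)
print(round(score, 2))
```

In practice such scores are usually standardised against a reference population before individuals are placed in risk strata; that step is omitted here.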

1,962 citations

Journal Article
Anubha Mahajan, Daniel Taliun, Matthias Thurner, Neil R. Robertson, Jason M. Torres, N. William Rayner, Anthony Payne, Valgerdur Steinthorsdottir, Robert A. Scott, Niels Grarup, James P. Cook, Ellen M. Schmidt, Matthias Wuttke, Chloé Sarnowski, Reedik Mägi, Jana Nano, Christian Gieger, Stella Trompet, Cécile Lecoeur, Michael Preuss, Bram P. Prins, Xiuqing Guo, Lawrence F. Bielak, Jennifer E. Below, Donald W. Bowden, John C. Chambers, Young-Jin Kim, Maggie C.Y. Ng, Lauren E. Petty, Xueling Sim, Weihua Zhang, Amanda J. Bennett, Jette Bork-Jensen, Chad M. Brummett, Mickaël Canouil, Kai-Uwe Eckardt, Krista Fischer, Sharon L.R. Kardia, Florian Kronenberg, Kristi Läll, Ching-Ti Liu, Adam E. Locke, Jian'an Luan, Ioanna Ntalla, Vibe Nylander, Sebastian Schönherr, Claudia Schurmann, Loic Yengo, Erwin P. Bottinger, Ivan Brandslund, Cramer Christensen, George Dedoussis, Jose C. Florez, Ian Ford, Oscar H. Franco, Timothy M. Frayling, Vilmantas Giedraitis, Sophie Hackinger, Andrew T. Hattersley, Christian Herder, M. Arfan Ikram, Martin Ingelsson, Marit E. Jørgensen, Torben Jørgensen, Jennifer Kriebel, Johanna Kuusisto, Symen Ligthart, Cecilia M. Lindgren, Allan Linneberg, Valeriya Lyssenko, Vasiliki Mamakou, Thomas Meitinger, Karen L. Mohlke, Andrew D. Morris, Girish N. Nadkarni, James S. Pankow, Annette Peters, Naveed Sattar, Alena Stančáková, Konstantin Strauch, Kent D. Taylor, Barbara Thorand, Gudmar Thorleifsson, Unnur Thorsteinsdottir, Jaakko Tuomilehto, Daniel R. Witte, Josée Dupuis, Patricia A. Peyser, Eleftheria Zeggini, Ruth J. F. Loos, Philippe Froguel, Erik Ingelsson, Lars Lind, Leif Groop, Markku Laakso, Francis S. Collins, J. Wouter Jukema, Colin N. A. Palmer, Harald Grallert, Andres Metspalu, Abbas Dehghan, Anna Köttgen, Gonçalo R. Abecasis, James B. Meigs, Jerome I. Rotter, Jonathan Marchini, Oluf Pedersen, Torben Hansen, Claudia Langenberg, Nicholas J. Wareham, Kari Stefansson, Anna L. Gloyn, Andrew P. Morris, Michael Boehnke, Mark I. McCarthy
TL;DR: Combining 32 genome-wide association studies with high-density imputation provides a comprehensive view of the genetic contribution to type 2 diabetes in individuals of European ancestry with respect to locus discovery, causal-variant resolution, and mechanistic insight.
Abstract: We expanded GWAS discovery for type 2 diabetes (T2D) by combining data from 898,130 European-descent individuals (9% cases), after imputation to high-density reference panels. With these data, we (i) extend the inventory of T2D-risk variants (243 loci, 135 newly implicated in T2D predisposition, comprising 403 distinct association signals); (ii) enrich discovery of lower-frequency risk alleles (80 index variants with minor allele frequency <5%, 14 with estimated allelic odds ratio >2); (iii) substantially improve fine-mapping of causal variants (at 51 signals, one variant accounted for >80% posterior probability of association (PPA)); (iv) extend fine-mapping through integration of tissue-specific epigenomic information (islet regulatory annotations extend the number of variants with PPA >80% to 73); (v) highlight validated therapeutic targets (18 genes with associations attributable to coding variants); and (vi) demonstrate enhanced potential for clinical translation (genome-wide chip heritability explains 18% of T2D risk; individuals in the extremes of a T2D polygenic risk score differ more than ninefold in prevalence).
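Note: the posterior probability of association (PPA) mentioned in this abstract is often obtained by normalising per-variant Bayes factors within a signal, under the assumption that exactly one variant in the signal is causal and that all variants are equally likely a priori. The Python sketch below illustrates that common approach together with a simple credible-set construction. It is an assumption about the general technique, not the paper's own procedure, and the Bayes factors and function names are hypothetical.

```python
def posterior_probabilities(bayes_factors):
    """Normalise Bayes factors to PPAs (single causal variant, flat prior assumed)."""
    total = sum(bayes_factors)
    return [bf / total for bf in bayes_factors]


def credible_set(ppas, coverage=0.99):
    """Return indices of the smallest set of variants whose PPAs reach `coverage`."""
    order = sorted(range(len(ppas)), key=lambda i: ppas[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += ppas[i]
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical Bayes factors for four variants at one association signal.
bayes_factors = [850.0, 100.0, 40.0, 10.0]
ppas = posterior_probabilities(bayes_factors)
print(ppas)
print(credible_set(ppas, coverage=0.9))
```

A signal where one variant exceeds 80% PPA, as in the hypothetical data above, corresponds to the well-resolved cases counted in the abstract.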

1,136 citations

References
Journal Article
TL;DR: Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net
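Note: backward search with the Burrows–Wheeler Transform, the technique named in this abstract, can be illustrated with a toy exact-match index. The Python sketch below builds the BWT naively by sorting suffixes and counts occurrences by scanning, so it is suitable only for short strings. It handles exact matches only, whereas BWA also allows mismatches and gaps; the function names are hypothetical and none of this is BWA's actual code.

```python
def build_index(text):
    """Build a suffix array, BWT string and C table for `text` (naive, small inputs)."""
    text += "$"  # sentinel, assumed to sort before every other character
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each sorted suffix
    c, total = {}, 0
    for ch in sorted(set(text)):
        c[ch] = total  # number of characters in text strictly smaller than ch
        total += text.count(ch)
    return sa, bwt, c


def occ(bwt, ch, i):
    """Count occurrences of ch in bwt[:i] (a real index would precompute this)."""
    return bwt[:i].count(ch)


def backward_search(pattern, sa, bwt, c):
    """Return sorted start positions of exact matches of `pattern` in the text."""
    lo, hi = 0, len(bwt)  # half-open interval of suffix-array rows
    for ch in reversed(pattern):  # consume the pattern from its last character
        if ch not in c:
            return []
        lo = c[ch] + occ(bwt, ch, lo)
        hi = c[ch] + occ(bwt, ch, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, bwt, c = build_index("ACGTACGT")
print(backward_search("ACG", sa, bwt, c))
```

Each pattern character narrows the interval of matching suffixes, so the search cost depends on the pattern length rather than on the length of the reference.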

43,862 citations

Journal Article
TL;DR: The Gene Set Enrichment Analysis (GSEA) method interprets gene expression data by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
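Note: the core of GSEA is a running-sum enrichment score computed over a ranked gene list. The Python sketch below shows a simplified, unweighted form of that statistic: the running sum rises at genes in the set, falls at genes outside it, and the score is the maximum deviation from zero. The published method additionally weights hits by their ranking metric and assesses significance by permutation, both of which are omitted here; the gene names and function name are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Return the maximum deviation from zero of an unweighted running sum."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one gene inside and one outside the set")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up for a set member, step down otherwise; the steps sum to zero overall.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list (most to least correlated with a phenotype).
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranked, {"g1", "g2"}))
print(enrichment_score(ranked, {"g5", "g6"}))
```

A set concentrated at the top of the ranking gives a score near +1, and one concentrated at the bottom gives a score near -1.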

34,830 citations

Journal Article
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations

Journal Article
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article
TL;DR: A unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs is presented.
Abstract: Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets.

10,056 citations