The genetic architecture of type 2 diabetes

Christian Fuchsberger, Jason Flannick, Tanya M. Teslovich, Anubha Mahajan +297 more

01 Jan 2016

TL;DR: Large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes; the associated variants were overwhelmingly common, and most fell within regions previously identified by genome-wide association studies.

Abstract: The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.

Citations

Journal Article

The UK Biobank resource with deep phenotyping and genomic data

Clare Bycroft¹, Colin Freeman¹, Desislava Petkova¹,², Gavin Band¹, Lloyd T. Elliott¹, Kevin Sharp¹, Allan Motyer³, Damjan Vukcevic³, Olivier Delaneau⁴,⁵, Jared O'Connell⁶, Adrian Cortes¹,⁷, Samantha Welsh, Alan Young¹, Mark Effingham, Gil McVean¹, Stephen Leslie³, Naomi E. Allen¹, Peter Donnelly¹, Jonathan Marchini¹

University of Oxford¹, Procter & Gamble², University of Melbourne³, University of Geneva⁴, Swiss Institute of Bioinformatics⁵, Illumina⁶, John Radcliffe Hospital⁷

11 Oct 2018-Nature

TL;DR: Deep phenotype and genome-wide genetic data from 500,000 individuals in the UK Biobank are described, including population structure and relatedness in the cohort and imputation that increases the number of testable variants to around 96 million.

Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

4,489 citations

Journal Article

Global aetiology and epidemiology of type 2 diabetes mellitus and its complications.

Yan Zheng¹, Sylvia H. Ley², Frank B. Hu²,³

Fudan University¹, Harvard University², Brigham and Women's Hospital³

01 Feb 2018-Nature Reviews Endocrinology

TL;DR: This Review provides an updated view of the global epidemiology of type 2 diabetes mellitus (T2DM), as well as dietary, lifestyle and other risk factors for T2DM and its complications.

Abstract: Globally, the number of people with diabetes mellitus has quadrupled in the past three decades, and diabetes mellitus is the ninth major cause of death. About 1 in 11 adults worldwide now have diabetes mellitus, 90% of whom have type 2 diabetes mellitus (T2DM). Asia is a major area of the rapidly emerging T2DM global epidemic, with China and India the top two epicentres. Although genetic predisposition partly determines individual susceptibility to T2DM, an unhealthy diet and a sedentary lifestyle are important drivers of the current global epidemic; early developmental factors (such as intrauterine exposures) also have a role in susceptibility to T2DM later in life. Many cases of T2DM could be prevented with lifestyle changes, including maintaining a healthy body weight, consuming a healthy diet, staying physically active, not smoking and drinking alcohol in moderation. Most patients with T2DM have at least one complication, and cardiovascular complications are the leading cause of morbidity and mortality in these patients. This Review provides an updated view of the global epidemiology of T2DM, as well as dietary, lifestyle and other risk factors for T2DM and its complications.

2,763 citations

Journal Article

10 Years of GWAS Discovery: Biology, Function, and Translation

Peter M. Visscher¹, Naomi R. Wray¹, Qian Zhang¹, Pamela Sklar², Mark I. McCarthy³, Matthew A. Brown⁴, Jian Yang¹

University of Queensland¹, Icahn School of Medicine at Mount Sinai², Wellcome Trust Centre for Human Genetics³, Queensland University of Technology⁴

06 Jul 2017-American Journal of Human Genetics

TL;DR: The remarkable range of discoveries that GWASs have facilitated in population and complex-trait genetics, the biology of diseases, and translation toward new therapeutics is reviewed.

Abstract: Application of the experimental design of genome-wide association studies (GWASs) is now 10 years old (young), and here we review the remarkable range of discoveries it has facilitated in population and complex-trait genetics, the biology of diseases, and translation toward new therapeutics. We predict the likely discoveries in the next 10 years, when GWASs will be based on millions of samples with array data imputed to a large fully sequenced reference panel and on hundreds of thousands of samples with whole-genome sequencing data.

2,669 citations

Cites background or result from "The genetic architecture of type 2 ..."

...For others, such as RREB1 (MIM: 602209), identification of T2D-associated coding variants, statistically independent of the original GWAS signal, flags the likely effector transcripts.(74) All in all, it is possible to point to a compelling effector transcript at around one-third of the 100 T2D loci identified by GWASs....
[...]
...Recent efforts to extend GWASs beyond array-based genotyping and to access a broader range of variants through sequencing (particularly those of lower frequency) have revealed that most genetic variation influencing T2D appears to reside at common variant sites.(74,77) This chimes with the view of T2D as a largely post-reproductive trait and is consistent with a failure to detect compelling empirical evidence that T2D risk alleles have been subject to marked purifying selection....
[...]

Journal Article

Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations

Amit Khera¹, Mark Chaffin², Krishna G. Aragam, Mary E. Haas², Carolina Roselli², Seung Hoan Choi², Pradeep Natarajan¹, Eric S. Lander², Steven A. Lubitz¹,², Patrick T. Ellinor¹,², Sekar Kathiresan

Harvard University¹, Broad Institute²

13 Aug 2018-Nature Genetics

TL;DR: Genome-wide polygenic risk scores derived from GWAS data for five common diseases can identify subgroups of the population with risk approaching or exceeding that of a monogenic mutation.

Abstract: A key public health need is to identify individuals at high risk for a given disease to enable enhanced screening or preventive therapies. Because most common diseases have a genetic component, one important approach is to stratify individuals based on inherited DNA variation.(1) Proposed clinical applications have largely focused on finding carriers of rare monogenic mutations at several-fold increased risk. Although most disease risk is polygenic in nature,(2-5) it has not yet been possible to use polygenic predictors to identify individuals at risk comparable to monogenic mutations. Here, we develop and validate genome-wide polygenic scores for five common diseases. The approach identifies 8.0, 6.1, 3.5, 3.2, and 1.5% of the population at greater than threefold increased risk for coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer, respectively. For coronary artery disease, this prevalence is 20-fold higher than the carrier frequency of rare monogenic mutations conferring comparable risk.(6) We propose that it is time to contemplate the inclusion of polygenic risk prediction in clinical care, and discuss relevant issues.

1,962 citations

Journal Article

Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps.

Anubha Mahajan¹, Daniel Taliun², Matthias Thurner¹, Neil R. Robertson¹, Jason M. Torres¹, N. William Rayner¹,³, Anthony Payne¹, Valgerdur Steinthorsdottir⁴, Robert A. Scott⁵, Niels Grarup⁶, James P. Cook⁷, Ellen M. Schmidt², Matthias Wuttke⁸, Chloé Sarnowski⁹, Reedik Mägi¹⁰, Jana Nano¹¹, Christian Gieger, Stella Trompet¹², Cécile Lecoeur¹³, Michael Preuss¹⁴, Bram P. Prins³, Xiuqing Guo¹⁵, Lawrence F. Bielak², Jennifer E. Below¹⁶, Donald W. Bowden¹⁷, John C. Chambers, Young-Jin Kim, Maggie C.Y. Ng¹⁷, Lauren E. Petty¹⁶, Xueling Sim¹⁸, Weihua Zhang¹⁹,²⁰, Amanda J. Bennett¹, Jette Bork-Jensen⁶, Chad M. Brummett², Mickaël Canouil¹³, Kai-Uwe Eckardt²¹, Krista Fischer¹⁰, Sharon L.R. Kardia², Florian Kronenberg²², Kristi Läll¹⁰, Ching-Ti Liu⁹, Adam E. Locke²³, Jian'an Luan⁵, Ioanna Ntalla²⁴, Vibe Nylander¹, Sebastian Schönherr²², Claudia Schurmann¹⁴, Loic Yengo¹³, Erwin P. Bottinger¹⁴, Ivan Brandslund²⁵, Cramer Christensen, George Dedoussis²⁶, Jose C. Florez, Ian Ford²⁷, Oscar H. Franco¹¹, Timothy M. Frayling²⁸, Vilmantas Giedraitis²⁹, Sophie Hackinger³, Andrew T. Hattersley²⁸, Christian Herder³⁰, M. Arfan Ikram¹¹, Martin Ingelsson²⁹, Marit E. Jørgensen²⁵,³¹, Torben Jørgensen⁶,³², Jennifer Kriebel, Johanna Kuusisto³³, Symen Ligthart¹¹, Cecilia M. Lindgren¹,³⁴, Allan Linneberg⁶,³⁵, Valeriya Lyssenko³⁶,³⁷, Vasiliki Mamakou²⁶, Thomas Meitinger³⁸, Karen L. Mohlke³⁹, Andrew D. Morris⁴⁰,⁴¹, Girish N. Nadkarni¹⁴, James S. Pankow⁴², Annette Peters, Naveed Sattar⁴³, Alena Stančáková³³, Konstantin Strauch⁴⁴, Kent D. Taylor¹⁵, Barbara Thorand, Gudmar Thorleifsson⁴, Unnur Thorsteinsdottir⁴,⁴⁵, Jaakko Tuomilehto, Daniel R. Witte⁴⁶, Josée Dupuis⁹, Patricia A. Peyser², Eleftheria Zeggini³, Ruth J. F. Loos¹⁴, Philippe Froguel¹³,²⁰, Erik Ingelsson⁴⁷,⁴⁸, Lars Lind²⁹, Leif Groop³⁷,⁴⁹, Markku Laakso³³, Francis S. Collins⁵⁰, J. Wouter Jukema¹², Colin N. A. Palmer⁵¹, Harald Grallert, Andres Metspalu¹⁰, Abbas Dehghan¹¹,²⁰, Anna Köttgen⁸, Gonçalo R. Abecasis², James B. Meigs⁵², Jerome I. Rotter¹⁵, Jonathan Marchini¹, Oluf Pedersen⁶, Torben Hansen⁶,²⁵, Claudia Langenberg⁵, Nicholas J. Wareham⁵, Kari Stefansson⁴,⁴⁵, Anna L. Gloyn¹, Andrew P. Morris¹,⁷,¹⁰, Michael Boehnke², Mark I. McCarthy¹

University of Oxford¹, University of Michigan², Wellcome Trust Sanger Institute³, Amgen⁴, University of Cambridge⁵, University of Copenhagen⁶, University of Liverpool⁷, University of Freiburg⁸, Boston University⁹, University of Tartu¹⁰, Erasmus University Medical Center¹¹, Leiden University Medical Center¹², Pasteur Institute¹³, Icahn School of Medicine at Mount Sinai¹⁴, UCLA Medical Center¹⁵, Vanderbilt University Medical Center¹⁶, Wake Forest University¹⁷, National University of Singapore¹⁸, London North West Healthcare NHS Trust¹⁹, Imperial College London²⁰, Charité²¹, Innsbruck Medical University²², Washington University in St. Louis²³, Queen Mary University of London²⁴, University of Southern Denmark²⁵, National and Kapodistrian University of Athens²⁶, Robertson Centre for Biostatistics²⁷, University of Exeter²⁸, Uppsala University²⁹, University of Düsseldorf³⁰, Steno Diabetes Center³¹, Aalborg University³², University of Eastern Finland³³, Broad Institute³⁴, Frederiksberg Hospital³⁵, University of Bergen³⁶, Lund University³⁷, Technische Universität München³⁸, University of North Carolina at Chapel Hill³⁹, Ninewells Hospital⁴⁰, University of Edinburgh⁴¹, University of Minnesota⁴², University of Glasgow⁴³, Ludwig Maximilian University of Munich⁴⁴, University of Iceland⁴⁵, Aarhus University⁴⁶, Stanford University⁴⁷, Science for Life Laboratory⁴⁸, University of Helsinki⁴⁹, National Institutes of Health⁵⁰, University of Dundee⁵¹, Harvard University⁵²

08 Oct 2018-Nature Genetics

TL;DR: Combining 32 genome-wide association studies with high-density imputation provides a comprehensive view of the genetic contribution to type 2 diabetes in individuals of European ancestry with respect to locus discovery, causal-variant resolution, and mechanistic insight.

Abstract: We expanded GWAS discovery for type 2 diabetes (T2D) by combining data from 898,130 European-descent individuals (9% cases), after imputation to high-density reference panels. With these data, we (i) extend the inventory of T2D-risk variants (243 loci, 135 newly implicated in T2D predisposition, comprising 403 distinct association signals); (ii) enrich discovery of lower-frequency risk alleles (80 index variants with minor allele frequency <5%, 14 with estimated allelic odds ratio >2); (iii) substantially improve fine-mapping of causal variants (at 51 signals, one variant accounted for >80% posterior probability of association (PPA)); (iv) extend fine-mapping through integration of tissue-specific epigenomic information (islet regulatory annotations extend the number of variants with PPA >80% to 73); (v) highlight validated therapeutic targets (18 genes with associations attributable to coding variants); and (vi) demonstrate enhanced potential for clinical translation (genome-wide chip heritability explains 18% of T2D risk; individuals in the extremes of a T2D polygenic risk score differ more than ninefold in prevalence).

1,136 citations

References

Journal Article

Fast and accurate short read alignment with Burrows–Wheeler transform

Heng Li¹, Richard Durbin¹

Wellcome Trust Sanger Institute¹

01 Jul 2009-Bioinformatics

TL;DR: The Burrows-Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT), is implemented to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.

Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net

43,862 citations

Journal Article

Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles

Aravind Subramanian¹, Pablo Tamayo¹, Vamsi K. Mootha², Sayan Mukherjee³, Benjamin L. Ebert², Michael A. Gillette², Amanda G. Paulovich⁴, Scott L. Pomeroy², Todd R. Golub², Eric S. Lander¹, Jill P. Mesirov¹

Massachusetts Institute of Technology¹, Harvard University², Duke University³, Fred Hutchinson Cancer Research Center⁴

25 Oct 2005-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: The Gene Set Enrichment Analysis (GSEA) method interprets gene expression data by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.

Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

34,830 citations

Journal Article

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Aaron McKenna¹, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark J. Daly, Mark A. DePristo

Broad Institute¹

01 Sep 2010-Genome Research

TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations

Journal Article

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article

A framework for variation discovery and genotyping using next-generation DNA sequencing data

Mark A. DePristo¹, Eric Banks¹, Ryan Poplin¹, Kiran V. Garimella¹, Jared Maguire¹, Christopher Hartl¹, Anthony A. Philippakis¹,²,³, Guillermo del Angel¹, Manuel A. Rivas¹,³, Matt Hanna¹, Aaron McKenna¹, Timothy Fennell¹, Andrew Kernytsky¹, Andrey Sivachenko¹, Kristian Cibulskis¹, Stacey Gabriel¹, David Altshuler¹,³, Mark J. Daly¹,³

Broad Institute¹, Brigham and Women's Hospital², Harvard University³

01 May 2011-Nature Genetics

TL;DR: A unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs is presented.

Abstract: Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets.

10,056 citations