Author

Gonçalo R. Abecasis

Other affiliations: Johns Hopkins University School of Medicine, Wellcome Trust Centre for Human Genetics, University of California, Los Angeles ...

Bio: Gonçalo R. Abecasis is an academic researcher at the University of Michigan. He has contributed to research on the topics of genome-wide association studies and population. He has an h-index of 179 and has co-authored 595 publications that have received 230,323 citations. His previous affiliations include Johns Hopkins University School of Medicine and the Wellcome Trust Centre for Human Genetics.

Papers published on a yearly basis: chart covering 2000 to 2023

Papers

Journal Article•DOI•

Detecting and Estimating Contamination of Human DNA Samples in Sequencing and Array-Based Genotype Data

Goo Jun¹, Matthew Flickinger¹, Kurt N. Hetrick², Jane Romm², Kimberly F. Doheny², Gonçalo R. Abecasis¹, Michael Boehnke¹, Hyun Min Kang¹•Institutions (2)

University of Michigan¹, Johns Hopkins University²

02 Nov 2012-American Journal of Human Genetics

TL;DR: Through a combination of analysis of in silico and experimentally contaminated samples, it is shown that the methods described can reliably detect and estimate levels of contamination as low as 1%.

Abstract: DNA sample contamination is a serious problem in DNA sequencing studies and may result in systematic genotype misclassification and false positive associations. Although methods exist to detect and filter out cross-species contamination, few methods to detect within-species sample contamination are available. In this paper, we describe methods to identify within-species DNA sample contamination based on (1) a combination of sequencing reads and array-based genotype data, (2) sequence reads alone, and (3) array-based genotype data alone. Analysis of sequencing reads allows contamination detection after sequence data is generated but prior to variant calling; analysis of array-based genotype data allows contamination detection prior to generation of costly sequence data. Through a combination of analysis of in silico and experimentally contaminated samples, we show that our methods can reliably detect and estimate levels of contamination as low as 1%. We evaluate the impact of DNA contamination on genotype accuracy and propose effective strategies to screen for and prevent DNA contamination in sequencing studies.

460 citations

Journal Article•DOI•

Novel Loci for Adiponectin Levels and Their Influence on Type 2 Diabetes and Metabolic Traits: A Multi-Ethnic Meta-Analysis of 45,891 Individuals

Zari Dastani¹, Hivert M-F.², Hivert M-F.³, N J Timpson⁴ +615 more•Institutions (128)

29 Mar 2012-PLOS Genetics

TL;DR: A meta-analysis of genome-wide association studies in 39,883 individuals of European ancestry identifies novel genetic determinants of adiponectin levels, which, taken together, influence risk of T2D and markers of insulin resistance.

Abstract: Circulating levels of adiponectin, a hormone produced predominantly by adipocytes, are highly heritable and are inversely associated with type 2 diabetes mellitus (T2D) and other metabolic traits. We conducted a meta-analysis of genome-wide association studies in 39,883 individuals of European ancestry to identify genes associated with metabolic disease. We identified 8 novel loci associated with adiponectin levels and confirmed 2 previously reported loci (P = 4.5×10⁻⁸ to 1.2×10⁻⁴³). Using a novel method to combine data across ethnicities (N = 4,232 African Americans, N = 1,776 Asians, and N = 29,347 Europeans), we identified two additional novel loci. Expression analyses of 436 human adipocyte samples revealed that mRNA levels of 18 genes at candidate regions were associated with adiponectin concentrations after accounting for multiple testing (p < 3×10⁻⁴). We next developed a multi-SNP genotypic risk score to test the association of adiponectin-decreasing risk alleles with metabolic traits and diseases using consortia-level meta-analytic data. This risk score was associated with increased risk of T2D (p = 4.3×10⁻³, n = 22,044), increased triglycerides (p = 2.6×10⁻¹⁴, n = 93,440), increased waist-to-hip ratio (p = 1.8×10⁻⁵, n = 77,167), increased glucose two hours post oral glucose tolerance testing (p = 4.4×10⁻³, n = 15,234), increased fasting insulin (p = 0.015, n = 48,238), but with lower HDL-cholesterol concentrations (p = 4.5×10⁻¹³, n = 96,748) and decreased BMI (p = 1.4×10⁻⁴, n = 121,335). These findings identify novel genetic determinants of adiponectin levels, which, taken together, influence risk of T2D and markers of insulin resistance.

456 citations

Journal Article•DOI•

minimac2: faster genotype imputation

Christian Fuchsberger¹, Gonçalo R. Abecasis¹, David A. Hinds¹•Institutions (1)

University of Michigan¹

01 Mar 2015-Bioinformatics

TL;DR: This work demonstrates how the application of software engineering techniques can help to keep imputation broadly accessible and speed up imputation by an order of magnitude compared with the previous implementation.

Abstract: Summary: Genotype imputation is a key step in the analysis of genome-wide association studies. Upcoming very large reference panels, such as those from The 1000 Genomes Project and the Haplotype Consortium, will improve imputation quality of rare and less common variants, but will also increase the computational burden. Here, we demonstrate how the application of software engineering techniques can help to keep imputation broadly accessible. Overall, these improvements speed up imputation by an order of magnitude compared with our previous implementation. Availability and implementation: minimac2, including source code, documentation, and examples, is available at http://genome.sph.umich.edu/wiki/Minimac2. Contact: cfuchsb@umich.edu, goncalo@umich.edu

454 citations

Journal Article•DOI•

Genetics of blood lipids among ~300,000 multi-ethnic participants of the Million Veteran Program.

Derek Klarin¹, Derek Klarin², Scott M. Damrauer³, Scott M. Damrauer⁴, Kelly Cho⁵, Yan V. Sun⁶, Tanya M. Teslovich, Jacqueline Honerlaw⁵, David R. Gagnon⁷, David R. Gagnon⁵, Scott L. DuVall⁸, Jin Li⁹, Jin Li¹⁰, Gina M. Peloso⁷, Mark Chaffin², Aeron Small¹¹, Aeron Small⁴, Jie Huang⁵, Hua Tang¹⁰, Julie A. Lynch¹², Yuk-Lam Ho⁵, Dajiang J. Liu¹³, Connor A. Emdin¹, Connor A. Emdin², Alexander H. Li, Jennifer E. Huffman⁵, Jennifer Lee⁹, Jennifer Lee¹⁰, Pradeep Natarajan², Pradeep Natarajan¹, Rajiv Chowdhury¹⁴, Danish Saleheen³, Danish Saleheen⁴, Marijana Vujkovic³, Marijana Vujkovic⁴, Aris Baras, Saiju Pyarajan⁵, Saiju Pyarajan¹⁵, Emanuele Di Angelantonio¹⁴, Benjamin M. Neale¹, Benjamin M. Neale², Aliya Naheed, Amit Khera¹, Amit Khera², John Danesh¹⁴, Kyong-Mi Chang⁴, Kyong-Mi Chang³, Gonçalo R. Abecasis¹⁶, Cristen J. Willer¹⁶, Frederick E. Dewey, David J. Carey¹⁷, VA Million Veteran Program⁹, VA Million Veteran Program¹⁰, John Concato¹¹, J. Michael Gaziano¹⁵, J. Michael Gaziano⁵, J. Michael Gaziano¹, Christopher J. O'Donnell¹, Christopher J. O'Donnell⁵, Philip S. Tsao⁹, Philip S. Tsao¹⁰, Sekar Kathiresan², Sekar Kathiresan¹, Daniel J. Rader, Peter W.F. Wilson⁶, Peter W.F. Wilson⁴, Themistocles L. Assimes¹⁰, Themistocles L. Assimes⁹•Institutions (17)

Harvard University¹, Broad Institute², University of Pennsylvania³, Veterans Health Administration⁴, VA Boston Healthcare System⁵, Emory University⁶, Boston University⁷, University of Utah⁸, VA Palo Alto Healthcare System⁹, Stanford University¹⁰, Yale University¹¹, University of Massachusetts Amherst¹², Pennsylvania State University¹³, University of Cambridge¹⁴, Brigham and Women's Hospital¹⁵, University of Michigan¹⁶, Geisinger Health System¹⁷

01 Oct 2018-Nature Genetics

TL;DR: Analysis of genetic data and blood lipid measurements from over 300,000 participants in the Million Veteran Program identifies new associations for blood lipid traits and proposes novel indications for pharmaceutical inhibitors targeting PCSK9 (abdominal aortic aneurysm), ANGPTL4 (type 2 diabetes) and PDE3B (triglycerides and coronary disease).

Abstract: The Million Veteran Program (MVP) was established in 2011 as a national research initiative to determine how genetic variation influences the health of US military veterans. Here we genotyped 312,571 MVP participants using a custom biobank array and linked the genetic data to laboratory and clinical phenotypes extracted from electronic health records covering a median of 10.0 years of follow-up. Among 297,626 veterans with at least one blood lipid measurement, including 57,332 black and 24,743 Hispanic participants, we tested up to around 32 million variants for association with lipid levels and identified 118 novel genome-wide significant loci after meta-analysis with data from the Global Lipids Genetics Consortium (total n > 600,000). Through a focus on mutations predicted to result in a loss of gene function and a phenome-wide association study, we propose novel indications for pharmaceutical inhibitors targeting PCSK9 (abdominal aortic aneurysm), ANGPTL4 (type 2 diabetes) and PDE3B (triglycerides and coronary disease). Analysis of genetic data and blood lipid measurements from over 300,000 participants in the Million Veteran Program identifies new associations for blood lipid traits.

447 citations

Journal Article•DOI•

Biobank-driven genomic discovery yields new insight into atrial fibrillation biology

Jonas B. Nielsen¹, Rosa B. Thorolfsdottir², Rosa B. Thorolfsdottir³, Lars G. Fritsche, Weichen Zhou¹, Morten W. Skov⁴, Sarah E. Graham¹, Todd J. Herron¹, Shane McCarthy, Ellen M. Schmidt¹, Gardar Sveinbjornsson³, Ida Surakka¹, Michael R. Mathis¹, Masatoshi Yamazaki⁵, Ryan D. Crawford¹, Maiken Elvestad Gabrielsen⁶, Anne Heidi Skogholt⁶, Oddgeir L. Holmen⁶, Maoxuan Lin¹, Brooke N. Wolford¹, Rounak Dey¹, Håvard Dalen⁷, Håvard Dalen⁶, Patrick Sulem³, Jonathan H. Chung, Joshua D. Backman, David O. Arnar², David O. Arnar³, Unnur Thorsteinsdottir³, Unnur Thorsteinsdottir², Aris Baras, Colm O'Dushlaine, Anders G. Holst⁴, Xiaoquan Wen¹, Whitney E. Hornsby¹, Frederick E. Dewey, Michael Boehnke¹, Sachin Kheterpal¹, Bhramar Mukherjee¹, Seunggeun Lee¹, Hyun Min Kang¹, Hilma Holm³, Jacob O. Kitzman¹, Jordan A. Shavit¹, José Jalife⁸, José Jalife¹, Chad M. Brummett¹, Tanya M. Teslovich, David J. Carey⁹, Daniel F. Gudbjartsson², Daniel F. Gudbjartsson³, Kari Stefansson², Kari Stefansson³, Gonçalo R. Abecasis⁶, Kristian Hveem⁷, Kristian Hveem⁶, Cristen J. Willer¹•Institutions (9)

University of Michigan¹, University of Iceland², Amgen³, Copenhagen University Hospital⁴, University of Tokyo⁵, Norwegian University of Science and Technology⁶, Nord-Trøndelag Hospital Trust⁷, Centro Nacional de Investigaciones Cardiovasculares⁸, Geisinger Health System⁹

30 Jul 2018-Nature Genetics

TL;DR: It is suggested that many of the putative atrial fibrillation genes act via cardiac structural remodeling, potentially in the form of an 'atrial cardiomyopathy'², either during fetal heart development or as a response to stress in the adult heart.

Abstract: To identify genetic variation underlying atrial fibrillation, the most common cardiac arrhythmia, we performed a genome-wide association study of >1,000,000 people, including 60,620 atrial fibrillation cases and 970,216 controls. We identified 142 independent risk variants at 111 loci and prioritized 151 functional candidate genes likely to be involved in atrial fibrillation. Many of the identified risk variants fall near genes where more deleterious mutations have been reported to cause serious heart defects in humans (GATA4, MYH6, NKX2-5, PITX2, TBX5)¹, or near genes important for striated muscle function and integrity (for example, CFL2, MYH7, PKP2, RBM20, SGCG, SSPN). Pathway and functional enrichment analyses also suggested that many of the putative atrial fibrillation genes act via cardiac structural remodeling, potentially in the form of an 'atrial cardiomyopathy'², either during fetal heart development or as a response to stress in the adult heart.

447 citations

Cited by

Journal Article•DOI•

Fast and accurate short read alignment with Burrows–Wheeler transform

Heng Li¹, Richard Durbin¹•Institutions (1)

Wellcome Trust Sanger Institute¹

01 Jul 2009-Bioinformatics

TL;DR: Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.

Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: [email protected]

43,862 citations

Journal Article•DOI•

Fast gapped-read alignment with Bowtie 2

Ben Langmead¹, Steven L. Salzberg², Steven L. Salzberg¹, Steven L. Salzberg³•Institutions (3)

University of Maryland, College Park¹, Johns Hopkins University², Johns Hopkins University School of Medicine³

01 Apr 2012-Nature Methods

TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations

Journal Article•DOI•

PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses

Shaun Purcell¹, Shaun Purcell², Benjamin M. Neale³, Benjamin M. Neale¹, Kathe Todd-Brown², Lori Thomas², Manuel A. R. Ferreira², David Bender¹, David Bender², Julian Maller¹, Julian Maller², Pamela Sklar¹, Pamela Sklar², Paul I.W. de Bakker², Paul I.W. de Bakker¹, Mark J. Daly¹, Mark J. Daly², Pak C. Sham⁴•Institutions (4)

Massachusetts Institute of Technology¹, Harvard University², University of London³, University of Hong Kong⁴

01 Sep 2007-American Journal of Human Genetics

TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation, with a focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.

Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations

Journal Article•DOI•

Initial sequencing and analysis of the human genome.

Eric S. Lander¹, Lauren Linton¹, Bruce W. Birren¹, Chad Nusbaum¹ +245 more•Institutions (29)

15 Feb 2001-Nature

TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.

Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article•DOI•

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Aaron McKenna¹, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark J. Daly, Mark A. DePristo•Institutions (1)

Broad Institute¹

01 Sep 2010-Genome Research

TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations
