Author

Gonçalo R. Abecasis

Bio: Gonçalo R. Abecasis is an academic researcher at the University of Michigan. He has contributed to research in topics including genome-wide association studies and population genetics. He has an h-index of 179 and has co-authored 595 publications receiving 230,323 citations. His previous affiliations include the Johns Hopkins University School of Medicine and the Wellcome Trust Centre for Human Genetics.


Papers
Journal ArticleDOI
TL;DR: Through computer simulation, it is shown that GSM correctly controls false-positive rates and improves power to detect true disease-predisposing variants; in simulated comparisons with genomic control, GSM yields greater power.
Abstract: Genome-wide association studies are helping to dissect the etiology of complex diseases. Although case-control association tests are generally more powerful than family-based association tests, population stratification can lead to spurious disease-marker association or mask a true association. Several methods have been proposed to match cases and controls prior to genotyping, using family information or epidemiological data, or using genotype data for a modest number of genetic markers. Here, we describe a genetic similarity score matching (GSM) method for efficient matched analysis of cases and controls in a genome-wide or large-scale candidate gene association study. GSM comprises three steps: (1) calculating similarity scores for pairs of individuals using the genotype data; (2) matching sets of cases and controls based on the similarity scores so that matched cases and controls have similar genetic background; and (3) using conditional logistic regression to perform association tests. Through computer simulation we show that GSM correctly controls false-positive rates and improves power to detect true disease predisposing variants. We compare GSM to genomic control using computer simulations, and find improved power using GSM. We suggest that initial matching of cases and controls prior to genotyping combined with careful re-matching after genotyping is a method of choice for genome-wide association studies. Genet. Epidemiol. 33:508–517, 2009. © 2009 Wiley-Liss, Inc.
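The first two GSM steps can be sketched in a few lines of Python. The similarity score below (average allele sharing over markers) and the greedy one-to-one matcher are simplified illustrations of the idea, not the paper's exact scoring or matching algorithm, and the conditional logistic regression step (3) is omitted:

```python
def similarity(g1, g2):
    # fraction of alleles shared at each marker, averaged over markers
    # (genotypes coded as 0/1/2 copies of the alternate allele)
    return sum(1.0 - abs(a - b) / 2.0 for a, b in zip(g1, g2)) / len(g1)

def greedy_match(cases, controls):
    # step 2 of GSM, sketched greedily: pair each case with its most
    # genetically similar not-yet-used control
    unused = set(range(len(controls)))
    pairs = []
    for i, case in enumerate(cases):
        best = max(unused, key=lambda j: similarity(case, controls[j]))
        unused.remove(best)
        pairs.append((i, best))
    return pairs

# toy genotype data: 2 cases, 3 candidate controls, 4 markers each
cases = [[0, 1, 2, 1], [2, 2, 0, 1]]
controls = [[0, 1, 2, 2], [2, 1, 0, 1], [1, 1, 1, 1]]
pairs = greedy_match(cases, controls)   # -> [(0, 0), (1, 1)]
```

The matched case-control pairs would then feed into conditional logistic regression, with each pair as a stratum.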

42 citations

Journal ArticleDOI
TL;DR: This work proposes methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses and demonstrates that, for moderate contamination levels, contamination-adjusted calls eliminate 48%-77% of the genotyping errors.
Abstract: DNA sample contamination is a frequent problem in DNA sequencing studies and can result in genotyping errors and reduced power for association testing. We recently described methods to identify within-species DNA sample contamination based on sequencing read data, showed that our methods can reliably detect and estimate contamination levels as low as 1%, and suggested strategies to identify and remove contaminated samples from sequencing studies. Here we propose methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses. We compare our contamination-adjusted calls to calls that ignore contamination and to calls based on uncontaminated data. We demonstrate that, for moderate contamination levels (5%–20%), contamination-adjusted calls eliminate 48%–77% of the genotyping errors. For lower levels of contamination, our contamination correction methods produce genotypes nearly as accurate as those based on uncontaminated data. Our contamination correction methods are useful generally, but are particularly helpful for sample contamination levels from 2% to 20%.
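The core idea of modelling contamination during genotype calling can be sketched as follows. This is a minimal illustration, not the paper's exact model: it assumes a fraction alpha of reads come from a contaminant whose alleles are drawn at a population frequency f, and scores each candidate genotype with a binomial likelihood:

```python
from math import comb

def p_alt(g, alpha, f, eps=0.01):
    # expected alternate-read fraction when a fraction alpha of reads come
    # from a contaminant with population alt-allele frequency f; g is the
    # true genotype (0/1/2 alt alleles), eps a per-base error rate
    p = (1 - alpha) * g / 2 + alpha * f
    return p * (1 - eps) + (1 - p) * eps

def genotype_likelihoods(n_alt, n_ref, alpha, f):
    # binomial likelihood of the read counts under each candidate genotype
    n = n_alt + n_ref
    return [comb(n, n_alt) * p_alt(g, alpha, f) ** n_alt
            * (1 - p_alt(g, alpha, f)) ** n_ref for g in (0, 1, 2)]

def call(n_alt, n_ref, alpha, f):
    lik = genotype_likelihoods(n_alt, n_ref, alpha, f)
    return max((0, 1, 2), key=lambda g: lik[g])

# 6 alternate reads out of 30: ignoring contamination the caller picks het,
# while modelling 15% contamination recovers the homozygous-reference call
naive = call(6, 24, alpha=0.0, f=0.5)
adjusted = call(6, 24, alpha=0.15, f=0.5)
```

With alpha = 0 the adjusted model reduces to an ordinary binomial genotype likelihood, which is how the two calls can be compared on the same data.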

41 citations

01 May 2017
TL;DR: CETP PTV carrier status was associated with reduced risk for CHD; compared with noncarriers, carriers of PTVs at CETP displayed higher high-density lipoprotein cholesterol, lower low-density lipoprotein cholesterol, lower triglycerides, and lower risk for CHD.
Abstract: Rationale: Therapies that inhibit CETP (cholesteryl ester transfer protein) have failed to demonstrate a reduction in risk for coronary heart disease (CHD). Human DNA sequence variants that truncate the CETP gene may provide insight into the efficacy of CETP inhibition. Objective: To test whether protein-truncating variants (PTVs) at the CETP gene were associated with plasma lipid levels and CHD. Methods and Results: We sequenced the exons of the CETP gene in 58 469 participants from 12 case–control studies (18 817 CHD cases, 39 652 CHD-free controls). We defined PTVs as those that lead to a premature stop, disrupt canonical splice sites, or lead to insertions/deletions that shift frame. We also genotyped 1 Japanese-specific PTV in 27 561 participants from 3 case–control studies (14 286 CHD cases, 13 275 CHD-free controls). We tested association of CETP PTV carrier status with both plasma lipids and CHD. Among 58 469 participants with CETP gene-sequencing data available, average age was 51.5 years and 43% were women; 1 in 975 participants carried a PTV at the CETP gene. Compared with noncarriers, carriers of PTV at CETP had higher high-density lipoprotein cholesterol (effect size, 22.6 mg/dL; 95% confidence interval, 18–27; P<1.0×10−4), lower low-density lipoprotein cholesterol (−12.2 mg/dL; 95% confidence interval, −23 to −0.98; P=0.033), and lower triglycerides (−6.3%; 95% confidence interval, −12 to −0.22; P=0.043). CETP PTV carrier status was associated with reduced risk for CHD (summary odds ratio, 0.70; 95% confidence interval, 0.54–0.90; P=5.1×10−3). Conclusions: Compared with noncarriers, carriers of PTV at CETP displayed higher high-density lipoprotein cholesterol, lower low-density lipoprotein cholesterol, lower triglycerides, and lower risk for CHD.
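The "summary odds ratio" across the 12 studies is the output of a meta-analysis. A common way to pool per-study odds ratios is inverse-variance-weighted fixed-effect meta-analysis on the log scale; the sketch below uses illustrative numbers, not the paper's per-study estimates, and does not claim this is the exact method the authors applied:

```python
from math import exp, log, sqrt

def fixed_effect_meta(odds_ratios, std_errs):
    # inverse-variance-weighted fixed-effect meta-analysis of log odds ratios;
    # std_errs are the standard errors of the per-study log odds ratios
    w = [1.0 / se ** 2 for se in std_errs]
    log_or = sum(wi * log(o) for wi, o in zip(w, odds_ratios)) / sum(w)
    se = sqrt(1.0 / sum(w))
    ci = (exp(log_or - 1.96 * se), exp(log_or + 1.96 * se))
    return exp(log_or), ci

# two equally precise studies with exactly opposite effects pool to OR = 1.0
or_pooled, ci = fixed_effect_meta([0.5, 2.0], [0.3, 0.3])
```

Pooling on the log scale keeps the combination symmetric: an OR of 0.5 and an OR of 2.0 carry equal and opposite evidence.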

40 citations

Journal ArticleDOI
TL;DR: Outlier genes were enriched for proximal rare variants, providing a new approach to studying large-effect regulatory variants and their relevance to traits, population history, and individual genetic risk.
Abstract: Genetic studies of complex traits have mainly identified associations with noncoding variants. To further determine the contribution of regulatory variation, we combined whole-genome and transcriptome data for 624 individuals from Sardinia to identify common and rare variants that influence gene expression and splicing. We identified 21,183 expression quantitative trait loci (eQTLs) and 6,768 splicing quantitative trait loci (sQTLs), including 619 new QTLs. We identified high-frequency QTLs and found evidence of selection near genes involved in malarial resistance and increased multiple sclerosis risk, reflecting the epidemiological history of Sardinia. Using family relationships, we identified 809 segregating expression outliers (median z score of 2.97), averaging 13.3 genes per individual. Outlier genes were enriched for proximal rare variants, providing a new approach to study large-effect regulatory variants and their relevance to traits. Our results provide insight into the effects of regulatory variants and their relationship to population history and individual genetic risk.
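An expression outlier in the sense above is an individual whose expression of a gene lies far in the tail of the population distribution. The sketch below is a simplified population-level z-score version: the paper additionally exploits family relationships, which this illustration omits, and the threshold here is arbitrary:

```python
from statistics import mean, stdev

def expression_outliers(expr, threshold=2.5):
    # flag (gene, individual, z) triples where |z| exceeds the threshold;
    # a plain population z-score, ignoring the family structure used
    # in the paper (which reported a median outlier z score of 2.97)
    out = []
    for gene, vals in expr.items():
        m, s = mean(vals), stdev(vals)
        for i, v in enumerate(vals):
            z = (v - m) / s
            if abs(z) > threshold:
                out.append((gene, i, round(z, 2)))
    return out

# toy data: individual 8 strongly over-expresses the (hypothetical) GENE1
expr = {"GENE1": [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 16.0]}
flagged = expression_outliers(expr)
```

Note that with small samples the outlier itself inflates the standard deviation, capping the attainable z score, which is one reason larger cohorts (or robust estimators) are needed in practice.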

38 citations

Journal ArticleDOI
01 Aug 2006-Genetics
TL;DR: A modified VC method is developed and implemented that directly models the nonnormal distribution using Gaussian copulas and yields unbiased parameter estimates, correct type I error rates, and improved power for testing linkage with a variety of nonnormal traits as compared with the standard VC and the regression-based methods.
Abstract: Mapping and identifying variants that influence quantitative traits is an important problem for genetic studies. Traditional QTL mapping relies on a variance-components (VC) approach with the key assumption that the trait values in a family follow a multivariate normal distribution. Violation of this assumption can lead to inflated type I error, reduced power, and biased parameter estimates. To accommodate nonnormally distributed data, we developed and implemented a modified VC method, which we call the “copula VC method,” that directly models the nonnormal distribution using Gaussian copulas. The copula VC method allows the analysis of continuous, discrete, and censored trait data, and the standard VC method is a special case when the data are distributed as multivariate normal. Through the use of link functions, the copula VC method can easily incorporate covariates. We use computer simulations to show that the proposed method yields unbiased parameter estimates, correct type I error rates, and improved power for testing linkage with a variety of nonnormal traits as compared with the standard VC and the regression-based methods.
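The marginal transform underlying a Gaussian copula model maps each nonnormal trait value through its empirical CDF onto standard normal quantiles. The rank-based sketch below illustrates that transform only; it is not the paper's full copula VC likelihood, which additionally models the dependence among family members:

```python
from statistics import NormalDist

def inverse_normal_transform(values):
    # rank-based inverse normal transform: empirical CDF -> standard normal
    # quantiles, the per-margin step of a Gaussian copula model
    nd = NormalDist()
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    z = [0.0] * n
    for rank, i in enumerate(order):
        # half-sample offset keeps the extreme quantiles finite
        z[i] = nd.inv_cdf((rank + 0.5) / n)
    return z

skewed = [1.0, 2.0, 4.0, 8.0, 16.0]   # heavily right-skewed trait values
z = inverse_normal_transform(skewed)   # symmetric normal scores
```

After this transform the trait margins are standard normal, so the usual multivariate-normal VC machinery applies; when the data are already normal, the transform is close to the identity (up to scale), matching the statement that the standard VC method is a special case.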

38 citations


Cited by
Journal ArticleDOI
TL;DR: The Burrows-Wheeler Alignment tool (BWA) is a new read-alignment package based on backward search with the Burrows–Wheeler transform (BWT) that efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous volume of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: [email protected]
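The backward search that BWA builds on can be shown in miniature. The sketch below handles only exact matching (BWA extends the search to tolerate mismatches and gaps) and recomputes character ranks naively, where a real FM-index precomputes them:

```python
def bwt(text):
    # Burrows-Wheeler transform via sorted rotations ('$' is the sentinel,
    # lexicographically smaller than any base)
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)

def backward_search(b, pattern):
    # count exact occurrences of pattern in the original text using only
    # its BWT, by shrinking a suffix-array interval one character at a time
    chars, C, total = sorted(set(b)), {}, 0
    for c in chars:               # C[c]: count of characters in b smaller than c
        C[c] = total
        total += b.count(c)
    lo, hi = 0, len(b)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + b[:lo].count(c)   # LF-mapping; b[:i].count(c) is a naive
        hi = C[c] + b[:hi].count(c)   # Occ(c, i) -- real indexes use rank tables
        if lo >= hi:
            return 0
    return hi - lo

hits = backward_search(bwt("GATTACAGATTACA"), "ATTA")   # -> 2
```

The interval [lo, hi) always covers exactly the rotations prefixed by the pattern suffix processed so far, so its final width is the occurrence count without ever touching the original text.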

43,862 citations

Journal ArticleDOI
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
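The "dynamic programming algorithms" paired with the index refer to gapped alignment of the kind computed by Smith-Waterman. The sketch below is a plain, unaccelerated score-only version; the scoring values are illustrative, not Bowtie 2's defaults, and Bowtie 2's actual implementation is SIMD-accelerated and banded:

```python
def smith_waterman(a, b, match=2, mismatch=-2, gap=-3):
    # local alignment score by dynamic programming: H[i][j] is the best score
    # of any alignment ending at a[i-1], b[j-1], floored at zero
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,   # diagonal: (mis)match
                          H[i - 1][j] + gap,        # gap in b
                          H[i][j - 1] + gap)        # gap in a
            best = max(best, H[i][j])
    return best
```

The division of labor in an FM-index + DP aligner is that the index finds cheap exact seed hits, and the dynamic program extends each seed into a full gapped alignment.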

37,898 citations

Journal ArticleDOI
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation, focusing on the estimation and use of identity-by-state and identity-by-descent information in population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
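The identity-by-state (IBS) measure central to the abstract has a simple form: at each biallelic marker, two individuals share 0, 1, or 2 alleles. The sketch below is a minimal illustration of that definition, not PLINK's implementation (PLINK adds missing-data handling and derives identity-by-descent from IBS via allele-frequency-based expectations):

```python
def ibs_sharing(g1, g2):
    # mean identity-by-state across markers: with genotypes coded 0/1/2
    # copies of one allele, the pair shares 2 - |g1 - g2| alleles out of 2
    shared = [2 - abs(a - b) for a, b in zip(g1, g2)]
    return sum(shared) / (2.0 * len(shared))

same = ibs_sharing([0, 1, 2, 1], [0, 1, 2, 1])      # identical genotypes -> 1.0
opposite = ibs_sharing([0, 0, 2, 2], [2, 2, 0, 0])  # opposite homozygotes -> 0.0
```

Averaged over hundreds of thousands of markers, this statistic separates duplicates, close relatives, and population-level structure, which is what enables the stratification correction and segmental-sharing analyses the abstract describes.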

26,280 citations

Journal ArticleDOI
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, and 245 more authors (29 institutions)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal ArticleDOI
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
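The MapReduce philosophy the GATK abstract describes, separating a per-read calculation from the machinery that shards and merges it, can be sketched with a toy coverage calculator. This is an illustration of the pattern in Python, not the GATK's actual Java API:

```python
from collections import Counter
from functools import reduce

def map_read(read):
    # map step: one aligned read -> its per-position coverage contribution
    start, length = read
    return Counter(range(start, start + length))

def reduce_counts(acc, part):
    # reduce step: merge partial coverage counts; because merging is
    # associative, the map outputs can be computed and combined in any
    # order, which is what permits distributed execution
    acc.update(part)
    return acc

reads = [(0, 5), (3, 4), (5, 2)]   # (start, length) pairs on one contig
coverage = reduce(reduce_counts, map(map_read, reads), Counter())
```

Writing the analysis as a pure map over reads plus an associative reduce is exactly the separation that lets a framework handle data access, sharding, and parallelization once, on behalf of every tool.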

20,557 citations