Author

Gonçalo R. Abecasis

Bio: Gonçalo R. Abecasis is an academic researcher from the University of Michigan. The author has contributed to research on the topics Genome-wide association study and Population. The author has an h-index of 179 and has co-authored 595 publications receiving 230,323 citations. Previous affiliations of Gonçalo R. Abecasis include Johns Hopkins University School of Medicine and the Wellcome Trust Centre for Human Genetics.


Papers
Journal Article
TL;DR: A new method based on direct analysis of off-target sequence reads that can accurately infer worldwide continental ancestry when used with sequencing data sets with whole-genome shotgun coverage as low as 0.001× is proposed and implemented.
Abstract: Estimating individual ancestry is important in genetic association studies where population structure leads to false positive signals, although assigning ancestry remains challenging with targeted sequence data. We propose a new method for the accurate estimation of individual genetic ancestry, based on direct analysis of off-target sequence reads, and implement our method in the publicly available LASER software. We validate the method using simulated and empirical data and show that the method can accurately infer worldwide continental ancestry when used with sequencing data sets with whole-genome shotgun coverage as low as 0.001×. For estimates of fine-scale ancestry within Europe, the method performs well with coverage of 0.1×. On an even finer scale, the method improves discrimination between exome-sequenced study participants originating from different provinces within Finland. Finally, we show that our method can be used to improve case-control matching in genetic association studies and to reduce the risk of spurious findings due to population structure.

141 citations

Journal Article
TL;DR: Phenome-wide significant associations were observed between PRS and many non-cancer diagnoses, and the idea of "exclusion PRS PheWAS" was introduced to differentiate PRS associations driven by the primary trait from associations arising through shared genetic risk profiles.
Abstract: Health systems are stewards of patient electronic health record (EHR) data with extraordinarily rich depth and breadth, reflecting thousands of diagnoses and exposures. Measures of genomic variation integrated with EHRs offer a potential strategy to accurately stratify patients for risk profiling and discover new relationships between diagnoses and genomes. The objective of this study was to evaluate whether polygenic risk scores (PRS) for common cancers are associated with multiple phenotypes in a phenome-wide association study (PheWAS) conducted in 28,260 unrelated, genotyped patients of recent European ancestry who consented to participate in the Michigan Genomics Initiative, a longitudinal biorepository effort within Michigan Medicine. PRS for 12 cancer traits were calculated using summary statistics from the NHGRI-EBI catalog. A total of 1,711 synthetic case-control studies was used for PheWAS analyses. There were 13,490 (47.7%) patients with at least one cancer diagnosis in this study sample. PRS exhibited strong association for several cancer traits they were designed for, including female breast cancer, prostate cancer, melanoma, basal cell carcinoma, squamous cell carcinoma, and thyroid cancer. Phenome-wide significant associations were observed between PRS and many non-cancer diagnoses. To differentiate PRS associations driven by the primary trait from associations arising through shared genetic risk profiles, the idea of "exclusion PRS PheWAS" was introduced. Further analysis of temporal order of the diagnoses improved our understanding of these secondary associations. This comprehensive PheWAS used PRS instead of a single variant.

141 citations

Journal Article
TL;DR: The strongest association signals for trait depression were found in RORA (rs12912233; p = 6 × 10−7), a gene involved in circadian rhythm.

140 citations

Journal Article
07 Mar 2017-JAMA
TL;DR: The presence of rare damaging mutations in LPL was significantly associated with higher triglyceride levels and the presence of coronary artery disease; further research is needed to assess whether there are causal mechanisms by which heterozygous lipoprotein lipase deficiency could lead to coronary artery disease.
Abstract: Importance: The activity of lipoprotein lipase (LPL) is the rate-determining step in clearing triglyceride-rich lipoproteins from the circulation. Mutations that damage the LPL gene (LPL) lead to lifelong deficiency in enzymatic activity and can provide insight into the relationship of LPL to human disease. Objective: To determine whether rare and/or common variants in LPL are associated with early-onset coronary artery disease (CAD). Design, Setting, and Participants: In a cross-sectional study, LPL was sequenced in 10 CAD case-control cohorts of the multinational Myocardial Infarction Genetics Consortium and a nested CAD case-control cohort of the Geisinger Health System DiscovEHR cohort between 2010 and 2015. Common variants were genotyped in up to 305 699 individuals of the Global Lipids Genetics Consortium and up to 120 600 individuals of the CARDIoGRAM Exome Consortium between 2012 and 2014. Study-specific estimates were pooled via meta-analysis. Exposures: Rare damaging mutations in LPL included loss-of-function variants and missense variants annotated as pathogenic in a human genetics database or predicted to be damaging by computer prediction algorithms trained to identify mutations that impair protein function. Common variants in the LPL gene region included those independently associated with circulating triglyceride levels. Main Outcomes and Measures: Circulating lipid levels and CAD. Results: Among 46 891 individuals with LPL gene sequencing data available, the mean (SD) age was 50 (12.6) years and 51% were female. A total of 188 participants (0.40%; 95% CI, 0.35%-0.46%) carried a damaging mutation in LPL, including 105 of 32 646 control participants (0.32%) and 83 of 14 245 participants with early-onset CAD (0.58%).
Compared with 46 703 noncarriers, the 188 heterozygous carriers of an LPL damaging mutation displayed higher plasma triglyceride levels (19.6 mg/dL; 95% CI, 4.6-34.6 mg/dL) and higher odds of CAD (odds ratio = 1.84; 95% CI, 1.35-2.51; P < .001). An analysis of 6 common LPL variants resulted in an odds ratio for CAD of 1.51 (95% CI, 1.39-1.64; P = 1.1 × 10−22) per 1-SD increase in triglycerides. Conclusions and Relevance: The presence of rare damaging mutations in LPL was significantly associated with higher triglyceride levels and presence of coronary artery disease. However, further research is needed to assess whether there are causal mechanisms by which heterozygous lipoprotein lipase deficiency could lead to coronary artery disease.
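As a rough illustration of how the carrier counts in this abstract relate to an odds ratio, the sketch below computes a crude (unadjusted) odds ratio from the reported 2×2 counts. The function name `crude_odds_ratio` is hypothetical, and this is not the study's method: the published estimate of 1.84 was pooled across cohorts by meta-analysis, so the crude figure of about 1.8 is only expected to be close to it.

```python
def crude_odds_ratio(carrier_cases, total_cases, carrier_controls, total_controls):
    """Unadjusted odds ratio for carrying a variant, from a 2x2 table.

    Hypothetical helper for illustration; it ignores cohort structure
    and covariates, unlike a pooled meta-analysis estimate.
    """
    noncarrier_cases = total_cases - carrier_cases
    noncarrier_controls = total_controls - carrier_controls
    # Odds of carrying among cases divided by odds of carrying among controls.
    return (carrier_cases * noncarrier_controls) / (carrier_controls * noncarrier_cases)


# Counts as reported in the abstract: 83 of 14 245 early-onset CAD cases and
# 105 of 32 646 controls carried a damaging LPL mutation.
or_lpl = crude_odds_ratio(83, 14245, 105, 32646)
print(f"crude odds ratio: {or_lpl:.2f}")
```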

139 citations

Marco Medici, Eleonora Porcu, Giorgio Pistis, Alexander Teumer, Suzanne J. Brown, Richard A. Jensen, Rajesh Rawal, Greet Roef, Theo S. Plantinga, Sita H. Vermeulen, Jari Lahti, Matthew J. Simmonds, Lise Lotte N. Husemoen, Rachel M. Freathy, Beverley M. Shields, Diana Pietzner, Rebecca Nagy, Linda Broer, Layal Chaker, Tim I M Korevaar, Maria Grazia Plia, Cinzia Sala, Uwe Voelker, J. Brent Richards, Fred C.G.J. Sweep, Christian Gieger, Tanguy Corre, Eero Kajantie, Betina H. Thuesen, Youri Taes, W. Edward Visser, Andrew T. Hattersley, Juergen Kratzsch, Alexander Hamilton, W. G. Li, Georg Homuth, Monia Lobina, Stefano Mariotti, Nicole Soranzo, Massimiliano Cocca, Matthias Nauck, Christin Spielhagen, Alec H. Ross, Alice M. Arnold, Martijn van de Bunt, Sandya Liyanarachchi, Margit Heier, Hans Joergen Grabe, Corrado Masciullo, Tessel E. Galesloot, Ee Mun Lim, Eva Reischl, Peter J. Leedman, Sandra Lai, Alessandro P Delitala, Alexandra Bremner, David I. W. Philips, John Beilby, Antonella Mulas, Matteo Vocale, Gonçalo R. Abecasis, Tom Forsén, Alan James, Elisabeth Widen, Jennie Hui, Holger Prokisch, Ernst E. Rietzschel, Aarno Palotie, Peter Feddema, Stephen J. Fletcher, Katharina Schramm, Jerome I. Rotter, Alexander Kluttig, Doerte Radke, Michela Traglia, Gabriela L. Surdulescu, Huiling He, Jayne A. Franklyn, Daniel Tiller, Bijay Vaidya, Tim De Meyer, Torben Jørgensen, Johan G. Eriksson, Peter O'Leary, Eric Wichmann, Ad R. M. M. Hermus, Bruce M. Psaty, Till Ittermann, Albert Hofman, Emanuele Bosi, David Schlessinger, Henri Wallaschofski, Nicola Pirastu, Yurii S. Aulchenko, Albert de la Chapelle, Romana T. Netea-Maier, Stephen C. L. Gough, Henriette E. Meyer zu Schwabedissen, Timothy M. Frayling, Jean-Marc Kaufman, Allan Linneberg, Katri Raeikkoenen, Johannes W. A. Smit, Lambertus A. Kiemeney, Fernando Rivadeneira, André G. Uitterlinden, John P. Walsh, Christa Meisinger, Martin den Heijer, Theo J. Visser, Tim D. Spector, Scott Wilson, Henry Voelzke, Anne R. 
Cappola, Daniela Toniolo, Serena Sanna, Silvia Naitza, Robin P. Peeters 
28 Aug 2014
TL;DR: The results provide insight into why individuals with thyroid autoimmunity do or do not eventually develop thyroid disease, and these markers may therefore predict which TPOAb-positives are particularly at risk of developing clinical thyroid dysfunction.
Abstract: Autoimmune thyroid diseases (AITD) are common, affecting 2-5% of the general population. Individuals with positive thyroid peroxidase antibodies (TPOAbs) have an increased risk of autoimmune hypothyroidism (Hashimoto's thyroiditis), as well as autoimmune hyperthyroidism (Graves' disease). As the possible causative genes of TPOAbs and AITD remain largely unknown, we performed GWAS meta-analyses in 18,297 individuals for TPOAb-positivity (1769 TPOAb-positives and 16,528 TPOAb-negatives) and in 12,353 individuals for TPOAb serum levels, with replication in 8,990 individuals. Significant associations (P<5×10−8) were detected at TPO-rs11675434, ATXN2-rs653178, and BACH2-rs10944479 for TPOAb-positivity, and at TPO-rs11675434, MAGI3-rs1230666, and KALRN-rs2010099 for TPOAb levels. Individual and combined effects (genetic risk scores) of these variants on (subclinical) hypo- and hyperthyroidism, goiter and thyroid cancer were studied. Individuals with a high genetic risk score had, besides an increased risk of TPOAb-positivity (OR: 2.18, 95% CI 1.68–2.81, P = 8.1×10−8), a higher risk of increased thyroid-stimulating hormone levels (OR: 1.51, 95% CI 1.26–1.82, P = 2.9×10−6), as well as a decreased risk of goiter (OR: 0.77, 95% CI 0.66–0.89, P = 6.5×10−4). The MAGI3 and BACH2 variants were associated with an increased risk of hyperthyroidism, which was replicated in an independent cohort of patients with Graves' disease (OR: 1.37, 95% CI 1.22–1.54, P = 1.2×10−7 and OR: 1.25, 95% CI 1.12–1.39, P = 6.2×10−5). The MAGI3 variant was also associated with an increased risk of hypothyroidism (OR: 1.57, 95% CI 1.18–2.10, P = 1.9×10−3). This first GWAS meta-analysis for TPOAbs identified five newly associated loci, three of which were also associated with clinical thyroid disease. With these markers we identified a large subgroup in the general population with a substantially increased risk of TPOAbs. 
The results provide insight into why individuals with thyroid autoimmunity do or do not eventually develop thyroid disease, and these markers may therefore predict which TPOAb-positives are particularly at risk of developing clinical thyroid dysfunction.

136 citations


Cited by
Journal Article
TL;DR: The Burrows–Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT), is implemented to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows–Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net
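The backward search mentioned in this abstract can be sketched for the simplest case, exact matching. The code below is a minimal illustration and not BWA's implementation: the function names `bwt_index` and `backward_search` are hypothetical, the suffix array is built by naive sorting, and the mismatch and gap handling that BWA adds is omitted.

```python
from collections import Counter


def bwt_index(text):
    """Build a suffix array, C table and occurrence table for text + '$'."""
    text += "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    counts = Counter(text)
    # C[c]: number of characters in text that are strictly smaller than c.
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one character leftwards
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, C, occ = bwt_index("GATTACAGATTACA")
print(backward_search("TTA", sa, C, occ))
```

Each pattern character narrows the suffix-array interval with two table lookups, so the search cost depends on the pattern length rather than the reference length once the index is built.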

43,862 citations

Journal Article
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations

Journal Article
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation, focusing on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations

Journal Article
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, and 245 more authors (29 institutions)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations