Author

Gonçalo R. Abecasis

Bio: Gonçalo R. Abecasis is an academic researcher from the University of Michigan. The author has contributed to research in the topics Genome-wide association study and Population. The author has an h-index of 179 and has co-authored 595 publications receiving 230,323 citations. Previous affiliations of Gonçalo R. Abecasis include Johns Hopkins University School of Medicine and the Wellcome Trust Centre for Human Genetics.


Papers
01 Jan 2019
TL;DR: A transancestral exome-wide association study for body-fat distribution identifies protein-coding variants that are significantly associated with waist-to-hip ratio adjusted for body mass index.

18 citations

Journal ArticleDOI
Catherine Tcheandjieu, Ke Xiao, Heliodoro Tejeda, Julie Lynch, Sanni Ruotsalainen, Tiffany R. Bellomo, Madhuri Palnati, Renae Judy, Derek Klarin, Rachel L. Kember, Shefali S. Verma, Gonçalo R. Abecasis, Aris Baras, Michael Cantor, Giovanni Coppola, Andrew Deubler, Aris Bass Economides, Katia Karalis, Luca A. Lotta, John D. Overton, Jeffrey G. Reid, Katherine A. Siminovitch, Alan R. Shuldiner, Christina Beechert, Caitlin Forsythe, Erin D. Fuller, Zhenhua Gu, Michael Lattari, Alexander Lopez, Maria Sotiopoulos Padilla, Manasi Pradhan, Kia Manoochehri, Thomas D. Schleicher, Louis Widom, Sarah E. Wolf, Ricardo H. Ulloa, Amelia J Averitt, Nilanjana Banerjee, Dadong Li, Sameer Malhotra, Deepika Sharma, Jeffrey Staples, Xiaodong Bai, Suganthi Balasubramanian, Suying Bao, Boris Boutkov, Siying Chen, Gi Seon Eom, Lukas Habegger, Alicia Hawes, Shareef Khalid, Olga A. Krasheninina, Rouel Lanche, Adam Mansfield, Evan Maxwell, George Mitra, Mona Nafde, Sean O’Keeffe, Max Orelus, Razvan Panea, Tommy Polanco, Ayesha Ghulam Rasool, William J Salerno, Kathie Sun, Jiwen Xin, Joshua D. Backman, Amy Damask, L. Dobbyn, Manuel A. R. Ferreira, Arkopravo Ghosh, Christopher Gillies, Lauren Gurski, Eric Jorgenson, Hyun Min Kang, Michael Kessler, Jack A. Kosmicki, Alexander H. Li, Nan Lin, Daren Liu, Adam E. Locke, Jonathan Marchini, Anthony Marcketta, Joelle Mbatchou, Arden Moscati, Charles Paulding, Carlo Sidore, Elisabeth Stahl, Kyoko Watanabe, Bin Ye, B. Zhang, Andrey Ziyatdinov, Marcus B. Jones, Jason Mighty, Lyndon J. Mitnaul, Aarno Palotie, Mark Daly, Mary G. Ritchie, Daniel J. Rader, Manuel A. Rivas, Themistocles L. Assimes, Philip S. Tsao, Scott M. Damrauer, James R. Priest

18 citations

Journal ArticleDOI
TL;DR: A method is developed to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain large amounts of missing values.
Abstract: Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values, i.e. the summary association statistics are measured for all variants in contributing studies. In practice, genotype imputation is not always effective. This may be the case when targeted genotyping/sequencing assays are used or when the un-typed genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Existing methods for imputing missing summary association statistics and using imputed values in meta-analysis, approximate conditional analysis, or simple strategies such as complete case analysis all have theoretical limitations. Applying these approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain large amounts of missing values. Based on this estimator, we proposed a score statistic called PCBS (partial correlation based score statistic) for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to one of the largest meta-analyses to date for the cigarettes-per-day phenotype. Using the new method, we identified multiple novel independently associated variants at known loci for tobacco use, which were otherwise missed by alternative methods. 
Together, the phenotypic variance explained by these variants was 1.1%, improving that of previously reported associations by 71%. These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants.

18 citations

Posted ContentDOI
19 Oct 2017-bioRxiv
TL;DR: To differentiate primary trait-mediated PRS associations from associations that arise through shared genetic risk profiles, the idea of "exclusion PRS PheWAS" was introduced; it uncovered phenome-wide significant associations between high thyroid cancer PRS and a lower risk for hypothyroidism, and between high squamous cell carcinoma PRS and a higher risk for actinic keratosis.
Abstract: Health systems are stewards of patient electronic health record (EHR) data with extraordinarily rich depth and breadth, reflecting thousands of diagnoses and exposures. Measures of genomic variation integrated with EHRs offer a potential strategy to accurately stratify patients for risk profiling and discover new relationships between diagnoses and genomes. The objective of this study was to evaluate whether Polygenic Risk Scores (PRS) for common cancers are associated with multiple phenotypes in a Phenome-wide Association Study (PheWAS) conducted in 28,260 unrelated, genotyped patients of recent European ancestry who consented to participate in the Michigan Genomics Initiative, a longitudinal biorepository effort within Michigan Medicine. PRS for 12 cancer traits were calculated using summary statistics from the NHGRI-EBI GWAS catalog. A total of 1,711 synthetic case-control studies was used for PheWAS analyses. Patients with at least one cancer diagnosis constituted 13,490 patients. PRSs exhibited strong association for several cancer traits they were designed for including female breast cancer, prostate cancer and melanomas of skin. Phenome-wide significant associations were observed between PRS and many non-cancer diagnoses. To differentiate primary trait-mediated PRS associations from associations that arise through shared genetic risk profiles, the idea of "exclusion PRS PheWAS" was introduced and uncovered phenome-wide significant associations between a lower risk for hypothyroidism in patients with high thyroid cancer PRS and a higher risk for actinic keratosis in patients with high squamous cell carcinoma PRS after removing all cases of the primary cancer trait. This is the first PheWAS study using PRS instead of single variant.

18 citations
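
The PRS abstract above does not spell out how a score is computed. As a minimal sketch, assuming the common weighted-sum construction (published effect size times risk-allele count, summed over variants), a per-patient score could look like the following. The function name, variant IDs, and weights are hypothetical and are not taken from the study.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages.

    dosages: dict mapping variant ID -> risk-allele count (0, 1 or 2)
    weights: dict mapping variant ID -> effect size (e.g. log odds ratio)
    Variants missing from either input are skipped (an assumption of this sketch).
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in shared)


# Hypothetical example: two weighted variants, one patient.
example_weights = {"variant_a": 0.5, "variant_b": 0.25}
example_dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
example_score = polygenic_risk_score(example_dosages, example_weights)
```

In practice the scores are usually standardized across the cohort before being tested against each phenotype; that step is omitted here.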

Journal ArticleDOI
TL;DR: The member databases themselves produce regular releases, and for TIGRFAMs the number of models has increased from 1109 in release 1.0 to 1415 in release 2.0 (beginning of 2002).
Abstract: The member databases themselves produce regular releases. PRINTS produces quarterly releases with 50 new fingerprints per release, resulting in 200 additional fingerprints per annum. At InterPro’s conception Pfam had 2008 HMMs, and it plans to reach a total of 5000 families by the end of 2002. It produced 715 HMMs in 2000 and 735 HMMs in 2001, and aims to have produced 1700 additional HMMs by the end of 2002. For TIGRFAMs, the number of models has increased from 1109 in release 1.0 (2001) to 1415 in release 2.0 (beginning of 2002). The first release of PROSITE in 1989 contained just 60 entries, and today release 17.0 has 1501 signatures. Release 12.0 in 1994 saw the introduction of the first profiles into the releases, and since then PROSITE has produced an average of just over 100 new signatures per release (approximately per year).

17 citations


Cited by
Journal ArticleDOI
TL;DR: Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: [email protected]

43,862 citations
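
The BWA abstract above names backward search over the Burrows–Wheeler Transform. A minimal sketch of the exact-match version of that idea follows. It is not BWA's implementation: it builds the suffix array naively, stores full occurrence tables, and does not handle the mismatches and gaps that BWA allows. All function names are illustrative.

```python
from collections import Counter


def bwt_via_suffix_array(text):
    """Return (bwt, suffix_array) for text plus a '$' sentinel (naive construction)."""
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT character for each sorted suffix is the character preceding it.
    bwt = "".join(s[i - 1] for i in sa)
    return bwt, sa


def build_fm_tables(bwt):
    """Return C (count of smaller characters) and Occ (prefix counts per character)."""
    counts = Counter(bwt)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i] is the number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in counts:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return c_table, occ


def backward_search(pattern, bwt, sa):
    """Return sorted start positions of exact matches of pattern in the text."""
    c_table, occ = build_fm_tables(bwt)
    lo, hi = 0, len(bwt)  # half-open interval of matching suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


bwt, sa = bwt_via_suffix_array("GATTACAGATTACA")
hits = backward_search("ATTA", bwt, sa)
```

The pattern is consumed from its last character to its first, and each step narrows the interval of suffix-array rows whose suffixes start with the part of the pattern read so far, so the number of steps depends on the read length rather than the reference length.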

Journal ArticleDOI
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations

Journal ArticleDOI
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. It focuses on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations
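
The PLINK abstract above refers to identity-by-state (IBS) information. As a rough sketch of the underlying quantity, assuming genotypes coded as counts of one allele (0, 1 or 2) and `None` for missing calls, the mean proportion of alleles shared IBS between two individuals can be computed as below. This illustrates the definition only and is not PLINK's code; the function name is hypothetical.

```python
def ibs_proportion(genotypes_a, genotypes_b):
    """Mean proportion of alleles shared identical by state.

    Each marker contributes 2 - |a - b| shared alleles out of 2.
    Markers with a missing call (None) in either individual are skipped.
    """
    shared = 0
    markers = 0
    for a, b in zip(genotypes_a, genotypes_b):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)
        markers += 1
    if markers == 0:
        raise ValueError("no markers genotyped in both individuals")
    return shared / (2 * markers)


# Hypothetical example: one marker sharing a single allele, one sharing both.
example_ibs = ibs_proportion([0, 1], [1, 1])
```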

Journal ArticleDOI
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, and 245 more authors (29 institutions)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal ArticleDOI
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations