Author

Gonçalo R. Abecasis

Bio: Gonçalo R. Abecasis is an academic researcher from the University of Michigan. The author has contributed to research on topics including genome-wide association studies and population. The author has an h-index of 179 and has co-authored 595 publications receiving 230,323 citations. Previous affiliations of Gonçalo R. Abecasis include Johns Hopkins University School of Medicine and the Wellcome Trust Centre for Human Genetics.


Papers
Journal Article
Matthijs J. H. M. van der Loos1, Cornelius A. Rietveld1, Niina Eklund2, Niina Eklund3, Philipp Koellinger1, Fernando Rivadeneira1, Gonçalo R. Abecasis4, Georgina A. Ankra-Badu5, Sebastian E. Baumeister6, Daniel J. Benjamin7, Reiner Biffar6, Stefan Blankenberg8, Dorret I. Boomsma9, David Cesarini10, Francesco Cucca11, Eco J. C. de Geus9, George Dedoussis12, Panos Deloukas13, Maria Dimitriou12, Gudny Eiriksdottir, Johan G. Eriksson, Christian Gieger, Vilmundur Gudnason14, Birgit Höhne, Rolf Holle, Jouke-Jan Hottenga9, Aaron Isaacs1, Marjo-Riitta Järvelin15, Marjo-Riitta Järvelin2, Marjo-Riitta Järvelin16, Magnus Johannesson17, Marika Kaakinen15, Mika Kähönen, Stavroula Kanoni13, Maarit A. Laaksonen2, Jari Lahti3, Lenore J. Launer18, Terho Lehtimäki, Marisa Loitfelder19, Patrik K. E. Magnusson20, Silvia Naitza11, Ben A. Oostra1, Markus Perola3, Markus Perola21, Markus Perola18, Katja Petrovic19, Lydia Quaye5, Olli T. Raitakari22, Samuli Ripatti2, Samuli Ripatti3, Samuli Ripatti13, Paul Scheet23, David Schlessinger18, Carsten Oliver Schmidt6, Helena Schmidt19, Reinhold Schmidt19, Andrea Senft24, Albert V. Smith14, Tim D. Spector5, Ida Surakka2, Ida Surakka3, Rauli Svento15, Antonio Terracciano25, Antonio Terracciano18, Emmi Tikkanen2, Emmi Tikkanen3, Cornelia M. van Duijn1, Jorma Viikari22, Henry Völzke6, H.-Erich Wichmann26, Philipp S. Wild27, Sara M. Willems1, Gonneke Willemsen9, Frank J. A. van Rooij1, Patrick J. F. Groenen1, André G. Uitterlinden1, Albert Hofman1, Roy Thurik1 
04 Apr 2013-PLOS ONE
TL;DR: This paper found that common SNPs, when considered jointly, explain about half of the narrow-sense heritability of self-employment estimated in twin data ($\sigma_g^2/\sigma_P^2 = 25\%$, $h^2 = 55\%$).
Abstract: Economic variables such as income, education, and occupation are known to affect mortality and morbidity, such as cardiovascular disease, and have also been shown to be partly heritable. However, very little is known about which genes influence economic variables, although these genes may have both a direct and an indirect effect on health. We report results from the first large-scale collaboration that studies the molecular genetic architecture of an economic variable, entrepreneurship, that was operationalized using self-employment, a widely available proxy. Our results suggest that common SNPs when considered jointly explain about half of the narrow-sense heritability of self-employment estimated in twin data ($\sigma_g^2/\sigma_P^2 = 25\%$, $h^2 = 55\%$). However, a meta-analysis of genome-wide association studies across sixteen studies comprising 50,627 participants did not identify genome-wide significant SNPs. 58 SNPs with $p < 10^{-5}$ were tested in a replication sample (n = 3,271), but none replicated. Furthermore, a gene-based test shows that none of the genes that were previously suggested in the literature to influence entrepreneurship reveal significant associations. Finally, SNP-based genetic scores that use results from the meta-analysis capture less than 0.2% of the variance in self-employment in an independent sample (p≥0.039). Our results are consistent with a highly polygenic molecular genetic architecture of self-employment, with many genetic variants of small effect. Although self-employment is a multi-faceted, heavily environmentally influenced, and biologically distal trait, our results are similar to those for other genetically complex and biologically more proximate outcomes, such as height, intelligence, personality, and several diseases.

164 citations

Journal Article
Anne E. Justice1, Thomas W. Winkler2, Mary F. Feitosa3, Misa Graff1, +367 more; Institutions (97)
TL;DR: The results suggest that tobacco smoking may alter the genetic susceptibility to overall adiposity and body fat distribution, and highlight the importance of accounting for environment in genetic analyses.
Abstract: Few genome-wide association studies (GWAS) account for environmental exposures, like smoking, potentially impacting the overall trait variance when investigating the genetic contribution to obesity-related traits. Here, we use GWAS data from 51,080 current smokers and 190,178 nonsmokers (87% European descent) to identify loci influencing BMI and central adiposity, measured as waist circumference and waist-to-hip ratio both adjusted for BMI. We identify 23 novel genetic loci, and 9 loci with convincing evidence of gene-smoking interaction (GxSMK) on obesity-related traits. We show consistent direction of effect for all identified loci and significance for 18 novel and for 5 interaction loci in an independent study sample. These loci highlight novel biological functions, including response to oxidative stress, addictive behaviour, and regulatory functions emphasizing the importance of accounting for environment in genetic analyses. Our results suggest that tobacco smoking may alter the genetic susceptibility to overall adiposity and body fat distribution.

159 citations

Posted Content
Ji Chen1, Ji Chen2, Cassandra N. Spracklen3, Cassandra N. Spracklen4, +475 more; Institutions (145)
25 Jul 2020-bioRxiv
TL;DR: Genomic feature, gene-expression and gene-set analyses revealed distinct biological signatures for each trait, highlighting different underlying biological pathways. The results increase understanding of diabetes pathophysiology through trans-ancestry studies, which improve power and resolution.
Abstract: Glycaemic traits are used to diagnose and monitor type 2 diabetes, and cardiometabolic health. To date, most genetic studies of glycaemic traits have focused on individuals of European ancestry. Here, we aggregated genome-wide association studies in up to 281,416 individuals without diabetes (30% non-European ancestry) with fasting glucose, 2h-glucose post-challenge, glycated haemoglobin, and fasting insulin data. Trans-ancestry and single-ancestry meta-analyses identified 242 loci (99 novel; P

158 citations

Journal Article
TL;DR: The study used new statistical methods for dimension reduction to account for nongenetic effects in estimates of expression levels. These methods reduced false-discovery rates and increased the number of expression quantitative trait loci (eQTLs) mapped either locally or at a distance.
Abstract: Expression quantitative trait loci (eQTLs) provide insights into the regulation of transcription and aid in interpretation of genome-wide association studies (GWASs) (Stranger et al. 2005, 2007a,b; Dixon et al. 2007; Moffatt et al. 2007; Cookson et al. 2009; Heid et al. 2010; Hsu et al. 2010; Lango Allen et al. 2010; Speliotes et al. 2010; Chu et al. 2011). Transcript abundances for 40%–70% of genes are heritable, but only 25%–35% of the heritable component in expression levels has been explained by the eQTLs so far identified (Dixon et al. 2007; Goring et al. 2007; Stranger et al. 2007a,b; Emilsson et al. 2008). The lack of eQTLs for many heritable transcript abundances may be due to multiple factors. These include the limited sample sizes of previous studies, high signal noise in microarray measurements of transcript abundances, variation in biological and technical factors that increase measurement errors in gene expression abundance, limited coverage of genetic variation using commercial genotyping platforms, and incomplete coverage of the transcriptome by gene expression arrays. In order to increase the power of eQTL mapping and to build a more complete map of single nucleotide polymorphisms (SNPs) influencing gene expression, we have expanded our previous analysis (Dixon et al. 2007) by including data generated using newer whole-genome gene expression arrays. We have refined our analyses using newly developed statistical methods (Leek and Storey 2007; Stegle et al. 2010) together with an expanded catalog of genetic variation generated by the 1000 Genomes Project. In this introduction, we first briefly review the rationale for each of these refinements. Variation in the conditions and timing of experiments and operator characteristics may introduce variation in the measurements of transcript abundances, as may batch effects on the manufacture of microarray chips (Akey et al. 2007). 
Biological conditions such as stage of the cells when RNA is extracted and other unknown factors may also form important influences on the measurement of gene expression. Despite these confounders, the deep information among the thousands of transcripts on microarrays may be used to improve the accuracy of gene expression measurements. All probes on an individual microarray undergo identical experimental conditions that can be summarized by dimension reduction methods, such as principal components analysis (PCA) or factor analysis (Leek and Storey 2007; Stegle et al. 2010). We systematically evaluate this strategy in our data sets and show that the top principal components (PCs) of gene expression are highly correlated with RNA extraction and cDNA synthesis dates, the date that the sample was fragmented, and the date of chip hybridization. We go on to show that including these PCs in downstream analyses reduces false positives and increases power for both local and distant eQTLs. Commonly used gene expression microarrays are manufactured using chip designs that may lead to differential coverage of the transcriptome. For example, the probesets on the Affymetrix U133 Plus 2 chip consist of multiple probes, each 25 bp long. The probeset level intensity combining all probes is used as the measure of transcript abundance. On the other hand, the Illumina Human6 V1 array has only one probe of 50 bp long per transcript. Affymetrix and Illumina probes may sit in different positions in a gene and, as a consequence, produce different intensities of gene expression measurements. In addition, the genes that are represented on an array may differ between platforms, so that only 7601 genes are covered by both the Affymetrix and Illumina microarrays discussed above. 
Newer chip designs such as the Affymetrix Human Gene 1.0 ST arrays are more inclusive, and RNA sequencing can now provide comprehensive cover of the transcriptome, although its cost and complexity still limits its utility. While waiting for the technology to evolve, it is of importance to recognize that individual eQTL detection may be limited by the experimental platform chosen. Genotype imputation is commonly used to increase the power and coverage of individual GWASs and to facilitate meta-analysis across studies utilizing different genotyping platforms (Scott et al. 2007; Wellcome Trust Case Control Consortium 2007; Sanna et al. 2008; Willer et al. 2008). To date, most studies using genotype imputation have used HapMap samples as a template reference panel (Frazer et al. 2007). The 1000 Genomes Project Consortium (in the following text abbreviated as 1000G) (1000 Genomes Project Consortium 2010; http://www.1000genomes.org) aims at developing a comprehensive catalog of human genetic variants of SNP and structure variants with allele frequency down to 1%. One immediate benefit from this project is a deeper and broader reference panel of variants for genotype imputation. Common SNPs that were implicitly tested for association by being tagged by one or more HapMap SNPs may now be directly imputed and tested. In this study we use two large gene expression data sets from nuclear families ascertained through a child with asthma using the Affymetrix Hu133A platform (the MRCA panel) (Dixon et al. 2007) or eczema using the Illumina bead array platform (the MRCE panel) (Morar et al. 2007). The study of families allows estimations of heritability for each expression trait. We compare the power of eQTL mapping using imputation of the new reference panel of 8 million SNPs and imputation of HapMap SNPs. We are able to identify new eQTLs and categorize them by allele frequency, genome coverage, effect size, and trait heritability. 
We have defined local associations as expression SNP (eSNP) and gene within 1 Mb on the same chromosome (the equivalent of cis), and distant associations as eSNP and gene >1 Mb away from gene, either on the same chromosome or on different chromosomes (the equivalent of trans).

158 citations
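
The local/distant definition at the end of the eQTL abstract above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the `Locus` type, the `classify_association` function, and the use of a single base-pair coordinate for a gene (rather than a start/end interval) are assumptions made here. Only the 1 Mb threshold comes from the abstract.

```python
from dataclasses import dataclass

# 1 Mb window, as stated in the abstract.
LOCAL_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Locus:
    """Hypothetical simplification: a chromosome name plus one coordinate."""
    chrom: str
    pos: int  # base-pair position


def classify_association(esnp: Locus, gene: Locus) -> str:
    """Label an eSNP-gene pair as 'local' or 'distant'.

    'local'   : same chromosome and within 1 Mb (the equivalent of cis).
    'distant' : more than 1 Mb apart on the same chromosome, or on a
                different chromosome (the equivalent of trans).
    """
    same_chrom = esnp.chrom == gene.chrom
    if same_chrom and abs(esnp.pos - gene.pos) <= LOCAL_WINDOW_BP:
        return "local"
    return "distant"
```

For example, a SNP at position 100 and a gene at position 500,000 on the same chromosome would be labeled local, while any pair on different chromosomes is labeled distant regardless of coordinates.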

Journal Article
Stéphanie Martine van den Berg1, Marleen H. M. de Moor2, Karin J. H. Verweij2, Karin J. H. Verweij3, Robert F. Krueger4, Michelle Luciano5, Alejandro Arias Vasquez, Lindsay K. Matteson4, Jaime Derringer6, Tõnu Esko7, Najaf Amin8, Scott D. Gordon3, Narelle K. Hansell3, Amy B. Hart9, Ilkka Seppälä10, Jennifer E. Huffman11, Bettina Konte12, Jari Lahti13, Minyoung Lee14, Michael B. Miller4, Teresa Nutile, Toshiko Tanaka15, Alexander Teumer16, Alexander Viktorin17, Juho Wedenoja13, Abdel Abdellaoui2, Gonçalo R. Abecasis18, Daniel E. Adkins14, Arpana Agrawal19, Jueri Allik7, Jueri Allik20, Katja Appel16, Timothy B. Bigdeli14, Fabio Busonero, Harry Campbell5, Paul T. Costa21, George Davey Smith22, Gail Davies5, Harriet de Wit9, Jun Ding15, Barbara E. Engelhardt21, Johan G. Eriksson, Iryna O. Fedko2, Luigi Ferrucci15, Barbara Franke23, Ina Giegling12, Richard A. Grucza19, Annette M. Hartmann12, Andrew C. Heath19, Kati Heinonen13, Anjali K. Henders3, Georg Homuth24, Jouke-Jan Hottenga2, William G. Iacono4, Joost G. E. Janzing23, Markus Jokela13, Robert Karlsson17, John P. Kemp22, John P. Kemp25, Matthew G. Kirkpatrick9, Antti Latvala13, Antti Latvala15, Terho Lehtimäki10, David C. Liewald5, Pamela A. F. Madden19, Chiara Magri26, Patrik K. E. Magnusson17, Jonathan Marten11, Andrea Maschio, Hamdi Mbarek2, Sarah E. Medland3, Evelin Mihailov7, Yuri Milaneschi27, Grant W. Montgomery3, Matthias Nauck16, Michel G. Nivard2, Klaasjan G. Ouwens2, Aarno Palotie13, Aarno Palotie28, Erik Pettersson17, Ozren Polasek29, Yong Qian15, Laura Pulkki-Råback13, Olli T. Raitakari30, Olli T. Raitakari31, Anu Realo7, Richard J. Rose32, Daniela Ruggiero, Carsten Oliver Schmidt16, Wendy S. Slutske33, Rossella Sorice, John M. Starr5, Beate St Pourcain, Angelina R. Sutin15, Angelina R. Sutin34, Nicholas J. Timpson22, Holly Trochet11, Sita H. Vermeulen23, Eero Vuoksimaa13, Elisabeth Widen13, Jasper Wouda2, Jasper Wouda1, Margaret J. Wright3, Lina Zgaga5, Lina Zgaga35, David J. 
Porteous5, Alessandra Minelli26, Abraham A. Palmer9, Dan Rujescu12, Marina Ciullo, Caroline Hayward11, Igor Rudan5, Andres Metspalu20, Andres Metspalu7, Jaakko Kaprio13, Jaakko Kaprio15, Ian J. Deary5, Katri Räikkönen13, James F. Wilson5, James F. Wilson11, Liisa Keltikangas-Järvinen13, Laura J. Bierut19, John M. Hettema14, Hans J. Grabe16, Brenda W.J.H. Penninx27, Cornelia M. van Duijn8, David M. Evans22, David Schlessinger15, Nancy L. Pedersen17, Antonio Terracciano15, Matt McGue36, Matt McGue4, Nicholas G. Martin3, Dorret I. Boomsma2 
TL;DR: A large meta-analysis of GWA studies for extraversion in 63,030 subjects in 29 cohorts shows that extraversion is a highly polygenic personality trait, with an architecture possibly different from other complex human traits, including other personality traits.
Abstract: Extraversion is a relatively stable and heritable personality trait associated with numerous psychosocial, lifestyle and health outcomes. Despite its substantial heritability, no genetic variants have been detected in previous genome-wide association (GWA) studies, which may be due to relatively small sample sizes of those studies. Here, we report on a large meta-analysis of GWA studies for extraversion in 63,030 subjects in 29 cohorts. Extraversion item data from multiple personality inventories were harmonized across inventories and cohorts. No genome-wide significant associations were found at the single nucleotide polymorphism (SNP) level but there was one significant hit at the gene level for a long non-coding RNA site (LOC101928162). Genome-wide complex trait analysis in two large cohorts showed that the additive variance explained by common SNPs was not significantly different from zero, but polygenic risk scores, weighted using linkage information, significantly predicted extraversion scores in an independent cohort. These results show that extraversion is a highly polygenic personality trait, with an architecture possibly different from other complex human traits, including other personality traits. Future studies are required to further determine which genetic variants, by what modes of gene action, constitute the heritable nature of extraversion.

156 citations


Cited by
Journal Article
TL;DR: This work implements the Burrows-Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net

43,862 citations
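
The backward search with the Burrows–Wheeler Transform named in the abstract above can be illustrated with a small Python sketch. It is a teaching example under stated assumptions, not BWA's implementation: it handles exact matches only (BWA additionally allows mismatches and gaps), it builds the suffix array by naive sorting, and the function names `build_index` and `backward_search` are hypothetical.

```python
def build_index(text):
    """Build a toy BWT index: suffix array, C table, and occurrence table."""
    text += "$"  # sentinel, assumed absent from the input and smaller than any base
    # Naive suffix array; fine for short demo strings only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in sa)

    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1

    # C[ch]: number of characters in text strictly smaller than ch.
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of exact matches of pattern in the text."""
    lo, hi = 0, len(sa)  # half-open interval of suffix-array rows
    for ch in reversed(pattern):  # consume the pattern right to left
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


sa, C, occ = build_index("GATTACATTA")
hits = backward_search("TTA", sa, C, occ)  # matches start at offsets 2 and 7
```

Each pattern character narrows the interval of suffix-array rows whose suffixes begin with the part of the pattern processed so far, so the number of search steps equals the pattern length and does not depend on the reference length.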

Journal Article
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations

Journal Article
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation, with a focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations

Journal Article
Eric S. Lander1, Lauren Linton1, Bruce W. Birren1, Chad Nusbaum1, +245 more; Institutions (29)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations