Author

Gonçalo R. Abecasis

Bio: Gonçalo R. Abecasis is an academic researcher from the University of Michigan. The author has contributed to research on the topics Genome-wide association study and Population. The author has an h-index of 179 and has co-authored 595 publications receiving 230,323 citations. Previous affiliations of Gonçalo R. Abecasis include Johns Hopkins University School of Medicine and the Wellcome Trust Centre for Human Genetics.


Papers
Journal Article
TL;DR: This work used genotypes at 513,008 autosomal single-nucleotide polymorphism (SNP) loci in 443 unrelated individuals from 29 worldwide populations to evaluate the "portability" of the HapMap reference panels for imputation in studies of diverse populations and identified "optimal" mixtures of reference panels that maximized imputation accuracy.
Abstract: A current approach to mapping complex-disease-susceptibility loci in genome-wide association (GWA) studies involves leveraging the information in a reference database of dense genotype data. By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and tested for disease association. This imputation strategy has been successful for GWA studies in populations well represented by existing reference panels. We used genotypes at 513,008 autosomal single-nucleotide polymorphism (SNP) loci in 443 unrelated individuals from 29 worldwide populations to evaluate the "portability" of the HapMap reference panels for imputation in studies of diverse populations. When a single HapMap panel was leveraged for imputation of randomly masked genotypes, European populations had the highest imputation accuracy, followed by populations from East Asia, Central and South Asia, the Americas, Oceania, the Middle East, and Africa. For each population, we identified "optimal" mixtures of reference panels that maximized imputation accuracy, and we found that in most populations, mixtures including individuals from at least two HapMap panels produced the highest imputation accuracy. From a separate survey of additional SNPs typed in the same samples, we evaluated imputation accuracy in the scenario in which all genotypes at a given SNP position were unobserved and were imputed on the basis of data from a commercial "SNP chip," again finding that most populations benefited from the use of combinations of two or more HapMap reference panels. Our results can serve as a guide for selecting appropriate reference panels for imputation-based GWA analysis in diverse populations.

237 citations

Journal Article
TL;DR: A genome-wide association study to discern differences in genetic risk factors for PsA and cutaneous-only psoriasis (PsC) and finds multiple independent susceptibility variants in the IL12B, NOS2, and IFIH1 regions.
Abstract: Psoriasis vulgaris (PsV) is a common inflammatory and hyperproliferative skin disease. Up to 30% of people with PsV eventually develop psoriatic arthritis (PsA), an inflammatory musculoskeletal condition. To discern differences in genetic risk factors for PsA and cutaneous-only psoriasis (PsC), we carried out a genome-wide association study (GWAS) of 1,430 PsA case subjects and 1,417 unaffected control subjects. Meta-analysis of this study with three other GWASs and two targeted genotyping studies, encompassing a total of 9,293 PsV case subjects, 3,061 PsA case subjects, 3,110 PsC case subjects, and 13,670 unaffected control subjects of European descent, detected 10 regions associated with PsA and 11 with PsC at genome-wide (GW) significance. Several of these association signals (IFNLR1, IFIH1, NFKBIA for PsA; TNFRSF9, LCE3C/B, TRAF3IP2, IL23A, NFKBIA for PsC) have not previously achieved GW significance. After replication, we also identified a PsV-associated SNP near CDKAL1 (rs4712528, odds ratio [OR] = 1.16, p = 8.4 × 10^-11). Among identified psoriasis risk variants, three were more strongly associated with PsC than PsA (rs12189871 near HLA-C, p = 5.0 × 10^-19; rs4908742 near TNFRSF9, p = 0.00020; rs10888503 near LCE3A, p = 0.0014), and two were more strongly associated with PsA than PsC (rs12044149 near IL23R, p = 0.00018; rs9321623 near TNFAIP3, p = 0.00022). The PsA-specific variants were independent of previously identified psoriasis variants near IL23R and TNFAIP3. We also found multiple independent susceptibility variants in the IL12B, NOS2, and IFIH1 regions. These results provide insights into the pathogenetic similarities and differences between PsC and PsA.

234 citations

Journal Article
TL;DR: The findings suggest that TMPRSS6, a regulator of hepcidin synthesis and iron handling, is crucial in hemoglobin level maintenance.
Abstract: We carried out a genome-wide association study of hemoglobin levels in 16,001 individuals of European and Indian Asian ancestry. The most closely associated SNP (rs855791) results in a nonsynonymous (V736A) change in the serine protease domain of TMPRSS6 and a blood hemoglobin concentration 0.13 (95% CI 0.09-0.17) g/dl lower per copy of allele A (P = 1.6 × 10^-13). Our findings suggest that TMPRSS6, a regulator of hepcidin synthesis and iron handling, is crucial in hemoglobin level maintenance.

226 citations

01 Jan 2017
TL;DR: In this article, the effect of genetic risk-scores comprised of erythrocytic or glycemic variants on incident diabetes prediction and on prevalent diabetes screening performance was investigated.
Abstract: Background: Glycated hemoglobin (HbA1c) is used to diagnose type 2 diabetes (T2D) and assess glycemic control in patients with diabetes. Previous genome-wide association studies (GWAS) have identified 18 HbA1c-associated genetic variants. These variants proved to be classifiable by their likely biological action as erythrocytic (also associated with erythrocyte traits) or glycemic (associated with other glucose-related traits). In this study, we tested the hypotheses that, in a very large scale GWAS, we would identify more genetic variants associated with HbA1c and that HbA1c variants implicated in erythrocytic biology would affect the diagnostic accuracy of HbA1c. We therefore expanded the number of HbA1c-associated loci and tested the effect of genetic risk-scores comprised of erythrocytic or glycemic variants on incident diabetes prediction and on prevalent diabetes screening performance. Throughout this multiancestry study, we kept a focus on interancestry differences in HbA1c genetics performance that might influence race-ancestry differences in health outcomes. Methods & findings: Using genome-wide association meta-analyses in up to 159,940 individuals from 82 cohorts of European, African, East Asian, and South Asian ancestry, we identified 60 common genetic variants associated with HbA1c. We classified variants as implicated in glycemic, erythrocytic, or unclassified biology and tested whether additive genetic scores of erythrocytic variants (GS-E) or glycemic variants (GS-G) were associated with higher T2D incidence in multiethnic longitudinal cohorts (N = 33,241). Nineteen glycemic and 22 erythrocytic variants were associated with HbA1c at genome-wide significance. GS-G was associated with higher T2D risk (incidence OR = 1.05, 95% CI 1.04–1.06, per HbA1c-raising allele, p = 3 × 10^-29); whereas GS-E was not (OR = 1.00, 95% CI 0.99–1.01, p = 0.60).
In Europeans and Asians, erythrocytic variants in aggregate had only modest effects on the diagnostic accuracy of HbA1c. Yet, in African Americans, the X-linked G6PD G202A variant (T-allele frequency 11%) was associated with an absolute decrease in HbA1c of 0.81%-units (95% CI 0.66–0.96) per allele in hemizygous men, and 0.68%-units (95% CI 0.38–0.97) in homozygous women. The G6PD variant may cause approximately 2% (N = 0.65 million, 95% CI 0.55–0.74) of African American adults with T2D to remain undiagnosed when screened with HbA1c. Limitations include the smaller sample sizes for non-European ancestries and the inability to classify approximately one-third of the variants. Further studies in large multiethnic cohorts with HbA1c, glycemic, and erythrocytic traits are required to better determine the biological action of the unclassified variants. Conclusions: As G6PD deficiency can be clinically silent until illness strikes, we recommend investigation of the possible benefits of screening for the G6PD genotype along with using HbA1c to diagnose T2D in populations of African ancestry or groups where G6PD deficiency is common. Screening with direct glucose measurements, or genetically-informed HbA1c diagnostic thresholds in people with G6PD deficiency, may be required to avoid missed or delayed diagnoses.

220 citations

Journal Article
18 Oct 2021-Nature
TL;DR: This paper used exome sequencing to explore protein altering variants and their consequences in 454,787 UK Biobank study participants and identified 12 million coding variants, including ~1 million loss-of-function and ~1.8 million deleterious missense variants.
Abstract: A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing to explore protein altering variants and their consequences in 454,787 UK Biobank study participants. We identified 12 million coding variants, including ~1 million loss-of-function and ~1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10^-11. Rare variant associations were enriched in GWAS loci, but most (91%) were independent of common variant signals. We discover several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as novel risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). 81% of signals available and powered for replication were confirmed in an independent cohort; furthermore, association signals were generally consistent across European, Asian and African ancestry individuals. We illustrate the ability of exome sequencing to identify novel gene-trait associations, elucidate gene function, and pinpoint effector genes underlying GWAS signals at scale.

217 citations


Cited by
Journal Article
TL;DR: Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: [email protected]

43,862 citations

Journal Article
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations

Journal Article
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation, focusing on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations

Journal Article
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, and 245 more authors (29 institutions)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations