Author

Gonçalo R. Abecasis

Bio: Gonçalo R. Abecasis is an academic researcher from the University of Michigan. The author has contributed to research on the topics Genome-wide association study and Population. The author has an h-index of 179 and has co-authored 595 publications receiving 230,323 citations. Previous affiliations of Gonçalo R. Abecasis include Johns Hopkins University School of Medicine and the Wellcome Trust Centre for Human Genetics.


Papers
Posted Content
23 Dec 2015-bioRxiv
TL;DR: A reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole genome sequence data from 20 studies of predominantly European ancestry is described, leading to a large increase in the number of SNPs tested in association studies.
Abstract: We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1%, a large increase in the number of SNPs tested in association studies and can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.

20 citations

Journal Article
28 Mar 2018-Heredity
TL;DR: An IBD-based approach (GREML-IBD) was applied to estimate heritability in unrelated individuals using phenotypic simulation with thousands of whole-genome sequences across a range of stratification, polygenicity levels, and the minor allele frequencies of causal variants (CVs).
Abstract: Heritability is a fundamental parameter in genetics. Traditional estimates based on family or twin studies can be biased due to shared environmental or non-additive genetic variance. Alternatively, those based on genotyped or imputed variants typically underestimate narrow-sense heritability contributed by rare or otherwise poorly tagged causal variants. Identical-by-descent (IBD) segments of the genome share all variants between pairs of chromosomes except new mutations that have arisen since the last common ancestor. Therefore, relating phenotypic similarity to degree of IBD sharing among classically unrelated individuals is an appealing approach to estimating the near full additive genetic variance while possibly avoiding biases that can occur when modeling close relatives. We applied an IBD-based approach (GREML-IBD) to estimate heritability in unrelated individuals using phenotypic simulation with thousands of whole-genome sequences across a range of stratification, polygenicity levels, and the minor allele frequencies of causal variants (CVs). In simulations, the IBD-based approach produced unbiased heritability estimates, even when CVs were extremely rare, although precision was low. However, population stratification and non-genetic familial environmental effects shared across generations led to strong biases in IBD-based heritability. We used data on two traits in ~120,000 people from the UK Biobank to demonstrate that, depending on the trait and possible confounding environmental effects, GREML-IBD can be applied to very large genetic datasets to infer the contribution of very rare variants lost using other methods. However, we observed apparent biases in these real data, suggesting that more work may be required to understand and mitigate factors that influence IBD-based heritability estimates.

19 citations

Journal Article
TL;DR: This work evaluates the utility, in terms of trial size, duration, and cost, of enriching prevention trial samples by combining clinical information with genetic risk scores to identify individuals at greater risk of disease, and shows that the resulting savings should increase as the list of robustly associated markers for each disease grows and as large samples of genotyped individuals become available.
Abstract: Clinical trials for preventative therapies are complex and costly endeavors focused on individuals likely to develop disease in a short time frame, randomizing them to treatment groups, and following them over time. In such trials, statistical power is governed by the rate of disease events in each group and cost is determined by randomization, treatment, and follow-up. Strategies that increase the rate of disease events by enrolling individuals with high risk of disease can significantly reduce study size, duration, and cost. Comprehensive study of common, complex diseases has resulted in a growing list of robustly associated genetic markers. Here, we evaluate the utility--in terms of trial size, duration, and cost--of enriching prevention trial samples by combining clinical information with genetic risk scores to identify individuals at greater risk of disease. We also describe a framework for utilizing genetic risk scores in these trials and evaluating the associated cost and time savings. With type 1 diabetes (T1D), type 2 diabetes (T2D), myocardial infarction (MI), and advanced age-related macular degeneration (AMD) as examples, we illustrate the potential and limitations of using genetic data for prevention trial design. We illustrate settings where incorporating genetic information could reduce trial cost or duration considerably, as well as settings where potential savings are negligible. Results are strongly dependent on the genetic architecture of the disease, but we also show that these benefits should increase as the list of robustly associated markers for each disease grows and as large samples of genotyped individuals become available.
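The link between event rate and trial size described in this abstract can be illustrated with a textbook sample-size approximation for comparing two proportions. This is only a sketch: the formula is the standard normal approximation, not the authors' framework, and the event rates (2% in a general sample, 10% in a hypothetical genetically enriched sample) and the 50% risk reduction are made-up values.

```python
from math import ceil
from statistics import NormalDist


def per_arm_sample_size(control_rate, relative_risk, alpha=0.05, power=0.8):
    """Approximate participants per arm needed to detect a difference
    between two event proportions (normal approximation, two-sided test)."""
    treated_rate = control_rate * relative_risk
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = (control_rate * (1 - control_rate)
                + treated_rate * (1 - treated_rate))
    effect = control_rate - treated_rate
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical event rates over the trial period (assumptions, not study data).
general = per_arm_sample_size(control_rate=0.02, relative_risk=0.5)
enriched = per_arm_sample_size(control_rate=0.10, relative_risk=0.5)
print(general, enriched)
```

With the same relative effect of treatment, the enriched sample needs far fewer participants per arm, which is the mechanism behind the cost and duration savings the abstract discusses.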

19 citations

Journal Article
TL;DR: Screening for the MEPE LoF mutations before adulthood could potentially prevent osteoporosis and fractures due to the lifelong effect on BMD observed in the study.
Abstract: A major challenge in genetic association studies is that most associated variants fall in the non-coding part of the human genome. We searched for variants associated with bone mineral density (BMD) after enriching the discovery cohort for loss-of-function (LoF) mutations by sequencing a subset of the Nord-Trøndelag Health Study, followed by imputation in the remaining sample (N = 19,705), and identified ten known BMD loci. However, one previously unreported variant, a LoF mutation in MEPE, p.(Lys70IlefsTer26) (minor allele frequency [MAF] = 0.8%), was associated with decreased ultradistal forearm BMD (P-value = 2.1 × 10^−18), and increased osteoporosis (P-value = 4.2 × 10^−5) and fracture risk (P-value = 1.6 × 10^−5). The MEPE LoF association with BMD and fractures was further evaluated in 279,435 UK (MAF = 0.05%, heel bone estimated BMD P-value = 1.2 × 10^−16, any fracture P-value = 0.05) and 375,984 Icelandic samples (MAF = 0.03%, arm BMD P-value = 0.12, forearm fracture P-value = 0.005). Screening for the MEPE LoF mutations before adulthood could potentially prevent osteoporosis and fractures due to the lifelong effect on BMD observed in the study. A key implication for precision medicine is that high-impact functional variants missing from the publicly available cosmopolitan panels could be clinically more relevant than polygenic risk scores. Bone mineral density (BMD) is associated with fracture risk and many genetic loci with small effect sizes have been discovered by genome-wide association studies (GWAS). Here, the authors discover a large-effect rare loss-of-function genetic variant for BMD in the MEPE gene in the Norwegian HUNT study which replicates in the UK Biobank.

19 citations

Journal Article
TL;DR: In this paper, a 2-sample mendelian randomization was used to assess whether smoking, alcohol consumption, blood pressure, body mass index, and glycemic traits are associated with increased risk of advanced age-related macular degeneration.
Abstract: Importance: Advanced age-related macular degeneration (AMD) is a leading cause of blindness in Western countries. Causal, modifiable risk factors need to be identified to develop preventive measures for advanced AMD. Objective: To assess whether smoking, alcohol consumption, blood pressure, body mass index, and glycemic traits are associated with increased risk of advanced AMD. Design, Setting, Participants: This study used 2-sample mendelian randomization. Genetic instruments composed of variants associated with risk factors at genome-wide significance (P < 5 × 10^-8) were obtained from published genome-wide association studies. Summary-level statistics for these instruments were obtained for advanced AMD from the International AMD Genomics Consortium 2016 data set, which consisted of 16 144 individuals with AMD and 17 832 control individuals. Data were analyzed from July 2020 to September 2021. Exposures: Smoking initiation, smoking cessation, lifetime smoking, age at smoking initiation, alcoholic drinks per week, body mass index, systolic and diastolic blood pressure, type 2 diabetes, glycated hemoglobin, fasting glucose, and fasting insulin. Main Outcomes and Measures: Advanced AMD and its subtypes, geographic atrophy (GA), and neovascular AMD. Results: A 1-SD increase in log odds of genetically predicted smoking initiation was associated with higher risk of advanced AMD (odds ratio [OR], 1.26; 95% CI, 1.13-1.40; P < .001), while a 1-SD increase in log odds of genetically predicted smoking cessation (former vs current smoking) was associated with lower risk of advanced AMD (OR, 0.66; 95% CI, 0.50-0.87; P = .003). Genetically predicted increased lifetime smoking was associated with increased risk of advanced AMD (OR per 1-SD increase in lifetime smoking behavior, 1.32; 95% CI, 1.09-1.59; P = .004). Genetically predicted alcohol consumption was associated with higher risk of GA (OR per 1-SD increase of log-transformed alcoholic drinks per week, 2.70; 95% CI, 1.48-4.94; P = .001). There was insufficient evidence to suggest that genetically predicted blood pressure, body mass index, and glycemic traits were associated with advanced AMD. Conclusions and Relevance: This study provides genetic evidence that increased alcohol intake may be a causal risk factor for GA. As there are currently no known treatments for GA, this finding has important public health implications. These results also support previous observational studies associating smoking behavior with risk of advanced AMD, thus reinforcing existing public health messages regarding the risk of blindness associated with smoking.
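As a rough illustration of how 2-sample mendelian randomization combines summary statistics, the sketch below computes a fixed-effect inverse-variance weighted (IVW) estimate. The abstract does not state which estimator the study used, so IVW is an assumption chosen because it is a common default, and every number in the example is invented.

```python
from math import exp, sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted causal estimate.

    Each variant contributes the ratio beta_outcome / beta_exposure,
    weighted by beta_exposure**2 / se_outcome**2.
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, sqrt(1 / total)


# Invented per-variant effects: exposure in SD units, outcome in log odds.
beta_x = [0.10, 0.20, 0.15]
beta_y = [0.05, 0.10, 0.075]
se_y = [0.01, 0.02, 0.015]

log_or, se = ivw_estimate(beta_x, beta_y, se_y)
odds_ratio = exp(log_or)  # odds ratio per 1-SD increase in the exposure
print(odds_ratio, se)
```

Exponentiating the pooled estimate gives an odds ratio per 1-SD increase in the exposure, which is the scale on which the results above are reported.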

19 citations


Cited by
Journal Article
TL;DR: The Burrows-Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT), is implemented to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net
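The backward search mentioned in this abstract can be sketched in a few lines. The toy version below counts exact occurrences of a pattern only. It does not handle the mismatches and gaps that BWA supports, it builds the transform by naive rotation sorting, and all function names are illustrative rather than taken from BWA.

```python
def bwt(text):
    """Burrows-Wheeler transform; '$' is appended as a unique terminator."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def build_index(bwt_str):
    """Return (C, occ): C[c] counts characters sorting before c,
    occ[c][i] counts occurrences of c in bwt_str[:i]."""
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, ch in enumerate(bwt_str):
        for key in occ:
            occ[key][i + 1] = occ[key][i] + (1 if key == ch else 0)
    return c_table, occ


def count_matches(pattern, c_table, occ, n):
    """Count exact occurrences by extending the match one character at a
    time from the end of the pattern (backward search)."""
    lo, hi = 0, n  # half-open range of rows in the sorted rotation list
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


reference = "GATTACAGATTACA"  # toy reference, not real data
transformed = bwt(reference)
c_table, occ = build_index(transformed)
print(count_matches("ATTA", c_table, occ, len(transformed)))
```

Each step narrows a range of sorted rotations using only the two lookup tables, so the cost of a query depends on the pattern length rather than the reference length. Production aligners store compressed versions of these tables instead of the full arrays used here.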

43,862 citations

Journal Article
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations

Journal Article
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. It focuses on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
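To make the identity-by-state idea concrete, the sketch below computes the proportion of alleles two individuals share across markers. It assumes genotypes coded as counts of one allele (0, 1, or 2) with None for missing calls. This is a generic textbook calculation with invented data and an illustrative function name, not PLINK's implementation.

```python
def ibs_proportion(genotypes_a, genotypes_b):
    """Proportion of alleles shared identical by state across markers.

    Genotypes are allele counts (0, 1, 2); None marks a missing call.
    At each marker the pair shares 2 - |a - b| alleles out of 2.
    """
    pairs = [(a, b) for a, b in zip(genotypes_a, genotypes_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no markers with both genotypes present")
    shared = sum(2 - abs(a - b) for a, b in pairs)
    return shared / (2 * len(pairs))


# Invented genotypes at five markers for two individuals.
person_1 = [0, 1, 2, 2, None]
person_2 = [0, 1, 0, 1, 2]
print(ibs_proportion(person_1, person_2))
```

Identity by state only records that alleles match. Inferring identity by descent, as the abstract describes, additionally requires allele frequencies and a model of relatedness.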

26,280 citations

Journal Article
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, and 245 more authors (29 institutions)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations