Showing papers by "Gonçalo R. Abecasis" published in 2013


Journal ArticleDOI
Cristen J. Willer1, Ellen M. Schmidt1, Sebanti Sengupta1, Gina M. Peloso2  +316 moreInstitutions (87)
TL;DR: It is found that loci associated with blood lipid levels are often associated with cardiovascular and metabolic traits, including coronary artery disease, type 2 diabetes, blood pressure, waist-hip ratio and body mass index.
Abstract: Levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides and total cholesterol are heritable, modifiable risk factors for coronary artery disease. To identify new loci and refine known loci influencing these lipids, we examined 188,577 individuals using genome-wide and custom genotyping arrays. We identify and annotate 157 loci associated with lipid levels at P < 5 × 10⁻⁸, including 62 loci not previously associated with lipid levels in humans. Using dense genotyping in individuals of European, East Asian, South Asian and African ancestry, we narrow association signals in 12 loci. We find that loci associated with blood lipid levels are often associated with cardiovascular and metabolic traits, including coronary artery disease, type 2 diabetes, blood pressure, waist-hip ratio and body mass index. Our results demonstrate the value of using genetic data from individuals of diverse ancestry and provide insights into the biological mechanisms regulating blood lipids to guide future genetic, biological and therapeutic research.

2,585 citations


Journal ArticleDOI
Liuqing Yang, Chunru Lin, Chunyu Jin, Joy C. Yang  +165 moreInstitutions (1)

1,514 citations


Journal ArticleDOI
10 Jan 2013-Nature
TL;DR: The results better delimit the historical details of human protein-coding variation, show the profound effect of recent human history on the burden of deleterious SNVs segregating in contemporary populations, and provide important practical information that can be used to prioritize variants in disease-gene discovery.
Abstract: Establishing the age of each mutation segregating in contemporary human populations is important to fully understand our evolutionary history and will help to facilitate the development of new approaches for disease-gene discovery. Large-scale surveys of human genetic variation have reported signatures of recent explosive population growth, notable for an excess of rare genetic variants, suggesting that many mutations arose recently. To more quantitatively assess the distribution of mutation ages, we resequenced 15,336 genes in 6,515 individuals of European American and African American ancestry and inferred the age of 1,146,401 autosomal single nucleotide variants (SNVs). We estimate that approximately 73% of all protein-coding SNVs and approximately 86% of SNVs predicted to be deleterious arose in the past 5,000-10,000 years. The average age of deleterious SNVs varied significantly across molecular pathways, and disease genes contained a significantly higher proportion of recently arisen deleterious SNVs than other genes. Furthermore, European Americans had an excess of deleterious variants in essential and Mendelian disease genes compared to African Americans, consistent with weaker purifying selection due to the Out-of-Africa dispersal. Our results better delimit the historical details of human protein-coding variation, show the profound effect of recent human history on the burden of deleterious SNVs segregating in contemporary populations, and provide important practical information that can be used to prioritize variants in disease-gene discovery.

934 citations


Journal ArticleDOI
Ron Do1, Cristen J. Willer2, Ellen M. Schmidt2, Sebanti Sengupta2  +263 moreInstitutions (83)
TL;DR: It is suggested that triglyceride-rich lipoproteins causally influence risk for CAD, and the strength of a polymorphism's effect on triglyceride levels is correlated with the magnitude of its effect on CAD risk.
Abstract: Triglycerides are transported in plasma by specific triglyceride-rich lipoproteins; in epidemiological studies, increased triglyceride levels correlate with higher risk for coronary artery disease (CAD). However, it is unclear whether this association reflects causal processes. We used 185 common variants recently mapped for plasma lipids (P < 5 × 10⁻⁸ for each) to examine the role of triglycerides in risk for CAD. First, we highlight loci associated with both low-density lipoprotein cholesterol (LDL-C) and triglyceride levels, and we show that the direction and magnitude of the associations with both traits are factors in determining CAD risk. Second, we consider loci with only a strong association with triglycerides and show that these loci are also associated with CAD. Finally, in a model accounting for effects on LDL-C and/or high-density lipoprotein cholesterol (HDL-C) levels, the strength of a polymorphism's effect on triglyceride levels is correlated with the magnitude of its effect on CAD risk. These results suggest that triglyceride-rich lipoproteins causally influence risk for CAD.

817 citations


Journal ArticleDOI
Lars G. Fritsche1, Lars G. Fritsche2, Wei Chen1, Wei Chen3  +182 moreInstitutions (60)
TL;DR: A collaborative genome-wide association study, including >17,100 advanced AMD cases and >60,000 controls of European and Asian ancestry, identifies 19 loci associated at P < 5 × 10⁻⁸, which show enrichment for genes involved in the regulation of complement activity, lipid metabolism, extracellular matrix remodeling and angiogenesis.
Abstract: Age-related macular degeneration (AMD) is a common cause of blindness in older individuals. To accelerate the understanding of AMD biology and help design new therapies, we executed a collaborative genome-wide association study, including >17,100 advanced AMD cases and >60,000 controls of European and Asian ancestry. We identified 19 loci associated at P < 5 × 10⁻⁸. These loci show enrichment for genes involved in the regulation of complement activity, lipid metabolism, extracellular matrix remodeling and angiogenesis. Our results include seven loci with associations reaching P < 5 × 10⁻⁸ for the first time, near the genes COL8A1-FILIP1L, IER3-DDR1, SLC16A8, TGFBR1, RAD51B, ADAMTS9 and B3GALTL. A genetic risk score combining SNP genotypes from all loci showed similar ability to distinguish cases and controls in all samples examined. Our findings provide new directions for biological, genetic and therapeutic studies of AMD.

745 citations


Journal ArticleDOI
Sonja I. Berndt1, Stefan Gustafsson2, Stefan Gustafsson3, Reedik Mägi4  +382 moreInstitutions (117)
TL;DR: A genome-wide search for loci associated with the upper versus the lower 5th percentiles of body mass index, height and waist-to-hip ratio as well as clinical classes of obesity, including up to 263,407 individuals of European ancestry finds a large overlap in genetic structure and the distribution of variants between traits based on extremes and the general population and little etiological heterogeneity between obesity subgroups.
Abstract: Approaches exploiting trait distribution extremes may be used to identify loci associated with common traits, but it is unknown whether these loci are generalizable to the broader population. In a genome-wide search for loci associated with the upper versus the lower 5th percentiles of body mass index, height and waist-to-hip ratio, as well as clinical classes of obesity, including up to 263,407 individuals of European ancestry, we identified 4 new loci (IGFBP4, H6PD, RSRC1 and PPP2R2A) influencing height detected in the distribution tails and 7 new loci (HNF4G, RPTOR, GNAT2, MRPS33P4, ADCY9, HS6ST3 and ZZZ3) for clinical classes of obesity. Further, we find a large overlap in genetic structure and the distribution of variants between traits based on extremes and the general population and little etiological heterogeneity between obesity subgroups.

576 citations



Journal ArticleDOI
TL;DR: The value of sex-specific GWAS to unravel the sexually dimorphic genetic underpinning of complex traits is demonstrated, with no evidence for genetic effects with opposite directions in men versus women.
Abstract: Given the anthropometric differences between men and women and previous evidence of sex-difference in genetic effects, we conducted a genome-wide search for sexually dimorphic associations with height, weight, body mass index, waist circumference, hip circumference, and waist-to-hip-ratio (133,723 individuals) and took forward 348 SNPs into follow-up (additional 137,052 individuals) in a total of 94 studies. Seven loci displayed significant sex-difference (FDR < 5%), including four previously established (near GRB14/COBLL1, LYPLAL1/SLC30A10, VEGFA, ADAMTS9) and three novel anthropometric trait loci (near MAP3K1, HSD17B4, PPARG), all of which were genome-wide significant in women (P < 5 × 10⁻⁸), but not in men. Sex-differences were apparent only for waist phenotypes, not for height, weight, BMI, or hip circumference. Moreover, we found no evidence for genetic effects with opposite directions in men versus women. The PPARG locus is of specific interest due to its role in diabetes genetics and therapy. Our results demonstrate the value of sex-specific GWAS to unravel the sexually dimorphic genetic underpinning of complex traits.

402 citations



Journal ArticleDOI
TL;DR: Exome array genotyping is a valuable approach to identify low-frequency variants that contribute to complex traits and it is demonstrated that the interpretation of single-variant and gene-based tests needs to consider the effects of noncoding SNPs both nearby and megabases away.
Abstract: Karen Mohlke, Markku Laakso, Michael Boehnke and colleagues report the first application of the Illumina HumanExome Beadchip array, examining association with insulin and glycemic traits in 8,229 nondiabetic Finnish males from the population-based Metabolic Syndrome in Men (METSIM) study. They identify low-frequency coding variants at both known and newly associated loci with insulin processing and secretion.

282 citations


Journal ArticleDOI
TL;DR: Two large-effect rare coding variants associated with a large increase in risk of age-related macular degeneration suggest decreased inhibition of C3 by complement factor H, resulting in increased activation of the alternative complement pathway, as a key component of disease biology.
Abstract: Macular degeneration is a common cause of blindness in the elderly. To identify rare coding variants associated with a large increase in risk of age-related macular degeneration (AMD), we sequenced 2,335 cases and 789 controls in 10 candidate loci (57 genes). To increase power, we augmented our control set with ancestry-matched exome-sequenced controls. An analysis of coding variation in 2,268 AMD cases and 2,268 ancestry-matched controls identified 2 large-effect rare variants: previously described p.Arg1210Cys encoded in the CFH gene (case frequency (fcase) = 0.51%; control frequency (fcontrol) = 0.02%; odds ratio (OR) = 23.11) and newly identified p.Lys155Gln encoded in the C3 gene (fcase = 1.06%; fcontrol = 0.39%; OR = 2.68). The variants suggest decreased inhibition of C3 by complement factor H, resulting in increased activation of the alternative complement pathway, as a key component of disease biology.
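As a rough illustration of how the quoted frequencies relate to an odds ratio, the sketch below computes a crude odds ratio directly from the rounded case and control frequencies given in the abstract. This is a back-of-the-envelope calculation under stated assumptions, not the authors' analysis: the helper name `crude_odds_ratio` is hypothetical, the inputs are rounded, and no adjustment for ancestry matching or covariates is made, so the values it produces differ somewhat from the published estimates (23.11 and 2.68).

```python
def crude_odds_ratio(f_case: float, f_control: float) -> float:
    """Odds of the variant in cases divided by the odds in controls.

    Hypothetical helper for illustration only. Both arguments are
    frequencies strictly between 0 and 1; covariates, ancestry matching
    and sampling error are ignored.
    """
    if not (0.0 < f_case < 1.0 and 0.0 < f_control < 1.0):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    return (f_case / (1.0 - f_case)) / (f_control / (1.0 - f_control))


# Rounded frequencies quoted in the abstract, converted from percent.
cfh_or = crude_odds_ratio(0.0051, 0.0002)  # CFH p.Arg1210Cys
c3_or = crude_odds_ratio(0.0106, 0.0039)   # C3 p.Lys155Gln

print(f"CFH crude OR: {cfh_or:.1f}")
print(f"C3 crude OR:  {c3_or:.1f}")
```

Because the control frequency for the CFH variant is tiny, small rounding changes in it move the crude ratio considerably, which is one reason the crude value should not be expected to reproduce the reported one.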

Journal ArticleDOI
Matthijs J. H. M. van der Loos1, Cornelius A. Rietveld1, Niina Eklund2, Niina Eklund3, Philipp Koellinger1, Fernando Rivadeneira1, Gonçalo R. Abecasis4, Georgina A. Ankra-Badu5, Sebastian E. Baumeister6, Daniel J. Benjamin7, Reiner Biffar6, Stefan Blankenberg8, Dorret I. Boomsma9, David Cesarini10, Francesco Cucca11, Eco J. C. de Geus9, George Dedoussis12, Panos Deloukas13, Maria Dimitriou12, Gudny Eiriksdottir, Johan G. Eriksson, Christian Gieger, Vilmundur Gudnason14, Birgit Höhne, Rolf Holle, Jouke-Jan Hottenga9, Aaron Isaacs1, Marjo-Riitta Järvelin15, Marjo-Riitta Järvelin16, Marjo-Riitta Järvelin3, Magnus Johannesson17, Marika Kaakinen16, Mika Kähönen, Stavroula Kanoni13, Maarit A. Laaksonen3, Jari Lahti2, Lenore J. Launer18, Terho Lehtimäki, Marisa Loitfelder19, Patrik K. E. Magnusson20, Silvia Naitza11, Ben A. Oostra1, Markus Perola21, Markus Perola2, Markus Perola18, Katja Petrovic19, Lydia Quaye5, Olli T. Raitakari22, Samuli Ripatti3, Samuli Ripatti2, Samuli Ripatti13, Paul Scheet23, David Schlessinger18, Carsten Oliver Schmidt6, Helena Schmidt19, Reinhold Schmidt19, Andrea Senft24, Albert V. Smith14, Tim D. Spector5, Ida Surakka3, Ida Surakka2, Rauli Svento16, Antonio Terracciano25, Antonio Terracciano18, Emmi Tikkanen2, Emmi Tikkanen3, Cornelia M. van Duijn1, Jorma Viikari22, Henry Völzke6, H.-Erich Wichmann26, Philipp S. Wild27, Sara M. Willems1, Gonneke Willemsen9, Frank J. A. van Rooij1, Patrick J. F. Groenen1, André G. Uitterlinden1, Albert Hofman1, Roy Thurik1 
04 Apr 2013-PLOS ONE
TL;DR: Common SNPs, when considered jointly, explain about half of the narrow-sense heritability of self-employment estimated in twin data (σ_g²/σ_P² = 25%, h² = 55%), but a genome-wide meta-analysis identified no genome-wide significant SNPs.
Abstract: Economic variables such as income, education, and occupation are known to affect mortality and morbidity, such as cardiovascular disease, and have also been shown to be partly heritable. However, very little is known about which genes influence economic variables, although these genes may have both a direct and an indirect effect on health. We report results from the first large-scale collaboration that studies the molecular genetic architecture of an economic variable, entrepreneurship, that was operationalized using self-employment, a widely available proxy. Our results suggest that common SNPs when considered jointly explain about half of the narrow-sense heritability of self-employment estimated in twin data (σ_g²/σ_P² = 25%, h² = 55%). However, a meta-analysis of genome-wide association studies across sixteen studies comprising 50,627 participants did not identify genome-wide significant SNPs. 58 SNPs with p < 10⁻⁵ were tested in a replication sample (n = 3,271), but none replicated. Furthermore, a gene-based test shows that none of the genes that were previously suggested in the literature to influence entrepreneurship reveal significant associations. Finally, SNP-based genetic scores that use results from the meta-analysis capture less than 0.2% of the variance in self-employment in an independent sample (p≥0.039). Our results are consistent with a highly polygenic molecular genetic architecture of self-employment, with many genetic variants of small effect. Although self-employment is a multi-faceted, heavily environmentally influenced, and biologically distal trait, our results are similar to those for other genetically complex and biologically more proximate outcomes, such as height, intelligence, personality, and several diseases.
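The "about half" statement can be checked from the two figures quoted in the abstract. As a worked ratio, assuming both quantities are expressed as fractions of the same phenotypic variance:

```latex
\frac{\sigma_g^2 / \sigma_P^2}{h^2} = \frac{0.25}{0.55} \approx 0.45
```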

Journal ArticleDOI
TL;DR: The methods reduced false-discovery rates and increased the number of expression quantitative trait loci (eQTLs) mapped either locally or at a distance, and used new statistical methods for dimension reduction to account for nongenetic effects in estimates of expression levels.
Abstract: Expression quantitative trait loci (eQTLs) provide insights into the regulation of transcription and aid in interpretation of genome-wide association studies (GWASs) (Stranger et al. 2005, 2007a,b; Dixon et al. 2007; Moffatt et al. 2007; Cookson et al. 2009; Heid et al. 2010; Hsu et al. 2010; Lango Allen et al. 2010; Speliotes et al. 2010; Chu et al. 2011). Transcript abundances for 40%–70% of genes are heritable, but only 25%–35% of the heritable component in expression levels has been explained by the eQTLs so far identified (Dixon et al. 2007; Goring et al. 2007; Stranger et al. 2007a,b; Emilsson et al. 2008). The lack of eQTLs for many heritable transcript abundances may be due to multiple factors. These include the limited sample sizes of previous studies, high signal noise in microarray measurements of transcript abundances, variation in biological and technical factors that increase measurement errors in gene expression abundance, limited coverage of genetic variation using commercial genotyping platforms, and incomplete coverage of the transcriptome by gene expression arrays. In order to increase the power of eQTL mapping and to build a more complete map of single nucleotide polymorphisms (SNPs) influencing gene expression, we have expanded our previous analysis (Dixon et al. 2007) by including data generated using newer whole-genome gene expression arrays. We have refined our analyses using newly developed statistical methods (Leek and Storey 2007; Stegle et al. 2010) together with an expanded catalog of genetic variation generated by the 1000 Genomes Project. In this introduction, we first briefly review the rationale for each of these refinements. Variation in the conditions and timing of experiments and operator characteristics may introduce variation in the measurements of transcript abundances, as may batch effects on the manufacture of microarray chips (Akey et al. 2007). 
Biological conditions such as stage of the cells when RNA is extracted and other unknown factors may also form important influences on the measurement of gene expression. Despite these confounders, the deep information among the thousands of transcripts on microarrays may be used to improve the accuracy of gene expression measurements. All probes on an individual microarray undergo identical experimental conditions that can be summarized by dimension reduction methods, such as principal components analysis (PCA) or factor analysis (Leek and Storey 2007; Stegle et al. 2010). We systematically evaluate this strategy in our data sets and show that the top principal components (PCs) of gene expression are highly correlated with RNA extraction and cDNA synthesis dates, the date that the sample was fragmented, and the date of chip hybridization. We go on to show that including these PCs in downstream analyses reduces false positives and increases power for both local and distant eQTLs. Commonly used gene expression microarrays are manufactured using chip designs that may lead to differential coverage of the transcriptome. For example, the probesets on the Affymetrix U133 Plus 2 chip consist of multiple probes, each 25 bp long. The probeset level intensity combining all probes is used as the measure of transcript abundance. On the other hand, the Illumina Human6 V1 array has only one probe of 50 bp long per transcript. Affymetrix and Illumina probes may sit in different positions in a gene and, as a consequence, produce different intensities of gene expression measurements. In addition, the genes that are represented on an array may differ between platforms, so that only 7601 genes are covered by both the Affymetrix and Illumina microarrays discussed above. 
Newer chip designs such as the Affymetrix Human Gene 1.0 ST arrays are more inclusive, and RNA sequencing can now provide comprehensive coverage of the transcriptome, although its cost and complexity still limit its utility. While waiting for the technology to evolve, it is important to recognize that individual eQTL detection may be limited by the experimental platform chosen. Genotype imputation is commonly used to increase the power and coverage of individual GWASs and to facilitate meta-analysis across studies utilizing different genotyping platforms (Scott et al. 2007; Wellcome Trust Case Control Consortium 2007; Sanna et al. 2008; Willer et al. 2008). To date, most studies using genotype imputation have used HapMap samples as a template reference panel (Frazer et al. 2007). The 1000 Genomes Project Consortium (in the following text abbreviated as 1000G) (1000 Genomes Project Consortium 2010; http://www.1000genomes.org) aims to develop a comprehensive catalog of human genetic variants, both SNPs and structural variants, with allele frequencies down to 1%. One immediate benefit from this project is a deeper and broader reference panel of variants for genotype imputation. Common SNPs that were implicitly tested for association by being tagged by one or more HapMap SNPs may now be directly imputed and tested. In this study we use two large gene expression data sets from nuclear families ascertained through a child with asthma using the Affymetrix Hu133A platform (the MRCA panel) (Dixon et al. 2007) or eczema using the Illumina bead array platform (the MRCE panel) (Morar et al. 2007). The study of families allows estimation of heritability for each expression trait. We compare the power of eQTL mapping using imputation of the new reference panel of 8 million SNPs and imputation of HapMap SNPs. We are able to identify new eQTLs and categorize them by allele frequency, genome coverage, effect size, and trait heritability.
We have defined local associations as those in which the expression SNP (eSNP) and the gene lie within 1 Mb of each other on the same chromosome (the equivalent of cis), and distant associations as those in which the eSNP lies >1 Mb from the gene on the same chromosome or on a different chromosome (the equivalent of trans).
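The local versus distant definition above can be sketched as a small classifier. This is a minimal illustration, not the authors' code: the `Locus` type and `classify_association` function are hypothetical names, and representing a gene by a single coordinate (for example, a transcript start or midpoint) is an assumption, since the text does not say how gene position was measured.

```python
from dataclasses import dataclass

LOCAL_WINDOW_BP = 1_000_000  # the 1 Mb threshold stated in the text


@dataclass(frozen=True)
class Locus:
    """A genomic position; 'pos' is a single base-pair coordinate (assumption)."""
    chrom: str
    pos: int


def classify_association(esnp: Locus, gene: Locus,
                         window_bp: int = LOCAL_WINDOW_BP) -> str:
    """Return 'local' (cis-like) or 'distant' (trans-like) for an eSNP-gene pair."""
    same_chrom = esnp.chrom == gene.chrom
    if same_chrom and abs(esnp.pos - gene.pos) <= window_bp:
        return "local"
    # Either more than 1 Mb apart on the same chromosome, or on different chromosomes.
    return "distant"


# Usage with made-up coordinates:
gene = Locus("17", 38_000_000)
print(classify_association(Locus("17", 38_400_000), gene))  # 0.4 Mb away, same chromosome
print(classify_association(Locus("17", 45_000_000), gene))  # 7 Mb away, same chromosome
print(classify_association(Locus("5", 38_000_000), gene))   # different chromosome
```

A pair exactly 1 Mb apart is treated as local here, following the "within 1 Mb" wording; that boundary choice is an interpretation.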

Journal ArticleDOI
02 Aug 2013-Science
TL;DR: A putative age for coalescence of ~180,000 to 200,000 years ago is calculated, which is consistent with previous mitochondrial DNA–based estimates and indicates the presumptive timing of coalescence with other human populations.
Abstract: Genetic variation within the male-specific portion of the Y chromosome (MSY) can clarify the origins of contemporary populations, but previous studies were hampered by partial genetic information. Population sequencing of 1204 Sardinian males identified 11,763 MSY single-nucleotide polymorphisms, 6751 of which have not previously been observed. We constructed a MSY phylogenetic tree containing all main haplogroups found in Europe, along with many Sardinian-specific lineage clusters within each haplogroup. The tree was calibrated with archaeological data from the initial expansion of the Sardinian population ~7700 years ago. The ages of nodes highlight different genetic strata in Sardinia and reveal the presumptive timing of coalescence with other human populations. We calculate a putative age for coalescence of ~180,000 to 200,000 years ago, which is consistent with previous mitochondrial DNA–based estimates.


Journal ArticleDOI
TL;DR: It is suggested that genetic loci for BMI can vary in their effects across the life course, underscoring the importance of evaluating BMI at different ages.
Abstract: Genetic loci for body mass index (BMI) in adolescence and young adulthood, a period of high risk for weight gain, are understudied, yet may yield important insight into the etiology of obesity and early intervention. To identify novel genetic loci and examine the influence of known loci on BMI during this critical time period in late adolescence and early adulthood, we performed a two-stage meta-analysis using 14 genome-wide association studies in populations of European ancestry with data on BMI between ages 16 and 25 in up to 29 880 individuals. We identified seven independent loci (P < 5.0 × 10⁻⁸) near FTO (P = 3.72 × 10⁻²³), TMEM18 (P = 3.24 × 10⁻¹⁷), MC4R (P = 4.41 × 10⁻¹⁷), TNNI3K (P = 4.32 × 10⁻¹¹), SEC16B (P = 6.24 × 10⁻⁹), GNPDA2 (P = 1.11 × 10⁻⁸) and POMC (P = 4.94 × 10⁻⁸) as well as a potential secondary signal at the POMC locus (rs2118404, P = 2.4 × 10⁻⁵ after conditioning on the established single-nucleotide polymorphism at this locus) in adolescents and young adults. To evaluate the impact of the established genetic loci on BMI at these young ages, we examined differences between the effect sizes of 32 published BMI loci in European adult populations (aged 18–90) and those observed in our adolescent and young adult meta-analysis. Four loci (near PRKD1, TNNI3K, SEC16B and CADM2) had larger effects and one locus (near SH2B1) had a smaller effect on BMI during adolescence and young adulthood compared with older adults (P < 0.05). These results suggest that genetic loci for BMI can vary in their effects across the life course, underscoring the importance of evaluating BMI at different ages.

Journal ArticleDOI
14 Mar 2013-Nature
TL;DR: A correction to Nature 493, 216–220 (2013) adding Mark J. Rieder, who oversaw data generation and quality control and is a member of the Seattle Grand Opportunity group, to the author list.
Abstract: Nature 493, 216–220 (2013); doi:10.1038/nature11690 In this Letter, Mark J. Rieder (Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA) was inadvertently omitted from the author list. He oversaw data generation and quality control and is a member of the Seattle Grand Opportunity group.

Journal ArticleDOI
TL;DR: This study identifies a functional genetic variant in IL6R influencing disease prognosis and specifically predisposing to persistent AD, and supports the importance of genetic variants influencing inflammation in the etiology of AD.
Abstract: Background: Atopic dermatitis (AD) is a common inflammatory skin disease. Previous studies have revealed shared genetic determinants among different inflammatory disorders, suggesting that markers associated with immune-related traits might also play a role in AD. Objective: We sought to identify novel genetic risk factors for AD. Methods: We examined the results of all genome-wide association studies from a public repository and selected 318 genetic markers that were significantly associated with any inflammatory trait. These markers were considered candidates and tested for association with AD in a 3-step approach including 7 study populations with 7130 patients with AD and 9253 control subjects. Results: A functional amino acid change in the IL-6 receptor (IL-6R Asp358Ala; rs2228145) was significantly associated with AD (odds ratio [OR], 1.15; P = 5 × 10⁻⁹). Interestingly, investigation of 2 independent population-based birth cohorts showed that IL-6R 358Ala specifically predisposes to the persistent form of AD (OR persistent AD = 1.22, P = .0008; OR transient AD = 1.04, P = .54). This variant determines the balance between the classical membrane-bound versus soluble IL-6R signaling pathways. Carriers of 358Ala had increased serum levels of soluble IL-6R (P = 4 × 10⁻¹⁴), with homozygote carriers showing a 2-fold increase. Moreover, we demonstrate that soluble IL-6R levels were higher in patients with AD than in control subjects (46.0 vs 37.8 ng/mL, P = .001). Additional AD risk variants were identified in RAD50, RUNX3, and ERBB3. Conclusion: Our study supports the importance of genetic variants influencing inflammation in the etiology of AD. Moreover, we identified a functional genetic variant in IL6R influencing disease prognosis and specifically predisposing to persistent AD.

Journal ArticleDOI
TL;DR: SNPs affecting body mass index (BMI), combined according to their effect on BMI, correlate with smoking initiation and cigarettes smoked per day; the results strongly point to a common biological basis of the regulation of appetite for tobacco and food, and thus the vulnerability to nicotine addiction and obesity.
Abstract: Smoking influences body weight such that smokers weigh less than non-smokers and smoking cessation often leads to weight increase. The relationship between body weight and smoking is partly explained by the effect of nicotine on appetite and metabolism. However, the brain reward system is involved in the control of the intake of both food and tobacco. We evaluated the effect of single-nucleotide polymorphisms (SNPs) affecting body mass index (BMI) on smoking behavior, and tested the 32 SNPs identified in a meta-analysis for association with two smoking phenotypes, smoking initiation (SI) and the number of cigarettes smoked per day (CPD) in an Icelandic sample (N=34 216 smokers). Combined according to their effect on BMI, the SNPs correlate with both SI (r=0.019, P=0.00054) and CPD (r=0.032, P=8.0 × 10⁻⁷). These findings replicate in a second large data set (N=127 274, of whom 76 242 are smokers) for both SI (P=1.2 × 10⁻⁵) and CPD (P=9.3 × 10⁻⁵). Notably, the variant most strongly associated with BMI (rs1558902-A in FTO) did not associate with smoking behavior. The association with smoking behavior is not due to the effect of the SNPs on BMI. Our results strongly point to a common biological basis of the regulation of our appetite for tobacco and food, and thus the vulnerability to nicotine addiction and obesity.

Journal ArticleDOI
TL;DR: A method for genotype calling in settings where sequence data is available for unrelated individuals and parent-offspring trios is described and it is shown that modeling trio information can greatly increase the accuracy of inferred genotypes and haplotypes, especially on low to modest depth sequencing data.
Abstract: Emerging sequencing technologies allow common and rare variants to be systematically assayed across the human genome in many individuals. In order to improve variant detection and genotype calling, raw sequence data are typically examined across many individuals. Here, we describe a method for genotype calling in settings where sequence data are available for unrelated individuals and parent-offspring trios and show that modeling trio information can greatly increase the accuracy of inferred genotypes and haplotypes, especially on low to modest depth sequencing data. Our method considers both linkage disequilibrium (LD) patterns and the constraints imposed by family structure when assigning individual genotypes and haplotypes. Using simulations, we show that trios provide higher genotype calling accuracy across the frequency spectrum, both overall and at hard-to-call heterozygous sites. In addition, trios provide greatly improved phasing accuracy--improving the accuracy of downstream analyses (such as genotype imputation) that rely on phased haplotypes. To further evaluate our approach, we analyzed data on the first 508 individuals sequenced by the SardiNIA sequencing project. Our results show that our method reduces the genotyping error rate by 50% compared with analysis using existing methods that ignore family structure. We anticipate our method will facilitate genotype calling and haplotype inference for many ongoing sequencing projects.
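To make the role of family structure concrete, the sketch below shows how Mendelian transmission constrains a child's genotype at a single biallelic site when per-individual genotype likelihoods are available. It is a deliberately simplified illustration and not the method described in the abstract: it ignores linkage disequilibrium and de novo mutation, assumes Hardy-Weinberg priors for the parents, and all function names are hypothetical.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of copies of the alternate allele


def hwe_prior(alt_freq):
    """Hardy-Weinberg genotype prior for a founder (assumption of this sketch)."""
    p = alt_freq
    return ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)


def child_given_parents(child, father, mother):
    """Mendelian probability of the child's genotype given both parents."""
    f, m = father / 2.0, mother / 2.0  # chance each parent transmits the alt allele
    if child == 0:
        return (1 - f) * (1 - m)
    if child == 1:
        return f * (1 - m) + (1 - f) * m
    return f * m


def child_posterior(lik_father, lik_mother, lik_child, alt_freq):
    """Posterior over the child's genotype, summing over parental genotypes."""
    prior = hwe_prior(alt_freq)
    post = [0.0, 0.0, 0.0]
    for f, m, c in product(GENOTYPES, repeat=3):
        post[c] += (prior[f] * lik_father[f] * prior[m] * lik_mother[m]
                    * child_given_parents(c, f, m) * lik_child[c])
    total = sum(post)
    return [x / total for x in post]


# Made-up likelihoods: parents confidently homozygous reference, child ambiguous.
parent_lik = (1.0, 1e-4, 1e-8)
child_lik = (0.4, 0.6, 0.0)
alt_freq = 0.3

# Treating the child as unrelated: prior times likelihood, normalised.
prior = hwe_prior(alt_freq)
unrelated_post = [prior[g] * child_lik[g] for g in GENOTYPES]
unrelated_post = [x / sum(unrelated_post) for x in unrelated_post]

trio_post = child_posterior(parent_lik, parent_lik, child_lik, alt_freq)
print("unrelated:", [round(x, 3) for x in unrelated_post])
print("trio:     ", [round(x, 5) for x in trio_post])
```

With these inputs the unrelated analysis leans towards a heterozygous call, while the trio analysis concentrates the posterior on homozygous reference, because a heterozygous child would require a parent genotype that the parental data make very unlikely.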

Journal ArticleDOI
TL;DR: Recombination rate and recombination hotspots have little effect on rare variants of any subtype, yet both have a relatively strong impact on multiple variant subtypes in common variants and substitutions.
Abstract: Understanding patterns of spontaneous mutations is of fundamental interest in studies of human genome evolution and genetic disease. Here, we used extremely rare variants in humans to model the molecular spectrum of single-nucleotide mutations. Compared to common variants in humans and human-chimpanzee fixed differences (substitutions), rare variants, on average, arose more recently in the human lineage and are less affected by the potentially confounding effects of natural selection, population demographic history, and biased gene conversion. We analyzed variants obtained from a population-based sequencing study of 202 genes in >14,000 individuals. We observed considerable variability in the per-gene mutation rate, which was correlated with local GC content, but not recombination rate. Using >20,000 variants with a derived allele frequency ≤ 10⁻⁴, we examined the effect of local GC content and recombination rate on individual variant subtypes and performed comparisons with common variants and substitutions. The influence of local GC content on rare variants differed from that on common variants or substitutions, and the differences varied by variant subtype. Furthermore, recombination rate and recombination hotspots have little effect on rare variants of any subtype, yet both have a relatively strong impact on multiple variant subtypes in common variants and substitutions. This observation is consistent with the effect of biased gene conversion or selection-dependent processes. Our results highlight the distinct biases inherent in the initial mutation patterns and subsequent evolutionary processes that affect segregating variants.

Journal ArticleDOI
TL;DR: Although the 1000 Genomes haplotypes are the most commonly used reference panel for imputation, medical sequencing projects are generating large alternate sets of sequenced samples, and haplotypes from Exome Sequencing Project alone or concatenation of the two panels over quality score-based post-imputation selection or IMPUTE2's two-panel combination are recommended.
Abstract: Summary: Although the 1000 Genomes haplotypes are the most commonly used reference panel for imputation, medical sequencing projects are generating large alternate sets of sequenced samples. Imputation in African Americans using 3384 haplotypes from the Exome Sequencing Project, compared with 2184 haplotypes from 1000 Genomes Project, increased effective sample size by 8.3–11.4% for coding variants with minor allele frequency <1%. No loss of imputation quality was observed using a panel built from phenotypic extremes. We recommend using haplotypes from Exome Sequencing Project alone or concatenation of the two panels over quality score-based post-imputation selection or IMPUTE2’s two-panel combination.

Journal ArticleDOI
TL;DR: The findings of the linkage meta-analysis and those of the GWAS suggest that 11q24 is a susceptible locus for openness, with KCNJ1 as the possible candidate gene.
Abstract: Personality traits are complex phenotypes related to psychosomatic health. Individually, various gene finding methods have not achieved much success in finding genetic variants associated with personality traits. We performed a meta-analysis of four genome-wide linkage scans (N = 6149 subjects) of five basic personality traits assessed with the NEO Five-Factor Inventory. We compared the significant regions from the meta-analysis of linkage scans with the results of a meta-analysis of genome-wide association studies (GWAS) (N ∼ 17000). We found significant evidence of linkage of neuroticism to chromosome 3p14 (rs1490265, LOD = 4.67) and to chromosome 19q13 (rs628604, LOD = 3.55); of extraversion to 14q32 (ATGG002, LOD = 3.3); and of agreeableness to 3p25 (rs709160, LOD = 3.67) and to two adjacent regions on chromosome 15, including 15q13 (rs970408, LOD = 4.07) and 15q14 (rs1055356, LOD = 3.52) in the individual scans. In the meta-analysis, we found strong evidence of linkage of extraversion to 4q34, 9q34, 10q24 and 11q22, openness to 2p25, 3q26, 9p21, 11q24, 15q26 and 19q13 and agreeableness to 4q34 and 19p13. Significant evidence of association in the GWAS was detected between openness and rs677035 at 11q24 (P-value = 2.6 × 10(-6), KCNJ1). The findings of our linkage meta-analysis and those of the GWAS suggest that 11q24 is a susceptible locus for openness, with KCNJ1 as the possible candidate gene.

Journal ArticleDOI
TL;DR: In African Americans, it is shown that local-ancestry estimates derived by the method SEQMIX are very similar to those derived with Illumina's Omni 2.5M genotyping array and much improved in relation to estimates that use only exome genotypes and ignore off-target sequencing reads.
Abstract: Estimates of the ancestry of specific chromosomal regions in admixed individuals are useful for studies of human evolutionary history and for genetic association studies. Previously, this ancestry inference relied on high-quality genotypes from genome-wide association study (GWAS) arrays. These high-quality genotypes are not always available when samples are exome sequenced, and exome sequencing is the strategy of choice for many ongoing genetic studies. Here we show that off-target reads generated during exome-sequencing experiments can be combined with on-target reads to accurately estimate the ancestry of each chromosomal segment in an admixed individual. To reconstruct local ancestry, our method SEQMIX models aligned bases directly instead of relying on hard genotype calls. We evaluate the accuracy of our method through simulations and analysis of samples sequenced by the 1000 Genomes Project and the NHLBI Grand Opportunity Exome Sequencing Project. In African Americans, we show that local-ancestry estimates derived by our method are very similar to those derived with Illumina’s Omni 2.5M genotyping array and much improved in relation to estimates that use only exome genotypes and ignore off-target sequencing reads. Software implementing this method, SEQMIX, can be applied to analysis of human population history or used for genetic association studies in admixed individuals.

Journal ArticleDOI
TL;DR: QPLOT is an automated tool that facilitates assessment of sequence run quality that is computationally efficient, generates webpages for interactive exploration of detailed results, and can handle the joint output of many sequencing runs.
Abstract: Background. Next generation sequencing (NGS) is being widely used to identify genetic variants associated with human disease. Although the approach is cost effective, the underlying data is susceptible to many types of error. Importantly, since NGS technologies and protocols are rapidly evolving, with constantly changing steps ranging from sample preparation to data processing software updates, it is important to enable researchers to routinely assess the quality of sequencing and alignment data prior to downstream analyses. Results. Here we describe QPLOT, an automated tool that can facilitate the quality assessment of sequencing run performance. Taking standard sequence alignments as input, QPLOT generates a series of diagnostic metrics summarizing run quality and produces convenient graphical summaries for these metrics. QPLOT is computationally efficient, generates webpages for interactive exploration of detailed results, and can handle the joint output of many sequencing runs. Conclusion. QPLOT is an automated tool that facilitates assessment of sequence run quality. We routinely apply QPLOT to ensure quick detection of sequencing run problems. We hope that QPLOT will be useful to the community as well.

Journal ArticleDOI
TL;DR: Evaluating the utility--in terms of trial size, duration, and cost-- of enriching prevention trial samples by combining clinical information with genetic risk scores to identify individuals at greater risk of disease shows that these benefits should increase as the list of robustly associated markers for each disease grows and as large samples of genotyped individuals become available.
Abstract: Clinical trials for preventative therapies are complex and costly endeavors focused on individuals likely to develop disease in a short time frame, randomizing them to treatment groups, and following them over time. In such trials, statistical power is governed by the rate of disease events in each group and cost is determined by randomization, treatment, and follow-up. Strategies that increase the rate of disease events by enrolling individuals with high risk of disease can significantly reduce study size, duration, and cost. Comprehensive study of common, complex diseases has resulted in a growing list of robustly associated genetic markers. Here, we evaluate the utility--in terms of trial size, duration, and cost--of enriching prevention trial samples by combining clinical information with genetic risk scores to identify individuals at greater risk of disease. We also describe a framework for utilizing genetic risk scores in these trials and evaluating the associated cost and time savings. With type 1 diabetes (T1D), type 2 diabetes (T2D), myocardial infarction (MI), and advanced age-related macular degeneration (AMD) as examples, we illustrate the potential and limitations of using genetic data for prevention trial design. We illustrate settings where incorporating genetic information could reduce trial cost or duration considerably, as well as settings where potential savings are negligible. Results are strongly dependent on the genetic architecture of the disease, but we also show that these benefits should increase as the list of robustly associated markers for each disease grows and as large samples of genotyped individuals become available.
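The link between event rate and trial size described above can be illustrated with a small sketch. It uses the textbook normal-approximation sample size formula for comparing two proportions, which is an assumption here and not necessarily the framework used in the paper. The function name `per_arm_sample_size` and its parameters are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def per_arm_sample_size(control_rate, relative_reduction, alpha=0.05, power=0.8):
    """Participants needed per arm to detect a drop in event rate.

    Textbook two-proportion approximation (an assumption, not the
    paper's own framework). control_rate is the event rate without
    treatment; relative_reduction is the fraction of events the
    treatment is assumed to prevent.
    """
    treated_rate = control_rate * (1.0 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = (control_rate * (1.0 - control_rate)
                + treated_rate * (1.0 - treated_rate))
    difference = control_rate - treated_rate
    return ceil((z_alpha + z_beta) ** 2 * variance / difference ** 2)


# Enrolling a higher-risk group (hypothetical rates) shrinks the trial.
unselected = per_arm_sample_size(control_rate=0.05, relative_reduction=0.3)
enriched = per_arm_sample_size(control_rate=0.10, relative_reduction=0.3)
print(unselected, enriched)
```

Under this approximation, doubling the event rate through risk-based enrollment roughly halves the required enrollment per arm, which is the kind of saving the abstract refers to. The cost of genotyping and screening extra candidates is not modeled in this sketch.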

Journal ArticleDOI
TL;DR: The member databases themselves produce regular releases, and for TIGRFAMs the number of models has increased from 1109 in release 1.0 to 1415 in release 2.0 (beginning of 2002).
Abstract: The member databases themselves produce regular releases. PRINTS produces quarterly releases with 50 new fingerprints per release, resulting in 200 additional fingerprints per annum. At InterPro’s conception Pfam had 2008 HMMs, and plan to reach a total of 5000 families by the end of 2002. In 2000 they produced 715 HMMs, in 2001 735 HMMs and aim to have produced 1700 additional HMMs by the end of 2002. For TIGRFAMs, the number of models has increased from 1109 in release 1.0 (2001) to 1415 in release 2.0 (beginning of 2002). The first release of PROSITE in 1989 contained just 60 entries, and today release 17.0 has 1501 signatures. Release 12.0 in 1994 saw the introduction of the first profiles into the releases, and since then they have produced an average of just over 100 new signatures per release (approximately per year).

Journal ArticleDOI
TL;DR: AbCD (arbitrary coverage design) is a user-friendly interface providing pre-estimated effective sample sizes, specific to each minor allele frequency category, for designs with arbitrary coverage and sample size and for four major ethnic groups (Europeans, Africans, Asians and African Americans).
Abstract: Summary: Recent advances in sequencing technologies have revolutionized genetic studies. Although high-coverage sequencing can uncover most variants present in the sequenced sample, low-coverage sequencing is appealing for its cost effectiveness. Here, we present AbCD (arbitrary coverage design) to aid the design of sequencing-based studies. AbCD is a user-friendly interface providing pre-estimated effective sample sizes, specific to each minor allele frequency category, for designs with arbitrary coverage (0.5–30×) and sample size (20–10 000), and for four major ethnic groups (Europeans, Africans, Asians and African Americans). In addition, we also present two software tools: ShotGun and DesignPlanner, which were used to generate the estimates behind AbCD. ShotGun is a flexible short-read simulator for arbitrary user-specified read length and average depth, allowing cycle-specific sequencing error rates and realistic read depth distributions. DesignPlanner is a full pipeline that uses ShotGun to generate sequence data and performs initial SNP discovery, uses our previously presented linkage disequilibrium-aware method to call genotypes, and, finally, provides minor allele frequency-specific effective sample sizes. ShotGun plus DesignPlanner can accommodate effective sample size estimate for any combination of high-depth and low-depth data (for example, whole-genome low-depth plus exonic high-depth) or combination of sequence and genotype data [for example, whole-exome sequencing plus genotyping from existing

Journal ArticleDOI
TL;DR: Shrunken Average (SHAVE) is developed, an approach using a Bayesian Shrinkage estimator that shows a clear increase in power relative to single visits and derived a relation to assess the improvement in power as a function of number of visits and correlation between visits.
Abstract: Measurement error and biological variability generate distortions in quantitative phenotypic data. In longitudinal studies with repeated measurements, the multiple measurements provide a route to reduce noise and correspondingly increase the strength of signals in genome-wide association studies (GWAS). To optimize noise correction, we have developed Shrunken Average (SHAVE), an approach using a Bayesian Shrinkage estimator. This estimator uses regression toward the mean for every individual as a function of (1) their average across visits; (2) their number of visits; and (3) the correlation between visits. Computer simulations support an increase in power, with results very similar to those expected by the assumptions of the model. The method was applied to a real data set for 14 anthropometric traits in ∼6000 individuals enrolled in the SardiNIA project, with up to three visits (measurements) for each participant. Results show that additional measurements have a large impact on the strength of GWAS signals, especially when participants have different numbers of visits, with SHAVE showing a clear increase in power relative to single visits. In addition, we have derived a relation to assess the improvement in power as a function of number of visits and correlation between visits. It can also be applied in the optimization of experimental designs or usage of measuring devices. SHAVE is fast and easy to run, written in R and freely available online.
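A minimal sketch of the kind of shrinkage the abstract describes follows. It assumes a simple model in which every pair of visits has the same correlation `rho`, so an individual's mean is pulled toward the population mean with weight n·rho / (1 + (n − 1)·rho). This is the standard estimator under that model and may differ from the exact SHAVE estimator. The released tool is written in R, and the Python function `shrunken_average` here is hypothetical.

```python
def shrunken_average(values, pop_mean, rho):
    """Shrink one individual's average toward the population mean.

    values   : repeated measurements for one individual, one per visit
    pop_mean : population mean of the trait
    rho      : correlation between visits, assumed equal for all pairs
               (an assumption of this sketch)
    """
    n = len(values)
    if n == 0:
        return pop_mean  # no data: fall back to the population mean
    mean = sum(values) / n
    # The weight grows with the number of visits and with rho.
    weight = n * rho / (1.0 + (n - 1) * rho)
    return pop_mean + weight * (mean - pop_mean)


# One visit is shrunk more strongly than three concordant visits.
one_visit = shrunken_average([10.0], pop_mean=0.0, rho=0.5)
three_visits = shrunken_average([10.0, 10.0, 10.0], pop_mean=0.0, rho=0.5)
print(one_visit, three_visits)
```

Because the weight depends on the number of visits, individuals with fewer measurements are regressed further toward the mean, which matters most when participants differ in how many visits they have.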

Posted Content
TL;DR: This work proposes and evaluates new approaches for meta-analysis of rare variant association and shows that this approach retains useful features of single variant meta-analytic approaches and demonstrates its utility in a study of blood lipid levels in ~18,500 individuals genotyped with exome arrays.
Abstract: The vast majority of connections between complex disease and common genetic variants were identified through meta-analysis, a powerful approach that enables large samples sizes while protecting against common artifacts due to population structure, repeated small sample analyses, and/or limitations with sharing individual level data. As the focus of genetic association studies shifts to rare variants, genes and other functional units are becoming the unit of analysis. Here, we propose and evaluate new approaches for meta-analysis of rare variant association. We show that our approach retains useful features of single variant meta-analytic approaches and demonstrate its utility in a study of blood lipid levels in ~18,500 individuals genotyped with exome arrays.
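The abstract does not spell out the statistics involved, so the following is only a generic illustration of how gene-level tests can be built from shared summary statistics. It assumes each study reports a vector of single-variant score statistics and their covariance matrix for the variants in a gene, sums them across studies, and forms a weighted burden Z statistic. The function `meta_burden_z` and its inputs are hypothetical and are not confirmed to be the authors' method.

```python
from math import sqrt


def meta_burden_z(study_scores, study_covs, weights):
    """Gene-level burden Z statistic from per-study summary statistics.

    study_scores : list of per-study score vectors (one entry per variant)
    study_covs   : list of per-study covariance matrices of those scores
    weights      : per-variant weights for the burden test
    Generic sketch; not taken from the paper.
    """
    m = len(weights)
    total_u = [0.0] * m
    total_v = [[0.0] * m for _ in range(m)]
    # Sum score statistics and covariances across studies.
    for u, v in zip(study_scores, study_covs):
        for i in range(m):
            total_u[i] += u[i]
            for j in range(m):
                total_v[i][j] += v[i][j]
    numerator = sum(w * u for w, u in zip(weights, total_u))
    variance = sum(weights[i] * total_v[i][j] * weights[j]
                   for i in range(m) for j in range(m))
    return numerator / sqrt(variance)


# Two hypothetical studies, two variants, equal weights.
z = meta_burden_z(
    study_scores=[[1.0, 2.0], [1.0, 0.0]],
    study_covs=[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]],
    weights=[1.0, 1.0],
)
print(z)
```

Only summary statistics cross study boundaries in this design, which is how an approach of this kind can avoid sharing individual-level data.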