Showing papers by "Michael Boehnke" published in 2015


01 Jan 2015
TL;DR: This paper conducted a genome-wide association study and meta-analysis of body mass index (BMI), a measure commonly used to define obesity and assess adiposity, in up to 339,224 individuals.
Abstract: Obesity is heritable and predisposes to many diseases. To understand the genetic basis of obesity better, here we conduct a genome-wide association study and Metabochip meta-analysis of body mass index (BMI), a measure commonly used to define obesity and assess adiposity, in up to 339,224 individuals. This analysis identifies 97 BMI-associated loci (P < […] >20% of BMI variation. Pathway analyses provide strong support for a role of the central nervous system in obesity susceptibility and implicate new genes and pathways, including those related to synaptic function, glutamate signalling, insulin secretion/action, energy metabolism, lipid biology and adipogenesis.

2,721 citations


Posted Content
30 Oct 2015-bioRxiv
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities. The resulting catalogue of human genetic diversity has unprecedented resolution, with an average of one variant every eight bases of coding sequence and the presence of widespread mutational recurrence. The deep catalogue of variation provided by the Exome Aggregation Consortium (ExAC) can be used to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; we identify 3,230 genes with near-complete depletion of truncating variants, 79% of which have no currently established human disease phenotype. Finally, we show that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human knockout variants in protein-coding genes.

1,552 citations


Journal Article
TL;DR: DEPICT is an integrative tool that employs predicted gene functions to systematically prioritize the most likely causal genes at associated loci, highlight enriched pathways and identify tissues/cell types where genes from associated loci are highly expressed.
Abstract: The main challenge for gaining biological insights from genetic associations is identifying which genes and pathways explain the associations. Here we present DEPICT, an integrative tool that employs predicted gene functions to systematically prioritize the most likely causal genes at associated loci, highlight enriched pathways and identify tissues/cell types where genes from associated loci are highly expressed. DEPICT is not limited to genes with established functions and prioritizes relevant gene sets for many phenotypes.

699 citations


Journal Article
Colm O'Dushlaine, Lizzy Rossin, Phil Lee, Laramie E. Duncan, +401 more (115 institutions)
TL;DR: It is indicated that risk variants for psychiatric disorders aggregate in particular biological pathways and that these pathways are frequently shared between disorders.
Abstract: Genome-wide association studies (GWAS) of psychiatric disorders have identified multiple genetic associations with such disorders, but better methods are needed to derive the underlying biological mechanisms that these signals indicate. We sought to identify biological pathways in GWAS data from over 60,000 participants from the Psychiatric Genomics Consortium. We developed an analysis framework to rank pathways that requires only summary statistics. We combined this score across disorders to find common pathways across three adult psychiatric disorders: schizophrenia, major depression and bipolar disorder. Histone methylation processes showed the strongest association, and we also found statistically significant evidence for associations with multiple immune and neuronal signaling pathways and with the postsynaptic density. Our study indicates that risk variants for psychiatric disorders aggregate in particular biological pathways and that these pathways are frequently shared between disorders. Our results confirm known mechanisms and suggest several novel insights into the etiology of psychiatric disorders.

630 citations


Journal Article
Thomas W. Winkler, Anne E. Justice, Mariaelisa Graff, Llilda Barata, +435 more (106 institutions)
TL;DR: In this paper, the authors performed meta-analyses of 114 studies with genome-wide chip and/or Metabochip data by the Genetic Investigation of Anthropometric Traits (GIANT) Consortium.
Abstract: Genome-wide association studies (GWAS) have identified more than 100 genetic variants contributing to BMI, a measure of body size, or waist-to-hip ratio (adjusted for BMI, WHRadjBMI), a measure of body shape. Body size and shape change as people grow older and these changes differ substantially between men and women. To systematically screen for age- and/or sex-specific effects of genetic variants on BMI and WHRadjBMI, we performed meta-analyses of 114 studies (up to 320,485 individuals of European descent) with genome-wide chip and/or Metabochip data by the Genetic Investigation of Anthropometric Traits (GIANT) Consortium. Each study tested the association of up to ~2.8M SNPs with BMI and WHRadjBMI in four strata (men ≤50y, men >50y, women ≤50y, women >50y) and summary statistics were combined in stratum-specific meta-analyses. We then screened for variants that showed age-specific effects (G x AGE), sex-specific effects (G x SEX) or age-specific effects that differed between men and women (G x AGE x SEX). For BMI, we identified 15 loci (11 previously established for main effects, four novel) that showed significant (FDR<5%) age-specific effects, of which 11 had larger effects in younger (<50y) than in older adults (≥50y). No sex-dependent effects were identified for BMI. For WHRadjBMI, we identified 44 loci (27 previously established for main effects, 17 novel) with sex-specific effects, of which 28 showed larger effects in women than in men, five showed larger effects in men than in women, and 11 showed opposite effects between sexes. No age-dependent effects were identified for WHRadjBMI. This is the first genome-wide interaction meta-analysis to report convincing evidence of age-dependent genetic effects on BMI. In addition, we confirm the sex-specificity of genetic effects on WHRadjBMI. These results may provide further insights into the biology that underlies weight change with age or the sexual dimorphism of body shape.

584 citations
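
Illustrative sketch for the entry above: one common way to screen stratum-specific results for an age- or sex-specific effect is a z-test on the difference between two effect estimates. The function name and the numbers below are hypothetical, the two strata are assumed to be independent, and this is not presented as the GIANT Consortium's actual pipeline.

```python
import math

def difference_z_test(beta_a, se_a, beta_b, se_b):
    """Two-sided z-test for a difference between two independent effect estimates."""
    # Variances add because the strata are assumed independent.
    z = (beta_a - beta_b) / math.sqrt(se_a**2 + se_b**2)
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical per-allele effects in a younger and an older stratum.
z, p = difference_z_test(beta_a=0.06, se_a=0.01, beta_b=0.02, se_b=0.01)
print(f"z = {z:.3f}, p = {p:.4f}")
```

In a genome-wide screen, the resulting p-values would still need multiple-testing control, such as the FDR threshold the abstract mentions.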


Journal Article
Kyle J. Gaulton, Teresa Ferreira, Yeji Lee, +258 more (73 institutions)
TL;DR: This paper performed fine mapping of 39 established type 2 diabetes (T2D) loci in 27,206 cases and 57,574 controls of European ancestry, and identified 49 distinct association signals at these loci including five mapping in or near KCNQ1.
Abstract: We performed fine mapping of 39 established type 2 diabetes (T2D) loci in 27,206 cases and 57,574 controls of European ancestry. We identified 49 distinct association signals at these loci, including five mapping in or near KCNQ1. 'Credible sets' of the variants most likely to drive each distinct signal mapped predominantly to noncoding sequence, implying that association with T2D is mediated through gene regulation. Credible set variants were enriched for overlap with FOXA2 chromatin immunoprecipitation binding sites in human islet and liver cells, including at MTNR1B, where fine mapping implicated rs10830963 as driving T2D association. We confirmed that the T2D risk allele for this SNP increases FOXA2-bound enhancer activity in islet- and liver-derived cells. We observed allele-specific differences in NEUROD1 binding in islet-derived cells, consistent with evidence that the T2D risk allele increases islet MTNR1B expression. Our study demonstrates how integration of genetic and genomic information can define molecular mechanisms through which variants underlying association signals exert their effects on disease.

370 citations
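
Illustrative sketch for the entry above: a credible set is often built by ranking variants by their posterior probability of driving the signal and adding them until a target coverage is reached. The code assumes the posterior probabilities have already been computed and that each signal has a single causal variant. The variant names and values are hypothetical, and the abstract does not specify the exact procedure used.

```python
def credible_set(posteriors, coverage=0.99):
    """Return the smallest top-ranked set of variants reaching the target coverage.

    `posteriors` maps variant id -> posterior probability of being causal.
    Values are normalised here so that they sum to 1.
    """
    total = sum(posteriors.values())
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob / total
        if cumulative >= coverage:
            break
    return chosen

# Hypothetical posterior probabilities at one association signal.
example = {"var_a": 0.90, "var_b": 0.07, "var_c": 0.02, "var_d": 0.01}
print(credible_set(example, coverage=0.95))
```

A smaller credible set means the data localise the signal to fewer candidate variants.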


Journal Article
Jennifer Wessel, Audrey Y. Chu, Sara M. Willems, Shuai Wang, +199 more (63 institutions)
TL;DR: In this article, the role of coding variation on intermediate traits for type 2 diabetes was explored by analysis of variants on the HumanExome BeadChip in 60,564 non-diabetic individuals and in 16,491 T2D cases and 81,877 controls.
Abstract: Fasting glucose and insulin are intermediate traits for type 2 diabetes. Here we explore the role of coding variation on these traits by analysis of variants on the HumanExome BeadChip in 60,564 non-diabetic individuals and in 16,491 T2D cases and 81,877 controls. We identify a novel association of a low-frequency nonsynonymous SNV in GLP1R (A316T; rs10305492; MAF=1.4%) with lower FG (β=-0.09±0.01 mmol l⁻¹, P=3.4 × 10⁻¹²), T2D risk (OR[95%CI]=0.86[0.76-0.96], P=0.010), early insulin secretion (β=-0.07±0.035 pmol insulin per mmol glucose, P=0.048), but higher 2-h glucose (β=0.16±0.05 mmol l⁻¹, P=4.3 × 10⁻⁴). We identify a gene-based association with FG at G6PC2 (pSKAT=6.8 × 10⁻⁶) driven by four rare protein-coding SNVs (H177Y, Y207S, R283X and S324P). We identify rs651007 (MAF=20%) in the first intron of ABO at the putative promoter of an antisense lncRNA, associating with higher FG (β=0.02±0.004 mmol l⁻¹, P=1.3 × 10⁻⁸). Our approach identifies novel coding variant associations and extends the allelic spectrum of variation underlying diabetes-related quantitative traits and T2D susceptibility.

192 citations


Journal Article
Peter K. Joshi, Tõnu Esko, Hannele Mattsson, Niina Eklund, +355 more (106 institutions)
23 Jul 2015-Nature
TL;DR: This study provides evidence that increased stature and cognitive function have been positively selected in human evolution, whereas many important risk factors for late-onset complex diseases may not have been.
Abstract: Homozygosity has long been associated with rare, often devastating, Mendelian disorders, and Darwin was one of the first to recognize that inbreeding reduces evolutionary fitness. However, the effect of the more distant parental relatedness that is common in modern human populations is less well understood. Genomic data now allow us to investigate the effects of homozygosity on traits of public health importance by observing contiguous homozygous segments (runs of homozygosity), which are inferred to be homozygous along their complete length. Given the low levels of genome-wide homozygosity prevalent in most human populations, information is required on very large numbers of people to provide sufficient power. Here we use runs of homozygosity to study 16 health-related quantitative traits in 354,224 individuals from 102 cohorts, and find statistically significant associations between summed runs of homozygosity and four complex traits: height, forced expiratory lung volume in one second, general cognitive ability and educational attainment (P < 1 × 10⁻³⁰⁰, 2.1 × 10⁻⁶, 2.5 × 10⁻¹⁰ and 1.8 × 10⁻¹⁰, respectively). In each case, increased homozygosity was associated with decreased trait value, equivalent to the offspring of first cousins being 1.2 cm shorter and having 10 months' less education. Similar effect sizes were found across four continental groups and populations with different degrees of genome-wide homozygosity, providing evidence that homozygosity, rather than confounding, directly contributes to phenotypic variance. Contrary to earlier reports in substantially smaller samples, no evidence was seen of an influence of genome-wide homozygosity on blood pressure and low density lipoprotein cholesterol, or ten other cardio-metabolic traits. Since directional dominance is predicted for traits under directional evolutionary selection, this study provides evidence that increased stature and cognitive function have been positively selected in human evolution, whereas many important risk factors for late-onset complex diseases may not have been.

145 citations


Journal Article
TL;DR: The results imply that tens of thousands of individuals, extensive functional annotation, or highly targeted hypothesis testing will be required to confidently detect or exclude rare variant signals at complex disease loci.
Abstract: Genome and exome sequencing in large cohorts enables characterization of the role of rare variation in complex diseases. Success in this endeavor, however, requires investigators to test a diverse array of genetic hypotheses which differ in the number, frequency and effect sizes of underlying causal variants. In this study, we evaluated the power of gene-based association methods to interrogate such hypotheses, and examined the implications for study design. We developed a flexible simulation approach, using 1000 Genomes data, to (a) generate sequence variation at human genes in up to 10K case-control samples, and (b) quantify the statistical power of a panel of widely used gene-based association tests under a variety of allelic architectures, locus effect sizes, and significance thresholds. For loci explaining ~1% of phenotypic variance underlying a common dichotomous trait, we find that all methods have low absolute power to achieve exome-wide significance (~5-20% power at α = 2.5 × 10⁻⁶) in 3K individuals; even in 10K samples, power is modest (~60%). The combined application of multiple methods increases sensitivity, but does so at the expense of a higher false positive rate. MiST, SKAT-O, and KBAC have the highest individual mean power across simulated datasets, but we observe wide architecture-dependent variability in the individual loci detected by each test, suggesting that inferences about disease architecture from analysis of sequencing studies can differ depending on which methods are used. Our results imply that tens of thousands of individuals, extensive functional annotation, or highly targeted hypothesis testing will be required to confidently detect or exclude rare variant signals at complex disease loci.

132 citations
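
Illustrative note for the entry above: the exome-wide significance level of 2.5 × 10⁻⁶ quoted in the abstract matches a Bonferroni correction of a 0.05 family-wise error rate over roughly 20,000 gene-based tests. The gene count is an assumption made here for illustration. The abstract does not state how the threshold was derived.

```python
def bonferroni_threshold(family_alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests

# Assumed: about 20,000 protein-coding genes, one test per gene.
alpha = bonferroni_threshold(0.05, 20_000)
print(f"per-gene threshold: {alpha:.2e}")
```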


Journal Article
TL;DR: It is found that the AMY1 copy number in an individual's genome is generally even (rather than odd) and partially correlates with nearby SNPs, which do not associate with body mass index (BMI).
Abstract: Hundreds of genes reside in structurally complex, poorly understood regions of the human genome. One such region contains the three amylase genes (AMY2B, AMY2A and AMY1) responsible for digesting starch into sugar. Copy number of AMY1 is reported to be the largest genomic influence on obesity, although genome-wide association studies for obesity have found this locus unremarkable. Using whole-genome sequence analysis, droplet digital PCR and genome mapping, we identified eight common structural haplotypes of the amylase locus that suggest its mutational history. We found that the AMY1 copy number in an individual's genome is generally even (rather than odd) and partially correlates with nearby SNPs, which do not associate with body mass index (BMI). We measured amylase gene copy number in 1,000 obese or lean Estonians and in 2 other cohorts totaling ∼3,500 individuals. We had 99% power to detect the lower bound of the reported effects on BMI, yet found no association.

127 citations


Journal Article
TL;DR: Genome-wide association (GWAS) and sequencing studies are providing new insights into the genetic basis of type 2 diabetes (T2D) and the inter-individual variation in glycemic traits, including levels of glucose, insulin, proinsulin and hemoglobin A1c.
Abstract: Genome-wide association (GWAS) and sequencing studies are providing new insights into the genetic basis of type 2 diabetes (T2D) and the inter-individual variation in glycemic traits, including levels of glucose, insulin, proinsulin and hemoglobin A1c (HbA1c). At the end of 2011, established loci (P < […] > 0.05] variants in increasingly large sample sizes from populations around the world, and in trans-ancestry studies that successfully combine data from diverse populations. Most recently, advances in sequencing have led to the discovery of four loci for T2D or glycemic traits based on low-frequency (0.005 < MAF ≤ 0.05) variants, and additional low-frequency, potentially functional variants have been identified at GWAS loci. Established published loci now total ∼88 for T2D and 83 for one or more glycemic traits, and many additional loci likely remain to be discovered. Future studies will build on these successes by identifying additional loci and by determining the pathogenic effects of the underlying variants and genes.


29 Jan 2015
TL;DR: The approach identifies novel coding variant associations and extends the allelic spectrum of variation underlying diabetes-related quantitative traits and T2D susceptibility.

Journal Article
Anubha Mahajan, Xueling Sim, Hui Jin Ng, Alisa K. Manning, Manuel A. Rivas, Heather M. Highland, Adam E. Locke, Niels Grarup, Hae Kyung Im, Pablo Cingolani, Jason Flannick, Pierre Fontanillas, Christian Fuchsberger, Kyle J. Gaulton, Tanya M. Teslovich, N. William Rayner, Neil R. Robertson, Nicola L. Beer, Jana K. Rundle, Jette Bork-Jensen, Claes Ladenvall, Christine Blancher, David Buck, Gemma Buck, Noël P. Burtt, Stacey Gabriel, Anette P. Gjesing, Christopher J. Groves, Mette Hollensted, Jeroen R. Huyghe, Anne U. Jackson, Goo Jun, Johanne Marie Justesen, Massimo Mangino, Jacquelyn Murphy, Matt J. Neville, Robert C. Onofrio, Kerrin S. Small, Heather M. Stringham, Ann-Christine Syvänen, Joseph Trakalo, Gonçalo R. Abecasis, Graeme I. Bell, John Blangero, Nancy J. Cox, Ravindranath Duggirala, Craig L. Hanis, Mark Seielstad, James G. Wilson, Cramer Christensen, Ivan Brandslund, Rainer Rauramaa, Gabriela L. Surdulescu, Alex S. F. Doney, Lars Lannfelt, Allan Linneberg, Bo Isomaa, Tiinamaija Tuomi, Marit E. Jørgensen, Torben Jørgensen, Johanna Kuusisto, Matti Uusitupa, Veikko Salomaa, Tim D. Spector, Andrew D. Morris, Colin N. A. Palmer, Francis S. Collins, Karen L. Mohlke, Richard N. Bergman, Erik Ingelsson, Lars Lind, Jaakko Tuomilehto, Torben Hansen, Richard M. Watanabe, Inga Prokopenko, Josée Dupuis, Fredrik Karpe, Leif Groop, Markku Laakso, Oluf Pedersen, Jose C. Florez, Andrew P. Morris, David Altshuler, James B. Meigs, Michael Boehnke, Mark I. McCarthy, Cecilia M. Lindgren, Anna L. Gloyn
TL;DR: In this article, the authors analyzed exome-array data from up to 33,231 non-diabetic individuals of European ancestry and identified multiple coding variants in G6PC2 (p.Val219Leu, p.His177Tyr, and p.Tyr207Ser) influencing FG levels, conditionally independent of each other and the non-coding GWAS signal.
Abstract: Genome wide association studies (GWAS) for fasting glucose (FG) and insulin (FI) have identified common variant signals which explain 4.8% and 1.2% of trait variance, respectively. It is hypothesized that low-frequency and rare variants could contribute substantially to unexplained genetic variance. To test this, we analyzed exome-array data from up to 33,231 non-diabetic individuals of European ancestry. We found exome-wide significant (P < 5 × 10⁻⁷) evidence for two loci not previously highlighted by common variant GWAS: GLP1R (p.Ala316Thr, minor allele frequency (MAF)=1.5%) influencing FG levels, and URB2 (p.Glu594Val, MAF = 0.1%) influencing FI levels. Coding variant associations can highlight potential effector genes at (non-coding) GWAS signals. At the G6PC2/ABCB11 locus, we identified multiple coding variants in G6PC2 (p.Val219Leu, p.His177Tyr, and p.Tyr207Ser) influencing FG levels, conditionally independent of each other and the non-coding GWAS signal. In vitro assays demonstrate that these associated coding alleles result in reduced protein abundance via proteasomal degradation, establishing G6PC2 as an effector gene at this locus. Reconciliation of single-variant associations and functional effects was only possible when haplotype phase was considered. In contrast to earlier reports suggesting that, paradoxically, glucose-raising alleles at this locus are protective against type 2 diabetes (T2D), the p.Val219Leu G6PC2 variant displayed a modest but directionally consistent association with T2D risk. Coding variant associations for glycemic traits in GWAS signals highlight PCSK1, RREB1, and ZHX3 as likely effector transcripts. These coding variant association signals do not have a major impact on the trait variance explained, but they do provide valuable biological insights.

Journal Article
TL;DR: Three types of approximate F‐distribution tests based on Pillai–Bartlett trace, Hotelling–Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region.
Abstract: In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F-distribution tests based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F-distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and SKAT-O for the three biochemical traits. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT-O in the univariate case.
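
Illustrative sketch for the entry above: the three multivariate statistics named in the abstract can all be computed from a hypothesis sum-of-squares-and-cross-products (SSCP) matrix H and an error SSCP matrix E. The code shows only these raw statistics under their standard definitions. It does not include the approximate F transformations or the functional linear models the paper develops, and the matrices are made-up numbers.

```python
import numpy as np

def multivariate_test_statistics(H, E):
    """Return (Pillai-Bartlett trace, Hotelling-Lawley trace, Wilks's Lambda).

    H: hypothesis SSCP matrix (traits x traits)
    E: error SSCP matrix of the same shape, assumed invertible
    """
    H = np.asarray(H, dtype=float)
    E = np.asarray(E, dtype=float)
    pillai = np.trace(H @ np.linalg.inv(H + E))        # sum of l / (1 + l)
    hotelling_lawley = np.trace(H @ np.linalg.inv(E))  # sum of l
    wilks = np.linalg.det(E) / np.linalg.det(H + E)    # product of 1 / (1 + l)
    return pillai, hotelling_lawley, wilks

# Hypothetical SSCP matrices for two traits; l denotes eigenvalues of inv(E) @ H.
H = np.array([[2.0, 0.0], [0.0, 1.0]])
E = np.array([[2.0, 0.0], [0.0, 4.0]])
print(multivariate_test_statistics(H, E))
```

Larger Pillai and Hotelling-Lawley values, and smaller Wilks's Lambda values, indicate stronger evidence against the null hypothesis of no association.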

Journal Article
TL;DR: Allelic-expression-imbalance assays performed with RNA from primary human hepatocyte samples and expression-quantitative-trait-locus data in human subcutaneous adipose tissue samples confirmed that alleles associated with increased HDL-C are associated with a modest increase in GALNT2 expression.
Abstract: Genome-wide association studies (GWASs) have identified more than 150 loci associated with blood lipid and cholesterol levels; however, the functional and molecular mechanisms for many associations are unknown. We examined the functional regulatory effects of candidate variants at the GALNT2 locus associated with high-density lipoprotein cholesterol (HDL-C). Fine-mapping and conditional analyses in the METSIM study identified a single locus harboring 25 noncoding variants (r² > 0.7 with the lead GWAS variants) strongly associated with total cholesterol in medium-sized HDL (e.g., rs17315646, p = 3.5 × 10⁻¹²). We used luciferase reporter assays in HepG2 cells to test all 25 variants for allelic differences in regulatory enhancer activity. rs2281721 showed allelic differences in transcriptional activity (75-fold [T] versus 27-fold [C] more than the empty-vector control), as did a separate 780-bp segment containing rs4846913, rs2144300, and rs6143660 (49-fold [AT(-) haplotype] versus 16-fold [CC(+) haplotype] more). Using electrophoretic mobility shift assays, we observed differential CEBPB binding to rs4846913, and we confirmed this binding in a native chromatin context by performing chromatin-immunoprecipitation (ChIP) assays in HepG2 and Huh-7 cell lines of differing genotypes. Additionally, sequence reads in HepG2 DNase-I-hypersensitivity and CEBPB ChIP-seq signals spanning rs4846913 showed significant allelic imbalance. Allelic-expression-imbalance assays performed with RNA from primary human hepatocyte samples and expression-quantitative-trait-locus (eQTL) data in human subcutaneous adipose tissue samples confirmed that alleles associated with increased HDL-C are associated with a modest increase in GALNT2 expression. Together, these data suggest that at least rs4846913 and rs2281721 play key roles in influencing GALNT2 expression at this HDL-C locus.

Journal Article
TL;DR: This work proposes methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses and demonstrates that, for moderate contamination levels, contamination-adjusted calls eliminate 48%-77% of the genotyping errors.
Abstract: DNA sample contamination is a frequent problem in DNA sequencing studies and can result in genotyping errors and reduced power for association testing. We recently described methods to identify within-species DNA sample contamination based on sequencing read data, showed that our methods can reliably detect and estimate contamination levels as low as 1%, and suggested strategies to identify and remove contaminated samples from sequencing studies. Here we propose methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses. We compare our contamination-adjusted calls to calls that ignore contamination and to calls based on uncontaminated data. We demonstrate that, for moderate contamination levels (5%–20%), contamination-adjusted calls eliminate 48%–77% of the genotyping errors. For lower levels of contamination, our contamination correction methods produce genotypes nearly as accurate as those based on uncontaminated data. Our contamination correction methods are useful generally, but are particularly helpful for sample contamination levels from 2% to 20%.
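
Illustrative sketch for the entry above: one simple way to account for contamination during genotype calling is to treat each read as a mixture of the sample's own DNA and contaminating DNA that carries the alternate allele at its population frequency. The model below is a deliberate simplification and is not the authors' published method. It assumes a biallelic site, independent reads, a symmetric error rate and a known contamination fraction, and all names and numbers are hypothetical.

```python
from math import comb

def genotype_likelihoods(n_alt, n_reads, contamination, alt_freq, error=0.01):
    """Likelihood of the observed read counts for genotypes 0, 1, 2 (alt-allele copies)."""
    likelihoods = []
    for g in (0, 1, 2):
        # Chance a read truly carries the alt allele: a mixture of the
        # sample's genotype and the contaminating DNA.
        p_true = (1 - contamination) * g / 2 + contamination * alt_freq
        # Allow for a symmetric base-calling error.
        p_obs = p_true * (1 - error) + (1 - p_true) * error
        likelihoods.append(
            comb(n_reads, n_alt) * p_obs**n_alt * (1 - p_obs) ** (n_reads - n_alt)
        )
    return likelihoods

def call_genotype(n_alt, n_reads, contamination, alt_freq, error=0.01):
    """Return the maximum-likelihood genotype (0, 1 or 2)."""
    lik = genotype_likelihoods(n_alt, n_reads, contamination, alt_freq, error)
    return max(range(3), key=lambda g: lik[g])

# 6 alt reads out of 30, called while ignoring and while modelling 20% contamination.
print(call_genotype(6, 30, contamination=0.0, alt_freq=0.5))
print(call_genotype(6, 30, contamination=0.2, alt_freq=0.5))
```

With these hypothetical inputs, ignoring contamination favours a heterozygous call, while modelling it favours homozygous reference, because the contaminating DNA can account for the alternate-allele reads.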

Journal Article
01 Jan 2015-Genetics
TL;DR: Functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates, and related test statistics can be useful in whole-genome and whole-exome association studies.
Abstract: Meta-analysis of genetic data must account for differences among studies including study designs, markers genotyped, and covariates. The effects of genetic variants may differ from population to population, i.e., heterogeneity. Thus, meta-analysis of combining data of multiple studies is difficult. Novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies.

Journal Article
TL;DR: Three gene‐based tests designed for association testing of low‐frequency variants on the X chromosome are proposed: burden, sequence kernel association test (SKAT), and optimal unified SKAT (SKAT‐O).
Abstract: Although genome-wide association studies (GWAS) have identified thousands of trait-associated genetic variants, there are relatively few findings on the X chromosome. For analysis of low-frequency variants (minor allele frequency <5%), investigators can use region- or gene-based tests where multiple variants are analyzed jointly to increase power. To date, there are no gene-based tests designed for association testing of low-frequency variants on the X chromosome. Here we propose three gene-based tests for the X chromosome: burden, sequence kernel association test (SKAT), and optimal unified SKAT (SKAT-O). Using simulated case-control and quantitative trait (QT) data, we evaluate the calibration and power of these tests as a function of (1) male:female sample size ratio; and (2) coding of haploid male genotypes for variants under X-inactivation. For case-control studies, all three tests are reasonably well-calibrated for all scenarios we evaluated. As expected, power for gene-based tests depends on the underlying genetic architecture of the genomic region analyzed. Studies with more (haploid) males are generally less powerful due to decreased number of chromosomes. Power generally is slightly greater when the coding scheme for male genotypes matches the true underlying model, but the power loss for misspecifying the (generally unknown) model is small. For QT studies, type I error and power results largely mirror those for binary traits. We demonstrate the use of these three gene-based tests for X-chromosome association analysis in simulated data and sequencing data from the Genetics of Type 2 Diabetes (GoT2D) study.

Dataset
01 Jan 2015
TL;DR: This dataset gives the sum of cells within 5° latitudinal bins for all model classes, binary thresholds and climate scenarios.
Abstract: Sum of cells within 5° latitudinal bins for all model classes, binary thresholds and climate scenarios.