
Showing papers in "Genetic Epidemiology in 2016"


Journal ArticleDOI
TL;DR: A novel weighted median estimator for combining data on multiple genetic variants into a single causal estimate is presented, which is consistent even when up to 50% of the information comes from invalid instrumental variables.
Abstract: Developments in genome-wide association studies and the increasing availability of summary genetic association data have made application of Mendelian randomization relatively straightforward. However, obtaining reliable results from a Mendelian randomization investigation remains problematic, as the conventional inverse-variance weighted method only gives consistent estimates if all of the genetic variants in the analysis are valid instrumental variables. We present a novel weighted median estimator for combining data on multiple genetic variants into a single causal estimate. This estimator is consistent even when up to 50% of the information comes from invalid instrumental variables. In a simulation analysis, it is shown to have better finite-sample Type 1 error rates than the inverse-variance weighted method, and is complementary to the recently proposed MR-Egger (Mendelian randomization-Egger) regression method. In analyses of the causal effects of low-density lipoprotein cholesterol and high-density lipoprotein cholesterol on coronary artery disease risk, the inverse-variance weighted method suggests a causal effect of both lipid fractions, whereas the weighted median and MR-Egger regression methods suggest a null effect of high-density lipoprotein cholesterol that corresponds with the experimental evidence. Both median-based and MR-Egger regression methods should be considered as sensitivity analyses for Mendelian randomization investigations with multiple genetic variants.
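The estimator's core computation is simple enough to sketch. A minimal Python illustration (not the authors' implementation; the inverse-variance weights use the usual first-order approximation for the variance of the per-variant Wald ratio):

```python
import numpy as np

def weighted_median(values, weights):
    """Weighted median: the point where cumulative weight crosses 50%."""
    order = np.argsort(values)
    values = np.asarray(values, float)[order]
    weights = np.asarray(weights, float)[order]
    cum = (np.cumsum(weights) - 0.5 * weights) / np.sum(weights)
    return np.interp(0.5, cum, values)

def weighted_median_mr(beta_exposure, beta_outcome, se_outcome):
    """Combine per-variant Wald ratios into one causal estimate,
    weighting each variant by the inverse variance of its ratio."""
    beta_exposure = np.asarray(beta_exposure, float)
    ratio = np.asarray(beta_outcome, float) / beta_exposure
    weights = (beta_exposure / np.asarray(se_outcome, float)) ** 2
    return weighted_median(ratio, weights)
```

Because the median rather than the mean of the weighted ratio estimates is taken, a minority of variants with invalid (outlying) ratios cannot drag the estimate arbitrarily far, which is the source of the 50%-invalid-instrument robustness described above.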

2,959 citations


Journal ArticleDOI
TL;DR: This paper performs simulation studies to investigate the magnitude of bias and Type 1 error rate inflation arising from sample overlap and considers both a continuous outcome and a case‐control setting with a binary outcome.
Abstract: Mendelian randomization analyses are often performed using summarized data. The causal estimate from a one-sample analysis (in which data are taken from a single data source) with weak instrumental variables is biased in the direction of the observational association between the risk factor and outcome, whereas the estimate from a two-sample analysis (in which data on the risk factor and outcome are taken from non-overlapping datasets) is less biased and any bias is in the direction of the null. When using genetic consortia that have partially overlapping sets of participants, the direction and extent of bias are uncertain. In this paper, we perform simulation studies to investigate the magnitude of bias and Type 1 error rate inflation arising from sample overlap. We consider both a continuous outcome and a case-control setting with a binary outcome. For a continuous outcome, bias due to sample overlap is a linear function of the proportion of overlap between the samples. So, in the case of a null causal effect, if the relative bias of the one-sample instrumental variable estimate is 10% (corresponding to an F parameter of 10), then the relative bias with 50% sample overlap is 5%, and with 30% sample overlap is 3%. In a case-control setting, if risk factor measurements are only included for the control participants, unbiased estimates are obtained even in a one-sample setting. However, if risk factor data on both control and case participants are used, then bias is similar with a binary outcome as with a continuous outcome. Consortia releasing publicly available data on the associations of genetic variants with continuous risk factors should provide estimates that exclude case participants from case-control samples.
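The linear relationship described for a continuous outcome can be written as a one-line rule of thumb (an illustrative sketch of the result quoted above, not code from the paper):

```python
def relative_bias(overlap, f_stat):
    """Relative bias of the IV estimate toward the observational
    association under a null causal effect: the one-sample relative
    bias is roughly 1/F, and it scales linearly with the fraction of
    participants shared between the exposure and outcome samples."""
    return overlap * (1.0 / f_stat)

relative_bias(1.0, 10)  # one-sample, F = 10: 10% relative bias
relative_bias(0.5, 10)  # 50% sample overlap: 5%
relative_bias(0.3, 10)  # 30% sample overlap: 3%
```

These three calls reproduce the worked numbers in the abstract.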

768 citations


Journal ArticleDOI
TL;DR: Polygenic epidemiology has been used to establish a polygenic effect, estimate genetic correlation between traits, estimate how many variants affect a trait, stratify cases into subphenotypes, predict individual disease risks, and infer causal effects using Mendelian randomization.
Abstract: Much of the genetic basis of complex traits is present on current genotyping products, but the individual variants that affect the traits have largely not been identified. Several traditional problems in genetic epidemiology have recently been addressed by assuming a polygenic basis for disease and treating it as a single entity. Here I briefly review some of these applications, which collectively may be termed polygenic epidemiology. Methodologies in this area include polygenic scoring, linear mixed models, and linkage disequilibrium scoring. They have been used to establish a polygenic effect, estimate genetic correlation between traits, estimate how many variants affect a trait, stratify cases into subphenotypes, predict individual disease risks, and infer causal effects using Mendelian randomization. Polygenic epidemiology will continue to yield useful applications even while much of the specific variation underlying complex traits remains undiscovered.
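Of the methodologies listed, polygenic scoring is the simplest to make concrete: each individual's score is a weighted count of trait-associated alleles. A minimal sketch (illustrative only; real pipelines add LD clumping, P-value thresholding, and ancestry adjustment):

```python
import numpy as np

def polygenic_score(genotypes, weights):
    """genotypes: (n_individuals, n_snps) counts of the effect allele
    (0/1/2); weights: per-SNP effect sizes from a discovery GWAS."""
    return np.asarray(genotypes, float) @ np.asarray(weights, float)
```

The resulting score treats the polygenic signal as a single entity, which is exactly what enables the downstream applications listed above (risk stratification, genetic correlation, Mendelian randomization).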

134 citations


Journal ArticleDOI
TL;DR: A comprehensive meta‐analysis of 19 studies was performed, and a general model allowing the integration of the different types of cancer risk available in the literature was developed, to obtain a consensus estimate of BC penetrance.
Abstract: The gene responsible for ataxia-telangiectasia syndrome, ATM, is also an intermediate-risk breast cancer (BC) susceptibility gene. Numerous studies have been carried out to determine the contribution of ATM gene mutations to BC risk. Epidemiological cohorts, segregation analyses, and case-control studies reported BC risk in different forms, including penetrance, relative risk, standardized incidence ratio, and odds ratio. Because the reported estimates vary both qualitatively and quantitatively, we developed a general model allowing the integration of the different types of cancer risk available in the literature. We performed a comprehensive meta-analysis identifying 19 studies, and used our model to obtain a consensus estimate of BC penetrance. We estimated the cumulative risk of BC in heterozygous ATM mutation carriers to be 6.02% by 50 years of age (95% credible interval: 4.58-7.42%) and 32.83% by 80 years of age (95% credible interval: 24.55-40.43%). An accurate assessment of cancer penetrance is crucial to help mutation carriers make medical and lifestyle decisions that can reduce their chances of developing the disease.

97 citations


Journal ArticleDOI
TL;DR: This paper revisits and untangles major theoretical aspects of interaction tests in the special case of linear regression, and explores the advantages and limitations of multivariate interaction models, when testing for interaction between multiple SNPs and/or multiple exposures, over univariate approaches.
Abstract: The identification of gene-gene and gene-environment interactions in human traits and diseases is an active area of research that generates high expectations, which are most often followed by disappointment. This is partly explained by a misunderstanding of the inherent characteristics of standard regression-based interaction analyses. Here, I revisit and untangle major theoretical aspects of interaction tests in the special case of linear regression; in particular, I discuss variable coding schemes, interpretation of effect estimates, statistical power, and estimation of variance explained with regard to various hypothetical interaction patterns. Linking these components, it appears first that the simplest biological interaction models-in which the magnitude of a genetic effect depends on a common exposure-are among the most difficult to identify. Second, I highlight the shortcomings of the current strategy to evaluate the contribution of interaction effects to the variance of quantitative outcomes and argue for the use of new approaches to overcome this issue. Finally, I explore the advantages and limitations of multivariate interaction models, when testing for interaction between multiple SNPs and/or multiple exposures, over univariate approaches. Together, these new insights can be leveraged for future method development and to improve our understanding of the genetic architecture of multifactorial traits.
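The standard regression-based interaction test discussed here can be made concrete with a small simulation (a generic sketch, not the author's code): a SNP whose effect exists only in the exposed group, tested via the product term of a linear model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
g = rng.binomial(2, 0.3, n)        # SNP genotype, additive coding
e = rng.binomial(1, 0.5, n)        # binary exposure
# genetic effect only in the exposed: a pure "G effect depends on E" pattern
y = 0.3 * g * e + rng.normal(size=n)

# design matrix: intercept, main effects, interaction term
X = np.column_stack([np.ones(n), g, e, g * e])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X).diagonal())
t_interaction = beta[3] / se[3]    # Wald test of the product term
```

Note that in this "effect only when exposed" pattern, the marginal genetic effect is diluted by the unexposed half of the sample, which is one way to see why the simplest biological interaction models are hard to detect at genome-wide significance.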

83 citations


Journal ArticleDOI
TL;DR: A new and scalable algorithm, joint analysis of marginal summary statistics (JAM), is described for the re-analysis of published marginal summary statistics under joint multi-SNP models; in realistic simulations it demonstrated performance identical to various alternatives designed for single-region settings.
Abstract: Recently, large scale genome-wide association study (GWAS) meta-analyses have boosted the number of known signals for some traits into the tens and hundreds. Typically, however, variants are only analysed one-at-a-time. This complicates the ability of fine-mapping to identify a small set of SNPs for further functional follow-up. We describe a new and scalable algorithm, joint analysis of marginal summary statistics (JAM), for the re-analysis of published marginal summary statistics under joint multi-SNP models. The correlation is accounted for according to estimates from a reference dataset, and models and SNPs that best explain the complete joint pattern of marginal effects are highlighted via an integrated Bayesian penalized regression framework. We provide both enumerated and Reversible Jump MCMC implementations of JAM and present some comparisons of performance. In a series of realistic simulation studies, JAM demonstrated identical performance to various alternatives designed for single region settings. In multi-region settings, where the only multivariate alternative involves stepwise selection, JAM offered greater power and specificity. We also present an application to real published results from MAGIC (meta-analysis of glucose and insulin related traits consortium) - a GWAS meta-analysis of more than 15,000 people. We re-analysed several genomic regions that produced multiple significant signals with glucose levels 2 hours after oral glucose stimulation. Through joint multivariate modelling, JAM was able to formally rule out many SNPs, and for one gene, ADCY5, suggests that an additional SNP, which transpired to be more biologically plausible, should be followed up with equal priority to the reported index.
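The central trick behind summary-statistic re-analysis methods like JAM can be sketched in a few lines: for standardized genotypes, joint multi-SNP effects can be recovered from marginal (one-at-a-time) effects via the LD correlation matrix, β_joint ≈ R⁻¹ β_marginal. The following is an illustrative simulation of that identity, not JAM itself (JAM adds a Bayesian penalized-regression layer and model search on top):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20000, 3
# SNP 1 correlated (r = 0.8) with causal SNP 0; SNP 2 independent
L = np.array([[1.0, 0.0, 0.0],
              [0.8, 0.6, 0.0],
              [0.0, 0.0, 1.0]])
X = rng.normal(size=(n, p)) @ L.T
X = (X - X.mean(0)) / X.std(0)          # standardize genotypes
y = 0.5 * X[:, 0] + rng.normal(size=n)  # only SNP 0 is causal

beta_marginal = X.T @ y / n             # one-at-a-time effect estimates
R = np.corrcoef(X, rowvar=False)        # LD from a "reference" panel
beta_joint = np.linalg.solve(R, beta_marginal)
```

Here the second SNP shows a sizeable marginal effect purely through LD with the causal SNP; the joint re-analysis shrinks it toward zero, which is how such methods can "formally rule out" SNPs as in the ADCY5 example.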

75 citations


Journal ArticleDOI
TL;DR: The study of gene‐environment interactions (G×E) has been an active area of research, but little is reported about the known findings in the literature.
Abstract: Background Risk of cancer is determined by a complex interplay of genetic and environmental factors. Although the study of gene-environment interactions (G×E) has been an active area of research, little is reported about the known findings in the literature. Methods To examine the state of the science in G×E research in cancer, we performed a systematic review of published literature using gene-environment or pharmacogenomic flags from two curated databases of genetic association studies, the Human Genome Epidemiology (HuGE) literature finder and Cancer Genome-Wide Association and Meta Analyses Database (CancerGAMAdb), from January 1, 2001, to January 31, 2011. A supplemental search using HuGE was conducted for articles published from February 1, 2011, to April 11, 2013. A 25% sample of the supplemental publications was reviewed. Results A total of 3,019 articles were identified in the original search. From these articles, 243 were determined to be relevant based on inclusion criteria (more than 3,500 interactions). From the supplemental search (1,400 articles identified), 29 additional relevant articles (1,370 interactions) were included. The majority of publications in both searches examined G×E in colon, rectal, or colorectal; breast; or lung cancer. Specific interactions examined most frequently involved environmental factors categorized as energy balance (e.g., body mass index, diet), exogenous (e.g., oral contraceptives) and endogenous hormones (e.g., menopausal status), chemical environment (e.g., grilled meats), and lifestyle (e.g., smoking, alcohol intake). In both searches, the majority of interactions examined used loci from candidate gene studies, and none of the studies were genome-wide interaction studies (GEWIS). The most commonly reported measure was the interaction P-value, of which a sizable number were considered statistically significant (i.e., <0.05). In addition, the magnitude of the interactions reported was modest. Conclusion Observations of the published literature suggest that opportunity exists for increased sample size in G×E research, for including GWAS-identified loci in G×E studies, for exploring more GWAS approaches in G×E such as GEWIS, and for improving the reporting of G×E findings.

75 citations


Journal ArticleDOI
TL;DR: In this article, a simple hierarchical testing procedure was proposed to control the false discovery rate and the expected value of the average proportion of false discovery of phenotypes influenced by such variants.
Abstract: The genetic basis of multiple phenotypes such as gene expression, metabolite levels, or imaging features is often investigated by testing a large collection of hypotheses, probing the existence of association between each of the traits and hundreds of thousands of genotyped variants. Appropriate multiplicity adjustment is crucial to guarantee replicability of findings, and the false discovery rate (FDR) is frequently adopted as a measure of global error. In the interest of interpretability, results are often summarized so that reporting focuses on variants discovered to be associated to some phenotypes. We show that applying FDR-controlling procedures on the entire collection of hypotheses fails to control the rate of false discovery of associated variants as well as the expected value of the average proportion of false discovery of phenotypes influenced by such variants. We propose a simple hierarchical testing procedure that allows control of both these error rates and provides a more reliable basis for the identification of variants with functional effects. We demonstrate the utility of this approach through simulation studies comparing various error rates and measures of power for genetic association studies of multiple traits. Finally, we apply the proposed method to identify genetic variants that impact flowering phenotypes in Arabidopsis thaliana, expanding the set of discoveries.
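The flavor of the proposed procedure can be sketched as follows (an illustrative two-level scheme in the spirit of the paper, not its exact algorithm): combine each variant's phenotype P-values into a single variant-level P-value, then apply an FDR-controlling step across variants.

```python
import numpy as np

def bh(pvals, q):
    """Benjamini-Hochberg step-up: boolean mask of rejected hypotheses."""
    p = np.asarray(pvals, float)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    mask = np.zeros(m, bool)
    mask[order[:k]] = True
    return mask

def simes(pvals):
    """Simes combination: global P-value for 'this variant affects
    at least one phenotype'."""
    p = np.sort(np.asarray(pvals, float))
    m = len(p)
    return float(np.min(m * p / np.arange(1, m + 1)))

# p[i, j]: P-value for variant i versus phenotype j
p = np.array([[0.001, 0.30, 0.02],
              [0.40,  0.50, 0.60],
              [0.004, 0.01, 0.80]])
variant_p = np.array([simes(row) for row in p])
selected = bh(variant_p, q=0.05)   # level 1: which variants are associated at all
```

In the full hierarchical procedure, phenotypes are then tested within each selected variant at a level adjusted for the number of variants selected, which is what yields simultaneous control of both error rates described above.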

67 citations


Journal ArticleDOI
TL;DR: The Mendelian Randomization analysis provides new evidence for a causal role of educational attainment on refractive error and suggests that observational studies may actually underestimate the true effect of education.
Abstract: Myopia is the largest cause of uncorrected visual impairments globally and its recent dramatic increase in the population has made it a major public health problem. In observational studies, educational attainment has been consistently reported to be correlated with myopia. Nonetheless, correlation does not imply causation. Observational studies do not tell us whether education causes myopia or whether confounding factors underlie the association. In this work, we use a two-stage least squares instrumental-variable (IV) approach to estimate the causal effect of education on refractive error, specifically myopia. We used the results from the educational attainment GWAS from the Social Science Genetic Association Consortium to define a polygenic risk score (PGRS) in three cohorts of late middle age and elderly Caucasian individuals (N = 5,649). In a meta-analysis of the three cohorts, using the PGRS as an IV, we estimated that each z-score increase in education (approximately 2 years of education) results in a reduction of 0.92 ± 0.29 diopters (P = 1.04 × 10⁻³). Our estimate of the effect of education on myopia was higher (P = 0.01) than the observed estimate (0.25 ± 0.03 diopters reduction per education z-score [∼2 years] increase). This suggests that observational studies may actually underestimate the true effect. Our Mendelian Randomization (MR) analysis provides new evidence for a causal role of educational attainment on refractive error.
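The IV logic can be sketched with a toy simulation (generic effect sizes, not the paper's data): a confounder distorts the observational estimate, while the two-stage procedure recovers the true causal effect because the genetic instrument is independent of the confounder.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50000
u = rng.normal(size=n)                    # unmeasured confounder
prs = rng.normal(size=n)                  # polygenic score: the instrument
education = 0.3 * prs + u + rng.normal(size=n)
outcome = 0.5 * education - u + rng.normal(size=n)  # true causal effect: 0.5

# stage 1: predict the exposure from the instrument
stage1 = np.polyfit(prs, education, 1)
education_hat = np.polyval(stage1, prs)
# stage 2: regress the outcome on the predicted exposure
iv_slope = np.polyfit(education_hat, outcome, 1)[0]

naive = np.polyfit(education, outcome, 1)[0]  # confounded OLS estimate
```

In this toy example the confounder acts against the causal effect, so the naive regression badly underestimates it while the IV slope is close to 0.5, mirroring the paper's finding that observational studies may underestimate the effect of education.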

61 citations


Journal ArticleDOI
TL;DR: It is shown that MANOVA is generally very powerful for detecting association but there are situations, such as when a genetic variant is associated with all the traits, where MANOVA may not have any detection power, and a unified score‐based test statistic USAT is proposed that can perform better than MANOVA in such situations and nearly as well as MANOVA elsewhere.
Abstract: Genome-wide association studies (GWASs) for complex diseases often collect data on multiple correlated endo-phenotypes. Multivariate analysis of these correlated phenotypes can improve the power to detect genetic variants. Multivariate analysis of variance (MANOVA) can perform such association analysis at a GWAS level, but the behavior of MANOVA under different trait models has not been carefully investigated. In this paper, we show that MANOVA is generally very powerful for detecting association but there are situations, such as when a genetic variant is associated with all the traits, where MANOVA may not have any detection power. In these situations, marginal model based methods, however, perform much better than multivariate methods. We investigate the behavior of MANOVA, both theoretically and using simulations, and derive the conditions where MANOVA loses power. Based on our findings, we propose a unified score-based test statistic USAT that can perform better than MANOVA in such situations and nearly as well as MANOVA elsewhere. Our proposed test reports an approximate asymptotic P-value for association and is computationally very efficient to implement at a GWAS level. We have studied through extensive simulations the performance of USAT, MANOVA, and other existing approaches and demonstrated the advantage of using the USAT approach to detect association between a genetic variant and multivariate phenotypes. We applied USAT to data from three correlated traits collected on 5,816 Caucasian individuals from the Atherosclerosis Risk in Communities (ARIC) Study and detected some interesting associations.

44 citations


Journal ArticleDOI
TL;DR: This work derives an exact test for KAT with continuous traits, which resolves the small-sample conservatism of KAT without the need for resampling, and proposes a similar approximate test for binary traits that has significantly improved power to detect association for microbiome studies.
Abstract: Kernel machine based association tests (KAT) have been increasingly used in testing the association between an outcome and a set of biological measurements due to their power to combine multiple weak signals of complex relationship with the outcome through the specification of a relevant kernel. Human genetic and microbiome association studies are two important applications of KAT. However, the classic KAT framework relies on large sample theory, and conservativeness has been observed for small sample studies, especially for microbiome association studies. The common approach for addressing the small sample problem relies on computationally intensive resampling methods. Here, we derive an exact test for KAT with continuous traits, which resolves the small sample conservatism of KAT without the need for resampling. The exact test has significantly improved power to detect association for microbiome studies. For binary traits, we propose a similar approximate test, and we show that the approximate test is very powerful for a wide range of kernels including common variant- and microbiome-based kernels, and the approximate test controls the type I error well for these kernels. In contrast, the sequence kernel association tests have slightly inflated genomic inflation factors after small sample adjustment. Extensive simulations and application to a real microbiome association study are used to demonstrate the utility of our method.

Journal ArticleDOI
TL;DR: The commonly used sequence kernel association test (SKAT) for single‐trait analysis is extended to test for the joint association of rare variant sets with multiple traits to identify an exome‐wide significant rare variant set in the gene YAP1 worthy of further investigations.
Abstract: Genetic studies often collect multiple correlated traits, which could be analyzed jointly to increase power by aggregating multiple weak effects and provide additional insights into the etiology of complex human diseases. Existing methods for multiple trait association tests have primarily focused on common variants. There is a surprising dearth of published methods for testing the association of rare variants with multiple correlated traits. In this paper, we extend the commonly used sequence kernel association test (SKAT) for single-trait analysis to test for the joint association of rare variant sets with multiple traits. We investigate the performance of the proposed method through extensive simulation studies. We further illustrate its usefulness with application to the analysis of diabetes-related traits in the Atherosclerosis Risk in Communities (ARIC) Study. We identified an exome-wide significant rare variant set in the gene YAP1 worthy of further investigations.

Journal ArticleDOI
TL;DR: The results provide insights on the age span during which myopia genes exert their effect, and form the basis for understanding the mechanisms underlying high and pathological myopia.
Abstract: Previous studies have identified many genetic loci for refractive error and myopia. We aimed to investigate the effect of these loci on ocular biometry as a function of age in children, adolescents, and adults. The study population consisted of three age groups identified from the international CREAM consortium: 5,490 individuals aged <10 years, plus groups aged 10–25 years and >25 years. All participants had undergone standard ophthalmic examination including measurements of axial length (AL) and corneal radius (CR). We examined the lead SNP at all 39 currently known genetic loci for refractive error identified from genome-wide association studies (GWAS), as well as a combined genetic risk score (GRS). The beta coefficient for association between SNP genotype or GRS versus AL/CR was compared across the three age groups, adjusting for age, sex, and principal components. Analyses were Bonferroni-corrected. In the age group <10 years, three loci (GJD2, CHRNG, ZIC2) were associated with AL/CR. In the age group 10–25 years, four loci (BMP2, KCNQ5, A2BP1, CACNA1D) were associated; and in adults, 20 loci were associated. Association with the GRS increased with age; β = 0.0016 per risk allele (P = 2 × 10⁻⁸) in those aged <10 years, 0.0033 (P = 5 × 10⁻¹⁵) in 10- to 25-year-olds, and 0.0048 (P = 1 × 10⁻⁷²) in adults. Genes with the strongest effects (LAMA2, GJD2) had an early effect that increased with age. Our results provide insights into the age span during which myopia genes exert their effect. These insights form the basis for understanding the mechanisms underlying high and pathological myopia.

Journal ArticleDOI
TL;DR: The goal was to identify candidate genes with rare genetic variants for NSCLP in a Honduran population using whole exome sequencing; preliminary results identified 3,727 heterozygous rare variants, of which 1,282 were predicted to be functionally consequential.
Abstract: Studies suggest that nonsyndromic cleft lip and palate (NSCLP) is polygenic with variable penetrance, presenting a challenge in identifying all causal genetic variants. Despite relatively high prevalence of NSCLP among Amerindian populations, no large whole exome sequencing (WES) studies have been completed in this population. Our goal was to identify candidate genes with rare genetic variants for NSCLP in a Honduran population using WES. WES was performed on two to four members of 27 multiplex Honduran families. Genetic variants with a minor allele frequency > 1% in reference databases were removed. Heterozygous variants consistent with dominant disease with incomplete penetrance were ascertained, and variants with predicted functional consequence were prioritized for analysis. Pedigree-specific P-values were calculated as the probability of all affected members in the pedigree being carriers, given that at least one is a carrier. Preliminary results identified 3,727 heterozygous rare variants; 1,282 were predicted to be functionally consequential. Twenty-three genes had variants of interest in ≥3 families, where some genes had different variants in each family, giving a total of 50 variants. Variant validation via Sanger sequencing of the families and unrelated unaffected controls excluded variants that were sequencing errors or common variants not in databases, leaving four genes with candidate variants in ≥3 families. Of these, candidate variants in two genes consistently segregate with NSCLP as a dominant variant with incomplete penetrance: ACSS2 and PHYH. Rare variants found at the same gene in all affected individuals in several families are likely to be directly related to NSCLP.
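The pedigree-specific probability described can be written down directly under a simplifying assumption of independent carrier status across affected members (the paper's calculation conditions on the actual pedigree structure, so treat this as a sketch of the idea only):

```python
def pedigree_p(carrier_freq, n_affected):
    """P(all affected members carry the variant | at least one does),
    assuming carrier status is independent across members under the
    null -- a simplification that ignores within-family transmission."""
    q, k = carrier_freq, n_affected
    return q ** k / (1.0 - (1.0 - q) ** k)
```

Two sanity properties follow immediately: with a single affected member the probability is 1 by construction, and the probability shrinks as more affected members must all carry a rare variant by chance, making co-segregation in several families increasingly unlikely under the null.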

Journal ArticleDOI
TL;DR: This study demonstrates that principal components generally result in higher heritability and linkage evidence than individual traits, and PCHs can provide useful traits for using data on multiple phenotypes and for genetic studies of trans‐ethnic populations.
Abstract: A disease trait often can be characterized by multiple phenotypic measurements that can provide complementary information on disease etiology, physiology, or clinical manifestations. Given that multiple phenotypes may be correlated and reflect common underlying genetic mechanisms, the use of multivariate analysis of multiple traits may improve statistical power to detect genes and variants underlying complex traits. The literature, however, has been unclear as to the optimal approach for analyzing multiple correlated traits. In this study, heritability and linkage analysis was performed for six obstructive sleep apnea hypopnea syndrome (OSAHS) related phenotypes, as well as principal components of the phenotypes and principal components of the heritability (PCHs) using the data from Cleveland Family Study, which include both African and European American families. Our study demonstrates that principal components generally result in higher heritability and linkage evidence than individual traits. Furthermore, the PCHs can be transferred across populations, strongly suggesting that these PCHs reflect traits with common underlying genetic mechanisms for OSAHS across populations. Thus, PCHs can provide useful traits for using data on multiple phenotypes and for genetic studies of trans-ethnic populations.
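Principal components of correlated phenotypes can be computed in a few lines (a generic illustration with six simulated traits sharing one latent factor, standing in for the six OSAHS-related phenotypes; principal components of heritability additionally require decomposing the genetic covariance, which is beyond this sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
shared = rng.normal(size=n)   # latent factor common to all traits
traits = np.column_stack(
    [shared + rng.normal(scale=0.5, size=n) for _ in range(6)]
)

Z = (traits - traits.mean(0)) / traits.std(0)  # standardize each phenotype
cov = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)         # ascending eigenvalues
pc1 = Z @ eigvecs[:, -1]                       # first principal component
var_explained = eigvals[-1] / eigvals.sum()
```

When traits share an underlying factor, PC1 concentrates that shared signal into a single composite trait, which is consistent with the higher heritability and linkage evidence reported for principal components above.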

Journal ArticleDOI
TL;DR: It is shown that the Paré et al. approach has an inflated false‐positive rate in the presence of an environmental marginal effect, and an alternative that remains valid is proposed, and a novel 2‐step approach that combines the two screening approaches is proposed that can outperform other GWIS approaches.
Abstract: A genome-wide association study (GWAS) typically is focused on detecting marginal genetic effects. However, many complex traits are likely to be the result of the interplay of genes and environmental factors. SNPs involved in such interplay may have weak marginal effects and are thus unlikely to be detected by a scan of marginal effects, but may be detectable in a gene-environment (G × E) interaction analysis. However, a genome-wide interaction scan (GWIS) using a standard test of G × E interaction is known to have low power, particularly when one corrects for testing multiple SNPs. Two 2-step methods for GWIS have been previously proposed, aimed at improving efficiency by prioritizing SNPs most likely to be involved in a G × E interaction using a screening step. For a quantitative trait, these include a method that screens on marginal effects [Kooperberg and Leblanc, 2008] and a method that screens on variance heterogeneity by genotype [Paré et al., 2010]. In this paper, we show that the Paré et al. approach has an inflated false-positive rate in the presence of an environmental marginal effect, and we propose an alternative that remains valid. We also propose a novel 2-step approach that combines the two screening approaches, and provide simulations demonstrating that the new method can outperform other GWIS approaches. Application of this method to a G × Hispanic-ethnicity scan for childhood lung function reveals a SNP near the MARCO locus that was not identified by previous marginal-effect scans.
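The general shape of a 2-step GWIS can be sketched as follows (illustrative thresholds and z-score inputs; the methods compared in the paper differ in what the screening statistic is: marginal effects, variance heterogeneity, or the combination proposed here):

```python
import numpy as np
from statistics import NormalDist

def two_step_gwis(z_marginal, z_interaction, alpha1=0.05, alpha=0.05):
    """Step 1: screen SNPs on marginal-effect z-scores at a liberal
    level alpha1.  Step 2: test the G x E interaction only for SNPs
    that pass, with Bonferroni over the (much smaller) screened set."""
    nd = NormalDist()
    z_marginal = np.abs(np.asarray(z_marginal, float))
    z_interaction = np.abs(np.asarray(z_interaction, float))
    passed = z_marginal > nd.inv_cdf(1 - alpha1 / 2)   # step-1 screen
    k = int(passed.sum())
    if k == 0:
        return np.zeros(len(z_marginal), bool)
    test_cut = nd.inv_cdf(1 - alpha / (2 * k))         # Bonferroni over k
    return passed & (z_interaction > test_cut)
```

The correction in step 2 is over only the screened subset, which is where the power gain over a brute-force genome-wide interaction scan comes from; the second SNP in the test below also shows the screen's blind spot: a strong interaction with no marginal signal is never tested.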

Journal ArticleDOI
TL;DR: In this article, the authors compare the two frameworks using results from genome-wide association studies of systolic blood pressure for 3.2 million low frequency and 6.5 million common variants across 20 cohorts of European ancestry, comprising 79,731 individuals.
Abstract: Studying gene-environment (G × E) interactions is important, as they extend our knowledge of the genetic architecture of complex traits and may help to identify novel variants not detected via analysis of main effects alone. The main statistical framework for studying G × E interactions uses a single regression model that includes both the genetic main and G × E interaction effects (the “joint” framework). The alternative “stratified” framework combines results from genetic main-effect analyses carried out separately within the exposed and unexposed groups. Although there have been several investigations using theory and simulation, an empirical comparison of the two frameworks is lacking. Here, we compare the two frameworks using results from genome-wide association studies of systolic blood pressure for 3.2 million low frequency and 6.5 million common variants across 20 cohorts of European ancestry, comprising 79,731 individuals. Our cohorts have sample sizes ranging from 456 to 22,983 and include both family-based and population-based samples. In cohort-specific analyses, the two frameworks provided similar inference for population-based cohorts. The agreement was reduced for family-based cohorts. In meta-analyses, agreement between the two frameworks was less than that observed in cohort-specific analyses, despite the increased sample size. In meta-analyses, agreement depended on (1) the minor allele frequency, (2) inclusion of family-based cohorts in meta-analysis, and (3) filtering scheme. The stratified framework appears to approximate the joint framework well only for common variants in population-based cohorts. We conclude that the joint framework is the preferred approach and should be used to control false positives when dealing with low-frequency variants and/or family-based cohorts.
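The two frameworks can be compared directly in a toy simulation (illustrative; it ignores the family structure, meta-analysis, and filtering issues that drive the disagreements reported above):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000
g = rng.binomial(2, 0.3, n).astype(float)  # genotype
e = rng.binomial(1, 0.5, n)                # exposed / unexposed
y = 0.2 * g + 0.4 * g * e + rng.normal(size=n)

# "joint" framework: one model with genetic main and interaction effects
X = np.column_stack([np.ones(n), g, e, g * e])
interaction_joint = np.linalg.lstsq(X, y, rcond=None)[0][3]

# "stratified" framework: genetic effect per stratum, then the difference
def slope(x, yv):
    return np.polyfit(x, yv, 1)[0]

interaction_strat = slope(g[e == 1], y[e == 1]) - slope(g[e == 0], y[e == 0])
```

With a binary exposure and no further covariates, the joint model is a reparametrization of the two stratum-specific fits, so the two interaction estimates coincide exactly; the paper's point is that this agreement degrades for family-based cohorts, low-frequency variants, and meta-analysis.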

Journal ArticleDOI
TL;DR: This contribution approaches omics and non-omics data integration from the scope of epidemiology by considering the “massive” inclusion of variables in risk assessment and predictive models.
Abstract: Primary and secondary prevention can benefit greatly from a personalized medicine approach through the accurate discrimination of individuals at high risk of developing a specific disease from those at moderate and low risk. To this end, precise risk prediction models need to be built. This endeavor requires a precise characterization of the individual exposome, genome, and phenome. Massive molecular omics data representing the different layers of the biological processes of the host and the nonhost will enable more accurate risk prediction models to be built. Epidemiologists aim to integrate omics data along with important information coming from other sources (questionnaires, candidate markers) that has proved relevant in the discrimination risk assessment of complex diseases. However, the integrative models in large-scale epidemiologic research are still in their infancy and they face numerous challenges, some of them at the analytical stage. So far, only a small number of studies have integrated more than two omics data sets, and the inclusion of non-omics data in the same models is still missing in most studies. In this contribution, we approach omics and non-omics data integration from the scope of epidemiology by considering the "massive" inclusion of variables in risk assessment and predictive models. We also provide already available examples of integrative contributions in the field, propose analytical strategies that allow considering both omics and non-omics data in the models, and finally review the challenges embedded in this type of research.

Journal ArticleDOI
TL;DR: The ability of the approach to identify biologically plausible SNP‐education interactions relative to Alzheimer's disease status using genome‐wide association study data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) is demonstrated.
Abstract: Although gene-environment (G × E) interactions play an important role in many biological systems, detecting these interactions within genome-wide data can be challenging due to the loss in statistical power incurred by multiple hypothesis correction. To address the challenge of poor power and the limitations of existing multistage methods, we recently developed a screening-testing approach for G × E interaction detection that combines elastic net penalized regression with joint estimation to support a single omnibus test for the presence of G × E interactions. In our original work on this technique, however, we did not assess type I error control or power and evaluated the method using just a single, small bladder cancer data set. In this paper, we extend the original method in two important directions and provide a more rigorous performance evaluation. First, we introduce a hierarchical false discovery rate approach to formally assess the significance of individual G × E interactions. Second, to support the analysis of truly genome-wide data sets, we incorporate a score statistic-based prescreening step to reduce the number of single nucleotide polymorphisms prior to fitting the first stage penalized regression model. To assess the statistical properties of our method, we compare the type I error rate and statistical power of our approach with competing techniques using both simple simulation designs as well as designs based on real disease architectures. Finally, we demonstrate the ability of our approach to identify biologically plausible SNP-education interactions relative to Alzheimer's disease status using genome-wide association study data from the Alzheimer's Disease Neuroimaging Initiative (ADNI).
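The screening idea can be sketched with a generic elastic net fit that retains a sparse set of candidate terms for a second-stage formal test. This is a minimal illustration under assumed data, not the authors' screening-testing pipeline; the penalty settings and feature layout are hypothetical.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n, p = 500, 50

# Hypothetical feature matrix: in a G x E screen, SNP dosage columns and
# their products with an exposure would be stacked here; plain standardized
# features keep the sketch short. Features 0 and 1 carry the true signal.
X = rng.normal(size=(n, p))
y = 1.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0.0, 0.5, n)

# Elastic net (combined L1 + L2 penalty) shrinks most null coefficients
# to exactly zero, leaving a small candidate set for downstream testing.
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
selected = np.flatnonzero(model.coef_)
```

With signal this strong, the true features survive screening while most noise features are zeroed out; in practice the retained set is then passed to an omnibus or hierarchical FDR test rather than interpreted directly.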

Journal ArticleDOI
TL;DR: Simulations show that MetaCor controls inflation better than alternatives such as ignoring the correlation between the strata or analyzing all strata together in a “pooled” GWAS, especially with different minor allele frequencies (MAFs) between strata.
Abstract: Investigators often meta-analyze multiple genome-wide association studies (GWASs) to increase the power to detect associations of single nucleotide polymorphisms (SNPs) with a trait. Meta-analysis is also performed within a single cohort that is stratified by, e.g., sex or ancestry group. Having correlated individuals among the strata may complicate meta-analyses, limit power, and inflate Type 1 error. For example, in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), sources of correlation include genetic relatedness, shared household, and shared community. We propose a novel mixed-effect model for meta-analysis, "MetaCor," which accounts for correlation between stratum-specific effect estimates. Simulations show that MetaCor controls inflation better than alternatives such as ignoring the correlation between the strata or analyzing all strata together in a "pooled" GWAS, especially with different minor allele frequencies (MAFs) between strata. We illustrate the benefits of MetaCor on two GWASs in the HCHS/SOL. Analysis of dental caries (tooth decay) stratified by ancestry group detected a genome-wide significant SNP (rs7791001, P-value = 3.66×10-8, compared to 4.67×10-7 in pooled), with different MAFs between strata. Stratified analysis of body mass index (BMI) by ancestry group and sex reduced overall inflation from λGC=1.050 (pooled) to λGC=1.028 (MetaCor). Furthermore, even after removing close relatives to obtain nearly uncorrelated strata, a naive stratified analysis resulted in λGC=1.058 compared to λGC=1.027 for MetaCor.
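The baseline that a method like MetaCor generalizes is the fixed-effect inverse-variance weighted combination of stratum estimates; a GLS variant that admits a covariance between strata conveys the general idea of accounting for correlation. This is a sketch only (not the published MetaCor model), and the stratum effects below are hypothetical.

```python
import numpy as np

def ivw_meta(betas, ses):
    """Standard fixed-effect inverse-variance weighted meta-analysis,
    which assumes the stratum-specific estimates are independent."""
    b = np.asarray(betas, float)
    w = 1.0 / np.asarray(ses, float) ** 2
    return np.sum(w * b) / np.sum(w), np.sqrt(1.0 / np.sum(w))

def correlated_meta(betas, cov):
    """Generalized least-squares combination using a full covariance
    matrix of the stratum estimates -- the broad idea behind accounting
    for correlated strata (relatedness, shared household/community)."""
    b = np.asarray(betas, float)
    ci = np.linalg.inv(np.asarray(cov, float))
    one = np.ones_like(b)
    var = 1.0 / (one @ ci @ one)
    return var * (one @ ci @ b), np.sqrt(var)

# Hypothetical stratum-specific SNP effects for three ancestry groups.
beta, se = ivw_meta([0.12, 0.08, 0.15], [0.05, 0.04, 0.06])
```

With a diagonal covariance matrix the two functions return identical answers; positive cross-stratum covariance makes the naive IVW standard error anti-conservative, which is the inflation the abstract describes.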

Journal ArticleDOI
TL;DR: In the application to a large GWAS, it is found that the modified B–H procedure also performs well, indicating that this may be an optimal approach for determining the traits underlying a pleiotropic signal.
Abstract: Discovering pleiotropic loci is important to understand the biological basis of seemingly distinct phenotypes. Most methods for assessing pleiotropy only test for the overall association between genetic variants and multiple phenotypes. To determine which specific traits are pleiotropic, we evaluate via simulation and application three different strategies. The first is model selection techniques based on the inverse regression of genotype on phenotypes. The second is a subset-based meta-analysis, ASSET [Bhattacharjee et al., 2012], which provides an optimal subset of nonnull traits. The third is a modified Benjamini–Hochberg (B–H) procedure for controlling the expected false discovery rate [Benjamini and Hochberg, 1995] in the framework of a phenome-wide association study. From our simulations we see that the inverse regression-based approach MultiPhen [O'Reilly et al., 2012] is more powerful than ASSET for detecting overall pleiotropic association, except when all the phenotypes are associated and have genetic effects in the same direction. For determining which specific traits are pleiotropic, the modified B–H procedure performs consistently better than the other two methods. The inverse regression-based selection methods perform competitively with the modified B–H procedure only when the phenotypes are weakly correlated. The efficiency of ASSET is observed to lie below and in between the efficiency of the other two methods when the traits are weakly and strongly correlated, respectively. In our application to a large GWAS, we find that the modified B–H procedure also performs well, indicating that this may be an optimal approach for determining the traits underlying a pleiotropic signal.
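The standard (unmodified) Benjamini–Hochberg step-up procedure referenced above can be sketched as follows; the paper's PheWAS-specific modification is not reproduced here.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg (1995) step-up procedure: returns a boolean
    array marking which hypotheses are rejected at FDR level alpha."""
    p = np.asarray(pvals, float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest k with p_(k) <= (k/m) * alpha, then reject the
    # hypotheses with the k smallest P-values.
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject
```

For example, with P-values (0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205) and alpha = 0.05, only the two smallest survive, since the third-ranked 0.039 exceeds its threshold of (3/8) × 0.05 = 0.01875.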

Journal ArticleDOI
TL;DR: This work proposes both an empirical method to estimate this prior variance, and a coherent approach to using SNP‐level functional data, to inform the prior probability of causal association, and shows that assigning SNP‐specific prior probabilities of association based on expert prior functional knowledge of the disease mechanism can lead to improved causal SNPs ranks.
Abstract: There is a large amount of functional genetic data available, which can be used to inform fine-mapping association studies (in diseases with well-characterised disease pathways). Single nucleotide polymorphism (SNP) prioritization via Bayes factors is attractive because prior information can inform the effect size or the prior probability of causal association. This approach requires the specification of the effect size. If the information needed to estimate a priori the probability density of the effect sizes for causal SNPs in a genomic region is inconsistent or unavailable, then specifying a prior variance for the effect sizes is challenging. We propose both an empirical method to estimate this prior variance, and a coherent approach to using SNP-level functional data to inform the prior probability of causal association. Through simulation we show that when ranking SNPs by our empirical Bayes factor in a fine-mapping study, the causal SNP rank is generally as high or higher than the rank using Bayes factors with other plausible values of the prior variance. Importantly, we also show that assigning SNP-specific prior probabilities of association based on expert prior functional knowledge of the disease mechanism can lead to improved causal SNP ranks compared to ranking with identical prior probabilities of association. We demonstrate our methods by applying them to the fine-mapping of the CASP8 region of chromosome 2 using genotype data from the Collaborative Oncological Gene-Environment Study (COGS) Consortium. The data we analysed included approximately 46,000 breast cancer cases and 43,000 healthy controls.
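A common concrete form of such a Bayes factor is the Wakefield-style approximation, in which the prior variance W of the effect size is exactly the quantity whose specification the abstract discusses. This is an illustrative sketch, not necessarily the paper's exact formulation.

```python
import numpy as np

def approx_bayes_factor(beta_hat, se, W):
    """Approximate Bayes factor in favour of association.

    Assumes the estimate beta_hat ~ N(beta, V) with V = se**2, and a
    prior beta ~ N(0, W) under the alternative; the BF is the ratio of
    the marginal likelihoods N(beta_hat; 0, V + W) / N(beta_hat; 0, V).
    """
    V = float(se) ** 2
    z2 = (float(beta_hat) / float(se)) ** 2
    return np.sqrt(V / (V + W)) * np.exp(z2 * W / (2.0 * (V + W)))
```

Ranking SNPs by this quantity depends on the chosen W: a strongly associated SNP (say beta_hat = 0.1, se = 0.02) yields a very large BF, while a null estimate yields a BF below 1, and the paper's empirical method targets the choice of W itself.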

Journal ArticleDOI
TL;DR: Cox proportional hazard models using functional regression (FR) to perform gene‐based association analysis of survival traits while adjusting for covariates and likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region are developed.
Abstract: Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models in which the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and the sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than, or similar power to, the Cox SKAT LRT, except when 50%/50% of causal variants have negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than the Cox BT LRT. The models and related test statistics can be useful in whole-genome and whole-exome association studies. An age-related macular degeneration dataset was analyzed as an example.

Journal ArticleDOI
TL;DR: The interaction‐term genomic inflation factor (lambda) showed inflation and deflation that varied with sample size and allele frequency; that similar lambda variation occurred in the absence of population substructure; and that lambda was strongly related to heteroskedasticity but not to minor non‐normality of phenotypes.
Abstract: Adequate control of type I error rates will be necessary in the increasing genome-wide search for interactive effects on complex traits. After observing unexpected variability in type I error rates from SNP-by-genome interaction scans, we sought to characterize this variability and test the ability of heteroskedasticity-consistent standard errors to correct it. We performed 81 SNP-by-genome interaction scans using a product-term model on quantitative traits in a sample of 1,053 unrelated European Americans from the NHLBI Family Heart Study, and additional scans on five simulated datasets. We found that the interaction-term genomic inflation factor (lambda) showed inflation and deflation that varied with sample size and allele frequency; that similar lambda variation occurred in the absence of population substructure; and that lambda was strongly related to heteroskedasticity but not to minor non-normality of phenotypes. Heteroskedasticity-consistent standard errors narrowed the range of lambda, with HC3 outperforming HC0, but in individual scans they tended to create new P-value outliers related to sparse two-locus genotype classes. We explain the lambda variation as a result of non-independence of test statistics coupled with stochastic biases in test statistics due to a failure of the test to reach asymptotic properties. We propose that one way to interpret lambda is by comparison to an empirical distribution generated from data simulated under the null hypothesis and without population substructure. We further conclude that the interaction-term lambda should not be used to adjust test statistics and that heteroskedasticity-consistent standard errors come with limitations that may outweigh their benefits in this setting.
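The genomic inflation factor discussed above is conventionally computed as the median observed 1-df chi-square test statistic divided by the median of the null chi-square(1) distribution. A minimal sketch, using simulated null P-values rather than the NHLBI data:

```python
import numpy as np
from scipy import stats

def genomic_inflation(pvals):
    """Genomic control lambda: median of the 1-df chi-square statistics
    implied by the P-values, divided by the chi-square(1) median
    (approximately 0.4549). Values near 1 indicate no inflation."""
    chi2 = stats.chi2.isf(np.asarray(pvals, float), df=1)
    return np.median(chi2) / stats.chi2.ppf(0.5, df=1)

# Under the null, P-values are uniform and lambda should be near 1.
rng = np.random.default_rng(1)
lam = genomic_inflation(rng.uniform(size=100_000))
```

The paper's point is that for interaction terms this summary can drift from 1 even without substructure, so deviations should be judged against an empirical null distribution rather than "corrected" by dividing test statistics by lambda.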

Journal ArticleDOI
TL;DR: The approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity by State (IBS) kernel and provides practically useful utilities to prioritize SNPs, and fills the gap between SNP set analysis and biological functional studies.
Abstract: Kernel machine learning methods, such as the SNP-set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single-SNP analysis methods, these methods are designed to examine the joint effect of a set of related SNPs (such as a group of SNPs within a gene or a pathway) and are able to identify sets of SNPs that are associated with the trait of interest. However, as with many multi-SNP testing approaches, kernel machine testing can draw conclusions only at the SNP-set level and does not directly indicate which SNP(s) in an identified set actually drive the associations. A recently proposed procedure, KerNel Iterative Feature Extraction (KNIFE), provides a general framework for incorporating variable selection into kernel machine methods. In this article, we focus on quantitative traits and relatively common SNPs, adapt the KNIFE procedure to genetic association studies, and propose an approach to identify driver SNPs after the application of SKAT to gene set analysis. Our approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity by State (IBS) kernel. The proposed approach provides practically useful utilities to prioritize SNPs, and fills the gap between SNP-set analysis and biological functional studies. Both simulation studies and a real data application are used to demonstrate the proposed approach.
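The two kernels named above can be computed directly for small genotype matrices. This is a short sketch with a hypothetical `ibs_kernel` helper, not code from KNIFE or SKAT.

```python
import numpy as np

def ibs_kernel(G):
    """Identity-by-State kernel for an n x m genotype dosage matrix
    (entries 0/1/2): K[i, j] is the fraction of alleles shared in state,
    averaged over the m SNPs. K is symmetric with ones on the diagonal."""
    diff = np.abs(G[:, None, :] - G[None, :, :])  # pairwise |g_i - g_j|
    return (2.0 - diff).sum(axis=2) / (2.0 * G.shape[1])

def linear_kernel(G):
    """Linear kernel: inner products of genotype vectors (G @ G.T)."""
    return G @ G.T
```

For instance, two subjects with genotypes (0, 1, 2) and (2, 1, 0) share only the middle SNP fully, giving an IBS similarity of 1/3, while identical genotype vectors give 1.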

Journal ArticleDOI
TL;DR: The utility of whole genome sequence and innovative analyses for identifying candidate regions influencing complex phenotypes are demonstrated, including 16 carnitine‐related metabolites that are important components of mammalian energy metabolism.
Abstract: We use whole genome sequence data and rare variant analysis methods to investigate a subset of the human serum metabolome, including 16 carnitine-related metabolites that are important components of mammalian energy metabolism. Medium-pass sequence data consisting of 12,820,347 rare variants and serum metabolomics data were available on 1,456 individuals. By applying a penalization method, we identified two genes, FGF8 and MDGA2, with significant effects on lysine and cis-4-decenoylcarnitine, respectively, using Δ-AIC and likelihood ratio test statistics. Single variant analyses in these regions did not identify a single low-frequency variant (minor allele count > 3) responsible for the underlying signal. The results demonstrate the utility of whole genome sequence data and innovative analyses for identifying candidate regions influencing complex phenotypes.

Journal ArticleDOI
TL;DR: Multiethnic studies had greater power than single‐ethnicity studies at many loci, with inclusion of African Americans providing the largest impact, and association studies between rare variants and complex disease should consider including subjects from multiple ethnicities.
Abstract: Several methods have been proposed to increase power in rare variant association testing by aggregating information from individual rare variants (MAF < 0.005). However, how to best combine rare variants across multiple ethnicities and the relative performance of designs using different ethnic sampling fractions remains unknown. In this study, we compare the performance of several statistical approaches for assessing rare variant associations across multiple ethnicities. We also explore how different ethnic sampling fractions perform, including single-ethnicity studies and studies that sample up to four ethnicities. We conducted simulations based on targeted sequencing data from 4,611 women in four ethnicities (African, European, Japanese American, and Latina). As with single-ethnicity studies, burden tests had greater power when all causal rare variants were deleterious, and variance component-based tests had greater power when some causal rare variants were deleterious and some were protective. Multiethnic studies had greater power than single-ethnicity studies at many loci, with inclusion of African Americans providing the largest impact. On average, studies including African Americans had as much as 20% greater power than equivalently sized studies without African Americans. This suggests that association studies between rare variants and complex disease should consider including subjects from multiple ethnicities, with preference given to genetically diverse groups.
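The burden idea compared above can be sketched as a regression of the trait on each subject's rare-allele count; the simulated genotypes below are hypothetical, not the study's sequencing data, and the variance-component (SKAT-style) alternative is not shown.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, m = 2000, 20

# Hypothetical rare-variant genotype matrix: m variants with MAF in
# (0.001, 0.005), all causal and deleterious with per-allele effect 0.5.
maf = rng.uniform(0.001, 0.005, m)
G = rng.binomial(2, maf, size=(n, m)).astype(float)
y = 0.5 * G.sum(axis=1) + rng.normal(0.0, 1.0, n)

def burden_test(G, y):
    """Simple burden test: collapse the variants into a per-subject
    rare-allele count and return the two-sided P-value for its slope."""
    burden = G.sum(axis=1)
    result = stats.linregress(burden, y)
    return result.pvalue

p = burden_test(G, y)
```

Collapsing is powerful in exactly the scenario simulated here (all causal variants deleterious); when effects go in both directions, the positive and negative contributions cancel in the count, which is why variance-component tests win in that setting.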

Journal ArticleDOI
TL;DR: Flexibility and competitive power of the functional linear model approach is demonstrated by contrasting its performance with commonly used statistical tools and its potential for discovery and characterization of genetic architecture of complex traits using sequencing data from the Dallas Heart Study is illustrated.
Abstract: Recent technological advances equipped researchers with capabilities that go beyond traditional genotyping of loci known to be polymorphic in a general population. Genetic sequences of study participants can now be assessed directly. This capability removed technology-driven bias toward scoring predominantly common polymorphisms and let researchers reveal a wealth of rare and sample-specific variants. Although the relative contributions of rare and common polymorphisms to trait variation are being debated, researchers are faced with the need for new statistical tools for simultaneous evaluation of all variants within a region. Several research groups demonstrated flexibility and good statistical power of the functional linear model approach. In this work we extend previous developments to allow inclusion of multiple traits and adjustment for additional covariates. Our functional approach is unique in that it provides a nuanced depiction of effects and interactions for the variables in the model by representing them as curves varying over a genetic region. We demonstrate flexibility and competitive power of our approach by contrasting its performance with commonly used statistical tools and illustrate its potential for discovery and characterization of genetic architecture of complex traits using sequencing data from the Dallas Heart Study.

Journal ArticleDOI
TL;DR: The Framingham Heart Study data is used to demonstrate the promising performance of the new methods as well as inconsistent results produced by the standard MR analysis that relies on a single measurement of the exposure at some arbitrary time point.
Abstract: A Mendelian randomization (MR) analysis is performed to analyze the causal effect of an exposure variable on a disease outcome in observational studies, by using genetic variants that affect the disease outcome only through the exposure variable. This method has recently gained popularity among epidemiologists given the success of genetic association studies. Many exposure variables of interest in epidemiological studies are time varying, for example, body mass index (BMI). Although longitudinal data have been collected in many cohort studies, current MR studies only use one measurement of a time-varying exposure variable, which cannot adequately capture the long-term time-varying information. We propose using the functional principal component analysis method to recover the underlying individual trajectory of the time-varying exposure from the sparsely and irregularly observed longitudinal data, and then conduct MR analysis using the recovered curves. We further propose two MR analysis methods. The first assumes a cumulative effect of the time-varying exposure variable on the disease risk, while the second assumes a time-varying genetic effect and employs functional regression models. We focus on statistical testing for a causal effect. Our simulation studies mimicking the real data show that the proposed functional data analysis-based methods incorporating longitudinal data have substantial power gains compared to standard MR analysis using only one measurement. We used the Framingham Heart Study data to demonstrate the promising performance of the new methods as well as inconsistent results produced by the standard MR analysis that relies on a single measurement of the exposure at some arbitrary time point.
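The "single measurement" analysis that the paper improves upon is the standard single-instrument Wald ratio. A hedged sketch with simulated data (true causal effect 0.25, unmeasured confounder U; not the Framingham data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup: G instruments the exposure X; U confounds X and Y;
# the true causal effect of X on Y is 0.25.
G = rng.binomial(2, 0.3, n).astype(float)
U = rng.normal(size=n)
X = 0.5 * G + U + rng.normal(size=n)
Y = 0.25 * X + U + rng.normal(size=n)

def slope(a, b):
    """OLS slope of b regressed on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

# Confounded observational estimate vs. the single-instrument Wald
# ratio (G-Y association divided by G-X association).
naive = slope(X, Y)
wald = slope(G, Y) / slope(G, X)
```

The naive slope is badly biased upward by U, while the Wald ratio recovers approximately 0.25; the paper's functional methods extend this idea by replacing the single exposure measurement with a trajectory recovered via functional principal component analysis.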

Journal ArticleDOI
TL;DR: In this paper, nine rare missense variants at evolutionarily conserved sites in TCIRG1 were associated with lower absolute neutrophil count (ANC; p = 0.005).
Abstract: Neutrophils are a key component of innate immunity. Individuals with low neutrophil count are susceptible to frequent infections. Linkage and association between congenital neutropenia and a single rare missense variant in TCIRG1 have been reported in a single family. Here, we report on nine rare missense variants at evolutionarily conserved sites in TCIRG1 that are associated with lower absolute neutrophil count (ANC; p = 0.005) in 1,058 participants from three cohorts: Atherosclerosis Risk in Communities (ARIC), Coronary Artery Risk Development in Young Adults (CARDIA), and Jackson Heart Study (JHS) of the NHLBI Grand Opportunity Exome Sequencing Project (GO ESP). These results validate the effects of TCIRG1 coding variation on ANC and suggest that this gene may be associated with a spectrum of mild to severe effects on ANC.