Showing papers by "Xihong Lin" published in 2016


Journal Article
TL;DR: This work develops a computationally efficient logistic mixed model approach for genome-wide analysis of binary traits, the generalized linear mixed model association test (GMMAT), and shows that GMMAT effectively controls for population structure and relatedness when analyzing binary traits in a wide variety of study designs.
Abstract: Linear mixed models (LMMs) are widely used in genome-wide association studies (GWASs) to account for population structure and relatedness, for both continuous and binary traits. Motivated by the failure of LMMs to control type I errors in a GWAS of asthma, a binary trait, we show that LMMs are generally inappropriate for analyzing binary traits when population stratification leads to violation of the LMM's constant-residual variance assumption. To overcome this problem, we develop a computationally efficient logistic mixed model approach for genome-wide analysis of binary traits, the generalized linear mixed model association test (GMMAT). This approach fits a logistic mixed model once per GWAS and performs score tests under the null hypothesis of no association between a binary trait and individual genetic variants. We show in simulation studies and real data analysis that GMMAT effectively controls for population structure and relatedness when analyzing binary traits in a wide variety of study designs.
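
The two-step structure described in this abstract (fit the null model once, then run a score test per variant) can be illustrated with a deliberately stripped-down sketch. It assumes an intercept-only logistic null model with no covariates and no random effects, so it is not GMMAT itself and does not correct for population structure or relatedness. The function names `fit_null` and `score_test` and the toy data are hypothetical.

```python
import math

def fit_null(y):
    """Fit the intercept-only logistic null model once.

    With no covariates, the fitted probability is simply the case fraction.
    (Assumption: the real method also includes covariates and random effects.)
    """
    mu = sum(y) / len(y)
    residuals = [yi - mu for yi in y]
    return mu, residuals

def score_test(genotypes, mu, residuals):
    """Score test for one variant, reusing the single null fit (1-df chi-square)."""
    n = len(genotypes)
    g_bar = sum(genotypes) / n
    # Score: covariance-like sum between genotype and null-model residuals.
    score = sum(g * r for g, r in zip(genotypes, residuals))
    # Null variance of the score after projecting out the intercept.
    variance = mu * (1 - mu) * sum((g - g_bar) ** 2 for g in genotypes)
    if variance == 0:
        return 0.0, 1.0  # monomorphic variant or all cases/all controls
    stat = score ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Toy data: 3 cases, 3 controls, genotypes coded as 0/1/2 allele counts.
y = [1, 1, 1, 0, 0, 0]
mu, residuals = fit_null(y)  # fitted once
variants = {"snp_a": [2, 2, 1, 0, 0, 1], "snp_b": [0, 1, 2, 0, 1, 2]}
results = {name: score_test(g, mu, residuals) for name, g in variants.items()}
```

The point of the design is that the expensive step (the null fit) does not depend on the variant, so each additional variant costs only a few sums.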

331 citations


Journal Article
TL;DR: These findings identify novel associations in inflammatory, hypoxia signaling, and sleep pathways in Hispanic/Latino Americans from three cohorts, which are the first genome-level significant findings reported for obstructive sleep apnea-related physiologic traits in any population.
Abstract: Rationale: Obstructive sleep apnea is a common disorder associated with increased risk for cardiovascular disease, diabetes, and premature mortality. Although there is strong clinical and epidemiologic evidence supporting the importance of genetic factors in influencing obstructive sleep apnea, its genetic basis is still largely unknown. Prior genetic studies focused on traits defined using the apnea–hypopnea index, which contains limited information on potentially important genetically determined physiologic factors, such as propensity for hypoxemia and respiratory arousability. Objectives: To define novel genetic risk loci for obstructive sleep apnea, we conducted genome-wide association studies of quantitative traits in Hispanic/Latino Americans from three cohorts. Methods: Genome-wide data from as many as 12,558 participants in the Hispanic Community Health Study/Study of Latinos, Multi-Ethnic Study of Atherosclerosis, and Starr County Health Studies population-based cohorts were...

98 citations


Journal Article
01 Nov 2016 - Pancreas
TL;DR: It is demonstrated that expression of SSTR2, but not other SSTRs, is associated with longer overall survival (OS), in patients treated with SSAs, and in a subgroup of patients with metastatic small intestine NET treated with SSAs and evaluable for progression, SSTR2 expression was associated with both longer progression-free survival (PFS) and OS.
Abstract: Objective: Somatostatin receptors (SSTRs), products of gene superfamily SSTR1-5, are commonly expressed in neuroendocrine tumors (NETs). Somatostatin analogs (SSAs) bind to SSTRs and are used as therapeutic agents in patients with advanced NETs. We hypothesized that tumor SSTR expression status would

78 citations


Journal Article
TL;DR: The interaction sequence kernel association test (iSKAT) is developed and is powerful and robust to the proportion of variants in a gene that interact with environment and the signs of the effects, and properly controls for the main effects of the rare variants using weighted ridge regression while adjusting for covariates.
Abstract: We consider in this article testing rare variants by environment interactions in sequencing association studies. Current methods for studying the association of rare variants with traits cannot be readily applied for testing for rare variants by environment interactions, as these methods do not effectively control for the main effects of rare variants, leading to unstable results and/or inflated Type 1 error rates. We will first analytically study the bias of the use of conventional burden-based tests for rare variants by environment interactions, and show the tests can often be invalid and result in inflated Type 1 error rates. To overcome these difficulties, we develop the interaction sequence kernel association test (iSKAT) for assessing rare variants by environment interactions. The proposed test iSKAT is optimal in a class of variance component tests and is powerful and robust to the proportion of variants in a gene that interact with environment and the signs of the effects. This test properly controls for the main effects of the rare variants using weighted ridge regression while adjusting for covariates. We demonstrate the performance of iSKAT using simulation studies and illustrate its application by analysis of a candidate gene sequencing study of plasma adiponectin levels.

73 citations


Journal Article
TL;DR: It is found that higher birth weight-for-gestational age was associated with higher methylation at four CpGs at the PBX1 locus (e.g., β (95% CI) for lead signal at cg06750897 = 1.9 (1.2, 2.6)), which encodes a transcription factor that regulates embryonic development.
Abstract: Both higher and lower fetal growth are associated with cardio-metabolic health later in life, suggesting that prenatal developmental programming determines long-term cardiovascular disease risk. Epigenetic mechanisms, which orchestrate fetal growth and development, may offer insight on the early programming of health and disease. We investigated whether birth weight-for-gestational age is associated with DNA methylation at birth and mid-childhood, measured via the Infinium 450K array. Participants were from Project Viva, a pre-birth cohort of pregnant women and their children in Eastern Massachusetts. After exclusion of participants with maternal type 1 or 2 diabetes and gestational age <34 weeks, we used DNA methylation assays from 476 venous umbilical cord blood samples and a subset of 235 who additionally had peripheral blood samples available in mid-childhood (age 7–10 years). Among 392,918 CpG sites analyzed, birth weight-for-gestational age z-score was associated with cord blood DNA methylation at 34 CpGs (false discovery rate P < 0.05), after adjusting for maternal age, race/ethnicity, education, smoking, parity, delivery mode, pre-pregnancy BMI, gestational diabetes status, child sex, and estimated cord blood cell proportions based on a cord blood reference panel. Two of these CpGs were previously reported in epigenome-wide analyses of birth weight, and several other CpGs map to genes relevant to fetal growth and development. Namely, higher birth weight-for-gestational age was associated with higher methylation at four CpGs at the PBX1 locus (e.g., β (95% CI) for lead signal at cg06750897 = 1.9 (1.2, 2.6)), which encodes a transcription factor that regulates embryonic development. Birth weight-for-gestational age was also associated with mid-childhood blood DNA methylation at four of the 34 CpGs identified in cord blood analyses, including sites at the PBX1 locus described.
We identified CpG sites where birth weight-for-gestational age was associated with DNA methylation at birth, and for a subset of these sites, birth weight-for-gestational age was also associated with DNA methylation at mid-childhood.
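
The abstract reports CpGs passing a "false discovery rate P < 0.05" threshold but does not say which procedure was used. As a hedged illustration only, the sketch below assumes the Benjamini-Hochberg step-up adjustment, which is a common choice in epigenome-wide studies; the function name and the p-values are hypothetical and are not data from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the study's FDR procedure is not stated in the abstract;
    BH is used here purely as an example.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four CpG sites.
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
```

With these toy values only the first site stays below 0.05 after adjustment, which shows how a nominally significant p-value (0.03 or 0.04) can fail an FDR threshold.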

59 citations


Journal Article
TL;DR: In an exome array analysis of COPD, nonsynonymous variants at previously described loci and a novel exome-wide significant variant in IL27 are identified; the IL27 variant appears to affect genes potentially related to COPD pathogenesis.
Abstract: Rationale: Chronic obstructive pulmonary disease (COPD) susceptibility is in part related to genetic variants. Most genetic studies have been focused on genome-wide common variants without a specific focus on coding variants, but common and rare coding variants may also affect COPD susceptibility. Objectives: To identify coding variants associated with COPD. Methods: We tested nonsynonymous, splice, and stop variants derived from the Illumina HumanExome array for association with COPD in five study populations enriched for COPD. We evaluated single variants with a minor allele frequency greater than 0.5% using logistic regression. Results were combined using a fixed effects meta-analysis. We replicated novel single-variant associations in three additional COPD cohorts. Measurements and Main Results: We included 6,004 control subjects and 6,161 COPD cases across five cohorts for analysis. Our top result was rs16969968 (P = 1.7 × 10^-14) in CHRNA5, a locus previously associated with COPD susceptibility and nico...
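
The step "results were combined using a fixed effects meta-analysis" can be sketched as follows. The abstract does not give the weighting scheme, so this assumes the standard inverse-variance-weighted estimator applied to per-cohort log odds ratios; the function name and the numbers are hypothetical.

```python
import math

def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed effects meta-analysis.

    Assumption: inverse-variance weighting is the usual fixed effects
    estimator, but the abstract does not confirm which variant was used.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p_value

# Hypothetical per-cohort log odds ratios and standard errors for one variant.
beta, se, z, p_value = fixed_effects_meta([0.2, 0.4], [0.1, 0.1])
```

Because the weights are the inverse variances, a cohort with a smaller standard error pulls the pooled estimate toward its own effect size; with equal standard errors the pooled estimate is the plain average.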

56 citations


Journal Article
TL;DR: This study demonstrates that principal components generally result in higher heritability and linkage evidence than individual traits, and PCHs can provide useful traits for using data on multiple phenotypes and for genetic studies of trans‐ethnic populations.
Abstract: A disease trait often can be characterized by multiple phenotypic measurements that can provide complementary information on disease etiology, physiology, or clinical manifestations. Given that multiple phenotypes may be correlated and reflect common underlying genetic mechanisms, the use of multivariate analysis of multiple traits may improve statistical power to detect genes and variants underlying complex traits. The literature, however, has been unclear as to the optimal approach for analyzing multiple correlated traits. In this study, heritability and linkage analyses were performed for six obstructive sleep apnea hypopnea syndrome (OSAHS) related phenotypes, as well as principal components of the phenotypes and principal components of the heritability (PCHs), using data from the Cleveland Family Study, which includes both African and European American families. Our study demonstrates that principal components generally result in higher heritability and linkage evidence than individual traits. Furthermore, the PCHs can be transferred across populations, strongly suggesting that these PCHs reflect traits with common underlying genetic mechanisms for OSAHS across populations. Thus, PCHs can provide useful traits for using data on multiple phenotypes and for genetic studies of trans-ethnic populations.

34 citations


Journal Article
TL;DR: This work proposes to use the intra-class correlation coefficient (ICC), which characterizes the relative contribution of the biological variability to the total variability, to filter CpGs when technical replicates are available, and estimates the ICC with a linear mixed effects model by pooling all the samples.
Abstract: Summary: The development of the Infinium HumanMethylation450 BeadChip enables epigenome-wide association studies at a reduced cost. One observation of the 450K data is that many CpG sites the beadchip interrogates have very large measurement errors. Including these noisy CpGs will decrease the statistical power of detecting relevant associations due to multiple testing correction. We propose to use intra-class correlation coefficient (ICC), which characterizes the relative contribution of the biological variability to the total variability, to filter CpGs when technical replicates are available. We estimate the ICC based on a linear mixed effects model by pooling all the samples instead of using the technical replicates only. An ultra-fast algorithm has been developed to address the computational complexity and CpG filtering can be completed in minutes on a desktop computer for a 450K data set of over 1000 samples. Our method is very flexible and can accommodate any replicate design. Simulations and a real data application demonstrate that our whole-sample ICC method performs better than replicate-sample ICC or variance-based method. Availability and implementation: CpGFilter is implemented in R and publicly available under CRAN via the R package ‘CpGFilter’. Contact: chen.jun2@mayo.edu or xlin@hsph.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online.
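
To make the filtering idea concrete, the sketch below estimates an ICC per CpG and keeps only sites above a cutoff. It is not the CpGFilter algorithm: instead of fitting a linear mixed effects model, it swaps in the one-way ANOVA moment estimator for unbalanced groups, which also pools samples with and without replicates. The function names, the 0.5 threshold, the CpG labels, and the values are all hypothetical.

```python
def icc_oneway(groups):
    """ICC from a one-way random effects ANOVA (moment estimator).

    groups: one list per biological sample, holding its replicate
    measurements (a single-element list for unreplicated samples).
    Assumption: this ANOVA estimator stands in for the paper's mixed model fit.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two samples and one replicated sample")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective group size for unbalanced designs.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    denominator = ms_between + (n0 - 1) * ms_within
    if denominator <= 0:
        return 0.0
    # Negative estimates are truncated at zero.
    return max(0.0, (ms_between - ms_within) / denominator)

def filter_cpgs(measurements, threshold=0.5):
    """Keep CpGs whose estimated ICC reaches the (hypothetical) threshold."""
    return [cpg for cpg, groups in measurements.items()
            if icc_oneway(groups) >= threshold]

# Hypothetical methylation values: three samples per CpG, some with replicates.
measurements = {
    "cpg_clean": [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
    "cpg_noisy": [[1.0, 3.0], [3.0, 1.0], [2.0]],
}
kept = filter_cpgs(measurements)
```

A site whose replicates agree while samples differ gets an ICC near 1 and is kept; a site where replicate disagreement dominates gets an ICC near 0 and is dropped before association testing.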

24 citations


Journal Article
TL;DR: This study provides the first evidence for the association of ANGPT2, a gene previously implicated in acute lung injury syndromes, with nocturnal SaO2, suggesting that this gene has a broad range of effects on gas exchange, including influencing oxygenation during sleep.
Abstract: Genetic determinants of sleep-disordered breathing (SDB), a common set of disorders that contribute to significant cardiovascular and neuropsychiatric morbidity, are not clear. Overnight nocturnal oxygen saturation (SaO2) is a clinically relevant and easily measured indicator of SDB severity but its genetic contribution has never been studied. Our recent study suggests nocturnal SaO2 is heritable. We performed linkage analysis, association analysis and haplotype analysis of average nocturnal oxyhaemoglobin saturation in participants in the Cleveland Family Study (CFS), followed by gene-based association and additional tests in four independent samples. Linkage analysis identified a peak (LOD = 4.29) on chromosome 8p23. Follow-up association analysis identified two haplotypes in angiopoietin-2 (ANGPT2) that significantly contributed to the variation of SaO2 (P = 8 × 10^-5) and accounted for a portion of the linkage evidence. Gene-based association analysis replicated the association of ANGPT2 and nocturnal SaO2. A rare missense SNP rs200291021 in ANGPT2 was associated with serum angiopoietin-2 level (P = 1.29 × 10^-4), which was associated with SaO2 (P = 0.002). Our study provides the first evidence for the association of ANGPT2, a gene previously implicated in acute lung injury syndromes, with nocturnal SaO2, suggesting that this gene has a broad range of effects on gas exchange, including influencing oxygenation during sleep.

23 citations


Journal Article
TL;DR: Initial evidence is provided that presumed sporadic small intestine neuroendocrine tumors may have a genetic etiology, and the results provide a basis for further exploring the role of genes implicated in this analysis, and for replication studies to confirm the observed associations.
Abstract: The etiology of neuroendocrine tumors remains poorly defined. Although neuroendocrine tumors are in some cases associated with inherited genetic syndromes, such syndromes are rare. The majority of neuroendocrine tumors are thought to be sporadic. We performed a genome-wide association study (GWAS) to identify potential genetic risk factors for sporadic neuroendocrine tumors. Using germline DNA from blood specimens, we genotyped 909,622 SNPs using the Affymetrix 6.0 GeneChip, in a cohort comprising 832 neuroendocrine tumor cases from Dana-Farber Cancer Institute and Massachusetts General Hospital and 4542 controls from the Harvard School of Public Health. An additional 241 controls from Dana-Farber Cancer Institute were used for quality control. We assessed risk associations in the overall cohort, and in neuroendocrine tumor subgroups. We identified no potential risk associations in the cohort overall. In the small intestine neuroendocrine tumor subgroup, comprising 293 cases, we identified risk associations with three SNPs on chromosome 12, all in strong LD. The three SNPs are located upstream of ELK3, a transcription factor implicated in angiogenesis. We did not identify clear risk associations in the bronchial or pancreatic neuroendocrine subgroups. This large-scale study provides initial evidence that presumed sporadic small intestine neuroendocrine tumors may have a genetic etiology. Our results provide a basis for further exploring the role of genes implicated in this analysis, and for replication studies to confirm the observed associations. Additional studies to evaluate potential genetic risk factors for sporadic pancreatic and bronchial neuroendocrine tumors are warranted.

20 citations


Journal Article
TL;DR: This paper proposes a general approach for estimating and testing the population effect of a genetic variant on a secondary phenotype based on inverse probability weighted estimating equations, where the weights depend on genotype and the secondary phenotype, and shows that it is substantially more robust to model misspecification.
Abstract: The case-control study is a common design for assessing the association between genetic exposures and a disease phenotype. Though association with a given (case-control) phenotype is always of primary interest, there is often considerable interest in assessing relationships between genetic exposures and other (secondary) phenotypes. However, the case-control sample represents a biased sample from the general population. As a result, if this sampling framework is not correctly taken into account, analyses estimating the effect of exposures on secondary phenotypes can be biased leading to incorrect inference. In this paper, we address this problem and propose a general approach for estimating and testing the population effect of a genetic variant on a secondary phenotype. Our approach is based on inverse probability weighted estimating equations, where the weights depend on genotype and the secondary phenotype. We show that, though slightly less efficient than a full likelihood-based analysis when the likelihood is correctly specified, it is substantially more robust to model misspecification, and can out-perform likelihood-based analysis, both in terms of validity and power, when the model is misspecified. We illustrate our approach with an application to a case-control study extracted from the Framingham Heart Study. Copyright © 2016 John Wiley & Sons, Ltd.
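
A minimal sketch of an inverse probability weighted estimating equation for a linear mean model of the secondary phenotype is given below. It only shows how supplied weights enter the estimating equations; how the weights are constructed (in the paper, from genotype and the secondary phenotype) is not reproduced here, and the function name and data are hypothetical.

```python
def ipw_linear_fit(genotypes, phenotypes, weights):
    """Solve sum_i w_i * (1, g_i) * (y_i - a - b * g_i) = 0 for (a, b).

    Assumption: weights are given; each is the inverse of a subject's
    probability of being sampled into the case-control study.
    """
    total = sum(weights)
    g_bar = sum(w * g for w, g in zip(weights, genotypes)) / total
    y_bar = sum(w * y for w, y in zip(weights, phenotypes)) / total
    s_gy = sum(w * (g - g_bar) * (y - y_bar)
               for w, g, y in zip(weights, genotypes, phenotypes))
    s_gg = sum(w * (g - g_bar) ** 2 for w, g in zip(weights, genotypes))
    slope = s_gy / s_gg
    intercept = y_bar - slope * g_bar
    return intercept, slope

# Hypothetical data: genotype counts and a continuous secondary phenotype.
g = [0, 1, 2]
y = [0.0, 1.0, 4.0]
unweighted = ipw_linear_fit(g, y, [1.0, 1.0, 1.0])
reweighted = ipw_linear_fit(g, y, [1.0, 1.0, 0.0])
```

The two fits differ (slope 2 versus slope 1 on these toy values), which illustrates why ignoring the sampling weights in a case-control sample can bias the estimated genetic effect on a secondary phenotype.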

Journal Article
TL;DR: The approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity by State (IBS) kernel and provides practically useful utilities to prioritize SNPs, and fills the gap between SNP set analysis and biological functional studies.
Abstract: Kernel machine learning methods, such as the SNP-set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single-SNP analysis methods, these methods are designed to examine the joint effect of a set of related SNPs (such as a group of SNPs within a gene or a pathway) and are able to identify sets of SNPs that are associated with the trait of interest. However, as with many multi-SNP testing approaches, kernel machine testing can draw conclusions only at the SNP-set level and does not directly indicate which SNP(s) in an identified set are actually driving the associations. A recently proposed procedure, KerNel Iterative Feature Extraction (KNIFE), provides a general framework for incorporating variable selection into kernel machine methods. In this article, we focus on quantitative traits and relatively common SNPs, adapt the KNIFE procedure to genetic association studies, and propose an approach to identify driver SNPs after the application of SKAT to gene set analysis. Our approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity by State (IBS) kernel. The proposed approach provides practically useful utilities to prioritize SNPs, and fills the gap between SNP set analysis and biological functional studies. Both simulation studies and real data application are used to demonstrate the proposed approach.
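
The two kernels named in this abstract can be written down directly. The sketch below assumes genotypes coded as 0/1/2 minor-allele counts and uses the unweighted forms of the linear and IBS kernels; it does not implement KNIFE or SKAT, and the function names are hypothetical.

```python
def linear_kernel(g1, g2):
    """Unweighted linear kernel: inner product of two genotype vectors."""
    return sum(a * b for a, b in zip(g1, g2))

def ibs_kernel(g1, g2):
    """Identity by State kernel: average fraction of alleles shared per SNP.

    Assumption: genotypes are 0/1/2 allele counts, so two subjects share
    2 - |a - b| alleles at a SNP, out of 2 possible.
    """
    p = len(g1)
    return sum(2 - abs(a - b) for a, b in zip(g1, g2)) / (2 * p)

def kernel_matrix(genotypes, kernel):
    """Pairwise kernel matrix for a list of subjects' genotype vectors."""
    return [[kernel(a, b) for b in genotypes] for a in genotypes]

# Two hypothetical subjects genotyped at three SNPs.
subjects = [[0, 1, 2], [2, 1, 0]]
k_ibs = kernel_matrix(subjects, ibs_kernel)
k_lin = kernel_matrix(subjects, linear_kernel)
```

The IBS kernel is bounded between 0 and 1 and depends only on allele sharing, whereas the linear kernel grows with allele counts; that difference is one reason a method may want to accommodate both.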

Journal Article
TL;DR: This article extends existing burden and kernel-based gene set association tests for population data to related samples, with a particular emphasis on binary phenotypes, and proposes the efficient generalized kernel score test, which can be applied as a mega-analysis framework to combine studies with different designs.
Abstract: The objective of this article is to introduce valid and robust methods for the analysis of rare variants for family-based exome chips, whole-exome sequencing or whole-genome sequencing data. Family-based designs provide unique opportunities to detect genetic variants that complement studies of unrelated individuals. Currently, limited methods and software tools have been developed to assist family-based association studies with rare variants, especially for analyzing binary traits. In this article, we address this gap by extending existing burden and kernel-based gene set association tests for population data to related samples, with a particular emphasis on binary phenotypes. The proposed approach blends the strengths of kernel machine methods and generalized estimating equations. Importantly, the efficient generalized kernel score test can be applied as a mega-analysis framework to combine studies with different designs. We illustrate the application of the proposed method using data from an exome sequencing study of autism. Methods discussed in this article are implemented in an R package 'gskat', which is available on CRAN and GitHub.

Journal Article
TL;DR: It is shown that in taking an ad hoc approach, it may be desirable to include covariates that affect the primary disease in the secondary phenotype model, even though these covariates are not necessarily associated with the secondary phenotypes.
Abstract: Case-control association studies often collect from their subjects information on secondary phenotypes. Reusing the data and studying the association between genes and secondary phenotypes provide an attractive and cost-effective approach that can lead to discovery of new genetic associations. A number of approaches have been proposed, including simple and computationally efficient ad hoc methods that ignore ascertainment or stratify on case-control status. Justification for these approaches relies on the assumption of no covariates and the correct specification of the primary disease model as a logistic model. Both might not be true in practice, for example, in the presence of population stratification or the primary disease model following a probit model. In this paper, we investigate the validity of ad hoc methods in the presence of covariates and possible disease model misspecification. We show that in taking an ad hoc approach, it may be desirable to include covariates that affect the primary disease in the secondary phenotype model, even though these covariates are not necessarily associated with the secondary phenotype. We also show that when the disease is rare, ad hoc methods can lead to severely biased estimation and inference if the true disease model follows a probit model instead of a logistic model. Our results are justified theoretically and via simulations. Applied to real data analysis of genetic associations with cigarette smoking, ad hoc methods collectively identified as highly significant (P < 10^-5) single nucleotide polymorphisms from over 10 genes, genes that were identified in previous studies of smoking cessation.

Journal Article
TL;DR: In this paper, a panel of 48 male welders had particulate matter less than 2.5 microns in diameter (PM2.5) exposure measurements over 4-6 hours repeated over five sampling periods between January 2010 and June 2012.
Abstract: Objective: Acceleration (AC) and deceleration (DC) capacities measure heart rate variability during speeding up and slowing down of the heart, respectively. We investigated associations between AC and DC with occupational short-term metal PM2.5 exposures. Methods: A panel of 48 male welders had particulate matter less than 2.5 microns in diameter (PM2.5) exposure measurements over 4–6 h repeated over 5 sampling periods between January 2010 and June 2012. We simultaneously obtained continuous recordings of digital ECG using a Holter monitor. We analysed ECG data in the time domain to obtain hourly AC and DC. Linear mixed models were used to assess the associations between hourly PM2.5 exposure and each of hourly AC and DC, controlling for age, smoking status, active smoking, exposure to secondhand smoke, season/time of day when ECG reading was obtained and baseline AC or DC. We also ran lagged exposure response models for each successive hour up to 3 h after onset of exposure. Results: Mean (SD) shift PM2.5 exposure during welding was 0.47 (0.43) mg/m³. Significant exposure–response associations were found for AC and DC with increased PM2.5 exposure. In our adjusted models without any lag between exposure and response, a 1 mg/m³ increase of PM2.5 was associated with a decrease of 1.46 (95% CI 1.00 to 1.92) ms in AC and a decrease of 1.00 (95% CI 0.53 to 1.46) ms in DC. The effect of PM2.5 on AC and DC was maximal immediately postexposure and lasted 1 h following exposure. Conclusions: There are short-term effects of metal particulates on AC and DC.

Journal Article
TL;DR: Long-term metal particulate exposures, as measured by the chronic exposure index for PM2.5, decrease cardiac accelerations and decelerations.
Abstract: Objective: The aim of the study was to clarify whether long-term metal particulates affect cardiac acceleration capacity (AC), deceleration capacity (DC), or both. Methods: We calculated chronic exposure index (CEI) for PM2.5 over the work life of 50 boilermakers and obtained their resting AC and DC. Linear regression was used to assess the associations between CEI PM2.5 exposure and each of AC and DC, controlling for age, acute effects of welding exposure, and diurnal variation. Results: Mean (standard deviation) CEI for PM2.5 exposure was 1.6 (2.4) mg/m³-work years and ranged from 0.001 to 14.6 mg/m³-work years. In our fully adjusted models, a 1 mg/m³-work year increase in CEI for PM2.5 was associated with a decrease of 1.03 (95% confidence interval: 0.10, 1.96) ms resting AC, and a decrease of 0.67 (95% confidence interval: −0.14, 1.49) ms resting DC. Conclusions: Long-term metal particulate exposures decrease cardiac accelerations and decelerations.

Journal Article
TL;DR: This study investigated whether associations of acceleration capacity (AC) and deceleration capacity (DC) with metal-PM2.5 are mediated by inflammation, and found that IL-6 may be mediating the effect of metal particulates on AC.
Abstract: The aim of this study was to investigate whether associations of acceleration capacity (AC) and deceleration capacity (DC) with metal-PM2.5 are mediated by inflammation. We obtained PM2.5, C-reactive protein, interleukin (IL)-6, 8, and 10, and electrocardiograms to compute AC and DC, from 45 male welders. Mediation analyses were performed using linear mixed models to assess associations between PM2.5 exposure, inflammatory mediator, and AC or DC, controlling for covariates. The proportion of total effect of PM2.5 on AC or DC (indirect effect) mediated through IL-6 on AC was 4% at most. Controlling for IL-6 (direct effect), a 1 mg/m³ increase of PM2.5 was associated with a decrease of 2.16 (95% confidence interval -0.36 to 4.69) msec in AC and a decrease of 2.51 (95% confidence interval -0.90 to 5.93) msec in DC. IL-6 may be mediating the effect of metal particulates on AC.