
Showing papers by "Xihong Lin" published in 2011


Journal ArticleDOI
TL;DR: The sequence kernel association test (SKAT) is proposed, a supervised, flexible, computationally efficient regression method to test for association between genetic variants (common and rare) in a region and a continuous or dichotomous trait while easily adjusting for covariates.
Abstract: Sequencing studies are increasingly being conducted to identify rare variants associated with complex traits. The limited power of classical single-marker association analysis for rare variants poses a central challenge in such studies. We propose the sequence kernel association test (SKAT), a supervised, flexible, computationally efficient regression method to test for association between genetic variants (common and rare) in a region and a continuous or dichotomous trait while easily adjusting for covariates. As a score-based variance-component test, SKAT can quickly calculate p values analytically by fitting the null model containing only the covariates, and so can easily be applied to genome-wide data. Using SKAT to analyze a genome-wide sequencing study of 1000 individuals, by segmenting the whole genome into 30 kb regions, requires only 7 hr on a laptop. Through analysis of simulated data across a wide range of practical scenarios and triglyceride data from the Dallas Heart Study, we show that SKAT can substantially outperform several alternative rare-variant association tests. We also provide analytic power and sample-size calculations to help design candidate-gene, whole-exome, and whole-genome sequence association studies.

2,202 citations
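The variance-component statistic mentioned in the SKAT abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a continuous trait, an intercept-only null model (the paper's null model also includes covariates), a weighted linear kernel, and Beta(1, 25) weights evaluated at each variant's minor-allele frequency. The function name `skat_q` and the input layout are hypothetical, and the analytic p-value step is omitted.

```python
def skat_q(y, genotypes):
    """Sketch of a SKAT-style score statistic Q for a continuous trait.

    y          : list of trait values, one per individual
    genotypes  : genotypes[i][j] is the 0/1/2 minor-allele count for
                 individual i at variant j (hypothetical layout)
    """
    n = len(y)
    mean_y = sum(y) / n
    # Residuals under an intercept-only null model (assumption: no covariates).
    resid = [yi - mean_y for yi in y]

    q = 0.0
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        # Minor-allele frequency of variant j.
        maf = sum(column) / (2.0 * n)
        maf = min(maf, 1.0 - maf)
        # Square root of the weight: Beta(1, 25) density at the MAF,
        # which equals 25 * (1 - maf)**24 and up-weights rare variants.
        sqrt_w = 25.0 * (1.0 - maf) ** 24
        # Score contribution of variant j: genotype column times residuals.
        score = sum(g * r for g, r in zip(column, resid))
        q += (sqrt_w * score) ** 2
    return q
```

Because only the null model is fitted, the same residuals can be reused for every region of the genome, which is what makes a score-based test of this kind cheap to apply genome-wide.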


Journal ArticleDOI
26 May 2011-Nature
TL;DR: Either correcting the obesity-induced alteration of ER phospholipid composition or overexpressing hepatic Serca in vivo reduced chronic ER stress and improved glucose homeostasis.
Abstract: The endoplasmic reticulum (ER) is the main site of protein and lipid synthesis, membrane biogenesis, xenobiotic detoxification and cellular calcium storage, and perturbation of ER homeostasis leads to stress and the activation of the unfolded protein response. Chronic activation of ER stress has been shown to have an important role in the development of insulin resistance and diabetes in obesity. However, the mechanisms that lead to chronic ER stress in a metabolic context in general, and in obesity in particular, are not understood. Here we comparatively examined the proteomic and lipidomic landscape of hepatic ER purified from lean and obese mice to explore the mechanisms of chronic ER stress in obesity. We found suppression of protein but stimulation of lipid synthesis in the obese ER without significant alterations in chaperone content. Alterations in ER fatty acid and lipid composition result in the inhibition of sarco/endoplasmic reticulum calcium ATPase (SERCA) activity and ER stress. Correcting the obesity-induced alteration of ER phospholipid composition or hepatic Serca overexpression in vivo both reduced chronic ER stress and improved glucose homeostasis. Hence, we established that abnormal lipid and calcium metabolism are important contributors to hepatic ER stress in obesity.

848 citations


Journal ArticleDOI
TL;DR: It is shown that correlation may increase the bias and variance of the estimator substantially with respect to the independent case, and that in some cases, such as an exchangeable correlation structure, the estimator fails to be consistent as the number of tests becomes large.
Abstract: The objective of this paper is to quantify the effect of correlation in false discovery rate analysis. Specifically, we derive approximations for the mean, variance, distribution and quantiles of the standard false discovery rate estimator for arbitrarily correlated data. This is achieved using a negative binomial model for the number of false discoveries, where the parameters are found empirically from the data. We show that correlation may increase the bias and variance of the estimator substantially with respect to the independent case, and that in some cases, such as an exchangeable correlation structure, the estimator fails to be consistent as the number of tests becomes large.

120 citations
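For readers unfamiliar with the estimator analyzed in the abstract above, a plug-in false discovery rate estimate at a fixed p-value threshold can be sketched as below. This is an illustrative assumption about the form of the "standard" estimator (expected false discoveries divided by observed discoveries), not code from the paper; the name `fdr_estimate` and the default null proportion `pi0=1.0` are hypothetical choices.

```python
def fdr_estimate(p_values, threshold, pi0=1.0):
    """Plug-in FDR estimate at a fixed p-value threshold.

    pi0 is the assumed proportion of true null hypotheses; pi0 = 1.0
    gives the conservative version.
    """
    m = len(p_values)
    # Observed number of discoveries at this threshold.
    rejections = sum(1 for p in p_values if p <= threshold)
    # Expected number of false discoveries if nulls are uniform on [0, 1].
    expected_false = pi0 * m * threshold
    # Guard against division by zero and cap the estimate at 1.
    return min(1.0, expected_false / max(rejections, 1))
```

The estimate depends on the data only through the count of discoveries, so correlation among the tests, which changes the variability of that count, feeds directly into the bias and variance discussed in the abstract.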


Journal ArticleDOI
TL;DR: Toenail measurements appeared to be a valid measure of cumulative manganese exposure 7 to 12 months earlier and were significantly correlated with cumulative exposure in the 7 to 9, 10 to 12, and 7 to 12 months before the toenail clipping date, but not in the 1 to 6 months before.
Abstract: Objective: This study examined the correlation between manganese exposure and manganese concentrations in different biomarkers. Methods: Air measurement data and work histories were used to determine manganese exposure over a work shift and cumulative exposure. Toenail samples (n = 49), as well as bloo

107 citations


Journal ArticleDOI
TL;DR: It is suggested that even at relatively low Mn exposure levels neuropsychological effects may manifest particularly with respect to attention, mood, and fine motor control.
Abstract: While the neuropsychological effects of high manganese (Mn) exposure in occupational settings are well known, the effects of lower levels of exposure are less understood. In this study, we investigated the neuropsychological effects of lower level occupational Mn exposure in 46 male welders (mean age=37.4, S.D.=11.7 years). Each welder's cumulative Mn exposure indices (Mn-CEI) for the past 12 months and total work history Mn exposure were constructed based on air Mn measurements and work histories. The association between these exposure indices and performance on cognitive, motor control, and psychological tests was examined. In addition, among a subset of welders (n=24) who completed the tests both before and after a work shift, we examined the association between cross-shift Mn exposure assessed from personal monitoring and acute changes in test scores. Mn exposures in this study (median=12.9 μg/m³) were much lower, as compared to those observed in other similar studies. Increasing total Mn-CEI was significantly associated with slower reaction time on the continuous performance test (CPT; p<0.01), as well as worse mood for several scales on the Profile of Mood States (POMS; confused, tired, and a composite of tired and energetic, all p ≤ 0.03). Increasing Mn-CEI over the previous 12 months was significantly associated with worse mood on the sad, tense, and confused POMS scales (all p ≤ 0.03) and the association with worse CPT performance approached significance (p=0.10). Higher Mn exposure over the course of a workday was associated with worse performance on the CPT test across the day (p=0.06) as well as declines in fine motor control over the work-shift (p=0.04), adjusting for age and time between the 2 tests. Our study suggests that even at relatively low Mn exposure levels neuropsychological effects may manifest particularly with respect to attention, mood, and fine motor control.

71 citations


Journal ArticleDOI
TL;DR: A powerful test for identifying single nucleotide polymorphism (SNP)‐sets that are predictive of survival with data from genome‐wide association studies with censored survival outcomes is developed.
Abstract: In this article, we develop a powerful test for identifying single nucleotide polymorphism (SNP)-sets that are predictive of survival with data from genome-wide association studies. We first group typed SNPs into SNP-sets based on genomic features and then apply a score test to assess the overall effect of each SNP-set on the survival outcome through a kernel machine Cox regression framework. This approach uses genetic information from all SNPs in the SNP-set simultaneously and accounts for linkage disequilibrium (LD), leading to a powerful test with reduced degrees of freedom when the typed SNPs are in LD with each other. This type of test also has the advantage of capturing the potentially nonlinear effects of the SNPs, SNP-SNP interactions (epistasis), and the joint effects of multiple causal variants. By simulating SNP data based on the LD structure of real genes from the HapMap project, we demonstrate that our proposed test is more powerful than the standard single SNP minimum P-value-based test for association studies with censored survival outcomes. We illustrate the proposed test with a real data application.

68 citations


Journal ArticleDOI
TL;DR: An efficient score test is proposed to assess the overall effect of a set of markers, such as genes within a pathway or a network, on survival outcomes and has the advantage of capturing the potentially nonlinear effects without explicitly specifying a particular nonlinear functional form.
Abstract: There is growing evidence that genomic and proteomic research holds great potential for changing irrevocably the practice of medicine. The ability to identify important genomic and biological markers for risk assessment can have a great impact in public health from disease prevention, to detection, to treatment selection. However, the potentially large number of markers and the complexity in the relationship between the markers and the outcome of interest impose a grand challenge in developing accurate risk prediction models. The standard approach to identifying important markers often assesses the marginal effects of individual markers on a phenotype of interest. When multiple markers relate to the phenotype simultaneously via a complex structure, such a type of marginal analysis may not be effective. To overcome such difficulties, we employ a kernel machine Cox regression framework and propose an efficient score test to assess the overall effect of a set of markers, such as genes within a pathway or a network, on survival outcomes. The proposed test has the advantage of capturing the potentially nonlinear effects without explicitly specifying a particular nonlinear functional form. To approximate the null distribution of the score statistic, we propose a simple resampling procedure that can be easily implemented in practice. Numerical studies suggest that the test performs well with respect to both empirical size and power even when the number of variables in a gene set is not small compared to the sample size.

61 citations


Journal ArticleDOI
TL;DR: It is found that heavy cigarette smokers (>60 pack-years) have significantly more copy number gains than non- or light smokers (≤60 pack-years), and that cigarette smoking leads to more copy number alterations, which may be mediated by genome instability.
Abstract: Cigarette smoking has been a well-established risk factor of lung cancer for decades. How smoking contributes to tumorigenesis in the lung remains not fully understood. Here we report the results of a genome-wide study of DNA copy number and smoking pack-years in a large collection of nonsmall-cell lung cancer (NSCLC) tumors. Genome-wide analyses of DNA copy number and pack-years of cigarette smoking were performed on 264 NSCLC tumors, which were divided into discovery and validation sets. The copy number-smoking associations were investigated in three scales: whole-genome, chromosome/arm, and focal regions. We found that heavy cigarette smokers (>60 pack-years) have significantly more copy number gains than non- or light smokers (≤60 pack-years) (P = 2.46 × 10⁻⁴), especially in 8q and 12q. Copy number losses tend to occur away from genes in non/light smokers (P = 5.15 × 10⁻⁵) but not in heavy smokers (P = 0.52). Focal copy number analyses showed that there are strong associations of copy number and cigarette smoking pack-years in 12q23 (P = 9.69 × 10⁻¹⁰) where IGF1 (insulin-like growth factor 1) is located. All of the above analyses were tested in the discovery set and confirmed in the validation set. DNA double-strand break assays using human bronchial epithelial cell lines treated with cigarette smoke condensate were also performed, and indicated that cigarette smoke condensate leads to genome instability in human bronchial epithelial cells. We conclude that cigarette smoking leads to more copy number alterations, which may be mediated by the genome instability.

56 citations


Journal ArticleDOI
TL;DR: A formal statistical approach is described that provides fast computation of approximate p values for individual genes, adjusts for the background variation in each gene, and allows for incorporation of functional or linkage-based information, and it accommodates designs based on both affected relative pairs and unrelated affected individuals.
Abstract: Many sequencing studies are now underway to identify the genetic causes for both Mendelian and complex traits. Via exome-sequencing, genes harboring variants implicated in several Mendelian traits have already been identified. The underlying methodology in these studies is a multistep algorithm based on filtering variants identified in a small number of affected individuals and depends on whether they are novel (not yet seen in public resources such as dbSNP), shared among affected individuals, and other external functional information on the variants. Although intuitive, these filter-based methods are nonoptimal and do not provide any measure of statistical uncertainty. We describe here a formal statistical approach that has several distinct advantages: (1) it provides fast computation of approximate p values for individual genes, (2) it adjusts for the background variation in each gene, (3) it allows for incorporation of functional or linkage-based information, and (4) it accommodates designs based on both affected relative pairs and unrelated affected individuals. We show via simulations that the proposed approach can be used in conjunction with the existing filter-based methods to achieve a substantially better ranking of a gene relevant for disease when compared to currently used filter-based approaches; this is especially so in the presence of disease locus heterogeneity. We revisit recent studies on three Mendelian diseases and show that the proposed approach results in the implicated gene being ranked first in all studies, and approximate p values of 10⁻⁶ for the Miller Syndrome gene, 1.0 × 10⁻⁴ for the Freeman-Sheldon Syndrome gene, and 3.5 × 10⁻⁵ for the Kabuki Syndrome gene.

50 citations


Journal ArticleDOI
TL;DR: A key feature of the proposed test is that it is flexible and developed for both parametric and nonparametric models within a unified framework, and is more powerful than the standard test because it accounts for the correlation among genes and hence often uses far fewer degrees of freedom.
Abstract: We propose in this article a powerful testing procedure for detecting a gene effect on a continuous outcome in the presence of possible gene-gene interactions (epistasis) in a gene set, e.g., a genetic pathway or network. Traditional tests for this purpose require a large number of degrees of freedom by testing the main effect and all the corresponding interactions under a parametric assumption, and hence suffer from low power. In this article, we propose a powerful kernel machine based test. Specifically, our test is based on a garrote kernel method and is constructed as a score test. Here, the term garrote refers to an extra nonnegative parameter that is multiplied to the covariate of interest so that our score test can be formulated in terms of this nonnegative parameter. A key feature of the proposed test is that it is flexible and developed for both parametric and nonparametric models within a unified framework, and is more powerful than the standard test because it accounts for the correlation among genes and hence often uses far fewer degrees of freedom. We investigate the theoretical properties of the proposed test. We evaluate its finite sample performance using simulation studies, and apply the method to the Michigan prostate cancer gene expression data.

40 citations


Journal ArticleDOI
TL;DR: This work proposes an adjustment of conventional inference using a post-processing technique based on an analytic evaluation of the moments of the random moments of the DP; the adjustment can be conveniently incorporated into Markov chain Monte Carlo simulations at essentially no additional computational cost.
Abstract: Dirichlet process (DP) priors are a popular choice for semiparametric Bayesian random effect models. The fact that the DP prior implies a non-zero mean for the random effect distribution creates an identifiability problem that complicates the interpretation of, and inference for, the fixed effects that are paired with the random effects. Similarly, the interpretation of, and inference for, the variance components of the random effects also becomes a challenge. We propose an adjustment of conventional inference using a post-processing technique based on an analytic evaluation of the moments of the random moments of the DP. The adjustment for the moments of the DP can be conveniently incorporated into Markov chain Monte Carlo simulations at essentially no additional computational cost. We conduct simulation studies to evaluate the performance of the proposed inference procedure in both a linear mixed model and a logistic linear mixed effect model. We illustrate the method by applying it to a prostate specific antigen dataset. We provide an R function that allows one to implement the proposed adjustment in a post-processing step of posterior simulation output, without any change to the posterior simulation itself.

Journal ArticleDOI
TL;DR: These findings support further investigation of the potential role of IL12A and DAD1 in the etiology of NET and suggest that genetic variation in inflammatory pathways or apoptosis pathways is associated with NET risk.
Abstract: Genetic risk factors for sporadic neuroendocrine tumors (NET) are poorly understood. We tested risk associations in patients with sporadic NET and non-cancer controls, using a custom array containing 1536 single-nucleotide polymorphisms (SNPs) in 355 candidate genes. We identified 18 SNPs associated with NET risk at a P-value <0.01 in a discovery set of 261 cases and 319 controls. Two of these SNPs were found to be significantly associated with NET risk in an independent replication set of 235 cases and 113 controls, at a P value ≤0.05. An SNP in interleukin 12A (IL12A rs2243123), a gene implicated in inflammatory response, replicated with an adjusted odds ratio (95% confidence interval) (aOR) = 1.47 (1.03, 2.11) P-trend = 0.04. A second SNP in defender against cell death, (DAD1 rs8005354), a gene that modulates apoptosis, replicated at aOR = 1.43 (1.02, 2.02) P-trend = 0.04. Consistent with our observations, a pathway analysis, performed in the discovery set, suggested that genetic variation in inflammatory pathways or apoptosis pathways is associated with NET risk. Our findings support further investigation of the potential role of IL12A and DAD1 in the etiology of NET.

Journal ArticleDOI
TL;DR: A novel parametric cross ratio estimator that is a flexible continuous function of both components of the bivariate survival times is proposed and it is shown that the proposed estimator is consistent and asymptotically normal.
Abstract: In the analysis of bivariate correlated failure time data, it is important to measure the strength of association among the correlated failure times. One commonly used measure is the cross ratio. Motivated by Cox's partial likelihood idea, we propose a novel parametric cross ratio estimator that is a flexible continuous function of both components of the bivariate survival times. We show that the proposed estimator is consistent and asymptotically normal. Its finite sample performance is examined using simulation studies, and it is applied to the Australian twin data.

Journal ArticleDOI
02 Aug 2011-PLOS ONE
TL;DR: It is found that CNAs increase with disease progression and CNAs are both positionally and functionally clustered, and genes with CNAs in non-small cell lung tumors were enriched in certain gene sets and biological pathways that play crucial roles in oncogenesis and cancer progression.
Abstract: Lung cancer, of which more than 80% is non-small cell, is the leading cause of cancer-related death in the United States. Copy number alterations (CNAs) in lung cancer have been shown to be positionally clustered in certain genomic regions. However, it remains unclear whether genes with copy number changes are functionally clustered. Using a dense single nucleotide polymorphism array, we performed genome-wide copy number analyses of a large collection of non-small cell lung tumors (n = 301). We proposed a formal statistical test for CNAs between different groups (e.g., non-involved lung vs. tumors, early vs. late stage tumors). We also customized the gene set enrichment analysis (GSEA) algorithm to investigate the overrepresentation of genes with CNAs in predefined biological pathways and gene sets (i.e., functional clustering). We found that CNA events increase substantially from germline, early stage to late stage tumor. In addition to genomic position, CNAs tend to occur away from the gene locations, especially in germline, non-involved tissue and early stage tumors. This tendency decreases from germline to early stage and then to late stage tumors, suggesting a relaxation of selection during tumor progression. Furthermore, genes with CNAs in non-small cell lung tumors were enriched in certain gene sets and biological pathways that play crucial roles in oncogenesis and cancer progression, demonstrating the functional aspect of CNAs in the context of biological pathways that were overlooked previously. We conclude that CNAs increase with disease progression and CNAs are both positionally and functionally clustered. The potential functional capabilities acquired via CNAs may be sufficient for normal cells to transform into malignant cells.

Journal ArticleDOI
TL;DR: The findings indicate that ESR2 is not associated with risk of lung cancer in women and neither the four individual htSNPs nor their resolved haplotypes were associated with lung cancer risk in the entire population.