scispace - formally typeset

Showing papers by "Xihong Lin" published in 2010


Journal Article · DOI
TL;DR: SNPs are grouped into SNP sets on the basis of proximity to genomic features such as genes or haplotype blocks, and the joint effect of each SNP set is then tested; SNP-set testing is shown to have improved power over standard individual-SNP analysis under a wide range of settings.
Abstract: GWAS have emerged as popular tools for identifying genetic variants that are associated with disease risk. Standard analysis of a case-control GWAS involves assessing the association between each individual genotyped SNP and disease risk. However, this approach suffers from limited reproducibility and difficulties in detecting multi-SNP and epistatic effects. As an alternative analytical strategy, we propose grouping SNPs together into SNP sets on the basis of proximity to genomic features such as genes or haplotype blocks, then testing the joint effect of each SNP set. Testing of each SNP set proceeds via the logistic kernel-machine-based test, which is based on a statistical framework that allows for flexible modeling of epistatic and nonlinear SNP effects. This flexibility and the ability to naturally adjust for covariate effects are important features of our test that make it appealing in comparison to individual SNP tests and existing multimarker tests. Using simulated data based on the International HapMap Project, we show that SNP-set testing can have improved power over standard individual-SNP analysis under a wide range of settings. In particular, we find that our approach has higher power than individual-SNP analysis when the median correlation between the disease-susceptibility variant and the genotyped SNPs is moderate to high. When the correlation is low, both individual-SNP analysis and the SNP-set analysis tend to have low power. We apply SNP-set analysis to analyze the Cancer Genetic Markers of Susceptibility (CGEMS) breast cancer GWAS discovery-phase data.

592 citations
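A minimal sketch of the SNP-set idea, for illustration only. It assumes a linear kernel K = GGᵀ over the SNPs in one set, a null logistic model with an intercept only (so no covariate adjustment), and a permutation p-value in place of an analytic null distribution. The function names are hypothetical, and this is not the authors' logistic kernel-machine implementation.

```python
import random

def linear_kernel_score(genotypes, y):
    """Score-type statistic Q = (y - ybar)' K (y - ybar) with a linear kernel K = G G'.

    genotypes: one list of allele counts (0/1/2) per subject, one entry per SNP in the set.
    y: 0/1 case-control status. Under an intercept-only null model every fitted mean is ybar.
    """
    n = len(y)
    ybar = sum(y) / n
    resid = [yi - ybar for yi in y]
    q = 0.0
    for j in range(len(genotypes[0])):
        # With K = G G', Q is the sum over SNPs of squared genotype-weighted residual sums.
        s = sum(genotypes[i][j] * resid[i] for i in range(n))
        q += s * s
    return q

def permutation_pvalue(genotypes, y, n_perm=999, seed=1):
    """Permutation p-value for Q: shuffle case-control labels, count statistics >= observed."""
    rng = random.Random(seed)
    q_obs = linear_kernel_score(genotypes, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if linear_kernel_score(genotypes, y_perm) >= q_obs:
            exceed += 1
    return q_obs, (exceed + 1) / (n_perm + 1)

# Hypothetical toy data: a 3-SNP set, 10 cases carrying minor alleles, 10 controls without.
G = [[2, 2, 2]] * 10 + [[0, 0, 0]] * 10
y = [1] * 10 + [0] * 10
q, p = permutation_pvalue(G, y)
```

With a linear kernel the statistic adds up squared per-SNP signals, so several moderately associated SNPs in one set can jointly produce a signal that no single-SNP test reaches on its own.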


Journal Article · DOI
TL;DR: Disruption of the folate pathway contributes to the incidence of AVSD among individuals with DS, and SLC19A1 was found to be associated with AVSD.
Abstract: Cardiac abnormalities are one of the most common congenital defects observed in individuals with Down syndrome. Considerable research has implicated both folate deficiency and genetic variation in folate pathway genes with birth defects, including both congenital heart defects (CHD) and Down syndrome (DS). Here, we test variation in folate pathway genes for a role in the major DS-associated CHD atrioventricular septal defect (AVSD). In a group of 121 case families (mother, father, and proband with DS and AVSD) and 122 control families (mother, father, and proband with DS and no CHD), tag SNPs were genotyped in and around five folate pathway genes: 5,10-methylenetetrahydrofolate reductase (MTHFR), methionine synthase (MTR), methionine synthase reductase (MTRR), cystathionine β-synthase (CBS), and the reduced folate carrier (SLC19A1, RFC1). SLC19A1 was found to be associated with AVSD using a multilocus allele-sharing test. Individual SNP tests also showed nominally significant associations with odds ratios between 1.34 and 3.78, depending on the SNP and genetic model. Interestingly, all marginally significant SNPs in SLC19A1 are in strong linkage disequilibrium (r² ≥ 0.8) with the nonsynonymous coding SNP rs1051266 (c.80A>G), which has previously been associated with nonsyndromic cases of CHD. In addition to SLC19A1, the known functional polymorphism MTHFR c.1298A was overtransmitted to cases with AVSD (P = 0.05) and under-transmitted to controls (P = 0.02). We conclude, therefore, that disruption of the folate pathway contributes to the incidence of AVSD among individuals with DS. Genet. Epidemiol. 34:613–623, 2010. © 2010 Wiley-Liss, Inc.

79 citations
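Over- and under-transmission in parent-proband trios is classically assessed with the transmission/disequilibrium test (TDT), which compares how often heterozygous parents transmit versus do not transmit a given allele. The sketch below is a generic TDT for one biallelic SNP; the abstract does not state which family-based test produced the MTHFR P values, and the counts in the example are hypothetical.

```python
import math

def tdt(transmitted, untransmitted):
    """Transmission/disequilibrium test for one allele of a biallelic SNP.

    transmitted (b): heterozygous parents who passed the allele to the proband.
    untransmitted (c): heterozygous parents who passed the other allele.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative heterozygous parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts: the allele is transmitted 60 times and not transmitted 40 times.
chi2, p = tdt(60, 40)
```

Only heterozygous parents are informative, which is why the statistic depends on b and c alone and not on homozygous parents.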


Journal Article · DOI
TL;DR: It is suggested that the genetic variants of CASP7 and CASP9 in the apoptosis pathway may be important predictive markers for EA susceptibility and that PGR in the sex hormone signaling pathway may be associated with the gender differences in EA risk.
Abstract: The incidence of esophageal adenocarcinoma (EA) has been increasing rapidly, particularly among white males, over the past few decades in the USA. However, the etiology of EA and the striking male predominance are not fully explained by known risk factors. To identify susceptibility genes for EA risk, we conducted a pathway-based candidate gene association study of 335 Caucasian EA cases and 319 Caucasian controls. A total of 1330 single-nucleotide polymorphisms (SNPs) selected from 354 genes were analyzed using an Illumina GoldenGate assay. The genotyped common SNPs include missense and exonic SNPs, SNPs within untranslated regions and 2 kb 5' of the gene, and tagSNPs for genes with little functional information available. Logistic regression adjusted for potential confounders was used to assess the genetic effect of each SNP on EA risk. We also tested gene-gender interactions using likelihood ratio tests. We found that genetic variants in the apoptosis pathway were significantly associated with EA risk after correcting for multiple comparisons. SNPs rs3127075 in Caspase-7 (CASP7) and rs4661636 in Caspase-9 (CASP9), genes that play a critical role in apoptosis, were found to be associated with an increased risk of EA. A protective effect of SNP rs572483 in the progesterone receptor (PGR) gene was observed among women carrying the variant G allele [adjusted odds ratio (OR) = 0.19; 95% confidence interval (CI) = 0.08-0.46] but was not observed among men (adjusted OR = 1.38; 95% CI = 0.95-2.00). In conclusion, this study suggests that the genetic variants of CASP7 and CASP9 in the apoptosis pathway may be important predictive markers for EA susceptibility and that PGR in the sex hormone signaling pathway may be associated with the gender differences in EA risk.

55 citations
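The gene-gender interaction tests compare nested logistic models. As a minimal sketch, assuming both models (with and without a single SNP × gender product term) have already been fitted and their maximized log-likelihoods are known, the likelihood ratio statistic is twice the log-likelihood gain, referred to a chi-square distribution with one degree of freedom. The helper name and the log-likelihood values below are hypothetical, and the p-value formula covers only the 1-df case.

```python
import math

def lr_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for one added parameter, e.g. a SNP x gender interaction term."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("the full model should not fit worse than the nested model")
    # Upper tail of a chi-square with 1 df.
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p

# Hypothetical maximized log-likelihoods without and with the interaction term.
stat, p = lr_test_1df(-420.5, -417.2)
```

If the interaction were coded with more than one term (for example, separate terms for heterozygous and homozygous variant carriers), the degrees of freedom would increase and a general chi-square tail function would be needed.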


Journal Article · DOI
TL;DR: It is argued that the commonly assumed DP prior implies a nonzero mean of the random effect distribution, even when a base measure with mean zero is specified, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function.
Abstract: We consider Bayesian inference in semiparametric mixed models (SPMMs) for longitudinal data. SPMMs are a class of models that use a nonparametric function to model a time effect, a parametric function to model other covariate effects, and parametric or nonparametric random effects to account for the within-subject correlation. We model the nonparametric function using a Bayesian formulation of a cubic smoothing spline, and the random effect distribution using a normal distribution and alternatively a nonparametric Dirichlet process (DP) prior. When the random effect distribution is assumed to be normal, we propose a uniform shrinkage prior (USP) for the variance components and the smoothing parameter. When the random effect distribution is modeled nonparametrically, we use a DP prior with a normal base measure and propose a USP for the hyperparameters of the DP base measure. We argue that the commonly assumed DP prior implies a nonzero mean of the random effect distribution, even when a base measure with mean zero is specified. This implies weak identifiability for the fixed effects, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function. We propose an adjustment using a postprocessing technique. We show that under mild conditions the posterior is proper under the proposed USP, a flat prior for the fixed effect parameters, and an improper prior for the residual variance. We illustrate the proposed approach using a longitudinal hormone dataset, and carry out extensive simulation studies to compare its finite sample performance with existing methods.

52 citations
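The claim that a DP prior with a mean-zero base measure still implies a nonzero random-effect mean can be checked numerically. For G ~ DP(α, G₀), the mean of G is Σₖ wₖθₖ, whose prior variance is Var_{G₀}(θ)/(1 + α), so individual draws almost surely have nonzero mean. The sketch below assumes a N(0, 1) base measure and a stick-breaking representation truncated at 200 atoms. It also shows one simple form of post-processing, centering the random effects and moving their mean into the intercept so fitted values are unchanged; this centering is an illustrative assumption, not necessarily the exact adjustment proposed in the paper.

```python
import random

def stick_breaking_dp(alpha, n_atoms, rng):
    """Truncated stick-breaking draw G = sum_k w_k * delta(theta_k) with base measure N(0, 1)."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom absorbs the leftover mass so weights sum to 1
    atoms = [rng.gauss(0.0, 1.0) for _ in range(n_atoms)]
    return weights, atoms

def dp_mean(weights, atoms):
    """Mean of the discrete random distribution G."""
    return sum(w * a for w, a in zip(weights, atoms))

def center_random_effects(intercept, random_effects):
    """Shift the empirical mean of the random effects into the intercept.

    intercept + b_i is unchanged for every subject, but the adjusted effects have mean zero.
    """
    m = sum(random_effects) / len(random_effects)
    return intercept + m, [b - m for b in random_effects]

rng = random.Random(2010)
means = [dp_mean(*stick_breaking_dp(1.0, 200, rng)) for _ in range(5)]
print(means)  # draw-to-draw means scatter around 0 rather than equaling 0
```

Because the intercept and the random-effect mean trade off exactly, the likelihood cannot separate them, which is the weak identifiability the abstract refers to.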


Journal Article · DOI
13 May 2010 · PLOS ONE
TL;DR: Simulations suggest that canonical correlation analysis has higher power than standard pairwise univariate regression to detect single nucleotide polymorphisms when the expression trait has low heritability.
Abstract: Background: Discovering genetic associations between genetic markers and gene expression levels can provide insight into gene regulation and, potentially, mechanisms of disease. Such analyses typically involve a linkage or association analysis in which expression data are used as phenotypes. This approach leads to a large number of multiple comparisons and may therefore lack power. We assess the potential of applying canonical correlation analysis to partitioned genome-wide data as a method for discovering regulatory variants. Methodology/Principal Findings: Simulations suggest that canonical correlation analysis has higher power than standard pairwise univariate regression to detect single nucleotide polymorphisms when the expression trait has low heritability. The increase in power is even greater under the recessive model. We demonstrate this approach using the Childhood Asthma Management Program data. Conclusions/Significance: Our approach reduces multiple comparisons and may provide insight into the complex relationships between genotype and gene expression.

50 citations


Journal Article · DOI
TL;DR: This work proposes a principal component analysis-based statistical method, ProPCA, for efficiently estimating relative protein abundance from bottom-up label-free LC-MS/MS data that incorporates both spectral count information and LC-MS peptide ion peak attributes, such as peak area, volume, or height.

49 citations
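The general idea of merging several abundance measures through principal component analysis can be sketched as follows: standardize each measure (for example, spectral counts and peak areas) across observations, then use each observation's score on the first principal component as a combined index. This is an illustrative assumption about the general approach, not the ProPCA implementation; it computes the leading component of the correlation matrix by power iteration to stay dependency-free, and the function names are hypothetical.

```python
import math

def standardize(columns):
    """Center each feature column and scale it to unit sample standard deviation."""
    out = []
    for col in columns:
        n = len(col)
        m = sum(col) / n
        sd = math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
        out.append([(x - m) / sd for x in col])
    return out

def first_pc_scores(columns, n_iter=200):
    """columns: feature vectors of equal length. Returns (loadings, scores) for PC1."""
    z = standardize(columns)
    p, n = len(z), len(z[0])
    # Correlation matrix of the features (covariance of the standardized columns).
    C = [[sum(z[a][i] * z[b][i] for i in range(n)) / (n - 1) for b in range(p)] for a in range(p)]
    v = [1.0] * p
    for _ in range(n_iter):  # power iteration converges to the leading eigenvector
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # fix the arbitrary sign so loadings are mostly positive
        v = [-x for x in v]
    scores = [sum(v[a] * z[a][i] for a in range(p)) for i in range(n)]
    return v, scores

# Hypothetical measurements for four observations: spectral counts and peak areas.
loadings, scores = first_pc_scores([[1, 2, 3, 4], [10, 20, 30, 40]])
```

Standardizing first keeps a measure with large raw units, such as peak area, from dominating the combined score.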


Journal Article · DOI
TL;DR: Although a history of reflux is an important risk for EA, multifactor interactions also play important roles in EA risk.
Abstract: Purpose: The apoptosis pathway, gastroesophageal reflux symptoms (reflux), higher body mass index (BMI), and tobacco smoking have been individually associated with esophageal adenocarcinoma (EA) development. However, how multiple factors jointly affect EA risk remains unclear. Patients and Methods: In total, 305 patients with EA and 339 age- and sex-matched controls were studied. High-order interactions among reflux, BMI, smoking, and functional polymorphisms in five apoptotic genes (FAS, FASL, IL1B, TP53BP, and BAT3) were investigated by entropy-based multifactor dimensionality reduction (MDR), classification and regression tree (CART), and traditional logistic regression (LR) models. Results: In LR analysis, reflux, BMI, and smoking were significantly associated with EA risk, with reflux as the strongest individual factor. No individual single nucleotide polymorphism was associated with EA susceptibility. However, there was a two-way interaction between IL1B +3954C>T and reflux (P = .008). In both CART and MD...

39 citations
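The core step of multifactor dimensionality reduction labels each combination of factor levels (a "cell") as high or low risk by comparing its case-to-control ratio with a threshold, usually the overall ratio, which collapses several factors into one binary attribute. A minimal sketch of that step only, assuming no cross-validation; the entropy-based interaction measures used in the study are not shown, and the function name and factor codings are hypothetical.

```python
from collections import Counter

def mdr_labels(records, threshold=None):
    """records: iterable of (cell, is_case), where cell is a tuple of factor levels.

    Returns {cell: 'high' or 'low'}. A cell is high risk when its case:control ratio
    exceeds the threshold (default: the overall case:control ratio).
    """
    cases, controls = Counter(), Counter()
    for cell, is_case in records:
        (cases if is_case else controls)[cell] += 1
    if threshold is None:
        threshold = sum(cases.values()) / max(sum(controls.values()), 1)
    labels = {}
    for cell in set(cases) | set(controls):
        ratio = cases[cell] / controls[cell] if controls[cell] else float("inf")
        labels[cell] = "high" if ratio > threshold else "low"
    return labels

# Hypothetical cells: (reflux yes/no, genotype) with case/control status.
recs = [((1, "CT"), True)] * 3 + [((1, "CT"), False)] + [((0, "CC"), True)] + [((0, "CC"), False)] * 3
labels = mdr_labels(recs)
```

The resulting high/low attribute can then be evaluated as a single predictor, which is how MDR can detect interactions among factors that show no marginal effect individually.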


Journal Article · DOI
TL;DR: A class of augmented inverse probability weighted (AIPW) kernel estimating equations for nonparametric regression under MAR is proposed and it is shown that a specific AIPW kernel estimator in this class that employs the fitted values from a model for the conditional mean of the outcome given covariates and auxiliaries is double-robust.
Abstract: We consider nonparametric regression of a scalar outcome on a covariate when the outcome is missing at random (MAR) given the covariate and other observed auxiliary variables. We propose a class of augmented inverse probability weighted (AIPW) kernel estimating equations for nonparametric regression under MAR. We show that AIPW kernel estimators are consistent when the probability that the outcome is observed, that is, the selection probability, is either known by design or estimated under a correctly specified model. In addition, we show that a specific AIPW kernel estimator in our class that employs the fitted values from a model for the conditional mean of the outcome given covariates and auxiliaries is double-robust, that is, it remains consistent if this model is correctly specified even if the selection probabilities are modeled or specified incorrectly. Furthermore, when both models happen to be right, this double-robust estimator attains the smallest possible asymptotic variance of all AIPW kernel...

32 citations
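A local-constant (Nadaraya-Watson) member of an AIPW kernel class has a closed form. Solving Σᵢ K_h(Xᵢ − x){(Rᵢ/πᵢ)(Yᵢ − μ) + (1 − Rᵢ/πᵢ)(mᵢ − μ)} = 0 for μ gives μ̂(x) = Σᵢ K_h(Xᵢ − x){RᵢYᵢ/πᵢ + (1 − Rᵢ/πᵢ)mᵢ} / Σᵢ K_h(Xᵢ − x), where πᵢ is the selection probability and mᵢ the fitted outcome-model value. The sketch assumes a Gaussian kernel and user-supplied πᵢ and mᵢ; it illustrates the AIPW kernel idea rather than reproducing the paper's estimators, and the function names are hypothetical.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def aipw_kernel_estimate(x0, X, Y, R, pi, m, h):
    """Local-constant AIPW kernel estimate of E[Y | X = x0] with Y missing at random.

    X: covariates; Y: outcomes (unused where R == 0); R: 1 if Y observed, else 0;
    pi: selection probabilities P(R = 1 | X, auxiliaries);
    m: fitted values of E[Y | X, auxiliaries] from an outcome model.
    """
    num = den = 0.0
    for xi, yi, ri, pii, mi in zip(X, Y, R, pi, m):
        k = gaussian_kernel((xi - x0) / h)
        ipw = ri / pii
        observed_part = ipw * yi if ri else 0.0
        # Augmentation term (1 - R/pi) * m corrects the inverse-weighted term.
        num += k * (observed_part + (1.0 - ipw) * mi)
        den += k
    return num / den

# Hypothetical data: the second outcome is missing; m happens to equal the true outcomes.
est = aipw_kernel_estimate(1.5, [0, 1, 2, 3], [1, None, 3, 4], [1, 0, 1, 1],
                           [0.3, 0.5, 0.9, 0.7], [1, 2, 3, 4], h=1.0)
```

In this example the outcome model is exactly right while the πᵢ are arbitrary, and the estimate still equals the full-data kernel estimate, which illustrates the double-robustness property described above.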


Journal Article · DOI
TL;DR: A protective association between childbearing and lung cancer is observed, adding to existing evidence that reproductive factors may moderate lung cancer risk in women.
Abstract: Patterns of lung cancer incidence suggest that gender-associated factors may influence lung cancer risk. Given the association of parity with risk of some women's cancers, the authors hypothesized that childbearing history may also be associated with lung cancer. Women enrolled in the Lung Cancer Susceptibility Study at Massachusetts General Hospital (Boston, Massachusetts) between 1992 and 2004 (1,004 cases, 848 controls) were available for analysis of the association between parity and lung cancer risk. Multivariate logistic regression was used to estimate adjusted odds ratios and 95% confidence intervals. After results were controlled for age and smoking history, women with at least 1 child had 0.71 times the odds of lung cancer as women without children (odds ratio = 0.71, 95% confidence interval: 0.52, 0.97). A significant linear trend was found: Lung cancer risk decreased with increasing numbers of children (P < 0.001). This inverse association was stronger in never smokers (P = 0.12) and was limited to women over age 50 years at diagnosis (P = 0.17). Age at first birth was not associated with risk. The authors observed a protective association between childbearing and lung cancer, adding to existing evidence that reproductive factors may moderate lung cancer risk in women.

25 citations
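Wald confidence intervals for odds ratios are symmetric on the log scale, so a standard error and an approximate two-sided p-value can be recovered from a reported odds ratio and its 95% CI. The worked sketch below uses the parity result reported above (OR = 0.71, 95% CI: 0.52, 0.97), assuming the interval is a Wald interval. The recovered p-value is an approximation, not a figure reported by the authors, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def se_from_ci(lower, upper, level=0.95):
    """Standard error of log(OR) implied by a symmetric (on the log scale) Wald CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z)

def wald_p_from_or(odds_ratio, lower, upper, level=0.95):
    """Return (SE of log OR, Wald z, two-sided p) recovered from a reported OR and CI."""
    se = se_from_ci(lower, upper, level)
    z = math.log(odds_ratio) / se
    return se, z, 2 * (1 - NormalDist().cdf(abs(z)))

se, z, p = wald_p_from_or(0.71, 0.52, 0.97)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

A useful sanity check is that log(OR) should sit close to the midpoint of log(lower) and log(upper); a large discrepancy suggests the interval was not computed on the log scale.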


Journal Article · DOI
TL;DR: A likelihood-based framework that models potential outcomes in this setting and a Bayes procedure for statistical inference is proposed and is compared with a method of moments approach proposed by Cheng & Small (2006).
Abstract: Summary. Data analysis for randomized trials including multitreatment arms is often complicated by subjects who do not comply with their treatment assignment. We discuss here methods of estimating treatment efficacy for randomized trials involving multitreatment arms subject to non-compliance. One treatment effect of interest in the presence of non-compliance is the complier average causal effect, which is defined as the treatment effect for subjects who would comply regardless of the treatment assigned. Following the idea of principal stratification, we define principal compliance in trials with three treatment arms, extend the complier average causal effect and define causal estimands of interest in this setting. In addition, we discuss structural assumptions that are needed for estimation of causal effects and the identifiability problem that is inherent in this setting from both a Bayesian and a classical statistical perspective. We propose a likelihood-based framework that models potential outcomes in this setting and a Bayes procedure for statistical inference. We compare our method with a method-of-moments approach that was proposed by Cheng and Small in 2006 by using a hypothetical data set, and we further illustrate our approach with an application to a behavioural intervention study.

24 citations
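As background for the complier average causal effect, the simpler two-arm case has a well-known moment estimator: under randomization, the exclusion restriction, monotonicity (no defiers), and a nonzero difference in treatment uptake between arms, the CACE equals the intention-to-treat effect on the outcome divided by the intention-to-treat effect on treatment received. The sketch shows only this two-arm building block; it is neither the paper's three-arm likelihood/Bayes method nor Cheng and Small's three-arm estimator, and the function name and data are hypothetical.

```python
def cace_wald(z, d, y):
    """Instrumental-variable (Wald) moment estimator of the CACE in a two-arm trial.

    z: randomized assignment (0/1); d: treatment actually received (0/1); y: outcome.
    """
    def mean(vals):
        return sum(vals) / len(vals)

    y1 = mean([yi for zi, yi in zip(z, y) if zi == 1])
    y0 = mean([yi for zi, yi in zip(z, y) if zi == 0])
    d1 = mean([di for zi, di in zip(z, d) if zi == 1])
    d0 = mean([di for zi, di in zip(z, d) if zi == 0])
    if d1 == d0:
        raise ValueError("no difference in uptake between arms; CACE is not identified")
    return (y1 - y0) / (d1 - d0)  # ITT on outcome / ITT on treatment received

# Hypothetical trial: half of those assigned to treatment take it; controls cannot.
effect = cace_wald([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 0], [3, 3, 1, 1, 1, 1, 1, 1])
```

With three arms there are more compliance strata than such simple contrasts can separate, which is the identifiability problem the abstract discusses.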


Journal Article · DOI
TL;DR: The study evaluates whether the portable X-ray fluorescence spectrometer (XRF) is suitable for analysis of five metals (manganese, iron, zinc, copper, and chromium) on 37-mm polytetrafluoroethylene filters, and assesses the relationships between the measurement results for each metal obtained with laboratory-based and portable XRF.
Abstract: Elemental analysis of welding fume samples can be done using several laboratory-based techniques. However, portable measurement techniques could offer several advantages. In this study, we sought to determine whether the portable X-ray fluorescence spectrometer (XRF) is suitable for analysis of five metals (manganese, iron, zinc, copper, and chromium) on 37-mm polytetrafluoroethylene filters. Using this filter fitted on a cyclone in line with a personal pump, gravimetric samples were collected from a group of boilermakers exposed to welding fumes. We assessed the assumption of uniform deposition of these metals on the filters, and the relationships between measurement results of each metal obtained from traditional laboratory-based XRF and the portable XRF. For all five metals of interest, repeated measurements with the portable XRF at the same filter area showed good consistency (reliability ratios are equal to or close to 1.0 for almost all metals). The portable XRF readings taken from three different area...

Journal Article
TL;DR: This work proposes a working-independence profile likelihood method for the semiparametric time-varying coefficient model with correlation, evaluates the performance of the proposed nonparametric kernel estimator and the profile estimator, and applies the method to the western Kenya parasitemia data.
Abstract: We propose a working-independence profile likelihood method for the semiparametric time-varying coefficient model with correlation. Kernel likelihood is used to estimate the time-varying coefficient. The profile likelihood for the parametric coefficient is formed by plugging in the nonparametric estimator. For independent data, the estimator is asymptotically normal and achieves the asymptotic semiparametric efficiency bound. We evaluate the performance of the proposed nonparametric kernel estimator and the profile estimator, and apply the method to the western Kenya parasitemia data.
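For a Gaussian working model yᵢ = xᵢβ(tᵢ) + zᵢγ + εᵢ under working independence, the kernel-then-profile recipe has a closed form. For fixed γ, the local-constant kernel estimate of β(t) is β̂_γ(t) = a(t) − γb(t), with a(t) = ΣⱼKⱼxⱼyⱼ / ΣⱼKⱼxⱼ² and b(t) = ΣⱼKⱼxⱼzⱼ / ΣⱼKⱼxⱼ². Plugging it back in leaves a least-squares problem in γ alone, solved by γ̂ = Σuᵢvᵢ / Σvᵢ² with uᵢ = yᵢ − xᵢa(tᵢ) and vᵢ = zᵢ − xᵢb(tᵢ). The sketch assumes scalar x and z, a Gaussian kernel with bandwidth h, and local-constant rather than local-linear smoothing; these are illustrative simplifications, not the paper's estimator, and the function names are hypothetical.

```python
import math

def _kernel(u):
    return math.exp(-0.5 * u * u)  # Gaussian kernel; the normalizing constant cancels

def profile_estimate(t, x, z, y, h):
    """Return (gamma_hat, beta_hat at each t_i) for y = x * beta(t) + z * gamma + error."""
    n = len(y)
    a, b = [], []
    for i in range(n):
        w = [_kernel((t[j] - t[i]) / h) for j in range(n)]
        sxx = sum(w[j] * x[j] * x[j] for j in range(n))
        a.append(sum(w[j] * x[j] * y[j] for j in range(n)) / sxx)
        b.append(sum(w[j] * x[j] * z[j] for j in range(n)) / sxx)
    u = [y[i] - x[i] * a[i] for i in range(n)]  # y with the kernel fit on x removed
    v = [z[i] - x[i] * b[i] for i in range(n)]  # z with the kernel fit on x removed
    gamma = sum(ui * vi for ui, vi in zip(u, v)) / sum(vi * vi for vi in v)
    beta = [a[i] - gamma * b[i] for i in range(n)]  # plug gamma_hat back into beta_hat
    return gamma, beta

# Hypothetical noiseless data with constant beta = 2 and gamma = 0.5.
n = 10
t = [i / 9 for i in range(n)]
x = [1 + 0.1 * i for i in range(n)]
z = [(-1) ** i * (i % 3 + 1) for i in range(n)]
y = [2.0 * xi + 0.5 * zi for xi, zi in zip(x, z)]
gamma_hat, beta_hat = profile_estimate(t, x, z, y, h=0.2)
```

Working independence means the within-subject correlation is ignored when forming these weights; the estimates can remain consistent, but efficiency may be lost for correlated data.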

Journal Article · DOI
TL;DR: Nonparametric estimation of the association between consecutive gap times based on Kendall's τ in the presence of dependent censoring is discussed, and a nonparametric estimator that uses inverse probability of censoring weights is provided.
Abstract: Summary. In life history studies, interest often lies in the analysis of the interevent, or gap, times and the association between event times. Gap time analyses are challenging, however, even when the length of follow-up is determined independently of the event process, because associations between gap times induce dependent censoring for second and subsequent gap times. This article discusses nonparametric estimation of the association between consecutive gap times based on Kendall's τ in the presence of this type of dependent censoring. A nonparametric estimator that uses inverse probability of censoring weights is provided. Estimates of conditional gap time distributions can be obtained following specification of a particular copula function. Simulation studies show that the estimator performs well and compares favorably with an alternative estimator. Generalizations to a piecewise constant Clayton copula are given. Several simulation studies and illustrations with real data sets are also provided.
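Kendall's τ averages the concordance sign sign{(T₁ᵢ − T₁ⱼ)(T₂ᵢ − T₂ⱼ)} over pairs of subjects. The sketch below is a pairwise-weighted version, assuming each subject carries a user-supplied weight (for instance, an inverse probability of censoring weight for having both gap times observed) and that a pair's weight is the product of the two subject weights. This weighting scheme is an illustrative assumption, not the article's exact IPCW estimator; with all weights equal to 1 it reduces to the ordinary Kendall's τ (the τ-a version) on complete data.

```python
def weighted_kendall_tau(t1, t2, weights):
    """Weighted average over pairs of sign((t1_i - t1_j) * (t2_i - t2_j)).

    t1, t2: first and second gap times per subject; weights: one weight per subject.
    Pair weights are products of subject weights (an assumption of this sketch).
    """
    num = den = 0.0
    n = len(t1)
    for i in range(n):
        for j in range(i + 1, n):
            w = weights[i] * weights[j]
            prod = (t1[i] - t1[j]) * (t2[i] - t2[j])
            sign = (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tied
            num += w * sign
            den += w
    return num / den

# Hypothetical gap times for three subjects with equal weights.
tau = weighted_kendall_tau([1, 2, 3], [1, 3, 2], [1, 1, 1])
```

For the Clayton copula mentioned above, τ = θ/(θ + 2), so an estimate of τ can be converted into a copula parameter θ = 2τ/(1 − τ).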

Journal Article · DOI
TL;DR: The clinical course of NET patients (pts) remains poorly defined; clinical outcomes and prognostic factors were evaluated in 853 NET pts enrolled in a large, prospective outcomes study.
Abstract: 4044 Background: The clinical course of NET patients (pts) remains poorly defined. We evaluated clinical outcomes and prognostic factors in 853 NET pts enrolled in a large, prospective outcomes stu...