
Showing papers by "Xihong Lin" published in 2009


Journal Article
TL;DR: Five SNPs were identified that may be prognostic of overall survival in early-stage NSCLC and all five were located in known genes: STK39, PCDH7, A2BP1, and EYA2.
Abstract: Purpose: Lung cancer, of which 85% is non–small-cell (NSCLC), is the leading cause of cancer-related death in the United States. We used genome-wide analysis of tumor tissue to investigate whether single nucleotide polymorphisms (SNPs) in tumors are prognostic factors in early-stage NSCLC. Patients and Methods: One hundred early-stage NSCLC patients from Massachusetts General Hospital (MGH) were used as a discovery set and 89 NSCLC patients collected by the National Institute of Occupational Health, Norway, were used as a validation set. DNA was extracted from flash-frozen lung tissue with at least 70% tumor cellularity. Genome-wide genotyping was done using the high-density SNP chip. Copy numbers were inferred using median smoothing after intensity normalization. Cox models were used to screen and validate significant SNPs associated with the overall survival. Results: Copy number gains in chromosomes 3q, 5p, and 8q were observed in both MGH and Norwegian cohorts. The top 50 SNPs associated with overall sur...

121 citations


Journal Article
TL;DR: The results show that sLDA-based testing provides a powerful approach to test for the significance of a differentially expressed pathway and gene selection.
Abstract: Motivation: Pathway and gene set-based approaches for the analysis of gene expression profiling experiments have become increasingly popular for addressing problems associated with individual gene analysis. Since most genes are not differentially expressed, existing gene set tests, which consider all the genes within a gene set, are subject to considerable noise and power loss, a concern exacerbated in studies in which the degree of differential expression is moderate for truly differentially expressed genes. For a significantly differentially expressed pathway, it is also of substantial interest to select important genes that drive the differential expression of the pathway. Methods: We develop a unified framework to jointly test the significance of a pathway and to select a subset of genes that drive the significant pathway effect. To achieve dimension reduction and gene selection, we decompose each gene pathway into a single score by using a regularized form of linear discriminant analysis, called sparse linear discriminant analysis (sLDA). Testing for the significance of the pathway effect proceeds via permutation of the sLDA score. The sLDA-based test is compared with competing approaches with simulations and two applications: a study on the effect of metal fume exposure on immune response and a study of gene expression profiles among Type II Diabetes patients. Results: Our results show that sLDA-based testing provides a powerful approach to test for the significance of a differentially expressed pathway and gene selection. Availability: An implementation of the proposed sLDA-based pathway test in the R statistical computing environment is available at http://www.hsph.harvard.edu/~mwu/software/ Contact: xlin@hsph.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online.

115 citations
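
The permutation step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' sLDA implementation (theirs is the R code at the URL given in the abstract): the pathway score here is a simple sum of squared group-mean differences used as a stand-in for the sLDA score, and the function names and toy data are hypothetical.

```python
import random
from statistics import mean

def pathway_score(X, y):
    """Stand-in pathway statistic (NOT sLDA): sum over genes of the
    squared difference between the two group means."""
    g1 = [row for row, lab in zip(X, y) if lab == 1]
    g0 = [row for row, lab in zip(X, y) if lab == 0]
    n_genes = len(X[0])
    return sum(
        (mean(r[j] for r in g1) - mean(r[j] for r in g0)) ** 2
        for j in range(n_genes)
    )

def permutation_pvalue(X, y, n_perm=500, seed=0):
    """Permute the group labels, recompute the score each time, and report
    the share of permuted scores at least as large as the observed one."""
    rng = random.Random(seed)
    observed = pathway_score(X, y)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if pathway_score(X, labels) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one permutation.
    return (hits + 1) / (n_perm + 1)

# Toy data: 20 samples, 5 genes; the first two genes are shifted in group 1.
data_rng = random.Random(1)
y = [1] * 10 + [0] * 10
X = [
    [data_rng.gauss(3.0 if (i < 10 and j < 2) else 0.0, 1.0) for j in range(5)]
    for i in range(20)
]
p_value = permutation_pvalue(X, y)
print(p_value)
```

Because the labels are permuted rather than the genes, the correlation among genes within the pathway is preserved under the null, which is the usual reason for choosing this kind of test.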


Journal Article
TL;DR: Three approaches to estimating treatment efficacy in clinical trials that involve randomization to an active treatment or a control treatment are compared, where the treatment effect is estimated using the randomization indicator as an IV.
Abstract: Summary. We consider the analysis of clinical trials that involve randomization to an active treatment (T = 1) or a control treatment (T = 0), when the active treatment is subject to all-or-nothing compliance. We compare three approaches to estimating treatment efficacy in this situation: as-treated analysis, per-protocol analysis, and instrumental variable (IV) estimation, where the treatment effect is estimated using the randomization indicator as an IV. Both model-based and method-of-moment-based IV estimators are considered. The assumptions underlying these estimators are assessed, standard errors and mean squared errors of the estimates are compared, and design implications of the three methods are examined. Extensions of the methods to include observed covariates are then discussed, emphasizing the role of compliance propensity methods and the contrasting role of covariates in these extensions. Methods are illustrated on data from the Women Take Pride study, an assessment of behavioral treatments for women with heart disease.

103 citations
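
The three estimators compared in the abstract above can be sketched on a toy data set. This is an illustrative sketch under stated assumptions, not the paper's code: the IV estimator shown is the simple method-of-moment (Wald) ratio, the variable names `z` (randomized arm), `d` (treatment actually received), and `y` (outcome) are hypothetical, and the data are constructed by hand so that compliers have a true treatment effect of 2.

```python
from statistics import mean

def as_treated(z, d, y):
    """Compare outcomes by treatment actually received, ignoring randomization."""
    treated = [yi for di, yi in zip(d, y) if di == 1]
    untreated = [yi for di, yi in zip(d, y) if di == 0]
    return mean(treated) - mean(untreated)

def per_protocol(z, d, y):
    """Compare only subjects who followed their assigned protocol."""
    adherent_active = [yi for zi, di, yi in zip(z, d, y) if zi == 1 and di == 1]
    adherent_control = [yi for zi, di, yi in zip(z, d, y) if zi == 0 and di == 0]
    return mean(adherent_active) - mean(adherent_control)

def iv_wald(z, d, y):
    """Method-of-moment IV estimate: intention-to-treat effect on the outcome
    divided by the effect of randomization on treatment received."""
    y1 = [yi for zi, yi in zip(z, y) if zi == 1]
    y0 = [yi for zi, yi in zip(z, y) if zi == 0]
    d1 = [di for zi, di in zip(z, d) if zi == 1]
    d0 = [di for zi, di in zip(z, d) if zi == 0]
    return (mean(y1) - mean(y0)) / (mean(d1) - mean(d0))

# Toy trial with all-or-nothing compliance (controls cannot get the treatment).
# Each arm has 6 would-be compliers (baseline outcome 10) and 4 never-takers
# (baseline outcome 4); treatment raises a complier's outcome by 2.
z = [1] * 10 + [0] * 10
d = [1] * 6 + [0] * 4 + [0] * 10
y = [12] * 6 + [4] * 4 + [10] * 6 + [4] * 4

print(as_treated(z, d, y), per_protocol(z, d, y), iv_wald(z, d, y))
```

In this constructed example the IV ratio recovers the complier effect of 2, while the as-treated and per-protocol contrasts are larger because never-takers, who have lower baseline outcomes, fall disproportionately into their comparison groups.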


Journal Article
TL;DR: This article provides a review of statistical methods for analysis of microarray data by incorporating prior biological knowledge using gene sets and biological pathways, which consist of groups of biologically similar genes.
Abstract: An increasing challenge in analysis of microarray data is how to interpret and gain biological insight into profiles of thousands of genes. This article provides a review of statistical methods for analysis of microarray data by incorporating prior biological knowledge using gene sets and biological pathways, which consist of groups of biologically similar genes. We first discuss issues of individual gene analysis. We compare several methods for analysis of gene sets including over-representation analysis, gene set enrichment analysis, principal component analysis, global test and kernel machine. We discuss the assumptions of these methods and their pros and cons. We illustrate these methods by application to a type II diabetes data set.

58 citations
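
Of the methods listed in the abstract above, over-representation analysis is the simplest to write down. The sketch below uses the standard one-sided hypergeometric test, which is one common way to carry it out; the abstract does not say which variant the review uses, and the function name and arguments are hypothetical.

```python
from math import comb

def ora_pvalue(n_universe, n_set, n_selected, n_overlap):
    """Probability of seeing at least n_overlap gene-set members among
    n_selected genes drawn without replacement from n_universe genes,
    of which n_set belong to the gene set (hypergeometric upper tail)."""
    upper = min(n_set, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Example: 10 genes in total, a gene set of 3, 3 genes called significant,
# and all 3 significant genes fall inside the gene set.
print(ora_pvalue(10, 3, 3, 3))
```

The test treats genes as independent draws and depends on the cutoff used to call genes significant, which are among the assumptions a review of this kind would weigh against the other methods.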


Journal Article
06 Feb 2009 - PLOS ONE
TL;DR: A test of the hypothesis that an imbalance between neutrophil elastase (HNE) and its inhibitors in blood is related to the development of ARDS found that plasma profiles of PI3, HNE, and HNE/PI3 may be useful clinical biomarkers in monitoring the development of ARDS.
Abstract: Background: We conducted an exploratory study of genome-wide gene expression in whole blood and found that the expression of neutrophil elastase inhibitor (PI3, elafin) was down-regulated during the early phase of ARDS. Further analyses of plasma PI3 levels revealed a rapid decrease during early ARDS development. PI3 and secretory leukocyte proteinase inhibitor (SLPI) are important low-molecular-weight proteinase inhibitors produced locally at the neutrophil infiltration site in the lung. In this study, we tested the hypothesis that an imbalance between neutrophil elastase (HNE) and its inhibitors in blood is related to the development of ARDS. Methodology/Principal Findings: PI3, SLPI, and HNE were measured in plasma samples collected from 148 ARDS patients and 63 critically ill patients at risk for ARDS (controls). Compared with the controls, the ARDS patients had higher HNE, but lower PI3, at the onset of ARDS, resulting in an increased HNE/PI3 ratio (mean = 14.5; 95% CI, 10.9–19.4, P<0.0001), whereas plasma SLPI was not associated with the risk of ARDS development. Although the controls had elevated plasma PI3 and HNE, their HNE/PI3 ratio (mean = 6.5; 95% CI, 4.9–8.8) was not significantly different from that of healthy individuals (mean = 3.9; 95% CI, 2.7–5.9). Before the onset (the 7-day period prior to ARDS diagnosis), we observed only significantly elevated HNE, and the HNE-PI3 balance remained normal. As patients progressed from the pre-onset period to the onset of ARDS, the plasma level of PI3 declined, whereas HNE was maintained at a higher level, tilting the balance toward more HNE in the circulation as characterized by an increased HNE/PI3 ratio. In contrast, three days after ICU admission, there was a significant drop in the HNE/PI3 ratio in the at-risk controls. Conclusions/Significance: Plasma profiles of PI3, HNE, and HNE/PI3 may be useful clinical biomarkers in monitoring the development of ARDS.

57 citations


Journal Article
TL;DR: This paper proposes a new class of linear mixed models for spatial data in the presence of covariate measurement errors, and develops a structural modeling approach to obtaining the maximum likelihood estimator by accounting for the measurement error.
Abstract: Spatial data with covariate measurement errors have been commonly observed in public health studies. Existing work mainly concentrates on parameter estimation using Gibbs sampling, and no work has been conducted to understand and quantify the theoretical impact of ignoring measurement error on spatial data analysis in the form of the asymptotic biases in regression coefficients and variance components when measurement error is ignored. Plausible implementations, from frequentist perspectives, of maximum likelihood estimation in spatial covariate measurement error models are also elusive. In this paper, we propose a new class of linear mixed models for spatial data in the presence of covariate measurement errors. We show that the naive estimators of the regression coefficients are attenuated while the naive estimators of the variance components are inflated, if measurement error is ignored. We further develop a structural modeling approach to obtaining the maximum likelihood estimator by accounting for the measurement error. We study the large sample properties of the proposed maximum likelihood estimator, and propose an EM algorithm to draw inference. All the asymptotic properties are shown under the increasing-domain asymptotic framework. We illustrate the method by analyzing the Scottish lip cancer data, and evaluate its performance through a simulation study, all of which elucidate the importance of adjusting for covariate measurement errors.

39 citations
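
The attenuation and inflation results stated in the abstract above have a familiar counterpart in the simplest non-spatial setting, written out here for orientation. This is the textbook classical-error case with independent normal errors, an assumption made for illustration; it is not the paper's spatial linear mixed model, whose bias expressions are not given in the abstract. The symbols $X$, $W$, $U$, $\lambda$ and the variances are introduced here and do not come from the source.

```latex
% True model and classical measurement error (illustrative assumptions):
%   Y = \beta_0 + \beta_1 X + \varepsilon,  W = X + U,
%   X \sim N(\mu_x, \sigma_x^2),  U \sim N(0, \sigma_u^2),
%   \varepsilon \sim N(0, \sigma_\varepsilon^2),  all mutually independent.
\begin{align*}
  \lambda &= \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}, \qquad 0 < \lambda < 1, \\
  E[Y \mid W] &= \beta_0 + (1-\lambda)\,\beta_1 \mu_x + \lambda\,\beta_1 W, \\
  \operatorname{Var}(Y \mid W) &= \sigma_\varepsilon^2 + \beta_1^2\,\lambda\,\sigma_u^2
    \;\ge\; \sigma_\varepsilon^2 .
\end{align*}
% Regressing Y on W instead of X therefore estimates \lambda\beta_1
% (attenuated toward zero) and overstates the residual variance.
```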


Journal Article
TL;DR: A randomized controlled trial of two formats of a program to enhance management of heart disease by patients was conducted, finding the self-directed format was better than the control in reducing the number, frequency, and bothersomeness of cardiac symptoms.
Abstract: A randomized controlled trial of two formats of a program (Women Take PRIDE) to enhance management of heart disease by patients was conducted. Older women (N = 575) were randomly assigned to a group or self-directed format or to a control group. Data regarding symptoms, functional health status, and weight were collected at baseline and at 4, 12, and 18 months. The formats produced different outcomes. At 18 months, the self-directed format was better than the control in reducing the number (p ≤ .02), frequency (p ≤ .03), and bothersomeness (p ≤ .02) of cardiac symptoms. The self-directed format was also better than the group format in reducing symptom frequency of all types (p ≤ .04). The group format improved ambulation at 12 months (p ≤ .04) and weight loss at 18 months (p ≤ .03), and group participants were more likely to complete the program (p ≤ .05). The availability of different learning formats could enhance management of cardiovascular disease by patients.

25 citations


Journal Article
TL;DR: An estimating equation approach based on the pseudo conditional score method is proposed, and it is shown the resulting estimators of the regression coefficients are consistent and asymptotically normal.
Abstract: We consider semiparametric transition measurement error models for longitudinal data, where one of the covariates is measured with error in transition models, and no distributional assumption is made for the underlying unobserved covariate. An estimating equation approach based on the pseudo conditional score method is proposed. We show the resulting estimators of the regression coefficients are consistent and asymptotically normal. We also discuss the issue of efficiency loss. Simulation studies are conducted to examine the finite-sample performance of our estimators. The longitudinal AIDS Costs and Services Utilization Survey data are analyzed for illustration.

16 citations


Journal Article
TL;DR: This paper develops and applies a general variance-component framework for pedigree analysis of continuous and categorical outcomes and demonstrates that one can perform variance-component pedigree analysis on outcomes that follow any exponential-family distribution.
Abstract: Variance-component methods are popular and flexible analytic tools for elucidating the genetic mechanisms of complex quantitative traits from pedigree data. However, variance-component methods typically assume that the trait of interest follows a multivariate normal distribution within a pedigree. Studies have shown that violation of this normality assumption can lead to biased parameter estimates and inflations in type-I error. This limits the application of variance-component methods to more general trait outcomes, whether continuous or categorical in nature. In this paper, we develop and apply a general variance-component framework for pedigree analysis of continuous and categorical outcomes. We develop appropriate models using generalized-linear mixed model theory and fit such models using approximate maximum-likelihood procedures. Using our proposed method, we demonstrate that one can perform variance-component pedigree analysis on outcomes that follow any exponential-family distribution. Additionally, we also show how one can modify the method to perform pedigree analysis of ordinal outcomes. We also discuss extensions of our variance-component framework to accommodate pedigrees ascertained based on trait outcome. We demonstrate the feasibility of our method using both simulated data and data from a genetic study of ovarian insufficiency.

4 citations


Journal Article
TL;DR: This discussion thanks Professor Lehmann for a timely and very useful review comparing several parametric and nonparametric tests in the two-sample problem.
Abstract: We would like to thank Professor Lehmann for providing a very useful review of comparing several parametric and nonparametric tests in the two-sample problem. Such a review is timely, as many class...

1 citation