
Showing papers in "Biometrics in 2008"


Journal ArticleDOI
TL;DR: New, flexible capture-recapture models that use the capture locations to estimate animal locations and spatially referenced capture probability are proposed, which allow use of Akaike's information criterion or other likelihood-based methods of model selection.
Abstract: Live-trapping capture-recapture studies of animal populations with fixed trap locations inevitably have a spatial component: animals close to traps are more likely to be caught than those far away. This is not addressed in conventional closed-population estimates of abundance and without the spatial component, rigorous estimates of density cannot be obtained. We propose new, flexible capture-recapture models that use the capture locations to estimate animal locations and spatially referenced capture probability. The models are likelihood-based and hence allow use of Akaike's information criterion or other likelihood-based methods of model selection. Density is an explicit parameter, and the evaluation of its dependence on spatial or temporal covariates is therefore straightforward. Additional (nonspatial) variation in capture probability may be modeled as in conventional capture-recapture. The method is tested by simulation, using a model in which capture probability depends only on location relative to traps. Point estimators are found to be unbiased and standard error estimators almost unbiased. The method is used to estimate the density of Red-eyed Vireos (Vireo olivaceus) from mist-netting data from the Patuxent Research Refuge, Maryland, U.S.A. Estimates agree well with those from an existing spatially explicit method based on inverse prediction. A variety of additional spatially explicit models are fitted; these include models with temporal stratification, behavioral response, and heterogeneous animal home ranges.

738 citations
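As an illustration of the spatial ingredient these models add, the sketch below evaluates a half-normal capture probability that declines with distance from a trap, and the resulting probability of being caught at least once on a small trap grid. The parameter values g0 and sigma and the grid layout are illustrative assumptions, not estimates from the paper.

```python
import numpy as np

def capture_prob(animal_xy, trap_xy, g0=0.6, sigma=25.0):
    # Half-normal detection: capture probability falls off with distance from the trap.
    d = np.linalg.norm(np.asarray(animal_xy) - np.asarray(trap_xy))
    return g0 * np.exp(-d**2 / (2 * sigma**2))

# Probability that an animal with a given home-range centre is caught at least once
# on one occasion across a 4 x 4 trap grid -- one ingredient of a spatially explicit
# capture-recapture likelihood.
traps = np.array([(x, y) for x in range(0, 100, 25) for y in range(0, 100, 25)])
centre = (40.0, 55.0)
p_each = np.array([capture_prob(centre, t) for t in traps])
p_any = 1.0 - np.prod(1.0 - p_each)
print(round(p_any, 3))
```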


Journal ArticleDOI
TL;DR: A new method called the OSCAR (octagonal shrinkage and clustering algorithm for regression) is proposed to simultaneously select variables and group them into predictive clusters, improving prediction accuracy and interpretation.
Abstract: Variable selection can be challenging, particularly in situations with a large number of predictors with possibly high correlations, such as gene expression data. In this article, a new method called the OSCAR (octagonal shrinkage and clustering algorithm for regression) is proposed to simultaneously select variables while grouping them into predictive clusters. In addition to improving prediction accuracy and interpretation, these resulting groups can then be investigated further to discover what contributes to the group having a similar behavior. The technique is based on penalized least squares with a geometrically intuitive penalty function that shrinks some coefficients to exactly zero. Additionally, this penalty yields exact equality of some coefficients, encouraging correlated predictors that have a similar effect on the response to form predictive clusters represented by a single coefficient. The proposed procedure is shown to compare favorably to the existing shrinkage and variable selection techniques in terms of both prediction error and model complexity, while yielding the additional grouping information.

488 citations
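A minimal sketch of the OSCAR penalty described above: an L1 term plus a pairwise L-infinity term whose exact ties encourage correlated predictors to share a coefficient. The tuning values lam and c are placeholders; the paper solves the resulting optimization via a quadratic-programming formulation rather than the generic objective shown here.

```python
import numpy as np

def oscar_penalty(beta, lam, c):
    # OSCAR penalty: lam * [ sum_j |b_j| + c * sum_{j<k} max(|b_j|, |b_k|) ].
    b = np.abs(np.asarray(beta, dtype=float))
    pairwise = sum(max(b[j], b[k]) for j in range(len(b)) for k in range(j + 1, len(b)))
    return lam * (b.sum() + c * pairwise)

def oscar_objective(beta, X, y, lam, c):
    # Penalized least squares; the max() terms are what yield exactly equal coefficients.
    resid = y - X @ beta
    return 0.5 * resid @ resid + oscar_penalty(beta, lam, c)
```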


Journal ArticleDOI
TL;DR: The approach first constructs a prior chosen to be vague in a suitable sense, updates it to obtain a sequence of posteriors corresponding to a range of sample sizes, and computes a distance between each posterior and the parametric prior.
Abstract: We present a definition for the effective sample size of a parametric prior distribution in a Bayesian model, and propose methods for computing the effective sample size in a variety of settings. Our approach first constructs a prior chosen to be vague in a suitable sense, and updates this prior to obtain a sequence of posteriors corresponding to each of a range of sample sizes. We then compute a distance between each posterior and the parametric prior, defined in terms of the curvature of the logarithm of each distribution, and the posterior minimizing the distance defines the effective sample size of the prior. For cases where the distance cannot be computed analytically, we provide a numerical approximation based on Monte Carlo simulation. We provide general guidelines for application, illustrate the method in several standard cases where the answer seems obvious, and then apply it to some nonstandard settings.

216 citations
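A rough numerical sketch of the curvature-matching idea for a Beta prior on a binomial proportion, where the effective sample size should come out near a + b. The diffuse baseline prior and the grid search below are illustrative choices, not the paper's exact construction.

```python
import numpy as np

def neg_log_curvature_beta(a, b, theta):
    # Negative second derivative of the log Beta(a, b) density at theta.
    return (a - 1) / theta**2 + (b - 1) / (1 - theta)**2

def effective_sample_size_beta(a, b, m_max=200, eps=0.01):
    mean = a / (a + b)
    a0, b0 = eps * mean, eps * (1 - mean)          # diffuse prior with the same mean
    target = neg_log_curvature_beta(a, b, mean)
    dists = []
    for m in range(1, m_max + 1):
        # Posterior after m pseudo-observations split according to the prior mean.
        a_m, b_m = a0 + m * mean, b0 + m * (1 - mean)
        dists.append(abs(neg_log_curvature_beta(a_m, b_m, mean) - target))
    return 1 + int(np.argmin(dists))

print(effective_sample_size_beta(6, 4))   # prints 10, i.e., roughly a + b
```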


Journal ArticleDOI
TL;DR: How covariates can influence the mood variances is described, and the standard mixed model is extended by adding a subject-level random effect to the within-subject variance specification, allowing subjects to have influence on the mean, or location, and variability, or (square of the) scale, of their mood responses.
Abstract: For longitudinal data, mixed models include random subject effects to indicate how subjects influence their responses over repeated assessments. The error variance and the variance of the random effects are usually considered to be homogeneous. These variance terms characterize the within-subjects (i.e., error variance) and between-subjects (i.e., random-effects variance) variation in the data. In studies using ecological momentary assessment (EMA), up to 30 or 40 observations are often obtained for each subject, and interest frequently centers around changes in the variances, both within and between subjects. In this article, we focus on an adolescent smoking study using EMA where interest is on characterizing changes in mood variation. We describe how covariates can influence the mood variances, and also extend the standard mixed model by adding a subject-level random effect to the within-subject variance specification. This permits subjects to have influence on the mean, or location, and variability, or (square of the) scale, of their mood responses. Additionally, we allow the location and scale random effects to be correlated. These mixed-effects location scale models have useful applications in many research areas where interest centers on the joint modeling of the mean and variance structure.

209 citations
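The data-generating structure of a mixed-effects location-scale model is compact to state; the sketch below simulates EMA-style mood data with correlated subject-level location and scale random effects and covariate effects on both the mean and the within-subject log variance. All numeric values are invented, and fitting such a model requires the estimation machinery described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subj, n_obs = 100, 30                  # e.g., ~30 EMA prompts per adolescent

beta = np.array([2.0, -0.5])             # location (mean) model: intercept, covariate effect
tau  = np.array([0.3,  0.4])             # scale model: intercept, covariate effect on log variance
sd_loc, sd_scale, rho = 0.8, 0.5, 0.4    # SDs of the two random effects and their correlation

cov = np.array([[sd_loc**2,               rho * sd_loc * sd_scale],
                [rho * sd_loc * sd_scale, sd_scale**2]])
u = rng.multivariate_normal([0.0, 0.0], cov, size=n_subj)   # (location, scale) effects per subject

data = []
for i in range(n_subj):
    x = rng.binomial(1, 0.5, n_obs)                  # a momentary covariate (e.g., just smoked)
    mean = beta[0] + beta[1] * x + u[i, 0]           # subject shifts the mean ...
    log_var = tau[0] + tau[1] * x + u[i, 1]          # ... and the within-subject variability
    y = rng.normal(mean, np.sqrt(np.exp(log_var)))
    data.append((i, x, y))
```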


Journal ArticleDOI
TL;DR: A novel maximally selected rank statistic is derived from this framework for a censored response partitioned with respect to two ordered categorical covariates and potential interactions and is employed to search for a high-risk group of rectal cancer patients treated with a neo-adjuvant chemoradiotherapy.
Abstract: Maximally selected statistics for the estimation of simple cutpoint models are embedded into a generalized conceptual framework based on conditional inference procedures. This powerful framework contains most of the published procedures in this area as special cases, such as maximally selected chi-squared and rank statistics, but also allows for direct construction of new test procedures for less standard test problems. As an application, a novel maximally selected rank statistic is derived from this framework for a censored response partitioned with respect to two ordered categorical covariates and potential interactions. This new test is employed to search for a high-risk group of rectal cancer patients treated with a neo-adjuvant chemoradiotherapy. Moreover, a new efficient algorithm for the evaluation of the asymptotic distribution for a large class of maximally selected statistics is given, enabling the fast evaluation of a large number of cutpoints.

206 citations
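For orientation, the sketch below computes a plain maximally selected Wilcoxon-type statistic over candidate cutpoints of a single continuous covariate. The censored two-covariate statistic actually derived in the paper, and the conditional reference distribution needed because of the maximization, are not reproduced here.

```python
import numpy as np
from scipy.stats import rankdata

def max_selected_rank_statistic(x, y, min_prop=0.1, n_cuts=50):
    # Scan cutpoints of x and keep the largest standardized Wilcoxon rank-sum statistic.
    n = len(y)
    ranks = rankdata(y)
    cuts = np.unique(np.quantile(x, np.linspace(min_prop, 1 - min_prop, n_cuts)))
    best_cut, best_z = None, -np.inf
    for c in cuts:
        left = x <= c
        m = int(left.sum())
        if m == 0 or m == n:
            continue
        s = ranks[left].sum()
        mu = m * (n + 1) / 2.0
        sigma = np.sqrt(m * (n - m) * (n + 1) / 12.0)   # no-ties null variance
        z = abs(s - mu) / sigma
        if z > best_z:
            best_cut, best_z = c, z
    # NOTE: because of the maximization, best_z must be referred to the distribution of
    # the maximum (as in the paper), not to a standard normal.
    return best_cut, best_z
```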


Journal ArticleDOI
TL;DR: Taking a semiparametric theory perspective, this work proposes a broadly applicable approach to adjustment for auxiliary covariates to achieve more efficient estimators and tests for treatment parameters in the analysis of randomized clinical trials.
Abstract: The primary goal of a randomized clinical trial is to make comparisons among two or more treatments. For example, in a two-arm trial with continuous response, the focus may be on the difference in treatment means; with more than two treatments, the comparison may be based on pairwise differences. With binary outcomes, pairwise odds ratios or log odds ratios may be used. In general, comparisons may be based on meaningful parameters in a relevant statistical model. Standard analyses for estimation and testing in this context typically are based on the data collected on response and treatment assignment only. In many trials, auxiliary baseline covariate information may also be available, and it is of interest to exploit these data to improve the efficiency of inferences. Taking a semiparametric theory perspective, we propose a broadly applicable approach to adjustment for auxiliary covariates to achieve more efficient estimators and tests for treatment parameters in the analysis of randomized clinical trials. Simulations and applications demonstrate the performance of the methods.

202 citations
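A compact sketch in the spirit of the proposal for a two-arm trial with known randomization probability pi: the difference in means is augmented with arm-specific working regressions of the outcome on baseline covariates. The ordinary-least-squares working models below are an illustrative choice; by randomization the augmentation terms have mean zero, so the estimator remains consistent even if they are misspecified.

```python
import numpy as np

def augmented_treatment_effect(y, z, X, pi=0.5):
    # z is the 0/1 treatment indicator, X holds baseline covariates, pi = P(Z = 1).
    X1 = np.column_stack([np.ones(len(y)), X])
    b1, *_ = np.linalg.lstsq(X1[z == 1], y[z == 1], rcond=None)   # working model, arm 1
    b0, *_ = np.linalg.lstsq(X1[z == 0], y[z == 0], rcond=None)   # working model, arm 0
    h1, h0 = X1 @ b1, X1 @ b0
    # Augmented arm means: the added terms have mean zero by randomization,
    # but reduce variance when the covariates predict the outcome.
    mu1 = np.mean(z * y / pi - (z - pi) / pi * h1)
    mu0 = np.mean((1 - z) * y / (1 - pi) + (z - pi) / (1 - pi) * h0)
    return mu1 - mu0
```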


Journal ArticleDOI
TL;DR: This article proposes a novel empirical Bayes-type shrinkage estimator to analyze case-control data that can relax the gene-environment independence assumption in a data-adaptive fashion and suggests that the proposed estimator strikes a balance between bias and efficiency depending on the true nature of the gene-environment association and the sample size for a given study.
Abstract: Standard prospective logistic regression analysis of case–control data often leads to very imprecise estimates of gene-environment interactions due to small numbers of cases or controls in cells of crossing genotype and exposure. In contrast, under the assumption of gene-environment independence, modern “retrospective” methods, including the “case-only” approach, can estimate the interaction parameters much more precisely, but they can be seriously biased when the underlying assumption of gene-environment independence is violated. In this article, we propose a novel empirical Bayes-type shrinkage estimator to analyze case–control data that can relax the gene-environment independence assumption in a data-adaptive fashion. In the special case, involving a binary gene and a binary exposure, the method leads to an estimator of the interaction log odds ratio parameter in a simple closed form that corresponds to a weighted average of the standard case-only and case–control estimators. We also describe a general approach for deriving the new shrinkage estimator and its variance within the retrospective maximum-likelihood framework developed by Chatterjee and Carroll (2005, Biometrika 92, 399–418). Both simulated and real data examples suggest that the proposed estimator strikes a balance between bias and efficiency depending on the true nature of the gene-environment association and the sample size for a given study.

190 citations
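The weighted-average structure is easiest to see in the binary-gene, binary-exposure case. The sketch below computes the case-only and case-control interaction log odds ratios from hypothetical 2x2 tables and combines them with a shrinkage weight driven by the evidence of gene-environment dependence among controls; both the counts and the specific weight are illustrative assumptions, not the paper's exact formula.

```python
import numpy as np

def log_or(table):
    # Log odds ratio and its variance for a 2x2 table [[a, b], [c, d]],
    # with a 0.5 continuity correction.
    a, b, c, d = (np.asarray(table, dtype=float) + 0.5).ravel()
    return np.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

# Hypothetical G-by-E counts among cases and among controls.
cases    = [[40, 60], [30, 170]]
controls = [[25, 75], [80, 220]]

beta_co, var_co = log_or(cases)                        # case-only interaction estimator
theta,   var_th = log_or(controls)                     # G-E association among controls
beta_cc, var_cc = beta_co - theta, var_co + var_th     # case-control interaction estimator

# Shrink toward the efficient case-only estimator unless there is clear evidence of
# G-E dependence (large theta^2 relative to its uncertainty). Illustrative weight only.
w = theta**2 / (theta**2 + var_th)
beta_eb = w * beta_cc + (1 - w) * beta_co
print(beta_co, beta_cc, beta_eb)
```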


Journal ArticleDOI
TL;DR: This work proposes a new and intuitive two-stage probabilistic approach, which leads to a general framework to simultaneously compare multiple communities based on abundance data, and extends the commonly used Morisita index and NESS index to the case of N communities.
Abstract: A traditional approach for assessing similarity among N (N > 2) communities is to use multiple pairwise comparisons. However, pairwise similarity indices do not completely characterize multiple-community similarity because the information shared by at least three communities is ignored. We propose a new and intuitive two-stage probabilistic approach, which leads to a general framework to simultaneously compare multiple communities based on abundance data. The approach is specifically used to extend the commonly used Morisita index and NESS (normalized expected species shared) index to the case of N communities. For comparing N communities, a profile of N - 1 indices is proposed to characterize similarity of species composition across communities. Based on sample abundance data, nearly unbiased estimators of the proposed indices and their variances are obtained. These generalized NESS and Morisita indices are applied to comparison of three size classes of plant data (seedlings, saplings, and trees) within old-growth and secondary rain forest plots in Costa Rica.

188 citations
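For reference, the classical two-community Morisita-Horn index that the N-community profile generalizes can be computed directly from abundance vectors; the N-community indices and their estimators are not reproduced here, and the abundance vectors below are made up.

```python
import numpy as np

def morisita_horn(x1, x2):
    # Classical two-community Morisita-Horn similarity from species abundance vectors.
    p1 = np.asarray(x1, dtype=float); p1 /= p1.sum()
    p2 = np.asarray(x2, dtype=float); p2 /= p2.sum()
    return 2 * np.sum(p1 * p2) / (np.sum(p1**2) + np.sum(p2**2))

print(morisita_horn([30, 12, 5, 0, 3], [22, 15, 0, 4, 9]))
```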


Journal ArticleDOI
TL;DR: Two methods that simultaneously separate data points into similar clusters and select informative variables that contribute to the clustering are proposed in the framework of penalized model‐based clustering.
Abstract: Variable selection in high-dimensional clustering analysis is an important yet challenging problem. In this article, we propose two methods that simultaneously separate data points into similar clusters and select informative variables that contribute to the clustering. Our methods are in the framework of penalized model-based clustering. Unlike the classical L1-norm penalization, the penalty terms that we propose make use of the fact that parameters belonging to one variable should be treated as a natural “group”. Numerical results indicate that the two new methods tend to remove noninformative variables more effectively and provide better clustering results than the L1-norm approach.

185 citations


Journal ArticleDOI
TL;DR: This work addresses the problem of testing many partial conjunction hypotheses simultaneously using the false discovery rate (FDR) approach and proves that if the FDR controlling procedure in Benjamini and Hochberg is used for this purpose, the FDR is controlled under various dependency structures.
Abstract: We consider the problem of testing a partial conjunction hypothesis, which states that at least u out of n tested hypotheses are false. It offers an in-between approach to the testing of the conjunction of null hypotheses against the alternative that at least one is not, and the testing of the disjunction of null hypotheses against the alternative that all hypotheses are not null. We suggest powerful test statistics for testing such a partial conjunction hypothesis that are valid under dependence between the test statistics as well as under independence. We then address the problem of testing many partial conjunction hypotheses simultaneously using the false discovery rate (FDR) approach. We prove that if the FDR controlling procedure in Benjamini and Hochberg (1995, Journal of the Royal Statistical Society, Series B 57, 289-300) is used for this purpose, the FDR is controlled under various dependency structures. Moreover, we can screen at all levels simultaneously in order to display the findings on a superimposed map and still control an appropriate FDR measure. We apply the method to examples from microarray analysis and functional magnetic resonance imaging (fMRI), two application areas where the need for partial conjunction analysis has been identified.

179 citations
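A sketch of the workflow for many units (genes, voxels): form a Fisher-type partial conjunction p-value per unit by combining its n - u + 1 largest p-values, then feed those p-values to the Benjamini-Hochberg procedure. The Fisher combination is one of the statistics discussed in this line of work; treat the details below as an illustration rather than the paper's full procedure.

```python
import numpy as np
from scipy import stats

def partial_conjunction_pvalue(pvals, u):
    # Fisher-type p-value for "at least u of the n nulls are false":
    # combine the n - u + 1 largest of the n per-study p-values.
    p = np.sort(np.asarray(pvals, dtype=float))[u - 1:]
    return stats.chi2.sf(-2 * np.sum(np.log(p)), df=2 * len(p))

def benjamini_hochberg(pvals, q=0.05):
    # Standard BH step-up procedure; returns a boolean rejection indicator per hypothesis.
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = int(np.max(np.nonzero(below)[0])) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# Example: 1000 genes, each tested in n = 4 studies; screen for "non-null in >= 2 studies".
rng = np.random.default_rng(0)
P = rng.uniform(size=(1000, 4))
pc = np.array([partial_conjunction_pvalue(row, u=2) for row in P])
print(benjamini_hochberg(pc, q=0.05).sum())
```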


Journal ArticleDOI
TL;DR: The state-space formulation provides a generic and flexible framework for modeling and inference in models with individual effects, and it yields a practical means of estimation in these complex problems via contemporary methods of Markov chain Monte Carlo.
Abstract: In population and evolutionary biology, there exists considerable interest in individual heterogeneity in parameters of demographic models for open populations. However, flexible and practical solutions to the development of such models have proven to be elusive. In this article, I provide a state-space formulation of open population capture-recapture models with individual effects. The state-space formulation provides a generic and flexible framework for modeling and inference in models with individual effects, and it yields a practical means of estimation in these complex problems via contemporary methods of Markov chain Monte Carlo. A straightforward implementation can be achieved in the software package WinBUGS. I provide an analysis of a simple model with constant detection and survival probability parameters. A second example is based on data from a 7-year study of European dippers, in which a model with year and individual effects is fitted.
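The state-space structure is compact to write down: a latent survival (state) process and a conditional detection (observation) process. The sketch below simulates capture histories from a constant-parameter version of such a model; the values of phi and p are arbitrary, everyone is released at the first occasion for simplicity, and fitting with individual random effects would proceed by MCMC (e.g., in WinBUGS) as described above.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 200, 7          # marked individuals and sampling occasions (7 as in the dipper study)
phi, p = 0.8, 0.5      # survival and detection probabilities (arbitrary constants here)

z = np.zeros((N, T), dtype=int)   # latent alive/dead states
y = np.zeros((N, T), dtype=int)   # observed capture histories
z[:, 0] = 1                       # all individuals alive and released at the first occasion
y[:, 0] = 1
for t in range(1, T):
    z[:, t] = rng.binomial(1, phi * z[:, t - 1])   # state equation: survive with probability phi
    y[:, t] = rng.binomial(1, p * z[:, t])         # observation equation: detect with probability p
```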

Journal ArticleDOI
TL;DR: An estimand for evaluating a principal surrogate, the causal effect predictiveness (CEP) surface, is introduced, which quantifies how well causal treatment effects on the biomarker predict causal treatment effects on the clinical endpoint.
Abstract: Frangakis and Rubin (2002, Biometrics 58, 21-29) proposed a new definition of a surrogate endpoint (a "principal" surrogate) based on causal effects. We introduce an estimand for evaluating a principal surrogate, the causal effect predictiveness (CEP) surface, which quantifies how well causal treatment effects on the biomarker predict causal treatment effects on the clinical endpoint. Although the CEP surface is not identifiable due to missing potential outcomes, it can be identified by incorporating a baseline covariate(s) that predicts the biomarker. Given case-cohort sampling of such a baseline predictor and the biomarker in a large blinded randomized clinical trial, we develop an estimated likelihood method for estimating the CEP surface. This estimation assesses the "surrogate value" of the biomarker for reliably predicting clinical treatment effects for the same or similar setting as the trial. A CEP surface plot provides a way to compare the surrogate value of multiple biomarkers. The approach is illustrated by the problem of assessing an immune response to a vaccine as a surrogate endpoint for infection.

Journal ArticleDOI
TL;DR: This joint model provides a flexible approach to handle possible nonignorable missing data in the longitudinal measurements due to dropout and is an extension of previous joint models with a single failure type, offering a possible way to model informatively censored events as a competing risk.
Abstract: In this article we study a joint model for longitudinal measurements and competing risks survival data. Our joint model provides a flexible approach to handle possible nonignorable missing data in the longitudinal measurements due to dropout. It is also an extension of previous joint models with a single failure type, offering a possible way to model informatively censored events as a competing risk. Our model consists of a linear mixed effects submodel for the longitudinal outcome and a proportional cause-specific hazards frailty submodel (Prentice et al., 1978, Biometrics 34, 541-554) for the competing risks survival data, linked together by some latent random effects. We propose to obtain the maximum likelihood estimates of the parameters by an expectation maximization (EM) algorithm and estimate their standard errors using a profile likelihood method. The developed method works well in our simulation studies and is applied to a clinical trial for the scleroderma lung disease.

Journal ArticleDOI
TL;DR: The L2 regularization is introduced, and an alternating least-squares algorithm is developed, to enable SIR to work with n < p and highly correlated predictors; the L1 regularization is further introduced to achieve simultaneous reduction estimation and predictor selection.
Abstract: In high-dimensional data analysis, sliced inverse regression (SIR) has proven to be an effective dimension reduction tool and has enjoyed wide applications. The usual SIR, however, cannot work with problems where the number of predictors, p, exceeds the sample size, n, and can suffer when there is high collinearity among the predictors. In addition, the reduced dimensional space consists of linear combinations of all the original predictors and no variable selection is achieved. In this article, we propose a regularized SIR approach based on the least-squares formulation of SIR. The L2 regularization is introduced, and an alternating least-squares algorithm is developed, to enable SIR to work with n < p and highly correlated predictors. The L1 regularization is further introduced to achieve simultaneous reduction estimation and predictor selection. Both simulations and the analysis of a microarray expression data set demonstrate the usefulness of the proposed method.
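A bare-bones sketch of the L2 part of the idea: classical SIR with a ridge term added to the predictor covariance so the directions remain computable when p > n or the predictors are collinear. The L1 step for predictor selection, and the alternating least-squares algorithm of the paper, are not shown.

```python
import numpy as np
from scipy.linalg import eigh

def ridge_sir(X, y, n_slices=10, n_dirs=2, lam=1.0):
    # Sliced inverse regression with a ridge-regularized predictor covariance.
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / n
    order = np.argsort(y)                       # slice the response
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Xc[idx].mean(axis=0)                # slice mean of the centered predictors
        M += (len(idx) / n) * np.outer(m, m)
    # Generalized eigenproblem M v = lambda (Sigma + lam * I) v; the ridge term keeps
    # the problem well posed when p > n or the predictors are highly correlated.
    vals, vecs = eigh(M, Sigma + lam * np.eye(p))
    return vecs[:, np.argsort(vals)[::-1][:n_dirs]]   # estimated e.d.r. directions
```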

Journal ArticleDOI
TL;DR: A general framework for the analysis of animal telemetry data through the use of weighted distributions is proposed and several popular resource selection models are shown to be special cases of the general model by making assumptions about animal movement and behavior.
Abstract: We propose a general framework for the analysis of animal telemetry data through the use of weighted distributions. It is shown that several interpretations of resource selection functions arise when constructed from the ratio of a use and availability distribution. Through the proposed general framework, several popular resource selection models are shown to be special cases of the general model by making assumptions about animal movement and behavior. The weighted distribution framework is easily extended to account for telemetry data that are highly autocorrelated, as is typical of animal relocations collected with new technology such as global positioning systems. An analysis of simulated data using several models constructed within the proposed framework is also presented to illustrate the possible gains from the flexible modeling framework. The proposed model is applied to a brown bear data set from southeast Alaska.

Journal ArticleDOI
TL;DR: The recently developed Bayesian wavelet‐based functional mixed model methodology is applied to analyze MALDI‐TOF mass spectrometry proteomic data to identify spectral regions that are differentially expressed across experimental conditions, in a way that takes both statistical and clinical significance into account and controls the Bayesian false discovery rate to a prespecified level.
Abstract: In this article, we apply the recently developed Bayesian wavelet-based functional mixed model methodology to analyze MALDI-TOF mass spectrometry proteomic data. By modeling mass spectra as functions, this approach avoids reliance on peak detection methods. The flexibility of this framework in modeling nonparametric fixed and random effect functions enables it to model the effects of multiple factors simultaneously, allowing one to perform inference on multiple factors of interest using the same model fit, while adjusting for clinical or experimental covariates that may affect both the intensities and locations of peaks in the spectra. For example, this provides a straightforward way to account for systematic block and batch effects that characterize these data. From the model output, we identify spectral regions that are differentially expressed across experimental conditions, in a way that takes both statistical and clinical significance into account and controls the Bayesian false discovery rate to a prespecified level. We apply this method to two cancer studies.

Journal ArticleDOI
TL;DR: Convenient parameterizations requiring few random effects are proposed, which allow such models to be estimated using widely available software for linear mixed models (continuous phenotypes) or generalized linear mixed models (categorical phenotypes).
Abstract: Biometrical genetic modeling of twin or other family data can be used to decompose the variance of an observed response or 'phenotype' into genetic and environmental components. Convenient parameterizations requiring few random effects are proposed, which allow such models to be estimated using widely available software for linear mixed models (continuous phenotypes) or generalized linear mixed models (categorical phenotypes). We illustrate the proposed approach by modeling family data on the continuous phenotype birth weight and twin data on the dichotomous phenotype depression. The example data sets and commands for Stata and R/S-PLUS are available at the Biometrics website.
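The variance decomposition itself is classical; the paper's contribution is a parameterization that lets standard mixed-model software estimate it. As a back-of-the-envelope check, Falconer's moment formulas recover the components from MZ and DZ twin correlations (the correlations below are invented).

```python
# ACE decomposition from twin correlations (Falconer's classical formulas).
r_mz, r_dz = 0.62, 0.38        # hypothetical MZ and DZ correlations for a phenotype
a2 = 2 * (r_mz - r_dz)         # additive genetic share (A)
c2 = 2 * r_dz - r_mz           # shared-environment share (C)
e2 = 1 - r_mz                  # unique-environment share (E)
print(round(a2, 2), round(c2, 2), round(e2, 2))   # 0.48 0.14 0.38 -- they sum to 1
```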

Journal ArticleDOI
TL;DR: New methods to analyze data from an experiment using rodent models to investigate the role of p27, an important cell‐cycle mediator, in early colon carcinogenesis are presented and suggest the existence of significant crypt signaling.
Abstract: In this article, we present new methods to analyze data from an experiment using rodent models to investigate the role of p27, an important cell-cycle mediator, in early colon carcinogenesis. The responses modeled here are essentially functions nested within a two-stage hierarchy. Standard functional data analysis literature focuses on a single stage of hierarchy and conditionally independent functions with near white noise. However, in our experiment, there is substantial biological motivation for the existence of spatial correlation among the functions, which arise from the locations of biological structures called colonic crypts: this possible functional correlation is a phenomenon we term crypt signaling. Thus, as a point of general methodology, we require an analysis that allows for functions to be correlated at the deepest level of the hierarchy. Our approach is fully Bayesian and uses Markov chain Monte Carlo methods for inference and estimation. Analysis of this data set gives new insights into the structure of p27 expression in early colon carcinogenesis and suggests the existence of significant crypt signaling. Our methodology uses regression splines, and because of the hierarchical nature of the data, dimension reduction of the covariance matrix of the spline coefficients is important: we suggest simple methods for overcoming this problem.

Journal ArticleDOI
TL;DR: Proposed regression calibration methods to jointly model longitudinal and survival data using a semiparametric longitudinal model and a proportional hazards model are investigated and applied to the analysis of a dataset for evaluating the effects of the longitudinal biomarker PSA on the recurrence of prostate cancer.
Abstract: In this article we investigate regression calibration methods to jointly model longitudinal and survival data using a semiparametric longitudinal model and a proportional hazards model. In the longitudinal model, a biomarker is assumed to follow a semiparametric mixed model where covariate effects are modeled parametrically and subject-specific time profiles are modeled nonparametrically using a population smoothing spline and subject-specific random stochastic processes. The Cox model is assumed for survival data by including both the current measure and the rate of change of the underlying longitudinal trajectories as covariates, as motivated by a prostate cancer study application. We develop a two-stage semiparametric regression calibration (RC) method. Two variations of the RC method are considered, risk set regression calibration and a computationally simpler ordinary regression calibration. Simulation results show that the two-stage RC approach performs well in practice and effectively corrects the bias from the naive method. We apply the proposed methods to the analysis of a dataset for evaluating the effects of the longitudinal biomarker PSA on the recurrence of prostate cancer.

Journal ArticleDOI
TL;DR: The results provide the first glimpse of a plausible secular trend in prostate cancer incidence and suggest that, in the absence of PSA screening, disease incidence would not have continued its historic increase, rather it would have leveled off in accordance with changes in prostate patterns of care unrelated to PSA.
Abstract: The introduction of the prostate-specific antigen (PSA) test has led to dramatic changes in the incidence of prostate cancer in the United States. In this article, we use information on the increase and subsequent decline in prostate cancer incidence following the adoption of PSA to estimate the lead time associated with PSA screening. The lead time is a key determinant of the likelihood of overdiagnosis, one of the main costs associated with the PSA test. Our approach conceptualizes observed incidence as the sum of the secular trend in incidence, which reflects incidence in the absence of PSA, and the excess incidence over and above the secular trend, which is a function of population screening patterns and the unknown lead time. We develop a likelihood model for the excess incidence given the secular trend and use it to estimate the mean lead time under specified distributional assumptions. We also develop a likelihood model for observed incidence and use it to simultaneously estimate the mean lead time together with a smooth secular trend. Variances and confidence intervals are estimated via a parametric bootstrap. Our results indicate an average lead time of approximately 4.59 years (95% confidence interval [3.24, 5.93]) for whites and 6.78 years [5.42, 8.20] for blacks with a corresponding secular trend estimate that is fairly flat after the introduction of PSA screening. These estimates correspond to overdiagnosis frequencies of approximately 22.7% and 34.4% for screen-detected whites and blacks, respectively. Our results provide the first glimpse of a plausible secular trend in prostate cancer incidence and suggest that, in the absence of PSA screening, disease incidence would not have continued its historic increase, rather it would have leveled off in accordance with changes in prostate patterns of care unrelated to PSA.

Journal ArticleDOI
TL;DR: A random effects model of repeated measures in the presence of both informative observation times and a dependent terminal event is proposed and an analysis of the cost‐accrual process of chronic heart failure patients from the clinical data repository is presented.
Abstract: In longitudinal observational studies, repeated measures are often taken at informative observation times. Also, there may exist a dependent terminal event such as death that stops the follow-up. For example, patients in poorer health are more likely to seek medical treatment and their medical cost for each visit tends to be higher. They are also subject to a higher mortality rate. In this article, we propose a random effects model of repeated measures in the presence of both informative observation times and a dependent terminal event. Three submodels are used, respectively, for (1) the intensity of recurrent observation times, (2) the amount of repeated measure at each observation time, and (3) the hazard of death. Correlated random effects are incorporated to join the three submodels. The estimation can be conveniently accomplished by Gaussian quadrature techniques, e.g., SAS Proc NLMIXED. An analysis of the cost-accrual process of chronic heart failure patients from the clinical data repository at the University of Virginia Health System is presented to illustrate the proposed method.

Journal ArticleDOI
TL;DR: The symbolic Balke-Pearl linear programming method is applied to derive closed-form formulas for the upper and lower bounds on the ACDE under various assumptions of monotonicity to enable clinical experimenters to assess the direct effect of treatment from observed data with minimum computational effort.
Abstract: This article considers the problem of estimating the average controlled direct effect (ACDE) of a treatment on an outcome, in the presence of unmeasured confounders between an intermediate variable and the outcome. Such confounders render the direct effect unidentifiable even in cases where the total effect is unconfounded (hence identifiable). Kaufman et al. (2005, Statistics in Medicine 24, 1683-1702) applied a linear programming software to find the minimum and maximum possible values of the ACDE for specific numerical data. In this article, we apply the symbolic Balke-Pearl (1997, Journal of the American Statistical Association 92, 1171-1176) linear programming method to derive closed-form formulas for the upper and lower bounds on the ACDE under various assumptions of monotonicity. These universal bounds enable clinical experimenters to assess the direct effect of treatment from observed data with minimum computational effort, and they further shed light on the sign of the direct effect and the accuracy of the assessments.

Journal ArticleDOI
TL;DR: A doubly penalized Buckley-James method for the semiparametric accelerated failure time model to relate high-dimensional genomic data to censored survival outcomes, which uses the elastic-net penalty that is a mixture of L1- and L2-norm penalties.
Abstract: Recent interest in cancer research focuses on predicting patients' survival by investigating gene expression profiles based on microarray analysis. We propose a doubly penalized Buckley-James method for the semiparametric accelerated failure time model to relate high-dimensional genomic data to censored survival outcomes, which uses the elastic-net penalty that is a mixture of L1- and L2-norm penalties. Similar to the elastic-net method for a linear regression model with uncensored data, the proposed method performs automatic gene selection and parameter estimation, where highly correlated genes are able to be selected (or removed) together. The two-dimensional tuning parameter is determined by generalized crossvalidation. The proposed method is evaluated by simulations and applied to the Michigan squamous cell lung carcinoma study.

Journal ArticleDOI
TL;DR: A nonparametric multiplicative random effects model for the longitudinal process is proposed, which has many applications and leads to a flexible yet parsimonious nonparametric random effects model that compares well with the competing parametric longitudinal approaches.
Abstract: In clinical studies, longitudinal biomarkers are often used to monitor disease progression and failure time. Joint modeling of longitudinal and survival data has certain advantages and has emerged as an effective way to mutually enhance information. Typically, a parametric longitudinal model is assumed to facilitate the likelihood approach. However, the choice of a proper parametric model turns out to be more elusive than models for standard longitudinal studies in which no survival endpoint occurs. In this article, we propose a nonparametric multiplicative random effects model for the longitudinal process, which has many applications and leads to a flexible yet parsimonious nonparametric random effects model. A proportional hazards model is then used to link the biomarkers and event time. We use B-splines to represent the nonparametric longitudinal process, and select the number of knots and degrees based on a version of the Akaike information criterion (AIC). Unknown model parameters are estimated through maximizing the observed joint likelihood, which is iteratively maximized by the Monte Carlo Expectation Maximization (MCEM) algorithm. Due to the simplicity of the model structure, the proposed approach has good numerical stability and compares well with the competing parametric longitudinal approaches. The new approach is illustrated with primary biliary cirrhosis (PBC) data, aiming to capture nonlinear patterns of serum bilirubin time courses and their relationship with survival time of PBC patients.

Journal ArticleDOI
TL;DR: It is argued that analyses valid under MNAR are not well suited for the primary analysis in clinical trials, and that one route for sensitivity analysis is to consider, next to selection models, pattern-mixture models or shared-parameter models, the latter extended here to a latent-class mixture model.
Abstract: In the analyses of incomplete longitudinal clinical trial data, there has been a shift, away from simple methods that are valid only if the data are missing completely at random, to more principled ignorable analyses, which are valid under the less restrictive missing at random assumption. The availability of the necessary standard statistical software nowadays allows for such analyses in practice. While the possibility of data missing not at random (MNAR) cannot be ruled out, it is argued that analyses valid under MNAR are not well suited for the primary analysis in clinical trials. Rather than either forgetting about or blindly shifting to an MNAR framework, the optimal place for MNAR analyses is within a sensitivity-analysis context. One such route for sensitivity analysis is to consider, next to selection models, pattern-mixture models or shared-parameter models. The latter can also be extended to a latent-class mixture model, the approach taken in this article. The performance of the so-obtained flexible model is assessed through simulations and the model is applied to data from a depression trial.

Journal ArticleDOI
TL;DR: A novel signed-rank test for clustered paired data is obtained using the general principle of within-cluster resampling and it is shown that only this test maintains the correct size under a null hypothesis of marginal symmetry compared to four existing signed rank tests.
Abstract: We consider the problem of comparing two outcome measures when the pairs are clustered. Using the general principle of within-cluster resampling, we obtain a novel signed-rank test for clustered paired data. We show by a simple informative cluster size simulation model that only our test maintains the correct size under a null hypothesis of marginal symmetry compared to four other existing signed rank tests; further, our test has adequate power when cluster size is noninformative. In general, cluster size is informative if the distribution of pair-wise differences within a cluster depends on the cluster size. An application of our method to testing radiation toxicity trend is presented.
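A sketch that follows the general within-cluster-resampling recipe: repeatedly draw one pairwise difference per cluster, compute the ordinary Wilcoxon signed-rank statistic, and combine across resamples with a "within-minus-between" variance. This illustrates the principle rather than reproducing the exact test developed in the paper; ties and degenerate variance estimates are ignored.

```python
import numpy as np
from scipy.stats import norm, rankdata

def wcr_signed_rank(cluster_diffs, n_resamples=2000, seed=0):
    # cluster_diffs: list of 1-D arrays of paired differences, one array per cluster.
    rng = np.random.default_rng(seed)
    n = len(cluster_diffs)
    mu0 = n * (n + 1) / 4.0                     # null mean of the signed-rank statistic
    sig0_sq = n * (n + 1) * (2 * n + 1) / 24.0  # null variance (no ties)
    stats = np.empty(n_resamples)
    for b in range(n_resamples):
        d = np.array([rng.choice(np.asarray(c)) for c in cluster_diffs])  # one pair per cluster
        ranks = rankdata(np.abs(d))
        stats[b] = ranks[d > 0].sum()
    t_bar = stats.mean()
    var_bar = sig0_sq - stats.var(ddof=1)       # within-resample variance minus between-resample variance
    z = (t_bar - mu0) / np.sqrt(var_bar)
    return z, 2 * norm.sf(abs(z))
```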

Journal ArticleDOI
TL;DR: It is shown that optimal designs derived by the Bayesian approach are similar for observational studies of a single epidemic and for studies involving replicated epidemics in independent subpopulations.
Abstract: This article describes a method for choosing observation times for stochastic processes to maximise the expected information about their parameters. Two commonly used models for epidemiological processes are considered: a simple death process and a susceptible-infected (SI) epidemic process with dual sources for infection spreading within and from outwith the population. The search for the optimal design uses Bayesian computational methods to explore the joint parameter-data-design space, combined with a method known as moment closure to approximate the likelihood to make the acceptance step efficient. For the processes considered, a small number of optimally chosen observations are shown to yield almost as much information as much more intensively observed schemes that are commonly used in epidemiological experiments. Analysis of the simple death process allows a comparison between the full Bayesian approach and locally optimal designs around a point estimate from the prior based on asymptotic results. The robustness of the approach to misspecified priors is demonstrated for the SI epidemic process, for which the computational intractability of the likelihood precludes locally optimal designs. We show that optimal designs derived by the Bayesian approach are similar for observational studies of a single epidemic and for studies involving replicated epidemics in independent subpopulations. Different optima result, however, when the objective is to maximise the gain in information based on informative and non-informative priors: this has implications when an experiment is designed to convince a naive or sceptical observer rather than consolidate the belief of an informed observer. Some extensions to the methods, including the selection of information criteria and extension to other epidemic processes with transition probabilities, are briefly addressed.

Journal ArticleDOI
TL;DR: A new approach for choosing sample size based on cost efficiency, the ratio of a study's projected scientific and/or practical value to its total cost, is proposed and justified as an acceptable alternative to current conventional approaches.
Abstract: The conventional approach of choosing sample size to provide 80% or greater power ignores the cost implications of different sample size choices. Costs, however, are often impossible for investigators and funders to ignore in actual practice. Here, we propose and justify a new approach for choosing sample size based on cost efficiency, the ratio of a study's projected scientific and/or practical value to its total cost. By showing that a study's projected value exhibits diminishing marginal returns as a function of increasing sample size for a wide variety of definitions of study value, we are able to develop two simple choices that can be defended as more cost efficient than any larger sample size. The first is to choose the sample size that minimizes the average cost per subject. The second is to choose sample size to minimize total cost divided by the square root of sample size. This latter method is theoretically more justifiable for innovative studies, but also performs reasonably well and has some justification in other cases. For example, if projected study value is assumed to be proportional to power at a specific alternative and total cost is a linear function of sample size, then this approach is guaranteed either to produce more than 90% power or to be more cost efficient than any sample size that does. These methods are easy to implement, based on reliable inputs, and well justified, so they should be regarded as acceptable alternatives to current conventional approaches.
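The second rule is easy to evaluate: with a linear total cost C(n) = c0 + c1*n (c0 a fixed overhead and c1 a per-subject cost, both invented here), minimizing C(n)/sqrt(n) has the closed-form solution n* = c0/c1, which the grid search below confirms.

```python
import numpy as np

c0, c1 = 200_000.0, 500.0                 # fixed overhead and per-subject cost (illustrative)
n = np.arange(10, 2001)
objective = (c0 + c1 * n) / np.sqrt(n)    # total cost divided by the square root of sample size
n_star = n[np.argmin(objective)]
print(n_star, c0 / c1)                    # grid minimum agrees with the closed form: 400
```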

Journal ArticleDOI
TL;DR: This work formulates the problem as one of testing for differences in survival curves after a prespecified time point, proposes a variety of techniques for testing this hypothesis, studies them using simulation, and illustrates them on a study comparing survival for autologous and allogeneic bone marrow transplants.
Abstract: In some clinical studies comparing treatments in terms of their survival curves, researchers may anticipate that the survival curves will cross at some point, leading to interest in a long-term survival comparison. However, simple comparison of the survival curves at a fixed point may be inefficient, and use of a weighted log-rank test may be overly sensitive to early differences in survival. We formulate the problem as one of testing for differences in survival curves after a prespecified time point, and propose a variety of techniques for testing this hypothesis. We study these methods using simulation and illustrate them on a study comparing survival for autologous and allogeneic bone marrow transplants.

Journal ArticleDOI
TL;DR: A general method for estimating the dependence parameter when the dependency is modeled with an Archimedean copula is suggested and is shown to be more accurate than the estimator proposed by Fine et al. (2001).
Abstract: In many follow-up studies, patients are subject to concurrent events. In this article, we consider semicompeting risks data as defined by Fine, Jiang, and Chappell (2001, Biometrika 88, 907-919) where one event is censored by the other but not vice versa. The proposed model involves marginal survival functions for the two events and a parametric family of copulas for their dependency. This article suggests a general method for estimating the dependence parameter when the dependency is modeled with an Archimedean copula. It uses the copula-graphic estimator of Zheng and Klein (1995, Biometrika 82, 127-138) for estimating the survival function of the nonterminal event, subject to dependent censoring. Asymptotic properties of these estimators are derived. Simulations show that the new methods work well with finite samples. The copula-graphic estimator is shown to be more accurate than the estimator proposed by Fine et al. (2001); its performance is similar to that of the self-consistent estimator of Jiang, Fine, Kosorok, and Chappell (2005, Scandinavian Journal of Statistics 33, 1-20). The analysis of a data set, emphasizing the estimation of characteristics of the observable region, is presented as an illustration.