
Showing papers in "Biometrics in 2006"


Journal ArticleDOI
TL;DR: In this paper, distance-based tests of homogeneity of multivariate dispersions, which can be based on any dissimilarity measure of choice, are proposed, relying on the rotational invariance of either the multivariate centroid or the spatial median to obtain measures of spread using principal coordinate axes.
Abstract: The traditional likelihood-based test for differences in multivariate dispersions is known to be sensitive to nonnormality. It is also impossible to use when the number of variables exceeds the number of observations. Many biological and ecological data sets have many variables, are highly skewed, and are zero-inflated. The traditional test and even some more robust alternatives are also unreasonable in many contexts where measures of dispersion based on a non-Euclidean dissimilarity would be more appropriate. Distance-based tests of homogeneity of multivariate dispersions, which can be based on any dissimilarity measure of choice, are proposed here. They rely on the rotational invariance of either the multivariate centroid or the spatial median to obtain measures of spread using principal coordinate axes. The tests are straightforward multivariate extensions of Levene's test, with P-values obtained either using the traditional F-distribution or using permutation of either least-squares or LAD residuals. Examples illustrate the utility of the approach, including the analysis of stabilizing selection in sparrows, biodiversity of New Zealand fish assemblages, and the response of Indonesian reef corals to an El Nino. Monte Carlo simulations from the real data sets show that the distance-based tests are robust and powerful for relevant alternative hypotheses of real differences in spread.

2,255 citations
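
The centroid version of this procedure is simple enough to sketch. The illustrative Python below (function names are ours) computes each observation's Euclidean distance to its group centroid and applies a Levene-style F test to those distances, with a p-value from permuting the distances; the paper's more general method works in the principal coordinate space of an arbitrary dissimilarity, can use spatial medians, and permutes least-squares or LAD residuals rather than the raw distances.

```python
import numpy as np

def dispersion_f(dists, groups):
    """One-way ANOVA (Levene-style) F statistic on distances to group centroids."""
    levels = np.unique(groups)
    grand = dists.mean()
    ssb = sum((groups == g).sum() * (dists[groups == g].mean() - grand) ** 2
              for g in levels)
    ssw = sum(((dists[groups == g] - dists[groups == g].mean()) ** 2).sum()
              for g in levels)
    return (ssb / (len(levels) - 1)) / (ssw / (len(dists) - len(levels)))

def dispersion_test(X, groups, n_perm=999, seed=0):
    """Test of homogeneity of multivariate dispersions: distance of each point
    to its group centroid, F test on those distances, permutation p-value."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    dists = np.empty(len(X))
    for g in np.unique(groups):
        idx = groups == g
        dists[idx] = np.linalg.norm(X[idx] - X[idx].mean(axis=0), axis=1)
    f_obs = dispersion_f(dists, groups)
    f_perm = [dispersion_f(rng.permutation(dists), groups) for _ in range(n_perm)]
    p = (1 + sum(f >= f_obs for f in f_perm)) / (n_perm + 1)
    return f_obs, p

# toy example: group B has twice the spread of group A
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(0, 2, (30, 4))])
print(dispersion_test(X, np.repeat(["A", "B"], 30)))
```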


Journal ArticleDOI

806 citations


Journal ArticleDOI

559 citations


Journal ArticleDOI
TL;DR: This work provides a new probabilistic derivation for any incidence-based index that is symmetric and homogeneous and proposes estimators that adjust for the effect of unseen shared species on the authors' abundance-based indices.
Abstract: A wide variety of similarity indices for comparing two assemblages based on species incidence (i.e., presence/absence) data have been proposed in the literature. These indices are generally based on three simple incidence counts: the number of species shared by two assemblages and the number of species unique to each of them. We provide a new probabilistic derivation for any incidence-based index that is symmetric (i.e., the index is not affected by the identity ordering of the two assemblages) and homogeneous (i.e., the index is unchanged if all counts are multiplied by a constant). The probabilistic approach is further extended to formulate abundance-based indices. Thus any symmetric and homogeneous incidence index can be easily modified to an abundance-type version. Applying the Laplace approximation formulas, we propose estimators that adjust for the effect of unseen shared species on our abundance-based indices. Simulation results show that the adjusted estimators significantly reduce the biases of the corresponding unadjusted ones when a substantial fraction of species is missing from samples. Data on successional vegetation in six tropical forests are used for illustration. Advantages and disadvantages of some commonly applied indices are briefly discussed.

550 citations
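
As a concrete illustration of the incidence counts involved, here is a minimal Python sketch of the classic Sørensen incidence index and a simple abundance-type analogue built from the total relative abundances of shared species. It omits the paper's Laplace-approximation adjustment for unseen shared species, and the function names are ours.

```python
import numpy as np

def sorensen_incidence(x, y):
    """Sørensen incidence index 2a / (2a + b + c) from two abundance vectors
    (species aligned): a = shared species, b and c = species unique to each."""
    px, py = x > 0, y > 0
    a = np.sum(px & py)
    b = np.sum(px & ~py)
    c = np.sum(~px & py)
    return 2 * a / (2 * a + b + c)

def sorensen_abundance(x, y):
    """Abundance-type analogue: replace the incidence counts by U and V, the
    shares of each assemblage's total abundance made up of shared species."""
    shared = (x > 0) & (y > 0)
    U = x[shared].sum() / x.sum()
    V = y[shared].sum() / y.sum()
    return 2 * U * V / (U + V)

# toy example: counts for the same five species in two assemblages
x = np.array([10, 0, 3, 5, 0])
y = np.array([2, 4, 0, 7, 1])
print(sorensen_incidence(x, y), sorensen_abundance(x, y))
```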


Journal ArticleDOI
Simon N. Wood1
TL;DR: The smooths offer several advantages: they have one wiggliness penalty per covariate and are hence invariant to linear rescaling of covariates, making them useful when there is no “natural” way to scale covariates relative to each other.
Abstract: Summary A general method for constructing low-rank tensor product smooths for use as components of generalized additive models or generalized additive mixed models is presented. A penalized regression approach is adopted in which tensor product smooths of several variables are constructed from smooths of each variable separately, these “marginal” smooths being represented using a low-rank basis with an associated quadratic wiggliness penalty. The smooths offer several advantages: (i) they have one wiggliness penalty per covariate and are hence invariant to linear rescaling of covariates, making them useful when there is no “natural” way to scale covariates relative to each other; (ii) they have a useful tuneable range of smoothness, unlike single-penalty tensor product smooths that are scale invariant; (iii) the relatively low rank of the smooths means that they are computationally efficient; (iv) the penalties on the smooths are easily interpretable in terms of function shape; (v) the smooths can be generated completely automatically from any marginal smoothing bases and associated quadratic penalties, giving the modeler considerable flexibility to choose the basis penalty combination most appropriate to each modeling task; and (vi) the smooths can easily be written as components of a standard linear or generalized linear mixed model, allowing them to be used as components of the rich family of such models implemented in standard software, and to take advantage of the efficient and stable computational methods that have been developed for such models. A small simulation study shows that the methods can compare favorably with recently developed smoothing spline ANOVA methods.

501 citations
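
The row-wise Kronecker construction of a tensor product basis from marginal bases, with one penalty per covariate, can be sketched as follows. In the toy usage, plain polynomial marginal bases with difference penalties merely stand in for proper low-rank spline bases, and the smoothing parameters are fixed rather than estimated, so this is an illustration of the construction, not the paper's full method.

```python
import numpy as np

def row_kronecker(A, B):
    """Row-wise Kronecker product: the tensor product basis evaluated at the
    observations, from marginal basis matrices A (n x ka) and B (n x kb)."""
    n = A.shape[0]
    return (A[:, :, None] * B[:, None, :]).reshape(n, -1)

def tensor_product_smooth(A, B, Sa, Sb):
    """Tensor product basis plus one wiggliness penalty per covariate, built
    from the marginal bases and their marginal penalties."""
    X = row_kronecker(A, B)
    S1 = np.kron(Sa, np.eye(B.shape[1]))   # penalizes roughness in covariate 1
    S2 = np.kron(np.eye(A.shape[1]), Sb)   # penalizes roughness in covariate 2
    return X, S1, S2

def fit_penalized(X, y, penalties, lambdas):
    """Penalized least squares for fixed smoothing parameters lambdas."""
    S = sum(lam * P for lam, P in zip(lambdas, penalties))
    return np.linalg.solve(X.T @ X + S, X.T @ y)

# toy usage: polynomial marginal bases with second-difference penalties
rng = np.random.default_rng(2)
x1, x2 = rng.uniform(size=200), rng.uniform(size=200)
y = np.sin(2 * np.pi * x1) * x2 + rng.normal(scale=0.1, size=200)
A, B = np.vander(x1, 6, increasing=True), np.vander(x2, 6, increasing=True)
D = np.diff(np.eye(6), 2, axis=0)
Xt, S1, S2 = tensor_product_smooth(A, B, D.T @ D, D.T @ D)
beta = fit_penalized(Xt, y, [S1, S2], [0.1, 0.1])
fitted = Xt @ beta
```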


Journal ArticleDOI
TL;DR: A stochastic discrete-time susceptible-exposed-infectious-recovered (SEIR) model for infectious diseases is developed with the aim of estimating parameters from daily incidence and mortality time series for an outbreak of Ebola in the Democratic Republic of Congo in 1995.
Abstract: A stochastic discrete-time susceptible-exposed-infectious-recovered (SEIR) model for infectious diseases is developed with the aim of estimating parameters from daily incidence and mortality time series for an outbreak of Ebola in the Democratic Republic of Congo in 1995. The incidence time series exhibit many low integers as well as zero counts requiring an intrinsically stochastic modeling approach. In order to capture the stochastic nature of the transitions between the compartmental populations in such a model we specify appropriate conditional binomial distributions. In addition, a relatively simple temporally varying transmission rate function is introduced that allows for the effect of control interventions. We develop Markov chain Monte Carlo methods for inference that are used to explore the posterior distribution of the parameters. The algorithm is further extended to integrate numerically over state variables of the model, which are unobserved. This provides a realistic stochastic model that can be used by epidemiologists to study the dynamics of the disease and the effect of control interventions.

414 citations
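
A minimal simulation sketch of such a chain-binomial SEIR model follows, with a transmission rate that decays once control interventions begin. The parameter values and the specific form of the time-varying rate are illustrative assumptions, not the paper's estimates, and the MCMC inference step is not shown.

```python
import numpy as np

def seir_step(S, E, I, R, beta, sigma, gamma, N, rng, dt=1.0):
    """One day of a stochastic discrete-time SEIR model: transitions between
    compartments are conditional binomial draws (chain-binomial style)."""
    p_exp = 1.0 - np.exp(-beta * I / N * dt)   # P(susceptible becomes exposed)
    p_inf = 1.0 - np.exp(-sigma * dt)          # P(exposed becomes infectious)
    p_rem = 1.0 - np.exp(-gamma * dt)          # P(infectious is removed)
    new_E = rng.binomial(S, p_exp)
    new_I = rng.binomial(E, p_inf)
    new_R = rng.binomial(I, p_rem)
    return S - new_E, E + new_E - new_I, I + new_I - new_R, R + new_R, new_I

def transmission_rate(t, beta0, q, t_control):
    """Transmission rate held at beta0, then decaying exponentially after
    control interventions start at t_control (one simple time-varying form)."""
    return beta0 if t < t_control else beta0 * np.exp(-q * (t - t_control))

# illustrative run; parameter values are made up, not the paper's estimates
rng = np.random.default_rng(0)
N = 1_000_000
S, E, I, R = N - 3, 0, 3, 0
daily_incidence = []
for t in range(250):
    beta_t = transmission_rate(t, beta0=0.33, q=0.2, t_control=80)
    S, E, I, R, new_I = seir_step(S, E, I, R, beta_t, 1 / 5.3, 1 / 5.6, N, rng)
    daily_incidence.append(new_I)
```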


Journal ArticleDOI
TL;DR: A generalization of the Wilcoxon signed rank test, a frequently used nonparametric test for paired data based on independent units of analysis, is proposed that incorporates clustering; the resulting test statistic is shown to be asymptotically normal as the number of clusters becomes large, provided the cluster size is bounded.
Abstract: The Wilcoxon signed rank test is a frequently used nonparametric test for paired data (e.g., consisting of pre- and posttreatment measurements) based on independent units of analysis. This test cannot be used for paired comparisons arising from clustered data (e.g., if paired comparisons are available for each of two eyes of an individual). To incorporate clustering, a generalization of the randomization test formulation for the signed rank test is proposed, where the unit of randomization is at the cluster level (e.g., person), while the individual paired units of analysis are at the subunit within cluster level (e.g., eye within person). An adjusted variance estimate of the signed rank test statistic is then derived, which can be used for either balanced (same number of subunits per cluster) or unbalanced (different number of subunits per cluster) data, with an exchangeable correlation structure, with or without tied values. The resulting test statistic is shown to be asymptotically normal as the number of clusters becomes large, if the cluster size is bounded. Simulation studies are performed based on simulating correlated ranked data from a signed log-normal distribution. These studies indicate appropriate type I error for data sets with ≥20 clusters and a superior power profile compared with either the ordinary signed rank test based on the average cluster difference score or the multivariate signed rank test of Puri and Sen. Finally, the methods are illustrated with two data sets, (i) an ophthalmologic data set involving a comparison of electroretinogram (ERG) data in retinitis pigmentosa (RP) patients before and after undergoing an experimental surgical procedure, and (ii) a nutritional data set based on a randomized prospective study of nutritional supplements in RP patients where vitamin E intake outside of study capsules is compared before and after randomization to monitor compliance with nutritional protocols.

291 citations
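
The randomization-test formulation can be sketched directly: ranks are computed across all subunits, but signs are flipped cluster by cluster. The illustrative Python below returns a Monte Carlo randomization p-value rather than the paper's adjusted-variance normal approximation, and the names are ours.

```python
import numpy as np
from scipy.stats import rankdata

def clustered_signed_rank_test(diff, cluster, n_perm=9999, seed=0):
    """Randomization version of the signed rank test for clustered paired data:
    ranks are assigned to |differences| across all subunits, but signs are
    flipped for whole clusters at a time (the cluster is the randomization unit)."""
    rng = np.random.default_rng(seed)
    diff, cluster = np.asarray(diff, float), np.asarray(cluster)
    keep = diff != 0                                  # drop zero differences
    diff, cluster = diff[keep], cluster[keep]
    signed = np.sign(diff) * rankdata(np.abs(diff))   # midranks handle ties
    clusters = np.unique(cluster)
    cluster_sums = np.array([signed[cluster == c].sum() for c in clusters])
    t_obs = cluster_sums.sum()
    flips = rng.choice([-1.0, 1.0], size=(n_perm, len(clusters)))
    t_perm = flips @ cluster_sums
    p = (1 + np.sum(np.abs(t_perm) >= abs(t_obs))) / (n_perm + 1)
    return t_obs, p

# toy example: 15 clusters (e.g., persons) with 1-3 paired subunits (e.g., eyes)
rng = np.random.default_rng(3)
sizes = rng.integers(1, 4, size=15)
cluster = np.repeat(np.arange(15), sizes)
diff = rng.normal(0.4, 1.0, size=sizes.sum())
print(clustered_signed_rank_test(diff, cluster))
```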


Journal ArticleDOI
TL;DR: A mechanism‐based dynamic model is proposed for characterizing long‐term viral dynamics with antiretroviral therapy, described by a set of nonlinear differential equations without closed‐form solutions that directly incorporate drug concentration, adherence, and drug susceptibility into a function of treatment efficacy.
Abstract: HIV dynamics studies have significantly contributed to the understanding of HIV infection and antiviral treatment strategies. But most studies are limited to short-term viral dynamics due to the difficulty of establishing a relationship of antiviral response with multiple treatment factors such as drug exposure and drug susceptibility during long-term treatment. In this article, a mechanism-based dynamic model is proposed for characterizing long-term viral dynamics with antiretroviral therapy, described by a set of nonlinear differential equations without closed-form solutions. In this model we directly incorporate drug concentration, adherence, and drug susceptibility into a function of treatment efficacy, defined as an inhibition rate of virus replication. We investigate a Bayesian approach under the framework of hierarchical Bayesian (mixed-effects) models for estimating unknown dynamic parameters. In particular, interest focuses on estimating individual dynamic parameters. The proposed methods not only help to alleviate the difficulty in parameter identifiability, but also flexibly deal with sparse and unbalanced longitudinal data from individual subjects. For illustration purposes, we present one simulation example to implement the proposed approach and apply the methodology to a data set from an AIDS clinical trial. The basic concept of the longitudinal HIV dynamic systems and the proposed methodologies are generally applicable to any other biomedical dynamic systems.

251 citations
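
The paper's exact differential equations are not reproduced in the abstract, so the sketch below uses a generic target-cell-limited HIV model in which a time-varying efficacy term, built from drug concentration, adherence, and susceptibility (IC50), inhibits infection. All functional forms and parameter values here are assumptions for illustration, and the hierarchical Bayesian estimation step is omitted.

```python
import numpy as np
from scipy.integrate import solve_ivp

def efficacy(t, conc, adherence, ic50):
    """Treatment efficacy (inhibition rate of viral replication) built from drug
    concentration, adherence, and susceptibility (IC50), all time-varying."""
    return adherence * conc(t) / (conc(t) + ic50(t))

def hiv_dynamics(t, y, lam, d, k, delta, n_virions, c, conc, adherence, ic50):
    """Target-cell-limited model: uninfected cells T, infected cells Ti, virus V."""
    T, Ti, V = y
    e = efficacy(t, conc, adherence, ic50)
    dT = lam - d * T - (1 - e) * k * T * V
    dTi = (1 - e) * k * T * V - delta * Ti
    dV = n_virions * delta * Ti - c * V
    return [dT, dTi, dV]

# illustrative drug model and parameter values, not estimates from the trial
conc = lambda t: 2.0 + np.sin(2 * np.pi * t)       # within-day drug fluctuation
ic50 = lambda t: 0.5 * np.exp(0.001 * t)           # slowly emerging resistance
sol = solve_ivp(hiv_dynamics, (0, 300), [600.0, 10.0, 5e4],
                args=(36.0, 0.06, 2.4e-5, 0.3, 1000.0, 3.0, conc, 0.9, ic50),
                max_step=1.0, dense_output=True)
viral_load = sol.sol(np.linspace(0, 300, 100))[2]  # long-term viral trajectory
```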


Journal ArticleDOI
TL;DR: Disease-mapping models for areal data often have fixed effects to measure the effect of spatially varying covariates and random effects with a conditionally autoregressive (CAR) prior to account for spatial clustering, but adding the CAR random effects can cause large changes in the posterior mean and variance of fixed effects compared to the nonspatial regression model.
Abstract: Disease-mapping models for areal data often have fixed effects to measure the effect of spatially varying covariates and random effects with a conditionally autoregressive (CAR) prior to account for spatial clustering. In such spatial regressions, the objective may be to estimate the fixed effects while accounting for the spatial correlation. But adding the CAR random effects can cause large changes in the posterior mean and variance of fixed effects compared to the nonspatial regression model. This article explores the impact of adding spatial random effects on fixed effect estimates and posterior variance. Diagnostics are proposed to measure posterior variance inflation from collinearity between the fixed effect covariates and the CAR random effects and to measure each region's influence on the change in the fixed effect's estimates by adding the CAR random effects. A new model that alleviates the collinearity between the fixed effect covariates and the CAR random effects is developed and extensions of these methods to point-referenced data models are discussed.

249 citations


Journal ArticleDOI
TL;DR: Simulation studies suggest that AUC-based classification scores have performance comparable with logistic likelihood-based scores when the logistic regression model holds, and model fitting by maximizing the AUC should be considered when the goal is to derive a marker combination score for classification or prediction.
Abstract: No single biomarker for cancer is considered adequately sensitive and specific for cancer screening. It is expected that the results of multiple markers will need to be combined in order to yield adequately accurate classification. Typically, the objective function that is optimized for combining markers is the likelihood function. In this article, we consider an alternative objective function: the area under the empirical receiver operating characteristic curve (AUC). We note that it yields consistent estimates of parameters in a generalized linear model for the risk score but does not require specifying the link function. Like logistic regression, it yields consistent estimation with case-control or cohort data. Simulation studies suggest that AUC-based classification scores have performance comparable with logistic likelihood-based scores when the logistic regression model holds. Analysis of data from a proteomics biomarker study shows that performance can be far superior to logistic regression derived scores when the logistic regression model does not hold. Model fitting by maximizing the AUC rather than the likelihood should be considered when the goal is to derive a marker combination score for classification or prediction.

247 citations
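
A common way to maximize the empirical AUC over linear combinations is to smooth the indicator with a sigmoid and hand the problem to a generic optimizer, fixing one coefficient because the AUC is scale invariant. The sketch below takes that route as an illustration of the general idea; it is not the authors' specific algorithm.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def empirical_auc(s_case, s_control):
    """Empirical AUC: fraction of (case, control) pairs with the case scored higher."""
    d = s_case[:, None] - s_control[None, :]
    return np.mean(d > 0) + 0.5 * np.mean(d == 0)

def fit_auc_combination(X_case, X_control, h=0.05):
    """Linear marker combination chosen by maximizing a sigmoid-smoothed
    empirical AUC; the first weight is fixed at 1 because the AUC is invariant
    to rescaling of the score."""
    def neg_smooth_auc(b):
        beta = np.r_[1.0, b]
        s1, s0 = X_case @ beta, X_control @ beta
        return -np.mean(expit((s1[:, None] - s0[None, :]) / h))
    res = minimize(neg_smooth_auc, x0=np.zeros(X_case.shape[1] - 1),
                   method="Nelder-Mead")
    return np.r_[1.0, res.x]

# toy data: two markers, cases shifted upward on both
rng = np.random.default_rng(4)
X_control = rng.normal(size=(200, 2))
X_case = rng.normal(loc=[0.8, 0.4], size=(150, 2))
beta = fit_auc_combination(X_case, X_control)
print(beta, empirical_auc(X_case @ beta, X_control @ beta))
```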



Journal ArticleDOI
TL;DR: A pairwise approach is proposed in which all possible bivariate models are fitted and inference follows from pseudo-likelihood arguments; it is applicable to linear, generalized linear, and nonlinear mixed models, or combinations of these.
Abstract: A mixed model is a flexible tool for joint modeling purposes, especially when the gathered data are unbalanced. However, computational problems due to the dimension of the joint covariance matrix of the random effects arise as soon as the number of outcomes and/or the number of used random effects per outcome increases. We propose a pairwise approach in which all possible bivariate models are fitted, and where inference follows from pseudo-likelihood arguments. The approach is applicable for linear, generalized linear, and nonlinear mixed models, or for combinations of these. The methodology will be illustrated for linear mixed models in the analysis of 22-dimensional, highly unbalanced, longitudinal profiles of hearing thresholds.

Journal ArticleDOI
TL;DR: Insight is provided into the robustness of the MLEs against departure from the normal random effects assumption, and bootstrap procedures are suggested to overcome the difficulty of obtaining reliable estimates of their standard errors.
Abstract: The maximum likelihood approach to jointly model the survival time and its longitudinal covariates has been successful to model both processes in longitudinal studies. Random effects in the longitudinal process are often used to model the survival times through a proportional hazards model, and this invokes an EM algorithm to search for the maximum likelihood estimates (MLEs). Several intriguing issues are examined here, including the robustness of the MLEs against departure from the normal random effects assumption, and difficulties with the profile likelihood approach to provide reliable estimates for the standard error of the MLEs. We provide insights into the robustness property and suggest to overcome the difficulty of reliable estimates for the standard errors by using bootstrap procedures. Numerical studies and data analysis illustrate our points.

Journal ArticleDOI
TL;DR: The generalized additive model boosting method is shown to be a strong competitor to common procedures for fitting generalized additive models; in high-dimensional settings with many nuisance predictor variables it performs very well.
Abstract: The use of generalized additive models in statistical data analysis suffers from the restriction to few explanatory variables and the problems of selection of smoothing parameters. Generalized additive model boosting circumvents these problems by means of stagewise fitting of weak learners. A fitting procedure is derived which works for all simple exponential family distributions, including binomial, Poisson, and normal response variables. The procedure combines the selection of variables and the determination of the appropriate amount of smoothing. Penalized regression splines and the newly introduced penalized stumps are considered as weak learners. Estimates of standard deviations and stopping criteria, which are notorious problems in iterative procedures, are based on an approximate hat matrix. The method is shown to be a strong competitor to common procedures for the fitting of generalized additive models. In particular, in high-dimensional settings with many nuisance predictor variables it performs very well.
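
The stagewise idea can be sketched with componentwise L2 boosting using plain regression stumps. The paper's weak learners are penalized stumps and penalized regression splines, it covers general exponential-family responses, and it uses an approximate hat matrix for stopping; those elements are omitted from this illustrative Python, whose names are ours.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split regression stump for residuals r on covariate x."""
    best = (np.inf, x.min(), r.mean(), r.mean())
    for cut in np.unique(x)[:-1]:
        left = x <= cut
        ml, mr = r[left].mean(), r[~left].mean()
        sse = ((r[left] - ml) ** 2).sum() + ((r[~left] - mr) ** 2).sum()
        if sse < best[0]:
            best = (sse, cut, ml, mr)
    return best  # (sse, cut, left mean, right mean)

def gam_boost(X, y, n_iter=200, nu=0.1):
    """Componentwise boosting for an additive model with squared-error loss:
    each iteration fits a stump to the current residuals for every covariate,
    keeps only the best one, and takes a small step of size nu.  Variable
    selection is implicit, since unhelpful covariates are rarely chosen."""
    n, p = X.shape
    f = np.full(n, y.mean())
    ensemble = []
    for _ in range(n_iter):
        r = y - f
        fits = [fit_stump(X[:, j], r) for j in range(p)]
        j = int(np.argmin([fit[0] for fit in fits]))
        _, cut, ml, mr = fits[j]
        f += nu * np.where(X[:, j] <= cut, ml, mr)
        ensemble.append((j, cut, nu * ml, nu * mr))
    return y.mean(), ensemble

# toy example: only the first 2 of 10 covariates matter
rng = np.random.default_rng(5)
X = rng.uniform(size=(300, 10))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.2, size=300)
offset, ensemble = gam_boost(X, y)
print(np.bincount([j for j, *_ in ensemble], minlength=10))  # selection counts
```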

Journal ArticleDOI
TL;DR: A multivariate mixed effects model is presented to explicitly capture two different sources of dependence among longitudinal measures over time as well as dependence between different variables in cancer and AIDS clinical trials.
Abstract: Joint modeling of longitudinal and survival data is becoming increasingly essential in most cancer and AIDS clinical trials. We propose a likelihood approach to extend both longitudinal and survival components to be multidimensional. A multivariate mixed effects model is presented to explicitly capture two different sources of dependence among longitudinal measures over time as well as dependence between different variables. For the survival component of the joint model, we introduce a shared frailty, which is assumed to have a positive stable distribution, to induce correlation between failure times. The proposed marginal univariate survival model, which accommodates both zero and nonzero cure fractions for the time to event, is then applied to each marginal survival function. The proposed multivariate survival model has a proportional hazards structure for the population hazard, conditionally as well as marginally, when the baseline covariates are specified through a specific mechanism. In addition, the model is capable of dealing with survival functions with different cure rate structures. The methodology is specifically applied to the International Breast Cancer Study Group (IBCSG) trial to investigate the relationship between quality of life, disease-free survival, and overall survival.

Journal ArticleDOI
TL;DR: A unified and efficient nonparametric hypothesis testing procedure that can easily take into account correlation within subjects and deal directly with both continuous and discrete response longitudinal data under the framework of generalized linear models is proposed.
Abstract: Nonparametric smoothing methods are used to model longitudinal data, but the challenge remains to incorporate correlation into nonparametric estimation procedures. In this article, we propose an efficient estimation procedure for varying-coefficient models for longitudinal data. The proposed procedure can easily take into account correlation within subjects and deal directly with both continuous and discrete response longitudinal data under the framework of generalized linear models. The proposed approach yields a more efficient estimator than the generalized estimation equation approach when the working correlation is misspecified. For varying-coefficient models, it is often of interest to test whether coefficient functions are time varying or time invariant. We propose a unified and efficient nonparametric hypothesis testing procedure, and further demonstrate that the resulting test statistics have an asymptotic chi-squared distribution. In addition, the goodness-of-fit test is applied to test whether the model assumption is satisfied. The corresponding test is also useful for choosing basis functions and the number of knots for regression spline models in conjunction with the model selection criterion. We evaluate the finite sample performance of the proposed procedures with Monte Carlo simulation studies. The proposed methodology is illustrated by the analysis of an acquired immune deficiency syndrome (AIDS) data set.

Journal ArticleDOI
TL;DR: A new general approach for handling misclassification in discrete covariates or responses in regression models is developed; it is applicable to models with misclassified responses and/or misclassified discrete regressors and is applied to a study on caries with a misclassified longitudinal response.
Abstract: We have developed a new general approach for handling misclassification in discrete covariates or responses in regression models. The simulation and extrapolation (SIMEX) method, which was originally designed for handling additive covariate measurement error, is applied to the case of misclassification. The statistical model for characterizing misclassification is given by the transition matrix Pi from the true to the observed variable. We exploit the relationship between the size of misclassification and bias in estimating the parameters of interest. Assuming that Pi is known or can be estimated from validation data, we simulate data with higher misclassification and extrapolate back to the case of no misclassification. We show that our method is quite general and applicable to models with misclassified response and/or misclassified discrete regressors. In the case of a binary response with misclassification, we compare our method to the approach of Neuhaus, and to the matrix method of Morrissey and Spiegelman in the case of a misclassified binary regressor. We apply our method to a study on caries with a misclassified longitudinal response.
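
A minimal MC-SIMEX-style sketch for a misclassified binary regressor in logistic regression: extra misclassification is added via fractional powers of the transition matrix Pi, the model is refit at each level, and the coefficient is extrapolated back to the no-misclassification point (lambda = -1). The quadratic extrapolant, the use of statsmodels, and the function name are our illustrative choices, not the authors' implementation.

```python
import numpy as np
import statsmodels.api as sm
from scipy.linalg import fractional_matrix_power

def mc_simex_logit(y, w, X_other, Pi, lambdas=(0.5, 1.0, 1.5, 2.0), B=50, seed=0):
    """MC-SIMEX sketch for a misclassified binary regressor in logistic regression.
    Pi is the (known) 2 x 2 misclassification matrix.  Extra misclassification
    Pi**lambda is simulated, the model refit, and the average coefficient of w
    extrapolated back to lambda = -1 (no misclassification)."""
    rng = np.random.default_rng(seed)
    def coef(w_cur):
        X = sm.add_constant(np.column_stack([w_cur, X_other]))
        return sm.Logit(y, X).fit(disp=0).params[1]      # coefficient of w
    lam_grid, coefs = [0.0], [coef(w)]                   # naive fit at lambda = 0
    for lam in lambdas:
        Pi_lam = np.real(fractional_matrix_power(Pi, lam))
        p_obs1 = np.clip(Pi_lam[w, 1], 0.0, 1.0)         # P(observe 1 | current w)
        reps = [coef(rng.binomial(1, p_obs1)) for _ in range(B)]
        lam_grid.append(lam)
        coefs.append(np.mean(reps))
    quad = np.polyfit(lam_grid, coefs, 2)                # quadratic extrapolant
    return np.polyval(quad, -1.0)                        # corrected estimate
```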

Journal ArticleDOI
TL;DR: In this article, occurrence probability models that allow for heterogeneous detection probabilities by considering several common classes of mixture distributions for p are developed.
Abstract: Summary Models for estimating the probability of occurrence of a species in the presence of imperfect detection are important in many ecological disciplines. In these “site occupancy” models, the possibility of heterogeneity in detection probabilities among sites must be considered because variation in abundance (and other factors) among sampled sites induces variation in detection probability (p). In this article, I develop occurrence probability models that allow for heterogeneous detection probabilities by considering several common classes of mixture distributions for p. For any mixing distribution, the likelihood has the general form of a zero-inflated binomial mixture for which inference based upon integrated likelihood is straightforward. A recent paper by Link (2003, Biometrics 59, 1123–1130) demonstrates that in closed population models used for estimating population size, different classes of mixture distributions are indistinguishable from data, yet can produce very different inferences about population size. I demonstrate that this problem can also arise in models for estimating site occupancy in the presence of heterogeneous detection probabilities. The implications of this are discussed in the context of an application to avian survey data and the development of animal monitoring programs.
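
For one common choice of mixing distribution, a Beta distribution for p, the zero-inflated binomial mixture likelihood is easy to write down and maximize; the sketch below does exactly that. The paper considers several mixture classes and shows they can be hard to distinguish from data, which this toy fit does not address, and the function name is ours.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import betabinom

def fit_occupancy_betabinom(y, n_visits):
    """Zero-inflated binomial mixture for site occupancy: a site is occupied
    with probability psi; if occupied, its detection probability is Beta(a, b),
    so the number of detections y[i] out of n_visits is beta-binomial.
    Unoccupied sites contribute a structural zero."""
    y = np.asarray(y)
    def negloglik(theta):
        psi = expit(theta[0])
        a, b = np.exp(theta[1]), np.exp(theta[2])
        lik = psi * betabinom.pmf(y, n_visits, a, b) + (1.0 - psi) * (y == 0)
        return -np.sum(np.log(lik + 1e-300))
    res = minimize(negloglik, x0=np.zeros(3), method="Nelder-Mead")
    return expit(res.x[0]), np.exp(res.x[1]), np.exp(res.x[2])   # psi, a, b

# toy data: 200 sites, 5 visits each, heterogeneous detection probabilities
rng = np.random.default_rng(6)
occupied = rng.random(200) < 0.6
p = rng.beta(2.0, 3.0, size=200)
y = np.where(occupied, rng.binomial(5, p), 0)
print(fit_occupancy_betabinom(y, n_visits=5))
```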

Journal ArticleDOI
TL;DR: Two regularization approaches are considered, the LASSO and the threshold-gradient-directed regularization, for estimation and variable selection in the accelerated failure time model with multiple covariates based on Stute's weighted least squares method.
Abstract: We consider two regularization approaches, the LASSO and the threshold-gradient-directed regularization, for estimation and variable selection in the accelerated failure time model with multiple covariates based on Stute's weighted least squares method. The Stute estimator uses Kaplan-Meier weights to account for censoring in the least squares criterion. The weighted least squares objective function makes the adaptation of this approach to multiple covariate settings computationally feasible. We use V-fold cross-validation and a modified Akaike's Information Criterion for tuning parameter selection, and a bootstrap approach for variance estimation. The proposed method is evaluated using simulations and demonstrated on a real data example.
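
A sketch of the weighted least squares idea: Kaplan-Meier (Stute) weights are computed from the censored times and passed to an off-the-shelf LASSO on log survival time. The penalty level is fixed here, whereas the paper selects it by V-fold cross-validation or a modified AIC and adds bootstrap variance estimation; the helper names are ours and a reasonably recent scikit-learn is assumed for the sample_weight argument.

```python
import numpy as np
from sklearn.linear_model import Lasso

def stute_km_weights(time, delta):
    """Kaplan-Meier (Stute) weights: censored observations get weight 0 and the
    KM probability mass is attached to the uncensored ones."""
    n = len(time)
    order = np.argsort(time, kind="stable")
    d = np.asarray(delta, float)[order]
    w_sorted = np.zeros(n)
    surv = 1.0
    for i in range(n):                      # i = 0-based rank within sorted times
        w_sorted[i] = d[i] * surv / (n - i)
        surv *= ((n - i - 1) / (n - i)) ** d[i]
    w = np.zeros(n)
    w[order] = w_sorted                     # back to the original ordering
    return w

def aft_lasso(X, time, delta, alpha=0.05):
    """LASSO for the accelerated failure time model via Stute's weighted least
    squares: regress log(time) on covariates with Kaplan-Meier weights."""
    w = stute_km_weights(time, delta)
    model = Lasso(alpha=alpha)              # alpha fixed here; the paper tunes it
    model.fit(X, np.log(time), sample_weight=w)
    return model

# toy censored data with 10 covariates, only the first two informative
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 10))
t_true = np.exp(1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200))
cens = rng.exponential(scale=4.5, size=200)
time, delta = np.minimum(t_true, cens), (t_true <= cens).astype(int)
print(aft_lasso(X, time, delta).coef_)
```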

Journal ArticleDOI
TL;DR: Results from simulation studies indicate that the MOM model is best at controlling false discoveries, without sacrificing power, and is the only one capable of finding two genome regions previously shown to be involved in diabetes.
Abstract: Traditional genetic mapping has largely focused on the identification of loci affecting one, or at most a few, complex traits. Microarrays allow for measurement of thousands of gene expression abundances, themselves complex traits, and a number of recent investigations have considered these measurements as phenotypes in mapping studies. Combining traditional quantitative trait loci (QTL) mapping methods with microarray data is a powerful approach with demonstrated utility in a number of recent biological investigations. These expression quantitative trait loci (eQTL) studies are similar to traditional QTL studies, as a main goal is to identify the genomic locations to which the expression traits are linked. However, eQTL studies probe thousands of expression transcripts; and as a result, standard multi-trait QTL mapping methods, designed to handle at most tens of traits, do not directly apply. One possible approach is to use single-trait QTL mapping methods to analyze each transcript separately. This leads to an increased number of false discoveries, as corrections for multiple tests across transcripts are not made. Similarly, the repeated application, at each marker, of methods for identifying differentially expressed transcripts suffers from multiple tests across markers. Here, we demonstrate the deficiencies of these approaches and propose a mixture over markers (MOM) model that shares information across both markers and transcripts. The utility of all methods is evaluated using simulated data as well as data from an F(2) mouse cross in a study of diabetes. Results from simulation studies indicate that the MOM model is best at controlling false discoveries, without sacrificing power. The MOM model is also the only one capable of finding two genome regions previously shown to be involved in diabetes.

Journal ArticleDOI
TL;DR: Under various scenarios, the new Bayesian design based on the toxicity–efficacy odds ratio trade‐offs exhibits good properties and treats most patients at the desirable dose levels.
Abstract: A Bayesian adaptive design is proposed for dose-finding in phase I/II clinical trials to incorporate the bivariate outcomes, toxicity and efficacy, of a new treatment. Without specifying any parametric functional form for the drug dose-response curve, we jointly model the bivariate binary data to account for the correlation between toxicity and efficacy. After observing all the responses of each cohort of patients, the dosage for the next cohort is escalated, deescalated, or unchanged according to the proposed odds ratio criteria constructed from the posterior toxicity and efficacy probabilities. A novel class of prior distributions is proposed through logit transformations which implicitly imposes a monotonic constraint on dose toxicity probabilities and correlates the probabilities of the bivariate outcomes. We conduct simulation studies to evaluate the operating characteristics of the proposed method. Under various scenarios, the new Bayesian design based on the toxicity-efficacy odds ratio trade-offs exhibits good properties and treats most patients at the desirable dose levels. The method is illustrated with a real trial design for a breast medical oncology study.

Journal ArticleDOI
TL;DR: A mark-recapture-based model is developed that uses the observed distribution to relax the assumption of zero correlation between detection probabilities implicit in the mark-recapture model and demonstrates its usefulness in coping with unmodeled heterogeneity using data from an aerial survey of crabeater seals in the Antarctic.
Abstract: Mark-recapture models applied to double-observer distance sampling data neglect the information on relative detectability of objects contained in the distribution of observed distances. A difference between the observed distribution and that predicted by the mark-recapture model is symptomatic of a failure of the assumption of zero correlation between detection probabilities implicit in the mark-recapture model. We develop a mark-recapture-based model that uses the observed distribution to relax this assumption to zero correlation at only one distance. We demonstrate its usefulness in coping with unmodeled heterogeneity using data from an aerial survey of crabeater seals in the Antarctic.

Journal ArticleDOI
TL;DR: This work calls this parameter the percent change annualized (PCA) and proposes two new estimators of it: a two-point estimator that uses only the first and last rates, and an adaptive estimator that equals the linear model estimator with high probability when the rates are not significantly different from linear on the log scale, but includes fewer points if there are significant departures from that linearity.
Abstract: The annual percent change (APC) is often used to measure trends in disease and mortality rates, and a common estimator of this parameter uses a linear model on the log of the age-standardized rates. Under the assumption of linearity on the log scale, which is equivalent to a constant change assumption, APC can be equivalently defined in three ways as transformations of either (1) the slope of the line that runs through the log of each rate, (2) the ratio of the last rate to the first rate in the series, or (3) the geometric mean of the proportional changes in the rates over the series. When the constant change assumption fails then the first definition cannot be applied as is, while the second and third definitions unambiguously define the same parameter regardless of whether the assumption holds. We call this parameter the percent change annualized (PCA) and propose two new estimators of it. The first, the two-point estimator, uses only the first and last rates, assuming nothing about the rates in between. This estimator requires fewer assumptions and is asymptotically unbiased as the size of the population gets large, but has more variability since it uses no information from the middle rates. The second estimator is an adaptive one and equals the linear model estimator with a high probability when the rates are not significantly different from linear on the log scale, but includes fewer points if there are significant departures from that linearity. For the two-point estimator we can use confidence intervals previously developed for ratios of directly standardized rates. For the adaptive estimator, we show through simulation that the bootstrap confidence intervals give appropriate coverage.
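
The two definitions are easy to compare numerically. The sketch below computes the log-linear APC estimator and the two-point PCA estimator on a short hypothetical rate series; the adaptive estimator and the confidence intervals are not shown.

```python
import numpy as np

def apc_loglinear(rates):
    """Annual percent change from the slope of a straight line fit to log(rate)."""
    t = np.arange(len(rates))
    slope = np.polyfit(t, np.log(rates), 1)[0]
    return 100 * (np.exp(slope) - 1)

def pca_two_point(rates):
    """Percent change annualized from only the first and last rates:
    the geometric-mean annual change over the whole series."""
    n = len(rates)
    return 100 * ((rates[-1] / rates[0]) ** (1 / (n - 1)) - 1)

rates = np.array([50.0, 48.1, 47.5, 44.0, 41.2, 40.8])   # hypothetical annual rates
print(apc_loglinear(rates), pca_two_point(rates))
```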

Journal ArticleDOI
TL;DR: Methods for use in vaccine clinical trials to help determine whether the immune response to a vaccine is actually causing a reduction in the infection rate are introduced and may help elucidate the role of immune response in preventing infections.
Abstract: This article introduces methods for use in vaccine clinical trials to help determine whether the immune response to a vaccine is actually causing a reduction in the infection rate. This is not easy because immune response to the (say HIV) vaccine is only observed in the HIV vaccine arm. If we knew what the HIV-specific immune response in placebo recipients would have been, had they been vaccinated, this immune response could be treated essentially like a baseline covariate and an interaction with treatment could be evaluated. Relatedly, the rate of infection by this baseline covariate could be compared between the two groups and a causative role of immune response would be supported if infection risk decreased with increasing HIV immune response only in the vaccine group. We introduce two methods for inferring this HIV-specific immune response. The first involves vaccinating everyone before baseline with an irrelevant vaccine, for example, rabies. Randomization ensures that the relationship between the immune responses to the rabies and HIV vaccines observed in the vaccine group is the same as what would have been seen in the placebo group. We infer a placebo volunteer’s response to the HIV vaccine using their rabies response and a prediction model from the vaccine group. The second method entails vaccinating all uninfected placebo patients at the closeout of the trial with the HIV vaccine and recording immune response. We pretend this immune response at closeout is what they would have had at baseline. We can then infer what the distribution of immune response among placebo infecteds would have been. Such designs may help elucidate the role of immune response in preventing infections. More pointedly, they could be helpful in the decision to improve or abandon an HIV vaccine with mediocre performance in a phase III trial.

Journal ArticleDOI
TL;DR: In this article, motivated by a space-time study on forest health with damage state of trees as the response, a general class of structured additive regression models for categorical responses is proposed, allowing for a flexible semiparametric predictor.
Abstract: Motivated by a space-time study on forest health with damage state of trees as the response, we propose a general class of structured additive regression models for categorical responses, allowing for a flexible semiparametric predictor. Nonlinear effects of continuous covariates, time trends, and interactions between continuous covariates are modeled by penalized splines. Spatial effects can be estimated based on Markov random fields, Gaussian random fields, or two-dimensional penalized splines. We present our approach from a Bayesian perspective, with inference based on a categorical linear mixed model representation. The resulting empirical Bayes method is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to inverse smoothing parameters, are estimated using (approximate) restricted maximum likelihood. In simulation studies we investigate the performance of different choices for the spatial effect, compare the empirical Bayes approach to competing methodology, and study the bias of mixed model estimates. As an application we analyze data from the forest health survey.

Journal ArticleDOI
TL;DR: A class of three-level response surface designs is introduced which allows all except the quadratic parameters to be estimated orthogonally, as well as having a number of other useful properties.
Abstract: Many processes in the biological industries are studied using response surface methodology. The use of biological materials, however, means that run-to-run variation is typically much greater than that in many experiments in mechanical or chemical engineering and so the designs used require greater replication. The data analysis which is performed may involve some variable selection, as well as fitting polynomial response surface models. This implies that designs should allow the parameters of the model to be estimated nearly orthogonally. A class of three-level response surface designs is introduced which allows all except the quadratic parameters to be estimated orthogonally, as well as having a number of other useful properties. These subset designs are obtained by using two-level factorial designs in subsets of the factors, with the other factors being held at their middle level. This allows their properties to be easily explored. Replacing some of the two-level designs with fractional replicates broadens the class of useful designs, especially with five or more factors, and sometimes incomplete subsets can be used. It is very simple to include a few two- and four-level factors in these designs by excluding subsets with these factors at the middle level. Subset designs can be easily modified to include factors with five or more levels by allowing a different pair of levels to be used in different subsets.
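
The basic construction is simple to reproduce: for each subset of factors of a chosen size, run a two-level full factorial in those factors with the remaining factors held at their middle level. The sketch below generates such a design; fractional replicates, incomplete subsets, center points, and the mixed-level extensions described in the abstract are not implemented, and the function name is ours.

```python
import numpy as np
from itertools import combinations, product

def subset_design(n_factors, subset_size):
    """Three-level subset design: for every subset of subset_size factors, a
    two-level full factorial (levels -1, +1) is run in those factors while the
    remaining factors stay at their middle level 0."""
    runs = []
    for subset in combinations(range(n_factors), subset_size):
        for levels in product([-1, 1], repeat=subset_size):
            run = [0] * n_factors
            for factor, level in zip(subset, levels):
                run[factor] = level
            runs.append(run)
    return np.array(runs)

design = subset_design(n_factors=4, subset_size=2)   # 6 subsets x 4 runs = 24 runs
print(design.shape)
```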

Journal ArticleDOI
TL;DR: An explicit asymptotic method is provided to evaluate the performance of different response-adaptive randomization procedures in clinical trials with continuous outcomes; the analysis concludes that the doubly adaptive biased coin design procedure targeting optimal allocation is the best one for practical use.
Abstract: We provide an explicit asymptotic method to evaluate the performance of different response-adaptive randomization procedures in clinical trials with continuous outcomes. We use this method to investigate four different response-adaptive randomization procedures. Their performance, especially in power and treatment assignment skewing to the better treatment, is thoroughly evaluated theoretically. These results are then verified by simulation. Our analysis concludes that the doubly adaptive biased coin design procedure targeting optimal allocation is the best one for practical use. We also consider the effect of delay in responses and nonstandard responses, for example, Cauchy distributed response. We illustrate our procedure by redesigning a real clinical trial.
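
A sketch of how a doubly adaptive biased coin design targeting Neyman allocation might proceed for continuous outcomes, using a Hu-Zhang-type allocation function. The allocation function, the burn-in, and the response distributions are illustrative assumptions, and delayed responses (which the paper also considers) are ignored.

```python
import numpy as np

def dbcd_probability(x, rho, gamma=2.0):
    """Doubly adaptive biased coin allocation probability for treatment A:
    x = current proportion assigned to A, rho = estimated target allocation,
    gamma controls how aggressively imbalance is corrected."""
    if x in (0.0, 1.0):
        return rho
    num = rho * (rho / x) ** gamma
    den = num + (1 - rho) * ((1 - rho) / (1 - x)) ** gamma
    return num / den

def neyman_target(y_a, y_b):
    """Target allocation to A under Neyman allocation for continuous outcomes:
    proportional to the estimated standard deviations."""
    sa, sb = np.std(y_a, ddof=1), np.std(y_b, ddof=1)
    return sa / (sa + sb)

# illustrative trial: arm B is more variable, so it should receive more patients
rng = np.random.default_rng(8)
y_a, y_b = list(rng.normal(0.0, 1.0, 5)), list(rng.normal(0.3, 2.0, 5))  # burn-in
for _ in range(190):
    rho = neyman_target(y_a, y_b)
    x = len(y_a) / (len(y_a) + len(y_b))
    if rng.random() < dbcd_probability(x, rho):
        y_a.append(rng.normal(0.0, 1.0))    # response on treatment A
    else:
        y_b.append(rng.normal(0.3, 2.0))    # response on treatment B
print(len(y_a), len(y_b))
```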

Journal ArticleDOI
TL;DR: More general versions of the focused information criterion (FIC) for variable selection in logistic regression are proposed, allowing other risk measures such as the one based on Lp error.
Abstract: Summary In biostatistical practice, it is common to use information criteria as a guide for model selection. We propose new versions of the focused information criterion (FIC) for variable selection in logistic regression. The FIC gives, depending on the quantity to be estimated, possibly different sets of selected variables. The standard version of the FIC measures the mean squared error of the estimator of the quantity of interest in the selected model. In this article, we propose more general versions of the FIC, allowing other risk measures such as the one based on Lp error. When prediction of an event is important, as is often the case in medical applications, we construct an FIC using the error rate as a natural risk measure. The advantages of using an information criterion which depends on both the quantity of interest and the selected risk measure are illustrated by means of a simulation study and application to a study on diabetic retinopathy.

Journal ArticleDOI
TL;DR: A robust Bayesian hierarchical model is developed that can be used for testing for differentially expressed genes among multiple samples, and it can distinguish between the different possible patterns of differential expression when there are three or more samples.
Abstract: Summary. We consider the problem of identifying differentially expressed genes under different conditions using gene expression microarrays. Because of the many steps involved in the experimental process, from hybridization to image analysis, cDNA microarray data often contain outliers. For example, an outlying data value could occur because of scratches or dust on the surface, imperfections in the glass, or imperfections in the array production. We develop a robust Bayesian hierarchical model for testing for differential expression. Errors are modeled explicitly using a t-distribution, which accounts for outliers. The model includes an exchangeable prior for the variances, which allows different variances for the genes but still shrinks extreme empirical variances. Our model can be used for testing for differentially expressed genes among multiple samples, and it can distinguish between the different possible patterns of differential expression when there are three or more samples. Parameter estimation is carried out using a novel version of Markov chain Monte Carlo that is appropriate when the model puts mass on subspaces of the full parameter space. The method is illustrated using two publicly available gene expression data sets. We compare our method to six other baseline and commonly used techniques, namely the t-test, the Bonferroni-adjusted t-test, significance analysis of microarrays (SAM), Efron’s empirical Bayes, and EBarrays in both its lognormal–normal and gamma–gamma forms. In an experiment with HIV data, our method performed better than these alternatives, on the basis of between-replicate agreement and disagreement.

Journal ArticleDOI
TL;DR: A model is proposed to describe the evolution in continuous time of unobserved cognition in the elderly and assess the impact of covariates directly on it, using data from PAQUID, a French prospective cohort study of ageing.
Abstract: Cognition is not directly measurable. It is assessed using psychometric tests, which can be viewed as quantitative measures of cognition with error. The aim of this article is to propose a model to describe the evolution in continuous time of unobserved cognition in the elderly and assess the impact of covariates directly on it. The latent cognitive process is defined using a linear mixed model including a Brownian motion and time-dependent covariates. The observed psychometric tests are considered as the results of parameterized nonlinear transformations of the latent cognitive process at discrete occasions. Estimation of the parameters contained both in the transformations and in the linear mixed model is achieved by maximizing the observed likelihood and graphical methods are performed to assess the goodness of fit of the model. The method is applied to data from PAQUID, a French prospective cohort study of ageing.