
Showing papers in "Biometrics in 2006"


Journal ArticleDOI
TL;DR: In this paper, distance-based tests of homogeneity of multivariate dispersions, which can be based on any dissimilarity measure of choice, are proposed, relying on the rotational invariance of either the multivariate centroid or the spatial median to obtain measures of spread using principal coordinate axes.
Abstract: The traditional likelihood-based test for differences in multivariate dispersions is known to be sensitive to nonnormality. It is also impossible to use when the number of variables exceeds the number of observations. Many biological and ecological data sets have many variables, are highly skewed, and are zero-inflated. The traditional test and even some more robust alternatives are also unreasonable in many contexts where measures of dispersion based on a non-Euclidean dissimilarity would be more appropriate. Distance-based tests of homogeneity of multivariate dispersions, which can be based on any dissimilarity measure of choice, are proposed here. They rely on the rotational invariance of either the multivariate centroid or the spatial median to obtain measures of spread using principal coordinate axes. The tests are straightforward multivariate extensions of Levene's test, with P-values obtained either using the traditional F-distribution or using permutation of either least-squares or LAD residuals. Examples illustrate the utility of the approach, including the analysis of stabilizing selection in sparrows, biodiversity of New Zealand fish assemblages, and the response of Indonesian reef corals to an El Nino. Monte Carlo simulations from the real data sets show that the distance-based tests are robust and powerful for relevant alternative hypotheses of real differences in spread.

2,255 citations
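
The centroid version of this procedure is simple enough to sketch. The illustrative Python below (function names are ours) computes each observation's Euclidean distance to its group centroid and applies a Levene-style F test to those distances, with a p-value from permuting the distances; the paper's more general method works in the principal coordinate space of an arbitrary dissimilarity, can use spatial medians, and permutes least-squares or LAD residuals rather than the raw distances.

```python
import numpy as np

def dispersion_f(dists, groups):
    """One-way ANOVA (Levene-style) F statistic on distances to group centroids."""
    levels = np.unique(groups)
    grand = dists.mean()
    ssb = sum((groups == g).sum() * (dists[groups == g].mean() - grand) ** 2
              for g in levels)
    ssw = sum(((dists[groups == g] - dists[groups == g].mean()) ** 2).sum()
              for g in levels)
    return (ssb / (len(levels) - 1)) / (ssw / (len(dists) - len(levels)))

def dispersion_test(X, groups, n_perm=999, seed=0):
    """Test of homogeneity of multivariate dispersions: distance of each point
    to its group centroid, F test on those distances, permutation p-value."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    dists = np.empty(len(X))
    for g in np.unique(groups):
        idx = groups == g
        dists[idx] = np.linalg.norm(X[idx] - X[idx].mean(axis=0), axis=1)
    f_obs = dispersion_f(dists, groups)
    f_perm = [dispersion_f(rng.permutation(dists), groups) for _ in range(n_perm)]
    p = (1 + sum(f >= f_obs for f in f_perm)) / (n_perm + 1)
    return f_obs, p

# toy example: group B has twice the spread of group A
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(0, 2, (30, 4))])
print(dispersion_test(X, np.repeat(["A", "B"], 30)))
```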


Journal ArticleDOI

806 citations


Journal ArticleDOI

559 citations


Journal ArticleDOI
TL;DR: This work provides a new probabilistic derivation for any incidence-based index that is symmetric and homogeneous and proposes estimators that adjust for the effect of unseen shared species on the authors' abundance-based indices.
Abstract: A wide variety of similarity indices for comparing two assemblages based on species incidence (i.e., presence/absence) data have been proposed in the literature. These indices are generally based on three simple incidence counts: the number of species shared by two assemblages and the number of species unique to each of them. We provide a new probabilistic derivation for any incidence-based index that is symmetric (i.e., the index is not affected by the identity ordering of the two assemblages) and homogeneous (i.e., the index is unchanged if all counts are multiplied by a constant). The probabilistic approach is further extended to formulate abundance-based indices. Thus any symmetric and homogeneous incidence index can be easily modified to an abundance-type version. Applying the Laplace approximation formulas, we propose estimators that adjust for the effect of unseen shared species on our abundance-based indices. Simulation results show that the adjusted estimators significantly reduce the biases of the corresponding unadjusted ones when a substantial fraction of species is missing from samples. Data on successional vegetation in six tropical forests are used for illustration. Advantages and disadvantages of some commonly applied indices are briefly discussed.

550 citations
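
As a concrete illustration of the incidence counts involved, here is a minimal Python sketch of the classic Sørensen incidence index and a simple abundance-type analogue built from the total relative abundances of shared species. It omits the paper's Laplace-approximation adjustment for unseen shared species, and the function names are ours.

```python
import numpy as np

def sorensen_incidence(x, y):
    """Sørensen incidence index 2a / (2a + b + c) from two abundance vectors
    (species aligned): a = shared species, b and c = species unique to each."""
    px, py = x > 0, y > 0
    a = np.sum(px & py)
    b = np.sum(px & ~py)
    c = np.sum(~px & py)
    return 2 * a / (2 * a + b + c)

def sorensen_abundance(x, y):
    """Abundance-type analogue: replace the incidence counts by U and V, the
    shares of each assemblage's total abundance made up of shared species."""
    shared = (x > 0) & (y > 0)
    U = x[shared].sum() / x.sum()
    V = y[shared].sum() / y.sum()
    return 2 * U * V / (U + V)

# toy example: counts for the same five species in two assemblages
x = np.array([10, 0, 3, 5, 0])
y = np.array([2, 4, 0, 7, 1])
print(sorensen_incidence(x, y), sorensen_abundance(x, y))
```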


Journal ArticleDOI
Simon N. Wood1
TL;DR: The smooths offer several advantages: they have one wiggliness penalty per covariate and are hence invariant to linear rescaling of covariates, making them useful when there is no “natural” way to scale covariates relative to each other.
Abstract: Summary A general method for constructing low-rank tensor product smooths for use as components of generalized additive models or generalized additive mixed models is presented. A penalized regression approach is adopted in which tensor product smooths of several variables are constructed from smooths of each variable separately, these “marginal” smooths being represented using a low-rank basis with an associated quadratic wiggliness penalty. The smooths offer several advantages: (i) they have one wiggliness penalty per covariate and are hence invariant to linear rescaling of covariates, making them useful when there is no “natural” way to scale covariates relative to each other; (ii) they have a useful tuneable range of smoothness, unlike single-penalty tensor product smooths that are scale invariant; (iii) the relatively low rank of the smooths means that they are computationally efficient; (iv) the penalties on the smooths are easily interpretable in terms of function shape; (v) the smooths can be generated completely automatically from any marginal smoothing bases and associated quadratic penalties, giving the modeler considerable flexibility to choose the basis penalty combination most appropriate to each modeling task; and (vi) the smooths can easily be written as components of a standard linear or generalized linear mixed model, allowing them to be used as components of the rich family of such models implemented in standard software, and to take advantage of the efficient and stable computational methods that have been developed for such models. A small simulation study shows that the methods can compare favorably with recently developed smoothing spline ANOVA methods.

501 citations
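
The row-wise Kronecker construction of a tensor product basis from marginal bases, with one penalty per covariate, can be sketched as follows. In the toy usage, plain polynomial marginal bases with difference penalties merely stand in for proper low-rank spline bases, and the smoothing parameters are fixed rather than estimated, so this is an illustration of the construction, not the paper's full method.

```python
import numpy as np

def row_kronecker(A, B):
    """Row-wise Kronecker product: the tensor product basis evaluated at the
    observations, from marginal basis matrices A (n x ka) and B (n x kb)."""
    n = A.shape[0]
    return (A[:, :, None] * B[:, None, :]).reshape(n, -1)

def tensor_product_smooth(A, B, Sa, Sb):
    """Tensor product basis plus one wiggliness penalty per covariate, built
    from the marginal bases and their marginal penalties."""
    X = row_kronecker(A, B)
    S1 = np.kron(Sa, np.eye(B.shape[1]))   # penalizes roughness in covariate 1
    S2 = np.kron(np.eye(A.shape[1]), Sb)   # penalizes roughness in covariate 2
    return X, S1, S2

def fit_penalized(X, y, penalties, lambdas):
    """Penalized least squares for fixed smoothing parameters lambdas."""
    S = sum(lam * P for lam, P in zip(lambdas, penalties))
    return np.linalg.solve(X.T @ X + S, X.T @ y)

# toy usage: polynomial marginal bases with second-difference penalties
rng = np.random.default_rng(2)
x1, x2 = rng.uniform(size=200), rng.uniform(size=200)
y = np.sin(2 * np.pi * x1) * x2 + rng.normal(scale=0.1, size=200)
A, B = np.vander(x1, 6, increasing=True), np.vander(x2, 6, increasing=True)
D = np.diff(np.eye(6), 2, axis=0)
Xt, S1, S2 = tensor_product_smooth(A, B, D.T @ D, D.T @ D)
beta = fit_penalized(Xt, y, [S1, S2], [0.1, 0.1])
fitted = Xt @ beta
```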


Journal ArticleDOI
TL;DR: A stochastic discrete-time susceptible-exposed-infectious-recovered (SEIR) model for infectious diseases is developed with the aim of estimating parameters from daily incidence and mortality time series for an outbreak of Ebola in the Democratic Republic of Congo in 1995.
Abstract: A stochastic discrete-time susceptible-exposed-infectious-recovered (SEIR) model for infectious diseases is developed with the aim of estimating parameters from daily incidence and mortality time series for an outbreak of Ebola in the Democratic Republic of Congo in 1995. The incidence time series exhibit many low integers as well as zero counts requiring an intrinsically stochastic modeling approach. In order to capture the stochastic nature of the transitions between the compartmental populations in such a model we specify appropriate conditional binomial distributions. In addition, a relatively simple temporally varying transmission rate function is introduced that allows for the effect of control interventions. We develop Markov chain Monte Carlo methods for inference that are used to explore the posterior distribution of the parameters. The algorithm is further extended to integrate numerically over state variables of the model, which are unobserved. This provides a realistic stochastic model that can be used by epidemiologists to study the dynamics of the disease and the effect of control interventions.

414 citations
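
A minimal simulation sketch of such a chain-binomial SEIR model follows, with a transmission rate that decays once control interventions begin. The parameter values and the specific form of the time-varying rate are illustrative assumptions, not the paper's estimates, and the MCMC inference step is not shown.

```python
import numpy as np

def seir_step(S, E, I, R, beta, sigma, gamma, N, rng, dt=1.0):
    """One day of a stochastic discrete-time SEIR model: transitions between
    compartments are conditional binomial draws (chain-binomial style)."""
    p_exp = 1.0 - np.exp(-beta * I / N * dt)   # P(susceptible becomes exposed)
    p_inf = 1.0 - np.exp(-sigma * dt)          # P(exposed becomes infectious)
    p_rem = 1.0 - np.exp(-gamma * dt)          # P(infectious is removed)
    new_E = rng.binomial(S, p_exp)
    new_I = rng.binomial(E, p_inf)
    new_R = rng.binomial(I, p_rem)
    return S - new_E, E + new_E - new_I, I + new_I - new_R, R + new_R, new_I

def transmission_rate(t, beta0, q, t_control):
    """Transmission rate held at beta0, then decaying exponentially after
    control interventions start at t_control (one simple time-varying form)."""
    return beta0 if t < t_control else beta0 * np.exp(-q * (t - t_control))

# illustrative run; parameter values are made up, not the paper's estimates
rng = np.random.default_rng(0)
N = 1_000_000
S, E, I, R = N - 3, 0, 3, 0
daily_incidence = []
for t in range(250):
    beta_t = transmission_rate(t, beta0=0.33, q=0.2, t_control=80)
    S, E, I, R, new_I = seir_step(S, E, I, R, beta_t, 1 / 5.3, 1 / 5.6, N, rng)
    daily_incidence.append(new_I)
```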


Journal ArticleDOI
TL;DR: A generalization of the Wilcoxon signed rank test, a frequently used nonparametric test for paired data based on independent units of analysis, is proposed that incorporates clustering; the resulting test statistic is shown to be asymptotically normal as the number of clusters becomes large, provided the cluster size is bounded.
Abstract: The Wilcoxon signed rank test is a frequently used nonparametric test for paired data (e.g., consisting of pre- and posttreatment measurements) based on independent units of analysis. This test cannot be used for paired comparisons arising from clustered data (e.g., if paired comparisons are available for each of two eyes of an individual). To incorporate clustering, a generalization of the randomization test formulation for the signed rank test is proposed, where the unit of randomization is at the cluster level (e.g., person), while the individual paired units of analysis are at the subunit within cluster level (e.g., eye within person). An adjusted variance estimate of the signed rank test statistic is then derived, which can be used for either balanced (same number of subunits per cluster) or unbalanced (different number of subunits per cluster) data, with an exchangeable correlation structure, with or without tied values. The resulting test statistic is shown to be asymptotically normal as the number of clusters becomes large, if the cluster size is bounded. Simulation studies are performed based on simulating correlated ranked data from a signed log-normal distribution. These studies indicate appropriate type I error for data sets with ≥20 clusters and a superior power profile compared with either the ordinary signed rank test based on the average cluster difference score or the multivariate signed rank test of Puri and Sen. Finally, the methods are illustrated with two data sets, (i) an ophthalmologic data set involving a comparison of electroretinogram (ERG) data in retinitis pigmentosa (RP) patients before and after undergoing an experimental surgical procedure, and (ii) a nutritional data set based on a randomized prospective study of nutritional supplements in RP patients where vitamin E intake outside of study capsules is compared before and after randomization to monitor compliance with nutritional protocols.

291 citations
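
The randomization-test formulation can be sketched directly: ranks are computed across all subunits, but signs are flipped cluster by cluster. The illustrative Python below returns a Monte Carlo randomization p-value rather than the paper's adjusted-variance normal approximation, and the names are ours.

```python
import numpy as np
from scipy.stats import rankdata

def clustered_signed_rank_test(diff, cluster, n_perm=9999, seed=0):
    """Randomization version of the signed rank test for clustered paired data:
    ranks are assigned to |differences| across all subunits, but signs are
    flipped for whole clusters at a time (the cluster is the randomization unit)."""
    rng = np.random.default_rng(seed)
    diff, cluster = np.asarray(diff, float), np.asarray(cluster)
    keep = diff != 0                                  # drop zero differences
    diff, cluster = diff[keep], cluster[keep]
    signed = np.sign(diff) * rankdata(np.abs(diff))   # midranks handle ties
    clusters = np.unique(cluster)
    cluster_sums = np.array([signed[cluster == c].sum() for c in clusters])
    t_obs = cluster_sums.sum()
    flips = rng.choice([-1.0, 1.0], size=(n_perm, len(clusters)))
    t_perm = flips @ cluster_sums
    p = (1 + np.sum(np.abs(t_perm) >= abs(t_obs))) / (n_perm + 1)
    return t_obs, p

# toy example: 15 clusters (e.g., persons) with 1-3 paired subunits (e.g., eyes)
rng = np.random.default_rng(3)
sizes = rng.integers(1, 4, size=15)
cluster = np.repeat(np.arange(15), sizes)
diff = rng.normal(0.4, 1.0, size=sizes.sum())
print(clustered_signed_rank_test(diff, cluster))
```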


Journal ArticleDOI
TL;DR: A mechanism‐based dynamic model is proposed for characterizing long‐term viral dynamics with antiretroviral therapy, described by a set of nonlinear differential equations without closed‐form solutions that directly incorporate drug concentration, adherence, and drug susceptibility into a function of treatment efficacy.
Abstract: HIV dynamics studies have significantly contributed to the understanding of HIV infection and antiviral treatment strategies. But most studies are limited to short-term viral dynamics due to the difficulty of establishing a relationship of antiviral response with multiple treatment factors such as drug exposure and drug susceptibility during long-term treatment. In this article, a mechanism-based dynamic model is proposed for characterizing long-term viral dynamics with antiretroviral therapy, described by a set of nonlinear differential equations without closed-form solutions. In this model we directly incorporate drug concentration, adherence, and drug susceptibility into a function of treatment efficacy, defined as an inhibition rate of virus replication. We investigate a Bayesian approach under the framework of hierarchical Bayesian (mixed-effects) models for estimating unknown dynamic parameters. In particular, interest focuses on estimating individual dynamic parameters. The proposed methods not only help to alleviate the difficulty in parameter identifiability, but also flexibly deal with sparse and unbalanced longitudinal data from individual subjects. For illustration purposes, we present one simulation example to implement the proposed approach and apply the methodology to a data set from an AIDS clinical trial. The basic concept of the longitudinal HIV dynamic systems and the proposed methodologies are generally applicable to any other biomedical dynamic systems.

251 citations
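
The paper's exact differential equations are not reproduced in the abstract, so the sketch below uses a generic target-cell-limited HIV model in which a time-varying efficacy term, built from drug concentration, adherence, and susceptibility (IC50), inhibits infection. All functional forms and parameter values here are assumptions for illustration, and the hierarchical Bayesian estimation step is omitted.

```python
import numpy as np
from scipy.integrate import solve_ivp

def efficacy(t, conc, adherence, ic50):
    """Treatment efficacy (inhibition rate of viral replication) built from drug
    concentration, adherence, and susceptibility (IC50), all time-varying."""
    return adherence * conc(t) / (conc(t) + ic50(t))

def hiv_dynamics(t, y, lam, d, k, delta, n_virions, c, conc, adherence, ic50):
    """Target-cell-limited model: uninfected cells T, infected cells Ti, virus V."""
    T, Ti, V = y
    e = efficacy(t, conc, adherence, ic50)
    dT = lam - d * T - (1 - e) * k * T * V
    dTi = (1 - e) * k * T * V - delta * Ti
    dV = n_virions * delta * Ti - c * V
    return [dT, dTi, dV]

# illustrative drug model and parameter values, not estimates from the trial
conc = lambda t: 2.0 + np.sin(2 * np.pi * t)       # within-day drug fluctuation
ic50 = lambda t: 0.5 * np.exp(0.001 * t)           # slowly emerging resistance
sol = solve_ivp(hiv_dynamics, (0, 300), [600.0, 10.0, 5e4],
                args=(36.0, 0.06, 2.4e-5, 0.3, 1000.0, 3.0, conc, 0.9, ic50),
                max_step=1.0, dense_output=True)
viral_load = sol.sol(np.linspace(0, 300, 100))[2]  # long-term viral trajectory
```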


Journal ArticleDOI
TL;DR: Disease-mapping models for areal data often have fixed effects to measure the effect of spatially varying covariates and random effects with a conditionally autoregressive (CAR) prior to account for spatial clustering, but adding the CAR random effects can cause large changes in the posterior mean and variance of fixed effects compared to the nonspatial regression model.
Abstract: Disease-mapping models for areal data often have fixed effects to measure the effect of spatially varying covariates and random effects with a conditionally autoregressive (CAR) prior to account for spatial clustering. In such spatial regressions, the objective may be to estimate the fixed effects while accounting for the spatial correlation. But adding the CAR random effects can cause large changes in the posterior mean and variance of fixed effects compared to the nonspatial regression model. This article explores the impact of adding spatial random effects on fixed effect estimates and posterior variance. Diagnostics are proposed to measure posterior variance inflation from collinearity between the fixed effect covariates and the CAR random effects and to measure each region's influence on the change in the fixed effect's estimates by adding the CAR random effects. A new model that alleviates the collinearity between the fixed effect covariates and the CAR random effects is developed and extensions of these methods to point-referenced data models are discussed.

249 citations


Journal ArticleDOI
TL;DR: Simulation studies suggest that AUC-based classification scores have performance comparable with logistic likelihood-based scores when the logistic regression model holds, and model fitting by maximizing the AUC should be considered when the goal is to derive a marker combination score for classification or prediction.
Abstract: No single biomarker for cancer is considered adequately sensitive and specific for cancer screening. It is expected that the results of multiple markers will need to be combined in order to yield adequately accurate classification. Typically, the objective function that is optimized for combining markers is the likelihood function. In this article, we consider an alternative objective function: the area under the empirical receiver operating characteristic curve (AUC). We note that it yields consistent estimates of parameters in a generalized linear model for the risk score but does not require specifying the link function. Like logistic regression, it yields consistent estimation with case-control or cohort data. Simulation studies suggest that AUC-based classification scores have performance comparable with logistic likelihood-based scores when the logistic regression model holds. Analysis of data from a proteomics biomarker study shows that performance can be far superior to logistic regression derived scores when the logistic regression model does not hold. Model fitting by maximizing the AUC rather than the likelihood should be considered when the goal is to derive a marker combination score for classification or prediction.

247 citations
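
A common way to maximize the empirical AUC over linear combinations is to smooth the indicator with a sigmoid and hand the problem to a generic optimizer, fixing one coefficient because the AUC is scale invariant. The sketch below takes that route as an illustration of the general idea; it is not the authors' specific algorithm.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def empirical_auc(s_case, s_control):
    """Empirical AUC: fraction of (case, control) pairs with the case scored higher."""
    d = s_case[:, None] - s_control[None, :]
    return np.mean(d > 0) + 0.5 * np.mean(d == 0)

def fit_auc_combination(X_case, X_control, h=0.05):
    """Linear marker combination chosen by maximizing a sigmoid-smoothed
    empirical AUC; the first weight is fixed at 1 because the AUC is invariant
    to rescaling of the score."""
    def neg_smooth_auc(b):
        beta = np.r_[1.0, b]
        s1, s0 = X_case @ beta, X_control @ beta
        return -np.mean(expit((s1[:, None] - s0[None, :]) / h))
    res = minimize(neg_smooth_auc, x0=np.zeros(X_case.shape[1] - 1),
                   method="Nelder-Mead")
    return np.r_[1.0, res.x]

# toy data: two markers, cases shifted upward on both
rng = np.random.default_rng(4)
X_control = rng.normal(size=(200, 2))
X_case = rng.normal(loc=[0.8, 0.4], size=(150, 2))
beta = fit_auc_combination(X_case, X_control)
print(beta, empirical_auc(X_case @ beta, X_control @ beta))
```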



Journal ArticleDOI
TL;DR: A pairwise approach is proposed in which all possible bivariate models are fitted and inference follows from pseudo-likelihood arguments; it is applicable to linear, generalized linear, and nonlinear mixed models, or combinations of these.
Abstract: A mixed model is a flexible tool for joint modeling purposes, especially when the gathered data are unbalanced. However, computational problems due to the dimension of the joint covariance matrix of the random effects arise as soon as the number of outcomes and/or the number of used random effects per outcome increases. We propose a pairwise approach in which all possible bivariate models are fitted, and where inference follows from pseudo-likelihood arguments. The approach is applicable for linear, generalized linear, and nonlinear mixed models, or for combinations of these. The methodology will be illustrated for linear mixed models in the analysis of 22-dimensional, highly unbalanced, longitudinal profiles of hearing thresholds.

Journal ArticleDOI
TL;DR: Insight is provided into the robustness of the MLEs against departure from the normal random effects assumption, and bootstrap procedures are suggested to overcome the difficulty of obtaining reliable estimates of their standard errors.
Abstract: The maximum likelihood approach to jointly model the survival time and its longitudinal covariates has been successful to model both processes in longitudinal studies. Random effects in the longitudinal process are often used to model the survival times through a proportional hazards model, and this invokes an EM algorithm to search for the maximum likelihood estimates (MLEs). Several intriguing issues are examined here, including the robustness of the MLEs against departure from the normal random effects assumption, and difficulties with the profile likelihood approach to provide reliable estimates for the standard error of the MLEs. We provide insights into the robustness property and suggest to overcome the difficulty of reliable estimates for the standard errors by using bootstrap procedures. Numerical studies and data analysis illustrate our points.

Journal ArticleDOI
TL;DR: The generalized additive model boosting method is shown to be a strong competitor to common procedures for fitting generalized additive models; in high-dimensional settings with many nuisance predictor variables it performs very well.
Abstract: The use of generalized additive models in statistical data analysis suffers from the restriction to few explanatory variables and the problems of selection of smoothing parameters. Generalized additive model boosting circumvents these problems by means of stagewise fitting of weak learners. A fitting procedure is derived which works for all simple exponential family distributions, including binomial, Poisson, and normal response variables. The procedure combines the selection of variables and the determination of the appropriate amount of smoothing. Penalized regression splines and the newly introduced penalized stumps are considered as weak learners. Estimates of standard deviations and stopping criteria, which are notorious problems in iterative procedures, are based on an approximate hat matrix. The method is shown to be a strong competitor to common procedures for the fitting of generalized additive models. In particular, in high-dimensional settings with many nuisance predictor variables it performs very well.
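
The stagewise idea can be sketched with componentwise L2 boosting using plain regression stumps. The paper's weak learners are penalized stumps and penalized regression splines, it covers general exponential-family responses, and it uses an approximate hat matrix for stopping; those elements are omitted from this illustrative Python, whose names are ours.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split regression stump for residuals r on covariate x."""
    best = (np.inf, x.min(), r.mean(), r.mean())
    for cut in np.unique(x)[:-1]:
        left = x <= cut
        ml, mr = r[left].mean(), r[~left].mean()
        sse = ((r[left] - ml) ** 2).sum() + ((r[~left] - mr) ** 2).sum()
        if sse < best[0]:
            best = (sse, cut, ml, mr)
    return best  # (sse, cut, left mean, right mean)

def gam_boost(X, y, n_iter=200, nu=0.1):
    """Componentwise boosting for an additive model with squared-error loss:
    each iteration fits a stump to the current residuals for every covariate,
    keeps only the best one, and takes a small step of size nu.  Variable
    selection is implicit, since unhelpful covariates are rarely chosen."""
    n, p = X.shape
    f = np.full(n, y.mean())
    ensemble = []
    for _ in range(n_iter):
        r = y - f
        fits = [fit_stump(X[:, j], r) for j in range(p)]
        j = int(np.argmin([fit[0] for fit in fits]))
        _, cut, ml, mr = fits[j]
        f += nu * np.where(X[:, j] <= cut, ml, mr)
        ensemble.append((j, cut, nu * ml, nu * mr))
    return y.mean(), ensemble

# toy example: only the first 2 of 10 covariates matter
rng = np.random.default_rng(5)
X = rng.uniform(size=(300, 10))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.2, size=300)
offset, ensemble = gam_boost(X, y)
print(np.bincount([j for j, *_ in ensemble], minlength=10))  # selection counts
```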

Journal ArticleDOI
TL;DR: A multivariate mixed effects model is presented to explicitly capture two different sources of dependence among longitudinal measures over time as well as dependence between different variables in cancer and AIDS clinical trials.
Abstract: Joint modeling of longitudinal and survival data is becoming increasingly essential in most cancer and AIDS clinical trials. We propose a likelihood approach to extend both longitudinal and survival components to be multidimensional. A multivariate mixed effects model is presented to explicitly capture two different sources of dependence among longitudinal measures over time as well as dependence between different variables. For the survival component of the joint model, we introduce a shared frailty, which is assumed to have a positive stable distribution, to induce correlation between failure times. The proposed marginal univariate survival model, which accommodates both zero and nonzero cure fractions for the time to event, is then applied to each marginal survival function. The proposed multivariate survival model has a proportional hazards structure for the population hazard, conditionally as well as marginally, when the baseline covariates are specified through a specific mechanism. In addition, the model is capable of dealing with survival functions with different cure rate structures. The methodology is specifically applied to the International Breast Cancer Study Group (IBCSG) trial to investigate the relationship between quality of life, disease-free survival, and overall survival.

Journal ArticleDOI
TL;DR: A unified and efficient nonparametric hypothesis testing procedure that can easily take into account correlation within subjects and deal directly with both continuous and discrete response longitudinal data under the framework of generalized linear models is proposed.
Abstract: Nonparametric smoothing methods are used to model longitudinal data, but the challenge remains to incorporate correlation into nonparametric estimation procedures. In this article, we propose an efficient estimation procedure for varying-coefficient models for longitudinal data. The proposed procedure can easily take into account correlation within subjects and deal directly with both continuous and discrete response longitudinal data under the framework of generalized linear models. The proposed approach yields a more efficient estimator than the generalized estimation equation approach when the working correlation is misspecified. For varying-coefficient models, it is often of interest to test whether coefficient functions are time varying or time invariant. We propose a unified and efficient nonparametric hypothesis testing procedure, and further demonstrate that the resulting test statistics have an asymptotic chi-squared distribution. In addition, the goodness-of-fit test is applied to test whether the model assumption is satisfied. The corresponding test is also useful for choosing basis functions and the number of knots for regression spline models in conjunction with the model selection criterion. We evaluate the finite sample performance of the proposed procedures with Monte Carlo simulation studies. The proposed methodology is illustrated by the analysis of an acquired immune deficiency syndrome (AIDS) data set.

Journal ArticleDOI
TL;DR: A new general approach for handling misclassification in discrete covariates or responses in regression models is developed; it is applicable to models with misclassified responses and/or misclassified discrete regressors and is applied to a study on caries with a misclassified longitudinal response.
Abstract: We have developed a new general approach for handling misclassification in discrete covariates or responses in regression models. The simulation and extrapolation (SIMEX) method, which was originally designed for handling additive covariate measurement error, is applied to the case of misclassification. The statistical model for characterizing misclassification is given by the transition matrix Pi from the true to the observed variable. We exploit the relationship between the size of misclassification and bias in estimating the parameters of interest. Assuming that Pi is known or can be estimated from validation data, we simulate data with higher misclassification and extrapolate back to the case of no misclassification. We show that our method is quite general and applicable to models with misclassified response and/or misclassified discrete regressors. In the case of a binary response with misclassification, we compare our method to the approach of Neuhaus, and to the matrix method of Morrissey and Spiegelman in the case of a misclassified binary regressor. We apply our method to a study on caries with a misclassified longitudinal response.
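
A minimal MC-SIMEX-style sketch for a misclassified binary regressor in logistic regression: extra misclassification is added via fractional powers of the transition matrix Pi, the model is refit at each level, and the coefficient is extrapolated back to the no-misclassification point (lambda = -1). The quadratic extrapolant, the use of statsmodels, and the function name are our illustrative choices, not the authors' implementation.

```python
import numpy as np
import statsmodels.api as sm
from scipy.linalg import fractional_matrix_power

def mc_simex_logit(y, w, X_other, Pi, lambdas=(0.5, 1.0, 1.5, 2.0), B=50, seed=0):
    """MC-SIMEX sketch for a misclassified binary regressor in logistic regression.
    Pi is the (known) 2 x 2 misclassification matrix.  Extra misclassification
    Pi**lambda is simulated, the model refit, and the average coefficient of w
    extrapolated back to lambda = -1 (no misclassification)."""
    rng = np.random.default_rng(seed)
    def coef(w_cur):
        X = sm.add_constant(np.column_stack([w_cur, X_other]))
        return sm.Logit(y, X).fit(disp=0).params[1]      # coefficient of w
    lam_grid, coefs = [0.0], [coef(w)]                   # naive fit at lambda = 0
    for lam in lambdas:
        Pi_lam = np.real(fractional_matrix_power(Pi, lam))
        p_obs1 = np.clip(Pi_lam[w, 1], 0.0, 1.0)         # P(observe 1 | current w)
        reps = [coef(rng.binomial(1, p_obs1)) for _ in range(B)]
        lam_grid.append(lam)
        coefs.append(np.mean(reps))
    quad = np.polyfit(lam_grid, coefs, 2)                # quadratic extrapolant
    return np.polyval(quad, -1.0)                        # corrected estimate
```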

Journal ArticleDOI
TL;DR: In this article, occurrence probability models that allow for heterogeneous detection probabilities by considering several common classes of mixture distributions for p are developed.
Abstract: Summary Models for estimating the probability of occurrence of a species in the presence of imperfect detection are important in many ecological disciplines. In these “site occupancy” models, the possibility of heterogeneity in detection probabilities among sites must be considered because variation in abundance (and other factors) among sampled sites induces variation in detection probability (p). In this article, I develop occurrence probability models that allow for heterogeneous detection probabilities by considering several common classes of mixture distributions for p. For any mixing distribution, the likelihood has the general form of a zero-inflated binomial mixture for which inference based upon integrated likelihood is straightforward. A recent paper by Link (2003, Biometrics 59, 1123–1130) demonstrates that in closed population models used for estimating population size, different classes of mixture distributions are indistinguishable from data, yet can produce very different inferences about population size. I demonstrate that this problem can also arise in models for estimating site occupancy in the presence of heterogeneous detection probabilities. The implications of this are discussed in the context of an application to avian survey data and the development of animal monitoring programs.
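
For one common choice of mixing distribution, a Beta distribution for p, the zero-inflated binomial mixture likelihood is easy to write down and maximize; the sketch below does exactly that. The paper considers several mixture classes and shows they can be hard to distinguish from data, which this toy fit does not address, and the function name is ours.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import betabinom

def fit_occupancy_betabinom(y, n_visits):
    """Zero-inflated binomial mixture for site occupancy: a site is occupied
    with probability psi; if occupied, its detection probability is Beta(a, b),
    so the number of detections y[i] out of n_visits is beta-binomial.
    Unoccupied sites contribute a structural zero."""
    y = np.asarray(y)
    def negloglik(theta):
        psi = expit(theta[0])
        a, b = np.exp(theta[1]), np.exp(theta[2])
        lik = psi * betabinom.pmf(y, n_visits, a, b) + (1.0 - psi) * (y == 0)
        return -np.sum(np.log(lik + 1e-300))
    res = minimize(negloglik, x0=np.zeros(3), method="Nelder-Mead")
    return expit(res.x[0]), np.exp(res.x[1]), np.exp(res.x[2])   # psi, a, b

# toy data: 200 sites, 5 visits each, heterogeneous detection probabilities
rng = np.random.default_rng(6)
occupied = rng.random(200) < 0.6
p = rng.beta(2.0, 3.0, size=200)
y = np.where(occupied, rng.binomial(5, p), 0)
print(fit_occupancy_betabinom(y, n_visits=5))
```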

Journal ArticleDOI
TL;DR: Two regularization approaches are considered, the LASSO and the threshold-gradient-directed regularization, for estimation and variable selection in the accelerated failure time model with multiple covariates based on Stute's weighted least squares method.
Abstract: We consider two regularization approaches, the LASSO and the threshold-gradient-directed regularization, for estimation and variable selection in the accelerated failure time model with multiple covariates based on Stute's weighted least squares method. The Stute estimator uses Kaplan-Meier weights to account for censoring in the least squares criterion. The weighted least squares objective function makes the adaptation of this approach to multiple covariate settings computationally feasible. We use V-fold cross-validation and a modified Akaike's Information Criterion for tuning parameter selection, and a bootstrap approach for variance estimation. The proposed method is evaluated using simulations and demonstrated on a real data example.
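
A sketch of the weighted least squares idea: Kaplan-Meier (Stute) weights are computed from the censored times and passed to an off-the-shelf LASSO on log survival time. The penalty level is fixed here, whereas the paper selects it by V-fold cross-validation or a modified AIC and adds bootstrap variance estimation; the helper names are ours and a reasonably recent scikit-learn is assumed for the sample_weight argument.

```python
import numpy as np
from sklearn.linear_model import Lasso

def stute_km_weights(time, delta):
    """Kaplan-Meier (Stute) weights: censored observations get weight 0 and the
    KM probability mass is attached to the uncensored ones."""
    n = len(time)
    order = np.argsort(time, kind="stable")
    d = np.asarray(delta, float)[order]
    w_sorted = np.zeros(n)
    surv = 1.0
    for i in range(n):                      # i = 0-based rank within sorted times
        w_sorted[i] = d[i] * surv / (n - i)
        surv *= ((n - i - 1) / (n - i)) ** d[i]
    w = np.zeros(n)
    w[order] = w_sorted                     # back to the original ordering
    return w

def aft_lasso(X, time, delta, alpha=0.05):
    """LASSO for the accelerated failure time model via Stute's weighted least
    squares: regress log(time) on covariates with Kaplan-Meier weights."""
    w = stute_km_weights(time, delta)
    model = Lasso(alpha=alpha)              # alpha fixed here; the paper tunes it
    model.fit(X, np.log(time), sample_weight=w)
    return model

# toy censored data with 10 covariates, only the first two informative
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 10))
t_true = np.exp(1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200))
cens = rng.exponential(scale=4.5, size=200)
time, delta = np.minimum(t_true, cens), (t_true <= cens).astype(int)
print(aft_lasso(X, time, delta).coef_)
```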

Journal ArticleDOI
TL;DR: Results from simulation studies indicate that the MOM model is best at controlling false discoveries, without sacrificing power, and is the only one capable of finding two genome regions previously shown to be involved in diabetes.
Abstract: Traditional genetic mapping has largely focused on the identification of loci affecting one, or at most a few, complex traits. Microarrays allow for measurement of thousands of gene expression abundances, themselves complex traits, and a number of recent investigations have considered these measurements as phenotypes in mapping studies. Combining traditional quantitative trait loci (QTL) mapping methods with microarray data is a powerful approach with demonstrated utility in a number of recent biological investigations. These expression quantitative trait loci (eQTL) studies are similar to traditional QTL studies, as a main goal is to identify the genomic locations to which the expression traits are linked. However, eQTL studies probe thousands of expression transcripts; and as a result, standard multi-trait QTL mapping methods, designed to handle at most tens of traits, do not directly apply. One possible approach is to use single-trait QTL mapping methods to analyze each transcript separately. This leads to an increased number of false discoveries, as corrections for multiple tests across transcripts are not made. Similarly, the repeated application, at each marker, of methods for identifying differentially expressed transcripts suffers from multiple tests across markers. Here, we demonstrate the deficiencies of these approaches and propose a mixture over markers (MOM) model that shares information across both markers and transcripts. The utility of all methods is evaluated using simulated data as well as data from an F(2) mouse cross in a study of diabetes. Results from simulation studies indicate that the MOM model is best at controlling false discoveries, without sacrificing power. The MOM model is also the only one capable of finding two genome regions previously shown to be involved in diabetes.

Journal ArticleDOI
TL;DR: Under various scenarios, the new Bayesian design based on the toxicity–efficacy odds ratio trade‐offs exhibits good properties and treats most patients at the desirable dose levels.
Abstract: A Bayesian adaptive design is proposed for dose-finding in phase I/II clinical trials to incorporate the bivariate outcomes, toxicity and efficacy, of a new treatment. Without specifying any parametric functional form for the drug dose-response curve, we jointly model the bivariate binary data to account for the correlation between toxicity and efficacy. After observing all the responses of each cohort of patients, the dosage for the next cohort is escalated, deescalated, or unchanged according to the proposed odds ratio criteria constructed from the posterior toxicity and efficacy probabilities. A novel class of prior distributions is proposed through logit transformations which implicitly imposes a monotonic constraint on dose toxicity probabilities and correlates the probabilities of the bivariate outcomes. We conduct simulation studies to evaluate the operating characteristics of the proposed method. Under various scenarios, the new Bayesian design based on the toxicity-efficacy odds ratio trade-offs exhibits good properties and treats most patients at the desirable dose levels. The method is illustrated with a real trial design for a breast medical oncology study.

Journal ArticleDOI
TL;DR: A mark-recapture-based model is developed that uses the observed distribution to relax the assumption of zero correlation between detection probabilities implicit in the mark-recapture model and demonstrates its usefulness in coping with unmodeled heterogeneity using data from an aerial survey of crabeater seals in the Antarctic.
Abstract: Mark-recapture models applied to double-observer distance sampling data neglect the information on relative detectability of objects contained in the distribution of observed distances. A difference between the observed distribution and that predicted by the mark-recapture model is symptomatic of a failure of the assumption of zero correlation between detection probabilities implicit in the mark-recapture model. We develop a mark-recapture-based model that uses the observed distribution to relax this assumption to zero correlation at only one distance. We demonstrate its usefulness in coping with unmodeled heterogeneity using data from an aerial survey of crabeater seals in the Antarctic.

Journal ArticleDOI
TL;DR: This work calls this parameter the percent change annualized (PCA) and proposes two new estimators of it: a two-point estimator that uses only the first and last rates, and an adaptive estimator that equals the linear model estimator with high probability when the rates are not significantly different from linear on the log scale, but includes fewer points if there are significant departures from that linearity.
Abstract: The annual percent change (APC) is often used to measure trends in disease and mortality rates, and a common estimator of this parameter uses a linear model on the log of the age-standardized rates. Under the assumption of linearity on the log scale, which is equivalent to a constant change assumption, APC can be equivalently defined in three ways as transformations of either (1) the slope of the line that runs through the log of each rate, (2) the ratio of the last rate to the first rate in the series, or (3) the geometric mean of the proportional changes in the rates over the series. When the constant change assumption fails then the first definition cannot be applied as is, while the second and third definitions unambiguously define the same parameter regardless of whether the assumption holds. We call this parameter the percent change annualized (PCA) and propose two new estimators of it. The first, the two-point estimator, uses only the first and last rates, assuming nothing about the rates in between. This estimator requires fewer assumptions and is asymptotically unbiased as the size of the population gets large, but has more variability since it uses no information from the middle rates. The second estimator is an adaptive one and equals the linear model estimator with a high probability when the rates are not significantly different from linear on the log scale, but includes fewer points if there are significant departures from that linearity. For the two-point estimator we can use confidence intervals previously developed for ratios of directly standardized rates. For the adaptive estimator, we show through simulation that the bootstrap confidence intervals give appropriate coverage.
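
The two definitions are easy to compare numerically. The sketch below computes the log-linear APC estimator and the two-point PCA estimator on a short hypothetical rate series; the adaptive estimator and the confidence intervals are not shown.

```python
import numpy as np

def apc_loglinear(rates):
    """Annual percent change from the slope of a straight line fit to log(rate)."""
    t = np.arange(len(rates))
    slope = np.polyfit(t, np.log(rates), 1)[0]
    return 100 * (np.exp(slope) - 1)

def pca_two_point(rates):
    """Percent change annualized from only the first and last rates:
    the geometric-mean annual change over the whole series."""
    n = len(rates)
    return 100 * ((rates[-1] / rates[0]) ** (1 / (n - 1)) - 1)

rates = np.array([50.0, 48.1, 47.5, 44.0, 41.2, 40.8])   # hypothetical annual rates
print(apc_loglinear(rates), pca_two_point(rates))
```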

Journal ArticleDOI
TL;DR: Methods for use in vaccine clinical trials to help determine whether the immune response to a vaccine is actually causing a reduction in the infection rate are introduced and may help elucidate the role of immune response in preventing infections.
Abstract: This article introduces methods for use in vaccine clinical trials to help determine whether the immune response to a vaccine is actually causing a reduction in the infection rate. This is not easy because immune response to the (say HIV) vaccine is only observed in the HIV vaccine arm. If we knew what the HIV-specific immune response in placebo recipients would have been, had they been vaccinated, this immune response could be treated essentially like a baseline covariate and an interaction with treatment could be evaluated. Relatedly, the rate of infection by this baseline covariate could be compared between the two groups and a causative role of immune response would be supported if infection risk decreased with increasing HIV immune response only in the vaccine group. We introduce two methods for inferring this HIV-specific immune response. The first involves vaccinating everyone before baseline with an irrelevant vaccine, for example, rabies. Randomization ensures that the relationship between the immune responses to the rabies and HIV vaccines observed in the vaccine group is the same as what would have been seen in the placebo group. We infer a placebo volunteer’s response to the HIV vaccine using their rabies response and a prediction model from the vaccine group. The second method entails vaccinating all uninfected placebo patients at the closeout of the trial with the HIV vaccine and recording immune response. We pretend this immune response at closeout is what they would have had at baseline. We can then infer what the distribution of immune response among placebo infecteds would have been. Such designs may help elucidate the role of immune response in preventing infections. More pointedly, they could be helpful in the decision to improve or abandon an HIV vaccine with mediocre performance in a phase III trial.

Journal ArticleDOI
TL;DR: In this article, motivated by a space-time study on forest health with damage state of trees as the response, a general class of structured additive regression models for categorical responses is proposed, allowing for a flexible semiparametric predictor.
Abstract: Motivated by a space-time study on forest health with damage state of trees as the response, we propose a general class of structured additive regression models for categorical responses, allowing for a flexible semiparametric predictor. Nonlinear effects of continuous covariates, time trends, and interactions between continuous covariates are modeled by penalized splines. Spatial effects can be estimated based on Markov random fields, Gaussian random fields, or two-dimensional penalized splines. We present our approach from a Bayesian perspective, with inference based on a categorical linear mixed model representation. The resulting empirical Bayes method is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to inverse smoothing parameters, are estimated using (approximate) restricted maximum likelihood. In simulation studies we investigate the performance of different choices for the spatial effect, compare the empirical Bayes approach to competing methodology, and study the bias of mixed model estimates. As an application we analyze data from the forest health survey.

Journal ArticleDOI
TL;DR: A class of three-level response surface designs is introduced which allows all except the quadratic parameters to be estimated orthogonally, as well as having a number of other useful properties.
Abstract: Many processes in the biological industries are studied using response surface methodology. The use of biological materials, however, means that run-to-run variation is typically much greater than that in many experiments in mechanical or chemical engineering and so the designs used require greater replication. The data analysis which is performed may involve some variable selection, as well as fitting polynomial response surface models. This implies that designs should allow the parameters of the model to be estimated nearly orthogonally. A class of three-level response surface designs is introduced which allows all except the quadratic parameters to be estimated orthogonally, as well as having a number of other useful properties. These subset designs are obtained by using two-level factorial designs in subsets of the factors, with the other factors being held at their middle level. This allows their properties to be easily explored. Replacing some of the two-level designs with fractional replicates broadens the class of useful designs, especially with five or more factors, and sometimes incomplete subsets can be used. It is very simple to include a few two- and four-level factors in these designs by excluding subsets with these factors at the middle level. Subset designs can be easily modified to include factors with five or more levels by allowing a different pair of levels to be used in different subsets.
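
The basic construction is simple to reproduce: for each subset of factors of a chosen size, run a two-level full factorial in those factors with the remaining factors held at their middle level. The sketch below generates such a design; fractional replicates, incomplete subsets, center points, and the mixed-level extensions described in the abstract are not implemented, and the function name is ours.

```python
import numpy as np
from itertools import combinations, product

def subset_design(n_factors, subset_size):
    """Three-level subset design: for every subset of subset_size factors, a
    two-level full factorial (levels -1, +1) is run in those factors while the
    remaining factors stay at their middle level 0."""
    runs = []
    for subset in combinations(range(n_factors), subset_size):
        for levels in product([-1, 1], repeat=subset_size):
            run = [0] * n_factors
            for factor, level in zip(subset, levels):
                run[factor] = level
            runs.append(run)
    return np.array(runs)

design = subset_design(n_factors=4, subset_size=2)   # 6 subsets x 4 runs = 24 runs
print(design.shape)
```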

Journal ArticleDOI
TL;DR: An explicit asymptotic method is provided to evaluate the performance of different response-adaptive randomization procedures in clinical trials with continuous outcomes; the analysis concludes that the doubly adaptive biased coin design procedure targeting optimal allocation is the best one for practical use.
Abstract: We provide an explicit asymptotic method to evaluate the performance of different response-adaptive randomization procedures in clinical trials with continuous outcomes. We use this method to investigate four different response-adaptive randomization procedures. Their performance, especially in power and treatment assignment skewing to the better treatment, is thoroughly evaluated theoretically. These results are then verified by simulation. Our analysis concludes that the doubly adaptive biased coin design procedure targeting optimal allocation is the best one for practical use. We also consider the effect of delay in responses and nonstandard responses, for example, Cauchy distributed response. We illustrate our procedure by redesigning a real clinical trial.
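
A sketch of how a doubly adaptive biased coin design targeting Neyman allocation might proceed for continuous outcomes, using a Hu-Zhang-type allocation function. The allocation function, the burn-in, and the response distributions are illustrative assumptions, and delayed responses (which the paper also considers) are ignored.

```python
import numpy as np

def dbcd_probability(x, rho, gamma=2.0):
    """Doubly adaptive biased coin allocation probability for treatment A:
    x = current proportion assigned to A, rho = estimated target allocation,
    gamma controls how aggressively imbalance is corrected."""
    if x in (0.0, 1.0):
        return rho
    num = rho * (rho / x) ** gamma
    den = num + (1 - rho) * ((1 - rho) / (1 - x)) ** gamma
    return num / den

def neyman_target(y_a, y_b):
    """Target allocation to A under Neyman allocation for continuous outcomes:
    proportional to the estimated standard deviations."""
    sa, sb = np.std(y_a, ddof=1), np.std(y_b, ddof=1)
    return sa / (sa + sb)

# illustrative trial: arm B is more variable, so it should receive more patients
rng = np.random.default_rng(8)
y_a, y_b = list(rng.normal(0.0, 1.0, 5)), list(rng.normal(0.3, 2.0, 5))  # burn-in
for _ in range(190):
    rho = neyman_target(y_a, y_b)
    x = len(y_a) / (len(y_a) + len(y_b))
    if rng.random() < dbcd_probability(x, rho):
        y_a.append(rng.normal(0.0, 1.0))    # response on treatment A
    else:
        y_b.append(rng.normal(0.3, 2.0))    # response on treatment B
print(len(y_a), len(y_b))
```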

Journal ArticleDOI
TL;DR: More general versions of the focused information criterion (FIC) for variable selection in logistic regression are proposed, allowing other risk measures such as the one based on Lp error.
Abstract: Summary In biostatistical practice, it is common to use information criteria as a guide for model selection. We propose new versions of the focused information criterion (FIC) for variable selection in logistic regression. The FIC gives, depending on the quantity to be estimated, possibly different sets of selected variables. The standard version of the FIC measures the mean squared error of the estimator of the quantity of interest in the selected model. In this article, we propose more general versions of the FIC, allowing other risk measures such as the one based on Lp error. When prediction of an event is important, as is often the case in medical applications, we construct an FIC using the error rate as a natural risk measure. The advantages of using an information criterion which depends on both the quantity of interest and the selected risk measure are illustrated by means of a simulation study and application to a study on diabetic retinopathy.

Journal ArticleDOI
TL;DR: A robust Bayesian hierarchical model is developed that can be used for testing for differentially expressed genes among multiple samples, and it can distinguish between the different possible patterns of differential expression when there are three or more samples.
Abstract: Summary. We consider the problem of identifying differentially expressed genes under different conditions using gene expression microarrays. Because of the many steps involved in the experimental process, from hybridization to image analysis, cDNA microarray data often contain outliers. For example, an outlying data value could occur because of scratches or dust on the surface, imperfections in the glass, or imperfections in the array production. We develop a robust Bayesian hierarchical model for testing for differential expression. Errors are modeled explicitly using a t-distribution, which accounts for outliers. The model includes an exchangeable prior for the variances, which allows different variances for the genes but still shrinks extreme empirical variances. Our model can be used for testing for differentially expressed genes among multiple samples, and it can distinguish between the different possible patterns of differential expression when there are three or more samples. Parameter estimation is carried out using a novel version of Markov chain Monte Carlo that is appropriate when the model puts mass on subspaces of the full parameter space. The method is illustrated using two publicly available gene expression data sets. We compare our method to six other baseline and commonly used techniques, namely the t-test, the Bonferroni-adjusted t-test, significance analysis of microarrays (SAM), Efron’s empirical Bayes, and EBarrays in both its lognormal–normal and gamma–gamma forms. In an experiment with HIV data, our method performed better than these alternatives, on the basis of between-replicate agreement and disagreement.

Journal ArticleDOI
TL;DR: A model is proposed to describe the evolution in continuous time of unobserved cognition in the elderly and assess the impact of covariates directly on it, using data from PAQUID, a French prospective cohort study of ageing.
Abstract: Cognition is not directly measurable. It is assessed using psychometric tests, which can be viewed as quantitative measures of cognition with error. The aim of this article is to propose a model to describe the evolution in continuous time of unobserved cognition in the elderly and assess the impact of covariates directly on it. The latent cognitive process is defined using a linear mixed model including a Brownian motion and time-dependent covariates. The observed psychometric tests are considered as the results of parameterized nonlinear transformations of the latent cognitive process at discrete occasions. Estimation of the parameters contained both in the transformations and in the linear mixed model is achieved by maximizing the observed likelihood and graphical methods are performed to assess the goodness of fit of the model. The method is applied to data from PAQUID, a French prospective cohort study of ageing.