
Showing papers in "Biometrics in 2007"


Journal ArticleDOI
TL;DR: The modified BIC is derived by asymptotic approximation of the Bayes factor for the model of Brownian motion with changing drift and performs well compared to existing methods in accurately choosing the number of regions of changed copy number.
Abstract: In the analysis of data generated by change-point processes, one critical challenge is to determine the number of change-points. The classic Bayes information criterion (BIC) statistic does not work well here because of irregularities in the likelihood function. By asymptotic approximation of the Bayes factor, we derive a modified BIC for the model of Brownian motion with changing drift. The modified BIC is similar to the classic BIC in the sense that the first term consists of the log likelihood, but it differs in the terms that penalize for model dimension. As an example of application, this new statistic is used to analyze array-based comparative genomic hybridization (array-CGH) data. Array-CGH measures the number of chromosome copies at each genome location of a cell sample, and is useful for finding the regions of genome deletion and amplification in tumor cells. The modified BIC performs well compared to existing methods in accurately choosing the number of regions of changed copy number. Unlike existing methods, it does not rely on tuning parameters or intensive computing. Thus it is impartial and easier to understand and to use.

383 citations
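
As a concrete companion to this entry, here is a minimal sketch of BIC-type selection of the number of change-points in a Gaussian mean-shift sequence, using dynamic programming over segmentations. Note that it applies the classic BIC penalty with an assumed parameter count; the paper's contribution is precisely a different, modified penalty, which is not reproduced here.

```python
import numpy as np

def segment_cost(y):
    """Residual sum of squares of a constant-mean fit to y[i:j], for all i < j."""
    n = len(y)
    c1 = np.concatenate(([0.0], np.cumsum(y)))
    c2 = np.concatenate(([0.0], np.cumsum(y ** 2)))
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + 1, n + 1):
            s, s2, m = c1[j] - c1[i], c2[j] - c2[i], j - i
            cost[i, j] = s2 - s * s / m
    return cost

def choose_k(y, kmax=8):
    """Pick the number of change-points by a BIC-type criterion.
    Classic BIC penalty shown; the paper derives a modified penalty
    for the Brownian-motion-with-changing-drift model instead."""
    n = len(y)
    cost = segment_cost(y)
    dp = [cost[0].copy()]  # dp[k][j]: best RSS for y[:j] with k change-points
    for k in range(1, kmax + 1):
        row = np.full(n + 1, np.inf)
        for j in range(k + 1, n + 1):
            row[j] = min(dp[k - 1][t] + cost[t, j] for t in range(k, j))
        dp.append(row)
    # classic BIC: n*log(RSS/n) + (#params)*log(n); the parameter count
    # (k locations plus k+1 segment means) is a convention, not the paper's
    bic = [n * np.log(dp[k][n] / n) + (2 * k + 1) * np.log(n)
           for k in range(kmax + 1)]
    return int(np.argmin(bic))

y = np.concatenate([np.random.normal(0, 1, 60), np.random.normal(2, 1, 60)])
print(choose_k(y))  # typically 1 for this simulated profile
```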


Journal ArticleDOI
TL;DR: It is shown that the LSKM semiparametric regression can be formulated using a linear mixed model, and both the regression coefficients of the covariate effects and the LSKM estimator of the genetic pathway effect can be obtained using the best linear unbiased predictor in the corresponding linear mixed model formulation.
Abstract: We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least-squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function and allows for the possibility that each gene expression effect might be nonlinear and the genes within the same pathway are likely to interact with each other in a complicated way. This semiparametric model also makes it possible to test for the overall genetic pathway effect. We show that the LSKM semiparametric regression can be formulated using a linear mixed model. Estimation and inference hence can proceed within the linear mixed model framework using standard mixed model software. Both the regression coefficients of the covariate effects and the LSKM estimator of the genetic pathway effect can be obtained using the best linear unbiased predictor in the corresponding linear mixed model formulation. The smoothing parameter and the kernel parameter can be estimated as variance components using restricted maximum likelihood. A score test is developed to test for the genetic pathway effect. Model/variable selection within the LSKM framework is discussed. The methods are illustrated using a prostate cancer data set and evaluated using simulations.

334 citations
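
The mixed-model connection can be illustrated with a short sketch: treat the pathway effect h(Z) as a random effect with covariance proportional to a Gaussian kernel matrix, estimate the parametric part by generalized least squares, and recover h by a BLUP-type formula. The kernel and smoothing constants rho and lam are fixed here for simplicity, whereas the paper estimates them by REML as variance components.

```python
import numpy as np

def gaussian_kernel(Z, rho):
    """Gram matrix K with K_ij = exp(-||z_i - z_j||^2 / rho)."""
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / rho)

def lskm_fit(y, X, Z, rho=1.0, lam=1.0):
    """One ridge-type solve for the semiparametric model y = X beta + h(Z) + e.
    In the mixed-model view, h corresponds to a random effect with covariance
    tau*K; lam plays the role of sigma^2/tau and is a fixed tuning constant
    here rather than a REML estimate."""
    n = len(y)
    K = gaussian_kernel(Z, rho)
    V = K / lam + np.eye(n)                 # working covariance up to sigma^2
    Vinv = np.linalg.inv(V)
    # GLS for the parametric part, then a BLUP-style estimate of h
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    h = (K / lam) @ Vinv @ (y - X @ beta)
    return beta, h
```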


Journal ArticleDOI
TL;DR: Regression parameter estimates obtained from a simple estimating function are shown to be asymptotically normal when the "mother" intensity for the Neyman-Scott process tends to infinity.
Abstract: This article is concerned with inference for a certain class of inhomogeneous Neyman-Scott point processes depending on spatial covariates. Regression parameter estimates obtained from a simple estimating function are shown to be asymptotically normal when the "mother" intensity for the Neyman-Scott process tends to infinity. Clustering parameter estimates are obtained using minimum contrast estimation based on the K-function. The approach is motivated and illustrated by applications to point pattern data from a tropical rain forest plot.

252 citations
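
For the clustering-parameter step, a minimal minimum-contrast sketch is shown below, assuming a (modified) Thomas process whose theoretical K-function is known in closed form; K_hat stands for a nonparametric K-function estimate computed elsewhere, and the estimating-function step for the regression parameters is not attempted here.

```python
import numpy as np
from scipy.optimize import minimize

def thomas_K(r, kappa, sigma2):
    """Theoretical K-function of a (modified) Thomas process: Poisson(kappa)
    mothers, Gaussian(sigma2) offspring displacements."""
    return np.pi * r ** 2 + (1.0 - np.exp(-r ** 2 / (4.0 * sigma2))) / kappa

def min_contrast(r_grid, K_hat, c=0.25):
    """Estimate (kappa, sigma2) by minimizing the integrated squared
    difference between K_hat^c and the model K^c (c = 0.25 is conventional)."""
    def obj(log_theta):
        kappa, sigma2 = np.exp(log_theta)
        return np.trapz((K_hat ** c - thomas_K(r_grid, kappa, sigma2) ** c) ** 2,
                        r_grid)
    res = minimize(obj, x0=np.log([10.0, 0.01]), method="Nelder-Mead")
    return np.exp(res.x)
```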


Journal ArticleDOI
TL;DR: It is shown that Murphy's model is a special case of Robins's and that the methods are closely related but not equivalent; interesting features of the methods are highlighted using the Multicenter AIDS Cohort Study and through simulation.
Abstract: A dynamic regime is a function that takes treatment and covariate history and baseline covariates as inputs and returns a decision to be made. Murphy (2003, Journal of the Royal Statistical Society, Series B 65, 331-366) and Robins (2004, Proceedings of the Second Seattle Symposium on Biostatistics, 189-326) have proposed models and developed semiparametric methods for making inference about the optimal regime in a multi-interval trial that provide clear advantages over traditional parametric approaches. We show that Murphy's model is a special case of Robins's and that the methods are closely related but not equivalent. Interesting features of the methods are highlighted using the Multicenter AIDS Cohort Study and through simulation.

221 citations


Journal ArticleDOI
TL;DR: A spatial scan statistic based on an exponential model is proposed to handle either uncensored or censored continuous survival data; it performs well for different survival distribution functions including the exponential, gamma, and log-normal distributions.
Abstract: Spatial scan statistics with Bernoulli and Poisson models are commonly used for geographical disease surveillance and cluster detection. These models, suitable for count data, were not designed for data with continuous outcomes. We propose a spatial scan statistic based on an exponential model to handle either uncensored or censored continuous survival data. The power and sensitivity of the developed model are investigated through intensive simulations. The method performs well for different survival distribution functions including the exponential, gamma, and log-normal distributions. We also present a method to adjust the analysis for covariates. The cluster detection method is illustrated using survival data for men diagnosed with prostate cancer in Connecticut from 1984 to 1995.

186 citations
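
A sketch of the core computation, under the simplifying assumption of fully uncensored exponential survival times: for each candidate window, compare the likelihood with separate rates inside and outside against a common rate. Censoring adjustments, covariate adjustment, and the Monte Carlo significance step from the paper are omitted.

```python
import numpy as np

def exp_scan_llr(times, inside):
    """LLR for one window: exponential rates fitted separately inside and
    outside versus one common rate (uncensored case; the constant -n terms
    cancel between numerator and denominator)."""
    t_in, t_out = times[inside], times[~inside]
    n_in, n_out = len(t_in), len(t_out)
    if n_in == 0 or n_out == 0:
        return 0.0
    return (n_in * np.log(n_in / t_in.sum())
            + n_out * np.log(n_out / t_out.sum())
            - len(times) * np.log(len(times) / times.sum()))

def most_likely_cluster(times, xy, centers, radii):
    """Scan all (center, radius) windows; significance of the maximum LLR
    would be assessed by Monte Carlo replication, as in scan-statistic
    practice."""
    best_llr, best_win = -np.inf, None
    for c in centers:
        d = np.linalg.norm(xy - c, axis=1)
        for r in radii:
            llr = exp_scan_llr(times, d <= r)
            if llr > best_llr:
                best_llr, best_win = llr, (c, r)
    return best_llr, best_win
```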


Journal ArticleDOI
TL;DR: This work examines two natural bivariate von Mises distributions--referred to as Sine and Cosine models--which have five parameters and, for concentrated data, tend to a bivariate normal distribution, and finds that the Cosine model may be preferred.
Abstract: A fundamental problem in bioinformatics is to characterize the secondary structure of a protein, which has traditionally been carried out by examining a scatterplot (Ramachandran plot) of the conformational angles. We examine two natural bivariate von Mises distributions—referred to as Sine and Cosine models—which have five parameters and, for concentrated data, tend to a bivariate normal distribution. These are analyzed and their main properties derived. Conditions on the parameters are established which result in bimodal behavior for the joint density and the marginal distribution, and we note an interesting situation in which the joint density is bimodal but the marginal distributions are unimodal. We carry out comparisons of the two models, and it is seen that the Cosine model may be preferred. Mixture distributions of the Cosine model are fitted to two representative protein datasets using the expectation maximization algorithm, which results in an objective partition of the scatterplot into a number of components. Our results are consistent with empirical observations; new insights are discussed.

176 citations
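
For reference, the unnormalized densities of the two five-parameter models have simple closed forms; the sketch below writes them out, with the caveat that the sign convention on the interaction terms varies across references and the normalizing constants are omitted.

```python
import numpy as np

def sine_density_unnorm(phi, psi, mu, nu, k1, k2, lam):
    """Unnormalized Sine-model density:
    exp{k1 cos(phi-mu) + k2 cos(psi-nu) + lam sin(phi-mu) sin(psi-nu)}.
    For large k1, k2 a Taylor expansion gives a bivariate normal limit."""
    return np.exp(k1 * np.cos(phi - mu) + k2 * np.cos(psi - nu)
                  + lam * np.sin(phi - mu) * np.sin(psi - nu))

def cosine_density_unnorm(phi, psi, mu, nu, k1, k2, k3):
    """Unnormalized Cosine-model density; the interaction enters through the
    cosine of the difference of the centered angles."""
    return np.exp(k1 * np.cos(phi - mu) + k2 * np.cos(psi - nu)
                  - k3 * np.cos(phi - mu - psi + nu))
```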


Journal ArticleDOI
TL;DR: An empirical Bayes method (E-BAYES) is developed to map epistatic QTL under the mixed model framework and appears to outperform all other methods in terms of minimizing the mean-squared error (MSE) with relatively short computing time.
Abstract: The genetic variance of a quantitative trait is often controlled by the segregation of multiple interacting loci. Linear model regression analysis is usually applied to estimating and testing effects of these quantitative trait loci (QTL). Including all the main effects and the effects of interaction (epistatic effects), the dimension of the linear model can be extremely high. Variable selection via stepwise regression or stochastic search variable selection (SSVS) is the common procedure for epistatic effect QTL analysis. These methods are computationally intensive, yet they may not be optimal. The LASSO (least absolute shrinkage and selection operator) method is computationally more efficient than the above methods. As a result, it has been widely used in regression analysis for large models. However, LASSO has never been applied to genetic mapping for epistatic QTL, where the number of model effects is typically many times larger than the sample size. In this study, we developed an empirical Bayes method (E-BAYES) to map epistatic QTL under the mixed model framework. We also tested the feasibility of using LASSO to estimate epistatic effects, examined the fully Bayesian SSVS, and reevaluated the penalized likelihood (PENAL) methods in mapping epistatic QTL. Simulation studies showed that all the above methods performed satisfactorily well. However, E-BAYES appears to outperform all other methods in terms of minimizing the mean-squared error (MSE) with relatively short computing time. Application of the new method to real data was demonstrated using a barley dataset.

171 citations
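
To see the scale of the problem, the sketch below builds the main-plus-pairwise-interaction design matrix from a hypothetical marker matrix and fits the LASSO comparator with a cross-validated penalty; the paper's E-BAYES procedure itself is a mixed-model empirical Bayes method and is not reproduced here.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LassoCV

def epistatic_design(G):
    """Expand an (n x m) marker matrix (coded e.g. -1/1) into main effects
    plus all pairwise interaction (epistatic) columns: m + m(m-1)/2 terms."""
    cols = [G] + [G[:, [i]] * G[:, [j]]
                  for i, j in combinations(range(G.shape[1]), 2)]
    return np.hstack(cols)

# Hypothetical sizes: p = m + m(m-1)/2 can vastly exceed n, which is the
# regime where shrinkage methods such as LASSO (or E-BAYES) operate.
rng = np.random.default_rng(1)
G = rng.choice([-1.0, 1.0], size=(150, 20))
X = epistatic_design(G)                       # 150 x 210
y = 1.5 * G[:, 2] + 2.0 * G[:, 4] * G[:, 9] + rng.normal(size=150)
fit = LassoCV(cv=5).fit(X, y)
print((np.abs(fit.coef_) > 0.1).sum(), "effects retained")
```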



Journal ArticleDOI
TL;DR: Simulations show good estimation and confidence interval performance by the proposed RPM approach under unmeasured confounding relative to the standard mediation approach, but poor performance under departures from the structural interaction assumptions.
Abstract: We present a linear rank preserving model (RPM) approach for analyzing mediation of a randomized baseline intervention's effect on a univariate follow-up outcome. Unlike standard mediation analyses, our approach does not assume that the mediating factor is also randomly assigned to individuals in addition to the randomized baseline intervention (i.e., sequential ignorability), but does make several structural interaction assumptions that currently are untestable. The G-estimation procedure for the proposed RPM represents an extension of the work on direct effects of randomized intervention effects for survival outcomes by Robins and Greenland (1994, Journal of the American Statistical Association 89, 737-749) and on intervention non-adherence by Ten Have et al. (2004, Journal of the American Statistical Association 99, 8-16). Simulations show good estimation and confidence interval performance by the proposed RPM approach under unmeasured confounding relative to the standard mediation approach, but poor performance under departures from the structural interaction assumptions. The trade-off between these assumptions is evaluated in the context of two suicide/depression intervention studies.

167 citations


Journal ArticleDOI
TL;DR: The weighted gap and difference of difference-weighted (DD-weighted) gap methods are proposed for estimating the number of clusters in data using the weighted within-clusters sum of errors, a measure of within-clusters homogeneity.
Abstract: Estimating the number of clusters in a data set is a crucial step in cluster analysis. In this article, motivated by the gap method (Tibshirani, Walther, and Hastie, 2001, Journal of the Royal Statistical Society B 63, 411-423), we propose the weighted gap and the difference of difference-weighted (DD-weighted) gap methods for estimating the number of clusters in data using the weighted within-clusters sum of errors: a measure of the within-clusters homogeneity. In addition, we propose a "multilayer" clustering approach, which is shown to be more accurate than the original gap method, particularly in detecting the nested cluster structure of the data. The methods are applicable when the input data contain continuous measurements and can be used with any clustering method. Simulation studies and real data are investigated and compared among these proposed methods as well as with the original gap method.

152 citations
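
For orientation, here is a compact version of the original gap method that the weighted variants build on; the weighting of the within-clusters sum of errors and the DD-differencing are the paper's contribution and are not reproduced. The argmax selection rule below is a simplification of Tibshirani et al.'s standard-error rule.

```python
import numpy as np
from sklearn.cluster import KMeans

def within_sum(X, labels):
    """Pooled within-cluster sum of squared errors W_k."""
    return sum(((X[labels == c] - X[labels == c].mean(0)) ** 2).sum()
               for c in np.unique(labels))

def gap_statistic(X, kmax=8, B=20, seed=0):
    """Gap(k) = mean_b log W_k(reference_b) - log W_k(data), with the
    reference drawn uniformly over the data's bounding box."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(0), X.max(0)
    gaps = []
    for k in range(1, kmax + 1):
        logW = np.log(within_sum(X, KMeans(k, n_init=10).fit(X).labels_))
        ref = [np.log(within_sum(U, KMeans(k, n_init=10).fit(U).labels_))
               for U in (rng.uniform(lo, hi, X.shape) for _ in range(B))]
        gaps.append(np.mean(ref) - logW)
    return int(np.argmax(gaps)) + 1   # simplified selection rule
```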


Journal ArticleDOI
TL;DR: In observational studies, a method of sensitivity analysis is developed for m-tests, m-intervals, and m-estimates: it shows the extent to which inferences would be altered by biases of various magnitudes due to nonrandom treatment assignment.
Abstract: Huber's m-estimates use an estimating equation in which observations are permitted a controlled level of influence. The family of m-estimates includes least squares and maximum likelihood, but typical applications give extreme observations limited weight. Maritz proposed methods of exact and approximate permutation inference for m-tests, confidence intervals, and estimators, which can be derived from random assignment of paired subjects to treatment or control. In contrast, in observational studies, where treatments are not randomly assigned, subjects matched for observed covariates may differ in terms of unobserved covariates, so differing outcomes may not be treatment effects. In observational studies, a method of sensitivity analysis is developed for m-tests, m-intervals, and m-estimates: it shows the extent to which inferences would be altered by biases of various magnitudes due to nonrandom treatment assignment. The method is developed for both matched pairs, with one treated subject matched to one control, and for matched sets, with one treated subject matched to one or more controls. The method is illustrated using two studies: (i) a paired study of damage to DNA from exposure to chromium and nickel and (ii) a study with one or two matched controls comparing side effects of two drug regimes to treat tuberculosis. The approach yields sensitivity analyses for: (i) m-tests with Huber's weight function and other robust weight functions, (ii) the permutational t-test which uses the observations directly, and (iii) various other procedures such as the sign test, Noether's test, and the permutation distribution of the efficient score test for a location family of distributions. Permutation inference with covariance adjustment is briefly discussed.
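
The flavor of such a sensitivity analysis is easy to show for the simplest member of the family, the sign test on matched pairs: a bias of magnitude Gamma bounds the chance of a positive pair difference, giving a worst-case p-value for each Gamma. The paper's m-test machinery with robust weight functions generalizes well beyond this sketch; the counts below are made-up inputs.

```python
from scipy.stats import binom

def sign_test_sensitivity(n_pos, n_pairs, gammas=(1.0, 1.5, 2.0, 3.0)):
    """Sensitivity bounds for the sign test on matched pairs. Under bias of
    magnitude Gamma, the chance that a pair shows a positive difference
    (absent a treatment effect) lies in [1/(1+Gamma), Gamma/(1+Gamma)];
    the worst-case one-sided p-value uses the upper endpoint."""
    for g in gammas:
        p_upper = binom.sf(n_pos - 1, n_pairs, g / (1 + g))
        print(f"Gamma={g:>4}: worst-case one-sided p = {p_upper:.4f}")

sign_test_sensitivity(n_pos=38, n_pairs=50)  # hypothetical study counts
```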

Journal ArticleDOI
TL;DR: A semiparametric method is proposed to jointly model the recurrent and terminal event processes, with the dependence modeled by a shared gamma frailty that is included in both the recurrent event rate and the terminal event hazard function.
Abstract: In clinical and observational studies, recurrent event data (e.g., hospitalization) with a terminal event (e.g., death) are often encountered. In many instances, the terminal event is strongly correlated with the recurrent event process. In this article, we propose a semiparametric method to jointly model the recurrent and terminal event processes. The dependence is modeled by a shared gamma frailty that is included in both the recurrent event rate and terminal event hazard function. Marginal models are used to estimate the regression effects on the terminal and recurrent event processes, and a Poisson model is used to estimate the dispersion of the frailty variable. A sandwich estimator is used to achieve additional robustness. An analysis of hospitalization data for patients in the peritoneal dialysis study is presented to illustrate the proposed method.
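
The shared-frailty structure described here can be written generically as below; this is a sketch consistent with the abstract, not the paper's exact parameterization.

```latex
% Generic shared gamma frailty structure for joint recurrent/terminal
% event modeling:  u_i ~ Gamma(1/theta, 1/theta), E[u_i] = 1, Var[u_i] = theta
\begin{align*}
  \text{recurrent event rate:} \quad
    r_i(t \mid u_i) &= u_i \, r_0(t) \, e^{x_i^\top \beta} \\
  \text{terminal event hazard:} \quad
    h_i(t \mid u_i) &= u_i \, h_0(t) \, e^{x_i^\top \gamma}
\end{align*}
```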

Journal ArticleDOI
TL;DR: A theoretical result is found which states that whenever a subset of fixed-effects parameters not included in the random-effects structure equals zero, the corresponding maximum likelihood estimator will consistently estimate zero; this implies that under certain conditions a significant effect could be considered a reliable result, even if the random-effects distribution is misspecified.
Abstract: Generalized linear mixed models (GLMMs) have become a frequently used tool for the analysis of non-Gaussian longitudinal data. Estimation is based on maximum likelihood theory, which assumes that the underlying probability model is correctly specified. Recent research is showing that the results obtained from these models are not always robust against departures from the assumptions on which these models are based. In the present work we have used simulations with a logistic random-intercept model to study the impact of misspecifying the random-effects distribution on the type I and II errors of the tests for the mean structure in GLMMs. We found that the misspecification can either increase or decrease the power of the tests, depending on the shape of the underlying random-effects distribution, and it can considerably inflate the type I error rate. Additionally, we have found a theoretical result which states that whenever a subset of fixed-effects parameters, not included in the random-effects structure equals zero, the corresponding maximum likelihood estimator will consistently estimate zero. This implies that under certain conditions a significant effect could be considered as a reliable result, even if the random-effects distribution is misspecified.

Journal ArticleDOI
TL;DR: The methodology is illustrated by comparing different pooling algorithms for the detection of individuals recently infected with HIV in North Carolina and Malawi.
Abstract: We derive and compare the operating characteristics of hierarchical and square array-based testing algorithms for case identification in the presence of testing error. The operating characteristics investigated include efficiency (i.e., expected number of tests per specimen) and error rates (i.e., sensitivity, specificity, positive and negative predictive values, per-family error rate, and per-comparison error rate). The methodology is illustrated by comparing different pooling algorithms for the detection of individuals recently infected with HIV in North Carolina and Malawi.
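
For the hierarchical case, the expected number of tests per specimen has a simple closed form in the two-stage (Dorfman) special case, shown below with an imperfect assay; the paper's square-array algorithms and fuller error-rate operating characteristics go beyond this sketch.

```python
def dorfman_efficiency(p, k, sens=1.0, spec=1.0):
    """Expected tests per specimen for two-stage (Dorfman) pooling with pool
    size k, prevalence p, and an imperfect assay. A pool is flagged if it
    contains a case and the assay detects it (sens), or it is clean and the
    assay false-alarms (1 - spec); flagged pools are retested specimen by
    specimen."""
    p_pool_clean = (1 - p) ** k
    p_pool_flagged = sens * (1 - p_pool_clean) + (1 - spec) * p_pool_clean
    return 1.0 / k + p_pool_flagged

for k in (4, 5, 8, 10):
    print(k, round(dorfman_efficiency(p=0.01, k=k, sens=0.99, spec=0.99), 3))
```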

Journal ArticleDOI
TL;DR: This work addresses the problem of selecting which variables should be included in the fixed and random components of logistic mixed effects models for correlated data using a stochastic search Gibbs sampler to estimate the exact model-averaged posterior distribution.
Abstract: We address the problem of selecting which variables should be included in the fixed and random components of logistic mixed effects models for correlated data. A fully Bayesian variable selection is implemented using a stochastic search Gibbs sampler to estimate the exact model-averaged posterior distribution. This approach automatically identifies subsets of predictors having nonzero fixed effect coefficients or nonzero random effects variance, while allowing uncertainty in the model selection process. Default priors are proposed for the variance components and an efficient parameter expansion Gibbs sampler is developed for posterior computation. The approach is illustrated using simulated data and an epidemiologic example.

Journal ArticleDOI
TL;DR: It is found that flexible rules, like artificial neural nets, classification and regression trees, or regression splines, can be assessed and compared to less flexible rules on the same data in which they were developed.
Abstract: Estimates of the prediction error play an important role in the development of statistical methods and models, and in their applications. We adapt the resampling tools of Efron and Tibshirani (1997, Journal of the American Statistical Association 92, 548-560) to survival analysis with right-censored event times. We find that flexible rules, like artificial neural nets, classification and regression trees, or regression splines can be assessed, and compared to less flexible rules in the same data where they are developed. The methods are illustrated with data from a breast cancer trial.

Journal ArticleDOI
TL;DR: It is argued that the predictive capacity of a marker has to do with the population distribution of risk given the marker and a graphical tool, the predictiveness curve, is suggested that displays this distribution.
Abstract: Consider a continuous marker for predicting a binary outcome. For example, the serum concentration of prostate specific antigen may be used to calculate the risk of finding prostate cancer in a biopsy. In this article, we argue that the predictive capacity of a marker has to do with the population distribution of risk given the marker and suggest a graphical tool, the predictiveness curve, that displays this distribution. The display provides a common meaningful scale for comparing markers that may not be comparable on their original scales. Some existing measures of predictiveness are shown to be summary indices derived from the predictiveness curve. We develop methods for making inference about the predictiveness curve, for making pointwise comparisons between two curves, and for evaluating covariate effects. Applications to risk prediction markers in cancer and cystic fibrosis are discussed.
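
An empirical version of the curve is straightforward to draw: estimate risk as a function of the marker (a logistic model is assumed here purely for illustration), then plot the sorted risks against their percentiles, with the outcome prevalence as the reference line for a useless marker.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

def predictiveness_curve(marker, outcome):
    """Empirical predictiveness curve: model risk(Y=1 | marker), then plot
    the risk quantiles R(v) against v in (0,1). A curve flat at the
    prevalence means the marker is uninformative; a steep curve spreads
    individuals across low- and high-risk groups."""
    risk = (LogisticRegression()
            .fit(marker.reshape(-1, 1), outcome)
            .predict_proba(marker.reshape(-1, 1))[:, 1])
    v = (np.arange(len(risk)) + 0.5) / len(risk)
    plt.plot(v, np.sort(risk))
    plt.axhline(outcome.mean(), ls="--")   # prevalence reference line
    plt.xlabel("risk percentile v"); plt.ylabel("risk R(v)")
    plt.show()
```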

Journal ArticleDOI
TL;DR: The methods are illustrated using data from a nested cross-sectional cluster intervention trial on reducing underage drinking; the tendency to undercoverage resulting from the substantial variability of sandwich estimators counteracts the impact of overcorrecting the bias.
Abstract: Mancl and DeRouen (2001, Biometrics 57, 126-134) and Kauermann and Carroll (2001, JASA 96, 1387-1398) proposed alternative bias-corrected covariance estimators for generalized estimating equations parameter estimates of regression models for marginal means. The finite sample properties of these estimators are compared to those of the uncorrected sandwich estimator that underestimates variances in small samples. Although the formula of Mancl and DeRouen generally overestimates variances, it often leads to coverage of 95% confidence intervals near the nominal level even in some situations with as few as 10 clusters. An explanation for these seemingly contradictory results is that the tendency to undercoverage resulting from the substantial variability of sandwich estimators counteracts the impact of overcorrecting the bias. However, these positive results do not generally hold; for small cluster sizes (e.g., <10) their estimator often results in overcoverage, and the bias-corrected covariance estimator of Kauermann and Carroll may be preferred. The methods are illustrated using data from a nested cross-sectional cluster intervention trial on reducing underage drinking.

Journal ArticleDOI
TL;DR: This work proposes a simple extension to the continual reassessment method (CRM), called the Quasi-CRM, to incorporate grade information, which is superior to the standard CRM and comparable to a univariate version of the Bekele and Thall method.
Abstract: We consider the case of phase I trials for treatment of cancer or other severe diseases in which grade information is available about the severity of toxicity. Most dose allocation procedures dichotomize toxicity grades based on being dose limiting, which may not work well for severe and possibly irreversible toxicities such as renal, liver, and neurological toxicities, or toxicities with long duration. We propose a simple extension to the continual reassessment method (CRM), called the Quasi-CRM, to incorporate grade information. Toxicity grades are first converted to numeric scores that reflect their impacts on the dose allocation procedure, and then incorporated into the CRM using the quasi-Bernoulli likelihood. A simulation study demonstrates that the Quasi-CRM is superior to the standard CRM and comparable to a univariate version of the Bekele and Thall method (2004, Journal of the American Statistical Association 99, 26-35). We also present sensitivity analysis of the new method with respect to toxicity scores, and discuss practical issues such as extending the simple algorithmic up-and-down designs.
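
A minimal sketch of the update step, assuming the usual one-parameter CRM power model; the fractional toxicity scores enter through the quasi-Bernoulli likelihood exactly as the abstract describes, but the skeleton, target score, prior, and patient data below are all illustrative.

```python
import numpy as np

def quasi_crm(doses, scores, skeleton, target=0.25, prior_sd=1.34):
    """Quasi-CRM dose recommendation sketch. Toxicity grades are first
    mapped to fractional scores s in [0,1]; the one-parameter power model
    p_d = skeleton_d ** exp(a) is then updated with the quasi-Bernoulli
    likelihood prod p^s (1-p)^(1-s), using quadrature over a normal prior
    on a. Returns the dose whose estimated score is closest to target."""
    a = np.linspace(-4, 4, 801)
    prior = np.exp(-0.5 * (a / prior_sd) ** 2)
    p = skeleton[doses][:, None] ** np.exp(a)        # (n_patients, n_grid)
    s = np.asarray(scores)[:, None]
    loglik = (s * np.log(p) + (1 - s) * np.log(1 - p)).sum(0)
    post = prior * np.exp(loglik - loglik.max())
    a_hat = np.trapz(a * post, a) / np.trapz(post, a)
    p_hat = skeleton ** np.exp(a_hat)
    return int(np.argmin(np.abs(p_hat - target)))

skeleton = np.array([0.05, 0.12, 0.25, 0.40])        # hypothetical skeleton
print(quasi_crm(doses=np.array([0, 0, 1, 1, 2]),
                scores=[0.0, 0.25, 0.0, 0.5, 1.0], skeleton=skeleton))
```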

Journal ArticleDOI
TL;DR: A Gaussian process functional regression model is proposed for the analysis of batch data that models the nonlinear relationship between a functional output variable and a set of functional and nonfunctional covariates.
Abstract: A Gaussian process functional regression model is proposed for the analysis of batch data. Covariance structure and mean structure are considered simultaneously, with the covariance structure modeled by a Gaussian process regression model and the mean structure modeled by a functional regression model. The model allows the inclusion of covariates in both the covariance structure and the mean structure. It models the nonlinear relationship between a functional output variable and a set of functional and nonfunctional covariates. Several applications and simulation studies are reported and show that the method provides very good results for curve fitting and prediction.
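
A toy version of the model structure, with the mean given by a functional regression (the basis function B and coefficients beta are hypothetical stand-ins) and the covariance by a squared-exponential Gaussian process with fixed hyperparameters; the paper estimates all of these rather than fixing them.

```python
import numpy as np

def gp_functional_regression(t_train, y_train, t_new, beta, B,
                             ls=1.0, sig_f=1.0, sig_n=0.1):
    """Predict a curve as mean-plus-GP: mu(t) = B(t) @ beta comes from a
    functional regression (B returns a basis/covariate row per time point),
    and the residual curve follows a squared-exponential GP."""
    def kern(a, b):
        return sig_f ** 2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2
                                   / ls ** 2)
    mu_tr, mu_new = B(t_train) @ beta, B(t_new) @ beta
    K = kern(t_train, t_train) + sig_n ** 2 * np.eye(len(t_train))
    Ks = kern(t_new, t_train)
    # GP conditional mean around the functional-regression mean
    return mu_new + Ks @ np.linalg.solve(K, y_train - mu_tr)
```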

Journal ArticleDOI
TL;DR: A time-dependent copula model is proposed in the observable region of the data; it is more flexible than standard parametric copula models for the dependence between the events and permits estimation of the marginal distribution under weaker assumptions than in previous work on competing risks data.
Abstract: Semicompeting risks data are often encountered in clinical trials with intermediate endpoints subject to dependent censoring from informative dropout. Unlike with competing risks data, dropout may not be dependently censored by the intermediate event. There has recently been increased attention to these data, in particular inferences about the marginal distribution of the intermediate event without covariates. In this article, we incorporate covariates and formulate their effects on the survival function of the intermediate event via a functional regression model. To accommodate informative censoring, a time-dependent copula model is proposed in the observable region of the data which is more flexible than standard parametric copula models for the dependence between the events. The model permits estimation of the marginal distribution under weaker assumptions than in previous work on competing risks data. New nonparametric estimators for the marginal and dependence models are derived from nonlinear estimating equations and are shown to be uniformly consistent and to converge weakly to Gaussian processes. Graphical model checking techniques are presented for the assumed models. Nonparametric tests are developed accordingly, as are inferences for parametric submodels for the time-varying covariate effects and copula parameters. A novel time-varying sensitivity analysis is developed using the estimation procedures. Simulations and an AIDS data analysis demonstrate the practical utility of the methodology.

Journal ArticleDOI
TL;DR: Simulation studies show that the proposed design saves sample size, has better power, and efficiently assigns more patients to doses with higher efficacy levels, compared to a conventional phase I and phase II trial.
Abstract: The use of multiple drugs in a single clinical trial or as a therapeutic strategy has become common, particularly in the treatment of cancer. Because traditional trials are designed to evaluate one agent at a time, the evaluation of therapies in combination requires specialized trial designs. In place of the traditional separate phase I and II trials, we propose using a parallel phase I/II clinical trial to evaluate simultaneously the safety and efficacy of combination dose levels, and select the optimal combination dose. The trial is started with an initial period of dose escalation, then patients are randomly assigned to admissible dose levels. These dose levels are compared with each other. Bayesian posterior probabilities are used in the randomization to adaptively assign more patients to doses with higher efficacy levels. Combination doses with lower efficacy are temporarily closed and those with intolerable toxicity are eliminated from the trial. The trial is stopped if the posterior probability for safety, efficacy, or futility crosses a prespecified boundary. For illustration, we apply the design to a combination chemotherapy trial for leukemia. We use simulation studies to assess the operating characteristics of the parallel phase I/II trial design, and compare it to a conventional design for a standard phase I and phase II trial. The simulations show that the proposed design saves sample size, has better power, and efficiently assigns more patients to doses with higher efficacy levels.
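
The adaptive randomization step can be sketched with conjugate Beta posteriors: assign the next patient to each admissible combination dose with probability proportional to the posterior chance that it has the highest efficacy. The safety screening, temporary closing of arms, and stopping boundaries of the full design are omitted, and the Beta(0.5, 0.5) prior is an assumption.

```python
import numpy as np

def adaptive_randomization_probs(successes, trials, ndraw=10000, seed=0):
    """With Beta(0.5 + s, 0.5 + n - s) posteriors on each dose's efficacy
    probability, approximate P(dose d has the highest efficacy) by Monte
    Carlo and use those probabilities as randomization weights."""
    rng = np.random.default_rng(seed)
    draws = np.column_stack([
        rng.beta(0.5 + s, 0.5 + n - s, size=ndraw)
        for s, n in zip(successes, trials)])
    win = np.bincount(draws.argmax(axis=1), minlength=len(trials))
    return win / ndraw

# hypothetical interim data for three admissible combination doses
print(adaptive_randomization_probs(successes=[2, 5, 6], trials=[10, 12, 10]))
```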

Journal ArticleDOI
TL;DR: A crossed design is developed, together with methods that exploit the additional information from such a design, to address problems of line transect sampling on small survey plots.
Abstract: Interest in surveys for monitoring plant abundance is increasing, due in part to the need to quantify the rate of loss of biodiversity. Line transect sampling offers an efficient way to monitor many species. However, the method does not work well in some circumstances, for example on small survey plots, when the plant species has a strongly aggregated distribution, or when plants that are on the line are not easily detected. We develop a crossed design, together with methods that exploit the additional information from such a design, to address these problems. The methods are illustrated using data on a colony of cowslips.

Journal ArticleDOI
TL;DR: This article shows how case-control data can be used to overcome the problems encountered when using the same data to estimate both a spatially varying intensity and second-order properties, and proposes a semiparametric method for adjusting the estimate of intensity so as to take account of explanatory variables attached to the cases and controls.
Abstract: Methods for the statistical analysis of stationary spatial point process data are now well established, methods for nonstationary processes less so. One of many sources of nonstationary point process data is a case-control study in environmental epidemiology. In that context, the data consist of a realization of each of two spatial point processes representing the locations, within a specified geographical region, of individual cases of a disease and of controls drawn at random from the population at risk. In this article, we extend work by Baddeley, Møller, and Waagepetersen (2000, Statistica Neerlandica 54, 329-350) concerning estimation of the second-order properties of a nonstationary spatial point process. First, we show how case-control data can be used to overcome the problems encountered when using the same data to estimate both a spatially varying intensity and second-order properties. Second, we propose a semiparametric method for adjusting the estimate of intensity so as to take account of explanatory variables attached to the cases and controls. Our primary focus is estimation, but we also propose a new test for spatial clustering that we show to be competitive with existing tests. We describe an application to an ecological study in which juvenile and surviving adult trees assume the roles of controls and cases.

Journal ArticleDOI
TL;DR: It is demonstrated that LASSO outperforms PLS in terms of prediction error when the list of covariates includes a moderate to large percentage of useless or noise variables; otherwise, PLS may outperform LASSO.
Abstract: We consider the problem of predicting survival times of cancer patients from the gene expression profiles of their tumor samples via linear regression modeling of log-transformed failure times. The partial least squares (PLS) and least absolute shrinkage and selection operator (LASSO) methodologies are used for this purpose where we first modify the data to account for censoring. Three approaches of handling right censored data-reweighting, mean imputation, and multiple imputation-are considered. Their performances are examined in a detailed simulation study and compared with that of full data PLS and LASSO had there been no censoring. A major objective of this article is to investigate the performances of PLS and LASSO in the context of microarray data where the number of covariates is very large and there are extremely few samples. We demonstrate that LASSO outperforms PLS in terms of prediction error when the list of covariates includes a moderate to large percentage of useless or noise variables; otherwise, PLS may outperform LASSO. For a moderate sample size (100 with 10,000 covariates), LASSO performed better than a no covariate model (or noise-based prediction). The mean imputation method appears to best track the performance of the full data PLS or LASSO. The mean imputation scheme is used on an existing data set on lung cancer. This reanalysis using the mean imputed PLS and LASSO identifies a number of genes that were known to be related to cancer or tumor activities from previous studies.

Journal ArticleDOI
TL;DR: A new model is proposed for Microarray-CGH (comparative genomic hybridization) experiments, combining a segmentation model with a mixture model, and a new hybrid algorithm called dynamic programming-expectation maximization (DP-EM) is presented to estimate the parameters of the model by maximum likelihood.
Abstract: Microarray-CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm called dynamic programming-expectation maximization (DP-EM) to estimate the parameters of the model by maximum likelihood. This algorithm combines DP and the EM algorithm. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.
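
The clustering half of the segmentation/clustering idea can be sketched on its own: given segment means from any segmentation, a small one-dimensional Gaussian-mixture EM assigns each segment a biological status such as loss, normal, or gain. The paper's DP-EM estimates both halves jointly, which this sketch does not attempt, and the three-status setup and starting values are assumptions.

```python
import numpy as np

def em_segment_status(seg_means, n_iter=100):
    """Cluster CGH segment means into three statuses (loss/normal/gain)
    with a 1-D Gaussian mixture fitted by EM (common pooled variance)."""
    mu = np.array([-0.5, 0.0, 0.5])       # assumed initial status levels
    sd, pi = 0.2, np.ones(3) / 3
    x = np.asarray(seg_means)[:, None]
    for _ in range(n_iter):
        resp = pi * np.exp(-0.5 * ((x - mu) / sd) ** 2)    # E-step
        resp /= resp.sum(1, keepdims=True)
        nk = resp.sum(0)                                    # M-step
        mu = (resp * x).sum(0) / nk
        sd = np.sqrt((resp * (x - mu) ** 2).sum() / len(x))
        pi = nk / len(x)
    return resp.argmax(1), mu   # status labels and fitted status levels
```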

Journal ArticleDOI
TL;DR: A hierarchical Bayesian framework is adopted for modeling the invasion of invasive species while addressing the discrete nature of the data and uncertainty associated with the probability of detection, and the importance of accommodating spatially varying dispersal rates.
Abstract: The growth and dispersal of biotic organisms is an important subject in ecology. Ecologists are able to accurately describe survival and fecundity in plant and animal populations and have developed quantitative approaches to study the dynamics of dispersal and population size. Of particular interest are the dynamics of invasive species. Such nonindigenous animals and plants can levy significant impacts on native biotic communities. Effective models for relative abundance have been developed; however, a better understanding of the dynamics of actual population size (as opposed to relative abundance) in an invasion would be beneficial to all branches of ecology. In this article, we adopt a hierarchical Bayesian framework for modeling the invasion of such species while addressing the discrete nature of the data and uncertainty associated with the probability of detection. The nonlinear dynamics between discrete time points are intuitively modeled through an embedded deterministic population model with density-dependent growth and dispersal components. Additionally, we illustrate the importance of accommodating spatially varying dispersal rates. The method is applied to the specific case of the Eurasian Collared-Dove, an invasive species at mid-invasion in the United States at the time of this writing.

Journal ArticleDOI
TL;DR: A joint frailty model is used to simultaneously analyze disease recurrences and survival of patients with a recurrent disease and the effect of disease recurrence on survival can be estimated by this model.
Abstract: Therapy for patients with a recurrent disease focuses on delaying disease recurrence and prolonging survival. A common analysis approach for such data is to estimate the distribution of disease-free survival, that is, the time to the first disease recurrence or death, whichever happens first. However, treating death similarly as disease recurrence may give misleading results. Also considering only the first recurrence and ignoring subsequent ones can result in loss of statistical power. We use a joint frailty model to simultaneously analyze disease recurrences and survival. Separate parameters for disease recurrence and survival are used in the joint model to distinguish treatment effects on these two types of events. The correlation between disease recurrences and survival is taken into account by a shared frailty. The effect of disease recurrence on survival can also be estimated by this model. The EM algorithm is used to fit the model, with Markov chain Monte Carlo simulations in the E-steps. The method is evaluated by simulation studies and illustrated through a study of patients with heart failure. Sensitivity analysis for the parametric assumption of the frailty distribution is assessed by simulations.

Journal ArticleDOI
TL;DR: This work considers a relatively complex ODE model for HIV infection and a model for the observations including the issue of detection limits, and proposes a full likelihood inference, adapting a Newton-like algorithm for these particular models.
Abstract: The study of dynamical models of HIV infection, based on a system of nonlinear ordinary differential equations (ODE), has considerably improved the knowledge of its pathogenesis. While the first models used simplified ODE systems and analyzed each patient separately, recent works dealt with inference in non-simplified models borrowing strength from the whole sample. The complexity of these models leads to great difficulties for inference and only the Bayesian approach has been attempted by now. We propose a full likelihood inference, adapting a Newton-like algorithm for these particular models. We consider a relatively complex ODE model for HIV infection and a model for the observations including the issue of detection limits. We apply this approach to the analysis of a clinical trial of antiretroviral therapy (ALBI ANRS 070) and we show that the whole algorithm works well in a simulation study.
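
For readers unfamiliar with such systems, the sketch below integrates the standard target-cell-limited HIV model, a simpler cousin of the paper's ODE system (all parameter values are illustrative), and applies a lower detection limit to the simulated viral load, the issue the paper's observation model addresses.

```python
import numpy as np
from scipy.integrate import odeint

def hiv_ode(state, t, lam, delta, beta, mu, p, c):
    """Target-cell-limited HIV dynamics: uninfected T cells, infected
    cells, and free virus."""
    T, I, V = state
    dT = lam - delta * T - beta * T * V
    dI = beta * T * V - mu * I
    dV = p * I - c * V
    return [dT, dI, dV]

t = np.linspace(0, 100, 400)
sol = odeint(hiv_ode, y0=[1000.0, 0.0, 1e-3], t=t,
             args=(100.0, 0.1, 2e-4, 0.5, 100.0, 5.0))
# Observed viral load is log10(V) truncated at an assay detection limit,
# here an assumed 50 copies/mL:
obs = np.maximum(np.log10(sol[:, 2]), np.log10(50.0))
```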

Journal ArticleDOI
TL;DR: This work proposes two approximate likelihood methods for semiparametric NLME models with covariate measurement errors and nonignorable missing responses; simulations show both methods perform much better than the commonly used naive method.
Abstract: Semiparametric nonlinear mixed-effects (NLME) models are flexible for modeling complex longitudinal data. Covariates are usually introduced in the models to partially explain interindividual variations. Some covariates, however, may be measured with substantial errors. Moreover, the responses may be missing and the missingness may be nonignorable. We propose two approximate likelihood methods for semiparametric NLME models with covariate measurement errors and nonignorable missing responses. The methods are illustrated in a real data example. Simulation results show that both methods perform well and are much better than the commonly used naive method.