
Showing papers in "Biometrics in 2016"


Journal ArticleDOI
TL;DR: Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction is intended for a broad audience and succeeds in presenting material in a manner that is accessible to readers with a reasonable familiarity with mathematics and statistics.
Abstract: It is with some trepidation that we offer our review of Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. The field of causal inference is broad and known to have some strong personalities. A quick Google search shows how varied the responses to this book have been, ranging from unmitigated praise to derision. I (BES) agreed to review the book because I have been hoping to develop a course on causal inference and felt that this book could be a key component of my course. I recruited two graduate students from our biostatistics department, one in their first year and one in their second year, to read it with me this past spring semester. This review represents our combined efforts, but I (BES) certainly biased their thinking. The book is written by two prominent researchers in the area. Don Rubin's influential papers in the 1970s and 1980s provided the foundation for much of the current field of causal inference. Dr. Rubin's enormous impact on modern statistics, not just on causal inference, is well recognized. The other author, Guido Imbens, is no slacker either, with a very impressive bibliography of important papers in both economics and statistics journals. Together, they are more than qualified to write an introductory text on causal inference. The book is well written. It is intended for a broad audience and, in most respects, it succeeds in presenting material in a manner that is accessible to readers with a reasonable familiarity with mathematics and statistics. Some sections have a fair amount of mathematical content, but the authors maintain a narrative, easy-to-read writing style throughout. The text is very structured. Every chapter begins with an introduction, which provides a clear and interesting overview. Most chapters then present a data example that is used to frame the material and to demonstrate the application of the methods. The rich use of diverse, stimulating, and understandable data examples throughout is a big plus, although a limitation is that datasets and analysis code are not provided. Also, the example datasets are occasionally not ideal fits. For example, in Chapter 23, the authors introduce complier average causal effects in a study with a community-randomized intervention; the concept might have been introduced more simply with an individually randomized intervention. The authors foreshadow future content almost to a fault, repeatedly saying "in part/chapter/section X we will discuss Y." It is preferable to be overly clear rather than overly terse, and although the book is a little slow and tedious at times, it is easy to follow. There are typos throughout (probably one or two per chapter, and more than that in the surprisingly error-prone Conclusion chapter), but none are particularly serious, and in almost all cases the true meaning is easily discerned. Chapters 1–3 provide an intuitive and well-articulated introduction to the "Rubin Causal Model," in which inference regarding potential outcomes is treated as a missing data problem. As the authors focus solely on binary treatments at a single time point, there are only two potential outcomes for each individual, one of which is never observed. Basic assumptions and philosophies, assignment mechanisms, and a brief history of the potential outcomes approach to causal inference are provided. Some of these ideas are not uniformly accepted. For example, the dictum "no causation without manipulation" is still debated.
And, reading a history of causal inference by Don Rubin kind of feels like reading a history of the establishment of the United States by Thomas Jefferson: certainly interesting and undoubtedly written by a founder, but different from how Alexander Hamilton would write it. With that said, these chapters contain some of the

191 citations


Journal ArticleDOI
TL;DR: Inference procedures, for instance based on simultaneous confidence bands, are proposed for a single RMST curve and also for the difference between two RMST curves; the latter is informative for evaluating two groups under an equivalence or noninferiority setting and quantifies the difference between the two groups on a time scale.
Abstract: For a study with an event time as the endpoint, its survival function contains all the information regarding the temporal, stochastic profile of this outcome variable. The survival probability at a specific time point, say t, however, does not transparently capture the temporal profile of this endpoint up to t. An alternative is to use the restricted mean survival time (RMST) at time t to summarize the profile. The RMST is the mean survival time of all subjects in the study population followed up to t, and is simply the area under the survival curve up to t. The advantages of using such a quantification over the survival rate have been discussed in the setting of a fixed-time analysis. In this article, we generalize this approach by considering a curve based on the RMST over time as an alternative summary to the survival function. Inference procedures, for instance based on simultaneous confidence bands, are proposed for a single RMST curve and also for the difference between two RMST curves. The latter is informative for evaluating two groups under an equivalence or noninferiority setting, and quantifies the difference between the two groups on a time scale. The proposal is illustrated with data from two clinical trials, one from oncology and the other from cardiology.

190 citations
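
As a concrete illustration of the quantity discussed above, the sketch below computes an RMST curve as the area under a Kaplan-Meier estimate up to a grid of truncation times. It is a minimal sketch in R using the survival package and its bundled lung data as a stand-in; the rmst() helper and the choice of tau grid are ours, and the article's simultaneous confidence bands are not reproduced.

```r
# Minimal sketch: RMST(tau) as the area under the Kaplan-Meier curve up to tau.
library(survival)

rmst <- function(time, status, tau) {
  fit <- survfit(Surv(time, status) ~ 1)
  t <- c(0, fit$time[fit$time <= tau], tau)   # step-function breakpoints on [0, tau]
  s <- c(1, fit$surv[fit$time <= tau])        # survival on each interval
  sum(diff(t) * s)                            # area under the step function
}

# An RMST "curve": the summary evaluated over a grid of truncation times
taus <- 30 * (1:24)
plot(taus, sapply(taus, function(tau) rmst(lung$time, lung$status == 2, tau)),
     type = "l", xlab = "tau (days)", ylab = "RMST up to tau")
```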


Journal ArticleDOI
TL;DR: A testing procedure for mediation effects of high-dimensional continuous mediators is proposed and nine gene ontology sets with expression values that significantly mediate the effect of miR-223 on GBM survival are identified.
Abstract: Causal mediation modeling has become a popular approach for studying the effect of an exposure on an outcome through a mediator. However, current methods are not applicable to the setting with a large number of mediators. We propose a testing procedure for mediation effects of high-dimensional continuous mediators. We characterize the marginal mediation effect, the multivariate component-wise mediation effects, and the L2 norm of the component-wise effects, and develop a Monte-Carlo procedure for evaluating their statistical significance. To accommodate the setting with a large number of mediators and a small sample size, we further propose a transformation model using the spectral decomposition. Under the transformation model, mediation effects can be estimated using a series of regression models with a univariate transformed mediator, and examined by our proposed testing procedure. Extensive simulation studies are conducted to assess the performance of our methods for continuous and dichotomous outcomes. We apply the methods to analyze genomic data investigating the effect of microRNA miR-223 on a dichotomous survival status of patients with glioblastoma multiforme (GBM). We identify nine gene ontology sets with expression values that significantly mediate the effect of miR-223 on GBM survival.

114 citations
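
For orientation, the sketch below shows a single-mediator, low-dimensional analogue of the ingredients described above: the mediation effect as a product of regression coefficients and a Monte Carlo evaluation of its uncertainty. It is only a hedged illustration on simulated data; the article's procedure handles many mediators jointly, the L2 norm of component-wise effects, and a spectral-decomposition transformation that this sketch does not attempt.

```r
# Sketch: Monte Carlo assessment of a single-mediator product-of-coefficients
# mediation effect (exposure X -> mediator M -> outcome Y), on simulated data.
set.seed(1)
n <- 200
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.3 * M + 0.2 * X + rnorm(n)

fit_m <- lm(M ~ X)            # alpha: effect of exposure on mediator
fit_y <- lm(Y ~ X + M)        # beta: effect of mediator on outcome, given exposure
a <- coef(fit_m)["X"];  se_a <- summary(fit_m)$coef["X", "Std. Error"]
b <- coef(fit_y)["M"];  se_b <- summary(fit_y)$coef["M", "Std. Error"]

# Monte Carlo distribution of the mediation effect a * b
draws <- rnorm(1e5, a, se_a) * rnorm(1e5, b, se_b)
quantile(draws, c(0.025, 0.975))   # an interval excluding 0 suggests mediation
```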


Journal ArticleDOI
TL;DR: As possibly the first user of the software package now known as "spatstat", it is an honor and a pleasure to review Baddeley, Rubak, and Turner's wonderful new book entitled "Spatial Point Patterns: Methodology and Applications with R."
Abstract: As possibly the first user of the software package now known as "spatstat", it is an honor and a pleasure to review Baddeley, Rubak, and Turner's wonderful new book entitled "Spatial Point Patterns: Methodology and Applications with R."

91 citations


Journal ArticleDOI
TL;DR: In this paper, the authors developed new methods for estimating average treatment effects in observational studies, in settings with more than two treatment levels, assuming unconfoundedness given pretreatment variables.
Abstract: In this article, we develop new methods for estimating average treatment effects in observational studies, in settings with more than two treatment levels, assuming unconfoundedness given pretreatment variables. We emphasize propensity score subclassification and matching methods which have been among the most popular methods in the binary treatment literature. Whereas the literature has suggested that these particular propensity-based methods do not naturally extend to the multi-level treatment case, we show, using the concept of weak unconfoundedness and the notion of the generalized propensity score, that adjusting for a scalar function of the pretreatment variables removes all biases associated with observed pretreatment variables. We apply the proposed methods to an analysis of the effect of treatments for fibromyalgia. We also carry out a simulation study to assess the finite sample performance of the methods relative to previously proposed methods.

90 citations
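
The sketch below illustrates the generalized propensity score idea for a three-level treatment on simulated data: estimate P(T = t | X) with a multinomial logit and subclassify on the score for one treatment level. It is an assumption-laden toy version (the variable names and the quintile subclassification are our choices), not the article's full subclassification and matching estimators.

```r
# Sketch: generalized propensity scores via multinomial logit, then
# subclassification on the score for treatment level "B".
library(nnet)
set.seed(2)
n  <- 600
x1 <- rnorm(n); x2 <- rnorm(n)
p  <- cbind(1, exp(0.8 * x1), exp(-0.5 * x2)); p <- p / rowSums(p)
trt <- factor(apply(p, 1, function(pr) sample(c("A", "B", "C"), 1, prob = pr)))
y  <- 1 + (trt == "B") * 0.5 + (trt == "C") * 1.0 + 0.3 * x1 + rnorm(n)
dat <- data.frame(y, trt, x1, x2)

gps <- predict(multinom(trt ~ x1 + x2, data = dat, trace = FALSE),
               type = "probs")                 # P(T = t | X), an n x 3 matrix

dat$stratum <- cut(gps[, "B"], quantile(gps[, "B"], 0:5 / 5), include.lowest = TRUE)
ab <- droplevels(subset(dat, trt %in% c("A", "B")))
with(ab, tapply(y, list(stratum, trt), mean))  # stratum-specific means, B vs A
```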


Journal ArticleDOI
TL;DR: A novel expectation-maximization (EM) algorithm is developed for finding the maximum likelihood estimates of the parameters in the proportional hazards model, which uses a monotone spline representation to approximate the unknown nondecreasing cumulative baseline hazard function.
Abstract: The proportional hazards model (PH) is currently the most popular regression model for analyzing time-to-event data. Despite its popularity, the analysis of interval-censored data under the PH model can be challenging using many available techniques. This article presents a new method for analyzing interval-censored data under the PH model. The proposed approach uses a monotone spline representation to approximate the unknown nondecreasing cumulative baseline hazard function. Formulating the PH model in this fashion results in a finite number of parameters to estimate while maintaining substantial modeling flexibility. A novel expectation-maximization (EM) algorithm is developed for finding the maximum likelihood estimates of the parameters. The derivation of the EM algorithm relies on a two-stage data augmentation involving latent Poisson random variables. The resulting algorithm is easy to implement, robust to initialization, enjoys quick convergence, and provides closed-form variance estimates. The performance of the proposed regression methodology is evaluated through a simulation study, and is further illustrated using data from a large population-based randomized trial designed and sponsored by the United States National Cancer Institute.

83 citations
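
For readers unfamiliar with the setup, the observed-data likelihood that the monotone spline representation makes tractable can be written as follows (a standard formulation in generic notation, not taken verbatim from the article). With the cumulative baseline hazard modeled as $\Lambda_0(t)=\sum_{k=1}^{K}\gamma_k b_k(t)$, where the $b_k$ are monotone (I-spline) basis functions and $\gamma_k \ge 0$,

$$
L(\beta,\gamma)\;=\;\prod_{i=1}^{n}\Big[\exp\!\big\{-\Lambda_0(L_i)\,e^{x_i^\top\beta}\big\}\;-\;\exp\!\big\{-\Lambda_0(R_i)\,e^{x_i^\top\beta}\big\}\Big],
$$

where $(L_i, R_i]$ is the observed interval known to contain the $i$-th event time (with $L_i = 0$ for left censoring and $R_i = \infty$ for right censoring). The article's EM algorithm augments this likelihood with latent Poisson variables, which is what makes the E- and M-steps tractable.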


Journal ArticleDOI
TL;DR: The interaction sequence kernel association test (iSKAT) is developed and is powerful and robust to the proportion of variants in a gene that interact with environment and the signs of the effects, and properly controls for the main effects of the rare variants using weighted ridge regression while adjusting for covariates.
Abstract: In this article, we consider testing rare variants by environment interactions in sequencing association studies. Current methods for studying the association of rare variants with traits cannot be readily applied to testing for rare variants by environment interactions, as these methods do not effectively control for the main effects of rare variants, leading to unstable results and/or inflated Type 1 error rates. We first analytically study the bias of using conventional burden-based tests for rare variants by environment interactions, and show that the tests can often be invalid and result in inflated Type 1 error rates. To overcome these difficulties, we develop the interaction sequence kernel association test (iSKAT) for assessing rare variants by environment interactions. The proposed test iSKAT is optimal in a class of variance component tests and is powerful and robust to the proportion of variants in a gene that interact with the environment and to the signs of the effects. The test properly controls for the main effects of the rare variants using weighted ridge regression while adjusting for covariates. We demonstrate the performance of iSKAT using simulation studies and illustrate its application by analysis of a candidate gene sequencing study of plasma adiponectin levels.

73 citations


Journal ArticleDOI
TL;DR: Using model selection frequencies and variable inclusion frequencies, the authors empirically compare two resampling techniques, the bootstrap and subsampling, investigating the effect of their use in selected classical model selection procedures for multivariable regression.
Abstract: In recent years, increasing attention has been devoted to the problem of the stability of multivariable regression models, understood as the resistance of the model to small changes in the data on which it has been fitted. Resampling techniques, mainly based on the bootstrap, have been developed to address this issue. In particular, the approaches based on the idea of "inclusion frequency" consider the repeated implementation of a variable selection procedure, for example backward elimination, on several bootstrap samples. The analysis of the variables selected in each iteration provides useful information on the model stability and on the variables' importance. Recent findings, nevertheless, show possible pitfalls in the use of the bootstrap, and alternatives such as subsampling have begun to be taken into consideration in the literature. Using model selection frequencies and variable inclusion frequencies, we empirically compare these two different resampling techniques, investigating the effect of their use in selected classical model selection procedures for multivariable regression. We conduct our investigations by analyzing two real data examples and by performing a simulation study. Our results reveal some advantages in using a subsampling technique rather than the bootstrap in this context.

71 citations
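
The sketch below shows the basic mechanics on simulated data: repeat backward elimination on bootstrap samples and on subsamples, and tabulate variable inclusion frequencies. It is a bare-bones illustration (the helper names, the AIC-based step() elimination, and the 0.632n subsample size are our choices), not the article's study design.

```r
# Sketch: variable inclusion frequencies under bootstrap versus subsampling.
set.seed(3)
n <- 200; p <- 8
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- 1 + 0.8 * X[, 1] + 0.5 * X[, 2] + rnorm(n)
dat <- data.frame(y, X)

selected_vars <- function(d) {
  fit <- step(lm(y ~ ., data = d), direction = "backward", trace = 0)
  names(coef(fit))[-1]                      # variables kept after elimination
}

inclusion_freq <- function(resample, B = 100) {
  sel <- replicate(B, selected_vars(dat[resample(n), , drop = FALSE]))
  table(factor(unlist(sel), levels = colnames(X))) / B
}

inclusion_freq(function(n) sample(n, n, replace = TRUE))   # bootstrap
inclusion_freq(function(n) sample(n, floor(0.632 * n)))    # subsampling (m < n)
```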


Journal ArticleDOI
TL;DR: A Bayesian credible subgroups method is proposed to identify two bounding subgroups for the benefiting subgroup: one for which it is likely that all members simultaneously have a treatment effect exceeding a specified threshold, and another for which it is likely that no members do.
Abstract: Many new experimental treatments benefit only a subset of the population. Identifying the baseline covariate profiles of patients who benefit from such a treatment, rather than determining whether or not the treatment has a population-level effect, can substantially lessen the risk in undertaking a clinical trial and expose fewer patients to treatments that do not benefit them. The standard analyses for identifying patient subgroups that benefit from an experimental treatment either do not account for multiplicity, or focus on testing for the presence of treatment-covariate interactions rather than the resulting individualized treatment effects. We propose a Bayesian credible subgroups method to identify two bounding subgroups for the benefiting subgroup: one for which it is likely that all members simultaneously have a treatment effect exceeding a specified threshold, and another for which it is likely that no members do. We examine frequentist properties of the credible subgroups method via simulations and illustrate the approach using data from an Alzheimer's disease treatment trial. We conclude with a discussion of the advantages and limitations of this approach to identifying patients for whom the treatment is beneficial.

52 citations
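
The sketch below mimics the two bounding subgroups on simulated data, using a crude large-sample normal approximation in place of the article's fully Bayesian posterior: simultaneous bands for the personalized treatment effect over a covariate grid yield an exclusive subgroup (simultaneous lower bound above the threshold) and an inclusive subgroup (upper bound above it). Everything here, including the threshold of zero and the sup-t band construction, is our assumption for illustration.

```r
# Sketch: approximate "credible subgroups" from simultaneous bands for the
# personalized treatment effect, using a normal approximation to the posterior.
library(MASS)
set.seed(11)
n <- 400
x   <- runif(n, -2, 2)
trt <- rbinom(n, 1, 0.5)
y   <- 0.2 * x + trt * (0.5 + 0.6 * x) + rnorm(n)    # benefit grows with x

fit   <- lm(y ~ trt * x)
draws <- mvrnorm(4000, coef(fit), vcov(fit))         # approximate posterior draws

grid <- seq(-2, 2, by = 0.1)
# personalized effect Delta(x) = coefficient of trt + coefficient of trt:x times x
eff  <- draws[, "trt"] + outer(draws[, "trt:x"], grid)

est <- colMeans(eff); sdv <- apply(eff, 2, sd)
z   <- apply(abs(sweep(sweep(eff, 2, est), 2, sdv, "/")), 1, max)
crit <- quantile(z, 0.95)                            # simultaneous 95% band
lower <- est - crit * sdv; upper <- est + crit * sdv

grid[lower > 0]   # exclusive subgroup: jointly likely to benefit
grid[upper > 0]   # inclusive subgroup: benefit not ruled out
```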


Journal ArticleDOI
TL;DR: A new Bayesian nonparametric method based on Dirichlet process mixtures is introduced; it can accommodate complex patterns of heterogeneity of capture and can transparently modulate its complexity without a separate model selection step.
Abstract: We introduce a new Bayesian nonparametric method for estimating the size of a closed population from multiple-recapture data. Our method, based on Dirichlet process mixtures, can accommodate complex patterns of heterogeneity of capture, and can transparently modulate its complexity without a separate model selection step. Additionally, it can handle the massively sparse contingency tables generated by a large number of recaptures with moderate sample sizes. We develop an efficient and scalable MCMC algorithm for estimation. We apply our method to simulated data, and to two examples from the literature on the estimation of casualties in armed conflicts.

51 citations


Journal ArticleDOI
TL;DR: A general statistical framework to combine such "opportunistic data" with data collected using schemes characterized by a known sampling effort makes it possible to estimate the relative abundance of several species in different sites.
Abstract: With the internet, a massive amount of information on species abundance can be collected by citizen science programs. However, these data are often difficult to use directly in statistical inference, as their collection is generally opportunistic, and the distribution of the sampling effort is often not known. In this article, we develop a general statistical framework to combine such "opportunistic data" with data collected using schemes characterized by a known sampling effort. Under some structural assumptions regarding the sampling effort and detectability, our approach makes it possible to estimate the relative abundance of several species in different sites. It can be implemented through a simple generalized linear model. We illustrate the framework with typical bird datasets from the Aquitaine region in south-western France. We show that, under some assumptions, our approach provides estimates that are more precise than the ones obtained from the dataset with a known sampling effort alone. When the opportunistic data are abundant, the gain in precision may be considerable, especially for rare species. We also show that estimates can be obtained even for species recorded only in the opportunistic scheme. Opportunistic data combined with a relatively small amount of data collected with a known effort may thus provide access to accurate and precise estimates of quantitative changes in relative abundance over space and/or time.
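
Since the abstract notes that the framework can be implemented through a simple generalized linear model, the sketch below fits one possible stripped-down version on simulated data: standardized-scheme counts carry a known-effort offset, while opportunistic counts get free site-specific effort terms shared across species, which is what identifies the relative abundances. The parameterization and simulation settings are ours, not the article's.

```r
# Sketch: combining known-effort and opportunistic counts in one Poisson GLM.
set.seed(4)
S <- 4; I <- 10                               # species and sites
N    <- matrix(rgamma(S * I, 2, 1), S, I)     # true relative abundances
eff0 <- runif(I, 0.5, 2)                      # known effort (standardized scheme)
eff1 <- rgamma(I, 1, 1)                       # unknown effort (opportunistic scheme)

dat <- expand.grid(species = factor(1:S), site = factor(1:I),
                   scheme  = c("standard", "opportunistic"))
dat$count  <- rpois(nrow(dat),
                    N[cbind(dat$species, dat$site)] *
                      ifelse(dat$scheme == "standard",
                             eff0[dat$site], eff1[dat$site]))
dat$logeff <- ifelse(dat$scheme == "standard", log(eff0[dat$site]), 0)
dat$opp    <- as.numeric(dat$scheme == "opportunistic")
dat$cell   <- interaction(dat$species, dat$site)   # species-by-site abundance cells

fit <- glm(count ~ 0 + cell + site:opp, family = poisson,
           offset = logeff, data = dat)
head(exp(coef(fit)))   # leading 'cell' coefficients estimate the abundances N;
                       # the site:opp terms absorb the unknown opportunistic effort
```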

Journal ArticleDOI
TL;DR: This article develops a novel penalty-based method to derive functional principal components that are nonzero precisely in the intervals where the values of FPCs are significant, so that the derived FPCs possess better interpretability than the FPCs derived from existing methods.
Abstract: Functional principal component analysis (FPCA) is a popular approach to explore major sources of variation in a sample of random curves. These major sources of variation are represented by functional principal components (FPCs). The intervals where the values of FPCs are significant are interpreted as where sample curves have major variations. However, these intervals are often hard for naive users to identify, because of the vague definition of "significant values". In this article, we develop a novel penalty-based method to derive FPCs that are nonzero precisely in the intervals where the values of FPCs are significant, so that the derived FPCs possess better interpretability than the FPCs derived from existing methods. To compute the proposed FPCs, we devise an efficient algorithm based on projection deflation techniques. We show that the proposed interpretable FPCs are strongly consistent and asymptotically normal under mild conditions. Simulation studies confirm that with a competitive performance in explaining variations of sample curves, the proposed FPCs are more interpretable than the traditional counterparts. This advantage is demonstrated by analyzing two real datasets, namely, electroencephalography data and Canadian weather data.

Journal ArticleDOI
TL;DR: This article describes an approach to modeling the regression coefficients as parametric functions of the order of the quantile, which may have advantages in terms of parsimony and efficiency and may expand the potential of statistical modeling.
Abstract: Estimating the conditional quantiles of outcome variables of interest is frequent in many research areas, and quantile regression is foremost among the utilized methods. The coefficients of a quantile regression model depend on the order of the quantile being estimated. For example, the coefficients for the median are generally different from those of the 10th centile. In this article, we describe an approach to modeling the regression coefficients as parametric functions of the order of the quantile. This approach may have advantages in terms of parsimony and efficiency, and may expand the potential of statistical modeling. Goodness-of-fit measures and testing procedures are discussed, and the results of a simulation study are presented. We apply the method to analyze the data that motivated this work. The described method is implemented in the qrcm R package.
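
To make the idea concrete, the sketch below uses a crude two-step stand-in on simulated data: fit standard quantile regressions on a grid of quantile orders p and then describe each coefficient as a parametric function of p (here linear in qnorm(p)). The article's method, implemented in the qrcm package, estimates the coefficient functions in an integrated, one-step fashion rather than by this two-step shortcut.

```r
# Sketch: quantile regression coefficients as smooth functions of the order p.
library(quantreg)
set.seed(5)
n <- 500
x <- runif(n)
y <- 1 + 2 * x + (1 + x) * rnorm(n)        # both location and scale depend on x

p_grid <- seq(0.05, 0.95, by = 0.05)
beta   <- t(sapply(p_grid, function(p) coef(rq(y ~ x, tau = p))))

# Model each coefficient as b(p) = c0 + c1 * qnorm(p); here the true functions
# are intercept(p) = 1 + qnorm(p) and slope(p) = 2 + qnorm(p).
fit_int   <- lm(beta[, "(Intercept)"] ~ qnorm(p_grid))
fit_slope <- lm(beta[, "x"] ~ qnorm(p_grid))
coef(fit_int); coef(fit_slope)             # parsimonious description of beta(p)
```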

Journal ArticleDOI
TL;DR: To evaluate a new therapy versus a control via a randomized, comparative clinical study or a series of trials, due to heterogeneity of the study patient population, a pre‐specified, predictive enrichment procedure may be implemented to identify an “enrichable” subpopulation.
Abstract: To evaluate a new therapy versus a control via a randomized, comparative clinical study or a series of trials, due to heterogeneity of the study patient population, a pre-specified, predictive enrichment procedure may be implemented to identify an "enrichable" subpopulation. For patients in this subpopulation, the therapy is expected to have a desirable overall risk-benefit profile. To develop and validate such a "therapy-diagnostic co-development" strategy, a three-step procedure may be conducted with three independent data sets from a series of similar studies or a single trial. At the first step, we create various candidate scoring systems based on the baseline information of the patients via, for example, parametric models using the first data set. Each individual score reflects an anticipated average treatment difference for future patients who share similar baseline profiles. A large score indicates that these patients tend to benefit from the new therapy. At the second step, a potentially promising, enrichable subgroup is identified using the totality of evidence from these scoring systems. At the final step, we validate such a selection via two-sample inference procedures for assessing the treatment effectiveness statistically and clinically with the third data set, the so-called holdout sample. When the study size is not large, one may combine the first two steps using a "cross-training-evaluation" process. Comprehensive numerical studies are conducted to investigate the operational characteristics of the proposed method. The entire enrichment procedure is illustrated with the data from a cardiovascular trial to evaluate a beta-blocker versus a placebo for treating chronic heart failure patients.

Journal ArticleDOI
TL;DR: Two joint frailty models for zero-inflated recurrent events in the presence of a terminal event are proposed, combining a logistic model for "structural zero" status (Yes/No) and a joint frailty proportional hazards model for recurrent and terminal event times.
Abstract: Recurrent event data arise frequently in longitudinal medical studies. In many situations, a large portion of subjects have no recurrent events, manifesting the "zero-inflated" nature of the data. Some of the zero events may be "structural zeros" as patients are unsusceptible to recurrent events, while others are "random zeros" due to censoring before any recurrent events. On the other hand, there often exists a terminal event which may be correlated with the recurrent events. In this article, we propose two joint frailty models for zero-inflated recurrent events in the presence of a terminal event, combining a logistic model for "structural zero" status (Yes/No) and a joint frailty proportional hazards model for recurrent and terminal event times. The models can be fitted conveniently in SAS Proc NLMIXED. We apply the methods to model recurrent opportunistic diseases in the presence of death in an AIDS study, and tumor recurrences and a terminal event in a sarcoma study.

Journal ArticleDOI
TL;DR: A new type of error control, the interval-wise control of the family-wise error rate, is defined; it is particularly suited for functional data, and it is shown that the ITP provides such control.
Abstract: We introduce in this work the Interval Testing Procedure (ITP), a novel inferential technique for functional data. The procedure can be used to test different functional hypotheses, e.g., distributional equality between two or more functional populations, or equality of the mean function of a functional population to a reference. ITP involves three steps: (i) the representation of data on a (possibly high-dimensional) functional basis; (ii) the test of each possible set of consecutive basis coefficients; (iii) the computation of the adjusted p-values associated with each basis component, by means of a new strategy proposed here. We define a new type of error control, the interval-wise control of the family-wise error rate, particularly suited for functional data. We show that the ITP provides such control. A simulation study comparing the ITP with other testing procedures is reported. The ITP is then applied to the analysis of hemodynamic features involved in cerebral aneurysm pathology. The ITP is implemented in the fdatest R package.

Journal ArticleDOI
TL;DR: A joint model for the simultaneous analysis of three types of data (a longitudinal marker, recurrent events, and a terminal event) is proposed and applied to a randomized phase III clinical trial of metastatic colorectal cancer, showing that the proposed trivariate model is appropriate for practical use.
Abstract: In oncology, the international WHO and RECIST criteria have allowed the standardization of tumor response evaluation in order to identify the time of disease progression. These semi-quantitative measurements are often used as endpoints in phase II and phase III trials to study the efficacy of new therapies. However, information can be lost through categorization of the continuous tumor size, and these criteria can be challenged by recently developed methods for modeling biomarkers longitudinally. Thus, it is of interest to compare, in terms of ability to predict overall survival, cancer progression defined by categorical criteria with quantitative measures of tumor size (left-censored due to detection limit problems) and/or the appearance of new lesions. We propose a joint model for a simultaneous analysis of three types of data: a longitudinal marker, recurrent events, and a terminal event. In a randomized clinical trial, the model makes it possible to determine on which particular component the treatment acts most. A simulation study is performed and shows that the proposed trivariate model is appropriate for practical use. We propose statistical tools that evaluate predictive accuracy for joint models in order to compare our model to models based on categorical criteria and their components. We apply the model to a randomized phase III clinical trial of metastatic colorectal cancer, conducted by the Federation Francophone de Cancerologie Digestive (FFCD 2000-05 trial), which assigned 410 patients to two therapeutic strategies with multiple successive chemotherapy regimens.

Journal ArticleDOI
TL;DR: A new approach to modeling group animal movement in continuous time as a multivariate Ornstein-Uhlenbeck diffusion process in a high-dimensional space is presented, and it is shown that the method detects dependency in movement between individuals.
Abstract: This article presents a new approach to modeling group animal movement in continuous time. The movement of a group of animals is modeled as a multivariate Ornstein-Uhlenbeck diffusion process in a high-dimensional space. Each individual of the group is attracted to a leading point, which is generally unobserved, and the movement of the leading point is itself an Ornstein-Uhlenbeck process attracted to an unknown attractor. The Ornstein-Uhlenbeck bridge is applied to reconstruct the location of the leading point. All movement parameters are estimated using Markov chain Monte Carlo sampling, specifically a Metropolis-Hastings algorithm. We apply the method to a small group of simultaneously tracked reindeer, Rangifer tarandus tarandus, showing that the method detects dependency in movement between individuals.
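
The sketch below only simulates the hierarchical structure described above, via an Euler discretization: a latent leading point pulled toward a fixed attractor, and individuals pulled toward the leading point. Parameter values are arbitrary, and the article's inferential machinery (the Ornstein-Uhlenbeck bridge and Metropolis-Hastings sampling) is not attempted here.

```r
# Sketch: simulating group movement around a latent leading point.
set.seed(6)
n_steps <- 500; dt <- 0.1; n_ind <- 4
attractor <- c(0, 0)
b_lead <- 0.3; b_ind <- 1.0          # strengths of attraction
s_lead <- 0.5; s_ind <- 0.3          # diffusion standard deviations

lead <- matrix(0, n_steps, 2)
ind  <- array(0, c(n_steps, n_ind, 2))
lead[1, ] <- c(5, 5); ind[1, , ] <- matrix(rnorm(n_ind * 2, 5, 1), n_ind, 2)

for (t in 2:n_steps) {
  lead[t, ] <- lead[t - 1, ] + b_lead * (attractor - lead[t - 1, ]) * dt +
               s_lead * sqrt(dt) * rnorm(2)
  for (j in 1:n_ind)
    ind[t, j, ] <- ind[t - 1, j, ] + b_ind * (lead[t - 1, ] - ind[t - 1, j, ]) * dt +
                   s_ind * sqrt(dt) * rnorm(2)
}
matplot(ind[, , 1], ind[, , 2], type = "l", lty = 1,
        xlab = "x", ylab = "y")      # tracks cluster around the latent leader
```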

Journal ArticleDOI
TL;DR: This work investigates SII as an effective screener for ultrahigh‐dimensional data, not relying on rigid regression model assumptions for real applications, and establishes the sure screening property of the proposed SII‐based screeners.
Abstract: Motivated by ultrahigh-dimensional biomarker screening studies, we propose a model-free screening approach tailored to censored lifetime outcomes. Our proposal is built upon the introduction of a new measure, the survival impact index (SII). By its design, SII sensibly captures the overall influence of a covariate on the outcome distribution, and can be estimated with familiar nonparametric procedures that do not require smoothing and are readily adaptable to handle lifetime outcomes under various censoring and truncation mechanisms. We provide large sample distributional results that facilitate the inference on SII in classical multivariate settings. More importantly, we investigate SII as an effective screener for ultrahigh-dimensional data, not relying on rigid regression model assumptions for real applications. We establish the sure screening property of the proposed SII-based screener. Extensive numerical studies are carried out to assess the performance of our method compared with other existing screening methods. A lung cancer microarray dataset is analyzed to demonstrate the practical utility of our proposals.

Journal ArticleDOI
TL;DR: A Bayesian analysis of a state-space formulation of the Jolly-Seber mark-recapture model is used, integrated with a binomial model for counts of unmarked animals, to derive estimates of population size and arrival and departure probabilities.
Abstract: We present a novel formulation of a mark-recapture-resight model that allows estimation of population size, stopover duration, and arrival and departure schedules at migration areas. Estimation is based on encounter histories of uniquely marked individuals and relative counts of marked and unmarked animals. We use a Bayesian analysis of a state-space formulation of the Jolly-Seber mark-recapture model, integrated with a binomial model for counts of unmarked animals, to derive estimates of population size and arrival and departure probabilities. We also provide a novel estimator for stopover duration that is derived from the latent state variable representing the interim between arrival and departure in the state-space model. We conduct a simulation study of field sampling protocols to understand the impact of superpopulation size, proportion marked, and number of animals sampled on bias and precision of estimates. Simulation results indicate that relative bias of estimates of the proportion of the population with marks was low for all sampling scenarios and never exceeded 2%. Our approach does not require enumeration of all unmarked animals detected or direct knowledge of the number of marked animals in the population at the time of the study. This provides flexibility and potential application in a variety of sampling situations (e.g., migratory birds, breeding seabirds, sea turtles, fish, pinnipeds, etc.). Application of the methods is demonstrated with data from a study of migratory sandpipers.

Journal ArticleDOI
TL;DR: The methods are applied to analyze the impact of a treatment on neurological function and death in an ALS trial, and the choice of optimal weighting schemes based on power and the relative importance of the outcomes is discussed.
Abstract: Clinical trials often collect multiple outcomes on each patient, as the treatment may be expected to affect the patient on many dimensions. For example, a treatment for a neurological disease such as ALS is intended to impact several dimensions of neurological function as well as survival. The assessment of treatment on the basis of multiple outcomes is challenging, both in terms of selecting a test and interpreting the results. Several global tests have been proposed, and we provide a general approach to selecting and executing a global test. The tests require minimal parametric assumptions, are flexible about weighting of the various outcomes, and are appropriate even when some or all of the outcomes are censored. The test we propose is based on a simple scoring mechanism applied to each pair of subjects for each endpoint. The pairwise scores are then reduced to a summary score, and a rank-sum test is applied to the summary scores. This can be seen as a generalization of previously proposed nonparametric global tests (e.g., O'Brien, 1984). We discuss the choice of optimal weighting schemes based on power and relative importance of the outcomes. As the optimal weights are generally unknown in practice, we also propose an adaptive weighting scheme and evaluate its performance in simulations. We apply the methods to analyze the impact of a treatment on neurological function and death in an ALS trial.
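
The sketch below assembles a toy version of the described construction on simulated data: Gehan-type pairwise win/loss scores for a censored survival endpoint, sign scores for a continuous functional endpoint, equal weights, per-subject summary scores, and a rank-sum test. The scoring rules, weights, and data generation are ours; the article's adaptive weighting is not reproduced.

```r
# Sketch: a global test from pairwise scores on two endpoints.
set.seed(7)
n <- 100
arm   <- rep(0:1, each = n / 2)
time  <- rexp(n, rate = ifelse(arm == 1, 0.08, 0.1))   # longer survival on arm 1
event <- rbinom(n, 1, 0.7)                              # 1 = death observed
time  <- ifelse(event == 1, time, time * runif(n))      # censoring
func  <- rnorm(n, mean = 0.3 * arm)                     # functional outcome

surv_score <- function(i, j) {          # +1 if i clearly outlives j, -1 if j outlives i
  if (time[i] > time[j] && event[j] == 1) return( 1)
  if (time[j] > time[i] && event[i] == 1) return(-1)
  0                                     # indeterminate due to censoring
}
w <- c(0.5, 0.5)                        # endpoint weights
summary_score <- sapply(1:n, function(i)
  sum(sapply((1:n)[-i], function(j)
    w[1] * surv_score(i, j) + w[2] * sign(func[i] - func[j]))))

wilcox.test(summary_score ~ factor(arm))   # rank-sum test on the summary scores
```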

Journal ArticleDOI
TL;DR: The proposed estimator is the first estimator of the causal relation between a treatment and an ordinal outcome to satisfy these properties and is guaranteed to have equal or better asymptotic precision than both the inverse probability-weighted and the unadjusted estimators.
Abstract: We present a general method for estimating the effect of a treatment on an ordinal outcome in randomized trials. The method is robust in that it does not rely on the proportional odds assumption. Our estimator leverages information in prognostic baseline variables, and has all of the following properties: (i) it is consistent; (ii) it is locally efficient; (iii) it is guaranteed to have equal or better asymptotic precision than both the inverse probability-weighted and the unadjusted estimators. To the best of our knowledge, this is the first estimator of the causal relation between a treatment and an ordinal outcome to satisfy these properties. We demonstrate the estimator in simulations based on resampling from a completed randomized clinical trial of a new treatment for stroke; we show potential gains of up to 39% in relative efficiency compared to the unadjusted estimator. The proposed estimator could be a useful tool for analyzing randomized trials with ordinal outcomes, since existing methods either rely on model assumptions that are untenable in many practical applications, or lack the efficiency properties of the proposed estimator. We provide R code implementing the estimator.

Journal ArticleDOI
TL;DR: A novel generalized abundance index that encompasses both parametric and nonparametric approaches is presented; parametric modeling and the incorporation of covariate dependence offer the potential for new insights into both phenology and spatial variation in seasonal patterns, which is relevant for both monitoring and conservation.
Abstract: At a time of climate change and major loss of biodiversity, it is important to have efficient tools for monitoring populations. In this context, animal abundance indices play an important role. In producing indices for invertebrates, it is important to account for variation in counts within seasons. Two new methods for describing seasonal variation in invertebrate counts have recently been proposed; one is nonparametric, using generalized additive models, and the other is parametric, based on stopover models. We present a novel generalized abundance index which encompasses both parametric and nonparametric approaches. It is extremely efficient to compute this index due to the use of concentrated likelihood techniques. This has particular relevance for the analysis of data from long-term extensive monitoring schemes with records for many species and sites, for which existing modeling techniques can be prohibitively time consuming. Performance of the index is demonstrated by several applications to UK Butterfly Monitoring Scheme data. We demonstrate the potential for new insights into both phenology and spatial variation in seasonal patterns from parametric modeling and the incorporation of covariate dependence, which is relevant for both monitoring and conservation. Associated R code is available on the journal website.
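
As a bare-bones illustration of an abundance index of this kind, the sketch below fits counts at many sites and weeks with a Poisson GLM having site and week effects (a crude stand-in for the article's GAM and stopover formulations, without the concentrated-likelihood speedups) and takes each site's index to be its total fitted count over the season. Simulated data; our parameterization.

```r
# Sketch: a simple abundance index from a site-by-week Poisson GLM.
set.seed(8)
I <- 20; J <- 26                              # sites and weeks
N <- rgamma(I, 5, 0.5)                        # site abundances
a <- 10 * dnorm(1:J, mean = 13, sd = 4)       # common seasonal flight curve
counts <- matrix(rpois(I * J, outer(N, a)), I, J)

dat <- data.frame(count = as.vector(counts),
                  site  = factor(rep(1:I, times = J)),
                  week  = factor(rep(1:J, each  = I)))
fit <- glm(count ~ site + week, family = poisson, data = dat)

# Site-level index: total expected count over the season for each site
index <- tapply(fitted(fit), dat$site, sum)
plot(N, index)                                # the index should track true abundance N
```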

Journal ArticleDOI
TL;DR: A multi-mediator model for survival data is proposed by employing a flexible semiparametric probit model, and path-specific effects (PSEs) of the exposure on the outcome mediated through specific mediators are characterized.
Abstract: Causal mediation modeling has become a popular approach for studying the effect of an exposure on an outcome through mediators. Currently, the literature on mediation analyses with survival outcomes has largely focused on settings with a single mediator and quantified the mediation effects on the hazard, log hazard, and log survival time (Lange and Hansen, 2011; VanderWeele, 2011). In this article, we propose a multi-mediator model for survival data by employing a flexible semiparametric probit model. We characterize path-specific effects (PSEs) of the exposure on the outcome mediated through specific mediators. We derive closed-form expressions for PSEs on a transformed survival time and the survival probabilities. Statistical inference on the PSEs is developed using a nonparametric maximum likelihood estimator under the semiparametric probit model and the functional delta method. Results from simulation studies suggest that our proposed methods perform well in finite samples. We illustrate the utility of our method in a genomic study of glioblastoma multiforme survival.

Journal ArticleDOI
Kun Liang
TL;DR: It is shown that a broad class of FDR estimators is simultaneously conservative over all support points under a weak dependence condition in the asymptotic setting, and a novel class of conservative FDR estimators is proposed in the finite sample setting.
Abstract: Large-scale homogeneous discrete p-values are encountered frequently in high-throughput genomics studies, and the related multiple testing problems become challenging because most existing methods for the false discovery rate (FDR) assume continuous p-values. In this article, we study the estimation of the null proportion and FDR for discrete p-values with common support. In the finite sample setting, we propose a novel class of conservative FDR estimators. Furthermore, we show that a broad class of FDR estimators is simultaneously conservative over all support points under some weak dependence condition in the asymptotic setting. We further demonstrate the significant improvement of a newly proposed method over existing methods through simulation studies and a case study.

Journal ArticleDOI
TL;DR: This article presents a log-rank-type test for comparing net survival functions (as estimated by PPE) between several groups and puts the test within the counting process framework to introduce the inverse probability weighting procedure as required by the PPE.
Abstract: In population-based cancer studies, it is often of interest to compare cancer survival between different populations. However, in such studies, the exact causes of death are often unavailable or unreliable. Net survival methods were developed to overcome this difficulty. Net survival is the survival that would be observed if the disease under study was the only possible cause of death. The Pohar-Perme estimator (PPE) is a nonparametric consistent estimator of net survival. In this article, we present a log-rank-type test for comparing net survival functions (as estimated by the PPE) between several groups. We put the test within the counting process framework to introduce the inverse probability weighting procedure required by the PPE. We build a stratified version to control for categorical covariates that affect the outcome. We perform simulation studies to evaluate the performance of this test and work out an application on real data.

Journal ArticleDOI
TL;DR: A novel extension of the rank-sum test is developed to determine whether there are differences in the attachment loss between the upper and lower teeth and between mesial and buccal sites of periodontal patients.
Abstract: The Wilcoxon rank-sum test is a popular nonparametric test for comparing two independent populations (groups). In recent years, there have been renewed attempts to extend the Wilcoxon rank-sum test to clustered data, one of which (Datta and Satten, 2005, Journal of the American Statistical Association 100, 908-915) addresses the issue of informative cluster size, i.e., when the outcomes and the cluster size are correlated. We are faced with a situation where the group-specific marginal distribution in a cluster depends on the number of observations in that group (i.e., the intra-cluster group size). We develop a novel extension of the rank-sum test for handling this situation. We compare the performance of our test with the Datta-Satten test, as well as the naive Wilcoxon rank-sum test. Using a naturally occurring simulation model of informative intra-cluster group size, we show that only our test maintains the correct size. We also compare our test with a classical signed rank test based on averages of the outcome values in each group paired by cluster membership. While this test maintains the size, it has lower power than our test. Extensions to multiple group comparisons and to the case of clusters not having samples from all groups are also discussed. We apply our test to determine whether there are differences in attachment loss between the upper and lower teeth and between mesial and buccal sites of periodontal patients.

Journal ArticleDOI
TL;DR: Novel modeling incorporating zero inflation, clustering, and overdispersion sheds some new light on the effect of community water fluoridation and other factors.
Abstract: Community water fluoridation is an important public health measure to prevent dental caries, but it continues to be somewhat controversial. The Iowa Fluoride Study (IFS) is a longitudinal study on a cohort of Iowa children that began in 1991. The main purposes of this study (http://www.dentistry.uiowa.edu/preventive-fluoride-study) were to quantify fluoride exposures from both dietary and nondietary sources and to associate longitudinal fluoride exposures with dental fluorosis (spots on teeth) and dental caries (cavities). We analyze a subset of the IFS data by a marginal regression model with a zero-inflated version of the Conway-Maxwell-Poisson (ZICMP) distribution for count data exhibiting excessive zeros and a wide range of dispersion patterns. More generally, we introduce two estimation methods for fitting a ZICMP marginal regression model. Finite sample behaviors of the estimators and the resulting confidence intervals are studied using extensive simulation studies. We apply our methodologies to the dental caries data. Our novel modeling incorporating zero inflation, clustering, and overdispersion sheds some new light on the effect of community water fluoridation and other factors. We also include a second application of our methodology to a genomic (next-generation sequencing) dataset that exhibits underdispersion.
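
As a simpler point of reference for readers unfamiliar with zero-inflated counts, the sketch below fits an ordinary zero-inflated Poisson model with pscl::zeroinfl on simulated data. It captures the structural-zero idea but none of the distinctive features of the proposed approach: the Conway-Maxwell-Poisson dispersion flexibility, the marginal (clustered) regression structure, or the article's two estimation methods.

```r
# Sketch: zero-inflated Poisson regression as a simple baseline.
library(pscl)
set.seed(9)
n <- 400
x <- rnorm(n)
z <- rbinom(n, 1, plogis(-1 + x))            # 1 = structural zero
y <- ifelse(z == 1, 0, rpois(n, exp(0.5 + 0.4 * x)))

fit <- zeroinfl(y ~ x | x, dist = "poisson")  # count model | zero-inflation model
summary(fit)
```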

Journal ArticleDOI
TL;DR: It is shown that using an objective function that accounts for correct classification is important for design when considering group testing under misclassification, and novel analytical results that characterize the optimal Dorfman (1943) design under misclassification are presented.
Abstract: In the context of group testing screening, McMahan, Tebbs, and Bilder (2012, Biometrics 68, 287-296) proposed a two-stage procedure in a heterogeneous population in the presence of misclassification. In earlier work published in Biometrics, Kim, Hudgens, Dreyfuss, Westreich, and Pilcher (2007, Biometrics 63, 1152-1162) also proposed group testing algorithms in a homogeneous population with misclassification. In both cases, the authors evaluated performance of the algorithms based on the expected number of tests per person, with the optimal design being defined by minimizing this quantity. The purpose of this article is to show that although the expected number of tests per person is an appropriate evaluation criterion for group testing when there is no misclassification, it may be problematic when there is misclassification. Specifically, a valid criterion needs to take into account the amount of correct classification and not just the number of tests. We propose a more suitable objective function that accounts for not only the expected number of tests, but also the expected number of correct classifications. We then show how using this objective function that accounts for correct classification is important for design when considering group testing under misclassification. We also present novel analytical results which characterize the optimal Dorfman (1943) design under misclassification.
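
The sketch below computes, for Dorfman two-stage testing with a homogeneous prevalence p and assay sensitivity Se and specificity Sp, both the classical expected-tests-per-person criterion and the expected proportion of correct classifications. The formulas are the standard ones under a no-dilution assumption and are meant only to illustrate why the two criteria need not favor the same design; the article works with heterogeneous risks and a combined objective function, which this sketch does not reproduce.

```r
# Sketch: two evaluation criteria for Dorfman group testing under misclassification.
dorfman <- function(k, p = 0.05, Se = 0.95, Sp = 0.98) {
  q <- (1 - p)^k                                   # P(group is truly negative)
  p_group_pos <- Se * (1 - q) + (1 - Sp) * q       # P(group tests positive)
  tests_pp    <- 1 / k + p_group_pos               # expected tests per person
  # Per-person probability of being classified correctly:
  q1   <- (1 - p)^(k - 1)                          # other k - 1 members all negative
  sens <- Se^2                                     # true positive: group test + retest
  spec <- 1 - (1 - Sp) * (Se * (1 - q1) + (1 - Sp) * q1)
  correct_pp <- p * sens + (1 - p) * spec
  c(group_size = k, tests_per_person = tests_pp, correct_per_person = correct_pp)
}
t(sapply(2:10, dorfman))   # the two criteria need not favor the same group size
```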

Journal ArticleDOI
TL;DR: New methods for treatment effect calibration are proposed: one based on a conditional effect (CE) model and two doubly robust (DR) methods, which are compared in a simulation study and applied to recent clinical trials for treating human immunodeficiency virus infection.
Abstract: In comparative effectiveness research, it is often of interest to calibrate treatment effect estimates from a clinical trial to a target population that differs from the study population. One important application is an indirect comparison of a new treatment with a placebo control on the basis of two separate randomized clinical trials: a non-inferiority trial comparing the new treatment with an active control and a historical trial comparing the active control with placebo. The available methods for treatment effect calibration include an outcome regression (OR) method based on a regression model for the outcome and a weighting method based on a propensity score (PS) model. This article proposes new methods for treatment effect calibration: one based on a conditional effect (CE) model and two doubly robust (DR) methods. The first DR method involves a PS model and an OR model, is asymptotically valid if either model is correct, and attains the semiparametric information bound if both models are correct. The second DR method involves a PS model, a CE model, and possibly an OR model, is asymptotically valid under the union of the PS and CE models, and attains the semiparametric information bound if all three models are correct. The various methods are compared in a simulation study and applied to recent clinical trials for treating human immunodeficiency virus infection.
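
To fix ideas, the sketch below implements a generic doubly robust calibration of a trial-estimated treatment effect to a target population on simulated data: arm-specific outcome regressions fit in the trial, a logistic model for trial-versus-target membership whose fitted odds reweight the trial residuals, and their combination. This is a textbook-style transport estimator used for illustration, not the article's specific CE-based or second DR construction; all names and settings here are ours.

```r
# Sketch: doubly robust calibration of a trial effect to a target population.
set.seed(10)
n_t <- 500; n_s <- 500
x_target <- rnorm(n_s, mean = 1)               # target population differs in X
x_trial  <- rnorm(n_t, mean = 0)
a <- rbinom(n_t, 1, 0.5)                       # randomized treatment in the trial
y <- 1 + 0.5 * x_trial + a * (1 + 0.5 * x_trial) + rnorm(n_t)   # effect varies with X

trial <- data.frame(y, a, x = x_trial)
or1 <- lm(y ~ x, data = subset(trial, a == 1))  # outcome regression, treated arm
or0 <- lm(y ~ x, data = subset(trial, a == 0))  # outcome regression, control arm

# Membership model: P(target | x), whose fitted odds reweight trial residuals
memb <- glm(s ~ x, family = binomial,
            data = data.frame(s = rep(1:0, c(n_s, n_t)), x = c(x_target, x_trial)))
w <- with(trial, {
  pr <- predict(memb, newdata = data.frame(x = x), type = "response")
  pr / (1 - pr)
})

or_part <- mean(predict(or1, data.frame(x = x_target)) -
                predict(or0, data.frame(x = x_target)))
resid_part <- sum(w * with(trial,
                a * (y - predict(or1, trial)) / 0.5 -
                (1 - a) * (y - predict(or0, trial)) / 0.5)) / n_s
or_part + resid_part                            # doubly robust calibrated effect
# True target-population effect is 1 + 0.5 * E[X | target] = 1.5 in this simulation
```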