
Showing papers in "Statistical Methods in Medical Research" in 2017


Journal ArticleDOI
TL;DR: Settings particularly relevant to Mendelian randomization are prioritized in the paper, notably the scenario of a continuous exposure and a continuous or binary outcome.
Abstract: Instrumental variable analysis is an approach for obtaining causal inferences on the effect of an exposure (risk factor) on an outcome from observational data. It has gained in popularity over the past decade with the use of genetic variants as instrumental variables, known as Mendelian randomization. An instrumental variable is associated with the exposure, but not associated with any confounder of the exposure-outcome association, nor is there any causal pathway from the instrumental variable to the outcome other than via the exposure. Under the assumption that a single instrumental variable or a set of instrumental variables for the exposure is available, the causal effect of the exposure on the outcome can be estimated. There are several methods available for instrumental variable estimation; we consider the ratio method, two-stage methods, likelihood-based methods, and semi-parametric methods. Techniques for obtaining statistical inferences and confidence intervals are presented. The statistical properties of estimates from these methods are compared, and practical advice is given about choosing a suitable analysis method. In particular, bias and coverage properties of estimators are considered, especially with weak instruments. Settings particularly relevant to Mendelian randomization are prioritized in the paper, notably the scenario of a continuous exposure and a continuous or binary outcome.
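As a concrete companion to the ratio and two-stage methods mentioned above, here is a minimal base-R sketch on simulated data; the data-generating model, effect sizes, and variable names are illustrative assumptions rather than anything from the paper.

```r
set.seed(1)
n <- 5000
g <- rbinom(n, 2, 0.3)           # genetic instrument (allele count 0/1/2)
u <- rnorm(n)                    # unmeasured confounder
x <- 0.5 * g + u + rnorm(n)      # exposure affected by instrument and confounder
y <- 0.3 * x + u + rnorm(n)      # outcome; true causal effect = 0.3

# Ratio (Wald) estimate: instrument-outcome association / instrument-exposure association
beta_gy <- coef(lm(y ~ g))["g"]
beta_gx <- coef(lm(x ~ g))["g"]
ratio_est <- beta_gy / beta_gx

# Two-stage least squares: regress the outcome on fitted exposure values
xhat     <- fitted(lm(x ~ g))
tsls_est <- coef(lm(y ~ xhat))["xhat"]

c(ratio = unname(ratio_est), tsls = unname(tsls_est))  # both should be near 0.3
```

With a single instrument, the ratio and two-stage least-squares estimates coincide; the distinctions discussed in the paper matter once several, possibly weak, instruments are combined.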

619 citations


Journal ArticleDOI
TL;DR: In this article, the authors examine the effect of the number of events per variable (EPV) on the relative performance of three different methods for assessing the predictive accuracy of a logistic regression model: apparent performance in the analysis sample, split-sample validation, and optimism correction using bootstrap methods.
Abstract: We conducted an extensive set of empirical analyses to examine the effect of the number of events per variable (EPV) on the relative performance of three different methods for assessing the predictive accuracy of a logistic regression model: apparent performance in the analysis sample, split-sample validation, and optimism correction using bootstrap methods. Using a single dataset of patients hospitalized with heart failure, we compared the estimates of discriminatory performance from these methods to those for a very large independent validation sample arising from the same population. As anticipated, the apparent performance was optimistically biased, with the degree of optimism diminishing as the number of events per variable increased. Differences between the bootstrap-corrected approach and the use of an independent validation sample were minimal once the number of events per variable was at least 20. Split-sample assessment resulted in too pessimistic and highly uncertain estimates of model performance. Apparent performance estimates had lower mean squared error compared to split-sample estimates, but the lowest mean squared error was obtained by bootstrap-corrected optimism estimates. For bias, variance, and mean squared error of the performance estimates, the penalty incurred by using split-sample validation was equivalent to reducing the sample size by a proportion equivalent to the proportion of the sample that was withheld for model validation. In conclusion, split-sample validation is inefficient and apparent performance is too optimistic for internal validation of regression-based prediction models. Modern validation methods, such as bootstrap-based optimism correction, are preferable. While these findings may be unsurprising to many statisticians, the results of the current study reinforce what should be considered good statistical practice in the development and validation of clinical prediction models.
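The following base-R sketch illustrates the generic bootstrap optimism correction for the c-statistic of a logistic model; the simulated data, predictors, and number of bootstrap replicates are placeholder assumptions, not the heart failure analysis of the paper.

```r
set.seed(2)
n   <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-1 + 0.8 * dat$x1 + 0.5 * dat$x2))

# c-statistic (AUC) via the rank-based Mann-Whitney formula
cstat <- function(p, y) {
  r  <- rank(p)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

fit_full   <- glm(y ~ x1 + x2, family = binomial, data = dat)
c_apparent <- cstat(predict(fit_full, type = "response"), dat$y)

B <- 200
optimism <- replicate(B, {
  boot   <- dat[sample(nrow(dat), replace = TRUE), ]
  fit_b  <- glm(y ~ x1 + x2, family = binomial, data = boot)
  c_boot <- cstat(predict(fit_b, type = "response"), boot$y)                  # performance in the bootstrap sample
  c_orig <- cstat(predict(fit_b, newdata = dat, type = "response"), dat$y)    # performance back in the original data
  c_boot - c_orig
})

c(apparent = c_apparent, optimism_corrected = c_apparent - mean(optimism))
```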

258 citations


Journal ArticleDOI
TL;DR: Two approaches for estimating the Average Treatment Effect (ATE) on survival outcomes, Inverse Probability of Treatment Weighting (IPTW) and full matching, are compared in an extensive set of simulations that varied the extent of confounding and the amount of misspecification of the propensity score model.
Abstract: There is increasing interest in estimating the causal effects of treatments using observational data. Propensity-score matching methods are frequently used to adjust for differences in observed characteristics between treated and control individuals in observational studies. Survival or time-to-event outcomes occur frequently in the medical literature, but the use of propensity score methods in survival analysis has not been thoroughly investigated. This paper compares two approaches for estimating the Average Treatment Effect (ATE) on survival outcomes: Inverse Probability of Treatment Weighting (IPTW) and full matching. The performance of these methods was compared in an extensive set of simulations that varied the extent of confounding and the amount of misspecification of the propensity score model. We found that both IPTW and full matching resulted in estimation of marginal hazard ratios with negligible bias when the ATE was the target estimand and the treatment-selection process was weak to moderate. However, when the treatment-selection process was strong, both methods resulted in biased estimation of the true marginal hazard ratio, even when the propensity score model was correctly specified. When the propensity score model was correctly specified, bias tended to be lower for full matching than for IPTW. The reasons for these biases and for the differences between the two methods appeared to be due to some extreme weights generated for each method. Both methods tended to produce more extreme weights as the magnitude of the effects of covariates on treatment selection increased. Furthermore, more extreme weights were observed for IPTW than for full matching. However, the poorer performance of both methods in the presence of a strong treatment-selection process was mitigated by the use of IPTW with restriction and full matching with a caliper restriction when the propensity score model was correctly specified.
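A minimal sketch of the IPTW approach for a marginal hazard ratio, assuming the survival package is available; the data-generating model and ATE-type weights are illustrative, and the restriction and full-matching refinements discussed in the paper are omitted.

```r
library(survival)
set.seed(3)
n  <- 2000
x1 <- rnorm(n); x2 <- rnorm(n)
z  <- rbinom(n, 1, plogis(0.4 * x1 - 0.4 * x2))            # treatment assignment
tt <- rexp(n, rate = exp(-0.5 * z + 0.3 * x1 + 0.3 * x2))  # event times (conditional log-HR for z is -0.5)
cc <- rexp(n, rate = 0.1)                                  # independent censoring times
dd <- data.frame(z, x1, x2,
                 time   = pmin(tt, cc),
                 status = as.numeric(tt <= cc))

# Estimate the propensity score and form ATE-type weights
ps   <- fitted(glm(z ~ x1 + x2, family = binomial, data = dd))
dd$w <- ifelse(dd$z == 1, 1 / ps, 1 / (1 - ps))

# Weighted Cox model for the marginal hazard ratio (robust variance)
fit <- coxph(Surv(time, status) ~ z, data = dd, weights = w, robust = TRUE)
exp(coef(fit))
```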

198 citations


Journal ArticleDOI
TL;DR: The results indicate that simpler hierarchical models are valid in situations with few studies or sparse data, and univariate random effects logistic regression models are appropriate when a bivariate model cannot be fitted.
Abstract: Hierarchical models such as the bivariate and hierarchical summary receiver operating characteristic (HSROC) models are recommended for meta-analysis of test accuracy studies. These models are challenging to fit when there are few studies and/or sparse data (for example, zero cells in contingency tables due to studies reporting 100% sensitivity or specificity); the models may not converge, or may give unreliable parameter estimates. Using simulation, we investigated the performance of seven hierarchical models incorporating increasing simplifications in scenarios designed to replicate realistic situations for meta-analysis of test accuracy studies. Performance of the models was assessed in terms of estimability (percentage of meta-analyses that successfully converged and percentage where the between-study correlation was estimable), bias, mean square error and coverage of the 95% confidence intervals. Our results indicate that simpler hierarchical models are valid in situations with few studies or sparse data. For synthesis of sensitivity and specificity, univariate random effects logistic regression models are appropriate when a bivariate model cannot be fitted. Alternatively, an HSROC model that assumes a symmetric SROC curve (by excluding the shape parameter) can be used if the HSROC model is the chosen meta-analytic approach. In the absence of heterogeneity, fixed effect equivalents of the models can be applied.
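As an illustration of the simplification recommended above, this sketch fits a univariate random-effects logistic regression for sensitivity, assuming lme4 is available; the per-study counts are toy numbers chosen only so that some studies report 100% sensitivity.

```r
library(lme4)

# Illustrative per-study counts: tp = true positives, fn = false negatives
# (an analogous model on tn and fp gives the specificity synthesis)
dat <- data.frame(
  study = factor(1:6),
  tp    = c(20, 15, 30, 12,  8, 25),
  fn    = c( 2,  0,  5,  1,  0,  3)
)

# Univariate random-effects logistic model for sensitivity
fit_sens <- glmer(cbind(tp, fn) ~ 1 + (1 | study), family = binomial, data = dat)

# Summary (median-study) sensitivity on the probability scale
plogis(fixef(fit_sens))
```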

168 citations


Journal ArticleDOI
TL;DR: The overall recommendation is to avoid the DerSimonian and Laird method when the number of meta-analysis studies is modest and prefer a more comprehensive procedure that compares alternative inferential approaches.
Abstract: This paper investigates the impact of the number of studies on meta-analysis and meta-regression within the random-effects model framework. It is frequently neglected that inference in random-effects models requires a substantial number of studies included in meta-analysis to guarantee reliable conclusions. Several authors warn about the risk of inaccurate results of the traditional DerSimonian and Laird approach especially in the common case of meta-analysis involving a limited number of studies. This paper presents a selection of likelihood and non-likelihood methods for inference in meta-analysis proposed to overcome the limitations of the DerSimonian and Laird procedure, with a focus on the effect of the number of studies. The applicability and the performance of the methods are investigated in terms of Type I error rates and empirical power to detect effects, according to scenarios of practical interest. Simulation studies and applications to real meta-analyses highlight that it is not possible to identify an approach uniformly superior to alternatives. The overall recommendation is to avoid the DerSimonian and Laird method when the number of meta-analysis studies is modest and prefer a more comprehensive procedure that compares alternative inferential approaches. R code for meta-analysis according to all of the inferential methods examined in the paper is provided.
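For reference, a base-R sketch of the standard DerSimonian and Laird moment estimator whose small-sample behaviour is critiqued above; the study effect estimates and within-study variances are made-up illustrative values.

```r
# Study-level effect estimates (yi) and within-study variances (vi)
yi <- c(0.30, 0.10, 0.45, 0.25, -0.05)
vi <- c(0.04, 0.06, 0.09, 0.05,  0.08)

w <- 1 / vi                                # fixed-effect weights
Q <- sum(w * (yi - weighted.mean(yi, w))^2)
k <- length(yi)

# DerSimonian-Laird moment estimate of the between-study variance tau^2
tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))

# Random-effects pooled estimate with a (Wald-type) 95% confidence interval
w_re  <- 1 / (vi + tau2)
mu_re <- weighted.mean(yi, w_re)
se_re <- sqrt(1 / sum(w_re))
c(tau2 = tau2, mu = mu_re,
  lower = mu_re - 1.96 * se_re, upper = mu_re + 1.96 * se_re)
```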

140 citations


Journal ArticleDOI
TL;DR: A linear mixed effects model with random intercept and slope is used to describe the responses over age, and a cutoff point is identified for measurement costs relative to recruitment costs that determines the preferred frequency of measurements.
Abstract: Longitudinal studies are often used to investigate age-related developmental change. Whereas a single cohort design takes a group of individuals at the same initial age and follows them over time, an accelerated longitudinal design takes multiple single cohorts, each one starting at a different age. The main advantage of an accelerated longitudinal design is its ability to span the age range of interest in a shorter period of time than would be possible with a single cohort longitudinal design. This paper considers design issues for accelerated longitudinal studies. A linear mixed effects model is considered to describe the responses over age, with random effects for the intercept and slope parameters. Random and fixed cohort effects are used to cope with the potential bias accelerated longitudinal designs have due to multiple cohorts. The impact of other factors, such as costs and dropouts, on the power of tests and the precision of parameter estimates is examined. As duration-related costs increase relative to recruitment costs, the best designs shift towards shorter durations, with a cross-sectional design eventually becoming best. For designs with the same duration but differing intervals between measurements, we found there was a cutoff point for measurement costs relative to recruitment costs that determines the preferred frequency of measurements. Under our model with 30% dropout, there was a maximum power loss of 7%.

139 citations


Journal ArticleDOI
TL;DR: Methods that can be used for estimating the effect of treatment on binary outcomes when using full matching are described, and their performance is compared with that of nearest neighbour matching (with and without a caliper) and inverse probability of treatment weighting.
Abstract: Many non-experimental studies use propensity-score methods to estimate causal effects by balancing treatment and control groups on a set of observed baseline covariates. Full matching on the propensity score has emerged as a particularly effective and flexible method for utilizing all available data, and creating well-balanced treatment and comparison groups. However, full matching has been used infrequently with binary outcomes, and relatively little work has investigated the performance of full matching when estimating effects on binary outcomes. This paper describes methods that can be used for estimating the effect of treatment on binary outcomes when using full matching. It then used Monte Carlo simulations to evaluate the performance of these methods based on full matching (with and without a caliper), and compared their performance with that of nearest neighbour matching (with and without a caliper) and inverse probability of treatment weighting. The simulations varied the prevalence of the treatment and the strength of association between the covariates and treatment assignment. Results indicated that all of the approaches work well when the strength of confounding is relatively weak. With stronger confounding, the relative performance of the methods varies, with nearest neighbour matching with a caliper showing consistently good performance across a wide range of settings. We illustrate the approaches using a study estimating the effect of inpatient smoking cessation counselling on survival following hospitalization for a heart attack.

84 citations


Journal ArticleDOI
TL;DR: This work assessed the performance of alternative adjustment methods based upon bias, coverage and mean squared error, related to the estimation of true restricted mean survival in the absence of switching in the control group, and found that a simplified two-stage Weibull method produced low bias across all scenarios and, provided the treatment switching mechanism is suitable, represents an appropriate adjustment method.
Abstract: Estimates of the overall survival benefit of new cancer treatments are often confounded by treatment switching in randomised controlled trials (RCTs) - whereby patients randomised to the control group are permitted to switch onto the experimental treatment upon disease progression. In health technology assessment, estimates of the unconfounded overall survival benefit associated with the new treatment are needed. Several switching adjustment methods have been advocated in the literature, some of which have been used in health technology assessment. However, it is unclear which methods are likely to produce least bias in realistic RCT-based scenarios. We simulated RCTs in which switching, associated with patient prognosis, was permitted. Treatment effect size and time dependency, switching proportions and disease severity were varied across scenarios. We assessed the performance of alternative adjustment methods based upon bias, coverage and mean squared error, related to the estimation of true restricted mean survival in the absence of switching in the control group. We found that when the treatment effect was not time-dependent, rank preserving structural failure time models (RPSFTM) and iterative parameter estimation methods produced low levels of bias. However, in the presence of a time-dependent treatment effect, these methods produced higher levels of bias, similar to those produced by an inverse probability of censoring weights method. The inverse probability of censoring weights and structural nested models produced high levels of bias when switching proportions exceeded 85%. A simplified two-stage Weibull method produced low bias across all scenarios and, provided the treatment switching mechanism is suitable, represents an appropriate adjustment method.

82 citations


Journal ArticleDOI
TL;DR: This paper derives McNemar's test statistic under the null hypothesis considering both assumptions of conditional independence and conditional dependence, and performs power analyses to show how the result is affected by the amount of conditional dependence under the alternative hypothesis.
Abstract: McNemar's test is often used in practice to compare the sensitivities and specificities for the evaluation of two diagnostic tests. For correct evaluation of accuracy, an intuitive recommendation is to test the diseased and the non-diseased groups separately so that the sensitivities can be compared among the diseased, and specificities can be compared among the healthy group of people. This paper provides a rigorous theoretical framework for this argument and studies the validity of McNemar's test regardless of the conditional independence assumption. We derive McNemar's test statistic under the null hypothesis considering both assumptions of conditional independence and conditional dependence. We then perform power analyses to show how the result is affected by the amount of conditional dependence under the alternative hypothesis.
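A minimal base-R illustration of the recommendation to apply McNemar's test separately in the diseased and non-diseased groups, using made-up paired-test counts; this shows the standard mcnemar.test call, not the authors' derivation under conditional dependence.

```r
# Paired results of Test A vs Test B, cross-classified within each group
# (rows: A positive/negative, columns: B positive/negative)

# Among diseased subjects
diseased <- matrix(c(60, 12,
                      4, 24), nrow = 2, byrow = TRUE)

# Among non-diseased subjects
non_diseased <- matrix(c(10,   8,
                         15, 167), nrow = 2, byrow = TRUE)

mcnemar.test(diseased)      # compares sensitivities via the discordant pairs among the diseased
mcnemar.test(non_diseased)  # compares specificities via the discordant pairs among the healthy
```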

73 citations


Journal ArticleDOI
TL;DR: This paper proposes a clustering procedure with adaptive density peak detection, where the local density is estimated through nonparametric multivariate kernel estimation, and develops an automatic cluster centroid selection method through maximizing an average silhouette index.
Abstract: Common limitations of clustering methods include slow algorithm convergence, the instability of the pre-specification of a number of intrinsic parameters, and the lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm of cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from the equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method through maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method needs to be performed in only a single step without any iteration, and is thus fast and has great potential for application to big data analysis. A user-friendly R package, ADPclust, is developed for public use.
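To make the two quantities behind density peak detection concrete, here is a bare-bones base-R sketch that uses a Gaussian kernel density in place of the truncated counting measure; the fixed bandwidth and the simple choice of two centres are assumptions for illustration, whereas the ADPclust package selects these automatically (e.g. via the silhouette criterion).

```r
set.seed(4)
x <- rbind(matrix(rnorm(100, 0, 0.3), ncol = 2),
           matrix(rnorm(100, 2, 0.3), ncol = 2))   # two well-separated clusters

d   <- as.matrix(dist(x))
h   <- 0.5                                         # kernel bandwidth (assumed, not auto-selected)
rho <- rowSums(exp(-(d / h)^2))                    # Gaussian-kernel local density

# delta: distance to the nearest point with higher density
delta <- sapply(seq_len(nrow(x)), function(i) {
  higher <- which(rho > rho[i])
  if (length(higher) == 0) max(d[i, ]) else min(d[i, higher])
})

# Points with both large rho and large delta are candidate cluster centres
centres <- order(rho * delta, decreasing = TRUE)[1:2]

# Simplified assignment: each point joins its nearest selected centre
cluster <- apply(d[, centres, drop = FALSE], 1, which.min)
table(cluster)
```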

73 citations


Journal ArticleDOI
TL;DR: Copulas are utilized to generalize the joint frailty model by introducing an additional source of dependence arising from intra-subject association between tumour progression and death, and the approach is applied to a meta-analysis assessing the recently suggested biomarker CXCL12 for survival in ovarian cancer patients.
Abstract: Dependent censoring often arises in biomedical studies when time to tumour progression (e.g., relapse of cancer) is censored by an informative terminal event (e.g., death). For meta-analysis combining existing studies, a joint survival model between tumour progression and death has been considered under semicompeting risks, which induces dependence through the study-specific frailty. Our paper here utilizes copulas to generalize the joint frailty model by introducing an additional source of dependence arising from intra-subject association between tumour progression and death. The practical value of the new model is particularly evident for meta-analyses in which only a few covariates are consistently measured across studies and hence there exists residual dependence. The covariate effects are formulated through the Cox proportional hazards model, and the baseline hazards are nonparametrically modeled on a basis of splines. The estimator is then obtained by maximizing a penalized log-likelihood function. We also show that the present methodologies are easily modified for competing risks or recurrent event data, and are generalized to accommodate left-truncation. Simulations are performed to examine the performance of the proposed estimator. The method is applied to a meta-analysis assessing the recently suggested biomarker CXCL12 for survival in ovarian cancer patients. We implement our proposed methods in the R joint.Cox package.

Journal ArticleDOI
TL;DR: A Bayesian optimal interval design for dose finding in drug-combination trials is developed that enjoys convergence properties for large samples; the entire dose-finding procedure is nonparametric (model-free), and is thus robust and does not require the typical "nonparametric" prephase used in model-based designs for drug-combination trials.
Abstract: Interval designs have recently attracted enormous attention due to their simplicity and desirable properties. We develop a Bayesian optimal interval design for dose finding in drug-combination trials. To determine the next dose combination based on the cumulative data, we propose an allocation rule by maximizing the posterior probability that the toxicity rate of the next dose falls inside a prespecified probability interval. The entire dose-finding procedure is nonparametric (model-free), and is thus robust and does not require the typical "nonparametric" prephase used in model-based designs for drug-combination trials. The proposed two-dimensional interval design enjoys convergence properties for large samples. We conduct simulation studies to demonstrate the finite-sample performance of the proposed method under various scenarios and further make a modification to estimate toxicity contours by parallel dose-finding paths. Simulation results show that on average the performance of the proposed design is comparable with model-based designs, but it is much easier to implement.
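A small base-R illustration of the kind of posterior-interval calculation that drives an interval design: with a Beta prior on the toxicity rate of a candidate dose combination, compute the posterior probability of falling inside a prespecified toxicity interval. The prior, target interval, and counts are illustrative assumptions, not the paper's exact design.

```r
# Observed data at a candidate dose combination
n_treated <- 9     # patients treated at this combination
n_dlt     <- 2     # dose-limiting toxicities observed

# Beta(1, 1) prior gives a Beta(1 + n_dlt, 1 + n_treated - n_dlt) posterior
a <- 1 + n_dlt
b <- 1 + n_treated - n_dlt

# Prespecified toxicity interval around a 30% target
interval <- c(0.25, 0.35)
post_in_interval <- pbeta(interval[2], a, b) - pbeta(interval[1], a, b)

post_in_interval   # combinations with a high value are favoured for the next cohort
```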

Journal ArticleDOI
TL;DR: This paper describes an analytic solution based on using optimal or nearest neighbor matching, rather than caliper matching, to address the bias due to incomplete matching, and finds that the proposed method resulted in estimates of treatment effect that were essentially unbiased.
Abstract: Propensity-score matching is frequently used to reduce the effects of confounding when using observational data to estimate the effects of treatments. Matching allows one to estimate the average effect of treatment in the treated. Rosenbaum and Rubin coined the term "bias due to incomplete matching" to describe the bias that can occur when some treated subjects are excluded from the matched sample because no appropriate control subject was available. The presence of incomplete matching raises important questions around the generalizability of estimated treatment effects to the entire population of treated subjects. We describe an analytic solution to address the bias due to incomplete matching. Our method is based on using optimal or nearest neighbor matching, rather than caliper matching (which frequently results in the exclusion of some treated subjects). Within the sample matched on the propensity score, covariate adjustment using the propensity score is then employed to impute missing potential outcomes under lack of treatment for each treated subject. Using Monte Carlo simulations, we found that the proposed method resulted in estimates of treatment effect that were essentially unbiased. This method resulted in decreased bias compared to caliper matching alone and compared to either optimal matching or nearest neighbor matching alone. Caliper matching alone resulted in design bias or bias due to incomplete matching, while optimal matching or nearest neighbor matching alone resulted in bias due to residual confounding. The proposed method also tended to result in estimates with decreased mean squared error compared to when caliper matching was used.

Journal ArticleDOI
TL;DR: This paper focuses on categorising a continuous predictor within a logistic regression model, in such a way that the best discriminative ability is obtained in terms of the highest area under the receiver operating characteristic curve (AUC).
Abstract: When developing prediction models for application in clinical practice, health practitioners usually categorise clinical variables that are continuous in nature. Although categorisation is not regarded as advisable from a statistical point of view, due to loss of information and power, it is a common practice in medical research. Consequently, providing researchers with a useful and valid categorisation method could be a relevant issue when developing prediction models. Without recommending categorisation of continuous predictors, our aim is to propose a valid way to do it whenever it is considered necessary by clinical researchers. This paper focuses on categorising a continuous predictor within a logistic regression model, in such a way that the best discriminative ability is obtained in terms of the highest area under the receiver operating characteristic curve (AUC). The proposed methodology is validated when the optimal cut points' location is known in theory or in practice. In addition, the proposed method is applied to a real data-set of patients with an exacerbation of chronic obstructive pulmonary disease, in the context of the IRYSS-COPD study where a clinical prediction rule for severe evolution was being developed. The clinical variable PCO2 was categorised in a univariable and a multivariable setting.
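A schematic base-R grid search for a single cut point that maximizes the AUC of the resulting dichotomized logistic model, to make the criterion concrete; the simulated data and the single-cut-point, univariable setting are simplifications of the methodology described above.

```r
set.seed(5)
n <- 400
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))

# AUC via the rank-based Mann-Whitney formula (mid-ranks handle ties)
auc <- function(p, y) {
  r  <- rank(p)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

cuts <- quantile(x, probs = seq(0.1, 0.9, by = 0.01))
aucs <- sapply(cuts, function(cp) {
  xcat <- as.numeric(x > cp)                      # dichotomized predictor
  fit  <- glm(y ~ xcat, family = binomial)
  auc(fitted(fit), y)
})

best <- cuts[which.max(aucs)]
c(best_cut = unname(best), auc_at_best = max(aucs))
```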

Journal ArticleDOI
TL;DR: Nine methods for analyzing partially paired data are reviewed, and it is suggested that when the sample size is moderate, the test based on the modified maximum likelihood estimator is generally superior to the other approaches when the data is normally distributed and the optimal pooled t-test performs the best when the data is not normally distributed.
Abstract: In medical experiments with the objective of testing the equality of two means, data are often partially paired by design or because of missing data. The partially paired data represent a combinati...

Journal ArticleDOI
TL;DR: It is shown how episodes of hospitalisation for disease-related events, obtained from administrative data, can be used as a surrogate for disease status, and flexible multi-state models are proposed for serial hospital admissions and death in HF patients that are able to accommodate important features of disease progression.
Abstract: In chronic diseases like heart failure (HF), the disease course and associated clinical event histories for the patient population vary widely. To improve understanding of the prognosis of patients and enable health care providers to assess and manage resources, we wish to jointly model disease progression, mortality and their relation with patient characteristics. We show how episodes of hospitalisation for disease-related events, obtained from administrative data, can be used as a surrogate for disease status. We propose flexible multi-state models for serial hospital admissions and death in HF patients, that are able to accommodate important features of disease progression, such as multiple ordered events and competing risks. Fully parametric and semi-parametric semi-Markov models are implemented using freely available software in R. The models were applied to a dataset from the administrative data bank of the Lombardia region in Northern Italy, which included 15,298 patients who had a first hospitalisation ending in 2006 and 4 years of follow-up thereafter. This provided estimates of the associations of age and gender with rates of hospital admission and length of stay in hospital, and estimates of the expected total time spent in hospital over five years. For example, older patients and men were readmitted more frequently, though the total time in hospital was roughly constant with age. We also discuss the relative merits of parametric and semi-parametric multi-state models, and model assessment and comparison.

Journal ArticleDOI
TL;DR: A system of multiphase non-linear mixed effects models is presented to model temporal patterns of longitudinal continuous measurements, with temporal decomposition to identify the phases and risk factors within each phase.
Abstract: In medical sciences, we often encounter longitudinal temporal relationships that are non-linear in nature. The influence of risk factors may also change across longitudinal follow-up. A system of multiphase non-linear mixed effects models is presented to model temporal patterns of longitudinal continuous measurements, with temporal decomposition to identify the phases and risk factors within each phase. Application of this model is illustrated with spirometry data after lung transplantation, using readily available statistical software. This application illustrates the usefulness of our flexible model when dealing with complex non-linear patterns and time-varying coefficients.

Journal ArticleDOI
TL;DR: The viability of the marginal mean approach for both stepped wedge and parallel cluster randomized trials is demonstrated, with one correction appearing to provide at least nominal coverage even with only 10 clusters per group.
Abstract: Stepped wedge designs are increasingly commonplace and advantageous for cluster randomized trials when it is both unethical to assign placebo, and it is logistically difficult to allocate an intervention simultaneously to many clusters. We study marginal mean models fit with generalized estimating equations for assessing treatment effectiveness in stepped wedge cluster randomized trials. This approach has advantages over the more commonly used mixed models in that (1) the population-average parameters have an important interpretation for public health applications and (2) they avoid untestable assumptions on latent variable distributions and avoid parametric assumptions about error distributions, therefore providing more robust evidence on treatment effects. However, cluster randomized trials typically have a small number of clusters, rendering the standard generalized estimating equation sandwich variance estimator biased and highly variable and hence yielding incorrect inferences. We study the usual asymptotic generalized estimating equation inferences (i.e., using sandwich variance estimators and asymptotic normality) and four small-sample corrections to generalized estimating equations for stepped wedge cluster randomized trials and for parallel cluster randomized trials as a comparison. We show by simulation that the small-sample corrections provide improvement, with one correction appearing to provide at least nominal coverage even with only 10 clusters per group. These results demonstrate the viability of the marginal mean approach for both stepped wedge and parallel cluster randomized trials. We also study the comparative performance of the corrected methods for stepped wedge and parallel designs, and describe how the methods can accommodate interval censoring of individual failure times and incorporate semiparametric efficient estimators.
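A sketch of the marginal mean model fit with generalized estimating equations on simulated stepped-wedge-like data, assuming geepack is available; the data-generating model and exchangeable working correlation are illustrative, and the small-sample variance corrections studied in the paper are not applied here.

```r
library(geepack)
set.seed(6)
n_clusters <- 12; n_periods <- 4; m <- 20                    # clusters, periods, subjects per cluster-period
cross <- sample(rep(2:n_periods, length.out = n_clusters))   # period at which each cluster crosses to intervention

dat <- do.call(rbind, lapply(1:n_clusters, function(cl) {
  u <- rnorm(1, 0, 0.4)                                      # cluster-level effect (data generation only)
  do.call(rbind, lapply(1:n_periods, function(t) {
    trt <- as.numeric(t >= cross[cl])
    data.frame(cluster = cl, period = factor(t), trt = trt,
               y = rbinom(m, 1, plogis(-1 + 0.5 * trt + 0.1 * t + u)))
  }))
}))

# Population-averaged treatment effect with an exchangeable working correlation;
# the default sandwich variance is used (no small-sample correction applied here)
fit <- geeglm(y ~ trt + period, id = cluster, family = binomial,
              corstr = "exchangeable", data = dat)
summary(fit)
```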

Journal ArticleDOI
TL;DR: A graphical summary display of several fit and model adequacy criteria, the fit-criteria assessment plot, is introduced to facilitate class enumeration; it is an exploratory visualisation tool that can be employed to assist decisions in the initial and decisive phase of group-based trajectory modelling analysis.
Abstract: Background and objective Group-based trajectory modelling is a model-based clustering technique applied for the identification of latent patterns of temporal changes. Despite its manifold applications in clinical and health sciences, potential problems of the model selection procedure are often overlooked. The choice of the number of latent trajectories (class enumeration), for instance, is to a large degree based on statistical criteria that are not fail-safe. Moreover, the process as a whole is not transparent. To facilitate class enumeration, we introduce a graphical summary display of several fit and model adequacy criteria, the fit-criteria assessment plot. Methods An R-code that accepts universal data input is presented. The programme condenses relevant group-based trajectory modelling output information on model fit indices in automated graphical displays. Examples based on real and simulated data are provided to illustrate, assess and validate the fit-criteria assessment plot's utility. Results The fit-criteria assessment plot provides an overview of fit criteria on a single page, placing users in an informed position to make a decision. The fit-criteria assessment plot does not automatically select the most appropriate model but eases the model assessment procedure. Conclusions The fit-criteria assessment plot is an exploratory visualisation tool that can be employed to assist decisions in the initial and decisive phase of group-based trajectory modelling analysis. Considering group-based trajectory modelling's widespread resonance in medical and epidemiological sciences, a more comprehensive, easily interpretable and transparent display of the iterative process of class enumeration may foster its adequate use.

Journal ArticleDOI
TL;DR: It is discovered that the bias induced in the maximum likelihood estimate of a response probability parameter, p, for a binary outcome by the process of adaptive randomization is small in magnitude and, under mild assumptions, can only be negative, causing one's estimate to be closer to zero on average than the truth.
Abstract: Bayesian adaptive trials have the defining feature that the probability of randomization to a particular treatment arm can change as information becomes available as to its true worth. However, there is still a general reluctance to implement such designs in many clinical settings. One area of concern is that their frequentist operating characteristics are poor or, at least, poorly understood. We investigate the bias induced in the maximum likelihood estimate of a response probability parameter, p, for a binary outcome by the process of adaptive randomization. We discover that it is small in magnitude and, under mild assumptions, can only be negative - causing one's estimate to be closer to zero on average than the truth. A simple unbiased estimator for p is obtained, but it is shown to have a large mean squared error. Two approaches are therefore explored to improve its precision, based on inverse probability weighting and Rao-Blackwellization. We illustrate these estimation strategies using two well-known designs from the literature.
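A small base-R simulation sketch of the phenomenon described above: under an allocation rule whose assignment probability increases with the observed success rate, the maximum likelihood (sample proportion) estimate tends to sit slightly below the truth. The allocation rule and trial size are illustrative choices, not the designs analysed in the paper.

```r
set.seed(7)
p_true  <- 0.4
n_total <- 60

one_trial <- function() {
  y <- integer(0)                          # responses observed so far on this arm
  for (i in 1:n_total) {
    # Adaptive rule: assignment probability grows with the arm's observed success rate
    p_assign <- (sum(y) + 1) / (length(y) + 2)
    if (runif(1) < p_assign) y <- c(y, rbinom(1, 1, p_true))
  }
  if (length(y) == 0) NA else mean(y)      # maximum likelihood estimate of p on this arm
}

est <- replicate(20000, one_trial())
mean(est, na.rm = TRUE) - p_true           # bias: typically slightly negative
```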

Journal ArticleDOI
TL;DR: The simulation results showed that NB and NB-GLMM were preferred for dealing with overdispersion resulting from any of the sources the authors considered, and Poisson and DS-Poisson often produced smaller standard-error estimates than expected, while PS-Poisson conversely produced larger standard-error estimates.
Abstract: Overdispersion is a common problem in count data. It can occur due to extra population-heterogeneity, omission of key predictors, and outliers. Unless properly handled, this can lead to invalid inference. Our goal is to assess the differential performance of methods for dealing with overdispersion from several sources. We considered six different approaches: unadjusted Poisson regression (Poisson), deviance-scale-adjusted Poisson regression (DS-Poisson), Pearson-scale-adjusted Poisson regression (PS-Poisson), negative-binomial regression (NB), and two generalized linear mixed models (GLMM) with random intercept, log-link and Poisson (Poisson-GLMM) and negative-binomial (NB-GLMM) distributions. To rank order the preference of the models, we used Akaike's information criteria/Bayesian information criteria values, standard error, and 95% confidence-interval coverage of the parameter values. To compare these methods, we used simulated count data with overdispersion of different magnitude from three different sources. Mean of the count response was associated with three predictors. Data from two real-case studies are also analyzed. The simulation results showed that NB and NB-GLMM were preferred for dealing with overdispersion resulting from any of the sources we considered. Poisson and DS-Poisson often produced smaller standard-error estimates than expected, while PS-Poisson conversely produced larger standard-error estimates. Thus, it is good practice to compare several model options to determine the best method of modeling count data.
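A minimal sketch comparing unadjusted Poisson, Pearson-scale-adjusted (quasi-Poisson), and negative-binomial fits on overdispersed counts, assuming MASS is available; the data-generating model is illustrative and the GLMM variants from the paper are omitted.

```r
library(MASS)
set.seed(8)
n  <- 500
x1 <- rnorm(n)
mu <- exp(0.5 + 0.6 * x1)
y  <- rnbinom(n, mu = mu, size = 1.5)          # counts with extra-Poisson variation

fit_pois <- glm(y ~ x1, family = poisson)
fit_qp   <- glm(y ~ x1, family = quasipoisson) # Pearson-scale adjustment
fit_nb   <- glm.nb(y ~ x1)

# Compare standard errors: the unadjusted Poisson SEs are typically too small here
rbind(poisson      = summary(fit_pois)$coefficients[, "Std. Error"],
      quasipoisson = summary(fit_qp)$coefficients[, "Std. Error"],
      neg_binomial = summary(fit_nb)$coefficients[, "Std. Error"])
```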

Journal ArticleDOI
TL;DR: This work adapts a proposal from Rosenbaum that addresses concerns about selection bias and makes its assumptions explicit, and discusses methods for constructing approximate confidence intervals for treatment effects on quantiles of the outcome distribution or on proportions of patients with outcomes preferable to various cutoffs.
Abstract: Length of stay in the intensive care unit (ICU) is a common outcome measure in randomized trials of ICU interventions. Because many patients die in the ICU, it is difficult to disentangle treatment effects on length of stay from effects on mortality; conventional analyses depend on assumptions that are often unstated and hard to interpret or check. We adapt a proposal from Rosenbaum that addresses concerns about selection bias and makes its assumptions explicit. A composite outcome is constructed that equals ICU length of stay if the patient was discharged alive and indicates death otherwise. Given any preference ordering that compares death with possible lengths of stay, we can estimate the intervention's effects on the composite outcome distribution. Sensitivity analyses can show results for different preference orderings. We discuss methods for constructing approximate confidence intervals for treatment effects on quantiles of the outcome distribution or on proportions of patients with outcomes preferable to various cutoffs. Strengths and weaknesses of possible primary significance tests (including the Wilcoxon-Mann-Whitney rank sum test and a heteroskedasticity-robust variant due to Brunner and Munzel) are reviewed. An illustrative example reanalyzes a randomized trial of an ICU staffing intervention.
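A base-R sketch of the composite-outcome idea: ICU length of stay for survivors, with in-ICU death coded as worse than any observed stay, compared between arms with the Wilcoxon-Mann-Whitney test. The simulated data and this particular preference ordering are assumptions for illustration; the paper also considers other orderings and a Brunner-Munzel variant.

```r
set.seed(9)
n    <- 150
arm  <- rep(c(0, 1), each = n)
los  <- round(rexp(2 * n, rate = ifelse(arm == 1, 1/5, 1/7)))   # ICU days
died <- rbinom(2 * n, 1, ifelse(arm == 1, 0.15, 0.20))          # in-ICU death indicator

# Composite outcome: death is ranked worse than any observed length of stay
worst     <- max(los) + 1
composite <- ifelse(died == 1, worst, los)

# Lower composite values are preferable; compare the two arms
wilcox.test(composite[arm == 1], composite[arm == 0])
```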

Journal ArticleDOI
TL;DR: A decomposition of the score statistic is shown to be interpretable as comparing the precision of estimates from the multivariate and univariate models, and a method for calculating study weights is derived, so that percentage study weights can accompany the results from multivariate and network meta-analyses as they do in conventional univariate meta-analysis.
Abstract: Multivariate and network meta-analysis have the potential for the estimated mean of one effect to borrow strength from the data on other effects of interest. The extent of this borrowing of strength is usually assessed informally. We present new mathematical definitions of ‘borrowing of strength’. Our main proposal is based on a decomposition of the score statistic, which we show can be interpreted as comparing the precision of estimates from the multivariate and univariate models. Our definition of borrowing of strength therefore emulates the usual informal assessment. We also derive a method for calculating study weights, which we embed into the same framework as our borrowing of strength statistics, so that percentage study weights can accompany the results from multivariate and network meta-analyses as they do in conventional univariate meta-analyses. Our proposals are illustrated using three meta-analyses involving correlated effects for multiple outcomes, multiple risk factor associations and multip...

Journal ArticleDOI
TL;DR: A marginalized two-part model for longitudinal data is proposed that allows investigators to obtain the effect of covariates on the overall population mean and maintains the flexibility to include complex random-effect structures and easily estimate functions of the overall mean.
Abstract: In health services research, it is common to encounter semicontinuous data, characterized by a point mass at zero followed by a right-skewed continuous distribution with positive support. Examples include health expenditures, in which the zeros represent a subpopulation of patients who do not use health services, while the continuous distribution describes the level of expenditures among health services users. Longitudinal semicontinuous data are typically analyzed using two-part random-effect mixtures with one component that models the probability of health services use, and a second component that models the distribution of log-scale positive expenditures among users. However, because the second part conditions on a non-zero response, obtaining interpretable effects of covariates on the combined population of health services users and non-users is not straightforward, even though this is often of greatest interest to investigators. Here, we propose a marginalized two-part model for longitudinal data that allows investigators to obtain the effect of covariates on the overall population mean. The model additionally provides estimates of the overall population mean on the original, untransformed scale, and many covariates take a dual population average and subject-specific interpretation. Using a Bayesian estimation approach, this model maintains the flexibility to include complex random-effect structures and easily estimate functions of the overall mean. We illustrate this approach by evaluating the effect of a copayment increase on health care expenditures in the Veterans Affairs health care system over a four-year period.

Journal ArticleDOI
TL;DR: A model-based framework for comparing measurement systems that overcomes the challenges of the popular limits of agreement technique is proposed; it is based on a simple metric, the probability of agreement, and a corresponding plot which can be used to summarize the agreement between two measurement systems.
Abstract: The comparison of two measurement systems is important in medical and other contexts. A common goal is to decide if a new measurement system agrees suitably with an existing one, and hence whether the two can be used interchangeably. Various methods for assessing interchangeability are available, the most popular being the limits of agreement approach due to Bland and Altman. In this article, we review the challenges of this technique and propose a model-based framework for comparing measurement systems that overcomes those challenges. The proposal is based on a simple metric, the probability of agreement, and a corresponding plot which can be used to summarize the agreement between two measurement systems. We also make recommendations for a study design that facilitates accurate and precise estimation of the probability of agreement.

Journal ArticleDOI
TL;DR: The bootstrapping approach demonstrates appropriate coverage of the nominal 95% CI over a spectrum of populations and sample sizes and is applicable to other binomial proportions with homogeneous responses.
Abstract: Objectives Assessing high-sensitivity tests for mortal illness is crucial in emergency and critical care medicine. Estimating the 95% confidence interval (CI) of the likelihood ratio (LR) can be challenging when sample sensitivity is 100%. We aimed to develop, compare, and automate a bootstrapping method to estimate the negative LR CI when sample sensitivity is 100%. Methods The lowest population sensitivity that is most likely to yield sample sensitivity 100% is located using the binomial distribution. Random binomial samples generated using this population sensitivity are then used in the LR bootstrap. A free R program, "bootLR," automates the process. Extensive simulations were performed to determine how often the LR bootstrap and comparator method 95% CIs cover the true population negative LR value. Finally, the 95% CI was compared for theoretical sample sizes and sensitivities approaching and including 100% using: (1) a technique of individual extremes, (2) SAS software based on the technique of Gart and Nam, (3) the Score CI (as implemented in the StatXact, SAS, and R PropCI package), and (4) the bootstrapping technique. Results The bootstrapping approach demonstrates appropriate coverage of the nominal 95% CI over a spectrum of populations and sample sizes. Considering a study of sample size 200 with 100 patients with disease, and specificity 60%, the lowest population sensitivity with median sample sensitivity 100% is 99.31%. When all 100 patients with disease test positive, the negative LR 95% CIs are: individual extremes technique (0,0.073), StatXact (0,0.064), SAS Score method (0,0.057), R PropCI (0,0.062), and bootstrap (0,0.048). Similar trends were observed for other sample sizes. Conclusions When study samples demonstrate 100% sensitivity, available methods may yield inappropriately wide negative LR CIs. An alternative bootstrapping approach and accompanying free open-source R package were developed to yield realistic estimates easily. This methodology and implementation are applicable to other binomial proportions with homogeneous responses.
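A simplified base-R version of the bootstrap idea when sample sensitivity is 100%: locate the lowest population sensitivity whose median sample sensitivity is still 100% via the binomial distribution, then bootstrap the negative likelihood ratio. The parameters echo the worked example in the abstract, but this is a sketch rather than the bootLR implementation.

```r
set.seed(10)
n_dis    <- 100      # patients with disease
n_nodis  <- 100      # patients without disease
spec_hat <- 0.60     # observed specificity
B        <- 10000

# Lowest population sensitivity with median sample sensitivity 100%:
# the smallest p such that P(X = n_dis | n_dis, p) >= 0.5
p_grid   <- seq(0.95, 0.99999, by = 1e-5)
sens_low <- min(p_grid[dbinom(n_dis, n_dis, p_grid) >= 0.5])   # about 0.9931 for n = 100

# Bootstrap the negative likelihood ratio LR- = (1 - sensitivity) / specificity
nlr <- replicate(B, {
  sens_b <- rbinom(1, n_dis,   sens_low) / n_dis
  spec_b <- rbinom(1, n_nodis, spec_hat) / n_nodis
  (1 - sens_b) / spec_b
})
quantile(nlr, c(0.025, 0.975))   # approximate 95% CI for LR-
```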

Journal ArticleDOI
TL;DR: This paper proposes to use the generalized estimating equation (GEE) approach to assess treatment effect in split-mouth trials, accounting for correlations among observations, and introduces closed-form sample size formulas forsplit-mouth design with continuous and binary outcomes.
Abstract: Split-mouth designs are frequently used in dental clinical research, where a mouth is divided into two or more experimental segments that are randomly assigned to different treatments. It has the distinct advantage of removing a lot of inter-subject variability from the estimated treatment effect. Methods of statistical analyses for split-mouth design have been well developed. However, little work is available on sample size consideration at the design phase of a split-mouth trial, although many researchers pointed out that the split-mouth design can only be more efficient than a parallel-group design when within-subject correlation coefficient is substantial. In this paper, we propose to use the generalized estimating equation (GEE) approach to assess treatment effect in split-mouth trials, accounting for correlations among observations. Closed-form sample size formulas are introduced for the split-mouth design with continuous and binary outcomes, assuming exchangeable and "nested exchangeable" correlation structures for outcomes from the same subject. The statistical inference is based on the large sample approximation under the GEE approach. Simulation studies are conducted to investigate the finite-sample performance of the GEE sample size formulas. A dental clinical trial example is presented for illustration.
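To make the role of the within-subject correlation concrete, here is a base-R sketch of a standard paired-comparison sample size calculation for a continuous outcome, in which the required number of subjects shrinks as the within-mouth correlation grows; this is a generic textbook formula for illustration, not the paper's GEE-based formulas (which also cover binary outcomes and nested exchangeable correlation).

```r
# Subjects needed for a two-treatment split-mouth comparison of means:
# two-sided significance level alpha, power 1 - beta, within-subject correlation rho
split_mouth_n <- function(delta, sigma, rho, alpha = 0.05, power = 0.80) {
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  ceiling(z^2 * 2 * sigma^2 * (1 - rho) / delta^2)   # Var(within-subject difference) = 2*sigma^2*(1 - rho)
}

# Example: detect a 0.5-unit difference with SD 1.0 at several correlations
sapply(c(0, 0.3, 0.6), function(r) split_mouth_n(delta = 0.5, sigma = 1, rho = r))
```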

Journal ArticleDOI
TL;DR: Extending the two-stage drop-the-losers design to more than two stages is discussed and shown to considerably reduce the required sample size; the impact of delay between recruitment and assessment, as well as of unknown variance, on drop-the-losers designs is also assessed.
Abstract: Multi-arm multi-stage trials can improve the efficiency of the drug development process when multiple new treatments are available for testing. A group-sequential approach can be used in order to design multi-arm multi-stage trials, using an extension to Dunnett's multiple-testing procedure. The actual sample size used in such a trial is a random variable that has high variability. This can cause problems when applying for funding as the cost will also be generally highly variable. This motivates a type of design that provides the efficiency advantages of a group-sequential multi-arm multi-stage design, but has a fixed sample size. One such design is the two-stage drop-the-losers design, in which a number of experimental treatments, and a control treatment, are assessed at a prescheduled interim analysis. The best-performing experimental treatment and the control treatment then continue to a second stage. In this paper, we discuss extending this design to have more than two stages, which is shown to considerably reduce the sample size required. We also compare the resulting sample size requirements to the sample size distribution of analogous group-sequential multi-arm multi-stage designs. The sample size required for a multi-stage drop-the-losers design is usually higher than, but close to, the median sample size of a group-sequential multi-arm multi-stage trial. In many practical scenarios, the disadvantage of a slight loss in average efficiency would be overcome by the huge advantage of a fixed sample size. We assess the impact of delay between recruitment and assessment as well as unknown variance on the drop-the-losers designs.

Journal ArticleDOI
TL;DR: A doubly robust estimator of the attributable fraction function is derived, which requires one model for the outcome and one joint model for the exposure and censoring.
Abstract: The attributable fraction is a commonly used measure that quantifies the public health impact of an exposure on an outcome. It was originally defined for binary outcomes, but an extension has recently been proposed for right-censored survival time outcomes; the so-called attributable fraction function. A maximum likelihood estimator of the attributable fraction function has been developed, which requires a model for the outcome. In this paper, we derive a doubly robust estimator of the attributable fraction function. This estimator requires one model for the outcome, and one joint model for the exposure and censoring. The estimator is consistent if either model is correct, not necessarily both.

Journal ArticleDOI
TL;DR: A model is described to represent the impact of preferences on trial outcomes, in addition to the usual treatment effect, including how outcomes might differ between participants who would choose one treatment or the other if they were free to do so.
Abstract: The treatments under comparison in a randomised trial should ideally have equal value and acceptability – a position of equipoise – to study participants. However, it is unlikely that true equipoise exists in practice, because at least some participants may have preferences for one treatment or the other, for a variety of reasons. These preferences may be related to study outcomes, and hence affect the estimation of the treatment effect. Furthermore, the effects of preferences can sometimes be substantial, and may even be larger than the direct effect of treatment. Preference effects are of interest in their own right, but they cannot be assessed in the standard parallel group design for a randomised trial. In this paper, we describe a model to represent the impact of preferences on trial outcomes, in addition to the usual treatment effect. In particular, we describe how outcomes might differ between participants who would choose one treatment or the other, if they were free to do so. Additionally, we inv...