
Showing papers in "Statistics in Medicine in 2017"


Journal ArticleDOI
TL;DR: How established methods of meta-regression and random effects modelling from mainstream meta-analysis are being adapted to perform MR analyses is clarified, and the ability of two popular random effects models to provide robustness to pleiotropy under the IVW approach is investigated.
Abstract: Mendelian randomization (MR) uses genetic data to probe questions of causality in epidemiological research, by invoking the Instrumental Variable (IV) assumptions. In recent years, it has become commonplace to attempt MR analyses by synthesising summary data estimates of genetic association gleaned from large and independent study populations. This is referred to as two-sample summary data MR. Unfortunately, due to the sheer number of variants that can be easily included into summary data MR analyses, it is increasingly likely that some do not meet the IV assumptions due to pleiotropy. There is a pressing need to develop methods that can both detect and correct for pleiotropy, in order to preserve the validity of the MR approach in this context. In this paper, we aim to clarify how established methods of meta-regression and random effects modelling from mainstream meta-analysis are being adapted to perform this task. Specifically, we focus on two contrasting approaches: the Inverse Variance Weighted (IVW) method which assumes in its simplest form that all genetic variants are valid IVs, and the method of MR-Egger regression that allows all variants to violate the IV assumptions, albeit in a specific way. We investigate the ability of two popular random effects models to provide robustness to pleiotropy under the IVW approach, and propose statistics to quantify the relative goodness-of-fit of the IVW approach over MR-Egger regression. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
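
A minimal sketch of the two contrasting estimators discussed above, assuming hypothetical arrays of per-variant summary associations and standard errors; orientation of variants and the random-effects weighting discussed in the paper are glossed over, so this is an illustration rather than the authors' implementation.

```python
# Sketch of IVW and MR-Egger regression for two-sample summary-data MR,
# using hypothetical SNP-exposure and SNP-outcome association estimates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_snps = 30
beta_exposure = rng.uniform(0.05, 0.3, n_snps)                     # SNP-exposure associations
beta_outcome = 0.4 * beta_exposure + rng.normal(0, 0.02, n_snps)   # SNP-outcome associations
se_outcome = np.full(n_snps, 0.02)                                 # SEs of outcome associations

weights = 1.0 / se_outcome**2

# IVW: weighted regression of beta_outcome on beta_exposure with no intercept.
ivw = sm.WLS(beta_outcome, beta_exposure.reshape(-1, 1), weights=weights).fit()

# MR-Egger: the same regression with an intercept; a non-zero intercept
# indicates directional pleiotropy, and the slope is the causal estimate.
egger = sm.WLS(beta_outcome, sm.add_constant(beta_exposure), weights=weights).fit()

print("IVW estimate:      ", ivw.params[0])
print("MR-Egger intercept:", egger.params[0], " slope:", egger.params[1])
```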

702 citations


Journal ArticleDOI
TL;DR: It is found that many authors provided an unclear or incorrect interpretation of the regression coefficients associated with the Fine‐Gray subdistribution hazard model, and suggestions for interpreting these coefficients are proposed.
Abstract: In survival analysis, a competing risk is an event whose occurrence precludes the occurrence of the primary event of interest. Outcomes in medical research are frequently subject to competing risks. In survival analysis, there are 2 key questions that can be addressed using competing risk regression models: first, which covariates affect the rate at which events occur, and second, which covariates affect the probability of an event occurring over time. The cause-specific hazard model estimates the effect of covariates on the rate at which events occur in subjects who are currently event-free. Subdistribution hazard ratios obtained from the Fine-Gray model describe the relative effect of covariates on the subdistribution hazard function. Hence, the covariates in this model can also be interpreted as having an effect on the cumulative incidence function or on the probability of events occurring over time. We conducted a review of the use and interpretation of the Fine-Gray subdistribution hazard model in articles published in the medical literature in 2015. We found that many authors provided an unclear or incorrect interpretation of the regression coefficients associated with this model. An incorrect and inconsistent interpretation of regression coefficients may lead to confusion when comparing results across different studies. Furthermore, an incorrect interpretation of estimated regression coefficients can result in an incorrect understanding about the magnitude of the association between exposure and the incidence of the outcome. The objective of this article is to clarify how these regression coefficients should be reported and to propose suggestions for interpreting these coefficients.
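
One way to see why these coefficients are easily misread: under the Fine-Gray model, the subdistribution hazard ratio acts on the cumulative incidence function (CIF) through a complementary log-log link, so it shifts the CIF but does not multiply it directly. A small numeric sketch with made-up baseline CIF values:

```python
# Under the Fine-Gray model, CIF(t | x) = 1 - (1 - CIF_0(t)) ** exp(beta * x),
# so a subdistribution hazard ratio changes the cumulative incidence on the
# complementary log-log scale rather than multiplying it directly.
import numpy as np

baseline_cif = np.array([0.05, 0.10, 0.20])   # hypothetical baseline CIF at three times
shr = 1.5                                     # subdistribution hazard ratio, exp(beta)

cif_exposed = 1 - (1 - baseline_cif) ** shr
print("baseline CIF:", baseline_cif)
print("exposed CIF: ", cif_exposed.round(3))  # e.g. 0.20 -> ~0.284, not 0.30
```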

543 citations


Journal ArticleDOI
TL;DR: A suite of analyses that can complement the fitting of multilevel logistic regression models is described, permitting analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model.
Abstract: Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R2 measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
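
Two of the heterogeneity measures described above have simple closed forms given the cluster-level variance on the log-odds scale; a short sketch with a hypothetical variance value:

```python
# Median odds ratio (MOR) and variance partition coefficient (VPC) from the
# cluster-level variance (sigma^2) of a multilevel logistic regression model.
import math
from scipy.stats import norm

sigma2 = 0.35  # hypothetical between-cluster variance on the log-odds scale

# MOR: median odds ratio comparing two identical subjects from two randomly
# chosen clusters, with the higher-risk cluster in the numerator.
mor = math.exp(math.sqrt(2 * sigma2) * norm.ppf(0.75))

# VPC: share of total latent variance attributable to clusters, using the
# standard logistic residual variance pi^2 / 3.
vpc = sigma2 / (sigma2 + math.pi**2 / 3)

print(f"MOR = {mor:.2f}, VPC = {vpc:.3f}")
```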

364 citations


Journal ArticleDOI
TL;DR: This tutorial paper outlines the key statistical methods for one‐stage and two‐stage IPD meta‐analyses, and provides 10 key reasons why they may produce different summary results, and explains that most differences arise because of different modelling assumptions.
Abstract: Meta-analysis using individual participant data (IPD) obtains and synthesises the raw, participant-level data from a set of relevant studies. The IPD approach is becoming an increasingly popular tool as an alternative to traditional aggregate data meta-analysis, especially as it avoids reliance on published results and provides an opportunity to investigate individual-level interactions, such as treatment-effect modifiers. There are two statistical approaches for conducting an IPD meta-analysis: one-stage and two-stage. The one-stage approach analyses the IPD from all studies simultaneously, for example, in a hierarchical regression model with random effects. The two-stage approach derives aggregate data (such as effect estimates) in each study separately and then combines these in a traditional meta-analysis model. There have been numerous comparisons of the one-stage and two-stage approaches via theoretical consideration, simulation and empirical examples, yet there remains confusion regarding when each approach should be adopted, and indeed why they may differ. In this tutorial paper, we outline the key statistical methods for one-stage and two-stage IPD meta-analyses, and provide 10 key reasons why they may produce different summary results. We explain that most differences arise because of different modelling assumptions, rather than the choice of one-stage or two-stage itself. We illustrate the concepts with recently published IPD meta-analyses, summarise key statistical software and provide recommendations for future IPD meta-analyses. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
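
As a concrete reference point for the two-stage approach, the sketch below pools hypothetical study-level estimates with inverse-variance weights and a DerSimonian-Laird between-study variance; a one-stage analysis would instead fit a single hierarchical model to the participant-level data.

```python
# Second stage of a two-stage IPD meta-analysis: inverse-variance pooling of
# study-level estimates with a DerSimonian-Laird estimate of tau^2.
import numpy as np

yi = np.array([0.30, 0.12, 0.45, 0.22, 0.05])   # hypothetical per-study estimates
sei = np.array([0.10, 0.08, 0.15, 0.12, 0.09])  # their standard errors

w = 1 / sei**2
mu_fixed = np.sum(w * yi) / np.sum(w)
Q = np.sum(w * (yi - mu_fixed)**2)
k = len(yi)

# DerSimonian-Laird between-study variance.
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

w_re = 1 / (sei**2 + tau2)
mu_re = np.sum(w_re * yi) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))
print(f"tau^2 = {tau2:.4f}, pooled estimate = {mu_re:.3f} (SE {se_re:.3f})")
```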

320 citations


Journal ArticleDOI
TL;DR: This tutorial focuses on a general class of problems arising in data-driven subgroup analysis, namely, identification of biomarkers with strong predictive properties and patient subgroups with desirable characteristics such as improved benefit and/or safety.
Abstract: It is well known that both the direction and magnitude of the treatment effect in clinical trials are often affected by baseline patient characteristics (generally referred to as biomarkers). Characterization of treatment effect heterogeneity plays a central role in the field of personalized medicine and facilitates the development of tailored therapies. This tutorial focuses on a general class of problems arising in data-driven subgroup analysis, namely, identification of biomarkers with strong predictive properties and patient subgroups with desirable characteristics such as improved benefit and/or safety. Limitations of ad-hoc approaches to biomarker exploration and subgroup identification in clinical trials are discussed, and the ad-hoc approaches are contrasted with principled approaches to exploratory subgroup analysis based on recent advances in machine learning and data mining. A general framework for evaluating predictive biomarkers and identification of associated subgroups is introduced. The tutorial provides a review of a broad class of statistical methods used in subgroup discovery, including global outcome modeling methods, global treatment effect modeling methods, optimal treatment regimes, and local modeling methods. Commonly used subgroup identification methods are illustrated using two case studies based on clinical trials with binary and survival endpoints. Copyright © 2016 John Wiley & Sons, Ltd.
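
For orientation, the simplest (and, as the tutorial cautions, easily over-interpreted) ad-hoc screen fits a treatment-by-biomarker interaction one biomarker at a time; a minimal sketch with simulated data and placeholder variable names:

```python
# Ad-hoc predictive-biomarker screen: test a treatment-by-biomarker interaction
# for a binary endpoint, one biomarker at a time (simulated, hypothetical data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "treat": rng.integers(0, 2, n),
    "biomarker": rng.normal(size=n),
})
logit = -0.5 + 0.2 * df.treat + 0.4 * df.treat * df.biomarker
df["response"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

fit = smf.logit("response ~ treat * biomarker", data=df).fit(disp=0)
print(fit.summary().tables[1])   # the treat:biomarker row is the interaction test
```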

218 citations


Journal ArticleDOI
TL;DR: It is shown, through theoretical arguments and a simulation study, that the multivariable MR‐Egger method has advantages over its univariable counterpart in terms of plausibility of the assumption needed for consistent causal estimation and power to detect a causal effect when this assumption is satisfied.
Abstract: Methods have been developed for Mendelian randomization that can obtain consistent causal estimates while relaxing the instrumental variable assumptions. These include multivariable Mendelian randomization, in which a genetic variant may be associated with multiple risk factors so long as any association with the outcome is via the measured risk factors (measured pleiotropy), and the MR-Egger (Mendelian randomization-Egger) method, in which a genetic variant may be directly associated with the outcome not via the risk factor of interest, so long as the direct effects of the variants on the outcome are uncorrelated with their associations with the risk factor (unmeasured pleiotropy). In this paper, we extend the MR-Egger method to a multivariable setting to correct for both measured and unmeasured pleiotropy. We show, through theoretical arguments and a simulation study, that the multivariable MR-Egger method has advantages over its univariable counterpart in terms of plausibility of the assumption needed for consistent causal estimation and power to detect a causal effect when this assumption is satisfied. The methods are compared in an applied analysis to investigate the causal effect of high-density lipoprotein cholesterol on coronary heart disease risk. The multivariable MR-Egger method will be useful to analyse high-dimensional data in situations where the risk factors are highly related and it is difficult to find genetic variants specifically associated with the risk factor of interest (multivariable by design), and as a sensitivity analysis when the genetic variants are known to have pleiotropic effects on measured risk factors.
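
A minimal sketch of the multivariable MR-Egger regression described above, assuming hypothetical per-variant associations with two risk factors; details such as orienting the variants with respect to a chosen exposure are glossed over.

```python
# Multivariable MR-Egger sketch: regress SNP-outcome associations on the
# SNP associations with several risk factors, keeping an intercept term.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_snps = 50
bx = rng.uniform(0.05, 0.3, size=(n_snps, 2))             # associations with 2 risk factors
by = bx @ np.array([0.3, -0.1]) + 0.01 + rng.normal(0, 0.02, n_snps)
se_by = np.full(n_snps, 0.02)

X = sm.add_constant(bx)                                   # intercept captures pleiotropy
fit = sm.WLS(by, X, weights=1 / se_by**2).fit()
print("intercept (pleiotropy):", fit.params[0])
print("causal estimates:      ", fit.params[1:])
```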

202 citations


Journal ArticleDOI
TL;DR: Two simple modifications of Firth's logistic regression resulting in unbiased predicted probabilities are proposed and one introduces some bias, but this is compensated by a decrease in the mean squared error.
Abstract: Firth's logistic regression has become a standard approach for the analysis of binary outcomes with small samples. Whereas it reduces the bias in maximum likelihood estimates of coefficients, bias towards one-half is introduced in the predicted probabilities. The stronger the imbalance of the outcome, the more severe is the bias in the predicted probabilities. We propose two simple modifications of Firth's logistic regression resulting in unbiased predicted probabilities. The first corrects the predicted probabilities by a post hoc adjustment of the intercept. The other is based on an alternative formulation of Firth's penalization as an iterative data augmentation procedure. Our suggested modification consists in introducing an indicator variable that distinguishes between original and pseudo-observations in the augmented data. In a comprehensive simulation study, these approaches are compared with other attempts to improve predictions based on Firth's penalization and to other published penalization strategies intended for routine use. For instance, we consider a recently suggested compromise between maximum likelihood and Firth's logistic regression. Simulation results are scrutinized with regard to prediction and effect estimation. We find that both our suggested methods not only give unbiased predicted probabilities but also improve the accuracy conditional on explanatory variables compared with Firth's penalization. While one method results in effect estimates identical to those of Firth's penalization, the other introduces some bias, but this is compensated by a decrease in the mean squared error. Finally, all methods considered are illustrated and compared for a study on arterial closure devices in minimally invasive cardiac surgery. Copyright © 2017 John Wiley & Sons, Ltd.
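
The first modification (post hoc adjustment of the intercept) can be sketched as follows, assuming slope coefficients from a Firth fit are already available (the values below are placeholders): the intercept is re-estimated so that the average predicted probability equals the observed event rate, which is the maximum likelihood solution for an intercept-only model with the remaining linear predictor as an offset.

```python
# Post hoc intercept correction after Firth's logistic regression: keep the
# penalized slope coefficients and re-estimate the intercept so that the mean
# predicted probability equals the observed event rate.
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit

rng = np.random.default_rng(3)
n = 80
X = rng.normal(size=(n, 2))
y = rng.binomial(1, expit(-2 + X @ np.array([0.8, -0.5])))  # imbalanced outcome

# Placeholder for slope coefficients from a Firth fit (hypothetical values).
beta_firth = np.array([0.7, -0.4])
offset = X @ beta_firth

def score(intercept):
    # Score equation of the intercept-only model with offset: mean(y) - mean(p).
    return np.mean(y) - np.mean(expit(intercept + offset))

intercept_corrected = brentq(score, -20, 20)
print("corrected intercept:", round(intercept_corrected, 3))
print("mean predicted prob:", round(np.mean(expit(intercept_corrected + offset)), 3),
      " observed rate:", round(np.mean(y), 3))
```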

163 citations


Journal ArticleDOI
TL;DR: The present study compared the nonparametric bootstrap test with pooled resampling against the corresponding parametric, nonparametric, and permutation tests through extensive simulations under various conditions and using real data examples, with the aim of overcoming the problems associated with small samples in hypothesis testing.
Abstract: Experimental studies in biomedical research frequently pose analytical problems related to small sample size. In such studies, there are conflicting findings regarding the choice of parametric and nonparametric analysis, especially with non-normal data. In such instances, some methodologists questioned the validity of parametric tests and suggested nonparametric tests. In contrast, other methodologists found nonparametric tests to be too conservative and less powerful and thus preferred using parametric tests. Some researchers have recommended using a bootstrap test; however, this method also has limitations with small sample sizes. We used a pooled resampling method within the nonparametric bootstrap test that may overcome the problems related to small samples in hypothesis testing. The present study compared the nonparametric bootstrap test with pooled resampling against the corresponding parametric, nonparametric, and permutation tests through extensive simulations under various conditions and using real data examples. The nonparametric pooled bootstrap t-test provided equal or greater power for comparing two means as compared with the unpaired t-test, Welch t-test, Wilcoxon rank sum test, and permutation test, while maintaining the type I error probability for all conditions except the Cauchy and extremely variable lognormal distributions. In such cases, we suggest using an exact Wilcoxon rank sum test. The nonparametric bootstrap paired t-test also provided better performance than other alternatives. The nonparametric bootstrap test provided a benefit over the exact Kruskal-Wallis test. We suggest using the nonparametric bootstrap test with pooled resampling for comparing paired or unpaired means and for validating one-way analysis of variance test results for non-normal data in small sample size studies. Copyright © 2017 John Wiley & Sons, Ltd.
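
The general flavour of a pooled-resampling bootstrap test for two means can be sketched as follows; this is a generic illustration of the idea, not necessarily the authors' exact algorithm.

```python
# Generic pooled-resampling bootstrap test of H0: equal means for two small
# samples: resample with replacement from the pooled data and compare the
# observed Welch t-statistic with its bootstrap null distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = np.array([4.1, 5.3, 2.8, 6.0, 4.7, 3.9])
y = np.array([6.2, 7.1, 5.9, 8.0, 6.6])

t_obs = stats.ttest_ind(x, y, equal_var=False).statistic
pooled = np.concatenate([x, y])

B = 10000
t_boot = np.empty(B)
for b in range(B):
    xb = rng.choice(pooled, size=len(x), replace=True)
    yb = rng.choice(pooled, size=len(y), replace=True)
    t_boot[b] = stats.ttest_ind(xb, yb, equal_var=False).statistic

p_value = np.mean(np.abs(t_boot) >= abs(t_obs))
print(f"observed t = {t_obs:.2f}, bootstrap p = {p_value:.3f}")
```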

152 citations


Journal ArticleDOI
TL;DR: It is found that properly specified CPMs generally have good finite sample performance with moderate sample sizes, but that bias may occur when the sample size is small, and these models are fairly robust to minor or moderate link function misspecification in the authors' simulations.
Abstract: We study the application of a widely used ordinal regression model, the cumulative probability model (CPM), for continuous outcomes. Such models are attractive for the analysis of continuous response variables because they are invariant to any monotonic transformation of the outcome and because they directly model the cumulative distribution function from which summaries such as expectations and quantiles can easily be derived. Such models can also readily handle mixed type distributions. We describe the motivation, estimation, inference, model assumptions, and diagnostics. We demonstrate that CPMs applied to continuous outcomes are semiparametric transformation models. Extensive simulations are performed to investigate the finite sample performance of these models. We find that properly specified CPMs generally have good finite sample performance with moderate sample sizes, but that bias may occur when the sample size is small. Cumulative probability models are fairly robust to minor or moderate link function misspecification in our simulations. For certain purposes, the CPMs are more efficient than other models. We illustrate their application, with model diagnostics, in a study of the treatment of HIV. CD4 cell count and viral load 6 months after the initiation of antiretroviral therapy are modeled using CPMs; both variables typically require transformations, and viral load has a large proportion of measurements below a detection limit.

126 citations


Journal ArticleDOI
TL;DR: Researchers should be cautious in deriving 95% prediction intervals following a frequentist random‐effects meta‐analysis until a more reliable solution is identified, especially when there are few studies.
Abstract: A random effects meta-analysis combines the results of several independent studies to summarise the evidence about a particular measure of interest, such as a treatment effect. The approach allows for unexplained between-study heterogeneity in the true treatment effect by incorporating random study effects about the overall mean. The variance of the mean effect estimate is conventionally calculated by assuming that the between study variance is known; however, it has been demonstrated that this approach may be inappropriate, especially when there are few studies. Alternative methods that aim to account for this uncertainty, such as Hartung-Knapp, Sidik-Jonkman and Kenward-Roger, have been proposed and shown to improve upon the conventional approach in some situations. In this paper, we use a simulation study to examine the performance of several of these methods in terms of the coverage of the 95% confidence and prediction intervals derived from a random effects meta-analysis estimated using restricted maximum likelihood. We show that, in terms of the confidence intervals, the Hartung-Knapp correction performs well across a wide range of scenarios and outperforms other methods when heterogeneity is large and/or study sizes are similar. However, the coverage of the Hartung-Knapp method is slightly too low when the heterogeneity is low (I2 ≤ 30%) and study sizes are similar. In other situations, especially when heterogeneity is small and the study sizes are quite varied, the coverage is far too low and could not be consistently improved by either increasing the number of studies, altering the degrees of freedom or using variance inflation methods. Therefore, researchers should be cautious in deriving 95% prediction intervals following a frequentist random-effects meta-analysis until a more reliable solution is identified. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
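
For reference, the Hartung-Knapp confidence interval and the t-based prediction interval examined in the simulations can be computed from study-level inputs as below; the between-study variance is a placeholder value here, whereas the paper estimates it by restricted maximum likelihood.

```python
# Hartung-Knapp 95% confidence interval and t-based 95% prediction interval
# for a random-effects meta-analysis, given study estimates, standard errors,
# and a between-study variance estimate (placeholder value).
import numpy as np
from scipy import stats

yi = np.array([0.21, 0.35, 0.10, 0.42, 0.28, 0.15])   # hypothetical study estimates
sei = np.array([0.12, 0.10, 0.14, 0.11, 0.09, 0.13])  # their standard errors
tau2 = 0.01                                           # hypothetical between-study variance
k = len(yi)

w = 1 / (sei**2 + tau2)
mu = np.sum(w * yi) / np.sum(w)

# Hartung-Knapp variance: weighted residual variance divided by (k - 1).
var_hk = np.sum(w * (yi - mu)**2) / ((k - 1) * np.sum(w))
ci = mu + np.array([-1, 1]) * stats.t.ppf(0.975, k - 1) * np.sqrt(var_hk)

# Higgins-style prediction interval for the effect in a new study.
pi = mu + np.array([-1, 1]) * stats.t.ppf(0.975, k - 2) * np.sqrt(tau2 + 1 / np.sum(w))

print(f"pooled = {mu:.3f}, HK 95% CI = {ci.round(3)}, 95% PI = {pi.round(3)}")
```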

116 citations


Journal ArticleDOI
TL;DR: It is concluded that the inclusion of real-world evidence from non-randomized studies has the potential to corroborate findings from RCTs, increase precision and enhance the decision-making process.
Abstract: Non-randomized studies aim to reveal whether or not interventions are effective in real-life clinical practice, and there is a growing interest in including such evidence in the decision-making process. We evaluate existing methodologies and present new approaches to using non-randomized evidence in a network meta-analysis of randomized controlled trials (RCTs) when the aim is to assess relative treatment effects. We first discuss how to assess compatibility between the two types of evidence. We then present and compare an array of alternative methods that allow the inclusion of non-randomized studies in a network meta-analysis of RCTs: the naive data synthesis, the design-adjusted synthesis, the use of non-randomized evidence as prior information and the use of three-level hierarchical models. We apply some of the methods in two previously published clinical examples comparing percutaneous interventions for the treatment of coronary in-stent restenosis and antipsychotics in patients with schizophrenia. We discuss in depth the advantages and limitations of each method, and we conclude that the inclusion of real-world evidence from non-randomized studies has the potential to corroborate findings from RCTs, increase precision and enhance the decision-making process. Copyright © 2017 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: It is concluded that disentangling adherence to treatment and the efficacy and safety of treatment in patients who adhere leads to a transparent and clinically meaningful assessment of treatment risks and benefits.
Abstract: Defining the scientific questions of interest in a clinical trial is crucial to align its planning, design, conduct, analysis, and interpretation. However, practical experience shows that oftentimes specific choices in the statistical analysis blur the scientific question either in part or even completely, resulting in misalignment between trial objectives, conduct, analysis, and confusion in interpretation. The need for more clarity was highlighted by the Steering Committee of the International Council for Harmonization (ICH) in 2014, which endorsed a Concept Paper with the goal of developing a new regulatory guidance, suggested to be an addendum to ICH guideline E9. Triggered by these developments, we elaborate in this paper what the relevant questions in drug development are and how they fit with the current practice of intention-to-treat analyses. To this end, we consider the perspectives of patients, physicians, regulators, and payers. We argue that despite the different backgrounds and motivations of the various stakeholders, they all have similar interests in what the clinical trial estimands should be. Broadly, these can be classified into estimands addressing (a) lack of adherence to treatment due to different reasons and (b) efficacy and safety profiles when patients, in fact, are able to adhere to the treatment for its intended duration. We conclude that disentangling adherence to treatment and the efficacy and safety of treatment in patients who adhere leads to a transparent and clinically meaningful assessment of treatment risks and benefits. We touch upon statistical considerations and offer a discussion of additional implications. Copyright © 2016 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: In this paper, maximally selected rank statistics are used for split point selection and, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed.
Abstract: The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: It is concluded that the Hartung and Knapp modification may be a suitable replacement for the standard method and analysts who advocate the modified method should be ready to defend its use against the possible objections to it that are presented.
Abstract: The modified method for random-effects meta-analysis, usually attributed to Hartung and Knapp and also proposed by Sidik and Jonkman, is easy to implement and is becoming advocated for general use. Here, we examine a range of potential concerns about the widespread adoption of this method. Motivated by these issues, a variety of different conventions can be adopted when using the modified method in practice. We describe and investigate the use of a variety of these conventions using a new taxonomy of meta-analysis datasets. We conclude that the Hartung and Knapp modification may be a suitable replacement for the standard method. Despite this, analysts who advocate the modified method should be ready to defend its use against the possible objections to it that we present. We further recommend that the results from more conventional approaches should be used as sensitivity analyses when using the modified method. It has previously been suggested that a common-effect analysis should be used for this purpose but we suggest amending this recommendation and argue that a standard random-effects analysis should be used instead.

Journal ArticleDOI
TL;DR: The proposed closed testing procedure may be useful in selecting appropriate update methods for previously developed prediction models by considering the balance between the amount of evidence for updating in the new patient sample and the danger of overfitting.
Abstract: Prediction models fitted with logistic regression often show poor performance when applied in populations other than the development population. Model updating may improve predictions. Previously suggested methods vary in their extensiveness of updating the model. We aim to define a strategy in selecting an appropriate update method that considers the balance between the amount of evidence for updating in the new patient sample and the danger of overfitting. We consider recalibration in the large (re-estimation of model intercept); recalibration (re-estimation of intercept and slope) and model revision (re-estimation of all coefficients) as update methods. We propose a closed testing procedure that allows the extensiveness of the updating to increase progressively from a minimum (the original model) to a maximum (a completely revised model). The procedure involves multiple testing while approximately maintaining the chosen type I error rate. We illustrate this approach with three clinical examples: patients with prostate cancer, traumatic brain injury and children presenting with fever. The need for updating the prostate cancer model was completely driven by a different model intercept in the update sample (adjustment: 2.58). Separate testing of model revision against the original model showed statistically significant results, but led to overfitting (calibration slope at internal validation = 0.86). The closed testing procedure selected recalibration in the large as update method, without overfitting. The advantage of the closed testing procedure was confirmed by the other two examples. We conclude that the proposed closed testing procedure may be useful in selecting appropriate update methods for previously developed prediction models.
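
The three update methods compared by the closed testing procedure can all be written as logistic regressions involving the original model's linear predictor; a minimal sketch with simulated data (variable names are placeholders).

```python
# The three candidate updates for a previously developed logistic prediction
# model, expressed via the original model's linear predictor lp.
import numpy as np
import statsmodels.api as sm
from scipy.special import expit

rng = np.random.default_rng(5)
n = 500
X = rng.normal(size=(n, 3))
lp = -1.0 + X @ np.array([0.8, -0.5, 0.3])          # original model's linear predictor
y = rng.binomial(1, expit(lp - 0.6))                # new sample with a shifted intercept

binom = sm.families.Binomial()

# 1. Recalibration in the large: re-estimate the intercept only (lp as offset).
m1 = sm.GLM(y, np.ones((n, 1)), family=binom, offset=lp).fit()

# 2. Recalibration: re-estimate intercept and calibration slope.
m2 = sm.GLM(y, sm.add_constant(lp), family=binom).fit()

# 3. Model revision: re-estimate all coefficients.
m3 = sm.GLM(y, sm.add_constant(X), family=binom).fit()

print("intercept update:     ", m1.params.round(2))
print("intercept + slope:    ", m2.params.round(2))
print("revised coefficients: ", m3.params.round(2))
```

The closed testing procedure would then compare these nested fits, for example with likelihood ratio tests, and select the least extensive update that the new data support.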

Journal ArticleDOI
TL;DR: A review of randomized controlled trials with survival outcomes that were published in high‐impact general medical journals found that in the majority of these studies, the potential presence of competing risks was not accounted for in the statistical analyses that were described.
Abstract: In studies with survival or time-to-event outcomes, a competing risk is an event whose occurrence precludes the occurrence of the primary event of interest. Specialized statistical methods must be used to analyze survival data in the presence of competing risks. We conducted a review of randomized controlled trials with survival outcomes that were published in high-impact general medical journals. Of 40 studies that we identified, 31 (77.5%) were potentially susceptible to competing risks. However, in the majority of these studies, the potential presence of competing risks was not accounted for in the statistical analyses that were described. Of the 31 studies potentially susceptible to competing risks, 24 (77.4%) reported the results of a Kaplan–Meier survival analysis, while only five (16.1%) reported using cumulative incidence functions to estimate the incidence of the outcome over time in the presence of competing risks. The former approach will tend to result in an overestimate of the incidence of the outcome over time, while the latter approach will result in unbiased estimation of the incidence of the primary outcome over time. We provide recommendations on the analysis and reporting of randomized controlled trials with survival outcomes in the presence of competing risks. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
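
The contrast drawn above can be made concrete: with hypothetical event times and causes, the sketch below computes the cumulative incidence function for the event of interest alongside the naive one-minus-Kaplan-Meier estimate that treats competing events as censoring.

```python
# Compare 1 - Kaplan-Meier (competing events treated as censoring) with the
# cumulative incidence function (CIF) for the event of interest.
import numpy as np

# Hypothetical data: time, and status 0 = censored, 1 = event of interest,
# 2 = competing event (distinct event times, so no tie handling is needed).
time = np.array([2, 3, 4, 5, 6, 7, 8, 9, 11, 12])
status = np.array([1, 2, 1, 0, 1, 2, 1, 0, 2, 1])

order = np.argsort(time)
time, status = time[order], status[order]

n = len(time)
surv_overall = 1.0      # overall (all-cause) survival just before t
km_naive = 1.0          # KM "survival" treating competing events as censoring
cif = 0.0
at_risk = n

for i in range(n):
    d1 = int(status[i] == 1)          # event of interest at this time
    d_any = int(status[i] != 0)       # any event at this time
    cif += surv_overall * d1 / at_risk            # Aalen-Johansen increment
    km_naive *= 1 - d1 / at_risk                  # naive KM step
    surv_overall *= 1 - d_any / at_risk
    at_risk -= 1

print(f"CIF at end of follow-up:         {cif:.3f}")
print(f"1 - naive Kaplan-Meier estimate: {1 - km_naive:.3f}")
```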

Journal ArticleDOI
TL;DR: A multivariate time series model for weekly surveillance counts on norovirus gastroenteritis from the 12 city districts of Berlin, in six age groups, from week 2011/27 to week 2015/26 is described and the following year is used to assess the quality of the predictions.
Abstract: Routine surveillance of notifiable infectious diseases gives rise to daily or weekly counts of reported cases stratified by region and age group. From a public health perspective, forecasts of infectious disease spread are of central importance. We argue that such forecasts need to properly incorporate the attached uncertainty, so they should be probabilistic in nature. However, forecasts also need to take into account temporal dependencies inherent to communicable diseases, spatial dynamics through human travel and social contact patterns between age groups. We describe a multivariate time series model for weekly surveillance counts on norovirus gastroenteritis from the 12 city districts of Berlin, in six age groups, from week 2011/27 to week 2015/26. The following year (2015/27 to 2016/26) is used to assess the quality of the predictions. Probabilistic forecasts of the total number of cases can be derived through Monte Carlo simulation, but first and second moments are also available analytically. Final size forecasts as well as multivariate forecasts of the total number of cases by age group, by district and by week are compared across different models of varying complexity. This leads to a more general discussion of issues regarding modelling, prediction and evaluation of public health surveillance data. Copyright © 2017 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: How a sharper focus upon statistical power may reduce the impact of selective reporting bias in meta-analyses of medical research is documented; when no studies are adequately powered, an alternative unrestricted weighted least squares weighted average can be used instead of WAAP.
Abstract: The central purpose of this study is to document how a sharper focus upon statistical power may reduce the impact of selective reporting bias in meta-analyses. We introduce the weighted average of the adequately powered (WAAP) as an alternative to the conventional random-effects (RE) estimator. When the results of some of the studies have been selected to be positive and statistically significant (i.e. selective reporting), our simulations show that WAAP will have smaller bias than RE at no loss to its other statistical properties. When there is no selective reporting, the difference between RE's and WAAP's statistical properties is practically negligible. Nonetheless, when selective reporting is especially severe or heterogeneity is very large, notable bias can remain in all weighted averages. The main limitation of this approach is that the majority of meta-analyses of medical research do not contain any studies with adequate power (i.e. >80%). For such areas of medical research, it remains important to document their low power, and, as we demonstrate, an alternative unrestricted weighted least squares weighted average can be used instead of WAAP. Copyright © 2017 John Wiley & Sons, Ltd.
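
A sketch of the WAAP idea, taking "adequately powered" to mean at least 80% power to detect the unrestricted weighted least squares (UWLS) average effect; the study estimates are hypothetical and the exact conventions in the paper may differ.

```python
# Weighted average of the adequately powered studies (WAAP): use the UWLS
# average as a proxy for the true effect, keep the studies with >= 80% power
# to detect it, and average those by inverse variance.
import numpy as np
from scipy.stats import norm

yi = np.array([0.35, 0.10, 0.42, 0.05, 0.30, 0.25])   # hypothetical study effects
sei = np.array([0.08, 0.20, 0.07, 0.25, 0.10, 0.09])  # their standard errors

w = 1 / sei**2
proxy = np.sum(w * yi) / np.sum(w)          # UWLS/fixed-effect point estimate as proxy

# Approximate two-sided power of each study to detect the proxy effect.
z = abs(proxy) / sei
power = 1 - norm.cdf(1.96 - z) + norm.cdf(-1.96 - z)

adequate = power >= 0.80
if adequate.any():
    waap = np.sum(w[adequate] * yi[adequate]) / np.sum(w[adequate])
    print(f"{adequate.sum()} adequately powered studies, WAAP = {waap:.3f}")
else:
    print(f"no adequately powered studies; fall back to UWLS = {proxy:.3f}")
```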

Journal ArticleDOI
TL;DR: This article focuses on parametric multistate models, both Markov and semi‐Markov, and develops a flexible framework where each transition can be specified by a variety of parametric models including exponential, Weibull, Gompertz, Royston‐Parmar proportional hazards models or log‐logistic, log‐normal, generalised gamma accelerated failure time models.
Abstract: Multistate models are increasingly being used to model complex disease profiles. By modelling transitions between disease states, accounting for competing events at each transition, we can gain a much richer understanding of patient trajectories and how risk factors impact over the entire disease pathway. In this article, we concentrate on parametric multistate models, both Markov and semi-Markov, and develop a flexible framework where each transition can be specified by a variety of parametric models including exponential, Weibull, Gompertz, Royston-Parmar proportional hazards models or log-logistic, log-normal, generalised gamma accelerated failure time models, possibly sharing parameters across transitions. We also extend the framework to allow time-dependent effects. We then use an efficient and generalisable simulation method to calculate transition probabilities from any fitted multistate model, and show how it facilitates the simple calculation of clinically useful measures, such as expected length of stay in each state, and differences and ratios of proportion within each state as a function of time, for specific covariate patterns. We illustrate our methods using a dataset of patients with primary breast cancer. User-friendly Stata software is provided.

Journal ArticleDOI
TL;DR: This investigation demonstrates that substantial efficiencies are possible if the drug works in most or all baskets, at the cost of modest losses of power if the drug works in only a single basket.
Abstract: The landscape for early phase cancer clinical trials is changing dramatically because of the advent of targeted therapy. Increasingly, new drugs are designed to work against a target such as the presence of a specific tumor mutation. Because typically only a small proportion of cancer patients will possess the mutational target, but the mutation is present in many different cancers, a new class of basket trials is emerging, whereby the drug is tested simultaneously in different baskets, that is, subgroups of different tumor types. Investigators desire not only to test whether the drug works but also to determine which types of tumors are sensitive to the drug. A natural strategy is to conduct parallel trials, with the drug's effectiveness being tested separately, using for example, the popular Simon two-stage design independently in each basket. The work presented is motivated by the premise that the efficiency of this strategy can be improved by assessing the homogeneity of the baskets' response rates at an interim analysis and aggregating the baskets in the second stage if the results suggest the drug might be effective in all or most baskets. Via simulations, we assess the relative efficiencies of the two strategies. Because the operating characteristics depend on how many tumor types are sensitive to the drug, there is no uniformly efficient strategy. However, our investigation demonstrates that substantial efficiencies are possible if the drug works in most or all baskets, at the cost of modest losses of power if the drug works in only a single basket. Copyright © 2017 John Wiley & Sons, Ltd.
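
A much-simplified sketch of the interim aggregation step motivating the comparison (not the paper's exact design): test homogeneity of the stage-1 response rates across baskets and pool the baskets for stage 2 only if homogeneity is not rejected.

```python
# Simplified interim step of a basket trial: check homogeneity of stage-1
# response rates across baskets and decide whether to aggregate for stage 2.
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical stage-1 data: responders and enrolled patients per basket.
responders = np.array([4, 5, 1, 6])
enrolled = np.array([15, 15, 15, 15])

table = np.vstack([responders, enrolled - responders])
chi2, p_hom, _, _ = chi2_contingency(table)

if p_hom > 0.10:   # homogeneity not rejected: aggregate baskets in stage 2
    pooled_rate = responders.sum() / enrolled.sum()
    print(f"aggregate baskets; pooled stage-1 response rate = {pooled_rate:.2f}")
else:              # heterogeneous: continue basket-specific Simon two-stage designs
    print("keep baskets separate; apply basket-specific stage-2 rules")
```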

Journal ArticleDOI
TL;DR: The results suggest that regression adjustment and matching weights, regardless of the propensity score model estimation method, provide lower bias and mean squared error in the context of rare binary outcomes.
Abstract: Nonrandomized studies of treatments from electronic healthcare databases are critical for producing the evidence necessary to making informed treatment decisions, but often rely on comparing rates of events observed in a small number of patients. In addition, studies constructed from electronic healthcare databases, for example, administrative claims data, often adjust for many, possibly hundreds, of potential confounders. Despite the importance of maximizing efficiency when there are many confounders and few observed outcome events, there has been relatively little research on the relative performance of different propensity score methods in this context. In this paper, we compare a wide variety of propensity-based estimators of the marginal relative risk. In contrast to prior research that has focused on specific statistical methods in isolation of other analytic choices, we instead consider a method to be defined by the complete multistep process from propensity score modeling to final treatment effect estimation. Propensity score model estimation methods considered include ordinary logistic regression, Bayesian logistic regression, lasso, and boosted regression trees. Methods for utilizing the propensity score include pair matching, full matching, decile strata, fine strata, regression adjustment using one or two nonlinear splines, inverse propensity weighting, and matching weights. We evaluate methods via a 'plasmode' simulation study, which creates simulated datasets on the basis of a real cohort study of two treatments constructed from administrative claims data. Our results suggest that regression adjustment and matching weights, regardless of the propensity score model estimation method, provide lower bias and mean squared error in the context of rare binary outcomes. Copyright © 2017 John Wiley & Sons, Ltd.
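
Of the weighting schemes compared, matching weights have a simple closed form given an estimated propensity score; the sketch below uses simulated data and an ordinary logistic propensity model.

```python
# Matching weights from an estimated propensity score, followed by a weighted
# estimate of the marginal relative risk (simulated, hypothetical data).
import numpy as np
import statsmodels.api as sm
from scipy.special import expit

rng = np.random.default_rng(6)
n = 2000
X = rng.normal(size=(n, 5))
treat = rng.binomial(1, expit(X @ np.array([0.4, -0.3, 0.2, 0.0, 0.1])))
outcome = rng.binomial(1, expit(-3.0 + 0.5 * treat + X @ np.array([0.3, 0.2, 0.0, 0.1, -0.2])))

# Propensity score from ordinary logistic regression.
ps = sm.Logit(treat, sm.add_constant(X)).fit(disp=0).predict()

# Matching weights: min(ps, 1 - ps) / (ps if treated else 1 - ps).
mw = np.minimum(ps, 1 - ps) / np.where(treat == 1, ps, 1 - ps)

# Weighted risk in each arm and the marginal relative risk.
risk1 = np.sum(mw * treat * outcome) / np.sum(mw * treat)
risk0 = np.sum(mw * (1 - treat) * outcome) / np.sum(mw * (1 - treat))
print(f"weighted risks: treated {risk1:.3f}, control {risk0:.3f}, RR = {risk1 / risk0:.2f}")
```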

Journal ArticleDOI
TL;DR: Properties of the net reclassification improvement at the event rate are explored, and it is found informative to present plots of standardized net benefit/relative utility for the new versus old model across the domain of classification thresholds.
Abstract: The net reclassification improvement (NRI) is an attractively simple summary measure quantifying improvement in performance because of addition of new risk marker(s) to a prediction model. Originally proposed for settings with well-established classification thresholds, it quickly extended into applications with no thresholds in common use. Here we aim to explore properties of the NRI at event rate. We express this NRI as a difference in performance measures for the new versus old model and show that the quantity underlying this difference is related to several global as well as decision analytic measures of model performance. It maximizes the relative utility (standardized net benefit) across all classification thresholds and can be viewed as the Kolmogorov-Smirnov distance between the distributions of risk among events and non-events. It can be expressed as a special case of the continuous NRI, measuring reclassification from the 'null' model with no predictors. It is also a criterion based on the value of information and quantifies the reduction in expected regret for a given regret function, casting the NRI at event rate as a measure of incremental reduction in expected regret. More generally, we find it informative to present plots of standardized net benefit/relative utility for the new versus old model across the domain of classification thresholds. Then, these plots can be summarized with their maximum values, and the increment in model performance can be described by the NRI at event rate. We provide theoretical examples and a clinical application on the evaluation of prognostic biomarkers for atrial fibrillation. Copyright © 2016 John Wiley & Sons, Ltd.
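
The recommended plot can be produced directly from predicted risks; the sketch below computes the standardized net benefit (relative utility) of an old and a new model across thresholds using simulated predictions, with the increment between their maxima playing the role of the NRI at the event rate described above.

```python
# Standardized net benefit (relative utility) across classification thresholds
# for predicted risks p and binary outcomes y (simulated, hypothetical inputs).
import numpy as np

def standardized_net_benefit(p, y, thresholds):
    prev = y.mean()
    out = []
    for t in thresholds:
        pos = p >= t
        tpr = np.mean(pos[y == 1])
        fpr = np.mean(pos[y == 0])
        out.append(tpr - (1 - prev) / prev * t / (1 - t) * fpr)
    return np.array(out)

rng = np.random.default_rng(7)
y = rng.binomial(1, 0.2, size=1000)
p_old = np.clip(0.2 + 0.15 * (y - 0.2) + rng.normal(0, 0.10, 1000), 0.01, 0.99)
p_new = np.clip(0.2 + 0.25 * (y - 0.2) + rng.normal(0, 0.10, 1000), 0.01, 0.99)

thresholds = np.linspace(0.05, 0.60, 56)
snb_old = standardized_net_benefit(p_old, y, thresholds)
snb_new = standardized_net_benefit(p_new, y, thresholds)

print("max sNB old:", snb_old.max().round(3), " new:", snb_new.max().round(3))
print("increment at the maximum (cf. NRI at event rate):",
      (snb_new.max() - snb_old.max()).round(3))
```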

Journal ArticleDOI
TL;DR: The Median Hazard Ratio (MHR) is a useful and intuitive measure for expressing cluster heterogeneity in the outcome and, thereby, estimating general contextual effects in multilevel survival analysis.
Abstract: Multilevel data occurs frequently in many research areas like health services research and epidemiology. A suitable way to analyze such data is through the use of multilevel regression models (MLRM). MLRM incorporate cluster-specific random effects which allow one to partition the total individual variance into between-cluster variation and between-individual variation. Statistically, MLRM account for the dependency of the data within clusters and provide correct estimates of uncertainty around regression coefficients. Substantively, the magnitude of the effect of clustering provides a measure of the General Contextual Effect (GCE). When outcomes are binary, the GCE can also be quantified by measures of heterogeneity like the Median Odds Ratio (MOR) calculated from a multilevel logistic regression model. Time-to-event outcomes within a multilevel structure occur commonly in epidemiological and medical research. However, the Median Hazard Ratio (MHR) that corresponds to the MOR in multilevel (i.e., 'frailty') Cox proportional hazards regression is rarely used. Analogously to the MOR, the MHR is the median relative change in the hazard of the occurrence of the outcome when comparing identical subjects from two randomly selected different clusters that are ordered by risk. We illustrate the application and interpretation of the MHR in a case study analyzing the hazard of mortality in patients hospitalized for acute myocardial infarction at hospitals in Ontario, Canada. We provide R code for computing the MHR. The MHR is a useful and intuitive measure for expressing cluster heterogeneity in the outcome and, thereby, estimating general contextual effects in multilevel survival analysis. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

Journal ArticleDOI
TL;DR: Results indicate that constrained randomization improves the power of the linearization F-test, the KC-corrected GEE t-test, and two permutation tests when the prognostic group-level variables are controlled for in the analysis and the size of randomization space is reasonably small.
Abstract: Group-randomized trials are randomized studies that allocate intact groups of individuals to different comparison arms. A frequent practical limitation to adopting such research designs is that only a limited number of groups may be available, and therefore, simple randomization is unable to adequately balance multiple group-level covariates between arms. Therefore, covariate-based constrained randomization was proposed as an allocation technique to achieve balance. Constrained randomization involves generating a large number of possible allocation schemes, calculating a balance score that assesses covariate imbalance, limiting the randomization space to a prespecified percentage of candidate allocations, and randomly selecting one scheme to implement. When the outcome is binary, a number of statistical issues arise regarding the potential advantages of such designs in making inference. In particular, properties found for continuous outcomes may not directly apply, and additional variations on statistical tests are available. Motivated by two recent trials, we conduct a series of Monte Carlo simulations to evaluate the statistical properties of model-based and randomization-based tests under both simple and constrained randomization designs, with varying degrees of analysis-based covariate adjustment. Our results indicate that constrained randomization improves the power of the linearization F-test, the KC-corrected GEE t-test (Kauermann and Carroll, 2001, Journal of the American Statistical Association 96, 1387-1396), and two permutation tests when the prognostic group-level variables are controlled for in the analysis and the size of randomization space is reasonably small. We also demonstrate that constrained randomization reduces power loss from redundant analysis-based adjustment for non-prognostic covariates. Design considerations such as the choice of the balance metric and the size of randomization space are discussed.
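
The allocation technique itself is straightforward to sketch: enumerate the candidate allocations, score each for balance on the cluster-level covariates, restrict to the best-scoring fraction, and randomly select one scheme to implement. The balance score below is a simple sum of squared standardized mean differences, used here only for illustration.

```python
# Covariate-constrained randomization for a group-randomized trial with
# 12 clusters (6 per arm), using a simple balance score on cluster covariates.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(8)
n_clusters = 12
covars = rng.normal(size=(n_clusters, 2))   # hypothetical cluster-level covariates

def balance_score(arm1_idx):
    arm1 = np.zeros(n_clusters, dtype=bool)
    arm1[list(arm1_idx)] = True
    # Sum of squared standardized differences in covariate means between arms.
    diff = covars[arm1].mean(axis=0) - covars[~arm1].mean(axis=0)
    return np.sum((diff / covars.std(axis=0))**2)

schemes = list(combinations(range(n_clusters), n_clusters // 2))
scores = np.array([balance_score(s) for s in schemes])

# Constrain to the best 10% of candidate allocations, then pick one at random.
cutoff = np.quantile(scores, 0.10)
candidates = [s for s, sc in zip(schemes, scores) if sc <= cutoff]
chosen = candidates[rng.integers(len(candidates))]
print("clusters allocated to arm 1:", chosen)
```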

Journal ArticleDOI
TL;DR: It is shown that it is crucial to centre patient-level covariates by their mean value in each trial, in order to separate out within-trial and across-trial information, and recommended that meta-analysts should only use within-trial information to examine individual predictors of treatment effect and that one-stage IPD models should separate within-trial from across-trial information to avoid ecological bias.
Abstract: Stratified medicine utilizes individual-level covariates that are associated with a differential treatment effect, also known as treatment-covariate interactions. When multiple trials are available, meta-analysis is used to help detect true treatment-covariate interactions by combining their data. Meta-regression of trial-level information is prone to low power and ecological bias, and therefore, individual participant data (IPD) meta-analyses are preferable to examine interactions utilizing individual-level information. However, one-stage IPD models are often wrongly specified, such that interactions are based on amalgamating within- and across-trial information. We compare, through simulations and an applied example, fixed-effect and random-effects models for a one-stage IPD meta-analysis of time-to-event data where the goal is to estimate a treatment-covariate interaction. We show that it is crucial to centre patient-level covariates by their mean value in each trial, in order to separate out within-trial and across-trial information. Otherwise, bias and coverage of interaction estimates may be adversely affected, leading to potentially erroneous conclusions driven by ecological bias. We revisit an IPD meta-analysis of five epilepsy trials and examine age as a treatment effect modifier. The interaction is -0.011 (95% CI: -0.019 to -0.003; p = 0.004), and thus highly significant, when amalgamating within-trial and across-trial information. However, when separating within-trial from across-trial information, the interaction is -0.007 (95% CI: -0.019 to 0.005; p = 0.22), and thus its magnitude and statistical significance are greatly reduced. We recommend that meta-analysts should only use within-trial information to examine individual predictors of treatment effect and that one-stage IPD models should separate within-trial from across-trial information to avoid ecological bias. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
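
The centring step recommended above can be sketched with a one-stage Cox model: the patient-level covariate is split into a within-trial centred part and a trial-mean part, and only the within-trial interaction is used to judge effect modification. The simulated data and the use of the lifelines package below are illustrative assumptions, not the authors' analysis.

```python
# One-stage IPD Cox model for a treatment-age interaction, separating
# within-trial from across-trial information by centring age within trials.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(9)
rows = []
for trial in range(5):
    n = 200
    age = rng.normal(55 + 3 * trial, 8, n)          # trial-level differences in mean age
    treat = rng.integers(0, 2, n)
    hazard = np.exp(-0.3 * treat + 0.01 * (age - 55) - 0.005 * treat * (age - 55))
    time = rng.exponential(1 / hazard)
    rows.append(pd.DataFrame({"trial": trial, "treat": treat, "age": age,
                              "time": time, "event": 1}))
df = pd.concat(rows, ignore_index=True)

# Centre age within each trial; keep a (centred) trial-mean term separately.
df["age_mean"] = df.groupby("trial")["age"].transform("mean")
df["age_centred"] = df["age"] - df["age_mean"]
df["treat_x_age_within"] = df["treat"] * df["age_centred"]                 # within-trial
df["treat_x_age_across"] = df["treat"] * (df["age_mean"] - df["age"].mean())  # across-trial

cph = CoxPHFitter()
cph.fit(df.drop(columns=["age", "age_mean"]), duration_col="time",
        event_col="event", strata=["trial"])
print(cph.summary[["coef", "p"]])   # interpret only the within-trial interaction
```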

Journal ArticleDOI
TL;DR: The approach and extensions of it could yield improved predictions for public health decision makers, particularly for diseases with heterogeneous seasonal dynamics such as dengue fever; simulations demonstrate that a fully parameterized bandwidth matrix can be beneficial for estimating conditional densities.
Abstract: Creating statistical models that generate accurate predictions of infectious disease incidence is a challenging problem whose solution could benefit public health decision makers. We develop a new approach to this problem using kernel conditional density estimation (KCDE) and copulas. We obtain predictive distributions for incidence in individual weeks using KCDE and tie those distributions together into joint distributions using copulas. This strategy enables us to create predictions for the timing of and incidence in the peak week of the season. Our implementation of KCDE incorporates 2 novel kernel components: a periodic component that captures seasonality in disease incidence and a component that allows for a full parameterization of the bandwidth matrix with discrete variables. We demonstrate via simulation that a fully parameterized bandwidth matrix can be beneficial for estimating conditional densities. We apply the method to predicting dengue fever and influenza and compare to a seasonal autoregressive integrated moving average model and HHH4, a previously published extension to the generalized linear model framework developed for infectious disease incidence. The KCDE outperforms the baseline methods for predictions of dengue incidence in individual weeks. The KCDE also offers more consistent performance than the baseline models for predictions of incidence in the peak week and is comparable to the baseline models on the other prediction targets. Using the periodic kernel function led to better predictions of incidence. Our approach and extensions of it could yield improved predictions for public health decision makers, particularly in diseases with heterogeneous seasonal dynamics such as dengue fever.

Journal ArticleDOI
TL;DR: The survival mediational g‐formula constitutes a powerful tool for conducting mediation analysis with longitudinal data and is applied to analyze the Framingham Heart Study data to investigate the causal mechanism of smoking on mortality through coronary artery disease.
Abstract: We propose an approach to conduct mediation analysis for survival data with time-varying exposures, mediators, and confounders. We identify certain interventional direct and indirect effects through a survival mediational g-formula and describe the required assumptions. We also provide a feasible parametric approach along with an algorithm and software to estimate these effects. We apply this method to analyze the Framingham Heart Study data to investigate the causal mechanism of smoking on mortality through coronary artery disease. The estimated overall 10-year all-cause mortality risk difference comparing “always smoke 30 cigarettes per day” versus “never smoke” was 4.3 (95% CI: 1.37, 6.30). Of the overall effect, we estimated 7.91% (95% CI: 1.36%, 19.32%) was mediated by the incidence and timing of coronary artery disease. The survival mediational g-formula constitutes a powerful tool for conducting mediation analysis with longitudinal data.

Journal ArticleDOI
TL;DR: Vn is applied to two published meta-analyses and it is concluded that it usefully augments standard methods when deciding upon the likely validity of summary meta-analysis estimates in clinical practice, and the link between statistical validity and homogeneity is demonstrated.
Abstract: An important question for clinicians appraising a meta-analysis is: are the findings likely to be valid in their own practice-does the reported effect accurately represent the effect that would occur in their own clinical population? To this end we advance the concept of statistical validity-where the parameter being estimated equals the corresponding parameter for a new independent study. Using a simple ('leave-one-out') cross-validation technique, we demonstrate how we may test meta-analysis estimates for statistical validity using a new validation statistic, Vn, and derive its distribution. We compare this with the usual approach of investigating heterogeneity in meta-analyses and demonstrate the link between statistical validity and homogeneity. Using a simulation study, the properties of Vn and the Q statistic are compared for univariate random effects meta-analysis and a tailored meta-regression model, where information from the setting (included as model covariates) is used to calibrate the summary estimate to the setting of application. Their properties are found to be similar when there are 50 studies or more, but for fewer studies Vn has greater power but a higher type 1 error rate than Q. The power and type 1 error rate of Vn are also shown to depend on the within-study variance, between-study variance, study sample size, and the number of studies in the meta-analysis. Finally, we apply Vn to two published meta-analyses and conclude that it usefully augments standard methods when deciding upon the likely validity of summary meta-analysis estimates in clinical practice. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

Journal ArticleDOI
TL;DR: A recently proposed Bayesian estimation procedure is described and compared with a profile likelihood method and with the DerSimonian-Laird and Mandel-Paule estimators including the Knapp-Hartung correction to reveal that the Bayesian approach is a promising alternative producing more accurate interval estimates than those three conventional procedures for meta-analysis.
Abstract: Pooling information from multiple, independent studies (meta-analysis) adds great value to medical research. Random effects models are widely used for this purpose. However, there are many different ways of estimating model parameters, and the choice of estimation procedure may be influential upon the conclusions of the meta-analysis. In this paper, we describe a recently proposed Bayesian estimation procedure and compare it with a profile likelihood method and with the DerSimonian-Laird and Mandel-Paule estimators including the Knapp-Hartung correction. The Bayesian procedure uses a non-informative prior for the overall mean and the between-study standard deviation that is determined by the Berger and Bernardo reference prior principle. The comparison of these procedures focuses on the frequentist properties of interval estimates for the overall mean. The results of our simulation study reveal that the Bayesian approach is a promising alternative, producing more accurate interval estimates than those three conventional procedures for meta-analysis. The Bayesian procedure is also illustrated using three examples of meta-analysis involving real data. Copyright © 2016 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A flexible Bayesian optimal phase II (BOP2) design that is capable of handling simple and complicated endpoints under a unified framework using a Dirichlet-multinomial model to accommodate different types of endpoints is proposed.
Abstract: We propose a flexible Bayesian optimal phase II (BOP2) design that is capable of handling simple (e.g., binary) and complicated (e.g., ordinal, nested, and co-primary) endpoints under a unified framework. We use a Dirichlet-multinomial model to accommodate different types of endpoints. At each interim, the go/no-go decision is made by evaluating a set of posterior probabilities of the events of interest, which is optimized to maximize power or minimize the number of patients under the null hypothesis. Unlike other existing Bayesian designs, the BOP2 design explicitly controls the type I error rate, thereby bridging the gap between Bayesian designs and frequentist designs. In addition, the stopping boundary of the BOP2 design can be enumerated prior to the onset of the trial. These features make the BOP2 design accessible to a wide range of users and regulatory agencies and particularly easy to implement in practice. Simulation studies show that the BOP2 design has favorable operating characteristics with higher power and lower risk of incorrectly terminating the trial than some existing Bayesian phase II designs. The software to implement the BOP2 design is freely available at www.trialdesign.org. Copyright © 2017 John Wiley & Sons, Ltd.
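
For the simplest case of a single binary endpoint, the interim go/no-go rule reduces to evaluating a Beta-Binomial posterior probability against a cutoff; the prior and cutoff below are placeholders, since BOP2 calibrates and enumerates its boundaries before the trial (software at www.trialdesign.org).

```python
# Interim go/no-go evaluation for a single binary endpoint, BOP2-style:
# stop for futility if the posterior probability that the response rate
# exceeds the null value is too small (placeholder prior and cutoff).
from scipy.stats import beta

p_null = 0.20            # uninteresting response rate under the null hypothesis
a0, b0 = 0.2, 0.8        # Beta prior (placeholder; BOP2 uses a calibrated prior)
responders, n = 4, 15    # hypothetical interim data

post_prob = 1 - beta.cdf(p_null, a0 + responders, b0 + n - responders)
futility_cutoff = 0.10   # placeholder; BOP2 optimizes boundaries to control type I error

decision = "continue (go)" if post_prob > futility_cutoff else "stop (no-go)"
print(f"Pr(p > {p_null} | data) = {post_prob:.3f} -> {decision}")
```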