Showing papers in "Statistics in Medicine in 2007"


Journal ArticleDOI
TL;DR: This tutorial aims to review statistical methods for the analysis of competing risks and multi-state models, with the emphasis on practical issues like data preparation, estimation of the effect of covariates, and estimation of cumulative incidence functions and state and transition probabilities.
Abstract: Standard survival data measure the time span from some time origin until the occurrence of one type of event. If several types of events occur, a model describing progression to each of these competing risks is needed. Multi-state models generalize competing risks models by also describing transitions to intermediate events. Methods to analyze such models have been developed over the last two decades. Fortunately, most of the analyses can be performed within standard statistical packages, but they may require some extra effort with respect to data preparation and programming. This tutorial aims to review statistical methods for the analysis of competing risks and multi-state models. Although some conceptual issues are covered, the emphasis is on practical issues like data preparation, estimation of the effect of covariates, and estimation of cumulative incidence functions and state and transition probabilities. Examples of analysis with standard software are shown.
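
As a rough illustration of the kind of standard-software analysis the tutorial discusses (not taken from the paper), the following minimal R sketch estimates cumulative incidence functions with the cmprsk package and fits a cause-specific Cox model with the survival package; the data and variable names (time, status, group, age) are hypothetical.

# Minimal R sketch (hypothetical data): cumulative incidence functions and a
# cause-specific Cox model for two competing event types (1 and 2; 0 = censored).
library(survival)
library(cmprsk)

set.seed(1)
n      <- 200
group  <- factor(rep(c("A", "B"), each = n / 2))
age    <- rnorm(n, 60, 10)
time   <- rexp(n, rate = 0.1)
status <- sample(0:2, n, replace = TRUE, prob = c(0.3, 0.4, 0.3))

# Cumulative incidence of each event type by group, accounting for competing risks
ci <- cuminc(ftime = time, fstatus = status, group = group, cencode = 0)
plot(ci)

# Cause-specific hazard of event type 1: other causes treated as censoring
fit.cs <- coxph(Surv(time, status == 1) ~ group + age)
summary(fit.cs)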

1,881 citations


Journal ArticleDOI
TL;DR: A case study of the association between drug exposure and mortality is provided to show that including a variable that is related to treatment, but not outcome, does not improve balance and reduces the number of matched pairs available for analysis.
Abstract: The propensity score--the probability of exposure to a specific treatment conditional on observed variables--is increasingly being used in observational studies. Creating strata in which subjects are matched on the propensity score allows one to balance measured variables between treated and untreated subjects. There is an ongoing controversy in the literature as to which variables to include in the propensity score model. Some advocate including those variables that predict treatment assignment, while others suggest including all variables potentially related to the outcome, and still others advocate including only variables that are associated with both treatment and outcome. We provide a case study of the association between drug exposure and mortality to show that including a variable that is related to treatment, but not outcome, does not improve balance and reduces the number of matched pairs available for analysis. In order to investigate this issue more comprehensively, we conducted a series of Monte Carlo simulations of the performance of propensity score models that contained variables related to treatment allocation, variables that were confounders for the treatment-outcome pair, variables related to the outcome, all variables related to either outcome or treatment, or neither. We compared the use of these different propensity score models in matching and stratification in terms of the extent to which they balanced variables. We demonstrated that all propensity score models balanced measured confounders between treated and untreated subjects in a propensity-score matched sample. However, including only the true confounders or the variables predictive of the outcome in the propensity score model resulted in a substantially larger number of matched pairs than did using the treatment-allocation model. Stratifying on the quintiles of any propensity score model resulted in residual imbalance between treated and untreated subjects in the upper and lower quintiles. Greater balance between treated and untreated subjects was obtained after matching on the propensity score than after stratifying on the quintiles of the propensity score. When a confounding variable was omitted from any of the propensity score models, then matching or stratifying on the propensity score resulted in residual imbalance in prognostically important variables between treated and untreated subjects. We considered four propensity score models for estimating treatment effects: the model that included only true confounders; the model that included all variables associated with the outcome; the model that included all measured variables; and the model that included all variables associated with treatment selection. Reduction in bias when estimating a null treatment effect was equivalent for all four propensity score models when propensity score matching was used. Reduction in bias was marginally greater for the first two propensity score models than for the last two propensity score models when stratification on the quintiles of the propensity score model was employed. Furthermore, omitting a confounding variable from the propensity score model resulted in biased estimation of the treatment effect. Finally, the mean squared error for estimating a null treatment effect was lower when either of the first two propensity score models was used than when either of the last two propensity score models was used.
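
The balance-versus-matched-pairs trade-off described above can be sketched in R with the MatchIt package; this is a hedged illustration with simulated data, not the authors' code, and the variable names (treat, confounder, instrument) are hypothetical.

# Minimal R sketch (simulated data): balance and matched-pair counts from two
# propensity score specifications -- confounder only versus confounder plus a
# variable related to treatment but not outcome.
library(MatchIt)

set.seed(2)
n          <- 1000
confounder <- rnorm(n)
instrument <- rnorm(n)                      # related to treatment, not outcome
treat      <- rbinom(n, 1, plogis(0.5 * confounder + 0.8 * instrument))
dat        <- data.frame(treat, confounder, instrument)

# 1:1 nearest-neighbour matching with a caliper (in SD units of the propensity score)
m.conf <- matchit(treat ~ confounder,              data = dat,
                  method = "nearest", caliper = 0.2)
m.all  <- matchit(treat ~ confounder + instrument, data = dat,
                  method = "nearest", caliper = 0.2)

# summary() reports covariate balance before/after matching and the number of
# matched units, which is where the trade-off described above shows up.
summary(m.conf)
summary(m.all)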

1,064 citations


Journal ArticleDOI
TL;DR: The theoretical perspective underlying this position will be presented, followed by a particular application in the context of the US tobacco litigation that uses propensity score methods to create subgroups of treated units and control units who are at least as similar with respect to their distributions of observed background characteristics as if they had been randomized.
Abstract: For estimating causal effects of treatments, randomized experiments are generally considered the gold standard. Nevertheless, they are often infeasible to conduct for a variety of reasons, such as ethical concerns, excessive expense, or timeliness. Consequently, much of our knowledge of causal effects must come from non-randomized observational studies. This article will advocate the position that observational studies can and should be designed to approximate randomized experiments as closely as possible. In particular, observational studies should be designed using only background information to create subgroups of similar treated and control units, where 'similar' here refers to their distributions of background variables. Of great importance, this activity should be conducted without any access to any outcome data, thereby assuring the objectivity of the design. In many situations, this objective creation of subgroups of similar treated and control units, which are balanced with respect to covariates, can be accomplished using propensity score methods. The theoretical perspective underlying this position will be presented followed by a particular application in the context of the US tobacco litigation. This application uses propensity score methods to create subgroups of treated units (male current smokers) and control units (male never smokers) who are at least as similar with respect to their distributions of observed background characteristics as if they had been randomized. The collection of these subgroups then 'approximate' a randomized block experiment with respect to the observed covariates.

1,028 citations


Journal ArticleDOI
TL;DR: At event rates below 1 per cent the Peto one‐step odds ratio method was the least biased and most powerful method, and provided the best confidence interval coverage, provided there was no substantial imbalance between treatment and control group sizes within trials, and treatment effects were not exceptionally large.
Abstract: For rare outcomes, meta-analysis of randomized trials may be the only way to obtain reliable evidence of the effects of healthcare interventions. However, many methods of meta-analysis are based on large sample approximations, and may be unsuitable when events are rare. Through simulation, we evaluated the performance of 12 methods for pooling rare events, considering estimability, bias, coverage and statistical power. Simulations were based on data sets from three case studies with between five and 19 trials, using baseline event rates between 0.1 and 10 per cent and risk ratios of 1, 0.75, 0.5 and 0.2. We found that most of the commonly used meta-analytical methods were biased when data were sparse. The bias was greatest in inverse variance and DerSimonian and Laird odds ratio and risk difference methods, and the Mantel-Haenszel (MH) odds ratio method using a 0.5 zero-cell correction. Risk difference meta-analytical methods tended to show conservative confidence interval coverage and low statistical power at low event rates. At event rates below 1 per cent the Peto one-step odds ratio method was the least biased and most powerful method, and provided the best confidence interval coverage, provided there was no substantial imbalance between treatment and control group sizes within trials, and treatment effects were not exceptionally large. In other circumstances the MH OR without zero-cell corrections, logistic regression and the exact method performed similarly to each other, and were less biased than the Peto method.
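
As a hedged illustration (not from the paper), a sparse-event meta-analysis along these lines can be run in R with the meta package; the trial counts below are hypothetical, and MH.exact = TRUE requests the Mantel-Haenszel method without the 0.5 zero-cell correction.

# Minimal R sketch (hypothetical trial counts): Peto one-step odds ratio versus
# Mantel-Haenszel without zero-cell correction, using the meta package.
library(meta)

ev.t <- c(1, 0, 2, 1, 0)             # events in the treatment arms
n.t  <- c(120, 150, 200, 90, 110)
ev.c <- c(3, 2, 4, 2, 1)             # events in the control arms
n.c  <- c(118, 148, 205, 92, 108)

m.peto <- metabin(ev.t, n.t, ev.c, n.c, sm = "OR", method = "Peto")
m.mh   <- metabin(ev.t, n.t, ev.c, n.c, sm = "OR", method = "MH",
                  MH.exact = TRUE)   # MH without the 0.5 continuity correction
summary(m.peto)
summary(m.mh)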

830 citations


Journal ArticleDOI
TL;DR: Applied researchers are encouraged to use propensity score matching and IPTW using the propensity score when estimating the relative effect of treatment on time-to-event outcomes, as both approaches allow for the estimation of marginal hazard ratios with minimal bias.
Abstract: Propensity score methods are increasingly being used to reduce or minimize the effects of confounding when estimating the effects of treatments, exposures, or interventions when using observational or non-randomized data. Under the assumption of no unmeasured confounders, previous research has shown that propensity score methods allow for unbiased estimation of linear treatment effects (e.g., differences in means or proportions). However, in biomedical research, time-to-event outcomes occur frequently. There is a paucity of research into the performance of different propensity score methods for estimating the effect of treatment on time-to-event outcomes. Furthermore, propensity score methods allow for the estimation of marginal or population-average treatment effects. We conducted an extensive series of Monte Carlo simulations to examine the performance of propensity score matching (1:1 greedy nearest-neighbor matching within propensity score calipers), stratification on the propensity score, inverse probability of treatment weighting (IPTW) using the propensity score, and covariate adjustment using the propensity score to estimate marginal hazard ratios. We found that both propensity score matching and IPTW using the propensity score allow for the estimation of marginal hazard ratios with minimal bias. Of these two approaches, IPTW using the propensity score resulted in estimates with lower mean squared error when estimating the effect of treatment in the treated. Stratification on the propensity score and covariate adjustment using the propensity score result in biased estimation of both marginal and conditional hazard ratios. Applied researchers are encouraged to use propensity score matching and IPTW using the propensity score when estimating the relative effect of treatment on time-to-event outcomes. Copyright © 2012 John Wiley & Sons, Ltd.
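
A minimal R sketch of the IPTW approach recommended above, using simulated data rather than anything from the paper; the propensity score model, weights and robust variance are standard ingredients, and all variable names are hypothetical.

# Minimal R sketch (simulated data): marginal hazard ratio via inverse probability
# of treatment weighting, with a robust variance for the weighted Cox model.
library(survival)

set.seed(3)
n     <- 2000
x1    <- rnorm(n)
x2    <- rbinom(n, 1, 0.5)
treat <- rbinom(n, 1, plogis(0.4 * x1 - 0.3 * x2))
time  <- rexp(n, rate = 0.05 * exp(0.3 * treat + 0.5 * x1))
event <- rbinom(n, 1, 0.8)

ps <- fitted(glm(treat ~ x1 + x2, family = binomial))
w  <- ifelse(treat == 1, 1 / ps, 1 / (1 - ps))   # ATE weights; ATT would use 1 and ps/(1 - ps)

fit.iptw <- coxph(Surv(time, event) ~ treat, weights = w, robust = TRUE)
summary(fit.iptw)   # exp(coef) estimates a marginal hazard ratio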

821 citations


Journal ArticleDOI
TL;DR: It is argued why MFP is the preferred approach for multivariable model building with continuous covariates, and it is shown that spline modelling, while extremely flexible, can generate fitted curves with uninterpretable 'wiggles'.
Abstract: In developing regression models, data analysts are often faced with many predictor variables that may influence an outcome variable. After more than half a century of research, the 'best' way of selecting a multivariable model is still unresolved. It is generally agreed that subject matter knowledge, when available, should guide model building. However, such knowledge is often limited, and data-dependent model building is required. We limit the scope of the modelling exercise to selecting important predictors and choosing interpretable and transportable functions for continuous predictors. Assuming linear functions, stepwise selection and all-subset strategies are discussed; the key tuning parameters are the nominal P-value for testing a variable for inclusion and the penalty for model complexity, respectively. We argue that stepwise procedures perform better than a literature-based assessment would suggest. Concerning selection of functional form for continuous predictors, the principal competitors are fractional polynomial functions and various types of spline techniques. We note that a rigorous selection strategy known as multivariable fractional polynomials (MFP) has been developed. No spline-based procedure for simultaneously selecting variables and functional forms has found wide acceptance. Results of FP and spline modelling are compared in two data sets. It is shown that spline modelling, while extremely flexible, can generate fitted curves with uninterpretable 'wiggles', particularly when automatic methods for choosing the smoothness are employed. We give general recommendations to practitioners for carrying out variable and function selection. While acknowledging that further research is needed, we argue why MFP is our preferred approach for multivariable model building with continuous covariates.
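
A hedged sketch of an MFP analysis in R, assuming the CRAN mfp package; the data are simulated and the fp() terms, selection level and variable names are illustrative choices, not the authors' specification.

# Minimal R sketch (simulated data): multivariable fractional polynomials with
# the mfp package; fp() marks candidates for fractional polynomial transformation.
library(mfp)
library(survival)

set.seed(4)
n    <- 500
x1   <- runif(n, 1, 10)
x2   <- runif(n, 0.5, 5)
x3   <- rbinom(n, 1, 0.5)
time <- rexp(n, rate = 0.1 * exp(0.5 * log(x1)))
dead <- rbinom(n, 1, 0.7)
dat  <- data.frame(time, dead, x1, x2, x3)

fit <- mfp(Surv(time, dead) ~ fp(x1) + fp(x2) + x3,
           family = cox, data = dat, select = 0.05)
summary(fit)   # reports the selected FP powers and which variables were retained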

806 citations


Journal ArticleDOI
TL;DR: The optimum test policy was found to be analysis by the 'N-1' chi-squared test when the minimum expected number is at least 1, and otherwise, by the Fisher-Irwin test by Irwin's rule (taking the total probability of tables in either tail that are as likely as, or less likely than the one observed).
Abstract: Two-by-two tables commonly arise in comparative trials and cross-sectional studies. In medical studies, two-by-two tables may have a small sample size due to the rarity of a condition, or to limited resources. Current recommendations on the appropriate statistical test mostly specify the chi-squared test for tables where the minimum expected number is at least 5 (following Fisher and Cochran), and otherwise the Fisher-Irwin test; but there is disagreement on which versions of the chi-squared and Fisher-Irwin tests should be used. A further uncertainty is that, according to Cochran, the number 5 was chosen arbitrarily. Computer-intensive techniques were used in this study to compare seven two-sided tests of two-by-two tables in terms of their Type I errors. The tests were K. Pearson's and Yates's chi-squared tests and the 'N-1' chi-squared test (first proposed by E. Pearson), together with four versions of the Fisher-Irwin test (including two mid-P versions). The optimum test policy was found to be analysis by the 'N-1' chi-squared test when the minimum expected number is at least 1, and otherwise, by the Fisher-Irwin test by Irwin's rule (taking the total probability of tables in either tail that are as likely as, or less likely than the one observed). This policy was found to have increased power compared to Cochran's recommendations.
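
The recommended policy can be sketched directly in base R; the counts below are hypothetical. The 'N-1' statistic is Pearson's uncorrected chi-squared multiplied by (N-1)/N, and R's default two-sided fisher.test sums the probabilities of tables no more probable than the one observed, which corresponds to Irwin's rule.

# Minimal R sketch (hypothetical 2x2 counts): the 'N-1' chi-squared test is
# Pearson's uncorrected statistic scaled by (N-1)/N; fall back to the
# Fisher-Irwin test when the minimum expected count is below 1.
tab <- matrix(c(7, 3,
                2, 8), nrow = 2, byrow = TRUE)
N   <- sum(tab)

pearson <- chisq.test(tab, correct = FALSE)       # may warn for small counts
x2.nm1  <- as.numeric(pearson$statistic) * (N - 1) / N
p.nm1   <- pchisq(x2.nm1, df = 1, lower.tail = FALSE)

min.exp <- min(pearson$expected)
p.irwin <- fisher.test(tab)$p.value               # R sums tables no more probable
                                                  # than the observed one (Irwin's rule)
c(p_N_minus_1 = p.nm1, min_expected = min.exp, p_fisher_irwin = p.irwin)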

701 citations


Journal ArticleDOI
TL;DR: Using the trim and fill method as a form of sensitivity analysis as intended by the authors of the method can help to reduce the bias in pooled estimates, even though the performance of this method is not ideal.
Abstract: The trim and fill method allows estimation of an adjusted meta-analysis estimate in the presence of publication bias. To date, the performance of the trim and fill method has had little assessment. In this paper, we provide a more comprehensive examination of different versions of the trim and fill method in a number of simulated meta-analysis scenarios, comparing results with those from usual unadjusted meta-analysis models and two simple alternatives, namely use of the estimate from: (i) the largest; or (ii) the most precise study in the meta-analysis. Findings suggest a great deal of variability in the performance of the different approaches. When there is large between-study heterogeneity the trim and fill method can underestimate the true positive effect when there is no publication bias. However, when publication bias is present the trim and fill method can give estimates that are less biased than the usual meta-analysis models. Although results suggest that the use of the estimate from the largest or most precise study seems a reasonable approach in the presence of publication bias, when between-study heterogeneity exists our simulations show that these estimates are quite biased. We conclude that in the presence of publication bias use of the trim and fill method can help to reduce the bias in pooled estimates, even though the performance of this method is not ideal. However, because we do not know whether funnel plot asymmetry is truly caused by publication bias, and because there is great variability in the performance of different trim and fill estimators and models in various meta-analysis scenarios, we recommend use of the trim and fill method as a form of sensitivity analysis as intended by the authors of the method.
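
A minimal R sketch of using trim and fill as a sensitivity analysis, with the metafor package and hypothetical effect sizes; it is an illustration of the general approach, not a reproduction of the simulations above.

# Minimal R sketch (hypothetical effect sizes): trim and fill as a sensitivity
# analysis around a standard random-effects model, using metafor.
library(metafor)

yi <- c(0.35, 0.42, 0.28, 0.60, 0.15, 0.55, 0.48)   # study log odds ratios
vi <- c(0.04, 0.09, 0.03, 0.12, 0.02, 0.10, 0.08)   # their sampling variances

res <- rma(yi, vi, method = "REML")   # usual random-effects estimate
tf  <- trimfill(res)                  # adjusted estimate with imputed studies
res
tf
funnel(tf)                            # funnel plot marking the filled studies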

535 citations


Journal ArticleDOI
TL;DR: Analyses of rates from disease registers are often reported inadequately because of too coarse tabulation of data and because of confusion about the mechanics of the age-period-cohort model used for analysis.
Abstract: Analyses of rates from disease registers are often reported inadequately because of too coarse tabulation of data and because of confusion about the mechanics of the age-period-cohort model used for analysis. Rates should be considered as observations in a Lexis diagram; tabulation is a necessary reduction of the data, which should be kept as small as possible; and age, period and cohort should be treated as continuous variables. Reporting should include the absolute level of the rates as part of the age effects. This paper gives a guide to analysis of rates from a Lexis diagram by the age-period-cohort model. Three aspects are considered separately: (1) tabulation of cases and person-years; (2) modelling of age, period and cohort effects; and (3) parametrization and reporting of the estimated effects. It is argued that most of the confusion in the literature comes from failure to make a clear distinction between these three aspects. A set of recommendations for the practitioner is given, and a package for R that implements the recommendations is introduced.
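
The paper's recommendations are implemented in an R package (Epi); as a hedged illustration of the underlying idea only (age, period and cohort treated as continuous variables in a Poisson model for a tabulated Lexis diagram), here is a minimal sketch using base R and splines with a hypothetical tabulation.

# Minimal R sketch (hypothetical tabulation): cases D and person-years Y
# tabulated by mean age A and mean period P, with cohort C = P - A, modelled
# as continuous variables in a Poisson model with a log person-years offset.
library(splines)

set.seed(5)
tab   <- expand.grid(A = seq(32.5, 77.5, 5), P = seq(1962.5, 1997.5, 5))
tab$C <- tab$P - tab$A
tab$Y <- runif(nrow(tab), 1e4, 5e4)
tab$D <- rpois(nrow(tab), tab$Y * 1e-4 * exp(0.05 * (tab$A - 50)))

# Because C = P - A, the linear parts of the three effects are not jointly
# identifiable; R will alias (or nearly alias) one coefficient, which is why
# the paper treats parametrization and reporting as a separate step.
fit <- glm(D ~ ns(A, 5) + ns(P, 5) + ns(C, 5) + offset(log(Y)),
           family = poisson, data = tab)
summary(fit)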

397 citations


Journal ArticleDOI
TL;DR: This tutorial provides an overview that brings together the theory behind MI and its implementation, and discusses the growing possibilities for using MI with commercial and free software.
Abstract: Missing data is a common complication in data analysis. In many medical settings missing data can cause difficulties in estimation, precision and inference. Multiple imputation (MI) (Multiple Imputation for Nonresponse in Surveys. Wiley: New York, 1987) is a simulation-based approach to dealing with incomplete data. Although there are many different methods to deal with incomplete data, MI has become one of the leading methods. Since the late 1980s we have observed a constant increase in the use and publication of MI-related research. This tutorial does not attempt to cover all the material concerning MI, but rather provides an overview that brings together the theory behind MI and its implementation, and discusses the growing possibilities for using MI with commercial and free software. We illustrate some of the major points using an example from an Alzheimer disease (AD) study. In this AD study, while clinical data are available for all subjects, postmortem data are only available for the subset of those who died and underwent an autopsy. Analysis of incomplete data requires making unverifiable assumptions. These assumptions are discussed in detail in the text. Relevant S-Plus code is provided.
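
A minimal R sketch of the MI workflow described above (impute, analyse each completed data set, pool by Rubin's rules), using the mice package and simulated data rather than the S-Plus code or AD example from the paper.

# Minimal R sketch (simulated data): multiple imputation with mice, analysis of
# each completed data set, and pooling by Rubin's rules.
library(mice)

set.seed(6)
n <- 300
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)
x[rbinom(n, 1, 0.2) == 1] <- NA          # introduce missing values (illustrative)
dat <- data.frame(y, x)

imp  <- mice(dat, m = 5, printFlag = FALSE)   # 5 imputed data sets
fits <- with(imp, lm(y ~ x))                  # analyse each completed data set
summary(pool(fits))                           # pooled estimates (Rubin's rules)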

381 citations


Journal ArticleDOI
TL;DR: A taxonomy of the hazard functions of the GG family, which includes various special distributions and allows depiction of effects of exposures on hazard functions is presented, which was applied to study survival after a diagnosis of clinical AIDS during different eras of HIV therapy.
Abstract: The widely used Cox proportional hazards regression model for the analysis of censored survival data has limited utility when either hazard functions themselves are of primary interest, or when relative times instead of relative hazards are the relevant measures of association. Parametric regression models are an attractive option in situations such as this, although the choice of a particular model from the available families of distributions can be problematic. The generalized gamma (GG) distribution is an extensive family that contains nearly all of the most commonly used distributions, including the exponential, Weibull, log normal and gamma. More importantly, the GG family includes all four of the most common types of hazard function: monotonically increasing and decreasing, as well as bathtub and arc-shaped hazards. We present here a taxonomy of the hazard functions of the GG family, which includes various special distributions and allows depiction of effects of exposures on hazard functions. We applied the proposed taxonomy to study survival after a diagnosis of clinical AIDS during different eras of HIV therapy, where proportionality of hazard functions was clearly not fulfilled and flexibility in estimating hazards with very different shapes was needed. Comparisons of survival after AIDS in different eras of therapy are presented in terms of both relative times and relative hazards. Standard errors for these and other derived quantities are computed using the delta method and checked using the bootstrap. Description of standard statistical software (Stata, SAS and S-Plus) for the computations is included and available at http://statepi.jhsph.edu/software.
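
As a hedged illustration (not the paper's software), the generalized gamma family can be fitted in R with the flexsurv package; the data and the era covariate below are simulated.

# Minimal R sketch (simulated data): generalized gamma survival regression with
# flexsurv; the Weibull, log-normal and gamma arise as special cases.
library(flexsurv)
library(survival)

set.seed(7)
n      <- 400
era    <- factor(sample(c("early", "late"), n, replace = TRUE))
time   <- rweibull(n, shape = 1.2, scale = ifelse(era == "late", 8, 5))
status <- rbinom(n, 1, 0.7)
dat    <- data.frame(time, status, era)

fit.gg <- flexsurvreg(Surv(time, status) ~ era, data = dat, dist = "gengamma")
fit.gg                          # parameters mu, sigma, Q; Q = 1 Weibull, Q = 0 log-normal
plot(fit.gg, type = "hazard")   # fitted hazard functions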

Journal ArticleDOI
TL;DR: A novel method for the estimation of PPV and NPV, as well as their confidence intervals, is developed and applied to two case-control studies: a diagnostic test assessing the ability of the e4 allele of the apolipoprotein E gene (ApoE) to distinguish patients with late-onset Alzheimer's disease (AD), and a prognostic test assessing the predictive ability of a 70-gene signature on breast cancer metastasis.
Abstract: The accuracy of a binary-scale diagnostic test can be represented by sensitivity (Se), specificity (Sp) and positive and negative predictive values (PPV and NPV). Although Se and Sp measure the intrinsic accuracy of a diagnostic test that does not depend on the prevalence rate, they do not provide information on the diagnostic accuracy of a particular patient. To obtain this information we need to use PPV and NPV. Since PPV and NPV are functions of both the accuracy of the test and the prevalence of the disease, constructing their confidence intervals for a particular patient is not straightforward. In this paper, a novel method for the estimation of PPV and NPV, as well as their confidence intervals, is developed. For both predictive values, standard, adjusted and logit transformation-based confidence intervals are compared using coverage probabilities and interval lengths in a simulation study. These methods are then applied to two case-control studies: a diagnostic test assessing the ability of the e4 allele of the apolipoprotein E gene (ApoE.e4) to distinguish patients with late-onset Alzheimer's disease (AD), and a prognostic test assessing the predictive ability of a 70-gene signature on breast cancer metastasis.
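
The point estimates behind PPV and NPV follow directly from Bayes' theorem; the minimal R sketch below uses hypothetical Se, Sp and prevalence values and does not implement the paper's confidence interval constructions.

# Minimal R sketch: PPV and NPV point estimates from sensitivity, specificity
# and an assumed prevalence (all values hypothetical).
ppv.npv <- function(se, sp, prev) {
  ppv <- se * prev / (se * prev + (1 - sp) * (1 - prev))
  npv <- sp * (1 - prev) / ((1 - se) * prev + sp * (1 - prev))
  c(PPV = ppv, NPV = npv)
}
ppv.npv(se = 0.83, sp = 0.72, prev = 0.10)
# Confidence intervals must also propagate the uncertainty in Se and Sp, which
# is what the standard, adjusted and logit-based intervals above address.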

Journal ArticleDOI
TL;DR: The two estimators MVvc and EB are found to be the most accurate in general, particularly when the heterogeneity variance is moderate to large.
Abstract: For random effects meta-analysis, seven different estimators of the heterogeneity variance are compared and assessed using a simulation study. The seven estimators are the variance component type estimator (VC), the method of moments estimator (MM), the maximum likelihood estimator (ML), the restricted maximum likelihood estimator (REML), the empirical Bayes estimator (EB), the model error variance type estimator (MV), and a variation of the MV estimator (MVvc). The performance of the estimators is compared in terms of both bias and mean squared error, using Monte Carlo simulation. The results show that the REML and especially the ML and MM estimators are not accurate, having large biases unless the true heterogeneity variance is small. The VC estimator tends to overestimate the heterogeneity variance in general, but is quite accurate when the number of studies is large. The MV estimator is not a good estimator when the heterogeneity variance is small to moderate, but it is reasonably accurate when the heterogeneity variance is large. The MVvc estimator is an improved estimator compared to the MV estimator, especially for small to moderate values of the heterogeneity variance. The two estimators MVvc and EB are found to be the most accurate in general, particularly when the heterogeneity variance is moderate to large.
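
Several of these estimators are available in R's metafor package; the sketch below fits the same hypothetical data with a few of them (the paper's MV and MVvc estimators are not standard options there).

# Minimal R sketch (hypothetical effect sizes): the same random-effects
# meta-analysis fitted with several heterogeneity variance estimators in metafor.
library(metafor)

yi <- c(0.10, 0.30, 0.35, 0.65, 0.45, 0.15, 0.50, 0.25)
vi <- c(0.03, 0.07, 0.05, 0.09, 0.04, 0.06, 0.08, 0.05)

sapply(c("DL", "ML", "REML", "EB"), function(m) rma(yi, vi, method = m)$tau2)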

Journal ArticleDOI
TL;DR: A novel method for constructing confidence intervals for the amount of heterogeneity in the effect sizes is proposed that guarantees nominal coverage probabilities even in small samples when model assumptions are satisfied and yields the most accurate coverage probabilities under conditions more analogous to practice.
Abstract: Effect size estimates to be combined in a systematic review are often found to be more variable than one would expect based on sampling differences alone. This is usually interpreted as evidence that the effect sizes are heterogeneous. A random-effects model is then often used to account for the heterogeneity in the effect sizes. A novel method for constructing confidence intervals for the amount of heterogeneity in the effect sizes is proposed that guarantees nominal coverage probabilities even in small samples when model assumptions are satisfied. A variety of existing approaches for constructing such confidence intervals are summarized and the various methods are applied to an example to illustrate their use. A simulation study reveals that the newly proposed method yields the most accurate coverage probabilities under conditions more analogous to practice, where assumptions about normally distributed effect size estimates and known sampling variances only hold asymptotically.
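
As a hedged illustration, metafor's confint() on a random-effects fit returns a Q-profile type confidence interval for the heterogeneity variance; whether this matches the paper's proposal exactly is an assumption, and the effect sizes below are hypothetical.

# Minimal R sketch: a confidence interval for the heterogeneity variance from
# metafor's confint(), which uses a Q-profile type construction.
library(metafor)

yi  <- c(0.10, 0.30, 0.35, 0.65, 0.45, 0.15, 0.50, 0.25)   # hypothetical
vi  <- c(0.03, 0.07, 0.05, 0.09, 0.04, 0.06, 0.08, 0.05)
res <- rma(yi, vi, method = "REML")
confint(res)   # intervals for tau^2, tau, I^2 and H^2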

Journal ArticleDOI
TL;DR: In this paper, the authors examined the performance of alternatives to the naive test for comparison of survival curves and compared the type I errors and power of these tests for a variety of sample sizes by a Monte Carlo study.
Abstract: A common problem encountered in many medical applications is the comparison of survival curves. Often, rather than comparison of the entire survival curves, interest is focused on the comparison at a fixed point in time. In most cases, the naive test based on a difference in the estimates of survival is used for this comparison. In this note, we examine the performance of alternatives to the naive test. These include tests based on a number of transformations of the survival function and a test based on a generalized linear model for pseudo-observations. The type I errors and power of these tests for a variety of sample sizes are compared by a Monte Carlo study. We also discuss how these tests may be extended to situations where the data are stratified. The pseudo-value approach is also applicable in more detailed regression analysis of the survival probability at a fixed point in time. The methods are illustrated on a study comparing survival for autologous and allogeneic bone marrow transplants.
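
One of the transformation-based tests considered above can be sketched in base R plus the survival package: compare the Kaplan-Meier estimates at a fixed time on the complementary log-log scale. The data, group labels and time point are hypothetical.

# Minimal R sketch (simulated data): compare survival at a fixed time point on
# the complementary log-log scale of the Kaplan-Meier estimates.
library(survival)

set.seed(8)
n      <- 300
group  <- rep(c("auto", "allo"), each = n / 2)
time   <- rexp(n, rate = ifelse(group == "auto", 0.10, 0.14))
status <- rbinom(n, 1, 0.8)
t0     <- 5                                     # fixed time point of interest

fit <- survfit(Surv(time, status) ~ group)
s   <- summary(fit, times = t0)
est <- log(-log(s$surv))                        # cloglog-transformed survival
se  <- s$std.err / (s$surv * abs(log(s$surv)))  # delta method on that scale
z   <- diff(est) / sqrt(sum(se^2))
2 * pnorm(-abs(z))                              # two-sided p-value at time t0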

Journal ArticleDOI
TL;DR: The design and analysis of cluster randomized trials has been a recurrent theme in Statistics in Medicine since the early volumes; this paper reviews recent developments, particularly those that featured in the journal.
Abstract: The design and analysis of cluster randomized trials has been a recurrent theme in Statistics in Medicine since the early volumes. In celebration of 25 years of Statistics in Medicine, this paper reviews recent developments, particularly those that featured in the journal. Issues in design such as sample size calculations, matched paired designs, cohort versus cross-sectional designs, and practical design problems are covered. Developments in analysis include modification of robust methods to cope with small numbers of clusters, generalized estimating equations, and population-averaged and cluster-specific models. Finally, issues in presenting data, some other clustering issues and the general problem of evaluating complex interventions are briefly mentioned.

Journal ArticleDOI
TL;DR: It is found that conditioning on the propensity score resulted in biased estimation of the true conditional odds ratio and the true conditional hazard ratio; however, conditioning on the propensity score did not result in biased estimation of the true conditional rate ratio.
Abstract: Propensity score methods are increasingly being used to estimate causal treatment effects in the medical literature. Conditioning on the propensity score results in unbiased estimation of the expected difference in observed responses to two treatments. The degree to which conditioning on the propensity score introduces bias into the estimation of the conditional odds ratio or conditional hazard ratio, which are frequently used as measures of treatment effect in observational studies, has not been extensively studied. We conducted Monte Carlo simulations to determine the degree to which propensity score matching, stratification on the quintiles of the propensity score, and covariate adjustment using the propensity score result in biased estimation of conditional odds ratios, hazard ratios, and rate ratios. We found that conditioning on the propensity score resulted in biased estimation of the true conditional odds ratio and the true conditional hazard ratio. In all scenarios examined, treatment effects were biased towards the null treatment effect. However, conditioning on the propensity score did not result in biased estimation of the true conditional rate ratio. In contrast, conventional regression methods allowed unbiased estimation of the true conditional treatment effect when all variables associated with the outcome were included in the regression model. The observed bias in propensity score methods is due to the fact that regression models allow one to estimate conditional treatment effects, whereas propensity score methods allow one to estimate marginal treatment effects. In several settings with non-linear treatment effects, marginal and conditional treatment effects do not coincide.

Journal ArticleDOI
TL;DR: This work proposes an extension to relative survival of a flexible parametric model proposed by Royston and Parmar for censored survival data that provides smooth estimates of the relative survival and excess mortality rates by using restricted cubic splines on the log cumulative excess hazard scale.
Abstract: Relative survival is frequently used in population-based studies as a method for estimating disease-related mortality without the need for information on cause of death. We propose an extension to relative survival of a flexible parametric model proposed by Royston and Parmar for censored survival data. The model provides smooth estimates of the relative survival and excess mortality rates by using restricted cubic splines on the log cumulative excess hazard scale. The approach has several advantages over some of the more standard relative survival models, which adopt a piecewise approach: the main ones are the ability to model time on a continuous scale, the fact that the survival and hazard functions are obtained analytically, and that it does not require split-time data.
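
As a hedged sketch only: a Royston-Parmar type spline model can be fitted in R with flexsurv, and passing an expected (background) hazard through the bhazard argument is my reading of how that interface supports excess hazard / relative survival models; the data below are simulated and the call should be checked against the package documentation.

# Sketch only (check against the flexsurv documentation): a spline model on the
# log cumulative hazard scale, supplying a background hazard through bhazard
# to target the excess hazard, i.e. relative survival.
library(flexsurv)
library(survival)

set.seed(9)
n       <- 500
age     <- runif(n, 50, 85)
time    <- rexp(n, rate = 0.05 + 0.0005 * (age - 50))
status  <- rbinom(n, 1, 0.8)
exp.haz <- 0.01 * exp(0.09 * (age - 50))        # expected (population) hazard
dat     <- data.frame(time, status, age, exp.haz)

fit <- flexsurvspline(Surv(time, status) ~ age, data = dat, k = 2,
                      scale = "hazard", bhazard = dat$exp.haz)
fit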


Journal ArticleDOI
TL;DR: In disease surveillance, there are often many different data sets or data groupings for which the authors wish to do surveillance, and if each data set is analysed separately rather than combined, the statistical power to detect an outbreak that is present in all data sets may suffer due to low numbers in each.
Abstract: In disease surveillance, there are often many different data sets or data groupings for which we wish to do surveillance. If each data set is analysed separately rather than combined, the statistical power to detect an outbreak that is present in all data sets may suffer due to low numbers in each. On the other hand, if the data sets are added by taking the sum of the counts, then a signal that is primarily present in one data set may be hidden due to random noise in the other data sets. In this paper, we present an extension of the spatial and space-time scan statistic that simultaneously incorporates multiple data sets into a single likelihood function, so that a signal is generated whether it occurs in only one or in multiple data sets. This is done by defining the combined log likelihood as the sum of the individual log likelihoods for those data sets for which the observed case count is more than the expected. We also present another extension, where the concept of combining likelihoods from different data sets is used to adjust for covariates. Using data from the National Bioterrorism Syndromic Surveillance Demonstration Project, we illustrate the new method using physician telephone calls, regular physician visits and urgent care visits by Harvard Pilgrim Health Care members cared for by Harvard Vanguard Medical Associates, a large multi-specialty group practice in Massachusetts. For upper and lower gastrointestinal (GI) illness, there were on average 20 telephone calls, nine urgent care visits and 22 regular physician visits per day. The strongest signal was generated by a single data set and due to a familial outbreak of pinworm disease. The second and third strongest signals were generated by the combined strength of two of the three data sets.
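
Restating the combining rule from the abstract in symbols (notation mine): for a candidate cluster Z, with individual log likelihood ratio LLR_d(Z), observed count O_d(Z) and expected count E_d(Z) in data set d, the combined statistic is

\[ \mathrm{LLR}(Z) \;=\; \sum_{d\,:\,O_d(Z) > E_d(Z)} \mathrm{LLR}_d(Z), \]

and the scan statistic is the maximum of LLR(Z) over the scanning windows Z, so a signal is generated whether the excess appears in one data set or in several.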

Journal ArticleDOI
TL;DR: A measure of the difference between a pair of curves based on the area between them, standardized by the average of the areas under the pair of curves, is proposed.
Abstract: To allow for non-linear exposure-response relationships, we applied flexible non-parametric smoothing techniques to models of time to lung cancer mortality in two occupational cohorts with skewed exposure distributions. We focused on three different smoothing techniques in Cox models: penalized splines, restricted cubic splines, and fractional polynomials. We compared standard software implementations of these three methods based on their visual representation and criterion for model selection. We propose a measure of the difference between a pair of curves based on the area between them, standardized by the average of the areas under the pair of curves. To capture the variation in the difference over the range of exposure, the area between curves was also calculated at percentiles of exposure and expressed as a percentage of the total difference. The dose-response curves from the three methods were similar in both studies over the denser portion of the exposure range, with the difference between curves up to the 50th percentile less than 1 per cent of the total difference. A comparison of inverse variance weighted areas applied to the data set with a more skewed exposure distribution allowed us to estimate area differences with more precision by reducing the proportion attributed to the upper 1 per cent tail region. Overall, the penalized spline and the restricted cubic spline were closer to each other than either was to the fractional polynomial.
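
The standardized area-between-curves measure can be sketched numerically; the following minimal R example uses trapezoidal integration on a common exposure grid and two hypothetical fitted curves, and is not the authors' implementation.

# Minimal R sketch: standardized area between two fitted curves by trapezoidal
# integration over a common exposure grid (curves are hypothetical).
trap <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

x  <- seq(0, 10, length.out = 200)     # exposure grid
f1 <- log1p(x)                         # e.g. a penalized spline fit
f2 <- 0.9 * log1p(x) + 0.02 * x        # e.g. a fractional polynomial fit

area.between <- trap(x, abs(f1 - f2))
area.average <- mean(c(trap(x, abs(f1)), trap(x, abs(f2))))
100 * area.between / area.average      # difference as a percentage of average area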

Journal ArticleDOI
TL;DR: Three alternative methods for converting a multivariate normal imputed value into a binary imputed value are explored; adaptive rounding provided the best performance.
Abstract: Multiple imputation has become easier to perform with the advent of several software packages that provide imputations under a multivariate normal model, but imputation of missing binary data remains an important practical problem. Here, we explore three alternative methods for converting a multivariate normal imputed value into a binary imputed value: (1) simple rounding of the imputed value to the nearer of 0 or 1, (2) a Bernoulli draw based on a 'coin flip' where an imputed value between 0 and 1 is treated as the probability of drawing a 1, and (3) an adaptive rounding scheme where the cut-off value for determining whether to round to 0 or 1 is based on a normal approximation to the binomial distribution, making use of the marginal proportions of 0's and 1's on the variable. We perform simulation studies on a data set of 206 802 respondents to the California Healthy Kids Survey, where the fully observed data on 198 262 individuals defines the population, from which we repeatedly draw samples with missing data, impute, calculate statistics and confidence intervals, and compare bias and coverage against the true values. Frequently, we found satisfactory bias and coverage properties, suggesting that approaches such as these that are based on statistical approximations are preferable in applied research to either avoiding settings where missing data occur or relying on complete-case analyses. Considering both the occurrence and extent of deficits in coverage, we found that adaptive rounding provided the best performance. Copyright © 2006 John Wiley & Sons, Ltd.
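
A minimal R sketch of the three rounding rules follows; the adaptive threshold (marginal proportion minus its normal quantile times the binomial standard deviation) is my reconstruction of the description above and should be treated as an assumption, and the imputed values are simulated.

# Minimal R sketch (simulated imputations): three rules for converting
# normal-model imputations of a binary variable into 0/1 values. The adaptive
# threshold below is a reconstruction of the description above, not verified
# against the paper.
set.seed(10)
imp   <- rnorm(1000, mean = 0.3, sd = 0.25)     # hypothetical imputed values
omega <- mean(pmin(pmax(imp, 0), 1))            # marginal proportion of 1's

simple   <- as.numeric(imp >= 0.5)                          # (1) simple rounding
coinflip <- rbinom(length(imp), 1, pmin(pmax(imp, 0), 1))   # (2) Bernoulli draw
cutoff   <- omega - qnorm(omega) * sqrt(omega * (1 - omega))
adaptive <- as.numeric(imp >= cutoff)                       # (3) adaptive rounding

c(simple = mean(simple), coinflip = mean(coinflip), adaptive = mean(adaptive))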

Journal ArticleDOI
TL;DR: This paper contrasts these two methods and presents the benefits of each: modelling the cause specific hazard and modelling the hazard of the subdistribution.
Abstract: When competing risks are present, two types of analysis can be performed: modelling the cause specific hazard and modelling the hazard of the subdistribution. This paper contrasts these two methods and presents the benefits of each. The interpretation is specific to the analysis performed. When modelling the cause specific hazard, one performs the analysis under the assumption that the competing risks do not exist. This could be beneficial when, for example, the main interest is whether the treatment works in general. In modelling the hazard of the subdistribution, one incorporates the competing risks in the analysis. This analysis compares the observed incidence of the event of interest between groups. The latter analysis is specific to the structure of the observed data and it can be generalized only to another population with similar competing risks.
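
The contrast can be sketched in R (hypothetical data, not the paper's example): a cause-specific Cox model via coxph versus a subdistribution hazard (Fine-Gray) model via the cmprsk package.

# Minimal R sketch (simulated data): cause-specific Cox model versus a model
# for the hazard of the subdistribution (Fine-Gray) via cmprsk.
library(survival)
library(cmprsk)

set.seed(11)
n      <- 400
trt    <- rbinom(n, 1, 0.5)
time   <- rexp(n, rate = 0.1)
status <- sample(0:2, n, replace = TRUE, prob = c(0.2, 0.5, 0.3))  # 0 = censored

# Cause-specific hazard of event 1: competing events treated as censoring
cs <- coxph(Surv(time, status == 1) ~ trt)

# Subdistribution hazard of event 1: competing events remain in the risk set
fg <- crr(ftime = time, fstatus = status, cov1 = cbind(trt),
          failcode = 1, cencode = 0)

summary(cs)
summary(fg)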

Journal ArticleDOI
TL;DR: The benefits and limitations of multivariate meta-analysis are illustrated to provide helpful insight for practitioners, and how and why a BRMA is able to 'borrow strength' across outcomes is shown.
Abstract: Often multiple outcomes are of interest in each study identified by a systematic review, and in this situation a separate univariate meta-analysis is usually applied to synthesize the evidence for each outcome independently; an alternative approach is a single multivariate meta-analysis model that utilizes any correlation between outcomes and obtains all the pooled estimates jointly. Surprisingly, multivariate meta-analysis is rarely considered in practice, so in this paper we illustrate the benefits and limitations of the approach to provide helpful insight for practitioners. We compare a bivariate random-effects meta-analysis (BRMA) to two independent univariate random-effects meta-analyses (URMA), and show how and why a BRMA is able to 'borrow strength' across outcomes. Then, on application to two examples in healthcare, we show: (i) given complete data for both outcomes in each study, BRMA is likely to produce individual pooled estimates with very similar standard errors to those from URMA; (ii) given some studies where one of the outcomes is missing at random, the 'borrowing of strength' is likely to allow BRMA to produce individual pooled estimates with noticeably smaller standard errors than those from URMA; (iii) for either complete data or missing data, BRMA will produce a more appropriate standard error of the pooled difference between outcomes as it incorporates their correlation, which is not possible using URMA; and (iv) despite its advantages, BRMA may often not be possible due to the difficulty in obtaining the within-study correlations required to fit the model. Bivariate meta-regression and further research priorities are also discussed.
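
A hedged sketch of a bivariate random-effects meta-analysis in R with metafor, alongside two univariate fits; the effect sizes are hypothetical and the assumed within-study correlation of 0.6 stands in for exactly the quantity the abstract notes is often hard to obtain.

# Sketch (hypothetical summary data): bivariate random-effects meta-analysis
# with metafor, next to two univariate fits; the within-study correlation of
# 0.6 is an assumption.
library(metafor)

k   <- 6
dat <- data.frame(study   = rep(1:k, each = 2),
                  outcome = rep(c("y1", "y2"), k),
                  yi      = c(0.3, 0.5, 0.2, 0.4, 0.4, 0.6,
                              0.1, 0.3, 0.5, 0.7, 0.3, 0.5),
                  vi      = rep(c(0.04, 0.06), k))

# Block-diagonal within-study covariance matrix under the assumed correlation
blocks <- lapply(split(dat$vi, dat$study), function(v) {
  S <- diag(v)
  S[1, 2] <- S[2, 1] <- 0.6 * sqrt(v[1] * v[2])
  S
})
V <- do.call(bldiag, unname(blocks))

fit.brma  <- rma.mv(yi, V, mods = ~ outcome - 1,
                    random = ~ outcome | study, struct = "UN", data = dat)
fit.urma1 <- rma(yi, vi, data = dat, subset = outcome == "y1")
fit.urma2 <- rma(yi, vi, data = dat, subset = outcome == "y2")
fit.brma
fit.urma1
fit.urma2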

Journal ArticleDOI
TL;DR: This paper provides a tutorial on the practical implementation of a flexible random effects model based on methodology developed in the Bayesian non-parametrics literature and implemented in freely available software, with code provided for WinBUGS.
Abstract: Random effects models are used in many applications in medical statistics, including meta-analysis, cluster randomized trials and comparisons of health care providers. This paper provides a tutorial on the practical implementation of a flexible random effects model based on methodology developed in the Bayesian non-parametrics literature, and implemented in freely available software. The approach is applied to the problem of hospital comparisons using routine performance data, and among other benefits provides a diagnostic to detect clusters of providers with unusual results, thus avoiding problems caused by masking in traditional parametric approaches. By providing code for WinBUGS we hope that the model can be used by applied statisticians working in a wide variety of applications.

Journal ArticleDOI
TL;DR: It is argued that the popular misconception that Blomqvist's formula is superior to Oldham's method is due to a failure to recognize that the heterogeneity of individual responses to treatment is a source of regression to the mean in the analysis of the relation between change and initial value.
Abstract: The relation between initial disease status and subsequent change following treatment has attracted great interest in clinical research. However, statisticians have repeatedly warned against correlating/regressing change with baseline due to two methodological concerns known as mathematical coupling and regression to the mean. Oldham's method and Blomqvist's formula are the two most often adopted methods to rectify these problems. The aims of this article are to review briefly the proposed solutions in the statistical and psychological literature, and to clarify the popular misconception that Blomqvist's formula is superior to Oldham's method. We argue that this misconception is due to a failure to recognize that the heterogeneity of individual responses to treatment is a source of regression to the mean in the analysis of the relation between change and initial value. Furthermore, we demonstrate how each method actually answers different research questions, and how confusion arises when this is not always understood.
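
Oldham's method itself is simple to illustrate: relate the change to the average of baseline and follow-up rather than to baseline. The minimal R sketch below uses simulated data and a common treatment effect, purely as an illustration of the coupling problem.

# Minimal R sketch (simulated data): the naive change-versus-baseline
# correlation compared with Oldham's method (change versus the average of
# baseline and follow-up).
set.seed(12)
n        <- 200
true     <- rnorm(n, 100, 15)
baseline <- true + rnorm(n, 0, 10)
followup <- true - 5 + rnorm(n, 0, 10)       # common treatment effect of -5
change   <- followup - baseline

cor(change, baseline)                        # naive: distorted by coupling / regression to the mean
cor(change, (baseline + followup) / 2)       # Oldham's method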

Journal ArticleDOI
TL;DR: The results indicate that credible intervals will have approximately nominal coverage probability, on average, when the prior distribution used for sensitivity analysis approximates the sampling distribution of model parameters in a hypothetical sequence of observational studies.
Abstract: We consider Bayesian sensitivity analysis for unmeasured confounding in observational studies where the association between a binary exposure, binary response, measured confounders and a single binary unmeasured confounder can be formulated using logistic regression models. A model for unmeasured confounding is presented along with a family of prior distributions that model beliefs about a possible unknown unmeasured confounder. Simulation from the posterior distribution is accomplished using Markov chain Monte Carlo. Because the model for unmeasured confounding is not identifiable, standard large-sample theory for Bayesian analysis is not applicable. Consequently, the impact of different choices of prior distributions on the coverage probability of credible intervals is unknown. Using simulations, we investigate the coverage probability when averaged with respect to various distributions over the parameter space. The results indicate that credible intervals will have approximately nominal coverage probability, on average, when the prior distribution used for sensitivity analysis approximates the sampling distribution of model parameters in a hypothetical sequence of observational studies. We motivate the method in a study of the effectiveness of beta blocker therapy for treatment of heart failure.

Journal ArticleDOI
TL;DR: This work has shown that meta‐regression can be used to estimate treatment‐covariate interactions using published data, but it is known to lack statistical power, and is prone to bias.
Abstract: Meta-analyses of clinical trials are increasingly seeking to go beyond estimating the effect of a treatment and may also aim to investigate the effect of other covariates and how they alter treatment effectiveness. This requires the estimation of treatment-covariate interactions. Meta-regression can be used to estimate such interactions using published data, but it is known to lack statistical power, and is prone to bias. The use of individual patient data can improve estimation of such interactions, among other benefits, but it can be difficult and time-consuming to collect and analyse. This paper derives, under certain conditions, the power of meta-regression and IPD methods to detect treatment–covariate interactions. These power formulae are shown to depend on heterogeneity in the covariate distributions across studies. This allows the derivation of simple tests, based on heterogeneity statistics, for comparing the statistical power of the analysis methods. Copyright © 2006 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: It is concluded that correct adjustment for the age at entry is crucial in reducing bias of the estimated coefficients of proportional hazards regression models, and the unadjusted age-scale model is inferior to any of the five other models considered, regardless of their choice of time scale.
Abstract: Time-to-event regression is a frequent tool in biomedical research. In clinical trials this time is usually measured from the beginning of the study. The same approach is often adopted in the analysis of longitudinal observational studies. However, in recent years there has appeared literature making a case for the use of the date of birth as a starting point, and thus utilize age as the time-to-event. In this paper, we explore different types of age-scale models and compare them with time-on-study models in terms of the estimated regression coefficients they produce. We consider six proportional hazards regression models that differ in the choice of time scale and in the method of adjusting for the years before the study. By considering the estimating equations of these models as well as numerical simulations we conclude that correct adjustment for the age at entry is crucial in reducing bias of the estimated coefficients. The unadjusted age-scale model is inferior to any of the five other models considered, regardless of their choice of time scale. Additionally, if adjustment for age at entry is made, our analyses show very little to suggest that there exists any practically meaningful difference in the estimated regression coefficients depending on the choice of time scale. These findings are supported by four practical examples from the Framingham Heart Study.
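
The two main modelling choices can be sketched in R with the survival package; the cohort below is simulated, and the age-scale model uses delayed entry (left truncation) at the age of entry.

# Minimal R sketch (simulated cohort): time-on-study as the time scale with
# adjustment for age at entry, versus age as the time scale with delayed entry.
library(survival)

set.seed(13)
n         <- 1000
age.entry <- runif(n, 40, 70)
x         <- rbinom(n, 1, 0.5)
followup  <- rexp(n, rate = 0.02 * exp(0.03 * (age.entry - 55) + 0.4 * x))
event     <- rbinom(n, 1, 0.6)
age.exit  <- age.entry + followup

# (a) time-on-study scale, adjusted for age at entry
fit.tos <- coxph(Surv(followup, event) ~ x + age.entry)

# (b) age scale, with left truncation at the age of entry
fit.age <- coxph(Surv(age.entry, age.exit, event) ~ x)

c(time_on_study = coef(fit.tos)["x"], age_scale = coef(fit.age)["x"])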

Journal ArticleDOI
TL;DR: This study shows that regression trees do not perform as well as logistic regression for predicting mortality following AMI; however, the logistic regression model had performance comparable to that of more flexible, data-driven models such as GAMs and MARS.
Abstract: Clinicians and health service researchers are frequently interested in predicting patient-specific probabilities of adverse events (e.g. death, disease recurrence, post-operative complications, hospital readmission). There is an increasing interest in the use of classification and regression trees (CART) for predicting outcomes in clinical studies. We compared the predictive accuracy of logistic regression with that of regression trees for predicting mortality after hospitalization with an acute myocardial infarction (AMI). We also examined the predictive ability of two other types of data-driven models: generalized additive models (GAMs) and multivariate adaptive regression splines (MARS). We used data on 9484 patients admitted to hospital with an AMI in Ontario. We used repeated split-sample validation: the data were randomly divided into derivation and validation samples. Predictive models were estimated using the derivation sample and the predictive accuracy of the resultant model was assessed using the area under the receiver operating characteristic (ROC) curve in the validation sample. This process was repeated 1000 times: the initial data set was randomly divided into derivation and validation samples 1000 times, and the predictive accuracy of each method was assessed each time. The mean ROC curve area for the regression tree models in the 1000 derivation samples was 0.762, while the mean ROC curve area of a simple logistic regression model was 0.845. The mean ROC curve areas for the other methods ranged from a low of 0.831 to a high of 0.851. Our study shows that regression trees do not perform as well as logistic regression for predicting mortality following AMI. However, the logistic regression model had performance comparable to that of more flexible, data-driven models such as GAMs and MARS.
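
A single split of the repeated split-sample comparison can be sketched in R with rpart and glm; the data, predictors and AUC function below are hypothetical stand-ins, not the Ontario AMI data or the authors' models.

# Minimal R sketch (simulated data): one split of a split-sample comparison of
# a classification tree and logistic regression by ROC curve area.
library(rpart)

auc <- function(y, p) {                  # Mann-Whitney form of the AUC
  r <- rank(p)
  (sum(r[y == 1]) - sum(y == 1) * (sum(y == 1) + 1) / 2) /
    (sum(y == 1) * sum(y == 0))
}

set.seed(14)
n     <- 4000
age   <- rnorm(n, 65, 12)
shock <- rbinom(n, 1, 0.1)
y     <- rbinom(n, 1, plogis(-4 + 0.04 * age + 1.5 * shock))
dat   <- data.frame(y, age, shock)

idx   <- sample(n, n / 2)                # derivation / validation split
deriv <- dat[idx, ]
valid <- dat[-idx, ]

fit.lr   <- glm(y ~ age + shock, family = binomial, data = deriv)
fit.tree <- rpart(factor(y) ~ age + shock, data = deriv, method = "class")

auc(valid$y, predict(fit.lr, valid, type = "response"))
auc(valid$y, predict(fit.tree, valid, type = "prob")[, 2])
# The paper repeats this split 1000 times and averages the ROC areas.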