
Showing papers in "Statistics in Medicine in 2005"


Journal ArticleDOI
TL;DR: It is concluded that funnel plots are flexible, attractively simple, and avoid spurious ranking of institutions into 'league tables'.
Abstract: 'Funnel plots' are recommended as a graphical aid for institutional comparisons, in which an estimate of an underlying quantity is plotted against an interpretable measure of its precision. 'Control limits' form a funnel around the target outcome, in a close analogy to standard Shewhart control charts. Examples are given for comparing proportions and changes in rates, assessing association between outcome and volume of cases, and dealing with over-dispersion due to unmeasured risk factors. We conclude that funnel plots are flexible, attractively simple, and avoid spurious ranking of institutions into 'league tables'.
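
As a rough illustration of the funnel construction described above, the sketch below computes normal-approximation control limits for a proportion around a common target and plots institutions against their caseload. All numbers are invented, and the paper itself also discusses exact binomial limits and adjustments for over-dispersion, which are not shown here.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def funnel_limits(target, n, levels=(0.95, 0.998)):
    """Normal-approximation control limits for a proportion around a target rate."""
    limits = {}
    for level in levels:
        z = norm.ppf(1 - (1 - level) / 2)                 # two-sided limit
        half_width = z * np.sqrt(target * (1 - target) / n)
        limits[level] = (target - half_width, target + half_width)
    return limits

# Hypothetical data: observed proportions and caseloads for 40 institutions
rng = np.random.default_rng(1)
n_cases = rng.integers(20, 500, size=40)
target = 0.1
p_obs = rng.binomial(n_cases, target) / n_cases

grid = np.linspace(n_cases.min(), n_cases.max(), 200)
plt.scatter(n_cases, p_obs, s=12, label="institutions")
plt.axhline(target, color="k", lw=1)
for level, (lo, hi) in funnel_limits(target, grid).items():
    line, = plt.plot(grid, lo, "--", lw=1, label=f"{level:.1%} limits")
    plt.plot(grid, hi, "--", lw=1, color=line.get_color())
plt.xlabel("volume of cases (precision)")
plt.ylabel("proportion")
plt.legend()
plt.show()
```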

752 citations


Journal ArticleDOI
TL;DR: In this paper, a general formula describing the relation between the hazard and the corresponding survival time of the Cox model is derived, which is useful in simulation studies, and techniques to generate survival times for simulation studies regarding Cox proportional hazards models are presented.
Abstract: Simulation studies present an important statistical tool to investigate the performance, properties and adequacy of statistical models in pre-specified situations. One of the most important statistical models in medical research is the proportional hazards model of Cox. In this paper, techniques to generate survival times for simulation studies regarding Cox proportional hazards models are presented. A general formula describing the relation between the hazard and the corresponding survival time of the Cox model is derived, which is useful in simulation studies. It is shown how the exponential, the Weibull and the Gompertz distribution can be applied to generate appropriate survival times for simulation studies. Additionally, the general relation between hazard and survival time can be used to develop one's own distributions for special situations and to handle flexibly parameterized proportional hazards models. The use of distributions other than the exponential distribution is indispensable to investigate the characteristics of the Cox proportional hazards model, especially in non-standard situations, where the partial likelihood depends on the baseline hazard. A simulation study investigating the effect of measurement errors in the German Uranium Miners Cohort Study is considered to illustrate the proposed simulation techniques and to emphasize the importance of careful modelling of the baseline hazard in Cox models.
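
The inversion idea described in the abstract, T = H0^{-1}(-log(U) exp(-x'beta)), can be sketched as follows for exponential, Weibull and Gompertz baseline hazards; the parameter values and covariate structure below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_cox_times(x, beta, baseline="weibull", lam=0.01, nu=1.5, alpha=0.1):
    """Draw survival times from a Cox model via T = H0^{-1}(-log(U) * exp(-x'beta)).

    Baseline hazards: exponential h0(t) = lam, Weibull h0(t) = lam*nu*t^(nu-1),
    Gompertz h0(t) = lam*exp(alpha*t).  Parameter values here are illustrative only.
    """
    u = rng.uniform(size=len(x))
    v = -np.log(u) * np.exp(-x @ beta)        # equals H0(T)
    if baseline == "exponential":
        return v / lam
    if baseline == "weibull":
        return (v / lam) ** (1.0 / nu)
    if baseline == "gompertz":
        return np.log(1.0 + alpha * v / lam) / alpha
    raise ValueError("unknown baseline")

# example: one binary covariate with log-hazard ratio 0.7
x = rng.binomial(1, 0.5, size=1000).reshape(-1, 1).astype(float)
beta = np.array([0.7])
t = simulate_cox_times(x, beta, baseline="gompertz")
print(t[:5])
```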

749 citations


Journal ArticleDOI
TL;DR: Two of the adjustment methods, the addition of a sensible constant to observed values and a censored normal regression model, appear to perform well across a range of realistic settings; since these are straightforward to implement, there is no argument for undertaking a flawed analysis that wastes power and results in excessive bias.
Abstract: A population-based study of a quantitative trait may be seriously compromised when the trait is subject to the effects of a treatment. For example, in a typical study of quantitative blood pressure (BP) 15 per cent or more of middle-aged subjects may take antihypertensive treatment. Without appropriate correction, this can lead to substantial shrinkage in the estimated effect of aetiological determinants of scientific interest and a marked reduction in statistical power. Correction relies upon imputation, in treated subjects, of the underlying BP from the observed BP having invoked one or more assumptions about the bioclinical setting. There is a range of different assumptions that may be made, and a number of different analytical models that may be used. In this paper, we motivate an approach based on a censored normal regression model and compare it with a range of other methods that are currently used or advocated. We compare these methods in simulated data sets and assess the estimation bias and the loss of power that ensue when treatment effects are not appropriately addressed. We also apply the same methods to real data and demonstrate a pattern of behaviour that is consistent with that in the simulation studies. Although all approaches to analysis are necessarily approximations, we conclude that two of the adjustment methods appear to perform well across a range of realistic settings. These are: (1) the addition of a sensible constant to the observed BP in treated subjects; and (2) the censored normal regression model. A third, non-parametric, method based on averaging ordered residuals may also be advocated in some settings. On the other hand, three approaches that are used relatively commonly are fundamentally flawed and should not be used at all. These are: (i) ignoring the problem altogether and analysing observed BP in treated subjects as if it was underlying BP; (ii) fitting a conventional regression model with treatment as a binary covariate; and (iii) excluding treated subjects from the analysis. Given that the more effective methods are straightforward to implement, there is no argument for undertaking a flawed analysis that wastes power and results in excessive bias.
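
A minimal sketch of the censored-normal idea follows, assuming the underlying blood pressure of a treated subject is right-censored at the observed (treated) value. This illustrates the general approach rather than the authors' exact implementation, and the simulated data are hypothetical.

```python
import numpy as np
from scipy import optimize, stats

def fit_censored_normal(y, X, treated):
    """Censored normal regression: untreated BP observed exactly; for treated
    subjects the underlying BP is taken to be right-censored at the observed value."""
    X = np.column_stack([np.ones(len(y)), X])

    def negloglik(params):
        beta, log_sigma = params[:-1], params[-1]
        sigma = np.exp(log_sigma)
        z = (y - X @ beta) / sigma
        ll_obs = stats.norm.logpdf(z[~treated]) - log_sigma
        ll_cens = stats.norm.logsf(z[treated])       # P(underlying BP > observed)
        return -(ll_obs.sum() + ll_cens.sum())

    start = np.r_[np.linalg.lstsq(X, y, rcond=None)[0], np.log(y.std())]
    res = optimize.minimize(negloglik, start, method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])

# Hypothetical example: age as the aetiological determinant of interest
rng = np.random.default_rng(2)
n = 800
age = rng.normal(50, 10, n)
bp_true = 110 + 0.8 * age + rng.normal(0, 12, n)
treated = bp_true > 150                              # treatment more likely at high BP
bp_obs = np.where(treated, bp_true - 15, bp_true)    # treatment lowers observed BP

beta_hat, sigma_hat = fit_censored_normal(bp_obs, age, treated)
print("intercept, age effect:", beta_hat, "sigma:", sigma_hat)
```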

623 citations


Journal ArticleDOI
TL;DR: This paper advocates the use of sequential multiple assignment randomized trials in the development of adaptive treatment strategies and both a simple ad hoc method for ascertaining sample sizes and simple analysis methods are provided.
Abstract: In adaptive treatment strategies, the treatment level and type is repeatedly adjusted according to ongoing individual response. Since past treatment may have delayed effects, the development of these treatment strategies is challenging. This paper advocates the use of sequential multiple assignment randomized trials in the development of adaptive treatment strategies. Both a simple ad hoc method for ascertaining sample sizes and simple analysis methods are provided.

603 citations


Journal ArticleDOI
TL;DR: A process for pooling results from population‐based molecular association studies which consists of checking Hardy–Weinberg equilibrium using chi‐square goodness of fit and performing sensitivity analysis with and without studies that are in HWE is proposed.
Abstract: Although population-based molecular association studies are becoming increasingly popular, methodology for the meta-analysis of these studies has been neglected, particularly with regard to two issues: testing Hardy-Weinberg equilibrium (HWE), and pooling results in a manner that reflects a biological model of gene effect. We propose a process for pooling results from population-based molecular association studies which consists of the following steps: (1) checking HWE using chi-square goodness of fit; we suggest performing sensitivity analysis with and without studies that are in HWE. (2) Heterogeneity is then checked, and if present, possible causes are explored. (3) If no heterogeneity is present, regression analysis is used to pool data and to determine the gene effect. (4) If there is a significant gene effect, pairwise group differences are analysed and these data are allowed to 'dictate' the best genetic model. (5) Data may then be pooled using this model. This method is easily performed using standard software, and has the advantage of not assuming an a priori genetic model.
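
Step (1) of the proposed process, the chi-square goodness-of-fit check of HWE from biallelic genotype counts, can be sketched as follows; the genotype counts in the example are hypothetical.

```python
import numpy as np
from scipy.stats import chisquare

def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test of Hardy-Weinberg equilibrium for biallelic
    genotype counts (AA, AB, BB).  Returns (statistic, p-value)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)                  # allele frequency of A
    expected = n * np.array([p**2, 2 * p * (1 - p), (1 - p)**2])
    observed = np.array([n_aa, n_ab, n_bb], dtype=float)
    # one parameter (allele frequency) estimated -> 3 - 1 - 1 = 1 degree of freedom
    return chisquare(observed, expected, ddof=1)

print(hwe_chisq(n_aa=89, n_ab=58, n_bb=13))          # hypothetical genotype counts
```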

577 citations


Journal ArticleDOI
TL;DR: A simulation study compared the effects of 13 different prior distributions for the scale parameter on simulated random-effects meta-analysis data; the frequentist properties of bias and coverage were investigated for the between-study variance and the effect size.
Abstract: There has been a recent growth in the use of Bayesian methods in medical research. The main reasons for this are the development of computer intensive simulation based methods such as Markov chain Monte Carlo (MCMC), increases in computing power and the introduction of powerful software such as WinBUGS. This has enabled increasingly complex models to be fitted. The ability to fit these complex models has led to MCMC methods being used as a convenient tool by frequentists, who may have no desire to be fully Bayesian. Often researchers want 'the data to dominate' when there is no prior information and thus attempt to use vague prior distributions. However, with small amounts of data the use of vague priors can be problematic. The results are potentially sensitive to the choice of prior distribution. In general there are fewer problems with location parameters. The main problem is with scale parameters. With scale parameters, not only does one have to decide the distributional form of the prior distribution, but also whether to put the prior distribution on the variance, standard deviation or precision. We have conducted a simulation study comparing the effects of 13 different prior distributions for the scale parameter on simulated random effects meta-analysis data. We varied the number of studies (5, 10 and 30) and compared three different between-study variances to give nine different simulation scenarios. One thousand data sets were generated for each scenario and each data set was analysed using the 13 different prior distributions. The frequentist properties of bias and coverage were investigated for the between-study variance and the effect size. The choice of prior distribution was crucial when there were just five studies. There was a large variation in the estimates of the between-study variance for the 13 different prior distributions. With a large number of studies the choice of prior distribution was less important. The effect size estimated was not biased, but the precision with which it was estimated varied with the choice of prior distribution leading to varying coverage intervals and, potentially, to different statistical inferences. Again there was less of a problem with a larger number of studies. There is a particular problem if the between-study variance is close to the boundary at zero, as MCMC results tend to produce upwardly biased estimates of the between-study variance, particularly if inferences are based on the posterior mean. The choice of 'vague' prior distribution can lead to a marked variation in results, particularly in small studies. Sensitivity to the choice of prior distribution should always be assessed.
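
A sketch of the data-generating side of such a simulation is given below, with a DerSimonian-Laird estimate shown only as a frequentist point of reference; the Bayesian fits under the 13 prior distributions would be run by MCMC (e.g. in WinBUGS) and are not reproduced, and the within-study variances and true effect are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_meta(k, tau2, mu=0.5, within_var=0.1):
    """One random-effects meta-analysis data set: k studies, between-study
    variance tau2, common effect mu, illustrative within-study variances."""
    s2 = rng.uniform(0.5, 1.5, size=k) * within_var
    theta = rng.normal(mu, np.sqrt(tau2), size=k)     # study-specific true effects
    y = rng.normal(theta, np.sqrt(s2))                # observed study effects
    return y, s2

def dersimonian_laird(y, s2):
    """Method-of-moments estimate of tau^2, used here as a frequentist reference."""
    w = 1.0 / s2
    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed) ** 2)
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    return max(0.0, (q - (len(y) - 1)) / c)

# the nine scenarios: 5, 10 and 30 studies crossed with three between-study variances
for k in (5, 10, 30):
    for tau2 in (0.01, 0.1, 1.0):                     # illustrative values
        est = [dersimonian_laird(*simulate_meta(k, tau2)) for _ in range(200)]
        print(f"k={k:2d} tau2={tau2:4.2f}  mean DL estimate={np.mean(est):.3f}")
```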

472 citations


Journal ArticleDOI
TL;DR: Geographically weighted Poisson regression (GWPR) and its semi-parametric variant are described as a new statistical tool for analysing disease maps arising from spatially non-stationary processes, providing disease analysts with an important new set of statistical tools.
Abstract: This paper describes geographically weighted Poisson regression (GWPR) and its semi-parametric variant as a new statistical tool for analysing disease maps arising from spatially non-stationary processes. The method is a type of conditional kernel regression which uses a spatial weighting function to estimate spatial variations in Poisson regression parameters. It enables us to draw surfaces of local parameter estimates which depict spatial variations in the relationships between disease rates and socio-economic characteristics. The method therefore can be used to test the general assumption made, often without question, in the global modelling of spatial data that the processes being modelled are stationary over space. Equally, it can be used to identify parts of the study region in which 'interesting' relationships might be occurring and where further investigation might be warranted. Such exceptions can easily be missed in traditional global modelling and therefore GWPR provides disease analysts with an important new set of statistical tools. We demonstrate the GWPR approach applied to a data set of working-age deaths in the Tokyo metropolitan area, Japan. The results indicate that there are significant spatial variations (that is, variation beyond that expected from random sampling) in the relationships between working-age mortality and occupational segregation and between working-age mortality and unemployment throughout the Tokyo metropolitan area and that, consequently, the application of traditional 'global' models would yield misleading results.
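
A simplified sketch of the local fitting step is shown below: a kernel-weighted Poisson regression solved by Newton-Raphson at each focal location, with a fixed Gaussian bandwidth. The data, bandwidth and use of expected counts as an offset are illustrative assumptions; the paper's adaptive kernels and semi-parametric (mixed global/local) variant are not shown.

```python
import numpy as np

def gwpr_at(focal, coords, X, y, offset, bandwidth, n_iter=25):
    """Geographically weighted Poisson regression at one focal location:
    a kernel-weighted Poisson fit by Newton-Raphson."""
    d = np.linalg.norm(coords - coords[focal], axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)           # geographic kernel weights
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        mu = np.exp(np.log(offset) + Xd @ beta)       # expected counts used as offset
        grad = Xd.T @ (w * (y - mu))
        hess = Xd.T @ (Xd * (w * mu)[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta                                       # local coefficient estimates

# hypothetical small disease-mapping example
rng = np.random.default_rng(4)
m = 150
coords = rng.uniform(0, 10, size=(m, 2))
deprivation = rng.normal(size=m)
expected = rng.uniform(5, 50, size=m)                 # age-adjusted expected deaths
true_beta = 0.1 + 0.05 * coords[:, 0]                 # spatially varying effect
deaths = rng.poisson(expected * np.exp(true_beta * deprivation))

local = np.array([gwpr_at(i, coords, deprivation[:, None], deaths, expected, 2.0)
                  for i in range(m)])
print(local[:3])   # rows: [local intercept, local deprivation effect]
```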

440 citations


Journal ArticleDOI
TL;DR: An adjusted Kaplan-Meier estimator (AKME) is developed to reduce confounding effects using inverse probability of treatment weighting (IPTW) and a weighted log-rank test is proposed for comparing group differences of survival functions.
Abstract: Estimation and group comparison of survival curves are two very common issues in survival analysis. In practice, the Kaplan-Meier estimates of survival functions may be biased due to unbalanced distribution of confounders. Here we develop an adjusted Kaplan-Meier estimator (AKME) to reduce confounding effects using inverse probability of treatment weighting (IPTW). Each observation is weighted by its inverse probability of being in a certain group. The AKME is shown to be a consistent estimate of the survival function, and the variance of the AKME is derived. A weighted log-rank test is proposed for comparing group differences of survival functions. Simulation studies are used to illustrate the performance of AKME and the weighted log-rank test. The method proposed here outperforms the Kaplan-Meier estimate, and it does better than or as well as other estimators based on stratification. The AKME and the weighted log-rank test are applied to two real examples: one is the study of times to reinfection of sexually transmitted diseases, and the other is the primary biliary cirrhosis (PBC) study.
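
The estimator can be sketched directly: weight each subject by its inverse probability of group membership from a propensity model, then let each subject contribute its weight to the risk sets and event counts of a Kaplan-Meier calculation. The simulated data below are hypothetical, and the weighted log-rank test and variance formula are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iptw_weights(X, group):
    """Inverse probability of treatment weights from a logistic propensity model."""
    ps = LogisticRegression(max_iter=1000).fit(X, group).predict_proba(X)[:, 1]
    return np.where(group == 1, 1.0 / ps, 1.0 / (1.0 - ps))

def weighted_km(time, event, weights):
    """Adjusted Kaplan-Meier estimator: each subject contributes its IPTW weight
    to the risk set and to the event count at each event time."""
    order = np.argsort(time)
    time, event, weights = time[order], event[order], weights[order]
    surv, s = [], 1.0
    for t in np.unique(time[event == 1]):
        at_risk = weights[time >= t].sum()
        d = weights[(time == t) & (event == 1)].sum()
        s *= 1.0 - d / at_risk
        surv.append((t, s))
    return np.array(surv)

# hypothetical confounded two-group survival data
rng = np.random.default_rng(5)
n = 600
x = rng.normal(size=(n, 2))
group = rng.binomial(1, 1 / (1 + np.exp(-(x[:, 0] - 0.5 * x[:, 1]))))
t_true = rng.exponential(np.exp(0.5 * x[:, 0] - 0.7 * group))
cens = rng.exponential(2.0, n)
time, event = np.minimum(t_true, cens), (t_true <= cens).astype(int)

w = iptw_weights(x, group)
for g in (0, 1):
    km = weighted_km(time[group == g], event[group == g], w[group == g])
    print(f"group {g}: adjusted S(t) at first 3 event times\n", km[:3])
```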

405 citations


Journal ArticleDOI
TL;DR: Joint modelling of baseline and outcome is the most efficient method; mean imputation is an excellent alternative subject to three conditions, one of which is that, if baselines are not missing completely at random, a dummy variable for missingness should be included as a covariate (the missing indicator method).
Abstract: Adjustment for baseline variables in a randomized trial can increase power to detect a treatment effect. However, when baseline data are partly missing, analysis of complete cases is inefficient. We consider various possible improvements in the case of normally distributed baseline and outcome variables. Joint modelling of baseline and outcome is the most efficient method. Mean imputation is an excellent alternative, subject to three conditions. Firstly, if baseline and outcome are correlated more than about 0.6 then weighting should be used to allow for the greater information from complete cases. Secondly, imputation should be carried out in a deterministic way, using other baseline variables if possible, but not using randomized arm or outcome. Thirdly, if baselines are not missing completely at random, then a dummy variable for missingness should be included as a covariate (the missing indicator method). The methods are illustrated in a randomized trial in community psychiatry.
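
A minimal sketch of the mean-imputation recommendation, assuming a single partly missing baseline: impute deterministically without using randomized arm or outcome, and add a missing indicator as a covariate. The weighting adjustment for strongly correlated baselines and the joint-modelling alternative are not shown, and the data below are simulated for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 400

# hypothetical trial: outcome regressed on arm and a partly missing baseline
baseline = rng.normal(50, 10, n)
arm = rng.binomial(1, 0.5, n)
outcome = 0.6 * baseline + 5 * arm + rng.normal(0, 8, n)
missing = rng.random(n) < 0.25                        # 25% of baselines missing
df = pd.DataFrame({"y": outcome, "arm": arm,
                   "base": np.where(missing, np.nan, baseline)})

# deterministic mean imputation: do NOT use the outcome or the randomized arm
df["base_imp"] = df["base"].fillna(df["base"].mean())
# missing indicator, in case baselines are not missing completely at random
df["base_miss"] = df["base"].isna().astype(int)

fit = smf.ols("y ~ arm + base_imp + base_miss", data=df).fit()
print(fit.params)
```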

322 citations


Journal ArticleDOI
TL;DR: The proposed index is used to evaluate the discrimination ability of a model, including covariates having time-dependent effects, concerning time to relapse in breast cancer patients treated with adjuvant tamoxifen.
Abstract: To derive models suitable for outcome prediction, a crucial aspect is the availability of appropriate measures of predictive accuracy, which have to be usable for a general class of models. Harrell's C discrimination index is an extension of the area under the ROC curve to the case of censored survival data, which has a straightforward interpretation. For a model including covariates with time-dependent effects and/or time-dependent covariates, the original definition of C would require the prediction of individual failure times, which is not generally addressed in most clinical applications. Here we propose a time-dependent discrimination index Ctd where the whole predicted survival function is utilized as outcome prediction, and the ability to discriminate among subjects having different outcomes is summarized over time. Ctd is based on a novel definition of concordance: a subject who developed the event should have a lower predicted probability of surviving beyond his/her survival time than any subject who survived longer. The predicted survival function of a subject who developed the event is compared to: (1) that of subjects who developed the event before his/her survival time, and (2) that of subjects who developed the event, or were censored, after his/her survival time. Subjects who were censored are involved in comparisons with subjects who developed the event before their observed times. The index reduces to the previous C in the presence of separation between survival curves on the whole follow-up. A confidence interval for Ctd is derived using the jackknife method on correlated one-sample U-statistics. The proposed index is used to evaluate the discrimination ability of a model, including covariates having time-dependent effects, concerning time to relapse in breast cancer patients treated with adjuvant tamoxifen. The model was obtained from 596 patients entered prospectively at Istituto Nazionale per lo Studio e la Cura dei Tumori di Milano (INT). The model discrimination ability was validated on an independent testing data set of 175 patients provided by Centro Regionale Indicatori Biochimici di Tumore (CRIBT) in Venice. Copyright © 2005 John Wiley & Sons, Ltd.
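
A simplified version of the concordance rule described above can be sketched as follows, taking predicted survival functions on a time grid as input; ties, the paper's treatment of censored subjects in both comparison directions and the jackknife confidence interval are ignored, and the toy data are invented.

```python
import numpy as np

def c_td(time, event, surv_pred, grid):
    """Simplified time-dependent concordance in the spirit of Ctd.

    surv_pred[i, :] is subject i's predicted survival function on `grid`.
    A pair (i, j), with an event for i at time t_i and a longer observed time for j,
    is concordant when i's predicted probability of surviving beyond t_i is lower
    than j's."""
    def s_at(i, t):
        return np.interp(t, grid, surv_pred[i])
    concordant = comparable = 0
    for i in np.where(event == 1)[0]:
        for j in range(len(time)):
            if time[j] > time[i]:
                comparable += 1
                if s_at(i, time[i]) < s_at(j, time[i]):
                    concordant += 1
    return concordant / comparable

# toy example with 4 subjects and a crude predicted survival matrix
grid = np.linspace(0, 10, 50)
risk = np.array([2.0, 1.0, 0.5, 0.2])                 # higher risk -> lower survival
surv_pred = np.exp(-np.outer(risk, grid) / 5)
time = np.array([1.5, 3.0, 6.0, 9.0])
event = np.array([1, 1, 0, 1])
print(c_td(time, event, surv_pred, grid))
```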

215 citations


Journal ArticleDOI
TL;DR: It is demonstrated that propensity scores developed using administrative data do not necessarily balance patient characteristics contained in clinical data, and measures of treatment effectiveness were attenuated when obtained using clinical data compared to when administrative data were used.
Abstract: There is an increasing interest in using administrative data to estimate the treatment effects of interventions. While administrative data are relatively inexpensive to obtain and provide population coverage, they are frequently characterized by lack of clinical detail, often leading to problematic confounding when they are used to conduct observational research. Propensity score methods are increasingly being used to address confounding in estimating the effects of interventions in such studies. Using data on patients discharged from hospital for whom both administrative data and detailed clinical data obtained from chart reviews were available, we examined the degree to which stratifying on the quintiles of propensity scores derived from administrative data was able to balance patient characteristics measured in clinical data. We also determined the extent to which measures of treatment effect obtained using propensity score methods were similar to those obtained using traditional regression methods. As a test case, we examined the treatment effects of ASA and beta-blockers following acute myocardial infarction. We demonstrated that propensity scores developed using administrative data do not necessarily balance patient characteristics contained in clinical data. Furthermore, measures of treatment effectiveness were attenuated when obtained using clinical data compared to when administrative data were used.
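
The balance check described above can be sketched as follows: estimate a propensity score from the administrative covariates only, stratify on its quintiles, and inspect standardized differences of a clinical covariate within strata. The simulated data and the use of logistic regression are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def standardized_difference(x, treated):
    """Standardized difference (in %) of a covariate between treatment groups."""
    m1, m0 = x[treated].mean(), x[~treated].mean()
    s = np.sqrt((x[treated].var(ddof=1) + x[~treated].var(ddof=1)) / 2)
    return 100 * (m1 - m0) / s

rng = np.random.default_rng(7)
n = 2000
admin = rng.normal(size=(n, 4))                       # administrative covariates
clinical = admin[:, 0] * 0.4 + rng.normal(size=n)     # clinical covariate, partly unrelated
treated = rng.random(n) < 1 / (1 + np.exp(-admin[:, 0]))

# propensity score from administrative data only, stratified into quintiles
ps = LogisticRegression(max_iter=1000).fit(admin, treated).predict_proba(admin)[:, 1]
quintile = pd.qcut(ps, 5, labels=False)

for q in range(5):
    idx = quintile == q
    print(f"quintile {q}: std. diff. of clinical covariate = "
          f"{standardized_difference(clinical[idx], treated[idx]):.1f}%")
```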

Journal ArticleDOI
TL;DR: This paper reviews adaptive treatment selection based on combination tests and proposes overall adjusted p-values and simultaneous confidence intervals; point estimation in adaptive trials is also considered.
Abstract: Integrating selection and confirmation phases into a single trial can expedite the development of new treatments and allows all accumulated data to be used in the decision process. In this paper we review adaptive treatment selection based on combination tests and propose overall adjusted p-values and simultaneous confidence intervals. Point estimation in adaptive trials is also considered. The methodology is illustrated in a detailed example based on an actual planned study.
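
One common way to combine stages, which the sketch below assumes, is the inverse-normal combination function together with a Bonferroni-based closed test for the selected treatment; the paper's proposals for simultaneous confidence intervals and point estimation are not reproduced, and all p-values shown are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def inverse_normal_combination(p1, p2, w1=1.0, w2=1.0):
    """Inverse-normal combination of stage-wise p-values with prespecified weights."""
    w1, w2 = w1 / np.hypot(w1, w2), w2 / np.hypot(w1, w2)
    z = w1 * norm.isf(p1) + w2 * norm.isf(p2)
    return norm.sf(z)

# hypothetical two-treatment selection: treatment B is carried forward to stage 2
stage1_p = {"A": 0.20, "B": 0.03}
selected = "B"
p2 = 0.02                                   # stage-2 p-value for the selected treatment

# closed test: reject H_B only if the intersection hypothesis and H_B itself are
# both rejected; the intersection uses a Bonferroni-adjusted stage-1 p-value
p_intersection = inverse_normal_combination(min(1.0, 2 * min(stage1_p.values())), p2)
p_individual = inverse_normal_combination(stage1_p[selected], p2)
print("overall adjusted p-value for B:", max(p_intersection, p_individual))
```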

Journal ArticleDOI
TL;DR: Using incremental net benefit and the theory of the expected value of information, and taking a societal perspective, it is shown how to determine the sample size that maximizes the difference between the cost of doing the trial and the value of the information gained from the results.
Abstract: Traditional sample size calculations for randomized clinical trials depend on somewhat arbitrarily chosen factors, such as type I and II errors. Type I error, the probability of rejecting the null hypothesis of no difference when it is true, is most often set to 0.05, regardless of the cost of such an error. In addition, the traditional use of 0.2 for the type II error means that the money and effort spent on the trial will be wasted 20 per cent of the time, even when the true treatment difference is equal to the smallest clinically important one and, again, will not reflect the cost of making such an error. An effectiveness trial (otherwise known as a pragmatic trial or management trial) is essentially an effort to inform decision-making, i.e. should treatment be adopted over standard? As such, a decision theoretic approach will lead to an optimal sample size determination. Using incremental net benefit and the theory of the expected value of information, and taking a societal perspective, it is shown how to determine the sample size that maximizes the difference between the cost of doing the trial and the value of the information gained from the results. The methods are illustrated using examples from oncology and obstetrics.
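
A stripped-down version of the value-of-information calculation might look as follows: under an assumed normal prior for the incremental net benefit and a normal trial estimate, compute the expected net gain of the trial for each candidate sample size and pick the maximum. Every numerical input is an illustrative assumption, and the formulas are a generic preposterior sketch rather than the authors' exact expressions.

```python
import numpy as np

rng = np.random.default_rng(8)

def expected_net_gain(n_per_arm, prior_mean, prior_sd, sigma, population,
                      fixed_cost, cost_per_patient, n_sims=100_000):
    """Monte Carlo expected net gain of a trial with n_per_arm patients per arm,
    under a normal prior for the incremental net benefit b and a normal sampling
    distribution for its trial estimate."""
    b = rng.normal(prior_mean, prior_sd, n_sims)            # true incremental net benefit
    se2 = 2 * sigma**2 / n_per_arm                          # variance of the trial estimate
    b_hat = rng.normal(b, np.sqrt(se2))
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se2)
    post_mean = post_var * (prior_mean / prior_sd**2 + b_hat / se2)
    payoff_with_trial = np.mean(b * (post_mean > 0))        # adopt iff posterior mean > 0
    payoff_without = max(prior_mean, 0.0)                   # current decision, no new data
    evsi = population * (payoff_with_trial - payoff_without)
    return evsi - (fixed_cost + cost_per_patient * 2 * n_per_arm)

sizes = np.arange(50, 2001, 50)
gains = [expected_net_gain(n, prior_mean=100, prior_sd=400, sigma=4000,
                           population=50_000, fixed_cost=500_000,
                           cost_per_patient=2_000) for n in sizes]
print("sample size per arm maximizing expected net gain:", sizes[int(np.argmax(gains))])
```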

Journal ArticleDOI
TL;DR: The conventional randomized clinical trial design is compared with a design that randomizes only patients predicted to preferentially benefit from the new treatment, in terms of the required number of randomized patients and the expected number of patients screened for randomization eligibility.
Abstract: The development of genomics-based technologies is demonstrating that many common diseases are heterogeneous collections of molecularly distinct entities. Molecularly targeted therapeutics are often effective only for some subsets of patients with a conventionally defined disease. We consider the problem of design of phase III randomized clinical trials for the evaluation of a molecularly targeted treatment when there is an assay predictive of which patients will be more responsive to the experimental treatment than to the control regimen. We compare the conventional randomized clinical trial design to a design based on randomizing only patients predicted to preferentially benefit from the new treatment. Trial designs are compared based on the required number of randomized patients and the expected number of patients screened for randomization eligibility. Relative efficiency depends upon the distribution of treatment effect across patient subsets, the prevalence of the subset of patients who respond preferentially to the experimental treatment, and assay performance.
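
The comparison can be sketched under strong simplifying assumptions (a perfectly sensitive and specific assay, a normally distributed outcome, and illustrative effect sizes): compute the randomized sample size for each design and the number screened for the targeted design.

```python
import numpy as np
from scipy.stats import norm

def n_per_arm(delta, sigma, alpha=0.05, power=0.9):
    """Standard two-arm normal sample size per arm for detecting mean difference delta."""
    z = norm.isf(alpha / 2) + norm.isf(1 - power)
    return 2 * (z * sigma / delta) ** 2

def compare_designs(delta_pos, delta_neg, prevalence, sigma=1.0):
    """Randomized and screened patients for a targeted design (randomize
    assay-positive patients only) versus a conventional design."""
    n_targeted = 2 * n_per_arm(delta_pos, sigma)
    screened_targeted = n_targeted / prevalence
    delta_overall = prevalence * delta_pos + (1 - prevalence) * delta_neg
    n_conventional = 2 * n_per_arm(delta_overall, sigma)
    return n_targeted, screened_targeted, n_conventional

for prev in (0.1, 0.25, 0.5):
    n_t, scr, n_c = compare_designs(delta_pos=0.5, delta_neg=0.0, prevalence=prev)
    print(f"prevalence {prev:.2f}: targeted randomizes {n_t:.0f} (screens {scr:.0f}), "
          f"conventional randomizes {n_c:.0f}")
```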

Journal ArticleDOI
TL;DR: The area under the curve (AUC) is commonly used as a summary measure of the receiver operating characteristic (ROC) curve which indicates the overall performance of a diagnostic test in terms of its accuracy at various diagnostic thresholds used to discriminate cases and non‐cases of disease.
Abstract: The area under the curve (AUC) is commonly used as a summary measure of the receiver operating characteristic (ROC) curve. It indicates the overall performance of a diagnostic test in terms of its accuracy at various diagnostic thresholds used to discriminate cases and non-cases of disease. The AUC measure is also used in meta-analyses, where each component study provides an estimate of the test sensitivity and specificity. These estimates are then combined to calculate a summary ROC (SROC) curve which describes the relationship between test sensitivity and specificity across studies. The partial AUC has been proposed as an alternative measure to the full AUC. When using the partial AUC, one considers only those regions of the ROC space where data have been observed, or which correspond to clinically relevant values of test sensitivity or specificity. In this paper, we extend the idea of using the partial AUC to SROC curves in meta-analysis. Theoretical and numerical results describe the variation in the partial AUC and its standard error as a function of the degree of inter-study heterogeneity and of the extent of truncation applied to the ROC space. A scaled partial area measure is also proposed to restore the property that the summary measure should range from 0 to 1. The results suggest several disadvantages of the partial AUC measures. In contrast to earlier findings with the full AUC, the partial AUC is rather sensitive to heterogeneity. Comparisons between tests are more difficult, especially if an empirical truncation process is used. Finally, the partial area lacks a useful symmetry property enjoyed by the full AUC. Although the partial AUC may sometimes have clinical appeal, on balance the use of the full AUC is preferred.
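
The partial-area idea can be illustrated on a Moses-Littenberg style SROC curve, used here purely as a convenient functional form: integrate the curve over a restricted false-positive range, and divide by the width of that range as one simple scaling back onto 0-1. The SROC parameters and truncation region below are invented.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def sroc_tpr(fpr, a, b):
    """Moses-Littenberg style SROC: D = a + b*S, with D = logit(TPR) - logit(FPR)
    and S = logit(TPR) + logit(FPR), solved for TPR as a function of FPR."""
    u = np.log(fpr / (1 - fpr))
    return expit((a + (1 + b) * u) / (1 - b))

def partial_auc(a, b, fpr_lo, fpr_hi, n_grid=2000):
    """Partial area under the SROC over a restricted false-positive range, and a
    simple scaled version (average sensitivity over the region, back on a 0-1 scale)."""
    fpr = np.linspace(fpr_lo, fpr_hi, n_grid)
    tpr = sroc_tpr(fpr, a, b)
    pauc = np.sum((tpr[1:] + tpr[:-1]) / 2 * np.diff(fpr))   # trapezoidal rule
    return pauc, pauc / (fpr_hi - fpr_lo)

# illustrative SROC parameters and a clinically relevant false-positive region
pauc, scaled = partial_auc(a=2.0, b=0.1, fpr_lo=0.01, fpr_hi=0.30)
print(f"partial AUC = {pauc:.3f}, scaled partial AUC = {scaled:.3f}")
```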

Journal ArticleDOI
TL;DR: An iterative procedure is described that determines a stopping boundary on the B-value and a final test critical Z-value with specified type I and II error probabilities and the implementation in conjunction with a group sequential analysis for effectiveness is also described.
Abstract: Conditional power (CP) is the probability that the final study result will be statistically significant, given the data observed thus far and a specific assumption about the pattern of the data to be observed in the remainder of the study, such as assuming the original design effect, or the effect estimated from the current data, or under the null hypothesis. In many clinical trials, a CP computation at a pre-specified point in the study, such as mid-way, is used as the basis for early termination for futility when there is little evidence of a beneficial effect. Brownian motion can be used to describe the distribution of the interim Z-test value, the corresponding B-value, and the CP values under a specific assumption about the future data. A stopping boundary on the CP value specifies an equivalent boundary on the B-value from which the probability of stopping for futility can then be computed based on the planned study design (sample size and duration) and the assumed true effect size. This yields expressions for the total type I and II error probabilities. As the probability of stopping increases, the probability of a type I error alpha decreases from the nominal desired level (e.g. 0.05) while the probability of a type II error beta increases from the level specified in the study design. Thus a stopping boundary on the B-value can be determined such that the inflation in type II error probability is controlled at a desired level. An iterative procedure is also described that determines a stopping boundary on the B-value and a final test critical Z-value with specified type I and II error probabilities. The implementation in conjunction with a group sequential analysis for effectiveness is also described.
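
Under the Brownian-motion formulation, conditional power and the corresponding futility boundary on the B-value have closed forms; the sketch below assumes a one-sided test and that the originally designed drift continues, and does not reproduce the paper's iterative recalibration of the final critical value.

```python
import numpy as np
from scipy.stats import norm

def conditional_power(b_value, t, drift, alpha=0.025):
    """Conditional power at information fraction t, given the current B-value
    B(t) = Z(t)*sqrt(t), under an assumed drift (the expected Z-value at the end)."""
    z_alpha = norm.isf(alpha)
    return norm.sf((z_alpha - b_value - drift * (1 - t)) / np.sqrt(1 - t))

def futility_boundary(t, drift, cp_threshold=0.10, alpha=0.025):
    """B-value below which conditional power falls under cp_threshold, i.e. the
    boundary at which stopping for futility would be considered."""
    z_alpha = norm.isf(alpha)
    return z_alpha - drift * (1 - t) - np.sqrt(1 - t) * norm.isf(cp_threshold)

t = 0.5                                     # interim look at half the information
drift = norm.isf(0.025) + norm.isf(0.10)    # design drift for 90% power
z_interim = 0.8
b = z_interim * np.sqrt(t)
print("conditional power:", conditional_power(b, t, drift))
print("futility boundary on the B-value:", futility_boundary(t, drift))
```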

Journal ArticleDOI
TL;DR: By adopting a fully Bayesian approach, as opposed to undertaking sensitivity analyses assuming fixed values for unknown parameters, the overall intervention effect can be estimated with greater uncertainty; assessing the sensitivity of results to the choice of prior distributions in such analyses is crucial.
Abstract: This paper considers the quantitative synthesis of published comparative study results when the outcome measures used in the individual studies and the way in which they are reported vary between studies. Whilst the former difficulty may be overcome, at least to a limited extent, by the use of standardized effects, the latter is often more problematic. Two potential solutions to this problem are: sensitivity analyses and a fully Bayesian approach, in which pertinent external information is included. Both approaches are illustrated using the results of two systematic reviews and meta-analyses which consider the difference in mean change in systolic blood pressure and the difference in physical functioning between an intervention and control group. The two examples illustrate that by adopting a fully Bayesian approach, as opposed to undertaking sensitivity analyses assuming fixed values for unknown parameters, the overall intervention effect can be estimated with greater uncertainty, but that assessing the sensitivity of results to the choice of prior distributions in such analyses is crucial.

Journal ArticleDOI
TL;DR: A derivation of overlap bias is given, its magnitude is explored, and its dependence on properties of the exposure series is considered; the conclusion is that the bias is usually small, though highly unpredictable, and easily avoided.
Abstract: The case-crossover design uses cases only, and compares exposures just prior to the event times to exposures at comparable control, or 'referent' times, in order to assess the effect of short-term exposure on the risk of a rare event. It has commonly been used to study the effect of air pollution on the risk of various adverse health events. Proper selection of referents is crucial, especially with air pollution exposures, which are shared, highly seasonal, and often have a long-term time trend. Hence, careful referent selection is important to control for time-varying confounders, and in order to ensure that the distribution of exposure is constant across referent times, a key assumption of this method. Yet the referent strategy is important for a more basic reason: the conditional logistic regression estimating equations commonly used are biased when referents are not chosen a priori and are functions of the observed event times. We call this bias in the estimating equations overlap bias. In this paper, we propose a new taxonomy of referent selection strategies in order to emphasize their statistical properties. We give a derivation of overlap bias, explore its magnitude, and consider how the bias depends on properties of the exposure series. We conclude that the bias is usually small, though highly unpredictable, and easily avoided.
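
One referent scheme that is fixed a priori, and therefore of the kind the paper recommends for avoiding overlap bias, is the time-stratified scheme sketched below: the referents are the other days in the same calendar month falling on the same day of the week as the event.

```python
import datetime as dt

def time_stratified_referents(event_date):
    """Referent days for a case-crossover analysis under a time-stratified scheme:
    all days in the same month and year as the event that fall on the same day of
    the week, the event day itself excluded."""
    day = event_date.replace(day=1)
    referents = []
    while day.month == event_date.month:
        if day.weekday() == event_date.weekday() and day != event_date:
            referents.append(day)
        day += dt.timedelta(days=1)
    return referents

print(time_stratified_referents(dt.date(2005, 7, 13)))
# -> [datetime.date(2005, 7, 6), datetime.date(2005, 7, 20), datetime.date(2005, 7, 27)]
```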

Journal ArticleDOI
TL;DR: A simple semiparametric model for fitting subject-specific curves for longitudinal data is presented and it is shown that the growth rate of girls in the study cannot be fully explained by the group-average curve and that individual curves are necessary to reflect the individual response to treatment.
Abstract: We present a simple semiparametric model for fitting subject-specific curves for longitudinal data. Individual curves are modelled as penalized splines with random coefficients. This model has a mixed model representation, and it is easily implemented in standard statistical software. We conduct an analysis of the long-term effect of radiation therapy on the height of children suffering from acute lymphoblastic leukaemia using penalized splines in the framework of semiparametric mixed effects models. The analysis revealed significant differences between therapies and showed that the growth rate of girls in the study cannot be fully explained by the group-average curve and that individual curves are necessary to reflect the individual response to treatment. We also show how to implement these models in S-PLUS and R in the appendix.
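
The core of a penalized spline fit can be sketched with a truncated-line basis and a ridge penalty on the knot coefficients; the paper instead exploits the mixed-model representation, with random knot coefficients and the smoothing parameter estimated by REML (e.g. via lme in S-PLUS/R), so the fixed smoothing parameter and toy data below are purely illustrative.

```python
import numpy as np

def penalized_spline_fit(x, y, n_knots=10, lam=1.0):
    """Penalized spline with a truncated-line basis and a ridge penalty applied
    only to the knot coefficients; lam is a fixed smoothing parameter."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    C = np.column_stack([np.ones_like(x), x] +
                        [np.clip(x - k, 0, None) for k in knots])
    D = np.diag([0.0, 0.0] + [1.0] * n_knots)        # penalize only the knot terms
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    return knots, coef

def predict(x_new, knots, coef):
    C = np.column_stack([np.ones_like(x_new), x_new] +
                        [np.clip(x_new - k, 0, None) for k in knots])
    return C @ coef

# toy longitudinal-style example: one child's height measurements over time
rng = np.random.default_rng(9)
age = np.sort(rng.uniform(2, 16, 60))
height = 80 + 6 * age - 0.1 * age**2 + rng.normal(0, 2, 60)
knots, coef = penalized_spline_fit(age, height, lam=5.0)
print(predict(np.array([5.0, 10.0, 15.0]), knots, coef))
```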

Journal ArticleDOI
TL;DR: Hierarchical Cox regression models are used to identify and explore the evidence for heterogeneity in meta-analysis and examine the relationship between covariates and censored failure time data in this context.
Abstract: Differences across studies in terms of design features and methodology, clinical procedures, and patient characteristics, are factors that can contribute to variability in the treatment effect between studies in a meta-analysis (statistical heterogeneity). Regression modelling can be used to examine relationships between treatment effect and covariates with the aim of explaining the variability in terms of clinical, methodological, or other factors. Such an investigation can be undertaken using aggregate data or individual patient data. An aggregate data approach can be problematic as sufficient data are rarely available and translating aggregate effects to individual patients can often be misleading. An individual patient data approach, although usually more resource demanding, allows a more thorough investigation of potential sources of heterogeneity and enables a fuller analysis of time to event outcomes in meta-analysis. Hierarchical Cox regression models are used to identify and explore the evidence for heterogeneity in meta-analysis and examine the relationship between covariates and censored failure time data in this context. Alternative formulations of the model are possible and illustrated using individual patient data from a meta-analysis of five randomized controlled trials which compare two drugs for the treatment of epilepsy. The models are further applied to simulated data examples in which the degree of heterogeneity and magnitude of treatment effect are varied. The behaviour of each model in each situation is explored and compared.

Journal ArticleDOI
TL;DR: It is shown that the original DBM and OR F statistics for testing the null hypothesis of equal treatments have the same form and will typically have similar values; however, differences in the denominator degrees of freedom will result in differences in p-values even when the F statistics are identical.
Abstract: There are several different statistical methods for analysing multireader ROC studies, with the Dorfman-Berbaum-Metz (DBM) method being the most frequently used. Another method is the corrected F method proposed by Obuchowski and Rockette (OR). The DBM and OR procedures at first appear quite different: DBM is a three-way ANOVA analysis of pseudovalues while OR is a two-way ANOVA analysis of accuracy estimates with correlated errors. We show that the original DBM and OR F statistics for testing the null hypothesis of equal treatments have the same form and will typically have similar values; however, differences in the denominator degrees of freedom will result in differences in p-values even when the F statistics are identical. We show how the methods can be generalized to include variations in the accuracy measure, covariance method, and degrees of freedom. Identical results are obtained when the methods agree with respect to all three of these procedure parameters; hence for a particular choice of procedure parameters the choice of method appears to depend mainly on software preference and availability. The methods are compared using data from a factorial study with two modalities, five readers, and 114 patients.

Journal ArticleDOI
TL;DR: There are techniques from the Classical approach that are closer, namely those based directly on the likelihood, and the comparison failed to include these, as this letter argues.
Abstract: In a recent Statistics in Medicine paper, Warn, Thompson and Spiegelhalter (WTS) made a comparison between the Bayesian approach to the meta-analysis of binary outcomes and a popular Classical approach that uses summary (two-stage) techniques. They included approximate summary (two-stage) Bayesian techniques in their comparisons in an attempt undoubtedly to make the comparison less unfair. But, as this letter will argue, there are techniques from the Classical approach that are closer, namely those based directly on the likelihood, and they failed to make comparisons with these. Here the differences between Bayesian and Classical approaches in meta-analysis applications reside solely in how the likelihood functions are converted into either credibility intervals or confidence intervals. Both summarize, contrast and combine data using likelihood functions. Conflating what Bayes actually offers to meta-analysts (a means of converting likelihood functions to credibility intervals) with the use of likelihood functions themselves to summarize, contrast and combine studies is at best misleading.

Journal ArticleDOI
TL;DR: This note indicates that, under an independent censoring assumption, the two population coefficients coincide and points out that a sample-based coefficient in common use in the SAS statistical package can be interpreted as an estimate of explained randomness when there is no censoring.
Abstract: A coefficient of explained randomness, analogous to explained variation but for non-linear models, was presented by Kent. The construct hinges upon the notion of Kullback-Leibler information gain. Kent and O'Quigley developed these ideas, obtaining simple, multiple and partial coefficients for the situation of proportional hazards regression. Their approach was based upon the idea of transforming a general proportional hazards model to a specific one of Weibull form. Xu and O'Quigley developed a more direct approach, more in harmony with the semi-parametric nature of the proportional hazards model thereby simplifying inference and allowing, for instance, the use of time dependent covariates. A potential drawback to the coefficient of Xu and O'Quigley is its interpretation as explained randomness in the covariate given time. An investigator might feel that the interpretation of the Kent and O'Quigley coefficient, as a proportion of explained randomness of time given the covariate, is preferable. One purpose of this note is to indicate that, under an independent censoring assumption, the two population coefficients coincide. Thus the simpler inferential setting for Xu and O'Quigley can also be applied to the coefficient of Kent and O'Quigley. Our second purpose is to point out that a sample-based coefficient in common use in the SAS statistical package can be interpreted as an estimate of explained randomness when there is no censoring. When there is censoring the SAS coefficient would not seem satisfactory in that its population counterpart depends on an independent censoring mechanism. However there is a quick fix and we argue in favour of its use.

Journal ArticleDOI
TL;DR: This paper compares two alternative methods of handling multivariate incomplete data across a baseline assessment and three follow-up time points in a multi-centre randomized controlled trial of a disease management programme for late-life depression.
Abstract: It is common for longitudinal clinical trials to face problems of item non-response, unit non-response, and drop-out. In this paper, we compare two alternative methods of handling multivariate incomplete data across a baseline assessment and three follow-up time points in a multi-centre randomized controlled trial of a disease management programme for late-life depression. One approach combines hot-deck (HD) multiple imputation using a predictive mean matching method for item non-response and the approximate Bayesian bootstrap for unit non-response. A second method is based on a multivariate normal (MVN) model using PROC MI in SAS software V8.2. These two methods are contrasted with a last observation carried forward (LOCF) technique and available-case (AC) analysis in a simulation study where replicate analyses are performed on subsets of the originally complete cases. Missing-data patterns were simulated to be consistent with missing-data patterns found in the originally incomplete cases, and observed complete data means were taken to be the targets of estimation. Not surprisingly, the LOCF and AC methods had poor coverage properties for many of the variables evaluated. Multiple imputation under the MVN model performed well for most variables but produced less than nominal coverage for variables with highly skewed distributions. The HD method consistently produced close to nominal coverage, with interval widths that were roughly 7 per cent larger on average than those produced from the MVN model.
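
The predictive-mean-matching component of the hot-deck approach can be sketched as follows for a single imputation of one variable; the multiple-imputation wrapper and the approximate Bayesian bootstrap for unit non-response are not shown, and the choice of five donors is an arbitrary illustrative setting.

```python
import numpy as np

def pmm_impute(y, X, rng, n_donors=5):
    """Hot-deck imputation by predictive mean matching: regress y on X using the
    observed cases, then replace each missing y with the observed value of a donor
    whose predicted mean is among the closest."""
    obs = ~np.isnan(y)
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd[obs], y[obs], rcond=None)[0]
    pred = Xd @ beta
    y_imp = y.copy()
    donors = np.where(obs)[0]
    for i in np.where(~obs)[0]:
        closest = donors[np.argsort(np.abs(pred[donors] - pred[i]))[:n_donors]]
        y_imp[i] = y[rng.choice(closest)]            # draw one of the nearest donors
    return y_imp

rng = np.random.default_rng(10)
n = 300
x = rng.normal(size=(n, 2))
y = 2 + x @ np.array([1.0, -0.5]) + rng.normal(0, 1, n)
y[rng.random(n) < 0.2] = np.nan                      # 20% item non-response
print(np.isnan(pmm_impute(y, x, rng)).sum(), "missing values remain")
```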

Journal ArticleDOI
Lili Tian
TL;DR: This paper considers the problem of making inference about the common population coefficient of variation when it is a priori suspected that several independent samples are from populations with a common coefficients of variation.
Abstract: The coefficient of variation is often used as a measure of precision and reproducibility of data in medical and biological science. This paper considers the problem of making inference about the common population coefficient of variation when it is a priori suspected that several independent samples are from populations with a common coefficient of variation. The procedures for confidence interval estimation and hypothesis testing are developed based on the concepts of generalized variables. The coverage properties of the proposed confidence intervals and type-I errors of the proposed tests are evaluated by simulation. The proposed methods are illustrated by a real life example.
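
The one-sample building block of the generalized-variable approach can be sketched as follows: simulate generalized pivotal quantities for sigma and mu and take percentiles of their ratio. How Tian weights these pivots across samples to handle the common coefficient of variation is not reproduced, and the summary statistics below are invented.

```python
import numpy as np

def generalized_ci_cv(xbar, s, n, level=0.95, n_draws=100_000, rng=None):
    """Generalized-variable confidence interval for the coefficient of variation
    of a single normal sample, via Monte Carlo on the generalized pivots."""
    if rng is None:
        rng = np.random.default_rng(11)
    u = rng.chisquare(n - 1, n_draws)                # pivot for sigma^2
    z = rng.standard_normal(n_draws)                 # pivot for mu
    r_sigma = s * np.sqrt((n - 1) / u)
    r_mu = xbar - z * r_sigma / np.sqrt(n)
    r_cv = r_sigma / r_mu
    lo, hi = np.quantile(r_cv, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

# illustrative sample summary statistics
print(generalized_ci_cv(xbar=10.0, s=2.0, n=25))
```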

Journal ArticleDOI
TL;DR: Although within-study selection was evident or suspected in several trials, the impact on the conclusions of the meta-analyses was minimal and sensitivity analysis was undertaken to assess the robustness of the conclusion to this bias.
Abstract: The systematic review community has become increasingly aware of the importance of addressing the issues of heterogeneity and publication bias in meta-analyses. A potentially bigger threat to the validity of a meta-analysis has gone relatively unnoticed. The within-study selective reporting of outcomes, defined as the selection of a subset of the original variables recorded for inclusion in publication of trials, can theoretically have a substantial impact on the results. A cohort of meta-analyses on the Cochrane Library was reviewed to examine how often this form of within-study publication bias was suspected and whether it explained some of the evident funnel plot asymmetry. In cases where the level of suspicion was high, sensitivity analysis was undertaken to assess the robustness of the conclusions to this bias. Although within-study selection was evident or suspected in several trials, the impact on the conclusions of the meta-analyses was minimal. This paper deals with the identification of, sensitivity analysis for, and impact of within-study selective reporting in meta-analysis.

Journal ArticleDOI
TL;DR: It is shown through simulations that a candidate joint prior for (ρ0,γ) with negative a priori correlation structure results in a safer trial than the one that assumes independent priors for these two parameters while keeping the efficiency of the estimate of the MTD essentially unchanged.
Abstract: We examine a large class of prior distributions to model the dose–response relationship in cancer phase I clinical trials. We parameterize the dose–toxicity model in terms of the maximum tolerated dose (MTD) γ and the probability of dose limiting toxicity (DLT) at the initial dose, ρ0. The MTD is estimated using the EWOC (escalation with overdose control) method of Babb et al. We show through simulations that a candidate joint prior for (ρ0, γ) with negative a priori correlation structure results in a safer trial than one that assumes independent priors for these two parameters, while keeping the efficiency of the estimate of the MTD essentially unchanged.
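
One generic way to obtain a negatively correlated joint prior, used here only to illustrate the idea and not the specific prior examined in the paper, is a Gaussian copula linking uniform marginals for ρ0 and γ.

```python
import numpy as np
from scipy.stats import norm

def sample_joint_prior(n_draws, rho=-0.5, dose_min=10.0, dose_max=100.0, rng=None):
    """Draw (rho0, gamma) from a joint prior with negative correlation via a
    Gaussian copula: rho0 (probability of DLT at the starting dose) uniform on
    (0, 0.3) and gamma (the MTD) uniform on the dose range.  All marginals and
    the correlation value are illustrative assumptions."""
    if rng is None:
        rng = np.random.default_rng(12)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n_draws)
    u = norm.cdf(z)                                   # correlated uniforms
    rho0 = 0.3 * u[:, 0]
    gamma = dose_min + (dose_max - dose_min) * u[:, 1]
    return rho0, gamma

rho0, gamma = sample_joint_prior(5000)
print("empirical correlation:", np.corrcoef(rho0, gamma)[0, 1])
```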


Journal ArticleDOI
TL;DR: For binary matched-pairs data, simple improvements of the commonly used Wald confidence intervals for the difference of probabilities and for an odds ratio comparing 'success' probabilities are presented.
Abstract: For binary matched-pairs data, this article discusses interval estimation of the difference of probabilities and an odds ratio for comparing 'success' probabilities. We present simple improvements of the commonly used Wald confidence intervals for these parameters. The improvement of the interval for the difference of probabilities is to add two observations to each sample before applying it. The improvement for estimating an odds ratio transforms a confidence interval for a single proportion.
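
A sketch of the adjusted Wald interval for the difference of correlated proportions is given below, assuming the adjustment amounts to adding 0.5 to each cell of the paired 2x2 table (two observations in total); the article's exact recommendation, and its separate odds-ratio interval, should be consulted before use.

```python
import numpy as np
from scipy.stats import norm

def adjusted_wald_diff(n11, n12, n21, n22, level=0.95, add=0.5):
    """Wald confidence interval for the difference of correlated proportions from a
    paired 2x2 table, after adding `add` to every cell (two extra observations in
    total under the default)."""
    a, b, c, d = (x + add for x in (n11, n12, n21, n22))
    n = a + b + c + d
    diff = (b - c) / n                                # difference of marginal proportions
    se = np.sqrt((b + c) - (b - c) ** 2 / n) / n
    z = norm.isf((1 - level) / 2)
    return diff - z * se, diff + z * se

# hypothetical paired 2x2 counts (n11, n12, n21, n22)
print(adjusted_wald_diff(n11=53, n12=16, n21=8, n22=23))
```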

Journal ArticleDOI
TL;DR: Public health officials continue to develop and implement new types of ongoing surveillance systems in an attempt to detect aberrations in surveillance data as early as possible, but many of the new surveillance systems have limited historical data from which to calculate an expected baseline value.
Abstract: Public health officials continue to develop and implement new types of ongoing surveillance systems in an attempt to detect aberrations in surveillance data as early as possible. In public health surveillance, aberrations are traditionally defined as an observed value being greater than an expected historical value for that same time period. To account for seasonality, traditional aberration detection methods use three or more years of baseline data across the same time period to calculate the expected historical value. Due to the recent implementation of short-term bioterrorism surveillance systems, many of the new surveillance systems have limited historical data from which to calculate an expected baseline value. Three limited baseline aberration detection methods, C1-MILD, C2-MEDIUM, and C3-ULTRA, were developed based on a one-sided positive CUSUM (cumulative sum) calculation, a commonly used quality control method used in the manufacturing industry. To evaluate the strengths and weakness of these methods, data were simulated to represent syndromic data collected through the recently developed hospital-based enhanced syndromic surveillance systems. The three methods were applied to the simulated data and estimates of sensitivity, specificity, and false-positive rates for the three methods were obtained. For the six syndromes, sensitivity for the C1-MILD, C2-MEDIUM, and C3-ULTRA models averaged 48.2, 51.3, and 53.7 per cent, respectively. Similarly, the specificities averaged 97.7, 97.8, and 96.1 per cent, respectively. The average false-positive rates for the three models were 31.8, 29.2, and 41.5 per cent, respectively. The results highlight the value and importance of developing and testing new aberration detection methods for public health surveillance data with limited baseline information. Copyright © 2005 John Wiley & Sons, Ltd.
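
A generic one-sided positive CUSUM with a short moving baseline, of the kind the abstract describes, can be sketched as follows; the exact C1-MILD, C2-MEDIUM and C3-ULTRA definitions (for example, the guard band between the baseline window and the test day) differ in detail and are not reproduced here.

```python
import numpy as np

def cusum_alerts(counts, baseline_len=7, k=0.5, h=2.0):
    """Generic one-sided positive CUSUM for daily syndromic counts, with the
    expected value and standard deviation taken from a moving baseline of the
    previous `baseline_len` days; k is the reference value and h the alarm limit."""
    counts = np.asarray(counts, dtype=float)
    s, alerts = 0.0, []
    for t in range(baseline_len, len(counts)):
        base = counts[t - baseline_len:t]
        mu, sd = base.mean(), max(base.std(ddof=1), 1e-6)
        s = max(0.0, s + (counts[t] - mu) / sd - k)   # standardized CUSUM increment
        alerts.append(s > h)
        if s > h:
            s = 0.0                                   # reset after signalling (one design choice)
    return alerts

rng = np.random.default_rng(13)
daily = rng.poisson(20, 60).astype(float)
daily[45:50] += 15                                    # injected outbreak
flags = cusum_alerts(daily)
print("alert days:", [i + 7 for i, a in enumerate(flags) if a])
```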