
Showing papers in "Statistics in Medicine in 2008"


Journal ArticleDOI
TL;DR: Two new measures, one based on integrated sensitivity and specificity and the other on reclassification tables, are introduced that offer incremental information over the AUC, and it is proposed that they be considered in addition to the AUC when assessing the performance of newer biomarkers.
Abstract: Identification of key factors associated with the risk of developing cardiovascular disease and quantification of this risk using multivariable prediction algorithms are among the major advances made in preventive cardiology and cardiovascular epidemiology in the 20th century. The ongoing discovery of new risk markers by scientists presents opportunities and challenges for statisticians and clinicians to evaluate these biomarkers and to develop new risk formulations that incorporate them. One of the key questions is how best to assess and quantify the improvement in risk prediction offered by these new models. Demonstration of a statistically significant association of a new biomarker with cardiovascular risk is not enough. Some researchers have advanced that the improvement in the area under the receiver-operating-characteristic curve (AUC) should be the main criterion, whereas others argue that better measures of performance of prediction models are needed. In this paper, we address this question by introducing two new measures, one based on integrated sensitivity and specificity and the other on reclassification tables. These new measures offer incremental information over the AUC. We discuss the properties of these new measures and contrast them with the AUC. We also develop simple asymptotic tests of significance. We illustrate the use of these measures with an example from the Framingham Heart Study. We propose that scientists consider these types of measures in addition to the AUC when assessing the performance of newer biomarkers.
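
As a companion to the abstract above, here is a minimal numpy sketch of the two measures as they are usually defined (the integrated discrimination improvement and the category-based net reclassification improvement), assuming arrays p_old and p_new of predicted risks from the old and new models and a binary outcome y; the function names and the 6/20 per cent risk cutpoints are illustrative, not taken from the paper.

    import numpy as np

    def idi(p_old, p_new, y):
        """Integrated discrimination improvement: change in mean predicted risk
        among events minus the change among non-events."""
        p_old, p_new = np.asarray(p_old, float), np.asarray(p_new, float)
        y = np.asarray(y, bool)
        return ((p_new[y].mean() - p_old[y].mean())
                - (p_new[~y].mean() - p_old[~y].mean()))

    def categorical_nri(p_old, p_new, y, cuts=(0.06, 0.20)):
        """Net reclassification improvement over risk categories defined by `cuts`
        (cutpoints here are illustrative only)."""
        p_old, p_new = np.asarray(p_old, float), np.asarray(p_new, float)
        y = np.asarray(y, bool)
        old_cat = np.digitize(p_old, cuts)
        new_cat = np.digitize(p_new, cuts)
        up, down = new_cat > old_cat, new_cat < old_cat
        nri_events = up[y].mean() - down[y].mean()
        nri_nonevents = down[~y].mean() - up[~y].mean()
        return nri_events + nri_nonevents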

5,651 citations


Journal ArticleDOI
TL;DR: The use of germline genetic variants that proxy for environmentally modifiable exposures as instruments for these exposures is one form of IV analysis that can be implemented within observational epidemiological studies and can be considered as analogous to randomized controlled trials.
Abstract: Observational epidemiological studies suffer from many potential biases, from confounding and from reverse causation, and this limits their ability to robustly identify causal associations. Several high-profile situations exist in which randomized controlled trials of precisely the same intervention that has been examined in observational studies have produced markedly different findings. In other observational sciences, the use of instrumental variable (IV) approaches has been one approach to strengthening causal inferences in non-experimental situations. The use of germline genetic variants that proxy for environmentally modifiable exposures as instruments for these exposures is one form of IV analysis that can be implemented within observational epidemiological studies. The method has been referred to as 'Mendelian randomization', and can be considered as analogous to randomized controlled trials. This paper outlines Mendelian randomization, draws parallels with IV methods, provides examples of implementation of the approach and discusses limitations of the approach and some methods for dealing with these.
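
For readers unfamiliar with the mechanics, a minimal sketch of the ratio (Wald) instrumental-variable estimator with a single genetic variant as the instrument; the function name and the variable names (g, x, y for genotype, exposure, outcome) are assumptions for illustration, and standard errors and multi-variant methods are not shown.

    import numpy as np

    def iv_ratio_estimate(g, x, y):
        """Ratio (Wald) instrumental-variable estimate of the effect of x on y
        using a single instrument g: beta_IV = cov(g, y) / cov(g, x),
        i.e. the g-y regression slope divided by the g-x regression slope."""
        g, x, y = map(lambda v: np.asarray(v, float), (g, x, y))
        gz = g - g.mean()
        return np.dot(gz, y - y.mean()) / np.dot(gz, x - x.mean())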

2,364 citations


Journal ArticleDOI
TL;DR: This work recommends rescaling as a default option--an improvement upon the usual approach of including variables in whatever way they are coded in the data file--so that the magnitudes of coefficients can be directly compared as a matter of routine statistical practice.
Abstract: Interpretation of regression coefficients is sensitive to the scale of the inputs. One method often used to place input variables on a common scale is to divide each numeric variable by its standard deviation. Here we propose dividing each numeric variable by two times its standard deviation, so that the generic comparison is with inputs equal to the mean ±1 standard deviation. The resulting coefficients are then directly comparable for untransformed binary predictors. We have implemented the procedure as a function in R. We illustrate the method with two simple analyses that are typical of applied modeling: a linear regression of data from the National Election Study and a multilevel logistic regression of data on the prevalence of rodents in New York City apartments. We recommend our rescaling as a default option—an improvement upon the usual approach of including variables in whatever way they are coded in the data file—so that the magnitudes of coefficients can be directly compared as a matter of routine statistical practice. Copyright © 2007 John Wiley & Sons, Ltd.
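
The paper's own implementation is an R function; the following is a rough Python/pandas analogue of the rescaling rule (centre each numeric input and divide by two standard deviations, leaving 0/1 predictors untouched). The function name and the binary-detection rule are illustrative assumptions.

    import pandas as pd

    def rescale_by_two_sd(df):
        """Centre each numeric input and divide by twice its standard deviation,
        so coefficients are roughly comparable with those of untransformed
        binary (0/1) predictors, which are left on their original scale."""
        out = df.copy()
        for col in out.columns:
            x = out[col].astype(float)
            if set(x.dropna().unique()) <= {0.0, 1.0}:
                continue  # leave binary predictors as they are
            out[col] = (x - x.mean()) / (2.0 * x.std())
        return out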

1,894 citations



Journal ArticleDOI
TL;DR: Recommendations are provided for the analysis and reporting of studies that employ propensity-score matching; common errors included using the log-rank test to compare Kaplan-Meier survival curves in the matched sample and failing to account for the matched nature of the data.
Abstract: Propensity-score methods are increasingly being used to reduce the impact of treatment-selection bias in the estimation of treatment effects using observational data. Commonly used propensity-score methods include covariate adjustment using the propensity score, stratification on the propensity score, and propensity-score matching. Empirical and theoretical research has demonstrated that matching on the propensity score eliminates a greater proportion of baseline differences between treated and untreated subjects than does stratification on the propensity score. However, the analysis of propensity-score-matched samples requires statistical methods appropriate for matched-pairs data. We critically evaluated 47 articles that were published between 1996 and 2003 in the medical literature and that employed propensity-score matching. We found that only two of the articles reported the balance of baseline characteristics between treated and untreated subjects in the matched sample and used correct statistical methods to assess the degree of imbalance. Thirteen (28 per cent) of the articles explicitly used statistical methods appropriate for the analysis of matched data when estimating the treatment effect and its statistical significance. Common errors included using the log-rank test to compare Kaplan-Meier survival curves in the matched sample, using Cox regression, logistic regression, chi-squared tests, t-tests, and Wilcoxon rank sum tests in the matched sample, thereby failing to account for the matched nature of the data. We provide guidelines for the analysis and reporting of studies that employ propensity-score matching.
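
As one example of a matched-pairs method of the kind the abstract calls for, a small sketch of McNemar's test for a binary outcome after 1:1 propensity-score matching; it assumes the two outcome arrays are aligned so that element i of each comes from the i-th matched pair, and that at least one discordant pair exists.

    import numpy as np
    from scipy.stats import chi2

    def mcnemar_matched(y_treated, y_control):
        """McNemar test for paired binary outcomes from 1:1 matched data.
        y_treated[i] and y_control[i] are the outcomes of the i-th matched pair."""
        y_treated = np.asarray(y_treated, int)
        y_control = np.asarray(y_control, int)
        b = np.sum((y_treated == 1) & (y_control == 0))  # discordant: event in treated only
        c = np.sum((y_treated == 0) & (y_control == 1))  # discordant: event in control only
        stat = (b - c) ** 2 / (b + c)
        return stat, chi2.sf(stat, df=1)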

1,134 citations


Journal ArticleDOI
TL;DR: A method for estimating alternative comparisons based on the ideas originally put forward by Greenland and Longnecker is described and implementations of the method are provided, developed using Microsoft Excel and SAS.
Abstract: Epidemiological studies relating a particular exposure to a specified disease may present their results in a variety of ways. Often, results are presented as estimated odds ratios (or relative risks) and confidence intervals (CIs) for a number of categories of exposure, for example, by duration or level of exposure, compared with a single reference category, often the unexposed. For systematic literature review, and particularly meta-analysis, estimates for an alternative comparison of the categories, such as any exposure versus none, may be required. Obtaining these alternative comparisons is not straightforward, as the initial set of estimates is correlated. This paper describes a method for estimating these alternative comparisons based on the ideas originally put forward by Greenland and Longnecker, and provides implementations of the method, developed using Microsoft Excel and SAS. Examples of the method based on studies of smoking and cancer are given. The method also deals with results given by categories of disease (such as histological types of a cancer). The method allows the use of a more consistent comparison when summarizing published evidence, thus potentially improving the reliability of a meta-analysis.

540 citations


Journal ArticleDOI
TL;DR: Highlights of recent developments in meta‐analysis in medical research are reviewed, outlining in particular how emphasis has been placed on heterogeneity and random‐effects analyses and extension of ideas to complex evidence synthesis.
Abstract: The art and science of meta-analysis, the combination of results from multiple independent studies, is now more than a century old. In the last 30 years, however, as the need for medical research and clinical practice to be based on the totality of relevant and sound evidence has been increasingly recognized, the impact of meta-analysis has grown enormously. In this paper, we review highlights of recent developments in meta-analysis in medical research. We outline in particular how emphasis has been placed on (i) heterogeneity and random-effects analyses; (ii) special consideration in different areas of application; (iii) assessing bias within and across studies; and (iv) extension of ideas to complex evidence synthesis. We conclude the paper with some remarks on ongoing challenges and possible directions for the future.
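
Since the review highlights heterogeneity and random-effects analyses, here is a minimal sketch of the standard DerSimonian-Laird random-effects pooling (a generic method, not something introduced by this paper), given per-study effect estimates and variances.

    import numpy as np

    def dersimonian_laird(effects, variances):
        """Random-effects pooled estimate with the DerSimonian-Laird tau^2."""
        y = np.asarray(effects, float)
        v = np.asarray(variances, float)
        w = 1.0 / v
        fixed = np.sum(w * y) / np.sum(w)
        q = np.sum(w * (y - fixed) ** 2)                      # Cochran's Q
        tau2 = max(0.0, (q - (len(y) - 1)) /
                   (np.sum(w) - np.sum(w ** 2) / np.sum(w)))  # method-of-moments tau^2
        w_re = 1.0 / (v + tau2)
        pooled = np.sum(w_re * y) / np.sum(w_re)
        se = np.sqrt(1.0 / np.sum(w_re))
        return pooled, se, tau2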

538 citations


Journal ArticleDOI
TL;DR: This work defines and describes how to compute a model R2 statistic for the linear mixed model by using only a single model; in the illustrative example, the statistic indicates that ethnicity has an almost negligible association with the repeated BP outcomes for the study.
Abstract: Statisticians most often use the linear mixed model to analyze Gaussian longitudinal data. The value and familiarity of the R2 statistic in the linear univariate model naturally creates great interest in extending it to the linear mixed model. We define and describe how to compute a model R2 statistic for the linear mixed model by using only a single model. The proposed R2 statistic measures multivariate association between the repeated outcomes and the fixed effects in the linear mixed model. The R2 statistic arises as a 1–1 function of an appropriate F statistic for testing all fixed effects (except typically the intercept) in a full model. The statistic compares the full model with a null model with all fixed effects deleted (except typically the intercept) while retaining exactly the same covariance structure. Furthermore, the R2 statistic leads immediately to a natural definition of a partial R2 statistic. A mixed model in which ethnicity gives a very small p-value as a longitudinal predictor of blood pressure (BP) compellingly illustrates the value of the statistic. In sharp contrast to the extreme p-value, a very small R2, a measure of statistical and scientific importance, indicates that ethnicity has an almost negligible association with the repeated BP outcomes for the study. Copyright © 2008 John Wiley & Sons, Ltd.
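
A small sketch of the arithmetic implied by the abstract: the model R2 expressed as a one-to-one function of the F statistic for all fixed effects, with numerator and denominator degrees of freedom supplied by the mixed-model fit. Whether this algebraic form matches the paper's definition exactly should be checked against the source; it is offered only as an illustration.

    def r2_from_f(f_value, df_num, df_den):
        """Model R^2 implied by the F statistic for all fixed effects
        (numerator df = df_num, denominator df = df_den):
        R^2 = df_num * F / (df_num * F + df_den)."""
        return df_num * f_value / (df_num * f_value + df_den)

    # e.g. an F test of 3 fixed effects with 3 and 120 degrees of freedom
    print(r2_from_f(2.5, 3, 120))  # about 0.059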

498 citations


Journal ArticleDOI
TL;DR: Most methods improve on the naïve complete‐case analysis for variable selection, but importantly the type 1 error is only preserved if selection is based on RR, which is the recommended approach.
Abstract: Multiple imputation is a popular technique for analysing incomplete data. Given the imputed data and a particular model, Rubin's rules (RR) for estimating parameters and standard errors are well established. However, there are currently no guidelines for variable selection in multiply imputed data sets. The usual practice is to perform variable selection amongst the complete cases, a simple but inefficient and potentially biased procedure. Alternatively, variable selection can be performed by repeated use of RR, which is more computationally demanding. An approximation can be obtained by a simple 'stacked' method that combines the multiply imputed data sets into one and uses a weighting scheme to account for the fraction of missing data in each covariate. We compare these and other approaches using simulations based around a trial in community psychiatry. Most methods improve on the naive complete-case analysis for variable selection, but importantly the type 1 error is only preserved if selection is based on RR, which is our recommended approach.
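
For the recommended RR-based selection, each candidate variable's estimate must be pooled across the m imputed data sets before testing; below is a minimal sketch of that pooling step (Rubin's rules) for a scalar coefficient, with a simple Wald statistic and the small-sample degrees-of-freedom adjustment omitted.

    import numpy as np

    def rubins_rules(estimates, variances):
        """Pool a scalar coefficient across m imputed data sets (Rubin's rules).
        Returns the pooled estimate, its total variance, and a Wald z statistic."""
        q = np.asarray(estimates, float)   # per-imputation estimates
        u = np.asarray(variances, float)   # per-imputation squared standard errors
        m = len(q)
        qbar = q.mean()                    # pooled estimate
        ubar = u.mean()                    # within-imputation variance
        b = q.var(ddof=1)                  # between-imputation variance
        t = ubar + (1 + 1 / m) * b         # total variance
        return qbar, t, qbar / np.sqrt(t)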

372 citations


Journal ArticleDOI
TL;DR: The size of one of the new tests is comparable to those of the best existing tests, including those recently published, and among such tests it has slightly greater power, especially when the effect size is small and heterogeneity is present.
Abstract: In meta-analyses, it sometimes happens that smaller trials show different, often larger, treatment effects. One possible reason for such 'small study effects' is publication bias. This is said to occur when the chance of a smaller study being published is increased if it shows a stronger effect. Assuming no other small study effects, under the null hypothesis of no publication bias, there should be no association between effect size and effect precision (e.g. inverse standard error) among the trials in a meta-analysis. A number of tests for small study effects/publication bias have been developed. These use either a non-parametric test or a regression test for association between effect size and precision. However, when the outcome is binary, the effect is summarized by the log-risk ratio or log-odds ratio (log OR). Unfortunately, these measures are not independent of their estimated standard error. Consequently, established tests reject the null hypothesis too frequently. We propose new tests based on the arcsine transformation, which stabilizes the variance of binomial random variables. We report results of a simulation study under the Copas model (on the log OR scale) for publication bias, which evaluates tests so far proposed in the literature. This shows that: (i) the size of one of the new tests is comparable to those of the best existing tests, including those recently published; and (ii) among such tests it has slightly greater power, especially when the effect size is small and heterogeneity is present. Arcsine tests have additional advantages that they can include trials with zero events in both arms and that they can be very easily performed using the existing software for regression tests.
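
A sketch of the general idea: compute the arcsine difference and its approximate variance per trial, then apply an Egger-type regression test for association between effect and precision. This is a generic regression-test form applied to the arcsine scale, not necessarily the exact test variants evaluated in the paper.

    import numpy as np
    from scipy import stats

    def arcsine_difference(events, totals):
        """Per-trial arcsine difference and approximate variance for a binary
        outcome; events and totals are (k, 2) arrays for the two arms.
        Trials with zero events in both arms are handled without corrections."""
        e = np.asarray(events, float)
        n = np.asarray(totals, float)
        effect = (np.arcsin(np.sqrt(e[:, 0] / n[:, 0]))
                  - np.arcsin(np.sqrt(e[:, 1] / n[:, 1])))
        var = 1.0 / (4 * n[:, 0]) + 1.0 / (4 * n[:, 1])
        return effect, var

    def egger_type_test(effect, var):
        """Egger-style regression of the standardized effect (effect/se) on
        precision (1/se); the intercept estimates small-study asymmetry."""
        se = np.sqrt(var)
        y = effect / se
        X = np.column_stack([np.ones_like(se), 1.0 / se])
        beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (len(y) - 2)
        cov = sigma2 * np.linalg.inv(X.T @ X)
        t_stat = beta[0] / np.sqrt(cov[0, 0])
        p = 2 * stats.t.sf(abs(t_stat), df=len(y) - 2)
        return beta[0], p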

334 citations


Journal ArticleDOI
TL;DR: The Bayesian approach to finding the maximum-tolerated dose in phase I cancer trials is discussed and a comparison with the continual reassessment method (CRM) is performed with data from an actual trial and a simulation study.
Abstract: The Bayesian approach to finding the maximum-tolerated dose in phase I cancer trials is discussed. The suggested approach relies on a realistic dose-toxicity model, allows one to include prior information, and supports clinical decision making by presenting within-trial information in a transparent way. The modeling and decision-making components are flexible enough to be extendable to more complex settings. Critical aspects are emphasized and a comparison with the continual reassessment method (CRM) is performed with data from an actual trial and a simulation study. The comparison revealed similar operating characteristics while avoiding some of the difficulties encountered in the actual trial when applying the CRM.
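
An illustrative grid-approximation sketch of a two-parameter Bayesian logistic dose-toxicity model in the same spirit; the priors, reference dose, target toxicity band and overdose rule used below are placeholders, not the model or recommendations of the paper.

    import numpy as np
    from scipy.special import expit
    from scipy.stats import norm, binom

    def logistic_tox_posterior(doses, n_tox, n_pat, d_ref=100.0, target=(0.16, 0.33)):
        """Grid posterior for P(tox | d) = expit(a + exp(b) * log(d / d_ref)),
        with vague normal priors on (a, b) assumed for illustration only."""
        a = np.linspace(-6, 3, 181)
        b = np.linspace(-2, 2, 161)
        A, B = np.meshgrid(a, b, indexing="ij")
        logpost = norm.logpdf(A, 0, 2) + norm.logpdf(B, 0, 1)   # assumed priors
        for d, y, n in zip(doses, n_tox, n_pat):
            p = expit(A + np.exp(B) * np.log(d / d_ref))
            logpost += binom.logpmf(y, n, p)                    # binomial likelihood
        post = np.exp(logpost - logpost.max())
        post /= post.sum()
        summaries = {}
        for d in doses:
            p = expit(A + np.exp(B) * np.log(d / d_ref))
            summaries[d] = {"P(target)": post[(p >= target[0]) & (p < target[1])].sum(),
                            "P(overdose)": post[p >= target[1]].sum()}
        return summaries

    # One possible decision rule (not the paper's): call a dose acceptable if
    # P(overdose) < 0.25 and, among acceptable doses, pick the largest P(target).
    print(logistic_tox_posterior([10, 25, 50], [0, 1, 2], [3, 3, 3]))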


Journal ArticleDOI
TL;DR: It is concluded that an approach based on a log-normal assumption for the raw data is reasonably robust to different true distributions; and new standard error approximations are provided for this method.
Abstract: When literature-based meta-analyses involve outcomes with skewed distributions, the best available data can sometimes be a mixture of results presented on the raw scale and results presented on the logarithmic scale. We review and develop methods for transforming between these results for two-group studies, such as clinical trials and prospective or cross-sectional epidemiological studies. These allow meta-analyses to be conducted using all studies and on a common scale. The methods can also be used to produce a meta-analysis of ratios of geometric means when skewed data are reported on the raw scale for every study. We compare three methods, two of which have alternative standard error formulae, in an application and in a series of simulation studies. We conclude that an approach based on a log-normal assumption for the raw data is reasonably robust to different true distributions; and we provide new standard error approximations for this method. An assumption can be made that the standard deviations in the two groups are equal. This increases precision of the estimates, but if incorrect can lead to very misleading results.
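
The core transformation under the log-normal assumption is the usual method-of-moments mapping between raw-scale and log-scale means and standard deviations, sketched below; the paper's standard error approximations for meta-analysis are not reproduced here.

    import numpy as np

    def raw_to_log_scale(mean_raw, sd_raw):
        """Mean and SD of log(X) implied by the raw-scale mean and SD,
        assuming X is log-normally distributed (method of moments)."""
        cv2 = (sd_raw / mean_raw) ** 2
        sigma2 = np.log(1 + cv2)
        mu = np.log(mean_raw) - 0.5 * sigma2
        return mu, np.sqrt(sigma2)

    def log_to_raw_scale(mean_log, sd_log):
        """Inverse transformation: raw-scale mean and SD implied by the
        mean and SD of log(X) under the same log-normal assumption."""
        m = np.exp(mean_log + 0.5 * sd_log ** 2)
        v = (np.exp(sd_log ** 2) - 1) * np.exp(2 * mean_log + sd_log ** 2)
        return m, np.sqrt(v)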

Journal ArticleDOI
TL;DR: By appealing to the theory of semiparametrics, the authors are led naturally to a characterization of all treatment effect estimators and to principled, practically feasible methods for covariate adjustment that yield the desired gains in efficiency and that allow covariate relationships to be identified and exploited while circumventing the usual concerns.
Abstract: There is considerable debate regarding whether and how covariate-adjusted analyses should be used in the comparison of treatments in randomized clinical trials. Substantial baseline covariate information is routinely collected in such trials, and one goal of adjustment is to exploit covariates associated with outcome to increase precision of estimation of the treatment effect. However, concerns are routinely raised over the potential for bias when the covariates used are selected post hoc and the potential for adjustment based on a model of the relationship between outcome, covariates, and treatment to invite a 'fishing expedition' for that leading to the most dramatic effect estimate. By appealing to the theory of semiparametrics, we are led naturally to a characterization of all treatment effect estimators and to principled, practically feasible methods for covariate adjustment that yield the desired gains in efficiency and that allow covariate relationships to be identified and exploited while circumventing the usual concerns. The methods and strategies for their implementation in practice are presented. Simulation studies and an application to data from an HIV clinical trial demonstrate the performance of the techniques relative to the existing methods.

Journal ArticleDOI
TL;DR: A range of statistical methods for combining IPD and AD in meta-analysis of continuous outcomes from randomized controlled trials is developed, including an approach that separates within-trial and across-trials treatment-covariate interactions.
Abstract: Meta-analysis of individual patient data (IPD) is the gold-standard for synthesizing evidence across clinical studies. However, for some studies IPD may not be available and only aggregate data (AD), such as a treatment effect estimate and its standard error, may be obtained. In this situation, methods for combining IPD and AD are important to utilize all the available evidence. In this paper, we develop and assess a range of statistical methods for combining IPD and AD in meta-analysis of continuous outcomes from randomized controlled trials. The methods take either a one-step or a two-step approach. The latter is simple, with IPD reduced to AD so that standard AD meta-analysis techniques can be employed. The one-step approach is more complex but offers a flexible framework to include both patient-level and trial-level parameters. It uses a dummy variable to distinguish IPD trials from AD trials and to constrain which parameters the AD trials estimate. We show that this is important when assessing how patient-level covariates modify treatment effect, as aggregate-level relationships across trials are subject to ecological bias and confounding. We thus develop models to separate within-trial and across-trials treatment-covariate interactions; this ensures that only IPD trials estimate the former, whilst both IPD and AD trials estimate the latter in addition to the pooled treatment effect and any between-study heterogeneity. Extension to multiple correlated outcomes is also considered. Ten IPD trials in hypertension, with blood pressure the continuous outcome of interest, are used to assess the models and identify the benefits of utilizing AD alongside IPD.
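
A minimal sketch of the two-step route described above: each IPD trial is first reduced to an estimate and standard error (here a simple difference in means for a continuous outcome), and these are then pooled with the aggregate-data trials by inverse-variance weighting; the one-step hierarchical models and treatment-covariate interaction terms are not shown.

    import numpy as np

    def trial_effect_from_ipd(y, treat):
        """Reduce one IPD trial to a treatment-effect estimate and standard error
        (difference in means for a continuous outcome)."""
        y, treat = np.asarray(y, float), np.asarray(treat, int)
        y1, y0 = y[treat == 1], y[treat == 0]
        est = y1.mean() - y0.mean()
        se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
        return est, se

    def inverse_variance_pool(estimates, ses):
        """Fixed-effect inverse-variance pooling of per-trial estimates, whether
        they came from IPD reduction or were supplied as aggregate data."""
        est, se = np.asarray(estimates, float), np.asarray(ses, float)
        w = 1.0 / se ** 2
        pooled = np.sum(w * est) / np.sum(w)
        return pooled, np.sqrt(1.0 / np.sum(w))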

Journal ArticleDOI
TL;DR: A novel method for estimating the optimal timing of expensive and/or painful diagnostic or prognostic tests is proposed, which explicitly incorporates the restriction that such tests have no direct causal effect on disease progression.
Abstract: We review recent developments in the estimation of an optimal treatment strategy or regime from longitudinal data collected in an observational study. We also propose novel methods for using the data obtained from an observational database in one health-care system to determine the optimal treatment regime for biologically similar subjects in a second health-care system when, for cultural, logistical, or financial reasons, the two health-care systems differ (and will continue to differ) in the frequency of, and reasons for, both laboratory tests and physician visits. Finally, we propose a novel method for estimating the optimal timing of expensive and/or painful diagnostic or prognostic tests. Diagnostic or prognostic tests are only useful in so far as they help a physician to determine the optimal dosing strategy, by providing information on both the current health state and the prognosis of a patient because, in contrast to drug therapies, these tests have no direct causal effect on disease progression. Our new method explicitly incorporates this no direct effect restriction.

Journal ArticleDOI
TL;DR: The multi‐rule quality control system (MRQCS) used during the later part of the trial (AREDS Phase III) is reported here and the features of the MRQCS are demonstrated using quality control (QC) data associated with vitamin C measurements.
Abstract: The Age-Related Eye Disease Study (AREDS), sponsored by the National Eye Institute, was designed to study the natural history and risk factors of age-related macular degeneration (AMD) and cataract, and to evaluate the effect of high doses of antioxidants and zinc on eye disease progression. AMD and cataract are leading causes of visual impairment and blindness in the U.S., with frequency of both diseases increasing dramatically after age 65. Participants were randomly chosen to receive antioxidant or placebo tablets. Blood was drawn annually from a subset of patients, and serum concentrations of 17 different nutritional indicators were measured. Because of the complexity of the analytical methods, and the possibility of instrument error due to failure of any one of many component parts, several different instruments were used for most analytes. In addition, to assure that the measurement systems were performing adequately across a wide range of concentrations, multiple control pools were monitored with analyte concentrations at low, medium, and high levels. We report here the multi-rule quality control system (MRQCS) used during the later part of the trial (AREDS Phase III). This system was designed to monitor systematic error and random within- and among-run error for analytical runs using 1-3 different quality control pools per run and 1-2 measurements of each pool per run. We demonstrate the features of the MRQCS using quality control (QC) data associated with vitamin C measurements. We also provide operating characteristics to demonstrate how the MRQCS responds to increases in systematic and/or random error.

Journal ArticleDOI
TL;DR: This work presents a method for the simultaneous estimation of the basic reproductive number, R0, and the serial interval for infectious disease epidemics, using readily available surveillance data, and implements the proposed method with data from three infectious disease outbreaks.
Abstract: We present a method for the simultaneous estimation of the basic reproductive number, R0, and the serial interval for infectious disease epidemics, using readily available surveillance data. These estimates can be obtained in real time to inform an appropriate public health response to the outbreak. We show how this methodology, in its most simple case, is related to a branching process and describe similarities between the two that allow us to draw parallels which enable us to understand some of the theoretical properties of our estimators. We provide simulation results that illustrate the efficacy of the method for estimating R0 and the serial interval in real time. Finally, we implement our proposed method with data from three infectious disease outbreaks. Copyright © 2007 John Wiley & Sons, Ltd.
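
A minimal sketch of the kind of estimator involved: under a Poisson transmission model in which expected new cases on day t are R0 times a weighted sum of recent counts, the maximum-likelihood estimate of R0 has a closed form when the serial-interval distribution is treated as known. The paper estimates R0 and the serial interval jointly, which is not attempted here.

    import numpy as np

    def estimate_r0(counts, serial_pmf):
        """MLE of R0 under E[N_t] = R0 * sum_j p_j * N_{t-j} with Poisson counts,
        where p (the serial-interval distribution) is taken as known."""
        n = np.asarray(counts, float)
        p = np.asarray(serial_pmf, float)   # p[j-1] = P(serial interval = j days)
        k = len(p)
        # m[t] = sum_j p_j * n[t-j]: the "infection pressure" on day t
        m = np.array([sum(p[j - 1] * n[t - j] for j in range(1, k + 1) if t - j >= 0)
                      for t in range(1, len(n))])
        return n[1:].sum() / m.sum()

    # e.g. daily case counts and an illustrative 3-day serial-interval distribution
    print(estimate_r0([1, 2, 3, 5, 8, 12, 20], [0.2, 0.5, 0.3]))  # about 2.63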

Journal ArticleDOI
TL;DR: A general approach for estimating a difference between effect measures is described, which can be used to obtain confidence limits for a risk ratio and a lognormal mean and outperforms existing methods, including the bootstrap.
Abstract: It is widely accepted that confidence interval construction has important advantages over significance testing for the presentation of research results, as now facilitated by readily available software. However, for a number of effect measures, procedures are either not available or not satisfactory in samples of small to moderate size. In this paper, we describe a general approach for estimating a difference between effect measures, which can also be used to obtain confidence limits for a risk ratio and a lognormal mean. Numerical evaluation shows that this closed-form procedure outperforms existing methods, including the bootstrap.
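
Below is a sketch of a closed-form 'variance estimates recovery' construction for the difference of two effect measures, built from the two separate confidence intervals; whether this matches the paper's formulas exactly should be verified against the source.

    import math

    def mover_difference(est1, ci1, est2, ci2):
        """Closed-form confidence limits for est1 - est2, recovered from the
        separate confidence intervals (l1, u1) and (l2, u2) of the two estimates."""
        l1, u1 = ci1
        l2, u2 = ci2
        lower = (est1 - est2) - math.sqrt((est1 - l1) ** 2 + (u2 - est2) ** 2)
        upper = (est1 - est2) + math.sqrt((u1 - est1) ** 2 + (est2 - l2) ** 2)
        return lower, upper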

Journal ArticleDOI
TL;DR: ‘Imputed placebo’ and ‘margin’-based approaches to NI trial design will be considered, as well as the risk of ‘bio-creep’ with repeated NI trials, use of NI trials when determining whether excess safety risks can be ruled out, higher standards regarding quality of study conduct required with NI trials, and the myth that NI trials always require huge sample sizes.
Abstract: Non-inferiority (NI) trials enable a direct comparison of the relative benefit-to-risk profiles of an experimental intervention and a standard-of-care regimen. When the standard has clinical efficacy of substantial magnitude that is precisely estimated ideally using data from multiple adequate and well-controlled trials, with such estimates being relevant to the setting of the NI trial, then the NI trial can provide the scientific and regulatory evidence required to reliably assess the efficacy of the new intervention. In clinical practice, considerable uncertainty remains regarding when such trials should be conducted, how they should be designed, what standards for quality of trial conduct must be achieved, and how results should be interpreted. Recent examples will be considered to provide important insights and to highlight some of the challenges that remain to be adequately addressed regarding the use of the NI approach for the evaluation of new interventions. 'Imputed placebo' and 'margin'-based approaches to NI trial design will be considered, as well as the risk of 'bio-creep' with repeated NI trials, use of NI trials when determining whether excess safety risks can be ruled out, higher standards regarding quality of study conduct required with NI trials, and the myth that NI trials always require huge sample sizes.
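
Two toy decision rules that mirror the 'margin'-based and 'imputed placebo' (synthesis) framings discussed above, written for a continuous outcome where larger values are better; the parameter names, the 50 per cent retention fraction and the two-sided alpha are illustrative assumptions, not recommendations from the paper.

    from scipy.stats import norm

    def margin_based_ni(diff, se, margin, alpha=0.05):
        """Fixed-margin rule: declare non-inferiority if the lower confidence
        limit of (experimental - control) lies above -margin (margin > 0)."""
        z = norm.ppf(1 - alpha / 2)
        return diff - z * se > -margin

    def synthesis_ni(diff_ec, se_ec, eff_cp, se_cp, fraction=0.5, alpha=0.05):
        """Synthesis ('imputed placebo') style rule: require the experimental arm
        to retain at least `fraction` of the historical control-vs-placebo effect
        eff_cp (estimated with standard error se_cp)."""
        # (E - C) + (1 - fraction) * (C - P) > 0  is equivalent to E retaining
        # more than `fraction` of the control-vs-placebo effect.
        z = norm.ppf(1 - alpha / 2)
        est = diff_ec + (1 - fraction) * eff_cp
        se = (se_ec ** 2 + (1 - fraction) ** 2 * se_cp ** 2) ** 0.5
        return est - z * se > 0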

Journal ArticleDOI
TL;DR: A series of Monte Carlo simulations were performed to assess the performance of propensity score matching, stratifying on the propensity score, and covariate adjustment using the propensity score to estimate marginal odds ratios and found that matching on the propensity score tended to result in estimators with the lowest MSE.
Abstract: The propensity score is the probability of exposure to a specific treatment conditional on observed variables. Conditioning on the propensity score results in unbiased estimation of the expected difference in observed responses to two treatments. In the medical literature, propensity score methods are frequently used for estimating odds ratios. The performance of propensity score methods for estimating marginal odds ratios has not been studied. We performed a series of Monte Carlo simulations to assess the performance of propensity score matching, stratifying on the propensity score, and covariate adjustment using the propensity score to estimate marginal odds ratios. We assessed bias, precision, and mean-squared error (MSE) of the propensity score estimators, in addition to the proportion of bias eliminated due to conditioning on the propensity score. When the true marginal odds ratio was one, then matching on the propensity score and covariate adjustment using the propensity score resulted in unbiased estimation of the true treatment effect, whereas stratification on the propensity score resulted in minor bias in estimating the true marginal odds ratio. When the true marginal odds ratio ranged from 2 to 10, then matching on the propensity score resulted in the least bias, with relative biases ranging from 2.3 to 13.3 per cent. Stratifying on the propensity score resulted in moderate bias, with relative biases ranging from 15.8 to 59.2 per cent. For both methods, relative bias was proportional to the true odds ratio. Finally, matching on the propensity score tended to result in estimators with the lowest MSE.

Journal ArticleDOI
TL;DR: Three different statistical models are considered that avoid the drawback of the usual bivariate meta-analysis with a bivariate normal distribution, which can sometimes place positive probability mass at values that are not possible; all three are complicated enough that meta-analysis of sensitivity and specificity values is advised instead of likelihood ratios.
Abstract: Some authors plead for the explicit use of diagnostic likelihood ratios to describe the accuracy of diagnostic tests. Likelihood ratios are also preferentially used by some journals, and, naturally, are also used in meta-analysis. Although likelihood ratios vary between zero and infinity, meta-analysis is complicated by the fact that not every combination in ℜ+ is appropriate. The usual bivariate meta-analysis with a bivariate normal distribution can sometimes lead to positive probability mass at values that are not possible. We considered, therefore, three different statistical models that do not suffer from this drawback. All three approaches are so complicated that we advise considering meta-analysis of sensitivity and specificity values instead of likelihood ratios. Copyright © 2007 John Wiley & Sons, Ltd.
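
If sensitivity and specificity are meta-analysed as advised, summary likelihood ratios can still be reported by transforming the pooled values afterwards, as in this small sketch (illustrative numbers only).

    def likelihood_ratios(sensitivity, specificity):
        """Derive diagnostic likelihood ratios from (pooled) sensitivity and
        specificity, e.g. after a bivariate meta-analysis of the latter pair."""
        lr_pos = sensitivity / (1 - specificity)
        lr_neg = (1 - sensitivity) / specificity
        return lr_pos, lr_neg

    print(likelihood_ratios(0.90, 0.80))  # illustrative values: about 4.5 and 0.125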

Journal ArticleDOI
TL;DR: Closed-form formulas are derived for interaction studies with binary exposure and covariate in logistic regression, and the formula for the optimal control-case ratio is derived such that it maximizes the power function given the other parameters.
Abstract: There is no consensus on what test to use as the basis for sample size determination and power analysis. Some authors advocate the Wald test and some the likelihood-ratio test. We argue that the Wald test should be used because the Z-score is commonly applied for regression coefficient significance testing and therefore the same statistic should be used in the power function. We correct a widespread mistake on sample size determination when the variance of the maximum likelihood estimate (MLE) is estimated at the null value. In our previous paper, we developed a correct sample size formula for logistic regression with a single exposure (Statist. Med. 2007; 26(18):3385-3397). In the present paper, closed-form formulas are derived for interaction studies with binary exposure and covariate in logistic regression. The formula for the optimal control-case ratio is derived such that it maximizes the power function given other parameters. Our sample size and power calculations with interaction can be carried out online at www.dartmouth.edu/~eugened.

Journal ArticleDOI
TL;DR: A model-based approach is described for analysing multivariate time series data on counts of infectious diseases that deals with possible dependence between disease counts from different pathogens; in a spatio-temporal context it is proposed to include additional information on global dispersal of the pathogen in the model.
Abstract: This paper describes a model-based approach to analyse multivariate time series data on counts of infectious diseases. It extends a method previously described in the literature to deal with possible dependence between disease counts from different pathogens. In a spatio-temporal context it is proposed to include additional information on global dispersal of the pathogen in the model. Two examples are given: the first describes an analysis of weekly influenza and meningococcal disease counts from Germany. The second gives an analysis of the spatio-temporal spread of influenza in the U.S.A., 1996-2006, using air traffic information. Maximum likelihood estimates in this non-standard model class are obtained using general optimization routines, which are integrated in the R package surveillance.


Journal ArticleDOI
TL;DR: The objective was to develop a (random effects) meta-analysis model that could synthesize both individual-level and aggregate-level binary outcome data while exploring the effects of binary covariates also available in a combination of individual participant and aggregate level data.
Abstract: The methodology described here was developed for a systematic review and individual participant-level meta-analysis of home safety education and the provision of safety equipment for the prevention of childhood accidents. This review had a particular emphasis on exploring whether effectiveness was related to socio-demographic characteristics previously shown to be associated with injury risk. Individual participant data were only made available to us for a proportion of the included studies. This resulted in the need for developing a new methodology to combine the available data most efficiently. Our objective was to develop a (random effects) meta-analysis model that could synthesize both individual-level and aggregate-level binary outcome data while exploring the effects of binary covariates also available in a combination of individual participant and aggregate level data. To add further complication, the studies to be combined were a mixture of cluster and individual participant-allocated designs. A Bayesian model using Markov chain Monte Carlo methods to estimate parameters is described which efficiently synthesizes the data by allowing different models to be fitted to the different study design and data format combinations available. Initially we describe a model to estimate mean effects ignoring the influence of the covariates, and then extend it to include a binary covariate. The method is illustrated by application to one outcome from the motivating home safety meta-analysis. Using the same general approach, it would be possible to develop further 'tailor made' evidence synthesis models to synthesize all available evidence most effectively.

Journal ArticleDOI
TL;DR: The properties of several tests for goodness-of-fit for multinomial logistic regression are examined, and the tests are illustrated using data from a study of cytological criteria for the diagnosis of breast tumors.
Abstract: We examine the properties of several tests for goodness-of-fit for multinomial logistic regression. One test is based on a strategy of sorting the observations according to the complement of the estimated probability for the reference outcome category and then grouping the subjects into g equal-sized groups. A g × c contingency table, where c is the number of values of the outcome variable, is constructed. The test statistic, denoted as Cg, is obtained by calculating the Pearson chi-squared statistic where the estimated expected frequencies are the sum of the model-based estimated logistic probabilities. Simulations compare the properties of Cg with those of the ungrouped Pearson chi-squared test (X2) and its normalized test (z). The null distribution of Cg is well approximated by the chi-squared distribution with (g-2) × (c-1) degrees of freedom. The sampling distribution of X2 is compared with a chi-squared distribution with n × (c-1) degrees of freedom but shows erratic behavior. With a few exceptions, the sampling distribution of z adheres reasonably well to the standard normal distribution. Power simulations show that Cg has low power for a sample of 100 observations, but satisfactory power for a sample of 400. The tests are illustrated using data from a study of cytological criteria for the diagnosis of breast tumors.
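
A rough sketch of the grouped statistic as described in the abstract: sort by the complement of the reference-category probability, form g groups, and compare observed with model-expected counts in the resulting g × c table. The grouping into nearly equal sizes via array_split and the handling of ties are simplifications of the paper's procedure.

    import numpy as np
    from scipy.stats import chi2

    def cg_test(y, probs, g=10):
        """Hosmer-Lemeshow-type test for multinomial logistic regression.
        y: outcome codes 0..c-1 (0 = reference category);
        probs: n x c matrix of fitted category probabilities."""
        y = np.asarray(y, int)
        p = np.asarray(probs, float)
        n, c = p.shape
        order = np.argsort(1.0 - p[:, 0])          # sort by 1 - P(reference)
        groups = np.array_split(order, g)          # g nearly equal-sized groups
        obs = np.zeros((g, c))
        exp = np.zeros((g, c))
        for i, idx in enumerate(groups):
            for k in range(c):
                obs[i, k] = np.sum(y[idx] == k)
                exp[i, k] = p[idx, k].sum()        # model-based expected count
        stat = np.sum((obs - exp) ** 2 / exp)
        df = (g - 2) * (c - 1)
        return stat, chi2.sf(stat, df)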

Journal ArticleDOI
TL;DR: This paper develops a causal or manipulation model framework for mediation analysis based on the concept of potential outcome, which provides new definitions and measures of mediation and describes a sensitivity analysis approach to handle unidentified parameters.
Abstract: This paper develops a causal or manipulation model framework for mediation analysis based on the concept of potential outcome. Using this framework, we provide new definitions and measures of mediation. Effects of manipulations are modeled via the linear structural model. Corresponding structural equation models (SEMs), in conjunction with two-stage least-squares estimation and the delta method, are used to perform inference. The methods are applied to data from a study of nursing interventions for postoperative pain. We address the cases of more than two treatment groups, and an interaction among mediators. For the latter, a sensitivity analysis approach to handle unidentified parameters is described. Interpretative advantages of the potential outcomes framework for mediation are emphasized.
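
Under the linear structural model, the mediated (indirect) effect is commonly estimated as the product of the treatment-to-mediator and mediator-to-outcome coefficients; the sketch below uses two ordinary least-squares fits and a delta-method (Sobel-type) standard error, which is a simplification of the paper's SEM and two-stage least-squares machinery.

    import numpy as np

    def ols(X, y):
        """OLS coefficients (with intercept) and their covariance matrix."""
        X = np.column_stack([np.ones(len(y)), X])
        beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (len(y) - X.shape[1])
        return beta, sigma2 * np.linalg.inv(X.T @ X)

    def mediation_product(t, m, y):
        """Indirect effect of treatment t on outcome y through mediator m,
        estimated as a * b with a Sobel-type (delta-method) standard error."""
        t, m, y = map(lambda v: np.asarray(v, float), (t, m, y))
        a_hat, cov_a = ols(t, m)                        # mediator model: m ~ t
        b_hat, cov_b = ols(np.column_stack([t, m]), y)  # outcome model: y ~ t + m
        a, b = a_hat[1], b_hat[2]
        va, vb = cov_a[1, 1], cov_b[2, 2]
        return a * b, np.sqrt(a ** 2 * vb + b ** 2 * va)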

Journal ArticleDOI
TL;DR: Peter Austin should be commended for addressing the rampant lack of good practice in propensity-score matching applications with some much needed policing and rehabilitation.
Abstract: Research using propensity-score matching has been in the literature for over two decades now. During this time, in a process akin to the way a message gets distorted and passed on in the children’s game of ‘telephone,’ widespread dissemination has led to misunderstandings regarding the required assumptions, goals, and appropriate implementation of propensity-score matching. Thus, the bad practice that exists today is due, in large part, to degrees of separation from original sources coupled with the changing knowledge base and the time lag between new information appearing in the statistics literature and it reaching applied researchers. Another culprit more intrinsic to the nature of the method itself (at least in its current incarnation) is the ‘art form’ involved in proper practice [1]. Irrespective of how the current state of affairs came to be, a remedy is warranted. Peter Austin should be commended for addressing the rampant lack of good practice in propensity-score matching applications with some much needed policing and rehabilitation. Austin provides some useful advice with regard to good practice (I avoid the term ‘best practice’ since there seems to be no consensus as to what this comprises). I especially appreciate his push for explicit discussion of the strategy used to create matched pairs and examination of balance across matched groups. Austin also provides advice with which I don’t agree. I am particularly at odds with his position requiring matched pairs’ analyses, which is an overly narrow approach to the problem. There are many ways to address the lack of independence across samples, and methods that explicitly adjust for pairwise dependence are not always the best choice (even if, algorithmically, the dependence was created by forming matched pairs). Moreover, Austin gives this issue undue weight compared

Journal ArticleDOI
TL;DR: A shrinkage observed‐to‐expected ratio is implemented and evaluated for exploratory analysis of suspected drug–drug interaction in ICSR data, based on comparison with an additive risk model.
Abstract: Interaction between drug substances may yield excessive risk of adverse drug reactions (ADRs) when two drugs are taken in combination. Collections of individual case safety reports (ICSRs) related to suspected ADR incidents in clinical practice have proven to be very useful in post-marketing surveillance for pairwise drug--ADR associations, but have yet to reach their full potential for drug-drug interaction surveillance. In this paper, we implement and evaluate a shrinkage observed-to-expected ratio for exploratory analysis of suspected drug-drug interaction in ICSR data, based on comparison with an additive risk model. We argue that the limited success of previously proposed methods for drug-drug interaction detection based on ICSR data may be due to an underlying assumption that the absence of interaction is equivalent to having multiplicative risk factors. We provide empirical examples of established drug-drug interaction highlighted with our proposed approach that go undetected with logistic regression. A database wide screen for suspected drug-drug interaction in the entire WHO database is carried out to demonstrate the feasibility of the proposed approach. As always in the analysis of ICSRs, the clinical validity of hypotheses raised with the proposed method must be further reviewed and evaluated by subject matter experts.