
Showing papers in "Statistics in Medicine in 2004"


Journal ArticleDOI
TL;DR: A range of Bayesian hierarchical models using the Markov chain Monte Carlo software WinBUGS are presented that allow for variation in true treatment effects across trials, and models where the between-trials variance is homogeneous across treatment comparisons are considered.
Abstract: Mixed treatment comparison (MTC) meta-analysis is a generalization of standard pairwise meta-analysis for A vs B trials, to data structures that include, for example, A vs B, B vs C, and A vs C trials. There are two roles for MTC: one is to strengthen inference concerning the relative efficacy of two treatments, by including both 'direct' and 'indirect' comparisons. The other is to facilitate simultaneous inference regarding all treatments, in order for example to select the best treatment. In this paper, we present a range of Bayesian hierarchical models using the Markov chain Monte Carlo software WinBUGS. These are multivariate random effects models that allow for variation in true treatment effects across trials. We consider models where the between-trials variance is homogeneous across treatment comparisons as well as heterogeneous variance models. We also compare models with fixed (unconstrained) baseline study effects with models with random baselines drawn from a common distribution. These models are applied to an illustrative data set and posterior parameter distributions are compared. We discuss model critique and model selection, illustrating the role of Bayesian deviance analysis, and node-based model criticism. The assumptions underlying the MTC models and their parameterization are also discussed.

1,861 citations
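As a concrete illustration of the consistency relation that MTC models exploit, the sketch below combines a direct A-versus-C estimate with an indirect one formed from A-versus-B and B-versus-C trials. This is a simple Bucher-style calculation, not the paper's Bayesian WinBUGS model, and all log-odds ratios and standard errors are hypothetical.

```python
# Minimal illustration (not the paper's WinBUGS model): combining a direct
# A-vs-C estimate with an indirect one formed from A-vs-B and B-vs-C evidence.
# All log-odds ratios and standard errors below are hypothetical.
import numpy as np

d_AB, se_AB = -0.30, 0.12          # pooled log-OR, A vs B
d_BC, se_BC = -0.20, 0.15          # pooled log-OR, B vs C
d_AC_dir, se_AC_dir = -0.55, 0.20  # pooled log-OR, A vs C (direct trials)

# Consistency relation exploited by MTC: d_AC = d_AB + d_BC
d_AC_ind = d_AB + d_BC
se_AC_ind = np.sqrt(se_AB**2 + se_BC**2)

# Inverse-variance combination of direct and indirect evidence
w_dir, w_ind = 1 / se_AC_dir**2, 1 / se_AC_ind**2
d_AC_mtc = (w_dir * d_AC_dir + w_ind * d_AC_ind) / (w_dir + w_ind)
se_AC_mtc = np.sqrt(1 / (w_dir + w_ind))

print(f"indirect A vs C: {d_AC_ind:.3f} (SE {se_AC_ind:.3f})")
print(f"combined A vs C: {d_AC_mtc:.3f} (SE {se_AC_mtc:.3f})")
```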


Journal ArticleDOI
TL;DR: The propensity score, the probability of treatment exposure conditional on covariates, is the basis for two approaches to adjusting for confounding: methods based on stratification of observations by quantiles of estimated propensity scores and methods based on weighting observations by the inverse of estimated propensity scores.
Abstract: Estimation of treatment effects with causal interpretation from observational data is complicated because exposure to treatment may be confounded with subject characteristics. The propensity score, the probability of treatment exposure conditional on covariates, is the basis for two approaches to adjusting for confounding: methods based on stratification of observations by quantiles of estimated propensity scores and methods based on weighting observations by the inverse of estimated propensity scores. We review popular versions of these approaches and related methods offering improved precision, describe theoretical properties and highlight their implications for practice, and present extensive comparisons of performance that provide guidance for practical use.

1,548 citations
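The two adjustment strategies described above can be sketched in a few lines. The simulation below contrasts quintile stratification with inverse-probability-of-treatment weighting using an estimated propensity score; the variable names, coefficients and sample size are illustrative and not taken from the paper.

```python
# Hedged sketch of the two adjustment strategies: quintile stratification and
# inverse-probability weighting. Data are simulated; the true effect is 1.0.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))                                   # measured confounders
p_treat = 1 / (1 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1])))
a = rng.binomial(1, p_treat)                                  # treatment exposure
y = 1.0 * a + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)    # outcome

# Estimated propensity score: P(A = 1 | X)
ps = LogisticRegression().fit(x, a).predict_proba(x)[:, 1]

# (1) Stratification on propensity-score quintiles
strata = np.digitize(ps, np.quantile(ps, [0.2, 0.4, 0.6, 0.8]))
strat_effects = [y[(strata == s) & (a == 1)].mean() - y[(strata == s) & (a == 0)].mean()
                 for s in range(5)]
print("stratified estimate:", np.mean(strat_effects))

# (2) Inverse-probability-of-treatment weighting
w = a / ps + (1 - a) / (1 - ps)
ipw = np.average(y[a == 1], weights=w[a == 1]) - np.average(y[a == 0], weights=w[a == 0])
print("IPW estimate:", ipw)
```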


Journal ArticleDOI
TL;DR: Many routinely used summary methods provide widely ranging estimates when applied to sparse data with high imbalance between the size of the studies' arms; Mantel-Haenszel summary estimates using the alternative continuity correction factors gave the least biased results for all group size imbalances.
Abstract: Objectives: To compare the performance of different meta-analysis methods for pooling odds ratios when applied to sparse event data with emphasis on the use of continuity corrections. Background: Meta-analysis of side effects from RCTs or risk factors for rare diseases in epidemiological studies frequently requires the synthesis of data with sparse event rates. Combining such data can be problematic when zero events exist in one or both arms of a study as continuity corrections are often needed, but these can influence results and conclusions. Methods: A simulation study was undertaken comparing several meta-analysis methods for combining odds ratios (using various classical and Bayesian methods of estimation) on sparse event data. Where required, the routine use of a constant and two alternative continuity corrections (one based on a function of the reciprocal of the opposite group arm size, the other an empirical estimate of the pooled effect size from the remaining studies in the meta-analysis) was also compared. A number of meta-analysis scenarios were simulated and replicated 1000 times, varying the ratio of the study arm sizes. Results: Mantel–Haenszel summary estimates using the alternative continuity correction factors gave the least biased results for all group size imbalances. Logistic regression was virtually unbiased for all scenarios and gave good coverage properties. The Peto method provided unbiased results for balanced treatment groups but bias increased with the ratio of the study arm sizes. The Bayesian fixed effect model provided good coverage for all group size imbalances. The two alternative continuity corrections outperformed the constant correction factor in nearly all situations. The inverse variance method performed consistently badly, irrespective of the continuity correction used. Conclusions: Many routinely used summary methods provide widely ranging estimates when applied to sparse data with high imbalance between the size of the studies' arms. A sensitivity analysis using several methods and continuity correction factors is advocated for routine practice. Copyright © 2004 John Wiley & Sons, Ltd.

1,357 citations
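A minimal sketch of the comparison described above, assuming one plausible form of the reciprocal-of-opposite-arm-size correction (the two arm-specific corrections are normalized to sum to 1, which imputes an odds ratio near 1 for a zero-event study). The 2x2 tables are invented, and the exact correction formula is an assumption rather than a quotation from the paper.

```python
# Mantel-Haenszel pooling of odds ratios on sparse 2x2 tables, comparing the
# usual constant 0.5 correction with an arm-size-based correction. Tables are toy data.
import numpy as np

# Each row: events treated, n treated, events control, n control
tables = np.array([[0, 50, 2, 100],
                   [1, 40, 0, 120],
                   [0, 60, 1, 60],
                   [2, 30, 3, 90]], dtype=float)

def mh_odds_ratio(tables, correction):
    num = den = 0.0
    for a, n1, c, n0 in tables:
        b, d = n1 - a, n0 - c
        if (a == 0 or b == 0 or c == 0 or d == 0):       # apply only where required
            if correction == "constant":
                a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
            elif correction == "reciprocal":
                # correction to each arm proportional to the reciprocal of the
                # opposite arm size, normalized so the two corrections sum to 1
                ka, kc = n1 / (n1 + n0), n0 / (n1 + n0)
                a, b = a + ka, b + ka
                c, d = c + kc, d + kc
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

print("MH OR, constant 0.5 correction:", mh_odds_ratio(tables, "constant"))
print("MH OR, reciprocal correction  :", mh_odds_ratio(tables, "reciprocal"))
```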


Journal ArticleDOI
TL;DR: The overall C index is developed as a parameter describing the performance of a given model applied to the population under consideration and a confidence interval is constructed based on the asymptotic normality of its estimate.
Abstract: The assessment of the discrimination ability of a survival analysis model is a problem of considerable theoretical interest and important practical applications. This issue is, however, more complex than evaluating the performance of a linear or logistic regression. Several different measures have been proposed in the biostatistical literature. In this paper we investigate the properties of the overall C index introduced by Harrell as a natural extension of the ROC curve area to survival analysis. We develop the overall C index as a parameter describing the performance of a given model applied to the population under consideration and discuss the statistic used as its sample estimate. We discover a relationship between the overall C and the modified Kendall's tau and construct a confidence interval for our measure based on the asymptotic normality of its estimate. Then we investigate via simulations the length and coverage probability of this interval. Finally, we present a real life example evaluating the performance of a Framingham Heart Study model.

1,339 citations
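The sample estimate discussed above is the proportion of usable (comparable) pairs in which the subject with the shorter observed survival time has the higher predicted risk. The sketch below uses toy data and, for brevity, ignores pairs with tied survival times.

```python
# A small sketch of the sample estimate of Harrell's overall C.
import numpy as np

def harrell_c(time, event, risk_score):
    """time: follow-up times; event: 1 = death, 0 = censored; risk_score: higher = worse."""
    concordant, usable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # a pair is usable if the earlier time is an observed event
            if time[i] < time[j] and event[i] == 1:
                usable += 1
                if risk_score[i] > risk_score[j]:
                    concordant += 1
                elif risk_score[i] == risk_score[j]:
                    concordant += 0.5
    return concordant / usable

time = np.array([5., 8., 12., 3., 9., 15.])
event = np.array([1, 0, 1, 1, 1, 0])
risk = np.array([2.1, 1.0, 1.8, 2.9, 0.8, 0.2])
print("Harrell's C:", round(harrell_c(time, event, risk), 3))
```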


Journal ArticleDOI
TL;DR: The points system described here represents an effort to make available a tool for clinicians to aid their decision-making regarding treatment and to assist them in motivating patients toward healthy behaviours.
Abstract: The Framingham Heart Study has been a leader in the development and dissemination of multivariable statistical models to estimate the risk of coronary heart disease. These models quantify the impact of measurable and modifiable risk factors on the development of coronary heart disease and can be used to generate estimates of risk of coronary heart disease over a predetermined period, for example the next 10 years. We developed a system, which we call a points system, for making these complex statistical models useful to practitioners. The system is easy to use, it does not require a calculator or computer and it simplifies the estimation of risk based on complex statistical models. This system represents an effort to make available a tool for clinicians to aid in their decision-making process regarding treatment and to assist them in motivating patients toward healthy behaviours. The system is also readily available to patients who can easily estimate their own coronary heart disease risk and monitor this risk over time.

1,247 citations
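The general idea of a points system can be illustrated as follows: rescale regression coefficients so that a convenient unit (here, assumed to be 5 years of age) equals one point, then round each covariate's contribution to an integer score. The coefficients and categories below are invented for illustration and are not the Framingham model.

```python
# Hedged sketch of a "points system" built from regression coefficients.
betas = {                       # hypothetical log-hazard (or log-odds) coefficients
    "age_per_year": 0.065,
    "smoker": 0.55,
    "sbp_140_159": 0.40,
    "sbp_160_plus": 0.70,
}
B = betas["age_per_year"] * 5   # points "currency": 5 years of age = 1 point

def points(beta_times_x):
    return int(round(beta_times_x / B))

# Example profile: 60-year-old smoker with SBP 150 mmHg (reference age 50)
contribs = {
    "age (60 vs 50)": betas["age_per_year"] * (60 - 50),
    "smoker": betas["smoker"],
    "sbp 140-159": betas["sbp_140_159"],
}
total = sum(points(v) for v in contribs.values())
for k, v in contribs.items():
    print(f"{k:15s} -> {points(v)} points")
print("total points:", total)

# Risk would then be read from a table built by evaluating the original model
# at the linear-predictor increment implied by each total point score.
print("implied increment in the linear predictor:", round(total * B, 3))
```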


Journal ArticleDOI
TL;DR: An Erratum has been published for this article in Statistics in Medicine 2005; 24(1):156.
Abstract: Many factors determine a woman's risk of breast cancer. Some of them are genetic and relate to family history, others are based on personal factors such as reproductive history and medical history. While many papers have concentrated on subsets of these risk factors, no papers have incorporated personal risk factors with a detailed genetic analysis. There is a need to combine these factors to provide a better overall determinant of risk. The discovery of the BRCA1 and BRCA2 genes has explained some of the genetic determinants of breast cancer risk, but these genes alone do not explain all of the familial aggregation of breast cancer. We have developed a model incorporating the BRCA genes, a low penetrance gene and personal risk factors. For an individual woman, her family history is used in conjunction with Bayes' theorem to iteratively produce the likelihood of her carrying any genes predisposing to breast cancer, which in turn affects her likelihood of developing breast cancer. This risk was further refined based on the woman's personal history. The model has been incorporated into a computer program that gives a personalised risk estimate.

1,079 citations


Journal ArticleDOI
TL;DR: It is demonstrated in particular that fixed effect meta‐regression is likely to produce seriously misleading results in the presence of heterogeneity, and the permutation test is recommended before a statistically significant relationship is claimed from a standard meta-regression analysis.
Abstract: Meta-regression has become a commonly used tool for investigating whether study characteristics may explain heterogeneity of results among studies in a systematic review. However, such explorations of heterogeneity are prone to misleading false-positive results. It is unclear how many covariates can reliably be investigated, and how this might depend on the number of studies, the extent of the heterogeneity and the relative weights awarded to the different studies. Our objectives in this paper are two-fold. First, we use simulation to investigate the type I error rate of meta-regression in various situations. Second, we propose a permutation test approach for assessing the true statistical significance of an observed meta-regression finding. Standard meta-regression methods suffer from substantially inflated false-positive rates when heterogeneity is present, when there are few studies and when there are many covariates. These are typical of situations in which meta-regressions are routinely employed. We demonstrate in particular that fixed effect meta-regression is likely to produce seriously misleading results in the presence of heterogeneity. The permutation test appropriately tempers the statistical significance of meta-regression findings. We recommend its use before a statistically significant relationship is claimed from a standard meta-regression analysis.

1,076 citations
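The permutation test proposed above can be sketched as follows: shuffle the study-level covariate, refit the weighted meta-regression (a fixed-effect version is used here for simplicity), and locate the observed slope statistic in the resulting permutation distribution. The study data are simulated.

```python
# Sketch of a permutation test for a meta-regression slope. Simulated studies.
import numpy as np

rng = np.random.default_rng(1)
k = 12                                   # number of studies
covariate = rng.normal(size=k)           # study-level characteristic
se = rng.uniform(0.1, 0.4, size=k)       # within-study standard errors
effect = 0.3 + rng.normal(scale=se)      # true effects unrelated to the covariate

def weighted_slope_z(y, x, w):
    """z-statistic for the slope in a weighted (fixed-effect) meta-regression."""
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(w)
    cov = np.linalg.inv(X.T @ W @ X)
    beta = cov @ X.T @ W @ y
    return beta[1] / np.sqrt(cov[1, 1])

w = 1 / se**2
z_obs = weighted_slope_z(effect, covariate, w)

n_perm = 2000
z_perm = np.array([weighted_slope_z(effect, rng.permutation(covariate), w)
                   for _ in range(n_perm)])
p_perm = (1 + np.sum(np.abs(z_perm) >= np.abs(z_obs))) / (n_perm + 1)
print(f"observed z = {z_obs:.2f}, permutation p = {p_perm:.3f}")
```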


Journal ArticleDOI
TL;DR: The model can be estimated in any software package that estimates GLMs with user‐defined link functions and utilizes the theory of generalized linear models for assessing goodness‐of‐fit and studying regression diagnostics.
Abstract: Four approaches to estimating a regression model for relative survival using the method of maximum likelihood are described and compared. The underlying model is an additive hazards model where the total hazard is written as the sum of the known baseline hazard and the excess hazard associated with a diagnosis of cancer. The excess hazards are assumed to be constant within pre-specified bands of follow-up. The likelihood can be maximized directly or in the framework of generalized linear models. Minor differences exist due to, for example, the way the data are presented (individual, aggregated or grouped), and in some assumptions (e.g. distributional assumptions). The four approaches are applied to two real data sets and produce very similar estimates even when the assumption of proportional excess hazards is violated. The choice of approach to use in practice can, therefore, be guided by ease of use and availability of software. We recommend using a generalized linear model with a Poisson error structure based on collapsed data using exact survival times. The model can be estimated in any software package that estimates GLMs with user-defined link functions (including SAS, Stata, S-plus, and R) and utilizes the theory of generalized linear models for assessing goodness-of-fit and studying regression diagnostics.

787 citations
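Rather than the recommended user-defined-link GLM (which is most convenient in packages such as Stata, SAS or R), the sketch below maximizes the same collapsed-data Poisson likelihood directly with a general-purpose optimizer, assuming piecewise-constant excess hazards within follow-up bands. All counts, person-times and expected deaths are invented.

```python
# Direct maximum-likelihood sketch of the additive excess-hazard model for
# collapsed data: deaths in each band follow Poisson(expected + person-time * excess rate).
import numpy as np
from scipy.optimize import minimize

# one row per follow-up band x covariate pattern: [band, x, deaths, person-time, expected deaths]
data = np.array([
    [0, 0, 30, 500.0, 8.0],
    [0, 1, 45, 480.0, 9.0],
    [1, 0, 18, 420.0, 7.5],
    [1, 1, 28, 400.0, 8.5],
    [2, 0, 10, 350.0, 7.0],
    [2, 1, 15, 330.0, 7.5],
])
band = data[:, 0].astype(int)
x, d, y, d_star = data[:, 1], data[:, 2], data[:, 3], data[:, 4]
n_bands = band.max() + 1

def neg_loglik(par):
    gamma, beta = par[:n_bands], par[n_bands]
    mu = d_star + y * np.exp(gamma[band] + beta * x)   # expected + excess deaths
    return np.sum(mu - d * np.log(mu))                 # Poisson kernel

x0 = np.append(np.full(n_bands, -3.0), 0.0)            # start near plausible excess rates
fit = minimize(neg_loglik, x0=x0, method="BFGS")
gamma_hat, beta_hat = fit.x[:n_bands], fit.x[n_bands]
print("excess hazard ratio for x=1 vs x=0:", round(np.exp(beta_hat), 3))
print("baseline excess hazards per band:", np.round(np.exp(gamma_hat), 4))
```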



Journal ArticleDOI
TL;DR: A simulation study, illustrated with data from a cohort of 84,329 French women followed prospectively for breast cancer occurrence, found that bias could occur even when the covariate of interest was independent of age, especially with time-dependent covariates.
Abstract: Cox's regression model is widely used for assessing associations between potential risk factors and disease occurrence in epidemiologic cohort studies. Although age is often a strong determinant of disease risk, authors have frequently used time-on-study instead of age as the time-scale, as for clinical trials. Unless the baseline hazard is an exponential function of age, this approach can yield different estimates of relative hazards than using age as the time-scale, even when age is adjusted for. We performed a simulation study in order to investigate the existence and magnitude of bias for different degrees of association between age and the covariate of interest. Age to disease onset was generated from exponential, Weibull or piecewise Weibull distributions, and both fixed and time-dependent dichotomous covariates were considered. We observed no bias upon using age as the time-scale. Upon using time-on-study, we verified the absence of bias for exponentially distributed age to disease onset. For non-exponential distributions, we found that bias could occur even when the covariate of interest was independent from age. It could be severe in case of substantial association with age, especially with time-dependent covariates. These findings were illustrated on data from a cohort of 84,329 French women followed prospectively for breast cancer occurrence. In view of our results, we strongly recommend not using time-on-study as the time-scale for analysing epidemiologic cohort data.

529 citations


Journal ArticleDOI
TL;DR: A logistic regression model may be used to provide predictions of outcome for individual patients at a centre other than the one where the model was developed; when local data are available, the model may be updated to improve predictions for future patients.
Abstract: A logistic regression model may be used to provide predictions of outcome for individual patients at another centre than where the model was developed. When empirical data are available from this centre, the validity of predictions can be assessed by comparing observed outcomes and predicted probabilities. Subsequently, the model may be updated to improve predictions for future patients. As an example, we analysed 30-day mortality after acute myocardial infarction in a large data set (GUSTO-I, n = 40 830). We validated and updated a previously published model from another study (TIMI-II, n = 3339) in validation samples ranging from small (200 patients, 14 deaths) to large (10,000 patients, 700 deaths). Updated models were tested on independent patients. Updating methods included re-calibration (re-estimation of the intercept or slope of the linear predictor) and more structural model revisions (re-estimation of some or all regression coefficients, model extension with more predictors). We applied heuristic shrinkage approaches in the model revision methods, such that regression coefficients were shrunken towards their re-calibrated values. Parsimonious updating methods were found preferable to more extensive model revisions, which should only be attempted with relatively large validation samples in combination with shrinkage.
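The two simplest updating methods mentioned above, re-estimating the intercept and re-calibrating both intercept and slope of the existing linear predictor, can be sketched with a standard GLM routine. The "published" coefficients and the validation data below are simulated stand-ins, not the TIMI-II or GUSTO-I models.

```python
# Sketch of model recalibration on local validation data using statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=(n, 2))
lp_published = -2.0 + 0.8 * x[:, 0] + 0.4 * x[:, 1]      # old model's linear predictor
true_lp = -1.5 + 0.6 * x[:, 0] + 0.4 * x[:, 1]           # local reality differs
y = rng.binomial(1, 1 / (1 + np.exp(-true_lp)))

# Method 1: update the intercept only (linear predictor enters as an offset)
m1 = sm.GLM(y, np.ones((n, 1)), family=sm.families.Binomial(),
            offset=lp_published).fit()
print("intercept correction:", round(m1.params[0], 3))

# Method 2: re-calibrate intercept and slope of the linear predictor
X2 = sm.add_constant(lp_published)
m2 = sm.GLM(y, X2, family=sm.families.Binomial()).fit()
print("new intercept, calibration slope:", np.round(m2.params, 3))
```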

Journal ArticleDOI
TL;DR: This article gives an overview of sample size calculations for parallel group and cross-over studies with Normal data and demonstrates how the different trial objectives influence the null and alternative hypotheses of the trials and how these hypotheses influence the calculations.
Abstract: This article gives an overview of sample size calculations for parallel group and cross-over studies with Normal data. Sample size derivation is given for trials where the objective is to demonstrate superiority, equivalence, non-inferiority, bioequivalence and estimation to a given precision, for different type I and type II errors. It is demonstrated how the different trial objectives influence the null and alternative hypotheses of the trials and how these hypotheses influence the calculations. Sample size tables for the different types of trials and worked examples are given. Copyright © 2004 John Wiley & Sons, Ltd.
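For the superiority case with Normal data, the familiar per-group formula is n = 2(z_{1-alpha/2} + z_{1-beta})^2 sigma^2 / delta^2. The worked example below (delta = 5, sigma = 10, two-sided alpha = 0.05, 90 per cent power) is illustrative and not taken from the article's tables.

```python
# Worked example of the standard two-group superiority calculation for Normal data.
import math
from scipy.stats import norm

def n_per_group_superiority(delta, sd, alpha=0.05, power=0.90):
    """Sample size per group to detect a true mean difference `delta` (two-sided alpha)."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return 2 * (z_a + z_b) ** 2 * sd ** 2 / delta ** 2

n = n_per_group_superiority(delta=5.0, sd=10.0, alpha=0.05, power=0.90)
print("n per group:", math.ceil(n))   # 2*(1.96 + 1.282)^2 * (10/5)^2 = 84.1 -> 85
```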

Journal ArticleDOI
TL;DR: This paper investigates generalized estimating equations for association parameters, which are frequently of interest in family studies, with emphasis on covariance estimation, and finds that the formula for the approximate jackknife variance estimator in Ziegler et al. is deficient, resulting in systematic deviations from the fully iterated jackknifevariance estimator.
Abstract: This paper investigates generalized estimating equations for association parameters, which are frequently of interest in family studies, with emphasis on covariance estimation. Separate link functions are used to connect the mean, the scale, and the correlation to linear predictors involving possibly different sets of covariates, and separate estimating equations are proposed for the three sets of parameters. Simulations show that the robust 'sandwich' variance estimator and the jackknife variance estimator for the correlation parameters are generally close to the empirical variance for the sample size of 50 clusters. The results contradict Ziegler et al. and Kastner and Ziegler, where the 'sandwich' estimator obtained from the software MAREG was shown to be unsuitable for practical usage. The problem appears to arise because the MAREG variance estimator does not account for variability in estimation of the scale parameters, but may be valid with fixed scale. We also find that the formula for the approximate jackknife variance estimator in Ziegler et al. is deficient, resulting in systematic deviations from the fully iterated jackknife variance estimator. A general jackknife formula is provided and performs well in numerical studies. Data from a study on the genetics of alcoholism is used to illustrate the importance of reliable variance estimation in biomedical applications.

Journal ArticleDOI
TL;DR: A well‐founded and reliable measure of the prognostic ability of a model would be valuable to help define the separation between patients or prognostic groups that the model could provide, and to act as a benchmark of model performance in a validation setting.
Abstract: Multivariable prognostic models are widely used in cancer and other disease areas, and have a range of applications in clinical medicine, clinical trials and allocation of health services resources. A well-founded and reliable measure of the prognostic ability of a model would be valuable to help define the separation between patients or prognostic groups that the model could provide, and to act as a benchmark of model performance in a validation setting. We propose such a measure for models of survival data. Its motivation derives originally from the idea of separation between Kaplan-Meier curves. We define the criteria for a successful measure and discuss them with respect to our approach. Adjustments for 'optimism', the tendency for a model to predict better on the data on which it was derived than on new data, are suggested. We study the properties of the measure by simulation and by example in three substantial data sets. We believe that our new measure will prove useful as a tool to evaluate the separation available with a prognostic model.

Journal ArticleDOI
TL;DR: This tutorial is designed to synthesize and illustrate the broad array of techniques that are used to address outcome‐related drop‐out, with emphasis on regression‐based methods.
Abstract: Drop-out is a prevalent complication in the analysis of data from longitudinal studies, and remains an active area of research for statisticians and other quantitative methodologists. This tutorial is designed to synthesize and illustrate the broad array of techniques that are used to address outcome-related drop-out, with emphasis on regression-based methods. We begin with a review of important assumptions underlying likelihood-based and semi-parametric models, followed by an overview of models and methods used to draw inferences from incomplete longitudinal data. The majority of the tutorial is devoted to detailed analysis of two studies with substantial rates of drop-out, designed to illustrate the use of effective methods that are relatively easy to apply: in the first example, we use both semi-parametric and fully parametric models to analyse repeated binary responses from a clinical trial of smoking cessation interventions; in the second, pattern mixture models are used to analyse longitudinal CD4 counts from an observational cohort study of HIV-infected women. In each example, we describe exploratory analyses, model formulation, estimation methodology and interpretation of results. Analysis of incomplete data requires making unverifiable assumptions, and these are discussed in detail within the context of each application. Relevant SAS code is provided.

Journal ArticleDOI
TL;DR: A transmission model was applied to a data set consisting of the follow‐up of influenza symptoms in 334 households during 15 days after an index case visited a general practitioner with virologically confirmed influenza.
Abstract: We propose a transmission model to estimate the main characteristics of influenza transmission in households. The model details the risks of infection in the household and in the community at the individual scale. Heterogeneity among subjects is investigated considering both individual susceptibility and infectiousness. The model was applied to a data set consisting of the follow-up of influenza symptoms in 334 households during 15 days after an index case visited a general practitioner with virologically confirmed influenza. Estimating the parameters of the transmission model was challenging because a large part of the infectious process was not observed: only the dates when new cases were detected were observed. For each case, the data were augmented with the unobserved dates of the start and the end of the infectious period. The transmission model was included in a 3-level hierarchical structure: (i) the observation level ensured that the augmented data were consistent with the observed data, (ii) the transmission level described the underlying epidemic process, (iii) the prior level specified the distribution of the parameters. From a Bayesian perspective, the joint posterior distribution of model parameters and augmented data was explored by Markov chain Monte Carlo (MCMC) sampling. The mean duration of the influenza infectious period was estimated at 3.8 days (95 per cent credible interval, 95 per cent CI [3.1,4.6]) with a standard deviation of 2.0 days (95 per cent CI [1.1,2.8]). The instantaneous risk of influenza transmission between an infective and a susceptible within a household was found to decrease with the size of the household, and established at 0.32 person day^(-1) (95 per cent CI [0.26,0.39]); the instantaneous risk of infection from the community was 0.0056 day^(-1) (95 per cent CI [0.0029,0.0087]). Focusing on the differences in transmission between children (less than 15 years old) and adults, we estimated that the former were more likely to transmit than adults (posterior probability larger than 99 per cent), but that the mean duration of the infectious period was similar in children (3.6 days, 95 per cent CI [2.3,5.2]) and adults (3.9 days, 95 per cent CI [3.2,4.9]). The posterior probability that children had a larger community risk was 76 per cent and the posterior probability that they were more susceptible than adults was 79 per cent.

Journal ArticleDOI
TL;DR: The LMSP method of centile estimation is applied to modelling the body mass index of Dutch males against age by modelling each of the four parameters of the BCPE distribution as a smooth non-parametric function of an explanatory variable.
Abstract: The Box–Cox power exponential (BCPE) distribution, developed in this paper, provides a model for a dependent variable Y exhibiting both skewness and kurtosis (leptokurtosis or platykurtosis). The distribution is defined by a power transformation Y^ν having a shifted and scaled (truncated) standard power exponential distribution with parameter τ. The distribution has four parameters and is denoted BCPE(µ,σ,ν,τ). The parameters µ, σ, ν and τ may be interpreted as relating to location (median), scale (approximate coefficient of variation), skewness (transformation to symmetry) and kurtosis (power exponential parameter), respectively. Smooth centile curves are obtained by modelling each of the four parameters of the distribution as a smooth non-parametric function of an explanatory variable. A Fisher scoring algorithm is used to fit the non-parametric model by maximizing a penalized likelihood. The first and expected second and cross derivatives of the likelihood, with respect to µ, σ, ν and τ, required for the algorithm, are provided. The centiles of the BCPE distribution are easy to calculate, so it is highly suited to centile estimation. This application of the BCPE distribution to smooth centile estimation provides a generalization of the LMS method of centile estimation to data exhibiting kurtosis (as well as skewness) different from that of a normal distribution and is named here the LMSP method of centile estimation. The LMSP method of centile estimation is applied to modelling the body mass index of Dutch males against age. Copyright © 2004 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The ability of the approach to detect and display treatment/covariate interactions in two examples from controlled trials in cancer is demonstrated.
Abstract: We consider modelling interaction between a categoric covariate T and a continuous covariate Z in a regression model. Here T represents the two treatment arms in a parallel-group clinical trial and Z is a prognostic factor which may influence response to treatment (known as a predictive factor). Generalization to more than two treatments is straightforward. The usual approach to analysis is to categorize Z into groups according to cutpoint(s) and to analyse the interaction in a model with main effects and multiplicative terms. The cutpoint approach raises several well-known and difficult issues for the analyst. We propose an alternative approach based on fractional polynomial (FP) modelling of Z in all patients and at each level of T. Other prognostic variables can also be incorporated by first constructing a multivariable adjustment model which may contain binary covariates and FP transformations of continuous covariates other than Z. The main step involves FP modelling of Z and testing equality of regression coefficients between treatment groups in an interaction model adjusted for other covariates. Extensive experience suggests that a two-term fractional polynomial (FP2) function may describe the effect of a prognostic factor on a survival outcome quite well. In a controlled trial, this FP2 function describes the prognostic effect averaged over the treatment groups. We refit this function in each treatment group to see if there are substantial differences between groups. Allowing different parameter values for the chosen FP2 function is flexible enough to detect such differences. Within the same algorithm we can also deal with the conceptually different cases of a predefined hypothesis of interaction or searching for interactions. We demonstrate the ability of the approach to detect and display treatment/covariate interactions in two examples from controlled trials in cancer.

Journal ArticleDOI
TL;DR: A new method to aggregate survival trees in order to obtain better predictions for breast cancer and lymphoma patients is suggested and the aggregated Kaplan–Meier curve of a new observation is defined by the Kaplan–Meier curve of all observations identified by the B leaves containing the new observation.
Abstract: Predicted survival probability functions of censored event free survival are improved by bagging survival trees. We suggest a new method to aggregate survival trees in order to obtain better predictions for breast cancer and lymphoma patients. A set of survival trees based on B bootstrap samples is computed. We define the aggregated Kaplan-Meier curve of a new observation by the Kaplan-Meier curve of all observations identified by the B leaves containing the new observation. The integrated Brier score is used for the evaluation of predictive models. We analyse data of a large trial on node positive breast cancer patients conducted by the German Breast Cancer Study Group and a smaller 'pilot' study on diffuse large B-cell lymphoma, where prognostic factors are derived from microarray expression values. In addition, simulation experiments underline the predictive power of our proposal.

Journal ArticleDOI
TL;DR: The definition of ROC surfaces here is less complex and addresses directly the narrower problem of ordered categories in the three-class and, by extension, the multi-class problem applied to continuous and ordinal data.
Abstract: Receiver operating characteristic (ROC) curves have been useful in two-group classification problems. In three- and multiple-class diagnostic problems, an ROC surface or hyper-surface can be constructed. The volume under these surfaces can be used for inference using bootstrap techniques or U-statistics theory. In this article, ROC surfaces and hyper-surfaces are defined and their behaviour and utility in multi-group classification problems is investigated. The formulation of the problem is equivalent to what has previously been proposed in the general multi-category classification problem but the definition of ROC surfaces here is less complex and addresses directly the narrower problem of ordered categories in the three-class and, by extension, the multi-class problem applied to continuous and ordinal data. Non-parametric manipulation of both continuous and discrete test data and comparison between two diagnostic tests applied to the same subjects are considered. A three-group classification example in the context of HIV neurological disease is presented and the results are discussed.
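For three ordered classes, the simplest non-parametric estimate of the volume under the ROC surface is the proportion of triples (one measurement from each class) that are correctly ordered, with a chance level of 1/6. The sketch below uses simulated continuous test values; it is an illustration of that estimator, not the paper's full inference machinery.

```python
# Empirical volume under a three-class ROC surface for ordered categories.
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
x1 = rng.normal(0.0, 1.0, 40)   # class 1 (e.g. least affected)
x2 = rng.normal(1.0, 1.0, 35)   # class 2 (intermediate)
x3 = rng.normal(2.0, 1.0, 30)   # class 3 (most affected)

def vus(a, b, c):
    """Empirical P(A < B < C); ties have probability zero for continuous data."""
    total = correct = 0
    for u, v, w in product(a, b, c):
        total += 1
        correct += (u < v < w)
    return correct / total

print("estimated VUS:", round(vus(x1, x2, x3), 3))
print("chance level for three ordered classes: 1/6 = 0.167")
```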

Journal ArticleDOI
TL;DR: A method for estimating the floated variances is presented that improves on the previously proposed 'heuristic' method and may be calculated iteratively with a simple algorithm.
Abstract: Floating absolute risks are an alternative way of presenting relative risk estimates for polychotomous risk factors. Instead of choosing one level of the risk factor as a reference category, each level is assigned a 'floated' variance which describes the uncertainty in risk without reference to another level. In this paper, a method for estimating the floated variances is presented that improves on the previously proposed 'heuristic' method. The estimates may be calculated iteratively with a simple algorithm. A benchmark for validating the floated variance estimates is also proposed and an interpretation of floating confidence intervals is given.

Journal ArticleDOI
TL;DR: The methods to assess sensitivity of the analysis of Hernán et al., who used an MSM to estimate the causal effect of zidovudine therapy on repeated CD4 counts among HIV-infected men in the Multicenter AIDS Cohort Study, show that under the assumption of a moderate amount of unmeasured confounding, a 95% confidence interval for the treatment effect no longer includes zero.
Abstract: Robins introduced marginal structural models (MSMs) and inverse probability of treatment weighted (IPTW) estimators for the causal effect of a time-varying treatment on the mean of repeated measures. We investigate the sensitivity of IPTW estimators to unmeasured confounding. We examine a new framework for sensitivity analyses based on a nonidentifiable model that quantifies unmeasured confounding in terms of a sensitivity parameter and a user-specified function. We present augmented IPTW estimators of MSM parameters and prove their consistency for the causal effect of an MSM, assuming a correct confounding bias function for unmeasured confounding. We apply the methods to assess sensitivity of the analysis of Hernan et al., who used an MSM to estimate the causal effect of zidovudine therapy on repeated CD4 counts among HIV-infected men in the Multicenter AIDS Cohort Study. Under the assumption of no unmeasured confounders, a 95 per cent confidence interval for the treatment effect includes zero. We show that under the assumption of a moderate amount of unmeasured confounding, a 95 per cent confidence interval for the treatment effect no longer includes zero. Thus, the analysis of Hernan et al. is somewhat sensitive to unmeasured confounding. We hope that our research will encourage and facilitate analyses of sensitivity to unmeasured confounding in other applications.

Journal ArticleDOI
TL;DR: It is found that the inflation of the type I error rate increases with increasing sample size, as the correlation between the risk factor and the confounding variable increases, and with a decrease in the number of categories into which the confounder is divided.
Abstract: This paper demonstrates an inflation of the type I error rate that occurs when testing the statistical significance of a continuous risk factor after adjusting for a correlated continuous confounding variable that has been divided into a categorical variable. We used Monte Carlo simulation methods to assess the inflation of the type I error rate when testing the statistical significance of a risk factor after adjusting for a continuous confounding variable that has been divided into categories. We found that the inflation of the type I error rate increases with increasing sample size, as the correlation between the risk factor and the confounding variable increases, and with a decrease in the number of categories into which the confounder is divided. Even when the confounder is divided in a five-level categorical variable, the inflation of the type I error rate remained high when both the sample size and the correlation between the risk factor and the confounder were high.
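A small Monte Carlo sketch of the phenomenon reported above: the outcome depends only on a continuous confounder, and the correlated risk factor is tested after adjustment for a five-category version of that confounder. The sample size, correlation and effect sizes are illustrative choices, not the paper's simulation settings.

```python
# Empirical type I error for a risk factor X after adjusting for a categorized confounder C.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)

def rejection_rate(n=1000, rho=0.7, n_cat=5, n_sim=500, alpha=0.05):
    rejections = 0
    for _ in range(n_sim):
        c = rng.normal(size=n)
        x = rho * c + np.sqrt(1 - rho**2) * rng.normal(size=n)  # corr(X, C) = rho
        y = 1.0 * c + rng.normal(size=n)                        # Y depends on C only
        cuts = np.quantile(c, np.linspace(0, 1, n_cat + 1)[1:-1])
        cat = np.digitize(c, cuts)
        dummies = np.eye(n_cat)[cat][:, 1:]                     # drop reference level
        X = sm.add_constant(np.column_stack([x, dummies]))
        p_x = sm.OLS(y, X).fit().pvalues[1]                     # test for X
        rejections += p_x < alpha
    return rejections / n_sim

print("empirical type I error for X:", rejection_rate())
```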

Journal ArticleDOI
TL;DR: This approach compares favourably with competing methods and appears minimally affected by violation of the assumption of a gamma-distributed frailty.
Abstract: In randomized clinical trials, subjects are recruited at multiple study centres. Factors that vary across centres may exert a powerful independent influence on study outcomes. A common problem is how to incorporate these centre effects into the analysis of censored time-to-event data. We survey various methods and find substantial advantages in the gamma frailty model. This approach compares favourably with competing methods and appears minimally affected by violation of the assumption of a gamma-distributed frailty. Recent computational advances make use of the gamma frailty model a practical and appealing tool for addressing centre effects in the analysis of multicentre trials.

Journal ArticleDOI
TL;DR: This paper shows that increasing the sample size when the unblinded interim result is promising will not inflate the type I error rate, so no statistical adjustment is necessary; in the group sequential setting considered, the type I error rate is also well controlled.
Abstract: Increasing the sample size based on unblinded interim result may inflate the type I error rate and appropriate statistical adjustments may be needed to control the type I error rate at the nominal level. We briefly review the existing approaches which allow early stopping due to futility, or change the test statistic by using different weights, or adjust the critical value for final test, or enforce rules for sample size recalculation. The implication of early stopping due to futility and a simple modification to the weighted Z-statistic approach are discussed. In this paper, we show that increasing the sample size when the unblinded interim result is promising will not inflate the type I error rate and therefore no statistical adjustment is necessary. The unblinded interim result is considered promising if the conditional power is greater than 50 per cent or equivalently, the sample size increment needed to achieve a desired power does not exceed an upper bound. The actual sample size increment may be determined by important factors such as budget, size of the eligible patient population and competition in the market. The 50 per cent-conditional-power approach is extended to a group sequential trial with one interim analysis where a decision may be made at the interim analysis to stop the trial early due to a convincing treatment benefit, or to increase the sample size if the interim result is not as good as expected. The type I error rate will not be inflated if the sample size may be increased only when the conditional power is greater than 50 per cent. If there are two or more interim analyses in a group sequential trial, our simulation study shows that the type I error rate is also well controlled.
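The 50 per cent conditional-power rule can be evaluated with the standard Brownian-motion (B-value) formula for conditional power under the current trend. The interim z-value and information fraction below are hypothetical, and the formula is the generic one rather than a quotation from the paper.

```python
# Conditional power under the current trend at an interim analysis.
from scipy.stats import norm

def conditional_power_current_trend(z_interim, info_frac, alpha_one_sided=0.025):
    """CP given the interim z-value and information fraction, assuming the current trend continues."""
    z_alpha = norm.ppf(1 - alpha_one_sided)
    theta_hat = z_interim / info_frac**0.5          # estimated drift (current trend)
    b_value = z_interim * info_frac**0.5            # B(t) = sqrt(t) * Z(t)
    mean_final = b_value + theta_hat * (1 - info_frac)
    return 1 - norm.cdf((z_alpha - mean_final) / (1 - info_frac) ** 0.5)

cp = conditional_power_current_trend(z_interim=1.8, info_frac=0.5)
print(f"conditional power at the interim: {cp:.2f}")
print("sample size may be increased under the rule above" if cp > 0.5
      else "interim result is not promising under the 50 per cent rule")
```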

Journal ArticleDOI
TL;DR: A longitudinal study on maternal stress, child illness and maternal employment illustrates concepts such as marginalized latent variable models and marginalized transition models, which allow for simple procedures to determine a suitable dependence model for the data.
Abstract: … regression structure may be the primary focus. The marginalized latent variable models allow a flexible choice between modelling the marginal means or the conditional means. The marginalized transition models separate the dependence on the exposure variables from the dependence on previous response values. Orthogonality properties between the mean and the dependence parameters in a marginalized model secure robustness for the marginal means. Marginalized models further allow for simple procedures to determine a suitable dependence model for the data. Chapter 12 on time-dependent covariates is also new. The temporal order between key exposure and response events is emphasized and exogenous and endogenous covariates are formally defined. When covariates are endogenous, then meaningful targets for inference need to be formulated as well as valid methods of estimation. A longitudinal study on maternal stress, child illness and maternal employment illustrates concepts. The scientific questions include (i) Is there an association between maternal employment and stress? (ii) Is there an association between maternal employment and child illness? (iii) Do the data provide evidence that maternal stress causes child illness? Since stress may be in the causal pathway that leads from employment to illness, no adjustment is made for the daily stress indicators when evaluating the dependence of illness on employment. Similarly, no adjustment is made for illness in the analysis of employment and stress. Question (iii) raises issues such as 'does illness at day t depend on prior stress measured at day (t − k)' and 'does illness on day (t − k) predict stress on day t'. A covariate which is both a predictor for the response and is predicted by earlier responses is endogenous. No standard regression methods are available to obtain causal statements when dealing with endogenous covariates. Targets for inference are discussed in terms of counterfactual outcomes. Causal effects refer to interventions in the entire population rather than among possibly select, observed subgroups. Focus is on an average response after assignment of the covariate value rather than the average response in subgroups after simply observing the covariate status. The g-computation algorithm of Robins is presented as well as estimation using inverse probability of treatment weighting (IPTW). Chapter 13 discusses approaches to dealing with incomplete data in longitudinal studies, with emphasis on random and informative missing data mechanisms. Likelihood inference and generalized estimating equations when data are missing at random are dealt with. Selection models and pattern mixture models are …


Journal ArticleDOI
TL;DR: A nested case-control sample from the Physicians' Health Study, a randomized trial assessing the effects of aspirin and beta-carotene on cardiovascular disease and cancer among 22,071 US male physicians, was used to examine gene-gene interactions for ischemic stroke.
Abstract: In the biology of complex disorders, such as atherothrombosis, interactions among genetic factors may play an important role, and theoretical considerations suggest that gene-gene interactions are quite common in such diseases. We used a nested case-control sample from the Physicians' Health Study, a randomized trial assessing the effects of aspirin and beta-carotene on cardiovascular disease and cancer among 22071 US male physicians, to examine these relationships for ischemic stroke. Data were available on 92 polymorphisms from 56 candidate genes related to inflammation, thrombosis and lipid metabolism, assessed in 319 incident cases of ischemic stroke and 2090 disease-free controls. We used classification and regression trees (CART) and multivariate adaptive regression spline (MARS) models to explore the presence of genetic interactions in these data. These models offer advantages over typical logistic regression methods in that they may uncover interactions among genes that do not exhibit strong marginal effects. Final models were selected using either the Bayes Information Criterion or cross-validation. Model fit was assessed using 10-fold cross-validation of the entire selection process. Both the CART and two-way MARS-logit models identified an interaction between two polymorphisms linked to inflammation, the P-selectin (val640leu) and interleukin-4 (C(582) T) genes. Internal validation of these models, however, suggested that effects of these polymorphisms are additive. Although further external validation of these models is necessary, these methods may be valuable in exploring and identifying potential gene-gene as well as gene-environment interactions in association studies.

Journal ArticleDOI
TL;DR: Analysis of DIF in the Cognitive Assessment Screening Instrument (CASI) using data from a large cohort study of elderly adults is presented and an ordinal logistic regression modelling technique to assess test items for DIF is developed.
Abstract: Assessment of test bias is important to establish the construct validity of tests. Assessment of differential item functioning (DIF) is an important first step in this process. DIF is present when examinees from different groups have differing probabilities of success on an item, after controlling for overall ability level. Here, we present analysis of DIF in the Cognitive Assessment Screening Instrument (CASI) using data from a large cohort study of elderly adults. We developed an ordinal logistic regression modelling technique to assess test items for DIF. Estimates of cognitive ability were obtained in two ways based on responses to CASI items: using traditional CASI scoring according to the original test instructions as well as using item response theory (IRT) scoring. Several demographic characteristics were examined for potential DIF, including ethnicity and gender (entered into the model as dichotomous variables), and years of education and age (entered as continuous variables). We found that a disappointingly large number of items had DIF with respect to at least one of these demographic variables. More items were found to have DIF with traditional CASI scoring than with IRT scoring. This study demonstrates a powerful technique for the evaluation of DIF in psychometric tests. The finding that so many CASI items had DIF suggests that previous findings of differences between groups in cognitive functioning as measured by the CASI may be due to biased test items rather than true differences between groups. The finding that IRT scoring diminished the impact of DIF is discussed. Some preliminary suggestions for how to deal with items found to have DIF in cognitive tests are made. The advantages of the DIF detection techniques we developed are discussed in relation to other techniques for the evaluation of DIF. Copyright © 2004 John Wiley & Sons, Ltd.
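A simplified sketch of regression-based DIF testing, using a binary item and binary logistic regression for brevity (the paper's technique is the ordinal analogue): compare a model with ability alone against one adding group and group-by-ability terms via a likelihood-ratio test. The data are simulated, and the variable names are illustrative.

```python
# Regression-based DIF check for a single (binary) item via a likelihood-ratio test.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(5)
n = 2000
ability = rng.normal(size=n)
group = rng.binomial(1, 0.5, size=n)              # e.g. two demographic strata
# Simulate an item with uniform DIF: group 1 finds the item easier at equal ability
p = 1 / (1 + np.exp(-(0.2 + 1.2 * ability + 0.6 * group)))
item = rng.binomial(1, p)

X0 = sm.add_constant(ability)                                     # ability only
X1 = sm.add_constant(np.column_stack([ability, group,
                                      ability * group]))          # + DIF terms
m0 = sm.Logit(item, X0).fit(disp=0)
m1 = sm.Logit(item, X1).fit(disp=0)

lr = 2 * (m1.llf - m0.llf)
p_value = chi2.sf(lr, df=2)          # 2 extra parameters: uniform + non-uniform DIF
print(f"LR statistic = {lr:.1f}, p = {p_value:.4f}")
```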

Journal ArticleDOI
TL;DR: A family of two-stage designs that are admissible according to a Bayesian decision-theoretic criterion based on an ethically justifiable loss function is developed and shown to include as special cases Simon's minimax and optimal designs.
Abstract: In a typical two-stage design for a phase II cancer clinical trial for efficacy screening of cytotoxic agents, a fixed number of patients are initially enrolled and treated. The trial may be terminated for lack of efficacy if the observed number of tumour responses after the first stage is too small, thus avoiding treatment of patient with inefficacious regimen. Otherwise, an additional fixed number of patients are enrolled and treated to accumulate additional information on efficacy as well as safety. The minimax and the so-called 'optimal' designs by Simon have been widely used, and other designs have largely been ignored in the past for such two-stage cancer clinical trials. Recently Jung et al. proposed a graphical method to search for compromise designs with features more favourable than either the minimax or the optimal design. In this paper, we develop a family of two-stage designs that are admissible according to a Bayesian decision-theoretic criterion based on an ethically justifiable loss function. We show that the admissible designs include as special cases the Simon's minimax and the optimal designs as well as the compromise designs introduced by Jung et al. We also present a Java program to search for admissible designs that are compromises between the minimax and the optimal designs.