
Showing papers in "Biometrical Journal in 2015"


Journal ArticleDOI
TL;DR: It is argued and demonstrated in this paper that the occurrence of zero events in clinical trials or cohort studies, even if zeros occur in both arms, is less problematic, at least from a statistical perspective, if the available statistical tools are applied in the appropriate way.
Abstract: Meta-analysis of rare event studies has recently become a subject of controversy and debate. We will argue and demonstrate in this paper that the occurrence of zero events in clinical trials or cohort studies, even if zeros occur in both arms (the case of a double-zero trial), is less problematic, at least from a statistical perspective, if the available statistical tools are applied in the appropriate way. In particular, it is neither necessary nor advisable to exclude studies with zero events from the meta-analysis. In terms of statistical tools, we will focus here on Mantel-Haenszel techniques, mixed Poisson regression and related regression models.

63 citations
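To make the Mantel-Haenszel idea concrete, the minimal numpy sketch below computes the pooled odds ratio by hand for a few hypothetical 2x2 tables, including a single-zero and a double-zero trial; the trial counts are invented and the (standard) variance formula is omitted, so this is an illustration of one tool mentioned in the abstract, not the paper's analysis.

```python
import numpy as np

# Hypothetical trials: (events_trt, n_trt, events_ctl, n_ctl),
# including a single-zero and a double-zero trial (invented numbers).
trials = [
    (1, 120, 4, 118),
    (0, 250, 3, 245),   # zero events in the treatment arm
    (0,  90, 0,  92),   # double-zero trial
    (2, 300, 7, 310),
]

num, den = 0.0, 0.0
for a, n1, c, n2 in trials:
    b, d, N = n1 - a, n2 - c, n1 + n2
    num += a * d / N    # Mantel-Haenszel numerator term
    den += b * c / N    # Mantel-Haenszel denominator term

or_mh = num / den
print(f"Mantel-Haenszel pooled OR: {or_mh:.3f}")
```

No continuity correction is needed here: single-zero trials still contribute to one of the sums, and a double-zero trial adds zero to both sums and therefore does not distort the pooled odds ratio.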


Journal ArticleDOI
TL;DR: The European Medicines Agency's (EMA) draft policy on proactive access to clinical trial data was published at the end of June 2013 and was open for public consultation until the end of September 2013, as mentioned in this paper.
Abstract: In recent months one of the most controversially discussed topics among regulatory agencies, the pharmaceutical industry, journal editors, and academia has been the sharing of patient-level clinical trial data. Several projects have been started such as the European Medicines Agency's (EMA) “proactive publication of clinical trial data”, the BMJ open data campaign, or the AllTrials initiative. The executive director of the EMA, Dr. Guido Rasi, has recently announced that clinical trial data on patient level will be published from 2014 onwards (although it has since been delayed). The EMA draft policy on proactive access to clinical trial data was published at the end of June 2013 and open for public consultation until the end of September 2013. These initiatives will change the landscape of drug development and publication of medical research. They provide unprecedented opportunities for research and research synthesis, but pose new challenges for regulatory authorities, sponsors, scientific journals, and the public. Besides these general aspects, data sharing also entails intricate biostatistical questions such as problems of multiplicity. An important issue in this respect is the interpretation of multiple statistical analyses, both prospective and retrospective. Expertise in biostatistics is needed to assess the interpretation of such multiple analyses, for example, in the context of regulatory decision-making by optimizing procedural guidance and sophisticated analysis methods.

61 citations


Journal ArticleDOI
TL;DR: It is determined that measures of ideal model performance can be estimated within imputed datasets and subsequently pooled to give an overall measure of model performance.
Abstract: Multiple imputation can be used as a tool in the process of constructing prediction models in medical and epidemiological studies with missing covariate values. Such models can be used to make predictions for model performance assessment, but the task is made more complicated by the multiple imputation structure. We summarize various predictions constructed from covariates, including multiply imputed covariates, and either the set of imputation-specific prediction model coefficients or the pooled prediction model coefficients. We further describe approaches for using the predictions to assess model performance. We distinguish between ideal model performance and pragmatic model performance, where the former refers to the model's performance in an ideal clinical setting where all individuals have fully observed predictors and the latter refers to the model's performance in a real-world clinical setting where some individuals have missing predictors. The approaches are compared through an extensive simulation study based on the UK700 trial. We determine that measures of ideal model performance can be estimated within imputed datasets and subsequently pooled to give an overall measure of model performance. Alternative methods to evaluate pragmatic model performance are required and we propose constructing predictions either from a second set of covariate imputations which make no use of observed outcomes, or from a set of partial prediction models constructed for each potential observed pattern of covariates. Pragmatic model performance is generally lower than ideal model performance. We focus on model performance within the derivation data, but describe how to extend all the methods to a validation dataset.

55 citations
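The pooling idea in this abstract can be sketched in a few lines: generate several stochastic imputations, fit the prediction model and compute a performance measure (here the apparent AUC) within each completed dataset, and average across imputations. The Python sketch below uses simulated data and scikit-learn's IterativeImputer as a stand-in for a proper imputation model; it is not the UK700-based analysis of the paper, and a full Rubin's-rules treatment of the variance is omitted.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n, p = 500, 4
X = rng.normal(size=(n, p))
y = (rng.random(n) < 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))).astype(int)

# Make ~20% of one covariate missing at random.
X_miss = X.copy()
X_miss[rng.random(n) < 0.2, 1] = np.nan

m = 10          # number of imputations
aucs = []
for i in range(m):
    # sample_posterior=True gives stochastic (proper) imputations.
    imp = IterativeImputer(sample_posterior=True, random_state=i)
    X_i = imp.fit_transform(X_miss)
    model = LogisticRegression().fit(X_i, y)
    pred = model.predict_proba(X_i)[:, 1]
    aucs.append(roc_auc_score(y, pred))   # "ideal" apparent performance per imputation

print(f"AUC per imputation: {np.round(aucs, 3)}")
print(f"Pooled (averaged) AUC: {np.mean(aucs):.3f}")
```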


Journal ArticleDOI
TL;DR: It is demonstrated that outside of the classic model there are practically reasonable ROC types for which comparisons of noncrossing concave curves would be more powerful when based on a part of the curve rather than the entire curve, and it is argued that this phenomenon stems in part from the exclusion of noninformative parts of the ROC curves that resemble straight‐lines.
Abstract: Evaluation of diagnostic performance is typically based on the receiver operating characteristic (ROC) curve and the area under the curve (AUC) as its summary index. The partial area under the curve (pAUC) is an alternative index focusing on the range of practical/clinical relevance. One of the problems preventing more frequent use of the pAUC is the perceived loss of efficiency in cases of noncrossing ROC curves. In this paper, we investigated statistical properties of comparisons of two correlated pAUCs. We demonstrated that outside of the classic model there are practically reasonable ROC types for which comparisons of noncrossing concave curves would be more powerful when based on a part of the curve rather than the entire curve. We argue that this phenomenon stems in part from the exclusion of noninformative parts of the ROC curves that resemble straight lines. We conducted extensive simulation studies in families of binormal, straight-line, and bigamma ROC curves. We demonstrated that comparison of pAUCs is statistically more powerful than comparison of full AUCs when ROC curves are close to a "straight line". For less flat binormal ROC curves an increase in the integration range often leads to a disproportional increase in pAUCs' difference, thereby contributing to an increase in statistical power. Thus, efficiency of differences in pAUCs of noncrossing ROC curves depends on the shape of the curves, and for families of ROC curves that are nearly straight-line shaped, such as bigamma ROC curves, there are multiple practical scenarios in which comparisons of pAUCs are preferable.

55 citations
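Readers who want to experiment with partial AUCs can use scikit-learn, which exposes the standardized partial AUC over a restricted false-positive range through the max_fpr argument of roc_auc_score. The simulated binormal-type scores below are purely illustrative and are not the simulation design of the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
y = np.r_[np.zeros(n), np.ones(n)]
# Binormal-type scores: cases shifted upwards relative to controls.
scores = np.r_[rng.normal(0.0, 1.0, n), rng.normal(0.8, 1.0, n)]

full_auc = roc_auc_score(y, scores)
# Standardized partial AUC restricted to false-positive rates below 0.2
# (McClish correction), as implemented by scikit-learn.
pauc_02 = roc_auc_score(y, scores, max_fpr=0.2)

print(f"Full AUC:                     {full_auc:.3f}")
print(f"Standardized pAUC (FPR<=0.2): {pauc_02:.3f}")
```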


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a method for sparse CCA, which combines an alternating regression approach together with a lasso penalty to induce sparsity in the canonical vectors, thereby increasing the interpretability of the canonical variates.
Abstract: Canonical correlation analysis (CCA) describes the associations between two sets of variables by maximizing the correlation between linear combinations of the variables in each dataset. However, in high-dimensional settings where the number of variables exceeds the sample size or when the variables are highly correlated, traditional CCA is no longer appropriate. This paper proposes a method for sparse CCA. Sparse estimation produces linear combinations of only a subset of variables from each dataset, thereby increasing the interpretability of the canonical variates. We consider the CCA problem from a predictive point of view and recast it into a regression framework. By combining an alternating regression approach together with a lasso penalty, we induce sparsity in the canonical vectors. We compare the performance with other sparse CCA techniques in different simulation settings and illustrate its usefulness on a genomic dataset.

47 citations
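A minimal sketch of the alternating-regression idea with a lasso penalty is given below. It is a simplified illustration under several assumptions (fixed penalty, no cross-validation, a simple unit-variance rescaling of the canonical variates, simulated data with one shared latent signal) and is not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
n, p, q = 500, 30, 25
latent = rng.normal(size=n)                                   # shared signal
X = np.outer(latent, np.r_[np.ones(3), np.zeros(p - 3)]) + rng.normal(size=(n, p))
Y = np.outer(latent, np.r_[np.ones(2), np.zeros(q - 2)]) + rng.normal(size=(n, q))

def sparse_cca(X, Y, alpha=0.2, n_iter=25):
    """Alternate lasso regressions of each block on the other block's canonical variate."""
    v = np.ones(Y.shape[1]) / np.sqrt(Y.shape[1])             # simple starting vector
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = Lasso(alpha=alpha, max_iter=50_000).fit(X, Y @ v).coef_   # sparse update of u
        if not u.any():
            break
        u = u / (X @ u).std()                                 # keep the X-variate on a fixed scale
        v = Lasso(alpha=alpha, max_iter=50_000).fit(Y, X @ u).coef_   # sparse update of v
        if not v.any():
            break
        v = v / (Y @ v).std()
    return u, v

u, v = sparse_cca(X, Y)
print("nonzero loadings in u:", np.flatnonzero(u))
print("nonzero loadings in v:", np.flatnonzero(v))
print("estimated canonical correlation:", round(float(np.corrcoef(X @ u, Y @ v)[0, 1]), 3))
```

Only the covariates carrying the shared latent signal should receive nonzero loadings, which is the interpretability gain the abstract describes.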


Journal ArticleDOI
TL;DR: It is proposed to quantify risks with utility functions and investigate nonadaptive study designs that allow for inference on subgroups using multiple testing procedures as well as adaptive designs, where subgroups may be selected in an interim analysis.
Abstract: If the response to treatment depends on genetic biomarkers, it is important to identify predictive biomarkers that define (sub-)populations where the treatment has a positive benefit-risk balance. One approach to determining relevant subpopulations is subgroup analysis, where the treatment effect is estimated in biomarker-positive and biomarker-negative groups. Subgroup analyses are challenging because several types of risks are associated with inference on subgroups. On the one hand, by disregarding a relevant subpopulation a treatment option may be missed due to a dilution of the treatment effect in the full population. Furthermore, even if the diluted treatment effect can be demonstrated in an overall population, it is not ethical to treat patients who do not benefit from the treatment when they can be identified in advance. On the other hand, selecting a spurious subpopulation increases the risk of restricting an efficacious treatment to too narrow a fraction of the potentially benefiting population. We propose to quantify these risks with utility functions and investigate nonadaptive study designs that allow for inference on subgroups using multiple testing procedures as well as adaptive designs, where subgroups may be selected in an interim analysis. The characteristics of such adaptive and nonadaptive designs are compared for a range of scenarios.

45 citations


Journal ArticleDOI
TL;DR: Bootstrap resampling will be used to assess variable selection stability, to derive a predictor that incorporates model uncertainty, to check for influential points, and to visualize the variable selection process.
Abstract: In many areas of science where empirical data are analyzed, a task is often to identify important variables with influence on an outcome. Most often this is done by using a variable selection strategy in the context of a multivariable regression model. Using a study on ozone effects in children (n = 496, 24 covariates), we will discuss aspects relevant for deriving a suitable model. With an emphasis on model stability, we will explore and illustrate differences between predictive models and explanatory models, the key role of stopping criteria, and the value of bootstrap resampling (with and without replacement). Bootstrap resampling will be used to assess variable selection stability, to derive a predictor that incorporates model uncertainty, to check for influential points, and to visualize the variable selection process. For the latter two tasks we adapt and extend recent approaches, such as stability paths, to serve our purposes. Based on earlier experiences and on results from the example, we will argue for simpler models and that predictions are usually very similar, irrespective of the selection method used. Important differences exist for the corresponding variances, and the model uncertainty concept helps to protect against serious underestimation of the variance of a predictor derived data-dependently. Results of stability investigations illustrate severe difficulties in the task of deriving a suitable explanatory model. It seems possible to identify a small number of variables with an important and probably true influence on the outcome, but too often several variables are included whose selection may be a result of chance or may depend on a small number of observations.

45 citations
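The bootstrap-based stability check described in this abstract can be illustrated with a short sketch: refit a variable selection procedure on bootstrap resamples and tabulate how often each covariate is selected. The sketch below uses a lasso (via cross-validation) as a stand-in for the selection strategies discussed in the paper, and the data and settings are simulated and purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(7)
n, p = 300, 12
X = rng.normal(size=(n, p))
# Only the first three covariates truly influence the outcome.
y = X[:, 0] + 0.7 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(size=n)

B = 100
selected = np.zeros(p)
for b in range(B):
    idx = rng.integers(0, n, n)                 # bootstrap resample (with replacement)
    model = LassoCV(cv=5).fit(X[idx], y[idx])   # "selection" = nonzero lasso coefficients
    selected += model.coef_ != 0

inclusion_freq = selected / B
for j, f in enumerate(inclusion_freq):
    print(f"x{j:02d}: selected in {f:5.1%} of bootstrap resamples")
```

Covariates whose inclusion frequency hovers near the middle of the range are exactly the ones whose selection, as the abstract puts it, may be a result of chance or may depend on a small number of observations.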


Journal ArticleDOI
TL;DR: An EM (expectation‐maximization) algorithm is proposed for estimating the model parameters and conducting variable selection simultaneously and shows that the new algorithm not only has more accurate or at least comparable estimation, but also is more robust than the traditional stepwise variable selection.
Abstract: In health services and outcome research, count outcomes are frequently encountered and often have a large proportion of zeros. The zero-inflated negative binomial (ZINB) regression model has important applications for this type of data. With many possible candidate risk factors, this paper proposes new variable selection methods for the ZINB model. We consider the maximum likelihood function plus a penalty including the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). An EM (expectation-maximization) algorithm is proposed for estimating the model parameters and conducting variable selection simultaneously. This algorithm consists of estimating penalized weighted negative binomial models and penalized logistic models via the coordinate descent algorithm. Furthermore, statistical properties including the standard error formulae are provided. A simulation study shows that the new algorithm not only has more accurate or at least comparable estimation, but also is more robust than the traditional stepwise variable selection. The proposed methods are applied to analyze the health care demand in Germany using the open-source R package mpath.

44 citations
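As a starting point for readers without access to R, statsmodels can fit an unpenalized zero-inflated negative binomial model; the penalized EM algorithm with LASSO/SCAD/MCP described above is what the mpath package adds on top of this. The sketch below uses simulated data, may need optimizer tuning, and is not the German health care demand analysis.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

rng = np.random.default_rng(3)
n = 2000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

mu = np.exp(0.5 + 0.8 * x1)                         # NB mean depends on x1 only
alpha = 0.7                                         # NB dispersion
nb_size = 1 / alpha
y_count = rng.negative_binomial(nb_size, nb_size / (nb_size + mu))
zero_prob = 1 / (1 + np.exp(-(-1.0 + 1.2 * x2)))    # zero inflation depends on x2 only
y = np.where(rng.random(n) < zero_prob, 0, y_count)

X_count = sm.add_constant(np.column_stack([x1, x2]))   # count-model covariates
X_infl = sm.add_constant(x2)                           # zero-inflation covariates

model = ZeroInflatedNegativeBinomialP(y, X_count, exog_infl=X_infl, p=2)
result = model.fit(method="bfgs", maxiter=500, disp=False)
print(result.summary())
```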


Journal ArticleDOI
TL;DR: This work presents a novel multiple testing method for testing null hypotheses that are structured in a directed acyclic graph (DAG) and can be seen as a generalization of Meinshausen's procedure for tree‐structured hypotheses.
Abstract: We present a novel multiple testing method for testing null hypotheses that are structured in a directed acyclic graph (DAG) The method is a top-down method that strongly controls the familywise error rate and can be seen as a generalization of Meinshausen's procedure for tree-structured hypotheses Just as Meinshausen's procedure, our proposed method can be used to test for variable importance, only the corresponding variable clusters can be chosen more freely, because the method allows for multiple parent nodes and partially overlapping hypotheses An important application of our method is in gene set analysis, in which one often wants to test multiple gene sets as well as individual genes for their association with a clinical outcome By considering the genes and gene sets as nodes in a DAG, our method enables us to test both for significant gene sets as well as for significant individual genes within the same multiple testing procedure The method will be illustrated by testing Gene Ontology terms for evidence of differential expression in a survival setting and is implemented in the R package cherry

39 citations


Journal ArticleDOI
TL;DR: This work synthesizes the outbreak detection algorithms of Noufaily et al. (2013) and Manitz and Höhle (2013) while additionally addressing right truncation caused by reporting delays, and considers the resulting time series as an incomplete two-way contingency table which it models using negative binomial regression.
Abstract: One use of infectious disease surveillance systems is the statistical aberration detection performed on time series of counts resulting from the aggregation of individual case reports. However, inherent reporting delays in such surveillance systems make the considered time series incomplete, which can be an impediment to the timely detection and thus to the containment of emerging outbreaks. In this work, we synthesize the outbreak detection algorithms of Noufaily et al. (2013) and Manitz and Höhle (2013) while additionally addressing right truncation caused by reporting delays. We do so by considering the resulting time series as an incomplete two-way contingency table which we model using negative binomial regression. Our approach is defined in a Bayesian setting allowing a direct inclusion of all sources of uncertainty in the derivation of whether an observed case count is to be considered an aberration. The proposed algorithm is evaluated both on simulated data and on the time series of Salmonella Newport cases in Germany in 2011. Altogether, our method aims at allowing timely aberration detection in the presence of reporting delays and hence underlines the need for statistical modeling to address complications of reporting systems. An implementation of the proposed method is made available in the R package surveillance as the function "bodaDelay".

37 citations
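A very reduced, non-Bayesian sketch of the underlying idea, namely a negative binomial regression on historical counts with a seasonal term and an upper predictive bound for the current week, is shown below. It ignores the reporting-delay correction that is the paper's actual contribution (available in surveillance::bodaDelay), uses simulated counts and a fixed overdispersion value, and is only a rough illustration of the Noufaily-style baseline.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(11)
weeks = np.arange(1, 261)                                  # five years of weekly counts
season = np.column_stack([np.sin(2 * np.pi * weeks / 52),
                          np.cos(2 * np.pi * weeks / 52)])
mu_true = np.exp(1.5 + 0.6 * season[:, 0])
counts = rng.poisson(mu_true)
counts[-1] += 15                                           # inject an aberration in the current week

X = sm.add_constant(season)
train, current = slice(0, -1), -1                          # fit on history, assess current week

nb_alpha = 0.1                                             # fixed overdispersion for the sketch
fit = sm.GLM(counts[train], X[train],
             family=sm.families.NegativeBinomial(alpha=nb_alpha)).fit()

mu_hat = fit.predict(X[[current]])[0]
size = 1 / nb_alpha
upper = stats.nbinom.ppf(0.995, size, size / (size + mu_hat))   # 99.5% predictive bound

print(f"expected ~{mu_hat:.1f}, threshold {upper:.0f}, observed {counts[current]}")
print("aberration flagged" if counts[current] > upper else "no aberration")
```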


Journal ArticleDOI
TL;DR: A new HC test is provided that is typically more powerful than the original HC test in normal mixture models and is motivated by the asymptotic behavior of the so-called local levels related to the original HC test.
Abstract: The higher criticism (HC) statistic, which can be seen as a normalized version of the famous Kolmogorov–Smirnov statistic, has a long history, dating back to the mid seventies. Originally, HC statistics were used in connection with goodness of fit (GOF) tests but they recently gained some attention in the context of testing the global null hypothesis in high dimensional data. The continuing interest for HC seems to be inspired by a series of nice asymptotic properties related to this statistic. For example, unlike Kolmogorov–Smirnov tests, GOF tests based on the HC statistic are known to be asymptotically sensitive in the moderate tails, hence it is favorably applied for detecting the presence of signals in sparse mixture models. However, some questions around the asymptotic behavior of the HC statistic are still open. We focus on two of them, namely, why a specific intermediate range is crucial for GOF tests based on the HC statistic and why the convergence of the HC distribution to the limiting one is extremely slow. Moreover, the inconsistency in the asymptotic and finite behavior of the HC statistic prompts us to provide a new HC test that has better finite properties than the original HC test while showing the same asymptotics. This test is motivated by the asymptotic behavior of the so-called local levels related to the original HC test. By means of numerical calculations and simulations we show that the new HC test is typically more powerful than the original HC test in normal mixture models.
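For concreteness, the original HC statistic computed from a vector of p-values can be written in a few lines. The threshold convention below (maximizing over the smaller half of the ordered p-values) is one common choice and the sparse normal mixture data are simulated, so this is an illustration of the classical statistic discussed in the abstract, not of the authors' modified test.

```python
import numpy as np
from scipy import stats

def higher_criticism(pvals):
    """Original HC statistic: maximal standardized exceedance of the uniform CDF
    by the ordered p-values, over the smaller half of the order statistics."""
    n = len(pvals)
    k = n // 2
    p_k = np.sort(pvals)[:k]
    i = np.arange(1, k + 1)
    return (np.sqrt(n) * (i / n - p_k) / np.sqrt(p_k * (1 - p_k))).max()

rng = np.random.default_rng(5)
n, n_signal = 10_000, 50

z_null = rng.normal(size=n)              # global null: pure noise
z_alt = z_null.copy()
z_alt[:n_signal] += 3.0                  # sparse normal mixture: a few shifted means

print(f"HC under the global null:     {higher_criticism(stats.norm.sf(z_null)):.2f}")
print(f"HC with sparse signals added: {higher_criticism(stats.norm.sf(z_alt)):.2f}")
```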

Journal ArticleDOI
TL;DR: In this paper, the authors reviewed options for graphical display and summary measures to assess the predictive value of markers over standard, readily available predictors, and illustrated various approaches using previously published data on 3264 participants from the Framingham Heart Study, where 183 developed coronary heart disease (10-year risk 5.6%).
Abstract: New markers may improve prediction of diagnostic and prognostic outcomes. We aimed to review options for graphical display and summary measures to assess the predictive value of markers over standard, readily available predictors. We illustrated various approaches using previously published data on 3264 participants from the Framingham Heart Study, where 183 developed coronary heart disease (10-year risk 5.6%). We considered performance measures for the incremental value of adding HDL cholesterol to a prediction model. An initial assessment may consider statistical significance (HR = 0.65, 95% confidence interval 0.53 to 0.80; likelihood ratio p < 0.001), and distributions of predicted risks (densities or box plots) with various summary measures. A range of decision thresholds is considered in predictiveness and receiver operating characteristic curves, where the area under the curve (AUC) increased from 0.762 to 0.774 by adding HDL. We can furthermore focus on reclassification of participants with and without an event in a reclassification graph, with the continuous net reclassification improvement (NRI) as a summary measure. When we focus on one particular decision threshold, the changes in sensitivity and specificity are central. We propose a net reclassification risk graph, which allows us to focus on the number of reclassified persons and their event rates. Summary measures include the binary AUC, the two-category NRI, and decision analytic variants such as the net benefit (NB). Various graphs and summary measures can be used to assess the incremental predictive value of a marker. Important insights for impact on decision making are provided by a simple graph for the net reclassification risk.
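A compact sketch of two of the summary measures discussed in this abstract, the change in AUC and the continuous (category-free) NRI, is given below on simulated data. It is not the Framingham analysis, uses apparent in-sample risks, and ignores censoring, which a survival-based version would have to address.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2024)
n = 4000
x_std = rng.normal(size=n)          # standard predictors (summarized as one score)
marker = rng.normal(size=n)         # new marker with modest incremental value
lin = -2.5 + 1.0 * x_std + 0.4 * marker
y = (rng.random(n) < 1 / (1 + np.exp(-lin))).astype(int)

base = LogisticRegression().fit(x_std[:, None], y)
extended = LogisticRegression().fit(np.column_stack([x_std, marker]), y)

risk_old = base.predict_proba(x_std[:, None])[:, 1]
risk_new = extended.predict_proba(np.column_stack([x_std, marker]))[:, 1]

delta_auc = roc_auc_score(y, risk_new) - roc_auc_score(y, risk_old)

up = risk_new > risk_old            # reclassified upwards by the extended model
events, nonevents = y == 1, y == 0
nri_events = up[events].mean() - (~up[events]).mean()
nri_nonevents = (~up[nonevents]).mean() - up[nonevents].mean()
continuous_nri = nri_events + nri_nonevents

print(f"AUC gain from adding the marker: {delta_auc:.4f}")
print(f"Continuous NRI:                  {continuous_nri:.3f}")
```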

Journal ArticleDOI
TL;DR: A delayed recycling method is proposed that allocates the recycled significance level from Stage r onward, where r is prespecified, and it is shown that r cannot be chosen adaptively to coincide with the random stage at which the hypothesis from which the significance level is recycled is rejected.
Abstract: Graphical approaches have been proposed in the literature for testing hypotheses on multiple endpoints by recycling significance levels from rejected hypotheses to unrejected ones. Recently, they have been extended to group sequential procedures (GSPs). Our focus in this paper is on the allocation of recycled significance levels from rejected hypotheses to the stages of the GSPs for unrejected hypotheses. We propose a delayed recycling method that allocates the recycled significance level from Stage r onward, where r is prespecified. We show that r cannot be chosen adaptively to coincide with the random stage at which the hypothesis from which the significance level is recycled is rejected. Such an adaptive GSP does not always control the FWER. One can choose r to minimize the expected sample size for a given power requirement. We illustrate how a simulation approach can be used for this purpose. Several examples, including a clinical trial example, are given to illustrate the proposed procedure.


Journal ArticleDOI
TL;DR: A range of distances including the spike‐time distance and its variants, as well as cluster‐based distances and dissimilarity measures based on classical statistical summaries of point patterns are considered.
Abstract: This paper presents a collection of dissimilarity measures to describe and then classify spatial point patterns when multiple replicates of different types are available for analysis. In particular, we consider a range of distances including the spike-time distance and its variants, as well as cluster-based distances and dissimilarity measures based on classical statistical summaries of point patterns. We review and explore, in the form of a tutorial, their uses, and their pros and cons. These distances are then used to summarize and describe collections of repeated realizations of point patterns via prototypes and multidimensional scaling. We also show a simulation study to evaluate the performance of multidimensional scaling with two types of selected distances. Finally, a multivariate spatial point pattern of a natural plant community is analyzed through various of these measures of dissimilarity.

Journal ArticleDOI
TL;DR: This paper presents an extension of the joint modeling strategy for the case of multiple longitudinal outcomes and repeated infections of different types over time, motivated by postkidney transplantation data and implemented the parameterization used in joint models which uses the fitted longitudinal measurements as time‐dependent covariates in a relative risk model.
Abstract: This paper presents an extension of the joint modeling strategy for the case of multiple longitudinal outcomes and repeated infections of different types over time, motivated by postkidney transplantation data. Our model comprises two parts linked by shared latent terms. On the one hand is a multivariate mixed linear model with random effects, where a low-rank thin-plate spline function is incorporated to collect the nonlinear behavior of the different profiles over time. On the other hand is an infection-specific Cox model, where the dependence between different types of infections and the related times of infection is through a random effect associated with each infection type to catch the within dependence and a shared frailty parameter to capture the dependence between infection types. We implemented the parameterization used in joint models which uses the fitted longitudinal measurements as time-dependent covariates in a relative risk model. Our proposed model was implemented in OpenBUGS using the MCMC approach.

Journal ArticleDOI
TL;DR: This work model the joint distribution of recurrent events explicitly using parametric copulas within a Bayesian framework and illustrates the flexibility of this approach using data from an asthma prevention trial in young children.
Abstract: The analysis of recurrent event data is of particular importance in medical statistics where patients suffering from chronic diseases often present with multiple recurring relapses or cancer patients experience several tumor recurrences. Whereas individual subjects can be assumed to be independent, the times between events of one subject are neither independent nor identically distributed. Apart from the marginal approach by Wei et al. (1989), the shared frailty model, see for example Duchateau and Janssen (2008), has been used extensively to analyze recurrent event data, where the correlation between sequential times is implicitly taken into account via a random effect. Oakes (1989) and Romeo et al. (2006) showed and exemplified the equivalence of frailty models for bivariate survival data to Archimedean copulas. Despite the fact that copula-based models have been used to model parallel survival data, their application to recurrent failure time data has only recently been suggested by Lawless and Yilmaz (2011) for the bivariate case. Here, we extend this to more than two recurrent events and model the joint distribution of recurrent events explicitly using parametric copulas within a Bayesian framework. This framework allows for parametric as well as a nonparametric modeling of the marginal baseline hazards and models the influence of covariates on the marginals via a proportional hazards assumption. Furthermore, the parameters of the copula may also depend on the covariates. We illustrate the flexibility of this approach using data from an asthma prevention trial in young children.

Journal ArticleDOI
TL;DR: The proposed solution is the arcsine transformation of the crude cumulative incidence, as its approximate variance, which is inversely proportional to the sample size, can also be calculated for studies with a zero estimate.
Abstract: When performing single arm meta-analyses of rare events in small populations, if the outcome of interest is incidence, it is not uncommon to have at least one study with zero events, especially in the presence of competing risks. In this paper, we address the problem of how to include studies with zero events in inverse variance meta-analyses when individual patient data are not available, going beyond the naive approach of not including the study or the use of a continuity correction. The proposed solution is the arcsine transformation of the crude cumulative incidence as its approximate variance, which is inversely proportional to the sample size, can be calculated also for studies with a zero estimate. As an alternative, generalized linear mixed models (GLMM) can be used. Simulations were performed to compare the results from inverse variance method meta-analyses of the arcsine transformed cumulative incidence to those obtained from meta-analyses of the cumulative incidence itself and of the logit transformation of the cumulative incidence. The comparisons have been carried out for different scenarios of heterogeneity, incidence, and censoring and for competing and not competing risks. The arcsine transformation showed the smallest bias and the highest coverage among models assuming within study normality. At the same time, the GLMM model had the best performance at very low incidences. The proposed method was applied to the clinical context that motivated this work, i.e. a meta-analysis of 5-year crude cumulative incidence of central nervous system recurrences in children treated for acute lymphoblastic leukemia.
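The key computation, the arcsine-square-root transformation of each study's crude incidence with approximate variance 1/(4n), followed by inverse-variance pooling and back-transformation, can be sketched as follows. The random-effects step and the competing-risks aspects of the paper are omitted, and the study counts are invented for illustration.

```python
import numpy as np

# Hypothetical single-arm studies: (events, sample size), one with zero events.
studies = [(4, 120), (0, 85), (7, 240), (1, 60)]

events = np.array([e for e, _ in studies], dtype=float)
n = np.array([m for _, m in studies], dtype=float)

y = np.arcsin(np.sqrt(events / n))   # arcsine-transformed crude incidence
v = 1.0 / (4.0 * n)                  # approximate variance, defined even when events == 0

w = 1.0 / v                          # inverse-variance (fixed-effect) weights
y_pooled = np.sum(w * y) / np.sum(w)
se_pooled = np.sqrt(1.0 / np.sum(w))

ci_trans = y_pooled + np.array([-1.96, 1.96]) * se_pooled
pooled_incidence = np.sin(y_pooled) ** 2
ci_incidence = np.sin(np.clip(ci_trans, 0, np.pi / 2)) ** 2

print(f"Pooled incidence: {pooled_incidence:.4f} "
      f"(95% CI {ci_incidence[0]:.4f} to {ci_incidence[1]:.4f})")
```

The zero-event study contributes a valid (transformed) estimate and weight, which is exactly why the transformation avoids both exclusion and continuity corrections.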

Journal ArticleDOI
TL;DR: This work outlines the modeling of multiarm trials and inference for functional contrasts with INLA, and demonstrates how INLA facilitates the assessment of network inconsistency with node-splitting.
Abstract: Analyzing the collected evidence of a systematic review in form of a network meta-analysis (NMA) enjoys increasing popularity and provides a valuable instrument for decision making. Bayesian inference of NMA models is often propagated, especially if correlated random effects for multiarm trials are included. The standard choice for Bayesian inference is Markov chain Monte Carlo (MCMC) sampling, which is computationally intensive. An alternative to MCMC sampling is the recently suggested approximate Bayesian method of integrated nested Laplace approximations (INLA) that dramatically saves computation time without any substantial loss in accuracy. We show how INLA applies to NMA models for summary-level as well as trial-arm-level data. Specifically, we outline the modeling of multiarm trials and inference for functional contrasts with INLA. We demonstrate how INLA facilitates the assessment of network inconsistency with node-splitting. Three applications illustrate the use of INLA for an NMA.

Journal ArticleDOI
TL;DR: In this paper, the authors argue that causal inference can be seen as posterior predictive inference in a hypothetical population without covariate imbalances, and also discuss how controlling for confounding through inverse probability of treatment weighting can be justified and incorporated in the Bayesian setting.
Abstract: While optimal dynamic treatment regimes (DTRs) can be estimated without specification of a predictive model, a model-based approach, combined with dynamic programming and Monte Carlo integration, enables direct probabilistic comparisons between the outcomes under the optimal DTR and alternative (dynamic or static) treatment regimes. The Bayesian predictive approach also circumvents problems related to frequentist estimators under the nonregular estimation problem. However, the model-based approach is susceptible to misspecification, in particular of the "null-paradox" type, which is due to the model parameters not having a direct causal interpretation in the presence of latent individual-level characteristics. Because it is reasonable to insist on correct inferences under the null of no difference between the alternative treatment regimes, we discuss how to achieve this through a "null-robust" reparametrization of the problem in a longitudinal setting. Since we argue that causal inference can be entirely understood as posterior predictive inference in a hypothetical population without covariate imbalances, we also discuss how controlling for confounding through inverse probability of treatment weighting can be justified and incorporated in the Bayesian setting.

Journal ArticleDOI
TL;DR: The powers of several normal probability plot based (graphical) tests and the most popular nongraphical Anderson-Darling and Shapiro-Wilk tests are compared by simulation and recommendations are given on which graphical tests should be used in what circumstances.
Abstract: Normal probability plots are widely used as a statistical tool for assessing whether an observed simple random sample is drawn from a normally distributed population. The users, however, have to judge subjectively, if no objective rule is provided, whether the plotted points fall close to a straight line. In this paper, we focus on how a normal probability plot can be augmented by intervals for all the points so that, if the population distribution is normal, then all the points should fall into the corresponding intervals simultaneously with probability 1-α. These simultaneous 1-α probability intervals therefore provide an objective means to judge whether the plotted points fall close to the straight line: the plotted points fall close to the straight line if and only if all the points fall into the corresponding intervals. The powers of several normal probability plot based (graphical) tests and the most popular nongraphical Anderson-Darling and Shapiro-Wilk tests are compared by simulation. Based on this comparison, recommendations are given in Section 3 on which graphical tests should be used in what circumstances. An example is provided to illustrate the methods.
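A simple Monte Carlo construction of a simultaneous band, based on the maximum deviation of the standardized order statistics from the expected normal quantiles, is sketched below. It is not the paper's exact interval construction, but it is simultaneous by construction and conveys the idea of an objective, envelope-based check.

```python
import numpy as np
from scipy import stats

def simultaneous_qq_band(x, alpha=0.05, n_rep=5000, seed=0):
    """Monte Carlo simultaneous band for a normal probability (QQ) plot, based on
    the maximum deviation of standardized order statistics from the expected
    standard-normal quantiles."""
    rng = np.random.default_rng(seed)
    n = len(x)
    expected = stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)

    # Null distribution of the maximal deviation under normality.
    max_dev = np.empty(n_rep)
    for b in range(n_rep):
        z = rng.normal(size=n)
        z = np.sort((z - z.mean()) / z.std(ddof=1))
        max_dev[b] = np.abs(z - expected).max()
    c = np.quantile(max_dev, 1 - alpha)

    obs = np.sort((x - x.mean()) / x.std(ddof=1))
    outside = np.abs(obs - expected).max() > c
    return obs, expected - c, expected + c, outside

rng = np.random.default_rng(1)
sample = rng.exponential(size=60)            # clearly non-normal example data
obs, lower, upper, outside = simultaneous_qq_band(sample)
print("any point outside the simultaneous band (reject normality)?", outside)
```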

Journal ArticleDOI
TL;DR: PPC involves simulating “replicated” data from the posterior predictive distribution of the model under scrutiny; the posterior predictive “p-value”, an extreme value of which suggests a misfit between the model and the data, is proposed as an imputation diagnostic.
Abstract: Multiple imputation is gaining popularity as a strategy for handling missing data, but there is a scarcity of tools for checking imputation models, a critical step in model fitting. Posterior predictive checking (PPC) has been recommended as an imputation diagnostic. PPC involves simulating "replicated" data from the posterior predictive distribution of the model under scrutiny. Model fit is assessed by examining whether the analysis from the observed data appears typical of results obtained from the replicates produced by the model. A proposed diagnostic measure is the posterior predictive "p-value", an extreme value of which (i.e., a value close to 0 or 1) suggests a misfit between the model and the data. The aim of this study was to evaluate the performance of the posterior predictive p-value as an imputation diagnostic. Using simulation methods, we deliberately misspecified imputation models to determine whether posterior predictive p-values were effective in identifying these problems. When estimating the regression parameter of interest, we found that more extreme p-values were associated with poorer imputation model performance, although the results highlighted that traditional thresholds for classical p-values do not apply in this context. A shortcoming of the PPC method was its reduced ability to detect misspecified models with increasing amounts of missing data. Despite the limitations of posterior predictive p-values, they appear to have a valuable place in the imputer's toolkit. In addition to automated checking using p-values, we recommend imputers perform graphical checks and examine other summaries of the test quantity distribution.
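The mechanics of a posterior predictive p-value can be shown with a deliberately tiny example: draw parameters from the posterior of a (misspecified) normal model fitted to skewed data, simulate replicated datasets, and compare a test quantity, here the sample skewness, between replicates and the observed data. This generic sketch uses simulated data and a closed-form posterior; it is not the imputation-specific diagnostic studied in the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
y = rng.gamma(shape=2.0, scale=1.0, size=200)      # skewed data
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)
t_obs = stats.skew(y)                              # test quantity: sample skewness

# Model under scrutiny: y ~ Normal(mu, sigma^2) with the standard
# noninformative prior, so the posterior has a closed form.
n_rep = 4000
t_rep = np.empty(n_rep)
for r in range(n_rep):
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1)   # draw sigma^2 | y
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))     # draw mu | sigma^2, y
    y_rep = rng.normal(mu, np.sqrt(sigma2), size=n)
    t_rep[r] = stats.skew(y_rep)

ppp = np.mean(t_rep >= t_obs)                      # posterior predictive p-value
print(f"observed skewness {t_obs:.2f}, posterior predictive p-value {ppp:.3f}")
```

Because the data are skewed and the model is symmetric, the p-value lands near 0, i.e., in the extreme region that signals misfit.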

Journal ArticleDOI
TL;DR: Alternative methods for estimating a treatment's effect in the presence of missing data perform better than the classic LOCF method, but in situations with high marker variance and rarely recorded measurements none of the examined methods adequately corrected the bias.
Abstract: Marginal structural models (MSMs) have been proposed for estimating a treatment's effect, in the presence of time-dependent confounding. We aimed to evaluate the performance of the Cox MSM in the presence of missing data and to explore methods to adjust for missingness. We simulated data with a continuous time-dependent confounder and a binary treatment. We explored two classes of missing data: (i) missed visits, which resemble clinical cohort studies; (ii) missing confounder's values, which correspond to interval cohort studies. Missing data were generated under various mechanisms. In the first class, the source of the bias was the extreme treatment weights. Truncation or normalization improved estimation. Therefore, particular attention must be paid to the distribution of weights, and truncation or normalization should be applied if extreme weights are noticed. In the second case, bias was due to the misspecification of the treatment model. Last observation carried forward (LOCF), multiple imputation (MI), and inverse probability of missingness weighting (IPMW) were used to correct for the missingness. We found that alternatives, especially the IPMW method, perform better than the classic LOCF method. Nevertheless, in situations with high marker variance and rarely recorded measurements none of the examined methods adequately corrected the bias.
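The weighting step behind the first set of findings can be sketched in a few lines: estimate stabilized inverse-probability-of-treatment weights from logistic models, inspect their distribution, and truncate extreme weights at chosen percentiles before they enter the weighted (e.g., Cox MSM) analysis. The single-time-point setup and simulated data below are a simplification of the time-dependent setting studied in the paper.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(17)
n = 5000
confounder = rng.normal(size=n)
p_treat = 1 / (1 + np.exp(-(-0.5 + 1.5 * confounder)))
treated = (rng.random(n) < p_treat).astype(int)

# Denominator model: treatment given the confounder.
denom_fit = sm.Logit(treated, sm.add_constant(confounder)).fit(disp=False)
p_denom = denom_fit.predict(sm.add_constant(confounder))

# Numerator for stabilized weights: marginal treatment probability.
p_num = treated.mean()

sw = np.where(treated == 1, p_num / p_denom, (1 - p_num) / (1 - p_denom))

# Inspect and truncate extreme weights at the 1st/99th percentiles.
lo, hi = np.percentile(sw, [1, 99])
sw_trunc = np.clip(sw, lo, hi)

print(f"stabilized weights: mean {sw.mean():.2f}, max {sw.max():.1f}")
print(f"after truncation:   mean {sw_trunc.mean():.2f}, max {sw_trunc.max():.1f}")
```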

Journal ArticleDOI
TL;DR: A unified approach based on a bivariate linear mixed effects model to estimate three types of bivariate correlation coefficients (BCCs) as well as the associated variances between two quantitative variables in cross-sectional data from a family-type clustered design is proposed.
Abstract: We propose a unified approach based on a bivariate linear mixed effects model to estimate three types of bivariate correlation coefficients (BCCs), as well as the associated variances between two quantitative variables in cross-sectional data from a family-type clustered design. These BCCs are defined at different levels of experimental units including clusters (e.g., families) and subjects within clusters and assess different aspects on the relationships between two variables. We study likelihood-based inferences for these BCCs, and provide easy implementation using standard software SAS. Unlike several existing BCC estimators in the literature on clustered data, our approach can seamlessly handle two major analytic challenges arising from a family-type clustered design: (1) many families may consist of only one single subject; (2) one of the paired measurements may be missing for some subjects. Hence, our approach maximizes the use of data from all subjects (even those missing one of the two variables to be correlated) from all families, regardless of family size. We also conduct extensive simulations to show that our estimators are superior to existing estimators in handling missing data and/or imbalanced family sizes and the proposed Wald test maintains good size and power for hypothesis testing. Finally, we analyze a real-world Alzheimer's disease dataset from a family clustered study to investigate the BCCs across different modalities of disease markers including cognitive tests, cerebrospinal fluid biomarkers, and neuroimaging biomarkers.

Journal ArticleDOI
TL;DR: It is shown that reverse attenuation of interaction effects may also emerge, namely when heteroscedastic measurement error or sampling variances of a mismeasured covariate are present, which are not unrealistic scenarios in practice.
Abstract: Covariate measurement error may cause biases in parameters of regression coefficients in generalized linear models. The influence of measurement error on interaction parameters has, however, only rarely been investigated in depth, and if so, attenuation effects were reported. In this paper, we show that reverse attenuation of interaction effects may also emerge, namely when heteroscedastic measurement error or sampling variances of a mismeasured covariate are present, which are not unrealistic scenarios in practice. Theoretical findings are illustrated with simulations. A Bayesian approach employing integrated nested Laplace approximations is suggested to model the heteroscedastic measurement error and covariate variances, and an application shows that the method is able to reveal approximately correct parameter estimates.

Journal ArticleDOI
TL;DR: A modification of the SGoF method is introduced, termed majorant version, which rejects the null hypotheses with adjusted p-values below the level, which is the smallest level at which the SGoF procedure would still reject the given null hypothesis, while controlling for the multiplicity of tests.
Abstract: In the field of multiple comparison procedures, adjusted p-values are an important tool to evaluate the significance of a test statistic while taking the multiplicity into account. In this paper, we introduce adjusted p-values for the recently proposed Sequential Goodness-of-Fit (SGoF) multiple test procedure by letting the level of the test vary on the unit interval. This extends previous research on the SGoF method, which is a method of high interest when one aims to increase the statistical power in a multiple testing scenario. The adjusted p-value is the smallest level at which the SGoF procedure would still reject the given null hypothesis, while controlling for the multiplicity of tests. The main properties of the adjusted p-values are investigated. In particular, we show that they are a subset of the original p-values, being equal to 1 for p-values above a certain threshold. These are very useful properties from a numerical viewpoint, since they allow for a simplified method to compute the adjusted p-values. We introduce a modification of the SGoF method, termed majorant version, which rejects the null hypotheses with adjusted p-values below the level. This modification rejects more null hypotheses as the level increases, something which is not in general the case for the original SGoF. Adjusted p-values for the conservative version of the SGoF procedure, which estimates the variance without assuming that all the null hypotheses are true, are also included. The situation with ties among the p-values is discussed too. Several real data applications are investigated to illustrate the practical usage of adjusted p-values, ranging from a small to a large number of tests.

Journal ArticleDOI
TL;DR: The expected benefit is regarded as an estimation problem and two approaches to statistical inference are considered, using data from a previously published study, to illustrate the possible insights to be gained from the application of formal inference techniques to determine the expected benefit.
Abstract: When the efficacy of a new medical drug is compared against that of an established competitor in a randomized controlled trial, the difference in patient-relevant outcomes, such as mortality, is usually measured directly. In diagnostic research, however, the impact of diagnostic procedures is of an indirect nature as test results do influence downstream clinical decisions, but test performance (as characterized by sensitivity, specificity, and the predictive values of a procedure) is, at best, only a surrogate endpoint for patient outcome and does not necessarily translate into it. Not many randomized controlled trials have been conducted so far in diagnostic research, and, hence, we need alternative approaches to close the gap between test characteristics and patient outcomes. Several informal approaches have been suggested in order to close this gap, and decision modeling has been advocated as a means of obtaining formal approaches. Recently, the expected benefit has been proposed as a quantity that allows a simple formal approach, and we take up this suggestion in this paper. We regard the expected benefit as an estimation problem and consider two approaches to statistical inference. Moreover, using data from a previously published study, we illustrate the possible insights to be gained from the application of formal inference techniques to determine the expected benefit.

Journal ArticleDOI
TL;DR: There are no covariate adjustment methods for discrimination statistics in censored survival data, and the D-index, a standard procedure for quantifying disease-risk factor associations, is described.
Abstract: This work was supported by the Medical Research Council Grant G0700463 and Unit Programme U105260558.

Journal ArticleDOI
TL;DR: A new model for recurrent event data with a fully parametric baseline rate function based on the exponential-Poisson distribution is introduced; it includes the classical homogeneous Poisson process as a particular case.
Abstract: In this paper, we introduce a new model for recurrent event data characterized by a fully parametric baseline rate function, which is based on the exponential-Poisson distribution. The model arises from a latent competing risk scenario, in the sense that there is no information about which cause was responsible for the event occurrence. Then, the time of each recurrence is given by the minimum lifetime value among all latent causes. The new model has a particular case, which is the classical homogeneous Poisson process. The properties of the proposed model are discussed, including its hazard rate function, survival function, and ordinary moments. The inferential procedure is based on the maximum likelihood approach. We consider an important issue of model selection between the proposed model and its particular case by the likelihood ratio test and score test. Goodness of fit of the recurrent event models is assessed using Cox-Snell residuals. A simulation study evaluates the performance of the estimation procedure in the presence of small and moderate sample sizes. Applications on two real data sets are provided to illustrate the proposed methodology. One of them, first analyzed by our team of researchers, considers the data concerning the recurrence of malaria, which is an infectious disease caused by a protozoan parasite that infects red blood cells.

Journal ArticleDOI
TL;DR: A robust Dirichlet process for estimating survival functions from samples with right-censored data adopts a prior near-ignorance approach to avoid almost any assumption about the distribution of the population lifetimes, as well as the need of eliciting an infinite dimensional parameter.
Abstract: We present a robust Dirichlet process for estimating survival functions from samples with right-censored data. It adopts a prior near-ignorance approach to avoid almost any assumption about the distribution of the population lifetimes, as well as the need of eliciting an infinite dimensional parameter (in case of lack of prior information), as it happens with the usual Dirichlet process prior. We show how such a model can be used to derive robust inferences from right-censored lifetime data. Robustness is due to the identification of the decisions that are prior-dependent, and can be interpreted as an analysis of sensitivity with respect to the hypothetical inclusion of fictitious new samples in the data. In particular, we derive a nonparametric estimator of the survival probability and a hypothesis test about the probability that the lifetime of an individual from one population is shorter than the lifetime of an individual from another. We evaluate these ideas on simulated data and on the Australian AIDS survival dataset. The methods are publicly available through an easy-to-use R package.