
Showing papers in "BMC Medical Research Methodology in 2014"


Journal ArticleDOI
TL;DR: In this article, the authors proposed a new estimation method by incorporating the sample size and compared the estimators of the sample mean and standard deviation under all three scenarios and presented some suggestions on which scenario is preferred in real-world applications.
Abstract: In systematic reviews and meta-analysis, researchers often pool the results of the sample mean and standard deviation from a set of similar clinical trials. A number of the trials, however, reported the study using the median, the minimum and maximum values, and/or the first and third quartiles. Hence, in order to combine results, one may have to estimate the sample mean and standard deviation for such trials. In this paper, we propose to improve the existing literature in several directions. First, we show that the sample standard deviation estimation in Hozo et al.’s method (BMC Med Res Methodol 5:13, 2005) has some serious limitations and is always less satisfactory in practice. Inspired by this, we propose a new estimation method by incorporating the sample size. Second, we systematically study the sample mean and standard deviation estimation problem under several other interesting settings where the interquartile range is also available for the trials. We demonstrate the performance of the proposed methods through simulation studies for the three frequently encountered scenarios, respectively. For the first two scenarios, our method greatly improves existing methods and provides a nearly unbiased estimate of the true sample standard deviation for normal data and a slightly biased estimate for skewed data. For the third scenario, our method still performs very well for both normal data and skewed data. Furthermore, we compare the estimators of the sample mean and standard deviation under all three scenarios and present some suggestions on which scenario is preferred in real-world applications. In this paper, we discuss different approximation methods in the estimation of the sample mean and standard deviation and propose some new estimation methods to improve the existing literature. We conclude our work with a summary table (an Excel spread sheet including all formulas) that serves as a comprehensive guidance for performing meta-analysis in different situations.
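
A minimal sketch (not the paper's own code) of the kind of estimator described above for the {minimum, median, maximum, sample size} scenario, assuming approximately normal data; the exact formulas for each scenario should be taken from the paper's summary table.

```python
# Sketch: estimating a sample mean and SD from reported summary statistics,
# assuming approximately normal data. Formulas follow widely used
# approximations for the {min, median, max, n} scenario.
from scipy.stats import norm

def estimate_mean_sd_from_range(a, m, b, n):
    """a = minimum, m = median, b = maximum, n = sample size."""
    mean = (a + 2 * m + b) / 4.0          # mean estimate for the {min, median, max} case
    # Range-based SD estimate that incorporates the sample size via the
    # expected range of n standard normal draws.
    xi = 2 * norm.ppf((n - 0.375) / (n + 0.25))
    sd = (b - a) / xi
    return mean, sd

print(estimate_mean_sd_from_range(a=10, m=25, b=45, n=50))
```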

4,745 citations


Journal ArticleDOI
TL;DR: Widespread adoption and implementation of this tool will facilitate and improve critical appraisal of evidence from animal studies and enhance the efficiency of translating animal research into clinical practice and increase awareness of the necessity of improving the methodological quality of animal studies.
Abstract: Systematic Reviews (SRs) of experimental animal studies are not yet common practice, but awareness of the merits of conducting such SRs is steadily increasing. As animal intervention studies differ from randomized clinical trials (RCT) in many aspects, the methodology for SRs of clinical trials needs to be adapted and optimized for animal intervention studies. The Cochrane Collaboration developed a Risk of Bias (RoB) tool to establish consistency and avoid discrepancies in assessing the methodological quality of RCTs. A similar initiative is warranted in the field of animal experimentation. We provide an RoB tool for animal intervention studies (SYRCLE’s RoB tool). This tool is based on the Cochrane RoB tool and has been adjusted for aspects of bias that play a specific role in animal intervention studies. To enhance transparency and applicability, we formulated signalling questions to facilitate judgment. The resulting RoB tool for animal studies contains 10 entries. These entries are related to selection bias, performance bias, detection bias, attrition bias, reporting bias and other biases. Half these items are in agreement with the items in the Cochrane RoB tool. Most of the variations between the two tools are due to differences in design between RCTs and animal studies. Shortcomings in, or unfamiliarity with, specific aspects of experimental design of animal studies compared to clinical studies also play a role. SYRCLE’s RoB tool is an adapted version of the Cochrane RoB tool. Widespread adoption and implementation of this tool will facilitate and improve critical appraisal of evidence from animal studies. This may subsequently enhance the efficiency of translating animal research into clinical practice and increase awareness of the necessity of improving the methodological quality of animal studies.

1,773 citations


Journal ArticleDOI
TL;DR: Differences in assessment and low agreement between reviewers and authors suggest the need to contact authors for information not published in studies when applying the NOS in systematic reviews.
Abstract: Lack of appropriate reporting of methodological details has previously been shown to distort risk of bias assessments in randomized controlled trials. The same might be true for observational studies. The goal of this study was to compare the Newcastle-Ottawa Scale (NOS) assessment for risk of bias between reviewers and authors of cohort studies included in a published systematic review on risk factors for severe outcomes in patients infected with influenza. Cohort studies included in the systematic review and published between 2008–2011 were included. The corresponding or first authors completed a survey covering all NOS items. Results were compared with the NOS assessment applied by reviewers of the systematic review. Inter-rater reliability was calculated using kappa (K) statistics. Authors of 65/182 (36%) studies completed the survey. The overall NOS score was significantly higher (p < 0.001) in the reviewers’ assessment (median = 6; interquartile range [IQR] 6–6) compared with those by authors (median = 5, IQR 4–6). Inter-rater reliability by item ranged from slight (K = 0.15, 95% confidence interval [CI] = −0.19, 0.48) to poor (K = −0.06, 95% CI = −0.22, 0.10). Reliability for the overall score was poor (K = −0.004, 95% CI = −0.11, 0.11). Differences in assessment and low agreement between reviewers and authors suggest the need to contact authors for information not published in studies when applying the NOS in systematic reviews.
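
For illustration, agreement on a single NOS item between reviewer and author ratings can be quantified with Cohen's kappa; the ratings below are hypothetical, not the study's data.

```python
# Illustrative only: Cohen's kappa for agreement between reviewer and author
# ratings on a single NOS item (hypothetical binary ratings).
from sklearn.metrics import cohen_kappa_score

reviewer = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]   # 1 = item judged adequate
author   = [1, 0, 0, 1, 0, 0, 1, 1, 1, 1]

kappa = cohen_kappa_score(reviewer, author)
print(f"kappa = {kappa:.2f}")   # values near 0 indicate poor agreement
```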

1,171 citations


Journal ArticleDOI
TL;DR: The authors' simulations showed that the HKSJ method consistently results in more adequate error rates than the DL method, especially when the number of studies is small, and can easily be applied routinely in meta-analyses.
Abstract: The DerSimonian and Laird approach (DL) is widely used for random effects meta-analysis, but this often results in inappropriate type I error rates. The method described by Hartung, Knapp, Sidik and Jonkman (HKSJ) is known to perform better when trials of similar size are combined. However, evidence in realistic situations, where one trial might be much larger than the other trials, is lacking. We aimed to evaluate the relative performance of the DL and HKSJ methods when studies of different sizes are combined and to develop a simple method to convert DL results to HKSJ results. We evaluated the performance of the HKSJ versus DL approach in simulated meta-analyses of 2–20 trials with varying sample sizes and between-study heterogeneity, and allowing trials to have various sizes, e.g. 25% of the trials being 10-times larger than the smaller trials. We also compared the number of “positive” (statistically significant at p < 0.05) results in 689 meta-analyses of at least 3 studies of interventions from the Cochrane Database of Systematic Reviews. The simulations showed that the HKSJ method consistently resulted in more adequate error rates than the DL method. When the significance level was 5%, the HKSJ error rates at most doubled, whereas for DL they could be over 30%. DL, and, far less so, HKSJ had more inflated error rates when the combined studies had unequal sizes and between-study heterogeneity. The empirical data from 689 meta-analyses showed that 25.1% of the significant findings for the DL method were non-significant with the HKSJ method. DL results can be easily converted into HKSJ results. Our simulations showed that the HKSJ method consistently results in more adequate error rates than the DL method, especially when the number of studies is small, and can easily be applied routinely in meta-analyses. Even with the HKSJ method, extra caution is needed when there are 5 or fewer studies of very unequal sizes.
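
A compact sketch of the two confidence intervals being compared, for hypothetical study estimates: the DL interval uses a normal quantile and the variance 1/Σw, whereas the HKSJ interval rescales that variance and uses a t-quantile with k−1 degrees of freedom.

```python
# Hypothetical data: contrasting DerSimonian-Laird (DL) and
# Hartung-Knapp-Sidik-Jonkman (HKSJ) confidence intervals for a
# random-effects meta-analysis of k effect estimates y with variances v.
import numpy as np
from scipy.stats import norm, t

y = np.array([0.10, 0.35, -0.05, 0.42, 0.25])   # hypothetical log odds ratios
v = np.array([0.04, 0.09, 0.02, 0.16, 0.06])    # within-study variances
k = len(y)

# DL estimate of the between-study variance tau^2
w = 1 / v
ybar = np.sum(w * y) / np.sum(w)
Q = np.sum(w * (y - ybar) ** 2)
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))

# Random-effects pooled estimate
ws = 1 / (v + tau2)
mu = np.sum(ws * y) / np.sum(ws)

# DL: normal-based CI with variance 1/sum(ws)
se_dl = np.sqrt(1 / np.sum(ws))
ci_dl = mu + np.array([-1, 1]) * norm.ppf(0.975) * se_dl

# HKSJ: rescaled variance and a t-distribution with k-1 degrees of freedom
q = np.sum(ws * (y - mu) ** 2) / (k - 1)
se_hksj = np.sqrt(q / np.sum(ws))
ci_hksj = mu + np.array([-1, 1]) * t.ppf(0.975, k - 1) * se_hksj

print(mu, ci_dl, ci_hksj)   # the HKSJ interval is typically wider when k is small
```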

1,022 citations


Journal ArticleDOI
TL;DR: A comprehensive summation of the major barriers to working with various disadvantaged groups is provided, along with proposed strategies for addressing each of the identified types of barriers.
Abstract: Background This study aims to review the literature regarding the barriers to sampling, recruitment, participation, and retention of members of socioeconomically disadvantaged groups in health research and strategies for increasing the amount of health research conducted with socially disadvantaged groups.

890 citations


Journal ArticleDOI
TL;DR: The development and use of the ConQual approach will assist users of qualitative systematic reviews to establish confidence in the evidence produced in these types of reviews and can serve as a practical tool to assist in decision making.
Abstract: The importance of findings derived from syntheses of qualitative research has been increasingly acknowledged. Findings that arise from qualitative syntheses inform questions of practice and policy in their own right and are commonly used to complement findings from quantitative research syntheses. The GRADE approach has been widely adopted by international organisations to rate the quality and confidence of the findings of quantitative systematic reviews. To date, there has been no widely accepted corresponding approach to assist health care professionals and policy makers in establishing confidence in the synthesised findings of qualitative systematic reviews. A methodological group was formed to develop a process to assess the confidence in synthesised qualitative research findings and to develop a Summary of Findings table for meta-aggregative qualitative systematic reviews. Dependability and credibility are two elements considered by the methodological group to influence the confidence of qualitative synthesised findings. A set of critical appraisal questions is proposed to establish dependability, whilst credibility can be ranked according to the goodness of fit between the author’s interpretation and the original data. By following the processes outlined in this article, an overall ranking can be assigned to rate the confidence of synthesised qualitative findings, a system we have labelled ConQual. The development and use of the ConQual approach will assist users of qualitative systematic reviews to establish confidence in the evidence produced in these types of reviews and can serve as a practical tool to assist in decision making.

510 citations


Journal ArticleDOI
TL;DR: Claims-based measures using BM ICD 9 coding may be insufficient to identify patients with incident BM diagnosis and should be validated against chart data to maximize their potential for population-based analyses.
Abstract: To assess concordance between Medicare claims and Surveillance, Epidemiology, and End Results (SEER) reports of incident BM among prostate cancer (PCa) patients. The prevalence and consequences of bone metastases (BM) have been examined across tumor sites using healthcare claims data; however, the reliability of these claims-based BM measures has not been investigated. This retrospective cohort study utilized linked registry and claims (SEER-Medicare) data on men diagnosed with incident stage IV M1 PCa between 2005 and 2007. The SEER-based measure of incident BM was cross-tabulated with three separate Medicare claims approaches to assess concordance. Sensitivity, specificity and positive predictive value (PPV) were calculated to assess the concordance between registry- and claims-based measures. Based on 2,708 PCa patients in SEER-Medicare, there is low to moderate concordance between the SEER- and claims-based measures of incident BM. Across the three approaches, sensitivity ranged from 0.48 (0.456 – 0.504) to 0.598 (0.574 – 0.621), specificity ranged from 0.538 (0.507 – 0.569) to 0.620 (0.590 – 0.650) and PPV ranged from 0.679 (0.651 – 0.705) to 0.690 (0.665 – 0.715). A comparison of utilization patterns between SEER-based and claims-based measures suggested avenues for improving sensitivity. Claims-based measures using BM ICD-9 coding may be insufficient to identify patients with incident BM diagnosis and should be validated against chart data to maximize their potential for population-based analyses.
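
The concordance measures reported above can be reproduced from a 2x2 cross-tabulation; the counts below are invented for illustration, with the SEER report taken as the reference standard.

```python
# Illustrative calculation (hypothetical counts) of sensitivity, specificity
# and PPV, treating the SEER report as the reference standard and the
# claims-based algorithm as the test.
tp, fp, fn, tn = 420, 190, 340, 1758   # hypothetical 2x2 cell counts

sensitivity = tp / (tp + fn)   # claims flag BM among SEER-confirmed BM cases
specificity = tn / (tn + fp)   # claims negative among SEER-negative patients
ppv = tp / (tp + fp)           # SEER-confirmed BM among claims-flagged cases

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} ppv={ppv:.3f}")
```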

510 citations


Journal ArticleDOI
TL;DR: The vast majority of studies describing some form of external validation of a multivariable prediction model were poorly reported with key details frequently not presented and calibration often omitted from the publication.
Abstract: Before considering whether to use a multivariable (diagnostic or prognostic) prediction model, it is essential that its performance be evaluated in data that were not used to develop the model (referred to as external validation). We critically appraised the methodological conduct and reporting of external validation studies of multivariable prediction models. We conducted a systematic review of articles describing some form of external validation of one or more multivariable prediction models indexed in PubMed core clinical journals published in 2010. Study data were extracted in duplicate on design, sample size, handling of missing data, reference to the original study developing the prediction models and predictive performance measures. 11,826 articles were identified and 78 were included for full review, which described the evaluation of 120 prediction models in participant data that were not used to develop the model. Thirty-three articles described both the development of a prediction model and an evaluation of its performance on a separate dataset, and 45 articles described only the evaluation of an existing published prediction model on another dataset. Fifty-seven percent of the prediction models were presented and evaluated as simplified scoring systems. Sixteen percent of articles failed to report the number of outcome events in the validation datasets. Fifty-four percent of studies made no explicit mention of missing data. Sixty-seven percent did not report evaluating model calibration whilst most studies evaluated model discrimination. It was often unclear whether the reported performance measures were for the full regression model or for the simplified models. The vast majority of studies describing some form of external validation of a multivariable prediction model were poorly reported with key details frequently not presented. The validation studies were characterised by poor design, inappropriate handling and acknowledgement of missing data, and omission of calibration, one of the key performance measures of prediction models. It may therefore not be surprising that an overwhelming majority of developed prediction models are not used in practice, when there is a dearth of well-conducted and clearly reported (external validation) studies describing their performance on independent participant data.
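
A minimal sketch of the two performance measures at issue, discrimination and calibration, evaluated on simulated validation data (not the review's data); calibration is summarised here by regressing the outcome on the model's linear predictor.

```python
# Sketch of external-validation performance measures on simulated data:
# discrimination (c-statistic / AUC) and calibration (intercept and slope of
# a logistic recalibration model).
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 2000
lp = rng.normal(-1.0, 1.2, n)                        # linear predictor from an existing model
y = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * lp))))   # validation outcomes (miscalibrated on purpose)

# Discrimination
print("c-statistic:", roc_auc_score(y, lp))

# Calibration: logistic regression of the outcome on the linear predictor.
fit = sm.GLM(y, sm.add_constant(lp), family=sm.families.Binomial()).fit()
print("calibration intercept, slope:", fit.params)   # slope < 1 suggests predictions are too extreme
```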

498 citations


Journal ArticleDOI
TL;DR: An eight-step procedure for better validation of meta-analytic results in systematic reviews of randomised clinical trials is proposed, which will increase the validity of assessments of intervention effects in systematic Reviews of Randomised Clinical trials.
Abstract: Background: Thresholds for statistical significance when assessing meta-analysis results are being insufficiently demonstrated by traditional 95% confidence intervals and P-values. Assessment of intervention effects in systematic reviews with meta-analysis deserves greater rigour. Methods: Methodologies for assessing statistical and clinical significance of intervention effects in systematic reviews were considered. Balancing simplicity and comprehensiveness, an operational procedure was developed, based mainly on The Cochrane Collaboration methodology and the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) guidelines. Results: We propose an eight-step procedure for better validation of meta-analytic results in systematic reviews: (1) Obtain the 95% confidence intervals and the P-values from both fixed-effect and random-effects meta-analyses and report the most conservative results as the main results. (2) Explore the reasons behind substantial statistical heterogeneity using subgroup and sensitivity analyses (see step 6). (3) To take account of problems with multiplicity, adjust the thresholds for significance according to the number of primary outcomes. (4) Calculate required information sizes (≈ the a priori required number of participants for a meta-analysis to be conclusive) for all outcomes and analyse each outcome with trial sequential analysis. Report whether the trial sequential monitoring boundaries for benefit, harm, or futility are crossed. (5) Calculate Bayes factors for all primary outcomes. (6) Use subgroup analyses and sensitivity analyses to assess the potential impact of bias on the review results. (7) Assess the risk of publication bias. (8) Assess the clinical significance of the statistically significant review results. Conclusions: If followed, the proposed eight-step procedure will increase the validity of assessments of intervention effects in systematic reviews of randomised clinical trials.

431 citations


Journal ArticleDOI
TL;DR: These extended definitions of attributable risk account for the additional temporal dimension which characterizes exposure-response associations, providing more appropriate attributable measures in the presence of dependencies characterized by potentially complex temporal patterns.
Abstract: Measures of attributable risk are an integral part of epidemiological analyses, particularly when aimed at the planning and evaluation of public health interventions. However, the current definition of such measures does not consider any temporal relationships between exposure and risk. In this contribution, we propose extended definitions of attributable risk within the framework of distributed lag non-linear models, an approach recently proposed for modelling delayed associations in either linear or non-linear exposure-response associations. We classify versions of attributable number and fraction expressed using either a forward or backward perspective. The former specifies the future burden due to a given exposure event, while the latter summarizes the current burden due to the set of exposure events experienced in the past. In addition, we illustrate how the components related to sub-ranges of the exposure can be separated. We apply these methods for estimating the mortality risk attributable to outdoor temperature in two cities, London and Rome, using time series data for the periods 1993–2006 and 1992–2010, respectively. The analysis provides estimates of the overall mortality burden attributable to temperature, and then computes the components attributable to cold and heat and then mild and extreme temperatures. These extended definitions of attributable risk account for the additional temporal dimension which characterizes exposure-response associations, providing more appropriate attributable measures in the presence of dependencies characterized by potentially complex temporal patterns.
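
A toy sketch of the "backward" attributable fraction and number described above, AF_t = 1 − exp(−Σ_l β(x_{t−l}, l)); the exposure-lag-response function and the data below are invented stand-ins, not the fitted distributed lag non-linear model from the paper.

```python
# Sketch of the backward attributable number: the burden at time t attributable
# to exposures experienced over the previous L days. The exposure-lag-response
# surface here is a toy function, not a fitted DLNM.
import numpy as np

def log_rr(x, lag):
    """Toy exposure-lag-response: log relative risk contribution of exposure x
    experienced `lag` days before the current day (heat effect above 20 degrees)."""
    return 0.002 * max(x - 20.0, 0.0) * np.exp(-lag / 3.0)

temps = np.array([18, 22, 27, 31, 33, 30, 26, 24, 21, 19], dtype=float)
deaths = np.array([50, 48, 55, 60, 63, 58, 54, 52, 49, 47], dtype=float)
L = 5   # maximum lag considered

for t in range(L, len(temps)):
    cum = sum(log_rr(temps[t - l], l) for l in range(L + 1))
    af_back = 1 - np.exp(-cum)       # backward attributable fraction at day t
    an_back = af_back * deaths[t]    # backward attributable number at day t
    print(t, round(af_back, 3), round(an_back, 2))
```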

404 citations


Journal ArticleDOI
TL;DR: Modern modelling techniques such as SVM, NN and RF may need over 10 times as many events per variable as classical modelling techniques such as LR to achieve a stable AUC and small optimism, which implies that such modern techniques should only be used in medical prediction problems if very large data sets are available.
Abstract: Background Modern modelling techniques may potentially provide more accurate predictions of binary outcomes than classical techniques. We aimed to study the predictive performance of different modelling techniques in relation to the effective sample size (“data hungriness”).

Journal ArticleDOI
TL;DR: PMM and LRD may have a role for imputing covariates which are not strongly associated with outcome and when the imputation model is thought to be slightly but not grossly misspecified, but researchers should still focus on specifying the imputation model correctly rather than relying on these methods to compensate.
Abstract: Multiple imputation is a commonly used method for handling incomplete covariates as it can provide valid inference when data are missing at random. This depends on being able to correctly specify the parametric model used to impute missing values, which may be difficult in many realistic settings. Imputation by predictive mean matching (PMM) borrows an observed value from a donor with a similar predictive mean; imputation by local residual draws (LRD) instead borrows the donor’s residual. Both methods relax some assumptions of parametric imputation, promising greater robustness when the imputation model is misspecified. We review development of PMM and LRD and outline the various forms available, and aim to clarify some choices about how and when they should be used. We compare performance to fully parametric imputation in simulation studies, first when the imputation model is correctly specified and then when it is misspecified. In using PMM or LRD we strongly caution against using a single donor, the default value in some implementations, and instead advocate sampling from a pool of around 10 donors. We also clarify which matching metric is best. Among the current MI software there are several poor implementations. PMM and LRD may have a role for imputing covariates (i) which are not strongly associated with outcome, and (ii) when the imputation model is thought to be slightly but not grossly misspecified. Researchers should spend efforts on specifying the imputation model correctly, rather than expecting predictive mean matching or local residual draws to do the work.
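
A bare-bones sketch of the PMM matching step with a donor pool of about 10, as advocated above, for one incomplete covariate; a full multiple-imputation implementation would also draw the regression parameters and repeat the procedure several times.

```python
# Sketch of predictive mean matching (PMM) with a pool of ~10 donors for a
# single incomplete covariate x, given a complete covariate z. Only the
# matching step is shown.
import numpy as np

rng = np.random.default_rng(42)
n = 200
z = rng.normal(size=n)
x = 1.0 + 0.8 * z + rng.normal(scale=0.7, size=n)
x[rng.random(n) < 0.3] = np.nan                       # make ~30% of x missing

obs = ~np.isnan(x)
# Linear prediction model for x given z, fitted on the observed cases
beta = np.polyfit(z[obs], x[obs], deg=1)
pred = np.polyval(beta, z)                            # predictive means for everyone

x_imp = x.copy()
for i in np.where(~obs)[0]:
    # 10 observed donors whose predictive means are closest to that of case i
    donors = np.where(obs)[0][np.argsort(np.abs(pred[obs] - pred[i]))[:10]]
    x_imp[i] = x[rng.choice(donors)]                  # borrow the donor's observed value
```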

Journal ArticleDOI
TL;DR: The aim of this study was to evaluate if and how publication bias was assessed in meta-analyses of DTA, and to compare the results of various statistical methods used to assess publication bias.
Abstract: The validity of a meta-analysis can be understood better in light of the possible impact of publication bias. The majority of the methods to investigate publication bias in terms of small study-effects are developed for meta-analyses of intervention studies, leaving authors of diagnostic test accuracy (DTA) systematic reviews with limited guidance. The aim of this study was to evaluate if and how publication bias was assessed in meta-analyses of DTA, and to compare the results of various statistical methods used to assess publication bias. A systematic search was initiated to identify DTA reviews with a meta-analysis published between September 2011 and January 2012. We extracted all information about publication bias from the reviews and the two-by-two tables. Existing statistical methods for the detection of publication bias were applied on data from the included studies. Out of 1,335 references, 114 reviews could be included. Publication bias was explicitly mentioned in 75 reviews (65.8%) and 47 of these had performed statistical methods to investigate publication bias in terms of small study-effects: 6 by drawing funnel plots, 16 by statistical testing and 25 by applying both methods. The applied tests were Egger’s test (n = 18), Deeks’ test (n = 12), Begg’s test (n = 5), both the Egger and Begg tests (n = 4), and other tests (n = 2). Our own comparison of the results of Begg’s, Egger’s and Deeks’ test for 92 meta-analyses indicated that up to 34% of the results did not correspond with one another. The majority of DTA review authors mention or investigate publication bias. They mainly use suboptimal methods like the Begg and Egger tests that are not developed for DTA meta-analyses. Our comparison of the Begg, Egger and Deeks tests indicated that these tests do give different results and thus are not interchangeable. Deeks’ test is recommended for DTA meta-analyses and should be preferred.
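
For illustration, Egger's regression test, one of the generic small-study-effects tests the review found being applied, is sketched below on invented effect estimates; note that the abstract recommends Deeks' test (not shown here) for DTA meta-analyses.

```python
# Egger's regression test for small-study effects on hypothetical effect
# estimates and standard errors: regress the standardized effect on precision
# and test whether the intercept differs from zero.
import numpy as np
import statsmodels.api as sm

effect = np.array([0.8, 1.1, 0.6, 1.4, 0.9, 1.6, 0.7])   # e.g. log diagnostic odds ratios
se = np.array([0.20, 0.35, 0.15, 0.50, 0.25, 0.60, 0.18])

z = effect / se
precision = 1 / se
fit = sm.OLS(z, sm.add_constant(precision)).fit()
print(fit.params[0], fit.pvalues[0])   # intercept and its p-value (Egger's test)
```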

Journal ArticleDOI
TL;DR: It is shown how significance levels other than the traditional 5% should be considered to provide preliminary evidence for efficacy and how estimation and confidence intervals should be the focus to provide an estimated range of possible treatment effects.
Abstract: Background: In an evaluation of a new health technology, a pilot trial may be undertaken prior to a trial that makes a definitive assessment of benefit. The objective of pilot studies is to provide sufficient evidence that a larger definitive trial can be undertaken and, at times, to provide a preliminary assessment of benefit. Methods: We describe significance thresholds, confidence intervals and surrogate markers in the context of pilot studies and how Bayesian methods can be used in pilot trials. We use a worked example to illustrate the issues raised. Results: We show how significance levels other than the traditional 5% should be considered to provide preliminary evidence for efficacy and how estimation and confidence intervals should be the focus to provide an estimated range of possible treatment effects. We also illustrate how Bayesian methods could also assist in the early assessment of a health technology. Conclusions: We recommend that in pilot trials the focus should be on descriptive statistics and estimation, using confidence intervals, rather than formal hypothesis testing and that confidence intervals other than 95% confidence intervals, such as 85% or 75%, be used for the estimation. The confidence interval should then be interpreted with regards to the minimum clinically important difference. We also recommend that Bayesian methods be used to assist in the interpretation of pilot trials. Surrogate endpoints can also be used in pilot trials but they must reliably predict the overall effect on the clinical outcome.
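
A short sketch of the estimation-focused reporting recommended above: the same hypothetical pilot-trial effect presented with 95%, 85% and 75% confidence intervals, to be judged against the minimum clinically important difference.

```python
# Hypothetical pilot-trial treatment effect reported at several confidence
# levels, to be interpreted against the minimum clinically important difference.
from scipy.stats import norm

effect, se = 2.4, 1.8          # hypothetical mean difference and its standard error
for level in (0.95, 0.85, 0.75):
    z = norm.ppf(0.5 + level / 2)
    print(f"{int(level*100)}% CI: ({effect - z*se:.2f}, {effect + z*se:.2f})")
```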

Journal ArticleDOI
TL;DR: A large gap is apparent between statistical methods research related to missing data and use of these methods in application settings, including RCTs in top medical journals.
Abstract: Missing outcome data is a threat to the validity of treatment effect estimates in randomized controlled trials. We aimed to evaluate the extent, handling, and sensitivity analysis of missing data and intention-to-treat (ITT) analysis of randomized controlled trials (RCTs) in top tier medical journals, and compare our findings with previous reviews related to missing data and ITT in RCTs. Review of RCTs published between July and December 2013 in the BMJ, JAMA, Lancet, and New England Journal of Medicine, excluding cluster randomized trials and trials whose primary outcome was survival. Of the 77 identified eligible articles, 73 (95%) reported some missing outcome data. The median percentage of participants with a missing outcome was 9% (range 0 – 70%). The most commonly used method to handle missing data in the primary analysis was complete case analysis (33, 45%), while 20 (27%) performed simple imputation, 15 (19%) used model based methods, and 6 (8%) used multiple imputation. 27 (35%) trials with missing data reported a sensitivity analysis. However, most did not alter the assumptions of missing data from the primary analysis. Reports of ITT or modified ITT were found in 52 (85%) trials, with 21 (40%) of them including all randomized participants. A comparison to a review of trials reported in 2001 showed that missing data rates and approaches are similar, but the use of the term ITT has increased, as has the report of sensitivity analysis. Missing outcome data continues to be a common problem in RCTs. Definitions of the ITT approach remain inconsistent across trials. A large gap is apparent between statistical methods research related to missing data and use of these methods in application settings, including RCTs in top medical journals.

Journal ArticleDOI
TL;DR: The conditional Poisson model as discussed by the authors avoids estimating stratum parameters by conditioning on the total event count in each stratum, thus simplifying the computing and increasing the number of strata for which fitting is feasible.
Abstract: The time stratified case cross-over approach is a popular alternative to conventional time series regression for analysing associations between time series of environmental exposures (air pollution, weather) and counts of health outcomes. These are almost always analyzed using conditional logistic regression on data expanded to case–control (case crossover) format, but this has some limitations. In particular adjusting for overdispersion and auto-correlation in the counts is not possible. It has been established that a Poisson model for counts with stratum indicators gives identical estimates to those from conditional logistic regression and does not have these limitations, but it is little used, probably because of the overheads in estimating many stratum parameters. The conditional Poisson model avoids estimating stratum parameters by conditioning on the total event count in each stratum, thus simplifying the computing and increasing the number of strata for which fitting is feasible compared with the standard unconditional Poisson model. Unlike the conditional logistic model, the conditional Poisson model does not require expanding the data, and can adjust for overdispersion and auto-correlation. It is available in Stata, R, and other packages. By applying to some real data and using simulations, we demonstrate that conditional Poisson models were simpler to code and shorter to run than are conditional logistic analyses and can be fitted to larger data sets than possible with standard Poisson models. Allowing for overdispersion or autocorrelation was possible with the conditional Poisson model but when not required this model gave identical estimates to those from conditional logistic regression. Conditional Poisson regression models provide an alternative to case crossover analysis of stratified time series data with some advantages. The conditional Poisson model can also be used in other contexts in which primary control for confounding is by fine stratification.
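
A sketch of the time-stratified analysis discussed above on simulated daily counts, fitted here as an unconditional Poisson model with stratum indicators (which the abstract notes gives identical estimates when overdispersion is not modelled); software offering a conditional Poisson fit, such as statsmodels' ConditionalPoisson or R's gnm, would avoid estimating the stratum parameters.

```python
# Poisson regression with year-month-day-of-week stratum indicators on
# simulated daily death counts and a simulated pollution exposure.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_days = 365 * 2
df = pd.DataFrame({
    "pm10": rng.gamma(4, 5, n_days),
    "date": pd.date_range("2012-01-01", periods=n_days),
})
# Time-stratified design: year x month x day-of-week strata
df["stratum"] = (df.date.dt.year.astype(str) + "-" +
                 df.date.dt.month.astype(str) + "-" +
                 df.date.dt.dayofweek.astype(str))
df["deaths"] = rng.poisson(np.exp(2.5 + 0.002 * df["pm10"]))

fit = smf.glm("deaths ~ pm10 + C(stratum)", data=df,
              family=sm.families.Poisson()).fit()
print(fit.params["pm10"])   # log rate ratio per unit PM10
```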

Journal ArticleDOI
TL;DR: Suggestions to improve recruitment included reducing participant burden, providing support for individuals who do not speak English, and forming collaborations with primary care to improve the identification of, and access to, potentially eligible participants.
Abstract: Background Recruiting the required number of participants is vital to the success of clinical research and yet many studies fail to achieve their expected recruitment rate. Increasing research participation is a key agenda within the NHS and elsewhere, but the optimal methods of improving recruitment to clinical research remain elusive. The aim of this study was to identify the factors that researchers perceive as influential in the recruitment of participants to clinically focused research.

Journal ArticleDOI
TL;DR: The SMD was more generalizable than the MD and the MD had a greater statistical power than the SMD but did not result in material differences.
Abstract: To examine empirically whether the mean difference (MD) or the standardised mean difference (SMD) is more generalizable and statistically powerful in meta-analyses of continuous outcomes when the same unit is used. From all the Cochrane Database (March 2013), we identified systematic reviews that combined 3 or more randomised controlled trials (RCT) using the same continuous outcome. Generalizability was assessed using the I-squared (I2) and the percentage agreement. The percentage agreement was calculated by comparing the MD or SMD of each RCT with the corresponding MD or SMD from the meta-analysis of all the other RCTs. The statistical power was estimated using Z-scores. Meta-analyses were conducted using both random-effects and fixed-effect models. 1068 meta-analyses were included. The I2 index was significantly smaller for the SMD than for the MD (P < 0.0001, sign test). For continuous outcomes, the current Cochrane reviews pooled some extremely heterogeneous results. When all these or less heterogeneous subsets of the reviews were examined, the SMD always showed a greater percentage agreement than the MD. When the I2 index was less than 30%, the percentage agreement was 55.3% for MD and 59.8% for SMD in the random-effects model and 53.0% and 59.8%, respectively, in the fixed effect model (both P < 0.0001, sign test). Although the Z-scores were larger for MD than for SMD, there were no differences in the percentage of statistical significance between MD and SMD in either model. The SMD was more generalizable than the MD. The MD had a greater statistical power than the SMD but did not result in material differences.
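
The two effect measures compared above can be illustrated for a single hypothetical two-arm trial reporting a continuous outcome in the same unit.

```python
# Mean difference (MD) versus standardised mean difference (SMD) for one
# hypothetical two-arm trial.
import numpy as np

m1, sd1, n1 = 24.0, 8.0, 60    # intervention arm: mean, SD, n
m2, sd2, n2 = 20.0, 9.0, 58    # control arm

md = m1 - m2                                           # mean difference (MD)
sd_pooled = np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
smd = md / sd_pooled                                   # standardised MD (Cohen's d)
j = 1 - 3 / (4 * (n1 + n2) - 9)                        # small-sample correction factor
print(md, smd, j * smd)                                # MD, SMD, Hedges' g
```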

Journal ArticleDOI
TL;DR: The authors aimed to build on the methods of Noblit and Hare and to explore the challenges of including a large number of qualitative studies in a qualitative systematic review of chronic musculoskeletal pain.
Abstract: Studies that systematically search for and synthesise qualitative research are becoming more evident in health care, and they can make an important contribution to patient care. Our team was funded to complete a meta-ethnography of patients’ experience of chronic musculoskeletal pain. It has been 25 years since Noblit and Hare published their core text on meta-ethnography, and the current health research environment brings additional challenges to researchers aiming to synthesise qualitative research. Noblit and Hare propose seven stages of meta-ethnography which take the researcher from formulating a research idea to expressing the findings. These stages are not discrete but form part of an iterative research process. We aimed to build on the methods of Noblit and Hare and explore the challenges of including a large number of qualitative studies into a qualitative systematic review. These challenges hinge upon epistemological and practical issues to be considered alongside expectations about what determines high quality research. This paper describes our method and explores these challenges. Central to our method was the process of collaborative interpretation of concepts and the decision to exclude original material where we could not decipher a concept. We use excerpts from our research team’s reflexive statements to illustrate the development of our methods.

Journal ArticleDOI
TL;DR: Across a range of correlations between pre- and post-treatment scores and at varying levels and direction of baseline imbalance, ANCOVA remains the optimum statistical method for the analysis of continuous outcomes in RCTs, in terms of bias, precision and statistical power.
Abstract: Analysis of variance (ANOVA), change-score analysis (CSA) and analysis of covariance (ANCOVA) respond differently to baseline imbalance in randomized controlled trials. However, no empirical studies appear to have quantified the differential bias and precision of estimates derived from these methods of analysis, and their relative statistical power, in relation to combinations of levels of key trial characteristics. This simulation study therefore examined the relative bias, precision and statistical power of these three analyses using simulated trial data. 126 hypothetical trial scenarios were evaluated (126 000 datasets), each with continuous data simulated by using a combination of levels of: treatment effect; pretest-posttest correlation; direction and magnitude of baseline imbalance. The bias, precision and power of each method of analysis were calculated for each scenario. Compared to the unbiased estimates produced by ANCOVA, both ANOVA and CSA are subject to bias, in relation to pretest-posttest correlation and the direction of baseline imbalance. Additionally, ANOVA and CSA are less precise than ANCOVA, especially when pretest-posttest correlation ≥ 0.3. When groups are balanced at baseline, ANCOVA is at least as powerful as the other analyses. Apparently greater power of ANOVA and CSA at certain imbalances is achieved in respect of a biased treatment effect. Across a range of correlations between pre- and post-treatment scores and at varying levels and direction of baseline imbalance, ANCOVA remains the optimum statistical method for the analysis of continuous outcomes in RCTs, in terms of bias, precision and statistical power.
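
A small simulation sketch, in the spirit of the study above, of the three analyses applied to one trial with an induced baseline imbalance; numbers are illustrative only.

```python
# ANOVA (post-score only), change-score analysis (CSA) and ANCOVA applied to
# one simulated trial with a baseline-imbalanced continuous outcome.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 200
group = rng.integers(0, 2, n)                           # 1 = treatment
pre = rng.normal(50, 10, n) + 2.0 * group               # induced baseline imbalance
post = 0.6 * pre + 5.0 * group + rng.normal(0, 8, n)    # true treatment effect = 5
d = pd.DataFrame({"group": group, "pre": pre, "post": post,
                  "change": post - pre})

anova  = smf.ols("post ~ group", data=d).fit()          # ignores baseline
csa    = smf.ols("change ~ group", data=d).fit()        # change-score analysis
ancova = smf.ols("post ~ group + pre", data=d).fit()    # adjusts for baseline

for name, fit in [("ANOVA", anova), ("CSA", csa), ("ANCOVA", ancova)]:
    print(name, round(fit.params["group"], 2), round(fit.bse["group"], 2))
```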

Journal ArticleDOI
TL;DR: GP survey response rates may improve by using the following strategies: monetary and nonmonetary incentives, larger incentives, upfront monetary incentives, postal surveys, pre-contact with a phone call from a peer, personalised packages, sending mail on Friday, and using registered mail.
Abstract: Background Low survey response rates in general practice are common and lead to loss of power, selection bias, unexpected budgetary constraints and time delays in research projects.

Journal ArticleDOI
TL;DR: This is the first in-depth methodological systematic review of meta-ethnography conduct and reporting, focusing on the analysis and synthesis process and output of health-related meta-ethnography journal papers published from 2012–2013.
Abstract: Syntheses of qualitative studies can inform health policy, services and our understanding of patient experience. Meta-ethnography is a systematic seven-phase interpretive qualitative synthesis approach well-suited to producing new theories and conceptual models. However, there are concerns about the quality of meta-ethnography reporting, particularly the analysis and synthesis processes. Our aim was to investigate the application and reporting of methods in recent meta-ethnography journal papers, focusing on the analysis and synthesis process and output. Methodological systematic review of health-related meta-ethnography journal papers published from 2012–2013. We searched six electronic databases, Google Scholar and Zetoc for papers using key terms including ‘meta-ethnography.’ Two authors independently screened papers by title and abstract with 100% agreement. We identified 32 relevant papers. Three authors independently extracted data and all authors analysed the application and reporting of methods using content analysis. Meta-ethnography was applied in diverse ways, sometimes inappropriately. In 13% of papers the approach did not suit the research aim. In 66% of papers reviewers did not follow the principles of meta-ethnography. The analytical and synthesis processes were poorly reported overall. In only 31% of papers reviewers clearly described how they analysed conceptual data from primary studies (phase 5, ‘translation’ of studies) and in only one paper (3%) reviewers explicitly described how they conducted the analytic synthesis process (phase 6). In 38% of papers we could not ascertain if reviewers had achieved any new interpretation of primary studies. In over 30% of papers seminal methodological texts which could have informed methods were not cited. We believe this is the first in-depth methodological systematic review of meta-ethnography conduct and reporting. Meta-ethnography is an evolving approach. Current reporting of methods, analysis and synthesis lacks clarity and comprehensiveness. This is a major barrier to use of meta-ethnography findings that could contribute significantly to the evidence base because it makes judging their rigour and credibility difficult. To realise the high potential value of meta-ethnography for enhancing health care and understanding patient experience requires reporting that clearly conveys the methodology, analysis and findings. Tailored meta-ethnography reporting guidelines, developed through expert consensus, could improve reporting.

Journal ArticleDOI
TL;DR: Assessment of intervention effects in randomised clinical trials deserves more rigour in order to become more valid, and the proposed five-step procedure may increase the validity of assessments of intervention effects in randomised clinical trials.
Abstract: Thresholds for statistical significance are insufficiently demonstrated by 95% confidence intervals or P-values when assessing results from randomised clinical trials. First, a P-value only shows the probability of getting a result assuming that the null hypothesis is true and does not reflect the probability of getting a result assuming an alternative hypothesis to the null hypothesis is true. Second, a confidence interval or a P-value showing significance may be caused by multiplicity. Third, statistical significance does not necessarily result in clinical significance. Therefore, assessment of intervention effects in randomised clinical trials deserves more rigour in order to become more valid. Several methodologies for assessing the statistical and clinical significance of intervention effects in randomised clinical trials were considered. Balancing simplicity and comprehensiveness, a simple five-step procedure was developed. For a more valid assessment of results from a randomised clinical trial we propose the following five-steps: (1) report the confidence intervals and the exact P-values; (2) report Bayes factor for the primary outcome, being the ratio of the probability that a given trial result is compatible with a ‘null’ effect (corresponding to the P-value) divided by the probability that the trial result is compatible with the intervention effect hypothesised in the sample size calculation; (3) adjust the confidence intervals and the statistical significance threshold if the trial is stopped early or if interim analyses have been conducted; (4) adjust the confidence intervals and the P-values for multiplicity due to number of outcome comparisons; and (5) assess clinical significance of the trial results. If the proposed five-step procedure is followed, this may increase the validity of assessments of intervention effects in randomised clinical trials.
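
A sketch of the Bayes factor in step (2), using a normal approximation and hypothetical numbers; the observed effect's likelihood under the null is divided by its likelihood under the effect assumed in the sample size calculation.

```python
# Bayes factor comparing the null effect with the effect hypothesised in the
# sample size calculation, via a normal approximation to the likelihood of the
# observed estimate (hypothetical numbers).
from scipy.stats import norm

theta_hat = 0.12      # observed intervention effect (e.g. risk difference)
se = 0.07             # its standard error
theta_a = 0.20        # effect hypothesised in the sample size calculation

bf = norm.pdf(theta_hat, loc=0.0, scale=se) / norm.pdf(theta_hat, loc=theta_a, scale=se)
print(f"Bayes factor (null vs anticipated effect) = {bf:.2f}")
# Values well below 1 favour the anticipated effect over the null.
```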

Journal ArticleDOI
TL;DR: The authors' subjective judgement was that Latent Gold offered the best balance of sensitivity to subgroups, ease of use and presentation of results with these datasets but it is recognised that different clustering methods may suit other types of data and clinical research questions.
Abstract: Background: There are various methodological approaches to identifying clinically important subgroups and one method is to identify clusters of characteristics that differentiate people in cross-sectional and/or longitudinal data using Cluster Analysis (CA) or Latent Class Analysis (LCA). There is a scarcity of head-to-head comparisons that can inform the choice of which clustering method might be suitable for particular clinical datasets and research questions. Therefore, the aim of this study was to perform a head-to-head comparison of three commonly available methods (SPSS TwoStep CA, Latent Gold LCA and SNOB LCA). Methods: The performance of these three methods was compared: (i) quantitatively using the number of subgroups detected, the classification probability of individuals into subgroups, the reproducibility of results, and (ii) qualitatively using subjective judgments about each program’s ease of use and interpretability of the presentation of results. We analysed five real datasets of varying complexity in a secondary analysis of data from other research projects. Three datasets contained only MRI findings (n=2,060 to 20,810 vertebral disc levels), one dataset contained only pain intensity data collected for 52 weeks by text (SMS) messaging (n=1,121 people), and the last dataset contained a range of clinical variables measured in low back pain patients (n=543 people). Four artificial datasets (n=1,000 each) containing subgroups of varying complexity were also analysed testing the ability of these clustering methods to detect subgroups and correctly classify individuals when subgroup membership was known. Results: The results from the real clinical datasets indicated that the number of subgroups detected varied, the certainty of classifying individuals into those subgroups varied, the findings had perfect reproducibility, some programs were easier to use and the interpretability of the presentation of their findings also varied. The results from the artificial datasets indicated that all three clustering methods showed a near-perfect ability to detect known subgroups and correctly classify individuals into those subgroups.

Journal ArticleDOI
TL;DR: Validation studies that included objective confirmation of clinical stability in their design yielded better results for the test-retest analysis with regard to both pain and global HRQoL scores, and it is suggested that special attention be focused on clinical stability when designing a PRO validation study that includes advanced cancer patients under PC.
Abstract: Patient-reported outcome validation needs to achieve validity and reliability standards. Among reliability analysis parameters, test-retest reliability is an important psychometric property. Retested patients must be in a clinically stable condition. This is particularly problematic in palliative care (PC) settings because advanced cancer patients are prone to a faster rate of clinical deterioration. The aim of this study was to evaluate the methods by which multi-symptom and health-related qualities of life (HRQoL) based on patient-reported outcomes (PROs) have been validated in oncological PC settings with regards to test-retest reliability. A systematic search of PubMed (1966 to June 2013), EMBASE (1980 to June 2013), PsychInfo (1806 to June 2013), CINAHL (1980 to June 2013), and SCIELO (1998 to June 2013), and specific PRO databases was performed. Studies were included if they described a set of validation studies. Studies were included if they described a set of validation studies for an instrument developed to measure multi-symptom or multidimensional HRQoL in advanced cancer patients under PC. The COSMIN checklist was used to rate the methodological quality of the study designs. We identified 89 validation studies from 746 potentially relevant articles. From those 89 articles, 31 measured test-retest reliability and were included in this review. Upon critical analysis of the overall quality of the criteria used to determine the test-retest reliability, 6 (19.4%), 17 (54.8%), and 8 (25.8%) of these articles were rated as good, fair, or poor, respectively, and no article was classified as excellent. Multi-symptom instruments were retested over a shortened interval when compared to the HRQoL instruments (median values 24 hours and 168 hours, respectively; p = 0.001). Validation studies that included objective confirmation of clinical stability in their design yielded better results for the test-retest analysis with regard to both pain and global HRQoL scores (p < 0.05). The quality of the statistical analysis and its description were of great concern. Test-retest reliability has been infrequently and poorly evaluated. The confirmation of clinical stability was an important factor in our analysis, and we suggest that special attention be focused on clinical stability when designing a PRO validation study that includes advanced cancer patients under PC.

Journal ArticleDOI
TL;DR: With a small number of centres, fixed-effects, random-effects, or GEE with non-robust SEs can be used, while random-effects and GEE with non-robust SEs are recommended with a moderate or large number of centres, because fixed-effects led to biased estimates and inflated type I error rates in many situations and Mantel-Haenszel lost power compared to other analysis methods in some situations.
Abstract: It is often desirable to account for centre-effects in the analysis of multicentre randomised trials, however it is unclear which analysis methods are best in trials with a binary outcome. We compared the performance of four methods of analysis (fixed-effects models, random-effects models, generalised estimating equations (GEE), and Mantel-Haenszel) using a re-analysis of a previously reported randomised trial (MIST2) and a large simulation study. The re-analysis of MIST2 found that fixed-effects and Mantel-Haenszel led to many patients being dropped from the analysis due to over-stratification (up to 69% dropped for Mantel-Haenszel, and up to 33% dropped for fixed-effects). Conversely, random-effects and GEE included all patients in the analysis, however GEE did not reach convergence. Estimated treatment effects and p-values were highly variable across different analysis methods. The simulation study found that most methods of analysis performed well with a small number of centres. With a large number of centres, fixed-effects led to biased estimates and inflated type I error rates in many situations, and Mantel-Haenszel lost power compared to other analysis methods in some situations. Conversely, both random-effects and GEE gave nominal type I error rates and good power across all scenarios, and were usually as good as or better than either fixed-effects or Mantel-Haenszel. However, this was only true for GEEs with non-robust standard errors (SEs); using a robust ‘sandwich’ estimator led to inflated type I error rates across most scenarios. With a small number of centres, we recommend the use of fixed-effects, random-effects, or GEE with non-robust SEs. Random-effects and GEE with non-robust SEs should be used with a moderate or large number of centres.
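
A sketch of two of the four approaches on simulated multicentre binary data: a fixed-effects logistic model with centre indicators and a GEE with exchangeable correlation and naive (non-robust) standard errors; this assumes a statsmodels installation that provides these model classes.

```python
# Fixed-effects logistic regression and GEE (exchangeable, naive SEs) for a
# simulated multicentre trial with a binary outcome.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_centres, n_per = 10, 60
centre = np.repeat(np.arange(n_centres), n_per)
treat = rng.integers(0, 2, n_centres * n_per)
centre_effect = rng.normal(0, 0.5, n_centres)[centre]
p = 1 / (1 + np.exp(-(-0.5 + 0.4 * treat + centre_effect)))
d = pd.DataFrame({"y": rng.binomial(1, p), "treat": treat, "centre": centre})

fixed = smf.logit("y ~ treat + C(centre)", data=d).fit(disp=0)
gee = smf.gee("y ~ treat", groups="centre", data=d,
              family=sm.families.Binomial(),
              cov_struct=sm.cov_struct.Exchangeable()).fit(cov_type="naive")
print(fixed.params["treat"], gee.params["treat"])   # log odds ratios for treatment
```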

Journal ArticleDOI
TL;DR: A vignette was incorporated into qualitative interview discussion guides and used successfully in rural Africa to draw out barriers to PMTCT service use; vignettes may also be valuable in HIV, health service use and drug adherence research in this setting.
Abstract: Background: Vignettes are short stories about a hypothetical person, traditionally used within research (quantitative or qualitative) on sensitive topics in the developed world. Studies using vignettes in the developing world are emerging, but with no critical examination of their usefulness in such settings. We describe the development and application of vignettes to a qualitative investigation of barriers to uptake of prevention of mother-to-child transmission (PMTCT) HIV services in rural Tanzania in 2012, and critique the successes and challenges of using the technique in this setting. Methods: Participatory Learning and Action (PLA) group activities (3 male; 3 female groups from Kisesa, north-west Tanzania) were used to develop a vignette representing realistic experiences of an HIV-infected pregnant woman in the community. The vignette was discussed during in-depth interviews with 16 HIV-positive women, 3 partners/relatives, and 5 HIV-negative women who had given birth recently. A critical analysis was applied to assess the development, implementation and usefulness of the vignette. Results: The majority of in-depth interviewees understood the concept of the vignette and felt the story was realistic, although the story or questions needed repeating in some cases. In-depth interviewers generally applied the vignette as intended, though occasionally were unsure whether to steer the conversation back to the vignette character when participants segued into personal experiences. Interviewees were occasionally confused by questions and responded with what the character should do rather than would do; also confusing fieldworkers and presenting difficulties for researchers in interpretation. Use of the vignette achieved the main objectives, putting most participants at ease and generating data on barriers to PMTCT service uptake. Participants’ responses to the vignette often reflected their own experience (revealed later in the interviews). Conclusions: Participatory group research is an effective method for developing vignettes. A vignette was incorporated into qualitative interview discussion guides and used successfully in rural Africa to draw out barriers to PMTCT service use; vignettes may also be valuable in HIV, health service use and drug adherence research in this setting. Application of this technique can prove challenging for fieldworkers, so thorough training should be provided prior to its use.

Journal ArticleDOI
TL;DR: Bias was greater when the match rate was low or the identifier error rate was high; in these cases, PII performed better than HW classification at reducing bias due to false matches. This study highlights the importance of evaluating the potential impact of linkage error on results.
Abstract: Background: Linkage of electronic healthcare records is becoming increasingly important for research purposes. However, linkage error due to mis-recorded or missing identifiers can lead to biased results. We evaluated the impact of linkage error on estimated infection rates using two different methods for classifying links: highest-weight (HW) classification using probabilistic match weights and prior-informed imputation (PII) using match probabilities. Methods: A gold-standard dataset was created through deterministic linkage of unique identifiers in admission data from two hospitals and infection data recorded at the hospital laboratories (original data). Unique identifiers were then removed and data were re-linked by date of birth, sex and Soundex using two classification methods: i) HW classification - accepting the candidate record with the highest weight exceeding a threshold and ii) PII–imputing values from a match probability distribution. To evaluate methods for linking data with different error rates, non-random error and different match rates, we generated simulation data. Each set of simulated files was linked using both classification methods. Infection rates in the linked data were compared with those in the gold-standard data. Results: In the original gold-standard data, 1496/20924 admissions linked to an infection. In the linked original data, PII provided least biased results: 1481 and 1457 infections (upper/lower thresholds) compared with 1316 and 1287 (HW upper/lower thresholds). In the simulated data, substantial bias (up to 112%) was introduced when linkage error varied by hospital. Bias was also greater when the match rate was low or the identifier error rate was high and in these cases, PII performed better than HW classification at reducing bias due to false-matches. Conclusions: This study highlights the importance of evaluating the potential impact of linkage error on results. PII can help incorporate linkage uncertainty into analysis and reduce bias due to linkage error, without requiring identifiers.
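
A toy illustration of the highest-weight (HW) classification step described above, using invented m- and u-probabilities: agreement on an identifier contributes log2(m/u) to the match weight, disagreement contributes log2((1−m)/(1−u)), and the candidate with the largest total weight is accepted only if it exceeds a threshold.

```python
# Toy highest-weight classification with invented m/u probabilities and
# invented agreement patterns for three candidate record pairs.
import numpy as np

m = {"dob": 0.98, "sex": 0.99, "soundex": 0.95}   # P(agree | true match)
u = {"dob": 0.01, "sex": 0.50, "soundex": 0.05}   # P(agree | non-match)

def weight(agreements):
    return sum(np.log2(m[f] / u[f]) if agree else np.log2((1 - m[f]) / (1 - u[f]))
               for f, agree in agreements.items())

candidates = [
    {"dob": True,  "sex": True,  "soundex": True},
    {"dob": True,  "sex": True,  "soundex": False},
    {"dob": False, "sex": True,  "soundex": True},
]
weights = [weight(c) for c in candidates]
best = int(np.argmax(weights))
threshold = 5.0   # illustrative acceptance threshold
print(weights, "accepted" if weights[best] > threshold else "rejected")
```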

Journal ArticleDOI
TL;DR: Performance of prognostic models constructed using the lasso technique can be optimistic as well, although results of the internal validation are sensitive to how bootstrap resampling is performed.
Abstract: Background: In prognostic studies, the lasso technique is attractive since it improves the quality of predictions by shrinking regression coefficients, compared to predictions based on a model fitted via unpenalized maximum likelihood. Since some coefficients are set to zero, parsimony is achieved as well. It is unclear whether the performance of a model fitted using the lasso still shows some optimism. Bootstrap methods have been advocated to quantify optimism and generalize model performance to new subjects. It is unclear how resampling should be performed in the presence of multiply imputed data. Method: The data were based on a cohort of Chronic Obstructive Pulmonary Disease patients. We constructed models to predict Chronic Respiratory Questionnaire dyspnea 6 months ahead. Optimism of the lasso model was investigated by comparing 4 approaches of handling multiply imputed data in the bootstrap procedure, using the study data and simulated data sets. In the first 3 approaches, data sets that had been completed via multiple imputation (MI) were resampled, while the fourth approach resampled the incomplete data set and then performed MI. Results: The discriminative model performance of the lasso was optimistic. There was suboptimal calibration due to over-shrinkage. The estimate of optimism was sensitive to the choice of handling imputed data in the bootstrap resampling procedure. Resampling the completed data sets underestimates optimism, especially if, within a bootstrap step, selected individuals differ over the imputed data sets. Incorporating the MI procedure in the validation yields estimates of optimism that are closer to the true value, albeit slightly too large. Conclusion: Performance of prognostic models constructed using the lasso technique can be optimistic as well. Results of the internal validation are sensitive to how bootstrap resampling is performed.
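
A sketch of the bootstrap estimate of optimism for a lasso model on a single complete (simulated) dataset; the paper's additional question of how to combine this with multiply imputed data is not shown.

```python
# Bootstrap optimism for a lasso model: refit on each bootstrap sample,
# compare apparent performance in the bootstrap sample with performance on
# the original data, and subtract the average difference.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score

rng = np.random.default_rng(5)
n, p = 150, 30
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([1.0, 0.5, -0.7]) + rng.normal(scale=2.0, size=n)

orig = LassoCV(cv=5).fit(X, y)
apparent = r2_score(y, orig.predict(X))

optimism = []
for _ in range(100):                      # bootstrap replicates
    idx = rng.integers(0, n, n)
    boot = LassoCV(cv=5).fit(X[idx], y[idx])
    perf_boot = r2_score(y[idx], boot.predict(X[idx]))   # apparent in bootstrap sample
    perf_test = r2_score(y, boot.predict(X))             # tested on original data
    optimism.append(perf_boot - perf_test)

print(apparent, apparent - np.mean(optimism))   # apparent vs optimism-corrected R^2
```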

Journal ArticleDOI
TL;DR: An IPD meta-analysis offers unique opportunities for risk prediction research by allowing separate model intercept terms for each study (population) to improve generalisability, and by using ‘internal-external cross-validation’ to simultaneously develop and validate the model.
Abstract: Background Risk prediction models estimate the risk of developing future outcomes for individuals based on one or more underlying characteristics (predictors). We review how researchers develop and validate risk prediction models within an individual participant data (IPD) meta-analysis, in order to assess the feasibility and conduct of the approach.
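
A sketch of the internal-external cross-validation idea mentioned in the TL;DR above: each study is left out in turn, the model is developed on the remaining studies, and discrimination is checked in the omitted study; the data and model below are simulated stand-ins.

```python
# Internal-external cross-validation across 5 simulated studies: develop a
# logistic model on all-but-one study, then evaluate discrimination in the
# held-out study.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(9)
studies = []
for s in range(5):                                   # 5 hypothetical studies
    n = 400
    x = rng.normal(size=(n, 2))
    lp = rng.normal(-1.0, 0.3) + x @ np.array([0.8, -0.5])   # study-specific intercept
    y = rng.binomial(1, 1 / (1 + np.exp(-lp)))
    studies.append((x, y))

for held_out in range(len(studies)):
    X_dev = np.vstack([x for i, (x, y) in enumerate(studies) if i != held_out])
    y_dev = np.concatenate([y for i, (x, y) in enumerate(studies) if i != held_out])
    fit = sm.Logit(y_dev, sm.add_constant(X_dev)).fit(disp=0)
    x_val, y_val = studies[held_out]
    pred = fit.predict(sm.add_constant(x_val))
    print(f"held-out study {held_out}: c-statistic = {roc_auc_score(y_val, pred):.3f}")
```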