
Showing papers in "BMC Medical Research Methodology in 2016"


Journal ArticleDOI
TL;DR: In this article, the authors conducted a scoping review to identify papers that utilized and/or described scoping review methods; guidelines for reporting scoping reviews; and studies that assessed the quality of reporting of scoping reviews.
Abstract: Scoping reviews are used to identify knowledge gaps, set research agendas, and identify implications for decision-making. The conduct and reporting of scoping reviews is inconsistent in the literature. We conducted a scoping review to identify: papers that utilized and/or described scoping review methods; guidelines for reporting scoping reviews; and studies that assessed the quality of reporting of scoping reviews. We searched nine electronic databases for published and unpublished literature on scoping review papers, scoping review methodology, and reporting guidance for scoping reviews. Two independent reviewers screened citations for inclusion. Data abstraction was performed by one reviewer and verified by a second reviewer. Quantitative (e.g. frequencies of methods) and qualitative (i.e. content analysis of the methods) syntheses were conducted. After screening 1525 citations and 874 full-text papers, 516 articles were included, of which 494 were scoping reviews. The 494 scoping reviews were disseminated between 1999 and 2014, with 45 % published after 2012. Most of the scoping reviews were conducted in North America (53 %) or Europe (38 %), and reported a public source of funding (64 %). The number of studies included in the scoping reviews ranged from 1 to 2600 (mean of 118). Using the Joanna Briggs Institute methodology guidance for scoping reviews, only 13 % of the scoping reviews reported the use of a protocol, 36 % used two reviewers for selecting citations for inclusion, 29 % used two reviewers for full-text screening, 30 % used two reviewers for data charting, and 43 % used a pre-defined charting form. In most cases, the results of the scoping review were used to identify evidence gaps (85 %), provide recommendations for future research (84 %), or identify strengths and limitations (69 %). We did not identify any guidelines for reporting scoping reviews or studies that assessed the quality of scoping review reporting. The number of scoping reviews conducted per year has steadily increased since 2012. Scoping reviews are used to inform research agendas and identify implications for policy or practice. As such, improvements in reporting and conduct are imperative. Further research on scoping review methodology is warranted, and in particular, there is a need for a guideline to standardize reporting.

856 citations


Journal ArticleDOI
TL;DR: The snowball sampling method achieved greater participation, recruiting more Hispanics and more individuals with disabilities, than a purposive-convenience sampling method; however, priorities for research on chronic pain from both stakeholder groups were similar.
Abstract: Effective community-partnered and patient-centered outcomes research needs to address community priorities. However, optimal sampling methods to engage stakeholders from hard-to-reach, vulnerable communities to generate research priorities have not been identified. In two similar rural, largely Hispanic communities, a community advisory board guided recruitment of stakeholders affected by chronic pain using a different method in each community: 1) snowball sampling, a chain-referral method; or 2) purposive sampling to recruit diverse stakeholders. In both communities, three groups of stakeholders attended a series of three facilitated meetings to orient, brainstorm, and prioritize ideas (9 meetings/community). Using mixed methods analysis, we compared stakeholder recruitment and retention as well as priorities from both communities’ stakeholders on mean ratings of their ideas based on importance and feasibility for implementation in their community. Of 65 eligible stakeholders in one community recruited by snowball sampling, 55 (85 %) consented, 52 (95 %) attended the first meeting, and 36 (65 %) attended all 3 meetings. In the second community, the purposive sampling method was supplemented by convenience sampling to increase recruitment. Of 69 stakeholders recruited by this combined strategy, 62 (90 %) consented, 36 (58 %) attended the first meeting, and 26 (42 %) attended all 3 meetings. Snowball sampling recruited more Hispanics and disabled persons (all P < 0.05). Despite differing recruitment strategies, stakeholders from the two communities identified largely similar ideas for research, focusing on non-pharmacologic interventions for management of chronic pain. Ratings on importance and feasibility for community implementation differed only on the importance of massage services (P = 0.045), which was higher for the purposive/convenience sampling group, and for city improvements/transportation services (P = 0.004), which was higher for the snowball sampling group. In each of the two similar hard-to-reach communities, a community advisory board partnered with researchers to implement a different sampling method to recruit stakeholders. The snowball sampling method achieved greater participation, with more Hispanics but also more individuals with disabilities, than a purposive-convenience sampling method. However, priorities for research on chronic pain from both stakeholder groups were similar. Although utilizing a snowball sampling method appears to be superior, further research is needed on implementation costs and resources.

315 citations


Journal ArticleDOI
TL;DR: It is shown that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation, and there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
Abstract: Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared. The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.

275 citations
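A rough sketch of the kind of Monte Carlo comparison described above, not the authors' simulation design: data are generated at a low events-per-variable ratio, maximum-likelihood logistic regression is fitted, near-separated fits are flagged, and Firth's correction is applied via the standard leverage-adjusted score update. The sample size, coefficients and separation check are illustrative assumptions.

```python
# Sketch: ML vs Firth-corrected logistic regression at low EPV.
# Illustrative parameters only; not the simulation design of the paper.
import numpy as np
import statsmodels.api as sm
from scipy.special import expit

rng = np.random.default_rng(1)

def firth_logit(X, y, n_iter=50, tol=1e-8):
    """Firth-penalized logistic regression via the leverage-adjusted score."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = expit(X @ beta)
        W = pi * (1 - pi)
        XtWX_inv = np.linalg.inv(X.T @ (X * W[:, None]))
        Xw = X * np.sqrt(W)[:, None]
        h = np.einsum('ij,jk,ik->i', Xw, XtWX_inv, Xw)   # leverages
        step = XtWX_inv @ (X.T @ (y - pi + h * (0.5 - pi)))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

p = 5
beta_true = np.r_[-2.0, np.full(p, 0.7)]         # intercept + 5 covariates, few events
ml_est, firth_est, n_unstable = [], [], 0
for _ in range(200):
    X = np.column_stack([np.ones(80), rng.standard_normal((80, p))])
    y = rng.binomial(1, expit(X @ beta_true))
    firth_est.append(firth_logit(X, y))
    try:
        ml = sm.Logit(y, X).fit(disp=0, maxiter=200)
        if np.max(np.abs(ml.params)) > 10:        # crude flag for (near-)separation
            n_unstable += 1
        else:
            ml_est.append(ml.params)
    except Exception:                             # e.g. perfect separation errors
        n_unstable += 1

print("separated / unstable ML fits:", n_unstable)
print("true coefficients  :", beta_true)
print("mean ML estimate   :", np.round(np.mean(ml_est, axis=0), 2))
print("mean Firth estimate:", np.round(np.mean(firth_est, axis=0), 2))
```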


Journal ArticleDOI
TL;DR: In this paper, the authors developed a clinical pathway algorithm that sets forth a stepwise process for making decisions about the diagnosis and treatment of rotator cuff pathology presenting to primary, secondary, and tertiary healthcare settings.
Abstract: Patients presenting to the healthcare system with rotator cuff pathology do not always receive high quality care. High quality care occurs when a patient receives care that is accessible, appropriate, acceptable, effective, efficient, and safe. The aim of this study was twofold: 1) to develop a clinical pathway algorithm that sets forth a stepwise process for making decisions about the diagnosis and treatment of rotator cuff pathology presenting to primary, secondary, and tertiary healthcare settings; and 2) to establish clinical practice guidelines for the diagnosis and treatment of rotator cuff pathology to inform decision-making processes within the algorithm. A three-step modified Delphi method was used to establish consensus. Fourteen experts representing athletic therapy, physiotherapy, sport medicine, and orthopaedic surgery were invited to participate as the expert panel. In round 1, 123 best practice statements were distributed to the panel. Panel members were asked to mark “agree” or “disagree” beside each statement, and provide comments. The same voting method was again used for round 2. Round 3 consisted of a final face-to-face meeting. In round 1, statements were grouped and reduced to 44 statements that met consensus. In round 2, five statements reached consensus. In round 3, ten statements reached consensus. Consensus was reached for 59 statements representing five domains: screening, diagnosis, physical examination, investigations, and treatment. The final face-to-face meeting was also used to develop clinical pathway algorithms (i.e., clinical care pathways) for three types of rotator cuff pathology: acute, chronic, and acute-on-chronic. This consensus guideline will help to standardize care, provide guidance on the diagnosis and treatment of rotator cuff pathology, and assist in clinical decision-making for all healthcare professionals.

256 citations


Journal ArticleDOI
TL;DR: The authors discuss why and how purposeful sampling was used in a qualitative evidence synthesis about ‘sexual adjustment to a cancer trajectory’, noting that the inclusion of new perspectives in the line-of-argument could make the results more conceptually aligned with the synthesis purpose.
Abstract: An increasing number of qualitative evidence syntheses papers are found in health care literature. Many of these syntheses use a strictly exhaustive search strategy to collect articles, mirroring the standard template developed by major review organizations such as the Cochrane and Campbell Collaboration. The hegemonic idea behind it is that non-comprehensive samples in systematic reviews may introduce selection bias. However, exhaustive sampling in a qualitative evidence synthesis has been questioned, and a more purposeful way of sampling papers has been proposed as an alternative, although there is a lack of transparency on how these purposeful sampling strategies might be applied to a qualitative evidence synthesis. We discuss in our paper why and how we used purposeful sampling in a qualitative evidence synthesis about ‘sexual adjustment to a cancer trajectory’, by giving a worked example. We have chosen a mixed purposeful sampling, combining three different strategies that we considered the most consistent with our research purpose: intensity sampling, maximum variation sampling and confirming/disconfirming case sampling. The concept of purposeful sampling on the meta-level could not readily be borrowed from the logic applied in basic research projects. It also demands a considerable amount of flexibility, and is labour-intensive, which goes against the argument of many authors that using purposeful sampling provides a pragmatic solution or a shortcut for researchers, compared with exhaustive sampling. Opportunities of purposeful sampling included the possible inclusion of new perspectives in the line-of-argument and the enhancement of the theoretical diversity of the papers being included, which could make the results more conceptually aligned with the synthesis purpose. This paper helps researchers to make decisions related to purposeful sampling in a more systematic and transparent way. Future research could confirm or disconfirm the hypothesis of conceptual enhancement by comparing the findings of a purposefully sampled qualitative evidence synthesis with those drawing on an exhaustive sample of the literature.

238 citations


Journal ArticleDOI
TL;DR: The application of methods for integrating sex and gender in implementation research is described, which has potential to strengthen both the practice and science of implementation, improve health outcomes and reduce gender inequities.
Abstract: There has been a recent swell in activity by health research funding organizations and science journal editors to increase uptake of sex and gender considerations in study design, conduct and reporting in order to ensure that research results apply to everyone. However, examination of the implementation research literature reveals that attention to sex and gender has not yet infiltrated research methods in this field. The rationale for routinely considering sex and gender in implementation research is multifold. Sex and gender are important in decision-making, communication, stakeholder engagement and preferences for the uptake of interventions. Gender roles, gender identity, gender relations, and institutionalized gender influence the way in which an implementation strategy works, for whom, under what circumstances and why. There is emerging evidence that programme theories may operate differently within and across sexes, genders and other intersectional characteristics under various circumstances. Furthermore, without proper study, implementation strategies may inadvertently exploit or ignore, rather than transform, thinking about sex and gender-related factors. Techniques are described for measuring and analyzing sex and gender in implementation research using both quantitative and qualitative methods. The present paper describes the application of methods for integrating sex and gender in implementation research. Consistently asking critical questions about sex and gender will likely lead to the discovery of positive outcomes, as well as unintended consequences. The result has potential to strengthen both the practice and science of implementation, improve health outcomes and reduce gender inequities.

205 citations


Journal ArticleDOI
TL;DR: Fleiss’ K and Krippendorff’s alpha with bootstrap confidence intervals are equally suitable for the analysis of reliability of complete nominal data and the asymptotic confidence interval for Fleiss' K should not be used.
Abstract: Reliability of measurements is a prerequisite of medical research. For nominal data, Fleiss’ kappa (in the following labelled as Fleiss’ K) and Krippendorff’s alpha provide the highest flexibility of the available reliability measures with respect to number of raters and categories. Our aim was to investigate which measures and which confidence intervals provide the best statistical properties for the assessment of inter-rater reliability in different situations. We performed a large simulation study to investigate the precision of the estimates for Fleiss’ K and Krippendorff’s alpha and to determine the empirical coverage probability of the corresponding confidence intervals (asymptotic for Fleiss’ K and bootstrap for both measures). Furthermore, we compared measures and confidence intervals in a real world case study. Point estimates of Fleiss’ K and Krippendorff’s alpha did not differ from each other in any of the scenarios. In the case of missing data (completely at random), Krippendorff’s alpha provided stable estimates, while the complete case analysis approach for Fleiss’ K led to biased estimates. For shifted null hypotheses, the coverage probability of the asymptotic confidence interval for Fleiss’ K was low, while the bootstrap confidence intervals for both measures provided a coverage probability close to the theoretical one. Fleiss’ K and Krippendorff’s alpha with bootstrap confidence intervals are equally suitable for the analysis of reliability of complete nominal data. The asymptotic confidence interval for Fleiss’ K should not be used. In the case of missing data or data of higher than nominal order, Krippendorff’s alpha is recommended. Together with this article, we provide an R-script for calculating Fleiss’ K and Krippendorff’s alpha and their corresponding bootstrap confidence intervals.

193 citations
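The authors supply an R-script; purely as an illustration of the same idea in Python (invented ratings, not the case-study data), the sketch below computes Fleiss' K with statsmodels and a percentile bootstrap confidence interval obtained by resampling subjects.

```python
# Sketch: Fleiss' K with a subject-level percentile bootstrap CI.
# Invented ratings (20 subjects x 4 raters, 3 nominal categories).
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
ratings = rng.integers(0, 3, size=(20, 4))        # rows: subjects, cols: raters

def kappa(subject_rows):
    table, _ = aggregate_raters(subject_rows)     # subjects x categories counts
    return fleiss_kappa(table, method='fleiss')

point = kappa(ratings)
boot = []
for _ in range(2000):                             # resample subjects with replacement
    idx = rng.integers(0, ratings.shape[0], ratings.shape[0])
    boot.append(kappa(ratings[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"Fleiss' K = {point:.3f}, 95% bootstrap CI ({lo:.3f}, {hi:.3f})")
```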


Journal ArticleDOI
TL;DR: Using the topic of low-calorie sweeteners and selected health outcomes, the process of creating an evidence-map database is described and several example descriptive analyses using this database are demonstrated.
Abstract: Evidence mapping is an emerging tool used to systematically identify, organize and summarize the quantity and focus of scientific evidence on a broad topic, but there are currently no methodological standards. Using the topic of low-calorie sweeteners (LCS) and selected health outcomes, we describe the process of creating an evidence-map database and demonstrate several example descriptive analyses using this database. The process of creating an evidence-map database is described in detail. The steps include: developing a comprehensive literature search strategy, establishing study eligibility criteria and a systematic study selection process, extracting data, developing outcome groups with input from expert stakeholders and tabulating data using descriptive analyses. The database was uploaded onto SRDR™ (Systematic Review Data Repository), an open public data repository. Our final LCS evidence-map database included 225 studies, of which 208 were interventional studies and 17 were cohort studies. An example bubble plot was produced to display the evidence-map data and visualize research gaps according to four parameters: comparison types, population baseline health status, outcome groups, and study sample size. This plot indicated a lack of studies assessing appetite and dietary intake related outcomes using LCS with a sugar intake comparison in people with diabetes. Evidence mapping is an important tool for the contextualization of in-depth systematic reviews within broader literature and identifies gaps in the evidence base, which can be used to inform future research. An open evidence-map database has the potential to promote knowledge translation from nutrition science to policy.

165 citations
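A minimal sketch of the kind of bubble plot used to visualize research gaps, with invented counts rather than the SRDR evidence-map database: each bubble crosses a comparison type with an outcome group, and its size is proportional to the number of studies.

```python
# Sketch: evidence-map bubble plot (invented counts, not the LCS database).
import matplotlib.pyplot as plt
import pandas as pd

evidence = pd.DataFrame({
    "comparison": ["sugar", "sugar", "placebo", "placebo", "no comparator"],
    "outcome":    ["appetite", "body weight", "glycemia", "body weight", "dietary intake"],
    "n_studies":  [3, 25, 18, 30, 7],
})

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(evidence["comparison"], evidence["outcome"],
           s=evidence["n_studies"] * 40, alpha=0.5)     # bubble area ~ study count
for _, row in evidence.iterrows():
    ax.annotate(row["n_studies"], (row["comparison"], row["outcome"]),
                ha="center", va="center")
ax.set_xlabel("Comparison type")
ax.set_ylabel("Outcome group")
ax.set_title("Evidence map: number of studies per cell")
fig.tight_layout()
plt.show()
```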


Journal ArticleDOI
TL;DR: Although, in an era of personalized medicine, the value of multivariate joint modelling has been established, researchers are currently limited in their ability to fit these models routinely.
Abstract: Available methods for the joint modelling of longitudinal and time-to-event outcomes have typically only allowed for a single longitudinal outcome and a solitary event time. In practice, clinical studies are likely to record multiple longitudinal outcomes. Incorporating all sources of data will improve the predictive capability of any model and lead to more informative inferences for the purpose of medical decision-making. We reviewed current methodologies of joint modelling for time-to-event data and multivariate longitudinal data including the distributional and modelling assumptions, the association structures, estimation approaches, software tools for implementation and clinical applications of the methodologies. We found that a large number of different models have recently been proposed. Most considered jointly modelling linear mixed models with proportional hazard models, with correlation between multiple longitudinal outcomes accounted for through multivariate normally distributed random effects. So-called current value and random effects parameterisations are commonly used to link the models. Despite developments, software is still lacking, which has translated into limited uptake by medical researchers. Although, in an era of personalized medicine, the value of multivariate joint modelling has been established, researchers are currently limited in their ability to fit these models routinely. We make a series of recommendations for future research needs.

135 citations
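For orientation, a generic formulation of the class of models the review covers (our notation, not any single paper's): K longitudinal outcomes modelled with linear mixed submodels, linked to a proportional hazards submodel through current-value association parameters.

```latex
% Generic multivariate joint model (illustrative notation)
\begin{aligned}
  y_{ik}(t) &= m_{ik}(t) + \varepsilon_{ik}(t)
             = x_{ik}^{\top}(t)\beta_k + z_{ik}^{\top}(t) b_{ik} + \varepsilon_{ik}(t),
             \qquad k = 1,\dots,K,\\
  b_i &= (b_{i1},\dots,b_{iK})^{\top} \sim N(0,\Sigma), \qquad
  \varepsilon_{ik}(t) \sim N(0,\sigma_k^2),\\
  \lambda_i(t) &= \lambda_0(t)\exp\!\Big( w_i^{\top}\gamma
             + \textstyle\sum_{k=1}^{K} \alpha_k\, m_{ik}(t) \Big).
\end{aligned}
```

The multivariate normal random effects b_i induce the correlation between the K longitudinal outcomes; replacing m_ik(t) by functions of b_ik gives the random-effects parameterisation mentioned in the abstract.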


Journal ArticleDOI
TL;DR: In this paper, the authors systematically searched three databases (Scopus, CINAHL and Cochrane) for studies published between January 2000 and April 2015, using fatigue and variations of the term MID, e.g. MCID, MIC, etc., as search terms.
Abstract: Fatigue is the most frequent symptom reported by patients with chronic illnesses. As a subjective experience, fatigue is commonly assessed with patient-reported outcome measures (PROMs). There are currently more than 40 generic and disease-specific PROMs for assessing fatigue in use. The interpretation of changes in PROM scores may be enhanced by estimates of the so-called minimal important difference (MID). MIDs are not fixed attributes of PROMs but rather vary in relation to estimation method, clinical and demographic characteristics of the study group, etc. The purpose of this paper is to compile published MIDs for fatigue PROMs, spanning diagnostic/patient groups and estimation methods, and to provide information relevant for appraising their appropriateness for use in specific clinical trials and in monitoring fatigue in defined patient groups in routine clinical practice. A systematic search of three databases (Scopus, CINAHL and Cochrane) was conducted for studies published between January 2000 and April 2015, using fatigue and variations of the term MID, e.g. MCID, MIC, etc. Two authors screened search hits and extracted data independently. Data regarding MIDs, anchors used and study designs were compiled in tables. Included studies (n = 41) reported 60 studies or substudies estimating MID for 28 fatigue scales, subscales or single item measures in a variety of diagnostic groups and study designs. All studies used anchor-based methods, 21/60 measures also included distribution-based methods and 17/60 used triangulation of methods. Both similarities and dissimilarities were seen within the MIDs. Magnitudes of published MIDs for fatigue PROMs vary considerably. Information about the derivation of fatigue MIDs is needed to evaluate their applicability and suitability for use in clinical practice and research.

131 citations
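To make the two families of estimation methods concrete, here is a toy calculation with invented scores (not data from the review): an anchor-based MID taken as the mean change among patients who rate themselves "slightly improved" on a global anchor, alongside two common distribution-based quantities, half a baseline SD and one standard error of measurement (the reliability value is assumed).

```python
# Toy MID calculation: anchor-based vs distribution-based (invented data).
import numpy as np

rng = np.random.default_rng(3)
baseline = rng.normal(50, 10, 200)                 # fatigue PROM scores at baseline
change = rng.normal(-4, 8, 200)                    # change after treatment
anchor = rng.choice(["unchanged", "slightly improved", "much improved"],
                    size=200, p=[0.5, 0.3, 0.2])   # global rating of change

# Anchor-based MID: mean change in the minimally ("slightly") improved group
mid_anchor = change[anchor == "slightly improved"].mean()

# Distribution-based estimates
sd_baseline = baseline.std(ddof=1)
mid_half_sd = 0.5 * sd_baseline                    # half a baseline SD
reliability = 0.85                                 # assumed test-retest reliability
sem = sd_baseline * np.sqrt(1 - reliability)       # standard error of measurement

print(f"anchor-based MID : {mid_anchor:.1f}")
print(f"0.5 SD criterion : {mid_half_sd:.1f}")
print(f"1 SEM criterion  : {sem:.1f}")
```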


Journal ArticleDOI
TL;DR: The approach uses all the available information and results in an estimation not only of the performance of the biomarker but also of the threshold at which the optimal performance can be expected.
Abstract: In meta-analyses of diagnostic test accuracy, routinely only one pair of sensitivity and specificity per study is used. However, for tests based on a biomarker or a questionnaire often more than one threshold and the corresponding values of true positives, true negatives, false positives and false negatives are known. We present a new meta-analysis approach using this additional information. It is based on the idea of estimating the distribution functions of the underlying biomarker or questionnaire within the non-diseased and diseased individuals. Assuming a normal or logistic distribution, we estimate the distribution parameters in both groups applying a linear mixed effects model to the transformed data. The model accounts for across-study heterogeneity and dependence of sensitivity and specificity. In addition, a simulation study is presented. We obtain a summary receiver operating characteristic (SROC) curve as well as the pooled sensitivity and specificity at every specific threshold. Furthermore, the determination of an optimal threshold across studies is possible through maximization of the Youden index. We demonstrate our approach using two meta-analyses of B type natriuretic peptide in heart failure and procalcitonin as a marker for sepsis. Our approach uses all the available information and results in an estimation not only of the performance of the biomarker but also of the threshold at which the optimal performance can be expected.
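A simplified numerical sketch of the core idea, skipping the linear mixed-effects estimation step of the paper: once pooled distributions of the marker in non-diseased and diseased groups are assumed (the normal parameters below are invented), sensitivity and specificity at every threshold, the SROC curve and the Youden-optimal threshold follow directly.

```python
# Sketch: SROC curve and Youden-optimal threshold from assumed pooled
# biomarker distributions (parameters invented for illustration).
import numpy as np
from scipy.stats import norm

mu0, sd0 = 2.0, 1.0        # pooled log-marker distribution, non-diseased
mu1, sd1 = 3.5, 1.2        # pooled log-marker distribution, diseased

thresholds = np.linspace(-1, 7, 400)
sens = 1 - norm.cdf(thresholds, mu1, sd1)   # P(marker > c | diseased)
spec = norm.cdf(thresholds, mu0, sd0)       # P(marker <= c | non-diseased)

youden = sens + spec - 1
best = np.argmax(youden)
print(f"optimal threshold (max Youden index): {thresholds[best]:.2f}")
print(f"sensitivity {sens[best]:.2f}, specificity {spec[best]:.2f}")

# (1 - spec, sens) pairs trace the summary ROC curve across thresholds
sroc = np.column_stack([1 - spec, sens])
```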

Journal ArticleDOI
TL;DR: For the data scenarios examined, choice of optimal LASSO-type method was data structure dependent and should be guided by the research objective.
Abstract: The study of circulating biomarkers and their association with disease outcomes has become progressively complex due to advances in the measurement of these biomarkers through multiplex technologies. The Least Absolute Shrinkage and Selection Operator (LASSO) is a data analysis method that may be utilized for biomarker selection in these high dimensional data. However, it is unclear which LASSO-type method is preferable when considering data scenarios that may be present in serum biomarker research, such as high correlation between biomarkers, weak associations with the outcome, and sparse number of true signals. The goal of this study was to compare the LASSO to five LASSO-type methods given these scenarios. A simulation study was performed to compare the LASSO, Adaptive LASSO, Elastic Net, Iterated LASSO, Bootstrap-Enhanced LASSO, and Weighted Fusion for the binary logistic regression model. The simulation study was designed to reflect the data structure of the population-based Tucson Epidemiological Study of Airway Obstructive Disease (TESAOD), specifically the sample size (N = 1000 for total population, 500 for sub-analyses), correlation of biomarkers (0.20, 0.50, 0.80), prevalence of overweight (40%) and obese (12%) outcomes, and the association of outcomes with standardized serum biomarker concentrations (log-odds ratio = 0.05–1.75). Each LASSO-type method was then applied to the TESAOD data of 306 overweight, 66 obese, and 463 normal-weight subjects with a panel of 86 serum biomarkers. Based on the simulation study, no method had an overall superior performance. The Weighted Fusion correctly identified more true signals, but incorrectly included more noise variables. The LASSO and Elastic Net correctly identified many true signals and excluded more noise variables. In the application study, biomarkers of overweight and obesity selected by all methods were Adiponectin, Apolipoprotein H, Calcitonin, CD14, Complement 3, C-reactive protein, Ferritin, Growth Hormone, Immunoglobulin M, Interleukin-18, Leptin, Monocyte Chemotactic Protein-1, Myoglobin, Sex Hormone Binding Globulin, Surfactant Protein D, and YKL-40. For the data scenarios examined, choice of optimal LASSO-type method was data structure dependent and should be guided by the research objective. The LASSO-type methods identified biomarkers that have known associations with obesity and obesity related conditions.
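A cut-down illustration of this type of comparison, with invented settings rather than the TESAOD design: correlated "biomarkers" with a sparse set of true signals are simulated, L1-penalized and elastic-net logistic regressions are fitted with cross-validated penalties, and true and false selections are counted. The Adaptive, Iterated, Bootstrap-Enhanced and Weighted Fusion variants are not shown.

```python
# Sketch: variable selection with LASSO vs elastic-net logistic regression
# on simulated correlated biomarkers (illustrative settings only).
import numpy as np
from scipy.special import expit
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(7)
n, p, rho = 500, 50, 0.5
cov = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
beta = np.zeros(p)
beta[:5] = 0.8                                     # 5 true signals, 45 noise variables
y = rng.binomial(1, expit(X @ beta))

lasso = LogisticRegressionCV(penalty="l1", solver="saga", Cs=10,
                             max_iter=5000).fit(X, y)
enet = LogisticRegressionCV(penalty="elasticnet", solver="saga", Cs=10,
                            l1_ratios=[0.5], max_iter=5000).fit(X, y)

for name, model in [("LASSO", lasso), ("Elastic Net", enet)]:
    selected = np.flatnonzero(model.coef_.ravel() != 0)
    true_pos = np.sum(selected < 5)
    print(f"{name}: {true_pos}/5 true signals, "
          f"{len(selected) - true_pos} noise variables selected")
```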

Journal ArticleDOI
TL;DR: There is a need for research on the minimum number of clusters required for both types of stepped wedge design, and researchers should distinguish in the sample size calculation between cohort and cross-sectional stepped wedge designs.
Abstract: Previous reviews have focussed on the rationale for employing the stepped wedge design (SWD), the areas of research to which the design has been applied and the general characteristics of the design. However, these did not focus on the statistical methods nor address the appropriateness of the sample size methods used. This was a literature review of the statistical methodology used in stepped wedge cluster randomised trials. The Medline, Embase, PsycINFO, CINAHL and Cochrane databases were searched for methodological guides and RCTs which employed the stepped wedge design. This review identified 102 trials which employed the stepped wedge design, compared with 37 in the most recent review by Beard et al. 2015. Forty-six trials were cohort designs and 45 % (n = 46) had fewer than 10 clusters. Of the 42 articles discussing the design methodology, 10 covered analysis and seven covered sample size. For cohort stepped wedge designs there was only one paper considering analysis and one considering sample size methods. Most trials employed either a GEE or mixed model approach to analysis (n = 77), but only 22 trials (22 %) estimated sample size in a way which accounted for the stepped wedge design that was subsequently used. Many studies which employ the stepped wedge design have few clusters but use methods of analysis which may require more clusters for unbiased and efficient intervention effect estimates. There is a need for research on the minimum number of clusters required for both types of stepped wedge design. Researchers should distinguish in the sample size calculation between cohort and cross-sectional stepped wedge designs. Further research is needed on the effect of adjusting for the potential confounding of time on the study power.
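As a sketch of the mixed-model analysis most of the reviewed trials used, a random intercept for cluster with fixed effects for period and treatment is fitted to a simulated cross-sectional stepped wedge trial with a continuous outcome; the design parameters are invented and no claim is made about adequate power at this number of clusters.

```python
# Sketch: random-intercept analysis of a cross-sectional stepped wedge trial
# (simulated continuous outcome; design parameters are illustrative).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n_clusters, n_periods, m = 8, 5, 20                  # clusters, periods, subjects/cell
crossover = np.repeat(np.arange(1, n_periods), 2)    # 2 clusters switch per period

rows = []
for c in range(n_clusters):
    u_c = rng.normal(0, 0.5)                         # cluster random effect
    for t in range(n_periods):
        treated = int(t >= crossover[c])
        y = 1.0 + 0.3 * t + 0.8 * treated + u_c + rng.normal(0, 2, m)
        rows.append(pd.DataFrame({"y": y, "cluster": c, "period": t,
                                  "treatment": treated}))
data = pd.concat(rows, ignore_index=True)

# Treatment effect adjusted for period, with a random intercept per cluster
model = smf.mixedlm("y ~ treatment + C(period)", data, groups="cluster").fit()
print(model.summary())
```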

Journal ArticleDOI
TL;DR: The described approach should be adopted to combine correlated differences in means of quantitative outcomes arising from multiple studies and can be a useful tool to assess the robustness of the overall dose-response curve to different modelling strategies.
Abstract: Meta-analytical methods are frequently used to combine dose-response findings expressed in terms of relative risks. However, no methodology has been established when results are summarized in terms of differences in means of quantitative outcomes. We proposed a two-stage approach. A flexible dose-response model is estimated within each study (first stage) taking into account the covariance of the data points (mean differences, standardized mean differences). Parameters describing the study-specific curves are then combined using a multivariate random-effects model (second stage) to address heterogeneity across studies. The method is fairly general and can accommodate a variety of parametric functions. Compared to traditional non-linear models (e.g. Emax, logistic), spline models do not assume any pre-specified dose-response curve. Spline models allow inclusion of studies with a small number of dose levels, and almost any shape, even non-monotonic ones, can be estimated using only two parameters. We illustrated the method using dose-response data arising from five clinical trials on an antipsychotic drug, aripiprazole, and improvement in symptoms in schizoaffective patients. Using the Positive and Negative Syndrome Scale (PANSS), pooled results indicated a non-linear association with the maximum change in mean PANSS score equal to 10.40 (95 % confidence interval 7.48, 13.30) observed for 19.32 mg/day of aripiprazole. No substantial change in PANSS score was observed above this value. An estimated dose of 10.43 mg/day was found to produce 80 % of the maximum predicted response. The described approach should be adopted to combine correlated differences in means of quantitative outcomes arising from multiple studies. Sensitivity analysis can be a useful tool to assess the robustness of the overall dose-response curve to different modelling strategies. A user-friendly R package has been developed to facilitate applications by practitioners.
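Schematically, the two-stage approach can be written as follows (generic notation; the accompanying R package is not reproduced here):

```latex
% Stage 1: within study i, a spline dose-response model for the (standardized)
% mean differences y_i at doses d_i, with known within-study covariance S_i
y_i = \theta_{i1}\, g_1(d_i) + \theta_{i2}\, g_2(d_i) + \epsilon_i,
\qquad \epsilon_i \sim N(0,\, S_i)

% Stage 2: multivariate random-effects pooling of the study-specific
% coefficients, with between-study covariance \Psi
\hat\theta_i \sim N\big(\theta,\; \widehat{V}_i + \Psi\big)
```

With restricted cubic spline bases g_1 and g_2, each study contributes only two coefficients, which is why even non-monotonic pooled curves can be estimated from studies with few dose levels.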

Journal ArticleDOI
TL;DR: Some key methodological components of the systematic review process—search for grey literature, description of the type of NRSI included, assessment of risk of confounding bias and reporting of whether crude or adjusted estimates were combined—are not adequately carried out or reported in meta-analyses including NRSI.
Abstract: There is an increasing number of meta-analyses including data from non-randomized studies for therapeutic evaluation. We aimed to systematically assess the methods used in meta-analyses including non-randomized studies evaluating therapeutic interventions. For this methodological review, we searched MEDLINE via PubMed, from January 1, 2013 to December 31, 2013 for meta-analyses including at least one non-randomized study evaluating therapeutic interventions. Etiological assessments and meta-analyses with no comparison group were excluded. Two reviewers independently assessed the general characteristics and key methodological components of the systematic review process and meta-analysis methods. One hundred eighty eight meta-analyses were selected: 119 included both randomized controlled trials (RCTs) and non-randomized studies of interventions (NRSI) and 69 only NRSI. Half of the meta-analyses (n = 92, 49 %) evaluated non-pharmacological interventions. “Grey literature” was searched for 72 meta-analyses (38 %). An assessment of methodological quality or risk of bias was reported in 135 meta-analyses (72 %) but this assessment considered the risk of confounding bias in only 33 meta-analyses (18 %). In 130 meta-analyses (69 %), the design of each NRSI was not clearly specified. In 131 (70 %), whether crude or adjusted estimates of treatment effect for NRSI were combined was unclear or not reported. Heterogeneity across studies was assessed in 182 meta-analyses (97 %) and further explored in 157 (84 %). Reporting bias was assessed in 127 (68 %). Some key methodological components of the systematic review process—search for grey literature, description of the type of NRSI included, assessment of risk of confounding bias and reporting of whether crude or adjusted estimates were combined—are not adequately carried out or reported in meta-analyses including NRSI.

Journal ArticleDOI
TL;DR: Recommendations on best practices to de-identify/anonymise clinical trial data for sharing with third-party researchers, as well as controlled access to data and data sharing agreements are provided.
Abstract: Greater transparency and, in particular, sharing of patient-level data for further scientific research is an increasingly important topic for the pharmaceutical industry and other organisations that sponsor and conduct clinical trials, as well as generally in the interests of patients participating in studies. A concern remains, however, over how to appropriately prepare and share clinical trial data with third party researchers, whilst maintaining patient confidentiality. Clinical trial datasets contain very detailed information on each participant. Risk to patient privacy can be mitigated by data reduction techniques. However, retention of data utility is important in order to allow meaningful scientific research. In addition, for clinical trial data, an excessive application of such techniques may pose a public health risk if misleading results are produced. After considering existing guidance, this article makes recommendations with the aim of promoting an approach that balances data utility and privacy risk and is applicable across clinical trial data holders. In summary, the article provides recommendations on best practices to de-identify/anonymise clinical trial data for sharing with third-party researchers, as well as on controlled access to data and data sharing agreements. The recommendations are applicable to all clinical trial data holders. Further work will be needed to identify and evaluate competing possibilities as regulations, attitudes to risk and technologies evolve.

Journal ArticleDOI
TL;DR: The customization of model-building strategies and study designs through simulations that consider the likely imperfections in the data, as well as finite-sample behavior, would constitute an important improvement on some of the currently prevailing practices in confounder identification and evaluation.
Abstract: Common methods for confounder identification such as directed acyclic graphs (DAGs), hypothesis testing, or a 10 % change-in-estimate (CIE) criterion for estimated associations may not be applicable because (a) there is insufficient knowledge to draw a DAG, or (b) adjustment for a true confounder produces less than a 10 % change in the observed estimate (e.g. in the presence of measurement error). We compare a previously proposed simulation-based approach for confounder identification that can be tailored to each specific study, and contrast it with commonly applied methods (significance criteria with cutoff levels of p-values of 0.05 or 0.20, and the CIE criterion with a cutoff of 10 %), as well as a newly proposed two-stage procedure aimed at reduction of false positives (specifically, risk factors that are not confounders). The new procedure first evaluates the potential for confounding by examination of the correlation of covariates and applies simulated CIE criteria only if there is evidence of correlation, while rejecting a covariate as a confounder otherwise. These approaches are compared in simulation studies with binary, continuous, and survival outcomes. We illustrate the application of our proposed confounder identification strategy in examining the association of exposure to mercury in relation to depression in the presence of suspected confounding by fish intake, using the National Health and Nutrition Examination Survey (NHANES) 2009–2010 data. Our simulations showed that the simulation-determined cutoff was very sensitive to measurement error in the exposure and potential confounder. The analysis of NHANES data demonstrated that if the noise-to-signal ratio (error variance in confounder/variance of confounder) is at or below 0.5, roughly 80 % of the simulated analyses adjusting for fish consumption would correctly result in a null association of mercury and depression, and only an extremely poorly measured confounder is not useful to adjust for in this setting. No a priori criterion developed for a specific application is guaranteed to be suitable for confounder identification in general. The customization of model-building strategies and study designs through simulations that consider the likely imperfections in the data, as well as finite-sample behavior, would constitute an important improvement on some of the currently prevailing practices in confounder identification and evaluation.
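A small simulation in the spirit of the comparison described, using a generic linear model with invented parameters rather than the NHANES analysis: it shows how classical measurement error in a true confounder shrinks the observed change-in-estimate, so a fixed 10 % cutoff can miss a confounder that is measured imprecisely.

```python
# Sketch: change-in-estimate (CIE) for a true confounder measured with error.
# Invented parameters; illustrates why a fixed 10% cutoff can mislead.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 5000
confounder = rng.normal(size=n)                      # true confounder
exposure = 0.5 * confounder + rng.normal(size=n)
outcome = 0.3 * exposure + 0.5 * confounder + rng.normal(size=n)

for noise_sd in [0.0, 0.5, 1.0, 2.0]:                # increasing measurement error
    observed_conf = confounder + rng.normal(0, noise_sd, n)
    crude = sm.OLS(outcome, sm.add_constant(exposure)).fit().params[1]
    adjusted = sm.OLS(outcome, sm.add_constant(
        np.column_stack([exposure, observed_conf]))).fit().params[1]
    cie = 100 * abs(crude - adjusted) / abs(crude)   # percent change in estimate
    print(f"noise SD {noise_sd:.1f}: crude {crude:.3f}, "
          f"adjusted {adjusted:.3f}, CIE {cie:.0f}%")
```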

Journal ArticleDOI
TL;DR: The main features and functionalities of HEAT are presented, and its relevance and use for health inequality monitoring are discussed.
Background: It is widely recognised that the pursuit of sustainable development cannot be accomplished without addressing inequality, or observed differences between subgroups of a population. Monitoring health inequalities allows for the identification of health topics where major group differences exist, dimensions of inequality that must be prioritised to effect improvements in multiple health domains, and also population subgroups that are multiply disadvantaged. While availability of data to monitor health inequalities is gradually improving, there is a commensurate need to increase, within countries, the technical capacity for analysis of these data and interpretation of results for decision-making. Prior efforts to build capacity have yielded demand for a toolkit with the computational ability to display disaggregated data and summary measures of inequality in an interactive and customisable fashion that would facilitate interpretation and reporting of health inequality in a given country.

Journal ArticleDOI
TL;DR: Results did not change in a systematic manner (i.e., regularly over- or underestimating treatment effects), suggesting that selective searching may not introduce bias in terms of effect estimates.
Abstract: One of the best sources for high quality information about healthcare interventions is a systematic review. A well-conducted systematic review includes a comprehensive literature search. There is limited empiric evidence to guide the extent of searching, in particular the number of electronic databases that should be searched. We conducted a cross-sectional quantitative analysis to examine the potential impact of selective database searching on results of meta-analyses. Our sample included systematic reviews (SRs) with at least one meta-analysis from three Cochrane Review Groups: Acute Respiratory Infections (ARI), Infectious Diseases (ID), Developmental Psychosocial and Learning Problems (DPLP) (n = 129). Outcomes included: 1) proportion of relevant studies indexed in each of 10 databases; and 2) changes in results and statistical significance of primary meta-analysis for studies identified in Medline only and in Medline plus each of the other databases. Due to variation across topics, we present results by group (ARI n = 57, ID n = 38, DPLP n = 34). For ARI, identification of relevant studies was highest for Medline (85 %) and Embase (80 %). Restricting meta-analyses to trials that appeared in Medline + Embase yielded the fewest changes in statistical significance: 53/55 meta-analyses showed no change. Point estimates changed in 12 cases; in 7 the change was less than 20 %. For ID, yield was highest for Medline (92 %), Embase (81 %), and BIOSIS (67 %). Restricting meta-analyses to trials that appeared in Medline + BIOSIS yielded the fewest changes, with 1 meta-analysis changing in statistical significance. Point estimates changed in 8 of 31 meta-analyses; the change was less than 20 % in all cases. For DPLP, identification of relevant studies was highest for Medline (75 %) and Embase (62 %). Restricting meta-analyses to trials that appeared in Medline + PsycINFO resulted in only one change in significance. Point estimates changed for 13 of 33 meta-analyses; less than 20 % in 9 cases. The majority of relevant studies can be found within a limited number of databases. Results of meta-analyses based on the majority of studies did not differ in most cases. There were very few cases of changes in statistical significance. Effect estimates changed in a minority of meta-analyses, but in most the change was small. Results did not change in a systematic manner (i.e., regularly over- or underestimating treatment effects), suggesting that selective searching may not introduce bias in terms of effect estimates.

Journal ArticleDOI
TL;DR: The modified NGT process, criteria and tools contribute to building a suite of methods that can be applied in prioritising evidence-practice gaps and could be adapted for other health settings within the broader context of implementation science projects.
Abstract: There are a variety of methods for priority setting in health research but few studies have addressed how to prioritise the gaps that exist between research evidence and clinical practice. This study aimed to build a suite of robust, evidence based techniques and tools for use in implementation science projects. We applied the priority setting methodology in lung cancer care as an example. We reviewed existing techniques and tools for priority setting in health research and the criteria used to prioritise items. An expert interdisciplinary consensus group comprised of health service, cancer and nursing researchers iteratively reviewed and adapted the techniques and tools. We tested these on evidence-practice gaps identified for lung cancer. The tools were pilot tested and finalised. A brief process evaluation was conducted. We based our priority setting on the Nominal Group Technique (NGT). The adapted tools included a matrix for individuals to privately rate priority gaps; the same matrix was used for group discussion and reaching consensus. An investment exercise was used to validate allocation of priorities across the gaps. We describe the NGT process, criteria and tool adaptations and process evaluation results. The modified NGT process, criteria and tools contribute to building a suite of methods that can be applied in prioritising evidence-practice gaps. These methods could be adapted for other health settings within the broader context of implementation science projects.

Journal ArticleDOI
TL;DR: The findings indicate the need to consider different approaches to measuring SES among adolescents when evaluating SES in relation to HRQOL, and the need for sustainable ways to measure SES.
Abstract: Research has shown inconsistencies in results and difficulties in the conceptualization of assessment of socioeconomic status (SES) among adolescents. The aim of this study was thus to test the validity of self-reported information on SES in two age-groups (11–13 and 14–16 years old) in an adolescent population and to evaluate its relationship to self-reported health related quality of life (HRQOL). Different measures of SES commonly used in research in relation to HRQOL were tested in this study: parents’ occupation status, family material affluence status (FAS) and perceived SES. In a cross-sectional study, a sample of 948 respondents (n = 467, 11–13 years old and n = 481, 14–16 years old) completed questionnaires about SES and HRQOL. The adolescents’ completion rates were used, with chi2-tests, to investigate differences between genders and age-groups. Correlation was used for convergent validity and ANOVA for concurrent validity. We found a low completion rate for both fathers’ (41.7 %) and mothers' (37.5 %) occupation status, and a difference in completion rate between genders and age-groups. FAS had the highest completion rate (100 %) compared to parents’ occupation status and perceived SES. The convergent validity between the SES-indicators was weak (Spearman correlation coefficient below 0.3), suggesting that the indicators measured different dimensions of SES. Both FAS and perceived SES showed a gradient in mean HRQOL between low and high SES; this was significant only for perceived SES (p < 0.01, both age-groups). This study indicates the need for considering different approaches to measures of SES among adolescents when evaluating SES in relation to HRQOL. Further research is needed to investigate sustainable ways to measure SES, delineating the relevance of tangible measures of education, occupation and income in relation to perceived socioeconomic status in comparison with others in immediate social networks and in society at large.

Journal ArticleDOI
TL;DR: The typology developed outlines the common ways dual-role is experienced in research involving clinician-researchers and patient-participants, and perhaps the inevitability of the experience given the primacy accorded to patient well-being.
Abstract: Many health researchers are clinicians. Dual-role experiences are common for clinician-researchers in research involving patient-participants, even if not their own patients. To extend the existing body of literature on why dual-role is experienced, we aimed to develop a typology of common catalysts for dual-role experiences to help clinician-researchers plan and implement methodologically and ethically sound research. Systematic searching of Medline, CINAHL, PsycINFO, Embase and Scopus (inception to 28.07.2014) for primary studies or first-person reflexive reports of clinician-researchers’ dual-role experiences, supplemented by reference list checking and Google Scholar scoping searches. Included articles were loaded in NVivo for analysis. The coding was focused on how dual-role was evidenced for the clinician-researchers in research involving patients. Procedures were completed by one researcher (MB) and independently cross-checked by another (JHS). All authors contributed to extensive discussions to resolve all disagreements about initial coding and verify the final themes. Database searching located 7135 records, resulting in 29 included studies, with the addition of 7 studies through reference checks and scoping searches. Two overarching themes described the most common catalysts for dual-role experiences – ways a research role can involve patterns of behaviour typical of a clinical role, and the developing connection that starts to resemble a clinician-patient relationship. Five subthemes encapsulated the clinical patterns commonly repeated in research settings (clinical queries, perceived agenda, helping hands, uninvited clinical expert, and research or therapy) and five subthemes described concerns about the researcher-participant relationship (clinical assumptions, suspicion and holding back, revelations, over-identification, and manipulation). Clinician-researchers use their clinical skills in health research in ways that set up a relationship resembling that of clinician-patient. Clinicians’ ingrained orientation to patients’ needs can be in tension with their research role, and can set up ethical and methodological challenges. The typology we developed outlines the common ways dual-role is experienced in research involving clinician-researchers and patient-participants, and perhaps the inevitability of the experience given the primacy accorded to patient well-being. The typology offers clinician-researchers a framework for grappling with the ethical and methodological implications of dual-role throughout the research process, including planning, implementation, monitoring and reporting.

Journal ArticleDOI
TL;DR: Six methods of reweighting are examined; a new method leads to weighted distributions that more accurately reproduce national demographic characteristics, which will reduce bias in national-level estimates of outcomes associated with those characteristics.
Abstract: The Behavioral Risk Factor Surveillance System (BRFSS) is a network of health-related telephone surveys, conducted by all 50 states, the District of Columbia, and participating US territories, that receive technical assistance from CDC. Data users often aggregate BRFSS state samples for national estimates without accounting for state-level sampling, a practice that could introduce bias because the weighted distributions of the state samples do not always adhere to national demographic distributions. This article examines six methods of reweighting; the resulting estimates are then compared with key health indicator estimates from the National Health Interview Survey (NHIS) based on 2013 data. Compared to the usual stacking approach, all of the six new methods reduce the variance of weights and design effect at the national level, and some also reduce the estimated bias. This article also provides a comparison of the methods based on the variances induced by unequal weighting as well as the bias reduction induced by raking at the national level, and recommends a preferred method. The preferred method leads to weighted distributions that more accurately reproduce national demographic characteristics. While the empirical results for key estimates were limited to a few health indicators, they also suggest reduction in potential bias and mean squared error. To the extent that survey outcomes are associated with these demographic characteristics, matching the national distributions will reduce bias in estimates of these outcomes at the national level.
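A bare-bones illustration of raking survey weights to external margins, with a toy sample and made-up target distributions rather than BRFSS or NHIS figures: the weights are adjusted iteratively until the weighted sex and age-group distributions match the targets.

```python
# Sketch: raking (iterative proportional fitting) of survey weights to
# national margins. Toy sample and targets, not BRFSS or NHIS figures.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "sex": rng.choice(["F", "M"], 1000, p=[0.6, 0.4]),        # over-represents women
    "age": rng.choice(["18-44", "45-64", "65+"], 1000, p=[0.3, 0.4, 0.3]),
    "weight": 1.0,
})
targets = {"sex": {"F": 0.51, "M": 0.49},
           "age": {"18-44": 0.45, "45-64": 0.33, "65+": 0.22}}

for _ in range(20):                                           # rake until margins match
    for var, target in targets.items():
        current = df.groupby(var)["weight"].sum() / df["weight"].sum()
        factors = pd.Series(target) / current
        df["weight"] *= df[var].map(factors)

for var in targets:
    print(df.groupby(var)["weight"].sum() / df["weight"].sum())
```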

Journal ArticleDOI
TL;DR: A clear operational definition of consistency of treatment effects across subgroups is lacking but is needed to improve the usability of subgroup analyses in this setting, and methods for systematically exploring benefit-risk across subgroups in particular need more research.
Abstract: It is well recognized that treatment effects may not be homogeneous across the study population. Subgroup analyses constitute a fundamental step in the assessment of evidence from confirmatory (Phase III) clinical trials, where conclusions for the overall study population might not hold. Subgroup analyses can have different and distinct purposes, requiring specific design and analysis solutions. It is relevant to evaluate methodological developments in subgroup analyses against these purposes to guide health care professionals and regulators as well as to identify gaps in current methodology.

Journal ArticleDOI
TL;DR: An exploratory study to provide an in-depth characterization of a neighborhood’s social and physical environment in relation to cardiovascular health in Madrid, using quantitative and qualitative data following a mixed-methods merging approach.
Abstract: Our aim is to conduct an exploratory study to provide an in-depth characterization of a neighborhood’s social and physical environment in relation to cardiovascular health. A mixed-methods approach was used to better understand the food, alcohol, tobacco and physical activity domains of the urban environment. We conducted this study in an area of 16,000 residents in Madrid (Spain). We obtained cardiovascular health and risk factor data from all residents aged 45 and above using Electronic Health Records from the Madrid Primary Health Care System. We used several quantitative audit tools to assess: the type and location of food outlets and healthy food availability; tobacco and alcohol points of sale; walkability of all streets; and use of parks and public spaces. We also conducted 11 qualitative interviews with key informants to help understand the relationships between the urban environment and cardiovascular behaviors. We integrated quantitative and qualitative data following a mixed-methods merging approach. Electronic Health Records of the entire population of the area showed a similar prevalence of risk factors compared to the rest of Madrid/Spain (prevalence of diabetes: 12 %, hypertension: 34 %, dyslipidemia: 32 %, smoking: 10 %, obesity: 20 %). The food environment was very dense, with many small stores (n = 44) and a large food market with 112 stalls. Residents highlighted the importance of these small stores for buying healthy foods. Alcohol and tobacco environments were also very dense (n = 91 and 64, respectively), dominated by bars and restaurants (n = 53) that also served food. Neighbors emphasized the importance of drinking as a socialization mechanism. Public open spaces were mostly used by seniors, who remarked on the importance of accessibility to these spaces and the availability of destinations to walk to. This experience allowed testing and refining measurement tools, drawn from epidemiology, geography, sociology and anthropology, to better understand the urban environment in relation to cardiovascular health.

Journal ArticleDOI
TL;DR: A novel test that unites the Cox test with a permutation test based on restricted mean survival time that increases trial power under an early treatment effect and protects power under other scenarios.
Abstract: Most randomized controlled trials with a time-to-event outcome are designed assuming proportional hazards (PH) of the treatment effect. The sample size calculation is based on a logrank test. However, non-proportional hazards are increasingly common. At analysis, the estimated hazards ratio with a confidence interval is usually presented. The estimate is often obtained from a Cox PH model with treatment as a covariate. If non-proportional hazards are present, the logrank and equivalent Cox tests may lose power. To safeguard power, we previously suggested a ‘joint test’ combining the Cox test with a test of non-proportional hazards. Unfortunately, a larger sample size is needed to preserve power under PH. Here, we describe a novel test that unites the Cox test with a permutation test based on restricted mean survival time. We propose a combined hypothesis test based on a permutation test of the difference in restricted mean survival time across time. The test involves the minimum of the Cox and permutation test P-values. We approximate its null distribution and correct it for correlation between the two P-values. Using extensive simulations, we assess the type 1 error and power of the combined test under several scenarios and compare with other tests. We investigate powering a trial using the combined test. The type 1 error of the combined test is close to nominal. Power under proportional hazards is slightly lower than for the Cox test. Enhanced power is available when the treatment difference shows an ‘early effect’, an initial separation of survival curves which diminishes over time. The power is reduced under a ‘late effect’, when little or no difference in survival curves is seen for an initial period and then a late separation occurs. We propose a method of powering a trial using the combined test. The ‘insurance premium’ offered by the combined test to safeguard power under non-PH represents about a single-digit percentage increase in sample size. The combined test increases trial power under an early treatment effect and protects power under other scenarios. Use of restricted mean survival time facilitates testing and displaying a generalized treatment effect.
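A stripped-down sketch of the permutation component on simulated data: the restricted mean survival time up to tau is read off the Kaplan-Meier curve in each arm and the observed difference is referenced against its permutation distribution. The published test additionally combines this with the Cox test P-value and corrects for the correlation between the two, which is omitted here; tau and the data-generating parameters are invented.

```python
# Sketch: permutation test of the difference in restricted mean survival
# time (RMST) up to tau. Simulated two-arm data; the full method in the
# paper additionally combines this with the Cox test P-value.
import numpy as np

rng = np.random.default_rng(4)

def km_rmst(time, event, tau):
    """RMST up to tau as the area under the Kaplan-Meier curve (untied data)."""
    order = np.argsort(time)
    t, d = time[order], event[order]
    at_risk = np.arange(len(t), 0, -1)
    surv = np.cumprod(1 - d / at_risk)
    times = np.clip(np.concatenate([[0.0], t, [tau]]), 0, tau)
    s_vals = np.concatenate([[1.0], surv, [surv[-1]]])
    return np.sum(np.diff(times) * s_vals[:-1])   # step-function integral

def rmst_diff(time, event, arm, tau):
    return (km_rmst(time[arm == 1], event[arm == 1], tau)
            - km_rmst(time[arm == 0], event[arm == 0], tau))

# simulated trial with an early treatment effect and random censoring
n = 200
arm = rng.integers(0, 2, n)
t_event = rng.exponential(np.where(arm == 1, 14, 10))
t_cens = rng.exponential(25, n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(float)

tau = 12.0
obs = rmst_diff(time, event, arm, tau)
perm = np.array([rmst_diff(time, event, rng.permutation(arm), tau)
                 for _ in range(2000)])
p_value = np.mean(np.abs(perm) >= abs(obs))
print(f"RMST difference up to tau={tau}: {obs:.2f}, permutation P = {p_value:.3f}")
```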

Journal ArticleDOI
TL;DR: The relative importance of adjusting for clustering at the higher and lower levels in a logistic regression model is assessed; confidence intervals adjusted for the higher level of clustering had coverage close to 95 %, even when there were few clusters.
Abstract: Clustering commonly affects the uncertainty of parameter estimates in epidemiological studies. Cluster-robust variance estimates (CRVE) are used to construct confidence intervals that account for single-level clustering, and are easily implemented in standard software. When data are clustered at more than one level (e.g. village and household) the level for the CRVE must be chosen. CRVE are consistent when used at the higher level of clustering (village), but since there are fewer clusters at the higher level, and consistency is an asymptotic property, there may be circumstances under which coverage is better from lower- rather than higher-level CRVE. Here we assess the relative importance of adjusting for clustering at the higher and lower level in a logistic regression model.
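Because only the background of this abstract is shown, the following is just a generic illustration of the choice it describes, on invented two-level data (villages containing households): the same logistic model is fitted with cluster-robust variance estimates taking the cluster to be the household (lower level) or the village (higher level).

```python
# Sketch: cluster-robust variance estimates (CRVE) for a logistic model,
# clustering at the household (lower) or village (higher) level.
# Invented two-level data for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n_village, hh_per_village, per_hh = 20, 10, 5
rows = []
for v in range(n_village):
    u_v = rng.normal(0, 0.6)                         # village-level effect
    for h in range(hh_per_village):
        u_h = rng.normal(0, 0.6)                     # household-level effect
        x = rng.normal(size=per_hh)
        p = 1 / (1 + np.exp(-(-0.5 + 0.7 * x + u_v + u_h)))
        rows.append(pd.DataFrame({"y": rng.binomial(1, p), "x": x,
                                  "village": v, "household": f"{v}-{h}"}))
df = pd.concat(rows, ignore_index=True)

for level in ["household", "village"]:
    groups = pd.factorize(df[level])[0]              # integer cluster codes
    fit = smf.logit("y ~ x", df).fit(disp=0, cov_type="cluster",
                                     cov_kwds={"groups": groups})
    print(f"{level:9s} CRVE: beta_x = {fit.params['x']:.3f}, "
          f"SE = {fit.bse['x']:.3f}")
```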

Journal ArticleDOI
TL;DR: Val-MI represents a valid strategy to obtain estimates of predictive performance measures in prognostic models developed on incomplete data, with the bootstrap 0.632+ estimate representing a reliable method to correct for optimism.
Abstract: Missing values are a frequent issue in human studies. In many situations, multiple imputation (MI) is an appropriate missing-data handling strategy, whereby missing values are imputed multiple times, the analysis is performed in every imputed data set, and the obtained estimates are pooled. If the aim is to estimate (added) predictive performance measures, such as (change in) the area under the receiver-operating characteristic curve (AUC), internal validation strategies become desirable in order to correct for optimism. It is not fully understood how internal validation should be combined with multiple imputation. In a comprehensive simulation study and in a real data set based on blood markers as predictors for mortality, we compare three combination strategies: Val-MI, internal validation followed by MI on the training and test parts separately; MI-Val, MI on the full data set followed by internal validation; and MI(-y)-Val, MI on the full data set omitting the outcome, followed by internal validation. Different validation strategies, including bootstrap and cross-validation, different (added) performance measures, and various data characteristics are considered, and the strategies are evaluated with regard to bias and mean squared error of the obtained performance estimates. In addition, we elaborate on the number of resamples and imputations to be used, and adapt a strategy for confidence interval construction to incomplete data. Internal validation is essential in order to avoid optimism, with the bootstrap 0.632+ estimate representing a reliable method to correct for optimism. While estimates obtained by MI-Val are optimistically biased, those obtained by MI(-y)-Val tend to be pessimistic in the presence of a true underlying effect. Val-MI provides largely unbiased estimates, with a slight pessimistic bias that grows with increasing true effect size, increasing number of covariates and decreasing sample size. In Val-MI, accuracy of the estimate is more strongly improved by increasing the number of bootstrap draws than by increasing the number of imputations. With a simple integrated approach, valid confidence intervals for performance estimates can be obtained. When prognostic models are developed on incomplete data, Val-MI represents a valid strategy to obtain estimates of predictive performance measures.
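A minimal sketch of the Val-MI ordering (resample or split first, then impute the training and test parts separately and pool the performance estimates), assuming scikit-learn; plain 5-fold cross-validation replaces the bootstrap 0.632+ correction favoured in the abstract, and the simulated data, the logistic model and the handling of the outcome during imputation are illustrative assumptions.

```python
# Minimal sketch of the Val-MI ordering: split first, then multiply impute the
# training and test parts separately, and pool the AUC over folds and
# imputations. 5-fold cross-validation stands in for the 0.632+ bootstrap.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p, n_imputations = 400, 5, 10

X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))     # outcome depends on X1 only
X[rng.random(size=X.shape) < 0.15] = np.nan         # ~15 % of values missing

aucs = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    for m in range(n_imputations):
        # "Val-MI": impute the training and test parts separately
        X_train = IterativeImputer(sample_posterior=True,
                                   random_state=m).fit_transform(X[train_idx])
        X_test = IterativeImputer(sample_posterior=True,
                                  random_state=100 + m).fit_transform(X[test_idx])

        model = LogisticRegression().fit(X_train, y[train_idx])
        aucs.append(roc_auc_score(y[test_idx], model.predict_proba(X_test)[:, 1]))

print("Pooled internally validated AUC:", round(float(np.mean(aucs)), 3))
```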

Journal ArticleDOI
TL;DR: It is concluded that survival analyses that explicitly account, in the statistical model, for the times at which time dependent covariates are measured provide more reliable estimates than unadjusted analyses.
Abstract: Typical survival studies follow individuals to an event and measure explanatory variables for that event, sometimes repeatedly over the course of follow-up. The Cox regression model has been used widely in the analyses of time to diagnosis or death from disease. The associations between the survival outcome and time dependent measures may be biased unless they are modeled appropriately. In this paper we explore the Time Dependent Cox Regression Model (TDCM), which quantifies the effect of repeated measures of covariates in the analysis of time to event data. This model is commonly used in biomedical research, but analyses sometimes do not explicitly adjust for the times at which time dependent explanatory variables are measured. This approach can yield different estimates of association compared to a model that adjusts for these times. In order to address the question of how different these estimates are from a statistical perspective, we compare the TDCM to Pooled Logistic Regression (PLR) and Cross Sectional Pooling (CSP), considering models that adjust and do not adjust for time in PLR and CSP. In a series of simulations we found that time adjusted CSP provided identical results to the TDCM, while the PLR showed larger parameter estimates than the time adjusted CSP and the TDCM in scenarios with high event rates. We also observed upwardly biased estimates in the unadjusted CSP and unadjusted PLR methods. The time adjusted PLR had a positive bias in the time dependent Age effect, with reduced bias when the event rate was low. The PLR methods showed a negative bias in the Sex effect, a subject level covariate, when compared to the other methods. The Cox models yielded reliable estimates for the Sex effect in all scenarios considered. We conclude that survival analyses that explicitly account, in the statistical model, for the times at which time dependent covariates are measured provide more reliable estimates than unadjusted analyses. We present results from the Framingham Heart Study in which lipid measurements and myocardial infarction events were collected over a period of 26 years.
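The counting-process (start, stop] layout is what lets a Cox model account explicitly for the times at which time dependent covariates are measured. A minimal sketch, assuming the Python lifelines library and a tiny hand-made data set (the column names and values are illustrative, not Framingham data):

```python
# Minimal sketch of a time-dependent Cox model fitted on data in
# counting-process (start, stop] format, so each repeated covariate
# measurement applies only to the interval it was observed in.
import pandas as pd
from lifelines import CoxTimeVaryingFitter

# One row per subject per measurement interval: "ldl" is the most recent
# lipid value carried forward until the next exam; "event" flags whether
# the event (e.g. myocardial infarction) occurred at the end of the interval.
long_df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "start": [0, 2, 4, 0, 2, 0, 3, 0, 0, 2, 6],
    "stop":  [2, 4, 6, 2, 5, 3, 7, 4, 2, 6, 8],
    "ldl":   [130, 150, 165, 120, 118, 160, 172, 125, 145, 140, 155],
    "event": [0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1],
})

ctv = CoxTimeVaryingFitter()
ctv.fit(long_df, id_col="id", event_col="event",
        start_col="start", stop_col="stop")
ctv.print_summary()   # hazard ratio for ldl, adjusted for when it was measured
```

The time-adjusted CSP and PLR comparisons in the abstract would instead stack these same intervals as rows of a logistic regression, with interval time included as a covariate in the adjusted versions.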

Journal ArticleDOI
TL;DR: The Canadian HIV Women’s Sexual and Reproductive Health Cohort Study (CHIWOS) is presented, a large-scale, multi-site, national, longitudinal quantitative study that has operationalized community-based research in all steps of the research process.
Abstract: Community-based research has gained increasing recognition in health research over the last two decades. Such participatory research approaches are lauded for their ability to anchor research in lived experiences, ensuring cultural appropriateness, accessing local knowledge, reaching marginalized communities, building capacity, and facilitating research-to-action. Despite these positive attributes, the community-based health research literature is predominantly composed of small projects that use qualitative methods and are set within geographically limited communities. Its use in larger health studies, including clinical trials and cohorts, is limited. We present the Canadian HIV Women’s Sexual and Reproductive Health Cohort Study (CHIWOS), a large-scale, multi-site, national, longitudinal quantitative study that has operationalized community-based research in all steps of the research process. Successes, challenges and further considerations are offered. Through the integration of community-based research principles, we have been successful in: facilitating a two-year formative phase for this study; developing a novel survey instrument with national involvement; training 39 Peer Research Associates (PRAs); offering ongoing comprehensive support to PRAs; and engaging in an ongoing iterative community-based research process. Our community-based research approach within CHIWOS demanded that we be cognizant of the challenges of managing a large national team, inherent power imbalances, challenges with communication, compensation and volunteering considerations, and extensive delays in institutional processes. It is important to consider the iterative nature of community-based research and to work through tensions that emerge given the diverse perspectives of numerous team members. Community-based research, as an approach to large-scale quantitative health research projects, is an increasingly viable methodological option. Community-based research has several advantages that go hand-in-hand with its obstacles. We offer guidance on implementing this approach so that the process can be better planned and result in success.