
Showing papers in "BMC Medical Research Methodology in 2018"


Journal ArticleDOI
TL;DR: The purpose of this article is to clearly describe the differences in indications between scoping reviews and systematic reviews and to provide guidance for when a scoping review is (and is not) appropriate.
Abstract: Scoping reviews are a relatively new approach to evidence synthesis and currently there exists little guidance regarding the decision to choose between a systematic review or scoping review approach when synthesising evidence. The purpose of this article is to clearly describe the differences in indications between scoping reviews and systematic reviews and to provide guidance for when a scoping review is (and is not) appropriate. Researchers may conduct scoping reviews instead of systematic reviews where the purpose of the review is to identify knowledge gaps, scope a body of literature, clarify concepts or to investigate research conduct. While useful in their own right, scoping reviews may also be helpful precursors to systematic reviews and can be used to confirm the relevance of inclusion criteria and potential questions. Scoping reviews are a useful tool in the ever-increasing arsenal of evidence synthesis approaches. Although conducted for different purposes compared to systematic reviews, scoping reviews still require rigorous and transparent methods in their conduct to ensure that the results are trustworthy. Our hope is that with clear guidance available regarding whether to conduct a scoping review or a systematic review, fewer scoping reviews will be performed for inappropriate indications better served by a systematic review, and vice-versa.

3,945 citations


Journal ArticleDOI
TL;DR: It is recommended that qualitative health researchers be more transparent about evaluations of their sample size sufficiency, situating these within broader and more encompassing assessments of data adequacy.
Abstract: Choosing a suitable sample size in qualitative research is an area of conceptual debate and practical uncertainty. That sample size principles, guidelines and tools have been developed to enable researchers to set, and justify the acceptability of, their sample size is an indication that the issue constitutes an important marker of the quality of qualitative research. Nevertheless, research shows that sample size sufficiency reporting is often poor, if not absent, across a range of disciplinary fields. A systematic analysis of single-interview-per-participant designs within three health-related journals from the disciplines of psychology, sociology and medicine, over a 15-year period, was conducted to examine whether and how sample sizes were justified and how sample size was characterised and discussed by authors. Data pertinent to sample size were extracted and analysed using qualitative and quantitative analytic techniques. Our findings demonstrate that provision of sample size justifications in qualitative health research is limited; is not contingent on the number of interviews; and relates to the journal of publication. Defence of sample size was most frequently supported across all three journals with reference to the principle of saturation and to pragmatic considerations. Qualitative sample sizes were predominantly – and often without justification – characterised as insufficient (i.e., ‘small’) and discussed in the context of study limitations. Sample size insufficiency was seen to threaten the validity and generalizability of studies’ results, with the latter being frequently conceived in nomothetic terms. We recommend, firstly, that qualitative health researchers be more transparent about evaluations of their sample size sufficiency, situating these within broader and more encompassing assessments of data adequacy. Secondly, we invite researchers to consider critically how saturation parameters found in prior methodological studies and sample size community norms might best inform, and apply to, their own project, and we suggest that data adequacy is best appraised with reference to features that are intrinsic to the study at hand. Finally, those reviewing papers have a vital role in supporting and encouraging transparent study-specific reporting.

1,052 citations


Journal ArticleDOI
TL;DR: The predictive and modeling capabilities of DeepSurv will enable medical researchers to use deep neural networks as a tool in their exploration, understanding, and prediction of the effects of a patient’s characteristics on their risk of failure.
Abstract: Medical practitioners use survival models to explore and understand the relationships between patients’ covariates (e.g. clinical and genetic features) and the effectiveness of various treatment options. Standard survival models like the linear Cox proportional hazards model require extensive feature engineering or prior medical knowledge to model treatment interaction at an individual level. While nonlinear survival methods, such as neural networks and survival forests, can inherently model these high-level interaction terms, they have yet to be shown as effective treatment recommender systems. We introduce DeepSurv, a Cox proportional hazards deep neural network and state-of-the-art survival method for modeling interactions between a patient’s covariates and treatment effectiveness in order to provide personalized treatment recommendations. We perform a number of experiments training DeepSurv on simulated and real survival data. We demonstrate that DeepSurv performs as well as or better than other state-of-the-art survival models and validate that DeepSurv successfully models increasingly complex relationships between a patient’s covariates and their risk of failure. We then show how DeepSurv models the relationship between a patient’s features and effectiveness of different treatment options to show how DeepSurv can be used to provide individual treatment recommendations. Finally, we train DeepSurv on real clinical studies to demonstrate how its personalized treatment recommendations would increase the survival time of a set of patients. The predictive and modeling capabilities of DeepSurv will enable medical researchers to use deep neural networks as a tool in their exploration, understanding, and prediction of the effects of a patient’s characteristics on their risk of failure.
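To make the core idea concrete, here is a minimal sketch of the negative Cox partial log-likelihood that a network in the DeepSurv family minimises, assuming PyTorch; the function name, the Breslow-style handling of ties, and the averaging over events are illustrative choices, not the paper's exact implementation.

```python
import torch

def cox_ph_loss(log_hazards, times, events):
    """Negative Cox partial log-likelihood (Breslow handling of ties).

    log_hazards: (n,) network outputs h(x_i)
    times:       (n,) observed follow-up times
    events:      (n,) 1.0 if the event occurred, 0.0 if censored
    """
    # Sort by descending time so the risk set of subject i is subjects 0..i
    order = torch.argsort(times, descending=True)
    log_hazards, events = log_hazards[order], events[order]
    # Running log-sum-exp gives the log of the sum of exp(h) over each risk set
    log_risk_set = torch.logcumsumexp(log_hazards, dim=0)
    # Contribution h(x_i) - log sum_{j in R(t_i)} exp(h(x_j)), events only
    partial_ll = (log_hazards - log_risk_set) * events
    return -partial_ll.sum() / events.sum()
```

In practice this loss can be attached to any network that maps covariates to a single log-hazard score; the risk-set sum is what distinguishes training from an ordinary regression loss.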

858 citations


Journal ArticleDOI
TL;DR: The aim is to provide a typology of review types, describe key elements that need to be addressed during question development for each type, and offer clarified guidance and a unified typology of review types for both novice and experienced reviewers.
Abstract: Systematic reviews have been considered as the pillar on which evidence-based healthcare rests. Systematic review methodology has evolved and been modified over the years to accommodate the range of questions that may arise in the health and medical sciences. This paper explores a concept still rarely considered by novice authors and in the literature: determining the type of systematic review to undertake based on a research question or priority. Within the framework of the evidence-based healthcare paradigm, defining the question and type of systematic review to conduct is a pivotal first step that will guide the rest of the process and has the potential to impact on other aspects of the evidence-based healthcare cycle (evidence generation, transfer and implementation). It is something that novice reviewers (and others not familiar with the range of review types available) need to take account of but frequently overlook. Our aim is to provide a typology of review types and describe key elements that need to be addressed during question development for each type. In this paper, a typology of various systematic review methodologies is proposed. The review types are defined and situated with regard to establishing corresponding questions and inclusion criteria. The ultimate objective is to provide clarified guidance for both novice and experienced reviewers and a unified typology with respect to review types.

424 citations


Journal ArticleDOI
TL;DR: Log-binomial and robust (modified) Poisson regression models are popular approaches to estimate risk ratios for binary response variables but their performance under model misspecification is poorly understood.
Abstract: Log-binomial and robust (modified) Poisson regression models are popular approaches to estimate risk ratios for binary response variables. Previous studies have shown that the two produce comparable point estimates and standard errors. However, their performance under model misspecification is poorly understood. In this simulation study, the statistical performance of the two models was compared when the log link function was misspecified or the response depended on predictors through a non-linear relationship (i.e. truncated response). Point estimates from log-binomial models were biased when the link function was misspecified or when the probability distribution of the response variable was truncated at the right tail. The percentage of truncated observations was positively associated with the presence of bias, and the bias was larger if the observations came from a population with a lower response rate, with the other parameters being examined held fixed. In contrast, point estimates from the robust Poisson models were unbiased. Under model misspecification, the robust Poisson model was generally preferable because it provided unbiased estimates of risk ratios.
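A minimal sketch of the two estimators on simulated data, assuming statsmodels (whose link-class spelling varies across versions); the simulated coefficients are arbitrary. Exponentiated slope coefficients are the estimated risk ratios.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.5, size=500)            # binary exposure
y = rng.binomial(1, np.exp(-1.5 + 0.4 * x))   # outcome with true log-RR of 0.4
X = sm.add_constant(x)

# Log-binomial: binomial family with a log link (may fail to converge
# when fitted probabilities approach 1)
log_binomial = sm.GLM(
    y, X, family=sm.families.Binomial(link=sm.families.links.Log())
).fit()

# Modified Poisson: Poisson family on binary data with robust sandwich errors
robust_poisson = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC1")

print(np.exp(log_binomial.params[1]), np.exp(robust_poisson.params[1]))
```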

260 citations


Journal ArticleDOI
TL;DR: A systematic scoping review of published methodological recommendations on how to systematically review and meta-analyse observational studies found substantial agreement in some methodological areas but there was also considerable disagreement on how evidence synthesis of observational studies should be conducted.
Abstract: Systematic reviews and meta-analyses of observational studies are frequently performed, but no widely accepted guidance is available at present. We performed a systematic scoping review of published methodological recommendations on how to systematically review and meta-analyse observational studies. We searched online databases and websites and contacted experts in the field to locate potentially eligible articles. We included articles that provided any type of recommendation on how to conduct systematic reviews and meta-analyses of observational studies. We extracted and summarised recommendations on pre-defined key items: protocol development, research question, search strategy, study eligibility, data extraction, dealing with different study designs, risk of bias assessment, publication bias, heterogeneity, statistical analysis. We summarised recommendations by key item, identifying areas of agreement and disagreement as well as areas where recommendations were missing or scarce. The searches identified 2461 articles of which 93 were eligible. Many recommendations for reviews and meta-analyses of observational studies were transferred from guidance developed for reviews and meta-analyses of RCTs. Although there was substantial agreement in some methodological areas there was also considerable disagreement on how evidence synthesis of observational studies should be conducted. Conflicting recommendations were seen on topics such as the inclusion of different study designs in systematic reviews and meta-analyses, the use of quality scales to assess the risk of bias, and the choice of model (e.g. fixed vs. random effects) for meta-analysis. There is a need for sound methodological guidance on how to conduct systematic reviews and meta-analyses of observational studies, which critically considers areas in which there are conflicting recommendations.

258 citations


Journal ArticleDOI
TL;DR: The purpose of this review is to determine whether a shared model of the literature searching process can be detected across systematic review guidance documents and, if so, how this process is reported in the guidance and supported by published studies.
Abstract: Systematic literature searching is recognised as a critical component of the systematic review process. It involves a systematic search for studies and aims for a transparent report of study identification, leaving readers clear about what was done to identify studies, and how the findings of the review are situated in the relevant evidence. Information specialists and review teams appear to work from a shared and tacit model of the literature search process. How this tacit model has developed and evolved is unclear, and it has not been explicitly examined before. The purpose of this review is to determine whether a shared model of the literature searching process can be detected across systematic review guidance documents and, if so, how this process is reported in the guidance and supported by published studies. A literature review was conducted covering two types of literature: guidance documents and published studies. Nine guidance documents were identified, including the Cochrane and Campbell Handbooks. Published studies were identified through ‘pearl growing’, citation chasing, a search of PubMed using the systematic review methods filter, and the authors’ topic knowledge. The relevant sections within each guidance document were then read and re-read, with the aim of determining key methodological stages. Methodological stages were identified and defined. These data were reviewed to identify agreements and areas of unique guidance between guidance documents. Consensus across multiple guidance documents was used to inform selection of ‘key stages’ in the process of literature searching. Eight key stages were determined relating specifically to literature searching in systematic reviews. They were: who should literature search, aims and purpose of literature searching, preparation, the search strategy, searching databases, supplementary searching, managing references and reporting the search process. Eight key stages to the process of literature searching in systematic reviews were identified. These key stages are consistently reported in the nine guidance documents, suggesting consensus on the key stages of literature searching, and therefore the process of literature searching as a whole, in systematic reviews. Further research to determine the suitability of using the same process of literature searching for all types of systematic review is indicated.

208 citations


Journal ArticleDOI
TL;DR: Methods based on summary statistics reported in the literature facilitate more comprehensive inclusion of randomised controlled trials with missing mean or variability summary statistics within meta-analyses.
Abstract: Rigorous, informative meta-analyses rely on availability of appropriate summary statistics or individual participant data. For continuous outcomes, especially those with naturally skewed distributions, summary information on the mean or variability often goes unreported. While full reporting of original trial data is the ideal, we sought to identify methods for handling unreported mean or variability summary statistics in meta-analysis. We undertook two systematic literature reviews to identify methodological approaches used to deal with missing mean or variability summary statistics. Five electronic databases were searched, in addition to the Cochrane Colloquium abstract books and the Cochrane Statistics Methods Group mailing list archive. We also conducted cited reference searching and emailed topic experts to identify recent methodological developments. Details recorded included the description of the method, the information required to implement the method, any underlying assumptions and whether the method could be readily applied in standard statistical software. We provided a summary description of the methods identified, illustrating selected methods in example meta-analysis scenarios. For missing standard deviations (SDs), following screening of 503 articles, fifteen methods were identified in addition to those reported in a previous review. These included Bayesian hierarchical modelling at the meta-analysis level; summary statistic level imputation based on observed SD values from other trials in the meta-analysis; a practical approximation based on the range; and algebraic estimation of the SD based on other summary statistics. Following screening of 1124 articles for methods estimating the mean, one approximate Bayesian computation approach and three papers based on alternative summary statistics were identified. Illustrative meta-analyses showed that when replacing a missing SD the approximation using the range minimised loss of precision and generally performed better than omitting trials. When estimating missing means, a formula using the median, lower quartile and upper quartile performed best in preserving the precision of the meta-analysis findings, although in some scenarios, omitting trials gave superior results. Methods based on summary statistics (minimum, maximum, lower quartile, upper quartile, median) reported in the literature facilitate more comprehensive inclusion of randomised controlled trials with missing mean or variability summary statistics within meta-analyses.
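Two of the simpler approaches identified, approximating a missing SD from the reported range and a missing mean from the median and quartiles, can be sketched directly. The range/4 rule and the (q1 + median + q3)/3 formula below are common approximations of this kind, shown for illustration rather than as the exact estimators evaluated in the paper; the example numbers are invented.

```python
import numpy as np

def sd_from_range(minimum, maximum):
    """Rough SD from a reported range; range/4 is a common rule of thumb
    for moderate sample sizes (range/6 is sometimes used for large ones)."""
    return (maximum - minimum) / 4.0

def mean_from_quartiles(q1, median, q3):
    """Approximate mean from the median and quartiles, (q1 + m + q3) / 3."""
    return (q1 + median + q3) / 3.0

# Hypothetical trial reporting median 12, IQR 8 to 18, range 2 to 30
print(mean_from_quartiles(8, 12, 18))  # ~12.7
print(sd_from_range(2, 30))            # 7.0
```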

207 citations


Journal ArticleDOI
TL;DR: Employing a larger number of retention strategies may not be associated with improved retention in longitudinal cohort studies, contrary to earlier narrative reviews, and strategies that aim to reduce participant burden might be most effective in maximising cohort retention.
Abstract: Participant retention strategies that minimise attrition in longitudinal cohort studies have evolved considerably in recent years. This study aimed to assess, via systematic review and meta-analysis, the effectiveness of both traditional strategies and contemporary innovations for retention adopted by longitudinal cohort studies in the past decade. Health research databases were searched for retention strategies used within longitudinal cohort studies published in the 10-years prior, with 143 eligible longitudinal cohort studies identified (141 articles; sample size range: 30 to 61,895). Details on retention strategies and rates, research designs, and participant demographics were extracted. Meta-analyses of retained proportions were performed to examine the association between cohort retention rate and individual and thematically grouped retention strategies. Results identified 95 retention strategies, broadly classed as either: barrier-reduction, community-building, follow-up/reminder, or tracing strategies. Forty-four of these strategies had not been identified in previous reviews. Meta-regressions indicated that studies using barrier-reduction strategies retained 10% more of their sample (95%CI [0.13 to 1.08]; p = .01); however, studies using follow-up/reminder strategies lost an additional 10% of their sample (95%CI [− 1.19 to − 0.21]; p = .02). The overall number of strategies employed was not associated with retention. Employing a larger number of retention strategies may not be associated with improved retention in longitudinal cohort studies, contrary to earlier narrative reviews. Results suggest that strategies that aim to reduce participant burden (e.g., flexibility in data collection methods) might be most effective in maximising cohort retention.

194 citations


Journal ArticleDOI
TL;DR: Results showed the majority of older adults under-report their level of MVPA and SB when completing the IPAQ, and the linear relationship above the mean shows an error shifting from under- to over-reporting as the mean increases.
Abstract: In order to accurately measure and monitor levels of moderate-to-vigorous physical activity (MVPA) and sedentary behaviour (SB) in older adults, cost-efficient and valid instruments are required. To date, the International Physical Activity Questionnaire (IPAQ) has not been validated with older adults (aged 60 years plus) in the United Kingdom. The current study aimed to test the validity of the IPAQ in a group of older adults for both MVPA and SB. Participants wore an Actigraph GT3X+ for seven consecutive days and, following the monitor wear, participants were asked to complete the IPAQ. Statistical analysis included: Kolmogorov-Smirnov tests; descriptive analyses; Spearman’s rho coefficients; and Bland-Altman analyses. A sample of 253 older adults were recruited (mean age 71.8 years (SD 6.6) and 57% male). In total, 226 had valid accelerometer and IPAQ data for MVPA and 228 had valid data for SB. Results showed the IPAQ had moderate/acceptable levels of validity (r = .430–.557) for MVPA. For SB, there were substantial levels of validity on weekdays (r = .702) and fair levels of validity (r = .257) on weekend days. Bland-Altman analysis showed inherent measurement error with the majority of participants tending to under-report both MVPA and SB. Results showed the majority of older adults under-report their level of MVPA and SB when completing the IPAQ, and the linear relationship above the mean shows an error shifting from under- to over-reporting as the mean increases. Findings from the current study suggest that the IPAQ is better implemented in larger surveillance studies comparing groups within or between countries rather than on an individual basis. Findings also suggest that the IPAQ validity scores could be strengthened by providing additional detail of types of activities older adults might do on a daily basis, improving recall; and it may also be necessary to provide an example of a daily breakdown of typical activities performed. This may enable older adults to more fully comprehend the amount of time they may spend active, sitting and/or lying during waking hours.
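For readers unfamiliar with Bland-Altman analysis, the sketch below computes the bias and 95% limits of agreement between two measures; the data are simulated to mimic self-report under-reporting against a device measure, and all numbers are illustrative.

```python
import numpy as np

def bland_altman(measure_a, measure_b):
    """Mean difference (bias) and 95% limits of agreement between two methods."""
    diff = np.asarray(measure_a) - np.asarray(measure_b)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return bias, (bias - half_width, bias + half_width)

rng = np.random.default_rng(1)
accel_mvpa = rng.gamma(shape=2.0, scale=15.0, size=226)      # device minutes/day
ipaq_mvpa = 0.8 * accel_mvpa + rng.normal(0, 10, size=226)   # under-reporting + noise

bias, (lower, upper) = bland_altman(accel_mvpa, ipaq_mvpa)
print(f"bias = {bias:.1f} min/day, 95% LoA = ({lower:.1f}, {upper:.1f})")
```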

172 citations


Journal ArticleDOI
TL;DR: Analysis of simulated data under missing at random (MAR) mechanisms showed that the generally available MI methods provided less biased estimates with better coverage for the linear regression model and around half of these methods performed well for the estimation of regression parameters for a linear mixed model with random intercept.
Abstract: Multiple imputation (MI) is now widely used to handle missing data in longitudinal studies. Several MI techniques have been proposed to impute incomplete longitudinal covariates, including standard fully conditional specification (FCS-Standard) and joint multivariate normal imputation (JM-MVN), which treat repeated measurements as distinct variables, and various extensions based on generalized linear mixed models. Although these MI approaches have been implemented in various software packages, there has not been a comprehensive evaluation of the relative performance of these methods in the context of longitudinal data. Using both empirical data and a simulation study based on data from the six waves of the Longitudinal Study of Australian Children (N = 4661), we investigated the performance of a wide range of MI methods available in standard software packages for investigating the association between child body mass index (BMI) and quality of life using both a linear regression and a linear mixed-effects model. In this paper, we have identified and compared 12 different MI methods for imputing missing data in longitudinal studies. Analysis of simulated data under missing at random (MAR) mechanisms showed that the generally available MI methods provided less biased estimates with better coverage for the linear regression model and around half of these methods performed well for the estimation of regression parameters for a linear mixed model with random intercept. In the empirical analysis, an inverse association between child BMI and quality of life was observed, both with the available data and after multiple imputation. Both FCS-Standard and JM-MVN performed well for the estimation of regression parameters in both analysis models. More complex methods that explicitly reflect the longitudinal structure for these analysis models may only be needed in specific circumstances such as irregularly spaced data.
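A minimal sketch of FCS-style multiple imputation followed by pooled regression, assuming statsmodels' mice module; the wide-format variables and missingness pattern are invented for illustration and are far simpler than the six-wave cohort analysed in the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation import mice

rng = np.random.default_rng(2)
n = 500
bmi_w1 = rng.normal(18, 2, n)                   # BMI at wave 1
bmi_w2 = bmi_w1 + rng.normal(0.5, 1, n)         # BMI at wave 2
qol = 80 - 1.2 * bmi_w2 + rng.normal(0, 5, n)   # quality-of-life score
df = pd.DataFrame({"bmi_w1": bmi_w1, "bmi_w2": bmi_w2, "qol": qol})
df.loc[rng.random(n) < 0.3, "bmi_w2"] = np.nan  # 30% missing at wave 2

# FCS: each incomplete variable is imputed conditional on all the others,
# with repeated measurements treated as distinct variables
imp_data = mice.MICEData(df)
analysis = mice.MICE("qol ~ bmi_w1 + bmi_w2", sm.OLS, imp_data)
pooled = analysis.fit(n_burnin=10, n_imputations=20)  # pooled via Rubin's rules
print(pooled.summary())
```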

Journal ArticleDOI
TL;DR: The NoMAD instrument has good face validity, construct validity and internal consistency, for assessing staff perceptions of factors relevant to embedding interventions that change their work practices.
Abstract: Understanding and measuring implementation processes is a key challenge for implementation researchers. This study draws on Normalization Process Theory (NPT) to develop an instrument that can be applied to assess, monitor or measure factors likely to affect normalization from the perspective of implementation participants. An iterative process of instrument development was undertaken using the following methods: theoretical elaboration, item generation and item reduction (team workshops); item appraisal (QAS-99); cognitive testing with complex intervention teams; theory re-validation with NPT experts; and pilot testing of instrument. We initially generated 112 potential questionnaire items; these were then reduced to 47 through team workshops and item appraisal. No concerns about item wording and construction were raised through the item appraisal process. We undertook three rounds of cognitive interviews with professionals (n = 30) involved in the development, evaluation, delivery or reception of complex interventions. We identified minor issues around wording of some items; universal issues around how to engage with people at different time points in an intervention; and conceptual issues around the types of people for whom the instrument should be designed. We managed these by adding extra items (n = 6) and including a new set of option responses: ‘not relevant at this stage’, ‘not relevant to my role’ and ‘not relevant to this intervention’ and decided to design an instrument explicitly for those people either delivering or receiving an intervention. This version of the instrument had 53 items. Twenty-three people with a good working knowledge of NPT reviewed the items for theoretical drift. Items that displayed a poor alignment with NPT sub-constructs were removed (n = 8) and others revised or combined (n = 6). The final instrument, with 43 items, was successfully piloted with five people, with a 100% completion rate of items. The process of moving through cycles of theoretical translation, item generation, cognitive testing, and theoretical (re)validation was essential for maintaining a balance between the theoretical integrity of the NPT concepts and the ease with which intended respondents could answer the questions. The final instrument could be easily understood and completed, while retaining theoretical validity. NoMAD represents a measure that can be used to understand implementation participants’ experiences. It is intended as a measure that can be used alongside instruments that measure other dimensions of implementation activity, such as implementation fidelity, adoption, and readiness.

Journal ArticleDOI
TL;DR: It is demonstrated that misuse of the ICC statistics under common assumption violations leads to misleading and likely inflated estimates of interrater reliability.
Abstract: Intraclass correlation coefficients (ICC) are recommended for the assessment of the reliability of measurement scales. However, the ICC is subject to a variety of statistical assumptions such as normality and stable variance, which are rarely considered in health applications. A Bayesian approach using hierarchical regression and variance-function modeling is proposed to estimate the ICC with emphasis on accounting for heterogeneous variances across a measurement scale. As an application, we review the implementation of using an ICC to evaluate the reliability of Observer OPTION5, an instrument which used trained raters to evaluate the level of Shared Decision Making between clinicians and patients. The study used two raters to evaluate recordings of 311 clinical encounters across three studies assessing the impact of using a Personal Decision Aid over usual care. We particularly focus on deriving an estimate for the ICC when multiple studies are being considered as part of the data. The results demonstrate that ICC varies substantially across studies and patient-physician encounters within studies. Using the new framework we developed, the study-specific ICCs were estimated to be 0.821, 0.295, and 0.644. If the within- and between-encounter variances were assumed to be the same across studies, the estimated within-study ICC was 0.609. If heteroscedasticity is not properly adjusted for, the within-study ICC estimate was inflated to be as high as 0.640. Finally, if the data were pooled across studies without accounting for the variability between studies then ICC estimates were further inflated by approximately 0.02, while formally allowing for between-study variation in the ICC inflated its estimated value by approximately 0.066 to 0.072 depending on the model. We demonstrated that misuse of the ICC statistics under common assumption violations leads to misleading and likely inflated estimates of interrater reliability. A statistical analysis that overcomes these violations by expanding the standard statistical model to account for them leads to estimates that are a better reflection of a measurement scale’s reliability while maintaining ease of interpretation. Bayesian methods are particularly well suited to estimating the expanded statistical model.
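For reference, the standard one-way ANOVA estimator that the paper's Bayesian framework generalises can be sketched in a few lines. This version bakes in the normality and constant-variance assumptions the authors caution about, and the simulated data are illustrative only.

```python
import numpy as np

def icc_oneway(ratings):
    """One-way random-effects ICC from an (n_subjects, k_raters) array.
    Assumes normality and a constant variance across the scale, the very
    assumptions the paper shows can inflate reliability estimates."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    ms_between = k * ((row_means - grand) ** 2).sum() / (n - 1)
    ms_within = ((ratings - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

rng = np.random.default_rng(3)
encounter_effect = rng.normal(0, 2, size=(311, 1))           # true encounter levels
scores = encounter_effect + rng.normal(0, 1, size=(311, 2))  # two raters
print(round(icc_oneway(scores), 3))  # approx 4 / (4 + 1) = 0.8
```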

Journal ArticleDOI
TL;DR: Two biomarker screening approaches are evaluated, a six-month risk prediction model and a parametric empirical Bayes (PEB) algorithm, in terms of their ability to improve the likelihood of early detection of HCC when applied prospectively in a future study.
Abstract: Hepatocellular carcinoma (HCC) has limited treatment options in patients with advanced stage disease and early detection of HCC through surveillance programs is a key component towards reducing mortality. The current practice guidelines recommend that high-risk cirrhosis patients are screened every six months with ultrasonography but these are done in local hospitals with variable quality leading to disagreement about the benefit of HCC surveillance. The well-established diagnostic biomarker α-Fetoprotein (AFP) is used widely in screening but the reported performance varies widely across studies. We evaluate two biomarker screening approaches, a six-month risk prediction model and a parametric empirical Bayes (PEB) algorithm, in terms of their ability to improve the likelihood of early detection of HCC compared to current AFP alone when applied prospectively in a future study. We used electronic medical records from the Department of Veterans Affairs Hepatitis C Clinical Case Registry to construct our analysis cohort, which consists of serial AFP tests in 11,222 cirrhosis control patients and 902 HCC cases prior to their HCC diagnosis. The six-month risk prediction model incorporates routinely measured laboratory tests, age, the rate of change in AFP over the past year with the current AFP. The PEB algorithm incorporates prior AFP screening values to identify patients with a significant elevated level of AFP at their current screen. We split the analysis cohort into independent training and validation datasets. All model fitting and parameter estimation was performed using the training data and the algorithm performance was assessed by applying each approach to patients in the validation dataset. When the screening-level false positive rate was set at 10%, the patient-level true positive rate using current AFP alone was 53.88% while the patient-level true positive rate for the six-month risk prediction model was 58.09% (4.21% increase) and PEB approach was 63.64% (9.76% increase). Both screening approaches identify a greater proportion of HCC cases earlier than using AFP alone. The two approaches show greater potential to improve early detection of HCC compared to using the current AFP only and are worthy of further study.
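The empirical Bayes idea behind the PEB algorithm can be caricatured with a normal-normal model: shrink a patient's screening history toward the population mean, then flag the current value when it is improbable under the patient-specific predictive distribution. The sketch below is a simplified illustration under those assumptions, not the paper's algorithm; all names and parameter values are made up.

```python
import numpy as np
from scipy.stats import norm

def peb_flag(history, current, mu, tau2, sigma2, fpr=0.10):
    """Flag the current (log-scale) biomarker value if it exceeds the upper
    1 - fpr quantile of the patient-specific predictive distribution.

    history:  past screening values for this patient (possibly empty)
    mu, tau2: population mean and between-patient variance
    sigma2:   within-patient, screen-to-screen variance
    """
    m = len(history)
    if m == 0:
        post_mean, post_var = mu, tau2      # no history: population prior
    else:
        w = tau2 / (tau2 + sigma2 / m)      # shrinkage weight
        post_mean = w * np.mean(history) + (1 - w) * mu
        post_var = (1 - w) * tau2
    pred_sd = np.sqrt(post_var + sigma2)    # spread of the next screen
    return current > post_mean + norm.ppf(1 - fpr) * pred_sd

# A stable low history makes a moderate rise look suspicious even though it
# would be unremarkable relative to the population as a whole
print(peb_flag(history=[1.0, 1.1, 0.9], current=2.5, mu=1.5, tau2=0.5, sigma2=0.1))
```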

Journal ArticleDOI
TL;DR: TIDieR is found to be a useful tool for applied research outside the context of clinical trials and four revisions or additions are suggested which would enable it to better capture these complexities in applied health research.
Abstract: The Template for Intervention Description and Replication (TIDieR) checklist and guide was developed by an international team of experts to promote full and accurate description of trial interventions. It is now widely used in health research. The aim of this paper is to describe the experience of using TIDieR outside of trials, in a range of applied health research contexts, and make recommendations on its usefulness in such settings. We used the TIDieR template for intervention description in six applied health research projects. The six cases comprise a diverse sample in terms of clinical problems, population, settings, stage of intervention development and whether the intervention was led by researchers or the service deliverers. There was also variation in how the TIDieR description was produced in terms of contributors and time point in the project. Researchers involved in the six cases met in two workshops to identify issues and themes arising from their experience of using TIDieR. We identified four themes which capture the difficulties or complexities of using TIDieR in applied health research: (i) fidelity and adaptation: all aspects of an intervention can change over time; (ii) voice: the importance of clarity on whose voice the TIDieR description represents; (iii) communication beyond the immediate context: the usefulness of TIDieR for wider dissemination and sharing; (iv) the use of TIDieR as a research tool. We found TIDieR to be a useful tool for applied research outside the context of clinical trials, and we suggest four revisions or additions to the original TIDieR that would enable it to better capture these complexities in applied health research.

Journal ArticleDOI
TL;DR: This article provides the sample size required to achieve 80% power by simulations under various sizes of the mediation effect, within-subject correlations and numbers of repeated measures, and shows that the distribution of the product method and bootstrapping method have superior performance to the Sobel's method.
Abstract: Sample size planning for longitudinal data is crucial when designing mediation studies because sufficient statistical power is not only required in grant applications and peer-reviewed publications, but is essential to reliable research results. However, sample size determination is not straightforward for mediation analysis of longitudinal designs. To facilitate planning the sample size for longitudinal mediation studies with a multilevel mediation model, this article provides the sample size required to achieve 80% power by simulations under various sizes of the mediation effect, within-subject correlations and numbers of repeated measures. The sample size calculation is based on three commonly used mediation tests: Sobel’s method, the distribution of the product method and the bootstrap method. Among the three methods of testing the mediation effects, Sobel’s method required the largest sample size to achieve 80% power. Bootstrapping and the distribution of the product method performed similarly and were more powerful than Sobel’s method, as reflected by the relatively smaller sample sizes. For all three methods, the sample size required to achieve 80% power depended on the value of the ICC (i.e., within-subject correlation). A larger value of ICC typically required a larger sample size to achieve 80% power. Simulation results also illustrated the advantage of the longitudinal study design. Sample size tables for the most commonly encountered scenarios in practice have also been published for convenient use. An extensive simulation study showed that the distribution of the product method and the bootstrapping method outperform Sobel’s method; the distribution of the product method is recommended in practice because it requires less computation time than bootstrapping. An R package has been developed for sample size determination via the distribution of the product method in longitudinal mediation study designs.
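Of the three tests compared, Sobel's method is the most direct to write down; a minimal sketch follows, with illustrative path coefficients rather than values from the paper.

```python
import numpy as np
from scipy.stats import norm

def sobel_test(a, se_a, b, se_b):
    """Sobel z-test for the indirect (mediation) effect a*b.

    a, se_a: coefficient and SE for the X -> M path
    b, se_b: coefficient and SE for the M -> Y path, adjusted for X
    """
    indirect = a * b
    se_indirect = np.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = indirect / se_indirect
    p = 2 * norm.sf(abs(z))
    return indirect, z, p

indirect, z, p = sobel_test(a=0.40, se_a=0.10, b=0.35, se_b=0.12)
print(f"ab = {indirect:.3f}, z = {z:.2f}, p = {p:.4f}")  # z ~ 2.36, p ~ 0.018
```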

Journal ArticleDOI
TL;DR: An ethnographically-informed method of guided discussions developed for use by a multi-project national implementation program to aid in documenting implementation phenomena over time shows potential as a straightforward and low-burden method for documenting events across the life cycle of an implementation effort.
Abstract: Ethnography has been proposed as a valuable method for understanding how implementation occurs within dynamic healthcare contexts, yet this method can be time-intensive and challenging to operationalize in pragmatic implementation. The current study describes an ethnographically-informed method of guided discussions developed for use by a multi-project national implementation program. The EMPOWER QUERI is conducting three projects to implement innovative care models in VA women’s health for high-priority health concerns – prediabetes, cardiovascular risk, and mental health – utilizing the Replicating Effective Programs (REP) implementation strategy enhanced with stakeholder engagement and complexity science. Drawing on tenets of ethnographic research, we developed a lightly-structured method of guided “periodic reflections” to aid in documenting implementation phenomena over time. Reflections are completed as 30–60 min telephone discussions with implementation team members at monthly or bi-monthly intervals, led by a member of the implementation core. Discussion notes are coded to reflect key domains of interest and emergent themes, and can be analyzed singly or in triangulation with other qualitative and quantitative assessments to inform evaluation and implementation activities. Thirty structured reflections were completed across the three projects during a 15-month period spanning pre-implementation, implementation, and sustainment activities. Reflections provide detailed, near-real-time information on projects’ dynamic implementation context, including characteristics of implementation settings and changes in the local or national environment, adaptations to the intervention and implementation plan, and implementation team sensemaking and learning. Reflections also provide an opportunity for implementation teams to engage in recurring reflection and problem-solving. To implement new, complex interventions into dynamic organizations, we must better understand the implementation process as it unfolds in real time. Ethnography is well suited to this task, but few approaches exist to aid in integrating ethnographic insights into implementation research. Periodic reflections show potential as a straightforward and low-burden method for documenting events across the life cycle of an implementation effort. They offer an effective means for capturing information on context, unfolding process and sensemaking, unexpected events, and diverse viewpoints, illustrating their value for use as part of an ethnographically-minded implementation approach. The two implementation research studies described in this article have been registered as required: Facilitating Cardiovascular Risk Screening and Risk Reduction in Women Veterans (NCT02991534); and Implementation of Tailored Collaborative Care for Women Veterans (NCT02950961).

Journal ArticleDOI
TL;DR: An important step to improve the quality of maternity care is to understand the magnitude and burden of mistreatment across contexts, and to inform the development of more women-centered, respectful maternity healthcare services.
Abstract: Efforts to improve maternal health are increasingly focused on improving the quality of care provided to women at health facilities, including the promotion of respectful care and eliminating mistreatment of women during childbirth. A WHO-led multi-country research project aims to develop and validate two tools (labor observation and community survey) to measure how women are treated during facility-based childbirth. This paper describes the development process for these measurement tools, and how they were implemented in a multi-country study (Ghana, Guinea, Myanmar and Nigeria). An iterative mixed-methods approach was used to develop two measurement tools. Methodological development was conducted in four steps: (1) initial tool development; (2) validity testing, item adjustment and piloting of paper-based tools; (3) conversion to digital, tablet-based tools; and (4) data collection and analysis. These steps included systematic reviews, primary qualitative research, mapping of existing tools, item consolidation, peer review by key stakeholders and piloting. The development, structure, administration format, and implementation of the labor observation and community survey tools are described. For the labor observations, a total of 2016 women participated: 408 in Nigeria, 682 in Guinea, and 926 in Ghana. For the community survey, a total of 2672 women participated: 561 in Nigeria, 644 in Guinea, 836 in Ghana, and 631 in Myanmar. Of the 2016 women who participated in the labor observations, 1536 women (76.2%) also participated in the community survey and have linked data: 779 in Ghana, 425 in Guinea, and 332 in Nigeria. An important step to improve the quality of maternity care is to understand the magnitude and burden of mistreatment across contexts. Researchers and healthcare providers in maternal health are encouraged to use and implement these tools, to inform the development of more women-centered, respectful maternity healthcare services. By measuring the prevalence of mistreatment of women during childbirth, we will be able to design and implement programs and policies to transform maternity services.

Journal ArticleDOI
TL;DR: The alternative approach of machine learning classification produced results comparable to those of risk prediction scores and, thus, can be used as a method of CVD prediction, taking into consideration the advantages that machine learning methodologies may offer.
Abstract: The use of Cardiovascular Disease (CVD) risk estimation scores in primary prevention has long been established. However, their performance remains a matter of concern. The aim of this study was to explore the potential of using machine learning (ML) methodologies for CVD prediction, especially compared to an established risk tool, the HellenicSCORE. Data from the ATTICA prospective study (n = 2020 adults), enrolled during 2001–02 and followed up in 2011–12, were used. Three different machine-learning classifiers (k-NN, random forest, and decision tree) were trained and evaluated against 10-year CVD incidence, in comparison with the HellenicSCORE tool (a calibration of the ESC SCORE). Training datasets, ranging from 16 variables down to only 5, were chosen, with or without bootstrapping, in an attempt to achieve the best overall performance for the machine learning classifiers. Depending on the classifier and the training dataset, performance varied but was comparable between the two methodological approaches. In particular, the HellenicSCORE showed accuracy 85%, specificity 20%, sensitivity 97%, positive predictive value 87%, and negative predictive value 58%, whereas for the machine learning methodologies, accuracy ranged from 65 to 84%, specificity from 46 to 56%, sensitivity from 67 to 89%, positive predictive value from 89 to 91%, and negative predictive value from 24 to 45%; random forest gave the best results, while the k-NN gave the poorest results. The alternative approach of machine learning classification produced results comparable to those of risk prediction scores and, thus, can be used as a method of CVD prediction, taking into consideration the advantages that machine learning methodologies may offer.
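A minimal sketch of the comparison setup, assuming scikit-learn and synthetic data in place of the ATTICA cohort; the hyperparameters are arbitrary and the printed accuracies will not match the study's results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the cohort: 2020 adults, 16 baseline variables,
# roughly 15% 10-year CVD incidence
X, y = make_classification(n_samples=2020, n_features=16, n_informative=8,
                           weights=[0.85], random_state=0)

models = {
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=500, random_state=0),
}
for name, model in models.items():
    accuracy = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {accuracy:.2f}")
```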

Journal ArticleDOI
TL;DR: There is increasing use of mediation analysis with time-to-event outcomes, but current usage is limited by reliance on traditional methods and the Cox Proportional Hazards model, as well as low rates of reporting of underlying assumptions.
Abstract: Mediation analysis tests whether the relationship between two variables is explained by a third intermediate variable. We sought to describe the usage and reporting of mediation analysis with time-to-event outcomes in published healthcare research. A systematic search of Medline, Embase, and Web of Science was executed in December 2016 to identify applications of mediation analysis to healthcare research involving a clinically relevant time-to-event outcome. We summarized usage over time and reporting of important methodological characteristics. We included 149 primary studies, published from 1997 to 2016. Most studies were published after 2011 (n = 110, 74%), and the annual number of studies nearly doubled in the last year (from n = 21 to n = 40). A traditional approach (causal steps or change in coefficient) was most commonly taken (n = 87, 58%), and the majority of studies (n = 114, 77%) used a Cox Proportional Hazards regression for the outcome. Few studies (n = 52, 35%) mentioned any of the assumptions or limitations fundamental to a causal interpretation of mediation analysis. There is increasing use of mediation analysis with time-to-event outcomes. Current usage is limited by reliance on traditional methods and the Cox Proportional Hazards model, as well as low rates of reporting of underlying assumptions. There is a need for formal criteria to aid authors, reviewers, and readers reporting or appraising such studies.

Journal ArticleDOI
TL;DR: This work extends the classical joint model to the case of multiple longitudinal outcomes, proposes a practical algorithm for fitting the models, and demonstrates how to fit the models using joineRML, a new package for the statistical software platform R.
Abstract: Joint modelling of longitudinal and time-to-event outcomes has received considerable attention over recent years. Commensurate with this has been a rise in statistical software options for fitting these models. However, these tools have generally been limited to a single longitudinal outcome. Here, we extend the classical joint model to the case of multiple longitudinal outcomes, propose a practical algorithm for fitting the models, and demonstrate how to fit the models using joineRML, a new package for the statistical software platform R. A multivariate linear mixed sub-model is specified for the longitudinal outcomes, and a Cox proportional hazards regression model with time-varying covariates is specified for the event time sub-model. The association between models is captured through a zero-mean multivariate latent Gaussian process. The models are fitted using a Monte Carlo Expectation-Maximisation algorithm, and inferences are based on approximate standard errors from the empirical profile information matrix, which are contrasted with an alternative bootstrap estimation approach. We illustrate the model and software on a real data example for patients with primary biliary cirrhosis with three repeatedly measured biomarkers. An open-source software package capable of fitting multivariate joint models is available. The underlying algorithm and source code make use of several methods to increase computational speed.

Journal ArticleDOI
TL;DR: The extent of bias associated with blinding in randomized controlled trials of oral health interventions was quantified using a two-level meta-meta-analytic approach with a random effects model to allow for intra- and inter- meta-analysis heterogeneity.
Abstract: Recent methodologic evidence suggests that lack of blinding in randomized trials can result in under- or overestimation of the treatment effect size. The objective of this study is to quantify the extent of bias associated with blinding in randomized controlled trials of oral health interventions. We selected all oral health meta-analyses that included a minimum of five randomized controlled trials. We extracted data, in duplicate, related to nine blinding-related criteria, namely: patient blinding, assessor blinding, care-provider blinding, investigator blinding, statistician blinding, blinding of both patients and assessors, study described as “double blind”, blinding of patients, assessors, and care providers concurrently, and the appropriateness of blinding. We quantified the impact of bias associated with blinding on the magnitude of effect size using a two-level meta-meta-analytic approach with a random effects model to allow for intra- and inter-meta-analysis heterogeneity. We identified 540 randomized controlled trials, included in 64 meta-analyses, analyzing data from 137,957 patients. We identified significantly larger treatment effect size estimates in trials that had inadequate patient blinding (difference in treatment effect size = 0.12; 95% CI: 0.00 to 0.23), lack of blinding of both patients and assessors (difference = 0.19; 95% CI: 0.06 to 0.32), and lack of blinding of patients, assessors, and care-providers concurrently (difference = 0.14; 95% CI: 0.03 to 0.25). In contrast, assessor blinding (difference = 0.06; 95% CI: -0.06 to 0.18), caregiver blinding (difference = 0.02; 95% CI: -0.04 to 0.09), principal-investigator blinding (difference = − 0.02; 95% CI: -0.10 to 0.06), describing a trial as “double-blind” (difference = 0.09; 95% CI: -0.05 to 0.22), and lack of an appropriate method of blinding (difference = 0.06; 95% CI: -0.06 to 0.18) were not associated with over- or underestimated treatment effect size. We found significant differences in treatment effect size estimates between oral health trials based on lack of patient and assessor blinding. Treatment effect size estimates were 0.19 and 0.14 larger in trials with lack of blinding of both patients and assessors and blinding of patients, assessors, and care-providers concurrently. No significant differences were identified in other blinding criteria. Investigators of oral health systematic reviews should perform sensitivity analyses based on the adequacy of blinding in included trials.

Journal ArticleDOI
TL;DR: This paper summarizes definitions of mechanism found in realist methodological literature, reports an empirical example of a realist analysis of the implementation of models of integrated community-based primary health care in three international jurisdictions, and outlines the implications of the analysis for realist research and realist evaluation.
Abstract: The concept of “mechanism” is central to realist approaches to research, yet research teams struggle to operationalize and apply the concept in empirical research. Our large, interdisciplinary research team has also experienced challenges in making the concept useful in our study of the implementation of models of integrated community-based primary health care (ICBPHC) in three international jurisdictions (Ontario and Quebec in Canada, and in New Zealand). In this paper we summarize definitions of mechanism found in realist methodological literature, and report an empirical example of a realist analysis of the implementation of ICBPHC. We use our empirical example to illustrate two points. First, the distinction between contexts and mechanisms might ultimately be arbitrary, with more distally located mechanisms becoming contexts as research teams focus their analytic attention more proximally to the outcome of interest. Second, the relationships between mechanisms, human reasoning, and human agency need to be considered in greater detail to inform realist-informed analysis; understanding these relationships is fundamental to understanding the ways in which mechanisms operate through individuals and groups to effect the outcomes of complex health interventions. We conclude our paper with reflections on human agency and outline the implications of our analysis for realist research and realist evaluation.

Journal ArticleDOI
TL;DR: Authors, peer reviewers, and editors should pay more attention to the correct use and reporting of assessment tools in evidence synthesis, and authors of overviews of reviews should ensure that a methodological expert is part of their review team.
Abstract: The assessment of multiple systematic reviews (AMSTAR) tool is widely used for investigating the methodological quality of systematic reviews (SR). Originally, AMSTAR was developed for SRs of randomized controlled trials (RCTs). Its applicability to SRs of other study designs remains unclear. Our objectives were to: (1) analyze how AMSTAR is applied by authors and (2) analyze whether the authors pay attention to the original purpose of AMSTAR and what it has been validated for. We searched MEDLINE (via PubMed) from inception through October 2016 to identify studies that applied AMSTAR. Full-text studies were sought for all retrieved hits and screened by one reviewer. A second reviewer verified the excluded studies (liberal acceleration). Data were extracted into structured tables by one reviewer and were checked by a second reviewer. Discrepancies at any stage were resolved by consensus or by consulting a third person. We analyzed the data descriptively as frequencies or medians and interquartile ranges (IQRs). Associations were quantified using the risk ratio (RR), with 95% confidence intervals. We identified 247 studies. They included a median of 17 reviews (interquartile range (IQR): 8 to 47) per study. AMSTAR was modified in 23% (57/247) of studies. In most studies, an AMSTAR score was calculated (200/247; 81%). Methods for calculating an AMSTAR score varied, with summing up all yes answers (yes = 1) being the most frequent option (102/200; 51%). More than one third of the authors failed to report how the AMSTAR score was obtained (71/200; 36%). In a subgroup analysis, we compared overviews of reviews (n = 154) with the methodological publications (n = 93). The overviews of reviews were much less likely to mention both limitations with respect to study designs (if studies other than RCTs were included in the reviews) (RR 0.27, 95% CI 0.09 to 0.75) and overall score (RR 0.08, 95% CI 0.02 to 0.35). Authors, peer reviewers, and editors should pay more attention to the correct use and reporting of assessment tools in evidence synthesis. Authors of overviews of reviews should ensure that a methodological expert is part of their review team.

Journal ArticleDOI
TL;DR: Key criteria to evaluate a segmentation framework are identified: internal validity, external validity, identifiability/interpretability, substantiality, stability, actionability/accessibility, and parsimony.
Abstract: Data-driven population segmentation analysis utilizes data analytics to divide a heterogeneous population into parsimonious and relatively homogenous groups with similar healthcare characteristics. It is a promising patient-centric analysis that enables effective integrated healthcare interventions specific for each segment. Although widely applied, there is no systematic review on the clinical application of data-driven population segmentation analysis. We carried out a systematic literature search using PubMed, Embase and Web of Science following PRISMA criteria. We included English peer-reviewed articles that applied data-driven population segmentation analysis on empirical health data. We summarized the clinical settings in which segmentation analysis was applied, compared and contrasted strengths, limitations, and practical considerations of different segmentation methods, and assessed the segmentation outcome of all included studies. The studies were assessed by two independent reviewers. We retrieved 14,514 articles and included 216 articles. Data-driven population segmentation analysis was widely used in different clinical contexts. 163 studies examined the general population while 53 focused on specific populations with certain diseases or conditions, including psychological, oncological, respiratory, cardiovascular, and gastrointestinal conditions. Variables used for segmentation in the studies were heterogeneous. Most studies (n = 170) utilized secondary data in community settings (n = 185). The most common segmentation method was latent class/profile/transition/growth analysis (n = 96) followed by K-means cluster analysis (n = 60) and hierarchical analysis (n = 50), each having its advantages, disadvantages, and practical considerations. We also identified key criteria to evaluate a segmentation framework: internal validity, external validity, identifiability/interpretability, substantiality, stability, actionability/accessibility, and parsimony. Data-driven population segmentation has been widely applied and holds great potential in managing population health. The evaluations of segmentation outcome require the interplay of data analytics and subject matter expertise. The optimal framework for segmentation requires further research.
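As an illustration of the kind of K-means segmentation the review catalogues, here is a minimal sketch assuming scikit-learn, with synthetic utilisation counts standing in for real health data; silhouette score is used as one of the internal-validity checks the authors list.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical utilisation features per patient:
# admissions, ED visits, distinct drugs in the past year
rng = np.random.default_rng(4)
low_need = rng.poisson([0.2, 0.5, 2.0], size=(800, 3))
high_need = rng.poisson([2.0, 3.0, 9.0], size=(200, 3))
X = StandardScaler().fit_transform(np.vstack([low_need, high_need]))

# Internal validity: choose the number of segments by silhouette score
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```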

Journal ArticleDOI
TL;DR: The objective of this paper is to introduce a high-level reference model that is intended to be used as a foundation for designing successful and contextually relevant CDSS.
Abstract: Clinical Decision Support Systems (CDSS) provide aid in clinical decision making and therefore need to take into consideration human factors, data interactions, and the cognitive functions of clinical decision makers. The objective of this paper is to introduce a high-level reference model that is intended to be used as a foundation for designing successful and contextually relevant CDSS. The paper begins by introducing the information flow, use, and sharing characteristics in a hospital setting, and then outlines the referential context for the model: clinical decisions in a hospital setting. Important characteristics of the clinical decision-making process include: (i) temporally ordered steps, each leading to new data, which in turn become useful for a new decision; (ii) feedback loops, where acquisition of new data improves certainty and generates new questions to examine; (iii) combining different kinds of clinical data for decision making; (iv) reusing the same data in two or more different decisions; and (v) clinical decisions requiring human cognitive skills and knowledge to process the available information. These characteristics form the foundation for delineating important considerations in Clinical Decision Support System design. The model includes six interacting and interconnected elements, which formulate the high-level reference model (CDSS-RM). These elements are introduced in the form of questions, as considerations, and are examined with the use of illustrative scenario-based and data-driven examples. The six elements/considerations of the reference model are: (i) Do CDSS mimic the cognitive process of clinical decision makers? (ii) Do CDSS provide recommendations with longitudinal insight? (iii) Is the model performance contextually realistic? (iv) Is 'historical decision' bias taken into consideration in CDSS design? (v) Do CDSS integrate established clinical standards and protocols? (vi) Do CDSS utilize unstructured data? The CDSS-RM reference model can contribute to the optimized design of modeling methodologies, in order to improve the response of health systems to clinical decision-making challenges.
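Purely as a toy illustration (not part of the paper), the six CDSS-RM considerations could be encoded as a simple design-review checklist; all names here are our own invention:

```python
from dataclasses import dataclass

# The six CDSS-RM considerations, restated as checklist items
CDSS_RM_QUESTIONS = (
    "Mimics the cognitive process of clinical decision makers",
    "Provides recommendations with longitudinal insight",
    "Model performance is contextually realistic",
    "Accounts for 'historical decision' bias",
    "Integrates established clinical standards and protocols",
    "Utilizes unstructured data",
)

@dataclass
class CdssDesignReview:
    answers: dict  # question -> bool

    def gaps(self):
        """Return the considerations the design does not yet address."""
        return [q for q in CDSS_RM_QUESTIONS if not self.answers.get(q, False)]

review = CdssDesignReview(answers={CDSS_RM_QUESTIONS[0]: True})
print(review.gaps())  # five considerations still open in this toy example
```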

Journal ArticleDOI
TL;DR: No gold standard for the optimal design of a diary study exists, since the design depends heavily upon the research question of the study, and the findings of the current study are helpful to explicate and guide the specific choices that have to be made when designing a diary study.
Abstract: Electronic diaries are increasingly used in diverse disciplines to collect momentary data on experienced feelings, cognitions, behavior and social context in real-life situations. However, the choices required for an effective and feasible design pose a challenge. Careful and detailed documentation of the arguments for choosing a particular design, as well as general guidelines on how to design such studies, is largely lacking in scientific papers. This qualitative study provides a systematic overview of arguments for choosing a specific diary study design (e.g. time frame) in order to optimize future design decisions. During the first data assessment round, 47 researchers experienced in diary research from twelve different countries participated. They gave a description of, and arguments for choosing, their diary design (i.e., study duration, measurement frequency, random or fixed assessment, momentary or retrospective assessment, allowed delay to respond to the beep). During the second round, 38 participants (81%) rated the importance of the different themes identified during the first assessment round for the different diary design topics. The rationales for diary design choices reported during the first round were mostly strongly related to the research question. The rationales were categorized into four overarching themes: nature of the variables, reliability, feasibility, and statistics. During the second round, all overarching themes were considered important for all diary design topics. We conclude that no gold standard for the optimal design of a diary study exists, since the design depends heavily upon the research question of the study. The findings of the current study are helpful to explicate and guide the specific choices that have to be made when designing a diary study.
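To make one of the design choices above concrete, here is a minimal sketch of generating a stratified-random beep schedule, a common alternative to fixed assessment times in diary studies. All parameters (7-day study, 5 beeps per day, a 09:00-21:00 waking window) are assumptions for illustration:

```python
import random
from datetime import datetime, timedelta

def beep_schedule(start_date, days=7, beeps_per_day=5,
                  wake_hour=9, sleep_hour=21, seed=42):
    """One random beep per equal block of the waking window, per day."""
    rng = random.Random(seed)
    window = (sleep_hour - wake_hour) * 60   # waking window in minutes
    block = window // beeps_per_day          # minutes per block
    schedule = []
    for d in range(days):
        day_start = (start_date + timedelta(days=d)).replace(hour=wake_hour)
        for b in range(beeps_per_day):
            offset = b * block + rng.randrange(block)  # random within block
            schedule.append(day_start + timedelta(minutes=offset))
    return schedule

for beep in beep_schedule(datetime(2018, 1, 1))[:5]:
    print(beep)
```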

Journal ArticleDOI
TL;DR: Approaches to analysis that potentially provide estimates of causal effects when such issues arise are described; clinical trialists might consider these methods at both the design and analysis stages of randomised trials with long-term follow-up.
Abstract: Randomised trials with long-term follow-up can provide estimates of the long-term effects of health interventions. However, analysis of long-term outcomes in randomised trials may be complicated by problems with the administration of treatment such as non-adherence, treatment switching and co-intervention, and problems obtaining outcome measurements arising from loss to follow-up and death of participants. Methods for dealing with these issues that involve conditioning on post-randomisation variables are unsatisfactory because they may involve the comparison of non-exchangeable groups and generate estimates that do not have a valid causal interpretation. We describe approaches to analysis that potentially provide estimates of causal effects when such issues arise. Brief descriptions are provided of the use of instrumental variable and propensity score methods in trials with imperfect adherence, marginal structural models and g-estimation in trials with treatment switching, mixed longitudinal models and multiple imputation in trials with loss to follow-up, and a sensitivity analysis that can be used when trial follow-up is truncated by death or other events. Clinical trialists might consider these methods both at the design and analysis stages of randomised trials with long-term follow-up.
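As an illustration of the first method mentioned, here is a minimal numpy sketch (on simulated data, not trial data) of the classic two-stage least squares instrumental-variable estimator, using randomised assignment as the instrument for treatment actually received under non-adherence; the data-generating parameters are ours:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
z = rng.integers(0, 2, n)                     # randomised assignment (instrument)
u = rng.normal(size=n)                        # unmeasured confounder
# Non-adherence: treatment received depends on assignment AND the confounder
d = ((0.8 * z + 0.4 * u + rng.normal(size=n)) > 0.5).astype(float)
y = 2.0 * d + u + rng.normal(size=n)          # true causal effect = 2.0

# Stage 1: regress treatment received on assignment.
# Stage 2: regress the outcome on the stage-1 fitted values.
X1 = np.column_stack([np.ones(n), z])
d_hat = X1 @ np.linalg.lstsq(X1, d, rcond=None)[0]
X2 = np.column_stack([np.ones(n), d_hat])
beta = np.linalg.lstsq(X2, y, rcond=None)[0]

# Close to 2.0; a naive regression of y on d would be biased upward
# because d and y share the unmeasured confounder u.
print(beta[1])
```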

Journal ArticleDOI
TL;DR: A simulation-based comparison of recurrent event models applied to composite endpoints is provided, demonstrating that the Andersen-Gill and Prentice-Williams-Peterson models show similar results under various data scenarios, whereas the Wei-Lin-Weissfeld model delivers effect estimators that can deviate considerably under commonly met data scenarios.
Abstract: Many clinical trials focus on the comparison of the treatment effect between two or more groups with respect to a rarely occurring event. In this situation, showing a relevant effect with acceptable power requires the observation of a large number of patients over a long period of time. For feasibility reasons, it is therefore often considered to include several event types of interest, non-fatal or fatal, and to combine them within a composite endpoint. Commonly, a composite endpoint is analyzed with standard survival analysis techniques by assessing the time to the first occurring event. This approach neglects that an individual may experience more than one event, which leads to a loss of information. As an alternative, composite endpoints can be analyzed with models for recurrent events. A number of such models exist, e.g. regression models based on count data or Cox-based models such as the approaches of Andersen and Gill; Prentice, Williams and Peterson; or Wei, Lin and Weissfeld. Although some of these methods have already been compared in the literature, there exists no systematic investigation of the special requirements of composite endpoints. Within this work, a simulation-based comparison of recurrent event models applied to composite endpoints is provided for different realistic clinical trial scenarios. We demonstrate that the Andersen-Gill model and the Prentice-Williams-Peterson models show similar results under various data scenarios, whereas the Wei-Lin-Weissfeld model delivers effect estimators that can deviate considerably under commonly met data scenarios. Based on the conducted simulation study, this paper helps to understand the pros and cons of the investigated methods in the context of composite endpoints and therefore provides recommendations for an adequate statistical analysis strategy and a meaningful interpretation of results.
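To make the counting-process setup behind such models concrete, here is a minimal sketch of an Andersen-Gill-style analysis on simulated recurrent-event data, using the CoxTimeVaryingFitter from the lifelines package (an assumed dependency; the simulation parameters are ours, not the paper's). Note that Andersen-Gill analyses are usually reported with robust, subject-clustered standard errors, which this sketch omits:

```python
import numpy as np
import pandas as pd
from lifelines import CoxTimeVaryingFitter

rng = np.random.default_rng(0)
rows = []
for i in range(200):
    treat = i % 2                        # alternate treatment assignment
    rate = 0.10 * np.exp(-0.5 * treat)   # true log-hazard ratio = -0.5
    t, censor = 0.0, rng.uniform(20, 40)
    while True:
        gap = rng.exponential(1 / rate)  # time to next event
        if t + gap >= censor:
            rows.append((i, t, censor, 0, treat))  # final, censored interval
            break
        rows.append((i, t, t + gap, 1, treat))     # interval ending in an event
        t += gap

# Counting-process (start, stop] layout: one row per at-risk interval
df = pd.DataFrame(rows, columns=["id", "start", "stop", "event", "treat"])
ctv = CoxTimeVaryingFitter()
ctv.fit(df, id_col="id", event_col="event", start_col="start", stop_col="stop")
print(ctv.summary.loc["treat", "coef"])  # should be near the true -0.5
```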