
Showing papers in "BMC Medical Research Methodology in 2019"


Journal ArticleDOI
TL;DR: This work demonstrates the use of machine learning techniques by developing three predictive models for cancer diagnosis using descriptions of nuclei sampled from breast masses: regularized General Linear Model regression, Support Vector Machines, and single-layer Artificial Neural Networks.
Abstract: Following visible successes on a wide range of predictive tasks, machine learning techniques are attracting substantial interest from medical researchers and clinicians. We address the need for capacity development in this area by providing a conceptual introduction to machine learning alongside a practical guide to developing and evaluating predictive algorithms using freely-available open source software and public domain data. We demonstrate the use of machine learning techniques by developing three predictive models for cancer diagnosis using descriptions of nuclei sampled from breast masses. These algorithms include regularized General Linear Model regression (GLMs), Support Vector Machines (SVMs) with a radial basis function kernel, and single-layer Artificial Neural Networks. The publicly-available dataset describing the breast mass samples (N=683) was randomly split into evaluation (n=456) and validation (n=227) samples. We trained algorithms on data from the evaluation sample before they were used to predict the diagnostic outcome in the validation dataset. We compared the predictions made on the validation datasets with the real-world diagnostic decisions to calculate the accuracy, sensitivity, and specificity of the three models. We explored the use of averaging and voting ensembles to improve predictive performance. We provide a step-by-step guide to developing algorithms using the open-source R statistical programming environment. The trained algorithms were able to classify cell nuclei with high accuracy (.94–.96), sensitivity (.97–.99), and specificity (.85–.94). Maximum accuracy (.96) and area under the curve (.97) were achieved using the SVM algorithm. Prediction performance increased marginally (accuracy = .97, sensitivity = .99, specificity = .95) when algorithms were arranged into a voting ensemble. We use a straightforward example to demonstrate the theory and practice of machine learning for clinicians and medical researchers. The principles which we demonstrate here can be readily applied to other complex tasks including natural language processing and image recognition.
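The workflow described here can be sketched in a few lines of R. The following is a minimal illustration, not the authors' published code: it uses the Wisconsin breast-mass data from the mlbench package (683 complete cases, matching the paper's N), with glmnet, e1071 and nnet standing in for the three model families. Package availability is assumed.

```r
library(mlbench)   # provides the Wisconsin breast-mass data (BreastCancer)
library(glmnet)    # regularized GLM
library(e1071)     # SVM with radial basis function kernel
library(nnet)      # single-hidden-layer neural network

data(BreastCancer)
bc <- na.omit(BreastCancer)              # 683 complete cases, as in the paper
x  <- data.matrix(bc[, 2:10])            # nine cell-nucleus features
y  <- factor(bc$Class)                   # benign / malignant

set.seed(42)
train <- sample(nrow(bc), 456)           # mirror the paper's 456/227 split

## Regularized GLM (elastic-net logistic regression, lambda by cross-validation)
fit_glm <- cv.glmnet(x[train, ], y[train], family = "binomial")
p_glm   <- predict(fit_glm, x[-train, ], type = "class")

## SVM with a radial basis function kernel
fit_svm <- svm(x[train, ], y[train], kernel = "radial")
p_svm   <- predict(fit_svm, x[-train, ])

## Single-hidden-layer neural network (inputs rescaled to [0, 1])
fit_nn  <- nnet(x[train, ] / 10, class.ind(y[train]), size = 5,
                softmax = TRUE, trace = FALSE)
p_nn    <- levels(y)[max.col(predict(fit_nn, x[-train, ] / 10))]

## Hold-out accuracy for each model
sapply(list(GLM = p_glm, SVM = p_svm, ANN = p_nn),
       function(p) mean(p == y[-train]))
```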

506 citations


Journal ArticleDOI
TL;DR: This article is aimed at researchers and doctoral students new to thematic analysis, describing a framework to assist their processes and helping practitioners to be confident that the knowledge and claims contained within research are transferable to their practice.
Abstract: Navigating the world of qualitative thematic analysis can be challenging. This is compounded by the fact that detailed descriptions of methods are often omitted from qualitative discussions. While qualitative research methodologies are now mature, there often remains a lack of fine detail in their description both at submitted peer reviewed article level and in textbooks. As one of research’s aims is to determine the relationship between knowledge and practice through the demonstration of rigour, more detailed descriptions of methods could prove useful. Rigour in quantitative research is often determined through detailed explanation allowing replication, but the ability to replicate is often not considered appropriate in qualitative research. However, a well described qualitative methodology could demonstrate and ensure the same effect. This article details the codebook development which contributed to thematic analysis of qualitative data. This analysis formed part of a mixed methods multiphase design research project, with both qualitative and quantitative inquiry and involving the convergence of data and analyses. This design consisted of three distinct phases: quantitative, qualitative and implementation phases. This article is aimed at researchers and doctoral students new to thematic analysis by describing a framework to assist their processes. The detailed description of the methods used supports attempts to utilise the thematic analysis process and to determine rigour to support the establishment of credibility. This process will assist practitioners to be confident that the knowledge and claims contained within research are transferable to their practice. The approach described within this article builds on, and enhances, current accepted models.

269 citations


Journal ArticleDOI
TL;DR: An overview of the most widely used spline-based techniques and their implementation in R, the Language for Statistical Computing, which has become hugely popular statistical software.
Abstract: With progress on both the theoretical and the computational fronts the use of spline modelling has become an established tool in statistical regression analysis. An important issue in spline modelling is the availability of user friendly, well documented software packages. Following the idea of the STRengthening Analytical Thinking for Observational Studies initiative to provide users with guidance documents on the application of statistical methods in observational research, the aim of this article is to provide an overview of the most widely used spline-based techniques and their implementation in R. In this work, we focus on the R Language for Statistical Computing, which has become hugely popular statistical software. We identified a set of packages that include functions for spline modelling within a regression framework. Using simulated and real data we provide an introduction to spline modelling and an overview of the most popular spline functions. We present a series of simple scenarios of univariate data, where different basis functions are used to identify the correct functional form of an independent variable. Even with simple data, routines from different packages can lead to different results. This work illustrates the challenges that an analyst faces when working with data. Most differences can be attributed to the choice of hyper-parameters rather than the basis used. In fact an experienced user will know how to obtain a reasonable outcome, regardless of the type of spline used. However, many analysts do not have sufficient knowledge to use these powerful tools adequately and will need more guidance.
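As a minimal illustration of the point about differing routines (a toy example of our own, not taken from the paper), the same simulated relationship can be fitted with two common spline implementations in R, where most of the visible differences come from hyper-parameter choices such as the degrees of freedom rather than the basis itself:

```r
library(splines)  # ns(): natural cubic spline basis for use inside lm()/glm()
library(mgcv)     # gam(): penalized regression splines

set.seed(1)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

fit_ns  <- lm(y ~ ns(x, df = 4))   # analyst fixes the flexibility via df
fit_gam <- gam(y ~ s(x))           # smoothness chosen by penalization

## Compare the fitted curves on a grid
grid <- data.frame(x = seq(0, 10, length.out = 100))
plot(x, y, col = "grey")
lines(grid$x, predict(fit_ns,  grid), col = "red")
lines(grid$x, predict(fit_gam, grid), col = "blue")
```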

206 citations


Journal ArticleDOI
TL;DR: The bespoke eMERGe Reporting Guidance, which incorporates new methodological developments and advances the methodology, can help researchers to report the important aspects of meta-ethnography and should raise reporting quality.
Abstract: The aim of this study was to provide guidance to improve the completeness and clarity of meta-ethnography reporting. Evidence-based policy and practice require robust evidence syntheses which can further understanding of people's experiences and associated social processes. Meta-ethnography is a rigorous seven-phase qualitative evidence synthesis methodology, developed by Noblit and Hare. Meta-ethnography is used widely in health research, but reporting is often poor quality and this discourages trust in and use of its findings. Meta-ethnography reporting guidance is needed to improve reporting quality. The eMERGe study used a rigorous mixed-methods design and evidence-based methods to develop the novel reporting guidance and explanatory notes. The study, conducted from 2015 to 2017, comprised: (1) a methodological systematic review of guidance for meta-ethnography conduct and reporting; (2) a review and audit of published meta-ethnographies to identify good practice principles; (3) international, multidisciplinary consensus-building processes to agree guidance content; (4) innovative development of the guidance and explanatory notes. Recommendations and good practice for all seven phases of meta-ethnography conduct and reporting were newly identified, leading to 19 reporting criteria and accompanying detailed guidance. The bespoke eMERGe Reporting Guidance, which incorporates new methodological developments and advances the methodology, can help researchers to report the important aspects of meta-ethnography. Use of the guidance should raise reporting quality. Better reporting could make assessments of confidence in the findings more robust and increase use of meta-ethnography outputs to improve practice, policy and service user outcomes in health and other fields. This is the first tailored reporting guideline for meta-ethnography.

188 citations


Journal ArticleDOI
TL;DR: Single screening of the titles and abstracts of studies retrieved in bibliographic searches is not equivalent to double screening, as substantially more studies are missed; however, in the authors' opinion such an approach could still represent an appropriate methodological shortcut in rapid reviews, as long as it is conducted by an experienced reviewer.
Abstract: Stringent requirements exist regarding the transparency of the study selection process and the reliability of results. A 2-step selection process is generally recommended; this is conducted by 2 reviewers independently of each other (conventional double-screening). However, the approach is resource intensive, which can be a problem, as systematic reviews generally need to be completed within a defined period with a limited budget. The aim of the following methodological systematic review was to analyse the evidence available on whether single screening is equivalent to double screening in the screening process conducted in systematic reviews. We searched Medline, PubMed and the Cochrane Methodology Register (last search 10/2018). We also used supplementary search techniques and sources (“similar articles” function in PubMed, conference abstracts and reference lists). We included all evaluations comparing single with double screening. Data were summarized in a structured, narrative way. The 4 evaluations included investigated a total of 23 single screenings (12 sets for screening involving 9 reviewers). The median proportion of missed studies was 5% (range 0 to 58%). The median proportion of missed studies was 3% for the 6 experienced reviewers (range: 0 to 21%) and 13% for the 3 reviewers with less experience (range: 0 to 58%). The impact of missing studies on the findings of meta-analyses had been reported in 2 evaluations for 7 single screenings including a total of 18,148 references. In 3 of these 7 single screenings – all conducted by the same reviewer (with less experience) – the findings would have changed substantially. The remaining 4 of these 7 screenings were conducted by experienced reviewers and the missing studies had no impact or a negligible impact on the findings of the meta-analyses. Single screening of the titles and abstracts of studies retrieved in bibliographic searches is not equivalent to double screening, as substantially more studies are missed. However, in our opinion such an approach could still represent an appropriate methodological shortcut in rapid reviews, as long as it is conducted by an experienced reviewer. Further research on single screening is required, for instance, regarding factors influencing the number of studies missed.

174 citations


Journal ArticleDOI
TL;DR: The development and application of a framework for sampling studies from among those eligible for inclusion in a qualitative evidence synthesis on vaccination communication, which ensured that the included studies represented a wide geographic spread, rich data and a focus that closely resembled the authors' synthesis objective.
Abstract: In a qualitative evidence synthesis, too much data due to a large number of studies can undermine our ability to perform a thorough analysis. Purposive sampling of primary studies for inclusion in the synthesis is one way of achieving a manageable amount of data. The objective of this article is to describe the development and application of a sampling framework for a qualitative evidence synthesis on vaccination communication. We developed and applied a three-step framework to sample studies from among those eligible for inclusion in our synthesis. We aimed to prioritise studies that were from a range of settings, were as relevant as possible to the review, and had rich data. We extracted information from each study about country and study setting, vaccine, data richness, and study objectives, and then applied the three-step sampling framework. We assessed 79 studies as eligible for inclusion in the synthesis and sampled 38 of these. First, we sampled all nine studies that were from low and middle-income countries. These studies contributed to the fewest findings. We then sampled an additional 24 studies that scored high for data richness. These studies contributed to a larger number of findings. Finally, we sampled an additional five studies that most closely matched our synthesis objectives. These contributed to a large number of findings. Our approach to purposive sampling helped ensure that we included studies representing a wide geographic spread, rich data and a focus that closely resembled our synthesis objective. It is possible that we may have overlooked primary studies that did not meet our sampling criteria but would have contributed to the synthesis. For example, two studies on migration and access to health services did not meet the sampling criteria but might have contributed to strengthening at least one finding. We need methods to cross-check for under-represented themes.

174 citations


Journal ArticleDOI
TL;DR: The CONSIDER statement provides a checklist for the reporting of health research involving Indigenous peoples to strengthen research praxis and advance Indigenous health outcomes.
Abstract: Research reporting guidelines are increasingly commonplace and shown to improve the quality of published health research and health outcomes. Despite severe health inequities among Indigenous Peoples and the potential for research to address the causes, there is an extended legacy of health research exploiting Indigenous Peoples. This paper describes the development of the CONSolIDated critERia for strengthening the reporting of health research involving Indigenous Peoples (CONSIDER) statement. A collaborative prioritization process was conducted based on national and international statements and guidelines about Indigenous health research from the following nations (Peoples): Australia (Aboriginal and Torres Strait Islanders), Canada (First Nations Peoples, Metis), Hawaii (Native Hawaiian), New Zealand (Māori), Taiwan (Taiwan Indigenous Tribes), United States of America (First Nations Peoples) and Northern Scandinavian countries (Sami). A review of seven research guidelines was completed, and meta-synthesis was used to construct a reporting guideline checklist for transparent and comprehensive reporting of research involving Indigenous Peoples. A list of 88 possible checklist items was generated, reconciled, and categorized. Eight research domains and 17 criteria for the reporting of research involving Indigenous Peoples were identified. The research reporting domains were: (i) governance; (ii) relationships; (iii) prioritization; (iv) methodologies; (v) participation; (vi) capacity; (vii) analysis and findings; and (viii) dissemination. The CONSIDER statement is a collaborative synthesis and prioritization of national and international research statements and guidelines. The CONSIDER statement provides a checklist for the reporting of health research involving Indigenous peoples to strengthen research praxis and advance Indigenous health outcomes.

159 citations


Journal ArticleDOI
TL;DR: The current work is the first to articulate and differentiate the methodological variations and their application for different purposes and represents a significant advance in the understanding of the methodological application of meta-ethnography.
Abstract: Decision making in health and social care requires robust syntheses of both quantitative and qualitative evidence. Meta-ethnography is a seven-phase methodology for synthesising qualitative studies. Developed in 1988 by the sociologists of education Noblit and Hare, meta-ethnography has evolved since its inception; it is now widely used in healthcare research and is gaining popularity in education research. The aim of this article is to provide up-to-date, in-depth guidance on conducting the complex analytic synthesis phases 4 to 6 of meta-ethnography through analysis of the latest methodological evidence. We report findings from a methodological systematic review conducted from 2015 to 2016. Fourteen databases and five other online resources were searched. Expansive searches were also conducted resulting in inclusion of 57 publications on meta-ethnography conduct and reporting from a range of academic disciplines published from 1988 to 2016. Current guidance on applying meta-ethnography originates from a small group of researchers using the methodology in a health context. We identified that researchers have operationalised the analysis and synthesis methods of meta-ethnography – determining how studies are related (phase 4), translating studies into one another (phase 5), synthesising translations (phase 6) and line of argument synthesis – to suit their own syntheses, resulting in variation in methods and their application. Empirical research is required to compare the impact of different methods of translation and synthesis. Some methods are potentially better at preserving links with the context and meaning of primary studies, a key principle of meta-ethnography. A meta-ethnography can and should include reciprocal and refutational translation and line of argument synthesis, rather than only one of these, to maximise the impact of its outputs. The current work is the first to articulate and differentiate the methodological variations and their application for different purposes and represents a significant advance in the understanding of the methodological application of meta-ethnography.

128 citations


Journal ArticleDOI
TL;DR: In-person study interviews were marginally superior to video calls in that interviewees said more, although on a similar range of topics; time and budget constraints may justify the use of some video call interviews within a qualitative research study.
Abstract: Within qualitative research in-person interviews have the reputation for being the highest standard of interviewer-participant encounter. However, there are other approaches to interviewing such as telephone and e-mail, which may be appropriate for a variety of reasons such as cost, time and privacy. Although there has been much discussion of the relative values of different interview methods, little research has been conducted to assess what differentiates them using quantifiable measures. None of this research has addressed the video call, which is the interview mode most like the in-person interview. This study uses quantifiable measures generated by the interview to explore the relative value of in-person and video call interview modes. Interview data gathered by a qualitative research study exploring the views of people with irritable bowel syndrome (IBS) about hypnotherapy for their condition were used. In-person and video call interviews using the same topic guide were compared on measures of length (time and word count), proportion of time the interviewer was dominant, the number of topics generated (codes) and the number of individual statements on which those topics were based. Both interview methods produced a similar number of words and a similar number of topics (codes) were discussed; however, the number of statements upon which the variety of topics was based was notably larger for the in-person interviews. These findings suggest that in-person study interviews were marginally superior to video calls in that interviewees said more, although this was on a similar range of topics. However, the difference is sufficiently modest that time and budget constraints may justify the use of some video call interviews within a qualitative research study.

122 citations


Journal ArticleDOI
TL;DR: A freely available web-based “point and click” interactive tool which allows users to input their DTA study data and conduct meta-analyses for DTA reviews, including sensitivity analyses conducted in a timely manner.
Abstract: Recommended statistical methods for meta-analysis of diagnostic test accuracy studies require relatively complex bivariate statistical models which can be a barrier for non-statisticians. A further barrier exists in the software options available for fitting such models. Software accessible to non-statisticians, such as RevMan, does not support the fitting of bivariate models, thus users must seek statistical support to use R, Stata or SAS. Recent advances in web technologies make analysis tool creation much simpler than previously. As well as accessibility, online tools can allow tailored interactivity not found in other packages, allowing multiple perspectives of data to be displayed and information to be tailored to the user’s preference from a simple interface. We set out to: (i) Develop a freely available web-based “point and click” interactive tool which allows users to input their DTA study data and conduct meta-analyses for DTA reviews, including sensitivity analyses. (ii) Illustrate the features and benefits of the interactive application using an existing DTA meta-analysis for detecting dementia. To create our online freely available interactive application we used the existing R packages lme4 and Shiny to analyse the data and to create an interactive user interface, respectively. MetaDTA, an interactive online application, was created for conducting meta-analysis of DTA studies. The user interface was designed to be easy to navigate, having different tabs for different functions. Features include the ability for users to enter their own data, customise plots, incorporate quality assessment results and quickly conduct sensitivity analyses. All plots produced can be exported as either .png or .pdf files to be included in report documents. All tables can be exported as .csv files. MetaDTA is a freely available interactive online application which performs meta-analysis of DTA studies, plots the summary ROC curve, incorporates quality assessment results and allows for sensitivity analyses to be conducted in a timely manner. Due to the rich feature-set and user-friendliness of the software it should appeal to a wide audience including those without specialist statistical knowledge. We encourage others to create similar applications for specialist analysis methods to encourage broader uptake, which in turn could improve research quality.
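For readers curious about the model behind such a tool, the standard bivariate approach for DTA meta-analysis (a logit generalized linear mixed model) can be fitted directly with lme4. The toy data and column names below are our own illustration of that standard model, not MetaDTA's internals:

```r
library(lme4)

## Two rows per study: one for the diseased group (sensitivity),
## one for the non-diseased group (specificity). Made-up numbers.
dta <- data.frame(
  study = rep(paste0("s", 1:5), each = 2),
  group = rep(c("sens", "spec"), 5),
  pos   = c(20, 40, 15, 50, 30, 60, 25, 45, 18, 70),  # correctly classified
  n     = c(25, 45, 20, 60, 35, 70, 30, 50, 22, 80)   # group sizes
)

## Bivariate random-effects model on the logit scale, no common intercept:
fit <- glmer(cbind(pos, n - pos) ~ 0 + group + (0 + group | study),
             family = binomial, data = dta)

plogis(fixef(fit))   # summary sensitivity and specificity
```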

121 citations


Journal ArticleDOI
TL;DR: BUGSnet is a new R package that can be used to conduct a Bayesian NMA and produce all of the necessary output needed to satisfy current scientific and regulatory standards.
Abstract: Several reviews have noted shortcomings regarding the quality and reporting of network meta-analyses (NMAs). We suspect that this issue may be partially attributable to limitations in current NMA software which do not readily produce all of the output needed to satisfy current guidelines. To better facilitate the conduct and reporting of NMAs, we have created an R package called “BUGSnet” (Bayesian inference Using Gibbs Sampling to conduct a Network meta-analysis). This R package relies upon Just Another Gibbs Sampler (JAGS) to conduct Bayesian NMA using a generalized linear model. BUGSnet contains a suite of functions that can be used to describe the evidence network, estimate a model and assess the model fit and convergence, assess the presence of heterogeneity and inconsistency, and output the results in a variety of formats including league tables and surface under the cumulative rank curve (SUCRA) plots. We provide a demonstration of the functions contained within BUGSnet by recreating a Bayesian NMA found in the second technical support document composed by the National Institute for Health and Care Excellence Decision Support Unit (NICE-DSU). We have also mapped these functions to checklist items within current reporting and best practice guidelines. BUGSnet is a new R package that can be used to conduct a Bayesian NMA and produce all of the necessary output needed to satisfy current scientific and regulatory standards. We hope that this software will help to improve the conduct and reporting of NMAs.

Journal ArticleDOI
TL;DR: The aim of this study was to analyse the definition of a systematic review in the health care literature and the elements of the definitions that are used, and to propose a starting point for an explicit and unambiguous SR definition.
Abstract: A standard or consensus definition of a systematic review does not exist. Therefore, if secondary studies that analyse systematic reviews do not define them, or use a definition that is too broad, inappropriate studies might be included in such evidence syntheses. The aim of this study was to analyse the definition of a systematic review (SR) in health care literature, the elements of the definitions that are used, and to propose a starting point for an explicit and unambiguous SR definition. We included overviews of systematic reviews (OSRs), meta-epidemiological studies and epidemiology textbooks. We extracted the definitions of SRs, as well as the inclusion and exclusion criteria that could indicate which definition of a SR the authors used. We extracted individual elements of SR definitions, categorised and quantified them. Among the 535 analysed sources of information, 188 (35%) provided a definition of a SR. The most commonly used reference points for the definitions of SRs were Cochrane and the PRISMA statement. We found 188 different elements of SR definitions and divided them into 14 categories. The highest number of SR definition elements was found in categories related to searching (N = 51), analysis/synthesis (N = 23), overall methods (N = 22), quality/bias/appraisal/validity (N = 22) and aim/question (N = 13). The same five categories were also the most commonly used combination of categories in the SR definitions. Currently used definitions of SRs are vague and ambiguous, often using terms such as clear, explicit and systematic, without further elaboration. In this manuscript we propose a more specific definition of a systematic review, with the ultimate aim of motivating the research community to establish a clear and unambiguous definition of this type of research.

Journal ArticleDOI
TL;DR: This methodological study identified problems in the reporting of design features and results of interrupted time series studies, highlighting the need for formal reporting guidelines and further methodological work.
Abstract: Randomised controlled trials (RCTs) are considered the gold standard when evaluating the causal effects of healthcare interventions. When RCTs cannot be used (e.g. ethically difficult), the interrupted time series (ITS) design is a possible alternative. ITS is one of the strongest quasi-experimental designs. The aim of this methodological study was to describe how ITS designs were being used, the design characteristics, and reporting in the healthcare setting. We searched MEDLINE for reports of ITS designs published in 2015 which had a minimum of two data points collected pre-intervention and one post-intervention. There was no restriction on participants, language of study, or type of outcome. Data were summarised using appropriate summary statistics. One hundred and sixteen studies were included. Interventions evaluated were mainly programs (41; 35%) and policies (32; 28%). Data were usually collected at monthly intervals (74 studies; 64%). Of the 115 studies that reported an analysis, the most common method was segmented regression (78%), 55% considered autocorrelation, and only seven reported a sample size calculation. Estimates of intervention effects were reported as change in slope (84%) and change in level (70%), and 21% reported long-term change in levels. This methodological study identified problems in the reporting of design features and results of ITS studies, and highlights the need for the development of formal reporting guidelines and further methodological work.
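For reference, the segmented-regression model that most of the reviewed studies used can be written in a few lines of R; the monthly series below is simulated purely for illustration:

```r
set.seed(7)
month      <- 1:48                        # 48 monthly data points
post       <- as.numeric(month > 24)      # intervention in place from month 25
post_month <- pmax(0, month - 24)         # months since the intervention

y <- 10 + 0.2 * month - 2 * post - 0.3 * post_month + rnorm(48)

## post: change in level; post_month: change in slope
fit <- lm(y ~ month + post + post_month)
summary(fit)$coefficients

acf(resid(fit))   # check residual autocorrelation, as 55% of studies did
```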

Journal ArticleDOI
TL;DR: Simulation studies are used to investigate the disjunctive power, marginal power and FWER obtained after applying Bonferroni, Holm, Hochberg, Dubey/Armitage-Parmar and Stepdown-minP adjustment methods.
Abstract: Multiple primary outcomes may be specified in randomised controlled trials (RCTs). When analysing multiple outcomes it is important to control the family-wise error rate (FWER). A popular approach to do this is to adjust the p-values corresponding to each statistical test used to investigate the intervention effects by using the Bonferroni correction. It is also important to consider the power of the trial to detect true intervention effects. In the context of multiple outcomes, depending on the clinical objective, the power can be defined as: ‘disjunctive power’, the probability of detecting at least one true intervention effect across all the outcomes, or ‘marginal power’, the probability of finding a true intervention effect on a nominated outcome. We provide practical recommendations on which method may be used to adjust for multiple comparisons in the sample size calculation and the analysis of RCTs with multiple primary outcomes. We also discuss the implications on the sample size for obtaining 90% disjunctive power and 90% marginal power. We use simulation studies to investigate the disjunctive power, marginal power and FWER obtained after applying Bonferroni, Holm, Hochberg, Dubey/Armitage-Parmar and Stepdown-minP adjustment methods. Different simulation scenarios were constructed by varying the number of outcomes, degree of correlation between the outcomes, intervention effect sizes and proportion of missing data. The Bonferroni and Holm methods provide the same disjunctive power. The Hochberg and Hommel methods provide power gains for the analysis, albeit small, in comparison to the Bonferroni method. The Stepdown-minP procedure performs well for complete data. However, it removes participants with missing values prior to the analysis resulting in a loss of power when there are missing data. The sample size requirement to achieve the desired disjunctive power may be smaller than that required to achieve the desired marginal power. The choice between whether to specify a disjunctive or marginal power should depend on the clinical objective.
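Several of the adjustment methods compared here are available in base R via p.adjust() (the Dubey/Armitage-Parmar and Stepdown-minP procedures are not). A minimal sketch with made-up p-values for four primary outcomes:

```r
p <- c(0.012, 0.030, 0.041, 0.20)   # hypothetical p-values, four outcomes

cbind(bonferroni = p.adjust(p, "bonferroni"),
      holm       = p.adjust(p, "holm"),
      hochberg   = p.adjust(p, "hochberg"),
      hommel     = p.adjust(p, "hommel"))
```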

Journal ArticleDOI
TL;DR: The logic models used in healthcare research from a complexity perspective are assessed, with a typology of existing logic models proposed, as well as a formal methodology for deriving more flexible and dynamic logic models.
Abstract: Logic models are commonly used in evaluations to represent the causal processes through which interventions produce outcomes, yet significant debate is currently taking place over whether they can describe complex interventions which adapt to context. This paper assesses the logic models used in healthcare research from a complexity perspective. A typology of existing logic models is proposed, as well as a formal methodology for deriving more flexible and dynamic logic models. Various logic model types were tested as part of an evaluation of a complex Patient Experience Toolkit (PET) intervention, developed and implemented through action research across six hospital wards/departments in the English NHS. Three dominant types of logic model were identified, each with certain strengths but ultimately unable to accurately capture the dynamics of PET. Hence, a fourth logic model type was developed to express how success hinges on the adaption of PET to its delivery settings. Aspects of the Promoting Action on Research Implementation in Health Services (PARIHS) model were incorporated into a traditional logic model structure to create a dynamic “type 4” logic model that can accommodate complex interventions taking on a different form in different settings. Logic models can be used to model complex interventions that adapt to context but more flexible and dynamic models are required. An implication of this is that how logic models are used in healthcare research may have to change. Using logic models to forge consensus among stakeholders and/or provide precise guidance across different settings will be inappropriate in the case of complex interventions that adapt to context. Instead, logic models for complex interventions may be targeted at facilitators to enable them to prospectively assess the settings they will be working in and to develop context-sensitive facilitation strategies. Researchers should be clear as to why they are using a logic model and experiment with different models to ensure they have the correct type.

Journal ArticleDOI
TL;DR: Suboptimal use of secondary databases in pharmacoepidemiologic studies has introduced biases in the studies, which may have led to erroneous conclusions.
Abstract: The availability of clinical and therapeutic data drawn from medical records and administrative databases has entailed new opportunities for clinical and epidemiologic research. However, these databases present inherent limitations which may render them prone to new biases. We aimed to conduct a structured review of biases specific to observational clinical studies based on secondary databases, and to propose strategies for the mitigation of those biases. Scoping review of the scientific literature published during the period 2000–2018 through an automated search of MEDLINE, EMBASE and Web of Science, supplemented with manual cross-checking of reference lists. We included opinion essays, methodological reviews, analyses or simulation studies, as well as letters to the editor or retractions, the principal objective of which was to highlight the existence of some type of bias in pharmacoepidemiologic studies using secondary databases. A total of 117 articles were included. An increasing trend in the number of publications concerning the potential limitations of secondary databases was observed over time and across medical research disciplines. Confounding was the most reported category of bias (63.2% of articles), followed by selection and measurement biases (47.0% and 46.2% respectively). Confounding by indication (32.5%), unmeasured/residual confounding (28.2%), outcome misclassification (28.2%) and “immortal time” bias (25.6%) were the subcategories most frequently mentioned. Suboptimal use of secondary databases in pharmacoepidemiologic studies has introduced biases in the studies, which may have led to erroneous conclusions. Methods to mitigate biases are available and must be considered in the design, analysis and interpretation phases of studies using these data sources.

Journal ArticleDOI
TL;DR: This investigation suggests that the low baseline participation in LIFE-Adult is associated with the typical selection of study participants with higher social status, healthier lifestyles and less disease.
Abstract: Participation in epidemiologic studies is steadily declining, which may result in selection bias. It is therefore an ongoing challenge to clarify the determinants of participation to judge possible selection effects and to derive measures to minimise that bias. We evaluated the potential for selection bias in a recent population-based cohort study with low baseline participation and investigated reasons for nonparticipation. LIFE-Adult is a cohort study in the general population of the city of Leipzig (Germany) designed to gain insights into the distribution and development of civilisation diseases. Nine thousand one hundred forty-five participants aged 40–79 years were randomly sampled in 2011–2014. We compared LIFE-Adult participants with both the Leipzig population and nonparticipants using official statistics and short questionnaire data. We applied descriptive statistics and logistic regression analysis to evaluate the determinants of study participation. Thirty-one percent of the invited persons participated in the LIFE-Adult baseline examination. Study participants were less often elderly women and more often married, highly educated, employed, and current nonsmokers compared to both the Leipzig population and nonparticipants. They further reported better health than nonparticipants. The observed differences were considerable in education and health variables. They were generally stronger in men than in women. For example, in male study participants aged 50–69, the frequency of high education was 1.5 times that of the general population, and the frequency of myocardial infarction was half that of nonparticipants. Lack of time and interest, as well as health problems were the main reasons for nonparticipation. Our investigation suggests that the low baseline participation in LIFE-Adult is associated with the typical selection of study participants with higher social status and healthier lifestyle, and additionally less disease. Notably, education and health status seem to be crucial selection factors. Consequently, frequencies of major health conditions in the general population will likely be underestimated. A differential selection related to sex might also distort effect estimates. The extent of the assessment, the interest in the research topic, and health problems of potential participants should in future be considered in LIFE-Adult and in similar studies to raise participation and to minimise selection bias.

Journal ArticleDOI
TL;DR: The aim of this study was to show the relative performance of the unstandardized and standardized estimates of the indirect effect and proportion mediated based on multiple regression, structural equation modeling, and the potential outcomes framework for mediation models with a dichotomous outcome.
Abstract: BACKGROUND: Logistic regression is often used for mediation analysis with a dichotomous outcome. However, previous studies showed that the indirect effect and proportion mediated are often affected by a change of scales in logistic regression models. To circumvent this, standardization has been proposed. The aim of this study was to show the relative performance of the unstandardized and standardized estimates of the indirect effect and proportion mediated based on multiple regression, structural equation modeling, and the potential outcomes framework for mediation models with a dichotomous outcome. METHODS: We compared the performance of the effect estimates yielded by the three methods using a simulation study and two real-life data examples from an observational cohort study (n = 360). RESULTS: Lowest bias and highest efficiency were observed for the estimates from the potential outcomes framework and for the crude indirect effect ab and the proportion mediated ab/(ab + c') based on multiple regression and SEM. CONCLUSIONS: We advise the use of either the potential outcomes framework estimates or the ab estimate of the indirect effect and the ab/(ab + c') estimate of the proportion mediated based on multiple regression and SEM when mediation analysis is based on logistic regression. Standardization of the coefficients prior to estimating the indirect effect and the proportion mediated may not increase the performance of these estimates.
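A minimal sketch of the crude product-of-coefficients estimates discussed above, on simulated data of the same size as the cohort example (our illustration, not the study's code):

```r
set.seed(3)
n <- 360                                           # as in the cohort example
x <- rnorm(n)                                      # exposure
m <- 0.5 * x + rnorm(n)                            # mediator
y <- rbinom(n, 1, plogis(-1 + 0.3 * x + 0.7 * m))  # dichotomous outcome

a     <- coef(lm(m ~ x))["x"]                      # path a: exposure -> mediator
fit_y <- glm(y ~ x + m, family = binomial)
b     <- coef(fit_y)["m"]                          # path b: mediator -> outcome
cp    <- coef(fit_y)["x"]                          # direct effect c'

ab <- a * b          # crude indirect effect
ab / (ab + cp)       # proportion mediated
```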

Journal ArticleDOI
TL;DR: A heuristic tool is shared to help investigators determine where their research questions fall in the translational research continuum and a series of structured questions about intervention efficacy, effectiveness, and implementation can help guide researchers to select research questions and appropriate study designs.
Abstract: Beginners to the discipline of implementation science often struggle to determine whether their research questions “count” as implementation science. In this paper, three implementation scientists share a heuristic tool to help investigators determine where their research questions fall in the translational research continuum. They use a “subway model” that envisions a journey to implementation research with stops along the way at efficacy and effectiveness research. A series of structured questions about intervention efficacy, effectiveness, and implementation can help guide researchers to select research questions and appropriate study designs along the spectrum of translational research.

Journal ArticleDOI
TL;DR: In this paper, the authors compared the performance of different meta-analysis methods, including the DerSimonian-Laird approach, empirically and in a simulation study based on few studies and imbalanced study sizes, considering odds ratio (OR) and risk ratio (RR) effect sizes.
Abstract: Standard random-effects meta-analysis methods perform poorly when applied to few studies only. Such settings, however, are commonly encountered in practice. It is unclear whether, or to what extent, small-sample-size behaviour can be improved by more sophisticated modeling. We consider likelihood-based methods, the DerSimonian-Laird approach, Empirical Bayes, several adjustment methods and a fully Bayesian approach. Confidence intervals are based on a normal approximation, or on adjustments based on the Student-t-distribution. In addition, a linear mixed model and two generalized linear mixed models (GLMMs) assuming binomial or Poisson distributed numbers of events per study arm are considered for pairwise binary meta-analyses. We extract an empirical data set of 40 meta-analyses from recent reviews published by the German Institute for Quality and Efficiency in Health Care (IQWiG). Methods are then compared empirically as well as in a simulation study, based on few studies, imbalanced study sizes, and considering odds-ratio (OR) and risk ratio (RR) effect sizes. Coverage probabilities and interval widths for the combined effect estimate are evaluated to compare the different approaches. Empirically, a majority of the identified meta-analyses include only 2 studies. Variation of methods or effect measures affects the estimation results. In the simulation study, coverage probability is, in the presence of heterogeneity and few studies, mostly below the nominal level for all frequentist methods based on normal approximation, in particular when sizes in meta-analyses are not balanced, but improves when confidence intervals are adjusted. Bayesian methods result in better coverage than the frequentist methods with normal approximation in all scenarios, except for some cases of very large heterogeneity where the coverage is slightly lower. Credible intervals are empirically and in the simulation study wider than unadjusted confidence intervals, but considerably narrower than adjusted ones, with some exceptions when considering RRs and small numbers of patients per trial arm. Confidence intervals based on the GLMMs are, in general, slightly narrower than those from other frequentist methods. Some methods turned out to be impractical due to frequent numerical problems. In the presence of between-study heterogeneity, especially with unbalanced study sizes, caution is needed in applying meta-analytical methods to few studies, as either coverage probabilities might be compromised, or intervals are inconclusively wide. Bayesian estimation with a sensibly chosen prior for between-trial heterogeneity may offer a promising compromise.
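The contrast between a normal-approximation interval and a Student-t adjustment can be reproduced with the metafor package (our choice of software; the paper does not prescribe it), here with three toy studies:

```r
library(metafor)

yi <- c(-0.42, -0.10, -0.65)   # log odds ratios from three small trials (toy)
vi <- c(0.08, 0.12, 0.06)      # their sampling variances

rma(yi, vi, method = "DL")                  # DerSimonian-Laird, normal CI
rma(yi, vi, method = "DL", test = "knha")   # Knapp-Hartung t-based adjustment
```

With only three studies, the adjusted interval is typically much wider, which is exactly the trade-off the abstract describes.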

Journal ArticleDOI
TL;DR: A range of methods can be used successively or combined at various steps of the evaluation approach, and a framework is proposed to situate each of the designs with respect to evaluation questions.
Abstract: Evaluation of complex interventions (CI) is challenging for health researchers and requires innovative approaches. The objective of this work is to present the main methods used to evaluate CI. A systematic review of the scientific literature was conducted to identify methods used for the evaluation of CI. We searched MEDLINE via PubMed databases for articles including an evaluation or a pilot study of a complex intervention, published in a ten-year period. Key-words of this research were (“complex intervention*” AND “evaluation”). Among 445 identified articles, 100 research results or protocols were included. Among them, 5 presented 2 different types of design in the same publication; thus our work included 105 designs. Individual randomized controlled trials (IRCTs) represented 21.9% (n = 23) of evaluation designs, adaptations of randomized clinical trials 44.8% (n = 47), quasi-experimental designs and cohort studies 19.0% (n = 20), realist evaluation 6.7% (n = 7), and case studies and other approaches 8.6% (n = 9). A process/mechanisms analysis was included in 80% (n = 84) of these designs. A range of methods can be used successively or combined at various steps of the evaluation approach. A framework is proposed to situate each of the designs with respect to evaluation questions. The growing interest of researchers in alternative methods and the development of their use must be accompanied by conceptual and methodological research in order to more clearly define their principles of use.

Journal ArticleDOI
TL;DR: Caution is warranted when undertaking regression analysis of RDS data, and even when reported degree is accurate, low reported degree can unduly influence regression estimates.
Abstract: It is unclear whether weighted or unweighted regression is preferred in the analysis of data derived from respondent driven sampling. Our objective was to evaluate the validity of various regression models, with and without weights and with various controls for clustering, in the estimation of the risk of group membership from data collected using respondent-driven sampling (RDS). Twelve networked populations, with varying levels of homophily and prevalence and based on a known distribution of a continuous predictor, were simulated, and 1000 RDS samples were drawn from each population. Weighted and unweighted binomial and Poisson generalized linear models, with and without various clustering controls and standard error adjustments, were modelled for each sample and evaluated with respect to validity, bias and coverage rate. Population prevalence was also estimated. In the regression analysis, the unweighted log-link (Poisson) models maintained the nominal type-I error rate across all populations. Bias was substantial and type-I error rates unacceptably high for weighted binomial regression. Coverage rates for the estimation of prevalence were highest using RDS-weighted logistic regression, except at low prevalence (10%), where unweighted models are recommended. Caution is warranted when undertaking regression analysis of RDS data. Even when reported degree is accurate, low reported degree can unduly influence regression estimates. Unweighted Poisson regression is therefore recommended.
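A minimal sketch of the recommended unweighted Poisson (log-link) regression for a binary outcome, with robust standard errors via the sandwich package. The data here are plainly simulated rather than RDS-generated; RDS weights, if used, would enter through glm()'s weights argument:

```r
library(sandwich)  # robust (sandwich) variance estimators
library(lmtest)    # coeftest()

set.seed(11)
n <- 500
x <- rnorm(n)                              # continuous predictor
y <- rbinom(n, 1, plogis(-1 + 0.5 * x))    # binary group membership

## Log-link (Poisson) model for a binary outcome estimates risk ratios;
## robust standard errors keep the inference valid.
fit <- glm(y ~ x, family = poisson)
coeftest(fit, vcov = vcovHC(fit, type = "HC0"))
exp(coef(fit))                             # risk-ratio scale
```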

Journal ArticleDOI
TL;DR: The comorbidity code lists may be used by future researchers to calculate CCI and EM using records from Read coded databases, and the EM is preferable to the CCI but only marginal gains should be expected from incorporating comorbidities over a period longer than 1 year.
Abstract: Comorbidity measures, such as the Charlson Comorbidity Index (CCI) and Elixhauser Method (EM), are frequently used for risk-adjustment by healthcare researchers. This study sought to create CCI and EM lists of Read codes, which are standard terminology used in some large primary care databases. It also aimed to describe and compare the predictive properties of the CCI and EM amongst patients with hip fracture (and matched controls) in a large primary care administrative dataset. Two researchers independently screened 111,929 individual Read codes to populate the 17 CCI and 31 EM comorbidity categories. Patients with hip fractures were identified (together with age- and sex-matched controls) from UK primary care practices participating in the Clinical Practice Research Datalink (CPRD). The predictive properties of both comorbidity measures were explored in hip fracture and control populations using logistic regression models fitted with 30- and 365-day mortality as the dependent variables together with tests of equality for Receiver Operating Characteristic (ROC) curves. There were 5832 CCI and 7156 EM comorbidity codes. The EM improved the ability of a logistic regression model (using age and sex as covariables) to predict 30-day mortality (AUROC 0.744 versus 0.686). The EM alone also outperformed the CCI (0.696 versus 0.601). Capturing comorbidities over a prolonged period only modestly improved the predictive value of either index: EM 1-year look-back 0.645 versus 5-year 0.676 versus complete record 0.695 and CCI 0.574 versus 0.591 versus 0.605. The comorbidity code lists may be used by future researchers to calculate CCI and EM using records from Read coded databases. The EM is preferable to the CCI but only marginal gains should be expected from incorporating comorbidities over a period longer than 1 year.
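The AUROC comparison described above can be sketched with the pROC package on simulated data (our illustration; the variable names and scores are made up, and sex is omitted for brevity):

```r
library(pROC)

set.seed(5)
n    <- 1000
age  <- rnorm(n, 70, 10)                               # age in years
em   <- rpois(n, 2)                                    # toy comorbidity count
dead <- rbinom(n, 1, plogis(-6 + 0.05 * age + 0.4 * em))

m_base <- glm(dead ~ age, family = binomial)           # age only
m_em   <- glm(dead ~ age + em, family = binomial)      # plus comorbidity score

roc_base <- roc(dead, fitted(m_base), quiet = TRUE)
roc_em   <- roc(dead, fitted(m_em),   quiet = TRUE)

auc(roc_base); auc(roc_em)
roc.test(roc_base, roc_em)    # DeLong test of equality of the two AUCs
```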

Journal ArticleDOI
TL;DR: A new construct is developed, Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM), to illustrate the feasibility of a tool based on the specific contributions of diverse research methods to understanding lifetime effects of health behaviors.
Abstract: Current methods for assessing strength of evidence prioritize the contributions of randomized controlled trials (RCTs). The objective of this study was to characterize strength of evidence (SOE) tools in recent use, identify their application to lifestyle interventions for improved longevity, vitality, or successful aging, and to assess implications of the findings. The search strategy was created in PubMed and modified as needed for four additional databases: Embase, AnthropologyPlus, PsycINFO, and Ageline, supplemented by manual searching. Systematic reviews and meta-analyses of intervention trials or observational studies relevant to lifestyle intervention were included if they used a specified SOE tool. Data were collected for each SOE tool. Conditions necessary for assigning the highest SOE grading and treatment of prospective cohort studies within each SOE rating framework were summarized. An expert panel was convened to discuss the implications of the findings for assessing evidence in the domain of lifestyle medicine. A total of 15 unique tools were identified. Ten were tools developed and used by governmental agencies or other equivalent professional bodies and were applicable in a variety of settings. Of these 10, four require consistent results from RCTs of high quality to award the highest rating of evidence. Most SOE tools include prospective cohort studies only to note their secondary contribution to overall SOE as compared to RCTs. We developed a new construct, Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM), to illustrate the feasibility of a tool based on the specific contributions of diverse research methods to understanding lifetime effects of health behaviors. Assessment of evidence relevant to lifestyle medicine requires a potential adaptation of SOE approaches when outcomes and/or exposures obviate exclusive or preferential reliance on RCTs. This systematic review was registered with the International Prospective Register of Systematic Reviews, PROSPERO [CRD42018082148].

Journal ArticleDOI
TL;DR: Pre-issuing cash incentives and sending follow-up waves could maximize the representativeness and number of people from whom to recruit, and may be an effective strategy for improving recruitment into field studies.
Abstract: Questionnaires are valuable data collection instruments in public health research, and can serve to pre-screen respondents for suitability in future studies. Survey non-response leads to reduced effective sample sizes and can decrease representativeness of the study population, so high response rates are needed to minimize the risk of bias. Here we present results on the success of different postal questionnaire strategies at effecting response, and the effectiveness of these strategies at recruiting participants for a field study on the effects of aircraft noise on sleep. In total, we mailed 17 rounds of 240 questionnaires (total n = 4080) to randomly selected households around Atlanta International Airport. Mailing rounds varied in the length of the questionnaire (11, 26 or 55 questions), survey incentive (gift card or $2 cash), number of follow-up waves (0, 2 or 3), incentive for participating in a 5-night in-home sleep study ($100, $150 or $200), and address personalization. We received completed questionnaires from 407 respondents (response rate 11.4%). Personalizing the address, enclosing a $2 cash incentive with the initial questionnaire mailing and repeated follow-up mailings were effective at increasing response rate. Despite the increased expense of these approaches per household mailed, the higher response rates meant that they were more cost-effective overall for obtaining an equivalent number of responses. Interest in participating in the field study decreased with age, but was unaffected by the mailing strategies or cash incentives for field study participation. The likelihood that a respondent would participate in the field study was unaffected by survey incentive, survey length, number of follow-up waves, field study incentive, age or sex. Pre-issuing cash incentives and sending follow-up waves could maximize the representativeness and number of people from whom to recruit, and may be an effective strategy for improving recruitment into field studies.

Journal ArticleDOI
TL;DR: Given the complex interactive influence among sample sizes, effect sizes and predictor distribution characteristics, it seems unwarranted to make generic rule-of-thumb sample size recommendations for multilevel logistic regression, aside from the fact that larger sample sizes are required when the distributions of the predictors are not symmetric or balanced.
Abstract: Despite its popularity, issues concerning the estimation of power in multilevel logistic regression models are prevalent because of the complexity involved in its calculation (i.e., computer-simulation-based approaches). These issues are further compounded by the fact that the distribution of the predictors can play a role in the power to estimate these effects. To address both matters, we present a sample of cases documenting the influence that predictor distributions have on statistical power, as well as a user-friendly, web-based application to conduct power analysis for multilevel logistic regression. Computer simulations are implemented to estimate statistical power in multilevel logistic regression with varying numbers of clusters, varying cluster sample sizes, and non-normal and non-symmetrical distributions of the Level 1/2 predictors. Power curves were simulated to see in what ways non-normal/unbalanced distributions of a binary predictor and a continuous predictor affect the detection of population effect sizes for main effects, a cross-level interaction and the variance of the random effects. Skewed continuous predictors and unbalanced binary ones require larger sample sizes at both levels than balanced binary predictors and normally-distributed continuous ones. In the most extreme case of imbalance (10% incidence) and skewness of a chi-square distribution with 1 degree of freedom, even 110 Level 2 units and 100 Level 1 units were not sufficient for all predictors to reach power of 80%, mostly hovering at around 50%, with the exception of the skewed, continuous Level 2 predictor. Given the complex interactive influence among sample sizes, effect sizes and predictor distribution characteristics, it seems unwarranted to make generic rule-of-thumb sample size recommendations for multilevel logistic regression, aside from the fact that larger sample sizes are required when the distributions of the predictors are not symmetric or balanced. The more skewed or imbalanced the predictor is, the larger the sample size requirements. To assist researchers in planning research studies, a user-friendly web application that conducts power analysis via computer simulations in the R programming language is provided. With this web application, users can conduct simulations, tailored to their study design, to estimate statistical power for multilevel logistic regression models.
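The simulation-based approach that the web application automates can be sketched directly with lme4. This toy version is our own illustration, with an imbalanced binary predictor at 10% incidence; it estimates power for a single scenario and is slow for large replicate counts:

```r
library(lme4)

## One simulated dataset and test: J clusters of size n, random intercepts,
## an imbalanced binary Level-1 predictor (10% incidence).
sim_once <- function(J = 50, n = 30, beta = 0.5) {
  id <- rep(seq_len(J), each = n)
  u  <- rnorm(J)[id]                                # cluster random effects
  x  <- rbinom(J * n, 1, 0.1)                       # imbalanced predictor
  y  <- rbinom(J * n, 1, plogis(-1 + beta * x + u))
  fit <- glmer(y ~ x + (1 | id), family = binomial)
  summary(fit)$coefficients["x", "Pr(>|z|)"] < 0.05 # effect detected?
}

## Power = share of simulated datasets in which the effect is detected
## (keep the replicate count modest for a first look; glmer is slow):
set.seed(2)
mean(replicate(100, sim_once()))
```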

Journal ArticleDOI
TL;DR: This study presents formal guidance on sample size calculations for retrospective burden of illness studies, based on the ability to comprehensively observe treatments and maximize the precision of the resulting 95% confidence intervals.
Abstract: Observational burden of illness studies are used in pharmacoepidemiology to address a variety of objectives, including contextualizing the current treatment setting, identifying important treatment gaps, and providing estimates to parameterize economic models. Methodologies such as retrospective chart review may be utilized in settings for which existing datasets are not available or do not include sufficient clinical detail. While specifying the number of charts to be extracted, and determining whether the number that can feasibly be extracted will be clinically meaningful, are important study design considerations, rigorous methods for sample size calculation in this setting are lacking. The objective of this study was to develop recommended sample size calculations for use in such studies. Calculations for identifying the optimal feasible sample size were derived, for studies characterizing treatment patterns and medical costs, based on the ability to comprehensively observe treatments and to maximize the precision of the resulting 95% confidence intervals. For cost outcomes, if the standard deviation is not known, the coefficient of variation (cv) can be used as an alternative. A case study of a chart review of advanced melanoma (MELODY) was used to characterize plausible values for cv in a real-world example. Across sample sizes, any treatment given with greater than 1% frequency has a high likelihood of being observed. For a sample of size 200 and a treatment given to 5% of the population, the precision of a 95% confidence interval (CI) is expected to be ±0.03. For cost outcomes, for the median cv value observed in the MELODY study (0.72), a sample size of approximately 200 would be required to generate a 95% CI precise to within ±10% of the mean. This study presents formal guidance on sample size calculations for retrospective burden of illness studies. The approach presented here is methodologically rigorous and designed for practical application in real-world retrospective chart review studies.
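Assuming the standard normal-approximation CI formulas (which do reproduce the abstract's figures), the two worked examples can be checked in a few lines of R:

```r
# Half-width of a 95% CI for a proportion (normal approximation):
# z * sqrt(p * (1 - p) / n).
prop_halfwidth <- function(p, n) qnorm(0.975) * sqrt(p * (1 - p) / n)

# Sample size so that the 95% CI for a mean lies within +/- rel of the
# mean, given the coefficient of variation cv: n = (z * cv / rel)^2.
n_for_relative_precision <- function(cv, rel) {
  ceiling((qnorm(0.975) * cv / rel)^2)
}

prop_halfwidth(p = 0.05, n = 200)               # ~0.030, matching +/-0.03
n_for_relative_precision(cv = 0.72, rel = 0.10) # 200, matching the abstract
```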

Journal ArticleDOI
TL;DR: Declining survey response rates may be ameliorated with the use of selected methods, and the effectiveness of each of these methods should be evaluated using randomised controlled trials to identify novel strategies for engaging populations in survey research.
Abstract: Surveys are established methods for collecting population data that are unavailable from other sources; however, response rates to surveys are declining. A number of methods have been identified to increase survey returns, yet response rates remain low. This paper evaluates the impact of five selected methods on the response rate to pilot surveys conducted prior to a large-scale National Maternity Survey in England. The pilot national maternity surveys were cross-sectional population-based questionnaire surveys of women who were three months postpartum, selected at random from birth registrations. Women received a postal questionnaire, which they could complete on paper, online or verbally over the telephone. An initial pilot survey was conducted (pilot 1, n = 1000), to which the response rate was lower than expected. A further pilot survey was therefore conducted (pilot 2, n = 2000) using additional selected methods with the specific aim of increasing the response rate. The additional methods used for all women in pilot 2 were: pre-notification, a shorter questionnaire, more personable survey materials, an additional reminder, and inclusion of quick response (QR) codes to enable faster access to the online version of the survey. To assess the impact of the selected methods, response rates to pilot surveys 1 and 2 were compared. The response rate increased significantly from 28.7% in pilot 1 to 33.1% in pilot 2 (+4.4%, 95% CI: 0.88–7.83, p = 0.02). Analysis of weekly returns according to time from the initial and reminder mail-outs suggests that this increase was largely due to the additional reminder. In both pilot surveys, most respondents completed the paper questionnaire rather than taking part online or over the telephone. However, the overall response to the online questionnaire almost doubled, from 1.8% in pilot 1 to 3.5% in pilot 2, an absolute difference of 1.7% (95% CI: 0.45–2.81, p = 0.01), suggesting that QR codes might have facilitated online participation. Declining survey response rates may be ameliorated with the use of selected methods. Further studies should evaluate the effectiveness of each of these methods using randomised controlled trials and identify novel strategies for engaging populations in survey research.
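The headline comparison is a two-sample test of proportions. In the R sketch below, the respondent counts are back-calculated from the reported percentages (28.7% of 1000 and 33.1% of 2000), so the output matches the reported difference and CI only approximately.

```r
# Two-proportion comparison of pilot 2 vs pilot 1 response rates;
# counts reconstructed from the reported percentages.
responses <- c(pilot2 = 662, pilot1 = 287)   # ~33.1% of 2000, ~28.7% of 1000
mailed    <- c(pilot2 = 2000, pilot1 = 1000)
prop.test(x = responses, n = mailed)
# Difference ~ +4.4 percentage points, 95% CI roughly 0.9 to 8.0
```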

Journal ArticleDOI
TL;DR: A structured framework for designing a dose-finding study using the CRM is presented, and practical tools to support clinicians and statisticians are provided, including software recommendations and template text and tables that can be edited and inserted into a trial protocol.
Abstract: The continual reassessment method (CRM) is a model-based design for phase I trials which aims to find the maximum tolerated dose (MTD) of a new therapy. The CRM has been shown to target the MTD more accurately than traditional rule-based approaches such as the 3 + 3 design, which is used in most phase I trials, and to assign more trial participants at or close to the MTD. However, uptake of the CRM in clinical research has been very slow, putting trial participants, drug development and patients at risk. Barriers to wider use of the CRM have been identified, most notably a lack of knowledge amongst clinicians and statisticians on how to apply new designs in practice. No recent tutorials, guidelines, or recommendations for clinicians on conducting dose-finding studies using the CRM are available, and practical resources to support clinicians considering the CRM for their trials are scarce. To help overcome these barriers, we present a structured framework for designing a dose-finding study using the CRM. We give recommendations for key design parameters and advise on pre-trial simulation work to tailor the design to a specific trial. We provide practical tools to support clinicians and statisticians, including software recommendations, and template text and tables that can be edited and inserted into a trial protocol. We also give guidance on how to conduct and report dose-finding studies using the CRM. An initial set of design recommendations is provided to kick-start the design process. To complement these and the additional resources, we describe two published dose-finding trials that used the CRM, discussing how they were designed, conducted and analysed, and comparing them to what would have happened under a 3 + 3 design. The framework and resources we provide are aimed at clinicians and statisticians new to the CRM design. Provision of key resources in this contemporary guidance paper will hopefully improve the uptake of the CRM in phase I dose-finding trials.
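As an illustration of what one CRM model update looks like in practice, here is a minimal sketch using the dfcrm package, one of several R implementations of the CRM; the choice of package, the skeleton, the target and the toy toxicity data are all assumptions for illustration, not the paper's recommendations.

```r
# One CRM update: given the toxicity outcomes observed so far,
# re-estimate the dose-toxicity curve and recommend the next dose.
library(dfcrm)

target <- 0.25                               # target toxicity probability
skeleton <- getprior(halfwidth = 0.05, target = target,
                     nu = 3, nlevel = 5)     # prior skeleton, peaked at dose 3
tox   <- c(0, 0, 0, 1, 0, 1)                 # 1 = dose-limiting toxicity
level <- c(1, 2, 3, 3, 3, 4)                 # dose level given to each patient
fit <- crm(prior = skeleton, target = target, tox = tox, level = level)
fit$mtd                                      # recommended dose for the next patient
```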

Journal ArticleDOI
TL;DR: A data quality evaluation process is proposed that emphasizes the use of multivariate outlier detection for identifying errors, and shows that univariate approaches alone are insufficient for large studies as many errors remain undetected.
Abstract: Large and complex studies are now routine, and quality assurance and quality control (QC) procedures are essential for ensuring reliable results and conclusions. Standard procedures may comprise manual verification and double entry, but these labour-intensive methods often leave errors undetected. Outlier detection takes a data-driven approach, identifying patterns exhibited by the majority of the data and highlighting data points that deviate from them. Univariate methods consider each variable independently, so observations that appear anomalous only when two or more variables are considered simultaneously remain undetected. We propose a data quality evaluation process that emphasizes the use of multivariate outlier detection for identifying errors, and show that univariate approaches alone are insufficient. Further, we establish an iterative process, combining multiple multivariate approaches, communication between teams, and visualization, that other large-scale projects can follow. We illustrate this process with preliminary neuropsychology and gait data for the vascular cognitive impairment cohort from the Ontario Neurodegenerative Disease Research Initiative, a multi-cohort observational study that aims to characterize biomarkers within and between five neurodegenerative diseases. Each dataset was evaluated four times: with and without covariate adjustment using two validated multivariate methods – the Minimum Covariance Determinant (MCD) and Candès' Robust Principal Component Analysis (RPCA) – and results were assessed in relation to two univariate methods. Outlying participants identified by multiple multivariate analyses were compiled and communicated to the data teams for verification. Of the 161 and 148 participants in the neuropsychology and gait data, 44 and 43 were flagged by one or both multivariate methods, and errors were identified for 8 and 5 participants, respectively. The MCD identified all participants with errors, while RPCA identified 6/8 and 3/5 for the neuropsychology and gait data, respectively; both outperformed the univariate approaches. Adjusting for covariates had a minor effect on which participants were identified as outliers, though it did affect error detection. Manual QC procedures are insufficient for large studies, as many errors remain undetected. In these data, the MCD outperformed RPCA for identifying errors, and both were more successful than univariate approaches. Data-driven multivariate outlier techniques are therefore essential tools for QC as data become more complex.
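As a concrete illustration of the multivariate step, here is a minimal sketch of MCD-based outlier flagging with the robustbase package in R; the toy data, the planted outlier and the chi-square cutoff are illustrative assumptions, not details from the study. The planted point is unremarkable on each variable separately but violates the correlation structure, which is exactly the kind of error univariate checks miss.

```r
# Flag multivariate outliers via robust Mahalanobis distances computed
# from the Minimum Covariance Determinant (MCD) location and scatter.
library(robustbase)
library(MASS)                                 # for mvrnorm

set.seed(1)
sigma <- matrix(c(1, 0.9, 0.9, 1), 2, 2)      # strongly correlated pair
x <- MASS::mvrnorm(200, mu = c(0, 0), Sigma = sigma)
x[1, ] <- c(2, -2)                            # within ~2 SD on each axis,
                                              # but against the correlation

fit <- covMcd(x)                              # robust centre and covariance
d2  <- mahalanobis(x, center = fit$center, cov = fit$cov)
cutoff <- qchisq(0.975, df = ncol(x))         # common chi-square threshold
which(d2 > cutoff)                            # flagged rows include row 1
```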