
Showing papers in "BMC Medical Research Methodology in 2007"


Journal ArticleDOI
TL;DR: A measurement tool for the 'assessment of multiple systematic reviews' (AMSTAR) was developed that consists of 11 items and has good face and content validity for measuring the methodological quality of systematic reviews.
Abstract: Our objective was to develop an instrument to assess the methodological quality of systematic reviews, building upon previous tools, empirical evidence and expert consensus. A 37-item assessment tool was formed by combining 1) the enhanced Overview Quality Assessment Questionnaire (OQAQ), 2) a checklist created by Sacks, and 3) three additional items recently judged to be of methodological importance. This tool was applied to 99 paper-based and 52 electronic systematic reviews. Exploratory factor analysis was used to identify underlying components. The results were considered by methodological experts using a nominal group technique aimed at item reduction and design of an assessment tool with face and content validity. The factor analysis identified 11 components. From each component, one item was selected by the nominal group. The resulting instrument was judged to have face and content validity. A measurement tool for the 'assessment of multiple systematic reviews' (AMSTAR) was developed. The tool consists of 11 items and has good face and content validity for measuring the methodological quality of systematic reviews. Additional studies are needed with a focus on the reproducibility and construct validity of AMSTAR, before strong recommendations can be made on its use.

3,583 citations


Journal ArticleDOI
TL;DR: It is found that more intensive follow-up of individuals in a placebo-controlled clinical trial of Ginkgo biloba for treating mild-moderate dementia resulted in a better outcome than minimal follow-up, as measured by their cognitive functioning.
Abstract: The 'Hawthorne Effect' may be an important factor affecting the generalisability of clinical research to routine practice, but has been little studied. Hawthorne Effects have been reported in previous clinical trials in dementia but, to our knowledge, no attempt has been made to quantify them. Our aim was to compare minimal follow-up to intensive follow-up in participants in a placebo-controlled trial of Ginkgo biloba for treating mild-moderate dementia. Participants in a dementia trial were randomised to intensive follow-up (with comprehensive assessment visits at baseline and two, four and six months post randomisation) or minimal follow-up (with an abbreviated assessment at baseline and a full assessment at six months). Our primary outcomes were cognitive functioning (ADAS-Cog) and participant- and carer-rated quality of life (QOL-AD). We recruited 176 participants, mainly through general practices. The main analysis was by intention to treat (ITT), using available data. In the ANCOVA model with baseline score as a covariate, follow-up group had a significant effect on outcome at six months on the ADAS-Cog score (n = 140; mean difference = -2.018; 95%CI -3.914, -0.121; p = 0.037, favouring the intensive follow-up group), and on participant-rated quality of life score (n = 142; mean difference = -1.382; 95%CI -2.642, -0.122; p = 0.032, favouring the minimal follow-up group). There was no significant difference in carer-rated quality of life. We found that more intensive follow-up of individuals in a placebo-controlled clinical trial of Ginkgo biloba for treating mild-moderate dementia resulted in a better outcome than minimal follow-up, as measured by their cognitive functioning. Current controlled trials: ISRCTN45577048

1,481 citations


Journal ArticleDOI
TL;DR: To include all relevant data regardless of effect measure chosen, reviewers should also include zero total event trials when calculating pooled estimates using OR and RR.
Abstract: Meta-analysis handles randomized trials with no outcome events in both treatment and control arms inconsistently, including them when risk difference (RD) is the effect measure but excluding them when relative risk (RR) or odds ratio (OR) are used. This study examined the influence of such trials on pooled treatment effects. We analysed, with and without zero total event trials, three illustrative published meta-analyses spanning a range of proportions of zero total event trials, treatment effects and heterogeneity, using inverse variance weighting and a random-effects model that incorporates between-study heterogeneity. Including zero total event trials in meta-analyses moves the pooled estimate of treatment effect closer to nil, narrows its confidence interval and decreases between-study heterogeneity. For RR and OR, inclusion of such trials causes small changes, even when they comprise the large majority of included trials. For RD, the changes are more substantial, and in extreme cases can eliminate a statistically significant effect estimate. To include all relevant data regardless of effect measure chosen, reviewers should also include zero total event trials when calculating pooled estimates using OR and RR.
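The pull toward nil under RD can be sketched numerically. The trial counts below are invented, and the 0.5 continuity correction is one common convention for zero cells (not necessarily the handling the authors used):

```python
def pooled_rd(trials):
    """Inverse-variance pooled risk difference.

    trials: list of (events_treat, n_treat, events_ctrl, n_ctrl).
    """
    num = den = 0.0
    for a, n1, c, n2 in trials:
        if a == 0 and c == 0:
            # continuity correction so the zero total event trial gets a finite weight
            a, c = 0.5, 0.5
            n1, n2 = n1 + 1, n2 + 1
        p1, p2 = a / n1, c / n2
        rd = p1 - p2
        var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
        w = 1 / var
        num += w * rd
        den += w
    return num / den

trials = [(12, 100, 20, 100), (8, 150, 15, 150)]   # hypothetical trials with events
zero_trial = (0, 200, 0, 200)                       # hypothetical zero total event trial
print(pooled_rd(trials))                 # excluding the zero total event trial
print(pooled_rd(trials + [zero_trial]))  # including it: estimate moves toward nil
```

The zero total event trial contributes an RD of exactly zero with a very small variance, so its large weight drags the pooled RD toward the null.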

406 citations


Journal ArticleDOI
TL;DR: In line with the original use of Delphi in the social sciences, Delphi is suggested to be an effective way to gain and measure group consensus in healthcare; the proposed analytical process should be followed to maximise the validity of results and strengthen the evidence for consensual decision-making.
Abstract: The criteria for stopping Delphi studies are often subjective. This study aimed to examine whether consensus and stability in the Delphi process can be ascertained by descriptive evaluation of trends in participants' views. A three-round email-based Delphi required participants (n = 12) to verify their level of agreement with 8 statements, write comments on each if they considered it necessary, and rank the statements for importance. Each statement was analysed quantitatively by the percentage of agreement ratings, importance rankings and the number of comments made for each statement, and qualitatively using thematic analysis. Importance rankings between rounds were compared by calculating Kappa values to observe trends in how the process impacts on subjects' views. Evolution of consensus was shown by an increase in agreement percentages, a narrowing range and decreasing standard deviations of importance ratings, and a decrease in the number of comments made. Stability was demonstrated by a trend of increasing Kappa values. In line with the original use of Delphi in the social sciences, Delphi is suggested to be an effective way to gain and measure group consensus in healthcare. However, the proposed analytical process should be followed to maximise the validity of results and strengthen the evidence for consensual decision-making.
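The between-round stability check can be illustrated with Cohen's kappa on categorical importance ratings. This is a simplified stand-in for the paper's ranking comparison, and the round data below are invented:

```python
from collections import Counter

def cohens_kappa(round_a, round_b):
    """Chance-corrected agreement between two Delphi rounds on the same items."""
    n = len(round_a)
    p_obs = sum(a == b for a, b in zip(round_a, round_b)) / n
    ca, cb = Counter(round_a), Counter(round_b)
    p_exp = sum(ca[k] * cb.get(k, 0) for k in ca) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

# hypothetical importance ratings for 8 statements in two successive rounds
round2 = ["high", "high", "mid", "low", "mid", "high", "low", "mid"]
round3 = ["high", "high", "mid", "low", "low", "high", "low", "mid"]
print(cohens_kappa(round2, round3))  # kappa rising across rounds signals stability
```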

368 citations


Journal ArticleDOI
TL;DR: Working examples of two methods of data synthesis (textual narrative and thematic), used in relation to one review, are presented, with the aim of enabling researchers to consider the strength of different approaches.
Abstract: The inclusion of qualitative studies in systematic reviews poses methodological challenges. This paper presents worked examples of two methods of data synthesis (textual narrative and thematic), used in relation to one review, with the aim of enabling researchers to consider the strength of different approaches. A systematic review of lay perspectives of infant size and growth was conducted, locating 19 studies (including both qualitative and quantitative). The data extracted from these were synthesised using both a textual narrative and a thematic synthesis. The processes of both methods are presented, showing a stepwise progression to the final synthesis. Both methods led us to similar conclusions about lay views toward infant size and growth. Differences between methods lie in the way they dealt with study quality and heterogeneity. On the basis of the work reported here, we consider textual narrative and thematic synthesis have strengths and weaknesses in relation to different research questions. Thematic synthesis holds most potential for hypothesis generation, but may obscure heterogeneity and quality appraisal. Textual narrative synthesis is better able to describe the scope of existing research and account for the strength of evidence, but is less good at identifying commonality.

356 citations


Journal ArticleDOI
TL;DR: Simulation studies are used to assess the effect of varying sample size at both the individual and group level on the accuracy of the estimates of the parameters and variance components of multilevel logistic regression models, and suggest that low-prevalence events require larger sample sizes.
Abstract: Background Many studies conducted in health and social sciences collect individual level data as outcome measures. Usually, such data have a hierarchical structure, with patients clustered within physicians, and physicians clustered within practices. Large survey data, including national surveys, have a hierarchical or clustered structure; respondents are naturally clustered in geographical units (e.g., health regions) and may be grouped into smaller units. Outcomes of interest in many fields not only reflect continuous measures, but also binary outcomes such as depression, presence or absence of a disease, and self-reported general health. In the framework of multilevel studies an important problem is calculating an adequate sample size that generates unbiased and accurate estimates.
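A single replicate of such a simulation study can be sketched as below. The data-generating step is the standard two-level random-intercept logistic model; the parameter values are arbitrary, and a full study would fit the model to many replicates (e.g. with a mixed-model package) to assess bias and accuracy:

```python
import math
import random

def simulate_two_level_binary(n_groups, n_per_group, beta0, sigma_u, seed=42):
    """One simulated dataset: logit(p_ij) = beta0 + u_j, u_j ~ N(0, sigma_u^2)."""
    rng = random.Random(seed)
    data = []
    for group in range(n_groups):
        u = rng.gauss(0.0, sigma_u)                   # group-level random intercept
        p = 1.0 / (1.0 + math.exp(-(beta0 + u)))      # within-group outcome probability
        for _ in range(n_per_group):
            data.append((group, 1 if rng.random() < p else 0))
    return data

# a low-prevalence scenario: beta0 = -3 gives roughly 5% positive outcomes
data = simulate_two_level_binary(n_groups=100, n_per_group=20, beta0=-3.0, sigma_u=0.5)
prevalence = sum(y for _, y in data) / len(data)
print(prevalence)
```

Varying `n_groups` and `n_per_group` across replicates is how such simulations probe the sample sizes needed at each level, with low-prevalence settings demanding more data.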

355 citations


Journal ArticleDOI
TL;DR: A scientifically updated and regionally adapted multilingual Health Risk Appraisal for Older Persons (HRA-O) instrument consisting of a self-administered questionnaire and software-generated feedback reports was developed and highly accepted by a broad range of community-dwelling non-disabled persons.
Abstract: Health risk appraisal is a promising method for health promotion and prevention in older persons. The Health Risk Appraisal for the Elderly (HRA-E) developed in the U.S. has unique features but has not been tested outside the United States. Based on the original HRA-E, we developed a scientifically updated and regionally adapted multilingual Health Risk Appraisal for Older Persons (HRA-O) instrument consisting of a self-administered questionnaire and software-generated feedback reports. We evaluated the practicability and performance of the questionnaire in non-disabled community-dwelling older persons in London (U.K.) (N = 1090), Hamburg (Germany) (N = 804), and Solothurn (Switzerland) (N = 748) in a sub-sample of an international randomised controlled study. Over eighty percent of invited older persons returned the self-administered HRA-O questionnaire. Fair or poor self-perceived health status and older age were correlated with higher rates of non-return of the questionnaire. Older participants and those with lower educational levels reported more difficulty in completing the HRA-O questionnaire than younger and more highly educated persons. However, even among older participants and those with a low educational level, more than 80% rated the questionnaire as easy to complete. Prevalence rates of risks for functional decline or problems were between 2% and 91% for the 19 HRA-O domains. Participants' intention to change health behaviour suggested that for some risk factors participants were in a pre-contemplation phase, having no short- or medium-term plans for change. Many participants perceived their health behaviour or preventative care uptake as optimal, despite indications of deficits according to the HRA-O based evaluation. The HRA-O questionnaire was highly accepted by a broad range of community-dwelling non-disabled persons.
It identified a high number of risks and problems, and provided information on participants' intention to change health behaviour.

317 citations


Journal ArticleDOI
TL;DR: Using the proposed strategy will generate evidence relevant to clinical practice, while acknowledging the absence of regulatory and financial gatekeepers for CAM, and emphasize the important but subtle differences between CAM and conventional medical practice.
Abstract: To explore the strengths and weaknesses of conventional biomedical research strategies and methods as applied to complementary and alternative medicine (CAM), and to suggest a new research framework for assessing these treatment modalities. There appears to be a gap between published studies showing little or no efficacy of CAM, and reports of substantial clinical benefit from patients and CAM practitioners. This "gap" might be partially due to the current focus on placebo-controlled randomized trials, which are appropriately designed to answer questions about the efficacy and safety of pharmaceutical agents. In an attempt to fit this assessment strategy, complex CAM treatment approaches have been dissected into standardized and often simplified treatment methods, and outcomes have been limited. Unlike conventional medicine, CAM has no regulatory or financial gatekeeper controlling its therapeutic "agents" before they are marketed. Treatments may thus be in widespread use before researchers know of their existence. In addition, the treatments are often provided as an integrated 'whole system' of care, without careful consideration of safety issues. We propose a five-phase strategy for assessing CAM built on the acknowledgement of the inherent, unique aspects of CAM treatments and their regulatory status in most Western countries. Using the proposed strategy will generate evidence relevant to clinical practice, while acknowledging the absence of regulatory and financial gatekeepers for CAM. It will also emphasize the important but subtle differences between CAM and conventional medical practice.

284 citations


Journal ArticleDOI
TL;DR: This discussion paper was developed by consensus among experienced reviewers, members of the Adverse Effects Subgroup of The Cochrane Collaboration, and supplemented by a consultation of content experts in reviews methodology, as well as those working in drug safety.
Abstract: As every healthcare intervention carries some risk of harm, clinical decision making needs to be supported by a systematic assessment of the balance of benefit to harm. A systematic review that considers only the favourable outcomes of an intervention, without also assessing the adverse effects, can mislead by introducing a bias favouring the intervention. Much of the current guidance on systematic reviews is directed towards the evaluation of effectiveness; but this differs in important ways from the methods used in assessing the safety and tolerability of an intervention. A detailed discussion of why, how and when to include adverse effects in a systematic review is required. This discussion paper, which presupposes a basic knowledge of systematic review methodology, was developed by consensus among experienced reviewers, members of the Adverse Effects Subgroup of The Cochrane Collaboration, and supplemented by consultation with content experts in review methodology, as well as those working in drug safety. A logical framework for making decisions in reviews that incorporate adverse effects is provided. We explore situations where a comprehensive investigation of adverse effects is warranted and suggest strategies to identify practicable and clinically useful outcomes. The advantages and disadvantages of including observational and experimental study designs are reviewed. The consequences of including separate studies for intended and unintended effects are explained. Detailed advice is given on designing electronic searches for studies with adverse effects data. Reviewers of adverse effects are given general guidance on the assessment of study bias, data collection, analysis, presentation and the interpretation of harms in a systematic review. Readers need to be able to recognize how strategic choices made in the review process determine what harms are found, and how the findings may affect clinical decisions.
Researchers undertaking a systematic review that incorporates adverse effect data should understand the rationale for the suggested methods and be able to implement them in their review.

230 citations


Journal ArticleDOI
TL;DR: Maximum likelihood estimation is assessed for a general normal model and a generalised model for bivariate random-effects meta-analysis (BRMA); the paper highlights the benefits of BRMA in both a normal and a generalised modelling framework, and examines the estimation of between-study correlation to aid practitioners.
Abstract: When multiple endpoints are of interest in evidence synthesis, a multivariate meta-analysis can jointly synthesise those endpoints and utilise their correlation. A multivariate random-effects meta-analysis must incorporate and estimate the between-study correlation (ρB). In this paper we assess maximum likelihood estimation of a general normal model and a generalised model for bivariate random-effects meta-analysis (BRMA). We consider two applied examples, one involving a diagnostic marker and the other a surrogate outcome. These motivate a simulation study where estimation properties from BRMA are compared with those from two separate univariate random-effects meta-analyses (URMAs), the traditional approach. The normal BRMA model estimates ρB as -1 in both applied examples. Analytically we show this is due to the maximum likelihood estimator sensibly truncating the between-study covariance matrix on the boundary of its parameter space. Our simulations reveal this commonly occurs when the number of studies is small or the within-study variation is relatively large; it also causes upwardly biased between-study variance estimates, which are inflated to compensate for the restriction on ρB. Importantly, this does not induce any systematic bias in the pooled estimates and produces conservative standard errors and mean-square errors. Furthermore, the normal BRMA is preferable to two normal URMAs; the mean-square errors and standard errors of pooled estimates are generally smaller in the BRMA, especially given data missing at random. For meta-analysis of proportions we then show that a generalised BRMA model is better still. This correctly uses a binomial rather than normal distribution, and produces better estimates than the normal BRMA and also two generalised URMAs; however the model may sometimes not converge due to difficulties estimating ρB.
A BRMA model offers numerous advantages over separate univariate syntheses; this paper highlights some of these benefits in both a normal and generalised modelling framework, and examines the estimation of between-study correlation to aid practitioners.
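The behaviour of the ρB estimate can be illustrated with a toy profile likelihood for the normal BRMA. For simplicity the between-study standard deviations are held fixed and within-study correlations are set to zero, which simplifies the full ML estimation in the paper; all numbers are invented:

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def profile_loglik(ys, s2s, tau1, tau2, rho):
    """Log-likelihood at rho, with the pooled effects profiled out by GLS."""
    sig = [[tau1 ** 2, rho * tau1 * tau2],
           [rho * tau1 * tau2, tau2 ** 2]]
    parts, a, b = [], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
    for (y1, y2), (s1, s2) in zip(ys, s2s):
        # marginal covariance: within-study (diagonal) + between-study
        v = [[s1 + sig[0][0], sig[0][1]], [sig[1][0], s2 + sig[1][1]]]
        vi = inv2(v)
        parts.append((vi, v, (y1, y2)))
        for r in range(2):
            for c in range(2):
                a[r][c] += vi[r][c]
            b[r] += vi[r][0] * y1 + vi[r][1] * y2
    ainv = inv2(a)
    mu = [ainv[0][0] * b[0] + ainv[0][1] * b[1],
          ainv[1][0] * b[0] + ainv[1][1] * b[1]]
    ll = 0.0
    for vi, v, (y1, y2) in parts:
        det = v[0][0] * v[1][1] - v[0][1] * v[1][0]
        r1, r2 = y1 - mu[0], y2 - mu[1]
        quad = vi[0][0] * r1 * r1 + 2 * vi[0][1] * r1 * r2 + vi[1][1] * r2 * r2
        ll += -0.5 * (math.log(det) + quad)
    return ll

# toy data: four studies with two negatively correlated endpoints
ys = [(0.5, -0.3), (0.1, 0.2), (0.8, -0.6), (0.3, 0.0)]
s2s = [(0.04, 0.04)] * 4
grid = [i / 100 for i in range(-99, 100)]
rho_hat = max(grid, key=lambda r: profile_loglik(ys, s2s, 0.3, 0.3, r))
print(rho_hat)  # grid maximiser of the profile log-likelihood
```

With few studies and a strongly negative observed correlation between endpoints, the maximiser is driven toward the lower edge of the grid, mirroring the boundary truncation the paper describes.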

215 citations


Journal ArticleDOI
TL;DR: The most suitable program for a meta-analysis will depend on the user's needs and preferences and this report provides an overview that should be helpful in making a substantiated choice.
Abstract: Background Our objective was to systematically assess the differences in features, results, and usability of currently available meta-analysis programs.

Journal ArticleDOI
TL;DR: The new procedure of combining MI with bootstrapping for variable selection results in multivariable prognostic models with good performance and is therefore attractive to apply to data sets with missing values.
Abstract: Background. Missing data is a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty and allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable selection. Method. In our prospective cohort study we merged data from three different randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Among the outcome and prognostic variables, the proportion of missing data ranged from 0% to 48.1%. We used four methods to investigate the influence of sampling and imputation variation, respectively: MI only, bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the variable appeared in the model. The discriminative and calibrative abilities of prognostic models developed by the four methods were assessed at different inclusion levels. Results. We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined over the range of 0% (full model) to 90% of variable selection, bootstrap-corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86 were found. Conclusion. We recommend accounting for both imputation and sampling variation in data sets with missing values. The new procedure of combining MI with bootstrapping for variable selection results in multivariable prognostic models with good performance and is therefore attractive to apply to data sets with missing values.
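The bootstrap inclusion-frequency idea can be sketched as below. Single mean imputation stands in for proper multiple imputation, and a univariable correlation screen stands in for the paper's model-based selection; the data and thresholds are invented:

```python
import math
import random
import statistics

def mean_impute(col):
    # single mean imputation as a crude stand-in for multiple imputation
    m = statistics.mean(v for v in col if v is not None)
    return [m if v is None else v for v in col]

def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

def inclusion_frequencies(rows, outcome, n_boot=200, thresh=0.3, seed=7):
    """Proportion of bootstrap resamples in which each candidate variable is selected."""
    rng = random.Random(seed)
    p = len(rows[0])
    counts = [0] * p
    for _ in range(n_boot):
        boot = [rng.choice(rows) for _ in rows]                       # sampling variation
        cols = [mean_impute([r[j] for r in boot]) for j in range(p)]  # imputation step
        y = cols[outcome]
        for j in range(p):
            if j != outcome and abs(pearson(cols[j], y)) >= thresh:
                counts[j] += 1                                        # variable "selected"
    return [c / n_boot for c in counts]

# hypothetical data: column 0 is prognostic, column 1 is noise, column 2 is the outcome
rng = random.Random(1)
rows = []
for i in range(40):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    y = 2 * x1 + rng.gauss(0, 1)
    rows.append((None if i % 8 == 0 else x1, x2, y))  # ~12% missing in x1
freqs = inclusion_frequencies(rows, outcome=2)
print(freqs)
```

Variables with high inclusion frequency across resamples are retained, which is the selection rule the abstract describes, applied here in miniature.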

Journal ArticleDOI
TL;DR: The mode of data collection affects the reporting of self-assessed health items substantially, and in epidemiological studies the method effect may be as large as the effects under investigation.
Abstract: Data for health surveys are often collected using either mailed questionnaires, telephone interviews or a combination. The mode of data collection can affect the propensity to refuse to respond and result in different patterns of responses. The objective of this paper is to examine and quantify the effects of mode of data collection in health surveys. A stratified sample of 4,000 adults residing in Denmark was randomised to mailed questionnaires or computer-assisted telephone interviews. 45 health-related items were analyzed: four concerning behaviour and 41 concerning self-assessment. Odds ratios for more positive answers and for more frequent use of extreme response categories (both positive and negative) among telephone respondents compared to questionnaire respondents were estimated. Tests were Bonferroni corrected. For the four health behaviour items there were no significant differences in the response patterns. For 32 of the 41 health self-assessment items the response pattern was statistically significantly different, and extreme response categories were used more frequently among telephone respondents (median estimated odds ratio: 1.67). For a majority of these mode-sensitive items (26/32), more positive reporting was observed among telephone respondents (median estimated odds ratio: 1.73). The overall response rate was similar among persons randomly assigned to questionnaires (58.1%) and to telephone interviews (56.2%). A differential nonresponse bias for age and gender was observed. The rate of missing responses was higher for questionnaires (0.73 – 6.00%) than for telephone interviews (0 – 0.51%). The "don't know" option was used more often by mail respondents (10 – 24%) than by telephone respondents (2 – 4%). The mode of data collection affects the reporting of self-assessed health items substantially. In epidemiological studies, the method effect may be as large as the effects under investigation.
Caution is needed when comparing prevalences across surveys or when studying time trends.
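Odds-ratio comparisons of this kind can be computed for any single item from a 2 × 2 table of extreme vs non-extreme responses by mode. The counts below are invented; a Bonferroni correction over many items would replace the 1.96 critical value with a larger one:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with Wald confidence interval from a 2x2 table.

    a, b: extreme / non-extreme responses among telephone respondents
    c, d: the same among mail respondents
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log odds ratio
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# hypothetical counts for one self-assessment item
print(odds_ratio_ci(100, 300, 60, 340))
```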

Journal ArticleDOI
TL;DR: A multi-faceted approach to maximising participation of GPs and their patients in intervention studies is described, using an Australian randomised controlled trial of a depression/suicidality management intervention as a case study.
Abstract: Recruiting and retaining GPs for research can prove difficult, and may result in sub-optimal patient participation where GPs are required to recruit patients. Low participation rates may affect the validity of research. This paper describes a multi-faceted approach to maximising participation of GPs and their patients in intervention studies, using an Australian randomised controlled trial of a depression/suicidality management intervention as a case study. The paper aims to outline experiences that may be of interest to others considering engaging GPs and/or their patients in primary care studies. A case study approach is used to describe strategies for: (a) recruiting GPs; (b) encouraging GPs to recruit patients to complete a postal questionnaire; and (c) encouraging GPs to recruit patients as part of a practice audit. Participant retention strategies are discussed in light of reasons for withdrawal. The strategies described led to the recruitment of a higher than expected number of GPs (n = 772). Three hundred and eighty-three GPs (49.6%) followed through with the intent to participate by sending out a total of 77,820 postal questionnaires, 22,251 (28.6%) of which were returned. Three hundred and three GPs (37.0%) participated in the practice audit, which aimed to recruit 20 patients per participating GP (i.e., a total of 6,060 older adults). In total, 5,143 patients (84.9%) were represented in the audit. Inexpensive methods were chosen to identify and recruit GPs; these relied on an existing database, minor promotion and a letter of invitation. Anecdotally, participating GPs agreed to be involved because they had an interest in the topic, believed the study would not impinge too greatly on their time, and appreciated the professional recognition afforded by the Continuing Professional Development (CPD) points associated with study participation.
The study team established a strong rapport with GPs and their reception staff, offered clear instructions, and were as flexible and helpful as possible to retain GP participants. Nonetheless, we experienced attrition due to GPs' competing demands, eligibility, personnel issues and the perceived impact of the study on patients. A summary of effective and ineffective methods for recruitment and retention is provided.

Journal ArticleDOI
TL;DR: Examination of the relation between body mass index (BMI), as a proxy for obesity, and depression using the Canadian Community Health Survey, Cycle 1.2 demonstrated that SEM is a feasible technique for modeling the relationship between obesity and depression.
Abstract: Obesity and depression are two major diseases which are associated with many other health problems, such as hypertension, dyslipidemia, diabetes mellitus, coronary heart disease, stroke, myocardial infarction, heart failure in patients with systolic hypertension, low bone mineral density and increased mortality. Both diseases share common health complications, but there are inconsistent findings concerning the relationship between obesity and depression. In this work we used the structural equation modeling (SEM) technique to examine the relation between body mass index (BMI), as a proxy for obesity, and depression using the Canadian Community Health Survey, Cycle 1.2. In this SEM model we postulate that 1) BMI and depression are directly related, 2) BMI is directly affected by physical activity, and 3) depression is directly influenced by stress. SEM was also used to assess the relation between BMI and depression separately for males and females. The results indicate that higher BMI is associated with a more severe form of depression. On the other hand, a more severe form of depression may result in less weight gain. However, the association between depression and BMI is gender dependent. In males, higher BMI may result in a more severe form of depression, while in females the relation may not be the same. Also, there was a negative relationship between physical activity and BMI. In general, use of the SEM method showed that the two major diseases, obesity and depression, are associated, but the form of the relation differs between males and females. More research is necessary to further understand the complexity of the relationship between obesity and depression. This work also demonstrated that SEM is a feasible technique for modeling the relation between obesity and depression.

Journal ArticleDOI
TL;DR: A novel experimental method is outlined that permits disaggregation of maternally provided inherited genetic and post-implantation prenatal effects and a study based on children born by IVF treatment and who differ in genetic relatedness to the woman undergoing the pregnancy is feasible.
Abstract: There is much evidence to suggest that risk for common clinical disorders begins in foetal life. Exposure to environmental risk factors, however, is often not random. Many commonly used indices of prenatal adversity (e.g. maternal gestational stress, gestational diabetes, smoking in pregnancy) are influenced by maternal genes and genetically influenced maternal behaviour. As the mother provides the baby with both genes and prenatal environment, associations between prenatal risk factors and offspring disease may be attributable to true prenatal risk effects or to the "confounding" effects of genetic liability that are shared by mother and offspring. Cross-fostering designs, including those that involve embryo transfer, have proved useful in animal studies. However, disentangling these effects in humans poses significant problems for traditional genetic epidemiological research designs. We present a novel research strategy aimed at disentangling maternally provided prenatal environmental and inherited genetic effects. Families of children aged 5 to 9 years born by assisted reproductive technologies, specifically homologous IVF, sperm donation, egg donation, embryo donation and gestational surrogacy, were contacted through fertility clinics and mailed a package of questionnaires on health and mental health related risk factors and outcomes. Further data were obtained from antenatal records. To date 741 families from 18 fertility clinics have participated. The degree of association between a maternally provided prenatal risk factor and child outcome in the group of families where the woman undergoing the pregnancy and the offspring are genetically related (homologous IVF, sperm donation) is compared to the association in the group where the offspring are genetically unrelated to the woman who undergoes the pregnancy (egg donation, embryo donation, surrogacy).
These comparisons can then be examined to infer the extent to which prenatal effects are genetically and environmentally mediated. A study based on children born by IVF treatment who differ in genetic relatedness to the woman undergoing the pregnancy is feasible. The present report outlines a novel experimental method that permits disaggregation of maternally provided inherited genetic and post-implantation prenatal effects.

Journal ArticleDOI
TL;DR: Several methodological challenges in designing and conducting a pragmatic primary care based randomised controlled trial are discussed, based on the experiences in the DIAMOND-study, together with the rationale behind the choices made.
Abstract: Pragmatic randomised controlled trials are often used in primary care to evaluate the effect of a treatment strategy. In these trials it is difficult to achieve both high internal validity and high generalisability. This article discusses several methodological challenges in designing and conducting a pragmatic primary care based randomised controlled trial, based on our experiences in the DIAMOND-study, and the rationale behind the choices we made. Future pragmatic trials may benefit from our successes as well as the problems we experienced. The first challenge was choosing the clinically most relevant interventions to compare while enabling blinded comparison, since the two interventions had very different appearances. By adding treatment steps to one treatment arm and adding placebo to both treatment arms, both internal and external validity were optimized. Nevertheless, although blinding is essential for high internal validity, it should be warily considered in a pragmatic trial because it decreases external validity. Choosing and recruiting a representative selection of participants was the second challenge. We succeeded in retrieving a representative, relatively large patient sample by carefully choosing (few) inclusion and exclusion criteria, by random selection, by paying much attention to participant recruitment, and by taking participants' reasons to participate into account. Good and regular contact with the GPs and patients was, in our opinion, essential. The third challenge was to choose the primary outcome, which needed to reflect effectiveness of the treatment in everyday practice. We also designed our protocol to follow everyday practice as much as possible, although standardized treatment is usually preferred in trials. This served our fourth challenge: to limit the number of protocol deviations and increase external validity. It is challenging to design and conduct a pragmatic trial.
Thanks to thorough preparation, we were able to collect highly valid data. In our opinion, a critical deliberation beforehand of where on the pragmatic–explanatory spectrum you want your trial to be, combined with consulting publications, especially on patient recruitment procedures, was helpful in conducting a successful trial.

Journal ArticleDOI
TL;DR: Standard Poisson models provide a poor fit for alcohol consumption data from a motivating example, and did not preserve Type-I error rates for the randomized group comparison when the true distribution was over-dispersed Poisson.
Abstract: Alcohol consumption is commonly used as a primary outcome in randomized alcohol treatment studies. The distribution of alcohol consumption is highly skewed, particularly in subjects with alcohol dependence. In this paper, we will consider the use of count models for outcomes in a randomized clinical trial setting. These include the Poisson, over-dispersed Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial. We compare the Type-I error rate of these methods in a series of simulation studies of a randomized clinical trial, and apply the methods to the ASAP (Addressing the Spectrum of Alcohol Problems) trial. Standard Poisson models provide a poor fit for alcohol consumption data from our motivating example, and did not preserve Type-I error rates for the randomized group comparison when the true distribution was over-dispersed Poisson. For the ASAP trial, where the distribution of alcohol consumption featured extensive over-dispersion, there was little indication of significant randomization group differences, except when the standard Poisson model was fit. As with any analysis, it is important to choose appropriate statistical models. In simulation studies and in the motivating example, the standard Poisson was not robust when fit to over-dispersed count data, and did not maintain the appropriate Type-I error rate. To appropriately model alcohol consumption, more flexible count models should be routinely employed.
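The core problem the abstract describes, that the variance of over-dispersed counts exceeds what a standard Poisson model assumes, can be illustrated with a small simulation. This is a minimal sketch, not the ASAP analysis: the mean rate and dispersion value are made up, and the over-dispersed counts are generated as a gamma mixture of Poissons (equivalently, a negative binomial).

```python
import math
import random
import statistics

def poisson_sample(lam, rng):
    # Knuth's multiplication algorithm; adequate for small rates.
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def gamma_poisson_counts(n, mean, dispersion, rng):
    # Negative binomial as a gamma mixture of Poissons:
    # lambda_i ~ Gamma(shape=1/dispersion, scale=mean*dispersion),
    # so E[Y] = mean and Var[Y] = mean + dispersion * mean**2.
    shape = 1.0 / dispersion
    scale = mean * dispersion
    return [poisson_sample(rng.gammavariate(shape, scale), rng)
            for _ in range(n)]

rng = random.Random(42)
y = gamma_poisson_counts(5000, mean=4.0, dispersion=0.5, rng=rng)
m = statistics.mean(y)
v = statistics.variance(y)
# A standard Poisson model assumes Var = mean; here Var is roughly
# mean + 0.5 * mean**2, so model-based standard errors would be far
# too small and Type-I error inflated, as in the simulation studies.
```

Here the sample variance comes out around three times the sample mean, which is exactly the mismatch that makes the naive Poisson model anti-conservative.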

Journal ArticleDOI
TL;DR: This article proposes that caution should be exercised in the interpretation of journal impact factors and their ranks, and specifically that a measure of uncertainty should be routinely presented alongside the point estimate.
Abstract: Background Journal impact factors and their ranks are used widely by journals, researchers, and research assessment exercises.

Journal ArticleDOI
TL;DR: The PRO-AGE (PRevention in Older people - Assessment in GEneralists' practices) project as mentioned in this paper was the first large-scale randomised controlled trial of health risk appraisal for older people in Europe.
Abstract: This paper describes the study protocol, the recruitment, and base-line data for evaluating the success of randomisation of the PRO-AGE (PRevention in Older people – Assessment in GEneralists' practices) project. A group of general practitioners (GPs) in London (U.K.), Hamburg (Germany) and Solothurn (Switzerland) were trained in risk identification, health promotion, and prevention in older people. Their non-disabled older patients were invited to participate in a randomised controlled study. Participants allocated to the intervention group were offered the Health Risk Appraisal for Older Persons (HRA-O) instrument with a site-specific method for reinforcement (London: physician reminders in electronic medical record; Hamburg: one group session or two preventive home visits; Solothurn: six-monthly preventive home visits over a two-year period). Participants allocated to the control group received usual care. At each site, an additional group of GPs did not receive the training, and their eligible patients were invited to participate in a concurrent comparison group. Primary outcomes are self-reported health behaviour and preventative care use at one-year follow-up. In Solothurn, an additional follow-up was conducted at two years. The number of older persons agreeing to participate (% of eligible persons) in the randomised controlled study was 2503 (66.0%) in London, 2580 (53.6%) in Hamburg, and 2284 (67.5%) in Solothurn. Base-line findings confirm that randomisation of participants was successful, with comparable characteristics between intervention and control groups. The number of persons (% of eligible) enrolled in the concurrent comparison group was 636 (48.8%) in London, 746 (35.7%) in Hamburg, and 1171 (63.0%) in Solothurn. PRO-AGE is the first large-scale randomised controlled trial of health risk appraisal for older people in Europe. Its results will inform about the effects of implementing HRA-O with different methods of reinforcement.

Journal ArticleDOI
TL;DR: Bias due to nonresponse appears to be small in this study, and increasing the response rates had little effect on the results, but the results suggest that bias due to reporting errors could be greater than bias caused by nonresponse.
Abstract: Low response and reporting errors are major concerns for survey epidemiologists. However, while nonresponse is commonly investigated, the effects of misclassification are often ignored, possibly because they are hard to quantify. We investigate both sources of bias in a recent study of the effects of deployment to the 2003 Iraq war on the health of UK military personnel, and attempt to determine whether improving response rates by multiple mailouts was associated with increased misclassification error and hence increased bias in the results. Data for 17,162 UK military personnel were used to determine factors related to response and inverse probability weights were used to assess nonresponse bias. The percentages of inconsistent and missing answers to health questions from the 10,234 responders were used as measures of misclassification in a simulation of the 'true' relative risks that would have been observed if misclassification had not been present. Simulated and observed relative risks of multiple physical symptoms and post-traumatic stress disorder (PTSD) were compared across response waves (number of contact attempts). Age, rank, gender, ethnic group, enlistment type (regular/reservist) and contact address (military or civilian), but not fitness, were significantly related to response. Weighting for nonresponse had little effect on the relative risks. Of the respondents, 88% had responded by wave 2. Missing answers (total 3%) increased significantly (p < 0.001) between waves 1 and 4 from 2.4% to 7.3%, and the percentage with discrepant answers (total 14%) increased from 12.8% to 16.3% (p = 0.007). However, the adjusted relative risks decreased only slightly from 1.24 to 1.22 for multiple physical symptoms and from 1.12 to 1.09 for PTSD, and showed a similar pattern to those simulated. Bias due to nonresponse appears to be small in this study, and increasing the response rates had little effect on the results. 
Although misclassification is difficult to assess, the results suggest that bias due to reporting errors could be greater than bias caused by nonresponse. Resources might be better spent on improving and validating the data, rather than on increasing the response rate.
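Inverse probability weighting of the kind described, in which each responder is weighted by the inverse of their estimated probability of responding, can be sketched as follows. The strata and numbers are hypothetical, and the study estimated response probabilities from a regression model on several covariates rather than from simple stratum response rates as done here.

```python
from collections import defaultdict

def ipw_mean(records):
    """Nonresponse-weighted outcome mean.

    records: list of (stratum, responded, outcome) tuples, where
    outcome is None for nonresponders. Each responder is weighted by
    the inverse of the observed response rate in their stratum.
    """
    invited = defaultdict(int)
    responded = defaultdict(int)
    for stratum, resp, _ in records:
        invited[stratum] += 1
        if resp:
            responded[stratum] += 1
    total_w = total_wy = 0.0
    for stratum, resp, y in records:
        if resp:
            w = invited[stratum] / responded[stratum]  # 1 / response rate
            total_w += w
            total_wy += w * y
    return total_wy / total_w

# Hypothetical data: officers all respond (outcome 0), but only half of
# the junior ranks respond (outcome 1), so the unweighted responder
# mean (0.5) under-represents the junior stratum.
records = [
    ("officer", True, 0), ("officer", True, 0),
    ("junior", True, 1), ("junior", True, 1),
    ("junior", False, None), ("junior", False, None),
]
```

Weighting the two junior responders by 2 recovers a mean of 4/6 ≈ 0.67, the value expected if the outcome is unrelated to response within strata.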

Journal ArticleDOI
TL;DR: A CUmulative SUM (CUSUM) based surveillance system was developed for continuous monitoring of clinical outcomes using routinely collected data; it detected periods of increased rates of low Apgar scores in both the nulliparous and multiparous cohorts.
Abstract: Background The lack of robust systems for monitoring quality in healthcare has been highlighted. Statistical process control (SPC) methods, utilizing the increasingly available routinely collected electronic patient records, could be used to create surveillance systems that could lead to rapid detection of periods of deteriorating standards. We aimed to develop and test a CUmulative SUM (CUSUM) based surveillance system that could be used in continuous monitoring of clinical outcomes, using routinely collected data. The low Apgar score (5-minute Apgar score < 7) was used as an example outcome.
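A minimal tabular Bernoulli CUSUM of the kind the abstract describes can be sketched as follows. The in-control and out-of-control rates, the threshold, and the outcome sequence are illustrative, and the published system also involved risk adjustment and threshold calibration not shown here.

```python
import math

def bernoulli_cusum(outcomes, p0, p1, h):
    """Tabular log-likelihood-ratio CUSUM for a binary adverse outcome.

    p0: acceptable (in-control) event rate; p1: rate considered
    unacceptable; h: decision threshold. Returns the CUSUM path and
    the index of the first observation at which the chart signals
    (None if it never does).
    """
    w_event = math.log(p1 / p0)                 # added when the event occurs
    w_no_event = math.log((1 - p1) / (1 - p0))  # added when it does not
    s, path, signal = 0.0, [], None
    for i, y in enumerate(outcomes):
        s = max(0.0, s + (w_event if y else w_no_event))
        path.append(s)
        if signal is None and s > h:
            signal = i
    return path, signal

# 50 in-control deliveries with no low Apgar scores, then a cluster of events.
outcomes = [False] * 50 + [False, True, False, True, True] + [False] * 45
path, signal = bernoulli_cusum(outcomes, p0=0.01, p1=0.05, h=3.0)
```

With these rates each event adds log(5) ≈ 1.61 and each event-free observation subtracts about 0.04, so the chart crosses h = 3.0 at the third clustered event (index 53) and then decays back toward zero once the rate returns to normal.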

Journal ArticleDOI
TL;DR: Although activity monitors predict PA on the same scale (counts/min), the results between these two brands are not directly comparable, however, the data are comparable if a conversion equation is applied, with better results for log-transformed data.
Abstract: Understanding the relationships between physical activity (PA) and disease has become a major area of research interest. Activity monitors, devices that quantify free-living PA for prolonged periods of time (days or weeks), are increasingly being used to estimate PA. A range of activity monitor brands is available for investigators to use, but little is known about how they respond to different levels of PA in the field, or whether data conversion between brands is possible. Fifty-six women and men were fitted with two different activity monitors, the Actigraph™ (Actigraph LLC; AGR) and the Actical™ (Mini-Mitter Co.; MM), for 15 days. Both activity monitors were fixed to an elasticized belt worn over the hip, with the anterior and posterior position of the activity monitors randomized. Differences between activity monitors and the validity of brand inter-conversion were assessed by t-tests, Pearson correlations, Bland-Altman plots, and coefficients of variation (CV). The AGR detected a significantly greater amount of daily PA (216.2 ± 106.2 vs. 188.0 ± 101.1 counts/min, P < 0.0001). The average differences between activity monitors, expressed as CVs, were 3.1% and 15.5% for log-transformed and raw data, respectively. When a conversion equation was applied to convert datasets from one brand to the other, the differences were no longer significant, with CVs of 2.2% and 11.7% for log-transformed and raw data, respectively. Although activity monitors predict PA on the same scale (counts/min), the results between these two brands are not directly comparable. However, the data are comparable if a conversion equation is applied, with better results for log-transformed data.
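The idea of a between-brand conversion equation fitted on log-transformed counts can be sketched with ordinary least squares. The paired readings below are invented for illustration; they are not the published calibration data, and the fitted coefficients are not the published conversion equation.

```python
import math

def ols_fit(x, y):
    """Ordinary least-squares slope and intercept for y ≈ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Illustrative paired daily averages (counts/min) from two monitor
# brands; brand A reads systematically higher, as in the study.
brand_a = [120, 180, 210, 260, 310, 150, 400, 90]
brand_b = [100, 160, 185, 225, 270, 135, 345, 80]

# Fit on the log scale, where between-brand agreement was reported
# to be better.
log_a = [math.log(v) for v in brand_a]
log_b = [math.log(v) for v in brand_b]
a, b = ols_fit(log_a, log_b)

def convert_a_to_b(counts):
    """Predict a brand-B reading from a brand-A reading."""
    return math.exp(a + b * math.log(counts))
```

Back-transforming the fitted log-scale equation gives a multiplicative correction, which is one plausible reason the log-transformed conversion agreed more closely than the raw-scale one.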

Journal ArticleDOI
TL;DR: The findings support the credentials of WHO's 4-domain model as a universal QOL construct and give the impression that analysis of WHOQOL-Bref could benefit from including all the items in FA and using OQOL as a dependent variable.
Abstract: The widespread international use of the 26-item WHO Quality of Life Instrument (WHOQOL-Bref) necessitates the assessment of its factor structure across cultures, because alternative factor models may provide a better explanation of the data than the WHO 4- and 6-domain models. The objectives of the study were: to assess the factor structure of the WHOQOL-Bref in a Sudanese general population sample; and to use confirmatory factor analysis (CFA) and path analysis (PA) to see how well the model thus generated fits the WHOQOL-Bref data of Sudanese psychiatric patients and their family caregivers. In exploratory factor analysis (FA) with all items, data from 623 general population subjects were used to generate a 5-domain model. In CFA and PA, the model was tested on the data of 300 psychiatric outpatients and their caregivers, using four goodness of fit (GOF) criteria in Analysis of Moment Structures (AMOS). In the path relationships for our model, the dependent variable was the item on overall QOL (OQOL). For the WHO 6-domain model, the general facet on health and QOL was the dependent variable. Two of the five factors ("personal relations" and "environment") from our FA were similar to the WHO's. In CFA, the four GOF criteria were met by our 5-domain model and WHO's 4-domain model on the psychiatric data. In PA, these two models met the GOF criteria on the general population data. The direct predictors of OQOL were our factors "life satisfaction" and "sense of enjoyment". For the general facet, the predictors were the WHO domains "environment", "physical health" and "independence". The findings support the credentials of WHO's 4-domain model as a universal QOL construct, and the impression that analysis of WHOQOL-Bref could benefit from including all the items in FA and using OQOL as a dependent variable.
The clinical significance is that, through more such studies, a combination of domains from the WHO models and local models could be generated and used to develop rigorous definitions of QOL, from which primary targets for subjective QOL interventions with cross-cultural relevance could be delineated.

Journal ArticleDOI
TL;DR: Structured recruitment efforts that utilize characteristics of early responders (refusal or consent) in enrollment and recontact efforts may achieve early response, thereby reducing mail costs and the use of valuable resources in subsequent contact efforts.
Abstract: Often in survey research, subsets of the population invited to complete the survey do not respond in a timely manner and valuable resources are expended in recontact efforts. Various methods of improving response have been offered, such as reducing questionnaire length, offering incentives, and utilizing reminders; however, these methods can be costly. Utilizing characteristics of early responders (refusal or consent) in enrollment and recontact efforts may be a unique and cost-effective approach for improving the quality of epidemiologic research. To better understand early responders of any kind, we compared the characteristics of individuals who explicitly refused, consented, or did not respond within 2 months from the start of enrollment into a large cohort study of US military personnel. A multivariate polychotomous logistic regression model was used to estimate the effect of each covariate on the odds of early refusal and on the odds of early consent versus late/non-response, while simultaneously adjusting for all other variables in the model. From regression analyses, we found many similarities between early refusers and early consenters. Factors associated with both early refusal and early consent included older age, higher education, White race/ethnicity, Reserve/Guard affiliation, and certain information technology and support occupations. These data suggest that early refusers may differ from late/non-responders, and that certain characteristics are associated with both early refusal and early consent to participate. Structured recruitment efforts that utilize these differences may achieve early response, thereby reducing mail costs and the use of valuable resources in subsequent contact efforts.

Journal ArticleDOI
TL;DR: VIA sensitivity and specificity with the 3-class LCA model were within the range of published data and relatively consistent with conventional analyses, thus validating the original assessment of test accuracy.
Abstract: The purpose of this study was to validate the accuracy of an alternative cervical cancer test – visual inspection with acetic acid (VIA) – by addressing possible imperfections in the gold standard through latent class analysis (LCA). The data were originally collected at peri-urban health clinics in Zimbabwe. Conventional accuracy (sensitivity/specificity) estimates for VIA and two other screening tests using colposcopy/biopsy as the reference standard were compared to LCA estimates based on results from all four tests. For conventional analysis, negative colposcopy was accepted as a negative outcome when biopsy was not available as the reference standard. With LCA, local dependencies between tests were handled through adding direct effect parameters or additional latent classes to the model. Two models yielded good fit to the data, a 2-class model with two adjustments and a 3-class model with one adjustment. The definition of latent disease associated with the latter was more stringent, backed by three of the four tests. Under that model, sensitivity for VIA (abnormal+) was 0.74 compared to 0.78 with conventional analyses. Specificity was 0.639 versus 0.568, respectively. By contrast, the LCA-derived sensitivity for colposcopy/biopsy was 0.63. VIA sensitivity and specificity with the 3-class LCA model were within the range of published data and relatively consistent with conventional analyses, thus validating the original assessment of test accuracy. LCA probably yielded more likely estimates of the true accuracy than did conventional analysis with in-country colposcopy/biopsy as the reference standard. Colposcopy with biopsy can be problematic as a study reference standard, and LCA offers the possibility of obtaining estimates adjusted for referent imperfections.
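A latent class model of the kind used here can be fitted by expectation-maximisation. The sketch below is a generic two-class EM for conditionally independent binary tests, not the authors' model (which used additional classes and local-dependence adjustments), and the parameters and counts are synthetic, not the Zimbabwe data.

```python
import math
from itertools import product

def lca_em_two_class(pattern_counts, n_tests, n_iter=200):
    """EM for a two-class latent class model with conditionally
    independent binary tests.

    pattern_counts maps a tuple of 0/1 test results to its frequency.
    Returns the estimated prevalence of class 1, the per-class test
    positivity rates, and the log-likelihood trace.
    """
    pi = 0.5                      # P(latent class 1)
    p = [[0.3] * n_tests,         # P(test j positive | class 0)
         [0.7] * n_tests]         # P(test j positive | class 1)
    n = sum(pattern_counts.values())
    trace = []
    for _ in range(n_iter):
        # E-step: posterior probability of class 1 for each pattern.
        post, ll = {}, 0.0
        for y, cnt in pattern_counts.items():
            lik = [1.0, 1.0]
            for c in (0, 1):
                for j in range(n_tests):
                    lik[c] *= p[c][j] if y[j] else (1.0 - p[c][j])
            denom = (1.0 - pi) * lik[0] + pi * lik[1]
            post[y] = pi * lik[1] / denom
            ll += cnt * math.log(denom)
        trace.append(ll)
        # M-step: re-estimate prevalence and positivity rates.
        w1 = sum(cnt * post[y] for y, cnt in pattern_counts.items())
        pi = w1 / n
        for j in range(n_tests):
            pos1 = sum(cnt * post[y] for y, cnt in pattern_counts.items() if y[j])
            pos0 = sum(cnt * (1.0 - post[y]) for y, cnt in pattern_counts.items() if y[j])
            p[1][j] = pos1 / w1
            p[0][j] = pos0 / (n - w1)
    return pi, p, trace

# Expected pattern frequencies for 1000 subjects under known parameters
# (prevalence 0.3; sensitivities 0.9/0.8/0.85; false-positive rates
# 0.1/0.15/0.2) -- purely synthetic.
true_pi, sens, fpr = 0.3, [0.9, 0.8, 0.85], [0.1, 0.15, 0.2]
counts = {}
for y in product((0, 1), repeat=3):
    p_dis = math.prod(s if yi else 1 - s for s, yi in zip(sens, y))
    p_well = math.prod(f if yi else 1 - f for f, yi in zip(fpr, y))
    counts[y] = 1000 * (true_pi * p_dis + (1 - true_pi) * p_well)

pi_hat, p_hat, trace = lca_em_two_class(counts, 3)
```

Because the counts are exact expected frequencies from the assumed model, the estimates recover the generating prevalence and test accuracies; with real data, identifiability and local dependence are the hard parts, which is what the paper's adjusted models address.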

Journal ArticleDOI
TL;DR: While difficult, recruitment to and retention within multi-centre trials from primary care can be successfully achieved through the application of the best available evidence, establishing good relationships with practices, minimising the workload of those involved in recruitment and offering enhanced care to all participants.
Abstract: It is notoriously difficult to recruit patients to randomised controlled trials in primary care. This is particularly true when the disease process under investigation occurs relatively infrequently and must be investigated during a brief time window. Bell's palsy, an acute unilateral paralysis of the facial nerve, is just such a relatively rare condition. In this case study we describe the organisational issues presented in setting up a large randomised controlled trial of the management of Bell's palsy across primary and secondary care in Scotland, and how we managed to successfully recruit and retain patients presenting in the community. Where possible we used existing evidence on recruitment strategies to maximise recruitment and retention. We consider that the key issues in the success of this study were: the fact that the research was seen as clinically important by the clinicians who had initial responsibility for recruitment; employing an experienced trial co-ordinator and dedicated researchers willing to recruit participants seven days per week and to visit them at home at a time convenient to them, hence reducing missed patients and ensuring they were retained in the study; national visibility and repeated publicity at a local level delivered by locally based principal investigators well known to their primary care community; encouraging recruitment by payment to practices and reducing the workload of the referring doctors by providing immediate access to specialist care; and good collaboration between primary and secondary care, with local investigators based in the otolaryngology trial centres. Although the recruitment rate did not meet our initial expectations, enhanced retention meant that we exceeded our planned target of recruiting 550 patients within the planned time-scale.
While difficult, recruitment to and retention within multi-centre trials from primary care can be successfully achieved through the application of the best available evidence, establishing good relationships with practices, minimising the workload of those involved in recruitment and offering enhanced care to all participants. Primary care trialists should describe their experiences of the methods used to persuade patients to participate in their trials when publishing their results.

Journal ArticleDOI
TL;DR: Coefficient δG was shown to have decisive utility in distinguishing between the cross-sectional discrimination of two equally reliable scoring methods, indicating that the dichotomous coding, although reliable, failed to discriminate between individuals.
Abstract: Questionnaires are used routinely in clinical research to measure health status and quality of life. Questionnaire measurements are traditionally formally assessed by indices of reliability (the degree of measurement error) and validity (the extent to which the questionnaire measures what it is supposed to measure). Neither of these indices assesses the degree to which the questionnaire is able to discriminate between individuals, an important aspect of measurement. This paper introduces and extends an existing index of a questionnaire's ability to distinguish between individuals, that is, the questionnaire's discrimination. Ferguson (1949) [1] derived an index of test discrimination, coefficient δ, for psychometric tests with dichotomous (correct/incorrect) items. In this paper a general form of the formula, δG, is derived for the more general class of questionnaires allowing for several response choices. The calculation and characteristics of δG are then demonstrated using questionnaire data (GHQ-12) from the 2003–2004 British Household Panel Survey (N = 14761). Coefficients for reliability (α) and discrimination (δG) are computed for two commonly-used GHQ-12 coding methods: dichotomous coding and four-point Likert-type coding. Both scoring methods were reliable (α > 0.88). However, δG was substantially lower (0.73) for the dichotomous coding of the GHQ-12 than for the Likert-type method (δG = 0.96), indicating that the dichotomous coding, although reliable, failed to discriminate between individuals. Coefficient δG was shown to have decisive utility in distinguishing between the cross-sectional discrimination of two equally reliable scoring methods. Ferguson's δ has been neglected in discussions of questionnaire design and performance, perhaps because it has not been implemented in software and was restricted to questionnaires with dichotomous items, which are rare in health care research.
It is suggested that the more general formula introduced here is reported as δG, to avoid the implication that items are dichotomously coded.
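Assuming the generalised formula described in the abstract (k items, m response categories, f_i the frequency of each possible total score among n respondents, reducing to Ferguson's original δ when m = 2), δG can be computed directly. The formula below is my reading of the paper's definition, and the example data are invented.

```python
from collections import Counter

def delta_g(scores, k, m):
    """Generalised Ferguson's delta.

    scores: total questionnaire scores for n respondents, from k items
    each scored 0..m-1 (so totals range over 0..k*(m-1)). Assumes the
    generalised formula

        delta_G = (k*(m-1) + 1) * (n**2 - sum(f_i**2))
                  / (k*(m-1) * n**2)

    where f_i is the frequency of each possible total score.
    """
    n = len(scores)
    sum_f2 = sum(f * f for f in Counter(scores).values())
    r = k * (m - 1)                 # number of score steps above zero
    return (r + 1) * (n * n - sum_f2) / (r * n * n)

# Dichotomous GHQ-12 scoring: k = 12 items, m = 2 response values,
# so total scores run 0..12.
uniform = list(range(13))           # one respondent at each possible score
concentrated = [6] * 13             # everyone obtains the same score
```

A uniform spread of respondents over all possible scores gives δG = 1 (maximal discrimination), while identical scores for everyone give δG = 0, regardless of how reliable the scale is.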

Journal ArticleDOI
TL;DR: A computer-based automated menu-driven system with 658 data fields was developed for a cohort study of women aged 65 years or older, diagnosed with invasive histologically confirmed primary breast cancer, at 6 Cancer Research Network sites and an innovative modified version of the EDC permitted an automated evaluation of inter-rater and intra- rater reliability across six data collection sites.
Abstract: The choice between paper data collection methods and electronic data collection (EDC) methods has become a key question for clinical researchers. There remains a need to examine potential benefits, efficiencies, and innovations associated with an EDC system in a multi-center medical record review study. A computer-based automated menu-driven system with 658 data fields was developed for a cohort study of women aged 65 years or older, diagnosed with invasive histologically confirmed primary breast cancer (N = 1859), at 6 Cancer Research Network sites. Medical record review with direct data entry into the EDC system was implemented. An inter-rater and intra-rater reliability (IRR) system was developed using a modified version of the EDC. Automation of EDC accelerated the flow of study information and resulted in an efficient data collection process. Data collection time was reduced by approximately four months compared to the project schedule, and funded time available for manuscript preparation increased by 12 months. In addition, an innovative modified version of the EDC permitted an automated evaluation of inter-rater and intra-rater reliability across six data collection sites. Automated EDC is a powerful tool for research efficiency and innovation, especially when multiple data collection sites are involved.

Journal ArticleDOI
TL;DR: Despite the comparatively high cost of telephone interviews, they offer clear advantages over mailed self-administered questionnaires as regards completeness of data, and normative data for standardized telephone interviews could contribute to better comparability with the results of the corresponding standardized paper questionnaires.
Abstract: The most commonly used survey methods are self-administered questionnaires, telephone interviews, and a mixture of both. However, evidence from randomised controlled trials as to whether patient responses differ depending on the survey mode has been lacking. This study therefore assessed whether patient responses to surveys depend on the mode of survey administration. The comparison was between mailed, self-administered questionnaires and telephone interviews. A four-armed, randomised controlled two-period change-over design was used. Each patient responded to the same survey twice, once in written form and once by telephone interview, separated by at least a fortnight. The study was conducted in 2003/2004 in Germany. 1087 patients taking part in the German Acupuncture Trials (GERAC cohort study), who agreed to participate in a survey after completing acupuncture treatment from an acupuncture-certified family physician for headache, were randomised. Of these, 823 (664 women) aged 18 to 83 (mean 51.7) completed both parts of the study. The main outcome measure was the comparison of the scores on the 12-Item Short-Form Health Survey (SF-12) and the Graded Chronic Pain Scale (GCPS) questionnaire for the two survey modes. Computer-assisted telephone interviews (CATI) resulted in significantly fewer missing data (0.5%) than did mailed questionnaires (2.8%; p < 0.001). The analysis of equivalence revealed a difference between the survey modes only for the SF-12 mental scales. On average, the reported mental status score was 3.5 score points (2.9 to 4.0) lower on the self-administered questionnaire than in the telephone interview. The order of administration affected results. Patients who responded to the telephone interview first reported better mental health in the subsequent paper questionnaire (mean difference 2.8 score points) compared to those who responded to the paper questionnaire first (mean difference 4.1 score points).
Despite the comparatively high cost of telephone interviews, they offer clear advantages over mailed self-administered questionnaires as regards completeness of data. Only items concerning mental status were dependent on the survey mode and sequence of administration. Items on physical status were not affected. Normative data for standardized telephone questionnaires could contribute to better comparability with the results of the corresponding standardized paper questionnaires.