
Showing papers in "BMC Medical Research Methodology in 2004"


Journal ArticleDOI
TL;DR: The main objective of the Random Allocation Software project was to enhance the user's control over different aspects of randomization in parallel group trials, including output type and format, structure and ordering of generated unique identifiers and enabling users to specify group names for more than two groups.
Abstract: Background Typically, randomization software should allow users to exert control over the different aspects of randomization including block design, provision of unique identifiers and control over the format and type of program output. While some of these characteristics have been addressed by available software, none of them have all of these capabilities integrated into one package. The main objective of the Random Allocation Software project was to enhance the user's control over different aspects of randomization in parallel group trials, including output type and format, structure and ordering of generated unique identifiers and enabling users to specify group names for more than two groups.
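The block design such software supports can be sketched with a short permuted-block generator. This is only an illustration of the technique, not the Random Allocation Software itself; the function name, identifier format, and default parameters are my own assumptions:

```python
import random

def permuted_block_randomization(n_subjects, groups=("A", "B"), block_size=4, seed=1):
    """Allocate subjects to groups in permuted blocks so that group
    sizes are balanced after every complete block (illustrative sketch)."""
    if block_size % len(groups) != 0:
        raise ValueError("block_size must be a multiple of the number of groups")
    rng = random.Random(seed)  # fixed seed -> reproducible allocation sequence
    per_group = block_size // len(groups)
    sequence = []
    while len(sequence) < n_subjects:
        block = [g for g in groups for _ in range(per_group)]
        rng.shuffle(block)  # randomize order within the block
        sequence.extend(block)
    # Pair each allocation with a unique identifier, as the paper emphasizes
    return [(f"S{i + 1:03d}", arm) for i, arm in enumerate(sequence[:n_subjects])]

allocation = permuted_block_randomization(8, groups=("A", "B", "C"), block_size=6)
```

Within every complete block each group appears equally often, which is what keeps parallel groups balanced even if recruitment stops mid-trial.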

585 citations


Journal ArticleDOI
TL;DR: It is suggested that the value of general open questions at the end of structured questionnaires can be optimised if researchers start with a clear understanding of the type of data they wish to generate from such a question, and employ an appropriate strategy when designing the study.
Abstract: The habitual "any other comments" general open question at the end of structured questionnaires has the potential to increase response rates, elaborate responses to closed questions, and allow respondents to identify new issues not captured in the closed questions. However, we believe that many researchers have collected such data and failed to analyse or present it. General open questions at the end of structured questionnaires can present a problem because of their uncomfortable status of being strictly neither qualitative nor quantitative data, the consequent lack of clarity around how to analyse and report them, and the time and expertise needed to do so. We suggest that the value of these questions can be optimised if researchers start with a clear understanding of the type of data they wish to generate from such a question, and employ an appropriate strategy when designing the study. The intention can be to generate depth data or 'stories' from purposively defined groups of respondents for qualitative analysis, or to produce quantifiable data, representative of the population sampled, as a 'safety net' to identify issues which might complement the closed questions. We encourage researchers to consider developing a more strategic use of general open questions at the end of structured questionnaires. This may optimise the quality of the data and the analysis, reduce dilemmas regarding whether and how to analyse such data, and result in a more ethical approach to making best use of the data which respondents kindly provide.

498 citations


Journal ArticleDOI
TL;DR: There is no "gold standard" critical appraisal tool for any study design, nor is there any widely accepted generic tool that can be applied equally well across study types.
Abstract: Background Consumers of research (researchers, administrators, educators and clinicians) frequently use standard critical appraisal tools to evaluate the quality of published research reports. However, there is no consensus regarding the most appropriate critical appraisal tool for allied health research. We summarized the content, intent, construction and psychometric properties of published, currently available critical appraisal tools to identify common elements and their relevance to allied health research.

359 citations


Journal ArticleDOI
TL;DR: It is confirmed that strategies that attempt to maximise the number of potentially relevant records found are likely to result in a large number of false positives and suggested that a range of search terms is required to optimise searching for qualitative evidence.
Abstract: Background: Qualitative research makes an important contribution to our understanding of health and healthcare. However, qualitative evidence can be difficult to search for and identify, and the effectiveness of different types of search strategies is unknown. Methods: Three search strategies for qualitative research in the example area of support for breast-feeding were evaluated using six electronic bibliographic databases. The strategies were based on using thesaurus terms, free-text terms and broad-based terms. These strategies were combined with recognised search terms for support for breast-feeding previously used in a Cochrane review. For each strategy, we evaluated the recall (potentially relevant records found) and precision (actually relevant records found). Results: A total yield of 7420 potentially relevant records was retrieved by the three strategies combined. Of these, 262 were judged relevant. Using one strategy alone would miss relevant records. The broad-based strategy had the highest recall and the thesaurus strategy the highest precision. Precision was generally poor: 96% of records initially identified as potentially relevant were deemed irrelevant. Searching for qualitative research involves trade-offs between recall and precision. Conclusions: These findings confirm that strategies that attempt to maximise the number of potentially relevant records found are likely to result in a large number of false positives. The findings also suggest that a range of search terms is required to optimise searching for qualitative evidence. This underlines the problems of current methods for indexing qualitative research in bibliographic databases and indicates where improvements need to be made.
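The recall/precision trade-off the authors quantify can be reproduced from the reported totals: 262 relevant records among 7420 retrieved yields the ~96% false-positive figure. A minimal sketch (the function names and the single-strategy recall figure are mine):

```python
def precision(relevant_retrieved, total_retrieved):
    """Proportion of retrieved records that are actually relevant."""
    return relevant_retrieved / total_retrieved

def recall(relevant_retrieved, total_relevant):
    """Proportion of all relevant records that the search found."""
    return relevant_retrieved / total_relevant

# Combined yield reported in the abstract: 262 relevant of 7420 retrieved
p = precision(262, 7420)                  # ~0.035
irrelevant_share = 1 - p                  # ~0.96, the reported figure
# If a single strategy had found 230 of the 262 known relevant records:
r = recall(230, 262)                      # hypothetical recall for one strategy
```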

280 citations


Journal ArticleDOI
Martin Bland1
TL;DR: This paper asks whether cluster randomised trials are increasing in both number and quality of reporting, and whether statistician pressure works; it finds that cluster trials are becoming more frequent and that reporting is of higher quality.
Abstract: Several reviews of published cluster randomised trials have reported that about half did not take clustering into account in the analysis, which was thus incorrect and potentially misleading. In this paper I ask whether cluster randomised trials are increasing in both number and quality of reporting. Computer search for papers on cluster randomised trials since 1980, hand search of trial reports published in selected volumes of the British Medical Journal over 20 years. There has been a large increase in the numbers of methodological papers and of trial reports using the term 'cluster random' in recent years, with about equal numbers of each type of paper. The British Medical Journal contained more such reports than any other journal. In this journal there was a corresponding increase over time in the number of trials where subjects were randomised in clusters. In 2003 all reports showed awareness of the need to allow for clustering in the analysis. In 1993 and before clustering was ignored in most such trials. Cluster trials are becoming more frequent and reporting is of higher quality. Perhaps statistician pressure works.

183 citations


Journal ArticleDOI
TL;DR: Ranking journals by impact factor and by non-citation rate produces similar results, and the non-citation rate creates a clear distinction between how citation analysis is used to determine the quality of a journal (low level of non-citation) and of an individual article (citation counting).
Abstract: Current methods of measuring the quality of journals assume that citations of articles within journals are normally distributed. Furthermore, using journal impact factors to measure the quality of individual articles is flawed if citations are not uniformly spread between articles. The aim of this study was to assess the distribution of citations to articles and use the level of non-citation of articles within a journal as a measure of quality. This ranking method is compared with the impact factor, as calculated by ISI®. Total citations gained by October 2003, for every original article and review published in the immunology (13,125 articles; 105 journals) and surgical (17,083 articles; 120 journals) fields during 2001, were collected using ISI® Web of Science. The distribution of citation of articles within an individual journal is mainly non-parametric throughout the literature. One sixth (16.7%; IQR 13.6–19.2) of articles in a journal accrue half the total number of citations to that journal. There was a broader distribution of citation to articles in higher impact journals and in the field of immunology compared to surgery. 23.7% (IQR 14.6–42.4) of articles had not yet been cited. Levels of non-citation varied between journals and subject fields. There was a significant negative correlation between the proportion of articles never cited and a journal's impact factor for both immunology (rho = -0.854) and surgery journals (rho = -0.924). Ranking journals by impact factor and non-citation produces similar results. Using a non-citation rate is advantageous as it creates a clear distinction between how citation analysis is used to determine the quality of a journal (low level of non-citation) and an individual article (citation counting). Non-citation levels should therefore be made available for all journals.
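Both statistics in the abstract — the non-citation rate and the small share of articles that accrue half of a journal's citations — can be computed directly from a journal's citation counts. A sketch on hypothetical, deliberately skewed counts:

```python
def citation_summary(citations):
    """Return (share of articles never cited, smallest share of articles
    that together accrue half of all citations to the journal)."""
    n = len(citations)
    total = sum(citations)
    never_cited = sum(1 for c in citations if c == 0) / n
    running, k = 0, 0
    for c in sorted(citations, reverse=True):  # most-cited articles first
        running += c
        k += 1
        if 2 * running >= total:               # reached half the citations
            break
    return never_cited, k / n

# Hypothetical skewed counts: two articles dominate, three are never cited
cites = [40, 25, 10, 5, 3, 2, 1, 0, 0, 0]
never, half_share = citation_summary(cites)    # 0.3 and 0.2
```

With a skewed distribution like this, one fifth of the articles account for half the citations while 30% are never cited, which is why the mean-based impact factor describes individual articles so poorly.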

157 citations


Journal ArticleDOI
TL;DR: This incongruence of test statistics and P values is another example that statistical practice is generally poor, even in the most renowned scientific journals, and that quality of papers should be more controlled and valued.
Abstract: Background: Given an observed test statistic and its degrees of freedom, one may compute the observed P value with most statistical packages. It is unknown to what extent test statistics and P values are congruent in published medical papers. Methods: We checked the congruence of statistical results reported in all the papers of volumes 409–412 of Nature (2001) and a random sample of 63 results from volumes 322–323 of BMJ (2001). We also tested whether the frequencies of the last digit of a sample of 610 test statistics deviated from a uniform distribution (i.e., equally probable digits). Results: 11.6% (21 of 181) and 11.1% (7 of 63) of the statistical results published in Nature and BMJ respectively during 2001 were incongruent, probably mostly due to rounding, transcription, or typesetting errors. At least one such error appeared in 38% and 25% of the papers of Nature and BMJ, respectively. In 12% of the cases, the significance level might change by one or more orders of magnitude. The frequencies of the last digit of statistics deviated from the uniform distribution and suggested digit preference in rounding and reporting. Conclusions: This incongruence of test statistics and P values is another example that statistical practice is generally poor, even in the most renowned scientific journals, and that the quality of papers should be more closely controlled and valued.
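The congruence check the authors performed can be sketched with the standard library alone if the test statistic is (or is treated as) a z score, since `math.erfc` gives the two-sided normal tail without SciPy. The tolerance and the example values below are my own:

```python
import math

def p_from_z(z):
    """Two-sided P value for a z statistic under the normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

def congruent(z, reported_p, tol=0.005):
    """Is the reported P value consistent with the test statistic?"""
    return abs(p_from_z(z) - reported_p) < tol

ok = congruent(1.96, 0.05)    # correctly reported: P(|Z| > 1.96) ~ 0.050
# A transcription error turning 2.36 into 3.26 shifts P by an order of
# magnitude (~0.018 vs ~0.001) -- the kind of incongruence found here
bad = congruent(3.26, 0.018)
```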

153 citations


Journal ArticleDOI
TL;DR: The PLATINO project will provide a detailed picture of the distribution of COPD in Latin America and show that studies from Latin America can be carried out with adequate quality and be of scientific value.
Abstract: Background The prevalence of Chronic Obstructive Pulmonary Disease (COPD) in many developed countries appears to be increasing. There is some evidence from Latin America that COPD is a growing cause of death, but information on prevalence is scant. It is possible that, due to the high frequency of smoking in these countries, this disease may represent a major public health problem that has not yet been recognized as such. The PLATINO study is aimed at measuring COPD prevalence in major cities in Latin America.

131 citations


Journal ArticleDOI
TL;DR: In this article, equation (8) is incorrect because it omitted the covariance terms and the effect of the correction was negligible; the corrected estimated standard error was the same to two significant digits as the incorrect value.
Abstract: In this article [1], equation (8) is incorrect because it omitted the covariance terms. Let h denote the number of strata, so s = 1, 2, ..., h. Let T denote transpose, • denote matrix product, and Diagonal Matrix[vector] denote a matrix of all 0's except for vector on the diagonal. The correct formula incorporates these covariance terms (the display equation itself is not reproduced in this extract). In our example, the effect of the correction was negligible; the corrected estimated standard error was the same to two significant digits as the incorrect value. Also, for clarification, we note that in the sentence after (11), it is an assumption that, within stratum s, the difference, Δs, does not depend on the unobserved covariate x.

100 citations


Journal ArticleDOI
TL;DR: The development of a framework for the reporting of intracluster correlation coefficient (ICC) has the potential to facilitate the interpretation of the cluster trial being reported and should help the development of new trials in the area.
Abstract: Background: Increasingly, researchers are recognizing that there are many situations where the use of a cluster randomized trial may be more appropriate than an individually randomized trial. Similarly, the need for appropriate standards of reporting of cluster trials is more widely acknowledged. Methods: In this paper, we describe the results of a survey to inform the appropriate reporting of the intracluster correlation coefficient (ICC) – the statistical measure of the clustering effect associated with a cluster randomized trial. Results: We identified three dimensions that should be considered when reporting an ICC – a description of the dataset (including characteristics of the outcome and the intervention), information on how the ICC was calculated, and information on the precision of the ICC. Conclusions: This paper demonstrates the development of a framework for the reporting of ICCs. If adopted into routine practice, it has the potential to facilitate the interpretation of the cluster trial being reported and should help the development of new trials in the area.
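One of the reporting dimensions identified is how the ICC was calculated. A common choice — and only a sketch of one estimator among several, on invented balanced data — is the one-way ANOVA estimator:

```python
def anova_icc(clusters):
    """One-way ANOVA estimator of the intracluster correlation for
    balanced clusters: (MSB - MSW) / (MSB + (m - 1) * MSW)."""
    k = len(clusters)                       # number of clusters
    m = len(clusters[0])                    # subjects per cluster (balanced)
    grand = sum(sum(c) for c in clusters) / (k * m)
    means = [sum(c) / m for c in clusters]
    msb = m * sum((mu - grand) ** 2 for mu in means) / (k - 1)
    msw = sum((y - mu) ** 2
              for c, mu in zip(clusters, means) for y in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)

# Toy data with strong clustering: outcomes vary far more between
# clusters than within them, so the ICC is close to 1
icc = anova_icc([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Reporting the estimator used (and the precision of the estimate) is exactly what the proposed framework asks for, since different estimators can give different ICCs on the same data.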

100 citations


Journal ArticleDOI
TL;DR: There are unique challenges and issues regarding searching, critically appraising and summarizing epidemiological data in this systematic review of prevalence/incidence studies.
Abstract: Background: Reducing maternal mortality and morbidity are among the key international development goals. A prerequisite for monitoring the progress towards attainment of these goals is accurate assessment of the levels of mortality and morbidity. In order to contribute to mapping the global burden of reproductive ill-health, we are conducting a systematic review of incidence and prevalence of maternal mortality and morbidity. Methods: We followed the standard methodology for systematic reviews. We prepared a protocol and a form for data extraction that identify key characteristics on study and reporting quality. An extensive search was conducted for the years 1997–2002 including electronic and hand searching. Results: We screened the titles and abstracts of about 65,000 citations identified through 11 electronic databases as well as various other sources. Four thousand six hundred and twenty-six full-text reports were critically appraised and 2443 are included in the review so far. Approximately one third of the studies were conducted in Asia and Africa. The reporting quality was generally low with definitions for conditions and the diagnostic methods often not reported. Conclusions: There are unique challenges and issues regarding the search, critical appraisal and summarizing epidemiological data in this systematic review of prevalence/incidence studies. More methodological studies and discussion to advance the field will be useful. Considerable efforts including leadership, consensus building and resources are required to improve the standards of monitoring burden of disease.

Journal ArticleDOI
TL;DR: The enrollment rate was low primarily because of travel considerations, but the experience with patient recruitment for the behavioral intervention randomized trial, "The relaxation response intervention for chronic heart failure (RRCHF)," was able to identify and highlight valuable information for planning recruitment for future similar studies.
Abstract: Patient recruitment is one of the most difficult aspects of clinical trials, especially for research involving elderly subjects. In this paper, we describe our experience with patient recruitment for the behavioral intervention randomized trial, "The relaxation response intervention for chronic heart failure (RRCHF)." In particular, we identify factors that, according to patient reports, motivated study participation. The RRCHF was a three-armed, randomized controlled trial designed to evaluate the efficacy and cost of a 15-week relaxation response intervention on veterans with chronic heart failure. Patients from the Veterans Affairs (VA) Boston Healthcare System in the United States were recruited in the clinic and by telephone. Patients' reasons for declining study participation were recorded during screening. A qualitative sub-study in the trial consisted of telephone interviews of participating patients about their experiences in the study. The qualitative study included the first 57 patients who completed the intervention and/or the first follow-up outcome measures. Factors that distinguished patients who consented from those who refused study participation were identified using a t-test or a chi-square test. The reason for study participation was abstracted from the qualitative interview. We successfully consented 134 patients, slightly more than our target number, in 27 months. Ninety-five of the consented patients enrolled in the study. The enrollment rate among the patients approached was 18% through clinic and 6% through telephone recruitment. The most commonly cited reason for declining study participation given by patients recruited in the clinic was 'Lives Too Far Away'; for patients recruited by telephone it was 'Not Interested in the Study'. One factor that significantly distinguished patients who consented from patients who declined was the distance between their residence and the study site (t-test: p < .001). 
The most frequently reported reason for study participation was some benefit to the patient him/herself. Other reasons included helping others, being grateful to the VA, positive comments by trusted professionals, certain characteristics of the recruiter, and monetary compensation. The enrollment rate was low primarily because of travel considerations, but we were able to identify and highlight valuable information for planning recruitment for future similar studies.

Journal ArticleDOI
TL;DR: Examples are presented that show how easily PBIS can have a large impact on reported results, as well as how there can be no simple answer to it.
Abstract: Publication bias, as typically defined, refers to the decreased likelihood of studies' results being published when they are near the null, not statistically significant, or otherwise "less interesting." But choices about how to analyze the data and which results to report create a publication bias within the published results, a bias I label "publication bias in situ" (PBIS). PBIS may create much greater bias in the literature than traditionally defined publication bias (the failure to publish any result from a study). The causes of PBIS are well known, consisting of various decisions about reporting that are influenced by the data. But its impact is not generally appreciated, and very little attention is devoted to it. What attention there is consists largely of rules for statistical analysis that are impractical and do not actually reduce the bias in reported estimates. PBIS cannot be reduced by statistical tools because it is not fundamentally a problem of statistics, but rather of non-statistical choices and plain language interpretations. PBIS should be recognized as a phenomenon worthy of study – it is extremely common and probably has a huge impact on results reported in the literature – and there should be greater systematic efforts to identify and reduce it. The paper presents examples, including results of a recent HIV vaccine trial, that show how easily PBIS can have a large impact on reported results, as well as how there can be no simple answer to it. PBIS is a major problem, worthy of substantially more attention than it receives. There are ways to reduce the bias, but they are very seldom employed because they are largely unrecognized.

Journal ArticleDOI
TL;DR: Systematic reviews of harmful effects are more likely to yield information pertinent to clinical decision-making if they address a focused question, which will enable clear decisions to be made about the type of research to include in the review.
Abstract: Background Balanced decisions about health care interventions require reliable evidence on harms as well as benefits. Most systematic reviews focus on efficacy and randomised trials, for which the methodology is well established. Methods to systematically review harmful effects are less well developed and there are few sources of guidance for researchers. We present our own recent experience of conducting systematic reviews of harmful effects and make suggestions for future practice and further research.

Journal ArticleDOI
TL;DR: All standard tasks performed by CRCs were in the category of "monitoring activities" and included patient registration/randomization, recruitment follow-up, case report form completion, collaboration with the CRA, serious adverse events reporting, handling of investigator files, and preparing the site for and/or attending audits.
Abstract: Background The purpose of this study was to determine the standard tasks performed by clinical research coordinators (CRCs) in oncology clinical trials.

Journal ArticleDOI
TL;DR: A non-weighted approach to score the EQ-5D is enough to explain a high proportion of variance in scores obtained through the use of utilities, and the differential contribution of weights based on population preference values is minimal and negligible.
Abstract: The use of preference-based measures in the evaluation of health outcomes has extended considerably over the last decade. Their alleged advantage over other types of general instruments in the evaluation of health related quality of life (HRQOL) supposedly lies in the fact that preference measures incorporate values or utilities that reflect social preferences over health states. The objective of this study was to determine whether the use of social preference weights or utilities makes any real difference when calculating scores for the EuroQol (EQ-5D) questionnaire, a preference-based HRQOL measure. Responses to the EQ-5D of a sample of 10,972 patients from 10 countries enrolled in an observational study of the treatment of schizophrenia in Europe were used for this purpose. Two different methods of scoring the EQ-5D were compared: 'weighting the items' of the questionnaire through the official UK weight coefficients, and 'not weighting the items'. Pearson's, Spearman's, and two-way mixed parametric intraclass correlation coefficients were used to estimate the association of the scores obtained in both ways. The association between weighted and unweighted EuroQol scores was extremely high (Pearson's r = 0.91), as was the association between their ranks (Spearman's ρ = 0.93). The intraclass correlation coefficient obtained (0.89) also suggested that the concordance between the score distributions was substantial. A non-weighted approach to scoring the EQ-5D is enough to explain a high proportion of the variance in scores obtained through the use of utilities. The differential contribution of weights based on population preference values is therefore minimal and, in our opinion, negligible.
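The comparison can be sketched on toy data. The decrements below are invented for illustration and are not the official UK value set; the point is only that a weighted tariff and a plain sum of item levels order respondents almost identically:

```python
def pearson(x, y):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Five hypothetical EQ-5D responses: level 1-3 on each of five dimensions
responses = [(1, 1, 1, 1, 1), (1, 2, 1, 2, 1), (2, 2, 2, 2, 1),
             (2, 3, 2, 3, 2), (3, 3, 3, 3, 3)]

unweighted = [sum(r) for r in responses]       # plain sum of item levels
# Invented per-dimension decrements (NOT the UK tariff)
weights = (0.10, 0.15, 0.05, 0.20, 0.10)
weighted = [1 - sum(w * (lvl - 1) for w, lvl in zip(weights, r))
            for r in responses]

# Strongly negative: higher levels mean worse health, hence lower utility
r = pearson(unweighted, weighted)
```

The sign differs (utilities fall as summed levels rise) but the magnitude is near 1, mirroring the paper's finding that weighting adds little beyond the raw responses.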

Journal ArticleDOI
TL;DR: Comparing different statistical approaches for pooling count data of varying follow-up times in terms of estimates of effect, precision, and clinical interpretability suggested that analysts who want to improve the clinical interpretability of their findings should consider incidence rate methods.
Abstract: Background: Meta-analysis can be used to pool rate measures across studies, but challenges arise when follow-up duration varies. Our objective was to compare different statistical approaches for pooling count data of varying follow-up times in terms of estimates of effect, precision, and clinical interpretability. Methods: We examined data from a published Cochrane Review of asthma self-management education in children. We selected two rate measures with the largest number of contributing studies: school absences and emergency room (ER) visits. We estimated fixed- and random-effects standardized weighted mean differences (SMD), stratified incidence rate differences (IRD), and stratified incidence rate ratios (IRR). We also fit Poisson regression models, which allowed for further adjustment for clustering by study. Results: For both outcomes, all methods gave qualitatively similar estimates of effect in favor of the intervention. For school absences, SMD showed modest results in favor of the intervention (SMD -0.14, 95% CI -0.23 to -0.04). IRD implied that the intervention reduced school absences by 1.8 days per year (IRD -0.15 days/child-month, 95% CI -0.19 to -0.11), while IRR suggested a 14% reduction in absences (IRR 0.86, 95% CI 0.83 to 0.90). For ER visits, SMD showed a modest benefit in favor of the intervention (SMD -0.27, 95% CI: -0.45 to -0.09). IRD implied that the intervention reduced ER visits by 1 visit every 2 years (IRD -0.04 visits/child-month, 95% CI: -0.05 to -0.03), while IRR suggested a 34% reduction in ER visits (IRR 0.66, 95% CI 0.59 to 0.74). In Poisson models, adjustment for clustering lowered the precision of the estimates relative to stratified IRR results. For ER visits but not school absences, failure to incorporate study indicators resulted in a different estimate of effect (unadjusted IRR 0.77, 95% CI 0.59 to 0.99). 
Conclusions: Choice of method among the ones presented had little effect on inference but affected the clinical interpretability of the findings. Incidence rate methods gave more clinically interpretable results than SMD. Poisson regression allowed for further adjustment for heterogeneity across studies. These data suggest that analysts who want to improve the clinical interpretability of their findings should consider incidence rate methods.
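The unit conversions that make the IRD clinically interpretable are simple. A sketch with hypothetical counts chosen to echo the school-absence result (rates of 0.90 vs 1.05 absences per child-month):

```python
def incidence_rate(events, person_time):
    """Events per unit of person-time (here, per child-month)."""
    return events / person_time

def ird_irr(events_trt, pt_trt, events_ctl, pt_ctl):
    """Incidence rate difference and ratio, treatment vs control."""
    rt = incidence_rate(events_trt, pt_trt)
    rc = incidence_rate(events_ctl, pt_ctl)
    return rt - rc, rt / rc

# Hypothetical counts: 900 absences over 1000 child-months (intervention)
# vs 1050 absences over 1000 child-months (control)
ird, irr = ird_irr(900, 1000, 1050, 1000)
per_year = ird * 12    # -0.15/child-month -> -1.8 days per child-year
```

Multiplying the per-child-month IRD by 12 is exactly how the abstract's "-0.15 days/child-month" becomes the clinically legible "1.8 fewer school absences per year."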

Journal ArticleDOI
TL;DR: The percentage of editorial board members which are based in developing world countries is higher for the leading medical education journals than in most of their psychiatry and general medicine counterparts, but it is still too low.
Abstract: Background Researchers from the developing world contribute only a limited proportion to the total research output published in leading medical education journals. Some of them believe that there is a substantial editorial bias against their work. To obtain an objective basis for further discussion the present study was designed to assess the composition of the editorial boards of leading medical education journals.

Journal ArticleDOI
TL;DR: A new strategy, termed the "formal case study", allows for a naturalistic enquiry into the players, processes and outcomes of homeopathic practice; using ideas from qualitative research, it allows a rigorous approach to types of research question that cannot typically be addressed through clinical trials and numeric outcome studies.
Abstract: Two main pathways exist for the development of knowledge in clinical homeopathy. These comprise clinical trials conducted primarily by university-based researchers, and case reports and homeopathic "provings" compiled by engaged homeopathic practitioners. In this paper the relative merits of these methods are examined and a middle way proposed. This consists of the "Formal Case Study" (FCS), in which qualitative methods are used to increase the rigour and sophistication with which homeopathic cases are studied. Before going into design issues, this paper places the FCS in an historical and academic context and describes the relative merits of the method. Like any research, the FCS should have a clear focus. This focus can be both "internal", grounded in the discourse of homeopathy, and also encompass issues of wider appeal. A selection of possible "internal" and "external" research questions is introduced. Data generation should be from multiple sources to ensure adequate triangulation. This could include the recording and transcription of actual consultations. Analysis is built around existing theory, involves cross-case comparison and the search for deviant cases. The trustworthiness of conclusions is ensured by the application of concepts from qualitative research including triangulation, groundedness, respondent validation and reflexivity. Though homeopathic case studies have been reported in mainstream literature, none has used formal qualitative methods – though some such studies are in progress. This paper introduces the reader to a new strategy for homeopathic research. This strategy, termed the "formal case study", allows for a naturalistic enquiry into the players, processes and outcomes of homeopathic practice. Using ideas from qualitative research, it allows a rigorous approach to types of research question that cannot typically be addressed through clinical trials and numeric outcome studies. 
The FCS provides an opportunity for the practitioner-researcher to contribute to the evidence-base in homeopathy in a systematic fashion. The FCS can also be used to inform the design of clinical trials through holistic study of the "active ingredients" of the therapeutic process and its clinical outcomes.

Journal ArticleDOI
TL;DR: Estimates of intra-cluster correlation coefficients observed within a large-scale cross-sectional study of general practice in Australia indicate that these coefficients will be useful for calculating sample sizes in future general practice surveys that use the GP as the primary sampling unit.
Abstract: Cluster sample study designs are cost-effective; however, cluster samples violate the simple random sample assumption of independence of observations. Failure to account for the intra-cluster correlation of observations when sampling through clusters may lead to an under-powered study. Researchers therefore need estimates of intra-cluster correlation for a range of outcomes to calculate sample size. We report intra-cluster correlation coefficients observed within a large-scale cross-sectional study of general practice in Australia, where the general practitioner (GP) was the primary sampling unit and the patient encounter was the unit of inference. Each year the Bettering the Evaluation and Care of Health (BEACH) study recruits a random sample of approximately 1,000 GPs across Australia. Each GP completes details of 100 consecutive patient encounters. Intra-cluster correlation coefficients were estimated for patient demographics, morbidity managed and treatments received. Intra-cluster correlation coefficients were estimated for descriptive outcomes and for associations between outcomes and predictors, and were compared across two independent samples of GPs drawn three years apart. Between April 1999 and March 2000, a random sample of 1,047 Australian general practitioners recorded details of 104,700 patient encounters. Intra-cluster correlation coefficients for patient demographics ranged from 0.055 for patient sex to 0.451 for language spoken at home. Intra-cluster correlations for morbidity variables ranged from 0.005 for the management of eye problems to 0.059 for management of psychological problems. The intra-cluster correlation for the association between two variables was smaller than the descriptive intra-cluster correlation of each variable. When compared with the April 2002 to March 2003 sample (1,008 GPs), the estimated intra-cluster correlation coefficients were found to be consistent across samples. 
The demonstrated precision and reliability of the estimated intra-cluster correlations indicate that these coefficients will be useful for calculating sample sizes in future general practice surveys that use the GP as the primary sampling unit.
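The practical use of these coefficients is the standard design-effect correction for cluster sampling: a simple-random-sample size is inflated by 1 + (m − 1)ρ, where m is the cluster size and ρ the intra-cluster correlation. A minimal sketch (the ICC values come from the abstract; the base sample size of 385 is an arbitrary illustration, not a figure from the study):

```python
import math

def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

def inflated_n(base_n, cluster_size, icc):
    """Sample size needed under cluster sampling, given the SRS size."""
    return math.ceil(base_n * design_effect(cluster_size, icc))

# BEACH-style design: 100 encounters per GP (cluster size m = 100).
# ICCs from the abstract: 0.055 (patient sex), 0.451 (language at home).
print(design_effect(100, 0.055))    # ≈ 6.45
print(inflated_n(385, 100, 0.055))  # encounters needed instead of 385
print(design_effect(100, 0.451))    # ≈ 45.65
```

With 100 encounters per GP, even the modest ICC of 0.055 for patient sex multiplies the required sample more than sixfold, which is why outcome-specific ICC estimates matter for survey planning.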

Journal ArticleDOI
TL;DR: The system presents an alternative way to streamline the interdisciplinary collaboration of clinicians, statisticians, programmers, and graduate students in outcomes research.
Abstract: We describe a system of web applications designed to streamline the interdisciplinary collaboration in outcomes research. The outcomes research process can be described as a set of three interrelated phases: design and selection of data sources, analysis, and output. Each of these phases has inherent challenges that can be addressed by a group of five web applications developed by our group. QuestForm allows for the formulation of relevant and well-structured outcomes research questions; Research Manager facilitates the project management and electronic file exchange among researchers; Analysis Charts facilitate the communication of complex statistical techniques to clinicians with varying previous levels of statistical knowledge; Literature Matrices improve the efficiency of literature reviews. An outcomes research question is used to illustrate the use of the system. The system presents an alternative way to streamline the interdisciplinary collaboration of clinicians, statisticians, programmers, and graduate students.

Journal ArticleDOI
TL;DR: Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia and there were no striking differences between either the algebraic (i, ii) vs. non-algebraic (iii, iv) modeling approaches.
Abstract: We sought to improve upon previously published statistical modeling strategies for binary classification of dyslipidemia for general population screening purposes based on the waist-to-hip circumference ratio and body mass index anthropometric measurements. Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the total serum cholesterol to high density lipoprotein cholesterol ratio. The other potential predictor variables were gender, age, current cigarette smoking, and hypertension. The models investigated were: (i) linear regression; (ii) logistic classification; (iii) regression trees; (iv) classification trees (iii and iv are collectively known as "CART"). Binary classification performance of the region-specific models was externally validated by classifying the subjects from the other region. Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60–80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations provided assurance about the stability of the models. There were no striking differences between either the algebraic (i, ii) vs. non-algebraic (iii, iv), or the regression (i, iii) vs. classification (ii, iv) modeling approaches. Anticipated advantages of the CART vs. simple additive linear and logistic models were less than expected in this particular application with a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions between larger sets of predictor variables.

Journal ArticleDOI
TL;DR: The flexibility of the POR model, coupled with the ease with which it can be estimated in familiar software, suits the daily practice of meta-analysis and improves clinical decision-making.
Abstract: Consider a meta-analysis where a 'head-to-head' comparison of diagnostic tests for a disease of interest is intended. Assume there are two or more tests available for the disease, where each test has been studied in one or more papers. Some of the papers may have studied more than one test, hence the results are not independent. Also, the collection of tests studied may change from one paper to another, so the matched groups are incomplete. We propose a model, the proportional odds ratio (POR) model, which makes no assumptions about the shape of OR_p, a baseline function capturing the way the OR changes across papers. The POR model does not assume homogeneity of ORs, but merely specifies a relationship between the ORs of the two tests. One may expand the domain of the POR model to cover dependent studies, multiple outcomes, multiple thresholds, multi-category or continuous tests, and individual-level data. In the paper we demonstrate how to formulate the model for a few real examples, and how to use widely available or popular statistical software (like SAS, R or S-Plus, and Stata) to fit the models and estimate the discrimination accuracy of tests. Furthermore, we provide code for converting ORs into other measures of test performance, like predictive values, post-test probabilities, and likelihood ratios, under mild conditions. We also provide code to convert numerical results into graphical ones, like forest plots, heterogeneous ROC curves, and post-test probability difference graphs.
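The conversions mentioned at the end of the abstract rest on standard identities rather than anything POR-specific. A sketch of turning test characteristics into likelihood ratios and a post-test probability (the sensitivity, specificity, and pre-test probability below are illustrative values, not results from the paper):

```python
def likelihood_ratios(sens, spec):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    return sens / (1 - spec), (1 - sens) / spec

def post_test_prob(pre_test_prob, lr):
    """Bayes' rule on the odds scale: post-odds = pre-odds * LR."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Illustrative values (not from the paper): sensitivity 0.90, specificity 0.80.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(lr_pos, lr_neg)                # ≈ 4.5 and 0.125
print(lr_pos / lr_neg)               # diagnostic odds ratio ≈ 36
print(post_test_prob(0.20, lr_pos))  # 20% pre-test probability rises to ≈ 53%
```

The diagnostic odds ratio is simply LR+/LR−, which is why OR-scale models such as POR can be translated back into the clinically familiar quantities listed in the abstract.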

Journal ArticleDOI
TL;DR: A cluster randomized trial to improve obstetric practices in 40 hospitals in Mexico and Thailand is conducted using an active dissemination strategy to promote uptake of recommendations in The WHO Reproductive Health Library.
Abstract: Effective strategies for implementing best practices in low and middle income countries are needed. RHL is an annually updated electronic publication containing Cochrane systematic reviews, commentaries and practical recommendations on how to implement evidence-based practices. We are conducting a trial to evaluate the improvement in obstetric practices using an active dissemination strategy to promote uptake of recommendations in The WHO Reproductive Health Library (RHL). A cluster randomized trial to improve obstetric practices in 40 hospitals in Mexico and Thailand is being conducted. The trial uses a stratified random allocation based on country, size and type of hospitals. The core intervention consists of three interactive workshops delivered over a period of six months. The main outcome measures are changes in clinical practices that are recommended in RHL measured approximately a year after the first workshop. The design and implementation of a complex intervention using a cluster randomized trial design are presented. Designing the intervention, choosing outcome variables and implementing the protocol in two diverse settings has been a time-consuming and challenging process. We hope that sharing this experience will help others planning similar projects and improve our ability to implement change.
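Stratified random allocation of clusters can be sketched as follows. The hospital names and strata below are hypothetical, and the real trial's procedure may differ in detail (for example, in how odd-sized strata are handled):

```python
import random
from collections import defaultdict

def stratified_allocation(hospitals, seed=None):
    """Randomly assign hospitals to 'intervention'/'control' within each
    stratum (country, size, type), keeping the arms balanced per stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for name, country, size, htype in hospitals:
        by_stratum[(country, size, htype)].append(name)
    allocation = {}
    for members in by_stratum.values():
        rng.shuffle(members)
        half = len(members) // 2
        for name in members[:half]:
            allocation[name] = "intervention"
        for name in members[half:]:
            allocation[name] = "control"
    return allocation

# Hypothetical hospitals (names and strata are illustrative only).
hospitals = [
    ("H1", "Mexico", "large", "general"), ("H2", "Mexico", "large", "general"),
    ("H3", "Thailand", "small", "teaching"), ("H4", "Thailand", "small", "teaching"),
]
print(stratified_allocation(hospitals, seed=1))
```

Because randomization happens within strata, chance imbalance on country, size, and hospital type is ruled out by construction, which matters when the number of clusters is as small as 40.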

Journal ArticleDOI
TL;DR: Across all three journals there were relatively few papers describing randomised controlled trials, reflecting the difficulty of implementing this design in general practice.
Abstract: Background Many medical specialities have reviewed the statistical content of their journals. To our knowledge this has not been done in general practice. Given the main role of a general practitioner as a diagnostician we thought it would be of interest to see whether the statistical methods reported reflect the diagnostic process.

Journal ArticleDOI
TL;DR: The examination of heterogeneity in conjunction with summary effect estimates in a cumulative meta-analysis offered valuable insight into the evolution of variation and led to the development of a richer picture of the effectiveness of interventions.
Abstract: Recently developed measures such as I² and H allow the evaluation of the impact of heterogeneity in conventional meta-analyses. There has been no examination of the development of heterogeneity in the context of a cumulative meta-analysis. Cumulative meta-analyses of five smoking cessation interventions (clonidine, nicotine replacement therapy using gum and patch, physician advice and acupuncture) were used to calculate I² and H. These values were plotted by year of publication, control event rate and sample size to trace the development of heterogeneity over these covariates. The cumulative evaluation of heterogeneity varied according to the measure of heterogeneity used and the basis of cumulation. Plots produced from the calculations revealed areas of heterogeneity useful in the consideration of potential sources for further study. The examination of heterogeneity in conjunction with summary effect estimates in a cumulative meta-analysis offered valuable insight into the evolution of variation. Such information is not available in the context of conventional meta-analysis and has the potential to lead to the development of a richer picture of the effectiveness of interventions.
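Both measures derive from Cochran's Q statistic: I² = (Q − df)/Q (truncated at zero and usually expressed as a percentage) and H = √(Q/df). A minimal sketch using fixed-effect inverse-variance weights; the study-level estimates are invented for illustration, not the smoking-cessation data:

```python
import math

def heterogeneity(estimates, variances):
    """Cochran's Q, Higgins' I^2 (as %), and H from study-level
    effect estimates and their variances (fixed-effect weights)."""
    w = [1 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    h = math.sqrt(q / df) if df > 0 else float("nan")
    return q, i2, h

# Illustrative log-odds-ratio estimates (not data from the paper):
q, i2, h = heterogeneity([0.5, 0.8, 0.2, 1.1], [0.04, 0.09, 0.05, 0.12])
print(round(q, 2), round(i2, 1), round(h, 2))
```

In a cumulative meta-analysis these quantities are simply recomputed each time a study is added, yielding the trajectories the paper plots by year, control event rate, and sample size.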

Journal ArticleDOI
TL;DR: This paper describes an easy-to-use Web-based utility for estimating the reliability of ratings based on incomplete data using Ebel's algorithm, written in PHP, a common open source embedded scripting language.
Abstract: Background Rating scales form an important means of gathering evaluation data. Since important decisions are often based on these evaluations, determining the reliability of rating data can be critical. Most commonly used methods of estimating reliability require a complete set of ratings, i.e. every subject being rated must be rated by each judge. Over fifty years ago Ebel described an algorithm for estimating the reliability of ratings based on incomplete data. While his article has been widely cited over the years, software based on the algorithm is not readily available. This paper describes an easy-to-use Web-based utility for estimating the reliability of ratings based on incomplete data using Ebel's algorithm.
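Ebel's incomplete-data bookkeeping is not reproduced here, but the complete-data case it generalizes is the familiar one-way ANOVA estimate, in which the reliability of the average of k ratings is (MS_between − MS_within)/MS_between. A sketch with hypothetical ratings (Ebel's extension replaces k with an adjusted average number of ratings per subject; that adjustment is omitted):

```python
def rating_reliability(ratings):
    """Reliability of the average of k ratings from a complete
    subjects-by-judges table, via one-way ANOVA mean squares:
    r = (MS_between - MS_within) / MS_between."""
    n = len(ratings)     # subjects
    k = len(ratings[0])  # judges (complete design: same k for every subject)
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((x - m) ** 2 for row, m in zip(ratings, means)
                    for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / ms_between

# Hypothetical ratings: 4 subjects each scored by 3 judges.
table = [[7, 8, 7], [5, 5, 6], [9, 8, 9], [4, 5, 4]]
print(round(rating_reliability(table), 3))
```

With incomplete data this calculation breaks down because each subject's mean rests on a different number of judges, which is exactly the gap Ebel's algorithm, and the utility described here, addresses.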

Journal ArticleDOI
TL;DR: The extended paired availability design yielded reasonably precise confidence intervals for the effect of receiving screening on the rate of incident breast cancer death and proposed a novel analysis to accommodate likely violations of the assumption of stable screening effects.
Abstract: In recent years there has been increased interest in evaluating breast cancer screening using data from before-and-after studies in multiple geographic regions. One approach, not previously mentioned, is the paired availability design. The paired availability design was developed to evaluate the effect of medical interventions by comparing changes in outcomes before and after a change in the availability of an intervention in various locations. A simple potential outcomes model yields estimates of efficacy, the effect of receiving the intervention, as opposed to effectiveness, the effect of changing the availability of the intervention. By combining estimates of efficacy rather than effectiveness, the paired availability design avoids confounding due to different fractions of subjects receiving the interventions at different locations. The original formulation involved short-term outcomes; the challenge here is accommodating long-term outcomes. The outcome is incident breast cancer deaths in a time period, which are breast cancer deaths that were diagnosed in the same time period. We considered the plausibility of the five basic assumptions of the paired availability design and proposed a novel analysis to accommodate likely violations of the assumption of stable screening effects. We applied the paired availability design to data on breast cancer screening from six counties in Sweden. The estimated yearly change in incident breast cancer deaths per 100,000 persons ages 40–69 (in most counties) due to receipt of screening (among the relevant type of subject in the potential outcomes model) was -9 with 95% confidence interval (-14, -4) or (-14, -5), depending on the sensitivity analysis. In a realistic application, the extended paired availability design yielded reasonably precise confidence intervals for the effect of receiving screening on the rate of incident breast cancer death.
Although the assumption of stable preferences may be questionable, its impact will be small if there is little screening in the first time period. However, estimates may be substantially confounded by improvements in systemic therapy over time. Therefore the results should be interpreted with care.
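The core estimator of the paired availability design can be sketched for a single location: the change in the outcome rate is divided by the change in the fraction receiving the intervention, attributing the shift in outcomes to those whose receipt of screening changed. The numbers below are purely illustrative, not the Swedish county data:

```python
def efficacy_estimate(rate_before, rate_after, frac_before, frac_after):
    """Paired availability estimate for one location: change in the outcome
    rate divided by change in the fraction receiving the intervention."""
    return (rate_after - rate_before) / (frac_after - frac_before)

# Hypothetical numbers for one county (illustrative only): incident breast
# cancer deaths per 100,000 fall from 60 to 52 while screening uptake
# rises from 10% to 70%.
print(efficacy_estimate(60.0, 52.0, 0.10, 0.70))  # ≈ -13.3 per 100,000
```

Dividing by the change in uptake is what converts an effectiveness contrast (availability changed) into an efficacy estimate (screening received), so location-specific estimates can then be combined without confounding by differing uptake fractions.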

Journal ArticleDOI
TL;DR: Characteristics of participants who failed to complete seven months of planned participation in a trial of spermicide efficacy were explored, finding that failure to complete is a major problem in barrier method trials that seriously compromises the interpretation of results.
Abstract: Background: In most recent large efficacy trials of barrier contraceptive methods, a high proportion of participants withdrew before the intended end of follow-up. The objective of this analysis was to explore characteristics of participants who failed to complete seven months of planned participation in a trial of spermicide efficacy. Methods: Trial participants were expected to use the assigned spermicide for contraception for 7 months or until pregnancy occurred. In bivariable and multivariable analyses, we assessed the associations between failure to complete the trial and 17 pre-specified baseline characteristics. In addition, among women who participated for at least 6 weeks, we evaluated the relationships between failure to complete, various features of their first 6 weeks of experience with the spermicide, and characteristics of the study centers and population. Results: Of the 1514 participants in this analysis, 635 (42%) failed to complete the study for reasons other than pregnancy. Women were significantly less likely to complete if they were younger or unmarried, had intercourse at least 8 times per month, or were enrolled at a university center or at a center that enrolled fewer than 4 participants per month. Noncompliance with study procedures in the first 6 weeks was also associated with subsequent early withdrawal, but
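Bivariable analyses of the kind described here typically reduce to 2×2 tables of withdrawal against a baseline characteristic. As a sketch, an odds ratio with a Woolf (log-based) 95% confidence interval can be computed as follows; the counts are hypothetical, not the trial's data:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI from a 2x2 table:
    a, b = withdrew / completed among exposed;
    c, d = withdrew / completed among unexposed."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical counts (illustrative only): among younger women, 120 of 300
# withdrew early; among older women, 150 of 500 did.
or_, lo, hi = odds_ratio_ci(120, 180, 150, 350)
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

A confidence interval excluding 1, as in this invented example, is the bivariable signal that would prompt carrying the characteristic forward into the multivariable model.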

Journal ArticleDOI
TL;DR: A priori assumptions of constant relative risk across risk groups are not robust, limiting extrapolation of estimates of benefit to the general population. Unless the intervention is targeted only to high-risk subjects, cancer prevention trials should be implemented in the general population.
Abstract: There is a common belief that most cancer prevention trials should be restricted to high-risk subjects in order to increase statistical power. This strategy is appropriate if the ultimate target population is subjects at the same high-risk. However if the target population is the general population, three assumptions may underlie the decision to enroll high-risk subjects instead of average-risk subjects from the general population: higher statistical power for the same sample size, lower costs for the same power and type I error, and a correct ratio of benefits to harms. We critically investigate the plausibility of these assumptions. We considered each assumption in the context of a simple example. We investigated statistical power for fixed sample size when the investigators assume that relative risk is invariant over risk group, but when, in reality, risk difference is invariant over risk groups. We investigated possible costs when a trial of high-risk subjects has the same power and type I error as a larger trial of average-risk subjects from the general population. We investigated the ratios of benefit to harms when extrapolating from high-risk to average-risk subjects. Appearances here are misleading. First, the increase in statistical power with a trial of high-risk subjects rather than the same number of average-risk subjects from the general population assumes that the relative risk is the same for high-risk and average-risk subjects. However, if the absolute risk difference rather than the relative risk were the same, the power can be less with the high-risk subjects. In the analysis of data from a cancer prevention trial, we found that invariance of absolute risk difference over risk groups was nearly as plausible as invariance of relative risk over risk groups. Therefore a priori assumptions of constant relative risk across risk groups are not robust, limiting extrapolation of estimates of benefit to the general population.
Second, a trial of high-risk subjects may cost more than a larger trial of average-risk subjects with the same power and type I error because of additional recruitment and diagnostic testing to identify high-risk subjects. Third, the ratio of benefits to harms may be more favorable in high-risk persons than in average-risk persons in the general population, which means that extrapolating this ratio to the general population would be misleading. Thus there is no free lunch when using a trial of high-risk subjects to extrapolate results to the general population. Unless the intervention is targeted to only high-risk subjects, cancer prevention trials should be implemented in the general population.
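The power argument can be made concrete with a back-of-the-envelope normal-approximation calculation. The baseline risks below (2% in average-risk subjects, 10% in high-risk subjects) are invented for illustration: under a constant relative risk the high-risk trial has far more power, but under a constant risk difference it has less, because higher baseline risk inflates the variance of the estimated difference.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_props(p_control, p_treat, n_per_arm):
    """Approximate power of a two-arm trial comparing event proportions
    (normal approximation, two-sided alpha = 0.05)."""
    se = math.sqrt(p_control * (1 - p_control) / n_per_arm
                   + p_treat * (1 - p_treat) / n_per_arm)
    return norm_cdf(abs(p_control - p_treat) / se - 1.96)

n = 2000
# Constant relative risk (RR = 0.5): the high-risk trial gains power.
print(power_two_props(0.02, 0.01, n))  # average-risk trial, ≈ 0.74
print(power_two_props(0.10, 0.05, n))  # high-risk trial, ≈ 1.00
# Constant risk difference (RD = 0.01): the high-risk trial loses power.
print(power_two_props(0.10, 0.09, n))  # same difference, ≈ 0.19
```

The same 0.01 absolute reduction that is detectable with about 74% power at a 2% baseline risk is detectable with only about 19% power at a 10% baseline risk, which is the paper's point about the fragility of the constant-relative-risk assumption.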