
Showing papers in "BMC Medical Research Methodology in 2003"


Journal ArticleDOI
TL;DR: In this article, an evidence-based quality assessment tool called QUADAS was proposed to assess the quality of primary studies of diagnostic accuracy, based on the results of three previously conducted reviews of the diagnostic literature.
Abstract: Background: In the era of evidence based medicine, with systematic reviews as its cornerstone, adequate quality assessment tools should be available. There is currently a lack of a systematically developed and evaluated tool for the assessment of diagnostic accuracy studies. The aim of this project was to combine empirical evidence and expert opinion in a formal consensus method to develop a tool to be used in systematic reviews to assess the quality of primary studies of diagnostic accuracy. Methods: We conducted a Delphi procedure to develop the quality assessment tool by refining an initial list of items. Members of the Delphi panel were experts in the area of diagnostic research. The results of three previously conducted reviews of the diagnostic literature were used to generate a list of potential items for inclusion in the tool and to provide an evidence base upon which to develop the tool. Results: A total of nine experts in the field of diagnostics took part in the Delphi procedure. The Delphi procedure consisted of four rounds, after which agreement was reached on the items to be included in the tool which we have called QUADAS. The initial list of 28 items was reduced to fourteen items in the final tool. Items included covered patient spectrum, reference standard, disease progression bias, verification bias, review bias, clinical review bias, incorporation bias, test execution, study withdrawals, and indeterminate results. The QUADAS tool is presented together with guidelines for scoring each of the items included in the tool. Conclusions: This project has produced an evidence based quality assessment tool to be used in systematic reviews of diagnostic accuracy studies. Further work to determine the usability and validity of the tool continues.

3,468 citations


Journal ArticleDOI
TL;DR: Cox or Poisson regression with robust variance and log-binomial regression provide correct estimates and are a better alternative for the analysis of cross-sectional studies with binary outcomes than logistic regression, since the prevalence ratio is more interpretable and easier to communicate to non-specialists than the odds ratio.
Abstract: Cross-sectional studies with binary outcomes analyzed by logistic regression are frequent in the epidemiological literature. However, the odds ratio can importantly overestimate the prevalence ratio, the measure of choice in these studies. Also, controlling for confounding is not equivalent for the two measures. In this paper we explore alternatives for modeling data of such studies with techniques that directly estimate the prevalence ratio. We compared Cox regression with constant time at risk, Poisson regression and log-binomial regression against the standard Mantel-Haenszel estimators. Models with robust variance estimators in Cox and Poisson regressions and variance corrected by the scale parameter in Poisson regression were also evaluated. Three outcomes, from a cross-sectional study carried out in Pelotas, Brazil, with different levels of prevalence were explored: weight-for-age deficit (4%), asthma (31%) and mother in a paid job (52%). Unadjusted Cox/Poisson regression and Poisson regression with scale parameter adjusted by deviance performed worst in terms of interval estimates. Poisson regression with scale parameter adjusted by χ2 showed variable performance depending on the outcome prevalence. Cox/Poisson regression with robust variance, and log-binomial regression performed equally well when the model was correctly specified. Cox or Poisson regression with robust variance and log-binomial regression provide correct estimates and are a better alternative for the analysis of cross-sectional studies with binary outcomes than logistic regression, since the prevalence ratio is more interpretable and easier to communicate to non-specialists than the odds ratio. However, precautions are needed to avoid estimation problems in specific situations.
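
As a rough illustration of the comparison described above, the sketch below fits a Poisson model with robust (sandwich) variance and a log-binomial model to simulated cross-sectional data; the data, variable names and use of Python's statsmodels library are assumptions for illustration, not material from the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulated cross-sectional data (illustrative only, not the Pelotas study):
# a common binary outcome whose prevalence depends on a binary exposure.
rng = np.random.default_rng(1)
n = 2000
exposed = rng.integers(0, 2, n)
prevalence = np.where(exposed == 1, 0.45, 0.30)      # true prevalence ratio = 1.5
df = pd.DataFrame({"y": rng.binomial(1, prevalence), "exposed": exposed})

# Poisson regression with robust variance: exp(coefficient) estimates the prevalence ratio
poisson_fit = smf.glm("y ~ exposed", data=df,
                      family=sm.families.Poisson()).fit(cov_type="HC0")

# Log-binomial regression: binomial family with a log link (may fail to converge in some data)
logbin_fit = smf.glm("y ~ exposed", data=df,
                     family=sm.families.Binomial(link=sm.families.links.Log())).fit()

for label, fit in [("Poisson + robust variance", poisson_fit), ("log-binomial", logbin_fit)]:
    pr = np.exp(fit.params["exposed"])
    lo, hi = np.exp(fit.conf_int().loc["exposed"])
    print(f"{label}: PR = {pr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With these prevalences the odds ratio is about 1.9, which illustrates how the odds ratio overstates the prevalence ratio of 1.5 when the outcome is common.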

3,455 citations


Journal ArticleDOI
TL;DR: Two pragmatic randomized controlled trials of interventions in the management of hypertension in primary care are being conducted; the design of the trials and the steps taken to deal with the competing demands of external and internal validity are described.
Abstract: Controlled clinical trials of health care interventions are either explanatory or pragmatic. Explanatory trials test whether an intervention is efficacious; that is, whether it can have a beneficial effect in an ideal situation. Pragmatic trials measure effectiveness; they measure the degree of beneficial effect in real clinical practice. In pragmatic trials, a balance between external validity (generalizability of the results) and internal validity (reliability or accuracy of the results) needs to be achieved. The explanatory trial seeks to maximize the internal validity by assuring rigorous control of all variables other than the intervention. The pragmatic trial seeks to maximize external validity to ensure that the results can be generalized. However the danger of pragmatic trials is that internal validity may be overly compromised in the effort to ensure generalizability. We are conducting two pragmatic randomized controlled trials on interventions in the management of hypertension in primary care. We describe the design of the trials and the steps taken to deal with the competing demands of external and internal validity. External validity is maximized by having few exclusion criteria and by allowing flexibility in the interpretation of the intervention and in management decisions. Internal validity is maximized by decreasing contamination bias through cluster randomization, and decreasing observer and assessment bias, in these non-blinded trials, through baseline data collection prior to randomization, automating the outcomes assessment with 24 hour ambulatory blood pressure monitors, and blinding the data analysis. Clinical trials conducted in community practices present investigators with difficult methodological choices related to maintaining a balance between internal validity (reliability of the results) and external validity (generalizability). The attempt to achieve methodological purity can result in clinically meaningless results, while attempting to achieve full generalizability can result in invalid and unreliable results. Achieving a creative tension between the two is crucial.

482 citations


Journal ArticleDOI
TL;DR: The system proposed would significantly improve the protection of privacy and confidentiality, while still allowing the efficient linkage of records between disease registers, under the control and supervision of the trusted third party and independent ethics committees.
Abstract: Background: Disease registers aim to collect information about all instances of a disease or condition in a defined population of individuals. Traditionally, methods of operating disease registers have required that notifications of cases be identified by unique identifiers such as social security number or national identification number, or by ensembles of non-unique identifying data items, such as name, sex and date of birth. However, growing concern over the privacy and confidentiality aspects of disease registers may hinder their future operation. Technical solutions to these legitimate concerns are needed. Discussion: An alternative method of operation is proposed which involves splitting the personal identifiers from the medical details at the source of notification, and separately encrypting each part using asymmetrical (public key) cryptographic methods. The identifying information is sent to a single Population Register, and the medical details to the relevant disease register. The Population Register uses probabilistic record linkage to assign a unique personal identification (UPI) number to each person notified to it, although not necessarily everyone in the entire population. This UPI is shared only with a single trusted third party whose sole function is to translate between this UPI and separate series of personal identification numbers which are specific to each disease register. Summary: The system proposed would significantly improve the protection of privacy and confidentiality, while still allowing the efficient linkage of records between disease registers, under the control and supervision of the trusted third party and independent ethics committees. The proposed architecture could accommodate genetic databases and tissue banks as well as a wide range of other health and social data collections. It is important that proposals such as this are subject to widespread scrutiny by information security experts, researchers and interested members of the general public alike.
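
A minimal sketch of the split-and-encrypt idea follows, using the Python cryptography library; the key sizes, toy payloads and use of raw RSA-OAEP on such small records are illustrative assumptions (a production system would use hybrid encryption and proper key management), and none of it is taken from the paper itself.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# OAEP padding shared by both encryptions
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Each register holds its own key pair; only public keys are distributed to sources
population_register_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
disease_register_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# A notification is split at the source into identifiers and medical details
notification = {"identifiers": b'{"name":"Jane Doe","sex":"F","dob":"1960-01-01"}',
                "medical":     b'{"diagnosis":"C50.9","date":"2003-05-01"}'}

# Each half is encrypted so that only its intended register can read it
to_population_register = population_register_key.public_key().encrypt(
    notification["identifiers"], oaep)
to_disease_register = disease_register_key.public_key().encrypt(
    notification["medical"], oaep)

# Only the Population Register can recover the identifying half for record linkage
print(population_register_key.decrypt(to_population_register, oaep))
```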

250 citations


Journal ArticleDOI
TL;DR: A number of issues that should be considered when planning a factorial trial are presented, including sample size; the main analytical issues relate to the investigation of main effects and the interaction between the interventions in appropriate regression models.
Abstract: The evaluation of more than one intervention in the same randomised controlled trial can be achieved using a parallel group design. However this requires increased sample size and can be inefficient, especially if there is also interest in considering combinations of the interventions. An alternative may be a factorial trial, where for two interventions participants are allocated to receive neither intervention, one or the other, or both. Factorial trials require special considerations, however, particularly at the design and analysis stages. Using a 2 × 2 factorial trial as an example, we present a number of issues that should be considered when planning a factorial trial. The main design issue is that of sample size. Factorial trials are most often powered to detect the main effects of interventions, since adequate power to detect plausible interactions requires greatly increased sample sizes. The main analytical issues relate to the investigation of main effects and the interaction between the interventions in appropriate regression models. Presentation of results should reflect the analytical strategy with an emphasis on the principal research questions. We also give an example of how baseline and follow-up data should be presented. Lastly, we discuss the implications of the design, analytical and presentational issues covered. Difficulties in interpreting the results of factorial trials if an influential interaction is observed is the cost of the potential for efficient, simultaneous consideration of two or more interventions. Factorial trials can in principle be designed to have adequate power to detect realistic interactions, and in any case they are the only design that allows such effects to be investigated.
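
The analytical strategy described above can be sketched with a simulated 2 × 2 example; the outcome, variable names and use of Python's statsmodels are illustrative assumptions, not material from the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated 2 x 2 factorial trial: interventions a and b, continuous outcome y,
# generated here with no true interaction.
rng = np.random.default_rng(7)
n = 400
a = rng.integers(0, 2, n)
b = rng.integers(0, 2, n)
y = 10 + 1.5 * a + 1.0 * b + rng.normal(0, 4, n)
df = pd.DataFrame({"y": y, "a": a, "b": b})

# Main-effects model: the usual primary analysis of a factorial trial
main = smf.ols("y ~ a + b", data=df).fit()

# Interaction model: secondary check of whether A's effect depends on B
inter = smf.ols("y ~ a * b", data=df).fit()

print(main.params[["a", "b"]])                     # estimated main effects
print(inter.params["a:b"], inter.pvalues["a:b"])   # interaction estimate and p-value
```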

250 citations


Journal ArticleDOI
TL;DR: The proportion of abstracts submitted to biomedical meetings that are eventually published as full reports is estimated, and abstract and meeting characteristics associated with acceptance and publication are identified, using logistic regression analysis, survival-type analysis, and meta-analysis.
Abstract: It has been estimated that about 45% of abstracts that are accepted for presentation at biomedical meetings will subsequently be published in full. The acceptance of abstracts at meetings and their fate after initial rejection are less well understood. We set out to estimate the proportion of abstracts submitted to meetings that are eventually published as full reports, and to explore factors that are associated with meeting acceptance and successful publication. Studies analysing acceptance of abstracts at biomedical meetings or their subsequent full publication were searched in MEDLINE, OLDMEDLINE, EMBASE, Cochrane Library, CINAHL, BIOSIS, Science Citation Index Expanded, and by hand searching of bibliographies and proceedings. We estimated rates of abstract acceptance and of subsequent full publication, and identified abstract and meeting characteristics associated with acceptance and publication, using logistic regression analysis, survival-type analysis, and meta-analysis. Analysed meetings were held between 1957 and 1999. Of 14945 abstracts that were submitted to 43 meetings, 46% were accepted. The rate of full publication was studied with 19123 abstracts that were presented at 234 meetings. Using survival-type analysis, we estimated that 27% were published after two, 41% after four, and 44% after six years. Of 2412 abstracts that were rejected at 24 meetings, 27% were published despite rejection. Factors associated with both abstract acceptance and subsequent publication were basic science and positive study outcome. Large meetings and those held outside the US were more likely to accept abstracts. Abstracts were more likely to be published subsequently if presented either orally, at small meetings, or at a US meeting. Abstract acceptance itself was strongly associated with full publication. About one third of abstracts submitted to biomedical meetings were published as full reports. Acceptance at meetings and publication were associated with specific characteristics of abstracts and meetings.

196 citations


Journal ArticleDOI
TL;DR: Perfect correlation between potential surrogate and unobserved true outcomes within randomized groups does not guarantee correct inference based on a potential surrogate endpoint; even in early phase trials, investigators should not base conclusions on potential surrogate endpoints.
Abstract: There is common belief among some medical researchers that if a potential surrogate endpoint is highly correlated with a true endpoint, then a positive (or negative) difference in potential surrogate endpoints between randomization groups would imply a positive (or negative) difference in unobserved true endpoints between randomization groups. We investigate this belief when the potential surrogate and unobserved true endpoints are perfectly correlated within each randomization group. We use a graphical approach. The vertical axis is the unobserved true endpoint and the horizontal axis is the potential surrogate endpoint. Perfect correlation within each randomization group implies that, for each randomization group, potential surrogate and true endpoints are related by a straight line. In this scenario the investigator does not know the slopes or intercepts. We consider a plausible example where the slope of the line is higher for the experimental group than for the control group. In our example with unknown lines, a decrease in mean potential surrogate endpoints from control to experimental groups corresponds to an increase in mean true endpoint from control to experimental groups. Thus the potential surrogate endpoints give the wrong inference. Similar results hold for binary potential surrogate and true outcomes (although the notion of correlation does not apply). The potential surrogate endpoint would give the correct inference if either (i) the unknown lines for the two groups coincided, which means that the distribution of true endpoint conditional on potential surrogate endpoint does not depend on treatment group, which is called the Prentice Criterion, or (ii) one could accurately predict the lines based on data from prior studies. Perfect correlation between potential surrogate and unobserved true outcomes within randomized groups does not guarantee correct inference based on a potential surrogate endpoint. Even in early phase trials, investigators should not base conclusions on potential surrogate endpoints for which the only validation is high correlation with the true endpoint within a group.
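
The graphical argument can be reproduced with a small numerical example of my own (the slopes, intercepts and group means below are invented, not taken from the paper): each arm lies exactly on its own line, so within-group correlation is perfect, yet the surrogate and true endpoints move in opposite directions between arms.

```python
# Within each arm the true endpoint T lies exactly on a line in the surrogate S
control_line      = lambda s: 1.0 * s + 0.0    # T = S        (slope 1)
experimental_line = lambda s: 3.0 * s - 10.0   # T = 3S - 10  (steeper slope)

mean_s_control, mean_s_experimental = 10.0, 8.0          # surrogate mean FALLS on treatment
mean_t_control = control_line(mean_s_control)                 # 10.0
mean_t_experimental = experimental_line(mean_s_experimental)  # 14.0: true mean RISES

print(mean_s_experimental - mean_s_control)   # -2.0  surrogate difference
print(mean_t_experimental - mean_t_control)   # +4.0  true-endpoint difference, opposite sign
```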

157 citations


Journal ArticleDOI
TL;DR: A quality index was developed for the evaluation of scientific meeting abstracts which was shown to be reliable, valid and useful.
Abstract: The evaluation of abstracts for scientific meetings has been shown to suffer from poor inter-observer reliability. A measure was developed to assess the formal quality of abstract submissions in a standardized way. Item selection was based on scoring systems for full reports, taking into account published guidelines for structured abstracts. Interrater agreement was examined using a random sample of submissions to the American Gastroenterological Association, stratified for research type (n = 100, 1992–1995). For construct validity, the association of formal quality with acceptance for presentation was examined. A questionnaire to expert reviewers evaluated sensibility items, such as ease of use and comprehensiveness. The index comprised 19 items. The summary quality scores showed good interrater agreement (intraclass correlation coefficient 0.60–0.81). Good abstract quality was associated with abstract acceptance for presentation at the meeting. The instrument was found to be acceptable by expert reviewers. A quality index was developed for the evaluation of scientific meeting abstracts which was shown to be reliable, valid and useful.

108 citations


Journal ArticleDOI
TL;DR: The proposed method offers a rational basis for determining the number of repeat measures in repeat measures designs and is effective in randomized and non-randomized comparative trials.
Abstract: In many randomized and non-randomized comparative trials, researchers measure a continuous endpoint repeatedly in order to decrease intra-patient variability and thus increase statistical power. There has been little guidance in the literature as to selecting the optimal number of repeated measures. The degree to which adding a further measure increases statistical power can be derived from simple formulae. This "marginal benefit" can be used to inform the optimal number of repeat assessments. Although repeating assessments can have dramatic effects on power, marginal benefit of an additional measure rapidly decreases as the number of measures rises. There is little value in increasing the number of either baseline or post-treatment assessments beyond four, or seven where baseline assessments are taken. An exception is when correlations between measures are low, for instance, episodic conditions such as headache. The proposed method offers a rational basis for determining the number of repeat measures in repeat measures designs.
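
A standard variance formula for the mean of k equally correlated measures gives a feel for the diminishing marginal benefit described above; the compound-symmetry assumption and the particular correlations used below are my own illustration and may differ in detail from the formulae in the paper.

```python
import numpy as np

def variance_factor(k, rho):
    """Variance of the mean of k equally correlated measures relative to a
    single measure: (1 + (k - 1) * rho) / k (compound-symmetry assumption)."""
    return (1 + (k - 1) * rho) / k

for rho in (0.7, 0.2):   # a stable condition vs an episodic one such as headache
    factors = [variance_factor(k, rho) for k in range(1, 9)]
    marginal = -np.diff(factors)            # extra variance reduction per added measure
    print(f"rho={rho}: variance factors = {np.round(factors, 3)}")
    print(f"          marginal benefit  = {np.round(marginal, 3)}")
```

With high within-patient correlation the marginal benefit becomes negligible after a handful of measures, whereas with low correlation (episodic conditions) extra measures keep paying off, matching the exception noted above.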

94 citations


Journal ArticleDOI
TL;DR: It is argued that the choice of a control intervention should be supported by a systematic review of the relevant literature and, where necessary, solicitation of the informed beliefs of clinical experts through formal surveys and publication of the proposed trial's protocol.
Abstract: Randomised controlled clinical trials are performed to resolve uncertainty concerning comparator interventions. Appropriate acknowledgment of uncertainty enables the concurrent achievement of two goals: the acquisition of valuable scientific knowledge and an optimum treatment choice for the patient-participant. The ethical recruitment of patients requires the presence of clinical equipoise. This involves the appropriate choice of a control intervention, particularly when unapproved drugs or innovative interventions are being evaluated. We argue that the choice of a control intervention should be supported by a systematic review of the relevant literature and, where necessary, solicitation of the informed beliefs of clinical experts through formal surveys and publication of the proposed trial's protocol. When clinical equipoise is present, physicians may confidently propose trial enrollment to their eligible patients as an act of therapeutic beneficence.

57 citations


Journal ArticleDOI
TL;DR: Adaptive sampling techniques were effective in reaching drug-using heterosexual couples in an urban setting, and the application of these methods to other groups of related individuals in clinical and public health research may prove to be useful.
Abstract: Background Public health research involving social or kin groups (such as sexual partners or family members), rather than samples of unrelated individuals, has become more widespread in response to social ecological approaches to disease treatment and prevention. This approach requires the development of innovative sampling, recruitment and screening methodologies tailored to the study of related individuals.

Journal ArticleDOI
TL;DR: There currently persists a silence about the methods of interdisciplinary collaboration itself, and the core of this paper proposes a template for such methods.
Abstract: Background While the desirability of interdisciplinary inquiry has been widely acknowledged, indeed has become 'the mantra of science policy', the methods of interdisciplinary collaboration are opaque to outsiders and generally remain undescribed.

Journal ArticleDOI
TL;DR: It is demonstrated that Monte Carlo simulation, using available software, can estimate the uncertainty resulting from complicated calculations with many sources of uncertainty, and is applied to the current estimate of the annual incidence of foodborne illness in the United States.
Abstract: All quantifications of mortality, morbidity, and other health measures involve numerous sources of error. The routine quantification of random sampling error makes it easy to forget that other sources of error can and should be quantified. When a quantification does not involve sampling, error is almost never quantified and results are often reported in ways that dramatically overstate their precision. We argue that the precision implicit in typical reporting is problematic and sketch methods for quantifying the various sources of error, building up from simple examples that can be solved analytically to more complex cases. There are straightforward ways to partially quantify the uncertainty surrounding a parameter that is not characterized by random sampling, such as limiting reported significant figures. We present simple methods for doing such quantifications, and for incorporating them into calculations. More complicated methods become necessary when multiple sources of uncertainty must be combined. We demonstrate that Monte Carlo simulation, using available software, can estimate the uncertainty resulting from complicated calculations with many sources of uncertainty. We apply the method to the current estimate of the annual incidence of foodborne illness in the United States. Quantifying uncertainty from systematic errors is practical. Reporting this uncertainty would more honestly represent study results, help show the probability that estimated values fall within some critical range, and facilitate better targeting of further research.
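
A minimal Monte Carlo sketch of the kind of calculation described above is shown below; the quantities, distributions and numbers are invented for illustration and are not the foodborne-illness estimate itself.

```python
import numpy as np

# Toy propagation of several uncertainty sources through a simple calculation:
# incidence = reported cases * under-reporting multiplier / diagnostic accuracy.
rng = np.random.default_rng(0)
n_sim = 100_000

reported = rng.normal(10_000, 500, n_sim)            # random sampling error
underreporting = rng.triangular(5, 10, 20, n_sim)    # expert-judgement range
accuracy = rng.uniform(0.85, 0.99, n_sim)            # misclassification correction

incidence = reported * underreporting / accuracy
print(np.percentile(incidence, [2.5, 50, 97.5]))     # combined uncertainty interval
```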

Journal ArticleDOI
TL;DR: Two-stage phase II trials for assessing survival probabilities can be designed that do not require prolonged suspension of patient accrual, and are more efficient than single stage designs and more practical than existing two-stage designs developed for binomial outcomes, particularly in trials with slow accrual.
Abstract: Phase II cancer studies are undertaken to assess the activity of a new drug or a new treatment regimen. Activity is sometimes defined in terms of a survival probability, a binary outcome such as one-year survival that is derived from a time-to-event variable. Phase II studies are usually designed with an interim analysis so they can be stopped if early results are disappointing. Most designs that allow for an interim look are not appropriate for monitoring survival probabilities since many patients will not have enough follow-up by the time of the interim analysis, thus necessitating an inconvenient suspension of accrual while patients are being followed. Two-stage phase II clinical trial designs are developed for evaluating survival probabilities. These designs are compared to fixed sample designs and to existing designs developed to monitor binomial probabilities to illustrate the expected reduction in sample size or study length possible with the use of the proposed designs. Savings can be realized in both the duration of accrual and the total study length, with the expected savings increasing as the accrual rate decreases. Misspecifying the underlying survival distribution and the accrual rate during the planning phase can adversely influence the operating characteristics of the designs. Two-stage phase II trials for assessing survival probabilities can be designed that do not require prolonged suspension of patient accrual. These designs are more efficient than single stage designs and more practical than existing two-stage designs developed for binomial outcomes, particularly in trials with slow accrual.
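
To illustrate what the operating characteristics of a two-stage rule look like, here is a toy simulation of a generic two-stage stopping rule applied to a one-year survival endpoint. The boundaries and survival probabilities are invented, and the rule shown waits for full follow-up before the interim look, so it is not the authors' design, which is constructed precisely to avoid suspending accrual.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate(p_survive, n_sim=50_000, n1=20, r1=8, n=40, r=19):
    """Toy two-stage rule: stop for futility if <= r1 one-year survivors among
    the first n1 patients; otherwise enrol to n and declare the regimen active
    if the total number of survivors exceeds r. All constants are invented."""
    stage1 = rng.binomial(n1, p_survive, n_sim)
    stop_early = stage1 <= r1                          # futility stop after stage 1
    stage2 = rng.binomial(n - n1, p_survive, n_sim)
    declare_active = ~stop_early & (stage1 + stage2 > r)
    expected_n = n1 * stop_early.mean() + n * (1 - stop_early.mean())
    return declare_active.mean(), stop_early.mean(), expected_n

print(simulate(0.40))   # under an uninteresting one-year survival probability
print(simulate(0.60))   # under the hoped-for one-year survival probability
```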

Journal ArticleDOI
TL;DR: Simulations suggest that the proposed new methods for determining when non-significant meta-analytic results might be overturned, based on a prediction of the number of participants required in new studies, are able to detect out-of-date meta-analyses.
Abstract: Background As an increasingly large number of meta-analyses are published, quantitative methods are needed to help clinicians and systematic review teams determine when meta-analyses are not up to date.

Journal ArticleDOI
TL;DR: This paper demonstrates how structural equation modelling (SEM) can be used as a tool to aid in carrying out power analyses, and may prove useful for researchers designing research in the health and medical spheres.
Abstract: Background: This paper demonstrates how structural equation modelling (SEM) can be used as a tool to aid in carrying out power analyses. For many complex multivariate designs that are increasingly being employed, power analyses can be difficult to carry out, because the software available lacks sufficient flexibility. Satorra and Saris developed a method for estimating the power of the likelihood ratio test for structural equation models. Whilst the Satorra and Saris approach is familiar to researchers who use the structural equation modelling approach, it is less well known amongst other researchers. The SEM approach can be equivalent to other multivariate statistical tests, and therefore the Satorra and Saris approach to power analysis can be used. Methods: The covariance matrix, along with a vector of means, relating to the alternative hypothesis is generated. This represents the hypothesised population effects. A model (representing the null hypothesis) is then tested in a structural equation model, using the population parameters as input. An analysis based on the chi-square of this model can provide estimates of the sample size required for different levels of power to reject the null hypothesis. Conclusions: The SEM based power analysis approach may prove useful for researchers designing research in the health and medical spheres.
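
The core of the Satorra and Saris calculation is a noncentral chi-square power computation, sketched below with SciPy; the degrees of freedom, fit-function value and sample sizes are placeholders, and obtaining the noncentrality parameter (by fitting the null model to the moments implied by the alternative) is assumed to have already been done, for example in SEM software.

```python
from scipy.stats import chi2, ncx2

df_model = 2            # degrees of freedom of the likelihood ratio test (placeholder)
f0 = 0.05               # fit-function value from fitting H0 to H1-implied moments (placeholder)
alpha = 0.05
crit = chi2.ppf(1 - alpha, df_model)    # critical value under the null

for n in (50, 100, 200, 400):
    ncp = (n - 1) * f0                  # noncentrality parameter at sample size n
    power = ncx2.sf(crit, df_model, ncp)
    print(f"N={n}: power={power:.2f}")
```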

Journal ArticleDOI
TL;DR: A logistic regression model is proposed to estimate both the probability of at least one false positive and the expected number of false positives associated with n cancer screenings and can be used for both informed decision making at the individual level, as well as planning of health services.
Abstract: When evaluating cancer screening it is important to estimate the cumulative risk of false positives from periodic screening. Because the data typically come from studies in which the number of screenings varies by subject, estimation must take into account dropouts. A previous approach to estimate the probability of at least one false positive in n screenings unrealistically assumed that the probability of dropout does not depend on prior false positives. By redefining the random variables, we obviate the unrealistic dropout assumption. We also propose a relatively simple logistic regression and extend estimation to the expected number of false positives in n screenings. We illustrate our methodology using data from women ages 40 to 64 who received up to four annual breast cancer screenings in the Health Insurance Program of Greater New York study, which began in 1963. Covariates were age, time since previous screening, screening number, and whether or not a previous false positive occurred. Defining a false positive as an unnecessary biopsy, the only statistically significant covariate was whether or not a previous false positive occurred. Because the effect of screening number was not statistically significant, extrapolation beyond 4 screenings was reasonable. The estimated mean number of unnecessary biopsies in 10 years per woman screened is .11 with 95% confidence interval of (.10, .12). Defining a false positive as an unnecessary work-up, all the covariates were statistically significant and the estimated mean number of unnecessary work-ups in 4 years per woman screened is .34 with 95% confidence interval (.32, .36). Using data from multiple cancer screenings with dropouts, and allowing dropout to depend on previous history of false positives, we propose a logistic regression model to estimate both the probability of at least one false positive and the expected number of false positives associated with n cancer screenings. The methodology can be used for both informed decision making at the individual level, as well as planning of health services.
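
The general idea, modelling the per-screen false-positive risk with logistic regression and then chaining the predicted risks over n screenings, can be sketched as follows; the simulated data, covariates and coefficients are illustrative, and this simplified chaining is not the paper's full estimator, which also handles dropout.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated screening records: one row per screen, with screening number and an
# indicator of a previous false positive (illustrative data, not the HIP study).
rng = np.random.default_rng(5)
n_obs = 5000
screen_no = rng.integers(1, 5, n_obs)
prev_fp = rng.integers(0, 2, n_obs)
p_true = 1 / (1 + np.exp(-(-3.0 + 0.1 * screen_no + 0.8 * prev_fp)))
df = pd.DataFrame({"fp": rng.binomial(1, p_true),
                   "screen_no": screen_no, "prev_fp": prev_fp})

fit = smf.logit("fp ~ screen_no + prev_fp", data=df).fit(disp=False)

# "No false positive so far" means prev_fp stays 0 along the way, so the
# cumulative risk chains the prev_fp = 0 predictions across screens.
n_screens = 10
grid = pd.DataFrame({"screen_no": np.arange(1, n_screens + 1), "prev_fp": 0})
p_k = fit.predict(grid)
print("P(at least one false positive in", n_screens, "screens):", 1 - np.prod(1 - p_k))
```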

Journal ArticleDOI
TL;DR: The use of the hyperbolic tangent transformation enables the investigator to take advantage of the conjugate properties of the normal distribution when combining correlation coefficients from different studies.
Abstract: Background The Bayesian approach is one alternative for estimating correlation coefficients in which knowledge from previous studies is incorporated to improve estimation. The purpose of this paper is to illustrate the utility of the Bayesian approach for estimating correlations using prior knowledge.
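
A minimal sketch of the conjugate update on the Fisher z (hyperbolic tangent) scale follows, with invented prior and data values; the paper's own prior specification may differ.

```python
import numpy as np

# Prior knowledge and new data expressed as correlations with sample sizes (illustrative)
r_prior, n_prior = 0.50, 40
r_data,  n_data  = 0.30, 100

# Transform to the z scale, where r is approximately normal with variance 1/(n - 3)
z_prior, var_prior = np.arctanh(r_prior), 1 / (n_prior - 3)
z_data,  var_data  = np.arctanh(r_data),  1 / (n_data - 3)

# Precision-weighted (normal-normal conjugate) posterior on the z scale
post_precision = 1 / var_prior + 1 / var_data
z_post = (z_prior / var_prior + z_data / var_data) / post_precision
var_post = 1 / post_precision

# Back-transform the posterior mean and a 95% interval to the correlation scale
lo, hi = z_post - 1.96 * np.sqrt(var_post), z_post + 1.96 * np.sqrt(var_post)
print("posterior correlation:", np.tanh(z_post), "95% interval:", (np.tanh(lo), np.tanh(hi)))
```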

Journal ArticleDOI
TL;DR: The use of selected scales from a multi-scale health-status questionnaire seems to yield similar results when administered in isolation or within the entire SF-36 to patients with musculoskeletal disorders.
Abstract: Background Little work has been done to investigate the suggestion that the use of selected scales from a multi-scale health-status questionnaire would compromise reliability and validity. The aim of this study was to compare the performance of three scales selected from the SF-36 generic health questionnaire when administered in isolation or within the entire SF-36 to patients with musculoskeletal disorders.

Journal ArticleDOI
TL;DR: Narrative reviews and editorials are accessed more frequently than primary research papers or systematic reviews in the first week after their publication, which may disappoint those who believe that it is important for readers to critically appraise the primary research data.
Abstract: The electronic version of the British Medical Journal (eBMJ) has a unique feature in that it provides an electronic record of the number of times an article has been viewed ("hits") in the week after its publication. We sought to compare the relative popularity of primary research and "evidence-based" papers against that of narrative reviews and editorials. We surveyed four broad groupings of articles in 2001: Editorials, Clinical Reviews (which are narrative reviews), Education and Debate, and Papers (which are original research articles and systematic reviews). Clinical Reviews were the most frequently viewed articles, with an average of 4148 hits per article, while Papers were less popular (average of 1168 hits per article). Systematic reviews (23 articles, average of 1190 hits per article) were visited far less often than narrative reviews. Editorials (average of 2537 hits per article) were viewed much more frequently than Papers, even where the editorial was written as an accompanying piece with a direct link to the paper. Narrative reviews and editorials are accessed more frequently than primary research papers or systematic reviews in the first week after their publication. These findings may disappoint those who believe that it is important for readers to critically appraise the primary research data. Although the technical quality of journal articles may have been helped by recommendations on structured reporting, the readability of such articles has received little attention. Authors and journal editors must take steps to make research articles and systematic reviews more attractive to readers. This may involve using simpler language, as well as innovative use of web resources to produce shorter, snappier papers, with the methodological or technical details made available elsewhere. Primary research and "evidence-based" papers seem to be less attractive to readers than narrative reviews and editorials in the first week after publication. Authors and editors should try to improve the early appeal of primary research papers.

Journal ArticleDOI
TL;DR: Armitage's technique of matched-pairs sequential analysis should be considered for laboratory experiments, because uncontrolled interim analysis of accumulating results carries a statistical penalty: if enough interim analyses are conducted, ultimately one of the analyses will result in the magical P = 0.05.
Abstract: Techniques for interim analysis, the statistical analysis of results while they are still accumulating, are highly developed in the setting of clinical trials. But in the setting of laboratory experiments such analyses are usually conducted secretly and with no provisions for the necessary adjustments of the Type I error-rate. Laboratory researchers, from ignorance or by design, often analyse their results before the final number of experimental units (humans, animals, tissues or cells) has been reached. If this is done in an uncontrolled fashion, the pejorative term 'peeking' has been applied. A statistical penalty must be exacted. This is because if enough interim analyses are conducted, and if the outcome of the trial is on the borderline between 'significant' and 'not significant', ultimately one of the analyses will result in the magical P = 0.05. I suggest that Armitage's technique of matched-pairs sequential analysis should be considered. The conditions for using this technique are ideal: almost unlimited opportunity for matched pairing, and a short time between commencement of a study and its completion. Both the Type I and Type II error-rates are controlled. And the maximum number of pairs necessary to achieve an outcome, whether P = 0.05 or P > 0.05, can be estimated in advance. Laboratory investigators, if they are to be honest, must adjust the critical value of P if they analyse their data repeatedly. I suggest they should consider employing matched-pairs sequential analysis in designing their experiments.
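
The statistical penalty for uncontrolled peeking is easy to demonstrate by simulation; the sketch below, with invented settings, repeatedly tests accumulating data generated under a true null and stops at the first P < 0.05, and the realized Type I error rate comes out far above the nominal 5%. It illustrates the problem, not Armitage's sequential procedure itself.

```python
import numpy as np
from scipy import stats

# Simulated 'peeking': an uncorrected t-test after every new experimental unit,
# stopping as soon as P < 0.05, under a true null hypothesis.
rng = np.random.default_rng(11)
n_max, n_sim, alpha = 50, 2000, 0.05
false_positives = 0

for _ in range(n_sim):
    x = rng.normal(0, 1, n_max)        # treated units, no true effect
    y = rng.normal(0, 1, n_max)        # control units
    for n in range(5, n_max + 1):      # look at the data after every new unit
        if stats.ttest_ind(x[:n], y[:n]).pvalue < alpha:
            false_positives += 1
            break

print("Type I error with repeated uncorrected looks:",
      false_positives / n_sim)         # well above the nominal 0.05
```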

Journal ArticleDOI
TL;DR: The BK-Plot provides a simple method to understand generalizability in randomized trials by investigating how randomized trials can also stochastically answer the question of what would be the effect of treatment on outcome in a population with a possibly different distribution of an unobserved binary baseline variable.
Abstract: Background Randomized trials stochastically answer the question, "What would be the effect of treatment on outcome if one turned back the clock and switched treatments in the given population?" Generalizations to other subjects are reliable only if the particular trial is performed on a random sample of the target population. By considering an unobserved binary variable, we graphically investigate how randomized trials can also stochastically answer the question, "What would be the effect of treatment on outcome in a population with a possibly different distribution of an unobserved binary baseline variable that does not interact with treatment in its effect on outcome?"

Journal ArticleDOI
TL;DR: A novel approach is proposed that uses the randomization distribution to compute the anticipated maximum bias when missing at random does not hold due to an unobserved binary covariate (implying that missingness depends on outcome and treatment group).
Abstract: Many randomized trials involve missing binary outcomes. Although many previous adjustments for missing binary outcomes have been proposed, none of these makes explicit use of randomization to bound the bias when the data are not missing at random.

Journal ArticleDOI
TL;DR: PSE has promise for obtaining an upper bound on the reduction in population cancer mortality rates based on observational screening data; if the upper bound estimate is found to be small and any birth cohort effects are likely minimal, then a definitive randomized trial would not be warranted.
Abstract: Because randomized cancer screening trials are very expensive, observational cancer screening studies can play an important role in the early phases of screening evaluation. Periodic screening evaluation (PSE) is a methodology for estimating the reduction in population cancer mortality from data on subjects who receive regularly scheduled screens. Although PSE does not require assumptions about the natural history of cancer, it requires other assumptions, particularly progressive detection – the assumption that once a cancer is detected by a screening test, it will always be detected by the screening test. We formulate a simple version of PSE and show that it leads to an upper bound on screening efficacy if the progressive detection assumption does not hold (and any effect of birth cohort is minimal). To determine if the upper bound is reasonable, for three randomized screening trials, we compared PSE estimates based only on screened subjects with PSE estimates based on all subjects. In the three randomized screening trials, PSE estimates based on screened subjects gave fairly close results to PSE estimates based on all subjects. PSE has promise for obtaining an upper bound on the reduction in population cancer mortality rates based on observational screening data. If the upper bound estimate is found to be small and any birth cohort effects are likely minimal, then a definitive randomized trial would not be warranted.

Journal ArticleDOI
TL;DR: This study indicates that high correlation coefficients can be considered a leading sign of data fabrication, as more than 40% of the participants generated variables with correlation coefficients greater than 0.70.
Abstract: Misconduct in medical research has been the subject of many papers in recent years. Among different types of misconduct, data fabrication might be considered one of the most severe cases. There have been some arguments that correlation coefficients in fabricated data-sets are usually greater than those found in real data-sets. We aim to study the differences between real and fabricated data-sets in terms of the association between two variables. Three examples are presented where outcomes from made-up (fabricated) data-sets are compared with the results from three real data-sets and with appropriate simulated data-sets. Data-sets were made up by faculty members in three universities. The first two examples are devoted to the correlation structures between continuous variables in two different settings: first, when there is a high correlation coefficient between variables; second, when the variables are not correlated. In the third example the differences between real and fabricated data-sets are studied using the independent t-test for comparison between two means. In general, higher correlation coefficients are seen in made-up data-sets compared to the real data-sets. This occurs even when the participants are aware that the correlation coefficient for the corresponding real data-set is zero. The findings from the third example, a comparison between means in two groups, show that many people tend to make up data with smaller or no differences between groups even when they know how and to what extent the groups are different. This study indicates that high correlation coefficients can be considered a leading sign of data fabrication, as more than 40% of the participants generated variables with correlation coefficients greater than 0.70. However, when inspecting for differences between means in different groups, the same rule may not be applicable, as we observed smaller differences between groups in made-up data-sets compared to the real data-set. We also showed that inspecting the scatter-plot of two variables can be considered a useful tool for uncovering fabricated data.

Journal ArticleDOI
TL;DR: An innovative and practical design for the boxes for packing the drugs is described as a way of increasing the security of allocation concealment and blinding, and ascertainment bias is assessed using sensitivity analyses.
Abstract: Background: The aim of this article is to explore ways in which selection bias and ascertainment bias can be reduced and investigated in trials, by using the example of a drug trial carried out in both developed and developing countries in hospital delivery wards. Methods: We describe an innovative and practical design for the boxes for packing the drugs as a way of increasing the security of allocation concealment and blinding. We also assess ascertainment bias using sensitivity analyses, as some unblinding could have occurred due to a potential side effect of one of the drugs. Results: The sensitivity analyses indicated that the conclusions about the relative effects of the treatments could be maintained even in the unlikely worst-case scenarios. Conclusions: Detailed description of the procedures protecting against common biases and of the assessment of ascertainment bias in this trial should allow readers to confidently appraise and interpret the results obtained. In addition, our experiences will assist others in planning trials in the future.

Journal ArticleDOI
TL;DR: The full likelihood method provides a new imputation tool for trials with surrogate data; including the surrogate data yielded an estimate of the treatment effect that was more precise than an estimate based on the true endpoint data alone.
Abstract: The Anglia Menorrhagia Education Study (AMES) is a randomized controlled trial testing the effectiveness of an education package applied to general practices. Binary data are available from two sources: general practitioner reported referrals to hospital, and referrals to hospital determined by independent audit of the general practices. The former may be regarded as a surrogate for the latter, which is regarded as the true endpoint. Data are only available for the true endpoint on a subset of the practices, but there are surrogate data for almost all of the audited practices and for most of the remaining practices. The aim of this paper was to estimate the treatment effect using data from every practice in the study. Where the true endpoint was not available, it was estimated by three approaches: a regression method, multiple imputation and a full likelihood model. Including the surrogate data in the analysis yielded an estimate of the treatment effect which was more precise than an estimate gained from using the true endpoint data alone. The full likelihood method provides a new imputation tool at the disposal of trials with surrogate data.
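
A toy version of the regression-style approach (not the full likelihood model) is sketched below: calibrate the true endpoint against the surrogate where both were observed, fill in the rest, and re-estimate the treatment effect. All data, variable names and the practice-level summary outcome are invented, and single imputation like this understates uncertainty, which is one reason the paper also considers multiple imputation and a full likelihood model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated practice-level data: audited (true) referral rate, GP-reported
# (surrogate) rate, and a treatment indicator; the true rate is observed only
# on a random subset of practices.
rng = np.random.default_rng(21)
n = 120
treat = rng.integers(0, 2, n)
true_rate = np.clip(0.30 - 0.05 * treat + rng.normal(0, 0.05, n), 0, 1)
surrogate = np.clip(true_rate + rng.normal(0, 0.03, n), 0, 1)
audited = rng.random(n) < 0.5
df = pd.DataFrame({"treat": treat, "surrogate": surrogate,
                   "true_rate": np.where(audited, true_rate, np.nan)})

# Step 1: calibrate the true endpoint against the surrogate where both are observed
calib = smf.ols("true_rate ~ surrogate + treat", data=df.dropna()).fit()

# Step 2: fill in the missing true endpoint and estimate the treatment effect
df["completed"] = df["true_rate"].fillna(pd.Series(calib.predict(df), index=df.index))
effect = smf.ols("completed ~ treat", data=df).fit()
print(effect.params["treat"], effect.bse["treat"])   # note: naive SE, too small
```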