
Showing papers by "Steven N. Goodman" published in 2007


Journal ArticleDOI
TL;DR: Inspired by both the recent wave of scientific misconduct and Annals' own experience in trying to minimize the impact of bias in its pages, the authors review how to protect against publishing research that is misleading or outright wrong.
Abstract: Scientists arrive at the truth by independently verifying new observations. Journals aid this process by winnowing out research that is unlikely to stand up to independent verification and trying t...

192 citations


Journal ArticleDOI
TL;DR: The mathematical argument in the PLoS Medicine paper underlying the "proof" of the title's claim has a degree of circularity, and the paper's further claim of a paradox, that the more studies published in an area the more likely those studies are to make false claims, is shown to be erroneous.
Abstract: The article published in PLoS Medicine by Ioannidis [1] makes the dramatic claim in the title that "most published research claims are false," and has received extensive attention as a result. The article does provide a useful reminder that the probability of hypotheses depends on much more than just the p-value, a point made in the medical literature for at least four decades, and in the statistical literature for decades before that. This topic has renewed importance with the advent of the massive multiple testing often seen in genomics studies. Unfortunately, while we agree that there are more false claims than many would suspect, based on poor study design, misinterpretation of p-values, and perhaps analytic manipulation, the mathematical argument in the PLoS Medicine paper underlying the "proof" of the title's claim has a degree of circularity. As we show in detail in a separately published paper [2], Dr. Ioannidis utilizes a mathematical model that severely diminishes the evidential value of studies, even meta-analyses, such that none can produce more than modest evidence against the null hypothesis, and most are far weaker. This is why, in the offered "proof," the only study types that achieve a posterior probability of 50% or more (large RCTs [randomized controlled trials] and meta-analyses of RCTs) are those to which a prior probability of 50% or more is assigned. So the model employed cannot be considered a proof that most published claims are untrue; it is rather a claim that no study or combination of studies can ever provide convincing evidence. The two assumptions that produce this effect are:
- Calculating the evidential effect only of verdicts of "significance" (i.e., p ≤ 0.05), instead of the actual p-value observed in a study (e.g., p = 0.001).
- Introducing a new "bias" term into the Bayesian calculations, which even at a described "minimal" level (of 10%) very dramatically diminishes a study's evidential impact.
In addition to these problems, the paper claims to have proven something it describes as paradoxical: that the "hotter" an area is (i.e., the more studies published), the more likely studies in that area are to make false claims. We have shown this claim to be erroneous [2]. The mathematical proof offered in the PLoS Medicine paper shows merely that the more studies are published on any subject, the higher the absolute number of false positive (and false negative) studies. It does not show what the paper's graphs and text claim, viz., that false claims will make up a higher proportion of the total number of studies published (i.e., that the positive predictive value of each study decreases as the number of studies increases). The paper offers useful guidance in a number of areas, calling attention to the importance of avoiding all forms of bias and of obtaining more empirical research both on the prevalence of various forms of bias and on the determinants of the prior odds of hypotheses. But the claims that the model employed in this paper constitutes a "proof" that most published medical research claims are false, and that research in "hot" areas is most likely to be false, are unfounded.
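The Bayesian arithmetic at issue is compact enough to sketch. The Python snippet below is a minimal sketch, not the authors' own code: it implements a positive-predictive-value calculation of the kind described above, with R the pre-study odds of the hypothesis, alpha and beta the type I and II error rates, and u the bias term. The particular parameter values in the loop are illustrative choices, not figures from either paper.

```python
# Positive predictive value (PPV) of a "significant" finding under the
# dichotomize-plus-bias model critiqued above. Parameter values are
# illustrative assumptions, not taken from either paper.

def ppv(R, alpha=0.05, beta=0.20, u=0.0):
    """PPV given pre-study odds R, type I error alpha, type II error beta,
    and bias u (fraction of otherwise-negative results reported positive)."""
    true_pos = (1 - beta) * R + u * beta * R   # true relationships called positive
    false_pos = alpha + u * (1 - alpha)        # null relationships called positive
    return true_pos / (true_pos + false_pos)

for R in (0.25, 1.0, 2.0):                     # pre-study odds: priors of 20%, 50%, 67%
    prior = R / (1 + R)
    cols = "  ".join(f"u={u:.1f}: PPV={ppv(R, u=u):.2f}" for u in (0.0, 0.1, 0.3))
    print(f"prior={prior:.2f}  {cols}")
```

Run as-is, the output shows the pattern described above: dichotomization alone caps the likelihood ratio at (1 - beta)/alpha, and raising the bias term pulls the posterior probability sharply back toward the prior, so low-prior hypotheses cannot end up convincingly supported.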

84 citations


Journal ArticleDOI
TL;DR: This commentary reviews the argument that the primary purpose of a trial is to obtain an accurate assessment of the risks and benefits associated with a given treatment, and that the goals of error control and accurate estimation can therefore come into direct conflict.
Abstract: This commentary reviews the argument that clinical trials with data monitoring committees that use statistical stopping guidelines should generally not be stopped early for large observed efficacy differences because efficacy estimates may be exaggerated and there is minimal information on treatment harms. Overall, the average of estimates from trials that use these boundaries differs minimally from the true value. Estimates from a given trial that seem implausibly high can be moderated by using Bayesian methods. Data monitoring committees are not ethically required to precisely estimate a large efficacy difference if that difference differs convincingly from zero, and the requirement to detect harms and balance efficacy against harm depends on whether the nature of the harm is known or unknown before the trial.
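The "Bayesian methods" mentioned above can be sketched with a simple normal approximation: shrink the observed log hazard ratio toward a skeptical prior centered on no effect. This is a generic conjugate-update illustration, not the commentary's own analysis; the interim estimate and the prior scale below are hypothetical.

```python
import math

def shrink(log_hr_obs, se_obs, prior_mean=0.0, prior_sd=0.35):
    """Posterior mean/SD for a log hazard ratio under a normal 'skeptical'
    prior centered at no effect (prior_sd here is a hypothetical choice)."""
    w_data, w_prior = 1 / se_obs**2, 1 / prior_sd**2   # precision weights
    post_mean = (w_data * log_hr_obs + w_prior * prior_mean) / (w_data + w_prior)
    post_sd = math.sqrt(1 / (w_data + w_prior))
    return post_mean, post_sd

# Hypothetical interim result: an implausibly large benefit (HR = 0.50)
# estimated from the limited information available at an early look.
m, s = shrink(math.log(0.50), se_obs=0.25)
print(f"observed HR 0.50 -> posterior HR {math.exp(m):.2f} "
      f"(95% interval {math.exp(m - 1.96 * s):.2f} to {math.exp(m + 1.96 * s):.2f})")
```

The skeptical prior pulls the estimate toward the null, which is exactly the moderation of implausibly high early estimates that the commentary describes.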

73 citations


01 Jan 2007
TL;DR: Cardiac resynchronization may have a substantial impact on the most common mechanism of death among patients with advanced heart failure, and the pooled data also show a trend toward reduced all-cause mortality.
Abstract:
CONTEXT: Progressive heart failure is the most common mechanism of death among patients with advanced heart failure. Cardiac resynchronization, a pacemaker-based therapy for heart failure, enhances cardiac performance and quality of life, but its effect on mortality is uncertain.
OBJECTIVE: To determine whether cardiac resynchronization reduces mortality from progressive heart failure.
DATA SOURCES: MEDLINE (1966-2002), EMBASE (1980-2002), the Cochrane Controlled Trials Register (Second Quarter, 2002), the National Institutes of Health ClinicalTrials.gov database, the US Food and Drug Administration Web site, and reports presented at scientific meetings (1994-2002). Search terms included pacemaker, pacing, heart failure, dual-site, multisite, biventricular, resynchronization, and left ventricular preexcitation.
STUDY SELECTION: Eligible studies were randomized controlled trials of cardiac resynchronization for the treatment of chronic symptomatic left ventricular dysfunction that reported death, hospitalization for heart failure, or ventricular arrhythmia as outcomes. Of the 6883 potentially relevant reports initially identified, 11 reports of 4 randomized trials with 1634 total patients were included in the meta-analysis.
DATA EXTRACTION: Trial reports were reviewed independently by 2 investigators in an unblinded standardized manner.
DATA SYNTHESIS: Follow-up in the included trials ranged from 3 to 6 months. Pooled data from the 4 selected studies showed that cardiac resynchronization reduced death from progressive heart failure by 51% relative to controls (odds ratio [OR], 0.49; 95% confidence interval [CI], 0.25-0.93). Progressive heart failure mortality was 1.7% for cardiac resynchronization patients and 3.5% for controls. Cardiac resynchronization also reduced heart failure hospitalization by 29% (OR, 0.71; 95% CI, 0.53-0.96) and showed a trend toward reducing all-cause mortality (OR, 0.77; 95% CI, 0.51-1.18). Cardiac resynchronization was not associated with a statistically significant effect on non-heart-failure mortality (OR, 1.15; 95% CI, 0.65-2.02). Among patients with implantable cardioverter defibrillators, cardiac resynchronization had no clear impact on ventricular tachycardia or ventricular fibrillation (OR, 0.92; 95% CI, 0.67-1.27).
CONCLUSIONS: Cardiac resynchronization reduces mortality from progressive heart failure in patients with symptomatic left ventricular dysfunction. This finding suggests that cardiac resynchronization may have a substantial impact on the most common mechanism of death among patients with advanced heart failure. Cardiac resynchronization also reduces heart failure hospitalization and shows a trend toward reducing all-cause mortality.
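As an aside on the mechanics, pooled odds ratios of this kind are commonly obtained by inverse-variance weighting of the per-trial log odds ratios. The sketch below uses made-up per-trial numbers, not the four trials' actual data, and the published meta-analysis may well have pooled raw event counts instead.

```python
import math

# Fixed-effect (inverse-variance) pooling of odds ratios.
# The (OR, lower 95% CI, upper 95% CI) triples below are hypothetical.
trials = [
    (0.40, 0.15, 1.10),
    (0.55, 0.20, 1.50),
    (0.45, 0.18, 1.15),
    (0.60, 0.25, 1.45),
]

num = den = 0.0
for or_, lo, hi in trials:
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # back out SE from CI width
    w = 1 / se**2                                    # inverse-variance weight
    num += w * math.log(or_)
    den += w

pooled, se_pooled = num / den, math.sqrt(1 / den)
print(f"pooled OR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(pooled - 1.96 * se_pooled):.2f}"
      f"-{math.exp(pooled + 1.96 * se_pooled):.2f})")
```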

58 citations


Book ChapterDOI
01 Jan 2007
TL;DR: Molecular studies indicate that Cryptoprocta is part of a radiation of Carnivora endemic to Madagascar, which unites all of the native species on the island into a single clade, now recognized as the endemic family Eupleridae.
Abstract: The puma-like Cryptoprocta ferox is the largest living member of the Carnivora on Madagascar (Goodman et al., 2003). Cryptoprocta was a taxonomic enigma until recently (cf. Veron & Catzeflis, 1993; Veron, 1995), showing numerous morphological characters convergent with members of the Felidae. Some of these attributes, such as the semi-retractable claws used in both climbing and hunting, contributed to the long-running uncertainty about the phylogenetic relationships of this animal. Recent molecular studies indicate that Cryptoprocta is part of a radiation of Carnivora endemic to Madagascar that unites all of the native species on the island into a single clade (Yoder et al., 2003), now recognized as the endemic family Eupleridae (Wozencraft, in press). On the basis of molecular data, this radiation of Carnivora is slightly younger than that of the lemurs, but the two groups have co-existed on Madagascar for something on the order of 20 million years (Yoder et al., 2003). Until the Holocene, a second member of Cryptoprocta, notably larger than the living species, occurred on the island (Goodman et al., 2004).

44 citations


Journal ArticleDOI
TL;DR: It is thought that, for most conditions, surgical procedures, and outcomes, the accuracy of surgeon- and patient-specific performance rates is illusory, obviating the ethical obligation to communicate them as part of the informed consent process.
Abstract:
Objective: To examine the ethical arguments for and against disclosing surgeon-specific performance rates to patients during informed consent, and to examine the challenges that generating and using such performance rates entail.
Methods: Ethical, legal, and statistical theory is explored to approach the question of whether, when, and how surgeons should disclose their personal performance rates to patients. The main ethical question addressed is what type of information surgeons owe their patients during informed consent. This question comprises 3 related, ethically relevant considerations that are explored in detail: 1) Does surgeon-specific performance information enhance patient decision-making? 2) Do patients want this type of information? 3) How do the potential benefits of disclosure balance against the risks?
Results: Calculating individual performance measures requires tradeoffs and involves inherent uncertainty. There is a lack of evidence regarding whether patients want this information, whether it facilitates their decision-making for surgery, and how it is best communicated to them. Disclosure of personal performance rates during informed consent has the potential benefits of enhancing patient autonomy, improving patient decision-making, and improving quality of care. The major risks of disclosure include inaccurate and misleading performance rates, avoidance of high-risk cases, unjust damage to surgeons' reputations, and jeopardized patient trust.
Conclusion: At this time, we think that, for most conditions, surgical procedures, and outcomes, the accuracy of surgeon- and patient-specific performance rates is illusory, obviating the ethical obligation to communicate them as part of the informed consent process. Nonetheless, the surgical profession has a duty to develop information systems that allow performance to be evaluated to a high degree of accuracy. In the meantime, patients should be informed of the number of procedures their surgeons have performed, which provides an idea of the surgeon's experience and a qualitative sense of potential risk.
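The statistical side of the "illusory accuracy" point is easy to make concrete: at the case volumes an individual surgeon typically accrues, the confidence interval around a complication or mortality rate is far wider than the differences patients would care about. A minimal sketch, with hypothetical case counts:

```python
import math

def wilson_ci(events, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = events / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical surgeons with the same underlying 4% event rate but
# different case volumes: the interval narrows usefully only at volumes
# few individual surgeons ever reach.
for events, n in [(2, 50), (4, 100), (20, 500)]:
    lo, hi = wilson_ci(events, n)
    print(f"{events}/{n} cases ({events/n:.0%}): 95% CI {lo:.1%} to {hi:.1%}")
```

At 50 cases the interval spans roughly 1% to 13%, which is why comparing individual surgeons on such rates can mislead more than it informs.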

42 citations


01 Jan 2007
TL;DR: The authors examine the structure of the argument in Ioannidis (2005) and show that it has three basic components: 1) an assumption that the prior probability of most hypotheses explored in medical research is below 50%; 2) dichotomization of P-values at the 0.05 level and introduction of a bias factor (produced by significance-seeking), the combination of which severely weakens the evidence provided by every design; and 3) use of Bayes theorem to show that, in the face of such weakened evidence, hypotheses with low prior probabilities cannot attain posterior probabilities over 50%.
Abstract: A recent article in this journal (Ioannidis JP (2005) Why most published research findings are false. PLoS Med 2: e124) argued that more than half of published research findings in the medical literature are false. In this commentary, we examine the structure of that argument and show that it has three basic components:
1) An assumption that the prior probability of most hypotheses explored in medical research is below 50%.
2) Dichotomization of P-values at the 0.05 level and introduction of a "bias" factor (produced by significance-seeking), the combination of which severely weakens the evidence provided by every design.
3) Use of Bayes theorem to show that, in the face of weak evidence, hypotheses with low prior probabilities cannot have posterior probabilities over 50%.
Thus, the claim is based on a priori assumptions that most tested hypotheses are likely to be false, and the inferential model used then makes it impossible for evidence from any study to overcome this handicap. We focus largely on step (2), explaining how the combination of dichotomization and "bias" dilutes experimental evidence, and showing how this dilution leads inevitably to the stated conclusion. We also demonstrate a fallacy in another important component of the argument: that papers in "hot" fields are more likely to produce false findings.
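The "hot field" fallacy noted in the last sentence is also easy to check by simulation: holding the prior probability, alpha, and power fixed, the expected proportion of positive claims that are false does not change with the number of studies a field publishes; only the absolute count of false positives grows. A minimal sketch with illustrative parameters (no bias term included):

```python
import random

random.seed(1)
prior, alpha, power = 0.10, 0.05, 0.80   # illustrative values, held fixed

for n_studies in (100, 1_000, 10_000):
    true_pos = false_pos = 0
    for _ in range(n_studies):
        hyp_true = random.random() < prior                       # is the hypothesis true?
        significant = random.random() < (power if hyp_true else alpha)
        if significant:
            true_pos += hyp_true
            false_pos += not hyp_true
    total = true_pos + false_pos
    print(f"{n_studies:6d} studies: {false_pos:5d} false positives, "
          f"false fraction = {false_pos / max(total, 1):.2f}")
```

In expectation the false fraction stays near (1 - prior) * alpha / ((1 - prior) * alpha + prior * power), about 0.36 here, however "hot" the field is; only the raw count of false positives scales with the number of studies.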

35 citations


Journal ArticleDOI
TL;DR: This issue of the journal publishes an extraordinary pairing of two articles whose authors represent two different groups that lived with the evolving trial data in real time: the Clinical Coordinating Center (CCC) and the Data and Safety Monitoring Board (DSMB).
Abstract: For every generation of health scientists, there are touchstone clinical trials that shape their outlook on RCTs. From the 1960s and 1970s, we still talk about the Health Insurance Plan mammography trial [1], the Coronary Drug Project [2], and the University Group Diabetes Program (UGDP), the latter a trial featuring congressional hearings and a Supreme Court decision [3]. The late 1980s saw the counterintuitive results of the CAST trial [4], now invoked as an almost talismanic corrective to undue faith in surrogate endpoints. For the new century, there could not be a more powerful reminder of the value and complexity of RCTs than the hormone replacement trials of the Women's Health Initiative (WHI) [5,6]. They had all the elements of powerful drama: politics, power (statistical), money (a lot), sex (one), and . . . statistics. Even now, three years after the publication of the second trial's results, after literally thousands of journal and newspaper pages of subsequent commentary, scientific soul-searching, and clinical hand-wringing, the principals involved are still writing in depth about the issues they encountered.
In this issue of the journal, an extraordinary pairing of two such articles is published [7,8]. The authors represent two different groups that lived with the evolving data in real time: the Clinical Coordinating Center (CCC) and the Data and Safety Monitoring Board (DSMB). As in the fabled movie Rashomon, we have one story told from two perspectives, stories that are tantalizingly similar in broad outline, yet different in critical details. The details that differ are not the facts of the case; both groups were allowed to report the same data here so each article could stand alone, and to ensure that they told their stories independently. What differs is their tone and overall focus, a difference striking enough to make this pair required reading for current and future generations of trialists.
This difference in tone is mainly due to the different roles of a coordinating center and a monitoring board. The CCC's role was to be the guardian of the data, the analysis, and 'proper' scientific procedures, and to present the data to the DSMB and to the world. They struggled to develop monitoring procedures with the right degree of structure and flexibility. A telling phrase in their narrative is the rejection of procedures that they deemed 'unduly' ad hoc. This follows directly an exposition in which they describe other approaches (e.g., weighted versus unweighted statistics, multiplicity corrections) that had reasonable but informal justifications, and that might be described as 'duly' ad hoc. That ad hoc modifications needed to be made in the face of surprising and complex data is a given; but the line dividing 'unduly' from 'duly' ad hoc is less clear.
In contrast, the DSMB exposition is distinctly personal. While they discuss many of the same methodologic issues mentioned in the CCC paper, these are accompanied by a distinctly different lexicon, e.g., 'struggle', 'internally conflicted', 'emergency', 'cohesiveness', and 'group dynamics'. Statistical boundaries, weighted statistics, and harm-versus-benefit tradeoffs aside, we are made privy to the extraordinarily difficult task of deciding for others whether to 'knowingly' expose them to further risk, and the cost to society of various decisions. Perhaps most telling is the story of the votes at the most critical junctures: 5-4 in both cases, with different persons in the majority, and nobody having more than 55% confidence in their decision. The knife edge does not get any sharper than that. Epistemologists, ethicists, and historians of science, take note!
Finally, we get a rare inside look at the politics of monitoring a high-stakes clinical trial, with the DSMB having to deal with not only the questions above, but also the perspectives of the NIH, the NHLBI, and the CCC. In a world where the main focus of conflict-of-interest concerns is on industry-sponsored trials, the comments here on how structural relationships between government sponsors, DSMB, and investigators can enhance or impair decision-making deserve special attention. These papers are an important addition to the literature on data monitoring [9-13] and highlight a number of inescapable DSMB dilemmas. First, there is the question of how a DSMB balances its dual...

4 citations