Fishing, Commitment, and Communication: A Proposal for
Comprehensive Nonbinding Research Registration
Macartan Humphreys
Columbia University
Raul Sanchez de la Sierra
Columbia University
Peter van der Windt
Columbia University
May 29, 2012
Abstract
Social scientists generally enjoy substantial latitude in selecting measures and models
for hypothesis testing. Coupled with publication and related biases, this latitude raises
the concern that researchers may intentionally or unintentionally select models that yield
positive findings, leading to an unreliable body of published research. To combat this
“fishing” problem in medical studies, leading journals now require preregistration of designs
that emphasize the prior identification of dependent and independent variables. However,
we demonstrate here that even with this level of advanced specification, the scope for
fishing is considerable when there is latitude over selection of covariates, subgroups, and
other elements of an analysis plan. These concerns could be addressed through the use
of a form of comprehensive registration. We experiment with such an approach in the
context of an ongoing field experiment for which we drafted a complete “mock report”
of findings using fake data on treatment assignment. We describe the advantages and
disadvantages of this form of registration and propose that a comprehensive but nonbinding approach be adopted as a first step to combat fishing by social scientists. Likely effects of comprehensive but nonbinding registration are discussed, the principal advantage being communication rather than commitment, in particular that it generates a clear distinction between exploratory analyses and genuine tests.
Published in Political Analysis: 2013, 21(1), 1-20. Corresponding author: mh2245@columbia.edu. Our
thanks to Ali Cirone, Andy Gelman, Ryan Moore, and Ferran Elias Moreno for helpful comments. Our thanks
to the Population Center at Columbia for providing access to the High Performance Computing (HPC) Cluster.
This research was undertaken in the context of a field experiment in DRC; we thank the International Rescue
Committee and CARE International for their partnership in that research and the International Initiative for
Impact Evaluation (3IE) for their support. Humphreys thanks the Trudeau Foundation for support while this
work was undertaken. Replication data and code for tables and figures can be found at http://hdl.handle.net/1902.1/18182.

1 Introduction
There is a growing concern regarding reporting and publication bias in experimental and
observational work in social science arising from the intentional or unintentional practice
of data fishing. The adoption of registries provides one possible response to the problem; but while the potential benefits of preregistration of research designs are easily grasped, there has been essentially no adoption of the practice by political scientists. Moreover, even with
agreement on registration there may be disagreement on what exactly should get registered
and the implications of this practice. These choices are consequential, and merit a focused
discussion in the discipline.
In this paper we provide a description of the scope for bias, an illustration of a candidate
solution, and a proposal for the introduction of registration procedures in political science.
We begin with a short discussion of the fishing problem (Section 2) before reviewing the
state of registration practice in Section 3. Section 3 also provides a set of simulation results
that demonstrate the scope for fishing that might exist even in the presence of basic registration
procedures such as those used in medical sciences. In Section 4 we discuss a limiting form of
comprehensive registration in which there is full detailing of an analysis and reporting plan.
We field-tested this approach to registration in the context of a randomized intervention on
development aid that we are undertaking in Congo. At the risk of losing freedom during the analysis phase, we drafted a complete “Mock Report” of findings based on simulated data. As
we describe below, this informal experiment revealed both benefits and practical difficulties
that researchers may face when attempting comprehensive registration.
After reviewing the Mock Report and the main lessons that it generated, we propose an
approach to registration for political science research in Section 5 that involves the creation of
a nonbinding mechanism hosted by a third party and enforced by journals. After presenting
the proposal we describe possible effects. First we focus on the informational effects of a
nonbinding registration regime, arguing that nonbinding registration plays an important communication function independent of any commitment benefits. Second, considering that the
institutional change we are recommending can generate a response in the researcher’s choices,
we use a simple model to discuss the kinds of incentive effects (positive and negative) that
might arise from a registration requirement. Section 6 concludes.
2 Fishing as a Reporting Problem
We begin with a very broad definition of data fishing. Say that there is a set of models M(C) = {M_1, M_2, . . . , M_n} that could be examined in any given context C. We use “model”
in the broad sense to cover inquiries, specific tests, hypotheses, or procedures for assessing
collections of hypotheses. By context we mean the setting in which data is gathered, which
encompasses the cases used as well as the type of data available. Say then that each model
is associated with a set of possible conclusions and that prior to the implementation of the
research the conclusion to be drawn from model j in context C is a random variable. We say
that a model is “fished” when the decision to report the model depends on the realization of
the conclusion.
A few features of this definition are worth noting. First, under this definition both the
models that are and are not selected for reporting suffer from the fishing problem, highlighting
that the problem lies as much with those results that are not reported as with those that are.
Second, the definition we give is deliberately broad in order to highlight that the problem of
fishing is not tied to classic hypothesis testing. Thus for example an investigation may seek
to assess simply how many sheep there are in a flock. One model might suggest counting the legs and dividing by four; another might rely on a headcount. Fishing would occur if the
researcher reported only whichever method yielded the largest estimate. Third, the use of
complex models that determine how to make use of submodels conditional on the data does not
constitute fishing if the models themselves are reported independent of results. For example
the researcher might examine the complex model that selects the maximum number generated
by the headcount and footcount approaches. Readers might not care for this approach and
worry that it is biased upwards, but the result is not fished. This leads to the fourth and
most important point. We deliberately distinguish the fishing problem from the problem of
assessing whether models are good or not and in doing so we locate the fishing problem in
reporting practices (which results end up being reported?) and not in the mode of analysis
(for example, whether multiple hypotheses are examined and how any complications arising
from this multiplicity are addressed).¹ This distinction is important for assessing the role of
registration independent of corrections that might be implemented to account for specification
searches. For example consider the complex model in which an investigator proposes to gather
20 outcome measures and declare a treatment effective if a t-test rejects the null of no effect
on at least one of them at the 95% level. Under our definition, this approach, which fails to
address the multiple comparisons problem, is not fished; it is simply unusually vulnerable to
Type 1 error. Conversely, it is also possible to address a specification search problem in a
seemingly reasonable way and still fish; for example the researcher may have 20 measures for each of two families of outcomes and employ some method such as the Bonferroni correction in each family, but report only the case that yields the more interesting results.

¹ We note that our definition differs from data snooping as described by White (2000). For White, snooping occurs “whenever a given dataset is used more than once for purposes of inference or model selection.” In this account the problem is located not in the reporting but in the inference; snooping, like data mining, may be a valuable way of undertaking analysis and the challenge taken up by White is to assess not how to avoid snooping but when the results from snooping are reliable.
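To see the scale of this vulnerability in the 20-outcome example above, the following simulation sketch may help; it is ours rather than the authors’, written in Python with arbitrary sample sizes and seed, and it assumes independent outcomes and a true treatment effect of zero on every one of them.

```python
# Sketch: family-wise Type 1 error of the rule "declare the treatment effective
# if any of 20 outcomes shows a significant t-test". Illustrative values only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_arm, n_outcomes, n_sims, alpha = 100, 20, 2000, 0.05

false_claims = 0
for _ in range(n_sims):
    treated = rng.normal(size=(n_per_arm, n_outcomes))    # no true effect anywhere
    control = rng.normal(size=(n_per_arm, n_outcomes))
    _, pvals = stats.ttest_ind(treated, control, axis=0)  # one t-test per outcome
    if (pvals < alpha).any():                              # the registered decision rule
        false_claims += 1

print(f"Share of null experiments declared 'effective': {false_claims / n_sims:.2f}")
print(f"Analytic benchmark, 1 - (1 - alpha)^20: {1 - (1 - alpha) ** n_outcomes:.2f}")
```

Roughly 64% of null experiments pass this rule. The point of the distinction drawn above is that, if the rule is declared in advance, readers can see this weakness for what it is; fishing enters only when the reported rule itself depends on which outcomes happened to come out significant.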
The problem with fishing is simple: selecting what results get reported can induce bias. If
for example classical statistical tests are reported only when they yield “significant” findings,
then false positives will be overreported, true negatives will be underreported, and overall
conclusions will be wrong.² But as highlighted above the problem extends beyond this type of testing and could obtain for example in approaches that focus on “Type S” errors, seeking to make statements of the form “θ_1 > θ_2, with confidence” (see for example Gelman and
Tuerlinckx (2000)). This kind of concern has led some to worry that “most current published
research findings are false” (Ioannidis (2005)). For evidence of the prevalence of these problems
in political science, see Gerber et al. (2001) and Gerber and Malhotra (2008). The evidence
provided by Gerber et al. (2001) of p-values concentrating just shy of the 5% mark (with a drop in concentration just above it) is consistent with a problem of selective reporting and not simply inappropriate adjustment of tests to account for multiple comparisons.
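The reporting logic behind such a pattern is easy to reproduce by simulation. The sketch below is ours, not the authors’; the share of true effects, the effect size, and the sample sizes are arbitrary illustrative values.

```python
# Sketch: many studies run a correct two-sample t-test, but only "significant"
# results are reported. Parameter values are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_studies, n_obs, alpha = 5000, 50, 0.05
share_true, true_effect = 0.2, 0.3   # 20% of studied effects are real

reported_false = reported_true = 0
for _ in range(n_studies):
    real = rng.random() < share_true
    effect = true_effect if real else 0.0
    treated = rng.normal(loc=effect, size=n_obs)
    control = rng.normal(loc=0.0, size=n_obs)
    if stats.ttest_ind(treated, control).pvalue < alpha:   # the reporting filter
        if real:
            reported_true += 1
        else:
            reported_false += 1

reported = reported_true + reported_false
print(f"Reported findings: {reported}")
print(f"Share of reported findings that are false positives: {reported_false / reported:.2f}")
```

Every individual test here is conducted correctly; the distortion in the published record comes entirely from which results survive the reporting filter.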
The problem, though most often discussed in terms of reporting and publication biases, enters at other stages, if more subtly: during analysis (if for example researchers find themselves more drawn to analyze patterns in those parts of the data where the action is), discussions (if researchers select stronger specifications as “preferred specifications” or interpret their findings ex post), at the time of submission (file-drawer bias), and in the attention subsequently paid
to results by others (where more counterintuitive positive findings get picked up in syllabi or
popular and social media).
We note that while fishing may be the product of outright fraud (Callaway, 2011) (or, as termed euphemistically by Glaeser (2006), “researcher initiative”), much more subtle mechanisms may also be at play. One possible cause may be the presence of a kind of inferential error that stems from the attempt to let the data speak before formal analysis. In particular, if more reliable tests are more likely to produce true positives, then one might correctly infer that of two candidate tests the one with the “significant” results is the more reliable one. In that case, while the inference that the more significant measure is more likely the right one may be correct, basing inferences on the significant measure only would not be.³
² Selective reporting does not necessarily induce bias. For example if the conclusion of interest is a possibility result—say that black swans exist—then the conclusion stands independent of the number of unreported failed tests.
³ Say a researcher has prior belief p that a proposition is true. The researcher runs two tests, M_1 and M_2; each test yields a response which is either positive or negative; that is, R_i ∈ {P, N}. Ex ante the researcher does not know the quality of the tests but expects that with probability q they are high quality (in set H), otherwise they are low quality (in set L). Say that for both tests the probability of observing a positive result if the proposition is false is φ and that the probability of observing a positive result is ψ_H if the test is high quality and ψ_L if it is low quality. Assume that φ is low (for example φ = 0.05) and that ψ_H > ψ_L. Assume types and results are drawn independently. In this case it is easy to show that Pr(M_1 ∈ H | R_1 = P, R_2 = N) > Pr(M_2 ∈ H | R_1 = P, R_2 = N) if and only if ψ_H > ψ_L. The researcher would then be right to conclude that Measure 1 is more reliable than Measure 2. But it would be a fallacy to then base inferences on Measure 1 only. If Measure 1 is really high quality then the chance of a false positive is just φ. However if the proposition were false the probability of seeing one positive and one negative result is much higher: 2φ(1 − φ).
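The inequality stated in footnote 3 can be verified numerically. The short sketch below is ours; the parameter values (p, q, φ, ψ_H, ψ_L) are illustrative choices consistent with the footnote's assumptions, and the calculation simply enumerates the joint distribution of the proposition, the two test qualities, and the two results.

```python
# Sketch: numerical check of footnote 3. All parameter values are illustrative.
from itertools import product

p, q = 0.5, 0.5            # prior that the proposition is true; prob a test is high quality
phi = 0.05                 # prob of a positive result when the proposition is false
psi_H, psi_L = 0.80, 0.40  # prob of a positive result when it is true, by test quality

def pr_positive(truth, quality):
    """Probability that a single test returns a positive result."""
    if not truth:
        return phi
    return psi_H if quality == "H" else psi_L

num1 = num2 = denom = 0.0
for truth, q1, q2 in product([True, False], ["H", "L"], ["H", "L"]):
    weight = (p if truth else 1 - p) \
        * (q if q1 == "H" else 1 - q) \
        * (q if q2 == "H" else 1 - q)
    # event of interest: R_1 positive, R_2 negative
    ev = weight * pr_positive(truth, q1) * (1 - pr_positive(truth, q2))
    denom += ev
    if q1 == "H":
        num1 += ev
    if q2 == "H":
        num2 += ev

print("Pr(M_1 in H | R_1 = P, R_2 = N) =", round(num1 / denom, 3))
print("Pr(M_2 in H | R_1 = P, R_2 = N) =", round(num2 / denom, 3))
print("False positive rate of one high-quality test:", phi)
print("Pr(one positive, one negative | proposition false):", round(2 * phi * (1 - phi), 3))
```

With these illustrative values the first posterior is roughly 0.64 against roughly 0.29 for the second, consistent with the footnote: the positive test is indeed more likely to be the high-quality one, yet the probability of observing exactly this positive-negative pattern when the proposition is false (about 0.095) is nearly twice the false positive rate of a single high-quality test.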

3 Design Registration
A solution to the problem is to adopt a practice that has been promoted in medical fields (De
Angelis et al. (2004)) and requires some form of preregistration of research design. Registration
provides a way for researchers to prespecify a set of tests and it allows readers to assess the
results of a test in the context of the family of related tests.⁴
3.1 Registration in Medical Studies
In medical studies the shift in norms took place in 2004 when the twelve journals belonging to
the International Committee of Medical Journal Editors (ICMJE) announced that they would
henceforth require “as a condition of consideration for publication, registration in a public
trials registry” (De Angelis et al. (2004)). The ICMJE elected to recognize registries only if
they meet several criteria: The registry must be electronically searchable and accessible to the
public at no charge; be open to all registrants and not for profit; and have a mechanism to
ensure the validity of the registration data (De Angelis et al. (2004)). The focus then was, and
to a large extent still is, on experimental studies although a number of journals now encourage
(but do not require) the registration of observational studies on a WHO-compliant registry
before they begin.⁵
At the time, only ClinicalTrials.gov maintained by the US National Institutes of Health
(NIH) complied with these requirements. Since then, the WHO’s International Clinical Trials Registry Platform (ICTRP) has developed a network of both primary and partner registers. Primary registers are WHO-selected registers that comply with the WHO’s 20 points of
minimal registration requirements. Important elements include 1) Unique trial number, 2) Research ethics review, 3) The medical condition being studied, 4) Description of interventions,
5) Key inclusion and exclusion criteria, 6) Study type, 7) Target sample size, 8) Description
of primary outcome, 9) Description of secondary outcomes. Note that the method of analysis
does not enter in the list although it may be described under measurement of outcomes.
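For concreteness, a minimal registration record covering the elements just listed might be sketched as follows; the field names and example values are ours and are purely illustrative, not the WHO/ICTRP schema.

```python
# Sketch of a minimal trial-registration record covering the nine elements listed
# above. Field names and example values are illustrative, not the WHO/ICTRP schema.
from dataclasses import dataclass, field

@dataclass
class TrialRegistration:
    trial_number: str                  # 1) unique trial number
    ethics_review: str                 # 2) research ethics review
    condition: str                     # 3) condition or outcome domain being studied
    interventions: list[str]           # 4) description of interventions
    inclusion_criteria: list[str]      # 5) key inclusion and exclusion criteria
    exclusion_criteria: list[str]
    study_type: str                    # 6) study type
    target_sample_size: int            # 7) target sample size
    primary_outcome: str               # 8) description of primary outcome
    secondary_outcomes: list[str] = field(default_factory=list)  # 9) secondary outcomes
    # Note that nothing above fixes the analysis itself (covariates, subgroups,
    # estimators), which is the latitude discussed in the rest of the paper.

example = TrialRegistration(
    trial_number="REG-0000 (placeholder)",
    ethics_review="Approved by university IRB (protocol number omitted)",
    condition="Local collective action and public goods provision",
    interventions=["Community-driven development program", "No program"],
    inclusion_criteria=["Villages in study provinces"],
    exclusion_criteria=["Villages already covered by a similar program"],
    study_type="Cluster-randomized trial",
    target_sample_size=500,
    primary_outcome="Index of local public goods provision",
    secondary_outcomes=["Participation of marginalized groups in local decisions"],
)
print(example.study_type, "-", example.primary_outcome)
```

Even a record this explicit leaves the analysis plan open, which is the gap that the comprehensive “mock report” approach described later is meant to close.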
There has been rapid growth in the last decade in the use of registration procedures. Before
the ICMJE policy in 2004 ClinicalTrials.gov contained 13,153 trials (Laine et al. (2007)),
⁴ There are other benefits to registration, including the ability to assess the population of studies that are never completed or never published. Here however we focus on the benefits of ex ante specification of analysis plans.

⁵ See, for example, Lancet (2010). A mechanism to report observational studies already exists on many registries. ClinicalTrials.gov, for example, has (as of 23 March 2012) 22,297 observational studies registered.

References

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society, Series B 57(1): 289–300.

Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine 2(8): e124.

King, Gary, Robert O. Keohane, and Sidney Verba. 1994. Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton, NJ: Princeton University Press.

Simmons, Joseph P., Leif D. Nelson, and Uri Simonsohn. 2011. “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22(11): 1359–1366.

White, Halbert. 2000. “A Reality Check for Data Snooping.” Econometrica 68(5): 1097–1126.