Running experiments on Amazon Mechanical Turk

Posted Content•

Gabriele Paolacci¹, Jesse Chandler², Panagiotis G. Ipeirotis³

Ca' Foscari University of Venice¹, Princeton University², New York University³

01 Jan 2010-Judgment and Decision Making (Society for Judgment and Decision Making)-Vol. 5, Iss: 5, pp 411-419

TL;DR: The authors presented new demographic data about the Mechanical Turk subject population, reviewed the strengths of Mechanical Turk relative to other online and offline methods of recruiting subjects, and compared the magnitude of effects obtained using Mechanical Turk and traditional subject pools.

Abstract: Although Mechanical Turk has recently become popular among social scientists as a source of experimental data, doubts may linger about the quality of data provided by subjects recruited from online labor markets. We address these potential concerns by presenting new demographic data about the Mechanical Turk subject population, reviewing the strengths of Mechanical Turk relative to other online and offline methods of recruiting subjects, and comparing the magnitude of effects obtained using Mechanical Turk and traditional subject pools. We further discuss some additional benefits, such as the possibility of longitudinal, cross-cultural, and prescreening designs, and offer some advice on how to best manage a common subject pool.

Citations

Journal Article•DOI•

Evaluating Online Labor Markets for Experimental Research: Amazon.com's Mechanical Turk

Adam J. Berinsky¹, Gregory A. Huber², Gabriel S. Lenz³

Massachusetts Institute of Technology¹, Yale University², University of California, Berkeley³

02 Mar 2012-Political Analysis

TL;DR: It is shown that respondents recruited in this manner are often more representative of the U.S. population than in-person convenience samples but less representative than subjects in Internet-based panels or national probability samples.

Abstract: We examine the trade-offs associated with using Amazon.com’s Mechanical Turk (MTurk) interface for subject recruitment. We first describe MTurk and its promise as a vehicle for performing low-cost and easy-to-field experiments. We then assess the internal and external validity of experiments performed using MTurk, employing a framework that can be used to evaluate other subject pools. We first investigate the characteristics of samples drawn from the MTurk population. We show that respondents recruited in this manner are often more representative of the U.S. population than in-person convenience samples—the modal sample in published experimental political science—but less representative than subjects in Internet-based panels or national probability samples. Finally, we replicate important published experimental work using MTurk samples.

3,517 citations

Journal Article•DOI•

Conducting behavioral research on Amazon's Mechanical Turk.

Winter Mason¹, Siddharth Suri¹

Yahoo!¹

01 Mar 2012-Behavior Research Methods

TL;DR: It is shown that when taken as a whole Mechanical Turk can be a useful tool for many researchers, and how the behavior of workers compares with that of experts and laboratory subjects is discussed.

Abstract: Amazon’s Mechanical Turk is an online labor market where requesters post jobs and workers choose which jobs to do for pay. The central purpose of this article is to demonstrate how to use this Web site for conducting behavioral research and to lower the barrier to entry for researchers who could benefit from this platform. We describe general techniques that apply to a variety of types of research and experiments across disciplines. We begin by discussing some of the advantages of doing experiments on Mechanical Turk, such as easy access to a large, stable, and diverse subject pool, the low cost of doing experiments, and faster iteration between developing theory and executing experiments. While other methods of conducting behavioral research may be comparable to or even better than Mechanical Turk on one or more of the axes outlined above, we will show that when taken as a whole Mechanical Turk can be a useful tool for many researchers. We will discuss how the behavior of workers compares with that of experts and laboratory subjects. Then we will illustrate the mechanics of putting a task on Mechanical Turk, including recruiting subjects, executing the task, and reviewing the work that was submitted. We also provide solutions to common problems that a researcher might face when executing their research on this platform, including techniques for conducting synchronous experiments, methods for ensuring high-quality work, how to keep data private, and how to maintain code security.

2,521 citations

Journal Article•DOI•

Inside the Turk: Understanding Mechanical Turk as a Participant Pool

Gabriele Paolacci¹, Jesse Chandler²

Erasmus University Rotterdam¹, University of Michigan²

03 Jun 2014-Current Directions in Psychological Science

TL;DR: This paper discusses the characteristics of Mechanical Turk as a participant pool for psychology and other social sciences, highlighting the traits of MTurk samples, why people become Mechanical Turk workers and research participants, and how data quality on Mechanical Turk compares to that from other pools and depends on controllable and uncontrollable factors.

Abstract: Mechanical Turk (MTurk), an online labor market created by Amazon, has recently become popular among social scientists as a source of survey and experimental data. The workers who populate this market have been assessed on dimensions that are universally relevant to understanding whether, why, and when they should be recruited as research participants. We discuss the characteristics of MTurk as a participant pool for psychology and other social sciences, highlighting the traits of the MTurk samples, why people become MTurk workers and research participants, and how data quality on MTurk compares to that from other pools and depends on controllable and uncontrollable factors.

1,926 citations

Journal Article•DOI•

Data collection in a flat world: the strengths and weaknesses of Mechanical Turk samples

Joseph K. Goodman¹, Cynthia Cryder¹, Amar Cheema²

Washington University in St. Louis¹, University of Virginia²

01 Jul 2013-Journal of Behavioral Decision Making

TL;DR: The authors compared Mechanical Turk participants with community and student samples on a set of personality dimensions and classic decision-making biases and found that MTurk participants are less extraverted and have lower self-esteem than other participants, presenting challenges for some research domains.

Abstract: Mechanical Turk (MTurk), an online labor system run by Amazon.com, provides quick, easy, and inexpensive access to online research participants. As use of MTurk has grown, so have questions from behavioral researchers about its participants, reliability, and low compensation. In this article, we review recent research about MTurk and compare MTurk participants with community and student samples on a set of personality dimensions and classic decision-making biases. Across two studies, we find many similarities between MTurk participants and traditional samples, but we also find important differences. For instance, MTurk participants are less likely to pay attention to experimental materials, reducing statistical power. They are more likely to use the Internet to find answers, even with no incentive for correct responses. MTurk participants have attitudes about money that are different from a community sample’s attitudes but similar to students’ attitudes. Finally, MTurk participants are less extraverted and have lower self-esteem than other participants, presenting challenges for some research domains. Despite these differences, MTurk participants produce reliable results consistent with standard decision-making biases: they are present biased, risk-averse for gains, risk-seeking for losses, show delay/expedite asymmetries, and show the certainty effect—with almost no significant differences in effect sizes from other samples. We conclude that MTurk offers a highly valuable opportunity for data collection and recommend that researchers using MTurk (1) include screening questions that gauge attention and language comprehension; (2) avoid questions with factual answers; and (3) consider how individual differences in financial and social domains may influence results. Copyright © 2012 John Wiley & Sons, Ltd.

1,755 citations

Journal Article•DOI•

Beyond the Turk: Alternative platforms for crowdsourcing behavioral research

Eyal Peer¹, Laura Brandimarte², Sonam Samat³, Alessandro Acquisti³

Bar-Ilan University¹, University of Arizona², Carnegie Mellon University³

01 May 2017-Journal of Experimental Social Psychology

TL;DR: This article found that participants on both alternative platforms, Prolific Academic (ProA) and CrowdFlower (CF), were more naive and less dishonest than MTurk participants, and that ProA participants produced data quality that was higher than CF's and comparable to MTurk's.

1,537 citations

References

Journal Article•DOI•

The Framing of Decisions and the Psychology of Choice

Amos Tversky¹, Daniel Kahneman²

Stanford University¹, University of British Columbia²

30 Jan 1981-Science

TL;DR: The psychological principles that govern the perception of decision problems and the evaluation of probabilities and outcomes produce predictable shifts of preference when the same problem is framed in different ways.

Abstract: The psychological principles that govern the perception of decision problems and the evaluation of probabilities and outcomes produce predictable shifts of preference when the same problem is framed in different ways. Reversals of preference are demonstrated in choices regarding monetary outcomes, both hypothetical and real, and in questions pertaining to the loss of human lives. The effects of frames on preferences are compared to the effects of perspectives on perceptual appearance. The dependence of preferences on the formulation of decision problems is a significant concern for the theory of rational choice.

15,513 citations

Journal Article•DOI•

Amazon's Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data?

Michael D. Buhrmester¹, Tracy Kwang¹, Samuel D. Gosling¹

University of Texas at Austin¹

03 Feb 2011-Perspectives on Psychological Science

TL;DR: Findings indicate that MTurk can be used to obtain high-quality data inexpensively and rapidly and the data obtained are at least as reliable as those obtained via traditional methods.

Abstract: Amazon's Mechanical Turk (MTurk) is a relatively new website that contains the major elements required to conduct research: an integrated participant compensation system; a large participant pool; and a streamlined process of study design, participant recruitment, and data collection. In this article, we describe and evaluate the potential contributions of MTurk to psychology and other social sciences. Findings indicate that (a) MTurk participants are slightly more demographically diverse than are standard Internet samples and are significantly more diverse than typical American college samples; (b) participation is affected by compensation rate and task length, but participants can still be recruited rapidly and inexpensively; (c) realistic compensation rates do not affect data quality; and (d) the data obtained are at least as reliable as those obtained via traditional methods. Overall, MTurk can be used to obtain high-quality data inexpensively and rapidly.

9,562 citations

Journal Article•DOI•

On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications.

Martin T. Orne¹

Harvard University¹

01 Nov 1962-American Psychologist

TL;DR: In this paper, the authors focus on some of the qualities peculiar to psychological experiments and point out that the demand characteristics perceived in any particular experiment will vary with the sophistication, intelligence, and previous experience of each experimental subject.

Abstract: Since the time of Galileo, scientists have employed the laboratory experiment as a method of understanding natural phenomena. This chapter focuses on some of the qualities peculiar to psychological experiments. The experimental situation is one which takes place within the context of an explicit agreement of the subject to participate in a special form of social interaction known as "taking part in an experiment". The demand characteristics perceived in any particular experiment will vary with the sophistication, intelligence, and previous experience of each experimental subject. It becomes an empirical issue to study under what circumstances, in what kind of experimental contexts, and with what kind of subject populations, demand characteristics become significant in determining the behavior of subjects in experimental situations. The most obvious technique for determining what demand characteristics are perceived is the use of post-experimental inquiry. In this regard, it is well to point out that considerable self-discipline is necessary for the experimenter to obtain a valid inquiry.

3,634 citations

Journal Article•DOI•

Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment.

Amos Tversky¹, Daniel Kahneman

Stanford University¹

01 Jun 1983-Psychological Review

TL;DR: The conjunction rule states that the probability of a conjunction, P(A&B), cannot exceed the probabilities of its constituents, P(A) and P(B), because the extension (or the possibility set) of the conjunction is included in the extension of its constituents.

Abstract: Perhaps the simplest and the most basic qualitative law of probability is the conjunction rule: The probability of a conjunction, P(A&B), cannot exceed the probabilities of its constituents, P(A) and P(B), because the extension (or the possibility set) of the conjunction is included in the extension of its constituents. Judgments under uncertainty, however, are often mediated by intuitive heuristics that are not bound by the conjunction rule. A conjunction can be more representative than one of its constituents, and instances of a specific category can be easier to imagine or to retrieve than instances of a more inclusive category. The representativeness and availability heuristics therefore can make a conjunction appear more probable than one of its constituents. This phenomenon is demonstrated in a variety of contexts including estimation of word frequency, personality judgment, medical prognosis, decision under risk, suspicion of criminal acts, and political forecasting. Systematic violations of the conjunction rule are observed in judgments of lay people and of experts in both between-subjects and within-subjects comparisons. Alternative interpretations of the conjunction fallacy are discussed and attempts to combat it are explored.
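
The conjunction rule stated in this abstract follows from the monotonicity of probability. As a minimal sketch, assuming A and B are treated as events (sets of outcomes) and the conjunction A&B as their intersection, which is a notational assumption here and not necessarily the authors' own formalism:

```latex
\begin{align*}
A \cap B \subseteq A \quad &\Rightarrow \quad P(A \cap B) \le P(A), \\
A \cap B \subseteq B \quad &\Rightarrow \quad P(A \cap B) \le P(B), \\
\text{hence} \quad P(A \cap B) &\le \min\{P(A),\, P(B)\}.
\end{align*}
```

A judgment that rates the conjunction as more probable than either constituent violates this inequality, which is the error the abstract refers to as the conjunction fallacy.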

3,221 citations

Journal Article•DOI•

Should we trust web-based studies? A comparative analysis of six preconceptions about internet questionnaires.

Samuel D. Gosling¹, Simine Vazire¹, Sanjay Srivastava², Oliver P. John³

University of Texas at Austin¹, Stanford University², University of California, Berkeley³

01 Feb 2004-American Psychologist

TL;DR: Internet data collection methods, with a focus on self-report questionnaires from self-selected samples, are evaluated and compared with traditional paper-and-pencil methods and it is concluded that Internet methods can contribute to many areas of psychology.

Abstract: The rapid growth of the Internet provides a wealth of new research opportunities for psychologists. Internet data collection methods, with a focus on self-report questionnaires from self-selected samples, are evaluated and compared with traditional paper-and-pencil methods. Six preconceptions about Internet samples and data quality are evaluated by comparing a new large Internet sample (N = 361,703) with a set of 510 published traditional samples. Internet samples are shown to be relatively diverse with respect to gender, socioeconomic status, geographic region, and age. Moreover, Internet findings generalize across presentation formats, are not adversely affected by nonserious or repeat responders, and are consistent with findings from traditional methods. It is concluded that Internet methods can contribute to many areas of psychology.

2,870 citations