Running experiments on Amazon Mechanical Turk

01 Jan 2010 - Judgment and Decision Making (Society for Judgment and Decision Making) - Vol. 5, Iss. 5, pp. 411-419
TL;DR: The authors presented new demographic data about the Mechanical Turk subject population, reviewed the strengths of Mechanical Turk relative to other online and offline methods of recruiting subjects, and compared the magnitude of effects obtained using Mechanical Turk and traditional subject pools.
Abstract: Although Mechanical Turk has recently become popular among social scientists as a source of experimental data, doubts may linger about the quality of data provided by subjects recruited from online labor markets. We address these potential concerns by presenting new demographic data about the Mechanical Turk subject population, reviewing the strengths of Mechanical Turk relative to other online and offline methods of recruiting subjects, and comparing the magnitude of effects obtained using Mechanical Turk and traditional subject pools. We further discuss some additional benefits such as the possibility of longitudinal, cross cultural and prescreening designs, and offer some advice on how to best manage a common subject pool.

Citations
Journal Article
TL;DR: It is shown that respondents recruited in this manner are often more representative of the U.S. population than in-person convenience samples but less representative than subjects in Internet-based panels or national probability samples.
Abstract: We examine the trade-offs associated with using Amazon.com’s Mechanical Turk (MTurk) interface for subject recruitment. We first describe MTurk and its promise as a vehicle for performing low-cost and easy-to-field experiments. We then assess the internal and external validity of experiments performed using MTurk, employing a framework that can be used to evaluate other subject pools. We first investigate the characteristics of samples drawn from the MTurk population. We show that respondents recruited in this manner are often more representative of the U.S. population than in-person convenience samples—the modal sample in published experimental political science—but less representative than subjects in Internet-based panels or national probability samples. Finally, we replicate important published experimental work using MTurk samples.

3,517 citations

Journal Article
Winter Mason, Siddharth Suri
TL;DR: It is shown that when taken as a whole Mechanical Turk can be a useful tool for many researchers, and how the behavior of workers compares with that of experts and laboratory subjects is discussed.
Abstract: Amazon’s Mechanical Turk is an online labor market where requesters post jobs and workers choose which jobs to do for pay. The central purpose of this article is to demonstrate how to use this Web site for conducting behavioral research and to lower the barrier to entry for researchers who could benefit from this platform. We describe general techniques that apply to a variety of types of research and experiments across disciplines. We begin by discussing some of the advantages of doing experiments on Mechanical Turk, such as easy access to a large, stable, and diverse subject pool, the low cost of doing experiments, and faster iteration between developing theory and executing experiments. While other methods of conducting behavioral research may be comparable to or even better than Mechanical Turk on one or more of the axes outlined above, we will show that when taken as a whole Mechanical Turk can be a useful tool for many researchers. We will discuss how the behavior of workers compares with that of experts and laboratory subjects. Then we will illustrate the mechanics of putting a task on Mechanical Turk, including recruiting subjects, executing the task, and reviewing the work that was submitted. We also provide solutions to common problems that a researcher might face when executing their research on this platform, including techniques for conducting synchronous experiments, methods for ensuring high-quality work, how to keep data private, and how to maintain code security.

2,521 citations

Journal Article
TL;DR: This article discusses the characteristics of Mechanical Turk as a participant pool for psychology and other social sciences, highlighting the traits of MTurk samples, why people become Mechanical Turk workers and research participants, and how data quality on Mechanical Turk compares to that from other pools and depends on controllable and uncontrollable factors.
Abstract: Mechanical Turk (MTurk), an online labor market created by Amazon, has recently become popular among social scientists as a source of survey and experimental data. The workers who populate this market have been assessed on dimensions that are universally relevant to understanding whether, why, and when they should be recruited as research participants. We discuss the characteristics of MTurk as a participant pool for psychology and other social sciences, highlighting the traits of the MTurk samples, why people become MTurk workers and research participants, and how data quality on MTurk compares to that from other pools and depends on controllable and uncontrollable factors.

1,926 citations

Journal Article
TL;DR: The authors compared Mechanical Turk participants with community and student samples on a set of personality dimensions and classic decision-making biases and found that MTurk participants are less extraverted and have lower self-esteem than other participants, presenting challenges for some research domains.
Abstract: Mechanical Turk (MTurk), an online labor system run by Amazon.com, provides quick, easy, and inexpensive access to online research participants. As use of MTurk has grown, so have questions from behavioral researchers about its participants, reliability, and low compensation. In this article, we review recent research about MTurk and compare MTurk participants with community and student samples on a set of personality dimensions and classic decision-making biases. Across two studies, we find many similarities between MTurk participants and traditional samples, but we also find important differences. For instance, MTurk participants are less likely to pay attention to experimental materials, reducing statistical power. They are more likely to use the Internet to find answers, even with no incentive for correct responses. MTurk participants have attitudes about money that are different from a community sample’s attitudes but similar to students’ attitudes. Finally, MTurk participants are less extraverted and have lower self-esteem than other participants, presenting challenges for some research domains. Despite these differences, MTurk participants produce reliable results consistent with standard decision-making biases: they are present biased, risk-averse for gains, risk-seeking for losses, show delay/expedite asymmetries, and show the certainty effect—with almost no significant differences in effect sizes from other samples. We conclude that MTurk offers a highly valuable opportunity for data collection and recommend that researchers using MTurk (1) include screening questions that gauge attention and language comprehension; (2) avoid questions with factual answers; and (3) consider how individual differences in financial and social domains may influence results. Copyright © 2012 John Wiley & Sons, Ltd.

1,755 citations

Journal Article
TL;DR: This article found that participants on CrowdFlower (CF) and Prolific Academic (ProA) were more naive and less dishonest than MTurk participants, and that ProA participants produced data quality that was higher than CF's and comparable to MTurk's.

1,537 citations

References
Journal Article
30 Jan 1981 - Science
TL;DR: The psychological principles that govern the perception of decision problems and the evaluation of probabilities and outcomes produce predictable shifts of preference when the same problem is framed in different ways.
Abstract: The psychological principles that govern the perception of decision problems and the evaluation of probabilities and outcomes produce predictable shifts of preference when the same problem is framed in different ways. Reversals of preference are demonstrated in choices regarding monetary outcomes, both hypothetical and real, and in questions pertaining to the loss of human lives. The effects of frames on preferences are compared to the effects of perspectives on perceptual appearance. The dependence of preferences on the formulation of decision problems is a significant concern for the theory of rational choice.

15,513 citations

Journal Article
TL;DR: Findings indicate that MTurk can be used to obtain high-quality data inexpensively and rapidly and the data obtained are at least as reliable as those obtained via traditional methods.
Abstract: Amazon's Mechanical Turk (MTurk) is a relatively new website that contains the major elements required to conduct research: an integrated participant compensation system; a large participant pool; and a streamlined process of study design, participant recruitment, and data collection. In this article, we describe and evaluate the potential contributions of MTurk to psychology and other social sciences. Findings indicate that (a) MTurk participants are slightly more demographically diverse than are standard Internet samples and are significantly more diverse than typical American college samples; (b) participation is affected by compensation rate and task length, but participants can still be recruited rapidly and inexpensively; (c) realistic compensation rates do not affect data quality; and (d) the data obtained are at least as reliable as those obtained via traditional methods. Overall, MTurk can be used to obtain high-quality data inexpensively and rapidly.

9,562 citations

Journal Article
TL;DR: In this paper, the authors focus on some of the qualities peculiar to psychological experiments and point out that the demand characteristics perceived in any particular experiment will vary with the sophistication, intelligence, and previous experience of each experimental subject.
Abstract: Since the time of Galileo, scientists have employed the laboratory experiment as a method of understanding natural phenomena. This chapter focuses on some of the qualities peculiar to psychological experiments. The experimental situation is one which takes place within the context of an explicit agreement of the subject to participate in a special form of social interaction known as "taking part in an experiment". The demand characteristics perceived in any particular experiment will vary with the sophistication, intelligence, and previous experience of each experimental subject. It becomes an empirical issue to study under what circumstances, in what kind of experimental contexts, and with what kind of subject populations, demand characteristics become significant in determining the behavior of subjects in experimental situations. The most obvious technique for determining what demand characteristics are perceived is the use of post-experimental inquiry. In this regard, it is well to point out that considerable self-discipline is necessary for the experimenter to obtain a valid inquiry.

3,634 citations

Journal Article
TL;DR: The conjunction rule states that the probability of a conjunction, P(A&B), cannot exceed the probabilities of its constituents, P(A) and P(B), because the extension (or the possibility set) of the conjunction is included in the extension of its constituents.
Abstract: Perhaps the simplest and the most basic qualitative law of probability is the conjunction rule: The probability of a conjunction, P (A&B) cannot exceed the probabilities of its constituents, P (A) and P (B), because the extension (or the possibility set) of the conjunction is included in the extension of its constituents. Judgments under uncertainty, however, are often mediated by intuitive heuristics that are not bound by the conjunction rule. A conjunction can be more representative than one of its constituents, and instances of a specific category can be easier to imagine or to retrieve than instances of a more inclusive category. The representativeness and availability heuristics therefore can make a conjunction appear more probable than one of its constituents. This phenomenon is demonstrated in a variety of contexts including estimation of word frequency, personality judgment, medical prognosis, decision under risk, suspicion of criminal acts, and political forecasting. Systematic violations of the conjunction rule are observed in judgments of lay people and of experts in both between-subjects and within-subjects comparisons. Alternative interpretations of the conjunction fallacy are discussed and attempts to combat it are explored.

3,221 citations

Journal Article
TL;DR: Internet data collection methods, with a focus on self-report questionnaires from self-selected samples, are evaluated and compared with traditional paper-and-pencil methods and it is concluded that Internet methods can contribute to many areas of psychology.
Abstract: The rapid growth of the Internet provides a wealth of new research opportunities for psychologists. Internet data collection methods, with a focus on self-report questionnaires from self-selected samples, are evaluated and compared with traditional paper-and-pencil methods. Six preconceptions about Internet samples and data quality are evaluated by comparing a new large Internet sample (N = 361,703) with a set of 510 published traditional samples. Internet samples are shown to be relatively diverse with respect to gender, socioeconomic status, geographic region, and age. Moreover, Internet findings generalize across presentation formats, are not adversely affected by nonserious or repeat responders, and are consistent with findings from traditional methods. It is concluded that Internet methods can contribute to many areas of psychology.

2,870 citations