Human-powered sorts and joins

doi:10.14778/2047485.2047487

Open Access Journal Article

Adam Marcus, +4 more

Vol. 5, Iss. 1, pp. 13-24

TLDR

This paper observes that MTurk tasks for processing datasets with humans are currently designed with significant reimplementation of common workflows and ad-hoc selection of parameters such as the price to pay per task. It describes how crowds were integrated into a declarative workflow engine called Qurk, and proposes a number of optimizations for human-powered sorts and joins: task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them.

Abstract:

Crowdsourcing markets like Amazon's Mechanical Turk (MTurk) make it possible to task people with small jobs, such as labeling images or looking up phone numbers, via a programmatic interface. MTurk tasks for processing datasets with humans are currently designed with significant reimplementation of common workflows and ad-hoc selection of parameters such as price to pay per task. We describe how we have integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers. In this paper, we focus on how to use humans to compare items for sorting and joining data, two of the most common operations in DBMSs. We describe our basic query interface and the user interface of the tasks we post to MTurk. We also propose a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them, which dramatically reduce the overall cost of running sorts and joins on the crowd. In an experiment joining two sets of images, we reduce the overall cost from $67 in a naive implementation to about $3, without substantially affecting accuracy or latency. In an end-to-end experiment, we reduced cost by a factor of 14.5.
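To make the three optimizations named in the abstract concrete, the following is a minimal sketch that only counts crowd tasks under simple assumptions. It is not Qurk's implementation or the paper's cost model: the function names, the item sizes, the batch size, the number of ratings per item, and the `color` attribute used for pre-filtering are all hypothetical and chosen for illustration.

```python
from math import ceil, comb


def naive_join_tasks(n_left: int, n_right: int) -> int:
    """Assumption: one crowd task per candidate pair in the cross product."""
    return n_left * n_right


def batched_join_tasks(n_left: int, n_right: int, batch_size: int) -> int:
    """Task batching: show batch_size candidate pairs inside each task."""
    return ceil(n_left * n_right / batch_size)


def prefiltered_pairs(left, right, feature):
    """Pre-filtering: keep only pairs whose feature values agree.

    In a crowd setting the feature itself would come from human answers;
    here it is read directly from the (hypothetical) records.
    """
    return [(l, r) for l in left for r in right if feature(l) == feature(r)]


def comparison_sort_tasks(n_items: int) -> int:
    """Pairwise comparisons: one question per unordered pair of items."""
    return comb(n_items, 2)


def rating_sort_tasks(n_items: int, ratings_per_item: int) -> int:
    """Numerical ratings: each item is rated independently a few times."""
    return n_items * ratings_per_item


# Hypothetical sizes, not taken from the paper's experiments.
print(naive_join_tasks(30, 30))        # 900
print(batched_join_tasks(30, 30, 10))  # 90
print(comparison_sort_tasks(40))       # 780
print(rating_sort_tasks(40, 5))        # 200

# Hypothetical records with a made-up "color" attribute.
left = [{"id": i, "color": "red" if i % 2 == 0 else "blue"} for i in range(4)]
right = [{"id": i, "color": "red" if i % 2 == 0 else "blue"} for i in range(4)]
candidates = prefiltered_pairs(left, right, lambda item: item["color"])
print(len(candidates))                 # 8 of the 16 cross-product pairs remain
```

The counts illustrate why the optimizations matter: comparisons grow quadratically in the number of items while ratings grow linearly, and both batching and pre-filtering shrink the number of tasks posted for a join. How much accuracy each choice costs is an empirical question that this sketch does not model.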

Citations

Mobile Crowd Sensing and Computing: The Review of an Emerging Human-Powered Sensing Paradigm

CrowdER: crowdsourcing entity resolution

ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking

Truth inference in crowdsourcing: is the problem solved?

Related Papers

CrowdDB: answering queries with crowdsourcing

CrowdER: crowdsourcing entity resolution

CrowdScreen: algorithms for filtering data with humans

Crowdsourced Databases: Query Processing with People

CDAS: a crowdsourcing data analytics system