Human-powered sorts and joins
Adam Marcus, Eugene Wu, David R. Karger, Samuel Madden, Robert C. Miller
Vol. 5, Iss. 1, pp. 13-24
TL;DR: This paper describes how MTurk tasks for processing datasets with humans are currently designed with significant reimplementation of common workflows and ad-hoc selection of parameters such as price to pay per task, and proposes a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them.
Abstract:
Crowdsourcing markets like Amazon's Mechanical Turk (MTurk) make it possible to task people with small jobs, such as labeling images or looking up phone numbers, via a programmatic interface. MTurk tasks for processing datasets with humans are currently designed with significant reimplementation of common workflows and ad-hoc selection of parameters such as price to pay per task. We describe how we have integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers. In this paper, we focus on how to use humans to compare items for sorting and joining data, two of the most common operations in DBMSs. We describe our basic query interface and the user interface of the tasks we post to MTurk. We also propose a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them, which dramatically reduce the overall cost of running sorts and joins on the crowd. In an experiment joining two sets of images, we reduce the overall cost from $67 in a naive implementation to about $3, without substantially affecting accuracy or latency. In an end-to-end experiment, we reduced cost by a factor of 14.5.
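To make the effect of these optimizations concrete, the following Python sketch counts the crowd tasks each strategy would need. It is an illustration under stated assumptions, not Qurk's implementation: the function names, batch sizes, per-task price, and the single categorical feature used for pre-filtering are all hypothetical. The sketch also assumes that a comparison-based sort asks about every unordered pair of items, while a rating-based sort asks for one numerical score per item.

```python
import math
from collections import Counter
from statistics import mean


def pairwise_task_count(n_items, batch_size):
    """Tasks needed if every unordered pair is compared once,
    with batch_size comparisons grouped into one task (assumption)."""
    pairs = n_items * (n_items - 1) // 2
    return math.ceil(pairs / batch_size)


def rating_task_count(n_items, batch_size):
    """Tasks needed if each item is rated once, batch_size items per task."""
    return math.ceil(n_items / batch_size)


def sort_by_mean_rating(ratings):
    """Order items by the mean of the scores several workers gave them.
    ratings maps item -> list of numeric scores (hypothetical data layout)."""
    return sorted(ratings, key=lambda item: mean(ratings[item]))


def naive_join_pairs(left, right):
    """A naive join asks the crowd about every pair in the cross product."""
    return len(left) * len(right)


def prefiltered_join_pairs(left, right, feature):
    """Candidate pairs left after pre-filtering: only pairs whose
    feature values agree are sent to the crowd for comparison."""
    left_counts = Counter(feature[x] for x in left)
    right_counts = Counter(feature[y] for y in right)
    return sum(n * right_counts[value] for value, n in left_counts.items())


def cost_cents(tasks, assignments, price_cents):
    """Total cost when each task is answered by `assignments` workers."""
    return tasks * assignments * price_cents
```

Under these assumptions, sorting 40 items in batches of 5 takes 156 comparison tasks but only 8 rating tasks, because the number of pairs grows quadratically while the number of ratings grows linearly. The join functions show the same effect for pre-filtering: pairs whose feature values disagree never reach the crowd. Extracting the feature itself also costs crowd work, which this sketch does not count.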
Citations
Journal Article
Mobile Crowd Sensing and Computing: The Review of an Emerging Human-Powered Sensing Paradigm
TL;DR: This review characterizes the unique features and novel application areas of MCSC, proposes a reference framework for building human-in-the-loop MCSC systems, clarifies the complementary nature of human and machine intelligence, and envisions the potential of deep-fused human-machine systems.
Journal Article
CrowdER: crowdsourcing entity resolution
TL;DR: This work proposes a hybrid human-machine approach in which machines are used to do an initial, coarse pass over all the data, and people are used to verify only the most likely matching pairs, and develops a novel two-tiered heuristic approach for creating batched tasks.
Proceedings Article
ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking
TL;DR: A probabilistic framework is developed to make sensible decisions about candidate links and to identify unreliable human workers, improving the quality of the links while limiting the amount of work performed by the crowd.
Posted Content
CrowdER: Crowdsourcing Entity Resolution
TL;DR: In this paper, a hybrid human-machine approach is proposed, in which machines are used to do an initial, coarse pass over all the data, and people are used to verify only the most likely matching pairs.
Journal Article
Truth inference in crowdsourcing: is the problem solved?
TL;DR: The authors conclude that the truth inference problem is not fully solved, identify the limitations of existing algorithms, and point out promising research directions.
References
Book
A technique for the measurement of attitudes
TL;DR: The instrument to be described here is not, however, indirect in the usual sense of the word; it does not seek responses to items apparently unrelated to the attitudes investigated, but it seeks to measure prejudice in a manner less direct than is true of the usual prejudice scale.
Journal Article
A new measure of rank correlation
TL;DR: Rank correlation is a measure of similarity between two rankings of the same set of individuals; it has been used in psychological work to compare two such rankings as an indication of similarity of taste.
Journal Article
Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm
A. P. Dawid, A. M. Skene
TL;DR: The EM algorithm is shown to provide a slow but sure way of obtaining maximum likelihood estimates of the parameters of interest in compiling a patient record.
Proceedings Article
Quality management on Amazon Mechanical Turk
TL;DR: This work presents algorithms that improve the existing state-of-the-art techniques, enabling the separation of bias and error, and illustrates how to incorporate cost-sensitive classification errors in the overall framework and how to seamlessly integrate unsupervised and supervised techniques for inferring the quality of the workers.