Open Access · Posted Content
Human-powered Sorts and Joins
TL;DR: The authors integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers and used humans to compare items for sorting and joining data, two of the most common operations in DBMSs.
Abstract:
Crowdsourcing markets like Amazon's Mechanical Turk (MTurk) make it possible to task people with small jobs, such as labeling images or looking up phone numbers, via a programmatic interface. MTurk tasks for processing datasets with humans currently require significant reimplementation of common workflows and ad-hoc selection of parameters such as the price to pay per task. We describe how we have integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers. In this paper, we focus on how to use humans to compare items for sorting and joining data, two of the most common operations in DBMSs. We describe our basic query interface and the user interface of the tasks we post to MTurk. We also propose a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them, which dramatically reduce the overall cost of running sorts and joins on the crowd. In an experiment joining two sets of images, we reduce the overall cost from $67 in a naive implementation to about $3, without substantially affecting accuracy or latency. In an end-to-end experiment, we reduced cost by a factor of 14.5.
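The ratings optimization in the abstract can be sketched in Python. This is an illustrative sketch of the task-count trade-off, not Qurk's actual interface: the function names and the example ratings are hypothetical, and it assumes each comparison or rating is one crowd task.

```python
# Illustrative sketch (not Qurk's actual API): sorting n items with pairwise
# comparisons needs one task per unordered pair, while per-item numerical
# ratings need only one task per item, trading some accuracy for far fewer tasks.
from itertools import combinations

def pairwise_task_count(n):
    # One comparison task for every unordered pair: n*(n-1)/2.
    return len(list(combinations(range(n), 2)))

def rating_task_count(n):
    # One rating task per item; items are then sorted by mean rating.
    return n

def sort_by_mean_rating(ratings):
    # ratings: dict mapping item -> list of numeric worker ratings
    return sorted(ratings,
                  key=lambda item: sum(ratings[item]) / len(ratings[item]),
                  reverse=True)

items = {"A": [4, 5, 4], "B": [2, 3, 2], "C": [5, 5, 4]}
print(pairwise_task_count(40))     # 780 comparison tasks
print(rating_task_count(40))       # 40 rating tasks
print(sort_by_mean_rating(items))  # ['C', 'A', 'B']
```

The quadratic-versus-linear gap in task counts is the source of the cost reductions the abstract reports, at the price of relying on workers' absolute ratings being roughly consistent.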
Citations
Journal Article (DOI)
Mobile Crowd Sensing and Computing: The Review of an Emerging Human-Powered Sensing Paradigm
TL;DR: The unique features and novel application areas of MCSC are characterized and a reference framework for building human-in-the-loop MCSC systems is proposed, which clarifies the complementary nature of human and machine intelligence and envisions the potential of deeply fused human–machine systems.
Proceedings Article (DOI)
ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking
TL;DR: A probabilistic framework is developed to make sensible decisions about candidate links and to identify unreliable human workers, improving the quality of the links while limiting the amount of work performed by the crowd.
Posted Content
CrowdER: Crowdsourcing Entity Resolution
TL;DR: In this paper, a hybrid human-machine approach is proposed, in which machines are used to do an initial, coarse pass over all the data, and people are used to verify only the most likely matching pairs.
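The hybrid idea can be sketched as a cheap machine similarity pass that prunes the pair space before any crowd verification. The token-Jaccard measure, the 0.4 threshold, and the record strings below are illustrative assumptions, not CrowdER's actual implementation:

```python
# Hedged sketch of a machine coarse pass for entity resolution: a cheap
# similarity measure prunes the quadratic pair space so the crowd only
# verifies likely matches.
from itertools import combinations

def jaccard(a, b):
    # Token-level Jaccard similarity between two record strings.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def candidate_pairs(records, threshold=0.4):
    # Machine pass: keep only pairs similar enough to be worth a crowd task.
    return [(x, y) for x, y in combinations(records, 2)
            if jaccard(x, y) >= threshold]

records = ["iPad 2 16GB white", "Apple iPad 2 16GB",
           "Dell XPS 13 laptop", "XPS 13 by Dell"]
pairs = candidate_pairs(records)
# Of the 6 possible pairs, only the 2 plausible matches remain for the crowd.
```

Here 4 of the 6 pairs are filtered out by the machine pass; at scale, this pruning is what makes crowd verification of the survivors affordable.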
Journal Article (DOI)
Truth inference in crowdsourcing: is the problem solved?
TL;DR: It is argued that the truth inference problem is not fully solved; the limitations of existing algorithms are identified and promising research directions are pointed out.
Proceedings Article (DOI)
Corleone: hands-off crowdsourcing for entity matching
Chaitanya Gokhale, Sanjib Das, AnHai Doan, Jeffrey F. Naughton, Narasimhan Rampalli, Jude W. Shavlik, Xiaojin Zhu +6 more
TL;DR: Corleone, a hands-off crowdsourcing (HOC) solution for entity matching (EM), is described; it uses the crowd in all major steps of the EM process, with implications for executing crowdsourced RDBMS joins, cleaning learning models, and soliciting complex information types from crowd workers.
References
Book
A technique for the measurement of attitudes
TL;DR: The instrument to be described here is not, however, indirect in the usual sense of the word; it does not seek responses to items apparently unrelated to the attitudes investigated, but seeks to measure prejudice in a manner less direct than is true of the usual prejudice scale.
Journal Article (DOI)
A new measure of rank correlation
TL;DR: Rank correlation is a measure of similarity between two rankings of the same set of individuals; it has been used in psychological work to compare two different rankings of individuals in order to indicate similarity of taste.
Journal Article (DOI)
Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm
A. P. Dawid, A. M. Skene +1 more
TL;DR: The EM algorithm is shown to provide a slow but sure way of obtaining maximum likelihood estimates of the parameters of interest in compiling a patient record.
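A minimal two-class sketch of that EM procedure, run on synthetic crowd labels rather than the paper's patient-record data; the smoothing constant, iteration count, and worker names are arbitrary illustrative choices:

```python
# Minimal two-class Dawid–Skene EM sketch: jointly estimates each task's
# true label and each worker's error rates from redundant labels.
def dawid_skene(labels, n_iter=20):
    # labels: {task: {worker: observed 0/1 label}}
    workers = {w for obs in labels.values() for w in obs}
    # Initialize posteriors with the per-task majority-vote fraction.
    q = {t: sum(obs.values()) / len(obs) for t, obs in labels.items()}
    for _ in range(n_iter):
        # M-step: class prior and each worker's confusion matrix.
        prior = sum(q.values()) / len(q)
        conf = {}
        for w in workers:
            counts = [[1e-6] * 2 for _ in range(2)]  # smoothed [true][observed]
            for t, obs in labels.items():
                if w in obs:
                    counts[1][obs[w]] += q[t]
                    counts[0][obs[w]] += 1 - q[t]
            conf[w] = [[c / sum(row) for c in row] for row in counts]
        # E-step: posterior probability that each task's true label is 1.
        for t, obs in labels.items():
            p1, p0 = prior, 1 - prior
            for w, y in obs.items():
                p1 *= conf[w][1][y]
                p0 *= conf[w][0][y]
            q[t] = p1 / (p1 + p0)
    return q, conf

# Three reliable workers and one who always answers 1 regardless of the task.
labels = {
    "t1": {"a": 1, "b": 1, "c": 1, "noisy": 1},
    "t2": {"a": 0, "b": 0, "c": 0, "noisy": 1},
    "t3": {"a": 1, "b": 1, "c": 1, "noisy": 1},
    "t4": {"a": 0, "b": 0, "c": 0, "noisy": 1},
}
q, conf = dawid_skene(labels)
```

On this toy input the posteriors recover the majority answers, and the estimated confusion matrix exposes the always-1 worker (a high probability of reporting 1 when the true label is 0), which is the "slow but sure" separation of worker error rates the TL;DR refers to.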
Proceedings Article (DOI)
Quality management on Amazon Mechanical Turk
TL;DR: This work presents algorithms that improve the existing state-of-the-art techniques, enabling the separation of bias and error, and illustrates how to incorporate cost-sensitive classification errors in the overall framework and how to seamlessly integrate unsupervised and supervised techniques for inferring the quality of the workers.