Open Access · Posted Content

Human-powered Sorts and Joins

TL;DR
The authors integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers and used humans to compare items for sorting and joining data, two of the most common operations in DBMSs.
Abstract
Crowdsourcing markets like Amazon's Mechanical Turk (MTurk) make it possible to task people with small jobs, such as labeling images or looking up phone numbers, via a programmatic interface. Today, MTurk tasks for processing datasets with humans require significant reimplementation of common workflows and ad-hoc selection of parameters such as the price to pay per task. We describe how we have integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers. In this paper, we focus on how to use humans to compare items for sorting and joining data, two of the most common operations in DBMSs. We describe our basic query interface and the user interface of the tasks we post to MTurk. We also propose a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them, which dramatically reduce the overall cost of running sorts and joins on the crowd. In an experiment joining two sets of images, we reduced the overall cost from $67 in a naive implementation to about $3, without substantially affecting accuracy or latency. In an end-to-end experiment, we reduced cost by a factor of 14.5.
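The cost savings described above come largely from shrinking the number of HITs posted to MTurk. A rough back-of-the-envelope sketch (the function names are ours for illustration, not Qurk's API) of how batching and rating-based sorts cut a crowd sort's task count:

```python
# Illustrative sketch: estimate how many crowd tasks (HITs) a sort of
# n items needs under each strategy discussed in the abstract.
from math import comb

def pairwise_tasks(n, batch=1):
    """Compare every pair of items; batching packs `batch` comparisons per HIT."""
    pairs = comb(n, 2)          # n*(n-1)/2 pairwise comparisons
    return -(-pairs // batch)   # ceiling division

def rating_tasks(n, batch=1):
    """Ask for one numerical rating per item instead of pairwise comparisons."""
    return -(-n // batch)

n = 100
print(pairwise_tasks(n))            # 4950 HITs, one comparison each
print(pairwise_tasks(n, batch=10))  # 495 HITs with batching
print(rating_tasks(n, batch=10))    # 10 HITs with ratings + batching
```

The same arithmetic explains the join experiment: pre-filtering tables before joining prunes most of the cross product, so far fewer pairs ever reach the crowd.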


Citations
Journal ArticleDOI

Mobile Crowd Sensing and Computing: The Review of an Emerging Human-Powered Sensing Paradigm

TL;DR: The unique features and novel application areas of MCSC are characterized, and a reference framework for building human-in-the-loop MCSC systems is proposed, which clarifies the complementary nature of human and machine intelligence and envisions the potential of deeply fused human-machine systems.
Proceedings ArticleDOI

ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking

TL;DR: A probabilistic framework is developed to make sensible decisions about candidate links and to identify unreliable human workers, improving the quality of the links while limiting the amount of work performed by the crowd.
Posted Content

CrowdER: Crowdsourcing Entity Resolution

TL;DR: In this paper, a hybrid human-machine approach is proposed, in which machines are used to do an initial, coarse pass over all the data, and people are used to verify only the most likely matching pairs.
Journal ArticleDOI

Truth inference in crowdsourcing: is the problem solved?

TL;DR: It is believed that the truth inference problem is not fully solved; the limitations of existing algorithms are identified, and promising research directions are pointed out.
Proceedings ArticleDOI

Corleone: hands-off crowdsourcing for entity matching

TL;DR: Corleone, a hands-off crowdsourcing (HOC) solution for entity matching (EM), is described; it uses the crowd in all major steps of the EM process, with implications for executing crowdsourced RDBMS joins, cleaning learning models, and soliciting complex information types from crowd workers.
References
Book

A technique for the measurement of attitudes

Rensis Likert
TL;DR: The instrument described here is not, however, indirect in the usual sense of the word; it does not seek responses to items apparently unrelated to the attitudes investigated, but it does measure prejudice in a manner less direct than the usual prejudice scale.
Journal ArticleDOI

A new measure of rank correlation

Maurice G. Kendall
01 Jun 1938
TL;DR: Rank correlation is a measure of similarity between two rankings of the same set of individuals; it has been used in psychological work to compare two different rankings of individuals in order to indicate similarity of taste.
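Kendall's measure can be illustrated with a minimal, from-the-definition implementation (ties ignored; this is our own sketch, not code from the cited paper):

```python
# Kendall's tau for two rankings of the same items, assuming no ties:
# tau = (concordant pairs - discordant pairs) / C(n, 2).
def kendall_tau(a, b):
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # A pair is concordant if both rankings order it the same way.
            if (a[i] - a[j]) * (b[i] - b[j]) > 0:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0  (identical rankings)
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0 (reversed rankings)
```

Tau ranges from -1 (fully reversed) to 1 (identical), which is why the Qurk paper uses it to score how closely a crowd-produced ordering matches a ground-truth sort.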
Journal ArticleDOI

Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm

TL;DR: The EM algorithm is shown to provide a slow but sure way of obtaining maximum likelihood estimates of the parameters of interest in compiling a patient record.
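The EM approach can be sketched for the simplest case of binary labels from redundant workers. This is a didactic toy (the function and variable names are ours), not the paper's full confusion-matrix model:

```python
# Toy EM in the spirit of Dawid & Skene: jointly infer each item's true
# binary label and each worker's accuracy from redundant crowd labels.
def em_labels(labels, n_iter=20):
    """labels: list of dicts, one per item, mapping worker id -> 0/1 label.
    Returns (posteriors, accuracies): P(true label = 1) per item, and
    an estimated accuracy per worker."""
    workers = sorted({w for row in labels for w in row})
    acc = {w: 0.8 for w in workers}   # initial guess: workers are mostly right
    post = [0.5] * len(labels)
    for _ in range(n_iter):
        # E-step: posterior over each item's true label, given accuracies.
        for i, row in enumerate(labels):
            p1 = p0 = 1.0
            for w, l in row.items():
                p1 *= acc[w] if l == 1 else 1 - acc[w]
                p0 *= acc[w] if l == 0 else 1 - acc[w]
            post[i] = p1 / (p1 + p0)
        # M-step: re-estimate each worker's accuracy against the posteriors.
        for w in workers:
            num = den = 0.0
            for i, row in enumerate(labels):
                if w in row:
                    num += post[i] if row[w] == 1 else 1 - post[i]
                    den += 1
            acc[w] = num / den
    return post, acc

# w3 disagrees with the majority on two of three items, so EM should
# converge on the majority labels and a low accuracy estimate for w3.
votes = [{"w1": 1, "w2": 1, "w3": 0},
         {"w1": 0, "w2": 0, "w3": 1},
         {"w1": 1, "w2": 1, "w3": 1}]
post, acc = em_labels(votes)
```

The point of the original paper is exactly this coupling: worker error rates and true labels are estimated together, so a consistently wrong worker is automatically down-weighted rather than outvoting the others.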
Proceedings ArticleDOI

Quality management on Amazon Mechanical Turk

TL;DR: This work presents algorithms that improve the existing state-of-the-art techniques, enabling the separation of bias and error, and illustrates how to incorporate cost-sensitive classification errors in the overall framework and how to seamlessly integrate unsupervised and supervised techniques for inferring the quality of the workers.