Open Access · Posted Content

Human-powered Sorts and Joins

TL;DR
The authors integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers and used humans to compare items for sorting and joining data, two of the most common operations in DBMSs.
Abstract
Crowdsourcing markets like Amazon's Mechanical Turk (MTurk) make it possible to task people with small jobs, such as labeling images or looking up phone numbers, via a programmatic interface. Today, MTurk tasks for processing datasets with humans require significant reimplementation of common workflows and ad-hoc selection of parameters such as the price to pay per task. We describe how we have integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers. In this paper, we focus on how to use humans to compare items for sorting and joining data, two of the most common operations in DBMSs. We describe our basic query interface and the user interface of the tasks we post to MTurk. We also propose a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them, which dramatically reduce the overall cost of running sorts and joins on the crowd. In an experiment joining two sets of images, we reduced the overall cost from $67 in a naive implementation to about $3, without substantially affecting accuracy or latency. In an end-to-end experiment, we reduced cost by a factor of 14.5.
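The cost savings described above come largely from shrinking the number of HITs posted to MTurk. A rough back-of-the-envelope sketch (the function names are ours for illustration, not Qurk's API) of how batching and rating-based sorts cut a crowd sort's task count:

```python
# Illustrative sketch: estimate how many crowd tasks (HITs) a sort of
# n items needs under each strategy discussed in the abstract.
from math import comb

def pairwise_tasks(n, batch=1):
    """Compare every pair of items; batching packs `batch` comparisons per HIT."""
    pairs = comb(n, 2)          # n*(n-1)/2 pairwise comparisons
    return -(-pairs // batch)   # ceiling division

def rating_tasks(n, batch=1):
    """Ask for one numerical rating per item instead of pairwise comparisons."""
    return -(-n // batch)

n = 100
print(pairwise_tasks(n))            # 4950 HITs, one comparison each
print(pairwise_tasks(n, batch=10))  # 495 HITs with batching
print(rating_tasks(n, batch=10))    # 10 HITs with ratings + batching
```

The same arithmetic explains the join experiment: pre-filtering tables before joining prunes most of the cross product, so far fewer pairs ever reach the crowd.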


Citations
Journal ArticleDOI

Mobile Crowd Sensing and Computing: The Review of an Emerging Human-Powered Sensing Paradigm

TL;DR: The unique features and novel application areas of MCSC are characterized, and a reference framework for building human-in-the-loop MCSC systems is proposed, which clarifies the complementary nature of human and machine intelligence and envisions the potential of deeply fused human-machine systems.
Proceedings ArticleDOI

ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking

TL;DR: A probabilistic framework is developed to make sensible decisions about candidate links and to identify unreliable human workers, improving the quality of the links while limiting the amount of work performed by the crowd.
Posted Content

CrowdER: Crowdsourcing Entity Resolution

TL;DR: In this paper, a hybrid human-machine approach is proposed, in which machines are used to do an initial, coarse pass over all the data, and people are used to verify only the most likely matching pairs.
Journal ArticleDOI

Truth inference in crowdsourcing: is the problem solved?

TL;DR: It is believed that the truth inference problem is not fully solved; the limitations of existing algorithms are identified, and promising research directions are pointed out.
Proceedings ArticleDOI

Corleone: hands-off crowdsourcing for entity matching

TL;DR: Corleone, a hands-off crowdsourcing (HOC) solution for entity matching (EM), is described; it uses the crowd in all major steps of the EM process, with implications for executing crowdsourced RDBMS joins, cleaning learning models, and soliciting complex information types from crowd workers.
References
Book

A technique for the measurement of attitudes

Rensis Likert
TL;DR: The instrument described here is not, however, indirect in the usual sense of the word; it does not seek responses to items apparently unrelated to the attitudes investigated, but it does measure prejudice in a manner less direct than the usual prejudice scale.
Journal ArticleDOI

A new measure of rank correlation

Maurice G. Kendall
01 Jun 1938
TL;DR: Rank correlation is a measure of similarity between two rankings of the same set of individuals; it has been used in psychological work to compare two different rankings of individuals in order to indicate similarity of taste.
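Kendall's measure can be illustrated with a minimal, from-the-definition implementation (ties ignored; this is our own sketch, not code from the cited paper):

```python
# Kendall's tau for two rankings of the same items, assuming no ties:
# tau = (concordant pairs - discordant pairs) / C(n, 2).
def kendall_tau(a, b):
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # A pair is concordant if both rankings order it the same way.
            if (a[i] - a[j]) * (b[i] - b[j]) > 0:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0  (identical rankings)
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0 (reversed rankings)
```

Tau ranges from -1 (fully reversed) to 1 (identical), which is why the Qurk paper uses it to score how closely a crowd-produced ordering matches a ground-truth sort.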
Journal ArticleDOI

Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm

TL;DR: The EM algorithm is shown to provide a slow but sure way of obtaining maximum likelihood estimates of the parameters of interest in compiling a patient record.
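The EM approach can be sketched for the simplest case of binary labels from redundant workers. This is a didactic toy (the function and variable names are ours), not the paper's full confusion-matrix model:

```python
# Toy EM in the spirit of Dawid & Skene: jointly infer each item's true
# binary label and each worker's accuracy from redundant crowd labels.
def em_labels(labels, n_iter=20):
    """labels: list of dicts, one per item, mapping worker id -> 0/1 label.
    Returns (posteriors, accuracies): P(true label = 1) per item, and
    an estimated accuracy per worker."""
    workers = sorted({w for row in labels for w in row})
    acc = {w: 0.8 for w in workers}   # initial guess: workers are mostly right
    post = [0.5] * len(labels)
    for _ in range(n_iter):
        # E-step: posterior over each item's true label, given accuracies.
        for i, row in enumerate(labels):
            p1 = p0 = 1.0
            for w, l in row.items():
                p1 *= acc[w] if l == 1 else 1 - acc[w]
                p0 *= acc[w] if l == 0 else 1 - acc[w]
            post[i] = p1 / (p1 + p0)
        # M-step: re-estimate each worker's accuracy against the posteriors.
        for w in workers:
            num = den = 0.0
            for i, row in enumerate(labels):
                if w in row:
                    num += post[i] if row[w] == 1 else 1 - post[i]
                    den += 1
            acc[w] = num / den
    return post, acc

# w3 disagrees with the majority on two of three items, so EM should
# converge on the majority labels and a low accuracy estimate for w3.
votes = [{"w1": 1, "w2": 1, "w3": 0},
         {"w1": 0, "w2": 0, "w3": 1},
         {"w1": 1, "w2": 1, "w3": 1}]
post, acc = em_labels(votes)
```

The point of the original paper is exactly this coupling: worker error rates and true labels are estimated together, so a consistently wrong worker is automatically down-weighted rather than outvoting the others.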
Proceedings ArticleDOI

Quality management on Amazon Mechanical Turk

TL;DR: This work presents algorithms that improve the existing state-of-the-art techniques, enabling the separation of bias and error, and illustrates how to incorporate cost-sensitive classification errors in the overall framework and how to seamlessly integrate unsupervised and supervised techniques for inferring the quality of the workers.