Open Access Proceedings Article

SQUARE: A Benchmark for Research on Computing Crowd Consensus

TLDR
SQUARE, an open source shared task framework including benchmark datasets, defined tasks, standard metrics, and reference implementations with empirical results for several popular methods, is presented.
Abstract
While many statistical consensus methods now exist, the relative lack of comparative benchmarking and integration of techniques has made it increasingly difficult to determine the current state-of-the-art, to evaluate the relative benefit of new methods, to understand where specific problems merit greater attention, and to measure field progress over time. To make such comparative evaluation easier for everyone, we present SQUARE, an open source shared task framework including benchmark datasets, defined tasks, standard metrics, and reference implementations with empirical results for several popular methods. In addition to measuring performance on a variety of public, real crowd datasets, the benchmark also varies supervision and noise by manipulating training size and labeling error. We envision SQUARE as dynamic and continually evolving, with new datasets and reference implementations being added according to community needs and interest. We invite community contributions and participation.
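As a rough illustration of the kind of task the benchmark standardizes, the sketch below aggregates crowd labels with a simple majority vote and scores the consensus against gold labels using accuracy. The toy data and function names are illustrative assumptions, not the paper's reference implementation or its datasets.

```python
# Minimal sketch (assumed setup, not the actual SQUARE reference code):
# aggregate crowd labels by majority vote and score against gold labels.
from collections import Counter, defaultdict

def majority_vote(worker_labels):
    """Aggregate {example_id: [label, ...]} into {example_id: consensus_label}."""
    return {ex: Counter(labels).most_common(1)[0][0]
            for ex, labels in worker_labels.items()}

def accuracy(predicted, gold):
    """Fraction of gold-labeled examples whose consensus label matches the gold label."""
    hits = sum(predicted.get(ex) == lab for ex, lab in gold.items())
    return hits / len(gold)

if __name__ == "__main__":
    # Hypothetical toy data standing in for one of the benchmark's crowd datasets.
    worker_labels = defaultdict(list)
    for worker, example, label in [("w1", "q1", 1), ("w2", "q1", 1), ("w3", "q1", 0),
                                   ("w1", "q2", 0), ("w2", "q2", 0), ("w3", "q2", 0)]:
        worker_labels[example].append(label)
    gold = {"q1": 1, "q2": 0}
    print("Majority-vote accuracy:", accuracy(majority_vote(worker_labels), gold))
```

Swapping `majority_vote` for another consensus method while keeping the same metric is the comparison the benchmark is meant to make routine.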



Citations
Journal Article

Quality Control in Crowdsourcing: A Survey of Quality Attributes, Assessment Techniques, and Assurance Actions

TL;DR: This paper surveys quality in the context of crowdsourcing along several dimensions, in order to define and characterize it and to assess the current state of the art.
Proceedings Article

QASCA: A Quality-Aware Task Assignment System for Crowdsourcing Applications

TL;DR: This paper investigates the online task assignment problem: given a pool of n questions, which k questions should be assigned to an incoming worker. It proposes the Quality-Aware Task Assignment System for Crowdsourcing Applications (QASCA), built on top of AMT.
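The sketch below is only an illustrative stand-in for the assignment step described above: it picks the k most uncertain questions, measured by the label entropy of answers collected so far, whereas QASCA itself optimizes expected quality improvement. The heuristic and all names are assumptions for exposition.

```python
# Illustrative heuristic only (not QASCA's actual optimization): assign the k
# questions whose current answers are most uncertain to the next worker.
import math
from collections import Counter

def label_entropy(answers):
    """Shannon entropy of the empirical label distribution for one question."""
    if not answers:
        return float("inf")  # unanswered questions are treated as maximally uncertain
    counts = Counter(answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def assign_questions(pool, k):
    """Given {question_id: [labels so far]}, pick k questions for the next worker."""
    return sorted(pool, key=lambda q: label_entropy(pool[q]), reverse=True)[:k]

# Example: q3 has no answers and q1 is split, so they are assigned before q2.
pool = {"q1": [0, 1], "q2": [1, 1, 1], "q3": []}
print(assign_questions(pool, k=2))
```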
Journal Article

Learning from crowdsourced labeled data: a survey

TL;DR: This survey introduces the basic concepts of the qualities of labels and learning models, and introduces open accessible real-world data sets collected from crowdsourcing systems and open source libraries and tools.
Proceedings Article

Comparing Person- and Process-centric Strategies for Obtaining Quality Data on Amazon Mechanical Turk

TL;DR: It is found that screening workers for requisite cognitive aptitudes and providing training in qualitative coding techniques is quite effective, significantly outperforming control and baseline conditions, and can improve coder annotation accuracy above and beyond common benchmark strategies such as Bayesian Truth Serum (BTS).
Journal Article

Active Learning With Imbalanced Multiple Noisy Labeling

TL;DR: A novel active learning framework involving multiple imperfect annotators in crowdsourcing systems is proposed to solve the imbalanced multiple noisy labeling problem, along with three novel instance selection strategies that adapt PLAT to improve learning performance.
References
Proceedings Article

Cheap and Fast -- But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks

TL;DR: This work explores the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web, and proposes a technique for bias correction that significantly improves annotation quality on two tasks.
Journal Article

Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm

TL;DR: The EM algorithm is shown to provide a slow but sure way of obtaining maximum likelihood estimates of the parameters of interest in compiling a patient record.
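For readers unfamiliar with this approach, here is a compact, hedged sketch of Dawid-Skene-style EM: item label posteriors and per-worker confusion matrices (plus class priors) are re-estimated alternately, starting from a soft majority vote. Variable names and the toy data are illustrative, not taken from the paper.

```python
# Compact sketch of Dawid-Skene-style EM for categorical crowd labels.
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50):
    """labels: list of (item, worker, label) triples with integer class labels."""
    items = sorted({i for i, _, _ in labels})
    workers = sorted({w for _, w, _ in labels})
    i_idx = {i: n for n, i in enumerate(items)}
    w_idx = {w: n for n, w in enumerate(workers)}

    # Initialize item posteriors from per-item vote fractions (soft majority vote).
    T = np.zeros((len(items), n_classes))
    for i, w, l in labels:
        T[i_idx[i], l] += 1
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and worker confusion matrices from current posteriors.
        priors = T.mean(axis=0)
        conf = np.full((len(workers), n_classes, n_classes), 1e-6)
        for i, w, l in labels:
            conf[w_idx[w], :, l] += T[i_idx[i]]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: recompute item posteriors from priors and confusion matrices.
        logT = np.tile(np.log(priors), (len(items), 1))
        for i, w, l in labels:
            logT[i_idx[i]] += np.log(conf[w_idx[w], :, l])
        T = np.exp(logT - logT.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)

    return {items[n]: int(T[n].argmax()) for n in range(len(items))}

# Toy run: worker 2 disagrees with the others on item "b"; EM still recovers
# consensus labels by down-weighting that worker's responses.
triples = [("a", 0, 1), ("a", 1, 1), ("a", 2, 1),
           ("b", 0, 0), ("b", 1, 0), ("b", 2, 1)]
print(dawid_skene(triples, n_classes=2))
```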
Journal Article

Learning From Crowds

TL;DR: A probabilistic approach is proposed for supervised learning when multiple annotators provide (possibly noisy) labels but no absolute gold standard is available; experimental results indicate that the proposed method is superior to the commonly used majority-voting baseline.
Proceedings Article

Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise

TL;DR: A probabilistic model is presented and shown to outperform the commonly used "Majority Vote" heuristic for inferring image labels, while remaining robust to both noisy and adversarial labelers.
Proceedings Article

Quality management on Amazon Mechanical Turk

TL;DR: This work presents algorithms that improve on existing state-of-the-art techniques by enabling the separation of worker bias from error, and illustrates how to incorporate cost-sensitive classification errors into the overall framework and how to seamlessly integrate unsupervised and supervised techniques for inferring worker quality.