Open Access Proceedings Article
SQUARE: A Benchmark for Research on Computing Crowd Consensus
Aashish Sheshadri, Matthew Lease
TL;DR
SQUARE, an open source shared task framework including benchmark datasets, defined tasks, standard metrics, and reference implementations with empirical results for several popular methods, is presented.

Abstract
While many statistical consensus methods now exist, relatively little comparative benchmarking and integration of techniques has made it increasingly difficult to determine the current state-of-the-art, to evaluate the relative benefit of new methods, to understand where specific problems merit greater attention, and to measure field progress over time. To make such comparative evaluation easier for everyone, we present SQUARE, an open source shared task framework including benchmark datasets, defined tasks, standard metrics, and reference implementations with empirical results for several popular methods. In addition to measuring performance on a variety of public, real crowd datasets, the benchmark also varies supervision and noise by manipulating training size and labeling error. We envision SQUARE as dynamic and continually evolving, with new datasets and reference implementations being added according to community needs and interest. We invite community contributions and participation.
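The kind of evaluation the abstract describes, scoring consensus labels against gold labels under a standard metric, can be sketched with the simplest baseline, per-item majority voting. All data below is a hypothetical toy input for illustration, not drawn from the SQUARE datasets:

```python
from collections import Counter

# Hypothetical toy input: per-question worker labels and gold answers.
worker_labels = {
    "q1": [1, 1, 0],
    "q2": [0, 0, 0],
    "q3": [0, 0, 1],  # the crowd gets this one wrong
}
gold = {"q1": 1, "q2": 0, "q3": 1}

def majority_vote(labels):
    """Return the most frequent label; ties broken by the smaller label."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)

consensus = {q: majority_vote(ls) for q, ls in worker_labels.items()}

# Accuracy of the consensus against gold: q1 and q2 are right, q3 is not.
accuracy = sum(consensus[q] == gold[q] for q in gold) / len(gold)
```

A benchmark like SQUARE then repeats this kind of scoring across datasets and methods, varying training size and label noise.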
Citations
Journal Article
Quality Control in Crowdsourcing: A Survey of Quality Attributes, Assessment Techniques, and Assurance Actions
TL;DR: This survey characterizes quality in crowdsourcing along several dimensions (quality attributes, assessment techniques, and assurance actions) in order to define quality and capture the current state of the art.
Proceedings Article
QASCA: A Quality-Aware Task Assignment System for Crowdsourcing Applications
TL;DR: This paper investigates the online task assignment problem: Given a pool of n questions, which of the k questions should be assigned to a worker, and proposes a system called the Quality-Aware Task Assignment System for Crowdsourcing Applications (QASCA) on top of AMT.
Journal Article
Learning from crowdsourced labeled data: a survey
TL;DR: This survey introduces the basic concepts of the qualities of labels and learning models, and introduces open accessible real-world data sets collected from crowdsourcing systems and open source libraries and tools.
Proceedings Article
Comparing Person- and Process-centric Strategies for Obtaining Quality Data on Amazon Mechanical Turk
TL;DR: It is found that screening workers for requisite cognitive aptitudes and providing training in qualitative coding techniques is quite effective, significantly outperforming control and baseline conditions and can improve coder annotation accuracy above and beyond common benchmark strategies such as Bayesian Truth Serum (BTS).
Journal Article
Active Learning With Imbalanced Multiple Noisy Labeling
TL;DR: A novel active learning framework with multiple imperfect annotators involved in crowdsourcing systems that solves the imbalanced multiple noisy labeling problem and three novel instance selection strategies are proposed to adapt PLAT for improving the learning performance.
References
Proceedings Article
Cheap and Fast -- But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks
TL;DR: This work explores the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web, and proposes a technique for bias correction that significantly improves annotation quality on two tasks.
Journal Article
Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm
A. P. Dawid, A. M. Skene
TL;DR: The EM algorithm is shown to provide a slow but sure way of obtaining maximum likelihood estimates of the parameters of interest in compiling a patient record.
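The Dawid-Skene estimator alternates between estimating the posterior over each item's true label and each observer's confusion matrix. A minimal EM sketch for binary labels might look like the following; the label matrix here is invented toy data, and details such as smoothing and convergence checks are omitted:

```python
import numpy as np

# Toy data (hypothetical): rows are items, columns are workers,
# entries are the binary labels each worker assigned.
labels = np.array([
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
])
n_items, n_workers = labels.shape
K = 2  # number of classes

# Initialize the posterior over true labels with per-item vote fractions.
T = np.zeros((n_items, K))
for i in range(n_items):
    counts = np.bincount(labels[i], minlength=K)
    T[i] = counts / counts.sum()

for _ in range(50):
    # M-step: class priors and per-worker confusion matrices,
    # conf[w, k, g] = P(worker w reports g | true class is k).
    priors = T.mean(axis=0)
    conf = np.zeros((n_workers, K, K))
    for w in range(n_workers):
        for k in range(K):
            for g in range(K):
                conf[w, k, g] = T[labels[:, w] == g, k].sum()
        conf[w] /= conf[w].sum(axis=1, keepdims=True)
    # E-step: recompute label posteriors from priors and confusions.
    for i in range(n_items):
        p = priors.copy()
        for w in range(n_workers):
            p *= conf[w, :, labels[i, w]]
        T[i] = p / p.sum()

consensus = T.argmax(axis=1)  # EM consensus labels
```

Unlike majority voting, the E-step discounts workers whose estimated confusion matrices show high error rates, which is how the model separates reliable from unreliable observers.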
Journal Article
Learning From Crowds
Vikas C. Raykar, Shipeng Yu, Linda Zhao, Gerardo Hermosillo Valadez, Charles Florin, Luca Bogoni, Linda Moy, et al.
TL;DR: A probabilistic approach for supervised learning when multiple annotators provide (possibly noisy) labels but no absolute gold standard exists; experimental results indicate that the proposed method is superior to the commonly used majority voting baseline.
Proceedings Article
Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise
TL;DR: A probabilistic model is presented and it is demonstrated that the model outperforms the commonly used "Majority Vote" heuristic for inferring image labels, and is robust to both noisy and adversarial labelers.
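The model above infers labels and labeler expertise jointly. As a much simpler illustration of the same intuition, that more reliable voters should count more, the sketch below weights each vote by a worker's accuracy on a few gold questions. The workers, questions, and answers are hypothetical, and this is not the paper's actual model:

```python
# Hypothetical gold questions with known answers, used to score workers.
gold = {"g1": 1, "g2": 0}

# Hypothetical worker answers to gold questions and one unlabeled question.
worker_answers = {
    "alice": {"g1": 1, "g2": 0, "q1": 1},
    "bob":   {"g1": 0, "g2": 0, "q1": 0},
    "carol": {"g1": 1, "g2": 1, "q1": 1},
}

def reliability(answers):
    """Fraction of gold questions this worker answered correctly."""
    hits = sum(answers[q] == v for q, v in gold.items() if q in answers)
    return hits / len(gold)

weights = {w: reliability(a) for w, a in worker_answers.items()}

def weighted_vote(question):
    """Pick the label with the largest total worker weight (ties: smaller label)."""
    scores = {}
    for w, answers in worker_answers.items():
        if question in answers:
            label = answers[question]
            scores[label] = scores.get(label, 0.0) + weights[w]
    best = max(scores.values())
    return min(label for label, s in scores.items() if s == best)

label = weighted_vote("q1")
```

Here alice's perfect gold score gives her vote twice the weight of bob's or carol's, so label 1 wins even though the unweighted vote is closer.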
Proceedings Article
Quality management on Amazon Mechanical Turk
TL;DR: This work presents algorithms that improve the existing state-of-the-art techniques, enabling the separation of bias and error, and illustrates how to incorporate cost-sensitive classification errors in the overall framework and how to seamlessly integrate unsupervised and supervised techniques for inferring the quality of the workers.