Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise

Proceedings Article (Open Access)

Jacob Whitehill et al.

Vol. 22, pp. 2035-2043

TL;DR

The paper presents a probabilistic model and demonstrates that it outperforms the commonly used "Majority Vote" heuristic for inferring image labels and is robust to both noisy and adversarial labelers.

Abstract:

Modern machine learning-based approaches to computer vision require very large databases of hand-labeled images. Some contemporary vision systems already require on the order of millions of images for training (e.g., the Omron face detector [9]). New Internet-based services allow large numbers of labelers around the world to collaborate at very low cost. However, using these services brings interesting theoretical and practical challenges: (1) the labelers may have widely ranging levels of expertise, which are unknown a priori, and in some cases may be adversarial; (2) images may vary in their level of difficulty; and (3) multiple labels for the same image must be combined to provide an estimate of the actual label of the image. Probabilistic approaches provide a principled way to address these problems. In this paper we present a probabilistic model and use it to simultaneously infer the label of each image, the expertise of each labeler, and the difficulty of each image. On both simulated and real data, we demonstrate that the model outperforms the commonly used "Majority Vote" heuristic for inferring image labels and is robust to both noisy and adversarial labelers.
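The abstract does not spell out the model's equations. As a rough illustration only, the sketch below assumes a GLAD-style parameterization for binary labels: each labeler i has an expertise α_i (α_i < 0 behaves adversarially), each image j has a parameter β_j > 0 whose inverse acts as its difficulty, and labeler i's label on image j matches the true label Z_j with probability σ(α_i β_j). It alternates an E-step (the posterior over each true label) with an M-step (gradient ascent on α and log β). The function name `glad_em`, the weak Gaussian prior on the parameters, the fixed class prior P(Z_j = 1) = 0.5, and the step sizes are hypothetical choices of this sketch, not details taken from the paper.

```python
import math
from collections import defaultdict


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def glad_em(labels, n_labelers, n_images, prior=0.5, em_iters=30,
            grad_steps=50, lr=0.05, reg=0.1):
    """EM for a GLAD-style model with binary labels (illustrative sketch).

    labels: list of (labeler i, image j, label in {0, 1}) triples.
    Returns (posterior P(Z_j = 1) per image, alpha per labeler, beta per image).
    """
    alpha = [1.0] * n_labelers     # expertise; negative means adversarial
    log_beta = [0.0] * n_images    # beta_j = exp(log_beta_j) > 0; 1/beta_j ~ difficulty
    post = [prior] * n_images
    by_image = defaultdict(list)
    for i, j, l in labels:
        by_image[j].append((i, l))

    for _ in range(em_iters):
        # E-step: posterior over each true label given the current parameters.
        for j in range(n_images):
            beta = math.exp(log_beta[j])
            lp1, lp0 = math.log(prior), math.log(1.0 - prior)
            for i, l in by_image[j]:
                x = alpha[i] * beta  # log-odds that labeler i is correct on image j
                lp1 += log_sigmoid(x) if l == 1 else log_sigmoid(-x)
                lp0 += log_sigmoid(x) if l == 0 else log_sigmoid(-x)
            post[j] = sigmoid(lp1 - lp0)

        # M-step: gradient ascent on the expected complete-data log-likelihood
        # plus an assumed weak Gaussian prior (alpha near 1, log_beta near 0).
        for _ in range(grad_steps):
            g_alpha = [-reg * (a - 1.0) for a in alpha]
            g_logb = [-reg * b for b in log_beta]
            for i, j, l in labels:
                beta = math.exp(log_beta[j])
                w = post[j] if l == 1 else 1.0 - post[j]  # P(this label is correct)
                r = w - sigmoid(alpha[i] * beta)
                g_alpha[i] += beta * r
                g_logb[j] += alpha[i] * beta * r
            alpha = [a + lr * g for a, g in zip(alpha, g_alpha)]
            log_beta = [b + lr * g for b, g in zip(log_beta, g_logb)]

    return post, alpha, [math.exp(b) for b in log_beta]
```

For example, with three labelers who always report the true label and a fourth who flips every label, the posteriors recover the true labels and the fourth labeler ends up with a negative α. Majority vote would also recover the labels on such toy data (three votes against one); what the model adds is the per-labeler and per-image estimates, which is where its robustness to adversarial labelers comes from.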

Citations

Journal Article

Learning From Crowds

Vikas C. Raykar et al.

01 Mar 2010

Journal of Machine Learning Research

TL;DR: A probabilistic approach to supervised learning when multiple annotators provide (possibly noisy) labels but no absolute gold standard is available; experimental results indicate that the proposed method is superior to the commonly used majority-voting baseline.

Journal Article

A brief introduction to weakly supervised learning

Zhi-Hua Zhou

01 Jan 2018

National Science Review

TL;DR: This article reviews research progress in weakly supervised learning, focusing on three typical types of weak supervision: incomplete supervision, where only a subset of the training data is labeled; inexact supervision, where the training data are given only coarse-grained labels; and inaccurate supervision, where the given labels are not always ground truth.

Proceedings Article

The Multidimensional Wisdom of Crowds

Peter Welinder et al.

TL;DR: A method for estimating the underlying value of each image from (noisy) annotations provided by multiple annotators, based on a model of the image formation and annotation process; it predicts ground-truth labels on both synthetic and real data more accurately than state-of-the-art methods.

Proceedings Article

Reliable Crowdsourcing and Deep Locality-Preserving Learning for Expression Recognition in the Wild

Shan Li et al.

TL;DR: A new DLP-CNN (Deep Locality-Preserving CNN) method, which aims to enhance the discriminative power of deep features by preserving the locality closeness while maximizing the inter-class scatters, is proposed.

Journal Article

AggNet: Deep Learning From Crowds for Mitosis Detection in Breast Cancer Histology Images

Shadi Albarqouni et al.

11 Feb 2016

IEEE Transactions on Medical Imaging

TL;DR: An experimental study on learning from crowds that handles data aggregation directly as part of the learning process of a convolutional neural network (CNN) via an additional crowdsourcing layer (AggNet); it gives valuable insights into how deep CNNs learn from crowd annotations and demonstrates the necessity of integrating data aggregation into learning.

References

Book

Probabilistic Models for Some Intelligence and Attainment Tests

Georg Rasch

Book

Statistical Theories of Mental Test Scores

Frederic M. Lord et al.

TL;DR: The book presents a survey of test theory models and their application to mental test analysis; its focus is on test-score theories and models rather than on the practical applications and limitations of each model studied.

Proceedings Article

Labeling images with a computer game

Luis von Ahn et al.

TL;DR: A new interactive system, a game that is fun to play and produces valuable output, addresses the image-labeling problem by encouraging people to do the work through their desire to be entertained.

Proceedings Article

Cheap and Fast -- But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks

Rion Snow et al.

TL;DR: This work explores the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web, and proposes a technique for bias correction that significantly improves annotation quality on two tasks.
