scispace - formally typeset
Topic: Crowdsourcing

About: Crowdsourcing is a research topic. Over its lifetime, 12,889 publications have appeared within this topic, receiving 230,638 citations.


Papers
Journal ArticleDOI
TL;DR: This work presents an approach for live learning of object detectors, in which the system autonomously refines its models by actively requesting crowd-sourced annotations on images crawled from the Web, and introduces a novel part-based detector amenable to linear classifiers.
Abstract: Active learning and crowdsourcing are promising ways to efficiently build up training sets for object recognition, but thus far techniques are tested in artificially controlled settings. Typically the vision researcher has already determined the dataset's scope, the labels "actively" obtained are in fact already known, and/or the crowd-sourced collection process is iteratively fine-tuned. We present an approach for live learning of object detectors, in which the system autonomously refines its models by actively requesting crowd-sourced annotations on images crawled from the Web. To address the technical issues such a large-scale system entails, we introduce a novel part-based detector amenable to linear classifiers, and show how to identify its most uncertain instances in sub-linear time with a hashing-based solution. We demonstrate the approach with experiments of unprecedented scale and autonomy, and show it successfully improves the state-of-the-art for the most challenging objects in the PASCAL VOC benchmark. In addition, we show our detector competes well with popular nonlinear classifiers that are much more expensive to train.
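The core active-learning step described above — asking annotators to label the instances the current model is least sure about — can be sketched simply. The snippet below is an illustrative uncertainty-sampling routine for a linear classifier (the paper additionally uses a hashing-based scheme to find these instances in sub-linear time, which is not reproduced here); all names and data are hypothetical.

```python
import numpy as np

def most_uncertain(weights, bias, pool, k=5):
    """Return indices of the k pool instances closest to the
    decision boundary of a linear classifier, i.e. smallest
    |w.x + b|. These are the candidates to send for annotation."""
    margins = np.abs(pool @ weights + bias)
    return np.argsort(margins)[:k]

# Toy usage: random feature vectors standing in for crawled images.
rng = np.random.default_rng(0)
pool = rng.normal(size=(1000, 8))   # unlabeled feature vectors
w = rng.normal(size=8)              # current linear model
query = most_uncertain(w, 0.0, pool, k=3)
```

In a live-learning loop, the queried instances would be sent to crowd workers, their labels added to the training set, and the model retrained before the next query round.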

273 citations

Posted Content
TL;DR: In this article, a two-stage efficient algorithm for multi-class crowd labeling problems is proposed, where the first stage uses the spectral method to obtain an initial estimate of parameters, and the second stage refines the estimation by optimizing the objective function of the Dawid-Skene estimator via the EM algorithm.
Abstract: Crowdsourcing is a popular paradigm for effectively collecting labels at low cost. The Dawid-Skene estimator has been widely used for inferring the true labels from the noisy labels provided by non-expert crowdsourcing workers. However, since the estimator maximizes a non-convex log-likelihood function, it is hard to theoretically justify its performance. In this paper, we propose a two-stage efficient algorithm for multi-class crowd labeling problems. The first stage uses the spectral method to obtain an initial estimate of parameters. Then the second stage refines the estimation by optimizing the objective function of the Dawid-Skene estimator via the EM algorithm. We show that our algorithm achieves the optimal convergence rate up to a logarithmic factor. We conduct extensive experiments on synthetic and real datasets. Experimental results demonstrate that the proposed algorithm is comparable to the most accurate empirical approach, while outperforming several other recently proposed methods.
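The two-stage idea — initialize, then refine with EM on the Dawid-Skene model — can be sketched in miniature. The snippet below is a simplified binary-label version that initializes with majority vote rather than the paper's spectral method; the EM updates (worker accuracies in the M-step, item posteriors in the E-step) follow the standard Dawid-Skene formulation. All names are illustrative.

```python
import numpy as np

def em_dawid_skene(votes, n_iter=20):
    """Simplified Dawid-Skene for binary labels.
    votes: (n_items, n_workers) matrix with entries in {0, 1},
    or -1 where a worker did not label that item.
    Initializes item posteriors by majority vote (the paper uses
    a spectral method instead), then alternates EM updates."""
    mask = votes >= 0
    # Init: posterior that each item's true label is 1.
    p = (votes * mask).sum(1) / np.maximum(mask.sum(1), 1)
    for _ in range(n_iter):
        # M-step: each worker's accuracy under current posteriors.
        agree = mask * (p[:, None] * (votes == 1)
                        + (1 - p[:, None]) * (votes == 0))
        acc = agree.sum(0) / np.maximum(mask.sum(0), 1)
        acc = np.clip(acc, 1e-6, 1 - 1e-6)
        # E-step: item posteriors from worker accuracies.
        log1 = (mask * ((votes == 1) * np.log(acc)
                        + (votes == 0) * np.log(1 - acc))).sum(1)
        log0 = (mask * ((votes == 0) * np.log(acc)
                        + (votes == 1) * np.log(1 - acc))).sum(1)
        p = 1.0 / (1.0 + np.exp(log0 - log1))
    return (p > 0.5).astype(int), acc

# Toy usage: two reliable workers and one adversarial worker.
truth = np.array([1, 0, 1, 1, 0])
votes = np.stack([truth, truth, 1 - truth], axis=1)
est, acc = em_dawid_skene(votes)
```

Because the majority here is reliable, EM recovers the true labels and assigns the adversarial worker a low accuracy; with a poor initialization EM can converge to a bad local optimum, which is exactly the gap the spectral first stage is meant to close.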

272 citations

Journal ArticleDOI
TL;DR: If appropriate validation and quality control procedures are adopted and implemented, crowdsourcing has much potential to provide a valuable source of high temporal and spatial resolution, real-time data, especially in regions where few observations currently exist, thereby adding value to science, technology and society.
Abstract: Crowdsourcing is traditionally defined as obtaining data or information by enlisting the services of a (potentially large) number of people. However, due to recent innovations, this definition can now be expanded to include ‘and/or from a range of public sensors, typically connected via the Internet.’ A large and increasing amount of data is now being obtained from a huge variety of non-traditional sources – from smart phone sensors to amateur weather stations to canvassing members of the public. Some disciplines (e.g. astrophysics, ecology) are already utilizing crowdsourcing techniques (e.g. citizen science initiatives, web 2.0 technology, low-cost sensors), and while its value within the climate and atmospheric science disciplines is still relatively unexplored, it is beginning to show promise. However, important questions remain; this paper introduces and explores the wide range of current and prospective methods to crowdsource atmospheric data, investigates the quality of such data and examines its potential applications in the context of weather, climate and society. It is clear that crowdsourcing is already a valuable tool for engaging the public, and if appropriate validation and quality control procedures are adopted and implemented, it has much potential to provide a valuable source of high temporal and spatial resolution, real-time data, especially in regions where few observations currently exist, thereby adding value to science, technology and society.
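The validation and quality-control step the abstract calls for can take many forms; one common building block is a spatial-consistency check that flags a crowdsourced reading when it disagrees strongly with nearby observations. The snippet below is a toy version of such a check (names, thresholds, and the flat-distance approximation are all illustrative; operational QC pipelines combine range tests, temporal checks, and bias correction).

```python
import numpy as np

def flag_outliers(readings, coords, radius=10.0, max_dev=5.0):
    """Flag readings that deviate from the median of neighbours
    within `radius` (same distance units as coords) by more than
    `max_dev` (same units as readings)."""
    readings = np.asarray(readings, float)
    coords = np.asarray(coords, float)
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
    flags = np.zeros(len(readings), bool)
    for i in range(len(readings)):
        nb = (d[i] <= radius) & (np.arange(len(readings)) != i)
        if nb.sum() >= 3:  # need enough neighbours to judge
            flags[i] = abs(readings[i] - np.median(readings[nb])) > max_dev
    return flags

# Toy usage: five amateur temperature stations, one implausibly hot.
temps = [20.1, 19.8, 35.0, 20.5, 19.9]
coords = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]]
flags = flag_outliers(temps, coords)
```

A reading flagged this way would typically be withheld or down-weighted rather than deleted, since a cluster of "outliers" can also signal a genuine local event.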

271 citations

Proceedings ArticleDOI
16 May 2016
TL;DR: This paper identifies a more practical micro-task allocation problem, called the Global Online Micro-task Allocation in spatial crowdsourcing (GOMA) problem, and proposes a two-phase-based framework, based on which the TGOA algorithm with a 1/4-competitive ratio under the online random order model is presented.
Abstract: With the rapid development of smartphones, spatial crowdsourcing platforms are getting popular. A foundational research problem in spatial crowdsourcing is to allocate micro-tasks to suitable crowd workers. Most existing studies focus on offline scenarios, where all the spatiotemporal information of micro-tasks and crowd workers is given. However, they are impractical since micro-tasks and crowd workers in real applications appear dynamically and their spatiotemporal information cannot be known in advance. In this paper, to address the shortcomings of existing offline approaches, we first identify a more practical micro-task allocation problem, called the Global Online Micro-task Allocation in spatial crowdsourcing (GOMA) problem. We first extend the state-of-the-art algorithm for the online maximum weighted bipartite matching problem to the GOMA problem as the baseline algorithm. Although the baseline algorithm provides a theoretical guarantee for the worst case, its average performance in practice is not good enough, since the worst case happens with very low probability in the real world. Thus, we consider the average performance of online algorithms, a.k.a. the online random order model. We propose a two-phase-based framework, based on which we present the TGOA algorithm with a 1/4-competitive ratio under the online random order model. To improve its efficiency, we further design the TGOA-Greedy algorithm following the framework, which runs faster than the TGOA algorithm but has a lower competitive ratio of 1/8. Finally, we verify the effectiveness and efficiency of the proposed methods through extensive experiments on real and synthetic datasets.
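The online setting described above — workers arrive one by one and must be assigned immediately — is often illustrated with a greedy baseline: each arriving worker takes the best still-unassigned task. The snippet below sketches that baseline only (the paper's TGOA algorithm adds a two-phase structure to obtain its 1/4 competitive ratio, which is not reproduced here); the worker/task names and the distance-based utility are hypothetical.

```python
def greedy_assign(arrivals, tasks, utility):
    """Online greedy allocation: each arriving worker is matched
    to the unassigned task maximizing `utility(worker, task)`.
    Decisions are irrevocable, as in the online setting."""
    free = set(tasks)
    assignment = {}
    for w in arrivals:
        best = max(free, key=lambda t: utility(w, t), default=None)
        if best is not None and utility(w, best) > 0:
            assignment[w] = best
            free.remove(best)
    return assignment

# Toy usage: utility decays with Manhattan distance between
# a worker's location and a task's location.
workers = {"w1": (0, 0), "w2": (5, 5)}
tasks = {"t1": (1, 0), "t2": (5, 4)}

def u(w, t):
    (x1, y1), (x2, y2) = workers[w], tasks[t]
    return 1.0 / (1.0 + abs(x1 - x2) + abs(y1 - y2))

plan = greedy_assign(["w1", "w2"], ["t1", "t2"], u)
```

Greedy is simple and fast but has no worst-case guarantee under adversarial arrival orders, which is why the paper analyzes the random order model instead.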

271 citations

Journal ArticleDOI
Kim Sheehan1
TL;DR: An overview of Mechanical Turk as an academic research platform and a critical examination of its strengths and weaknesses for research are presented.
Abstract: Researchers in a variety of disciplines use Amazon’s crowdsourcing platform called Mechanical Turk as a way to collect data from a respondent pool that is much more diverse than a typical student s...

271 citations


Network Information
Related Topics (5)
Social network
42.9K papers, 1.5M citations
87% related
User interface
85.4K papers, 1.7M citations
86% related
Deep learning
79.8K papers, 2.1M citations
85% related
Cluster analysis
146.5K papers, 2.9M citations
85% related
The Internet
213.2K papers, 3.8M citations
85% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    637
2022    1,420
2021    996
2020    1,250
2019    1,341
2018    1,396