Proceedings ArticleDOI

A Survey of Crowdsourcing Systems

TL;DR: A structured view of the research on crowdsourcing to date is provided, categorized according to applications, algorithms, performance, and datasets.
Abstract: Crowdsourcing has been evolving as a distributed problem-solving and business production model in recent years. In the crowdsourcing paradigm, tasks are distributed to networked people to complete, so that a company's production cost can be greatly reduced. In 2003, Luis von Ahn and his colleagues pioneered the concept of "human computation", which utilizes human abilities to perform computation tasks that are difficult for computers to process. The term "crowdsourcing" was later coined by Jeff Howe in 2006. Since then, much of the work on crowdsourcing has focused on different aspects of the paradigm, such as computational techniques and performance analysis. In this paper, we survey the crowdsourcing literature, categorized according to applications, algorithms, performance, and datasets. The paper provides a structured view of the research on crowdsourcing to date.


Citations
Journal ArticleDOI
TL;DR: In this paper, label noise consists of mislabeled instances; no additional information, such as confidence scores on labels, is assumed to be available.
Abstract: Label noise is an important issue in classification, with many potential negative consequences. For example, the accuracy of predictions may decrease, whereas the complexity of inferred models and the number of necessary training samples may increase. Many works in the literature have been devoted to the study of label noise and the development of techniques to deal with it. However, the field lacks a comprehensive survey of the different types of label noise, their consequences, and the algorithms that consider label noise. This paper proposes to fill this gap. First, the definitions and sources of label noise are considered and a taxonomy of the types of label noise is proposed. Second, the potential consequences of label noise are discussed. Third, label noise-robust, label noise cleansing, and label noise-tolerant algorithms are reviewed. For each category of approaches, a short discussion is provided to help practitioners choose the most suitable technique for their own particular field of application. Finally, the design of experiments is also discussed, which may interest researchers who would like to test their own algorithms. In this paper, label noise consists of mislabeled instances; no additional information, such as confidence scores on labels, is assumed to be available.

1,440 citations
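
Of the three algorithm families reviewed above, the "label noise cleansing" one is the simplest to illustrate: flag and drop training instances whose label disagrees with the labels of their nearest neighbors. The sketch below is a deliberately small, hypothetical filter of that kind (toy one-dimensional data, a basic kNN rule), not a specific method endorsed by the survey.

```python
# Minimal illustration of a label-noise *cleansing* filter: drop an instance if
# the majority label of its k nearest neighbors disagrees with its own label.
# The toy 1-D data and the kNN rule are illustrative assumptions.
from collections import Counter

def knn_majority_label(x, data, k=3):
    """Majority label among the k nearest neighbors of x (excluding x itself)."""
    neighbors = sorted((abs(x - xi), yi) for xi, yi in data if xi != x)[:k]
    return Counter(y for _, y in neighbors).most_common(1)[0][0]

def cleanse(data, k=3):
    return [(x, y) for x, y in data if knn_majority_label(x, data, k) == y]

noisy = [(0.1, "a"), (0.2, "a"), (0.3, "b"),   # (0.3, "b") is likely mislabeled
         (0.4, "a"), (2.0, "b"), (2.1, "b"), (2.2, "b")]
print(cleanse(noisy))   # the suspicious (0.3, "b") instance is filtered out
```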


Cites background from "A Survey of Crowdsourcing Systems"

  • ...Also, since collecting reliable labels is a time-consuming and costly task, there is an increasing interest in using cheap, easy-to-get labels from non-experts using frameworks such as the Amazon Mechanical Turk [49]–[52]....

    [...]

Journal ArticleDOI
01 Nov 2017
TL;DR: Snorkel, as described in this paper, is a system that enables users to train state-of-the-art models without hand-labeling any training data; instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations.
Abstract: Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research labs. In a user study, subject matter experts build models 2.8X faster and increase predictive performance an average 45.5% versus seven hours of hand labeling. We study the modeling tradeoffs in this new setting and propose an optimizer for automating tradeoff decisions that gives up to 1.8X speedup per pipeline execution. In two collaborations, with the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides 132% average improvements to predictive performance over prior heuristic approaches and comes within an average 3.60% of the predictive performance of large hand-curated training sets.

856 citations
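
The central idea in the abstract, writing many imperfect heuristic labeling functions and combining their noisy votes into training labels, can be sketched in a few lines. The snippet below is only an illustration using hypothetical keyword heuristics and a plain majority vote; Snorkel's actual label model learns the accuracies and correlations of the labeling functions rather than weighting them equally.

```python
# Illustrative sketch of the labeling-function idea behind data programming.
# Heuristics and data are hypothetical; Snorkel denoises labeling-function
# outputs with a learned label model instead of this plain majority vote.
from collections import Counter

SPAM, HAM, ABSTAIN = 1, 0, -1

def lf_contains_link(text):
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

def lf_contains_offer(text):
    return SPAM if "free" in text.lower() or "winner" in text.lower() else ABSTAIN

def lf_short_message(text):
    return HAM if len(text.split()) < 5 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_contains_offer, lf_short_message]

def weak_label(text):
    """Combine labeling-function votes by majority, ignoring abstentions."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN  # no heuristic fired; leave the example unlabeled
    return Counter(votes).most_common(1)[0][0]

docs = ["You are a winner, claim your free prize at http://spam.example",
        "Lunch at noon?"]
print([weak_label(d) for d in docs])  # e.g. [1, 0]
```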

Journal ArticleDOI
TL;DR: The unique features and novel application areas of MCSC are characterized, a reference framework for building human-in-the-loop MCSC systems is proposed, the complementary nature of human and machine intelligence is clarified, and the potential of deep-fused human-machine systems is envisioned.
Abstract: With the surging of smartphone sensing, wireless networking, and mobile social networking techniques, Mobile Crowd Sensing and Computing (MCSC) has become a promising paradigm for cross-space and large-scale sensing. MCSC extends the vision of participatory sensing by leveraging both participatory sensory data from mobile devices (offline) and user-contributed data from mobile social networking services (online). Further, it explores the complementary roles and presents the fusion/collaboration of machine and human intelligence in the crowd sensing and computing processes. This article characterizes the unique features and novel application areas of MCSC and proposes a reference framework for building human-in-the-loop MCSC systems. We further clarify the complementary nature of human and machine intelligence and envision the potential of deep-fused human-machine systems. We conclude by discussing the limitations, open issues, and research opportunities of MCSC.

650 citations


Cites result from "A Survey of Crowdsourcing Systems"

  • ...Similar to other crowdsourcing systems [Quinn and Bederson 2011; Yuen et al. 2011], broadly two types of incentives can be used in MCSC applications: intrinsic incentives or financial incentives....

    [...]

Journal ArticleDOI
TL;DR: The most recent applications of the Orienteering Problem (OP), such as the Tourist Trip Design Problem and the mobile-crowdsourcing problem, are discussed.

473 citations


Cites background from "A Survey of Crowdsourcing Systems"

  • ...For a comprehensive survey about the crowdsourcing problem itself, we can refer to Yuen et al. (2011). Table 13 summarizes different crowdsourcing papers including the proposed algorithms and relationship to the OP and its variants....

    [...]

Journal ArticleDOI
TL;DR: A blockchain-based decentralized crowdsourcing framework named CrowdBC is conceptualized, in which a requester's task can be solved by a crowd of workers without relying on any trusted third party, users' privacy can be guaranteed, and only low transaction fees are required.
Abstract: Crowdsourcing systems, which utilize human intelligence to solve complex tasks, have gained considerable interest and adoption in recent years. However, the majority of existing crowdsourcing systems rely on central servers, which are subject to the weaknesses of the traditional trust-based model, such as a single point of failure. They are also vulnerable to distributed denial-of-service (DDoS) and Sybil attacks due to the involvement of malicious users. In addition, high service fees charged by crowdsourcing platforms may hinder the development of crowdsourcing. Addressing these potential issues has both research and practical value. In this paper, we conceptualize a blockchain-based decentralized framework for crowdsourcing named CrowdBC, in which a requester's task can be solved by a crowd of workers without relying on any trusted third party, users' privacy can be guaranteed, and only low transaction fees are required. In particular, we introduce the architecture of our proposed framework, based on which we give a concrete scheme. We further implement a software prototype on the Ethereum public test network with a real-world dataset. Experiment results show the feasibility, usability, and scalability of our proposed crowdsourcing system.

387 citations
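
To make the task lifecycle in the abstract concrete, the sketch below models a minimal requester/worker exchange (escrowed reward, hashed submission, acceptance and payout) in plain Python. The class and method names are assumptions for illustration only; this is not CrowdBC's actual Ethereum smart-contract code, and it omits the framework's privacy and consensus mechanisms.

```python
# Conceptual sketch of a decentralized crowdsourcing task lifecycle
# (post with escrow -> submit hashed solution -> accept -> pay out).
# All names are hypothetical; this is not CrowdBC's smart-contract code.
from dataclasses import dataclass, field
from hashlib import sha256
from typing import Optional

@dataclass
class Task:
    requester: str
    description: str
    reward: int                                       # escrowed at posting time
    submissions: dict = field(default_factory=dict)   # worker -> solution hash
    paid_to: Optional[str] = None

class MiniCrowdLedger:
    def __init__(self):
        self.tasks = {}

    def post_task(self, task_id, requester, description, reward):
        # The reward is held in escrow by the ledger instead of a central platform.
        self.tasks[task_id] = Task(requester, description, reward)

    def submit(self, task_id, worker, solution: str):
        # Workers commit a hash of their solution rather than the raw data.
        self.tasks[task_id].submissions[worker] = sha256(solution.encode()).hexdigest()

    def accept(self, task_id, worker):
        task = self.tasks[task_id]
        assert worker in task.submissions and task.paid_to is None
        task.paid_to = worker         # release the escrowed reward to the worker
        return task.reward

ledger = MiniCrowdLedger()
ledger.post_task("t1", "alice", "label 100 images", reward=10)
ledger.submit("t1", "bob", "labels-v1")
print(ledger.accept("t1", "bob"))     # 10
```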


Cites background from "A Survey of Crowdsourcing Systems"

  • ...We refer readers to a comprehensive survey on crowdsourcing for more information [16], [17], [18], [19], [20]....

    [...]

References
Proceedings ArticleDOI
25 Apr 2004
TL;DR: A new interactive system is introduced: a game that is fun and can be used to create valuable output. It addresses the image-labeling problem by encouraging people to do the work, taking advantage of their desire to be entertained.
Abstract: We introduce a new interactive system: a game that is fun and can be used to create valuable output. When people play the game they help determine the contents of images by providing meaningful labels for them. If the game is played as much as popular online games, we estimate that most images on the Web can be labeled in a few months. Having proper labels associated with each image on the Web would allow for more accurate image search, improve the accessibility of sites (by providing descriptions of images to visually impaired individuals), and help users block inappropriate images. Our system makes a significant contribution because of its valuable output and because of the way it addresses the image-labeling problem. Rather than using computer vision techniques, which don't work well enough, we encourage people to do the work by taking advantage of their desire to be entertained.

2,365 citations
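
The game mechanic described above reduces to a simple agreement rule: a label is trusted only when two players, working independently, propose the same word, and previously collected labels become "taboo" so that later rounds produce new descriptions. The snippet below is a simplified sketch of that rule with hypothetical guesses, not the ESP Game's actual matching logic.

```python
# Simplified sketch of the agreement mechanic behind the ESP Game:
# accept a label only when two independent players propose the same word,
# then add it to the taboo list so later rounds yield new labels.
def agree_on_label(guesses_a, guesses_b, taboo):
    """Return the first label both players propose that is not taboo."""
    seen_b = {g.lower() for g in guesses_b}
    for guess in guesses_a:
        g = guess.lower()
        if g in seen_b and g not in taboo:
            return g
    return None  # no match this round; the image can be shown again later

taboo_words = {"dog"}  # labels already collected for this (hypothetical) image
player_a = ["dog", "puppy", "grass"]
player_b = ["grass", "puppy"]
label = agree_on_label(player_a, player_b, taboo_words)
print(label)  # "puppy" -> stored as a new label and added to the taboo list
```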


"A Survey of Crowdsourcing Systems" refers background in this paper

  • ...This survey not only provides a better understanding about crowdsourcing systems, but also facilitates future research activities and application developments in the field of crowdsourcing....

    [...]

Proceedings ArticleDOI
25 Oct 2008
TL;DR: This work explores the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web, and proposes a technique for bias correction that significantly improves annotation quality on two tasks.
Abstract: Human linguistic annotation is crucial for many natural language processing tasks but can be expensive and time-consuming. We explore the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web. We investigate five tasks: affect recognition, word similarity, recognizing textual entailment, event temporal ordering, and word sense disambiguation. For all five, we show high agreement between Mechanical Turk non-expert annotations and existing gold standard labels provided by expert labelers. For the task of affect recognition, we also show that using non-expert labels for training machine learning algorithms can be as effective as using gold standard annotations from experts. We propose a technique for bias correction that significantly improves annotation quality on two tasks. We conclude that many large labeling tasks can be effectively designed and carried out in this method at a fraction of the usual expense.

2,237 citations
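
One simple way to realize the kind of bias correction mentioned in the abstract is to estimate each annotator's accuracy on a small set of gold-standard items and then take an accuracy-weighted vote on the remaining items. The sketch below illustrates that general idea with hypothetical workers and labels; it is not the specific correction procedure used in the cited paper.

```python
# Sketch of gold-calibrated vote weighting: estimate each annotator's accuracy
# on a few expert-labeled (gold) items, then take an accuracy-weighted vote on
# the remaining items. The data and weighting scheme are illustrative assumptions.
from collections import defaultdict

gold = {"q1": "pos", "q2": "neg"}                    # expert labels
annotations = {                                       # worker -> item -> label
    "w1": {"q1": "pos", "q2": "neg", "q3": "pos"},
    "w2": {"q1": "neg", "q2": "neg", "q3": "neg"},
    "w3": {"q1": "pos", "q2": "pos", "q3": "pos"},
}

def worker_accuracy(worker_labels):
    hits = sum(worker_labels.get(q) == y for q, y in gold.items())
    return (hits + 1) / (len(gold) + 2)              # smoothed accuracy estimate

weights = {w: worker_accuracy(labels) for w, labels in annotations.items()}

def weighted_vote(item):
    scores = defaultdict(float)
    for worker, labels in annotations.items():
        if item in labels:
            scores[labels[item]] += weights[worker]
    return max(scores, key=scores.get)

print(weights)                # per-worker accuracy estimates from the gold items
print(weighted_vote("q3"))    # accuracy-weighted consensus label for q3 ("pos")
```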


Additional excerpts

  • ...language tasks provided by expert labelers [73]....

    [...]

Journal ArticleDOI
TL;DR: An introduction to crowdsourcing is provided, both its theoretical grounding and exemplar cases, taking care to distinguish crowdsourcing from open source production.
Abstract: Crowdsourcing is an online, distributed problem-solving and production model that has emerged in recent years. Notable examples of the model include Threadless, iStockphoto, InnoCentive, the Goldcorp Challenge, and user-generated advertising contests. This article provides an introduction to crowdsourcing, both its theoretical grounding and exemplar cases, taking care to distinguish crowdsourcing from open source production. This article also explores the possibilities for the model, its potential to exploit a crowd of innovators, and its potential for use beyond for-profit sectors. Finally, this article proposes an agenda for research into crowdsourcing.

2,019 citations


"A Survey of Crowdsourcing Systems" refers methods in this paper

  • ...It is affiliated with the Microsoft-CUHK Joint Laboratory for Human-centric Computing and Interface Technologies....

    [...]

Book
18 Aug 2008
TL;DR: As mentioned in this paper, the idea of crowdsourcing was first identified by journalist Jeff Howe in a June 2006 Wired article; it describes the process by which the power of the many can be leveraged to accomplish feats that were once the province of the specialized few.
Abstract: The amount of knowledge and talent dispersed among the human race has always outstripped our capacity to harness it. Crowdsourcing corrects that, but in doing so, it also unleashes the forces of creative destruction. From Crowdsourcing: First identified by journalist Jeff Howe in a June 2006 Wired article, crowdsourcing describes the process by which the power of the many can be leveraged to accomplish feats that were once the province of the specialized few. Howe reveals that the crowd is more than wise: it is talented, creative, and stunningly productive. Crowdsourcing activates the transformative power of today's technology, liberating the latent potential within us all. It is a perfect meritocracy, where age, gender, race, education, and job history no longer matter; the quality of work is all that counts; and every field is open to people of every imaginable background. If you can perform the service, design the product, or solve the problem, you've got the job. But crowdsourcing has also triggered a dramatic shift in the way work is organized, talent is employed, research is conducted, and products are made and marketed. As the crowd comes to supplant traditional forms of labor, pain and disruption are inevitable. Jeff Howe delves into both the positive and negative consequences of this intriguing phenomenon. Through extensive reporting from the front lines of this revolution, he employs a brilliant array of stories to look at the economic, cultural, business, and political implications of crowdsourcing. How were a bunch of part-time dabblers in finance able to help an investment company consistently beat the market? Why does Procter & Gamble repeatedly call on enthusiastic amateurs to solve scientific and technical challenges? How can companies as diverse as iStockphoto and Threadless employ just a handful of people, yet generate millions of dollars in revenue every year? The answers lie within these pages. The blueprint for crowdsourcing originated from a handful of computer programmers who showed that a community of like-minded peers could create better products than a corporate behemoth like Microsoft. Jeff Howe tracks the amazing migration of this new model of production, showing the potential of the Internet to create human networks that can divvy up and make quick work of otherwise overwhelming tasks. One of the most intriguing ideas of Crowdsourcing is that the knowledge to solve intractable problems (a cure for cancer, for instance) may already exist within the warp and weave of this infinite and, as yet, largely untapped resource. But first, Howe proposes, we need to banish preconceived notions of how such problems are solved. The very concept of crowdsourcing stands at odds with centuries of practice. Yet, for the digital natives soon to enter the workforce, the technologies and principles behind crowdsourcing are perfectly intuitive. This generation collaborates, shares, remixes, and creates with a fluency and ease the rest of us can hardly understand. Crowdsourcing, just now starting to emerge, will in a short time simply be the way things are done.

1,674 citations


"A Survey of Crowdsourcing Systems" refers background in this paper

  • ...To reduce a company’s production costs and make more efficient use of labor and resources, crowdsourcing was proposed [33]....

    [...]

Proceedings ArticleDOI
24 Aug 2008
TL;DR: The results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
Abstract: This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Rent-A-Coder or Amazon's Mechanical Turk, it often is possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling. We present repeated-labeling strategies of increasing complexity, and show several main results. (i) Repeated-labeling can improve label quality and model quality, but not always. (ii) When labels are noisy, repeated labeling can be preferable to single labeling even in the traditional setting where labels are not particularly cheap. (iii) As soon as the cost of processing the unlabeled data is not free, even the simple strategy of labeling everything multiple times can give considerable advantage. (iv) Repeatedly labeling a carefully chosen set of points is generally preferable, and we present a robust technique that combines different notions of uncertainty to select data points for which quality should be improved. The bottom line: the results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.

1,199 citations
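
A useful quantity in the repeated-labeling setting is the quality of the majority-vote (integrated) label as a function of each individual label's accuracy. Under the simplifying assumption of independent labelers with identical accuracy p and an odd number of labels, that quality is a binomial tail probability, which the short sketch below computes.

```python
# Quality of a majority-vote label from an odd number of independent labels,
# each correct with probability p (an idealized model; real labelers differ).
from math import comb

def majority_vote_quality(p: float, n_labels: int) -> float:
    """P(majority of n_labels independent labels is correct), n_labels odd."""
    assert n_labels % 2 == 1, "use an odd number of labels to avoid ties"
    need = n_labels // 2 + 1
    return sum(comb(n_labels, k) * p**k * (1 - p)**(n_labels - k)
               for k in range(need, n_labels + 1))

for n in (1, 3, 5, 11):
    print(n, round(majority_vote_quality(0.7, n), 3))
# Quality rises from 0.7 with a single label toward 1.0 as labels are added,
# but the marginal benefit of each extra label shrinks.
```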


"A Survey of Crowdsourcing Systems" refers methods in this paper

  • ...[69] proposed an analysis to model the data quality using...

    [...]