Topic

Crowdsourcing

About: Crowdsourcing is a research topic. Over its lifetime, 12,889 publications have been published within this topic, receiving 230,638 citations.


Papers
Proceedings ArticleDOI
Yuanshun Yao, Bimal Viswanath, Jenna Cryan, Haitao Zheng, Ben Y. Zhao
30 Oct 2017
TL;DR: In this paper, the authors identify a new class of attacks that leverage deep learning language models (Recurrent Neural Networks or RNNs) to automate the generation of fake online reviews for products and services.
Abstract: Malicious crowdsourcing forums are gaining traction as sources of spreading misinformation online, but are limited by the costs of hiring and managing human workers. In this paper, we identify a new class of attacks that leverage deep learning language models (Recurrent Neural Networks or RNNs) to automate the generation of fake online reviews for products and services. Not only are these attacks cheap and therefore more scalable, but they can control the rate of content output to eliminate the signature burstiness that makes crowdsourced campaigns easy to detect. Using Yelp reviews as an example platform, we show how a two-phase review generation and customization attack can produce reviews that are indistinguishable by state-of-the-art statistical detectors. We conduct a survey-based user study to show these reviews not only evade human detection, but also score high on "usefulness" metrics by users. Finally, we develop novel automated defenses against these attacks, by leveraging the lossy transformation introduced by the RNN training and generation cycle. We consider countermeasures against our mechanisms, show that they produce unattractive cost-benefit tradeoffs for attackers, and that they can be further curtailed by simple constraints imposed by online service providers.
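The core generation step the abstract describes (an RNN language model sampling review text character by character) can be sketched as follows. This is a minimal illustration in PyTorch, not the paper's implementation; the architecture, vocabulary mappings (`stoi`/`itos`), seed text, and sampling temperature are all assumptions, and the abstract's second "customization" phase is not shown.

```python
# Minimal sketch of a character-level RNN review generator (PyTorch).
# Architecture, vocabulary, and sampling temperature are illustrative assumptions.
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size=512, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.out(h), state

def sample_review(model, stoi, itos, seed="the food ", length=300, temperature=0.8):
    """Generate one synthetic review, one character at a time."""
    model.eval()
    state = None
    x = torch.tensor([[stoi[c] for c in seed]])
    chars = list(seed)
    with torch.no_grad():
        for _ in range(length):
            logits, state = model(x, state)
            # Sample the next character from the temperature-scaled distribution.
            probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
            idx = torch.multinomial(probs, 1).item()
            chars.append(itos[idx])
            x = torch.tensor([[idx]])
    return "".join(chars)
```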

126 citations

Journal ArticleDOI
TL;DR: In this paper, a series of experiments using Amazon's Mechanical Turk is described, conducted to explore the 'human' characteristics of the crowds involved in a relevance assessment task and to arrive at insights into the complex interaction of the observed factors.
Abstract: Crowdsourcing relevance judgments for the evaluation of search engines is used increasingly to overcome the issue of scalability that hinders traditional approaches relying on a fixed group of trusted expert judges. However, the benefits of crowdsourcing come with risks due to the engagement of a self-forming group of individuals (the crowd), motivated by different incentives, who complete the tasks with varying levels of attention and success. This increases the need for a careful design of crowdsourcing tasks that attracts the right crowd for the given task and promotes quality work. In this paper, we describe a series of experiments using Amazon's Mechanical Turk, conducted to explore the 'human' characteristics of the crowds involved in a relevance assessment task. In the experiments, we vary the level of pay offered, the effort required to complete a task, and the qualifications required of the workers. We observe the effects of these variables on the quality of the resulting relevance labels, measured based on agreement with a gold set, and correlate them with self-reported measures of various human factors. We elicit information from the workers about their motivations, interest and familiarity with the topic, perceived task difficulty, and satisfaction with the offered pay. We investigate how these factors combine with aspects of the task design and how they affect the accuracy of the resulting relevance labels. Based on the analysis of 960 HITs and 2,880 HIT assignments resulting in 19,200 relevance labels, we arrive at insights into the complex interaction of the observed factors and provide practical guidelines to crowdsourcing practitioners. In addition, we highlight challenges in the data analysis that stem from the peculiarity of the crowdsourcing environment, where the sample of individuals engaged in specific work conditions is inherently influenced by the conditions themselves.
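As a concrete illustration of the quality measure the abstract mentions (agreement of crowd labels with a gold set), the sketch below aggregates redundant HIT assignments by majority vote and scores the aggregated labels against expert judgments. The data layout, field names, and the choice of majority voting are illustrative assumptions, not the paper's exact methodology.

```python
# Sketch: aggregate redundant crowd labels per HIT and score against a gold set.
from collections import Counter, defaultdict

def majority_vote(labels):
    """Most frequent label among a HIT's assignments (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

def accuracy_vs_gold(assignments, gold):
    """
    assignments: list of (hit_id, worker_id, label) tuples
    gold:        dict mapping hit_id -> expert relevance label
    Returns the fraction of HITs whose aggregated label matches the gold label.
    """
    by_hit = defaultdict(list)
    for hit_id, _worker, label in assignments:
        by_hit[hit_id].append(label)
    agreed = sum(majority_vote(labels) == gold[hit] for hit, labels in by_hit.items())
    return agreed / len(by_hit)
```

Computed per experimental condition (pay level, effort, qualifications), this kind of accuracy figure is what would then be correlated with the self-reported human factors.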

125 citations

25 Apr 2017
TL;DR: In this paper, the authors show that Facebook posts can be classified with high accuracy as hoaxes or non-hoaxes on the basis of the users who "like" them.
Abstract: In recent years, the reliability of information on the Internet has emerged as a crucial issue of modern society. Social network sites (SNSs) have revolutionized the way in which information is spread by allowing users to freely share content. As a consequence, SNSs are also increasingly used as vectors for the diffusion of misinformation and hoaxes. The amount of disseminated information and the rapidity of its diffusion make it practically impossible to assess reliability in a timely manner, highlighting the need for automatic hoax detection systems. As a contribution towards this objective, we show that Facebook posts can be classified with high accuracy as hoaxes or non-hoaxes on the basis of the users who "liked" them. We present two classification techniques, one based on logistic regression, the other on a novel adaptation of boolean crowdsourcing algorithms. On a dataset consisting of 15,500 Facebook posts and 909,236 users, we obtain classification accuracies exceeding 99% even when the training set contains less than 1% of the posts. We further show that our techniques are robust: they work even when we restrict our attention to the users who like both hoax and non-hoax posts. These results suggest that mapping the diffusion pattern of information can be a useful component of automatic hoax detection systems.
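Of the two classification techniques the abstract mentions, the logistic-regression variant can be sketched roughly as follows: each post is represented by the set of users who liked it, encoded as a sparse one-hot "bag of users" vector, and a classifier separates hoaxes from non-hoaxes. The field names, encoding, and train/test split are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch: hoax classification from "likes" with a bag-of-users encoding.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def likes_to_features(posts):
    """posts: list of dicts like {"likers": [user_id, ...], "is_hoax": bool}."""
    X = [{str(user): 1 for user in p["likers"]} for p in posts]
    y = [int(p["is_hoax"]) for p in posts]
    return X, y

def train_hoax_classifier(posts):
    X_raw, y = likes_to_features(posts)
    vec = DictVectorizer()                  # sparse one-hot encoding of liker IDs
    X = vec.fit_transform(X_raw)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf, vec, accuracy_score(y_te, clf.predict(X_te))
```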

125 citations

Proceedings Article
16 Jun 2013
TL;DR: A novel approximate policy called optimistic knowledge gradient is proposed; it is practically efficient while its consistency can be theoretically guaranteed, and the framework is extended to deal with inhomogeneous workers and tasks with contextual information available.
Abstract: In real crowdsourcing applications, each label from a crowd usually comes with a certain cost. Given a pre-fixed budget, since different tasks have different ambiguities and different workers have different levels of expertise, we want to find an optimal way to allocate the budget among instance-worker pairs such that the overall label quality can be maximized. To address this issue, we start from the simplest setting, in which all workers are assumed to be perfect. We formulate the problem as a Bayesian Markov Decision Process (MDP). Using the dynamic programming (DP) algorithm, one can obtain the optimal allocation policy for a given budget. However, DP is computationally intractable. To solve the computational challenge, we propose a novel approximate policy called optimistic knowledge gradient. It is practically efficient while its consistency can be theoretically guaranteed. We then extend the MDP framework to deal with inhomogeneous workers and tasks with contextual information available. Experiments on both simulated and real data demonstrate the superiority of our method.
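A rough sketch of the optimistic knowledge gradient idea, under the abstract's simplest setting (binary labels, perfect workers, one Beta posterior per instance), is given below. The stage reward used here, the expected accuracy of committing to the posterior-mode label, is an illustrative choice and may differ in detail from the paper's formulation.

```python
# Sketch: budget allocation with an optimistic knowledge gradient policy.
# Assumptions: binary labels, Beta(a, b) posterior per instance, one label per step.

def stage_reward(a, b):
    """Expected accuracy if we commit to the majority label under Beta(a, b)."""
    return max(a, b) / (a + b)

def optimistic_kg_pick(posteriors):
    """posteriors: list of (a, b) pairs, one per instance. Returns index to label next."""
    def okg_score(a, b):
        gain_pos = stage_reward(a + 1, b) - stage_reward(a, b)
        gain_neg = stage_reward(a, b + 1) - stage_reward(a, b)
        return max(gain_pos, gain_neg)      # "optimistic": best-case one-step gain
    return max(range(len(posteriors)), key=lambda i: okg_score(*posteriors[i]))

def allocate_budget(num_instances, budget, get_label):
    """Spend `budget` label requests one at a time using the optimistic KG policy.
    get_label(i) is a callback returning a crowd label (0 or 1) for instance i."""
    posteriors = [(1, 1)] * num_instances   # uniform Beta(1, 1) priors
    for _ in range(budget):
        i = optimistic_kg_pick(posteriors)
        a, b = posteriors[i]
        posteriors[i] = (a + 1, b) if get_label(i) == 1 else (a, b + 1)
    return [int(a >= b) for a, b in posteriors]   # final label decisions
```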

124 citations

Proceedings ArticleDOI
08 Apr 2013
TL;DR: This work develops statistical tools that enable users and systems developers to reason about query completeness and can also help drive query execution and crowdsourcing strategies.
Abstract: Hybrid human/computer database systems promise to greatly expand the usefulness of query processing by incorporating the crowd for data gathering and other tasks. Such systems raise many implementation questions. Perhaps the most fundamental is that the closed-world assumption underlying relational query semantics does not hold in such systems. As a consequence, the meaning of even simple queries can be called into question. Furthermore, query progress monitoring becomes difficult due to non-uniformities in the arrival of crowdsourced data and peculiarities of how people work in crowdsourcing systems. To address these issues, we develop statistical tools that enable users and systems developers to reason about query completeness. These tools can also help drive query execution and crowdsourcing strategies. We evaluate our techniques using experiments on a popular crowdsourcing platform.
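One family of statistical tools that matches the abstract's description is species estimation from ecology: from the frequencies of the answers collected so far, project how many distinct answers exist in total and hence how complete the query result is. The Chao1 estimator sketched below is an illustrative choice, not necessarily the estimator the paper develops.

```python
# Sketch: estimating completeness of a crowdsourced enumeration query
# with the Chao1 species estimator.
from collections import Counter

def chao1_estimate(answers):
    """answers: iterable of raw crowd answers (with duplicates).
    Returns (distinct_seen, estimated_total, estimated_completeness)."""
    freqs = Counter(answers)
    s_obs = len(freqs)                                # distinct answers observed
    f1 = sum(1 for c in freqs.values() if c == 1)     # answers seen exactly once
    f2 = sum(1 for c in freqs.values() if c == 2)     # answers seen exactly twice
    if f2 > 0:
        s_est = s_obs + f1 * f1 / (2 * f2)
    else:
        s_est = s_obs + f1 * (f1 - 1) / 2             # bias-corrected fallback
    return s_obs, s_est, s_obs / s_est if s_est else 1.0

# Example: many answers still appearing only once signals an incomplete result.
print(chao1_estimate(["vanilla", "chocolate", "vanilla", "mint",
                      "pistachio", "mocha", "chocolate"]))
```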

124 citations


Network Information
Related Topics (5)
Social network: 42.9K papers, 1.5M citations, 87% related
User interface: 85.4K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Cluster analysis: 146.5K papers, 2.9M citations, 85% related
The Internet: 213.2K papers, 3.8M citations, 85% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    637
2022    1,420
2021    996
2020    1,250
2019    1,341
2018    1,396