scispace - formally typeset
Search or ask a question
Topic

Crowdsourcing

About: Crowdsourcing is a research topic. Over the lifetime, 12889 publications have been published within this topic receiving 230638 citations.


Papers
More filters
Proceedings ArticleDOI
Tian Tian1, Jun Zhu1, Fen Xia2, Zhuang Xin2, Tong Zhang2 
18 May 2015
TL;DR: This paper carefully examines the characteristics of the group behaviors of crowd fraud and identifies three persistent patterns, which are moderateness, synchronicity and dispersivity, and proposes an effective crowd fraud detection method for search engine advertising based on these patterns.
Abstract: The rise of crowdsourcing brings new types of malpractices in Internet advertising. One can easily hire web workers through malicious crowdsourcing platforms to attack other advertisers. Such human generated crowd frauds are hard to detect by conventional fraud detection methods. In this paper, we carefully examine the characteristics of the group behaviors of crowd fraud and identify three persistent patterns, which are moderateness, synchronicity and dispersivity. Then we propose an effective crowd fraud detection method for search engine advertising based on these patterns, which consists of a constructing stage, a clustering stage and a filtering stage. At the constructing stage, we remove irrelevant data and reorganize the click logs into a surfer-advertiser inverted list; At the clustering stage, we define the sync-similarity between surfers' click histories and transform the coalition detection to a clustering problem, solved by a nonparametric algorithm; and finally we build a dispersity filter to remove false alarm clusters. The nonparametric nature of our method ensures that we can find an unbounded number of coalitions with nearly no human interaction. We also provide a parallel solution to make the method scalable to Web data and conduct extensive experiments. The empirical results demonstrate that our method is accurate and scalable.

60 citations

Journal ArticleDOI
TL;DR: This article conducted an event study analyzing stock market reactions to crowdsourcing announcements, a forward-looking market-based measure able to isolate the effect of crowdsourcing on a firm's future profits, which they refer to as firm stock market performance.

60 citations

Proceedings ArticleDOI
27 Apr 2013
TL;DR: The TimeWarp approach automatically increases and decreases the speed of speech playback systematically across individual workers who caption only the periods played at reduced speed, which may help crowds outperform individuals on other difficult real-time performance tasks.
Abstract: In this paper, we introduce the idea of "warping time" to improve crowd performance on the difficult task of captioning speech in real-time. Prior work has shown that the crowd can collectively caption speech in real-time by merging the partial results of multiple workers. Because non-expert workers cannot keep up with natural speaking rates, the task is frustrating and prone to errors as workers buffer what they hear to type later. The TimeWarp approach automatically increases and decreases the speed of speech playback systematically across individual workers who caption only the periods played at reduced speed. Studies with 139 remote crowd workers and 24 local participants show that this approach improves median coverage (14.8%), precision (11.2%), and per-word latency (19.1%). Warping time may also help crowds outperform individuals on other difficult real-time performance tasks.

60 citations

Proceedings ArticleDOI
11 Aug 2013
TL;DR: A hierarchical Bayesian model, TLC (Transfer Learning for Crowdsourcing), is proposed to implement this idea by considering the overlapping users as a bridge, which borrows knowledge from auxiliary historical tasks to improve the data veracity in a given target task.
Abstract: Crowdsourcing is an effective method for collecting labeled data for various data mining tasks. It is critical to ensure the veracity of the produced data because responses collected from different users may be noisy and unreliable. Previous works solve this veracity problem by estimating both the user ability and question difficulty based on the knowledge in each task individually. In this case, each single task needs large amounts of data to provide accurate estimations. However, in practice, budgets provided by customers for a given target task may be limited, and hence each question can be presented to only a few users where each user can answer only a few questions. This data sparsity problem can cause previous approaches to perform poorly due to the overfitting problem on rare data and eventually damage the data veracity. Fortunately, in real-world applications, users can answer questions from multiple historical tasks. For example, one can annotate images as well as label the sentiment of a given title. In this paper, we employ transfer learning, which borrows knowledge from auxiliary historical tasks to improve the data veracity in a given target task. The motivation is that users have stable characteristics across different crowdsourcing tasks and thus data from different tasks can be exploited collectively to estimate users' abilities in the target task. We propose a hierarchical Bayesian model, TLC (Transfer Learning for Crowdsourcing), to implement this idea by considering the overlapping users as a bridge. In addition, to avoid possible negative impact, TLC introduces task-specific factors to model task differences. The experimental results show that TLC significantly improves the accuracy over several state-of-the-art non-transfer-learning approaches under very limited budget in various labeling tasks.

60 citations


Network Information
Related Topics (5)
Social network
42.9K papers, 1.5M citations
87% related
User interface
85.4K papers, 1.7M citations
86% related
Deep learning
79.8K papers, 2.1M citations
85% related
Cluster analysis
146.5K papers, 2.9M citations
85% related
The Internet
213.2K papers, 3.8M citations
85% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023637
20221,420
2021996
20201,250
20191,341
20181,396