Topic
Crowdsourcing
About: Crowdsourcing is a research topic. Over the lifetime, 12889 publications have been published within this topic receiving 230638 citations.
Papers published on a yearly basis
Papers
More filters
••
18 May 2015TL;DR: This paper carefully examines the characteristics of the group behaviors of crowd fraud and identifies three persistent patterns, which are moderateness, synchronicity and dispersivity, and proposes an effective crowd fraud detection method for search engine advertising based on these patterns.
Abstract: The rise of crowdsourcing brings new types of malpractices in Internet advertising. One can easily hire web workers through malicious crowdsourcing platforms to attack other advertisers. Such human generated crowd frauds are hard to detect by conventional fraud detection methods. In this paper, we carefully examine the characteristics of the group behaviors of crowd fraud and identify three persistent patterns, which are moderateness, synchronicity and dispersivity. Then we propose an effective crowd fraud detection method for search engine advertising based on these patterns, which consists of a constructing stage, a clustering stage and a filtering stage. At the constructing stage, we remove irrelevant data and reorganize the click logs into a surfer-advertiser inverted list; At the clustering stage, we define the sync-similarity between surfers' click histories and transform the coalition detection to a clustering problem, solved by a nonparametric algorithm; and finally we build a dispersity filter to remove false alarm clusters. The nonparametric nature of our method ensures that we can find an unbounded number of coalitions with nearly no human interaction. We also provide a parallel solution to make the method scalable to Web data and conduct extensive experiments. The empirical results demonstrate that our method is accurate and scalable.
60 citations
••
60 citations
••
TL;DR: This article conducted an event study analyzing stock market reactions to crowdsourcing announcements, a forward-looking market-based measure able to isolate the effect of crowdsourcing on a firm's future profits, which they refer to as firm stock market performance.
60 citations
••
27 Apr 2013TL;DR: The TimeWarp approach automatically increases and decreases the speed of speech playback systematically across individual workers who caption only the periods played at reduced speed, which may help crowds outperform individuals on other difficult real-time performance tasks.
Abstract: In this paper, we introduce the idea of "warping time" to improve crowd performance on the difficult task of captioning speech in real-time. Prior work has shown that the crowd can collectively caption speech in real-time by merging the partial results of multiple workers. Because non-expert workers cannot keep up with natural speaking rates, the task is frustrating and prone to errors as workers buffer what they hear to type later. The TimeWarp approach automatically increases and decreases the speed of speech playback systematically across individual workers who caption only the periods played at reduced speed. Studies with 139 remote crowd workers and 24 local participants show that this approach improves median coverage (14.8%), precision (11.2%), and per-word latency (19.1%). Warping time may also help crowds outperform individuals on other difficult real-time performance tasks.
60 citations
••
11 Aug 2013TL;DR: A hierarchical Bayesian model, TLC (Transfer Learning for Crowdsourcing), is proposed to implement this idea by considering the overlapping users as a bridge, which borrows knowledge from auxiliary historical tasks to improve the data veracity in a given target task.
Abstract: Crowdsourcing is an effective method for collecting labeled data for various data mining tasks. It is critical to ensure the veracity of the produced data because responses collected from different users may be noisy and unreliable. Previous works solve this veracity problem by estimating both the user ability and question difficulty based on the knowledge in each task individually. In this case, each single task needs large amounts of data to provide accurate estimations. However, in practice, budgets provided by customers for a given target task may be limited, and hence each question can be presented to only a few users where each user can answer only a few questions. This data sparsity problem can cause previous approaches to perform poorly due to the overfitting problem on rare data and eventually damage the data veracity. Fortunately, in real-world applications, users can answer questions from multiple historical tasks. For example, one can annotate images as well as label the sentiment of a given title. In this paper, we employ transfer learning, which borrows knowledge from auxiliary historical tasks to improve the data veracity in a given target task. The motivation is that users have stable characteristics across different crowdsourcing tasks and thus data from different tasks can be exploited collectively to estimate users' abilities in the target task. We propose a hierarchical Bayesian model, TLC (Transfer Learning for Crowdsourcing), to implement this idea by considering the overlapping users as a bridge. In addition, to avoid possible negative impact, TLC introduces task-specific factors to model task differences. The experimental results show that TLC significantly improves the accuracy over several state-of-the-art non-transfer-learning approaches under very limited budget in various labeling tasks.
60 citations