Showing papers on "Crowdsourcing published in 2015"

PDF

Open Access

Proceedings Article•DOI•

Evaluation methods for unsupervised word embeddings

[...]

Tobias Schnabel¹, Igor Labutov¹, David Mimno¹, Thorsten Joachims¹•Institutions (1)

01 Sep 2015

TL;DR: A comprehensive study of evaluation methods for unsupervised embedding techniques that obtain meaningful representations of words from text, calling into question the common assumption that there is one single optimal vector representation.

...read moreread less

Abstract: We present a comprehensive study of evaluation methods for unsupervised embedding techniques that obtain meaningful representations of words from text. Different evaluations result in different orderings of embedding methods, calling into question the common assumption that there is one single optimal vector representation. We present new evaluation techniques that directly compare embeddings with respect to specific queries. These methods reduce bias, provide greater insight, and allow us to solicit data-driven relevance judgments rapidly and accurately through crowdsourcing.

...read moreread less

560 citations

Posted Content•

An open access repository of images on plant health to enable the development of mobile disease diagnostics

[...]

David P. Hughes, Marcel Salathé

25 Nov 2015-arXiv: Computers and Society

TL;DR: These data are the beginning of an on-going, crowdsourcing effort to enable computer vision approaches to help solve the problem of yield losses in crop plants due to infectious diseases.

...read moreread less

Abstract: Human society needs to increase food production by an estimated 70% by 2050 to feed an expected population size that is predicted to be over 9 billion people. Currently, infectious diseases reduce the potential yield by an average of 40% with many farmers in the developing world experiencing yield losses as high as 100%. The widespread distribution of smartphones among crop growers around the world with an expected 5 billion smartphones by 2020 offers the potential of turning the smartphone into a valuable tool for diverse communities growing food. One potential application is the development of mobile disease diagnostics through machine learning and crowdsourcing. Here we announce the release of over 50,000 expertly curated images on healthy and infected leaves of crops plants through the existing online platform PlantVillage. We describe both the data and the platform. These data are the beginning of an on-going, crowdsourcing effort to enable computer vision approaches to help solve the problem of yield losses in crop plants due to infectious diseases.

...read moreread less

492 citations

Journal Article•DOI•

The Impact of Culture on Creativity: How Cultural Tightness and Cultural Distance Affect Global Innovation Crowdsourcing Work

[...]

Roy Y. J. Chua¹, Yannig Roth², Jean-François Lemoine³, Jean-François Lemoine²•Institutions (3)

Singapore Management University¹, University of Paris², École Normale Supérieure³

01 Jun 2015-Administrative Science Quarterly

TL;DR: The authors found that individuals from tight cultures are less likely than counterparts from loose cultures to engage in and succeed at foreign creative tasks; this effect is intensified as the cultural distance between the innovator's and the audience's country increases.

...read moreread less

Abstract: This paper advances a new theoretical model to understand the effect of culture on creativity in a global context. We theorize that creativity engagement and success depend on the cultural tightness—the extent to which a country is characterized by strong social norms and low tolerance for deviant behaviors—of both an innovator’s country and the audience’s country, as well as the cultural distance between these two countries. Using field data from a global online crowdsourcing platform that organizes creative contests for consumer-product brands, supplemented by interviews with marketing experts, we found that individuals from tight cultures are less likely than counterparts from loose cultures to engage in and succeed at foreign creative tasks; this effect is intensified as the cultural distance between the innovator’s and the audience’s country increases. Additionally, tight cultures are less receptive to foreign creative ideas. But we also found that in certain circumstances—when members of a tight cul...

...read moreread less

312 citations

Journal Article•DOI•

Crowdsourcing for climate and atmospheric sciences: current status and future potential

[...]

Catherine L. Muller¹, Lee Chapman¹, Samuel Johnston, Chris Kidd², Chris Kidd³, Sam Illingworth⁴, Giles M. Foody⁵, Aart Overeem⁶, Aart Overeem⁷, Rosie Leigh⁸ - Show less +6 more•Institutions (8)

University of Birmingham¹, University of Maryland, College Park², Goddard Space Flight Center³, Manchester Metropolitan University⁴, University of Nottingham⁵, Royal Netherlands Meteorological Institute⁶, Wageningen University and Research Centre⁷, University of Leicester⁸

01 Sep 2015-International Journal of Climatology

TL;DR: If appropriate validation and quality control procedures are adopted and implemented, crowdsourcing has much potential to provide a valuable source of high temporal and spatial resolution, real-time data, especially in regions where few observations currently exist, thereby adding value to science, technology and society.

...read moreread less

Abstract: Crowdsourcing is traditionally defined as obtaining data or information by enlisting the services of a (potentially large) number of people. However, due to recent innovations, this definition can now be expanded to include ‘and/or from a range of public sensors, typically connected via the Internet.’ A large and increasing amount of data is now being obtained from a huge variety of non-traditional sources – from smart phone sensors to amateur weather stations to canvassing members of the public. Some disciplines (e.g. astrophysics, ecology) are already utilizing crowdsourcing techniques (e.g. citizen science initiatives, web 2.0 technology, low-cost sensors), and while its value within the climate and atmospheric science disciplines is still relatively unexplored, it is beginning to show promise. However, important questions remain; this paper introduces and explores the wide-range of current and prospective methods to crowdsource atmospheric data, investigates the quality of such data and examines its potential applications in the context of weather, climate and society. It is clear that crowdsourcing is already a valuable tool for engaging the public, and if appropriate validation and quality control procedures are adopted and implemented, it has much potential to provide a valuable source of high temporal and spatial resolution, real-time data, especially in regions where few observations currently exist, thereby adding value to science, technology and society.

...read moreread less

271 citations

Journal Article•DOI•

LIRIS-ACCEDE: A Video Database for Affective Content Analysis

[...]

Yoann Baveye, Emmanuel Dellandréa¹, Christel Chamaret, Liming Chen¹•Institutions (1)

Centre national de la recherche scientifique¹

01 Jan 2015-IEEE Transactions on Affective Computing

TL;DR: A large video database, namely LIRIS-ACCEDE, is proposed, which consists of 9,800 good quality video excerpts with a large content diversity and provides four experimental protocols and a baseline for prediction of emotions using a large set of both visual and audio features.

...read moreread less

Abstract: Research in affective computing requires ground truth data for training and benchmarking computational models for machine-based emotion understanding. In this paper, we propose a large video database, namely LIRIS-ACCEDE, for affective content analysis and related applications, including video indexing, summarization or browsing. In contrast to existing datasets with very few video resources and limited accessibility due to copyright constraints, LIRIS-ACCEDE consists of 9,800 good quality video excerpts with a large content diversity. All excerpts are shared under creative commons licenses and can thus be freely distributed without copyright issues. Affective annotations were achieved using crowdsourcing through a pair-wise video comparison protocol, thereby ensuring that annotations are fully consistent, as testified by a high inter-annotator agreement, despite the large diversity of raters’ cultural backgrounds. In addition, to enable fair comparison and landmark progresses of future affective computational models, we further provide four experimental protocols and a baseline for prediction of emotions using a large set of both visual and audio features. The dataset (the video clips, annotations, features and protocols) is publicly available at: http://liris-accede.ec-lyon.fr/.

...read moreread less

270 citations

Posted Content•

An open access repository of images on plant health to enable the development of mobile disease diagnostics through machine learning and crowdsourcing

[...]

David P. Hughes, Marcel Salathé

25 Nov 2015-arXiv: Computers and Society

TL;DR: In this article, the authors have released >50,000 expertly curated images on healthy and infected leaves of crops plants through the existing platform www.plantVillage.org.

...read moreread less

Abstract: Human society needs to increase food production by an estimated 70% by 2050 to feed an expected population size that is predicted to be over 9 billion people. Currently infectious diseases reduce the potential yield by an average of 40% with many farmers in the developing world experiencing yield losses as high as 100%. Infectious diseases of crops are not new and historic examples such as the Irish Potato Famine of 1845-49 demonstrate this. But what is new is the widespread distribution of smartphones among crop growers around the world with an expected 5 billion smartphones by 2020. This offers the potential of turning the smartphone into a valuable tool for diverse communities growing food. One potential application is the development of mobile disease diagnostics through machine learning and crowdsourcing. Computer vision and machine learning have shown their potential to automatically classify images. To do this for plant diseases requires a training set that facilitates the development of the algorithms. Here we announce the release of >50,000 expertly curated images on healthy and infected leaves of crops plants through the existing platform www.PlantVillage.org. We describe both the data and the platform. These data are the beginning of an on-going, crowdsourcing effort to enable computer vision approaches to help solve the problem of yield losses in crop plants due to infectious diseases.

...read moreread less

265 citations

Proceedings Article•DOI•

KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing

[...]

Xu Chu¹, John Morcos¹, Ihab F. Ilyas¹, Mourad Ouzzani², Paolo Papotti², Nan Tang², Yin Ye² - Show less +3 more•Institutions (2)

University of Waterloo¹, Qatar Computing Research Institute²

27 May 2015

TL;DR: KATARA is proposed, a knowledge base and crowd powered data cleaning system that interprets table semantics to align it with the KB, identifies correct and incorrect data, and generates top-k possible repairs for incorrect data.

...read moreread less

Abstract: Classical approaches to clean data have relied on using integrity constraints, statistics, or machine learning. These approaches are known to be limited in the cleaning accuracy, which can usually be improved by consulting master data and involving experts to resolve ambiguity. The advent of knowledge bases KBs both general-purpose and within enterprises, and crowdsourcing marketplaces are providing yet more opportunities to achieve higher accuracy at a larger scale. We propose KATARA, a knowledge base and crowd powered data cleaning system that, given a table, a KB, and a crowd, interprets table semantics to align it with the KB, identifies correct and incorrect data, and generates top-k possible repairs for incorrect data. Experiments show that KATARA can be applied to various datasets and KBs, and can efficiently annotate data and suggest possible repairs.

...read moreread less

259 citations

Journal Article•DOI•

The cultural work of microwork

[...]

Lilly Irani¹•Institutions (1)

University of California, San Diego¹

01 May 2015-New Media & Society

TL;DR: It is argued that microwork systems produce the difference between “innovative“ laborers and “menial” laborers, ameliorating resulting tensions in new media production cultures in turn.

...read moreread less

Abstract: Crowdsourcing systems do more than get information work done. This paper argues that microwork systems produce the difference between “innovative” laborers and “menial” laborers, ameliorating resulting tensions in new media production cultures in turn. This paper focuses on Amazon Mechanical Turk (AMT) as an emblematic case of microwork crowdsourcing. Ethical research on crowdsourcing has focused on questions of worker fairness and microlabor alienation. This paper focuses on the cultural work of AMT’s mediations: divisions of labor and software interfaces. This paper draws from infrastructure studies and feminist science and technology studies to examine Amazon Mechanical Turk labor practice, its methods of worker control, and the kinds of users it produces.

...read moreread less

257 citations

Journal Article•DOI•

Crowd science user contribution patterns and their implications

[...]

Henry Sauermann¹, Chiara Franzoni²•Institutions (2)

Georgia Institute of Technology¹, Polytechnic University of Milan²

20 Jan 2015-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: The findings inform recent discussions about potential benefits from crowd science, suggest that involving the crowd may be more effective for some kinds of projects than others, provide guidance for project managers, and raise important questions for future research.

...read moreread less

Abstract: Scientific research performed with the involvement of the broader public (the crowd) attracts increasing attention from scientists and policy makers. A key premise is that project organizers may be able to draw on underused human resources to advance research at relatively low cost. Despite a growing number of examples, systematic research on the effort contributions volunteers are willing to make to crowd science projects is lacking. Analyzing data on seven different projects, we quantify the financial value volunteers can bring by comparing their unpaid contributions with counterfactual costs in traditional or online labor markets. The volume of total contributions is substantial, although some projects are much more successful in attracting effort than others. Moreover, contributions received by projects are very uneven across time—a tendency toward declining activity is interrupted by spikes typically resulting from outreach efforts or media attention. Analyzing user-level data, we find that most contributors participate only once and with little effort, leaving a relatively small share of users who return responsible for most of the work. Although top contributor status is earned primarily through higher levels of effort, top contributors also tend to work faster. This speed advantage develops over multiple sessions, suggesting that it reflects learning rather than inherent differences in skills. Our findings inform recent discussions about potential benefits from crowd science, suggest that involving the crowd may be more effective for some kinds of projects than others, provide guidance for project managers, and raise important questions for future research.

...read moreread less

243 citations

Proceedings Article•DOI•

Understanding Malicious Behavior in Crowdsourcing Platforms: The Case of Online Surveys

[...]

Ujwal Gadiraju¹, Ricardo Kawase, Stefan Dietze, Gianluca Demartini²•Institutions (2)

Leibniz University of Hanover¹, University of Sheffield²

18 Apr 2015

TL;DR: The prevalent malicious activity on crowdsourcing platforms is analyzed and different types of workers in the crowd are defined, a method to measure malicious activity is proposed, and guidelines for the efficient design of crowdsourced surveys are presented.

...read moreread less

Abstract: Crowdsourcing is increasingly being used as a means to tackle problems requiring human intelligence. With the ever-growing worker base that aims to complete microtasks on crowdsourcing platforms in exchange for financial gains, there is a need for stringent mechanisms to prevent exploitation of deployed tasks. Quality control mechanisms need to accommodate a diverse pool of workers, exhibiting a wide range of behavior. A pivotal step towards fraud-proof task design is understanding the behavioral patterns of microtask workers. In this paper, we analyze the prevalent malicious activity on crowdsourcing platforms and study the behavior exhibited by trustworthy and untrustworthy workers, particularly on crowdsourced surveys. Based on our analysis of the typical malicious activity, we define and identify different types of workers in the crowd, propose a method to measure malicious activity, and finally present guidelines for the efficient design of crowdsourced surveys.

...read moreread less

230 citations

Journal Article•DOI•

Massive Online Crowdsourced Study of Subjective and Objective Picture Quality

[...]

Deepti Ghadiyaram¹, Alan C. Bovik¹•Institutions (1)

University of Texas at Austin¹

09 Nov 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: The LIVE In the Wild Image Quality Challenge Database as discussed by the authors contains widely diverse authentic image distortions on a large number of images captured using a representative variety of modern mobile devices and has been used to conduct a very large-scale, multi-month image quality assessment subjective study.

...read moreread less

Abstract: Most publicly available image quality databases have been created under highly controlled conditions by introducing graded simulated distortions onto high-quality photographs. However, images captured using typical real-world mobile camera devices are usually afflicted by complex mixtures of multiple distortions, which are not necessarily well-modeled by the synthetic distortions found in existing databases. The originators of existing legacy databases usually conducted human psychometric studies to obtain statistically meaningful sets of human opinion scores on images in a stringently controlled visual environment, resulting in small data collections relative to other kinds of image analysis databases. Towards overcoming these limitations, we designed and created a new database that we call the LIVE In the Wild Image Quality Challenge Database, which contains widely diverse authentic image distortions on a large number of images captured using a representative variety of modern mobile devices. We also designed and implemented a new online crowdsourcing system, which we have used to conduct a very large-scale, multi-month image quality assessment subjective study. Our database consists of over 350000 opinion scores on 1162 images evaluated by over 7000 unique human observers. Despite the lack of control over the experimental environments of the numerous study participants, we demonstrate excellent internal consistency of the subjective dataset. We also evaluate several top-performing blind Image Quality Assessment algorithms on it and present insights on how mixtures of distortions challenge both end users as well as automatic perceptual quality prediction models.

...read moreread less

Proceedings Article•DOI•

iCrowd: An Adaptive Crowdsourcing Framework

[...]

Ju Fan¹, Guoliang Li², Beng Chin Ooi¹, Kian-Lee Tan¹, Jianhua Feng² - Show less +1 more•Institutions (2)

National University of Singapore¹, Tsinghua University²

27 May 2015

TL;DR: An adaptive crowdsourcing framework, called iCrowd, that estimates accuracies of a worker by evaluating her performance on the completed tasks, and predicts which tasks the worker is well acquainted with, and achieves higher quality than existing approaches.

...read moreread less

Abstract: Crowdsourcing is widely accepted as a means for resolving tasks that machines are not good at. Unfortunately, Crowdsourcing may yield relatively low-quality results if there is no proper quality control. Although previous studies attempt to eliminate "bad" workers by using qualification tests, the accuracies estimated from qualifications may not be accurate, because workers have diverse accuracies across tasks. Thus, the quality of the results could be further improved by selectively assigning tasks to the workers who are well acquainted with the tasks. To this end, we propose an adaptive crowdsourcing framework, called iCrowd. iCrowd on-the-fly estimates accuracies of a worker by evaluating her performance on the completed tasks, and predicts which tasks the worker is well acquainted with. When a worker requests for a task, iCrowd assigns her a task, to which the worker has the highest estimated accuracy among all online workers. Once a worker submits an answer to a task, iCrowd analyzes her answer and adjusts estimation of her accuracies to improve subsequent task assignments. This paper studies the challenges that arise in iCrowd. The first is how to estimate diverse accuracies of a worker based on her completed tasks. The second is instant task assignment. We deploy iCrowd on Amazon Mechanical Turk, and conduct extensive experiments on real datasets. Experimental results show that iCrowd achieves higher quality than existing approaches.

...read moreread less

Proceedings Article•DOI•

QASCA: A Quality-Aware Task Assignment System for Crowdsourcing Applications

[...]

Yudian Zheng¹, Jiannan Wang², Guoliang Li³, Reynold Cheng¹, Jianhua Feng³ - Show less +1 more•Institutions (3)

University of Hong Kong¹, University of California, Berkeley², Tsinghua University³

27 May 2015

TL;DR: This paper investigates the online task assignment problem: Given a pool of n questions, which of the k questions should be assigned to a worker, and proposes a system called the Quality-Aware Task Assignment System for Crowdsourcing Applications (QASCA) on top of AMT.

...read moreread less

Abstract: A crowdsourcing system, such as the Amazon Mechanical Turk (AMT), provides a platform for a large number of questions to be answered by Internet workers. Such systems have been shown to be useful to solve problems that are difficult for computers, including entity resolution, sentiment analysis, and image recognition. In this paper, we investigate the online task assignment problem: Given a pool of n questions, which of the k questions should be assigned to a worker? A poor assignment may not only waste time and money, but may also hurt the quality of a crowdsourcing application that depends on the workers' answers. We propose to consider quality measures (also known as evaluation metrics) that are relevant to an application during the task assignment process. Particularly, we explore how Accuracy and F-score, two widely-used evaluation metrics for crowdsourcing applications, can facilitate task assignment. Since these two metrics assume that the ground truth of a question is known, we study their variants that make use of the probability distributions derived from workers' answers. We further investigate online assignment strategies, which enables optimal task assignments. Since these algorithms are expensive, we propose solutions that attain high quality in linear time. We develop a system called the Quality-Aware Task Assignment System for Crowdsourcing Applications (QASCA) on top of AMT. We evaluate our approaches on five real crowdsourcing applications. We find that QASCA is efficient, and attains better result quality (of more than 8% improvement) compared with existing methods.

...read moreread less

Journal Article•DOI•

How to work a crowd: Developing crowd capital through crowdsourcing

[...]

John Prpic¹, Prashant Shukla², Jan Kietzmann¹, Ian P. McCarthy¹•Institutions (2)

Simon Fraser University¹, University of Toronto²

01 Jan 2015-Business Horizons

TL;DR: The term "crowd" was used almost exclusively in the context of people who self-organized around a common purpose, emotion, or experience as mentioned in this paper. But, today, firms often refer to crowds...

...read moreread less

Journal Article•DOI•

Security and privacy in mobile crowdsourcing networks: challenges and opportunities

[...]

Kan Yang¹, Kuan Zhang¹, Ju Ren¹, Xuemin Shen¹•Institutions (1)

University of Waterloo¹

06 Aug 2015-IEEE Communications Magazine

TL;DR: A general architecture for a mobile crowdsourcing network comprising both crowdsourcing sensing and crowdsourcing computing is proposed, and several critical security and privacy challenges are set forth that essentially capture the characteristics of MCNs.

...read moreread less

Abstract: The mobile crowdsourcing network (MCN) is a promising network architecture that applies the principles of crowdsourcing to perform tasks with human involvement and powerful mobile devices. However, it also raises some critical security and privacy issues that impede the application of MCNs. In this article, in order to better understand these critical security and privacy challenges, we first propose a general architecture for a mobile crowdsourcing network comprising both crowdsourcing sensing and crowdsourcing computing. After that, we set forth several critical security and privacy challenges that essentially capture the characteristics of MCNs. We also formulate some research problems leading to possible research directions. We expect this work will bring more attention to further investigation on security and privacy solutions for mobile crowdsourcing networks.

...read moreread less

Journal Article•DOI•

Quality-Driven Auction-Based Incentive Mechanism for Mobile Crowd Sensing

[...]

Yutian Wen¹, Jinyu Shi¹, Qi Zhang¹, Xiaohua Tian¹, Zhengyong Huang¹, Hui Yu¹, Yu Cheng², Xuemin Shen³ - Show less +4 more•Institutions (3)

Shanghai Jiao Tong University¹, Illinois Institute of Technology², University of Waterloo³

01 Sep 2015-IEEE Transactions on Vehicular Technology

TL;DR: An incentive mechanism based on a quality-driven auction (QDA) for the MCS system, where the worker is paid off based on the quality of sensed data instead of working time, is proposed and theoretically proves that the mechanism is truthful, individual rational, platform profitable, and social-welfare optimal.

...read moreread less

Abstract: The recent paradigm of mobile crowd sensing (MCS) enables a broad range of mobile applications. A critical challenge for the paradigm is to incentivize phone users to be workers providing sensing services. While some theoretical incentive mechanisms for general-purpose crowdsourcing have been proposed, it is still an open issue as to how to incorporate the theoretical framework into the practical MCS system. In this paper, we propose an incentive mechanism based on a quality-driven auction (QDA). The mechanism is specifically for the MCS system, where the worker is paid off based on the quality of sensed data instead of working time, as adopted in the literature. We theoretically prove that the mechanism is truthful, individual rational, platform profitable, and social-welfare optimal. Moreover, we incorporate our incentive mechanism into a Wi-Fi fingerprint-based indoor localization system to incentivize the MCS-based fingerprint collection. We present a probabilistic model to evaluate the reliability of the submitted data, which resolves the issue that the ground truth for the data reliability is unavailable. We realize and deploy an indoor localization system to evaluate our proposed incentive mechanism and present extensive experimental results.

...read moreread less

Proceedings Article•DOI•

Truthful incentive mechanisms for crowdsourcing

[...]

Xiang Zhang¹, Guoliang Xue¹, Ruozhou Yu¹, Dejun Yang², Jian Tang³ - Show less +1 more•Institutions (3)

Arizona State University¹, Colorado School of Mines², Syracuse University³

21 Aug 2015

TL;DR: This work designs an incentive mechanism for each of the three models of crowdsourcing, and proves that these incentive mechanisms are individually rational, budget-balanced, computationally efficient, and truthful.

...read moreread less

Abstract: With the prosperity of smart devices, crowdsourcing has emerged as a new computing/networking paradigm. Through the crowdsourcing platform, service requesters can buy service from service providers. An important component of crowdsourcing is its incentive mechanism. We study three models of crowdsourcing, which involve cooperation and competition among the service providers. Our simplest model generalizes the well-known user-centric model studied in a recent Mobicom paper. We design an incentive mechanism for each of the three models, and prove that these incentive mechanisms are individually rational, budget-balanced, computationally efficient, and truthful.

...read moreread less

Journal Article•DOI•

Exploiting mobile crowdsourcing for pervasive cloud services: challenges and solutions

[...]

Ju Ren¹, Yaoxue Zhang¹, Kuan Zhang², Xuemin Shen²•Institutions (2)

Central South University¹, University of Waterloo²

18 Mar 2015-IEEE Communications Magazine

TL;DR: The mobile crowdsourcing architecture and applications are investigated, some research challenges and countermeasures are discussed, and some research orientations are finally envisioned for further studies.

...read moreread less

Abstract: With the proliferation of increasingly powerful mobile devices, mobile users can collaboratively form a mobile cloud to provide pervasive services, such as data collecting, processing, and computing. With this mobile cloud, mobile crowdsourcing, as an emerging service paradigm, can enable mobile users to take over the outsourced tasks. By leveraging the sensing capabilities of mobile devices and integrating humanintelligence and machine-computation, mobile crowdsourcing has the potential to revolutionize the approach of data collecting and processing. In this article we investigate the mobile crowdsourcing architecture and applications, then discuss some research challenges and countermeasures for developing mobile crowdsourcing. Some research orientations are finally envisioned for further studies.

...read moreread less

Proceedings Article•DOI•

The Dynamics of Micro-Task Crowdsourcing: The Case of Amazon MTurk

[...]

Djellel Eddine Difallah¹, Michele Catasta², Gianluca Demartini³, Panagiotis G. Ipeirotis⁴, Philippe Cudré-Mauroux¹ - Show less +1 more•Institutions (4)

University of Fribourg¹, École Polytechnique Fédérale de Lausanne², University of Sheffield³, New York University⁴

18 May 2015

TL;DR: This paper uses the main findings of the five year log analysis to propose features used in a predictive model aiming at determining the expected performance of any batch at a specific point in time, and shows that the number of tasks left in a batch and how recent the batch is are two key features of the prediction.

...read moreread less

Abstract: Micro-task crowdsourcing is rapidly gaining popularity among research communities and businesses as a means to leverage Human Computation in their daily operations. Unlike any other service, a crowdsourcing platform is in fact a marketplace subject to human factors that affect its performance, both in terms of speed and quality. Indeed, such factors shape the \emph{dynamics} of the crowdsourcing market. For example, a known behavior of such markets is that increasing the reward of a set of tasks would lead to faster results. However, it is still unclear how different dimensions interact with each other: reward, task type, market competition, requester reputation, etc. In this paper, we adopt a data-driven approach to (A) perform a long-term analysis of a popular micro-task crowdsourcing platform and understand the evolution of its main actors (workers, requesters, tasks, and platform). (B) We leverage the main findings of our five year log analysis to propose features used in a predictive model aiming at determining the expected performance of any batch at a specific point in time. We show that the number of tasks left in a batch and how recent the batch is are two key features of the prediction. (C) Finally, we conduct an analysis of the demand (new tasks posted by the requesters) and supply (number of tasks completed by the workforce) and show how they affect task prices on the marketplace.

...read moreread less

Proceedings Article•DOI•

High quality participant recruitment in vehicle-based crowdsourcing using predictable mobility

[...]

Zongjian He¹, Jiannong Cao¹, Xuefeng Liu¹•Institutions (1)

Hong Kong Polytechnic University¹

24 Aug 2015

TL;DR: This paper presents a new participant recruitment strategy for vehicle-based crowdsourcing, and proposes two algorithms, a greedy approximation and a genetic algorithm, to find the solution for different application scenarios.

...read moreread less

Abstract: The potential of crowdsourcing for complex problem solving has been revealed by smartphones. Nowadays, vehicles have also been increasingly adopted as participants in crowd-sourcing applications. Different from smartphones, vehicles have the distinct advantage of predictable mobility, which brings new insight into improving the crowdsourcing quality. Unfortunately, utilizing the predictable mobility in participant recruitment poses a new challenge of considering not only current location but also the future trajectories of participants. Therefore, existing participant recruitment algorithms that only use the current location may not perform well. In this paper, based on the predicted trajectory, we present a new participant recruitment strategy for vehicle-based crowdsourcing. This strategy guarantees that the system can perform well using the currently recruited participants for a period of time in the future. The participant recruitment problem is proven to be NP-complete, and we propose two algorithms, a greedy approximation and a genetic algorithm, to find the solution for different application scenarios. The performance of our algorithms is demonstrated with traffic trace dataset. The results show that our algorithms outperform some existing approaches in terms of the crowdsourcing quality.

...read moreread less

Journal Article•DOI•

A Server-Assigned Spatial Crowdsourcing Framework

[...]

Hien To¹, Cyrus Shahabi¹, Leyla Kazemi²•Institutions (2)

University of Southern California¹, Microsoft²

27 Jul 2015-ACM Transactions on Spatial Algorithms and Systems

TL;DR: This article formally defines the maximum task assignment (MTA) problem in spatial crowdsourcing, and proposes alternative solutions to address these challenges by exploiting the spatial properties of the problem space, including the spatial distribution and the travel cost of the workers.

...read moreread less

Abstract: With the popularity of mobile devices, spatial crowdsourcing is rising as a new framework that enables human workers to solve tasks in the physical world. With spatial crowdsourcing, the goal is to crowdsource a set of spatiotemporal tasks (i.e., tasks related to time and location) to a set of workers, which requires the workers to physically travel to those locations in order to perform the tasks. In this article, we focus on one class of spatial crowdsourcing, in which the workers send their locations to the server and thereafter the server assigns to every worker tasks in proximity to the worker’s location with the aim of maximizing the overall number of assigned tasks. We formally define this maximum task assignment (MTA) problem in spatial crowdsourcing, and identify its challenges. We propose alternative solutions to address these challenges by exploiting the spatial properties of the problem space, including the spatial distribution and the travel cost of the workers. MTA is based on the assumptions that all tasks are of the same type and all workers are equally qualified in performing the tasks. Meanwhile, different types of tasks may require workers with various skill sets or expertise. Subsequently, we extend MTA by taking the expertise of the workers into consideration. We refer to this problem as the maximum score assignment (MSA) problem and show its practicality and generality. Extensive experiments with various synthetic and two real-world datasets show the applicability of our proposed framework.

...read moreread less

Journal Article•DOI•

Reliable diversity-based spatial crowdsourcing by moving workers

[...]

Peng Cheng¹, Xiang Lian², Zhao Chen¹, Rui Fu¹, Lei Chen¹, Jinsong Han³, Jizhong Zhao³ - Show less +3 more•Institutions (3)

Hong Kong University of Science and Technology¹, University of Texas at Austin², Xi'an Jiaotong University³

01 Jun 2015

TL;DR: Wang et al. as discussed by the authors proposed a reliable diversity-based spatial crowdsourcing (RDB-SC) problem to assign workers to spatial tasks such that the completion reliability and the spatial/temporal diversities of spatial tasks are maximized.

...read moreread less

Abstract: With the rapid development of mobile devices and the crowdsourcing platforms, the spatial crowdsourcing has attracted much attention from the database community, specifically, spatial crowdsourcing refers to sending a location-based request to workers according to their positions. In this paper, we consider an important spatial crowdsourcing problem, namely reliable diversity-based spatial crowdsourcing (RDB-SC), in which spatial tasks (such as taking videos/photos of a landmark or firework shows, and checking whether or not parking spaces are available) are time-constrained, and workers are moving towards some directions. Our RDB-SC problem is to assign workers to spatial tasks such that the completion reliability and the spatial/temporal diversities of spatial tasks are maximized. We prove that the RDB-SC problem is NP-hard and intractable. Thus, we propose three effective approximation approaches, including greedy, sampling, and divide-and-conquer algorithms. In order to improve the efficiency, we also design an effective cost-model-based index, which can dynamically maintain moving workers and spatial tasks with low cost, and efficiently facilitate the retrieval of RDB-SC answers. Through extensive experiments, we demonstrate the efficiency and effectiveness of our proposed approaches over both real and synthetic datasets.

...read moreread less

Journal Article•DOI•

Crowdsourcing urban form and function

[...]

Andrew Crooks¹, Dieter Pfoser¹, Andrew Jenkins¹, Arie Croitoru¹, Anthony Stefanidis¹, Duncan D. Smith², Sophia Karagiorgou³, Alexandros Efentakis, George Lamprianidis - Show less +5 more•Institutions (3)

George Mason University¹, University College London², National Technical University of Athens³

01 May 2015-International Journal of Geographical Information Science

TL;DR: A new typology for characterizing the role of crowdsourcing in the study of urban morphology is provided by synthesizing recent advancements in the analysis of open-source data, which shows how social media, trajectory, and traffic data can be analyzed to capture the evolving nature of a city’s form and function.

...read moreread less

Abstract: Urban form and function have been studied extensively in urban planning and geographical information science. However, gaining a greater understanding of how they merge to define the urban morphology remains a substantial scientific challenge. Toward this goal, this paper addresses the opportunities presented by the emergence of crowdsourced data to gain novel insights into form and function in urban spaces. We are focusing in particular on information harvested from social media and other open-source and volunteered datasets e.g. trajectory and OpenStreetMap data. These data provide a first-hand account of form and function from the people who define urban space through their activities. This novel bottom-up approach to study these concepts complements traditional urban studies to provide a new lens for studying urban activity. By synthesizing recent advancements in the analysis of open-source data, we provide a new typology for characterizing the role of crowdsourcing in the study of urban morphology. We illustrate this new perspective by showing how social media, trajectory, and traffic data can be analyzed to capture the evolving nature of a city’s form and function. While these crowd contributions may be explicit or implicit in nature, they are giving rise to an emerging research agenda for monitoring, analyzing, and modeling form and function for urban design and analysis.

...read moreread less

Proceedings Article•DOI•

Exploring Cyberbullying and Other Toxic Behavior in Team Competition Online Games

[...]

Haewoon Kwak¹, Jeremy Blackburn², Seungyeop Han³•Institutions (3)

Qatar Computing Research Institute¹, Telefónica², University of Washington³

18 Apr 2015

TL;DR: This work explores cyberbullying and other toxic behavior in team competition online games using a dataset of over 10 million player reports on 1.46 million toxic players along with corresponding crowdsourced decisions, and test several hypotheses drawn from theories explaining toxic behavior.

...read moreread less

Abstract: In this work we explore cyberbullying and other toxic behavior in team competition online games. Using a dataset of over 10 million player reports on 1.46 million toxic players along with corresponding crowdsourced decisions, we test several hypotheses drawn from theories explaining toxic behavior. Besides providing large-scale, empirical based understanding of toxic behavior, our work can be used as a basis for building systems to detect, prevent, and counter-act toxic behavior.

...read moreread less

Journal Article•DOI•

Crowdsourcing-based business models: : How to create and capture value

[...]

Thomas Kohler¹•Institutions (1)

Hawaii Pacific University¹

01 Aug 2015-California Management Review

TL;DR: The authors analyzes successful platforms to identify patterns of effective crowdsourcing-based business models, and provides guidance for managers who need to create new (or adapt existing) business models to adapt existing business models.

...read moreread less

Abstract: Technology has transformed individuals from mere consumers of products to empowered participants in value co-creation. While numerous firms experiment with involving a crowd in value creation, few companies turn crowdsourcing projects into thriving platforms with a powerful business model. To address this challenge, this article analyzes successful platforms to identify patterns of effective crowdsourcing-based business models. The results provide guidance for managers who need to create new (or adapt existing) business models.

...read moreread less

Proceedings Article•DOI•

Structuring, Aggregating, and Evaluating Crowdsourced Design Critique

[...]

Kurt Luther¹, Jari-lee Tolentino², Wei Wu³, Amy Pavel³, Brian P. Bailey⁴, Maneesh Agrawala³, Björn Hartmann³, Steven Dow¹ - Show less +4 more•Institutions (4)

Carnegie Mellon University¹, University of California, Irvine², University of California, Berkeley³, University of Illinois at Urbana–Champaign⁴

28 Feb 2015

TL;DR: CrowdCrit, a web-based system that allows designers to receive design critiques from non-expert crowd workers, is presented and evidence that aggregated crowd critique approaches expert critique is found.

...read moreread less

Abstract: Feedback is an important component of the design process, but gaining access to high-quality critique outside a classroom or firm is challenging. We present CrowdCrit, a web-based system that allows designers to receive design critiques from non-expert crowd workers. We evaluated CrowdCrit in three studies focusing on the designer's experience and benefits of the critiques. In the first study, we compared crowd and expert critiques and found evidence that aggregated crowd critique approaches expert critique. In a second study, we found that designers who got crowd feedback perceived that it improved their design process. The third study showed that designers were enthusiastic about crowd critiques and used them to change their designs. We conclude with implications for the design of crowd feedback services.

...read moreread less

Journal Article•DOI•

Crowdsourcing: a comprehensive literature review

[...]

Mokter Hossain¹, Ilkka Kauranen¹•Institutions (1)

Aalto University¹

28 May 2015-Strategic Outsourcing: An International Journal

TL;DR: A comprehensive review of 346 articles on crowdsourcing is presented in this paper, where the authors discuss the loci and foci of extant articles and listing applications of crowdsourcing, including idea generation, microtasking, citizen science, public participation, wikies, open source software and citizen journalism.

...read moreread less

Abstract: – The purpose of this paper is to explore the development of crowdsourcing literature. , – This study is a comprehensive review of 346 articles on crowdsourcing. Both statistical and contents analyses were conducted in this paper. , – ISI listed journal articles, non-ISI listed journal articles and conference articles have had nearly the same contribution in crowdsourcing literature. Articles published in non-ISI listed journals have had an essential role in the initial theory development related to crowdsourcing. Scholars from the USA have authored approximately the same number of articles as scholars from all the European countries combined. Scholars from developing countries have been more relatively active in authoring conference articles than journal articles. Only very recently, top-tier journals have engaged in publishing on crowdsourcing. Crowdsourcing has proven to be beneficial in many tasks, but the extant literature does not give much help to practitioners in capturing value from crowdsourcing. Despite understanding that the motivations of crowds are crucial when planning crowdsourcing activities, the various motivations in different contexts have not been explored sufficiently. A major concern has been the quality and accuracy of information that has been gathered through crowdsourcing. Crowdsourcing bears a lot of unused potential. For example, it can increase employment opportunities to low-income people in developing countries. On the other hand, more should be known of fair ways to organize crowdsourcing so that solution seekers do not get a chance to exploit individuals committing to provide solutions. , – The literature included in the study is extensive, but an all-inclusive search for articles was limited to only nine selected publishers. However, in addition to the articles retrieved from the nine selected publishers, 52 highly cited articles were also included from other publishers. , – Crowdsourcing has much unused potential, and the use of crowdsourcing is increasing rapidly. The study provides a thematic review of various applications of crowdsourcing. , – The study is the first of its kind to explore the development of crowdsourcing literature, discussing the loci and foci of extant articles and listing applications of crowdsourcing. Successful applications of crowdsourcing include idea generation, microtasking, citizen science, public participation, wikies, open source software and citizen journalism.

...read moreread less

Proceedings Article•DOI•

Deep Questions without Deep Understanding

[...]

Igor Labutov¹, Sumit Basu², Lucy Vanderwende²•Institutions (2)

Cornell University¹, Microsoft²

01 Jul 2015

TL;DR: An approach for generating deep comprehension questions from novel text that bypasses the myriad challenges of creating a full semantic representation by decomposing the task into an ontologycrowd-relevance workflow, consisting of first representing the original text in a low-dimensional ontology, then crowdsourcing candidate question templates aligned with that space, and finally ranking potentially relevant templates for a novel region of text.

...read moreread less

Abstract: We develop an approach for generating deep (i.e, high-level) comprehension questions from novel text that bypasses the myriad challenges of creating a full semantic representation. We do this by decomposing the task into an ontologycrowd-relevance workflow, consisting of first representing the original text in a low-dimensional ontology, then crowdsourcing candidate question templates aligned with that space, and finally ranking potentially relevant templates for a novel region of text. If ontological labels are not available, we infer them from the text. We demonstrate the effectiveness of this method on a corpus of articles from Wikipedia alongside human judgments, and find that we can generate relevant deep questions with a precision of over 85% while maintaining a recall of 70%.

...read moreread less

Proceedings Article•DOI•

Avoiding the South Side and the Suburbs: The Geography of Mobile Crowdsourcing Markets

[...]

Jacob Thebault-Spieker¹, Loren Terveen¹, Brent Hecht¹•Institutions (1)

University of Minnesota¹

28 Feb 2015

TL;DR: The results suggest that low-SES areas are currently less able to take advantage of the benefits of mobile crowdsourcing markets, as well as for "sharing economy" phenomena like UberX, which have many properties in common with mobile crowdsourced markets.

...read moreread less

Abstract: Mobile crowdsourcing markets (e.g., Gigwalk and TaskRabbit) offer crowdworkers tasks situated in the physical world (e.g., checking street signs, running household errands). The geographic nature of these tasks distinguishes these markets from online crowdsourcing markets and raises new, fundamental questions. We carried out a controlled study in the Chicago metropolitan area aimed at addressing two key questions: (1) What geographic factors influence whether a crowdworker will be willing to do a task? (2) What geographic factors influence how much compensation a crowdworker will demand in order to do a task? Quantitative modeling shows that travel distance to the location of the task and the socioeconomic status (SES) of the task area are important factors. Qualitative analysis enriches our modeling, with workers mentioning safety and difficulties getting to a location as key considerations. Our results suggest that low-SES areas are currently less able to take advantage of the benefits of mobile crowdsourcing markets. We discuss the implications of our study for these markets, as well as for "sharing economy" phenomena like UberX, which have many properties in common with mobile crowdsourcing markets.

...read moreread less

Journal Article•DOI•

Task Assignment on Multi-Skill Oriented Spatial Crowdsourcing

[...]

Peng Cheng¹, Xiang Lian², Lei Chen¹, Jinsong Han³, Jizhong Zhao³ - Show less +1 more•Institutions (3)

Hong Kong University of Science and Technology¹, University of Texas at Austin², Xi'an Jiaotong University³

12 Oct 2015-arXiv: Databases

TL;DR: Wang et al. as discussed by the authors proposed three heuristic approaches, including greedy, g-divide-and-conquer and cost-model-based adaptive algorithms, to find an optimal worker and task assignment strategy, such that skills between workers and tasks match with each other.

...read moreread less

Abstract: With the rapid development of mobile devices and crowdsourcing platforms, the spatial crowdsourcing has attracted much attention from the database community. Specifically, the spatial crowdsourcing refers to sending location-based requests to workers, based on their current positions. In this paper, we consider a spatial crowdsourcing scenario, in which each worker has a set of qualified skills, whereas each spatial task (e.g., repairing a house, decorating a room, and performing entertainment shows for a ceremony) is time-constrained, under the budget constraint, and required a set of skills. Under this scenario, we will study an important problem, namely multi-skill spatial crowdsourcing (MS-SC), which finds an optimal worker-and-task assignment strategy, such that skills between workers and tasks match with each other, and workers' benefits are maximized under the budget constraint. We prove that the MS-SC problem is NP-hard and intractable. Therefore, we propose three effective heuristic approaches, including greedy, g-divide-and-conquer and cost-model-based adaptive algorithms to get worker-and-task assignments. Through extensive experiments, we demonstrate the efficiency and effectiveness of our MS-SC processing approaches on both real and synthetic data sets.

...read moreread less

Collapse