
Showing papers on "Crowdsourcing published in 2017"


Journal ArticleDOI
TL;DR: How TurkPrime saves time and resources, improves data quality, and allows researchers to design and implement studies that were previously very difficult or impossible to carry out on MTurk is described.
Abstract: In recent years, Mechanical Turk (MTurk) has revolutionized social science by providing a way to collect behavioral data with unprecedented speed and efficiency. However, MTurk was not intended to be a research tool, and many common research tasks are difficult and time-consuming to implement as a result. TurkPrime was designed as a research platform that integrates with MTurk and supports tasks that are common to the social and behavioral sciences. Like MTurk, TurkPrime is an Internet-based platform that runs on any browser and does not require any downloads or installation. Tasks that can be implemented with TurkPrime include: excluding participants on the basis of previous participation, longitudinal studies, making changes to a study while it is running, automating the approval process, increasing the speed of data collection, sending bulk e-mails and bonuses, enhancing communication with participants, monitoring dropout and engagement rates, providing enhanced sampling options, and many others. This article describes how TurkPrime saves time and resources, improves data quality, and allows researchers to design and implement studies that were previously very difficult or impossible to carry out on MTurk. TurkPrime is designed as a research tool whose aim is to improve the quality of the crowdsourcing data collection process. Various features have been and continue to be implemented on the basis of feedback from the research community. TurkPrime is a free research platform.

1,241 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: A new DLP-CNN (Deep Locality-Preserving CNN) method, which aims to enhance the discriminative power of deep features by preserving the locality closeness while maximizing the inter-class scatters, is proposed.
Abstract: Past research on facial expressions has used relatively limited datasets, which makes it unclear whether current methods can be employed in the real world. In this paper, we present a novel database, RAF-DB, which contains about 30000 facial images from thousands of individuals. Each image has been individually labeled about 40 times, and an EM algorithm was then used to filter out unreliable labels. Crowdsourcing reveals that real-world faces often express compound emotions, or even a mixture of them. To the best of our knowledge, RAF-DB is the first database that contains compound expressions in the wild. Our cross-database study shows that the action units of basic emotions in RAF-DB are much more diverse than, or even deviate from, those of lab-controlled ones. To address this problem, we propose a new DLP-CNN (Deep Locality-Preserving CNN) method, which aims to enhance the discriminative power of deep features by preserving the locality closeness while maximizing the inter-class scatters. The benchmark experiments on the 7-class basic expressions and 11-class compound expressions, as well as the additional experiments on the SFEW and CK+ databases, show that the proposed DLP-CNN outperforms the state-of-the-art handcrafted features and deep learning based methods for expression recognition in the wild.

746 citations


Proceedings ArticleDOI
03 Apr 2017
TL;DR: A method that combines crowdsourcing and machine learning to analyze personal attacks at scale is developed and illustrated, and an evaluation method for a classifier in terms of the aggregated number of crowd-workers it can approximate is shown.
Abstract: The damage personal attacks cause to online discourse motivates many platforms to try to curb the phenomenon. However, understanding the prevalence and impact of personal attacks in online platforms at scale remains surprisingly difficult. The contribution of this paper is to develop and illustrate a method that combines crowdsourcing and machine learning to analyze personal attacks at scale. We show an evaluation method for a classifier in terms of the aggregated number of crowd-workers it can approximate. We apply our methodology to English Wikipedia, generating a corpus of over 100k high-quality human-labeled comments and 63M machine-labeled ones from a classifier that is as good as the aggregate of 3 crowd-workers, as measured by the area under the ROC curve and Spearman correlation. Using this corpus of machine-labeled scores, our methodology allows us to explore some of the open questions about the nature of online personal attacks. This reveals that the majority of personal attacks on Wikipedia are not the result of a few malicious users, nor primarily the consequence of allowing anonymous contributions from unregistered users.

472 citations
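
As a rough illustration of the evaluation idea above (scoring a classifier against the aggregated judgments of crowd-workers), the two reported metrics are easy to compute with standard libraries. A minimal sketch on synthetic data; the variable names and numbers are illustrative, not from the paper:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Stand-ins: the fraction of crowd-workers calling each comment an attack,
# and a machine score that tracks it with some noise.
human_frac = rng.random(1000)
machine_score = np.clip(human_frac + rng.normal(0, 0.15, 1000), 0, 1)

# Area under the ROC curve against the binarized human aggregate.
y_true = (human_frac > 0.5).astype(int)
print("ROC AUC:", roc_auc_score(y_true, machine_score))

# Rank agreement between machine scores and the human aggregate.
rho, _ = spearmanr(machine_score, human_frac)
print("Spearman rho:", rho)
```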


Journal ArticleDOI
TL;DR: This tutorial assesses the evidence on the reliability of crowdsourced populations and the conditions under which crowdsourcing is a valid strategy for data collection, and proposes specific guidelines for researchers to conduct high-quality research via crowdsourcing.
Abstract: Data collection in consumer research has progressively moved away from traditional samples (e.g., university undergraduates) and toward Internet samples. In the last complete volume of the Journal of Consumer Research (June 2015-April 2016), 43% of behavioral studies were conducted on the crowdsourcing website Amazon Mechanical Turk (MTurk). The option to crowdsource empirical investigations has great efficiency benefits for both individual researchers and the field, but it also poses new challenges and questions for how research should be designed, conducted, analyzed, and evaluated. We assess the evidence on the reliability of crowdsourced populations and the conditions under which crowdsourcing is a valid strategy for data collection. Based on this evidence, we propose specific guidelines for researchers to conduct high-quality research via crowdsourcing. We hope this tutorial will strengthen the community's scrutiny on data collection practices and move the field toward better and more valid crowdsourcing of consumer research.

384 citations


Journal ArticleDOI
01 Jan 2017
TL;DR: It is believed that the truth inference problem is not fully solved; the limitations of existing algorithms are identified and promising research directions are pointed out.
Abstract: Crowdsourcing has emerged as a novel problem-solving paradigm, which facilitates addressing problems that are hard for computers, e.g., entity resolution and sentiment analysis. However, due to the openness of crowdsourcing, workers may yield low-quality answers, and a redundancy-based method is widely employed, which first assigns each task to multiple workers and then infers the correct answer (called truth) for the task based on the answers of the assigned workers. A fundamental problem in this method is Truth Inference, which decides how to effectively infer the truth. Recently, the database community and data mining community independently study this problem and propose various algorithms. However, these algorithms are not compared extensively under the same framework and it is hard for practitioners to select appropriate algorithms. To alleviate this problem, we provide a detailed survey on 17 existing algorithms and perform a comprehensive evaluation using 5 real datasets. We make all codes and datasets public for future research. Through experiments we find that existing algorithms are not stable across different datasets and there is no algorithm that outperforms others consistently. We believe that the truth inference problem is not fully solved, and identify the limitations of existing algorithms and point out promising research directions.

376 citations
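
To make the redundancy-based setup concrete, below is a minimal sketch of two canonical truth-inference baselines that surveys of this kind cover: plain majority voting and Dawid-Skene-style EM, which jointly estimates worker reliability and task truths. This is a generic textbook formulation, not code from the paper:

```python
import numpy as np

def majority_vote(labels):
    """labels: (tasks, workers) int matrix; -1 marks an unanswered task."""
    n_classes = labels.max() + 1
    votes = np.stack([(labels == c).sum(axis=1) for c in range(n_classes)], axis=1)
    return votes.argmax(axis=1)

def dawid_skene(labels, n_classes, n_iter=50, eps=1e-12):
    """Dawid-Skene EM: estimates per-worker confusion matrices and a
    posterior over each task's true label; returns the MAP labels."""
    n_tasks, n_workers = labels.shape
    # Initialize soft labels from per-task vote fractions.
    T = np.stack([(labels == c).sum(axis=1) for c in range(n_classes)],
                 axis=1).astype(float)
    T /= np.maximum(T.sum(axis=1, keepdims=True), eps)
    for _ in range(n_iter):
        # M-step: class priors and confusion matrices pi[w, true, observed].
        prior = T.mean(axis=0)
        pi = np.zeros((n_workers, n_classes, n_classes))
        for w in range(n_workers):
            for c in range(n_classes):
                pi[w, :, c] = T[labels[:, w] == c].sum(axis=0)
            pi[w] /= np.maximum(pi[w].sum(axis=1, keepdims=True), eps)
        # E-step: recompute the posterior over true labels.
        logT = np.tile(np.log(prior + eps), (n_tasks, 1))
        for w in range(n_workers):
            seen = labels[:, w] >= 0
            logT[seen] += np.log(pi[w][:, labels[seen, w]].T + eps)
        T = np.exp(logT - logT.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)
    return T.argmax(axis=1)

# Toy run: three workers answer four binary tasks.
answers = np.array([[1, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]])
print(majority_vote(answers), dawid_skene(answers, n_classes=2))
```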


Journal ArticleDOI
TL;DR: A comprehensive survey of the use of crowdsourcing in software engineering, seeking to cover all literature on this topic, and exposing trends, open issues and opportunities for future research on Crowdsourced Software Engineering.

360 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present an evaluation of MTurk as a data source and provide a set of practical recommendations for researchers using it.
Abstract: Purpose: Amazon Mechanical Turk is an increasingly popular data source in the organizational psychology research community. This paper presents an evaluation of MTurk and provides a set of practical recommendations for researchers using MTurk.

330 citations


Proceedings ArticleDOI
20 Oct 2017
TL;DR: Rico is presented, the largest repository of mobile app designs to date, created to support five classes of data-driven applications: design search, UI layout generation, UI code generation, user interaction modeling, and user perception prediction.
Abstract: Data-driven models help mobile app designers understand best practices and trends, and can be used to make predictions about design performance and support the creation of adaptive UIs. This paper presents Rico, the largest repository of mobile app designs to date, created to support five classes of data-driven applications: design search, UI layout generation, UI code generation, user interaction modeling, and user perception prediction. To create Rico, we built a system that combines crowdsourcing and automation to scalably mine design and interaction data from Android apps at runtime. The Rico dataset contains design data from more than 9.7k Android apps spanning 27 categories. It exposes visual, textual, structural, and interactive design properties of more than 72k unique UI screens. To demonstrate the kinds of applications that Rico enables, we present results from training an autoencoder for UI layout similarity, which supports query-by-example search over UIs.

309 citations
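
The query-by-example demo described above reduces to a familiar pattern: train an autoencoder on screen representations, then rank screens by distance in the bottleneck embedding. A minimal sketch of that pattern in PyTorch, with random vectors standing in for Rico's UI encodings (the paper's actual model and input representation differ):

```python
import torch
import torch.nn as nn

# Stand-in for Rico screen encodings: 2048 screens as 1024-d vectors.
screens = torch.rand(2048, 1024)

# A small autoencoder; the bottleneck becomes the similarity embedding.
encoder = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 64))
decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 1024))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()),
                       lr=1e-3)

# Train on reconstruction loss only.
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(screens)), screens)
    loss.backward()
    opt.step()

# Query-by-example: rank all screens by distance to the query's embedding
# (the query itself ranks first).
with torch.no_grad():
    emb = encoder(screens)
    dists = torch.cdist(emb[:1], emb).squeeze(0)
    print("nearest neighbors of screen 0:", dists.argsort()[:5].tolist())
```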


Proceedings ArticleDOI
01 Jul 2017
TL;DR: A method of crowdsourcing linguistically diverse data is described, and an analysis of the data demonstrates a broad set of linguistic phenomena requiring visual and set-theoretic reasoning.
Abstract: We present a new visual reasoning language dataset, containing 92,244 pairs of examples of natural statements grounded in synthetic images with 3,962 unique sentences. We describe a method of crowdsourcing linguistically-diverse data, and present an analysis of our data. The data demonstrates a broad set of linguistic phenomena, requiring visual and set-theoretic reasoning. We experiment with various models, and show the data presents a strong challenge for future research.

222 citations


Journal ArticleDOI
TL;DR: This work provides a conceptual framework for gamified crowdsourcing systems in order to understand and conceptualize the key aspects of the phenomenon and indicates that gamification has been an effective approach for increasing crowdsourcing participation and the quality of the crowdsourced work.
Abstract: Two parallel phenomena are gaining attention in human–computer interaction research: gamification and crowdsourcing. Because crowdsourcing's success depends on a mass of motivated crowdsourcees, crowdsourcing platforms have increasingly been imbued with motivational design features borrowed from games, a practice often called gamification. While the body of literature and knowledge of the phenomenon have begun to accumulate, we still lack a comprehensive and systematic understanding of conceptual foundations, knowledge of how gamification is used in crowdsourcing, and whether it is effective. We first provide a conceptual framework for gamified crowdsourcing systems in order to understand and conceptualize the key aspects of the phenomenon. The paper's main contributions are derived through a systematic literature review that investigates how gamification has been examined in different types of crowdsourcing in a variety of domains. This meticulous mapping, which focuses on all aspects in our framework, enables us to infer what kinds of gamification efforts are effective in different crowdsourcing approaches as well as to point to a number of research gaps and lay out future research directions for gamified crowdsourcing systems. Overall, the results indicate that gamification has been an effective approach for increasing crowdsourcing participation and the quality of the crowdsourced work; however, differences exist between different types of crowdsourcing: the research conducted in the context of crowdsourcing of homogenous tasks has most commonly used simple gamification implementations, such as points and leaderboards, whereas crowdsourcing implementations that seek diverse and creative contributions employ gamification with a richer set of mechanics.

212 citations


Proceedings ArticleDOI
02 May 2017
TL;DR: Revolt eliminates the burden of creating detailed label guidelines by harnessing crowd disagreements to identify ambiguous concepts and create rich structures (groups of semantically related items) for post-hoc label decisions.
Abstract: Crowdsourcing provides a scalable and efficient way to construct labeled datasets for training machine learning systems. However, creating comprehensive label guidelines for crowdworkers is often prohibitive even for seemingly simple concepts. Incomplete or ambiguous label guidelines can then result in differing interpretations of concepts and inconsistent labels. Existing approaches for improving label quality, such as worker screening or detection of poor work, are ineffective for this problem and can lead to rejection of honest work and a missed opportunity to capture rich interpretations about data. We introduce Revolt, a collaborative approach that brings ideas from expert annotation workflows to crowd-based labeling. Revolt eliminates the burden of creating detailed label guidelines by harnessing crowd disagreements to identify ambiguous concepts and create rich structures (groups of semantically related items) for post-hoc label decisions. Experiments comparing Revolt to traditional crowdsourced labeling show that Revolt produces high-quality labels without requiring label guidelines, in exchange for an increase in monetary cost. This up-front cost, however, is mitigated by Revolt's ability to produce reusable structures that can accommodate a variety of label boundaries without requiring new data to be collected. Further comparisons of Revolt's collaborative and non-collaborative variants show that collaboration reaches higher label accuracy with lower monetary cost.

Posted Content
TL;DR: This work formulates a crowdsourcing typology and shows how its four categories—crowd voting, micro-task, idea, and solution crowdsourcing—can help firms develop ‘crowd capital,’ an organizational-level resource harnessed from the crowd.
Abstract: Traditionally, the term crowd was used almost exclusively in the context of people who self-organized around a common purpose, emotion or experience. Today, however, firms often refer to crowds in discussions of how collections of individuals can be engaged for organizational purposes. Crowdsourcing, the use of information technologies to outsource business responsibilities to crowds, can now significantly influence a firm's ability to leverage previously unattainable resources to build competitive advantage. Nonetheless, many managers are hesitant to consider crowdsourcing because they do not understand how its various types can add value to the firm. In response, we explain what crowdsourcing is, the advantages it offers, and how firms can pursue crowdsourcing. We begin by formulating a crowdsourcing typology and show how its four categories (crowd-voting, micro-task, idea, and solution crowdsourcing) can help firms develop crowd capital, an organizational-level resource harnessed from the crowd. We then present a three-step process model for generating crowd capital. Step one includes important considerations that shape how a crowd is to be constructed. Step two outlines the capabilities firms need to develop to acquire and assimilate resources (knowledge, labor, funds) from the crowd. Step three addresses key decision-areas that executives need to address to effectively engage crowds.

Journal ArticleDOI
TL;DR: In this article, the authors demonstrate the potential benefits of crowdsourcing last mile delivery by exploiting a social network of the customers and show that using friends in social networks to assist in last-mile delivery greatly reduces delivery costs and total emissions while ensuring speedy and reliable delivery.
Abstract: This paper demonstrates the potential benefits of crowdsourcing last mile delivery by exploiting a social network of the customers. The presented models and analysis are informed by the results of a survey to gauge people’s attitudes toward engaging in social network-reliant package delivery to and by friends or acquaintances. It is found that using friends in a social network to assist in last mile delivery greatly reduces delivery costs and total emissions while ensuring speedy and reliable delivery. The proposed new delivery method also mitigates the privacy concerns and not-at-home syndrome that widely exist in last mile delivery.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a conceptualization of the growing phenomenon of crowd logistics, which is a novel way of providing logistics services that taps into the dormant logistics resources and capabilities of individuals, using mobile applications and web-based platforms.
Abstract: Patterned on crowdsourcing and crowdfunding, a new crowd practice has emerged in recent years: crowd logistics. In this paper, we propose a first conceptualization of this growing phenomenon. Crowd logistics is a novel way of providing logistics services that taps into the dormant logistics resources and capabilities of individuals, using mobile applications and web-based platforms. Although crowd logistics has been widely discussed in the business world, it has not yet been the subject of any academic publication. Following an exploratory case study approach, we review the websites of 57 crowd logistics initiatives around the world and highlight the main distinctive characteristics of crowd logistics, as compared to traditional business logistics. We introduce a segmented analysis in which crowd logistics solutions are classified according to four types of service offered. Finally, we introduce six theoretical propositions on the future development of crowd logistics. At a theoretical level, our findings contribute to enriching the service-dominant logic perspective in the logistics field by conceptualizing the crowd as a co-creator of logistics value. At a managerial level, our findings contribute to identifying which types of crowd logistics services are more likely to threaten or disrupt traditional business.

Proceedings Article
12 Feb 2017
TL;DR: This work presents a general solution towards building task-oriented dialogue systems for online shopping, aiming to assist online customers in completing various purchase-related tasks, such as searching for products and answering questions, through natural language conversation.
Abstract: We present a general solution towards building task-oriented dialogue systems for online shopping, aiming to assist online customers in completing various purchase-related tasks, such as searching for products and answering questions, through natural language conversation. As a pioneering work, we show what & how existing NLP techniques, data resources, and crowdsourcing can be leveraged to build such task-oriented dialogue systems for E-commerce usage. To demonstrate its effectiveness, we integrate our system into a mobile online shopping app. To the best of our knowledge, this is the first time that a Chinese-language AI bot has been practically used in an online shopping scenario with millions of real consumers. Interesting and insightful observations are presented in the experimental section, based on an analysis of human-bot conversation logs. Several current challenges are also pointed out as our future directions.

Journal ArticleDOI
TL;DR: In this paper, the authors describe the outsourcing of an organizational function via crowdsourcing, which serves as an alternative funding source and offers non-monetary resources through organizational learning.

Journal ArticleDOI
TL;DR: Crowdsourcing data collection from research participants recruited from online labor markets is now common in cognitive science; this review examines who is in the crowd and who can be reached by the average laboratory.

Journal ArticleDOI
TL;DR: It is concluded that platforms such as MTurk have much to offer PD researchers, especially for certain kinds of research (e.g., where large samples are required and there is a need for iterative sampling).
Abstract: The use of crowdsourcing platforms such as Amazon's Mechanical Turk (MTurk) for data collection in the behavioral sciences has increased substantially in the past several years, due in large part to (a) the ability to recruit large samples, (b) the inexpensiveness of data collection, (c) the speed of data collection, and (d) evidence that the data collected are, for the most part, of equal or better quality than those collected in undergraduate research pools. In this review, we first evaluate the strengths and potential limitations of this approach to data collection. Second, we examine how MTurk has been used to date in personality disorder (PD) research and compare the characteristics of such research to PD research conducted in other settings. Third, we compare PD trait data from the Section III trait model of the DSM-5 collected via MTurk to data collected using undergraduate and clinical samples with regard to internal consistency, mean-level differences, and factor structure. Overall, we conclude that platforms such as MTurk have much to offer PD researchers, especially for certain kinds of research (e.g., where large samples are required and there is a need for iterative sampling). Whether MTurk itself remains the predominant model of such platforms is unclear, however, and will largely depend on decisions related to cost effectiveness and the development of alternatives that offer even greater flexibility.

Journal ArticleDOI
TL;DR: Current research topics in CrowdRE are presented; the benefits, challenges, and lessons learned from projects and experiments are discussed; and how to apply the methods and tools in industrial contexts are assessed.
Abstract: Crowd-based requirements engineering (CrowdRE) could significantly change RE. Performing RE activities such as elicitation with the crowd of stakeholders turns RE into a participatory effort, leads to more accurate requirements, and ultimately boosts software quality. Although any stakeholder in the crowd can contribute, CrowdRE emphasizes one stakeholder group whose role is often trivialized: users. CrowdRE empowers the management of requirements, such as their prioritization and segmentation, in a dynamic, evolved style through collecting and harnessing a continuous flow of user feedback and monitoring data on the usage context. To analyze the large amount of data obtained from the crowd, automated approaches are key. This article presents current research topics in CrowdRE; discusses the benefits, challenges, and lessons learned from projects and experiments; and assesses how to apply the methods and tools in industrial contexts. This article is part of a special issue on Crowdsourcing for Software Engineering.

Proceedings ArticleDOI
01 Sep 2017
TL;DR: The authors presented a method for obtaining high-quality, domain-targeted multiple choice questions from crowd workers, which produces model suggestions for document selection and answer distractor choice which aid the human question generation process.
Abstract: We present a novel method for obtaining high-quality, domain-targeted multiple choice questions from crowd workers. Generating these questions can be difficult without trading away originality, relevance or diversity in the answer options. Our method addresses these problems by leveraging a large corpus of domain-specific text and a small set of existing questions. It produces model suggestions for document selection and answer distractor choice which aid the human question generation process. With this method we have assembled SciQ, a dataset of 13.7K multiple choice science exam questions. We demonstrate that the method produces in-domain questions by providing an analysis of this new dataset and by showing that humans cannot distinguish the crowdsourced questions from original questions. When using SciQ as additional training data to existing questions, we observe accuracy improvements on real science exams.

Proceedings ArticleDOI
02 May 2017
TL;DR: A deployment is reported in which flash organizations successfully carried out open-ended and complex goals previously out of reach for crowdsourcing, including product design, software development, and game production.
Abstract: This paper introduces flash organizations: crowds structured like organizations to achieve complex and open-ended goals. Microtask workflows, the dominant crowdsourcing structures today, only enable goals that are so simple and modular that their path can be entirely pre-defined. We present a system that organizes crowd workers into computationally-represented structures inspired by those used in organizations - roles, teams, and hierarchies - which support emergent and adaptive coordination toward open-ended goals. Our system introduces two technical contributions: 1) encoding the crowd's division of labor into de-individualized roles, much as movie crews or disaster response teams use roles to support coordination between on-demand workers who have not worked together before; and 2) reconfiguring these structures through a model inspired by version control, enabling continuous adaptation of the work and the division of labor. We report a deployment in which flash organizations successfully carried out open-ended and complex goals previously out of reach for crowdsourcing, including product design, software development, and game production. This research demonstrates digitally networked organizations that flexibly assemble and reassemble themselves from a globally distributed online workforce to accomplish complex work.
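
As a loose illustration of the two contributions above (de-individualized roles and version-control-style reconfiguration), one can picture a role/team tree plus a snapshot history. The sketch below is a hypothetical reading of the abstract, not the paper's actual system:

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional
import copy

@dataclass(frozen=True)
class Role:
    name: str                       # de-individualized: a slot, not a person
    assignee: Optional[str] = None  # filled on demand from the crowd

@dataclass
class Team:
    name: str
    roles: List[Role] = field(default_factory=list)
    subteams: List["Team"] = field(default_factory=list)

history: List[Team] = []  # version-control-style snapshots of the org

def commit(org: Team) -> None:
    history.append(copy.deepcopy(org))

org = Team("app project", roles=[Role("product lead")],
           subteams=[Team("design", roles=[Role("UI designer")])])
commit(org)

# Adapt the division of labor mid-project: add a role, staff another.
org.subteams[0].roles.append(Role("UX researcher"))
org.roles[0] = replace(org.roles[0], assignee="worker_42")
commit(org)

print(len(history), "snapshots;", [r.name for r in org.subteams[0].roles])
```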

Journal ArticleDOI
TL;DR: A model to explain the impacts of benefit and cost factors as well as trust on solver participation behavior in crowdsourcing was developed; the study found that monetary reward positively affects trust (trust partially mediates its effect on participation behavior), while loss of knowledge power negatively affects trust.
Abstract: Organizations are increasingly crowdsourcing their tasks to unknown individual workers, i.e., solvers. Solvers' participation is critical to the success of crowdsourcing activities. However, challenges exist in attracting solvers to participate in crowdsourcing. In this regard, prior research has mainly investigated the influences of benefit factors on solvers’ intention to participate in crowdsourcing. Thus, there is a lack of understanding of the cost factors that influence actual participation behavior, in conjunction with the benefits. Additionally, the role of trust in the cost-benefit analysis remains to be explored. Motivated thus, based on social exchange theory and context-related literature, we develop a model to explain the impacts of benefit and cost factors as well as trust on solver participation behavior in crowdsourcing. The model was tested using survey and archival data from 156 solvers on a large crowdsourcing platform. As hypothesized, monetary reward, skill enhancement, work autonomy, enjoyment, and trust were found to positively affect solvers’ participation in crowdsourcing, while cognitive effort negatively affects their participation. In addition, it was found that monetary reward positively affects trust (trust partially mediates its effect on participation behavior), while loss of knowledge power negatively affects trust. The theoretical contributions and practical implications of the study are discussed.

Journal ArticleDOI
TL;DR: This paper proposes a mechanism based on differential privacy and geocasting that achieves effective SC services while offering privacy guarantees to workers, and addresses scenarios with both static and dynamic datasets of workers.
Abstract: Spatial Crowdsourcing (SC) is a transformative platform that engages individuals in collecting and analyzing environmental, social, and other spatio-temporal information. SC outsources spatio-temporal tasks to a set of workers , i.e., individuals with mobile devices that perform the tasks by physically traveling to specified locations. However, current solutions require the workers to disclose their locations to untrusted parties. In this paper, we introduce a framework for protecting location privacy of workers participating in SC tasks. We propose a mechanism based on differential privacy and geocasting that achieves effective SC services while offering privacy guarantees to workers. We address scenarios with both static and dynamic (i.e., moving) datasets of workers. Experimental results on real-world data show that the proposed technique protects location privacy without incurring significant performance overhead.
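
One widely used mechanism in this space is the planar Laplace noise of geo-indistinguishability (Andres et al., 2013): a worker perturbs their coordinates before the server geocasts tasks to a region around the noisy point. The sketch below shows that generic mechanism; it is not necessarily the paper's exact construction:

```python
import numpy as np
from scipy.special import lambertw

def planar_laplace(x, y, epsilon, rng=np.random.default_rng()):
    """Sample a point from the planar Laplace distribution centered at
    (x, y), the core mechanism of geo-indistinguishability."""
    theta = rng.uniform(0, 2 * np.pi)
    p = rng.uniform(0, 1)
    # Inverse CDF of the radial component uses the Lambert W function.
    r = -(lambertw((p - 1) / np.e, k=-1).real + 1) / epsilon
    return x + r * np.cos(theta), y + r * np.sin(theta)

# A worker at (0.0, 0.0) reports a noisy location; the server would then
# geocast the task to a region around the noisy point.
print(planar_laplace(0.0, 0.0, epsilon=0.1))
```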

Proceedings ArticleDOI
01 Apr 2017
TL;DR: This paper surveys and synthesizes a wide spectrum of existing studies on crowdsourced data management and outlines key factors that need to be considered to improve crowdsourcing data management.
Abstract: Many important data management and analytics tasks cannot be completely addressed by automated processes. These tasks, such as entity resolution, sentiment analysis, and image recognition, can be enhanced through the use of human cognitive ability. Crowdsourcing is an effective way to harness the capabilities of people (i.e., the crowd) to apply human computation to such tasks. Thus, crowdsourced data management has become an area of increasing interest in research and industry. We identify three important problems in crowdsourced data management. (1) Quality Control: Workers may return noisy or incorrect results, so effective techniques are required to achieve high quality. (2) Cost Control: The crowd is not free, and cost control aims to reduce the monetary cost. (3) Latency Control: Human workers can be slow, particularly compared to automated computing time scales, so latency-control techniques are required. There has been significant work addressing these three factors for designing crowdsourced tasks, developing crowdsourced data manipulation operators, and optimizing plans consisting of multiple operators. We survey and synthesize a wide spectrum of existing studies on crowdsourced data management.

Proceedings ArticleDOI
Yuanshun Yao, Bimal Viswanath, Jenna Cryan, Haitao Zheng, Ben Y. Zhao
30 Oct 2017
TL;DR: In this paper, the authors identify a new class of attacks that leverage deep learning language models (Recurrent Neural Networks or RNNs) to automate the generation of fake online reviews for products and services.
Abstract: Malicious crowdsourcing forums are gaining traction as sources of spreading misinformation online, but are limited by the costs of hiring and managing human workers. In this paper, we identify a new class of attacks that leverage deep learning language models (Recurrent Neural Networks, or RNNs) to automate the generation of fake online reviews for products and services. Not only are these attacks cheap and therefore more scalable, but they can control the rate of content output to eliminate the signature burstiness that makes crowdsourced campaigns easy to detect. Using Yelp reviews as an example platform, we show how a two-phase review generation and customization attack can produce reviews that are indistinguishable by state-of-the-art statistical detectors. We conduct a survey-based user study to show these reviews not only evade human detection, but also score high on "usefulness" metrics by users. Finally, we develop novel automated defenses against these attacks, by leveraging the lossy transformation introduced by the RNN training and generation cycle. We consider countermeasures against our mechanisms, show that they produce unattractive cost-benefit tradeoffs for attackers, and that they can be further curtailed by simple constraints imposed by online service providers.

25 Apr 2017
TL;DR: In this paper, the authors show that Facebook posts can be classified with high accuracy as hoaxes or non-hoaxes on the basis of the users who "like" them.
Abstract: In recent years, the reliability of information on the Internet has emerged as a crucial issue of modern society. Social network sites (SNSs) have revolutionized the way in which information is spread by allowing users to freely share content. As a consequence, SNSs are also increasingly used as vectors for the diffusion of misinformation and hoaxes. The amount of disseminated information and the rapidity of its diffusion make it practically impossible to assess reliability in a timely manner, highlighting the need for automatic hoax detection systems. As a contribution towards this objective, we show that Facebook posts can be classified with high accuracy as hoaxes or non-hoaxes on the basis of the users who "liked" them. We present two classification techniques, one based on logistic regression, the other on a novel adaptation of boolean crowdsourcing algorithms. On a dataset consisting of 15,500 Facebook posts and 909,236 users, we obtain classification accuracies exceeding 99% even when the training set contains less than 1% of the posts. We further show that our techniques are robust: they work even when we restrict our attention to the users who like both hoax and non-hoax posts. These results suggest that mapping the diffusion pattern of information can be a useful component of automatic hoax detection systems.
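
The logistic-regression variant of the above is straightforward to prototype: build a sparse posts-by-users "like" matrix and classify. A minimal sketch on synthetic data; it illustrates the setup only and will not reproduce the paper's accuracy figures:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_posts, n_users = 2000, 5000
is_hoax = rng.integers(0, 2, n_posts)
user_pref = rng.integers(0, 2, n_users)  # each user mostly likes one kind

# Build the posts-by-users "like" matrix.
rows, cols = [], []
for p in range(n_posts):
    for u in rng.choice(n_users, size=30, replace=False):
        if rng.random() < (0.9 if user_pref[u] == is_hoax[p] else 0.1):
            rows.append(p)
            cols.append(u)
likes = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n_posts, n_users))

# Classify posts from the identities of the users who liked them.
X_tr, X_te, y_tr, y_te = train_test_split(likes, is_hoax, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```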

Proceedings ArticleDOI
04 Aug 2017
TL;DR: Crowdsourcing is used to evaluate TrioVecEvent, a method that leverages multimodal embeddings to achieve accurate online local event detection and introduces discriminative features that can well characterize local events.
Abstract: Detecting local events (e.g., protest, disaster) at their onsets is an important task for a wide spectrum of applications, ranging from disaster control to crime monitoring and place recommendation. Recent years have witnessed growing interest in leveraging geo-tagged tweet streams for online local event detection. Nevertheless, the accuracies of existing methods still remain unsatisfactory for building reliable local event detection systems. We propose TrioVecEvent, a method that leverages multimodal embeddings to achieve accurate online local event detection. The effectiveness of TrioVecEvent is underpinned by its two-step detection scheme. First, it ensures a high coverage of the underlying local events by dividing the tweets in the query window into coherent geo-topic clusters. To generate quality geo-topic clusters, we capture short-text semantics by learning multimodal embeddings of the location, time, and text, and then perform online clustering with a novel Bayesian mixture model. Second, TrioVecEvent considers the geo-topic clusters as candidate events and extracts a set of features for classifying the candidates. Leveraging the multimodal embeddings as background knowledge, we introduce discriminative features that can well characterize local events, which enables pinpointing true local events from the candidate pool with a small amount of training data. We have used crowdsourcing to evaluate TrioVecEvent, and found that it improves the performance of the state-of-the-art method by a large margin.

Journal ArticleDOI
TL;DR: This article provides a background on the use of MTurk as a mechanism for collecting research data, and reviews what is currently known about the advantages and issues associated with using M Turk and highlights important areas for future research.
Abstract: The advent of online platforms such as Amazon’s Mechanical Turk (MTurk) has expanded considerably researchers’ options for collecting research data. Many researchers, however, express understandabl...

Journal ArticleDOI
TL;DR: In this article, the authors used air temperature data from the prolific, low-cost, Netatmo weather station to quantify the urban heat island of London over the summer of 2015.
Abstract: Crowdsourcing techniques are frequently used across science to supplement traditional means of data collection. Although atmospheric science has so far been slow to harness the technology, developments have now reached the point where the benefits of the approaches simply cannot be ignored: crowdsourcing has potentially far-reaching consequences for the way in which measurements are collected and used in the discipline. To illustrate this point, this paper uses air temperature data from the prolific, low-cost, Netatmo weather station to quantify the urban heat island of London over the summer of 2015. The results are broadly comparable with previous studies, and indeed standard observations (albeit with a warm bias, a likely consequence of non-standard site exposure), showing a range of magnitudes of between 1 and 6 °C across the city depending on atmospheric stability. However, not all the results can be easily explained by physical processes and therefore highlight quality issues with crowdsourced data that need to be resolved. This paper aims to kickstart a step-change in the use of crowdsourcing in urban meteorology by encouraging atmospheric scientists to more positively engage with the new generation of manufacturers producing mass market sensors.
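
Once stations are classified as urban or rural, the core computation behind such studies is simple: the urban heat island intensity is the difference in quality-controlled mean air temperature between the two groups. A sketch with pandas over a hypothetical table of crowdsourced readings (column names are illustrative):

```python
import pandas as pd

# Hypothetical crowdsourced readings; real Netatmo data would also carry
# station metadata used to flag poor siting (e.g., indoor units).
obs = pd.DataFrame({
    "station": ["a", "a", "b", "b", "c", "c", "d", "d"],
    "setting": ["urban", "urban", "urban", "urban",
                "rural", "rural", "rural", "rural"],
    "temp_c":  [24.1, 25.3, 23.8, 26.0, 21.5, 22.0, 55.0, 21.2],
})

# Crude quality control: discard physically implausible readings, one of
# the data-quality issues the paper highlights with mass-market sensors.
obs = obs[obs["temp_c"].between(-20, 45)]

# Urban heat island intensity = mean urban minus mean rural temperature.
means = obs.groupby("setting")["temp_c"].mean()
print("UHI intensity (degC):", means["urban"] - means["rural"])
```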

Journal ArticleDOI
TL;DR: Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy with substantially less effort than relying on manual screening alone.