
Showing papers on "Crowdsourcing" published in 2016


Book ChapterDOI
08 Oct 2016
TL;DR: This work proposes a novel Hollywood in Homes approach to data collection, yielding a new dataset, Charades, in which hundreds of people record videos in their own homes acting out casual everyday activities, and evaluates and provides baseline results for several tasks, including action recognition and automatic description generation.
Abstract: Computer vision has a great potential to help our daily lives by searching for lost keys, watering flowers or reminding us to take a pill. To succeed with such tasks, computer vision methods need to be trained from real and diverse examples of our daily dynamic scenes. While most of such scenes are not particularly exciting, they typically do not appear on YouTube, in movies or TV broadcasts. So how do we collect sufficiently many diverse but boring samples representing our lives? We propose a novel Hollywood in Homes approach to collect such data. Instead of shooting videos in the lab, we ensure diversity by distributing and crowdsourcing the whole process of video creation from script writing to video recording and annotation. Following this procedure we collect a new dataset, Charades, with hundreds of people recording videos in their own homes, acting out casual everyday activities. The dataset is composed of 9,848 annotated videos with an average length of 30 seconds, showing activities of 267 people from three continents. Each video is annotated by multiple free-text descriptions, action labels, action intervals and classes of interacted objects. In total, Charades provides 27,847 video descriptions, 66,500 temporally localized intervals for 157 action classes and 41,104 labels for 46 object classes. Using this rich data, we evaluate and provide baseline results for several tasks including action recognition and automatic description generation. We believe that the realism, diversity, and casual nature of this dataset will present unique challenges and new opportunities for the computer vision community.

865 citations


Journal ArticleDOI
TL;DR: This article addresses methodological issues with using MTurk, many of which are common to other nonprobability samples but unfamiliar to clinical science researchers, and suggests concrete steps to avoid these issues or minimize their impact.
Abstract: Crowdsourcing has had a dramatic impact on the speed and scale at which scientific research can be conducted. Clinical scientists have particularly benefited from readily available research study participants and streamlined recruiting and payment systems afforded by Amazon Mechanical Turk (MTurk), a popular labor market for crowdsourcing workers. MTurk has been used in this capacity for more than five years. The popularity and novelty of the platform have spurred numerous methodological investigations, making it the most studied nonprobability sample available to researchers. This article summarizes what is known about MTurk sample composition and data quality with an emphasis on findings relevant to clinical psychological research. It then addresses methodological issues with using MTurk, many of which are common to other nonprobability samples but unfamiliar to clinical science researchers, and suggests concrete steps to avoid these issues or minimize their impact.

803 citations


Journal ArticleDOI
TL;DR: An experimental study on learning from crowds that handles data aggregation directly as part of the learning process of the convolutional neural network (CNN) via an additional crowdsourcing layer (AggNet), which gives valuable insights into how deep CNNs learn from crowd annotations and demonstrates the necessity of integrating data aggregation.
Abstract: The lack of publicly available ground-truth data has been identified as the major challenge for transferring recent developments in deep learning to the biomedical imaging domain. Though crowdsourcing has enabled annotation of large-scale databases for real-world images, its application for biomedical purposes requires a deeper understanding and hence a more precise definition of the actual annotation task. The fact that expert tasks are being outsourced to non-expert users may lead to noisy annotations introducing disagreement between users. Despite being a valuable resource for learning annotation models from crowdsourcing, conventional machine-learning methods may have difficulties dealing with noisy annotations during training. In this manuscript, we present a new concept for learning from crowds that handles data aggregation directly as part of the learning process of the convolutional neural network (CNN) via an additional crowdsourcing layer (AggNet). In addition, we present an experimental study on learning from crowds designed to answer the following questions. 1) Can a deep CNN be trained with data collected from crowdsourcing? 2) How can the CNN be adapted to train on multiple types of annotation datasets (ground truth and crowd-based)? 3) How does the choice of annotation and aggregation affect the accuracy? Our experimental setup involved Annot8, a self-implemented web platform based on the Crowdflower API, realizing image annotation tasks for a publicly available biomedical image database. Our results give valuable insights into the functionality of deep CNN learning from crowd annotations and demonstrate the necessity of integrating data aggregation.

512 citations
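The aggregation-as-a-layer idea above can be sketched compactly. Below is a minimal, illustrative PyTorch version of a crowd-aggregation layer: per-annotator confusion matrices sit on top of a base classifier and are trained end-to-end against noisy crowd labels. The model names and toy data are assumptions for illustration; this is in the spirit of the crowdsourcing layer described in the abstract, not the authors' AggNet implementation, which couples aggregation with a full CNN.

```python
# Minimal sketch of a crowd-aggregation layer: per-annotator confusion
# matrices stacked on a base classifier, trained end-to-end against
# noisy crowd labels. Illustrative only; not the authors' AggNet code.
import torch
import torch.nn as nn

class CrowdLayerModel(nn.Module):
    def __init__(self, n_features, n_classes, n_annotators):
        super().__init__()
        self.base = nn.Sequential(              # stand-in for the CNN backbone
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, n_classes))
        # One learnable confusion matrix per annotator, initialised near identity.
        self.confusion = nn.Parameter(
            torch.eye(n_classes).repeat(n_annotators, 1, 1))

    def forward(self, x, annotator_ids):
        p = torch.softmax(self.base(x), dim=-1)            # latent "true" class posterior
        conf = torch.softmax(self.confusion[annotator_ids], dim=-1)
        return torch.bmm(p.unsqueeze(1), conf).squeeze(1)  # annotator-specific prediction

model = CrowdLayerModel(n_features=32, n_classes=4, n_annotators=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(128, 32)                    # toy features
ann = torch.randint(0, 10, (128,))          # which annotator labelled each item
y = torch.randint(0, 4, (128,))             # that annotator's (noisy) label
for _ in range(100):
    opt.zero_grad()
    out = model(x, ann)
    loss = nn.functional.nll_loss(torch.log(out + 1e-9), y)
    loss.backward()
    opt.step()
```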


Posted Content
TL;DR: Charades, as discussed by the authors, is a collection of 9,848 annotated videos with an average length of 30 seconds, showing activities of 267 people from three continents; each video is annotated by multiple free-text descriptions, action labels, action intervals and classes of interacted objects.
Abstract: Computer vision has a great potential to help our daily lives by searching for lost keys, watering flowers or reminding us to take a pill. To succeed with such tasks, computer vision methods need to be trained from real and diverse examples of our daily dynamic scenes. While most of such scenes are not particularly exciting, they typically do not appear on YouTube, in movies or TV broadcasts. So how do we collect sufficiently many diverse but boring samples representing our lives? We propose a novel Hollywood in Homes approach to collect such data. Instead of shooting videos in the lab, we ensure diversity by distributing and crowdsourcing the whole process of video creation from script writing to video recording and annotation. Following this procedure we collect a new dataset, Charades, with hundreds of people recording videos in their own homes, acting out casual everyday activities. The dataset is composed of 9,848 annotated videos with an average length of 30 seconds, showing activities of 267 people from three continents. Each video is annotated by multiple free-text descriptions, action labels, action intervals and classes of interacted objects. In total, Charades provides 27,847 video descriptions, 66,500 temporally localized intervals for 157 action classes and 41,104 labels for 46 object classes. Using this rich data, we evaluate and provide baseline results for several tasks including action recognition and automatic description generation. We believe that the realism, diversity, and casual nature of this dataset will present unique challenges and new opportunities for the computer vision community.

458 citations


Journal ArticleDOI
TL;DR: This position paper updates and builds on 'Modelling with Stakeholders' (Voinov and Bousquet, 2010) and suggests structured mechanisms to examine and account for human biases and beliefs in participatory modelling.
Abstract: This paper updates and builds on 'Modelling with Stakeholders' (Voinov and Bousquet, 2010), which demonstrated the importance of, and demand for, stakeholder participation in resource and environmental modelling. This position paper returns to the concepts of that publication and reviews the progress made since 2010. A new development is the wide introduction and acceptance of social media and web applications, which dramatically changes the context and scale of stakeholder interactions and participation. Technology advances make it easier to incorporate information in interactive formats via visualization and games to augment participatory experiences. Citizens as stakeholders are increasingly demanding to be engaged in planning decisions that affect them and their communities, at scales from local to global. How people interact with and access models and data is rapidly evolving. In turn, this requires changes in how models are built, packaged, and disseminated: citizens are less in awe of experts and external authorities, and they are increasingly aware of their own capabilities to provide inputs to planning processes, including models. The continued acceleration of environmental degradation and natural resource depletion accompanies these societal changes, even as there is a growing acceptance of the need to transition to alternative, possibly very different, lifestyles. Substantive transitions cannot occur without significant changes in human behaviour and perceptions. The important and diverse roles that models can play in guiding human behaviour, and in disseminating and increasing societal knowledge, are a feature of stakeholder processes today.
Highlights:
- Participatory modelling has become mainstream in resource and environmental management.
- We review recent contributions to participatory environmental modelling to identify the tools, methods and processes applied.
- Global internet connectivity, social media and crowdsourcing create opportunities for participatory modelling.
- We suggest structured mechanisms to examine and account for human biases and beliefs in participatory modelling.
- Advanced visualization tools, gaming, and virtual environments improve communication with stakeholders.

404 citations


Posted Content
TL;DR: In this article, a method that combines crowdsourcing and machine learning to analyze personal attacks at scale is presented, which shows that the majority of personal attacks on Wikipedia are not the result of a few malicious users, nor primarily the consequence of allowing anonymous contributions from unregistered users.
Abstract: The damage personal attacks cause to online discourse motivates many platforms to try to curb the phenomenon. However, understanding the prevalence and impact of personal attacks in online platforms at scale remains surprisingly difficult. The contribution of this paper is to develop and illustrate a method that combines crowdsourcing and machine learning to analyze personal attacks at scale. We present a method for evaluating a classifier in terms of the aggregated number of crowd-workers it can approximate. We apply our methodology to English Wikipedia, generating a corpus of over 100k high-quality human-labeled comments and 63M machine-labeled ones from a classifier that is as good as the aggregate of 3 crowd-workers, as measured by the area under the ROC curve and Spearman correlation. Using this corpus of machine-labeled scores, our methodology allows us to explore some of the open questions about the nature of online personal attacks. This reveals that the majority of personal attacks on Wikipedia are not the result of a few malicious users, nor primarily the consequence of allowing anonymous contributions from unregistered users.

403 citations
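The evaluation idea in this abstract, scoring a classifier by how many aggregated crowd-workers it matches, can be illustrated with synthetic data. The sketch below compares a toy classifier and a 3-worker aggregate against a held-out crowd aggregate using ROC AUC and Spearman correlation; all data, noise rates, and names are invented for illustration and are not the paper's released code.

```python
# Hedged sketch: compare a classifier's scores and the aggregate of
# k crowd-workers against a held-out crowd aggregate, via ROC AUC and
# Spearman correlation. Toy data only.
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_comments, n_workers = 1000, 10
true_attack = rng.random(n_comments) < 0.2
# Each worker labels each comment, flipping the true label with prob 0.25.
labels = np.where(rng.random((n_comments, n_workers)) < 0.25,
                  ~true_attack[:, None], true_attack[:, None]).astype(float)

holdout = labels[:, 5:].mean(axis=1)          # reference: held-out workers' aggregate
k_agg = labels[:, :3].mean(axis=1)            # aggregate of k=3 workers
clf_scores = true_attack + rng.normal(0, 0.4, n_comments)  # stand-in classifier

for name, score in [("3-worker aggregate", k_agg), ("classifier", clf_scores)]:
    auc = roc_auc_score(holdout > 0.5, score)
    rho, _ = spearmanr(holdout, score)
    print(f"{name}: AUC={auc:.3f}, Spearman={rho:.3f}")
```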


Journal ArticleDOI
TL;DR: A snapshot of the role of citizens in crowdsourcing geographic information is provided, along with a guide to the current status of this rapidly emerging and evolving subject.
Abstract: Citizens are increasingly becoming an important source of geographic information, sometimes entering domains that had until recently been the exclusive realm of authoritative agencies. This activity has a very diverse character as it can, amongst other things, be active or passive, involve spatial or aspatial data, and the data provided can be variable in terms of key attributes such as format, description and quality. Unsurprisingly, therefore, there are a variety of terms used to describe data arising from citizens. In this article, the expressions used to describe citizen sensing of geographic information are reviewed and their use over time explored, prior to categorizing them and highlighting key issues in the current state of the subject. The latter involved a review of ~100 Internet sites with particular focus on their thematic topic, the nature of the data and issues such as incentives for contributors. This review suggests that most sites involve active rather than passive contribution, with citizens typically motivated by the desire to aid a worthy cause, often receiving little training. As such, this article provides a snapshot of the role of citizens in crowdsourcing geographic information and a guide to the current status of this rapidly emerging and evolving subject.

304 citations


Proceedings ArticleDOI
16 May 2016
TL;DR: This paper identifies a more practical micro-task allocation problem, called the Global Online Micro-task Allocation in spatial crowdsourcing (GOMA) problem, and proposes a two-phase framework, based on which the TGOA algorithm with a 1/4-competitive ratio under the online random order model is presented.
Abstract: With the rapid development of smartphones, spatial crowdsourcing platforms are getting popular. A foundational research problem in spatial crowdsourcing is to allocate micro-tasks to suitable crowd workers. Most existing studies focus on offline scenarios, where all the spatiotemporal information of micro-tasks and crowd workers is given. However, they are impractical since micro-tasks and crowd workers in real applications appear dynamically and their spatiotemporal information cannot be known in advance. In this paper, to address the shortcomings of existing offline approaches, we first identify a more practical micro-task allocation problem, called the Global Online Micro-task Allocation in spatial crowdsourcing (GOMA) problem. We first extend the state-of-the-art algorithm for the online maximum weighted bipartite matching problem to the GOMA problem as the baseline algorithm. Although the baseline algorithm provides a theoretical guarantee for the worst case, its average performance in practice is not good enough, since the worst case happens with a very low probability in the real world. Thus, we consider the average performance of online algorithms, a.k.a. the online random order model. We propose a two-phase framework, based on which we present the TGOA algorithm with a 1/4-competitive ratio under the online random order model. To improve its efficiency, we further design the TGOA-Greedy algorithm following the framework, which runs faster than the TGOA algorithm but has a lower competitive ratio of 1/8. Finally, we verify the effectiveness and efficiency of the proposed methods through extensive experiments on real and synthetic datasets.

271 citations
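For intuition about the online setting described above, here is a minimal sketch of a greedy baseline strategy: each arriving task is irrevocably assigned to the best feasible idle worker. This illustrates the problem setting only; it is not the TGOA or TGOA-Greedy algorithm, and the utility function and distance cutoff are assumptions.

```python
# Minimal sketch of a greedy baseline for online micro-task allocation:
# as each task arrives, assign it to the feasible idle worker with the
# highest utility. Illustrative of the setting, not TGOA itself.
import math, random

random.seed(0)

def utility(task, worker, max_dist=5.0):
    d = math.dist(task["loc"], worker["loc"])
    return worker["quality"] / (1.0 + d) if d <= max_dist else -1.0

workers = [{"loc": (random.uniform(0, 10), random.uniform(0, 10)),
            "quality": random.random(), "busy": False} for _ in range(20)]
tasks = [{"loc": (random.uniform(0, 10), random.uniform(0, 10))} for _ in range(15)]
random.shuffle(tasks)               # random arrival order (online random order model)

total = 0.0
for task in tasks:                  # tasks revealed one by one
    best, best_u = None, 0.0
    for w in workers:
        if not w["busy"]:
            u = utility(task, w)
            if u > best_u:
                best, best_u = w, u
    if best is not None:
        best["busy"] = True          # worker committed; decision is irrevocable
        total += best_u
print(f"greedy total utility: {total:.2f}")
```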


Journal ArticleDOI
TL;DR: This paper surveys and synthesizes a wide spectrum of existing studies on crowdsourced data management and outlines key factors that need to be considered to improve crowdsourced data management.
Abstract: Many important data management and analytics tasks cannot be completely addressed by automated processes. These tasks, such as entity resolution, sentiment analysis, and image recognition, can be enhanced through the use of human cognitive ability. Crowdsourcing platforms are an effective way to harness the capabilities of people (i.e., the crowd) to apply human computation for such tasks. Thus, crowdsourced data management has become an area of increasing interest in research and industry. We identify three important problems in crowdsourced data management. (1) Quality Control: Workers may return noisy or incorrect results, so effective techniques are required to achieve high quality; (2) Cost Control: The crowd is not free, and cost control aims to reduce the monetary cost; (3) Latency Control: Human workers can be slow, particularly compared to automated computing time scales, so latency-control techniques are required. There has been significant work addressing these three factors for designing crowdsourced tasks, developing crowdsourced data manipulation operators, and optimizing plans consisting of multiple operators. In this paper, we survey and synthesize a wide spectrum of existing studies on crowdsourced data management. Based on this analysis, we then outline key factors that need to be considered to improve crowdsourced data management.

240 citations
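As a concrete example of the quality-control family of techniques such surveys cover, here is a minimal sketch of redundant assignment with majority voting; the item names and answers are invented for illustration.

```python
# Minimal sketch of one common quality-control technique: assign each
# item to several workers and take the majority label as the answer.
from collections import Counter

def majority_vote(answers):
    """answers: list of labels from different workers for one item."""
    (label, count), = Counter(answers).most_common(1)
    return label, count / len(answers)   # winning label and its support

responses = {"img_1": ["cat", "cat", "dog"],
             "img_2": ["dog", "dog", "dog"],
             "img_3": ["cat", "dog", "dog"]}
for item, answers in responses.items():
    label, support = majority_vote(answers)
    print(item, label, f"agreement={support:.2f}")
```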


Proceedings ArticleDOI
01 Jan 2016
TL;DR: A corpus of 115 privacy policies with manual annotations for 23K fine-grained data practices is introduced and the process of using skilled annotators and a purpose-built annotation tool to produce the data is described.
Abstract: Website privacy policies are often ignored by Internet users, because these documents tend to be long and difficult to understand. However, the significance of privacy policies greatly exceeds the attention paid to them: these documents are binding legal agreements between website operators and their users, and their opaqueness is a challenge not only to Internet users but also to policy regulators. One proposed alternative to the status quo is to automate or semi-automate the extraction of salient details from privacy policy text, using a combination of crowdsourcing, natural language processing, and machine learning. However, there has been a relative dearth of datasets appropriate for identifying data practices in privacy policies. To remedy this problem, we introduce a corpus of 115 privacy policies (267K words) with manual annotations for 23K fine-grained data practices. We describe the process of using skilled annotators and a purpose-built annotation tool to produce the data. We provide findings based on a census of the annotations and show results toward automating the annotation procedure. Finally, we describe challenges and opportunities for the research community to use this corpus to advance research in both privacy and language technologies.

216 citations


Book ChapterDOI
08 Oct 2016
TL;DR: A new crowdsourced dataset is introduced containing 110,988 images from 56 cities and 1,170,000 pairwise comparisons provided by 81,630 online volunteers along six perceptual attributes, showing that crowdsourcing combined with neural networks can produce urban perception data at the global scale.
Abstract: Computer vision methods that quantify the perception of urban environment are increasingly being used to study the relationship between a city’s physical appearance and the behavior and health of its residents. Yet, the throughput of current methods is too limited to quantify the perception of cities across the world. To tackle this challenge, we introduce a new crowdsourced dataset containing 110,988 images from 56 cities, and 1,170,000 pairwise comparisons provided by 81,630 online volunteers along six perceptual attributes: safe, lively, boring, wealthy, depressing, and beautiful. Using this data, we train a Siamese-like convolutional neural architecture, which learns from a joint classification and ranking loss, to predict human judgments of pairwise image comparisons. Our results show that crowdsourcing combined with neural networks can produce urban perception data at the global scale.
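A minimal sketch of the training setup described above: a shared scorer (the Siamese idea) applied to both images of a pair, optimized with a ranking loss so the crowd-preferred image scores higher. Toy random features stand in for the CNN image encoder, and the joint classification term is omitted; this is an illustration under those assumptions, not the authors' released model.

```python
# Hedged sketch of Siamese ranking on crowdsourced pairwise comparisons:
# one shared scorer evaluates both images; a margin ranking loss pushes
# the crowd-preferred image's score above the other's.
import torch
import torch.nn as nn

scorer = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
rank_loss = nn.MarginRankingLoss(margin=0.1)
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)

left = torch.randn(256, 128)        # features of the left image in each pair
right = torch.randn(256, 128)       # features of the right image
winner = torch.where(torch.rand(256) < 0.5,
                     torch.tensor(1.0), torch.tensor(-1.0))  # +1: left won the vote

for _ in range(200):
    opt.zero_grad()
    s_left, s_right = scorer(left).squeeze(1), scorer(right).squeeze(1)
    loss = rank_loss(s_left, s_right, winner)   # same weights score both sides
    loss.backward()
    opt.step()
```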

Proceedings ArticleDOI
07 May 2016
TL;DR: An unsupervised system that captures dominating user behaviors from clickstream data and visualizes the detected behaviors in an intuitive manner, effectively identifying previously unknown behaviors, e.g., dormant users and hostile chatters.
Abstract: Online services are increasingly dependent on user participation. Whether it's online social networks or crowdsourcing services, understanding user behavior is important yet challenging. In this paper, we build an unsupervised system to capture dominating user behaviors from clickstream data (traces of users' click events), and visualize the detected behaviors in an intuitive manner. Our system identifies "clusters" of similar users by partitioning a similarity graph (nodes are users; edges are weighted by clickstream similarity). The partitioning process leverages iterative feature pruning to capture the natural hierarchy within user clusters and produce intuitive features for visualizing and understanding captured user behaviors. For evaluation, we present case studies on two large-scale clickstream traces (142 million events) from real social networks. Our system effectively identifies previously unknown behaviors, e.g., dormant users, hostile chatters. Also, our user study shows people can easily interpret identified behaviors using our visualization tool.
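The similarity-graph step can be sketched simply. The toy example below uses Jaccard similarity over users' event sets (an assumption; the paper uses richer clickstream-sequence similarity plus iterative feature pruning), prunes weak edges, and reads clusters off as connected components.

```python
# Minimal sketch of clickstream clustering via a user-similarity graph:
# Jaccard similarity over event sets, prune weak edges, read off clusters.
import networkx as nx

clickstreams = {
    "u1": {"login", "post", "like", "logout"},
    "u2": {"login", "post", "like", "comment"},
    "u3": {"login", "logout"},                      # dormant-looking user
    "u4": {"login", "comment", "comment_reply"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b)

G = nx.Graph()
G.add_nodes_from(clickstreams)
users = list(clickstreams)
for i, u in enumerate(users):
    for v in users[i + 1:]:
        w = jaccard(clickstreams[u], clickstreams[v])
        if w >= 0.4:                                # prune weak edges
            G.add_edge(u, v, weight=w)

clusters = list(nx.connected_components(G))
print(clusters)
```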

Journal ArticleDOI
TL;DR: This paper overviews data sources, analytical approaches, and application systems for social transportation, and suggests a few future research directions for this new social transportation field.
Abstract: Big data for social transportation brings us unprecedented opportunities for resolving transportation problems for which traditional approaches are not competent and for building the next-generation intelligent transportation systems. Although social data have been applied for transportation analysis, there are still many challenges. First, social data evolve with time and contain abundant information, posing a crucial need for data collection and cleaning. Meanwhile, each type of data has specific advantages and limitations for social transportation, and one data type alone is not capable of describing the overall state of a transportation system. Systematic data fusing approaches or frameworks for combining social signal data with different features, structures, resolutions, and precision are needed. Second, data processing and mining techniques, such as natural language processing and analysis of streaming data, require further revolutions in effective utilization of real-time traffic information. Third, social data are connected to cyber and physical spaces. To address practical problems in social transportation, a suite of schemes are demanded for realizing big data in social transportation systems, such as crowdsourcing, visual analysis, and task-based services. In this paper, we overview data sources, analytical approaches, and application systems for social transportation, and we also suggest a few future research directions for this new social transportation field.

Journal ArticleDOI
TL;DR: This study explores microtask CS as perceived by crowd workers, revealing their values as a means of informing the design of CS platforms, and offers recommendations regarding the ethical use of crowd workers and calls for improving MTurk platform design for greater worker empowerment.
Abstract: Crowdsourcing (CS) of micro tasks is a relatively new, open source work form enabled by information and communication technologies. While anecdotal evidence of its benefits abounds, our understanding of the phenomenon's societal consequences remains limited. Drawing on value sensitive design (VSD), we explore microtask CS as perceived by crowd workers, revealing their values as a means of informing the design of CS platforms. Analyzing detailed narratives of 210 crowd workers participating in Amazon's Mechanical Turk (MTurk), we uncover a set of nine values they share: access, autonomy, fairness, transparency, communication, security, accountability, making an impact, and dignity. We find that these values are implicated in four crowdsourcing structures: compensation, governance, technology, and microtask. Two contrasting perceptions—empowerment and marginalization—coexist, forming a duality of microtask CS. The study contributes to the CS and VSD literatures, heightens awareness of worker marginalization in microtask CS, and offers guidelines for improving CS practice. Specifically, we offer recommendations regarding the ethical use of crowd workers (including for academic research), and call for improving MTurk platform design for greater worker empowerment.

Proceedings ArticleDOI
27 Feb 2016
TL;DR: Crowdworkers are not the independent, autonomous workers they are often assumed to be, but instead work within a social network of other crowdworkers to fulfill technical and social needs left by the platform they work on.
Abstract: The main goal of this paper is to show that crowdworkers collaborate to fulfill technical and social needs left by the platform they work on. That is, crowdworkers are not the independent, autonomous workers they are often assumed to be, but instead work within a social network of other crowdworkers. Crowdworkers collaborate with members of their networks to 1) manage the administrative overhead associated with crowdwork, 2) find lucrative tasks and reputable employers and 3) recreate the social connections and support often associated with brick-and-mortar work environments. Our evidence combines ethnography, interviews, survey data and larger-scale data analysis from four crowdsourcing platforms, emphasizing the qualitative data from the Amazon Mechanical Turk (MTurk) platform and Microsoft's proprietary crowdsourcing platform, the Universal Human Relevance System (UHRS). This paper draws from an ongoing, longitudinal study of crowdwork that uses a mixed-methods approach to understand the cultural meaning, political implications, and ethical demands of crowdsourcing.

Journal ArticleDOI
TL;DR: This paper addresses the opportunities of Big Data in healthcare together with issues of responsibility and accountability and aims to pave the way for public policy to support a balanced agenda that safeguards personal information while enabling the use of data to improve public health.
Abstract: Research on large shared medical datasets and data-driven research are gaining fast momentum and provide major opportunities for improving health systems as well as individual care. Such open data can shed light on the causes of disease and the effects of treatment, including adverse reactions and side-effects of treatments, while also facilitating analyses tailored to an individual's characteristics, known as personalized or "stratified medicine." Developments such as crowdsourcing, participatory surveillance, individuals pledging to become "data donors", and the "quantified self" movement (where citizens share data through mobile device-connected technologies) have great potential to contribute to our knowledge of disease, improving diagnostics, and the delivery of healthcare and treatment. Alongside this great potential, however, there are major concerns over privacy, confidentiality, and control of data about individuals once they are shared. Issues such as user trust, data privacy, transparency over the control of data ownership, and the implications of data analytics for personal privacy with potentially intrusive inferences are becoming increasingly scrutinized at national and international levels. This can be seen in the recent backlash over the proposed implementation of care.data, which enables individuals' NHS data to be linked, retained, and shared for other uses, such as research and, more controversially, with businesses for commercial exploitation. By way of contrast, through the increasing popularity of social media, GPS-enabled mobile apps and tracking/wearable devices, the IT industry and MedTech giants are pursuing new projects without clear public and policy discussion about ownership of and responsibility for user-generated data. In the absence of transparent regulation, this paper addresses the opportunities of Big Data in healthcare together with issues of responsibility and accountability. It also aims to pave the way for public policy to support a balanced agenda that safeguards personal information while enabling the use of data to improve public health.

Journal ArticleDOI
TL;DR: The proposed incentive mechanism comprises two algorithms, an improved two-stage auction algorithm (ITA) and a truthful online reputation updating algorithm (TORU), which together can solve the free-riding problem and effectively improve the efficiency and utility of mobile crowdsourcing systems.

Journal ArticleDOI
TL;DR: Two online mechanisms are designed, OMZ and OMG, satisfying the computational efficiency, individual rationality, budget feasibility, truthfulness, consumer sovereignty, and constant competitiveness under the zero arrival-departure interval case and a more general case, respectively.
Abstract: Mobile crowd sensing (MCS) is a new paradigm that takes advantage of pervasive mobile devices to efficiently collect data, enabling numerous novel applications. To achieve good service quality for an MCS application, incentive mechanisms are necessary to attract more user participation. Most existing mechanisms apply only for the offline scenario where all users report their strategic types in advance. On the contrary, we focus on a more realistic scenario where users arrive one by one online in a random order. Based on the online auction model, we investigate the problem that users submit their private types to the crowdsourcer when arriving, and the crowdsourcer aims at selecting a subset of users before a specified deadline for maximizing the value of services (assumed to be a nonnegative monotone submodular function) provided by selected users under a budget constraint. We design two online mechanisms, OMZ and OMG, satisfying the computational efficiency, individual rationality, budget feasibility, truthfulness, consumer sovereignty, and constant competitiveness under the zero arrival-departure interval case and a more general case, respectively. Through extensive simulations, we evaluate the performance and validate the theoretical properties of our online mechanisms.
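To convey the flavour of such online mechanisms, here is a heavily simplified sketch: observe an initial sample of arrivals to calibrate a value-per-cost threshold, then accept later users who clear it while budget remains. This is an intuition-level simplification under invented parameters, not the OMZ or OMG mechanisms, and it omits their payment rules and the machinery behind truthfulness and submodular value.

```python
# Hedged sketch of a sample-then-threshold online selection under budget:
# phase 1 observes arrivals to set a value-per-cost bar; phase 2 makes
# irrevocable accept/reject decisions. Not the OMZ/OMG mechanisms.
import random

random.seed(1)
budget = 50.0
users = [{"value": random.uniform(1, 10), "cost": random.uniform(1, 5)}
         for _ in range(100)]                 # arrive one by one in random order

n_sample = len(users) // 4
sample = users[:n_sample]                     # phase 1: observe only, select no one
threshold = sorted((u["value"] / u["cost"] for u in sample),
                   reverse=True)[n_sample // 4]

selected, spent = [], 0.0
for u in users[n_sample:]:                    # phase 2: irrevocable decisions
    if u["value"] / u["cost"] >= threshold and spent + u["cost"] <= budget:
        selected.append(u)
        spent += u["cost"]
print(f"selected {len(selected)} users, spent {spent:.1f} of {budget}")
```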

Proceedings ArticleDOI
05 Jan 2016
TL;DR: The results of the review indicate that gamification has been an effective approach for increasing crowdsourcing participation and insights are provided for designers of gamified systems and further research on the topics of gamification and crowdsourcing.
Abstract: This study investigates how different gamification implementations can increase crowdsourcees' motivation and participation in crowdsourcing (CS). To this end, we review empirical literature that has investigated the use of gamification in crowdsourcing settings. Overall, the results of the review indicate that gamification has been an effective approach for increasing crowdsourcing participation. When comparing crowdcreating, -solving, -processing and -rating CS approaches, the results show differences in the use of gamification across CS types. Crowdsourcing initiatives that provide more monotonous tasks most commonly used mere points and other simpler gamification implementations, whereas CS initiatives that seek diverse and creative contributions have employed gamification in more manifold ways, employing a richer set of mechanics. These findings provide insights for designers of gamified systems and for further research on the topics of gamification and crowdsourcing.

Journal ArticleDOI
TL;DR: Two crowdsourcing-based WPSs are proposed that build the databases on handheld devices using designed algorithms and an inertial navigation solution from a Trusted Portable Navigator (T-PN), overcoming the limitations of current systems that require a floor plan or GPS, suit only specific indoor environments, or implement a simple MEMS-based sensors' solution.
Abstract: Current WiFi positioning systems (WPSs) require databases – such as locations of WiFi access points and propagation parameters, or a radio map – to assist with positioning. Typically, procedures for building such databases are time-consuming and labour-intensive. In this paper, two autonomous crowdsourcing systems are proposed to build the databases on handheld devices by using our designed algorithms and an inertial navigation solution from a Trusted Portable Navigator (T-PN). The proposed systems, running on smartphones, build and update the database autonomously and adaptively to account for the dynamic environment. To evaluate the performance of automatically generated databases, two improved WiFi positioning schemes (fingerprinting and trilateration) corresponding to these two database building systems, are also discussed. The main contribution of the paper is the proposal of two crowdsourcing-based WPSs that eliminate the various limitations of current crowdsourcing-based systems which (a) require a floor plan or GPS, (b) are suitable only for specific indoor environments, and (c) implement a simple MEMS-based sensors’ solution. In addition, these two WPSs are evaluated and compared through field tests. Results in different test scenarios show that average positioning errors of both proposed systems are all less than 5.75 m.
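For context, here is a minimal sketch of the fingerprinting side of such a system: matching a live RSS reading against a radio map with weighted k-nearest neighbours. The radio-map values below are invented; building and updating that database autonomously from crowdsourced traces is the paper's actual contribution.

```python
# Minimal sketch of RSS fingerprint matching with weighted k-NN.
# The radio map is a toy stand-in for a crowdsourced database.
import numpy as np

# Radio map: (x, y) location -> mean RSS (dBm) from three access points.
radio_map = {(0, 0): [-40, -70, -80],
             (0, 5): [-55, -60, -75],
             (5, 0): [-48, -72, -65],
             (5, 5): [-62, -58, -60]}

def locate(rss, k=3):
    locs = list(radio_map)
    dists = np.array([np.linalg.norm(np.array(rss) - np.array(radio_map[l]))
                      for l in locs])
    idx = np.argsort(dists)[:k]
    w = 1.0 / (dists[idx] + 1e-6)             # closer fingerprints weigh more
    coords = np.array([locs[i] for i in idx], dtype=float)
    return tuple(np.average(coords, axis=0, weights=w))

print(locate([-50, -65, -72]))                # estimated (x, y) position
```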

Posted Content
TL;DR: In this article, a Siamese-like convolutional neural network is trained on a large crowdsourced dataset of pairwise image comparisons to predict human judgments of urban perception, since the throughput of prior methods is too limited to quantify the perception of cities across the world.
Abstract: Computer vision methods that quantify the perception of urban environment are increasingly being used to study the relationship between a city's physical appearance and the behavior and health of its residents. Yet, the throughput of current methods is too limited to quantify the perception of cities across the world. To tackle this challenge, we introduce a new crowdsourced dataset containing 110,988 images from 56 cities, and 1,170,000 pairwise comparisons provided by 81,630 online volunteers along six perceptual attributes: safe, lively, boring, wealthy, depressing, and beautiful. Using this data, we train a Siamese-like convolutional neural architecture, which learns from a joint classification and ranking loss, to predict human judgments of pairwise image comparisons. Our results show that crowdsourcing combined with neural networks can produce urban perception data at the global scale.

Journal ArticleDOI
TL;DR: Crowdsourcing the analysis of complex and massive data has emerged as a framework to find robust methodologies to solve diverse and important biomedical problems, and foster the creation and dissemination of well-curated data repositories.
Abstract: Considerable resources are required to gain maximal insights into the diverse big data sets in biomedicine. In this Review, the authors discuss how crowdsourcing, in the form of collaborative competitions (known as Challenges), can engage the scientific community to provide the diverse expertise and methodological approaches that can robustly address some of the most pressing questions in genetics, genomics and biomedical sciences. The generation of large-scale biomedical data is creating unprecedented opportunities for basic and translational science. Typically, the data producers perform initial analyses, but it is very likely that the most informative methods may reside with other groups. Crowdsourcing the analysis of complex and massive data has emerged as a framework to find robust methodologies. When the crowdsourcing is done in the form of collaborative scientific competitions, known as Challenges, the validation of the methods is inherently addressed. Challenges also encourage open innovation, create collaborative communities to solve diverse and important biomedical problems, and foster the creation and dissemination of well-curated data repositories.

Journal ArticleDOI
TL;DR: This survey introduces the basic concepts of the qualities of labels and learning models, and presents openly accessible real-world data sets collected from crowdsourcing systems, along with open-source libraries and tools.
Abstract: With the rapid growth of crowdsourcing systems, quite a few applications based on a supervised learning paradigm can easily obtain massive labeled data at a relatively low cost. However, due to the variable uncertainty of crowdsourced labelers, learning procedures face great challenges. Thus, improving the qualities of labels and learning models plays a key role in learning from crowdsourced labeled data. In this survey, we first introduce the basic concepts of the qualities of labels and learning models. Then, by reviewing recently proposed models and algorithms for ground truth inference and learning models, we analyze connections and distinctions among these techniques and clarify the level of progress of related research. To facilitate studies in this field, we also introduce openly accessible real-world data sets collected from crowdsourcing systems and open-source libraries and tools. Finally, some potential issues for future studies are discussed.
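One ground-truth inference technique such surveys review is Dawid-Skene-style expectation-maximization. Below is a minimal "one-coin" variant for binary labels, jointly estimating item labels and per-worker accuracies from redundant crowd labels; the data are simulated for illustration, and real Dawid-Skene uses full confusion matrices.

```python
# Hedged sketch of one-coin Dawid-Skene-style EM for binary crowd labels:
# alternate between estimating worker accuracies (M-step) and soft item
# labels (E-step). Simulated data; illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_items, n_workers = 200, 5
truth = rng.integers(0, 2, n_items)
acc = rng.uniform(0.6, 0.9, n_workers)              # hidden worker accuracies
L = np.where(rng.random((n_items, n_workers)) < acc, truth[:, None],
             1 - truth[:, None])                    # observed noisy labels

p = L.mean(axis=1)                                  # init: soft majority vote
for _ in range(20):
    # M-step: each worker's estimated accuracy under current soft labels
    w_acc = ((L == 1) * p[:, None] + (L == 0) * (1 - p[:, None])).mean(axis=0)
    # E-step: log-odds that each item's label is 1, given worker accuracies
    log_odds = np.where(L == 1, np.log(w_acc / (1 - w_acc)),
                        np.log((1 - w_acc) / w_acc)).sum(axis=1)
    p = 1.0 / (1.0 + np.exp(-log_odds))
print("accuracy of inferred labels:", ((p > 0.5) == truth).mean())
```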

Journal ArticleDOI
TL;DR: This work uses a cross-sectional research design to analyse publicly available data from an open idea call, and reveals that an ideator's close attention to others' crowdsourced ideas, an idea's popularity, and its potential innovativeness positively influence whether an idea for NPD is implemented by the crowdsourcing company.

Journal ArticleDOI
TL;DR: A new indoor subarea localization scheme via fingerprint crowdsourcing, clustering, and matching, which first constructs subarea fingerprints from crowdsourced RSS measurements and relates them to indoor layouts; a new online localization algorithm is also proposed to deal with the device diversity issue.
Abstract: Nowadays, smartphones have become indispensable to everyone, with more and more built-in location-based applications to enrich our daily life. In the last decade, fingerprinting based on received signal strength (RSS) has become a research focus in indoor localization, due to its minimal hardware requirements and satisfactory positioning accuracy. However, its time-consuming and labor-intensive site survey is a big hurdle for practical deployments. Fingerprint crowdsourcing has recently been promoted to relieve the burden of the site survey by allowing common users to contribute to fingerprint collection in a participatory sensing manner. Despite this promise, new challenges arise in putting fingerprint crowdsourcing into practice. This article first identifies two main challenging issues, fingerprint annotation and device diversity, and then reviews the state of the art of fingerprint crowdsourcing-based indoor localization systems, comparing their approaches to cope with the two challenges. We then propose a new indoor subarea localization scheme via fingerprint crowdsourcing, clustering, and matching, which first constructs subarea fingerprints from crowdsourced RSS measurements and relates them to indoor layouts. We also propose a new online localization algorithm to deal with the device diversity issue. Our experiment results show that in a typical indoor scenario, the proposed scheme can achieve a 95 percent hit rate to correctly locate a smartphone in its subarea.

Journal ArticleDOI
18 Mar 2016
TL;DR: A hybrid crowdsourcing and real-time machine learning solution to rapidly process large volumes of aerial data for disaster response in a time-sensitive manner; it can be applied to both aerial and satellite imagery and has applications beyond disaster response.
Abstract: Aerial imagery captured via unmanned aerial vehicles (UAVs) is playing an increasingly important role in disaster response. Unlike satellite imagery, aerial imagery can be captured and processed within hours rather than days. In addition, the spatial resolution of aerial imagery is an order of magnitude higher than the imagery produced by the most sophisticated commercial satellites today. Both the United States Federal Emergency Management Agency (FEMA) and the European Commission's Joint Research Center (JRC) have noted that aerial imagery will inevitably present a big data challenge. The purpose of this article is to get ahead of this future challenge by proposing a hybrid crowdsourcing and real-time machine learning solution to rapidly process large volumes of aerial data for disaster response in a time-sensitive manner. Crowdsourcing can be used to annotate features of interest in aerial images (such as damaged shelters and roads blocked by debris). These human-annotated features can then be used to train a supervised machine learning system to learn to recognize such features in new unseen images. In this article, we describe how this hybrid solution for image analysis can be implemented as a module (i.e., Aerial Clicker) to extend an existing platform called Artificial Intelligence for Disaster Response (AIDR), which has already been deployed to classify microblog messages during disasters using its Text Clicker module, including in response to Cyclone Pam, a category 5 cyclone that devastated Vanuatu in March 2015. The hybrid solution we present can be applied to both aerial and satellite imagery and has applications beyond disaster response, such as wildlife protection, human rights, and archeological exploration. As a proof of concept, we recently piloted this solution using very high-resolution aerial photographs of a wildlife reserve in Namibia to support rangers with their wildlife conservation efforts (SAVMAP project, http://lasig.epfl.ch/savmap). The results suggest that the platform we have developed to combine crowdsourcing and machine learning to make sense of large volumes of aerial images can be used for disaster response.

Journal ArticleDOI
TL;DR: It is shown that the process of creating Big Data from local and global sources of knowledge entails the transformation of information as it moves from one distinct group of contributors to the next, and locally based, affected people and often the original ‘crowd’ are excluded from the information flow.
Abstract: The aim of this paper is to critically explore whether crowdsourced Big Data enables an inclusive humanitarian response at times of crisis. We argue that all data, including Big Data, are socially constructed artefacts that reflect the contexts and processes of their creation. To support our argument, we qualitatively analysed the process of 'Big Data making' that occurred by way of crowdsourcing through open data platforms, in the context of two specific humanitarian crises, namely the 2010 earthquake in Haiti and the 2015 earthquake in Nepal. We show that the process of creating Big Data from local and global sources of knowledge entails the transformation of information as it moves from one distinct group of contributors to the next. The implication of this transformation is that locally based, affected people, often the original 'crowd', are excluded from the information flow and from the interpretation process of crowdsourced crisis knowledge as used by formal responding organizations, and are marginalized in their ability to benefit from Big Data in support of their own means. Our paper contributes a critical perspective to the debate on participatory Big Data by explaining the process of inclusion and exclusion during data making, towards more responsive humanitarian relief.

Journal ArticleDOI
TL;DR: Almost surreptitiously, crowdsourcing has entered software engineering practice, and many development projects use crowdsourcing, for example, to squash bugs, test software, or gather alternative UI designs.
Abstract: Almost surreptitiously, crowdsourcing has entered software engineering practice. In-house development, contracting, and outsourcing still dominate, but many development projects use crowdsourcing, for example, to squash bugs, test software, or gather alternative UI designs. Although the overall impact has been mundane so far, crowdsourcing could lead to fundamental, disruptive changes in how software is developed. Various crowdsourcing models have been applied to software development. Such changes offer exciting opportunities, but several challenges must be met for crowdsourcing software development to reach its potential.

Journal ArticleDOI
TL;DR: While Mechanical Turk is currently the most popular crowdsourcing website for research, this paper presents general concepts, patterns, and suggestions that can be applied beyond Mechanical Turk to other crowdsourcing and online research.
Abstract: Amazon Mechanical Turk, an online marketplace designed for crowdsourcing tasks to other people for compensation, is growing in popularity as a platform for gathering research data within the social sciences. Sociology, compared to some other social sciences, has not been as quick to adopt this form of data collection. Therefore, in this paper I overview the basics of Mechanical Turk research and suggest its pros and cons, both in general and in relation to different sociological data-collection methods and research needs. While Mechanical Turk is currently the most popular crowdsourcing website for research, I present general concepts, patterns, and suggestions that can be applied beyond Mechanical Turk to other crowdsourcing and online research.

Journal ArticleDOI
TL;DR: A Lyapunov optimization based decision support approach, the Reputation-aware Task Sub-delegation approach with dynamic worker effort Pricing (RTS-P), with objective functions aiming to achieve superlinear time-averaged collective productivity in an HCN.
Abstract: Hierarchical crowdsourcing networks (HCNs) provide a useful mechanism for social mobilization. However, spontaneous evolution of the complex resource allocation dynamics can lead to undesirable herding behaviours in which a small group of reputable workers are overloaded while leaving other workers idle. Existing herding control mechanisms designed for typical crowdsourcing systems are not effective in HCNs. In order to bridge this gap, we investigate the herding dynamics in HCNs and propose a Lyapunov optimization based decision support approach, the Reputation-aware Task Sub-delegation approach with dynamic worker effort Pricing (RTS-P), with objective functions aiming to achieve superlinear time-averaged collective productivity in an HCN. By considering the workers' current reputation, workload, eagerness to work, and trust relationships, RTS-P provides a systematic approach to mitigate herding by helping workers make joint decisions on task sub-delegation, task acceptance, and effort pricing in a distributed manner. It is an individual-level decision support approach which results in the emergence of productive and robust collective patterns in HCNs. High-resolution simulations demonstrate that RTS-P mitigates herding more effectively than state-of-the-art approaches.