Topic: Crowdsourcing

About: Crowdsourcing is a research topic. Over its lifetime, 12,889 publications have been published within this topic, receiving 230,638 citations.


Papers
Proceedings ArticleDOI
07 May 2016
TL;DR: An unsupervised system that captures dominating user behaviors from clickstream data and visualizes the detected behaviors in an intuitive manner, effectively identifying previously unknown behaviors, e.g., dormant users and hostile chatters.
Abstract: Online services are increasingly dependent on user participation. Whether it's online social networks or crowdsourcing services, understanding user behavior is important yet challenging. In this paper, we build an unsupervised system to capture dominating user behaviors from clickstream data (traces of users' click events), and visualize the detected behaviors in an intuitive manner. Our system identifies "clusters" of similar users by partitioning a similarity graph (nodes are users; edges are weighted by clickstream similarity). The partitioning process leverages iterative feature pruning to capture the natural hierarchy within user clusters and produce intuitive features for visualizing and understanding captured user behaviors. For evaluation, we present case studies on two large-scale clickstream traces (142 million events) from real social networks. Our system effectively identifies previously unknown behaviors, e.g., dormant users, hostile chatters. Also, our user study shows people can easily interpret identified behaviors using our visualization tool.
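As an illustration of the similarity-graph clustering described above, the sketch below clusters a few hypothetical clickstream traces using Jaccard similarity over click-event bigrams and off-the-shelf average-linkage hierarchical clustering. It is a minimal sketch of the idea, not the authors' system (the paper's iterative feature pruning is omitted).

```python
# Minimal sketch, not the paper's system: cluster users by clickstream similarity.
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def bigrams(clicks):
    """Represent a user's clickstream as the set of consecutive event pairs."""
    return {(a, b) for a, b in zip(clicks, clicks[1:])}

def jaccard(a, b):
    """Jaccard similarity between two bigram sets."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical clickstream traces: one click-event sequence per user.
users = {
    "u1": ["login", "browse", "post", "logout"],
    "u2": ["login", "browse", "browse", "logout"],
    "u3": ["login", "chat", "chat", "chat"],
    "u4": ["login", "chat", "chat", "logout"],
}

names = list(users)
feats = [bigrams(users[u]) for u in names]
n = len(names)

# Pairwise distances (1 - similarity) play the role of edge weights in the similarity graph.
dist = np.zeros((n, n))
for i, j in combinations(range(n), 2):
    dist[i, j] = dist[j, i] = 1.0 - jaccard(feats[i], feats[j])

# Average-linkage hierarchical clustering loosely mirrors the natural hierarchy of user clusters.
labels = fcluster(linkage(squareform(dist), method="average"), t=2, criterion="maxclust")
print(dict(zip(names, labels)))  # e.g. {'u1': 1, 'u2': 1, 'u3': 2, 'u4': 2}
```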

211 citations

Proceedings ArticleDOI
31 May 2014
TL;DR: This case study highlights a number of challenges that arise when crowdsourcing software development at a multinational corporation, and finds that crowdsourcing works better for specific software development tasks that are less complex and stand-alone, without interdependencies.
Abstract: Crowdsourcing is an emerging and promising approach which involves delegating a variety of tasks to an unknown workforce - the crowd. Crowdsourcing has been applied quite successfully in various contexts from basic tasks on Amazon Mechanical Turk to solving complex industry problems, e.g. InnoCentive. Companies are increasingly using crowdsourcing to accomplish specific software development tasks. However, very little research exists on this specific topic. This paper presents an in-depth industry case study of crowdsourcing software development at a multinational corporation. Our case study highlights a number of challenges that arise when crowdsourcing software development. For example, the crowdsourcing development process is essentially a waterfall model and this must eventually be integrated with the agile approach used by the company. Crowdsourcing works better for specific software development tasks that are less complex and stand-alone without interdependencies. The development cost was much greater than originally expected, overhead in terms of company effort to prepare specifications and answer crowdsourcing community queries was much greater, and the time-scale to complete contests, review submissions and resolve quality issues was significant. Finally, quality issues were pushed later in the lifecycle given the lengthy process necessary to identify and resolve quality issues. Given the emphasis in software engineering on identifying bugs as early as possible, this is quite problematic.

207 citations

Journal ArticleDOI
TL;DR: The LIVE In the Wild Image Quality Challenge Database, as discussed by the authors, contains widely diverse authentic image distortions on a large number of images captured using a representative variety of modern mobile devices, and has been used to conduct a very large-scale, multi-month subjective image quality assessment study.
Abstract: Most publicly available image quality databases have been created under highly controlled conditions by introducing graded simulated distortions onto high-quality photographs. However, images captured using typical real-world mobile camera devices are usually afflicted by complex mixtures of multiple distortions, which are not necessarily well-modeled by the synthetic distortions found in existing databases. The originators of existing legacy databases usually conducted human psychometric studies to obtain statistically meaningful sets of human opinion scores on images in a stringently controlled visual environment, resulting in small data collections relative to other kinds of image analysis databases. Towards overcoming these limitations, we designed and created a new database that we call the LIVE In the Wild Image Quality Challenge Database, which contains widely diverse authentic image distortions on a large number of images captured using a representative variety of modern mobile devices. We also designed and implemented a new online crowdsourcing system, which we have used to conduct a very large-scale, multi-month image quality assessment subjective study. Our database consists of over 350,000 opinion scores on 1,162 images evaluated by over 7,000 unique human observers. Despite the lack of control over the experimental environments of the numerous study participants, we demonstrate excellent internal consistency of the subjective dataset. We also evaluate several top-performing blind Image Quality Assessment algorithms on it and present insights on how mixtures of distortions challenge both end users as well as automatic perceptual quality prediction models.
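As a simple illustration of the two computations at the heart of such a study, the sketch below aggregates hypothetical crowdsourced ratings into mean opinion scores (MOS) and then checks how well a hypothetical blind IQA model's predictions rank-correlate with them. The data, rating scale, and model scores are assumptions, not the paper's.

```python
# Sketch with assumed data: MOS aggregation plus a rank-correlation check for a blind IQA model.
from collections import defaultdict
from statistics import mean

from scipy.stats import spearmanr

# Hypothetical crowdsourced ratings: (image_id, opinion score on a 1-100 scale).
ratings = [
    ("img_001", 72), ("img_001", 65), ("img_001", 70),
    ("img_002", 34), ("img_002", 41), ("img_002", 38),
    ("img_003", 88), ("img_003", 90), ("img_003", 85),
]

by_image = defaultdict(list)
for image_id, score in ratings:
    by_image[image_id].append(score)

# Mean opinion score per image.
mos = {image_id: mean(scores) for image_id, scores in by_image.items()}

# Hypothetical quality predictions from a blind IQA algorithm for the same images.
predicted = {"img_001": 0.68, "img_002": 0.35, "img_003": 0.91}

images = sorted(mos)
srocc, _ = spearmanr([mos[i] for i in images], [predicted[i] for i in images])
print("MOS:", mos)
print(f"SROCC between MOS and model predictions: {srocc:.3f}")
```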

207 citations

Journal ArticleDOI
TL;DR: In this paper, a 5W (What, Where, When, Who, and Why) model is proposed to detect and describe real-time urban emergency events, and the results show the accuracy and efficiency of the proposed method.
Abstract: Crowdsourcing is a process of acquisition, integration, and analysis of big and heterogeneous data generated by a diversity of sources in urban spaces, such as sensors, devices, vehicles, buildings, and humans. Nowadays, no country, community, or person is immune to urban emergency events. Detection of urban emergency events, e.g., fires, storms, and traffic jams, is of great importance to protecting the safety of humans. Recently, social media feeds have been rapidly emerging as a novel platform for providing and disseminating information that is often geographic. The content from social media usually includes references to urban emergency events occurring at, or affecting, specific locations. In this paper, in order to detect and describe real-time urban emergency events, the 5W (What, Where, When, Who, and Why) model is proposed. Firstly, users of social media are set as the target of crowdsourcing. Secondly, spatial and temporal information from social media is extracted to detect real-time events. Thirdly, a GIS-based annotation of the detected urban emergency events is shown. The proposed method is evaluated with extensive case studies based on real urban emergency events. The results show the accuracy and efficiency of the proposed method.
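As a toy illustration of the kind of extraction the 5W framing implies (not the paper's implementation), the sketch below pulls What, Where, and When signals out of a social-media post using an assumed event-keyword lexicon and a naive location pattern, treating the posting user as Who and leaving Why for downstream analysis.

```python
# Toy 5W extraction sketch with an assumed keyword lexicon; not the paper's method.
import re

EVENT_KEYWORDS = ("fire", "storm", "flood", "traffic jam", "accident")  # assumed lexicon

def extract_5w(post_text, user, timestamp, geotag=None):
    """Return a 5W record for one social-media post."""
    text = post_text.lower()
    what = next((kw for kw in EVENT_KEYWORDS if kw in text), None)
    # Prefer the platform's geotag; otherwise fall back to a naive "at/in/near <Place>" pattern.
    where = geotag
    if where is None:
        match = re.search(r"\b(?:at|in|near)\s+([A-Z]\w+(?:\s+[A-Z]\w+)*)", post_text)
        where = match.group(1) if match else None
    return {
        "what": what,          # detected event type
        "where": where,        # spatial information
        "when": timestamp,     # temporal information from post metadata
        "who": user,           # the reporting social-media user
        "why": None,           # typically inferred later from aggregated context
    }

print(extract_5w("Huge fire near Central Station, avoid the area!",
                 user="@citizen42", timestamp="2016-03-01T14:05:00Z"))
```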

206 citations

Proceedings ArticleDOI
02 May 2017
TL;DR: Revolt eliminates the burden of creating detailed label guidelines by harnessing crowd disagreements to identify ambiguous concepts and create rich structures (groups of semantically related items) for post-hoc label decisions.
Abstract: Crowdsourcing provides a scalable and efficient way to construct labeled datasets for training machine learning systems. However, creating comprehensive label guidelines for crowdworkers is often prohibitive even for seemingly simple concepts. Incomplete or ambiguous label guidelines can then result in differing interpretations of concepts and inconsistent labels. Existing approaches for improving label quality, such as worker screening or detection of poor work, are ineffective for this problem and can lead to rejection of honest work and a missed opportunity to capture rich interpretations about data. We introduce Revolt, a collaborative approach that brings ideas from expert annotation workflows to crowd-based labeling. Revolt eliminates the burden of creating detailed label guidelines by harnessing crowd disagreements to identify ambiguous concepts and create rich structures (groups of semantically related items) for post-hoc label decisions. Experiments comparing Revolt to traditional crowdsourced labeling show that Revolt produces high-quality labels without requiring label guidelines, in exchange for an increase in monetary cost. This up-front cost, however, is mitigated by Revolt's ability to produce reusable structures that can accommodate a variety of label boundaries without requiring new data to be collected. Further comparisons of Revolt's collaborative and non-collaborative variants show that collaboration reaches higher label accuracy with lower monetary cost.
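A minimal sketch of the disagreement signal such a workflow builds on, assuming hypothetical crowd labels and an arbitrary entropy cutoff: items with high label entropy are flagged as ambiguous and set aside for post-hoc label decisions. This is an illustration, not the Revolt system itself.

```python
# Sketch with assumed labels and threshold: flag ambiguous items via label entropy.
from collections import Counter
from math import log2

def label_entropy(labels):
    """Shannon entropy of a list of crowd labels; 0 means full agreement."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Hypothetical crowd labels for a binary labeling task.
crowd_labels = {
    "item_01": ["yes", "yes", "yes", "yes", "yes"],  # clear-cut
    "item_02": ["yes", "no", "yes", "no", "no"],     # workers disagree
    "item_03": ["no", "no", "no", "yes", "no"],
}

AMBIGUITY_THRESHOLD = 0.8  # assumed cutoff, in bits

for item, labels in crowd_labels.items():
    h = label_entropy(labels)
    status = "ambiguous -> set aside for post-hoc decision" if h >= AMBIGUITY_THRESHOLD else "confident"
    print(f"{item}: entropy={h:.2f} ({status})")
```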

205 citations


Network Information
Related Topics (5)
Social network: 42.9K papers, 1.5M citations (87% related)
User interface: 85.4K papers, 1.7M citations (86% related)
Deep learning: 79.8K papers, 2.1M citations (85% related)
Cluster analysis: 146.5K papers, 2.9M citations (85% related)
The Internet: 213.2K papers, 3.8M citations (85% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    637
2022    1,420
2021    996
2020    1,250
2019    1,341
2018    1,396