Topic

Crowdsourcing

About: Crowdsourcing is a research topic. Over the lifetime, 12,889 publications have been published within this topic, receiving 230,638 citations.


Papers
Journal ArticleDOI
TL;DR: A computational modeling approach that combined several molecular features yielded a robust breast cancer prognostic model, which was independently validated in a new patient data set and is described in the companion Research Article on the winning model.
Abstract: Although they no longer live in the lab, scientific editors still enjoy doing experiments. The simultaneous publication of two unusual papers offered Science Translational Medicine’s editors the chance to conduct an investigation into peer-review processes for competition-based crowdsourcing studies designed to address problems in biomedicine. In a Report by Margolin et al. (which was peer-reviewed in the traditional way), organizers of the Sage Bionetworks/DREAM Breast Cancer Prognosis Challenge (BCC) describe the contest’s conception, execution, and insights derived from its outcome. In the companion Research Article, Cheng et al. outline the development of the prognostic computational model that won the Challenge. In this experiment in scientific publishing, the rigor of the Challenge design and scoring process formed the basis for a new style of publication peer review. DREAM—Dialogue on Reverse Engineering Assessment and Methods—conducts a variety of computational Challenges with the goal of catalyzing the “interaction between theory and experiment, specifically in the area of cellular network inference and quantitative model building in systems biology.” Previous Challenges involved, for example, modeling of protein-protein interactions for binding domains and peptides and the specificity of transcription factor binding. In the BCC—which was a step in the translational direction—participants competed to create an algorithm that could predict, more accurately than current benchmarks, the prognosis of breast cancer patients from clinical information (age, tumor size, histological grade), genome-scale tumor mRNA expression data, and DNA copy number data. Participants were given Web access to such data for 1,981 women diagnosed with breast cancer and used it to train computational models that were then submitted to a common, open-access computational platform as re-runnable source code. The predictive value of each model was assessed in real time by calculating a concordance index (CI) of predicted death risks compared to overall survival in a held-out data set, and CIs were posted on a public leaderboard. The winner of the Challenge was ultimately determined when a select group of top models was validated in a new breast cancer data set. The winning model, described by Cheng et al., was based on sets of genes (signatures)—called attractor metagenes—that the same research group had previously shown to be associated, in various ways, with multiple cancer types. Starting with these gene sets and some other clinical and molecular features, the team modeled various feature combinations, selecting ones that improved performance of their prognostic model until they ultimately fashioned the winning algorithm. Before the BCC was initiated, Challenge organizers approached Science Translational Medicine about the possibility of publishing a Research Article that described the winning model. The Challenge prize would be a scholarly publication—a form of “academic currency.” The editors pondered whether winning the Challenge, with its built-in transparency and check on model reproducibility, would be sufficient evidence in support of the model’s validity to substitute for traditional peer review. Because the specific conditions of a Challenge are critical in determining the meaningfulness of the outcome, the editors felt it was not.
Thus, they arranged for peer reviewers, chosen by the editors, to be embedded within the Challenge process as members of the organizing team—a so-called Challenge-assisted review. The editors also helped to develop criteria for determining the winning model, and if the criteria were not met, there would have been no winner—and no publication. Last, the manuscript was subjected to advisory peer review after it was submitted to the journal. So what new knowledge was gained about reviewing an article in which the result is an active piece of software? Reviewing such a model required that referees have access to the data and platform used for the Challenge and have the ability to re-run each participant’s code; in the context of the BCC, this requirement was easily achievable, because Challenge partner Sage Bionetworks had created a platform (Synapse) with this goal in mind. In fact, both the training and validation data sets for the BCC are available to readers via links into Synapse (for a six-month period). In general, this requirement should not be an obstacle, as there are code-hosting sites such as GitHub and TopCoder.com that can accommodate data sharing. Mechanisms for confidentiality would need to be built into any computational platform to be used for peer review. Finally, because different conventions are used in divergent scientific fields, communicating the science to an interdisciplinary audience is not a trivial endeavor. The architecture of the Challenge itself is critical in determining the real-world importance of the result. The question to be investigated must be framed so as to capture a significant outcome. In the BCC, participants’ models had to score better than a set of 60 different prognostic models developed by a team of expert programmers during a Challenge precompetition, as well as a previously described first-generation 70-gene risk predictor. Thus, the result may or may not be superior to existing gene expression profiling tests used in clinical practice. This remains to be tested. It also remains to be seen whether prize-based crowdsourcing contests can make varied and practical contributions in the clinic. Indeed, DREAM and Sage Bionetworks have immediate plans to collaborate on new clinically relevant Challenges. But there is no doubt that the approach has value in solving big-data problems. For example, in a recent contest, non-immunologists generated a method for annotating the complex genome sequence of the antibody repertoire when the contest organizers translated the problem into generic language. In the BCC, the Challenge winners used a mathematical approach to identify biological modules that might, with continued investigation, teach us something about cancer biology. These examples support the notion that harnessing the expertise of contestants outside of traditional biological disciplines may be a powerful way to accelerate the translation of biomedical science to the clinic.
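The concordance index used to score BCC submissions can be illustrated with a short sketch. The Python below is not the Challenge's actual scoring code, only a minimal illustration of the metric the abstract describes, under the assumption of no tied follow-up times: for every comparable pair of patients, a model is rewarded when it assigns the higher predicted risk to the patient who died earlier. Function and variable names are illustrative.

```python
from itertools import combinations

def concordance_index(times, events, risks):
    """Concordance index (CI) for survival risk predictions.

    times  -- observed follow-up times
    events -- 1 if death was observed, 0 if the patient was censored
    risks  -- model-predicted risk scores (higher means worse prognosis)
    """
    concordant, tied, comparable = 0, 0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # skip tied follow-up times in this simplified sketch
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1  # higher predicted risk for the earlier death: correct ordering
        elif risks[first] == risks[second]:
            tied += 1
    return (concordant + 0.5 * tied) / comparable

# Toy example: the model correctly ranks the patient who died earliest as highest risk.
print(concordance_index(times=[5, 10, 12], events=[1, 0, 1], risks=[0.9, 0.2, 0.4]))
```

A CI of 1.0 means every comparable pair is ordered correctly; 0.5 is no better than chance, which is why leaderboard CIs above strong baselines were required to win.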

129 citations

Proceedings Article
21 May 2012
TL;DR: It is argued that a bootstrapping approach, comprising state-of-the-art NLP tools for parsing and semantic interpretation in combination with a wiki-like interface for collaborative annotation by experts and a game with a purpose for crowdsourcing, provides the starting ingredients for fulfilling this enterprise.
Abstract: What would be a good method to provide a large collection of semantically annotated texts with formal, deep semantics rather than shallow? We argue that a bootstrapping approach comprising state-of-the-art NLP tools for parsing and semantic interpretation, in combination with a wiki-like interface for collaborative annotation by experts, and a game with a purpose for crowdsourcing, provides the starting ingredients for fulfilling this enterprise. The result is a semantic resource that anyone can edit and that integrates various phenomena, including predicate-argument structure, scope, tense, thematic roles, rhetorical relations, and presuppositions, into a single semantic formalism: Discourse Representation Theory. Taking texts rather than sentences as the units of annotation results in deep semantic representations that incorporate discourse structure and dependencies. To manage the various (possibly conflicting) annotations provided by experts and non-experts, we introduce a method that stores "Bits of Wisdom" in a database as stand-off annotations.
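The "stand-off" storage described above keeps each judgement separate from the text it annotates, so conflicting contributions from experts, wiki editors, and game players can coexist. The sketch below shows what one such "Bit of Wisdom" record might look like; the field names and schema are assumptions for illustration, not the project's actual database layout.

```python
from dataclasses import dataclass, astuple
import sqlite3

@dataclass
class BitOfWisdom:
    """One stand-off annotation: it points at a span of the source text
    instead of modifying the text, so conflicting judgements from experts,
    wiki editors, and game players can be stored side by side."""
    doc_id: str
    span_start: int   # character offset where the annotated span begins
    span_end: int     # character offset where it ends (exclusive)
    layer: str        # e.g. "thematic-role", "scope", "rhetorical-relation"
    value: str        # the annotator's judgement for that layer
    source: str       # "expert", "wiki", or "game"

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE bits_of_wisdom "
    "(doc_id TEXT, span_start INTEGER, span_end INTEGER, layer TEXT, value TEXT, source TEXT)"
)

def store(bow: BitOfWisdom) -> None:
    # Stand-off storage: the raw text lives elsewhere; only offsets are kept here.
    db.execute("INSERT INTO bits_of_wisdom VALUES (?, ?, ?, ?, ?, ?)", astuple(bow))

store(BitOfWisdom("doc-001", 10, 17, "thematic-role", "Agent", "game"))
print(db.execute("SELECT * FROM bits_of_wisdom").fetchall())
```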

128 citations

Journal ArticleDOI
TL;DR: A task-based taxonomy is developed whose specific intent is to classify crowdsourcing approaches by the types of tasks for which they are best suited, so that one can determine which approach is most suitable for a particular task situation.
Abstract: Although a great many different crowdsourcing approaches are available to those seeking to accomplish individual or organizational tasks, little research attention has yet been given to characterizing how those approaches might be based on task characteristics. To that end, we conducted an extensive review of the crowdsourcing landscape, including a look at what types of taxonomies are currently available. Our review found that no taxonomy explored the multidimensional nature of task complexity. This paper develops a taxonomy whose specific intent is the classification of approaches in terms of the types of tasks for which they are best suited. To develop this task-based taxonomy, we followed an iterative approach that considered over 100 well-known examples of crowdsourcing. The taxonomy considers three dimensions of task complexity: (1) task structure (is the task well-defined, or does it require a more open-ended solution?); (2) task interdependence (can the task be solved by an individual, or does it require a community of problem solvers?); and (3) task commitment (what level of commitment is expected from crowd members?). Based on this taxonomy, we identify seven categories of crowdsourcing and discuss prototypical examples of each approach. Furnished with such an understanding, one should be able to determine which crowdsourcing approach is most suitable for a particular task situation.
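To make the three dimensions concrete, the sketch below encodes a task profile and a hypothetical lookup from profile to approach. The paper's seven categories are not reproduced in this listing, so the category names in the code are placeholders rather than the taxonomy's actual labels.

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    """The three task-complexity dimensions from the taxonomy."""
    well_defined: bool      # task structure: well-defined vs. open-ended
    needs_community: bool   # task interdependence: individual vs. community of solvers
    high_commitment: bool   # task commitment: level expected from crowd members

def suggest_approach(task: TaskProfile) -> str:
    # Hypothetical mapping for illustration only; the paper derives seven
    # categories from these dimensions, which are not named here.
    if task.well_defined and not task.needs_community:
        return "microtask-style marketplace"
    if not task.well_defined and task.needs_community:
        return "open collaboration / community ideation"
    if task.high_commitment:
        return "prize-based contest"
    return "general open call"

print(suggest_approach(TaskProfile(well_defined=True, needs_community=False, high_commitment=False)))
```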

128 citations

Journal ArticleDOI
TL;DR: In this paper, decision-theoretic techniques for optimizing workflows used in crowdsourcing are presented: AI agents use Bayesian network learning and inference in combination with Partially Observable Markov Decision Processes (POMDPs) to obtain excellent cost-quality tradeoffs.
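As a rough illustration of the cost-quality tradeoff such a controller manages, the sketch below combines a simple Bayesian update over noisy worker votes with a myopic stop-or-ask rule. This is a deliberately simplified stand-in for the paper's POMDP formulation, not its actual algorithm; the worker accuracy, label cost, and error penalty are assumed values.

```python
def posterior_true(prior, votes, accuracy=0.7):
    """Posterior probability that the answer is 'yes' after independent worker
    votes, assuming each worker is correct with the given accuracy."""
    p = prior
    for v in votes:
        like_yes = accuracy if v == "yes" else 1 - accuracy
        like_no = 1 - accuracy if v == "yes" else accuracy
        p = (like_yes * p) / (like_yes * p + like_no * (1 - p))
    return p

def should_ask_another(p_yes, cost=0.1, wrong_penalty=1.0):
    """Myopic cost-quality tradeoff: request one more label only if the
    expected penalty from answering now exceeds the cost of another worker."""
    expected_error = min(p_yes, 1 - p_yes) * wrong_penalty
    return expected_error > cost

p = posterior_true(prior=0.5, votes=["yes", "yes", "no"])
print(p, should_ask_another(p))
```

A full POMDP controller would additionally reason about how much each future label is expected to reduce uncertainty; the myopic rule above is only the simplest version of that tradeoff.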

128 citations

Journal ArticleDOI
01 Aug 2014
TL;DR: gMission is introduced, a general spatial crowdsourcing platform that features a collection of novel techniques, including geographic sensing, worker detection, and task recommendation; the system architecture is also sketched.
Abstract: As one of the successful forms of using the wisdom of the crowd, crowdsourcing has been widely used for many human-intrinsic tasks, such as image labeling, natural language understanding, market prediction, and opinion mining. Meanwhile, with advances in pervasive technology, mobile devices, such as mobile phones and tablets, have become extremely popular. These mobile devices can work as sensors to collect multimedia data (audio, images, and videos) and location information. This power makes it possible to implement a new crowdsourcing mode: spatial crowdsourcing. In spatial crowdsourcing, a requester can ask for resources related to a specific location, and the mobile users who take the task will travel to that place to collect the data. Due to the rapid growth of mobile device use, spatial crowdsourcing is likely to become more popular than general crowdsourcing platforms such as Amazon Mechanical Turk and CrowdFlower. However, to implement such a platform, effective and efficient solutions for worker incentives, task assignment, result aggregation, and data quality control must be developed. In this demo, we will introduce gMission, a general spatial crowdsourcing platform, which features a collection of novel techniques, including geographic sensing, worker detection, and task recommendation. We sketch the system architecture and illustrate usage scenarios via several case analyses.
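Task assignment is one of the spatial-crowdsourcing problems the abstract lists. The sketch below shows a greedy nearest-worker assignment under a travel-distance limit; it is an illustrative baseline under assumed data structures, not gMission's actual assignment algorithm.

```python
from math import hypot

def assign_tasks(tasks, workers, max_dist=5.0):
    """Greedy nearest-worker assignment for spatial crowdsourcing:
    each task goes to the closest still-unassigned worker within range.

    tasks, workers -- dicts mapping id -> (x, y) location
    """
    assignment, free = {}, dict(workers)
    for tid, tloc in tasks.items():
        best, best_d = None, max_dist
        for wid, wloc in free.items():
            d = hypot(tloc[0] - wloc[0], tloc[1] - wloc[1])
            if d <= best_d:
                best, best_d = wid, d
        if best is not None:
            assignment[tid] = best  # worker must travel to the task location
            del free[best]
    return assignment

print(assign_tasks(
    tasks={"photo-cafe": (1, 1), "noise-report": (4, 5)},
    workers={"w1": (0, 0), "w2": (5, 5)},
))
```

Real platforms also weigh incentives, worker reliability, and result aggregation, which this greedy baseline ignores.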

127 citations


Network Information
Related Topics (5)
Social network: 42.9K papers, 1.5M citations (87% related)
User interface: 85.4K papers, 1.7M citations (86% related)
Deep learning: 79.8K papers, 2.1M citations (85% related)
Cluster analysis: 146.5K papers, 2.9M citations (85% related)
The Internet: 213.2K papers, 3.8M citations (85% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    637
2022    1,420
2021    996
2020    1,250
2019    1,341
2018    1,396