
Showing papers on "Crowdsourcing" published in 2023


Journal ArticleDOI
TL;DR: In this paper, a many-objective worker selection method is proposed to achieve the desired tradeoff, and an optimization mechanism based on an enhanced differential evolution algorithm is designed to ensure data integrity and search-solution optimality.
Abstract: With the development of mobile networks and intelligent equipment, mobile crowd sensing (MCS), a new intelligent data sensing paradigm for large-scale sensor applications such as the industrial Internet of Things, assigns industrial sensing tasks to workers for data collection and sharing, which has created a bright future for building a strong industrial system and improving industrial services. How to design an effective worker selection mechanism that maximizes the utility of crowdsourcing is a research hotspot in mobile sensing technologies. This article studies the problem of selecting the fewest workers needed for a large MCS system to perform sensing tasks effectively and achieve a given coverage while meeting certain constraints. A many-objective worker selection method is proposed to achieve the desired tradeoff, and an optimization mechanism based on an enhanced differential evolution algorithm is designed to ensure data integrity and search-solution optimality. The effectiveness of the proposed method is verified through large-scale experimental evaluation on datasets collected from the real world.
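
As a rough illustration of the kind of search the abstract describes, the sketch below evolves binary worker-selection vectors that trade off task coverage against the number of selected workers. It is a simplified evolutionary loop, not the paper's enhanced differential evolution algorithm or its many-objective formulation; the coverage data and parameters are invented.

    # A simplified evolutionary loop (not the paper's enhanced differential
    # evolution): candidates are binary worker-selection vectors, and fitness
    # trades off task coverage against the number of selected workers.
    import random

    NUM_WORKERS, NUM_TASKS = 30, 20
    random.seed(0)
    # Hypothetical coverage data: covers[w] = set of task ids worker w can sense.
    covers = [set(random.sample(range(NUM_TASKS), k=random.randint(1, 5)))
              for _ in range(NUM_WORKERS)]

    def fitness(selection):
        """Return (coverage, -workers): maximize coverage, then minimize workers."""
        covered = set()
        for w, chosen in enumerate(selection):
            if chosen:
                covered |= covers[w]
        return (len(covered), -sum(selection))

    def mutate(selection, rate=0.1):
        """Flip each inclusion bit with a small probability."""
        return [1 - s if random.random() < rate else s for s in selection]

    population = [[random.randint(0, 1) for _ in range(NUM_WORKERS)] for _ in range(40)]
    for _ in range(200):  # simple (mu + lambda) selection loop
        offspring = [mutate(random.choice(population)) for _ in range(40)]
        population = sorted(population + offspring, key=fitness, reverse=True)[:40]

    best = population[0]
    print("tasks covered:", fitness(best)[0], "workers used:", sum(best))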

18 citations


Proceedings ArticleDOI
19 Apr 2023
TL;DR: The authors used OpenAI's GPT-3 model to generate open-ended questionnaire responses about experiencing video games as art, a topic not tractable with traditional computational user models, tested whether synthetic responses can be distinguished from real responses, and investigated content similarities between synthetic and real data.
Abstract: Collecting data is one of the bottlenecks of Human-Computer Interaction (HCI) research. Motivated by this, we explore the potential of large language models (LLMs) in generating synthetic user research data. We use OpenAI’s GPT-3 model to generate open-ended questionnaire responses about experiencing video games as art, a topic not tractable with traditional computational user models. We test whether synthetic responses can be distinguished from real responses, analyze errors of synthetic data, and investigate content similarities between synthetic and real data. We conclude that GPT-3 can, in this context, yield believable accounts of HCI experiences. Given the low cost and high speed of LLM data generation, synthetic data should be useful in ideating and piloting new experiments, although any findings must obviously always be validated with real data. The results also raise concerns: if employed by malicious users of crowdsourcing services, LLMs may make crowdsourcing of self-report data fundamentally unreliable.
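
A minimal sketch of how such synthetic questionnaire responses might be generated, assuming the legacy (pre-1.0) openai Python client; the study's actual prompts, model settings, and persona handling are not reproduced here, and the question text and personas below are placeholders.

    # Minimal sketch, assuming the legacy (pre-1.0) `openai` client; the model
    # name, question text, and personas are placeholders, not the paper's setup.
    import openai

    openai.api_key = "YOUR_KEY"  # placeholder

    QUESTION = ("Describe a moment when a video game felt like art to you. "
                "Answer in 3-5 sentences, in the first person.")

    def synthetic_response(persona):
        prompt = (f"You are {persona} answering an open-ended questionnaire.\n\n"
                  f"Q: {QUESTION}\nA:")
        result = openai.Completion.create(
            model="text-davinci-003",  # a GPT-3-family model (assumption)
            prompt=prompt,
            max_tokens=200,
            temperature=0.9,           # higher temperature for varied answers
        )
        return result["choices"][0]["text"].strip()

    for persona in ["a 24-year-old casual player", "a 41-year-old game developer"]:
        print(persona, "->", synthetic_response(persona))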

12 citations


Posted ContentDOI
24 Jan 2023
TL;DR: In this article, the authors assess the feasibility of using ChatGPT or a similar AI-based chatbot for patient-provider communication, finding that chatbot responses to patient questions were only weakly distinguishable from provider responses.
Abstract: Importance: Chatbots could play a role in answering patient questions, but patients' ability to distinguish between provider and chatbot responses, and patients' trust in chatbots' functions, are not well established. Objective: To assess the feasibility of using ChatGPT or a similar AI-based chatbot for patient-provider communication. Design: Survey conducted in January 2023. Setting: Survey. Participants: A US representative sample of 430 study participants aged 18 and above was recruited on Prolific, a crowdsourcing platform for academic studies. 426 participants filled out the full survey; after removing participants who spent less than 3 minutes on the survey, 392 respondents remained. 53.2% of respondents analyzed were women, and their average age was 47.1. Exposure(s): Ten representative non-administrative patient-provider interactions were extracted from the EHR. Patients' questions were placed in ChatGPT with a request for the chatbot to respond using approximately the same word count as the human provider's response. In the survey, each patient's question was followed by a provider- or ChatGPT-generated response. Participants were informed that five responses were provider-generated and five were chatbot-generated. Participants were asked, and incentivized financially, to correctly identify the response source. Participants were also asked about their trust in chatbots' functions in patient-provider communication, using a Likert scale of 1-5. Main Outcome(s) and Measure(s): The main outcome was the proportion of responses correctly classified as provider- vs. chatbot-generated; secondary outcomes were the average and standard deviation of responses to the trust questions. Results: The correct classification of responses ranged from 49.0% to 85.7% across questions. On average, chatbot responses were correctly identified 65.5% of the time, and provider responses were correctly identified 65.1% of the time. Responses about patients' trust in chatbots' functions were weakly positive on average (mean Likert score: 3.4), with lower trust as the health-related complexity of the task in question increased. Conclusions and Relevance: ChatGPT responses to patient questions were weakly distinguishable from provider responses. Laypeople appear to trust the use of chatbots to answer lower-risk health questions. It is important to continue studying patient-chatbot interaction as chatbots move from administrative to more clinical roles in healthcare. Keywords: AI in Medicine; ChatGPT; Generative AI; Healthcare AI; Turing Test.

7 citations


Journal ArticleDOI
TL;DR: In this article, the authors provide a comprehensive study to assess the quality of attribute (descriptive) data of the building stock mapped globally, e.g., building function, which are key ingredients in many analyses and simulations in the built environment.

5 citations


Journal ArticleDOI
TL;DR: Li et al. as discussed by the authors developed a collusion-resistant scheme that ensures no coalition of weighted cardinality $t$ can improve its group utility by coordinating its bids, at a probability of $p$.
Abstract: In the wake of Web 2.0, crowdsourcing has emerged as a promising approach to maintaining a flexible workforce for human intelligence tasks. To stimulate worker participation, many reverse auction-based incentive mechanisms have been proposed. Designing auctions that discourage workers from cheating and instead encourage them to reveal their true cost information has drawn significant attention. However, existing efforts have focused on tackling individual cheating misbehaviors, while scenarios in which workers strategically form collusion coalitions and rig their bids together to manipulate auction outcomes have received little attention. To fill this gap, in this work we develop a $(t,p)$-collusion-resistant scheme that ensures no coalition of weighted cardinality $t$ can improve its group utility by coordinating its bids, at a probability of $p$. The design takes into account the unique features of crowdsourcing, such as diverse worker types and reputations. The proposed scheme can suppress a broad spectrum of collusion strategies. Besides, desirable properties, including $p$-truthfulness and $p$-individual rationality, are also achieved. To provide a comprehensive evaluation, we first analytically prove our scheme's collusion resistance and then experimentally verify our analytical conclusion using a real-world dataset. Our experimental results show that a baseline scheme, where none of the critical properties is guaranteed, costs up to 20.1 times the optimal payment in an ideal case where no collusion exists, while our final scheme costs merely 4.9 times the optimal payment.
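
For orientation only, the sketch below shows a baseline truthful reverse auction with critical-value payments, the setting that collusion-resistant designs like the one above build on. It is not the paper's $(t,p)$-collusion-resistant scheme and does nothing against coordinated bidding; the bids and k are invented.

    # Baseline truthful reverse auction (not the paper's scheme): select the k
    # cheapest workers and pay each winner the (k+1)-th lowest bid. This
    # critical-value payment removes the incentive to overbid individually, but
    # does nothing against coordinated (colluding) bids.
    def reverse_auction(bids, k):
        ranked = sorted(bids.items(), key=lambda item: item[1])
        winners = [worker for worker, _ in ranked[:k]]
        threshold = ranked[k][1] if len(ranked) > k else float("inf")
        return {worker: threshold for worker in winners}

    payments = reverse_auction({"w1": 3.0, "w2": 5.0, "w3": 4.0, "w4": 9.0}, k=2)
    print(payments)  # {'w1': 5.0, 'w3': 5.0} -- both winners paid the 3rd-lowest bid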

4 citations


Journal ArticleDOI
TL;DR: Lauraset et al. as mentioned in this paper proposed a full-level fused cross-task transfer learning method for building change detection using only crowdsourced building labels and high-resolution satellite imagery.

4 citations


Journal ArticleDOI
TL;DR: In this article, a hybrid framework that crowdsources vehicles' drivers and UAVs for delivery tasks is proposed; it aims to improve delivery time and allocation percentage, mainly at peak hours, through the proposed task filtering, worker quality estimation, and task allocation and scheduling.

3 citations


Journal ArticleDOI
TL;DR: Li et al. as discussed by the authors proposed a label confidence-based noise correction (LCNC) method, which calculates the label confidence of each instance from its multiple noisy labels, filters all instances accordingly, and obtains an original clean set and a noise set.
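
A toy sketch of the label-confidence idea mentioned above: score each instance by how strongly its multiple noisy labels agree, then split instances into a clean set and a noise set by a threshold. The labels and the 0.8 threshold are invented, and the paper's subsequent noise-correction step is not shown.

    # Toy sketch of label-confidence filtering (invented labels and threshold):
    # confidence = share of an instance's noisy labels agreeing with the majority
    # label; low-confidence instances go to the noise set.
    from collections import Counter

    noisy_labels = {            # instance id -> labels from several crowd workers
        "x1": ["cat", "cat", "cat"],
        "x2": ["cat", "dog", "dog"],
        "x3": ["dog", "cat", "bird"],
    }

    def label_confidence(labels):
        majority, count = Counter(labels).most_common(1)[0]
        return majority, count / len(labels)

    clean_set, noise_set = {}, {}
    for instance, labels in noisy_labels.items():
        label, confidence = label_confidence(labels)
        (clean_set if confidence >= 0.8 else noise_set)[instance] = (label, confidence)

    print("clean:", clean_set)   # x1
    print("noise:", noise_set)   # x2, x3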

3 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose a multi-level framework that shows how federated learning, IoT, and crowdsourcing can work hand in hand to build a robust ecosystem of multi-level federated learning for Industry 4.0.

3 citations


Journal ArticleDOI
TL;DR: In this paper, the authors study Europe's largest hackathon, Junction, to better understand the distinct crowdsourcing properties and mechanisms of this type of hackathon as a form of tournament-based crowdsourcing.
Abstract: Hackathons are time-bounded crowdsourcing events, which have recently prospered in many technology and science domains across the globe. We study Europe’s largest hackathon, Junction, to better understand the distinct crowdsourcing properties and mechanisms of this type of hackathon as a form of tournament-based crowdsourcing. Moreover, we determined how they add value to attending companies and participants. In this regard, six qualitative and quantitative datasets from participants, companies, and the organizer were collected and analyzed. Our findings revealed five distinct crowdsourcing properties and mechanisms of mega hackathons, including intricate crowd selection, strong crowd vibe, instant crowd feedback, versatile crowd potential, and pervasive information technology. Based on our findings, we argue that these properties and mechanisms increase the possibility of finding innovative solutions to companies’ problems in Junction-like mega hackathons. This article concludes with managerial implications for companies to consciously plan and prepare while knowing what to expect during the hackathon.

3 citations


Journal ArticleDOI
TL;DR: In this paper, a human-intelligence-enabled crowdsourcing application was combined with an AI-enabled IIoT framework to capture events and objects from industrial IoT data in real time.
Abstract: Recent advancements of the Industrial Internet of Things (IIoT) have revolutionized modern urbanization and smart cities. While IIoT data contain rich events and objects of interest, processing a massive amount of IIoT data and making predictions in real time are challenging. Recent advancements in artificial intelligence (AI) allow processing such a massive amount of IIoT data and generating insights for further decision-making processes. In this article, we propose several key aspects of AI-enabled IIoT data processing for smart city monitoring. First, we have combined a human-intelligence-enabled crowdsourcing application with an AI-enabled IIoT framework to capture events and objects from IIoT data in real time. Second, we have combined multiple AI algorithms that can run on distributed edge and cloud nodes to automatically categorize the captured events and objects and generate analytics, reports, and alerts from the IIoT data in real time. The results can be utilized in two scenarios. In the first scenario, the smart city authority can authenticate the AI-processed events and assign these events to the appropriate authority for managing them. In the second scenario, the AI algorithms are allowed to interact with humans or the IIoT for further processing. Finally, we present the implementation details of the scenarios mentioned above and the test results. The test results show that the framework has the potential to be deployed within a smart city.

Posted ContentDOI
27 Mar 2023
TL;DR: This paper shows that ChatGPT outperforms crowd-workers on several annotation tasks, including relevance, stance, topics, and frames detection, demonstrating the potential of large language models to drastically increase the efficiency of text classification.
Abstract: Many NLP applications require manual data annotations for a variety of tasks, notably to train classifiers or evaluate the performance of unsupervised models. Depending on the size and degree of complexity, the tasks may be conducted by crowd-workers on platforms such as MTurk as well as trained annotators, such as research assistants. Using a sample of 2,382 tweets, we demonstrate that ChatGPT outperforms crowd-workers for several annotation tasks, including relevance, stance, topics, and frames detection. Specifically, the zero-shot accuracy of ChatGPT exceeds that of crowd-workers for four out of five tasks, while ChatGPT's intercoder agreement exceeds that of both crowd-workers and trained annotators for all tasks. Moreover, the per-annotation cost of ChatGPT is less than $0.003 -- about twenty times cheaper than MTurk. These results show the potential of large language models to drastically increase the efficiency of text classification.
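
A minimal sketch of zero-shot annotation with a chat model, assuming the legacy (pre-1.0) openai Python client; the paper's exact prompts, label definitions, and model version are not reproduced, and the relevance label set below is a placeholder.

    # Minimal sketch, assuming the legacy (pre-1.0) `openai` client; the label
    # set and prompt are placeholders, not the paper's annotation scheme.
    import openai

    openai.api_key = "YOUR_KEY"  # placeholder
    LABELS = ["relevant", "irrelevant"]  # example label set for a relevance task

    def annotate(tweet):
        prompt = (f"Classify the tweet as one of {LABELS}. "
                  f"Answer with the label only.\n\nTweet: {tweet}")
        reply = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # keep labels as deterministic as possible
        )
        return reply["choices"][0]["message"]["content"].strip().lower()

    print(annotate("Content moderation rules on Twitter changed again today."))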

Journal ArticleDOI
TL;DR: The authors summarized the most important aspects of user studies and their design and evaluation, providing direct links to NLP tasks and NLP-specific challenges where appropriate, and outlined general study design, ethical considerations, and factors to consider for crowdsourcing.
Abstract: Many research topics in natural language processing (NLP), such as explanation generation, dialog modeling, or machine translation, require evaluation that goes beyond standard metrics like accuracy or F1 score toward a more human-centered approach. Therefore, understanding how to design user studies becomes increasingly important. However, few comprehensive resources exist on planning, conducting, and evaluating user studies for NLP, making it hard to get started for researchers without prior experience in the field of human evaluation. In this paper, we summarize the most important aspects of user studies and their design and evaluation, providing direct links to NLP tasks and NLP-specific challenges where appropriate. We (i) outline general study design, ethical considerations, and factors to consider for crowdsourcing, (ii) discuss the particularities of user studies in NLP, and provide starting points to select questionnaires, experimental designs, and evaluation methods that are tailored to the specific NLP tasks. Additionally, we offer examples with accompanying statistical evaluation code, to bridge the gap between theoretical guidelines and practical applications.
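
In the spirit of the statistical evaluation examples the paper mentions, here is one common analysis for a between-subjects NLP user study: comparing Likert ratings of two systems with a Mann-Whitney U test. The ratings are invented, and the appropriate test depends on the actual study design.

    # One common analysis for a between-subjects design: compare Likert ratings
    # of two systems with a Mann-Whitney U test. Ratings are invented; the right
    # test depends on the actual design (see the paper for alternatives).
    from scipy.stats import mannwhitneyu

    ratings_system_a = [4, 5, 3, 4, 4, 5, 2, 4]  # hypothetical 1-5 Likert ratings
    ratings_system_b = [3, 2, 3, 4, 2, 3, 3, 2]

    stat, p_value = mannwhitneyu(ratings_system_a, ratings_system_b,
                                 alternative="two-sided")
    print(f"U = {stat:.1f}, p = {p_value:.3f}")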

Journal ArticleDOI
TL;DR: In this paper, the authors explore the relationship between innovation orientation and performance, the practices of companies with high innovation orientations, and the differences in innovation orientation scores of companies across the world.

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed and developed a worker recruitment game in socially aware mobile crowdsourcing, where workers randomly propagate task invitations to social neighbors, and receivers independently make a decision whether to accept or not.
Abstract: With the increasing prominence of smart mobile devices, an innovative distributed computing paradigm, namely Mobile Crowdsourcing (MCS), has emerged. By directly recruiting skilled workers, MCS exploits the power of the crowd to complete location-dependent tasks. Recently, a new and complementary worker recruitment mode based on online social networks, i.e., socially aware MCS, has been proposed to effectively enlarge the worker pool and enhance task execution quality by harnessing underlying social relationships. In this paper, we propose and develop a novel worker recruitment game in socially aware MCS, i.e., Acceptance-aware Worker Recruitment (AWR). To accommodate MCS task invitation diffusion over social networks, we design a Random Diffusion model, in which workers randomly propagate task invitations to social neighbors, and receivers independently decide whether to accept. Based on the diffusion model, we formulate the AWR game as a combinatorial optimization problem, which searches for a subset of seed workers that maximizes overall task acceptance under a pre-given incentive budget. We prove its NP-hardness and devise a meta-heuristic-based evolutionary approach named MA-RAWR to balance exploration and exploitation during the search process. Comprehensive experiments using two real-world data sets clearly validate the effectiveness and efficiency of our proposed approach.
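
A toy sketch of the random-diffusion-plus-seed-selection idea: Monte Carlo estimation of expected task acceptance when invitations spread over a social graph, and greedy seed selection under a budget. This is not MA-RAWR; the graph, probabilities, and costs are invented.

    # Toy sketch (not MA-RAWR): Monte Carlo estimate of expected task acceptance
    # under a random-diffusion model, plus greedy seed selection under a budget.
    # The social graph, probabilities, and costs are invented.
    import random

    random.seed(1)
    NEIGHBORS = {0: [1, 2], 1: [0, 3], 2: [0, 3, 4], 3: [1, 2], 4: [2]}
    FORWARD_P, ACCEPT_P = 0.5, 0.4    # propagation / acceptance probabilities
    COST = {w: 1 for w in NEIGHBORS}  # incentive cost per seed worker

    def expected_acceptance(seeds, rounds=300):
        """Average number of workers accepting when `seeds` receive the task."""
        total = 0
        for _ in range(rounds):
            invited, frontier = set(seeds), list(seeds)
            while frontier:
                worker = frontier.pop()
                for neighbor in NEIGHBORS[worker]:
                    if neighbor not in invited and random.random() < FORWARD_P:
                        invited.add(neighbor)
                        frontier.append(neighbor)
            total += sum(random.random() < ACCEPT_P for _ in invited)
        return total / rounds

    def greedy_seeds(budget):
        seeds = set()
        while True:
            spent = sum(COST[s] for s in seeds)
            candidates = [w for w in NEIGHBORS
                          if w not in seeds and spent + COST[w] <= budget]
            if not candidates:
                return seeds
            seeds.add(max(candidates, key=lambda w: expected_acceptance(seeds | {w})))

    print("chosen seeds:", greedy_seeds(budget=2))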


Journal ArticleDOI
TL;DR: In this paper, a large sample of customers' reviews across multiple retailers reveals that customers exhibit higher appraisal levels of timeliness, price, and reliability of delivery services when crowdsourced delivery (CD) is used.
Abstract: Thanks to increased technological advancements, retailers have progressively incorporated crowdsourcing into their delivery service portfolios to offer customers an enhanced last-mile delivery experience. Yet, while studies have explored the unique operational attributes of the crowdsourced delivery (CD) model in online retailing, the literature remains scant on how customers respond to the usage of this emerging delivery service. Building on the cognitive appraisal theory and e-Logistics Service Quality (e-LSQ) literatures, this study applies middle-range theorizing to examine differences between customers' appraisals of e-LSQ dimensions of CD and traditional delivery methods, and what types of products being delivered make such differences more pronounced. Our analysis of a large sample of customers' reviews across multiple retailers reveals that customers exhibit higher appraisal levels of timeliness, price, and reliability of delivery services when CD is used. Results also indicate that appraisals are more pronounced for timeliness and price of deliveries of high-turnover products that require minimal time and effort to purchase. Our findings, as such, underscore the power of CD as a tool to enhance customer experience and unveil potential opportunities for effective CD use in customer segmentation strategies.

Journal ArticleDOI
TL;DR: Jin et al. as discussed by the authors developed an efficient solution approach that incorporates scenario-based and time-based decomposition techniques, which outperforms a commercial solver in solution quality and computational time for solving large-scale problem instances based on real data.
Abstract: Problem definition: Shared micromobility vehicles provide an eco-friendly form of short-distance travel within an urban area. Because customers pick up and drop off vehicles in any service region at any time, such convenience often leads to a severe imbalance between vehicle supply and demand in different service regions. To overcome this, a micromobility operator can crowdsource individual riders with reward incentives in addition to engaging a third-party logistics provider (3PL) to relocate the vehicles. Methodology/results: We construct a time-space network with multiple service regions and formulate a two-stage stochastic mixed-integer program considering uncertain customer demands. In the first stage, the operator decides the initial vehicle allocation for the regions, whereas in the second stage, the operator determines subsequent vehicle relocation across the regions over an operational horizon. We develop an efficient solution approach that incorporates scenario-based and time-based decomposition techniques. Our approach outperforms a commercial solver in solution quality and computational time for solving large-scale problem instances based on real data. Managerial implications: The budgets for acquiring vehicles and for rider crowdsourcing significantly impact the vehicle initial allocation and subsequent relocation. Introducing rider crowdsourcing in addition to the 3PL can significantly increase profit, reduce demand loss, and improve the vehicle utilization rate of the system without affecting any existing commitment with the 3PL. The 3PL is more efficient for mass relocation than rider crowdsourcing, whereas the latter is more efficient in handling sporadic relocation needs. To serve a region, the 3PL often relocates vehicles in batches from faraway, low-demand regions around peak hours of a day, whereas rider crowdsourcing relocates a few vehicles each time from neighboring regions throughout the day. Furthermore, rider crowdsourcing relocates more vehicles under a unimodal customer arrival pattern than a bimodal pattern, whereas the reverse holds for the 3PL. Funding: This work was supported by the Research Grants Council of Hong Kong [Grants 15501319 and 15505318] and the National Natural Science Foundation of China [Grant 71931009]. Z. Jin was supported by the Hong Kong PhD Fellowship Scheme. Y. F. Lim was supported by the Lee Kong Chian School of Business, Singapore Management University [Maritime and Port Authority Research Fellowship]. Supplemental Material: The online appendices are available at https://doi.org/10.1287/msom.2023.1199 .
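
To make the two-stage structure concrete, the toy sketch below allocates vehicles to two regions in stage one and, for each demand scenario, relocates vehicles at a per-unit cost in stage two, choosing the first-stage allocation with the best expected profit by enumeration. All numbers are invented, and the rider-crowdsourcing vs. 3PL distinction, budgets, and the decomposition techniques the paper actually uses are omitted.

    # Toy sketch of the two-stage structure only (not the paper's model): stage 1
    # allocates vehicles to two regions; stage 2, per demand scenario, relocates
    # vehicles at a per-unit cost before serving demand. Numbers are invented.
    TOTAL_VEHICLES, FARE, RELOCATE_COST = 10, 2.0, 0.5
    SCENARIOS = [((7, 3), 0.5), ((3, 7), 0.5)]  # (demand per region, probability)

    def second_stage_profit(alloc, demand):
        """Best profit for one scenario: try moving vehicles between the regions."""
        best = float("-inf")
        for moved in range(TOTAL_VEHICLES + 1):
            for direction in (1, -1):  # region 0 -> 1, or region 1 -> 0
                a = (alloc[0] - direction * moved, alloc[1] + direction * moved)
                if min(a) < 0:
                    continue
                served = min(a[0], demand[0]) + min(a[1], demand[1])
                best = max(best, FARE * served - RELOCATE_COST * moved)
        return best

    def expected_profit(alloc):
        return sum(p * second_stage_profit(alloc, d) for d, p in SCENARIOS)

    best_alloc = max(((x, TOTAL_VEHICLES - x) for x in range(TOTAL_VEHICLES + 1)),
                     key=expected_profit)
    print("initial allocation:", best_alloc,
          "expected profit:", expected_profit(best_alloc))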

Journal ArticleDOI
TL;DR: In this article , a multi-level policy framework for sustainable entrepreneurship is proposed, including resource prioritization, competency building, sustainable market creation, networked sharing, collaborative replication, and impact reorientation.

Journal ArticleDOI
TL;DR: In this paper, a graph-optimized data offloading algorithm leveraging a crowd-AI hybrid method is proposed to minimize the offloading cost and ensure reliable urban tracking results in intelligent transportation systems.
Abstract: Urban tracking plays a vital role in people's urban life and in intelligent transportation systems, e.g., public safety, case investigation, and finding missing items. However, current tracking methods consume a large amount of communication and computing resources, since they mainly offload all related sensing data, i.e., videos generated by widely deployed cameras, to the cloud, where the data are stored, processed, and analyzed. In this paper, we propose a graph-optimized data offloading algorithm leveraging a crowd-AI hybrid method to minimize the data offloading cost and ensure reliable urban tracking results. Specifically, we first formulate a crowd-AI hybrid urban tracking scenario and prove that the proposed data offloading problem in this scenario is NP-hard. We then solve it by decomposing the problem into two parts, i.e., trajectory prediction and task allocation. The trajectory prediction algorithm, leveraging the state graph, computes possible tracking areas of the target object, and the task allocation algorithm, using the dependency graph, chooses the optimal set of crowds and cameras to cover the tracking area while minimizing the data offloading cost. Finally, extensive simulations with a large real-world data set show that the proposed algorithm outperforms benchmarks in reducing data offloading cost while ensuring the tracking success rate in intelligent transportation systems.
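
As a simplified illustration of the task-allocation step, the sketch below greedily picks cameras and crowd workers to cover the predicted tracking areas at low offloading cost, a classic weighted set-cover heuristic. It is not the paper's dependency-graph algorithm; the areas, coverage sets, and costs are invented.

    # Simplified task-allocation step (not the paper's dependency-graph method):
    # greedy weighted set cover over predicted tracking areas, picking whichever
    # camera or crowd worker covers the most still-uncovered areas per unit cost.
    AREAS = {"a1", "a2", "a3", "a4"}  # predicted tracking areas for the target
    SOURCES = {                       # source -> (areas it covers, offloading cost)
        "cam1":   ({"a1", "a2"}, 4.0),
        "cam2":   ({"a3"}, 2.0),
        "crowd1": ({"a2", "a3", "a4"}, 3.0),
        "crowd2": ({"a4"}, 1.0),
    }

    def greedy_cover(areas, sources):
        chosen, uncovered = [], set(areas)
        while uncovered:
            name, (covered, cost) = min(
                ((n, s) for n, s in sources.items() if s[0] & uncovered),
                key=lambda item: item[1][1] / len(item[1][0] & uncovered))
            chosen.append(name)
            uncovered -= covered
        return chosen

    print(greedy_cover(AREAS, SOURCES))  # e.g. ['crowd1', 'cam1']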

Journal ArticleDOI
M. P. Santos
TL;DR: Wang et al. as mentioned in this paper developed a novel MCS framework, called GDA-Crowd (Group effect-based Data Aggregation), which consists of three parts: location obfuscation and aggregation, group effect-based data privacy and aggregation, and an incentive mechanism.
Abstract: In mobile crowdsensing systems, the public crowd is required to report data with actual locations despite location privacy vulnerabilities. Moreover, the sensing data itself further deepens location privacy breaches. Existing works allow each worker to consider his or her own privacy, but the accumulated privacy budget lowers the group data privacy of each sensing region. Moreover, multi-region spatial data correlations mean that the privacy of correlated data from multiple groups may be leaked from one group to another. To this end, we develop a novel MCS framework, called GDA-Crowd (Group effect-based Data Aggregation), which consists of three parts: location obfuscation and aggregation, group effect-based data privacy and aggregation, and an incentive mechanism. We start from an individual location privacy guarantee and propose a location aggregation method to cluster workers into groups. Then, we exploit the intra-group effect, i.e., data privacy interdependence under the judicious selection of workers' participation, to enhance the privacy-accuracy balance. Moreover, a multi-group global histogram incorporates the inter-group effect, i.e., correlated privacy loss from spatial data correlations, into inter-group data aggregation. Finally, we design a truthful, individually rational, and computationally efficient incentive mechanism for participant selection. The contributions include dual privacy protection, a dual group effect for a desirable privacy-accuracy tradeoff, and synergy between the incentive mechanism and privacy-preserving data aggregation for approximate optimality. Theoretical analysis and extensive experiments validate the effectiveness and superiority of our framework.
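
A minimal sketch of location obfuscation using the planar Laplace mechanism from geo-indistinguishability, one standard building block for the kind of individual location privacy the abstract starts from. GDA-Crowd's actual obfuscation, aggregation, and incentive components are not reproduced; coordinates are treated as planar and the epsilon value is arbitrary.

    # Planar Laplace mechanism from geo-indistinguishability -- one standard
    # location-obfuscation building block, not GDA-Crowd's pipeline.
    import math
    import random
    from scipy.special import lambertw

    def planar_laplace(lat, lon, epsilon):
        """Return a noisy location; the radial noise follows the planar Laplace law."""
        theta = random.uniform(0, 2 * math.pi)
        p = random.random()
        # Inverse CDF of the radius uses the k=-1 branch of the Lambert W function.
        r = -(lambertw((p - 1) / math.e, k=-1).real + 1) / epsilon
        return lat + r * math.cos(theta), lon + r * math.sin(theta)

    # Coordinates treated as planar degrees for simplicity; epsilon is arbitrary.
    print(planar_laplace(40.7128, -74.0060, epsilon=5.0))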


Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a testing assistance approach, which leverages Android automated testing (i.e., dynamic and static analysis) to improve the quality of crowdsourced testing.
Abstract: Crowdsourced testing is an emerging trend in mobile application testing. The openness of crowdsourced testing provides a promising way to conduct large-scale and user-oriented testing scenarios on various mobile devices, while it also brings a problem, i.e., crowdworkers with different levels of testing experience severely threaten the quality of crowdsourced testing. Currently, many approaches have been proposed and studied to improve crowdsourced testing. However, these approaches do not fundamentally improve the ability of crowdworkers. In essence, the low-quality crowdsourced testing is caused by crowdworkers who are unfamiliar with the App Under Test (AUT) and do not know which part of the AUT should be tested. To address this problem, we propose a testing assistance approach, which leverages Android automated testing (i.e., dynamic and static analysis) to improve crowdsourced testing. Our approach constructs an Annotated Window Transition Graph (AWTG) model for the AUT by merging dynamic and static analysis results. Based on the AWTG model, our approach implements a testing assistance pipeline that provides the test task extraction, test task recommendation, and test task guidance to assist crowdworkers in testing the AUT. We experimentally evaluate our approach on real-world AUTs. The quantitative results demonstrate that our approach can effectively and efficiently assist crowdsourced testing. Besides, the qualitative results from a user study confirm the usefulness of our approach.
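
A toy sketch of the window-transition-graph idea behind the approach: windows as nodes, GUI actions as edges, and candidate test tasks extracted as simple paths from the main window to windows that crowdworkers have not yet covered. The AWTG construction, the merging of dynamic and static analysis, and the recommendation and guidance steps are not reproduced; the app structure below is invented.

    # Toy window-transition graph (not the paper's AWTG): test tasks are action
    # sequences leading from the main window to windows with little crowd coverage.
    GRAPH = {  # window -> list of (action, next window)
        "Main":       [("tap search", "Search"), ("tap cart", "Cart")],
        "Search":     [("submit query", "Results")],
        "Results":    [("tap item", "ItemDetail")],
        "Cart":       [("tap checkout", "Checkout")],
        "ItemDetail": [], "Checkout": [],
    }
    UNCOVERED = {"ItemDetail", "Checkout"}  # windows with little crowd coverage so far

    def extract_tasks(start="Main"):
        tasks, stack = [], [(start, [])]
        while stack:
            window, path = stack.pop()
            if window in UNCOVERED:
                tasks.append(path)                   # the action sequence is the task
            for action, nxt in GRAPH[window]:
                if nxt not in {w for _, w in path}:  # keep paths simple (no cycles)
                    stack.append((nxt, path + [(action, nxt)]))
        return tasks

    for task in extract_tasks():
        print(" -> ".join(action for action, _ in task))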

Journal ArticleDOI
TL;DR: In this article, a new image-grounded dataset, Visual Writing Prompts (VWP), is presented. VWP contains almost 2K selected sequences of movie shots, each including 5-10 images, aligned with a total of 12K stories collected via crowdsourcing given the image sequences and a set of grounded characters from the corresponding image sequence.
Abstract: Current work on image-based story generation suffers from the fact that the existing image sequence collections do not have coherent plots behind them. We improve visual story generation by producing a new image-grounded dataset, Visual Writing Prompts (VWP). VWP contains almost 2K selected sequences of movie shots, each including 5-10 images. The image sequences are aligned with a total of 12K stories which were collected via crowdsourcing given the image sequences and a set of grounded characters from the corresponding image sequence. Our new image sequence collection and filtering process has allowed us to obtain stories that are more coherent, diverse, and visually grounded compared to previous work. We also propose a character-based story generation model driven by coherence as a strong baseline. Evaluations show that our generated stories are more coherent, visually grounded, and diverse than stories generated with the current state-of-the-art model. Our code, image features, annotations and collected stories are available at https://vwprompt.github.io/.

Journal ArticleDOI
TL;DR: In this paper, the authors developed a model that explains why and how managers in large organizations implement and design internal crowdsourcing, and identified three key motives that underpin managers' decision to implement an internal crowdsourcing platform: problem-stimulated, opportunity-driven, and legitimacy-seeking.
Abstract: Internal crowdsourcing platforms enable firms to involve a wider crowd of employees beyond the R&D department in the generation and development of ideas for innovation. Prior studies emphasize firm-level functional benefits, e.g., accessing larger and more diverse sets of ideas, competences, and expertise as well as reducing the costs for innovation. Inspired by the behavioral theory of the firm, we depart from such functionalist approach and develop a model that explains why and how managers in large organizations implement and design internal crowdsourcing. Drawing on a qualitative, multiple case study of five large organizations, we identify three key motives that underpin managers’ decision to implement an internal crowdsourcing platform: problem-stimulated, opportunity-driven, and legitimacy-seeking. Next, we discuss how such different motives drive managers’ decisions regarding their design of the internal crowdsourcing initiative. Broadly, our results help explain why firms continue to invest in crowdsourcing initiatives despite meager results and how managers’ cognitive frameworks impact heterogeneity in the design and implementation of crowdsourcing initiatives. This article contributes to the crowdsourcing literature by providing a more multifaceted picture of internal crowdsourcing than previous research has suggested. From a practitioners’ perspective, awareness and recognition by managers of the potential pitfalls in implementing and designing these platforms stand out as an important first step toward improving the effectiveness of internal crowdsourcing.

Journal ArticleDOI
TL;DR: Deep Gaussian Processes for Crowdsourcing (DGPCR) as mentioned in this paper was proposed to model the crowdsourcing problem with DGPs for the first time, and the behavior of each annotator is modeled with a confusion matrix among classes.
Abstract: Machine learning (ML) methods often require large volumes of labeled data to achieve meaningful performance. The expertise necessary for labeling data in medical applications like pathology presents a significant challenge in developing clinical-grade tools. Crowdsourcing approaches address this challenge by collecting labels from multiple annotators with varying degrees of expertise. In recent years, multiple methods have been adapted to learn from noisy crowdsourced labels. Among them, Gaussian Processes (GPs) have achieved excellent performance due to their ability to model uncertainty. Deep Gaussian Processes (DGPs) address the limitations of GPs using multiple layers to enable the learning of more complex representations. In this work, we develop Deep Gaussian Processes for Crowdsourcing (DGPCR) to model the crowdsourcing problem with DGPs for the first time. DGPCR models the (unknown) underlying true labels, and the behavior of each annotator is modeled with a confusion matrix among classes. We use end-to-end variational inference to estimate both DGPCR parameters and annotator biases. Using annotations from 25 pathologists and medical trainees, we show that DGPCR is competitive or superior to Scalable Gaussian Processes for Crowdsourcing (SVGPCR) and other state-of-the-art deep-learning crowdsourcing methods for breast cancer classification. Also, we observe that DGPCR with noisy labels obtains better results ($\text{F1}=81.91\%$) than GPs ($\text{F1}=81.57\%$) and deep learning methods ($\text{F1}=80.88\%$) with true labels curated by experts. Finally, we show an improved estimation of annotators' behavior.
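
A small sketch of the annotator-confusion-matrix idea that DGPCR, like earlier crowdsourcing models, builds on: each annotator's reliability is a matrix of P(observed label | true label), and the noisy labels for an instance combine into a posterior over the true class. The matrices and labels below are invented; DGPCR learns these quantities jointly with a deep GP classifier via variational inference, which is not shown.

    # Sketch of the annotator-confusion-matrix idea (invented numbers): combine
    # each annotator's noisy label into a posterior over the true class using
    # that annotator's confusion matrix.
    import numpy as np

    prior = np.array([0.5, 0.5])                   # P(true class)
    confusion = {                                  # annotator -> P(observed | true)
        "a1": np.array([[0.9, 0.1], [0.2, 0.8]]),  # fairly reliable annotator
        "a2": np.array([[0.6, 0.4], [0.4, 0.6]]),  # close-to-random annotator
    }
    noisy_labels = {"a1": 1, "a2": 0}              # labels given for one instance

    posterior = prior.copy()
    for annotator, observed in noisy_labels.items():
        posterior *= confusion[annotator][:, observed]  # row = true, column = observed
    posterior /= posterior.sum()
    print(posterior)  # posterior probability of each true class for this instance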

Journal ArticleDOI
TL;DR: Li et al. as mentioned in this paper designed, implemented and evaluated PPTA, a new system framework for location privacy-preserving task assignment in spatial crowdsourcing with strong security guarantees, which takes advantage of lightweight cryptography (such as additive secret sharing, function secret sharing and secure shuffle), and provides a suite of tailored secure components required by practical location-based task assignment processes.
Abstract: With the rapid growth of sensor-rich mobile devices, spatial crowdsourcing (SC) has emerged as a new crowdsourcing paradigm harnessing the crowd to perform location-dependent tasks. To appropriately select workers that are near the tasks, SC systems need to perform location-based task assignment, which requires collecting worker locations and task locations. Such practice, however, may easily compromise the location privacy of workers. In light of this, in this paper, we design, implement, and evaluate PPTA, a new system framework for location privacy-preserving task assignment in SC with strong security guarantees. PPTA takes advantage of only lightweight cryptography (such as additive secret sharing, function secret sharing, and secure shuffle), and provides a suite of tailored secure components required by practical location-based task assignment processes. Specifically, aiming for practical usability, PPTA is designed to flexibly support two realistic task assignment settings: (i) the online setting where tasks arrive and get processed at the SC platform one by one, and (ii) the batch-based setting where tasks arrive and get processed in a batch. Extensive experiments over a real-world dataset demonstrate that while providing strong security guarantees, PPTA supports task assignment with efficacy comparable to plaintext baselines and with promising performance.
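
A minimal sketch of additive secret sharing, one of the lightweight primitives the abstract lists; PPTA's actual task-assignment protocol, function secret sharing, and secure shuffle are not reproduced. The coordinates below are invented, and a real system would only open carefully chosen results rather than raw coordinate differences.

    # Additive secret sharing over Z_{2^32} (a toy demo, not PPTA's protocol):
    # a value is split into two random-looking shares; linear functions of shared
    # values can be computed share-wise by two non-colluding servers, and only
    # the final result is reconstructed.
    import secrets

    MOD = 2 ** 32

    def share(value):
        r = secrets.randbelow(MOD)
        return r, (value - r) % MOD       # share_0 + share_1 = value (mod MOD)

    def reconstruct(s0, s1):
        return (s0 + s1) % MOD

    # Invented worker and task x-coordinates, each split between two servers.
    wx0, wx1 = share(1520)
    tx0, tx1 = share(1493)
    # Each server subtracts its shares locally; only the difference is opened.
    difference = reconstruct((wx0 - tx0) % MOD, (wx1 - tx1) % MOD)
    print(difference)  # 27 -- computed without either server seeing the inputs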

Journal ArticleDOI
TL;DR: CD-CODE as mentioned in this paper is a community-editable platform that includes a database of biomolecular condensates based on the literature, an encyclopedia of relevant scientific terms and a crowdsourcing web application.
Abstract: The discovery of biomolecular condensates transformed our understanding of intracellular compartmentalization of molecules. To integrate interdisciplinary scientific knowledge about the function and composition of biomolecular condensates, we developed the crowdsourcing condensate database and encyclopedia (cd-code.org). CD-CODE is a community-editable platform, which includes a database of biomolecular condensates based on the literature, an encyclopedia of relevant scientific terms and a crowdsourcing web application. Our platform will accelerate the discovery and validation of biomolecular condensates, and facilitate efforts to understand their role in disease and as therapeutic targets.

Journal ArticleDOI
TL;DR: In this article, the authors present a probabilistic modeling framework that draws on citizen science data from the eBird database to model the population flows of migratory birds, using GPS and satellite tracking data to tune and evaluate model performance.
Abstract: Large-scale monitoring of seasonal animal movement is integral to science, conservation and outreach. However, gathering representative movement data across entire species ranges is frequently intractable. Citizen science databases collect millions of animal observations throughout the year, but it is challenging to infer individual movement behaviour solely from observational data. We present BirdFlow, a probabilistic modelling framework that draws on citizen science data from the eBird database to model the population flows of migratory birds. We apply the model to 11 species of North American birds, using GPS and satellite tracking data to tune and evaluate model performance. We show that BirdFlow models can accurately infer individual seasonal movement behaviour directly from eBird relative abundance estimates. Supplementing the model with a sample of tracking data from wild birds improves performance. Researchers can extract a number of behavioural inferences from model results, including migration routes, timing, connectivity and forecasts. The BirdFlow framework has the potential to advance migration ecology research, boost insights gained from direct tracking studies and serve a number of applied functions in conservation, disease surveillance, aviation and public outreach.
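
A toy illustration of the population-flow idea only: weekly transition matrices between a few regions define a Markov chain from which individual routes can be sampled. This is not BirdFlow's model or inference procedure; the regions and probabilities are invented, whereas BirdFlow fits its flows to eBird relative-abundance estimates and tracking data.

    # Toy illustration only (not BirdFlow): weekly transition matrices between a
    # few invented regions define a Markov chain; sampling it gives one route.
    import numpy as np

    rng = np.random.default_rng(0)
    REGIONS = ["south", "central", "north"]
    # transitions[w][i][j] = P(region j in week w+1 | region i in week w)
    transitions = [
        np.array([[0.3, 0.6, 0.1], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]),  # week 0 -> 1
        np.array([[0.1, 0.5, 0.4], [0.0, 0.2, 0.8], [0.0, 0.0, 1.0]]),  # week 1 -> 2
    ]

    def sample_route(start=0):
        route = [start]
        for weekly in transitions:
            route.append(rng.choice(len(REGIONS), p=weekly[route[-1]]))
        return [REGIONS[r] for r in route]

    print(sample_route())  # e.g. ['south', 'central', 'north']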

Journal ArticleDOI
TL;DR: In this paper, the authors present a new approach for the analysis of social media posts, based on configurable automatic classification combined with citizen science methodologies, facilitated by a set of flexible, automatic and open-source data processing tools called the Citizen Science Solution Kit.
Abstract: Social media have the potential to provide timely information about emergency situations and sudden events. However, finding relevant information among the millions of posts being added every day can be difficult, and with current approaches, developing an automatic data analysis project requires time and technical skills. This work presents a new approach for the analysis of social media posts, based on configurable automatic classification combined with Citizen Science methodologies. The process is facilitated by a set of flexible, automatic and open-source data processing tools called the Citizen Science Solution Kit. The kit provides a comprehensive set of tools that can be used and personalized in different situations, particularly during natural emergencies, starting from the images and text contained in the posts. The tools can be employed by citizen scientists for filtering, classifying, and geolocating the content with a human-in-the-loop approach that supports the data analyst, including feedback and suggestions on how to configure the automated tools, and techniques to gather inputs from citizens. Using a flooding scenario as a guiding example, this paper illustrates the structure and functioning of the different tools proposed to support citizen scientists in their projects, and a methodological approach to their use. The process is then validated by discussing three case studies based on the Albania earthquake of 2019, the Covid-19 pandemic, and the Thailand floods of 2021. The results suggest that a flexible approach to tool composition and configuration can support the timely setup of an analysis project by citizen scientists, especially in the case of emergencies in unexpected locations.