
Showing papers on "Crowdsourcing published in 2022"


Journal ArticleDOI
TL;DR: Wang et al. propose a novel Privacy-protected Intelligent Crowdsourcing scheme based on Reinforcement Learning (PICRL), which optimizes the utility of the system by considering data amount, data quality, and costs at the same time.

32 citations



Proceedings ArticleDOI
09 Jun 2022
TL;DR: A novel framework is introduced, CrowdWorkSheets, for dataset developers to facilitate transparent documentation of key decision points at various stages of the data annotation pipeline: task formulation, selection of annotators, platform and infrastructure choices, dataset analysis and evaluation, and dataset release and maintenance.
Abstract: Human annotated data plays a crucial role in machine learning (ML) research and development. However, the ethical considerations around the processes and decisions that go into dataset annotation have not received nearly enough attention. In this paper, we survey an array of literature that provides insights into ethical considerations around crowdsourced dataset annotation. We synthesize these insights, and lay out the challenges in this space along two layers: (1) who the annotator is, and how the annotators’ lived experiences can impact their annotations, and (2) the relationship between the annotators and the crowdsourcing platforms, and what that relationship affords them. Finally, we introduce a novel framework, CrowdWorkSheets, for dataset developers to facilitate transparent documentation of key decision points at various stages of the data annotation pipeline: task formulation, selection of annotators, platform and infrastructure choices, dataset analysis and evaluation, and dataset release and maintenance.

29 citations


Journal ArticleDOI
TL;DR: In this paper, the authors apply a crowdsourcing approach and interpret the semantics of reviews for the top-rated courses on Coursera.org to explore what makes a great MOOC and what makes a hit.
Abstract: MOOC platforms have seen significant membership growth in recent years. MOOCs lead an education world that has become digitized, remote, and highly competitive. Based on observations of top-rated MOOCs, this study proposes a research question: “What makes a great MOOC? What makes a hit?” To explore the answers, this study applies a crowdsourcing approach and interprets the semantics of reviews for the top-rated courses on Coursera.org. The paper has multiple steps and findings relevant to MOOC programs at universities worldwide. First, through exploratory analysis of learner reviews and expert judgment, this study identifies two distinct course categories focusing on learners' outcome intent, namely knowledge-seeking MOOCs and skill-seeking MOOCs. Further, this study uses a topical ontology of keywords and sentiment techniques to derive the intent of learners based on their comments. Through sentiment analysis and correlation analysis, it shows that knowledge-seeking MOOCs are driven by the quality of course design and materials, while skill-seeking MOOCs are driven by the instructor and their ability to present lectures and integrate course materials and assignments. This crowdsourcing method obtains insights from large samples of learners’ reviews without the priming or self-selection biases of open surveys or interviews. The findings demonstrate the effectiveness of leveraging online learner reviews and offer practical implications for what truly “makes a hit” for top-rated MOOCs.
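The keyword-and-sentiment pipeline described above can be sketched as a toy lexicon scorer. The word lists and the `classify_review` helper below are purely illustrative assumptions, not the paper's actual topical ontology or sentiment technique:

```python
import re

# Illustrative word lists; the study's real ontology is not given in the abstract.
KNOWLEDGE_WORDS = {"theory", "concept", "understanding", "lecture"}
SKILL_WORDS = {"hands-on", "project", "practice", "assignment"}
POSITIVE = {"great", "excellent", "clear", "engaging"}
NEGATIVE = {"boring", "confusing", "outdated"}

def classify_review(text: str) -> dict:
    """Derive a coarse learner intent and a sentiment score from one review."""
    tokens = re.findall(r"[a-z][a-z-]*", text.lower())
    k = sum(t in KNOWLEDGE_WORDS for t in tokens)
    s = sum(t in SKILL_WORDS for t in tokens)
    sentiment = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return {"intent": "knowledge-seeking" if k >= s else "skill-seeking",
            "sentiment": sentiment}
```

Aggregating such per-review scores over a course and correlating them with ratings would mirror, in miniature, the correlation analysis the study describes.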

28 citations




Journal ArticleDOI
TL;DR: In this article, a co-training-based label noise correction (CTNC) algorithm is proposed, in which the weight of each instance is calculated from the information provided by its multiple noisy label sets.

27 citations


Journal ArticleDOI
TL;DR: In this paper, the authors review challenges in building sustainable relationships between the parties involved in crowdfunding and crowdsourcing projects run in extreme situations, such as the COVID-19 pandemic.
Abstract: This research reviews challenges in building sustainable relationships between the parties involved in crowdfunding and crowdsourcing projects run in extreme situations, such as the COVID-19 pandemic. This study aims to address the problems that give rise to crowdsourcing concerns and to find better ways to increase donor trust in crowdfunding, since trust affects strategic sustainability under turbulence and the COVID-induced financial crisis. It was found that different factors influence donor decisions in different ways, yet donor activity shares a common tendency: non-monotonicity. Future development in the field of sustainable relationships should focus on creating a donor classification system.

27 citations




Journal ArticleDOI
TL;DR: A novel collaborative framework engages crowds of medical students and pathologists to produce quality labels for cell nuclei; results indicate that even noisy algorithmic suggestions do not adversely affect pathologist accuracy and can help non-experts improve annotation quality.
Abstract: Background: Deep learning enables accurate high-resolution mapping of cells and tissue structures that can serve as the foundation of interpretable machine-learning models for computational pathology. However, generating adequate labels for these structures is a critical barrier, given the time and effort required from pathologists. Results: This article describes a novel collaborative framework for engaging crowds of medical students and pathologists to produce quality labels for cell nuclei. We used this approach to produce the NuCLS dataset, containing >220,000 annotations of cell nuclei in breast cancers. This builds on prior work labeling tissue regions to produce an integrated tissue region- and cell-level annotation dataset for training that is the largest such resource for multi-scale analysis of breast cancer histology. This article presents data and analysis results for single and multi-rater annotations from both non-experts and pathologists. We present a novel workflow that uses algorithmic suggestions to collect accurate segmentation data without the need for laborious manual tracing of nuclei. Our results indicate that even noisy algorithmic suggestions do not adversely affect pathologist accuracy and can help non-experts improve annotation quality. We also present a new approach for inferring truth from multiple raters and show that non-experts can produce accurate annotations for visually distinctive classes. Conclusions: This study is the most extensive systematic exploration of the large-scale use of wisdom-of-the-crowd approaches to generate data for computational pathology applications.

26 citations


Journal ArticleDOI
TL;DR: Wang et al. propose a differential privacy-based location protection (DPLP) scheme that protects the location privacy of both workers and tasks and achieves task allocation with high data utility.
Abstract: Spatial crowdsourcing (SC) is a location-based outsourcing service whereby the SC-server allocates tasks to workers with mobile devices according to the locations outsourced by requesters and workers. Since location information contains individual privacy, the locations should be protected before being submitted to the untrusted SC-server. However, encryption schemes limit data availability, and existing differential privacy (DP) methods do not protect the tasks’ location privacy. In this paper, we propose a differential privacy-based location protection (DPLP) scheme, which protects the location privacy of both workers and tasks, and achieves task allocation with high data utility. Specifically, DPLP splits the exact locations of both workers and tasks into noisy multi-level grids by using an adaptive three-level grid decomposition (ATGD) algorithm and a DP-based adaptive complete pyramid grid (DPACPG) algorithm, respectively, thereby considering both grid granularity and location privacy. Furthermore, DPLP adopts an optimal greedy algorithm to calculate a geocast region around the task grid, which achieves a trade-off between acceptance rate and system overhead. Detailed privacy analysis demonstrates that our DPLP scheme satisfies $\epsilon$-differential privacy. Extensive analysis and experiments over two real-world datasets confirm the high efficiency and data utility of our scheme.
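As a rough illustration of the differential-privacy ingredient, the sketch below perturbs per-cell worker counts with Laplace(1/ε) noise, the standard mechanism for ε-DP count queries. It is a generic sketch under that assumption, not the paper's ATGD or DPACPG algorithms:

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """One sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5
    if u == 0.0:
        return 0.0
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_grid_counts(counts, epsilon: float, seed: int = 0):
    """Perturb per-cell worker counts. A count query has sensitivity 1,
    so Laplace(1/epsilon) noise suffices for epsilon-DP on these counts."""
    rng = random.Random(seed)
    scale = 1.0 / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]
```

Smaller ε means a larger noise scale and stronger privacy, at the cost of utility; the paper's adaptive grids aim to manage exactly that trade-off.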

22 citations


Journal ArticleDOI
TL;DR: In this article, a generic crowdsourcing approach for continuously evolving the industrial knowledge graph (IKG) is proposed to improve the quality and availability of the IKG in a smart manufacturing environment.

Proceedings ArticleDOI
11 Feb 2022
TL;DR: This paper proposes a conversational User Simulator, called USi, for automatic evaluation of conversational search systems, capable of automatically answering clarifying questions about the topic throughout the search session, and shows that responses generated by USi are both in line with the underlying information need and comparable to human-generated answers.
Abstract: Clarifying the underlying user information need by asking clarifying questions is an important feature of modern conversational search systems. However, evaluation of such systems through answering prompted clarifying questions requires significant human effort, which can be time-consuming and expensive. In this paper, we propose a conversational User Simulator, called USi, for automatic evaluation of such conversational search systems. Given a description of an information need, USi is capable of automatically answering clarifying questions about the topic throughout the search session. Through a set of experiments, including automated natural language generation metrics and crowdsourcing studies, we show that responses generated by USi are both in line with the underlying information need and comparable to human-generated answers. Moreover, we make the first steps towards multi-turn interactions, where conversational search systems ask multiple questions to the (simulated) user with a goal of clarifying the user need. To this end, we expand on currently available datasets for studying clarifying questions, i.e., Qulac and ClariQ, by performing a crowdsourcing-based multi-turn data acquisition. We show that our generative, GPT2-based model, is capable of providing accurate and natural answers to unseen clarifying questions in the single-turn setting and discuss capabilities of our model in the multi-turn setting. We provide the code, data, and the pre-trained model to be used for further research on the topic.

Journal ArticleDOI
01 Nov 2022
TL;DR: Zhang et al. propose a multiple noisy label distribution propagation (MNLDP) method, which estimates the label distribution of each instance from its multiple noisy labels and then propagates that distribution to the instance's nearest neighbors.
Abstract: Crowdsourcing services provide a fast, efficient, and cost-effective way to obtain large labeled data for supervised learning. Unfortunately, the quality of crowdsourced labels cannot satisfy the standards of practical applications. Ground-truth inference, simply called label integration, designs proper aggregation methods to infer the unknown true label of each instance (sample) from the multiple noisy label set provided by ordinary crowd labelers (workers). However, nearly all existing label integration methods focus solely on the multiple noisy label set per individual instance while totally ignoring the intercorrelation among multiple noisy label sets of different instances. To solve this problem, a multiple noisy label distribution propagation (MNLDP) method is proposed in this article. MNLDP at first estimates the multiple noisy label distribution of each instance from its multiple noisy label set and then propagates its multiple noisy label distribution to its nearest neighbors. Consequently, each instance absorbs a fraction of the multiple noisy label distributions from its nearest neighbors and yet simultaneously maintains a fraction of its own original multiple noisy label distribution. Empirical studies on one artificial dataset, six simulated UCI datasets, and three real-world crowdsourced datasets show that MNLDP outperforms all other existing state-of-the-art label integration methods in terms of integration accuracy and classification accuracy.
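The propagation step can be illustrated with a small sketch: each instance's noisy label set becomes a class distribution, which is then mixed with the average distribution of its nearest neighbors. The precomputed neighbor lists and the mixing weight `alpha` are illustrative assumptions, not the paper's exact estimator:

```python
from collections import Counter

def label_distribution(noisy_labels, num_classes):
    """Turn one instance's multiple noisy label set into a class distribution."""
    counts = Counter(noisy_labels)
    total = len(noisy_labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]

def propagate(distributions, neighbors, alpha=0.5):
    """Mix each distribution with the mean distribution of its neighbors:
    each instance keeps a fraction alpha of its own distribution and
    absorbs (1 - alpha) from its neighbors."""
    out = []
    for i, dist in enumerate(distributions):
        nbrs = neighbors[i]
        avg = [sum(distributions[j][c] for j in nbrs) / len(nbrs)
               for c in range(len(dist))]
        out.append([alpha * d + (1 - alpha) * a for d, a in zip(dist, avg)])
    return out
```

Since both inputs to the mix are probability distributions, each propagated vector still sums to one, so it can be fed directly to a final argmax for label integration.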

Journal ArticleDOI
TL;DR: In this paper, a narrative review examines three key areas for digital approaches to deepen community engagement in clinical trials: the use of digital technology for trial processes to decentralize trials, digital crowdsourcing to develop trial components, and digital qualitative research methods.
Abstract: Digital approaches are increasingly common in clinical trial recruitment, retention, analysis, and dissemination. Community engagement processes have contributed to the successful implementation of clinical trials and are crucial in enhancing equity in trials. However, few studies focus on how digital approaches can be implemented to enhance community engagement in clinical trials. This narrative review examines three key areas for digital approaches to deepen community engagement in clinical trials: the use of digital technology for trial processes to decentralize trials, digital crowdsourcing to develop trial components, and digital qualitative research methods. We highlight how digital approaches enhanced community engagement through a greater diversity of participants, and deepened community engagement through the decentralization of research processes. We discuss new possibilities that digital technologies offer for community engagement, and highlight potential strengths, weaknesses, and practical considerations. We argue that strengthening community engagement using a digital approach can enhance equity and improve health outcomes.

Proceedings ArticleDOI
29 Apr 2022
TL;DR: In this paper, the authors used five existing survey instruments to explore the programming skills, privacy and security attitudes, and secure development self-efficacy of participants from a CS student mailing list and four crowdsourcing platforms (Appen, Clickworker, MTurk, and Prolific).
Abstract: Reliably recruiting participants with programming skills is an ongoing challenge for empirical studies involving software development technologies, often leading to the use of crowdsourcing platforms and computer science (CS) students. In this work, we use five existing survey instruments to explore the programming skills, privacy and security attitudes, and secure development self-efficacy of participants from a CS student mailing list and four crowdsourcing platforms (Appen, Clickworker, MTurk, and Prolific). We recruited 613 participants who claimed to have programming skills and assessed recruitment channels regarding costs, quality, programming skills, as well as privacy and security attitudes. We find that 27% of crowdsourcing participants, 40% of crowdsourcing participants who self-report to be developers, and 89% of CS students answered all programming skill questions correctly. CS students were the most cost-effective recruitment channel and rated themselves lower than crowdsourcing participants about secure development self-efficacy.

Journal ArticleDOI
TL;DR: The authors argue that GPT-3 cannot be forced to produce only true continuations; rather, to maximise its objective function it strategizes to be plausible instead of truthful, which can hijack our intuitive capacity to evaluate the accuracy of its outputs.
Abstract: This article contributes to the debate around the abilities of large language models such as GPT-3, dealing with: firstly, evaluating how well GPT does in the Turing Test; secondly, the limits of such models, especially their tendency to generate falsehoods; and thirdly, the social consequences of the problems these models have with truth-telling. We start by formalising the recently proposed notion of reversible questions, which Floridi & Chiriatti (2020) propose allow one to ‘identify the nature of the source of their answers’, as a probabilistic measure based on Item Response Theory from psychometrics. Following a critical assessment of the methodology which led previous scholars to dismiss GPT’s abilities, we argue against claims that GPT-3 completely lacks semantic ability. Using ideas of compression, priming, distributional semantics and semantic webs we offer our own theory of the limits of large language models like GPT-3, and argue that GPT can competently engage in various semantic tasks. The real reason GPT’s answers can seem senseless is that truth-telling is not amongst those tasks. We claim that these kinds of models cannot be forced into producing only true continuations; rather, to maximise their objective function they strategize to be plausible instead of truthful. This, we moreover claim, can hijack our intuitive capacity to evaluate the accuracy of their outputs. Finally, we show how this analysis predicts that a widespread adoption of language generators as tools for writing could result in permanent pollution of our informational ecosystem with massive amounts of very plausible but often untrue texts.

Journal ArticleDOI
TL;DR: Wang et al. propose a novel public participation consortium blockchain system for infrastructure maintenance, designed to encourage citizens to participate actively in the decision-making process and to enable them to witness all administrative procedures in real time.
Abstract: Smart cities have become a trend with improved efficiency, resilience, and sustainability, providing citizens with high quality of life. With the increasing demand for a more participatory and bottom–up governance approach, citizens play an active role in the process of policy making, revolutionizing the management of smart cities. In the example of urban infrastructure maintenance, the public participation demand is more remarkable as the infrastructure condition is closely related to their daily life. Although blockchain has been widely explored to benefit data collection and processing in smart city governance, public engagement remains a challenge. In this article, we propose a novel public participation consortium blockchain system for infrastructure maintenance that is expected to encourage citizens to actively participate in the decision-making process and enable them to witness all administrative procedures in a real-time manner. To that aim, we introduced a hybrid blockchain architecture to involve a verifier group, which is randomly and dynamically selected from the public citizens, to verify the transaction. In particular, we devised a private-prior peer-prediction-based truthful verification mechanism to tackle the collusion attacks from public verifiers. Then, we specified a Stackelberg-game-based incentive mechanism for encouraging public participation. Finally, we conducted extensive simulations to reveal the properties and performances of our proposed blockchain system, which indicates its superiority over other variations.

Journal ArticleDOI
TL;DR: Li et al. propose a triple real-time trajectory privacy protection mechanism (T-LGEB) based on edge computing and blockchain, which combines localized differential privacy with a multiple-probability extension mechanism to send requests and data to the edge server.
Abstract: With the rapid development of the Internet of Things (IoT) and the rapid popularization of 5G networks, the data that needs to be processed in Mobile Crowdsourcing (MCS) systems is increasing every day. Traditional cloud computing can no longer meet crowdsourcing's needs for real-time data and processing efficiency; thus, edge computing was born. Edge computing performs computation at the edge of the network, greatly improving the efficiency and real-time performance of data processing. In addition, most existing privacy protection technologies rely on trusted third parties. Therefore, in view of the semi-trustworthiness of edge servers and the transparency of blockchain, this paper proposes a triple real-time trajectory privacy protection mechanism (T-LGEB) based on edge computing and blockchain. T-LGEB combines localized differential privacy with a multiple-probability extension mechanism to send requests and data to the edge server. Then, through the spatio-temporal dynamic pseudonym mechanism proposed in the paper, the entire trajectory of each task participant is divided into multiple unrelated trajectory segments with different pseudonymous identities, in order to protect the trajectory privacy of task participants while ensuring high data availability and real-time data. Extensive experiments and comparative analysis on multiple real datasets show that T-LGEB provides strong privacy protection and high data availability at relatively low resource cost.
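A minimal sketch of the pseudonym idea: split a trajectory into fixed-size segments and give each segment a fresh random pseudonym, so the segments cannot be linked by identity alone. The segment size and pseudonym format are illustrative assumptions, not the paper's spatio-temporal mechanism:

```python
import secrets

def pseudonymize_trajectory(points, segment_len=3):
    """Split a list of (x, y) points into segments, each carrying a
    fresh random pseudonym instead of the participant's identity."""
    segments = []
    for start in range(0, len(points), segment_len):
        pseudonym = secrets.token_hex(4)  # fresh, unlinkable identity
        segments.append({"id": pseudonym,
                         "points": points[start:start + segment_len]})
    return segments
```

Each segment still carries exact points, so data availability is preserved; what is removed is the single identity that would otherwise link the segments into one trajectory.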

Journal ArticleDOI
TL;DR: In this paper , a supervised machine learning model that classifies Arabic news articles based on their context's credibility was introduced, and the first dataset of Arabic fake news articles composed through crowdsourcing was also introduced.
Abstract: Over the years, social media has had a considerable impact on the way we share information and send messages. With this comes the problem of the rapid distribution of fake news, which can have negative impacts on both individuals and society. Given this potential negative influence, detecting unmonitored ‘fake news’ has become a critical issue in mainstream media. While recent studies have built machine learning models that detect fake news in several languages, studies detecting fake news in the Arabic language remain scarce. Hence, in this paper, we study the issue of fake news detection in the Arabic language based on textual analysis. In an attempt to address the challenges of authenticating news, we introduce a supervised machine learning model that classifies Arabic news articles based on their context’s credibility. We also introduce the first dataset of Arabic fake news articles composed through crowdsourcing. Subsequently, to extract textual features from the articles, we create a unique approach of forming Arabic lexical wordlists and design an Arabic Natural Language Processing tool to perform textual feature extraction. The findings of this study are promising, and the model outperformed human performance on the same task.

Journal ArticleDOI
TL;DR: The authors develop HC-COVID, a hierarchical crowdsource knowledge graph based framework that explicitly models the COVID-19 knowledge facts contributed by crowd workers with different levels of expertise and accurately identifies the related knowledge facts to explain the detection results.
Abstract: The proliferation of social media has promoted the spread of misinformation that raises many concerns in our society. This paper focuses on a critical problem of explainable COVID-19 misinformation detection that aims to accurately identify and explain misleading COVID-19 claims on social media. Motivated by the lack of COVID-19 relevant knowledge in existing solutions, we construct a novel crowdsource knowledge graph based approach to incorporate the COVID-19 knowledge facts by leveraging the collaborative efforts of expert and non-expert crowd workers. Two important challenges exist in developing our solution: i) how to effectively coordinate the crowd efforts from both expert and non-expert workers to generate the relevant knowledge facts for detecting COVID-19 misinformation; ii) how to leverage the knowledge facts from the constructed knowledge graph to accurately explain the detected COVID-19 misinformation. To address the above challenges, we develop HC-COVID, a hierarchical crowdsource knowledge graph based framework that explicitly models the COVID-19 knowledge facts contributed by crowd workers with different levels of expertise and accurately identifies the related knowledge facts to explain the detection results. We evaluate HC-COVID using two public real-world datasets on social media. Evaluation results demonstrate that HC-COVID significantly outperforms state-of-the-art baselines in terms of the detection accuracy of misleading COVID-19 claims and the quality of the explanations.

Journal ArticleDOI
TL;DR: In this article, the authors comprehensively survey the state-of-the-art mechanisms for protecting the location privacy of workers in mobile crowdsensing (MCS), divide the location protection mechanisms into three categories depending on the nature of their algorithms, and compare them from the viewpoints of architecture, privacy, computational overhead, and utility.

Journal ArticleDOI
TL;DR: Li et al. propose a label augmented and weighted majority voting (LAWMV) method, which uses the KNN algorithm to find each instance's K nearest neighbors (including itself) and merges their multiple noisy label sets into an augmented multiple noisy label set.
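The augmentation step in the TL;DR can be sketched as follows; neighbor indices are assumed precomputed and the weighting is omitted for brevity, so this is an unweighted simplification rather than LAWMV itself:

```python
from collections import Counter

def augmented_majority_vote(noisy_label_sets, neighbors, i):
    """Merge the noisy label sets of instance i's K nearest neighbors
    (neighbors[i] is assumed to include i itself) and take a plain
    majority vote over the merged, augmented label set."""
    merged = []
    for j in neighbors[i]:
        merged.extend(noisy_label_sets[j])
    return Counter(merged).most_common(1)[0][0]
```

The intuition is that nearby instances likely share a true label, so pooling their noisy labels gives each vote a larger, less noisy electorate than the instance's own label set alone.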

Journal ArticleDOI
TL;DR: The authors propose FGFL, a blockchain-based incentive governor for federated learning, which assesses participants with reputation and contribution indicators; the task publisher rewards efficient workers fairly to attract them, while malicious ones are punished and eliminated.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a system that can exploit the collected data and predict, with a specific degree of precision, when a plant will get a disease, with the final purpose of rendering agriculture more sustainable.
Abstract: As the world becomes increasingly interconnected, emerging and innovative sensing technologies are shaping the future of agriculture, with a special focus on sustainability-related issues. In this context, we envision the possibility of exploiting the Social Internet of Things for sensing environmental conditions (solar radiation, humidity, air temperature, and soil moisture) and communications, deep learning for plant disease detection, and crowdsourcing for image collection and classification, engaging farmers, community garden owners, and experts. Through data fusion and deep learning, the designed system can exploit the collected data and predict, with a specific degree of precision, whether and when a plant will develop a disease, with the final purpose of rendering agriculture more sustainable. We here present the architecture, the deep learning model, and the responsive Web app. Finally, some experimental evaluations and usability/engagement tests are reported and discussed, together with final remarks, limitations, and future work.

Journal ArticleDOI
01 Jul 2022-Matter
TL;DR: In this article, the authors propose a framework for materials acceleration for societal solutions (MASS), in which solutions can now potentially be uncovered by computational machine learning, artificial intelligence, and other self-driving methods for materials discovery.

Journal ArticleDOI
TL;DR: In this article, a systematic literature review was conducted, aided by the development of an ontology of collaborative and open approaches to innovation, associated with four main aspects: open innovation, business model innovation, non-producer innovation, and the open movement.

Journal ArticleDOI
TL;DR: An online RNA design game, Eterna, is harnessed to challenge a large community of RNA designers to create diverse RNA sensors, and the best player-generated designs approached the thermodynamic optimum.
Abstract: Significance: Our manuscript presents a paradigm for carrying out distributed science. We have harnessed an online RNA design game, Eterna, to challenge a large community of RNA designers to create diverse RNA sensors. RNA is an attractive, biocompatible substrate for the design and implementation of molecular sensors. We tasked the diverse Eterna community, comprising a global network of molecular design enthusiasts, to submit thousands to tens of thousands of “solutions” to these RNA sensor design challenges. Crucially, community designs were synthesized and tested experimentally in the real world using high-throughput methods for biochemical assays built on repurposed DNA sequencers. The best player-generated designs for RNA sensors approached the thermodynamic optimum.

Journal ArticleDOI
TL;DR: In this paper, the literature at the intersection of social media platforms, social entrepreneurial practices, and their influence on social enterprise performance is reviewed; the review integrates scattered findings into one body, allowing practitioners and policymakers to discern the role of social platforms in addressing emerging societal problems and increasing the operational efficiency of social enterprises (SEs).

Proceedings ArticleDOI
25 Apr 2022
TL;DR: The authors propose a framework called Outlier Detection for Streaming Task Assignment, which aims to improve robustness by detecting malicious actors, together with a novel socially aware Generative Adversarial Network (GAN) based architecture capable of contending with the complex distributions found in time series.
Abstract: Crowdsourcing aims to enable the assignment of available resources to the completion of tasks at scale. The continued digitization of societal processes translates into increased opportunities for crowdsourcing. For example, crowdsourcing enables the assignment of computational resources of humans, called workers, to tasks that are notoriously hard for computers. In settings faced with malicious actors, detection of such actors holds the potential to increase the robustness of crowdsourcing platform. We propose a framework called Outlier Detection for Streaming Task Assignment that aims to improve robustness by detecting malicious actors. In particular, we model the arrival of workers and the submission of tasks as evolving time series and provide means of detecting malicious actors by means of outlier detection. We propose a novel socially aware Generative Adversarial Network (GAN) based architecture that is capable of contending with the complex distributions found in time series. The architecture includes two GANs that are designed to adversarially train an autoencoder to learn the patterns of distributions in worker and task time series, thus enabling outlier detection based on reconstruction errors. A GAN structure encompasses a game between a generator and a discriminator, where it is desirable that the two can learn to coordinate towards socially optimal outcomes, while avoiding being exploited by selfish opponents. To this end, we propose a novel training approach that incorporates social awareness into the loss functions of the two GANs. Additionally, to improve task assignment efficiency, we propose an efficient greedy algorithm based on degree reduction that transforms task assignment into a bipartite graph matching. Extensive experiments offer insight into the effectiveness and efficiency of the proposed framework.
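One plausible reading of the degree-reduction greedy step is sketched below: repeatedly match the currently lowest-degree worker to its lowest-degree compatible task and remove both. This is an assumption about the algorithm's shape, not the paper's exact procedure:

```python
def greedy_degree_matching(edges):
    """edges: dict mapping worker -> set of compatible tasks.
    Returns a dict mapping each matched worker to one task."""
    edges = {w: set(ts) for w, ts in edges.items()}  # defensive copy
    matching = {}
    while True:
        # unmatched workers that still have at least one available task
        candidates = [w for w, ts in edges.items() if w not in matching and ts]
        if not candidates:
            break
        w = min(candidates, key=lambda x: len(edges[x]))  # lowest-degree worker
        # a task's degree = how many candidate workers can still take it
        t = min(edges[w], key=lambda t: sum(1 for x in candidates if t in edges[x]))
        matching[w] = t
        for ts in edges.values():  # degree reduction: remove the assigned task
            ts.discard(t)
    return matching
```

Matching low-degree vertices first is a common heuristic for maximizing bipartite matching size, since the most constrained workers and tasks are assigned before their few options disappear.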

Proceedings ArticleDOI
29 Apr 2022
TL;DR: In this paper, the authors leverage data from Birdwatch, Twitter's crowdsourced fact-checking pilot program, to directly measure judgments of whether other users' tweets are misleading, and whether other users' free-text evaluations of third-party tweets are helpful.
Abstract: There is a great deal of interest in the role that partisanship, and cross-party animosity in particular, plays in interactions on social media. Most prior research, however, must infer users’ judgments of others’ posts from engagement data. Here, we leverage data from Birdwatch, Twitter’s crowdsourced fact-checking pilot program, to directly measure judgments of whether other users’ tweets are misleading, and whether other users’ free-text evaluations of third-party tweets are helpful. For both sets of judgments, we find that contextual features – in particular, the partisanship of the users – are far more predictive of judgments than the content of the tweets and evaluations themselves. Specifically, users are more likely to write negative evaluations of tweets from counter-partisans; and are more likely to rate evaluations from counter-partisans as unhelpful. Our findings provide clear evidence that Birdwatch users preferentially challenge content from those with whom they disagree politically. While not necessarily indicating that Birdwatch is ineffective for identifying misleading content, these results demonstrate the important role that partisanship can play in content evaluation. Platform designers must consider the ramifications of partisanship when implementing crowdsourcing programs.