
Showing papers on "Crowdsourcing published in 2020"


Journal ArticleDOI
TL;DR: This work presents a systematic and scalable approach to creating KonIQ-10k, the largest IQA dataset to date, consisting of 10,073 quality-scored images, and proposes a novel deep learning model (KonCept512) that shows excellent generalization beyond the test set.
Abstract: Deep learning methods for image quality assessment (IQA) are limited by the small size of existing datasets. Extensive datasets require substantial resources both for generating publishable content and for annotating it accurately. We present a systematic and scalable approach to creating KonIQ-10k, the largest IQA dataset to date, consisting of 10,073 quality-scored images. It is the first in-the-wild database aiming for ecological validity with regard to the authenticity of distortions, the diversity of content, and quality-related indicators. Through crowdsourcing, we obtained 1.2 million reliable quality ratings from 1,459 crowd workers, paving the way for more general IQA models. We propose a novel deep learning model (KonCept512) that shows excellent generalization beyond the test set (0.921 SROCC) to the current state-of-the-art database LIVE-in-the-Wild (0.825 SROCC). The model derives its core performance from the InceptionResNet architecture, being trained at a higher resolution than previous models ($512\times 384$). Correlation analysis shows that KonCept512 performs similarly to having 9 subjective scores for each test image.
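
The SROCC values quoted above are Spearman rank-order correlation coefficients between predicted and subjective quality scores. As a quick illustration (not the paper's code), SROCC is simply the Pearson correlation computed on rank values:

```python
def rank(values):
    # Assign 1-based ranks, averaging ranks over ties (as in Spearman's rho).
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def srocc(x, y):
    # Spearman rank-order correlation: Pearson correlation of the ranks.
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Perfect monotonic agreement between predicted and subjective scores:
print(round(srocc([1.2, 3.4, 2.2, 5.0], [10, 30, 20, 50]), 3))  # 1.0
```

A model whose predictions preserve the ranking of subjective scores gets SROCC 1.0 even if its absolute values are off, which is why IQA papers report it alongside linear correlation.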

299 citations


Journal ArticleDOI
TL;DR: The study explores the possible services across various city dimensions that can make a city smart, and suggests a multi-dimensional service classification along with the required basic infrastructure development.

224 citations


Journal ArticleDOI
TL;DR: In this paper, in order to detect and describe real-time urban emergency events, the 5W (What, Where, When, Who, and Why) model is proposed; results show the accuracy and efficiency of the proposed method.
Abstract: Crowdsourcing is a process of acquisition, integration, and analysis of big and heterogeneous data generated by a diversity of sources in urban spaces, such as sensors, devices, vehicles, buildings, and humans. Nowadays, no country, community, or person is immune to urban emergency events. Detecting urban emergency events, e.g., fires, storms, and traffic jams, is of great importance to protect human security. Recently, social media feeds have rapidly emerged as a novel platform for providing and disseminating information that is often geographic. Social media content usually includes references to urban emergency events occurring at, or affecting, specific locations. In this paper, in order to detect and describe real-time urban emergency events, the 5W (What, Where, When, Who, and Why) model is proposed. Firstly, users of social media are set as the target of crowdsourcing. Secondly, spatial and temporal information is extracted from social media to detect events in real time. Thirdly, a GIS-based annotation of the detected urban emergency event is shown. The proposed method is evaluated with extensive case studies based on real urban emergency events. The results show the accuracy and efficiency of the proposed method.
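
The spatial/temporal extraction step can be pictured with a toy sketch. The event keywords and regular expressions below are illustrative assumptions, not the paper's actual 5W pipeline:

```python
import re

# Toy sketch of 5W extraction (What/Where/When/Who) from a social media post.
# The keyword list and regexes are made up for illustration only.
EVENT_KEYWORDS = ("fire", "storm", "traffic jam")

def extract_5w(post, author):
    # What: first matching event keyword in the text.
    what = next((kw for kw in EVENT_KEYWORDS if kw in post.lower()), None)
    # Where: a capitalized place name following "at ...".
    where = (re.search(r"\bat ([A-Z]\w+(?: [A-Z]\w+)*)", post) or [None, None])[1]
    # When: a simple HH:MM time mention.
    when = (re.search(r"\b(\d{1,2}:\d{2})\b", post) or [None, None])[1]
    return {"what": what, "where": where, "when": when, "who": author}

w = extract_5w("Huge fire at Central Station around 14:30, stay away!", "@user42")
print(w)
```

A production system would replace the regexes with named-entity recognition and geocoding, but the output shape (a structured 5W record per post, ready for GIS annotation) is the same.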

206 citations


Journal ArticleDOI
01 Jan 2020
TL;DR: A comprehensive and systematic review of existing research on four core algorithmic issues in spatial crowdsourcing: (1) task assignment, (2) quality control, (3) incentive mechanism design, and (4) privacy protection.
Abstract: Crowdsourcing is a computing paradigm where humans are actively involved in a computing task, especially for tasks that are intrinsically easier for humans than for computers. Spatial crowdsourcing is an increasingly popular category of crowdsourcing in the era of mobile Internet and the sharing economy, where tasks are spatiotemporal and must be completed at a specific location and time. In fact, spatial crowdsourcing has stimulated a series of recent industrial successes including the sharing economy for urban services (Uber and Gigwalk) and spatiotemporal data collection (OpenStreetMap and Waze). This survey dives deep into the challenges and techniques brought by the unique characteristics of spatial crowdsourcing. In particular, we identify four core algorithmic issues in spatial crowdsourcing: (1) task assignment, (2) quality control, (3) incentive mechanism design, and (4) privacy protection. We conduct a comprehensive and systematic review of existing research on these four issues. We also analyze representative spatial crowdsourcing applications and explain how they are enabled by these four technical issues. Finally, we discuss open questions that need to be addressed for future spatial crowdsourcing research and applications.
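
To make the first algorithmic issue (task assignment) concrete, here is a minimal greedy sketch in which each spatiotemporal task is matched to the nearest still-available worker within a travel radius. Real systems use matching or flow algorithms; this only illustrates the problem shape:

```python
from math import hypot

# Greedy spatial task assignment sketch: each task goes to the nearest
# still-available worker, provided that worker lies within `radius`.
def assign(tasks, workers, radius):
    free = dict(workers)              # worker_id -> (x, y)
    plan = {}                         # task_id -> worker_id
    for tid, (tx, ty) in tasks.items():
        best = min(free, key=lambda w: hypot(free[w][0] - tx, free[w][1] - ty),
                   default=None)
        if best is not None and hypot(free[best][0] - tx, free[best][1] - ty) <= radius:
            plan[tid] = best
            del free[best]            # each worker handles at most one task
    return plan

plan = assign({"t1": (0, 0), "t2": (5, 5)}, {"w1": (1, 0), "w2": (5, 4)}, radius=2)
print(plan)  # {'t1': 'w1', 't2': 'w2'}
```

The greedy order already shows why assignment quality depends on arrival order and distance bounds, which is exactly the design space the survey reviews.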

185 citations


Journal ArticleDOI
TL;DR: A privacy-preserving approach for learning effective personalized models on distributed user data while guaranteeing the differential privacy of user data is proposed and the experimental results demonstrate that the proposed approach is robust to user heterogeneity and offers a good tradeoff between accuracy and privacy.
Abstract: To provide intelligent and personalized services on smart devices, machine learning techniques have been widely used to learn from data, identify patterns, and make automated decisions. Machine learning processes typically require a large amount of representative data that are often collected through crowdsourcing from end users. However, user data can be sensitive in nature, and training machine learning models on these data may expose sensitive information of users, violating their privacy. Moreover, to meet the increasing demand for personalized services, these learned models should capture users' individual characteristics. This article proposes a privacy-preserving approach for learning effective personalized models on distributed user data while guaranteeing the differential privacy of user data. Practical issues in a distributed learning system, such as user heterogeneity, are considered in the proposed approach. In addition, the convergence property and privacy guarantee of the proposed approach are rigorously analyzed. The experimental results on realistic mobile sensing data demonstrate that the proposed approach is robust to user heterogeneity and offers a good tradeoff between accuracy and privacy.
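
A standard way to guarantee differential privacy of per-user updates (shown here as a generic sketch, not the paper's specific mechanism) is to clip each user's gradient to a fixed L2 bound and add calibrated Gaussian noise; the `clip` and `sigma` parameters below are illustrative:

```python
import random, math

# Generic DP sanitization sketch: clip the gradient to L2 norm `clip`,
# then add Gaussian noise scaled by `sigma * clip`. Parameter values are
# illustrative, not taken from the paper.
def dp_sanitize(grad, clip=1.0, sigma=0.5, rng=random.Random(0)):
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return [g * scale + rng.gauss(0.0, sigma * clip) for g in grad]

g = dp_sanitize([3.0, 4.0])   # norm 5 -> clipped to norm 1, then noised
print([round(x, 3) for x in g])
```

Clipping bounds any single user's influence on the aggregate, which is what makes the added noise sufficient for a formal privacy guarantee.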

141 citations


Journal ArticleDOI
Saide Zhu1, Zhipeng Cai1, Huafu Hu, Yingshu Li1, Wei Li1 
TL;DR: This article proposes an innovative hybrid blockchain crowdsourcing platform, named zkCrowd, which integrates a hybrid blockchain structure, smart contract, dual ledgers, and dual consensus protocols to secure communications, verify transactions, and preserve privacy.
Abstract: Blockchain, a promising decentralized paradigm, can be exploited not only to overcome the shortcomings of traditional crowdsourcing systems, but also to bring technical innovations, such as decentralization and accountability. Nevertheless, some critical inherent limitations of blockchain have rarely been addressed in the literature when it is incorporated into crowdsourcing, which may yield a performance bottleneck in crowdsourcing systems. To further leverage the superiority of combining blockchain and crowdsourcing, in this article we propose an innovative hybrid blockchain crowdsourcing platform, named zkCrowd. Our zkCrowd integrates a hybrid blockchain structure, smart contract, dual ledgers, and dual consensus protocols to secure communications, verify transactions, and preserve privacy. Both theoretical analysis and experiments are performed to evaluate the advantages of zkCrowd over the state of the art.

130 citations


Posted Content
TL;DR: This article proposes to integrate federated learning and local differential privacy (LDP) to enable crowdsourcing applications to train a machine learning model, and proposes four LDP mechanisms to perturb the gradients generated by vehicles.
Abstract: Internet of Vehicles (IoV) is a promising branch of the Internet of Things. IoV supports a large variety of crowdsourcing applications such as Waze, Uber, and Amazon Mechanical Turk. Users of these applications report real-time traffic information to a cloud server, which trains a machine learning model on the reported information for intelligent traffic management. However, crowdsourcing application owners can easily infer users' location information, which raises severe location privacy concerns. In addition, as the number of vehicles increases, the frequent communication between vehicles and the cloud server incurs substantial communication cost. To avoid the privacy threat and reduce the communication cost, in this paper we propose to integrate federated learning and local differential privacy (LDP) so that crowdsourcing applications can train the machine learning model collaboratively. Specifically, we propose four LDP mechanisms to perturb the gradients generated by vehicles. The Three-Outputs mechanism introduces three different output possibilities to deliver high accuracy when the privacy budget is small, and its outputs can be encoded with two bits to reduce the communication cost. To maximize performance when the privacy budget is large, we propose an optimal piecewise mechanism (PM-OPT), and further a suboptimal mechanism (PM-SUB) with a simple formula and utility comparable to PM-OPT. Finally, we build a novel hybrid mechanism by combining Three-Outputs and PM-SUB.
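
The core idea of a three-output gradient perturbation can be sketched with a toy unbiased three-level quantizer: a gradient value in [-1, 1] is mapped to one of {-1, 0, +1}, encodable in two bits, such that the expected output equals the input. The paper's Three-Outputs mechanism additionally calibrates these probabilities to satisfy ε-LDP; that calibration is omitted here:

```python
import random

# Toy three-level quantizer: maps x in [-1, 1] to {-1, 0, +1} so that
# E[out] = x (unbiased). NOT the paper's exact mechanism: the LDP
# probability calibration is deliberately left out.
def three_outputs(x, rng):
    assert -1.0 <= x <= 1.0
    return (1 if x > 0 else -1) if rng.random() < abs(x) else 0

rng = random.Random(42)
n = 100_000
est = sum(three_outputs(0.3, rng) for _ in range(n)) / n
print(round(est, 2))  # close to 0.3
```

Unbiasedness is what lets the server average many perturbed gradients and still recover the true mean update, while each individual report carries at most two bits.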

102 citations


Journal ArticleDOI
TL;DR: In this paper, a new generation of big data analytics (BDA) companies are crowdsourcing large volumes of online consumer reviews by means of controlled ad hoc online experiments and advanced machine learning (ML) techniques to forecast demand and determine the market potential for new products in several industries.

90 citations


Journal ArticleDOI
TL;DR: This work shows an incentive-based interaction between the crowdsourcing platform and the participating clients' independent strategies for training a global learning model, where each side maximizes its own benefit, and proposes a novel crowdsourcing framework to leverage FL that accounts for communication efficiency during parameter exchange.
Abstract: Federated learning (FL) rests on the notion of training a global model in a decentralized manner. Under this setting, mobile devices perform computations on their local data before uploading the required updates to improve the global model. However, when the participating clients implement an uncoordinated computation strategy, the difficulty is to handle the communication efficiency (i.e., the number of communications per iteration) while exchanging the model parameters during aggregation. Therefore, a key challenge in FL is how users participate to build a high-quality global model with communication efficiency. We tackle this issue by formulating a utility maximization problem, and propose a novel crowdsourcing framework to leverage FL that accounts for communication efficiency during parameter exchange. First, we show an incentive-based interaction between the crowdsourcing platform and the participating clients' independent strategies for training a global learning model, where each side maximizes its own benefit. We formulate a two-stage Stackelberg game to analyze such a scenario and find the game's equilibria. Second, we formalize an admission control scheme for participating clients to ensure a level of local accuracy. Simulation results demonstrate the efficacy of our proposed solution with up to 22% gain in the offered reward.
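
The two-stage Stackelberg structure can be sketched numerically with toy utilities (these quadratic cost functions are illustrative assumptions, not the paper's model): the platform (leader) posts a reward r, each client (follower) best-responds with effort maximizing its own payoff, and the platform then searches for the r that maximizes its utility given those best responses:

```python
# Toy two-stage Stackelberg sketch with illustrative utilities.
# Follower i chooses effort x maximizing r*x - c_i*x^2, so x_i* = r / (2*c_i).
def best_response(r, costs):
    return [r / (2 * c) for c in costs]

# Leader values total effort at `value_per_unit` and pays r per unit of effort.
def platform_utility(r, costs, value_per_unit=3.0):
    efforts = best_response(r, costs)
    return value_per_unit * sum(efforts) - sum(r * x for x in efforts)

costs = [1.0, 2.0, 4.0]
# Grid-search the leader's reward over (0, 5].
best_r = max((round(i * 0.01, 2) for i in range(1, 501)),
             key=lambda r: platform_utility(r, costs))
print(best_r)  # 1.5 (analytically, r* = value_per_unit / 2)
```

With these utilities the leader's payoff is S(3r - r^2) for S = sum(1/(2c_i)), so the equilibrium reward r* = 1.5 is independent of the cost profile, a simplification the paper's richer model does not share.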

77 citations


Journal ArticleDOI
TL;DR: From experimental results, it can be inferred that the proposed worker-selection incentive mechanism can inspire users to participate in crowd tasks and maximize the utility of mobile crowdsourcing systems effectively.

71 citations


Proceedings ArticleDOI
15 Oct 2020
TL;DR: In this article, the authors show that conventional crowdsourcing algorithms struggle in this user feedback setting, and present a new algorithm, SURF, that can cope with this nonresponse ambiguity.
Abstract: Supervised learning classifiers inevitably make mistakes in production, perhaps mislabeling an email, or flagging an otherwise routine transaction as fraudulent. It is vital that the end users of such a system are provided with a means of relabeling data points that they deem to have been mislabeled. The classifier can then be retrained on the relabeled data points in the hope of performance improvement. To reduce noise in this feedback data, well-known algorithms from the crowdsourcing literature can be employed. However, the feedback setting provides a new challenge: how do we know what to do in the case of user non-response? If a user provides us with no feedback on a label then it can be dangerous to assume they implicitly agree: a user can be busy, lazy, or no longer a user of the system! We show that conventional crowdsourcing algorithms struggle in this user feedback setting, and present a new algorithm, SURF, that can cope with this non-response ambiguity.
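
The naive baseline that SURF improves on can be sketched as a vote aggregator in which non-response is treated as missing rather than as implicit agreement (SURF itself models non-response probabilistically; this is only the comparison point the abstract alludes to):

```python
# Baseline sketch: aggregate explicit user feedback on a predicted binary
# label ("spam"/"ham"); None means the user gave no response and is ignored
# rather than counted as agreement.
def aggregate(predicted, feedback):
    agree = sum(1 for v in feedback.values() if v == "agree")
    disagree = sum(1 for v in feedback.values() if v == "disagree")
    if agree >= disagree:          # covers the all-None case: keep prediction
        return predicted
    return "ham" if predicted == "spam" else "spam"

print(aggregate("spam", {"u1": "disagree", "u2": "disagree", "u3": None}))  # ham
```

Note the pathology the paper points out: if `None` were instead counted as "agree", a single lazy user could outvote an explicit correction.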

Posted Content
TL;DR: An annotation schema and detailed annotation instructions are defined, reflecting the perspectives of journalists, fact-checkers, policymakers, government entities, social media platforms, and society as a whole on fighting the first global infodemic.
Abstract: With the outbreak of the COVID-19 pandemic, people turned to social media to read and to share timely information including statistics, warnings, advice, and inspirational stories. Unfortunately, alongside all this useful information, there was also a new blending of medical and political misinformation and disinformation, which gave rise to the first global infodemic. While fighting this infodemic is typically thought of in terms of factuality, the problem is much broader as malicious content includes not only fake news, rumors, and conspiracy theories, but also promotion of fake cures, panic, racism, xenophobia, and mistrust in the authorities, among others. This is a complex problem that needs a holistic approach combining the perspectives of journalists, fact-checkers, policymakers, government entities, social media platforms, and society as a whole. Taking them into account we define an annotation schema and detailed annotation instructions, which reflect these perspectives. We performed initial annotations using this schema, and our initial experiments demonstrated sizable improvements over the baselines. Now, we issue a call to arms to the research community and beyond to join the fight by supporting our crowdsourcing annotation efforts.

Journal ArticleDOI
TL;DR: Crowdsourcing efforts are currently underway to collect and analyze data from patients with cancer who are affected by the COVID-19 pandemic, filling key knowledge gaps to tackle crucial clinical questions on the complexities of infection with the causative coronavirus SARS-CoV-2 in the large, heterogeneous group of vulnerable patients with cancer.
Abstract: Crowdsourcing efforts are currently underway to collect and analyze data from patients with cancer who are affected by the COVID-19 pandemic. These community-led initiatives will fill key knowledge gaps to tackle crucial clinical questions on the complexities of infection with the causative coronavirus SARS-CoV-2 in the large, heterogeneous group of vulnerable patients with cancer.

Journal ArticleDOI
TL;DR: A privacy-aware task allocation and data aggregation scheme (PTAA) is proposed leveraging bilinear pairing and homomorphic encryption and security analysis shows that PTAA can achieve the desirable security goals.
Abstract: Spatial crowdsourcing (SC) enables task owners (TOs) to outsource spatial-related tasks to a SC-server who engages mobile users in collecting sensing data at some specified locations with their mobile devices. Data aggregation, as a specific SC task, has drawn much attention in mining the potential value of the massive spatial crowdsensing data. However, the release of SC tasks and the execution of data aggregation may pose considerable threats to the privacy of TOs and mobile users, respectively. Besides, it is nontrivial for the SC-server to allocate numerous tasks efficiently and accurately to qualified mobile users, as the SC-server has no knowledge about the entire geographical user distribution. To tackle these issues, in this paper, we introduce a fog-assisted SC architecture, in which many fog nodes deployed in different regions can assist the SC-server to distribute tasks and aggregate data in a privacy-aware manner. Specifically, a privacy-aware task allocation and data aggregation scheme (PTAA) is proposed leveraging bilinear pairing and homomorphic encryption. PTAA supports representative aggregate statistics (e.g., sum, mean, variance, and minimum) with efficient data update while providing strong privacy protection. Security analysis shows that PTAA can achieve the desirable security goals. Extensive experiments also demonstrate its feasibility and efficiency.
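
The privacy goal of aggregate statistics without exposing individual readings can be illustrated with a toy additive-secret-sharing sketch: each user splits a reading into random shares for two non-colluding aggregators, so neither sees an individual value, yet the combined totals reveal the sum. The paper itself uses bilinear pairing and homomorphic encryption; this only illustrates the aggregation goal:

```python
import random

# Toy additive secret sharing over a prime field: value = (s1 + s2) mod P,
# where s1 is uniformly random, so each share alone reveals nothing.
P = 2**31 - 1

def share(value, rng):
    s1 = rng.randrange(P)
    return s1, (value - s1) % P

rng = random.Random(7)
readings = [12, 30, 7]
shares = [share(v, rng) for v in readings]
sum1 = sum(s1 for s1, _ in shares) % P   # aggregator 1's partial total
sum2 = sum(s2 for _, s2 in shares) % P   # aggregator 2's partial total
print((sum1 + sum2) % P)  # 49
```

Homomorphic encryption achieves the same "compute on hidden values" effect with a single untrusted aggregator, at the cost of heavier cryptography.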

Journal ArticleDOI
TL;DR: This paper observes the development trend of blockchain technology from the perspective of global governments and enterprises, and surveys the main technologies for security, privacy, and trust in crowdsourcing services together with related application scenarios.
Abstract: Blockchain is a new decentralized distributed technology, which guarantees trusted transactions in untrustworthy environments by realizing a value-transfer network. Because of its importance in leading human society from the information-transmission internet era to the value-transmission internet era, it has attracted the attention of researchers in crowdsourcing services. This paper first observes the development trend of blockchain technology from the perspective of global governments and enterprises. We then briefly review the related concepts and the basic model of blockchain. On this basis, we comprehensively summarize the state of blockchain research based on recently published articles. To show its functional value, we further investigate the main technologies for security, privacy, and trust in crowdsourcing services and the application scenarios related to this field. Finally, the advantages and challenges of blockchains are discussed. We hope this provides a useful reference for future research on blockchain technology in crowdsourcing services.

Journal ArticleDOI
TL;DR: This paper devises a grid-based location protection method, which can protect the locations of workers and tasks while keeping distance-aware information on the protected locations, so that the distance between tasks and workers can still be quantified.
Abstract: Privacy leakage is a serious issue in spatial crowdsourcing in various scenarios. In this paper, we study privacy protection in spatial crowdsourcing. The main challenge is to efficiently assign tasks to nearby workers without needing to know the exact locations of tasks and workers. To address this problem, we propose a privacy-preserving framework without online trusted third parties. We devise a grid-based location protection method, which can protect the locations of workers and tasks while keeping the distance-aware information on the protected locations such that we can quantify the distance between tasks and workers. We propose an efficient task assignment algorithm, which can instantly assign tasks to nearby workers on encrypted data. To protect the task content, we leverage both attribute-based encryption and symmetric-key encryption to establish secure channels through servers, which ensures that the task is delivered securely and accurately by any untrusted server. Moreover, we analyze the security properties of our method. We conducted extensive experiments on real-world datasets. Experimental results show that our method outperforms existing approaches.
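
The grid idea can be sketched in a few lines: workers and tasks report only the cell they fall in, and the server ranks candidates by inter-cell distance. Cell size trades privacy against assignment accuracy (a larger cell hides location more but blurs distances). This is illustrative only; the paper combines it with encryption of task content:

```python
from math import hypot

# Grid-based location protection sketch: report the cell index instead of
# the exact coordinate.
def cell(x, y, size):
    return (int(x // size), int(y // size))

# Distance between cell centers, a proxy for the true worker-task distance.
def cell_distance(c1, c2, size):
    return hypot((c1[0] - c2[0]) * size, (c1[1] - c2[1]) * size)

task = cell(12.3, 45.6, size=10)      # -> (1, 4)
worker = cell(18.9, 41.2, size=10)    # -> (1, 4): same cell
print(task, worker, cell_distance(task, worker, 10))  # distance proxy 0.0
```

The server can thus find "nearby" workers while the true coordinates (12.3, 45.6) vs (18.9, 41.2) are never revealed, only their shared cell.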

Proceedings ArticleDOI
21 Apr 2020
TL;DR: Twitter A11y increases access to social media platforms for people with visual impairments by providing high-quality automatic descriptions for user-posted images by increasing alt-text coverage from 7.6% to 78.5%, before crowdsourcing descriptions for the remaining images.
Abstract: Social media platforms are integral to public and private discourse, but are becoming less accessible to people with vision impairments due to an increase in user-posted images. Some platforms (e.g., Twitter) let users add image descriptions (alternative text), but only 0.1% of images include them. To address this accessibility barrier, we created Twitter A11y, a browser extension that adds alternative text on Twitter using six methods. For example, screenshots of text are common, so we detect textual images and create alternative text using optical character recognition. Twitter A11y also leverages services to automatically generate alternative text or reuses existing descriptions from across the web. We compare the coverage and quality of Twitter A11y's six alt-text strategies by evaluating the timelines of 50 self-identified blind Twitter users. We find that Twitter A11y increases alt-text coverage from 7.6% to 78.5%, before crowdsourcing descriptions for the remaining images. We estimate that 57.5% of returned descriptions are high-quality. We then report on the experiences of 10 participants with visual impairments who used the tool during a week-long deployment. Twitter A11y increases access to social media platforms for people with visual impairments by providing high-quality automatic descriptions for user-posted images.

Journal ArticleDOI
TL;DR: Although crowdsourcing is effective at improving behavioral outcomes, more research is needed to understand effects on clinical outcomes and costs and to develop artificial intelligence systems in medicine.
Abstract: Crowdsourcing is used increasingly in health and medical research. Crowdsourcing is the process of aggregating crowd wisdom to solve a problem. The purpose of this systematic review is to summarize quantitative evidence on crowdsourcing to improve health. We followed Cochrane systematic review guidance and systematically searched seven databases up to September 4, 2019. Studies were included if they reported on crowdsourcing and related to health or medicine. Studies were excluded if recruitment was the only use of crowdsourcing. We determined the level of evidence associated with review findings using the GRADE approach. We screened 3508 citations, accessed 362 articles, and included 188 studies. Ninety-six studies examined effectiveness, 127 examined feasibility, and 37 examined cost. The most common purposes were to evaluate surgical skills (17 studies), to create sexual health messages (seven studies), and to provide layperson cardio-pulmonary resuscitation (CPR) out-of-hospital (six studies). Seventeen observational studies used crowdsourcing to evaluate surgical skills, finding that crowdsourcing evaluation was as effective as expert evaluation (low quality). Four studies used a challenge contest to solicit human immunodeficiency virus (HIV) testing promotion materials and increase HIV testing rates (moderate quality), and two of the four studies found this approach saved money. Three studies suggested that an interactive technology system increased rates of layperson-initiated CPR out-of-hospital (moderate quality). However, studies analyzing crowdsourcing to evaluate surgical skills and layperson-initiated CPR were only from high-income countries. Five studies examined crowdsourcing to inform artificial intelligence projects, most often related to annotation of medical data. Crowdsourcing was evaluated using different outcomes, limiting the extent to which studies could be pooled. Crowdsourcing has been used to improve health in many settings. Although crowdsourcing is effective at improving behavioral outcomes, more research is needed to understand effects on clinical outcomes and costs. More research is needed on crowdsourcing as a tool to develop artificial intelligence systems in medicine. PROSPERO: CRD42017052835. December 27, 2016.

Journal ArticleDOI
TL;DR: This article proposes the Markov and Collaborative filtering-based Task Recommendation (MCTR) model and, based on the Walrasian equilibrium, derives the optimum solution to maximize the social welfare of mobile crowdsourcing systems.
Abstract: With the rapid development of Industry 5.0 and mobile devices, mobile crowdsensing networks have become an important research focus. Task allocation is a key problem in mobile crowdsourcing systems: it determines how to inspire crowd workers to participate in crowd tasks and to provide truthful sensed data, which still poses many challenges. In this article, based on the Markov model and the collaborative filtering model, similarities, trajectory prediction, dwell time, and trust degree are considered to propose the Markov and Collaborative filtering-based Task Recommendation (MCTR) model. Then, based on the Walrasian equilibrium, the optimum solution is derived to maximize the social welfare of mobile crowdsourcing systems. Finally, comparison experiments are carried out to evaluate the performance of the proposed multiobjective optimization and Markov-based task allocation against other methods. These experiments show that the proposed task allocation improves the efficiency and adaptability of mobile crowdsourcing systems.
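
One ingredient of MCTR, trajectory prediction, can be sketched as a first-order Markov chain: estimate transition counts from a worker's past region sequence and predict the most likely next region. (Illustrative only; MCTR also combines collaborative filtering, dwell time, and trust degree.)

```python
from collections import Counter, defaultdict

# First-order Markov sketch of trajectory prediction: count region-to-region
# transitions in the observed trajectory.
def fit(trajectory):
    trans = defaultdict(Counter)
    for a, b in zip(trajectory, trajectory[1:]):
        trans[a][b] += 1
    return trans

# Predict the most frequent successor of the current region (None if unseen).
def predict_next(trans, current):
    return trans[current].most_common(1)[0][0] if trans[current] else None

trans = fit(["A", "B", "A", "B", "C", "A", "B"])
print(predict_next(trans, "A"))  # 'B'
```

Predicting where a worker will be next is what lets the recommender offer tasks along the worker's likely route instead of at their last known location.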

Journal ArticleDOI
22 Jun 2020
TL;DR: This article introduces approval voting to utilize the expertise of workers who have partial knowledge of the true answer, couples it with two strictly proper scoring rules, and establishes attractive optimality and uniqueness properties of the scoring rules.
Abstract: The growing need for labeled training data has made crowdsourcing a vital tool for developing machine learning applications. Here, workers on a crowdsourcing platform are typically shown a list of unlabeled items, and for each of these items, are asked to choose a label from one of the provided options. The workers in crowdsourcing platforms are not experts, thereby making it essential to judiciously elicit the information known to the workers. With respect to this goal, there are two key shortcomings of current systems: (i) the incentives of the workers are not aligned with those of the requesters; and (ii) the interface does not allow workers to convey their knowledge accurately by forcing them to make a single choice among a set of options. In this article, we address these issues by introducing approval voting to utilize the expertise of workers who have partial knowledge of the true answer and coupling it with two strictly proper scoring rules. We additionally establish attractive properties of optimality and uniqueness of our scoring rules. We also conduct preliminary empirical studies on Amazon Mechanical Turk, and the results of these experiments validate our approach.
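
The point of a strictly proper scoring rule is that a worker's expected score is uniquely maximized by reporting true beliefs. The paper's rules score approval sets; as a generic illustration of strict propriety, here is the standard quadratic (Brier) scoring rule over a reported probability vector:

```python
# Quadratic (Brier) scoring rule: score(report, y) = 2*report[y] - sum_i report[i]^2.
# Strictly proper: expected score is uniquely maximized by reporting true beliefs.
def brier_score(report, true_label):
    return 2 * report.get(true_label, 0.0) - sum(p * p for p in report.values())

# Expected score when the true label is drawn from the worker's belief.
def expected_score(report, belief):
    return sum(belief[y] * brier_score(report, y) for y in belief)

belief = {"cat": 0.7, "dog": 0.3}          # what the worker actually believes
honest = belief
hedged = {"cat": 0.5, "dog": 0.5}
overconfident = {"cat": 1.0, "dog": 0.0}
print(expected_score(honest, belief) > max(expected_score(hedged, belief),
                                           expected_score(overconfident, belief)))  # True
```

Both hedging and overconfidence lose expected score, which is exactly the incentive alignment the article wants from its approval-voting interface.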

Journal ArticleDOI
TL;DR: The last few years have seen the emergence of two new ways in which firms interact with outside stakeholders, namely crowdsourcing and crowdfunding service providers.
Abstract: In the last few years, we have seen the emergence of two new ways in which firms interact with outside stakeholders, namely crowdsourcing and crowdfunding service providers. In this article, we def...

Journal ArticleDOI
TL;DR: The approximation performance of the proposed Secure Reverse Auction (SRA) protocol is analyzed and it is proved that it has some desired properties, including truthfulness, individual rationality, computational efficiency, and security.
Abstract: In this paper, we study a new type of spatial crowdsourcing, namely competitive detour tasking, where workers can make detours from their original travel paths to perform multiple tasks, and each worker is allowed to compete for preferred tasks by strategically claiming his/her detour costs. The objective is to make suitable task assignment by maximizing the social welfare of crowdsourcing systems and protecting workers’ private sensitive information. We first model the task assignment problem as a reverse auction process. We formalize the winning bid selection of reverse auction as an $n$-to-one weighted bipartite graph matching problem with multiple 0-1 knapsack constraints. Since this problem is NP-hard, we design an approximation algorithm to select winning bids and determine corresponding payments. Based on this, a Secure Reverse Auction (SRA) protocol is proposed for this novel spatial crowdsourcing. We analyze the approximation performance of the proposed protocol and prove that it has some desired properties, including truthfulness, individual rationality, computational efficiency, and security. To the best of our knowledge, this is the first theoretically provable secure auction protocol for spatial crowdsourcing systems. In addition, we also conduct extensive simulations on a real trace to verify the performance of the proposed protocol.
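
The winner-selection step of a reverse auction can be sketched greedily: accept bids in increasing order of claimed cost per unit of value until a knapsack-style budget is exhausted. The paper's approximation algorithm and its truthful payment rule are more involved; this shows only the selection shape:

```python
# Greedy winning-bid selection sketch for a reverse auction.
# bids: (bidder, claimed_cost, value); `capacity` caps total accepted cost.
def select_winners(bids, capacity):
    chosen, spent = [], 0.0
    # Cheapest cost-per-value first, i.e. best "bang per buck" for the buyer.
    for bidder, cost, value in sorted(bids, key=lambda b: b[1] / b[2]):
        if spent + cost <= capacity:
            chosen.append(bidder)
            spent += cost
    return chosen

bids = [("w1", 4.0, 10.0), ("w2", 9.0, 9.0), ("w3", 2.0, 3.0)]
print(select_winners(bids, capacity=7.0))  # ['w1', 'w3']
```

A truthful mechanism would additionally pay each winner a threshold price (the highest bid at which they would still win) rather than their claimed cost, so that misreporting detour costs cannot help.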


Posted Content
TL;DR: ‘ETHOS’ (multi-labEl haTe speecH detectiOn dataSet), a textual dataset with two variants, binary and multi-label, based on YouTube and Reddit comments validated using the Figure-Eight crowdsourcing platform, is presented together with the annotation protocol used to create it.
Abstract: Online hate speech is a recent problem in our modern society, growing at a steady rate by exploiting weaknesses of the corresponding regimes that characterise several social media platforms. The phenomenon is mainly cultivated through comments, either during users' interaction or on posted multimedia content. Nowadays, giant companies own platforms where many millions of users log in daily. Thus, protecting their users from exposure to similar phenomena, both for keeping up with the corresponding law and for retaining a high quality of offered services, seems mandatory. Having a robust and reliable mechanism for identifying and preventing the uploading of related material would have a huge effect on several aspects of our daily life, while its absence would heavily deteriorate the total user experience and its erroneous operation might raise several ethical issues. In this work, we present a protocol for creating a dataset that is more suitable in both its informativeness and representativeness aspects, favouring the safer capture of hate speech occurrences without restricting its applicability to other classification problems. Moreover, we produce and publish a textual dataset with two variants, binary and multi-label, called ‘ETHOS’, based on YouTube and Reddit comments validated through the Figure-Eight crowdsourcing platform. Our assumption about the production of more compatible datasets is further investigated by applying various classification models and recording their behaviour over several appropriate metrics.

Proceedings ArticleDOI
20 Apr 2020
TL;DR: This work studies a novel spatial crowdsourcing problem, namely Predictive Task Assignment (PTA), which aims to maximize the number of assigned tasks by taking into account both current and future workers/tasks that enter the system dynamically with location unknown in advance and proposes a two-phase data-driven framework.
Abstract: With the rapid development of mobile networks and the widespread usage of mobile devices, spatial crowdsourcing, which refers to assigning location-based tasks to moving workers, has drawn increasing attention. One of the major issues in spatial crowdsourcing is task assignment, which allocates tasks to appropriate workers. However, existing works generally assume static offline scenarios, where the spatio-temporal information of all workers and tasks is determined and known a priori. Ignoring the dynamic spatio-temporal distributions of workers and tasks can often lead to poor assignment results. In this work we study a novel spatial crowdsourcing problem, namely Predictive Task Assignment (PTA), which aims to maximize the number of assigned tasks by taking into account both current and future workers/tasks that enter the system dynamically with locations unknown in advance. We propose a two-phase data-driven framework. The prediction phase combines different learning models to predict the locations and routes of future workers and designs a graph embedding approach to estimate the distribution of future tasks. In the assignment phase, we propose both a greedy algorithm for large-scale applications and an optimal algorithm with graph-partition-based decomposition. Extensive experiments on two real datasets demonstrate the effectiveness of our framework.

Proceedings ArticleDOI
20 Apr 2020
TL;DR: A novel privacy mechanism based on Hierarchically Well-Separated Trees (HSTs) is designed and extensive experiments show that online task assignment under this privacy mechanism is notably more effective in terms of total distance than under prior differentially private mechanisms.
Abstract: With spatial crowdsourcing applications such as Uber and Waze deeply penetrating everyday life, there is a growing concern to protect user privacy in spatial crowdsourcing. In particular, the locations of workers and tasks should be properly processed via a privacy mechanism before being reported to the untrusted spatial crowdsourcing server for task assignment. Privacy mechanisms typically perturb the location information, which tends to make task assignment ineffective. Prior studies only provide guarantees on privacy protection without assuring the effectiveness of task assignment. In this paper, we investigate privacy protection for online task assignment with the objective of minimizing the total distance, an important task assignment formulation in spatial crowdsourcing. We design a novel privacy mechanism based on Hierarchically Well-Separated Trees (HSTs). We prove that the mechanism is $\varepsilon$-Geo-Indistinguishable and show that there is a task assignment algorithm with a competitive ratio of $O\left(\frac{1}{\varepsilon^4}\log N \log^2 k\right)$, where $\varepsilon$ is the privacy budget, N is the number of predefined points on the HST, and k is the matching size. Extensive experiments on synthetic and real datasets show that online task assignment under our privacy mechanism is notably more effective in terms of total distance than under prior differentially private mechanisms.
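As an illustration of the geo-indistinguishability notion referenced above, here is a minimal sketch of the standard planar Laplace mechanism (note: this is the canonical mechanism for the definition, not the paper's HST-based mechanism; the function name and parameters are illustrative):

```python
import math
import random

def planar_laplace(x, y, eps):
    """Report a perturbed 2-D location satisfying eps-geo-indistinguishability.

    Planar (polar) Laplace mechanism: pick a uniform direction, then a
    radius from Gamma(2, 1/eps), the radial marginal of the planar
    Laplace density, so the expected displacement is 2 / eps.
    """
    theta = random.uniform(0.0, 2.0 * math.pi)  # uniform direction
    r = random.gammavariate(2, 1.0 / eps)       # expected radius: 2 / eps
    return x + r * math.cos(theta), y + r * math.sin(theta)
```

Smaller `eps` means stronger privacy but larger expected displacement, which is exactly the privacy/effectiveness tension the paper's task assignment analysis addresses.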

Journal ArticleDOI
TL;DR: This paper studies a destination-aware task assignment problem that concerns the optimal strategy of assigning each task to a proper worker such that the total number of completed tasks is maximized while all workers can reach their destinations before their deadlines after performing the assigned tasks.
Abstract: With the proliferation of GPS-enabled smart devices and the increased availability of wireless networks, spatial crowdsourcing (SC) has recently been proposed as a framework to automatically request workers (i.e., smart device carriers) to perform location-sensitive tasks (e.g., taking scenic photos, reporting events). In this paper, we study a destination-aware task assignment problem that concerns the optimal strategy of assigning each task to a proper worker such that the total number of completed tasks is maximized while all workers can reach their destinations before their deadlines after performing the assigned tasks. Finding the globally optimal assignment turns out to be an intractable problem, since it does not imply an optimal assignment for each individual worker. Observing that the task assignment dependency only exists amongst subsets of workers, we utilize a tree-decomposition technique to separate workers into independent clusters and develop an efficient depth-first search algorithm with progressive bounds to prune non-promising assignments. To make our proposed framework applicable to more scenarios, we further optimize the original framework by proposing strategies to reduce the overall travel cost and to allow each task to be assigned to multiple workers. Extensive empirical studies verify that the proposed technique and optimization strategies perform effectively and address the problem nicely.
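A depth-first search with a progressive bound, in the spirit of the algorithm above, can be sketched in a deliberately simplified setting (assumptions: each worker takes at most one task, and a caller-supplied `feasible(w, t)` predicate stands in for the destination/deadline constraint; the paper's actual algorithm additionally works on tree-decomposed worker clusters):

```python
def best_assignment(workers, tasks, feasible):
    """Maximize the number of completed tasks via DFS with a simple bound.

    workers, tasks: sequences (only their lengths matter here);
    feasible(w, t): True if worker index w can perform task index t
    and still reach their destination in time.
    """
    best = [0]

    def dfs(w, used, count):
        remaining = len(workers) - w
        if count + remaining <= best[0]:   # bound: even assigning every
            return                         # remaining worker can't improve
        if w == len(workers):
            best[0] = count
            return
        for t in range(len(tasks)):        # try each unassigned feasible task
            if t not in used and feasible(w, t):
                dfs(w + 1, used | {t}, count + 1)
        dfs(w + 1, used, count)            # or leave worker w unassigned

    dfs(0, frozenset(), 0)
    return best[0]
```

The bound prunes any branch whose best possible completion count cannot beat the incumbent, which is the core idea behind "progressive bounds" even though the real algorithm's bound is tighter.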

Proceedings ArticleDOI
01 Jul 2020
TL;DR: It is shown that co-attention models which explicitly encode dialogue history outperform models that don't, achieving state-of-the-art performance, and a challenging subset (VisDialConv) of the VisDial val set is proposed with a benchmark NDCG of 63%.
Abstract: Visual Dialogue involves "understanding" the dialogue history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to accurately generate the correct response. In this paper, we show that co-attention models which explicitly encode dialogue history outperform models that don't, achieving state-of-the-art performance (72% NDCG on the val set). However, we also expose shortcomings of the crowdsourced dataset collection procedure, by showing that dialogue history is indeed only required for a small amount of the data, and that the current evaluation metric encourages generic replies. To that end, we propose a challenging subset (VisDialConv) of the VisDial val set with a benchmark NDCG of 63%.

Journal ArticleDOI
TL;DR: In this paper, the authors classify the recent solutions into four different categories: matrix factorization based models (MF-based models), gradient boosting tree based models, deep learning based models and ranking based models.
Abstract: The rapid development of Community Question Answering (CQA) satisfies users’ quest for professional and personal knowledge about anything. In CQA, one central issue is to find users with the expertise and willingness to answer the given questions. Expert finding in CQA often exhibits very different challenges compared to traditional settings. The new features of CQA (such as huge volume, sparse data and crowdsourcing) violate fundamental assumptions of traditional recommendation systems. This paper focuses on reviewing and categorizing the current progress on expert finding in CQA. We classify the recent solutions into four different categories: matrix factorization based models (MF-based models), gradient boosting tree based models (GBT-based models), deep learning based models (DL-based models) and ranking based models (R-based models). We find that MF-based models outperform other categories of models in the crowdsourcing situation. Moreover, we use innovative diagrams to clarify several important concepts of ensemble learning, and find that ensemble models combining several specific single models can further boost performance. Further, we compare the performance of different models on different types of matching tasks, including text vs. text, graph vs. text, audio vs. text and video vs. text. The results will help with model selection for expert finding in practice. Finally, we explore some potential future issues in expert finding research in CQA.
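As a rough sketch of the MF-based family surveyed above (a generic latent-factor model, not any specific paper's method), the following factorizes a small user-question score matrix with plain SGD; `None` marks unobserved entries:

```python
import random

def factorize(R, k=2, steps=2000, lr=0.01, reg=0.02):
    """SGD matrix factorization of a user-question score matrix R.

    R: list of rows, with None for unobserved (user, question) pairs.
    Returns factor matrices U (n_users x k) and Q (n_questions x k);
    the dot product U[i] . Q[j] predicts user i's score on question j,
    which can be used to rank candidate experts for a question.
    """
    random.seed(0)                      # deterministic init for the sketch
    n, m = len(R), len(R[0])
    U = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
    Q = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(m)]
    for _ in range(steps):
        for i in range(n):
            for j in range(m):
                if R[i][j] is None:     # only observed entries drive updates
                    continue
                err = R[i][j] - sum(U[i][f] * Q[j][f] for f in range(k))
                for f in range(k):
                    u, q = U[i][f], Q[j][f]
                    U[i][f] += lr * (err * q - reg * u)
                    Q[j][f] += lr * (err * u - reg * q)
    return U, Q
```

Because only observed entries enter the loss, this style of model copes with the data sparsity the survey highlights: missing (user, question) pairs are predicted from the learned factors rather than treated as zeros.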

Journal ArticleDOI
TL;DR: An automated method for design concept assessment that provides a possible avenue to rate design concepts deterministically and hints at bias in human design concept selection is developed and demonstrated.
Abstract: In order to develop novel solutions for complex systems and in increasingly competitive markets, it may be advantageous to generate large numbers of design concepts and then to identify the most novel and valuable ideas. However, it can be difficult to process, review, and assess thousands of design concepts. Based on this need, we develop and demonstrate an automated method for design concept assessment. In the method, machine learning technologies are first applied to extract ontological data from design concepts. Then, a filtering strategy and quantitative metrics are introduced that enable creativity rating based on the ontological data. This method is tested empirically. Design concepts are crowd-generated for a variety of actual industry design problems/opportunities. Over 4000 design concepts were generated by humans for assessment. Empirical evaluation assesses: (1) the correspondence of the automated ratings with human creativity ratings; (2) whether concepts selected using the method are highly scored by another set of crowd raters; and finally (3) whether high-scoring designs correlate with industrial technology development. The method provides a possible avenue to rate design concepts deterministically. A highlight is that a subset of designs selected automatically out of a large set of candidates was scored higher than a subset selected by humans when evaluated by a set of third-party raters. The results hint at bias in human design concept selection and encourage further study of this topic.