Author

Gianluca Demartini

Bio: Gianluca Demartini is an academic researcher from the University of Queensland. The author has contributed to research in topics including Crowdsourcing and Computer science. The author has an h-index of 27 and has co-authored 156 publications receiving 3169 citations. Previous affiliations of Gianluca Demartini include the University of California, Berkeley and the Leibniz University of Hanover.


Papers
Proceedings ArticleDOI
16 Apr 2012
TL;DR: A probabilistic framework is developed to make sensible decisions about candidate links and to identify unreliable human workers, improving the quality of the links while limiting the amount of work performed by the crowd.
Abstract: We tackle the problem of entity linking for large collections of online pages. Our system, ZenCrowd, identifies entities from natural language text using state-of-the-art techniques and automatically connects them to the Linked Open Data cloud. We show how one can take advantage of human intelligence to improve the quality of the links by dynamically generating micro-tasks on an online crowdsourcing platform. We develop a probabilistic framework to make sensible decisions about candidate links and to identify unreliable human workers. We evaluate ZenCrowd in a real deployment and show how a combination of both probabilistic reasoning and crowdsourcing techniques can significantly improve the quality of the links, while limiting the amount of work performed by the crowd.
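The core of the approach is the joint estimation of link validity and worker reliability. The following is a minimal sketch of that idea as an EM-style loop, assuming a simple vote model; the toy votes, the reliability prior, and the update rules are illustrative assumptions, not ZenCrowd's exact probabilistic network.

```python
# A minimal sketch of probabilistic vote aggregation in the spirit of
# ZenCrowd: worker reliabilities and link decisions are estimated jointly.
# Data and update rules are illustrative assumptions, not the paper's
# exact factor-graph formulation.

# votes[(worker, candidate_link)] = 1 (valid) or 0 (invalid)
votes = {
    ("w1", "mention1->dbpedia:Zurich"): 1,
    ("w2", "mention1->dbpedia:Zurich"): 1,
    ("w3", "mention1->dbpedia:Zurich"): 0,
    ("w1", "mention2->dbpedia:Geneva"): 0,
    ("w2", "mention2->dbpedia:Geneva"): 1,
    ("w3", "mention2->dbpedia:Geneva"): 0,
}

links = {link for (_, link) in votes}
workers = {w for (w, _) in votes}
reliability = {w: 0.7 for w in workers}   # assumed prior worker reliability

for _ in range(20):  # fixed number of EM-style iterations
    # E-step: probability that each link is valid, given reliabilities
    belief = {}
    for link in links:
        p_valid, p_invalid = 1.0, 1.0
        for (w, l), v in votes.items():
            if l != link:
                continue
            r = reliability[w]
            # a reliable worker votes 1 on valid links, 0 on invalid ones
            p_valid *= r if v == 1 else (1 - r)
            p_invalid *= (1 - r) if v == 1 else r
        belief[link] = p_valid / (p_valid + p_invalid)

    # M-step: re-estimate each worker's reliability from the beliefs
    for w in workers:
        agree, total = 0.0, 0.0
        for (wi, l), v in votes.items():
            if wi != w:
                continue
            agree += belief[l] if v == 1 else (1 - belief[l])
            total += 1
        reliability[w] = agree / total

for link in sorted(links):
    print(f"{link}: P(valid) = {belief[link]:.2f}")
```

Workers who systematically disagree with the consensus end up with low reliability, so their votes carry less weight in the final link decisions.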

454 citations

Proceedings ArticleDOI
18 Apr 2015
TL;DR: The prevalent malicious activity on crowdsourcing platforms is analyzed, different types of workers in the crowd are defined, a method to measure malicious activity is proposed, and guidelines for the efficient design of crowdsourced surveys are presented.
Abstract: Crowdsourcing is increasingly being used as a means to tackle problems requiring human intelligence. With the ever-growing worker base that aims to complete microtasks on crowdsourcing platforms in exchange for financial gains, there is a need for stringent mechanisms to prevent exploitation of deployed tasks. Quality control mechanisms need to accommodate a diverse pool of workers, exhibiting a wide range of behavior. A pivotal step towards fraud-proof task design is understanding the behavioral patterns of microtask workers. In this paper, we analyze the prevalent malicious activity on crowdsourcing platforms and study the behavior exhibited by trustworthy and untrustworthy workers, particularly on crowdsourced surveys. Based on our analysis of the typical malicious activity, we define and identify different types of workers in the crowd, propose a method to measure malicious activity, and finally present guidelines for the efficient design of crowdsourced surveys.
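To make the idea of measuring malicious activity concrete, here is a minimal, hypothetical sketch of signals commonly used to flag untrustworthy survey workers; the `Response` fields, thresholds, and weights are assumptions for illustration, not the paper's actual method.

```python
# A hypothetical scoring of suspicious survey behavior: attention-check
# failures, straight-lining, and implausibly fast completion. Thresholds
# and weights are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Response:
    worker_id: str
    answers: list[int]              # Likert-scale answers, e.g. 1..5
    passed_attention_check: bool
    seconds_taken: float

def maliciousness_score(r: Response, min_seconds: float = 30.0) -> float:
    """Score in [0, 1]; higher means more suspicious."""
    score = 0.0
    if not r.passed_attention_check:
        score += 0.5
    if len(set(r.answers)) == 1:    # straight-lining: identical answers
        score += 0.3
    if r.seconds_taken < min_seconds:  # rushing through the survey
        score += 0.2
    return min(score, 1.0)

responses = [
    Response("w1", [3, 4, 2, 5], True, 120.0),
    Response("w2", [1, 1, 1, 1], False, 12.0),
]
for r in responses:
    print(r.worker_id, maliciousness_score(r))
```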

230 citations

Proceedings ArticleDOI
13 May 2013
TL;DR: This paper proposes and extensively evaluates a different crowdsourcing approach based on a push methodology that carefully selects which workers should perform a given task, based on worker profiles extracted from social networks, and shows that this approach consistently yields better results than the usual pull strategies.
Abstract: Crowdsourcing makes it possible to build hybrid online platforms that combine scalable information systems with the power of human intelligence to complete tasks that are difficult to tackle for current algorithms. Examples include hybrid database systems that use the crowd to fill missing values or to sort items according to subjective dimensions such as picture attractiveness. Current approaches to crowdsourcing adopt a pull methodology: tasks are published on specialized Web platforms where workers can pick their preferred tasks on a first-come-first-served basis. While this approach has many advantages, such as simplicity and short completion times, it does not guarantee that the task is performed by the most suitable worker. In this paper, we propose and extensively evaluate a different crowdsourcing approach based on a push methodology. Our proposed system carefully selects which workers should perform a given task based on worker profiles extracted from social networks. Workers and tasks are automatically matched using an underlying categorization structure that exploits entities extracted from the task descriptions on the one hand, and categories liked by the user on social platforms on the other. We experimentally evaluate our approach on tasks of varying complexity and show that our push methodology consistently yields better results than the usual pull strategies.
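As a concrete illustration of the push methodology, the sketch below matches a task's extracted entities against workers' liked categories and ranks workers by overlap. The Jaccard scoring, the `jaccard` helper, and the toy profiles are assumptions; the paper relies on a richer underlying categorization structure.

```python
# A toy sketch of push-style worker selection: rank workers by the
# overlap between entities extracted from the task description and
# categories the worker liked on a social platform.

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

task_entities = {"Formula 1", "Monaco", "motorsport"}

worker_profiles = {
    "alice": {"motorsport", "Formula 1", "travel"},
    "bob":   {"cooking", "gardening"},
}

# push the task to the best-matching workers first
ranked = sorted(worker_profiles.items(),
                key=lambda kv: jaccard(task_entities, kv[1]),
                reverse=True)
for worker, profile in ranked:
    print(worker, round(jaccard(task_entities, profile), 2))
```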

165 citations

Proceedings ArticleDOI
18 May 2015
TL;DR: This paper uses the main findings of a five-year log analysis to propose features for a predictive model that determines the expected performance of any batch at a specific point in time, and shows that the number of tasks left in a batch and how recent the batch is are two key features of the prediction.
Abstract: Micro-task crowdsourcing is rapidly gaining popularity among research communities and businesses as a means to leverage Human Computation in their daily operations. Unlike any other service, a crowdsourcing platform is in fact a marketplace subject to human factors that affect its performance, both in terms of speed and quality. Indeed, such factors shape the dynamics of the crowdsourcing market. For example, a known behavior of such markets is that increasing the reward of a set of tasks would lead to faster results. However, it is still unclear how different dimensions interact with each other: reward, task type, market competition, requester reputation, etc. In this paper, we adopt a data-driven approach: (A) we perform a long-term analysis of a popular micro-task crowdsourcing platform and understand the evolution of its main actors (workers, requesters, tasks, and platform); (B) we leverage the main findings of our five-year log analysis to propose features used in a predictive model aiming at determining the expected performance of any batch at a specific point in time, and show that the number of tasks left in a batch and how recent the batch is are two key features of the prediction; (C) finally, we conduct an analysis of the demand (new tasks posted by the requesters) and supply (number of tasks completed by the workforce) and show how they affect task prices on the marketplace.
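A minimal sketch of a batch-performance predictor built on the two features the paper highlights (tasks left in the batch and batch recency) might look as follows; the synthetic data and the plain least-squares fit are illustrative assumptions, not the paper's actual model.

```python
# A toy throughput predictor from two features: how many tasks remain
# in a batch and how long ago the batch was posted. Data is synthetic.

import numpy as np

# rows: [tasks_left, hours_since_posted]; target: tasks completed next hour
X = np.array([[500, 1], [400, 3], [250, 10], [100, 24], [50, 48]], dtype=float)
y = np.array([60, 45, 25, 8, 3], dtype=float)

# add an intercept column and fit ordinary least squares
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(tasks_left: float, hours_since_posted: float) -> float:
    return float(np.dot([tasks_left, hours_since_posted, 1.0], coef))

print(predict(300, 5))  # expected completions in the next hour
```

The qualitative pattern the fit captures matches the paper's finding: larger remaining batches and fresher batches attract more worker activity.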

164 citations

Book ChapterDOI
07 Dec 2009
TL;DR: The paper describes the XER tasks and the evaluation procedure used at the XER track in 2009, where a new version of Wikipedia was used as the underlying collection, and summarizes the approaches adopted by the participants.
Abstract: In some situations search engine users would prefer to retrieve entities instead of just documents. Example queries include "Italian Nobel prize winners", "Formula 1 drivers that won the Monaco Grand Prix", or "German-speaking Swiss cantons". The XML Entity Ranking (XER) track at INEX creates a discussion forum aimed at standardizing evaluation procedures for entity retrieval. This paper describes the XER tasks and the evaluation procedure used at the XER track in 2009, where a new version of Wikipedia was used as the underlying collection, and summarizes the approaches adopted by the participants.
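To illustrate how a ranked entity list is scored against relevance judgments, here is a minimal average-precision sketch; the XER track used its own pooled evaluation procedure, so this metric, the toy run, and the toy judgments are stand-in assumptions.

```python
# A toy scoring of an entity ranking against relevance judgments using
# plain average precision; the run and judgments below are made up.

def average_precision(ranking: list[str], relevant: set[str]) -> float:
    hits, score = 0, 0.0
    for i, entity in enumerate(ranking, start=1):
        if entity in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0

# query: "Italian Nobel prize winners" (toy system output and judgments)
run = ["Enrico_Fermi", "Guglielmo_Marconi", "Albert_Einstein", "Carlo_Rubbia"]
qrels = {"Enrico_Fermi", "Guglielmo_Marconi", "Carlo_Rubbia"}
print(round(average_precision(run, qrels), 3))
```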

147 citations


Cited by
Journal ArticleDOI
TL;DR: YAGO is a large ontology with high coverage and precision, based on a clean logical model whose consistency is decidable and which allows representing n-ary relations in a natural way while maintaining compatibility with RDFS.
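The mention of representing n-ary relations while staying RDFS-compatible can be illustrated with the fact-identifier idea: a base fact gets an identifier that further triples can reference. The toy sketch below assumes that model; the identifiers and predicate names are illustrative, not YAGO's actual vocabulary.

```python
# A toy sketch of fact identifiers: each (subject, predicate, object)
# triple gets an id, and an n-ary fact's extra arguments (e.g. the year
# a prize was won) are expressed as further triples about that id.
# Identifiers and predicate names are made up for illustration.

facts = {}  # fact id -> (subject, predicate, object)

def add_fact(fact_id: str, subj: str, pred: str, obj: str) -> str:
    facts[fact_id] = (subj, pred, obj)
    return fact_id

f1 = add_fact("#1", "Albert_Einstein", "hasWonPrize", "Nobel_Prize")
add_fact("#2", f1, "occursIn", "1921")  # time attaches to the base fact

for fid, triple in facts.items():
    print(fid, triple)
```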

912 citations

Book
01 Jan 2011
TL;DR: This book examines various aspects of the evaluation process with an emphasis on classification algorithms, describing several techniques for classifier performance assessment, error estimation and resampling, obtaining statistical significance as well as selecting appropriate domains for evaluation.
Abstract: The field of machine learning has matured to the point where many sophisticated learning approaches can be applied to practical applications. Thus it is of critical importance that researchers have the proper tools to evaluate learning approaches and understand the underlying issues. This book examines various aspects of the evaluation process with an emphasis on classification algorithms. The authors describe several techniques for classifier performance assessment, error estimation and resampling, obtaining statistical significance as well as selecting appropriate domains for evaluation. They also present a unified evaluation framework and highlight how different components of evaluation are both significantly interrelated and interdependent. The techniques presented in the book are illustrated using R and WEKA, facilitating better practical insight as well as implementation. Aimed at researchers in the theory and applications of machine learning, this book offers a solid basis for conducting performance evaluations of algorithms in practical settings.
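As a minimal illustration of two techniques the book covers, resampling via k-fold cross-validation and a paired significance test between classifiers, here is a sketch; the book uses R and WEKA, so this scikit-learn/SciPy version is an assumed equivalent, and the paired t-test over folds is a common but debatable convention.

```python
# Compare two classifiers on the same folds, then test whether the
# difference in per-fold accuracy is statistically significant.

from scipy.stats import ttest_rel
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# accuracy of each classifier on the same 10 folds
scores_lr = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
scores_dt = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)

# paired t-test over per-fold accuracies
t_stat, p_value = ttest_rel(scores_lr, scores_dt)
print(f"LR: {scores_lr.mean():.3f}  DT: {scores_dt.mean():.3f}  p = {p_value:.3f}")
```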

824 citations

Journal ArticleDOI
TL;DR: A thorough overview and analysis of the main approaches to entity linking is presented, and various applications, the evaluation of entity linking systems, and future directions are discussed.
Abstract: The large number of potential applications from bridging web data with knowledge bases has led to an increase in entity linking research. Entity linking is the task of linking entity mentions in text with their corresponding entities in a knowledge base. Potential applications include information extraction, information retrieval, and knowledge base population. However, this task is challenging due to name variations and entity ambiguity. In this survey, we present a thorough overview and analysis of the main approaches to entity linking, and discuss various applications, the evaluation of entity linking systems, and future directions.
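The two steps the survey describes, candidate generation over name variations and context-based disambiguation, can be sketched as follows; the toy knowledge base and the bag-of-words overlap scoring are illustrative assumptions, not any particular system's method.

```python
# A toy entity linker: generate candidates by alias lookup (name
# variations), then disambiguate by context-word overlap.

kb = {
    "Michael Jordan (basketball)": {
        "aliases": {"michael jordan", "mj"},
        "context": {"nba", "bulls", "basketball"},
    },
    "Michael I. Jordan (scientist)": {
        "aliases": {"michael jordan", "michael i. jordan"},
        "context": {"machine", "learning", "berkeley"},
    },
}

def link(mention: str, context_words: set[str]) -> str | None:
    candidates = [e for e, v in kb.items() if mention.lower() in v["aliases"]]
    if not candidates:
        return None
    # pick the candidate whose profile overlaps most with the mention context
    return max(candidates, key=lambda e: len(kb[e]["context"] & context_words))

print(link("Michael Jordan", {"machine", "learning", "professor"}))
```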

702 citations

Journal ArticleDOI
TL;DR: This study is the first to analyze the 5G conspiracy theory in the context of COVID-19 on Twitter, offering practical guidance to health authorities on how, in the context of a pandemic, rumors may be combated in the future.
Abstract: Background: Since the beginning of December 2019, the coronavirus disease (COVID-19) has spread rapidly around the world, which has led to increased discussions across online platforms. These conversations have also included various conspiracies shared by social media users. Amongst them, a popular theory has linked 5G to the spread of COVID-19, leading to misinformation and the burning of 5G towers in the United Kingdom. Understanding the drivers of fake news, together with quick policies oriented to isolate and rebut misinformation, is key to combating it. Objective: The aim of this study is to develop an understanding of the drivers of the 5G COVID-19 conspiracy theory and strategies to deal with such misinformation. Methods: This paper performs a social network analysis and content analysis of Twitter data from a 7-day period (Friday, March 27, 2020, to Saturday, April 4, 2020) in which the #5GCoronavirus hashtag was trending on Twitter in the United Kingdom. Influential users were analyzed through social network graph clusters. The sizes of the nodes were ranked by their betweenness centrality scores, and the graph's vertices were grouped into clusters using the Clauset-Newman-Moore algorithm. The topics and web sources used were also examined. Results: Social network analysis identified that the two largest network structures consisted of an isolates group and a broadcast group. The analysis also revealed a lack of an authority figure actively combating such misinformation. Content analysis revealed that, of 233 sample tweets, 34.8% (n=81) contained views that 5G and COVID-19 were linked, 32.2% (n=75) denounced the conspiracy theory, and 33.0% (n=77) were general tweets not expressing any personal views or opinions. Thus, 65.2% (n=152) of tweets came from non-supporters of the conspiracy theory, which suggests that, although the topic attracted high volume, only a handful of users genuinely believed the conspiracy. This paper also shows that fake news websites were the most popular web sources shared by users, although YouTube videos were also shared. The study also identified an account whose sole aim was to spread the conspiracy theory on Twitter. Conclusions: The combination of quick and targeted interventions oriented to delegitimize the sources of fake information is key to reducing their impact. Users voicing their views against the conspiracy theory, link baiting, or sharing humorous tweets inadvertently raised the profile of the topic, suggesting that policymakers should insist on efforts to isolate opinions that are based on fake news. Many social media platforms provide users with the ability to report inappropriate content, which should be used. This study is the first to analyze the 5G conspiracy theory in the context of COVID-19 on Twitter, offering practical guidance to health authorities on how, in the context of a pandemic, rumors may be combated in the future.
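The methods section names two concrete graph techniques: betweenness centrality for node sizing and the Clauset-Newman-Moore algorithm for clustering. The sketch below reproduces both with networkx, whose greedy modularity communities implement Clauset-Newman-Moore; the toy graph is made up.

```python
# Rank nodes by betweenness centrality and group them into communities
# with the Clauset-Newman-Moore greedy modularity algorithm.

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.Graph()
G.add_edges_from([
    ("a", "b"), ("a", "c"), ("b", "c"),   # one tightly knit cluster
    ("d", "e"), ("d", "f"),               # a second cluster
    ("c", "d"),                           # bridge between them
])

centrality = nx.betweenness_centrality(G)  # drives node "size" in the paper's graphs
for node, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))

for i, community in enumerate(greedy_modularity_communities(G)):
    print(f"cluster {i}: {sorted(community)}")
```

Nodes bridging clusters (here "c" and "d") get the highest betweenness scores, which is how the study surfaces influential users.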

474 citations