
Papers by Gianluca Demartini published in 2013


Proceedings ArticleDOI
13 May 2013
TL;DR: This paper proposes and extensively evaluates a crowdsourcing approach based on a push methodology that carefully selects which workers should perform a given task, using worker profiles extracted from social networks, and shows that this approach consistently yields better results than the usual pull strategies.
Abstract: Crowdsourcing makes it possible to build hybrid online platforms that combine scalable information systems with the power of human intelligence to complete tasks that are difficult for current algorithms to tackle. Examples include hybrid database systems that use the crowd to fill in missing values or to sort items along subjective dimensions such as picture attractiveness. Current approaches to crowdsourcing adopt a pull methodology: tasks are published on specialized Web platforms, where workers pick their preferred tasks on a first-come, first-served basis. While this approach has many advantages, such as simplicity and short completion times, it does not guarantee that a task is performed by the most suitable worker. In this paper, we propose and extensively evaluate a different crowdsourcing approach based on a push methodology. Our proposed system carefully selects which workers should perform a given task based on worker profiles extracted from social networks. Workers and tasks are automatically matched using an underlying categorization structure that exploits entities extracted from the task descriptions on the one hand, and categories liked by the user on social platforms on the other. We experimentally evaluate our approach on tasks of varying complexity and show that our push methodology consistently yields better results than the usual pull strategies.
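To make the matching step concrete, here is a minimal Python sketch of the kind of worker-task matching a push methodology implies, assuming task entities and worker "likes" have already been mapped onto a shared category space. The names, data, and Jaccard scoring are illustrative choices, not the authors' implementation.

```python
# Rank workers for a task by the overlap between the task's categories
# (derived from entities in the task description) and the categories a
# worker likes on social platforms. Purely illustrative data below.

def jaccard(a, b):
    """Set overlap between two category sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_workers(task_categories, worker_profiles, top_k=3):
    """Return the top-k (worker, score) pairs best matching the task."""
    scored = [(worker, jaccard(task_categories, liked))
              for worker, liked in worker_profiles.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical profiles extracted from social network likes.
profiles = {
    "alice": {"photography", "art", "travel"},
    "bob": {"databases", "programming"},
    "carol": {"photography", "fashion"},
}
task = {"photography", "aesthetics"}  # entities from the task description
print(rank_workers(task, profiles))  # alice and carol rank above bob
```

In a push setting, the top-ranked workers would then be invited directly, rather than waiting for whoever claims the task first.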

165 citations


Journal ArticleDOI
01 Oct 2013
TL;DR: The ZenCrowd system uses a three-stage blocking technique to obtain the best possible instance matches while minimizing both computational complexity and latency, and identifies entities in natural language text using state-of-the-art techniques, automatically connecting them to the Linked Open Data cloud.
Abstract: We tackle the problems of semi-automatically matching linked data sets and of linking large collections of Web pages to linked data. Our system, ZenCrowd, (1) uses a three-stage blocking technique to obtain the best possible instance matches while minimizing both computational complexity and latency, and (2) identifies entities in natural language text using state-of-the-art techniques and automatically connects them to the Linked Open Data cloud. First, we use structured inverted indices to quickly find potential candidate matches among the entities indexed in our system. Our system then analyzes the candidate matches and refines them whenever deemed necessary using computationally more expensive queries on a graph database. Finally, we resort to human computation by dynamically generating crowdsourcing tasks whenever the algorithmic components fail to come up with convincing results. We integrate all results from the inverted indices, the graph database, and the crowd using a probabilistic framework in order to make sensible decisions about candidate matches and to identify unreliable human workers. In the following, we give an overview of the architecture of our system and describe in detail our novel three-stage blocking technique and our probabilistic decision framework. We also report on a series of experimental results on a standard data set, showing that our system can achieve 95% average accuracy on instance matching (compared to the 88% average accuracy of the purely automatic baseline) while drastically limiting the amount of work performed by the crowd. The experimental evaluation of our system on the entity linking task shows an average relative improvement of 14% over our best automatic approach.
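As a rough illustration of the cascade, the sketch below implements a three-stage pipeline in the spirit of the description above: a cheap index lookup first, a costlier graph-based refinement when confidence is low, and a crowdsourcing task only as a last resort. The stand-in functions, scores, and the 0.8 confidence threshold are all assumptions for illustration, not ZenCrowd's actual code.

```python
# Three-stage matching cascade: each stage runs only if the previous one
# fails to produce a sufficiently confident match. All stages are stubs.

def index_candidates(entity):
    """Stage 1: fast inverted-index lookup (stand-in)."""
    return {"dbpedia:Zurich": 0.9, "dbpedia:Canton_of_Zurich": 0.55}

def graph_refine(entity, candidates):
    """Stage 2: rescore candidates with costlier graph queries (stand-in)."""
    return {uri: score * 0.95 for uri, score in candidates.items()}

def ask_crowd(entity, candidates):
    """Stage 3: generate a crowdsourcing task (stand-in)."""
    return max(candidates, key=candidates.get)

CONFIDENT = 0.8  # illustrative confidence threshold

def match(entity):
    candidates = index_candidates(entity)
    if max(candidates.values(), default=0.0) >= CONFIDENT:
        return max(candidates, key=candidates.get)
    candidates = graph_refine(entity, candidates)
    if max(candidates.values(), default=0.0) >= CONFIDENT:
        return max(candidates, key=candidates.get)
    return ask_crowd(entity, candidates)  # only hard cases reach the crowd

print(match("Zurich"))  # resolved in stage 1 in this toy run
```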

89 citations


Book ChapterDOI
21 Oct 2013
TL;DR: This paper defines the new task of ranking entity types given an entity and its context, and proposes and evaluates new methods for finding the most relevant entity types based on collection statistics and on the graph structure interconnecting entities and types.
Abstract: Much of Web search and browsing activity today centers around entities. For this reason, Search Engine Result Pages (SERPs) increasingly contain information about the searched entities, such as pictures, short summaries, related entities, and factual information. A key facet that is often displayed on SERPs, and that is instrumental for many applications, is the entity type. However, an entity is usually not associated with a single generic type in the background knowledge bases but rather with a set of more specific types, which may or may not be relevant given the document context. For example, one can find on the Linked Open Data cloud the fact that Tom Hanks is a person, an actor, and a person from Concord, California. All those types are correct, but some may be too general to be interesting (e.g., person), while others may be interesting but already known to the user (e.g., actor), or irrelevant given the current browsing context (e.g., person from Concord, California). In this paper, we define the new task of ranking entity types given an entity and its context. We propose and evaluate new methods for finding the most relevant entity types based on collection statistics and on the graph structure interconnecting entities and types. An extensive experimental evaluation over several document collections at different levels of granularity (e.g., sentences, paragraphs) and different type hierarchies (including DBpedia, Freebase, and schema.org) shows that hierarchy-based approaches provide more accurate results when picking entity types to be displayed to the end user while still being highly scalable.
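One simple way to picture a hierarchy-based ranking is to trade a type's depth in the hierarchy off against how common it is in the collection, so that very generic types (person) and very obscure ones (person from Concord, California) both score low. The scoring function, frequencies, and depths below are invented for illustration and are not the paper's actual method.

```python
# Score candidate types by specificity (depth in the type hierarchy)
# weighted by how common the type is in the collection; made-up numbers.
import math

def type_score(freq, depth):
    """Deep types win only if they are still common enough to matter."""
    return depth * math.log(1 + freq)

# Hypothetical candidate types for Tom Hanks: (collection frequency, depth).
candidates = {
    "Person": (900_000, 1),
    "Actor": (40_000, 3),
    "PersonFromConcordCalifornia": (12, 5),
}

ranked = sorted(candidates, key=lambda t: type_score(*candidates[t]),
                reverse=True)
print(ranked)  # ['Actor', 'Person', 'PersonFromConcordCalifornia']
```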

55 citations


Proceedings Article
01 Jan 2013
TL;DR: This paper proposes a novel hybrid human-machine approach that leverages the crowd to gain knowledge of query structure and entity relationships, exploiting a combination of query log mining, natural language processing (NLP), and crowdsourcing to generate query templates.
Abstract: Work in hybrid human-machine query processing has thus far focused on the data: gathering, cleaning, and sorting. In this paper, we address a missed opportunity to use crowdsourcing to understand the query itself. We propose a novel hybrid human-machine approach that leverages the crowd to gain knowledge of query structure and entity relationships. The proposed system exploits a combination of query log mining, natural language processing (NLP), and crowdsourcing to generate query templates that can be used to answer whole classes of different questions rather than focusing on just a specific question and answer.
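The sketch below illustrates the basic idea behind query templates: entities recognized in logged queries are replaced with type placeholders, so that structurally identical questions collapse into a single reusable template. The small entity-to-type table is a hypothetical stand-in for the paper's combination of query log mining, NLP, and crowd input.

```python
# Generalize logged queries into templates by replacing recognized
# entities with their types; the lookup table below is invented.

ENTITY_TYPES = {
    "france": "Country",
    "japan": "Country",
    "paris": "City",
}

def to_template(query):
    """Replace known entities in a query with type placeholders."""
    return " ".join(f"<{ENTITY_TYPES[t]}>" if t in ENTITY_TYPES else t
                    for t in query.lower().split())

log = ["capital of France", "capital of Japan", "population of Paris"]
templates = {to_template(q) for q in log}
print(templates)  # {'capital of <Country>', 'population of <City>'}
```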

43 citations


Book ChapterDOI
26 Aug 2013
TL;DR: This paper presents techniques for creating large amounts of data by combining crowdsourcing, data generation models, mobile computing, and big data analytics, implemented in NoizCrowd, a system that crowdsources noise levels in a given region and generates noise models using state-of-the-art noise propagation models and array data management techniques.
Abstract: Many systems require access to very large amounts of data to function properly, such as systems for visualizing or predicting meteorological changes in a country over a given period of time, or any other system that holds, processes, and displays scientific or sensor data. However, populating a database with large amounts of valuable data can be a difficult, costly, and time-consuming task. In this paper, we present techniques for creating large amounts of data by combining crowdsourcing, data generation models, mobile computing, and big data analytics. We have implemented our methods in NoizCrowd, a system that crowdsources noise levels in a given region and generates noise models using state-of-the-art noise propagation models and array data management techniques. The resulting models and data can then be accessed through a visual interface.
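As a toy stand-in for the modeling step, the following sketch fills unmeasured locations of a noise map by inverse-distance weighting of sparse crowdsourced readings. NoizCrowd itself relies on state-of-the-art noise propagation models and array data management, so this is purely an illustration of turning scattered measurements into a continuous model.

```python
# Estimate the noise level at an unmeasured point from nearby
# crowdsourced readings using inverse-distance weighting (IDW).
import math

readings = [  # (x, y, measured dB) reported from workers' phones; invented
    (0.0, 0.0, 70.0),
    (4.0, 0.0, 55.0),
    (0.0, 3.0, 62.0),
]

def estimate_db(x, y, power=2):
    """IDW estimate: closer readings get proportionally more weight."""
    num = den = 0.0
    for rx, ry, db in readings:
        d = math.hypot(x - rx, y - ry)
        if d == 0:
            return db  # exact hit on a measurement point
        w = 1.0 / d ** power
        num += w * db
        den += w
    return num / den

print(round(estimate_db(2.0, 1.5), 1))  # interpolated grid-cell value
```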

29 citations


Journal ArticleDOI
TL;DR: This paper presents the Bowlogna Ontology, which models an academic setting as proposed by the Bologna reform, and describes practical applications of the ontology for end users at universities, such as a faceted search and browsing system for course information.
Abstract: The Bologna Process initiated a radical change within higher education institutions. This change triggered the creation of new administrative procedures in the everyday life of European universities. It also gave rise to new concepts for the description of curricula. Supporting the publication and exchange of information among universities is critical for the successful continuation of this process. With this aim in mind, we created the Bowlogna Ontology to model an academic setting as proposed by the Bologna reform. In this paper, we present our efforts to design this ontology and the entire process that led to its creation, starting from the definition of a linguistic lexicon derived from the Bologna reform and its conversion to a formal ontology. We also describe practical applications of our ontology for end users at universities, such as a faceted search and browsing system for course information.

25 citations


Book ChapterDOI
24 Mar 2013
TL;DR: This paper proposes novel semi-supervised methods for term disambiguation that leverage the structure of a community-based ontology of scientific concepts to automatically identify the sense originally intended by the authors of a scientific publication.
Abstract: Scientific documents often adopt a well-defined vocabulary and avoid the use of ambiguous terms. However, as soon as documents from different research sub-communities are considered in combination, many scientific terms become ambiguous, as the same term can refer to different concepts from different sub-communities. The ability to correctly identify the right sense of a given term can considerably improve the effectiveness of retrieval models and can also support additional features such as search diversification. This is even more critical for exploratory search systems in the scientific domain. In this paper, we propose novel semi-supervised methods for term disambiguation that leverage the structure of a community-based ontology of scientific concepts. Our approach exploits the graph structure that connects different terms and their definitions to automatically identify the sense originally intended by the authors of a scientific publication. Experimental evidence over two test collections from the physics and biomedical domains shows that the proposed method is effective and outperforms state-of-the-art approaches based on feature vectors constructed from term co-occurrences as well as standard supervised approaches.
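A minimal sketch of the graph-based intuition: given an ambiguous term, prefer the candidate sense whose concept sits closest, in the ontology graph, to the unambiguous concepts mentioned in the same document. The tiny graph, the concept names, and the distance-sum scoring are all invented for illustration.

```python
# Pick the sense of an ambiguous term that minimizes graph distance
# to the concepts found in its context; toy ontology below.
from collections import deque

GRAPH = {  # hypothetical community-built concept graph
    "plasma(physics)": ["ionized_gas", "fusion"],
    "plasma(biology)": ["blood", "cell"],
    "ionized_gas": ["plasma(physics)", "fusion"],
    "fusion": ["plasma(physics)", "ionized_gas"],
    "blood": ["plasma(biology)", "cell"],
    "cell": ["plasma(biology)", "blood"],
}

def distance(src, dst):
    """Shortest-path length between two concepts (BFS)."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in GRAPH.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return float("inf")

def disambiguate(candidate_senses, context_concepts):
    """Choose the sense with the smallest total distance to the context."""
    return min(candidate_senses,
               key=lambda c: sum(distance(c, ctx) for ctx in context_concepts))

print(disambiguate(["plasma(physics)", "plasma(biology)"],
                   ["fusion", "ionized_gas"]))  # -> plasma(physics)
```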

18 citations


Proceedings ArticleDOI
01 Aug 2013
TL;DR: This work proposes and experimentally evaluates approaches for entity disambiguation based on syntactic and semantic features on top of two different social networks: a general-interest network and a domain-specific network.
Abstract: The pervasive Web and social networks are becoming part of everyone's life. Through their activities on these networks, users leave traces of their expertise, interests, and personalities. With the advances in Web mining and user modeling techniques, it is possible to leverage a user's social network activity history to extract the semantics of user-generated content. In this work, we explore various techniques for constructing user profiles based on the content users publish on social networks. We further show that one advantage of maintaining social network user profiles is that they provide context for a better understanding of microposts. We propose and experimentally evaluate different approaches for entity disambiguation in social networks based on syntactic and semantic features, on top of two different social networks: a general-interest network (i.e., Twitter) and a domain-specific network (i.e., StackOverflow). We demonstrate how disambiguation accuracy increases when considering enriched user profiles that integrate content from both social networks.
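The sketch below combines one syntactic feature (string similarity between the mention and candidate labels) and one semantic feature (overlap between a candidate's topics and the user's profile), in the spirit of the feature families evaluated above. The candidates, profile topics, and weighting are hypothetical.

```python
# Disambiguate an entity mention in a micropost by blending string
# similarity with overlap against the user's profile topics.
from difflib import SequenceMatcher

def disambiguate(mention, candidates, profile_topics, w_syn=0.5):
    """Score each (label, topics) candidate; profile context breaks ties."""
    def score(candidate):
        label, topics = candidate
        syn = SequenceMatcher(None, mention.lower(), label.lower()).ratio()
        sem = len(topics & profile_topics) / len(topics) if topics else 0.0
        return w_syn * syn + (1 - w_syn) * sem
    return max(candidates, key=score)

# "jaguar" mentioned by a user whose (hypothetical) enriched profile,
# built from Twitter and StackOverflow content, is about programming.
candidates = [
    ("Jaguar (animal)", {"wildlife", "nature"}),
    ("Jaguar Cars", {"automotive", "engineering"}),
    ("Jaguar (software)", {"programming", "java"}),
]
profile = {"programming", "java", "databases"}
print(disambiguate("jaguar", candidates, profile))  # the software sense wins
```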

7 citations