
Showing papers by "Marco de Gemmis" published in 2007


Proceedings Article
23 Jun 2007
TL;DR: JIGSAW is a knowledge-based WSD system that attempts to disambiguate all words in a text by exploiting WordNet senses. The main assumption is that a specific strategy for each Part-Of-Speech (POS) is better than a single strategy.
Abstract: Word Sense Disambiguation (WSD) is traditionally considered an AI-hard problem. A breakthrough in this field would have a significant impact on many relevant web-based applications, such as information retrieval and information extraction. This paper describes JIGSAW, a knowledge-based WSD system that attempts to disambiguate all words in a text by exploiting WordNet senses. The main assumption is that a specific strategy for each Part-Of-Speech (POS) is better than a single strategy. We evaluated the accuracy of JIGSAW in the SemEval-2007 Task 1 competition. This task is application-driven: the application is a fixed cross-lingual information retrieval system. Participants disambiguate text by assigning WordNet synsets; the system then expands the text to other languages, indexes the expanded documents, and runs the retrieval for all the languages in batch. The retrieval results are taken as a measure of the effectiveness of the disambiguation.
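
The idea of one disambiguation strategy per POS can be illustrated with a minimal sketch. It is not the JIGSAW algorithm: the abstract does not say which strategy is used for which POS, so the two strategies below (gloss overlap for nouns, first listed sense otherwise), the toy `SENSES` inventory, and all function names are assumptions made for illustration only. A real system would read senses from WordNet.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy sense inventory: (lemma, pos) -> [(sense_id, gloss), ...].
# The sense ids and glosses are invented for this sketch, not taken from WordNet.
SENSES: Dict[Tuple[str, str], List[Tuple[str, str]]] = {
    ("bank", "n"): [
        ("bank.n.01", "sloping land beside a body of water"),
        ("bank.n.02", "financial institution that accepts deposits"),
    ],
    ("run", "v"): [
        ("run.v.01", "move fast by using the legs"),
        ("run.v.02", "operate or control a machine or program"),
    ],
}

Strategy = Callable[[str, str, List[str]], Optional[str]]


def gloss_overlap(lemma: str, pos: str, context: List[str]) -> Optional[str]:
    """Pick the sense whose gloss shares the most words with the context."""
    candidates = SENSES.get((lemma, pos), [])
    if not candidates:
        return None
    ctx = set(context)
    # On ties, max() keeps the first candidate in inventory order.
    return max(candidates, key=lambda s: len(ctx & set(s[1].split())))[0]


def first_sense(lemma: str, pos: str, context: List[str]) -> Optional[str]:
    """Fallback strategy: return the first listed sense, ignoring context."""
    candidates = SENSES.get((lemma, pos), [])
    return candidates[0][0] if candidates else None


# Assumed mapping from POS tag to strategy; any POS not listed uses first_sense.
STRATEGIES: Dict[str, Strategy] = {"n": gloss_overlap}


def disambiguate(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str, Optional[str]]]:
    """Assign a sense id (or None) to every (lemma, pos) pair in a sentence."""
    context = [lemma for lemma, _ in tagged]
    result = []
    for lemma, pos in tagged:
        strategy = STRATEGIES.get(pos, first_sense)
        result.append((lemma, pos, strategy(lemma, pos, context)))
    return result


sentence = [("deposit", "n"), ("money", "n"), ("bank", "n"), ("financial", "a")]
labels = disambiguate(sentence)
```

Words missing from the inventory receive `None`, which keeps the "disambiguate all words" loop total without guessing a sense.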

37 citations


Proceedings Article
01 Jan 2007
TL;DR: The paper describes the JUMP framework, which is designed to offer multiple ways for the user to query the knowledge base resulting from integration of autonomous legacy systems.
Abstract: The JUMP project aims at bringing together the knowledge stored in different information systems in order to satisfy information and training needs in knowledge-intensive organisations. Electronic Performance Support Systems provide help, advice, demonstrations, or any other informative support that a user needs to accomplish job tasks in her day-to-day working environment. The paper describes the JUMP framework, which is designed to offer multiple ways for the user to query the knowledge base resulting from the integration of autonomous legacy systems. Semantic Web languages and technologies are used throughout the framework to represent, exchange and query the knowledge, while Natural Language Processing techniques are implemented to understand natural language queries formulated by the user and provide consistent and satisfying results.

6 citations


Book Chapter
TL;DR: Results show that sense-based profiles outperform keyword-based ones in the task of recommending scientific papers. The approach relies on a knowledge-based WSD algorithm, called JIGSAW, for the semantic indexing of documents.
Abstract: Typically, personalized information recommendation services automatically infer the user profile, a structured model of the user interests, from documents that were already deemed relevant by the user. We present an approach based on Word Sense Disambiguation (WSD) for the extraction of user profiles from documents. This approach relies on a knowledge-based WSD algorithm, called JIGSAW, for the semantic indexing of documents: JIGSAW exploits the WordNet lexical database to select, among all the possible meanings (senses) of a polysemous word, the correct one. Semantically indexed documents are used to train a naive Bayes learner that infers "semantic", sense-based user profiles as binary text classifiers (user-likes and user-dislikes). Two empirical evaluations are described in the paper. In the first experimental session, JIGSAW was evaluated according to the parameters of the Senseval-3 initiative, which provides a forum where WSD systems are assessed against disambiguated datasets. The goal of the second empirical evaluation was to measure the accuracy of the user profiles in selecting relevant documents to be recommended. The performance of classical keyword-based profiles was compared to that of sense-based profiles in the task of recommending scientific papers. The results show that sense-based profiles outperform keyword-based ones.
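
The profile-learning step described in the abstract can be sketched as a small multinomial naive Bayes classifier over bags of synset identifiers. This is an illustration under stated assumptions, not the authors' implementation: the synset ids (`s1` to `s5`), the training documents, the add-one smoothing, and the function names `train` and `classify` are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train(docs: List[Tuple[List[str], str]]) -> Model:
    """Count documents per class and synset occurrences per class."""
    class_docs: Counter = Counter()
    synset_counts: Dict[str, Counter] = {}
    vocab: Set[str] = set()
    for synsets, label in docs:
        class_docs[label] += 1
        synset_counts.setdefault(label, Counter()).update(synsets)
        vocab.update(synsets)
    return class_docs, synset_counts, vocab


def classify(model: Model, synsets: List[str]) -> str:
    """Return the class with the highest log-posterior for a bag of synsets."""
    class_docs, synset_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores: Dict[str, float] = {}
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        total = sum(synset_counts[label].values())
        for s in synsets:
            if s not in vocab:
                continue  # synsets never seen in training carry no evidence
            # Add-one (Laplace) smoothing, an assumption of this sketch.
            score += math.log((synset_counts[label][s] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical sense-indexed documents: each is a bag of synset ids plus the
# user's rating, mirroring the binary user-likes / user-dislikes setting.
training = [
    (["s1", "s2", "s2"], "user-likes"),
    (["s2", "s3"], "user-likes"),
    (["s4", "s5"], "user-dislikes"),
    (["s5", "s1"], "user-dislikes"),
]
profile = train(training)
liked = classify(profile, ["s2"])
disliked = classify(profile, ["s5"])
```

Using synset ids instead of surface keywords as features is what makes the profile sense-based: two synonyms map to the same feature, while two senses of one polysemous word map to different ones.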

4 citations