Proceedings Article

SemEval-2013 Task 12: Multilingual Word Sense Disambiguation

01 Jun 2013, pp. 222-231
TL;DR: The experience in producing a multilingual sense-annotated corpus for the SemEval-2013 task on multilingual Word Sense Disambiguation is described, and the results of participating systems are presented and analyzed.
Abstract: This paper presents the SemEval-2013 task on multilingual Word Sense Disambiguation. We describe our experience in producing a multilingual sense-annotated corpus for the task. The corpus is tagged with BabelNet 1.1.1, a freely-available multilingual encyclopedic dictionary and, as a byproduct, WordNet 3.0 and the Wikipedia sense inventory. We present and analyze the results of participating systems, and discuss future directions.


Citations
Journal Article
TL;DR: Babelfy is presented, a unified graph-based approach to EL and WSD based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations.
Abstract: Entity Linking (EL) and Word Sense Disambiguation (WSD) both address the lexical ambiguity of language. But while the two tasks are pretty similar, they differ in a fundamental respect: in EL the textual mention can be linked to a named entity which may or may not contain the exact mention, while in WSD there is a perfect match between the word form (better, its lemma) and a suitable word sense. In this paper we present Babelfy, a unified graph-based approach to EL and WSD based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations. Our experiments show state-of-the-art performances on both tasks on 6 different datasets, including a multilingual setting. Babelfy is online at http://babelfy.org

811 citations


Cites background or methods or result from "SemEval-2013 Task 12: Multilingual ..."

  • ...We also compared the systems with the MFS baseline computed for the three inventories (Navigli et al., 2013)....

    [...]

  • ...The recent upsurge of interest in multilinguality has led to the development of cross-lingual and multilingual approaches to WSD (Lefever and Hoste, 2010; Lefever and Hoste, 2013; Navigli et al., 2013)....

    [...]

  • ...We carried out our experiments on six datasets, four for WSD and two for EL: • The SemEval-2013 task 12 dataset for multilingual WSD (Navigli et al., 2013), which consists of 13 documents in different domains, available in 5 languages....

    [...]

  • ...We tuned our two disambiguation parameters µ = 10 and θ = 0.8 by optimizing F1 on the trial dataset of the SemEval-2013 task on multilingual WSD (Navigli et al., 2013)....

    [...]

Proceedings Article
01 Apr 2017
TL;DR: A unified evaluation framework is developed and the results show that supervised systems clearly outperform knowledge-based models in Word Sense Disambiguation, and a linear classifier trained on conventional local features still proves to be a hard baseline to beat.
Abstract: Word Sense Disambiguation is a long-standing task in Natural Language Processing, lying at the core of human language understanding. However, the evaluation of automatic systems has been problematic, mainly due to the lack of a reliable evaluation framework. In this paper we develop a unified evaluation framework and analyze the performance of various Word Sense Disambiguation systems in a fair setup. The results show that supervised systems clearly outperform knowledge-based models. Among the supervised systems, a linear classifier trained on conventional local features still proves to be a hard baseline to beat. Nonetheless, recent approaches exploiting neural networks on unlabeled corpora achieve promising results, surpassing this hard baseline in most test sets.

291 citations


Cites methods from "SemEval-2013 Task 12: Multilingual ..."

  • ...…have been constructed for the task (Edmonds and Cotton, 2001; Snyder and Palmer, 2004; Navigli et al., 2007; Pradhan et al., 2007; Agirre et al., 2010a; Navigli et al., 2013; Moro and Navigli, 2015, inter alia), they tend to differ in format, construction guidelines and underlying sense inventory....

    [...]

  • ...• SemEval-13 task 12 (Navigli et al., 2013)....

    [...]

  • ...As unified format we use the XML scheme used for the SemEval-13 all-words WSD task (Navigli et al., 2013), where preprocessing information of a given corpus is also encoded....

    [...]

Proceedings Article
01 Jun 2015
TL;DR: The aim with this task is to analyze whether, and if so, how, using a resource that integrates both kinds of inventories might enable WSD and EL to be solved by means of similar (even, the same) methods.
Abstract: In this paper we present the Multilingual All-Words Sense Disambiguation and Entity Linking task. Word Sense Disambiguation (WSD) and Entity Linking (EL) are well-known problems in the Natural Language Processing field and both address the lexical ambiguity of language. Their main difference lies in the kind of meaning inventories that are used: EL uses encyclopedic knowledge, while WSD uses lexicographic information. Our aim with this task is to analyze whether, and if so, how, using a resource that integrates both kinds of inventories (i.e., BabelNet 2.5.1) might enable WSD and EL to be solved by means of similar (even, the same) methods. Moreover, we investigate this task in a multilingual setting and for some specific domains.

231 citations


Cites methods from "SemEval-2013 Task 12: Multilingual ..."

  • ...In contrast to the SemEval-2013 task 12 on Multilingual Word Sense Disambiguation (Navigli et al., 2013), our focus in task 13 is to present a dataset containing both kinds of inventories (i.e., named entities and word senses) in different specific domains (biomedical domain, maths and computer domain, and a broader domain about social issues)....

    [...]

  • ...The system performs WSD by taking advantage of the parallelism of the test data, a feature that was not exploited by the systems that participated in the SemEval-2013 Multilingual Word Sense Disambiguation task 12 (Navigli et al., 2013)....

    [...]

  • ...In this paper we described the organization and results obtained within the SemEval 2015 task 13: Multilingual Word Sense Disambiguation....

    [...]

  • ...Differently from previous editions (Navigli et al., 2013; Lefever and Hoste, 2013; Manandhar et al., 2010; Lefever and Hoste, 2010; Pradhan et al., 2007; Navigli et al., 2007; Snyder and Palmer, 2004; Palmer et al., 2001), in this task we do not make explicit to the participating systems which fragments of the input text should be disambiguated, so as to have, on the one hand, a more realistic scenario, and, on the other hand, to follow the recent trend in EL challenges such as TAC KBP (Ji et al....

    [...]

Proceedings Article
18 May 2015
TL;DR: GERBIL aims to become a focal point for the state of the art, driving the research agenda of the community by presenting comparable objective evaluation results.
Abstract: We present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools on multiple datasets. By these means, we aim to ensure that both tool developers and end users can derive meaningful insights pertaining to the extension, integration and use of annotation applications. In particular, GERBIL provides comparable results to tool developers so as to allow them to easily discover the strengths and weaknesses of their implementations with respect to the state of the art. With the permanent experiment URIs provided by our framework, we ensure the reproducibility and archiving of evaluation results. Moreover, the framework generates data in machine-processable format, allowing for the efficient querying and post-processing of evaluation results. Finally, the tool diagnostics provided by GERBIL allows deriving insights pertaining to the areas in which tools should be further refined, thus allowing developers to create an informed agenda for extensions and end users to detect the right tools for their purposes. GERBIL aims to become a focal point for the state of the art, driving the research agenda of the community by presenting comparable objective evaluation results.

219 citations


Cites methods from "SemEval-2013 Task 12: Multilingual ..."

  • ...Babelfy has been evaluated using six datasets: three from earlier SemEval tasks [33, 29, 28], one from a Senseval task [38] and two already used for evaluating AIDA [17, 16]....

    [...]

Journal Article
TL;DR: A novel multilingual vector representation, called Nasari, is put forward, which not only enables accurate representation of word senses in different languages, but it also provides two main advantages over existing approaches: high coverage and comparability across languages and linguistic levels.

215 citations


Cites background or methods from "SemEval-2013 Task 12: Multilingual ..."

  • ...In order to compute θ, we use the English Wikipedia trial dataset provided within the SemEval-2013 WSD task [95]....

    [...]

  • ...On the other hand, the performance of Word Sense Disambiguation (WSD) techniques is still far from ideal [94], which in its turn prevents a reliable automatic sense-annotation of large text corpora that can be used for modeling individual word senses....

    [...]

  • ...Recent years have seen a growing interest in multilingual WSD [95]....

    [...]

  • ...Our system obtained state-of-the-art results on multilingual All-Words Word Sense Disambiguation using Wikipedia as sense inventory, evaluated on the SemEval-2013 dataset [95], and on English All-Words Word Sense Disambiguation using WordNet as sense inventory, evaluated on the SemEval-2007 [111] and SemEval-2013 [95] datasets....

    [...]

  • ...One of the main knowledge sense repositories used in this task was the manually constructed WordNet [95,111], which usually leads to a fine-grained type of disambiguation given the nature of the senses in WordNet....

    [...]

References
Journal Article
TL;DR: Standard alphabetical procedures for organizing lexical information put together words that are spelled alike and scatter words with similar or related meanings haphazardly through the list.
Abstract: Standard alphabetical procedures for organizing lexical information put together words that are spelled alike and scatter words with similar or related meanings haphazardly through the list. Unfortunately, there is no obvious alternative, no other simple way for lexicographers to keep track of what has been done or for readers to find the word they are looking for. But a frequent objection to this solution is that finding things on an alphabetical list can be tedious and time-consuming. Many people who would like to refer to a dictionary decide not to bother with it because finding the information would interrupt their work and break their train of thought.

5,038 citations


"SemEval-2013 Task 12: Multilingual ..." refers background or methods in this paper

  • ...While an ad-hoc sense inventory was originally chosen for the first Senseval edition (Kilgarriff, 1998; Kilgarriff and Palmer, 2000), later tasks (Edmonds and Cotton, 2001; Snyder and Palmer, 2004; Mihalcea et al., 2004) focused on WordNet (Miller et al., 1990; Fellbaum, 1998) as a sense inventory....

    [...]

  • ...The basic meaning unit in BabelNet is the Babel synset, modeled after the WordNet synset (Miller et al., 1990; Fellbaum, 1998)....

    [...]

Proceedings Article
26 Jun 1995
TL;DR: An unsupervised learning algorithm for sense disambiguation that, when trained on unannotated English text, rivals the performance of supervised techniques that require time-consuming hand annotations.
Abstract: This paper presents an unsupervised learning algorithm for sense disambiguation that, when trained on unannotated English text, rivals the performance of supervised techniques that require time-consuming hand annotations. The algorithm is based on two powerful constraints (that words tend to have one sense per discourse and one sense per collocation), exploited in an iterative bootstrapping procedure. Tested accuracy exceeds 96%.

2,594 citations


"SemEval-2013 Task 12: Multilingual ..." refers background in this paper

  • ...Notably, this system leverages the single-sense per discourse heuristic (Yarowsky, 1995), which uses the same sense label for all occurrences of a lemma in a document....

    [...]
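
The one-sense-per-discourse heuristic quoted above can be illustrated with a minimal Python sketch. It assumes the heuristic is applied as a post-processing step that relabels every occurrence of a lemma in a document with that lemma's most frequent predicted sense; the majority vote, the function name `one_sense_per_discourse`, and the sense labels are illustrative assumptions, not details taken from the participating system.

```python
from collections import Counter, defaultdict


def one_sense_per_discourse(predictions):
    """Relabel each lemma with its majority sense within one document.

    predictions: list of (lemma, sense) pairs for a single document.
    Majority voting is an assumption made for this sketch.
    """
    counts = defaultdict(Counter)
    for lemma, sense in predictions:
        counts[lemma][sense] += 1
    # most_common(1) returns the first-seen sense when counts are tied.
    majority = {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}
    return [(lemma, majority[lemma]) for lemma, _ in predictions]


# Hypothetical per-occurrence predictions for one document.
doc = [
    ("bank", "bank%finance"),
    ("bank", "bank%river"),
    ("bank", "bank%finance"),
    ("plant", "plant%factory"),
]
smoothed = one_sense_per_discourse(doc)
print(smoothed)
```

In this toy document the single "bank%river" prediction is overridden, so all three occurrences of "bank" end up with the same label.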

Journal Article
TL;DR: This work introduces the reader to the motivations for solving the ambiguity of words and provides a description of the task, and overviews supervised, unsupervised, and knowledge-based approaches.
Abstract: Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problems in artificial intelligence. We introduce the reader to the motivations for solving the ambiguity of words and provide a description of the task. We overview supervised, unsupervised, and knowledge-based approaches. The assessment of WSD systems is discussed in the context of the Senseval/Semeval campaigns, aiming at the objective evaluation of systems participating in several different disambiguation tasks. Finally, applications, open problems, and future directions are discussed.

2,178 citations


"SemEval-2013 Task 12: Multilingual ..." refers background or methods in this paper

  • ...Word Sense Disambiguation (WSD), the task of automatically assigning predefined meanings to words occurring in context, is a fundamental task in computational lexical semantics (Navigli, 2009; Navigli, 2012)....

    [...]

  • ...While these tasks addressed the multilingual aspect of sense-level text understanding, they departed from the traditional WSD paradigm, i.e., the automatic assignment of senses from an existing inventory, and instead focused on lexical substitution (McCarthy and Navigli, 2009)....

    [...]

  • ...Task 12 uses the standard definitions of precision and recall for WSD evaluation (see, e.g., (Navigli, 2009))....

    [...]
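
The standard precision and recall definitions mentioned in the last excerpt can be sketched in a few lines of Python: precision is the share of attempted instances that are correct, recall is the share of all gold instances that are correct, and F1 is their harmonic mean. The data layout (dictionaries keyed by instance id, with a set of acceptable gold senses per instance) and the name `wsd_scores` are assumptions made for illustration; the task's actual scorer may differ.

```python
def wsd_scores(gold, predicted):
    """Compute precision, recall and F1 for WSD output.

    gold: dict mapping instance id -> set of acceptable sense ids.
    predicted: dict mapping instance id -> one sense id; instances the
    system chose not to answer are simply absent (assumed layout).
    """
    attempted = [i for i in predicted if i in gold]
    correct = sum(1 for i in attempted if predicted[i] in gold[i])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold standard with four instances; the system answers three.
gold = {"d1.t1": {"s1"}, "d1.t2": {"s2", "s3"}, "d1.t3": {"s4"}, "d1.t4": {"s5"}}
predicted = {"d1.t1": "s1", "d1.t2": "s3", "d1.t3": "s9"}
p, r, f1 = wsd_scores(gold, predicted)
print(p, r, f1)
```

With two correct answers out of three attempts and four gold instances, precision is 2/3, recall is 1/2, and F1 is 4/7. A system that answers every instance has equal precision and recall.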

Journal Article
TL;DR: An automatic approach to the construction of BabelNet, a very large, wide-coverage multilingual semantic network, is presented; key to this approach is the integration of lexicographic and encyclopedic knowledge from WordNet and Wikipedia.

1,522 citations


"SemEval-2013 Task 12: Multilingual ..." refers background or methods or result in this paper

  • ...We present and analyze the results of participating systems, and discuss future directions....

    [...]

  • ...To semantically annotate all the single- and multiword expressions, as well as the named entities, occurring in our test corpus we used BabelNet 1.1.1 (Navigli and Ponzetto, 2012a)....

    [...]

  • ...Over the past few years, a wide-coverage multilingual “encyclopedic” dictionary, called BabelNet, has been developed (Navigli and Ponzetto, 2012a)....

    [...]

  • ...Overall, these results corroborate previous studies suggesting that highly precise sense annotations can be obtained by leveraging multiple languages (Navigli and Ponzetto, 2012b; Navigli and Ponzetto, 2012c)....

    [...]

  • ...To reduce the time required for annotation in the other languages, the sense annotations for the English dataset were then projected onto the other four languages using the sense translation API of BabelNet (Navigli and Ponzetto, 2012d)....

    [...]

Proceedings Article
30 Mar 2009
TL;DR: This paper proposes a new graph-based method that uses the knowledge in a LKB (based on WordNet) in order to perform unsupervised Word Sense Disambiguation, performing better than previous approaches in English all-words datasets.
Abstract: In this paper we propose a new graph-based method that uses the knowledge in a LKB (based on WordNet) in order to perform unsupervised Word Sense Disambiguation. Our algorithm uses the full graph of the LKB efficiently, performing better than previous approaches in English all-words datasets. We also show that the algorithm can be easily ported to other languages with good results, with the only requirement of having a wordnet. In addition, we make an analysis of the performance of the algorithm, showing that it is efficient and that it could be tuned to be faster.

608 citations


"SemEval-2013 Task 12: Multilingual ..." refers methods in this paper

  • ...WSD was then performed using the ISR-WN network in combination with the algorithm of Gutiérrez (2012), which is an extension of the Personalized PageRank algorithm for WSD (Agirre and Soroa, 2009) which includes senses frequency....

    [...]
