Open Access · Posted Content
Knowledge-based Word Sense Disambiguation using Topic Models
TL;DR: The authors propose a variant of Latent Dirichlet Allocation in which the topic proportions for a document are replaced by synset proportions, and further exploit the information in WordNet by assigning a non-uniform prior to each synset's distribution over words and a logistic-normal prior to each document's distribution over synsets.
Abstract:
Word Sense Disambiguation (WSD) is an open problem in Natural Language Processing that is particularly challenging and useful in the unsupervised setting, where all the words in a given text must be disambiguated without any labeled data. WSD systems typically use the sentence or a small window of words around the target word as the context for disambiguation, because their computational complexity scales exponentially with the size of the context. In this paper, we leverage the topic-model formalism to design a WSD system that scales linearly with the number of words in the context. As a result, our system can use the whole document as the context for a word to be disambiguated. The proposed method is a variant of Latent Dirichlet Allocation in which the topic proportions for a document are replaced by synset proportions. We further exploit the information in WordNet by assigning a non-uniform prior to each synset's distribution over words and a logistic-normal prior to each document's distribution over synsets. We evaluate the proposed method on the Senseval-2, Senseval-3, SemEval-2007, SemEval-2013, and SemEval-2015 English All-Words WSD datasets and show that it outperforms the state-of-the-art unsupervised knowledge-based WSD system by a significant margin.
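The generative process sketched in the abstract can be illustrated with a toy sample: a document's synset proportions are drawn from a logistic-normal prior, and each word is drawn from its synset's word distribution, itself drawn from a non-uniform Dirichlet prior. This is a minimal sketch with hypothetical toy dimensions and randomly generated priors, not the paper's actual WordNet-derived parameters.

```python
import numpy as np

def generate_document(rng, mu, sigma, phi, doc_len):
    """Sample one toy document: logistic-normal synset proportions,
    then a synset and a word for each position."""
    eta = rng.multivariate_normal(mu, sigma)     # logistic-normal prior on synsets
    theta = np.exp(eta) / np.exp(eta).sum()      # softmax -> synset proportions
    synsets = rng.choice(len(theta), size=doc_len, p=theta)
    words = np.array([rng.choice(phi.shape[1], p=phi[z]) for z in synsets])
    return synsets, words

rng = np.random.default_rng(0)
n_synsets, vocab_size = 4, 12                    # toy sizes, not WordNet-scale

# Non-uniform Dirichlet prior on each synset's word distribution, standing in
# for the WordNet-derived prior described in the abstract (values are random).
word_prior = rng.gamma(2.0, 1.0, size=(n_synsets, vocab_size))
phi = np.array([rng.dirichlet(a) for a in word_prior])

mu = np.zeros(n_synsets)
sigma = np.eye(n_synsets)                        # toy covariance for the logistic normal
synsets, words = generate_document(rng, mu, sigma, phi, doc_len=50)
print(synsets.shape, words.shape)
```

In the paper's setting, inference runs this process in reverse: the posterior synset assignment for each word is the disambiguated sense, and linear scaling in context size is what lets the whole document serve as context.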
Citations
Proceedings Article · DOI
Zero-shot Entity Linking by Reading Entity Descriptions
TL;DR: It is shown that strong reading-comprehension models pre-trained on large unlabeled data can generalize to unseen entities, and domain-adaptive pre-training (DAP) is proposed to address the domain shift associated with linking unseen entities in a new domain.
Posted Content
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
TL;DR: This survey presents a comprehensive overview of the wide range of techniques in the two main branches of sense representation, unsupervised and knowledge-based, and analyzes four of their important aspects: interpretability, sense granularity, adaptability to different domains, and compositionality.
Proceedings Article · DOI
Zero-shot Word Sense Disambiguation using Sense Definition Embeddings
TL;DR: This work proposes Extended WSD Incorporating Sense Embeddings (EWISE), a supervised model to perform WSD by predicting over a continuous sense embedding space as opposed to a discrete label space, which allows EWISE to generalize over both seen and unseen senses, thus achieving generalized zero-shot learning.
Journal Article · DOI
Analysis and Evaluation of Language Models for Word Sense Disambiguation
TL;DR: An in-depth quantitative and qualitative analysis of the celebrated BERT model with respect to lexical ambiguity reveals that BERT can accurately capture high-level sense distinctions, even when a limited number of examples is available for each word sense.
Proceedings Article · DOI
Just “OneSeC” for Producing Multilingual Sense-Annotated Data
TL;DR: This paper presents OneSeC, a language-independent method for automatically extracting hundreds of thousands of sentences in which a target word is tagged with its meaning; it beats the existing state of the art on all languages and most domains.
References
Proceedings Article
Latent Dirichlet Allocation
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, the mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal Article · DOI
WordNet: a lexical database for English
TL;DR: WordNet is an online lexical database designed for use under program control; it provides a more effective combination of traditional lexicographic information and modern computing.
Journal Article · DOI
Probabilistic topic models
TL;DR: A survey of probabilistic topic-modeling algorithms shows how they can be used to manage and organize large document archives.
Journal Article · DOI
Word sense disambiguation: A survey
TL;DR: This work introduces the reader to the motivations for solving the ambiguity of words and provides a description of the task, and overviews supervised, unsupervised, and knowledge-based approaches.
Proceedings Article · DOI
Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone
TL;DR: The authors used machine-readable dictionaries to decide automatically which sense of a word is intended in written English, by looking for words in a sense's definition that overlap words in the definitions of nearby words.