Open Access · Posted Content

Knowledge-based Word Sense Disambiguation using Topic Models

TL;DR: The authors propose a variant of Latent Dirichlet Allocation in which the topic proportions for a document are replaced by synset proportions, and further exploit the information in WordNet by assigning a non-uniform prior to the synset distribution over words and a logistic-normal prior to the document distribution over synsets.
Abstract
Word Sense Disambiguation (WSD) is an open problem in Natural Language Processing that is particularly challenging and useful in the unsupervised setting, where all the words in a given text must be disambiguated without using any labeled data. Typically, WSD systems use the sentence or a small window of words around the target word as the context for disambiguation, because their computational complexity scales exponentially with the size of the context. In this paper, we leverage the topic-model formalism to design a WSD system that scales linearly with the number of words in the context; as a result, our system can use the whole document as the context for the word being disambiguated. The proposed method is a variant of Latent Dirichlet Allocation in which the topic proportions for a document are replaced by synset proportions. We further exploit the information in WordNet by assigning a non-uniform prior to the synset distribution over words and a logistic-normal prior to the document distribution over synsets. We evaluate the proposed method on the Senseval-2, Senseval-3, SemEval-2007, SemEval-2013, and SemEval-2015 English All-Words WSD datasets and show that it outperforms the state-of-the-art unsupervised knowledge-based WSD system by a significant margin.
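The core idea above — replace LDA's topics with WordNet synsets, so disambiguating a word means inferring which synset generated it — can be sketched as a toy EM-style loop. This is a hypothetical illustration, not the paper's actual inference (which uses a logistic-normal prior and WordNet-informed priors); the vocabulary size, synset count, and `phi`/`theta` names are made up. It only shows why one sweep over the document costs time linear in the number of context words:

```python
import numpy as np

rng = np.random.default_rng(0)

V, S = 6, 3                               # toy vocabulary and synset inventory sizes
phi = rng.dirichlet(np.ones(V), size=S)   # per-synset distributions over words, shape (S, V)
theta = np.ones(S) / S                    # document's synset proportions (uniform start)

doc = [0, 2, 2, 5, 1]                     # word ids of the whole document (the context)

for _ in range(20):
    # E-step: responsibility of each synset for each word, shape (len(doc), S).
    # This is the only pass over the context: O(len(doc) * S), linear in context size.
    resp = theta[None, :] * phi[:, doc].T
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate the document's synset proportions.
    theta = resp.sum(axis=0) / len(doc)

senses = resp.argmax(axis=1)              # most probable synset for each word
```

Because each iteration touches every context word exactly once, widening the context from a sentence to the full document grows the cost only linearly, which is the scalability argument the abstract makes.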


Citations
Proceedings ArticleDOI

Zero-shot Entity Linking by Reading Entity Descriptions

TL;DR: It is shown that strong reading comprehension models pre-trained on large unlabeled data can generalize to unseen entities, and domain-adaptive pre-training (DAP) is proposed to address the domain shift associated with linking unseen entities in a new domain.
Posted Content

From Word to Sense Embeddings: A Survey on Vector Representations of Meaning

TL;DR: This survey presents a comprehensive overview of the wide range of techniques in the two main branches of sense representation, i.e., unsupervised and knowledge-based, and analyzes four of their important aspects: interpretability, sense granularity, adaptability to different domains, and compositionality.
Proceedings ArticleDOI

Zero-shot Word Sense Disambiguation using Sense Definition Embeddings

TL;DR: This work proposes Extended WSD Incorporating Sense Embeddings (EWISE), a supervised model to perform WSD by predicting over a continuous sense embedding space as opposed to a discrete label space, which allows EWISE to generalize over both seen and unseen senses, thus achieving generalized zero-shot learning.
Journal ArticleDOI

Analysis and Evaluation of Language Models for Word Sense Disambiguation

TL;DR: An in-depth quantitative and qualitative analysis of the celebrated BERT model with respect to lexical ambiguity reveals that BERT can accurately capture high-level sense distinctions, even when a limited number of examples is available for each word sense.
Proceedings ArticleDOI

Just “OneSeC” for Producing Multilingual Sense-Annotated Data

TL;DR: This paper presents OneSeC, a language-independent method for the automatic extraction of hundreds of thousands of sentences in which a target word is tagged with its meaning, which beats the existing state of the art on all languages and most domains.
References
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, the mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal ArticleDOI

WordNet: a lexical database for English

TL;DR: WordNet is an online lexical database designed for use under program control, providing a more effective combination of traditional lexicographic information and modern computing.
Journal ArticleDOI

Probabilistic topic models

TL;DR: This article surveys a suite of algorithms that offer a solution to managing large document archives.
Journal ArticleDOI

Word sense disambiguation: A survey

TL;DR: This work introduces the reader to the motivations for resolving word ambiguity, provides a description of the task, and overviews supervised, unsupervised, and knowledge-based approaches.
Proceedings ArticleDOI

Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone

TL;DR: The authors use machine-readable dictionaries to decide automatically which sense of a word is intended in written English, by looking for words in its sense definitions that overlap with words in the definitions of nearby words.