Open Access Proceedings ArticleDOI

Neural Sequence Learning Models for Word Sense Disambiguation

TL;DR
This work proposes and studies in depth a series of end-to-end neural architectures directly tailored to the task, from bidirectional Long Short-Term Memory to encoder-decoder models, and shows that sequence learning enables more versatile all-words models that consistently lead to state-of-the-art results, even against word experts with engineered features.
Abstract
Word Sense Disambiguation models exist in many flavors. Even though supervised ones tend to perform best in terms of accuracy, they often lose ground to more flexible knowledge-based solutions, which do not require training by a word expert for every disambiguation target. To bridge this gap we adopt a different perspective and rely on sequence learning to frame the disambiguation problem: we propose and study in depth a series of end-to-end neural architectures directly tailored to the task, from bidirectional Long Short-Term Memory to encoder-decoder models. Our extensive evaluation over standard benchmarks and in multiple languages shows that sequence learning enables more versatile all-words models that consistently lead to state-of-the-art results, even against word experts with engineered features.
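To make the sequence-learning framing concrete, the following is a minimal PyTorch sketch (with illustrative names and sizes, not the authors' released code) of all-words WSD as sequence labeling with a bidirectional LSTM: one sense prediction per token, with each token's state encoding both its left and right context.

```python
import torch
import torch.nn as nn

class BiLSTMWSDTagger(nn.Module):
    """All-words WSD framed as sequence labeling: one sense per token.
    Vocabulary and sense-inventory sizes are placeholders."""
    def __init__(self, vocab_size, num_senses, emb_dim=300, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # The bidirectional LSTM reads the sentence in both directions,
        # so each token's state summarizes its full sentential context.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_senses)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> logits: (batch, seq_len, num_senses)
        states, _ = self.lstm(self.embed(token_ids))
        return self.classifier(states)

# Toy usage: 3 sentences of 10 tokens, 5,000-word vocab, 2,000 senses.
model = BiLSTMWSDTagger(vocab_size=5000, num_senses=2000)
logits = model(torch.randint(0, 5000, (3, 10)))
print(logits.shape)  # torch.Size([3, 10, 2000])
```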



Citations
Proceedings ArticleDOI

Deep contextualized word representations

TL;DR: This paper introduced a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic contexts (i.e., to model polysemy).
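The core idea, that the same word receives different vectors in different contexts, can be illustrated with a short sketch. Here BERT (via the HuggingFace transformers library) serves purely as a convenient stand-in for ELMo's bidirectional language model:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    # Return the contextual vector of `word` within `sentence`.
    enc = tok(sentence, return_tensors="pt")
    idx = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist()).index(word)
    with torch.no_grad():
        return model(**enc).last_hidden_state[0, idx]

v1 = word_vector("he sat on the river bank", "bank")
v2 = word_vector("she deposited cash at the bank", "bank")
# Well below 1.0: the representation of "bank" shifts with its context.
print(torch.cosine_similarity(v1, v2, dim=0).item())
```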
Proceedings ArticleDOI

Knowledge Enhanced Contextual Word Representations

TL;DR: KnowBert proposes a general method to embed multiple knowledge bases (KBs) into large-scale models, enhancing their representations with structured, human-curated knowledge: an integrated entity linker retrieves relevant entity embeddings, which then update the contextual word representations via a form of word-to-entity attention. After integrating WordNet and a subset of Wikipedia into BERT, KnowBert demonstrates improved perplexity, better fact recall as measured in a probing task, and downstream gains on relationship extraction, entity typing, and word sense disambiguation.
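A minimal sketch of the word-to-entity attention mechanism described above; the dimensions and the residual update are illustrative assumptions, not KnowBert's exact formulation:

```python
import torch
import torch.nn as nn

class WordToEntityAttention(nn.Module):
    """Each contextual word vector attends over candidate entity embeddings
    (as retrieved by an entity linker), and the attended knowledge is folded
    back into the word representation."""
    def __init__(self, word_dim=768, entity_dim=300):
        super().__init__()
        self.query = nn.Linear(word_dim, entity_dim)
        self.out = nn.Linear(entity_dim, word_dim)

    def forward(self, words, entities):
        # words: (batch, seq, word_dim); entities: (batch, n_ents, entity_dim)
        scores = self.query(words) @ entities.transpose(1, 2)  # (batch, seq, n_ents)
        knowledge = scores.softmax(dim=-1) @ entities          # (batch, seq, entity_dim)
        return words + self.out(knowledge)                     # residual update

layer = WordToEntityAttention()
out = layer(torch.randn(2, 12, 768), torch.randn(2, 7, 300))
print(out.shape)  # torch.Size([2, 12, 768])
```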
Proceedings ArticleDOI

GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge

TL;DR: This paper constructs context-gloss pairs and proposes three BERT-based models for WSD, fine-tuning the pre-trained BERT model to achieve new state-of-the-art results on the WSD task.
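A small sketch of the context-gloss pairing idea using NLTK's WordNet interface (requires nltk.download("wordnet")); the helper and pair format are illustrative, not GlossBERT's released code:

```python
from nltk.corpus import wordnet as wn

def context_gloss_pairs(sentence, target, gold_sense=None):
    # One (context, gloss, label) triple per candidate WordNet sense;
    # a binary classifier then scores whether the gloss fits the usage.
    pairs = []
    for synset in wn.synsets(target):
        label = 1 if synset.name() == gold_sense else 0
        pairs.append((sentence, f"{target}: {synset.definition()}", label))
    return pairs

for ctx, gloss, y in context_gloss_pairs(
        "He sat on the bank of the river.", "bank", gold_sense="bank.n.01"):
    print(y, gloss)
```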
References
Journal ArticleDOI

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called Long Short-Term Memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1,000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
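One LSTM step written out explicitly, to show the gated additive update of the cell state (the "constant error carousel" that lets gradients survive long time lags); the four-gate parameter packing below is the common convention, not the paper's original notation:

```python
import torch

def lstm_cell(x, h, c, W, U, b):
    # W: (input_dim, 4*hidden), U: (hidden, 4*hidden), b: (4*hidden,)
    z = x @ W + h @ U + b
    i, f, g, o = z.chunk(4, dim=-1)          # input, forget, candidate, output
    # The cell state changes only by gated addition, so error can flow
    # through it almost unattenuated across many time steps.
    c_new = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
    h_new = torch.sigmoid(o) * torch.tanh(c_new)
    return h_new, c_new

hid, inp = 8, 5
W, U, b = torch.randn(inp, 4 * hid), torch.randn(hid, 4 * hid), torch.zeros(4 * hid)
h = c = torch.zeros(hid)
for x in torch.randn(20, inp):               # scan 20 time steps
    h, c = lstm_cell(x, h, c, W, U, b)
```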
Posted Content

Efficient Estimation of Word Representations in Vector Space

TL;DR: This paper proposes two novel model architectures for computing continuous vector representations of words from very large data sets; the quality of these representations is measured in a word similarity task, and the results are compared to the previously best-performing techniques based on different types of neural networks.
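A minimal skip-gram run using gensim's word2vec implementation as a stand-in for the paper's original C tool; the toy corpus and hyperparameters are illustrative only:

```python
from gensim.models import Word2Vec

corpus = [["the", "bank", "approved", "the", "loan"],
          ["we", "walked", "along", "the", "river", "bank"],
          ["the", "bank", "raised", "interest", "rates"]]
# sg=1 selects the skip-gram architecture; CBOW is sg=0.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("bank", topn=3))
```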
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and it is proposed to extend it by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
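A sketch of the additive (soft-search) attention the summary describes, in PyTorch with illustrative dimensions:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """The decoder state soft-searches over all encoder states instead of
    relying on a single fixed-length summary vector."""
    def __init__(self, dec_dim, enc_dim, attn_dim=128):
        super().__init__()
        self.W = nn.Linear(dec_dim, attn_dim, bias=False)
        self.U = nn.Linear(enc_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(self.W(dec_state).unsqueeze(1)
                                   + self.U(enc_states))).squeeze(-1)
        weights = scores.softmax(dim=-1)                  # (batch, src_len)
        context = (weights.unsqueeze(-1) * enc_states).sum(dim=1)
        return context, weights

attn = AdditiveAttention(dec_dim=64, enc_dim=32)
context, weights = attn(torch.randn(2, 64), torch.randn(2, 15, 32))
print(context.shape, weights.shape)  # (2, 32) (2, 15)
```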
Proceedings ArticleDOI

Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
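A minimal encoder-decoder sketch of joint training to maximize the conditional probability of the target sequence; a GRU stands in for the paper's gated hidden unit, and all names and sizes are placeholders:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """The encoder compresses the source into its final hidden state,
    which conditions the decoder; cross-entropy on the decoder's
    predictions maximizes log p(target | source)."""
    def __init__(self, src_vocab, tgt_vocab, dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt_in):
        _, h = self.encoder(self.src_emb(src))        # h: (1, batch, dim)
        dec_states, _ = self.decoder(self.tgt_emb(tgt_in), h)
        return self.out(dec_states)                   # (batch, tgt_len, tgt_vocab)

model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (4, 9))
tgt_in, tgt_out = torch.randint(0, 1200, (4, 7)), torch.randint(0, 1200, (4, 7))
loss = nn.functional.cross_entropy(
    model(src, tgt_in).reshape(-1, 1200), tgt_out.reshape(-1))
```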