Open Access Proceedings ArticleDOI

Constituency Parsing with a Self-Attentive Encoder

Nikita Kitaev, Dan Klein
Vol. 1, pp. 2676-2686
TLDR
This paper replaces an LSTM encoder with a self-attentive architecture in a state-of-the-art discriminative constituency parser and achieves 93.55 F1 on the Penn Treebank without the use of any external data.
Abstract
We demonstrate that replacing an LSTM encoder with a self-attentive architecture can lead to improvements to a state-of-the-art discriminative constituency parser. The use of attention makes explicit the manner in which information is propagated between different locations in the sentence, which we use to both analyze our model and propose potential improvements. For example, we find that separating positional and content information in the encoder can lead to improved parsing accuracy. Additionally, we evaluate different approaches for lexical representation. Our parser achieves new state-of-the-art results for single models trained on the Penn Treebank: 93.55 F1 without the use of any external data, and 95.13 F1 when using pre-trained word representations. Our parser also outperforms the previous best-published accuracy figures on 8 of the 9 languages in the SPMRL dataset.
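
The abstract's point about separating positional and content information in the encoder can be illustrated with a short sketch. The code below is a minimal, hedged approximation rather than the authors' implementation: it computes single-head self-attention in which word-content and position embeddings are projected separately, the attention logits are the sum of a content-content term and a position-position term with no cross terms, and the two value streams stay separate in the output. All dimensions, initializations, and the toy inputs are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of self-attention with content and
# position information kept in separate channels.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def factored_self_attention(content, position, d_k=16, seed=0):
    """content, position: (seq_len, d) arrays of word and position embeddings."""
    rng = np.random.default_rng(seed)
    d = content.shape[1]
    # Separate projections for the content and position channels.
    Wq_c, Wk_c, Wv_c = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))
    Wq_p, Wk_p, Wv_p = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))

    # Attention logits are a sum of a content-content term and a
    # position-position term; cross terms are deliberately absent.
    logits = (content @ Wq_c) @ (content @ Wk_c).T \
           + (position @ Wq_p) @ (position @ Wk_p).T
    weights = softmax(logits / np.sqrt(2 * d_k), axis=-1)

    # Values are computed per channel and concatenated, keeping the two
    # information streams separate in the output as well.
    return np.concatenate([weights @ (content @ Wv_c),
                           weights @ (position @ Wv_p)], axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    words = rng.standard_normal((5, 32))       # toy word embeddings
    positions = rng.standard_normal((5, 32))   # toy position embeddings
    print(factored_self_attention(words, positions).shape)  # (5, 32)
```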


Citations
Posted Content

Universal Transformers

TL;DR: The authors propose the Universal Transformer, which applies a self-attention mechanism at every recurrent step to combine information from different parts of a sequence, and adds an adaptive computation time (ACT) mechanism to dynamically adjust the number of times the representation of each position is revised.
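
As a rough illustration of the adaptive computation time idea mentioned above, the sketch below is an approximation rather than the cited model: the update function, halting function, threshold, and step budget are all assumptions. Each position keeps being revised until its accumulated halting probability crosses the threshold or the step budget runs out.

```python
# Illustrative ACT-style halting for a recurrent refinement loop (an
# approximation, not the Universal Transformer implementation).
import numpy as np

def act_refine(states, update_fn, halt_fn, threshold=0.99, max_steps=8):
    """states: (T, d) per-position representations refined a variable number of times."""
    T, _ = states.shape
    halting = np.zeros(T)            # accumulated halting probability per position
    active = np.ones(T, dtype=bool)  # positions still being revised
    for _ in range(max_steps):
        if not active.any():
            break
        p = halt_fn(states)                      # per-position halting probabilities
        halting[active] += p[active]
        active &= halting < threshold            # positions crossing the threshold stop
        new_states = update_fn(states)           # one revision step (e.g. self-attention)
        states = np.where(active[:, None], new_states, states)
    return states

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((5, 8))
    update = lambda s: 0.5 * (s + s.mean(axis=0, keepdims=True))  # toy "attention" update
    halt = lambda s: 1.0 / (1.0 + np.exp(-s.mean(axis=1)))        # toy halting unit
    print(act_refine(x, update, halt).shape)                      # (5, 8)
```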
Posted Content

What do you learn from context? Probing for sentence structure in contextualized word representations

TL;DR: The authors probe word-level contextual representations from four recent models to examine how they encode sentence structure across a range of syntactic, semantic, local, and long-range phenomena, finding that existing models trained on language modeling and translation produce strong representations for syntactic phenomena but offer only comparatively small improvements over a non-contextual baseline on semantic tasks.
Proceedings ArticleDOI

Dissecting Contextual Word Embeddings: Architecture and Representation

TL;DR: The authors find a tradeoff between speed and accuracy across architectures, but all architectures learn high-quality contextual representations that outperform word embeddings on four challenging NLP tasks, suggesting that unsupervised biLMs, independent of architecture, learn much more about the structure of language than previously appreciated.
Journal ArticleDOI

Attention in Natural Language Processing

TL;DR: This article proposed a taxonomy of attention models according to four dimensions: the representation of the input, the compatibility function, the distribution function, and the multiplicity of the inputs and outputs.
Journal ArticleDOI

A Survey on Deep Learning for Named Entity Recognition

TL;DR: This article presents a comprehensive review of existing deep learning techniques for NER, systematically categorizing existing work along three axes: distributed representations for input, context encoder, and tag decoder.
References
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Proceedings ArticleDOI

Deep contextualized word representations

TL;DR: This paper introduced a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic contexts (i.e., to model polysemy).
Proceedings Article

Grammar as a foreign language

TL;DR: A domain-agnostic, attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset when trained on a large synthetic corpus annotated using existing parsers.
Proceedings ArticleDOI

A Minimal Span-Based Neural Constituency Parser

TL;DR: This article proposed a greedy top-down inference algorithm based on recursive partitioning of the input, which achieved state-of-the-art performance on the Penn Treebank and the French Treebank.
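
The greedy top-down, recursive-partitioning idea summarized above can be sketched in a few lines. The code below is an illustrative stand-in rather than the cited paper's implementation: label_score, split_score, and the toy sentence are assumptions, with a trained parser supplying real span scores in practice.

```python
# Illustrative greedy top-down span parsing by recursive partitioning
# (a stand-in for the cited approach, not its implementation).
def greedy_parse(i, j, words, label_score, split_score):
    """Build a bracketing over fencepost span (i, j), with j > i."""
    # Pick the best label for the current span (may be the empty label "").
    scores = label_score(i, j)
    label = max(scores, key=scores.get)

    if j - i == 1:                       # single word: leaf span
        subtree = words[i]
    else:
        # Greedily choose the split point, then recurse on the two halves.
        k = max(range(i + 1, j), key=lambda k: split_score(i, k, j))
        subtree = (greedy_parse(i, k, words, label_score, split_score),
                   greedy_parse(k, j, words, label_score, split_score))
    return (label, subtree) if label else subtree

if __name__ == "__main__":
    words = ["She", "enjoys", "playing", "tennis", "."]
    # Toy stand-in scorers; a real parser would use a trained neural model.
    label_score = lambda i, j: {"S" if (i, j) == (0, 5) else "": 1.0}
    split_score = lambda i, k, j: -abs(k - (i + j) / 2)   # prefer balanced splits
    print(greedy_parse(0, len(words), words, label_score, split_score))
```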