Open Access Proceedings ArticleDOI

Constituency Parsing with a Self-Attentive Encoder

Nikita Kitaev, Dan Klein
Vol. 1, pp. 2676-2686
TLDR
This paper replaces an LSTM encoder with a self-attentive architecture in a state-of-the-art discriminative constituency parser and achieves 93.55 F1 on the Penn Treebank without the use of any external data.
Abstract
We demonstrate that replacing an LSTM encoder with a self-attentive architecture can lead to improvements to a state-of-the-art discriminative constituency parser. The use of attention makes explicit the manner in which information is propagated between different locations in the sentence, which we use to both analyze our model and propose potential improvements. For example, we find that separating positional and content information in the encoder can lead to improved parsing accuracy. Additionally, we evaluate different approaches for lexical representation. Our parser achieves new state-of-the-art results for single models trained on the Penn Treebank: 93.55 F1 without the use of any external data, and 95.13 F1 when using pre-trained word representations. Our parser also outperforms the previous best-published accuracy figures on 8 of the 9 languages in the SPMRL dataset.
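
The abstract's point about separating positional and content information in the encoder can be illustrated with a short sketch. The code below is a minimal, hedged approximation rather than the authors' implementation: it computes single-head self-attention in which word-content and position embeddings are projected separately, the attention logits are the sum of a content-content term and a position-position term with no cross terms, and the two value streams stay separate in the output. All dimensions, initializations, and the toy inputs are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of self-attention with content and
# position information kept in separate channels.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def factored_self_attention(content, position, d_k=16, seed=0):
    """content, position: (seq_len, d) arrays of word and position embeddings."""
    rng = np.random.default_rng(seed)
    d = content.shape[1]
    # Separate projections for the content and position channels.
    Wq_c, Wk_c, Wv_c = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))
    Wq_p, Wk_p, Wv_p = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))

    # Attention logits are a sum of a content-content term and a
    # position-position term; cross terms are deliberately absent.
    logits = (content @ Wq_c) @ (content @ Wk_c).T \
           + (position @ Wq_p) @ (position @ Wk_p).T
    weights = softmax(logits / np.sqrt(2 * d_k), axis=-1)

    # Values are computed per channel and concatenated, keeping the two
    # information streams separate in the output as well.
    return np.concatenate([weights @ (content @ Wv_c),
                           weights @ (position @ Wv_p)], axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    words = rng.standard_normal((5, 32))       # toy word embeddings
    positions = rng.standard_normal((5, 32))   # toy position embeddings
    print(factored_self_attention(words, positions).shape)  # (5, 32)
```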


Citations
Posted Content

Universal Transformers

TL;DR: The authors propose the Universal Transformer, which applies a self-attention mechanism at every recurrent step to combine information from different parts of a sequence, and adds an adaptive computation time (ACT) mechanism to dynamically adjust the number of times the representation of each position is revised.
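
As a rough illustration of the adaptive computation time idea mentioned above, the sketch below is an approximation rather than the cited model: the update function, halting function, threshold, and step budget are all assumptions. Each position keeps being revised until its accumulated halting probability crosses the threshold or the step budget runs out.

```python
# Illustrative ACT-style halting for a recurrent refinement loop (an
# approximation, not the Universal Transformer implementation).
import numpy as np

def act_refine(states, update_fn, halt_fn, threshold=0.99, max_steps=8):
    """states: (T, d) per-position representations refined a variable number of times."""
    T, _ = states.shape
    halting = np.zeros(T)            # accumulated halting probability per position
    active = np.ones(T, dtype=bool)  # positions still being revised
    for _ in range(max_steps):
        if not active.any():
            break
        p = halt_fn(states)                      # per-position halting probabilities
        halting[active] += p[active]
        active &= halting < threshold            # positions crossing the threshold stop
        new_states = update_fn(states)           # one revision step (e.g. self-attention)
        states = np.where(active[:, None], new_states, states)
    return states

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((5, 8))
    update = lambda s: 0.5 * (s + s.mean(axis=0, keepdims=True))  # toy "attention" update
    halt = lambda s: 1.0 / (1.0 + np.exp(-s.mean(axis=1)))        # toy halting unit
    print(act_refine(x, update, halt).shape)                      # (5, 8)
```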
Posted Content

What do you learn from context? Probing for sentence structure in contextualized word representations

TL;DR: The authors probe word-level contextual representations from four recent models to examine how they encode sentence structure across a range of syntactic, semantic, local, and long-range phenomena, finding that existing models trained on language modeling and translation produce strong representations for syntactic phenomena but offer only comparatively small improvements over a non-contextual baseline on semantic tasks.
Proceedings ArticleDOI

Dissecting Contextual Word Embeddings: Architecture and Representation

TL;DR: The authors find a tradeoff between speed and accuracy across architectures, but all architectures learn high-quality contextual representations that outperform word embeddings on four challenging NLP tasks, suggesting that unsupervised biLMs, independent of architecture, learn much more about the structure of language than previously appreciated.
Journal ArticleDOI

Attention in Natural Language Processing

TL;DR: This article proposed a taxonomy of attention models according to four dimensions: the representation of the input, the compatibility function, the distribution function, and the multiplicity of the inputs and outputs.
Journal ArticleDOI

A Survey on Deep Learning for Named Entity Recognition

TL;DR: This article presents a comprehensive review of existing deep learning techniques for NER, systematically categorizing existing work along three axes: distributed representations for input, context encoder, and tag decoder.
References
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Proceedings ArticleDOI

Deep contextualized word representations

TL;DR: This paper introduced a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic contexts (i.e., to model polysemy).
Proceedings Article

Grammar as a foreign language

TL;DR: A domain-agnostic, attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset when trained on a large synthetic corpus annotated using existing parsers.
Proceedings ArticleDOI

A Minimal Span-Based Neural Constituency Parser

TL;DR: This article proposed a greedy top-down inference algorithm based on recursive partitioning of the input, which achieved state-of-the-art performance on the Penn Treebank and the French Treebank.
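
The greedy top-down, recursive-partitioning idea summarized above can be sketched in a few lines. The code below is an illustrative stand-in rather than the cited paper's implementation: label_score, split_score, and the toy sentence are assumptions, with a trained parser supplying real span scores in practice.

```python
# Illustrative greedy top-down span parsing by recursive partitioning
# (a stand-in for the cited approach, not its implementation).
def greedy_parse(i, j, words, label_score, split_score):
    """Build a bracketing over fencepost span (i, j), with j > i."""
    # Pick the best label for the current span (may be the empty label "").
    scores = label_score(i, j)
    label = max(scores, key=scores.get)

    if j - i == 1:                       # single word: leaf span
        subtree = words[i]
    else:
        # Greedily choose the split point, then recurse on the two halves.
        k = max(range(i + 1, j), key=lambda k: split_score(i, k, j))
        subtree = (greedy_parse(i, k, words, label_score, split_score),
                   greedy_parse(k, j, words, label_score, split_score))
    return (label, subtree) if label else subtree

if __name__ == "__main__":
    words = ["She", "enjoys", "playing", "tennis", "."]
    # Toy stand-in scorers; a real parser would use a trained neural model.
    label_score = lambda i, j: {"S" if (i, j) == (0, 5) else "": 1.0}
    split_score = lambda i, k, j: -abs(k - (i + j) / 2)   # prefer balanced splits
    print(greedy_parse(0, len(words), words, label_score, split_score))
```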