Constituency Parsing with a Self-Attentive Encoder
Nikita Kitaev, Dan Klein
Vol. 1, pp. 2676-2686
TLDR
This paper replaced an LSTM encoder with a self-attentive architecture and achieved state-of-the-art performance on the Penn Treebank: 93.55 F1 without the use of any external data.

Abstract:
We demonstrate that replacing an LSTM encoder with a self-attentive architecture can lead to improvements to a state-of-the-art discriminative constituency parser. The use of attention makes explicit the manner in which information is propagated between different locations in the sentence, which we use to both analyze our model and propose potential improvements. For example, we find that separating positional and content information in the encoder can lead to improved parsing accuracy. Additionally, we evaluate different approaches for lexical representation. Our parser achieves new state-of-the-art results for single models trained on the Penn Treebank: 93.55 F1 without the use of any external data, and 95.13 F1 when using pre-trained word representations. Our parser also outperforms the previous best-published accuracy figures on 8 of the 9 languages in the SPMRL dataset.
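The abstract's finding that separating positional and content information helps can be illustrated with a minimal sketch of factored self-attention, in which the attention logits are the sum of a content-to-content term and a position-to-position term rather than being computed from a single summed embedding. All function and weight names, shapes, and the single-head setup here are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def factored_self_attention(content, position, w_qc, w_kc, w_qp, w_kp, w_v):
    """One head of content/position-factored attention (illustrative sketch).

    Content and position are kept as separate streams instead of being added
    together, and the attention logits are the sum of a content-to-content
    term and a position-to-position term.
    """
    d = w_qc.shape[1]
    # Content-based and position-based attention logits, computed separately.
    logits_c = (content @ w_qc) @ (content @ w_kc).T
    logits_p = (position @ w_qp) @ (position @ w_kp).T
    logits = (logits_c + logits_p) / np.sqrt(d)
    # Softmax over keys (shifted by the row max for numerical stability).
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Values are drawn from the content stream in this sketch.
    return weights @ (content @ w_v)

# Toy example: 5 tokens, model dimension 8, random weights.
rng = np.random.default_rng(0)
n, d = 5, 8
content = rng.normal(size=(n, d))
position = rng.normal(size=(n, d))
ws = [rng.normal(size=(d, d)) for _ in range(5)]
out = factored_self_attention(content, position, *ws)
print(out.shape)  # (5, 8)
```

Keeping the two streams separate means a purely positional attention pattern (e.g. "attend to the previous token") cannot be distorted by content similarities, which is one plausible reading of why the factored encoder parses more accurately.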
Citations
Posted Content
Universal Transformers
TL;DR: The authors proposed the Universal Transformer model, which employs a self-attention mechanism in every recurrent step to combine information from different parts of a sequence, and further employs an adaptive computation time (ACT) mechanism to dynamically adjust the number of times the representation of each position in a sequence is revised.
Posted Content
What do you learn from context? Probing for sentence structure in contextualized word representations
Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, Ellie Pavlick
TL;DR: The authors investigate word-level contextual representations from four recent models, examining how they encode sentence structure across a range of syntactic, semantic, local, and long-range phenomena. They find that existing models trained on language modeling and translation produce strong representations for syntactic phenomena, but offer only comparably small improvements on semantic tasks over a non-contextual baseline.
Proceedings Article
Dissecting Contextual Word Embeddings: Architecture and Representation
TL;DR: There is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks, suggesting that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.
Journal Article
Attention in Natural Language Processing
TL;DR: This article proposed a taxonomy of attention models according to four dimensions: the representation of the input, the compatibility function, the distribution function, and the multiplicity of the inputs and outputs.
Journal Article
A Survey on Deep Learning for Named Entity Recognition
TL;DR: A comprehensive review of existing deep learning techniques for NER can be found in this article, where the authors systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder.
References
Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.
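The attention mechanism this entry refers to, and which the self-attentive parser above builds on, is scaled dot-product attention. A minimal numpy sketch (the toy shapes and random inputs are illustrative assumptions):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                            # weighted sum of value vectors

# Toy example: 4 query/key/value vectors of dimension 6.
rng = np.random.default_rng(1)
Q = rng.normal(size=(4, 6))
K = rng.normal(size=(4, 6))
V = rng.normal(size=(4, 6))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 6)
```

Because every position attends directly to every other position, information propagates between any two tokens in a single step, which is the property the parser paper exploits when it analyzes how information flows through the sentence.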
Proceedings Article
Deep contextualized word representations
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
TL;DR: This paper introduced a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic contexts (i.e., to model polysemy).
Proceedings Article
Grammar as a foreign language
TL;DR: The domain agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset, when trained on a large synthetic corpus that was annotated using existing parsers.
Proceedings Article
A Minimal Span-Based Neural Constituency Parser
TL;DR: This article proposed a greedy top-down inference algorithm based on recursive partitioning of the input, which achieved state-of-the-art performance on the Penn Treebank and the French Treebank.
Proceedings Article
Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages
Djamé Seddah, Reut Tsarfaty, Sandra Kübler, Marie Candito, Jinho D. Choi, Richárd Farkas, Jennifer Foster, Iakes Goenaga, Koldo Gojenola Galletebeitia, Yoav Goldberg, Spence Green, Nizar Habash, Marco Kuhlmann, Wolfgang Maier, Joakim Nivre, Adam Przepiórkowski, Ryan M. Roth, Wolfgang Seeker, Yannick Versley, Veronika Vincze, Marcin Woliński, Alina Wróblewska, Éric Villemonte de la Clergerie
TL;DR: This paper presents and analyzes parsing results obtained by the task participants, and provides an analysis and comparison of the parsers across languages and frameworks, reported for gold input as well as more realistic parsing scenarios.