Journal Article•DOI•

In-Order Transition-based Constituent Parsing

11 Nov 2017 - Transactions of the Association for Computational Linguistics (MIT Press) - Vol. 5, Iss: 1, pp 413-424
TL;DR: A novel parsing system based on in-order traversal over syntactic trees is proposed, with a set of transition actions designed to find a compromise between bottom-up constituent information and top-down lookahead information.
Abstract: Both bottom-up and top-down strategies have been used for neural transition-based constituent parsing. The parsing strategies differ in terms of the order in which they recognize productions in the derivation tree, where bottom-up strategies and top-down strategies take post-order and pre-order traversal over trees, respectively. Bottom-up parsers benefit from rich features from readily built partial parses, but lack lookahead guidance in the parsing process; top-down parsers benefit from non-local guidance for local decisions, but rely on a strong encoder over the input to predict a constituent hierarchy before its construction. To mitigate both issues, we propose a novel parsing system based on in-order traversal over syntactic trees, designing a set of transition actions to find a compromise between bottom-up constituent information and top-down lookahead information. Based on stack-LSTM, our psycholinguistically motivated constituent parsing system achieves 91.8 F1 on the WSJ benchmark. Furthermore, the system achieves 93.6 F1 with supervised reranking and 94.2 F1 with semi-supervised reranking, which are the best results on the WSJ benchmark.
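To make the traversal orders concrete, the sketch below derives an in-order oracle action sequence (SHIFT, PJ(X), REDUCE, FINISH, following the transition names described for this system) from a toy tree. The nested-tuple tree encoding and the helper function are illustrative assumptions, and details such as unary chains and POS tags are omitted.

```python
# Minimal sketch of deriving an in-order oracle action sequence from a tree.
# The tree format and simplifications (no unary-chain handling, no POS tags)
# are illustrative assumptions, not the authors' implementation.

def in_order_oracle(tree, actions=None):
    """tree is either a word (str) or a (label, [children]) pair."""
    if actions is None:
        actions = []
    if isinstance(tree, str):               # terminal: shift the word
        actions.append("SHIFT")
        return actions
    label, children = tree
    in_order_oracle(children[0], actions)   # first child comes before the label
    actions.append(f"PJ({label})")          # project the constituent label
    for child in children[1:]:              # then the remaining children
        in_order_oracle(child, actions)
    actions.append("REDUCE")                # close the constituent
    return actions

# ( S (NP The dog) (VP barks) )
tree = ("S", [("NP", ["The", "dog"]), ("VP", ["barks"])])
print(in_order_oracle(tree) + ["FINISH"])
# ['SHIFT', 'PJ(NP)', 'SHIFT', 'REDUCE', 'PJ(S)', 'SHIFT', 'PJ(VP)', 'REDUCE', 'REDUCE', 'FINISH']
```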


Citations
Proceedings Article•DOI•
02 May 2018
TL;DR: This paper replaced an LSTM encoder with a self-attentive architecture and achieved state-of-the-art performance on the Penn Treebank, with 93.55 F1 without the use of any external data.
Abstract: We demonstrate that replacing an LSTM encoder with a self-attentive architecture can lead to improvements to a state-of-the-art discriminative constituency parser. The use of attention makes explicit the manner in which information is propagated between different locations in the sentence, which we use to both analyze our model and propose potential improvements. For example, we find that separating positional and content information in the encoder can lead to improved parsing accuracy. Additionally, we evaluate different approaches for lexical representation. Our parser achieves new state-of-the-art results for single models trained on the Penn Treebank: 93.55 F1 without the use of any external data, and 95.13 F1 when using pre-trained word representations. Our parser also outperforms the previous best-published accuracy figures on 8 of the 9 languages in the SPMRL dataset.
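As a rough illustration of the "separating positional and content information" finding, the sketch below keeps the two signals in separate attention channels whose logits are summed. The single-head setup, dimensions, and projection layout are illustrative assumptions rather than the published architecture.

```python
# Hedged sketch of factored self-attention that keeps content and position
# information in separate channels; not the published parser architecture.
import torch

def factored_attention(content, position, d=64):
    """content, position: (seq_len, d) tensors for one sentence."""
    to_qkv_c = [torch.nn.Linear(d, d) for _ in range(3)]
    to_qkv_p = [torch.nn.Linear(d, d) for _ in range(3)]
    qc, kc, vc = (proj(content) for proj in to_qkv_c)
    qp, kp, vp = (proj(position) for proj in to_qkv_p)
    # Content queries attend to content keys and position queries to position
    # keys; the two logit matrices are summed, so the channels never mix
    # multiplicatively inside the attention weights.
    logits = (qc @ kc.T + qp @ kp.T) / (2 * d) ** 0.5
    weights = torch.softmax(logits, dim=-1)
    # Each channel keeps its own value vectors; the outputs are concatenated.
    return torch.cat([weights @ vc, weights @ vp], dim=-1)
```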

380 citations

Proceedings Article•DOI•
01 Jul 2019
TL;DR: It is shown that constituency parsing benefits from unsupervised pre-training across a variety of languages and a range of pre-training conditions; the idea of joint fine-tuning is also explored, showing that it gives low-resource languages a way to benefit from the larger datasets of other languages.
Abstract: We show that constituency parsing benefits from unsupervised pre-training across a variety of languages and a range of pre-training conditions. We first compare the benefits of no pre-training, fastText, ELMo, and BERT for English and find that BERT outperforms ELMo, in large part due to increased model capacity, whereas ELMo in turn outperforms the non-contextual fastText embeddings. We also find that pre-training is beneficial across all 11 languages tested; however, large model sizes (more than 100 million parameters) make it computationally expensive to train separate models for each language. To address this shortcoming, we show that joint multilingual pre-training and fine-tuning allows sharing all but a small number of parameters between ten languages in the final model. The 10x reduction in model size compared to fine-tuning one model per language causes only a 3.2% relative error increase in aggregate. We further explore the idea of joint fine-tuning and show that it gives low-resource languages a way to benefit from the larger datasets of other languages. Finally, we demonstrate new state-of-the-art results for 11 languages, including English (95.8 F1) and Chinese (91.8 F1).

148 citations

Proceedings Article•DOI•
05 Jul 2019
TL;DR: In this paper, a simplified head-driven phrase structure grammar (HPSG) was proposed by integrating constituent and dependency formal representations into head-driven phrase structure, and two parsing algorithms were proposed for the two converted tree representations, division span and joint span.
Abstract: Head-driven phrase structure grammar (HPSG) enjoys a uniform formalism representing rich contextual syntactic and even semantic meanings. This paper makes the first attempt to formulate a simplified HPSG by integrating constituent and dependency formal representations into head-driven phrase structure. Then two parsing algorithms are respectively proposed for two converted tree representations, division span and joint span. As HPSG encodes both constituent and dependency structure information, the proposed HPSG parsers may be regarded as a sort of joint decoder for both types of structures and thus are evaluated in terms of extracted or converted constituent and dependency parsing trees. Our parser achieves new state-of-the-art performance for both parsing tasks on the Penn Treebank (PTB) and the Chinese Penn Treebank, verifying the effectiveness of jointly learning constituent and dependency structures. In detail, we report 95.84 F1 for constituent parsing and 97.00% UAS for dependency parsing on PTB.
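The sketch below shows one way a "joint span" node might carry both constituent and dependency information, loosely following the description above; the field names and the arc-extraction helper are assumptions, not the authors' representation.

```python
# Illustrative "joint span" node combining a constituent span with its lexical
# head, plus a helper that reads off dependency arcs from a headed tree.
from dataclasses import dataclass, field
from typing import List

@dataclass
class JointSpan:
    left: int            # span start (word index, inclusive)
    right: int           # span end (inclusive)
    label: str           # constituent label, e.g. "NP"
    head: int            # index of the lexical head word inside the span
    children: List["JointSpan"] = field(default_factory=list)

def dependency_arcs(node):
    """Each child's head attaches to the parent constituent's head."""
    arcs = []
    for child in node.children:
        if child.head != node.head:
            arcs.append((node.head, child.head))   # (head, dependent)
        arcs.extend(dependency_arcs(child))
    return arcs

# "The dog barks": NP headed by "dog" (1), S and VP headed by "barks" (2)
np_span = JointSpan(0, 1, "NP", head=1)
vp_span = JointSpan(2, 2, "VP", head=2)
s_span = JointSpan(0, 2, "S", head=2, children=[np_span, vp_span])
print(dependency_arcs(s_span))   # [(2, 1)] -> "dog" depends on "barks"
```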

117 citations

Proceedings Article•DOI•
11 Jun 2018
TL;DR: The authors propose a constituency parsing scheme that predicts a real-valued scalar, named the syntactic distance, for each split position in the sentence; the topology of the grammar tree is then determined by the values of the syntactic distances.
Abstract: In this work, we propose a novel constituency parsing scheme. The model first predicts a real-valued scalar, named syntactic distance, for each split position in the sentence. The topology of the grammar tree is then determined by the values of the syntactic distances. Compared to traditional shift-reduce parsing schemes, our approach is free from the potentially disastrous compounding error. It is also easier to parallelize and much faster. Our model achieves the state-of-the-art single-model F1 score of 92.1 on the PTB and 86.4 on the CTB dataset, which surpasses the previous single-model results by a large margin.
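The core idea of decoding from syntactic distances can be illustrated with a short sketch: recursively split each span at the gap with the largest predicted distance. Tie-breaking and constituent labeling are simplified assumptions here.

```python
# Minimal sketch of turning syntactic distances into an (unlabeled) binary
# tree: recursively split the span at the position with the largest distance.

def distances_to_tree(words, distances):
    """words: list of n tokens; distances: n-1 scores for the gaps between them."""
    if len(words) == 1:
        return words[0]
    split = max(range(len(distances)), key=distances.__getitem__)  # largest gap
    left = distances_to_tree(words[:split + 1], distances[:split])
    right = distances_to_tree(words[split + 1:], distances[split + 1:])
    return (left, right)

print(distances_to_tree(["The", "dog", "barks"], [1.0, 2.5]))
# (('The', 'dog'), 'barks')  -> the largest distance separates subject and verb
```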

90 citations

Proceedings Article•DOI•
01 Nov 2020
TL;DR: The Label Attention Layer is introduced: a new form of self-attention where attention heads represent labels; the Label Attention heads are found to learn relations between syntactic categories and to show pathways for analyzing errors.
Abstract: Attention mechanisms have improved the performance of NLP tasks while allowing models to remain explainable. Self-attention is currently widely used; however, interpretability is difficult due to the numerous attention distributions. Recent work has shown that model representations can benefit from label-specific information, while facilitating interpretation of predictions. We introduce the Label Attention Layer: a new form of self-attention where attention heads represent labels. We test our novel layer by running constituency and dependency parsing experiments and show that our new model obtains new state-of-the-art results for both tasks on both the Penn Treebank (PTB) and the Chinese Treebank. Additionally, our model requires fewer self-attention layers compared to existing work. Finally, we find that the Label Attention heads learn relations between syntactic categories and show pathways to analyze errors.
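A minimal sketch of the label-head idea, assuming one learned query vector per syntactic label so that each head's attention distribution can be read as "where this label looks" in the sentence; sizes and the projection layout are illustrative, not the published Label Attention Layer.

```python
# Hedged sketch: one attention head per label, each with a learned query vector.
import torch

class LabelAttention(torch.nn.Module):
    def __init__(self, num_labels, d_model):
        super().__init__()
        self.label_queries = torch.nn.Parameter(torch.randn(num_labels, d_model))
        self.key_proj = torch.nn.Linear(d_model, d_model)
        self.value_proj = torch.nn.Linear(d_model, d_model)

    def forward(self, word_repr):               # word_repr: (seq_len, d_model)
        keys = self.key_proj(word_repr)
        values = self.value_proj(word_repr)
        scores = self.label_queries @ keys.T    # (num_labels, seq_len)
        attn = torch.softmax(scores / keys.shape[-1] ** 0.5, dim=-1)
        return attn @ values                    # one output vector per label head
```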

75 citations

References
Report•DOI•
TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Abstract: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, with over 3 million words of that material assigned skeletal grammatical structure. This material now includes a fully hand-parsed version of the classic Brown corpus. About one half of the papers at the ACL Workshop on Using Large Text Corpora this past summer were based on the materials generated by this grant.

8,377 citations


"In-Order Transition-based Constitue..." refers methods in this paper

  • ...For English data, we use the standard benchmark of WSJ sections in PTB (Marcus et al., 1993), where the sections 2-21 are taken for training data, section 22 for development data and section 23 for test for both dependency parsing and constituency parsing....


Journal Article•DOI•
TL;DR: Three statistical models for natural language parsing are described, extending probabilistic context-free grammars to lexicalized grammars and leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree.
Abstract: This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then lead to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, bigram lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. The models are evaluated on the Penn Wall Street Journal Treebank, showing that their accuracy is competitive with other models in the literature. To gain a better understanding of the models, we also give results on different constituent types, as well as a breakdown of precision/recall results in recovering various types of dependencies. We analyze various characteristics of the models through experiments on parsing accuracy, by collecting frequencies of various structures in the treebank, and through linguistically motivated examples. Finally, we compare the models to others that have been applied to parsing the treebank, aiming to give some explanation of the difference in performance of the various models.
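Schematically, the head-centered, top-down derivation factors the probability of a rule's right-hand side around the head child, with each decision conditioned on the lexical head. The form below paraphrases this decomposition (Model 1), omitting the distance features and STOP symbols for readability; it is a schematic rendering, not the exact published parameterization.

```latex
% Head-centered decomposition of a rule P -> L_m ... L_1 H R_1 ... R_n
% with head child H and head word h (distance features and STOP symbols omitted).
P(L_m \dots L_1 \, H \, R_1 \dots R_n \mid P, h)
  = P_h(H \mid P, h)\;
    \prod_{i=1}^{m} P_l\big(L_i(l_i) \mid P, H, h\big)\;
    \prod_{j=1}^{n} P_r\big(R_j(r_j) \mid P, H, h\big)
```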

1,956 citations


"In-Order Transition-based Constitue..." refers background in this paper

  • ...When making local decisions, rich information is available from readily built partial trees (Zhu et al., 2013; Watanabe and Sumita, 2015; Cross and Huang, 2016), which contributes to local disambiguation....


Proceedings Article•DOI•
01 Jan 2014
TL;DR: This work proposes a novel way of learning a neural network classifier for use in a greedy, transition-based dependency parser that can work very fast while achieving about a 2% improvement in unlabeled and labeled attachment scores on both English and Chinese datasets.
Abstract: Almost all current dependency parsers classify based on millions of sparse indicator features. Not only do these features generalize poorly, but the cost of feature computation restricts parsing speed significantly. In this work, we propose a novel way of learning a neural network classifier for use in a greedy, transition-based dependency parser. Because this classifier learns and uses just a small number of dense features, it can work very fast, while achieving about a 2% improvement in unlabeled and labeled attachment scores on both English and Chinese datasets. Concretely, our parser is able to parse more than 1000 sentences per second at 92.2% unlabeled attachment score on the English Penn Treebank.
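The greedy classifier idea can be sketched as follows: embed a small set of stack and buffer tokens, concatenate the dense embeddings, and score transitions with a feed-forward network. The feature inventory and layer sizes here are illustrative assumptions; the cube activation follows the paper's description.

```python
# Sketch of a greedy neural transition classifier over dense features.
import torch

class TransitionClassifier(torch.nn.Module):
    def __init__(self, vocab_size, num_transitions, d_emb=50, n_feats=18, d_hidden=200):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, d_emb)
        self.hidden = torch.nn.Linear(n_feats * d_emb, d_hidden)
        self.out = torch.nn.Linear(d_hidden, num_transitions)

    def forward(self, feature_ids):             # LongTensor of n_feats stack/buffer token ids
        x = self.embed(feature_ids).flatten()   # dense features, no sparse indicators
        h = torch.pow(self.hidden(x), 3)        # cube activation
        return self.out(h)                      # scores over SHIFT / LEFT-ARC / RIGHT-ARC ...
```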

1,939 citations


Additional excerpts

  • ...Seminal work employs transition-based methods (Chen and Manning, 2014)....


Proceedings Article•
Eugene Charniak•
29 Apr 2000
TL;DR: A new parser for parsing down to Penn treebank-style parse trees is presented that achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less, when trained and tested on the previously established "standard" sections of the Wall Street Journal treebank.
Abstract: We present a new parser for parsing down to Penn tree-bank style parse trees that achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less, when trained and tested on the previously established [5, 9, 10, 15, 17] "standard" sections of the Wall Street Journal treebank. This represents a 13% decrease in error rate over the best single-parser results on this corpus [9]. The major technical innovation is the use of a "maximum-entropy-inspired" model for conditioning and smoothing that lets us successfully test and combine many different conditioning events. We also present some partial results showing the effects of different conditioning information, including a surprising 2% improvement due to guessing the lexical head's pre-terminal before guessing the lexical head.

1,709 citations


"In-Order Transition-based Constitue..." refers background in this paper

  • ...However, there is a lack of top-down guidance from lookahead information, which can be useful (Johnson, 1998; Roark and Johnson, 1999; Charniak, 2000; Liu and Zhang, 2017)....


Proceedings Article•
01 Aug 2013
TL;DR: A Compositional Vector Grammar (CVG) is proposed, which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations and improves performance on the types of ambiguities that require semantic information, such as PP attachments.
Abstract: Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation does not capture the full syntactic or semantic richness of linguistic phrases, and attempts to improve on this by lexicalizing phrases or splitting categories only partly address the problem at the cost of huge feature spaces and sparseness. Instead, we introduce a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations. The CVG improves the PCFG of the Stanford Parser by 3.8% to obtain an F1 score of 90.4%. It is fast to train and, implemented approximately as an efficient reranker, it is about 20% faster than the current Stanford factored parser. The CVG learns a soft notion of head words and improves performance on the types of ambiguities that require semantic information, such as PP attachments.
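A hedged sketch of the "syntactically untied" composition step: the matrix used to compose two child vectors is selected by the children's syntactic categories, and the constituent score adds a PCFG log-probability term. Matrix shapes, names, and the scoring vector are illustrative assumptions.

```python
# Category-dependent (syntactically untied) composition of child vectors.
import numpy as np

d = 50
W = {("NP", "VP"): np.random.randn(d, 2 * d)}    # one matrix per category pair
score_vec = np.random.randn(d)

def compose(left_vec, left_cat, right_vec, right_cat, pcfg_logprob):
    parent = np.tanh(W[(left_cat, right_cat)] @ np.concatenate([left_vec, right_vec]))
    score = score_vec @ parent + pcfg_logprob    # neural score plus PCFG term
    return parent, score

np_vec, vp_vec = np.random.randn(d), np.random.randn(d)
s_vec, s_score = compose(np_vec, "NP", vp_vec, "VP", pcfg_logprob=-2.3)
```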

953 citations