What Do Recurrent Neural Network Grammars Learn About Syntax?
Citations
Cites background from "What Do Recurrent Neural Network Grammars Learn About Syntax?"
...by using extrinsic probing tasks to examine whether certain linguistic properties can be predicted from those representations (Shi et al., 2016; Linzen et al., 2016; Belinkov et al., 2017), or by ablations to the models to investigate how behavior varies (Li et al., 2016b; Smith et al., 2017)....
[...]
References
"What Do Recurrent Neural Network Gr..." refers methods in this paper
...The new phrase's final representation uses element-wise multiplication (⊙) with respect to both t_nt and m, a process reminiscent of the LSTM "forget" gate: c = g ⊙ t_nt + (1 − g) ⊙ m. (3) The intuition is that the composed representation should incorporate both nonterminal information and information about the constituents (through weighted sum and attention mechanism)....
[...]
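To make the mixing concrete, here is a minimal NumPy sketch of a gate of this form; the parameter names (W_g, b_g), the way the gate is computed from t_nt and m, and the toy dimensionality are assumptions made for illustration, not the paper's exact formulation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy parameters: W_g, b_g, and the dimensionality d are illustrative
# placeholders, not the paper's actual parameter names or sizes.
d = 4
rng = np.random.default_rng(0)
W_g = rng.normal(scale=0.1, size=(d, 2 * d))
b_g = np.zeros(d)

def compose(t_nt, m):
    # Mix the nonterminal embedding t_nt with the constituent summary m.
    g = sigmoid(W_g @ np.concatenate([t_nt, m]) + b_g)  # sigmoid gate
    c = g * t_nt + (1.0 - g) * m                         # Eq. (3): element-wise mixing
    return c

print(compose(rng.normal(size=d), rng.normal(size=d)))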
...The length of these vectors is defined by the dimensionality of the bidirectional LSTM used in the original composition function (Fig....
[...]
...Vinyals et al. (2015) directly predict the sequence of nonterminals, “shifts” (which consume a terminal symbol), and parentheses from left to right, conditional on the input terminal sequence x, while Choe and Charniak (2016) used a sequential LSTM language model on the same linearized trees to create a generative variant of the Vinyals et al. (2015) model....
[...]
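For concreteness, a small sketch of what such linearized trees look like; the example sentence, the bracketing, and the XX placeholder convention are illustrative simplifications rather than excerpts from either cited model.

# Illustrative only: a phrase-structure tree for "the hungry cat meows",
# written as the kind of left-to-right bracket sequence that a sequential
# LSTM language model (Choe and Charniak, 2016) can be trained on.
# Spaces before closing brackets make each bracket its own token.
linearized = "(S (NP the hungry cat ) (VP meows ) )"

# Vinyals et al. (2015) instead predict only the structural symbols,
# conditioned on the input words; roughly, terminals are replaced by a
# generic placeholder token in the output sequence (details simplified here).
structure_only = "(S (NP XX XX XX ) (VP XX ) )"

print(linearized.split())
print(structure_only.split())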
...Even when the prepositional phrase is only used to make a connection between two noun phrases (e.g., "PP → NP after NP", 10), the prepositional connector is still considered the most salient element.... [Footnote 7: Cf. Li et al. (2016), where sequential LSTMs discover polarity information in sentiment analysis, although perhaps more surprising as polarity information is less intuitively central to syntax and language modeling.]
[...]
...We follow the same hyperparameters as the generative model proposed in Dyer et al. (2016). The generative model did not use any pretrained word embeddings or POS tags; a discriminative variant of the standard RNNG was used to obtain tree samples for the generative model....
[...]
Additional excerpts
...To investigate what the stack-only RNNG learns about headedness (and later endocentricity), we propose a variant of the composition function that makes use of an explicit attention mechanism (Bahdanau et al., 2015) and a sigmoid gate with multiplicative interactions, henceforth called GA-RNNG....
[...]
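A minimal sketch of an attention step of this kind over a phrase's constituent vectors; the bilinear scoring matrix V, the use of the nonterminal embedding as the query, and the toy dimensions are assumptions for illustration and differ from the paper's exact parameterization.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy sizes and a bilinear scoring matrix V; both are assumptions made
# purely for illustration.
d = 4
rng = np.random.default_rng(1)
V = rng.normal(scale=0.1, size=(d, d))

def attend(t_nt, constituents):
    # Weight each constituent vector by its relevance to the nonterminal.
    scores = constituents @ (V @ t_nt)   # one score per constituent
    weights = softmax(scores)            # attention distribution over children
    m = weights @ constituents           # weighted sum fed to the gate in Eq. (3)
    return m, weights

m, w = attend(rng.normal(size=d), rng.normal(size=(3, d)))
print(w)  # attention weights over the three toy constituents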
"What Do Recurrent Neural Network Gr..." refers background or methods in this paper
...Extensive prior work on phrase-structure parsing typically employs the probabilistic context-free grammar formalism, with lexicalized (Collins, 1997) and nonterminal (Johnson, 1998; Klein and Manning, 2003) augmentations....
[...]
...Every generated word goes onto the stack, too; and some past words will be composed into larger structures, but through the composition function, they are all still “available” to the network that predicts the next action....
[...]
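A toy, non-neural illustration of this stack discipline, with symbols in place of vectors and a placeholder compose() standing in for the learned composition function:

def compose(label, children):
    # Stands in for the neural composition function; the composed phrase is
    # pushed back onto the stack as a single element, so earlier words remain
    # "available" through it when the next action is predicted.
    return (label, tuple(children))

def reduce(stack):
    # Pop completed children back to the most recent open nonterminal,
    # then push a single composed representation for the new phrase.
    children = []
    while not (isinstance(stack[-1], str) and stack[-1].startswith("NT(")):
        children.append(stack.pop())
    label = stack.pop()[3:-1]
    children.reverse()
    stack.append(compose(label, children))

# Transition sequence for "the cat meows" (toy example):
stack = []
stack.append("NT(S)")
stack += ["the", "cat"]          # GEN actions push generated words
stack.append("NT(VP)")
stack.append("meows")
reduce(stack)                    # VP -> ('VP', ('meows',))
reduce(stack)                    # S  -> ('S', ('the', 'cat', ('VP', ('meows',))))
print(stack)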
...Unlike previous works that rely on hand-crafted rules to compose more fine-grained phrase representations (Collins, 1997; Klein and Manning, 2003), the RNNG implicitly parameterizes the information passed through compositions of phrases (in Θ and the neural network architecture), hence weakening…...
[...]
"What Do Recurrent Neural Network Gr..." refers methods in this paper
...Augmenting grammars with lexical head information has a long history in parsing, starting with the models of Collins (1997), and theories of syntax such as the “bare phrase structure” hypothesis of the Minimalist Program (Chomsky, 1993) posit that phrases are represented purely by single lexical heads....
[...]