Open Access · Posted Content

When Are Tree Structures Necessary for Deep Learning of Representations?

TLDR
The authors benchmark recursive neural models against sequential recurrent neural models (simple recurrent and LSTM networks) on four tasks: sentiment classification at the sentence and phrase level, matching questions to answer-phrases, discourse parsing, and semantic relation extraction. They find that recursive models help mainly on tasks that require associating headwords across long distances.
Abstract
Recursive neural models, which use syntactic parse trees to recursively generate representations bottom-up, are a popular architecture. But there have not been rigorous evaluations showing for exactly which tasks this syntax-based method is appropriate. In this paper we benchmark recursive neural models against sequential recurrent neural models (simple recurrent and LSTM models), enforcing apples-to-apples comparison as much as possible. We investigate 4 tasks: (1) sentiment classification at the sentence level and phrase level; (2) matching questions to answer-phrases; (3) discourse parsing; (4) semantic relation extraction (e.g., the component-whole relation between nouns). Our goal is to understand better when, and why, recursive models can outperform simpler models. We find that recursive models help mainly on tasks (like semantic relation extraction) that require associating headwords across a long distance, particularly on very long sequences. We then introduce a method for allowing recurrent models to achieve similar performance: breaking long sentences into clause-like units at punctuation and processing them separately before combining. Our results thus help clarify the limitations of both classes of models, and suggest directions for improving recurrent models.
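The clause-splitting method described at the end of the abstract lends itself to a short sketch. Below is a minimal, hypothetical PyTorch illustration (not the authors' implementation): a sentence is broken into clause-like units at punctuation, each unit is encoded by a shared LSTM, and the per-clause vectors are averaged before classification. The splitting rule, the averaging step, and all names are assumptions made for illustration.

```python
# Hypothetical sketch: encode each clause-like unit with a shared LSTM,
# then average the clause vectors to represent the whole sentence.
import re

import torch
import torch.nn as nn


def split_into_clauses(sentence):
    # Break at commas, semicolons, and colons; keep non-empty chunks.
    return [c.strip() for c in re.split(r"[,;:]", sentence) if c.strip()]


class ClauseLSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, clause_token_ids):
        # clause_token_ids: list of 1-D LongTensors, one per clause-like unit.
        clause_vecs = []
        for ids in clause_token_ids:
            _, (h, _) = self.lstm(self.embed(ids).unsqueeze(0))
            clause_vecs.append(h[-1, 0])                      # final hidden state of this clause
        sentence_vec = torch.stack(clause_vecs).mean(dim=0)   # combine clause vectors
        return self.out(sentence_vec)                         # class scores for the sentence
```

In use, split_into_clauses would feed whatever tokenizer and vocabulary lookup produces the per-clause index tensors; the point is only that each clause is encoded independently before the representations are combined.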

Citations
Proceedings Article · DOI

Document Modeling with Gated Recurrent Neural Network for Sentiment Classification

TL;DR: A neural network model is introduced to learn vector-based document representations in a unified, bottom-up fashion; it dramatically outperforms standard recurrent neural networks in document modeling for sentiment classification.
Proceedings Article · DOI

End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures

TL;DR: A novel end-to-end neural model is proposed to extract entities and the relations between them; it compares favorably (in F1-score) to the state-of-the-art CNN-based model on nominal relation classification (SemEval-2010 Task 8).
Proceedings Article · DOI

Aspect Level Sentiment Classification with Deep Memory Network

TL;DR: The authors propose a deep memory network for aspect-level sentiment classification that explicitly captures the importance of each context word when inferring the sentiment polarity of an aspect; the importance degrees and the text representation are computed by multiple computational layers, each of which is a neural attention model over an external memory.
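As a rough sketch of the mechanism this TL;DR describes, multi-hop attention over an external memory of context word vectors, queried by an aspect vector, might look as follows. This is a simplification, not the paper's exact architecture; the hop update rule and all names are assumptions.

```python
# Hypothetical multi-hop attention over an external memory of context word vectors.
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def memory_hops(memory, aspect_vec, W, b, hops=3):
    # memory: (n_context_words, d) word vectors; aspect_vec: (d,) query; W: (d, d); b: (d,)
    q = aspect_vec
    for _ in range(hops):
        attn = softmax(memory @ q)         # importance of each context word
        attended = attn @ memory           # weighted sum over the memory
        q = np.tanh(W @ attended + b) + q  # linear layer plus a residual connection
    return q                               # representation fed to a sentiment classifier
```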
Posted Content

A C-LSTM Neural Network for Text Classification

TL;DR: C-LSTM is a novel, unified model for sentence representation and text classification that outperforms both CNN and LSTM baselines and achieves excellent performance on these tasks.
Proceedings Article

Effective LSTMs for Target-Dependent Sentiment Classification

TL;DR: Two target-dependent long short-term memory models, in which target information is automatically taken into account, are developed; they achieve state-of-the-art performance without using a syntactic parser or external sentiment lexicons.
References
Journal Article · DOI

Long Short-Term Memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
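To make the gating concrete, here is a minimal NumPy sketch of a single LSTM step. It uses the now-standard formulation with a forget gate, which was added after the original paper; the stacked parameter layout and names are assumptions.

```python
# Minimal single-step LSTM sketch (modern variant with a forget gate).
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    # W: (4H, D), U: (4H, H), b: (4H,) hold stacked input/forget/output/candidate parameters.
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    # The cell state is the "constant error carousel": it is updated additively,
    # so error signals can flow across many time steps without vanishing.
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c
```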
Proceedings Article · DOI

GloVe: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
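For orientation, the weighted least-squares objective behind this log-bilinear model is usually written as below, where X_ij counts co-occurrences of words i and j; the notation is reproduced here only for context, and the paper should be consulted for details.

```latex
J = \sum_{i,j=1}^{V} f(X_{ij})\,\bigl(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\bigr)^{2},
\qquad
f(x) =
\begin{cases}
(x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\
1 & \text{otherwise.}
\end{cases}
```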
Posted Content

Efficient Estimation of Word Representations in Vector Space

TL;DR: Two novel model architectures are proposed for computing continuous vector representations of words from very large data sets; the quality of these representations is measured in a word similarity task, and the results are compared to the previously best-performing techniques based on different types of neural networks.
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend it by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as hard segments explicitly.
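In equation form, the soft-search described here scores each encoder annotation h_j against the previous decoder state s_{i-1} and replaces the single fixed-length sentence vector with a per-target-word context vector c_i (shown in the usual notation):

```latex
e_{ij} = a\bigl(s_{i-1}, h_j\bigr), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{T_x} \alpha_{ij}\, h_j .
```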
Posted Content

Sequence to Sequence Learning with Neural Networks

TL;DR: This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions about the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence, which made the optimization problem easier.