Deep pyramid convolutional neural networks for text categorization

doi:10.18653/V1/P17-1052

Open Access, Proceedings Article

- Vol. 1, pp 562-570

TLDR

A low-complexity word-level deep convolutional neural network architecture for text categorization that can efficiently represent long-range associations in text and outperforms the previous best models on six benchmark datasets for sentiment classification and topic categorization.

Abstract:

This paper proposes a low-complexity word-level deep convolutional neural network (CNN) architecture for text categorization that can efficiently represent long-range associations in text. In the literature, several deep and complex neural networks have been proposed for this task, assuming availability of relatively large amounts of training data. However, the associated computational complexity increases as the networks go deeper, which poses serious challenges in practical applications. Moreover, it was shown recently that shallow word-level CNNs are more accurate and much faster than the state-of-the-art very deep nets such as character-level CNNs even in the setting of large training data. Motivated by these findings, we carefully studied deepening of word-level CNNs to capture global representations of text, and found a simple network architecture with which the best accuracy can be obtained by increasing the network depth without increasing computational cost by much. We call it deep pyramid CNN. The proposed model with 15 weight layers outperforms the previous best models on six benchmark datasets for sentiment classification and topic categorization.
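
The claim that depth can grow "without increasing computational cost by much" can be illustrated with a little arithmetic. The sketch below is not the authors' code, and the abstract does not spell out the mechanism. It assumes, as the "pyramid" name suggests, that each block keeps the number of feature maps fixed and halves the sequence length seen by the next block, so each block costs about half as much as the one before. The function names and the unit cost of the first block are hypothetical.

```python
def pyramid_cost(num_blocks: int, first_block_cost: float = 1.0) -> float:
    """Relative total cost when each block processes half as many
    positions as the previous one (assumed stride-2 downsampling)."""
    return sum(first_block_cost / (2 ** i) for i in range(num_blocks))


def flat_cost(num_blocks: int, first_block_cost: float = 1.0) -> float:
    """Relative total cost when every block sees the full-length sequence."""
    return first_block_cost * num_blocks


if __name__ == "__main__":
    for blocks in (1, 3, 7, 15):
        print(blocks, pyramid_cost(blocks), flat_cost(blocks))
```

Under this assumption the pyramid total is a geometric series that stays below twice the cost of the first block however many blocks are stacked, whereas the cost of a network without downsampling grows linearly with depth.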

Citations

Proceedings Article

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang, +5 more

TL;DR: The authors propose XLNet, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT.

Posted Content

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang, +5 more

- 19 Jun 2019 -

arXiv: Computation and Language

TL;DR: XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.

Posted Content

An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

Shaojie Bai, +2 more

- 04 Mar 2018 -

arXiv: Learning

TL;DR: A systematic evaluation of generic convolutional and recurrent architectures for sequence modeling concludes that the common association between sequence modeling and recurrent networks should be reconsidered, and that convolutional networks should be regarded as a natural starting point for sequence modeling tasks.

Proceedings Article

Universal Language Model Fine-tuning for Text Classification

Jeremy Howard, +1 more

TL;DR: Universal Language Model Fine-tuning (ULMFiT) is an effective transfer learning method that can be applied to any task in NLP; the paper also introduces techniques that are key for fine-tuning a language model.

Posted Content

Unsupervised Data Augmentation for Consistency Training

Qizhe Xie, +4 more

- 29 Apr 2019 -

arXiv: Learning

TL;DR: A new perspective on how to effectively noise unlabeled examples is presented, and it is argued that the quality of the noise, specifically noise produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.

References

Proceedings Article

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: The authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the resulting networks won 1st place on the ILSVRC 2015 classification task.

Journal Article

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.

Proceedings Article

Convolutional Neural Networks for Sentence Classification

Yoon Kim

TL;DR: The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification; a simple modification to the architecture allows the use of both task-specific and static vectors.

Related Papers (5)

Convolutional Neural Networks for Sentence Classification

Glove: Global Vectors for Word Representation

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Long short-term memory

Attention is All you Need