Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function

doi:10.1609/AAAI.V33I01.33016940

Open AccessJournal ArticleDOI

Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function

Devendra Singh Sachan, +2 more

- Vol. 33, Iss: 01, pp 6940-6948

Chats0

TLDR

This paper develops a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results compared with more complex approaches, and shows the generality of the mixed objective function by improving the performance on relation extraction task.

Abstract:

In this paper, we study bidirectional LSTM network for the task of text classification using both supervised and semisupervised approaches. Several prior works have suggested that either complex pretraining schemes using unsupervised methods such as language modeling (Dai and Le 2015; Miyato, Dai, and Goodfellow 2016) or complicated models (Johnson and Zhang 2017) are necessary to achieve a high classification accuracy. However, we develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results compared with more complex approaches. Furthermore, in addition to cross-entropy loss, by using a combination of entropy minimization, adversarial, and virtual adversarial losses for both labeled and unlabeled data, we report state-of-theart results for text classification task on several benchmark datasets. In particular, on the ACL-IMDB sentiment analysis and AG-News topic classification datasets, our method outperforms current approaches by a substantial margin. We also show the generality of the mixed objective function by improving the performance on relation extraction task.1

Citations

PDF

Open Access

More filters

Proceedings Article

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang, +5 more

TL;DR: The authors proposes XLNet, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT The authors.

...read moreread less

Posted Content

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang, +5 more

- 19 Jun 2019 -

arXiv: Computation and Language

TL;DR: XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autore progressive formulation.

...read moreread less

Posted Content

Unsupervised Data Augmentation for Consistency Training

Qizhe Xie, +4 more

- 29 Apr 2019 -

arXiv: Learning

TL;DR: A new perspective on how to effectively noise unlabeled examples is presented and it is argued that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.

...read moreread less

Journal ArticleDOI

Deep Learning--based Text Classification: A Comprehensive Review

Shervin Minaee, +5 more

- 17 Apr 2021 -

ACM Computing Surveys

TL;DR: This paper provided a comprehensive review of more than 150 deep learning-based models for text classification developed in recent years, and discussed their technical contributions, similarities, and strengths, and provided a quantitative analysis of the performance of different deep learning models on popular benchmarks.

...read moreread less

Journal ArticleDOI

A review on the long short-term memory model

Greg Van Houdt, +3 more

- 01 Dec 2020 -

Artificial Intelligence Review

TL;DR: A comprehensive review of LSTM’s formulation and training, relevant applications reported in the literature and code resources implementing this model for a toy example are presented.

...read moreread less

Collapse

References

PDF

Open Access

More filters

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

...read moreread less

Journal ArticleDOI

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

...read moreread less

Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.

...read moreread less

Automatic differentiation in PyTorch

Adam Paszke, +9 more

TL;DR: An automatic differentiation module of PyTorch is described — a library designed to enable rapid research on machine learning models that focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead.

...read moreread less

Book ChapterDOI

Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

Thorsten Joachims

TL;DR: This paper explores the use of Support Vector Machines for learning text classifiers from examples and analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task.

...read moreread less

Collapse

Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function

Citations

XLNet: Generalized Autoregressive Pretraining for Language Understanding

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Unsupervised Data Augmentation for Consistency Training

Deep Learning--based Text Classification: A Comprehensive Review

A review on the long short-term memory model

References

Adam: A Method for Stochastic Optimization

Long short-term memory

Distributed Representations of Words and Phrases and their Compositionality

Automatic differentiation in PyTorch

Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

Related Papers (5)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Long short-term memory

Glove: Global Vectors for Word Representation

Adam: A Method for Stochastic Optimization

Convolutional Neural Networks for Sentence Classification