Open Access Proceedings Article (DOI)

Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms

TL;DR
This paper conducts a point-by-point comparative study of Simple Word-Embedding-based Models (SWEMs), which consist of parameter-free pooling operations, against word-embedding-based RNN/CNN models.
Abstract
Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, and word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging.
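For concreteness, the pooling operations described above reduce a sequence of word embeddings to a fixed-length vector with no learned parameters. Below is a minimal NumPy sketch of average pooling, the proposed max-pooling, and the proposed hierarchical pooling; the window size n=5 is an illustrative choice, not a value taken from the abstract.

    import numpy as np

    def swem_aver(emb):
        """Average pooling: element-wise mean over the sequence. emb: (seq_len, dim)."""
        return emb.mean(axis=0)

    def swem_max(emb):
        """Max pooling: element-wise max over the sequence, yielding salient,
        interpretable features per embedding dimension."""
        return emb.max(axis=0)

    def swem_hier(emb, n=5):
        """Hierarchical pooling: average within each length-n window (retaining
        local n-gram order), then max-pool across windows."""
        seq_len, _ = emb.shape
        n = min(n, seq_len)
        windows = np.stack([emb[i:i + n].mean(axis=0)
                            for i in range(seq_len - n + 1)])
        return windows.max(axis=0)

    # Toy usage: 10 words, 300-dimensional embeddings.
    emb = np.random.randn(10, 300)
    print(swem_aver(emb).shape, swem_max(emb).shape, swem_hier(emb).shape)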



Citations
Journal Article (DOI)

A deep neural network model for fashion collocation recommendation using side information in e-commerce

TL;DR: Wang et al. developed a fashion collocation recommendation model that leverages accessible side information in e-commerce (textual descriptions, purchase data, and category information of items), which generally bears valuable information for this task.
Proceedings Article (DOI)

Neural Self-Training through Spaced Repetition

TL;DR: This work introduces a new data sampling technique based on spaced repetition that dynamically samples informative and diverse unlabeled instances with respect to individual learner and instance characteristics, and shows that it outperforms current semi-supervised learning approaches developed for neural networks on publicly available datasets.
Book Chapter (DOI)

A Weighted Word Embedding Model for Text Classification

TL;DR: A model called the weighted word embedding model (WWEM) is proposed, a variant of the NBOW model that introduces term weighting schemes and n-grams, generating informative sentence or document representations that account for the importance of individual words and for word-order information.
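As a rough illustration of the weighted-NBOW idea, the sketch below averages word embeddings under per-word importance weights; the TF-IDF-style weights are a hypothetical stand-in, since the summary does not specify WWEM's exact weighting schemes.

    import numpy as np

    def weighted_nbow(emb, weights):
        """Weighted average of word embeddings.
        emb: (seq_len, dim); weights: (seq_len,) importance scores, e.g. TF-IDF."""
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()                      # normalize weights to sum to 1
        return (w[:, None] * emb).sum(axis=0)

    emb = np.random.randn(6, 300)            # 6 words, 300-dim embeddings
    tfidf = [0.1, 0.9, 0.3, 0.2, 0.8, 0.4]   # hypothetical per-word weights
    doc_vec = weighted_nbow(emb, tfidf)      # (300,) document representation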
Posted Content

Recursive Graphical Neural Networks for Text Classification.

TL;DR: A novel Recursive Graphical Neural Network model (ReGNN) is proposed to represent text organized in the form of a graph; to alleviate the over-smoothing problem and encourage the exchange between local and global information, a global graph-level node is designed.
Journal Article (DOI)

Time-sync comments denoising via graph convolutional and contextual encoding

TL;DR: This study proposes GCCED, a graph convolutional and contextual encoding model for the time-sync comment (TSC) semantic denoising problem, and demonstrates that it outperforms other baselines on almost all classification metrics.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
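For reference, a minimal sketch of a single Adam update with the paper's default hyperparameters (step size 0.001, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8):

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        """One Adam update. t is the 1-based step count."""
        m = beta1 * m + (1 - beta1) * grad       # first moment (mean of gradients)
        v = beta2 * v + (1 - beta2) * grad**2    # second moment (uncentered variance)
        m_hat = m / (1 - beta1**t)               # bias correction
        v_hat = v / (1 - beta2**t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v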
Journal Article (DOI)

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
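A sketch of one step of an LSTM cell in its now-standard form is given below; note that the forget gate is a later extension of the original 1997 formulation, which enforced constant error flow through the cell state.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h, c, W, b):
        """One LSTM step. x: input vector; h: hidden state; c: cell state.
        W: (4*hidden, input+hidden) stacked gate weights; b: (4*hidden,) biases."""
        z = W @ np.concatenate([x, h]) + b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
        g = np.tanh(g)                                # candidate cell update
        c = f * c + i * g                             # additive cell-state path
        h = o * np.tanh(c)
        return h, c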
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
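The scaled dot-product attention at the core of the proposed architecture, sketched in NumPy (single head, no masking):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
        Q: (n_q, d_k); K: (n_k, d_k); V: (n_k, d_v)."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)
        scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V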
Journal Article

Dropout: a simple way to prevent neural networks from overfitting

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
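Below is a sketch of inverted dropout, the scaling convention most modern implementations use at training time; the paper's original formulation instead scales activations down at test time.

    import numpy as np

    def dropout(x, p=0.5, training=True, rng=None):
        """Zero each activation with probability p during training, scaling
        survivors by 1/(1-p) so expected activations match at test time."""
        if not training or p == 0.0:
            return x
        rng = np.random.default_rng() if rng is None else rng
        mask = rng.random(x.shape) >= p
        return x * mask / (1.0 - p)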
Proceedings Article (DOI)

GloVe: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
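For reference, the model's weighted least-squares objective over the word co-occurrence matrix X:

    J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
    \qquad
    f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}

where w_i and \tilde{w}_j are word and context vectors, b_i and \tilde{b}_j are biases, and f is the weighting function that caps the influence of very frequent co-occurrences.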