Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms

doi:10.18653/V1/P18-1041

Open Access, Proceedings Article

Vol. 1, pp. 440-450

TL;DR

This paper conducts a point-by-point comparative study of Simple Word-Embedding-based Models (SWEMs), which consist of parameter-free pooling operations, against word-embedding-based RNN/CNN models.

Abstract:

Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, and word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging.
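The parameter-free pooling operations described in the abstract can be sketched on plain Python lists. This is a minimal sketch, not the authors' code: it assumes every token has already been mapped to an embedding vector of the same dimension, the function names are illustrative (they mirror the variant labels SWEM-aver, SWEM-max, SWEM-concat and SWEM-hier), and the fallback for texts shorter than the window is a hypothetical choice. As we read the paper's description, the hierarchical variant average-pools each local window of n consecutive words and then max-pools across windows, so local (n-gram) co-occurrence survives into the final representation.

```python
from typing import List

Vector = List[float]


def swem_aver(embeddings: List[Vector]) -> Vector:
    """Average pooling: element-wise mean of the word embeddings."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def swem_max(embeddings: List[Vector]) -> Vector:
    """Max pooling: per-dimension maximum over all words."""
    return [max(col) for col in zip(*embeddings)]


def swem_concat(embeddings: List[Vector]) -> Vector:
    """Concatenate the average-pooled and max-pooled features."""
    return swem_aver(embeddings) + swem_max(embeddings)


def swem_hier(embeddings: List[Vector], window: int) -> Vector:
    """Hierarchical pooling: average-pool every window of `window`
    consecutive words (an n-gram), then max-pool across the windows."""
    if len(embeddings) < window:
        # Hypothetical fallback: treat a short text as a single window.
        window = len(embeddings)
    local = [
        swem_aver(embeddings[i : i + window])
        for i in range(len(embeddings) - window + 1)
    ]
    return swem_max(local)
```

None of these functions has trainable parameters; in a full model the pooled vector would be fed to a downstream classifier or matching layer, which is where all learning (besides the embeddings) happens.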

Citations

Proceedings Article

Predicting Guesses and Slips Through Question Encoding with Complexity Hints

TL;DR: This paper explores the potential usefulness of complexity hints in predicting mastery of algebra problems: natural language processing methods are used to derive a complexity estimate for each assessment problem, which is then used to estimate guess and slip probabilities with individualized neural network models.

Proceedings Article

Word Order is Considerable: Contextual Position-aware Graph Neural Network for Text Classification

TL;DR: Wang et al. proposed a Contextual Position-Aware Graph Neural Network (CGAN) for text classification, which includes a Position-aware Graph Attention module and a Contextual Fusion module.

Journal Article

Text classification on heterogeneous information network via enhanced GCN and knowledge

Hui Li and 4 co-authors

30 Mar 2023

Neural Computing and Applications

Proceedings Article

An Emotional Comfort Framework for Improving User Satisfaction in E-Commerce Customer Service Chatbots

Shuangyong Song and 3 co-authors

TL;DR: Wang et al. presented AliMe Assist, a Chinese intelligent assistant designed to create an innovative online shopping experience in e-commerce, which offers assistance, customer service, and chatting services.

Book Chapter

A New Method to Measure Similarity of Words in Japanese Twitter Based on Related Images

Zhelin Xu and 2 co-authors

01 Jan 2022

Lecture Notes in Computer Science

TL;DR: This article proposes a method that uses word-related images to measure the similarity between words, assuming that words with the same meaning have similar or common related images; a manually annotated Japanese data set was created to evaluate the proposed method.

References

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma and 1 co-author

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
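The update rule summarized above can be sketched for a flat list of scalar parameters. This is a minimal sketch rather than a reference implementation: the function name `adam_step` and its list-based signature are hypothetical, and the defaults (`lr=0.001`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are the settings commonly quoted from the Adam paper. Each step maintains exponential moving averages of the gradient (first moment) and of the squared gradient (second raw moment), corrects their bias toward zero at early steps, and scales the step by their ratio.

```python
import math
from typing import List, Tuple


def adam_step(
    params: List[float],
    grads: List[float],
    m: List[float],
    v: List[float],
    t: int,
    lr: float = 0.001,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float], List[float]]:
    """One Adam update at time step t (t counts from 1).
    Returns the new parameters and the updated moment estimates."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # first-moment (mean) estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g    # second raw-moment estimate
        m_hat = m_i / (1 - beta1 ** t)             # bias-corrected first moment
        v_hat = v_i / (1 - beta2 ** t)             # bias-corrected second moment
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

Because of the bias correction, the very first step moves each parameter by roughly `lr` in the direction opposite its gradient, regardless of the gradient's scale.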

Journal Article

Long short-term memory

Sepp Hochreiter and 1 co-author

01 Nov 1997

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
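The constant error carousel mentioned above can be illustrated with a toy, scalar memory cell. This is a minimal sketch under stated assumptions: a single cell with an input gate and an output gate and no forget gate (the forget gate was a later extension of LSTM); the weight dictionary `w`, its keys, and the use of `tanh` as the squashing function are illustrative choices, and the original paper's squashing functions differ in detail. The key line is `c = c_prev + i * g`: the cell's self-connection has a fixed weight of 1.0, so when the input gate is closed the stored value, and the error flowing back through it, is carried forward unchanged.

```python
import math
from typing import Dict, Tuple

# Hypothetical weight format: gate name -> (input weight, recurrent weight, bias).
Weights = Dict[str, Tuple[float, float, float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm1997_step(x: float, h_prev: float, c_prev: float, w: Weights) -> Tuple[float, float]:
    """One step of a single scalar memory cell with input and output gates only."""

    def net(name: str) -> float:
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    i = sigmoid(net("in"))        # input gate: how much new input enters the cell
    o = sigmoid(net("out"))       # output gate: how much of the cell is exposed
    g = math.tanh(net("cell"))    # candidate cell input (tanh is an assumption)
    c = c_prev + i * g            # constant error carousel: self-weight fixed at 1.0
    h = o * math.tanh(c)          # gated cell output
    return h, c
```

With the input-gate bias driven strongly negative, `c` stays essentially fixed across many steps, which is the behavior that lets the cell bridge long time lags.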

Proceedings Article

Attention Is All You Need

Ashish Vaswani and 7 co-authors

TL;DR: This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
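The attention mechanism at the heart of that architecture is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The sketch below computes it for a single head on plain Python lists; masking, multi-head projections, and batching are deliberately omitted, and the function names are illustrative rather than taken from any library.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)                               # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head softmax(Q K^T / sqrt(d_k)) V without masking."""
    d_k = len(k[0])
    out = []
    for q_row in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value rows.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out
```

Dividing by sqrt(d_k) keeps the dot products from growing with the key dimension, which would otherwise push the softmax toward near one-hot weights.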

Journal Article

Dropout: a simple way to prevent neural networks from overfitting

Nitish Srivastava and 4 co-authors

01 Jan 2014

Journal of Machine Learning Research

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

Proceedings Article

GloVe: Global Vectors for Word Representation

Jeffrey Pennington and 2 co-authors

TL;DR: Introduces a new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
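The log-bilinear regression model summarized above is fit by weighted least squares on log co-occurrence counts, J = sum over nonzero X_ij of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2, where f(x) = (x / x_max)^alpha for x < x_max and 1 otherwise. The sketch below only evaluates this objective (there is no training loop); the defaults `x_max=100` and `alpha=0.75` follow the values reported in the GloVe paper, while the function names and the dictionary-based co-occurrence format are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple


def glove_weight(x: float, x_max: float = 100.0, alpha: float = 0.75) -> float:
    """Weighting f(X_ij): down-weights rare pairs and caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(
    cooc: Dict[Tuple[int, int], float],  # (word index, context index) -> count X_ij
    w: List[List[float]],                # word vectors w_i
    w_ctx: List[List[float]],            # context vectors w~_j
    b: List[float],                      # word biases b_i
    b_ctx: List[float],                  # context biases b~_j
) -> float:
    """Weighted least-squares objective over the nonzero co-occurrence counts."""
    total = 0.0
    for (i, j), x_ij in cooc.items():
        dot = sum(a * c for a, c in zip(w[i], w_ctx[j]))
        diff = dot + b[i] + b_ctx[j] - math.log(x_ij)
        total += glove_weight(x_ij) * diff * diff
    return total
```

Summing only over nonzero counts keeps log X_ij defined, and the cap at x_max prevents very frequent pairs from dominating the objective.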

Related Papers (5)

GloVe: Global Vectors for Word Representation

Convolutional Neural Networks for Sentence Classification

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Long short-term memory

Distributed Representations of Words and Phrases and their Compositionality