Open Access · Proceedings Article · DOI

Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms

TL;DR
This paper conducts a point-by-point comparative study of Simple Word-Embedding-based Models (SWEMs), which consist of parameter-free pooling operations, against word-embedding-based RNN/CNN models.
Abstract
Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, and word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging.
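To make the pooling operations concrete, below is a minimal NumPy sketch of the three SWEM variants (average, max, and hierarchical pooling); the function names and the window size are illustrative choices, not the paper's released implementation.

import numpy as np

def swem_aver(emb):
    # SWEM-aver: element-wise average over all word embeddings.
    return emb.mean(axis=0)

def swem_max(emb):
    # SWEM-max: element-wise max, so each dimension keeps its most salient word.
    return emb.max(axis=0)

def swem_hier(emb, window=5):
    # SWEM-hier: average within each local window (preserving n-gram order),
    # then max-pool over the window averages.
    n = emb.shape[0]
    if n <= window:
        return emb.mean(axis=0)  # degenerate case: one window covers the text
    local = np.stack([emb[i:i + window].mean(axis=0)
                      for i in range(n - window + 1)])
    return local.max(axis=0)

Each function maps a (seq_len, d) matrix of word embeddings for one text to a single d-dimensional representation, with no trainable compositional parameters.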



Citations
Posted Content

Graph Convolutional Networks for Text Classification

TL;DR: This paper proposes a Text Graph Convolutional Network (Text GCN) for text classification, which jointly learns embeddings for both words and documents, supervised by the known class labels of documents.
Journal Article · DOI

A Term Weighted Neural Language Model and Stacked Bidirectional LSTM Based Framework for Sarcasm Identification

TL;DR: Sarcasm detection in text is one of the most challenging tasks in NLP; this paper presents an effective sarcasm identification framework for social media data that pursues the paradigms of neural language models and deep neural networks.
Posted Content

Knowledge-Enriched Transformer for Emotion Detection in Textual Conversations

TL;DR: A Knowledge-Enriched Transformer (KET) is proposed, where contextual utterances are interpreted using hierarchical self-attention and external commonsense knowledge is dynamically leveraged using a context-aware affective graph attention mechanism.
Posted Content

Tensor Graph Convolutional Networks for Text Classification

TL;DR: This paper investigates graph-based neural networks for the text classification problem with a new framework, TensorGCN (tensor graph convolutional networks), which presents an effective way to harmonize and integrate heterogeneous information from different kinds of graphs.
Posted Content

Estimating Training Data Influence by Tracking Gradient Descent

TL;DR: TracIn is a method that computes the influence of a training example on a prediction made by the model by tracing how the loss on the test point changes during the training process whenever the training example of interest is utilized.
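As a rough illustration of that idea, here is a minimal sketch of the checkpoint-based approximation (TracInCP); `loss_grad` is an assumed helper returning a flattened gradient vector, and all names here are hypothetical rather than taken from the authors' code.

import numpy as np

def tracin_cp(checkpoints, learning_rates, z_train, z_test, loss_grad):
    # Influence of training example z_train on test example z_test:
    # sum over saved checkpoints of the learning rate times the dot
    # product of the loss gradients at the two examples.
    influence = 0.0
    for weights, lr in zip(checkpoints, learning_rates):
        g_train = loss_grad(weights, z_train)  # gradient on the training point
        g_test = loss_grad(weights, z_test)    # gradient on the test point
        influence += lr * np.dot(g_train, g_test)
    return influence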
References
Proceedings Article

Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions

TL;DR: A novel machine learning framework based on recursive autoencoders for sentence-level prediction of sentiment label distributions that outperforms other state-of-the-art approaches on commonly used datasets, without using any pre-defined sentiment lexica or polarity shifting rules.
Proceedings Article · DOI

A Decomposable Attention Model for Natural Language Inference

TL;DR: The authors use attention to decompose the problem into subproblems that can be solved separately, thus making it trivially parallelizable and achieving state-of-the-art results on the Stanford Natural Language Inference (SNLI) dataset.
Proceedings Article

A Simple but Tough-to-Beat Baseline for Sentence Embeddings

TL;DR: This paper shows that representing a sentence by a weighted average of its word vectors (embeddings computed with a popular method on an unlabeled corpus such as Wikipedia), then modifying the result slightly using PCA/SVD, yields a simple but tough-to-beat baseline.
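A minimal sketch of that recipe, assuming `vectors` maps each word to its embedding and `word_freq` holds unigram probabilities (both names are illustrative): weight each word by a/(a + p(w)), average per sentence, then remove each sentence's projection onto the first singular vector.

import numpy as np

def sif_embeddings(sentences, vectors, word_freq, a=1e-3):
    # Weighted average of word vectors for each tokenized sentence.
    X = np.stack([
        np.mean([a / (a + word_freq[w]) * vectors[w] for w in s], axis=0)
        for s in sentences
    ])
    # "Modify a bit using PCA/SVD": subtract each sentence's projection
    # onto the first right singular vector (the common component).
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u = vt[0]
    return X - np.outer(X @ u, u)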
Proceedings Article

Convolutional Neural Network Architectures for Matching Natural Language Sentences

TL;DR: Convolutional neural network models for matching two sentences are proposed by adapting the convolutional strategies used in vision and speech; the models nicely represent the hierarchical structure of sentences through their layer-by-layer composition and pooling.
Journal Article · DOI

Composition in Distributional Models of Semantics

TL;DR: This article proposes a framework for representing the meaning of word combinations in vector space in terms of additive and multiplicative functions, and introduces a wide range of composition models that are evaluated empirically on a phrase similarity task.
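As a toy illustration, the two simplest composition functions in this framework combine two word vectors additively or multiplicatively (element-wise); the vectors below are made-up values, not learned embeddings.

import numpy as np

u = np.array([0.2, 0.5, 0.1])  # hypothetical vector for one word
v = np.array([0.4, 0.1, 0.3])  # hypothetical vector for another word

additive = u + v               # p_i = u_i + v_i
multiplicative = u * v         # p_i = u_i * v_i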