Open Access · Proceedings Article · DOI

Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms

TL;DR
This paper conducts a point-by-point comparative study of Simple Word-Embedding-based Models (SWEMs), which consist of parameter-free pooling operations, against word-embedding-based RNN/CNN models.
Abstract
Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, and word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging.
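To make the pooling operations concrete, below is a minimal NumPy sketch of the three SWEM variants (average, max, and hierarchical pooling); the function names and the window size are illustrative choices, not the paper's released implementation.

import numpy as np

def swem_aver(emb):
    # SWEM-aver: element-wise average over all word embeddings.
    return emb.mean(axis=0)

def swem_max(emb):
    # SWEM-max: element-wise max, so each dimension keeps its most salient word.
    return emb.max(axis=0)

def swem_hier(emb, window=5):
    # SWEM-hier: average within each local window (preserving n-gram order),
    # then max-pool over the window averages.
    n = emb.shape[0]
    if n <= window:
        return emb.mean(axis=0)  # degenerate case: one window covers the text
    local = np.stack([emb[i:i + window].mean(axis=0)
                      for i in range(n - window + 1)])
    return local.max(axis=0)

Each function maps a (seq_len, d) matrix of word embeddings for one text to a single d-dimensional representation, with no trainable compositional parameters.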



Citations
Posted Content

Graph Convolutional Networks for Text Classification

TL;DR: This paper proposes a Text Graph Convolutional Network (Text GCN) for text classification, which jointly learns embeddings for both words and documents, supervised by the known class labels of documents.
Journal Article · DOI

A Term Weighted Neural Language Model and Stacked Bidirectional LSTM Based Framework for Sarcasm Identification

TL;DR: Sarcasm detection in text is one of the most challenging tasks in NLP; this paper presents an effective sarcasm identification framework for social media data that pursues the paradigms of neural language models and deep neural networks.
Posted Content

Knowledge-Enriched Transformer for Emotion Detection in Textual Conversations

TL;DR: A Knowledge-Enriched Transformer (KET) is proposed, where contextual utterances are interpreted using hierarchical self-attention and external commonsense knowledge is dynamically leveraged using a context-aware affective graph attention mechanism.
Posted Content

Tensor Graph Convolutional Networks for Text Classification

TL;DR: This paper investigates graph-based neural networks for the text classification problem with a new framework, TensorGCN (tensor graph convolutional networks), which presents an effective way to harmonize and integrate heterogeneous information from different kinds of graphs.
Posted Content

Estimating Training Data Influence by Tracking Gradient Descent

TL;DR: TracIn is a method that computes the influence of a training example on a prediction made by the model by tracing how the loss on the test point changes during the training process whenever the training example of interest is utilized.
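As a rough illustration of that idea, here is a minimal sketch of the checkpoint-based approximation (TracInCP); `loss_grad` is an assumed helper returning a flattened gradient vector, and all names here are hypothetical rather than taken from the authors' code.

import numpy as np

def tracin_cp(checkpoints, learning_rates, z_train, z_test, loss_grad):
    # Influence of training example z_train on test example z_test:
    # sum over saved checkpoints of the learning rate times the dot
    # product of the loss gradients at the two examples.
    influence = 0.0
    for weights, lr in zip(checkpoints, learning_rates):
        g_train = loss_grad(weights, z_train)  # gradient on the training point
        g_test = loss_grad(weights, z_test)    # gradient on the test point
        influence += lr * np.dot(g_train, g_test)
    return influence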
References
Proceedings Article

Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions

TL;DR: A novel machine learning framework based on recursive autoencoders for sentence-level prediction of sentiment label distributions that outperforms other state-of-the-art approaches on commonly used datasets, without using any pre-defined sentiment lexica or polarity shifting rules.
Proceedings Article · DOI

A Decomposable Attention Model for Natural Language Inference

TL;DR: The authors use attention to decompose the problem into subproblems that can be solved separately, thus making it trivially parallelizable and achieving state-of-the-art results on the Stanford Natural Language Inference (SNLI) dataset.
Proceedings Article

A Simple but Tough-to-Beat Baseline for Sentence Embeddings

TL;DR: This paper shows that representing a sentence by a weighted average of its word vectors (embeddings computed with a popular method on an unlabeled corpus such as Wikipedia), then modifying the result slightly using PCA/SVD, yields a simple but tough-to-beat baseline.
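A minimal sketch of that recipe, assuming `vectors` maps each word to its embedding and `word_freq` holds unigram probabilities (both names are illustrative): weight each word by a/(a + p(w)), average per sentence, then remove each sentence's projection onto the first singular vector.

import numpy as np

def sif_embeddings(sentences, vectors, word_freq, a=1e-3):
    # Weighted average of word vectors for each tokenized sentence.
    X = np.stack([
        np.mean([a / (a + word_freq[w]) * vectors[w] for w in s], axis=0)
        for s in sentences
    ])
    # "Modify a bit using PCA/SVD": subtract each sentence's projection
    # onto the first right singular vector (the common component).
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u = vt[0]
    return X - np.outer(X @ u, u)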
Proceedings Article

Convolutional Neural Network Architectures for Matching Natural Language Sentences

TL;DR: Convolutional neural network models for matching two sentences are proposed by adapting the convolutional strategies used in vision and speech; the models nicely represent the hierarchical structure of sentences through their layer-by-layer composition and pooling.
Journal Article · DOI

Composition in Distributional Models of Semantics

TL;DR: This article proposes a framework for representing the meaning of word combinations in vector space in terms of additive and multiplicative functions, and introduces a wide range of composition models that are evaluated empirically on a phrase similarity task.
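As a toy illustration, the two simplest composition functions in this framework combine two word vectors additively or multiplicatively (element-wise); the vectors below are made-up values, not learned embeddings.

import numpy as np

u = np.array([0.2, 0.5, 0.1])  # hypothetical vector for one word
v = np.array([0.4, 0.1, 0.3])  # hypothetical vector for another word

additive = u + v               # p_i = u_i + v_i
multiplicative = u * v         # p_i = u_i * v_i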