Open Access · Proceedings Article (DOI)

Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms

TLDR
This paper conducts a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), which consist of parameter-free pooling operations, and word-embedding-based RNN/CNN models.
Abstract
Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation of the added value of sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, and word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging.
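
As a concrete illustration of the parameter-free pooling operations described above, here is a minimal NumPy sketch of average, max, and hierarchical (local-window) pooling over a matrix of word embeddings; the function names and the 300-dimensional toy input are illustrative, not taken from the paper's code.

```python
import numpy as np

def swem_aver(emb):
    """Average-pool word embeddings: (seq_len, dim) -> (dim,)."""
    return emb.mean(axis=0)

def swem_max(emb):
    """Max-pool each embedding dimension over the sequence."""
    return emb.max(axis=0)

def swem_hier(emb, window=5):
    """Hierarchical pooling: average over local n-gram windows, then max-pool."""
    seq_len, _ = emb.shape
    window = min(window, seq_len)
    local_means = [emb[i:i + window].mean(axis=0) for i in range(seq_len - window + 1)]
    return np.max(np.stack(local_means), axis=0)

# Toy "sentence" of 7 words with 300-d embeddings; concatenating the two
# poolings mirrors the SWEM-concat variant described in the paper.
emb = np.random.randn(7, 300)
features = np.concatenate([swem_aver(emb), swem_max(emb)])
```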



Citations
Proceedings Article (DOI)

Training Effective Neural Sentence Encoders from Automatically Mined Paraphrases

TL;DR: The authors proposed a method for training effective language-specific sentence encoders without manually labeled data by automatically constructing a dataset of paraphrase pairs from sentence-aligned bilingual text corpora and then using the collected data to fine-tune a Transformer language model with an additional recurrent pooling layer.
Proceedings Article (DOI)

Scalable Few-Shot Learning of Robust Biomedical Name Representations.

TL;DR: In this paper, a few-shot learning approach is proposed to explore the impact of conceptual distinctions on robust biomedical name representations, and is shown to be effective for various types of input representations, whether domain-specific or unsupervised.
Proceedings Article (DOI)

Efficient Sentence Embedding via Semantic Subspace Analysis

TL;DR: In this paper, a novel sentence embedding method built upon semantic subspace analysis is proposed, called Semantic Subspace Sentence Embedding (S3E), which constructs a sentence model from two aspects.
Posted Content

Re-evaluating Word Mover's Distance.

TL;DR: In this article, the authors re-evaluate the performance of WMD against the classical baselines and find that, owing to the curse of dimensionality, WMD in high-dimensional spaces behaves more similarly to BOW than it does in low-dimensional spaces.
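
For readers unfamiliar with WMD, a rough illustration follows: rather than the exact linear-programming formulation, this sketch computes a common relaxed lower bound in which each word of one document sends all of its mass to the nearest word of the other document; the function name and toy data are illustrative.

```python
import numpy as np

def relaxed_wmd(emb_a, weights_a, emb_b):
    """Relaxed WMD lower bound: every word in document A moves its
    (normalized frequency) mass to its nearest neighbour in document B."""
    # Pairwise Euclidean distances between the two documents' word vectors.
    dists = np.linalg.norm(emb_a[:, None, :] - emb_b[None, :, :], axis=-1)
    return float(np.sum(weights_a * dists.min(axis=1)))

# Toy example: two "documents" with 4 and 5 words of 50-d embeddings.
rng = np.random.default_rng(0)
doc_a, doc_b = rng.normal(size=(4, 50)), rng.normal(size=(5, 50))
w_a = np.full(4, 1 / 4)  # uniform word weights
print(relaxed_wmd(doc_a, w_a, doc_b))
```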
Journal Article (DOI)

Social world knowledge: Modeling and applications

Nir Lotan, +1 more · 28 Jun 2023
TL;DR: The authors present SocialVec, a general framework for eliciting low-dimensional entity embeddings from the social contexts in which entities occur in social networks, where entities correspond to highly popular accounts that invoke general interest.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
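
As a concrete companion to the summary above, here is a minimal NumPy sketch of a single Adam update using bias-corrected first and second moment estimates; the default hyperparameter values shown are the commonly used ones and the function name is illustrative.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: adaptive step from bias-corrected moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias correction for step t (1-indexed)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```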
Journal Article (DOI)

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
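
For reference, a minimal NumPy sketch of one LSTM step is given below; the additive cell-state update is the "constant error carousel" mentioned in the summary. Weight shapes, gate ordering, and names are illustrative assumptions, not taken from the original paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,); gates stacked as [i, f, o, g]."""
    z = W @ x + U @ h_prev + b
    H = h_prev.shape[0]
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
    g = np.tanh(z[3*H:])
    c = f * c_prev + i * g        # additive cell update: the "constant error carousel"
    h = o * np.tanh(c)
    return h, c
```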
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
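
A minimal, single-head, unmasked sketch of the scaled dot-product attention at the core of the architecture summarized above; names and shapes are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```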
Journal Article

Dropout: a simple way to prevent neural networks from overfitting

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
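
For concreteness, a sketch of dropout at training time in the now-common "inverted" form, which rescales surviving units by 1/(1-p) so that no change is needed at test time (the original paper instead rescales weights at test time); names are illustrative.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p during training,
    scale survivors by 1/(1-p) so expected activations match test time."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)
```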
Proceedings Article (DOI)

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
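
As a pointer to what "global log-bilinear regression" means in practice, here is a sketch of the weighted least-squares term GloVe minimizes for a single co-occurring word pair, with the standard weighting function; variable names are illustrative.

```python
import numpy as np

def glove_pair_loss(w_i, w_j, b_i, b_j, x_ij, x_max=100.0, alpha=0.75):
    """Weighted least-squares term f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2."""
    f = (x_ij / x_max) ** alpha if x_ij < x_max else 1.0   # co-occurrence weighting
    return f * (np.dot(w_i, w_j) + b_i + b_j - np.log(x_ij)) ** 2
```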