Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms
Dinghan Shen, Guoyin Wang, Wenlin Wang, Martin Renqiang Min, Qinliang Su, Yizhe Zhang, Chunyuan Li, Ricardo Henao, Lawrence Carin
ACL 2018, Vol. 1, pp. 440–450
TLDR
This paper conducts a point-by-point comparative study of Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, against word-embedding-based RNN/CNN models.
Abstract
Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, and word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging.
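For concreteness, the parameter-free pooling operations at the heart of SWEMs fit in a few lines. Below is a minimal NumPy sketch of the three variants the abstract refers to: average pooling, max pooling, and hierarchical pooling (window-level averages followed by a max). The function names, toy embedding matrix, and window size n=3 are illustrative assumptions, not values fixed by the paper.

```python
import numpy as np

def swem_aver(emb):
    """SWEM-aver: element-wise average over all word embeddings."""
    return emb.mean(axis=0)

def swem_max(emb):
    """SWEM-max: element-wise max over all word embeddings."""
    return emb.max(axis=0)

def swem_hier(emb, n=3):
    """SWEM-hier: average within each local window of n words, then
    max-pool over the window averages, retaining some local (n-gram)
    word-order information."""
    T, _ = emb.shape
    n = min(n, T)
    windows = [emb[i:i + n].mean(axis=0) for i in range(T - n + 1)]
    return np.max(windows, axis=0)

# Toy usage: a "sentence" of 5 words with 4-dimensional embeddings.
emb = np.random.default_rng(0).standard_normal((5, 4))
print(swem_aver(emb).shape, swem_max(emb).shape, swem_hier(emb).shape)
```

All three produce a fixed-size sentence vector regardless of sequence length, which is what lets a small classifier on top compete with RNN/CNN encoders.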
Citations
Journal Article
Predictive Student Modeling in Game-Based Learning Environments with Word Embedding Representations of Reflection
TL;DR: A predictive student modeling framework is presented that leverages natural language responses to in-game reflection prompts to predict student learning outcomes in CRYSTAL ISLAND, a game-based learning environment for middle school microbiology.
Posted Content
Using Whole Document Context in Neural Machine Translation
Valentin Macé, Christophe Servan
TL;DR: A method is presented for adding source context that captures the whole document with accurate boundaries, taking every word into account, obtaining promising results on the English-German, English-French and French-English document-level translation tasks.
Journal Article
On the cost-effectiveness of neural and non-neural approaches and representations for text classification: A comprehensive comparative study
Washington Cunha, Vítor Mangaravite, Christian Gomes, Sergio Canuto, Elaine Resende, Cecília Vieira do Nascimento, Felipe Viegas, Celso França, Wellington Santos Martins, Jussara M. Almeida, Thierson Couto Rosa, Leonardo Rocha, Marcos André Gonçalves
TL;DR: In this article, the authors present the results of a critical analysis of recent scientific articles on neural and non-neural approaches and representations for automatic text classification (ATC), focusing on assessing the scientific rigor of such studies.
Posted Content
Distilled Wasserstein Learning for Word Embedding and Topic Modeling
TL;DR: This paper shows that the Euclidean distance between word embeddings can be employed as the underlying ground distance in the Wasserstein topic model, yielding joint learning of word embeddings and topics.
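As a rough illustration of the mechanism summarized above, the sketch below computes an entropy-regularized Wasserstein distance between two word histograms, with the ground cost given by Euclidean distances between word embeddings. This is a generic Sinkhorn iteration over toy data, not the paper's distillation procedure; every name and number here is an assumption.

```python
import numpy as np

def sinkhorn_wasserstein(a, b, M, reg=0.5, n_iter=200):
    """Entropy-regularized optimal transport (Sinkhorn) between
    histograms a and b under ground-cost matrix M."""
    K = np.exp(-M / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # transport plan
    return float((P * M).sum())

rng = np.random.default_rng(0)
E = rng.standard_normal((6, 4))                       # 6 words, 4-dim embeddings
M = np.linalg.norm(E[:, None] - E[None, :], axis=-1)  # Euclidean ground cost
a = np.full(6, 1 / 6)                                 # uniform word histogram
b = rng.dirichlet(np.ones(6))                         # another word histogram
print(sinkhorn_wasserstein(a, b, M))
```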
Posted Content
Selection Bias Explorations and Debias Methods for Natural Language Sentence Matching Datasets
TL;DR: This paper investigates the problem of selection bias on six NLSM datasets, finds that four of them are significantly biased, and proposes a training and evaluation framework to alleviate the bias.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
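To make the summary concrete, here is a minimal NumPy sketch of a single Adam update with the paper's default hyperparameters (beta1 = 0.9, beta2 = 0.999, eps = 1e-8); the toy quadratic objective in the usage lines is an assumption for illustration.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient
    and its square, bias correction, then a per-parameter scaled step."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = ||theta||^2, so grad = 2 * theta.
theta, m, v = np.ones(3), np.zeros(3), np.zeros(3)
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(theta)  # approaches zero
```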
Journal Article
Long Short-Term Memory
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
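The "constant error carousel" is easiest to see in code: the cell state is updated additively, so gradients can flow through it across many steps. Below is a minimal NumPy sketch of one step of a modern LSTM cell; note that the forget gate was added in follow-up work and is not part of the 1997 formulation. Weight shapes and the toy sequence are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W (4h x d), U (4h x h), b (4h,) pack the
    input, forget, output, and candidate transforms."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g        # additive update: the constant error carousel
    h = o * np.tanh(c)
    return h, c

# Toy usage: run a length-5 sequence of 3-dim inputs through a 4-unit cell.
d, hd = 3, 4
rng = np.random.default_rng(0)
W, U, b = (rng.standard_normal((4 * hd, d)),
           rng.standard_normal((4 * hd, hd)),
           np.zeros(4 * hd))
h, c = np.zeros(hd), np.zeros(hd)
for x in rng.standard_normal((5, d)):
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```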
Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-German and English-to-French translation.
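The core operation is compact enough to sketch directly. This is a minimal single-head NumPy version of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, without masking or the multi-head projections; the toy shapes in the usage lines are assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy usage: 4 queries attending over 6 key/value pairs of width 8.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((6, 8))
V = rng.standard_normal((6, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```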
Journal Article
Dropout: a simple way to prevent neural networks from overfitting
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
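As a concrete reminder of the mechanism, here is a minimal NumPy sketch of inverted dropout, the common variant that rescales activations at training time so nothing changes at test time (the original paper instead scales weights down at test time); the function name and default rate are assumptions.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p during
    training and rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return x
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

x = np.ones(10)
print(dropout(x, p=0.5))           # roughly half zeros, survivors scaled to 2.0
print(dropout(x, training=False))  # identity at test time
```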
Proceedings Article
GloVe: Global Vectors for Word Representation
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
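The "global log-bilinear regression" objective can be written down in a few lines. The sketch below evaluates the paper's weighted least-squares loss, the sum over nonzero X_ij of f(X_ij) (w_i . w~_j + b_i + b~_j - log X_ij)^2, with the paper's weighting function (x_max = 100, alpha = 0.75); parameter shapes and toy data are assumptions.

```python
import numpy as np

def glove_loss(W, W_tilde, b, b_tilde, X, x_max=100.0, alpha=0.75):
    """GloVe weighted least-squares objective over a word-word
    co-occurrence matrix X; f caps the influence of frequent pairs."""
    loss = 0.0
    for i, j in zip(*np.nonzero(X)):
        f = min((X[i, j] / x_max) ** alpha, 1.0)
        err = W[i] @ W_tilde[j] + b[i] + b_tilde[j] - np.log(X[i, j])
        loss += f * err ** 2
    return loss

# Toy usage: 5-word vocabulary, 3-dim vectors, random sparse counts.
rng = np.random.default_rng(0)
V, d = 5, 3
X = rng.integers(0, 4, size=(V, V)).astype(float)
W, W_tilde = rng.standard_normal((V, d)), rng.standard_normal((V, d))
b, b_tilde = np.zeros(V), np.zeros(V)
print(glove_loss(W, W_tilde, b, b_tilde, X))
```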