Open Access Proceedings Article

On the dimensionality of word embedding

Zi Yin, +1 more
Vol. 31, pp. 895-906
TLDR
In this article, the Pairwise Inner Product (PIP) loss is proposed as a measure of dissimilarity between word embeddings, and it is used to reveal a fundamental bias-variance trade-off in dimensionality selection.
Abstract
In this paper, we provide a theoretical understanding of word embedding and its dimensionality. Motivated by the unitary-invariance of word embedding, we propose the Pairwise Inner Product (PIP) loss, a novel metric on the dissimilarity between word embeddings. Using techniques from matrix perturbation theory, we reveal a fundamental bias-variance trade-off in dimensionality selection for word embeddings. This bias-variance trade-off sheds light on many empirical observations which were previously unexplained, for example the existence of an optimal dimensionality. Moreover, new insights and discoveries, like when and how word embeddings are robust to over-fitting, are revealed. By optimizing over the bias-variance trade-off of the PIP loss, we can explicitly answer the open question of dimensionality selection for word embedding.
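The PIP loss described in the abstract is the Frobenius distance between the pairwise-inner-product matrices E E^T of two embeddings; since E E^T is unchanged by any orthogonal transform of E, the metric is unitary-invariant. A minimal NumPy sketch of that definition (array shapes and variable names are illustrative, not taken from the paper's code):

```python
import numpy as np

def pip_loss(E1: np.ndarray, E2: np.ndarray) -> float:
    """Pairwise Inner Product (PIP) loss between two word embeddings.

    E1, E2: arrays of shape (n_words, d1) and (n_words, d2); rows must
    index the same vocabulary, but d1 and d2 may differ.
    Returns ||E1 E1^T - E2 E2^T||_F, which is invariant to any
    orthogonal (unitary) transform applied to either embedding.
    """
    pip1 = E1 @ E1.T          # (n_words, n_words) pairwise inner products
    pip2 = E2 @ E2.T
    return float(np.linalg.norm(pip1 - pip2, ord="fro"))

# Example: a rotated copy of an embedding has (numerically) zero PIP loss.
rng = np.random.default_rng(0)
E = rng.normal(size=(100, 50))
Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))   # random orthogonal matrix
print(pip_loss(E, E @ Q))                        # ~0, up to float error
```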



Citations
Journal Article

Survey on categorical data for neural networks

TL;DR: This study is the first in-depth look at techniques for working with categorical data in neural networks, and it provides a starting point for research into which techniques for preparing qualitative data for use with neural networks work best.
Proceedings Article

N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules

TL;DR: The N-gram graph, a simple unsupervised representation for molecules, is introduced; it is equivalent to a simple graph neural network that needs no training, and theoretical analysis shows its strong representation and prediction power.
Proceedings Article

A^3: Accelerating Attention Mechanisms in Neural Networks with Approximation

TL;DR: A^3, which accelerates attention mechanisms in neural networks with algorithmic approximation and hardware specialization, is designed and architected; it achieves multiple orders of magnitude improvement in energy efficiency (performance/watt) as well as a substantial speedup over state-of-the-art conventional hardware.
Proceedings Article

Effective Dimensionality Reduction for Word Embeddings.

TL;DR: This work presents a novel technique that efficiently combines PCA-based dimensionality reduction with a recently proposed post-processing algorithm to construct effective word embeddings of lower dimension.
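The combination summarized above appears to be PCA dimensionality reduction paired with a mean-removal/top-component post-processing step; the exact pipeline and hyperparameters in the cited paper may differ. A hedged sketch of that general recipe:

```python
import numpy as np

def reduce_embeddings(E: np.ndarray, d_out: int, n_remove: int = 7) -> np.ndarray:
    """Illustrative PCA-based reduction with a post-processing step.

    E: (n_words, d_in) embedding matrix. The sketch (1) centers the
    embeddings, (2) removes the n_remove dominant principal directions
    (a common post-processing step), then (3) projects onto the top
    d_out remaining principal components.
    """
    X = E - E.mean(axis=0, keepdims=True)           # (1) center
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    D = Vt[:n_remove]                               # (n_remove, d_in)
    X = X - (X @ D.T) @ D                           # (2) remove top directions
    _, _, Vt2 = np.linalg.svd(X, full_matrices=False)
    return X @ Vt2[:d_out].T                        # (3) -> (n_words, d_out)
```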
Posted Content

Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems

TL;DR: This work proposes mixed dimension embedding layers in which the dimension of a particular embedding vector can depend on the frequency of the item, which drastically reduces the memory requirement for the embedding, while maintaining and sometimes improving the ML performance.
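The idea summarized here is to give frequent items wider embedding vectors and rare items narrower ones, then project every vector to one shared width so downstream layers see a fixed size. A hedged sketch; the bucketing rule, widths, and projection below are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

class MixedDimEmbedding:
    """Illustrative mixed-dimension embedding table.

    Items are split into frequency buckets; each bucket gets its own
    embedding width, and a per-bucket projection maps vectors to a
    shared base dimension.
    """

    def __init__(self, bucket_sizes, bucket_dims, base_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.tables = [rng.normal(scale=0.01, size=(n, d))
                       for n, d in zip(bucket_sizes, bucket_dims)]
        self.projs = [rng.normal(scale=0.01, size=(d, base_dim))
                      for d in bucket_dims]
        self.offsets = np.cumsum([0] + list(bucket_sizes))[:-1]

    def lookup(self, item_id: int) -> np.ndarray:
        b = np.searchsorted(self.offsets, item_id, side="right") - 1
        row = item_id - self.offsets[b]
        return self.tables[b][row] @ self.projs[b]   # (base_dim,)

# e.g. 1k frequent items at width 64, 100k rare items at width 8,
# all projected into a shared 64-dimensional space.
emb = MixedDimEmbedding([1_000, 100_000], [64, 8], base_dim=64)
print(emb.lookup(5).shape, emb.lookup(50_000).shape)  # (64,) (64,)
```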
References
Journal Article

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1,000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
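The "constant error carousel" refers to an additively updated cell state through which gradients can flow over long spans; the now-standard gated cell equations make this concrete. A minimal NumPy sketch of one step of a modern LSTM cell (note that it includes a forget gate, which postdates the original paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell (with forget gate).

    x: (d_in,) input; h_prev, c_prev: (d_hid,) previous hidden/cell state.
    W: (4*d_hid, d_in), U: (4*d_hid, d_hid), b: (4*d_hid,) stacked
    parameters for the input, forget, output gates and candidate cell.
    """
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g        # additive cell update: the "carousel"
    h = o * np.tanh(c)
    return h, c

# Tiny usage example with random parameters.
d_in, d_hid = 3, 5
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * d_hid, d_in))
U = rng.normal(size=(4 * d_hid, d_hid))
b = np.zeros(4 * d_hid)
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_hid), np.zeros(d_hid), W, U, b)
```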
Proceedings Article

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
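The global log-bilinear objective fits word and context vectors so that their inner product, plus bias terms, approximates the log co-occurrence count, with a weighting function that damps rare and very frequent pairs. A small sketch of that weighted least-squares loss (training loop and gradients omitted; the default weighting constants are the commonly cited ones):

```python
import numpy as np

def glove_loss(W, W_ctx, b, b_ctx, X, x_max=100.0, alpha=0.75):
    """GloVe weighted least-squares loss over a co-occurrence matrix X.

    J = sum over (i, j) with X_ij > 0 of
        f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2,
    with weighting f(x) = min(1, (x / x_max)**alpha).
    """
    total = 0.0
    rows, cols = np.nonzero(X)
    for i, j in zip(rows, cols):
        f = min(1.0, (X[i, j] / x_max) ** alpha)
        err = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(X[i, j])
        total += f * err ** 2
    return total
```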
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
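Negative sampling replaces the hierarchical softmax with a binary objective: raise the score of the observed (center, context) pair while lowering the scores of k words drawn from a noise distribution. A minimal sketch of the per-pair objective (the noise distribution and training loop are omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_objective(v_center, v_context, v_negatives):
    """Skip-gram negative-sampling objective for one (center, context) pair.

    Maximizes  log sigma(v_context . v_center)
             + sum_k log sigma(-v_neg_k . v_center),
    where v_negatives is a (k, d) array of noise-word vectors.
    """
    pos = np.log(sigmoid(v_context @ v_center))
    neg = np.sum(np.log(sigmoid(-(v_negatives @ v_center))))
    return pos + neg
```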
Posted Content

Efficient Estimation of Word Representations in Vector Space

TL;DR: This paper proposes two novel model architectures for computing continuous vector representations of words from very large data sets; the quality of these representations is measured in a word similarity task, and the results are compared to the previously best-performing techniques based on different types of neural networks.
Posted Content

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
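The soft search is realized as an attention mechanism: an alignment score between the decoder state and each encoder annotation is normalized with a softmax, and the context vector is the resulting weighted average of the annotations. A minimal sketch of additive scoring in that style; parameter names and shapes are illustrative:

```python
import numpy as np

def additive_attention(s_prev, H, W_s, W_h, v):
    """Soft search (additive attention) over encoder annotations.

    s_prev: (d_dec,) previous decoder state; H: (T, d_enc) encoder
    annotations; W_s: (d_att, d_dec), W_h: (d_att, d_enc), v: (d_att,).
    Returns the context vector (weighted average of H) and the weights.
    """
    scores = np.tanh(H @ W_h.T + (W_s @ s_prev)) @ v   # (T,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                           # softmax over positions
    context = weights @ H                              # (d_enc,)
    return context, weights
```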