Open Access Proceedings Article

On the dimensionality of word embedding

Zi Yin, +1 more
Vol. 31, pp. 895-906
TLDR
In this article, the Pairwise Inner Product (PIP) loss is proposed as a measure of dissimilarity between word embeddings, and it is used to reveal a fundamental bias-variance trade-off in dimensionality selection.
Abstract
In this paper, we provide a theoretical understanding of word embedding and its dimensionality. Motivated by the unitary-invariance of word embedding, we propose the Pairwise Inner Product (PIP) loss, a novel metric on the dissimilarity between word embeddings. Using techniques from matrix perturbation theory, we reveal a fundamental bias-variance trade-off in dimensionality selection for word embeddings. This bias-variance trade-off sheds light on many empirical observations which were previously unexplained, for example the existence of an optimal dimensionality. Moreover, new insights and discoveries, like when and how word embeddings are robust to over-fitting, are revealed. By optimizing over the bias-variance trade-off of the PIP loss, we can explicitly answer the open question of dimensionality selection for word embedding.
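The PIP loss described in the abstract is the Frobenius distance between the pairwise-inner-product matrices E E^T of two embeddings; since E E^T is unchanged by any orthogonal transform of E, the metric is unitary-invariant. A minimal NumPy sketch of that definition (array shapes and variable names are illustrative, not taken from the paper's code):

```python
import numpy as np

def pip_loss(E1: np.ndarray, E2: np.ndarray) -> float:
    """Pairwise Inner Product (PIP) loss between two word embeddings.

    E1, E2: arrays of shape (n_words, d1) and (n_words, d2); rows must
    index the same vocabulary, but d1 and d2 may differ.
    Returns ||E1 E1^T - E2 E2^T||_F, which is invariant to any
    orthogonal (unitary) transform applied to either embedding.
    """
    pip1 = E1 @ E1.T          # (n_words, n_words) pairwise inner products
    pip2 = E2 @ E2.T
    return float(np.linalg.norm(pip1 - pip2, ord="fro"))

# Example: a rotated copy of an embedding has (numerically) zero PIP loss.
rng = np.random.default_rng(0)
E = rng.normal(size=(100, 50))
Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))   # random orthogonal matrix
print(pip_loss(E, E @ Q))                        # ~0, up to float error
```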



Citations
Journal Article

Survey on categorical data for neural networks

TL;DR: This study is the first in-depth look at techniques for working with categorical data in neural networks, and it provides a starting point for research into which techniques for preparing qualitative data for use with neural networks work best.
Proceedings Article

N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules

TL;DR: The N-gram graph, a simple unsupervised representation for molecules, is introduced; it is equivalent to a simple graph neural network that needs no training, and theoretical analysis shows its strong representation and prediction power.
Proceedings Article

A^3: Accelerating Attention Mechanisms in Neural Networks with Approximation

TL;DR: A^3, which accelerates attention mechanisms in neural networks with algorithmic approximation and hardware specialization, is designed and architected; it achieves multiple orders of magnitude improvement in energy efficiency (performance/watt) as well as a substantial speedup over state-of-the-art conventional hardware.
Proceedings Article

Effective Dimensionality Reduction for Word Embeddings.

TL;DR: This work presents a novel technique that efficiently combines PCA-based dimensionality reduction with a recently proposed post-processing algorithm to construct effective word embeddings of lower dimension.
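The combination summarized above appears to be PCA dimensionality reduction paired with a mean-removal/top-component post-processing step; the exact pipeline and hyperparameters in the cited paper may differ. A hedged sketch of that general recipe:

```python
import numpy as np

def reduce_embeddings(E: np.ndarray, d_out: int, n_remove: int = 7) -> np.ndarray:
    """Illustrative PCA-based reduction with a post-processing step.

    E: (n_words, d_in) embedding matrix. The sketch (1) centers the
    embeddings, (2) removes the n_remove dominant principal directions
    (a common post-processing step), then (3) projects onto the top
    d_out remaining principal components.
    """
    X = E - E.mean(axis=0, keepdims=True)           # (1) center
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    D = Vt[:n_remove]                               # (n_remove, d_in)
    X = X - (X @ D.T) @ D                           # (2) remove top directions
    _, _, Vt2 = np.linalg.svd(X, full_matrices=False)
    return X @ Vt2[:d_out].T                        # (3) -> (n_words, d_out)
```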
Posted Content

Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems

TL;DR: This work proposes mixed dimension embedding layers in which the dimension of a particular embedding vector can depend on the frequency of the item, which drastically reduces the memory requirement for the embedding, while maintaining and sometimes improving the ML performance.
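The idea summarized here is to give frequent items wider embedding vectors and rare items narrower ones, then project every vector to one shared width so downstream layers see a fixed size. A hedged sketch; the bucketing rule, widths, and projection below are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

class MixedDimEmbedding:
    """Illustrative mixed-dimension embedding table.

    Items are split into frequency buckets; each bucket gets its own
    embedding width, and a per-bucket projection maps vectors to a
    shared base dimension.
    """

    def __init__(self, bucket_sizes, bucket_dims, base_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.tables = [rng.normal(scale=0.01, size=(n, d))
                       for n, d in zip(bucket_sizes, bucket_dims)]
        self.projs = [rng.normal(scale=0.01, size=(d, base_dim))
                      for d in bucket_dims]
        self.offsets = np.cumsum([0] + list(bucket_sizes))[:-1]

    def lookup(self, item_id: int) -> np.ndarray:
        b = np.searchsorted(self.offsets, item_id, side="right") - 1
        row = item_id - self.offsets[b]
        return self.tables[b][row] @ self.projs[b]   # (base_dim,)

# e.g. 1k frequent items at width 64, 100k rare items at width 8,
# all projected into a shared 64-dimensional space.
emb = MixedDimEmbedding([1_000, 100_000], [64, 8], base_dim=64)
print(emb.lookup(5).shape, emb.lookup(50_000).shape)  # (64,) (64,)
```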
References
Journal Article

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1,000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
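The "constant error carousel" refers to an additively updated cell state through which gradients can flow over long spans; the now-standard gated cell equations make this concrete. A minimal NumPy sketch of one step of a modern LSTM cell (note that it includes a forget gate, which postdates the original paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell (with forget gate).

    x: (d_in,) input; h_prev, c_prev: (d_hid,) previous hidden/cell state.
    W: (4*d_hid, d_in), U: (4*d_hid, d_hid), b: (4*d_hid,) stacked
    parameters for the input, forget, output gates and candidate cell.
    """
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g        # additive cell update: the "carousel"
    h = o * np.tanh(c)
    return h, c

# Tiny usage example with random parameters.
d_in, d_hid = 3, 5
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * d_hid, d_in))
U = rng.normal(size=(4 * d_hid, d_hid))
b = np.zeros(4 * d_hid)
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_hid), np.zeros(d_hid), W, U, b)
```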
Proceedings Article

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
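The global log-bilinear objective fits word and context vectors so that their inner product, plus bias terms, approximates the log co-occurrence count, with a weighting function that damps rare and very frequent pairs. A small sketch of that weighted least-squares loss (training loop and gradients omitted; the default weighting constants are the commonly cited ones):

```python
import numpy as np

def glove_loss(W, W_ctx, b, b_ctx, X, x_max=100.0, alpha=0.75):
    """GloVe weighted least-squares loss over a co-occurrence matrix X.

    J = sum over (i, j) with X_ij > 0 of
        f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2,
    with weighting f(x) = min(1, (x / x_max)**alpha).
    """
    total = 0.0
    rows, cols = np.nonzero(X)
    for i, j in zip(rows, cols):
        f = min(1.0, (X[i, j] / x_max) ** alpha)
        err = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(X[i, j])
        total += f * err ** 2
    return total
```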
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
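Negative sampling replaces the hierarchical softmax with a binary objective: raise the score of the observed (center, context) pair while lowering the scores of k words drawn from a noise distribution. A minimal sketch of the per-pair objective (the noise distribution and training loop are omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_objective(v_center, v_context, v_negatives):
    """Skip-gram negative-sampling objective for one (center, context) pair.

    Maximizes  log sigma(v_context . v_center)
             + sum_k log sigma(-v_neg_k . v_center),
    where v_negatives is a (k, d) array of noise-word vectors.
    """
    pos = np.log(sigmoid(v_context @ v_center))
    neg = np.sum(np.log(sigmoid(-(v_negatives @ v_center))))
    return pos + neg
```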
Posted Content

Efficient Estimation of Word Representations in Vector Space

TL;DR: This paper proposes two novel model architectures for computing continuous vector representations of words from very large data sets; the quality of these representations is measured in a word similarity task, and the results are compared to the previously best-performing techniques based on different types of neural networks.
Posted Content

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
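The soft search is realized as an attention mechanism: an alignment score between the decoder state and each encoder annotation is normalized with a softmax, and the context vector is the resulting weighted average of the annotations. A minimal sketch of additive scoring in that style; parameter names and shapes are illustrative:

```python
import numpy as np

def additive_attention(s_prev, H, W_s, W_h, v):
    """Soft search (additive attention) over encoder annotations.

    s_prev: (d_dec,) previous decoder state; H: (T, d_enc) encoder
    annotations; W_s: (d_att, d_dec), W_h: (d_att, d_enc), v: (d_att,).
    Returns the context vector (weighted average of H) and the weights.
    """
    scores = np.tanh(H @ W_h.T + (W_s @ s_prev)) @ v   # (T,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                           # softmax over positions
    context = weights @ H                              # (d_enc,)
    return context, weights
```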