Open Access · Posted Content

Improving Neural Language Models with a Continuous Cache

TLDR
This article proposes an extension to neural network language models that adapts their predictions to the recent history: past hidden activations are stored as memory and accessed through a dot product with the current hidden activation.
Abstract
We propose an extension to neural network language models to adapt their prediction to the recent history. Our model is a simplified version of memory augmented networks, which stores past hidden activations as memory and accesses them through a dot product with the current hidden activation. This mechanism is very efficient and scales to very large memory sizes. We also draw a link between the use of external memory in neural network and cache models used with count based language models. We demonstrate on several language model datasets that our approach performs significantly better than recent memory augmented networks.
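The cache mechanism described in the abstract is simple to sketch: cached hidden states are scored against the current hidden state with a dot product, the scores are normalized into a distribution over the cached words, and that distribution is mixed with the model's usual softmax output. The snippet below is a minimal illustration only; the hidden states are assumed to come from an already-trained recurrent language model, and the names (`cache_probs`, `predict`), the scaling `theta`, and the interpolation weight `lam` are illustrative choices rather than the paper's reference implementation.

```python
# Minimal sketch of a continuous-cache read (illustrative, not the paper's code).
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def cache_probs(h_t, cache_h, cache_words, vocab_size, theta=0.3):
    """Distribution over the vocabulary induced by the cache: each cached
    position is weighted by the (scaled) dot product between its stored
    hidden state and the current hidden state h_t."""
    weights = softmax(theta * cache_h @ h_t)   # one weight per cached position
    p = np.zeros(vocab_size)
    for w, s in zip(cache_words, weights):     # identical words accumulate weight
        p[w] += s
    return p

def predict(p_vocab, h_t, cache_h, cache_words, lam=0.2):
    """Interpolate the model's softmax output with the cache distribution."""
    p_cache = cache_probs(h_t, np.asarray(cache_h), cache_words, len(p_vocab))
    return (1.0 - lam) * p_vocab + lam * p_cache

# Hypothetical usage: h_t and p_vocab come from the base language model, while
# cache_h / cache_words hold the hidden states and word ids of recent positions.
# p_next = predict(p_vocab, h_t, cache_h=[h1, h2, h3], cache_words=[4, 0, 4])
```

Because the cache is just a sliding window of recent hidden states read with dot products, the per-step cost grows only with the chosen cache size, which is what lets the mechanism scale to large memories.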


Citations
Posted Content

An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

TL;DR: A systematic evaluation of generic convolutional and recurrent architectures for sequence modeling concludes that the common association between sequence modeling and recurrent networks should be reconsidered, and that convolutional networks should be regarded as a natural starting point for sequence modeling tasks.
Proceedings ArticleDOI

Transformer-XL: Attentive Language Models beyond a Fixed-Length Context

TL;DR: This work proposes a novel neural architecture, Transformer-XL, that enables learning dependencies beyond a fixed length without disrupting temporal coherence, combining a segment-level recurrence mechanism with a novel positional encoding scheme.
Proceedings ArticleDOI

fairseq: A Fast, Extensible Toolkit for Sequence Modeling

TL;DR: Fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks and supports distributed training across multiple GPUs and machines.
Posted Content

DARTS: Differentiable Architecture Search

TL;DR: The authors propose a differentiable architecture search algorithm based on a continuous relaxation of the architecture representation, in contrast to conventional approaches that search over a discrete and non-differentiable space.
References
Journal ArticleDOI

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and it is proposed to extend it by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Journal ArticleDOI

Finding Structure in Time

TL;DR: A proposal along these lines, first described by Jordan (1986), which uses recurrent links to provide networks with a dynamic memory, is developed, and a method for representing lexical categories and the type/token distinction is suggested.
Posted Content

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

TL;DR: Advanced recurrent units that implement a gating mechanism, such as the long short-term memory (LSTM) unit and the recently proposed gated recurrent unit (GRU), are evaluated on sequence modeling tasks, and the GRU is found to be comparable to the LSTM.