Open Access · Posted Content
Improving Neural Language Models with a Continuous Cache
TL;DR
This article proposes an extension to neural network language models that adapts their prediction to the recent history, storing past hidden activations as memory and accessing them through a dot product with the current hidden activation.
Abstract:
We propose an extension to neural network language models to adapt their prediction to the recent history. Our model is a simplified version of memory augmented networks, which stores past hidden activations as memory and accesses them through a dot product with the current hidden activation. This mechanism is very efficient and scales to very large memory sizes. We also draw a link between the use of external memory in neural networks and cache models used with count-based language models. We demonstrate on several language model datasets that our approach performs significantly better than recent memory augmented networks.
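As a rough illustration of the mechanism described in the abstract, the sketch below follows the standard neural-cache formulation: the cache stores pairs (h_i, x_{i+1}) of past hidden states and the words that followed them, a cache distribution is formed as p_cache(w) ∝ Σ_i 1{x_{i+1} = w} exp(θ h_t·h_i), and this is linearly interpolated with the base model's next-word distribution. This is a minimal sketch rather than the authors' implementation: the function names (cache_distribution, cache_lm_step), the NumPy stand-ins for the RNN hidden state and softmax output, and the values θ = 0.3 and λ = 0.1 are illustrative assumptions.

```python
import numpy as np

def cache_distribution(h_t, cache_h, cache_words, vocab_size, theta=0.3):
    """Cache distribution over the vocabulary.

    Each stored hidden state h_i "votes" for the word that followed it,
    with weight exp(theta * <h_t, h_i>); the votes are then normalised.
    """
    scores = theta * (cache_h @ h_t)          # (cache_size,)
    weights = np.exp(scores - scores.max())   # numerically stable exponentials
    p = np.zeros(vocab_size)
    np.add.at(p, cache_words, weights)        # accumulate votes per word id
    return p / p.sum()

def cache_lm_step(p_vocab, h_t, cache_h, cache_words, lam=0.1, theta=0.3):
    """Linearly interpolate the base LM distribution with the cache distribution."""
    if len(cache_words) == 0:                 # empty cache: fall back to the base model
        return p_vocab
    p_cache = cache_distribution(
        h_t, np.stack(cache_h), np.asarray(cache_words), p_vocab.shape[0], theta
    )
    return (1.0 - lam) * p_vocab + lam * p_cache

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab_size, dim = 50, 8
    cache_h, cache_words = [], []             # the cache grows as text is read
    for _ in range(20):
        h_t = rng.normal(size=dim)                    # stand-in for an RNN hidden state
        p_vocab = rng.dirichlet(np.ones(vocab_size))  # stand-in for the model's softmax output
        p = cache_lm_step(p_vocab, h_t, cache_h, cache_words)
        next_word = int(rng.integers(vocab_size))     # stand-in for the observed next word
        cache_h.append(h_t)                           # store (hidden state, next word) pair
        cache_words.append(next_word)
    print("distribution sums to", p.sum())            # ~1.0
```

Because the cache is just a list of stored vectors queried with a dot product, it can be made very large and applied without retraining the base model, which is the efficiency and scaling point the abstract makes.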
Citations
Posted Content
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
TL;DR: A systematic evaluation of generic convolutional and recurrent architectures for sequence modeling concludes that the common association between sequence modeling and recurrent networks should be reconsidered, and that convolutional networks should be regarded as a natural starting point for sequence modeling tasks.
Proceedings ArticleDOI
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context
TL;DR: This work proposes Transformer-XL, a novel neural architecture consisting of a segment-level recurrence mechanism and a novel positional encoding scheme, which enables learning dependency beyond a fixed length without disrupting temporal coherence.
Posted Content
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli +7 more
TL;DR: fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks, and supports distributed training across multiple GPUs and machines.
Proceedings ArticleDOI
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli +7 more
TL;DR: Fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks and supports distributed training across multiple GPUs and machines.
Posted Content
DARTS: Differentiable Architecture Search
TL;DR: In this article, the authors propose a differentiable architecture search algorithm based on a continuous relaxation of the architecture representation, in contrast to conventional approaches that search over a discrete and non-differentiable space.
References
Journal ArticleDOI
Long Short-Term Memory
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Proceedings Article
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend it by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Posted Content
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: In this paper, the authors propose a model that automatically (soft-)searches for the parts of a source sentence relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Journal ArticleDOI
Finding Structure in Time
TL;DR: A proposal along these lines, first described by Jordan (1986), uses recurrent links to provide networks with a dynamic memory; building on it, a method for representing lexical categories and the type/token distinction is developed.
Posted Content
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
TL;DR: Advanced recurrent units that implement a gating mechanism, such as the long short-term memory (LSTM) unit and the recently proposed gated recurrent unit (GRU), are evaluated on sequence modeling tasks and found to outperform the traditional tanh unit, with the GRU performing comparably to the LSTM.