Open Access Proceedings Article

Unitary evolution recurrent neural networks

TLDR
This work constructs an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned, and demonstrates the potential of this architecture by achieving state-of-the-art results on several hard tasks involving very long-term dependencies.
Abstract
Recurrent neural networks (RNNs) are notoriously difficult to train. When the eigenvalues of the hidden-to-hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well-studied problem of vanishing and exploding gradients, especially when trying to learn long-term dependencies. To circumvent this problem, we propose a new architecture that learns a unitary weight matrix, whose eigenvalues have absolute value exactly 1. The challenge we address is that of parametrizing unitary matrices in a way that does not require expensive computations (such as eigendecomposition) after each weight update. We construct an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned. Optimization with this parameterization becomes feasible only when the hidden states are kept in the complex domain. We demonstrate the potential of this architecture by achieving state-of-the-art results on several hard tasks involving very long-term dependencies.
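As a concrete illustration of the parameterization, here is a minimal NumPy sketch: each building block (diagonal phase matrices, complex Householder reflections, a fixed permutation, and the unitary discrete Fourier transform) is unitary, so the composed product is unitary by construction, with no decomposition needed after a weight update. The factor ordering follows the paper's W = D3 R2 F⁻¹ D2 Π R1 F D1; the random parameter values below are stand-ins for learned weights.

```python
import numpy as np

def diag_unitary(theta):
    # Diagonal matrix with unit-modulus entries exp(i * theta_j).
    return np.diag(np.exp(1j * theta))

def reflection(v):
    # Complex Householder reflection I - 2 v v* / ||v||^2; unitary for any v != 0.
    v = v.reshape(-1, 1)
    return np.eye(len(v)) - 2 * (v @ v.conj().T) / (v.conj().T @ v)

def build_unitary(n, rng):
    # Compose structured unitary factors; a product of unitaries is unitary.
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)   # unitary DFT matrix
    P = np.eye(n)[rng.permutation(n)]        # fixed permutation matrix
    D1, D2, D3 = (diag_unitary(rng.uniform(-np.pi, np.pi, n)) for _ in range(3))
    R1, R2 = (reflection(rng.standard_normal(n) + 1j * rng.standard_normal(n))
              for _ in range(2))
    return D3 @ R2 @ F.conj().T @ D2 @ P @ R1 @ F @ D1

rng = np.random.default_rng(0)
W = build_unitary(8, rng)
print(np.allclose(W.conj().T @ W, np.eye(8)))  # True: W is unitary
```

Each factor needs only O(n) parameters and can be applied in O(n) or O(n log n) time, which is what makes the parameterization cheap to maintain under gradient descent.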


Citations
Posted Content

An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

TL;DR: A systematic evaluation of generic convolutional and recurrent architectures for sequence modeling concludes that the common association between sequence modeling and recurrent networks should be reconsidered, and that convolutional networks should be regarded as a natural starting point for sequence modeling tasks.
Journal Article

Deep learning with coherent nanophotonic circuits

TL;DR: A new architecture for a fully optical neural network is demonstrated that enables a computational speed enhancement of at least two orders of magnitude and a power-efficiency improvement of three orders of magnitude over state-of-the-art electronics.
Posted Content

Regularizing and Optimizing LSTM Language Models

TL;DR: This paper proposes the weight-dropped LSTM, which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization, and introduces NT-ASGD, a variant of the averaged stochastic gradient method in which the averaging trigger is determined by a non-monotonic condition rather than being tuned by the user.
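As a rough sketch of the weight-dropping idea (not the authors' implementation): DropConnect samples a binary mask over the recurrent weight matrix once per forward pass, so the same dropped connections are reused at every time step of the sequence.

```python
import numpy as np

def weight_drop(W_hh, p, rng):
    # DropConnect on the hidden-to-hidden matrix: zero each weight with
    # probability p and rescale survivors to keep the expectation unchanged.
    mask = rng.random(W_hh.shape) >= p
    return W_hh * mask / (1.0 - p)

rng = np.random.default_rng(0)
W_hh = rng.standard_normal((4, 4))
W_t = weight_drop(W_hh, p=0.5, rng=rng)  # sample once, reuse across time steps
```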
Posted Content

Classification with Quantum Neural Networks on Near Term Processors

TL;DR: A quantum neural network (QNN) that can represent labeled data, classical or quantum, and be trained by supervised learning is introduced, and classical simulation shows that parameters can be found that allow the QNN to learn to correctly distinguish the two data sets.
Journal Article

Circuit-centric quantum classifiers

TL;DR: A machine learning design is developed to train a quantum circuit specialized in solving a classification problem and it is shown that the circuits perform reasonably well on classical benchmarks.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: As discussed by the authors, state-of-the-art performance was achieved by a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax.
Journal Article

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1,000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
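The constant error carousel is visible in the additive cell-state update of an LSTM step. Below is a minimal single-step sketch of the now-standard variant with a forget gate (the original 1997 formulation lacked one); the shapes are hypothetical, with W of size (4H, D), U of size (4H, H), and b of size (4H,).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    # One LSTM step. The additive update f * c + i * g is the constant
    # error carousel: gradients flow through the cell state c across many
    # time steps without repeated multiplication by a weight matrix.
    z = W @ x + U @ h + b             # stacked gate pre-activations
    i, f, o, g = np.split(z, 4)       # input, forget, output, candidate
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_next = f * c + i * g
    h_next = o * np.tanh(c_next)
    return h_next, c_next
```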
Journal Article

Gradient-based learning applied to document recognition

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition; it can be used to synthesize a complex decision surface that classifies high-dimensional patterns such as handwritten characters.
Journal Article

ImageNet classification with deep convolutional neural networks

TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Proceedings Article

Rectified Linear Units Improve Restricted Boltzmann Machines

TL;DR: Restricted Boltzmann machines whose binary stochastic hidden units are replaced by rectified linear units learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.
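The rectified linear hidden unit in that paper can be read as approximating an infinite stack of tied binary units. A sketch of the resulting "noisy ReLU" sampling rule, max(0, x + N(0, sigmoid(x))), assuming x is the unit's pre-activation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nrelu_sample(x, rng):
    # Noisy rectified linear unit: Gaussian noise with variance sigmoid(x)
    # approximates sampling a stack of tied binary stochastic units.
    noise = rng.standard_normal(x.shape) * np.sqrt(sigmoid(x))
    return np.maximum(0.0, x + noise)

rng = np.random.default_rng(0)
pre = np.array([-2.0, 0.0, 3.0])
print(nrelu_sample(pre, rng))
```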