Sepp Hochreiter

Researcher at Johannes Kepler University of Linz

Publications: 202
Citations: 103,560

Sepp Hochreiter is an academic researcher at Johannes Kepler University of Linz. The author has contributed to research topics including Computer science and Deep learning. The author has an h-index of 42 and has co-authored 168 publications receiving 72,856 citations. Previous affiliations of Sepp Hochreiter include Information Technology University and the Dalle Molle Institute for Artificial Intelligence Research.

Papers
Journal Article (DOI)

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
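
For intuition, a minimal NumPy sketch of one LSTM step follows. It uses the modern formulation with a forget gate (which postdates the 1997 paper), and all names, shapes, and initial values are illustrative rather than the paper's notation; the point is the additive cell-state update that carries the constant error flow.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h_prev, c_prev, W, U, b):
        # Fused pre-activations for the input, forget, and output
        # gates plus the candidate cell input.
        z = W @ x + U @ h_prev + b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        # Additive cell-state update: the "constant error carousel".
        # Error flows back through c as a gated sum, not a repeated
        # multiplication by a weight matrix, so it is not forced to vanish.
        c = f * c_prev + i * g
        h = o * np.tanh(c)
        return h, c

    # Illustrative sizes: hidden size 4, input size 3, 5 time steps.
    n_h, n_x = 4, 3
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4 * n_h, n_x)) * 0.1
    U = rng.standard_normal((4 * n_h, n_h)) * 0.1
    b = np.zeros(4 * n_h)
    h = c = np.zeros(n_h)
    for x in rng.standard_normal((5, n_x)):
        h, c = lstm_step(x, h, c, W, U, b)
    print(h)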
Posted Content

GANs Trained by a Two Time-Scale Update Rule Converge to a Nash Equilibrium

TL;DR: In this article, a two time-scale update rule (TTUR) was proposed for training GANs with stochastic gradient descent on arbitrary GAN loss functions, using a separate learning rate for the discriminator and the generator.
Proceedings Article

GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium

TL;DR: In this paper, a two time-scale update rule (TTUR) was proposed for training GANs with stochastic gradient descent on arbitrary GAN loss functions, using a separate learning rate for the discriminator and the generator.
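
In practice, the two time-scale rule amounts to giving the discriminator and the generator their own optimizers with distinct learning rates. Below is a minimal PyTorch sketch under that reading; the toy 1-D networks, synthetic data, and concrete learning rates (4e-4 vs. 1e-4) are illustrative assumptions, not values taken from the paper's experiments.

    import torch
    import torch.nn as nn

    # Toy 1-D generator and discriminator; stand-ins for real networks.
    generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
    discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

    # TTUR: two optimizers, two learning rates (here, a faster discriminator).
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.5, 0.999))
    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.999))

    bce = nn.BCEWithLogitsLoss()
    for step in range(100):
        real = torch.randn(32, 1) * 2 + 3      # synthetic "real" data
        noise = torch.randn(32, 8)

        # Discriminator update on its own time scale.
        fake = generator(noise).detach()
        loss_d = bce(discriminator(real), torch.ones(32, 1)) + \
                 bce(discriminator(fake), torch.zeros(32, 1))
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

        # Generator update on its own, slower time scale.
        fake = generator(noise)
        loss_g = bce(discriminator(fake), torch.ones(32, 1))
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()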
Posted Content

Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

TL;DR: The Exponential Linear Unit (ELU) is proposed, which alleviates the vanishing gradient problem via the identity for positive values and has improved learning characteristics compared to units with other activation functions.
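
The ELU itself is a one-line function: the identity for positive inputs and alpha * (exp(x) - 1) otherwise. A small NumPy sketch, with alpha = 1 as an illustrative default:

    import numpy as np

    def elu(x, alpha=1.0):
        # Identity for x > 0; alpha * (exp(x) - 1) otherwise.
        # np.minimum keeps expm1 from overflowing on large positive x,
        # since np.where evaluates both branches.
        return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

    print(elu(np.array([-2.0, -0.5, 0.0, 1.5])))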
Journal Article (DOI)

The vanishing gradient problem during learning recurrent neural nets and problem solutions

TL;DR: The decaying error flow is theoretically analyzed, methods for overcoming vanishing gradients are briefly discussed, and experiments comparing conventional algorithms and alternative methods are presented.
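
A small numerical sketch of the effect this paper analyzes: backpropagating an error vector through a plain tanh RNN multiplies it by one Jacobian per time step, and with modest recurrent weights its norm decays exponentially. The dimensions, weight scale, and random seed below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n, T = 32, 100
    W = rng.standard_normal((n, n)) * 0.05   # small recurrent weights

    # Forward pass through a plain tanh RNN, storing hidden states.
    h = rng.standard_normal(n)
    states = []
    for t in range(T):
        h = np.tanh(W @ h)
        states.append(h)

    # Backpropagate an error injected at the last step: each step
    # applies W.T and the tanh derivative, so the norm shrinks fast.
    grad = np.ones(n)
    for t, h_t in zip(reversed(range(T)), reversed(states)):
        grad = W.T @ ((1 - h_t**2) * grad)
        if t % 20 == 0:
            print(f"step {t:3d}: ||grad|| = {np.linalg.norm(grad):.3e}")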