Giancarlo Kerg
Researcher at Université de Montréal
Publications - 13
Citations - 118
Giancarlo Kerg is an academic researcher at Université de Montréal. His research focuses on recurrent neural networks and the vanishing gradient problem. He has an h-index of 5 and has co-authored 8 publications receiving 83 citations.
Papers
Proceedings Article
Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics
Giancarlo Kerg, Kyle Goyette, Maximilian Puelma Touzel, Gauthier Gidel, Eugene Vorontsov, Yoshua Bengio, Guillaume Lajoie +6 more
TL;DR: In this paper, the authors propose a connectivity structure based on the Schur decomposition, which allows parametrizing matrices with a unit-norm eigenspectrum without imposing orthogonality constraints on the eigenbasis.
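The parametrization described above can be illustrated with a minimal numpy sketch (an assumption-laden toy, not the authors' implementation): a recurrent matrix is built as W = P T Pᵀ, where P is orthogonal and T is upper triangular with unit-modulus diagonal entries, so the eigenvalues of W all have modulus one while the strictly-upper-triangular part of T adds non-normal transient dynamics.

```python
import numpy as np

def schur_style_matrix(n, seed=0):
    """Toy Schur-style parametrization: W = P @ T @ P.T, with P
    orthogonal and T upper triangular. The diagonal of T holds the
    eigenvalues (here +/-1, so unit modulus); the free strictly
    upper-triangular entries make W non-normal."""
    rng = np.random.default_rng(seed)
    # Orthogonal basis P from a QR decomposition of a random matrix.
    P, _ = np.linalg.qr(rng.standard_normal((n, n)))
    # Unit-modulus eigenvalues (real case: +/-1) on the diagonal of T.
    T = np.diag(rng.choice([-1.0, 1.0], size=n))
    # Free strictly-upper-triangular part: the non-normal component.
    T += np.triu(rng.standard_normal((n, n)), k=1) * 0.1
    return P @ T @ P.T

W = schur_style_matrix(6)
# Similarity preserves eigenvalues, so all eigenvalues of W have
# modulus 1, yet W is not orthogonal when the upper part is nonzero.
print(np.abs(np.linalg.eigvals(W)))
```

Because eigenvalues are invariant under the similarity transform, the spectrum stays on the unit circle regardless of the strictly-upper-triangular entries, which is the property the TL;DR refers to.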
Proceedings Article
h-detach: Modifying the LSTM Gradient Towards Better Optimization
Devansh Arpit, Bhargav Kanuparthi, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio +5 more
TL;DR: In this paper, a stochastic algorithm called h-detach was proposed to address the vanishing gradient problem in LSTMs: it prevents the gradient components flowing through the linear path (cell state) in the computational graph from being suppressed, a suppression that can otherwise prevent LSTMs from capturing long-term dependencies.
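The stochastic-blocking idea can be sketched on a toy scalar recurrence (a loose illustration under my own assumptions, not the paper's LSTM implementation): during the backward pass, the gradient term arriving through the hidden-state path is zeroed with some probability, leaving the additive cell-state path intact.

```python
import numpy as np

def backprop_cell_grad(T=50, u=2.0, p_detach=0.75, seed=0, detach=True):
    """Manual backward pass for the toy recurrence
        c_t = c_{t-1} + u * tanh(c_{t-1})
    returning dL/dc_0 for L = c_T. When detach=True, the hidden-path
    term of dc_t/dc_{t-1} is stochastically zeroed with probability
    p_detach, mimicking the spirit of h-detach."""
    rng = np.random.default_rng(seed)
    # Forward pass: record the cell states.
    cs = [0.1]
    for _ in range(T):
        cs.append(cs[-1] + u * np.tanh(cs[-1]))
    # Backward pass: dc_t/dc_{t-1} = 1 (linear cell path)
    #                              + u * tanh'(c_{t-1}) (hidden path)
    g = 1.0
    for t in range(T, 0, -1):
        h_path = u * (1.0 - np.tanh(cs[t - 1]) ** 2)
        if detach and rng.random() < p_detach:
            h_path = 0.0  # stochastically block the hidden path
        g *= 1.0 + h_path
    return g
```

With blocking enabled the product of per-step factors shrinks toward the pure cell-path contribution (a product of ones), which is the intuition behind keeping the linear path's gradient from being drowned out.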
Posted Content
Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics
Giancarlo Kerg, Kyle Goyette, Maximilian Puelma Touzel, Gauthier Gidel, Eugene Vorontsov, Yoshua Bengio, Guillaume Lajoie +6 more
TL;DR: This work proposes a novel connectivity structure based on the Schur decomposition and a splitting of the Schur form into normal and non-normal parts, which retains the stability advantages and training speed of orthogonal RNNs while enhancing expressivity, especially on tasks that require computations over ongoing input sequences.
Posted Content
h-detach: Modifying the LSTM Gradient Towards Better Optimization
Devansh Arpit, Bhargav Kanuparthi, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio +5 more
TL;DR: A simple stochastic algorithm is introduced that prevents gradients flowing through this path from being suppressed, allowing the LSTM to capture such dependencies better; it shows significant improvements over vanilla LSTM gradient-based training in convergence speed, robustness to seed and learning rate, and generalization.
Proceedings Article
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization
Stanisław Jastrzębski, Devansh Arpit, Oliver Astrand, Giancarlo Kerg, Huan Wang, Caiming Xiong, Richard Socher, Kyunghyun Cho, Krzysztof J. Geras +8 more
TL;DR: This paper shows that the early value of the trace of the Fisher Information Matrix (FIM) correlates strongly with final generalization, and that in the absence of implicit or explicit regularization the trace can grow to a large value early in training, a phenomenon the authors term catastrophic Fisher explosion.
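The quantity tracked above, the trace of the FIM, equals the expected squared norm of the per-example score, Tr(F) = E[‖∇ log p(y|x)‖²] with labels sampled from the model. A minimal Monte-Carlo estimate for a linear softmax classifier can be sketched as follows (my own illustrative code, not the authors'):

```python
import numpy as np

def fisher_trace_softmax(W, X, seed=0):
    """Monte-Carlo estimate of Tr(F) for the model p(y|x) = softmax(W x):
    average over inputs of || d/dW log p(y|x) ||^2, with y sampled
    from the model's own predictive distribution."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for x in X:
        logits = W @ x
        logits = logits - logits.max()          # numerical stability
        p = np.exp(logits)
        p /= p.sum()
        y = rng.choice(len(p), p=p)             # sample label from model
        # Gradient of log p(y|x) w.r.t. W: (onehot(y) - p) outer x.
        err = -p
        err[y] += 1.0
        g = np.outer(err, x)
        total += (g ** 2).sum()
    return total / len(X)

# Toy usage: 3-class model on a batch of 4-dimensional inputs.
W = np.zeros((3, 4))
X = np.ones((5, 4))
print(fisher_trace_softmax(W, X))
```

Monitoring this scalar over the first epochs is the kind of measurement the TL;DR describes: a rapidly growing value early in training signals the explosion regime.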