Giancarlo Kerg
Researcher at Université de Montréal
Publications - 13
Citations - 118
Giancarlo Kerg is an academic researcher at Université de Montréal. His research focuses on recurrent neural networks and the vanishing gradient problem. He has an h-index of 5 and has co-authored 8 publications receiving 83 citations.
Papers
Proceedings Article
Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics
Giancarlo Kerg, Kyle Goyette, Maximilian Puelma Touzel, Gauthier Gidel, Eugene Vorontsov, Yoshua Bengio, Guillaume Lajoie +6 more
TL;DR: In this paper, the authors propose a connectivity structure based on the Schur decomposition, which allows parametrizing matrices with a unit-norm eigenspectrum without imposing orthogonality constraints on the eigenbasis.
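The parametrization described above can be illustrated with a minimal numpy sketch (an assumption-laden toy, not the authors' implementation): a recurrent matrix is built as W = P T Pᵀ, where P is orthogonal and T is upper triangular with unit-modulus diagonal entries, so the eigenvalues of W all have modulus one while the strictly-upper-triangular part of T adds non-normal transient dynamics.

```python
import numpy as np

def schur_style_matrix(n, seed=0):
    """Toy Schur-style parametrization: W = P @ T @ P.T, with P
    orthogonal and T upper triangular. The diagonal of T holds the
    eigenvalues (here +/-1, so unit modulus); the free strictly
    upper-triangular entries make W non-normal."""
    rng = np.random.default_rng(seed)
    # Orthogonal basis P from a QR decomposition of a random matrix.
    P, _ = np.linalg.qr(rng.standard_normal((n, n)))
    # Unit-modulus eigenvalues (real case: +/-1) on the diagonal of T.
    T = np.diag(rng.choice([-1.0, 1.0], size=n))
    # Free strictly-upper-triangular part: the non-normal component.
    T += np.triu(rng.standard_normal((n, n)), k=1) * 0.1
    return P @ T @ P.T

W = schur_style_matrix(6)
# Similarity preserves eigenvalues, so all eigenvalues of W have
# modulus 1, yet W is not orthogonal when the upper part is nonzero.
print(np.abs(np.linalg.eigvals(W)))
```

Because eigenvalues are invariant under the similarity transform, the spectrum stays on the unit circle regardless of the strictly-upper-triangular entries, which is the property the TL;DR refers to.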
Proceedings Article
h-detach: Modifying the LSTM Gradient Towards Better Optimization
Devansh Arpit, Bhargav Kanuparthi, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio +5 more
TL;DR: In this paper, a stochastic algorithm called h-detach was proposed to address the vanishing gradient problem in LSTMs: it prevents the gradient components flowing through the linear path (cell state) in the computational graph from being suppressed, a suppression that can otherwise prevent LSTMs from capturing long-term dependencies.
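The stochastic-blocking idea can be sketched on a toy scalar recurrence (a loose illustration under my own assumptions, not the paper's LSTM implementation): during the backward pass, the gradient term arriving through the hidden-state path is zeroed with some probability, leaving the additive cell-state path intact.

```python
import numpy as np

def backprop_cell_grad(T=50, u=2.0, p_detach=0.75, seed=0, detach=True):
    """Manual backward pass for the toy recurrence
        c_t = c_{t-1} + u * tanh(c_{t-1})
    returning dL/dc_0 for L = c_T. When detach=True, the hidden-path
    term of dc_t/dc_{t-1} is stochastically zeroed with probability
    p_detach, mimicking the spirit of h-detach."""
    rng = np.random.default_rng(seed)
    # Forward pass: record the cell states.
    cs = [0.1]
    for _ in range(T):
        cs.append(cs[-1] + u * np.tanh(cs[-1]))
    # Backward pass: dc_t/dc_{t-1} = 1 (linear cell path)
    #                              + u * tanh'(c_{t-1}) (hidden path)
    g = 1.0
    for t in range(T, 0, -1):
        h_path = u * (1.0 - np.tanh(cs[t - 1]) ** 2)
        if detach and rng.random() < p_detach:
            h_path = 0.0  # stochastically block the hidden path
        g *= 1.0 + h_path
    return g
```

With blocking enabled the product of per-step factors shrinks toward the pure cell-path contribution (a product of ones), which is the intuition behind keeping the linear path's gradient from being drowned out.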
Posted Content
Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics
Giancarlo Kerg, Kyle Goyette, Maximilian Puelma Touzel, Gauthier Gidel, Eugene Vorontsov, Yoshua Bengio, Guillaume Lajoie +6 more
TL;DR: This work proposes a novel connectivity structure based on the Schur decomposition and a splitting of the Schur form into normal and non-normal parts, which retains the stability advantages and training speed of orthogonal RNNs while enhancing expressivity, especially on tasks that require computations over ongoing input sequences.
Posted Content
h-detach: Modifying the LSTM Gradient Towards Better Optimization
Devansh Arpit, Bhargav Kanuparthi, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio +5 more
TL;DR: A simple stochastic algorithm is introduced that prevents gradients flowing through this path from being suppressed, allowing the LSTM to capture such dependencies better; it shows significant improvements over vanilla LSTM gradient-based training in convergence speed, robustness to seed and learning rate, and generalization.
Proceedings Article
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization
Stanisław Jastrzębski, Devansh Arpit, Oliver Astrand, Giancarlo Kerg, Huan Wang, Caiming Xiong, Richard Socher, Kyunghyun Cho, Krzysztof J. Geras +8 more
TL;DR: This paper shows that the early value of the trace of the Fisher Information Matrix (FIM) correlates strongly with final generalization, and that in the absence of implicit or explicit regularization the trace can grow to a large value early in training, a phenomenon the authors term catastrophic Fisher explosion.
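The quantity tracked above, the trace of the FIM, equals the expected squared norm of the per-example score, Tr(F) = E[‖∇ log p(y|x)‖²] with labels sampled from the model. A minimal Monte-Carlo estimate for a linear softmax classifier can be sketched as follows (my own illustrative code, not the authors'):

```python
import numpy as np

def fisher_trace_softmax(W, X, seed=0):
    """Monte-Carlo estimate of Tr(F) for the model p(y|x) = softmax(W x):
    average over inputs of || d/dW log p(y|x) ||^2, with y sampled
    from the model's own predictive distribution."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for x in X:
        logits = W @ x
        logits = logits - logits.max()          # numerical stability
        p = np.exp(logits)
        p /= p.sum()
        y = rng.choice(len(p), p=p)             # sample label from model
        # Gradient of log p(y|x) w.r.t. W: (onehot(y) - p) outer x.
        err = -p
        err[y] += 1.0
        g = np.outer(err, x)
        total += (g ** 2).sum()
    return total / len(X)

# Toy usage: 3-class model on a batch of 4-dimensional inputs.
W = np.zeros((3, 4))
X = np.ones((5, 4))
print(fisher_trace_softmax(W, X))
```

Monitoring this scalar over the first epochs is the kind of measurement the TL;DR describes: a rapidly growing value early in training signals the explosion regime.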