Yoshua Bengio
Researcher at Université de Montréal
Publications - 1146
Citations - 534376
Yoshua Bengio is an academic researcher at Université de Montréal. He has contributed to research on artificial neural networks and deep learning, has an h-index of 202, and has co-authored 1033 publications receiving 420313 citations. His previous affiliations include McGill University and the Centre de Recherches Mathématiques.
Papers
Journal Article
Understanding the difficulty of training deep feedforward neural networks
Xavier Glorot, Yoshua Bengio +1 more
TL;DR: In this article, the authors show that the logistic sigmoid activation is unsuited for deep networks with random initialization: its nonzero mean can drive the top hidden layer, in particular, into saturation.
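The remedy this paper is best known for is the normalized ("Xavier"/Glorot) initialization, which scales initial weights so that activation and gradient variances stay roughly constant across layers. A minimal sketch (layer sizes are illustrative):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=np.random.default_rng(0)):
    # Normalized initialization from Glorot & Bengio (2010):
    # sample uniformly in [-limit, limit] with
    # limit = sqrt(6 / (fan_in + fan_out)).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = glorot_uniform(784, 256)
print(W.shape)  # (784, 256)
```

Keeping the weights in this range avoids pushing sigmoid units into their flat, saturated regions at the start of training.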
Journal Article
Model Selection for Small Sample Regression
TL;DR: This work presents a new penalization method for performing model selection for regression that is appropriate even for small samples. It is based on an accurate estimator of the ratio of the expected training error and the expected generalization error, in terms of the expected eigenvalues of the input covariance matrix.
Proceedings Article
Marginalized Denoising Auto-encoders for Nonlinear Representations
TL;DR: The marginalized Denoising Auto-encoder (mDAE) is presented, which (approximately) marginalizes out the corruption during training and is able to match or outperform the DAE with far fewer training epochs.
Posted Content
Interpolation Consistency Training for Semi-Supervised Learning
TL;DR: Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training deep neural networks in the semi-supervised learning paradigm, achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets.
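The core idea of ICT is a mixup-style consistency term on unlabeled data: the model's prediction on an interpolated input should match the interpolation of a (teacher) model's predictions on the endpoints. A minimal sketch with a toy softmax model standing in for a deep network (the model and shapes are illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x, W):
    # Toy linear-softmax "network"; in ICT this would be a deep net.
    z = x @ W
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def ict_consistency_loss(u1, u2, W_student, W_teacher, lam):
    # Student prediction on the mixed input should match the mixup of
    # the teacher's predictions on the two unlabeled inputs.
    x_mix = lam * u1 + (1 - lam) * u2
    target = lam * model(u1, W_teacher) + (1 - lam) * model(u2, W_teacher)
    pred = model(x_mix, W_student)
    return np.mean((pred - target) ** 2)

u1 = rng.normal(size=(8, 5))
u2 = rng.normal(size=(8, 5))
W = rng.normal(size=(5, 3))
lam = rng.beta(1.0, 1.0)  # mixup coefficient drawn from a Beta distribution
loss = ict_consistency_loss(u1, u2, W, W.copy(), lam)
```

In practice the teacher weights are an exponential moving average of the student's, and this loss is added to the usual supervised loss on labeled examples.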
Proceedings Article
An Empirical Study of Example Forgetting during Deep Neural Network Learning
Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, Geoffrey J. Gordon +5 more
TL;DR: It is found that certain examples are forgotten with high frequency, and some not at all; a data set’s (un)forgettable examples generalize across neural architectures; and a significant fraction of examples can be omitted from the training data set while still maintaining state-of-the-art generalization performance.
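A forgetting event, in the sense of this paper, is a transition from classified-correctly to misclassified between consecutive checks during training. Counting such events per example can be sketched as follows (the history matrix here is made up for illustration):

```python
import numpy as np

def count_forgetting_events(correct_history):
    # correct_history: (num_epochs, num_examples) matrix of 0/1 flags
    # recording whether each example was classified correctly at each epoch.
    # A forgetting event is a correct -> incorrect transition between
    # consecutive epochs (Toneva et al., 2019).
    h = np.asarray(correct_history, dtype=int)
    transitions = (h[:-1] == 1) & (h[1:] == 0)
    return transitions.sum(axis=0)

# Three examples over five epochs: never learned, learned-then-forgotten
# twice, and stably learned.
history = [[0, 1, 1],
           [0, 1, 1],
           [0, 0, 1],
           [0, 1, 1],
           [0, 0, 1]]
print(count_forgetting_events(history))  # [0 2 0]
```

Examples with zero forgetting events ("unforgettable" ones) are the candidates the paper finds can often be removed from training with little loss in generalization.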