Yoshua Bengio

Researcher at Université de Montréal

Publications - 1146
Citations - 534376

Yoshua Bengio is an academic researcher at Université de Montréal. He has contributed to research on topics including artificial neural networks and deep learning. He has an h-index of 202 and has co-authored 1033 publications receiving 420313 citations. His previous affiliations include McGill University and the Centre de Recherches Mathématiques.

Papers
Journal Article

Understanding the difficulty of training deep feedforward neural networks

TL;DR: In this article, the authors show that the logistic sigmoid activation is ill-suited to deep networks with random initialization: because of its non-zero mean, it can drive the top hidden layer in particular into saturation.
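
The same paper proposes the now-standard normalized ("Xavier"/Glorot) initialization as a remedy. Below is a minimal NumPy sketch, not code from the paper: the layer sizes, depth, and the naive baseline it is compared against are illustrative choices, but the scaling rule matches the normalized initialization the paper describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(fan_in, fan_out):
    # Normalized ("Glorot"/Xavier) initialization: uniform in
    # [-sqrt(6/(fan_in+fan_out)), +sqrt(6/(fan_in+fan_out))], chosen to keep
    # activation and gradient variances roughly constant across layers.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Push a random batch through a 5-layer sigmoid stack and compare top-layer
# activation statistics under a naive small-Gaussian init vs. the normalized one.
x = rng.standard_normal((256, 500))
for name, init in [("small gaussian", lambda i, o: 0.01 * rng.standard_normal((i, o))),
                   ("glorot uniform", glorot_uniform)]:
    h = x
    for _ in range(5):
        h = sigmoid(h @ init(500, 500))
    print(f"{name}: top-layer mean={h.mean():.3f}, std={h.std():.3f}")
```
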
Journal Article

Model Selection for Small Sample Regression

TL;DR: This work presents a new penalization method for model selection in regression that remains appropriate even for small samples. It is based on an accurate estimator of the ratio of the expected training error to the expected generalization error, expressed in terms of the expected eigenvalues of the input covariance matrix.
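
The quantity the summary refers to is built from the eigenvalues of the input covariance matrix. The snippet below only shows how those empirical eigenvalues would be computed for a small-sample regression design; the paper's actual penalty, which is a function of these quantities, is not reproduced here, and the variable names and sizes are illustrative.

```python
import numpy as np

def input_covariance_eigenvalues(X):
    # X: (n_samples, n_features) design matrix for a small-sample regression task.
    # Center the inputs, form the empirical covariance, and return its eigenvalues.
    Xc = X - X.mean(axis=0, keepdims=True)
    cov = (Xc.T @ Xc) / X.shape[0]
    return np.linalg.eigvalsh(cov)

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 10))   # deliberately small sample: n=30, d=10
print(input_covariance_eigenvalues(X))
```
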
Proceedings Article

Marginalized Denoising Auto-encoders for Nonlinear Representations

TL;DR: The marginalized denoising auto-encoder (mDAE) is presented, which (approximately) marginalizes out the corruption during training and is able to match or outperform the DAE with far fewer training epochs.
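
For contrast with what the mDAE marginalizes away, here is a minimal PyTorch sketch of an ordinary denoising auto-encoder that samples its corruption explicitly at every step. It is illustrative only: the layer sizes, mask-out corruption, and dummy batch are assumptions, and the mDAE's analytic marginalization is not reproduced here.

```python
import torch
import torch.nn as nn

class DAE(nn.Module):
    # A conventional denoising auto-encoder: corruption is *sampled* each step,
    # so the expected reconstruction loss is approximated by Monte Carlo.
    def __init__(self, d_in=784, d_hid=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_hid), nn.Sigmoid())
        self.dec = nn.Linear(d_hid, d_in)

    def forward(self, x, drop_prob=0.3):
        # Mask-out ("dropout"-style) corruption of the input.
        mask = (torch.rand_like(x) > drop_prob).float()
        return self.dec(self.enc(x * mask))

model = DAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                       # dummy batch standing in for data
loss = nn.functional.mse_loss(model(x), x)    # reconstruct the clean input
opt.zero_grad()
loss.backward()
opt.step()
```
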
Posted Content

Interpolation Consistency Training for Semi-Supervised Learning

TL;DR: Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training deep neural networks in the semi-supervised setting, achieves state-of-the-art performance with standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets.
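
A hedged PyTorch sketch of the ICT consistency term follows; the function name, the mean-teacher argument, and the hyperparameters are illustrative assumptions rather than the authors' released code.

```python
import torch
import torch.nn.functional as F

def ict_consistency_loss(student, teacher, u1, u2, alpha=1.0):
    # Interpolation consistency: the student's prediction on a mixup of two
    # unlabeled batches should match the same mixup of the teacher's predictions
    # on the original (unmixed) points.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    with torch.no_grad():                      # teacher targets are not backprop'd
        p1 = F.softmax(teacher(u1), dim=1)
        p2 = F.softmax(teacher(u2), dim=1)
    mixed_input = lam * u1 + (1 - lam) * u2
    mixed_target = lam * p1 + (1 - lam) * p2
    return F.mse_loss(F.softmax(student(mixed_input), dim=1), mixed_target)
```

In the full objective this term is added to the usual supervised loss on the labelled data with a ramped-up weight, and the teacher is typically an exponential moving average of the student's weights.
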
Proceedings Article

An Empirical Study of Example Forgetting during Deep Neural Network Learning

TL;DR: It is found that certain examples are forgotten with high frequency while others are never forgotten; a data set's (un)forgettable examples generalize across neural architectures; and a significant fraction of examples can be omitted from the training set while still maintaining state-of-the-art generalization performance.
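
The forgetting statistic itself is simple bookkeeping over per-epoch correctness records. Below is a small NumPy sketch of that counting; the function name and the toy history are invented for illustration.

```python
import numpy as np

def count_forgetting_events(correct_by_epoch):
    # correct_by_epoch: boolean array of shape (n_epochs, n_examples), where
    # entry (t, i) says whether example i was classified correctly at epoch t.
    # A "forgetting event" is a transition from correct to incorrect between
    # consecutive epochs; examples with no such events are the "unforgettable" ones.
    acc = np.asarray(correct_by_epoch, dtype=bool)
    forgotten = acc[:-1] & ~acc[1:]        # correct at epoch t, wrong at t+1
    return forgotten.sum(axis=0)           # forgetting events per example

# Toy usage with a fabricated 4-epoch, 3-example correctness history.
history = [[1, 0, 1],
           [1, 0, 0],
           [1, 1, 1],
           [1, 1, 1]]
print(count_forgetting_events(history))    # -> [0 0 1]
```
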