
Showing papers on "Deep belief network" published in 2007


01 Jan 2007
TL;DR: These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
Abstract: Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of the computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities, allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
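To make the greedy layer-wise strategy concrete, here is a minimal NumPy sketch of stacking binary restricted Boltzmann machines trained with one-step contrastive divergence (CD-1). The layer sizes, learning rate, and helper names (train_rbm, greedy_pretrain) are illustrative choices, not the paper's exact setup; in practice the resulting weights would initialize a deep network that is then fine-tuned with a supervised criterion.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_rbm(data, n_hidden, lr=0.05, epochs=5):
        # Train one binary RBM with CD-1; returns weights and hidden biases.
        n_visible = data.shape[1]
        W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        b_v = np.zeros(n_visible)
        b_h = np.zeros(n_hidden)
        for _ in range(epochs):
            for v0 in data:
                p_h0 = sigmoid(v0 @ W + b_h)                      # positive phase
                h0 = (rng.random(n_hidden) < p_h0).astype(float)  # sample hidden units
                p_v1 = sigmoid(h0 @ W.T + b_v)                    # one Gibbs step down...
                p_h1 = sigmoid(p_v1 @ W + b_h)                    # ...and back up
                W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
                b_v += lr * (v0 - p_v1)
                b_h += lr * (p_h0 - p_h1)
        return W, b_h

    def greedy_pretrain(data, layer_sizes):
        # Stack RBMs: each layer is trained on the previous layer's hidden activations.
        layers, x = [], data
        for n_hidden in layer_sizes:
            W, b_h = train_rbm(x, n_hidden)
            layers.append((W, b_h))
            x = sigmoid(x @ W + b_h)    # inputs for the next RBM in the stack
        return layers                    # would initialize a deep net before fine-tuning

    # toy usage: 200 random binary vectors of dimension 64, two hidden layers
    layers = greedy_pretrain(rng.integers(0, 2, size=(200, 64)).astype(float), [32, 16])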

1,128 citations


Proceedings Article
03 Dec 2007
TL;DR: An unsupervised learning model is presented that faithfully mimics certain properties of visual area V2; the encoding of the more complex "corner" features it learns matches well with the results from Ito & Komatsu's study of biological V2 responses, suggesting that this sparse variant of deep belief networks holds promise for modeling higher-order features.
Abstract: Motivated in part by the hierarchical organization of the cortex, a number of algorithms have recently been proposed that try to learn hierarchical, or "deep," structure from unlabeled data. While several authors have formally or informally compared their algorithms to computations performed in visual area V1 (and the cochlea), little attempt has been made thus far to evaluate these algorithms in terms of their fidelity for mimicking computations at deeper levels in the cortical hierarchy. This paper presents an unsupervised learning model that faithfully mimics certain properties of visual area V2. Specifically, we develop a sparse variant of the deep belief networks of Hinton et al. (2006). We learn two layers of nodes in the network, and demonstrate that the first layer, similar to prior work on sparse coding and ICA, results in localized, oriented edge filters, similar to the Gabor functions known to model V1 cell receptive fields. Further, the second layer in our model encodes correlations of the first layer responses in the data. Specifically, it picks up colinear ("contour") features as well as corners and junctions. More interestingly, in a quantitative comparison, the encoding of these more complex "corner" features matches well with the results from Ito & Komatsu's study of biological V2 responses. This suggests that our sparse variant of deep belief networks holds promise for modeling higher-order features.
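One way to read the "sparse variant" referred to above is as the usual RBM update plus a regularizer that keeps each hidden unit's average activation near a small target; the sketch below shows only that extra step, with the target and penalty strength as placeholder values rather than the paper's exact formulation.

    import numpy as np

    def sparsity_update(b_h, p_h_batch, target=0.02, penalty=0.1):
        # Nudge hidden biases so each unit's mean activation stays near a small target.
        # p_h_batch: (batch, n_hidden) hidden activation probabilities from the positive phase.
        mean_act = p_h_batch.mean(axis=0)
        return b_h + penalty * (target - mean_act)

    # toy usage: 16 hidden units, a batch of 32 activation vectors
    rng = np.random.default_rng(1)
    b_h = sparsity_update(np.zeros(16), rng.random((32, 16)))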

1,048 citations


Proceedings Article
03 Dec 2007
TL;DR: This work proposes a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation, and describes a novel and efficient algorithm to learn sparse representations.
Abstract: Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have certain desirable properties (e.g. low dimension, sparsity, etc). Others are based on approximating the density by stochastically reconstructing the input from the representation. We describe a novel and efficient algorithm to learn sparse representations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals, and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and by training sequentially, high-order dependencies between the observed input variables can be captured.
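The trade-off between reconstruction error and the information content (here, sparsity) of the representation can be written as a single objective; the following is a generic sparse coding loss in that spirit rather than the paper's specific algorithm, with the decoder and the L1 weight chosen arbitrarily.

    import numpy as np

    def sparse_code_loss(x, code, W_dec, l1_weight=0.1):
        # Reconstruction error plus a sparsity penalty on the representation.
        recon = code @ W_dec                          # linear decoder
        recon_err = np.sum((x - recon) ** 2)          # how well the code explains the input
        sparsity = l1_weight * np.sum(np.abs(code))   # constraint on the representation
        return recon_err + sparsity

    # toy usage: a 16-dimensional code reconstructing a 64-dimensional input
    rng = np.random.default_rng(2)
    print(sparse_code_loss(rng.random(64), rng.random(16), rng.standard_normal((16, 64))))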

852 citations


01 Jan 2007
TL;DR: In this article, nonlinear units are obtained by passing the outputs of linear Gaussian units through various nonlinearities, and a general variational method that maximizes a lower bound on the likelihood of a training set is presented.
Abstract: We view perceptual tasks such as vision and speech recognition as inference problems where the goal is to estimate the posterior distribution over latent variables (e.g., depth in stereo vision) given the sensory input. The recent flurry of research in independent component analysis exemplifies the importance of inferring the continuous-valued latent variables of input data. The latent variables found by this method are linearly related to the input, but perception requires nonlinear inferences such as decision-making. Even continuous latent variables such as depth are nonlinearly related to the input. In this paper, we present a unifying framework for stochastic neural networks with nonlinear latent variables. Nonlinear units are obtained by passing the outputs of linear Gaussian units through various nonlinearities. We present a general variational method that maximizes a lower bound on the likelihood of a training set, and give results on two visual feature extraction problems.
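For readers unfamiliar with the "lower bound on the likelihood of a training set" mentioned above, a variational bound of this general form (written here in standard notation; the paper's exact parameterization of the approximating distribution q differs) reads, for a single input x with latent variables h,

    \log p(x) \;=\; \log \sum_h q(h)\,\frac{p(x,h)}{q(h)}
              \;\ge\; \sum_h q(h)\,\bigl[\log p(x,h) - \log q(h)\bigr]
              \;=\; \log p(x) - \mathrm{KL}\bigl(q(h)\,\|\,p(h \mid x)\bigr),

with equality exactly when q(h) matches the true posterior p(h | x); maximizing the bound over q tightens it, and maximizing over the model parameters increases the likelihood.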

57 citations


Proceedings Article
29 Oct 2007
TL;DR: This work presents a methodology for spam detection based on DBNs, evaluates its performance on three widely used datasets, and compares it to Support Vector Machines (SVMs), the state-of-the-art method for spam filtering in terms of classification performance.
Abstract: This paper proposes a novel approach for spam filtering based on the use of Deep Belief Networks (DBNs). In contrast to conventional feedforward neural networks having one or two hidden layers, DBNs are feedforward neural networks with many hidden layers. Until recently it was not clear how to initialize the weights of deep neural networks, which resulted in poor solutions with low generalization capabilities. A greedy layer-wise unsupervised algorithm was recently proposed to tackle this problem with successful results. In this work we present a methodology for spam detection based on DBNs and evaluate its performance on three widely used datasets. We also compare our method to Support Vector Machines (SVMs) which is the state-of-the-art method for spam filtering in terms of classification performance. Our experiments indicate that using DBNs to filter spam e-mails is a viable methodology, since they achieve similar or even better performance than SVMs on all three datasets.
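As a rough modern analogue of the pipeline described above (layer-wise unsupervised feature learning followed by a supervised classifier, with an SVM baseline), the scikit-learn sketch below stacks BernoulliRBM layers in front of a logistic classifier. The feature counts, hyperparameters, and the random stand-in data are placeholders, and unlike the paper's method the stacked RBMs here are not fine-tuned end to end.

    import numpy as np
    from sklearn.neural_network import BernoulliRBM
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    # toy stand-in for a bag-of-words spam dataset: binary term-presence features
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(500, 100)).astype(float)
    y = rng.integers(0, 2, size=500)

    # two stacked RBMs (the unsupervised "deep" part) feeding a logistic classifier
    dbn_like = Pipeline([
        ("rbm1", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=10, random_state=0)),
        ("rbm2", BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=10, random_state=0)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    dbn_like.fit(X, y)

    # SVM baseline, in the spirit of the comparison the paper reports
    svm = LinearSVC().fit(X, y)
    print(dbn_like.score(X, y), svm.score(X, y))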

36 citations


Book Chapter
TL;DR: This chapter focuses on what is required for the learning of complex behaviors, in a mathematical sense, and brings forward two types of arguments which convey the message that many currently popular machine learning approaches to learning flexible functions have fundamental limitations that render them inappropriate for learning highly varying functions.
Abstract: A common goal of computational neuroscience and of artificial intelligence research based on statistical learning algorithms is the discovery and understanding of computational principles that could explain what we consider adaptive intelligence, in animals as well as in machines. This chapter focuses on what is required for the learning of complex behaviors. We believe it involves the learning of highly varying functions, in a mathematical sense. We bring forward two types of arguments which convey the message that many currently popular machine learning approaches to learning flexible functions have fundamental limitations that render them inappropriate for learning highly varying functions. The first issue concerns the representation of such functions with what we call shallow model architectures. We discuss limitations of shallow architectures, such as so-called kernel machines, boosting algorithms, and one-hidden-layer artificial neural networks. The second issue is more focused and concerns kernel machines with a local kernel (the type used most often in practice) that act like a collection of template-matching units. We present mathematical results on such computational architectures showing that they have a limitation similar to those already proved for older non-parametric methods, and connected to the so-called curse of dimensionality. Though it has long been believed that efficient learning in deep architectures is difficult, recently proposed computational principles for learning in deep architectures may offer a breakthrough.
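The "collection of template-matching units" argument is easiest to see from the decision function of a kernel machine with a local (e.g. Gaussian) kernel, f(x) = b + sum_i alpha_i K(x, x_i): each training example acts as a template that responds only when the input falls near it. A small numeric illustration with made-up templates and coefficients:

    import numpy as np

    def gaussian_kernel(x, template, sigma=1.0):
        # Local kernel: close to 1 only when x is near the template, ~0 otherwise.
        return np.exp(-np.sum((x - template) ** 2) / (2 * sigma ** 2))

    def kernel_machine(x, templates, alphas, bias=0.0):
        # f(x) = bias + sum_i alpha_i * K(x, x_i): a sum over template matchers.
        return bias + sum(a * gaussian_kernel(x, t) for a, t in zip(alphas, templates))

    templates = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
    alphas = [1.0, -1.0]
    print(kernel_machine(np.array([0.1, -0.2]), templates, alphas))  # dominated by the first template
    print(kernel_machine(np.array([4.9, 5.1]), templates, alphas))   # dominated by the second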

20 citations


Dissertation
03 Jan 2007

8 citations


Dissertation
01 Jan 2007
TL;DR: This thesis analyzes the modelling of visual cognitive representations by extracting cognitive components from the MNIST dataset of handwritten digits using simple unsupervised linear and non-linear matrix factorizations, both non-negative and unconstrained, based on gradient-descent learning.
Abstract: This thesis analyzes the modelling of visual cognitive representations by extracting cognitive components from the MNIST dataset of handwritten digits using simple unsupervised linear and non-linear matrix factorizations, both non-negative and unconstrained, based on gradient-descent learning. We introduce two different classes of generative models for modelling the cognitive data: Mixture Models and Deep Network Models. Mixture models based on K-means, Gaussian, and factor-analyzer kernel functions are presented as simple generative models in a general framework. From simulations we analyze the generative properties of these models and show how they prove insufficient to properly model the complex distribution of the visual cognitive data. Motivated by the introduction of deep belief nets by Hinton et al. [12], we propose a simpler generative deep network model based on cognitive components. A theoretical framework is presented as individual modules for building a generative hierarchical network model. We analyze the performance in terms of classification and generation of MNIST digits and show how our simplifications compared to Hinton et al. [12] lead to degraded performance. In this respect we outline the differences and conjecture obvious improvements.
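For reference, the kind of non-negative matrix factorization the thesis uses to extract "cognitive components" looks like the sketch below; random non-negative data stands in for the flattened MNIST digits, and the component count is arbitrary.

    import numpy as np
    from sklearn.decomposition import NMF

    # stand-in for flattened 28x28 MNIST digits: non-negative pixel intensities
    rng = np.random.default_rng(0)
    X = rng.random((500, 784))

    # factorize X ~ W @ H: rows of H are the learned components ("parts" of digits)
    nmf = NMF(n_components=25, init="nndsvd", max_iter=300, random_state=0)
    W = nmf.fit_transform(X)   # per-image activations of each component
    H = nmf.components_        # the component images themselves
    print(W.shape, H.shape)    # (500, 25), (25, 784)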

2 citations