
Showing papers by "Geoffrey E. Hinton published in 1996"


21 May 1996
TL;DR: This work presents an exact Expectation-Maximization algorithm for fitting the parameters of a mixture of factor analyzers, a model that concurrently performs clustering and dimensionality reduction and can be thought of as a reduced-dimension mixture of Gaussians.
Abstract: Factor analysis, a statistical method for modeling the covariance structure of high dimensional data using a small number of latent variables, can be extended by allowing different local factor models in different regions of the input space. This results in a model which concurrently performs clustering and dimensionality reduction, and can be thought of as a reduced dimension mixture of Gaussians. We present an exact Expectation-Maximization algorithm for fitting the parameters of this mixture of factor analyzers.
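The generative model behind the abstract can be sketched in a few lines: each mixture component maps a low-dimensional latent factor through its own loading matrix into the data space and adds diagonal noise. This is an illustrative NumPy sketch, not the paper's code; all dimensions and parameter values are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative mixture-of-factor-analyzers generative model:
# component j maps latent z ~ N(0, I) through loading matrix Lambda_j
# into data space, plus mean mu_j and diagonal noise psi.
K, d_latent, d_data, N = 3, 2, 10, 500   # made-up sizes

pi = np.array([0.5, 0.3, 0.2])                 # mixing proportions
mu = rng.normal(size=(K, d_data))              # component means
Lam = rng.normal(size=(K, d_data, d_latent))   # factor loadings
psi = 0.1 * np.ones(d_data)                    # diagonal noise variances

def sample(n):
    j = rng.choice(K, size=n, p=pi)            # choose a component per point
    z = rng.normal(size=(n, d_latent))         # latent factors ~ N(0, I)
    eps = rng.normal(size=(n, d_data)) * np.sqrt(psi)
    x = mu[j] + np.einsum('nij,nj->ni', Lam[j], z) + eps
    return x, j

X, labels = sample(N)
print(X.shape)   # (500, 10)
```

The EM algorithm of the paper would estimate pi, mu, Lambda, and psi from X alone, treating both the component labels and the latent factors as hidden.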

705 citations


16 Sep 1996
TL;DR: The Expectation Maximization (EM) algorithm for estimating the parameters of linear dynamical systems (LDS) is introduced, and its relation to factor analysis and other data modeling techniques is pointed out.
Abstract: Linear systems have been used extensively in engineering to model and control the behavior of dynamical systems. In this note, we present the Expectation Maximization (EM) algorithm for estimating the parameters of linear systems (Shumway and Stoffer, 1982). We also point out the relationship between linear dynamical systems, factor analysis, and hidden Markov models.

Introduction: The goal of this note is to introduce the EM algorithm for estimating the parameters of linear dynamical systems (LDS). Such linear systems can be used both for supervised and unsupervised modeling of time series. We first describe the model and then briefly point out its relation to factor analysis and other data modeling techniques.

The Model: Linear time-invariant dynamical systems, also known as linear Gaussian state-space models, can be described by the following two equations:

x_{t+1} = A x_t + w_t   (1)
y_t = C x_t + v_t       (2)

Time is indexed by the discrete index t. The output y_t is a linear function of the state, x_t, and the state at one time step depends linearly on the previous state. Both state and output noise, w_t and v_t, are zero-mean normally distributed random variables with covariance matrices Q and R, respectively. Only the output of the system is observed; the state and all the noise variables are hidden. Rather than regarding the state as a deterministic value corrupted by random noise, we combine the state variable and the state noise variable into a single Gaussian random variable.
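The two equations above are easy to simulate directly, which makes the model concrete: the hidden state evolves under A with noise covariance Q, and only a noisy linear projection C of it is observed. The specific matrices below are arbitrary choices for illustration, not taken from the note.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate the linear Gaussian state-space model:
#   x_{t+1} = A x_t + w_t,   y_t = C x_t + v_t
# with w_t ~ N(0, Q) and v_t ~ N(0, R).  Matrices are made up.
A = np.array([[0.99, 0.10],
              [-0.10, 0.99]])      # slowly rotating hidden dynamics
C = np.array([[1.0, 0.0]])         # observe only the first state dimension
Q = 0.01 * np.eye(2)               # state noise covariance
R = np.array([[0.1]])              # output noise covariance

T = 200
x = np.zeros(2)
ys = []
for t in range(T):
    ys.append(C @ x + rng.multivariate_normal(np.zeros(1), R))
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)

Y = np.array(ys)
print(Y.shape)   # (200, 1)
```

Given only Y, EM alternates between Kalman smoothing (the E-step, inferring the hidden x_t) and re-estimating A, C, Q, R (the M-step).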

593 citations


Book
01 Apr 1996
TL;DR: The assumption that acquired characteristics are not inherited is often taken to imply that the adaptations an organism learns during its lifetime cannot guide the course of evolution.
Abstract: The assumption that acquired characteristics are not inherited is often taken to imply that the adaptations that an organism learns during its lifetime cannot guide the course of evolution. This inference is incorrect (2). Learning alters the shape of the search space in which evolution operates and thereby provides good evolutionary paths towards sets of co-adapted alleles. We demonstrate that this effect allows learning organisms to evolve much faster than their non-learning equivalents, even though the characteristics acquired by the phenotype are not communicated to the genotype.
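The "learning smooths the search space" argument can be illustrated with a toy needle-in-a-haystack fitness function: only the all-correct setting of a string of switches is rewarded, but learnable switches let an organism find that setting during its lifetime, turning a spike in the fitness landscape into a gradient. This is a hedged sketch loosely in the spirit of the paper's simulation; the fitness formula and all parameter values are made-up assumptions.

```python
import random

random.seed(0)

L, TRIALS = 20, 1000   # genome length and learning trials per lifetime (made up)

def fitness(genotype):
    """Alleles: 1 (fixed correct), 0 (fixed wrong), '?' (learnable)."""
    if 0 in genotype:
        return 1.0                  # a wrong fixed allele can never be learned away
    n_q = genotype.count('?')
    for t in range(1, TRIALS + 1):
        # One learning trial: guess every '?' switch at random;
        # success (all correct) has probability 0.5 ** n_q per trial.
        if all(random.random() < 0.5 for _ in range(n_q)):
            # Reward finding the target, more for finding it early:
            # this is what smooths the otherwise spiky landscape.
            return 1.0 + 19.0 * (TRIALS - t) / TRIALS
    return 1.0

print(fitness([0] + [1] * 19))      # 1.0: no learning can fix a wrong fixed allele
print(fitness([1] * L))             # maximal: all-correct genotype succeeds at once
print(fitness([1] * 10 + ['?'] * 10))   # usually between the two: learning helps
```

A genotype near the target with a few '?' alleles now has measurably higher expected fitness than a distant one, so selection can climb towards the all-fixed-correct genome even though nothing learned is ever written back into the genes.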

403 citations


Journal ArticleDOI
TL;DR: A method of recognizing handwritten digits by fitting generative models that are built from deformable B-splines with Gaussian "ink generators" spaced along the length of the spline using a novel elastic matching procedure based on the expectation maximization algorithm.
Abstract: We describe a method of recognizing handwritten digits by fitting generative models that are built from deformable B-splines with Gaussian "ink generators" spaced along the length of the spline. The splines are adjusted using a novel elastic matching procedure based on the expectation maximization algorithm that maximizes the likelihood of the model generating the data. This approach has many advantages: 1) the system not only produces a classification of the digit but also a rich description of the instantiation parameters which can yield information such as the writing style; 2) the generative models can perform recognition driven segmentation; 3) the method involves a relatively small number of parameters and hence training is relatively easy and fast; and 4) unlike many other recognition schemes, it does not rely on some form of pre-normalization of input images, but can handle arbitrary scalings, translations and a limited degree of image rotation. We have demonstrated that our method of fitting models to images does not get trapped in poor local minima. The main disadvantage of the method is that it requires much more computation than more standard OCR techniques.
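The core generative idea, Gaussian "ink generators" spaced along a spline, can be sketched by placing generators along a curve and letting each emit ink points around its centre. A quadratic Bezier stands in for the deformable B-spline here, and every number is an illustrative assumption, not a detail from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Gaussian "ink generators" along a curve: each generator sits at a
# point on the curve and emits ink points with isotropic Gaussian
# spread.  A quadratic Bezier stands in for the B-spline.
P0, P1, P2 = np.array([0.0, 0.0]), np.array([0.5, 1.0]), np.array([1.0, 0.0])

def bezier(t):
    return (1 - t) ** 2 * P0 + 2 * (1 - t) * t * P1 + t ** 2 * P2

n_gen, per_gen, sigma = 8, 30, 0.02           # made-up model settings
centres = np.array([bezier(t) for t in np.linspace(0.0, 1.0, n_gen)])
ink = centres[:, None, :] + sigma * rng.normal(size=(n_gen, per_gen, 2))
ink = ink.reshape(-1, 2)                      # simulated black pixels of a stroke
print(ink.shape)   # (240, 2)
```

Fitting runs this model in reverse: the elastic matching procedure moves the spline's control points so that the generators' mixture of Gaussians assigns high likelihood to the observed ink, with EM handling the soft assignment of pixels to generators.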

227 citations


Journal ArticleDOI
TL;DR: A number of different varieties of Helmholtz machines are suggested, each with its own strengths and weaknesses, and related to cortical information processing.

137 citations


16 Sep 1996
TL;DR: This manual describes the preliminary release of the DELVE environment, and recommends that you exercise caution when using this version of DELVE for real work, as it is possible that bugs remain in the software.
Abstract: This manual describes the preliminary release of the DELVE environment. Some features described here have not yet been implemented, as noted. Support for regression tasks is presently somewhat more developed than that for classification tasks. We recommend that you exercise caution when using this version of DELVE for real work, as it is possible that bugs remain in the software. We hope that you will send us reports of any problems you encounter, as well as any other comments you may have on the software or manual, at the e-mail address below. Please mention the version number of the manual and/or the software with any comments you send. All Rights Reserved. Permission to use, copy, modify, and distribute this software and its documentation for non-commercial purposes only is hereby granted without fee, provided that the above copyright notice appears in all copies and that both the copyright notice and this permission notice appear in supporting documentation, and that the name of The University of Toronto not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. The University of Toronto makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty. The University of Toronto disclaims all warranties with regard to this software, including all implied warranties of merchantability and fitness. In no event shall the University of Toronto be liable for any special, indirect or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether in an action of contract, negligence or other tortious action, arising out of or in connection with the use or performance of this software. If you publish results obtained using DELVE, please cite this manual, and mention the version number of the software that you used.

79 citations


Proceedings ArticleDOI
31 Mar 1996
TL;DR: This work introduces a new approach to the problem of optimal compression when a source code produces multiple codewords for a given symbol, and illustrates the performance of free energy coding on a simple problem where a compression factor of two is gained.
Abstract: We introduce a new approach to the problem of optimal compression when a source code produces multiple codewords for a given symbol. It may seem that the most sensible codeword to use in this case is the shortest one. However, in the proposed free energy approach, random codeword selection yields an effective codeword length that can be less than the shortest codeword length. If the random choices are Boltzmann distributed, the effective length is optimal for the given source code. The expectation-maximization parameter estimation algorithms minimize this effective codeword length. We illustrate the performance of free energy coding on a simple problem where a compression factor of two is gained by using the new method.
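The central claim is easy to verify numerically: when codeword i is selected with probability q_i, the effective length is E_q[l] - H(q) bits, which is minimized by the Boltzmann distribution q_i proportional to 2^{-l_i}, where it equals -log2(sum_i 2^{-l_i}) and can fall below the shortest codeword length. The codeword lengths below are made up for illustration.

```python
import math

# Free energy coding: effective length under random codeword selection.
# With selection distribution q over codewords of lengths l_i,
#   effective length = E_q[l] - H(q)   (in bits),
# minimized by the Boltzmann choice q_i = 2^{-l_i} / Z.
lengths = [2.0, 3.0, 3.0]            # made-up codeword lengths for one symbol

weights = [2.0 ** (-l) for l in lengths]
Z = sum(weights)
q = [w / Z for w in weights]         # Boltzmann (optimal) selection distribution

expected_len = sum(qi * l for qi, l in zip(q, lengths))
entropy = -sum(qi * math.log2(qi) for qi in q)
effective = expected_len - entropy
print(effective)        # 1.0: less than the shortest codeword length (2)
print(-math.log2(Z))    # 1.0: the same value in its free energy form
```

The savings come from the entropy term: the decoder learns which codeword was chosen, and the "bits-back" from that random choice are subtracted from the expected length.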

20 citations


01 Jan 1996
TL;DR: A method of recognizing handwritten digits by fitting generative models that are built from deformable B- splines with Gaussian "ink generators" spaced along the length of the spline using a novel elastic matching procedure based on the Expectation Maximization algorithm.
Abstract: We describe a method of recognizing handwritten digits by fitting generative models that are built from deformable B-splines with Gaussian "ink generators" spaced along the length of the spline. The splines are adjusted using a novel elastic matching procedure based on the Expectation Maximization (EM) algorithm that maximizes the likelihood of the model generating the data. This approach has many advantages. 1) After identifying the model most likely to have generated the data, the system not only produces a classification of the digit but also a rich description of the instantiation parameters which can yield information such as the writing style. 2) During the process of explaining the image, generative models can perform recognition driven segmentation. 3) The method involves a relatively small number of parameters and hence training is relatively easy and fast. 4) Unlike many other recognition schemes, it does not rely on some form of pre-normalization of input images, but can handle arbitrary scalings, translations and a limited degree of image rotation. We have demonstrated that our method of fitting models to images does not get trapped in poor local minima. The main disadvantage of the method is that it requires much more computation than more standard OCR techniques. Index Terms: deformable model, elastic net, optical character recognition, generative model, probabilistic model, mixture model.