Showing papers by "Michael I. Jordan published in 1996"


Journal ArticleDOI
TL;DR: This article reviews how optimal data selection techniques have been used with feedforward neural networks, and shows how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression.
Abstract: For many types of machine learning algorithms, one can compute the statistically "optimal" way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance.
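As a hedged sketch of the general idea (not the paper's exact criterion or architectures), the code below selects, from a pool of unlabeled candidates, the query whose answer is expected to most reduce the average predictive variance of a simple linear least-squares learner; the function and variable names are illustrative assumptions.

```python
import numpy as np

def select_query(X_train, candidate_pool, reference_points, sigma_noise=1.0, ridge=1e-6):
    """Pick the candidate input whose label, once observed, would most reduce
    the learner's average predictive variance over a set of reference points
    (a variance-based optimality criterion for a linear least-squares learner)."""
    d = X_train.shape[1]
    A = X_train.T @ X_train + ridge * np.eye(d)      # current information matrix
    best_x, best_score = None, np.inf
    for x in candidate_pool:
        A_new = A + np.outer(x, x)                   # information matrix after adding x
        cov = sigma_noise ** 2 * np.linalg.inv(A_new)
        # expected predictive variance, averaged over the reference points
        score = np.mean([r @ cov @ r for r in reference_points])
        if score < best_score:
            best_score, best_x = score, x
    return best_x, best_score

# usage sketch with synthetic data
rng = np.random.default_rng(0)
X_train = rng.normal(size=(10, 3))
pool = rng.normal(size=(50, 3))
refs = rng.normal(size=(200, 3))
x_star, expected_var = select_query(X_train, pool, refs)
```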

2,122 citations


Journal ArticleDOI
TL;DR: The mathematical connection between the Expectation-Maximization (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite gaussian mixtures is established, and an explicit expression is provided for the projection matrix that relates the EM step to the gradient.
Abstract: We build up the mathematical connection between the “Expectation-Maximization” (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite gaussian mixtures. We show that the EM step in parameter space is obtained from the gradient via a projection matrix P, and we provide an explicit expression for the matrix. We then analyze the convergence of EM in terms of special properties of P and provide new results analyzing the effect that P has on the likelihood surface. Based on these mathematical results, we present a comparative discussion of the advantages and disadvantages of EM and other algorithms for the learning of gaussian mixture models.
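In schematic form (with $\theta$ denoting the mixture parameters and $\ell(\theta)$ the log-likelihood; the explicit block structure of $P$ is what the paper derives), the stated connection can be written as

$$\theta^{(k+1)} \;=\; \theta^{(k)} \;+\; P\!\left(\theta^{(k)}\right)\,\left.\frac{\partial \ell}{\partial \theta}\right|_{\theta = \theta^{(k)}},$$

so each EM step moves along a rescaled gradient direction, and the special properties of $P$ analyzed in the paper govern how EM behaves on the likelihood surface.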

849 citations


Journal ArticleDOI
TL;DR: The utility of a mean field theory for sigmoid belief networks based on ideas from statistical mechanics is demonstrated on a benchmark problem in statistical pattern recognition: the classification of handwritten digits.
Abstract: We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. Our mean field theory provides a tractable approximation to the true probability distribution in these networks; it also yields a lower bound on the likelihood of evidence. We demonstrate the utility of this framework on a benchmark problem in statistical pattern recognition: the classification of handwritten digits.
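As a sketch of the kind of bound involved (notation assumed here: $E$ for the observed evidence, $H$ for the hidden units, and $Q$ a factorized mean-field distribution), Jensen's inequality gives the standard lower bound that a mean field approximation of this type optimizes:

$$\log P(E) \;\ge\; \sum_{H} Q(H)\,\log P(H, E) \;-\; \sum_{H} Q(H)\,\log Q(H), \qquad Q(H) = \prod_i q_i(h_i).$$

Maximizing the right-hand side over the factors $q_i$ tightens the bound while keeping the computation tractable.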

428 citations


Posted Content
TL;DR: This work shows how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression.
Abstract: For many types of machine learning algorithms, one can compute the statistically "optimal" way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance.

274 citations


Journal ArticleDOI
TL;DR: A simple model, in which the transformation is computed via the population activity of a set of units with large sensory receptive fields, is shown to capture the observed pattern.
Abstract: During visually guided movement, visual representations of target location must be transformed into coordinates appropriate for movement. To investigate the representation and plasticity of the visuomotor coordinate transformation, we examined the changes in pointing behavior after local visuomotor remappings. The visual feedback of finger position was limited to one or two locations in the workspace, at which a discrepancy was introduced between the actual and visually perceived finger position. These remappings induced changes in pointing, which were largest near the locus of remapping and decreased away from it. This pattern of spatial generalization highly constrains models of the computation of the visuomotor transformation in the CNS. A simple model, in which the transformation is computed via the population activity of a set of units with large sensory receptive fields, is shown to capture the observed pattern.
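A minimal sketch of the kind of model described (assumed specifics: Gaussian sensory receptive fields tiling the workspace and a linear motor readout trained by gradient descent; the paper's actual model may differ in detail):

```python
import numpy as np

class PopulationRemapModel:
    """Visuomotor map computed from the population activity of units with large
    Gaussian sensory receptive fields. Retraining at one remapped location
    generalizes to nearby targets because they share much of the same activity."""

    def __init__(self, centers, width=4.0):
        self.centers = np.asarray(centers)         # receptive-field centers in visual space
        self.width = width                         # large width -> broad spatial generalization
        self.W = np.zeros((2, len(self.centers)))  # linear readout to a 2-D motor command

    def activity(self, target):
        d2 = np.sum((self.centers - target) ** 2, axis=1)
        return np.exp(-d2 / (2 * self.width ** 2))

    def predict(self, target):
        return self.W @ self.activity(target)

    def train(self, target, desired, lr=0.05, steps=200):
        for _ in range(steps):
            a = self.activity(target)
            self.W += lr * np.outer(desired - self.W @ a, a)

# usage sketch: retrain at a single location, then probe how the change
# in pointing decays with distance from the locus of remapping
grid = [(x, y) for x in range(0, 21, 2) for y in range(0, 21, 2)]
model = PopulationRemapModel(grid)
model.train(target=np.array([10.0, 10.0]), desired=np.array([12.0, 10.0]))
shift_near = model.predict(np.array([11.0, 10.0]))
shift_far = model.predict(np.array([2.0, 2.0]))   # much smaller change far from the remapping
```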

204 citations


Proceedings Article
03 Dec 1996
TL;DR: A time series model that can be viewed as a decision tree with Markov temporal structure is studied; because exact calculations are intractable, variational approximations are used, including one in which a Viterbi-like assumption picks out a single most likely state sequence.
Abstract: We study a time series model that can be viewed as a decision tree with Markov temporal structure. The model is intractable for exact calculations, thus we utilize variational approximations. We consider three different distributions for the approximation: one in which the Markov calculations are performed exactly and the layers of the decision tree are decoupled, one in which the decision tree calculations are performed exactly and the time steps of the Markov chain are decoupled, and one in which a Viterbi-like assumption is made to pick out a single most likely state sequence. We present simulation results for artificial data and the Bach chorales.

109 citations


Journal ArticleDOI
TL;DR: A structure composed of local linear perceptrons for approximating global class discriminants is investigated; it is concluded that, even on a high-dimensional problem such as handwritten digit recognition, such local models are promising: they generalize much better than RBFs and use much less memory.
Abstract: A structure composed of local linear perceptrons for approximating global class discriminants is investigated. Such local linear models may be combined in a cooperative or competitive way. In the cooperative model, a weighted sum of the outputs of the local perceptrons is computed, where the weight is a function of the distance between the input and the position of the local perceptron. In the competitive model, the cost function dictates a mixture model in which only one of the local perceptrons gives the output. Learning of the local models' positions and of the linear mappings they implement is coupled, and both are supervised. We show that this is preferable to the uncoupled case, where the positions are trained in an unsupervised manner before the separate, supervised training of the mappings. We use goodness criteria based on the cross-entropy and give learning equations for both the cooperative and competitive cases. The coupled and uncoupled versions of the cooperative and competitive approaches are compared among themselves and with multilayer perceptrons of sigmoidal hidden units and radial basis functions (RBFs) of Gaussian units on the recognition of handwritten digits. The criteria of comparison are generalization accuracy, learning time, and the number of free parameters. We conclude that even on such a high-dimensional problem, such local models are promising. They generalize much better than RBFs and use much less memory. When compared with multilayer perceptrons, we note that local models learn much faster, generalize as well, and sometimes do better with a comparable number of parameters.
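A minimal sketch of the two combination schemes described above (the softmax-of-negative-squared-distance weighting is an illustrative assumption; the paper's exact weighting and cost functions may differ):

```python
import numpy as np

def cooperative_output(x, positions, weights, biases, beta=1.0):
    """Weighted sum of local linear perceptron outputs, with weights that decay
    with the distance between the input and each perceptron's position."""
    d2 = np.sum((positions - x) ** 2, axis=1)   # squared distance to each local unit
    g = np.exp(-beta * d2)
    g = g / g.sum()                             # normalized distance-based weights
    local = weights @ x + biases                # one scalar output per local linear unit
    return g @ local                            # cooperative (soft) combination

def competitive_output(x, positions, weights, biases):
    """Competitive variant (hard selection shown for simplicity): only the
    local perceptron closest to the input produces the output."""
    k = np.argmin(np.sum((positions - x) ** 2, axis=1))
    return weights[k] @ x + biases[k]

# usage sketch: three local units in a 2-D input space
positions = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
biases = np.array([0.0, 0.5, -0.5])
x = np.array([0.9, 0.8])
y_coop = cooperative_output(x, positions, weights, biases)
y_comp = competitive_output(x, positions, weights, biases)
```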

82 citations


Proceedings Article
01 Aug 1996
TL;DR: In this paper, the authors present deterministic techniques for computing upper and lower bounds on marginal probabilities in sigmoid and noisy-OR networks and illustrate the tightness of the bounds by numerical experiments.
Abstract: We present deterministic techniques for computing upper and lower bounds on marginal probabilities in sigmoid and noisy-OR networks. These techniques become useful when the size of the network (or clique size) precludes exact computations. We illustrate the tightness of the bounds by numerical experiments.
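For reference, the two conditional-probability models these networks use take the standard forms (with $x_1, \dots, x_n$ the binary parent states, $q_i$ the noisy-OR activation probabilities, and $\theta_i$ the sigmoid weights; a leak term is often added to the noisy-OR):

$$P_{\text{noisy-OR}}(X = 1 \mid x_1, \dots, x_n) \;=\; 1 - \prod_{i=1}^{n} (1 - q_i)^{x_i}, \qquad P_{\text{sigmoid}}(X = 1 \mid x_1, \dots, x_n) \;=\; \sigma\!\Big(\sum_{i=1}^{n} \theta_i x_i + \theta_0\Big),$$

where $\sigma(z) = 1/(1 + e^{-z})$. The bounds in the paper apply to marginal probabilities in networks built from such units.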

74 citations


Book ChapterDOI
01 Jan 1996
TL;DR: This chapter reviews various computational issues that arise in the study of motor control and motor learning, and develops some of the basic ideas in the control of dynamical systems, distinguishing between feedback control and feedforward control.
Abstract: This chapter reviews various computational issues that arise in the study of motor control and motor learning. It describes feedback control, feedforward control, the problem of delay, observers, learning algorithms, motor learning, and reference models. It focuses on basic theoretical issues with broad applicability. The chapter develops some of the basic ideas in the control of dynamical systems, distinguishing between feedback control and feedforward control. In general, controlling a system involves finding an input to the system that will cause a desired behavior at its output. Feedback control and feedforward control can both be understood as techniques for inverting a dynamical system. The chapter discusses some mathematical representations for dynamical systems.
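A minimal sketch of the distinction, using an assumed one-dimensional point-mass plant (mass m, viscous damping b): feedback control computes the input from the observed tracking error, while feedforward control inverts a model of the plant to compute the input directly from the desired trajectory.

```python
import numpy as np

m, b, dt = 1.0, 0.5, 0.01           # assumed plant: m * dv/dt = u - b * v

def plant_step(x, v, u):
    """Integrate the point-mass plant one time step under input u."""
    a = (u - b * v) / m
    return x + dt * v, v + dt * a

def feedback_control(x, v, x_des, v_des, kp=50.0, kd=10.0):
    """PD feedback: correct based on the observed position and velocity errors."""
    return kp * (x_des - x) + kd * (v_des - v)

def feedforward_control(v_des, a_des):
    """Feedforward: invert the plant model to get the input that would produce
    the desired acceleration along the reference trajectory."""
    return m * a_des + b * v_des

# usage sketch: track x_des(t) = sin(t) with pure feedback
x, v = 0.0, 0.0
for k in range(1000):
    t = k * dt
    u = feedback_control(x, v, np.sin(t), np.cos(t))
    x, v = plant_step(x, v, u)
```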

51 citations


Proceedings Article
03 Dec 1996
TL;DR: A recursive node-elimination formalism for efficiently approximating large probabilistic networks is developed, and it is shown that Boltzmann machines, sigmoid belief networks, or any combination (i.e., chain graphs) can be handled within the same framework.
Abstract: We develop a recursive node-elimination formalism for efficiently approximating large probabilistic networks. No constraints are set on the network topologies. Yet the formalism can be straightforwardly integrated with exact methods whenever they are/become applicable. The approximations we use are controlled: they maintain consistently upper and lower bounds on the desired quantities at all times. We show that Boltzmann machines, sigmoid belief networks, or any combination (i.e., chain graphs) can be handled within the same framework. The accuracy of the methods is verified experimentally.

40 citations


Posted Content
TL;DR: In this article, a mean field theory for sigmoid belief networks based on ideas from statistical mechanics was developed, which provides a tractable approximation to the true probability distribution in these networks; it also yields a lower bound on the likelihood of evidence.
Abstract: We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. Our mean field theory provides a tractable approximation to the true probability distribution in these networks; it also yields a lower bound on the likelihood of evidence. We demonstrate the utility of this framework on a benchmark problem in statistical pattern recognition: the classification of handwritten digits.

Journal ArticleDOI
TL;DR: An overview of current research on artificial neural networks is presented, emphasizing a statistical perspective that views neural networks as parameterized graphs making probabilistic assumptions about data, and learning algorithms as methods for finding parameter values that look probable in the light of the data.
Abstract: We present an overview of current research on artificial neural networks, emphasizing a statistical perspective. We view neural networks as parameterized graphs that make probabilistic assumptions about data, and view learning algorithms as methods for finding parameter values that look probable in the light of the data. We discuss basic issues in representation and learning, and treat some of the practical issues that arise in fitting networks to data. We also discuss links between neural networks and the general formalism of graphical models.
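As a small illustration of that statistical reading (a sketch, not the article's own example), fitting a single sigmoid output unit by gradient ascent on the Bernoulli log-likelihood of the data:

```python
import numpy as np

def fit_sigmoid_unit(X, y, lr=0.1, steps=2000):
    """Fit a one-unit 'network' P(y=1|x) = sigmoid(w.x) by maximizing the
    Bernoulli log-likelihood, i.e., finding parameter values that make the
    observed data look probable."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)    # gradient of the average log-likelihood
    return w

# usage sketch with synthetic labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) + 0.3 * rng.normal(size=200) > 0).astype(float)
w_hat = fit_sigmoid_unit(X, y)
```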

Proceedings Article
03 Dec 1996
TL;DR: From this path functional, the Euler-Lagrange equations for extremal motion are derived and it is shown that this interpolation can be done efficiently, in high dimensions, for Gaussian, Dirichlet, and mixture models.
Abstract: Given a multidimensional data set and a model of its density, we consider how to define the optimal interpolation between two points. This is done by assigning a cost to each path through space, based on two competing goals-one to interpolate through regions of high density, the other to minimize arc length. From this path functional, we derive the Euler-Lagrange equations for extremal motion; given two points, the desired interpolation is found by solving a boundary value problem. We show that this interpolation can be done efficiently, in high dimensions, for Gaussian, Dirichlet, and mixture models.
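One illustrative form of such a path cost (an assumption for concreteness; the paper's exact functional is not reproduced here) weights arc length by a penalty on low model density $p$, with $\lambda$ trading off the two goals:

$$E[\gamma] \;=\; \int_0^1 p\big(\gamma(t)\big)^{-\lambda}\, \big\|\dot\gamma(t)\big\|\; dt .$$

Paths through high-density regions are cheap under such a metric, while long detours raise the cost; the extremal path between two fixed endpoints then follows from the corresponding Euler-Lagrange equations as a boundary value problem.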

Proceedings Article
03 Dec 1996
TL;DR: Two ways of embedding the triangulation problem into a continuous domain are presented and shown to perform well compared to the best known heuristic.
Abstract: When triangulating a belief network we aim to obtain a junction tree of minimum state space. According to (Rose, 1970), searching for the optimal triangulation can be cast as a search over all the permutations of the graph's vertices. Our approach is to embed the discrete set of permutations in a convex continuous domain D. By suitably extending the cost function over D and solving the continuous nonlinear optimization task, we hope to obtain a good triangulation with respect to the aforementioned cost. This paper presents two ways of embedding the triangulation problem into a continuous domain and shows that they perform well compared to the best known heuristic.
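A minimal sketch of the discrete objective being relaxed (assumed representation: the moralized graph as adjacency sets plus per-vertex state-space sizes; the cost of an elimination ordering is taken here as the total state space of the cliques it induces, a standard proxy for junction tree size):

```python
from itertools import permutations

def triangulation_cost(adj, card, order):
    """Total state space of the cliques induced by eliminating vertices in
    `order`; triangulating for minimum state space means minimizing this."""
    adj = {v: set(nb) for v, nb in adj.items()}      # work on a copy
    cost = 0
    for v in order:
        clique = adj[v] | {v}
        size = 1
        for u in clique:
            size *= card[u]                          # state space of the induced clique
        cost += size
        for u in adj[v]:                             # add fill-in edges among v's neighbors
            adj[u] |= adj[v] - {u}
            adj[u].discard(v)
        del adj[v]                                   # eliminate v
    return cost

# exhaustive search over vertex permutations is only feasible for tiny graphs,
# which is what motivates heuristics and continuous relaxations of the search
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
card = {1: 2, 2: 3, 3: 2, 4: 3}
best_order = min(permutations(adj), key=lambda o: triangulation_cost(adj, card, o))
```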