
Showing papers by "Michael I. Jordan published in 1994"


Journal ArticleDOI
TL;DR: An Expectation-Maximization (EM) algorithm is presented for adjusting the parameters of a tree-structured architecture for supervised learning, along with an on-line learning algorithm in which the parameters are updated incrementally.
Abstract: We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
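As an illustration of the EM procedure the abstract describes, here is a minimal sketch of EM for a single-level mixture of two linear experts with a softmax gating network. It is a toy reconstruction, not the paper's hierarchical implementation: the piecewise-linear data, the fixed noise variance `sigma2`, the crude split-based initialization, and the gradient-based gating M-step are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy piecewise-linear data: a different linear regime on each side of x = 0.
x = rng.uniform(-1, 1, 200)
y = np.where(x < 0, 2 * x + 1, -x) + 0.05 * rng.normal(size=200)
X = np.column_stack([x, np.ones_like(x)])        # inputs with a bias column

K = 2                                            # number of experts
W = np.zeros((K, 2))                             # expert (linear model) weights
V = np.zeros((K, 2))                             # gating (softmax) weights
sigma2 = 0.1                                     # assumed fixed noise variance

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Crude initialization of the responsibilities: a hard split at x = 0.
h = np.column_stack([x < 0, x >= 0]).astype(float)

for _ in range(50):
    # M-step (experts): weighted least squares under the responsibilities.
    for k in range(K):
        Xw = X * h[:, k:k + 1]
        W[k] = np.linalg.solve(Xw.T @ X + 1e-6 * np.eye(2), Xw.T @ y)
    # M-step (gating): a few gradient steps fitting g toward h.
    for _ in range(10):
        g = softmax(X @ V.T)
        V += 0.5 * (h - g).T @ X / len(x)
    # E-step: posterior responsibility of each expert for each data point.
    g = softmax(X @ V.T)
    lik = np.exp(-(y[:, None] - X @ W.T) ** 2 / (2 * sigma2))
    h = g * lik
    h = h / h.sum(axis=1, keepdims=True)

pred = (softmax(X @ V.T) * (X @ W.T)).sum(axis=1)
mse = float(np.mean((pred - y) ** 2))
```

After a few iterations each expert specializes on one linear regime and the gating network learns a soft split near x = 0, so the blended prediction tracks the data closely.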

2,418 citations


Book ChapterDOI
01 Jan 1994
TL;DR: A new framework for learning without state-estimation in POMDPs is developed by including stochastic policies in the search space, and by defining the value or utility of a distribution over states.
Abstract: Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately, all of the theory and much of the practice (see Barto et al., 1983, for an exception) of RL is limited to Markovian decision processes (MDPs). Many real-world decision tasks, however, are inherently non-Markovian, i.e., the state of the environment is only incompletely known to the learning agent. In this paper we consider only partially observable MDPs (POMDPs), a useful class of non-Markovian decision processes. Most previous approaches to such problems have combined computationally expensive state-estimation techniques with learning control. This paper investigates learning in POMDPs without resorting to any form of state estimation. We present results about what TD(0) and Q-learning will do when applied to POMDPs. It is shown that the conventional discounted RL framework is inadequate to deal with POMDPs. Finally we develop a new framework for learning without state-estimation in POMDPs by including stochastic policies in the search space, and by defining the value or utility of a distribution over states.

406 citations


Proceedings Article
01 Jan 1994
TL;DR: This work proposes and analyzes a new learning algorithm to solve a certain class of non-Markov decision problems; the algorithm operates in the space of stochastic policies, a space that can yield a policy performing considerably better than any deterministic policy.
Abstract: Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due to successes in the theoretical analysis of their behavior in Markov environments. If the Markov assumption is removed, however, neither the algorithms nor the analyses generally remain usable. We propose and analyze a new learning algorithm to solve a certain class of non-Markov decision problems. Our algorithm applies to problems in which the environment is Markov, but the learner has restricted access to state information. The algorithm involves a Monte-Carlo policy evaluation combined with a policy improvement method that is similar to that of Markov decision problems and is guaranteed to converge to a local maximum. The algorithm operates in the space of stochastic policies, a space which can yield a policy that performs considerably better than any deterministic policy. Although the space of stochastic policies is continuous--even for a discrete action space--our algorithm is computationally tractable.
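The paper's key observation, that a memoryless stochastic policy can strictly dominate every deterministic one under state aliasing, can be seen in a tiny hypothetical POMDP: two aliased states in which opposite actions are correct. The environment, its rewards, and the grid search below are illustrative assumptions, not the paper's algorithm; only the Monte-Carlo policy-evaluation step echoes its structure.

```python
import numpy as np

gamma = 0.9
rng = np.random.default_rng(1)

# Two aliased states A, B (identical observation). Action 0 exits A, action 1
# exits B; the wrong action earns -1 and leaves the state unchanged.

def value(p):
    # Closed-form discounted value of the memoryless stochastic policy
    # "choose action 0 with probability p", averaged over a uniform start.
    vA = -(1 - p) / (1 - gamma * (1 - p))
    vB = -p / (1 - gamma * p)
    return 0.5 * (vA + vB)

def mc_value(p, episodes=5000, horizon=200):
    # Monte-Carlo policy evaluation of the same policy by rollout.
    total = 0.0
    for _ in range(episodes):
        state = rng.integers(2)              # 0 = A, 1 = B
        g, disc = 0.0, 1.0
        for _ in range(horizon):
            action = 0 if rng.random() < p else 1
            if action == state:              # correct action: exit, reward 0
                break
            g += disc * (-1.0)               # wrong action: penalty, stay put
            disc *= gamma
        total += g
    return total / episodes

# The best memoryless policy is strictly stochastic: a grid search over p
# peaks in the interior, far above either deterministic endpoint.
grid = np.linspace(0.0, 1.0, 21)
best_p = grid[np.argmax([value(p) for p in grid])]
```

Both deterministic policies (p = 0 or p = 1) get trapped in one of the aliased states and score about -5, while the 50/50 stochastic policy escapes quickly and scores about -0.91.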

404 citations


Proceedings Article
01 Jan 1994
TL;DR: This paper presents a function approximator based on a simple extension to state aggregation (a commonly used form of compact representation), namely soft state aggregation, a theory of convergence for RL with arbitrary, but fixed, soft state aggregation, and a novel intuitive understanding of the effect of state aggregation on online RL.
Abstract: It is widely accepted that the use of more compact representations than lookup tables is crucial to scaling reinforcement learning (RL) algorithms to real-world problems. Unfortunately almost all of the theory of reinforcement learning assumes lookup table representations. In this paper we address the pressing issue of combining function approximation and RL, and present 1) a function approximator based on a simple extension to state aggregation (a commonly used form of compact representation), namely soft state aggregation, 2) a theory of convergence for RL with arbitrary, but fixed, soft state aggregation, 3) a novel intuitive understanding of the effect of state aggregation on online RL, and 4) a new heuristic adaptive state aggregation algorithm that finds improved compact representations by exploiting the non-discrete nature of soft state aggregation. Preliminary empirical results are also presented.
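A minimal sketch of Q-learning with a fixed soft state aggregation, on an assumed 5-state chain with two hand-placed Gaussian-style clusters; the environment, the cluster design, and the step size are illustrative choices, not taken from the paper. Each state maps to clusters with probabilities P(x|s), a cluster-level table Q(x, a) is learned, and state values are reconstructed as Q(s, a) = Σ_x P(x|s) Q(x, a).

```python
import numpy as np

rng = np.random.default_rng(0)

# 5-state chain; entering state 4 ends the episode with reward 1.
centers = np.array([1.0, 3.0])               # two soft clusters over the chain

def membership(s):
    # Soft aggregation probabilities P(x|s): a fixed Gaussian-style design.
    w = np.exp(-(s - centers) ** 2)
    return w / w.sum()

Q = np.zeros((2, 2))                         # cluster-level table Q(x, a)

def q_hat(s):
    # Reconstructed state-level values: Q(s, a) = sum_x P(x|s) Q(x, a).
    return membership(s) @ Q

alpha, gamma = 0.05, 0.9
for _ in range(5000):
    s = 0
    for _ in range(50):
        a = int(rng.integers(2))             # uniform random behavior policy
        s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        r, done = (1.0, True) if s2 == 4 else (0.0, False)
        target = r if done else r + gamma * q_hat(s2).max()
        x = rng.choice(2, p=membership(s))   # sample one cluster for state s
        Q[x, a] += alpha * (target - q_hat(s)[a])
        if done:
            break
        s = s2
```

With only two clusters for five states the learned values are compressed, but the greedy policy recovered from Q(s, a) still moves right toward the goal in every state.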

343 citations


Proceedings Article
01 Jan 1994
TL;DR: An alternative model for mixtures of experts that uses a different parametric form for the gating network, is trained by the EM algorithm, and yields faster convergence.
Abstract: We propose an alternative model for mixtures of experts which uses a different parametric form for the gating network. The modified model is trained by the EM algorithm. In comparison with earlier models--trained by either EM or gradient ascent--there is no need to select a learning stepsize. We report simulation experiments which show that the new architecture yields faster convergence. We also apply the new model to two problem domains: piecewise nonlinear function approximation and the combination of multiple previously trained classifiers.

258 citations


ReportDOI
01 Dec 1994
TL;DR: A set of algorithms is described that handles clustering, classification, and function approximation from incomplete data in a principled and efficient manner, making two distinct appeals to the Expectation-Maximization principle.
Abstract: Real-world learning tasks often involve high-dimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectives---the likelihood-based and the Bayesian. The goal is two-fold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihood-based framework, that handle clustering, classification, and function approximation from incomplete data in a principled and efficient manner.
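One of the likelihood-based algorithms the abstract alludes to can be sketched as EM for a single multivariate Gaussian with missing features: the E-step fills each missing entry with its conditional expectation (plus a conditional-covariance correction to the second moments), and the M-step re-estimates the mean and covariance. The 2-D data, masking rate, and iteration count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth 2-D Gaussian; randomly mask individual entries.
true_mu = np.array([1.0, -1.0])
true_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X = rng.multivariate_normal(true_mu, true_cov, size=500)
mask = rng.random(X.shape) < 0.15            # True = missing
mask &= ~(mask.all(axis=1))[:, None]         # keep at least one observed entry
Xobs = np.where(mask, np.nan, X)

mu = np.zeros(2)
cov = np.eye(2)
for _ in range(50):
    # E-step: expected completed data and second moments under (mu, cov).
    filled = np.zeros_like(Xobs)
    second = np.zeros((2, 2))
    for i, row in enumerate(Xobs):
        m = np.isnan(row)
        o = ~m
        x = row.copy()
        C = np.zeros((2, 2))                 # conditional covariance of x_m
        if m.any():
            So_inv = np.linalg.inv(cov[np.ix_(o, o)])
            # E[x_m | x_o] = mu_m + S_mo S_oo^{-1} (x_o - mu_o)
            x[m] = mu[m] + cov[np.ix_(m, o)] @ So_inv @ (row[o] - mu[o])
            C[np.ix_(m, m)] = (cov[np.ix_(m, m)]
                               - cov[np.ix_(m, o)] @ So_inv @ cov[np.ix_(o, m)])
        filled[i] = x
        second += np.outer(x, x) + C
    # M-step: re-estimate mean and covariance from the expected statistics.
    mu = filled.mean(axis=0)
    cov = second / len(Xobs) - np.outer(mu, mu)
```

Because the two coordinates are correlated, the E-step exploits the observed coordinate to impute the missing one, and the recovered mean and covariance land close to the generating parameters.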

243 citations


Journal ArticleDOI
TL;DR: It is concluded that perceptual distortion of curvature contributes to the curvature seen in human point-to-point arm movements and that this must be taken into account in the assessment of models of trajectory formation.
Abstract: Unconstrained point-to-point human arm movements are generally gently curved, a fact which has been used to assess the validity of models of trajectory formation. In this study we examined the relationship between curvature perception and movement curvature for planar sagittal and transverse arm movements. We found a significant correlation (P<0.0001, n=16) between the curvature perceived as straight and the curvature of actual arm movements. We suggest that subjects try to make straight-line movements, but that actual movements are curved because visual perceptual distortion makes the movements appear to be straighter than they really are. We conclude that perceptual distortion of curvature contributes to the curvature seen in human point-to-point arm movements and that this must be taken into account in the assessment of models of trajectory formation.

139 citations


Proceedings Article
01 Jan 1994
TL;DR: A statistical mechanical framework for the modeling of discrete time series is proposed, in which maximum likelihood estimation is done via Boltzmann learning in one-dimensional networks with tied weights; the framework also motivates new architectures that address particular shortcomings of HMMs.
Abstract: We propose a statistical mechanical framework for the modeling of discrete time series. Maximum likelihood estimation is done via Boltzmann learning in one-dimensional networks with tied weights. We call these networks Boltzmann chains and show that they contain hidden Markov models (HMMs) as a special case. Our framework also motivates new architectures that address particular shortcomings of HMMs. We look at two such architectures: parallel chains that model feature sets with disparate time scales, and looped networks that model long-term dependencies between hidden states. For these networks, we show how to implement the Boltzmann learning rule exactly, in polynomial time, without resort to simulated or mean-field annealing. The necessary computations are done by exact decimation procedures from statistical mechanics.
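The exact polynomial-time computation the abstract mentions can be illustrated by summing out the hidden units of a small chain one at a time (decimation, equivalent to transfer matrices or the HMM forward pass) and checking the result against brute-force enumeration over all hidden paths. The weights below are random placeholders, not a trained model.

```python
from itertools import product

import numpy as np

rng = np.random.default_rng(0)

# A chain of T hidden units, each with K values, pairwise weights A between
# neighbors, and per-step biases b (set, in a Boltzmann chain, by the
# observations). The log partition function gives the data likelihood.
K, T = 3, 5
A = rng.normal(size=(K, K))                  # hidden-hidden coupling weights
b = rng.normal(size=(T, K))                  # observation-dependent biases

def log_Z_transfer():
    # Eliminate hidden units left to right (decimation / transfer matrices):
    # cost is O(T * K^2) instead of O(K^T).
    msg = np.exp(b[0])                       # unnormalized belief over unit 0
    for t in range(1, T):
        msg = (msg @ np.exp(A)) * np.exp(b[t])
    return np.log(msg.sum())

def log_Z_brute():
    # Exponential-time check: enumerate all K**T hidden configurations.
    total = 0.0
    for path in product(range(K), repeat=T):
        e = sum(b[t, path[t]] for t in range(T))
        e += sum(A[path[t], path[t + 1]] for t in range(T - 1))
        total += np.exp(e)
    return np.log(total)
```

The two computations agree to machine precision, which is the sense in which Boltzmann learning in these chains needs no simulated or mean-field annealing.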

81 citations


Journal ArticleDOI
TL;DR: A neural network architecture was designed that learned to produce neural commands to a set of muscle-like actuators based only on information about spatial errors to generate point-to-point horizontal arm movements and the resulting muscle activation patterns and hand trajectories were found to be similar to those observed experimentally for human subjects.
Abstract: Unconstrained point-to-point reaching movements performed in the horizontal plane tend to follow roughly straight hand paths with smooth, bell-shaped velocity profiles. The objective of the research reported here was to explore the hypothesis that these data reflect an underlying learning process that prefers simple paths in space. Under this hypothesis, movements are learned based only on spatial errors between the actual hand path and a desired hand path; temporally varying targets are not allowed. We designed a neural network architecture that learned to produce neural commands to a set of muscle-like actuators based only on information about spatial errors. Following repetitive executions of the reaching task, the network was able to generate point-to-point horizontal arm movements and the resulting muscle activation patterns and hand trajectories were found to be similar to those observed experimentally for human subjects. The implications of our results with respect to current theories of multijoint limb movement generation are discussed.

57 citations


ReportDOI
01 Jan 1994
TL;DR: In this paper, the same principles are used to select data for two alternative, statistically-based learning architectures---mixtures of Gaussians and locally weighted regression---for which the resulting techniques are both efficient and accurate.
Abstract: For many types of learners one can compute the statistically "optimal" way to select data. We review how these techniques have been used with feedforward neural networks [MacKay, 1992; Cohn, 1994]. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate.
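A heavily simplified sketch of variance-driven data selection for locally weighted regression: score each candidate query by a predictive-variance proxy (noise variance over the local effective sample size) and query where that score is largest. The target function, the bandwidth, and the max-variance criterion itself are illustrative assumptions; the paper's criterion minimizes the learner's expected integrated variance rather than simply querying at the current maximum.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(2 * np.pi * x)

# Initial training data covers only the left part of the input range.
train_x = rng.uniform(0.0, 0.4, size=10)
train_y = f(train_x) + 0.05 * rng.normal(size=10)

h = 0.05                                     # assumed fixed kernel bandwidth
noise_var = 0.05 ** 2

def predictive_variance(x):
    # Crude LWR variance proxy: noise variance divided by the effective
    # number of nearby training points (sum of Gaussian kernel weights).
    w = np.exp(-0.5 * ((x - train_x) / h) ** 2)
    return noise_var / (w.sum() + 1e-12)

# Actively pick 15 queries from a candidate pool, labeling each as we go.
pool = list(np.linspace(0.0, 1.0, 101))
queries = []
for _ in range(15):
    variances = [predictive_variance(x) for x in pool]
    x_new = pool.pop(int(np.argmax(variances)))
    queries.append(x_new)
    train_x = np.append(train_x, x_new)
    train_y = np.append(train_y, f(x_new) + 0.05 * rng.normal())
```

Because the variance proxy explodes wherever the kernel sees no data, the selected queries concentrate in the initially unsampled right half of the input range rather than duplicating the existing data.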

54 citations


Proceedings ArticleDOI
16 Jul 1994
TL;DR: A statistical approach to decision tree modeling is described, in which each decision in the tree is modeled parametrically as is the process by which an output is generated from an input and a sequence of decisions, yielding a likelihood measure of goodness of fit.
Abstract: A statistical approach to decision tree modeling is described. In this approach, each decision in the tree is modeled parametrically as is the process by which an output is generated from an input and a sequence of decisions. The resulting model yields a likelihood measure of goodness of fit, allowing ML and MAP estimation techniques to be utilized. An efficient algorithm is presented to estimate the parameters in the tree. The model selection problem is presented and several alternative proposals are considered. A hidden Markov version of the tree is described for data sequences that have temporal dependencies.

Journal ArticleDOI
TL;DR: A large family of Boltzmann machines that can be trained by standard gradient descent, which can have one or more layers of hidden units, with tree-like connectivity, are introduced.
Abstract: We introduce a large family of Boltzmann machines that can be trained by standard gradient descent. The networks can have one or more layers of hidden units, with tree-like connectivity. We show how to implement the supervised learning algorithm for these Boltzmann machines exactly, without resort to simulated or mean-field annealing. The stochastic averages that yield the gradients in weight space are computed by the technique of decimation. We present results on the problems of N-bit parity and the detection of hidden symmetries.

Proceedings Article
01 Jan 1994
TL;DR: Experimental results and simulations based on a novel approach that investigates the temporal propagation of errors in the sensorimotor integration process provide direct support for the existence of an internal model in the central nervous system.
Abstract: Based on computational principles, with as yet no direct experimental validation, it has been proposed that the central nervous system (CNS) uses an internal model to simulate the dynamic behavior of the motor system in planning, control and learning (Sutton and Barto, 1981; Ito, 1984; Kawato et al., 1987; Jordan and Rumelhart, 1992; Miall et al., 1993). We present experimental results and simulations based on a novel approach that investigates the temporal propagation of errors in the sensorimotor integration process. Our results provide direct support for the existence of an internal model.

Proceedings Article
01 Jan 1994
TL;DR: This work has studied the generalization of the visuomotor map subsequent to both local and context-dependent remappings, indicating that a single point in visual space can be mapped to two different finger locations depending on a context variable--the starting point of the movement.
Abstract: One of the fundamental properties that both neural networks and the central nervous system share is the ability to learn and generalize from examples. While this property has been studied extensively in the neural network literature, it has not been thoroughly explored in human perceptual and motor learning. We have chosen a coordinate transformation system--the visuomotor map, which transforms visual coordinates into motor coordinates--to study the generalization effects of learning new input-output pairs. Using a paradigm of computer-controlled altered visual feedback, we have studied the generalization of the visuomotor map subsequent to both local and context-dependent remappings. A local remapping of one or two input-output pairs induced a significant global, yet decaying, change in the visuomotor map, suggesting a representation for the map composed of units with large functional receptive fields. Our study of context-dependent remappings indicated that a single point in visual space can be mapped to two different finger locations depending on a context variable--the starting point of the movement. Furthermore, as the context is varied there is a gradual shift between the two remappings, consistent with two visuomotor modules being learned and gated smoothly with the context.

Book ChapterDOI
10 Jul 1994
TL;DR: A statistical approach to decision tree modeling is described, in which each decision in the tree is modeled parametrically as is the process by which an output is generated from an input and a sequence of decisions, yielding a likelihood measure of goodness of fit.
Abstract: A statistical approach to decision tree modeling is described. In this approach, each decision in the tree is modeled parametrically as is the process by which an output is generated from an input and a sequence of decisions. The resulting model yields a likelihood measure of goodness of fit, allowing ML and MAP estimation techniques to be utilized. An efficient algorithm is presented to estimate the parameters in the tree. The model selection problem is presented and several alternative proposals are considered. A hidden Markov version of the tree is described for data sequences that have temporal dependencies.