
Showing papers by "Michael I. Jordan published in 1994"


Journal ArticleDOI
TL;DR: An Expectation-Maximization (EM) algorithm is presented for adjusting the parameters of a tree-structured architecture for supervised learning, along with an on-line learning algorithm in which the parameters are updated incrementally.
Abstract: We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
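As an illustration of the EM procedure the abstract describes, here is a minimal sketch of EM for a single-level mixture of two linear experts with a softmax gating network. It is a toy reconstruction, not the paper's hierarchical implementation: the piecewise-linear data, the fixed noise variance `sigma2`, the crude split-based initialization, and the gradient-based gating M-step are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy piecewise-linear data: a different linear regime on each side of x = 0.
x = rng.uniform(-1, 1, 200)
y = np.where(x < 0, 2 * x + 1, -x) + 0.05 * rng.normal(size=200)
X = np.column_stack([x, np.ones_like(x)])        # inputs with a bias column

K = 2                                            # number of experts
W = np.zeros((K, 2))                             # expert (linear model) weights
V = np.zeros((K, 2))                             # gating (softmax) weights
sigma2 = 0.1                                     # assumed fixed noise variance

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Crude initialization of the responsibilities: a hard split at x = 0.
h = np.column_stack([x < 0, x >= 0]).astype(float)

for _ in range(50):
    # M-step (experts): weighted least squares under the responsibilities.
    for k in range(K):
        Xw = X * h[:, k:k + 1]
        W[k] = np.linalg.solve(Xw.T @ X + 1e-6 * np.eye(2), Xw.T @ y)
    # M-step (gating): a few gradient steps fitting g toward h.
    for _ in range(10):
        g = softmax(X @ V.T)
        V += 0.5 * (h - g).T @ X / len(x)
    # E-step: posterior responsibility of each expert for each data point.
    g = softmax(X @ V.T)
    lik = np.exp(-(y[:, None] - X @ W.T) ** 2 / (2 * sigma2))
    h = g * lik
    h = h / h.sum(axis=1, keepdims=True)

pred = (softmax(X @ V.T) * (X @ W.T)).sum(axis=1)
mse = float(np.mean((pred - y) ** 2))
```

After a few iterations each expert specializes on one linear regime and the gating network learns a soft split near x = 0, so the blended prediction tracks the data closely.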

2,418 citations


Book ChapterDOI
01 Jan 1994
TL;DR: A new framework for learning without state-estimation in POMDPs is developed by including stochastic policies in the search space, and by defining the value or utility of a distribution over states.
Abstract: Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately, all of the theory and much of the practice (see Barto et al., 1983, for an exception) of RL is limited to Markovian decision processes (MDPs). Many real-world decision tasks, however, are inherently non-Markovian, i.e., the state of the environment is only incompletely known to the learning agent. In this paper we consider only partially observable MDPs (POMDPs), a useful class of non-Markovian decision processes. Most previous approaches to such problems have combined computationally expensive state-estimation techniques with learning control. This paper investigates learning in POMDPs without resorting to any form of state estimation. We present results about what TD(0) and Q-learning will do when applied to POMDPs. It is shown that the conventional discounted RL framework is inadequate to deal with POMDPs. Finally we develop a new framework for learning without state-estimation in POMDPs by including stochastic policies in the search space, and by defining the value or utility of a distribution over states.

406 citations


Proceedings Article
01 Jan 1994
TL;DR: This work proposes and analyzes a new learning algorithm to solve a certain class of non-Markov decision problems; the algorithm operates in the space of stochastic policies, a space that can yield a policy performing considerably better than any deterministic policy.
Abstract: Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due to successes in the theoretical analysis of their behavior in Markov environments. If the Markov assumption is removed, however, neither the algorithms nor the analyses generally remain usable. We propose and analyze a new learning algorithm to solve a certain class of non-Markov decision problems. Our algorithm applies to problems in which the environment is Markov, but the learner has restricted access to state information. The algorithm involves a Monte-Carlo policy evaluation combined with a policy improvement method that is similar to that of Markov decision problems and is guaranteed to converge to a local maximum. The algorithm operates in the space of stochastic policies, a space which can yield a policy that performs considerably better than any deterministic policy. Although the space of stochastic policies is continuous--even for a discrete action space--our algorithm is computationally tractable.
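The paper's key observation, that a memoryless stochastic policy can strictly dominate every deterministic one under state aliasing, can be seen in a tiny hypothetical POMDP: two aliased states in which opposite actions are correct. The environment, its rewards, and the grid search below are illustrative assumptions, not the paper's algorithm; only the Monte-Carlo policy-evaluation step echoes its structure.

```python
import numpy as np

gamma = 0.9
rng = np.random.default_rng(1)

# Two aliased states A, B (identical observation). Action 0 exits A, action 1
# exits B; the wrong action earns -1 and leaves the state unchanged.

def value(p):
    # Closed-form discounted value of the memoryless stochastic policy
    # "choose action 0 with probability p", averaged over a uniform start.
    vA = -(1 - p) / (1 - gamma * (1 - p))
    vB = -p / (1 - gamma * p)
    return 0.5 * (vA + vB)

def mc_value(p, episodes=5000, horizon=200):
    # Monte-Carlo policy evaluation of the same policy by rollout.
    total = 0.0
    for _ in range(episodes):
        state = rng.integers(2)              # 0 = A, 1 = B
        g, disc = 0.0, 1.0
        for _ in range(horizon):
            action = 0 if rng.random() < p else 1
            if action == state:              # correct action: exit, reward 0
                break
            g += disc * (-1.0)               # wrong action: penalty, stay put
            disc *= gamma
        total += g
    return total / episodes

# The best memoryless policy is strictly stochastic: a grid search over p
# peaks in the interior, far above either deterministic endpoint.
grid = np.linspace(0.0, 1.0, 21)
best_p = grid[np.argmax([value(p) for p in grid])]
```

Both deterministic policies (p = 0 or p = 1) get trapped in one of the aliased states and score about -5, while the 50/50 stochastic policy escapes quickly and scores about -0.91.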

404 citations


Proceedings Article
01 Jan 1994
TL;DR: This paper presents a function approximator based on a simple extension to state aggregation (a commonly used form of compact representation), namely soft state aggregation, a theory of convergence for RL with arbitrary, but fixed, soft state aggregation, and a novel intuitive understanding of the effect of state aggregation on online RL.
Abstract: It is widely accepted that the use of more compact representations than lookup tables is crucial to scaling reinforcement learning (RL) algorithms to real-world problems. Unfortunately almost all of the theory of reinforcement learning assumes lookup table representations. In this paper we address the pressing issue of combining function approximation and RL, and present 1) a function approximator based on a simple extension to state aggregation (a commonly used form of compact representation), namely soft state aggregation, 2) a theory of convergence for RL with arbitrary, but fixed, soft state aggregation, 3) a novel intuitive understanding of the effect of state aggregation on online RL, and 4) a new heuristic adaptive state aggregation algorithm that finds improved compact representations by exploiting the non-discrete nature of soft state aggregation. Preliminary empirical results are also presented.
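A minimal sketch of Q-learning with a fixed soft state aggregation, on an assumed 5-state chain with two hand-placed Gaussian-style clusters; the environment, the cluster design, and the step size are illustrative choices, not taken from the paper. Each state maps to clusters with probabilities P(x|s), a cluster-level table Q(x, a) is learned, and state values are reconstructed as Q(s, a) = Σ_x P(x|s) Q(x, a).

```python
import numpy as np

rng = np.random.default_rng(0)

# 5-state chain; entering state 4 ends the episode with reward 1.
centers = np.array([1.0, 3.0])               # two soft clusters over the chain

def membership(s):
    # Soft aggregation probabilities P(x|s): a fixed Gaussian-style design.
    w = np.exp(-(s - centers) ** 2)
    return w / w.sum()

Q = np.zeros((2, 2))                         # cluster-level table Q(x, a)

def q_hat(s):
    # Reconstructed state-level values: Q(s, a) = sum_x P(x|s) Q(x, a).
    return membership(s) @ Q

alpha, gamma = 0.05, 0.9
for _ in range(5000):
    s = 0
    for _ in range(50):
        a = int(rng.integers(2))             # uniform random behavior policy
        s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        r, done = (1.0, True) if s2 == 4 else (0.0, False)
        target = r if done else r + gamma * q_hat(s2).max()
        x = rng.choice(2, p=membership(s))   # sample one cluster for state s
        Q[x, a] += alpha * (target - q_hat(s)[a])
        if done:
            break
        s = s2
```

With only two clusters for five states the learned values are compressed, but the greedy policy recovered from Q(s, a) still moves right toward the goal in every state.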

343 citations


Proceedings Article
01 Jan 1994
TL;DR: An alternative model for mixtures of experts that uses a different parametric form for the gating network, is trained by the EM algorithm, and yields faster convergence.
Abstract: We propose an alternative model for mixtures of experts which uses a different parametric form for the gating network. The modified model is trained by the EM algorithm. In comparison with earlier models--trained by either EM or gradient ascent--there is no need to select a learning stepsize. We report simulation experiments which show that the new architecture yields faster convergence. We also apply the new model to two problem domains: piecewise nonlinear function approximation and the combination of multiple previously trained classifiers.

258 citations


ReportDOI
01 Dec 1994
TL;DR: A set of algorithms is described that handles clustering, classification, and function approximation from incomplete data in a principled and efficient manner, making two distinct appeals to the Expectation-Maximization principle.
Abstract: Real-world learning tasks often involve high-dimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectives---the likelihood-based and the Bayesian. The goal is two-fold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihood-based framework, that handle clustering, classification, and function approximation from incomplete data in a principled and efficient manner.
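One of the likelihood-based algorithms the abstract alludes to can be sketched as EM for a single multivariate Gaussian with missing features: the E-step fills each missing entry with its conditional expectation (plus a conditional-covariance correction to the second moments), and the M-step re-estimates the mean and covariance. The 2-D data, masking rate, and iteration count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth 2-D Gaussian; randomly mask individual entries.
true_mu = np.array([1.0, -1.0])
true_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X = rng.multivariate_normal(true_mu, true_cov, size=500)
mask = rng.random(X.shape) < 0.15            # True = missing
mask &= ~(mask.all(axis=1))[:, None]         # keep at least one observed entry
Xobs = np.where(mask, np.nan, X)

mu = np.zeros(2)
cov = np.eye(2)
for _ in range(50):
    # E-step: expected completed data and second moments under (mu, cov).
    filled = np.zeros_like(Xobs)
    second = np.zeros((2, 2))
    for i, row in enumerate(Xobs):
        m = np.isnan(row)
        o = ~m
        x = row.copy()
        C = np.zeros((2, 2))                 # conditional covariance of x_m
        if m.any():
            So_inv = np.linalg.inv(cov[np.ix_(o, o)])
            # E[x_m | x_o] = mu_m + S_mo S_oo^{-1} (x_o - mu_o)
            x[m] = mu[m] + cov[np.ix_(m, o)] @ So_inv @ (row[o] - mu[o])
            C[np.ix_(m, m)] = (cov[np.ix_(m, m)]
                               - cov[np.ix_(m, o)] @ So_inv @ cov[np.ix_(o, m)])
        filled[i] = x
        second += np.outer(x, x) + C
    # M-step: re-estimate mean and covariance from the expected statistics.
    mu = filled.mean(axis=0)
    cov = second / len(Xobs) - np.outer(mu, mu)
```

Because the two coordinates are correlated, the E-step exploits the observed coordinate to impute the missing one, and the recovered mean and covariance land close to the generating parameters.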

243 citations


Journal ArticleDOI
TL;DR: It is concluded that perceptual distortion of curvature contributes to the curvature seen in human point-to-point arm movements and that this must be taken into account in the assessment of models of trajectory formation.
Abstract: Unconstrained point-to-point human arm movements are generally gently curved, a fact which has been used to assess the validity of models of trajectory formation. In this study we examined the relationship between curvature perception and movement curvature for planar sagittal and transverse arm movements. We found a significant correlation (P<0.0001, n=16) between the curvature perceived as straight and the curvature of actual arm movements. We suggest that subjects try to make straight-line movements, but that actual movements are curved because visual perceptual distortion makes the movements appear to be straighter than they really are. We conclude that perceptual distortion of curvature contributes to the curvature seen in human point-to-point arm movements and that this must be taken into account in the assessment of models of trajectory formation.

139 citations


Proceedings Article
01 Jan 1994
TL;DR: A statistical mechanical framework for the modeling of discrete time series is proposed, in which maximum likelihood estimation is done via Boltzmann learning in one-dimensional networks with tied weights; the framework also motivates new architectures that address particular shortcomings of HMMs.
Abstract: We propose a statistical mechanical framework for the modeling of discrete time series. Maximum likelihood estimation is done via Boltzmann learning in one-dimensional networks with tied weights. We call these networks Boltzmann chains and show that they contain hidden Markov models (HMMs) as a special case. Our framework also motivates new architectures that address particular shortcomings of HMMs. We look at two such architectures: parallel chains that model feature sets with disparate time scales, and looped networks that model long-term dependencies between hidden states. For these networks, we show how to implement the Boltzmann learning rule exactly, in polynomial time, without resort to simulated or mean-field annealing. The necessary computations are done by exact decimation procedures from statistical mechanics.
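The exact polynomial-time computation the abstract mentions can be illustrated by summing out the hidden units of a small chain one at a time (decimation, equivalent to transfer matrices or the HMM forward pass) and checking the result against brute-force enumeration over all hidden paths. The weights below are random placeholders, not a trained model.

```python
from itertools import product

import numpy as np

rng = np.random.default_rng(0)

# A chain of T hidden units, each with K values, pairwise weights A between
# neighbors, and per-step biases b (set, in a Boltzmann chain, by the
# observations). The log partition function gives the data likelihood.
K, T = 3, 5
A = rng.normal(size=(K, K))                  # hidden-hidden coupling weights
b = rng.normal(size=(T, K))                  # observation-dependent biases

def log_Z_transfer():
    # Eliminate hidden units left to right (decimation / transfer matrices):
    # cost is O(T * K^2) instead of O(K^T).
    msg = np.exp(b[0])                       # unnormalized belief over unit 0
    for t in range(1, T):
        msg = (msg @ np.exp(A)) * np.exp(b[t])
    return np.log(msg.sum())

def log_Z_brute():
    # Exponential-time check: enumerate all K**T hidden configurations.
    total = 0.0
    for path in product(range(K), repeat=T):
        e = sum(b[t, path[t]] for t in range(T))
        e += sum(A[path[t], path[t + 1]] for t in range(T - 1))
        total += np.exp(e)
    return np.log(total)
```

The two computations agree to machine precision, which is the sense in which Boltzmann learning in these chains needs no simulated or mean-field annealing.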

81 citations


Journal ArticleDOI
TL;DR: A neural network architecture was designed that learned to produce neural commands to a set of muscle-like actuators based only on information about spatial errors to generate point-to-point horizontal arm movements and the resulting muscle activation patterns and hand trajectories were found to be similar to those observed experimentally for human subjects.
Abstract: Unconstrained point-to-point reaching movements performed in the horizontal plane tend to follow roughly straight hand paths with smooth, bell-shaped velocity profiles. The objective of the research reported here was to explore the hypothesis that these data reflect an underlying learning process that prefers simple paths in space. Under this hypothesis, movements are learned based only on spatial errors between the actual hand path and a desired hand path; temporally varying targets are not allowed. We designed a neural network architecture that learned to produce neural commands to a set of muscle-like actuators based only on information about spatial errors. Following repetitive executions of the reaching task, the network was able to generate point-to-point horizontal arm movements and the resulting muscle activation patterns and hand trajectories were found to be similar to those observed experimentally for human subjects. The implications of our results with respect to current theories of multijoint limb movement generation are discussed.

57 citations


ReportDOI
01 Jan 1994
TL;DR: In this paper, the same principles are used to select data for two alternative, statistically-based learning architectures---mixtures of Gaussians and locally weighted regression---for which the resulting techniques are both efficient and accurate.
Abstract: For many types of learners one can compute the statistically "optimal" way to select data. We review how these techniques have been used with feedforward neural networks [MacKay, 1992; Cohn, 1994]. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate.
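A heavily simplified sketch of variance-driven data selection for locally weighted regression: score each candidate query by a predictive-variance proxy (noise variance over the local effective sample size) and query where that score is largest. The target function, the bandwidth, and the max-variance criterion itself are illustrative assumptions; the paper's criterion minimizes the learner's expected integrated variance rather than simply querying at the current maximum.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(2 * np.pi * x)

# Initial training data covers only the left part of the input range.
train_x = rng.uniform(0.0, 0.4, size=10)
train_y = f(train_x) + 0.05 * rng.normal(size=10)

h = 0.05                                     # assumed fixed kernel bandwidth
noise_var = 0.05 ** 2

def predictive_variance(x):
    # Crude LWR variance proxy: noise variance divided by the effective
    # number of nearby training points (sum of Gaussian kernel weights).
    w = np.exp(-0.5 * ((x - train_x) / h) ** 2)
    return noise_var / (w.sum() + 1e-12)

# Actively pick 15 queries from a candidate pool, labeling each as we go.
pool = list(np.linspace(0.0, 1.0, 101))
queries = []
for _ in range(15):
    variances = [predictive_variance(x) for x in pool]
    x_new = pool.pop(int(np.argmax(variances)))
    queries.append(x_new)
    train_x = np.append(train_x, x_new)
    train_y = np.append(train_y, f(x_new) + 0.05 * rng.normal())
```

Because the variance proxy explodes wherever the kernel sees no data, the selected queries concentrate in the initially unsampled right half of the input range rather than duplicating the existing data.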

54 citations


Proceedings ArticleDOI
16 Jul 1994
TL;DR: A statistical approach to decision tree modeling is described, in which each decision in the tree is modeled parametrically as is the process by which an output is generated from an input and a sequence of decisions, yielding a likelihood measure of goodness of fit.
Abstract: A statistical approach to decision tree modeling is described. In this approach, each decision in the tree is modeled parametrically as is the process by which an output is generated from an input and a sequence of decisions. The resulting model yields a likelihood measure of goodness of fit, allowing ML and MAP estimation techniques to be utilized. An efficient algorithm is presented to estimate the parameters in the tree. The model selection problem is presented and several alternative proposals are considered. A hidden Markov version of the tree is described for data sequences that have temporal dependencies.

Journal ArticleDOI
TL;DR: A large family of Boltzmann machines that can be trained by standard gradient descent, which can have one or more layers of hidden units, with tree-like connectivity, are introduced.
Abstract: We introduce a large family of Boltzmann machines that can be trained by standard gradient descent. The networks can have one or more layers of hidden units, with tree-like connectivity. We show how to implement the supervised learning algorithm for these Boltzmann machines exactly, without resort to simulated or mean-field annealing. The stochastic averages that yield the gradients in weight space are computed by the technique of decimation. We present results on the problems of N-bit parity and the detection of hidden symmetries.

Proceedings Article
01 Jan 1994
TL;DR: Experimental results and simulations based on a novel approach that investigates the temporal propagation of errors in the sensorimotor integration process provide direct support for the existence of an internal model in the central nervous system.
Abstract: Based on computational principles, with as yet no direct experimental validation, it has been proposed that the central nervous system (CNS) uses an internal model to simulate the dynamic behavior of the motor system in planning, control and learning (Sutton and Barto, 1981; Ito, 1984; Kawato et al., 1987; Jordan and Rumelhart, 1992; Miall et al., 1993). We present experimental results and simulations based on a novel approach that investigates the temporal propagation of errors in the sensorimotor integration process. Our results provide direct support for the existence of an internal model.

Proceedings Article
01 Jan 1994
TL;DR: This work has studied the generalization of the visuomotor map subsequent to both local and context-dependent remappings, indicating that a single point in visual space can be mapped to two different finger locations depending on a context variable--the starting point of the movement.
Abstract: One of the fundamental properties that both neural networks and the central nervous system share is the ability to learn and generalize from examples. While this property has been studied extensively in the neural network literature, it has not been thoroughly explored in human perceptual and motor learning. We have chosen a coordinate transformation system--the visuomotor map, which transforms visual coordinates into motor coordinates--to study the generalization effects of learning new input-output pairs. Using a paradigm of computer-controlled altered visual feedback, we have studied the generalization of the visuomotor map subsequent to both local and context-dependent remappings. A local remapping of one or two input-output pairs induced a significant global, yet decaying, change in the visuomotor map, suggesting a representation for the map composed of units with large functional receptive fields. Our study of context-dependent remappings indicated that a single point in visual space can be mapped to two different finger locations depending on a context variable--the starting point of the movement. Furthermore, as the context is varied there is a gradual shift between the two remappings, consistent with two visuomotor modules being learned and gated smoothly with the context.

Book ChapterDOI
10 Jul 1994
TL;DR: A statistical approach to decision tree modeling is described, in which each decision in the tree is modeled parametrically as is the process by which an output is generated from an input and a sequence of decisions, yielding a likelihood measure of goodness of fit.
Abstract: A statistical approach to decision tree modeling is described. In this approach, each decision in the tree is modeled parametrically as is the process by which an output is generated from an input and a sequence of decisions. The resulting model yields a likelihood measure of goodness of fit, allowing ML and MAP estimation techniques to be utilized. An efficient algorithm is presented to estimate the parameters in the tree. The model selection problem is presented and several alternative proposals are considered. A hidden Markov version of the tree is described for data sequences that have temporal dependencies.