Jonathan Baxter

Researcher at Australian National University

Publications: 49
Citations: 6091

Jonathan Baxter is an academic researcher from Australian National University. The author has contributed to research on topics including reinforcement learning and gradient descent. The author has an h-index of 26 and has co-authored 49 publications receiving 5420 citations. Previous affiliations of Jonathan Baxter include Graz University of Technology and the London School of Economics and Political Science.

Papers
Journal Article

A model of inductive bias learning

TL;DR: Under certain restrictions on the set of all hypothesis spaces available to the learner, it is shown that a hypothesis space that performs well on a sufficiently large number of training tasks will also perform well when learning novel tasks in the same environment.
Proceedings Article

Boosting Algorithms as Gradient Descent

TL;DR: Following previous theoretical results bounding the generalization performance of convex combinations of classifiers in terms of general cost functions of the margin, a new algorithm (DOOM II) is presented for performing a gradient descent optimization of such cost functions.
Journal Article

Infinite-Horizon Policy-Gradient Estimation

TL;DR: GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies, is introduced.
Journal Article

Infinite-horizon policy-gradient estimation

TL;DR: In this article, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies is proposed.
Journal Article

A Bayesian/Information Theoretic Model of Learning to Learn via Multiple Task Sampling

TL;DR: It is argued that for many common machine learning problems, although in general one does not know the true (objective) prior for the problem, one does have some idea of a set of possible priors to which the true prior belongs.