Philip S. Thomas

Researcher at University of Massachusetts Amherst

Publications - 102
Citations - 2968

Philip S. Thomas is an academic researcher at the University of Massachusetts Amherst. He has contributed to research on reinforcement learning and Markov decision processes, has an h-index of 23, and has co-authored 91 publications receiving 2245 citations. His previous affiliations include Case Western Reserve University and Carnegie Mellon University.

Papers
Proceedings Article

Data-efficient off-policy policy evaluation for reinforcement learning

TL;DR: A new way of predicting the performance of a reinforcement learning policy from historical data that may have been generated by a different policy, based on an extension of the doubly robust estimator and a new way to mix model-based estimates with importance-sampling estimates.
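
For concreteness, here is a minimal sketch of the per-trajectory doubly robust estimate this line of work builds on; `pi_e(a, s)` and `pi_b(a, s)` are hypothetical callables giving action probabilities under the evaluation and behavior policies, and `q_hat`/`v_hat` come from a learned model. This is the standard doubly robust form, not the paper's exact blended estimator:

```python
def doubly_robust_return(trajectory, pi_e, pi_b, q_hat, v_hat, gamma=1.0):
    """Doubly robust off-policy estimate of pi_e's return from one
    trajectory [(s, a, r), ...] collected under the behavior policy pi_b."""
    rho = 1.0        # cumulative importance weight rho_{0:t}
    estimate = 0.0
    discount = 1.0
    for (s, a, r) in trajectory:
        rho_prev = rho                     # rho_{0:t-1}
        rho *= pi_e(a, s) / pi_b(a, s)     # rho_{0:t}
        # Importance-weighted reward, with the learned model (q_hat, v_hat)
        # acting as a control variate to reduce variance.
        estimate += discount * (rho * (r - q_hat(s, a)) + rho_prev * v_hat(s))
        discount *= gamma
    return estimate
```

Averaging this quantity over the logged trajectories gives the performance prediction; the paper's contribution additionally mixes such estimates with purely model-based ones.
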
Proceedings Article

Value function approximation in reinforcement learning using the fourier basis

TL;DR: Describes the Fourier basis, a linear value function approximation scheme based on the Fourier series; it performs well compared to radial basis functions and the polynomial basis, and is competitive with learned proto-value functions.
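
A minimal sketch of the order-n Fourier basis: with the state x normalized to [0, 1]^dim, each feature is cos(pi * c . x) for one coefficient vector c in {0, ..., n}^dim:

```python
from itertools import product

import numpy as np

def fourier_basis(order, dim):
    """Build phi(x): the (order + 1)**dim Fourier basis features for a
    state x normalized to [0, 1]**dim."""
    coeffs = np.array(list(product(range(order + 1), repeat=dim)))
    def phi(x):
        return np.cos(np.pi * coeffs @ np.asarray(x, dtype=float))
    return phi

phi = fourier_basis(order=3, dim=2)    # (3 + 1)**2 = 16 features
features = phi([0.5, 0.25])
# A linear value estimate is then weights @ features.
```
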
Proceedings Article

High confidence off-policy evaluation

TL;DR: This paper proposes an off-policy method for computing a lower confidence bound on the expected return of a policy, with guarantees about the accuracy of its estimates.
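
The general recipe can be sketched as follows: compute an importance-sampled return per logged trajectory, then form a one-sided lower bound. The Student-t bound below is an approximate stand-in for the paper's tighter concentration inequality, and `pi_e`/`pi_b` are hypothetical policy-probability callables:

```python
import numpy as np
from scipy import stats

def importance_sampled_return(trajectory, pi_e, pi_b, gamma=1.0):
    """Ordinary importance-sampled return of one behavior trajectory."""
    rho, ret, discount = 1.0, 0.0, 1.0
    for (s, a, r) in trajectory:
        rho *= pi_e(a, s) / pi_b(a, s)
        ret += discount * r
        discount *= gamma
    return rho * ret

def lower_confidence_bound(returns, delta=0.05):
    """One-sided lower bound that holds with probability roughly 1 - delta
    (approximate: assumes the sample mean is near-normal)."""
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    std_err = returns.std(ddof=1) / np.sqrt(n)
    return returns.mean() - stats.t.ppf(1.0 - delta, df=n - 1) * std_err
```
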
Proceedings Article

High Confidence Policy Improvement

TL;DR: Presents a batch reinforcement learning (RL) algorithm that provides probabilistic guarantees about the quality of each policy it proposes and has no hyperparameters that require expert tuning.
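
The overall structure (candidate selection on one data split, a high-confidence safety test on a held-out split) can be sketched as below; `propose` and `evaluate_lb` are hypothetical stand-ins for the algorithm's components, not the paper's API:

```python
def high_confidence_policy_improvement(data, baseline_perf, propose,
                                       evaluate_lb, delta=0.05):
    """Return a candidate policy only if a (1 - delta)-confidence lower
    bound on its performance, computed on held-out data, beats the
    baseline; otherwise return None and keep the baseline policy."""
    half = len(data) // 2
    candidate_set, safety_set = data[:half], data[half:]
    candidate = propose(candidate_set)   # e.g. maximize a lower bound
    if evaluate_lb(candidate, safety_set, delta) >= baseline_perf:
        return candidate                 # deemed safe to deploy
    return None                          # "no solution found"
```
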
Journal Article

Preventing undesirable behavior of intelligent machines.

TL;DR: Introduces a general framework for algorithm design in which the burden of avoiding undesirable behavior is shifted from the user of the algorithm to its designer; this framework simplifies the problem of specifying and regulating undesirable behavior.
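
A sketch of the interface such a framework implies, under stated assumptions: each hypothetical `upper_bound(theta, data, delta)` callable returns a (1 - delta)-confidence upper bound on one behavioral constraint g_i(theta), and a candidate solution is returned only if every bound is at most zero:

```python
def safety_test(theta, safety_data, constraint_upper_bounds, deltas):
    """Pass iff every behavioral constraint g_i is bounded below zero
    with confidence 1 - delta_i on held-out safety data."""
    return all(ub(theta, safety_data, delta) <= 0.0
               for ub, delta in zip(constraint_upper_bounds, deltas))

def design_with_safety_constraints(candidate_search, data,
                                   constraint_upper_bounds, deltas):
    """Candidate selection on one data split, safety test on the other."""
    half = len(data) // 2
    theta = candidate_search(data[:half])
    if safety_test(theta, data[half:], constraint_upper_bounds, deltas):
        return theta
    return None   # refuse to return a solution that fails the safety test
```
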