Philip S. Thomas
Researcher at University of Massachusetts Amherst
Publications - 102
Citations - 2968
Philip S. Thomas is an academic researcher at the University of Massachusetts Amherst. His research focuses on reinforcement learning and Markov decision processes. He has an h-index of 23 and has co-authored 91 publications receiving 2245 citations. Previous affiliations of Philip S. Thomas include Case Western Reserve University and Carnegie Mellon University.
Papers
Proceedings Article
Data-efficient off-policy policy evaluation for reinforcement learning
Philip S. Thomas, Emma Brunskill, +1 more
TL;DR: A new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy, based on an extension of the doubly robust estimator and a new way to mix between model-based estimates and importance-sampling-based estimates.
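The core primitive behind estimators like this is importance sampling: reweighting returns observed under a behavior policy so they estimate the value of a different evaluation policy. A minimal sketch of the ordinary importance sampling estimator (a generic building block, not the paper's doubly robust estimator; `pi_e` and `pi_b` are assumed to return action probabilities):

```python
import numpy as np

def importance_sampling_estimate(trajectories, pi_e, pi_b):
    """Ordinary importance sampling estimate of the expected return of an
    evaluation policy pi_e from trajectories generated by a behavior
    policy pi_b. Each trajectory is a list of (state, action, reward)
    tuples; pi_e(s, a) and pi_b(s, a) are action probabilities.
    This is a generic IS sketch, not the paper's doubly robust estimator."""
    estimates = []
    for traj in trajectories:
        weight = 1.0  # product of per-step likelihood ratios
        ret = 0.0     # undiscounted return of this trajectory
        for (s, a, r) in traj:
            weight *= pi_e(s, a) / pi_b(s, a)
            ret += r
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```

The doubly robust extension additionally subtracts a model-based control variate from each term, which keeps the estimator unbiased while reducing its variance when the model is approximately correct.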
Proceedings Article
Value function approximation in reinforcement learning using the Fourier basis
TL;DR: The Fourier basis is described, a linear value function approximation scheme based on the Fourier series that performs well compared to radial basis functions and the polynomial basis, and is competitive with learned proto-value functions.
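The scheme the paper describes is simple to state: for a state normalized to [0, 1]^d, each feature is the cosine of a dot product with an integer coefficient vector. A minimal sketch (function and variable names are my own):

```python
import numpy as np
from itertools import product

def fourier_basis(order, dim):
    """Build the order-n Fourier basis over states normalized to [0, 1]^dim.
    Returns a feature map phi with (order+1)**dim features, one per integer
    coefficient vector c: phi_c(x) = cos(pi * c . x)."""
    coeffs = np.array(list(product(range(order + 1), repeat=dim)))
    def phi(x):
        # Dot each coefficient vector with the state, then take cosines.
        return np.cos(np.pi * coeffs @ np.asarray(x, dtype=float))
    return phi
```

The value function is then approximated as a linear combination of these features, so standard linear TD methods apply unchanged.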
Proceedings Article
High confidence off-policy evaluation
TL;DR: This paper proposes an off-policy method for computing a lower confidence bound on the expected return of a policy, providing guarantees on the accuracy of its estimates.
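The general pattern is to apply a concentration inequality to importance-weighted returns to get a high-confidence lower bound on the mean. A sketch using a standard Hoeffding-style bound (the paper develops tighter inequalities suited to the heavy-tailed importance weights; this generic bound is a substitute for illustration):

```python
import numpy as np

def hoeffding_lower_bound(samples, delta, b):
    """(1 - delta)-confidence lower bound on the mean of i.i.d. samples
    bounded in [0, b], via Hoeffding's inequality. A generic stand-in for
    the tighter concentration inequalities developed in the paper."""
    n = len(samples)
    return float(np.mean(samples) - b * np.sqrt(np.log(1.0 / delta) / (2 * n)))
```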
Proceedings Article
High Confidence Policy Improvement
TL;DR: A batch reinforcement learning (RL) algorithm is presented that provides probabilistic guarantees about the quality of each policy it proposes and has no hyperparameters that require expert tuning.
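The guarantee in this line of work follows a safety-test pattern: a candidate policy is returned only if a high-confidence lower bound on its performance beats a baseline; otherwise the algorithm declares no solution found. A sketch of that pattern, using a generic Hoeffding bound rather than the paper's inequalities (all names here are illustrative):

```python
import numpy as np

def safe_policy_improvement(candidate_returns, baseline_performance, delta, b):
    """Safety-test pattern for high-confidence policy improvement: accept
    the candidate policy only if a (1 - delta)-confidence lower bound on
    its performance (returns assumed bounded in [0, b]) exceeds the
    baseline. Uses a generic Hoeffding bound, not the paper's method."""
    n = len(candidate_returns)
    lb = np.mean(candidate_returns) - b * np.sqrt(np.log(1.0 / delta) / (2 * n))
    return "accept" if lb > baseline_performance else "no solution found"
```

The key property is that errors are one-sided: the algorithm may conservatively return nothing, but with probability at least 1 - delta it never returns a policy worse than the baseline.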
Journal ArticleDOI
Preventing undesirable behavior of intelligent machines.
TL;DR: A general framework for algorithm design is introduced in which the burden of avoiding undesirable behavior is shifted from the user to the designer of the algorithm, and this framework simplifies the problem of specifying and regulating undesirable behavior.