Topic

Reward-based selection

About: Reward-based selection is a research topic. Over the lifetime, 365 publications have been published within this topic, receiving 14,137 citations.


Papers
Proceedings ArticleDOI
04 Jul 2004
TL;DR: This work models the expert as maximizing a reward function expressible as a linear combination of known features, and gives an algorithm for learning the demonstrated task that uses "inverse reinforcement learning" to try to recover the unknown reward function.
Abstract: We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. Our algorithm is based on using "inverse reinforcement learning" to try to recover the unknown reward function. We show that our algorithm terminates in a small number of iterations, and that even though we may never recover the expert's reward function, the policy output by the algorithm will attain performance close to that of the expert, where here performance is measured with respect to the expert's unknown reward function.

3,110 citations
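The iteration structure described in the abstract above can be sketched briefly. In the sketch below, the toy two-state MDP, its one-hot features, and the stand-in "expert" policy are assumptions made for illustration only; what follows the paper is the loop of solving an MDP for the current reward weights, measuring feature expectations, and projecting toward the expert's feature expectations.

```python
import numpy as np

# Illustrative two-state, two-action MDP; dynamics, features, and the
# "expert" policy are assumptions, not taken from the paper.
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.9, 0.1], [0.1, 0.9]],      # P[s, a, s'] transition probabilities
              [[0.8, 0.2], [0.2, 0.8]]])
phi = np.eye(n_states)                        # phi[s]: known feature vector of state s

def solve_mdp(w, n_iter=200):
    """Value iteration for the linear reward R(s) = w . phi(s); returns a greedy policy."""
    R = phi @ w
    V = np.zeros(n_states)
    for _ in range(n_iter):
        Q = R[:, None] + gamma * (P @ V)      # Q[s, a]
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def feature_expectations(policy, horizon=200):
    """Discounted feature expectations mu(pi), starting from state 0."""
    P_pi = P[np.arange(n_states), policy]     # P_pi[s, s'] under the given policy
    d = np.zeros(n_states); d[0] = 1.0
    mu = np.zeros(phi.shape[1])
    for t in range(horizon):
        mu += (gamma ** t) * (d @ phi)
        d = d @ P_pi
    return mu

# Stand-in for the expert's demonstrations: feature expectations of a fixed policy.
mu_E = feature_expectations(np.zeros(n_states, dtype=int))

# Projection loop: solve the MDP for the current reward weights, then move the
# achieved feature expectations toward the expert's.
policy = np.ones(n_states, dtype=int)
mu_bar = feature_expectations(policy)
for _ in range(20):
    w = mu_E - mu_bar                         # current estimate of the reward weights
    if np.linalg.norm(w) < 1e-6:              # feature expectations (nearly) matched
        break
    policy = solve_mdp(w)
    mu = feature_expectations(policy)
    denom = (mu - mu_bar) @ (mu - mu_bar)
    if denom < 1e-12:
        break
    mu_bar = mu_bar + ((mu - mu_bar) @ (mu_E - mu_bar)) / denom * (mu - mu_bar)

print("final reward-weight estimate:", mu_E - mu_bar)
print("policy matching the expert:", policy)
```

In the paper itself the per-iteration policy can come from any reinforcement-learning algorithm, and feature expectations are estimated from sampled trajectories rather than computed exactly as done here.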

Journal ArticleDOI
TL;DR: Human subjects indicated their preference between a hypothetical $1,000 reward, available with various probabilities or delays, and a certain reward of variable amount available immediately; the function relating the subjectively equivalent certain-immediate amount to the delay had the same general shape (hyperbolic) as the function found by Mazur (1987) to describe pigeons' delay discounting.
Abstract: Human subjects indicated their preference between a hypothetical $1,000 reward available with various probabilities or delays and a certain reward of variable amount available immediately. The function relating the amount of the certain-immediate reward subjectively equivalent to the delayed $1,000 reward had the same general shape (hyperbolic) as the function found by Mazur (1987) to describe pigeons' delay discounting. The function relating the certain-immediate amount of money subjectively equivalent to the probabilistic $1,000 reward was also hyperbolic, provided that the stated probability was transformed to odds against winning. In a second experiment, when human subjects chose between a delayed $1,000 reward and a probabilistic $1,000 reward, delay was proportional to the same odds-against transformation of the probability to which it was subjectively equivalent.

1,249 citations
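The two discounting forms described in the abstract can be written out directly: Mazur's hyperbolic delay discounting and probability discounting via the odds against winning. In this sketch the discount parameters k_d and k_p are illustrative assumptions, not the values fitted in the study.

```python
# Hyperbolic discounting of delayed and probabilistic rewards, as in the study above.
# k_d and k_p are illustrative parameters, not the paper's fitted values.

def discounted_value_delay(amount, delay, k_d=0.05):
    """Mazur (1987) hyperbolic delay discounting: V = A / (1 + k_d * D)."""
    return amount / (1.0 + k_d * delay)

def discounted_value_probability(amount, p, k_p=1.0):
    """Probability discounting with the odds-against transformation: theta = (1 - p) / p."""
    odds_against = (1.0 - p) / p
    return amount / (1.0 + k_p * odds_against)

# A hypothetical $1,000 reward delayed by 12 months vs. offered with probability 0.5
print(discounted_value_delay(1000, delay=12))        # subjective value of the delayed reward
print(discounted_value_probability(1000, p=0.5))     # subjective value of the risky reward
```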

Journal ArticleDOI
03 Aug 2006, Neuron
TL;DR: The results suggest that, beyond its role in learning, motivation, and salience, a primary task of the dopaminergic system is to convey signals about upcoming stochastic rewards, such as their expected reward and risk.

664 citations

Journal ArticleDOI
TL;DR: This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework, and a detailed sensitivity analysis of R-learning is carried out to test its dependence on learning rates and exploration levels.
Abstract: This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms are described, ranging from synchronous dynamic programming methods to several (provably convergent) asynchronous algorithms from optimal control and learning automata. A general sensitive discount optimality metric called n-discount-optimality is introduced, and used to compare the various algorithms. The overview identifies a key similarity across several asynchronous algorithms that is crucial to their convergence, namely independent estimation of the average reward and the relative values. The overview also uncovers a surprising limitation shared by the different algorithms: while several algorithms can provably generate gain-optimal policies that maximize average reward, none of them can reliably filter these to produce bias-optimal (or T-optimal) policies that also maximize the finite reward to absorbing goal states. This paper also presents a detailed empirical study of R-learning, an average reward reinforcement learning method, using two empirical testbeds: a stochastic grid world domain and a simulated robot environment. A detailed sensitivity analysis of R-learning is carried out to test its dependence on learning rates and exploration levels. The results suggest that R-learning is quite sensitive to exploration strategies, and can fall into sub-optimal limit cycles. The performance of R-learning is also compared with that of Q-learning, the best studied discounted RL method. Here, the results suggest that R-learning can be fine-tuned to give better performance than Q-learning in both domains.

397 citations
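For context, the R-learning method studied in this paper maintains relative action values together with a separately estimated average reward per step. The sketch below shows one tabular update; the environment loop, exploration strategy, step sizes, and the synthetic transition are assumptions made for illustration.

```python
import numpy as np

# One tabular R-learning update (the average-reward method studied above).
# Step sizes and the example transition are illustrative; exploration is omitted.

def r_learning_update(R, rho, s, a, reward, s_next, beta=0.1, alpha=0.01):
    """Update the relative action values R[s, a] and the average-reward estimate rho."""
    R[s, a] += beta * (reward - rho + R[s_next].max() - R[s, a])
    # The average reward rho is estimated independently of the relative values and is
    # updated only when a greedy (non-exploratory) action was taken.
    if R[s, a] == R[s].max():
        rho += alpha * (reward - rho + R[s_next].max() - R[s].max())
    return rho

# Example: 5 states, 2 actions, one synthetic transition (s=0, a=1, r=1.0, s'=2).
R = np.zeros((5, 2))
rho = 0.0
rho = r_learning_update(R, rho, s=0, a=1, reward=1.0, s_next=2)
```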

Proceedings Article
12 Dec 2011
TL;DR: A probabilistic algorithm that allows complex behaviors to be captured from suboptimal stochastic demonstrations, while automatically balancing the simplicity of the learned reward structure against its consistency with the observed actions.
Abstract: We present a probabilistic algorithm for nonlinear inverse reinforcement learning. The goal of inverse reinforcement learning is to learn the reward function in a Markov decision process from expert demonstrations. While most prior inverse reinforcement learning algorithms represent the reward as a linear combination of a set of features, we use Gaussian processes to learn the reward as a nonlinear function, while also determining the relevance of each feature to the expert's policy. Our probabilistic algorithm allows complex behaviors to be captured from suboptimal stochastic demonstrations, while automatically balancing the simplicity of the learned reward structure against its consistency with the observed actions.

336 citations
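The representational idea in this paper, a reward modeled as a nonlinear function of state features by a Gaussian process whose per-feature length-scales indicate relevance, can be sketched with an off-the-shelf GP regressor. This is not the paper's full GPIRL algorithm, which fits the GP by maximizing the likelihood of the expert's demonstrations rather than regressing on known reward values; the feature data and reward targets below are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Sketch of the representation only: a GP maps state features to a nonlinear reward,
# with one length-scale per feature (ARD) indicating that feature's relevance.
# The feature samples and reward targets are synthetic stand-ins.

rng = np.random.default_rng(0)
features = rng.uniform(size=(50, 2))               # phi(s) for 50 sampled states
reward_targets = np.sin(3 * features[:, 0])        # stand-in reward; feature 1 is irrelevant

kernel = RBF(length_scale=[1.0, 1.0])              # anisotropic (ARD) RBF kernel
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-4).fit(features, reward_targets)

# A large learned length-scale marks the corresponding feature as irrelevant to the reward.
print("learned length-scales:", gp.kernel_.length_scale)
print("reward at a new state:", gp.predict(features[:1]))
```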


Network Information
Related Topics (5)
Reinforcement learning: 46K papers, 1M citations, 80% related
Inference: 36.8K papers, 1.3M citations, 76% related
Heuristics: 32.1K papers, 956.5K citations, 73% related
Probabilistic logic: 56K papers, 1.3M citations, 70% related
Convex optimization: 24.9K papers, 908.7K citations, 69% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2019    2
2018    3
2017    25
2016    24
2015    25
2014    23