Humans use directed and random exploration to solve the explore–exploit dilemma
TL;DR: It is found that participants were more information seeking and had higher decision noise with the longer horizon, suggesting that humans use both strategies to solve the exploration–exploitation dilemma.

Abstract:
All adaptive organisms face the fundamental tradeoff between pursuing a known reward (exploitation) and sampling lesser-known options in search of something better (exploration). Theory suggests at least two strategies for solving this dilemma: a directed strategy in which choices are explicitly biased toward information seeking, and a random strategy in which decision noise leads to exploration by chance. In this work we investigated the extent to which humans use these two strategies. In our "Horizon task," participants made explore–exploit decisions in two contexts that differed in the number of choices that they would make in the future (the time horizon). Participants made either a single choice in each game (horizon 1) or 6 sequential choices (horizon 6), giving them more opportunity to explore. By modeling the behavior in these two conditions, we were able to measure exploration-related changes in decision making and quantify the contributions of the two strategies to behavior. We found that participants were more information seeking and had higher decision noise with the longer horizon, suggesting that humans use both strategies to solve the exploration–exploitation dilemma. We thus conclude that both information seeking and choice variability can be controlled and put to use in the service of exploration.
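The two strategies described in the abstract can be sketched as a simple two-option choice rule: an information bonus added to each option's estimated value (directed exploration) and a softmax-style temperature controlling decision noise (random exploration). The function name, parameter names, and numerical values below are illustrative assumptions for a minimal sketch, not the paper's fitted model.

```python
import math
import random

def _logistic(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def choose(value_left, value_right, info_left, info_right,
           info_bonus=5.0, noise=8.0):
    """Pick 'left' or 'right' in a two-option explore-exploit choice.

    Directed exploration: info_bonus raises the value of the more
    informative (less-sampled) option.
    Random exploration: noise acts as a temperature; larger values
    make choices more stochastic.
    Illustrative sketch, not the authors' fitted model.
    """
    # Decision variable: value difference plus information bonus.
    dq = (value_left + info_bonus * info_left) \
       - (value_right + info_bonus * info_right)
    # Logistic choice rule with decision noise.
    p_left = _logistic(dq / noise)
    return "left" if random.random() < p_left else "right"

# With near-zero noise and equal values, the information bonus decides:
print(choose(50, 50, info_left=1, info_right=0, noise=1e-9))  # → left
```

Raising `noise` flattens the choice probabilities, so exploration can increase either by raising `info_bonus` (directed) or `noise` (random), mirroring the two horizon-dependent effects the study measures.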
Citations
Journal Article
The Psychology and Neuroscience of Curiosity.
Celeste Kidd, Benjamin Y. Hayden, +1 more
TL;DR: It is proposed that, rather than worry about defining curiosity, it is more helpful to consider the motivations for information-seeking behavior and to study it in its ethological context.
Journal Article
Taking Aim at the Cognitive Side of Learning in Sensorimotor Adaptation Tasks.
TL;DR: This review focuses on the contribution of cognitive strategies and heuristics to sensorimotor learning, and on how these processes allow humans to rapidly explore and evaluate novel solutions, supporting flexible, goal-oriented behavior.
Journal Article
Deconstructing the human algorithms for exploration.
TL;DR: It is shown that two families of algorithms can be distinguished in terms of how uncertainty affects exploration, and computational modeling confirms that a hybrid model is the best quantitative account of the data.
Journal Article
Generalization guides human exploration in vast decision spaces
Charley M. Wu, Eric Schulz, Maarten Speekenbrink, Jonathan D. Nelson, Björn Meder, +5 more
TL;DR: Modelling how humans search for rewards under limited search horizons finds evidence that Gaussian process function learning—combined with an optimistic upper confidence bound sampling strategy—provides a robust account of how people use generalization to guide search.
Journal Article
Believing in dopamine
TL;DR: Dopamine signals are implicated in not only reporting reward prediction errors but also various probabilistic computations, and it is proposed that these different roles for dopamine can be placed within a common reinforcement learning framework.
References
Book
Reinforcement Learning: An Introduction
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Journal Article
Risk, Ambiguity, and the Savage Axioms
TL;DR: The notion of "degrees of belief" was introduced by Knight, who argued that people tend to behave "as though" they assigned numerical probabilities, or degrees of belief, to the events impinging on their actions.
Journal Article
Reinforcement learning: a survey
TL;DR: Central issues of reinforcement learning are discussed, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state.
Journal Article
Finite-time Analysis of the Multiarmed Bandit Problem
TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
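The simple, efficient policy this reference analyzes is UCB1 (Auer et al.), which picks the arm maximizing the empirical mean plus an uncertainty bonus of sqrt(2·ln(t)/n_i). A minimal sketch of that index rule, with illustrative inputs:

```python
import math

def ucb1(counts, means, t):
    """UCB1 index policy for a k-armed bandit: choose the arm
    maximizing empirical mean + sqrt(2*ln(t)/n_i), where n_i is
    how often arm i was played and t the total number of plays.
    Arms never tried get infinite priority.  Minimal sketch."""
    best, best_score = 0, float("-inf")
    for i, (n, mu) in enumerate(zip(counts, means)):
        score = float("inf") if n == 0 else mu + math.sqrt(2 * math.log(t) / n)
        if score > best_score:
            best, best_score = i, score
    return best

# Arm 1 has the higher mean, but arm 2 is untried, so it is chosen first:
print(ucb1(counts=[10, 10, 0], means=[0.4, 0.6, 0.0], t=20))  # → 2
```

The sqrt(ln(t)/n_i) bonus shrinks as an arm is sampled, so exploration fades toward exploitation at the rate that yields the logarithmic regret the paper proves.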