Open Access · Journal Article · DOI

Tree-based reinforcement learning for optimal water reservoir operation

TL;DR
In this paper, a reinforcement learning approach, called fitted Q-iteration, is presented: it combines the principle of continuous approximation of the value functions with a process of learning off-line from experience to design daily, cyclostationary operating policies.
Abstract
Although it is one of the most popular and extensively studied approaches to designing water reservoir operations, Stochastic Dynamic Programming is plagued by a dual curse that makes it unsuitable for large water systems: the computational requirement grows exponentially with the number of state variables considered (curse of dimensionality), and an explicit model must be available to describe every system transition and the associated rewards/costs (curse of modeling). A variety of simplifications and approximations have been devised in the past which, in many cases, make the resulting operating policies inefficient and of little relevance in practical contexts. In this paper, a reinforcement-learning approach, called fitted Q-iteration, is presented: it combines the principle of continuous approximation of the value functions with a process of learning off-line from experience to design daily, cyclostationary operating policies. The continuous approximation, performed via tree-based regression, makes it possible to mitigate the curse of dimensionality by adopting a very coarse discretization grid with respect to the dense grid required to design an equally performing policy via Stochastic Dynamic Programming. The learning experience, in the form of a data set generated by combining historical observations and model simulations, allows us to overcome the curse of modeling. The Lake Como water system (Italy) is used as the study site to infer general guidelines on appropriate settings for the algorithm parameters and to demonstrate the advantages of the approach in terms of accuracy and computational effectiveness compared to traditional Stochastic Dynamic Programming.
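The core loop described in the abstract can be sketched in code. The following is a minimal, hypothetical illustration of fitted Q-iteration with a tree-based regressor (scikit-learn's `ExtraTreesRegressor` standing in for the paper's tree-based regression): a batch of one-step transitions (state, action, reward, next state) is collected once, and the Q-function is refitted repeatedly against bootstrapped targets. The toy reservoir mass balance, the demand target, and all parameter values below are illustrative assumptions, not the paper's model of Lake Como.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)

# Hypothetical toy reservoir: state = normalized storage, action = release fraction.
actions = np.linspace(0.0, 1.0, 5)          # coarsely discretized release decisions

def step(storage, release_frac, inflow):
    """Toy mass-balance transition; reward penalizes deviation from a demand target."""
    release = release_frac * storage
    next_storage = np.clip(storage - release + inflow, 0.0, 1.0)
    reward = -abs(release - 0.3)            # hypothetical water-demand target
    return next_storage, reward

# Learning experience: a batch of one-step transitions (x, u, r, x'),
# playing the role of the historical/simulated data set in the paper.
X = rng.uniform(0.0, 1.0, size=2000)        # storages
U = rng.choice(actions, size=2000)          # applied releases
inflows = rng.uniform(0.0, 0.4, size=2000)  # sampled inflows
Xn, R = step(X, U, inflows)

gamma, n_iters = 0.95, 20
model = None
for _ in range(n_iters):                    # fitted Q-iteration loop
    if model is None:
        targets = R                          # Q_1 = immediate reward
    else:
        # max over actions of Q_{N-1}(x', u') via the current tree ensemble
        q_next = np.column_stack([
            model.predict(np.column_stack([Xn, np.full_like(Xn, a)]))
            for a in actions
        ])
        targets = R + gamma * q_next.max(axis=1)
    model = ExtraTreesRegressor(n_estimators=30, random_state=0)
    model.fit(np.column_stack([X, U]), targets)

def policy(storage):
    """Greedy policy: the release maximizing the learned Q at a given storage."""
    q = [model.predict([[storage, a]])[0] for a in actions]
    return actions[int(np.argmax(q))]
```

Because the regression is refit from the fixed batch at every iteration, no simulation model is queried during learning, which is the sense in which the approach sidesteps the curse of modeling.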


Citations
Journal Article · DOI

A survey of multi-objective sequential decision-making

TL;DR: This article surveys algorithms designed for sequential decision-making problems with multiple objectives and proposes a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function, and the type of policies considered.
Journal Article · DOI

A Brief Review of Random Forests for Water Scientists and Practitioners and Their Recent History in Water Resources

TL;DR: This work popularizes RF and their variants for the practicing water scientist, and discusses related concepts and techniques, which have received less attention from the water science and hydrologic communities.
Journal Article · DOI

Curses, Tradeoffs, and Scalable Management: Advancing Evolutionary Multiobjective Direct Policy Search to Improve Water Reservoir Operations

TL;DR: This analysis explores the technical and practical implications of using EMODPS through a careful diagnostic assessment of the effectiveness and reliability of the overall EMODPS solution design as well as of the resulting Pareto-approximate operating policies.
Journal Article · DOI

Position Paper: A general framework for Dynamic Emulation Modelling in environmental problems

TL;DR: The main aim of the paper is to provide an introduction to emulation modelling together with a unified strategy for its application, so that modellers from different disciplines can better appreciate how it may be applied in their area of expertise.
Journal Article · DOI

Robustness Metrics: How Are They Calculated, When Should They Be Used and Why Do They Give Different Results?

TL;DR: A conceptual framework is introduced describing when relative robustness values of decision alternatives obtained using different metrics are likely to agree or disagree; it serves as a measure of how "stable" the ranking of decision alternatives is when determined using different robustness metrics.
References
Journal Article · DOI

Random Forests

TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
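The "internal estimates" this summary refers to are out-of-bag (OOB) estimates: each tree is trained on a bootstrap sample, so the samples it never saw can score it without a held-out set. A minimal sketch with scikit-learn's `RandomForestRegressor` on a hypothetical synthetic dataset (not the paper's data):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression task purely for illustration.
X, y = make_regression(n_samples=300, n_features=8, noise=0.5, random_state=0)

# oob_score=True asks the forest to compute its internal (out-of-bag) R^2.
rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print(rf.oob_score_)   # error estimate obtained without any held-out data
```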
Book

Reinforcement Learning: An Introduction

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Journal Article · DOI

Bagging predictors

Leo Breiman
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.
Book

Classification and regression trees

Leo Breiman
TL;DR: The methodology used to construct tree-structured rules is the focus of this monograph, which covers the use of trees as a data analysis method and, in a more mathematical framework, proves some of their fundamental properties.
Book

Dynamic Programming

TL;DR: The more the authors study the information processing aspects of the mind, the more perplexed and impressed they become, and it will be a very long time before they understand these processes sufficiently to reproduce them.