A survey of multi-objective sequential decision-making
Citations
Cites background from "A survey of multi-objective sequential decision-making"
...For problems with multi-objective reward functions, there are approaches to learning the Pareto-optimal reward function (Roijers et al., 2013), but none of these have been scaled to the deep reinforcement learning setting yet....
[...]
References
[...]
"A survey of multi-objective sequent..." refers background or methods in this paper
...Model-free methods can also be practical for the multiple-policy setting if they employ off-policy learning (Sutton & Barto, 1998; Precup, Sutton, & Dasgupta, 2001), which makes it possible to learn about one policy using data gathered by another....
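The off-policy idea in this excerpt can be sketched with tabular Q-learning: a uniformly random *behavior* policy gathers transitions, while the max operator in the update learns values for the greedy *target* policy. The chain MDP, parameter values, and function names below are illustrative assumptions, not anything specified by the survey.

```python
import random

N_STATES = 5          # hypothetical chain MDP: states 0..4, state 4 terminal
ACTIONS = (-1, +1)    # step left or step right
GAMMA, ALPHA = 0.9, 0.1

def step(s, a):
    """Deterministic transition; reward 1 on reaching the terminal state."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

def q_learning(episodes=5000, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            ai = rng.randrange(2)                # behavior policy: uniform random
            s2, r, done = step(s, ACTIONS[ai])
            target = r if done else r + GAMMA * max(Q[s2])  # greedy target policy
            Q[s][ai] += ALPHA * (target - Q[s][ai])          # off-policy backup
            s = s2
    return Q

Q = q_learning()
# Greedy policy recovered from Q: it steps right (action index 1) everywhere,
# even though the data was gathered by a random policy.
greedy = [max(range(2), key=lambda ai: Q[s][ai]) for s in range(N_STATES - 1)]
```

The same decoupling of behavior and target policies is what lets a single stream of experience serve multiple policies in the multiple-policy setting the excerpt describes.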
[...]
...…solving such problems, either by planning given a model of the MDP (e.g., via dynamic programming methods, Bellman, 1957b) or by learning through interaction with an unknown MDP (e.g., via temporal-difference methods, Sutton & Barto, 1998), is an important challenge in artificial intelligence....
[...]
...Note that the Bellman equation, which forms the heart of most standard solution algorithms such as dynamic programming (Bellman, 1957b) and temporal-difference methods (Sutton & Barto, 1998), explicitly relies on the assumption of additive returns....
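The additivity assumption the excerpt mentions can be made explicit. Written for a single-objective MDP with transition function $T$ and reward function $R$ (standard notation, assumed here rather than quoted from the survey), the return is a discounted *sum* of rewards, and the Bellman equation decomposes that sum recursively:

```latex
V^\pi(s) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s, \pi\right]
         = \sum_{a} \pi(a \mid s) \sum_{s'} T(s' \mid s, a)
           \left[ R(s, a, s') + \gamma\, V^\pi(s') \right]
```

The recursive step is valid only because the return is additive: the tail of the sum from $t\!=\!1$ onward is itself a return, which is what dynamic programming and temporal-difference methods exploit.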
[...]