Reinforcement Learning: An Introduction
Citations
[...]
38,208 citations
23,074 citations
14,635 citations
Cites background from "Reinforcement Learning: An Introduc..."
...Such NNs learn to perceive/encode/predict/classify patterns or pattern sequences, but they do not learn to act in the more general sense of Reinforcement Learning (RL) in unknown environments (see surveys, e.g., Kaelbling et al., 1996; Sutton & Barto, 1998; Wiering & van Otterlo, 2012)....
[...]
...The latter is often explained in a probabilistic framework (e.g., Sutton & Barto, 1998), but its basic idea can already be conveyed in a deterministic setting....
[...]
...Such NNs learn to perceive / encode / predict / classify patterns or pattern sequences, but they do not learn to act in the more general sense of Reinforcement Learning (RL) in unknown environments (e.g., Kaelbling et al., 1996; Sutton and Barto, 1998)....
[...]
...Many variants of traditional RL exist (e.g., Barto et al., 1983; Watkins, 1989; Watkins and Dayan, 1992; Moore and Atkeson, 1993; Schwartz, 1993; Baird, 1994; Rummery and Niranjan, 1994; Singh, 1994; Baird, 1995; Kaelbling et al., 1995; Peng and Williams, 1996; Mahadevan, 1996; Tsitsiklis and van Roy, 1996; Bradtke et al., 1996; Santamaría et al., 1997; Prokhorov and Wunsch, 1997; Sutton and Barto, 1998; Wiering and Schmidhuber, 1998b; Baird and Moore, 1999; Meuleau et al., 1999; Morimoto and Doya, 2000; Bertsekas, 2001; Brafman and Tennenholtz, 2002; Abounadi et al., 2002; Lagoudakis and Parr, 2003; Sutton et al., 2008; Maei and Sutton, 2010)....
[...]
...This assumption does not hold in the broader fields of Sequential Decision Making and Reinforcement Learning (RL) (Kaelbling et al., 1996; Sutton and Barto, 1998; Hutter, 2005) (Sec....
[...]
14,377 citations
10,141 citations
References
325 citations
324 citations
319 citations
"Reinforcement Learning: An Introduc..." refers to methods in this paper
...At about the same time as Samuel's work, Bellman and Dreyfus (1959) proposed using function approximation methods with DP....
[...]
...) There is now a fairly extensive literature on function approximation methods and DP, such as multigrid methods and methods using splines and orthogonal polynomials (e.g., Bellman and Dreyfus, 1959; Bellman, Kalaba, and Kotkin, 1973; Daniel, 1976; Whitt, 1978; Reetz, 1977; Schweitzer and Seidmann, 1985; Chow and Tsitsiklis, 1991; Kushner and Dupuis, 1992; Rust, 1996)....
[...]
...Dynamic programming has been extensively developed in the last four decades, including extensions to partially observable MDPs (surveyed by Lovejoy, 1991), many applications (surveyed by White, 1985, 1988, 1993), approximation methods (surveyed by Rust, 1996), and asynchronous methods (Bertsekas, 1982, 1983). Many excellent modern treatments of dynamic programming are available (e.g., Bertsekas, 1995; Puterman, 1994; Ross, 1983; and Whittle, 1982, 1983). Bryson (1996) provides a detailed authoritative history of optimal control....
[...]
319 citations
"Reinforcement Learning: An Introduc..." refers to background in this paper
...The term associative reinforcement learning has also been used for associative search (Barto and Anandan, 1985), but we prefer to reserve that term as a synonym for the full reinforcement learning problem (as in Sutton, 1984)....
[...]
...that we and colleagues accomplished was directed toward showing that reinforcement learning and supervised learning were indeed different (Barto, Sutton, and Brouwer, 1981; Barto and Sutton, 1981b; Barto and Anandan, 1985)....
[...]
314 citations
"Reinforcement Learning: An Introduc..." refers to background in this paper
...1–2 Most of the specific material from these sections is from Sutton (1988), including the TD(0) algorithm, the random walk example, and the term “temporal-difference learning....
[...]