Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
Citations
37,989 citations
Cites methods from "Generalization in Reinforcement Lea..."
...The two left panels are applications to simple continuous-state control tasks using the Sarsa(λ) algorithm and tile coding, with either replacing or accumulating traces (Sutton, 1996)....
[...]
...Tile coding has been used in many reinforcement learning systems (e.g., Shewchuk and Dean, 1990; Lin and Kim, 1991; Miller, Scalera, and Kim, 1994; Sofge and White, 1992; Tham, 1994; Sutton, 1996; Watkins, 1989) as well as in other types of learning control systems (e....
[...]
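The excerpts above describe tile coding, the CMAC-style sparse coarse coding of the paper's title, used as the feature representation for Sarsa(λ). As a rough illustration only, not code from the cited paper, the Python sketch below shows one way a tile coder can map a 2-D continuous state to a small set of active tile indices; the function name, the normalization of the state to [0, 1), and the choice of 8 tilings of 8x8 tiles are illustrative assumptions.

# Minimal tile-coding sketch (CMAC-style sparse coarse coding), assuming a
# 2-D continuous state already scaled to [0, 1) per dimension.
import numpy as np

def tile_indices(state, num_tilings=8, tiles_per_dim=8):
    """Return one active tile index per tiling for a 2-D state in [0, 1)^2."""
    state = np.asarray(state, dtype=float)
    tiles_per_tiling = tiles_per_dim ** len(state)
    indices = []
    for t in range(num_tilings):
        # Each tiling is shifted by a different fraction of a tile width.
        offset = t / num_tilings / tiles_per_dim
        coords = np.floor((state + offset) * tiles_per_dim).astype(int)
        coords = np.clip(coords, 0, tiles_per_dim - 1)
        # Flatten (row, col) into a single index within this tiling's block.
        flat = coords[0] * tiles_per_dim + coords[1]
        indices.append(t * tiles_per_tiling + flat)
    return indices

# Nearby states share most, but not all, active tiles:
print(tile_indices([0.31, 0.62]))
print(tile_indices([0.33, 0.60]))

Because nearby states share most of their active tiles, updates generalize locally while only a handful of weights change per step, which is the sparsity the excerpts refer to.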
6,895 citations
Cites background from "Generalization in Reinforcement Lea..."
...Sutton (1996) shows how modified versions of Boyan and Moore's examples can converge successfully....
[...]
...…more efficient (Cichosz & Mulawka, 1995) and on changing the definition to make TD(λ) more consistent with the certainty-equivalent method (Singh & Sutton, 1996), which is discussed in Section 5.1. 4.2 Q-learning: The work of the two components of AHC can be accomplished in a unified manner…...
[...]
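The excerpt above, from a survey covering TD(λ), the adaptive heuristic critic (AHC), and Q-learning, refers to the one-step Q-learning rule. Below is a minimal tabular sketch for context; the environment interface (env.reset, env.step, env.actions) is an assumption for illustration, not part of the cited papers.

# Tabular one-step Q-learning sketch with an epsilon-greedy behavior policy.
import random
from collections import defaultdict

def q_learning_episode(env, Q, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Run one episode, updating the action-value table Q in place."""
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if random.random() < epsilon:
            action = random.choice(env.actions)
        else:
            action = max(env.actions, key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(action)
        # Off-policy target: greedy value of the next state.
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

Q = defaultdict(float)  # unseen state-action pairs start at 0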
5,970 citations
1,405 citations
Additional excerpts
...Traditional reinforcement-learning algorithms for control, such as SARSA learning (Rummery and Niranjan, 1994; Sutton, 1996) and Q-learning (Watkins, 1989), lack any stability or convergence guarantees when combined with most forms of value-function approximation....
[...]
1,175 citations
Cites background from "Generalization in Reinforcement Lea..."
...In a discrete-time SMDP [26] decisions can be made only at (positive) integer multiples of an underlying time step....
[...]
...Avoid the exhaustive sweeps of DP by restricting computation to states on, or in the neighborhood of, multiple sample trajectories, either real or simulated....
[...]
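For context on the discrete-time SMDP setting mentioned above, where a chosen action or option may run for several underlying time steps before the next decision point, a common backup discounts once per elapsed step. The sketch below is a generic SMDP-style Q-learning update under that assumption, not the cited paper's algorithm; the names and the per-step reward list are illustrative.

# SMDP-style Q-learning backup; Q is assumed to be dict-like (e.g. defaultdict(float)).
def smdp_q_update(Q, state, option, rewards, next_state, options,
                  alpha=0.1, gamma=0.99):
    """One backup for a decision that ran tau = len(rewards) underlying steps."""
    tau = len(rewards)
    # Return accumulated over the option's duration, discounted per step.
    R = sum((gamma ** k) * r for k, r in enumerate(rewards))
    best_next = max(Q[(next_state, o)] for o in options)
    # Bootstrapped value is discounted by gamma**tau, one factor per elapsed step.
    Q[(state, option)] += alpha * (R + gamma ** tau * best_next - Q[(state, option)])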
References
4,916 citations
"Generalization in Reinforcement Lea..." refers methods in this paper
...Reinforcement learning is a broad class of optimal control methods based on estimating value functions from experience, simulation, or search (Barto, Bradtke & Singh, 1995; Sutton, 1988; Watkins, 1989)....
[...]
...CMACs have been widely used in conjunction with reinforcement learning systems (e.g., Watkins, 1989; Lin & Kim, 1991; Dean, Basye & Shewchuk, 1992; Tham, 1994). [...] and Moore, we found robust good performance on all tasks....
[...]
...To apply the sarsa algorithm to tasks with a continuous state space, we combined it with a sparse, coarse-coded function approximator known as the CMAC (Albus, 1980; Miller, Gordon & Kraft, 1990; Watkins, 1989; Lin & Kim, 1991; Dean et al., 1992; Tham, 1994)....
[...]
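The excerpt above describes combining the Sarsa algorithm with a sparse, coarse-coded (CMAC) function approximator. As a hedged sketch of how such a step is commonly implemented, not a transcription of the paper's code, the update below uses a linear value function over binary tile features with replacing eligibility traces; the feature lists, array sizes, and parameter values are illustrative, and in practice the action is folded into the feature indices (e.g. one block of tiles per action).

# One Sarsa(lambda) step with linear function approximation over sparse
# binary (tile-coded) features and replacing traces.
import numpy as np

def sarsa_lambda_step(w, z, active_features, reward, next_features,
                      alpha=0.1, gamma=1.0, lam=0.9, terminal=False):
    """Update weights w and eligibility traces z in place.
    active_features / next_features are lists of active tile indices for the
    current and next state-action pair."""
    q = w[active_features].sum()                      # value of current pair (binary features)
    q_next = 0.0 if terminal else w[next_features].sum()
    delta = reward + gamma * q_next - q               # TD error
    z *= gamma * lam                                  # decay all traces
    z[active_features] = 1.0                          # replacing traces (use += 1.0 for accumulating)
    w += alpha * delta * z
    if terminal:
        z[:] = 0.0                                    # clear traces at episode end

# One weight and one trace per tile, sized to match the tile coder in use.
num_tiles = 8 * 8 * 8
w = np.zeros(num_tiles)
z = np.zeros(num_tiles)

The single line setting the traces is where the replacing/accumulating distinction from the earlier excerpt shows up; everything else in the update is the same in both variants.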
4,803 citations
3,736 citations
"Generalization in Reinforcement Lea..." refers background in this paper
...The acrobot is a two-link under-actuated robot (Figure 5) roughly analogous to a gymnast swinging on a high bar (DeJong & Spong, 1994; Spong & Vidyasagar, 1989)....
[...]
3,240 citations
1,691 citations