# Reinforcement learning control of robot manipulators in uncertain environments

##### Citations

172 citations

### Cites background from "Reinforcement learning control of r..."

...Shah and Gopal (2009) have presented reinforcement learning control for robot manipulators in uncertain environments....

[...]

28 citations

11 citations

### Cites background from "Reinforcement learning control of r..."

...In [74], the tracking performance of reinforcement learning control for a two-link robotic mechanism that has parameter variations and external disturbances is studied....

[...]

4 citations

### Cites background from "Reinforcement learning control of r..."

...Some papers have also compared them [13, 14]....

[...]

2 citations

##### References

40,147 citations

### "Reinforcement learning control of r..." refers methods in this paper


...D. SVM Q-learning. SVM is a new universal learning machine in the framework of structural risk minimization (SRM) [13]....

[...]

...SVM uses a kernel function that satisfies Mercer’s condition [13] to map the input data into a high-dimensional feature space, and then constructs a linear optimal separating hyperplane in that space....

[...]

...SRM has better generalization ability and is superior to the traditional empirical risk minimization (ERM) principle....

[...]

...The membership function parameters used in this paper are the same as in [13]....

[...]

37,989 citations

### "Reinforcement learning control of r..." refers methods in this paper

...In order to explore the set of possible actions and acquire experience through the RL signals, actions are selected using an exploration/exploitation policy (EEP) [11]....
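One common exploration/exploitation policy is ε-greedy selection. The following is a minimal sketch of that idea, not the cited paper's implementation; the function name `epsilon_greedy` and its parameters are hypothetical, and the greedy action is assumed to be the one with the largest q-value.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of q-values.

    With probability `epsilon` a uniformly random action is explored;
    otherwise the action with the largest q-value is exploited.
    All names here are illustrative assumptions.
    """
    if rng.random() < epsilon:
        # Explore: any action index, chosen uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest q-value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Setting `epsilon` to 0 makes the policy purely greedy, while 1 makes it purely random; values in between trade experience gathering against use of the current estimates.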

[...]

...If $u_i^{\dagger}$, the action selected in rule $R_i$, is $\varepsilon$-greedy $u_i$ (where $\varepsilon$-greedy is a function implementing the EEP strategy), while $u_i^{*}$ is the maximizing action, i.e., $q(i, u_i^{*}) = \max_{b \le m} q(i, b)$, then the Q-value for the inferred action $u^k$ is $Q(x^k, u^k) = \frac{\sum_{i=1}^{N} \alpha_i(x^k)\, q(i, u_i^{\dagger})}{\sum_{i=1}^{N} \alpha_i(x^k)}$, and the value of state $x^k$ is $V(x^k) = \frac{\sum_{i=1}^{N} \alpha_i(x^k)\, q(i, u_i^{*})}{\sum_{i=1}^{N} \alpha_i(x^k)}$....
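The excerpt above describes Q and V as firing-strength-weighted averages of per-rule q-values. A small sketch of that computation follows, assuming the rule firing strengths and the q-table are already available as plain lists; the names `fuzzy_q_and_value`, `alpha`, `q`, and `selected` are hypothetical.

```python
def fuzzy_q_and_value(alpha, q, selected):
    """Weighted-average inference over N fuzzy rules.

    alpha[i]    : firing strength of rule i at the current state
    q[i][b]     : q-value of candidate action b in rule i
    selected[i] : index of the action chosen in rule i (e.g. by epsilon-greedy)

    Returns (Q, V): Q uses the selected actions, V uses each rule's
    maximizing action. All names are illustrative assumptions.
    """
    total = sum(alpha)
    if total == 0:
        raise ValueError("no rule fires for this state")
    # Q-value of the inferred action: selected q-values weighted by alpha.
    q_inferred = sum(a * q_i[s] for a, q_i, s in zip(alpha, q, selected)) / total
    # State value: best q-value of each rule weighted by alpha.
    v_state = sum(a * max(q_i) for a, q_i in zip(alpha, q)) / total
    return q_inferred, v_state


# Two rules, two candidate actions each; both rules select action 0.
Q, V = fuzzy_q_and_value([0.25, 0.75], [[1.0, 2.0], [4.0, 0.0]], [0, 0])
print(Q, V)  # 3.25 3.5
```

Because V takes the maximum in every rule, it is never smaller than Q for the same state when the firing strengths are non-negative.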

[...]

...This information is used to calculate the temporal difference (TD) [11] approximation error as $\Delta Q = c + \gamma V(x^{k+1}) - Q(x^k, u^k)$, and the $q$ parameter values are updated as...
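As an illustration, the standard one-step TD error can be computed as below. This sketch assumes the usual form (reinforcement signal plus discounted next-state value minus current Q-value); the function name `td_error` and its parameter names are hypothetical, and the parameter update that the excerpt truncates is deliberately not reproduced.

```python
def td_error(c, gamma, v_next, q_current):
    """One-step temporal difference error.

    c         : reinforcement signal received after the action
    gamma     : discount factor, assumed to lie in [0, 1]
    v_next    : estimated value of the next state
    q_current : Q-value of the action just taken in the current state
    All names are illustrative assumptions.
    """
    return c + gamma * v_next - q_current


print(td_error(c=1.0, gamma=0.5, v_next=4.0, q_current=2.0))  # 1.0
```

A TD error of zero means the current Q-value already agrees with the one-step lookahead estimate, so no correction would be needed for that transition.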

[...]

8,811 citations

4,916 citations

### "Reinforcement learning control of r..." refers background or methods in this paper

...One of the most popular RL approaches is Q-learning [3]....

[...]

...It is an adaptation of Watkins’ Q-learning [3] for FIS, where both the actions and Q-functions are inferred from fuzzy rules....

[...]

437 citations

### "Reinforcement learning control of r..." refers background in this paper

...RL is a computationally simple, direct approach to the adaptive optimal control of nonlinear systems [2]....

[...]