# Hybridization of model-based approach with reinforcement fuzzy system design

05 Jul 2009, pp. 1919-1924

TL;DR: This paper proposes hybridizing a model-based approach with RL for the control of physical systems, where an approximate plant model is available, and reports that using a fuzzy inference system as the function approximator gives a controller that is more robust to plant parameter variations than other function approximators.

Abstract: Reinforcement learning (RL) is a popular paradigm for adaptive learning control of nonlinear systems, and it can work without an explicit model. However, learning from scratch, i.e., without any a priori knowledge, is a daunting undertaking: in a large continuous state space it leads to long training times and an unstable learning process. For physical systems, controller design is very rarely a tabula rasa, because at least an approximate mathematical model of the plant to be controlled is always available. In this paper, our focus is on control applications in which the system to be controlled is a physical system. We propose a method for hybridizing a model-based approach with RL, which is the right solution for such control problems. The superiority of the proposed hybrid approach over a model-free RL system has been established through simulation experiments on the cart-pole balancing benchmark problem. We have used a fuzzy inference system for function approximation; it can deal with a continuous action space in Q-learning. Comparison with other function approximators has shown its superiority in terms of robustness of the controller to parameter variations in the plant.
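To make the remark about fuzzy inference and continuous actions concrete, here is a minimal sketch of generic fuzzy Q-learning with a zero-order Takagi-Sugeno rule base. It is not the paper's controller and does not include the model-based part of the hybrid. The single input (pole angle), the fuzzy set centers, the candidate forces, and all names such as `FuzzyQ` and `firing_strengths` are assumptions made for illustration only.

```python
import random

# Hypothetical rule base: three fuzzy sets over the pole angle (rad)
# and three candidate forces (N) per rule. Illustrative values only.
CENTERS = [-0.2, 0.0, 0.2]
ACTIONS = [-10.0, 0.0, 10.0]


def firing_strengths(theta):
    """Normalized triangular memberships; they sum to 1 inside the range."""
    theta = max(CENTERS[0], min(CENTERS[-1], theta))
    width = CENTERS[1] - CENTERS[0]
    return [max(0.0, 1.0 - abs(theta - c) / width) for c in CENTERS]


class FuzzyQ:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.0):
        # One q-value per (rule, candidate action) pair.
        self.q = [[0.0] * len(ACTIONS) for _ in CENTERS]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, theta):
        """Pick one candidate per rule, then blend them into one continuous action."""
        phi = firing_strengths(theta)
        chosen = []
        for row in self.q:
            if random.random() < self.epsilon:
                chosen.append(random.randrange(len(ACTIONS)))
            else:
                chosen.append(max(range(len(ACTIONS)), key=row.__getitem__))
        action = sum(p * ACTIONS[k] for p, k in zip(phi, chosen))
        q_sa = sum(p * row[k] for p, row, k in zip(phi, self.q, chosen))
        return action, chosen, phi, q_sa

    def update(self, phi, chosen, q_sa, reward, next_theta):
        """Temporal-difference update, distributed over rules by firing strength."""
        phi_next = firing_strengths(next_theta)
        v_next = sum(p * max(row) for p, row in zip(phi_next, self.q))
        delta = reward + self.gamma * v_next - q_sa
        for i, k in enumerate(chosen):
            self.q[i][k] += self.alpha * delta * phi[i]
        return delta
```

Because the output is a firing-strength-weighted blend of per-rule choices, the applied force varies continuously with the state even though each rule holds only a few discrete candidates.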

##### Citations


TL;DR: Two strategies to design robust adaptive fault tolerant control (FTC) systems for a class of unknown n-order nonlinear systems in presence of actuator and sensor faults versus bounded unknown external disturbances are proposed.

Abstract: Highlights: proposing two robust adaptive FTC systems based on machine learning approaches; presenting adaptation laws in the sense of the proposed Lyapunov function; using an intelligent observer for unknown nonlinear systems in the presence of faults; adapting the critic and actor of continuous RL based on the Lyapunov function. This paper proposes two strategies to design robust adaptive fault tolerant control (FTC) systems for a class of unknown n-order nonlinear systems in the presence of actuator and sensor faults versus bounded unknown external disturbances. It is based on machine learning approaches, namely continuous reinforcement learning (RL) and neural networks (NNs). In the first FTC strategy, an intelligent observer is designed for unknown nonlinear systems, whether or not faults occur. In the second strategy, a robust reinforcement learning FTC is proposed by combining reinforcement learning, to treat the unknown nonlinear faulty system, with nonlinear control theory, to guarantee the stability and robustness of the system. The critic and actor of continuous RL are adapted based on the behavior of the defined Lyapunov function. In both strategies, a Gaussian radial basis function is used to generate the residual through online estimation of the unknown dynamic function of the normal system. The adaptation law of the online estimator is derived in the sense of a Lyapunov function, which is defined based on the adjustable parameters of the estimator and on switching surfaces containing dynamic errors and residuals. Simulation results demonstrate the validity and feasibility of the proposed FTC systems.

11 citations


TL;DR: A model-free tire slip control solution for a fast, highly nonlinear Anti-lock Braking System (ABS) via a reinforcement Q-learning optimal control approach tailored around a batch neural fitted scheme using two neural networks to approximate the value function and the controller, respectively.

Abstract: A model-free tire slip control solution for a fast, highly nonlinear Anti-lock Braking System (ABS) is proposed in this work via a reinforcement Q-learning optimal control approach. The solution is tailored around a batch neural fitted scheme using two neural networks to approximate the value function and the controller, respectively. Transition samples are collected online by running the current-iteration controller (or policy) on the process under an ε-greedy exploration strategy. The ABS process fits this type of learning-by-interaction since it does not need an initial stabilizing controller. The validation case studies carried out on a real laboratory setup reveal that high control system performance can be achieved after several tens of interaction episodes with the controlled process. Insightful comments on the observed control behavior in a set of real-time experiments are offered along with performance comparisons with several other controllers.

2 citations
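The ε-greedy exploration strategy named in this abstract can be sketched in a few lines. This is the textbook rule over a finite set of candidate actions, not the paper's implementation; the discretized action set and the name `epsilon_greedy` are assumptions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy.

    q_values is a list of estimated action values for the current state
    (a hypothetical discretized action set).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest estimate (first one on ties).
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With `epsilon=0.0` the rule is purely greedy, and with `epsilon=1.0` it samples actions uniformly.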

##### References


TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.

32,257 citations


TL;DR: A least squares version for support vector machine (SVM) classifiers that follows from solving a set of linear equations, instead of quadratic programming for classical SVMs.

Abstract: In this letter we discuss a least squares version for support vector machine (SVM) classifiers. Due to equality type constraints in the formulation, the solution follows from solving a set of linear equations, instead of quadratic programming for classical SVMs. The approach is illustrated on a two-spiral benchmark classification problem.

7,819 citations
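For orientation, the "set of linear equations" can be written out in the form in which the least squares SVM classifier is usually stated. The notation is an assumption introduced here, not taken from this page: $N$ training pairs $(x_k, y_k)$ with $y_k \in \{-1, +1\}$, kernel $K$, regularization constant $\gamma > 0$, and $\Omega_{kl} = y_k y_l K(x_k, x_l)$.

$$
\begin{bmatrix} 0 & y^{\top} \\ y & \Omega + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ 1_N \end{bmatrix},
\qquad
\hat{y}(x) = \operatorname{sign}\!\left( \sum_{k=1}^{N} \alpha_k y_k K(x, x_k) + b \right)
$$

Solving this $(N+1) \times (N+1)$ system yields the bias $b$ and the support values $\alpha$ directly, which is what replaces the quadratic program of the classical SVM.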


TL;DR: This article shows how a system consisting of two neuronlike adaptive elements can solve a difficult learning control problem, where the task is to balance a pole that is hinged to a movable cart by applying forces to the cart's base.

Abstract: It is shown how a system consisting of two neuronlike adaptive elements can solve a difficult learning control problem. The task is to balance a pole that is hinged to a movable cart by applying forces to the cart's base. It is argued that the learning problems faced by adaptive elements that are components of adaptive networks are at least as difficult as this version of the pole-balancing problem. The learning system consists of a single associative search element (ASE) and a single adaptive critic element (ACE). In the course of learning to balance the pole, the ASE constructs associations between input and output by searching under the influence of reinforcement feedback, and the ACE constructs a more informative evaluation function than reinforcement feedback alone can provide. The differences between this approach and other attempts to solve problems using neuronlike elements are discussed, as is the relation of this work to classical and instrumental conditioning in animal learning studies and its possible implications for research in the neurosciences.

3,112 citations
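The pole-balancing task described here can be simulated with a few lines of code. The sketch below uses the frictionless form of the cart-pole equations with Euler integration, so it is a simplification and not the exact simulation of any paper on this page. The parameter values and failure limits are the ones commonly used for this benchmark and are assumptions here, as are the names `cartpole_step` and `failed`.

```python
import math

# Commonly used benchmark values (assumed, not taken from this page).
GRAVITY = 9.8    # m/s^2
M_CART = 1.0     # kg
M_POLE = 0.1     # kg
HALF_LEN = 0.5   # m, half the pole length
DT = 0.02        # s, integration step


def cartpole_step(state, force):
    """Advance (x, x_dot, theta, theta_dot) by one Euler step, ignoring friction."""
    x, x_dot, theta, theta_dot = state
    total_mass = M_CART + M_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + M_POLE * HALF_LEN * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LEN * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total_mass)
    )
    x_acc = temp - M_POLE * HALF_LEN * theta_acc * cos_t / total_mass
    return (
        x + DT * x_dot,
        x_dot + DT * x_acc,
        theta + DT * theta_dot,
        theta_dot + DT * theta_acc,
    )


def failed(state):
    """Failure signal: cart leaves the track or pole falls past 12 degrees."""
    x, _, theta, _ = state
    return abs(x) > 2.4 or abs(theta) > math.radians(12)
```

A learning controller would call `cartpole_step` once per decision and treat `failed` as the only reinforcement signal, which is why the task is considered a hard credit-assignment problem.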


TL;DR: It is concluded that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.

Abstract: On large problems, reinforcement learning systems must use parameterized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases there are no strong theoretical results on the accuracy of convergence, and computational results have been mixed. In particular, Boyan and Moore reported at last year's meeting a series of negative results in attempting to apply dynamic programming together with function approximation to simple control problems with continuous state spaces. In this paper, we present positive results for all the control tasks they attempted, and for one that is significantly larger. The most important differences are that we used sparse-coarse-coded function approximators (CMACs) whereas they used mostly global function approximators, and that we learned online whereas they learned offline. Boyan and Moore and others have suggested that the problems they encountered could be solved by using actual outcomes ("rollouts"), as in classical Monte Carlo methods, and as in the TD(λ) algorithm when λ = 1. However, in our experiments this always resulted in substantially poorer performance. We conclude that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.

1,171 citations
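The two ingredients this abstract contrasts, sparse coarse coding and the choice of λ, can be illustrated with a small sketch: a one-dimensional tile coder and a linear TD(λ) prediction update with accumulating eligibility traces. It is a simplified illustration, not the paper's control algorithm, and the function names and the tiling layout are assumptions.

```python
def active_tiles(x, low, high, n_tiles, n_tilings):
    """Return one active feature index per tiling for a scalar input x.

    Each tiling is shifted by a fraction of a tile width, so nearby inputs
    share some but not all features (coarse coding).
    """
    width = (high - low) / n_tiles
    indices = []
    for k in range(n_tilings):
        offset = k * width / n_tilings
        tile = int((x - low + offset) / width)
        tile = min(max(tile, 0), n_tiles)  # n_tiles + 1 tiles per tiling
        indices.append(k * (n_tiles + 1) + tile)
    return indices


def td_lambda_episode(features, rewards, weights, alpha, gamma, lam):
    """Run linear TD(lambda) over one episode and return the new weights.

    features[t] lists the active binary features of state t; the final
    entry is the terminal state and should be an empty list.
    """
    w = list(weights)
    trace = [0.0] * len(w)
    for t, reward in enumerate(rewards):
        v = sum(w[i] for i in features[t])
        v_next = sum(w[i] for i in features[t + 1])
        delta = reward + gamma * v_next - v
        trace = [gamma * lam * z for z in trace]  # decay all traces
        for i in features[t]:
            trace[i] += 1.0                        # accumulate active ones
        w = [wi + alpha * delta * z for wi, z in zip(w, trace)]
    return w
```

Setting `lam=1.0` lets the final reward update every visited state, as a Monte Carlo rollout would, while `lam=0.0` updates only the most recent state; intermediate values trade off between the two.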


TL;DR: The generalized approximate-reasoning-based intelligent control (GARIC) architecture learns and tunes a fuzzy logic controller even when only weak reinforcement is available; introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; and learns to produce real-valued control actions.

Abstract: A method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. It is shown that the generalized approximate-reasoning-based intelligent control (GARIC) architecture: learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.

956 citations
