Hybridization of model-based approach with reinforcement fuzzy system design
05 Jul 2009 - pp. 1919-1924
TL;DR: This paper proposes a method for hybridizing a model-based approach with RL, argues that this hybridization is well suited to control of physical plants, and demonstrates superior robustness of the resulting controller to parameter variations in the plant.
Abstract: Reinforcement learning (RL) is a popular learning paradigm for adaptive learning control of nonlinear systems, and is able to work without an explicit model. However, learning from scratch, i.e., without any a priori knowledge, is a daunting undertaking, which results in long training times and instability of the learning process in large continuous state spaces. For physical systems, one must consider that controller design is very rarely a tabula rasa: some approximate mathematical model of the system is always available. In this paper, our focus is on control applications wherein the system to be controlled is a physical system, so at least an approximate mathematical model of the plant can always be obtained. We propose a method for hybridization of the model-based approach with RL, which is well suited to such control problems. The superiority of the proposed hybrid approach has been established through simulation experiments on the cart-pole balance benchmark problem, comparing it with a model-free RL system. We use a fuzzy inference system for function approximation; it can deal with continuous action spaces in Q-learning. Comparison with other function approximators has shown its superiority in terms of robustness of the controller to parameter variations in the plant.
01 Dec 2015
TL;DR: Two strategies are proposed to design robust adaptive fault tolerant control (FTC) systems for a class of unknown n-order nonlinear systems in the presence of actuator and sensor faults and bounded unknown external disturbances.
Abstract: Highlights: proposing two robust adaptive FTC systems based on machine learning approaches; presenting adaptation laws in the sense of the proposed Lyapunov function; using an intelligent observer for unknown nonlinear systems in the presence of faults; adapting the critic and actor of continuous RL based on the Lyapunov function. This paper proposes two strategies to design robust adaptive fault tolerant control (FTC) systems for a class of unknown n-order nonlinear systems in the presence of actuator and sensor faults and bounded unknown external disturbances. It is based on machine learning approaches, namely continuous reinforcement learning (RL) and neural networks (NNs). In the first FTC strategy, an intelligent observer is designed for unknown nonlinear systems whether or not faults occur. In the second strategy, a robust reinforcement learning FTC is proposed by combining reinforcement learning, to treat the unknown nonlinear faulty system, with nonlinear control theory, to guarantee the stability and robustness of the system. The critic and actor of continuous RL are adapted based on the behavior of the defined Lyapunov function. In both strategies, to generate the residual, a Gaussian radial basis function is used for online estimation of the unknown dynamic function of the normal system. The adaptation law of the online estimator is derived in the sense of a Lyapunov function defined on the adjustable parameters of the estimator and on switching surfaces containing dynamic errors and residuals. Simulation results demonstrate the validity and feasibility of the proposed FTC systems.
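The residual-generation step described above can be illustrated with an online Gaussian RBF estimator of an unknown dynamics term, driven by a gradient-type adaptation law. This is a simplified stand-in for the paper's Lyapunov-derived law; the gains, basis placement, and scalar state are assumptions.

```python
# Hedged sketch: online RBF estimation of an unknown scalar dynamics term.
# The estimation error plays the role of the residual used for fault
# detection; the adaptation law here is a simple gradient rule, a
# simplification of the paper's Lyapunov-based derivation.
import numpy as np

class RBFEstimator:
    def __init__(self, centers, width=0.5, gain=2.0):
        self.centers = np.asarray(centers)        # Gaussian basis centers
        self.width = width
        self.theta = np.zeros(len(self.centers))  # adjustable weights
        self.gain = gain                          # adaptation gain

    def phi(self, x):
        # Gaussian radial basis functions evaluated at state x
        return np.exp(-((x - self.centers) ** 2) / (2 * self.width ** 2))

    def estimate(self, x):
        return float(self.theta @ self.phi(x))

    def adapt(self, x, error, dt=0.01):
        # theta_dot = gain * error * phi(x): weights move so the estimate
        # tracks the unknown function and the residual decays
        self.theta += self.gain * error * self.phi(x) * dt
```

Run long enough at a given operating point, the estimate converges and the residual stays near zero for the healthy system, so a persistent residual signals a fault.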
Cites background from "Hybridization of model-based approa..."
...In the literature, a number of research results have been reported for applying RL in the continuous state and action space [15-17]....
19 Jun 2017
TL;DR: A model-free tire slip control solution for a fast, highly nonlinear Anti-lock Braking System (ABS) via a reinforcement Q-learning optimal control approach, tailored around a batch neural fitted scheme that uses two neural networks to approximate the value function and the controller, respectively.
Abstract: A model-free tire slip control solution for a fast, highly nonlinear Anti-lock Braking System (ABS) is proposed in this work via a reinforcement Q-learning optimal control approach. The solution is tailored around a batch neural fitted scheme using two neural networks to approximate the value function and the controller, respectively. The transition samples are collected from the process through interaction, online, by exploiting the current iteration's controller (or policy) under an ε-greedy exploration strategy. The ABS process fits this type of learning-by-interaction since it does not need an initial stabilizing controller. The validation case studies carried out on a real laboratory setup reveal that high control system performance can be achieved after several tens of interaction episodes with the controlled process. Insightful comments on the observed control behavior in a set of real-time experiments are offered, along with performance comparisons with several other controllers.
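The batch scheme above alternates ε-greedy sample collection with a regression step that refits the value approximator on the whole batch. The paper uses two neural networks on the ABS rig; the sketch below substitutes a linear model on hand-picked features and a toy 1-D process so it stays dependency-free, and all dynamics, features, and gains are assumptions.

```python
# Simplified sketch of batch fitted Q-iteration with eps-greedy
# collection (linear stand-in for the paper's neural networks; the toy
# process and features are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(0)

def features(s, a):
    return np.array([1.0, s, a, s * a, s * s])

def collect(policy_w, n=200, eps=0.3):
    """Interact with a toy stable 1-D process s' = 0.9 s + 0.1 a + noise."""
    samples, s = [], 0.0
    for _ in range(n):
        if rng.random() < eps:
            a = rng.choice([-1.0, 1.0])          # explore
        else:                                    # exploit current policy
            a = max([-1.0, 1.0], key=lambda u: features(s, u) @ policy_w)
        s_next = 0.9 * s + 0.1 * a + 0.01 * rng.standard_normal()
        r = -s_next ** 2                         # drive the state to zero
        samples.append((s, a, r, s_next))
        s = s_next
    return samples

def fitted_q(iters=10, gamma=0.95):
    w = np.zeros(5)
    for _ in range(iters):
        batch = collect(w)
        # regression targets: Bellman backups over the whole batch
        X = np.array([features(s, a) for s, a, _, _ in batch])
        y = np.array([r + gamma * max(features(sn, u) @ w for u in (-1.0, 1.0))
                      for _, _, r, sn in batch])
        w, *_ = np.linalg.lstsq(X, y, rcond=None)  # batch refit step
    return w
```

The key property the abstract highlights carries over: no initial stabilizing controller is needed, since early episodes are dominated by exploration and the fit improves from the collected batches.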
TL;DR: The CQGAF fulfills GA-based fuzzy system design in a reinforcement learning environment where only weak reinforcement signals such as "success" and "failure" are available.
Abstract: This paper proposes a combination of online clustering and a Q-value based genetic algorithm (GA) learning scheme for fuzzy system design (CQGAF) with reinforcements. The CQGAF fulfills GA-based fuzzy system design in a reinforcement learning environment where only weak reinforcement signals such as "success" and "failure" are available. In CQGAF, there are no fuzzy rules initially; they are generated automatically. The precondition part of a fuzzy system is constructed online by an aligned clustering-based approach. By this clustering, a flexible partition is achieved. Then, the consequent part is designed by Q-value based genetic reinforcement learning. Each individual in the GA population encodes the consequent-part parameters of a fuzzy system and is associated with a Q-value. The Q-value estimates the discounted cumulative reinforcement achieved by the individual and is used as a fitness value for GA evolution. At each time step, an individual is selected according to the Q-values, and then the corresponding fuzzy system is built and applied to the environment, and a critic signal is received. With this critic, Q-learning with an eligibility trace is executed. After each trial, the GA searches for better consequent parameters based on the learned Q-values. Thus, in CQGAF, evolution is performed immediately after the end of one trial, in contrast to general GA where many trials are performed before evolution. The feasibility of CQGAF is demonstrated through simulations on cart-pole balancing, magnetic levitation, and chaotic system control problems with only binary reinforcement signals.
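The core CQGAF loop, a trial updating the selected individual's Q-value followed immediately by a GA step that treats Q-values as fitness, can be sketched as a toy. The environment, selection scheme, and mutation scale below are illustrative assumptions, and the clustering-based precondition learning is omitted.

```python
# Toy sketch of the CQGAF idea: each GA individual encodes fuzzy-rule
# consequent parameters and carries a Q-value used as its GA fitness.
# The stand-in environment and all constants are assumptions.
import numpy as np

rng = np.random.default_rng(1)

def run_trial(consequents):
    # stand-in environment: reward grows as the consequents approach a
    # target vector the learner does not observe directly
    target = np.array([0.5, -0.2, 0.1])
    return -np.sum((consequents - target) ** 2)

def cqgaf_step(pop, q_vals, alpha=0.5, sigma=0.1):
    # select the greedy individual by Q-value, run one trial, update its Q
    i = int(np.argmax(q_vals))
    r = run_trial(pop[i])
    q_vals[i] += alpha * (r - q_vals[i])
    # GA immediately after the trial: keep the two fittest by Q,
    # recombine and mutate to refill the population
    order = np.argsort(q_vals)[::-1]
    parents = pop[order[:2]]
    children = [parents[rng.integers(2)] + sigma * rng.standard_normal(3)
                for _ in range(len(pop) - 2)]
    new_pop = np.vstack([parents, children])
    # new children start from a pessimistic Q so they must earn selection
    new_q = np.concatenate([q_vals[order[:2]],
                            np.full(len(children), q_vals.min())])
    return new_pop, new_q
```

This mirrors the abstract's contrast with a conventional GA: evolution happens after every trial, using the running Q-values rather than waiting for many fitness-evaluation trials per generation.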
28 Jun 1993
TL;DR: This paper discusses approaches which combine fuzzy controllers and neural networks, and presents their own hybrid architecture where principles from fuzzy control theory and from neural networks are integrated into one system.
Abstract: Fuzzy controllers are designed to work with knowledge in the form of linguistic control rules. But the translation of these linguistic rules into the framework of fuzzy set theory depends on the choice of certain parameters, for which no formal method is known. The optimization of these parameters can be carried out by neural networks, which are designed to learn from training data, but which are in general not able to profit from structural knowledge. In this paper we discuss approaches which combine fuzzy controllers and neural networks, and present our own hybrid architecture where principles from fuzzy control theory and from neural networks are integrated into one system.
TL;DR: The application to a highly nonlinear chemical reactor and to an unstable multi-output system shows the ability of the proposed neural control architecture to learn even difficult control strategies from scratch.
Abstract: The paper presents the concepts of a neural control architecture that is able to learn high-quality control behaviour in technical process control from scratch. As the input to the learning system, only the control target must be specified. In the first part of the article, the underlying theoretical principles of dynamic programming methods are explained, and their adaptation to the context of technical process control is described. The second part discusses the basic capabilities of the learning system on a typical benchmark problem, with a special focus on the quality of the acquired control law. The application to a highly nonlinear chemical reactor and to an unstable multi-output system shows the ability of the proposed neural control architecture to learn even difficult control strategies from scratch.
TL;DR: A neural reinforcement learning method is proposed that needs no model of the system and can be trained directly from measurements of the robot, using a single function approximator from which the improved policy is derived.
Abstract: In this paper we introduce a novel neural reinforcement learning method. Unlike existing methods, our approach does not need a model of the system and can be trained directly using measurements of the system. We achieve this by using only one function approximator and approximating the improved policy from it. An experiment using a mobile robot shows that it can be trained on a real system within a reasonable time.
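The single-approximator idea above boils down to deriving the improved (greedy) policy directly from the learned Q-function rather than maintaining a separate actor. A minimal sketch, with a linear stand-in for the paper's neural approximator and an assumed discrete action set:

```python
# Minimal sketch: improved policy derived from a single Q-function
# approximator (linear stand-in and action set are assumptions, not the
# paper's architecture).
import numpy as np

def q_approx(w, s, a):
    # stand-in function approximator: linear in simple features of (s, a)
    return w @ np.array([1.0, s, a, s * a])

def greedy_policy(w, s, actions=(-1.0, 0.0, 1.0)):
    # improved policy: the action maximizing the approximated Q at state s
    return max(actions, key=lambda a: q_approx(w, s, a))
```

Because the policy is recomputed from the current Q-approximator at every state, improving the single approximator from measured transitions is enough to improve the controller.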