Hybridization of model-based approach with reinforcement fuzzy system design
05 Jul 2009, pp. 1919-1924
TL;DR: This paper proposes a method for hybridization of model-based approach with RL, which is the right solution for such control problems and shows superiority in terms of robustness of the controller to parameter variations in the plant.
Abstract: Reinforcement learning (RL) is a popular learning paradigm for adaptive learning control of nonlinear systems, and is able to work without an explicit model. However, learning from scratch, i.e., without any a priori knowledge, is a daunting undertaking, which results in long training time and instability of the learning process with large continuous state space. For physical systems, one must consider that the design of a controller is very rarely a tabula rasa: some approximate mathematical model of the system is always available. In this paper, our focus is on control applications wherein the system to be controlled is a physical system. We can always obtain at least an approximate mathematical model of the plant to be controlled. We propose a method for hybridization of model-based approach with RL, which is the right solution for such control problems. The superiority of the proposed hybrid approach has been established through simulation experiments on a cart-pole balance benchmark problem, comparing it with a model-free RL system. We have used a fuzzy inference system for function approximation; it can deal with continuous action space in Q-learning. Comparison with other function approximators has shown its superiority in terms of robustness of the controller to parameter variations in the plant.
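The abstract does not spell out how a fuzzy inference system yields a continuous action in Q-learning. The following is a minimal sketch of one common formulation, not the authors' implementation: each fuzzy rule keeps its own q-values over a few candidate actions, and the applied action is the firing-strength-weighted blend of the per-rule choices. The single input (a pole angle), the membership functions in `RULE_SETS`, and the candidate forces in `ACTIONS` are all assumptions made for illustration.

```python
import random

def triangular(x, left, center, right):
    """Triangular membership degree of x, peaking at center."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

# Hypothetical setup: one input (pole angle, rad) covered by three fuzzy rules.
RULE_SETS = [(-0.4, -0.2, 0.0), (-0.2, 0.0, 0.2), (0.0, 0.2, 0.4)]
ACTIONS = [-10.0, 0.0, 10.0]  # assumed candidate forces available to every rule

def firing_strengths(x):
    """Normalized firing strength of each rule (all zeros outside the covered range)."""
    w = [triangular(x, *s) for s in RULE_SETS]
    total = sum(w)
    return [wi / total for wi in w] if total > 0 else w

def continuous_action(x, q, epsilon=0.0, rng=random):
    """q[i][j] is the q-value of candidate action j in rule i.

    Each rule picks one candidate (epsilon-greedy); the global action and its
    Q-value are the firing-strength-weighted sums over the rules.
    """
    phi = firing_strengths(x)
    chosen = []
    for qi in q:
        if rng.random() < epsilon:
            chosen.append(rng.randrange(len(ACTIONS)))
        else:
            chosen.append(max(range(len(ACTIONS)), key=lambda j: qi[j]))
    u = sum(p * ACTIONS[j] for p, j in zip(phi, chosen))
    q_value = sum(p * qi[j] for p, qi, j in zip(phi, q, chosen))
    return u, q_value, chosen
```

Because the firing strengths vary smoothly with the input, the blended action varies continuously even though each rule only chooses among a handful of discrete candidates.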
01 Dec 2015
TL;DR: Two strategies to design robust adaptive fault tolerant control (FTC) systems for a class of unknown n-order nonlinear systems in presence of actuator and sensor faults versus bounded unknown external disturbances are proposed.
Abstract: Highlights: proposing two robust adaptive FTC systems based on machine learning approaches; presenting adaptation laws in the sense of the proposed Lyapunov function; using an intelligent observer for unknown nonlinear systems in presence of faults; adapting the critic and actor of continuous RL based on the Lyapunov function. This paper proposes two strategies to design robust adaptive fault tolerant control (FTC) systems for a class of unknown n-order nonlinear systems in presence of actuator and sensor faults versus bounded unknown external disturbances. It is based on machine learning approaches which are continuous reinforcement learning (RL) and neural networks (NNs). In the first FTC strategy, an intelligent observer is designed for unknown nonlinear systems when faults occur or not. In the second strategy, a robust reinforcement learning FTC is proposed through combining reinforcement learning to treat the unknown nonlinear faulty system and nonlinear control theory to guarantee the stability and robustness of the system. Critic and actor of continuous RL are adopted based on the behavior of the defined Lyapunov function. In both strategies, to generate the residual, a Gaussian radial basis function is used for an online estimation of the unknown dynamic function of the normal system. The adaptation law of the online estimator is derived in the sense of Lyapunov function which is defined based on adjustable parameters of the estimator and switching surfaces containing dynamic errors and residuals. Simulation results demonstrate the validity and feasibility of the proposed FTC systems.
Cites background from "Hybridization of model-based approa..."
...In the literature, a number of research results have been reported for applying RL in the continuous state and action space [15-17]....
19 Jun 2017
TL;DR: A model-free tire slip control solution for a fast, highly nonlinear Anti-lock Braking System (ABS) via a reinforcement Q-learning optimal control approach tailored around a batch neural fitted scheme using two neural networks to approximate the value function and the controller, respectively.
Abstract: A model-free tire slip control solution for a fast, highly nonlinear Anti-lock Braking System (ABS) is proposed in this work via a reinforcement Q-learning optimal control approach. The solution is tailored around a batch neural fitted scheme using two neural networks to approximate the value function and the controller, respectively. The transition samples are collected from the process through interaction by online exploiting the current iteration controller (or policy) under an ε-greedy exploration strategy. The ABS process fits this type of learning-by-interaction since it does not need an initial stabilizing controller. The validation case studies carried out on a real laboratory setup reveal that high control system performance can be achieved after several tens of interaction episodes with the controlled process. Insightful comments on the observed control behavior in a set of real-time experiments are offered along with performance comparisons with several other controllers.
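As a rough illustration of collecting transition samples under epsilon-greedy exploration, the sketch below rolls out a policy and stores the resulting tuples for later batch fitting. It is a generic sketch, not the paper's scheme: it assumes a small discrete set of candidate actions, and `q_fn` and `step_fn` are hypothetical stand-ins for the learned value approximator and the real process.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda j: q_values[j])

def collect_transitions(q_fn, step_fn, state, n_steps, epsilon, rng=random):
    """Roll out the current policy and store (s, a, r, s_next) tuples.

    q_fn(state) -> list of action values (hypothetical value approximator)
    step_fn(state, action) -> (reward, next_state) (hypothetical process)
    """
    batch = []
    for _ in range(n_steps):
        action = epsilon_greedy(q_fn(state), epsilon, rng)
        reward, next_state = step_fn(state, action)
        batch.append((state, action, reward, next_state))
        state = next_state
    return batch
```

In a batch fitted scheme, a buffer like `batch` would then be used to refit the approximators between interaction episodes; that fitting step is omitted here.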
02 Sep 1998
TL;DR: In this paper, in addition to reflex rules, environment (domain) knowledge is embedded into the learner and gives leverage to the adaptive state space construction algorithm by splitting key states quickly.
Abstract: In almost all real systems where reinforcement learning is applied, it is found that a knowledge-free approach doesn’t work. The basic RL algorithms must be sufficiently biased to achieve a satisfactory performance within a bounded time. This bias takes different forms. In this paper, in addition to reflex rules, environment (domain) knowledge is embedded into the learner. Environment knowledge gives leverage to the adaptive state space construction algorithm by splitting key states quickly. The learner is tested on a B21 robot for a goal reaching task. Experimental results show that after a few trials the robot has indeed learned the right situation action rules that unfold its path.
01 Sep 1998
TL;DR: For dealing with reactive sequential decision tasks, a learning model Clarion was developed, which is a hybrid connectionist model consisting of both localist and distributed representations, based on the two-level approach proposed in Sun (1995).
Abstract: For dealing with reactive sequential decision tasks, a learning model Clarion was developed, which is a hybrid connectionist model consisting of both localist (symbolic) and distributed representations, based on the two-level approach proposed in Sun (1995). The model learns and utilizes procedural and declarative knowledge, tapping into the synergy of the two types of processes. It unifies neural, reinforcement, and symbolic methods to perform on-line, bottom-up learning (from subsymbolic to symbolic knowledge). Experiments in various situations shed light on the working of the model. Its theoretical implications in terms of symbol grounding are also discussed.
10 Feb 2009
TL;DR: This work investigates here the robust tracking performance of reinforcement learning control of manipulators, subjected to parameter variations and extraneous disturbances, and shows the importance of fuzzy Q-learning control.
Abstract: Considerable attention has been given to the design of stable controllers for robot manipulators, in the presence of uncertainties. We investigate here the robust tracking performance of reinforcement learning control of manipulators, subjected to parameter variations and extraneous disturbances. Robustness properties in terms of average error, absolute maximum errors and absolute maximum control efforts, have been compared for reinforcement learning systems using various parameterized function approximators, such as fuzzy, neural network, decision tree, and support vector machine. Simulation results show the importance of fuzzy Q-learning control. Further improvements in this control approach through dynamic fuzzy Q-learning have also been highlighted.
01 May 2004
TL;DR: The learning control architecture Fynesse provides a unified view onto the integration of prior control knowledge in the reinforcement learning framework and enables autonomous learning of control strategies and the interpretation of learned strategies in terms of fuzzy control rules.
Abstract: Reinforcement learning is an optimisation technique for applications like control or scheduling problems. It is used in learning situations, where success and failure of the system are the only training information. Unfortunately, we have to pay a price for this powerful ability: long training times and the instability of the learning process are not tolerable for industrial applications with large continuous state spaces. From our point of view, the integration of prior knowledge is a key mechanism for making autonomous learning practicable for industrial applications. The learning control architecture Fynesse provides a unified view onto the integration of prior control knowledge in the reinforcement learning framework. In this way, other approaches in this area can be embedded into Fynesse. The key features of Fynesse are (1) the integration of prior control knowledge like linear controllers, control characteristics or fuzzy controllers, (2) autonomous learning of control strategies and (3) the interpretation of learned strategies in terms of fuzzy control rules. The benefits and problems of different methods for the integration of a priori knowledge are demonstrated on empirical studies.
22 Sep 2003
TL;DR: A method to introduce a priori knowledge into reinforcement learning using temporally extended actions and defines a mechanism called the propagation mechanism to get out of blocked situations induced by the initial knowledge constraints.
Abstract: We present in this paper a method to introduce a priori knowledge into reinforcement learning using temporally extended actions. The aim of our work is to reduce the learning time of the Q-learning algorithm. This introduction of initial knowledge is done by constraining the set of available actions in some states. But at the same time, we can formulate that if the agent is in some particular states (called exception states), we have to relax those constraints. We define a mechanism called the propagation mechanism to get out of blocked situations induced by the initial knowledge constraints. We give some formal properties of our method and test it on a complex grid-world task. On this task, we compare our method with Q-learning and show that the learning time is drastically reduced for a very simple initial knowledge which would not be sufficient, by itself, to solve the task without the definition of exception situations and the propagation mechanism.
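The idea of constraining the available actions in some states, and relaxing those constraints in exception states, can be sketched for tabular Q-learning as follows. This is an illustrative sketch only: the action names, the `constraints` mapping, and the `exception_states` set are hypothetical, and neither the temporally extended actions nor the propagation mechanism described in the abstract is modeled.

```python
from collections import defaultdict

ALL_ACTIONS = ("up", "down", "left", "right")  # assumed grid-world action set

def allowed_actions(state, constraints, exception_states):
    """Return the prior-knowledge action subset, or all actions in exception states."""
    if state in exception_states:
        return ALL_ACTIONS
    return constraints.get(state, ALL_ACTIONS)

def q_update(q, s, a, r, s_next, constraints, exception_states,
             alpha=0.5, gamma=0.9):
    """One Q-learning update whose max ranges only over actions allowed in s_next."""
    best_next = max(q[(s_next, b)]
                    for b in allowed_actions(s_next, constraints, exception_states))
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

# Usage: the constraint hides the high-valued "left" action in state (1, 0)
# unless that state is declared an exception state.
q = defaultdict(float)
q[((1, 0), "left")] = 10.0
constraints = {(1, 0): ("right",)}
q_update(q, (0, 0), "right", 1.0, (1, 0), constraints, exception_states=set())
```

Restricting the maximization shrinks the set of actions the agent has to evaluate in constrained states, which is one plausible reason such prior knowledge can shorten learning.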