Hybridization of model-based approach with reinforcement fuzzy system design
05 Jul 2009-pp 1919-1924
TL;DR: This paper proposes a method for hybridization of model-based approach with RL, which is the right solution for such control problems and shows superiority in terms of robustness of the controller to parameter variations in the plant.
Abstract: Reinforcement learning (RL) is a popular learning paradigm to adaptive learning control of nonlinear systems, and is able to work without an explicit model. However, learning from scratch, i.e., without any a priori knowledge, is a daunting undertaking, which results in long training time and instability of learning process with large continuous state space. For physical systems, one must consider that the design of controller is very rarely a tabula rasa: some approximate mathematical model of the system is always available. In this paper, our focus is on control applications wherein the system to be controlled is a physical system. We can always obtain at least an approximate mathematical model of the plant to be controlled. We propose a method for hybridization of model-based approach with RL, which is the right solution for such control problems. The superiority of proposed hybrid approach has been established through simulation experiments on a cart-pole balance bench mark problem, comparing it with model-free RL system. We have used fuzzy inference system for function approximation; it can deal with continuous action space in Q-learning. Comparison with other function approximators has shown its superiority in terms of robustness of the controller to parameter variations in the plant.
01 Dec 2015
TL;DR: Two strategies to design robust adaptive fault tolerant control (FTC) systems for a class of unknown n-order nonlinear systems in presence of actuator and sensor faults versus bounded unknown external disturbances are proposed.
Abstract: Proposing two robust adaptive FTC systems based on machine learning approachesPresenting adaptation laws in the sense of the proposed Lyapunov functionUsing an intelligent observer for unknown nonlinear systems in presence of faultsAdapting the critic and actor of continuous RL based on the Lyapunov function This paper proposes two strategies to design robust adaptive fault tolerant control (FTC) systems for a class of unknown n-order nonlinear systems in presence of actuator and sensor faults versus bounded unknown external disturbances It is based on machine learning approaches which are continuous reinforcement learning (RL) and neural networks (NNs) In the first FTC strategy, an intelligent observer is designed for unknown nonlinear systems when faults occur or not In the second strategy, a robust reinforcement learning FTC is proposed through combining reinforcement learning to treat the unknown nonlinear faulty system and nonlinear control theory to guarantee the stability and robustness of the system Critic and actor of continuous RL are adopted based on the behavior of the defined Lyapunov function In both strategies, to generate the residual a Gaussian radial basis function is used for an online estimation of the unknown dynamic function of the normal system The adaptation law of the online estimator is derived in the sense of Lyapunov function which is defined based on adjustable parameters of the estimator and switching surfaces containing dynamic errors and residuals Simulation results demonstrate the validity and feasibility of proposed FTC systems
Cites background from "Hybridization of model-based approa..."
...In the literature, a number of research results have been reported for applying RL in the continuous state and action space [15-17]....
••19 Jun 2017
TL;DR: A model-free tire slip control solution for a fast, highly nonlinear Anti-lock Braking System (ABS) via a reinforcement Q-learning optimal control approach tailored around a batch neural fitted scheme using two neural networks to approximate the value function and the controller, respectively.
Abstract: A model-free tire slip control solution for a fast, highly nonlinear Anti-lock Braking System (ABS) is proposed in this work via a reinforcement Q-learning optimal control approach. The solution is tailored around a batch neural fitted scheme using two neural networks to approximate the value function and the controller, respectively. The transition samples are collected from the process through interaction by online exploiting the current iteration controller (or policy) under an e-greedy exploration strategy. The ABS process fits this type of learning-by-interaction since it does not need an initial stabilizing controller. The validation case studies carried out on a real laboratory setup reveal that high control system performance can be achieved after several tens of interaction episodes with the controlled process. Insightful comments on the observed control behavior in a set of real-time experiments are offered along with performance comparisons with several other controllers.
01 Aug 1998
TL;DR: Fuzzy Actor-Critic Learning (FACL) and Fuzzy Q-Learning are reinforcement learning methods based on dynamic programming (DP) principles and the genericity of these methods allows them to learn every kind of reinforcement learning problem.
Abstract: Fuzzy Actor-Critic Learning (FACL) and Fuzzy Q-Learning (FQL) are reinforcement learning methods based on dynamic programming (DP) principles. In the paper, they are used to tune online the conclusion part of fuzzy inference systems (FIS). The only information available for learning is the system feedback, which describes in terms of reward and punishment the task the fuzzy agent has to realize. At each time step, the agent receives a reinforcement signal according to the last action it has performed in the previous state. The problem involves optimizing not only the direct reinforcement, but also the total amount of reinforcements the agent can receive in the future. To illustrate the use of these two learning methods, we first applied them to a problem that involves finding a fuzzy controller to drive a boat from one bank to another, across a river with a strong nonlinear current. Then, we used the well known Cart-Pole Balancing and Mountain-Car problems to be able to compare our methods to other reinforcement learning methods and focus on important characteristic aspects of FACL and FQL. We found that the genericity of our methods allows us to learn every kind of reinforcement learning problem (continuous states, discrete/continuous actions, various type of reinforcement functions). The experimental studies also show the superiority of these methods with respect to the other related methods we can find in the literature.
TL;DR: Reading soft computing fuzzy logic neural networks and distributed artificial intelligence is also a way as one of the collective books that gives many advantages.
Abstract: No wonder you activities are, reading will be always needed. It is not only to fulfil the duties that you need to finish in deadline time. Reading will encourage your mind and thoughts. Of course, reading will greatly develop your experiences about everything. Reading soft computing fuzzy logic neural networks and distributed artificial intelligence is also a way as one of the collective books that gives many advantages. The advantages are not only for you, but for the other peoples with those meaningful benefits.
••01 Jun 2004
TL;DR: A dynamic fuzzy Q-learning method that is capable of tuning fuzzy inference systems (FIS) online and a novel online self-organizing learning algorithm is developed so that structure and parameters identification are accomplished automatically and simultaneously based only on Q- learning.
Abstract: This paper presents a dynamic fuzzy Q-learning (DFQL) method that is capable of tuning fuzzy inference systems (FIS) online. A novel online self-organizing learning algorithm is developed so that structure and parameters identification are accomplished automatically and simultaneously based only on Q-learning. Self-organizing fuzzy inference is introduced to calculate actions and Q-functions so as to enable us to deal with continuous-valued states and actions. Fuzzy rules provide a natural mean of incorporating the bias components for rapid reinforcement learning. Experimental results and comparative studies with the fuzzy Q-learning (FQL) and continuous-action Q-learning in the wall-following task of mobile robots demonstrate that the proposed DFQL method is superior.
TL;DR: Experimental results in robotics domains show the superiority of the proposed continuous-action Q-learning over the standard discrete-action version in terms of both asymptotic performance and speed of learning.
Abstract: This paper presents a Q-learning method that works in continuous domains. Other characteristics of our approach are the use of an incremental topology preserving map (ITPM) to partition the input space, and the incorporation of bias to initialize the learning process. A unit of the ITPM represents a limited region of the input space and maps it onto the Q-values of M possible discrete actions. The resulting continuous action is an average of the discrete actions of the “winning unit” weighted by their Q-values. Then, TD(λ) updates the Q-values of the discrete actions according to their contribution. Units are created incrementally and their associated Q-values are initialized by means of domain knowledge. Experimental results in robotics domains show the superiority of the proposed continuous-action Q-learning over the standard discrete-action version in terms of both asymptotic performance and speed of learning. The paper also reports a comparison of discounted-reward against average-reward Q-learning in an infinite horizon robotics task.
01 Jun 1996
TL;DR: A reinforcement connectionist learning architecture that allows an autonomous robot to acquire efficient navigation strategies in a few trials and has high tolerance to noisy sensory data and good generalization abilities is proposed.
Abstract: In this paper we propose a reinforcement connectionist learning architecture that allows an autonomous robot to acquire efficient navigation strategies in a few trials. Besides rapid learning, the architecture has three further appealing features. First, the robot improves its performance incrementally as it interacts with an initially unknown environment, and it ends up learning to avoid collisions even in those situations in which its sensors cannot detect the obstacles. This is a definite advantage over nonlearning reactive robots. Second, since it learns from basic reflexes, the robot is operational from the very beginning and the learning process is safe. Third, the robot exhibits high tolerance to noisy sensory data and good generalization abilities. All these features make this learning robot's architecture very well suited to real-world applications. We report experimental results obtained with a real mobile robot in an indoor environment that demonstrate the appropriateness of our approach to real autonomous robot control.
Related Papers (5)
01 Aug 1998
01 Jun 2008