
Showing papers by "Robert Babuska" published in 2010


Reference BookDOI
29 Apr 2010
TL;DR: Reinforcement Learning and Dynamic Programming Using Function Approximators provides a comprehensive and unparalleled exploration of the field of RL and DP, with a focus on continuous-variable problems.
Abstract: From household appliances to applications in robotics, engineered systems involving complex dynamics can only be as effective as the algorithms that control them. While Dynamic Programming (DP) has provided researchers with a way to optimally solve decision and control problems involving complex dynamic systems, its practical value was limited by algorithms that lacked the capacity to scale up to realistic problems. However, in recent years, dramatic developments in Reinforcement Learning (RL), the model-free counterpart of DP, changed our understanding of what is possible. Those developments led to the creation of reliable methods that can be applied even when a mathematical model of the system is unavailable, allowing researchers to solve challenging control problems in engineering, as well as in a variety of other disciplines, including economics, medicine, and artificial intelligence. Reinforcement Learning and Dynamic Programming Using Function Approximators provides a comprehensive and unparalleled exploration of the field of RL and DP. With a focus on continuous-variable problems, this seminal text details essential developments that have substantially altered the field over the past decade. In its pages, pioneering experts provide a concise introduction to classical RL and DP, followed by an extensive presentation of the state-of-the-art and novel methods in RL and DP with approximation. Combining algorithm development with theoretical guarantees, they elaborate on their work with illustrative examples and insightful comparisons. Three individual chapters are dedicated to representative algorithms from each of the major classes of techniques: value iteration, policy iteration, and policy search. The features and performance of these algorithms are highlighted in extensive experimental studies on a range of control applications. The recent development of applications involving complex systems has led to a surge of interest in RL and DP methods and the subsequent need for a quality resource on the subject. For graduate students and others new to the field, this book offers a thorough introduction to both the basics and emerging methods. And for those researchers and practitioners working in the fields of optimal and adaptive control, machine learning, artificial intelligence, and operations research, this resource offers a combination of practical algorithms, theoretical analysis, and comprehensive examples that they will be able to adapt and apply to their own work. Access the authors' website at www.dcsc.tudelft.nl/rlbook/ for additional material, including computer code used in the studies and information concerning new developments.

917 citations
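
The book's value-iteration-with-approximation theme can be illustrated in a few lines of Python. The sketch below is not code from the book or its website; the grid interpolator, toy dynamics, and reward are illustrative assumptions.

```python
# Approximate Q-iteration on a continuous 1-D state space: value iteration
# with a function approximator in miniature. Grid, dynamics, and reward are
# illustrative assumptions.
import numpy as np

gamma = 0.95                              # discount factor
states = np.linspace(-1.0, 1.0, 41)       # interpolation grid = the approximator
actions = np.array([-0.1, 0.0, 0.1])      # coarse action set

def step(x, u):
    """Toy dynamics: saturated integrator; reward penalizes distance to 0."""
    x_next = np.clip(x + u, -1.0, 1.0)
    return x_next, -abs(x_next)

Q = np.zeros((len(states), len(actions)))
for sweep in range(200):                  # Q-iteration sweeps
    Q_new = np.empty_like(Q)
    for i, x in enumerate(states):
        for j, u in enumerate(actions):
            x_next, r = step(x, u)
            # The value of x_next comes from linear interpolation over the
            # grid -- the function-approximation step of approximate DP.
            Q_new[i, j] = r + gamma * np.interp(x_next, states, Q.max(axis=1))
    if np.max(np.abs(Q_new - Q)) < 1e-8:
        break
    Q = Q_new

policy = actions[Q.argmax(axis=1)]        # greedy policy on the grid
print("greedy action at x = 0.5:", np.interp(0.5, states, policy))
```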


Book ChapterDOI
01 Jan 2010
TL;DR: This chapter reviews a representative selection of multi-agent reinforcement learning algorithms for fully cooperative, fully competitive, and more general (neither cooperative nor competitive) tasks.
Abstract: Multi-agent systems can be used to address problems in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must instead discover a solution on their own, using learning. A significant part of the research on multi-agent learning concerns reinforcement learning techniques. This chapter reviews a representative selection of multi-agent reinforcement learning algorithms for fully cooperative, fully competitive, and more general (neither cooperative nor competitive) tasks. The benefits and challenges of multi-agent reinforcement learning are described. A central challenge in the field is the formal statement of a multi-agent learning goal; this chapter reviews the learning goals proposed in the literature. The problem domains where multi-agent reinforcement learning techniques have been applied are briefly discussed. Several multi-agent reinforcement learning algorithms are applied to an illustrative example involving the coordinated transportation of an object by two cooperative robots. In an outlook for the multi-agent reinforcement learning field, a set of important open issues are identified, and promising research directions to address these issues are outlined.

548 citations
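
A minimal sketch of the fully cooperative setting the chapter's taxonomy covers: two agents run independent Q-learning on a shared-payoff coordination game. The payoff matrix and learning parameters are illustrative assumptions, not an algorithm from the chapter.

```python
# Independent Q-learning in a two-agent, fully cooperative coordination game:
# both agents receive the same payoff and learn from their own actions only.
import numpy as np

rng = np.random.default_rng(0)
payoff = np.array([[1.0, 0.0],
                   [0.0, 1.0]])   # coordinating on (0,0) or (1,1) pays 1
Q1 = np.zeros(2)                   # each agent keeps its own Q-values
Q2 = np.zeros(2)
alpha, eps = 0.1, 0.2

for t in range(5000):
    # epsilon-greedy action selection by each agent, independently
    a1 = rng.integers(2) if rng.random() < eps else int(Q1.argmax())
    a2 = rng.integers(2) if rng.random() < eps else int(Q2.argmax())
    r = payoff[a1, a2]             # shared reward (fully cooperative task)
    Q1[a1] += alpha * (r - Q1[a1]) # independent (non-joint-action) updates
    Q2[a2] += alpha * (r - Q2[a2])

print("agent 1 prefers action", Q1.argmax(), "- agent 2 prefers", Q2.argmax())
```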


Book
27 Oct 2010
TL;DR: A range of methods and tools to design observers for nonlinear systems represented by a special type of dynamic nonlinear model -- the Takagi--Sugeno (TS) fuzzy model -- are provided.
Abstract: Many problems in decision making, monitoring, fault detection, and control require the knowledge of state variables and time-varying parameters that are not directly measured by sensors. In such situations, observers, or estimators, can be employed that use the measured input and output signals along with a dynamic model of the system in order to estimate the unknown states or parameters. An essential requirement in designing an observer is to guarantee the convergence of the estimates to the true values or at least to a small neighborhood around the true values. However, for nonlinear, large-scale, or time-varying systems, the design and tuning of an observer is generally complicated and involves large computational costs. This book provides a range of methods and tools to design observers for nonlinear systems represented by a special type of dynamic nonlinear model -- the Takagi--Sugeno (TS) fuzzy model. The TS model is a convex combination of affine linear models, which facilitates its stability analysis and observer design by using effective algorithms based on Lyapunov functions and linear matrix inequalities. Takagi--Sugeno models are known to be universal approximators and, in addition, a broad class of nonlinear systems can be exactly represented as a TS system. Three particular structures of large-scale TS models are considered: cascaded systems, distributed systems, and systems affected by unknown disturbances. The reader will find in-depth theoretic analysis accompanied by illustrative examples and simulations of real-world systems. Stability analysis of TS fuzzy systems is addressed in detail. The intended audience is graduate students and researchers from both academia and industry. For newcomers to the field, the book provides a concise introduction to dynamic TS fuzzy models, along with two methods to construct TS models for a given nonlinear system.

266 citations
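
The TS structure at the heart of the book -- a convex combination of local linear models -- can be shown on a scalar example. The system below (xdot = -x^3) and its membership functions are illustrative choices, but the representation is exact on the chosen domain, as the printout confirms.

```python
# A Takagi-Sugeno model as a convex combination of local linear models.
# For xdot = -x^3 = (x^2) * (-1) * x with x in [-1, 1], the weight x^2 lies
# in [0, 1], so the TS form below represents the nonlinear system exactly.
import numpy as np

A1, A2 = 0.0, -1.0                 # local "A matrices" at x^2 = 0 and x^2 = 1

def memberships(x):
    w2 = x**2                      # sector-nonlinearity weight, in [0, 1]
    return np.array([1.0 - w2, w2])

def ts_dynamics(x):
    w = memberships(x)             # convex combination of local models
    return (w[0] * A1 + w[1] * A2) * x

for x in [0.2, 0.5, 0.9]:
    print(f"x={x}: TS model {ts_dynamics(x):+.4f}  true -x^3 {-x**3:+.4f}")
```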


Journal ArticleDOI
TL;DR: This paper considers the problem of simultaneously estimating the state and unknown inputs in TS systems and designs an observer based on the known part of the fuzzy model, which guarantees an ultimate bound on the error signal.

89 citations


Journal ArticleDOI
TL;DR: The purpose is to show that integration, from different points of view, is a major issue, and that increasing the level of abstraction in the description of systems can help to overcome the integration challenges; concepts for an integration framework are proposed.

72 citations


Proceedings ArticleDOI
29 Jul 2010
TL;DR: An online PI algorithm that evaluates policies with the so-called least-squares temporal difference for Q-functions (LSTD-Q) is proposed, which is found to work well for a wide range of its parameters, and to learn successfully in a real-time example.
Abstract: Reinforcement learning is a promising paradigm for learning optimal control. We consider policy iteration (PI) algorithms for reinforcement learning, which iteratively evaluate and improve control policies. State-of-the-art, least-squares techniques for policy evaluation are sample-efficient and have relaxed convergence requirements. However, they are typically used in offline PI, whereas a central goal of reinforcement learning is to develop online algorithms. Therefore, we propose an online PI algorithm that evaluates policies with the so-called least-squares temporal difference for Q-functions (LSTD-Q). The crucial difference between this online least-squares policy iteration (LSPI) algorithm and its offline counterpart is that, in the online case, policy improvements must be performed once every few state transitions, using only an incomplete evaluation of the current policy. In an extensive experimental evaluation, online LSPI is found to work well for a wide range of its parameters, and to learn successfully in a real-time example. Online LSPI also compares favorably with offline LSPI and with a different flavor of online PI, which instead of LSTD-Q employs another least-squares method for policy evaluation.

70 citations
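
A minimal sketch of the LSTD-Q evaluation step at the core of (online) LSPI: build the least-squares system from transition samples, solve for the Q-function weights of the current policy, then improve greedily. The chain MDP, tabular features, and one-shot (offline-style) improvement are illustrative simplifications; the paper's online algorithm interleaves improvements every few transitions.

```python
# LSTD-Q on a toy chain MDP: estimate the linear Q-function of a fixed policy
# from samples, then perform one greedy policy-improvement step.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9
rng = np.random.default_rng(1)

def phi(s, a):
    """Indicator (tabular) features -- the simplest linear approximator."""
    f = np.zeros(n_states * n_actions)
    f[s * n_actions + a] = 1.0
    return f

def step(s, a):
    """Toy chain: action 1 moves right, action 0 moves left; goal is state 4."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == n_states - 1)

policy = rng.integers(n_actions, size=n_states)   # current (arbitrary) policy
A = np.zeros((n_states * n_actions,) * 2)
b = np.zeros(n_states * n_actions)
for _ in range(2000):                  # samples with exploratory actions
    s, a = rng.integers(n_states), rng.integers(n_actions)
    s_next, r = step(s, a)
    f = phi(s, a)
    A += np.outer(f, f - gamma * phi(s_next, policy[s_next]))
    b += f * r
w = np.linalg.solve(A + 1e-6 * np.eye(len(b)), b)  # LSTD-Q weights

Q = w.reshape(n_states, n_actions)
print("improved (greedy) policy:", Q.argmax(axis=1))  # one improvement step
```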


Journal ArticleDOI
TL;DR: This work shows that fuzzy Q-iteration is consistent, i.e., that it asymptotically obtains the optimal solution as the approximation accuracy increases, and proves that the asynchronous algorithm converges at least as fast as the synchronous one.

62 citations


Proceedings ArticleDOI
03 Dec 2010
TL;DR: This work presents two novel temporal difference learning algorithms for problems with control delay that improve learning performance by taking the control delay into account and outperform classical TD learning algorithms while maintaining low computational complexity.
Abstract: Robots controlled by Reinforcement Learning (RL) are still rare. A core challenge to the application of RL to robotic systems is to learn despite the existence of control delay - the delay between measuring a system's state and acting upon it. Control delay is always present in real systems. In this work, we present two novel temporal difference (TD) learning algorithms for problems with control delay. These algorithms improve learning performance by taking the control delay into account. We test our algorithms in a gridworld, where the delay is an integer multiple of the time step, as well as in the simulation of a robotic system, where the delay can have any value. In both tests, our proposed algorithms outperform classical TD learning algorithms, while maintaining low computational complexity.

62 citations
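
One standard way to account for an integer control delay, in the spirit of this paper, is to augment the state with the buffer of actions sent but not yet applied, and run TD learning on the augmented state. The sketch below illustrates that idea; it is not the paper's exact algorithm, and the gridworld and one-step delay are assumptions.

```python
# Q-learning on a delay-augmented state: with a one-step control delay, the
# learner conditions on (state, pending action) instead of the state alone.
import numpy as np

n, gamma, alpha, eps = 7, 0.9, 0.2, 0.1
rng = np.random.default_rng(2)
# Q is indexed by (state, pending action, new action) for a delay of one step.
Q = np.zeros((n, 2, 2))

for episode in range(2000):
    s, pending = 0, 1                      # start at the left; one queued action
    for t in range(50):
        a = rng.integers(2) if rng.random() < eps else int(Q[s, pending].argmax())
        applied = pending                  # the delayed action takes effect now
        s_next = min(s + 1, n - 1) if applied == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n - 1 else 0.0
        # TD update on the augmented state (s, pending-action buffer)
        td_target = r + gamma * Q[s_next, a].max()
        Q[s, pending, a] += alpha * (td_target - Q[s, pending, a])
        s, pending = s_next, a
        if r > 0:
            break

print("greedy actions along the chain:", Q[:, 1].argmax(axis=1))
```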


Proceedings ArticleDOI
03 May 2010
TL;DR: This paper presents a unified way of describing distributed implementations of three commonly used nonlinear estimators: the Extended Kalman Filter, the Unscented Kalman Filter, and the Particle Filter, and proposes new distributed versions of these methods, in which the nonlinearities are locally managed by the various sensors whereas the different estimates are merged based on a weighted average consensus process.
Abstract: Distributed linear estimation theory has received increased attention in recent years due to several promising industrial applications. Distributed nonlinear estimation, however, is still a relatively unexplored field, despite the need in numerous practical situations for techniques that can handle nonlinearities. This paper presents a unified way of describing distributed implementations of three commonly used nonlinear estimators: the Extended Kalman Filter, the Unscented Kalman Filter, and the Particle Filter. Leveraging the presented framework, we propose new distributed versions of these methods, in which the nonlinearities are locally managed by the various sensors whereas the different estimates are merged based on a weighted average consensus process. The proposed versions are shown to outperform the few published ones in two robot localization test cases.

48 citations
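
The weighted-average-consensus step used to merge the sensors' local estimates can be sketched on its own. The ring network, Metropolis-style weights, and local estimates below are illustrative assumptions.

```python
# Weighted average consensus: each sensor repeatedly averages its estimate
# with its neighbors'; with a doubly stochastic weight matrix, all sensors
# converge to the network-wide mean.
import numpy as np

# Local scalar estimates held by 5 sensors (e.g., outputs of local filters)
x = np.array([1.0, 1.2, 0.8, 1.1, 0.9])
# Doubly stochastic weight matrix for a ring graph (Metropolis-like weights)
W = np.zeros((5, 5))
for i in range(5):
    W[i, i] = 1 / 3
    W[i, (i - 1) % 5] = 1 / 3
    W[i, (i + 1) % 5] = 1 / 3

for k in range(50):        # consensus iterations: x <- W x
    x = W @ x

print("consensus value per sensor:", np.round(x, 4))  # all converge to the mean
```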


Proceedings ArticleDOI
09 Nov 2010
TL;DR: This paper presents the framework for the new ADR algorithm, as well as the design of a new cost function that encodes the motivations and objectives of the algorithm.
Abstract: Ant Colony Optimization (ACO) has proven to be a very powerful optimization heuristic for combinatorial optimization problems. This paper introduces a new type of ACO algorithm that will be used for routing along multiple routes in a network as opposed to optimizing a single route. Contrary to traditional routing algorithms, the Ant Dispersion Routing (ADR) algorithm has the objective of determining recommended routes for every driver in the network, in order to increase network efficiency. We present the framework for the new ADR algorithm, as well as the design of a new cost function that encodes the motivations and objectives of the algorithm. The proposed approach is illustrated with a small simulation-based case study for the Singapore Expressway Network.

32 citations
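
For context, here is the classical pheromone mechanism that ADR builds on, sketched on three parallel routes. Note that ADR itself disperses drivers over multiple routes rather than converging on a single best one, so this is only the underlying ACO ingredient; the network and cost values are assumptions.

```python
# Classical ACO route choice: ants pick routes with probability proportional
# to pheromone; cheaper routes receive larger deposits and dominate over time.
import numpy as np

rng = np.random.default_rng(3)
costs = np.array([10.0, 12.0, 15.0])  # travel cost of three parallel routes
tau = np.ones(3)                       # pheromone per route
rho = 0.1                              # evaporation rate

for ant in range(500):
    p = tau / tau.sum()                # route-choice probabilities
    route = rng.choice(3, p=p)
    tau *= (1 - rho)                   # evaporation
    tau[route] += 1.0 / costs[route]   # deposit inversely proportional to cost

print("route probabilities:", np.round(tau / tau.sum(), 3))
```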


Book ChapterDOI
01 Jan 2010
TL;DR: This chapter provides an in-depth review of the literature on approximate DP and RL in large or continuous-space, infinite-horizon problems, and reviews theoretical guarantees on the approximate solutions produced by these algorithms.
Abstract: Dynamic programming (DP) and reinforcement learning (RL) can be used to address problems from a variety of fields, including automatic control, artificial intelligence, operations research, and economics. Many problems in these fields are described by continuous variables, whereas DP and RL can find exact solutions only in the discrete case. Therefore, approximation is essential in practical DP and RL. This chapter provides an in-depth review of the literature on approximate DP and RL in large or continuous-space, infinite-horizon problems. Value iteration, policy iteration, and policy search approaches are presented in turn. Model-based (DP) as well as online and batch model-free (RL) algorithms are discussed. We review theoretical guarantees on the approximate solutions produced by these algorithms. Numerical examples illustrate the behavior of several representative algorithms in practice. Techniques to automatically derive value function approximators are discussed, and a comparison between value iteration, policy iteration, and policy search is provided. The chapter closes with a discussion of open issues and promising research directions in approximate DP and RL.
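
To complement the value-iteration sketch given for the book above, here is a minimal example from the policy-search branch of the chapter's taxonomy: stochastic hill-climbing on the gain of a linear state-feedback policy, scored by Monte-Carlo returns. The scalar system, cost, and search parameters are illustrative assumptions.

```python
# Direct policy search: perturb the feedback gain, keep the perturbation if
# the Monte-Carlo return estimate improves.
import numpy as np

rng = np.random.default_rng(4)

def episode_return(k, gamma=0.95, steps=50):
    """Return of policy u = -k*x on a noisy scalar integrator."""
    x, ret, disc = 1.0, 0.0, 1.0
    for _ in range(steps):
        u = -k * x
        x = x + 0.1 * u + 0.01 * rng.standard_normal()
        ret -= disc * (x**2 + 0.1 * u**2)   # quadratic cost as negative reward
        disc *= gamma
    return ret

k, step_size = 0.0, 0.5
for it in range(100):                        # crude stochastic hill climbing
    k_new = k + step_size * rng.standard_normal()
    ret_new = np.mean([episode_return(k_new) for _ in range(20)])
    ret_old = np.mean([episode_return(k) for _ in range(20)])
    if ret_new > ret_old:
        k = k_new
print("found feedback gain k =", round(k, 2))
```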

Journal ArticleDOI
TL;DR: This paper presents an online self-evolving fuzzy controller with global learning capabilities that starts from very simple or even empty configurations and applies learning techniques based on the input/output data collected during normal operation to modify online the fuzzy controller’s structure and parameters.
Abstract: This paper presents an online self-evolving fuzzy controller with global learning capabilities. Starting from very simple or even empty configurations, the controller learns from its own actions while controlling the plant. It applies learning techniques based on the input/output data collected during normal operation to modify the fuzzy controller’s structure and parameters online. The controller does not need any information about the differential equations that govern the plant, nor any offline training. It consists of two main blocks: a parameter learning block that learns proper values for the rule consequents applying a local and a global strategy, and a self-evolving block that modifies the controller’s structure online. The modification of the topology is based on the analysis of the error surface and the determination of the input variables which are most responsible for the error. Simulation and experimental results are presented to show the controller’s capabilities.
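
A minimal sketch of the parameter-learning half of such a controller: the consequents of a zero-order TS fuzzy controller are adapted online from the tracking error, weighted by rule activation. The first-order plant, triangular membership functions, and learning rate are assumptions; the self-evolving (structure-modification) block is not shown.

```python
# Online adaptation of fuzzy rule consequents from the tracking error.
import numpy as np

centers = np.linspace(-1, 1, 5)      # fixed triangular membership centers
theta = np.zeros(5)                  # rule consequents, learned online

def activations(e):
    w = np.maximum(0.0, 1 - np.abs(e - centers) / 0.5)  # triangular MFs
    return w / (w.sum() + 1e-9)

x, ref, lr = 0.0, 0.8, 0.5
for t in range(300):
    e = ref - x
    w = activations(e)
    u = w @ theta                    # fuzzy controller output
    x = x + 0.1 * (u - x)            # simple first-order plant (assumption)
    theta += lr * w * e              # consequent adaptation from the error

print(f"state after learning: {x:.3f} (reference {ref})")
```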

Proceedings ArticleDOI
18 Jul 2010
TL;DR: A method to design local observers for Takagi-Sugeno fuzzy models obtained from nonlinear systems by the sector nonlinearity approach, for cases where classical observer design conditions are infeasible.
Abstract: In this paper we propose a method to design local observers for Takagi-Sugeno fuzzy models obtained from nonlinear systems by the sector nonlinearity approach. When a global observer cannot be designed, using our method it is still possible to design observers that are valid in a well-defined region of the state-space. The design is based on a nonquadratic Lyapunov function. Depending on whether or not the scheduling vector is a function of the states to be estimated, the conditions are formulated as an LMI or a BMI problem, respectively. The results are illustrated on simulation examples, for which classical observer design conditions are infeasible.
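
The observer structure such conditions target can be sketched as follows: a Luenberger-type TS observer whose gain is blended by the same memberships, with the scheduling computed from the estimated state (the BMI case above). The local models and gains below are assumed for illustration; in the paper the gains come from LMI/BMI feasibility problems.

```python
# A Takagi-Sugeno (TS) fuzzy observer with membership-blended gains,
# simulated alongside the plant with simple Euler integration.
import numpy as np

A1 = np.array([[0.0, 1.0], [-1.0, -0.5]])   # local model 1
A2 = np.array([[0.0, 1.0], [-2.0, -0.5]])   # local model 2
C = np.array([[1.0, 0.0]])                  # measured output: position only
L1 = np.array([[2.0], [1.0]])               # observer gains (assumed here,
L2 = np.array([[2.0], [2.0]])               # not computed from LMIs)

def w1(z):                                   # membership function
    return 0.5 * (1 + np.tanh(z))

x = np.array([1.0, 0.0]); xh = np.zeros(2); dt = 0.01
for k in range(2000):
    w = w1(x[0]); A = w * A1 + (1 - w) * A2          # plant blend
    y = C @ x                                         # measurement
    wh = w1(xh[0])                                    # scheduling from the
    Ah = wh * A1 + (1 - wh) * A2                      # *estimated* state
    Lh = wh * L1 + (1 - wh) * L2
    x = x + dt * (A @ x)
    xh = xh + dt * (Ah @ xh + (Lh @ (y - C @ xh)).ravel())

print("estimation error:", np.round(x - xh, 4))
```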

Journal ArticleDOI
TL;DR: A class of gait generation and control algorithms based on the Switching Max-Plus modeling framework that allow for the synchronization of multiple legs of walking robots are presented.

Proceedings ArticleDOI
09 Nov 2010
TL;DR: A nonlinear approach to state estimation that is based on a Takagi-Sugeno (TS) fuzzy model representation of the METANET traffic model is proposed, whereby the convergence of the observer is guaranteed.
Abstract: Traffic control has proven an effective measure to reduce traffic congestion on freeways. In order to determine appropriate control actions, it is necessary to have information on the current state of the traffic. However, not all traffic states can be measured (such as the traffic density) and so state estimation must be applied in order to obtain state information from the available measurements. Linear state estimation methods are not directly applicable, as traffic models are in general nonlinear. In this paper we propose a nonlinear approach to state estimation that is based on a Takagi-Sugeno (TS) fuzzy model representation of the METANET traffic model. By representing the METANET traffic model as a TS fuzzy system, a structured observer design procedure can be applied, whereby the convergence of the observer is guaranteed. Simulation results are presented to illustrate the quality of the estimate.

Proceedings ArticleDOI
29 Jul 2010
TL;DR: In this article, the authors proposed an optimization-based method to design the input actuation waveform for the piezo actuator in order to improve the damping of the residual oscillations.
Abstract: The printing quality delivered by a Drop-on-Demand (DoD) inkjet printhead is limited due to operational issues such as residual oscillations in the ink channel and the cross-talk between the ink channels. The maximal jetting frequency of a DoD inkjet printhead can be increased by quickly damping the residual oscillations, thereby bringing the ink channel to rest after jetting the ink drop. This paper proposes an optimization-based method to design the input actuation waveform for the piezo actuator in order to improve the damping of the residual oscillations. The narrow-gap model is used to predict the response of the ink channel under the application of the piezo input. Simulation and experimental results are presented to show the applicability of the proposed method.
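
A minimal sketch of the optimization idea, with a toy second-order channel mode standing in for the narrow-gap model: choose a piecewise-constant piezo waveform that minimizes the predicted residual-oscillation energy after the pulse. All dynamics, parameters, and the Nelder-Mead solver choice are illustrative assumptions.

```python
# Waveform design by numerical optimization: minimize the post-pulse
# oscillation energy of a lightly damped second-order stand-in model.
import numpy as np
from scipy.optimize import minimize

dt, wn, zeta = 1e-6, 2 * np.pi * 50e3, 0.05   # lightly damped channel mode

def residual_energy(u):
    """Simulate x'' + 2*zeta*wn*x' + wn^2*x = wn^2*u; return post-pulse energy."""
    x = v = 0.0
    for uk in u:                                # apply the candidate waveform
        a = wn**2 * (uk - x) - 2 * zeta * wn * v
        v += a * dt; x += v * dt                # semi-implicit Euler step
    energy = 0.0
    for _ in range(400):                        # free response after the pulse
        a = -wn**2 * x - 2 * zeta * wn * v
        v += a * dt; x += v * dt
        energy += x**2 + (v / wn)**2
    return energy

u0 = np.ones(20)                                # initial guess: rectangular pulse
res = minimize(residual_energy, u0, method="Nelder-Mead",
               options={"maxiter": 2000, "xatol": 1e-6})
print("residual energy: initial %.3e -> optimized %.3e"
      % (residual_energy(u0), res.fun))
```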

Proceedings ArticleDOI
28 May 2010
TL;DR: This paper considers prior knowledge about the monotonicity of the control policy with respect to the system states, and introduces an approach that exploits this type of prior knowledge to accelerate a state-of-the-art RL algorithm called online least-squares policy iteration (LSPI).
Abstract: Reinforcement learning (RL) is a promising paradigm for learning optimal control. Although RL is generally envisioned as working without any prior knowledge about the system, such knowledge is often available and can be exploited to great advantage. In this paper, we consider prior knowledge about the monotonicity of the control policy with respect to the system states, and we introduce an approach that exploits this type of prior knowledge to accelerate a state-of-the-art RL algorithm called online least-squares policy iteration (LSPI). Monotonic policies are appropriate for important classes of systems appearing in control applications. LSPI is a data-efficient RL algorithm that we previously extended to online learning, but which until now did not provide a way to use prior knowledge about the policy. In an empirical evaluation, online LSPI with prior knowledge learns much faster and more reliably than the original online LSPI.
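
One simple way to picture a monotonicity prior on the policy: after each improvement step, project the greedy actions over a 1-D state grid onto the set of non-decreasing policies. The cumulative-maximum projection below is a deliberately crude stand-in for the paper's mechanism inside online LSPI.

```python
# Enforcing a monotone (non-decreasing) policy over a 1-D state grid.
import numpy as np

states = np.linspace(-1, 1, 11)
# Greedy actions from a noisy early Q-estimate (values are illustrative):
greedy_u = np.array([-1, -1, 1, -1, -1, 1, 1, -1, 1, 1, 1], dtype=float)
# Prior knowledge: the optimal action is non-decreasing in the state, so
# isolated dips are treated as estimation noise and removed.
monotone_u = np.maximum.accumulate(greedy_u)
print("raw greedy policy:  ", greedy_u)
print("monotone projection:", monotone_u)
```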

Proceedings ArticleDOI
18 Jul 2010
TL;DR: This paper presents a self-organizing adaptive fuzzy controller that works online, using the data obtained during the normal operation of the system to modify the structure of the fuzzy controller.
Abstract: This paper presents a self-organizing adaptive fuzzy controller that works online. Neither prior knowledge about the differential equations governing the plant nor offline training is needed. Starting from very simple topologies, the algorithm uses the data obtained online during the normal operation of the system to modify the structure of the fuzzy controller. This is achieved in two phases: first, the consequents of the current fuzzy rules are adapted; in the second phase, new membership functions are added online. To show its capabilities, a real experiment with a nonlinear servo system has been carried out with satisfactory results.

Proceedings ArticleDOI
29 Jul 2010
TL;DR: A fuzzy observer is proposed for the continuous-time version of the macroscopic traffic flow model METANET, based on a dynamic Takagi-Sugeno fuzzy model that exactly represents the traffic model of a segment of a highway stretch.
Abstract: Traffic state estimation is a prerequisite for traffic surveillance and control. For macroscopic traffic flow models several estimation methods have been investigated, including extended and unscented Kalman filters and particle filters. In this paper we propose a fuzzy observer for the continuous time version of the macroscopic traffic flow model METANET. In order to design the observer, we first derive a dynamic Takagi-Sugeno fuzzy model that exactly represents the traffic model of a segment of a highway stretch. The fuzzy observer is designed based on the fuzzy model and applied to the traffic model. The simulation results are promising for the future development of fuzzy observers for a highway stretch or a whole traffic network.

Journal ArticleDOI
TL;DR: In this article, a control-design methodology of five design steps is proposed, which takes the treatment process characteristics into account, for each design step, the necessary actions are defined, and a new control scheme for the pellet softening treatment step has been designed and implemented in the full-scale plant.
Abstract: The performance of a drinking-water treatment plant is determined by the control of the plant. To design the appropriate control system, a control-design methodology of five design steps is proposed, which takes the treatment process characteristics into account. For each design step, the necessary actions are defined. Using the methodology, a new control scheme for the pellet-softening treatment step has been designed and implemented in the full-scale plant. The implementation resulted in a 15% reduction in chemical usage and a reduction in the maintenance effort for this treatment step. Corrective actions by operators are no longer necessary.

Book ChapterDOI
01 Jan 2010
TL;DR: This chapter discusses the ACL framework and its implementation with crisp and fuzzy partitioning of the state space, and demonstrates the use of both versions in the control problem of two-dimensional navigation in an environment with variable damping.
Abstract: Ant colony optimization (ACO) is an optimization heuristic for solving combinatorial optimization problems and is inspired by the swarming behavior of foraging ants. ACO has been successfully applied in various domains, such as routing and scheduling. In particular, the agents, called ants here, are very efficient at sampling the problem space and quickly finding good solutions. Motivated by the advantages of ACO in combinatorial optimization, we develop a novel framework for finding optimal control policies that we call Ant Colony Learning (ACL). In ACL, the ants all work together to collectively learn optimal control policies for any given control problem for a system with nonlinear dynamics. In this chapter, we discuss the ACL framework and its implementation with crisp and fuzzy partitioning of the state space. We demonstrate the use of both versions in the control problem of two-dimensional navigation in an environment with variable damping and discuss their performance.
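
A minimal sketch of the ACL mechanics on a discretized problem: ants select actions with probability proportional to the pheromone stored per state-action pair, and the trails of ants that reach the goal are reinforced, with evaporation. The gridworld, rates, and goal are illustrative assumptions, not the chapter's benchmark.

```python
# Ant-colony-style policy learning on a 1-D chain: pheromone per
# (state, action) pair guides action selection; successful trails are
# reinforced, and evaporation forgets stale experience.
import numpy as np

rng = np.random.default_rng(5)
n_states, n_actions, goal = 10, 2, 9
tau = np.ones((n_states, n_actions))       # pheromone table
rho, deposit = 0.05, 1.0

for iteration in range(200):
    for ant in range(10):                  # a colony of ants per iteration
        s, trail = 0, []
        for t in range(30):
            p = tau[s] / tau[s].sum()      # stigmergic action selection
            a = rng.choice(n_actions, p=p)
            trail.append((s, a))
            s = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            if s == goal:
                break
        if s == goal:                      # global update: shorter trails win
            for (si, ai) in trail:
                tau[si, ai] += deposit / len(trail)
    tau *= (1 - rho)                       # evaporation

print("greedy policy from pheromones:", tau.argmax(axis=1))
```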

Proceedings ArticleDOI
03 Dec 2010
TL;DR: By taking inspiration from the eye configuration of nature's large predatory and grazing mammals, a solution is suggested for the question of finding the best orientation of two cameras, between side-facing and front-facing, for velocity estimation in a forward-moving robot.
Abstract: We present an unscented Kalman filter based state estimator for a fast moving rigid body (such as a mobile robot) endowed with two video cameras. We focus on forward velocity estimation towards the computation of standard energy cost functions for legged locomotion. Points are chosen as image features and the model of each camera is based on the traditional pinhole projection. The resulting filter's state is composed of the rigid body pose and velocities, together with a measure of depth for each tracked point. By taking inspiration from the eye configuration of nature's large predatory and grazing mammals, we suggest, via simulation results, a solution to the question of finding the best orientation of two cameras, between side-facing and front-facing, for velocity estimation in a forward-moving robot.

Proceedings ArticleDOI
28 Oct 2010
TL;DR: A particle filter is proposed to estimate the average grain diameter of the sand excavated by hopper dredgers, based on online measurements of the total height of the mixture in the hopper, the total mass, the incoming mixture density and flow-rate, and the height of the sand bed, together with estimates of the outgoing mixture density and flow-rate.
Abstract: Hopper dredgers are massive ships that excavate sediments from the sea bottom while sailing. The excavated material is then transported and discharged at a specified location. The efficiency of this process is highly dependent on the detailed knowledge of the excavated soil. When the soil is composed mainly of sand, the parameter of the greatest importance is the average grain diameter. This, however, cannot be directly measured by available sensors. Therefore, in this paper a particle filter is proposed to estimate the average grain diameter. The estimation is based on online measurements of the total height of the mixture in the hopper, the total mass, the incoming mixture density and flow-rate, and the height of the sand bed, together with estimates of the outgoing mixture density and flow-rate. The loading process is naturally decomposed into three phases and the filter is applied to the first two phases. In order to match different types of nonlinearities, a separate observer is proposed for each phase under consideration. This increases the modularity of the filter and makes tuning easier. The performance of the filter is evaluated in simulations and the results are encouraging.
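
A minimal sketch of the particle-filter idea applied to a static parameter, with a stand-in scalar "grain diameter" and an assumed nonlinear measurement map in place of the hopper's mass-balance model.

```python
# Particle filter for a static parameter: weight particles by measurement
# likelihood, resample when the effective sample size drops, and add a small
# roughening noise so the static parameter can still be tracked.
import numpy as np

rng = np.random.default_rng(6)
d_true, sigma = 0.3, 0.05                  # "grain diameter" and sensor noise
particles = rng.uniform(0.1, 1.0, 1000)    # prior over the parameter
weights = np.ones(1000) / 1000

def measurement(d):
    return np.sqrt(d)                      # nonlinear sensor map (assumption)

for k in range(50):
    z = measurement(d_true) + sigma * rng.standard_normal()
    lik = np.exp(-0.5 * ((z - measurement(particles)) / sigma) ** 2)
    weights *= lik
    weights /= weights.sum()
    if 1.0 / np.sum(weights**2) < 500:     # resample when effective N drops
        idx = rng.choice(1000, 1000, p=weights)
        particles = particles[idx] + 0.005 * rng.standard_normal(1000)
        weights[:] = 1.0 / 1000

print("estimate: %.3f (true %.3f)" % (np.dot(weights, particles), d_true))
```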

Book ChapterDOI
01 Jan 2010
TL;DR: This chapter introduces the continuous-time Takagi-Sugeno (TS) fuzzy systems that are employed throughout the book and presents methods to construct TS models that represent or approximate a nonlinear dynamic system starting from a given model of this system.
Abstract: In this chapter we first introduce the continuous-time Takagi-Sugeno (TS) fuzzy systems that are employed throughout the book. In the second part of the chapter, we present methods to construct TS models that represent or approximate a nonlinear dynamic system starting from a given model of this system.

Journal Article
TL;DR: This paper investigates the case when an observer-based controller is designed for an approximate model and then applied to the original nonlinear system, and considers that the scheduling vector used in the membership functions of the observer depends on the states that have to be estimated.
Abstract: A large class of nonlinear systems can be well approximated by Takagi-Sugeno fuzzy models, for which methods and algorithms have been developed to analyze their stability and to design observers and controllers. However, results obtained for Takagi-Sugeno fuzzy models are in general not directly applicable to the original nonlinear system. In this paper, we investigate what conclusions can be drawn and what guarantees can be expected when an observer or a state feedback controller is designed based on an approximate fuzzy model and applied to the original nonlinear system. We also investigate the case when an observer-based controller is designed for an approximate model and then applied to the original nonlinear system. In particular, we consider that the scheduling vector used in the membership functions of the observer depends on the states that have to be estimated. The results are illustrated using simulation examples.

Proceedings ArticleDOI
18 Jul 2010
TL;DR: This paper generalizes the concept of pheromones and the local and global pheromone update rules, so that both crisp and fuzzy partitioning of the state space can be integrated into the ACL framework, and compares the performance of ACL with these two partitioning methods by applying it to the control problem of swinging up and stabilizing an under-actuated pendulum.
Abstract: In this paper, we discuss the Ant Colony Learning (ACL) paradigm for non-linear systems with continuous state spaces. ACL is a novel control policy learning methodology, based on Ant Colony Optimization. In ACL, a collection of agents, called ants, jointly interact with the system at hand in order to find the optimal mapping between states and actions. Through the stigmergic interaction by pheromones, the ants are guided by each other's experience towards better control policies. In order to deal with continuous state spaces, we generalize the concept of pheromones and the local and global pheromone update rules. As a result of this generalization, we can integrate both crisp and fuzzy partitioning of the state space into the ACL framework. We compare the performance of ACL with these two partitioning methods by applying it to the control problem of swinging up and stabilizing an under-actuated pendulum.

Proceedings ArticleDOI
28 May 2010
TL;DR: This paper investigates what conclusions can be drawn when an observer-based controller is designed for an approximate model and then applied to the original nonlinear system.
Abstract: A large class of nonlinear systems can be well approximated by Takagi-Sugeno fuzzy models, for which methods and algorithms have been developed to analyze their stability and to design observers and controllers. However, results obtained for Takagi-Sugeno fuzzy models are in general not directly applicable to the original nonlinear system. In this paper, we investigate what conclusions can be drawn when an observer-based controller is designed for an approximate model and then applied to the original nonlinear system. In particular, we consider the case when the scheduling vector used in the membership functions of the observer depends on the states that have to be estimated. The results are illustrated using simulation examples.

Proceedings ArticleDOI
28 May 2010
TL;DR: This paper investigates what conclusions can be drawn and what guarantees can be expected when an observer is designed based on an approximate fuzzy model and applied to the original nonlinear system and shows that in general, exponential stability of the estimation error dynamics cannot be obtained.
Abstract: Analysis and observer design for nonlinear systems have long been investigated, but no generally applicable methods exist as yet. A large class of nonlinear systems can be well approximated by Takagi-Sugeno fuzzy models, for which methods and algorithms have been developed to analyze their stability and to design observers. However, results obtained for Takagi-Sugeno fuzzy models are in general not directly applicable to the original nonlinear system. In this paper, we investigate what conclusions can be drawn and what guarantees can be expected when an observer is designed based on an approximate fuzzy model and applied to the original nonlinear system. It is shown that in general, exponential stability of the estimation error dynamics cannot be obtained. However, the estimation error is bounded. This bound is computed based on the approximation error and the Lyapunov function used. The results are illustrated using simulation examples.