Author

Robert Babuska

Bio: Robert Babuska is an academic researcher at Delft University of Technology. The author has contributed to research on the topics of fuzzy logic and reinforcement learning. The author has an h-index of 56 and has co-authored 371 publications, which have received 15,388 citations. Previous affiliations of Robert Babuska include Carnegie Mellon University and Czech Technical University in Prague.


Papers
Proceedings ArticleDOI
15 May 2009
TL;DR: This paper introduces a novel algorithm for approximate policy search in continuous-state, discrete-action Markov decision processes (MDPs) that employs a flexible policy parameterization, suitable for solving general discrete-action MDPs.
Abstract: This paper introduces a novel algorithm for approximate policy search in continuous-state, discrete-action Markov decision processes (MDPs). Previous policy search approaches have typically used ad-hoc parameterizations developed for specific MDPs. In contrast, the novel algorithm employs a flexible policy parameterization, suitable for solving general discrete-action MDPs. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions, where a discrete action is assigned to each basis function. The locations and shapes of the basis functions are optimized, together with the action assignments. This allows a large class of policies to be represented. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. We report simulation experiments in which the algorithm reliably obtains good policies with only a small number of basis functions, albeit at sizable computational costs.

12 citations
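
The cross-entropy optimization mentioned in the abstract above can be illustrated with a minimal, generic sketch. This is not the authors' algorithm: it only shows the sample, select, refit loop of the cross-entropy method on a real-valued parameter vector. The function names, the Gaussian sampling density, the population sizes, and the toy objective `toy_return` (a stand-in for an empirical return) are all assumptions made for illustration; the paper additionally optimizes basis-function shapes and discrete action assignments, which are omitted here.

```python
import random
import statistics


def cross_entropy_maximize(score, dim, iterations=40, samples=60, elite=10, seed=0):
    """Maximize `score` over real vectors of length `dim` with a Gaussian CE search."""
    rng = random.Random(seed)
    mean = [0.0] * dim  # assumed initial sampling mean
    std = [2.0] * dim   # assumed initial sampling spread
    for _ in range(iterations):
        # Draw candidate parameter vectors from the current sampling density.
        population = [
            [rng.gauss(mean[i], std[i]) for i in range(dim)]
            for _ in range(samples)
        ]
        # Keep the best-scoring candidates (the elite set).
        population.sort(key=score, reverse=True)
        best = population[:elite]
        # Refit the density to the elite set; a small floor keeps std positive.
        for i in range(dim):
            column = [candidate[i] for candidate in best]
            mean[i] = statistics.mean(column)
            std[i] = max(statistics.pstdev(column), 1e-6)
    return mean


def toy_return(params):
    """Hypothetical stand-in for an empirical return; its maximum is at (1.0, -2.0)."""
    return -((params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2)


best_params = cross_entropy_maximize(toy_return, dim=2)
```

In a policy-search setting, `score` would presumably be replaced by the average simulated return of the policy encoded by `params` over a set of initial states, at the cost of one batch of simulations per candidate.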

Book ChapterDOI
01 Jan 2015
TL;DR: This paper addresses practical implementations of RL by interfacing elements of systems and control and robotics, using sequential composition and passivity-based control methods to speed up learning and provide a stopping-time criterion.
Abstract: The model-free paradigm of Reinforcement learning (RL) is a theoretical strength. However, in practice, the stringent assumptions required for optimal solutions (full state space exploration) and experimental issues, such as slow learning rates, render model-free RL a practical weakness. This paper addresses practical implementations of RL by interfacing elements of systems and control and robotics. In our approach, space is handled by Sequential Composition (a technique commonly used in robotics) and time is handled by the use of passivity-based control methods (a standard nonlinear control approach) towards speeding up learning and providing a stopping-time criterion. Sequential composition in effect partitions the state space and allows for the composition of controllers, each having different domains of attraction (DoA) and goal sets. This results in learning taking place in subsets of the state space. Passivity-based control (PBC) is a model-based control approach where total energy is computable. This total energy can be used as a candidate Lyapunov function to evaluate the stability of a controller and find estimates of its DoA. This enables learning in finite time: while learning, the candidate Lyapunov function is monitored online to approximate the DoA of the learned controller. Once this DoA covers relevant states, from the point of view of sequential composition, the learning process is stopped. The result of this process is a collection of learned controllers that cover a desired range of the state space, and can be composed in sequence to achieve various desired goals. Optimality is lost in favour of practicality. Other implications include safety while learning and incremental learning.

12 citations

Journal ArticleDOI
TL;DR: This paper considers a multi-objective symbolic regression method that optimizes models with respect to their training error and the measure of how well they comply with the desired physical properties and proposes an extension to the existing algorithm that helps generate a diverse set of high-quality models.
Abstract: Virtually all dynamic system control methods benefit from the availability of an accurate mathematical model of the system. This includes also methods like reinforcement learning, which can be vastly sped up and made safer by using a dynamic system model. However, obtaining a sufficient amount of informative data for constructing dynamic models can be difficult. Consequently, standard data-driven model learning techniques using small data sets that do not cover all important properties of the system yield models that are partly incorrect, for instance, in terms of their steady-state characteristics or local behavior. However, often some knowledge about the desired physical properties of the model is available. Recently, several symbolic regression approaches making use of such knowledge to compensate for data insufficiency were proposed. Therefore, this knowledge should be incorporated into the model learning process to compensate for data insufficiency. In this paper, we consider a multi-objective symbolic regression method that optimizes models with respect to their training error and the measure of how well they comply with the desired physical properties. We propose an extension to the existing algorithm that helps generate a diverse set of high-quality models. Further, we propose a method for selecting a single final model out of the pool of candidate output models. We experimentally demonstrate the approach on three real systems: the TurtleBot 2 mobile robot, the Parrot Bebop 2 drone and the magnetic manipulation system. The results show that the proposed model-learning algorithm yields accurate models that are physically justified. The improvement in terms of the model’s compliance with prior knowledge over the models obtained when no prior knowledge was involved in the learning process is of several orders of magnitude.

12 citations

Proceedings ArticleDOI
06 Jul 2015
TL;DR: This paper extends the standard sequential composition by introducing a novel approach to compose multiple sequential composition controllers towards cooperative control of an inverted pendulum system collaborating with a second-order DC motor for cooperative swing-up maneuvers.
Abstract: Sequential composition is an effective supervisory control approach for addressing challenging control problems on complex dynamical systems. It constructs a back-chaining sequence of controllers to achieve the control objective using simple local controllers. Although sequential composition works properly for a single system, it is not designed for cooperative systems. This paper extends the standard sequential composition by introducing a novel approach to compose multiple sequential composition controllers towards cooperative control. Given two or more systems, cooperation is achieved by composing each of the systems' supervisory finite-state machines, together with the estimation of the domains of attraction of the composed controllers. We present the simulation and experimental results of an inverted pendulum system collaborating with a second-order DC motor for cooperative swing-up maneuvers.

12 citations

Journal ArticleDOI
TL;DR: A method based on finite-difference discretization on a grid in space and time for the identification of distributed-parameter systems is proposed, suitable for the case when the partial differential equation describing the system is not known.

12 citations


Cited by
Journal ArticleDOI
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning and evolutionary computation, and indirect search for short programs encoding deep and large networks.

14,635 citations

Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

01 Apr 2003
TL;DR: This paper reviews the important results from the numerous publications that have discussed applications and theoretical aspects of the EnKF, and also presents new ideas and alternative interpretations which further explain the success of the EnKF.
Abstract: The purpose of this paper is to provide a comprehensive presentation and interpretation of the Ensemble Kalman Filter (EnKF) and its numerical implementation. The EnKF has a large user group, and numerous publications have discussed applications and theoretical aspects of it. This paper reviews the important results from these studies and also presents new ideas and alternative interpretations which further explain the success of the EnKF. In addition to providing the theoretical framework needed for using the EnKF, there is also a focus on the algorithmic formulation and optimal numerical implementation. A program listing is given for some of the key subroutines. The paper also touches upon specific issues such as the use of nonlinear measurements, in situ profiles of temperature and salinity, and data which are available with high frequency in time. An ensemble based optimal interpolation (EnOI) scheme is presented as a cost-effective approach which may serve as an alternative to the EnKF in some applications. A fairly extensive discussion is devoted to the use of time correlated model errors and the estimation of model bias.

2,975 citations

Journal ArticleDOI
TL;DR: This article attempts to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots by highlighting both key challenges in robot reinforcement learning as well as notable successes.
Abstract: Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide both inspiration, impact, and validation for developments in reinforcement learning. The relationship between disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning as well as notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value-function-based and policy-search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and we note throughout open questions and the tremendous potential for future research.

2,391 citations