
Showing papers by "Robert Babuska published in 2012"


Journal ArticleDOI
01 Nov 2012
TL;DR: Describes the workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years, and gives a review of several standard and natural actor-critic algorithms.
Abstract: Policy-gradient-based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control, and finance. Although general surveys on reinforcement learning techniques already exist, no survey is specifically dedicated to actor-critic algorithms in particular. This paper, therefore, describes the state of the art of actor-critic algorithms, with a focus on methods that can work in an online setting and use function approximation in order to deal with continuous state and action spaces. After starting with a discussion on the concepts of reinforcement learning and the origins of actor-critic algorithms, this paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years. A review of several standard and natural actor-critic algorithms is given, and the paper concludes with an overview of application areas and a discussion on open issues.
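As a minimal illustration of the actor-critic idea surveyed here, the sketch below runs a softmax actor and a scalar critic on a one-state, two-action problem, with the critic's TD error driving both updates; the reward values and step sizes are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                  # actor preferences over two actions
V = 0.0                              # critic estimate of the single state's value
alpha_actor, alpha_critic = 0.1, 0.1
rewards = np.array([1.0, 0.0])       # action 0 is the better action

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(3000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)          # sample an action from the actor
    delta = rewards[a] - V           # TD error (one state, no successor term)
    V += alpha_critic * delta        # critic update
    grad = -pi                       # gradient of log pi(a) w.r.t. theta ...
    grad[a] += 1.0                   # ... is (indicator - pi)
    theta += alpha_actor * delta * grad  # actor update, scaled by the TD error

print(softmax(theta)[0])             # probability of the better action after learning
```

The critic's TD error, rather than the raw return, scales the policy-gradient step; this is the variance-reduction mechanism the abstract alludes to.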

764 citations


Journal ArticleDOI
01 Mar 2012
TL;DR: Develops a general experience-replay (ER) framework that can be combined with essentially any incremental RL technique, instantiates it for the approximate Q-learning and SARSA algorithms, and evaluates ER RL on real-time control experiments involving a pendulum swing-up problem and the vision-based control of a goalkeeper robot.
Abstract: Reinforcement-learning (RL) algorithms can automatically learn optimal control strategies for nonlinear, possibly stochastic systems. A promising approach for RL control is experience replay (ER), which learns quickly from a limited amount of data, by repeatedly presenting these data to an underlying RL algorithm. Despite its benefits, ER RL has been studied only sporadically in the literature, and its applications have largely been confined to simulated systems. Therefore, in this paper, we evaluate ER RL on real-time control experiments that involve a pendulum swing-up problem and the vision-based control of a goalkeeper robot. These real-time experiments are complemented by simulation studies and comparisons with traditional RL. As a preliminary, we develop a general ER framework that can be combined with essentially any incremental RL technique, and instantiate this framework for the approximate Q-learning and SARSA algorithms. The successful real-time learning results that are presented here are highly encouraging for the applicability of ER RL in practice.
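The core ER idea, repeatedly presenting a stored batch of transitions to an underlying incremental RL algorithm, can be sketched with Q-learning on a toy chain; the environment and constants below are illustrative, not the paper's pendulum or goalkeeper setups:

```python
import numpy as np

n_states, gamma, alpha = 4, 0.9, 0.5
Q = np.zeros((n_states, 2))

def step(s, a):
    # deterministic chain: action 1 moves right, 0 moves left; state 3 is terminal
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == 3 else 0.0), s2

# a small, fixed batch of experience covering every non-terminal (s, a) pair
buffer = [(s, a, *step(s, a)) for s in range(3) for a in range(2)]

# experience replay: present the same limited data to Q-learning many times
for _ in range(200):
    for s, a, r, s2 in buffer:
        target = r if s2 == 3 else r + gamma * Q[s2].max()
        Q[s, a] += alpha * (target - Q[s, a])

print(Q[0].argmax(), round(Q[0, 1], 3))   # → 1 0.81
```

Six stored transitions suffice to learn the optimal policy because each replay pass propagates value information one step further back along the chain.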

229 citations


Journal ArticleDOI
01 Jun 2012
TL;DR: Proposes two new actor-critic algorithms for reinforcement learning that learn a process model; one of them additionally learns a reference model representing a desired behavior, from which desired control actions can be calculated using the inverse of the learned process model.
Abstract: We propose two new actor-critic algorithms for reinforcement learning. Both algorithms use local linear regression (LLR) to learn approximations of the functions involved. A crucial feature of the algorithms is that they also learn a process model, and this, in combination with LLR, provides an efficient policy update for faster learning. The first algorithm uses a novel model-based update rule for the actor parameters. The second algorithm does not use an explicit actor but learns a reference model which represents a desired behavior, from which desired control actions can be calculated using the inverse of the learned process model. The two novel methods and a standard actor-critic algorithm are applied to the pendulum swing-up problem, in which the novel methods achieve faster learning than the standard algorithm.
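Local linear regression, the approximator both algorithms rely on, fits an affine model to the nearest stored samples of each query point; a minimal one-dimensional sketch (with an assumed neighborhood size k) is:

```python
import numpy as np

def llr_predict(X, y, xq, k=5):
    # fit an affine model y ≈ b0*x + b1 to the k samples nearest the query xq
    idx = np.argsort(np.abs(X - xq))[:k]
    A = np.column_stack([X[idx], np.ones(k)])
    beta, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    return beta[0] * xq + beta[1]

# memory of input-output samples from a nonlinear function
X = np.linspace(-np.pi, np.pi, 100)
y = np.sin(X)

err = abs(llr_predict(X, y, 0.3) - np.sin(0.3))
print(err < 1e-2)   # the local affine fit tracks the nonlinearity closely
```

Because the fit is local and non-parametric, new samples can simply be added to the memory, which is what makes LLR convenient for online learning.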

105 citations


Journal ArticleDOI
01 Sep 2012
TL;DR: Reviews recent advances in state-of-the-art learning algorithms and their applications to bipedal robot control.
Abstract: Over the past decades, machine learning techniques, such as supervised learning, reinforcement learning, and unsupervised learning, have been increasingly used in the control engineering community. Various learning algorithms have been developed to achieve autonomous operation and intelligent decision making for many complex and challenging control problems. One such problem is bipedal walking robot control. Although still in their early stages, learning techniques have demonstrated promising potential to build adaptive control systems for bipedal robots. This paper gives a review of recent advances in state-of-the-art learning algorithms and their applications to bipedal robot control. The effects and limitations of different learning techniques are discussed through a representative selection of examples from the literature. Guidelines for future research on learning control of bipedal robots are provided at the end.

83 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a robust optimization-based method to design the input actuation waveform for the piezo actuator in order to improve the damping of the residual oscillations in the presence of parametric uncertainties in the ink-channel model.
Abstract: The printing quality delivered by a Drop-on-Demand (DoD) piezo inkjet printhead is limited mainly due to the residual oscillations in the ink channel. The maximal jetting frequency of a DoD inkjet printhead can be increased by quickly damping the residual oscillations and thus bringing an ink channel to rest after jetting an ink drop. In this paper, we propose a robust optimization-based method to design the input actuation waveform for the piezo actuator in order to improve the damping of the residual oscillations in the presence of parametric uncertainties in the ink-channel model. The proposed method obtains a robust actuation pulse by minimizing the tracking error under the parametric uncertainty. Experimental results with a small-droplet DoD inkjet printhead are presented to show the efficacy of the proposed method and the significant improvement in the ink drop consistency.

41 citations


Book ChapterDOI
01 Jan 2012
TL;DR: This chapter reviews least-squares methods for policy iteration, an important class of algorithms for approximate reinforcement learning, and provides guarantees on the performance obtained asymptotically, as the number of samples processed and iterations executed grows to infinity.
Abstract: Approximate reinforcement learning deals with the essential problem of applying reinforcement learning in large and continuous state-action spaces, by using function approximators to represent the solution. This chapter reviews least-squares methods for policy iteration, an important class of algorithms for approximate reinforcement learning. We discuss three techniques for solving the core policy-evaluation component of policy iteration: least-squares temporal difference, least-squares policy evaluation, and Bellman residual minimization. We introduce these techniques starting from their general mathematical principles and detailing them down to fully specified algorithms. We pay attention to online variants of policy iteration, and provide a numerical example highlighting the behavior of representative offline and online methods. For the policy evaluation component as well as for the overall resulting approximate policy iteration, we provide guarantees on the performance obtained asymptotically, as the number of samples processed and iterations executed grows to infinity. We also provide finite-sample results, which apply when a finite number of samples and iterations are considered. Finally, we outline several extensions and improvements to the techniques and methods reviewed.
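As a sketch of the first of these techniques, least-squares temporal difference accumulates A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ φ(s)r over sampled transitions and solves Aw = b for the value-function weights; the chain MDP below is an illustrative example with tabular features, not one of the chapter's benchmarks:

```python
import numpy as np

gamma, n = 0.9, 3
phi = np.eye(n)                      # tabular features: one basis vector per state
# deterministic chain: 0 -> 1 (r=0), 1 -> 2 (r=1), 2 -> 2 (r=0)
transitions = [(0, 0.0, 1), (1, 1.0, 2), (2, 0.0, 2)]

A = np.zeros((n, n))
b = np.zeros(n)
for s, r, s2 in transitions:
    # LSTD(0) accumulation: A += phi(s)(phi(s) - gamma*phi(s'))^T, b += phi(s)*r
    A += np.outer(phi[s], phi[s] - gamma * phi[s2])
    b += phi[s] * r
w = np.linalg.solve(A, b)            # value estimate for each state
print(np.round(w, 3))                # V(0)=0.9, V(1)=1.0, V(2)=0.0
```

With tabular features and one sample per transition the solve is exact; with general features, the same construction yields the least-squares fixed point of the projected Bellman equation.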

27 citations


Proceedings ArticleDOI
10 Dec 2012
TL;DR: The results show that LLR-based actor-critic RL outperforms the RBF counterpart: it gives quick initial learning and comparable or even superior final control performance.
Abstract: Reinforcement learning (RL) control provides a means to deal with uncertainty and nonlinearity associated with control tasks in an optimal way. The class of actor-critic RL algorithms proved useful for control systems with continuous state and input variables. In the literature, model-based actor-critic algorithms have recently been introduced to considerably speed up the learning by constructing online a model through local linear regression (LLR). It has not been analyzed yet whether the speed-up is due to the model learning structure or the LLR approximator. Therefore, in this paper we generalize the model learning actor-critic algorithms to make them suitable for use with an arbitrary function approximator. Furthermore, we present the results of an extensive analysis through numerical simulations of a typical nonlinear motion control problem. The LLR approximator is compared with radial basis functions (RBFs) in terms of the initial convergence rate and in terms of the final performance obtained. The results show that LLR-based actor-critic RL outperforms the RBF counterpart: it gives quick initial learning and comparable or even superior final control performance.

25 citations


Proceedings ArticleDOI
27 Jun 2012
TL;DR: It is shown that the proposed method successfully finds the globally optimal schedule for different types of sheets and achieves an improvement in performance compared to the usual constraint-satisfaction scheduling.
Abstract: In this paper, an optimal scheduler for a printer is presented. The scheduling is based on the max-plus modeling framework, which allows the scheduling of multiple sheets to be modeled as discrete events in a system described by max-plus linear state-space equations. The optimal scheduler uses the feeding and handling time of each sheet as the design variables. It is shown that the proposed method successfully finds the globally optimal schedule for different types of sheets. Simulation results demonstrate an improvement in performance compared to the usual constraint-satisfaction scheduling.
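The max-plus recursion underlying such models, x(k+1) = A ⊗ x(k) with (A ⊗ x)_i = max_j (A_ij + x_j), can be sketched for an assumed two-stage feed-and-print line; the timings below are illustrative, not the paper's printer data:

```python
import numpy as np

def mp_mul(A, x):
    # max-plus matrix-vector product: (A ⊗ x)_i = max_j (A_ij + x_j)
    return np.array([np.max(row + x) for row in A])

t_feed, t_print = 1.0, 3.0
# serial line: a sheet can be fed once the previous sheet is fed, and printed
# once both it is fed and the previous sheet has finished printing
A = np.array([[t_feed,           -np.inf],
              [t_feed + t_print,  t_print]])
x = np.array([t_feed, t_feed + t_print])   # completion times of the first sheet
for _ in range(2):
    x = mp_mul(A, x)                       # advance to the next sheet
print(x)                                   # completion times of the third sheet
```

The steady increment of the print-stage completion time (3.0 per sheet here) is the line's cycle time, which is what a max-plus scheduler manipulates through the feeding times.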

14 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a system-identification-based approach to build a MIMO model of an inkjet printhead; the identified model is then used to design new actuation pulses that effectively minimize the residual oscillations and the cross-talk.
Abstract: The printing quality delivered by a drop-on-demand inkjet printhead is severely affected by the residual oscillations in an ink channel and the cross-talk between neighboring ink channels. For a single ink channel, our earlier contribution shows that the actuation pulse can be designed, using a physical model, to effectively damp the residual oscillations. It is not always possible to obtain a good physical model for a single ink channel. A physical model for a multi-input multi-output (MIMO) inkjet printhead is made even more sophisticated by the presence of the cross-talk effect. This paper proposes a system identification-based approach to build a MIMO model for an inkjet printhead. Additionally, the identified MIMO model is used to design new actuation pulses to effectively minimize the residual oscillations and the cross-talk. Using simulation and experimental results, we demonstrate the efficacy of the proposed method.

10 citations


Proceedings ArticleDOI
24 May 2012
TL;DR: Uses imitation to quickly generate a rough solution to a robotic task from demonstrations, supplied as a collection of state-space trajectories; the learned dynamics capture the overall dynamics of the motion, making the proposed approach a promising step towards versatile learning machines such as future household robots or robots for autonomous missions.
Abstract: Humans are very fast learners. Yet, we rarely learn a task completely from scratch. Instead, we usually start with a rough approximation of the desired behavior and take the learning from there. In this paper, we use imitation to quickly generate a rough solution to a robotic task from demonstrations, supplied as a collection of state-space trajectories. Appropriate control actions needed to steer the system along the trajectories are then automatically learned in the form of a (nonlinear) state-feedback control law. The learning scheme has two components: a dynamic reference model and an adaptive inverse process model, both based on a data-driven, non-parametric method called local linear regression. The reference model infers the desired behavior from the demonstration trajectories, while the inverse process model provides the control actions to achieve this behavior and is improved online using learning. Experimental results with a pendulum swing-up problem and a robotic arm demonstrate the practical usefulness of this approach. The resulting learned dynamics are not limited to single trajectories, but capture instead the overall dynamics of the motion, making the proposed approach a promising step towards versatile learning machines such as future household robots, or robots for autonomous missions.
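The interplay of the two components can be sketched on a known scalar linear plant standing in for the learned LLR models: a reference model proposes the next desired state, and an inverse process model computes the action that reaches it. All constants here are illustrative assumptions:

```python
# Sketch of reference-model + inverse-model control on an assumed scalar
# plant x' = a*x + b*u (in the paper, both models are learned via LLR).
a, b = 0.9, 0.5

def inverse_model(x, x_des):
    # action that drives the (assumed, known) plant to x_des in one step
    return (x_des - a * x) / b

x, target = 5.0, 0.0
for _ in range(20):
    x_des = x + 0.3 * (target - x)   # reference model: step a fraction toward the goal
    u = inverse_model(x, x_des)      # inverse process model supplies the action
    x = a * x + b * u                # plant follows exactly (perfect model case)
print(abs(x) < 1e-2)                 # state has converged to the target
```

In the paper the inverse model is imperfect and improved online; this perfect-model sketch only shows why no explicit actor is needed.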

7 citations


Proceedings ArticleDOI
10 Jun 2012
TL;DR: This paper uses an adaptive fuzzy observer to estimate the uncertainties in the state matrices of a two-degrees-of-freedom robot arm model and analyzes the improvement in the achievable controller performance when using the adaptive observer.
Abstract: Recently, adaptive fuzzy observers have been introduced that are capable of estimating uncertainties along with the states of a nonlinear system represented by an uncertain Takagi-Sugeno (TS) model. In this paper, we use such an adaptive observer to estimate the uncertainties in the state matrices of a two-degrees-of-freedom robot arm model. The TS model of the robot arm is constructed using the sector nonlinearity approach. The estimates are used in updating the model, and the updated model is used to design a controller for the robot arm. We analyze the improvement in the achievable controller performance when using the adaptive observer.

Proceedings ArticleDOI
01 Oct 2012
TL;DR: Cubic B-splines are used to preserve the continuity of the first- and second-order spatial derivatives of the distributed-parameter system and to approximate the value of the variables of interest in locations with no sensors.
Abstract: In this paper we address the identification of linear distributed-parameter systems with missing data. This setting is relevant in, for instance, sensor networks, where data are frequently lost due to transmission errors. We consider an identification problem where the only information available about the system are the input-output measurements from a set of sensors placed at known fixed locations in the distributed-parameter system. The model is represented as a set of coupled multi-input, single-output autoregressive with exogenous input (ARX) submodels. Total least-squares estimation is employed to obtain an unbiased parameter estimate in the presence of sensor noise. The missing samples are reconstructed with the help of an iterative algorithm. To approximate the value of the variables of interest in locations with no sensors, we use cubic B-splines to preserve the continuity of the first-order and second-order spatial derivatives. The method is applied to a simulated one-dimensional heat-conduction process.
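The total least-squares step can be sketched with the textbook SVD construction, which allows errors in both the regressor matrix and the output, as happens when sensor noise enters the ARX regressors; the data below are synthetic, and the missing-data and B-spline parts of the method are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = np.array([0.8, -0.2])          # assumed "true" ARX parameters
A_clean = rng.normal(size=(200, 2))         # noise-free regressors
b_clean = A_clean @ theta_true
A = A_clean + 0.05 * rng.normal(size=A_clean.shape)   # noisy regressors
b = b_clean + 0.05 * rng.normal(size=b_clean.shape)   # noisy outputs

# TLS: the solution comes from the right-singular vector of [A b] with the
# smallest singular value, scaled so its last entry is -1
_, _, Vt = np.linalg.svd(np.column_stack([A, b]))
v = Vt[-1]
theta_tls = -v[:-1] / v[-1]
print(np.round(theta_tls, 2))
```

Unlike ordinary least squares, which is biased when the regressors themselves are noisy, this estimate remains consistent under the errors-in-variables assumption.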

Posted Content
TL;DR: In this article, the authors considered the problem of maximizing the algebraic connectivity of the communication graph in a network of mobile robots by moving them into appropriate positions; they defined the Laplacian of the graph as dependent on the pairwise distances between the robots and approximated the problem as a sequence of semi-definite programs (SDPs).
Abstract: We consider the problem of maximizing the algebraic connectivity of the communication graph in a network of mobile robots by moving them into appropriate positions. We define the Laplacian of the graph as dependent on the pairwise distance between the robots and we approximate the problem as a sequence of Semi-Definite Programs (SDP). We propose a distributed solution consisting of local SDP's which use information only from nearby neighboring robots. We show that the resulting distributed optimization framework leads to feasible subproblems and through its repeated execution, the algebraic connectivity increases monotonically. Moreover, we describe how to adjust the communication load of the robots based on locally computable measures. Numerical simulations show the performance of the algorithm with respect to the centralized solution.
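Algebraic connectivity, the quantity being maximized, is the second-smallest eigenvalue of the graph Laplacian; the sketch below computes it for a distance-dependent communication graph, using an assumed 0/1 weight rule and illustrative positions rather than the paper's weighting:

```python
import numpy as np

def algebraic_connectivity(pos, radius=1.5):
    # build a 0/1 adjacency matrix from pairwise distances (assumed weight rule)
    n = len(pos)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(pos[i] - pos[j]) < radius:
                W[i, j] = W[j, i] = 1.0
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian
    return np.linalg.eigvalsh(L)[1]         # second-smallest eigenvalue

line = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])   # path: ends out of range
tight = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.8]])  # triangle: all in range
print(algebraic_connectivity(line) < algebraic_connectivity(tight))
```

Moving the robots closer together adds edges and raises this eigenvalue, which is the effect the distributed SDP sequence exploits.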