
Showing papers by "Robert Babuska published in 2012"


Journal ArticleDOI
01 Nov 2012
TL;DR: Describes the workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years, and gives a review of several standard and natural actor-critic algorithms.
Abstract: Policy-gradient-based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control, and finance. Although general surveys on reinforcement learning techniques already exist, no survey is specifically dedicated to actor-critic algorithms in particular. This paper, therefore, describes the state of the art of actor-critic algorithms, with a focus on methods that can work in an online setting and use function approximation in order to deal with continuous state and action spaces. After starting with a discussion on the concepts of reinforcement learning and the origins of actor-critic algorithms, this paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years. A review of several standard and natural actor-critic algorithms is given, and the paper concludes with an overview of application areas and a discussion on open issues.
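As a minimal illustration of the actor-critic idea surveyed here, the sketch below runs a softmax actor and a scalar critic on a one-state, two-action problem, with the critic's TD error driving both updates; the reward values and step sizes are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                  # actor preferences over two actions
V = 0.0                              # critic estimate of the single state's value
alpha_actor, alpha_critic = 0.1, 0.1
rewards = np.array([1.0, 0.0])       # action 0 is the better action

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(3000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)          # sample an action from the actor
    delta = rewards[a] - V           # TD error (one state, no successor term)
    V += alpha_critic * delta        # critic update
    grad = -pi                       # gradient of log pi(a) w.r.t. theta ...
    grad[a] += 1.0                   # ... is (indicator - pi)
    theta += alpha_actor * delta * grad  # actor update, scaled by the TD error

print(softmax(theta)[0])             # probability of the better action after learning
```

The critic's TD error, rather than the raw return, scales the policy-gradient step; this is the variance-reduction mechanism the abstract alludes to.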

764 citations


Journal ArticleDOI
01 Mar 2012
TL;DR: Develops a general experience-replay (ER) framework that can be combined with essentially any incremental RL technique, instantiates it for the approximate Q-learning and SARSA algorithms, and evaluates ER RL on real-time control experiments involving a pendulum swing-up problem and the vision-based control of a goalkeeper robot.
Abstract: Reinforcement-learning (RL) algorithms can automatically learn optimal control strategies for nonlinear, possibly stochastic systems. A promising approach for RL control is experience replay (ER), which learns quickly from a limited amount of data, by repeatedly presenting these data to an underlying RL algorithm. Despite its benefits, ER RL has been studied only sporadically in the literature, and its applications have largely been confined to simulated systems. Therefore, in this paper, we evaluate ER RL on real-time control experiments that involve a pendulum swing-up problem and the vision-based control of a goalkeeper robot. These real-time experiments are complemented by simulation studies and comparisons with traditional RL. As a preliminary, we develop a general ER framework that can be combined with essentially any incremental RL technique, and instantiate this framework for the approximate Q-learning and SARSA algorithms. The successful real-time learning results that are presented here are highly encouraging for the applicability of ER RL in practice.
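The core ER idea, repeatedly presenting a stored batch of transitions to an underlying incremental RL algorithm, can be sketched with Q-learning on a toy chain; the environment and constants below are illustrative, not the paper's pendulum or goalkeeper setups:

```python
import numpy as np

n_states, gamma, alpha = 4, 0.9, 0.5
Q = np.zeros((n_states, 2))

def step(s, a):
    # deterministic chain: action 1 moves right, 0 moves left; state 3 is terminal
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == 3 else 0.0), s2

# a small, fixed batch of experience covering every non-terminal (s, a) pair
buffer = [(s, a, *step(s, a)) for s in range(3) for a in range(2)]

# experience replay: present the same limited data to Q-learning many times
for _ in range(200):
    for s, a, r, s2 in buffer:
        target = r if s2 == 3 else r + gamma * Q[s2].max()
        Q[s, a] += alpha * (target - Q[s, a])

print(Q[0].argmax(), round(Q[0, 1], 3))   # → 1 0.81
```

Six stored transitions suffice to learn the optimal policy because each replay pass propagates value information one step further back along the chain.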

229 citations


Journal ArticleDOI
01 Jun 2012
TL;DR: Proposes two new actor-critic algorithms for reinforcement learning that learn a process model; one of them additionally learns a reference model representing a desired behavior, from which desired control actions can be calculated using the inverse of the learned process model.
Abstract: We propose two new actor-critic algorithms for reinforcement learning. Both algorithms use local linear regression (LLR) to learn approximations of the functions involved. A crucial feature of the algorithms is that they also learn a process model, and this, in combination with LLR, provides an efficient policy update for faster learning. The first algorithm uses a novel model-based update rule for the actor parameters. The second algorithm does not use an explicit actor but learns a reference model which represents a desired behavior, from which desired control actions can be calculated using the inverse of the learned process model. The two novel methods and a standard actor-critic algorithm are applied to the pendulum swing-up problem, in which the novel methods achieve faster learning than the standard algorithm.
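Local linear regression, the approximator both algorithms rely on, fits an affine model to the nearest stored samples of each query point; a minimal one-dimensional sketch (with an assumed neighborhood size k) is:

```python
import numpy as np

def llr_predict(X, y, xq, k=5):
    # fit an affine model y ≈ b0*x + b1 to the k samples nearest the query xq
    idx = np.argsort(np.abs(X - xq))[:k]
    A = np.column_stack([X[idx], np.ones(k)])
    beta, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    return beta[0] * xq + beta[1]

# memory of input-output samples from a nonlinear function
X = np.linspace(-np.pi, np.pi, 100)
y = np.sin(X)

err = abs(llr_predict(X, y, 0.3) - np.sin(0.3))
print(err < 1e-2)   # the local affine fit tracks the nonlinearity closely
```

Because the fit is local and non-parametric, new samples can simply be added to the memory, which is what makes LLR convenient for online learning.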

105 citations


Journal ArticleDOI
01 Sep 2012
TL;DR: Reviews recent advances in state-of-the-art learning algorithms and their applications to bipedal robot control.
Abstract: Over the past decades, machine learning techniques, such as supervised learning, reinforcement learning, and unsupervised learning, have been increasingly used in the control engineering community. Various learning algorithms have been developed to achieve autonomous operation and intelligent decision making for many complex and challenging control problems. One such problem is bipedal walking robot control. Although still in their early stages, learning techniques have demonstrated promising potential to build adaptive control systems for bipedal robots. This paper gives a review of recent advances in state-of-the-art learning algorithms and their applications to bipedal robot control. The effects and limitations of different learning techniques are discussed through a representative selection of examples from the literature. Guidelines for future research on learning control of bipedal robots are provided at the end.

83 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a robust optimization-based method to design the input actuation waveform for the piezo actuator in order to improve the damping of the residual oscillations in the presence of parametric uncertainties in the ink-channel model.
Abstract: The printing quality delivered by a Drop-on-Demand (DoD) piezo inkjet printhead is limited mainly due to the residual oscillations in the ink channel. The maximal jetting frequency of a DoD inkjet printhead can be increased by quickly damping the residual oscillations and thus bringing an ink channel to rest after jetting an ink drop. In this paper, we propose a robust optimization-based method to design the input actuation waveform for the piezo actuator in order to improve the damping of the residual oscillations in the presence of parametric uncertainties in the ink-channel model. The proposed method obtains a robust actuation pulse by minimizing the tracking error under the parametric uncertainty. Experimental results with a small-droplet DoD inkjet printhead are presented to show the efficacy of the proposed method and the significant improvement in the ink drop consistency.

41 citations


Book ChapterDOI
01 Jan 2012
TL;DR: This chapter reviews least-squares methods for policy iteration, an important class of algorithms for approximate reinforcement learning, and provides guarantees on the performance obtained asymptotically, as the number of samples processed and iterations executed grows to infinity.
Abstract: Approximate reinforcement learning deals with the essential problem of applying reinforcement learning in large and continuous state-action spaces, by using function approximators to represent the solution. This chapter reviews least-squares methods for policy iteration, an important class of algorithms for approximate reinforcement learning. We discuss three techniques for solving the core policy-evaluation component of policy iteration: least-squares temporal difference, least-squares policy evaluation, and Bellman residual minimization. We introduce these techniques starting from their general mathematical principles and detailing them down to fully specified algorithms. We pay attention to online variants of policy iteration, and provide a numerical example highlighting the behavior of representative offline and online methods. For the policy evaluation component as well as for the overall resulting approximate policy iteration, we provide guarantees on the performance obtained asymptotically, as the number of samples processed and iterations executed grows to infinity. We also provide finite-sample results, which apply when a finite number of samples and iterations are considered. Finally, we outline several extensions and improvements to the techniques and methods reviewed.
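As a sketch of the first of these techniques, least-squares temporal difference accumulates A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ φ(s)r over sampled transitions and solves Aw = b for the value-function weights; the chain MDP below is an illustrative example with tabular features, not one of the chapter's benchmarks:

```python
import numpy as np

gamma, n = 0.9, 3
phi = np.eye(n)                      # tabular features: one basis vector per state
# deterministic chain: 0 -> 1 (r=0), 1 -> 2 (r=1), 2 -> 2 (r=0)
transitions = [(0, 0.0, 1), (1, 1.0, 2), (2, 0.0, 2)]

A = np.zeros((n, n))
b = np.zeros(n)
for s, r, s2 in transitions:
    # LSTD(0) accumulation: A += phi(s)(phi(s) - gamma*phi(s'))^T, b += phi(s)*r
    A += np.outer(phi[s], phi[s] - gamma * phi[s2])
    b += phi[s] * r
w = np.linalg.solve(A, b)            # value estimate for each state
print(np.round(w, 3))                # V(0)=0.9, V(1)=1.0, V(2)=0.0
```

With tabular features and one sample per transition the solve is exact; with general features, the same construction yields the least-squares fixed point of the projected Bellman equation.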

27 citations


Proceedings ArticleDOI
10 Dec 2012
TL;DR: The results show that LLR-based actor-critic RL outperforms the RBF counterpart: it gives quick initial learning and comparable or even superior final control performance.
Abstract: Reinforcement learning (RL) control provides a means to deal with uncertainty and nonlinearity associated with control tasks in an optimal way. The class of actor-critic RL algorithms proved useful for control systems with continuous state and input variables. In the literature, model-based actor-critic algorithms have recently been introduced to considerably speed up the learning by constructing online a model through local linear regression (LLR). It has not been analyzed yet whether the speed-up is due to the model learning structure or the LLR approximator. Therefore, in this paper we generalize the model learning actor-critic algorithms to make them suitable for use with an arbitrary function approximator. Furthermore, we present the results of an extensive analysis through numerical simulations of a typical nonlinear motion control problem. The LLR approximator is compared with radial basis functions (RBFs) in terms of the initial convergence rate and in terms of the final performance obtained. The results show that LLR-based actor-critic RL outperforms the RBF counterpart: it gives quick initial learning and comparable or even superior final control performance.

25 citations


Proceedings ArticleDOI
27 Jun 2012
TL;DR: It is shown that the proposed method successfully finds the globally optimal schedule for different types of sheets and achieves an improvement in performance compared to the usual constraint-satisfaction scheduling.
Abstract: In this paper, an optimal scheduler for a printer is presented. The scheduling is based on the max-plus modeling framework, which allows the scheduling of multiple sheets to be modeled as discrete events in a system described by max-plus linear state-space equations. The optimal scheduler uses the feeding and handling time of each sheet as the design variables. It is shown that the proposed method successfully finds the globally optimal schedule for different types of sheets. Simulation results demonstrate an improvement in performance compared to the usual constraint-satisfaction scheduling.
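The max-plus recursion underlying such models, x(k+1) = A ⊗ x(k) with (A ⊗ x)_i = max_j (A_ij + x_j), can be sketched for an assumed two-stage feed-and-print line; the timings below are illustrative, not the paper's printer data:

```python
import numpy as np

def mp_mul(A, x):
    # max-plus matrix-vector product: (A ⊗ x)_i = max_j (A_ij + x_j)
    return np.array([np.max(row + x) for row in A])

t_feed, t_print = 1.0, 3.0
# serial line: a sheet can be fed once the previous sheet is fed, and printed
# once both it is fed and the previous sheet has finished printing
A = np.array([[t_feed,           -np.inf],
              [t_feed + t_print,  t_print]])
x = np.array([t_feed, t_feed + t_print])   # completion times of the first sheet
for _ in range(2):
    x = mp_mul(A, x)                       # advance to the next sheet
print(x)                                   # completion times of the third sheet
```

The steady increment of the print-stage completion time (3.0 per sheet here) is the line's cycle time, which is what a max-plus scheduler manipulates through the feeding times.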

14 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a system-identification-based approach to build a MIMO model of an inkjet printhead; the identified model is then used to design new actuation pulses that effectively minimize the residual oscillations and the cross-talk.
Abstract: The printing quality delivered by a drop-on-demand inkjet printhead is severely affected by the residual oscillations in an ink channel and the cross-talk between neighboring ink channels. For a single ink channel, our earlier contribution shows that the actuation pulse can be designed, using a physical model, to effectively damp the residual oscillations. It is not always possible to obtain a good physical model for a single ink channel. A physical model for a multi-input multi-output (MIMO) inkjet printhead is made even more sophisticated by the presence of the cross-talk effect. This paper proposes a system identification-based approach to build a MIMO model for an inkjet printhead. Additionally, the identified MIMO model is used to design new actuation pulses to effectively minimize the residual oscillations and the cross-talk. Using simulation and experimental results, we demonstrate the efficacy of the proposed method.

10 citations


Proceedings ArticleDOI
24 May 2012
TL;DR: Uses imitation to quickly generate a rough solution to a robotic task from demonstrations, supplied as a collection of state-space trajectories; the learned dynamics capture the overall dynamics of the motion, making the proposed approach a promising step towards versatile learning machines such as future household robots or robots for autonomous missions.
Abstract: Humans are very fast learners. Yet, we rarely learn a task completely from scratch. Instead, we usually start with a rough approximation of the desired behavior and take the learning from there. In this paper, we use imitation to quickly generate a rough solution to a robotic task from demonstrations, supplied as a collection of state-space trajectories. Appropriate control actions needed to steer the system along the trajectories are then automatically learned in the form of a (nonlinear) state-feedback control law. The learning scheme has two components: a dynamic reference model and an adaptive inverse process model, both based on a data-driven, non-parametric method called local linear regression. The reference model infers the desired behavior from the demonstration trajectories, while the inverse process model provides the control actions to achieve this behavior and is improved online using learning. Experimental results with a pendulum swing-up problem and a robotic arm demonstrate the practical usefulness of this approach. The resulting learned dynamics are not limited to single trajectories, but capture instead the overall dynamics of the motion, making the proposed approach a promising step towards versatile learning machines such as future household robots, or robots for autonomous missions.
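The interplay of the two components can be sketched on a known scalar linear plant standing in for the learned LLR models: a reference model proposes the next desired state, and an inverse process model computes the action that reaches it. All constants here are illustrative assumptions:

```python
# Sketch of reference-model + inverse-model control on an assumed scalar
# plant x' = a*x + b*u (in the paper, both models are learned via LLR).
a, b = 0.9, 0.5

def inverse_model(x, x_des):
    # action that drives the (assumed, known) plant to x_des in one step
    return (x_des - a * x) / b

x, target = 5.0, 0.0
for _ in range(20):
    x_des = x + 0.3 * (target - x)   # reference model: step a fraction toward the goal
    u = inverse_model(x, x_des)      # inverse process model supplies the action
    x = a * x + b * u                # plant follows exactly (perfect model case)
print(abs(x) < 1e-2)                 # state has converged to the target
```

In the paper the inverse model is imperfect and improved online; this perfect-model sketch only shows why no explicit actor is needed.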

7 citations


Proceedings ArticleDOI
10 Jun 2012
TL;DR: This paper uses an adaptive fuzzy observer to estimate the uncertainties in the state matrices of a two-degrees-of-freedom robot arm model and analyzes the improvement in the achievable controller performance when using the adaptive observer.
Abstract: Recently, adaptive fuzzy observers have been introduced that are capable of estimating uncertainties along with the states of a nonlinear system represented by an uncertain Takagi-Sugeno (TS) model. In this paper, we use such an adaptive observer to estimate the uncertainties in the state matrices of a two-degrees-of-freedom robot arm model. The TS model of the robot arm is constructed using the sector nonlinearity approach. The estimates are used in updating the model, and the updated model is used to design a controller for the robot arm. We analyze the improvement in the achievable controller performance when using the adaptive observer.

Proceedings ArticleDOI
01 Oct 2012
TL;DR: Cubic B-splines are used to preserve the continuity of the first- and second-order spatial derivatives of the distributed-parameter system and to approximate the value of the variables of interest in locations with no sensors.
Abstract: In this paper we address the identification of linear distributed-parameter systems with missing data. This setting is relevant in, for instance, sensor networks, where data are frequently lost due to transmission errors. We consider an identification problem where the only information available about the system are the input-output measurements from a set of sensors placed at known fixed locations in the distributed-parameter system. The model is represented as a set of coupled multi-input, single-output autoregressive with exogenous input (ARX) submodels. Total least-squares estimation is employed to obtain an unbiased parameter estimate in the presence of sensor noise. The missing samples are reconstructed with the help of an iterative algorithm. To approximate the value of the variables of interest in locations with no sensors, we use cubic B-splines to preserve the continuity of the first-order and second-order spatial derivatives. The method is applied to a simulated one-dimensional heat-conduction process.
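The total least-squares step can be sketched with the textbook SVD construction, which allows errors in both the regressor matrix and the output, as happens when sensor noise enters the ARX regressors; the data below are synthetic, and the missing-data and B-spline parts of the method are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = np.array([0.8, -0.2])          # assumed "true" ARX parameters
A_clean = rng.normal(size=(200, 2))         # noise-free regressors
b_clean = A_clean @ theta_true
A = A_clean + 0.05 * rng.normal(size=A_clean.shape)   # noisy regressors
b = b_clean + 0.05 * rng.normal(size=b_clean.shape)   # noisy outputs

# TLS: the solution comes from the right-singular vector of [A b] with the
# smallest singular value, scaled so its last entry is -1
_, _, Vt = np.linalg.svd(np.column_stack([A, b]))
v = Vt[-1]
theta_tls = -v[:-1] / v[-1]
print(np.round(theta_tls, 2))
```

Unlike ordinary least squares, which is biased when the regressors themselves are noisy, this estimate remains consistent under the errors-in-variables assumption.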

Posted Content
TL;DR: In this article, the authors considered the problem of maximizing the algebraic connectivity of the communication graph in a network of mobile robots by moving them into appropriate positions; they defined the Laplacian of the graph as dependent on the pairwise distances between the robots and approximated the problem as a sequence of semi-definite programs (SDPs).
Abstract: We consider the problem of maximizing the algebraic connectivity of the communication graph in a network of mobile robots by moving them into appropriate positions. We define the Laplacian of the graph as dependent on the pairwise distance between the robots and we approximate the problem as a sequence of Semi-Definite Programs (SDP). We propose a distributed solution consisting of local SDP's which use information only from nearby neighboring robots. We show that the resulting distributed optimization framework leads to feasible subproblems and through its repeated execution, the algebraic connectivity increases monotonically. Moreover, we describe how to adjust the communication load of the robots based on locally computable measures. Numerical simulations show the performance of the algorithm with respect to the centralized solution.
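Algebraic connectivity, the quantity being maximized, is the second-smallest eigenvalue of the graph Laplacian; the sketch below computes it for a distance-dependent communication graph, using an assumed 0/1 weight rule and illustrative positions rather than the paper's weighting:

```python
import numpy as np

def algebraic_connectivity(pos, radius=1.5):
    # build a 0/1 adjacency matrix from pairwise distances (assumed weight rule)
    n = len(pos)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(pos[i] - pos[j]) < radius:
                W[i, j] = W[j, i] = 1.0
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian
    return np.linalg.eigvalsh(L)[1]         # second-smallest eigenvalue

line = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])   # path: ends out of range
tight = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.8]])  # triangle: all in range
print(algebraic_connectivity(line) < algebraic_connectivity(tight))
```

Moving the robots closer together adds edges and raises this eigenvalue, which is the effect the distributed SDP sequence exploits.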