
Showing papers by "Robert Babuska published in 2014"


Journal ArticleDOI
TL;DR: A novel reinforcement learning value iteration algorithm is presented to solve dynamic graphical games in an online manner, along with its proof of convergence; it is further proved that the underlying equilibrium notion holds if all agents are in Nash equilibrium and the graph is strongly connected.

179 citations


Journal ArticleDOI
TL;DR: The learning algorithm is shown to be capable of achieving feedback regulation in the presence of model uncertainties, and experimental results for a two-degree-of-freedom manipulator are presented.

28 citations


Journal ArticleDOI
TL;DR: Two nonparametric filters are investigated: the benchmark Bootstrap Particle Filter (BPF) and the recently developed Feedback Particle Filter (FPF); the comparison concludes that the FPF outperforms the benchmark method in both accuracy and numerical efficiency.

28 citations


Journal ArticleDOI
TL;DR: The framework presented in this paper relies on a compact representation of the gait space, provides guarantees regarding the transient and steady-state behavior, and results in simple implementations on legged robotic platforms.
Abstract: We present a gait generation framework for multilegged robots based on max-plus algebra that is endowed with intrinsically safe gait transitions. The time schedule of each foot liftoff and touchdown is modeled by sets of max-plus linear equations. The resulting discrete-event system is translated to continuous time via piecewise constant leg phase velocities; thus, it is compatible with traditional central pattern generator approaches. Different gaits and gait parameters are interleaved by utilizing different max-plus system matrices. We present various gait transition schemes and show that optimal transitions, in the sense of minimizing the stance time variation, allow for constant acceleration and deceleration on legged platforms. The framework presented in this paper relies on a compact representation of the gait space, provides guarantees regarding the transient and steady-state behavior, and results in simple implementations on legged robotic platforms.

26 citations
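The max-plus scheduling idea above can be made concrete in a few lines. The sketch below uses a toy single-leg model with assumed swing and stance durations (the paper's actual multi-legged system matrices are more elaborate): in max-plus algebra, "addition" is max and "multiplication" is +, so event times evolve as x(k+1) = A ⊗ x(k).

```python
NEG_INF = float("-inf")  # the max-plus zero element

def maxplus_matvec(A, x):
    """Max-plus matrix-vector product: y_i = max_j (A[i][j] + x[j])."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]

t_swing, t_stance = 0.2, 0.8          # assumed swing/stance durations (s)
# State x = [liftoff time, touchdown time] of the current gait cycle.
A = [
    [NEG_INF, t_stance],              # next liftoff: t_stance after touchdown
    [NEG_INF, t_stance + t_swing],    # next touchdown: t_swing after that liftoff
]

x = [0.0, t_swing]                    # cycle 0: liftoff at 0, touchdown at 0.2
schedule = [x]
for _ in range(3):
    x = maxplus_matvec(A, x)
    schedule.append(x)
```

Successive event times repeat with period t_swing + t_stance, the max-plus eigenvalue of A; changing gaits then amounts to swapping in a different system matrix, as the abstract describes.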


Journal ArticleDOI
TL;DR: In this article, a Natural Actor-Critic (NAC) reinforcement learning algorithm was used to learn the hitting motion of a badminton robot during a serve, where the goal is to reach the target state as quickly as possible without violating the limitations of the actuator.

23 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a nonlinear control approach for balancing underactuated legged robots based on the State-Dependent Riccati Equation (SDRE) approach.

19 citations


Proceedings ArticleDOI
01 Nov 2014
TL;DR: This paper demonstrates through real-world experiments that a humanoid robot NAO is able to autonomously learn how to manipulate two types of garbage cans with lids that need to be opened and closed by different motor skills.
Abstract: Learning to perform household tasks is a key step towards developing cognitive service robots. This requires that robots are capable of discovering how to use human-designed products. In this paper, we propose an active learning approach for acquiring object affordances and manipulation skills in a bottom-up manner. We address affordance learning in continuous state and action spaces without manual discretization of states or exploratory motor primitives. During exploration in the action space, the robot learns a forward model to predict action effects. It simultaneously updates the active exploration policy through reinforcement learning, whereby the prediction error serves as the intrinsic reward. By using the learned forward model, motor skills are obtained to achieve goal states of an object. We demonstrate through real-world experiments that a humanoid robot NAO is able to autonomously learn how to manipulate two types of garbage cans with lids that need to be opened and closed by different motor skills.

19 citations
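The exploration loop described in the abstract can be sketched in a few lines. The toy below replaces the robot's continuous-space setting with a 1-D stand-in whose dynamics, action set, and learning rates are all invented for illustration: a linear forward model predicts action effects, and the prediction error serves both as the intrinsic reward steering exploration and as the learning signal for the model.

```python
def true_effect(a):
    """Unknown environment dynamics (assumed for the demo)."""
    return 2.0 * a + 1.0

model = {"w": 0.0, "b": 0.0}          # linear forward model: effect ~ w*a + b
actions = [0.0, 0.5, 1.0]             # toy action set
bonus = {a: 1.0 for a in actions}     # intrinsic value (last prediction error)

for _ in range(200):
    # Explore the action whose effect is currently predicted worst.
    a = max(actions, key=lambda act: bonus[act])
    prediction = model["w"] * a + model["b"]
    err = true_effect(a) - prediction
    bonus[a] = abs(err)               # intrinsic reward = prediction error
    # Gradient (LMS) step on the forward model.
    model["w"] += 0.1 * err * a
    model["b"] += 0.1 * err
```

Once the forward model is accurate, it can be searched to select actions that reach a desired goal state, which is how motor skills are obtained in the paper.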


Journal ArticleDOI
TL;DR: It is shown that the use of a quadratic reward function in on-line RL may lead to counter-intuitive results, namely a large steady-state error, which is not acceptable from a control-theoretic point of view.

16 citations


Journal ArticleDOI
TL;DR: Inspired by human kinematics, a detailed procedure is proposed for the WRT and RWT in an adjustable-stiffness spring-mass model, and control parameters are derived that ensure effective gait transitions.

13 citations


Proceedings ArticleDOI
15 Dec 2014
TL;DR: This paper proposes transfer learning of affordances to reduce the number of exploratory actions needed to learn how to use a new object and demonstrates through real-world experiments with the humanoid robot NAO that this method is able to speed up the use of a new type of garbage can by transferring the affordances learned previously for similar garbage cans.
Abstract: Learning how to use functional objects is essential for robots that are to carry out household tasks. However, learning every object from scratch would be a very naive and time-consuming approach. In this paper, we propose transfer learning of affordances to reduce the number of exploratory actions needed to learn how to use a new object. Through embodied interaction with the object, the robot discovers the object's similarity to previously learned objects by comparing their shape features and spatial relations between object parts. The robot actively selects object parts along with parameterized actions and evaluates the effects on-line. We demonstrate through real-world experiments with the humanoid robot NAO that our method is able to speed up the use of a new type of garbage can by transferring the affordances learned previously for similar garbage cans.

11 citations


Proceedings ArticleDOI
01 Dec 2014
TL;DR: The results show that the learning module can rapidly augment the designed sequential composition with new control policies, such that the supervisor can handle unforeseen situations online.
Abstract: Sequential composition is an effective approach to address the control of complex dynamical systems. However, it is not designed to cope with unforeseen situations that might occur during runtime. This paper extends sequential composition control via learning new policies. A learning module based on reinforcement learning is added to the traditional sequential composition that allows for the online creation of new control policies in a short amount of time, on an as-needed basis. During learning, the domain of attraction (DOA) of the new control policy is continuously monitored. Hence, the learning process only executes until the supervisor is able to compose the new control policy with the designed controllers via the overlap of DOAs. Estimating the DOAs of the learned controllers is achieved by solving an optimization problem. The proposed strategy has been simulated on a nonlinear system. The results show that the learning module can rapidly augment the designed sequential composition with new control policies such that the supervisor can handle unforeseen situations online.
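A minimal sketch of the composition mechanism (with invented scalar dynamics, gains, and DOAs; the paper estimates the DOAs by solving an optimization problem) looks as follows: a supervisor holds a sequence of controllers, each with an estimated domain of attraction, and hands over the state as soon as it enters the DOA of the next controller.

```python
def controller(gain, target):
    """Simple proportional state feedback toward a target equilibrium."""
    return lambda x: gain * (target - x)

# Two hypothetical controllers whose DOAs overlap on (1, 3):
# A stabilizes x = 2 on (1, 5], B stabilizes x = 0 on [-1, 3).
plans = [
    {"u": controller(0.5, 2.0), "doa": lambda x: 1.0 < x <= 5.0},
    {"u": controller(0.5, 0.0), "doa": lambda x: -1.0 <= x < 3.0},
]

x, active = 4.0, 0
for _ in range(100):
    # Supervisor: switch once the state enters the next controller's DOA.
    if active + 1 < len(plans) and plans[active + 1]["doa"](x):
        active += 1
    x = x + plans[active]["u"](x)     # toy integrator dynamics: x+ = x + u
```

The learning module in the paper plays the role of synthesizing a new entry for this list online whenever the existing DOAs fail to cover the situation at hand.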

Journal ArticleDOI
TL;DR: A nominal control law is used to achieve reasonable, yet suboptimal, performance, and an RL agent is trained to act as a nonlinear compensator whose task is to improve upon the performance of the nominal controller.

Journal ArticleDOI
TL;DR: This paper presents a sample-efficient, learning-rate-free version of the Value-Gradient Based Policy (VGBP) algorithm, which selects control actions by optimizing over the right-hand side of the Bellman equation.

Posted Content
TL;DR: This article provides a novel approximate computational scheme for the reach-avoid specification based on the Fitted Value Iteration algorithm, and gives a priori computable formal probabilistic bounds on the error made by the approximation algorithm: the output of the numerical scheme is thus quantitatively assessed and meaningful for safety-critical applications.
Abstract: This article deals with stochastic processes endowed with the Markov (memoryless) property and evolving over general (uncountable) state spaces. The models further depend on a non-deterministic quantity in the form of a control input, which can be selected to affect the probabilistic dynamics. We address the computation of maximal reach-avoid specifications, together with the synthesis of the corresponding optimal controllers. The reach-avoid specification deals with assessing the likelihood that any finite-horizon trajectory of the model enters a given goal set, while avoiding a given set of undesired states. This article provides a novel approximate computational scheme for the reach-avoid specification based on the Fitted Value Iteration algorithm, which hinges on random sample extractions, and gives a priori computable formal probabilistic bounds on the error made by the approximation algorithm: as such, the output of the numerical scheme is quantitatively assessed and thus meaningful for safety-critical applications. Furthermore, we provide tighter probabilistic error bounds that are sample-based. The overall computational scheme is put in relationship with alternative approximation algorithms in the literature, and finally its performance is practically assessed over a benchmark case study.
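The Fitted Value Iteration backup for the reach-avoid probability can be illustrated on a toy 1-D problem. Everything below — the random-walk dynamics, the goal set [0.8, 1], the safe set [0, 1], the grid, and the sample counts — is an assumption made for the sketch; the article itself treats general uncountable-state models and, crucially, derives the formal error bounds that this demo omits.

```python
import random

random.seed(1)
GRID = [i / 20 for i in range(21)]    # approximation grid over the safe set [0, 1]
ACTIONS = [-0.1, 0.0, 0.1]            # control inputs; dynamics: x+ = x + u + noise
HORIZON, SAMPLES = 10, 30

def in_goal(x):
    return 0.8 <= x <= 1.0

def in_safe(x):
    return 0.0 <= x <= 1.0

def interp(V, x):
    """Piecewise-linear fit of the value function on the grid."""
    i = min(int(x * 20), 19)
    t = x * 20 - i
    return (1 - t) * V[i] + t * V[i + 1]

def sample_value(V, x):
    """Reach-avoid semantics: leaving the safe set scores 0, the goal scores 1."""
    if not in_safe(x):
        return 0.0
    if in_goal(x):
        return 1.0
    return interp(V, x)

V = [1.0 if in_goal(x) else 0.0 for x in GRID]   # V_N = indicator of the goal set
for _ in range(HORIZON):
    # Sample-based backup: V(x) = max_u E[V(x + u + noise)] off the goal set.
    V = [1.0 if in_goal(x) else
         max(sum(sample_value(V, x + u + random.gauss(0, 0.05))
                 for _ in range(SAMPLES)) / SAMPLES
             for u in ACTIONS)
         for x in GRID]
```

The resulting V approximates, at each grid point, the maximal probability of reaching the goal set within the horizon while staying safe; the greedy action in the backup is the corresponding controller.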

Journal ArticleDOI
TL;DR: In this paper, the authors mitigate the difficulties of algebraic IDA-PBC by using reinforcement learning and demonstrate the usefulness of the proposed learning algorithm both by simulations and through experimental validation.

Proceedings ArticleDOI
20 Nov 2014
TL;DR: This paper considers using mobile sensors (unmanned aerial vehicles) to monitor the traffic situation in a traffic network by finding optimal paths for mobile sensors such that the target links in the traffic network are covered.
Abstract: In this paper, the authors consider using mobile sensors (unmanned aerial vehicles) to monitor the traffic situation in a traffic network. They aim at finding optimal paths for mobile sensors such that the target links in the traffic network are covered; in addition, they also aim at minimizing energy consumption of mobile sensors. This problem is recast as a multiple rural postman problem. In order to solve this problem, the authors subsequently translate it into a multiple traveling salesman problem, by mapping the real traffic network into a virtual network, and then solve it by using mixed-integer linear programming. A simulation-based case study is used to illustrate their approach.
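To make the objective concrete, here is a brute-force toy version of the coverage problem. The depot, target points, and the min-max "longest flight" energy proxy are all assumptions for this sketch; the paper instead recasts the problem as a multiple rural postman problem and solves an equivalent multiple traveling salesman problem with mixed-integer linear programming, which scales far better than enumeration.

```python
from itertools import permutations

DEPOT = (0.0, 0.0)
TARGETS = [(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 2.0)]  # links to cover

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def tour_length(points):
    """Closed tour: depot -> points in order -> depot (empty tour costs 0)."""
    if not points:
        return 0.0
    legs = [DEPOT, *points, DEPOT]
    return sum(dist(a, b) for a, b in zip(legs, legs[1:]))

best_cost, best_split = None, None
for order in permutations(TARGETS):
    for k in range(len(order) + 1):           # first UAV flies order[:k]
        # Min-max objective: minimize the longest flight (energy proxy).
        cost = max(tour_length(order[:k]), tour_length(order[k:]))
        if best_cost is None or cost < best_cost:
            best_cost, best_split = cost, (order[:k], order[k:])
```

For the four targets above, the optimum assigns the x-axis targets to one UAV and the y-axis targets to the other, giving each a closed tour of length 4.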

Journal ArticleDOI
TL;DR: A novel Convex SPF is derived that extends the method to multidimensional systems with convex constraints and is demonstrated using an illustrative example.

01 Jan 2014
TL;DR: The authors apply the proposed co-optimization approach to jointly optimize the traffic network topology and the traffic controllers on the network, and show that this method can improve the interaction between topology design and traffic control design, resulting in a better overall performance.
Abstract: In this paper, the authors introduce a co-optimization approach to jointly optimize the traffic network topology and the traffic controllers on the network. Since network topology design has a long-term goal, while traffic control is usually based on the minute-by-minute traffic situation, the authors use a so-called parameterized traffic control method to bring the two to the same time scale. They apply the proposed co-optimization approach in a simulation-based case study and compare it with a pure topology design and an iterative optimization method. The results show that this method can improve the interaction between topology design and traffic control design, resulting in a better overall performance.

Proceedings Article
18 Nov 2014
TL;DR: It is demonstrated that a humanoid robot NAO is able to learn how to manipulate garbage cans with different lids by using different motor skills.
Abstract: Learning object affordances and manipulation skills is essential for developing cognitive service robots. We propose an active affordance learning approach in continuous state and action spaces without manual discretization of states or exploratory motor primitives. During exploration in the action space, the robot learns a forward model to predict action effects. It simultaneously updates the active exploration policy through reinforcement learning, whereby the prediction error serves as the intrinsic reward. By using the learned forward model, motor skills are obtained in a bottom-up manner to achieve goal states of an object. We demonstrate that a humanoid robot NAO is able to learn how to manipulate garbage cans with different lids by using different motor skills.

Journal ArticleDOI
TL;DR: Conditions for the distributed stability analysis of Takagi–Sugeno fuzzy systems connected in a string are proposed, extended to observer and controller design, and illustrated on numerical examples.
Abstract: Distributed systems consist of interconnected, lower-dimensional subsystems. For such systems, distributed analysis and design present several advantages, such as modularity, easier analysis and design, and reduced computational complexity. A special case of distributed systems is when the subsystems are connected in a string. Applications include distributed process control, traffic and communication networks, irrigation systems, hydropower valleys, etc. By exploiting such a structure, in this paper, we propose conditions for the distributed stability analysis of Takagi–Sugeno fuzzy systems connected in a string. These conditions are also extended to observer and controller design and illustrated on numerical examples.