
Showing papers by "Robert Babuska published in 2014"


Journal ArticleDOI
TL;DR: A novel reinforcement learning value iteration algorithm is presented to solve dynamic graphical games in an online manner, along with its proof of convergence; it is further proved that the underlying equilibrium notion holds if all agents are in Nash equilibrium and the graph is strongly connected.

179 citations


Journal ArticleDOI
TL;DR: The learning algorithm is shown to be capable of achieving feedback regulation in the presence of model uncertainties, and experimental results for a two-degree-of-freedom manipulator are presented.

28 citations


Journal ArticleDOI
TL;DR: Two nonparametric filters are investigated: the benchmark Bootstrap Particle Filter (BPF) and the recently developed Feedback Particle Filter (FPF); the comparison concludes that the FPF outperforms the benchmark method in both accuracy and numerical efficiency.

28 citations


Journal ArticleDOI
TL;DR: The framework presented in this paper relies on a compact representation of the gait space, provides guarantees regarding the transient and steady-state behavior, and results in simple implementations on legged robotic platforms.
Abstract: We present a gait generation framework for multilegged robots based on max-plus algebra that is endowed with intrinsically safe gait transitions. The time schedule of each foot liftoff and touchdown is modeled by sets of max-plus linear equations. The resulting discrete-event system is translated to continuous time via piecewise constant leg phase velocities; thus, it is compatible with traditional central pattern generator approaches. Different gaits and gait parameters are interleaved by utilizing different max-plus system matrices. We present various gait transition schemes and show that optimal transitions, in the sense of minimizing the stance time variation, allow for constant acceleration and deceleration on legged platforms. The framework presented in this paper relies on a compact representation of the gait space, provides guarantees regarding the transient and steady-state behavior, and results in simple implementations on legged robotic platforms.

26 citations
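The max-plus scheduling idea above can be made concrete in a few lines. The sketch below uses a toy single-leg model with assumed swing and stance durations (the paper's actual multi-legged system matrices are more elaborate): in max-plus algebra, "addition" is max and "multiplication" is +, so event times evolve as x(k+1) = A ⊗ x(k).

```python
NEG_INF = float("-inf")  # the max-plus zero element

def maxplus_matvec(A, x):
    """Max-plus matrix-vector product: y_i = max_j (A[i][j] + x[j])."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]

t_swing, t_stance = 0.2, 0.8          # assumed swing/stance durations (s)
# State x = [liftoff time, touchdown time] of the current gait cycle.
A = [
    [NEG_INF, t_stance],              # next liftoff: t_stance after touchdown
    [NEG_INF, t_stance + t_swing],    # next touchdown: t_swing after that liftoff
]

x = [0.0, t_swing]                    # cycle 0: liftoff at 0, touchdown at 0.2
schedule = [x]
for _ in range(3):
    x = maxplus_matvec(A, x)
    schedule.append(x)
```

Successive event times repeat with period t_swing + t_stance, the max-plus eigenvalue of A; changing gaits then amounts to swapping in a different system matrix, as the abstract describes.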


Journal ArticleDOI
TL;DR: In this article, a Natural Actor-Critic (NAC) reinforcement learning algorithm was used to learn the hitting motion of a badminton robot during a serve, where the goal is to reach the target state as quickly as possible without violating the limitations of the actuator.

23 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a nonlinear control approach for balancing underactuated legged robots based on the State-Dependent Riccati Equation (SDRE) approach.

19 citations


Proceedings ArticleDOI
01 Nov 2014
TL;DR: This paper demonstrates through real-world experiments that a humanoid robot NAO is able to autonomously learn how to manipulate two types of garbage cans with lids that need to be opened and closed by different motor skills.
Abstract: Learning to perform household tasks is a key step towards developing cognitive service robots. This requires that robots are capable of discovering how to use human-designed products. In this paper, we propose an active learning approach for acquiring object affordances and manipulation skills in a bottom-up manner. We address affordance learning in continuous state and action spaces without manual discretization of states or exploratory motor primitives. During exploration in the action space, the robot learns a forward model to predict action effects. It simultaneously updates the active exploration policy through reinforcement learning, whereby the prediction error serves as the intrinsic reward. By using the learned forward model, motor skills are obtained to achieve goal states of an object. We demonstrate through real-world experiments that a humanoid robot NAO is able to autonomously learn how to manipulate two types of garbage cans with lids that need to be opened and closed by different motor skills.

19 citations
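The exploration loop described in the abstract can be sketched in a few lines. The toy below replaces the robot's continuous-space setting with a 1-D stand-in whose dynamics, action set, and learning rates are all invented for illustration: a linear forward model predicts action effects, and the prediction error serves both as the intrinsic reward steering exploration and as the learning signal for the model.

```python
def true_effect(a):
    """Unknown environment dynamics (assumed for the demo)."""
    return 2.0 * a + 1.0

model = {"w": 0.0, "b": 0.0}          # linear forward model: effect ~ w*a + b
actions = [0.0, 0.5, 1.0]             # toy action set
bonus = {a: 1.0 for a in actions}     # intrinsic value (last prediction error)

for _ in range(200):
    # Explore the action whose effect is currently predicted worst.
    a = max(actions, key=lambda act: bonus[act])
    prediction = model["w"] * a + model["b"]
    err = true_effect(a) - prediction
    bonus[a] = abs(err)               # intrinsic reward = prediction error
    # Gradient (LMS) step on the forward model.
    model["w"] += 0.1 * err * a
    model["b"] += 0.1 * err
```

Once the forward model is accurate, it can be searched to select actions that reach a desired goal state, which is how motor skills are obtained in the paper.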


Journal ArticleDOI
TL;DR: It is shown that the use of a quadratic reward function in on-line RL may lead to counter-intuitive results, namely a large steady-state error, which is not acceptable from a control-theoretic point of view.

16 citations


Journal ArticleDOI
TL;DR: Inspired by human kinematics, a detailed procedure is proposed for the WRT and RWT in an adjustable-stiffness spring-mass model, and control parameters are derived that ensure effective gait transitions.

13 citations


Proceedings ArticleDOI
15 Dec 2014
TL;DR: This paper proposes transfer learning of affordances to reduce the number of exploratory actions needed to learn how to use a new object and demonstrates through real-world experiments with the humanoid robot NAO that this method is able to speed up the use of a new type of garbage can by transferring the affordances learned previously for similar garbage cans.
Abstract: Learning how to use functional objects is essential for robots that are to carry out household tasks. However, learning every object from scratch would be a very naive and time-consuming approach. In this paper, we propose transfer learning of affordances to reduce the number of exploratory actions needed to learn how to use a new object. Through embodied interaction with the object, the robot discovers the object's similarity to previously learned objects by comparing their shape features and spatial relations between object parts. The robot actively selects object parts along with parameterized actions and evaluates the effects on-line. We demonstrate through real-world experiments with the humanoid robot NAO that our method is able to speed up the use of a new type of garbage can by transferring the affordances learned previously for similar garbage cans.

11 citations


Proceedings ArticleDOI
01 Dec 2014
TL;DR: The results show that the learning module can rapidly augment the designed sequential composition with new control policies, such that the supervisor can handle unforeseen situations online.
Abstract: Sequential composition is an effective approach to address the control of complex dynamical systems. However, it is not designed to cope with unforeseen situations that might occur during runtime. This paper extends sequential composition control via learning new policies. A learning module based on reinforcement learning is added to the traditional sequential composition that allows for the online creation of new control policies in a short amount of time, on an as-needed basis. During learning, the domain of attraction (DOA) of the new control policy is continuously monitored. Hence, the learning process only executes until the supervisor is able to compose the new control policy with the designed controllers via the overlap of DOAs. Estimating the DOAs of the learned controllers is achieved by solving an optimization problem. The proposed strategy has been simulated on a nonlinear system. The results show that the learning module can rapidly augment the designed sequential composition with new control policies such that the supervisor can handle unforeseen situations online.
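A minimal sketch of the composition mechanism (with invented scalar dynamics, gains, and DOAs; the paper estimates the DOAs by solving an optimization problem) looks as follows: a supervisor holds a sequence of controllers, each with an estimated domain of attraction, and hands over the state as soon as it enters the DOA of the next controller.

```python
def controller(gain, target):
    """Simple proportional state feedback toward a target equilibrium."""
    return lambda x: gain * (target - x)

# Two hypothetical controllers whose DOAs overlap on (1, 3):
# A stabilizes x = 2 on (1, 5], B stabilizes x = 0 on [-1, 3).
plans = [
    {"u": controller(0.5, 2.0), "doa": lambda x: 1.0 < x <= 5.0},
    {"u": controller(0.5, 0.0), "doa": lambda x: -1.0 <= x < 3.0},
]

x, active = 4.0, 0
for _ in range(100):
    # Supervisor: switch once the state enters the next controller's DOA.
    if active + 1 < len(plans) and plans[active + 1]["doa"](x):
        active += 1
    x = x + plans[active]["u"](x)     # toy integrator dynamics: x+ = x + u
```

The learning module in the paper plays the role of synthesizing a new entry for this list online whenever the existing DOAs fail to cover the situation at hand.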

Journal ArticleDOI
TL;DR: A nominal control law is used to achieve reasonable, yet suboptimal, performance, and an RL agent is trained to act as a nonlinear compensator whose task is to improve upon the performance of the nominal controller.

Journal ArticleDOI
TL;DR: This paper presents a sample-efficient, learning-rate-free version of the Value-Gradient Based Policy (VGBP) algorithm, which selects control actions by optimizing over the right-hand side of the Bellman equation.

Posted Content
TL;DR: This article provides a novel approximate computational scheme for the reach-avoid specification based on the Fitted Value Iteration algorithm, and gives a priori computable formal probabilistic bounds on the error made by the approximation algorithm: the output of the numerical scheme is thus quantitatively assessed and meaningful for safety-critical applications.
Abstract: This article deals with stochastic processes endowed with the Markov (memoryless) property and evolving over general (uncountable) state spaces. The models further depend on a non-deterministic quantity in the form of a control input, which can be selected to affect the probabilistic dynamics. We address the computation of maximal reach-avoid specifications, together with the synthesis of the corresponding optimal controllers. The reach-avoid specification deals with assessing the likelihood that any finite-horizon trajectory of the model enters a given goal set, while avoiding a given set of undesired states. This article provides a novel approximate computational scheme for the reach-avoid specification based on the Fitted Value Iteration algorithm, which hinges on random sample extractions, and gives a priori computable formal probabilistic bounds on the error made by the approximation algorithm: as such, the output of the numerical scheme is quantitatively assessed and thus meaningful for safety-critical applications. Furthermore, we provide tighter probabilistic error bounds that are sample-based. The overall computational scheme is put in relationship with alternative approximation algorithms in the literature, and finally its performance is practically assessed over a benchmark case study.
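The Fitted Value Iteration backup for the reach-avoid probability can be illustrated on a toy 1-D problem. Everything below — the random-walk dynamics, the goal set [0.8, 1], the safe set [0, 1], the grid, and the sample counts — is an assumption made for the sketch; the article itself treats general uncountable-state models and, crucially, derives the formal error bounds that this demo omits.

```python
import random

random.seed(1)
GRID = [i / 20 for i in range(21)]    # approximation grid over the safe set [0, 1]
ACTIONS = [-0.1, 0.0, 0.1]            # control inputs; dynamics: x+ = x + u + noise
HORIZON, SAMPLES = 10, 30

def in_goal(x):
    return 0.8 <= x <= 1.0

def in_safe(x):
    return 0.0 <= x <= 1.0

def interp(V, x):
    """Piecewise-linear fit of the value function on the grid."""
    i = min(int(x * 20), 19)
    t = x * 20 - i
    return (1 - t) * V[i] + t * V[i + 1]

def sample_value(V, x):
    """Reach-avoid semantics: leaving the safe set scores 0, the goal scores 1."""
    if not in_safe(x):
        return 0.0
    if in_goal(x):
        return 1.0
    return interp(V, x)

V = [1.0 if in_goal(x) else 0.0 for x in GRID]   # V_N = indicator of the goal set
for _ in range(HORIZON):
    # Sample-based backup: V(x) = max_u E[V(x + u + noise)] off the goal set.
    V = [1.0 if in_goal(x) else
         max(sum(sample_value(V, x + u + random.gauss(0, 0.05))
                 for _ in range(SAMPLES)) / SAMPLES
             for u in ACTIONS)
         for x in GRID]
```

The resulting V approximates, at each grid point, the maximal probability of reaching the goal set within the horizon while staying safe; the greedy action in the backup is the corresponding controller.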

Journal ArticleDOI
TL;DR: In this paper, the authors mitigate the difficulties of algebraic IDA-PBC by using reinforcement learning and demonstrate the usefulness of the proposed learning algorithm both by simulations and through experimental validation.

Proceedings ArticleDOI
20 Nov 2014
TL;DR: This paper considers using mobile sensors (unmanned aerial vehicles) to monitor the traffic situation in a traffic network by finding optimal paths for mobile sensors such that the target links in the traffic network are covered.
Abstract: In this paper, the authors consider using mobile sensors (unmanned aerial vehicles) to monitor the traffic situation in a traffic network. They aim at finding optimal paths for mobile sensors such that the target links in the traffic network are covered; in addition, they also aim at minimizing energy consumption of mobile sensors. This problem is recast as a multiple rural postman problem. In order to solve this problem, the authors subsequently translate it into a multiple traveling salesman problem, by mapping the real traffic network into a virtual network, and then solve it by using mixed-integer linear programming. A simulation-based case study is used to illustrate their approach.
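To make the objective concrete, here is a brute-force toy version of the coverage problem. The depot, target points, and the min-max "longest flight" energy proxy are all assumptions for this sketch; the paper instead recasts the problem as a multiple rural postman problem and solves an equivalent multiple traveling salesman problem with mixed-integer linear programming, which scales far better than enumeration.

```python
from itertools import permutations

DEPOT = (0.0, 0.0)
TARGETS = [(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 2.0)]  # links to cover

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def tour_length(points):
    """Closed tour: depot -> points in order -> depot (empty tour costs 0)."""
    if not points:
        return 0.0
    legs = [DEPOT, *points, DEPOT]
    return sum(dist(a, b) for a, b in zip(legs, legs[1:]))

best_cost, best_split = None, None
for order in permutations(TARGETS):
    for k in range(len(order) + 1):           # first UAV flies order[:k]
        # Min-max objective: minimize the longest flight (energy proxy).
        cost = max(tour_length(order[:k]), tour_length(order[k:]))
        if best_cost is None or cost < best_cost:
            best_cost, best_split = cost, (order[:k], order[k:])
```

For the four targets above, the optimum assigns the x-axis targets to one UAV and the y-axis targets to the other, giving each a closed tour of length 4.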

Journal ArticleDOI
TL;DR: A novel Convex SPF is derived that extends the method to multidimensional systems with convex constraints and is demonstrated using an illustrative example.

01 Jan 2014
TL;DR: The authors apply the proposed co-optimization approach to jointly optimize the traffic network topology and the traffic controllers on the network, and show that this method can improve the interaction between topology design and traffic control design, resulting in a better overall performance.
Abstract: In this paper, the authors introduce a co-optimization approach to jointly optimize the traffic network topology and the traffic controllers on the network. Since network topology design has a long-term goal, while traffic control is usually based on the minute-by-minute traffic situation, the authors use a so-called parameterized traffic control method to bring the two to the same time scale. They apply the proposed co-optimization approach in a simulation-based case study and compare it with a pure topology design and an iterative optimization method. The results show that this method can improve the interaction between topology design and traffic control design, resulting in a better overall performance.

Proceedings Article
18 Nov 2014
TL;DR: It is demonstrated that a humanoid robot NAO is able to learn how to manipulate garbage cans with different lids by using different motor skills.
Abstract: Learning object affordances and manipulation skills is essential for developing cognitive service robots. We propose an active affordance learning approach in continuous state and action spaces without manual discretization of states or exploratory motor primitives. During exploration in the action space, the robot learns a forward model to predict action effects. It simultaneously updates the active exploration policy through reinforcement learning, whereby the prediction error serves as the intrinsic reward. By using the learned forward model, motor skills are obtained in a bottom-up manner to achieve goal states of an object. We demonstrate that a humanoid robot NAO is able to learn how to manipulate garbage cans with different lids by using different motor skills.

Journal ArticleDOI
TL;DR: Conditions for the distributed stability analysis of Takagi–Sugeno fuzzy systems connected in a string are proposed, extended to observer and controller design, and illustrated on numerical examples.
Abstract: Distributed systems consist of interconnected, lower-dimensional subsystems. For such systems, distributed analysis and design present several advantages, such as modularity, easier analysis and design, and reduced computational complexity. A special case of distributed systems is when the subsystems are connected in a string. Applications include distributed process control, traffic and communication networks, irrigation systems, hydropower valleys, etc. By exploiting such a structure, in this paper, we propose conditions for the distributed stability analysis of Takagi–Sugeno fuzzy systems connected in a string. These conditions are also extended to observer and controller design and illustrated on numerical examples.