
Showing papers by "Robert Babuska" published in 2011


Journal ArticleDOI
01 Feb 2011
TL;DR: An algorithm for direct search of control policies in continuous-state, discrete-action Markov decision processes, which requires vastly fewer basis functions (BFs) than value-function techniques with equidistant BFs and outperforms policy search with a competing optimization algorithm called DIRECT.
Abstract: This paper introduces an algorithm for direct search of control policies in continuous-state discrete-action Markov decision processes. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions (BFs), where a discrete action is assigned to each BF. The type of the BFs and their number are specified in advance and determine the complexity of the representation. Considerable flexibility is achieved by optimizing the locations and shapes of the BFs, together with the action assignments. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. The return for each representative state is estimated using Monte Carlo simulations. The resulting algorithm for cross-entropy policy search with adaptive BFs is extensively evaluated in problems with two to six state variables, for which it reliably obtains good policies with only a small number of BFs. In these experiments, cross-entropy policy search requires vastly fewer BFs than value-function techniques with equidistant BFs, and outperforms policy search with a competing optimization algorithm called DIRECT.

81 citations
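
As a rough illustration of the cross-entropy method mentioned in the abstract above, the sketch below optimizes a toy one-dimensional score function. It is a generic cross-entropy optimizer under assumed settings (Gaussian sampling density, the function name `cross_entropy_maximize`, and all parameter values are hypothetical); it is not the paper's policy search with adaptive BFs, where the score would instead be an empirical return estimated by Monte Carlo simulation.

```python
import random
import statistics


def cross_entropy_maximize(score, mean=0.0, std=5.0, n_samples=100,
                           n_elite=10, n_iters=30, seed=0):
    """Maximize score(x) over real x using a Gaussian sampling density."""
    rng = random.Random(seed)
    for _ in range(n_iters):
        # Draw candidate solutions from the current sampling density.
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        # Keep the best-scoring candidates (the "elite" samples).
        elite = sorted(samples, key=score, reverse=True)[:n_elite]
        # Refit the density to the elite samples; the small constant
        # keeps the standard deviation strictly positive.
        mean = statistics.fmean(elite)
        std = statistics.pstdev(elite) + 1e-6
    return mean


# Toy score with its maximum at x = 2 (an assumption for illustration).
best = cross_entropy_maximize(lambda x: -(x - 2.0) ** 2)
```

In a policy-search setting, each sample would encode a full policy parameterization rather than a single number, and evaluating `score` would dominate the cost.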


Proceedings ArticleDOI
24 Oct 2011
TL;DR: This paper surveys observers for first-order and second-order linear distributed-parameter systems based on their infinite-dimensional and finite-dimensional descriptions.
Abstract: This paper reviews different observer design methods for linear dynamic distributed-parameter systems. In such systems, the states, inputs, and outputs depend on some spatial variable. This dependence, along with additional aspects such as the boundary conditions, increases the complexity of the state estimation problem and of the design methods. The paper in particular surveys observers for first-order and second-order linear distributed-parameter systems based on their infinite-dimensional and finite-dimensional descriptions.

74 citations


Proceedings ArticleDOI
11 Apr 2011
TL;DR: An overview of methods for approximate RL, starting from their dynamic programming roots and organizing them into three major classes: approximate value iteration, policy iteration, and policy search, which compares the different categories of methods and outlines possible ways to enhance the reviewed algorithms.
Abstract: Reinforcement learning (RL) allows agents to learn how to optimally interact with complex environments. Fueled by recent advances in approximation-based algorithms, RL has obtained impressive successes in robotics, artificial intelligence, control, operations research, etc. However, the scarcity of survey papers about approximate RL makes it difficult for newcomers to grasp this intricate field. With the present overview, we take a step toward alleviating this situation. We review methods for approximate RL, starting from their dynamic programming roots and organizing them into three major classes: approximate value iteration, policy iteration, and policy search. Each class is subdivided into representative categories, highlighting among others offline and online algorithms, policy gradient methods, and simulation-based techniques. We also compare the different categories of methods, and outline possible ways to enhance the reviewed algorithms.

59 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed an optimization-based method to design the input actuation waveform for the piezo actuator in order to improve the damping of the residual oscillations.

58 citations


Proceedings ArticleDOI
18 Aug 2011
TL;DR: In this article, the authors consider the problem of maximizing the algebraic connectivity of the communication graph in a network of mobile robots by moving them into appropriate positions and formulate an approximate problem as a Semi-Definite Program (SDP).
Abstract: We consider the problem of maximizing the algebraic connectivity of the communication graph in a network of mobile robots by moving them into appropriate positions. We describe the Laplacian of the graph as dependent on the pairwise distance between the robots and formulate an approximate problem as a Semi-Definite Program (SDP). We propose a consistent, non-iterative distributed solution by solving local SDPs which use information only from nearby neighboring robots. Numerical simulations show the performance of the algorithm with respect to the centralized solution.

25 citations
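
The algebraic connectivity referred to in the abstract above is the second-smallest eigenvalue of the graph Laplacian. The sketch below computes it for a small unweighted graph by power iteration on a shifted Laplacian restricted to vectors orthogonal to the all-ones vector. The function name, the unweighted edge model, and the iteration count are assumptions for illustration; the paper's distance-dependent Laplacian and its SDP formulation are not reproduced here.

```python
import math
import random


def algebraic_connectivity(n, edges, iters=500, seed=0):
    """Second-smallest Laplacian eigenvalue of an undirected graph (n >= 2)."""
    # Build the Laplacian L = D - A as a dense matrix.
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    # Shift strictly above the largest eigenvalue (at most twice the max degree).
    c = 2.0 * max(L[i][i] for i in range(n)) + 1.0

    def project_and_normalize(v):
        # Remove the component along the all-ones vector, then normalize.
        mean = sum(v) / n
        v = [x - mean for x in v]
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    rng = random.Random(seed)
    v = project_and_normalize([rng.random() for _ in range(n)])
    for _ in range(iters):
        # Apply M = c*I - L; on the projected subspace its dominant
        # eigenvalue is c - lambda_2.
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = project_and_normalize(w)
    # Rayleigh quotient of the unit vector v with respect to L.
    return sum(v[i] * sum(L[i][j] * v[j] for j in range(n)) for i in range(n))


# Four robots in a line versus the same robots with one extra link closing a ring.
line = algebraic_connectivity(4, [(0, 1), (1, 2), (2, 3)])
ring = algebraic_connectivity(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

Closing the ring raises the value, which is the kind of improvement that repositioning robots to create or strengthen links aims for.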


Proceedings ArticleDOI
11 Apr 2011
TL;DR: An online planning algorithm is proposed for finite-action, sparsely stochastic Markov decision processes, in which the random state transitions can only end up in a small number of possible next states. The algorithm obtains promising numerical results, including the successful online control of a simulated HIV infection with stochastic drug effectiveness.
Abstract: We propose an online planning algorithm for finite-action, sparsely stochastic Markov decision processes, in which the random state transitions can only end up in a small number of possible next states. The algorithm builds a planning tree by iteratively expanding states, where each expansion exploits sparsity to add all possible successor states. Each state to expand is actively chosen to improve the knowledge about action quality, and this allows the algorithm to return a good action after a strictly limited number of expansions. More specifically, the active selection method is optimistic in that it chooses the most promising states first, so the novel algorithm is called optimistic planning for sparsely stochastic systems. We note that the new algorithm can also be seen as model-predictive (receding-horizon) control. The algorithm obtains promising numerical results, including the successful online control of a simulated HIV infection with stochastic drug effectiveness.

23 citations


Proceedings ArticleDOI
01 Sep 2011
TL;DR: This paper compares four methods for decentralized Kalman filtering for distributed-parameter systems, which, after spatial and temporal discretization, result in large-scale linear discrete-time systems.
Abstract: In this paper we compare four methods for decentralized Kalman filtering for distributed-parameter systems, which, after spatial and temporal discretization, result in large-scale linear discrete-time systems. These methods are: parallel information filter, distributed information filter, distributed Kalman filter with consensus filter, and distributed Kalman filter with weighted averaging. These filters are suitable for sensor networks, where the sensor nodes perform not only sensing and computations, but also communicate estimates among each other. We consider an application of sensor networks to a heat conduction process. The performance of the decentralized filters is evaluated and compared to the centralized Kalman filter.

21 citations
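
One reason information filters suit sensor networks is that, in information form, the measurement update is additive across sensors with independent noise. The scalar sketch below illustrates that property by comparing a summed information update against sequential Kalman updates. It is a minimal example under assumed values (scalar state, hypothetical function names and measurement tuples) and does not implement any of the four decentralized filters compared in the paper.

```python
def kalman_update(x, P, z, H, R):
    """Standard scalar Kalman measurement update for z = H*x + noise(R)."""
    K = P * H / (H * P * H + R)
    return x + K * (z - H * x), (1.0 - K * H) * P


def information_update(x, P, measurements):
    """Fuse several independent scalar measurements in information form."""
    Y = 1.0 / P   # information (inverse covariance)
    y = Y * x     # information state
    for z, H, R in measurements:
        # Each sensor adds its own contribution; the order does not matter.
        Y += H * H / R
        y += H * z / R
    return y / Y, 1.0 / Y


# Assumed prior and two sensor readings given as (z, H, R).
prior_x, prior_P = 0.0, 1.0
readings = [(1.0, 1.0, 1.0), (3.0, 1.0, 0.5)]

x_info, P_info = information_update(prior_x, prior_P, readings)

x_seq, P_seq = prior_x, prior_P
for z, H, R in readings:
    x_seq, P_seq = kalman_update(x_seq, P_seq, z, H, R)
```

Both routes give the same posterior estimate and variance; the information form simply makes each node's contribution a term in a sum.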


Journal ArticleDOI
TL;DR: This paper proposes sequential stability analysis and observer design for distributed systems where the subsystems are represented by Takagi-Sugeno (TS) fuzzy models, allowing for the online addition of new subsystems.

16 citations


Proceedings ArticleDOI
18 Nov 2011
TL;DR: This paper considers the DTR problem for a traffic network defined as a directed graph, and deals with the mathematical aspects of the resulting optimization problem from the viewpoint of network flow theory.
Abstract: Dynamic traffic routing (DTR) refers to the process of (re)directing traffic at junctions in a traffic network corresponding to the evolving traffic conditions as time progresses. This paper considers the DTR problem for a traffic network defined as a directed graph, and deals with the mathematical aspects of the resulting optimization problem from the viewpoint of network flow theory. Traffic networks may have thousands of links and nodes, resulting in a sizable and computationally complex nonlinear, non-convex DTR optimization problem. To solve this problem Ant Colony Optimization (ACO) is chosen as the optimization method in this paper because of its powerful optimization heuristic for combinatorial optimization problems. However, the standard ACO algorithm is not capable of solving the routing optimization problem aimed at the system optimum, and therefore a new ACO algorithm is developed to achieve the goal of finding the optimal distribution of traffic flows in the network.

16 citations


Proceedings ArticleDOI
01 Dec 2011
TL;DR: This paper compares indirect adaptive fuzzy control and sliding-mode control in a robot manipulator application that performs pick-and-place tasks with unknown and variable payloads, and finds that the sliding-mode controller obtains very good steady performance, while the adaptive fuzzy controller eventually yields a smaller steady-state error.
Abstract: In this paper, we compare indirect adaptive fuzzy control and sliding-mode control in a robot manipulator application. The manipulator performs pick-and-place tasks with unknown and variable payloads. The change of payload causes large variations in the dynamics of the robot. The sliding-mode controller deals with the payload change through its inherent robustness, while the adaptive fuzzy control algorithm adjusts the controller's parameters on-line. The control methods are compared both in numerical simulations and in real-time experiments. The sliding mode controller obtains a very good steady performance. However, thanks to the continuing adaptation, the adaptive fuzzy controller eventually yields smaller steady-state error.

13 citations


Journal ArticleDOI
TL;DR: The novel method and a standard actor-critic algorithm are applied to the pendulum swingup problem, in which the novel method achieves faster learning than the standard algorithm.

Proceedings ArticleDOI
12 Dec 2011
TL;DR: This paper proposes a robust optimization-based method to design the input actuation waveform for the piezo actuator in order to improve the damping of the residual oscillations in the presence of parametric uncertainties in the ink-channel model.
Abstract: The printing quality delivered by a Drop-on-Demand (DoD) inkjet printhead is mainly limited by the residual oscillations in the ink channel. The maximal jetting frequency of a DoD inkjet printhead can be increased by quickly damping the residual oscillations, thereby bringing the ink channel to rest after jetting the ink drop. The inkjet channel model obtained is generally subject to parametric uncertainty. This paper proposes a robust optimization-based method to design the input actuation waveform for the piezo actuator in order to improve the damping of the residual oscillations in the presence of parametric uncertainties in the ink-channel model. Simulation results are presented to show the efficacy of the proposed method.

Journal ArticleDOI
TL;DR: In this article, a robust optimization-based method is proposed to design the input actuation waveform for the piezo actuator in order to improve the damping of the residual oscillations in the presence of parametric uncertainties in the ink-channel model.

Proceedings ArticleDOI
18 Aug 2011
TL;DR: This paper proposes the Saturated Particle Filter algorithm which incorporates the measurements into the importance sampling procedure through the detection function, and achieves better performance than the standard Constrained SIR filter, while it preserves low computational complexity.
Abstract: In many practical applications the state variables are defined on a compact set of the state space. For estimating such variables constrained particle filters have been successfully applied to nonlinear systems. For the saturated system the measurement information can be used during the sampling procedure to obtain particles that approximate the true state of the system. This can be achieved by using a detection function, which detects the saturation as it occurs. In this paper we propose the Saturated Particle Filter algorithm which incorporates the measurements into the importance sampling procedure through the detection function. The new filter is applied to the Lindley-type stochastic process, where the stochastic process depends on an exogenous parameter. This parameter changes during the simulation. Furthermore, the system is corrupted with high measurement noise. The simulations show that our new filter achieves better performance than the standard Constrained SIR filter, while it preserves low computational complexity.

Proceedings ArticleDOI
05 Dec 2011
TL;DR: This paper presents an intrinsically safe gait switching generator that minimizes the velocity variance of all the legs in stance, allowing for smooth acceleration in legged robots.
Abstract: Switching gaits in many-legged robots can present challenges due to the combinatorial nature of the gait space. In this paper we present an intrinsically safe gait switching generator that minimizes the velocity variance of all the legs in stance, allowing for smooth acceleration in legged robots. The gait switching generator is modeled as a max-plus linear discrete event system which is translated to continuous time via a reference trajectory generator.

Proceedings ArticleDOI
01 Dec 2011
TL;DR: For a class of concurrent two-state cyclic systems, with direct application to legged locomotion, closed-form expressions for the max-plus eigenvalue and eigenvector of the system matrix are presented.
Abstract: Various applications in scheduling, such as train timetables and multi-legged locomotion, can be modeled using systems of max-plus linear equations. In this framework, the eigenvalue of the system matrix represents the total cycle time, whereas the eigenvector dictates the steady-state behavior. For a class of concurrent two-state cyclic systems, with direct application to legged locomotion, we present closed-form expressions for the max-plus eigenvalue and eigenvector of the system matrix. Additionally, we probe into the transient properties of this class of max-plus linear systems by computing the coupling time.
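
For an irreducible max-plus matrix, the eigenvalue mentioned in the abstract above equals the maximum cycle mean of the matrix. The sketch below computes it numerically by taking max-plus matrix powers and comparing the diagonal entries of each power divided by the power's exponent. This is a generic brute-force check with hypothetical function names and an invented 2x2 example; it is not the closed-form expression derived in the paper.

```python
NEG_INF = float("-inf")  # the max-plus "zero" element (no connection)


def maxplus_matmul(A, B):
    """Max-plus matrix product: (A (x) B)[i][j] = max_k (A[i][k] + B[k][j])."""
    n = len(A)
    return [[max(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def maxplus_eigenvalue(A):
    """Maximum cycle mean of A, i.e. its max-plus eigenvalue if A is irreducible."""
    n = len(A)
    power = A
    best = NEG_INF
    for k in range(1, n + 1):
        # The diagonal of the k-th power holds the heaviest cycles of length k.
        best = max(best, max(power[i][i] for i in range(n)) / k)
        if k < n:
            power = maxplus_matmul(power, A)
    return best


# Invented example: self-loops of weight 2 and 3, and a two-step cycle of weight 5 + 3.
A = [[2.0, 5.0],
     [3.0, 3.0]]
cycle_time = maxplus_eigenvalue(A)  # 4.0, from the two-step cycle (5 + 3) / 2
```

The brute-force approach costs on the order of n^4 operations, so it is only meant for checking small examples.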

Journal ArticleDOI
TL;DR: Conditions for the distributed stability analysis of Takagi-Sugeno fuzzy systems connected in a string are proposed, extended to observer design, and illustrated on a simulation example.

Journal ArticleDOI
TL;DR: The description of the error dynamics in this paper contains an omission that causes some bounds used in the conditions of Theorem 8 and Corollary 2 to be incorrectly defined.

Journal ArticleDOI
TL;DR: This paper considers the essential decision of when to transfer learning from an easier task to a more difficult one so that the total learning time is reduced, and proposes two transfer criteria based on the agent's performance.

Journal ArticleDOI
TL;DR: Upper and lower bounds for the pheromone levels are derived and related to the learning parameters and the number of ants used in the algorithm; bounds on the expected value of the pheromone levels are also derived.