Showing papers by "Robert Babuska published in 2006"
••
01 Dec 2006
TL;DR: An integrated survey of the field of multi-agent learning is presented, in which the issue of the multi-agent learning goal is discussed and a representative selection of algorithms is reviewed.
Abstract: Multi-agent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, and economics. Many tasks arising in these domains require that the agents learn behaviors online. A significant part of the research on multi-agent learning concerns reinforcement learning techniques. However, due to different viewpoints on central issues, such as the formal statement of the learning goal, a large number of different methods and approaches have been introduced. In this paper we aim to present an integrated survey of the field. First, the issue of the multi-agent learning goal is discussed, after which a representative selection of algorithms is reviewed. Finally, open issues are identified and future research directions are outlined.
118 citations
••
09 Oct 2006
TL;DR: The main conclusions from the simulations are that the performance of the extended Kalman filter and the unscented Kalman filter is comparable, joint filtering performs significantly better than dual filtering, and a larger number of detectors results in better state estimation, but has no significant influence on the parameter estimation error.
Abstract: We present a comparison of several filter configurations for freeway traffic state estimation. Since the environmental conditions on a freeway may change over time (e.g., changing weather conditions), parameter estimation is also considered. We compare the performance of the extended Kalman filter and the unscented Kalman filter for state estimation, parameter estimation, joint estimation and dual estimation. Furthermore, the performance is evaluated for different detector configurations. The main conclusions from the simulations are that (1) the performance of the extended Kalman filter and the unscented Kalman filter is comparable, (2) joint filtering performs significantly better than dual filtering, and (3) a larger number of detectors results in better state estimation, but has no significant influence on the parameter estimation error.
86 citations
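The joint (augmented-state) estimation compared above can be sketched on a toy problem. The scalar model, noise levels, and tuning below are illustrative choices of ours, not the paper's freeway traffic model: the unknown parameter `a` is appended to the state and estimated by an extended Kalman filter.

```python
import numpy as np

# Joint state-parameter EKF on the toy model x[k+1] = a*x[k] + u + w,
# y[k] = x[k] + v, with augmented state z = [x, a] (illustrative only).
def joint_ekf(ys, q=1e-3, r=0.1 ** 2):
    z = np.array([0.0, 0.5])                   # initial guesses for x and a
    P = np.eye(2)                              # covariance of the augmented state
    Q = np.diag([q, 1e-5])                     # allow slow drift of the parameter
    H = np.array([[1.0, 0.0]])                 # only x is measured
    for y in ys:
        F = np.array([[z[1], z[0]], [0.0, 1.0]])   # Jacobian of the augmented model
        z = np.array([z[1] * z[0] + 1.0, z[1]])    # predict: x <- a*x + u, a <- a
        P = F @ P @ F.T + Q
        S = (H @ P @ H.T + r).item()
        K = P @ H.T / S
        z = z + (K * (y - z[0])).ravel()           # measurement update
        P = (np.eye(2) - K @ H) @ P
    return z                                       # final [x, a] estimate

rng = np.random.default_rng(0)
a_true, x, ys = 0.9, 1.0, []
for _ in range(400):
    x = a_true * x + 1.0 + rng.normal(0, 0.03)     # known unit input keeps x excited
    ys.append(x + rng.normal(0, 0.1))
x_hat, a_hat = joint_ekf(np.array(ys))
```

The known input keeps the state excited, which is what makes the parameter observable; with the state near zero, `a` would barely show up in the measurements.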
••
01 Dec 2006
TL;DR: This paper investigates centralized and decentralized RL, emphasizing the challenges and potential advantages of the latter, illustrated by an example: learning to control a two-link rigid manipulator.
Abstract: Multi-agent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, etc. Learning approaches to multi-agent control, many of them based on reinforcement learning (RL), are investigated in complex domains such as teams of mobile robots. However, the application of decentralized RL to low-level control tasks is not as intensively studied. In this paper, we investigate centralized and decentralized RL, emphasizing the challenges and potential advantages of the latter. These are then illustrated on an example: learning to control a two-link rigid manipulator. Some open issues and future research directions in decentralized RL are outlined.
47 citations
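The core difficulty of decentralized RL, agents learning from a shared reward without seeing each other's actions, can be shown in miniature. The two-action coordination task and parameters below are ours, not the paper's manipulator example: each agent keeps its own Q-table and updates it independently.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent learners on a coordination task: reward 1 if their actions
# match, 0 otherwise. Neither agent observes the other's action or Q-table.
Q1, Q2 = np.zeros(2), np.zeros(2)
alpha, eps = 0.1, 0.2
for _ in range(2000):
    a1 = rng.integers(2) if rng.random() < eps else int(np.argmax(Q1))
    a2 = rng.integers(2) if rng.random() < eps else int(np.argmax(Q2))
    r = 1.0 if a1 == a2 else 0.0          # shared reward, no communication
    Q1[a1] += alpha * (r - Q1[a1])        # each agent updates only its own table
    Q2[a2] += alpha * (r - Q2[a2])
```

Despite each learner treating the other as part of the environment, the shared reward pulls their greedy actions onto a coordinated pair, the basic effect that makes decentralized low-level control plausible.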
••
TL;DR: In this paper, an automated procedure has been developed and applied to the design of a longitudinal control law in a fly-by-wire flight control system, where the number of operating points and their locations are determined automatically by using fuzzy clustering to capture characteristic patterns in the aerodynamic model throughout the flight envelope.
33 citations
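The clustering step described above can be sketched with standard fuzzy c-means on a one-dimensional scheduling variable. The flight-condition data and cluster count below are hypothetical; the point is that the cluster centers play the role of automatically placed operating points, and the memberships can blend local controller gains.

```python
import numpy as np

rng = np.random.default_rng(2)

def fcm(x, c=3, m=2.0, iters=50):
    # standard fuzzy c-means on a 1-D variable
    centers = np.linspace(x.min(), x.max(), c)
    for _ in range(iters):
        d = np.abs(x[:, None] - centers[None, :]) + 1e-9     # point-center distances
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)             # fuzzy memberships
        centers = (u ** m).T @ x / (u ** m).sum(axis=0)      # membership-weighted means
    return centers, u

# hypothetical flight-condition data concentrated around three regimes
mach = np.concatenate([rng.normal(mu, 0.03, 200) for mu in (0.2, 0.5, 0.8)])
centers, u = fcm(mach)
```

A gain-scheduled control law would then interpolate the locally designed gains with the memberships `u` rather than with hand-picked switching boundaries.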
••
11 Sep 2006
TL;DR: The design of a virtual sensor for the Angle-of-Attack signal in a small commercial aircraft is described, which combines a white-box linear time-varying model, a gray-box nonlinear Takagi-Sugeno fuzzy model and a black-box neural network compensator, whose purpose is to reduce the estimation error of the linear parameter varying model.
Abstract: An aircraft carries on board many sensors which measure a wide variety of variables. Due to the relations between the measured signals, a certain level of redundancy is available. This redundancy can be used to estimate a particular variable based on signals that represent other variables. Such an estimator can be used as a virtual sensor. This paper describes the design of a virtual sensor for the Angle-of-Attack signal in a small commercial aircraft. In order to effectively use all available knowledge and data, and to comply with the stringent design requirements, the virtual sensor combines a number of technologies: a white-box linear time-varying model, a gray-box nonlinear Takagi-Sugeno (TS) fuzzy model and a black-box neural network compensator, whose purpose is to reduce the estimation error of the linear parameter varying model. The TS model and the neural network are trained by using data from nonlinear aircraft simulations. The inputs of the neural network are selected by a genetic search algorithm with a backward elimination procedure. Extensive evaluation has shown that the design requirements are amply met and that the proposed design methodology has a good potential for future applications in aircraft and other high-performance systems.
30 citations
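The compensator idea, a black-box model trained to absorb the residual error of a simpler physical model, can be sketched on synthetic data. The target function, network size, and training details below are illustrative, not the paper's design: a linear "white-box" model captures the bulk of the output, and a small network is fitted to what is left.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic stand-in: the true output has a nonlinear part the linear model misses.
x = rng.uniform(-1, 1, (400, 1))
y_true = 2.0 * x[:, 0] + 0.5 * np.sin(3.0 * x[:, 0])   # hypothetical plant output
y_base = 2.0 * x[:, 0]                                 # white-box linear model part
resid = y_true - y_base                                # error the compensator must learn

# one-hidden-layer network trained on the residual by full-batch gradient descent
W1 = rng.normal(0, 1, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, 16); b2 = 0.0
for _ in range(3000):
    h = np.tanh(x @ W1 + b1)
    pred = h @ W2 + b2
    g = 2.0 * (pred - resid) / len(x)                  # d(MSE)/d(pred)
    W2 -= 0.1 * (h.T @ g); b2 -= 0.1 * g.sum()
    gh = np.outer(g, W2) * (1 - h ** 2)                # backprop through tanh
    W1 -= 0.1 * (x.T @ gh); b1 -= 0.1 * gh.sum(axis=0)
```

The combined estimator is then `y_base` plus the network output, mirroring (in miniature) the layered white/gray/black-box structure the abstract describes.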
••
TL;DR: In this article, a method for estimating both the weights and the state of a multiple model system with one common state vector is proposed, where the weights are related to the activation of each individual model.
22 citations
••
15 May 2006
TL;DR: A genetic polynomial regression technique is proposed to select the significant input variables for the identification of non-linear dynamic systems with multiple inputs, and the technique has been applied to a real-world example.
Abstract: The performance of non-linear identification techniques is often determined by the appropriateness of the selected input variables and the corresponding time lags. High correlation coefficients between candidate input variables in addition to a non-linear relation with the output signal induce the need for an appropriate input selection methodology. This paper proposes a genetic polynomial regression technique to select the significant input variables for the identification of non-linear dynamic systems with multiple inputs. Statistical tools are presented to visualize and to process the results from different selection runs. The evolutionary approach can be used for a wide range of identification techniques and only requires a minimal input and a priori knowledge from the user. The evolutionary selection algorithm has been applied on a real-world example to illustrate its performance. The engine load in a combine harvester is highly variable in time and should be kept below an allowable limit during automatic ground speed control mode. The genetic regression process has been used to select those measurement variables that have a significant impact on the engine load and that will act as measurement variables of a non-linear model-based engine load controller.
20 citations
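The selection mechanism described above can be sketched in a few lines. The data, the mutation-only genetic search, and the fitness penalty below are our own simplifications, not the paper's algorithm: candidate input sets are encoded as bitmasks, and each mask is scored by the validation error of a polynomial regression on the selected inputs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 6 candidate inputs, but only x0 and x2 drive the output,
# one of them through a nonlinear (quadratic) term.
X = rng.normal(size=(300, 6))
y = 2.0 * X[:, 0] + X[:, 2] ** 2 + 0.05 * rng.normal(size=300)

def poly_features(Xs):
    # degree-2 polynomial expansion of the selected columns
    cols = [np.ones(len(Xs))] + [Xs[:, i] for i in range(Xs.shape[1])]
    cols += [Xs[:, i] * Xs[:, j]
             for i in range(Xs.shape[1]) for j in range(i, Xs.shape[1])]
    return np.column_stack(cols)

def fitness(mask):
    if not mask.any():
        return -np.inf
    Phi = poly_features(X[:, mask])
    n = len(y) // 2                          # simple train/validation split
    w, *_ = np.linalg.lstsq(Phi[:n], y[:n], rcond=None)
    mse = np.mean((Phi[n:] @ w - y[n:]) ** 2)
    return -mse - 0.01 * mask.sum()          # penalize large input sets

# tiny genetic search over input bitmasks: keep the best half, mutate it
pop = rng.random((20, 6)) < 0.5
for _ in range(30):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]
    children = parents ^ (rng.random(parents.shape) < 0.1)   # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
```

Running several such searches and tallying how often each input survives is the kind of statistic the abstract proposes to visualize across selection runs.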
••
16 Jul 2006
TL;DR: The use of reinforcement learning to make a dynamic walking robot more robust against ground disturbances is investigated; simulations demonstrate that the biped quickly learns to overcome step-down disturbances on the floor up to 10% of the leg length, without compromising the natural walking style provided by the PD controller.
Abstract: Biped robots based on the concept of (passive) dynamic walking are far simpler than the traditional fully controlled walking robots, while achieving a more natural gait and consuming less energy. However, lightly actuated dynamic walking robots, which rely on the natural limit cycle of their mechanical structure, are very sensitive to ground disturbances. Even a very small step down can cause the robot to lose stability. In this paper, we investigate the use of reinforcement learning to make a dynamic walking robot more robust against ground disturbances. The learning controller is applied to a simulated two-link biped which is an abstraction of a mechanical prototype developed at the Delft Biorobotics Laboratory. The learning controller has been designed such that it can be applied as a straightforward extension of the proportional-derivative (PD) controller currently used to drive the robot's pneumatic actuators. The learning controller is therefore suitable for the future implementation in the robot hardware. Simulation results demonstrate that the biped quickly learns to overcome step-down disturbances on the floor up to 10% of the leg length, without compromising the natural walking style provided by the PD controller, which was optimized for walking on an even surface.
20 citations
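The architecture, a fixed PD law with a learned additive correction on top, can be sketched on a much simpler plant. The 1-DOF mass, the constant disturbance (a stand-in for the step-down disturbance), and all parameters below are hypothetical; only the "PD plus tabular Q-learning correction" structure mirrors the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

KP, KD, DT = 4.0, 1.5, 0.05
ACTIONS = np.array([-2.0, 0.0, 2.0])   # discrete corrective forces
Q = np.zeros((21, len(ACTIONS)))       # state = discretized position error

def bin_err(e):
    return int(np.clip(round(e * 10) + 10, 0, 20))

def step(x, v, u, dist=-2.0):
    a = u + dist                       # unknown disturbance acts on the mass
    return x + DT * v, v + DT * a

eps = 0.2
for episode in range(300):
    x, v = 0.0, 0.0
    for _ in range(100):
        e = 1.0 - x                    # setpoint at x = 1
        s = bin_err(e)
        a_i = rng.integers(3) if rng.random() < eps else int(np.argmax(Q[s]))
        u = KP * e - KD * v + ACTIONS[a_i]        # fixed PD term + learned correction
        x, v = step(x, v, u)
        s2, r = bin_err(1.0 - x), -abs(1.0 - x)
        Q[s, a_i] += 0.1 * (r + 0.95 * np.max(Q[s2]) - Q[s, a_i])
    eps *= 0.99
```

The PD gains are never touched, so the nominal behavior is preserved; the agent only learns the extra force that cancels the disturbance, which is the same additive-extension argument the abstract makes for hardware deployment.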
••
TL;DR: A new approach for optimising the production of drinking water treatment plants is proposed that relies on optimal model-based control of a single softening reactor and the use of a bypass.
15 citations
••
TL;DR: A new approach in control engineering is presented, in which control, computers, communication and cognition play equal roles in addressing real-life problems from very small-scale devices to very large-scale industrial processes and non-technical applications.
10 citations
••
14 Jun 2006
TL;DR: In this paper, a particle filter is applied to the estimation of overflow losses in a hopper dredger, based on the measurements of the total hopper volume, mass, incoming mixture density and flow-rate.
Abstract: A particle filter is applied to the estimation of overflow losses in a hopper dredger. The filter estimates online the overflow mixture density and flow-rate, based on the measurements of the total hopper volume, mass, incoming mixture density and flow-rate. These data are readily available on board of every modern hopper dredger. The main advantage of the proposed approach is that the particle filter uses straightforward nonlinear mass balance equations and does not rely on complex sedimentation models with uncertain parameters. The performance was evaluated in simulations as well as with real measurements and the results are encouraging. The filter can be used to improve parameter estimation in complex mechanistic models of the hopper sedimentation process and to facilitate decision making on board of the hopper dredger.
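The filter itself is the standard bootstrap particle filter; a generic sketch is below on a toy scalar model (not the hopper mass-balance equations). Particles are propagated through the process model, weighted by the measurement likelihood, and resampled.

```python
import numpy as np

rng = np.random.default_rng(3)

# Bootstrap particle filter on the toy model x[k+1] = 0.5*x[k] + sin(x[k]) + w,
# y[k] = x[k] + v (illustrative; all parameters are ours).
def particle_filter(ys, n=500, q=0.1, r=0.2):
    particles = rng.normal(0.0, 1.0, n)
    estimates = []
    for y in ys:
        # propagate through the (nonlinear) process model
        particles = 0.5 * particles + np.sin(particles) + rng.normal(0, q, n)
        # weight by the Gaussian measurement likelihood
        w = np.exp(-0.5 * ((y - particles) / r) ** 2)
        w /= w.sum()
        estimates.append(np.sum(w * particles))       # weighted-mean estimate
        # systematic resampling keeps the particle set focused
        idx = np.searchsorted(np.cumsum(w), (rng.random() + np.arange(n)) / n)
        particles = particles[np.minimum(idx, n - 1)]
    return np.array(estimates)

# simulate ground truth and noisy measurements
x, xs, ys = 1.0, [], []
for _ in range(100):
    x = 0.5 * x + np.sin(x) + rng.normal(0, 0.1)
    xs.append(x)
    ys.append(x + rng.normal(0, 0.2))
est = particle_filter(np.array(ys))
```

Because only the process and measurement functions appear, swapping in mass-balance equations with volume, mass, density, and flow-rate measurements changes nothing structural, which is the simplicity the abstract claims.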
••
11 Sep 2006
TL;DR: A novel criterion is proposed that uses on-line spectral analysis over a moving window and subsequent fuzzy decision making based on the magnitude and duration of oscillations; it is demonstrated using real-time data from dissolved oxygen and pH control loops in a fermentation process.
Abstract: The automatic detection of oscillations in control loops is essential for effective performance monitoring. However, the methods known from the literature are often sensitive to normal system responses such as step changes in the reference or the rejection of disturbances. Therefore, a novel criterion is proposed in this paper. It uses on-line spectral analysis over a moving window and subsequent fuzzy decision making based on the magnitude and duration of oscillations as criteria. An offline adaptation mechanism is available to tune the system with the help of data and expert knowledge. The usefulness of this criterion has been demonstrated by using real-time data from dissolved oxygen and pH control loops in a fermentation process.
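The two-criteria idea can be sketched as follows; the window length, thresholds, and membership shapes are our own illustrative choices, not the paper's tuned values. A moving window of the signal is analyzed spectrally, and fuzzy memberships for "large magnitude" and "long duration" are combined with a fuzzy AND.

```python
import numpy as np

def ramp(x, lo, hi):
    # piecewise-linear membership: 0 below lo, 1 above hi
    return float(np.clip((x - lo) / (hi - lo), 0.0, 1.0))

def oscillation_score(signal, fs=10.0, window=64):
    runs = []
    run = 0
    for k in range(window, len(signal) + 1):
        seg = signal[k - window:k] - np.mean(signal[k - window:k])
        spec = np.abs(np.fft.rfft(seg * np.hanning(window)))
        peak = spec[1:].max()                 # dominant non-DC component
        run = run + 1 if peak > 1.0 else 0    # how long the peak has persisted
        runs.append((peak, run))
    peak, run = runs[-1]
    magnitude = ramp(peak, 1.0, 5.0)          # "the oscillation is large"
    duration = ramp(run / fs, 0.5, 2.0)       # "it persists" (in seconds)
    return min(magnitude, duration)           # fuzzy AND of both criteria

t = np.arange(0, 30, 0.1)
oscillating = 0.5 * np.sin(2 * np.pi * 0.8 * t)   # sustained oscillation
step_resp = np.exp(-0.3 * t)                       # normal decaying response
```

The duration criterion is what gives robustness to normal transients: a step response excites the spectrum briefly, but its run length stays short, so the combined score remains low.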
••
16 Jul 2006
TL;DR: This paper introduces a new type of exploration, called dynamic exploration, which differs from the existing exploration methods in that it makes exploration a function of the action selected in the previous time step.
Abstract: Reinforcement learning has proved its value in solving complex optimization tasks. However, the learning time for even simple problems is typically very long. Efficient exploration of the state-action space is therefore crucial for effective learning. This paper introduces a new type of exploration, called dynamic exploration. It differs from the existing exploration methods (both directed and undirected) in that it makes exploration a function of the action selected in the previous time step. In our approach, states can either belong to long-path states, where the optimal action is the same as the optimal action in the previous state, or to switch states, where the action is different. In realistic learning problems, the number of long-path states exceeds the number of switch states. Given this information, the exploration method can explore the state-space more efficiently. Experiments on different gridworld optimization tasks demonstrate the reduction of learning time with dynamic exploration.
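A minimal reading of this idea can be written as an action-selection rule; the probabilities below are our own placeholders, not the paper's. During exploration, the previous action is repeated with high probability (the long-path case), and a different action is tried only occasionally (the switch case).

```python
import numpy as np

rng = np.random.default_rng(4)

def dynamic_explore(q_values, prev_action, eps=0.3, p_repeat=0.8):
    # exploit as usual most of the time
    if rng.random() >= eps:
        return int(np.argmax(q_values))
    # when exploring, prefer to continue the previous action ("long-path" states)
    if prev_action is not None and rng.random() < p_repeat:
        return prev_action
    # otherwise fall back to undirected random exploration ("switch" states)
    return int(rng.integers(len(q_values)))
```

In a corridor-like gridworld, where the optimal action rarely changes from state to state, this biases exploratory trajectories toward long straight runs instead of a random walk, which is where the claimed reduction in learning time comes from.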
01 Jan 2006
TL;DR: An effective reinforcement learning algorithm for non-Markovian environments is proposed; it uses linear programming to find the best-response policy and avoids the problem of multiple Nash equilibria.
Abstract: In this paper several multiagent reinforcement learning algorithms are investigated, compared and analyzed. An effective reinforcement learning algorithm for non-Markovian environments is proposed. This algorithm uses linear programming to find the best-response policy and avoids the problem of multiple Nash equilibria. The algorithm involves simple procedures and easy computations, and can guarantee good learning convergence in some situations. Experimental results show that this algorithm is effective. Keywords: multiagent; reinforcement learning; Markov environment; Nash equilibria
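The LP step on its own is standard and can be sketched (the surrounding learning algorithm is not reproduced here, and SciPy is assumed available): the maximin policy of a zero-sum matrix game, i.e. the best response against a worst-case opponent, is the solution of a small linear program.

```python
import numpy as np
from scipy.optimize import linprog

def maximin_policy(A):
    # variables: p[0..m-1] (mixed policy) and v (game value); maximize v
    m, n = A.shape
    c = np.r_[np.zeros(m), -1.0]                  # linprog minimizes, so use -v
    A_ub = np.c_[-A.T, np.ones(n)]                # v - p @ A[:, j] <= 0 for each column j
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)  # policy sums to 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.x[m]

pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])    # matching pennies payoffs
policy, value = maximin_policy(pennies)
```

For matching pennies this returns the uniform mixed policy with game value zero; a single LP like this sidesteps enumerating (possibly multiple) Nash equilibria, which is the advantage the abstract points to.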
••
TL;DR: Experimental results demonstrate that AFC achieves significantly better tracking performance than the linear adaptive controller and that the composite adaptive laws provide a further improvement over the standard adaptive laws.