Showing papers by "Robert Babuska published in 2006"
••
01 Dec 2006
TL;DR: An integrated survey of the field of multi-agent learning is presented, in which the issue of the multi-agent learning goal is discussed and a representative selection of algorithms is reviewed.
Abstract: Multi-agent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, and economics. Many tasks arising in these domains require that the agents learn behaviors online. A significant part of the research on multi-agent learning concerns reinforcement learning techniques. However, due to different viewpoints on central issues, such as the formal statement of the learning goal, a large number of different methods and approaches have been introduced. In this paper we aim to present an integrated survey of the field. First, the issue of the multi-agent learning goal is discussed, after which a representative selection of algorithms is reviewed. Finally, open issues are identified and future research directions are outlined.
118 citations
••
09 Oct 2006
TL;DR: The main conclusions from the simulations are that the performance of the extended Kalman filter and the unscented Kalman filter is comparable, joint filtering performs significantly better than dual filtering, and a larger number of detectors results in better state estimation, but has no significant influence on the parameter estimation error.
Abstract: We present a comparison of several filter configurations for freeway traffic state estimation. Since the environmental conditions on a freeway may change over time (e.g., changing weather conditions), parameter estimation is also considered. We compare the performance of the extended Kalman filter and the unscented Kalman filter for state estimation, parameter estimation, joint estimation and dual estimation. Furthermore, the performance is evaluated for different detector configurations. The main conclusions from the simulations are that (1) the performance of the extended Kalman filter and the unscented Kalman filter is comparable, (2) joint filtering performs significantly better than dual filtering, and (3) a larger number of detectors results in better state estimation, but has no significant influence on the parameter estimation error.
86 citations
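The joint (augmented-state) estimation compared above can be sketched on a toy problem. The scalar model, noise levels, and tuning below are illustrative choices of ours, not the paper's freeway traffic model: the unknown parameter `a` is appended to the state and estimated by an extended Kalman filter.

```python
import numpy as np

# Joint state-parameter EKF on the toy model x[k+1] = a*x[k] + u + w,
# y[k] = x[k] + v, with augmented state z = [x, a] (illustrative only).
def joint_ekf(ys, q=1e-3, r=0.1 ** 2):
    z = np.array([0.0, 0.5])                   # initial guesses for x and a
    P = np.eye(2)                              # covariance of the augmented state
    Q = np.diag([q, 1e-5])                     # allow slow drift of the parameter
    H = np.array([[1.0, 0.0]])                 # only x is measured
    for y in ys:
        F = np.array([[z[1], z[0]], [0.0, 1.0]])   # Jacobian of the augmented model
        z = np.array([z[1] * z[0] + 1.0, z[1]])    # predict: x <- a*x + u, a <- a
        P = F @ P @ F.T + Q
        S = (H @ P @ H.T + r).item()
        K = P @ H.T / S
        z = z + (K * (y - z[0])).ravel()           # measurement update
        P = (np.eye(2) - K @ H) @ P
    return z                                       # final [x, a] estimate

rng = np.random.default_rng(0)
a_true, x, ys = 0.9, 1.0, []
for _ in range(400):
    x = a_true * x + 1.0 + rng.normal(0, 0.03)     # known unit input keeps x excited
    ys.append(x + rng.normal(0, 0.1))
x_hat, a_hat = joint_ekf(np.array(ys))
```

The known input keeps the state excited, which is what makes the parameter observable; with the state near zero, `a` would barely show up in the measurements.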
••
01 Dec 2006
TL;DR: This paper investigates centralized and decentralized RL, emphasizing the challenges and potential advantages of the latter, illustrated by an example: learning to control a two-link rigid manipulator.
Abstract: Multi-agent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, etc. Learning approaches to multi-agent control, many of them based on reinforcement learning (RL), are investigated in complex domains such as teams of mobile robots. However, the application of decentralized RL to low-level control tasks is not as intensively studied. In this paper, we investigate centralized and decentralized RL, emphasizing the challenges and potential advantages of the latter. These are then illustrated on an example: learning to control a two-link rigid manipulator. Some open issues and future research directions in decentralized RL are outlined.
47 citations
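The core difficulty of decentralized RL, agents learning from a shared reward without seeing each other's actions, can be shown in miniature. The two-action coordination task and parameters below are ours, not the paper's manipulator example: each agent keeps its own Q-table and updates it independently.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent learners on a coordination task: reward 1 if their actions
# match, 0 otherwise. Neither agent observes the other's action or Q-table.
Q1, Q2 = np.zeros(2), np.zeros(2)
alpha, eps = 0.1, 0.2
for _ in range(2000):
    a1 = rng.integers(2) if rng.random() < eps else int(np.argmax(Q1))
    a2 = rng.integers(2) if rng.random() < eps else int(np.argmax(Q2))
    r = 1.0 if a1 == a2 else 0.0          # shared reward, no communication
    Q1[a1] += alpha * (r - Q1[a1])        # each agent updates only its own table
    Q2[a2] += alpha * (r - Q2[a2])
```

Despite each learner treating the other as part of the environment, the shared reward pulls their greedy actions onto a coordinated pair, the basic effect that makes decentralized low-level control plausible.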
••
TL;DR: In this paper, an automated procedure has been developed and applied to the design of a longitudinal control law in a fly-by-wire flight control system, where the number of operating points and their locations are determined automatically by using fuzzy clustering to capture characteristic patterns in the aerodynamic model throughout the flight envelope.
33 citations
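The clustering step described above can be sketched with standard fuzzy c-means on a one-dimensional scheduling variable. The flight-condition data and cluster count below are hypothetical; the point is that the cluster centers play the role of automatically placed operating points, and the memberships can blend local controller gains.

```python
import numpy as np

rng = np.random.default_rng(2)

def fcm(x, c=3, m=2.0, iters=50):
    # standard fuzzy c-means on a 1-D variable
    centers = np.linspace(x.min(), x.max(), c)
    for _ in range(iters):
        d = np.abs(x[:, None] - centers[None, :]) + 1e-9     # point-center distances
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)             # fuzzy memberships
        centers = (u ** m).T @ x / (u ** m).sum(axis=0)      # membership-weighted means
    return centers, u

# hypothetical flight-condition data concentrated around three regimes
mach = np.concatenate([rng.normal(mu, 0.03, 200) for mu in (0.2, 0.5, 0.8)])
centers, u = fcm(mach)
```

A gain-scheduled control law would then interpolate the locally designed gains with the memberships `u` rather than with hand-picked switching boundaries.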
••
11 Sep 2006
TL;DR: The design of a virtual sensor for the Angle-of-Attack signal in a small commercial aircraft is described, which combines a white-box linear time-varying model, a gray-box nonlinear Takagi-Sugeno fuzzy model and a black-box neural network compensator, whose purpose is to reduce the estimation error of the linear parameter varying model.
Abstract: An aircraft carries on board many sensors which measure a wide variety of variables. Due to the relations between the measured signals, a certain level of redundancy is available. This redundancy can be used to estimate a particular variable based on signals that represent other variables. Such an estimator can be used as a virtual sensor. This paper describes the design of a virtual sensor for the Angle-of-Attack signal in a small commercial aircraft. In order to effectively use all available knowledge and data, and to comply with the stringent design requirements, the virtual sensor combines a number of technologies: a white-box linear time-varying model, a gray-box nonlinear Takagi-Sugeno (TS) fuzzy model and a black-box neural network compensator, whose purpose is to reduce the estimation error of the linear parameter varying model. The TS model and the neural network are trained by using data from nonlinear aircraft simulations. The inputs of the neural network are selected by a genetic search algorithm with a backward elimination procedure. Extensive evaluation has shown that the design requirements are amply met and that the proposed design methodology has a good potential for future applications in aircraft and other high-performance systems.
30 citations
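The compensator idea, a black-box model trained to absorb the residual error of a simpler physical model, can be sketched on synthetic data. The target function, network size, and training details below are illustrative, not the paper's design: a linear "white-box" model captures the bulk of the output, and a small network is fitted to what is left.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic stand-in: the true output has a nonlinear part the linear model misses.
x = rng.uniform(-1, 1, (400, 1))
y_true = 2.0 * x[:, 0] + 0.5 * np.sin(3.0 * x[:, 0])   # hypothetical plant output
y_base = 2.0 * x[:, 0]                                 # white-box linear model part
resid = y_true - y_base                                # error the compensator must learn

# one-hidden-layer network trained on the residual by full-batch gradient descent
W1 = rng.normal(0, 1, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, 16); b2 = 0.0
for _ in range(3000):
    h = np.tanh(x @ W1 + b1)
    pred = h @ W2 + b2
    g = 2.0 * (pred - resid) / len(x)                  # d(MSE)/d(pred)
    W2 -= 0.1 * (h.T @ g); b2 -= 0.1 * g.sum()
    gh = np.outer(g, W2) * (1 - h ** 2)                # backprop through tanh
    W1 -= 0.1 * (x.T @ gh); b1 -= 0.1 * gh.sum(axis=0)
```

The combined estimator is then `y_base` plus the network output, mirroring (in miniature) the layered white/gray/black-box structure the abstract describes.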
••
TL;DR: In this article, a method for estimating both the weights and the state of a multiple model system with one common state vector is proposed, where the weights are related to the activation of each individual model.
22 citations
••
15 May 2006
TL;DR: A genetic polynomial regression technique is proposed to select the significant input variables for the identification of non-linear dynamic systems with multiple inputs, and the technique has been applied to a real-world example.
Abstract: The performance of non-linear identification techniques is often determined by the appropriateness of the selected input variables and the corresponding time lags. High correlation coefficients between candidate input variables in addition to a non-linear relation with the output signal induce the need for an appropriate input selection methodology. This paper proposes a genetic polynomial regression technique to select the significant input variables for the identification of non-linear dynamic systems with multiple inputs. Statistical tools are presented to visualize and to process the results from different selection runs. The evolutionary approach can be used for a wide range of identification techniques and only requires a minimal input and a priori knowledge from the user. The evolutionary selection algorithm has been applied on a real-world example to illustrate its performance. The engine load in a combine harvester is highly variable in time and should be kept below an allowable limit during automatic ground speed control mode. The genetic regression process has been used to select those measurement variables that have a significant impact on the engine load and that will act as measurement variables of a non-linear model-based engine load controller.
20 citations
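The selection mechanism described above can be sketched in a few lines. The data, the mutation-only genetic search, and the fitness penalty below are our own simplifications, not the paper's algorithm: candidate input sets are encoded as bitmasks, and each mask is scored by the validation error of a polynomial regression on the selected inputs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 6 candidate inputs, but only x0 and x2 drive the output,
# one of them through a nonlinear (quadratic) term.
X = rng.normal(size=(300, 6))
y = 2.0 * X[:, 0] + X[:, 2] ** 2 + 0.05 * rng.normal(size=300)

def poly_features(Xs):
    # degree-2 polynomial expansion of the selected columns
    cols = [np.ones(len(Xs))] + [Xs[:, i] for i in range(Xs.shape[1])]
    cols += [Xs[:, i] * Xs[:, j]
             for i in range(Xs.shape[1]) for j in range(i, Xs.shape[1])]
    return np.column_stack(cols)

def fitness(mask):
    if not mask.any():
        return -np.inf
    Phi = poly_features(X[:, mask])
    n = len(y) // 2                          # simple train/validation split
    w, *_ = np.linalg.lstsq(Phi[:n], y[:n], rcond=None)
    mse = np.mean((Phi[n:] @ w - y[n:]) ** 2)
    return -mse - 0.01 * mask.sum()          # penalize large input sets

# tiny genetic search over input bitmasks: keep the best half, mutate it
pop = rng.random((20, 6)) < 0.5
for _ in range(30):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]
    children = parents ^ (rng.random(parents.shape) < 0.1)   # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
```

Running several such searches and tallying how often each input survives is the kind of statistic the abstract proposes to visualize across selection runs.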
••
16 Jul 2006
TL;DR: The use of reinforcement learning to make a dynamic walking robot more robust against ground disturbances is investigated; simulations demonstrate that the biped quickly learns to overcome step-down disturbances on the floor up to 10% of the leg length, without compromising the natural walking style provided by the PD controller.
Abstract: Biped robots based on the concept of (passive) dynamic walking are far simpler than the traditional fully controlled walking robots, while achieving a more natural gait and consuming less energy. However, lightly actuated dynamic walking robots, which rely on the natural limit cycle of their mechanical structure, are very sensitive to ground disturbances. Even a very small step down can cause the robot to lose stability. In this paper, we investigate the use of reinforcement learning to make a dynamic walking robot more robust against ground disturbances. The learning controller is applied to a simulated two-link biped which is an abstraction of a mechanical prototype developed at the Delft Biorobotics Laboratory. The learning controller has been designed such that it can be applied as a straightforward extension of the proportional-derivative (PD) controller currently used to drive the robot's pneumatic actuators. The learning controller is therefore suitable for the future implementation in the robot hardware. Simulation results demonstrate that the biped quickly learns to overcome step-down disturbances on the floor up to 10% of the leg length, without compromising the natural walking style provided by the PD controller, which was optimized for walking on an even surface.
20 citations
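The architecture, a fixed PD law with a learned additive correction on top, can be sketched on a much simpler plant. The 1-DOF mass, the constant disturbance (a stand-in for the step-down disturbance), and all parameters below are hypothetical; only the "PD plus tabular Q-learning correction" structure mirrors the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

KP, KD, DT = 4.0, 1.5, 0.05
ACTIONS = np.array([-2.0, 0.0, 2.0])   # discrete corrective forces
Q = np.zeros((21, len(ACTIONS)))       # state = discretized position error

def bin_err(e):
    return int(np.clip(round(e * 10) + 10, 0, 20))

def step(x, v, u, dist=-2.0):
    a = u + dist                       # unknown disturbance acts on the mass
    return x + DT * v, v + DT * a

eps = 0.2
for episode in range(300):
    x, v = 0.0, 0.0
    for _ in range(100):
        e = 1.0 - x                    # setpoint at x = 1
        s = bin_err(e)
        a_i = rng.integers(3) if rng.random() < eps else int(np.argmax(Q[s]))
        u = KP * e - KD * v + ACTIONS[a_i]        # fixed PD term + learned correction
        x, v = step(x, v, u)
        s2, r = bin_err(1.0 - x), -abs(1.0 - x)
        Q[s, a_i] += 0.1 * (r + 0.95 * np.max(Q[s2]) - Q[s, a_i])
    eps *= 0.99
```

The PD gains are never touched, so the nominal behavior is preserved; the agent only learns the extra force that cancels the disturbance, which is the same additive-extension argument the abstract makes for hardware deployment.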
••
TL;DR: A new approach for optimising the production of drinking water treatment plants is proposed that relies on optimal model-based control of a single softening reactor and the use of a bypass.
15 citations
••
TL;DR: A new approach in control engineering is presented, in which control, computers, communication and cognition play equal roles in addressing real-life problems from very small-scale devices to very large-scale industrial processes and non-technical applications.
10 citations
••
14 Jun 2006
TL;DR: In this paper, a particle filter is applied to the estimation of overflow losses in a hopper dredger, based on the measurements of the total hopper volume, mass, incoming mixture density and flow-rate.
Abstract: A particle filter is applied to the estimation of overflow losses in a hopper dredger. The filter estimates online the overflow mixture density and flow-rate, based on the measurements of the total hopper volume, mass, incoming mixture density and flow-rate. These data are readily available on board of every modern hopper dredger. The main advantage of the proposed approach is that the particle filter uses straightforward nonlinear mass balance equations and does not rely on complex sedimentation models with uncertain parameters. The performance was evaluated in simulations as well as with real measurements and the results are encouraging. The filter can be used to improve parameter estimation in complex mechanistic models of the hopper sedimentation process and to facilitate decision making on board of the hopper dredger.
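The filter itself is the standard bootstrap particle filter; a generic sketch is below on a toy scalar model (not the hopper mass-balance equations). Particles are propagated through the process model, weighted by the measurement likelihood, and resampled.

```python
import numpy as np

rng = np.random.default_rng(3)

# Bootstrap particle filter on the toy model x[k+1] = 0.5*x[k] + sin(x[k]) + w,
# y[k] = x[k] + v (illustrative; all parameters are ours).
def particle_filter(ys, n=500, q=0.1, r=0.2):
    particles = rng.normal(0.0, 1.0, n)
    estimates = []
    for y in ys:
        # propagate through the (nonlinear) process model
        particles = 0.5 * particles + np.sin(particles) + rng.normal(0, q, n)
        # weight by the Gaussian measurement likelihood
        w = np.exp(-0.5 * ((y - particles) / r) ** 2)
        w /= w.sum()
        estimates.append(np.sum(w * particles))       # weighted-mean estimate
        # systematic resampling keeps the particle set focused
        idx = np.searchsorted(np.cumsum(w), (rng.random() + np.arange(n)) / n)
        particles = particles[np.minimum(idx, n - 1)]
    return np.array(estimates)

# simulate ground truth and noisy measurements
x, xs, ys = 1.0, [], []
for _ in range(100):
    x = 0.5 * x + np.sin(x) + rng.normal(0, 0.1)
    xs.append(x)
    ys.append(x + rng.normal(0, 0.2))
est = particle_filter(np.array(ys))
```

Because only the process and measurement functions appear, swapping in mass-balance equations with volume, mass, density, and flow-rate measurements changes nothing structural, which is the simplicity the abstract claims.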
••
11 Sep 2006
TL;DR: A novel criterion is proposed that uses on-line spectral analysis over a moving window and subsequent fuzzy decision making based on the magnitude and duration of oscillations; it is demonstrated using real-time data from dissolved oxygen and pH control loops in a fermentation process.
Abstract: The automatic detection of oscillations in control loops is essential for effective performance monitoring. However, the methods known from the literature are often sensitive to normal system responses such as step changes in the reference or the rejection of disturbances. Therefore, a novel criterion is proposed in this paper. It uses on-line spectral analysis over a moving window and subsequent fuzzy decision making based on the magnitude and duration of oscillations as criteria. An offline adaptation mechanism is available to tune the system with the help of data and expert knowledge. The usefulness of this criterion has been demonstrated by using real-time data from dissolved oxygen and pH control loops in a fermentation process.
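The two-criteria idea can be sketched as follows; the window length, thresholds, and membership shapes are our own illustrative choices, not the paper's tuned values. A moving window of the signal is analyzed spectrally, and fuzzy memberships for "large magnitude" and "long duration" are combined with a fuzzy AND.

```python
import numpy as np

def ramp(x, lo, hi):
    # piecewise-linear membership: 0 below lo, 1 above hi
    return float(np.clip((x - lo) / (hi - lo), 0.0, 1.0))

def oscillation_score(signal, fs=10.0, window=64):
    runs = []
    run = 0
    for k in range(window, len(signal) + 1):
        seg = signal[k - window:k] - np.mean(signal[k - window:k])
        spec = np.abs(np.fft.rfft(seg * np.hanning(window)))
        peak = spec[1:].max()                 # dominant non-DC component
        run = run + 1 if peak > 1.0 else 0    # how long the peak has persisted
        runs.append((peak, run))
    peak, run = runs[-1]
    magnitude = ramp(peak, 1.0, 5.0)          # "the oscillation is large"
    duration = ramp(run / fs, 0.5, 2.0)       # "it persists" (in seconds)
    return min(magnitude, duration)           # fuzzy AND of both criteria

t = np.arange(0, 30, 0.1)
oscillating = 0.5 * np.sin(2 * np.pi * 0.8 * t)   # sustained oscillation
step_resp = np.exp(-0.3 * t)                       # normal decaying response
```

The duration criterion is what gives robustness to normal transients: a step response excites the spectrum briefly, but its run length stays short, so the combined score remains low.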
••
16 Jul 2006
TL;DR: This paper introduces a new type of exploration, called dynamic exploration, which differs from the existing exploration methods in that it makes exploration a function of the action selected in the previous time step.
Abstract: Reinforcement learning has proved its value in solving complex optimization tasks. However, the learning time for even simple problems is typically very long. Efficient exploration of the state-action space is therefore crucial for effective learning. This paper introduces a new type of exploration, called dynamic exploration. It differs from the existing exploration methods (both directed and undirected) in that it makes exploration a function of the action selected in the previous time step. In our approach, states can either belong to long-path states, where the optimal action is the same as the optimal action in the previous state, or to switch states, where the action is different. In realistic learning problems, the number of long-path states exceeds the number of switch states. Given this information, the exploration method can explore the state-space more efficiently. Experiments on different gridworld optimization tasks demonstrate the reduction of learning time with dynamic exploration.
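A minimal reading of this idea can be written as an action-selection rule; the probabilities below are our own placeholders, not the paper's. During exploration, the previous action is repeated with high probability (the long-path case), and a different action is tried only occasionally (the switch case).

```python
import numpy as np

rng = np.random.default_rng(4)

def dynamic_explore(q_values, prev_action, eps=0.3, p_repeat=0.8):
    # exploit as usual most of the time
    if rng.random() >= eps:
        return int(np.argmax(q_values))
    # when exploring, prefer to continue the previous action ("long-path" states)
    if prev_action is not None and rng.random() < p_repeat:
        return prev_action
    # otherwise fall back to undirected random exploration ("switch" states)
    return int(rng.integers(len(q_values)))
```

In a corridor-like gridworld, where the optimal action rarely changes from state to state, this biases exploratory trajectories toward long straight runs instead of a random walk, which is where the claimed reduction in learning time comes from.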
01 Jan 2006
TL;DR: An effective reinforcement learning algorithm for non-Markovian environments is proposed; it uses linear programming to find the best-response policy and avoids the problem of multiple Nash equilibria.
Abstract: In this paper several multiagent reinforcement learning algorithms are investigated, compared and analyzed. An effective reinforcement learning algorithm for non-Markovian environments is proposed. This algorithm uses linear programming to find the best-response policy and avoids the problem of multiple Nash equilibria. The algorithm involves simple procedures and easy computations, and can guarantee good learning convergence in some situations. Experimental results show that this algorithm is effective. Keywords: multiagent; reinforcement learning; Markov environment; Nash equilibria
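The LP step on its own is standard and can be sketched (the surrounding learning algorithm is not reproduced here, and SciPy is assumed available): the maximin policy of a zero-sum matrix game, i.e. the best response against a worst-case opponent, is the solution of a small linear program.

```python
import numpy as np
from scipy.optimize import linprog

def maximin_policy(A):
    # variables: p[0..m-1] (mixed policy) and v (game value); maximize v
    m, n = A.shape
    c = np.r_[np.zeros(m), -1.0]                  # linprog minimizes, so use -v
    A_ub = np.c_[-A.T, np.ones(n)]                # v - p @ A[:, j] <= 0 for each column j
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)  # policy sums to 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.x[m]

pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])    # matching pennies payoffs
policy, value = maximin_policy(pennies)
```

For matching pennies this returns the uniform mixed policy with game value zero; a single LP like this sidesteps enumerating (possibly multiple) Nash equilibria, which is the advantage the abstract points to.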
••
TL;DR: Experimental results demonstrate that AFC achieves significantly better tracking performance than the linear adaptive controller and that the composite adaptive laws provide a further improvement over the standard adaptive laws.