Showing papers by "Robert Babuska" published in 2008


Journal Article
01 Mar 2008
TL;DR: The benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied, and an outlook for the field is provided.
Abstract: Multiagent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must, instead, discover a solution on their own, using learning. A significant part of the research on multiagent learning concerns reinforcement learning techniques. This paper provides a comprehensive survey of multiagent reinforcement learning (MARL). A central issue in the field is the formal statement of the multiagent learning goal. Different viewpoints on this issue have led to the proposal of many different goals, among which two focal points can be distinguished: stability of the agents' learning dynamics, and adaptation to the changing behavior of the other agents. The MARL algorithms described in the literature aim, either explicitly or implicitly, at one of these two goals or at a combination of both, in a fully cooperative, fully competitive, or more general setting. A representative selection of these algorithms is discussed in detail in this paper, together with the specific issues that arise in each category. Additionally, the benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied. Finally, an outlook for the field is provided.

1,878 citations


Journal Article
TL;DR: In this article, a distributed speed limit control approach based on a distributed controller design technique was developed to eliminate short traffic jams that emerge at bottlenecks and travel in the upstream direction on the freeway.
Abstract: Dynamic speed limits can be used to eliminate shockwaves on freeways. Shockwaves are typically short traffic jams that emerge at bottlenecks and travel in the upstream direction on the freeway. These shockwaves lead to increased travel times and possibly to unsafe situations. A speed limit control approach to resolving shockwaves was developed based on a distributed controller design technique. The controller is distributed in the sense that each speed limit sign has its own controller. The controller parameters are optimized by numerical optimization, assuming that the controller structure and parameters are the same for each controller. The resulting performances are compared for several designs, differing in the controller order and the extent that the upstream and downstream traffic states are used as inputs for the controller. Other controllers known from the literature are based on switching schemes using local information only or are centralized model-based controllers with high computational loads...

41 citations


Journal Article
TL;DR: Experimental results demonstrate that the proposed method improves the controller's tracking performance. Parametric and structural changes are introduced to the controlled plant in order to emphasize the advantages and limitations of the considered adaptive controllers.

40 citations


Journal Article
TL;DR: In this article, a particle filter, based on a first-principles model, estimates the state of the softening reactor and a nonlinear model-predictive controller determines the values of the manipulated variables.

39 citations


Journal Article
TL;DR: The decomposition of a linear process model into a cascade of simpler subsystems, with a Kalman filter estimating the states of each subsystem individually, is proposed. The performance achieved by the cascaded observers is comparable to, and in certain cases even better than, the performance of the centralized observer.

35 citations


Journal Article
TL;DR: The flexibility, scalability, and robustness to errors on a local level are intrinsic properties of swarms that have attracted the interest of researchers in applying swarm technology to various problems.

27 citations


Journal Article
TL;DR: The current operation of the treatment plant of Waternet violates the calculated constraints with consequences for effluent quality and corrective maintenance, and the softening process can thus be improved.

23 citations


Journal Article
TL;DR: In this article, a decentralized feedback controller with a fixed structure was proposed to solve the problem of short traffic jams and reduce the total time spent by 20% compared to the uncontrolled case.

22 citations


Journal Article
TL;DR: In this article, a new method to detect random noise in seismic data using fuzzy Gustafson-Kessel (GK) clustering is presented.
Abstract: We present a new method to detect random noise in seismic data using fuzzy Gustafson–Kessel (GK) clustering. First, using an adaptive distance norm, a matrix is constructed from the observed seismic amplitudes. The next step is to find centres of ellipsoidal clusters and construct a partition matrix which determines the soft decision boundaries between seismic events and random noise. The GK algorithm updates the cluster centres in order to iteratively minimize the cluster variance. Multiplication of the fuzzy membership function with values of each sample yields new sections; we name them 'clustered sections'. The seismic amplitude values of the clustered sections are given in a way to decrease the level of noise in the original noisy seismic input. In pre-stack data, it is essential to study the clustered sections in a f–k domain; finding the quantitative index for weighting the post-stack data needs a similar approach. Using the knowledge of a human specialist together with the fuzzy unsupervised clustering, the method is a semi-supervised random noise detection. The efficiency of this method is investigated on synthetic and real seismic data for both pre- and post-stack data. The results show a significant improvement of the input noisy sections without harming the important amplitude and phase information of the original data. The procedure for finding the final weights of each clustered section should be carefully done in order to keep almost all the evident seismic amplitudes in the output section. The method interactively uses the knowledge of the seismic specialist in detecting the noise.

22 citations


Journal Article
TL;DR: In this article, a feed-forward model-based friction compensation technique using the LuGre friction model is presented, where an off-line method is given to estimate the model's parameters based on simple ramp-response experiments.

19 citations


Proceedings Article
01 Jun 2008
TL;DR: An ACO approach to optimal control is proposed, which requires that a continuous-time, continuous-state model of the system, together with a finite action set, is formulated as a discrete, non-deterministic automaton.
Abstract: Ant Colony Optimization (ACO) has proven to be a very powerful optimization heuristic for Combinatorial Optimization Problems (COPs). It has been demonstrated to work well when applied to various NP-complete problems, such as the traveling salesman problem. In this paper, an ACO approach to optimal control is proposed. This approach requires that a continuous-time, continuous-state model of the system, together with a finite action set, is formulated as a discrete, non-deterministic automaton. The control problem is then translated into a stochastic COP. This method is applied to the time-optimal swing-up and stabilization of a pendulum.

Journal Article
TL;DR: This paper proposes a technique to optimize the shape of a constant number of basis functions for the approximate, fuzzy Q-iteration algorithm, and measures the actual performance of the computed policies in the task, using simulation from a representative set of initial states.

Proceedings Article
01 Jun 2008
TL;DR: This work analyzes the stability of the overall TS system based on the stability of the subsystems and the strength of the interconnection terms, and proposes a decentralized approach to observer design.
Abstract: A large class of nonlinear systems can be well approximated by Takagi-Sugeno (TS) fuzzy models, with linear or affine consequents. It is well-known that the stability of these consequent models does not ensure the stability of the overall fuzzy system. Stability conditions developed for TS fuzzy systems in general rely on the feasibility of an associated system of linear matrix inequalities, whose complexity may grow exponentially with the number of rules. We study distributed systems, where the subsystems are represented as TS fuzzy models. For such systems, a centralized analysis is often unfeasible. We analyze the stability of the overall TS system based on the stability of the subsystems and the strength of the interconnection terms. For naturally distributed applications, such as multi-agent systems, when adding new subsystems "on-line", the construction and tuning of a centralized observer is often intractable. Therefore, we also propose a decentralized approach to observer design. Applications of such systems include distributed process control, traffic networks, and economic systems.

Journal Article
TL;DR: The mathematical model of the elevator system is described in detail, making the system easy to re-implement and re-use, and an experimental comparison is made between the performance of the Q-value iteration and Q-learning RL algorithms, when applied to the elevator system.

Journal Article
TL;DR: In this paper, a decomposition of the nonlinear process model into two simpler subsystems is proposed, and a different type of observer is considered for each subsystem, i.e., a particle filter and an unscented Kalman filter.

Proceedings Article
01 Jun 2008
TL;DR: A general comprehensive swarm framework is introduced and related to the established state of the art, which is a first and important step in the development and analysis of more complex and intelligent swarms.
Abstract: Swarms are characterized by the ability to generate complex behavior from the coupling of simple individuals. While the swarm approach to distributed systems of moving agents is gradually finding a way to engineering applications, a true successful demonstration of an engineered swarm is still missing. One of the reasons for this is the gap between the complexity of the swarms studied in fundamental research and the complexity needed for the application to interesting control problems. In the majority of the research on swarm intelligent systems, the moving agents in the swarm are modeled as simple reactive agents. This model comprises too little intelligence to fully exploit the potential of swarms. In this paper, a general comprehensive swarm framework is introduced and related to the established state of the art. Such a framework is novel and it is a first and important step in the development and analysis of more complex and intelligent swarms.

Proceedings Article
01 Jun 2008
TL;DR: An approximate, model-based Q-iteration algorithm is proposed that relies on a fuzzy partition of the state space and on a discretization of the action space. The resulting algorithm is shown to be consistent, i.e., the optimal solution is obtained asymptotically as the approximation accuracy increases.
Abstract: Reinforcement learning (RL) is a widely used paradigm for learning control. Computing exact RL solutions is generally only possible when process states and control actions take values in a small discrete set. In practice, approximate algorithms are necessary. In this paper, we propose an approximate, model-based Q-iteration algorithm that relies on a fuzzy partition of the state space, and on a discretization of the action space. Using assumptions on the continuity of the dynamics and of the reward function, we show that the resulting algorithm is consistent, i.e., that the optimal solution is obtained asymptotically as the approximation accuracy increases. An experimental study indicates that a continuous reward function is also important for a predictable improvement in performance as the approximation accuracy increases.
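
Illustrative sketch (not taken from the paper): the abstract above describes Q-iteration with a fuzzy partition of the state space and a discretized action set. The Python code below is one minimal way such a scheme might look, assuming triangular membership functions on a one-dimensional state, a deterministic model `f`, and a continuous reward `rho`. The toy dynamics, the reward, and all function names are hypothetical choices made for this illustration only.

```python
import numpy as np


def triangular_memberships(x, centers):
    """Membership degrees of scalar x in triangular fuzzy sets.

    `centers` must be sorted; the returned degrees sum to 1.
    """
    x = float(np.clip(x, centers[0], centers[-1]))
    phi = np.zeros(len(centers))
    j = int(np.searchsorted(centers, x))  # first center >= x
    if j == 0:
        phi[0] = 1.0
        return phi
    left, right = centers[j - 1], centers[j]
    w = (x - left) / (right - left)
    phi[j - 1] = 1.0 - w
    phi[j] = w
    return phi


def fuzzy_q_iteration(f, rho, centers, actions, gamma=0.95,
                      tol=1e-8, max_iter=10_000):
    """Iterate on parameters theta[i, j], one per (fuzzy set, action) pair."""
    n, m = len(centers), len(actions)
    phi_next = np.zeros((n, m, n))  # memberships of each next state
    reward = np.zeros((n, m))
    for i, x in enumerate(centers):
        for j, u in enumerate(actions):
            phi_next[i, j] = triangular_memberships(f(x, u), centers)
            reward[i, j] = rho(x, u)
    theta = np.zeros((n, m))
    for _ in range(max_iter):
        # q_next[i, j, k]: approximate Q-value of action k at state f(x_i, u_j)
        q_next = phi_next @ theta
        theta_new = reward + gamma * q_next.max(axis=2)
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta


def greedy_action(x, theta, centers, actions):
    """Action maximizing the interpolated Q-value at state x."""
    q = triangular_memberships(x, centers) @ theta
    return actions[int(np.argmax(q))]


# Hypothetical toy problem: drive a scalar state toward the origin.
centers = np.linspace(-1.0, 1.0, 21)
actions = np.array([-0.1, 0.0, 0.1])


def f(x, u):
    return float(np.clip(x + u, -1.0, 1.0))  # assumed dynamics


def rho(x, u):
    return -x ** 2  # assumed continuous reward


theta = fuzzy_q_iteration(f, rho, centers, actions)
print(greedy_action(0.5, theta, centers, actions))
```

In this toy setting, refining the grid of `centers` plays the role of the increasing approximation accuracy mentioned in the abstract; the sketch itself makes no claim about the consistency result.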

Journal Article
TL;DR: In this paper, a particle filter is used to improve the accuracy of pH quality measurements in a water treatment plant. The performance of the particle filter was evaluated on both simulated and real-world data.