
Proceedings ArticleDOI

Fictitious play based Markov game control for robotic arm manipulator

01 Oct 2014, pp. 1-6

TL;DR: The work attempts a ‘safe yet consistent’ Markov game controller which advocates a minimax policy during the startup control stages and later moves to a more enterprising policy based on stochastic fictitious play.

Abstract: Markov games can be used as a platform to deal with exogenous disturbances and parametric variations. In this work an attempt has been made to achieve superior performance with fuzzy Markov game based control by hybridizing two game theory based approaches, ‘fictitious play’ and ‘minimax’. The work attempts a ‘safe yet consistent’ Markov game controller which advocates a minimax policy during the startup control stages and later moves to a more enterprising policy based on stochastic fictitious play. The proposed controller addresses continuous state-action space problems, using a fuzzy inference system as a universal approximator for generalization together with proportional-derivative control in the nested position tracking loop. The proposed controller is simulated on a two-link robot and its performance is compared against fuzzy Markov game control and fuzzy Q control. Simulation results show that the proposed control scheme leads to an improved controller with lower tracking error and torque requirements.
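The hybrid policy described in the abstract, safe minimax play during startup followed by a stochastic fictitious-play best response, can be summarized with a short sketch. The snippet below is illustrative only: the function names, the per-state action-value matrix against discrete opponent/disturbance actions, the softmax temperature, and the fixed startup horizon are assumptions, not taken from the paper, which instead generalizes over continuous states with a fuzzy inference system.

```python
import numpy as np

# Illustrative sketch of the 'safe yet consistent' switch: a maximin policy over
# a zero-sum payoff matrix during startup, then a smoothed best response to the
# opponent's empirical play (stochastic fictitious play). q_matrix is assumed to
# be an |A| x |O| table of action values against opponent/disturbance actions.

def maximin_action(q_matrix):
    """Pick the action whose worst-case payoff over opponent actions is largest."""
    worst_case = q_matrix.min(axis=1)          # worst payoff for each own action
    return int(np.argmax(worst_case))

def fictitious_play_action(q_matrix, opponent_counts, temperature=0.1):
    """Smoothed best response to the opponent's empirical action frequencies."""
    beliefs = opponent_counts / opponent_counts.sum()   # empirical mixed strategy
    expected = q_matrix @ beliefs                        # expected payoff per action
    logits = expected / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

def select_action(q_matrix, opponent_counts, step, startup_steps=500):
    """Safe minimax play early on; more enterprising fictitious play afterwards."""
    if step < startup_steps:
        return maximin_action(q_matrix)
    return fictitious_play_action(q_matrix, opponent_counts)
```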



Citations
Journal ArticleDOI
TL;DR: A Nash-based feedback control law is formulated for an Euler–Lagrange system to yield a solution to a noncooperative differential game.
Abstract: We formulate a Nash-based feedback control law for an Euler–Lagrange system to yield a solution to a noncooperative differential game. The robot manipulators are broadly used in industrial units on t...

1 citation


Cites background from "Fictitious play based Markov game control for robotic arm manipulator"

  • ...Sharma and Gopal (2014) tried to achieve a superior performance with fuzzy Markov game-based control by hybridizing two game theory–based approaches of “fictitious play” and “minimax.”...


Book ChapterDOI
22 Feb 2020
Abstract: This paper casts coordination of a team of robots within the framework of game-theoretic learning algorithms. In particular, a novel variant of fictitious play is proposed, in which multi-model adaptive filters are used to estimate the other players’ strategies. The proposed algorithm can be used as a coordination mechanism between players when they must make decisions under uncertainty. Each player chooses an action after taking into account the actions of the other players as well as the uncertainty, which can arise either from noisy observations or from uncertainty about the types of the other players. In addition, in contrast to other game-theoretic and heuristic algorithms for distributed optimisation, it is not necessary to find the optimal parameters a priori; various parameter values can be used initially as inputs to different models, so the resulting decisions aggregate the results over all parameter values. Simulations are used to test the performance of the proposed methodology against other game-theoretic learning algorithms.
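As a rough illustration of the multi-model idea described in this abstract, the sketch below keeps several candidate models of an opponent's mixed strategy, reweights them by the likelihood of each observed action, and best-responds to the averaged belief. It is an assumption-laden simplification: the paper uses multi-model adaptive filters, whereas this sketch uses plain Bayesian reweighting of fixed candidate strategies.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): several candidate models of
# an opponent's mixed strategy are reweighted by the likelihood of each observed
# action, and the player best-responds to the posterior-averaged belief.

def update_model_weights(weights, models, observed_action):
    """Bayesian reweighting of candidate opponent models after one observation."""
    likelihoods = np.array([m[observed_action] for m in models])
    weights = weights * likelihoods
    return weights / weights.sum()

def best_response(payoff_matrix, weights, models):
    """Best response to the weighted mixture of candidate opponent strategies."""
    belief = np.average(models, axis=0, weights=weights)   # averaged mixed strategy
    return int(np.argmax(payoff_matrix @ belief))

# Example: two candidate models of a 3-action opponent, uniform prior.
models = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.4, 0.4]])
weights = np.ones(len(models)) / len(models)
weights = update_model_weights(weights, models, observed_action=0)
```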

References
Journal ArticleDOI
TL;DR: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior – proves their convergence and optimality for special cases, and relates them to supervised-learning methods.
Abstract: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.

4,474 citations
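The core idea of this reference, assigning credit through the difference between temporally successive predictions, reduces to a one-line update. The sketch below is a standard TD(0) prediction step with illustrative step-size and discount parameters; it is not code from the article.

```python
# Minimal TD(0) prediction sketch: move the value estimate of the current state
# toward the temporally successive prediction r + gamma * V(s') instead of
# waiting for the final outcome. 'values' is assumed to be a mutable table.

def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.95):
    """One temporal-difference update of V(state)."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error
```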

Book
01 Jan 1998
Abstract: In economics, most noncooperative game theory has focused on equilibrium in games, especially Nash equilibrium and its refinements. The traditional explanation for when and why equilibrium arises is that it results from analysis and introspection by the players in a situation where the rules of the game, the rationality of the players, and the players' payoff functions are all common knowledge. Both conceptually and empirically, this theory has many problems. In The Theory of Learning in Games Drew Fudenberg and David Levine develop an alternative explanation that equilibrium arises as the long-run outcome of a process in which less than fully rational players grope for optimality over time. The models they explore provide a foundation for equilibrium theory and suggest useful ways for economists to evaluate and modify traditional equilibrium concepts.

3,251 citations

Book ChapterDOI
10 Jul 1994
TL;DR: A Q-learning-like algorithm for finding optimal policies and its application to a simple two-player game in which the optimal policy is probabilistic is demonstrated.
Abstract: In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function. In this solipsistic view, secondary agents can only be part of the environment and are therefore fixed in their behavior. The framework of Markov games allows us to widen this view to include multiple adaptive agents with interacting or competing goals. This paper considers a step in this direction in which exactly two agents with diametrically opposed goals share an environment. It describes a Q-learning-like algorithm for finding optimal policies and demonstrates its application to a simple two-player game in which the optimal policy is probabilistic.

2,171 citations
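The Q-learning-like algorithm in this reference computes, at each state, a maximin mixed policy over the opponent's actions, which is typically solved as a small linear program. The sketch below is an illustrative version of that value step (the matrix layout and the use of scipy.optimize.linprog are assumptions, not the paper's implementation); the learning update then moves Q(s, a, o) toward r + gamma * V(s').

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative minimax value step for a minimax-Q style learner: given the
# state's payoff matrix q_matrix of shape |A| x |O|, solve a linear program
# for the maximin mixed policy and the corresponding game value.

def minimax_value(q_matrix):
    n_actions, n_opponent = q_matrix.shape
    # Decision variables: pi_1..pi_n (mixed policy) and v (game value); maximize v.
    c = np.zeros(n_actions + 1)
    c[-1] = -1.0
    # For every opponent action o: v - sum_a pi(a) * Q[a, o] <= 0.
    A_ub = np.hstack([-q_matrix.T, np.ones((n_opponent, 1))])
    b_ub = np.zeros(n_opponent)
    # The policy must sum to one; v is not part of this constraint.
    A_eq = np.ones((1, n_actions + 1))
    A_eq[0, -1] = 0.0
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_actions + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    policy, value = res.x[:-1], res.x[-1]
    return policy, value
```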

BookDOI
01 Jan 2004
TL;DR: This chapter discusses reinforcement learning in large, high-dimensional state spaces, model-based adaptive critic designs, and applications of approximate dynamic programming in power systems control.
Abstract: Foreword. 1. ADP: goals, opportunities and principles. Part I: Overview. 2. Reinforcement learning and its relationship to supervised learning. 3. Model-based adaptive critic designs. 4. Guidance in the use of adaptive critics for control. 5. Direct neural dynamic programming. 6. The linear programming approach to approximate dynamic programming. 7. Reinforcement learning in large, high-dimensional state spaces. 8. Hierarchical decision making. Part II: Technical advances. 9. Improved temporal difference methods with linear function approximation. 10. Approximate dynamic programming for high-dimensional resource allocation problems. 11. Hierarchical approaches to concurrency, multiagency, and partial observability. 12. Learning and optimization - from a system theoretic perspective. 13. Robust reinforcement learning using integral-quadratic constraints. 14. Supervised actor-critic reinforcement learning. 15. BPTT and DAC - a common framework for comparison. Part III: Applications. 16. Near-optimal control via reinforcement learning. 17. Multiobjective control problems by reinforcement learning. 18. Adaptive critic based neural network for control-constrained agile missile. 19. Applications of approximate dynamic programming in power systems control. 20. Robust reinforcement learning for heating, ventilation, and air conditioning control of buildings. 21. Helicopter flight control using direct neural dynamic programming. 22. Toward dynamic stochastic optimal power flow. 23. Control, optimization, security, and self-healing of benchmark power systems.

752 citations

Proceedings ArticleDOI
01 Jul 1997
TL;DR: This paper proposes an adaptation of Watkins' Q-learning for fuzzy inference systems where both the actions and the Q-functions are inferred from fuzzy rules, and demonstrates its effectiveness against a genetic algorithm on the cart-centering problem.
Abstract: This paper proposes an adaptation of Watkins' Q-learning (1989, 1992) for fuzzy inference systems where both the actions and the Q-functions are inferred from fuzzy rules. The approach is compared with a genetic algorithm on the cart-centering problem, showing its effectiveness.

155 citations
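In fuzzy Q-learning of the kind this reference describes, each fuzzy rule keeps a q-value for every candidate discrete action, the inferred action and Q-value are firing-strength weighted combinations of the rules' choices, and the TD error is distributed back to the rules. The sketch below captures that structure with assumed array shapes and an epsilon-greedy choice per rule; it is not the authors' code.

```python
import numpy as np

# Illustrative fuzzy Q-learning skeleton: q[i, a] is the value of discrete
# action a under fuzzy rule i, and 'firing' holds the rules' firing strengths
# for the current (fuzzified) state.

def infer(q, firing, epsilon=0.1):
    """Pick a candidate action per rule (epsilon-greedy) and blend by firing strength."""
    n_rules, n_actions = q.shape
    chosen = np.where(np.random.rand(n_rules) < epsilon,
                      np.random.randint(n_actions, size=n_rules),
                      q.argmax(axis=1))
    q_value = np.sum(firing * q[np.arange(n_rules), chosen]) / firing.sum()
    return chosen, q_value

def update(q, firing, chosen, td_error, alpha=0.1):
    """Distribute the TD error across rules in proportion to their firing strengths."""
    n_rules = q.shape[0]
    q[np.arange(n_rules), chosen] += alpha * td_error * firing / firing.sum()
```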