Proceedings ArticleDOI: 10.1109/ICRITO.2014.7014718

Fictitious play based Markov game control for robotic arm manipulator

01 Oct 2014 - pp. 1-6
Abstract: Markov games can be used as a platform to deal with exogenous disturbances and parametric variations. In this work an attempt has been made to achieve superior performance with fuzzy Markov game based control by hybridizing two game-theoretic approaches: 'fictitious play' and 'minimax'. The work proposes a 'safe yet consistent' Markov game controller that follows a minimax policy during the startup stages of control and later moves to a more enterprising policy based on stochastic fictitious play. The proposed controller addresses continuous state-action space problems, using a fuzzy inference system as a universal approximator for generalization together with a proportional-derivative control in the nested position tracking loop. The proposed controller is simulated on a two-link robot and its performance is compared against fuzzy Markov game control and fuzzy Q control. Simulation results show that the proposed control scheme leads to an improved controller with lower tracking error and torque requirements.
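The abstract's central idea is a policy switch: act conservatively (minimax) while estimates are poor, then best-respond to an empirical model of the disturbance (stochastic fictitious play). The sketch below illustrates only that switching logic over a per-state game matrix; it is not the authors' controller, drops the fuzzy generalization and PD loop entirely, and all names, the Q-matrix layout, and the episode-based switching rule are assumptions for illustration.

```python
import numpy as np

def minimax_action(Q_sa_o):
    """Pick the action maximizing the worst-case value over opponent (disturbance) moves.
    Q_sa_o: array of shape (n_actions, n_opponent_actions)."""
    worst_case = Q_sa_o.min(axis=1)          # value if the disturbance is adversarial
    return int(np.argmax(worst_case))

def fictitious_play_action(Q_sa_o, opponent_counts, temperature=1.0):
    """Best-respond to a smoothed empirical model of the opponent (stochastic
    fictitious play): beliefs are normalized action counts, softened by a softmax."""
    beliefs = (opponent_counts + 1e-6) / (opponent_counts.sum() + 1e-6 * len(opponent_counts))
    expected = Q_sa_o @ beliefs               # expected value of each action under the beliefs
    logits = expected / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

def select_action(Q_sa_o, opponent_counts, episode, switch_episode=50):
    """'Safe yet consistent': minimax early on (safety), fictitious play afterwards."""
    if episode < switch_episode:
        return minimax_action(Q_sa_o)
    return fictitious_play_action(Q_sa_o, opponent_counts)
```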


Topics: Markov process (61%), Markov decision process (60%), Minimax (60%)
Citations

Journal ArticleDOI: 10.1177/1077546320982447
Abstract: We formulate a Nash-based feedback control law for an Euler–Lagrange system to yield a solution to a noncooperative differential game. Robot manipulators are broadly used in industrial units on t...


Topics: Differential game (60%), Nash equilibrium (55%)

1 Citation


Book ChapterDOI: 10.1007/978-3-030-71158-0_4
22 Feb 2020
Abstract: This paper casts coordination of a team of robots within the framework of game-theoretic learning algorithms. In particular, a novel variant of fictitious play is proposed in which multi-model adaptive filters are used to estimate the other players' strategies. The proposed algorithm can be used as a coordination mechanism between players when they must take decisions under uncertainty. Each player chooses an action after taking into account the actions of the other players and also the uncertainty, which can occur either as noisy observations or as different types of other players. In addition, in contrast to other game-theoretic and heuristic algorithms for distributed optimisation, it is not necessary to find the optimal parameters a priori: various parameter values can be used initially as inputs to different models, so the resulting decisions are aggregate results over all the parameter values. Simulations are used to test the performance of the proposed methodology against other game-theoretic learning algorithms.
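As a rough illustration of the idea (not the paper's algorithm), one way to combine several models of an opponent's mixed strategy is to weight each estimator by how well it predicted past actions and best-respond to the weighted mixture. The class, the smoothing parameters, and the Bayes-like re-weighting below are illustrative assumptions only.

```python
import numpy as np

class MultiModelOpponentEstimator:
    """Several candidate estimators of an opponent's strategy, each with its own
    forgetting factor, combined by weights proportional to predictive likelihood."""

    def __init__(self, n_actions, smoothing_params=(0.05, 0.2, 0.5)):
        self.params = smoothing_params
        self.estimates = [np.full(n_actions, 1.0 / n_actions) for _ in self.params]
        self.weights = np.full(len(self.params), 1.0 / len(self.params))

    def update(self, observed_action):
        # re-weight models by how likely each made the observed action
        likelihoods = np.array([est[observed_action] for est in self.estimates])
        self.weights *= likelihoods + 1e-12
        self.weights /= self.weights.sum()
        # each model tracks the empirical frequency with its own forgetting factor
        for est, lam in zip(self.estimates, self.params):
            onehot = np.zeros_like(est)
            onehot[observed_action] = 1.0
            est += lam * (onehot - est)

    def belief(self):
        """Aggregate strategy estimate: the weight-averaged model outputs."""
        return sum(w * est for w, est in zip(self.weights, self.estimates))
```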


Topics: Fictitious play (56%), Heuristic (52%), Adaptive filter (51%)
References

Open access Journal ArticleDOI: 10.1023/A:1022633531479
01 Aug 1988 - Machine Learning
Abstract: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.
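The "difference between temporally successive predictions" idea is captured by the textbook TD(0) update, V(s) ← V(s) + α[r + γV(s') − V(s)]. The sketch below shows that update; the environment interface (reset/step returning next state, reward, done flag) and parameter names are assumptions, not taken from the article.

```python
import numpy as np

def td0_prediction(env, policy, n_states, n_episodes=500, alpha=0.1, gamma=0.99):
    """Estimate state values for a fixed policy by nudging each estimate toward
    reward plus the *next* state's estimate (credit from successive predictions)."""
    V = np.zeros(n_states)
    for _ in range(n_episodes):
        s = env.reset()
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            target = r + (0.0 if done else gamma * V[s_next])
            V[s] += alpha * (target - V[s])   # temporal-difference update
            s = s_next
    return V
```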


4,474 Citations


Open access Book
01 Jan 1998
Abstract: In economics, most noncooperative game theory has focused on equilibrium in games, especially Nash equilibrium and its refinements. The traditional explanation for when and why equilibrium arises is that it results from analysis and introspection by the players in a situation where the rules of the game, the rationality of the players, and the players' payoff functions are all common knowledge. Both conceptually and empirically, this theory has many problems. In The Theory of Learning in Games Drew Fudenberg and David Levine develop an alternative explanation that equilibrium arises as the long-run outcome of a process in which less than fully rational players grope for optimality over time. The models they explore provide a foundation for equilibrium theory and suggest useful ways for economists to evaluate and modify traditional equilibrium concepts.


Topics: Equilibrium selection (75%), Self-confirming equilibrium (71%), Symmetric equilibrium (69%)

3,251 Citations


Open access Book ChapterDOI: 10.1016/B978-1-55860-335-6.50027-1
Michael L. Littman
10 Jul 1994
Abstract: In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function. In this solipsistic view, secondary agents can only be part of the environment and are therefore fixed in their behavior. The framework of Markov games allows us to widen this view to include multiple adaptive agents with interacting or competing goals. This paper considers a step in this direction in which exactly two agents with diametrically opposed goals share an environment. It describes a Q-learning-like algorithm for finding optimal policies and demonstrates its application to a simple two-player game in which the optimal policy is probabilistic.


  • Figure 1: The minimax-Q algorithm.
  • Table 3: Results for policies trained by minimax-Q (MR and MM) and Q-learning (QR and QQ).
  • Table 2: Linear constraints on the solution to a matrix game.
  • Figure 2: An initial board (left) and a situation requiring a probabilistic choice for A (right).
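Figure 1 above refers to the minimax-Q algorithm; a minimal tabular sketch of the corresponding update for a two-player zero-sum game is given below. The linear-program formulation of the matrix-game value and all variable names are illustrative assumptions, not the paper's code.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value and maximin mixed strategy of the zero-sum matrix game M
    (rows: our actions, columns: opponent actions)."""
    n = M.shape[0]
    c = np.zeros(n + 1)
    c[-1] = -1.0                                      # variables [p_1..p_n, v]; maximize v
    A_ub = np.hstack([-M.T, np.ones((M.shape[1], 1))])  # v - p^T M[:, col] <= 0 for every column
    b_ub = np.zeros(M.shape[1])
    A_eq = np.ones((1, n + 1))
    A_eq[0, -1] = 0.0                                 # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:n]

def minimax_q_update(Q, V, s, a, o, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular step; Q is indexed by (state, our action, opponent action)."""
    Q[s, a, o] += alpha * (r + gamma * V[s_next] - Q[s, a, o])
    V[s], _ = matrix_game_value(Q[s])                 # state value = value of the matrix game at s
    return Q, V
```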

2,171 Citations


BookDOI: 10.1109/9780470544785
01 Jan 2004
Abstract: Foreword. 1. ADP: goals, opportunities and principles. Part I: Overview. 2. Reinforcement learning and its relationship to supervised learning. 3. Model-based adaptive critic designs. 4. Guidance in the use of adaptive critics for control. 5. Direct neural dynamic programming. 6. The linear programming approach to approximate dynamic programming. 7. Reinforcement learning in large, high-dimensional state spaces. 8. Hierarchical decision making. Part II: Technical advances. 9. Improved temporal difference methods with linear function approximation. 10. Approximate dynamic programming for high-dimensional resource allocation problems. 11. Hierarchical approaches to concurrency, multiagency, and partial observability. 12. Learning and optimization - from a system theoretic perspective. 13. Robust reinforcement learning using integral-quadratic constraints. 14. Supervised actor-critic reinforcement learning. 15. BPTT and DAC - a common framework for comparison. Part III: Applications. 16. Near-optimal control via reinforcement learning. 17. Multiobjective control problems by reinforcement learning. 18. Adaptive critic based neural network for control-constrained agile missile. 19. Applications of approximate dynamic programming in power systems control. 20. Robust reinforcement learning for heating, ventilation, and air conditioning control of buildings. 21. Helicopter flight control using direct neural dynamic programming. 22. Toward dynamic stochastic optimal power flow. 23. Control, optimization, security, and self-healing of benchmark power systems.


Topics: Reinforcement learning (70%), Learning classifier system (67%), Inductive programming (64%)

752 Citations


Proceedings ArticleDOI: 10.1109/FUZZY.1997.622790
P.Y. Glorennec, Lionel Jouffe
01 Jul 1997
Abstract: This paper proposes an adaptation of Watkins' Q-learning (1989, 1992) for fuzzy inference systems in which both the actions and the Q-functions are inferred from fuzzy rules. The approach is compared with a genetic algorithm on the cart-centering problem, showing its effectiveness.
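A common way to realize this idea (sketched below as an assumption-laden illustration, not the paper's exact formulation) is to keep a q-value per fuzzy rule and per discrete action, combine them by normalized rule firing strengths, and spread the TD error back to the rules that fired.

```python
import numpy as np

def fuzzy_q_step(q, phi, phi_next, action_per_rule, reward, alpha=0.1, gamma=0.95):
    """One fuzzy Q-learning step.
    q: (n_rules, n_actions) rule-level q-table.
    phi, phi_next: normalized firing strengths of the rules for the current/next state.
    action_per_rule: action index chosen in each rule (e.g. epsilon-greedy on q[i])."""
    rules = np.arange(len(phi))
    # Q of the action actually taken: firing-strength-weighted rule q-values
    Q_taken = np.sum(phi * q[rules, action_per_rule])
    # greedy value of the next state under the same rule decomposition
    Q_next = np.sum(phi_next * q.max(axis=1))
    td_error = reward + gamma * Q_next - Q_taken
    # distribute the correction to each rule in proportion to its firing strength
    q[rules, action_per_rule] += alpha * td_error * phi
    return q, td_error
```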


Topics: Fuzzy set operations (69%), Fuzzy number (68%), Adaptive neuro fuzzy inference system (68%)

155 Citations


Performance Metrics
No. of citations received by the paper in previous years:
Year | Citations
2021 | 1
2020 | 1