Journal Article, DOI: 10.1504/IJAAC.2010.035528

A fuzzy decision tree-based robust Markov game controller for robot manipulators

01 Oct 2010-International Journal of Automation and Control (Inderscience Publishers)-Vol. 4, Iss: 4, pp 417-439
Abstract: The two-player zero-sum Markov game framework offers an effective platform for designing robust controllers. In Markov game-based learning, theoretical convergence of the learning process with a function approximator cannot be guaranteed. However, fusing Q-learning with a decision tree (DT) function approximator has shown good learning performance and more reliable convergence. It scales better to larger input spaces with lower memory requirements, and can solve problems that are infeasible using table lookup. This motivates us to introduce a DT function approximator into the Markov game reinforcement learning (RL) framework. This approach works, though it deals with only discrete actions. In realistic applications, it is imperative to deal with continuous state-action spaces. In this paper, we propose a Markov game framework for continuous state-action space systems using a fuzzy DT as a function approximator. Simulation experiments on a two-link robot manipulator bring out the importance of the proposed structure in terms of better robust performance and computational efficiency.
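
As a rough illustration of the zero-sum Markov game idea in the abstract above, the following minimal sketch shows a tabular Q-update in which the controller maximizes against a worst-case disturber. It is not the paper's fuzzy decision tree method: it assumes small discrete state, action and disturbance sets, restricts both players to pure strategies (the general minimax-Q formulation uses mixed strategies), and all names (`maximin_value`, `markov_game_q_update`, `alpha`, `gamma`) are hypothetical.

```python
from collections import defaultdict


def maximin_value(Q, state, actions, disturbances):
    """Worst-case value of a state: the best controller action
    evaluated against the most harmful disturbance (pure strategies only)."""
    return max(
        min(Q[(state, a, d)] for d in disturbances)
        for a in actions
    )


def markov_game_q_update(Q, s, a, d, reward, s_next,
                         actions, disturbances, alpha=0.5, gamma=0.9):
    """One temporal-difference update of Q(s, a, d).

    Assumption: the bootstrap target uses the pure-strategy maximin
    value of the next state instead of a mixed-strategy solution.
    """
    target = reward + gamma * maximin_value(Q, s_next, actions, disturbances)
    Q[(s, a, d)] += alpha * (target - Q[(s, a, d)])
    return Q[(s, a, d)]


# Illustrative usage with made-up numbers.
Q = defaultdict(float)          # unseen (state, action, disturbance) entries start at 0
actions = [0, 1]                # controller's choices
disturbances = [0, 1]           # disturber's choices
Q[("s1", 0, 0)], Q[("s1", 0, 1)] = 1.0, 3.0
Q[("s1", 1, 0)], Q[("s1", 1, 1)] = 2.0, 2.5
new_q = markov_game_q_update(Q, "s0", 0, 1, reward=1.0, s_next="s1",
                             actions=actions, disturbances=disturbances)
```

Replacing the table `Q` with a function approximator, as the abstract describes, would change how values are stored and generalized but not the max-min structure of the target.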

Topics: Markov process (61%), Reinforcement learning (59%), Game theory (54%)
Citations

Open access, Journal Article, DOI: 10.1016/J.IFACOL.2016.03.034
Hitesh Shah, Madan Gopal
01 Jan 2016-IFAC-PapersOnLine
Abstract: Model predictive control (MPC) is a model-based control philosophy in which the current control action is obtained by on-line optimization of an objective function. MPC is, by now, considered to be a mature technology owing to the plethora of research and industrial process control applications. The model under consideration is either linear or piecewise linear. For nonlinear processes, however, the difficulties lie in obtaining a good nonlinear model and in the excessive computational burden associated with the control optimization. The proposed framework, named model-free predictive control (MFPC), addresses both of these issues of conventional MPC. Model-free reinforcement learning formulates the predictive control problem with a control horizon of length one, but takes a decision based on infinite-horizon information. To facilitate generalization in continuous state and action spaces, a fuzzy inference system is used as a function approximator in conjunction with Q-learning. An empirical study on a continuous stirred tank reactor shows that the MFPC reinforcement learning framework is efficient and strongly robust.

Topics: Model predictive control (64%), Reinforcement learning (59%), Q-learning (56%)

17 Citations


Open access, Journal Article, DOI: 10.3390/SU13126689
12 Jun 2021-Sustainability
Abstract: Sustainability improvements in industrial production are essential for tackling climate change and the resulting ecological crisis. In this context, resource efficiency can directly lead to significant advancements in the ecological performance of manufacturing companies. The application of Artificial Intelligence (AI) also plays an increasingly important role. However, the potential influence of AI applications on resource efficiency has not been investigated. Against this background, this article provides an overview of the current AI applications and how they affect resource efficiency. In line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, this paper identifies, categorizes, and analyzes seventy papers with a focus on AI tasks, AI methods, business units, and their influence on resource efficiency. Only a minority of papers was found to address resource efficiency as an explicit objective. Subsequently, typical use cases of the identified AI applications are described with a focus on predictive maintenance, production planning, fault detection and predictive quality, as well as the increase in energy efficiency. In general, more research is needed that explicitly considers sustainability in the development and use phase of AI solutions, including Green AI. This paper contributes to research in this field by systematically examining papers and revealing research deficits. Additionally, practitioners are offered the first indications of AI applications increasing resource efficiency.

2 Citations


Open access, Journal Article, DOI: 10.1016/J.PROTCY.2016.03.026
Hitesh Shah, Madan Gopal
Abstract: This paper proposes a game-theoretic approach to reinforcement learning based controller design that uses a kernel recursive least squares algorithm for value function approximation. A kernel recursive least-squares support vector machine is used to realize a mapping from the state, the controller's action and the disturber's action to the Q-value function. An online sparsification framework permits the addition of a training sample to the Q-function approximation only if it is approximately linearly independent of the preceding training samples. The Markov game setup is shown to be one of the important platforms for addressing robustness of direct adaptive optimal control of nonlinear systems. A game against nature strategy shows the strength of state importance in terms of accelerated learning and better relative stability of the system. Simulation results on a two-link robot manipulator show that the proposed method has high learning efficiency: better accuracy, measured in terms of mean square error, and less computation time, compared to the least-squares support vector machine.
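
The sparsification rule mentioned in this abstract (admit a sample only if it is approximately linearly independent of those already kept) can be sketched as follows. This is a minimal illustration of the standard approximate linear dependence test from the kernel recursive least squares literature, not the paper's implementation. The Gaussian kernel, its `width`, the `threshold` value and all function names are assumptions, and the kernel matrix is rebuilt and solved from scratch for clarity rather than updated recursively.

```python
import math


def rbf(x, y, width=1.0):
    """Gaussian kernel between two equal-length sequences (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def ald_residual(dictionary, x, kernel=rbf):
    """Squared feature-space distance from x to the span of the dictionary."""
    if not dictionary:
        return kernel(x, x)
    K = [[kernel(u, v) for v in dictionary] for u in dictionary]
    k_vec = [kernel(u, x) for u in dictionary]
    coeffs = solve(K, k_vec)
    return kernel(x, x) - sum(c * k for c, k in zip(coeffs, k_vec))


def sparsify(samples, threshold=0.1, kernel=rbf):
    """Keep a sample only if its residual exceeds the (assumed) threshold."""
    dictionary = []
    for x in samples:
        if ald_residual(dictionary, x, kernel) > threshold:
            dictionary.append(x)
    return dictionary


# Illustrative data: a duplicate and a near-duplicate should be rejected.
kept = sparsify([(0.0, 0.0), (0.0, 0.0), (5.0, 5.0), (0.01, 0.0)])
```

A larger `threshold` yields a smaller dictionary at the cost of approximation accuracy; the value used here is arbitrary.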

1 Citations


Book Chapter, DOI: 10.1007/978-3-319-07353-8_11
Hitesh Shah, Madan Gopal
01 Jan 2014-
Abstract: Reinforcement learning (RL) algorithms that employ neural networks as function approximators have proven to be powerful tools for solving optimal control problems. However, neural network function approximators suffer from a number of problems: learning becomes difficult when the training data are given sequentially, structural parameters are difficult to determine, and training usually results in local minima or overfitting. In this paper, a novel on-line sequential learning evolving neural network model design for RL is proposed. We explore the use of the minimal resource allocation neural network (mRAN), and develop an mRAN function approximation approach to RL systems. The potential of this approach is demonstrated through a case study. The mean square error accuracy, computational cost, and robustness properties of this scheme are compared with static structure neural networks.

References

Open access, Book
Richard S. Sutton, Andrew G. Barto
01 Jan 1988-
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.

Topics: Learning classifier system (69%), Reinforcement learning (69%), Apprenticeship learning (65%)

32,257 Citations


Open access, Book
J. Ross Quinlan
15 Oct 1992-
Abstract: From the Publisher: Classifier systems play a major role in machine learning and knowledge-based systems, and Ross Quinlan's work on ID3 and C4.5 is widely acknowledged to have made some of the most significant contributions to their development. This book is a complete guide to the C4.5 system as implemented in C for the UNIX environment. It contains a comprehensive guide to the system's use, the source code (about 8,800 lines), and implementation notes. The source code and sample datasets are also available on a 3.5-inch floppy diskette for a Sun workstation. C4.5 starts with large sets of cases belonging to known classes. The cases, described by any mixture of nominal and numeric properties, are scrutinized for patterns that allow the classes to be reliably discriminated. These patterns are then expressed as models, in the form of decision trees or sets of if-then rules, that can be used to classify new cases, with emphasis on making the models understandable as well as accurate. The system has been applied successfully to tasks involving tens of thousands of cases described by hundreds of properties. The book starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting. Advantages and disadvantages of the C4.5 approach are discussed and illustrated with several case studies. This book and software should be of interest to developers of classification-based intelligent systems and to students in machine learning and expert systems courses.

Topics: ID3 algorithm (55%), Intelligent decision support system (55%), C4.5 algorithm (55%)

21,396 Citations


Open access
Steven L. Salzberg, Alberto Segre
01 Jan 1994-
Abstract: Algorithms for constructing decision trees are among the most well known and widely used of all machine learning methods. Among decision tree algorithms, J. Ross Quinlan's ID3 and its successor, C4.5, are probably the most popular in the machine learning community. These algorithms and variations on them have been the subject of numerous research papers since Quinlan introduced ID3. Until recently, most researchers looking for an introduction to decision trees turned to Quinlan's seminal 1986 Machine Learning journal article [Quinlan, 1986]. In his new book, C4.5: Programs for Machine Learning, Quinlan has put together a definitive, much needed description of his complete system, including the latest developments. As such, this book will be a welcome addition to the library of many researchers and students.

Topics: Active learning (machine learning) (64%), Robot learning (63%), Instance-based learning (62%)

7,843 Citations


Open access
01 Jan 1989-

4,910 Citations


Open access, Journal Article, DOI: 10.1016/S0019-9958(62)90649-6
Aiko M. Hormann
Abstract: This paper reports on a proposed schema and gives some detailed specifications for constructing a learning system by means of programming a computer. We have tried to separate learning processes and problem-solving techniques from specific problem content in order to achieve generality, i.e., in order to achieve a system capable of performing in a wide variety of learning and problem-solving situations. Behavior of the system is determined by both a direct and an indirect means. The former involves detailed, explicit specification of responses or response patterns in the form of built-in programs. The indirect means is by programs representing three mechanisms: a “community unit” (a program-providing mechanism), a planning mechanism, and an induction mechanism. These mechanisms have in common the following features: (1) a directly given repertory of response patterns; (2) general and less explicitly specified decision making rules and hierarchically distributed authority for decision making; (3) an ability to delegate some control over the system's behavior to the environment; and (4) a self-modifying ability which allows the decision-making rules and the repertory of response patterns to adapt and grow. In Part I of this paper, the community unit is described and an illustration of its operation is given. It is presented in a schematized framework as a team of routines connected by first and second-order feedback loops. The function of the community unit is to provide higher-level programs (its environment or customers) with programs capable of performing requested tasks, or to perform a customer-stipulated task by executing a program. If the community unit does not have a ready-made program in stock to fill a particular request, internal programming will be performed, i.e., the community unit will have to construct a program, and debug it, before outputting or executing it. 
The primary purpose of internal programming is to assist higher-level programs in performing tasks for which detailed preplanning by an external programmer is either impossible or impractical. Some heuristics are suggested for enabling the community unit to search for a usable sequence of operations more efficiently than if it were to search simply by exhaustive or random trial and error. These heuristics are of a step-by-step nature. For complex problems, however, such step-by-step heuristics alone will fail unless there is also a mechanism for analyzing problem structure and placing guideposts on the road to the goal. A planning mechanism capable of doing this is proposed in Part II. Under the control of a higher-level program which specifies the level of detail required in a plan being developed, this planning mechanism is to break up problems into a hierarchy of subproblems each by itself presumably easier to solve than the original problem. To manage classes of problems and to make efficient use of past experience, an induction mechanism is proposed in Part II. An illustration is given of the induction mechanism solving a specific sequence of tasks. The system is currently being programmed and tested in IPL-V on the Philco 2000 computer. The current stage of the programming effort is reported in an epilogue to Part II.

Topics: Heuristics (56%), Trial and error (50%)

3,715 Citations


Performance
Metrics
No. of citations received by the Paper in previous years
Year    Citations
2021    1
2016    2
2014    1