Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

Machine learning

Model-based Bayesian Reinforcement Learning (BRL) provides a principled solution to dealing with the exploration-exploitation trade-off, but such methods typically assume a fully observable environments. The few Bayesian RL methods that are applicable in partially observable domains, such as the Bayes-Adaptive POMDP (BA-POMDP), scale poorly. To address this issue, we introduce the Factored BA-POMDP model (FBA-POMDP), a framework that is able to learn a compact model of the dynamics by exploiting the underlying structure of a POMDP. The FBA-POMDP framework casts the problem as a planning task, for which we adapt the Monte-Carlo Tree Search planning algorithm and develop a belief tracking method to approximate the joint posterior over the state and model variables. Our empirical results show that this method outperforms a number of BRL baselines and is able to learn efficiently when the factorization is known, as well as learn both the factorization and the model parameters simultaneously.

/pdf/bayesian-reinforcement-learning-in-factored-pomdps-xpra8a8w97.pdf

Bayesian Reinforcement Learning in Factored POMDPs

We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination or dominated strategies in normal form games. We prove that when applied to finite-horizon POSGs, the algorithm iteratively eliminates very weakly dominated strategies without first forming a normal form representation of the game. For the special case in which agents share the same payoffs, the algorithm can be used to find an optimal solution. We present preliminary empirical results and discuss ways to further exploit POMDP theory in solving POSGs.

/pdf/dynamic-programming-for-partially-observable-stochastic-3qqaubhdyh.pdf

Dynamic programming for partially observable stochastic games

Autonomous AI systems need complex computational techniques for planning and performing actions. Planning and acting require significant deliberation because an intelligent system must coordinate and integrate these activities in order to act effectively in the real world. This book presents a comprehensive paradigm of planning and acting using the most recent and advanced automated-planning techniques. It explains the computational deliberation capabilities that allow an actor, whether physical or virtual, to reason about its actions, choose them, organize them purposefully, and act deliberately to achieve an objective. Useful for students, practitioners, and researchers, this book covers state-of-the-art planning techniques, acting techniques, and their integration which will allow readers to design intelligent systems that are able to act effectively in the real world.

Automated Planning and Acting

Decentralized control of cooperative systems captures the operation of a group of decision-makers that share a single global objective. The difficulty in solving optimally such problems arises when the agents lack full observability of the global state of the system when they operate. The general problem has been shown to be NEXP-complete. In this paper, we identify classes of decentralized control problems whose complexity ranges between NEXP and P. In particular, we study problems characterized by independent transitions, independent observations, and goal-oriented objective functions. Two algorithms are shown to solve optimally useful classes of goal-oriented decentralized processes in polynomial time. This paper also studies information sharing among the decision-makers, which can improve their performance. We distinguish between three ways in which agents can exchange information: indirect communication, direct communication and sharing state features that are not controlled by the agents. Our analysis shows that for every class of problems we consider, introducing direct or indirect communication does not change the worst-case complexity. The results provide a better understanding of the complexity of decentralized control problems that arise in practice and facilitate the development of planning algorithms for these problems.

/pdf/decentralized-control-of-cooperative-systems-categorization-323p6pd20w.pdf

Decentralized control of cooperative systems: categorization and complexity analysis

We describe a plnning algorithm that integrates two approaches to solving Markov decision processes with large state spaces. State abstraction is used to avoid evaluating states individually. Forward search from a start state, guided by an admissible heuristic, is used to avoid evaluating all states. We combine these two approaches in a novel way that exploits symbolic model-checking techniques and demonstrates their usefulness for decision-theoretic planning.

/pdf/symbolic-heuristic-search-for-factored-markov-decision-19zxy54roa.pdf

Symbolic heuristic search for factored Markov decision processes

We describe an approach for exploiting structure in Markov Decision Processes with continuous state variables. At each step of the dynamic programming, the state space is dynamically partitioned into regions where the value function is the same throughout the region. We first describe the algorithm for piecewise constant representations. We then extend it to piecewise linear representations, using techniques from POMDPs to represent and reason about linear surfaces efficiently. We show that for complex, structured problems, our approach exploits the natural structure so that optimal solutions can be computed efficiently.

/pdf/dynamic-programming-for-structured-continuous-markov-52hr2kpsvb.pdf

Dynamic programming for structured continuous Markov decision problems

Dynamic Programming for Structured Continuous Markov Decision Problems

In a peer-to-peer file-sharing system, a client desiring a particular file must choose a source from which to download. The problem of selecting a good data source is difficult because some peers may not be encountered more than once, and many peers are on low-bandwidth connections. Despite these facts, information obtained about peers just prior to the download can help guide peer selection. A client can gain additional time savings by aborting bad download attempts until an acceptable peer is discovered. We denote as peer selection the entire process of switching among peers and finally settling on one. Our main contribution is to use the methodology of machine learning for the construction of good peer selection strategies from past experience. Decision tree learning is used for rating peers based on low-cost information, and Markov decision processes are used for deriving a policy for switching among peers. Preliminary results with the Gnutella network demonstrate the promise of this approach.

Adaptive peer selection

We present a major improvement to the incremental pruning algorithm for solving partially observable Markov decision processes. Our technique targets the cross-sum step of the dynamic programming (DP) update, a key source of complexity in POMDP algorithms. Instead of reasoning about the whole belief space when pruning the cross-sums, our algorithm divides the belief space into smaller regions and performs independent pruning in each region. We evaluate the benefits of the new technique both analytically and experimentally, and show that it produces very significant performance gains. The results contribute to the scalability of POMDP algorithms to domains that cannot be handled by the best existing techniques.

/pdf/region-based-incremental-pruning-for-pomdps-2gxn1eko31.pdf

Zhengzhu Feng

Papers

Symbolic heuristic search for factored Markov decision processes

Dynamic programming for structured continuous Markov decision problems

Dynamic Programming for Structured Continuous Markov Decision Problems

Adaptive peer selection

Region-based incremental pruning for POMDPs