Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.

/pdf/reinforcement-learning-an-introduction-rzxgej9p17.pdf

Reinforcement Learning: An Introduction

The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

/pdf/human-level-control-through-deep-reinforcement-learning-40h0gx9pjx.pdf

Human-level control through deep reinforcement learning

The architecture and learning procedure underlying ANFIS (adaptive-network-based fuzzy inference system) is presented, which is a fuzzy inference system implemented in the framework of adaptive networks. By using a hybrid learning procedure, the proposed ANFIS can construct an input-output mapping based on both human knowledge (in the form of fuzzy if-then rules) and stipulated input-output data pairs. In the simulation, the ANFIS architecture is employed to model nonlinear functions, identify nonlinear components on-line in a control system, and predict a chaotic time series, all yielding remarkable results. Comparisons with artificial neural networks and earlier work on fuzzy modeling are listed and discussed. Other extensions of the proposed ANFIS and promising applications to automatic control and signal processing are also suggested. >

ANFIS: adaptive-network-based fuzzy inference system

https://www2.econ.iastate.edu/tesfatsi/DeepLearningInNeuralNetworksOverview.JSchmidhuber2015.pdf

Deep learning in neural networks

Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

Machine learning

This paper surveys locally weighted learning, a form of lazy learning and memory-based learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, interference between old and new data, implementing locally weighted learning efficiently, and applications of locally weighted learning. A companion paper surveys how locally weighted learning can be used in robot learning and control.

/pdf/locally-weighted-learning-4parann0z5.pdf

Locally Weighted Learning

Future computing environments will free the user from the constraints of the desktop. Applications for a mobile environment should take advantage of contextual information, such as position, to offer greater services to the user. In this paper, we present the Cyberguide project, in which we are building prototypes of a mobile context-aware tour guide. Knowledge of the user's current location, as well as a history of past locations, are used to provide more of the kind of services that we come to expect from a real tour guide. We describe the architecture and features of a variety of Cyberguide prototypes developed for indoor and outdoor use on a number of different hand-held platforms. We also discuss the general research issues that have emerged in our context-aware applications development in a mobile environment.

https://condor.depaul.edu/dmumaugh/readings/handouts/SE430/p421-abowd-97.pdf

Cyberguide: a mobile context-aware tour guide

We are building a home, called the Aware Home, to create a living laboratory for research in ubiquitous computing for everyday activities. This paper introduces the Aware Home project and outlines some of our technology-and human-centered research objectives in creating the Aware Home.

/pdf/the-aware-home-a-living-laboratory-for-ubiquitous-computing-1xynllzqf1.pdf

The Aware Home: A Living Laboratory for Ubiquitous Computing Research

Unrestrained human arm trajectories between point targets have been investigated using a three-dimensional tracking apparatus, the Selspot system. Movements were executed between different points in a vertical plane under varying conditions of speed and hand-held load. In contrast to past results which emphasized the straightness of hand paths, movement regions were discovered in which the hand paths were curved. All movements, whether curved or straight, showed an invariant tangential velocity profile when normalized for speed and distance. The velocity profile invariance with speed and load is interpreted in terms of simplification of the underlying arm dynamics, extending the results of Hollerbach and Flash (Hollerbach, J. M., and T. Flash (1982) Biol. Cybern. 44: 67-77).

https://www.jneurosci.org/content/jneuro/5/9/2318.full.pdf

Kinematic features of unrestrained vertical arm movements

We present a new algorithm, prioritized sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as temporal differencing and Q-learning have real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state-space. We compare prioritized sweeping with other reinforcement learning schemes for a number of different stochastic optimal control problems. It successfully solves large state-space real-time problems with which other methods have difficulty.

https://link.springer.com/content/pdf/10.1007%2FBF00993104.pdf

Christopher G. Atkeson

Papers

Locally Weighted Learning

Cyberguide: a mobile context-aware tour guide

The Aware Home: A Living Laboratory for Ubiquitous Computing Research

Kinematic features of unrestrained vertical arm movements

Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time