Recent years have seen a boom in interest in machine learning systems that can provide a human-understandable rationale for their predictions or decisions. However, exactly what kinds of explanation are truly human-interpretable remains poorly understood. This work advances our understanding of what makes explanations interpretable in the specific context of verification. Suppose we have a machine learning system that predicts X, and we provide a rationale for this prediction. Given an input, an explanation, and an output, is the output consistent with the input and the supposed rationale? Via a series of user studies, we identify which kinds of increases in complexity have the greatest effect on the time it takes for humans to verify the rationale, and to which kinds verification time seems relatively insensitive.

How do Humans Understand Explanations from Machine Learning Systems? An Evaluation of the Human-Interpretability of Explanation.

Bayesian methods for machine learning have been widely investigated, yielding principled methods for incorporating prior information into inference algorithms. In this survey, we provide an in-depth review of the role of Bayesian methods for the reinforcement learning (RL) paradigm. The major incentives for incorporating Bayesian reasoning in RL are: (1) it provides an elegant approach to action selection (exploration/exploitation) as a function of the uncertainty in learning; and (2) it provides a machinery to incorporate prior knowledge into the algorithms. We first discuss models and methods for Bayesian inference in the simple single-step Bandit model. We then review the extensive recent literature on Bayesian methods for model-based RL, where prior information can be expressed on the parameters of the Markov model. We also present Bayesian methods for model-free RL, where priors are expressed over the value function or policy class. The objective of the paper is to provide a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.

/pdf/bayesian-reinforcement-learning-a-survey-1dx4j7cfpc.pdf

Bayesian Reinforcement Learning: A Survey
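
To make the single-step Bandit setting in the abstract above concrete, here is a minimal sketch of Bayesian action selection driven by posterior uncertainty. It assumes Bernoulli rewards with independent Beta(1, 1) priors and uses Thompson sampling; the abstract does not name a specific algorithm, so this choice, the function name `thompson_sampling`, and its parameters are illustrative assumptions rather than the survey's own method.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs: hypothetical success probability of each arm (unknown to the agent).
    Returns the pull counts and the final Beta posterior parameters per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) is a uniform prior over each arm's success probability (assumption).
    alpha = [1.0] * n_arms
    beta = [1.0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible success probability per arm from its posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the sampled values.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: successes increment alpha, failures increment beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, alpha, beta


if __name__ == "__main__":
    pulls, alpha, beta = thompson_sampling([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
```

Arms with wide posteriors are still sampled occasionally, so exploration falls out of the uncertainty itself instead of a separate exploration schedule.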

This paper highlights the role that reinforcement learning can play in the optimization of treatment policies for chronic illnesses. Before applying any off-the-shelf reinforcement learning methods in this setting, we must first tackle a number of challenges. We outline some of these challenges and present methods for overcoming them. First, we describe a multiple imputation approach to overcome the problem of missing data. Second, we discuss the use of function approximation in the context of a highly variable observation set. Finally, we discuss approaches to summarizing the evidence in the data for recommending a particular action and quantifying the uncertainty around the Q-function of the recommended policy. We present the results of applying these methods to real clinical trial data of patients with schizophrenia.

/pdf/informing-sequential-clinical-decision-making-through-3fakr3vzxo.pdf

Informing sequential clinical decision-making through reinforcement learning: an empirical study

Formal exploration approaches in model-based reinforcement learning estimate the accuracy of the currently learned model without consideration of the empirical prediction error. For example, PAC-MDP approaches such as R-MAX base their model certainty on the amount of collected data, while Bayesian approaches assume a prior over the transition dynamics. We propose extensions to such approaches which drive exploration solely based on empirical estimates of the learner's accuracy and learning progress. We provide a "sanity check" theoretical analysis, discussing the behavior of our extensions in the standard stationary finite state-action case. We then provide experimental studies demonstrating the robustness of these exploration measures in cases of non-stationary environments or where original approaches are misled by wrong domain assumptions.

/pdf/exploration-in-model-based-reinforcement-learning-by-2pessx9os8.pdf

Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

Model-free reinforcement learning algorithms such as Q-learning perform poorly in the early stages of learning in noisy environments, because much effort is spent on unlearning biased estimates of the state-action function. The bias comes from selecting, among several noisy estimates, the apparent optimum, which may actually be suboptimal. We propose G-learning, a new off-policy learning algorithm that regularizes the noise in the space of optimal actions by penalizing deterministic policies at the beginning of the learning. Moreover, it enables naturally incorporating prior distributions over optimal actions when available. The stochastic nature of G-learning also makes it more cost-effective than Q-learning in noiseless but exploration-risky domains. We illustrate these ideas in several examples where G-learning results in significant improvements of the learning rate and the learning cost.

G-Learning: Taming the Noise in Reinforcement Learning via Soft Updates.
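
The bias described in the abstract above, taking the maximum over several noisy value estimates, can be reproduced in a few lines. The sketch below is illustrative only: `max_of_noisy_estimates` simulates actions whose true values are all zero, and `soft_value` is a generic prior-weighted log-sum-exp backup with inverse temperature `beta`. The soft backup is shown as one way a hard maximum can be softened; it is an assumption for illustration and is not claimed to be the exact G-learning update.

```python
import math
import random


def max_of_noisy_estimates(n_actions, noise_std, n_trials, seed=0):
    """Average of max_a Q_hat(a) when every true action value is 0.

    An unbiased estimator of the optimal value would average to 0; the
    hard maximum over noisy estimates averages to something positive.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        estimates = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        total += max(estimates)
    return total / n_trials


def soft_value(q_values, beta, prior=None):
    """Soft backup: (1 / beta) * log(sum_a prior(a) * exp(beta * q(a))).

    Small beta approaches the prior-weighted mean of q; large beta
    approaches max(q). The prior defaults to uniform (assumption).
    """
    n = len(q_values)
    prior = prior or [1.0 / n] * n
    m = max(q_values)  # subtract the max for numerical stability
    s = sum(p * math.exp(beta * (q - m)) for p, q in zip(prior, q_values))
    return m + math.log(s) / beta


if __name__ == "__main__":
    print("mean of hard max:", max_of_noisy_estimates(5, 1.0, 2000))
    for b in (0.001, 1.0, 100.0):
        print("beta =", b, "soft value =", soft_value([1.0, 2.0, 3.0], b))
```

Starting with a small `beta` and increasing it over time would keep early backups close to an average, which is the intuition behind penalizing deterministic policies at the beginning of learning.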

For many machine learning problems, there are some inputs that are known to be positively (or negatively) related to the output, and in such cases training the model to respect that monotonic relationship can provide regularization, and makes the model more interpretable. However, flexible monotonic functions are computationally challenging to learn beyond a few features. We break through this barrier by learning ensembles of monotonic calibrated interpolated look-up tables (lattices). A key contribution is an automated algorithm for selecting feature subsets for the ensemble base models. We demonstrate that compared to random forests, these ensembles produce similar or better accuracy, while providing guaranteed monotonicity consistent with prior knowledge, smaller model size and faster evaluation.

https://proceedings.neurips.cc/paper/2016/file/c913303f392ffc643f7240b180602652-Paper.pdf

Fast and Flexible Monotonic Functions with Ensembles of Lattices
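
As a rough illustration of an interpolated look-up table with a monotonicity guarantee, the sketch below covers only the one-dimensional case: a piecewise-linear function over fixed keypoints is monotonic exactly when its stored values are non-decreasing. The function names are hypothetical, and `clip_to_monotonic` is a simple running-maximum heuristic used here for brevity; it stands in for, and is not, the constrained training procedure used in the paper.

```python
from bisect import bisect_right


def interpolate(keypoints, values, x):
    """Evaluate a 1-D look-up table with linear interpolation.

    keypoints must be strictly increasing; inputs outside the range
    are clamped to the first or last stored value.
    """
    if x <= keypoints[0]:
        return values[0]
    if x >= keypoints[-1]:
        return values[-1]
    # Index of the keypoint at or immediately to the left of x.
    i = bisect_right(keypoints, x) - 1
    t = (x - keypoints[i]) / (keypoints[i + 1] - keypoints[i])
    return (1 - t) * values[i] + t * values[i + 1]


def is_monotonic(values):
    """True if the table values are non-decreasing."""
    return all(a <= b for a, b in zip(values, values[1:]))


def clip_to_monotonic(values):
    """Force non-decreasing values by taking a running maximum."""
    out = []
    running = float("-inf")
    for v in values:
        running = max(running, v)
        out.append(running)
    return out


if __name__ == "__main__":
    keypoints = [0.0, 1.0, 2.0, 3.0]
    raw = [0.0, 3.0, 2.0, 5.0]          # violates monotonicity at index 2
    fixed = clip_to_monotonic(raw)
    print(is_monotonic(raw), is_monotonic(fixed))
    print(interpolate(keypoints, fixed, 1.5))
```

Because the check is on stored parameters instead of on sampled inputs, the monotonicity holds for every input, which is what makes the guarantee usable as prior knowledge.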

Efficiently learning accurate models of dynamical systems is of central importance for developing rational agents that can succeed in a wide range of challenging domains. The difficulty of this learning problem is particularly acute in settings with large observation spaces and partial observability. We present a new algorithm, called Compressed Predictive State Representation (CPSR), for learning models of high-dimensional partially observable uncontrolled dynamical systems from small sample sets. The algorithm exploits a particular sparse structure present in many domains. This sparse structure is used to compress information during learning, allowing for an increase in both the efficiency and predictive power. The compression technique also relieves the burden of domain specific feature selection. We present empirical results showing that the algorithm is able to build accurate models more efficiently than its uncompressed counterparts, and we provide theoretical results on the accuracy of the learned compressed model.

/pdf/modelling-sparse-dynamical-systems-with-compressed-4rkp8raurh.pdf

Modelling Sparse Dynamical Systems with Compressed Predictive State Representations

Predictive state representations (PSRs) offer an expressive framework for modelling partially observable systems. By compactly representing systems as functions of observable quantities, the PSR learning approach avoids using local-minima prone expectation-maximization and instead employs a globally optimal moment-based algorithm. Moreover, since PSRs do not require a predetermined latent state structure as an input, they offer an attractive framework for model-based reinforcement learning when agents must plan without a priori access to a system model. Unfortunately, the expressiveness of PSRs comes with significant computational cost, and this cost is a major factor inhibiting the use of PSRs in applications. In order to alleviate this shortcoming, we introduce the notion of compressed PSRs (CPSRs). The CPSR learning approach combines recent advancements in dimensionality reduction, incremental matrix decomposition, and compressed sensing. We show how this approach provides a principled avenue for learning accurate approximations of PSRs, drastically reducing the computational costs associated with learning while also providing effective regularization. Going further, we propose a planning framework that exploits these learned models, and we show that this approach facilitates model learning and planning in large, complex, partially observable domains, a task that is infeasible without the principled use of compression.

/pdf/efficient-learning-and-planning-with-compressed-predictive-uyy47xt2ov.pdf

Efficient learning and planning with compressed predictive states

This paper introduces the first set of PAC-Bayesian bounds for the batch reinforcement learning problem in finite state spaces. These bounds hold regardless of the correctness of the prior distribution. We demonstrate how such bounds can be used for model-selection in control problems where prior information is available either on the dynamics of the environment, or on the value of actions. Our empirical results confirm that PAC-Bayesian model-selection is able to leverage prior distributions when they are informative and, unlike standard Bayesian RL approaches, ignores them when they are misleading.

/pdf/pac-bayesian-model-selection-for-reinforcement-learning-1i2ra2sny8.pdf

PAC-Bayesian Model Selection for Reinforcement Learning

Recent advances in the area of compressed sensing suggest that it is possible to reconstruct high-dimensional sparse signals from a small number of random projections. Domains in which the sparsity assumption is applicable also offer many interesting large-scale machine learning prediction tasks. It is therefore important to study the effect of random projections as a dimensionality reduction method under such sparsity assumptions. In this paper we develop the bias-variance analysis of a least-squares regression estimator in compressed spaces when random projections are applied on sparse input signals. Leveraging the sparsity assumption, we are able to work with arbitrary non-i.i.d. sampling strategies and derive a worst-case bound on the entire space. Empirical results on synthetic and real-world datasets show how the choice of the projection size affects the performance of regression on compressed spaces, and highlight a range of problems where the method is useful.

/pdf/compressed-least-squares-regression-on-sparse-spaces-3u24es20fm.pdf
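
The pipeline the abstract above analyses, projecting sparse inputs with a random matrix and then fitting least squares in the compressed space, can be sketched with the standard library alone. Several details here are assumptions made for illustration: the projection entries are drawn from N(0, 1/d_out), a tiny ridge term is added purely for numerical stability, and all function names are hypothetical. The abstract does not specify these choices.

```python
import random


def random_projection(d_in, d_out, rng):
    """d_in x d_out matrix with i.i.d. N(0, 1/d_out) entries (assumption)."""
    scale = 1.0 / d_out ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(d_out)] for _ in range(d_in)]


def project(X, P):
    """Map each row of X (n x d_in) into the compressed space (n x d_out)."""
    d_out = len(P[0])
    return [[sum(x[i] * P[i][j] for i in range(len(x))) for j in range(d_out)]
            for x in X]


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def least_squares(Z, y, ridge=1e-8):
    """Solve the normal equations (Z^T Z + ridge * I) w = Z^T y."""
    d = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(d)]
    return solve(A, b)


def fit_compressed(X, y, d_out, seed=0):
    """Project X to d_out dimensions, then fit least squares there."""
    rng = random.Random(seed)
    P = random_projection(len(X[0]), d_out, rng)
    return P, least_squares(project(X, P), y)


def predict(x, P, w):
    """Predict for a single input by projecting it, then applying w."""
    z = project([x], P)[0]
    return sum(zi * wi for zi, wi in zip(z, w))
```

The projection size `d_out` is the quantity whose effect on regression performance the abstract says is studied empirically; in this sketch it is simply a parameter to vary.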

Mahdi Milani Fard

Papers

Fast and Flexible Monotonic Functions with Ensembles of Lattices

Modelling Sparse Dynamical Systems with Compressed Predictive State Representations

Efficient learning and planning with compressed predictive states

PAC-Bayesian Model Selection for Reinforcement Learning

Compressed least-squares regression on sparse spaces