Open Access | Posted Content

Measuring the Reliability of Reinforcement Learning Algorithms

TLDR
A novel set of metrics that quantitatively measure different aspects of reliability is proposed. The metrics are designed to be general-purpose, and complementary statistical tests are provided to enable rigorous comparisons on them.
Abstract
Lack of reliability is a well-known issue for reinforcement learning (RL) algorithms. This problem has gained increasing attention in recent years, and efforts to improve it have grown substantially. To aid RL researchers and production users with the evaluation and improvement of reliability, we propose a set of metrics that quantitatively measure different aspects of reliability. In this work, we focus on variability and risk, both during training and after learning (on a fixed policy). We designed these metrics to be general-purpose, and we also designed complementary statistical tests to enable rigorous comparisons on these metrics. In this paper, we first describe the desired properties of the metrics and their design, the aspects of reliability that they measure, and their applicability to different scenarios. We then describe the statistical tests and make additional practical recommendations for reporting results. The metrics and accompanying statistical tools have been made available as an open-source library. We apply our metrics to a set of common RL algorithms and environments, compare them, and analyze the results.
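
As a rough illustration of what "variability" and "risk" can mean for a set of training runs, the sketch below computes an interquartile range across runs and a lower-tail conditional value at risk. Both choices, the function names, and the `alpha` parameter are assumptions made for illustration; the abstract does not specify the exact metric definitions, and this is not the API of the open-source library it mentions.

```python
import math

def percentile(sorted_vals, p):
    """Linearly interpolated percentile (p in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def iqr_across_runs(per_run_returns):
    """Variability (assumed measure): interquartile range of per-run performance."""
    vals = sorted(float(v) for v in per_run_returns)
    return percentile(vals, 75) - percentile(vals, 25)

def cvar_lower_tail(per_run_returns, alpha=0.05):
    """Risk (assumed measure): mean of the worst alpha-fraction of outcomes."""
    vals = sorted(float(v) for v in per_run_returns)
    k = max(1, math.ceil(alpha * len(vals)))  # always keep at least one outcome
    return sum(vals[:k]) / k

# Hypothetical final returns from eight independent training runs.
returns = [10.0, 12.0, 11.0, 9.0, 30.0, 10.5, 11.5, 2.0]
print("IQR across runs:", iqr_across_runs(returns))
print("Lower-tail CVaR:", cvar_lower_tail(returns, alpha=0.25))
```

A smaller interquartile range suggests more consistent runs, while a higher lower-tail value suggests less severe worst-case outcomes. The same two functions could be applied to rollouts of a fixed policy instead of training runs.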


Citations
Posted Content

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

TL;DR: This work proposes CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment that simulates an open-source robotic platform, hence offering the possibility of sim-to-real transfer.
Journal Article

A Review of Recent Deep Learning Approaches in Human-Centered Machine Learning.

TL;DR: This article presents an overview and analysis of existing work in Human-Centered Machine Learning (HCML) related to deep learning (DL). It characterizes the topology of the HCML landscape by identifying research gaps, highlighting conflicting interpretations, addressing current challenges, and presenting future HCML research opportunities.
Proceedings Article

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

TL;DR: CausalWorld is a benchmark for causal structure and transfer learning in a robotic manipulation environment in which the user can intervene on all causal variables, allowing fine-grained control over how similar different tasks (or task distributions) are.
Posted Content

Correlation-aware Cooperative Multigroup Broadcast 360° Video Delivery Network: A Hierarchical Deep Reinforcement Learning Approach

TL;DR: This work introduces conventional non-learning-based scheduling and association algorithms and develops a hierarchical federated DRL algorithm, with the scheduler as meta-controller and association as the controller, that can effectively handle real-time video transmission from UAVs to VR users.
Posted Content

Stabilizing Deep Reinforcement Learning with Conservative Updates

TL;DR: Experiments show that the proposed method reduces the variance of the process and improves the overall performance in off-policy actor-critic deep reinforcement learning regimes.