Measuring the Reliability of Reinforcement Learning Algorithms

Open Access, Posted Content

- 10 Dec 2019 -

TLDR

A novel set of metrics that quantitatively measure different aspects of reliability is proposed. The metrics are designed to be general-purpose and are accompanied by complementary statistical tests that enable rigorous comparisons on them.

Abstract:

Lack of reliability is a well-known issue for reinforcement learning (RL) algorithms. This problem has gained increasing attention in recent years, and efforts to improve reliability have grown substantially. To aid RL researchers and production users with the evaluation and improvement of reliability, we propose a set of metrics that quantitatively measure different aspects of reliability. In this work, we focus on variability and risk, both during training and after learning (on a fixed policy). We designed these metrics to be general-purpose, and we also designed complementary statistical tests to enable rigorous comparisons on these metrics. In this paper, we first describe the desired properties of the metrics and their design, the aspects of reliability that they measure, and their applicability to different scenarios. We then describe the statistical tests and make additional practical recommendations for reporting results. The metrics and accompanying statistical tools have been made available as an open-source library. We apply our metrics to a set of common RL algorithms and environments, compare them, and analyze the results.
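
As a rough illustration of what measuring "variability and risk" across training runs can look like, the sketch below computes an interquartile range (a robust spread measure), a lower-tail conditional value at risk (the mean of the worst runs), and a percentile bootstrap confidence interval. These specific statistics, the function names, and the sample returns are assumptions chosen for illustration. They are not taken from this page and are not the API of the authors' open-source library.

```python
import random
import statistics


def iqr(values):
    """Interquartile range: a robust measure of variability."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def cvar_lower(values, alpha=0.25):
    """Mean of the worst alpha-fraction of values (lower-tail CVaR)."""
    ordered = sorted(values)
    k = max(1, int(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def bootstrap_ci(values, stat, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical final returns from 8 independent training runs.
final_returns = [210.0, 195.0, 220.0, 80.0, 205.0, 215.0, 190.0, 200.0]

dispersion = iqr(final_returns)               # variability across runs
risk = cvar_lower(final_returns, alpha=0.25)  # expected worst-case run
ci_low, ci_high = bootstrap_ci(final_returns, iqr)

print(f"IQR across runs: {dispersion}")
print(f"CVaR (worst 25%): {risk}")
print(f"95% bootstrap CI for IQR: [{ci_low}, {ci_high}]")
```

In this toy data the single poor run (80.0) barely moves the interquartile range but dominates the lower-tail average, which is why a spread measure and a risk measure can tell different stories about the same algorithm.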

Citations

Posted Content

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

Ossama Ahmed, +7 more

- 08 Oct 2020 -

arXiv: Robotics

TL;DR: CausalWorld is proposed as a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment simulates an open-source robotic platform, which offers the possibility of sim-to-real transfer.

Journal ArticleDOI

A Review of Recent Deep Learning Approaches in Human-Centered Machine Learning.

Tharindu Kaluarachchi, +2 more

- 03 Apr 2021 -

Sensors

TL;DR: In this article, the authors present an overview and analysis of existing work in Human-Centered Machine Learning (HCML) related to DL, and identify the topology of the HCML landscape by identifying research gaps, highlighting conflicting interpretations, addressing current challenges and presenting future HCML research opportunities.

Proceedings Article

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

Ossama Ahmed, +7 more

TL;DR: CausalWorld is a benchmark for causal structure and transfer learning in a robotic manipulation environment in which the user can intervene on all causal variables, allowing fine-grained control over how similar different tasks (or task distributions) are.

Posted Content

Correlation-aware Cooperative Multigroup Broadcast 360° Video Delivery Network: A Hierarchical Deep Reinforcement Learning Approach

Fenghe Hu, +2 more

- 21 Oct 2020 -

arXiv: Signal Processing

TL;DR: This work introduces conventional non-learning-based scheduling and association algorithms, and develops a hierarchical federated DRL algorithm, with the scheduler as the meta-controller and association as the controller, that can effectively handle real-time video transmission from UAVs to VR users.

Posted Content

Stabilizing Deep Reinforcement Learning with Conservative Updates

Chen Tessler, +2 more

- 02 Oct 2019 -

arXiv: Learning

TL;DR: Experiments show that the proposed method reduces the variance of the process and improves the overall performance in off-policy actor-critic deep reinforcement learning regimes.

References

Journal ArticleDOI

Time Series Analysis.

Jorg Breitung, +1 more

- 01 Mar 1995 -

Contemporary Sociology

Posted Content

Proximal Policy Optimization Algorithms

John Schulman, +4 more

- 20 Jul 2017 -

arXiv: Learning

TL;DR: A new family of policy gradient methods for reinforcement learning is proposed; the methods alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.

Journal ArticleDOI

Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy

Bradley Efron, +1 more

- 01 Feb 1986 -

Statistical Science

TL;DR: The bootstrap is extended to other measures of statistical accuracy such as bias and prediction error, and to complicated data structures such as time series, censored data, and regression models.

Proceedings Article

Policy Gradient Methods for Reinforcement Learning with Function Approximation

Richard S. Sutton, +3 more

TL;DR: This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
