Concrete Problems in AI Safety

Open Access

Posted Content

Dario Amodei, +5 more

- 21 Jun 2016 -

arXiv: Artificial Intelligence

TLDR

A list of five practical research problems related to accident risk is presented, categorized according to whether the problem originates from having the wrong objective function, an objective function that is too expensive to evaluate frequently, or undesirable behavior during the learning process.

Abstract:

Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper we discuss one such potential impact: the problem of accidents in machine learning systems, defined as unintended and harmful behavior that may emerge from poor design of real-world AI systems. We present a list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function ("avoiding side effects" and "avoiding reward hacking"), an objective function that is too expensive to evaluate frequently ("scalable supervision"), or undesirable behavior during the learning process ("safe exploration" and "distributional shift"). We review previous work in these areas as well as suggesting research directions with a focus on relevance to cutting-edge AI systems. Finally, we consider the high-level question of how to think most productively about the safety of forward-looking applications of AI.

Citations

Posted Content

Towards A Rigorous Science of Interpretable Machine Learning

Finale Doshi-Velez, +1 more

- 28 Feb 2017 -

arXiv: Machine Learning

TL;DR: This position paper defines interpretability, describes when it is needed (and when it is not), suggests a taxonomy for rigorous evaluation, and exposes open questions towards a more rigorous science of interpretable machine learning.

Book

Neural Networks and Deep Learning

Charu C. Aggarwal

Journal Article

Opportunities and obstacles for deep learning in biology and medicine.

Travers Ching, +38 more

- 01 Apr 2018 -

Journal of the Royal Society Interface

TL;DR: It is found that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art.

Proceedings Article

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

Dan Hendrycks, +1 more

TL;DR: A simple baseline that uses probabilities from softmax distributions is presented. Its effectiveness is shown across computer vision, natural language processing, and automatic speech recognition tasks, and it is also shown that the baseline can sometimes be surpassed.
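As a rough illustration of the idea in this summary, the sketch below scores an input by its largest softmax probability and flags low-scoring inputs as possibly misclassified or out-of-distribution. The function names, the example logits, and the 0.5 threshold are hypothetical choices for illustration and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_score(logits):
    """Confidence score: the largest softmax probability."""
    return max(softmax(logits))


def flag_abnormal(logits, threshold=0.5):
    """Flag an input whose score falls below the threshold.

    The threshold is an assumed value; in practice it would be
    tuned on held-out data.
    """
    return max_softmax_score(logits) < threshold


# Hypothetical logits: one peaked prediction, one nearly flat prediction.
confident = [8.0, 0.5, 0.1]
uncertain = [1.0, 0.9, 1.1]
print(flag_abnormal(confident), flag_abnormal(uncertain))
```

A peaked prediction yields a score near 1 and passes, while a nearly flat prediction yields a score near 1/3 for three classes and is flagged.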

Posted Content

Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks

Shiyu Liang, +2 more

- 08 Jun 2017 -

arXiv: Learning

TL;DR: The proposed ODIN method, based on the observation that using temperature scaling and adding small perturbations to the input can separate the softmax score distributions between in- and out-of-distribution images, allowing for more effective detection, consistently outperforms the baseline approach by a large margin.
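The temperature-scaling half of this summary can be sketched as follows: logits are divided by a temperature before the softmax, and the largest resulting probability is used as the detection score. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name, example logits, and temperature values are hypothetical, and the input-perturbation step is omitted because it requires gradients of a trained network.

```python
import math


def scaled_max_softmax(logits, temperature=1.0):
    """Largest softmax probability after dividing logits by a temperature."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    return max(exps) / sum(exps)


# Hypothetical logits for a three-class model.
logits = [8.0, 0.5, 0.1]
for t in (1.0, 10.0, 1000.0):  # assumed temperatures, for illustration only
    print(t, scaled_max_softmax(logits, temperature=t))
```

Raising the temperature flattens the distribution, so the score moves from near 1 towards the uniform value 1/3 for three classes. Whether a given temperature improves separation between in- and out-of-distribution inputs depends on the model and data.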

References

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: State-of-the-art ImageNet classification performance was achieved by a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

Journal Article

Generative Adversarial Nets

Ian Goodfellow, +7 more

TL;DR: A new framework for estimating generative models via an adversarial process is proposed, in which two models are trained simultaneously: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
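The adversarial process in this summary is commonly written as a two-player minimax game over a value function V(D, G). In the form below, p_data (the data distribution) and p_z (the noise prior fed to G) are symbols that do not appear in the summary and are introduced here for illustration.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

D is trained to assign high probability to training samples and low probability to samples from G, while G is trained to make D(G(z)) large.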

Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.

Human-level control through deep reinforcement learning

Mastering the game of Go with deep neural networks and tree search

Related Papers (5)

Human-level control through deep reinforcement learning

Reinforcement Learning: An Introduction

Deep Residual Learning for Image Recognition

Intriguing properties of neural networks

Adam: A Method for Stochastic Optimization
