Open AccessPosted Content
Concrete Problems in AI Safety
Reads0
Chats0
TLDR
A list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function, an objective function that is too expensive to evaluate frequently, or undesirable behavior during the learning process, are presented.Abstract:
Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper we discuss one such potential impact: the problem of accidents in machine learning systems, defined as unintended and harmful behavior that may emerge from poor design of real-world AI systems. We present a list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function ("avoiding side effects" and "avoiding reward hacking"), an objective function that is too expensive to evaluate frequently ("scalable supervision"), or undesirable behavior during the learning process ("safe exploration" and "distributional shift"). We review previous work in these areas as well as suggesting research directions with a focus on relevance to cutting-edge AI systems. Finally, we consider the high-level question of how to think most productively about the safety of forward-looking applications of AI.read more
Citations
More filters
Posted Content
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez,Been Kim +1 more
TL;DR: This position paper defines interpretability and describes when interpretability is needed (and when it is not), and suggests a taxonomy for rigorous evaluation and exposes open questions towards a more rigorous science of interpretable machine learning.
Journal ArticleDOI
Opportunities and obstacles for deep learning in biology and medicine.
Travers Ching,Daniel Himmelstein,Brett K. Beaulieu-Jones,Alexandr A. Kalinin,Brian T. Do,Gregory P. Way,Enrico Ferrero,Paul-Michael Agapow,Michael Zietz,Michael M. Hoffman,Michael M. Hoffman,Wei Xie,Gail L. Rosen,Benjamin J. Lengerich,Johnny Israeli,Jack Lanchantin,Stephen Woloszynek,Anne E. Carpenter,Avanti Shrikumar,Jinbo Xu,Evan M. Cofer,Evan M. Cofer,Christopher A. Lavender,Srinivas C. Turaga,Amr Alexandari,Zhiyong Lu,David J. Harris,Dave DeCaprio,Yanjun Qi,Anshul Kundaje,Yifan Peng,Laura K. Wiley,Marwin H. S. Segler,Simina M. Boca,S. Joshua Swamidass,Austin Huang,Anthony Gitter,Anthony Gitter,Casey S. Greene +38 more
TL;DR: It is found that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art.
Proceedings Article
A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks
Dan Hendrycks,Kevin Gimpel +1 more
TL;DR: A simple baseline that utilizes probabilities from softmax distributions is presented, showing the effectiveness of this baseline across all computer vision, natural language processing, and automatic speech recognition, and it is shown the baseline can sometimes be surpassed.
Posted Content
Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks
Shiyu Liang,Yixuan Li,R. Srikant +2 more
TL;DR: The proposed ODIN method, based on the observation that using temperature scaling and adding small perturbations to the input can separate the softmax score distributions between in- and out-of-distribution images, allowing for more effective detection, consistently outperforms the baseline approach by a large margin.
References
More filters
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Journal ArticleDOI
Generative Adversarial Nets
Ian Goodfellow,Jean Pouget-Abadie,Mehdi Mirza,Bing Xu,David Warde-Farley,Sherjil Ozair,Aaron Courville,Yoshua Bengio +7 more
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously train: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Book
Reinforcement Learning: An Introduction
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Journal ArticleDOI
Human-level control through deep reinforcement learning
Volodymyr Mnih,Koray Kavukcuoglu,David Silver,Andrei Rusu,Joel Veness,Marc G. Bellemare,Alex Graves,Martin Riedmiller,Andreas K. Fidjeland,Georg Ostrovski,Stig Petersen,Charles Beattie,Amir Sadik,Ioannis Antonoglou,Helen King,Dharshan Kumaran,Daan Wierstra,Shane Legg,Demis Hassabis +18 more
TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Journal ArticleDOI
Mastering the game of Go with deep neural networks and tree search
David Silver,Aja Huang,Chris J. Maddison,Arthur Guez,Laurent Sifre,George van den Driessche,Julian Schrittwieser,Ioannis Antonoglou,Veda Panneershelvam,Marc Lanctot,Sander Dieleman,Dominik Grewe,John Nham,Nal Kalchbrenner,Ilya Sutskever,Timothy P. Lillicrap,Madeleine Leach,Koray Kavukcuoglu,Thore Graepel,Demis Hassabis +19 more
TL;DR: Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0.5, the first time that a computer program has defeated a human professional player in the full-sized game of Go.