Open Access · Posted Content
Unsolved Problems in ML Safety.
TL;DR
In this article, the authors provide a new roadmap for ML Safety and refine the technical problems that the field needs to address, namely withstanding hazards, identifying hazards, steering ML systems, and reducing risks in how ML systems are handled.
Abstract
Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. We present four problems ready for research, namely withstanding hazards ("Robustness"), identifying hazards ("Monitoring"), steering ML systems ("Alignment"), and reducing risks in how ML systems are handled ("External Safety"). Throughout, we clarify each problem's motivation and provide concrete research directions.
Citations
Posted Content
On the Opportunities and Risks of Foundation Models.
Rishi Bommasani,Drew A. Hudson,Ehsan Adeli,Russ B. Altman,Simran Arora,Sydney von Arx,Michael S. Bernstein,Jeannette Bohg,Antoine Bosselut,Emma Brunskill,Erik Brynjolfsson,Shyamal Buch,Dallas Card,Rodrigo Castellon,Niladri S. Chatterji,Annie Chen,Kathleen Creel,Jared Davis,Dora Demszky,Chris Donahue,Moussa Doumbouya,Esin Durmus,Stefano Ermon,John Etchemendy,Kawin Ethayarajh,Li Fei-Fei,Chelsea Finn,Trevor Gale,Lauren Gillespie,Karan Goel,Noah D. Goodman,Shelby Grossman,Neel Guha,Tatsunori Hashimoto,Peter Henderson,John Hewitt,Daniel E. Ho,Jenny Hong,Kyle Hsu,Jing Huang,Thomas Icard,Saahil Jain,Dan Jurafsky,Pratyusha Kalluri,Siddharth Karamcheti,Geoff Keeling,Fereshte Khani,Omar Khattab,Pang Wei Koh,Mark Krass,Ranjay Krishna,Rohith Kuditipudi,Ananya Kumar,Faisal Ladhak,Mina Lee,Tony Lee,Jure Leskovec,Isabelle Levent,Xiang Lisa Li,Xuechen Li,Tengyu Ma,Ali Ahmad Malik,Christopher D. Manning,Suvir Mirchandani,Eric Mitchell,Zanele Munyikwa,Suraj Nair,Avanika Narayan,Deepak Narayanan,Ben Newman,Allen Nie,Juan Carlos Niebles,Hamed Nilforoshan,Julian Nyarko,Giray Ogut,Laurel Orr,Isabel Papadimitriou,Joon Sung Park,Chris Piech,Eva Portelance,Christopher Potts,Aditi Raghunathan,Rob Reich,Hongyu Ren,Frieda Rong,Yusuf H. Roohani,Camilo Ruiz,Jack Ryan,Christopher Ré,Dorsa Sadigh,Shiori Sagawa,Keshav Santhanam,Andy Shih,Krishnan Srinivasan,Alex Tamkin,Rohan Taori,Armin W. Thomas,Florian Tramèr,Rose E. Wang,William Yang Wang,Bohan Wu,Jiajun Wu,Yuhuai Wu,Sang Michael Xie,Michihiro Yasunaga,Jiaxuan You,Matei Zaharia,Michael Zhang,Tianyi Zhang,Xikun Zhang,Yuhui Zhang,Lucia Zheng,Kaitlyn Zhou,Percy Liang
TL;DR: The authors provide a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications.
Posted Content
How to Certify Machine Learning Based Safety-critical Systems? A Systematic Literature Review
Florian Tambon,Gabriel Laberge,Le An,Amin Nikanjam,Paulina Stevia Nouwou Mindom,Yann Pequignot,Foutse Khomh,Giulio Antoniol,Ettore Merlo,François Laviolette
TL;DR: In this paper, the authors conduct a systematic literature review (SLR) of research papers published between 2015 and 2020 on the certification of ML systems, identifying 217 papers that span the topics considered the main pillars of ML certification: Robustness, Uncertainty, Explainability, Verification, Safe Reinforcement Learning, and Direct Certification.
Posted Content
Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines.
Jiachen Sun,Akshay Mehra,Bhavya Kailkhura,Pin-Yu Chen,Dan Hendrycks,Jihun Hamm,Z. Morley Mao
TL;DR: In this article, the authors propose FourierMix, an augmentation scheme that improves the spectral coverage of the training data, together with a new regularizer that encourages consistent predictions on noise perturbations of the augmented data.
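A consistency regularizer of the kind described above can be sketched as a Jensen-Shannon-style penalty between a model's predicted distributions on differently perturbed views of one input. The following is a minimal numpy sketch under that assumption; `js_consistency` is an illustrative name, not FourierMix's actual API:

```python
import numpy as np

def js_consistency(p_clean, p_aug1, p_aug2, eps=1e-12):
    """Jensen-Shannon-style penalty: pushes the model to predict
    similar class distributions for perturbed views of one input."""
    m = (p_clean + p_aug1 + p_aug2) / 3.0  # mixture of the three predictions
    def kl(p, q):
        return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
    return (kl(p_clean, m) + kl(p_aug1, m) + kl(p_aug2, m)) / 3.0

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.7])
print(js_consistency(p, p, p))  # ~0: identical predictions, no penalty
print(js_consistency(p, p, q))  # positive: disagreement is penalized
```

Added to a standard classification loss, such a term penalizes a model only when its predictions drift apart across perturbed copies of the same example.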
Posted Content
A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution Detection: Solutions and Future Challenges.
Mohammadreza Salehi,Hossein Mirzaei,Dan Hendrycks,Yixuan Li,Mohammad Hossein Rohban,Mohammad Sabokrou
TL;DR: In this paper, the authors provide a cross-domain and comprehensive review of numerous eminent works in respective areas while identifying their commonalities, and discuss and shed light on future lines of research, intending to bring these fields closer together.
Posted Content
A General Language Assistant as a Laboratory for Alignment
Amanda Askell,Yuntao Bai,Anna Chen,Dawn Drain,Deep Ganguli,Thomas Henighan,Andy Jones,Nicholas Joseph,Ben Mann,Nova DasSarma,Nelson Elhage,Zac Hatfield-Dodds,Danny Hernandez,Jackson Kernion,Kamal Ndousse,Catherine Olsson,Dario Amodei,Tom B. Brown,Jack Clark,Samuel McCandlish,Chris Olah,Jared Kaplan
TL;DR: The authors investigate scaling trends for several training objectives relevant to alignment, comparing imitation learning, binary discrimination, and ranked preference modeling, and find that ranked preference modeling performs much better than imitation learning and often scales more favorably with model size.
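Ranked preference modeling is commonly instantiated as a Bradley-Terry-style objective on scalar scores for a preferred and a rejected response. A minimal numpy sketch of that common formulation (the paper's exact objectives may differ, and `preference_loss` is an illustrative name):

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry style ranked-preference objective: the scalar score
    of the preferred response should exceed that of the rejected one.
    Loss is -log(sigmoid(margin)), written stably via log1p."""
    return float(np.log1p(np.exp(-(r_chosen - r_rejected))))

# A larger margin between chosen and rejected scores gives a lower loss;
# a zero margin gives log(2), i.e. the model is indifferent.
print(preference_loss(2.0, 0.0))
print(preference_loss(0.0, 0.0))
```

Minimizing this loss trains the model to assign higher scores to responses humans ranked above their alternatives.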
References
Journal Article
Theory of the firm: Managerial behavior, agency costs and ownership structure
TL;DR: In this article, the authors draw on recent progress in the theory of property rights, agency, and finance to develop a theory of ownership structure for the firm, which casts new light on and has implications for a variety of issues in the professional and popular literature.
Book
A Treatise of Human Nature
TL;DR: This edition of A Treatise of Human Nature opens with an account, though not a complete one, of Hume's early years and education, before presenting the treatise itself: Hume's attempt to introduce the experimental method of reasoning into moral subjects.
Proceedings Article
Towards Evaluating the Robustness of Neural Networks
Nicholas Carlini,David Wagner
TL;DR: In this paper, the authors demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability.
Book
The Black Swan: The Impact of the Highly Improbable
TL;DR: The Black Swan: The Impact of the Highly Improbable examines the random, highly improbable events that underlie our lives, from bestsellers to world disasters: events that are impossible to predict, yet that we always try to rationalize after they happen.
Proceedings Article
Towards Deep Learning Models Resistant to Adversarial Attacks.
TL;DR: The authors study the adversarial robustness of neural networks through the lens of robust optimization and identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal.
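The robust-optimization view trains against worst-case perturbations found inside a small ε-ball around each input. Below is a minimal numpy sketch of one gradient-sign step of that inner maximization for a logistic model (an FGSM-style step; iterated attacks add projection back into the ball, and all names here are illustrative, not from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """One gradient-sign step maximizing the logistic loss inside an
    L-infinity ball of radius eps around x -- the inner maximization of
    the robust-optimization objective min_w max_{||d||<=eps} loss(x+d)."""
    p = sigmoid(x @ w + b)   # predicted probability of the positive class
    grad_x = (p - y) * w     # gradient of the logistic loss w.r.t. x
    return x + eps * np.sign(grad_x)

w, b = np.array([1.0, -1.0]), 0.0
x, y = np.array([2.0, -1.0]), 1.0    # clean logit: x @ w + b = 3.0
x_adv = fgsm_perturb(x, y, w, b, eps=0.5)
print(x_adv @ w + b)  # logit drops from 3.0 toward the decision boundary
```

Adversarial training in this framing simply trains on `x_adv` in place of `x`, so the model minimizes loss under the strongest perturbation the attack can find.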