Open Access Posted Content

Unsolved Problems in ML Safety.

TLDR
In this article, the authors provide a new roadmap for ML Safety and refine the technical problems the field needs to address: withstanding hazards (Robustness), identifying hazards (Monitoring), steering ML systems (Alignment), and reducing risks in how ML systems are handled (External Safety).
Abstract
Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. We present four problems ready for research, namely withstanding hazards ("Robustness"), identifying hazards ("Monitoring"), steering ML systems ("Alignment"), and reducing risks to how ML systems are handled ("External Safety"). Throughout, we clarify each problem's motivation and provide concrete research directions.


Citations
Posted Content

On the Opportunities and Risks of Foundation Models.

Rishi Bommasani, +113 more
16 Aug 2021
TL;DR: The authors provide a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications.
Posted Content

How to Certify Machine Learning Based Safety-critical Systems? A Systematic Literature Review

TL;DR: In this paper, the authors conducted a systematic literature review (SLR) of research papers published between 2015 and 2020 on the certification of ML systems, identifying 217 papers that cover what are considered the main pillars of ML certification: Robustness, Uncertainty, Explainability, Verification, Safe Reinforcement Learning, and Direct Certification.
Posted Content

Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines.

TL;DR: In this article, FourierMix is proposed to improve the spectral coverage of the training data, together with a new regularizer that encourages consistent predictions on noise perturbations of the augmented data.
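
To make the consistency idea concrete, here is a minimal sketch, assuming PyTorch, of a regularizer that penalizes disagreement between predictions on two noise-perturbed views of an augmented input. The augment function stands in for a spectral augmentation such as FourierMix; the paper's actual augmentation and regularizer details are not reproduced here.

    import torch
    import torch.nn.functional as F

    def consistency_loss(model, x, augment, noise_std=0.1):
        # Two independently noise-perturbed views of the augmented input
        view1 = augment(x) + noise_std * torch.randn_like(x)
        view2 = augment(x) + noise_std * torch.randn_like(x)
        log_p1 = F.log_softmax(model(view1), dim=1)
        p2 = F.softmax(model(view2), dim=1)
        # Penalize disagreement between the two predictive distributions
        return F.kl_div(log_p1, p2, reduction="batchmean")
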
Posted Content

A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution Detection: Solutions and Future Challenges.

TL;DR: In this paper, the authors provide a comprehensive cross-domain review of eminent works in each of these areas, identify their commonalities, and discuss future lines of research intended to bring these fields closer together.
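
One standard baseline shared by all of these detection problems is scoring inputs by maximum softmax probability and flagging low-confidence inputs. The sketch below, assuming PyTorch, is shown only to make the shared problem concrete; it predates the survey and is not a method proposed in it.

    import torch
    import torch.nn.functional as F

    def msp_score(model, x):
        # Maximum softmax probability: a common anomaly/OOD detection baseline
        with torch.no_grad():
            probs = F.softmax(model(x), dim=1)
        return probs.max(dim=1).values  # lower scores suggest anomalous inputs
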
Posted Content

A General Language Assistant as a Laboratory for Alignment

TL;DR: The authors investigate scaling trends for several training objectives relevant to alignment, comparing imitation learning, binary discrimination, and ranked preference modeling, and find that ranked preference modeling performs much better than imitation learning and often scales more favorably with model size.
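
A minimal sketch, assuming PyTorch, of pairwise ranked preference modeling of the kind compared in this line of work; reward_model is a hypothetical scalar-output model, and the pairwise form shown here is one common instantiation rather than the paper's exact objective.

    import torch.nn.functional as F

    def preference_loss(reward_model, preferred, rejected):
        # Scalar scores for the preferred and rejected responses
        r_pref = reward_model(preferred)
        r_rej = reward_model(rejected)
        # Train the preferred response to outscore the rejected one
        return -F.logsigmoid(r_pref - r_rej).mean()
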
References
Journal Article

Theory of the firm: Managerial behavior, agency costs and ownership structure

TL;DR: In this article, the authors draw on recent progress in the theory of property rights, agency, and finance to develop a theory of ownership structure for the firm, which casts new light on, and has implications for, a variety of issues in the professional and popular literature.
Book

A Treatise of Human Nature

David Hume
TL;DR: Hume's early years and education are described in A Treatise of Human Nature, as discussed by the authors, though this is not a complete account of his early life and education.
Proceedings Article

Towards Evaluating the Robustness of Neural Networks

TL;DR: In this paper, the authors introduce three new attack algorithms that succeed on both distilled and undistilled neural networks with 100% probability, demonstrating that defensive distillation does not significantly increase the robustness of neural networks.
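
The attacks in this paper are built around a margin objective on the logits; the sketch below, assuming PyTorch, shows one such margin term (it is minimized once the true-class logit falls at least kappa below the best other logit), omitting the perturbation-size term and optimization details.

    import torch

    def margin_loss(logits, y, kappa=0.0):
        true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        other = logits.clone()
        other.scatter_(1, y.unsqueeze(1), float("-inf"))  # mask the true class
        best_other = other.max(dim=1).values
        # Minimal (= -kappa) once the true class is beaten by margin kappa
        return torch.clamp(true_logit - best_other, min=-kappa)
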
Book

The Black Swan: The Impact of the Highly Improbable

TL;DR: As discussed by the authors, The Black Swan: The Impact of the Highly Improbable is a book about Black Swans: the random events that underlie our lives, from bestsellers to world disasters, which are impossible to predict, yet which we always try to rationalize after they happen.
Proceedings Article

Towards Deep Learning Models Resistant to Adversarial Attacks.

TL;DR: In this paper, the authors study the adversarial robustness of neural networks through the lens of robust optimization and identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal.
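
A minimal sketch, assuming PyTorch, of the projected gradient descent (PGD) inner maximization central to this robust-optimization view: take signed gradient steps on the loss and project back into an L-infinity ball around the input. Hyperparameters are illustrative.

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
        # Random start inside the L-infinity ball of radius eps
        x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
        for _ in range(steps):
            x_adv = x_adv.detach().requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            # Signed gradient ascent step, then projection back into the ball
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        return x_adv.detach()
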