Open Access · Posted Content
Unsolved Problems in ML Safety.
TL;DR
In this article, the authors provide a new roadmap for ML Safety and refine the technical problems that the field needs to address, namely withstanding hazards, identifying hazards, steering ML systems, and reducing risks in how ML systems are handled.
Abstract
Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. We present four problems ready for research, namely withstanding hazards ("Robustness"), identifying hazards ("Monitoring"), steering ML systems ("Alignment"), and reducing risks in how ML systems are handled ("External Safety"). Throughout, we clarify each problem's motivation and provide concrete research directions.
Citations
Posted Content
On the Opportunities and Risks of Foundation Models.
Rishi Bommasani,Drew A. Hudson,Ehsan Adeli,Russ B. Altman,Simran Arora,Sydney von Arx,Michael S. Bernstein,Jeannette Bohg,Antoine Bosselut,Emma Brunskill,Erik Brynjolfsson,Shyamal Buch,Dallas Card,Rodrigo Castellon,Niladri S. Chatterji,Annie Chen,Kathleen Creel,Jared Davis,Dora Demszky,Chris Donahue,Moussa Doumbouya,Esin Durmus,Stefano Ermon,John Etchemendy,Kawin Ethayarajh,Li Fei-Fei,Chelsea Finn,Trevor Gale,Lauren Gillespie,Karan Goel,Noah D. Goodman,Shelby Grossman,Neel Guha,Tatsunori Hashimoto,Peter Henderson,John Hewitt,Daniel E. Ho,Jenny Hong,Kyle Hsu,Jing Huang,Thomas Icard,Saahil Jain,Dan Jurafsky,Pratyusha Kalluri,Siddharth Karamcheti,Geoff Keeling,Fereshte Khani,Omar Khattab,Pang Wei Koh,Mark Krass,Ranjay Krishna,Rohith Kuditipudi,Ananya Kumar,Faisal Ladhak,Mina Lee,Tony Lee,Jure Leskovec,Isabelle Levent,Xiang Lisa Li,Xuechen Li,Tengyu Ma,Ali Ahmad Malik,Christopher D. Manning,Suvir Mirchandani,Eric Mitchell,Zanele Munyikwa,Suraj Nair,Avanika Narayan,Deepak Narayanan,Ben Newman,Allen Nie,Juan Carlos Niebles,Hamed Nilforoshan,Julian Nyarko,Giray Ogut,Laurel Orr,Isabel Papadimitriou,Joon Sung Park,Chris Piech,Eva Portelance,Christopher Potts,Aditi Raghunathan,Rob Reich,Hongyu Ren,Frieda Rong,Yusuf H. Roohani,Camilo Ruiz,Jack Ryan,Christopher Ré,Dorsa Sadigh,Shiori Sagawa,Keshav Santhanam,Andy Shih,Krishnan Srinivasan,Alex Tamkin,Rohan Taori,Armin W. Thomas,Florian Tramèr,Rose E. Wang,William Yang Wang,Bohan Wu,Jiajun Wu,Yuhuai Wu,Sang Michael Xie,Michihiro Yasunaga,Jiaxuan You,Matei Zaharia,Michael Zhang,Tianyi Zhang,Xikun Zhang,Yuhui Zhang,Lucia Zheng,Kaitlyn Zhou,Percy Liang
TL;DR: The authors provide a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications.
Posted Content
How to Certify Machine Learning Based Safety-critical Systems? A Systematic Literature Review
Florian Tambon,Gabriel Laberge,Le An,Amin Nikanjam,Paulina Stevia Nouwou Mindom,Yann Pequignot,Foutse Khomh,Giulio Antoniol,Ettore Merlo,François Laviolette
TL;DR: In this paper, the authors conduct a systematic literature review (SLR) of research papers published between 2015 and 2020 on the certification of ML systems, identifying 217 papers that span the topics considered the main pillars of ML certification: Robustness, Uncertainty, Explainability, Verification, Safe Reinforcement Learning, and Direct Certification.
Posted Content
Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines.
Jiachen Sun,Akshay Mehra,Bhavya Kailkhura,Pin-Yu Chen,Dan Hendrycks,Jihun Hamm,Z. Morley Mao
TL;DR: In this article, the authors propose FourierMix, an augmentation scheme that improves the spectral coverage of the training data, together with a new regularizer that encourages consistent predictions on noise perturbations of the augmented data.
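A consistency regularizer of the kind described above can be sketched as a Jensen-Shannon-style penalty between a model's predicted distributions on differently perturbed views of one input. The following is a minimal numpy sketch under that assumption; `js_consistency` is an illustrative name, not FourierMix's actual API:

```python
import numpy as np

def js_consistency(p_clean, p_aug1, p_aug2, eps=1e-12):
    """Jensen-Shannon-style penalty: pushes the model to predict
    similar class distributions for perturbed views of one input."""
    m = (p_clean + p_aug1 + p_aug2) / 3.0  # mixture of the three predictions
    def kl(p, q):
        return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
    return (kl(p_clean, m) + kl(p_aug1, m) + kl(p_aug2, m)) / 3.0

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.7])
print(js_consistency(p, p, p))  # ~0: identical predictions, no penalty
print(js_consistency(p, p, q))  # positive: disagreement is penalized
```

Added to a standard classification loss, such a term penalizes a model only when its predictions drift apart across perturbed copies of the same example.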
Posted Content
A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution Detection: Solutions and Future Challenges.
Mohammadreza Salehi,Hossein Mirzaei,Dan Hendrycks,Yixuan Li,Mohammad Hossein Rohban,Mohammad Sabokrou
TL;DR: In this paper, the authors provide a cross-domain and comprehensive review of numerous eminent works in respective areas while identifying their commonalities, and discuss and shed light on future lines of research, intending to bring these fields closer together.
Posted Content
A General Language Assistant as a Laboratory for Alignment
Amanda Askell,Yuntao Bai,Anna Chen,Dawn Drain,Deep Ganguli,Thomas Henighan,Andy Jones,Nicholas Joseph,Ben Mann,Nova DasSarma,Nelson Elhage,Zac Hatfield-Dodds,Danny Hernandez,Jackson Kernion,Kamal Ndousse,Catherine Olsson,Dario Amodei,Tom B. Brown,Jack Clark,Samuel McCandlish,Chris Olah,Jared Kaplan
TL;DR: The authors investigate scaling trends for several training objectives relevant to alignment, comparing imitation learning, binary discrimination, and ranked preference modeling, and find that ranked preference modeling performs much better than imitation learning and often scales more favorably with model size.
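Ranked preference modeling is commonly instantiated as a Bradley-Terry-style objective on scalar scores for a preferred and a rejected response. A minimal numpy sketch of that common formulation (the paper's exact objectives may differ, and `preference_loss` is an illustrative name):

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry style ranked-preference objective: the scalar score
    of the preferred response should exceed that of the rejected one.
    Loss is -log(sigmoid(margin)), written stably via log1p."""
    return float(np.log1p(np.exp(-(r_chosen - r_rejected))))

# A larger margin between chosen and rejected scores gives a lower loss;
# a zero margin gives log(2), i.e. the model is indifferent.
print(preference_loss(2.0, 0.0))
print(preference_loss(0.0, 0.0))
```

Minimizing this loss trains the model to assign higher scores to responses humans ranked above their alternatives.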
References
Journal Article
Theory of the firm: Managerial behavior, agency costs and ownership structure
TL;DR: In this article, the authors draw on recent progress in the theory of property rights, agency, and finance to develop a theory of ownership structure for the firm, which casts new light on and has implications for a variety of issues in the professional and popular literature.
Book
A Treatise of Human Nature
TL;DR: This edition of A Treatise of Human Nature opens with an account, though not a complete one, of Hume's early years and education, before presenting the treatise itself: Hume's attempt to introduce the experimental method of reasoning into moral subjects.
Proceedings Article
Towards Evaluating the Robustness of Neural Networks
Nicholas Carlini,David Wagner
TL;DR: In this paper, the authors demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability.
Book
The Black Swan: The Impact of the Highly Improbable
TL;DR: The Black Swan: The Impact of the Highly Improbable examines the random, highly improbable events that underlie our lives, from bestsellers to world disasters: events that are impossible to predict, yet that we always try to rationalize after they happen.
Proceedings Article
Towards Deep Learning Models Resistant to Adversarial Attacks.
TL;DR: The authors study the adversarial robustness of neural networks through the lens of robust optimization and identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal.
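The robust-optimization view trains against worst-case perturbations found inside a small ε-ball around each input. Below is a minimal numpy sketch of one gradient-sign step of that inner maximization for a logistic model (an FGSM-style step; iterated attacks add projection back into the ball, and all names here are illustrative, not from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """One gradient-sign step maximizing the logistic loss inside an
    L-infinity ball of radius eps around x -- the inner maximization of
    the robust-optimization objective min_w max_{||d||<=eps} loss(x+d)."""
    p = sigmoid(x @ w + b)   # predicted probability of the positive class
    grad_x = (p - y) * w     # gradient of the logistic loss w.r.t. x
    return x + eps * np.sign(grad_x)

w, b = np.array([1.0, -1.0]), 0.0
x, y = np.array([2.0, -1.0]), 1.0    # clean logit: x @ w + b = 3.0
x_adv = fgsm_perturb(x, y, w, b, eps=0.5)
print(x_adv @ w + b)  # logit drops from 3.0 toward the decision boundary
```

Adversarial training in this framing simply trains on `x_adv` in place of `x`, so the model minimizes loss under the strongest perturbation the attack can find.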