Proceedings Article
Measuring Forgetting of Memorized Training Examples
Matthew Jagielski, Om Thakkar, Florian Tramèr, Daphne Ippolito, Katherine Lee, Nicholas Carlini, Eric Wallace, Shuang Song, Abhradeep Guha Thakurta, Nicolas Papernot, Chiyuan Zhang
arXiv:2207.00099
TL;DR
It is shown that, while non-convexity can prevent forgetting from happening in the worst case, standard image and speech models empirically do forget examples over time; nondeterminism is identified as a potential explanation, since deterministically trained models do not forget.
Abstract
Machine learning models exhibit two seemingly contradictory phenomena: training data memorization and various forms of forgetting. In memorization, models overfit specific training examples and become susceptible to privacy attacks. In forgetting, examples which appeared early in training are forgotten by the end. In this work, we connect these phenomena. We propose a technique to measure to what extent models "forget" the specifics of training examples, becoming less susceptible to privacy attacks on examples they have not seen recently. We show that, while non-convex models can memorize data forever in the worst case, standard image, speech, and language models empirically do forget examples over time. We identify nondeterminism as a potential explanation, showing that deterministically trained models do not forget. Our results suggest that examples seen early when training with extremely large datasets (for instance, those examples used to pre-train a model) may observe privacy benefits at the expense of examples seen later.
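Below is a minimal, self-contained sketch of the kind of measurement the abstract describes: track the advantage of a simple loss-threshold membership inference attack across training checkpoints, and read a shrinking advantage as forgetting. The loss distributions and drift rate here are synthetic stand-ins invented for illustration; the paper's actual attacks and models differ.

```python
# A minimal sketch (not the paper's exact protocol) of measuring forgetting
# with a loss-threshold membership inference attack: an example counts as
# "remembered" while members' losses remain separable from non-members'.
import numpy as np

rng = np.random.default_rng(0)

def mia_advantage(member_losses, nonmember_losses):
    """Best single-threshold attack advantage (TPR - FPR) over all thresholds."""
    thresholds = np.concatenate([member_losses, nonmember_losses])
    best = 0.0
    for t in thresholds:
        tpr = np.mean(member_losses <= t)   # members tend to have lower loss
        fpr = np.mean(nonmember_losses <= t)
        best = max(best, tpr - fpr)
    return best

# Synthetic stand-in: early-seen examples' losses drift toward the
# non-member distribution as training continues past them.
for k in range(6):
    members = rng.normal(loc=1.0 + 0.3 * k, scale=0.5, size=1000)
    nonmembers = rng.normal(loc=3.0, scale=0.5, size=1000)
    print(f"checkpoint {k}: attack advantage = {mia_advantage(members, nonmembers):.2f}")
```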
Citations
Journal Article
PaLM 2 Technical Report
Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin George Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhi Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathleen S. Meier-Hellstern, Gaurav Mishra, Erica Oliveira Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez-Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan A. Botha, James Bradbury, Siddhartha Brahma, Kevin Michael Brooks, M. Catasta, Yongzhou Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, C. Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, M. Díaz, Nan Du, Ethan Dyer, Vladimir Feinberg, Fan Feng, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Guy Gur-Ari, Steven Hand, Hadi Hashemi, Le Hou, Joshua Howland, Anren Hu, Jeffrey Hui, Jeremy Scott Hurwitz, Michael Isard, Abe Ittycheriah, Matthew Jagielski, Wenhao Jia, Kathleen Kenealy, Maxim Krikun, Sneha Kudugunta, Katherine Lee, Benjamin N. Lee, Eric Li, Mu Li-Li, Wei Li, Yaguang Li, Jian Li, Hyeontaek Lim, Han Lin, Zhong-Zhong Liu, Frederick Liu, Marcello Maggioni, Aroma Mahendru, Joshua Maynez, Vedant Misra, Maysam Moussalem, Zachary Nado, John Nham, Eric Ni, Andrew Nystrom, Alicia Parrish, Marie Pellat, Martin Polacek, Alex Polozov, Reiner Pope, Siyuan Qiao, Emily Reif, Parker Riley, Alexandra Ros, Aurko Roy, Brennan Saeta, Rajkumar Samuel, Renee Shelby, Ambrose Jay Slone, Daniel Smilkov, David R. So, Daniela Sohn, Simon Tokumine, Vijay K. Vasudevan, Kiran Vodrahalli, Xuezhi Wang, Pidong Wang, Tao Wang, John Wieting, Yuhuai Wu, Ke Xu, Yu Yu Xu, Lin Wu Xue, Pengcheng Yin, Jia Yu, Biao Zhang, Steven X.F. Zheng, Ce Zheng, Wei Zhou, Denny Zhou, Slav Petrov, Yonghui Wu
TL;DR: PaLM 2 is a Transformer-based model trained using a mixture of objectives; it has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM.
Journal Article
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, Usvsn Sai Prashanth, Edward Raff, Lintang A. Sutawika, Oskar van der Wal
TL;DR: Pythia is a suite of 16 language models, ranging in size from 70M to 12B parameters, all trained on public data seen in exactly the same order, released with 154 checkpoints per model alongside tools to download and reconstruct the exact training dataloaders for further study.
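For readers who want to reproduce this kind of across-training analysis, the sketch below shows how Pythia's intermediate checkpoints can be loaded through the Hugging Face transformers library. The repository and revision names follow the Pythia model cards; verify the exact revisions against the EleutherAI releases before relying on them.

```python
# A small sketch of loading one of Pythia's per-step checkpoints via
# Hugging Face; revision names like "step3000" follow the Pythia model cards.
from transformers import AutoTokenizer, GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m",
    revision="step3000",          # one of the 154 intermediate checkpoints
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")

inputs = tokenizer("Hello, I am", return_tensors="pt")
tokens = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(tokens[0]))
```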
Journal Article
Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
TL;DR: The authors study image retrieval frameworks that compare generated images against training samples to detect replicated content, and identify cases where diffusion models, including Stable Diffusion, blatantly copy from their training data.
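A minimal sketch of the retrieval idea follows: embed generated and training images, then flag generations whose nearest training neighbor is overly similar. The embeddings here are random placeholders; the paper evaluates several learned image descriptors, which this stand-in does not reproduce.

```python
# Flag generated images whose best training-set match exceeds a similarity
# threshold; synthetic embeddings stand in for real image features.
import numpy as np

def cosine_sim_matrix(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def flag_replications(gen_embs, train_embs, threshold=0.95):
    sims = cosine_sim_matrix(gen_embs, train_embs)
    nearest = sims.max(axis=1)                 # best training match per generation
    return np.where(nearest >= threshold)[0], nearest

rng = np.random.default_rng(1)
train_embs = rng.normal(size=(1000, 128))
gen_embs = rng.normal(size=(50, 128))
gen_embs[0] = train_embs[42] + 0.01 * rng.normal(size=128)  # plant a near-copy

flagged, scores = flag_replications(gen_embs, train_embs)
print("flagged generations:", flagged, "similarities:", scores[flagged])
```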
Journal Article
A Survey of Machine Unlearning
Thanh Tam Nguyen, Thanh Trung Huynh, Phi-Le Nguyen, Alan Wee-Chung Liew, Hongzhi Yin, Quoc Viet Hung Nguyen
TL;DR: This paper presents a comprehensive examination of machine unlearning's concepts, scenarios, methods, and applications, collecting cutting-edge studies into a resource for researchers and practitioners seeking an introduction to machine unlearning.
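As a concrete anchor for the survey's terminology, the sketch below illustrates the exact-unlearning baseline that approximate methods are typically compared against: retrain from scratch without the deleted example. The least-squares model and data are invented stand-ins for any learning algorithm.

```python
# Exact unlearning by retraining: the gold standard that cheaper,
# approximate unlearning methods aim to match.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)

def fit(X, y):
    # Ordinary least squares as a stand-in for any training procedure.
    return np.linalg.lstsq(X, y, rcond=None)[0]

w_full = fit(X, y)

forget = 17                                   # index of the example to unlearn
keep = np.delete(np.arange(len(y)), forget)
w_unlearned = fit(X[keep], y[keep])           # retrain without the deleted point

print("parameter shift from deleting one example:",
      np.linalg.norm(w_full - w_unlearned))
```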
Journal Article
Analyzing Leakage of Personally Identifiable Information in Language Models
TL;DR: The authors introduce rigorous game-based definitions for three types of PII leakage (extraction, inference, and reconstruction attacks) requiring only black-box API access to an LM, and empirically evaluate the attacks against GPT-2 models fine-tuned with and without defenses in three domains: case law, health care, and e-mails.
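The sketch below illustrates the black-box threat model in a simplified form: sample from a generation API and scan outputs for PII-like strings. The `generate` function is a hypothetical placeholder for a real LM API, and the regex probe is far simpler than the paper's game-based attacks.

```python
# An illustrative black-box extraction probe: sample completions and
# scan them for email-address-shaped strings.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def generate(prompt: str, n_samples: int = 8) -> list[str]:
    # Placeholder: substitute a real model or API call here.
    return ["Contact alice@example.com for details."] * n_samples

def extract_pii(prompt: str) -> set[str]:
    found = set()
    for sample in generate(prompt):
        found.update(EMAIL.findall(sample))
    return found

print(extract_pii("The patient's email address is"))
```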
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
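The update rule itself is compact; below is a NumPy sketch of one Adam step using the paper's default hyperparameters, checked on a toy quadratic.

```python
# The Adam update: exponentially decayed first and second moment
# estimates with bias correction (defaults from the paper).
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2       # second-moment estimate
    m_hat = m / (1 - beta1**t)                  # bias correction
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 as a sanity check.
theta = np.array([5.0]); m = np.zeros_like(theta); v = np.zeros_like(theta)
for t in range(1, 5001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(theta)  # approaches 0
```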
Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
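The core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V; a minimal NumPy sketch:

```python
# Scaled dot-product attention on toy random inputs.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, d_k = 8
K = rng.normal(size=(6, 8))   # 6 key positions
V = rng.normal(size=(6, 16))  # values with d_v = 16
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 16)
```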
Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
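A short sketch of the "one additional output layer" recipe as exposed by the Hugging Face transformers library, which attaches a freshly initialized classification head on top of the pre-trained encoder; the checkpoint name and API are the library's conventions, not the paper's own code.

```python
# Fine-tuning setup: pre-trained BERT encoder plus one new output layer.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2   # randomly initialized head over [CLS]
)

inputs = tokenizer("A delightfully watchable film.", return_tensors="pt")
logits = model(**inputs).logits          # train these against task labels
print(logits.shape)                      # (1, 2)
```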
Posted Content
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
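A brief sketch of the unified text-to-text interface using the public t5-small checkpoint: every task is cast as input text to target text, selected by a task prefix. The prefixes follow the conventions reported in the paper; outputs from this small, untuned model are only illustrative.

```python
# Two different tasks handled by one model via task prefixes.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

for prompt in [
    "translate English to German: The house is wonderful.",
    "summarize: Machine learning models exhibit memorization and forgetting.",
]:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=32)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```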
Book Chapter
Catastrophic interference in connectionist networks: the sequential learning problem
Michael McCloskey, Neal J. Cohen
TL;DR: In this chapter, the authors discuss catastrophic interference in connectionist networks, showing that new learning may interfere catastrophically with old learning when networks are trained sequentially; their analysis of the causes of interference implies that at least some interference will occur whenever new learning alters weights involved in representing old learning.
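A toy demonstration of the effect, assuming nothing beyond NumPy: a single logistic-regression model trained on task A and then on a conflicting task B, with no rehearsal of A, loses its accuracy on A. The tasks are synthetic and deliberately chosen to conflict.

```python
# Sequential learning without rehearsal: training on task B overwrites
# the weights that represented task A.
import numpy as np

rng = np.random.default_rng(0)

def make_task(w_true, n=500):
    X = rng.normal(size=(n, 2))
    y = (X @ w_true > 0).astype(float)
    return X, y

def train(w, X, y, lr=0.5, steps=200):
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)      # logistic-loss gradient
    return w

def accuracy(w, X, y):
    return np.mean(((X @ w) > 0) == y)

Xa, ya = make_task(np.array([1.0, 0.0]))      # task A: sign of x1
Xb, yb = make_task(np.array([-1.0, 0.2]))     # task B: conflicts with A

w = np.zeros(2)
w = train(w, Xa, ya)
print("task A accuracy after A:", accuracy(w, Xa, ya))
w = train(w, Xb, yb)                          # sequential training, no rehearsal
print("task A accuracy after B:", accuracy(w, Xa, ya))
```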