Proceedings ArticleDOI

Measuring Forgetting of Memorized Training Examples

TL;DR
This work shows that, while non-convexity can prevent forgetting in the worst case, standard image and speech models empirically do forget examples over time; nondeterminism is identified as a potential explanation, since deterministically trained models do not forget.
Abstract
Machine learning models exhibit two seemingly contradictory phenomena: training data memorization, and various forms of forgetting. In memorization, models overfit specific training examples and become susceptible to privacy attacks. In forgetting, examples which appeared early in training are forgotten by the end. In this work, we connect these phenomena. We propose a technique to measure to what extent models "forget" the specifics of training examples, becoming less susceptible to privacy attacks on examples they have not seen recently. We show that, while non-convex models can memorize data forever in the worst case, standard image, speech, and language models empirically do forget examples over time. We identify nondeterminism as a potential explanation, showing that deterministically trained models do not forget. Our results suggest that examples seen early when training with extremely large datasets - for instance those examples used to pre-train a model - may enjoy privacy benefits at the expense of examples seen later.
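The abstract's measurement rests on privacy attacks whose success fades as examples are forgotten. As a minimal illustration (not the paper's exact attack), a loss-threshold membership-inference test guesses "member" for low-loss examples; its advantage over random guessing can be sketched as:

```python
import numpy as np

def mia_advantage(member_losses, nonmember_losses, threshold):
    """Membership-inference 'advantage' of a loss-threshold attack:
    true-positive rate minus false-positive rate. A model that has
    forgotten an example tends to assign it a higher loss, pushing
    the advantage toward 0 (members become indistinguishable)."""
    tpr = np.mean(member_losses < threshold)     # members correctly flagged
    fpr = np.mean(nonmember_losses < threshold)  # non-members wrongly flagged
    return tpr - fpr
```

The function names and the specific threshold attack here are illustrative assumptions; the paper measures forgetting with its own attack protocol.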


Citations
Journal ArticleDOI

PaLM 2 Technical Report

Rohan Anil, +121 more
17 May 2023
TL;DR: PaLM 2 is a Transformer-based model trained using a mixture of objectives; it has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM.
Journal ArticleDOI

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

TL;DR: Pythia is a suite of 16 language models trained on public data seen in the exact same order, ranging in size from 70M to 12B parameters, with 154 checkpoints for each of the 16 models, alongside tools to download and reconstruct their exact training data loaders for further study.
Journal ArticleDOI

Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models

TL;DR: The authors study image retrieval frameworks that enable comparing generated images with training samples and detecting when content has been replicated, and identify cases where diffusion models, including Stable Diffusion, blatantly copy from their training data.
Journal ArticleDOI

A Survey of Machine Unlearning

TL;DR: This paper presents a comprehensive examination of machine unlearning's concepts, scenarios, methods, and applications, collecting cutting-edge studies to serve as a resource for researchers and practitioners seeking an introduction to machine unlearning.
Journal ArticleDOI

Analyzing Leakage of Personally Identifiable Information in Language Models

TL;DR: The authors introduce rigorous game-based definitions for three types of PII leakage - black-box extraction, inference, and reconstruction attacks with only API access to an LM - and empirically evaluate the attacks against GPT-2 models fine-tuned with and without defenses in three domains: case law, health care, and e-mails.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
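The "adaptive estimates of lower-order moments" in the summary above can be made concrete with a minimal sketch of one Adam update (standard formulation; the variable names here are illustrative):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: keep exponential moving averages of the gradient
    (first moment m) and squared gradient (second moment v), correct
    their initialization bias, and scale the step by the corrected
    second moment."""
    m = beta1 * m + (1 - beta1) * grad        # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

For example, iterating this step on the gradient of a simple quadratic drives the parameter toward its minimum.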
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
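The attention mechanism at the core of this architecture is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. A minimal single-head sketch (NumPy, no batching or masking) might look like:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D query, key, and
    value matrices. Each output row is a weighted average of the rows
    of V, with weights given by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # numerically stable softmax over each row of scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

As a sanity check, when all keys are identical the attention weights are uniform, so the output is simply the mean of the value rows.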
Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Posted Content

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Book ChapterDOI

Catastrophic interference in connectionist networks: the sequential learning problem

TL;DR: The authors discuss catastrophic interference in connectionist networks, showing that new learning may interfere catastrophically with old learning when networks are trained sequentially; their analysis of the causes implies that at least some interference will occur whenever new learning alters weights involved in representing old learning.