Journal ArticleDOI

Learning to Generate Image Embeddings with User-level Differential Privacy

TLDR
DP-FedEmb is a variant of federated learning algorithms with per-user sensitivity control and noise addition, designed to train from user-partitioned data centralized in the datacenter.
Abstract
Small on-device models have been successfully trained with user-level differential privacy (DP) for next-word prediction and image classification tasks in the past. However, existing methods can fail when directly applied to learn embedding models using supervised training data with a large class space. To achieve user-level DP for large image-to-embedding feature extractors, we propose DP-FedEmb, a variant of federated learning algorithms with per-user sensitivity control and noise addition, to train from user-partitioned data centralized in the datacenter. DP-FedEmb combines virtual clients, partial aggregation, private local fine-tuning, and public pretraining to achieve strong privacy-utility trade-offs. We apply DP-FedEmb to train image embedding models for faces, landmarks, and natural species, and demonstrate its superior utility under the same privacy budget on the benchmark datasets DigiFace, EMNIST, GLD, and iNaturalist. We further illustrate that it is possible to achieve strong user-level DP guarantees of $\epsilon<4$ while keeping the utility drop within 5% when millions of users can participate in training.
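To make the mechanism concrete, here is a minimal NumPy sketch of the per-user sensitivity control and noise addition described in the abstract; the function name, the flat-vector update representation, and the aggregation details are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def dp_federated_round(user_updates, clip_norm, noise_multiplier, rng):
    """One round of user-level DP aggregation (illustrative sketch).

    user_updates: list of 1-D numpy arrays, one model delta per (virtual) user.
    clip_norm: per-user L2 sensitivity bound.
    noise_multiplier: Gaussian noise scale relative to clip_norm.
    """
    clipped = []
    for delta in user_updates:
        norm = np.linalg.norm(delta)
        # Scale each user's update so its L2 norm is at most clip_norm.
        clipped.append(delta * min(1.0, clip_norm / (norm + 1e-12)))
    total = np.sum(clipped, axis=0)
    # Add Gaussian noise calibrated to the per-user sensitivity bound.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(user_updates)
```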



Citations
Journal ArticleDOI

How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy

TL;DR: Differential privacy has become a gold standard for making formal statements about data anonymization; while some adoption of DP has happened in industry, attempts to apply DP to real-world complex ML models are still few and far between.
Journal ArticleDOI

Differentially Private Diffusion Models Generate Useful Synthetic Images

TL;DR: In this article, Wang et al. used differential privacy to fine-tune ImageNet-pretrained diffusion models with more than 80M parameters, obtaining SOTA results on CIFAR-10 and Camelyon17 in terms of both FID and the accuracy of downstream classifiers trained on synthetic data.
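Such DP fine-tuning builds on DP-SGD-style per-example gradient clipping and noising. Below is a hedged PyTorch sketch of a single private step using a per-example microbatch loop; the function name and hyperparameters are illustrative, and practical training would rely on a vectorized library such as Opacus rather than this slow loop.

```python
import torch

def dp_sgd_step(model, loss_fn, xs, ys, clip_norm=1.0, noise_mult=1.0, lr=1e-4):
    """One DP-SGD step with per-example gradient clipping (illustrative)."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(xs, ys):  # microbatch loop: one example at a time
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)  # accumulate the clipped per-example gradient
    with torch.no_grad():
        for p, s in zip(params, summed):
            noisy = (s + torch.randn_like(s) * noise_mult * clip_norm) / len(xs)
            p.add_(noisy, alpha=-lr)  # noisy gradient descent update
```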
Proceedings ArticleDOI

Federated Learning of Gboard Language Models with Differential Privacy

TL;DR: In this article, the authors train and deploy language models (LMs) with federated learning (FL) and differential privacy (DP) in Google Keyboard (Gboard), using the recent DP-Follow-the-Regularized-Leader (DP-FTRL) algorithm.
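DP-FTRL privatizes gradient prefix sums with the binary-tree mechanism rather than per-round subsampled noise: each tree node covers a dyadic interval of steps and carries its own Gaussian noise, so any prefix sum touches only O(log t) noise terms. A minimal NumPy sketch of that tree aggregation, with illustrative class and method names:

```python
import numpy as np

class TreeAggregator:
    """Noisy prefix sums via the binary-tree mechanism (illustrative sketch)."""

    def __init__(self, dim, noise_std, rng):
        self.dim, self.noise_std, self.rng = dim, noise_std, rng
        self.stack = []  # list of (level, true_sum, noisy_sum) dyadic nodes

    def _noise(self):
        return self.rng.normal(0.0, self.noise_std, size=self.dim)

    def add(self, grad):
        """Insert the gradient for the next step; return the noisy prefix sum."""
        self.stack.append((0, grad, grad + self._noise()))
        # Merge completed sibling intervals into their parent node,
        # which gets its own fresh noise.
        while len(self.stack) >= 2 and self.stack[-1][0] == self.stack[-2][0]:
            lvl, s1, _ = self.stack.pop()
            _, s2, _ = self.stack.pop()
            total = s1 + s2
            self.stack.append((lvl + 1, total, total + self._noise()))
        # The stack's intervals exactly tile steps 1..t.
        return sum(noisy for _, _, noisy in self.stack)
```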
Journal ArticleDOI

An Empirical Evaluation of Federated Contextual Bandit Algorithms

TL;DR: In this article, the authors propose federated contextual bandits for learning from sensitive data local to user devices, where learning can be done using implicit signals generated as users interact with the applications of interest, rather than requiring access to explicit labels.
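As one concrete instance of this setting (not necessarily one of the algorithms evaluated in the paper), here is a NumPy sketch of a federated LinUCB variant where clients accumulate sufficient statistics locally and a server merges the increments; all names and the environment interface are illustrative assumptions.

```python
import numpy as np

def linucb_action(A, b, contexts, alpha=1.0):
    """Pick the arm with the highest upper confidence bound (sketch)."""
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b
    scores = [x @ theta + alpha * np.sqrt(x @ A_inv @ x) for x in contexts]
    return int(np.argmax(scores))

def client_round(A, b, env_steps, alpha=1.0):
    """Run local bandit steps; return only the *increments* to the stats."""
    dA, db = np.zeros_like(A), np.zeros_like(b)
    for contexts, reward_fn in env_steps:  # per-step arm contexts and rewards
        a = linucb_action(A + dA, b + db, contexts, alpha)
        x, r = contexts[a], reward_fn(a)
        dA += np.outer(x, x)
        db += r * x
    return dA, db

def server_merge(A, b, client_increments):
    """Federated aggregation: sum client increments into the global model."""
    for dA, db in client_increments:
        A, b = A + dA, b + db
    return A, b
```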
Journal ArticleDOI

Can Public Large Language Models Help Private Cross-device Federated Learning?

TL;DR: The authors propose a distribution matching algorithm with theoretical grounding to sample public data close to the private data distribution, which significantly improves the sample efficiency of (pre-)training on public data and further improves the privacy-utility trade-off via distillation techniques.
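To convey the idea only, here is a deliberately simplified NumPy sketch of selecting public examples near the private distribution in a shared embedding space; the selection rule and names are illustrative assumptions, not the authors' algorithm, which is more principled and carries privacy accounting.

```python
import numpy as np

def select_public_samples(public_emb, private_emb, k):
    """Pick the k public examples closest to the private data distribution.

    public_emb: (N_pub, dim) embeddings of candidate public examples.
    private_emb: (N_priv, dim) embeddings of private examples.
    """
    target = private_emb.mean(axis=0)               # summary of private data
    dists = np.linalg.norm(public_emb - target, axis=1)
    return np.argsort(dists)[:k]                    # indices of best matches
```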
References
Posted Content

Deep Residual Learning for Image Recognition

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
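As a reminder of the core idea, a minimal PyTorch sketch of a basic residual block follows; the class name and channel handling (same input and output width) are illustrative simplifications of the paper's architecture.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic ResNet block: the network learns a residual F(x) added to x."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut eases optimization
```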
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
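Below is a minimal NumPy sketch of the training-mode batch normalization transform; the running-statistics bookkeeping used at inference time is omitted for brevity.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize activations over the batch dimension (training-mode sketch).

    x: (batch, features). gamma/beta: learnable per-feature scale and shift.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta
```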

Gradient-based learning applied to document recognition

TL;DR: This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task; convolutional neural networks are shown to outperform all other techniques.
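For concreteness, here is a LeNet-style network in PyTorch for 28x28 digit images, in the spirit of the architecture this paper made famous; the exact layer sizes and activations are an illustrative approximation, not a faithful reproduction.

```python
import torch.nn as nn

# LeNet-style CNN: alternating convolution and pooling layers
# followed by fully connected layers, for 1x28x28 inputs.
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Tanh(),
    nn.AvgPool2d(2),                       # 6x28x28 -> 6x14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),
    nn.AvgPool2d(2),                       # 16x10x10 -> 16x5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),                     # 10 digit classes
)
```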
Posted Content

MobileNetV2: Inverted Residuals and Linear Bottlenecks

TL;DR: A new mobile architecture, MobileNetV2, is described that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks, as well as across a spectrum of different model sizes, and allows decoupling of the input/output domains from the expressiveness of the transformation.
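The building block behind this is the inverted residual with a linear bottleneck: a 1x1 expansion, a depthwise 3x3 convolution, and a 1x1 projection with no activation. A PyTorch sketch, with the class name and default expansion factor as assumptions:

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2 block: 1x1 expand, depthwise 3x3, linear 1x1 project."""

    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        hidden = in_ch * expand
        self.use_skip = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),          # expand
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride,
                      padding=1, groups=hidden, bias=False),  # depthwise
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),         # linear bottleneck
            nn.BatchNorm2d(out_ch),                           # no activation here
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_skip else out  # skip only when shapes match
```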
Posted Content

A Simple Framework for Contrastive Learning of Visual Representations

TL;DR: It is shown that the composition of data augmentations plays a critical role in defining effective predictive tasks, that introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
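At the heart of this framework (SimCLR) is the normalized temperature-scaled cross-entropy (NT-Xent) loss over augmented pairs. A minimal PyTorch sketch, where the batch layout (two aligned views of the same images) is an assumption:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss over a batch of augmented pairs (sketch).

    z1, z2: (batch, dim) projections of two augmentations of the same images.
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # 2N x dim, unit norm
    sim = z @ z.t() / temperature                        # pairwise similarities
    n = z1.size(0)
    # Mask self-similarity so an example is never its own negative.
    sim.fill_diagonal_(float('-inf'))
    # The positive for index i is its augmented counterpart at i +/- n.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```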