Open Access · Posted Content

Comparative Study of Differentially Private Synthetic Data Algorithms from the NIST PSCR Differential Privacy Synthetic Data Challenge

TLDR
In this paper, the authors present an in-depth evaluation of several differentially private synthetic data algorithms using actual differentially private synthetic data sets created by contestants in the 2018-2019 National Institute of Standards and Technology Public Safety Communications Research (NIST PSCR) Division's "Differential Privacy Synthetic Data Challenge."
Abstract
Differentially private synthetic data generation offers a recent solution to release analytically useful data while preserving the privacy of individuals in the data. In order to utilize these algorithms for public policy decisions, policymakers need an accurate understanding of these algorithms' comparative performance. Correspondingly, data practitioners require standard metrics for evaluating the analytic qualities of the synthetic data. In this paper, we present an in-depth evaluation of several differentially private synthetic data algorithms using actual differentially private synthetic data sets created by contestants in the 2018-2019 National Institute of Standards and Technology Public Safety Communications Research (NIST PSCR) Division's "Differential Privacy Synthetic Data Challenge." We offer analyses of these algorithms based on both the accuracy of the data they created and their usability by potential data providers. We frame the methods used in the NIST PSCR data challenge within the broader differentially private synthetic data literature. We implement additional utility metrics, including two of our own, on the differentially private synthetic data and compare mechanism utility across three categories. Our comparative assessment of the differentially private data synthesis methods and the quality metrics shows their relative usefulness and general strengths and weaknesses, and offers preferred choices of algorithms and metrics. Finally, we describe the implications of our evaluation for policymakers seeking to implement differentially private synthetic data algorithms on future data products.
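To make the notion of a utility metric concrete, below is a minimal sketch of the propensity-score mean-squared-error (pMSE), a common general-purpose utility measure in the synthetic data literature. It is shown for illustration only and is not necessarily one of the metrics implemented in the paper; the function name and the use of pandas/scikit-learn are assumptions.

```python
# Illustrative utility metric: propensity-score MSE (pMSE). A classifier is
# trained to distinguish real from synthetic records; if the synthetic data
# is analytically similar, predicted propensities stay near the synthetic
# fraction c and the pMSE is small.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def pmse(real: pd.DataFrame, synthetic: pd.DataFrame) -> float:
    combined = pd.concat([real, synthetic], ignore_index=True)
    labels = np.r_[np.zeros(len(real)), np.ones(len(synthetic))]  # 1 = synthetic
    X = pd.get_dummies(combined)  # one-hot encode categorical columns
    propensity = LogisticRegression(max_iter=1000).fit(X, labels).predict_proba(X)[:, 1]
    c = len(synthetic) / len(combined)  # expected propensity if indistinguishable
    return float(np.mean((propensity - c) ** 2))
```

A pMSE of zero means the classifier cannot tell the two data sets apart; larger values indicate lower utility.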


Citations
Journal Article

Synthetic Data - what, why and how?

TL;DR: In this paper, the authors propose approaches for empirically evaluating synthetic data in terms of both its privacy and its utility, though their approach is limited in scope.
Proceedings Article

Synthetic and Private Smart Health Care Data Generation using GANs

TL;DR: In this paper, a GAN coupled with differential privacy mechanisms is proposed for generating a realistic and private smart health care dataset; the model generates differentially private synthetic data samples under two settings: learning from a noisy distribution or noising the learned distribution.
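The two settings named in this summary correspond to input perturbation (noise the training data before learning) and output perturbation (noise what was learned). Below is a toy sketch of the distinction using a trivial Gaussian "model" in place of a GAN; the noise scales are placeholders rather than calibrated privacy parameters, and this is not the paper's pipeline.

```python
# Toy contrast of the two settings: perturb the inputs before learning, or
# perturb what was learned. Noise scales here are placeholders; real DP
# calibrates them to sensitivity and the privacy budget.
import numpy as np

rng = np.random.default_rng(0)
records = rng.normal(loc=50.0, scale=10.0, size=1000)  # stand-in for health data

# Setting 1: learn from a noisy distribution (input perturbation).
noisy_records = records + rng.laplace(scale=2.0, size=records.shape)
mu1, sd1 = noisy_records.mean(), noisy_records.std()

# Setting 2: noise the learned distribution (output perturbation); here the
# "model" is just a fitted mean and standard deviation.
mu2 = records.mean() + rng.laplace(scale=2.0)
sd2 = abs(records.std() + rng.laplace(scale=2.0))

synthetic = rng.normal(mu2, sd2, size=1000)  # private synthetic draws
```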
Journal Article

Statistical Data Privacy: A Song of Privacy and Utility

TL;DR: The statistical foundations common to both statistical disclosure control (SDC) and differential privacy (DP) are discussed, major developments in statistical data privacy (SDP) are highlighted, and exciting open research problems in private inference are presented.
Proceedings Article

Archimedes Meets Privacy: On Privately Estimating Quantiles in High Dimensions Under Minimal Assumptions

TL;DR: This work shows how one can privately, with polynomially many samples, output an approximate interior point of the floating body (FB) and produce an approximate uniform sample from the FB by constructing a private noisy projection oracle, all under very mild distributional assumptions.
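For one-dimensional intuition about private quantile estimation (a far simpler setting than the high-dimensional one above), a standard textbook approach is the exponential mechanism over a discretized domain. The sketch below is illustrative, assuming a bounded domain [lo, hi] and a fixed data size; it is not the method of the paper.

```python
# Exponential mechanism for a private quantile in 1-D: score each candidate
# by how close its rank is to the target rank, then sample a candidate with
# probability proportional to exp(eps * score / 2).
import numpy as np

def private_quantile(data, q, epsilon, lo, hi, grid=1000, rng=None):
    rng = rng or np.random.default_rng()
    candidates = np.linspace(lo, hi, grid)
    ranks = np.searchsorted(np.sort(data), candidates)
    utility = -np.abs(ranks - q * len(data))  # sensitivity 1 if one record is swapped
    weights = np.exp(epsilon * (utility - utility.max()) / 2)  # stabilized softmax
    return rng.choice(candidates, p=weights / weights.sum())
```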
Book Chapter

An Incentive Mechanism for Trading Personal Data in Data Markets

TL;DR: In this article, a pricing mechanism that takes into account the trade-off between privacy and accuracy is proposed; it induces the data provider to report her privacy price accurately and is optimized to maximize the data consumer's profit within budget constraints.
References
Book Chapter

Calibrating noise to sensitivity in private data analysis

TL;DR: In this article, the authors show that for several particular applications substantially less noise is needed than was previously understood to be the case, and also prove separation results showing the increased value of interactive sanitization mechanisms over non-interactive ones.
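This is the reference that introduced the Laplace mechanism: add noise drawn from Lap(Δf/ε), where Δf is the query's global sensitivity (the most its answer can change when one record changes). A minimal sketch of that idea:

```python
# Minimal sketch of the Laplace mechanism: calibrate noise to the query's
# global sensitivity divided by the privacy budget epsilon.
import numpy as np

_rng = np.random.default_rng()

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
    return true_answer + _rng.laplace(scale=sensitivity / epsilon)

# A counting query changes by at most 1 when one record changes, so it has
# sensitivity 1; smaller epsilon means more noise and stronger privacy.
ages = [34, 45, 29, 61, 50]
noisy_count = laplace_mechanism(sum(a > 40 for a in ages), sensitivity=1.0, epsilon=0.5)
```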
Proceedings Article

Deep Learning with Differential Privacy

TL;DR: In this paper, the authors develop new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy, and demonstrate that they can train deep neural networks with nonconvex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.
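The core training technique of this reference (now widely known as DP-SGD) clips each per-example gradient, adds Gaussian noise to the clipped sum, and averages. Below is a framework-free sketch of one update step; the per-example gradients are assumed to be supplied by the caller.

```python
# One DP-SGD update step: clip each per-example gradient to L2 norm C, add
# Gaussian noise N(0, (sigma*C)^2) to the sum, average over the batch, step.
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (np.sum(clipped, axis=0) + noise) / len(per_example_grads)
    return params - lr * noisy_mean
```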
Proceedings Article

Privacy integrated queries: an extensible platform for privacy-preserving data analysis

TL;DR: PINQ's unconditional structural guarantees require no trust placed in the expertise or diligence of the analysts, substantially broadening the scope for design and deployment of privacy-preserving data analysis, especially by non-experts.
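PINQ's key idea is that a trusted runtime, not the analyst, tracks the privacy budget consumed by each query. Below is a toy Python analogue of that budget accounting; PINQ itself is a C#/LINQ platform, so this is purely illustrative and not its API.

```python
# Toy analogue of PINQ-style budget accounting: the runtime tracks epsilon
# spent and refuses queries once the budget is exhausted.
import numpy as np

class PrivateDataset:
    def __init__(self, data, total_epsilon):
        self._data = np.asarray(data)
        self._remaining = total_epsilon
        self._rng = np.random.default_rng()

    def noisy_count(self, predicate, epsilon):
        if epsilon > self._remaining:
            raise RuntimeError("privacy budget exhausted")
        self._remaining -= epsilon  # sequential composition
        true_count = int(np.sum(predicate(self._data)))
        return true_count + self._rng.laplace(scale=1.0 / epsilon)  # sensitivity 1

ds = PrivateDataset([23, 41, 37, 58], total_epsilon=1.0)
print(ds.noisy_count(lambda x: x > 30, epsilon=0.5))  # leaves 0.5 of the budget
```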
Journal Article

Data Privacy: Effects on Customer and Firm Performance

TL;DR: In this article, a conceptual framework grounded in gossip theory is used to link customer vulnerability to negative performance effects and to show that transparency and control in firms' data management practices can suppress the negative effects of customer data vulnerability.