Differentially private histogram publication (2013) | Jia Xu

Citations

Proceedings Article

Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy

Zhan Qin¹, Yin Yang², Ting Yu¹, Issa Khalil¹, Xiaokui Xiao³, Kui Ren⁴

Qatar Computing Research Institute¹, Khalifa University², Nanyang Technological University³, University at Buffalo⁴

24 Oct 2016

TL;DR: The main idea is to first gather a candidate set of heavy hitters using a portion of the privacy budget, and focus the remaining budget on refining the candidate set in a second phase, which is much more efficient budget-wise than obtaining the heavy hitters directly from the whole dataset.

Abstract: In local differential privacy (LDP), each user perturbs her data locally before sending the noisy data to a data collector. The latter then analyzes the data to obtain useful statistics. Unlike the setting of centralized differential privacy, in LDP the data collector never gains access to the exact values of sensitive data, which protects not only the privacy of data contributors but also the collector itself against the risk of potential data leakage. Existing LDP solutions in the literature are mostly limited to the case that each user possesses a tuple of numeric or categorical values, and the data collector computes basic statistics such as counts or mean values. To the best of our knowledge, no existing work tackles more complex data mining tasks such as heavy hitter discovery over set-valued data. In this paper, we present a systematic study of heavy hitter mining under LDP. We first review existing solutions, extend them to heavy hitter estimation, and explain why their effectiveness is limited. We then propose LDPMiner, a two-phase mechanism for obtaining accurate heavy hitters with LDP. The main idea is to first gather a candidate set of heavy hitters using a portion of the privacy budget, and focus the remaining budget on refining the candidate set in a second phase, which is much more efficient budget-wise than obtaining the heavy hitters directly from the whole dataset. We provide both in-depth theoretical analysis and extensive experiments to compare LDPMiner against adaptations of previous solutions. The results show that LDPMiner significantly improves over existing methods. More importantly, LDPMiner successfully identifies the majority of true heavy hitters in practical settings.
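
The two-phase budget split described in the abstract can be illustrated with a much simplified sketch. It is not LDPMiner itself: it assumes each user holds a single item rather than a set, and it uses plain k-ary randomized response as the local perturbation. The function names, the `OTHER` sentinel, the 50/50 budget split and the choice of `2 * k_top` candidates are all hypothetical.

```python
import math
import random
from collections import Counter

OTHER = "<other>"  # hypothetical sentinel for "my item is not a candidate"

def k_rr(value, domain, epsilon, rng):
    """k-ary randomized response: report the true value with probability p,
    otherwise report one of the remaining values uniformly at random."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])

def estimate_counts(reports, domain, epsilon):
    """Unbiased count estimates from k-ary randomized response reports."""
    n, k = len(reports), len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = 1.0 / (math.exp(epsilon) + k - 1)
    observed = Counter(reports)
    return {v: (observed[v] - n * q) / (p - q) for v in domain}

def two_phase_heavy_hitters(items, domain, epsilon, k_top, split=0.5, seed=0):
    """Spend split*epsilon on finding candidates, the rest on refining them."""
    rng = random.Random(seed)
    eps1, eps2 = split * epsilon, (1.0 - split) * epsilon

    # Phase 1: noisy reports over the full domain yield a candidate set.
    reports1 = [k_rr(x, domain, eps1, rng) for x in items]
    est1 = estimate_counts(reports1, domain, eps1)
    candidates = sorted(domain, key=est1.get, reverse=True)[: 2 * k_top]

    # Phase 2: each user reports again, now over the much smaller domain
    # "candidates plus OTHER", so the same budget buys less noise per item.
    small = candidates + [OTHER]
    cand_set = set(candidates)
    reports2 = [k_rr(x if x in cand_set else OTHER, small, eps2, rng)
                for x in items]
    est2 = estimate_counts(reports2, small, eps2)
    return sorted(candidates, key=est2.get, reverse=True)[:k_top]
```

Each user sends two reports, so by sequential composition the total cost per user is eps1 + eps2 = epsilon. The candidate set is computed from already perturbed reports, so publishing it to users for the second phase consumes no extra budget.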

304 citations

Cites background from "Differentially private histogram pu..."

...[31] propose a two-phase kdtree based spatial decomposition mechanism to publish histograms....

Posted Content

Functional Mechanism: Regression Analysis under Differential Privacy

Jun Zhang¹, Zhenjie Zhang, Xiaokui Xiao¹, Yin Yang, Marianne Winslett²

Nanyang Technological University¹, University of Illinois at Urbana–Champaign²

01 Aug 2012-arXiv: Databases

TL;DR: In this paper, the authors proposed the functional mechanism, which perturbs the objective function of the optimization problem, rather than its results, and applied it to linear regression and logistic regression.

Abstract: ε-differential privacy is the state-of-the-art model for releasing sensitive information while protecting privacy. Numerous methods have been proposed to enforce ε-differential privacy in various analytical tasks, e.g., regression analysis. Existing solutions for regression analysis, however, are either limited to non-standard types of regression or unable to produce accurate regression results. Motivated by this, we propose the Functional Mechanism, a differentially private method designed for a large class of optimization-based analyses. The main idea is to enforce ε-differential privacy by perturbing the objective function of the optimization problem, rather than its results. As case studies, we apply the functional mechanism to address the two most widely used regression models, namely, linear regression and logistic regression. Both theoretical analysis and thorough experimental evaluations show that the functional mechanism is highly effective and efficient, and it significantly outperforms existing solutions.
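
A minimal sketch of "perturb the objective, not the result" for one-dimensional least squares follows. It assumes every record satisfies |x| ≤ 1 and |y| ≤ 1 and that neighboring datasets differ by replacing one record. The objective is f(w) = Σy² − 2w·Σxy + w²·Σx², so each record contributes at most 1 + 2 + 1 = 4 to the L1 norm of the coefficient vector and a replacement changes it by at most 8. The function names and the `floor` clamp are hypothetical; the clamp is only a stand-in for keeping the noisy objective bounded and is not taken from the paper.

```python
import random

def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_slope(xs, ys, epsilon, rng, floor=1e-6):
    """Fit y ~ w*x by minimizing a noisy version of the squared-error objective.

    Assumes |x| <= 1 and |y| <= 1 for every record (not checked here).
    """
    # Coefficients of f(w) = c0 + c1*w + c2*w^2. The constant c0 does not
    # affect the minimizer, so it is never released.
    c1 = -2.0 * sum(x * y for x, y in zip(xs, ys))
    c2 = sum(x * x for x in xs)

    # Conservative L1 sensitivity bound derived in the lead-in above.
    scale = 8.0 / epsilon
    c1_noisy = c1 + laplace(scale, rng)
    # Hypothetical safeguard: keep the quadratic convex after noise.
    c2_noisy = max(c2 + laplace(scale, rng), floor)

    # Minimizer of the noisy quadratic; pure post-processing of noisy values.
    return -c1_noisy / (2.0 * c2_noisy)
```

Because the noise scale does not depend on the number of records while the true coefficients grow with it, the relative error of the fitted slope shrinks as the dataset gets larger.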

297 citations

Journal Article

Differentially Private Data Publishing and Analysis: A Survey

Tianqing Zhu¹, Gang Li¹, Wanlei Zhou¹, Philip S. Yu²

Deakin University¹, University of Illinois at Chicago²

01 Aug 2017-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This survey compares the diverse release mechanisms of differentially private data publishing given a variety of input data in terms of query type, the maximum number of queries, efficiency, and accuracy.

Abstract: Differential privacy is an essential and prevalent privacy model that has been widely explored in recent decades. This survey provides a comprehensive and structured overview of two research directions: differentially private data publishing and differentially private data analysis. We compare the diverse release mechanisms of differentially private data publishing given a variety of input data in terms of query type, the maximum number of queries, efficiency, and accuracy. We identify two basic frameworks for differentially private data analysis and list the typical algorithms used within each framework. The results are compared and discussed based on output accuracy and efficiency. Further, we propose several possible directions for future research and possible applications.

265 citations

Proceedings Article

Blowfish privacy: tuning privacy-utility trade-offs using policies

Xi He¹, Ashwin Machanavajjhala¹, Bolin Ding²

Duke University¹, Microsoft²

18 Jun 2014

TL;DR: Blowfish, a class of privacy definitions inspired by the Pufferfish framework, is presented that allows data publishers to extend differential privacy using a policy, which specifies secrets, or information that must be kept secret, and constraints that may be known about the data.

Abstract: Privacy definitions provide ways for trading off the privacy of individuals in a statistical database for the utility of downstream analysis of the data. In this paper, we present Blowfish, a class of privacy definitions inspired by the Pufferfish framework, that provides a rich interface for this trade-off. In particular, we allow data publishers to extend differential privacy using a policy, which specifies (a) secrets, or information that must be kept secret, and (b) constraints that may be known about the data. While the secret specification allows increased utility by lessening protection for certain individual properties, the constraint specification provides added protection against an adversary who knows correlations in the data (arising from constraints). We formalize policies and present novel algorithms that can handle general specifications of sensitive information and certain count constraints. We show that there are reasonable policies under which our privacy mechanisms for k-means clustering, histograms and range queries introduce significantly less noise than their differentially private counterparts. We quantify the privacy-utility trade-offs for various policies analytically and empirically on real datasets.

179 citations

Cites methods from "Differentially private histogram pu..."

...Various hierarchical methods have been proposed in the literature [9, 19, 15, 20, 18]....

Journal Article

Understanding hierarchical methods for differentially private histograms

Wahbeh Qardaji¹, Weining Yang¹, Ninghui Li¹

Purdue University¹

01 Sep 2013

TL;DR: This paper examines the factors affecting the accuracy of hierarchical approaches by studying the mean squared error (MSE) when answering range queries, and analyzes how the MSE changes with different branching factors, after employing constrained inference, and with different methods to allocate the privacy budget among hierarchy levels.

Abstract: In recent years, many approaches to differentially privately publish histograms have been proposed. Several approaches rely on constructing tree structures in order to decrease the error when answering large range queries. In this paper, we examine the factors affecting the accuracy of hierarchical approaches by studying the mean squared error (MSE) when answering range queries. We start with one-dimensional histograms, and analyze how the MSE changes with different branching factors, after employing constrained inference, and with different methods to allocate the privacy budget among hierarchy levels. Our analysis and experimental results show that combining the choice of a good branching factor with constrained inference outperforms the current state of the art. Finally, we extend our analysis to multi-dimensional histograms. We show that the benefits from employing hierarchical methods beyond a single dimension are significantly diminished, and when there are 3 or more dimensions, it is almost always better to use the Flat method instead of a hierarchy.
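
The trade-off between the Flat method and a hierarchy can be made concrete with a small variance calculation. This is a sketch under simplifying assumptions, not the paper's analysis: each bin or tree node gets independent Laplace noise, every level of the tree (root included) receives an equal share of ε, the number of bins is an exact power of the branching factor, and no constrained inference is applied. All function names are hypothetical.

```python
def flat_variance(length, epsilon):
    """Variance of a range query over `length` bins when each bin has
    independent Laplace(1/epsilon) noise (variance 2/epsilon^2 per bin)."""
    return length * 2.0 / epsilon ** 2

def count_nodes(lo, hi, start, size, branching):
    """Number of tree nodes needed to cover bins [lo, hi) inside the node
    spanning [start, start + size)."""
    end = start + size
    if hi <= start or end <= lo:
        return 0  # node is disjoint from the query
    if lo <= start and end <= hi:
        return 1  # node lies fully inside the query, use its noisy count
    child = size // branching
    return sum(count_nodes(lo, hi, start + i * child, child, branching)
               for i in range(branching))

def tree_levels(n_bins, branching):
    """Number of levels from leaves to root; n_bins must be a power of branching."""
    levels, size = 1, n_bins
    while size > 1:
        assert size % branching == 0, "n_bins must be a power of branching"
        size //= branching
        levels += 1
    return levels

def tree_variance(lo, hi, n_bins, branching, epsilon):
    """Variance of range query [lo, hi) answered from the noisy tree, with the
    budget split evenly so each node gets Laplace(levels/epsilon) noise."""
    levels = tree_levels(n_bins, branching)
    per_node = 2.0 * (levels / epsilon) ** 2
    return count_nodes(lo, hi, 0, n_bins, branching) * per_node
```

With 1024 bins, branching factor 2 and ε = 1, the full-range query costs 2048 under the flat scheme but 242 under the tree, while a single-bin query costs 2 under the flat scheme and 242 under the tree. The hierarchy pays a fixed per-node penalty for splitting the budget and wins only once ranges are long enough.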

174 citations

Cites methods from "Differentially private histogram pu..."

...[23] propose an alternative non-hierarchical mechanism for publishing histograms that can improve upon the Flat method....
...In order to generate the optimal histogram structure, [23] proposes using dynamic programming techniques....
