Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy

doi:10.1145/2976749.2978409

Proceedings article, pp. 192-203


Abstract:

In local differential privacy (LDP), each user perturbs her data locally before sending the noisy data to a data collector, who then analyzes the data to obtain useful statistics. Unlike in centralized differential privacy, in LDP the data collector never gains access to the exact values of sensitive data, which protects not only the privacy of data contributors but also the collector itself against the risk of data leakage. Existing LDP solutions in the literature are mostly limited to the case where each user holds a tuple of numeric or categorical values and the data collector computes basic statistics such as counts or mean values. To the best of our knowledge, no existing work tackles more complex data mining tasks such as heavy hitter discovery over set-valued data. In this paper, we present a systematic study of heavy hitter mining under LDP. We first review existing solutions, extend them to heavy hitter estimation, and explain why their effectiveness is limited. We then propose LDPMiner, a two-phase mechanism for obtaining accurate heavy hitters with LDP. The main idea is to first gather a candidate set of heavy hitters using a portion of the privacy budget, and then focus the remaining budget on refining the candidate set in a second phase, which is much more budget-efficient than obtaining the heavy hitters directly from the whole dataset. We provide both in-depth theoretical analysis and extensive experiments comparing LDPMiner against adaptations of previous solutions. The results show that LDPMiner significantly improves over existing methods. More importantly, LDPMiner successfully identifies the majority of the true heavy hitters in practical settings.
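The two-phase idea (spend part of the budget finding candidates, then the rest refining only those candidates) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not LDPMiner itself: it assumes k-ary randomized response (k-RR) as the frequency oracle in both phases, a simple pad-and-sample step that turns each user's set into a single item, an even budget split, and 2k candidates. The names DUMMY, pad_and_sample, ldp_round and two_phase_heavy_hitters, and their parameters, are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical padding symbol; assumed not to collide with any real item.
DUMMY = "<dummy>"


def krr_perturb(value, domain, epsilon, rng):
    """k-ary randomized response: keep `value` w.p. e^eps / (e^eps + k - 1),
    otherwise report one of the other k - 1 domain values uniformly."""
    k = len(domain)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    return rng.choice([v for v in domain if v != value])


def krr_estimate(reports, domain, epsilon):
    """Unbiased estimate of how many (pre-noise) user inputs equal each value."""
    k, n = len(domain), len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)  # P[report v | input v]
    q = 1.0 / (math.exp(epsilon) + k - 1)                # P[report v | input u != v]
    counts = Counter(reports)
    return {v: (counts[v] - n * q) / (p - q) for v in domain}


def pad_and_sample(items, length, rng):
    """Return one item; each held item is picked w.p. 1/length when
    len(items) <= length, and DUMMY fills the remaining probability."""
    items = sorted(items)
    if len(items) > length:
        items = rng.sample(items, length)  # truncation biases counts downward
    idx = rng.randrange(length)
    return items[idx] if idx < len(items) else DUMMY


def ldp_round(user_sets, domain, epsilon, length, rng):
    """One epsilon-LDP round restricted to `domain`: every user samples one of
    her items in `domain`, perturbs it with k-RR, and the collector scales the
    estimated counts by `length` to undo the sampling."""
    allowed = set(domain)
    full_domain = sorted(allowed) + [DUMMY]
    reports = []
    for s in user_sets:
        sampled = pad_and_sample([x for x in s if x in allowed], length, rng)
        reports.append(krr_perturb(sampled, full_domain, epsilon, rng))
    est = krr_estimate(reports, full_domain, epsilon)
    return {v: est[v] * length for v in full_domain[:-1]}


def two_phase_heavy_hitters(user_sets, domain, epsilon, k, length, split=0.5, seed=0):
    """Phase 1 spends epsilon * split to pick 2k candidates over the full domain;
    phase 2 spends the rest re-estimating only those candidates. Each user
    reports once per phase, so her total cost is epsilon (sequential composition)."""
    rng = random.Random(seed)
    eps1, eps2 = epsilon * split, epsilon * (1 - split)
    phase1 = ldp_round(user_sets, domain, eps1, length, rng)
    candidates = sorted(phase1, key=phase1.get, reverse=True)[: 2 * k]
    # A user holds at most len(candidates) candidates, so less padding is needed.
    phase2 = ldp_round(user_sets, candidates, eps2, min(length, len(candidates)), rng)
    return sorted(phase2, key=phase2.get, reverse=True)[:k]
```

In this sketch, phase 2 runs k-RR over only the 2k candidates plus the dummy symbol instead of the whole item domain, so with the same per-phase epsilon each report is far more likely to carry the true item; this is one concrete way to see why refining a small candidate set can be more budget-efficient. The rescaling by `length` is unbiased only for users whose sets fit within `length` items.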


