Publishing set-valued data via differential privacy
Citations
Differential Privacy Techniques for Cyber Physical Systems: A Survey
Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy
Differentially private sequential data publication via variable-length n-grams
Differentially private transit data publication: a case study on the montreal transportation system
References
Data Mining: Concepts and Techniques
k-anonymity: a model for protecting privacy
Calibrating noise to sensitivity in private data analysis
Frequently Asked Questions (16)
Q2. What have the authors stated for future works in "Publishing set-valued data via differential privacy"?
The authors consider it in their future work.
Q3. What is the rationale of taking into consideration the height?
The rationale for taking the height into consideration is that more general partitions should contain more records to be worth partitioning.
Q4. What is the method for generating sub-partitions?
With the exponential mechanism, the authors first obtain the noisy number N of non-empty sub-partitions, and then use the exponential mechanism to extract N sub-partitions, using the number of records in a sub-partition as the score function.
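The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the sensitivity of 1, and the even split of the privacy budget across the N draws are all assumptions, and the noisy N is taken as given.

```python
import math
import random


def exponential_mechanism(candidates, score, epsilon, sensitivity=1.0, rng=random):
    """Pick one candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    scores = [score(c) for c in candidates]
    top = max(scores)  # shifting by the max avoids overflow and leaves the distribution unchanged
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def pick_sub_partitions(record_counts, n, epsilon, rng=random):
    """Draw n distinct sub-partitions, scoring each by its record count.

    Assumption: the budget epsilon is split evenly over the n draws.
    """
    remaining = list(record_counts)
    chosen = []
    for _ in range(min(n, len(remaining))):
        pick = exponential_mechanism(
            remaining, lambda key: record_counts[key], epsilon / n, rng=rng
        )
        chosen.append(pick)
        remaining.remove(pick)  # sample without replacement
    return chosen
```

Sub-partitions with more records are exponentially more likely to be drawn, but every candidate, including empty ones, keeps a non-zero probability.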
Q5. What is the problem of privacy attacks on set-valued data?
Due to both their vulnerability to adversaries’ background knowledge and their deterministic nature, many types of privacy attacks [20, 25, 31] have been identified on approaches derived from these models, leading to privacy compromise.
Q6. Why does Ghinita et al. have a problem with privacy-preserving?
Because set-valued data is high-dimensional, the extensive research on privacy-preserving data publishing (PPDP) for relational data does not fit well with set-valued data [13].
Q7. What is the utility of a sequence of differentially-private computations?
For a sequence of differentially-private computations, its privacy guarantee is provided by the composition properties of differential privacy, namely sequential composition and parallel composition, which are summarized in Appendix B. Due to the lower bound results [6, 8, 9], the authors can only guarantee the utility of restricted classes of queries [4] in the non-interactive setting.
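The two composition properties have a simple arithmetic form: budgets add up when computations touch the same data, and only the largest budget counts when they touch disjoint subsets. The helper names below are hypothetical and only illustrate the bookkeeping.

```python
def sequential_composition(epsilons):
    """Total privacy cost when every computation runs on the same data."""
    return sum(epsilons)


def parallel_composition(epsilons):
    """Total privacy cost when each computation runs on a disjoint subset of the data."""
    return max(epsilons, default=0.0)
```

For example, three steps on the same partition with budgets 0.25, 0.25 and 0.5 cost 1.0 in total, while the same three steps on disjoint partitions cost 0.5.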
Q8. How does the algorithm create a generalized taxonomy tree?
The algorithm first constructs the context-free taxonomy tree H by iteratively grouping f nodes from one level to an upper level until a single root is created.
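A minimal sketch of this bottom-up construction, assuming the items are already in the order in which they should be grouped. The function name and the tuple labels used for internal nodes are hypothetical, chosen only for illustration.

```python
def build_taxonomy(items, fanout):
    """Group `fanout` nodes per parent, level by level, until one root remains.

    Returns (root, children), where `children` maps each internal node
    to the list of its child nodes.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not items:
        raise ValueError("items must not be empty")
    children = {}
    level = list(items)
    height = 0
    while len(level) > 1:
        height += 1
        parents = []
        for start in range(0, len(level), fanout):
            group = level[start:start + fanout]
            parent = ("node", height, start // fanout)  # hypothetical label
            children[parent] = group
            parents.append(parent)
        level = parents
    return level[0], children
```

With four items and a fan-out of 2, this yields two nodes at height 1 and a root at height 2. When the number of nodes on a level is not a multiple of the fan-out, the last group is simply smaller.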
Q9. What are the previous anonymization techniques for publishing set-valued data?
The previous anonymization techniques [5, 16, 19, 28, 29, 34, 35] developed for publishing set-valued data are dedicated to partition-based privacy models.
Q10. What is the purpose of differential privacy?
Several works [4, 10, 32, 33] have started to address the use of differential privacy in the non-interactive setting as a substitute for partition-based privacy models.
Q11. What is the applicability of their approach to other types of data?
The authors discuss the applicability of their approach to other types of data, e.g. relational data, in Appendix D. In the experiments, the authors examine the performance of their algorithm in terms of utility for different data mining tasks, namely counting queries and frequent itemset mining, and scalability in handling large set-valued datasets.
Q12. How does the paper contribute to the research of differential privacy?
The paper also contributes to the research of differential privacy by demonstrating that an efficient non-interactive solution could be achieved by carefully making use of the underlying dataset.
Q13. What is the maximum number of sub-partitions needed for a non-leaf partition?
For a non-leaf partition, the authors generate a candidate set of taxonomy tree nodes from its hierarchy cut, containing all non-leaf nodes that are of the largest height in H, and then randomly select a node u from the set to expand, generating a total of 2^l sub-partitions, where l ≤ f is the number of u’s children in H.
Q14. What are the factors that dominate the complexity analysis?
According to the complexity analysis in Section 4.2, dataset size and universe size are the two factors that dominate the complexity.
Q15. What is the maximum number of partition operations needed to reach leaf partitions?
Given a non-leaf partition p with a hierarchy cut and an associated taxonomy tree H, the maximum number of partition operations needed to reach leaf partitions is $|InternalNodes(cut)| = \sum_{u_i \in cut} |InternalNodes(u_i, H)|$, where $|InternalNodes(u_i, H)|$ is the number of internal nodes of the subtree of H rooted at $u_i$.
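The bound above is a sum of internal-node counts over the nodes in the cut. A minimal sketch, assuming the taxonomy tree is stored as a dictionary from each internal node to its children (a hypothetical representation; the node labels are illustrative):

```python
def internal_nodes(node, children):
    """Number of internal (non-leaf) nodes in the subtree rooted at `node`."""
    kids = children.get(node, [])
    if not kids:
        return 0  # a leaf contributes nothing
    return 1 + sum(internal_nodes(kid, children) for kid in kids)


def max_partition_operations(cut, children):
    """Sum of internal-node counts over all nodes in the hierarchy cut."""
    return sum(internal_nodes(u, children) for u in cut)


# Illustrative taxonomy over four items with fan-out 2.
taxonomy = {
    "I{1,2,3,4}": ["I{1,2}", "I{3,4}"],
    "I{1,2}": ["I1", "I2"],
    "I{3,4}": ["I3", "I4"],
}
```

A cut consisting of the root alone gives a bound of 3, the cut {I{1,2}, I3} gives 1, and a cut made only of leaves gives 0.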
Q16. What are the two principal techniques for achieving differential privacy?
Two principal techniques for achieving differential privacy have appeared in the literature, one for real-valued outputs [8] and the other for outputs of arbitrary types [24].
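The technique for real-valued outputs from [8] is commonly known as the Laplace mechanism: it adds noise drawn from a Laplace distribution with scale sensitivity/ε to the true answer. The sketch below assumes a counting query with sensitivity 1, and the function names are hypothetical.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Return the count perturbed with Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)
```

A smaller ε gives a larger noise scale and therefore stronger privacy at the cost of accuracy.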