
Showing papers on "Data publishing published in 2019"


Journal ArticleDOI
TL;DR: It is proved that DADP can provide real-time crowd-sourced statistical data publishing with strong privacy protection under an untrusted server, using a distributed budget allocation mechanism and an agent-based dynamic grouping mechanism to realize global $w$-event $\epsilon$-differential privacy in a distributed way.
Abstract: The continuous publication of aggregate statistics over crowd-sourced data to the public has enabled many data mining applications (e.g., real-time traffic analysis). Existing systems usually rely on a trusted server to aggregate the spatio-temporal crowd-sourced data and then apply a differential privacy mechanism to perturb the aggregate statistics before publishing, in order to provide strong privacy guarantees. However, the privacy of users will be exposed once the server is hacked or cannot be trusted. In this paper, we study the problem of real-time crowd-sourced statistical data publishing with strong privacy protection under an untrusted server. We propose a novel distributed agent-based privacy-preserving framework, called DADP, that introduces a new level of multiple agents between the users and the untrusted server. Instead of directly uploading the check-in information to the untrusted server, a user can randomly select one agent and upload the check-in information to it via anonymous connection technology. Each agent aggregates the received crowd-sourced data and perturbs the aggregated statistics locally with the Laplace mechanism. The perturbed statistics from all the agents are then combined to form the entire perturbed statistics for publication. In particular, we propose a distributed budget allocation mechanism and an agent-based dynamic grouping mechanism to realize global $w$-event $\epsilon$-differential privacy in a distributed way. We prove that DADP can provide $w$-event $\epsilon$-differential privacy for real-time crowd-sourced statistical data publishing under the untrusted server. Extensive experiments on real-world datasets demonstrate the effectiveness of DADP.
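The per-agent perturbation step described above can be sketched as follows. This is a minimal illustration of the Laplace mechanism with independently perturbed partial counts summed by the server, not DADP's actual implementation; the function names, unit sensitivity, and fixed seeds are assumptions for reproducibility.

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample via the inverse-CDF transform."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def perturb(counts, epsilon, sensitivity=1.0, seed=0):
    """Add Laplace noise with scale sensitivity/epsilon to each count."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]

# Each agent perturbs its local aggregate; the untrusted server only ever
# sees noisy partial counts and sums them to publish the final statistics.
agents = [[10, 5, 3], [7, 2, 1]]
published = [sum(col) for col in zip(*(perturb(a, epsilon=1.0, seed=i)
                                       for i, a in enumerate(agents)))]
```

Because the noisy partial counts are summed, the published statistics remain an unbiased estimate of the true totals while no single party ever holds the exact aggregate.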

101 citations


Book ChapterDOI
25 Jun 2019
TL;DR: In this paper, a differential privacy framework for privacy preserving data publishing using Generative Adversarial Networks (GANs) is proposed, which can be easily adapted to different use cases, from the generation of time series to continuous and discrete data.
Abstract: Open data plays a fundamental role in the 21st century by stimulating economic growth and by enabling more transparent and inclusive societies. However, it is always difficult to create new high-quality datasets with the required privacy guarantees for many use cases. In this paper, we developed a differential privacy framework for privacy preserving data publishing using Generative Adversarial Networks. It can be easily adapted to different use cases, from the generation of time series to continuous and discrete data. We demonstrate the efficiency of our approach on real datasets from the French public administration and classic benchmark datasets. Our results maintain both the original distribution of the features and the correlations among them, while providing a good level of privacy.

82 citations


Journal ArticleDOI
TL;DR: The proposed solution based on IOTA Tangle and MAM could overcome many challenges faced by other traditional blockchain-based solutions in terms of cost, efficiency, scalability, and flexibility in data access management.
Abstract: Background: Huge amounts of health-related data are generated every moment with the rapid development of the Internet of Things (IoT) and wearable technologies. These big health data contain great value and can bring benefits to all stakeholders in the health care ecosystem. Currently, most of these data are siloed and fragmented in different health care systems or public and private databases. This prevents the fulfillment of intelligent health care inspired by these big data. Security and privacy concerns and the lack of ensured authenticity trails of data bring even more obstacles to health data sharing. With a decentralized and consensus-driven nature, distributed ledger technologies (DLTs) such as blockchain, Ethereum, and IOTA Tangle provide reliable solutions to facilitate health care data sharing. Objective: This study aimed to develop a health-related data sharing system by integrating IoT and DLT to enable secure, fee-less, tamper-resistant, highly scalable, and granularly controllable health data exchange, as well as to build a prototype and conduct experiments to verify the feasibility of the proposed solution. Methods: The health-related data are generated by 2 types of IoT devices: wearable devices and stationary air quality sensors. The data sharing mechanism is enabled by IOTA's distributed ledger, the Tangle, which is a directed acyclic graph. Masked Authenticated Messaging (MAM) is adopted to facilitate data communications among different parties. A Merkle hash tree is used for data encryption and verification. Results: A prototype system was built according to the proposed solution. It uses a smartwatch and multiple air sensors as the sensing layer; a smartphone and a single-board computer (Raspberry Pi) as the gateway; and a local server for data publishing. The prototype was applied to the remote diagnosis of tremor disease.
The results proved that the solution could enable costless data integrity and flexible access management during data sharing. Conclusions: DLT integrated with IoT technologies could greatly improve health-related data sharing. The proposed solution based on IOTA Tangle and MAM could overcome many challenges faced by other traditional blockchain-based solutions in terms of cost, efficiency, scalability, and flexibility in data access management. This study also showed the possibility of fully decentralized health data sharing by replacing the local server with edge computing devices.
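The Merkle hash tree used for verification in systems like the one above can be sketched as follows. This is a generic illustration of computing a Merkle root over records, not the prototype's actual code; the record contents are invented for the example.

```python
import hashlib

def merkle_root(leaves):
    """Compute a Merkle root over a list of byte strings.

    Leaf hashes are combined pairwise level by level; an odd node is
    carried up unchanged. The resulting root lets a verifier detect
    any tampering with the underlying records.
    """
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # carry the unpaired node up
        level = nxt
    return level[0].hex()

# Hypothetical sensor readings serving as leaves.
records = [b"hr:72bpm", b"pm2.5:35", b"tremor:0.4"]
root = merkle_root(records)
```

Publishing only the root on the ledger is enough to later prove that a shared record set was not altered, since any change to a leaf changes the root.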

75 citations


Journal ArticleDOI
TL;DR: An empirical evaluation on both synthetic and real-world datasets shows that the proposed PrivRank framework can efficiently provide effective and continuous protection of user-specified private data, while still preserving the utility of the obfuscated data for personalized ranking-based recommendation.
Abstract: Personalized recommendation is crucial to help users find pertinent information. It often relies on a large collection of user data, in particular users' online activity (e.g., tagging/rating/checking-in) on social media, to mine user preference. However, releasing such user activity data makes users vulnerable to inference attacks, as private data (e.g., gender) can often be inferred from the users' activity data. In this paper, we propose PrivRank, a customizable and continuous privacy-preserving social media data publishing framework that protects users against inference attacks while enabling personalized ranking-based recommendations. Its key idea is to continuously obfuscate user activity data such that the privacy leakage of user-specified private data is minimized under a given data distortion budget, which bounds the ranking loss incurred from the data obfuscation process in order to preserve the utility of the data for enabling recommendations. An empirical evaluation on both synthetic and real-world datasets shows that our framework can efficiently provide effective and continuous protection of user-specified private data, while still preserving the utility of the obfuscated data for personalized ranking-based recommendation. Compared to state-of-the-art approaches, PrivRank achieves both better privacy protection and higher utility in all the ranking-based recommendation use cases we tested.

63 citations


Journal ArticleDOI
TL;DR: This paper proposes a new anonymization scheme of data privacy for e-health records which differs from existing approaches in its ability to prevent identity disclosure even when faced with adversaries having pertinent background knowledge.

50 citations


Journal ArticleDOI
TL;DR: The proposed fog-computing-based differential privacy approach for privacy-preserving data publishing can not only effectively protect citizens’ privacy, but also reduce the query sensitivity and improve the utility of the data published.

40 citations


Journal ArticleDOI
TL;DR: An optimization scheme, the Brain Storm based Whale Optimization Algorithm (BS-WOA), is introduced for identifying the secret key used to preserve the privacy and utility of the data owner's data.
Abstract: Cloud computing serves as a major boost for the digital era since it handles data from a large number of users simultaneously. Besides its several useful characteristics, providing security to the data stored in the cloud platform is a major challenge for service providers. Privacy preservation schemes introduced in the literature try to enhance the privacy and utility of the data by modifying the database with a secret key. In this paper, an optimization scheme, the Brain Storm based Whale Optimization Algorithm (BS-WOA), is introduced for identifying the secret key. The database from the data owner is modified with the optimal secret key to construct retrievable perturbed data that preserves privacy and utility. The proposed BS-WOA is designed through the hybridization of Brain Storm Optimization and the Whale Optimization Algorithm. The proposed technique is simulated on three standard databases: chess, T10I4D100K, and retail. When evaluated for a key size of 256, the proposed BS-WOA achieved a privacy value of 0.186 and a utility value of 0.8777 for the chess database, demonstrating improved performance.

32 citations


Journal ArticleDOI
TL;DR: This paper proposes the Improved Scalable l-Diversity (ImSLD) approach, an extension of Improved Scalable k-Anonymity (ImSKA) for scalable anonymization, based on scalable k-anonymization using MapReduce as the programming paradigm.

30 citations


Patent
Liu Jingwei, Xin Li, Xiaolu Li, Sun Rong, Pei Qingqi 
12 Feb 2019
TL;DR: In this paper, an electronic medical record storage and sharing model and method based on blockchains is proposed to solve the problems of patients' rights of access to personal medical data and the insecure storage of sensitive medical data in the prior art.
Abstract: The invention discloses a blockchain-based electronic medical record storage and sharing model and method, solving the prior art's problems of patients' rights of access to personal medical data and the insecure storage and sharing of sensitive medical data. The model comprises data creators, data owners, cloud storage, federated blockchains, and data consumers, wherein the blockchain serves as the control center. The method comprises the following steps: system initialization is performed; medical data are acquired and stored with intercepted signatures; data publishing with an improved DPOS consensus mechanism is employed; and data sharing based on smart contracts is performed. The method achieves security, reliability, privacy protection, and secure storage by combining cloud storage with interceptable signature technology in the federated blockchains; users can set sharing conditions through smart contracts, enabling safe and effective data sharing and access with strong practicality.

28 citations


Posted Content
TL;DR: This paper serves as an introductory reading on a critical subject in an era of growing awareness about privacy risks connected to digital services, and provides insights into open problems and future directions for research.
Abstract: We survey the literature on the privacy of trajectory micro-data, i.e., spatiotemporal information about the mobility of individuals, whose collection is becoming increasingly simple and frequent thanks to emerging information and communication technologies. The focus of our review is on privacy-preserving data publishing (PPDP), i.e., the publication of databases of trajectory micro-data that preserve the privacy of the monitored individuals. We classify and present the literature of attacks against trajectory micro-data, as well as solutions proposed to date for protecting databases from such attacks. This paper serves as an introductory reading on a critical subject in an era of growing awareness about privacy risks connected to digital services, and provides insights into open problems and future directions for research.

27 citations


Proceedings ArticleDOI
02 May 2019
TL;DR: This paper presents the results of an interview study with data practitioners, from which four high-level user needs for tool support are derived, and suggests that data-centric collaborative work would benefit from structured documentation of data and its lifecycle; advanced affordances for conversations among collaborators; better change control; and custom data access.
Abstract: Collaborative work with data is increasingly common and spans a broad range of activities - from creating or analysing data in a team, to sharing it with others, to reusing someone else's data in a new context. In this paper, we explore collaboration practices around structured data and how they are supported by current technology. We present the results of an interview study with twenty data practitioners, from which we derive four high-level user needs for tool support. We compare them against the capabilities of twenty systems that are commonly associated with data activities, including data publishing software, wikis, web-based collaboration tools, and online community platforms. Our findings suggest that data-centric collaborative work would benefit from: structured documentation of data and its lifecycle; advanced affordances for conversations among collaborators; better change control; and custom data access. The findings help us formalise practices around data teamwork, and build a better understanding of people's motivations and barriers when working with structured data.

Journal ArticleDOI
TL;DR: A comprehensive survey is presented of previous research on techniques for ensuring the privacy of patient data, including demographic data, diagnosis codes, and data containing both demographics and diagnosis codes.

Posted Content
TL;DR: This paper developed a differential privacy framework for privacy preserving data publishing using Generative Adversarial Networks that can be easily adapted to different use cases, from the generation of time series to continuous and discrete data.
Abstract: Open data plays a fundamental role in the 21st century by stimulating economic growth and by enabling more transparent and inclusive societies. However, it is always difficult to create new high-quality datasets with the required privacy guarantees for many use cases. This paper aims at creating a framework for releasing new open data while protecting the individuality of the users through a strict definition of privacy called differential privacy. Unlike previous work, this paper provides a framework for privacy preserving data publishing that can be easily adapted to different use cases, from the generation of time series to continuous and discrete data; no previous work has focused on the latter class. Indeed, many use cases expose discrete data or at least a combination of categorical and numerical values. Thanks to the latest developments in deep learning and generative models, it is now possible to model rich-semantic data while maintaining both the original distribution of the features and the correlations between them. The output of this framework is a deep network, namely a generator, able to create new data on demand. We demonstrate the efficiency of our approach on real datasets from the French public administration and classic benchmark datasets.

Journal ArticleDOI
TL;DR: This paper presents a new type of attack on 1:M records with multiple sensitive attributes (MSAs), coined MSA generalization correlation attacks, proposes a privacy-preserving technique, "(p, l)-Angelization", for 1:M-MSA data publication, and demonstrates that the technique outperforms its counterparts.

Journal ArticleDOI
TL;DR: A time-saving k-degree anonymization method in social networks (TSRAM) is proposed that anonymizes the social network graph without having to rescan the data set for different levels of anonymity, and effectively preserves the utility of the anonymized graph.
Abstract: Social networks provide an attractive environment for low-cost and easy communication; however, analyzing the huge amounts of data produced can considerably affect users' privacy. In other words, an efficient algorithm should intelligently take the user's privacy into account while extracting useful information from the data. In recent years, many studies have been conducted on social network privacy preservation for data publishing. However, the current algorithms are not one-time scan; that is, for every level of anonymization, the data set must be scanned again, which is a time-consuming operation. To address this issue, the present research introduces a time-saving k-degree anonymization method in social networks (TSRAM) that anonymizes the social network graph without having to rescan the data set for different levels of anonymity. First, it produces a tree from the data set. Then, the anonymized degree sequence of the graph is computed based on the tree. The proposed method employs an efficient approach to partition the node degrees, partitioning the graph's nodes bottom-up based on the anonymization levels. Moreover, it uses two effective criteria to increase the utility of the anonymized graph. Compared to other similar techniques, the results show that TSRAM is effective, not only in making the degree sequence anonymization of the graph one-time scan, but also in preserving the utility of the anonymized graph.
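The k-degree notion underlying methods like the one above can be sketched with the classic greedy degree-sequence anonymization: every degree value in the output is shared by at least k nodes. This is a simplified illustration of the general idea, not the paper's tree-based one-time-scan algorithm.

```python
def k_anonymize_degrees(degrees, k):
    """Greedy k-degree anonymization of a degree sequence.

    Sort degrees in descending order, cut the sequence into groups of
    at least k, and raise every degree in a group to the group maximum,
    so each resulting degree value is shared by at least k nodes.
    """
    d = sorted(degrees, reverse=True)
    out = []
    i = 0
    while i < len(d):
        # If fewer than 2k values remain, close the sequence in one group.
        j = len(d) if len(d) - i < 2 * k else i + k
        out.extend([d[i]] * (j - i))  # d[i] is the group maximum
        i = j
    return out

anonymized = k_anonymize_degrees([5, 4, 4, 3, 2, 1], k=2)
```

Degrees are only ever raised, never lowered, so the anonymized sequence can be realized by adding edges; choosing the cut points to minimize the total increase is the optimization the literature focuses on.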

Journal ArticleDOI
TL;DR: A Sequence R (SR)-tree structure that satisfies differential privacy based on the R-tree is proposed, and an attack model called the non-location sensitive information attack is put forward; to resist this attack, noise is added to the location data and non-location sensitive data using differential privacy techniques.
Abstract: Existing location-based services have collected a large amount of user trajectory data, and if these data are released directly without any processing, users' personal privacy will be leaked. At present, differential privacy protection technology is favored by many scholars, but how to apply it reasonably to location-based services remains a challenge. A trajectory is spatiotemporally continuous, but most existing methods only consider the single location of a moving object at a certain time without considering the entire trajectory, which may destroy the spatiotemporal integrity of the trajectory. In this paper, we address this problem and first propose a Sequence R (SR)-tree structure that satisfies differential privacy based on the R-tree; we construct the SR-tree by using the trajectory sequence instead of the minimum bounding rectangle of the R-tree. Then we put forward an attack model called the non-location sensitive information attack; to resist this attack, we add noise to the location data and non-location sensitive data using differential privacy techniques. Finally, the algorithm consistently deals with the problem of data inconsistency after adding noise. Experimental results show that our algorithm not only has high data availability and operational efficiency, but also good scalability.

Posted Content
TL;DR: This paper proposes an improved suppression method, which reduces the disclosure risk and enhances the data utility by targeting the highest-risk records and keeping other records intact, and demonstrates the effectiveness of this approach through an experiment on a real-world confidential dataset.
Abstract: In Privacy Preserving Data Publishing, various privacy models have been developed for employing anonymization operations on sensitive individual level datasets, in order to publish the data for public access while preserving the privacy of individuals in the dataset. However, there is always a trade-off between preserving privacy and data utility; the more changes we make on the confidential dataset to reduce disclosure risk, the more information the data loses and the less data utility it preserves. The optimum privacy technique is the one that results in a dataset with minimum disclosure risk and maximum data utility. In this paper, we propose an improved suppression method, which reduces the disclosure risk and enhances the data utility by targeting the highest risk records and keeping other records intact. We have shown the effectiveness of our approach through an experiment on a real-world confidential dataset.
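The targeted-suppression idea can be sketched as follows: only records whose quasi-identifier combination is shared by fewer than k records are suppressed, while all other records are published intact. This is a simplified sketch; the paper's risk measure and suppression rule may differ, and the field names and threshold are invented for the example.

```python
from collections import Counter

def suppress_high_risk(records, qids, k=2, mask="*"):
    """Suppress quasi-identifier values only for high-risk records.

    A record is high-risk when its quasi-identifier combination is
    shared by fewer than k records in the dataset; masking only those
    records keeps information loss (and utility loss) low.
    """
    key = lambda r: tuple(r[q] for q in qids)
    freq = Counter(key(r) for r in records)
    out = []
    for r in records:
        if freq[key(r)] < k:
            r = {**r, **{q: mask for q in qids}}  # mask QIDs, keep the rest
        out.append(r)
    return out

rows = [
    {"age": 34, "zip": "1010", "disease": "flu"},
    {"age": 34, "zip": "1010", "disease": "cold"},
    {"age": 71, "zip": "9999", "disease": "rare"},  # unique QIDs: high risk
]
released = suppress_high_risk(rows, qids=["age", "zip"], k=2)
```

Compared with suppressing or generalizing the whole column, this keeps every low-risk record untouched, which is exactly the trade-off the abstract emphasizes.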

Proceedings ArticleDOI
07 Jul 2019
TL;DR: This work proposes a new data publishing algorithm in which a released dataset is formed by mixing $\ell$ randomly chosen data points and then perturbing them with additive noise, and shows that as $\ell$ increases, noise with smaller variance is sufficient to achieve a target privacy level.
Abstract: The goal of differentially private data publishing is to release a modified dataset so that its privacy can be ensured while allowing for efficient learning. We propose a new data publishing algorithm in which a released dataset is formed by mixing $\ell$ randomly chosen data points and then perturbing them with additive noise. Our privacy analysis shows that as $\ell$ increases, noise with smaller variance is sufficient to achieve a target privacy level. In order to quantify the usefulness of our algorithm, we adopt the accuracy of a predictive model trained with our synthetic dataset, which we call the utility of the dataset. By characterizing the utility of our dataset as a function of $\ell$, we show that one can learn both linear and nonlinear predictive models so that they yield reasonably good prediction accuracies. In particular, we show that there exists a sweet spot on $\ell$ that maximizes the prediction accuracy given a required privacy level, or vice versa. We also demonstrate that given a target privacy level, our datasets can achieve higher utility than other datasets generated with the existing data publishing algorithms.
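The mix-then-perturb release step can be sketched as follows. This is an illustrative sketch only: the paper's noise distribution and privacy calibration are abstracted into a single noise_scale parameter, and Gaussian noise is used here purely as an example of additive noise.

```python
import random

def mix_and_perturb(data, ell, noise_scale, n_out, seed=0):
    """Release n_out synthetic points, each the average of ell randomly
    chosen raw points plus additive noise.

    As ell grows, each raw point's influence on a released point
    shrinks, so less noise suffices for the same privacy level.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    released = []
    for _ in range(n_out):
        picks = [rng.choice(data) for _ in range(ell)]
        mixed = [sum(p[j] for p in picks) / ell for j in range(dim)]
        released.append([x + rng.gauss(0.0, noise_scale) for x in mixed])
    return released

raw = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
synthetic = mix_and_perturb(raw, ell=2, noise_scale=0.1, n_out=3)
```

A model is then trained on the synthetic points instead of the raw data; sweeping ell while holding the privacy level fixed reproduces the utility trade-off the abstract describes.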

Journal ArticleDOI
TL;DR: As discussed by the authors, the Hellenic Open University can view the learning process as a framework of interacting roles and factors in which patterns can be discovered, and the ability to publish and share these results would be very helpful for the whole academic institution.
Abstract: Recent technological advances have led to tremendous capacities for collecting, storing and analyzing data being created at an ever-increasing speed from diverse sources. Academic institutions which offer open and distance learning programs, such as the Hellenic Open University, can benefit from big data relating to their students' information and communication systems and from the use of modern techniques and tools of big data analytics, provided that the student's right to privacy is not compromised. The balance between data mining and maintaining privacy can be reached through anonymisation methods, but this approach raises technical problems such as the loss of a certain amount of the information found in the original data. Considering the learning process as a framework of interacting roles and factors, the discovery of patterns in that system can be really useful and beneficial, first for the learners; furthermore, the ability to publish and share these results would be very helpful for the whole academic institution.

Journal ArticleDOI
TL;DR: A partitioned histogram data publishing algorithm based on the wavelet transform is proposed that can reduce the complexity of the wavelet tree constructed by the wavelet transform and improve the accuracy of histogram counting queries.
Abstract: With the rapid development of information science and the Internet of Things (IoT), people have an unprecedented ability to collect and share data, with various sensors serving as the entrance to data collection. At the same time, edge computing has begun to grasp the public's attention because of the difficult challenges of massive equipment access and massive data. Although such a large amount of data provides a huge opportunity for information discovery, privacy leakage has also become a concern. When the data publisher publishes various statistics, an attacker can obtain the statistical rules in the data simply by using the query function, without contacting the user or the data publisher. Therefore, how to protect the privacy of statistical information has become the focus of attention. In this paper, we propose a partitioned histogram data publishing algorithm based on the wavelet transform. First, a greedy partitioning algorithm is used to obtain a better partition structure. Then, we use the wavelet transform to add noise. Finally, to preserve the authenticity and usability of the histogram, we reconstruct the original histogram structure. On the one hand, our algorithm can reduce the complexity of the wavelet tree constructed by the wavelet transform. On the other hand, the query noise changes from linear growth to polylogarithmic growth, so the accuracy of histogram counting queries is improved. Experiments show that our algorithm significantly improves data availability.
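The wavelet-based noise injection can be sketched with a Haar transform over a power-of-two histogram: noise is added to the coefficients rather than the raw bins, which is why range-query error grows polylogarithmically instead of linearly. This is an illustrative sketch; the paper's greedy partitioning step and per-level noise calibration are omitted, and the uniform noise scale is an assumption.

```python
import math
import random

def haar_forward(xs):
    """Full Haar decomposition of a length-2^m histogram."""
    xs = list(xs)
    coeffs = []
    while len(xs) > 1:
        avgs = [(xs[i] + xs[i + 1]) / 2 for i in range(0, len(xs), 2)]
        dets = [(xs[i] - xs[i + 1]) / 2 for i in range(0, len(xs), 2)]
        coeffs = dets + coeffs
        xs = avgs
    return xs + coeffs  # [overall average, detail coefficients ...]

def haar_inverse(cs):
    """Invert haar_forward, rebuilding bins level by level."""
    xs, rest = [cs[0]], cs[1:]
    while rest:
        dets, rest = rest[:len(xs)], rest[len(xs):]
        xs = [v for a, d in zip(xs, dets) for v in (a + d, a - d)]
    return xs

def lap(scale, rng):
    """One Laplace(0, scale) draw via the inverse-CDF transform."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def publish_wavelet(hist, epsilon, seed=0):
    """Perturb the wavelet coefficients instead of the raw bins."""
    rng = random.Random(seed)
    noisy = [c + lap(1.0 / epsilon, rng) for c in haar_forward(hist)]
    return haar_inverse(noisy)
```

Because a range query touches only the few coefficients covering it, the noise it accumulates scales with the depth of the wavelet tree rather than with the number of bins.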

Journal ArticleDOI
TL;DR: A new DP histogram publishing scheme is proposed, namely Iterative Histogram Partition, in which the privacy budget between grouping and injection phases is carefully assigned, and it is theoretically proved that $\epsilon$-differential privacy can be achieved according to this new scheme.
Abstract: Differential privacy (DP) is a promising tool for preserving privacy during data publication, as it provides strong theoretical privacy guarantees in the face of adversaries with arbitrary background knowledge. The histogram, as the result of a set of count queries, serves as a core statistical tool to report data distributions and is in fact viewed as the fundamental method for many other statistical analyses such as range queries. It is an important form for data publishing. In this paper, we consider the scenario of publishing sensitive histogram data with a differential privacy scheme. Existing work in this field has justified that, compared to directly applying DP techniques (i.e., injecting noise) over the counts in histogram bins, grouping bins before noise injection is more effective (i.e., yields higher utility), as it introduces much less error over the sanitized histogram given the same privacy budget. However, state-of-the-art works have not unveiled how the overall utility of a sanitized histogram can be affected by the balance of the privacy budget distributed between the grouping and noise injection phases. In this work, we conduct a theoretical study of how the probability of getting better groups can be improved such that the overall error introduced in the sanitized histogram can be further reduced, which directly leads to higher utility for the sanitized histograms. In particular, we show that the probability of achieving better grouping is affected by two factors, namely the privacy budget assigned to grouping and the normalized utility function used for selecting groups. Motivated by that, we propose a new DP histogram publishing scheme, namely Iterative Histogram Partition, in which we carefully assign the privacy budget between the grouping and injection phases based on our theoretical study. We also theoretically prove that $\epsilon$-differential privacy can be achieved according to our new scheme.
Moreover, we also show that, under the same privacy budget, our scheme exhibits smaller errors in the sanitized histograms compared with state-of-the-art methods. We also extend the model to multi-dimensional histogram publication cases. Finally, an empirical study over four real-world datasets also justifies that our scheme achieves the least error among a series of state-of-the-art baseline methods.
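The grouping-then-injection trade-off can be sketched as follows. This is illustrative only: Iterative Histogram Partition chooses groups adaptively and splits the budget between the two phases, whereas this sketch uses fixed-size groups and spends the whole budget on injection.

```python
import math
import random

def lap(scale, rng):
    """One Laplace(0, scale) draw via the inverse-CDF transform."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def publish_grouped(counts, group_size, epsilon, seed=0):
    """Group-then-inject: bins in a group share one noisy mean.

    Averaging within a group divides the Laplace noise per bin by the
    group size, at the cost of approximation error when grouped bins
    differ; balancing these two error sources is what the paper's
    budget assignment optimizes.
    """
    rng = random.Random(seed)
    out = []
    for i in range(0, len(counts), group_size):
        g = counts[i:i + group_size]
        mean = sum(g) / len(g)
        out.extend([mean + lap(1.0 / (epsilon * len(g)), rng)] * len(g))
    return out

noisy_hist = publish_grouped([12.0, 14.0, 3.0, 5.0], group_size=2, epsilon=0.5)
```

With similar bins per group, the noise reduction dominates; with dissimilar bins, the grouping error dominates, which is why how groups are chosen (and how much budget that choice consumes) matters.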

Journal ArticleDOI
TL;DR: This paper develops a new $\tau$-safe $(l,k)$-diversity privacy model based on generalization and segmentation, with record anonymity satisfying $l$-diversity and individual anonymity satisfying $k$-anonymity, to protect the privacy of individuals in sequential publication.
Abstract: Preserving privacy while maintaining high utility during sequential publication plays an important role for data providers and data users in mathematical statistics, scientific research, and organizational decision making. The $\tau$-safety model is the state-of-the-art model in sequential publication. However, it is based on the generalization technique, which has drawbacks such as heavy information loss and difficulty supporting marginal publication. Besides, the privacy of individuals is the major aspect that needs to be protected in privacy preserving data publishing. In this paper, to protect the privacy of individuals in sequential publication, we develop a new $\tau$-safe $(l,k)$-diversity privacy model based on generalization and segmentation, with record anonymity satisfying $l$-diversity and individual anonymity satisfying $k$-anonymity. This privacy model ensures that each record's signatures either keep consistency or have no intersection across all releases. It can achieve high data utility while resisting linking attacks due to arbitrary updates. In addition, it can also be applied to datasets where an individual has multiple records, and to arbitrary marginal publication. The results of our experiments show that the proposed privacy model achieves better anonymization quality and query accuracy in comparison with the $m$-invariance and $\tau$-safety models in sequential publication with arbitrary updates.

Journal ArticleDOI
TL;DR: HIDE is proposed, an oblivious, computationally efficient, and rigorous information-theoretic privacy engineering framework for datasets/databases arising in SMG environments that robustly accounts for multi-attribute correlations while preserving data privacy in a provably optimal fashion.
Abstract: In developing countries, reliable electricity access is often undermined by the absence of supply from the national power grid and/or load shedding. To alleviate this problem, smart micro-grid (SMG) networks, which are small-scale distributed electricity provision networks composed of individual electricity providers and consumers, are being increasingly deployed. To ensure the reliable operation of SMGs, monitoring is necessary for data collection and state estimation processes. However, highly calibrated and trustworthy smart meters that are ideally suited to perform such monitoring tasks are often costly and not ideally suited to SMGs, which operate under unreliable communication network infrastructures. As a result, SMGs are an easy target for an adversary who can very easily gain access to private information by monitoring transmission between nodes in the SMG network, and launch inference-based privacy attacks. These attacks lead to electricity theft and grid instability problems in the SMG. The widely popular differential privacy (DP) technique (a rigorous technique in the family of privacy-preserving data publishing (PPDP) techniques to mathematically guarantee the preservation of data privacy) does not address multi-attribute correlations, which are inherently exploited by an adversary in inference attacks. In this paper, we propose HIDE, an oblivious, computationally efficient, and rigorous information-theoretic privacy engineering framework for datasets/databases arising in SMG environments that robustly accounts for multi-attribute correlations while preserving data privacy in a provably optimal fashion. A salient and powerful advantage of HIDE is its ability to generate optimal utility-privacy tradeoffs (computationally efficiently) even when the privacy preserving entity in the worst case might have no prior statistical information linking a user's private data with his public data.

Proceedings ArticleDOI
Jiawei Zheng, Xuewen Dong, Liu Qihang, Xinghui Zhu, Tong Wei
26 May 2019
TL;DR: A digital asset exchange mechanism based on blockchain technology is proposed, in which data publishing and exchange events are recorded on the blockchain, ensuring the reliability and transparency of data exchange without the restriction of trusted third-party payment institutions.
Abstract: As the Internet of Things (IoT) becomes increasingly popular, the number of IoT devices such as sensors and smart equipment is growing at an astonishing rate, and the data generated by these devices is exploding. However, these massive IoT data, stored in the form of isolated data centers, cannot be shared by others who also need them. Moreover, data exchange now needs a secure and fair mechanism to guarantee the data provider's rights and data security. Data providers also lack the motivation to share their data, as no effective mechanism exists to reward this behavior. To solve these problems, we propose a digital asset exchange mechanism based on blockchain technology, in which the behavior of data publishing and exchanging is recorded on the blockchain, ensuring the reliability and transparency of data exchange without the restriction of trusted third-party payment institutions. In particular, to inspire data providers to share their high-quality data, we design an incentive mechanism based on QoS, which gives higher rewards to those who provide high-quality data. Experimental results from our prototype demonstrate that this mechanism is appropriate for practical application.
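The core bookkeeping described above, appending publish/exchange events to a tamper-evident chain and rewarding providers by quality, can be sketched in a few lines. This is a toy illustration, not the paper's system; class and function names are made up:

```python
import hashlib
import json

def block_hash(block):
    """Deterministic SHA-256 digest of a block's canonical JSON form."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

class ExchangeLedger:
    """Toy append-only chain recording data publishing/exchange events."""
    def __init__(self):
        self.chain = [{"index": 0, "prev": "0" * 64, "event": "genesis"}]

    def record(self, event):
        block = {"index": len(self.chain),
                 "prev": block_hash(self.chain[-1]),
                 "event": event}
        self.chain.append(block)
        return block

    def verify(self):
        """A tampered block breaks every subsequent 'prev' link."""
        return all(self.chain[i]["prev"] == block_hash(self.chain[i - 1])
                   for i in range(1, len(self.chain)))

def qos_reward(base_reward, quality_score):
    """Hypothetical QoS incentive: scale the reward by a quality score in [0, 1]."""
    return base_reward * quality_score
```

Because each block embeds the hash of its predecessor, rewriting any recorded publish or exchange event invalidates the rest of the chain, which is what removes the need to trust a third-party intermediary.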

Journal ArticleDOI
TL;DR: A novel overlapped slicing method for privacy-preserving data publishing with multiple sensitive attributes is proposed, and it is shown that the method obtains a lower discernibility value than other methods.
Abstract: Privacy-preserving data publishing with multiple sensitive attributes is investigated to reduce the probability that adversaries can guess the sensitive values. Masking the sensitive values is usually performed by anonymizing data using generalization and suppression techniques. A successful anonymization technique should reduce the information loss due to generalization and suppression. This research attempts to solve both problems in microdata with multiple sensitive attributes. We propose a novel overlapped slicing method for privacy-preserving data publishing with multiple sensitive attributes. We use the discernibility metric to measure information loss. The experimental results show that our method obtains a lower discernibility value than other methods.

Journal ArticleDOI
TL;DR: A privacy preservation scheme for sensitive data publishing in social networks, based on the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm, is proposed to tackle this issue.

Journal ArticleDOI
TL;DR: This study provides recommendations for public sector organizations for the development of their data publishing strategy to balance control, usability and visibility considering also the growing popularity of open knowledge bases such as Wikidata.
Abstract: Linked data is a technical standard to structure complex information and relate independent sets of data. Recently, governments have started to use this technology for bridging separated data silos by launching linked open government data (LOGD) portals. The purpose of this paper is to explore the role of LOGD as a smart technology and strategy to create public value. This is achieved by enhancing the usability and visibility of open data provided by public organizations. In this study, three different LOGD governance modes are deduced: public agencies could release linked data via a dedicated triple store, via a shared triple store or via an open knowledge base. Each of these modes has different effects on the usability and visibility of open data. Selected case studies illustrate the actual use of these three governance modes. According to this study, LOGD governance modes present a trade-off between retaining control over governmental data and potentially gaining public value through the increased use of open data by citizens. This study provides recommendations for public sector organizations for the development of their data publishing strategy to balance control, usability and visibility, considering also the growing popularity of open knowledge bases such as Wikidata.
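The "dedicated triple store" mode discussed above amounts to an agency-controlled collection of subject-predicate-object statements with pattern-based querying. A toy in-memory sketch of that idea (the identifiers are made up; real deployments would use RDF stores and SPARQL):

```python
class TripleStore:
    """Minimal in-memory triple store: subject-predicate-object statements
    plus wildcard pattern matching, loosely mimicking a SPARQL basic pattern."""
    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """None acts as a wildcard, like a SPARQL variable."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

store = TripleStore()
store.add("ex:Cadastre", "rdf:type", "org:PublicAgency")
store.add("ex:Cadastre", "ex:publishes", "ex:ParcelDataset")
```

In the dedicated mode the agency alone writes to such a store; the shared-store and open-knowledge-base modes trade that control for broader visibility, which is exactly the trade-off the study examines.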

Proceedings ArticleDOI
01 Oct 2019
TL;DR: Experimental results show that the proposed approach, called Simple Distribution of Sensitive Value, outperforms systematic clustering, which is considered a very effective method for grouping quasi-identifiers, when high-sensitive values are distributed.
Abstract: k-anonymity is a popular model in privacy-preserving data publishing. It provides a privacy guarantee when a microdata table is released. In microdata, sensitive attributes contain high-sensitive and low-sensitive values. Unfortunately, studies on anonymization that distribute sensitive values are still rare. This study aims to distribute high-sensitive values evenly across quasi-identifier groups. We propose an approach called Simple Distribution of Sensitive Value. We compare our method with systematic clustering, which is considered a very effective method for grouping quasi-identifiers. Information entropy is used to measure the diversity within each quasi-identifier group and in the microdata table as a whole. Experimental results show that our method outperforms systematic clustering when high-sensitive values are distributed.
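The information-entropy measure used in this comparison can be computed per quasi-identifier group as standard Shannon entropy over the group's sensitive values; this is a generic sketch, not the paper's exact code:

```python
import math
from collections import Counter

def sensitive_entropy(group_sensitive_values):
    """Shannon entropy (bits) of the sensitive-value distribution in one
    quasi-identifier group; higher means high- and low-sensitive values
    are more evenly mixed, which is what the distribution scheme aims for."""
    counts = Counter(group_sensitive_values)
    n = len(group_sensitive_values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A group mixing a high-sensitive and a low-sensitive value is maximally
# diverse (1 bit); a group with a single repeated value has zero entropy.
```

Averaging this over all quasi-identifier groups gives a table-level diversity score, so an even spread of high-sensitive values shows up directly as higher entropy.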

Proceedings ArticleDOI
08 Jan 2019
TL;DR: Action research conducted within the context of the Dutch Cadastre's open data platform is described, which develops four components for Linked Data viewing to enhance the current situation, making it easier to observe what a dataset is about and which potential use cases it could serve.
Abstract: Open Governmental Data publishing has had mixed success. While many governmental bodies are publishing an increasing number of datasets online, their potential usefulness is rather low. This paper describes action research conducted within the context of the Dutch Cadastre’s open data platform. We start by observing contemporary (Dutch) Open Data platforms and observe that dataset reuse is not always realized. We introduce Linked Open Data, which promises to deliver solutions to the lack of Open Data reuse. In the process of implementing Linked Data in practice, we observe that users face a knowledge and skill gap and that contemporary Linked Open Data tooling is often unable to properly advertise the usefulness of datasets to potential users, thereby hampering reuse. We therefore develop four components for Linked Data viewing to enhance the current situation, making it easier to observe what a dataset is about and which potential use cases it could serve.

Proceedings ArticleDOI
10 Jun 2019
TL;DR: This paper introduces an anonymization algorithm based on All-Distance Sketch (ADS), and proposes the novel bottom-(l, k) sketch to defend against advanced attacks, and develops a scheme to add and delete enough edges to satisfy the privacy demand.
Abstract: Releasing private data can cause panic to both Online Social Network (OSN) users and service providers. Therefore, anonymization mechanisms are proposed to protect data before sharing it. However, some of these mechanisms set unrealistic privacy demands but cannot defend against real-world de-anonymization attacks. In this paper, we introduce an anonymization algorithm based on the All-Distance Sketch (ADS). Sketching can significantly limit attackers’ confidence, as well as provide accurate estimates of shortest path lengths and other utility metrics. Because sketching removes large amounts of edges, it is invulnerable to seed-based and subgraph-based de-anonymization attacks. However, existing sketching algorithms do not add dummy edges and paths, so adversaries achieve low false-positive rates when extracting linking information, which weakens the privacy guarantee. We propose the novel bottom-(l, k) sketch to defend against these advanced attacks. We develop a scheme to add and delete enough edges to satisfy our privacy demand. The experimental results show that our published graphs closely match the original graphs under several utility metrics, preserving utility, while 80% of edges are removed, ensuring privacy.
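A bottom-k sketch of the kind this construction generalizes keeps only the k smallest random ranks of a set: the published structure reveals far less than the full set, yet still supports accurate cardinality (and, in ADS, distance) estimation. A simplified sketch of the building block, not the paper's bottom-(l, k)-over-ADS construction:

```python
def bottom_k_sketch(elements, ranks, k):
    """Keep the k smallest of the elements' uniform random ranks in [0, 1)."""
    return sorted(ranks[e] for e in elements)[:k]

def estimate_cardinality(sketch, k):
    """Standard bottom-k estimator: (k - 1) / tau, tau = k-th smallest rank.
    If the sketch holds fewer than k ranks, the whole set was seen; count exactly."""
    if len(sketch) < k:
        return float(len(sketch))
    return (k - 1) / sketch[k - 1]

# Illustrative ranks (in practice drawn uniformly at random per node).
ranks = {"a": 0.10, "b": 0.40, "c": 0.20, "d": 0.90}
sk = bottom_k_sketch(["a", "b", "c", "d"], ranks, k=2)  # [0.10, 0.20]
```

Since the sketch fixes how many ranks survive regardless of set size, an attacker observing it learns little about which specific neighbors or paths were present, which is the privacy angle the paper builds on.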