
Showing papers on "Data publishing published in 2019"


Journal ArticleDOI
TL;DR: It is proved that DADP can provide real-time crowd-sourced statistical data publishing with strong privacy protection under an untrusted server, using a distributed budget allocation mechanism and an agent-based dynamic grouping mechanism to realize global $w$-event $\epsilon$-differential privacy in a distributed way.
Abstract: The continuous publication of aggregate statistics over crowd-sourced data to the public has enabled many data mining applications (e.g., real-time traffic analysis). Existing systems usually rely on a trusted server to aggregate the spatio-temporal crowd-sourced data and then apply a differential privacy mechanism to perturb the aggregate statistics before publishing, in order to provide strong privacy guarantees. However, the privacy of users will be exposed once the server is hacked or cannot be trusted. In this paper, we study the problem of real-time crowd-sourced statistical data publishing with strong privacy protection under an untrusted server. We propose a novel distributed agent-based privacy-preserving framework, called DADP, that introduces a new level of multiple agents between the users and the untrusted server. Instead of directly uploading the check-in information to the untrusted server, a user can randomly select one agent and upload the check-in information to it via anonymous connection technology. Each agent aggregates the received crowd-sourced data and perturbs the aggregated statistics locally with the Laplace mechanism. The perturbed statistics from all the agents are then combined to form the entire perturbed statistics for publication. In particular, we propose a distributed budget allocation mechanism and an agent-based dynamic grouping mechanism to realize global $w$-event $\epsilon$-differential privacy in a distributed way. We prove that DADP can provide $w$-event $\epsilon$-differential privacy for real-time crowd-sourced statistical data publishing under the untrusted server. Extensive experiments on real-world datasets demonstrate the effectiveness of DADP.
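The per-agent perturbation step described above can be sketched as follows. This is a minimal illustration of the Laplace mechanism with independently perturbed partial counts summed by the server, not DADP's actual implementation; the function names, unit sensitivity, and fixed seeds are assumptions for reproducibility.

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample via the inverse-CDF transform."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def perturb(counts, epsilon, sensitivity=1.0, seed=0):
    """Add Laplace noise with scale sensitivity/epsilon to each count."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]

# Each agent perturbs its local aggregate; the untrusted server only ever
# sees noisy partial counts and sums them to publish the final statistics.
agents = [[10, 5, 3], [7, 2, 1]]
published = [sum(col) for col in zip(*(perturb(a, epsilon=1.0, seed=i)
                                       for i, a in enumerate(agents)))]
```

Because the noisy partial counts are summed, the published statistics remain an unbiased estimate of the true totals while no single party ever holds the exact aggregate.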

101 citations


Book ChapterDOI
25 Jun 2019
TL;DR: In this paper, a differential privacy framework for privacy preserving data publishing using Generative Adversarial Networks (GANs) is proposed, which can be easily adapted to different use cases, from the generation of time series to continuous and discrete data.
Abstract: Open data plays a fundamental role in the 21st century by stimulating economic growth and by enabling more transparent and inclusive societies. However, it is always difficult to create new high-quality datasets with the required privacy guarantees for many use cases. In this paper, we developed a differential privacy framework for privacy preserving data publishing using Generative Adversarial Networks. It can be easily adapted to different use cases, from the generation of time series to continuous and discrete data. We demonstrate the efficiency of our approach on real datasets from the French public administration and classic benchmark datasets. Our results maintain both the original distribution of the features and the correlations among them, while providing a good level of privacy.

82 citations


Journal ArticleDOI
TL;DR: The proposed solution based on IOTA Tangle and MAM could overcome many challenges faced by other traditional blockchain-based solutions in terms of cost, efficiency, scalability, and flexibility in data access management.
Abstract: Background: Huge amounts of health-related data are generated every moment with the rapid development of the Internet of Things (IoT) and wearable technologies. These big health data contain great value and can bring benefits to all stakeholders in the health care ecosystem. Currently, most of these data are siloed and fragmented in different health care systems or public and private databases. This prevents the fulfillment of intelligent health care inspired by these big data. Security and privacy concerns and the lack of ensured authenticity trails of data bring even more obstacles to health data sharing. With a decentralized and consensus-driven nature, distributed ledger technologies (DLTs) such as blockchain, Ethereum, and IOTA Tangle provide reliable solutions to facilitate health care data sharing. Objective: This study aimed to develop a health-related data sharing system by integrating IoT and DLT to enable secure, fee-less, tamper-resistant, highly scalable, and granularly controllable health data exchange, as well as to build a prototype and conduct experiments to verify the feasibility of the proposed solution. Methods: The health-related data are generated by 2 types of IoT devices: wearable devices and stationary air quality sensors. The data sharing mechanism is enabled by IOTA's distributed ledger, the Tangle, which is a directed acyclic graph. Masked Authenticated Messaging (MAM) is adopted to facilitate data communications among different parties. A Merkle hash tree is used for data encryption and verification. Results: A prototype system was built according to the proposed solution. It uses a smartwatch and multiple air sensors as the sensing layer; a smartphone and a single-board computer (Raspberry Pi) as the gateway; and a local server for data publishing. The prototype was applied to the remote diagnosis of tremor disease.
The results proved that the solution could enable costless data integrity and flexible access management during data sharing. Conclusions: DLT integrated with IoT technologies could greatly improve health-related data sharing. The proposed solution based on IOTA Tangle and MAM could overcome many challenges faced by other traditional blockchain-based solutions in terms of cost, efficiency, scalability, and flexibility in data access management. This study also showed the possibility of fully decentralized health data sharing by replacing the local server with edge computing devices.
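The Merkle hash tree used for verification in systems like the one above can be sketched as follows. This is a generic illustration of computing a Merkle root over records, not the prototype's actual code; the record contents are invented for the example.

```python
import hashlib

def merkle_root(leaves):
    """Compute a Merkle root over a list of byte strings.

    Leaf hashes are combined pairwise level by level; an odd node is
    carried up unchanged. The resulting root lets a verifier detect
    any tampering with the underlying records.
    """
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # carry the unpaired node up
        level = nxt
    return level[0].hex()

# Hypothetical sensor readings serving as leaves.
records = [b"hr:72bpm", b"pm2.5:35", b"tremor:0.4"]
root = merkle_root(records)
```

Publishing only the root on the ledger is enough to later prove that a shared record set was not altered, since any change to a leaf changes the root.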

75 citations


Journal ArticleDOI
TL;DR: An empirical evaluation on both synthetic and real-world datasets shows that the proposed PrivRank framework can efficiently provide effective and continuous protection of user-specified private data, while still preserving the utility of the obfuscated data for personalized ranking-based recommendation.
Abstract: Personalized recommendation is crucial to help users find pertinent information. It often relies on a large collection of user data, in particular users' online activity (e.g., tagging/rating/checking-in) on social media, to mine user preference. However, releasing such user activity data makes users vulnerable to inference attacks, as private data (e.g., gender) can often be inferred from the users' activity data. In this paper, we propose PrivRank, a customizable and continuous privacy-preserving social media data publishing framework that protects users against inference attacks while enabling personalized ranking-based recommendations. Its key idea is to continuously obfuscate user activity data such that the privacy leakage of user-specified private data is minimized under a given data distortion budget, which bounds the ranking loss incurred from the data obfuscation process in order to preserve the utility of the data for enabling recommendations. An empirical evaluation on both synthetic and real-world datasets shows that our framework can efficiently provide effective and continuous protection of user-specified private data, while still preserving the utility of the obfuscated data for personalized ranking-based recommendation. Compared to state-of-the-art approaches, PrivRank achieves both better privacy protection and higher utility in all the ranking-based recommendation use cases we tested.

63 citations


Journal ArticleDOI
TL;DR: This paper proposes a new anonymization scheme of data privacy for e-health records which differs from existing approaches in its ability to prevent identity disclosure even when faced with adversaries having pertinent background knowledge.

50 citations


Journal ArticleDOI
TL;DR: The proposed fog-computing-based differential privacy approach for privacy-preserving data publishing can not only effectively protect citizens’ privacy, but also reduce the query sensitivity and improve the utility of the data published.

40 citations


Journal ArticleDOI
TL;DR: An optimization scheme, the Brain Storm based Whale Optimization Algorithm (BS-WOA), is introduced for identifying the secret key used to preserve the privacy and utility of the data owner's data.
Abstract: Cloud computing serves as a major boost for the digital era since it handles data from a large number of users simultaneously. Besides its several useful characteristics, providing security to the data stored in the cloud platform is a major challenge for service providers. Privacy preservation schemes introduced in the literature try to enhance the privacy and utility of the data by modifying the database with a secret key. In this paper, an optimization scheme, the Brain Storm based Whale Optimization Algorithm (BS-WOA), is introduced for identifying the secret key. The database from the data owner is modified with the optimal secret key to construct retrievable perturbed data that preserves privacy and utility. The proposed BS-WOA is designed through the hybridization of Brain Storm Optimization and the Whale Optimization Algorithm. The proposed technique is simulated on three standard databases: chess, T10I4D100K, and retail. When evaluated for a key size of 256, the proposed BS-WOA achieved a privacy value of 0.186 and a utility value of 0.8777 for the chess database, demonstrating improved performance.

32 citations


Journal ArticleDOI
TL;DR: This paper proposes the Improved Scalable l-Diversity (ImSLD) approach, an extension of Improved Scalable k-Anonymity (ImSKA) for scalable anonymization, based on scalable k-anonymization using MapReduce as the programming paradigm.

30 citations


Patent
Liu Jingwei, Xin Li, Xiaolu Li, Sun Rong, Pei Qingqi 
12 Feb 2019
TL;DR: In this paper, an electronic medical record storage and sharing model and method based on blockchains is proposed to solve the problems of patients' rights of access to personal medical data and the insecure storage of sensitive medical data in the prior art.
Abstract: The invention discloses a blockchain-based electronic medical record storage and sharing model and method, solving the prior art's problems of patients' rights of access to personal medical data and the insecure storage and sharing of sensitive medical data. The model comprises data creators, data owners, cloud storage, federated blockchains, and data consumers, wherein the blockchain serves as the control center. The method comprises the following steps: system initialization is performed; medical data are acquired and stored with intercepted signatures; data publishing with an improved DPOS consensus mechanism is employed; and data sharing based on smart contracts is performed. The method achieves security, reliability, privacy protection, and secure storage by combining cloud storage with interceptable signature technology in the federated blockchains; users can set sharing conditions through smart contracts, enabling safe and effective data sharing and access with strong practicality.

28 citations


Posted Content
TL;DR: This paper serves as an introductory reading on a critical subject in an era of growing awareness about privacy risks connected to digital services, and provides insights into open problems and future directions for research.
Abstract: We survey the literature on the privacy of trajectory micro-data, i.e., spatiotemporal information about the mobility of individuals, whose collection is becoming increasingly simple and frequent thanks to emerging information and communication technologies. The focus of our review is on privacy-preserving data publishing (PPDP), i.e., the publication of databases of trajectory micro-data that preserve the privacy of the monitored individuals. We classify and present the literature of attacks against trajectory micro-data, as well as solutions proposed to date for protecting databases from such attacks. This paper serves as an introductory reading on a critical subject in an era of growing awareness about privacy risks connected to digital services, and provides insights into open problems and future directions for research.

27 citations


Proceedings ArticleDOI
02 May 2019
TL;DR: This paper presents the results of an interview study with data practitioners, from which four high-level user needs for tool support are derived, and suggests that data-centric collaborative work would benefit from structured documentation of data and its lifecycle; advanced affordances for conversations among collaborators; better change control; and custom data access.
Abstract: Collaborative work with data is increasingly common and spans a broad range of activities - from creating or analysing data in a team, to sharing it with others, to reusing someone else's data in a new context. In this paper, we explore collaboration practices around structured data and how they are supported by current technology. We present the results of an interview study with twenty data practitioners, from which we derive four high-level user needs for tool support. We compare them against the capabilities of twenty systems that are commonly associated with data activities, including data publishing software, wikis, web-based collaboration tools, and online community platforms. Our findings suggest that data-centric collaborative work would benefit from: structured documentation of data and its lifecycle; advanced affordances for conversations among collaborators; better change control; and custom data access. The findings help us formalise practices around data teamwork, and build a better understanding of people's motivations and barriers when working with structured data.

Journal ArticleDOI
TL;DR: A comprehensive survey is presented of previous research on techniques for ensuring the privacy of patient data, including demographic data, diagnosis codes, and data containing both demographics and diagnosis codes.

Posted Content
TL;DR: This paper developed a differential privacy framework for privacy preserving data publishing using Generative Adversarial Networks that can be easily adapted to different use cases, from the generation of time series to continuous and discrete data.
Abstract: Open data plays a fundamental role in the 21st century by stimulating economic growth and by enabling more transparent and inclusive societies. However, it is always difficult to create new high-quality datasets with the required privacy guarantees for many use cases. This paper aims at creating a framework for releasing new open data while protecting the individuality of the users through a strict definition of privacy called differential privacy. Unlike previous work, this paper provides a framework for privacy preserving data publishing that can be easily adapted to different use cases, from the generation of time series to continuous and discrete data; no previous work has focused on the latter class. Indeed, many use cases expose discrete data or at least a combination of categorical and numerical values. Thanks to the latest developments in deep learning and generative models, it is now possible to model rich-semantic data while maintaining both the original distribution of the features and the correlations between them. The output of this framework is a deep network, namely a generator, able to create new data on demand. We demonstrate the efficiency of our approach on real datasets from the French public administration and classic benchmark datasets.

Journal ArticleDOI
TL;DR: This paper presents a new type of attack on 1:M records with multiple sensitive attributes (MSAs), coined MSA generalization correlation attacks, proposes a privacy-preserving technique, "(p, l)-Angelization", for 1:M-MSA data publication, and demonstrates that the technique outperforms its counterparts.

Journal ArticleDOI
TL;DR: A time-saving k-degree anonymization method in social networks (TSRAM) is proposed that anonymizes the social network graph without having to rescan the data set for different levels of anonymity, and effectively preserves the utility of the anonymized graph.
Abstract: Social networks provide an attractive environment for low-cost and easy communication; however, analyzing the huge amounts of data produced can considerably affect users' privacy. In other words, an efficient algorithm should intelligently take the user's privacy into account while extracting useful information from the data. In recent years, many studies have been conducted on social network privacy preservation for data publishing. However, the current algorithms are not one-time scan; that is, for every level of anonymization, the data set must be scanned again, which is a time-consuming operation. To address this issue, the present research introduces a time-saving k-degree anonymization method in social networks (TSRAM) that anonymizes the social network graph without having to rescan the data set for different levels of anonymity. First, it produces a tree from the data set. Then, the anonymized degree sequence of the graph is computed based on the tree. The proposed method employs an efficient approach to partition the node degrees, partitioning the graph's nodes bottom-up based on the anonymization levels. Moreover, it uses two effective criteria to increase the utility of the anonymized graph. Compared to other similar techniques, the results show that TSRAM is effective, not only in making the degree sequence anonymization of the graph one-time scan, but also in preserving the utility of the anonymized graph.
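The k-degree notion underlying methods like the one above can be sketched with the classic greedy degree-sequence anonymization: every degree value in the output is shared by at least k nodes. This is a simplified illustration of the general idea, not the paper's tree-based one-time-scan algorithm.

```python
def k_anonymize_degrees(degrees, k):
    """Greedy k-degree anonymization of a degree sequence.

    Sort degrees in descending order, cut the sequence into groups of
    at least k, and raise every degree in a group to the group maximum,
    so each resulting degree value is shared by at least k nodes.
    """
    d = sorted(degrees, reverse=True)
    out = []
    i = 0
    while i < len(d):
        # If fewer than 2k values remain, close the sequence in one group.
        j = len(d) if len(d) - i < 2 * k else i + k
        out.extend([d[i]] * (j - i))  # d[i] is the group maximum
        i = j
    return out

anonymized = k_anonymize_degrees([5, 4, 4, 3, 2, 1], k=2)
```

Degrees are only ever raised, never lowered, so the anonymized sequence can be realized by adding edges; choosing the cut points to minimize the total increase is the optimization the literature focuses on.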

Journal ArticleDOI
TL;DR: A Sequence R (SR)-tree structure that satisfies differential privacy based on the R-tree is proposed, and an attack model called the non-location sensitive information attack is put forward; to resist this attack, noise is added to the location data and non-location sensitive data using differential privacy techniques.
Abstract: Existing location-based services have collected a large amount of user trajectory data, and if these data are released directly without any processing, users' personal privacy will be leaked. At present, differential privacy protection technology is favored by many scholars, but how to apply it reasonably to location-based services remains a challenge. A trajectory is spatiotemporally continuous, but most existing methods only consider the single location of a moving object at a certain time without considering the entire trajectory, which may destroy the spatiotemporal integrity of the trajectory. In this paper, we address this problem and first propose a Sequence R (SR)-tree structure that satisfies differential privacy based on the R-tree; we construct the SR-tree by using the trajectory sequence instead of the minimum bounding rectangle of the R-tree. Then we put forward an attack model called the non-location sensitive information attack; to resist this attack, we add noise to the location data and non-location sensitive data using differential privacy techniques. Finally, the algorithm consistently deals with the problem of data inconsistency after adding noise. Experimental results show that our algorithm not only has high data availability and operational efficiency, but also good scalability.

Posted Content
TL;DR: This paper proposes an improved suppression method, which reduces the disclosure risk and enhances the data utility by targeting the highest-risk records and keeping other records intact, and demonstrates the effectiveness of this approach through an experiment on a real-world confidential dataset.
Abstract: In Privacy Preserving Data Publishing, various privacy models have been developed for employing anonymization operations on sensitive individual level datasets, in order to publish the data for public access while preserving the privacy of individuals in the dataset. However, there is always a trade-off between preserving privacy and data utility; the more changes we make on the confidential dataset to reduce disclosure risk, the more information the data loses and the less data utility it preserves. The optimum privacy technique is the one that results in a dataset with minimum disclosure risk and maximum data utility. In this paper, we propose an improved suppression method, which reduces the disclosure risk and enhances the data utility by targeting the highest risk records and keeping other records intact. We have shown the effectiveness of our approach through an experiment on a real-world confidential dataset.
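The targeted-suppression idea can be sketched as follows: only records whose quasi-identifier combination is shared by fewer than k records are suppressed, while all other records are published intact. This is a simplified sketch; the paper's risk measure and suppression rule may differ, and the field names and threshold are invented for the example.

```python
from collections import Counter

def suppress_high_risk(records, qids, k=2, mask="*"):
    """Suppress quasi-identifier values only for high-risk records.

    A record is high-risk when its quasi-identifier combination is
    shared by fewer than k records in the dataset; masking only those
    records keeps information loss (and utility loss) low.
    """
    key = lambda r: tuple(r[q] for q in qids)
    freq = Counter(key(r) for r in records)
    out = []
    for r in records:
        if freq[key(r)] < k:
            r = {**r, **{q: mask for q in qids}}  # mask QIDs, keep the rest
        out.append(r)
    return out

rows = [
    {"age": 34, "zip": "1010", "disease": "flu"},
    {"age": 34, "zip": "1010", "disease": "cold"},
    {"age": 71, "zip": "9999", "disease": "rare"},  # unique QIDs: high risk
]
released = suppress_high_risk(rows, qids=["age", "zip"], k=2)
```

Compared with suppressing or generalizing the whole column, this keeps every low-risk record untouched, which is exactly the trade-off the abstract emphasizes.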

Proceedings ArticleDOI
07 Jul 2019
TL;DR: This work proposes a new data publishing algorithm in which a released dataset is formed by mixing $\ell$ randomly chosen data points and then perturbing them with additive noise, and shows that as $\ell$ increases, noise with smaller variance is sufficient to achieve a target privacy level.
Abstract: The goal of differentially private data publishing is to release a modified dataset so that its privacy can be ensured while allowing for efficient learning. We propose a new data publishing algorithm in which a released dataset is formed by mixing $\ell$ randomly chosen data points and then perturbing them with additive noise. Our privacy analysis shows that as $\ell$ increases, noise with smaller variance is sufficient to achieve a target privacy level. In order to quantify the usefulness of our algorithm, we adopt the accuracy of a predictive model trained with our synthetic dataset, which we call the utility of the dataset. By characterizing the utility of our dataset as a function of $\ell$, we show that one can learn both linear and nonlinear predictive models so that they yield reasonably good prediction accuracies. In particular, we show that there exists a sweet spot on $\ell$ that maximizes the prediction accuracy given a required privacy level, or vice versa. We also demonstrate that given a target privacy level, our datasets can achieve higher utility than other datasets generated with the existing data publishing algorithms.
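The mix-then-perturb release step can be sketched as follows. This is an illustrative sketch only: the paper's noise distribution and privacy calibration are abstracted into a single noise_scale parameter, and Gaussian noise is used here purely as an example of additive noise.

```python
import random

def mix_and_perturb(data, ell, noise_scale, n_out, seed=0):
    """Release n_out synthetic points, each the average of ell randomly
    chosen raw points plus additive noise.

    As ell grows, each raw point's influence on a released point
    shrinks, so less noise suffices for the same privacy level.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    released = []
    for _ in range(n_out):
        picks = [rng.choice(data) for _ in range(ell)]
        mixed = [sum(p[j] for p in picks) / ell for j in range(dim)]
        released.append([x + rng.gauss(0.0, noise_scale) for x in mixed])
    return released

raw = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
synthetic = mix_and_perturb(raw, ell=2, noise_scale=0.1, n_out=3)
```

A model is then trained on the synthetic points instead of the raw data; sweeping ell while holding the privacy level fixed reproduces the utility trade-off the abstract describes.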

Journal ArticleDOI
TL;DR: As discussed by the authors, the Hellenic Open University can view the learning process as a framework of interacting roles and factors in which patterns can be discovered, and the ability to publish and share these results would be very helpful for the whole academic institution.
Abstract: Recent technological advances have led to tremendous capacities for collecting, storing and analyzing data being created at an ever-increasing speed from diverse sources. Academic institutions which offer open and distance learning programs, such as the Hellenic Open University, can benefit from big data relating to their students' information and communication systems and from the use of modern techniques and tools of big data analytics, provided that the student's right to privacy is not compromised. The balance between data mining and maintaining privacy can be reached through anonymisation methods, but this approach raises technical problems such as the loss of a certain amount of the information found in the original data. Considering the learning process as a framework of interacting roles and factors, the discovery of patterns in that system can be really useful and beneficial, first for the learners; furthermore, the ability to publish and share these results would be very helpful for the whole academic institution.

Journal ArticleDOI
TL;DR: A partitioned histogram data publishing algorithm based on the wavelet transform is proposed that can reduce the complexity of the wavelet tree constructed by the wavelet transform and improve the accuracy of histogram counting queries.
Abstract: With the rapid development of information science and the Internet of Things (IoT), people have an unprecedented ability to collect and share data, with various sensors serving as the entrance to data collection. At the same time, edge computing has begun to grasp the public's attention because of the difficult challenges of massive equipment access and massive data. Although such a large amount of data provides a huge opportunity for information discovery, privacy leakage has also become a concern. When the data publisher publishes various statistics, an attacker can obtain the statistical rules in the data simply by using the query function, without contacting the user or the data publisher. Therefore, how to protect the privacy of statistical information has become the focus of attention. In this paper, we propose a partitioned histogram data publishing algorithm based on the wavelet transform. First, a greedy partitioning algorithm is used to obtain a better partition structure. Then, we use the wavelet transform to add noise. Finally, to preserve the authenticity and usability of the histogram, we reconstruct the original histogram structure. On the one hand, our algorithm can reduce the complexity of the wavelet tree constructed by the wavelet transform. On the other hand, the query noise changes from linear growth to polylogarithmic growth, so the accuracy of histogram counting queries is improved. Experiments show that our algorithm significantly improves data availability.
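The wavelet-based noise injection can be sketched with a Haar transform over a power-of-two histogram: noise is added to the coefficients rather than the raw bins, which is why range-query error grows polylogarithmically instead of linearly. This is an illustrative sketch; the paper's greedy partitioning step and per-level noise calibration are omitted, and the uniform noise scale is an assumption.

```python
import math
import random

def haar_forward(xs):
    """Full Haar decomposition of a length-2^m histogram."""
    xs = list(xs)
    coeffs = []
    while len(xs) > 1:
        avgs = [(xs[i] + xs[i + 1]) / 2 for i in range(0, len(xs), 2)]
        dets = [(xs[i] - xs[i + 1]) / 2 for i in range(0, len(xs), 2)]
        coeffs = dets + coeffs
        xs = avgs
    return xs + coeffs  # [overall average, detail coefficients ...]

def haar_inverse(cs):
    """Invert haar_forward, rebuilding bins level by level."""
    xs, rest = [cs[0]], cs[1:]
    while rest:
        dets, rest = rest[:len(xs)], rest[len(xs):]
        xs = [v for a, d in zip(xs, dets) for v in (a + d, a - d)]
    return xs

def lap(scale, rng):
    """One Laplace(0, scale) draw via the inverse-CDF transform."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def publish_wavelet(hist, epsilon, seed=0):
    """Perturb the wavelet coefficients instead of the raw bins."""
    rng = random.Random(seed)
    noisy = [c + lap(1.0 / epsilon, rng) for c in haar_forward(hist)]
    return haar_inverse(noisy)
```

Because a range query touches only the few coefficients covering it, the noise it accumulates scales with the depth of the wavelet tree rather than with the number of bins.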

Journal ArticleDOI
TL;DR: A new DP histogram publishing scheme is proposed, namely Iterative Histogram Partition, in which the privacy budget between grouping and injection phases is carefully assigned, and it is theoretically proved that $\epsilon$-differential privacy can be achieved according to this new scheme.
Abstract: Differential privacy (DP) is a promising tool for preserving privacy during data publication, as it provides strong theoretical privacy guarantees in the face of adversaries with arbitrary background knowledge. The histogram, as the result of a set of count queries, serves as a core statistical tool to report data distributions and is in fact viewed as the fundamental method for many other statistical analyses such as range queries. It is an important form for data publishing. In this paper, we consider the scenario of publishing sensitive histogram data with a differential privacy scheme. Existing work in this field has justified that, compared to directly applying DP techniques (i.e., injecting noise) over the counts in histogram bins, grouping bins before noise injection is more effective (i.e., yields higher utility), as it introduces much less error over the sanitized histogram given the same privacy budget. However, state-of-the-art works have not unveiled how the overall utility of a sanitized histogram can be affected by the balance of the privacy budget distributed between the grouping and noise injection phases. In this work, we conduct a theoretical study of how the probability of getting better groups can be improved such that the overall error introduced in the sanitized histogram can be further reduced, which directly leads to higher utility for the sanitized histograms. In particular, we show that the probability of achieving better grouping is affected by two factors, namely the privacy budget assigned to grouping and the normalized utility function used for selecting groups. Motivated by that, we propose a new DP histogram publishing scheme, namely Iterative Histogram Partition, in which we carefully assign the privacy budget between the grouping and injection phases based on our theoretical study. We also theoretically prove that $\epsilon$-differential privacy can be achieved according to our new scheme.
Moreover, we also show that, under the same privacy budget, our scheme exhibits smaller errors in the sanitized histograms compared with state-of-the-art methods. We also extend the model to multi-dimensional histogram publication cases. Finally, an empirical study over four real-world datasets also justifies that our scheme achieves the least error among a series of state-of-the-art baseline methods.
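The grouping-then-injection trade-off can be sketched as follows. This is illustrative only: Iterative Histogram Partition chooses groups adaptively and splits the budget between the two phases, whereas this sketch uses fixed-size groups and spends the whole budget on injection.

```python
import math
import random

def lap(scale, rng):
    """One Laplace(0, scale) draw via the inverse-CDF transform."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def publish_grouped(counts, group_size, epsilon, seed=0):
    """Group-then-inject: bins in a group share one noisy mean.

    Averaging within a group divides the Laplace noise per bin by the
    group size, at the cost of approximation error when grouped bins
    differ; balancing these two error sources is what the paper's
    budget assignment optimizes.
    """
    rng = random.Random(seed)
    out = []
    for i in range(0, len(counts), group_size):
        g = counts[i:i + group_size]
        mean = sum(g) / len(g)
        out.extend([mean + lap(1.0 / (epsilon * len(g)), rng)] * len(g))
    return out

noisy_hist = publish_grouped([12.0, 14.0, 3.0, 5.0], group_size=2, epsilon=0.5)
```

With similar bins per group, the noise reduction dominates; with dissimilar bins, the grouping error dominates, which is why how groups are chosen (and how much budget that choice consumes) matters.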

Journal ArticleDOI
TL;DR: This paper develops a new $\tau$-safe $(l,k)$-diversity privacy model based on generalization and segmentation, with record anonymity satisfying $l$-diversity and individual anonymity satisfying $k$-anonymity, to protect the privacy of individuals in sequential publication.
Abstract: Preserving privacy while maintaining high utility during sequential publication plays an important role for data providers and data users in mathematical statistics, scientific research, and organizational decision making. The $\tau$-safety model is the state-of-the-art model in sequential publication. However, it is based on the generalization technique, which has drawbacks such as heavy information loss and difficulty supporting marginal publication. Besides, the privacy of individuals is the major aspect that needs to be protected in privacy preserving data publishing. In this paper, to protect the privacy of individuals in sequential publication, we develop a new $\tau$-safe $(l,k)$-diversity privacy model based on generalization and segmentation, with record anonymity satisfying $l$-diversity and individual anonymity satisfying $k$-anonymity. This privacy model ensures that each record's signatures either keep consistency or have no intersection across all releases. It can achieve high data utility while resisting linking attacks due to arbitrary updates. In addition, it can also be applied to datasets where an individual has multiple records, and to arbitrary marginal publication. The results of our experiments show that the proposed privacy model achieves better anonymization quality and query accuracy in comparison with the $m$-invariance and $\tau$-safety models in sequential publication with arbitrary updates.

Journal ArticleDOI
TL;DR: HIDE is proposed, an oblivious, computationally efficient, and rigorous information-theoretic privacy engineering framework for datasets/databases arising in SMG environments that robustly accounts for multi-attribute correlations while preserving data privacy in a provably optimal fashion.
Abstract: In developing countries, reliable electricity access is often undermined by the absence of supply from the national power grid and/or load shedding. To alleviate this problem, smart micro-grid (SMG) networks, which are small-scale distributed electricity provision networks composed of individual electricity providers and consumers, are being increasingly deployed. To ensure the reliable operation of SMGs, monitoring is necessary for data collection and state estimation processes. However, highly calibrated and trustworthy smart meters that are ideally suited to perform such monitoring tasks are often costly and not ideally suited to SMGs, which operate under unreliable communication network infrastructures. As a result, SMGs are an easy target for an adversary who can very easily gain access to private information by monitoring transmission between nodes in the SMG network, and launch inference-based privacy attacks. These attacks lead to electricity theft and grid instability problems in the SMG. The widely popular differential privacy (DP) technique (a rigorous technique in the family of privacy-preserving data publishing (PPDP) techniques to mathematically guarantee the preservation of data privacy) does not address multi-attribute correlations, which are inherently exploited by an adversary in inference attacks. In this paper, we propose HIDE, an oblivious, computationally efficient, and rigorous information-theoretic privacy engineering framework for datasets/databases arising in SMG environments that robustly accounts for multi-attribute correlations while preserving data privacy in a provably optimal fashion. A salient and powerful advantage of HIDE is its ability to generate optimal utility-privacy tradeoffs (computationally efficiently) even when the privacy preserving entity in the worst case might have no prior statistical information linking a user's private data with his public data.

Proceedings ArticleDOI
Jiawei Zheng, Xuewen Dong, Liu Qihang, Xinghui Zhu, Tong Wei
26 May 2019
TL;DR: A digital asset exchange mechanism based on blockchain technology is proposed, in which data publishing and exchange events are recorded on the blockchain, ensuring the reliability and transparency of data exchange without the restriction of trusted third-party payment institutions.
Abstract: As the Internet of Things (IoT) becomes increasingly popular, the number of IoT devices such as sensors and smart equipment is growing at an astonishing rate, and the data generated by these devices is exploding. However, these massive IoT data, stored in the form of isolated data centers, cannot be shared by others who also need them. Moreover, data exchange now needs a secure and fair mechanism to guarantee the data provider's rights and data security. Data providers also lack the motivation to share their data, as no effective mechanism exists to reward this behavior. To solve these problems, we propose a digital asset exchange mechanism based on blockchain technology, in which the behavior of data publishing and exchanging is recorded on the blockchain, ensuring the reliability and transparency of data exchange without the restriction of trusted third-party payment institutions. In particular, to inspire data providers to share their high-quality data, we design an incentive mechanism based on QoS, which gives higher rewards to those who provide high-quality data. Experimental results from our prototype demonstrate that this mechanism is appropriate for practical application.
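The core bookkeeping described above, appending publish/exchange events to a tamper-evident chain and rewarding providers by quality, can be sketched in a few lines. This is a toy illustration, not the paper's system; class and function names are made up:

```python
import hashlib
import json

def block_hash(block):
    """Deterministic SHA-256 digest of a block's canonical JSON form."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

class ExchangeLedger:
    """Toy append-only chain recording data publishing/exchange events."""
    def __init__(self):
        self.chain = [{"index": 0, "prev": "0" * 64, "event": "genesis"}]

    def record(self, event):
        block = {"index": len(self.chain),
                 "prev": block_hash(self.chain[-1]),
                 "event": event}
        self.chain.append(block)
        return block

    def verify(self):
        """A tampered block breaks every subsequent 'prev' link."""
        return all(self.chain[i]["prev"] == block_hash(self.chain[i - 1])
                   for i in range(1, len(self.chain)))

def qos_reward(base_reward, quality_score):
    """Hypothetical QoS incentive: scale the reward by a quality score in [0, 1]."""
    return base_reward * quality_score
```

Because each block embeds the hash of its predecessor, rewriting any recorded publish or exchange event invalidates the rest of the chain, which is what removes the need to trust a third-party intermediary.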

Journal ArticleDOI
TL;DR: A novel overlapped slicing method for privacy-preserving data publishing with multiple sensitive attributes is proposed, and it is shown that the method obtains a lower discernibility value than other methods.
Abstract: Privacy-preserving data publishing with multiple sensitive attributes is investigated to reduce the probability that adversaries can guess the sensitive values. Masking the sensitive values is usually performed by anonymizing data using generalization and suppression techniques. A successful anonymization technique should reduce the information loss due to generalization and suppression. This research attempts to solve both problems in microdata with multiple sensitive attributes. We propose a novel overlapped slicing method for privacy-preserving data publishing with multiple sensitive attributes. We use the discernibility metric to measure information loss. The experimental results show that our method obtains a lower discernibility value than other methods.

Journal ArticleDOI
TL;DR: A privacy preservation scheme for sensitive data publishing in social networks, based on the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm, is proposed to tackle this issue.

Journal ArticleDOI
TL;DR: This study provides recommendations for public sector organizations for the development of their data publishing strategy to balance control, usability and visibility considering also the growing popularity of open knowledge bases such as Wikidata.
Abstract: Linked data is a technical standard to structure complex information and relate independent sets of data. Recently, governments have started to use this technology for bridging separated data silos by launching linked open government data (LOGD) portals. The purpose of this paper is to explore the role of LOGD as a smart technology and strategy to create public value. This is achieved by enhancing the usability and visibility of open data provided by public organizations. In this study, three different LOGD governance modes are deduced: public agencies could release linked data via a dedicated triple store, via a shared triple store or via an open knowledge base. Each of these modes has different effects on the usability and visibility of open data. Selected case studies illustrate the actual use of these three governance modes. According to this study, LOGD governance modes present a trade-off between retaining control over governmental data and potentially gaining public value through the increased use of open data by citizens. This study provides recommendations for public sector organizations for the development of their data publishing strategy to balance control, usability and visibility, considering also the growing popularity of open knowledge bases such as Wikidata.
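The "dedicated triple store" mode discussed above amounts to an agency-controlled collection of subject-predicate-object statements with pattern-based querying. A toy in-memory sketch of that idea (the identifiers are made up; real deployments would use RDF stores and SPARQL):

```python
class TripleStore:
    """Minimal in-memory triple store: subject-predicate-object statements
    plus wildcard pattern matching, loosely mimicking a SPARQL basic pattern."""
    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """None acts as a wildcard, like a SPARQL variable."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

store = TripleStore()
store.add("ex:Cadastre", "rdf:type", "org:PublicAgency")
store.add("ex:Cadastre", "ex:publishes", "ex:ParcelDataset")
```

In the dedicated mode the agency alone writes to such a store; the shared-store and open-knowledge-base modes trade that control for broader visibility, which is exactly the trade-off the study examines.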

Proceedings ArticleDOI
01 Oct 2019
TL;DR: Experimental results show that the proposed approach, called Simple Distribution of Sensitive Value, outperforms systematic clustering, which is considered a very effective method for grouping quasi-identifiers, when high-sensitive values are distributed.
Abstract: k-anonymity is a popular model in privacy-preserving data publishing. It provides a privacy guarantee when a microdata table is released. In microdata, sensitive attributes contain high-sensitive and low-sensitive values. Unfortunately, studies on anonymization that distribute sensitive values are still rare. This study aims to distribute high-sensitive values evenly across quasi-identifier groups. We propose an approach called Simple Distribution of Sensitive Value. We compare our method with systematic clustering, which is considered a very effective method for grouping quasi-identifiers. Information entropy is used to measure the diversity within each quasi-identifier group and in the microdata table as a whole. Experimental results show that our method outperforms systematic clustering when high-sensitive values are distributed.
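The information-entropy measure used in this comparison can be computed per quasi-identifier group as standard Shannon entropy over the group's sensitive values; this is a generic sketch, not the paper's exact code:

```python
import math
from collections import Counter

def sensitive_entropy(group_sensitive_values):
    """Shannon entropy (bits) of the sensitive-value distribution in one
    quasi-identifier group; higher means high- and low-sensitive values
    are more evenly mixed, which is what the distribution scheme aims for."""
    counts = Counter(group_sensitive_values)
    n = len(group_sensitive_values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A group mixing a high-sensitive and a low-sensitive value is maximally
# diverse (1 bit); a group with a single repeated value has zero entropy.
```

Averaging this over all quasi-identifier groups gives a table-level diversity score, so an even spread of high-sensitive values shows up directly as higher entropy.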

Proceedings ArticleDOI
08 Jan 2019
TL;DR: Action research conducted within the context of the Dutch Cadastre's open data platform is described, which develops four components for Linked Data viewing to enhance the current situation, making it easier to observe what a dataset is about and which potential use cases it could serve.
Abstract: Open Governmental Data publishing has had mixed success. While many governmental bodies are publishing an increasing number of datasets online, their potential usefulness is rather low. This paper describes action research conducted within the context of the Dutch Cadastre’s open data platform. We start by observing contemporary (Dutch) Open Data platforms and observe that dataset reuse is not always realized. We introduce Linked Open Data, which promises to deliver solutions to the lack of Open Data reuse. In the process of implementing Linked Data in practice, we observe that users face a knowledge and skill gap and that contemporary Linked Open Data tooling is often unable to properly advertise the usefulness of datasets to potential users, thereby hampering reuse. We therefore develop four components for Linked Data viewing to enhance the current situation, making it easier to observe what a dataset is about and which potential use cases it could serve.

Proceedings ArticleDOI
10 Jun 2019
TL;DR: This paper introduces an anonymization algorithm based on All-Distance Sketch (ADS), and proposes the novel bottom-(l, k) sketch to defend against advanced attacks, and develops a scheme to add and delete enough edges to satisfy the privacy demand.
Abstract: Releasing private data can cause panic to both Online Social Network (OSN) users and service providers. Therefore, anonymization mechanisms are proposed to protect data before sharing it. However, some of these mechanisms set unrealistic privacy demands but cannot defend against real-world de-anonymization attacks. In this paper, we introduce an anonymization algorithm based on the All-Distance Sketch (ADS). Sketching can significantly limit attackers’ confidence, as well as provide accurate estimates of shortest path lengths and other utility metrics. Because sketching removes large amounts of edges, it is invulnerable to seed-based and subgraph-based de-anonymization attacks. However, existing sketching algorithms do not add dummy edges and paths, so adversaries achieve low false-positive rates when extracting linking information, which weakens the privacy guarantee. We propose the novel bottom-(l, k) sketch to defend against these advanced attacks. We develop a scheme to add and delete enough edges to satisfy our privacy demand. The experimental results show that our published graphs closely match the original graphs under several utility metrics, preserving utility, while 80% of edges are removed, ensuring privacy.
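A bottom-k sketch of the kind this construction generalizes keeps only the k smallest random ranks of a set: the published structure reveals far less than the full set, yet still supports accurate cardinality (and, in ADS, distance) estimation. A simplified sketch of the building block, not the paper's bottom-(l, k)-over-ADS construction:

```python
def bottom_k_sketch(elements, ranks, k):
    """Keep the k smallest of the elements' uniform random ranks in [0, 1)."""
    return sorted(ranks[e] for e in elements)[:k]

def estimate_cardinality(sketch, k):
    """Standard bottom-k estimator: (k - 1) / tau, tau = k-th smallest rank.
    If the sketch holds fewer than k ranks, the whole set was seen; count exactly."""
    if len(sketch) < k:
        return float(len(sketch))
    return (k - 1) / sketch[k - 1]

# Illustrative ranks (in practice drawn uniformly at random per node).
ranks = {"a": 0.10, "b": 0.40, "c": 0.20, "d": 0.90}
sk = bottom_k_sketch(["a", "b", "c", "d"], ranks, k=2)  # [0.10, 0.20]
```

Since the sketch fixes how many ranks survive regardless of set size, an attacker observing it learns little about which specific neighbors or paths were present, which is the privacy angle the paper builds on.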