Showing papers on "Data publishing published in 2012"

PDF

Open Access

Journal Article•DOI•

Slicing: A New Approach for Privacy Preserving Data Publishing

[...]

Tiancheng Li¹, Ninghui Li¹, Jian Zhang¹, Ian M. Molloy¹•Institutions (1)

01 Mar 2012-IEEE Transactions on Knowledge and Data Engineering

TL;DR: Li et al. as discussed by the authors presented a novel technique called slicing, which partitions the data both horizontally and vertically, and showed that slicing preserves better data utility than generalization and can be used for membership disclosure protection.

...read moreread less

Abstract: Several anonymization techniques, such as generalization and bucketization, have been designed for privacy preserving microdata publishing. Recent work has shown that generalization loses considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply for data that do not have a clear separation between quasi-identifying attributes and sensitive attributes. In this paper, we present a novel technique called slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. We show how slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing the sliced data that obey the l-diversity requirement. Our workload experiments confirm that slicing preserves better utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute. Our experiments also demonstrate that slicing can be used to prevent membership disclosure.

...read moreread less

320 citations

Proceedings Article•DOI•

DPCube: Releasing Differentially Private Data Cubes for Health Information

[...]

Yonghui Xiao¹, James Gardner, Li Xiong¹•Institutions (1)

Emory University¹

01 Apr 2012

TL;DR: DPCube, a component in the HIDE framework, is demonstrated for releasing differentially private data cubes (or multi-dimensional histograms) for sensitive data and can serve as a sanitized synopsis of the raw database and can support various Online Analytical Processing queries and learning tasks.

...read moreread less

Abstract: We propose to demonstrate DPCube, a component in our Health Information DE-identification (HIDE) framework, for releasing differentially private data cubes (or multidimensional histograms) for sensitive data. HIDE is a framework we developed for integrating heterogenous structured and unstructured health information and provides methods for privacy preserving data publishing. The DPCube component provides the differentially private multidimensional data cube release. The DPCube algorithm uses the differentially private access mechanisms as provided by HIDE and guarantees differential privacy for the released data. It utilizes an innovative two-step multidimensional partitioning technique to publish a generalized data cube or multi-dimensional histogram that achieve good utility while satisfying the privacy requirement. We demonstrate that the released data cubes can serve as a sanitized synopsis of the raw database and, together with an optional synthesized dataset based on the data cubes, can support various Online Analytical Processing (OLAP) queries and learning tasks.

...read moreread less

51 citations

Book Chapter•DOI•

Secure distributed framework for achieving ε-differential privacy

[...]

Dima Alhadidi¹, Noman Mohammed¹, Benjamin C. M. Fung¹, Mourad Debbabi¹•Institutions (1)

Concordia University¹

11 Jul 2012

TL;DR: This paper presents the first generalization-based algorithm for differentially private data release for horizontally-partitioned data between two parties in the semi-honest adversary model, and presents a two-party protocol for the exponential mechanism.

...read moreread less

Abstract: Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among the existing privacy models, e-differential privacy provides one of the strongest privacy guarantees. In this paper, we address the problem of private data publishing where data is horizontally divided among two parties over the same set of attributes. In particular, we present the first generalization-based algorithm for differentially private data release for horizontally-partitioned data between two parties in the semi-honest adversary model. The generalization algorithm correctly releases differentially-private data and protects the privacy of each party according to the definition of secure multi-party computation. To achieve this, we first present a two-party protocol for the exponential mechanism. This protocol can be used as a subprotocol by any other algorithm that requires exponential mechanism in a distributed setting. Experimental results on real-life data suggest that the proposed algorithm can effectively preserve information for a data mining task.

...read moreread less

47 citations

Journal Article•DOI•

Limiting disclosure of sensitive data in sequential releases of databases

[...]

Erez Shmueli¹, Tamir Tassa², Raz Wasserstein¹, Bracha Shapira¹, Lior Rokach¹ - Show less +1 more•Institutions (2)

Ben-Gurion University of the Negev¹, Open University²

01 May 2012-Information Sciences

TL;DR: A theoretical study and experimentation are followed by experimentation that demonstrates a staggering improvement in terms of utility due to the adoption of the cell generalization model, and exemplifies the correction in the privacy evaluation as offered by using the Full or Kernel Match Joins instead of the Match Join.

...read moreread less

47 citations

Journal Article•DOI•

Data sharing and publishing in the field of neuroimaging

[...]

Janis L. Breeze, Jean-Baptiste Poline, David N. Kennedy¹•Institutions (1)

University of Massachusetts Amherst¹

12 Jul 2012-GigaScience

TL;DR: This commentary outlines the efforts of the International Neuroinformatics Coordinating Facility Task Force on Neuroimaging Datasharing to coordinate and establish such standards, as well as potential ways forward to relieve the issues that researchers who produce these massive, reusable community resources face when making the data rapidly and freely available to the public.

...read moreread less

Abstract: There is growing recognition of the importance of data sharing in the neurosciences, and in particular in the field of neuroimaging research, in order to best make use of the volumes of human subject data that have been acquired to date. However, a number of barriers, both practical and cultural, continue to impede the widespread practice of data sharing; these include: lack of standard infrastructure and tools for data sharing, uncertainty about how to organize and prepare the data for sharing, and researchers’ fears about unattributed data use or missed opportunities for publication. A further challenge is how the scientific community should best describe and/or reference shared data that is used in secondary analyses. Finally, issues of human research subject protections and the ethical use of such data are an ongoing source of concern for neuroimaging researchers. One crucial issue is how producers of shared data can and should be acknowledged and how this important component of science will benefit individuals in their academic careers. While we encourage the field to make use of these opportunities for data publishing, it is critical that standards for metadata, provenance, and other descriptors are used. This commentary outlines the efforts of the International Neuroinformatics Coordinating Facility Task Force on Neuroimaging Datasharing to coordinate and establish such standards, as well as potential ways forward to relieve the issues that researchers who produce these massive, reusable community resources face when making the data rapidly and freely available to the public. Both the technical and human aspects of data sharing must be addressed if we are to go forward.

...read moreread less

23 citations

Journal Article•DOI•

More than modelling and hiding: towards a comprehensive view of Web mining and privacy

[...]

Bettina Berendt¹•Institutions (1)

Katholieke Universiteit Leuven¹

01 May 2012-Data Mining and Knowledge Discovery

TL;DR: A survey with an outline of an agenda for a comprehensive, interdisciplinary view of Web mining and privacy, which draws on notions of privacy not only as hiding, but as control and negotiation, as well as on data mining as the whole cycle of knowledge discovery.

...read moreread less

Abstract: Over the last decade, privacy has been widely recognised as one of the major problems of data collections in general and the Web in particular. This concerns specifically data arising from Web usage (such as querying or transacting) and social networking (characterised by rich self-profiling including relational information) and the inferences drawn from them. The data mining community has been very conscious of these issues and has addressed in particular the inference problems through various methods for "privacy-preserving data mining" and "privacy-preserving data publishing". However, it appears that these approaches by themselves cannot effectively solve the privacy problems posed by mining. We argue that this is due to the underlying notions of privacy and of data mining, both of which are too narrow. Drawing on notions of privacy not only as hiding, but as control and negotiation, as well as on data mining not only as modelling, but as the whole cycle of knowledge discovery, we offer an alternative view. This is intended to be a comprehensive view of the privacy challenges as well as solution approaches along all phases of the knowledge discovery cycle. The paper thus combines a survey with an outline of an agenda for a comprehensive, interdisciplinary view of Web mining and privacy.

...read moreread less

23 citations

Journal Article•DOI•

A Survey on Methods, Attacks and Metric for Privacy Preserving Data Publishing

[...]

Kiran P, Kavya N P

25 Sep 2012-International Journal of Computer Applications

TL;DR: This paper provides a review of various methods for anonymization and analyzes various disclosures that may happen in each of them and discusses various attacks that may take place during anonymization.

...read moreread less

Abstract: Privacy Preserving is a prerequisite for most of the existing systems. Data is usually distributed in the system so the main job of Data Publisher is to retrieve information from different location and to transform it in to some standard format suitable for Data Recipient. This information contains sensitive data which must be preserved by Data Publisher before it is published. So the core of this method is to preserve the sensitivity of data pertaining to individual or company related data. The complexity of its representation and the prerequisite of the current industry have driven lot of research in this direction. In this paper, we provide a review of various methods for anonymization and analyze various disclosures that may happen in each of them. We have also discussed various attacks that may take place during anonymization. A comprehensive study of various metric used for measuring anonymity has also been discussed.

...read moreread less

22 citations

Proceedings Article•DOI•

An information theoretic privacy and utility measure for data sanitization mechanisms

[...]

Mina Askari¹, Reihaneh Safavi-Naini¹, Ken Barker¹•Institutions (1)

University of Calgary¹

07 Feb 2012

TL;DR: A novel information theoretic framework for representing a formal model of a mechanism as a noisy channel and evaluating its privacy and utility and it is shown that using this framework the authors can compute the sanitization mechanism's preserved utility from the point of view of a data user.

...read moreread less

Abstract: Data collection agencies publish sensitive data for legitimate purposes, such as research, marketing and etc. Data publishing has attracted much interest in research community due to the important concerns over the protection of individuals privacy. As a result several sanitization mechanisms with different notions of privacy have been proposed. To be able to measure, set and compare the level of privacy protection, there is a need to translate these different mechanisms to a unified system. In this paper, we propose a novel information theoretic framework for representing a formal model of a mechanism as a noisy channel and evaluating its privacy and utility. We show that deterministic publishing property that is used in most of these mechanisms reduces the privacy guarantees and causes information to leak. The great effect of adversary's background knowledge on this metric is concluded. We also show that using this framework we can compute the sanitization mechanism's preserved utility from the point of view of a data user. By using the specifications of a popular sanitization mechanism, k-anonymity, we analytically provide a representation of this mechanism to be used for its evaluation.

...read moreread less

22 citations

Proceedings Article•DOI•

A new model for privacy preserving sensitive Data Mining

[...]

Muthuramalingam Prakash, G. Singaravel

26 Jul 2012

TL;DR: A new privacy measure called “(n, t)-proximity” is proposed which is more flexible model and problems with k-anonymity and l-Diversity are discussed.

...read moreread less

Abstract: Data Mining and Knowledge Discovery is an indispensable technology for business and researches in many fields such as statistics, machine learning, pattern recognition, databases and high performance computing. In which Privacy Preserving Data Mining has the potential to increase the reach and benefits of data mining technology. This allows publishing a microdata without disclosing private information. Publishing data about individuals without revealing sensitive information about them is an important problem. k-anonymity and l-Diversity has been proposed as a mechanism for protecting privacy in microdata publishing. But both the mechanisms are insufficient to protect the privacy issues like Homogeneity attack, Skewness Attack, Similarity attack and Background Knowledge Attack. A new privacy measure called “(n, t)-proximity” is proposed which is more flexible model. Here first introduction about data mining is presented, and then research challenges are given. Followed by privacy preservation measures and problems with k-anonymity and l-Diversity are discussed. The rest of the paper is organised as (n, t)-proximity model, experimental results and analysis followed by conclusion.

...read moreread less

18 citations

Journal Article•DOI•

An OBDD approach to enforce confidentiality and visibility constraints in data publishing

[...]

Valentina Ciriani¹, Sabrina De Capitani di Vimercati¹, Sara Foresti¹, Giovanni Livraga¹, Pierangela Samarati¹ - Show less +1 more•Institutions (1)

University of Milan¹

01 Sep 2012-Journal of Computer Security

TL;DR: This paper proposes a solution based on the release of vertical views fragments over a relational table that satisfy confidentiality and visibility constraints expressing requirements for information protection and release, respectively.

...read moreread less

Abstract: With the growing needs for data sharing and dissemination, privacy-preserving data publishing is becoming an important issue that still requires further investigation. In this paper, we make a step towards private data publication by proposing a solution based on the release of vertical views fragments over a relational table that satisfy confidentiality and visibility constraints expressing requirements for information protection and release, respectively. We translate the problem of computing a fragmentation composed of the minimum number of fragments into the problem of computing a maximum weighted clique over a fragmentation graph. The fragmentation graph models fragments, efficiently computed using Ordered Binary Decision Diagrams OBDDs, that satisfy all the confidentiality constraints and a subset of the visibility constraints defined in the system. We then show an exact and a heuristic algorithm for computing a minimal and a locally minimal fragmentation, respectively. Finally, we provide experimental results comparing the execution time and the fragmentations returned by the exact and heuristic algorithms. The experiments show that the heuristic algorithm has low computation cost and computes a fragmentation close to optimum.

...read moreread less

18 citations

Journal Article•DOI•

VarioML framework for comprehensive variation data representation and exchange

[...]

Myles Byrne¹, Ivo F.A.C. Fokkema², Owen Lancaster³, Tomasz Adamusiak⁴, Anni Ahonen-Bishopp, David Atlan, Christophe Béroud⁵, Michael Cornell, Raymond Dalgleish³, Andrew Devereau, George P. Patrinos⁶, Morris A. Swertz⁷, Peter E.M. Taschner², Gudmundur A. Thorisson³, Mauno Vihinen⁸, Mauno Vihinen⁹, Anthony J. Brookes³, Juha Muilu¹ - Show less +14 more•Institutions (9)

University of Helsinki¹, Leiden University Medical Center², University of Leicester³, Medical College of Wisconsin⁴, French Institute of Health and Medical Research⁵, RMIT University⁶, University Medical Center Groningen⁷, University of Tampere⁸, Lund University⁹

03 Oct 2012-BMC Bioinformatics

TL;DR: VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information that enables researchers, diagnostic laboratories, and clinics to share that information with ease, clarity, and without ambiguity.

...read moreread less

Abstract: Background: Sharing of data about variation and the associated phenotypes is a critical need, yet variant information can be arbitrarily complex, making a single standard vocabulary elusive and re-formatting difficult. Complex standards have proven too time-consuming to implement. Results: The GEN2PHEN project addressed these difficulties by developing a comprehensive data model for capturing biomedical observations, Observ-OM, and building the VarioML format around it. VarioML pairs a simplified open specification for describing variants, with a toolkit for adapting the specification into one's own research workflow. Straightforward variant data can be captured, federated, and exchanged with no overhead; more complex data can be described, without loss of compatibility. The open specification enables push-button submission to gene variant databases (LSDBs) e.g., the Leiden Open Variation Database, using the Cafe Variome data publishing service, while VarioML bidirectionally transforms data between XML and web-application code formats, opening up new possibilities for open source web applications building on shared data. A Java implementation toolkit makes VarioML easily integrated into biomedical applications. VarioML is designed primarily for LSDB data submission and transfer scenarios, but can also be used as a standard variation data format for JSON and XML document databases and user interface components. Conclusions: VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information with ease, clarity, and without ambiguity.

...read moreread less

Book Chapter•DOI•

Clustering-Based k -anonymity

[...]

Xianmang He¹, Huahui Chen¹, Yefang Chen¹, Yihong Dong¹, Peng Wang², Zhenhua Huang³ - Show less +2 more•Institutions (3)

Ningbo University¹, Fudan University², Tongji University³

29 May 2012

TL;DR: This work proposes a clustering-based k -anonymity algorithm, which achieves k -Anonymity through clustering, and shows that the utility has been improved by the approach.

...read moreread less

Abstract: Privacy is one of major concerns when data containing sensitive information needs to be released for ad hoc analysis, which has attracted wide research interest on privacy-preserving data publishing in the past few years. One approach of strategy to anonymize data is generalization. In a typical generalization approach, tuples in a table was first divided into many QI (quasi-identifier)-groups such that the size of each QI-group is no less than k . Clustering is to partition the tuples into many clusters such that the points within a cluster are more similar to each other than points in different clusters. The two methods share a common feature: distribute the tuples into many small groups. Motivated by this observation, we propose a clustering-based k -anonymity algorithm, which achieves k -anonymity through clustering. Extensive experiments on real data sets are also conducted, showing that the utility has been improved by our approach.

...read moreread less

Proceedings Article•DOI•

Private-HERMES: a benchmark framework for privacy-preserving mobility data querying and mining methods

[...]

Nikos Pelekis¹, Aris Gkoulalas-Divanis², Marios Vodas¹, Anargyros Plemenos¹, Despina Kopanaki¹, Yannis Theodoridis¹ - Show less +2 more•Institutions (2)

University of Piraeus¹, IBM²

27 Mar 2012

TL;DR: The demonstration of Private-HERMES via a real-world case study, illustrates the flexibility and usefulness of the platform for supporting privacy-aware data analysis, as well as for providing an extensible blueprint benchmark architecture for privacy-preservation related methods in mobility data.

...read moreread less

Abstract: Mobility data sources feed larger and larger trajectory databases nowadays. Due to the need of extracting useful knowledge patterns that improve services based on users' and customers' behavior, querying and mining such databases has gained significant attention in recent years. However, publishing mobility data may lead to severe privacy violations. In this paper, we present Private-HERMES, an integrated platform for applying data mining and privacy-preserving querying over mobility data. The presented platform provides a two-dimension benchmark framework that includes: (i) a query engine that provides privacy-aware data management functionality of the in-house data via a set of auditing mechanisms that protect the sensitive information against several types of attacks, and (ii) a progressive analysis framework, which, apart from anonymization methods for data publishing, includes various well-known mobility data mining techniques to evaluate the effect of anonymization in the querying and mining results. The demonstration of Private-HERMES via a real-world case study, illustrates the flexibility and usefulness of the platform for supporting privacy-aware data analysis, as well as for providing an extensible blueprint benchmark architecture for privacy-preservation related methods in mobility data.

...read moreread less

Proceedings Article•DOI•

Recursive partitioning and summarization: a practical framework for differentially private data publishing

[...]

Wahbeh Qardaji¹, Ninghui Li¹•Institutions (1)

Purdue University¹

02 May 2012

TL;DR: This paper considers the scenario in which a trusted curator gathers sensitive information from a large number of respondents, creates a relational dataset where each tuple corresponds to one entity, and then publishes a privacy-preserving version of the dataset.

...read moreread less

Abstract: In this paper we consider the problem of differentially private data publishing. In particular, we consider the scenario in which a trusted curator gathers sensitive information from a large number of respondents, creates a relational dataset where each tuple corresponds to one entity, such as an individual, a household, or an organization, and then publishes a privacy-preserving (i.e., sanitized or anonymized) version of the dataset. This has been referred to as the "non-interactive" mode of private data analysis, as opposed to the "interactive" mode, where the data curator provides an interface through which users may pose queries about the data, and get (possibly noisy) answers.

...read moreread less

Journal Article•

A Review On Data Anonymization Technique For Data Publishing

[...]

Neha V. Mogre, Girish S. Agarwal, Pragati Patil

28 Dec 2012-International journal of engineering research and technology

Journal Article•DOI•

Clustering-oriented privacy-preserving data publishing

[...]

Weiwei Ni¹, Zhihong Chong¹•Institutions (1)

Southeast University¹

01 Nov 2012-Knowledge Based Systems

TL;DR: A mixed mode data obfuscation method AENDO is proposed, which provides a tradeoff strategy from a novel view on maintaining data utility and privacy protection and delivers better anti-inferring effect compared with RBT and NeNDS.

...read moreread less

Abstract: Privacy-preserving data publishing has attracted considerable research interests in recent years. One of the problems in such practices is how to trade-off between data utility and privacy protection. This problem heavily deteriorates when the published data are used to do cluster analysis; clustering demands differences between singles for grouping while privacy preserving aims to hide single identifications. In this paper, a mixed mode data obfuscation method AENDO is proposed, which provides a tradeoff strategy from a novel view. The underlying principle is to keep nearest neighborhood structures of data points while data are obfuscated. In particular, for each data point, AENDO differentiates its attributes into neighboring dispersed attributes and neighboring concentrated ones. Furthermore, pertinent statistical data substitution and data swapping strategies are applied to these attributes, respectively. An extensive set of experiments on UCI data sets are provided to assess the effectiveness of our solution, including comparing AENDO with RBT which is one of the best methods on maintaining data usability for clustering. Our results demonstrate that AENDO behaves similarly with RBT on maintaining data utility for clustering, while it outperforms NeNDS by a factor of approximate 10%. Meanwhile, it delivers better anti-inferring effect compared with RBT and NeNDS.

...read moreread less

Journal Article•DOI•

Privacy Preserving Data Publishing: Current Status and New Directions

[...]

Junqiang Liu

01 Jan 2012-Information Technology Journal

Proceedings Article•DOI•

A generalization-based approach for personalized privacy preservation in trajectory data publishing

[...]

Elahe Ghasemi Komishani¹, Mahdi Abadi¹•Institutions (1)

Tarbiat Modares University¹

01 Nov 2012

TL;DR: A novel approach for privacy preservation in trajectory data publishing based on the concept of personalized privacy is presented and the results of experiments show that the proposed approach achieve the conflicting goals of data utility and data privacy in accordance with the privacy requirements of moving objects.

...read moreread less

Abstract: Trajectory data are becoming more popular due to the rapid development of mobile devices and the widespread use of location-based services. They often provide useful information that can be used for data mining tasks. However, a trajectory database may contain sensitive attributes that are associated with trajectory data. Therefore, improper publishing of the trajectory database could put the privacy of moving objects at risk. Removing identifiers from the trajectory database before the public release, is not effective against privacy attacks, especially, when the adversary employs some background knowledge. The existing approaches for privacy preservation in trajectory data publishing apply the same amount of privacy preservation for all moving objects, without regard to their privacy requirements. The consequence is that some moving objects may be offered insufficient privacy preservation, while some others may not need high privacy protection. In this paper, we address this issue and present a novel approach for privacy preservation in trajectory data publishing based on the concept of personalized privacy. It consists of two main steps: (1) identifying primary critical trajectory data records and generalizing sensitive attributes according to them, and (2) identifying remaining critical trajectory data records and eliminating moving points with minimum information loss. The results of experiments on a trajectory dataset show that our proposed approach achieve the conflicting goals of data utility and data privacy in accordance with the privacy requirements of moving objects.

...read moreread less

Journal Article•DOI•

L Diversity on K-Anonymity with External Database for improving Privacy Preserving Data Publishing

[...]

P. MayilVelKumar, Madurakavi Karthikeyan

25 Sep 2012-International Journal of Computer Applications

TL;DR: L -diversity concept in k-anonymity applied external data set is applied and it shows the l diversity reduces the data losses in k anonymity data sets when data point moves any size.

...read moreread less

Abstract: The data must be secure and measurable at the public when it releases to view. The data table produces personal information and sensitive values. They are maintained for secrecy, the anonymity is the best method to protect the data. There are many anonymity methods to protect the data. k -anonymity is one method to protect the data. The problem in kanonymity method is if data set increases then utility decreases. Also kanonymity data is possible to many attacks like Homogeneity Attack, Background Knowledge Attack. The l diversity is another method to protect the data. Main advantage of ldiversity is the data set increases then the data utility also increases. Based on above advantage, we applied l -diversity concept in k-anonymity applied external data set and we evaluate high efficiency dataset. It shows the l diversity reduces the data losses in k anonymity data sets when data point moves any size.

...read moreread less

Journal Article•DOI•

Improving access to biodiversity data for, and from, EIAs – a data publishing framework built to global standards

[...]

Nicholas King¹, Asha Rajvanshi², Selwyn Willoughby, Ruben Roberts, Vinod B. Mathur², Mandy Cadman, Vishwas Chavan¹ - Show less +3 more•Institutions (2)

Global Biodiversity Information Facility¹, Wildlife Institute of India²

11 Sep 2012-Impact Assessment and Project Appraisal

TL;DR: In this article, an EIA Biodiversity Data Publishing Framework (EIABDF) is proposed to publish primary biodiversity data obtained during environmental impact assessments (EIAs) for other uses following the completion of the EIA.

...read moreread less

Abstract: Biodiversity information obtained during environmental impact assessments (EIAs) is rarely accessible for other uses following the completion of the EIA. Such data need to be made readily accessible; adding them to publicly accessible national datasets is important if biodiversity science, conservation and future decisions based on environmental assessment are to benefit from new biodiversity data and improved biodiversity data coverage. An ‘EIA Biodiversity Data Publishing Framework’, based on the Global Biodiversity Information Facility (GBIF) global standards, is thus proposed to meet this need. This paper outlines the GBIF-catalysed initiative to establish such an operational framework for uptake by the EIA community, as well as options that are available for data publishing in the absence of such a framework. It reviews the current state of accessibility and management of the primary biodiversity data associated with EIA studies, and highlights the urgent need for uptake of a range of data-publishing...

...read moreread less

Proceedings Article•DOI•

Privacy Preserving Data Publishing for Recommender System

[...]

Xiaoqiang Chen¹, Vincent Huang²•Institutions (2)

Uppsala University¹, Ericsson²

16 Jul 2012

TL;DR: A new clustering based k-anonymity heuristic named Bisecting K-Gather (BKG) is developed and it is proven to be efficient and accurate and to support customized user privacy assignments.

...read moreread less

Abstract: Driven by mutual benefits, exchange and publication of data among various parties is an inevitable trend However, released data often contains sensitive user information thus direct publication violates individual privacy Among many privacy models, k-anonymity framework is popular and well-studied, it protects information by constructing groups of anonymous records such that each record in the table released is covered by no fewer than k-1 other records In this paper, we first investigate different privacy preserving technologies and then focus on achieving k-anonymity for large scale and sparse databases, especially recommender systems We present a general process for anonymization of large scale database A preprocessing phase strategically extracts preference matrix from original data by Singular Value Decomposition (SVD) and eliminates the high dimensionality and sparsity problem We developed a new clustering based k-anonymity heuristic named Bisecting K-Gather (BKG) and it is proven to be efficient and accurate To support customized user privacy assignments, we also proposed a new concept called customized k-anonymity along with a corresponding algorithm (BOKG) We use MovieLens database to assess our algorithms The results show that we can efficiently release anonymized data without compromising the utility of data

...read moreread less

Book Chapter•DOI•

Data privacy against composition attack

[...]

Muzammil M. Baig¹, Jiuyong Li¹, Jixue Liu¹, Xiaofeng Ding¹, Hua Wang² - Show less +1 more•Institutions (2)

University of South Australia¹, University of Southern Queensland²

15 Apr 2012

TL;DR: A new generalization principle (ρ,α)-anonymization is proposed that effectively overcomes the privacy concerns for multiple independent data publishing and an effective algorithm is developed to achieve it.

...read moreread less

Abstract: Data anonymization has become a major technique in privacy preserving data publishing. Many methods have been proposed to anonymize one dataset and a series of datasets of a data holder. However, no method has been proposed for the anonymization scenario of multiple independent data publishing. A data holder publishes a dataset, which contains overlapping population with other datasets published by other independent data holders. No existing methods are able to protect privacy in such multiple independent data publishing. In this paper we propose a new generalization principle (ρ,α)-anonymization that effectively overcomes the privacy concerns for multiple independent data publishing. We also develop an effective algorithm to achieve the (ρ,α)-anonymization. We experimentally show that the proposed algorithm anonymizes data to satisfy the privacy requirement and preserves high quality data utility.

...read moreread less

DOI•

OLAP2DataCube: an ontowiki plug-in for statistical data publishing

[...]

Percy E. Rivera Salast, Michael Martin¹, Fernando Maia Da Mota, Sören Auer¹, Karin Breitman, Marco A. Casanova - Show less +2 more•Institutions (1)

Leipzig University¹

03 Jun 2012

TL;DR: An Ontowiki plugin that extracts and publishes statistical data in RDF is introduced and a comprehensive use case reporting on the extraction and publishing on the Web of statistical data about 10 years of Brazilian government is illustrated.

...read moreread less

Abstract: Statistical data is one of the most important sources of information, relevant for large numbers of stakeholders in the governmental, scientific and business domains alike. In this article, we introduce an Ontowiki plugin that extracts and publishes statistical data in RDF. We illustrate the plugin with a comprehensive use case reporting on the extraction and publishing on the Web of statistical data about 10 years of Brazilian government.

...read moreread less

Journal Article•DOI•

Improved linked data interaction through an automatic information architecture

[...]

Josep Maria Brunetti, Rosa Gil, Juan Manuel Gimeno, Roberto García

01 Jul 2012-International Journal of Software Engineering and Knowledge Engineering

TL;DR: Rhizomer is a data publishing tool whose interface provides a set of components borrowed from Information Architecture that facilitate getting an insight of the dataset at hand.

...read moreread less

Abstract: Thanks to Open Data initiatives the amount of data available on the Web is rapidly increasing. Unfortunately, most of these initiatives only publish raw tabular data, which makes its analysis and reuse very difficult. Linked Data principles allow for a more sophisticated approach by making explicit both the structure and semantics of the data. However, from the user experience viewpoint, published datasets continue to be monolithic files which are completely opaque or difficult to explore by making complex semantic queries. Our objective is to facilitate the user to grasp what kind of entities are in the dataset, how they are interrelated, which are their main properties and values, etc. Rhizomer is a data publishing tool whose interface provides a set of components borrowed from Information Architecture (IA) that facilitate getting an insight of the dataset at hand. Rhizomer automatically generates navigation menus and facets based on the kinds of things in the dataset and how they are described through metadata properties and values. This tool is currently being evaluated with end users that discover a whole new perspective of the Web of Data.

...read moreread less

Journal Article•DOI•

Publishing online identification keys in the form of scholarly papers.

[...]

Lyubomir Penev, Pierfilippo Cerretti, Hans-Peter Tschorsnig, Massimo Lopresti, Filippo Di Giovanni, Teodor Georgiev, Pavel Stoev, Terry L. Erwin - Show less +4 more

07 Apr 2012-ZooKeys

TL;DR: A method to incentive authors of online keys to publishing these through the already established model of “Data Paper” is proposed, and the main features of an interactive key to the Palaearctic genera of the family Tachinidae (Diptera) are described.

...read moreread less

Abstract: One of the main deficiencies in publishing and dissemination of online interactive identification keys produced through various software packages, such as DELTA, Lucid, MX and others, is the lack of a permanent scientific record and a proper citation mechanism of these keys. In two earlier papers, we have discussed some models for publishing raw data underpinning interactive keys (Penev et al. 2009; Sharkey et al. 2009). Here we propose a method to incentive authors of online keys to publishing these through the already established model of “Data Paper” (Chavan and Penev 2011, examples: Narwade et al. 2011, Van Landuyt et al. 2012, Schindel et al. 2011, Pierrat et al. 2012, see also Pensoft's Data Publishing Policies and Guidelines). For clarity, we propose a new article type for this format, “Online Identification Key”, to distinguish it from the “Data Paper” in the narrow sense. The model is demonstrated through an exemplar paper of Cerretti et al. (2012) in the current issue of ZooKeys. The paper describes the main features of an interactive key to the Palaearctic genera of the family Tachinidae (Diptera) implemented as an original web application. The authors discuss briefly the advantages of these tools for both taxonomists and general users, and point out the need of shared, standardized protocols for taxa descriptions to keep matrix-based interactive keys easily and timely updated. The format of the “Online Identification Key” paper largely resembles the structure of Data Papers proposed by Chavan and Penev (2011) on the basis of the Ecological Metadata Language (EML) and developed further in Pensoft's Data Publishing Policies and Guidelines. An “Online Identification Key” paper should focus on a formal description of the technical details and content of an online key that is what is often called “metadata”. For example, an “Online Identification Key” paper has a title, author(s), abstract and keywords like any other scientific paper; it should also include in the first place: the URL of an open access version of the online key and possibly also the data underpinning the key, information on the history of and participants in the project, the software used and its technical advantages and constraints, licenses for use, taxonomic and geographic coverage, lists and descriptions of the morphological characters used, and literature references. In contrast to conventional data papers, the “Online Identification Key” papers do not require compulsory publication of raw data files underpinning a key, although such a practice is highly recommended and encouraged. There might be several obstacles in publishing raw data that can be due to copyright issues on either data or source codes. It is mandatory, however, for the online keys published in this way to be freely available for use to anyone, by just clicking the URL address published in the paper. The publication of an online key in the form of a scholarly article is a pragmatic compromise between the dynamic structure of the internet and the static character of scientific articles. The author(s) of the key will be able to continuously update the product, to the benefit of its users. At the same time, the users will have available a citation mechanism for the online key, identical to that used for any other scientific article, to properly credit the authors of the key.

...read moreread less

Journal Article•DOI•

Efficient rescue of threatened biodiversity data using reBiND workflows

[...]

Anton Güntsch¹, D. Fichtmüller¹, Agnes Kirchhoff¹, Walter G. Berendsohn¹•Institutions (1)

Free University of Berlin¹

16 Oct 2012-Plant Biosystems

TL;DR: The reBiND project aims to develop an efficient and well-documented workflow for rescuing biodiversity data sets that is flexible enough to be transferred to other data types and domains.

...read moreread less

Abstract: Biodiversity data generated in the context of research projects often lack a strategy for long-term preservation and availability, and are therefore at risk of becoming outdated and finally lost. The reBiND project aims to develop an efficient and well-documented workflow for rescuing such data sets. The workflow consists of phases for data transformation into contemporary standards, data validation, storage in a native XML database, and data publishing in international biodiversity networks. It has been developed and tested using the example of collection and observational data but is flexible enough to be transferred to other data types and domains.

...read moreread less

Journal Article•DOI•

A Novel approach for Privacy Preserving in Medical Data Mining using Sensitivity based anonymity

[...]

Bhavana Abad, S A Kinariwala

31 Mar 2012-International Journal of Computer Applications

TL;DR: Experimental results on the Adult Database show the proposed methods not only can improve the accuracy of the publishing data, but also can preserve privacy General Terms Privacy preserving in data mining.

...read moreread less

Abstract: K-anonymity is one of the easy and efficient technique to achieve privacy preserving for sensitive data in many data publishing applications. In k-anonymity techniques, all tuples of releasing database are generalized to make it anonymize which lead to reduce the data utility and more information loss of publishing table. This paper firstly proposes a Sensitivity Based Tuple Anonymity Method. In this method first we consider the sensitivity of values in sensitive attribute and then only tuples having sensitive values are generalized, and the other tuples can be directly published. Experiment results on the Adult Database show the proposed methods not only can improve the accuracy of the publishing data, but also can preserve privacy General Terms Privacy preserving in data mining.

...read moreread less

Posted Content•

Small Count Privacy and Large Count Utility in Data Publishing

[...]

Ada Wai-Chee Fu¹, Jia Wang¹, Ke Wang², Raymond Chi-Wing Wong³•Institutions (3)

The Chinese University of Hong Kong¹, Simon Fraser University², Hong Kong University of Science and Technology³

15 Feb 2012-arXiv: Databases

TL;DR: The method is developed with another goal which is to provide differential privacy guarantee, and for that it introduces a more refined form of differential privacy to deal with certain practical issues.

...read moreread less

Abstract: While the introduction of differential privacy has been a major breakthrough in the study of privacy preserving data publication, some recent work has pointed out a number of cases where it is not possible to limit inference about individuals. The dilemma that is intrinsic in the problem is the simultaneous requirement of data utility in the published data. Differential privacy does not aim to protect information about an individual that can be uncovered even without the participation of the individual. However, this lack of coverage may violate the principle of individual privacy. Here we propose a solution by providing protection to sensitive information, by which we refer to the answers for aggregate queries with small counts. Previous works based on $\ell$-diversity can be seen as providing a special form of this kind of protection. Our method is developed with another goal which is to provide differential privacy guarantee, and for that we introduce a more refined form of differential privacy to deal with certain practical issues. Our empirical studies show that our method can preserve better utilities than a number of state-of-the-art methods although these methods do not provide the protections that we provide.

...read moreread less

Journal Article•DOI•

'Nature Conservation': a new dimension in Open Access publishing bridging science and application

[...]

Klaus Henle¹, Sandra Bell², Lluís Brotons, Jean Clobert³, Douglas Evans, Christoph Görg¹, Małgorzata Grodzińska-Jurczak⁴, Bernd Gruber⁵, Yrjö Haila⁶, Pierre-Yves Henry, Andreas Huth¹, Romain Julliard, Petr Keil⁷, Michael Kleyer⁸, D. Johan Kotze⁹, William E. Kunin¹⁰, Szabolcs Lengyel¹¹, Yu-Pin Lin¹², Adeline Loyau³, Gary W. Luck¹³, William E. Magnusson¹⁴, Chris Margules¹⁵, Yiannis G. Matsinos¹⁴, Peter H. May¹⁶, Isabel Sousa-Pinto¹⁷, Hugh P. Possingham¹⁸, Simon G. Potts¹⁹, Irene Ring¹, James S. Pryke²⁰, Michael J. Samways²⁰, Denis A. Saunders²¹, Dirk S. Schmeller³, Jukka Similä²², Simone Sommer²³, Ingolf Steffan-Dewenter²⁴, Pavel Stoev, Martin T. Sykes²⁵, Béla Tóthmérész¹¹, Joseph Tzanopoulos²⁶, Rita Yam¹², Lyubomir Penev - Show less +37 more•Institutions (26)

14 Mar 2012-Nature and Conservation

TL;DR: The focus, scope and policies of the inaugural issue of Nature Conservation, a new open access, peer-reviewed journal bridging natural sciences, social sciences and hands-on applications in conservation management, are presented.

...read moreread less

Abstract: This Editorial presents the focus, scope and policies of the inaugural issue of Nature Conservation, a new open access, peer-reviewed journal bridging natural sciences, social sciences and hands-on applications in conservation management. The journal covers all aspects of nature conservation and aims particularly at facilitating better interaction between scientists and practitioners. The journal will impose no restrictions on manuscript size or the use of colour. We will use an XML-based editorial workflow and several cutting-edge innovations in publishing and information dissemination. These include semantic mark-up of, and enhancements to published text, data, and extensive cross-linking within the journal and to external sources. We believe the journal will make an important contribution to better linking science and practice, offers rapid, peer-reviewed and flexible publication for authors and unrestricted access to content.

...read moreread less

Patent•

Web publishing method of quantified remote sensing data

[...]

Feng Zhang, Du Zhenhong, Liu Renyi, Xiantao Li, Lei Fang, Minjie Cao - Show less +2 more

02 May 2012

TL;DR: In this paper, a web publishing method of quantified remote sensing data is proposed, which includes the following steps of confirming the data structure commonly published by a remote sensing visual color blending information and the real physical magnitude quantity.

...read moreread less

Abstract: The invention discloses a web publishing method of quantified remote sensing data. The method comprises the following steps of: 1), confirming the data structure commonly published by a remote sensing data visual color blending information and the real physical magnitude quantity; 2), confirming the source data organization structure published by the quantified remote sensing data; 3), defining the web publishing interface of the data, wherein the interface is used for obtaining the service grade element data, the remote sensing real value data element data information, the remote sensing real value data, the remote sensing data color blending information and the remote sensing data counting information; 4), expressing the principle on the basis of the layered and blocked space image via transforming the remote sensing real value data to the special rendered color blending scheme; performing the visual express on the remote sensing real value data via the method of texture mapping; and 5), performing the local caching management on the remote sensing real value data so as to improve the data displaying and processing efficiency of the remote sensing image data publishing application platform. The invention realizes the quantified web publishing of the remote sensing data and improves the data displaying and processing efficiency.

...read moreread less