Privacy preserving data mining: a noise addition framework using a novel clustering technique

doi:10.1016/J.KNOSYS.2011.05.011

Journal Article

Zahidul Islam, +1 more

01 Dec 2011

Knowledge Based Systems

Vol. 24, Iss. 8, pp. 1214-1223

TL;DR: A framework is presented that uses a few novel noise addition techniques to protect individual privacy while maintaining high data quality, together with a security analysis for measuring the security level of a data set.

Abstract:

Throughout the entire data mining process (from data collection to knowledge discovery), sensitive data are exposed to several parties, including data collectors, cleaners, preprocessors, miners and decision makers. This exposure can lead to breaches of individual privacy, which is why many privacy preserving techniques have been proposed recently. In this paper we present a framework that uses a few novel noise addition techniques to protect individual privacy while maintaining high data quality. We add noise to all attributes, both numerical and categorical. We also present a novel technique for clustering categorical values and use it for noise addition. Finally, we present a security analysis for measuring the security level of a data set.
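The abstract does not spell out the noise addition techniques themselves. As a rough illustration of the general idea only, the following Python sketch perturbs a numerical attribute with zero-mean Gaussian noise and perturbs a categorical attribute by swapping values within clusters of similar values. Everything here is an assumption: the function names (`add_numerical_noise`, `cluster_categorical_values`, `add_categorical_noise`), the noise scale (a fraction of the attribute's standard deviation) and the naive clustering rule (grouping values by their majority class label) are hypothetical stand-ins, not the framework or the clustering technique proposed in the paper.

```python
import random
import statistics


def add_numerical_noise(values, noise_ratio=0.1, rng=None):
    """Add zero-mean Gaussian noise to a numerical attribute.

    ASSUMPTION (hypothetical): the noise standard deviation is
    noise_ratio times the attribute's population standard deviation.
    """
    rng = rng or random.Random()
    sigma = noise_ratio * statistics.pstdev(values)
    return [v + rng.gauss(0.0, sigma) for v in values]


def cluster_categorical_values(records, attr, class_attr):
    """Group the values of a categorical attribute by majority class label.

    ASSUMPTION (hypothetical): a naive stand-in for clustering categorical
    values; it is NOT the clustering technique proposed in the paper.
    """
    label_counts = {}  # value -> {class label -> count}
    for r in records:
        per_value = label_counts.setdefault(r[attr], {})
        per_value[r[class_attr]] = per_value.get(r[class_attr], 0) + 1

    clusters = {}  # majority class label -> list of values
    for value, counts in label_counts.items():
        majority = max(counts, key=counts.get)  # ties: first label seen wins
        clusters.setdefault(majority, []).append(value)
    return list(clusters.values())


def add_categorical_noise(records, attr, clusters, p=0.1, rng=None):
    """With probability p, replace a value by another value of its cluster.

    Returns new records; the input records are not modified.
    """
    rng = rng or random.Random()
    cluster_of = {v: c for c in clusters for v in c}
    noisy = []
    for r in records:
        r = dict(r)  # copy so the original record stays intact
        peers = [v for v in cluster_of[r[attr]] if v != r[attr]]
        if peers and rng.random() < p:
            r[attr] = rng.choice(peers)
        noisy.append(r)
    return noisy
```

Restricting swaps to a cluster is the design point being illustrated: a perturbed categorical value is replaced only by a value the clustering deems similar, so patterns tied to the cluster (here, the class label) are disturbed less than by swapping in an arbitrary value. A complete framework would also need a principled clustering method and a way to quantify the resulting security level.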

Citations

Journal Article

A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data

Jinchao Ji, +4 more

01 Jun 2012

Knowledge Based Systems

TL;DR: A new measure based on co-occurrence of values to evaluate the dissimilarity between data objects and prototypes of clusters is employed, which takes into account the significance of different attributes towards the clustering process.

Journal Article

Missing value imputation using decision trees and decision forests by splitting and merging records: Two novel techniques

Md. Geaur Rahman, +1 more

01 Nov 2013

Knowledge Based Systems

TL;DR: Two novel techniques for the imputation of both categorical and numerical missing values are presented, using decision trees and forests to identify horizontal segments of a data set where the records belonging to a segment have higher similarity and attribute correlations.

Journal Article

A comprehensive review on privacy preserving data mining

Yousra Abdul Alsahib S. Aldeen, +3 more

12 Nov 2015

SpringerPlus

TL;DR: A panoramic overview from a new perspective, together with a systematic interpretation of published literature organized into subcategories, is provided; it reveals past developments, present research challenges, future trends, gaps and weaknesses.

Proceedings Article

A decision tree-based missing value imputation technique for data pre-processing

Geaur Rahman, +1 more

TL;DR: An efficient missing value imputation technique called DMI is proposed, which uses a decision tree and the expectation maximization (EM) algorithm, based on the argument that the correlations among attributes within a horizontal partition of a data set can be higher than the correlations over the whole data set.

Journal Article

A sanitization approach for hiding sensitive itemsets based on particle swarm optimization

Jerry Chun-Wei Lin, +5 more

01 Aug 2016

Engineering Applications of Artificial I...

TL;DR: A particle swarm optimization (PSO)-based algorithm called PSO2DT is developed to hide sensitive itemsets while minimizing the side effects of the sanitization process; it performs better than the Greedy algorithm and GA-based algorithms in terms of runtime, the fail-to-be-hidden and not-to-be-hidden side effects, and database similarity.

References

Journal Article

Privacy-preserving data mining

Rakesh Agrawal, +1 more

TL;DR: This work considers the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed and proposes a novel reconstruction procedure to accurately estimate the distribution of original data values.

Journal Article

Improved use of continuous attributes in C4.5

J. R. Quinlan

01 Jan 1996

Journal of Artificial Intelligence Resea...

TL;DR: A reported weakness of C4.5 in domains with continuous attributes is addressed by modifying the formation and evaluation of tests on continuous attributes with an MDL-inspired penalty, leading to smaller decision trees with higher predictive accuracies.

Proceedings Article

Privacy preserving mining of association rules

Alexandre V. Evfimievski, +3 more

TL;DR: A class of randomization operators is proposed that is much more effective than uniform randomization in limiting privacy breaches, and formulae for an unbiased support estimator and its variance are derived.

Proceedings Article

CACTUS—clustering categorical data using summaries

Venkatesh Ganti, +2 more

TL;DR: This paper introduces a novel formalization of a cluster for categorical attributes by generalizing a definition of a cluster for numerical attributes, and describes a very fast summarization-based algorithm called CACTUS that discovers exactly such clusters in the data.

Journal Article

Association rule hiding

Vassilios S. Verykios, +4 more

01 Apr 2004

IEEE Transactions on Knowledge and Data ...

TL;DR: This work investigates confidentiality issues of a broad category of rules, the association rules, and presents three strategies and five algorithms for hiding a group of association rules characterized as sensitive.

Related Papers (5)

Privacy-preserving data mining

L-diversity: Privacy beyond k-anonymity

k -anonymity: a model for protecting privacy

Data Mining: Concepts and Techniques

Calibrating noise to sensitivity in private data analysis