Journal Article
Privacy preserving data mining: a noise addition framework using a novel clustering technique
Zahidul Islam, Ljiljana Brankovic
TL;DR: A framework that uses a few novel noise addition techniques to protect individual privacy while maintaining high data quality is presented, together with a security analysis for measuring the security level of a data set.
Abstract:
During the whole process of data mining (from data collection to knowledge discovery), various sensitive data get exposed to several parties, including data collectors, cleaners, preprocessors, miners and decision makers. The exposure of sensitive data can potentially lead to a breach of individual privacy. Therefore, many privacy preserving techniques have been proposed recently. In this paper we present a framework that uses a few novel noise addition techniques for protecting individual privacy while maintaining high data quality. We add noise to all attributes, both numerical and categorical. We present a novel technique for clustering categorical values and use it for noise addition. A security analysis is also presented for measuring the security level of a data set.
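The abstract does not spell out the noise addition procedure, so the following is only a minimal Python sketch of the general idea under stated assumptions. Numerical values receive zero-mean Gaussian noise whose standard deviation is a chosen fraction (`noise_ratio`) of the attribute's standard deviation, and a categorical value may be swapped for another value from the same cluster. The clusters are supplied by hand here; the paper's own categorical clustering technique is not reproduced, and all function and parameter names are hypothetical.

```python
import random
import statistics


def add_numeric_noise(values, noise_ratio, rng):
    """Add zero-mean Gaussian noise with std = noise_ratio * population std of values."""
    sigma = noise_ratio * statistics.pstdev(values)
    return [v + rng.gauss(0.0, sigma) for v in values]


def add_categorical_noise(values, clusters, change_prob, rng):
    """With probability change_prob, replace a value by another member of its cluster.

    `clusters` is an assumed, hand-made grouping of similar categorical values;
    values outside every cluster (or alone in one) are left unchanged.
    """
    cluster_of = {v: c for c in clusters for v in c}
    noisy = []
    for v in values:
        peers = [u for u in cluster_of.get(v, [v]) if u != v]
        if peers and rng.random() < change_prob:
            noisy.append(rng.choice(peers))
        else:
            noisy.append(v)
    return noisy


rng = random.Random(42)
ages = [23, 35, 41, 52, 60, 29]
print(add_numeric_noise(ages, 0.1, rng))

# Hypothetical clusters of "similar" occupations.
clusters = [["Nurse", "Doctor"], ["Engineer", "Programmer"]]
jobs = ["Nurse", "Engineer", "Doctor", "Programmer", "Farmer"]
print(add_categorical_noise(jobs, clusters, 0.5, rng))
```

The intent of restricting swaps to a cluster is that a perturbed categorical value stays similar to the original one, which is one way noise can be added while limiting the loss of data quality.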
Citations
Journal Article
A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data
TL;DR: A new measure based on the co-occurrence of values is employed to evaluate the dissimilarity between data objects and cluster prototypes; it takes into account the significance of different attributes to the clustering process.
Journal ArticleDOI
Missing value imputation using decision trees and decision forests by splitting and merging records: Two novel techniques
Md. Geaur Rahman, Zahidul Islam
TL;DR: Two novel techniques for the imputation of both categorical and numerical missing values are presented, using decision trees and forests to identify horizontal segments of a data set where the records belonging to a segment have higher similarity and attribute correlations.
Journal ArticleDOI
A comprehensive review on privacy preserving data mining
Yousra Abdul Alsahib S. Aldeen, Mazleena Salleh, Mohammad Abdur Razzaque
TL;DR: A panoramic overview offering a new perspective and a systematic interpretation of the published literature, meticulously organized into subcategories, is provided; it reveals past developments, present research challenges, future trends, gaps and weaknesses.
Proceedings Article
A decision tree-based missing value imputation technique for data pre-processing
Geaur Rahman, Zahidul Islam
TL;DR: An efficient missing value imputation technique called DMI is proposed, which makes use of a decision tree and the expectation maximization (EM) algorithm; it rests on the argument that the correlations among attributes within a horizontal partition of a data set can be higher than the correlations over the whole data set.
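DMI's central argument, that attributes correlate more strongly within a horizontal segment than across the whole data set, can be illustrated with a much simpler stand-in. The Python sketch below assumes each record has already been assigned to a segment (the hypothetical `leaf` field, as if by a decision tree) and imputes a missing numerical value with the mean of its segment. It does not implement the EM step that DMI actually uses.

```python
from collections import defaultdict


def impute_by_segment(records, segment_key, target):
    """Fill missing (None) values of `target` with the mean of the record's segment."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        if r[target] is not None:
            sums[r[segment_key]] += r[target]
            counts[r[segment_key]] += 1
    # Fall back to the global mean for segments with no observed values.
    n = sum(counts.values())
    global_mean = sum(sums.values()) / n if n else None
    filled = []
    for r in records:
        r = dict(r)  # copy so the input records are not modified
        if r[target] is None:
            seg = r[segment_key]
            r[target] = sums[seg] / counts[seg] if counts[seg] else global_mean
        filled.append(r)
    return filled


# Hypothetical records already partitioned into two leaves.
records = [
    {"leaf": "A", "income": 30.0},
    {"leaf": "A", "income": 34.0},
    {"leaf": "A", "income": None},
    {"leaf": "B", "income": 90.0},
    {"leaf": "B", "income": None},
]
print(impute_by_segment(records, "leaf", "income"))
```

Here the two gaps are filled with 32.0 and 90.0; a single global mean would have filled both with about 51.3, far from either segment's values.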
Journal Article
A sanitization approach for hiding sensitive itemsets based on particle swarm optimization
Jerry Chun-Wei Lin, Qiankun Liu, Philippe Fournier-Viger, Tzung-Pei Hong, Miroslav Voznak, Justin Zhan
TL;DR: A particle swarm optimization (PSO)-based algorithm called PSO2DT is developed to hide sensitive itemsets while minimizing the side effects of the sanitization process; it performs better than the Greedy algorithm and GA-based algorithms in terms of runtime, the "fail to be hidden" and "not to be hidden" side effects, and database similarity.
References
Journal Article
Privacy-preserving data mining
TL;DR: This work considers the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed and proposes a novel reconstruction procedure to accurately estimate the distribution of original data values.
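The reconstruction procedure referred to here is an iterative Bayesian update of the estimated distribution of the original values. The Python sketch below is a discretized version under assumptions of our own: the noise is Gaussian with a known standard deviation `sigma`, the value range is represented by hand-picked bin midpoints, and a fixed number of iterations replaces any stopping criterion.

```python
import math
import random


def gaussian_pdf(y, sigma):
    """Density of N(0, sigma^2) at y."""
    return math.exp(-0.5 * (y / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def reconstruct(perturbed, midpoints, sigma, iterations=50):
    """Estimate P(X in each bin) from observations w = x + y, y ~ N(0, sigma^2)."""
    k = len(midpoints)
    f = [1.0 / k] * k  # uniform initial estimate
    for _ in range(iterations):
        new_f = [0.0] * k
        for w in perturbed:
            # Bayes' rule: posterior probability that w came from each bin.
            weights = [gaussian_pdf(w - a, sigma) * fa for a, fa in zip(midpoints, f)]
            total = sum(weights)
            for b in range(k):
                new_f[b] += weights[b] / total
        # Average the posteriors over all observations.
        f = [v / len(perturbed) for v in new_f]
    return f


rng = random.Random(7)
original = [rng.choice([20, 60]) for _ in range(500)]  # bimodal "true" data
sigma = 10.0
perturbed = [x + rng.gauss(0.0, sigma) for x in original]
midpoints = [10, 20, 30, 40, 50, 60, 70]
estimate = reconstruct(perturbed, midpoints, sigma)
print([round(p, 3) for p in estimate])
```

Only the distribution is recovered, not the individual values; the estimate should place most of its mass near 20 and 60 even though no single perturbed record reveals its original value.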
Journal Article
Improved use of continuous attributes in C4.5
TL;DR: A reported weakness of C4.5 in domains with continuous attributes is addressed by modifying the formation and evaluation of tests on continuous attributes with an MDL-inspired penalty, leading to smaller decision trees with higher predictive accuracies.
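To make the MDL-inspired penalty concrete, here is a simplified Python sketch under our own reading of it: the best binary threshold on a continuous attribute is chosen by information gain, and that gain is then reduced by log2(N - 1) / |D|, where N is the number of distinct values of the attribute and |D| the number of training cases. The rest of C4.5 (gain ratio, missing values, pruning) is omitted, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold_gain(values, labels):
    """Return (threshold, penalized gain) for the best test `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    distinct = sorted(set(values))
    best_t, best_gain = None, -math.inf
    # Candidate thresholds: every distinct value except the largest.
    for t in distinct[:-1]:
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    if best_t is None:
        return None, 0.0  # a constant attribute admits no split
    # Penalty grows with the number of candidate thresholds (N - 1).
    penalty = math.log2(len(distinct) - 1) / n
    return best_t, best_gain - penalty


t, g = best_threshold_gain([1, 2, 3, 4], ["a", "a", "b", "b"])
print(t, round(g, 4))
```

The penalty matters most when an attribute has many distinct values but few training cases, which is where an unpenalized search over thresholds is most likely to find a spuriously good split.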
Proceedings Article
Privacy preserving mining of association rules
TL;DR: A class of randomization operators is proposed that is much more effective than uniform randomization at limiting privacy breaches, and formulae for an unbiased support estimator and its variance are derived.
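The randomization operators in that work are more elaborate, but the way an unbiased support estimator arises can be shown with plain per-item randomized response, a different and simpler operator swapped in here for illustration. If each transaction reports an item's presence truthfully with probability p and flips it otherwise, the observed fraction is s_obs = p*s + (1 - p)*(1 - s), so s = (s_obs - (1 - p)) / (2p - 1) for p != 0.5. The Python sketch assumes this operator; the function names are hypothetical.

```python
import random


def randomize(transactions, item, p, rng):
    """Report each transaction's presence of `item` truthfully with probability p."""
    reports = []
    for t in transactions:
        present = item in t
        reports.append(present if rng.random() < p else not present)
    return reports


def estimate_support(reports, p):
    """Unbiased estimate of the true support from randomized reports (p != 0.5)."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)


rng = random.Random(3)
# True support of "milk" is 0.3.
transactions = [{"bread", "milk"}] * 300 + [{"beer"}] * 700
reports = randomize(transactions, "milk", 0.8, rng)
print(round(estimate_support(reports, 0.8), 3))
```

The closer p is to 0.5, the stronger the privacy protection but the larger the variance of the estimate, since the divisor 2p - 1 shrinks toward zero.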
Proceedings Article
CACTUS—clustering categorical data using summaries
TL;DR: This paper introduces a novel formalization of a cluster for categorical attributes by generalizing a definition of a cluster for numerical attributes, and describes a very fast summarization-based algorithm called CACTUS that discovers exactly such clusters in the data.
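CACTUS builds its clusters from summaries of how often pairs of values from different attributes co-occur. The Python sketch below computes only that inter-attribute summary, as we understand the definition: two values are strongly connected when their co-occurrence count exceeds `alpha` times the count expected if the attributes were independent and uniformly distributed, |D| / (|D_i| * |D_j|). Taking domain sizes from the observed values is an assumption, and the later cluster-extraction phases are not shown.

```python
from collections import Counter
from itertools import combinations


def strongly_connected_pairs(records, alpha):
    """Map each attribute pair (i, j) to its set of strongly connected value pairs."""
    n = len(records)
    num_attrs = len(records[0])
    # Assumption: attribute domains are the values actually observed.
    domains = [set(r[i] for r in records) for i in range(num_attrs)]
    result = {}
    for i, j in combinations(range(num_attrs), 2):
        counts = Counter((r[i], r[j]) for r in records)
        # Expected count under independence with uniform value frequencies.
        expected = n / (len(domains[i]) * len(domains[j]))
        result[(i, j)] = {pair for pair, c in counts.items() if c > alpha * expected}
    return result


records = [("a1", "b1", "c1")] * 4 + [("a2", "b2", "c2")] * 4 + [
    ("a1", "b2", "c1"),
    ("a2", "b1", "c2"),
]
for attrs, pairs in strongly_connected_pairs(records, alpha=1.5).items():
    print(attrs, sorted(pairs))
```

Because the summary only needs pairwise counts, it can be built in a single pass over the data, which is in line with the speed the TL;DR attributes to a summarization-based approach.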
Journal Article
Association rule hiding
TL;DR: This work investigates confidentiality issues of a broad category of rules, the association rules, and presents three strategies and five algorithms for hiding a group of association rules characterized as sensitive.
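One way to hide a sensitive rule is to lower the support of the itemset it is built from. The Python sketch below is a generic greedy heuristic of that kind, not a reproduction of any of the five algorithms. It removes one item (here simply the alphabetically smallest, a hypothetical choice) from supporting transactions, shortest first, until the itemset's support falls below `min_support`.

```python
def hide_itemset(transactions, sensitive, min_support):
    """Return a sanitized copy in which `sensitive` has support < min_support."""
    sensitive = frozenset(sensitive)
    db = [set(t) for t in transactions]  # copy; the input is left untouched
    n = len(db)
    # Supporting transactions, shortest first: a shorter transaction supports
    # fewer itemsets, so editing it disturbs fewer non-sensitive itemsets.
    supporting = sorted((t for t in db if sensitive <= t), key=len)
    victim = min(sensitive)  # hypothetical choice of which item to delete
    count = len(supporting)
    for t in supporting:
        if count / n < min_support:
            break
        t.discard(victim)
        count -= 1
    return db


transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "b", "d", "e"}, {"c"}]
sanitized = hide_itemset(transactions, {"a", "b"}, 0.5)
print(sanitized)
```

Every deletion is a side effect on the released data, so practical hiding algorithms choose the victim item and transactions more carefully than this sketch does.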