Journal ArticleDOI

Privacy preserving data mining: a noise addition framework using a novel clustering technique

TLDR
A framework is presented that uses a few novel noise addition techniques to protect individual privacy while maintaining high data quality, together with a security analysis for measuring the security level of a data set.
Abstract
During the whole process of data mining (from data collection to knowledge discovery), various sensitive data are exposed to several parties, including data collectors, cleaners, preprocessors, miners and decision makers. The exposure of sensitive data can potentially lead to a breach of individual privacy. Therefore, many privacy preserving techniques have been proposed recently. In this paper we present a framework that uses a few novel noise addition techniques for protecting individual privacy while maintaining high data quality. We add noise to all attributes, both numerical and categorical. We present a novel technique for clustering categorical values and use it for noise addition. A security analysis is also presented for measuring the security level of a data set.
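
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of adding noise to both numerical and categorical attributes; it is not the paper's method. It assumes zero-mean Gaussian noise scaled by the column's standard deviation for numerical values and, for categorical values, a random swap to another value in the same cluster. The function names, the `noise_scale` and `swap_prob` parameters, and the hand-written clusters are all hypothetical; the paper's own clustering technique is not reproduced here.

```python
import random


def perturb_numeric(values, noise_scale=0.1, rng=None):
    """Add zero-mean Gaussian noise to each value (assumed noise model).

    The noise standard deviation is noise_scale times the column's
    population standard deviation.
    """
    rng = rng or random.Random()
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v + rng.gauss(0.0, noise_scale * std) for v in values]


def perturb_categorical(values, clusters, swap_prob=0.2, rng=None):
    """With probability swap_prob, replace a value by another value
    from the same cluster (clusters are supplied by the caller here).

    Values that belong to no cluster, or to a singleton cluster,
    are left unchanged.
    """
    rng = rng or random.Random()
    cluster_of = {v: list(c) for c in clusters for v in c}
    noisy = []
    for v in values:
        peers = [p for p in cluster_of.get(v, []) if p != v]
        if peers and rng.random() < swap_prob:
            noisy.append(rng.choice(peers))
        else:
            noisy.append(v)
    return noisy


if __name__ == "__main__":
    rng = random.Random(42)
    ages = [23, 35, 41, 58]
    jobs = ["nurse", "teacher", "doctor", "farmer"]
    # Hypothetical clusters of similar categorical values.
    job_clusters = [["nurse", "doctor"], ["teacher", "lecturer"]]
    print(perturb_numeric(ages, noise_scale=0.1, rng=rng))
    print(perturb_categorical(jobs, job_clusters, swap_prob=0.5, rng=rng))
```

Restricting swaps to values within one cluster is one plausible way to keep perturbed records close to the originals, which is the stated goal of preserving data quality while adding noise.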


Citations
Journal ArticleDOI

A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data

TL;DR: A new measure based on co-occurrence of values to evaluate the dissimilarity between data objects and prototypes of clusters is employed, which takes into account the significance of different attributes towards the clustering process.
Journal ArticleDOI

Missing value imputation using decision trees and decision forests by splitting and merging records: Two novel techniques

TL;DR: Two novel techniques for the imputation of both categorical and numerical missing values are presented, using decision trees and forests to identify horizontal segments of a data set where the records belonging to a segment have higher similarity and attribute correlations.
Journal ArticleDOI

A comprehensive review on privacy preserving data mining

TL;DR: A panoramic overview of the published literature is provided, offering a new perspective and a systematic interpretation through meticulous organization into subcategories, which reveals past developments, present research challenges, future trends, and gaps and weaknesses.
Proceedings Article

A decision tree-based missing value imputation technique for data pre-processing

TL;DR: An efficient missing value imputation technique called DMI is presented, which makes use of a decision tree and the expectation maximization (EM) algorithm; the authors argue that the correlations among attributes within a horizontal partition of a data set can be higher than the correlations over the whole data set.
Journal ArticleDOI

A sanitization approach for hiding sensitive itemsets based on particle swarm optimization

TL;DR: A particle swarm optimization (PSO)-based algorithm called PSO2DT is developed to hide sensitive itemsets while minimizing the side effects of the sanitization process; it performs better than the Greedy algorithm and GA-based algorithms in terms of runtime, fail-to-be-hidden, not-to-be-hidden, and database similarity.
References
Journal ArticleDOI

Privacy-preserving data mining

TL;DR: This work considers the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed and proposes a novel reconstruction procedure to accurately estimate the distribution of original data values.
Journal ArticleDOI

Improved use of continuous attributes in C4.5

TL;DR: A reported weakness of C4.5 in domains with continuous attributes is addressed by modifying the formation and evaluation of tests on continuous attributes with an MDL-inspired penalty, leading to smaller decision trees with higher predictive accuracies.
Proceedings ArticleDOI

Privacy preserving mining of association rules

TL;DR: A class of randomization operators is proposed that is much more effective than uniform randomization in limiting privacy breaches, and formulae for an unbiased support estimator and its variance are derived.
Proceedings ArticleDOI

CACTUS—clustering categorical data using summaries

TL;DR: This paper introduces a novel formalization of a cluster for categorical attributes by generalizing a definition of a cluster for numerical attributes and describes a very fast summarization-based algorithm called CACTUS that discovers exactly such clusters in the data.
Journal ArticleDOI

Association rule hiding

TL;DR: This work investigates confidentiality issues of a broad category of rules, the association rules, and presents three strategies and five algorithms for hiding a group of association rules that is characterized as sensitive.