Anonymisation of Social Networks and Rough Set Approach
TL;DR: Social network data need to be anonymized before publication in order to prevent potential reidentification attacks and to limit privacy breaches in privacy preserving data mining.
Abstract: Scientific study of network data can reveal many important behaviors of the elements involved and social trends. It also provides insight for suitable changes in the social structure and roles of individuals in it. There is considerable evidence (HIPAA (2002) Health insurance portability and accountability act. Available online http://www.hhs.gov/ocr/hipaa; Lambert, J Off Stat 9:313–331, 1993; Xu (2006) Utility based anonymisation using local recoding. In: KDD’06, Philadelphia) indicating the value of social network data in shedding light on social behavior, health, and well-being of the general public. For this purpose, social network information needs to be published, either publicly or to a specialized group. However, depending upon the privacy model considered, this information may include sensitive data about individual participants in the social network that should not be disclosed. For this reason, social network data need to be anonymized before publication in order to prevent potential reidentification attacks. Data anonymization techniques are abundantly used in relational databases (Aggarwal et al. J Priv Technol, 2005; Backstrom et al. (2007) Wherefore art thou R3579X? Anonymized social networks, hidden patterns, and structural steganography. In: International world wide web conference (WWW). ACM, New York, pp 181–190; Bayardo and Agrawal (2005) Data privacy through optimal k-anonymisation. In: IEEE 21st international conference on data engineering, April 2005; Bamba et al. (2008) Supporting anonymous location queries in mobile environments with privacy grid. In: ACM world wide web conference; Byun et al. (2007) Efficient k-anonymisation using clustering techniques. In: International conference on database systems for advanced applications (DASFAA), pp 188–200; Campan and Truta (2008) A clustering approach for data and structural anonymity in social networks. 
In: ACM SIGKDD workshop on privacy, security, and trust in KDD (PinKDD), Las Vegas; Chakrabarti et al. (2004) R-MAT: a recursive model for graph mining. In: SIAM international conference on data mining; Chawla et al. (2005) Toward privacy in public databases. In: Proceedings of the theory of cryptography conference, Cambridge, MA; Evfimievski et al. (2003) Limiting privacy breaches in privacy preserving data mining. In: ACM principles of database systems (PODS). ACM, New York, pp 211–222; Getoor and Diehl, Link mining: a survey. SIGKDD Explor Newsl 7(2):3–12, 2005; Ghinita et al. (2007) Fast data anonymisation with low information loss. In: Very large data base conference (VLDB), Vienna, pp 758–769; LeFevre et al. (2006) Mondrian multidimensional K-anonymity. In: IEEE international conference of data engineering (ICDE), p 25; Liu and Terzi (2008) Towards identity anonymisation on graphs. In: Wang (ed.) SIGMOD conference. ACM, New York, pp 93–106; Lunacek et al. (2006) A crossover operator for the k-anonymity problem. In: Genetic and evolutionary computation conference (GECCO), Seattle, Washington, pp 1713–1720; Machanavajjhala et al. (2006) L-diversity: privacy beyond K-anonymity. In: IEEE international conference on data engineering (ICDE), Atlanta, p 24; Malin, J Am Med Inform Assoc 12(1):28–34, 2004; Nergiz and Clifton (2006) Thoughts on k-anonymisation. In: IEEE 22nd international conference on data engineering workshops (ICDEW), Atlanta, April 2006, p 96; Nergiz and Clifton (2007) Multirelational k-anonymity. In: IEEE 23rd international conference on data engineering posters, April 2007). However, most of the known anonymisation approaches, such as suppression or generalization, do not directly apply to social network data. One major challenge in social network anonymization is computational complexity. In (Gross and Yellen (2006) Graph theory and its applications. 
CRC, Boca Raton), it has been proved that a particular k-anonymity problem that tries to minimize the structural change to the original social network is NP-hard. Research in anonymization of social networks is a relatively new field. In this chapter, we provide a systematic study of the approaches proposed and the studies carried out so far in this direction. Social network nodes can have imprecise data as their attributes, so the usual anonymization methods are not suitable for such social networks. Recently, a very efficient rough set-based algorithm was established in (Tripathy and Prakash Kumar, Int J Rapid Manuf 1(2):189–207, 2009) to handle clustering of tuples in relational models. We describe how this algorithm can be used for anonymization of social networks. We also present some recent algorithms that use graph isomorphism for anonymization of social networks. Finally, we discuss the current status of research on anonymization of social networks and present some related problems for further study.
TL;DR: The Health Insurance Portability and Accountability Act, also known as HIPAA, was designed to protect health insurance coverage for workers and their families while between jobs and establishes standards for electronic health care transactions.
Abstract: The Health Insurance Portability and Accountability Act, also known as HIPAA, was first delivered to Congress in 1996 and consisted of just two Titles. It was designed to protect health insurance coverage for workers and their families while they are between jobs. It establishes standards for electronic health care transactions and addresses the issues of privacy and security when dealing with Protected Health Information (PHI). HIPAA is applicable only in the United States of America.
01 Nov 2012
TL;DR: This paper proposes an algorithm which can be used to achieve k-anonymity and l-diversity in social network anonymisation and is based upon some existing algorithms developed in this direction.
Abstract: The recent development of several popular social networks and the publication of social network data have led to the danger of disclosure of sensitive information about individuals. This makes it necessary to preserve privacy before such data are published. Several algorithms have been developed to preserve privacy in micro data, but these algorithms cannot be applied directly, because in social networks the nodes have structural properties along with their labels. k-anonymity and l-diversity are efficient tools to anonymise micro data, so efforts have been made to find similar algorithms to handle social network anonymisation. In this paper we propose an algorithm which can be used to achieve k-anonymity and l-diversity in social network anonymisation. This algorithm is based upon some existing algorithms developed in this direction.
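As an illustration of l-diversity on micro data, the following is a minimal sketch in Python, not the algorithm proposed in the paper. It checks the simplest ("distinct") form of l-diversity: every group of records sharing the same quasi-identifier values must contain at least l distinct sensitive values. The function name `is_l_diverse`, the field names, and the sample records are all hypothetical.

```python
from collections import defaultdict


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """Check distinct l-diversity (illustrative helper, not from the paper).

    Records are grouped by their quasi-identifier values; every group must
    hold at least `l` distinct values of the sensitive attribute.
    """
    groups = defaultdict(set)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        groups[key].add(rec[sensitive])
    return all(len(values) >= l for values in groups.values())


# Hypothetical, already generalized micro data.
records = [
    {"age": "30-39", "zip": "476**", "disease": "flu"},
    {"age": "30-39", "zip": "476**", "disease": "cancer"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
]

# The second group contains only "flu", so 2-diversity fails.
print(is_l_diverse(records, ["age", "zip"], "disease", 2))  # False
print(is_l_diverse(records, ["age", "zip"], "disease", 1))  # True
```

Both groups in this sample have two records, so the table is 2-anonymous, yet the second group still discloses the sensitive value. This is the gap that l-diversity is meant to close.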
...We present the isomorphism developed in [7,10] below:...
01 Jan 2014
TL;DR: This chapter gives a brief overview of the privacy concerns in online social networks and provides a detailed description of the algorithm, GASNA, a greedy algorithm for social network anonymization, which provides structural anonymity and sensitive attribute protection by achieving k-anonymity and l-diversity in social network data.
Abstract: As the Internet continues to grow, the proliferation of online social networks (OSNs) raises many privacy concerns. The users of these OSNs are divulging endless details about their lives online. This personal information can be used by attackers to perpetrate significant privacy breaches and carry out attacks such as identity theft and credit card fraud. The privacy concerns arise not just from users posting their personal information online, but also from OSNs publishing this information for analysis. Driven by Web 2.0 applications, more and more social network data has been made publicly available. Preserving the privacy of individuals in this published data is an important concern. Although privacy preservation in data publishing has been studied extensively and several important models such as k-anonymity and l-diversity as well as many efficient algorithms have been proposed, most of the existing studies deal with relational data only. Those methods cannot be applied to social network data straightforwardly. Anonymization of social network data is a much more challenging task than anonymizing relational data. Firstly, in relational databases, attacks come from identifying individuals from quasi-identifiers. But in social networks, information such as neighbourhood graphs can be used to identify individuals. Secondly, tuples can be anonymized in relational data without affecting other tuples. But in social networks, adding edges or vertices affects the neighbourhoods of other vertices in the graph as well. In this chapter, we give a brief overview of the privacy concerns in online social networks and provide a detailed description of our algorithm, GASNA, a greedy algorithm for social network anonymization. This algorithm provides structural anonymity and sensitive attribute protection by achieving k-anonymity and l-diversity in social network data. 
We also discuss the challenges faced by the existing algorithms/models for social network data privacy and suggest techniques to counter these challenges. The issues discussed are the high cost of achieving k-anonymity when the value of k is fixed and the need for a better anonymity model which suits the current scenario of social networks. We also propose a new model called partial anonymity which can help reduce the number of edges added for anonymization when the value d of d-neighbourhood is greater than 1.
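The point that adding an edge disturbs other vertices can be sketched in Python as follows. This is not GASNA: as a simplifying assumption, it replaces the d-neighbourhood notion with plain vertex degree (k-degree anonymity, in the spirit of the Liu and Terzi work cited earlier), and the helper names and the four-vertex graph are hypothetical.

```python
from collections import Counter


def is_k_degree_anonymous(adj, k):
    """True if every degree value is shared by at least k vertices.

    `adj` maps each vertex to the set of its neighbours (undirected graph).
    Degree stands in here for the richer d-neighbourhood structure.
    """
    degree_counts = Counter(len(nbrs) for nbrs in adj.values())
    return all(count >= k for count in degree_counts.values())


def add_edge(adj, u, v):
    """Add an undirected edge; note that BOTH endpoints change degree."""
    adj[u].add(v)
    adj[v].add(u)


# Hypothetical path graph a - b - c - d: degrees 1, 2, 2, 1.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(is_k_degree_anonymous(adj, 2))  # True

# One added edge changes the degree of both a and c and breaks anonymity.
add_edge(adj, "a", "c")  # degrees are now 2, 2, 3, 1
print(is_k_degree_anonymous(adj, 2))  # False

# A further edge is needed to repair it.
add_edge(adj, "a", "d")  # degrees are now 3, 2, 3, 2
print(is_k_degree_anonymous(adj, 2))  # True
```

Unlike a relational tuple, a vertex cannot be modified in isolation, which is why the number of edges added is the natural cost measure for such algorithms.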
25 Aug 2013
TL;DR: An algorithm is proposed to achieve k-anonymity and l-diversity in social network data which provides structural anonymity along with sensitive attribute protection and has a substantially lower running time than other algorithms previously proposed in the field.
Abstract: The proliferation of social networks in digital media has proved to be fruitful, but this rise in popularity is accompanied by user privacy concerns. Social network data has been published in various ways, and preserving the privacy of individuals in the published data has become an important concern. Several algorithms have been developed for privacy preservation in relational data, but these algorithms cannot be applied directly to social networks, as the nodes here have structural properties along with labels. In this paper, we propose an algorithm to achieve k-anonymity and l-diversity in social network data which provides structural anonymity along with sensitive attribute protection. The proposed algorithm uses novel edge addition techniques which are also presented in this paper. We also propose a concept of partial anonymity to reduce anonymization cost for d>1. The empirical study shows that our algorithm requires significantly fewer edge additions for anonymization of social network data and has a substantially lower running time than the other algorithms previously proposed in the field.
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
01 Jan 1988
TL;DR: Probabilistic Reasoning in Intelligent Systems is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty, and provides a coherent explication of probability as a language for reasoning with partial belief.
Abstract: From the Publisher: Probabilistic Reasoning in Intelligent Systems is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty. The author provides a coherent explication of probability as a language for reasoning with partial belief and offers a unifying perspective on other AI approaches to uncertainty, such as the Dempster-Shafer formalism, truth maintenance systems, and nonmonotonic logic. The author distinguishes syntactic and semantic approaches to uncertainty and offers techniques, based on belief networks, that provide a mechanism for making semantics-based systems operational. Specifically, network-propagation techniques serve as a mechanism for combining the theoretical coherence of probability theory with modern demands of reasoning-systems technology: modular declarative inputs, conceptually meaningful inferences, and parallel distributed computation. Application areas include diagnosis, forecasting, image interpretation, multi-sensor fusion, decision support systems, plan recognition, planning, speech recognition: in short, almost every task requiring that conclusions be drawn from uncertain clues and incomplete information. Probabilistic Reasoning in Intelligent Systems will be of special interest to scholars and researchers in AI, decision theory, statistics, logic, philosophy, cognitive psychology, and the management sciences. Professionals in the areas of knowledge-based systems, operations research, engineering, and statistics will find theoretical and computational tools of immediate practical use. The book can also be used as an excellent text for graduate-level courses in AI, operations research, or applied probability.
TL;DR: The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment, and the paper examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless accompanying policies are respected.
Abstract: Consider a data holder, such as a hospital or a bank, that has a privately held collection of person-specific, field structured data. Suppose the data holder wants to share a version of the data with researchers. How can a data holder release a version of its private data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful? The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment. A release provides k-anonymity protection if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appears in the release. This paper also examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless accompanying policies are respected. The k-anonymity protection model is important because it forms the basis on which the real-world systems known as Datafly, µ-Argus and k-Similar provide guarantees of privacy protection.
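The definition above can be turned into a direct check. The following Python sketch assumes the release is a list of records and that the caller names the quasi-identifier fields; the function name `is_k_anonymous`, the field names, and the sample data are hypothetical and are not taken from the paper or from Datafly, µ-Argus, or k-Similar.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if each record matches at least k-1 others on the quasi-identifiers.

    Equivalently, every combination of quasi-identifier values that occurs
    in the release must occur at least k times.
    """
    counts = Counter(
        tuple(rec[q] for q in quasi_identifiers) for rec in records
    )
    return all(count >= k for count in counts.values())


# Hypothetical release with generalized age and ZIP code.
release = [
    {"age": "20-29", "zip": "130**", "diagnosis": "flu"},
    {"age": "20-29", "zip": "130**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "148**", "diagnosis": "flu"},
]

# The third record is unique on (age, zip), so the release is not 2-anonymous.
print(is_k_anonymous(release, ["age", "zip"], 2))  # False
print(is_k_anonymous(release, ["age", "zip"], 1))  # True
```

Which fields count as quasi-identifiers is a modelling decision left to the data holder; the check is only as good as that choice.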
31 Oct 1991
TL;DR: Theoretical Foundations.
Abstract: I. Theoretical Foundations.
1. Knowledge.- 1.1. Introduction.- 1.2. Knowledge and Classification.- 1.3. Knowledge Base.- 1.4. Equivalence, Generalization and Specialization of Knowledge.- Summary.- Exercises.- References.
2. Imprecise Categories, Approximations and Rough Sets.- 2.1. Introduction.- 2.2. Rough Sets.- 2.3. Approximations of Set.- 2.4. Properties of Approximations.- 2.5. Approximations and Membership Relation.- 2.6. Numerical Characterization of Imprecision.- 2.7. Topological Characterization of Imprecision.- 2.8. Approximation of Classifications.- 2.9. Rough Equality of Sets.- 2.10. Rough Inclusion of Sets.- Summary.- Exercises.- References.
3. Reduction of Knowledge.- 3.1. Introduction.- 3.2. Reduct and Core of Knowledge.- 3.3. Relative Reduct and Relative Core of Knowledge.- 3.4. Reduction of Categories.- 3.5. Relative Reduct and Core of Categories.- Summary.- Exercises.- References.
4. Dependencies in Knowledge Base.- 4.1. Introduction.- 4.2. Dependency of Knowledge.- 4.3. Partial Dependency of Knowledge.- Summary.- Exercises.- References.
5. Knowledge Representation.- 5.1. Introduction.- 5.2. Examples.- 5.3. Formal Definition.- 5.4. Significance of Attributes.- 5.5. Discernibility Matrix.- Summary.- Exercises.- References.
6. Decision Tables.- 6.1. Introduction.- 6.2. Formal Definition and Some Properties.- 6.3. Simplification of Decision Tables.- Summary.- Exercises.- References.
7. Reasoning about Knowledge.- 7.1. Introduction.- 7.2. Language of Decision Logic.- 7.3. Semantics of Decision Logic Language.- 7.4. Deduction in Decision Logic.- 7.5. Normal Forms.- 7.6. Decision Rules and Decision Algorithms.- 7.7. Truth and Indiscernibility.- 7.8. Dependency of Attributes.- 7.9. Reduction of Consistent Algorithms.- 7.10. Reduction of Inconsistent Algorithms.- 7.11. Reduction of Decision Rules.- 7.12. Minimization of Decision Algorithms.- Summary.- Exercises.- References.
II. Applications.
8. Decision Making.- 8.1. Introduction.- 8.2. Optician's Decisions Table.- 8.3. Simplification of Decision Table.- 8.4. Decision Algorithm.- 8.5. The Case of Incomplete Information.- Summary.- Exercises.- References.
9. Data Analysis.- 9.1. Introduction.- 9.2. Decision Table as Protocol of Observations.- 9.3. Derivation of Control Algorithms from Observation.- 9.4. Another Approach.- 9.5. The Case of Inconsistent Data.- Summary.- Exercises.- References.
10. Dissimilarity Analysis.- 10.1. Introduction.- 10.2. The Middle East Situation.- 10.3. Beauty Contest.- 10.4. Pattern Recognition.- 10.5. Buying a Car.- Summary.- Exercises.- References.
11. Switching Circuits.- 11.1. Introduction.- 11.2. Minimization of Partially Defined Switching Functions.- 11.3. Multiple-Output Switching Functions.- Summary.- Exercises.- References.
12. Machine Learning.- 12.1. Introduction.- 12.2. Learning From Examples.- 12.3. The Case of an Imperfect Teacher.- 12.4. Inductive Learning.- Summary.- Exercises.- References.
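The central construction listed under "Approximations of Set" can be sketched briefly in Python. The sketch uses the standard definitions: the lower approximation of a target set is the union of the indiscernibility classes wholly contained in it, and the upper approximation is the union of the classes that intersect it. The function names, the `key` argument, and the six-object universe are hypothetical illustrations, not material from the book.

```python
def equivalence_classes(universe, key):
    """Partition `universe` into classes of objects with equal `key(x)`."""
    classes = {}
    for x in universe:
        classes.setdefault(key(x), set()).add(x)
    return list(classes.values())


def approximations(universe, key, target):
    """Return (lower, upper) rough approximations of `target`."""
    lower, upper = set(), set()
    for cls in equivalence_classes(universe, key):
        if cls <= target:   # class certainly belongs to the target
            lower |= cls
        if cls & target:    # class possibly belongs to the target
            upper |= cls
    return lower, upper


# Hypothetical universe: objects 1..6 described by a single attribute value.
attribute = {1: "a", 2: "a", 3: "b", 4: "b", 5: "c", 6: "c"}
universe = set(attribute)
target = {1, 2, 3}

lower, upper = approximations(universe, attribute.get, target)
print(sorted(lower))            # [1, 2]
print(sorted(upper))            # [1, 2, 3, 4]
print(len(lower) / len(upper))  # 0.5, the accuracy of the approximation
```

Objects 3 and 4 cannot be told apart by the attribute, so the target set is rough: the boundary region `upper - lower` is `{3, 4}`.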