
Showing papers by "Vitaly Shmatikov published in 2009"


Proceedings ArticleDOI
17 May 2009
TL;DR: A framework for analyzing privacy and anonymity in social networks is presented and a new re-identification algorithm targeting anonymized social-network graphs is developed, showing that a third of the users who can be verified to have accounts on both Twitter and Flickr can be re-identified in the anonymous Twitter graph.
Abstract: Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc. We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate. Our de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy "sybil" nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary's auxiliary information is small.
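The core propagation step of such topology-based re-identification can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the adjacency-dict representation, the function name, and the one-point score margin (a crude stand-in for the paper's eccentricity check) are all assumptions.

```python
def propagate(aux, target, seeds):
    """Greedy seed-and-propagate matching between two graphs.

    aux, target: adjacency dicts (node -> set of neighbor nodes).
    seeds: a few aux nodes already mapped to their target counterparts.
    Each unmapped aux node is matched to the unused target node sharing
    the most already-mapped neighbors, accepted only on a clear margin.
    """
    mapping = dict(seeds)
    changed = True
    while changed:
        changed = False
        for a in aux:
            if a in mapping:
                continue
            # Images in `target` of a's already-mapped neighbors.
            mapped_nbrs = {mapping[n] for n in aux[a] if n in mapping}
            used = set(mapping.values())
            # Score each unused target candidate by shared mapped neighbors.
            scores = {}
            for m in mapped_nbrs:
                for t in target[m]:
                    if t not in used:
                        scores[t] = scores.get(t, 0) + 1
            if not scores:
                continue
            best = max(scores, key=scores.get)
            runner_up = sorted(scores.values())[-2] if len(scores) > 1 else 0
            if scores[best] - runner_up >= 1:   # accept only a clear winner
                mapping[a] = best
                changed = True
    return mapping
```

The published algorithm is considerably more careful — among other refinements it normalizes scores, checks candidate matches in the reverse direction, and revisits earlier matches — which is what makes it robust to noise.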

1,360 citations




Proceedings ArticleDOI
08 Jul 2009
TL;DR: SAFER, a static analysis tool for identifying potential DoS vulnerabilities and the root causes of resource-exhaustion attacks before the software is deployed, combines taint analysis with control dependency analysis to detect high-complexity control structures whose execution can be triggered by untrusted network inputs.
Abstract: As networked systems grow in complexity, they are increasingly vulnerable to denial-of-service (DoS) attacks involving resource exhaustion. A single malicious "input of coma" can trigger high-complexity behavior such as deep recursion in a carelessly implemented server, exhausting CPU time or stack space and making the server unavailable to legitimate clients. These DoS attacks exploit the semantics of the target application, are rarely associated with network traffic anomalies, and are thus extremely difficult to detect using conventional methods. We present SAFER, a static analysis tool for identifying potential DoS vulnerabilities and the root causes of resource-exhaustion attacks before the software is deployed. Our tool combines taint analysis with control dependency analysis to detect high-complexity control structures whose execution can be triggered by untrusted network inputs. When evaluated on real-world networked applications, SAFER discovered previously unknown DoS vulnerabilities in the Expat XML parser and the SQLite library, as well as a new attack on a previously patched version of the wu-ftpd server. This demonstrates the importance of understanding and repairing the root causes of DoS vulnerabilities rather than simply blocking known malicious inputs.
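The vulnerability pattern described above — a control structure whose cost is control-dependent on tainted input — can be illustrated with a toy example (hypothetical code, not from the paper):

```python
def parse(s, i=0):
    """Toy recursive-descent parser for nested parentheses.

    Recursion depth equals the nesting depth of the untrusted input s:
    the cost of this control structure is governed by tainted data,
    which is exactly the pattern a taint + control-dependency analysis
    would flag.
    """
    if i < len(s) and s[i] == "(":
        i = parse(s, i + 1)          # depth grows with attacker input
        if i < len(s) and s[i] == ")":
            return i + 1
        raise ValueError("unbalanced input")
    return i

# Benign input parses fine:
parse("((()))")
# ...but a request of "(" * 100_000 exhausts the stack (RecursionError
# in Python; a crash in an unchecked C implementation) — a stack/CPU
# DoS that produces no traffic anomaly for a network IDS to notice.
```

Blocking the one known bad input would not help here; the root cause is the unbounded, input-controlled recursion itself.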

58 citations


Book ChapterDOI
21 Jul 2009
TL;DR: This protocol allows a user to construct a classifier on a database held by a remote server without learning any additional information about the records held in the database, and uses several novel techniques to enable oblivious classifier construction.
Abstract: We present an efficient protocol for the privacy-preserving, distributed learning of decision-tree classifiers. Our protocol allows a user to construct a classifier on a database held by a remote server without learning any additional information about the records held in the database. The server does not learn anything about the constructed classifier, not even the user's choice of feature and class attributes. Our protocol uses several novel techniques to enable oblivious classifier construction. We evaluate a prototype implementation, and demonstrate that its performance is efficient for practical scenarios.
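The computation being protected is ordinary decision-tree induction. As a point of reference, here is a plain-text sketch of ID3-style split selection by information gain; the paper's contribution is a cryptographic protocol that performs this kind of computation obliviously (the server never sees the user's feature/class choices, the user never sees the records). The function and attribute names below are illustrative assumptions.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a multiset of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(records, features, class_attr):
    """ID3 split selection: pick the feature with maximum information gain.

    Everything here is in the clear purely to show the computation; in
    the oblivious protocol these quantities are computed without either
    party learning the other's inputs.
    """
    base = entropy([r[class_attr] for r in records])

    def gain(f):
        remainder = 0.0
        for v in {r[f] for r in records}:
            subset = [r[class_attr] for r in records if r[f] == v]
            remainder += len(subset) / len(records) * entropy(subset)
        return base - remainder

    return max(features, key=gain)
```

For example, on four records where `wind` perfectly predicts `play` and `hum` is uninformative, `best_split` selects `wind`.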

39 citations


01 Jan 2009
TL;DR: This thesis conducts a thorough theoretical and empirical investigation of privacy issues in non-interactive data release, presenting frameworks for privacy and anonymity in several application settings within which one can define exactly when a privacy breach has occurred.
Abstract: The Internet has enabled the collection, aggregation and analysis of personal data on a massive scale. It has also enabled the sharing of collected data in various ways: wholesale outsourcing of data warehousing, partnering with advertisers for targeted advertising, data publishing for exploratory research, etc. This has led to complex privacy questions related to the leakage of sensitive user data and mass harvesting of information by unscrupulous parties. These questions have information-theoretic, sociological and legal aspects and are often poorly understood. There are two fundamental paradigms for how the data is released: in the interactive setting, the data collector holds the data while third parties interact with the data collector to compute some function on the database. In the non-interactive setting, the database is somehow "sanitized" and then published. In this thesis, we conduct a thorough theoretical and empirical investigation of privacy issues involved in non-interactive data release. Both settings have been well analyzed in the academic literature, but the simplicity of the non-interactive paradigm has resulted in its being used almost exclusively in actual data releases. We analyze several common applications including electronic directories, collaborative filtering and recommender systems, and social networks. Our investigation has two main foci. First, we present frameworks for privacy and anonymity in these different settings within which one might define exactly when a privacy breach has occurred. Second, we use these frameworks to experimentally analyze actual large datasets and quantify privacy issues. The picture that has emerged from this research is a bleak one for non-interactivity.
While a surprising level of privacy control is possible in a limited number of applications, the general sense is that protecting privacy in the non-interactive setting is not as easy as intuitively assumed in the absence of rigorous privacy definitions. While some applications can be salvaged either by moving to an interactive setting or by other means, in others a rethinking of the tradeoffs between utility and privacy that are currently taken for granted appears to be necessary.
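For contrast with the sanitize-and-publish pipelines the thesis analyzes, the interactive paradigm is commonly realized with differentially private query answering. A minimal sketch using the standard Laplace mechanism — a well-known construction, not specific to this thesis; the function names are illustrative:

```python
import math
import random

def noisy_count(database, predicate, epsilon):
    """Interactive release: the curator keeps `database` private and
    answers a counting query (sensitivity 1) with Laplace(1/epsilon)
    noise, satisfying epsilon-differential privacy.
    """
    true_count = sum(1 for row in database if predicate(row))
    # Inverse-CDF sample from Laplace(0, 1/epsilon).
    u = random.random() - 0.5                      # u in [-0.5, 0.5)
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

The analyst only ever sees noisy answers; the raw records are never published, which is what distinguishes this setting from non-interactive release.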

5 citations