Showing papers by "Balachander Krishnamurthy" published in 2010


Journal ArticleDOI
07 Jan 2010
TL;DR: This research shows that it is possible for third-parties to link PII, which is leaked via OSNs, with user actions both within OSN sites and elsewhere on non-OSN sites.
Abstract: For purposes of this paper, we define "Personally identifiable information" (PII) as information which can be used to distinguish or trace an individual's identity either alone or when combined with other information that is linkable to a specific individual. The popularity of Online Social Networks (OSN) has accelerated the appearance of vast amounts of personal information on the Internet. Our research shows that it is possible for third-parties to link PII, which is leaked via OSNs, with user actions both within OSN sites and elsewhere on non-OSN sites. We refer to this ability to link PII and combine it with other information as "leakage". We have identified multiple ways by which such leakage occurs and discuss measures to prevent it.

204 citations


Proceedings Article
22 Jun 2010
TL;DR: It is found that all mOSNs exhibit some leakage of private information to third parties, and novel concerns include combination of new features unique to mobile access with the leakage in OSNs that had been examined earlier.
Abstract: Mobile Online Social Networks (mOSNs) have recently grown in popularity. With the ubiquitous use of mobile devices and a rapid shift of technology and access to OSNs, it is important to examine the impact of mobile OSNs from a privacy standpoint. We present a taxonomy of ways to study privacy leakage and report on the current status of known leakages. We find that all mOSNs in our study exhibit some leakage of private information to third parties. Novel concerns include combination of new features unique to mobile access with the leakage in OSNs that we had examined earlier.

83 citations


Patent
25 Feb 2010
TL;DR: The anti-tracking data structure may include opt-out cookie data indicative of a set of opt-out cookies, uniform resource locator (URL) anti-tracking data, and Referer header field anti-tracking data.
Abstract: A disclosed method for implementing anti-tracking measures for a web browser includes refreshing an anti-tracking data structure responsive to satisfying at least one of a set of anti-tracking refresh criteria. The anti-tracking data structure may include opt-out cookie data indicative of a set of opt-out cookies, uniform resource locator (URL) anti-tracking data indicative of a set of URLs associated with URL tracking, and Referer header field anti-tracking data indicative of a set of URLs susceptible to Referer header field tracking. Responsive to a web browser of a user device generating a request for a third-party web page specified by a browser URL, at least a portion of the browser URL is compared against the anti-tracking data structure. If a match in the URL anti-tracking data or the Referer header field anti-tracking data is detected, the browser URL may be modified. The refreshing of anti-tracking data may include pulling a current anti-tracking data structure from an anti-tracking server. Alternatively, the current anti-tracking data structure may be pushed from the anti-tracking server.
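The compare-and-modify step described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the table `URL_ANTI_TRACKING`, the function `apply_anti_tracking`, the host names, and the choice to modify a URL by dropping listed query parameters are all assumptions made for illustration. Opt-out cookies, Referer header handling, and refreshing from an anti-tracking server are not modeled.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical URL anti-tracking data: host -> query parameters to drop.
# Hosts and parameter names are illustrative only.
URL_ANTI_TRACKING = {
    "tracker.example": {"uid", "session_ref"},
}


def apply_anti_tracking(browser_url, url_data=URL_ANTI_TRACKING):
    """Return the browser URL, modified if its host matches the anti-tracking data."""
    parts = urlsplit(browser_url)
    blocked = url_data.get(parts.hostname)
    if blocked is None:
        return browser_url  # no match: leave the request untouched
    # Assumed modification: keep every query parameter except the listed ones.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in blocked]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(apply_anti_tracking("http://tracker.example/pixel?uid=42&page=home"))
print(apply_anti_tracking("http://news.example/story?uid=42"))
```

In this sketch a URL is rewritten only when its host appears in the table, so requests to unlisted hosts pass through unchanged.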

70 citations


21 Jun 2010
TL;DR: Findings suggest that the majority of the time Twitterers do write about themselves, and beyond personally identifiable information, sharing other kinds of personal information on Twitter may put people at risk of being taken advantage of.
Abstract: Social media provide many opportunities to connect people; however, the kinds of personally identifiable information that people share through social media are understudied. Such public discussions of personal information warrant a closer privacy discussion. This paper presents findings from a content analysis of Twitter in which the amount and kinds of personally identifiable information in Twitter messages were coded. Findings suggest that the majority of the time Twitterers do write about themselves. Overwhelmingly, Twitterers do not include identifiable information such as phone numbers, email and home addresses. However, about a quarter of tweets do include information regarding when people are engaging in activities and where they are. This kind of information may have privacy implications when found in the same tweet or if coupled with other kinds of publicly available information.
How much is too much? Privacy issues on Twitter
Social media provide many people a new way to connect with friends, family and colleagues. In particular, social network sites are frequently used to communicate with people known to one another through offline connections (Ellison, Steinfield, & Lampe, 2007). For example, as of August 2009, Facebook was the fifth most frequented website in the US (ComScore, 2009). These services can help to reinforce social bonds and manage social identities (d. m. boyd, 2004; Lange, 2007; Liu, 2007). Research has shown that there can be benefits that come from sharing personal information in social and public ways (e.g. boyd, 2004; Ellison et al., 2007; Hampton & Wellman, 1999). In addition to the benefits of using social network sites, there may be risks associated with using such services. For example, research has begun exploring what kinds of personally identifiable information (e.g. phone numbers, email address, postal address, social security numbers, etc.) people share through services such as Facebook and MySpace (Kolek & Saunders, 2008; Lenhart & Madden, 2007). The misuse of personally identifiable information obtained online can raise many privacy concerns such as identity theft or even discrimination (Lyon, 2001). Therefore this study seeks to explore the kinds of personally identifiable information that people publicly share by analyzing the content of a representative sample of public Twitter messages. Twitter is a popular micro-blogging and social network service that allows people to share messages of up to 140 characters in length. As of September 2009, Twitter had over 50 million unique users (Moore, 2009). While Twitter allows people to share information among friends or “followers”, the default privacy setting on Twitter is that all messages are public, that is, anyone who signs up for Twitter may see them. In addition, all public tweets may be posted to a public timeline website which showcases the twenty most recent tweets. Profiles on Twitter are relatively short compared to Facebook, therefore the bulk of the information about a person is communicated through their Twitter messages or tweets. This study explores the kinds of personally identifiable information that public tweets disclose. Beyond personally identifiable information, sharing other kinds of personal information on Twitter may put people at risk of being taken advantage of. For example, in June 2009 Israel Hyman, an Arizona-based video podcaster, tweeted that he was looking forward to his family vacation to Saint Louis where they would be visiting family friends for the week. He tweeted again when they had successfully arrived in Missouri. While they were away, their house was broken into and several thousand dollars of computer and video equipment were stolen (Van Grove, 2009).
According to one news report, Hyman said, "We don't know for sure if that's what caused the break it in, but it sure gives you pause to think about what you're publicly going to broadcast on the internet," ("Man Robbed After Posting His Vacation On Twitter", 2009). While this may have been an isolated event, it does raise questions about who has access to personal information and how that might put people at risk (Mills, 2009). Concerns about sharing information regarding where people are and when are not necessarily a new phenomenon. People have often tried to keep the fact that they are on vacation discreet from potential vandals or thieves, whether it be through cancelling their mail or newspaper service or even getting a house sitter. Social media, however, allow people to share their locations with thousands of people with the click of a button. Such broadcastability may have important safety implications. There are offline examples of broadcasting personal time and location information and the risks associated with it. For example, funeral notices in newspapers can broadcast where and when family members will be and there have been examples of people’s homes being broken into while they are at funeral services (Wolfe, 1992). Most funeral announcements request that flowers and cards be sent to the funeral home rather than the home of the family to avoid broadcasting the family’s home address. These examples suggest that personally identifiable information is not the only kind of personal information shared that can have privacy implications. Incidental information such as when and where people may be can also have privacy implications. Time and location may constitute a second tier of personally identifiable information, which, while seemingly mundane and minor, can raise potential safety concerns when publicly broadcast and shared.
Prominence of Twitter
Twitter is one of the fastest growing social network sites on the web today, with 8 million users joining monthly (Moore, 2009). Twitter is most frequently used by young adults. Twenty-five to 34 year olds make up the largest percentage of Twitter users (Lenhart & Fox, 2009). This differs somewhat from other social networking services. For example, Pew reported that the median age of Twitterers is several years older than the median age of MySpace or Facebook users but younger than that of LinkedIn users (Lenhart & Fox, 2009). From its inception, Twitter was cross-platform, meaning that users could submit their messages via the web, instant messenger or SMS (“short messaging service” or text message). This may have contributed to the fact that Twitter users tend to be “more mobile in their communication and consumption of information” than the average internet user (Lenhart & Fox, 2009, p. 3). Previous studies of Twitter have explored the kinds of messages people post (Mischaud, 2007; Naaman, Boase, & Lai, 2010), the degree of interactivity within messages (d. boyd, Golder, & Lotan, 2010; Honeycutt & Herring, 2009), the network size of Twitterers and the frequency of tweets (Krishnamurthy, Gill, & Arlitt, 2008; Moore, 2009). Twitter ostensibly asks users, “What are you doing?”, but research suggests that users do not always tweet about what they are doing (Mischaud, 2007; Naaman et al., 2010). People use Twitter to share information about themselves as well as to share information publicly available elsewhere on the web, such as breaking news or interesting media such as music, videos, blogs, etc. Honeycutt and Herring (2009) found that 41% of tweets in their sample shared information about the author him or herself. Similarly, Naaman, Boase, & Lai (2010) found that about half of Twitter messages were about the author him or herself while the rest were about other people or things.
These studies suggest that Twitter users are not only talking about themselves directly; but even if just half of the messages are about themselves, that still means that Twitter users are sharing 12 million tweets per day about themselves (Liew, 2009). Sometimes, of course, messages that do not directly reference the user can still share information about the user’s tastes, interests, and preferences (Liu, 2007). Given the rise of GPS and mobile technologies which may encourage sharing of location information (Humphreys, 2007), it is important to take a step back and examine personally identifiable information as well as a second tier of identifiable information including when and where people are. To the best of our knowledge, this is the first study that explores the kinds of personally identifiable information that people post on Twitter.
Social Media & Sharing
Much research has explored the ways people share information about themselves online and the privacy implications (see Joinson & Paine, 2007 for an overview). Time and again, research has shown that people will disclose more personal information online than they will face-to-face (Joinson & Paine, 2007). Not only do people readily self-disclose in online experimental settings (e.g. Tidwell & Walther, 2002), but they often also disclose personally identifiable information when this is requested by a website (Metzger, 2004). The personal information revealed in Twitter messages, however, is at the complete discretion of users, so long as the messages conform to the 140-character limit. While Twitter differs from social network sites like Facebook and MySpace in its format, it can be helpful to look at privacy attitudes and behaviors on these sites in order to better situate this study.
A study of the attitudes towards privacy and Facebook use by Acquisti & Gross (2006) found that while privacy concerns predicted Facebook use for older people, they did not predict use for students, suggesting that even when young adults were concerned about privacy issues they were still likely to be active and contributing members of Facebook. Lenhart and Madden (2007) found that as many as two-thirds of teens on social network sites report having changed their profile settings so that they are not visible to the entire public. In addition, younger teens and females were likely to engage in privacy

63 citations


Proceedings ArticleDOI
26 Apr 2010
TL;DR: This work proposes methods to anonymize a dynamic network such that the privacy of users is preserved when new nodes and edges are added to the published network, and uses link prediction algorithms to model the evolution of the social network.
Abstract: Anonymization of social networks before they are published or shared has become an important research question. Recent work on anonymizing social networks has looked at privacy preserving techniques for publishing a single instance of the network. However, social networks evolve and a single instance is inadequate for analyzing the evolution of the social network or for performing any longitudinal data analysis. We study the problem of repeatedly publishing social network data as the network evolves, while preserving privacy of users. Publishing multiple instances of the same network independently has privacy risks, since stitching the information together may allow an adversary to identify users in the networks. We propose methods to anonymize a dynamic network such that the privacy of users is preserved when new nodes and edges are added to the published network. These methods make use of link prediction algorithms to model the evolution of the social network. Using this predicted graph to perform group-based anonymization, the loss in privacy caused by new edges can be reduced. We evaluate the privacy loss on publishing multiple social network instances using our methods.
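The abstract does not say which link prediction algorithms are used. As one illustration of the idea, the Python sketch below scores non-adjacent node pairs by their number of common neighbors, a standard link prediction heuristic chosen here as an assumption. The function `predict_links` and the toy graph are hypothetical, and the group-based anonymization step is not shown.

```python
from itertools import combinations


def predict_links(adj, top_k):
    """Rank non-adjacent node pairs by common-neighbor count (illustrative predictor)."""
    scored = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # edge already exists, nothing to predict
        common = len(adj[u] & adj[v])
        if common:
            scored.append((common, u, v))
    # Highest score first; ties broken by node names for determinism.
    scored.sort(key=lambda s: (-s[0], s[1], s[2]))
    return [(u, v) for _, u, v in scored[:top_k]]


# Toy undirected graph as adjacency sets (made-up data).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
print(predict_links(graph, top_k=3))
```

Under this assumption, the predicted edges would be added to the current graph before grouping, so that the grouping already accounts for edges likely to appear in later published instances.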

59 citations


Proceedings Article
22 Jun 2010
TL;DR: This work provides methods to anonymize a dynamic network when new nodes and edges are added to the published network and proposes metrics for privacy loss, and evaluates them for publishing multiple OSN instances.
Abstract: Recent work on anonymizing online social networks (OSNs) has looked at privacy preserving techniques for publishing a single instance of the network. However, OSNs evolve and a single instance is inadequate for analyzing their evolution or performing longitudinal data analysis. We study the problem of repeatedly publishing OSN data as the network evolves while preserving privacy of users. Publishing multiple instances independently has privacy risks, since stitching the information together may allow an adversary to identify users. We provide methods to anonymize a dynamic network when new nodes and edges are added to the published network. These methods use link prediction algorithms to model the evolution. Using this predicted graph to perform group-based anonymization, the loss in privacy caused by new edges can be reduced. We propose metrics for privacy loss, and evaluate them for publishing multiple OSN instances.

49 citations


Journal ArticleDOI
22 Oct 2010
TL;DR: This is a brief journey across the Internet privacy landscape that argues for the importance of the problem and presents questions of interest to which you might be able to apply your expertise.
Abstract: This is a brief journey across the Internet privacy landscape. After trying to convince you about the importance of the problem I will try to present questions of interest and how you might be able to apply your expertise to them.

41 citations


Proceedings ArticleDOI
01 Nov 2010
TL;DR: This paper moves beyond the traditional AS graph view of the Internet to define the problem of AS-to-organization mapping and describes the initial steps at automating the capture of the rich semantics inherent in the AS-level ecosystem where routing and connectivity intersect with organizations.
Abstract: An understanding of Internet topology is central to answering various questions ranging from network resilience to peer selection or data center location. While much prior work has examined AS-level connectivity, meaningful and relevant results from such an abstract view of Internet topology have been limited. For one, semantically, AS relationships capture business relationships and not physical connectivity. Additionally, many organizations often use multiple ASes, either to implement different routing policies, or as legacies from mergers and acquisitions. In this paper, we move beyond the traditional AS graph view of the Internet to define the problem of AS-to-organization mapping. We describe our initial steps at automating the capture of the rich semantics inherent in the AS-level ecosystem where routing and connectivity intersect with organizations. We discuss preliminary methods that identify multi-AS organizations from WHOIS data and illustrate the challenges posed by the quality of the available data and the complexity of real-world organizational relationships.
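To make the AS-to-organization mapping problem concrete, the Python sketch below groups AS numbers by a normalized organization name and reports organizations holding more than one AS. It is only an assumed baseline, not the paper's method: the helpers `normalize` and `multi_as_orgs`, the suffix list, and the sample records (which use AS numbers reserved for documentation) are all hypothetical. Real WHOIS data would need parsing first and, as the abstract notes, is often too noisy for plain name matching.

```python
import re
from collections import defaultdict

# Corporate suffixes ignored when comparing names (illustrative list).
SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co"}


def normalize(org_name):
    """Lowercase, drop punctuation, and remove common corporate suffixes."""
    tokens = re.findall(r"[a-z0-9]+", org_name.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)


def multi_as_orgs(records):
    """Map each normalized organization name to its ASNs, keeping multi-AS ones."""
    groups = defaultdict(set)
    for asn, org_name in records:
        groups[normalize(org_name)].add(asn)
    return {org: sorted(asns) for org, asns in groups.items() if len(asns) > 1}


# Made-up (ASN, organization name) pairs standing in for parsed WHOIS records.
records = [
    (64500, "Example Networks, Inc."),
    (64501, "EXAMPLE NETWORKS INC"),
    (64502, "Sample Transit LLC"),
]
print(multi_as_orgs(records))
```

Exact matching on normalized names misses organizations registered under unrelated names, for example after a merger, which is one of the data-quality challenges the abstract points to.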

41 citations


Journal ArticleDOI
TL;DR: This paper argues that simple models, such as node-and-edge graphs, are insufficient to describe and study OSNs, and proposes that a richer class of Entity Interaction Network models should be adopted.
Abstract: Online Social Networks (OSNs) have been the subject of a great deal of study in recent years. The majority of this study has used simple models, such as node-and-edge graphs, to describe the data. In this paper, we argue that such models, which necessarily limit the structures that can be described and omit temporal information, are insufficient to describe and study OSNs. Instead, we propose that a richer class of Entity Interaction Network models should be adopted. We outline a checklist of features that can help build such a model, and apply it to three popular networks (Twitter, Facebook and YouTube) to highlight important features. We also discuss important considerations for the collection, validation and sharing of OSN data.

32 citations


Patent
Balachander Krishnamurthy
28 Sep 2010
TL;DR: A system that incorporates teachings of the present disclosure may include, for example, a process that reduces a sampling size of a total population of on-line social network users based on a comparison of seed information to a population of on-line social network users.
Abstract: A system that incorporates teachings of the present disclosure may include, for example, a process that reduces a sampling size of a total population of on-line social network users based on a comparison of seed information to a population of on-line social network users. The reduced sampling of on-line social network users is compared to a social graph of the on-line social network users, wherein the social graph is obtained from an algorithm applied to the reduced sampling of the on-line social network users. An outlier is determined in the reduced sampling of on-line social network users based on a characterizing of a cluster of social network users. Additional embodiments are disclosed.

25 citations


01 Jan 2010
TL;DR: This talk measures the disparity between desired and actual privacy settings, quantifying the magnitude of the problem of managing privacy in online social networks, and discusses how social network analysis techniques can be leveraged towards addressing the privacy management crisis.
Abstract: The sharing of personal data has emerged as a popular activity over online social networking sites like Facebook. As a result, the issue of online social network privacy has received significant attention in both the research literature and the mainstream media. Our overarching goal is to improve defaults and provide better tools for managing privacy, but we are limited by the fact that the full extent of the privacy problem remains unknown; there is little quantification of the incidence of incorrect privacy settings or the difficulty users face when managing their privacy. In this talk, I will first focus on measuring the disparity between the desired and actual privacy settings, quantifying the magnitude of the problem of managing privacy. Later, I will discuss how social network analysis techniques can be leveraged towards addressing the privacy management crisis.

Patent
Balachander Krishnamurthy
16 Nov 2010
TL;DR: A narrowcast communication sender determines a set of attributes that define who will be eligible to receive a narrowcast communication, and the communication is then transmitted to the network addresses of the matching recipients.
Abstract: Narrowcast communication to one or more narrowcast communication recipients is provided through the use of an extensible method and apparatus. A narrowcast communication sender determines a set of attributes that define who will be eligible to receive a narrowcast communication. The set of attributes characterize potential recipients according to qualities such as interests, location, or another descriptor of a potential narrowcast communication recipient. Through the use of a privacy sphere, attributes associated with the narrowcast communication are matched to the qualities of potential recipients to identify the network addresses of the narrowcast communication recipients. The narrowcast communication is then transmitted to those network addresses. The narrowcast communication can then be expired from recipients who are no longer eligible to receive it and transmitted to recipients who become eligible to receive the narrowcast communication.
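The matching and expiry steps in this abstract can be sketched in Python as set operations over recipient addresses. This is a simplified illustration under stated assumptions, not the disclosed apparatus: `eligible`, `update_delivery`, the attribute names, and the addresses are hypothetical, matching is assumed to be exact equality on every required attribute, and the privacy sphere is not modeled.

```python
def eligible(required, profiles):
    """Addresses whose profile matches every required attribute (assumed exact match)."""
    return {addr for addr, attrs in profiles.items()
            if all(attrs.get(key) == value for key, value in required.items())}


def update_delivery(required, profiles, current):
    """Return (new holders, addresses to send to, addresses to expire from)."""
    now = eligible(required, profiles)
    return now, now - current, current - now


# Made-up sender attributes and recipient profiles.
required = {"interest": "jazz", "location": "NYC"}
profiles = {
    "a@example.org": {"interest": "jazz", "location": "NYC"},
    "b@example.org": {"interest": "jazz", "location": "LA"},
}
holders, to_send, to_expire = update_delivery(required, profiles, set())
print(holders, to_send, to_expire)

# The recipients swap locations, so eligibility changes on the next update.
profiles["a@example.org"]["location"] = "LA"
profiles["b@example.org"]["location"] = "NYC"
holders, to_send, to_expire = update_delivery(required, profiles, holders)
print(holders, to_send, to_expire)
```

Recomputing the eligible set on each update yields both the newly eligible recipients and those from whom the communication should be expired.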