Author

Kyumin Lee

Other affiliations: Sungkyunkwan University, Texas A&M University, IBM
Bio: Kyumin Lee is an academic researcher from Worcester Polytechnic Institute. The author has contributed to research in topics: Social media & Service discovery. The author has an h-index of 25 and has co-authored 92 publications receiving 4,436 citations. Previous affiliations of Kyumin Lee include Sungkyunkwan University & Texas A&M University.


Papers
Proceedings ArticleDOI
26 Oct 2010
TL;DR: A probabilistic framework for estimating a Twitter user's city-level location based purely on the content of the user's tweets. The framework can overcome the sparsity of geo-enabled features in microblogging services and enable new location-based personalized information services, the targeting of regional advertisements, and so on.
Abstract: We propose and evaluate a probabilistic framework for estimating a Twitter user's city-level location based purely on the content of the user's tweets, even in the absence of any other geospatial cues. By augmenting the massive human-powered sensing capabilities of Twitter and related microblogging services with content-derived location information, this framework can overcome the sparsity of geo-enabled features in these services and enable new location-based personalized information services, the targeting of regional advertisements, and so on. Three of the key features of the proposed approach are: (i) its reliance purely on tweet content, meaning no need for user IP information, private login information, or external knowledge bases; (ii) a classification component for automatically identifying words in tweets with a strong local geo-scope; and (iii) a lattice-based neighborhood smoothing model for refining a user's location estimate. The system estimates k possible locations for each user in descending order of confidence. On average we find that the location estimates converge quickly (needing just 100s of tweets), placing 51% of Twitter users within 100 miles of their actual location.
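
As a rough illustration of the estimator the abstract describes, the sketch below scores candidate cities by per-city word probabilities for location-indicative words and returns the top-k cities in descending order of confidence. The cities, word probabilities, and floor-probability smoothing are illustrative assumptions, not the paper's trained classifier or its lattice-based smoothing model.

```python
import math

# Hypothetical per-city unigram models P(word | city); in practice these are
# estimated from geo-tagged training tweets. Values are made up for illustration.
word_given_city = {
    "houston": {"rockets": 0.010, "astros": 0.008, "tacos": 0.004},
    "boston":  {"redsox": 0.012, "fenway": 0.006, "tacos": 0.001},
}
FLOOR = 1e-6  # floor probability for unseen words (a crude stand-in for smoothing)

def estimate_location(tweet_words, k=3):
    """Rank candidate cities by log P(city | words); assuming a uniform prior
    over cities, this is proportional to log P(words | city)."""
    scores = {
        city: sum(math.log(model.get(w, FLOOR)) for w in tweet_words)
        for city, model in word_given_city.items()
    }
    # Return up to k cities in descending order of confidence.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

print(estimate_location(["rockets", "tacos"]))  # "houston" should rank first
```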

1,213 citations

Proceedings Article
05 Jul 2011
TL;DR: It is found that (i) LSS users follow the “Levy Flight” mobility pattern and adopt periodic behaviors; (ii) while geographic and economic constraints affect mobility patterns, so does individual social status; and (iii) content- and sentiment-based analysis of posts associated with checkins can provide a rich source of context for better understanding how users engage with these services.
Abstract: Location sharing services (LSS) like Foursquare, Gowalla, and Facebook Places support hundreds of millions of user-driven footprints (i.e., "checkins"). Those global-scale footprints provide a unique opportunity to study the social and temporal characteristics of how people use these services and to model patterns of human mobility, which are significant factors for the design of future mobile+location-based services, traffic forecasting, urban planning, as well as epidemiological models of disease spread. In this paper, we investigate 22 million checkins across 220,000 users and report a quantitative assessment of human mobility patterns by analyzing the spatial, temporal, social, and textual aspects associated with these footprints. We find that: (i) LSS users follow the “Levy Flight” mobility pattern and adopt periodic behaviors; (ii) While geographic and economic constraints affect mobility patterns, so does individual social status; and (iii) Content and sentiment-based analysis of posts associated with checkins can provide a rich source of context for better understanding how users engage with these services.
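
To make the Lévy-flight observation concrete, the sketch below computes a user's consecutive checkin displacements with the haversine formula and estimates the power-law exponent of the displacement distribution by maximum likelihood; a heavy tail with an exponent roughly between 1 and 3 is the usual Lévy-flight signature. The toy trajectory and the x_min cutoff are assumptions for illustration, not the paper's full methodology.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def levy_exponent(checkins, x_min=1.0):
    """Maximum-likelihood estimate of alpha for P(d) ~ d^(-alpha), over
    displacements d >= x_min between consecutive checkins."""
    hops = [haversine_km(*a, *b) for a, b in zip(checkins, checkins[1:])]
    hops = [d for d in hops if d >= x_min]
    return 1 + len(hops) / sum(math.log(d / x_min) for d in hops)

# Toy trajectory: (lat, lon) checkins for one user, in time order.
trail = [(40.74, -73.99), (40.75, -73.98), (40.70, -74.01), (42.36, -71.06)]
print(levy_exponent(trail))
```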

742 citations

Proceedings ArticleDOI
19 Jul 2010
TL;DR: It is found that the deployed social honeypots identify social spammers with low false positive rates and that the harvested spam data contains signals that are strongly correlated with observable profile features (e.g., content, friend information, posting patterns, etc.).
Abstract: Web-based social systems enable new community-based opportunities for participants to engage, share, and interact. This community value and related services like search and advertising are threatened by spammers, content polluters, and malware disseminators. In an effort to preserve community value and ensure longterm success, we propose and evaluate a honeypot-based approach for uncovering social spammers in online social systems. Two of the key components of the proposed approach are: (1) The deployment of social honeypots for harvesting deceptive spam profiles from social networking communities; and (2) Statistical analysis of the properties of these spam profiles for creating spam classifiers to actively filter out existing and new spammers. We describe the conceptual framework and design considerations of the proposed approach, and we present concrete observations from the deployment of social honeypots in MySpace and Twitter. We find that the deployed social honeypots identify social spammers with low false positive rates and that the harvested spam data contains signals that are strongly correlated with observable profile features (e.g., content, friend information, posting patterns, etc.). Based on these profile features, we develop machine learning based classifiers for identifying previously unknown spammers with high precision and a low rate of false positives.
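
The second component could look something like the sketch below: profiles harvested by honeypots serve as positive (spam) examples, sampled legitimate users as negatives, and a standard classifier is trained on profile features. The feature set, values, and choice of learner are assumptions for illustration, not the paper's exact pipeline.

```python
from sklearn.ensemble import RandomForestClassifier

# Each row: [friends_count, posts_per_day, fraction_of_posts_with_urls]
X = [
    [4800, 95, 0.90],  # honeypot-harvested spam profile
    [5200, 80, 0.85],  # honeypot-harvested spam profile
    [150, 3, 0.10],    # sampled legitimate profile
    [320, 5, 0.05],    # sampled legitimate profile
]
y = [1, 1, 0, 0]  # 1 = spammer, 0 = legitimate

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([[4000, 70, 0.80]]))  # a spam-like profile -> likely [1]
```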

714 citations

Proceedings Article
05 Jul 2011
TL;DR: This paper presents the first long-term study of social honeypots for tempting, profiling, and filtering content polluters in social media, and evaluates a wide range of features to investigate the effectiveness of automatic content polluter identification.
Abstract: The rise in popularity of social networking sites such as Twitter and Facebook has been paralleled by the rise of unwanted, disruptive entities on these networks, including spammers, malware disseminators, and other content polluters. Inspired by sociologists working to ensure the success of commons and criminologists focused on deterring vandalism and preventing crime, we present the first long-term study of social honeypots for tempting, profiling, and filtering content polluters in social media. Concretely, we report on our experiences via a seven-month deployment of 60 honeypots on Twitter that resulted in the harvesting of 36,000 candidate content polluters. As part of our study, we (1) examine the harvested Twitter users, including an analysis of link payloads, user behavior over time, and followers/following network dynamics and (2) evaluate a wide range of features to investigate the effectiveness of automatic content polluter identification.
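
A minimal sketch of the kind of per-user features such a study might evaluate for automatic identification, assuming a hypothetical user-record layout (the field names are invented for illustration):

```python
def polluter_features(user):
    """Map a harvested Twitter user record to a numeric feature vector."""
    n_tweets = max(len(user["tweets"]), 1)
    n_with_url = sum(1 for t in user["tweets"] if "http" in t)
    return {
        "following_to_followers": user["following"] / max(user["followers"], 1),
        "url_ratio": n_with_url / n_tweets,
        "tweets_per_day": n_tweets / max(user["account_age_days"], 1),
    }

user = {
    "following": 2000, "followers": 40, "account_age_days": 30,
    "tweets": ["check this out http://spam.example", "buy now http://x.example"],
}
print(polluter_features(user))
```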

465 citations

Proceedings ArticleDOI
26 Apr 2010
TL;DR: The conceptual framework of the Social Honeypot Project for uncovering social spammers who target online communities and initial empirical results from Twitter and MySpace are presented.
Abstract: We present the conceptual framework of the Social Honeypot Project for uncovering social spammers who target online communities and initial empirical results from Twitter and MySpace. Two of the key components of the Social Honeypot Project are: (1) The deployment of social honeypots for harvesting deceptive spam profiles from social networking communities; and (2) Statistical analysis of the properties of these spam profiles for creating spam classifiers to actively filter out existing and new spammers.

120 citations


Cited by
Journal ArticleDOI
TL;DR: A comprehensive review of detecting fake news on social media, including fake news characterizations based on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics, and representative datasets.
Abstract: Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research area that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers to believe false information, which makes it difficult and nontrivial to detect based on news content; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.
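
The survey's central point, that news content alone is insufficient and should be augmented with social-engagement signals, can be sketched as a simple feature-concatenation baseline. The texts, engagement features, and labels below are toy assumptions, not a dataset or method from the survey.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["shocking cure doctors hate", "city council passes annual budget"]
engagement = np.array([[5000, 0.9], [120, 0.2]])  # [shares, suspicious_sharer_ratio]
labels = [1, 0]  # 1 = fake, 0 = real

content = TfidfVectorizer().fit_transform(texts).toarray()
X = np.hstack([content, engagement])  # news-content + social-context features
print(LogisticRegression().fit(X, labels).predict(X))
```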

1,891 citations

Journal ArticleDOI
TL;DR: A review of the textbook Applied Linear Regression Models, published in the Journal of Quality Technology.
Abstract: (1991). Applied Linear Regression Models. Journal of Quality Technology: Vol. 23, No. 1, pp. 76-77.

1,811 citations

Journal ArticleDOI
TL;DR: In this article, the authors discuss the threat posed by today's social bots, how their presence can endanger online ecosystems as well as our society, and how to deal with them.
Abstract: Today's social bots are sophisticated and sometimes menacing. Indeed, their presence can endanger online ecosystems as well as our society.

1,259 citations

Journal ArticleDOI
TL;DR: In this paper, the authors discuss the characteristics of modern, sophisticated social bots and how their presence can endanger online ecosystems and our society, and review current efforts to detect social bots on Twitter.
Abstract: The Turing test aimed to distinguish the behavior of a human from that of a computer algorithm. Such a challenge is more relevant than ever in today's social media context, where limited attention and technology constrain the expressive power of humans, while incentives abound to develop software agents mimicking humans. These social bots interact, often unnoticed, with real people in social media ecosystems, but their abundance is uncertain. While many bots are benign, one can design harmful bots with the goals of persuading, smearing, or deceiving. Here we discuss the characteristics of modern, sophisticated social bots, and how their presence can endanger online ecosystems and our society. We then review current efforts to detect social bots on Twitter. Features related to content, network, sentiment, and temporal patterns of activity are imitated by bots but at the same time can help discriminate synthetic behaviors from human ones, yielding signatures of engineered social tampering.
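
One of the temporal signatures this review mentions can be sketched simply: scripted bots often post at near-constant intervals, so the coefficient of variation of inter-post gaps is one crude discriminating feature. The timestamps below are illustrative assumptions.

```python
import statistics

def interval_regularity(timestamps):
    """Coefficient of variation of inter-post gaps; values near 0 suggest
    clockwork-like (bot-like) posting, larger values suggest bursty humans."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return statistics.stdev(gaps) / statistics.mean(gaps)

bot_like = [0, 600, 1200, 1800, 2400]    # a post every 10 minutes
human_like = [0, 40, 3600, 3700, 90000]  # bursty, irregular activity
print(interval_regularity(bot_like), interval_regularity(human_like))
```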

1,229 citations