scispace - formally typeset
Search or ask a question
Author

Kurt Thomas

Bio: Kurt Thomas is an academic researcher from Google. The author has contributed to research in topics: Phishing & Social spam. The author has an hindex of 25, co-authored 44 publications receiving 4367 citations. Previous affiliations of Kurt Thomas include University of California, Berkeley & University of Illinois at Urbana–Champaign.

Papers
More filters
Proceedings Article
16 Aug 2017
TL;DR: It is argued that Mirai may represent a sea change in the evolutionary development of botnets--the simplicity through which devices were infected and its precipitous growth, and that novice malicious techniques can compromise enough low-end devices to threaten even some of the best-defended targets.
Abstract: The Mirai botnet, composed primarily of embedded and IoT devices, took the Internet by storm in late 2016 when it overwhelmed several high-profile targets with massive distributed denial-of-service (DDoS) attacks. In this paper, we provide a seven-month retrospective analysis of Mirai's growth to a peak of 600k infections and a history of its DDoS victims. By combining a variety of measurement perspectives, we analyze how the botnet emerged, what classes of devices were affected, and how Mirai variants evolved and competed for vulnerable hosts. Our measurements serve as a lens into the fragile ecosystem of IoT devices. We argue that Mirai may represent a sea change in the evolutionary development of botnets--the simplicity through which devices were infected and its precipitous growth, demonstrate that novice malicious techniques can compromise enough low-end devices to threaten even some of the best-defended targets. To address this risk, we recommend technical and nontechnical interventions, as well as propose future research directions.

1,236 citations

Proceedings ArticleDOI
04 Oct 2010
TL;DR: A characterization of spam on Twitter finds that 8% of 25 million URLs posted to the site point to phishing, malware, and scams listed on popular blacklists, and examines whether the use of URL blacklists would help to significantly stem the spread of Twitter spam.
Abstract: In this work we present a characterization of spam on Twitter. We find that 8% of 25 million URLs posted to the site point to phishing, malware, and scams listed on popular blacklists. We analyze the accounts that send spam and find evidence that it originates from previously legitimate accounts that have been compromised and are now being puppeteered by spammers. Using clickthrough data, we analyze spammers' use of features unique to Twitter and the degree that they affect the success of spam. We find that Twitter is a highly successful platform for coercing users to visit spam pages, with a clickthrough rate of 0.13%, compared to much lower rates previously reported for email spam. We group spam URLs into campaigns and identify trends that uniquely distinguish phishing, malware, and spam, to gain an insight into the underlying techniques used to attract users.Given the absence of spam filtering on Twitter, we examine whether the use of URL blacklists would help to significantly stem the spread of Twitter spam. Our results indicate that blacklists are too slow at identifying new threats, allowing more than 90% of visitors to view a page before it becomes blacklisted. We also find that even if blacklist delays were reduced, the use by spammers of URL shortening services for obfuscation negates the potential gains unless tools that use blacklists develop more sophisticated spam filtering.

613 citations

Proceedings ArticleDOI
22 May 2011
TL;DR: It is shown that Monarch can provide accurate, real-time protection, but that the underlying characteristics of spam do not generalize across web services, and the distinctions between email and Twitter spam are explored.
Abstract: On the heels of the widespread adoption of web services such as social networks and URL shorteners, scams, phishing, and malware have become regular threats. Despite extensive research, email-based spam filtering techniques generally fall short for protecting other web services. To better address this need, we present Monarch, a real-time system that crawls URLs as they are submitted to web services and determines whether the URLs direct to spam. We evaluate the viability of Monarch and the fundamental challenges that arise due to the diversity of web service spam. We show that Monarch can provide accurate, real-time protection, but that the underlying characteristics of spam do not generalize across web services. In particular, we find that spam targeting email qualitatively differs in significant ways from spam campaigns targeting Twitter. We explore the distinctions between email and Twitter spam, including the abuse of public web hosting and redirector services. Finally, we demonstrate Monarch's scalability, showing our system could protect a service such as Twitter -- which needs to process 15 million URLs/day -- for a bit under $800/day.

508 citations

Proceedings ArticleDOI
02 Nov 2011
TL;DR: This study examines the abuse of online social networks at the hands of spammers through the lens of the tools, techniques, and support infrastructure they rely upon and identifies an emerging marketplace of illegitimate programs operated by spammers.
Abstract: In this study, we examine the abuse of online social networks at the hands of spammers through the lens of the tools, techniques, and support infrastructure they rely upon. To perform our analysis, we identify over 1.1 million accounts suspended by Twitter for disruptive activities over the course of seven months. In the process, we collect a dataset of 1.8 billion tweets, 80 million of which belong to spam accounts. We use our dataset to characterize the behavior and lifetime of spam accounts, the campaigns they execute, and the wide-spread abuse of legitimate web services such as URL shorteners and free web hosting. We also identify an emerging marketplace of illegitimate programs operated by spammers that include Twitter account sellers, ad-based URL shorteners, and spam affiliate programs that help enable underground market diversification.Our results show that 77% of spam accounts identified by Twitter are suspended within on day of their first tweet. Because of these pressures, less than 9% of accounts form social relationships with regular Twitter users. Instead, 17% of accounts rely on hijacking trends, while 52% of accounts use unsolicited mentions to reach an audience. In spite of daily account attrition, we show how five spam campaigns controlling 145 thousand accounts combined are able to persist for months at a time, with each campaign enacting a unique spamming strategy. Surprisingly, three of these campaigns send spam directing visitors to reputable store fronts, blurring the line regarding what constitutes spam on social networks.

499 citations

Proceedings Article
14 Aug 2013
TL;DR: This work investigates the market for fraudulent Twitter accounts to monitor prices, availability, and fraud perpetrated by 27 merchants over the course of a 10-month period, and develops a classifier to retroactively detect several million fraudulent accounts sold via this marketplace.
Abstract: As web services such as Twitter, Facebook, Google, and Yahoo now dominate the daily activities of Internet users, cyber criminals have adapted their monetization strategies to engage users within these walled gardens. To facilitate access to these sites, an underground market has emerged where fraudulent accounts - automatically generated credentials used to perpetrate scams, phishing, and malware - are sold in bulk by the thousands. In order to understand this shadowy economy, we investigate the market for fraudulent Twitter accounts to monitor prices, availability, and fraud perpetrated by 27 merchants over the course of a 10-month period. We use our insights to develop a classifier to retroactively detect several million fraudulent accounts sold via this marketplace, 95% of which we disable with Twitter's help. During active months, the 27 merchants we monitor appeared responsible for registering 10-20% of all accounts later flagged for spam by Twitter, generating $127-459K for their efforts.

273 citations


Cited by
More filters
Proceedings Article
25 Apr 2012
TL;DR: Resilient Distributed Datasets is presented, a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner and is implemented in a system called Spark, which is evaluated through a variety of user applications and benchmarks.
Abstract: We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including recent specialized programming models for iterative jobs, such as Pregel, and new applications that these models do not capture. We have implemented RDDs in a system called Spark, which we evaluate through a variety of user applications and benchmarks.

4,151 citations

Proceedings ArticleDOI
28 Mar 2011
TL;DR: There are measurable differences in the way messages propagate, that can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80%.
Abstract: We analyze the information credibility of news propagated through Twitter, a popular microblogging service. Previous research has shown that most of the messages posted on Twitter are truthful, but the service is also used to spread misinformation and false rumors, often unintentionally.On this paper we focus on automatic methods for assessing the credibility of a given set of tweets. Specifically, we analyze microblog postings related to "trending" topics, and classify them as credible or not credible, based on features extracted from them. We use features from users' posting and re-posting ("re-tweeting") behavior, from the text of the posts, and from citations to external sources.We evaluate our methods using a significant number of human assessments about the credibility of items on a recent sample of Twitter postings. Our results shows that there are measurable differences in the way messages propagate, that can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80%.

2,123 citations

Journal ArticleDOI
TL;DR: This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.
Abstract: This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications

1,776 citations

Proceedings Article
16 Aug 2017
TL;DR: It is argued that Mirai may represent a sea change in the evolutionary development of botnets--the simplicity through which devices were infected and its precipitous growth, and that novice malicious techniques can compromise enough low-end devices to threaten even some of the best-defended targets.
Abstract: The Mirai botnet, composed primarily of embedded and IoT devices, took the Internet by storm in late 2016 when it overwhelmed several high-profile targets with massive distributed denial-of-service (DDoS) attacks. In this paper, we provide a seven-month retrospective analysis of Mirai's growth to a peak of 600k infections and a history of its DDoS victims. By combining a variety of measurement perspectives, we analyze how the botnet emerged, what classes of devices were affected, and how Mirai variants evolved and competed for vulnerable hosts. Our measurements serve as a lens into the fragile ecosystem of IoT devices. We argue that Mirai may represent a sea change in the evolutionary development of botnets--the simplicity through which devices were infected and its precipitous growth, demonstrate that novice malicious techniques can compromise enough low-end devices to threaten even some of the best-defended targets. To address this risk, we recommend technical and nontechnical interventions, as well as propose future research directions.

1,236 citations

Journal ArticleDOI
TL;DR: The authors conducted a comprehensive literature search, identifying 412 relevant articles, which were sorted into 5 categories: descriptive analysis of users, motivations for using Facebook, identity presentation, the role of Facebook in social interactions, and privacy and information disclosure.
Abstract: With over 800 million active users, Facebook is changing the way hundreds of millions of people relate to one another and share information. A rapidly growing body of research has accompanied the meteoric rise of Facebook as social scientists assess the impact of Facebook on social life. In addition, researchers have recognized the utility of Facebook as a novel tool to observe behavior in a naturalistic setting, test hypotheses, and recruit participants. However, research on Facebook emanates from a wide variety of disciplines, with results being published in a broad range of journals and conference proceedings, making it difficult to keep track of various findings. And because Facebook is a relatively recent phenomenon, uncertainty still exists about the most effective ways to do Facebook research. To address these issues, the authors conducted a comprehensive literature search, identifying 412 relevant articles, which were sorted into 5 categories: descriptive analysis of users, motivations for using Facebook, identity presentation, the role of Facebook in social interactions, and privacy and information disclosure. The literature review serves as the foundation from which to assess current findings and offer recommendations to the field for future research on Facebook and online social networks more broadly.

1,148 citations