Book Chapter DOI

Towards Proactive Spam Filtering (Extended Abstract)

29 Jun 2009, pp. 38–47
TL;DR: This paper introduces a more proactive approach that collects spam messages directly by interacting with the spam botnet controllers, and generates templates that represent a concise summary of a spam run.
Abstract: With increasing security measures in network services, remote exploitation is getting harder. As a result, attackers concentrate on more reliable attack vectors like email: victims are infected using either malicious attachments or links leading to malicious websites. Therefore, efficient filtering and blocking methods for spam messages are needed. Unfortunately, most spam filtering solutions proposed so far are reactive: they require a large amount of both ham and spam messages to efficiently generate rules to differentiate between the two. In this paper, we introduce a more proactive approach that allows us to directly collect spam messages by interacting with the spam botnet controllers. We are able to observe current spam runs and obtain a copy of the latest spam messages in a fast and efficient way. Based on the collected information, we are able to generate templates that represent a concise summary of a spam run. The collected data can then be used to improve current spam filtering techniques and develop new avenues to efficiently filter emails.
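The template idea can be illustrated with a short sketch: given several messages captured from the same spam run, the invariant text is kept and the varying, obfuscated regions are collapsed into wildcards. The '<*>' placeholder, the use of difflib, and the sample messages are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch: reduce messages from one spam run to a single template.
# The '<*>' placeholder and this alignment strategy are illustrative only.
from difflib import SequenceMatcher

def extract_template(messages):
    """Collapse variable regions across messages into '<*>' wildcards."""
    template = messages[0]
    for msg in messages[1:]:
        matcher = SequenceMatcher(None, template, msg, autojunk=False)
        # Keep only the text chunks shared by the template and this message.
        parts = [template[b.a:b.a + b.size]
                 for b in matcher.get_matching_blocks() if b.size]
        template = "<*>".join(parts)
    return template

run = [
    "Buy cheap meds now, John! Visit http://x1.example/abc",
    "Buy cheap meds now, Mary! Visit http://x9.example/qrs",
]
print(extract_template(run))
# -> Buy cheap meds now, <*>! Visit http://x<*>.example/
```

Note that trailing variable text is simply dropped by this toy alignment; a real template generator would mark it explicitly.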
Citations
Proceedings Article DOI
20 Apr 2010
TL;DR: An automated supervised machine learning solution which utilises web navigation behaviour to detect web spambots, proposing a new feature set (referred to as an action set) as a representation of user behaviour to differentiate web spambots from human users.
Abstract: Web robots have been widely used for various beneficial and malicious activities. Web spambots are a type of web robot that spreads spam content throughout the web by typically targeting Web 2.0 applications. They are intelligently designed to replicate human behaviour in order to bypass system checks. Spam content not only wastes valuable resources but can also mislead users to unsolicited websites and award undeserved search engine rankings to spammers’ campaign websites. While most of the research in anti-spam filtering focuses on the identification of spam content on the web, only a few have investigated the origin of spam content, hence identification and detection of web spambots still remains an open area of research. In this paper, we describe an automated supervised machine learning solution which utilises web navigation behaviour to detect web spambots. We propose a new feature set (referred to as an action set) as a representation of user behaviour to differentiate web spambots from human users. Our experimental results show that our solution achieves a 96.24% accuracy in classifying web spambots.
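As a rough illustration of the action-set idea, the sketch below reduces each session to the set of actions performed and trains an off-the-shelf classifier on the binarized sets. The action names, toy data, and choice of Bernoulli Naive Bayes are assumptions for illustration; the paper defines its own features and classifier.

```python
# Sketch: sessions -> action sets -> binary features -> classifier.
# Action names and labels below are invented toy data.
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import BernoulliNB

# Each training example: the set of actions observed in one web session.
train = [
    ({"view_page", "read_thread", "reply"}, "human"),
    ({"view_page", "search", "read_thread"}, "human"),
    ({"register", "post_new_thread", "add_link"}, "bot"),
    ({"register", "edit_profile", "post_new_thread"}, "bot"),
]

vec = DictVectorizer()
X = vec.fit_transform([{a: 1 for a in actions} for actions, _ in train])
y = [label for _, label in train]

clf = BernoulliNB().fit(X, y)

new_session = {"register", "post_new_thread"}
print(clf.predict(vec.transform([{a: 1 for a in new_session}])))  # ['bot']
```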

41 citations

Proceedings Article DOI
14 Jun 2010
TL;DR: To the authors' knowledge, SCA is the first unsupervised spam filtering scheme that achieves accuracy comparable to the de-facto supervised spam filters by explicitly exploiting online campaign identification.
Abstract: Traditional content-based spam filtering systems rely on supervised machine learning techniques. In the training phase, labeled email instances are used to build a learning model (e.g., a Naive Bayes classifier or support vector machine), which is then applied to future incoming emails in the detection phase. However, the critical reliance on the training data becomes one of the major limitations of supervised spam filters. Preparing labeled training data is often labor-intensive and can delay the learning-detection cycle. Furthermore, any mislabeling of the training corpus (e.g., due to spammers' obfuscations) can severely affect the detection accuracy. Supervised learning schemes share one common mechanism regardless of their algorithmic details: learning is performed on an individual email basis. This is the fundamental reason supervised spam filters require training data. In other words, in the learning phase these classifiers can never tell whether an email is spam or ham because they examine one email instance at a time. We investigate the feasibility of a completely unsupervised-learning-based spam filtering scheme which requires no training data. Our study is motivated by three key observations of the spam in today's Internet. (1) The vast majority of emails are spam. (2) A spam email should always belong to some campaign [2, 3]. (3) The spam from the same campaign are generated from templates that obfuscate some parts of the spam, e.g., sensitive terms, leaving the other parts unmodified [3]. These observations suggest that in principle we can achieve unsupervised spam detection by examining emails at the campaign level. In particular, we need robust spam identification algorithms to find common terms shared by spam belonging to the same campaign. These common terms form signatures that can be used to detect future spam of the same campaign. This paper presents SpamCampaignAssassin (SCA), an online unsupervised spam learning and detection scheme. SCA performs accurate spam campaign identification, campaign signature generation, and spam detection using campaign signatures. To our knowledge, SCA is the first unsupervised spam filtering scheme that achieves accuracy comparable to the de-facto supervised spam filters by explicitly exploiting online campaign identification. The full paper describing SCA is available as a technical report [4].
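A rough sketch of the campaign-level idea: cluster emails by shared word shingles, intersect the shingles within a cluster to obtain that campaign's signature, and flag future mail that hits enough signature terms. The greedy clustering, thresholds, and toy data below are my simplifications of the idea, not SCA's actual algorithms (which are in the authors' technical report).

```python
# Sketch: campaign clustering and signature matching over word shingles.
# Thresholds and the single-pass greedy strategy are illustrative choices.

def shingles(text, k=3):
    """Return the set of k-word shingles of a message body."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def cluster_campaigns(emails, threshold=0.5):
    """Greedy single-pass clustering by Jaccard similarity of shingles."""
    campaigns = []  # each entry: [signature_shingles, member_texts]
    for mail in emails:
        s = shingles(mail)
        for entry in campaigns:
            sig = entry[0]
            if len(sig | s) and len(sig & s) / len(sig | s) >= threshold:
                sig.intersection_update(s)  # keep terms shared by all members
                entry[1].append(mail)
                break
        else:
            campaigns.append([set(s), [mail]])
    return campaigns

def is_spam(mail, campaigns, min_hits=2, min_members=2):
    """Flag mail sharing enough terms with a large-enough campaign."""
    s = shingles(mail)
    return any(len(sig & s) >= min_hits
               for sig, members in campaigns if len(members) >= min_members)

emails = [
    "cheap watches buy now at shop one",
    "cheap watches buy now at shop two",
    "hi bob lunch tomorrow at noon",
]
campaigns = cluster_campaigns(emails)
print(is_spam("cheap watches buy now limited offer", campaigns))  # True
```

The minimum-members check reflects the paper's premise that signatures come from campaigns, not from individual (possibly legitimate) messages.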

27 citations


Cites methods from "Towards Proactive Spam Filtering (Extended Abstract)"

  • ...(4) The final category includes automated systems like [46, 12, 31]. AutoRE [46] is a technique that automatically extracts regular expressions from URLs that satisfy distributed and burstiness criteria as signatures (e....


Journal Article DOI
TL;DR: It is concluded that finding pitfalls in the usage of tools by cybercriminals has the potential to increase the efficiency of disruption, interception, and prevention approaches.
Abstract: This work presents an overview of some of the tools that cybercriminals employ to trade securely. It will look at the weaknesses of these tools and how the behavior of cybercriminals will sometimes...

24 citations

01 Jan 2010
TL;DR: In this article, a rule-based web usage behaviour action string, analysed using Trie data structures, is proposed to detect web spambots and eliminate spam in Web 2.0 applications.
Abstract: Spambots are a new type of internet robot that spread spam content through Web 2.0 applications like online discussion boards, blogs, wikis, social networking platforms, etc. These robots are intelligently designed to act like humans in order to fool safeguards and other users. Such spam content not only wastes valuable resources and time but may also mislead users with unsolicited content. Spam content typically intends to misinform users (scams), generate traffic, make sales (marketing/advertising), and occasionally compromise parties, people, or systems by spreading spyware or malware. Current countermeasures do not effectively identify and prevent web spambots. Proactive measures to deter spambots from entering a site are limited to question/response scenarios; the remaining efforts focus on spam content identification as a passive activity. Spammers have evolved their techniques to bypass existing anti-spam filters. In this paper, we describe a rule-based web usage behaviour action string that can be analysed using Trie data structures to detect web spambots. Our experimental results show the proposed system is successful at on-the-fly classification of web spambots, hence eliminating spam in Web 2.0 applications.
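A minimal sketch of the trie idea: encode each action type as one character (here 'R' = register, 'P' = post, 'V' = view, an encoding assumed for illustration), store known spambot action strings in a trie, and flag a live session as soon as its action string reaches a marked node. The sample patterns are invented; the paper defines its own action strings and rules.

```python
# Sketch: on-the-fly matching of session action strings against a trie
# of known bot behaviour patterns. Encoding and patterns are illustrative.
class TrieNode:
    __slots__ = ("children", "is_bot_pattern")
    def __init__(self):
        self.children = {}
        self.is_bot_pattern = False

def insert(root, pattern):
    """Store one known bot action string in the trie."""
    node = root
    for action in pattern:
        node = node.children.setdefault(action, TrieNode())
    node.is_bot_pattern = True

def matches(root, action_string):
    """Return True as soon as any prefix of the session hits a bot pattern."""
    node = root
    for action in action_string:
        node = node.children.get(action)
        if node is None:
            return False
        if node.is_bot_pattern:
            return True
    return False

root = TrieNode()
insert(root, "RPP")           # register, then two posts: a bot-like pattern
print(matches(root, "RPPV"))  # True: flagged after the third action
print(matches(root, "VRP"))   # False: looks like a human browsing first
```

Because matching walks the trie one action at a time, a session can be flagged while it is still in progress, which is what makes the classification "on the fly".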

11 citations

Book
12 Jul 2011
TL;DR: This thesis introduces two new malware detection sensors that make use of so-called honeypots, studies the change in exploit behavior, and derives predictions about the preferred targets of today's malware.
Abstract: Many different network and host-based security solutions have been developed in the past to counter the threat of autonomously spreading malware. Among the most common detection techniques for such attacks are network traffic analysis and so-called honeypots. In this thesis, we introduce two new malware detection sensors that make use of the above-mentioned techniques. The first sensor, called Rishi, passively monitors network traffic to automatically detect bot-infected machines. The second sensor, called Amun, follows the concept of honeypots and detects malware through the emulation of vulnerabilities in network services that are commonly exploited. Both sensors were operated for two years and collected valuable data on autonomously spreading malware on the Internet. From this data we were able, for example, to study the change in exploit behavior and derive predictions about the preferred targets of today's malware.
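The sensors themselves are not reproduced here, but the honeypot principle behind Amun can be sketched in a few lines: listen on a port, present a plausible service banner, and record whatever the client sends, without ever executing it. The port, banner, and log format below are invented for illustration; Amun additionally emulates specific vulnerabilities and extracts shellcode from the captured payloads.

```python
# Very reduced sketch of a low-interaction honeypot: bait, then record.
# Port 2525 and the SMTP-style banner are illustrative assumptions.
import socketserver

class FakeServiceHandler(socketserver.BaseRequestHandler):
    BANNER = b"220 mail.example.com ESMTP ready\r\n"

    def handle(self):
        self.request.sendall(self.BANNER)
        payload = self.request.recv(4096)
        # Store the raw bytes for later analysis (e.g., shellcode detection).
        with open("captures.log", "ab") as log:
            log.write(self.client_address[0].encode() + b" " + payload + b"\n")

if __name__ == "__main__":
    with socketserver.TCPServer(("0.0.0.0", 2525), FakeServiceHandler) as srv:
        srv.serve_forever()
```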

5 citations

References
Proceedings Article
07 Jul 2005
TL;DR: HoneySpam is a fully operating framework that is based on honeypot technologies and is able to address the most common malicious spammer activities; its features include slowing down the e-mail harvesting process.
Abstract: In this paper, we present the design and implementation of HoneySpam, a fully operating framework that is based on honeypot technologies and is able to address the most common malicious spammer activities. The idea is that of limiting unwanted traffic by fighting spamming at the sources rather than at the receivers, as is done by the large majority of present proposals and products. The features of HoneySpam include slowdown of the e-mail harvesting process, poisoning of e-mail databases through apparently working addresses, and increased spammer traceability through the use of fake open proxies and open relays.
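The database-poisoning feature can be sketched as a page generator that emits plausible but fake addresses, all routed to a spam-trap domain, so that harvesters pollute their own lists. The names and the trap domain below are invented; HoneySpam's actual generator and its fake proxy/relay components are more involved.

```python
# Sketch: emit apparently working addresses for harvesters to collect.
# Any delivery to the trap domain is evidence of harvesting.
import random
import string

FIRST = ["anna", "marco", "laura", "paolo", "elena"]
LAST = ["rossi", "bianchi", "ferrari", "russo"]
TRAP_DOMAIN = "spamtrap.example.org"  # hypothetical spam-trap domain

def fake_address():
    tag = "".join(random.choices(string.digits, k=4))
    return f"{random.choice(FIRST)}.{random.choice(LAST)}{tag}@{TRAP_DOMAIN}"

def poison_page(n=50):
    """Render an HTML fragment aimed at crawlers, not at human visitors."""
    items = "\n".join(f'<li><a href="mailto:{fake_address()}">contact</a></li>'
                      for _ in range(n))
    return f"<ul>\n{items}\n</ul>"

print(poison_page(3))
```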

36 citations

Journal ArticleDOI
01 Jul 2007
TL;DR: The authors' proposed URL-based spam filter instead analyzes URL statistics to dynamically calculate the probability that emails with specific URLs are spam or legitimate, and then classifies them accordingly.
Abstract: Many URL-based spam filters rely on "white" and "black" lists to classify email. The authors' proposed URL-based spam filter instead analyzes URL statistics to dynamically calculate the probability that emails with specific URLs are spam or legitimate, and then classifies them accordingly.
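A hedged sketch of the statistical idea: maintain per-URL spam/ham counters and score a message by the spam probability of the URLs it contains. The Laplace smoothing, the resulting 0.5 prior for unseen URLs, and the max-over-URLs decision rule are my choices for illustration, not necessarily the authors' exact estimator.

```python
# Sketch: per-URL spam probabilities from observed counts, with smoothing.
from collections import defaultdict

counts = defaultdict(lambda: [0, 0])  # url -> [spam_count, ham_count]

def observe(url, is_spam):
    counts[url][0 if is_spam else 1] += 1

def url_spam_probability(url):
    spam, ham = counts[url]
    return (spam + 1) / (spam + ham + 2)   # Laplace-smoothed; unseen -> 0.5

def classify(urls, threshold=0.7):
    """Label a message by its most suspicious URL."""
    if not urls:
        return "ham"
    score = max(url_spam_probability(u) for u in urls)
    return "spam" if score >= threshold else "ham"

observe("http://pills.example/buy", True)
observe("http://pills.example/buy", True)
observe("http://news.example/story", False)
print(classify(["http://pills.example/buy"]))  # spam (p = 0.75)
```

Unlike static white/black lists, the counters update with every observed message, which is what makes the calculation dynamic.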

29 citations