Spam Filtering Based On The Analysis Of Text Information Embedded Into Images

Open Access, Journal Article

Giorgio Fumera, +2 more

01 Dec 2006

Journal of Machine Learning Research

Vol. 7, Iss. 98, pp. 2699-2720

TLDR

This paper proposes an approach to anti-spam filtering which exploits the text information embedded into images sent as attachments, based on the application of state-of-the-art text categorisation techniques to the analysis of text extracted by OCR tools from images attached to e-mails.

Abstract:

In recent years anti-spam filters have become necessary tools for Internet service providers to face up to the continuously growing spam phenomenon. Current server-side anti-spam filters are made up of several modules aimed at detecting different features of spam e-mails. In particular, text categorisation techniques have been investigated by researchers for the design of modules for the analysis of the semantic content of e-mails, due to their potentially higher generalisation capability with respect to manually derived classification rules used in current server-side filters. However, very recently spammers introduced a new trick consisting of embedding the spam message into attached images, which can make all current techniques based on the analysis of digital text in the subject and body fields of e-mails ineffective. In this paper we propose an approach to anti-spam filtering which exploits the text information embedded into images sent as attachments. Our approach is based on the application of state-of-the-art text categorisation techniques to the analysis of text extracted by OCR tools from images attached to e-mails. The effectiveness of the proposed approach is experimentally evaluated on two large corpora of spam e-mails.
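
The pipeline described in the abstract (extract text from attached images with OCR, then run a text categoriser on it) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `tokenize`, `NaiveBayesTextFilter`, and `classify_email` are hypothetical names, the OCR step is passed in as a plain callable rather than tied to any real OCR tool, and a small multinomial naive Bayes classifier stands in for whichever categoriser the paper actually evaluates.

```python
import math
import re
from collections import Counter
from typing import Callable, Iterable, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep only alphabetic runs.

    Digits and symbols, which OCR output often contains as noise,
    act as separators and are dropped.
    """
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextFilter:
    """Illustrative multinomial naive Bayes with Laplace smoothing.

    Assumption: this is a stand-in categoriser, not the paper's method.
    Both classes must have at least one training document before scoring.
    """

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text: str, label: str) -> None:
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_score(self, text: str, label: str) -> float:
        # Log prior from the share of training documents with this label.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        # Laplace-smoothed log likelihood of each known token.
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def is_spam(self, text: str) -> bool:
        return self.log_score(text, "spam") > self.log_score(text, "ham")


def classify_email(
    body: str,
    images: Iterable[bytes],
    ocr: Callable[[bytes], str],
    model: NaiveBayesTextFilter,
) -> bool:
    """Classify an e-mail using body text plus text extracted from images.

    `ocr` is a hypothetical callable mapping raw image bytes to text;
    in practice it would wrap a real OCR tool.
    """
    extracted = " ".join(ocr(img) for img in images)
    return model.is_spam(body + " " + extracted)
```

With a model trained on a few labelled messages, an e-mail whose body is empty can still be flagged when the text recovered from its image attachment looks like spam, which is the case that filters reading only the subject and body fields miss.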

Citations

Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning

Review: A review of machine learning approaches to Spam filtering

A survey of learning-based techniques of email spam filtering

Email Spam Filtering: A Systematic Review

Combating Adversarial Misspellings with Robust Word Recognition

Related Papers (5)

Learning Fast Classifiers for Image Spam.

Using visual features for anti-spam filtering

Support vector machines for spam categorization

Filtering Image Spam with Near-Duplicate Detection.

A Bayesian Approach to Filtering Junk E-Mail