Topic

Image spam

About: Image spam is a research topic. Over its lifetime, 175 publications have appeared within this topic, receiving 4,126 citations. The topic is also known as image-based spam.

Papers

Journal Article

Support vector machines for spam categorization

H. Drucker¹, Donghui Wu, Vladimir Vapnik

¹ AT&T Labs

01 Sep 1999-IEEE Transactions on Neural Networks

TL;DR: This paper studies support vector machines (SVMs) for classifying e-mail as spam or nonspam, comparing them with three other classification algorithms: Ripper, Rocchio, and boosting decision trees. SVMs performed best when using binary features.

Abstract: We study the use of support vector machines (SVM) in classifying e-mail as spam or nonspam by comparing it to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one data set where the number of features were constrained to the 1000 best features and another data set where the dimensionality was over 7000. SVM performed best when using binary features. For both data sets, boosting trees and SVM had acceptable test performance in terms of accuracy and speed. However, SVM had significantly less training time.

1,536 citations
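
Illustrative sketch (not from the paper): the abstract above reports that SVMs performed best with binary features, meaning each feature records only whether a word occurs in a message, not how often. The Python below shows one way to build such features and fit a linear SVM by plain subgradient descent on the hinge loss. The toy messages, the vocabulary, the function names, and the hyperparameters (`epochs`, `lr`, `lam`) are all assumptions made for illustration; the paper's own feature selection and training procedure are not reproduced here.

```python
def binary_features(text, vocabulary):
    """Map a message to a 0/1 vector: 1.0 if the vocabulary word occurs at all."""
    words = set(text.lower().split())
    return [1.0 if term in words else 0.0 for term in vocabulary]


def score(w, b, x):
    """Linear decision value w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, epochs=100, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss (labels are +1 / -1)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            violated = label * score(w, b, x) < 1.0
            # Shrink the weights (gradient of the L2 penalty).
            w = [wj * (1.0 - lr * lam) for wj in w]
            if violated:
                # Hinge-loss subgradient step for a margin violation.
                w = [wj + lr * label * xj for wj, xj in zip(w, x)]
                b += lr * label
    return w, b


def predict(w, b, x):
    """Return +1 (spam) or -1 (nonspam)."""
    return 1 if score(w, b, x) >= 0.0 else -1


# Hypothetical toy corpus: +1 = spam, -1 = nonspam.
messages = [
    ("win cash now", 1),
    ("cheap pills win", 1),
    ("meeting agenda attached", -1),
    ("lunch at noon", -1),
]
vocabulary = sorted({word for text, _ in messages for word in text.split()})
X = [binary_features(text, vocabulary) for text, _ in messages]
y = [label for _, label in messages]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

In practice an established SVM library would replace the hand-written training loop; it is spelled out here only to keep the sketch free of dependencies.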

Journal Article

Review: A review of machine learning approaches to Spam filtering

Thiago Guzella¹, Walmir Matos Caminhas¹

¹ Universidade Federal de Minas Gerais

01 Sep 2009-Expert Systems With Applications

TL;DR: A comprehensive review of recent developments in applying machine learning algorithms to Spam filtering, covering both textual- and image-based approaches. It concludes that, while important advances have been made in recent years, several aspects remain to be explored, especially under more realistic evaluation settings.

Abstract: In this paper, we present a comprehensive review of recent developments in the application of machine learning algorithms to Spam filtering, focusing on both textual- and image-based approaches. Instead of considering Spam filtering as a standard classification problem, we highlight the importance of considering specific characteristics of the problem, especially concept drift, in designing new filters. Two particularly important aspects not widely recognized in the literature are discussed: the difficulties in updating a classifier based on the bag-of-words representation and a major difference between two early naive Bayes models. Overall, we conclude that while important advancements have been made in the last years, several aspects remain to be explored, especially under more realistic evaluation settings.

468 citations

Journal Article

Spam Filtering Based On The Analysis Of Text Information Embedded Into Images

Giorgio Fumera, Ignazio Pillai, Fabio Roli

01 Dec 2006-Journal of Machine Learning Research

TL;DR: This paper proposes an approach to anti-spam filtering that exploits the text embedded in images sent as attachments: state-of-the-art text categorisation techniques are applied to text extracted by OCR tools from the images attached to e-mails.

Abstract: In recent years anti-spam filters have become necessary tools for Internet service providers to face up to the continuously growing spam phenomenon. Current server-side anti-spam filters are made up of several modules aimed at detecting different features of spam e-mails. In particular, text categorisation techniques have been investigated by researchers for the design of modules for the analysis of the semantic content of e-mails, due to their potentially higher generalisation capability with respect to manually derived classification rules used in current server-side filters. However, very recently spammers introduced a new trick consisting of embedding the spam message into attached images, which can make all current techniques based on the analysis of digital text in the subject and body fields of e-mails ineffective. In this paper we propose an approach to anti-spam filtering which exploits the text information embedded into images sent as attachments. Our approach is based on the application of state-of-the-art text categorisation techniques to the analysis of text extracted by OCR tools from images attached to e-mails. The effectiveness of the proposed approach is experimentally evaluated on two large corpora of spam e-mails.

161 citations

Proceedings Article

Learning Fast Classifiers for Image Spam.

Mark Dredze, Reuven Gevaryahu, Ari Elias-Bachrach

01 Jan 2007

TL;DR: This paper presents features that focus on simple properties of the image, making classification as fast as possible, and introduces a new feature selection algorithm that selects features for classification based on their speed as well as predictive power.

Abstract: Recently, spammers have proliferated “image spam”, emails which contain the text of the spam message in a human readable image instead of the message body, making detection by conventional content filters difficult. New techniques are needed to filter these messages. Our goal is to automatically classify an image directly as being spam or ham. We present features that focus on simple properties of the image, making classification as fast as possible. Our evaluation shows that they accurately classify spam images in excess of 90% and up to 99% on real world data. Furthermore, we introduce a new feature selection algorithm that selects features for classification based on their speed as well as predictive power. This technique produces an accurate system that runs in a tiny fraction of the time. Finally, we introduce Just in Time (JIT) feature extraction, which creates features at classification time as needed by the classifier. We demonstrate JIT extraction using a JIT decision tree that further increases system speed. This paper makes image spam classification practical by providing both high accuracy features and a method to learn fast classifiers.

145 citations
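
Illustrative sketch (not the authors' implementation): the abstract above describes Just in Time (JIT) feature extraction, in which features are created at classification time only as the classifier needs them. One way to picture this is a decision tree that pulls features through a lazy, caching lookup, so an image decided near the root never pays for expensive features deeper in the tree. The image dictionaries, the three extractors, the tree, and its thresholds below are all invented for illustration; they are not learned from data and are not taken from the paper.

```python
class LazyFeatures:
    """Compute each feature at most once, and only when the classifier asks for it."""

    def __init__(self, image, extractors):
        self.image = image
        self.extractors = extractors
        self.cache = {}

    def __getitem__(self, name):
        if name not in self.cache:
            self.cache[name] = self.extractors[name](self.image)
        return self.cache[name]


# Hypothetical extractors, ordered roughly from cheap to expensive.
EXTRACTORS = {
    "file_size": lambda img: img["file_size"],                 # metadata only
    "aspect_ratio": lambda img: img["width"] / img["height"],  # metadata only
    "distinct_colors": lambda img: len(set(img["pixels"])),    # scans every pixel
}

# Hand-written tree with made-up thresholds: (feature, threshold, if_below, otherwise).
TREE = ("file_size", 5000,
        "ham",
        ("aspect_ratio", 2.0,
         ("distinct_colors", 16, "spam", "ham"),
         "ham"))


def classify(tree, features):
    """Walk the tree; a feature is extracted only when a visited node tests it."""
    while not isinstance(tree, str):
        name, threshold, if_below, otherwise = tree
        tree = if_below if features[name] < threshold else otherwise
    return tree


small = LazyFeatures({"file_size": 2000, "width": 10, "height": 10,
                      "pixels": [(0, 0, 0)]}, EXTRACTORS)
banner = LazyFeatures({"file_size": 20000, "width": 400, "height": 300,
                       "pixels": [(255, 255, 255), (0, 0, 0), (255, 255, 255)]},
                      EXTRACTORS)
small_label = classify(TREE, small)    # decided at the root
banner_label = classify(TREE, banner)  # visits all three tests
```

After classification, `small.cache` holds only the features the tree actually tested, which is the saving JIT extraction aims for when the costly features sit deep in the tree.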

Proceedings Article

Using visual features for anti-spam filtering

Ching-Tung Wu¹, Kwang-Ting Cheng¹, Qiang Zhu¹, Yi-Leh Wu

¹ University of California, Santa Barbara

01 Jan 2005

TL;DR: A novel anti-spam system that uses visual clues, in addition to text information in the email body, to determine whether a message is spam, with one-class support vector machines (SVMs) as the underlying base classifier.

Abstract: Unsolicited commercial email (UCE), also known as spam, has been a major problem on the Internet. In the past, researchers have addressed this problem as a text classification or categorization problem. However, as spammers' techniques continue to evolve and the genre of email content becomes more and more diverse, text-based anti-spam approaches alone are no longer sufficient. In this paper, we propose a novel anti-spam system which utilizes visual clues, in addition to text information in the email body, to determine whether a message is spam. We analyze a large collection of spam emails containing images and identify a number of useful visual features for this application. We then propose using one-class support vector machines (SVM) as the underlying base classifier for anti-spam filtering. The experimental results demonstrate that the proposed system can add significant filtering power to the existing text-based anti-spam filters.

117 citations

Metrics

Papers: 175
Citations: 4,537

No. of papers in the topic in previous years
Year	Papers
2021	7
2020	9
2019	8
2018	11
2017	4
2016	7