Improving Knowledge Based Spam Detection Methods: The Effect of Malicious Related Features in Imbalance Data Distribution

doi:10.4236/IJCNS.2015.85014

Open Access journal article

Ja'far Alqatawna and 4 co-authors

09 Apr 2015

Int'l J. of Communications, Network and ...

Vol. 8, Iss. 5, pp. 118-129

Abstract:

Spam is no longer just unsolicited commercial email that wastes our time; it also consumes network traffic and mail-server storage. Furthermore, spam has become a major component of several attack vectors, including phishing, cross-site scripting, cross-site request forgery, and malware infection. Statistics show that the amount of spam carrying malicious content has increased relative to spam advertising legitimate products and services. In this paper, we investigate spam detection with the aim of developing an efficient method for identifying spam email based on an analysis of message content. We identify a feature set that includes a considerable number of malicious-related features, and our goal is to study how these features help classical classifiers identify spam emails. To make the problem more challenging, we built the spam classification models on imbalanced data in which spam is the rare class, accounting for only 16.5% of all emails. Several metrics were used to evaluate the resulting models. Results show a noticeable improvement in the spam classification models when they are trained on a dataset that includes malicious-related features.
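
The abstract does not name the evaluation metrics, but the 16.5% spam share alone shows why plain accuracy is not enough. The following Python sketch (standard library only) is illustrative, not the paper's method: it assumes a hypothetical set of 1,000 emails at that ratio and a hypothetical classifier. Precision, recall, F1 and G-mean are common choices for imbalanced data and are not necessarily the metrics the authors used.

```python
from math import sqrt


def confusion(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN with spam (label 1) as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Return accuracy plus imbalance-aware metrics for the spam class."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # spam detection rate
    specificity = tn / (tn + fp) if tn + fp else 0.0     # legitimate mail kept
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g_mean = sqrt(recall * specificity)                  # penalizes ignoring either class
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "g_mean": g_mean}


# Hypothetical data: 1,000 emails, 165 spam (1) and 835 legitimate (0),
# matching the 16.5% spam share reported in the abstract.
y_true = [1] * 165 + [0] * 835

# Trivial baseline that labels every email as legitimate.
baseline = evaluate(y_true, [0] * len(y_true))

# Hypothetical classifier: catches 120 of 165 spam, wrongly flags 30 legitimate emails.
model_pred = [1] * 120 + [0] * 45 + [1] * 30 + [0] * 805
model = evaluate(y_true, model_pred)

for name, scores in (("baseline", baseline), ("model", model)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

With these numbers, the do-nothing baseline already reaches 83.5% accuracy while detecting no spam at all (recall, F1 and G-mean are all 0). This is why metrics that focus on the rare class are more informative when spam is the minority.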
