Open Access Journal Article

Filtering Template Driven Spam Mails using Vector Space Models

Liny Varghese, +2 more
29 Feb 2012, Vol. 39, Iss. 14, pp. 33-35
TLDR
The main objective of this paper is to determine semantic distance and evaluate the applicability of two information retrieval techniques, the simple Vector Space Model (VSM) and VSM with Rocchio classification, in the spam context.
Abstract
Spam has become a major problem for society. Some spammers use templates to send spam: to send a particular promotion, they create a template and merge the recipients' details into it. Similarities can be found among these mails and used to filter out forthcoming spam. Most high-volume spam is sent using tools that randomize parts of the message, such as the subject, body, and sender address. The general form of the template a spammer is using can often be guessed by inspecting the features of the messages. Most spam filters are either rule-based or Bayesian models. The main objective of this paper is to determine semantic distance and evaluate the applicability of two information retrieval techniques, the simple Vector Space Model (VSM) and VSM with Rocchio classification, in the spam context. Both methods use cosine similarity to identify spam.
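The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes raw term-frequency weights and whitespace tokenization (the abstract does not state the weighting scheme), the function names `tf_vector`, `cosine`, `centroid`, and `rocchio_classify` are hypothetical, and the messages are invented toy data. Rocchio classification is shown in its usual nearest-centroid form, where a message is assigned to the class whose mean vector is most similar under cosine similarity.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split).

    Assumption: raw term counts; a real filter might use tf-idf instead.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Mean vector of a class: the Rocchio prototype for that class."""
    total = Counter()
    for v in vectors:
        for term, weight in v.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def rocchio_classify(text, centroids):
    """Return the label whose centroid is most cosine-similar to the text."""
    v = tf_vector(text)
    return max(centroids, key=lambda label: cosine(v, centroids[label]))


# Hypothetical toy data: two mails generated from one spam template,
# differing only in the merged recipient name and one randomized word.
spam = ["dear alice win a free prize now", "dear bob win a free prize today"]
ham = ["meeting moved to monday morning", "please review the attached report"]

# Simple VSM: compare two messages directly.
template_similarity = cosine(tf_vector(spam[0]), tf_vector(spam[1]))

# VSM with Rocchio classification: compare a new message to class centroids.
centroids = {
    "spam": centroid([tf_vector(m) for m in spam]),
    "ham": centroid([tf_vector(m) for m in ham]),
}
label = rocchio_classify("dear carol win a free prize now", centroids)
```

In this toy setting the two template-generated mails share five of their seven terms, so their cosine similarity is high even though the recipient name differs, which is the property a template-oriented filter would rely on.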


Citations
Journal Article

Performance Evaluation of Data Mining based Classifier for Classification of Spam E-Mail

TL;DR: This research recommends the multilayer perceptron (MLP) as the best classifier for spam classification, giving 93.15% accuracy with 10-fold cross-validation.
References
Book

Introduction to Information Retrieval

TL;DR: In this article, the authors present an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections.
Journal Article

A vector space model for automatic indexing

TL;DR: An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents, demonstrating the usefulness of the model.
Journal Article

On principal component analysis, cosine and Euclidean measures in information retrieval

TL;DR: Single linkage, complete linkage, and Ward clustering were applied to Finnish documents, using their relevance assessments as a new feature, and a connection between the cosine measure and the Euclidean distance was used in association with PCA.