Filtering Template Driven Spam Mails using Vector Space Models

doi:10.5120/4891-7383

Journal Article, Open Access

Liny Varghese, +2 more

- 29 Feb 2012 -

International Journal of Computer Applic...

- Vol. 39, Iss: 14, pp 33-35

Abstract:

Spam has become a serious problem for society. Some spammers use templates to send spam: to run a particular promotion, they create a template and merge each receiver's details into it. Similarities can be found among these mails and used to easily ignore forthcoming spam. Most high-volume spam is sent using tools that randomize parts of the message, such as the subject, body, and sender address. The general form of the template a spammer is using can often be guessed by inspecting the features of the messages. Most spam filters are either rule-based models or Bayesian models. The main objective of this paper is to find the semantic distance and to evaluate the applicability of two information retrieval techniques, the Simple Vector Space Model (VSM) and VSM using Rocchio Classification, in the spam context. Both methods use cosine similarity to identify spam.
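The two approaches named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the regex tokenizer, the raw term-frequency weighting, the 0.5 similarity threshold, and all function names are assumptions chosen for illustration. The simple VSM flags a message when its cosine similarity to any known spam message reaches the threshold. The Rocchio variant builds one centroid per class from length-normalized vectors and assigns the message to the class whose centroid is more cosine-similar.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    # Assumption: lowercase alphanumeric tokens; the paper's preprocessing is not given here.
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text: str) -> Counter:
    # Raw term-frequency vector (bag of words); tf-idf weighting could be swapped in.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity: dot product divided by the product of the vector lengths.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def simple_vsm_is_spam(message: str, known_spam: list, threshold: float = 0.5) -> bool:
    # Simple VSM: spam if close enough to any previously seen spam (threshold is hypothetical).
    v = vectorize(message)
    return any(cosine(v, vectorize(s)) >= threshold for s in known_spam)


def centroid(docs: list) -> Counter:
    # Rocchio centroid: mean of the length-normalized document vectors of one class.
    total = Counter()
    if not docs:
        return total
    for d in docs:
        v = vectorize(d)
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        for t, x in v.items():
            total[t] += x / norm
    return Counter({t: x / len(docs) for t, x in total.items()})


def rocchio_is_spam(message: str, spam_docs: list, ham_docs: list) -> bool:
    # Assign the message to whichever class centroid it is more cosine-similar to.
    v = vectorize(message)
    return cosine(v, centroid(spam_docs)) > cosine(v, centroid(ham_docs))


if __name__ == "__main__":
    # Template-driven spam: the same template with different receiver names merged in.
    spam = ["Dear Alice, claim your free prize now at our store",
            "Dear Bob, claim your free prize now at our store"]
    ham = ["Meeting moved to Tuesday, agenda attached",
           "Can you review the draft report before Friday"]
    msg = "Dear Carol, claim your free prize now at our store"
    print(simple_vsm_is_spam(msg, spam))    # True (cosine 0.9 to each template copy)
    print(rocchio_is_spam(msg, spam, ham))  # True
```

Because only the merged receiver name changes between copies, template-generated messages share almost all of their terms, which is why a cosine score stays high even when individual details are randomized.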

Citations

Journal Article

Performance Evaluation of Data Mining based Classifier for Classification of Spam E-Mail

Manish Kumar Sahu

- 27 Apr 2017 -

International Journal for Research in Ap...

TL;DR: This research work recommends the multilayer perceptron (MLP) as the best classifier for spam classification, reporting 93.15% accuracy with 10-fold cross-validation.

References

Book

Introduction to Information Retrieval

Christopher D. Manning, +2 more

TL;DR: The authors present an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections.

Journal Article

A vector space model for automatic indexing

Gerard Salton, +2 more

- 01 Nov 1975 -

Communications of the ACM

TL;DR: An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents, demonstrating the usefulness of the model.

Relevance feedback in information retrieval

J. J. Rocchio

Book

The SMART Retrieval System—Experiments in Automatic Document Processing

Gerard Salton

Journal Article

On principal component analysis, cosine and Euclidean measures in information retrieval

Tuomo Korenius, +2 more

- 15 Nov 2007 -

Information Sciences

TL;DR: Single-linkage, complete-linkage, and Ward clustering were applied to Finnish documents, using their relevance assessments as a new feature, and a connection between the cosine measure and the Euclidean distance was used in association with PCA.
