Author

Liny Varghese

Bio: Liny Varghese is an academic researcher. The author has contributed to research in topics: Statistical classification & Spambot. The author has an h-index of 1 and has co-authored 2 publications receiving 2 citations.

Papers
Journal Article
TL;DR: This paper uses the Apache Mahout framework to analyse the time and accuracy efficiency of two Naive Bayes classification algorithms.
Abstract: Spam comes in many forms, including text, images, embedded HTML and MIME attachments, and the volume of spam mail sent per day is massive. Handling this high volume, high velocity and wide variety of spam requires a scalable spam filtering solution. Scalable solutions available for machine learning and statistical studies can also be used to implement scalable spam filtering. In the big data analytics domain, Mahout is an open source library from Apache for building scalable machine learning solutions. This paper uses the Mahout framework to analyse the time and accuracy efficiency of two Naive Bayes classification algorithms. Keywords: Apache Mahout, big data, scalable algorithms, Naive Bayes algorithms
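As an illustration of the kind of classifier the abstract refers to, here is a minimal sketch of a multinomial Naive Bayes spam filter with add-one (Laplace) smoothing. It is plain Python, not Mahout code, and the abstract does not say which two Naive Bayes variants the paper compares, so the variant shown is an assumption. The names `train_nb`, `classify_nb` and the `TRAIN` messages are hypothetical and exist only for this example.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Fit a multinomial Naive Bayes model from (label, text) pairs."""
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # per-class token counts
    vocab = set()
    for label, text in docs:
        class_docs[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    total = sum(class_docs.values())
    log_priors = {c: math.log(n / total) for c, n in class_docs.items()}
    return log_priors, word_counts, vocab


def classify_nb(model, text):
    """Return the class with the highest log posterior for the text."""
    log_priors, word_counts, vocab = model
    scores = {}
    for label, log_prior in log_priors.items():
        total_tokens = sum(word_counts[label].values())
        score = log_prior
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # ignore tokens never seen during training
            # Add-one smoothing keeps unseen (class, token) pairs non-zero.
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_tokens + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data, not taken from the paper.
TRAIN = [
    ("spam", "win cash prize now"),
    ("spam", "cheap prize offer click now"),
    ("ham", "meeting agenda for monday"),
    ("ham", "please review the project report"),
]
model = train_nb(TRAIN)
```

Calling `classify_nb(model, "claim your cash prize")` scores the message against both classes and returns the more probable label. A scalable framework would distribute the counting step across machines, but the per-class arithmetic stays the same.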

1 citation

Journal ArticleDOI
TL;DR: The main objective of this paper is to measure semantic distance and evaluate the applicability of two information retrieval techniques, the simple Vector Space Model (VSM) and VSM with Rocchio classification, in the spam context.
Abstract: Spam has become a major problem for society. Some spammers use templates to send spam: to send a particular promotion, they create a template and merge the recipients' details into it. Similarities can be found among these mails and used to filter out forthcoming spam easily. Most high-volume spam is sent using tools that randomize parts of the message, such as the subject, body and sender address. The general form of the template that the spammer is using can often be guessed by inspecting the features of the messages. Most spam filters are either rule-based models or Bayesian models. The main objective of this paper is to measure semantic distance and evaluate the applicability of two information retrieval techniques, the simple Vector Space Model (VSM) and VSM with Rocchio classification, in the spam context. Both methods use cosine similarity to identify spam.
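The two techniques named in the abstract can be sketched as follows, assuming raw term-frequency vectors and whitespace tokenization (the abstract does not state the weighting or preprocessing the paper uses). Rocchio classification is shown in its nearest-centroid form: each class is summarized by the mean of its training vectors, and a new message goes to the class whose centroid has the highest cosine similarity. All function names and sample messages are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent a message as a sparse term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Mean of a list of sparse vectors (the Rocchio class prototype)."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds counts term by term
    return {t: w / len(vectors) for t, w in total.items()}


def rocchio_classify(centroids, text):
    """Assign the text to the class with the most similar centroid."""
    vec = tf_vector(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical template-like spam and ordinary ham messages.
SPAM = ["limited offer buy cheap watches now", "buy cheap pills limited offer"]
HAM = ["lunch meeting moved to friday", "notes from the friday meeting"]

centroids = {
    "spam": centroid([tf_vector(m) for m in SPAM]),
    "ham": centroid([tf_vector(m) for m in HAM]),
}
```

For example, `rocchio_classify(centroids, "cheap watches offer today")` compares the message with both centroids and picks the closer one. Because template-generated spam shares most of its terms, its messages cluster tightly around one centroid, which is what makes a similarity-based approach plausible here.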

1 citation


Cited by
Journal ArticleDOI
TL;DR: In this article, an Ensemble Model-1 that is an ensemble of Multilayer Perceptron (MLP), Naive Bayes and Random Forest (RF) was proposed for classification of spam and ham documents.
Abstract: Classifying spam e-mail documents is a challenging task for e-mail users, especially non-IT users. Billions of people use the internet and face the problem of spam e-mails. Automatic identification and classification of spam e-mails helps users manage large volumes of e-mail. This work aims to make a significant contribution by building a robust model for classifying spam e-mail documents using data mining techniques. In this paper, we use the Enron1 data set, which consists of spam and ham documents collected from the Kaggle repository. We propose Ensemble Model-1, an ensemble of Multilayer Perceptron (MLP), Naive Bayes and Random Forest (RF), to obtain better accuracy for the classification of spam and ham e-mail documents. Experimental results reveal that the proposed Ensemble Model-1 outperforms other existing classifiers as well as the other proposed ensemble models in terms of classification accuracy. The proposed Ensemble Model-1 achieves a high accuracy of 97.25% for the classification of spam e-mail documents.
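The abstract does not say how Ensemble Model-1 combines its three base classifiers. One common choice is majority voting, sketched below under that assumption. The three rule functions are hypothetical stand-ins for trained MLP, Naive Bayes and Random Forest models; they only serve to make the example runnable.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(labels).most_common(1)[0][0]


def ensemble_predict(classifiers, message):
    """Ask every base classifier for a label and combine the votes."""
    return majority_vote([clf(message) for clf in classifiers])


# Hypothetical stand-ins for trained base models (not from the paper).
def keyword_rule(message):
    return "spam" if "prize" in message.lower() else "ham"


def link_rule(message):
    return "spam" if "http://" in message.lower() else "ham"


def shouting_rule(message):
    letters = [c for c in message if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    return "spam" if letters and upper / len(letters) > 0.5 else "ham"


CLASSIFIERS = [keyword_rule, link_rule, shouting_rule]
```

With three voters and two classes there is always a strict majority, so `ensemble_predict(CLASSIFIERS, "WIN A PRIZE NOW")` is decided by two of the three rules. The appeal of an ensemble is that a message misjudged by one model can still be classified correctly when the other two agree.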

3 citations

Journal ArticleDOI
TL;DR: This research work recommends the Multilayer Perceptron (MLP) as the best classifier for spam classification, giving 93.15% accuracy with 10-fold cross-validation.
Abstract: E-mail is one of the most important and economical communication media for transferring information from one person to another. The growing number of e-mails has led to a drastic increase in spam e-mail. In this research work, we use various classification techniques to classify spam and non-spam e-mails. The experiments were done in the Tanagra data mining tool. We recommend the Multilayer Perceptron (MLP) as the best classifier for spam classification, giving 93.15% accuracy with 10-fold cross-validation.
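The 10-fold cross-validation mentioned in the abstract can be sketched in a few lines. This is a generic illustration, not the Tanagra procedure: the data is split into ten contiguous folds, each fold serves once as the test set while the other nine train the model, and accuracy is pooled over all held-out predictions. The names `k_fold_indices` and `cross_val_accuracy`, and the `fit` and `predict` callables they accept, are hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(samples, labels, fit, predict, k=10):
    """Pooled accuracy over k folds for caller-supplied fit and predict."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct += sum(predict(model, samples[i]) == labels[i]
                       for i in test_idx)
    return correct / len(samples)
```

The sketch does not shuffle the data; in practice the samples are usually shuffled (or stratified by class) before splitting so that each fold contains a representative mix of spam and non-spam messages.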