Journal Article

Survey on Spam Filtering Techniques and Mapreduce

25 Dec 2015 - International Journal of Engineering Trends and Technology - Vol. 30, Iss: 9, pp 444-447
TL;DR: This paper surveys spam email filtering techniques (machine learning based, list based, content based, and hybrid or other) and notes that machine learning based filtering is the most widely used because of its high accuracy and mathematical support.


Abstract: Spam email, also known as junk email, is a subset of electronic spam involving nearly identical messages sent to numerous recipients by email. The messages may contain disguised links that appear to be for familiar websites but in fact lead to phishing websites or sites that host malware. Spam email may also include malware as scripts or other executable file attachments. Spam is any unwanted and harmful mail, so separating spam from normal mail is essential. This paper surveys different spam email filtering techniques: machine learning based, list based, content based, and hybrid or other. Machine learning based filtering is the most widely used because of its high accuracy and mathematical support. Keywords: spam filtering techniques, machine learning based, content based, word based.


Topics: Email filtering (70%), Malware (56%), Phishing (56%)
References

Journal Article
Godwin Caruana, Maozhen Li, Yang Liu
01 May 2013 - Neurocomputing
TL;DR: Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart.


Abstract: Spam, under a variety of shapes and forms, continues to inflict increased damage. Varying approaches including Support Vector Machine (SVM) techniques have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing the subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the impact of accuracy degradation when distributing the training data among a number of SVM classifiers. Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart.
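The data-parallel idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the map and reduce steps, and the ontology-based augmentation is omitted. The sketch assumes a linear SVM trained per partition with a Pegasos-style subgradient method and a reduce step that averages the weight vectors; the function names, the two toy features (counts of a spam-typical and a ham-typical word), and the data are all hypothetical.

```python
import random
from functools import reduce


def train_linear_svm(samples, epochs=20, lam=0.01, seed=0):
    """Train a linear SVM with Pegasos-style stochastic subgradient descent.

    samples: list of (features, label) pairs with label in {-1, +1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x, y in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def map_train(partition):
    """Map step (assumed): train one SVM on one partition of the data."""
    return train_linear_svm(partition), len(partition)


def reduce_average(a, b):
    """Reduce step (assumed): size-weighted average of two weight vectors."""
    (wa, na), (wb, nb) = a, b
    n = na + nb
    return [(p * na + q * nb) / n for p, q in zip(wa, wb)], n


def predict(w, x):
    """Return +1 (spam) or -1 (ham) for feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


# Hypothetical toy data: [count of a spam-typical word, count of a ham-typical word]
PARTITIONS = [
    [([3.0, 0.0], 1), ([0.0, 2.0], -1), ([2.0, 0.0], 1), ([0.0, 3.0], -1)],
    [([4.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 0.0], 1), ([0.0, 4.0], -1)],
]

weights, total = reduce(reduce_average, map(map_train, PARTITIONS))
print("samples:", total, "weights:", weights)
print("spam-like message ->", predict(weights, [5.0, 0.0]))
print("ham-like message  ->", predict(weights, [0.0, 5.0]))
```

Because each `map_train` call touches only its own partition, the calls could run on separate nodes. Weight averaging is only one possible way to combine the partial models; the accuracy loss it can cause on realistic data is the kind of degradation the abstract says the ontology semantics are meant to reduce.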


45 citations


Proceedings Article
Phuc-Tran Ho, Hee-Sun Kim, Sung-Ryul Kim
05 Oct 2014
TL;DR: A novel similarity-based method is proposed that implements the fingerprinting technique on a parallel processing framework and uses a meet-in-the-middle approach to achieve higher accuracy in the spam email detection system.


Abstract: Currently, there are many effective techniques that are used for filtering spam emails. However, spammers have mostly identified the weaknesses of those methods in order to bypass current detection systems. In this paper, we propose a novel similarity-based method that implements the fingerprinting technique on a parallel processing framework. Furthermore, a meet-in-the-middle approach is used in our method to achieve a higher accuracy in the spam email detection system. Our experimental results demonstrate the improved efficiency of this study.
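As a rough illustration of similarity-based detection with fingerprints, the sketch below hashes overlapping word shingles and compares messages by Jaccard similarity. This is an assumed, generic fingerprinting scheme, not the method of the paper: the abstract gives no details of its fingerprint, and the meet-in-the-middle step is not reproduced here. The shingle size, the threshold, and all function names are hypothetical.

```python
import hashlib


def fingerprint(text, k=3):
    """Return the set of 64-bit hashes of all k-word shingles in text."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        shingles = [" ".join(words)]
    else:
        shingles = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    # hashlib gives hashes that are stable across runs, unlike built-in hash()
    return {
        int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")
        for s in shingles
    }


def jaccard(a, b):
    """Jaccard similarity of two fingerprint sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_near_duplicate_spam(message, known_spam_fingerprints, threshold=0.5):
    """Flag a message whose fingerprint is close to any known spam fingerprint."""
    fp = fingerprint(message)
    return any(jaccard(fp, known) >= threshold for known in known_spam_fingerprints)


# Hypothetical known spam and two incoming messages
KNOWN_SPAM = [
    fingerprint("win a free prize now click the link below to claim your reward")
]
variant = "win a free prize now click the link below to claim your bonus"
normal = "the project meeting is moved to friday at ten"

print(is_near_duplicate_spam(variant, KNOWN_SPAM))  # lightly edited copy of the spam
print(is_near_duplicate_spam(normal, KNOWN_SPAM))   # unrelated message
```

Each message is checked independently of the others, so the check could be distributed as a map step over incoming mail on a parallel processing framework.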


13 citations


"Survey on Spam Filtering Techniques..." refers to methods in this paper

  • ...The Support Vector Machine [1,2] is one of the most modern techniques used in mail classification....


  • ...Machine learning technique like Support Vector Machines (SVM) can be applied efficiently in spam filtering....


  • ...Support Vector Machines (SVMs) are powerful classification and regression tools, but their compute and storage requirements increase rapidly with the number of training vectors, putting many problems of practical interest out of their reach....


  • ...Amongst these, Naïve Bayesian classification, Support Vector Machine, K-Nearest Neighbor are most used and appreciated by researchers....
