Survey on Spam Filtering Techniques and Mapreduce
TL;DR: This paper surveys spam email filtering techniques (machine learning based, list based, content based, and hybrid or other) and notes that machine learning based filtering is the most widely used because of its high accuracy and mathematical support.
Abstract: Spam email, also known as junk email, is a subset of electronic spam involving nearly identical messages sent to numerous recipients by email. The messages may contain disguised links that appear to lead to familiar websites but in fact lead to phishing sites or sites hosting malware. Spam email may also carry malware as scripts or other executable file attachments. Spam is any unwanted and harmful mail, so separating it from legitimate mail is essential. This paper surveys spam email filtering techniques, which fall into four groups: machine learning based, list based, content based, and hybrid or other. Machine learning based filtering is the most widely used because of its high accuracy and mathematical support. Keywords: spam filtering techniques, machine learning based, content based, word based.
Topics: Email filtering (70%), Malware (56%), Phishing (56%)
References
TL;DR: Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart.
Abstract: Spam, under a variety of shapes and forms, continues to inflict increased damage. Varying approaches including Support Vector Machine (SVM) techniques have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing the subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the impact of accuracy degradation when distributing the training data among a number of SVM classifiers. Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart.
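The map and reduce roles described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a linear SVM trained by Pegasos-style sub-gradient descent, a reduce step that simply averages the per-partition weight vectors, and two hypothetical features (the counts of the words "free" and "meeting"). The ontology-based augmentation is omitted, and the names `train_partition`, `combine` and `predict` are invented for illustration.

```python
# Sketch only: a MapReduce-style training loop, not the paper's algorithm.
# Assumptions: linear SVM via Pegasos-style sub-gradient descent, weight
# averaging in the reduce step, and two hypothetical features
# (count of "free", count of "meeting"). Ontology augmentation is omitted.

def train_partition(samples, epochs=20, lam=0.01):
    """Map step: fit a linear SVM on one partition of (features, label) pairs."""
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in samples:              # y is +1 (spam) or -1 (ham)
            t += 1
            eta = 1.0 / (lam * t)         # decaying step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]   # regularization shrink
            if margin < 1.0:              # hinge loss is active for this sample
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def combine(weight_vectors):
    """Reduce step: average the per-partition weight vectors."""
    n = len(weight_vectors)
    return [sum(column) / n for column in zip(*weight_vectors)]

def predict(w, x):
    """Return +1 (spam) if the linear score is positive, otherwise -1 (ham)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

# Toy training data split into two partitions, one per hypothetical worker node.
partitions = [
    [([2.0, 0.0], 1), ([0.0, 1.0], -1), ([3.0, 0.0], 1)],
    [([0.0, 2.0], -1), ([1.0, 0.0], 1), ([0.0, 3.0], -1)],
]

# map() stands in for the distributed map phase; each call is independent.
model = combine(list(map(train_partition, partitions)))
```

Because each call to `train_partition` sees only its own partition, the calls could run on separate nodes. Averaging is the simplest possible reduce step; it is one reason accuracy can degrade when training data is split, which is the problem the abstract says the ontology semantics are meant to mitigate.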
45 citations
05 Oct 2014
TL;DR: A novel similarity-based method is proposed that implements the fingerprinting technique on a parallel processing framework; a meet-in-the-middle approach is used to achieve higher accuracy in spam email detection.
Abstract: Currently, there are many effective techniques for filtering spam emails. However, spammers have identified the weaknesses of most of those methods and use them to bypass current detection systems. In this paper, we propose a novel similarity-based method that implements the fingerprinting technique on a parallel processing framework. Furthermore, a meet-in-the-middle approach is used in our method to achieve higher accuracy in the spam email detection system. Our experimental results demonstrate the improved efficiency of this method.
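To make the idea of fingerprint-based similarity concrete, here is a minimal sketch. The abstract does not specify the fingerprinting scheme, so this assumes word 3-shingles hashed with MD5 and compared by Jaccard similarity. The meet-in-the-middle step and the parallel framework are not modelled, and the function names and sample messages are hypothetical.

```python
import hashlib

# Sketch only: assumes word 3-shingles hashed with MD5 and Jaccard similarity.
# The paper's actual fingerprinting scheme, its meet-in-the-middle step and
# its parallel framework are not reproduced here.

def fingerprint(text, k=3):
    """Return the set of hashed k-word shingles for a message."""
    words = text.lower().split()
    if not words:
        return set()
    # Messages shorter than k words produce a single shingle of all their words.
    count = max(len(words) - k + 1, 1)
    shingles = {" ".join(words[i:i + k]) for i in range(count)}
    return {hashlib.md5(s.encode("utf-8")).hexdigest() for s in shingles}

def similarity(a, b):
    """Jaccard similarity between the fingerprints of two messages."""
    fa, fb = fingerprint(a), fingerprint(b)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0

known_spam = "win a free prize now click here"
variant = "win a free prize now click there"    # one word changed
legitimate = "meeting moved to monday at noon"

spam_score = similarity(known_spam, variant)     # shares 4 of 6 distinct shingles
ham_score = similarity(known_spam, legitimate)   # shares none
```

A filter built this way would flag a message when its score against any known spam fingerprint exceeds a chosen threshold. Each comparison is independent of the others, which is what makes the approach a natural fit for a parallel framework.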
13 citations
"Survey on Spam Filtering Techniques..." refers methods in this paper
...The Support Vector Machine [1,2] is one of the most modern techniques used in mail classification....
[...]
...Machine learning technique like Support Vector Machines (SVM) can be applied efficiently in spam filtering....
[...]
...Support Vector Machines (SVMs) are powerful classification and regression tools, but their compute and storage requirements increase rapidly with the number of training vectors, putting many problems of practical interest out of their reach....
[...]
...Amongst these, Naïve Bayesian classification, Support Vector Machine, K-Nearest Neighbor are most used and appreciated by researchers....
[...]
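Of the classifiers named in the excerpts above, Naïve Bayes is the simplest to show in a few lines. The sketch below is a generic multinomial Naïve Bayes with add-one (Laplace) smoothing, not code from any of the cited papers. The training messages, the whitespace tokenizer, and the names `train_nb` and `classify` are assumptions made for illustration.

```python
import math
from collections import Counter

# Sketch only: generic multinomial Naive Bayes with add-one smoothing.
# The training messages and the whitespace tokenizer are hypothetical.

def train_nb(docs):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the label with the highest log posterior for the message."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)   # log prior
        for word in text.lower().split():
            if word in vocab:              # words never seen in training are skipped
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

training = [
    ("win free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
nb_model = train_nb(training)
```

With this toy data, `classify(nb_model, "free prize")` returns `"spam"` and `classify(nb_model, "monday meeting")` returns `"ham"`. The word counts gathered in `train_nb` are simple sums, so they could be accumulated per partition and merged, which fits the MapReduce setting this survey discusses.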