Spam Filtering using Support Vector Machine

doi:10.47893/IJCCT.2010.1053

Journal Article•DOI•

Priyanka Chhabra¹, Rajesh Wadhvani¹, Sanyam Shukla¹•Institutions (1)

Maulana Azad National Institute of Technology¹

01 Oct 2010-Vol. 1, Iss: 4, pp 256-261

TL;DR: This paper evaluates the performance of Non Linear SVM based classifiers with various kernel functions over Enron Dataset and finds them to be good candidates for spam classification.

Abstract: Traditional anti-spam techniques such as black and white lists are not up to the mark in the current scenario. The goal of spam classification is to distinguish between spam and legitimate mail messages. But with the popularization of the Internet, it is challenging to develop spam filters that can effectively eliminate the increasing volumes of unwanted mails automatically before they enter a user's mailbox. Many researchers have been trying to separate spam from legitimate emails using machine learning algorithms based on statistical learning methods. In this paper, we evaluate the performance of non-linear SVM-based classifiers with various kernel functions over the Enron dataset.
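
To make "various kernel functions" concrete, here is a minimal sketch of kernels commonly paired with non-linear SVMs. The abstract does not say which kernels or parameter values the authors used; the selection follows the list quoted in one of the citing excerpts on this page, and the parameter defaults (`degree`, `gamma`, `coef0`), helper names and toy vectors are assumptions made purely for illustration.

```python
import math


def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    # (x . y + coef0) ** degree; degree and coef0 are illustrative defaults.
    return (dot(x, y) + coef0) ** degree


def normalized_poly_kernel(x, y, degree=2, coef0=1.0):
    # Polynomial kernel rescaled so that K(x, x) == 1 for every x.
    kxy = polynomial_kernel(x, y, degree, coef0)
    kxx = polynomial_kernel(x, x, degree, coef0)
    kyy = polynomial_kernel(y, y, degree, coef0)
    return kxy / math.sqrt(kxx * kyy)


def rbf_kernel(x, y, gamma=0.5):
    # Gaussian / RBF kernel: exp(-gamma * ||x - y||^2).
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=0.1, coef0=0.0):
    # Hyperbolic tangent kernel: tanh(gamma * x . y + coef0).
    return math.tanh(gamma * dot(x, y) + coef0)


def gram_matrix(kernel, rows):
    """Pairwise kernel values, the matrix an SVM solver works with."""
    return [[kernel(a, b) for b in rows] for a in rows]


# Hypothetical term-count vectors for three e-mails (not Enron data).
emails = [[3.0, 0.0, 1.0], [2.0, 1.0, 0.0], [0.0, 4.0, 2.0]]

for name, kernel in [("linear", linear_kernel), ("rbf", rbf_kernel)]:
    print(name, gram_matrix(kernel, emails))
```

Swapping the kernel changes only how similarity between two e-mails is measured; the SVM training procedure around it stays the same, which is what makes a kernel-by-kernel comparison on one dataset straightforward.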

Citations

Journal Article•DOI•

Machine learning for email spam filtering: review, approaches and open research problems

Emmanuel Gbenga Dada¹, Joseph Stephen Bassi¹, Haruna Chiroma, Shafi’i Muhammad Abdulhamid², Adebayo Olusola Adetunmbi³, Opeyemi Emmanuel Ajibuwa⁴•Institutions (4)

University of Maiduguri¹, Federal University of Technology Minna², Federal University of Technology Akure³, University of Ilorin⁴

01 Jun 2019-Heliyon

TL;DR: A systematic review of some of the popular machine learning based email spam filtering approaches, which recommends deep learning and deep adversarial learning as the future techniques that can effectively handle the menace of spam emails.

267 citations

Additional excerpts

...According to [134], SVM is a good classifier due to its sparse data format and satisfactory recall and precision value....
[...]

Journal Article•DOI•

A Survey on Machine Learning Techniques for Cyber Security in the Last Decade

Kamran Shaukat¹, Suhuai Luo¹, Vijay Varadharajan¹, Ibrahim A. Hameed², Min Xu³•Institutions (3)

University of Newcastle¹, Norwegian University of Science and Technology², University of Technology, Sydney³

02 Dec 2020-IEEE Access

TL;DR: This paper aims to provide a comprehensive overview of the challenges that ML techniques face in protecting cyberspace against attacks, by presenting a literature on ML techniques for cyber security including intrusion detection, spam detection, and malware detection on computer networks and mobile networks in the last decade.

Abstract: Pervasive growth and usage of the Internet and mobile applications have expanded cyberspace. The cyberspace has become more vulnerable to automated and prolonged cyberattacks. Cyber security techniques provide enhancements in security measures to detect and react against cyberattacks. The previously used security systems are no longer sufficient because cybercriminals are smart enough to evade conventional security systems. Conventional security systems lack efficiency in detecting previously unseen and polymorphic security attacks. Machine learning (ML) techniques are playing a vital role in numerous applications of cyber security. However, despite the ongoing success, there are significant challenges in ensuring the trustworthiness of ML systems. There are incentivized malicious adversaries present in the cyberspace that are willing to game and exploit such ML vulnerabilities. This paper aims to provide a comprehensive overview of the challenges that ML techniques face in protecting cyberspace against attacks, by presenting a literature on ML techniques for cyber security including intrusion detection, spam detection, and malware detection on computer networks and mobile networks in the last decade. It also provides brief descriptions of each ML method, frequently used security datasets, essential ML tools, and evaluation metrics to evaluate a classification model. It finally discusses the challenges of using ML techniques in cyber security. This paper provides the latest extensive bibliography and the current trends of ML in cyber security.

135 citations

Journal Article•DOI•

A Novel Feature Selection Based on One-Way ANOVA F-Test for E-Mail Spam Classification

Nadir Omer Fadl Elssied¹, Othman Ibrahim, Ahmed Osman¹•Institutions (1)

Universiti Teknologi Malaysia¹

20 Jan 2014-Research Journal of Applied Sciences, Engineering and Technology

TL;DR: Experimental results on spam base English datasets showed that the enhanced SVM (FSSVM) significantly outperforms SVM and many other recent spam classification methods for English dataset in terms of computational complexity and dimension reduction.

Abstract: Spam is commonly defined as unwanted e-mail, and it has become a global threat to e-mail users. Although Support Vector Machine (SVM) has been commonly used in e-mail spam classification, the problem of high data dimensionality of the feature space, due to the massive number of e-mails and features in the dataset, still exists. To address this limitation of SVM, reduce the computational complexity (efficiency) and enhance the classification accuracy (effectiveness), this study applies a feature selection scheme based on one-way ANOVA F-test statistics to determine the most important features contributing to e-mail spam classification. This feature selection is used to reduce the high data dimensionality of the feature space before the classification process. The experiment was carried out using the well-known Spambase benchmark dataset to evaluate the feasibility of the proposed method. The comparison is achieved for different datasets, categorization algorithms and success measures. In addition, experimental results on the Spambase English dataset showed that the enhanced SVM (FSSVM) significantly outperforms SVM and many other recent spam classification methods for English datasets in terms of computational complexity and dimension reduction.
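
The feature selection step described in this abstract can be sketched as follows: compute a one-way ANOVA F statistic for each feature across the classes, then keep the highest-scoring features before training a classifier. This is a generic illustration of the F-test, not the authors' FSSVM implementation; the function names, the tie-breaking rule and the toy data are assumptions.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one feature, given its values per class."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of class means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of samples around their own class mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    if ms_within == 0.0:
        # Degenerate case: no spread inside any class.
        return float("inf") if ms_between > 0.0 else 0.0
    return ms_between / ms_within


def rank_features(X, y, top_k):
    """Indices of the top_k features by F score (ties broken by index)."""
    labels = sorted(set(y))
    scored = []
    for j in range(len(X[0])):
        groups = [[row[j] for row, lab in zip(X, y) if lab == c] for c in labels]
        scored.append((anova_f(groups), j))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scored[:top_k]]


# Hypothetical toy data: 6 e-mails x 3 features, label 1 = spam, 0 = legitimate.
X = [[5, 1, 2], [6, 2, 3], [5, 1, 2], [0, 2, 3], [1, 1, 2], [0, 2, 2]]
y = [1, 1, 1, 0, 0, 0]

selected = rank_features(X, y, top_k=2)
X_reduced = [[row[j] for j in selected] for row in X]
print(selected, X_reduced)
```

A feature whose class means differ strongly relative to its within-class spread gets a large F value, so dropping low-F features shrinks the input to the SVM while keeping the columns that best separate spam from legitimate mail.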

88 citations

Cites background or methods or result from "Spam Filtering using Support Vector..."

...It can be used to solve Linearly separable as well as non-linear separable problems (Chhabra et al., 2010; Fagbola et al., 2012)....
[...]
...There are many kernel-based functions such as linear kernel function, the normalized poly kernel, polynomial kernel function, Radial Basis Function (RBF) or Gaussian Kernel and Hyperbolic Tangent (Sigmoid) Kernel sigmoid function can be implemented in SVM (Chhabra et al., 2010)....
[...]
...Experiment dataset: There are various benchmark datasets available for researchers related to e-mail classification (Chhabra et al., 2010)....
[...]
...…features and classify spam mails such as Support Vector Machine (SVM), Particle Swarm Optimization (PSO), Naïve Bayesian (NB) and Feature Selection algorithm (FS) (Chhabra et al., 2010; Golovko et al., 2010; Ma et al., 2009; Mohammad and Zitar, 2011; Salcedo-Campos et al., 2012; Wu et al., 2008)....
[...]
...The result of the study by Priyanka et al. (Chhabra et al., 2010) on SVM for massive data classification showed that SVM is time consuming when the size of data is massive....
[...]

Journal Article•

Hybrid GA-SVM for Efficient Feature Selection in E-mail Classification

Fagbola Temitayo, Olabiyisi Stephen, Adigun Abimbola

28 Feb 2012-Computer Engineering and Intelligent Systems

TL;DR: A Genetic Algorithm-Support Vector Machine (GA-SVM) feature selection technique is developed to optimize the SVM classification parameters, the prediction accuracy and computation time, and the SpamAssassin dataset was used to validate the performance of the proposed hybrid.

Abstract: Feature selection is a problem of global combinatorial optimization in machine learning in which subsets of relevant features are selected to realize robust learning models. The inclusion of irrelevant and redundant features in the dataset can result in poor predictions and high computational overhead. Thus, selecting relevant feature subsets can help reduce the computational cost of feature measurement, speed up learning process and improve model interpretability. SVM classifier has proven inefficient in its inability to produce accurate classification results in the face of large e-mail dataset while it also consumes a lot of computational resources. In this study, a Genetic Algorithm-Support Vector Machine (GA-SVM) feature selection technique is developed to optimize the SVM classification parameters, the prediction accuracy and computation time. Spam assassin dataset was used to validate the performance of the proposed system. The hybrid GA-SVM showed remarkable improvements over SVM in terms of classification accuracy and computation time.

36 citations

Cites result from "Spam Filtering using Support Vector..."

...The result of the study by Priyanka et al. (2010) on SVM for large dataset classification showed that SVM is time and memory consuming when size of data is enormous....
[...]
...These results confirmed the results obtained by Priyanka et al. (2010) and Andrew (2010) on SVM drawbacks; and also the work of Ishibuchi & Nakashima (2000) and Chandra & Nandhini (2010) on GA’s ability to find an optimal set of feature weights that improve classification rate, and as an effective…...
[...]

Proceedings Article•DOI•

Performance Evaluation of Machine Learning Algorithms for Email Spam Detection

S. Nandhini¹, Dr.Jeen Marseline.K.S¹•Institutions (1)

Sri Krishna Arts and Science College¹

01 Feb 2020

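TL;DR: Five important machine learning classification algorithms, viz. Logistic Regression, Decision Tree, Naive Bayes, KNN and SVM, are evaluated in order to train and build an effective machine learning model for email spam detection.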

Abstract: Sending huge number of unwanted mails causes security threat to users. In spite of various security approaches, spammers cause much vulnerability in the internet. This paper discusses the efficient methods of using some of the popular algorithms for building a machine learning model which can classify whether a mail is a spam or ham. UCI Machine Learning Repository Spambase Data Set is used for the experiment. The performance of five important machine learning classification algorithms viz. Logistic Regression, Decision Tree, Naive Bayes, KNN and SVM are evaluated in order to train and build an effective machine learning model for email spam detection. Weka tool is used for training and testing the data set.

23 citations

References

Book•

Convex Optimization

Stephen Boyd¹, Lieven Vandenberghe²•Institutions (2)

Stanford University¹, University of California, Los Angeles²

01 Mar 2004

TL;DR: A comprehensive introduction to convex optimization, with a focus on recognizing convex optimization problems and then finding the most appropriate technique for solving them.

Abstract: Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.

33,341 citations

Book•

An Introduction to Support Vector Machines and Other Kernel-based Learning Methods

Nello Cristianini¹, John Shawe-Taylor²•Institutions (2)

University of Bristol¹, Royal Holloway, University of London²

01 Jan 2000

TL;DR: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory, and will guide practitioners to updated literature, new applications, and on-line software.

Abstract: From the publisher: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory. SVMs deliver state-of-the-art performance in real-world applications such as text categorisation, hand-written character recognition, image classification, biosequences analysis, etc., and are now established as one of the standard tools for machine learning and data mining. Students will find the book both stimulating and accessible, while practitioners will be guided smoothly through the material required for a good grasp of the theory and its applications. The concepts are introduced gradually in accessible and self-contained stages, while the presentation is rigorous and thorough. Pointers to relevant literature and web sites containing software ensure that it forms an ideal starting point for further study. Equally, the book and its associated web site will guide practitioners to updated literature, new applications, and on-line software.

13,736 citations

"Spam Filtering using Support Vector..." refers methods in this paper

...Support Vector Machine (SVM) [13, 14] has been recently proposed by Dr....
[...]

Book•DOI•

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

Bernhard Schölkopf¹, Alexander J. Smola•Institutions (1)

Max Planck Society¹

01 Dec 2001

TL;DR: Learning with Kernels provides an introduction to SVMs and related kernel methods that provide all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms.

Abstract: From the Publisher: In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs, kernels, for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics. Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.

7,880 citations

The Nature of Statistical Learning Theory

Vladimir Vapnik

01 Jan 1995

4,292 citations

"Spam Filtering using Support Vector..." refers methods in this paper

...Support Vector Machine (SVM) [13, 14] has been recently proposed by Dr....
[...]

Journal Article•DOI•

Twin Support Vector Machines for Pattern Classification

Jayadeva¹, Reshma Khemchandani, Suresh Chandra¹•Institutions (1)

Indian Institutes of Technology¹

01 May 2007-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A binary SVM classifier that determines two nonparallel planes by solving two related SVM-type problems, each of which is smaller than in a conventional SVM, which shows good generalization on several benchmark data sets.

Abstract: We propose twin SVM, a binary SVM classifier that determines two nonparallel planes by solving two related SVM-type problems, each of which is smaller than in a conventional SVM. The twin SVM formulation is in the spirit of proximal SVMs via generalized eigenvalues. On several benchmark data sets, Twin SVM is not only fast, but shows good generalization. Twin SVM is also useful for automatically discovering two-dimensional projections of the data.

1,501 citations