Journal ArticleDOI

Spam Filtering using Support Vector Machine

01 Oct 2010-Vol. 1, Iss: 4, pp 256-261
TL;DR: This paper evaluates the performance of Non Linear SVM based classifiers with various kernel functions over Enron Dataset and finds them to be good candidates for spam classification.
Abstract: Traditional anti-spam techniques such as black and white lists are not up to the mark in the current scenario. The goal of spam classification is to distinguish between spam and legitimate mail messages. But with the popularization of the Internet, it is challenging to develop spam filters that can effectively eliminate the increasing volume of unwanted mail automatically before it enters a user's mailbox. Many researchers have been trying to separate spam from legitimate emails using machine learning algorithms based on statistical learning methods. In this paper, we evaluate the performance of non-linear SVM based classifiers with various kernel functions over the Enron dataset.


Citations
Journal ArticleDOI
01 Jun 2019-Heliyon
TL;DR: A systematic review of some of the popular machine learning based email spam filtering approaches, which recommends deep learning and deep adversarial learning as the future techniques that can effectively handle the menace of spam emails.

267 citations


Additional excerpts

  • ...According to [134], SVM is a good classifier due to its sparse data format and satisfactory recall and precision value....

    [...]

Journal ArticleDOI
TL;DR: This paper aims to provide a comprehensive overview of the challenges that ML techniques face in protecting cyberspace against attacks, by presenting a literature on ML techniques for cyber security including intrusion detection, spam detection, and malware detection on computer networks and mobile networks in the last decade.
Abstract: Pervasive growth and usage of the Internet and mobile applications have expanded cyberspace. The cyberspace has become more vulnerable to automated and prolonged cyberattacks. Cyber security techniques provide enhancements in security measures to detect and react against cyberattacks. The previously used security systems are no longer sufficient because cybercriminals are smart enough to evade conventional security systems. Conventional security systems lack efficiency in detecting previously unseen and polymorphic security attacks. Machine learning (ML) techniques are playing a vital role in numerous applications of cyber security. However, despite the ongoing success, there are significant challenges in ensuring the trustworthiness of ML systems. There are incentivized malicious adversaries present in the cyberspace that are willing to game and exploit such ML vulnerabilities. This paper aims to provide a comprehensive overview of the challenges that ML techniques face in protecting cyberspace against attacks, by presenting a literature on ML techniques for cyber security including intrusion detection, spam detection, and malware detection on computer networks and mobile networks in the last decade. It also provides brief descriptions of each ML method, frequently used security datasets, essential ML tools, and evaluation metrics to evaluate a classification model. It finally discusses the challenges of using ML techniques in cyber security. This paper provides the latest extensive bibliography and the current trends of ML in cyber security.

135 citations

Journal ArticleDOI
TL;DR: Experimental results on spam base English datasets showed that the enhanced SVM (FSSVM) significantly outperforms SVM and many other recent spam classification methods for English dataset in terms of computational complexity and dimension reduction.
Abstract: Spam is commonly defined as unwanted e-mail, and it has become a global threat to e-mail users. Although Support Vector Machine (SVM) has been commonly used in e-mail spam classification, the problem of high data dimensionality of the feature space, due to the massive number of e-mails and features, still exists. This study aims to address this limitation of SVM by reducing the computational complexity (efficiency) and enhancing the classification accuracy (effectiveness). Feature selection based on a one-way ANOVA F-test statistics scheme was applied to determine the most important features contributing to e-mail spam classification, and is used to reduce the high data dimensionality of the feature space before the classification process. The experiment of the proposed scheme was carried out on the well-known Spambase benchmark dataset to evaluate the feasibility of the proposed method. The comparison is made across different datasets, categorization algorithms and success measures. In addition, experimental results on the Spambase English dataset showed that the enhanced SVM (FSSVM) significantly outperforms SVM and many other recent spam classification methods for English datasets in terms of computational complexity and dimension reduction.
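
As a rough illustration of the feature-selection step described in this abstract, the sketch below computes a one-way ANOVA F statistic per feature and keeps the highest-scoring ones. It is not the authors' FSSVM implementation: the function names `anova_f` and `select_top_features`, the toy data, and the choice to rank by raw F value are all assumptions made for illustration.

```python
from collections import defaultdict


def anova_f(groups):
    """One-way ANOVA F statistic for one feature.

    groups: list of lists, the feature's values split by class label.
    """
    k = len(groups)                        # number of classes
    n = sum(len(g) for g in groups)        # total number of samples
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the samples around their own class mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_features(X, y, top_k):
    """Return indices of the top_k features ranked by F statistic (hypothetical helper)."""
    scores = []
    for j in range(len(X[0])):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row[j])
        scores.append(anova_f(list(by_class.values())))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:top_k]


# Toy data (assumed): feature 0 separates the classes, feature 1 does not.
X = [[1, 5.0], [2, 5.1], [3, 4.9], [4, 5.0], [5, 5.2], [6, 4.8]]
y = [0, 0, 0, 1, 1, 1]
print(select_top_features(X, y, top_k=1))
```

Only the selected columns would then be passed to the SVM, which is where the reduction in dimensionality and training cost comes from.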

88 citations


Cites background or methods or result from "Spam Filtering using Support Vector..."

  • ...It can be used to solve Linearly separable as well as non-linear separable problems (Chhabra et al., 2010; Fagbola et al., 2012)....

    [...]

  • ...There are many kernel-based functions such as linear kernel function, the normalized poly kernel, polynomial kernel function, Radial Basis Function (RBF) or Gaussian Kernel and Hyperbolic Tangent (Sigmoid) Kernel sigmoid function can be implemented in SVM (Chhabra et al., 2010)....

    [...]

  • ...Experiment dataset: There are various benchmark datasets available for researchers related to e-mail classification (Chhabra et al., 2010)....

    [...]

  • ...…features and classify spam mails such as Support Vector Machine (SVM), Particle Swarm Optimization (PSO), Naïve Bayesian (NB) and Feature Selection algorithm (FS) (Chhabra et al., 2010; Golovko et al., 2010; Ma et al., 2009; Mohammad and Zitar, 2011; Salcedo-Campos et al., 2012; Wu et al., 2008)....

    [...]

  • ...The result of the study by Priyanka et al. (Chhabra et al., 2010) on SVM for massive data classification showed that SVM takes time consuming when the size of data is massive....

    [...]
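
The kernel functions named in the excerpts above can be sketched in plain Python as follows. This is a minimal illustration, not code from either paper: the parameter names (`degree`, `coef0`, `gamma`) and their default values are assumptions, and the normalized polynomial kernel is written in its usual form K(x, y) / sqrt(K(x, x) K(y, y)).

```python
import math


def linear_kernel(x, y):
    """Dot product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(<x, y> + coef0) ** degree."""
    return (linear_kernel(x, y) + coef0) ** degree


def normalized_poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel rescaled so that K(x, x) == 1."""
    kxy = polynomial_kernel(x, y, degree, coef0)
    kxx = polynomial_kernel(x, x, degree, coef0)
    kyy = polynomial_kernel(y, y, degree, coef0)
    return kxy / math.sqrt(kxx * kyy)


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=0.1, coef0=0.0):
    """Hyperbolic tangent kernel: tanh(gamma * <x, y> + coef0)."""
    return math.tanh(gamma * linear_kernel(x, y) + coef0)


# Two assumed toy feature vectors, e.g. term counts for two e-mails.
u, v = [1.0, 2.0], [3.0, 4.0]
for kernel in (linear_kernel, polynomial_kernel, normalized_poly_kernel,
               rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(u, v))
```

Swapping the kernel changes the shape of the decision boundary without changing the SVM training procedure, which is what makes a kernel-by-kernel comparison on a fixed dataset possible.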

Journal Article
TL;DR: A Genetic Algorithm-Support Vector Machine (GA-SVM) feature selection technique is developed to optimize the SVM classification parameters, the prediction accuracy and computation time and Spam assassin dataset was used to validate the performance of the proposed hybrids.
Abstract: Feature selection is a problem of global combinatorial optimization in machine learning in which subsets of relevant features are selected to realize robust learning models. The inclusion of irrelevant and redundant features in the dataset can result in poor predictions and high computational overhead. Thus, selecting relevant feature subsets can help reduce the computational cost of feature measurement, speed up learning process and improve model interpretability. SVM classifier has proven inefficient in its inability to produce accurate classification results in the face of large e-mail dataset while it also consumes a lot of computational resources. In this study, a Genetic Algorithm-Support Vector Machine (GA-SVM) feature selection technique is developed to optimize the SVM classification parameters, the prediction accuracy and computation time. Spam assassin dataset was used to validate the performance of the proposed system. The hybrid GA-SVM showed remarkable improvements over SVM in terms of classification accuracy and computation time.

36 citations


Cites result from "Spam Filtering using Support Vector..."

  • ...The result of the study by Priyanka et al. (2010) on SVM for large dataset classification showed that SVM is time and memory consuming when size of data is enormous....

    [...]

  • ...These results confirmed the results obtained by Priyanka et al. (2010) and Andrew (2010) on SVM drawbacks; and also the work of Ishibuchi & Nakashima (2000) and Chandra & Nandhini (2010) on GA’s ability to find an optimal set of feature weights that improve classification rate, and as an effective…...

    [...]

Proceedings ArticleDOI
01 Feb 2020
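TL;DR: Five important machine learning classification algorithms, viz. Logistic Regression, Decision Tree, Naive Bayes, KNN and SVM, are evaluated for email spam detection.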
Abstract: Sending a huge number of unwanted mails poses a security threat to users. In spite of various security approaches, spammers cause many vulnerabilities on the internet. This paper discusses efficient methods of using some of the popular algorithms for building a machine learning model which can classify whether a mail is spam or ham. The UCI Machine Learning Repository Spambase Data Set is used for the experiment. The performance of five important machine learning classification algorithms, viz. Logistic Regression, Decision Tree, Naive Bayes, KNN and SVM, is evaluated in order to train and build an effective machine learning model for email spam detection. The Weka tool is used for training and testing the data set.

23 citations

References
Book
01 Mar 2004
TL;DR: A comprehensive introduction to convex optimization, with a focus on recognizing convex optimization problems and then finding the most appropriate technique for solving them.
Abstract: Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.

33,341 citations

Book
01 Jan 2000
TL;DR: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory, and will guide practitioners to updated literature, new applications, and on-line software.
Abstract: From the publisher: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory. SVMs deliver state-of-the-art performance in real-world applications such as text categorisation, hand-written character recognition, image classification, biosequences analysis, etc., and are now established as one of the standard tools for machine learning and data mining. Students will find the book both stimulating and accessible, while practitioners will be guided smoothly through the material required for a good grasp of the theory and its applications. The concepts are introduced gradually in accessible and self-contained stages, while the presentation is rigorous and thorough. Pointers to relevant literature and web sites containing software ensure that it forms an ideal starting point for further study. Equally, the book and its associated web site will guide practitioners to updated literature, new applications, and on-line software.

13,736 citations


"Spam Filtering using Support Vector..." refers methods in this paper

  • ...Support Vector Machine (SVM) [13, 14] has been recently proposed by Dr....

    [...]

BookDOI
01 Dec 2001
TL;DR: Learning with Kernels provides an introduction to SVMs and related kernel methods that provide all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms.
Abstract: From the Publisher: In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs, kernels, for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics. Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.

7,880 citations

01 Jan 1995

4,292 citations


"Spam Filtering using Support Vector..." refers methods in this paper

  • ...Support Vector Machine (SVM) [13, 14] has been recently proposed by Dr....

    [...]

Journal ArticleDOI
TL;DR: A binary SVM classifier that determines two nonparallel planes by solving two related SVM-type problems, each of which is smaller than in a conventional SVM, which shows good generalization on several benchmark data sets.
Abstract: We propose twin SVM, a binary SVM classifier that determines two nonparallel planes by solving two related SVM-type problems, each of which is smaller than in a conventional SVM. The twin SVM formulation is in the spirit of proximal SVMs via generalized eigenvalues. On several benchmark data sets, twin SVM is not only fast, but shows good generalization. Twin SVM is also useful for automatically discovering two-dimensional projections of the data.

1,501 citations