Proceedings ArticleDOI

Conventional and Ontology Based Spam Filtering

TL;DR: The ontology based spam filtering and conventional spam filtering are compared and the results show that the former is superior to the latter.
Abstract: Email is inevitable in this modern era and has become an effective tool in the communication field. Because of its ease of use, the number of users is increasing day by day. With the increased number of email users, the volume of spam mail has also increased. Spam can cause great loss to users. Many spam filtering techniques have been introduced to distinguish between ham and spam. A mail that appears as spam to one user may appear as ham to another, and vice versa; that is, the classification depends on the personal preference of the user. Therefore, ontology-based personalized mail access is necessary. This paper compares ontology-based spam filtering with conventional spam filtering.
Citations
Dissertation
12 Dec 2019
TL;DR: This study proposed an ensemble approach for phishing and spam filter-based feature selection methods with the goal to lower the feature space dimensionality and increase the accuracy of spam and phishing review classification.
Abstract: Most of the cyber breaches in the world today are based on fraudulent activities. Phishers and spammers continually come up with new and hybrid techniques to circumvent the available software and techniques, which shows that all organizations face a continuous threat. Among the approaches developed to stop email spam and phishing, filtering is a popular and important one. Common uses of email filters include organizing incoming emails and removing spam, while phishing is detected by validating the email body, URLs, etc. In this study, we propose an ensemble approach for phishing and spam filter-based feature selection methods with the goal of lowering the feature space dimensionality and increasing the accuracy of spam and phishing review classification. We collected different public datasets and trained Machine Learning (ML) based mRMR (Minimum Redundancy Maximum Relevance) models and ensemble models. Experimental results with seven classifiers show an average accuracy of 83%, indicating that the feature selector improves the performance of spam and phishing classifiers. And can legitimate future email cyber-attacks with a scope for future research and expansion.

Cites methods from "Conventional and Ontology Based Spa..."

  • ...[17] focused on the user profile classification created by ontology in spam filtering based on ontology....

    [...]

References
Journal ArticleDOI
TL;DR: A comprehensive review of recent developments in the application of machine learning algorithms to Spam filtering, focusing on both textual- and image-based approaches concludes that while important advancements have been made in the last years, several aspects remain to be explored, especially under more realistic evaluation settings.
Abstract: In this paper, we present a comprehensive review of recent developments in the application of machine learning algorithms to Spam filtering, focusing on both textual- and image-based approaches. Instead of considering Spam filtering as a standard classification problem, we highlight the importance of considering specific characteristics of the problem, especially concept drift, in designing new filters. Two particularly important aspects not widely recognized in the literature are discussed: the difficulties in updating a classifier based on the bag-of-words representation and a major difference between two early naive Bayes models. Overall, we conclude that while important advancements have been made in the last years, several aspects remain to be explored, especially under more realistic evaluation settings.

468 citations


"Conventional and Ontology Based Spa..." refers background in this paper

  • ...In [3], the authors present a comprehensive review of recent developments in the application of machine learning algorithms to spam filtering....

    [...]

Book
23 Jun 2008
TL;DR: This work examines the definition of spam, the user's information requirements and the role of the spam filter as one component of a large and complex information universe, and outlines several uncertainties and proposes experimental methods to address them.
Abstract: Spam is information crafted to be delivered to a large number of recipients, in spite of their wishes. A spam filter is an automated tool to recognize spam so as to prevent its delivery. The purposes of spam and spam filters are diametrically opposed: spam is effective if it evades filters, while a filter is effective if it recognizes spam. The circular nature of these definitions, along with their appeal to the intent of sender and recipient, makes them difficult to formalize. A typical email user has a working definition no more formal than "I know it when I see it." Yet current spam filters are remarkably effective, more effective than might be expected given the level of uncertainty and debate over a formal definition of spam, and more effective than might be expected given the state-of-the-art information retrieval and machine learning methods for seemingly similar problems. But are they effective enough? Which are better? How might they be improved? Will their effectiveness be compromised by more cleverly crafted spam? We survey current and proposed spam filtering techniques with particular emphasis on how well they work. Our primary focus is spam filtering in email; similarities and differences with spam filtering in other communication and storage media, such as instant messaging and the Web, are addressed peripherally. In doing so we examine the definition of spam, the user's information requirements and the role of the spam filter as one component of a large and complex information universe. Well-known methods are detailed sufficiently to make the exposition self-contained; however, the focus is on considerations unique to spam. Comparisons, wherever possible, use common evaluation measures, and control for differences in experimental setup. Such comparisons are not easy, as benchmarks, measures, and methods for evaluating spam filters are still evolving. We survey these efforts, their results and their limitations.
In spite of recent advances in evaluation methodology, many uncertainties (including widely held but unsubstantiated beliefs) remain as to the effectiveness of spam filtering techniques and as to the validity of spam filter evaluation methods. We outline several uncertainties and propose experimental methods to address them.

259 citations


"Conventional and Ontology Based Spa..." refers background in this paper

  • ...There are several negative consequences of spam [10]....

    [...]

Journal ArticleDOI
TL;DR: Experimental results on Spambase English datasets showed that the enhanced SVM (FSSVM) significantly outperforms SVM and many other recent spam classification methods for English datasets in terms of computational complexity and dimension reduction.
Abstract: Spam is commonly defined as unwanted e-mail, and it has become a global threat to e-mail users. Although Support Vector Machine (SVM) has been commonly used in e-mail spam classification, the problem of high dimensionality of the feature space, due to the massive number of e-mails and features in the dataset, still exists. To address this limitation of SVM, reduce the computational complexity (efficiency), and enhance the classification accuracy (effectiveness), this study applies feature selection based on one-way ANOVA F-test statistics to determine the most important features contributing to e-mail spam classification. This feature selection is used to reduce the high dimensionality of the feature space before the classification process. The proposed scheme was evaluated on the well-known Spambase benchmark dataset to assess its feasibility. The comparison is made across different datasets, categorization algorithms, and success measures. In addition, experimental results on Spambase English datasets showed that the enhanced SVM (FSSVM) significantly outperforms SVM and many other recent spam classification methods for English datasets in terms of computational complexity and dimension reduction.
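
The feature-selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy feature matrix, the labels, and the helper names `anova_f_score` and `select_top_features` are hypothetical, and the SVM stage that would consume the selected features is omitted. The sketch only shows how a one-way ANOVA F statistic can rank features before classification.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of one feature across the classes in labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand = mean(values)
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_features(rows, labels, top_k):
    """Return (indices of the top_k features by F score, all F scores)."""
    n_features = len(rows[0])
    scores = [anova_f_score([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:top_k], scores


# Hypothetical toy data: columns are relative frequencies of "free",
# "meeting", and an uninformative token (assumed for illustration only).
rows = [
    [0.9, 0.0, 0.5],
    [0.8, 0.1, 0.4],
    [0.7, 0.0, 0.6],
    [0.1, 0.6, 0.5],
    [0.0, 0.7, 0.4],
    [0.2, 0.8, 0.6],
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

selected, scores = select_top_features(rows, labels, top_k=2)
print(selected, scores)
```

In this toy data the third column has the same distribution in both classes, so its F score is zero and it is dropped; only the two retained columns would then be passed to a classifier such as an SVM.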

88 citations


"Conventional and Ontology Based Spa..." refers methods in this paper

  • ...In [4], the authors proposed a novel spam detection scheme by using a combination of feature selection based on one-way ANOVA F-test statistics and Support Vector Machine....

    [...]

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A novel spam classification method that uses features based on email content-language and readability combined with the previously used content-based task features to outperform a number of state-of-the-art methods proposed in previous studies.
Abstract: Supervised machine learning methods for classifying spam emails are long-established. Most of these methods use either header-based or content-based features. Spammers, however, can bypass these methods easily, especially the ones that deal with header features. In this paper, we report a novel spam classification method that uses features based on email content-language and readability combined with the previously used content-based task features. The features are extracted from four benchmark datasets, viz. CSDMC2010, Spam Assassin, Ling Spam, and Enron-Spam. We use five well-known algorithms to induce our spam classifiers: Random Forest (RF), BAGGING, ADABOOSTM1, Support Vector Machine (SVM), and Naive Bayes (NB). We evaluate the classifier performances and find that BAGGING performs the best. Moreover, its performance surpasses that of a number of state-of-the-art methods proposed in previous studies. Although applied only to English language emails, the results indicate that our method may be an excellent means to classify spam emails in other languages as well.

62 citations


Additional excerpts

  • ...In [5], the authors reported a novel classification method that uses features based on email content-language and readability combined with the previously used content based task features....

    [...]

Proceedings ArticleDOI
28 Apr 2003
TL;DR: This work reports the construction of an ontology that applies rules for identification of features to be used for email classification; the associated probabilities for these features are calculated from the training set of emails and used as a part of the feature vectors for an underlying Bayesian classifier.
Abstract: We report on the construction of an ontology that applies rules for identification of features to be used for email classification. The associated probabilities for these features are then calculated from the training set of emails and used as a part of the feature vectors for an underlying Bayesian classifier.
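
A rough sketch of the pipeline this abstract outlines follows: rules derived from an ontology identify features, their probabilities are estimated from training emails, and a Bayesian classifier uses them. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three-concept `ONTOLOGY`, the "any term present" rule, the tiny training set, and the choice of a Bernoulli naive Bayes model with add-one smoothing.

```python
import math

# Hypothetical mini-ontology: concept -> terms that signal it (illustrative only).
ONTOLOGY = {
    "finance_offer": {"loan", "credit", "prize"},
    "urgency": {"urgent", "now", "immediately"},
    "work": {"meeting", "report", "project"},
}


def extract_concepts(text):
    """Assumed rule: a concept is present if any of its terms occurs in the email."""
    words = set(text.lower().split())
    return {c for c, terms in ONTOLOGY.items() if words & terms}


def train(emails):
    """Estimate P(class) and P(concept | class) with add-one smoothing."""
    class_counts, concept_counts = {}, {}
    for text, label in emails:
        class_counts[label] = class_counts.get(label, 0) + 1
        for c in extract_concepts(text):
            concept_counts[(label, c)] = concept_counts.get((label, c), 0) + 1
    total = sum(class_counts.values())
    priors = {y: n / total for y, n in class_counts.items()}
    likelihoods = {
        (y, c): (concept_counts.get((y, c), 0) + 1) / (n + 2)
        for y, n in class_counts.items()
        for c in ONTOLOGY
    }
    return priors, likelihoods


def classify(text, priors, likelihoods):
    """Pick the class with the highest log-posterior under Bernoulli naive Bayes."""
    present = extract_concepts(text)
    scores = {}
    for y, prior in priors.items():
        log_p = math.log(prior)
        for c in ONTOLOGY:
            p = likelihoods[(y, c)]
            log_p += math.log(p if c in present else 1.0 - p)
        scores[y] = log_p
    return max(scores, key=scores.get)


# Hypothetical training set (not from the paper).
TRAIN = [
    ("urgent loan offer claim your prize now", "spam"),
    ("credit approved act immediately", "spam"),
    ("project meeting moved to friday", "ham"),
    ("please send the report before the meeting", "ham"),
]

priors, likelihoods = train(TRAIN)
print(classify("claim your prize now", priors, likelihoods))
print(classify("agenda for the project meeting", priors, likelihoods))
```

Working with concepts rather than raw words keeps the feature vector small and lets new surface terms be added under an existing concept without retraining the vocabulary; a personalized filter could, under the same assumptions, keep a separate ontology or separate probabilities per user.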

36 citations


"Conventional and Ontology Based Spa..." refers background in this paper

  • ...In [8], the construction of an ontology that applies rules for identification of features to be used for email classification is discussed....

    [...]