Conventional and Ontology Based Spam Filtering

doi:10.1109/ICETIETR.2018.8529061

Citations

Dissertation

Detection of Phishing and Spam Emails Using Ensemble Technique

Michael Oluwasegun Akinrele

12 Dec 2019

TL;DR: This study proposed an ensemble approach for phishing and spam filter-based feature selection methods with the goal to lower the feature space dimensionality and increase the accuracy of spam and phishing review classification.

Abstract: Most cyber breaches today stem from fraudulent activity. Phishers and spammers continually devise new and hybrid techniques to circumvent available software and countermeasures, which leaves every organization under constant threat. Among the approaches developed to stop email spam and phishing, filtering is a popular and important one. Email filters are commonly used to organize incoming email and remove spam, while phishing is detected by validating the email body, URLs, etc. In this study, we propose an ensemble approach to feature selection for phishing and spam filters, with the goal of lowering the dimensionality of the feature space and increasing the accuracy of spam and phishing review classification. We collected several public datasets and trained machine learning (ML) models based on mRMR (Minimum Redundancy Maximum Relevance) as well as ensemble models. Experimental results with seven classifiers show an average accuracy of 83%, indicating that the feature selector improves the performance of spam and phishing classifiers. The approach can help mitigate future email cyber-attacks and leaves scope for future research and expansion.
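The abstract names mRMR without giving details. As a rough illustration only, the sketch below shows one common greedy variant (relevance minus mean redundancy, both measured with mutual information on discrete features). The function names `mutual_info` and `mrmr`, the toy data, and the choice of this particular variant are assumptions, not the dissertation's implementation.

```python
from collections import Counter
from math import log2


def mutual_info(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr(columns, labels, keep):
    """Greedily pick `keep` column indices by relevance minus mean redundancy."""
    relevance = [mutual_info(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_info(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < keep:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): feature 1 duplicates feature 0, feature 2 is independent of it.
f0 = [0, 0, 1, 1]
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
labels = [0, 0, 0, 1]
print(mrmr([f0, f1, f2], labels, keep=2))
```

In this toy case the duplicate column is penalized for redundancy, so the second pick is the independent feature rather than the copy.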

Cites methods from "Conventional and Ontology Based Spa..."

...[17] focused on the user profile classification created by ontology in spam filtering based on ontology....
[...]

References

Journal Article

Review: A review of machine learning approaches to Spam filtering

Thiago Guzella¹, Walmir Matos Caminhas¹•Institutions (1)

Universidade Federal de Minas Gerais¹

01 Sep 2009-Expert Systems With Applications

TL;DR: A comprehensive review of recent developments in the application of machine learning algorithms to Spam filtering, focusing on both textual- and image-based approaches concludes that while important advancements have been made in the last years, several aspects remain to be explored, especially under more realistic evaluation settings.

Abstract: In this paper, we present a comprehensive review of recent developments in the application of machine learning algorithms to Spam filtering, focusing on both textual- and image-based approaches. Instead of considering Spam filtering as a standard classification problem, we highlight the importance of considering specific characteristics of the problem, especially concept drift, in designing new filters. Two particularly important aspects not widely recognized in the literature are discussed: the difficulties in updating a classifier based on the bag-of-words representation and a major difference between two early naive Bayes models. Overall, we conclude that while important advancements have been made in the last years, several aspects remain to be explored, especially under more realistic evaluation settings.

468 citations

"Conventional and Ontology Based Spa..." refers background in this paper

...In [3], the authors present a comprehensive review of recent developments in the application of machine learning algorithms to spam filtering....
[...]

Book

Email Spam Filtering: A Systematic Review

Gordon V. Cormack¹•Institutions (1)

University of Waterloo¹

23 Jun 2008

TL;DR: This work examines the definition of spam, the user's information requirements and the role of the spam filter as one component of a large and complex information universe, and outlines several uncertainties and proposes experimental methods to address them.

Abstract: Spam is information crafted to be delivered to a large number of recipients, in spite of their wishes. A spam filter is an automated tool to recognize spam so as to prevent its delivery. The purposes of spam and spam filters are diametrically opposed: spam is effective if it evades filters, while a filter is effective if it recognizes spam. The circular nature of these definitions, along with their appeal to the intent of sender and recipient, makes them difficult to formalize. A typical email user has a working definition no more formal than "I know it when I see it." Yet, current spam filters are remarkably effective, more effective than might be expected given the level of uncertainty and debate over a formal definition of spam, more effective than might be expected given the state-of-the-art information retrieval and machine learning methods for seemingly similar problems. But are they effective enough? Which are better? How might they be improved? Will their effectiveness be compromised by more cleverly crafted spam? We survey current and proposed spam filtering techniques with particular emphasis on how well they work. Our primary focus is spam filtering in email; similarities and differences with spam filtering in other communication and storage media, such as instant messaging and the Web, are addressed peripherally. In doing so we examine the definition of spam, the user's information requirements and the role of the spam filter as one component of a large and complex information universe. Well-known methods are detailed sufficiently to make the exposition self-contained; however, the focus is on considerations unique to spam. Comparisons, wherever possible, use common evaluation measures, and control for differences in experimental setup. Such comparisons are not easy, as benchmarks, measures, and methods for evaluating spam filters are still evolving. We survey these efforts, their results and their limitations.
In spite of recent advances in evaluation methodology, many uncertainties (including widely held but unsubstantiated beliefs) remain as to the effectiveness of spam filtering techniques and as to the validity of spam filter evaluation methods. We outline several uncertainties and propose experimental methods to address them.

259 citations

"Conventional and Ontology Based Spa..." refers background in this paper

...There are several negative consequences of spam [10]....
[...]

Journal Article

A Novel Feature Selection Based on One-Way ANOVA F-Test for E-Mail Spam Classification

Nadir Omer Fadl Elssied¹, Othman Ibrahim, Ahmed Osman¹•Institutions (1)

Universiti Teknologi Malaysia¹

20 Jan 2014-Research Journal of Applied Sciences, Engineering and Technology

TL;DR: Experimental results on spam base English datasets showed that the enhanced SVM (FSSVM) significantly outperforms SVM and many other recent spam classification methods for English dataset in terms of computational complexity and dimension reduction.

Abstract: Spam is commonly defined as unwanted e-mail, and it has become a global threat to e-mail users. Although Support Vector Machine (SVM) has been commonly used in e-mail spam classification, the problem of high dimensionality of the feature space, caused by the massive number of e-mails and features, still exists. To address this limitation of SVM, reduce computational complexity (efficiency), and enhance classification accuracy (effectiveness), this study applies feature selection based on one-way ANOVA F-test statistics to determine the most important features contributing to e-mail spam classification. The selection step reduces the high dimensionality of the feature space before the classification process. The proposed scheme was evaluated on the well-known Spambase benchmark dataset to assess its feasibility. The comparison covers different datasets, categorization algorithms, and success measures. In addition, experimental results on the English Spambase dataset showed that the enhanced SVM (FSSVM) significantly outperforms SVM and many other recent spam classification methods for English datasets in terms of computational complexity and dimension reduction.
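To make the selection step concrete, here is a minimal sketch of scoring features with the one-way ANOVA F statistic (between-class variance over within-class variance) and keeping the top-ranked ones. The helper names `anova_f` and `select_top_features` and the toy rows are illustrative assumptions. The paper's actual pipeline and its SVM stage are not reproduced here.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = mean(values)
    # Variation of class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_features(rows, labels, keep):
    """Rank feature columns by F statistic and return the `keep` best indices."""
    n_features = len(rows[0])
    scores = [anova_f([row[j] for row in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:keep], scores


# Toy data (hypothetical): only the first column separates the two classes.
rows = [[5.0, 1.0, 0.2], [6.0, 0.0, 0.1], [0.0, 1.0, 0.2], [1.0, 0.0, 0.1]]
labels = ["spam", "spam", "ham", "ham"]
selected, scores = select_top_features(rows, labels, keep=1)
print(selected, scores)
```

The reduced column set would then be passed to whichever classifier follows. The paper pairs this kind of selection with an SVM.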

88 citations

"Conventional and Ontology Based Spa..." refers methods in this paper

...In [4], the authors proposed a novel spam detection scheme by using a combination of feature selection based on one-way ANOVA F-test statistics and Support Vector Machine....
[...]

Proceedings Article

Classifying Spam Emails Using Text and Readability Features

Rushdi Shams¹, Robert E. Mercer¹•Institutions (1)

University of Western Ontario¹

01 Dec 2013

TL;DR: A novel spam classification method that uses features based on email content-language and readability combined with the previously used content-based task features to outperform a number of state-of-the-art methods proposed in previous studies.

Abstract: Supervised machine learning methods for classifying spam emails are long-established. Most of these methods use either header-based or content-based features. Spammers, however, can bypass these methods easily, especially the ones that deal with header features. In this paper, we report a novel spam classification method that uses features based on email content-language and readability combined with the previously used content-based task features. The features are extracted from four benchmark datasets viz. CSDMC2010, Spam Assassin, Ling Spam, and Enron-Spam. We use five well-known algorithms to induce our spam classifiers: Random Forest (RF), BAGGING, ADABOOSTM1, Support Vector Machine (SVM), and Naive Bayes (NB). We evaluate the classifier performances and find that BAGGING performs the best. Moreover, its performance surpasses that of a number of state-of-the-art methods proposed in previous studies. Although applied only to English language emails, the results indicate that our method may be an excellent means to classify spam emails in other languages, as well.

62 citations

Additional excerpts

...In [5], the authors reported a novel classification method that uses features based on email content-language and readability combined with the previously used content based task features....
[...]

Proceedings Article

Ontology-based classification of email

Kazem Taghva, Julie Borsack, Jeffrey Coombs, Allen Condit, Steven E. Lumos, Thomas A. Nartker

28 Apr 2003

TL;DR: The construction of an ontology that applies rules for identification of features to be used for email classification and the associated probabilities for these features are calculated from the training set of emails and used as a part of the feature vectors for an underlying Bayesian classifier.

Abstract: We report on the construction of an ontology that applies rules for identification of features to be used for email classification. The associated probabilities for these features are then calculated from the training set of emails and used as a part of the feature vectors for an underlying Bayesian classifier.
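The abstract describes a pipeline of rule-identified features whose probabilities are estimated from training emails and fed to a Bayesian classifier. The sketch below illustrates that general shape only. The three keyword rules in `RULES` are hypothetical stand-ins for ontology-derived rules, and a plain naive Bayes with Laplace smoothing is assumed in place of the paper's unspecified classifier details.

```python
from collections import defaultdict
from math import log

# Hypothetical rules standing in for ontology-derived feature detectors.
RULES = {
    "mentions_money": lambda t: any(w in t for w in ("$", "prize", "loan")),
    "has_url": lambda t: "http://" in t or "https://" in t,
    "mentions_meeting": lambda t: any(w in t for w in ("meeting", "agenda")),
}


def extract(text):
    """Apply every rule to the lowercased text; returns {feature: bool}."""
    lowered = text.lower()
    return {name: rule(lowered) for name, rule in RULES.items()}


def train(emails):
    """Estimate class priors and P(feature fires | class) from (text, label) pairs."""
    class_counts = defaultdict(int)
    fired_counts = defaultdict(lambda: defaultdict(int))
    for text, label in emails:
        class_counts[label] += 1
        for name, fired in extract(text).items():
            if fired:
                fired_counts[label][name] += 1
    total = sum(class_counts.values())
    priors = {label: count / total for label, count in class_counts.items()}
    # Laplace smoothing keeps every probability strictly between 0 and 1.
    probs = {
        label: {
            name: (fired_counts[label][name] + 1) / (class_counts[label] + 2)
            for name in RULES
        }
        for label in class_counts
    }
    return priors, probs


def classify(text, priors, probs):
    """Return the class with the highest naive Bayes log-posterior."""
    features = extract(text)
    scores = {}
    for label, prior in priors.items():
        score = log(prior)
        for name, fired in features.items():
            p = probs[label][name]
            score += log(p if fired else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)


training = [
    ("Claim your prize at http://x.example", "spam"),
    ("Cheap loan offer http://y.example", "spam"),
    ("Agenda for Monday meeting", "ham"),
    ("Meeting notes attached", "ham"),
]
priors, probs = train(training)
print(classify("You won a prize, see https://z.example", priors, probs))
```

Keeping the rules in a separate table mirrors the separation the abstract implies: the ontology decides which features exist, while the training set only supplies their probabilities.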

36 citations

"Conventional and Ontology Based Spa..." refers background in this paper

...In [8], the construction of an ontology that applies rules for identification of features to be used for email classification is discussed....
[...]
