Journal ArticleDOI

A lifelong spam emails classification model

23 Jan 2020 - Applied Computing and Informatics (no longer published by Elsevier)
TL;DR: An enhanced model is proposed to ensure lifelong spam email classification; its overall performance is compared against several other stream-mining classification techniques to demonstrate its success as a lifelong spam email classification method.
About: This article was published in Applied Computing and Informatics on 2020-01-23 and is currently open access. It has received 17 citations to date.
Citations
Journal ArticleDOI
TL;DR: The authors present the consequences of ignoring the dataset shift problem in spam email detection and show that this shift may lead to severe degradation in the estimated generalisation performance, with error rates reaching values of up to 48.81%.
Abstract: Spam emails have traditionally been seen as just annoying, unsolicited emails containing advertisements, but they increasingly include scams, malware or phishing. In order to ensure the security and integrity of users, organisations and researchers aim to develop robust filters for spam email detection. Recently, most spam filters based on machine learning algorithms published in academic journals report very high performance, but users are still reporting a rising number of frauds and attacks via spam emails. Two main challenges can be found in this field: (a) it is a very dynamic environment prone to the dataset shift problem and (b) it suffers from the presence of an adversarial figure, i.e. the spammer. Unlike classical spam email reviews, this one is particularly focused on the problems that this constantly changing environment poses. Moreover, we analyse the different spammer strategies used for contaminating the emails, and we review the state-of-the-art techniques to develop filters based on machine learning. Finally, we empirically evaluate and present the consequences of ignoring the matter of dataset shift in this practical field. Experimental results show that this shift may lead to severe degradation in the estimated generalisation performance, with error rates reaching values up to 48.81%.

15 citations
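The dataset shift described above can be illustrated with a toy experiment. This is not the paper's setup: the single numeric feature, the Gaussian class distributions, and the nearest-centroid classifier are all invented for illustration, but they reproduce the qualitative effect of a classifier trained on one "era" of emails degrading sharply once spammers adapt.

```python
# Sketch: a nearest-centroid classifier trained on one feature distribution
# degrades badly when that distribution shifts (all data invented).
import random

random.seed(0)

def sample(mean_spam, mean_ham, n=500):
    # One numeric feature per email; label 1 = spam, 0 = ham.
    data = [(random.gauss(mean_spam, 1.0), 1) for _ in range(n)]
    data += [(random.gauss(mean_ham, 1.0), 0) for _ in range(n)]
    return data

train = sample(mean_spam=3.0, mean_ham=0.0)        # original distribution
test_same = sample(mean_spam=3.0, mean_ham=0.0)    # no shift
test_shift = sample(mean_spam=0.5, mean_ham=0.0)   # spammers adapted

# "Train": centroid of each class along the single feature.
c1 = sum(x for x, y in train if y == 1) / sum(1 for _, y in train if y == 1)
c0 = sum(x for x, y in train if y == 0) / sum(1 for _, y in train if y == 0)

def error_rate(data):
    # Predict the class whose centroid is nearer, then count mistakes.
    wrong = sum(1 for x, y in data
                if (1 if abs(x - c1) < abs(x - c0) else 0) != y)
    return wrong / len(data)

print(f"error without shift: {error_rate(test_same):.2%}")
print(f"error under shift:   {error_rate(test_shift):.2%}")
```

Under the shifted test set the error climbs toward chance-level, mirroring the order of magnitude of degradation (up to 48.81%) that the abstract reports.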

Journal ArticleDOI
TL;DR: An innovative feature selection algorithm called "the Highest Wins" (HW) is proposed and used to build naive Bayes and decision tree intrusion detection classifiers on the well-known Network Security Laboratory-Knowledge Discovery in Databases (NSL-KDD) dataset.
Abstract: The rapid advancement of the Internet stimulates building intelligent data mining systems for detecting intrusion attacks. The performance of such systems might be negatively affected by the big datasets employed in the learning phase. Determining the appropriate group of features within training datasets is an essential phase when building data mining classification models. Nevertheless, the resulting minimized set of features should maintain or even improve the performance of the classification models. Throughout this article, an innovative feature selection algorithm called "the Highest Wins" (HW) is proposed. To evaluate the generalization ability of HW, it has been applied to create classification models using the naive Bayes technique on 10 benchmark datasets. The obtained results were compared against two well-known strategies, namely chi-square and information gain. The experimental results confirmed the competitiveness of the suggested strategy in terms of various evaluation measures such as recall, precision, and error rate, while significantly decreasing the number of selected features in the datasets. Further, HW was used to build naive Bayes and decision tree intrusion detection classifiers using the well-known dataset from the Network Security Laboratory-Knowledge Discovery in Databases (NSL-KDD). The results were promising not just in terms of overall performance, but also in terms of the time needed to build the classification model.

13 citations
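The abstract does not spell out HW's scoring rule, but it names chi-square as one of the two baselines it is compared against. The following sketch shows that baseline on invented binary features: a chi-square statistic over the 2x2 feature/class contingency table, where higher scores mark features more associated with the class label.

```python
# Sketch of the chi-square feature-ranking baseline (HW itself is not
# specified in this summary; features and labels are invented).
from collections import Counter

def chi2_score(feature, labels):
    # Build the 2x2 contingency table: feature value (0/1) vs class (0/1).
    counts = Counter(zip(feature, labels))
    n = len(labels)
    score = 0.0
    for f in (0, 1):
        for c in (0, 1):
            observed = counts[(f, c)]
            expected = (sum(counts[(f, k)] for k in (0, 1)) *
                        sum(counts[(k, c)] for k in (0, 1))) / n
            if expected:
                score += (observed - expected) ** 2 / expected
    return score

labels     = [1, 1, 1, 1, 0, 0, 0, 0]
relevant   = [1, 1, 1, 0, 0, 0, 0, 1]   # mostly tracks the label
irrelevant = [1, 0, 1, 0, 1, 0, 1, 0]   # independent of the label

print(chi2_score(relevant, labels), chi2_score(irrelevant, labels))
```

A feature selector of this kind keeps only the top-scoring features, which is exactly the behaviour (fewer features, comparable accuracy) that HW is evaluated against.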

Journal ArticleDOI
TL;DR: An "Improved RI algorithm" (IRI) is proposed that reduces the search space for generating classification rules by removing all unimportant candidate rule-items while the classification model is being created.

7 citations
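The summary above does not give IRI's exact pruning criterion, so the sketch below only illustrates the general idea it describes: discarding weak candidate rule-items early (here, by a minimum-support threshold, an assumption of this example) so they are never extended, which shrinks the search space of a rule-induction learner. The toy transactions are invented.

```python
# Illustrative sketch of pruning candidate rule-items before extending them
# (the exact IRI criterion is not given here; min-support is an assumption).
from itertools import combinations

transactions = [
    {"free", "offer", "spam"},
    {"free", "spam"},
    {"meeting", "ham"},
    {"offer", "free", "spam"},
]

def frequent_items(transactions, min_support=2):
    # Keep only single items frequent enough to seed candidate rules.
    items = {i for t in transactions for i in t}
    support = {i: sum(i in t for t in transactions) for i in items}
    return {i for i, s in support.items() if s >= min_support}

def candidate_pairs(transactions, min_support=2):
    # Extend only surviving items; pruned items never generate candidates,
    # so the space of rule-items to evaluate stays small.
    seeds = frequent_items(transactions, min_support)
    pairs = combinations(sorted(seeds), 2)
    return [p for p in pairs
            if sum(set(p) <= t for t in transactions) >= min_support]

print(candidate_pairs(transactions))
```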

Journal ArticleDOI
TL;DR: Point-biserial correlation is applied to each feature of the University of California Irvine (UCI) spambase email dataset with respect to the class label in order to select the best features, and the performance of the applied classifiers is evaluated.

6 citations
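Point-biserial correlation, the ranking statistic named in the TL;DR above, measures the association between a numeric feature and a binary label. A minimal sketch follows; the feature values and labels are made up for illustration and are not from the spambase dataset.

```python
# Sketch: point-biserial correlation of one numeric feature against a
# binary class label (values invented; population standard deviation used).
import math

def point_biserial(x, y):
    n = len(x)
    g1 = [v for v, lab in zip(x, y) if lab == 1]
    g0 = [v for v, lab in zip(x, y) if lab == 0]
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return (m1 - m0) / s * math.sqrt(len(g1) * len(g0) / n ** 2)

labels  = [1, 1, 1, 0, 0, 0]
feature = [0.9, 0.8, 0.7, 0.2, 0.1, 0.0]  # high values in the spam class
print(round(point_biserial(feature, labels), 3))
```

Features are then ranked by the absolute value of this coefficient, and only the strongest are kept for training.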

Journal ArticleDOI
TL;DR: In this article, a hybrid technique is created by combining the Naive Bayes algorithm and the Markov Random Field; the combination is used to determine the prevalence and configuration of values in a dataset and to perform a basic probabilistic classification operation.

6 citations
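Of the two components named above, only the naive Bayes half lends itself to a short sketch (the Markov Random Field component is omitted here). The toy documents and word features below are invented; the sketch shows the basic probabilistic classification operation the TL;DR refers to, with Laplace smoothing, an assumption of this example.

```python
# Sketch of the naive Bayes component only (MRF part omitted; data invented).
import math
from collections import defaultdict

docs = [("win money now", 1), ("cheap money win", 1),
        ("meeting at noon", 0), ("project meeting notes", 0)]

word_counts = {0: defaultdict(int), 1: defaultdict(int)}
class_counts = defaultdict(int)
for text, label in docs:
    class_counts[label] += 1
    for w in text.split():
        word_counts[label][w] += 1

def predict(text):
    vocab = {w for c in word_counts.values() for w in c}
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        # log prior + Laplace-smoothed log likelihood of each word
        s = math.log(class_counts[label] / len(docs))
        for w in text.split():
            s += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = s
    return max(scores, key=scores.get)

print(predict("win cheap money"))
```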

References
Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on data transformations, ensemble learning, massive data sets and multi-instance learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research.
- Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
- Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
- Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface; the algorithms in the toolkit cover data pre-processing, classification, regression, clustering, association rules, and visualization

20,196 citations

Journal ArticleDOI
TL;DR: The purpose of this article is to serve as an introduction to ROC graphs and as a guide for using them in research.

17,017 citations

Journal ArticleDOI
01 Mar 2002
TL;DR: This presentation discusses the design and implementation of machine learning algorithms in Java, as well as some of the techniques used to develop and implement these algorithms.
Abstract:
1. What's It All About?
2. Input: Concepts, Instances, Attributes
3. Output: Knowledge Representation
4. Algorithms: The Basic Methods
5. Credibility: Evaluating What's Been Learned
6. Implementations: Real Machine Learning Schemes
7. Moving On: Engineering the Input and Output
8. Nuts and Bolts: Machine Learning Algorithms in Java
9. Looking Forward

5,936 citations

Book ChapterDOI
21 Jun 2000
TL;DR: Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that AdaBoost does not overfit rapidly.
Abstract: Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, Bagging, and boosting. This paper reviews these methods and explains why ensembles can often perform better than any single classifier. Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that AdaBoost does not overfit rapidly.

5,679 citations
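The core mechanism described in the abstract above, classifying new points by a weighted vote over a set of classifiers, can be sketched in a few lines. The three threshold-rule classifiers and their weights below are invented for illustration; real ensembles would vote over trained models.

```python
# Sketch: (weighted) majority vote over an ensemble of classifiers
# (the classifiers here are trivial threshold rules, invented).
from collections import defaultdict

def weighted_vote(classifiers, weights, x):
    # Each classifier casts a vote with its weight; highest tally wins.
    tally = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        tally[clf(x)] += w
    return max(tally, key=tally.get)

classifiers = [lambda x: int(x > 1), lambda x: int(x > 3), lambda x: int(x > 5)]
weights = [1.0, 2.0, 1.0]

print(weighted_vote(classifiers, weights, 4))
```

Bagging and boosting differ mainly in how the member classifiers and their weights are produced, not in this voting step.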

Journal ArticleDOI
TL;DR: The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art and aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.
Abstract: Concept drift primarily refers to an online supervised learning scenario when the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning in this article, we characterize adaptive learning processes; categorize existing strategies for handling concept drift; overview the most representative, distinct, and popular techniques and algorithms; discuss evaluation methodology of adaptive algorithms; and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.

2,374 citations
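One of the adaptation strategies the survey above categorizes is forgetting-based: keep a sliding window of recent labelled examples and rebuild the decision rule from it, so the model tracks a changing input-to-label relation. The sketch below uses an invented one-feature classifier and window size purely to illustrate that mechanism.

```python
# Sketch: sliding-window adaptation to concept drift (classifier, window
# size, and data all invented for illustration).
from collections import deque

class WindowedMeanClassifier:
    """Predicts 1 when x exceeds the midpoint of recent per-class means."""
    def __init__(self, window=50):
        self.window = deque(maxlen=window)  # old examples fall out

    def update(self, x, y):
        self.window.append((x, y))

    def predict(self, x):
        ones = [v for v, y in self.window if y == 1] or [1.0]
        zeros = [v for v, y in self.window if y == 0] or [0.0]
        threshold = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2
        return int(x > threshold)

clf = WindowedMeanClassifier(window=4)
for x, y in [(5, 1), (0, 0), (6, 1), (1, 0)]:   # old concept: spam near 5-6
    clf.update(x, y)
print(clf.predict(4))   # old concept: threshold 3.0, so 4 -> class 1
for x, y in [(9, 1), (4, 0), (10, 1), (5, 0)]:  # drifted: spam near 9-10
    clf.update(x, y)
print(clf.predict(4))   # after drift: threshold 7.0, so 4 -> class 0
```

Because the window forgets old examples, the same input is classified differently before and after the drift, which is precisely the adaptive behaviour the survey's lifelong-learning setting requires.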