Journal Article

Detection of phishing websites using an efficient feature-based machine learning framework

TL;DR: This paper proposes a novel classification model based on heuristic features extracted from the URL, source code, and third-party services to overcome the disadvantages of existing anti-phishing techniques; the model outperformed these techniques and also detected zero-day phishing attacks.
Abstract
Phishing is a cyber-attack that targets naive online users, tricking them into revealing sensitive information such as usernames, passwords, social security numbers, or credit card numbers. Attackers fool Internet users by disguising a webpage as a trustworthy or legitimate page in order to retrieve personal information. Many anti-phishing solutions have been proposed to date, such as blacklist- or whitelist-based, heuristic, and visual similarity-based methods, but online users are still being trapped into revealing sensitive information on phishing websites. In this paper, we propose a novel classification model based on heuristic features extracted from the URL, source code, and third-party services to overcome the disadvantages of existing anti-phishing techniques. Our model was evaluated using eight different machine learning algorithms, of which the Random Forest (RF) algorithm performed best, with an accuracy of 99.31%. The experiments were repeated with different (orthogonal and oblique) random forest classifiers to find the best classifier for phishing website detection. Principal component analysis Random Forest (PCA-RF) performed best among all oblique Random Forests (oRFs), with an accuracy of 99.55%. We also tested our model with and without third-party-based features to determine the effectiveness of third-party services in classifying suspicious websites. Finally, we compared our results with the baseline models (CANTINA and CANTINA+). Our proposed technique outperformed these methods and also detected zero-day phishing attacks.
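
As a rough illustration of what "heuristic features extracted from the URL" can look like, the following minimal Python sketch maps a URL to a handful of numeric features that could be fed to a classifier such as a Random Forest. The feature names and the particular heuristics (`url_length`, `has_at_symbol`, `host_is_ip`, and so on) are assumptions chosen for illustration; the abstract does not list the paper's actual feature set, and source-code and third-party features are not covered here.

```python
import ipaddress
from urllib.parse import urlparse


def host_is_ip(host: str) -> bool:
    """Return True when the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Map a URL to a small dictionary of numeric heuristic features.

    These features are hypothetical examples, not the paper's feature set.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),                     # total characters in the URL
        "num_dots_in_host": host.count("."),        # many dots can indicate nested subdomains
        "num_hyphens_in_host": host.count("-"),     # hyphens in the host name
        "has_at_symbol": int("@" in url),           # '@' anywhere in the URL
        "host_is_ip": int(host_is_ip(host)),        # raw IP address instead of a domain name
        "uses_https": int(parsed.scheme == "https"),
    }


if __name__ == "__main__":
    print(url_features("https://www.example.com/"))
    print(url_features("http://192.168.0.1/login@secure"))
```

Each dictionary can be turned into a row of a feature matrix, with labels (phishing or legitimate) supplied from a separately collected dataset.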


Citations
Journal Article

Machine learning based phishing detection from URLs

TL;DR: A real-time anti-phishing system that uses seven different classification algorithms and natural language processing (NLP) based features is proposed; the Random Forest algorithm with only NLP-based features gives the best performance, with a 97.98% accuracy rate for the detection of phishing URLs.
Journal Article

A Survey on Machine Learning Techniques for Cyber Security in the Last Decade

TL;DR: This paper aims to provide a comprehensive overview of the challenges that ML techniques face in protecting cyberspace against attacks, by reviewing the literature of the last decade on ML techniques for cyber security, including intrusion detection, spam detection, and malware detection on computer networks and mobile networks.
Journal Article

Performance Comparison and Current Challenges of Using Machine Learning Techniques in Cybersecurity

TL;DR: A brief review of different machine learning techniques to get to the bottom of all the developments made in detection methods for potential cybersecurity risks, and the first attempt to give a comparison of the time complexity of commonly used ML models in cybersecurity.
Journal Article

Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains

TL;DR: This work considers 18 classification tasks with heterogeneous characteristics and experimentally evaluates, for feature subsets of different cardinalities, the extent to which an ensemble approach turns out to be more robust than a single selector, thus providing useful insight for both researchers and practitioners.
Journal Article

SoK: A Comprehensive Reexamination of Phishing Research From the Security Perspective

TL;DR: This work reexamines the existing research on phishing and spear phishing from the perspective of the unique needs of the security domain, which includes real-time detection, active attacker, dataset quality and base-rate fallacy, and surveys the existing phishing/spear phishing solutions in their light.
References
Journal Article

Random Forests

TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Journal Article

The random subspace method for constructing decision forests

TL;DR: A method to construct a decision-tree-based classifier is proposed that maintains the highest accuracy on training data and improves on generalization accuracy as it grows in complexity.
Book Chapter

Ensemble Methods in Machine Learning

TL;DR: Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that Adaboost does not overfit rapidly.
Journal Article

Do we need hundreds of classifiers to solve real world classification problems?

TL;DR: The random forest is clearly the best family of classifiers (3 out of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively).
Proceedings Article

Why phishing works

TL;DR: This paper provides the first empirical evidence about which malicious strategies are successful at deceiving general users by analyzing a large set of captured phishing attacks and developing a set of hypotheses about why these strategies might work.