Detection of phishing websites using an efficient feature-based machine learning framework

doi:10.1007/S00521-017-3305-0

Journal Article

Routhu Srinivasa Rao, +1 more

01 Aug 2019

Neural Computing and Applications

Vol. 31, Iss. 8, pp. 3851-3873

TLDR

This paper proposes a novel classification model based on heuristic features extracted from the URL, source code, and third-party services to overcome the disadvantages of existing anti-phishing techniques. The model outperformed the baseline methods (CANTINA and CANTINA+) and also detected zero-day phishing attacks.

Abstract:

Phishing is a cyber-attack that targets naive online users, tricking them into revealing sensitive information such as usernames, passwords, social security numbers, or credit card numbers. Attackers fool Internet users by disguising a webpage as a trustworthy or legitimate page in order to retrieve personal information. Many anti-phishing solutions have been proposed to date, including blacklist- or whitelist-based, heuristic, and visual similarity-based methods, but online users are still being trapped into revealing sensitive information on phishing websites. In this paper, we propose a novel classification model based on heuristic features extracted from the URL, source code, and third-party services to overcome the disadvantages of existing anti-phishing techniques. Our model was evaluated using eight different machine learning algorithms, of which the Random Forest (RF) algorithm performed best, with an accuracy of 99.31%. The experiments were repeated with different (orthogonal and oblique) random forest classifiers to find the best classifier for phishing website detection. Principal component analysis Random Forest (PCA-RF) performed best among the oblique Random Forests (oRFs), with an accuracy of 99.55%. We also tested our model with and without third-party-based features to determine the effectiveness of third-party services in classifying suspicious websites, and compared our results with the baseline models CANTINA and CANTINA+. Our proposed technique outperformed these methods and also detected zero-day phishing attacks.
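To make the idea of heuristic URL features concrete, the following is a minimal sketch of how a URL might be turned into a numeric feature vector. The abstract does not list the paper's actual features, so every feature name here, the `SUSPICIOUS_TOKENS` list, and the function names are assumptions chosen purely for illustration; source-code and third-party features are not covered.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical keyword list for illustration only; not taken from the paper.
SUSPICIOUS_TOKENS = ("login", "verify", "secure", "account", "update")


def host_is_ip(host: str) -> bool:
    """Return True if the host is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_url_features(url: str) -> dict:
    """Map a URL to a small dictionary of illustrative heuristic features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": int("@" in url),
        "host_is_ip": int(host_is_ip(host)),
        "uses_https": int(parsed.scheme == "https"),
        "num_suspicious_tokens": sum(tok in lowered for tok in SUSPICIOUS_TOKENS),
    }


if __name__ == "__main__":
    for example in ("https://www.example.com/", "http://192.168.0.1/login@verify"):
        print(example, extract_url_features(example))
```

In a pipeline like the one the abstract describes, vectors of this kind would presumably be combined with source-code and third-party features and passed to a classifier such as Random Forest; that step is omitted here because the abstract gives no details of it.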

Citations

Machine learning based phishing detection from URLs

A Survey on Machine Learning Techniques for Cyber Security in the Last Decade

Performance Comparison and Current Challenges of Using Machine Learning Techniques in Cybersecurity

Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains

SoK: A Comprehensive Reexamination of Phishing Research From the Security Perspective

Related Papers (5)

Machine learning based phishing detection from URLs

CANTINA+: A Feature-Rich Machine Learning Framework for Detecting Phishing Web Sites

New rule-based phishing detection method

Cantina: a content-based approach to detecting phishing web sites

Phishing Website Detection Based on Multidimensional Features Driven by Deep Learning
