Open Access, Posted Content

PhishOut: Effective Phishing Detection Using Selected Features.

TLDR
This paper applies knowledge discovery principles, from data cleansing, integration, selection, and aggregation through data mining to knowledge extraction, and compares six machine-learning approaches for detecting phishing using a small number of carefully chosen features.
Abstract
Phishing emails are the first step in many of today's attacks. They arrive with a simple hyperlink, a request for action, or a full replica of an existing service or website. The goal is generally to trick users into voluntarily giving away sensitive information such as login credentials. Many approaches and applications have been proposed and developed to catch and filter phishing emails. However, the problem still lacks a complete and comprehensive solution. In this paper, we apply knowledge discovery principles, from data cleansing, integration, selection, and aggregation through data mining to knowledge extraction. We study feature effectiveness based on Information Gain and contribute two new features to the literature. We compare six machine-learning approaches to detect phishing based on a small number of carefully chosen features. We calculate false positives, false negatives, mean absolute error, recall, precision, and F-measure, and achieve very low false positive and false negative rates. Naïve Bayes has the lowest true positive rate, and overall Neural Networks hold the most promise for accurate phishing detection, with an accuracy of 99.4%.
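As a rough illustration of two ideas named in the abstract, the sketch below computes Information Gain for a single discrete feature and derives precision, recall, and F-measure from confusion counts. It is not the authors' implementation: the feature name `has_ip_url`, the toy labels, and the helper function names are hypothetical and chosen only for demonstration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(feature_values, labels):
    """Label entropy minus the weighted entropy left after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F-measure from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 1 = feature present / email is phishing, 0 = otherwise.
has_ip_url = [1, 1, 1, 0, 0, 0, 1, 0]
is_phishing = [1, 1, 1, 0, 0, 0, 0, 1]

ig = information_gain(has_ip_url, is_phishing)
precision, recall, f1 = precision_recall_f1(tp=3, fp=1, fn=1)
print(f"information gain: {ig:.4f}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Ranking candidate features by a score like this and keeping only the highest-scoring ones is one common way to arrive at a small feature set; the paper's actual features and thresholds are not given in this abstract.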
