Proceedings Article

Phishing Detection by determining reliability factor using rough set theory

TL;DR: In this article, the authors propose a phishing-detection approach based on rough set theory, which is well suited to applications involving vague or imprecise data.
Abstract: Phishing is a common online attack in which phishers use deception to acquire users' confidential information. Since the inception of the internet, nearly everything, from money transactions to information sharing, has moved online in most parts of the world. This has also given rise to malicious activities such as phishing. Detecting phishing is an intricate process because the factors responsible for it are complex, ambiguous, and numerous. Rough sets can be a powerful tool for applications that involve such vague or imprecise data. This paper proposes an approach to phishing detection using rough set theory. The thirteen basic factors directly responsible for phishing are grouped into four strata. A reliability factor is determined from the outcomes of these strata using rough set theory. The reliability factor indicates how likely a suspected site is to be valid or fake. Rough set theory is also used to determine the most and least influential factors in phishing.
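The abstract does not spell out how the reliability factor is computed, so the following is only a minimal sketch of the standard rough set machinery such an approach could build on: lower and upper approximations of the "fake" class, the accuracy of approximation, and attribute significance via the dependency degree. The stratum names, the `low`/`high` outcomes, and the six-site decision table are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical stratum names; the paper's actual four strata are not listed here.
STRATA = ("stratum_1", "stratum_2", "stratum_3", "stratum_4")

# site id -> (outcome of each stratum, known decision); illustrative data only.
TABLE = {
    "s1": (("low", "low", "low", "low"), "valid"),
    "s2": (("low", "high", "low", "low"), "valid"),
    "s3": (("high", "low", "high", "low"), "fake"),
    "s4": (("high", "low", "high", "low"), "valid"),  # conflicts with s3
    "s5": (("high", "high", "high", "high"), "fake"),
    "s6": (("low", "high", "high", "low"), "fake"),
}


def partition(table, attrs):
    """Group sites that are indiscernible on the chosen stratum indices."""
    blocks = defaultdict(set)
    for site, (values, _) in table.items():
        blocks[tuple(values[i] for i in attrs)].add(site)
    return list(blocks.values())


def approximations(table, attrs, decision):
    """Return the (lower, upper) approximation of one decision class."""
    target = {s for s, (_, d) in table.items() if d == decision}
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:   # block lies entirely inside the class
            lower |= block
        if block & target:    # block overlaps the class
            upper |= block
    return lower, upper


def dependency(table, attrs):
    """Fraction of sites whose decision is fully determined by attrs."""
    decisions = {d for _, d in table.values()}
    positive = set()
    for d in decisions:
        positive |= approximations(table, attrs, d)[0]
    return len(positive) / len(table)


ALL = tuple(range(len(STRATA)))
lower_fake, upper_fake = approximations(TABLE, ALL, "fake")
accuracy = len(lower_fake) / len(upper_fake)

# Significance of a stratum: drop in dependency degree when it is removed.
significance = {
    name: dependency(TABLE, ALL) - dependency(TABLE, tuple(j for j in ALL if j != i))
    for i, name in enumerate(STRATA)
}

print(sorted(lower_fake), sorted(upper_fake), accuracy)
print(significance)
```

In this toy table, sites in the lower approximation are certainly fake, sites only in the upper approximation are possibly fake, and the stratum with the largest significance is the most influential one. A reliability score in the paper's sense might be derived from quantities like these, but that mapping is an assumption.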
Citations
Proceedings Article
05 Oct 2020
TL;DR: Six machine-learning approaches to detecting phishing based on a small number of carefully chosen features are compared; Naive Bayes has the lowest true positive rate, and overall Neural Networks hold the most promise for accurate phishing detection, with an accuracy of 99.4%.
Abstract: Phishing emails are the first step for many of today’s attacks. They come with a simple hyperlink, a request for action, or a full replica of an existing service or website. The goal is generally to trick users into voluntarily giving away sensitive information such as login credentials. Many approaches and applications have been proposed and developed to catch and filter phishing emails. However, the problem still lacks a complete and comprehensive solution. In this paper, we apply knowledge discovery principles, from data cleansing, integration, selection, aggregation, and data mining to knowledge extraction. We study feature effectiveness based on Information Gain and contribute two new features to the literature. We compare six machine-learning approaches to detect phishing based on a small number of carefully chosen features. We calculate false positives, false negatives, mean absolute error, recall, precision and F-measure and achieve very low false positive and false negative rates. Naive Bayes has the lowest true positive rate, and overall Neural Networks hold the most promise for accurate phishing detection, with an accuracy of 99.4%.

4 citations
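The Information Gain ranking mentioned in the abstract above can be illustrated with a minimal sketch. The two binary features (`has_ip_url`, `has_html_body`) and the six labelled emails are hypothetical and are not the features or data used in that paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Illustrative data only: three phishing and three legitimate emails.
labels = ["phish", "phish", "phish", "legit", "legit", "legit"]
has_ip_url = [1, 1, 1, 0, 0, 0]      # separates the classes perfectly
has_html_body = [1, 0, 1, 0, 1, 0]   # only weakly related to the class

gain_ip = information_gain(has_ip_url, labels)
gain_html = information_gain(has_html_body, labels)
print(gain_ip, gain_html)
```

A feature with a higher gain tells the classifier more about the label, so in this toy data `has_ip_url` would rank above `has_html_body`.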

01 Jan 2015
TL;DR: The proposed model detects phishing emails from features extracted from the email header and HTML body, and uses an Artificial Immune System methodology to classify whether a tested email is phishing.
Abstract: Phishing/spam is an attack that uses social engineering to illegally acquire someone else's data, posing as a legitimate website, and use it for the attacker's own benefit. Phishing emails are messages designed to fool the recipient into handing over personal information, such as login names, passwords, credit card numbers, account credentials, and social security numbers. Fraudulent emails harm their victims through loss of funds and identity theft. They also hurt Internet business, because people lose their trust in Internet transactions for fear that they will become victims of fraud. Filtering approaches using blacklists are not completely effective, as a new phishing scam is created about every minute. Statistical filtering of phishing emails has been investigated, in which a classifier is trained on characteristic features of existing emails and is subsequently able to identify new phishing emails with different contents. This paper deals with the phishing detection problem and how to automatically detect phishing emails. The proposed phishing detection model is based on features extracted from the header and HTML body of the email. The developed model introduces an Artificial Immune System methodology to classify whether the tested email is phishing or not.
References
01 Jan 2006
TL;DR: An automated test bed for testing anti-phishing tools is developed, and it is demonstrated that the source of phishing URLs and the freshness of the URLs tested can significantly impact the results of anti-phishing tool testing.
Abstract: There are currently dozens of freely available tools to combat phishing and other web-based scams, many of which are web browser extensions that warn users when they are browsing a suspected phishing site. We developed an automated test bed for testing anti-phishing tools. We used 200 verified phishing URLs from two sources and 516 legitimate URLs to test the effectiveness of 10 popular anti-phishing tools. Only one tool was able to consistently identify more than 90% of phishing URLs correctly; however, it also incorrectly identified 42% of legitimate URLs as phish. The performance of the other tools varied considerably depending on the source of the phishing URLs. Of these remaining tools, only one correctly identified over 60% of phishing URLs from both sources. Performance also changed significantly depending on the freshness of the phishing URLs tested. Thus we demonstrate that the source of phishing URLs and the freshness of the URLs tested can significantly impact the results of anti-phishing tool testing. We also demonstrate that many of the tools we tested were vulnerable to simple exploits. In this paper we describe our anti-phishing tool test bed, summarize our findings, and offer observations about the effectiveness of these tools as well as ways they might be improved.

296 citations