A Novel Machine Learning Approach to Detect Phishing Websites

doi:10.1109/SPIN.2018.8474040

Proceedings Article•DOI•

Ishant Tyagi¹, Jatin Shad¹, Shubham Sharma¹, Siddharth Gaur¹, Gagandeep Kaur¹

Jaypee Institute of Information Technology¹

01 Feb 2018

TL;DR: This paper examines several machine learning algorithms for predicting whether a website is phishing or legitimate, and reports an accuracy of 98.4%.

Abstract: Phishing is an attempt to steal personal and sensitive information, such as login IDs, passwords, and credit/debit card details, by posing as a trusted entity. Many websites that look perfectly legitimate may in fact be phishing sites and can be the source of various online frauds. Attackers may try to obtain this information in many ways, for example through phone calls, messages, and pop-up windows. Securing information sent online is therefore essential, and one concrete way of doing so is to counter these phishing attacks. This paper focuses on various machine learning algorithms aimed at predicting whether a website is phishing or legitimate. Machine learning solutions are preferred because they can detect zero-hour phishing attacks and adapt better to new types of phishing attacks. In our implementation, we achieved an accuracy of 98.4% in predicting whether a website is phishing or legitimate.

Citations

Journal Article•DOI•

A comprehensive survey of AI-enabled phishing attacks detection techniques.

Abdul Basit¹, M. Zafar¹, Xuan Liu², Abdul Rehman Javed¹, Zunera Jalil¹, Kashif Kifayat¹

Air University (Islamabad)¹, Yangzhou University²

01 Jan 2021-Telecommunication Systems

TL;DR: A literature review of Artificial Intelligence techniques (Machine Learning, Deep Learning, Hybrid Learning, and Scenario-based techniques) for phishing attack detection is presented, studies are compared for each AI technique, and the qualities and shortcomings of these methodologies are examined.

Abstract: In recent times, phishing has become one of the most prominent attacks faced by internet users, governments, and service-providing organizations. In a phishing attack, the attacker(s) collect the client's sensitive data (e.g., user account login details and credit/debit card numbers) by using spoofed emails or fake websites. Phishing websites are common entry points of online social engineering attacks, including numerous frauds on the websites. In such attacks, the attacker(s) create web pages that copy the behavior of legitimate websites and send URL(s) to the targeted victims through spam messages, texts, or social networking. To provide a thorough understanding of phishing attack(s), this paper provides a literature review of Artificial Intelligence (AI) techniques: Machine Learning, Deep Learning, Hybrid Learning, and Scenario-based techniques for phishing attack detection. This paper also compares different studies detecting phishing attacks for each AI technique and examines the qualities and shortcomings of these methodologies. Furthermore, this paper provides a comprehensive set of current challenges of phishing attacks and future research directions in this domain.

128 citations

Cites methods from "A Novel Machine Learning Approach t..."

...[58] used 30 features to detect the attack by RF....

Journal Article•DOI•

Phishing web site detection using diverse machine learning algorithms

Ammara Zamir, Hikmat Ullah Khan, Tassawar Iqbal, Nazish Yousaf, Farah Aslam, Almas Anjum, Maryam Hamdani

02 Jan 2020-The Electronic Library

TL;DR: This research is novel in that no previous research focusses on using feed forward NN and ensemble learners for detecting phishing websites.

Abstract: This paper aims to present a framework to detect phishing websites using a stacking model. Phishing is a type of fraud to access users’ credentials. The attackers access users’ personal and sensitive information for monetary purposes. Phishing affects diverse fields, such as e-commerce, online business, banking and digital marketing, and is ordinarily carried out by sending spam emails and developing identical websites resembling the original websites. As people surf the targeted website, the phishers hijack their personal information. Features of the phishing data set are analysed using feature selection techniques including information gain, gain ratio, Relief-F and recursive feature elimination (RFE). Two features are proposed combining the strongest and weakest attributes. Principal component analysis with diverse machine learning algorithms (random forest [RF], neural network [NN], bagging, support vector machine, Naive Bayes and k-nearest neighbour) is applied on the proposed and remaining features. Afterwards, two stacking models, Stacking1 (RF + NN + Bagging) and Stacking2 (kNN + RF + Bagging), are applied by combining the highest scoring classifiers to improve the classification accuracy. The proposed features played an important role in improving the accuracy of all the classifiers. The results show that RFE plays an important role in removing the least important feature from the data set. Furthermore, Stacking1 (RF + NN + Bagging) outperformed all other classifiers in terms of classification accuracy, detecting phishing websites with 97.4% accuracy. This research is novel in that no previous research focusses on using feed forward NN and ensemble learners for detecting phishing websites.
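
The abstract above names information gain as one of its feature selection techniques. As a rough illustration only (not the authors' implementation), the following sketch computes information gain for discrete features in plain Python and ranks two features by it. The toy data and the feature names `has_ip_in_url` and `has_https` are invented for this example and do not come from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data, invented for illustration: 1 = phishing, 0 = legitimate.
labels = [1, 1, 1, 0, 0, 0]
has_ip_in_url = [1, 1, 1, 0, 0, 0]  # hypothetical feature that separates the classes
has_https = [1, 0, 1, 0, 1, 0]      # hypothetical weak feature

# Rank features from most to least informative.
ranking = sorted(
    {"has_ip_in_url": has_ip_in_url, "has_https": has_https}.items(),
    key=lambda item: information_gain(item[1], labels),
    reverse=True,
)
```

A feature selection step of this kind would typically keep the top-ranked features and pass them to the classifiers; how the paper combines this ranking with gain ratio, Relief-F and RFE is not stated in the abstract.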

72 citations

Journal Article•DOI•

Don’t click: towards an effective anti-phishing training. A comparative literature review

Daniel Jampen¹, Gurkan Gur¹, Thomas Sutter¹, Bernhard Tellenbach¹

Zurich University of Applied Sciences/ZHAW¹

09 Aug 2020-Human-centric Computing and Information Sciences

TL;DR: This paper surveys and categorizes works that consider different elements of anti-phishing training programs via a clearly laid-out methodology, and identifies key findings in the technical literature.

Abstract: Email is of critical importance as a communication channel for both business and personal matters. Unfortunately, it is also often exploited for phishing attacks. To defend against such threats, many organizations have begun to provide anti-phishing training programs to their employees. A central question in the development of such programs is how they can be designed sustainably and effectively to minimize the vulnerability of employees to phishing attacks. In this paper, we survey and categorize works that consider different elements of such programs via a clearly laid-out methodology, and identify key findings in the technical literature. Overall, we find that researchers agree on the answers to many relevant questions regarding the utility and effectiveness of anti-phishing training. However, we identified influencing factors, such as the impact of age on the success of anti-phishing training programs, for which mixed findings are available. Finally, based on our comprehensive analysis, we describe how a well-founded anti-phishing training program should be designed and parameterized with a set of proposed research directions.

48 citations

Proceedings Article•DOI•

A Novel Ensemble Machine Learning Method to Detect Phishing Attack

Abdul Basit¹, M. Zafar¹, Abdul Rehman Javed¹, Zunera Jalil¹

Air University (Islamabad)¹

05 Nov 2020

TL;DR: A novel ensemble model to detect phishing attacks on websites is presented, combining three machine learning classifiers, Artificial Neural Network (ANN), K-Nearest Neighbors (KNN), and Decision Tree (C4.5), with a Random Forest Classifier (RFC).

Abstract: Currently, and particularly with remote working scenarios during COVID-19, phishing has become one of the most significant threats faced by internet users, organizations, and service providers. In a phishing attack, the attacker tries to steal a client's sensitive data (such as logins, passwords, and credit card details) using spoofed emails and fake websites. Cybercriminals, hacktivists, and nation-state spy agencies now have fertile ground to deploy their latest innovative phishing attacks. Timely detection of phishing attacks has become more crucial than ever. Machine learning algorithms can be used to accurately detect phishing attacks before a user is harmed. This paper presents a novel ensemble model to detect phishing attacks on websites. We select three machine learning classifiers, Artificial Neural Network (ANN), K-Nearest Neighbors (KNN), and Decision Tree (C4.5), to use in an ensemble method with Random Forest Classifier (RFC). This ensemble method effectively detects website phishing attacks with better accuracy than existing studies. Experimental results demonstrate that the ensemble of KNN and RFC detects phishing attacks with 97.33% accuracy.

35 citations

Journal Article•DOI•

Intelligent phishing detection scheme using deep learning algorithms

Moruf Akin Adebowale, Khin T. Lwin, Mohammed Alamgir Hossain

04 Jun 2020-Journal of Enterprise Information Management

TL;DR: This study focused on the design and development of a deep learning-based phishing detection solution that leveraged the universal resource locator and website content such as images, text and frames and built a hybrid classification model named the Intelligent Phishing Detection System.

Abstract: Phishing attacks have evolved in recent years due to high-tech-enabled economic growth worldwide. The rise in all types of fraud loss in 2019 has been attributed to the increase in deception scams and impersonation, as well as to sophisticated online attacks such as phishing. The global impact of phishing attacks will continue to intensify, and thus, a more efficient phishing detection method is required to protect online user activities. To address this need, this study focussed on the design and development of a deep learning-based phishing detection solution that leveraged the universal resource locator and website content such as images, text and frames. Deep learning techniques are efficient for natural language and image classification. In this study, the convolutional neural network (CNN) and the long short-term memory (LSTM) algorithm were used to build a hybrid classification model named the intelligent phishing detection system (IPDS). To build the proposed model, the CNN and LSTM classifier were trained by using 1 million universal resource locators and over 10,000 images. Then, the sensitivity of the proposed model was determined by considering various factors such as the type of feature, number of misclassifications and split issues. An extensive experimental analysis was conducted to evaluate and compare the effectiveness of the IPDS in detecting phishing web pages and phishing attacks when applied to large data sets. The results showed that the model achieved an accuracy rate of 93.28% and an average detection time of 25 s. The hybrid approach using deep learning algorithm of both the CNN and LSTM methods was used in this research work. The combination of CNN and LSTM was used to handle a large data set and achieve higher classifier prediction performance. Hence, combining the two methods leads to a better result with less training time for LSTM and CNN architecture, while using the image, frame and text features as a hybrid for our model detection. The hybrid features and IPDS classifier for phishing detection were the novelty of this study to the best of the authors' knowledge.

31 citations

References

Proceedings Article•DOI•

A novel approach for phishing detection using URL-based heuristic

Luong Anh Tuan Nguyen¹, Ba Lam To², Huu Khuong Nguyen¹, Minh-Hoang Nguyen

Ho Chi Minh City University of Transport¹, Duy Tan University²

27 Apr 2014

TL;DR: A new phishing detection approach based on URL features is proposed; it focuses on the similarity between a phishing site's URL and a legitimate site's URL, and the results show that the technique can detect over 97% of phishing sites.

Abstract: Together with the growth of e-commerce transactions, phishing (the act of stealing personal information) is rising in quantity and quality. Phishers try to make fake sites look similar to legitimate sites in terms of interface and uniform resource locator (URL) address. As a result, the number of victims has been increasing because blacklist-based methods are inefficient at detecting phishing. This paper proposes a new phishing detection approach based on the features of the URL. Specifically, the proposed method focuses on the similarity between a phishing site's URL and a legitimate site's URL. In addition, the ranking of the site is considered an important factor in deciding whether the site is a phishing site. The proposed technique is evaluated with a dataset of 11,660 phishing sites and 5,000 legitimate sites. The results show that the technique can detect over 97% of phishing sites.
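
The abstract above does not say how URL similarity is measured. As one possible illustration, the sketch below assumes Levenshtein edit distance between a URL's hostname and a small allow-list of legitimate hostnames. The distance measure, the threshold `max_distance`, the function names, and the `KNOWN_HOSTS` list are all assumptions made for this example, and the site-ranking factor mentioned in the abstract is not modelled.

```python
from urllib.parse import urlparse


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming, two rows)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def closest_legitimate_host(url, legitimate_hosts):
    """Return (host, distance) for the known host nearest to the URL's hostname."""
    hostname = (urlparse(url).hostname or "").lower()
    return min(
        ((host, edit_distance(hostname, host)) for host in legitimate_hosts),
        key=lambda pair: pair[1],
    )


def looks_like_imitation(url, legitimate_hosts, max_distance=2):
    """Flag hostnames that are close to, but not identical to, a known host."""
    _, distance = closest_legitimate_host(url, legitimate_hosts)
    return 0 < distance <= max_distance


# Hypothetical allow-list, for illustration only.
KNOWN_HOSTS = ["paypal.com", "example.com"]
```

With this allow-list, a hostname such as `paypa1.com` is one substitution away from `paypal.com` and would be flagged, while `paypal.com` itself has distance zero and would not.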

57 citations

"A Novel Machine Learning Approach t..." refers methods in this paper

...The approach assesses the relatedness of words that compose a URL [6]....

Proceedings Article•DOI•

Beyond the lock icon: real-time detection of phishing websites using public key certificates

Zheng Dong¹, Apu Kapadia¹, Jim Blythe², L. Jean Camp¹

Indiana University¹, University of Southern California²

26 May 2015

TL;DR: This work proposes a machine-learning approach to detect phishing websites using features from their X.509 public key certificates, and illustrates that this certificate-based approach greatly increases the difficulty of masquerading undetected for phishers, with single millisecond delays for users.

Abstract: We propose a machine-learning approach to detect phishing websites using features from their X.509 public key certificates. We show that its efficacy extends beyond HTTPS-enabled sites. Our solution enables immediate local identification of phishing sites. As such, this serves as an important complement to the existing server-based anti-phishing mechanisms which predominately use blacklists. Blacklisting suffers from several inherent drawbacks in terms of correctness, timeliness, and completeness. Due to the potentially significant lag prior to site blacklisting, there is a window of opportunity for attackers. Other local client-side phishing detection approaches also exist, but primarily rely on page content or URLs, which are arguably easier to manipulate by attackers. We illustrate that our certificate-based approach greatly increases the difficulty of masquerading undetected for phishers, with single millisecond delays for users. We further show that this approach works not only against HTTPS-enabled phishing attacks, but also detects HTTP phishing attacks with port 443 enabled.

51 citations

"A Novel Machine Learning Approach t..." refers methods in this paper

...In this research paper [4], the authors have used machine-learning algorithms to detect phishing websites using features from X....

Proceedings Article•DOI•

Feature extraction and classification phishing websites based on URL

Mustafa Aydin¹, Nazife Baykal¹

Middle East Technical University¹

07 Dec 2015

TL;DR: In this study, websites' URL features are extracted and subset based feature selection methods and classification algorithms for phishing websites detection are analyzed.

Abstract: In this study we extracted websites' URL features and analyzed subset based feature selection methods and classification algorithms for phishing websites detection.

45 citations

"A Novel Machine Learning Approach t..." refers methods in this paper

...[7] They proposed a method that makes prediction based on the features of URL and the ranking of site....

Proceedings Article•DOI•

Real time detection of phishing websites

Abdulghani Ali Ahmed¹, Nurul Amirah Abdullah¹

Universiti Malaysia Pahang¹

01 Oct 2016

TL;DR: The proposed detection technique is able to distinguish between legitimate and fake web pages by checking the Uniform Resource Locators (URLs) of suspected web pages.

Abstract: Web spoofing lures the user to interact with fake websites rather than the real ones. The main objective of this attack is to steal sensitive information from users. The attacker creates a ‘shadow’ website that looks similar to the legitimate website. This fraudulent act allows the attacker to observe and modify any information from the user. This paper proposes a technique for detecting phishing websites based on checking the Uniform Resource Locators (URLs) of web pages. The proposed solution is able to distinguish between legitimate and fake web pages by checking the URLs of suspected web pages. URLs are inspected for particular characteristics to identify phishing web pages. The detected attacks are reported for prevention. The performance of the proposed solution is evaluated using PhishTank and Yahoo directory datasets. The obtained results show that the detection mechanism is deployable and capable of detecting various types of phishing attacks while maintaining a low rate of false alarms.
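
The abstract above says URLs are inspected for particular characteristics but does not list them. The sketch below shows what such an inspection might look like, using a handful of lexical features commonly used in phishing detection. The chosen features, the thresholds `max_length` and `max_dots`, and the scoring rule are assumptions made for this example, not the paper's method.

```python
import ipaddress
from urllib.parse import urlparse


def is_ip_address(host):
    """True when the host is a literal IPv4/IPv6 address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url):
    """Extract a few lexical features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "host_is_ip": is_ip_address(host),
        "has_at_symbol": "@" in url,
        "dots_in_host": host.count("."),
        "hyphens_in_host": host.count("-"),
    }


def suspicion_score(features, max_length=75, max_dots=3):
    """Count how many warning signs fire; the thresholds are illustrative guesses."""
    checks = [
        features["url_length"] > max_length,
        not features["uses_https"],
        features["host_is_ip"],
        features["has_at_symbol"],
        features["dots_in_host"] > max_dots,
        features["hyphens_in_host"] > 0,
    ]
    return sum(checks)
```

A rule-based detector could flag URLs whose score exceeds some cut-off, while a machine learning approach would instead feed the feature dictionary to a trained classifier.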

43 citations

"A Novel Machine Learning Approach t..." refers background or methods in this paper

...In this paper [3], the detection of phishing websites was done...
...through some of the many features that one can extract about a URL [3]....

Proceedings Article•DOI•

Feature selections for the machine learning based detection of phishing websites

Ebubekir Buber¹, Önder Demir¹, Ozgur Koray Sahingoz²

Marmara University¹, Turkish Air Force Academy²

01 Sep 2017

TL;DR: This paper aims to list and identify the important features for machine learning-based detection of phishing websites.

Abstract: Phishing websites are malicious sites that impersonate legitimate web pages and aim to obtain users' important information such as user ID, password, and credit card information. Detecting these phishing sites is a very challenging problem because phishing is mainly a semantics-based attack, which abuses human vulnerabilities rather than network or system vulnerabilities. As software detection schemes, two main approaches are widely used: blacklists/whitelists and machine learning approaches. Machine learning solutions are able to detect zero-hour phishing attacks and adapt better to new types of phishing attacks, so they are generally preferred. To use this type of solution, the features of the input must be selected carefully, because the overall performance of the solution depends on these features. Therefore, this paper aims to list and identify the important features for machine learning-based detection of phishing websites.

43 citations