Proceedings Article

A Novel Machine Learning Approach to Detect Phishing Websites

TL;DR: This paper focuses on various Machine Learning algorithms for predicting whether a website is phishing or legitimate, and reports an accuracy of 98.4%.
Abstract: Phishing can be described as a way in which someone tries to steal personal and important information, such as login IDs, passwords, and credit/debit card details, for malicious purposes by posing as a trusted body. Many websites that look perfectly legitimate to us can be phishing sites and could well be the reason for various online frauds. These phishing websites may try to obtain our important information in many ways, for example through phone calls, messages, and pop-up windows. So, the need of the hour is to secure information that is sent online, and one concrete way of doing so is by countering these phishing attacks. This paper is focused on various Machine Learning algorithms aimed at predicting whether a website is phishing or legitimate. Machine learning solutions are able to detect zero-hour phishing attacks and are better at handling new types of phishing attacks, so they are preferred. In our implementation, we achieved an accuracy of 98.4% in predicting whether a website is phishing or legitimate.
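The abstract reports its result as classification accuracy. As a minimal sketch of how such a figure is typically computed for a binary phishing/legitimate task, the following counts confusion-matrix cells and derives accuracy from them. The function names and the sample labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="phishing"):
    """Count true/false positives and negatives for a binary task."""
    counts = Counter()
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Fraction of all predictions that were correct."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# Hypothetical labels, for illustration only (not the paper's data).
y_true = ["phishing", "legitimate", "phishing", "legitimate", "phishing"]
y_pred = ["phishing", "legitimate", "legitimate", "legitimate", "phishing"]

counts = confusion_counts(y_true, y_pred)
print(dict(counts), accuracy(counts))
```

Accuracy alone can hide an imbalance between missed phishing sites (false negatives) and false alarms (false positives), which is why the individual counts are kept available here.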
Citations
Journal Article
TL;DR: A literature review of Artificial Intelligence techniques (Machine Learning, Deep Learning, Hybrid Learning, and Scenario-based techniques) for phishing attack detection is presented, and the qualities and shortcomings of these methodologies are examined for each technique.
Abstract: In recent times, phishing has become one of the most prominent attacks faced by internet users, governments, and service-providing organizations. In a phishing attack, the attacker(s) collect the client's sensitive data (i.e., user account login details, credit/debit card numbers, etc.) by using spoofed emails or fake websites. Phishing websites are common entry points of online social engineering attacks, including numerous frauds on the websites. In such types of attacks, the attacker(s) create website pages by copying the behavior of legitimate websites and send URL(s) to the targeted victims through spam messages, texts, or social networking. To provide a thorough understanding of phishing attack(s), this paper provides a literature review of Artificial Intelligence (AI) techniques: Machine Learning, Deep Learning, Hybrid Learning, and Scenario-based techniques for phishing attack detection. This paper also presents a comparison of different studies detecting phishing attacks for each AI technique and examines the qualities and shortcomings of these methodologies. Furthermore, this paper provides a comprehensive set of current challenges of phishing attacks and future research directions in this domain.

128 citations


Cites methods from "A Novel Machine Learning Approach t..."

  • ...[58] used 30 features to detect the attack by RF....

    [...]

Journal Article
TL;DR: This research is novel in this regard that no previous research focusses on using feed forward NN and ensemble learners for detecting phishing websites.
Abstract: This paper aims to present a framework to detect phishing websites using stacking model. Phishing is a type of fraud to access users’ credentials. The attackers access users’ personal and sensitive information for monetary purposes. Phishing affects diverse fields, such as e-commerce, online business, banking and digital marketing, and is ordinarily carried out by sending spam emails and developing identical websites resembling the original websites. As people surf the targeted website, the phishers hijack their personal information. Features of phishing data set are analysed by using feature selection techniques including information gain, gain ratio, Relief-F and recursive feature elimination (RFE) for feature selection. Two features are proposed combining the strongest and weakest attributes. Principal component analysis with diverse machine learning algorithms including (random forest [RF], neural network [NN], bagging, support vector machine, Naive Bayes and k-nearest neighbour) is applied on proposed and remaining features. Afterwards, two stacking models: Stacking1 (RF + NN + Bagging) and Stacking2 (kNN + RF + Bagging) are applied by combining highest scoring classifiers to improve the classification accuracy. The proposed features played an important role in improving the accuracy of all the classifiers. The results show that RFE plays an important role to remove the least important feature from the data set. Furthermore, Stacking1 (RF + NN + Bagging) outperformed all other classifiers in terms of classification accuracy to detect phishing website with 97.4% accuracy. This research is novel in this regard that no previous research focusses on using feed forward NN and ensemble learners for detecting phishing websites.

72 citations

Journal Article
TL;DR: This paper surveys and categorizes works that consider different elements of anti-phishing training programs via a clearly laid-out methodology, and identifies key findings in the technical literature.
Abstract: Email is of critical importance as a communication channel for both business and personal matters. Unfortunately, it is also often exploited for phishing attacks. To defend against such threats, many organizations have begun to provide anti-phishing training programs to their employees. A central question in the development of such programs is how they can be designed sustainably and effectively to minimize the vulnerability of employees to phishing attacks. In this paper, we survey and categorize works that consider different elements of such programs via a clearly laid-out methodology, and identify key findings in the technical literature. Overall, we find that researchers agree on the answers to many relevant questions regarding the utility and effectiveness of anti-phishing training. However, we identified influencing factors, such as the impact of age on the success of anti-phishing training programs, for which mixed findings are available. Finally, based on our comprehensive analysis, we describe how a well-founded anti-phishing training program should be designed and parameterized with a set of proposed research directions.

48 citations

Proceedings Article
05 Nov 2020
TL;DR: Wang et al. presented a novel ensemble model to detect phishing attacks on websites, which used three machine learning classifiers: Artificial Neural Network (ANN), K-Nearest Neighbors (KNN), and Decision Tree (C4.5).
Abstract: Currently, and particularly with remote working scenarios during COVID-19, phishing has become one of the most significant threats faced by internet users, organizations, and service providers. In a phishing attack, the attacker tries to steal client sensitive data (such as login, passwords, and credit card details) using spoofed emails and fake websites. Cybercriminals, hacktivists, and nation-state spy agencies now have fertile ground to deploy their latest innovative phishing attacks. Timely detection of phishing attacks has become more crucial than ever. Machine learning algorithms can be used to accurately detect phishing attacks before a user is harmed. This paper presents a novel ensemble model to detect phishing attacks on websites. We select three machine learning classifiers: Artificial Neural Network (ANN), K-Nearest Neighbors (KNN), and Decision Tree (C4.5) to use in an ensemble method with Random Forest Classifier (RFC). This ensemble method effectively detects website phishing attacks with better accuracy than existing studies. Experimental results demonstrate that the ensemble of KNN and RFC detects phishing attacks with 97.33% accuracy.

35 citations

Journal Article
TL;DR: This study focused on the design and development of a deep learning-based phishing detection solution that leveraged the universal resource locator and website content such as images, text and frames and built a hybrid classification model named the Intelligent Phishing Detection System.
Abstract: Phishing attacks have evolved in recent years due to high-tech-enabled economic growth worldwide. The rise in all types of fraud loss in 2019 has been attributed to the increase in deception scams and impersonation, as well as to sophisticated online attacks such as phishing. The global impact of phishing attacks will continue to intensify, and thus, a more efficient phishing detection method is required to protect online user activities. To address this need, this study focussed on the design and development of a deep learning-based phishing detection solution that leveraged the universal resource locator and website content such as images, text and frames. Deep learning techniques are efficient for natural language and image classification. In this study, the convolutional neural network (CNN) and the long short-term memory (LSTM) algorithm were used to build a hybrid classification model named the intelligent phishing detection system (IPDS). To build the proposed model, the CNN and LSTM classifier were trained by using 1m universal resource locators and over 10,000 images. Then, the sensitivity of the proposed model was determined by considering various factors such as the type of feature, number of misclassifications and split issues. An extensive experimental analysis was conducted to evaluate and compare the effectiveness of the IPDS in detecting phishing web pages and phishing attacks when applied to large data sets. The results showed that the model achieved an accuracy rate of 93.28% and an average detection time of 25 s. The hybrid approach using deep learning algorithm of both the CNN and LSTM methods was used in this research work. On the one hand, the combination of both CNN and LSTM was used to resolve the problem of a large data set and higher classifier prediction performance. Hence, combining the two methods leads to a better result with less training time for LSTM and CNN architecture, while using the image, frame and text features as a hybrid for our model detection. The hybrid features and IPDS classifier for phishing detection were the novelty of this study to the best of the authors' knowledge.

31 citations

References
Proceedings Article
27 Apr 2014
TL;DR: A new phishing detection approach based on URL features, which focuses on the similarity between a phishing site's URL and a legitimate site's URL, and is shown to detect over 97% of phishing sites.
Abstract: Together with the growth of e-commerce transactions, phishing (the act of stealing personal information) is rising in quantity and quality. Phishers try to make fake sites look similar to legitimate sites in terms of interface and uniform resource locator (URL) address. As a result, the number of victims has been increasing because blacklist-based detection methods are inefficient. This paper proposes a new phishing detection approach based on the features of the URL. Specifically, the proposed method focuses on the similarity between a phishing site's URL and a legitimate site's URL. In addition, the ranking of a site is also considered an important factor in deciding whether the site is a phishing site. The proposed technique is evaluated with a dataset of 11,660 phishing sites and 5,000 legitimate sites. The results show that the technique can detect over 97% of phishing sites.
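The abstract does not say how URL similarity is measured. As a rough illustration only, the sketch below scores the similarity of two hostnames with Python's standard `difflib.SequenceMatcher` and picks the closest entry from a list of known legitimate URLs. The choice of measure, the function names, and the sample URLs are assumptions for illustration, not the paper's method.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def hostname(url):
    """Return the lower-cased host part of a URL ('' if absent)."""
    return (urlparse(url).hostname or "").lower()


def host_similarity(url_a, url_b):
    """Similarity in [0, 1] between the hostnames of two URLs."""
    return SequenceMatcher(None, hostname(url_a), hostname(url_b)).ratio()


def closest_legitimate(url, legitimate_urls):
    """Return the legitimate URL whose hostname is most similar."""
    return max(legitimate_urls, key=lambda legit: host_similarity(url, legit))


# Hypothetical URLs, for illustration only.
legitimate = ["https://example.com", "https://another.org"]
suspect = "http://examp1e.com/login"

best = closest_legitimate(suspect, legitimate)
print(best, round(host_similarity(suspect, best), 3))
```

A hostname that is very similar to a known site without being identical is the suspicious case; an actual detector would combine such a score with other signals, such as the site ranking the abstract mentions.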

57 citations


"A Novel Machine Learning Approach t..." refers methods in this paper

  • ...The approach assesses the relatedness of words that compose a URL [6]....

    [...]

Proceedings Article
26 May 2015
TL;DR: This work proposes a machine-learning approach to detect phishing websites using features from their X.509 public key certificates, and illustrates that this certificate-based approach greatly increases the difficulty of masquerading undetected for phishers, with single millisecond delays for users.
Abstract: We propose a machine-learning approach to detect phishing websites using features from their X.509 public key certificates. We show that its efficacy extends beyond HTTPS-enabled sites. Our solution enables immediate local identification of phishing sites. As such, this serves as an important complement to the existing server-based anti-phishing mechanisms which predominately use blacklists. Blacklisting suffers from several inherent drawbacks in terms of correctness, timeliness, and completeness. Due to the potentially significant lag prior to site blacklisting, there is a window of opportunity for attackers. Other local client-side phishing detection approaches also exist, but primarily rely on page content or URLs, which are arguably easier to manipulate by attackers. We illustrate that our certificate-based approach greatly increases the difficulty of masquerading undetected for phishers, with single millisecond delays for users. We further show that this approach works not only against HTTPS-enabled phishing attacks, but also detects HTTP phishing attacks with port 443 enabled.

51 citations


"A Novel Machine Learning Approach t..." refers methods in this paper

  • ...In this research paper [4], the authors have used machine-learning algorithms to detect phishing websites using features from X....

    [...]

Proceedings Article
07 Dec 2015
TL;DR: In this study, websites' URL features are extracted and subset based feature selection methods and classification algorithms for phishing websites detection are analyzed.
Abstract: In this study we extracted websites' URL features and analyzed subset based feature selection methods and classification algorithms for phishing websites detection.

45 citations


"A Novel Machine Learning Approach t..." refers methods in this paper

  • ...[7] They proposed a method that makes prediction based on the features of URL and the ranking of site....

    [...]

Proceedings Article
01 Oct 2016
TL;DR: The proposed detection technique is able to distinguish between a legitimate web page and a fake web page by checking the Uniform Resource Locators (URLs) of suspected web pages.
Abstract: Web spoofing lures the user into interacting with fake websites rather than the real ones. The main objective of this attack is to steal sensitive information from users. The attacker creates a ‘shadow’ website that looks similar to the legitimate website. This fraudulent act allows the attacker to observe and modify any information from the user. This paper proposes a technique for detecting phishing websites based on checking the Uniform Resource Locators (URLs) of web pages. The proposed solution is able to distinguish between a legitimate web page and a fake web page by checking the URLs of suspected web pages. URLs are inspected for particular characteristics to identify phishing web pages. The detected attacks are reported for prevention. The performance of the proposed solution is evaluated using the PhishTank and Yahoo directory datasets. The obtained results show that the detection mechanism is deployable and capable of detecting various types of phishing attacks while maintaining a low rate of false alarms.
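The abstract does not list which URL characteristics are inspected. The sketch below extracts a few lexical features that are commonly used in URL-based phishing detection, using only the Python standard library. The feature set, the names, and the sample URLs are illustrative assumptions, not the characteristics used in the paper.

```python
import ipaddress
from urllib.parse import urlparse


def host_is_ip(host):
    """True if the host is a literal IPv4/IPv6 address, not a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url):
    """Extract a small, illustrative set of lexical URL features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "host_is_ip": host_is_ip(host),
        "has_at_symbol": "@" in url,
        "host_dot_count": host.count("."),
        "has_hyphen_in_host": "-" in host,
    }


# Hypothetical URLs, for illustration only.
for sample in ("https://www.example.com/", "http://192.168.0.1/login"):
    print(sample, url_features(sample))
```

Feature dictionaries like these could be fed to a rule-based check or to any of the classifiers mentioned elsewhere on this page; which features actually discriminate well is an empirical question.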

43 citations


"A Novel Machine Learning Approach t..." refers background or methods in this paper

  • ...In this paper [3], the detection of phishing websites was done...

    [...]

  • ...through some of the many features that one can extract about a URL [3]....

    [...]

Proceedings Article
01 Sep 2017
TL;DR: This paper aims to list and identify the important features for machine learning-based detection of phishing websites.
Abstract: Phishing websites are malicious sites that impersonate legitimate web pages and aim to obtain users' important information such as user ID, password, and credit card information. Detection of these phishing sites is a very challenging problem because phishing is mainly a semantics-based attack, which abuses human vulnerabilities rather than network or system vulnerabilities. As software detection schemes, two main approaches are widely used: blacklists/whitelists and machine learning approaches. Machine learning solutions are able to detect zero-hour phishing attacks and adapt better to new types of phishing attacks, so they are generally preferred. To use this type of solution, the input features must be selected carefully, because the overall performance of the solution depends on these features. Therefore, this paper aims to list and identify the important features for machine learning-based detection of phishing websites.

43 citations