Proceedings ArticleDOI

Data mining a way to solve Phishing Attacks

01 Mar 2018-pp 1-5
TL;DR: An architectural model is proposed that uses naive Bayesian classification to differentiate fake E-mails from real E-mails with high accuracy, with the aim of protecting users from leaking their confidential information.
Abstract: With the ever-increasing use of the Internet by different stakeholders in various fields, information on web browsers and servers is highly susceptible to different security attacks. Though high security measures and enhanced techniques are used to protect the information on web browsers and servers, they are still prone to a number of attacks. Phishing is one such type of attack, in which phishers use social engineering methods to trick users and steal their personal or confidential information. Detecting phishing attacks with high accuracy is a challenging research issue. Phishers dupe users into entering their confidential information into websites the phishers have created, and thereby steal the users' vital credentials. Phishing sites are normally detected using a blacklist-based approach, but this approach fails because whitelisted phishing sites cannot be detected with it. This research work aims to use data mining algorithms to analyze E-mails and thereby help prevent phishing attacks. The paper proposes an architectural model that differentiates between fake and real E-mails with high accuracy, using naive Bayesian classification for this purpose. The proposed algorithm works in several stages to detect fake E-mails and hence tries to protect users from leaking their confidential information.
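The abstract names naive Bayesian classification but gives no detail of the paper's stages or features. As a rough illustration only, the following is a minimal sketch of a multinomial naive Bayes text classifier with Laplace smoothing. The whitespace tokenizer, the toy e-mails, the label names and every function name are assumptions made for this example; none of them come from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's features are not given.
    return text.lower().split()


def train(samples):
    """Fit word counts per class from a list of (text, label) pairs."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log-posterior under naive Bayes."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothed log likelihood
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented purely for illustration.
TOY_EMAILS = [
    ("verify your account password now", "phishing"),
    ("urgent update your bank password", "phishing"),
    ("meeting agenda for monday", "legitimate"),
    ("lunch on monday with the team", "legitimate"),
]
model = train(TOY_EMAILS)
print(classify(model, "update your password"))
print(classify(model, "monday meeting"))
```

A real detector would need a labelled corpus and richer features (headers, URLs, sender information). The sketch only shows how class priors and smoothed per-word likelihoods combine into a decision.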
Citations
Journal Article
TL;DR: This work proposes using a trusted device to perform mutual authentication that eliminates reliance on perfect user behavior, thwarts Man-in-the-Middle attacks after setup, and protects a user's account even in the presence of keyloggers and most forms of spyware.
Abstract: Phishing, or web spoofing, is a growing problem: the Anti-Phishing Working Group (APWG) received almost 14,000 unique phishing reports in August 2005, a 56% jump over the number of reports in December 2004 [3]. For financial institutions, phishing is a particularly insidious problem, since trust forms the foundation for customer relationships, and phishing attacks undermine confidence in an institution. Phishing attacks succeed by exploiting a user's inability to distinguish legitimate sites from spoofed sites. Most prior research focuses on assisting the user in making this distinction; however, users must make the right security decision every time. Unfortunately, humans are ill-suited for performing the security checks necessary for secure site identification, and a single mistake may result in a total compromise of the user's online account. Fundamentally, users should be authenticated using information that they cannot readily reveal to malicious parties. Placing less reliance on the user during the authentication process will enhance security and eliminate many forms of fraud. We propose using a trusted device to perform mutual authentication that eliminates reliance on perfect user behavior, thwarts Man-in-the-Middle attacks after setup, and protects a user's account even in the presence of keyloggers and most forms of spyware. We demonstrate the practicality of our system with a prototype implementation.

191 citations

Journal ArticleDOI
TL;DR: A novel machine learning based classification algorithm is proposed that uses heuristic features selected from attributes such as Uniform Resource Locator, source code, session, type of security involved, protocol used, and type of website.
Abstract: A phishing attack is a commonly known attack in which information is stolen from internet users by an intruder. Internet users lose sensitive information such as protected passwords, personal information, and transaction details to the intruders. A phishing attack is normally carried out by manipulating and masking legitimate, frequently used websites in order to gather the personal information of users. The intruders use the personal information and can manipulate the transactions and get definite from them. In the literature, various authors have proposed anti-phishing techniques for websites. Some of these techniques are blacklist- or whitelist-based, heuristic, and visual-similarity-based methods. Even though users employ these techniques, most of them are still attacked by intruders who use phishing to gather their sensitive information. A novel machine learning based classification algorithm is proposed in this paper; it uses heuristic features selected from attributes such as Uniform Resource Locator, source code, session, type of security involved, protocol used, and type of website. The proposed model has been evaluated using five machine learning algorithms: random forest, K-nearest neighbor, decision tree, support vector machine, and logistic regression. Of these models, the random forest algorithm performs best, with an attack detection accuracy of 91.4%. Moreover, the random forest model uses orthogonal and oblique classifiers to select the best classifiers for accurate detection of phishing attacks in websites. Keywords: Phishing attack, Machine Learning, Classification Algorithms, Cyber Security, Heuristic Approach.

6 citations


Cites methods from "Data mining a way to solve Phishing..."

  • ...Sahoo (2018) used a data mining technique to analyse phishing attacks on e-mail and built an architecture model separate regular e-mail from spam mail by using Naïve Bayes classification technique Sridharan and Sivakumar (2018), Sridharan and Chitra(2016), Sridharan and Chitra(2014)....

    [...]

Journal ArticleDOI
TL;DR: This paper provides a systematic review and examination of the state of the art of BEC phishing detection techniques, giving researchers a detailed understanding of the main principles of business email compromise detection, the common Machine Learning (ML) algorithms used, the features used to detect BEC, and the common datasets used.
Abstract: The risk of cyberattacks against businesses has risen considerably, with Business Email Compromise (BEC) schemes taking the lead as one of the most common phishing attack methods. The daily evolution of this assault mechanism’s attack methods has shown a very high level of proficiency against organisations. Since the majority of BEC emails lack a payloader, they have become challenging for organisations to identify or detect using typical spam filtering and static feature extraction techniques. Hence, an efficient and effective BEC phishing detection approach is required to provide an effective solution to various organisations to protect against such attacks. This paper provides a systematic review and examination of the state of the art of BEC phishing detection techniques to provide a detailed understanding of the topic to allow researchers to identify the main principles of BEC phishing detection, the common Machine Learning (ML) algorithms used, the features used to detect BEC phishing, and the common datasets used. Based on the selected search strategy, 38 articles (of 950 articles) were chosen for closer examination. Out of these articles, the contributions of the selected articles were discussed and summarised to highlight their contributions as well as their limitations. In addition, the features of BEC phishing used for detection were provided, as well as the ML algorithms and datasets that were used in BEC phishing detection models were discussed. In the end, open issues and future research directions of BEC phishing detection based on ML were discussed.

2 citations

Journal ArticleDOI
11 Apr 2021
TL;DR: This work focuses on the text data found in the attack, which is semanticized in order to identify malicious intent, and evaluates the approach on a large phishing e-mail test dataset to illustrate its efficacy.
Abstract: In the present world, phishing attacks are the most common and easily targeted attacks. To analyse texts and detect improper statements that indicate phishing attacks, we propose an approach that uses natural language processing (NLP) techniques. Compared to previous work, our approach is different because the emphasis is on the text data found in the attack, which is semanticized in order to identify malicious intent. We have evaluated it using a large phishing e-mail test dataset to illustrate the efficacy of our strategy.

2 citations


Cites methods from "Data mining a way to solve Phishing..."

  • ...Prasanta Kumar Sahoo [3] used Data mining algorithms to detect the fake E-mail using Naïve Bayesian classification....

    [...]

Proceedings ArticleDOI
01 Oct 2019
TL;DR: The comparative analysis indicates that a low false positive rate was achieved for phishing classification, which suggests that anti-phishing application developers can implement the machine learning classification algorithm found to be the best in this study to enhance phishing attack detection and classification.
Abstract: The exponential growth in Internet usage has paved the way for the exploitation of Internet users; a phishing attack is one means of obtaining a victim's confidential details across the Internet without the victim realizing it. A high false positive rate and low accuracy have been a setback in phishing detection. In this research, RandomForest, SysFor, SPAARC, RepTree, RandomTree, LMT, ForestPA, JRip, PART, NNge, OneR, AdaBoostM1, RotationForest, LogitBoost, RseslibKnn, LibSVM, and BayesNet were employed in a comparative analysis of machine learning classifiers. The performance of the classifier algorithms was rated using Accuracy, Precision, Recall, F-Measure, Root Mean Squared Error, Receiver Operating Characteristic Area, Root Relative Squared Error, False Positive Rate, and True Positive Rate, using the WEKA data mining tool. The research revealed that quite a number of classifiers exist which, if properly explored, will yield more accurate results for phishing detection. RandomForest was found to be an excellent classifier, giving the best accuracy of 0.9838 and a false positive rate of 0.017. The comparative analysis indicates that a low false positive rate was achieved for phishing classification, which suggests that anti-phishing application developers can implement the machine learning classification algorithm found to be the best in this study to enhance phishing attack detection and classification.
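Several of the metrics listed in this abstract follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical and are not taken from the study, which reports its figures from WEKA.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fn: phishing samples classified correctly/incorrectly.
    tn/fp: legitimate samples classified correctly/incorrectly.
    Assumes every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called the true positive rate (TPR)
    fpr = fp / (fp + tn)     # false positive rate
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=90, fp=10, tn=90, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these example counts, 10 of the 100 legitimate samples are flagged as phishing, so the false positive rate is 0.1, and 180 of 200 samples are classified correctly, so the accuracy is 0.9.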

1 citation


Cites methods from "Data mining a way to solve Phishing..."

  • ...A combination of some performance metrics was employed in the reviewed of related literature such as AUC [23], [24], Accuracy [13], [20]–[22], [25], [26], [28], Precision [20]–[22], [28], F1-Score [24],Recall [20], [21], [28], FMeasure [21], [22], [28], FPR and FNR [12], [23], [25], [26], [28], ROC [22], TPR [23], [25], [26], [28], TNR [26], [28]....

    [...]

  • ...The performance as obtained after the analysis carried out on the propose classifier Phishing Hybrid Feature-Based (PHFB) by [23] justify PHFB classifier as excellent in comparison to SMO, SVM, TSVM, NB and DT classification algorithms, 97%, 0.7% , 0% and 98.07% representing TPR, FPR, FNR and AUC respectively was achieved....

    [...]

  • ...SVM [13], [20]– [23], KNN [20], [21], AFIS [21], DT [20], [23]–[25], RandomForest [20], [24], [26], RotationForest [13], [24], AdaBoost [20], Naïve Bayes [12], [13], [20], [23], [25], [27], Neural Network [13], [28], J48, IBK and Reinforcement [13], RUSBoost, Gaussian Naïve Bayes and Perception [25], SMO, TSVM and PHFBC [23], CART, C-DT and GBM [24], Logistic Regression [20] the aforementioned classification algorithms in terms of their performance were employed in the reviewed related literatures....

    [...]

  • ...[25] Random Under-Sampling Boosting algorithms (RUSBoost) was used as a classifier in the proposed research that aims at addressing email phishing, SAFE-PC system was develop in other to aid detecting new phishing pranks, as it evolves from existing phishing techniques, furthermore, an evaluation was perform on real world corpus, which aid in comparing SAFE-PC against Sophos an email protection software and Spam Assassin, the performance of SAFE-PC eluded and outperform both Sophos and Spam Assassin in term of email detection by 70%, Gaussian Naïve Bayesian, Decision Tree and Perceptions classifiers were implored to complement RUSBoost, while RUSBoost alongside Gaussian Naïve Bayes learner resulted in a better performance achieving overall accuracy of 97%....

    [...]

  • ...A summary of classification algorithm used in previous study is reflected in Table I In a bid to protect online internet user from revealing sensitive information to malicious entity, [12] proffer a data mining method that will detect a phishing site and notify user about such malicious site, focus of the research was to be able to analyze on distinct email and offer a precautionary measure against phishing attacks, therefore a naïve based classifier algorithm was used to train classifier model, while Naïve Bayesian classification was used to determine between a real and fake emails....

    [...]

References
Journal ArticleDOI
TL;DR: Sometimes a "friendly" email message tempts recipients to reveal more online than they otherwise would, playing right into the sender's hand.
Abstract: Sometimes a "friendly" email message tempts recipients to reveal more online than they otherwise would, playing right into the sender's hand.

995 citations

Proceedings ArticleDOI
06 Jul 2005
TL;DR: A new scheme is proposed, Dynamic Security Skins, that allows a remote web server to prove its identity in a way that is easy for a human user to verify and hard for an attacker to spoof.
Abstract: Phishing is a model problem for illustrating usability concerns of privacy and security because both system designers and attackers battle using user interfaces to guide (or misguide) users. We propose a new scheme, Dynamic Security Skins, that allows a remote web server to prove its identity in a way that is easy for a human user to verify and hard for an attacker to spoof. We describe the design of an extension to the Mozilla Firefox browser that implements this scheme. We present two novel interaction techniques to prevent spoofing. First, our browser extension provides a trusted window in the browser dedicated to username and password entry. We use a photographic image to create a trusted path between the user and this window to prevent spoofing of the window and of the text entry fields. Second, our scheme allows the remote server to generate a unique abstract image for each user and each transaction. This image creates a "skin" that automatically customizes the browser window or the user interface elements in the content of a remote web page. Our extension allows the user's browser to independently compute the image that it expects to receive from the server. To authenticate content from the server, the user can visually verify that the images match. We contrast our work with existing anti-phishing proposals. In contrast to other proposals, our scheme places a very low burden on the user in terms of effort, memory and time. To authenticate himself, the user has to recognize only one image and remember one low entropy password, no matter how many servers he wishes to interact with. To authenticate content from an authenticated server, the user only needs to perform one visual matching operation to compare two images. Furthermore, it places a high burden of effort on an attacker to spoof customized security indicators.

578 citations


"Data mining a way to solve Phishing..." refers background in this paper

  • ...[18] presented a paper titled “dynamic security skins” to enable a remote server to provide a personality that can be easily verified by the consumer....

    [...]

Journal ArticleDOI
TL;DR: A high-level overview of various categories of phishing mitigation techniques is presented, such as: detection, offensive defense, correction, and prevention, which it is believed is critical to present where the phishing detection techniques fit in the overall mitigation process.
Abstract: This article surveys the literature on the detection of phishing attacks. Phishing attacks target vulnerabilities that exist in systems due to the human factor. Many cyber attacks are spread via mechanisms that exploit weaknesses found in end-users, which makes users the weakest element in the security chain. The phishing problem is broad and no single silver-bullet solution exists to mitigate all the vulnerabilities effectively, thus multiple techniques are often implemented to mitigate specific attacks. This paper aims at surveying many of the recently proposed phishing mitigation techniques. A high-level overview of various categories of phishing mitigation techniques is also presented, such as: detection, offensive defense, correction, and prevention, which we believe is critical to present where the phishing detection techniques fit in the overall mitigation process.

396 citations


"Data mining a way to solve Phishing..." refers background in this paper

  • ...Phishing is a crime in which a perpetrator sends the fake e-mail, which appears to come from popular and trusted brand or organization, asking to input personal credential like bank password, username, phone number, address, credit card details, and so forth [1]-[3]....

    [...]

Proceedings ArticleDOI
12 Jul 2006
TL;DR: Passpet is described, a tool that improves both the convenience and security of website logins through a combination of techniques, including password hashing, user-assigned site labels, and password-strengthening measures that defend against dictionary attacks.
Abstract: We describe Passpet, a tool that improves both the convenience and security of website logins through a combination of techniques. Password hashing helps users manage multiple accounts by turning a single memorized password into a different password for each account. User-assigned site labels (petnames) help users securely identify sites in the face of determined attempts at impersonation (phishing). Password-strengthening measures defend against dictionary attacks. Customizing the user interface defends against user-interface spoofing attacks. We propose new improvements to these techniques, discuss how they are integrated into a single tool, and compare Passpet to other solutions for managing passwords and preventing phishing.
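The password-hashing idea in this abstract, one memorized password turned into a different password per account, can be illustrated with a generic key-derivation sketch. This is not Passpet's actual construction, which the abstract does not specify. Using PBKDF2 with the site label as salt, along with the iteration count, the output length and the function name, are all assumptions made for this example.

```python
import base64
import hashlib


def site_password(master_password, site_label, length=16, iterations=100_000):
    """Derive a per-site password from one master password (illustrative only).

    Assumption: PBKDF2-HMAC-SHA256 salted with the user-assigned site label.
    The repeated hashing slows down dictionary attacks on the master password.
    """
    digest = hashlib.pbkdf2_hmac(
        "sha256",
        master_password.encode("utf-8"),
        site_label.encode("utf-8"),
        iterations,
    )
    # Encode the 32-byte digest as text and truncate to the requested length.
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


# Hypothetical master password and site labels.
bank = site_password("correct horse battery staple", "my bank")
mail = site_password("correct horse battery staple", "my mail")
print(bank != mail)
```

Because the derivation is deterministic, the same master password and label always give the same site password, while a different label gives an unrelated one. A password captured by one spoofed site therefore does not reveal the password used elsewhere.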

202 citations


"Data mining a way to solve Phishing..." refers background in this paper

  • ...[13], and passpet [22] provide password hashing for enforcing security to passwords....

    [...]

Book ChapterDOI
27 Feb 2006
TL;DR: In this paper, the authors propose using a trusted device to perform mutual authentication that eliminates reliance on perfect user behavior, thwarts Man-in-the-Middle attacks after setup, and protects a user's account even in the presence of keyloggers and most forms of spyware.
Abstract: Phishing, or web spoofing, is a growing problem: the Anti-Phishing Working Group (APWG) received almost 14,000 unique phishing reports in August 2005, a 56% jump over the number of reports in December 2004 [3]. For financial institutions, phishing is a particularly insidious problem, since trust forms the foundation for customer relationships, and phishing attacks undermine confidence in an institution. Phishing attacks succeed by exploiting a user's inability to distinguish legitimate sites from spoofed sites. Most prior research focuses on assisting the user in making this distinction; however, users must make the right security decision every time. Unfortunately, humans are ill-suited for performing the security checks necessary for secure site identification, and a single mistake may result in a total compromise of the user's online account. Fundamentally, users should be authenticated using information that they cannot readily reveal to malicious parties. Placing less reliance on the user during the authentication process will enhance security and eliminate many forms of fraud. We propose using a trusted device to perform mutual authentication that eliminates reliance on perfect user behavior, thwarts Man-in-the-Middle attacks after setup, and protects a user's account even in the presence of keyloggers and most forms of spyware. We demonstrate the practicality of our system with a prototype implementation.

197 citations