Proceedings ArticleDOI

DC scanner: Detecting phishing attack

TL;DR: This work analyzes the HTML content of emails and of the web pages they refer to, together with the domains of these links, their domain-related authority details, and the script code associated with the web pages, to estimate the probability of a phishing attack.
Abstract: Data mining has been used as a technology in various applications of engineering, the sciences, and other fields to analyze system data and to solve problems. Its applications further extend to detecting cyber-attacks. We present a simple, low-effort approach, similar to data mining, that detects email-based phishing attacks. This work examines the HTML content of emails and of the web pages they refer to. The domains of these links, their domain-related authority details, and the script code associated with the web pages are also analyzed to estimate the probability of a phishing attack.
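As a rough illustration of the first step the abstract describes (inspecting the HTML of an email and the links it refers to), the Python sketch below collects the hostnames of all links in an email body. It is not the authors' implementation: `LinkCollector` and `link_domains` are hypothetical names introduced here, and only the Python standard library is assumed.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_domains(email_html):
    """Return the hostnames of all links found in the email body (hypothetical helper)."""
    parser = LinkCollector()
    parser.feed(email_html)
    domains = []
    for href in parser.hrefs:
        host = urlparse(href).hostname  # None for links without a network location
        if host:
            domains.append(host)
    return domains
```

The resulting hostnames could then be passed to whatever domain or authority checks a detector applies; those checks are outside the scope of this sketch.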
Citations
Journal ArticleDOI
TL;DR: The concept of phishing terms weighting, which evaluates the weight of phishing terms in each email, is introduced; the model reaches 99.1% accuracy with Random Forest, which the authors state is, to their knowledge, the highest accuracy rate for an accredited data set.
Abstract: Phishing attacks are one of the trending cyber-attacks. They use socially engineered messages, sent by professional hackers, that aim to fool users into revealing their sensitive information; the most popular communication channel for those messages is users’ emails. This paper presents an intelligent classification model for detecting phishing emails using knowledge discovery, data mining and text processing techniques. This paper introduces the concept of phishing terms weighting, which evaluates the weight of phishing terms in each email. The pre-processing phase is enhanced by applying text stemming and WordNet ontology to enrich the model with word synonyms. The model applied the knowledge discovery procedures using five popular classification algorithms and achieved a notable enhancement in classification accuracy: 99.1% accuracy was achieved using the Random Forest algorithm and 98.4% using J48, which is, to our knowledge, the highest accuracy rate for an accredited data set. This paper also presents a comparative study with similar proposed classification techniques.

42 citations


Additional excerpts

  • ...KEYWORDS phishing, data mining, email classification, Random Forest, J48....

Book ChapterDOI
06 Jul 2018
TL;DR: Web-based phishing detection using Random Forest is analyzed; some important URL features are identified, and the study shows that detection performance improves with feature selection.
Abstract: Phishing has been a widespread issue for many years, claiming countless victims, some of whom have not even realized that they fell prey. The sole purpose of phishing is to obtain sensitive information from its victims. There has yet to be a consensus on the best way to detect phishing. In this paper, we analyze web-based phishing detection by using Random Forest. Some important URL features are identified and our study shows that the detection performance with feature selection is improved.

21 citations

Journal ArticleDOI
TL;DR: This paper presents a methodology that combines blacklist-based, web content-based, and heuristic-based approaches, using ML algorithms with comprehensive features to allow more accurate phishing attack detection.

13 citations

Journal ArticleDOI
TL;DR: This study analyses and combines phishing emails and phishing web-forms in a single framework, which allows feature extraction and feature model construction and indicates that the feature model from combined sources can detect phishing websites with a higher accuracy.
Abstract: Anti-phishing detection solutions employed in industry use blacklist-based approaches to achieve low false-positive rates, but blacklist approaches utilize website URLs only. This study analyses and combines phishing emails and phishing web-forms in a single framework, which allows feature extraction and feature model construction. The outcome should classify between phishing, suspicious, and legitimate, and detect emerging phishing attacks accurately. The intelligent phishing security approach is based on machine learning techniques, using an Adaptive Neuro-Fuzzy Inference System and a combination of sources from which features are extracted. An experiment was performed using a two-fold cross-validation method to measure the system’s accuracy. The intelligent phishing security approach achieved a higher accuracy. The finding indicates that the feature model from combined sources can detect phishing websites with a higher accuracy. This paper contributes to the phishing field a feature model that combines sources in a single framework. The implication is that phishing attacks evolve rapidly; therefore, regular updates and staying ahead of phishing strategies are the way forward.

9 citations


Cites background from "DC scanner: Detecting phishing atta..."

  • ...Although various studies have concentrated on phishing attacks and used a variety of solutions in the recent years to combat phishing [2], [3]-[6] there is still a lack of accuracy in real-time causing vast amount of losses annually [7]....

References
Proceedings ArticleDOI
08 May 2007
TL;DR: The design, implementation, and evaluation of CANTINA, a novel, content-based approach to detecting phishing web sites, based on the TF-IDF information retrieval algorithm, are presented.
Abstract: Phishing is a significant problem involving fraudulent email and web sites that trick unsuspecting users into revealing private information. In this paper, we present the design, implementation, and evaluation of CANTINA, a novel, content-based approach to detecting phishing web sites, based on the TF-IDF information retrieval algorithm. We also discuss the design and evaluation of several heuristics we developed to reduce false positives. Our experiments show that CANTINA is good at detecting phishing sites, correctly labeling approximately 95% of phishing sites.
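To make the TF-IDF idea mentioned in this abstract concrete, the Python sketch below ranks the terms of a page by TF-IDF against a small reference corpus. It illustrates the general weighting scheme only and is not CANTINA's implementation: `tokenize` and `top_tfidf_terms` are hypothetical names, and the smoothed IDF formula is one common variant chosen here for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_tfidf_terms(page_text, corpus, k=5):
    """Return the k terms of page_text with the highest TF-IDF score.

    corpus is a list of reference documents used to estimate how common
    each term is (hypothetical helper, smoothed IDF assumed).
    """
    tokens = tokenize(page_text)
    if not tokens:
        return []
    term_counts = Counter(tokens)
    doc_term_sets = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(doc_term_sets)
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for terms in doc_term_sets if term in terms)
        # Smoothed IDF: terms absent from the corpus still get a finite weight.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[term] = (count / len(tokens)) * idf
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

Terms that are frequent on the page but rare in the reference corpus rank highest, which is what makes them usable as a compact signature of the page.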

813 citations


Additional excerpts

  • ...Referring CANTINA, Gupta et. al....

  • ...Earlier, J. I. H. Zhang proposed CANTINA [18] to analyze and verify HTML contents of web page refereed by links in emails, domains of URLs found in web pages, also URLs using heuristics approaches....

Proceedings Article
01 Jan 2004
TL;DR: A framework for client-side defense is proposed: a browser plug-in that examines web pages and warns the user when requests for data may be part of a spoof attack.
Abstract: Web spoofing is a significant problem involving fraudulent email and web sites that trick unsuspecting users into revealing private information. We discuss some aspects of common attacks and propose a framework for client-side defense: a browser plug-in that examines web pages and warns the user when requests for data may be part of a spoof attack. While the plugin, SpoofGuard, has been tested using actual sites obtained through government agencies concerned about the problem, we expect that web spoofing and other forms of identity theft will be continuing problems in

487 citations


"DC scanner: Detecting phishing atta..." refers methods in this paper

  • ...Further it has been used in phishing detection by heuristics [10], by human factors [11], by visual similarity [12], by blacklisted web sites [13]....

Journal ArticleDOI
TL;DR: A high-level overview of various categories of phishing mitigation techniques (detection, offensive defense, correction, and prevention) is presented, which the authors believe is critical to showing where phishing detection techniques fit in the overall mitigation process.
Abstract: This article surveys the literature on the detection of phishing attacks. Phishing attacks target vulnerabilities that exist in systems due to the human factor. Many cyber attacks are spread via mechanisms that exploit weaknesses found in end-users, which makes users the weakest element in the security chain. The phishing problem is broad and no single silver-bullet solution exists to mitigate all the vulnerabilities effectively, thus multiple techniques are often implemented to mitigate specific attacks. This paper aims at surveying many of the recently proposed phishing mitigation techniques. A high-level overview of various categories of phishing mitigation techniques is also presented, such as: detection, offensive defense, correction, and prevention, which we believe is critical to present where the phishing detection techniques fit in the overall mitigation process.

396 citations


"DC scanner: Detecting phishing atta..." refers background in this paper

  • ...In terms of authors, phishing is a fraudulent attempt, usually made through email, to steal your personal information [2] ....

  • ...[2] large number of entries in data sets can cause performance and resource constraints....

Proceedings ArticleDOI
01 Jul 2009
TL;DR: This paper used 191 fresh phish that were less than 30 minutes old to conduct two tests on eight anti-phishing toolbars and found that two tools using heuristics to complement blacklists caught significantly more phish initially than those using only blacklists.
Abstract: In this paper, we study the effectiveness of phishing blacklists. We used 191 fresh phish that were less than 30 minutes old to conduct two tests on eight anti-phishing toolbars. We found that 63% of the phishing campaigns in our dataset lasted less than two hours. Blacklists were ineffective when protecting users initially, as most of them caught less than 20% of phish at hour zero. We also found that blacklists were updated at different speeds, and varied in coverage, as 47% - 83% of phish appeared on blacklists 12 hours from the initial test. We found that two tools using heuristics to complement blacklists caught significantly more phish initially than those using only blacklists. However, it took a long time for phish detected by heuristics to appear on blacklists. Finally, we tested the toolbars on a set of 13,458 legitimate URLs for false positives, and did not find any instance of mislabeling for either blacklists or heuristics. We present these findings and discuss ways in which anti-phishing tools can be improved.

362 citations


"DC scanner: Detecting phishing atta..." refers methods in this paper

  • ...Further it has been used in phishing detection by heuristics [10], by human factors [11], by visual similarity [12], by blacklisted web sites [13]....

Journal ArticleDOI
TL;DR: This article develops a real-time system for gathering URL features and is able to train an online classifier that detects malicious Web sites with 99% accuracy over a balanced dataset.
Abstract: Malicious Web sites are a cornerstone of Internet criminal activities. The dangers of these sites have created a demand for safeguards that protect end-users from visiting them. This article explores how to detect malicious Web sites from the lexical and host-based features of their URLs. We show that this problem lends itself naturally to modern algorithms for online learning. Online algorithms not only process large numbers of URLs more efficiently than batch algorithms, they also adapt more quickly to new features in the continuously evolving distribution of malicious URLs. We develop a real-time system for gathering URL features and pair it with a real-time feed of labeled URLs from a large Web mail provider. From these features and labels, we are able to train an online classifier that detects malicious Web sites with 99% accuracy over a balanced dataset.
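The lexical URL features mentioned in this abstract can be pictured with a small Python sketch. The feature set below (lengths, dot count, hyphen in the host, IP-address host, path tokens) is an assumption chosen for illustration and is not the feature set used in the article; `lexical_features` is a hypothetical name, and host-based features and the online classifier are not covered.

```python
import re
from urllib.parse import urlparse


def lexical_features(url):
    """Extract a few simple lexical features from a URL (illustrative set only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        # Hyphens in the host name are sometimes treated as a suspicious sign.
        "has_hyphen_in_host": "-" in host,
        # Raw IPv4 address used in place of a domain name.
        "host_is_ipv4": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # Path split on common delimiters, empty pieces dropped.
        "path_tokens": [t for t in re.split(r"[/\-_.?=&]", parsed.path) if t],
    }
```

A feature dictionary of this kind could be turned into a numeric vector and fed to any classifier, including one trained incrementally as new labeled URLs arrive.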

216 citations


"DC scanner: Detecting phishing atta..." refers background or methods in this paper

  • ...[14] has believed in some symbols like "-" that are rarely used in genuine websites....

  • ...Analyzing domains of URLs, contents of html pages, links in emails to conclude web sites or URLs as phishing ones with the help of data sets have also been seen as remarkable approaches in [14], [15] [16], [17]....