Proceedings Article

Detecting spam and phishing mails using SVM and obfuscation URL detection algorithm

TL;DR: Proposes a system that combines the SVM technique with the map-reduce paradigm to detect spam email more accurately, and uses map-reduce to try to overcome two hurdles of the SVM.
Abstract: Phishing is a criminal scheme to steal a user's personal data and credentials. It is a fraud that acquires the victim's confidential information, such as passwords, bank account details, credit card numbers, and financial usernames and passwords, which the attacker can later misuse. We aim to use fundamental visual features of a web page's appearance as the basis for detecting page similarities. We propose a novel solution to efficiently detect phishing web pages. Note that page layouts and contents are fundamental features of a web page's appearance. Since the standard way to specify page layouts is through style sheets (CSS), we develop an algorithm to detect similarities in key elements related to CSS. In this paper, we propose a system that uses the SVM technique along with the map-reduce paradigm to achieve higher accuracy in detecting spam email. By using the map-reduce technique, we also try to overcome the two hurdles of the SVM.
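As a rough illustration of how an SVM could be combined with map-reduce, the sketch below trains a linear SVM on each data partition (the map step) and averages the resulting weight vectors (the reduce step). The abstract does not specify the partitioning scheme, the SVM solver, or the merge rule, nor does it name the two hurdles, so the Pegasos-style trainer, the weight averaging, and all names (`train_linear_svm`, `map_reduce_svm`, the toy feature vectors) are assumptions made for this example.

```python
import random

def train_linear_svm(samples, epochs=20, lam=0.01, seed=0):
    """Pegasos-style sub-gradient descent on the hinge loss (no bias term)."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    step = 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x, y in order:
            step += 1
            eta = 1.0 / (lam * step)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink the weights (regularisation), then correct on a margin violation.
            w = [wi * (1.0 - eta * lam) for wi in w]
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def map_reduce_svm(samples, n_partitions=2):
    # Map: train one model per partition (assumed round-robin split).
    partitions = [samples[i::n_partitions] for i in range(n_partitions)]
    models = [train_linear_svm(part) for part in partitions if part]
    # Reduce: average the per-partition weight vectors (assumed merge rule).
    return [sum(ws) / len(models) for ws in zip(*models)]

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

# Toy, already-centred features (hypothetical): +1 = spam, -1 = legitimate.
DATA = [((2.0, -1.0), 1), ((3.0, -2.0), 1), ((2.0, -2.0), 1), ((3.0, -1.0), 1),
        ((-2.0, 1.0), -1), ((-3.0, 2.0), -1), ((-2.0, 2.0), -1), ((-3.0, 1.0), -1)]

weights = map_reduce_svm(DATA)
print([predict(weights, x) == y for x, y in DATA])
```

In a real deployment the map step would run on separate workers; here the partitions are processed in a plain loop so the example stays self-contained.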
Citations
Journal Article
TL;DR: This study reviews the types of phishing attacks and the current methods for preventing them, then focuses on one machine learning algorithm and discusses in detail how it is implemented.
Abstract: The development of computer networks today has increased rapidly. This can be seen based on the trend of computer users around the world, whereby they need to connect their computer to the Internet. This shows that the use of Internet networks is very important, whether for work purposes or access to social media accounts. However, in widely using this computer network, the privacy of computer users is in danger, especially for computer users who do not install security systems in their computer. This problem will allow hackers to hack and commit network attacks. This is very dangerous, especially for Internet users because hackers can steal confidential information such as bank login account or social media login account. The attacks that can be made include phishing attacks. The goal of this study is to review the types of phishing attacks and current methods used in preventing them. Based on the literature, the machine learning method is widely used to prevent phishing attacks. There are several algorithms that can be used in the machine learning method to prevent these attacks. This study focused on an algorithm that was thoroughly made and the methods in implementing this algorithm are discussed in detail.

18 citations


Cites background or methods from "Detecting spam and phishing mails u..."

  • ...There are some attributes that are commonly used by SVM algorithm to detect phishing websites as listed in Table 1 [31]....

    [...]

  • ...Good in handling large attributes and large amount of data [31]....

    [...]

  • ...According to [31], SVM algorithm is a linear strong classifier which can identify two label classes in the dataset....

    [...]

Journal Article
TL;DR: The e-mail phishing detection is performed in this paper using the optimization-based deep learning networks and it is clear that the proposed method acquired the maximal accuracy, sensitivity and specificity compared with the existing methods.
Abstract: Phishing is a serious cybersecurity problem that is carried out widely through channels such as e-mail and Short Messaging Service (SMS) to collect the personal information of individuals. The rapid growth of unsolicited and unwanted messages needs to be addressed, which calls for effective anti-phishing methods. The primary intention of this research is to design and develop an approach for preventing phishing by proposing an optimization algorithm. The proposed approach involves four steps, namely preprocessing, feature extraction, feature selection and classification, for dealing with phishing e-mails. Initially, the input data set is subjected to preprocessing, which removes stop words and applies stemming, and the preprocessed output is given to the feature extraction process. By extracting keyword frequency from the preprocessed data, the important words are selected as the features. Then, the feature selection process is carried out using the Bhattacharyya distance such that only the significant features that can aid the classification are selected. Using the selected features, the classification is done using the deep belief network (DBN) that is trained using the proposed fractional-earthworm optimization algorithm (EWA). The proposed fractional-EWA is designed by the integration of EWA and fractional calculus to determine the weights in the DBN optimally. The accuracy of the methods, naive Bayes (NB), DBN, neural network (NN), EWA-DBN and fractional EWA-DBN is 0.5333, 0.5455, 0.5556, 0.5714 and 0.8571, respectively. The sensitivity of the methods, NB, DBN, NN, EWA-DBN and fractional EWA-DBN is 0.4558, 0.5631, 0.7035, 0.7045 and 0.8182, respectively. Likewise, the specificity of the methods, NB, DBN, NN, EWA-DBN and fractional EWA-DBN is 0.5052, 0.5631, 0.7028, 0.7040 and 0.8800, respectively. It is clear from the comparative table that the proposed method acquired the maximal accuracy, sensitivity and specificity compared with the existing methods. The e-mail phishing detection is performed in this paper using the optimization-based deep learning networks. The e-mails include a number of unwanted messages that are to be detected in order to avoid storage issues. The importance of the method is that the inclusion of the historical data in the detection process enhances the accuracy of detection.
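The first three steps of the pipeline described in this abstract (preprocessing, keyword extraction, and distance-based feature selection) can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list, the crude suffix stripping that stands in for a real stemmer, the use of smoothed per-class presence rates instead of raw keyword frequencies, and every function name are hypothetical. The DBN classifier and the fractional-EWA training step are not shown.

```python
import math
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "to", "your", "is", "and", "of", "for", "at", "in", "on"}

def preprocess(text):
    """Lower-case, tokenise, drop stop words, and strip a few common suffixes."""
    tokens = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in STOP_WORDS:
            continue
        for suffix in ("ing", "ed", "s"):
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
                break
        tokens.append(tok)
    return tokens

def presence_rate(term, docs):
    """Laplace-smoothed fraction of documents (token sets) containing the term."""
    hits = sum(1 for doc in docs if term in doc)
    return (hits + 1) / (len(docs) + 2)

def bhattacharyya(p, q):
    """Bhattacharyya distance between two Bernoulli distributions."""
    coefficient = math.sqrt(p * q) + math.sqrt((1 - p) * (1 - q))
    return -math.log(coefficient)

def select_features(phish_texts, ham_texts, k):
    """Keep the k terms whose presence differs most between the two classes."""
    phish = [set(preprocess(t)) for t in phish_texts]
    ham = [set(preprocess(t)) for t in ham_texts]
    vocab = set().union(*phish, *ham)
    score = {t: bhattacharyya(presence_rate(t, phish), presence_rate(t, ham))
             for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]

PHISH = ["Verify your account password now",
         "Your account is suspended, verify password"]
HAM = ["Meeting notes for the project", "Lunch at noon on Friday"]
print(select_features(PHISH, HAM, 3))
```

Terms that appear equally often in both classes score zero and are dropped first, which is the intuition behind using a distribution distance for selection.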

11 citations

Proceedings Article
01 Nov 2019
TL;DR: The combination of OSS and SMOTE can be a plausible option for handling the imbalanced class problem in web phishing classification on both binary and multiclass datasets.
Abstract: Previous work on web phishing has overlooked the imbalanced class problem in the dataset. Theoretically, the majority of classification methods assume that the class distribution is balanced. When it is not, the method's classification performance declines. Therefore, a mechanism for handling imbalanced classes is strongly needed. In our study, One-Sided Selection (OSS) and the Synthetic Minority Over-Sampling Technique (SMOTE) are used to handle the imbalanced class condition. These algorithms balance the class distribution of the dataset so that the accuracy and the gmean score of the classification are enhanced. Based on the results, the combination of these methods (OSS and SMOTE) can significantly enhance the classification results on both binary and multiclass datasets. Hence, the combination of OSS and SMOTE can be a plausible option for handling the imbalanced class problem in web phishing classification on both binary and multiclass datasets.
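To make the over-sampling half of this combination concrete, here is a minimal sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbours. The function names, the neighbour count `k`, and the toy points are assumptions for illustration. The under-sampling step (One-Sided Selection) is not shown, and how the paper chains the two is not stated in the abstract.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen base point.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical minority-class (phishing) feature vectors.
MINORITY = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
NEW_POINTS = smote(MINORITY, n_new=4)
print(NEW_POINTS)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies.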

8 citations

14 Apr 2018
TL;DR: A hybrid multi-layer model using Natural Language Processing (NLP) techniques for defending against phishing attacks is proposed, which opens a new prospect for detecting a potential attacker trying to manipulate the victim into revealing confidential information.
Abstract: Nowadays, social engineering is considered one of the most overwhelming threats in cyber security. Social engineers, who deceive people by using their personal appeal through cunning communication, do not rely on finding vulnerabilities to break into cyberspace as traditional hackers do. Instead, they engage in deceptive communication with victims that often enables them to gain confidential information, such as credentials, to compromise cyber security. Phishing has become one of the most commonly used social engineering methods in daily life. Since the attacker does not rely on technical vulnerabilities, social engineering, and phishing attacks in particular, cannot be tackled using cyber security tools like firewalls, IDSs (Intrusion Detection Systems), etc. What is more, the increased popularity of social media has further complicated the problem by making available an abundance of information that can be used against victims. The objective of this paper is to propose a new framework that characterizes the behavior of phishing attacks, and a comprehensive model for describing awareness, measurement and defense of phishing-based attacks. To be specific, we propose a hybrid multi-layer model using Natural Language Processing (NLP) techniques for defending against phishing attacks. The model opens a new prospect for detecting a potential attacker trying to manipulate the victim into revealing confidential information.

6 citations


Cites methods from "Detecting spam and phishing mails u..."

  • ...[19] presents an approach for detecting spam and phishing emails using SVM (Support Vector Machine) and Obfuscation URL Detection algorithm....

    [...]

Proceedings Article
01 Nov 2019
TL;DR: The addition of a meta-algorithm is proposed to improve classification performance in the development of web phishing detection systems.
Abstract: Web phishing is one of the many crimes that occur in cyberspace and often threatens internet users around the world. Web phishing works by luring the victim to a web page that has been designed to resemble the original page and then directing the target to submit important information. Web phishing detection systems need to be developed to minimize attacks and theft of information through websites. Many researchers have studied web phishing detection, some of them using data mining techniques, but existing work still uses a single classification algorithm. Therefore, the addition of a meta-algorithm is proposed to improve classification performance in the development of web phishing detection systems. In testing on the Web Phishing dataset from the UCI Machine Learning Repository, accuracy reached 97.1% with the addition of the bagging process, 97.3% with the boosting process, and 97.5% with the stacking process. With the resulting improved performance, it is hoped that the model can be used as a reference in perfecting the development of various web phishing detection systems.
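Of the three meta-algorithms named in this abstract, bagging is the simplest to sketch: several base classifiers are trained on bootstrap samples and combined by majority vote. The example below uses one-feature decision stumps as the base learner. The base learner, the single numeric feature, and all names are assumptions for illustration and are not taken from the paper.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Pick the threshold t that minimises errors for the rule: predict 1 if x >= t."""
    xs = sorted(x for x, _ in sample)
    candidates = xs + [xs[-1] + 1.0]  # the extra candidate allows "always predict 0"

    def errors(t):
        return sum((1 if x >= t else 0) != y for x, y in sample)

    return min(candidates, key=errors)

def bagging_fit(data, n_models=5, seed=0):
    """Train each stump on a bootstrap sample (same size, drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]

def majority_vote(votes):
    return Counter(votes).most_common(1)[0][0]

def bagging_predict(thresholds, x):
    return majority_vote([1 if x >= t else 0 for t in thresholds])

# Hypothetical feature: a count of suspicious page traits; label 1 = phishing.
DATA = [(0.0, 0), (1.0, 0), (2.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
ENSEMBLE = bagging_fit(DATA)
print(bagging_predict(ENSEMBLE, 8.0), bagging_predict(ENSEMBLE, 0.5))
```

Boosting and stacking differ in how the base models are trained and combined (reweighting hard examples, or learning a second-level model over base predictions), so they would need separate sketches.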

5 citations


Cites background from "Detecting spam and phishing mails u..."

  • ...Website is no longer used only as a medium to convey information but is also used as a medium of communication and social interaction such as social media and transaction media in the form of e-commerce and I-banking which are banking transactions [3], [4]....

    [...]

References
Journal Article
TL;DR: PhishStorm, an automated phishing detection system that can analyze in real time any URL in order to identify potential phishing sites, is introduced and the new concept of intra-URL relatedness is defined and evaluated.
Abstract: Despite the growth of prevention techniques, phishing remains an important threat since the principal countermeasures in use are still based on reactive URL blacklisting. This technique is inefficient due to the short lifetime of phishing Web sites, making recent approaches relying on real-time or proactive phishing URLs detection techniques more appropriate. In this paper we introduce PhishStorm, an automated phishing detection system that can analyse in real-time any URL in order to identify potential phishing sites. PhishStorm can interface with any email server or HTTP proxy. We argue that phishing URLs usually have few relationships between the part of the URL that must be registered (low level domain) and the remaining part of the URL (upper level domain, path, query). We show in this paper that experimental evidence supports this observation and can be used to detect phishing sites. For this purpose, we define the new concept of intra-URL relatedness and evaluate it using features extracted from words that compose a URL based on query data from Google and Yahoo search engines. These features are then used in machine learning based classification to detect phishing URLs from a real dataset. Our technique is assessed on 96,018 phishing and legitimate URLs that results in a correct classification rate of 94.91% with only 1.44% false positives. An extension for a URL phishingness rating system exhibiting high confidence rate (> 99%) is proposed. We discuss in the paper efficient implementation patterns that allow real time analytics using Big Data architectures like STORM and advanced data structures based on Bloom filter.

148 citations

Proceedings Article
09 Sep 2013
TL;DR: Presents BaitAlarm, a new solution that detects phishing attacks using features that are hard to evade, together with an algorithm that quantifies the suspiciousness of web pages based on the similarity of their visual appearance.
Abstract: In this paper, we present a new solution, BaitAlarm, to detect phishing attacks using features that are hard to evade. The intuition of our approach is that phishing pages need to preserve the visual appearance of the target pages. We present an algorithm to quantify the suspicious ratings of web pages based on the similarity of visual appearance between the web pages. Since CSS is the standard technique to specify page layout, our solution uses CSS as the basis for detecting visual similarities among web pages. We prototyped our approach as a Google Chrome extension and used it to rate the suspiciousness of web pages. The prototype shows the correctness and accuracy of our approach with a relatively low performance overhead.
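One simple way to compare two pages by their CSS is to flatten each stylesheet into (selector, property, value) triples and measure the overlap. The sketch below does this with a Jaccard ratio. It is not BaitAlarm's actual scoring algorithm, which the abstract does not describe; the regex-based parser, the Jaccard measure, and the names `css_triples` and `css_similarity` are assumptions for illustration, and the parser ignores comments, at-rules, and nested blocks.

```python
import re

def css_triples(css):
    """Flatten a stylesheet into a set of (selector, property, value) triples."""
    triples = set()
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        for declaration in body.split(";"):
            prop, sep, value = declaration.partition(":")
            if sep:  # skip empty or malformed declarations
                triples.add((selector.strip(),
                             prop.strip().lower(),
                             value.strip().lower()))
    return triples

def css_similarity(css_a, css_b):
    """Jaccard similarity of the two triple sets, from 0.0 to 1.0."""
    a, b = css_triples(css_a), css_triples(css_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

TARGET = "body { color: red; margin: 0 } h1 { font-size: 2em }"
SUSPECT = "body { color: red; margin: 0 } h1 { font-size: 3em }"
print(css_similarity(TARGET, SUSPECT))
```

A high score against a known target page, served from a different domain, would be the kind of signal a visual-similarity detector looks for.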

61 citations

Proceedings Article
27 Jun 2015
TL;DR: This paper describes an approach that classifies URLs automatically based on their lexical and host-based features, and achieves 93-98% accuracy by detecting a large number of phishing hosts, while maintaining a modest false positive rate.
Abstract: The openness of the Web exposes opportunities for criminals to upload malicious content. In fact, despite extensive research, email based spam filtering techniques are unable to protect other web services. Therefore, a counter measure must be taken that generalizes across web services to protect the user from phishing host URLs. This paper describes an approach that classifies URLs automatically based on their lexical and host-based features. Clustering is performed on the entire dataset and a cluster ID (or label) is derived for each URL, which in turn is used as a predictive feature by the classification system. Online URL reputation services are used in order to categorize URLs and the categories returned are used as a supplemental source of information that would enable the system to rank URLs. The classifier achieves 93-98% accuracy by detecting a large number of phishing hosts, while maintaining a modest false positive rate. URL clustering, URL classification, and URL categorization mechanisms work in conjunction to give URLs a rank.
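The lexical side of such a feature set can be computed from the URL string alone. The sketch below extracts a handful of commonly used lexical features with Python's standard library. The specific features chosen and the function names are assumptions for illustration; the abstract does not list the paper's features. Host-based features (which need network lookups), clustering, and reputation services are not shown.

```python
import ipaddress
from urllib.parse import urlsplit

def host_is_ip(host):
    """True when the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False

def lexical_features(url):
    """Compute simple string-level features of a URL (hypothetical feature set)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "length": len(url),
        "host_dots": host.count("."),
        "host_hyphens": host.count("-"),
        "host_is_ip": host_is_ip(host),
        "has_at_sign": "@" in url,
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "query_params": len([p for p in parts.query.split("&") if p]),
    }

EXAMPLE_URL = "http://paypal.com.secure-login.example.net/a/b?x=1"
FEATURES = lexical_features(EXAMPLE_URL)
print(FEATURES)
```

The resulting dictionary can be turned into a numeric vector and passed to any classifier.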

60 citations

Proceedings Article
27 Apr 2014
TL;DR: A new phishing detection approach based on URL features, which focuses on the similarity between a phishing site's URL and a legitimate site's URL; results show that the technique can detect over 97% of phishing sites.
Abstract: Together with the growth of e-commerce transactions, phishing (the act of stealing personal information) is rising in quantity and quality. Phishers try to make fake sites look similar to legitimate sites in terms of interface and uniform resource locator (URL) address. The number of victims has therefore been increasing, because blacklist-based methods are inefficient at detecting phishing. This paper proposes a new phishing detection approach based on the features of the URL. Specifically, the proposed method focuses on the similarity between a phishing site's URL and a legitimate site's URL. In addition, the ranking of the site is also considered an important factor in deciding whether the site is a phishing site. The proposed technique is evaluated with a dataset of 11,660 phishing sites and 5,000 legitimate sites. The results show that the technique can detect over 97% of phishing sites.
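A common way to measure how closely a host name imitates a legitimate one is edit distance. The sketch below flags a host that is within a small Levenshtein distance of a known legitimate domain without being identical to any of them. The abstract does not say which similarity measure the paper uses, so the choice of Levenshtein distance, the threshold `max_dist`, the domain list, and the function names are assumptions for illustration. The site-ranking factor is not shown.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def looks_like_spoof(host, legit_domains, max_dist=2):
    """Return the imitated domain if host is near, but not equal to, a legit one."""
    if host in legit_domains:
        return None
    for domain in legit_domains:
        if levenshtein(host, domain) <= max_dist:
            return domain
    return None

# Hypothetical whitelist of legitimate domains.
LEGIT = ["paypal.com", "example.org"]
print(looks_like_spoof("paypa1.com", LEGIT))
```

An exact match returns `None` because the host is the legitimate site itself; only near misses are treated as suspicious.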

57 citations

Journal Article
TL;DR: Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart.

46 citations