Detecting spam and phishing mails using SVM and obfuscation URL detection algorithm

doi:10.1109/ICISC.2017.8068633


Proceedings Article•DOI•


Prajakta S. Patil¹, Rashmi A. Rane¹, Madhuri Bhalekar¹•Institutions (1)

Maharashtra Institute of Technology¹

01 Jan 2017-pp 1-4

TL;DR: The paper proposes a system that combines the SVM technique with the map-reduce paradigm to achieve higher accuracy in spam email detection, and that tries to overcome the two hurdles of SVM.


Abstract: Phishing is a criminal scheme to steal a user's personal data and credentials. It is a fraud that acquires a victim's confidential information, such as passwords, bank account details, credit card numbers, and financial usernames and passwords, which the attacker can later misuse. We aim to use fundamental visual features of a web page's appearance as the basis for detecting page similarities, and we propose a novel solution to efficiently detect phishing web pages. Page layouts and contents are fundamental features of a web page's appearance. Since the standard way to specify page layouts is through style sheets (CSS), we develop an algorithm to detect similarities in key elements related to CSS. In this paper, we propose a system that uses the SVM technique along with the map-reduce paradigm to achieve higher accuracy in detecting spam email. By using the map-reduce technique, we also try to overcome the two hurdles of SVM.


Citations


Journal Article•DOI•

Review of the machine learning methods in the classification of phishing attack


John Arthur Jupin¹, Tole Sutikno², Mohd Arfian Ismail¹, Mohd Saberi Mohamad³, Shahreen Kasim⁴, Deris Stiawan⁵•Institutions (5)

Universiti Malaysia Pahang¹, Universitas Ahmad Dahlan², Universiti Malaysia Kelantan³, Universiti Tun Hussein Onn Malaysia⁴, Sriwijaya University⁵

01 Dec 2019-Bulletin of Electrical Engineering and Informatics

TL;DR: This study focused on an algorithm that was thoroughly made and the methods in implementing this algorithm are discussed in detail, which can be used in the machine learning method to prevent phishing attacks.


Abstract: The development of computer networks today has increased rapidly. This can be seen based on the trend of computer users around the world, whereby they need to connect their computer to the Internet. This shows that the use of Internet networks is very important, whether for work purposes or access to social media accounts. However, in widely using this computer network, the privacy of computer users is in danger, especially for computer users who do not install security systems in their computer. This problem will allow hackers to hack and commit network attacks. This is very dangerous, especially for Internet users because hackers can steal confidential information such as bank login account or social media login account. The attacks that can be made include phishing attacks. The goal of this study is to review the types of phishing attacks and current methods used in preventing them. Based on the literature, the machine learning method is widely used to prevent phishing attacks. There are several algorithms that can be used in the machine learning method to prevent these attacks. This study focused on an algorithm that was thoroughly made and the methods in implementing this algorithm are discussed in detail.


18 citations

Cites background or methods from "Detecting spam and phishing mails u..."

...There are some attributes that are commonly used by SVM algorithm to detect phishing websites as listed in Table 1 [31]....
...Good in handling large attributes and large amount of data [31]....
...According to [31], SVM algorithm is a linear strong classifier which can identify two label classes in the dataset....

Journal Article•DOI•

An optimization-based deep belief network for the detection of phishing e-mails


M Arshey, K S Angel Viji

16 Jul 2020-Drug Testing and Analysis

TL;DR: The e-mail phishing detection is performed in this paper using the optimization-based deep learning networks and it is clear that the proposed method acquired the maximal accuracy, sensitivity and specificity compared with the existing methods.


Abstract: Phishing is a serious cybersecurity problem that spreads widely through media such as e-mail and Short Messaging Service (SMS) to collect individuals' personal information. However, the rapid growth of unsolicited and unwanted information needs to be addressed, which calls for effective anti-phishing methods. The primary intention of this research is to design and develop an approach for preventing phishing by proposing an optimization algorithm. The proposed approach involves four steps for dealing with phishing e-mails: preprocessing, feature extraction, feature selection and classification. Initially, the input data set is preprocessed by removing stop words and applying stemming, and the preprocessed output is passed to the feature extraction process. Keyword frequencies are extracted from the preprocessed data, and the important words are selected as features. Feature selection is then carried out using the Bhattacharyya distance, so that only the significant features that can aid classification are retained. Using the selected features, classification is performed by a deep belief network (DBN) trained with the proposed fractional-earthworm optimization algorithm (EWA). The proposed fractional-EWA integrates EWA with fractional calculus to determine the weights of the DBN optimally. The accuracy of the methods naive Bayes (NB), DBN, neural network (NN), EWA-DBN and fractional EWA-DBN is 0.5333, 0.5455, 0.5556, 0.5714 and 0.8571, respectively. The sensitivity of NB, DBN, NN, EWA-DBN and fractional EWA-DBN is 0.4558, 0.5631, 0.7035, 0.7045 and 0.8182, respectively. Likewise, the specificity of NB, DBN, NN, EWA-DBN and fractional EWA-DBN is 0.5052, 0.5631, 0.7028, 0.7040 and 0.8800, respectively. The comparative table shows that the proposed method achieved the highest accuracy, sensitivity and specificity among the methods compared. E-mail phishing detection is performed in this paper using optimization-based deep learning networks. E-mails include many unwanted messages that must be detected in order to avoid storage issues. The importance of the method is that including historical data in the detection process enhances the accuracy of detection.
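The abstract does not say how the Bhattacharyya distance is applied to keyword frequencies, so the following is only a minimal sketch under stated assumptions: each e-mail is a list of tokens, each word's per-e-mail count is capped into three buckets (0, 1, 2 or more), and words are ranked by the distance between their phishing and legitimate histograms. The function names and the bucketing are hypothetical, not taken from the paper.

```python
import math
from collections import Counter

BUCKETS = (0, 1, 2)  # assumption: per-e-mail counts are capped at 2 ("two or more")

def count_histogram(docs, word):
    """Normalised histogram of how often `word` occurs per tokenised e-mail."""
    hist = Counter(min(doc.count(word), 2) for doc in docs)
    return [hist[b] / len(docs) for b in BUCKETS]

def bhattacharyya_distance(p, q):
    """D_B = -ln(sum_i sqrt(p_i * q_i)) for two discrete distributions."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.inf if coefficient == 0 else -math.log(coefficient)

def rank_features(phish_docs, legit_docs):
    """Words sorted from most to least class-separating (hypothetical helper)."""
    vocabulary = {w for doc in phish_docs + legit_docs for w in doc}
    scores = {
        w: bhattacharyya_distance(count_histogram(phish_docs, w),
                                  count_histogram(legit_docs, w))
        for w in vocabulary
    }
    return sorted(scores, key=scores.get, reverse=True)
```

A word whose count distribution is identical in both classes scores 0, while a word that appears in every phishing e-mail and in no legitimate one scores infinity, so keeping the top-ranked words retains the most discriminative features.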


11 citations

Proceedings Article•DOI•

Hybrid Resampling for Imbalanced Class Handling on Web Phishing Classification Dataset


Yoga Pristyanto, Akhmad Dahlan

01 Nov 2019

TL;DR: The combination of OSS and SMOTE can be a plausible option to handle the imbalanced class problem in web phishing classification on both binary and multiclass datasets.


Abstract: Previous work on web phishing has overlooked the imbalanced class problem in the dataset. Theoretically, most classification methods assume that the class distribution is balanced, so their performance declines on imbalanced data. A mechanism for handling imbalanced classes is therefore needed. In our study, One-Sided Selection (OSS) and the Synthetic Minority Over-sampling Technique (SMOTE) are used to handle the imbalanced class condition. These algorithms balance the class distribution of the dataset so that the accuracy and the g-mean score of the classification improve. Based on the results, the combination of the two methods (OSS and SMOTE) significantly improves the classification results on both binary and multiclass datasets. Hence, the combination of OSS and SMOTE can be a plausible option to handle the imbalanced class problem in web phishing classification on both binary and multiclass datasets.
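To make the two quantities named in the abstract concrete, here is a rough sketch of SMOTE-style oversampling and of the g-mean score. It is an illustration under assumptions, not the authors' pipeline: the One-Sided Selection step is omitted, samples are plain numeric tuples, and `smote_like` and its parameters are hypothetical names.

```python
import math
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity for a binary task."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)
```

Because every synthetic point lies on a segment between two existing minority samples, the new data stays inside the region the minority class already occupies. The g-mean drops to zero whenever one class is never predicted correctly, which is why it is a common companion to accuracy on imbalanced data.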


8 citations

Innovations of Phishing Defense: The Mechanism, Measurement and Defense Strategies


Juan Shan, Kutub Thakur, Al-Sakib Khan Pathan¹•Institutions (1)

Southeast University¹

14 Apr 2018

TL;DR: A hybrid multi-layer model using Natural Language Processing (NLP) techniques for defending against phishing attacks is proposed, which enables a new prospect in detection of a potential attacker trying to manipulate the victim for revealing confidential information.


Abstract: Nowadays, social engineering is considered to be one of the most overwhelming threats in the field of cyber security. Social engineers, who deceive people by using their personal appeal through cunning communication, do not rely on finding vulnerabilities to break into cyberspace as traditional hackers do. Instead, they engage in shifty communication with their victims, which often enables them to gain confidential information such as credentials and so compromise cyber security. The phishing attack has become one of the most commonly used social engineering methods in daily life. Since the attacker does not rely on technical vulnerabilities, social engineering attacks, and phishing attacks in particular, cannot be tackled using cyber security tools such as firewalls and IDSs (Intrusion Detection Systems). What is more, the increased popularity of social media has further complicated the problem by making available an abundance of information that can be used against the victims. The objective of this paper is to propose a new framework that characterizes the behavior of the phishing attack, and a comprehensive model for describing awareness, measurement and defense of phishing-based attacks. To be specific, we propose a hybrid multi-layer model using Natural Language Processing (NLP) techniques for defending against phishing attacks. The model opens a new prospect for detecting a potential attacker who is trying to manipulate the victim into revealing confidential information.


6 citations

Cites methods from "Detecting spam and phishing mails u..."

...[19] presents an approach for detecting spam and phishing emails using SVM (Support Vector Machine) and Obfuscation URL Detection algorithm....

Proceedings Article•DOI•

Meta-Algorithms for Improving Classification Performance in the Web-phishing Detection Process


Anggit Ferdita Nugraha, Luthfia Rahman

01 Nov 2019

TL;DR: The addition of meta-algorithm is proposed to support the improvement of classification performance for the development of various web phishing detection systems.


Abstract: Web phishing is one of the many crimes that occur in cyberspace and often threatens internet users around the world. Web phishing works by luring the victim to a website page that has been designed to resemble the original page and then directing the target to submit the important information they have. Web phishing detection systems need to be developed to minimize attacks and theft of information through websites. Research on web phishing detection systems has been carried out by many researchers, some of it using data mining techniques, but it still uses a single classification algorithm. Therefore, the addition of a meta-algorithm is proposed to support the improvement of classification performance for the development of various web phishing detection systems. In the testing phase, conducted using the Web Phishing dataset from the UCI Machine Learning Repository, an increase in accuracy value of 97.1% is obtained by the addition of the bagging process, 97.3% by using the boosting process, and 97.5% by the addition of the stacking process. With the resulting improved performance, it is hoped that the model can be used as a reference in perfecting the development of various phishing web detection systems.


5 citations

Cites background from "Detecting spam and phishing mails u..."

...Website is no longer used only as a medium to convey information but is also used as a medium of communication and social interaction such as social media and transaction media in the form of e-commerce and I-banking which are banking transactions [3], [4]....

References


Journal Article•DOI•

PhishStorm: Detecting Phishing With Streaming Analytics


Samuel Marchal¹, Jerome Francois¹, Radu State¹, Thomas Engel¹•Institutions (1)

University of Luxembourg¹

04 Dec 2014-IEEE Transactions on Network and Service Management

TL;DR: PhishStorm, an automated phishing detection system that can analyze in real time any URL in order to identify potential phishing sites, is introduced and the new concept of intra-URL relatedness is defined and evaluated.


Abstract: Despite the growth of prevention techniques, phishing remains an important threat since the principal countermeasures in use are still based on reactive URL blacklisting. This technique is inefficient due to the short lifetime of phishing Web sites, making recent approaches relying on real-time or proactive phishing URLs detection techniques more appropriate. In this paper we introduce PhishStorm, an automated phishing detection system that can analyse in real-time any URL in order to identify potential phishing sites. PhishStorm can interface with any email server or HTTP proxy. We argue that phishing URLs usually have few relationships between the part of the URL that must be registered (low level domain) and the remaining part of the URL (upper level domain, path, query). We show in this paper that experimental evidence supports this observation and can be used to detect phishing sites. For this purpose, we define the new concept of intra-URL relatedness and evaluate it using features extracted from words that compose a URL based on query data from Google and Yahoo search engines. These features are then used in machine learning based classification to detect phishing URLs from a real dataset. Our technique is assessed on 96,018 phishing and legitimate URLs that results in a correct classification rate of 94.91% with only 1.44% false positives. An extension for a URL phishingness rating system exhibiting high confidence rate (> 99%) is proposed. We discuss in the paper efficient implementation patterns that allow real time analytics using Big Data architectures like STORM and advanced data structures based on Bloom filter.
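The abstract mentions Bloom-filter-based data structures for real-time analytics without giving details. As a generic illustration only (not PhishStorm's implementation), a Bloom filter can be sketched as below; the class name, bit-array size and number of hash functions are arbitrary assumptions.

```python
import hashlib

class BloomFilter:
    """Probabilistic set: no false negatives, tunable false-positive rate."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

Membership tests touch only a fixed number of bits regardless of how many items were added, which is what makes the structure attractive for streaming workloads. A lookup that returns `True` may be a false positive, so it is best read as "possibly seen".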


148 citations

Proceedings Article•DOI•

BaitAlarm: Detecting Phishing Sites Using Similarity in Fundamental Visual Features


Jian Mao¹, Pei Li¹, Kun Li¹, Tao Wei², Zhenkai Liang³•Institutions (3)

Beihang University¹, Peking University², National University of Singapore³

09 Sep 2013

TL;DR: A new solution, BaitAlarm, to detect phishing attack using features that are hard to evade and an algorithm to quantify the suspicious ratings of web pages based on similarity of visual appearance between the web pages.


Abstract: In this paper, we present a new solution, BaitAlarm, to detect phishing attacks using features that are hard to evade. The intuition of our approach is that phishing pages need to preserve the visual appearance of the target pages. We present an algorithm to quantify the suspicious ratings of web pages based on the similarity of visual appearance between the web pages. Since CSS is the standard technique to specify page layout, our solution uses the CSS as the basis for detecting visual similarities among web pages. We prototyped our approach as a Google Chrome extension and used it to rate the suspiciousness of web pages. The prototype shows the correctness and accuracy of our approach with a relatively low performance overhead.
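The abstract does not describe how BaitAlarm scores CSS similarity. One simple way to illustrate the idea, assumed here and not the published algorithm, is to reduce each style sheet to a set of (selector, property, value) triples and compare the sets with the Jaccard index. The parser below is deliberately naive (no comments, at-rules or nested blocks), and both function names are hypothetical.

```python
import re

def css_declarations(css_text):
    """Extract (selector, property, value) triples from flat CSS rules."""
    declarations = set()
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css_text):
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations.add((selector.strip(),
                                  prop.strip().lower(),
                                  value.strip().lower()))
    return declarations

def css_similarity(css_a, css_b):
    """Jaccard index of the two declaration sets, in the range [0, 1]."""
    a, b = css_declarations(css_a), css_declarations(css_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A page whose style sheet shares most of its declarations with a known target page would receive a score near 1. A threshold on that score could then serve as a suspiciousness rating, with the threshold value left as a tuning decision.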


61 citations

Proceedings Article•DOI•

Phishing URL Detection Using URL Ranking


Mohammed Nazim Feroz¹, Susan Mengel¹•Institutions (1)

Texas Tech University¹

27 Jun 2015

TL;DR: This paper describes an approach that classifies URLs automatically based on their lexical and host-based features, and achieves 93-98% accuracy by detecting a large number of phishing hosts, while maintaining a modest false positive rate.


Abstract: The openness of the Web exposes opportunities for criminals to upload malicious content. In fact, despite extensive research, email based spam filtering techniques are unable to protect other web services. Therefore, a counter measure must be taken that generalizes across web services to protect the user from phishing host URLs. This paper describes an approach that classifies URLs automatically based on their lexical and host-based features. Clustering is performed on the entire dataset and a cluster ID (or label) is derived for each URL, which in turn is used as a predictive feature by the classification system. Online URL reputation services are used in order to categorize URLs and the categories returned are used as a supplemental source of information that would enable the system to rank URLs. The classifier achieves 93-98% accuracy by detecting a large number of phishing hosts, while maintaining a modest false positive rate. URL clustering, URL classification, and URL categorization mechanisms work in conjunction to give URLs a rank.
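The abstract names lexical and host-based URL features but does not list them. The sketch below extracts a handful of commonly used lexical features with Python's standard `urllib.parse`; the particular feature set is an assumption chosen for illustration, and host-based features (which need external lookups) are left out.

```python
import re
from urllib.parse import urlparse, parse_qs

def lexical_features(url):
    """Return a dict of simple lexical features for one URL (illustrative set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        # A dotted-quad host instead of a domain name is a common warning sign.
        "host_is_ipv4": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        # Text before "@" is treated as user info, so the real host comes after it.
        "has_at_symbol": "@" in url,
        "path_depth": len([segment for segment in parsed.path.split("/") if segment]),
        "num_query_params": len(parse_qs(parsed.query)),
    }
```

Each URL becomes a fixed-length numeric record that a clustering or classification step could consume. For example, `lexical_features("http://example.com@evil.test/")` reports `has_at_symbol` as true and measures the host as `evil.test`, not `example.com`.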


60 citations

Proceedings Article•DOI•

A novel approach for phishing detection using URL-based heuristic


Luong Anh Tuan Nguyen¹, Ba Lam To², Huu Khuong Nguyen¹, Minh-Hoang Nguyen•Institutions (2)

Ho Chi Minh City University of Transport¹, Duy Tan University²

27 Apr 2014

TL;DR: A new phishing detection approach based on the features of URL, which focuses on the similarity of a phishing site's URL and a legitimate site's URL and shows that the technique can detect over 97% of phishing sites.


Abstract: Together with the growth of e-commerce transactions, phishing (the act of stealing personal information) is rising in quantity and quality. Phishers try to make fake sites look similar to legitimate sites in terms of interface and uniform resource locator (URL) address. As a result, the number of victims has been increasing, because blacklist-based methods are inefficient at detecting phishing. This paper proposes a new phishing detection approach based on the features of the URL. Specifically, the proposed method focuses on the similarity between a phishing site's URL and a legitimate site's URL. In addition, the ranking of the site is also considered an important factor in deciding whether the site is a phishing site. The proposed technique is evaluated with a dataset of 11,660 phishing sites and 5,000 legitimate sites. The results show that the technique can detect over 97% of phishing sites.
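The abstract does not specify how URL similarity is measured. As one plausible stand-in, assumed here and not taken from the paper, the host name of a candidate URL can be compared against a list of known legitimate hosts using Levenshtein edit distance. The helper names and the `max_distance` threshold are hypothetical.

```python
from urllib.parse import urlparse

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]

def closest_legitimate(url, legitimate_hosts):
    """Return the nearest known host and its edit distance to the URL's host."""
    host = urlparse(url).hostname or ""
    best = min(legitimate_hosts, key=lambda h: levenshtein(host, h))
    return best, levenshtein(host, best)

def looks_like_imitation(url, legitimate_hosts, max_distance=2):
    """Flag hosts that are close to, but not identical to, a legitimate host."""
    _, distance = closest_legitimate(url, legitimate_hosts)
    return 0 < distance <= max_distance
```

With `["paypal.com", "google.com"]` as the legitimate list, `http://paypa1.com/login` is one substitution away from `paypal.com` and is flagged, while `https://paypal.com/` itself has distance 0 and is not. The site-ranking factor mentioned in the abstract is not modelled here.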


57 citations

Journal Article•DOI•

An ontology enhanced parallel SVM for scalable spam filter training


Godwin Caruana¹, Maozhen Li², Yang Liu¹•Institutions (2)

Brunel University London¹, Tongji University²

01 May 2013-Neurocomputing

TL;DR: Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart.


46 citations