Journal Article

Spam Reduction by using E-mail History and Authentication (SREHA)

TL;DR: It is suggested that no E-mail is good or bad forever, so the proposed model dynamically allows an E-mail to move from one state to another based on the number of spam and ham messages received.
Abstract: Spam messages are today one of the most serious threats to E-mail users. There are several ways to prevent and detect spam messages; the most important is spam filtering. However, filtering sometimes fails to detect some spam messages, or even misclassifies non-spam messages as spam. In this paper, we suggest a new, effective method that reduces spam by integrating prevention and detection techniques in one scheme. The reduction is achieved by considering history and user authentication. The method is based on issuing a certificate to each reliable user during E-mail account creation. E-mail servers use the certificate to discard or forward incoming and outgoing E-mails. Each server maintains a white list, a gray list and a black list according to whether E-mail is classified as spam or ham. The classification is made either by the user or by examining the message contents: whether the message is empty, contains only links without any text, or contains specific keywords in the subject or body. We believe that no E-mail is bad or good forever, so the proposed model dynamically allows an E-mail to move from one state to another based on the number of spam and ham messages received.
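The list mechanism described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the function `looks_like_spam`, the class `SenderReputation`, the keyword set, the thresholds, the "inspect" action for gray senders, and the rule that a sender's state depends on the difference between its ham and spam counts are all assumptions chosen for illustration. The certificate step is omitted.

```python
import re
from collections import defaultdict

# Hypothetical keyword list; the paper does not publish its keywords.
SPAM_KEYWORDS = ("lottery", "winner", "free money")
URL_RE = re.compile(r"https?://\S+")


def looks_like_spam(subject: str, body: str) -> bool:
    """Content checks named in the abstract: empty, links only, or keywords."""
    text = body.strip()
    if not text:
        return True  # empty message
    if not URL_RE.sub("", text).strip():
        return True  # only links, no other text
    combined = f"{subject} {body}".lower()
    return any(word in combined for word in SPAM_KEYWORDS)


class SenderReputation:
    """White/gray/black list driven by running spam and ham counts per sender.

    The thresholds are illustrative assumptions, not values from the paper.
    """

    def __init__(self, black_after: int = 3, white_after: int = 3):
        self.black_after = black_after
        self.white_after = white_after
        self.spam = defaultdict(int)
        self.ham = defaultdict(int)

    def state(self, sender: str) -> str:
        balance = self.ham[sender] - self.spam[sender]
        if balance >= self.white_after:
            return "white"
        if -balance >= self.black_after:
            return "black"
        return "gray"  # unknown or mixed senders

    def record(self, sender: str, is_spam: bool) -> str:
        """Count a classified message; the sender may move in either direction."""
        if is_spam:
            self.spam[sender] += 1
        else:
            self.ham[sender] += 1
        return self.state(sender)

    def action(self, sender: str) -> str:
        return {"white": "forward", "gray": "inspect", "black": "discard"}[self.state(sender)]


if __name__ == "__main__":
    rep = SenderReputation()
    for _ in range(3):
        rep.record("a@example.com", looks_like_spam("", "http://x.example"))
    print(rep.state("a@example.com"), rep.action("a@example.com"))  # black discard
```

Because the state is recomputed from running counts, a blacklisted sender can return to the gray or white list once enough ham is recorded, which matches the idea that no E-mail is bad or good forever.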


Citations
Journal Article
TL;DR: Although none of the algorithms achieved 100% accuracy in sorting spam emails, Rotation Forest came closest.
Abstract: The growing use of email for everyday business transactions and general communication, owing to its cost-effectiveness and efficiency, has made email vulnerable to attacks, including spamming. Spam emails, also called junk emails, are unsolicited, nearly identical messages sent to many recipients at random. In this study, a performance analysis is carried out on several classification algorithms: Bayesian Logistic Regression, Hidden Naïve Bayes, Radial Basis Function (RBF) Network, Voted Perceptron, Lazy Bayesian Rule, LogitBoost, Rotation Forest, NNge, Logistic Model Tree, REP Tree, Naïve Bayes, Multilayer Perceptron, Random Tree and J48. The performance of the algorithms was measured in terms of accuracy, precision, recall, F-measure, root mean squared error, area under the receiver operating characteristic (ROC) curve and root relative squared error, using the WEKA data mining tool. To give a balanced view of the algorithms' performance, no feature selection or performance-boosting method was employed. The research showed that a number of classification algorithms exist that, if properly explored through feature selection, will yield more accurate email classification. Rotation Forest was found to give the best accuracy, 94.2%. Although none of the algorithms achieved 100% accuracy in sorting spam emails, Rotation Forest came closest.
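The metrics named in this abstract can be computed from a binary confusion matrix. The sketch below treats spam as the positive class (label 1), which is an assumption; WEKA reports per-class and weighted-average values, so its figures may be averaged differently. ROC area and root relative squared error are omitted for brevity.

```python
import math


def binary_metrics(y_true, y_pred, spam=1):
    """Accuracy, precision, recall and F-measure with spam as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == spam and p == spam for t, p in pairs)  # spam caught
    fp = sum(t != spam and p == spam for t, p in pairs)  # ham flagged as spam
    fn = sum(t == spam and p != spam for t, p in pairs)  # spam missed
    tn = len(pairs) - tp - fp - fn                       # ham passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f_measure": f_measure}


def rmse(y_true, scores):
    """Root mean squared error between 0/1 labels and predicted spam probabilities."""
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(y_true, scores)) / len(y_true))
```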

34 citations


Cites background from "Spam Reduction by using E-mail Hist..."

  • ...Spam messages are fast growing to be one of the most serious threats to users of E-mail messages because it is a major means of sending threats, including viruses, worms and phishing attacks [4], [5],[6], [7]....


Journal Article
TL;DR: The purpose of this work is to compare several classification techniques on the basis of their performance parameters using a spam dataset; the performance of each classifier is measured with different ratios of testing to training data.
Abstract: Nowadays, people and companies use email to exchange information and messages because it is the fastest and cheapest way to do so. The main problem facing email is undesirable messages, known as spam. Spam can flood the internet with large numbers of copies of the same message, or carry malicious content that harms the user's system and reduces its performance. The purpose of this work is to compare several classification techniques on the basis of their performance parameters using a spam dataset. The performance of the classifiers is measured with different ratios of testing to training data, and is also calculated with and without a low variance filter. Applying the low variance filter improves the accuracy of the KNN classifier by about 9%, while the accuracy of the other classifiers decreases.
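A low variance filter removes features whose values barely change across the dataset. The sketch below assumes population variance compared against a user-chosen threshold; the tool and threshold used in the study are not given here, and the function names are hypothetical. Variance depends on feature scale, so features are often normalized before filtering.

```python
def low_variance_filter(rows, threshold):
    """Return indices of feature columns whose population variance exceeds threshold."""
    n = len(rows)
    keep = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        variance = sum((x - mean) ** 2 for x in col) / n
        if variance > threshold:
            keep.append(j)  # feature varies enough to be informative
    return keep


def apply_filter(rows, keep):
    """Project every row onto the kept feature columns."""
    return [[r[j] for j in keep] for r in rows]
```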

3 citations


Cites methods from "Spam Reduction by using E-mail Hist..."

  • ...[5] shows that, he suggested a new effective method that reduced the spam messages by integrating prevention and detection techniques in one scheme....


Proceedings Article
23 Feb 2021
TL;DR: In this paper, the authors explored machine learning techniques relevant to spam detection and discussed researchers' contributions to controlling the spamming problem with machine learning classifiers, through a comparative study of selected algorithms such as Naive Bayes, clustering techniques, Random Forest, Decision Tree and Support Vector Machine (SVM).
Abstract: Sending and receiving e-mails remains the easiest and fastest form of e-communication, despite the presence of other forms such as social networking. The rise in online transactions through e-mail has contributed globally to a rising rate of spam e-mails, which has become a major problem in computing. Many machine learning techniques are available for detecting such unwanted spam. Despite the significant progress reported in the literature reviewed, no machine learning method has achieved 100% accuracy. Each algorithm uses only a limited set of features and properties for classification, so identifying the best algorithm is an important task in which each algorithm's strengths must be weighed against its limitations. In this paper we explored different machine learning techniques relevant to spam detection and discussed researchers' contributions to controlling the spamming problem with machine learning classifiers, by conducting a comparative study of selected machine learning algorithms such as Naive Bayes, clustering techniques, Random Forest, Decision Tree and Support Vector Machine (SVM).
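Naive Bayes, the first algorithm in the list above, is compact enough to sketch. The block below is a generic multinomial Naive Bayes with add-one (Laplace) smoothing and whitespace tokenization, chosen for illustration; it is not the configuration evaluated in the paper, and `NaiveBayesSpamFilter` is a hypothetical name.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-one smoothing, scored in log space."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors, self.counts, self.totals = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            class_docs = [d for d, lab in zip(docs, labels) if lab == c]
            self.priors[c] = math.log(len(class_docs) / len(docs))
            counter = Counter(tok for d in class_docs for tok in tokenize(d))
            self.counts[c] = counter
            self.totals[c] = sum(counter.values())
            self.vocab.update(counter)
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.priors[c]
            for tok in tokenize(doc):
                # add-one smoothing: words unseen in class c still get non-zero probability
                score += math.log((self.counts[c][tok] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best
```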

2 citations

Journal Article
TL;DR: The model's accuracy in detecting DDoS attacks is high compared with related works, which recorded a detection accuracy of 98%, sensitivity of 96%, specificity of 100% and precision of 100%.
Abstract: Distributed Denial of Service (DDoS) attacks, carried out through the internet, are one of the dangers faced by organizations and institutions operating in cyberspace. Their consequences are that they slow down internet services, make them unavailable, and sometimes destroy systems. The services most affected include online applications and procedures, system and network performance, email and other system resources. The aim of this work is to detect and classify DDoS attack traffic and normal traffic using a multilayer feed-forward artificial neural network (FFANN) as the modeling technique. The input parameters used to train the model are service count, duration, protocol bit, destination byte and source byte, while the output is either DDoS attack traffic or normal traffic. The KDD99 dataset was used for the experiment, which gave the following results: 100% precision, 100% specificity, 100% classification rate and 99.97% sensitivity. The detection rate is 99.98%, the error rate is 0.0179%, and the inconclusive rate is 0%. These results show that the model's accuracy in detecting DDoS attacks is high compared with related works, which recorded a detection accuracy of 98%, sensitivity of 96%, specificity of 100% and precision of 100%.
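Rates such as sensitivity and specificity follow from a confusion matrix with DDoS attack traffic as the positive class. The definitions below are the standard ones and an assumption about the paper's usage; terms such as "detection rate" and "classification rate" are defined differently across papers. The function names are hypothetical.

```python
def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0  # avoid division by zero when a class is empty


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard rates with DDoS attack traffic as the positive class."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": _ratio(tp, tp + fn),  # attacks that were flagged
        "specificity": _ratio(tn, tn + fp),  # normal traffic that was passed
        "precision": _ratio(tp, tp + fp),    # flagged traffic that really was an attack
        "accuracy": _ratio(tp + tn, total),
        "error_rate": _ratio(fp + fn, total),
    }
```

A specificity and precision of 100% both require zero false positives, while a sensitivity below 100% means some attack traffic was classified as normal.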

1 citation

References
Proceedings Article
25 Oct 2004
TL;DR: In this paper, the authors present quantitative data about SMTP traffic to MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) based on packet traces taken in December 2000 and February 2004, and show that the volume of email has increased by 866% between 2000 and 2004.
Abstract: This paper presents quantitative data about SMTP traffic to MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) based on packet traces taken in December 2000 and February 2004. These traces show that the volume of email has increased by 866% between 2000 and 2004. Local mail hosts utilizing black lists generated over 470,000 DNS lookups, which accounts for 14% of all DNS lookups that were observed on the border gateway of CSAIL on a given day in 2004. In comparison, DNS black list lookups accounted for merely 0.4% of lookups in December 2000. The distribution of the number of connections per remote spam source is Zipf-like in 2004, but not so in 2000. This suggests that black lists may be ineffective at fully stemming the tide of spam. We examined seven popular black lists and found that 80% of spam sources we identified are listed in some DNS black list. Some DNS black lists appear to be well-correlated with others, which should be considered when estimating the likelihood that a host is a spam source.
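Each DNS black list check is an ordinary DNS query, which is why black list use shows up in DNS lookup counts. The sketch below follows the common DNSBL convention: the IPv4 octets are reversed and prepended to the list's zone, and listed addresses answer with an A record in 127.0.0.0/8 (RFC 5782). The zone `dnsbl.example.org` is a placeholder, not a real list, and the function names are hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the black list zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """One DNS lookup per check; an answer in 127.0.0.0/8 means the IP is listed."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # NXDOMAIN (or any resolver failure) is treated as "not listed"
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")
```

For example, checking 192.0.2.1 against the placeholder zone queries `1.2.0.192.dnsbl.example.org`. Treating every resolver failure as "not listed" is a simplification; a production filter would distinguish timeouts from NXDOMAIN.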

205 citations

"Spam Reduction by using E-mail Hist..." refers to methods in this paper

  • ...In [13], Jung and Sit check the use of DNS blacklists for address-based filtering of spams....


Proceedings Article
10 Aug 2009
TL;DR: An automated reputation engine, SNARE, is built based on network-level features that can be ascertained without ever looking at a packet's contents, such as the distance in IP space to other email senders or the geographic distance between sender and receiver.
Abstract: Users and network administrators need ways to filter email messages based primarily on the reputation of the sender. Unfortunately, conventional mechanisms for sender reputation--notably, IP blacklists--are cumbersome to maintain and evadable. This paper investigates ways to infer the reputation of an email sender based solely on network-level features, without looking at the contents of a message. First, we study first-order properties of network-level features that may help distinguish spammers from legitimate senders. We examine features that can be ascertained without ever looking at a packet's contents, such as the distance in IP space to other email senders or the geographic distance between sender and receiver. We derive features that are lightweight, since they do not require seeing a large amount of email from a single IP address and can be gleaned without looking at an email's contents--many such features are apparent from even a single packet. Second, we incorporate these features into a classification algorithm and evaluate the classifier's ability to automatically classify email senders as spammers or legitimate senders. We build an automated reputation engine, SNARE, based on these features using labeled data from a deployed commercial spam-filtering system. We demonstrate that SNARE can achieve comparable accuracy to existing static IP blacklists: about a 70% detection rate for less than a 0.3% false positive rate. Third, we show how SNARE can be integrated into existing blacklists, essentially as a first-pass filter.
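The two example features in this abstract can be approximated as below. SNARE's exact definitions may differ: averaging the numeric distance to the k closest other senders, the default k, and receiving coordinates from some geolocation lookup (not shown) are all assumptions, and the function names are hypothetical.

```python
import ipaddress
import math


def ip_space_distance(sender: str, others: list, k: int = 3) -> float:
    """Mean numeric distance from sender to its k closest other senders in IPv4 space."""
    if not others:
        raise ValueError("need at least one other sender")
    s = int(ipaddress.IPv4Address(sender))  # IPv4 address as a 32-bit integer
    dists = sorted(abs(s - int(ipaddress.IPv4Address(o))) for o in others)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def geo_distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance via the haversine formula (mean Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))
```

Both features need only the sender's IP address (and a location for it), which is why such features can be computed from even a single packet.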

162 citations

Proceedings Article
13 Sep 2000
TL;DR: A comparative evaluation of several machine learning algorithms applied to spam filtering, considering both the text of the messages and a set of heuristics for the task, with cost-oriented biasing and evaluation.
Abstract: Spam filtering is a text categorization task with special features that make it interesting and difficult. First, the task has traditionally been performed using heuristics from the domain. Second, a cost model is required to avoid misclassifying legitimate messages. We present a comparative evaluation of several machine learning algorithms applied to spam filtering, considering the text of the messages and a set of heuristics for the task. Cost-oriented biasing and evaluation are performed.
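The need for a cost model can be illustrated with a standard expected-cost decision rule. This is a generic sketch and an assumption about how cost-oriented biasing might be applied; the paper's own cost values and method are not given here, and the function names are hypothetical. Labeling a message as spam is the cheaper choice when p * cost_fn > (1 - p) * cost_fp, which gives the threshold below.

```python
def spam_threshold(cost_fp: float, cost_fn: float = 1.0) -> float:
    """Spam probability above which labeling 'spam' has the lower expected cost.

    cost_fp: cost of blocking a legitimate message (false positive).
    cost_fn: cost of delivering a spam message (false negative).
    """
    return cost_fp / (cost_fp + cost_fn)


def classify(p_spam: float, cost_fp: float, cost_fn: float = 1.0) -> str:
    """Bias the decision toward 'ham' when losing legitimate mail is expensive."""
    return "spam" if p_spam > spam_threshold(cost_fp, cost_fn) else "ham"
```

With equal costs the threshold is 0.5; if blocking a legitimate message is taken to be 9 times as costly as letting spam through, a message needs a spam probability above 0.9 before it is filtered.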

66 citations

"Spam Reduction by using E-mail Hist..." refers to methods in this paper

  • ...Another way to prevent spam is IP blacklist which is the oldest method of anti spam that prevents spam messages depending on the IP address [7]....


Proceedings Article
26 Aug 2010
TL;DR: The solution developed is an offline application that uses the k-Nearest Neighbor (kNN) algorithm and a pre-classified email data set for the learning process for the spam detection filter.
Abstract: Spamming has become a time consuming and expensive problem for which several new directions have been investigated lately. This paper presents a new approach for a spam detection filter. The solution developed is an offline application that uses the k-Nearest Neighbor (kNN) algorithm and a pre-classified email data set for the learning process.
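A kNN spam filter labels a new message by a vote among the most similar pre-classified messages. The sketch below assumes bag-of-words vectors, cosine similarity and k = 3; the features and k used in the paper are not stated here, and the function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(message, training, k=3):
    """training: list of (text, label) pairs classified in advance."""
    query = vectorize(message)
    scored = sorted(((cosine(query, vectorize(text)), label) for text, label in training),
                    key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])  # majority vote among k nearest
    return votes.most_common(1)[0][0]
```

Because kNN stores the training set and compares each new message against all of it, it suits an offline application like the one described, where classification speed matters less than simplicity.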

64 citations

"Spam Reduction by using E-mail Hist..." refers to methods in this paper

  • ...A number of techniques benefit of clustering as a part of their spam detection approach like: clustering followed by KNN classification [10], [11] and clustering followed by SVM classification [12]....


Journal Article
01 Jan 2011
TL;DR: A new method is proposed for clustering spam messages collected in the databases of an antispam system; by analyzing the origins of the collected spam messages, it is possible to identify organized social networks of spammers.
Abstract: A new method is proposed for clustering spam messages collected in the databases of an antispam system. A genetic algorithm is developed to solve the clustering problem. The objective function maximizes the similarity between messages within clusters, with similarity defined by the k-nearest neighbor algorithm. Applying a genetic algorithm to constrained problems faces the difficulty of constantly keeping chromosomes feasible, which slows convergence. Therefore, to accelerate the convergence of the genetic algorithm, a penalty function is used that prevents infeasible chromosomes from occurring when fitness values are ranked. After classification, knowledge extraction is applied to obtain information about the classes. A multidocument summarization method is used to produce an information portrait of each cluster of spam messages. By classifying and parametrizing spam templates, it will also be possible to determine how topics depend on geography (e.g., which subjects prevail in spam messages sent from certain countries). Thus, the proposed system will be capable of revealing targeted information attacks if they occur. By analyzing the origins of the spam messages in the collection, it is possible to identify organized social networks of spammers.
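The clustering objective and the penalty idea can be sketched as a small genetic algorithm. Several parts are substitutions or assumptions: Jaccard similarity over token sets stands in for the paper's kNN-based similarity, an empty cluster is taken as the infeasibility that the penalty discourages, and the operators (binary tournament selection, one-point crossover, per-gene mutation, elitism) are generic choices. All names are hypothetical.

```python
import random


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0


def fitness(chromosome, messages, n_clusters, penalty=10.0):
    """Sum of intra-cluster pairwise similarity minus a penalty per empty cluster.

    chromosome[i] is the cluster index assigned to messages[i].
    """
    score = 0.0
    for c in range(n_clusters):
        members = [messages[i] for i, g in enumerate(chromosome) if g == c]
        for x in range(len(members)):
            for y in range(x + 1, len(members)):
                score += jaccard(members[x], members[y])
    empty = n_clusters - len(set(chromosome))  # infeasible: some cluster unused
    return score - penalty * empty


def evolve(messages, n_clusters, pop_size=30, generations=100, mutation_rate=0.1, seed=0):
    """Minimal GA: tournament selection, one-point crossover, mutation, elitism."""
    rng = random.Random(seed)
    n = len(messages)

    def score(ch):
        return fitness(ch, messages, n_clusters)

    def pick(pop):
        a, b = rng.sample(pop, 2)  # binary tournament
        return a if score(a) >= score(b) else b

    pop = [[rng.randrange(n_clusters) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=score)]  # elitism: the best chromosome survives
        while len(children) < pop_size:
            p1, p2 = pick(pop), pick(pop)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(n_clusters) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
    return max(pop, key=score)
```

Penalizing infeasible chromosomes in the fitness score, rather than repairing or rejecting them, lets the search keep running without the constant feasibility bookkeeping the abstract identifies as slowing convergence.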

40 citations

"Spam Reduction by using E-mail Hist..." refers to methods in this paper

  • ...A number of techniques benefit of clustering as a part of their spam detection approach like: clustering followed by KNN classification [10], [11] and clustering followed by SVM classification [12]....
