Proceedings Article

Application of sim-hash algorithm and big data analysis in spam email detection system

TLDR
A novel similarity-based method is proposed that implements the fingerprinting technique on a parallel processing framework; a meet-in-the-middle approach is used to achieve higher accuracy in the spam email detection system.
Abstract
Many effective techniques are currently used for filtering spam emails. However, spammers have largely identified the weaknesses of these methods and use them to bypass current detection systems. In this paper, we propose a novel similarity-based method that implements the fingerprinting technique on a parallel processing framework. Furthermore, a meet-in-the-middle approach is used in our method to achieve higher accuracy in the spam email detection system. Our experimental results demonstrate the improved efficiency of the proposed method.
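
The fingerprinting technique named in the title (sim-hash) can be illustrated with a minimal sketch. This is not the authors' implementation: the word tokenizer, the MD5-based token hash, the 64-bit width, and the distance threshold of 3 are all assumptions chosen for illustration, and the meet-in-the-middle step is not shown because the abstract does not describe it. Each fingerprint depends only on its own message, which is what makes the computation easy to distribute across a parallel processing framework.

```python
import hashlib
import re

BITS = 64  # assumed fingerprint width; the abstract does not state one


def tokens(text):
    """Split text into lowercase word tokens (a deliberately simple feature set)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_hash(token):
    """Stable 64-bit hash of a token (built-in hash() is salted per process)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text):
    """Return a 64-bit SimHash fingerprint of text as an integer."""
    counts = [0] * BITS
    for tok in tokens(text):
        h = token_hash(tok)
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a, text_b, max_distance=3):
    """Treat two emails as similar if their fingerprints differ in few bits.

    max_distance=3 is a hypothetical threshold, not a value from the paper.
    """
    return hamming(simhash(text_a), simhash(text_b)) <= max_distance
```

In this sketch, an incoming email would be compared against fingerprints of known spam with `is_near_duplicate`; a suitable threshold would have to be tuned on real data.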

