Proceedings Article
Application of sim-hash algorithm and big data analysis in spam email detection system
Phuc-Tran Ho, Hee-Sun Kim, Sung-Ryul Kim, et al.
pp. 242-246
TL;DR: A novel similarity-based method is proposed that implements the fingerprinting technique on a parallel processing framework; a meet-in-the-middle approach is used in this method to achieve higher accuracy in the spam email detection system.
Abstract:
Currently, there are many effective techniques for filtering spam emails. However, spammers have largely identified the weaknesses of those methods and use them to bypass current detection systems. In this paper, we propose a novel similarity-based method that implements the fingerprinting technique on a parallel processing framework. Furthermore, a meet-in-the-middle approach is used in our method to achieve higher accuracy in the spam email detection system. Our experimental results demonstrate the improved efficiency of the proposed method.
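The title refers to sim-hash fingerprinting. As a rough illustration only, the following Python sketch computes a Charikar-style simhash over the words of an email and flags a message as a near-duplicate of known spam when the Hamming distance between fingerprints is small. This is not the authors' implementation: the word tokenizer, the MD5-based token hash, the 64-bit fingerprint width, the threshold `k = 3`, and the helper names (`simhash`, `hamming`, `matches_known_spam`) are assumptions chosen for illustration, and the sketch omits both the parallel processing framework and the meet-in-the-middle step described in the abstract.

```python
import hashlib
import re
from collections import Counter


def _token_hash(token: str, bits: int) -> int:
    # Assumption: a stable per-token hash taken from the leading bytes of MD5.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(text: str, bits: int = 64) -> int:
    """Charikar-style simhash over lowercase word tokens, weighted by count."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    v = [0] * bits
    for token, weight in counts.items():
        h = _token_hash(token, bits)
        for i in range(bits):
            # Each token votes +weight for its 1-bits and -weight for its 0-bits.
            v[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i in range(bits):
        if v[i] > 0:
            fingerprint |= 1 << i  # Keep the sign of each summed coordinate.
    return fingerprint


def hamming(a: int, b: int) -> int:
    # Number of differing bits between two fingerprints.
    return bin(a ^ b).count("1")


def matches_known_spam(email_text: str, spam_fingerprints, k: int = 3) -> bool:
    # Hypothetical helper: linear scan against stored spam fingerprints.
    fp = simhash(email_text)
    return any(hamming(fp, s) <= k for s in spam_fingerprints)
```

For example, `matches_known_spam("Win a FREE prize now", [simhash("win a free prize now")])` returns `True`, because lowercasing and tokenization make the two fingerprints identical. A linear scan is used here for clarity; a real system comparing against many stored fingerprints would need an index (for instance, splitting fingerprints into blocks for exact-match lookup), which this sketch does not attempt.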
Citations
Journal Article
Malytics: A Malware Detection Scheme
TL;DR: Malytics is a novel malware detection scheme that does not depend on any particular tool or operating system; it outperforms a wide range of learning-based techniques, as well as individual state-of-the-art models, on both platforms.
Proceedings Article
Improve the Prediction Accuracy of Naïve Bayes Classifier with Association Rule Mining
TL;DR: This work proposes using association rule mining to improve the Naïve Bayes classifier, a well-known algorithm in big data classification that relies on the assumption that features are independent.
Proceedings Article
Detecting spam and phishing mails using SVM and obfuscation URL detection algorithm
TL;DR: A system is proposed that uses the SVM technique together with the MapReduce paradigm to achieve higher accuracy in spam email detection, and that tries to overcome two hurdles of SVM.
Proceedings Article
Web Service-Enabled Spam Filtering with Naïve Bayes Classification
TL;DR: An anti-spam filter is developed that employs the Naïve Bayes classifier, an effective engine for picking out spam emails; the filter was trained on the Enron Spam Dataset, a well-known dataset of spam and legitimate emails.
References
Proceedings Article
Similarity estimation techniques from rounding algorithms
TL;DR: It is shown that rounding algorithms for LPs and SDPs used in the context of approximation algorithms can be viewed as locality sensitive hashing schemes for several interesting collections of objects.
Journal Article
Near-Duplicate Web Page Detection: An Efficient Approach Using Clustering, Sentence Feature and Fingerprinting
TL;DR: This work combines sentence-level features with a fingerprinting method to identify near-duplicate web pages efficiently, using K-mode clustering followed by sentence-feature and fingerprint comparison.
Proceedings Article
Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
TL;DR: The topics included algorithms and computational complexity bounds for classical problems in algebra, geometry, topology, graph theory, game theory, logic and machine learning, as well as theoretical aspects of security, databases, information retrieval and networks, the web, computational biology, and alternative models of computation including quantum computation and self-assembly.
Book Chapter
Fixing the threshold for effective detection of near duplicate web documents in web crawling
TL;DR: A novel and efficient approach is presented for detecting near-duplicate web pages during web crawling, in which keywords are extracted from the crawled pages and a similarity score between two pages is calculated.
Proceedings Article
Feature selection and similarity coefficient based method for email spam filtering
TL;DR: In this study, a statistical feature selection approach combined with similarity coefficients is used to improve the accuracy and detection rate of spam detection and filtering.
Related Papers
Language-model-based detection cascade for efficient classification of image-based spam e-mail
Jen-Hao Hsia, Ming-Syan Chen, et al.