Book Chapter

Automated Spam Detection in Short Text Messages

TL;DR: Experimental results indicate that the proposed algorithm is highly accurate in detecting spam in short messages and can be utilized by a wide variety of users to reduce the volume of spam messages.
Abstract: The increasing popularity and reach of short text messages have led to their use in propagating unsolicited advertising, promotional offers, and other unwanted material to users, resulting in a high influx of such spam messages. To protect users' interests, telecommunication companies have deployed several countermeasures to reduce the volume of such spam. However, some spam messages still evade these measures and cause varying degrees of annoyance to users. In this chapter, an automated spam detection algorithm is proposed to address the particular problem of short text message spam. The proposed algorithm performs two-class (spam vs. ham) classification using stylistic and text features specific to short text messages. The algorithm is evaluated on three databases drawn from diverse demographic settings. Experimental results indicate that the proposed algorithm is highly accurate in detecting spam in short messages and can be utilized by a wide variety of users to reduce the volume of spam messages.
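The abstract does not list the stylistic and text features the chapter uses. As a rough illustration only, the Python sketch below computes a few plausible surface cues (length, uppercase and digit ratios, exclamation marks, URLs, long digit runs, currency symbols) alongside bag-of-words counts. Every feature name and regular expression here is an assumption, not the chapter's actual feature set; a citing work reports that the chapter trains an SVM on features of this kind.

```python
import re
from collections import Counter

# Hypothetical surface cues; the chapter's actual feature list is not given in the abstract.
URL_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
LONG_NUMBER_RE = re.compile(r"\d{5,}")  # e.g. premium-rate numbers or short codes
TOKEN_RE = re.compile(r"[a-z0-9']+")


def stylistic_features(msg: str) -> dict[str, float]:
    """Surface-level cues that often differ between spam and ham SMS (assumed)."""
    n = max(len(msg), 1)  # avoid division by zero for empty messages
    return {
        "length": float(len(msg)),
        "upper_ratio": sum(c.isupper() for c in msg) / n,
        "digit_ratio": sum(c.isdigit() for c in msg) / n,
        "exclamations": float(msg.count("!")),
        "has_url": float(bool(URL_RE.search(msg))),
        "has_long_number": float(bool(LONG_NUMBER_RE.search(msg))),
        "has_currency": float(any(sym in msg for sym in "£$€")),
    }


def text_features(msg: str) -> Counter:
    """Bag-of-words counts over lowercased alphanumeric tokens."""
    return Counter(TOKEN_RE.findall(msg.lower()))


def extract(msg: str) -> dict[str, float]:
    """Merge both groups into one sparse feature vector, e.g. as input to a linear SVM."""
    feats = stylistic_features(msg)
    feats.update({f"w={tok}": float(cnt) for tok, cnt in text_features(msg).items()})
    return feats


spam = extract("WINNER!! Claim your £900 prize: call 09061701461 now")
ham = extract("ok see you at 7")
print(spam["has_long_number"], spam["exclamations"], ham["has_long_number"])  # 1.0 2.0 0.0
```

Keeping both groups in one sparse dictionary lets any linear classifier weigh stylistic cues and vocabulary together; in practice the numeric features would usually be scaled before training.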
Citations
Journal Article
TL;DR: A new hybrid ensemble approach is proposed that combines the predictions classifiers produce for the original text samples and for variants of them created by text normalization and semantic indexing; these techniques can improve text content quality and enhance the performance of expert systems for spam detection.
Abstract: A new classifier is presented to detect undesired short text comments. The proposed approach is light, fast, and multinomial, and it offers incremental learning. The impact of applying text normalization and semantic indexing is studied. The results indicate that the proposed techniques outperformed most of the other approaches. Text normalization and semantic indexing enhanced the classifiers' performance. The popularity and reach of short text messages commonly used in electronic communication have led spammers to use them to propagate undesired content. This content often consists of misleading information, advertisements, viruses, and malware that can be harmful and annoying to users. The dynamic nature of spam messages demands knowledge-based systems with online learning; therefore, most traditional text categorization techniques cannot be used. In this study, we introduce MDLText, a text classifier based on the minimum description length principle, to the context of filtering undesired short text messages. The proposed approach supports incremental learning; therefore, its predictive model is scalable and can adapt to continuously evolving spamming techniques. It is also fast, with computational cost increasing linearly with the number of samples and features, which is very desirable for expert systems applied to real-time electronic communication. In addition to being dynamic, these messages are also short and usually poorly written, rife with slang, symbols, and abbreviations that hinder text representation, learning, and filtering. In this scenario, we also investigated the benefits of using text normalization and semantic indexing techniques. We showed that these techniques can improve the text content quality and, consequently, enhance the performance of expert systems for spam detection.
Based on these findings, we propose a new hybrid ensemble approach that combines the predictions obtained by the classifiers for the original text samples and for their variants created by applying text normalization and semantic indexing techniques. It has the advantage of being independent of the classification method, and the results indicate that it efficiently filters undesired short text messages.
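MDLText's exact scoring rule is not given in the abstract above. The Python sketch below only illustrates the two ideas it describes, under stated assumptions: (1) a minimum-description-length decision, where each class is scored by the number of bits a Laplace-smoothed unigram model of that class needs to encode the message, updated incrementally one message at a time (this is essentially multinomial naive Bayes without class priors, not MDLText itself); and (2) a hybrid ensemble that keeps one classifier per text variant and combines their predictions. The slang dictionary, the `<money>` concept map standing in for semantic indexing, and the majority-vote rule are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class MDLStyleClassifier:
    """Toy incremental classifier: assigns the class whose term statistics encode the
    message in the fewest bits. A simplified MDL illustration, not the MDLText formula."""

    def __init__(self) -> None:
        self.term_counts: dict[str, Counter] = defaultdict(Counter)
        self.total_terms: Counter = Counter()
        self.vocab: set[str] = set()

    def partial_fit(self, tokens: list[str], label: str) -> None:
        # Incremental update; cost is linear in the number of tokens in this message.
        self.term_counts[label].update(tokens)
        self.total_terms[label] += len(tokens)
        self.vocab.update(tokens)

    def code_length(self, tokens: list[str], label: str) -> float:
        # Bits to encode `tokens` under a Laplace-smoothed unigram model of `label`;
        # the extra +1 in the denominator reserves probability mass for unseen terms.
        counts = self.term_counts[label]
        denom = self.total_terms[label] + len(self.vocab) + 1
        return -sum(math.log2((counts[t] + 1) / denom) for t in tokens)

    def predict(self, tokens: list[str]) -> str:
        return min(self.term_counts, key=lambda c: self.code_length(tokens, c))


# Hypothetical lookup tables standing in for text normalization and semantic indexing.
SLANG = {"u": "you", "ur": "your", "txt": "text", "2day": "today", "gr8": "great"}
CONCEPTS = {"free": "<money>", "prize": "<money>", "cash": "<money>", "reward": "<money>"}


def normalize(tokens: list[str]) -> list[str]:
    return [SLANG.get(t, t) for t in tokens]


def conceptualize(tokens: list[str]) -> list[str]:
    return [CONCEPTS.get(t, t) for t in normalize(tokens)]


VARIANTS = {"original": lambda toks: toks, "normalized": normalize, "concepts": conceptualize}


class HybridEnsemble:
    """One classifier per text variant; predictions combined by majority vote (assumed)."""

    def __init__(self) -> None:
        self.models = {name: MDLStyleClassifier() for name in VARIANTS}

    def partial_fit(self, msg: str, label: str) -> None:
        toks = msg.lower().split()
        for name, transform in VARIANTS.items():
            self.models[name].partial_fit(transform(toks), label)

    def predict(self, msg: str) -> str:
        toks = msg.lower().split()
        votes = Counter(self.models[n].predict(f(toks)) for n, f in VARIANTS.items())
        return votes.most_common(1)[0][0]


model = HybridEnsemble()
for text, label in [
    ("WIN a free prize txt WIN to 80086 now", "spam"),
    ("u have won cash claim ur reward now", "spam"),
    ("r u coming 2day", "ham"),
    ("see you at lunch today", "ham"),
]:
    model.partial_fit(text, label)

print(model.predict("claim your free cash prize now"))  # spam
print(model.predict("see you today"))  # ham
```

Three variants are used so that a two-class vote can never tie; with an even number of variants, an explicit tie-breaking rule would be needed.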

23 citations


Cites methods from "Automated Spam Detection in Short T..."

  • ...In a recent study, Goswami et al. (2016) used SVM trained with stylistic and text features specific of short text samples to classify SMS messages....


Proceedings ArticleDOI
05 Oct 2020
TL;DR: In this article, the TF-IDF and RF term weighting methods were compared for classifying spam SMS and for making more meaningful use of the limited content of SMS messages; the vectors obtained from the data set were weighted with TF-IDF and RF and classified with 5 different classifiers popular in this field.
Abstract: Short message services are among the most widely used communication services. The increased use of mobile devices and the lowering of SMS costs by operators keep short message services popular. However, this popularity causes tens of users to be exposed to spam SMS every day. The term spam simply refers to messages that users do not want. Although organizations take measures against spam SMS and spam SMS filtering systems are widely used, the problem of spam SMS is becoming more widespread. There are many studies in the literature on the detection of spam SMS, but new and efficient methods are still needed. In this study, the TF-IDF and RF term weighting methods, which are frequently used in text mining applications, were compared in order to classify spam SMS and to make more meaningful use of the limited content of SMS messages. The vectors obtained from the data set were weighted with the TF-IDF and RF term weighting methods and classified with 5 different classifiers popular in this field.
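The abstract does not give the weighting formulas. A minimal Python sketch, assuming the common definitions: TF-IDF as raw term frequency times log(N/df), and RF (relevance frequency) as log2(2 + a/max(1, c)), where a and c count the spam and ham training messages that contain the term. The study may use different variants (for example, normalized term frequency or another logarithm base), and the toy messages are invented.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Unsupervised weighting: raw term frequency times log(N / df)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(d).items()} for d in docs]


def tf_rf(docs: list[list[str]], labels: list[str], positive: str = "spam") -> list[dict[str, float]]:
    """Supervised weighting: tf * log2(2 + a / max(1, c)), where a (c) is the number of
    positive (negative) training documents that contain the term."""
    a = Counter(t for d, y in zip(docs, labels) if y == positive for t in set(d))
    c = Counter(t for d, y in zip(docs, labels) if y != positive for t in set(d))
    return [
        {t: tf * math.log2(2 + a[t] / max(1, c[t])) for t, tf in Counter(d).items()}
        for d in docs
    ]


docs = [["free", "prize", "call"], ["call", "me", "later"], ["free", "entry", "now"]]
labels = ["spam", "ham", "spam"]
w_idf = tf_idf(docs)
w_rf = tf_rf(docs, labels)
print(round(w_idf[0]["free"], 3), w_rf[0]["free"])  # 0.405 2.0
```

In the toy data, "free" appears in two of the three messages, so IDF down-weights it, while RF up-weights it because both occurrences are in spam. This use of class labels is the key difference between the two schemes.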

4 citations

Proceedings ArticleDOI
15 May 2018
TL;DR: The experimental studies have shown that the developed artificial neural network model is adequate and can be used effectively for e-mail message classification; a scheme of this technology for “spam”/“not spam” e-mail classification is also presented.
Abstract: In this paper, we address the problem of developing neural network technology for e-mail message classification. We analyze basic spam filtering methods such as sender IP-address analysis, detection of repeated spam messages, and word-based Bayesian filtering. We propose neural network technology for solving this problem because neural networks are universal approximators and are effective for classification problems. We also propose a scheme of this technology for “spam”/“not spam” e-mail classification. An effective neural network model for spam filtering is created within the knowledge discovery in databases framework: a training set is formed, the neural network model is trained, and its quality and classification ability are estimated. The experimental studies have shown that the developed artificial neural network model is adequate and can be used effectively for e-mail message classification. Thus, in this paper we have shown that a neural network model can be used effectively for e-mail filtering, and we have presented a scheme for using an artificial neural network model as part of an intelligent e-mail spam filtering system.
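The paper's network architecture and input features are not described in this abstract. As a hedged sketch, the smallest possible neural model, a single sigmoid neuron trained by batch gradient descent, is shown combining three hypothetical per-message signals that echo the basic methods listed above: a sender-IP blocklist flag, a scaled count of recent repeats of the same message, and a word-based Bayesian spam score. All feature values are invented toy data.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: list[float], b: float, x: list[float]) -> float:
    """Spam probability output by a single sigmoid neuron."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_neuron(xs: list[list[float]], ys: list[int], lr: float = 1.0, epochs: int = 2000):
    """Batch gradient descent on the mean cross-entropy loss."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y  # derivative of the loss w.r.t. the pre-activation
            for i in range(d):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


# Hypothetical features: [sender IP blocklisted (0/1), recent repeats / 10, Bayesian word score]
X = [
    [1, 0.9, 0.95],  # spam
    [0, 0.8, 0.90],  # spam
    [1, 0.5, 0.70],  # spam
    [0, 0.0, 0.10],  # ham
    [0, 0.1, 0.20],  # ham
    [0, 0.0, 0.05],  # ham
]
Y = [1, 1, 1, 0, 0, 0]

w, b = train_neuron(X, Y)
print([round(predict(w, b, x)) for x in X])  # [1, 1, 1, 0, 0, 0]
```

A real filter would use a hidden layer, many more features, and a held-out evaluation set; the single neuron only shows how the network turns the separate filtering signals into one learned spam probability.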

2 citations


Cites methods from "Automated Spam Detection in Short T..."

  • ...SPAM FILTRATION METHODS ANALYSIS The basic methods of spam filtration are [8,12,20,21]:...


Journal ArticleDOI
21 May 2018, iSys
TL;DR: A simple, fast, scalable, multiclass, online text classification method based on the minimum description length principle is evaluated and shown to be effective for instant messaging and SMS spam filtering in both online and offline learning contexts.
Abstract: Spam filtering in online instant messages and SMS is a challenging problem nowadays because the messages are often very short and rife with slang, idioms, symbols, emoticons, and abbreviations, which hamper prediction and knowledge discovery. To address this problem, we evaluated a simple, fast, scalable, multiclass, and online text classification method based on the minimum description length principle. We conducted experiments using a real, public dataset, which demonstrate that our method is effective for instant messaging and SMS spam filtering in both online and offline learning contexts.

1 citation