Book Chapter

Automated Spam Detection in Short Text Messages

01 Jan 2016-pp 85-98
TL;DR: Experimental results indicate that the proposed algorithm is highly accurate in detecting spam in short messages and can be utilized by a wide variety of users to reduce the volume of spam messages.

Abstract: The growing popularity and reach of short text messages has led to their use in propagating unsolicited advertising, promotional offers, and other unwarranted material to users, resulting in a high influx of spam messages. To protect the interests of users, telecommunication companies have deployed several countermeasures to curb the volume of such spam. However, some spam messages still manage to evade these measures and cause varying degrees of annoyance to users. In this chapter, an automated spam detection algorithm is proposed to deal with the particular problem of short text message spam. The proposed algorithm performs two-class (spam, ham) classification using stylistic and text features specific to short text messages. The algorithm is evaluated on three databases belonging to diverse demographic settings. Experimental results indicate that the proposed algorithm is highly accurate in detecting spam in short messages and can be utilized by a wide variety of users to reduce the volume of spam messages.
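
The abstract does not list the stylistic features it uses, so the following is only a minimal sketch of what such features might look like for a short message. The function name `stylistic_features` and every feature in it (length, digit ratio, uppercase ratio, exclamation count, URL flag) are illustrative assumptions, not the chapter's actual feature set.

```python
import re


def stylistic_features(message: str) -> dict:
    """Compute a few illustrative (hypothetical) stylistic features."""
    length = len(message)
    letters = [c for c in message if c.isalpha()]
    return {
        # Total number of characters in the message.
        "length": length,
        # Share of characters that are digits (phone numbers, prices).
        "digit_ratio": sum(c.isdigit() for c in message) / length if length else 0.0,
        # Share of letters that are uppercase ("shouting").
        "upper_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        # Number of exclamation marks.
        "exclamations": message.count("!"),
        # Whether the message appears to contain a URL.
        "has_url": bool(re.search(r"(https?://|www\.)", message, re.IGNORECASE)),
    }


features = stylistic_features("WIN a FREE prize!!! Visit www.example.com now")
```

A feature dictionary like this could be turned into a numeric vector and passed, together with text features, to any two-class classifier.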

Citations

Journal Article
TL;DR: A new hybrid ensemble approach is proposed that combines the predictions obtained by the classifiers using the original text samples along with their variations created by applying text normalization and semantic indexing techniques, which can improve the text content quality and enhance the performance of the expert systems for spamming detection.

Abstract: A new classifier is presented to detect undesired short text comments. The proposed approach is light, fast, multinomial and offers incremental learning. The impact of applying text normalization and semantic indexing is studied. The results indicate the proposed techniques outperformed most of the approaches. Text normalization and semantic indexing enhanced the classifiers' performance. The popularity and reach of short text messages commonly used in electronic communication have led spammers to use them to propagate undesired content. This content is often composed of misleading information, advertisements, viruses, and malware that can be harmful and annoying to users. The dynamic nature of spam messages demands knowledge-based systems with online learning and, therefore, most traditional text categorization techniques cannot be used. In this study, we introduce MDLText, a text classifier based on the minimum description length principle, to the context of filtering undesired short text messages. The proposed approach supports incremental learning and, therefore, its predictive model is scalable and can adapt to continuously evolving spamming techniques. It is also fast, with computational cost increasing linearly with the number of samples and features, which is very desirable for expert systems applied to real-time electronic communication. In addition to being dynamic, these messages are also short and usually poorly written, rife with slang, symbols, and abbreviations that hinder text representation, learning, and filtering. In this scenario, we also investigated the benefits of using text normalization and semantic indexing techniques. We showed that these techniques can improve the text content quality and, consequently, enhance the performance of expert systems for spam detection.
Based on these findings, we propose a new hybrid ensemble approach that combines the predictions obtained by the classifiers using the original text samples along with their variations created by applying text normalization and semantic indexing techniques. It has the advantage of being independent of the classification method, and the results indicate that it is effective at filtering undesired short text messages.

21 citations


Cites methods from "Automated Spam Detection in Short T..."

  • ...In a recent study, Goswami et al. (2016) used SVM trained with stylistic and text features specific of short text samples to classify SMS messages....


Proceedings Article
15 May 2018-
TL;DR: The experimental studies have shown that the developed artificial neural network model is adequate and can be effectively used for e-mail message classification; a scheme of this technology for “spam”/“not spam” classification of e-mail messages is also shown.

Abstract: In this paper we address the development of a neural network technology for e-mail message classification. We analyze basic methods of spam filtering, such as sender IP-address analysis, detection of repeated spam messages, and word-based Bayesian filtering. We propose a neural network technology for this problem because neural networks are universal approximators and are effective at classification. We also present a scheme of this technology for “spam”/“not spam” classification of e-mail messages. An effective neural network model for spam filtering is created within the framework of knowledge discovery in databases: a training set is formed, the neural network model is trained, and its value and classifying ability are estimated. The experimental studies show that the developed artificial neural network model is adequate and can be effectively used for e-mail message classification. Thus, we demonstrate that an effective neural network model can be used for e-mail filtering and present a scheme for using it as part of an intelligent e-mail spam filtering system.

2 citations


Cites methods from "Automated Spam Detection in Short T..."

  • ...SPAM FILTRATION METHODS ANALYSIS The basic methods of spam filtration are [8,12,20,21]:...


Journal Article
21 May 2018-iSys
TL;DR: A simple, fast, scalable, multiclass, and online text classification method based on the minimum description length principle that is effective on instant messaging and SMS spam filtering in both online and offline learning contexts is evaluated.

Abstract: Spam filtering in online instant messages and SMS is a challenging problem. This is because the messages are often very short and rife with slang, idioms, symbols, emoticons, and abbreviations, which hamper prediction and knowledge discovery. To address this problem, we evaluated a simple, fast, scalable, multiclass, and online text classification method based on the minimum description length principle. We conducted experiments using a real and public dataset, which demonstrate that our method is effective on instant messaging and SMS spam filtering in both online and offline learning contexts.

1 citation


Proceedings Article
05 Oct 2020-
Abstract: Short message services are among the most widely used communication services. The increased use of mobile devices and the lowering of SMS costs by operators enable short message services to remain popular. However, this popularity causes tens of users to be exposed to spam SMS every day. The term spam simply refers to messages that users do not want. Although organizations take measures against spam SMS and spam SMS filtering systems are widely used, the problem of spam SMS is becoming widespread. There are many studies in the literature on the detection of spam SMS, but new and efficient methods are still needed. In this study, the TF-IDF and RF term weighting methods, which are frequently used in text mining applications, were compared in order to classify spam SMS and to use the limited content of SMS messages more meaningfully. The vectors obtained from the data set were weighted by the TF-IDF and RF term weighting methods and classified with 5 different classifiers popular in this field.
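
To make the TF-IDF weighting mentioned above concrete, here is a minimal sketch of one common variant: term frequency normalized by message length, multiplied by the natural logarithm of the inverse document frequency. The function name `tf_idf` and the toy messages are assumptions made for illustration; the study's exact formula, preprocessing, and the RF weighting it compares against are not given in the abstract and are not reproduced here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # tf = count / document length, idf = ln(N / df).
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy, made-up messages (already tokenized).
messages = [
    "free prize call now".split(),
    "call me when you are free".split(),
    "see you at lunch".split(),
]
weights = tf_idf(messages)
```

In this toy corpus, "prize" occurs in only one message and therefore receives a higher weight than "free", which occurs in two.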


References

Journal Article
Chih-Chung Chang, Chih-Jen Lin
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.

Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.

37,868 citations


Proceedings Article
19 Sep 2011-
TL;DR: A new real, public and non-encoded SMS spam collection that is the largest one as far as the authors know is offered and the performance achieved by several established machine learning methods is compared.

Abstract: The growth in the number of mobile phone users has led to a dramatic increase in SMS spam messages. In practice, fighting mobile phone spam is made difficult by several factors, including the lower rate of SMS spam, which has allowed many users and service providers to ignore the issue, and the limited availability of mobile phone spam-filtering software. On the other hand, in academic settings, a major handicap is the scarcity of public SMS spam datasets, which are sorely needed for validation and comparison of different classifiers. Moreover, as SMS messages are fairly short, content-based spam filters may have their performance degraded. In this paper, we offer a new real, public and non-encoded SMS spam collection that is the largest one as far as we know. Moreover, we compare the performance achieved by several established machine learning methods. The results indicate that Support Vector Machine outperforms other evaluated classifiers and, hence, it can be used as a good baseline for further comparison.

249 citations


Proceedings Article
10 Oct 2006-
TL;DR: This paper analyzes to what extent Bayesian filtering techniques used to block email spam can be applied to the problem of detecting and stopping mobile spam, and demonstrates that Bayesian filters can be effectively transferred from email to SMS spam.

Abstract: In recent years, we have witnessed a dramatic increase in the volume of spam email. Other related forms of spam are increasingly emerging as an important problem, especially spam on Instant Messaging services (the so-called SPIM) and Short Message Service (SMS) or mobile spam. Like email spam, the SMS spam problem can be approached with legal, economic or technical measures. Among the wide range of technical measures, Bayesian filters are playing a key role in stopping email spam. In this paper, we analyze to what extent Bayesian filtering techniques used to block email spam can be applied to the problem of detecting and stopping mobile spam. In particular, we have built two SMS spam test collections of significant size, in English and Spanish. We have tested on them a number of message representation techniques and Machine Learning algorithms in terms of effectiveness. Our results demonstrate that Bayesian filtering techniques can be effectively transferred from email to SMS spam.
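
As a rough illustration of word-based Bayesian filtering applied to SMS, the sketch below implements a generic multinomial naive Bayes classifier with Laplace (add-one) smoothing. It is not the authors' system: the function names `train` and `classify`, the whitespace tokenization, and the four toy training messages are all assumptions made for this example.

```python
import math
from collections import Counter


def train(samples):
    """samples: list of (tokens, label) pairs, label in {"spam", "ham"}."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for tokens, label in samples:
        word_counts[label].update(tokens)
        doc_counts[label] += 1
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(tokens, model):
    """Return the label with the highest posterior log score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Log prior of the class (assumes each class has training data).
        score = math.log(doc_counts[label] / total_docs)
        # Laplace-smoothed log likelihood of each token given the class.
        for tok in tokens:
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Toy, made-up training data.
model = train([
    ("win free prize now".split(), "spam"),
    ("free entry call now".split(), "spam"),
    ("are we meeting for lunch".split(), "ham"),
    ("call me after the meeting".split(), "ham"),
])
label = classify("free prize".split(), model)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the add-one smoothing keeps unseen words from zeroing out a class.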

179 citations


Proceedings Article
06 Nov 2007-
TL;DR: It is concluded that content filtering for short messages is surprisingly effective and can be improved substantially using different features, while compression-model filters perform quite well as-is.

Abstract: We consider the problem of content-based spam filtering for short text messages that arise in three contexts: mobile (SMS) communication, blog comments, and email summary information such as might be displayed by a low-bandwidth client. Short messages often consist of only a few words, and therefore present a challenge to traditional bag-of-words based spam filters. Using three corpora of short messages and message fields derived from real SMS, blog, and spam messages, we evaluate feature-based and compression-model-based spam filters. We observe that bag-of-words filters can be improved substantially using different features, while compression-model filters perform quite well as-is. We conclude that content filtering for short messages is surprisingly effective.

129 citations


Journal Article
Qian Xu, Evan Wei Xiang, Qiang Yang, Jiachun Du +1 more
TL;DR: This service-side solution uses graph data mining to distinguish spammers from nonspammers and detect spam without checking a message's contents.

Abstract: Short Message Service text messages are indispensable, but they face a serious problem from spamming. This service-side solution uses graph data mining to distinguish spammers from nonspammers and detect spam without checking a message's contents.

87 citations


Performance Metrics
No. of citations received by the paper in previous years

Year  Citations
2020  1
2018  2
2017  1