Towards SMS Spam Filtering: Results under a New Dataset

Open Access

Vol. 2, Iss. 1, pp. 1-18

TLDR

The results indicate that the procedure followed to build the collection does not lead to near-duplicates and that, among the classifiers, Support Vector Machines outperform the other evaluated techniques and can therefore be used as a good baseline for further comparison.

Abstract:

The growth in the number of mobile phone users has led to a dramatic increase in SMS spam messages. Recent reports clearly indicate that the volume of mobile phone spam is rising year by year. In practice, fighting this plague is made difficult by several factors, including the lower rate of SMS, which has allowed many users and service providers to ignore the issue, and the limited availability of mobile phone spam-filtering software. Probably one of the major concerns in academic settings is the scarcity of public SMS spam datasets, which are sorely needed for the validation and comparison of different classifiers. Moreover, traditional content-based filters may see their performance seriously degraded, since SMS messages are fairly short and their text is generally rife with idioms and abbreviations. In this paper, we present details of a new real, public, and non-encoded SMS spam collection that is, as far as we know, the largest one available. We also offer a comprehensive analysis of this dataset to ensure that it contains no duplicated messages coming from previously existing datasets, since such duplicates may ease the task of learning SMS spam classifiers and could compromise the evaluation of methods. Additionally, we compare the performance achieved by several established machine learning techniques. In summary, the results indicate that the procedure followed to build the collection does not lead to near-duplicates and that, among the classifiers, Support Vector Machines outperform the other evaluated techniques and can therefore be used as a good baseline for further comparison.
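To make the idea of a near-duplicate check concrete, the following is a minimal sketch of one common approach: comparing messages by the Jaccard similarity of their word n-grams. It is an illustration only and is not claimed to be the procedure used in the paper. The function names, the 3-word shingle size, the 0.5 threshold, and the sample messages are all assumptions made for this example.

```python
import re
from itertools import combinations


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in a lowercased message."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if len(tokens) < n:
        # Very short messages: fall back to a single shingle of all tokens.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined here as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(messages, threshold=0.5, n=3):
    """Return (i, j, similarity) for every message pair at or above the threshold.

    The threshold and shingle size are illustrative assumptions, not tuned values.
    """
    sets = [shingles(m, n) for m in messages]
    pairs = []
    # Pairwise comparison is quadratic in the number of messages;
    # acceptable for a sketch, too slow for very large collections.
    for i, j in combinations(range(len(messages)), 2):
        sim = jaccard(sets[i], sets[j])
        if sim >= threshold:
            pairs.append((i, j, sim))
    return pairs


# Made-up example messages, not taken from the collection.
sample = [
    "WINNER! Claim your free prize now, call 0800 000 000",
    "Winner!! Claim your free prize now - call 0800 000 001",
    "Are we still meeting for lunch tomorrow?",
]
found = near_duplicates(sample)
print(found)
```

On these sample messages, only the first two share enough shingles to be flagged. Short SMS texts yield few shingles, so in practice the shingle size and threshold would need to be chosen with care.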
