Content based SMS spam filtering
Citations
369 citations
Cites methods from "Content based SMS spam filtering"
...This corpus has been used in the following academic research efforts: [6], [7], and [14]....
[...]
...Table 3: Evaluated classifiers
Basic Naïve Bayes (NB) – Basic NB [2]
Multinomial term frequency NB – MN TF NB [2]
Multinomial Boolean NB – MN Bool NB [2]
Multivariate Bernoulli NB – Bern NB [2]
Boolean NB – Bool NB [2]
Multivariate Gauss NB – Gauss NB [2]
Flexible Bayes – Flex NB [2]
Boosted NB [12]
Linear Support Vector Machine – SVM [10, 13]
Minimum Description Length – MDL [4]
K-Nearest Neighbors – KNN [1, 14] (K = 1, 3 or 5)
C4.5 [15, 14]
Boosted C4.5 [14]
PART [11, 14]
3.1 Results
We carried out this study using the following experiment protocol....
[...]
...K-Nearest Neighbors – KNN [1, 14] (K = 1, 3 or 5)...
[...]
259 citations
Cites background from "Content based SMS spam filtering"
...While this survey confines itself to email spam, we note that the definitions above apply to any number of communication media, including text and voice messages [31, 45, 84], social networks [206], and blog comments [37, 123]....
[...]
180 citations
Cites background from "Content based SMS spam filtering"
...The messages have been manually labeled into 10 categories: (1) medical emergency; (2) people trapped; (3) food shortage; (4) water shortage; (5) water sanitation; (6) shelter needed; (7) collapsed structure; (8) food distribution; (9) hospital/clinic services; and (10) person news....
[...]
164 citations
Cites background or methods from "Content based SMS spam filtering"
...Most work in SMS spam filtering uses some sort of feature selection technique to reduce the large feature space, including Information Gain (Gómez Hidalgo et al., 2006; Sohn et al., 2009) and Mutual Information (Deng & Peng, 2006), which are widely accepted methods in text classification, but also including less commonly used methods such as Expected Cross Entropy (Cai et al., 2008); interestingly, Information Gain is also the most commonly used method for email spam filtering (Guzella & Caminhas, 2009)....
[...]
...A feature set including words, normalised (i.e. lowercase) words, character bi- and tri-grams and word bi-grams suggested by Gómez Hidalgo et al. (2006) has provided a base feature set for much of the work in feature engineering....
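The feature set described in this excerpt can be illustrated with a minimal sketch. The whitespace tokenization, the feature-name prefixes, the function name `extract_features`, and the choice to build character n-grams and word bi-grams from lowercased text are all assumptions made for illustration; the excerpt does not specify them.

```python
import re


def extract_features(message):
    """Return the set of features for one message (illustrative sketch).

    Assumptions (not stated in the excerpt): tokens are whitespace-separated,
    and character n-grams and word bi-grams are built from lowercased text.
    """
    words = re.findall(r"\S+", message)      # raw word tokens
    lowered = [w.lower() for w in words]     # normalised (lowercase) words
    text = message.lower()

    features = set()
    features.update(f"w={w}" for w in words)
    features.update(f"lw={w}" for w in lowered)

    # Character bi-grams (n=2) and tri-grams (n=3), spaces included.
    for n in (2, 3):
        features.update(
            f"c{n}={text[i:i + n]}" for i in range(len(text) - n + 1)
        )

    # Word bi-grams over adjacent lowercased tokens.
    features.update(f"wb={a} {b}" for a, b in zip(lowered, lowered[1:]))
    return features


if __name__ == "__main__":
    for feature in sorted(extract_features("Free WIN now")):
        print(feature)
```

Prefixing each feature with its type keeps a word such as "no" distinct from the character bi-gram "no" when all features share one vocabulary.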
[...]
...Recently the SMS Spam Collection has been made publicly available (Almeida et al., 2011), which is an extension of a corpus previously compiled by Gómez Hidalgo et al. (2006)....
[...]
...Gómez Hidalgo et al. (2006) evaluated a number of classification algorithms on two SMS spam datasets and concluded that these techniques can be effectively transferred from email to SMS spam filtering, with SVMs being the most suitable....
[...]
140 citations
References
21,674 citations
7,539 citations
"Content based SMS spam filtering" refers background in this paper
...learning process takes as input the training collection, and consists of the following steps [14]:...
[...]
5,936 citations
"Content based SMS spam filtering" refers methods in this paper
...Content-based filters can also be built by using Machine Learning techniques applied to a set of pre-classified messages [16]....
[...]
5,366 citations
"Content based SMS spam filtering" refers methods in this paper
...We use Information Gain (IG) [18, 19] as attribute quality metric....
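As a rough illustration of Information Gain as an attribute quality metric, the sketch below scores a single binary attribute (token present or absent) against the class labels using the standard entropy-based definition, IG = H(C) - Σ P(A=v) H(C | A=v). The function names and the toy data are hypothetical, and the excerpt does not say whether the paper uses exactly this formulation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(has_token, labels):
    """Information Gain of a binary attribute with respect to the class.

    has_token: list of bools, True if the token occurs in the message.
    labels:    list of class labels (e.g. "spam"/"ham"), same length.
    """
    total = len(labels)
    present = [y for x, y in zip(has_token, labels) if x]
    absent = [y for x, y in zip(has_token, labels) if not x]
    conditional = (
        (len(present) / total) * entropy(present)
        + (len(absent) / total) * entropy(absent)
    )
    return entropy(labels) - conditional


if __name__ == "__main__":
    labels = ["spam", "spam", "ham", "ham"]
    print(information_gain([True, True, False, False], labels))
    print(information_gain([True, False, True, False], labels))
```

Attributes can then be ranked by this score and the lowest-scoring ones discarded to shrink the feature space.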
[...]
3,571 citations
"Content based SMS spam filtering" refers background in this paper
...Conversion of a message into a vector of attribute-value pairs [13], where the attributes are the previously defined tokens, and their values can be binary, (relative) frequencies, etc....
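The conversion step in this excerpt might look like the following minimal sketch, which maps a tokenized message onto a fixed vocabulary with either binary or relative-frequency values. The function name `vectorize`, the `mode` parameter, and the decision to compute relative frequency over all tokens in the message are assumptions for illustration only.

```python
from collections import Counter


def vectorize(message_tokens, vocabulary, mode="binary"):
    """Map a tokenized message to one value per vocabulary token.

    mode="binary":   1 if the token occurs in the message, else 0.
    mode="relative": token count divided by the message's token count
                     (an assumed definition of relative frequency).
    """
    counts = Counter(message_tokens)
    total = len(message_tokens)
    if mode == "binary":
        return [1 if counts[token] > 0 else 0 for token in vocabulary]
    if mode == "relative":
        return [counts[token] / total if total else 0.0 for token in vocabulary]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    vocabulary = ["free", "win", "hello"]
    tokens = ["free", "free", "win", "now"]
    print(vectorize(tokens, vocabulary, mode="binary"))
    print(vectorize(tokens, vocabulary, mode="relative"))
```

Tokens outside the vocabulary (here "now") contribute to the message length but get no position in the vector.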
[...]