Effective Filtering of Unsolicited Messages from Online Social Networks Using Spam Templates and Social Contexts

doi:10.1007/S11277-020-07228-Y

Citations


Journal Article•DOI•

Identification and Filtering of Web Spams Using a Machine Learning Method


Dawei Zhang, Yanyu Liu

20 Dec 2022-International Journal of Computational Intelligence and Applications

TL;DR: In this article, the authors propose converting email text into vector features using the vector space model, constructing a two-dimensional matrix, and using a convolutional neural network (CNN) to identify spam on the Internet.


Abstract: In order to enhance the filtering of spam on the Internet and improve the experience of Internet users, this paper proposed to convert the email text into vector features using the vector space model, constructed a two-dimensional matrix, and used a convolutional neural network (CNN) to identify spam on the Internet. The CNN was compared with two other classifiers, a support vector machine (SVM) and a backward-propagation neural network (BPNN), in simulation experiments. The final results showed that the spam recognition algorithm with CNN as the classifier had better recognition performance than the algorithms with SVM and BPNN classifiers and was also more advantageous in terms of recognition cost and time for spam; in addition, the CNN had the best recognition performance when the number of extracted features was 15.
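The text-to-matrix step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the weighting scheme, tokenizer, or matrix shape, so raw term frequencies, whitespace tokenization, zero-padding, and the `width` parameter are all assumptions here, and the function names are hypothetical rather than taken from the paper.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index (sorted for determinism)."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def to_vector(document, vocabulary):
    """Term-frequency vector of one document over the shared vocabulary."""
    counts = Counter(document.lower().split())
    return [counts.get(tok, 0) for tok in vocabulary]

def to_matrix(vector, width):
    """Zero-pad the vector (an assumption) and fold it into rows of `width` entries."""
    padded = vector + [0] * (-len(vector) % width)
    return [padded[i:i + width] for i in range(0, len(padded), width)]

emails = ["win money now", "meeting notes for now"]
vocab = build_vocabulary(emails)
matrix = to_matrix(to_vector(emails[0], vocab), width=3)
```

A matrix of this kind is the sort of two-dimensional input a CNN classifier could consume; the network itself is omitted because the abstract does not describe its architecture.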


References


Journal Article•DOI•

Fame for sale


Stefano Cresci¹, Roberto Di Pietro², Marinella Petrocchi, Angelo Spognardi³, Maurizio Tesconi

Bell Labs¹, University of Padua², Technical University of Denmark³

01 Dec 2015

TL;DR: A novel Class A classifier general enough to thwart overfitting, lightweight thanks to the usage of the less costly features, and still able to correctly classify more than 95% of the accounts of the original training set.


Abstract: Fake followers are those Twitter accounts specifically created to inflate the number of followers of a target account. Fake followers are dangerous for the social platform and beyond, since they may alter concepts like popularity and influence in the Twittersphere, hence impacting on economy, politics, and society. In this paper, we contribute along different dimensions. First, we review some of the most relevant existing features and rules (proposed by Academia and Media) for anomalous Twitter accounts detection. Second, we create a baseline dataset of verified human and fake follower accounts. Such baseline dataset is publicly available to the scientific community. Then, we exploit the baseline dataset to train a set of machine-learning classifiers built over the reviewed rules and features. Our results show that most of the rules proposed by Media provide unsatisfactory performance in revealing fake followers, while features proposed in the past by Academia for spam detection provide good results. Building on the most promising features, we revise the classifiers both in terms of reduction of overfitting and cost for gathering the data needed to compute the features. The final result is a novel Class A classifier, general enough to thwart overfitting, lightweight thanks to the usage of the less costly features, and still able to correctly classify more than 95% of the accounts of the original training set. We ultimately perform an information fusion-based sensitivity analysis, to assess the global sensitivity of each of the features employed by the classifier. The findings reported in this paper, other than being supported by a thorough experimental methodology and interesting on their own, also pave the way for further investigation on the novel issue of fake Twitter followers.


340 citations

Journal Article•DOI•

Feature selection for text classification: A review


Xuelian Deng¹, Yuqing Li¹, Jian Weng², Jilian Zhang²

Guangxi University¹, Jinan University²

01 Feb 2019-Multimedia Tools and Applications

TL;DR: A comprehensive review of feature selection techniques for text classification, which also covers popular text classifiers, including the Nearest Neighbor (NN) method, Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), and Neural Networks.


Abstract: Big multimedia data is heterogeneous in essence, that is, the data may be a mixture of video, audio, text, and images. This is due to the prevalence of novel applications in recent years, such as social media, video sharing, and location based services (LBS), etc. In many multimedia applications, for example, video/image tagging and multimedia recommendation, text classification techniques have been used extensively to facilitate multimedia data processing. In this paper, we give a comprehensive review on feature selection techniques for text classification. We begin by introducing some popular representation schemes for documents, and similarity measures used in text classification. Then, we review the most popular text classifiers, including Nearest Neighbor (NN) method, Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), and Neural Networks. Next, we survey four feature selection models, namely the filter, wrapper, embedded and hybrid, discussing pros and cons of the state-of-the-art feature selection approaches. Finally, we conclude the paper and give a brief introduction to some interesting feature selection work that does not belong to the four models.


223 citations

Journal Article•DOI•

Data mining techniques in social media


MohammadNoor Injadat¹, Fadi Salo¹, Ali Bou Nassif²

University of Western Ontario¹, University of Sharjah²

19 Nov 2016-Neurocomputing

TL;DR: The goal of the present survey is to analyze the data mining techniques that were utilized by social media networks between 2003 and 2015. It suggests that more research be conducted by both academia and industry, since the studies done so far are not sufficiently exhaustive of data mining techniques.


128 citations

Journal Article•DOI•

Trust Evaluation in Online Social Networks Using Generalized Network Flow


Wenjun Jiang¹, Jie Wu², Feng Li³, Guojun Wang⁴, Huanyang Zheng²

Hunan University¹, Temple University², Indiana University – Purdue University Indianapolis³, Central South University⁴

01 Mar 2016-IEEE Transactions on Computers

TL;DR: This work proposes a modified flow-based trust evaluation scheme, GFTrust, which addresses path dependence using network flow and models trust decay with the leakage associated with each node. Experiments show that it predicts trust in OSNs with high accuracy and verify its preferable properties.


Abstract: In online social networks (OSNs), to evaluate trust from one user to another indirectly connected user, the trust evidence in the trusted paths (i.e., paths built through intermediate trustful users) should be carefully treated. Some paths may overlap with each other, leading to a unique challenge of path dependence, i.e., how to aggregate the trust values of multiple dependent trusted paths. OSNs bear the characteristic of high clustering, which makes the path dependence phenomenon common. Another challenge is trust decay through propagation, i.e., how to propagate trust along a trusted path, considering the possible decay in each node. We analyze the similarity between trust propagation and network flow, and convert a trust evaluation task with path dependence and trust decay into a generalized network flow problem. We propose a modified flow-based trust evaluation scheme GFTrust, in which we address path dependence using network flow, and model trust decay with the leakage associated with each node. Experimental results, with the real social network data sets of Epinions and Advogato, demonstrate that GFTrust can predict trust in OSNs with a high accuracy, and verify its preferable properties.
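The idea of modeling trust decay as leakage at each node can be illustrated on a single trusted path. This is a simplified sketch and not the GFTrust algorithm: it assumes edge trust values act as flow capacities and that each intermediate node loses a fixed fraction of the flow passing through it. The function name `path_trust` and its parameters are hypothetical. Handling overlapping paths (path dependence) would require a full generalized network flow solver, which is not shown.

```python
def path_trust(capacities, leakage, supply=1.0):
    """Flow delivered to the end of one path.

    capacities: trust value on each edge of the path, in order (assumed capacity).
    leakage: fraction of flow lost at each intermediate node,
             so it needs len(capacities) - 1 entries.
    supply: flow sent from the source (assumed to be 1.0).
    """
    if len(leakage) != len(capacities) - 1:
        raise ValueError("need one leakage value per intermediate node")
    flow = supply
    for i, cap in enumerate(capacities):
        flow = min(flow, cap)          # an edge cannot carry more than its trust value
        if i < len(leakage):
            flow *= 1.0 - leakage[i]   # decay at the node this edge enters
    return flow

two_hop = path_trust([0.9, 0.8], [0.1])
three_hop = path_trust([1.0, 1.0, 1.0], [0.5, 0.5])
```

Under these assumptions, longer paths and leakier intermediaries deliver less flow, which matches the abstract's description of trust decaying as it propagates.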


112 citations

Journal Article•DOI•

A Performance Evaluation of Machine Learning-Based Streaming Spam Tweets Detection


Chao Chen¹, Jun Zhang¹, Yi Xie², Yang Xiang¹, Wanlei Zhou¹, Mohammad Mehedi Hassan³, Abdulhameed Alelaiwi³, Majed Alrubaian³

Deakin University¹, Sun Yat-sen University², King Saud University³

01 Sep 2015-IEEE Transactions on Computational Social Systems

TL;DR: A performance evaluation of existing machine learning-based streaming spam detection methods is carried out. The results show that streaming spam tweet detection is still a big challenge and that a robust detection technique should take into account the three aspects of data, feature, and model.


Abstract: The popularity of Twitter attracts more and more spammers. Spammers send unwanted tweets to Twitter users to promote websites or services, which are harmful to normal users. In order to stop spammers, researchers have proposed a number of mechanisms. The focus of recent works is on the application of machine learning techniques to Twitter spam detection. However, tweets are retrieved in a streaming way, and Twitter provides the Streaming API for developers and researchers to access public tweets in real time. A performance evaluation of existing machine learning-based streaming spam detection methods is lacking. In this paper, we bridged the gap by carrying out a performance evaluation from three different aspects: data, feature, and model. A big ground-truth of over 600 million public tweets was created by using a commercial URL-based security tool. For real-time spam detection, we further extracted 12 lightweight features for tweet representation. Spam detection was then transformed to a binary classification problem in the feature space and can be solved by conventional machine learning algorithms. We evaluated the impact of different factors on the spam detection performance, which included spam to nonspam ratio, feature discretization, training data size, data sampling, time-related data, and machine learning algorithms. The results show that streaming spam tweet detection is still a big challenge and that a robust detection technique should take into account the three aspects of data, feature, and model.


102 citations
