CPSFS: A Credible Personalized Spam Filtering Scheme by Crowdsourcing

doi:10.1155/2017/1457870

Journal Article

Xin Liu¹, Zou Pingjun¹, Weishan Zhang¹, Jiehan Zhou², Changying Dai¹, Wang Feng¹, Xiaomiao Zhang¹

China University of Petroleum¹, University of Oulu²

27 Dec 2017, Wireless Communications and Mobile Computing (Hindawi), Vol. 2017, pp. 1-9

TL;DR: The experimental results show that the proposed CPSFS can improve the accuracy rate of distinguishing spam from legitimate emails compared with that of Bayesian filter alone.

Abstract: Email spam consumes a lot of network resources and threatens many systems because of its unwanted or malicious content. Most existing spam filters only target complete-spam but ignore semispam. This paper proposes CPSFS (Credible Personalized Spam Filtering Scheme), a novel and comprehensive scheme that classifies spam into two categories, complete-spam and semispam, and targets filtering both kinds of spam. Complete-spam is always spam for all users; semispam is an email identified as spam by some users and as regular email by other users. In CPSFS, Bayesian filtering is deployed at email servers to identify complete-spam, while semispam is identified at the client side by crowdsourcing. An email user client can distinguish junk from legitimate emails according to spam reports from credible contacts with similar interests. Social trust and interest similarity between users and their contacts are calculated so that spam reports are more accurately targeted to similar users. The experimental results show that the proposed CPSFS can improve the accuracy rate of distinguishing spam from legitimate emails compared with a Bayesian filter alone.
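The server-side step that catches complete-spam can be illustrated with a textbook multinomial naive Bayes classifier. This is a minimal sketch under stated assumptions, not the filter used in CPSFS: the class name `NaiveBayesSpamFilter`, whitespace tokenization, Laplace smoothing with `alpha=1.0`, and the rule "flag as complete-spam when the spam score exceeds the legitimate score" are all illustrative choices. Log-probabilities are summed rather than multiplying raw probabilities, which avoids floating-point underflow on long messages.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Hypothetical multinomial naive Bayes filter with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}
        self.vocab: set[str] = set()

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Deliberately simple: lowercase and split on whitespace.
        return text.lower().split()

    def train(self, text: str, label: str) -> None:
        """Add one labelled message ("spam" or "ham") to the model."""
        tokens = self.tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_score(self, text: str, label: str) -> float:
        """log P(label) + sum of log P(token | label), smoothed."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # log prior
        denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
        for token in self.tokenize(text):
            # Counter returns 0 for unseen tokens, so smoothing keeps this > 0.
            score += math.log((self.word_counts[label][token] + self.alpha) / denom)
        return score

    def is_complete_spam(self, text: str) -> bool:
        return self.log_score(text, "spam") > self.log_score(text, "ham")
```

For example, after training on "win free money now" and "cheap pills free offer" as spam and on two meeting-related messages as legitimate, "free money offer" scores higher under the spam class. Both classes need at least one training message before `is_complete_spam` is called, because the log prior is undefined for an empty class.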

Citations

Journal Article

A lifelong spam emails classification model

Rami Mustafa A. Mohammad¹

University of Dammam¹

23 Jan 2020, Applied Computing and Informatics

TL;DR: An enhanced model is proposed for ensuring lifelong spam classification model and the overall performance of the suggested model is contrasted against various other stream mining classification techniques to prove the success of the proposed model as a lifelong spam emails classification method.

17 citations

Journal Article

Design and Simulation of MIMO Antennas for Mobile Communication

D. S. Bhargava, T. V. Padmavathy, Yadhamuri Vinitha Reddy, Neelapareddy Kavitha, Vuppalapati Hema

01 Dec 2020

7 citations

Journal Article

Blockchain-Based Crowdsourcing Makes Training Dataset of Machine Learning No Longer Be in Short Supply

Haitao Xu, Wei Wei, Yong Qi, Saiyu Qi

26 Jul 2022, Wireless Communications and Mobile Computing

TL;DR: This paper reviews studies applying mobile crowdsourcing to training dataset collection and annotation and proposes a new possible combination of machine learning and crowdsourcing systems.

Abstract: Recently, machine learning has become popular in various fields like healthcare, smart transportation, networking, and big data. However, the labelled training dataset, which is one of the core components of machine learning, cannot meet the requirements of quantity, quality, and diversity due to limited data sources. Crowdsourcing systems based on mobile computing seem to address the bottlenecks faced by machine learning due to their unique advantages; i.e., crowdsourcing can let both professionals and nonprofessionals participate in the collection and annotation process, which can greatly increase the quantity of the training dataset. Additionally, distributed blockchain technology can be embedded into crowdsourcing systems to make them transparent, secure, traceable, and decentralized. Moreover, truth discovery algorithms can improve the accuracy of annotation, and a reasonable incentive mechanism will attract many workers to provide plenty of data. In this paper, we review studies applying mobile crowdsourcing to training dataset collection and annotation. In addition, after reviewing research on blockchain or incentive mechanisms, we propose a new possible combination of machine learning and crowdsourcing systems.

3 citations

Journal Article

SentiFilter: A Personalized Filtering Model for Arabic Semi-Spam Content based on Sentimental and Behavioral Analysis

Mashael M. Alsulami, Arwa Yousef Al-Aama

01 Jan 2020, International Journal of Advanced Computer Science and Applications

TL;DR: The proposed SentiFilter model is a hybrid model that combines both sentimental and behavioral factors to detect unwanted content for each user towards pre-defined topics and is expected to provide an effective automated solution for filtering semi-spam content in favor of personalized preferences.

Abstract: Unwanted content in online social network services is a substantial issue that is continuously growing and negatively affecting the user-browsing experience. Current practices do not provide personalized solutions that meet each individual’s needs and preferences. Therefore, there is a potential demand to provide each user with a personalized level of protection against what he/she perceives as unwanted content. Thus, this paper proposes a personalized filtering model, which we named SentiFilter. It is a hybrid model that combines both sentimental and behavioral factors to detect unwanted content for each user towards pre-defined topics. An experiment involving 80,098 Twitter messages from 32 users was conducted to evaluate the effectiveness of the SentiFilter model. The effectiveness was measured in terms of the consistency between the implicit feedback derived from the SentiFilter model towards five selected topics and the explicit feedback collected from participants towards the same topics. Results reveal that commenting behavior is more effective than liking behavior for detecting unwanted content because of its high consistency with users’ explicit feedback. Findings also indicate that the sentiment of users’ comments does not reflect users’ perception of unwanted content. The implicit feedback derived from the SentiFilter model agrees closely with users’ explicit feedback, as indicated by the low statistical significance of the difference between the two sets. The proposed model is expected to provide an effective automated solution for filtering semi-spam content in favor of personalized preferences.

3 citations

Cites background from "CPSFS: A Credible Personalized Spam..."

...Therefore, a trust value needs to be assigned and computed for each contact [3]....
[...]
...[3] classified spam emails into two categories: complete spam and semispam emails....
[...]
...Studies that involved users’ perspectives in identifying spam content have used terms such as semi-spam [3] and grey spam [2]....
[...]
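The snippets above note that CPSFS assigns and computes a trust value for each contact, and the CPSFS abstract adds that interest similarity is also used so that spam reports reach similar users. A hypothetical sketch of how the two could be combined on the client side is a weighted vote over contacts' reports. Everything here is an assumption for illustration, not the paper's formulas: the names `SpamReport`, `interaction_trust`, and `is_semispam`, the trust rule `m / (m + 1)` over mutual message counts, the product weighting `trust × similarity`, and the 0.5 threshold.

```python
from dataclasses import dataclass


@dataclass
class SpamReport:
    contact: str   # hypothetical: address of the contact who labelled the email
    is_spam: bool  # True if that contact reported the email as spam


def interaction_trust(sent_to: int, received_from: int) -> float:
    """Hypothetical social-trust score in [0, 1).

    Counts are assumed non-negative. Trust grows with two-way interaction:
    m = min(sent, received), trust = m / (m + 1), so one-way contact gives 0.
    """
    mutual = min(sent_to, received_from)
    return mutual / (mutual + 1)


def is_semispam(reports: list[SpamReport],
                trust: dict[str, float],
                similarity: dict[str, float],
                threshold: float = 0.5) -> bool:
    """Trust- and similarity-weighted vote over contacts' spam reports."""
    spam_weight = 0.0
    total_weight = 0.0
    for report in reports:
        # Contacts missing from either map get weight 0 and cannot sway the vote.
        weight = trust.get(report.contact, 0.0) * similarity.get(report.contact, 0.0)
        total_weight += weight
        if report.is_spam:
            spam_weight += weight
    if total_weight == 0.0:
        return False  # no credible evidence: keep the email in the inbox
    return spam_weight / total_weight >= threshold
```

With this weighting, a report from a frequent two-way correspondent with closely matching interests outweighs several reports from one-way or dissimilar contacts, which is the intuition the abstract describes as reports being "more accurately targeted to similar users".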

Journal Article

Hyperparameter Optimization of Ensemble Models for Spam Email Detection

Temidayo Oluwatosin Omotehinwa, David Opeoluwa Oyewola

03 Feb 2023, Applied Sciences

TL;DR: In this paper, the authors developed baseline models of random forest and extreme gradient boost (XGBoost) ensemble algorithms for the detection and classification of spam emails using the Enron1 dataset.

Abstract: Unsolicited emails, popularly referred to as spam, have remained one of the biggest threats to cybersecurity globally. More than half of the emails sent in 2021 were spam, resulting in huge financial losses. The tenacity and perpetual presence of the adversary, the spammer, has necessitated improved efforts at filtering spam. This study, therefore, developed baseline models of random forest and extreme gradient boost (XGBoost) ensemble algorithms for the detection and classification of spam emails using the Enron1 dataset. The developed ensemble models were then optimized using the grid-search cross-validation technique to search the hyperparameter space for optimal hyperparameter values. The performance of the baseline (un-tuned) and the tuned models of both algorithms were evaluated and compared. The impact of hyperparameter tuning on both models was also examined. The findings of the experimental study revealed that hyperparameter tuning improved the performance of both models when compared with the baseline models. The tuned RF and XGBoost models achieved an accuracy of 97.78% and 98.09%, a sensitivity of 98.44% and 98.84%, and an F1 score of 97.85% and 98.16%, respectively. The XGBoost model outperformed the random forest model. The developed XGBoost model is effective and efficient for spam email detection.

2 citations

References

Proceedings Article

User interactions in social networks and their implications

Christo Wilson¹, Bryce Boe¹, Alessandra Sala¹, Krishna P. N. Puttaswamy¹, Ben Y. Zhao¹

University of California, Santa Barbara¹

01 Apr 2009

TL;DR: This paper proposes the use of interaction graphs to impart meaning to online social links by quantifying user interactions, and uses both types of graphs to validate two well-known social-based applications (RE and SybilGuard).

Abstract: Social networks are popular platforms for interaction, communication and collaboration between friends. Researchers have recently proposed an emerging class of applications that leverage relationships from social networks to improve security and performance in applications such as email, web browsing and overlay routing. While these applications often cite social network connectivity statistics to support their designs, researchers in psychology and sociology have repeatedly cast doubt on the practice of inferring meaningful relationships from social network connections alone. This leads to the question: Are social links valid indicators of real user interaction? If not, then how can we quantify these factors to form a more accurate model for evaluating socially-enhanced applications? In this paper, we address this question through a detailed study of user interactions in the Facebook social network. We propose the use of interaction graphs to impart meaning to online social links by quantifying user interactions. We analyze interaction graphs derived from Facebook user traces and show that they exhibit significantly lower levels of the "small-world" properties shown in their social graph counterparts. This means that these graphs have fewer "supernodes" with extremely high degree, and overall network diameter increases significantly as a result. To quantify the impact of our observations, we use both types of graphs to validate two well-known social-based applications (RE and SybilGuard). The results reveal new insights into both systems, and confirm our hypothesis that studies of social applications should use real indicators of user interactions in lieu of social graphs.

992 citations

"CPSFS: A Credible Personalized Spam..." refers to this paper for background

...The interests of a user in a social network represent the user’s personality [25]....
[...]
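The snippet above treats a user's interests as a reflection of their personality, and the CPSFS abstract uses interest similarity between a user and each contact to decide how strongly that contact's spam reports should count. One common way to quantify such similarity, assumed here rather than taken from the paper, is cosine similarity between interest-weight vectors. The function name `interest_similarity` and the topic-to-weight dictionary representation are hypothetical.

```python
import math


def interest_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse interest vectors (topic -> weight).

    For non-negative weights the result lies in [0, 1]. Returns 0.0 when either
    vector is empty or all-zero, since the angle is undefined in that case.
    """
    # Only topics present in both profiles contribute to the dot product.
    dot = sum(weight * b[topic] for topic, weight in a.items() if topic in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Identical interest profiles score 1.0 (up to floating-point rounding) and profiles with no shared topics score 0.0. Cosine similarity ignores overall activity level, so a heavy and a light user with the same topic mix are still judged similar; whether that suits spam-report weighting is a design choice, not something the source specifies.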

Journal Article

Mutual Verifiable Provable Data Auditing in Public Cloud Storage

Yongjun Ren, Jian Shen, Jin Wang, Jin Han, Sungyoung Lee

01 Mar 2015, Journal of Internet Technology

TL;DR: This paper proposes an efficient mutual verifiable provable data possession scheme, which utilizes Diffie-Hellman shared key to construct the homomorphic authenticator and is very efficient compared with the previous PDP schemes, since the bilinear operation is not required.

Abstract: Cloud storage is now a hot research topic in information technology. In cloud storage, data security properties such as data confidentiality, integrity and availability become more and more important in many commercial applications. Recently, many provable data possession (PDP) schemes have been proposed to protect data integrity. In some cases, the remote data possession checking task has to be delegated to a proxy. However, these PDP schemes are not secure since the proxy stores some state information in cloud storage servers. Hence, in this paper, we propose an efficient mutual verifiable provable data possession scheme, which utilizes a Diffie-Hellman shared key to construct the homomorphic authenticator. In particular, the verifier in our scheme is stateless and independent of the cloud storage service. It is worth noting that the presented scheme is very efficient compared with previous PDP schemes, since no bilinear operation is required.

349 citations

Journal Article

Twitter spammer detection using data stream clustering

Zachary Miller¹, Brian Dickinson¹, William Deitrick¹, Wei Hu¹, Alex Hai Wang²

Houghton College¹, Penn State College of Information Sciences and Technology²

01 Mar 2014, Information Sciences

TL;DR: To effectively handle the streaming nature of tweets, two stream clustering algorithms, StreamKM++ and DenStream, were modified to facilitate spam identification, and the system was able to identify 100% of the spammers in the authors' test while incorrectly detecting only 2.2% of normal users as spammers.

293 citations

"CPSFS: A Credible Personalized Spam..." refers to this paper for background

...Spam consumes network bandwidth and brings also other threats to recipients: unwanted advertisements and pornographic content, as well as malicious viruses [1]....
[...]

Journal Article

Vehicular Social Networks: Enabling Smart Mobility

Zhaolong Ning¹, Feng Xia¹, Noor Ullah¹, Xiangjie Kong¹, Xiping Hu²

Dalian University of Technology¹, Chinese Academy of Sciences²

01 May 2017, IEEE Communications Magazine

TL;DR: An application scenario on trajectory data-analysis-based traffic anomaly detection for VSNs and several research challenges and open issues are highlighted and discussed.

Abstract: Vehicular transportation is an essential part of modern cities. However, the ever-increasing number of road accidents, traffic congestion, and other such issues have become obstacles to the realization of smart cities. As the integration of the Internet of Vehicles and social networks, vehicular social networks (VSNs) are promising for solving the above-mentioned problems by enabling smart mobility in modern cities, and are likely to pave the way for sustainable development by promoting transportation efficiency. In this article, the definition of and a brief introduction to VSNs are presented first. Existing supporting communication technologies are then summarized. Furthermore, we introduce an application scenario on trajectory data-analysis-based traffic anomaly detection for VSNs. Finally, several research challenges and open issues are highlighted and discussed.

286 citations

Journal Article

Leveraging social networks to fight spam

P.O. Boykin¹, Vwani P. Roychowdhury²

University of Florida¹, University of California, Los Angeles²

01 Apr 2005, IEEE Computer

TL;DR: An automated antispam tool exploits the properties of social networks to distinguish between unsolicited commercial e-mail - spam - and messages associated with people the user knows.

Abstract: Social networks are useful for judging the trustworthiness of outsiders. An automated antispam tool exploits the properties of social networks to distinguish between unsolicited commercial e-mail - spam - and messages associated with people the user knows.

258 citations

"CPSFS: A Credible Personalized Spam..." refers to this paper for background

...Similarly, Boykin and Roychowdhury [22] proposed a spam filtering approach based on social networks, which allows users to share the spam information with their friends to identify spam....
[...]
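Beyond the report-sharing idea summarized in the snippet, Boykin and Roychowdhury's tool is known for building a graph of addresses from email headers and examining each node's local clustering coefficient: a user's genuine correspondents tend to know one another, so their neighborhoods are densely interconnected, whereas a spammer's recipients are usually strangers to each other. The sketch below shows only that coefficient computation. The names `build_graph` and `clustering_coefficient`, and the rule of linking a sender to each recipient, are simplifying assumptions; the published tool's exact graph construction and decision thresholds are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(emails: list[tuple[str, list[str]]]) -> dict[str, set[str]]:
    """Undirected address graph with one edge between a sender and each recipient.

    `emails` holds (sender, recipients) pairs, e.g. parsed from From/To/Cc headers.
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for sender, recipients in emails:
        for recipient in recipients:
            if recipient != sender:  # ignore self-addressed copies
                graph[sender].add(recipient)
                graph[recipient].add(sender)
    return graph


def clustering_coefficient(graph: dict[str, set[str]], node: str) -> float:
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = graph.get(node, set())
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for degree < 2; treated as 0 here
    linked_pairs = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2.0 * linked_pairs / (k * (k - 1))
```

On a toy input where `alice` writes to `bob` and `carol`, and `bob` writes to `carol`, the three form a triangle and `alice` has coefficient 1.0; an address that mails three recipients who never correspond with each other has coefficient 0.0.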