CPSFS: A Credible Personalized Spam Filtering Scheme by Crowdsourcing

doi:10.1155/2017/1457870


Journal Article•DOI•


Xin Liu¹, Zou Pingjun¹, Weishan Zhang¹, Jiehan Zhou², Changying Dai¹, Wang Feng¹, Xiaomiao Zhang¹

China University of Petroleum¹, University of Oulu²

27 Dec 2017-Wireless Communications and Mobile Computing (Hindawi)-Vol. 2017, pp 1-9

TL;DR: The experimental results show that the proposed CPSFS achieves a higher accuracy in distinguishing spam from legitimate emails than a Bayesian filter alone.

Abstract: Email spam consumes a lot of network resources and threatens many systems because of its unwanted or malicious content. Most existing spam filters target only complete-spam and ignore semispam. This paper proposes a novel and comprehensive scheme, CPSFS (Credible Personalized Spam Filtering Scheme), which classifies spam into two categories, complete-spam and semispam, and filters both. Complete-spam is spam for all users; semispam is an email identified as spam by some users and as regular email by others. In CPSFS, Bayesian filtering is deployed at email servers to identify complete-spam, while semispam is identified at the client side by crowdsourcing. An email client can distinguish junk from legitimate emails according to spam reports from credible contacts with similar interests. Social trust and interest similarity between users and their contacts are calculated so that spam reports are more accurately targeted to similar users. The experimental results show that the proposed CPSFS achieves a higher accuracy in distinguishing spam from legitimate emails than a Bayesian filter alone.
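The two-stage decision described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the `Contact` fields, the product `trust * similarity` as a contact's weight, the normalised score, and the 0.5 threshold.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    trust: float       # social trust, assumed to lie in [0, 1]
    similarity: float  # interest similarity, assumed to lie in [0, 1]


def contact_weight(contact):
    # Assumed combination rule: a report counts more when the contact
    # is both trusted and similar in interests.
    return contact.trust * contact.similarity


def semispam_score(reporters, contacts):
    """Share of the total contact weight held by contacts who reported the email."""
    total = sum(contact_weight(c) for c in contacts)
    if total == 0:
        return 0.0
    reported = sum(contact_weight(c) for c in contacts if c.name in reporters)
    return reported / total


def classify(server_flagged, reporters, contacts, threshold=0.5):
    # Stage 1: the server-side Bayesian filter has already marked complete-spam.
    if server_flagged:
        return "complete-spam"
    # Stage 2: the client decides from crowdsourced spam reports.
    if semispam_score(reporters, contacts) >= threshold:
        return "semispam"
    return "legitimate"


contacts = [
    Contact("a", trust=0.9, similarity=0.8),
    Contact("b", trust=0.2, similarity=0.1),
    Contact("c", trust=0.5, similarity=0.5),
]
print(classify(False, {"a"}, contacts))  # report from a trusted, similar contact
print(classify(False, {"b"}, contacts))  # report from a weakly trusted contact
```

In this sketch a single report from a trusted, similar contact can outweigh reports from weakly related contacts, which is one way to read the abstract's aim of targeting spam reports to similar users.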


Citations


Journal Article•DOI•

A lifelong spam emails classification model

Rami Mustafa A. Mohammad¹

University of Dammam¹

23 Jan 2020-Applied Computing and Informatics

TL;DR: An enhanced model is proposed for lifelong spam classification, and its overall performance is compared against various other stream mining classification techniques to show that it succeeds as a lifelong spam email classification method.

17 citations

Journal Article•DOI•

Design and Simulation of MIMO Antennas for Mobile Communication

D. S. Bhargava, T. V. Padmavathy, Yadhamuri Vinitha Reddy, Neelapareddy Kavitha, Vuppalapati Hema

01 Dec 2020

7 citations

Journal Article•DOI•

Blockchain-Based Crowdsourcing Makes Training Dataset of Machine Learning No Longer Be in Short Supply


Haitao Xu, Wei Wei, Yong Qi, Saiyu Qi

26 Jul 2022-Wireless Communications and Mobile Computing

TL;DR: This paper reviews studies applying mobile crowdsourcing to training dataset collection and annotation and proposes a new possible combination of machine learning and crowdsourcing systems.

Abstract: Recently, machine learning has become popular in various fields such as healthcare, smart transportation, networking, and big data. However, the labelled training dataset, which is at the core of machine learning, cannot meet the requirements of quantity, quality, and diversity because of limited data sources. Crowdsourcing systems based on mobile computing seem to address these bottlenecks because of their unique advantages: crowdsourcing lets both professionals and nonprofessionals participate in the collection and annotation process, which can greatly increase the size of the training dataset. Additionally, distributed blockchain technology can be embedded into crowdsourcing systems to make them transparent, secure, traceable, and decentralized. Moreover, truth discovery algorithms can improve the accuracy of annotation, and a reasonable incentive mechanism can attract many workers to provide plenty of data. In this paper, we review studies applying mobile crowdsourcing to training dataset collection and annotation. In addition, after reviewing research on blockchain or incentive mechanisms, we propose a new possible combination of machine learning and crowdsourcing systems.

3 citations

Journal Article•DOI•

SentiFilter: A Personalized Filtering Model for Arabic Semi-Spam Content based on Sentimental and Behavioral Analysis


Mashael M. Alsulami, Arwa Yousef Al-Aama

01 Jan 2020-International Journal of Advanced Computer Science and Applications

TL;DR: The proposed SentiFilter model is a hybrid model that combines both sentimental and behavioral factors to detect unwanted content for each user towards pre-defined topics and is expected to provide an effective automated solution for filtering semi-spam content in favor of personalized preferences.

Abstract: Unwanted content in online social network services is a substantial issue that is continuously growing and negatively affecting the user-browsing experience. Current practices do not provide personalized solutions that meet each individual’s needs and preferences. Therefore, there is a potential demand to provide each user with a personalized level of protection against what he/she perceives as unwanted content. Thus, this paper proposes a personalized filtering model, which we named SentiFilter. It is a hybrid model that combines both sentimental and behavioral factors to detect unwanted content for each user towards pre-defined topics. An experiment involving 80,098 Twitter messages from 32 users was conducted to evaluate the effectiveness of the SentiFilter model. The effectiveness was measured in terms of the consistency between the implicit feedback derived from the SentiFilter model towards five selected topics and the explicit feedback collected from participants towards the same topics. Results reveal that commenting behavior is more effective than liking behavior for detecting unwanted content because of its high consistency with users’ explicit feedback. Findings also indicate that the sentiment of users’ comments does not reflect users’ perception of unwanted content. The implicit feedback derived from the SentiFilter model agrees with users’ explicit feedback, as indicated by the low statistical significance of the difference between the two sets. The proposed model is expected to provide an effective automated solution for filtering semi-spam content in favor of personalized preferences.

3 citations

Cites background from "CPSFS: A Credible Personalized Spam..."

...Therefore, a trust value needs to be assigned and computed for each contact [3]....
[...]
...[3] classified spam emails into two categories: complete spam and semispam emails....
[...]
...Studies that involved users' perspectives in identifying spam content have used terms such as semi-spam [3] and grey spam [2]....
[...]

Journal Article•DOI•

Hyperparameter Optimization of Ensemble Models for Spam Email Detection


Temidayo Oluwatosin Omotehinwa, David Opeoluwa Oyewola

03 Feb 2023-Applied Sciences

TL;DR: In this paper, the authors developed baseline models of random forest and extreme gradient boost (XGBoost) ensemble algorithms for the detection and classification of spam emails using the Enron1 dataset.

Abstract: Unsolicited emails, popularly referred to as spam, have remained one of the biggest threats to cybersecurity globally. More than half of the emails sent in 2021 were spam, resulting in huge financial losses. The tenacity and perpetual presence of the adversary, the spammer, has necessitated improved efforts at filtering spam. This study, therefore, developed baseline models of random forest and extreme gradient boost (XGBoost) ensemble algorithms for the detection and classification of spam emails using the Enron1 dataset. The developed ensemble models were then optimized using the grid-search cross-validation technique to search the hyperparameter space for optimal hyperparameter values. The performance of the baseline (un-tuned) and the tuned models of both algorithms was evaluated and compared. The impact of hyperparameter tuning on both models was also examined. The findings of the experimental study revealed that hyperparameter tuning improved the performance of both models when compared with the baseline models. The tuned RF and XGBoost models achieved an accuracy of 97.78% and 98.09%, a sensitivity of 98.44% and 98.84%, and an F1 score of 97.85% and 98.16%, respectively. The XGBoost model outperformed the random forest model. The developed XGBoost model is effective and efficient for spam email detection.

2 citations

References


Journal Article•DOI•

Speed Up Statistical Spam Filter by Approximation

Zhenyu Zhong¹, Kang Li²

McAfee¹, University of Georgia²

01 Jan 2011-IEEE Transactions on Computers

TL;DR: This study proposes a series of acceleration techniques that speed up Bayesian filters based on approximate classifications and demonstrates a 6× speedup over two well-known spam filters while achieving an identical false positive rate and similar false negative rate to the original filters.


Abstract: Statistical-based Bayesian filters have become a popular and important defense against spam. However, despite their effectiveness, their greater processing overhead can prevent them from scaling well for enterprise level mail servers. For example, the dictionary lookups that are characteristic of this approach are limited by the memory access rate, therefore relatively insensitive to increases in CPU speed. We conduct a comprehensive study to address this scaling issue by proposing a series of acceleration techniques that speed up Bayesian filters based on approximate classifications. The core approximation technique uses hash-based lookup and lossy encoding. Lookup approximation is based on the popular Bloom filter data structure with an extension to support value retrieval. Lossy encoding is used to further compress the data structure. While these approximation methods introduce additional errors to a strict Bayesian approach, we show how the errors can be both minimized and biased toward a false negative classification. We demonstrate a 6× speedup over two well-known spam filters (bogofilter and qsf) while achieving an identical false positive rate and similar false negative rate to the original filters.
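To make the hash-based lookup idea concrete, here is a minimal Python sketch of a plain Bloom filter. It covers membership tests only; the value-retrieval extension and the lossy encoding mentioned in the abstract are not reproduced. The class name, the bit-array size, the number of hashes, and the use of salted SHA-256 are all illustrative assumptions, not details from the paper.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, token):
        # Derive several positions by salting one hash function (an assumption).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, token):
        for pos in self._positions(token):
            self.bits[pos] = 1

    def __contains__(self, token):
        return all(self.bits[pos] for pos in self._positions(token))


spam_tokens = BloomFilter()
for word in ("lottery", "winner", "prize"):
    spam_tokens.add(word)
print("lottery" in spam_tokens)  # a token that was added is always found
```

A lookup touches a fixed number of positions in a compact array instead of walking a dictionary, which is the kind of trade (small approximation error for fewer memory accesses) the abstract describes.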


17 citations

"CPSFS: A Credible Personalized Spam..." refers methods in this paper

...All emails of a user are examined by a Bayesian filter at an email server before they reach clients [28]....
[...]

Book Chapter•DOI•

Fast Computation of Similarity Based on Jaccard Coefficient for Composition-Based Image Retrieval

Michihiro Kobayakawa, Shigenao Kinjo, Mamoru Hoshi, Tadashi Ohmori, Atsushi Yamamoto

15 Dec 2009

TL;DR: An algorithm and data structure for fast computation of similarity based on the Jaccard coefficient, used to retrieve images with regions similar to those of a query image; the key idea is to use the runlength description of an image to compute the number of overlapped pixels between regions.

Abstract: This paper proposes an algorithm and data structure for fast computation of similarity based on the Jaccard coefficient to retrieve images with regions similar to those of a query image. The similarity measures the degree of overlap between the regions of one image and those of another. The key idea for fast computation of the similarity is to use the runlength description of an image to compute the number of overlapped pixels between the regions. We present an algorithm and data structure, and run experiments on 30,000 images to evaluate the performance of our algorithm. Experiments showed that the proposed algorithm is 5.49 times faster than a naive algorithm on average and 2.36 times faster in the worst case. We also give fairly good theoretical estimates of the computation time.

17 citations

"CPSFS: A Credible Personalized Spam..." refers background in this paper

...The more the mutual interests and disinterests between a user and his or her contacts are, the more similar they are [26]....
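As an illustration of how the Jaccard coefficient could express this kind of similarity, the Python sketch below compares two users' interests as sets of topic labels. Representing interests as sets, and the topic names themselves, are assumptions made for the example; the excerpt does not state the exact formula used in CPSFS.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections, in [0, 1]."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets: defined as 0.0 here by convention (an assumption).
        return 0.0
    return len(a & b) / len(union)


user_interests = {"ai", "music", "golf"}
contact_interests = {"ai", "music", "chess"}
print(jaccard(user_interests, contact_interests))  # 2 shared out of 4 distinct topics
```

Disinterests could be handled the same way with a second pair of sets, with the two coefficients then combined; how the paper actually combines them is not shown in the excerpt.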
[...]

Proceedings Article•DOI•

A Dynamic Trust Conference Algorithm for Social Network

Liu Xin, Shi Leyi, Wang Yao, Xin Zhaojun, Fu Wenjing

28 Oct 2013

TL;DR: Statistical analysis of trust values in a social network shows that this algorithm improves the accuracy of trust transitivity and trust value computation compared with a classical trust algorithm.

Abstract: More and more users are joining social networks. A precise social trust value is critical for application systems such as recommendation systems. For a user, the egocentric network is formed by the user, his friends, and the social relationships between him and other users. We propose an algorithm for inferring dynamic trust based on trust chains and interactions. Indirect trust values are calculated from direct trust values and trust chains in the egocentric network. As the social network evolves, the dynamic trust values are derived from the interactions between a user and his friends and trust reference. Statistical analysis of trust values in a social network shows that this algorithm improves the accuracy of trust transitivity and trust value computation compared with a classical trust algorithm.
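One common way to derive an indirect trust value from direct values and trust chains is sketched below in Python. It is an assumption for illustration only, not the algorithm of this paper: trust is multiplied along each chain and the strongest chain is kept. The user names, the `direct` dictionary layout, and both function names are hypothetical.

```python
def chain_trust(direct, chain):
    """Multiply direct trust values along one chain of users (assumed rule)."""
    trust = 1.0
    for src, dst in zip(chain, chain[1:]):
        trust *= direct[(src, dst)]
    return trust


def indirect_trust(direct, chains):
    """Keep the strongest chain; 0.0 when no chain connects the two users."""
    return max((chain_trust(direct, c) for c in chains), default=0.0)


# Direct trust values in alice's egocentric network (illustrative numbers).
direct = {
    ("alice", "bob"): 0.8,
    ("bob", "dave"): 0.5,
    ("alice", "carol"): 0.6,
    ("carol", "dave"): 0.9,
}
chains = [["alice", "bob", "dave"], ["alice", "carol", "dave"]]
print(indirect_trust(direct, chains))  # the chain through carol is the stronger one
```

Multiplication makes trust decay with chain length, and taking the maximum is only one possible aggregation; averaging over chains would be another.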


14 citations

"CPSFS: A Credible Personalized Spam..." refers methods in this paper

...Social Computing Approach....
[...]
...We divided the existing work into four types based on the used techniques: the Black/White List, Bayesian, Machine Learning, and Social Computing....
[...]
...Social trust can be calculated by analyzing Social Computing [15, 16]....
[...]

Journal Article•DOI•

Patching by automatically tending to hub nodes based on social trust

Xin Liu¹, Yao Wang¹, Dehai Zhao¹, Weishan Zhang¹, Leyi Shi¹

China University of Petroleum¹

01 Feb 2016-Computer Standards & Interfaces

TL;DR: Experiments show that the proposed distributed patching mechanism, in which the patch tends to hub nodes automatically based on social computing in social networks, is more efficient than other patching mechanisms.

7 citations

"CPSFS: A Credible Personalized Spam..." refers methods in this paper

...Social Computing Approach....
[...]
...We divided the existing work into four types based on the used techniques: the Black/White List, Bayesian, Machine Learning, and Social Computing....
[...]
...Social trust can be calculated by analyzing Social Computing [15, 16]....
[...]

Proceedings Article•

Mailbook: A social network against spamming

Dimitris Zisiadis¹, Spyros Kopsidas¹, Argyris Varalis¹, Leandros Tassiulas¹

University of Thessaly¹

01 Dec 2011

TL;DR: Proposes mailbook, a user-based collaborative approach to the spam problem in which users exchange vote databases containing the hash values of the emails they perceive as spam.

Abstract: Spam is the main problem of email systems nowadays. Spam emails account for more than 75% of all emails exchanged worldwide; recent reports raise this number to more than 90%. Novel anti-spam solutions are proposed constantly, only to be followed by announcements of sophisticated methods that overcome them through advanced software to reach the spammers' goal. In this paper we propose mailbook, a collaborative spam filter over a social network that exchanges vote databases containing the hash values of the emails perceived as spam by its users. Social networks are blooming nowadays and users are growing more accustomed to them every day. Our proposal builds upon that strong attachment between friends and people with the same interests and habits. We propose a user-based collaborative approach to address the spam problem. Users characterize spam mail and exchange their votes among their friends through mailbook. User profiles are created in mailbook to express user interests, which in turn are used for evaluating mail as spam according to the user's characteristics. Users also form groups of interests, which our method uses as another means to evaluate spam for the specific group more effectively.
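The exchange of hashed spam votes can be sketched in Python as follows. The abstract does not specify the hash function, the normalisation, or the database layout, so SHA-256, the whitespace and case normalisation, and the `VoteDatabase` class are assumptions chosen only to make the idea concrete.

```python
import hashlib
from collections import defaultdict


def email_hash(body):
    # Normalise case and whitespace so trivially different copies match (assumed).
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class VoteDatabase:
    """Maps an email hash to the set of users who voted it as spam."""

    def __init__(self):
        self.votes = defaultdict(set)

    def vote_spam(self, voter, body):
        self.votes[email_hash(body)].add(voter)

    def merge(self, other):
        # Import a friend's votes; only hashes are shared, not email text.
        for digest, voters in other.votes.items():
            self.votes[digest] |= voters

    def spam_votes(self, body):
        return len(self.votes.get(email_hash(body), ()))


mine, friend = VoteDatabase(), VoteDatabase()
mine.vote_spam("me", "Win a FREE prize")
friend.vote_spam("friend", "win a free   prize")
mine.merge(friend)
print(mine.spam_votes("Win a free prize"))  # both votes count for the same hash
```

Exchanging hashes rather than message bodies lets friends share spam judgements without revealing email content, though exact hashing only matches near-identical messages.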


4 citations