Detecting Spammers on Twitter

Open Access

TLDR

This paper uses tweets related to three famous trending topics from 2009 to construct a large labeled collection of users, manually classified into spammers and non-spammers, and identifies a number of characteristics related to tweet content and user social behavior which could potentially be used to detect spammers.

Abstract:

With millions of users tweeting around the world, real-time search systems and different types of mining tools are emerging to allow people to track the repercussions of events and news on Twitter. However, although appealing as mechanisms to ease the spread of news and allow users to discuss events and post their status, these services open opportunities for new forms of spam. Trending topics, the most talked about items on Twitter at a given point in time, have been seen as an opportunity to generate traffic and revenue. Spammers post tweets containing typical words of a trending topic and URLs, usually obfuscated by URL shorteners, that lead users to completely unrelated websites. This kind of spam can devalue real-time search services unless mechanisms to fight and stop spammers can be found. In this paper we consider the problem of detecting spammers on Twitter. We first collected a large Twitter dataset that includes more than 54 million users, 1.9 billion links, and almost 1.8 billion tweets. Using tweets related to three famous trending topics from 2009, we construct a large labeled collection of users, manually classified into spammers and non-spammers. We then identify a number of characteristics related to tweet content and user social behavior that could potentially be used to detect spammers. We used these characteristics as attributes of a machine learning process for classifying users as either spammers or non-spammers. Our strategy succeeds at detecting most of the spammers while misclassifying only a small percentage of non-spammers: approximately 70% of spammers and 96% of non-spammers were correctly classified. Our results also highlight the most important attributes for spam detection on Twitter.
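The two figures reported at the end of the abstract are per-class rates: the fraction of true spammers labeled as spammers, and the fraction of true non-spammers labeled as non-spammers. The minimal sketch below shows how such rates could be computed from a list of true labels and a list of predicted labels. The function name `per_class_recall` and the toy counts are assumptions chosen for illustration only; they are not the paper's data or its evaluation code.

```python
from collections import Counter


def per_class_recall(true_labels, predicted_labels):
    """Return, for each true class, the fraction of its members predicted correctly."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {label: correct[label] / totals[label] for label in totals}


# Toy data (hypothetical): 10 spammers and 25 non-spammers.
true_labels = ["spammer"] * 10 + ["non-spammer"] * 25
predicted_labels = (
    ["spammer"] * 7          # spammers detected
    + ["non-spammer"] * 3    # spammers missed
    + ["non-spammer"] * 24   # non-spammers kept
    + ["spammer"] * 1        # non-spammer misclassified as spammer
)

recall = per_class_recall(true_labels, predicted_labels)
for label, value in recall.items():
    print(f"{label}: {value:.2f}")
```

With these toy counts, 7 of 10 spammers and 24 of 25 non-spammers are labeled correctly, which mirrors the shape of the reported result: a classifier can miss a sizeable share of spammers while still rarely flagging legitimate users.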

Citations

Information credibility on twitter

Processing Social Media Messages in Mass Emergency: A Survey

A Survey of Techniques for Event Detection in Twitter

Design and Evaluation of a Real-Time URL Spam Filtering Service

Truthy: mapping the spread of astroturf in microblog streams

References

Data Mining: Practical Machine Learning Tools and Techniques

Text Categorization with Support Vector Machines: Learning with Many Relevant Features

What is Twitter, a social network or a news media?

A Comparative Study on Feature Selection in Text Categorization

Measuring User Influence in Twitter: The Million Follower Fallacy

Related Papers

Detecting spammers on social networks

@spam: the underground on 140 characters or less

Don't follow me: Spam detection in Twitter

Uncovering social spammers: social honeypots + machine learning

What is Twitter, a social network or a news media?