Real-time Detection of Content Polluters in Partially Observable
Twitter Networks
Mehwish Nasim
School of Mathematical Sciences
University of Adelaide
Adelaide, Australia
mehwish.nasim@adelaide.edu.au
Andrew Nguyen
School of Mathematical Sciences
University of Adelaide
Adelaide, Australia
andrew.nguyen03@adelaide.edu.au
Nick Lothian
Tyto.ai
Adelaide, Australia
nick.lothian@gmail.com
Robert Cope
School of Mathematical Sciences
University of Adelaide
Adelaide, Australia
robert.cope@adelaide.edu.au
Lewis Mitchell
School of Mathematical Sciences
University of Adelaide
Adelaide, Australia
lewis.mitchell@adelaide.edu.au
ABSTRACT
Content polluters, or bots that hijack a conversation for political or advertising purposes, are a known problem for event prediction, election forecasting, and distinguishing real news from fake news in social media data. Identifying this type of bot is particularly challenging, with state-of-the-art methods utilising large volumes of network data as features for machine learning models. Such datasets are generally not readily available in typical applications which stream social media data for real-time event prediction. In this work we develop a methodology to detect content polluters in social media datasets that are streamed in real time. Applying our method to the problem of civil unrest event prediction in Australia, we identify content polluters from individual tweets, without collecting social network or historical data from individual accounts. We identify some peculiar characteristics of these bots in our dataset and propose metrics for identification of such accounts. We then pose some research questions around this type of bot detection, including: how good Twitter is at detecting content polluters, and how well state-of-the-art methods perform in detecting bots in our dataset.
CCS CONCEPTS
• Information systems → Social networking sites; • Security and privacy → Social network security and privacy;
KEYWORDS
Civil unrest, Social bots, Content polluters, Missing links, Twitter
ACM Reference Format:
Mehwish Nasim, Andrew Nguyen, Nick Lothian, Robert Cope, and Lewis Mitchell. 2018. Real-time Detection of Content Polluters in Partially Observable Twitter Networks. In WWW '18 Companion: The 2018 Web Conference Companion, April 23–27, 2018, Lyon, France. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3184558.3191574

Work undertaken while at Data to Decisions CRC.

This paper is published under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
WWW '18 Companion, April 23–27, 2018, Lyon, France
© 2018 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC BY 4.0 License.
ACM ISBN 978-1-4503-5640-4/18/04.
https://doi.org/10.1145/3184558.3191574
1 INTRODUCTION
1.1 Motivation
Bots and content polluters in online social media affect the socio-political state of the world, from meddling in elections [4, 13, 37] to influencing US veterans [15]. In late September 2017, Twitter admitted to Congress that it had found 200 Russian accounts that overlapped with Facebook accounts which were used to sway Americans and create divisions during the elections held in 2016 [37]. Of course, some bots are useful as well, for instance accounts that tweet alerts to people about natural disasters. The problem arises when they try to influence people or spread misinformation. The importance of detecting bots in online social media has produced an active research area on this topic [9, 21].
State-of-the-art methods for bot detection use historical patterns of behaviour and a rich feature set including textual, temporal, and social network features to distinguish automated bots from real human users [35]. However, for real-time applications using large streamed datasets, such methods can be prohibitive due to the sheer volume, velocity, and incompleteness of data samples. In this work we develop a new method to detect one particular type of social bot, content polluters, in streamed microblog datasets such as Twitter. Content polluters are bots that attempt to subvert a genuine discussion by hijacking it for political or advertising purposes. As we will show, these bots are a major concern for applications such as real-time prediction of events, such as social unrest, from social media datasets.
1.2 Problem context
Social unrest prediction is a growing concern for governments worldwide. This is evidenced by DARPA's Open Source Intelligence program, which produced numerous methods to predict the occurrence of future population-level events such as civil unrest, political crises, election outcomes and disease outbreaks [12, 25, 30, 32]. It has been observed that social events are either preceded or followed by changes in population-level communication behaviour, consumption and movement. A large fraction of population-level changes are implicitly reflected in online data such as blogs, online social networks, financial markets, or search queries. Some of these data sources have been shown to effectively detect population-level events in real time. Methods have been developed for predicting such events by fusing publicly available data from multiple sources.

There exists a plethora of research focused on social media-based forecasting models, suggesting that features from micro-blogs such as Twitter can predict and detect population-level events [30]. Once one develops a "gold standard" (ground truth) record of known events (e.g. election results, or protests occurring), models can be trained using open source data to make predictions. A significant challenge for such models is noise reduction: filtering "fake news", removing misclassified or irrelevant tweets, or mitigating the effects of missing data. This is of particular concern, as the changing limits on accessing social media data remain a major challenge for researchers [26]. Access to data through APIs and third parties can be inconsistent, incomplete, and corrupted by noise in the form of bots. Where bots are influencing people through fake social media accounts, they also act as content polluters on social media sites [33]. According to the Digital Forensics Research Lab (DFRL), "They can make a group of six people look like a group of 46,000 people."
The main goal of our work was to find content polluters in a dataset comprising tweets related to Australian social unrest events, in real time and without access to complete profile information of the users. Due to rate limits on the public API and the high cost of accessing data, we were restricted to using only streamed tweets satisfying certain criteria. While the actual event prediction algorithm is not the primary concern of this paper, further detail can be found in Osborne et al. [29].
1.3 Related Work
A social bot is a computer algorithm that automatically produces content and interacts with humans on social media, trying to emulate and possibly alter their behaviour [14]. Social bots inhabit social media platforms, and online social networks are inundated by millions of bots exhibiting increasingly sophisticated, human-like behaviour. In the coming years a proliferation of social media bots is expected as advertisers, criminals, politicians, governments, terrorists, and other organizations attempt to influence populations [34]. This introduces several dimensions along which social bots can be characterised, including social network structure, temporal activity, diffusion patterns, and sentiment expression [14].

Ghosh et al. [16] conducted an analysis of the follower/followee links acquired by over 40,000 spammer accounts suspended by Twitter. They showed that penalizing users for connecting to spammers can be effective because it would de-incentivize users from linking with other users merely to gain influence. Yang et al. [40] found that bot accounts in online social networks connect to each other by chance and integrate into the social network just like normal users. Network information along with content has been shown to detect spam in online social networks [20]. While researchers were proposing various bot-detection models, Lee et al. [24] identified and engaged strangers on social media to effectively propagate information/misinformation. They proposed a model to leverage people's social behaviour (online interactions) and users' wait times for retweeting.
Social bots evolve over time, making them resilient against standard bot detection approaches [9]. They are adept at changing discussion topics and posting activities [38]. Researchers have proposed complex models, such as those based on interaction graphs of suspicious accounts [19, 20, 22, 39]. An adversary often controls multiple social bots, known as sybils. One strategy to detect such accounts relies on investigating social graph structure, on the assumption that sybil accounts link to a small number of legitimate users [7]. Behavioural patterns and sentiment analysis have also been used for bot detection [11]. Such patterns can easily be encoded in features, so machine learning techniques can be used to distinguish bot-like from human-like behaviour. Previous work uses network-based features or content analysis for bot detection, along with indicators such as temporal activity, retweets, and crowd sourcing [10, 36]. Such efforts require substantial network knowledge or the ability to quickly query an API for a complete history of social media postings by suspected bots. However, real-time applications, such as streaming messages based on keywords or geographic locations, render this impractical. A major challenge therefore is developing methodologies to detect and remove bots based on partial information, message histories, and network knowledge, in real time.
In this work we detect bots from individual tweets downloaded for predicting social unrest in Australian cities. Filtering on keywords and the geographic location of events (such as protests, rallies, and civil disturbances) collected in real time leaves a small but informative dataset for prediction. Predictions are generated in real time by analysing data from online social media platforms such as Twitter and validated against hand-labelled "gold standard records" (GSR) [29]. The GSR is created by news analysts; after going through a validation and cleaning process this data is ready to be used as the ground truth. If Twitter data is contaminated with social bots, it can greatly degrade prediction models. It is therefore imperative to develop techniques for detecting and removing social bots from real-time data streams.

Contributions: Our scientific contributions are as follows:
(1) We develop a method to identify social bots in data using only partial information about the user and their tweet history, in real time.
(2) We present a new dataset of hand-labelled bots and legitimate records, and use it to validate our method¹.
(3) We pose a set of research questions for evaluating whether Twitter users, Twitter, or existing state-of-the-art bot detection methods could detect bots in our dataset.
1.4 Dataset
Our dataset consists of timestamped tweets from 1 January 2015 to 31 December 2016 from 5 major capital cities in Australia. Tweets identify one of the following locations: 'Australia', 'Adelaide', 'Brisbane', 'Melbourne', 'Perth', or 'Sydney'. The data are targeted at studying civil unrest and intend to capture ways in which people express opinions and organize marches, rallies, and peaceful/violent protests within Australia. Such events aim to draw attention toward an issue, e.g. infrastructure, taxes, or immigration laws. Australia has a population of about 24.5 million people and, as in many developed countries, predicting civil unrest events is of interest to law enforcement agencies, government bodies, media and academia. Notwithstanding this fact, the literature is devoid of exploratory studies conducted on this population for real-time prediction of civil unrest events. The basic statistics about protest-related tweets in our dataset are reported in Table 1.

Table 1: Data statistics

Parameter                                | Adelaide | Brisbane | Melbourne | Perth | Sydney
Number of tweets                         | 14087    | 5913     | 23720     | 8421  | 31568
Number of unique users                   | 12039    | 3466     | 14611     | 6215  | 14515
Number of unique URLs                    | 548      | 233      | 762       | 456   | 844
Average number of followers (in-degree)  | 8812     | 9624     | 6733      | 5409  | 6052
Average number of friends (out-degree)   | 1223     | 1736     | 1517      | 1643  | 1860
Number of verified accounts              | 293      | 432      | 840       | 209   | 412

Note that the dataset was devoid of information on the alters (followers/friends of egos), except for the total count of alters (numbers of followers and friends).

¹Data can be accessed on http://maths.adelaide.edu.au/mehwish.nasim/
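For context, a minimal sketch of the kind of location filter this collection implies (the city list is from the paper; the matching logic is our illustration, not the authors' pipeline):

```python
CITIES = ("Australia", "Adelaide", "Brisbane", "Melbourne", "Perth", "Sydney")

def matches_location(tweet_text: str) -> bool:
    """Keep a streamed tweet if it identifies one of the target locations."""
    return any(city.lower() in tweet_text.lower() for city in CITIES)

print(matches_location("Rally tomorrow in Melbourne CBD"))  # True
```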
2 DETECTING CONTENT POLLUTERS
We investigate two characteristics of tweets: temporal information and message diversity.

Temporal Patterns: In the first step we were interested in (1) users who tweet frequently, and (2) pairs of users who tweet on the same day using the desired keywords. Since no information about the network of individual users is available, we cannot construct a follower-friend network graph. Instead, we construct a two-mode user-event network: for all the events in the data, we connect two users if they have tweeted on the same event day. We represent this problem in graph-theoretic terms as follows:
Let $G$ be a bipartite graph of users and events. Let $U$ be the set of users and let $V$ be the set of events. Let $u, v \in U$ and let $i, j \in V$. For any $i \in V$, if $N(u) \cap N(v) \neq \emptyset$ then $(u, v) \in E$ in the one-mode projection of the bipartite graph. The neighbourhood $N(v)$ of a vertex $v \in U$ is the set of vertices that are adjacent to $v$. The resulting projection is an undirected loopless multigraph. If the edge set $E$ contains the same edge several times, then $E$ is a multiset. If an edge occurs several times in $E$, the copies of that edge are called parallel edges; graphs that have parallel edges are also called multigraphs.
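To make this construction concrete, the following sketch (ours, not the authors' code; networkx is assumed and the input pairs are illustrative) builds the two-mode user-event graph and its one-mode projection onto users, with edge weights playing the role of parallel-edge multiplicities:

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical input: (user_id, event_day) pairs extracted from streamed tweets.
user_events = [("u1", "2016-03-05"), ("u2", "2016-03-05"),
               ("u1", "2016-07-10"), ("u2", "2016-07-10"), ("u3", "2016-07-10")]

B = nx.Graph()
users = {u for u, _ in user_events}
events = {e for _, e in user_events}
B.add_nodes_from(users, bipartite=0)
B.add_nodes_from(events, bipartite=1)
B.add_edges_from(user_events)

# One-mode projection onto users: an edge (u, v) exists iff N(u) and N(v)
# intersect; the weight counts shared event days, i.e. the multiplicity of
# parallel edges in the multigraph formulation above.
G = bipartite.weighted_projected_graph(B, users)

for u, v, d in G.edges(data=True):
    print(u, v, d["weight"])  # dyadic (pairwise) co-tweeting frequency
```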
Similar to other social networks such as friendship networks, event networks are the result of complex sociological processes with a multitude of relations. When such relations are conflated into a dense network, the visualization often resembles a "hairball". Various approaches to declutter drawings of such networks exist in the literature. We use the recent backbone layout approach for network visualization [28], which accounts for strong ties (multiplicity of edges) and uses the union of all maximum spanning trees as a sparsifier to ensure a connected subgraph. In Figure 1b, the thickness of an edge represents how often a pair of nodes tweet on the same event day², whereas the size of a node indicates the individual tweet frequency of a user³.

²Event day was confirmed from the GSR.
³Network visualizations are created in visone (http://www.visone.info/).
(a) The two purple nodes at the right, loosely connected to the core, are bots. They have tweeted together frequently; their individual tweet frequency is low compared to other nodes in the graph, but their dyadic (pairwise) frequency is high.
(b) Two densely connected components in the tweets graph.
Figure 1: Graphs containing bots and legitimate users from the Melbourne events network.

We noticed that bots tweeted together frequently. Their individual tweet frequency is low compared to other nodes in the graph, but their dyadic (pairwise) frequency is high. For instance, the two purple nodes on the right of Figure 1a tweeted together frequently and are only weakly connected to the core. Upon checking their complete profiles, these users were found to be political bots. This motivated us to explore the tweets graph further.

Figure 2: Graph containing bots and legitimate users from the Melbourne events network.

The core of the network (green nodes) was found to consist of news channels and popular blogs in Australia, such as MelbLiveNews, newsonaust, 7NewsMelbourne and LoversMelbourne, to name a few. Media accounts are likely to report population-level events on the day of the events; thus they form a strongly-connected core of the events network graph.
We then clustered all tweets in a similar manner, constructing a graph where two users have an edge between them if they have tweeted on the same day, irrespective of whether there was an event that day or not. We used the Louvain method for clustering the network [5], which is based on the concept of modularity; optimizing the modularity yields the best possible grouping of nodes in a given network. We found two strongly-connected components in the graph: (1) news channels, and (2) bots. We analysed the strongly-connected vertex-induced subgraphs from the network. One such component for the city of Melbourne is shown in Figure 2, which is a strongly-connected component from Figure 1b. Bots are the purple nodes (validated by manual inspection of profiles). Green nodes represent false positives. Orange nodes are not bots but are also not relevant for predictions, since these users were not geographically located in Australia and were tweeting about Victoria in the UK.
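As an illustration of this step (a sketch under our assumptions, not the paper's implementation), the Louvain method shipped with networkx can be applied directly to the weighted co-tweeting graph:

```python
import networkx as nx

# Toy co-tweeting graph; weight = number of days two users tweeted together
# (in practice this is the one-mode projection built earlier).
G = nx.Graph()
G.add_weighted_edges_from([
    ("news1", "news2", 9), ("news2", "news3", 8), ("news1", "news3", 7),
    ("bot1", "bot2", 12), ("bot2", "bot3", 11), ("bot1", "bot3", 10),
    ("news1", "bot1", 1),
])

# Louvain community detection based on modularity optimisation [5].
communities = nx.community.louvain_communities(G, weight="weight", seed=42)
for c in sorted(communities, key=len, reverse=True):
    print(sorted(c))  # densely connected groups, e.g. news accounts vs. bots
```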
Message diversity: We computed the diversity of tweets based upon mentions of URLs and hashtags. We selected the set $K$ of most-tweeted URLs ($|K| = 20$), and then filtered out the users $\bar{U} \subseteq U$ who mentioned those URLs. The motivation for this approach is that an event prediction model should be resilient against bot-URLs that are infrequently mentioned in the tweets, since these will not greatly impact the prediction accuracy. We then computed the following three measures for each of the remaining users: (i) the total number of tweets containing any URL(s), $u_i^{\mathrm{all}}$; (ii) the number of tweets mentioning URL $k \in K$, $u_i^k$; and (iii) the diversity score, i.e., the difference between the two measures, $u_i^d = u_i^{\mathrm{all}} - u_i^k$.
We then plot the diversity score distribution for every user $u \in \bar{U}$, for every URL $k \in K$. This immediately provides some relevant insights about the behaviour of content polluters: Figure 3a shows a legitimate URL (i.e., one linked to by legitimate users), whereas Figures 3b and 3c show bot-URLs (i.e., URLs linked to by bots). Users who tweet these URLs are classified as potential bots. The figures show that the diversity of users linking to legitimate URLs is generally far greater than that of users linking to bot-URLs. The temporal patterns of bot-URL mentions, which were being tweeted at regular intervals, indicated that these users were indeed bots.
We measure the extent of diversity in two ways:
(1) The Gini coefficient ($G \in \mathbb{R}$, $G \in [0,1]$):
$$G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n}\left|u_i^d - u_j^d\right|}{2n\sum_{i=1}^{n} u_i^d}, \qquad (1)$$
where $n$ is the number of users tweeting a particular URL. The Gini coefficient $G$ describes the relative amount of inequality in the distribution of diversity: $G = 0$ indicates complete equality while $G = 1$ indicates complete inequality. A low $G$ suggests coordination among the observations. The Gini coefficient does not measure absolute inequality, and its interpretation can vary from situation to situation. Legitimate accounts such as news channels, newspapers, and famous activists are likely to tweet legitimate and diverse URLs, so the Gini coefficient for legitimate URLs is high compared to that for illegitimate URLs. The Gini coefficient for a sample of ten URLs is shown in Figure 4.
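Equation (1) translates directly into a few lines of code; a sketch, assuming the diversity scores computed above:

```python
import numpy as np

def gini(d):
    """Gini coefficient of diversity scores d, as in Eq. (1)."""
    d = np.asarray(d, dtype=float)
    n = len(d)
    mad = np.abs(d[:, None] - d[None, :]).sum()  # sum_i sum_j |u_i^d - u_j^d|
    return mad / (2 * n * d.sum())

print(gini([5, 5, 5]))      # 0.0: complete equality, i.e. coordinated scores
print(gini([0, 0, 0, 12]))  # high G: a few diverse users dominate
```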
(2) Rank-size rule: We observed that only a fraction of URLs are mentioned very frequently in the tweets, while a very large number of URLs barely find their way into more than a single tweet. It is interesting to note that cities and their ranks also follow a similar distribution; this pattern is generally known as the rank-size rule [31]. It has also been observed in various studies on the calling behaviour of users [2, 3, 27].
We t a curve on every user versus URL-diversity graph and
measure the coecient of determination
R
2
. Values close to zero
indicate that the model explains little of the variability of the re-
sponse data around its mean. For legitimate URLs, we obtained
values close to 1 (Figure 3).
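The paper does not state the fitted functional form, so the sketch below assumes a rank-size power law fitted on log-log axes, with R² computed from the residuals:

```python
import numpy as np

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

# Sorted diversity scores per user for one URL (illustrative values).
y = np.sort(np.array([14.0, 9.0, 6.0, 4.5, 3.5, 2.8, 2.3, 2.0]))[::-1]
rank = np.arange(1, len(y) + 1)

# Power-law (rank-size) fit via log-log linear regression.
slope, intercept = np.polyfit(np.log(rank), np.log(y), 1)
y_hat = np.exp(intercept) * rank ** slope

print(round(r_squared(y, y_hat), 3))  # close to 1 => well-explained, as for legitimate URLs
```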
Recently, Gilani et al. [18] evaluated the characteristics of automated versus human accounts by looking at complete tweet histories. They initially hypothesized that bots tweet a number of different URLs; however, in the actual data they found that humans may also post a number of URLs. Conversely, in this work we looked at the most frequently posted URLs and then, for each URL, analysed how diverse the tweets of the users tweeting that URL are.
Using message diversity on URLs, we detected 849 bots in the data, which we call content polluters. These content polluters contributed about 7% of tweets in the data. We computed some statistics on content polluters versus legitimate users, shown in Figure 5. In [14], the authors argued that social bots tend to have recent accounts with long names. However, we did not find this pattern in our data. The average account age of content polluters was 2.9 years, compared to 4.2 years for legitimate users; this difference was significant (p < 0.01), but it suggests that these particular bot accounts are relatively old and have remained (potentially) undetected by Twitter. Twitter names for bots had on average 11 characters, compared to 12 characters for non-bots. None of the bots had verified Twitter accounts.

Figure 3: Message diversity measured through 3 URLs for bots and genuine users (axes: users vs. diversity). (a) Legitimate (Gini = 0.8, R² = 0.98). (b) Bots (Gini = 0.32, R² = 0). (c) Bots (Gini = 0, R² = 0).
Figure 4: Gini score for ten URLs. A high Gini coefficient indicates a legitimate URL. The three URLs with the lowest Gini coefficients were being tweeted by content-polluting bots. (URLs shown include www.digitaltrends.com, linkis.com, www.9news.com.au, www.theguardian.com, www.facebook.com, www.youtube.com, www.heraldsun.com.au, www.theage.com.au, www.abc.net.au, www.mojahedin.org, and twitter.com.)
A total of 109 political bot accounts were created on 20 February 2014 with only 12 unique names, a strong indication of a bot network. We also found several digital media bot accounts; such accounts aim at becoming famous by attracting followers [6]. One set of such accounts, created on 30 March 2016, consisted of 8 accounts with an average friend count of 4099 and follower count of 1112.
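This pattern suggests a simple supplementary heuristic: group accounts by creation date and flag dates with many accounts but few distinct names. A sketch, with field names assumed (not from the paper's code):

```python
from collections import defaultdict

# Hypothetical input: per-account metadata from the streamed tweets.
accounts = [
    {"name": "newsbotA", "created": "2014-02-20"},
    {"name": "newsbotA", "created": "2014-02-20"},
    {"name": "newsbotB", "created": "2014-02-20"},
    {"name": "jane_doe", "created": "2011-06-03"},
]

by_date = defaultdict(list)
for a in accounts:
    by_date[a["created"]].append(a["name"])

for date, names in by_date.items():
    # Many accounts sharing a creation date with few unique names
    # (e.g. 109 accounts, 12 names on 20 Feb 2014) indicates a bot network.
    if len(names) >= 3 and len(set(names)) < len(names):
        print(date, len(names), "accounts,", len(set(names)), "unique names")
```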
We also explored the dataset from [23] using our algorithm. That dataset contains more than 600k tweets. The Gini coefficient for each subset (bots and non-bots) was around 0.5, so the results were inconclusive. The dataset from Gilani et al. [18] only contained the number of URLs each user mentioned, so it was not possible to check the relative frequency of any particular URL. We argue that the nature of content-polluting bots makes them difficult to distinguish in traditional bot-detection datasets. This motivates our research questions below and the creation of a new human-validated content-pollution dataset in the next section.
3 CREATING A CONTENT-POLLUTING BOT DATASET
Given the peculiarities in the bot accounts that we found in our analysis, we move on to some pertinent research questions.

3.1 Do humans succeed in detecting content polluters?
We conducted a user study to hand-label a set of Twitter accounts that contained an equal number of content polluters (from the list obtained in the previous section) and legitimate accounts. We asked three independent hand-labellers to create the dataset. Participants were first shown several examples of content polluters as well as of legitimate accounts. All three participants were well versed in using Twitter. All participants found it very difficult to assess non-English accounts, even with automatic translation.

The participants recorded the following comments:

Participant 1
Domain knowledge: Advanced Twitter user
Comments: "What I'm struggling with is that, the user doesn't actually initiates a suspicious tweet. He simply retweets a whole bunch of content polluting tweets."
Strategy:
- If the user has tweeted or retweeted from well-known news spam sites, then mark as bot.
- Otherwise look through the pattern of tweets; if the tweeting behaviour is very spammy, for example a highly consistent frequency of tweeting and tweets from a single source, then mark as bot.
- See if they regularly mention and interact with other Twitter users, which is a good sign for a regular account.
- Look at profile details and the follower/followee ratio to distinguish whether it appears to be a regular account or a bot.
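Participant 1's strategy reads like a rule list; purely as an illustration (thresholds and field names are ours, not from the study), it could be encoded as:

```python
def looks_like_bot(acct) -> bool:
    """Heuristic encoding of Participant 1's labelling strategy (illustrative)."""
    if acct["retweets_known_spam_sites"]:
        return True
    # Highly regular posting from a single source suggests automation.
    if acct["posting_regularity"] > 0.9 and acct["n_sources"] == 1:
        return True
    # Regular interaction with other users is a good sign for a real account.
    if acct["mentions_per_tweet"] > 0.2:
        return False
    # Fall back to profile details and the follower/followee ratio.
    return acct["followers"] / max(acct["friends"], 1) < 0.01

example = {
    "retweets_known_spam_sites": False,
    "posting_regularity": 0.95,  # fraction of tweets at near-fixed intervals
    "n_sources": 1,              # distinct tweet sources (clients)
    "mentions_per_tweet": 0.0,
    "followers": 40,
    "friends": 4000,
}
print(looks_like_bot(example))  # True: regular, single-source posting
```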
