What are the contributions mentioned in the paper "Can we predict a riot? Disruptive event detection using Twitter"?

In this paper, Alsaedi et al. used temporal, spatial and textual features to detect small-scale events.

What future work have the authors mentioned in the paper "Can we predict a riot? Disruptive event detection using Twitter"?

There are many directions for future work. The detection of rumors in social media, the analysis of the distinctive characteristics of rumors, and the way in which they propagate in microblogging communities will be addressed in the future. Spammer detection in various online social networking platforms is another interesting task that is reserved for future work. The authors also intend to further evaluate the summarization output, not only to map it onto real events but also to provide qualitatively useful output for decision making.

What are the classification algorithms used in the experiment?

The classification algorithms used in the experiment were: Naive Bayes [Lewis 1998], a statistical classifier based on Bayes’ theorem; Logistic Regression [Friedman et al. 1998], a generalized linear model that applies regression to categorical variables; and support vector machines (SVMs) [Joachims 1998], which aim to maximize the margin, i.e. the minimum distance between two classes of data and the hyperplane that separates them.
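To make the first of these classifiers concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is an illustration only, not the authors' implementation: the class name `TinyNaiveBayes`, the labels `event` and `other`, and the toy training posts are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Hypothetical minimal multinomial Naive Bayes over whitespace tokens."""

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods (Bayes' theorem
            # with the "naive" word-independence assumption).
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for illustration).
texts = [
    "fire in the shop",
    "riot police on street",
    "great coffee this morning",
    "love this sunny morning",
]
labels = ["event", "event", "other", "other"]
model = TinyNaiveBayes().fit(texts, labels)
```

Calling `model.predict("police near the fire")` picks the class whose training posts share the most vocabulary with the new post. A real system would use far more training data and richer features than whitespace tokens.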

What are the discriminative features of the NDCG?

The near-duplicate measure, the favourite ratio and the positive sentiment ratio are the least discriminative features, which suggests that they appear in all different types of posts, not only in disruptive events.

How can it be used to detect large and small events?

Using an online clustering algorithm with a sliding window timeframe, it can be utilised to detect large and small-scale events from social media streams, with particular attention to filtering from large to small-scale events.

What is the proposed framework for detecting events?

Their proposed framework is based on collecting data over time windows for a given location which supports the automatic detection and summarization of events from social media.

What is the effect of using supervised classification of each tweet before clustering?

Employing supervised classification of each tweet before clustering (large scale event detection) reduces the computational overhead at the clustering stage as the number of tweets is significantly reduced (containing only event-related tweets).

What is the effect of the removal of posts that were less than 3 words long?

Posts that were less than 3 words long were removed, as were messages where over half the total words were the same word, since these posts were less likely to have useful information.
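The two filtering rules above can be sketched in a few lines. This is a minimal illustration that assumes simple whitespace tokenisation and case folding; the function name `keep_post` and the sample posts are hypothetical.

```python
from collections import Counter


def keep_post(text: str, min_words: int = 3) -> bool:
    """Return True if a post passes both length and repetition filters."""
    words = text.lower().split()
    # Rule 1: drop posts shorter than `min_words` words.
    if len(words) < min_words:
        return False
    # Rule 2: drop posts where one word accounts for over half of all words.
    most_common_count = Counter(words).most_common(1)[0][1]
    if most_common_count > len(words) / 2:
        return False
    return True


posts = [
    "fire",                                # too short
    "lol lol lol lol what",                # 4 of 5 words are identical
    "police closing the high street now",  # kept
]
kept = [p for p in posts if keep_post(p)]
```

Only the third post survives the filter in this example.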

Why did the authors choose to use an online clustering algorithm?

The decision to use an online clustering algorithm was taken for three main reasons, including: (i) it supports high-dimensional data, effectively handling the large volume of social media data produced around events; and (ii) many clustering algorithms, such as K-means, require prior knowledge of the number of clusters, which is not available when the number of events is unknown in advance.
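The general idea of threshold-based online clustering can be sketched as follows: each incoming post joins the most similar existing cluster if the similarity reaches a threshold, and otherwise starts a new cluster, so the number of clusters never has to be fixed in advance. This is a simplified sketch, not the authors' algorithm: the bag-of-words vectors, cosine similarity, the default threshold of 0.5 and the function names are assumptions, and the sliding time window is omitted.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def online_cluster(posts, threshold=0.5):
    """Single-pass clustering; returns a list of cluster dicts."""
    clusters = []  # each cluster: summed term counts plus member indices
    for i, text in enumerate(posts):
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Summed counts act as the centroid; cosine ignores the scale.
            best["centroid"].update(vec)
            best["members"].append(i)
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return clusters


posts = [
    "fire on high street",
    "big fire on high street",
    "traffic jam on the bridge",
]
members = [c["members"] for c in online_cluster(posts, threshold=0.5)]
```

Here the first two posts end up in one cluster and the third starts its own. Raising the threshold makes clusters purer but more fragmented, which is why the threshold is normally tuned on labelled data.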

What are the advantages of using a clustering algorithm?

Thus clustering (small-scale event detection), feature selection and summarization are much faster and suitable for real-time analysis.

What is the method for evaluating the clustering algorithm?

The authors evaluate the algorithm’s performance on the training data using a range of thresholds, and identify the threshold setting that yields the highest-quality solution according to a given clustering quality metric (here the authors implement the f-measure).
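The tuning procedure described above can be sketched as a simple search over candidate thresholds. The excerpt does not say which formulation of the F-measure is used for clusters, so this sketch assumes a pairwise F-measure (precision and recall over pairs of items placed in the same cluster). The toy clustering function `gap_cluster` and all data values are hypothetical stand-ins for the real clustering step.

```python
from itertools import combinations


def pairwise_f_measure(predicted, truth):
    """F-measure over item pairs; inputs are cluster labels, one per item."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def gap_cluster(values, threshold):
    """Toy stand-in clusterer: start a new cluster when the gap is too big."""
    labels, current = [], 0
    for k, value in enumerate(values):
        if k > 0 and value - values[k - 1] > threshold:
            current += 1
        labels.append(current)
    return labels


def best_threshold(thresholds, cluster_fn, items, truth):
    """Return the threshold whose clustering scores the highest F-measure."""
    scored = [(pairwise_f_measure(cluster_fn(items, t), truth), t)
              for t in thresholds]
    return max(scored)[1]  # ties resolve to the larger threshold


items = [1.0, 1.2, 5.0, 5.1, 9.0]
truth = [0, 0, 1, 1, 2]
chosen = best_threshold([0.05, 1.0, 5.0], gap_cluster, items, truth)
```

In this toy example the smallest threshold splits every item into its own cluster and the largest merges everything, so the middle value is selected.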

What is the way to obtain a good performance of the textual feature model?

Using the textual feature model, the authors are still able to obtain reasonable performance: on average, 40% of the content about an event provides situational awareness information about that event.

How did the researchers find the way to detect a riot?

Their experiments suggest that their framework yields better performance than many leading approaches in real-time event detection, and using a real-world ground truth published by the Metropolitan Police Services (MPS) after the 2011 riots in England, the authors showed their system to detect events far quicker than they were reported to MPS.

How did the authors propose to combine the summarization of tweets?

More fine-grained summarization was proposed by considering sub-events detection and combining the summaries extracted from each sub-topic (tweet selection, tweet ranking) [Shen et al. 2013; Yajuan et al. 2012; Zubiaga et al. 2012].

(Open Access) Can We Predict a Riot? Disruptive Event Detection Using Twitter (2017) | Nasser Alsaedi

This is an Open Access document downloaded from ORCA, Cardiff University's institutional repository: http://orca.cf.ac.uk/99582/

This is the author’s version of a work that was submitted to / accepted for publication.

Citation for final published version:

Alsaedi, Nasser, Burnap, Pete and Rana, Omer 2017. Can we predict a riot? Disruptive event detection using Twitter. ACM Transactions on Internet Technology 17 (2), 18. 10.1145/2996183

Publishers page: http://dx.doi.org/10.1145/2996183

Please note:

Changes made as a result of publishing processes such as copy-editing, formatting and page numbers may not be reflected in this version. For the definitive version of this publication, please refer to the published source. You are advised to consult the publisher’s version if you wish to cite this paper.

This version is being made available in accordance with publisher policies. See http://orca.cf.ac.uk/policies.html for usage policies. Copyright and moral rights for publications made available in ORCA are retained by the copyright holders.

Can we predict a riot? Disruptive Event Detection using Twitter

NASSER ALSAEDI, PETE BURNAP, and OMER RANA, Cardiff University

In recent years, there has been increased interest in real-world event detection using publicly accessible data made available through Internet technology such as Twitter, Facebook and YouTube. In these highly interactive systems the general public are able to post real-time reactions to “real world” events - thereby acting as social sensors of terrestrial activity. Automatically detecting and categorizing events, particularly small-scale incidents, using streamed data is a non-trivial task, but would be of high value to public safety organisations such as local Police, who need to respond accordingly. To address this challenge we present an end-to-end integrated event detection framework which comprises five main components: data collection, pre-processing, classification, online clustering and summarization. The integration between classification and clustering enables events to be detected, as well as related smaller scale “disruptive events” - smaller incidents that threaten social safety and security, or could disrupt social order. We present an evaluation of the effectiveness of detecting events using a variety of features derived from Twitter posts, namely: temporal, spatial and textual content. We evaluate our framework on a large-scale, real-world dataset from Twitter. Furthermore, we apply our event detection system to a large corpus of tweets posted during the August 2011 riots in England. We use ground truth data based on intelligence gathered by the London Metropolitan Police Service, which provides a record of actual terrestrial events and incidents during the riots, and show that our system can perform as well as terrestrial sources, even better in some cases.

CCS Concepts: •H.3.3 [Information Storage and Retrieval] → Clustering; •H.2.8 [Database Applications] → Data mining; •I.5.2 [Design Methodology] → Feature evaluation and selection;

Additional Key Words and Phrases: Social media, Event Detection, Classification, Clustering, Feature selection, Evaluation.

1. INTRODUCTION

The rapid growth of Internet-enabled communication technology in the form of social networking services (often collectively referred to as social media) and associated smartphone apps has enabled billions of global citizens to broadcast news and ‘on the ground’ information during ‘real world’ events as they unfold. Twitter, for example, has been studied as an emerging news reporting platform [Osborne et al. 2013; Phuvipadawat and Murata 2010; Weng and Lee 2011] and has been widely used to disseminate information about the Arab Spring [Alsaedi and Burnap 2015; Starbird and Palen 2012] and other disaster-related incidents [Burnap et al. 2014; Imran et al. 2015; Shamma et al. 2010; Thelwall et al. 2011; Williams and Burnap 2015]. The interaction between people, events, and Internet-enabled technology presents both an opportunity and a challenge to Social Computing scholars, public sector organisations (e.g. governments and policing agencies), and the private sector, all of whom aim to understand how events are reported using social media and how millions of online posts can be reduced to accurate but meaningful information that can support decision making and lead to productive action.

Research in recent years has uncovered the increasingly important role of utilising data from social networking sites in disaster situations, and shown that information broadcast via social media can enhance situational awareness during a crisis situation [Alsaedi et al. 2015; Vieweg et al. 2010, 2014]. In particular, members of the public, formal response agencies and local, national and international aid organizations are all aware of the ability to use social media to gather and disperse timely information in the aftermath of disaster [Chowdhury et al. 2013; Imran et al. 2014; Iyengar et al. 2011]. However, many existing approaches to event detection are limited to global or large-scale event detection (e.g. natural disasters and terror attacks), while detecting small-scale incidents such as fires, car accidents, and public order events remains an ongoing research topic due to several key challenges.

ACM Transactions on Internet Technology, Vol. 0, No. 0, Article 0, Publication date: 0000.

One challenge is that online posts are often constrained in length (referred to as microblogs), which means that only a small amount of text is available to be analysed to gain insights. Within the text there are other challenges, such as frequent use of informal, irregular, and abbreviated words; a large number of spelling and grammatical errors; and the use of improper sentence structure and mixed languages [Becker et al. 2011a; Farzindar and Wael 2015; Imran et al. 2015]. Some languages are more challenging than others. For example, Arabic users use dialects heavily as well as a mixture of Latin and Arabic characters (Arabizi) [Alsaedi and Burnap 2015]. These dialects may differ in vocabulary, morphology, and spelling from standard Arabic, and most do not have standard spellings. Additionally, the popularity of social networking services has attracted spammers and other content polluters who spread advertisements, pornography, viruses, phishing and other malicious material that cloud the information analysis [Burnap et al. 2015; Farzindar and Wael 2015].

Despite these challenges, it has been noted that detecting small-scale events is essential to improving situational awareness of both citizens and decision makers [Li et al. 2012; Schulz et al. 2015; Walther and Kaisser 2013] and thus remains a well motivated research topic for the Social Computing community. In this article, we propose a novel approach to event detection that aims to overcome many of the challenges to provide a system to detect large-scale events and related small-scale events. The approach is based on the integration of supervised machine learning algorithms to detect larger scale events, and unsupervised approaches to cluster, disambiguate and summarize smaller sub-events, with a goal of improving situational awareness in emergency situations through automatic methods. Our contributions can be summarized as follows:

—Using temporal, spatial and textual features, our approach is able to detect small-scale events in a given place and time better than existing algorithms, to which we compare our performance results;

—While other related work focuses on large or small scale events, our approach can identify large and related small scale events. Thus, our approach retains the context of smaller events (e.g. distinguishing between public disorder related to an event, and general disorder);

—Much of the related event detection work is dependent on utilising event-specific terms and phrases, but we propose a novel approach to summarizing microblog posts corresponding to events without the need for prior knowledge of the entire data set, that is, in real time and not post-event. Our approach is based on modifying a term frequency algorithm to include a dynamic temporal aspect;

—We demonstrate that our proposed approach can identify the relationship between content posted via social media and ‘real world’ events by using time-stamped social media data and actual crime reports to accurately flag events prior to their known reporting time throughout a study period, using human annotated Twitter data as an example data source;

—We present a case study of our approach by evaluating it against other leading approaches using Twitter posts from the UK riots in 2011, and a publicly accessible account of actual reported intelligence obtained and reports received by the Metropolitan Police Service during this event. Smaller scale events include localized looting, violence and criminal damage. Results show that our system can perform as well as terrestrial sources at detecting events related to the riots - in some cases we detect the event before intelligence reports were recorded.
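The third contribution mentions a term frequency algorithm extended with a dynamic temporal aspect, without giving details at this point. Purely as an illustration of what a time-aware term frequency might look like, the sketch below weights each term occurrence by an exponential decay on the age of the post. The decay scheme, the `half_life` parameter and the function name are assumptions introduced here and should not be read as the authors' algorithm.

```python
from collections import defaultdict


def temporal_term_scores(posts, now, half_life=600.0):
    """Score terms by time-decayed frequency.

    posts: list of (timestamp_in_seconds, text) pairs.
    A term occurrence that is `half_life` seconds old counts half as much
    as one posted right now (assumed weighting, for illustration only).
    """
    scores = defaultdict(float)
    for timestamp, text in posts:
        weight = 0.5 ** ((now - timestamp) / half_life)
        for word in text.lower().split():
            scores[word] += weight
    return dict(scores)


posts = [(0, "fire"), (600, "riot")]
scores = temporal_term_scores(posts, now=600, half_life=600.0)
```

With these inputs the newer term `riot` scores 1.0 and the older term `fire` scores 0.5, so recent vocabulary dominates a summary built from the top-scoring terms.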

The rest of this article is organized as follows: Section 2 reviews related work. Sections 3 and 4 define the problem of event detection using data from social networking services, and discuss the technical architecture and algorithms developed as part of our proposed system. In Section 5 we present and analyze several features, namely temporal, spatial and textual features. Section 6 presents our experiments and discusses the results. In Section 7 we conclude and highlight some directions for future research.

2. RELATED WORK

The general topic of detecting real-world events from social media has received considerable research interest. Research efforts have focused on real-time event detection and tracking, social media analysis, micro-blog summarization and information visualisation. We describe relevant related work in three areas: large-scale (global) event detection, small-scale (local) event detection, and systems used to extract crisis relevant information from social media.

For large-scale events, [Petrović et al. 2010] presented an approach to detect breaking stories from a stream of tweets using locality-sensitive hashing (LSH). [Becker et al. 2011a] proposed an online clustering framework to identify different types of real-world events. They then use different machine learning models to predict whether a pair of documents belongs to a real-world event or not. These approaches are limited to widely discussed events and fail to report rare and potentially disruptive small-scale incidents.
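As background on locality-sensitive hashing, the sketch below shows one common LSH family for cosine similarity, random hyperplane hashing, in which each bit of a signature records on which side of a random hyperplane a document vector falls. The text above does not state which LSH variant is used, so treat the choice of family, the function names, the vector dimension and the number of bits as assumptions made for illustration.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw `n_bits` random hyperplanes (normal vectors) in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vec, hyperplanes):
    """Return a bit string: one bit per hyperplane, set by the sign of the dot product."""
    bits = []
    for plane in hyperplanes:
        dot = sum(v * w for v, w in zip(vec, plane))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


planes = make_hyperplanes(dim=4, n_bits=8)
doc = [1.0, 0.0, 2.0, 1.0]  # toy term-weight vector
sig = lsh_signature(doc, planes)
```

Vectors pointing in similar directions tend to share most signature bits, so documents can be bucketed by signature and only compared against others in the same bucket, rather than against the whole stream. Scaling a vector does not change its signature, which matches the scale invariance of cosine similarity.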

Large-scale event detection has also been explored through clustering of discrete wavelet signals built from individual words generated by Twitter [Weng and Lee 2011]. Auto-correlation then filters away the trivial words (noise) and cross correlation groups together words that relate to an event by modularity-based graph partitioning. Similarly, [Cordeiro 2012] proposed a continuous wavelet transformation based on hashtag occurrences combined with a topic model inference using Latent Dirichlet Allocation (LDA) [Blei et al. 2003]. In fact, LDA and its variants are a widely used statistical modelling approach in event detection tasks [Cordeiro 2012; Pan and Mitra 2011; Vavliakis et al. 2013; Vieweg et al. 2014]. However, these methods have the main drawback of requiring a priori specification of the total number of topics, which leads to problems when the total number of events exceeds this number.

Other approaches have focused on structural networks and graph models to discover events in social media feeds. [Benson et al. 2011] presented a structured graphical model which simultaneously analyzes individual messages, clusters them according to event, and induces a canonical value for each event property. Using a different graph analytical approach, [Sayyadi and Raschid 2013] used a KeyGraph algorithm [Ohsawa et al. 1998] to convert text data into a term graph based on co-occurrence relations between terms. Then they employed a community detection approach to partition the graph. Eventually, each community is regarded as a topic and terms within the community are considered as the topic’s features. Moreover, [Schinas et al. 2012] used the Structural Clustering Algorithm for Networks (SCAN) for detecting “communities” of documents. These candidate social events were further processed by splitting the events that exceeded a predefined time range into shorter events. Then they used a classification approach based on median geolocations and accumulated TF-IDF vectors for each cluster to separate relevant and irrelevant candidate events. Nevertheless, these graph partitioning algorithms are not ideal for social media event detection problems because of their complexity [Agarwal et al. 2012] and the limitation that they do not capture the highly skewed event distribution of social media event data, due to their bias towards balanced partitioning [Karypis et al. 1997]. In addition, discovering multiple events and sub-events becomes computationally expensive using graph partitioning algorithms due to the velocity and scale of updates in a highly dynamic real-time situation [Agarwal et al. 2012].

Various methods have been proposed to identify small-scale events from social media streams such as fire incidents, traffic jams, etc. [Walther and Kaisser 2013] developed spatiotemporal clustering methods where they monitor specific locations of high tweeting activity and cluster tweets that are geographically and temporally close to each other. A machine-learning module is then used to evaluate whether a cluster of tweets refers to an event based on 41 features including the tweet content. Another clustering approach is presented in [Schulz et al. 2015], with a small-scale incident detection pipeline based on the clustering of incident-related micro-posts using three properties that define an incident: (1) incident type, (2) location and (3) time period. Various techniques are adopted to increase the quality of their clustering approach: (A) the incident type determination using supervised machine learning (Semantic Abstraction), (B) geotagging of tweets based on tweet geolocalization and (C) the extraction of the time period of the incident. Yet both methods are very specific and do not capture aspects of the general context. It is critical that a system can provide insight into ongoing sub-events arising amid a protest, both to better inform how to react and to improve event reasoning and system performance. This could explain the low recall/precision of the approaches of [Schulz et al. 2015] and [Walther and Kaisser 2013] when validated against real-world official reports: 32.14% and 4.75%, respectively.

Another event detection system, Twitcident [Abel et al. 2012], presents a Web-based application for searching, filtering and aggregating information about known events reported by emergency broadcasting services in the Netherlands. In addition, [Watanabe et al. 2011] proposed a system called Jasmine for detecting local events in the real world using geolocation information from microblog documents. They obtain the name list of locations from geotagged tweets and add positional information to tweets by matching the location name. A similar work is EventRadar [Boettcher and Lee 2012], which introduces a statistical method for detecting local events using a temporal and spatial analysis by considering seven-day historic data. The main contribution of EventRadar is that it detects local events without keeping a list of locations by finding clusters of tweets that contain the same subset of words. Another related system is proposed by [Li et al. 2012] to detect crime and disaster related events (CDE) from tweets. They use spatial and temporal information of tweets to detect new events, with a number of text mining techniques to extract the meta information (e.g., geo-location names, temporal phrases, and keywords) for event interpretation. Most of these small-scale event detection approaches are novel and automatic; however, the performance and detection reliability of these systems are highly dependent on the incident type, so they are limited to the specific types of event content that they can handle.

Regarding the use of social media data during disasters, researchers have proposed several visual analytics approaches aiming at real-time microblog analysis that often facilitate interactive means for exploration and anomaly indication. TwitterMonitor [Mathioudakis and Koudas 2010] performs trend detection in two steps and analyzes trends in a third step. During the first phase, it identifies bursty keywords which are then grouped based on their co-occurrences. Once a trend is identified, additional information from the tweets is extracted to analyze and describe the trend. AIDR (Artificial Intelligence for Disaster Response) [Imran et al. 2014] is a platform for filtering and classifying messages posted to social media during humanitarian crises in real time. AIDR uses human-assigned labels (crowdsourcing messages), and pre-existing classification techniques to classify Twitter messages into a set of user-defined situational awareness categories in real-time. [Vieweg et al. 2010] analyze the Twitter logs for a pair of concurrent emergency events: the Oklahoma Grassfires (April 2009) and the Red River Floods (March and April 2009). Their automated framework is based on the relative frequency of geo-location and location-referencing information from users’ posts.

In a related work, [Olteanu et al. 2014] created a lexicon of crisis-related terms (380 single-word terms) that frequently appear in relevant messages posted during six crisis events. Then, they demonstrated how to use the lexicon to automatically identify new terms by employing pseudo-relevance feedback mechanisms to extract crisis-related messages during emergency events. [Vieweg et al. 2014] enable filtering, searching, and analyzing of Twitter during another natural disaster (the 2013 Typhoon Yolanda). They used a supervised classification algorithm to automatically classify tweets into three categories: Informative; Not informative; and Not related to this crisis. Then they employed topic modelling using the LDA model [Blei et al. 2003] to further classify the informative tweets into 10 clusters
