Can We Predict a Riot? Disruptive Event Detection Using Twitter


This is an Open Access document downloaded from ORCA, Cardiff University's institutional
repository: http://orca.cf.ac.uk/99582/
This is the author’s version of a work that was submitted to / accepted for publication.
Citation for final published version:
Alsaedi, Nasser, Burnap, Pete and Rana, Omer 2017. Can we predict a riot? Disruptive event
detection using Twitter. ACM Transactions on Internet Technology 17 (2), 18. 10.1145/2996183
Publishers page: http://dx.doi.org/10.1145/2996183
Please note:
Changes made as a result of publishing processes such as copy-editing, formatting and page
numbers may not be reflected in this version. For the definitive version of this publication, please
refer to the published source. You are advised to consult the publisher’s version if you wish to cite
this paper.
This version is being made available in accordance with publisher policies. See
http://orca.cf.ac.uk/policies.html for usage policies. Copyright and moral rights for publications
made available in ORCA are retained by the copyright holders.

Can we predict a riot? Disruptive Event Detection using
Twitter
NASSER ALSAEDI, PETE BURNAP, and OMER RANA, Cardiff University
In recent years, there has been increased interest in real-world event detection using publicly accessible data made available
through Internet technology such as Twitter, Facebook and YouTube. In these highly interactive systems the general public are
able to post real-time reactions to “real world” events - thereby acting as social sensors of terrestrial activity. Automatically
detecting and categorizing events, particularly small-scale incidents, using streamed data is a non-trivial task, but would be
of high value to public safety organisations such as local Police, who need to respond accordingly. To address this challenge
we present an end-to-end integrated event detection framework which comprises five main components: data collection, pre-
processing, classification, online clustering and summarization. The integration between classification and clustering enables
events to be detected, as well as related smaller scale “disruptive events” - smaller incidents that threaten social safety and
security, or could disrupt social order. We present an evaluation of the effectiveness of detecting events using a variety of
features derived from Twitter posts, namely: temporal, spatial and textual content. We evaluate our framework on a large-scale,
real-world dataset from Twitter. Furthermore, we apply our event detection system to a large corpus of tweets posted during
the August 2011 riots in England. We use ground truth data based on intelligence gathered by the London Metropolitan Police
Service, which provides a record of actual terrestrial events and incidents during the riots, and show that our system can perform
as well as terrestrial sources, even better in some cases.
CCS Concepts: H.3.3 [Information Storage and Retrieval] Clustering; H.2.8 [Database Applications] Data min-
ing; I.5.2 [Design Methodology] Feature evaluation and selection;
Additional Key Words and Phrases: Social media, Event Detection, Classification, Clustering, Feature selection, Evaluation.
1. INTRODUCTION
The rapid growth of Internet-enabled communication technology in the form of social networking ser-
vices (often collectively referred to as social media) and associated smartphone apps has enabled bil-
lions of global citizens to broadcast news and ‘on the ground’ information during ‘real world’ events as
they unfold. Twitter, for example, has been studied as an emerging news reporting platform [Osborne
et al. 2013; Phuvipadawat and Murata 2010; Weng and Lee 2011] and has been widely used to dis-
seminate information about the Arab Spring [Alsaedi and Burnap 2015; Starbird and Palen 2012] and
other disaster-related incidents [Burnap et al. 2014; Imran et al. 2015; Shamma et al. 2010; Thelwall
et al. 2011; Williams and Burnap 2015]. The interaction between people, events, and Internet-enabled
technology presents both an opportunity and a challenge to Social Computing scholars, public sector
organisations (e.g. governments and policing agencies), and the private sector, all of whom aim to
understand how events are reported using social media and how millions of online posts can be reduced to
accurate but meaningful information that can support decision making and lead to productive action.
Research in recent years has uncovered the increasingly important role of utilising data from social
networking sites in disaster situations, and shown that information broadcast via social media can
enhance situational awareness during a crisis situation [Alsaedi et al. 2015; Vieweg et al. 2010, 2014].
In particular, members of the public, formal response agencies and local, national and international aid
organizations are all aware of the ability to use social media to gather and disperse timely information
in the aftermath of disaster [Chowdhury et al. 2013; Imran et al. 2014; Iyengar et al. 2011]. However,
many existing approaches to event detection are limited to global or large-scale event detection (e.g.
ACM Transactions on Internet Technology, Vol. 0, No. 0, Article 0, Publication date: 0000.

natural disasters and terror attacks), while detecting small-scale incidents such as fires, car accidents,
and public order events remains an ongoing research topic due to several key challenges.
One challenge is that online posts are often constrained in length (referred to as microblogs), which
means that only a small amount of text is available to be analysed to gain insights. Within the text
there are other challenges, such as frequent use of informal, irregular, and abbreviated words; a large
number of spelling and grammatical errors; and the use of improper sentence structure and mixed lan-
guages [Becker et al. 2011a; Farzindar and Wael 2015; Imran et al. 2015]. Some languages are more
challenging than others; for example, Arabic-speaking users rely heavily on dialects as well as a mixture of Latin
and Arabic characters (Arabizi) [Alsaedi and Burnap 2015]. These dialects may differ in vocabulary,
morphology, and spelling from standard Arabic, and most do not have standard spellings. Additionally,
the popularity of social networking services has attracted spammers and other content polluters
to spread advertisements, pornography, viruses, phishing and other malicious material that cloud the
information analysis [Burnap et al. 2015; Farzindar and Wael 2015].
Despite these challenges, it has been noted that detecting small-scale events is essential to improv-
ing situational awareness of both citizens and decision makers [Li et al. 2012; Schulz et al. 2015;
Walther and Kaisser 2013] and thus remains a well-motivated research topic for the Social Computing
community. In this article, we propose a novel approach to event detection that aims to overcome many
of the challenges to provide a system to detect large-scale events and related small-scale events. The
approach is based on the integration of supervised machine learning algorithms to detect larger scale
events, and unsupervised approaches to cluster, disambiguate and summarize smaller sub-events, with
a goal of improving situational awareness in emergency situations through automatic methods. Our
contributions can be summarized as follows:
—Using temporal, spatial and textual features, our approach is able to detect small-scale events in a
given place and time better than existing algorithms, to which we compare our performance results;
—While other related work focuses on large or small scale events, our approach can identify large and
related small scale events. Thus, our approach retains the context of smaller events (e.g. distinguish-
ing between public disorder related to an event, and general disorder);
—Much of the related event detection work depends on event-specific terms and phrases, whereas we
propose a novel approach to summarizing microblog posts corresponding to events without the need
for prior knowledge of the entire data set, that is, in real time rather than post-event. Our approach is
based on modifying a term frequency algorithm to include a dynamic temporal aspect;
—We demonstrate that our proposed approach can identify the relationship between content posted
via social media and ‘real world’ events by using time-stamped social media data and actual crime
reports to accurately flag events prior to their known reporting time throughout a study period,
using human annotated Twitter data as an example data source;
—We present a case study of our approach by evaluating it against other leading approaches using
Twitter posts from the UK riots in 2011, and a publicly accessible account of actual reported intel-
ligence obtained and reports received by the Metropolitan Police Service during this event. Smaller
scale events include localized looting, violence and criminal damage. Results show that our system
can perform as well as terrestrial sources at detecting events related to the riots - in some cases we
detect the event before intelligence reports were recorded.
The rest of this article is organized as follows: Section 2 reviews related work. Sections 3 and 4
define the problem of event detection using data from social networking services, and discuss the
technical architecture and algorithms developed as part of our proposed system. In section 5 we present
and analyze several features, namely temporal, spatial and textual features. Section 6 presents our
experiments and discusses the results. In section 7 we conclude and highlight some directions for
future research.
2. RELATED WORK
The general topic of detecting real-world events from social media has received considerable research
interest. Research efforts have focused on real-time event detection and tracking, social media analysis,
micro-blog summarization and information visualisation. We describe relevant related work in three
areas: large-scale (global) event detection, small-scale (local) event detection, and systems used to
extract crisis relevant information from social media.
For large-scale events, [Petrović et al. 2010] presented an approach to detect breaking stories from
a stream of tweets using locality-sensitive hashing (LSH). [Becker et al. 2011a] proposed an online
clustering framework to identify different types of real-world events. Then, they use different machine
learning models to predict whether a pair of documents belong to real-world events or not. These
approaches are limited to widely discussed events and fail to report rare and potentially disruptive
small-scale incidents.
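As a rough illustration of the locality-sensitive hashing idea mentioned above, the sketch below hashes bag-of-words vectors with random hyperplanes so that tweets with similar term vectors tend to share a bucket. It is not the implementation of [Petrović et al. 2010]; the vocabulary, the number of hash bits, and the function names `make_hyperplanes`, `signature` and `bucket_tweets` are assumptions made purely for illustration.

```python
import random
from collections import Counter, defaultdict

def make_hyperplanes(vocab, n_bits, seed=0):
    """Draw one random Gaussian weight per (hash bit, term). Hypothetical helper."""
    rng = random.Random(seed)
    return [{term: rng.gauss(0.0, 1.0) for term in vocab} for _ in range(n_bits)]

def signature(tokens, hyperplanes):
    """One bit per hyperplane: the sign of the dot product with the term-count vector."""
    counts = Counter(tokens)
    return tuple(
        1 if sum(plane.get(t, 0.0) * c for t, c in counts.items()) >= 0 else 0
        for plane in hyperplanes
    )

def bucket_tweets(tweets, hyperplanes):
    """Group tweet indices by signature; terms outside the vocabulary are ignored."""
    buckets = defaultdict(list)
    for i, text in enumerate(tweets):
        buckets[signature(text.lower().split(), hyperplanes)].append(i)
    return dict(buckets)

# Toy data (assumed, not from the paper).
VOCAB = ["fire", "police", "riot", "london", "music", "concert"]
PLANES = make_hyperplanes(VOCAB, n_bits=8)
TWEETS = ["riot police london", "london police riot", "music concert tonight"]
BUCKETS = bucket_tweets(TWEETS, PLANES)
```

Tweets with the same bag of words always share a signature, and only tweets in the same bucket need to be compared directly, which is what makes the approach attractive for streams.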
Large-scale event detection has also been explored through clustering of discrete wavelet signals
built from individual words generated by Twitter [Weng and Lee 2011]. Auto-correlation then filters
away the trivial words (noise) and cross correlation groups together words that relate to an event by
modularity-based graph partitioning. Similarly, [Cordeiro 2012] proposed a continuous wavelet trans-
formation based on hashtag occurrences combined with a topic model inference using Latent Dirichlet
Allocation (LDA) [Blei et al. 2003]. In fact, LDA and its variants are a widely used statistical modelling
approach in event detection tasks [Cordeiro 2012; Pan and Mitra 2011; Vavliakis et al.
2013; Vieweg et al. 2014]. However, these methods have the main drawback of requiring a priori speci-
fication of the number of total topics, which leads to problems when the total number of events exceeds
this number.
Other approaches have focused on structural networks and graph models to discover events in social
media feeds. [Benson et al. 2011] presented a structured graphical model which simultaneously an-
alyzes individual messages, clusters them according to event, and induces a canonical value for each
event property. Using a different graph analytical approach, [Sayyadi and Raschid 2013] used a Key-
Graph algorithm [Ohsawa et al. 1998] to convert text data into a term graph based on co-occurrence
relations between terms. Then they employed a community detection approach to partition the graph.
Eventually, each community is regarded as a topic and terms within the community are considered as
the topic’s features. Moreover, [Schinas et al. 2012] used the Structural Clustering Algorithm for Net-
works (SCAN) for detecting “communities” of documents. These candidate social events were further
processed by splitting the events that exceeded a predefined time range into shorter events. Then they
used a classification approach based on median geolocations and accumulated TF-IDF vectors for each
cluster to separate relevant and irrelevant candidate events. Nevertheless, these graph partitioning
algorithms are not ideal for social media event detection problems because of their complexity [Agarwal
et al. 2012] and because their bias towards balanced partitioning [Karypis et al. 1997] prevents them from
capturing the highly skewed event distribution of social media data. In addition, discovering
multiple events and sub-events becomes computationally expensive with graph partitioning
algorithms due to the velocity and scale of updates in a highly dynamic real-time situation [Agarwal et al.
2012].
Various methods have been proposed to identify small-scale events from social media streams such
as fire incidents, traffic jams, etc. [Walther and Kaisser 2013] developed spatiotemporal clustering
methods where they monitor specific locations of high tweeting activity and cluster tweets that are
geographically and temporally close to each other. A machine-learning module is then used to evaluate
whether a cluster of tweets refers to an event based on 41 features including the tweet content. Another
clustering approach is presented in [Schulz et al. 2015], with a small-scale incident detection pipeline
based on the clustering of incident-related micro-posts using three properties that define an incident:
(1) incident type, (2) location and (3) time period. Various techniques are adopted to increase the qual-
ity of their clustering approach: (A) the incident type determination using supervised machine learning
(Semantic Abstraction), (B) geotagging of tweets based on tweet geolocalization and (C) the extraction
of the time period of the incident. Yet both methods are very specific and give no view of the
general context. It is critical that a system can provide insight into ongoing sub-events arising amid
a protest, to better inform how to react and to improve both event reasoning and system performance.
This lack of context could explain the low recall/precision of the [Schulz et al. 2015] and [Walther and Kaisser
2013] approaches when validated using real-world official reports: 32.14% and 4.75%, respectively.
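To make the idea of clustering tweets that are geographically and temporally close more concrete, the following is a minimal sketch of a greedy spatiotemporal grouping. It is not the method of [Walther and Kaisser 2013] or [Schulz et al. 2015]; the tweet dictionary layout and the thresholds `max_km` and `max_seconds` are hypothetical values chosen for the example.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def group_tweets(tweets, max_km=0.5, max_seconds=1800):
    """Single pass in time order: attach each tweet to the first cluster whose
    most recent member is within max_km and max_seconds, else open a new cluster."""
    clusters = []
    for tw in sorted(tweets, key=lambda t: t["ts"]):
        for cluster in clusters:
            last = cluster[-1]
            close_in_time = tw["ts"] - last["ts"] <= max_seconds
            close_in_space = haversine_km(tw["lat"], tw["lon"], last["lat"], last["lon"]) <= max_km
            if close_in_time and close_in_space:
                cluster.append(tw)
                break
        else:
            clusters.append([tw])
    return clusters

# Toy geotagged tweets (assumed coordinates, timestamps in seconds).
SAMPLE = [
    {"ts": 0, "lat": 51.5880, "lon": -0.0720},
    {"ts": 600, "lat": 51.5885, "lon": -0.0721},
    {"ts": 900, "lat": 51.3760, "lon": -0.0980},
    {"ts": 10000, "lat": 51.5880, "lon": -0.0720},
]
CLUSTERS = group_tweets(SAMPLE)
```

In a pipeline of the kind described above, each resulting cluster would then be passed to a classifier that decides whether it corresponds to a real event.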
Another event detection system, Twitcident [Abel et al. 2012], presents a Web-based application for
searching, filtering and aggregating information about known events reported by emergency broad-
casting services in the Netherlands. In addition, [Watanabe et al. 2011] proposed a system called
Jasmine, for detecting local events in the real-world using geolocation information from microblog doc-
uments. They obtain the name list of locations from geotagged tweets and add positional information
to tweets by matching the location name. A similar work is EventRadar [Boettcher and Lee 2012], which introduces
a statistical method for detecting local events using temporal and spatial analysis of
seven days of historic data. The main contribution of EventRadar is that it detects local events without
keeping a list of locations by finding clusters of Tweets that contain the same subset of words. Another
related system is proposed by [Li et al. 2012] to detect crime and disaster related Events (CDE) from
tweets. They use spatial and temporal information of tweets to detect new events with a number of
text mining techniques to extract the meta information (e.g., geo-location names, temporal phrase, and
keywords) for event interpretation. Most of these small-scale event detection approaches are novel and
automatic; however, their performance and detection reliability are highly dependent
on the incident type, so they are limited to the specific types of event content that they can handle.
Regarding the use of social media data during disasters, researchers have proposed several visual
analytics approaches aiming at real-time microblog analysis that often facilitate interactive means for
exploration and anomaly indication. TwitterMonitor [Mathioudakis and Koudas 2010] performs trend
detection in two steps and analyzes trends in a third step. During the first phase, it identifies bursty
keywords which are then grouped based on their co-occurrences. Once a trend is identified, additional
information from the tweets is extracted to analyze and describe the trend. AIDR (Artificial Intelli-
gence for Disaster Response) [Imran et al. 2014] is a platform for filtering and classifying messages
posted to social media during humanitarian crises in real time. AIDR uses human-assigned labels
(crowdsourcing messages), and pre-existing classification techniques to classify Twitter messages into
a set of user-defined situational awareness categories in real-time. [Vieweg et al. 2010] analyze the
Twitter logs for a pair of concurrent emergency events: the Oklahoma Grassfires (April 2009) and
the Red River Floods (March and April 2009). Their automated framework is based on the relative
frequency of geo-location and location-referencing information from users’ posts.
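The bursty-keyword step described for TwitterMonitor can be illustrated with a simple statistical rule. The sketch below flags a term as bursty when its count in the current time window exceeds its historical mean by a multiple of its standard deviation. This rule, the parameters `k` and `min_count`, and the function name are assumptions for illustration only, since the burst criterion actually used by [Mathioudakis and Koudas 2010] is not described here; the co-occurrence grouping step is omitted.

```python
import statistics
from collections import Counter

def bursty_keywords(history, current, k=2.0, min_count=3):
    """Return terms whose count in `current` exceeds mean + k * std of their
    counts over the past windows in `history` (which must be non-empty)."""
    bursty = []
    for term, count in current.items():
        past = [window.get(term, 0) for window in history]
        mean = statistics.mean(past)
        std = statistics.pstdev(past)
        if count >= min_count and count > mean + k * std:
            bursty.append(term)
    return sorted(bursty)

# Toy term counts per time window (assumed data).
HISTORY = [
    Counter({"the": 10, "fire": 1}),
    Counter({"the": 12}),
    Counter({"the": 11}),
]
CURRENT = Counter({"the": 11, "fire": 9, "croydon": 4})
BURSTY = bursty_keywords(HISTORY, CURRENT)
```

Frequent but stable terms such as "the" are ignored, while terms that suddenly spike relative to their own history are flagged.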
In a related work, [Olteanu et al. 2014] created a lexicon of crisis-related terms (380 single-word
terms) that frequently appear in relevant messages posted during six crisis events. Then, they demonstrated
how to use the lexicon to automatically identify new terms by employing pseudo-relevance
feedback mechanisms to extract crisis-related messages during emergency events. [Vieweg et al. 2014]
enable filtering, searching, and analysis of Twitter data during another natural disaster (the 2013 Typhoon
Yolanda). They used a supervised classification algorithm to automatically classify tweets into
three categories: Informative, Not informative, and Not related to this crisis. Then they employed topic
modelling using the LDA model [Blei et al. 2003] to further classify the informative tweets into 10 clusters

References
Journal ArticleDOI

Latent dirichlet allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hof-mann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal ArticleDOI

The anatomy of a large-scale hypertextual Web search engine

TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Journal Article

The Anatomy of a Large-Scale Hypertextual Web Search Engine.

Sergey Brin, +1 more
- 01 Jan 1998 - 
TL;DR: Google as discussed by the authors is a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems.
Journal ArticleDOI

Term Weighting Approaches in Automatic Text Retrieval

TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
Frequently Asked Questions (18)
Q1. What are the contributions mentioned in the paper "Can we predict a riot? disruptive event detection using twitter" ?

In this paper, Alsaedi et al. used temporal, spatial and textual features to detect small-scale events. 

There are many directions for future work. Finally, the detection of rumors in the social media, the analysis of the distinctive characteristics of rumors and the way in which they propagate in the microblogging communities will be addressed in the future. Spammer detection in various online social networking platforms is another interesting task that is reserved for future work. The authors intend to further evaluate the summarization output to not only map onto real events, but to provide qualitatively useful output for decision making. 

The classification algorithms used in the experiment were: Naive Bayes [Lewis 1998] a statistical classifier based on the Bayes’ theorem; Logistic Regression [Friedman et al. 1998], a generalized linear model to apply regression to categorical variables; and support vector machines (SVMs) [Joachims 1998] which aims at maximizing (maximum margin) the minimum distance between two classes of data using a hyperplane that separates them. 
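As an illustration of the first of these classifiers, here is a minimal multinomial Naive Bayes sketch with add-one smoothing, written in plain Python. It is only a sketch of the general technique: the class name, the whitespace tokenisation, and the toy training posts are assumptions, and the feature set and implementation used in the experiments are not reproduced here.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # number of documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_label in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            logp = math.log(n_label / n_docs)     # log prior
            for t in tokens:                      # smoothed log likelihoods
                logp += math.log((self.word_counts[label][t] + 1) / (total + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

# Toy labelled posts (assumed data).
TRAIN = [
    ("fire in high street", "event"),
    ("police close road after crash", "event"),
    ("great coffee this morning", "other"),
    ("watching football tonight", "other"),
]
MODEL = NaiveBayes().fit([text.split() for text, _ in TRAIN], [label for _, label in TRAIN])
```

For example, `MODEL.predict("fire near police road".split())` selects the "event" class, because three of the four tokens were only seen in event posts.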

The near-duplicate measure, the favourite ratio and the positive sentiment ratio are the least discriminative features, which suggest that they appear in all different types of posts, not only in disruptive events. 

Using an online clustering algorithm with a sliding window timeframe, it can be utilised to detect large and small-scale events from social media streams, with particular attention to filtering from large to small-scale events. 

Their proposed framework is based on collecting data over time windows for a given location which supports the automatic detection and summarization of events from social media. 

Employing supervised classification of each tweet before clustering (large scale event detection) reduces the computational overhead at the clustering stage as the number of tweets is significantly reduced (containing only event-related tweets). 

Posts that were fewer than three words long were removed, as were messages where over half the total words were the same word, since these posts were less likely to contain useful information. 
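The two filtering rules above can be sketched as a small predicate. The function name `keep_post` and the lowercase whitespace tokenisation are assumptions; the rest of the pre-processing stage is not shown.

```python
from collections import Counter

def keep_post(text, min_words=3):
    """Return False for posts with fewer than min_words words, or posts in
    which a single word accounts for more than half of all words."""
    words = text.lower().split()
    if len(words) < min_words:
        return False
    most_common_count = Counter(words).most_common(1)[0][1]
    return most_common_count <= len(words) / 2

# Toy posts (assumed data).
POSTS = ["help me", "riot riot riot now", "fire on high street", "go go stop stop"]
KEPT = [p for p in POSTS if keep_post(p)]
```

Note that a word making up exactly half of a post does not trigger removal here, following the "over half" wording.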

The decision to use an online clustering algorithm was taken for three main reasons: (i) it supports high dimensional data as it effectively handles the large volume of social media data produced around events; (ii) many clustering algorithms such as K-means require previous knowledge of the number of clusters. 
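To show why an online algorithm does not need the number of clusters in advance, the sketch below assigns each incoming post to the most similar existing cluster if the cosine similarity passes a threshold, and otherwise opens a new cluster. This is a generic single-pass scheme under assumed settings (raw term counts, a threshold of 0.5, whitespace tokenisation), not necessarily the exact algorithm used in the framework.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def online_cluster(posts, threshold=0.5):
    """Process posts one at a time; return the cluster index assigned to each."""
    centroids = []     # summed term counts per cluster (cosine is scale invariant)
    assignments = []
    for text in posts:
        vec = Counter(text.lower().split())
        sims = [cosine(vec, c) for c in centroids]
        if sims and max(sims) >= threshold:
            idx = sims.index(max(sims))
            centroids[idx].update(vec)       # fold the post into the cluster
        else:
            idx = len(centroids)             # no cluster is close enough
            centroids.append(Counter(vec))
        assignments.append(idx)
    return assignments

# Toy stream (assumed data).
STREAM = [
    "fire on high street",
    "big fire on high street now",
    "looting at shopping centre",
    "looting reported at shopping centre",
]
ASSIGNMENTS = online_cluster(STREAM)
```

The number of clusters grows with the data, so new events can appear at any point in the stream without re-running the clustering.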

Thus clustering (small-scale event detection), feature selection and summarization are much faster and suitable for real-time analysis. 

The authors evaluate the algorithm’s performance on the training data using a range of thresholds, and identify the threshold setting that yields the highest-quality solution according to a given clustering quality metric (here the authors implement the f-measure). 
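The threshold selection described above can be sketched as a sweep over candidate thresholds scored by F-measure. The pairwise definition of the F-measure, the function names, and the toy `gap_cluster` function (a stand-in that groups timestamps separated by small gaps) are all assumptions used to keep the example self-contained; the clustering algorithm and the exact metric used in the experiments may differ.

```python
from itertools import combinations

def pairwise_f1(pred, gold):
    """F-measure over item pairs: a pair is positive if both items share a cluster."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_pred = pred[i] == pred[j]
        same_gold = gold[i] == gold[j]
        if same_pred and same_gold:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(cluster_fn, items, gold, thresholds):
    """Return (best F-measure, threshold) over the candidate thresholds."""
    return max((pairwise_f1(cluster_fn(items, t), gold), t) for t in thresholds)

def gap_cluster(times, max_gap):
    """Hypothetical stand-in clusterer: start a new cluster when the gap
    between consecutive sorted timestamps exceeds max_gap."""
    labels, current = [], 0
    for k, t in enumerate(times):
        if k > 0 and t - times[k - 1] > max_gap:
            current += 1
        labels.append(current)
    return labels

# Toy labelled training data (assumed).
TIMES = [0, 5, 8, 60, 63, 200]
GOLD = [0, 0, 0, 1, 1, 2]
BEST = best_threshold(gap_cluster, TIMES, GOLD, thresholds=[1, 10, 100])
```

A threshold that is too strict splits every item into its own cluster, while one that is too loose merges distinct events; the sweep picks the setting between these extremes that best matches the labelled data.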

Using the textual feature model, the authors are still able to obtain a reasonable performance of on average, 40% content about an event, provides situational awareness information about that event. 

Their experiments suggest that their framework yields better performance than many leading approaches in real-time event detection, and using a real-world ground truth published by the Metropolitan Police Services (MPS) after the 2011 riots in England, the authors showed their system to detect events far quicker than they were reported to MPS. 

More fine-grained summarization was proposed by considering sub-events detection and combining the summaries extracted from each sub-topic (tweet selection, tweet ranking) [Shen et al. 2013; Yajuan et al. 2012; Zubiaga et al. 2012].