Proceedings Article

Using Social Media to Enhance Emergency Situation Awareness: Extended Abstract

TL;DR: This work focuses on analyzing Twitter messages generated during natural disasters, and shows how natural language processing and data mining techniques can be utilized to extract situation awareness information from Twitter.
Abstract: Social media platforms, such as Twitter, offer a rich source of real-time information about real-world events, particularly during mass emergencies. Sifting valuable information from social media provides useful insight into time-critical situations for emergency officers to understand the impact of hazards and act on emergency responses in a timely manner. This work focuses on analyzing Twitter messages generated during natural disasters, and shows how natural language processing and data mining techniques can be utilized to extract situation awareness information from Twitter. We present key relevant approaches that we have investigated including burst detection, tweet filtering and classification, online clustering, and geotagging.
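Each pipeline component named above can be sketched compactly. As one hedged example, here is a moving-average burst detector over per-bin tweet counts; the window size, spike factor, and minimum count are illustrative parameters, not values from the paper:

```python
from collections import deque

def detect_bursts(counts, window=6, factor=3.0, min_count=10):
    """Flag time bins whose tweet count exceeds `factor` times the
    trailing moving average -- a simple stand-in for burst detection.
    `window`, `factor`, and `min_count` are illustrative choices."""
    history = deque(maxlen=window)
    bursts = []
    for i, c in enumerate(counts):
        avg = sum(history) / len(history) if history else 0.0
        if c >= min_count and avg > 0 and c > factor * avg:
            bursts.append(i)
        history.append(c)
    return bursts

# Tweet counts per 10-minute bin; a spike begins at index 6.
counts = [8, 10, 9, 11, 10, 9, 120, 95, 12, 10]
print(detect_bursts(counts))  # -> [6, 7]
```

Bins 6 and 7 are flagged because their counts exceed three times the trailing average of the preceding window.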


Citations
Journal ArticleDOI
TL;DR: An end-to-end integrated event detection framework that comprises five main components: data collection, pre-processing, classification, online clustering, and summarization is presented and an evaluation of the effectiveness of detecting events using a variety of features derived from Twitter posts is presented.
Abstract: In recent years, there has been increased interest in real-world event detection using publicly accessible data made available through Internet technology such as Twitter, Facebook, and YouTube. In these highly interactive systems, the general public are able to post real-time reactions to “real world” events, thereby acting as social sensors of terrestrial activity. Automatically detecting and categorizing events, particularly small-scale incidents, using streamed data is a non-trivial task but would be of high value to public safety organisations such as local police, who need to respond accordingly. To address this challenge, we present an end-to-end integrated event detection framework that comprises five main components: data collection, pre-processing, classification, online clustering, and summarization. The integration between classification and clustering enables events to be detected, as well as related smaller-scale “disruptive events,” smaller incidents that threaten social safety and security or could disrupt social order. We present an evaluation of the effectiveness of detecting events using a variety of features derived from Twitter posts, namely temporal, spatial, and textual content. We evaluate our framework on a large-scale, real-world dataset from Twitter. Furthermore, we apply our event detection system to a large corpus of tweets posted during the August 2011 riots in England. We use ground-truth data based on intelligence gathered by the London Metropolitan Police Service, which provides a record of actual terrestrial events and incidents during the riots, and show that our system can perform as well as terrestrial sources, and even better in some cases.

97 citations
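The online-clustering component of such a framework can be sketched as single-pass threshold clustering: each tweet joins the most similar existing cluster centroid or starts a new one. The cosine threshold below is illustrative, not the paper's:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def online_cluster(tweets, threshold=0.5):
    """Assign each tweet to the nearest existing cluster centroid,
    or open a new cluster when no centroid is similar enough."""
    centroids, clusters = [], []
    for text in tweets:
        vec = Counter(text.lower().split())
        best, best_sim = None, threshold
        for i, c in enumerate(centroids):
            sim = cosine(vec, c)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(vec)
            clusters.append([text])
        else:
            centroids[best] += vec  # running centroid as summed term counts
            clusters[best].append(text)
    return clusters

tweets = [
    "fire reported near main street",
    "huge fire near main street station",
    "train delays on the northern line",
]
print(len(online_cluster(tweets)))  # -> 2
```

The two fire tweets merge into one cluster; the unrelated transit tweet opens a second.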

Journal ArticleDOI
TL;DR: This work shows that textual and imagery content on social media provide complementary information useful to improve situational awareness and proposes a methodological approach that combines several computational techniques effectively in a unified framework to help humanitarian organisations in their relief efforts.
Abstract: People increasingly use microblogging platforms such as Twitter during natural disasters and emergencies. Research studies have revealed the usefulness of the data available on Twitter for several ...

76 citations

Journal ArticleDOI
04 Dec 2020-PLOS ONE
TL;DR: This project set out to test whether an openly available dataset (Twitter) could be transformed into a resource for urban planning and development by creating road traffic crash location data, which are scarce in most resource-poor environments but essential for addressing the number one cause of mortality for children over age five and young adults.
Abstract: With all the recent attention focused on big data, it is easy to overlook that basic vital statistics remain difficult to obtain in most of the world. What makes this frustrating is that private companies hold potentially useful data, but it is not accessible by the people who can use it to track poverty, reduce disease, or build urban infrastructure. This project set out to test whether we can transform an openly available dataset (Twitter) into a resource for urban planning and development. We test our hypothesis by creating road traffic crash location data, which is scarce in most resource-poor environments but essential for addressing the number one cause of mortality for children over five and young adults. The research project scraped 874,588 traffic related tweets in Nairobi, Kenya, applied a machine learning model to capture the occurrence of a crash, and developed an improved geoparsing algorithm to identify its location. We geolocate 32,991 crash reports in Twitter for 2012-2020 and cluster them into 22,872 unique crashes during this period. For a subset of crashes reported on Twitter, a motorcycle delivery service was dispatched in real-time to verify the crash and its location; the results show 92% accuracy. To our knowledge this is the first geolocated dataset of crashes for the city and allowed us to produce the first crash map for Nairobi. Using a spatial clustering algorithm, we are able to locate portions of the road network (<1%) where 50% of the crashes identified occurred. Even with limitations in the representativeness of the data, the results can provide urban planners with useful information that can be used to target road safety improvements where resources are limited. The work shows how Twitter data might be used to create other types of essential data for urban planning in resource poor environments.

28 citations
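The step that collapses repeated reports into unique crash sites can be sketched as greedy radius clustering over (lat, lon) points. This is a simplification rather than the paper's actual algorithm, and the 500 m radius is an illustrative choice:

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def cluster_crashes(points, radius_km=0.5):
    """Greedy radius clustering: each point joins the first cluster whose
    seed lies within `radius_km`, otherwise it seeds a new cluster."""
    seeds, clusters = [], []
    for p in points:
        for i, s in enumerate(seeds):
            if haversine_km(p, s) <= radius_km:
                clusters[i].append(p)
                break
        else:
            seeds.append(p)
            clusters.append([p])
    return clusters

# Three reports near one junction plus one elsewhere collapse to two sites.
reports = [(-1.2921, 36.8219), (-1.2923, 36.8221),
           (-1.2919, 36.8217), (-1.3100, 36.8500)]
print(len(cluster_crashes(reports)))  # -> 2
```

A production version would use a proper density-based method (e.g. DBSCAN) rather than first-seed assignment, but the collapsing behaviour is the same idea.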

Proceedings ArticleDOI
14 Sep 2020
TL;DR: This work performed term frequency and topic modelling analyses over the written content published on the platform between 2017 and 2019, and finds a very strong presence of content about blockchain, cryptocurrency and, more specifically, Steemit itself and its users.
Abstract: Online Social Networking platforms (OSNs) are part of the people's everyday life answering the deep-rooted need for communication among humans. During recent years, a new generation of social media based on blockchain became very popular, bringing the power of the technology to the service of social networks. Steemit is one such platform and employs the blockchain to implement a rewarding mechanism, adding a new, economic, layer to the social media service. The reward mechanism grants virtual tokens to the users capable of engaging other users on the platform, which can be either vested in the platform for increased influence or exchanged for fiat currency. The introduction of an economic layer on a social networking platform can seriously influence how people socialize. In this work, we tackle the problem of understanding how this new business model conditions the way people create content. We performed term frequency and topic modelling analyses over the written content published on the platform between 2017 and 2019. This analysis lets us understand the most common topics of the content that appears on the platform. While personal mundane information still appears, along with content related to arts, food, travels, and sport, we also see a very strong presence of content about blockchain, cryptocurrency and, more specifically, Steemit itself and its users.

23 citations
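The term-frequency side of such an analysis can be sketched with a stopword-filtered word count; the posts and stopword list below are made up:

```python
from collections import Counter

# Toy posts standing in for Steemit articles; stopword list is illustrative.
posts = [
    "earned my first steem tokens on steemit today",
    "why blockchain rewards change how we post on steemit",
    "steemit curation and blockchain basics",
]
stopwords = {"my", "on", "we", "how", "why", "the", "from", "and"}
freq = Counter(w for p in posts for w in p.split() if w not in stopwords)
print(freq.most_common(2))  # -> [('steemit', 3), ('blockchain', 2)]
```

Even this toy count surfaces the platform-centric vocabulary the study reports; the paper's topic-modelling step goes further by grouping co-occurring terms.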

Proceedings Article
22 Sep 1975
TL;DR: This conference announced two new activities for the database community: the launch of the journal ACM Transactions on Database Systems (TODS), with its first issue in the Spring of 1976, and an emerging emphasis on very large database research and development.
Abstract: The goal of this Conference is to bring to the attention of the international database community and of the members of ACM's Special Interest Groups on Business Data Processing (SIGBDP), Information Retrieval (SIGIR), and Management of Data (SIGMOD), the two new activities which will impact them for years to come. The first new activity is the initiation of a technical journal by ACM known as the ACM Transactions on Database Systems (TODS) which will have its first issue in the Spring of 1976. The second new activity is the emerging emphasis on very large database research and development which is prompted by Advanced Research Projects Agency's programs on intelligent terminals, advanced memory concepts and very large database systems. In order to meet the goal, we formed a Program Committee of international standing. The Committee decided to have an open call-for-papers, to organize panel discussions on very large database users, on very large storage devices and on future storage architectures, and to arrange the Conference meeting in the Greater Boston area which had the support and sponsorship of ACM's SIGBDP, SIGIR, and SIGMOD and The Rand Corporation. The response to the call-for-papers was overwhelming. Over one hundred and forty people indicated their desire to submit papers. Despite a delay of notice and an earlier deadline for paper submission, we received ninety-eight complete papers. Each paper was refereed by three reviewers. Twenty-nine papers were recommended by the Committee for their publication in the Conference Proceedings. Six of the twenty-nine papers will be included in the first issue of the TODS. Because some of the remaining papers were of good quality and of interest to database practitioners, the Committee further recommended the inclusion of abstracts of another twenty-seven papers in the Proceedings.

19 citations

References
Proceedings ArticleDOI
26 Apr 2010
TL;DR: This paper investigates the real-time interaction of events such as earthquakes in Twitter and proposes an algorithm to monitor tweets and to detect a target event and produces a probabilistic spatiotemporal model for the target event that can find the center and the trajectory of the event location.
Abstract: Twitter, a popular microblogging service, has received much attention recently. An important characteristic of Twitter is its real-time nature. For example, when an earthquake occurs, people make many Twitter posts (tweets) related to the earthquake, which enables detection of earthquake occurrence promptly, simply by observing the tweets. As described in this paper, we investigate the real-time interaction of events such as earthquakes in Twitter and propose an algorithm to monitor tweets and to detect a target event. To detect a target event, we devise a classifier of tweets based on features such as the keywords in a tweet, the number of words, and their context. Subsequently, we produce a probabilistic spatiotemporal model for the target event that can find the center and the trajectory of the event location. We consider each Twitter user as a sensor and apply Kalman filtering and particle filtering, which are widely used for location estimation in ubiquitous/pervasive computing. The particle filter works better than other comparable methods for estimating the centers of earthquakes and the trajectories of typhoons. As an application, we construct an earthquake reporting system in Japan. Because of the numerous earthquakes and the large number of Twitter users throughout the country, we can detect an earthquake with high probability (96% of earthquakes of Japan Meteorological Agency (JMA) seismic intensity scale 3 or more are detected) merely by monitoring tweets. Our system detects earthquakes promptly and sends e-mails to registered users. Notification is delivered much faster than the announcements that are broadcast by the JMA.

3,976 citations


"Using Social Media to Enhance Emerg..." refers background in this paper

  • ...Disaster management using Twitter has recently been studied for humanitarian crises and natural disasters such as earthquakes, bushfires, and cyclones [Sakaki et al., 2010; Vieweg et al., 2010; Li et al., 2012; McMinn et al., 2014]....

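The users-as-sensors idea behind this earthquake detector can be sketched with a simple independence assumption: if each positive report is a false alarm with probability p_f, the chance the event is real after n reports is 1 - p_f^n. The 0.35 false-alarm rate below is an illustrative value, not the paper's:

```python
def occurrence_probability(n_reports, false_alarm_rate=0.35):
    """P(event) = 1 - p_f^n under independent noisy sensors.
    The false-alarm rate is an illustrative value, not the paper's."""
    return 1.0 - false_alarm_rate ** n_reports

for n in (1, 3, 10):
    print(n, round(occurrence_probability(n), 4))
```

Confidence climbs steeply with the number of independent reports, which is why a country with many Twitter users can reach very high detection rates from tweets alone.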

Proceedings ArticleDOI
10 Apr 2010
TL;DR: Analysis of microblog posts generated during two recent, concurrent emergency events in North America via Twitter, a popular microblogging service, aims to inform next steps for extracting useful, relevant information during emergencies using information extraction (IE) techniques.
Abstract: We analyze microblog posts generated during two recent, concurrent emergency events in North America via Twitter, a popular microblogging service. We focus on communications broadcast by people who were "on the ground" during the Oklahoma Grassfires of April 2009 and the Red River Floods that occurred in March and April 2009, and identify information that may contribute to enhancing situational awareness (SA). This work aims to inform next steps for extracting useful, relevant information during emergencies using information extraction (IE) techniques.

1,479 citations

Journal ArticleDOI
Jon Kleinberg1
TL;DR: The goal of the present work is to develop a formal approach for modeling such “bursts,” in such a way that they can be robustly and efficiently identified, and can provide an organizational framework for analyzing the underlying content.
Abstract: A fundamental problem in text data mining is to extract meaningful structure from document streams that arrive continuously over time. E-mail and news articles are two natural examples of such streams, each characterized by topics that appear, grow in intensity for a period of time, and then fade away. The published literature in a particular research field can be seen to exhibit similar phenomena over a much longer time scale. Underlying much of the text mining work in this area is the following intuitive premise—that the appearance of a topic in a document stream is signaled by a “burst of activity,” with certain features rising sharply in frequency as the topic emerges. The goal of the present work is to develop a formal approach for modeling such “bursts,” in such a way that they can be robustly and efficiently identified, and can provide an organizational framework for analyzing the underlying content. The approach is based on modeling the stream using an infinite-state automaton, in which bursts appear naturally as state transitions; it can be viewed as drawing an analogy with models from queueing theory for bursty network traffic. The resulting algorithms are highly efficient, and yield a nested representation of the set of bursts that imposes a hierarchical structure on the overall stream. Experiments with e-mail and research paper archives suggest that the resulting structures have a natural meaning in terms of the content that gave rise to them.

1,477 citations


"Using Social Media to Enhance Emerg..." refers methods in this paper

  • ...In the data mining area, burst detection techniques have been studied to identify emergent patterns from data streams [Fung et al., 2005; Kleinberg, 2003]....

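Kleinberg's automaton can be sketched in a batched two-state form: a base state emits "relevant" messages at the background rate, a burst state at s times that rate, and entering the burst state costs gamma * ln(n); dynamic programming finds the cheapest state sequence. The s and gamma values below are illustrative defaults:

```python
import math

def kleinberg_two_state(r, d, s=2.0, gamma=1.0):
    """Two-state burst automaton (batched variant) via dynamic programming.
    r[t] = relevant messages and d[t] = total messages in bin t.
    Returns the 0/1 (base/burst) state sequence minimising total cost."""
    n = len(r)
    p0 = sum(r) / sum(d)              # base rate
    p1 = min(s * p0, 0.9999)          # burst rate
    def fit(p, rt, dt):               # -log likelihood (nCr dropped: constant per bin)
        return -(rt * math.log(p) + (dt - rt) * math.log(1 - p))
    trans = gamma * math.log(n)       # cost to move base -> burst; moving down is free
    INF = float("inf")
    cost = [[INF, INF] for _ in range(n)]
    back = [[0, 0] for _ in range(n)]
    cost[0][0] = fit(p0, r[0], d[0])
    cost[0][1] = trans + fit(p1, r[0], d[0])
    for t in range(1, n):
        for j, p in ((0, p0), (1, p1)):
            for i in (0, 1):
                c = cost[t - 1][i] + (trans if (i, j) == (0, 1) else 0.0) + fit(p, r[t], d[t])
                if c < cost[t][j]:
                    cost[t][j], back[t][j] = c, i
    states = [0] * n                  # trace back the cheapest sequence
    states[-1] = 0 if cost[-1][0] <= cost[-1][1] else 1
    for t in range(n - 1, 0, -1):
        states[t - 1] = back[t][states[t]]
    return states

# "flood" mentions out of all tweets per hour; a burst spans hours 3-5.
r = [2, 3, 2, 30, 40, 35, 3, 2]
d = [100] * 8
print(kleinberg_two_state(r, d))  # -> [0, 0, 0, 1, 1, 1, 0, 0]
```

The transition cost suppresses spurious one-bin flickers, which is what gives the method its robustness; the paper's full version uses an infinite ladder of burst states to get the nested hierarchy.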

Proceedings ArticleDOI
26 Oct 2010
TL;DR: A probabilistic framework for estimating a Twitter user's city-level location based purely on the content of the user's tweets, which can overcome the sparsity of geo-enabled features in these services and enable new location-based personalized information services, the targeting of regional advertisements, and so on.
Abstract: We propose and evaluate a probabilistic framework for estimating a Twitter user's city-level location based purely on the content of the user's tweets, even in the absence of any other geospatial cues. By augmenting the massive human-powered sensing capabilities of Twitter and related microblogging services with content-derived location information, this framework can overcome the sparsity of geo-enabled features in these services and enable new location-based personalized information services, the targeting of regional advertisements, and so on. Three of the key features of the proposed approach are: (i) its reliance purely on tweet content, meaning no need for user IP information, private login information, or external knowledge bases; (ii) a classification component for automatically identifying words in tweets with a strong local geo-scope; and (iii) a lattice-based neighborhood smoothing model for refining a user's location estimate. The system estimates k possible locations for each user in descending order of confidence. On average we find that the location estimates converge quickly (needing just 100s of tweets), placing 51% of Twitter users within 100 miles of their actual location.

1,213 citations


"Using Social Media to Enhance Emerg..." refers background in this paper

  • ...While the majority of the existing studies focus on estimating the users’ locations [Cheng et al., 2010; Li et al., 2011; Mahmud et al., 2012], we are interested in deriving a coherent locational focus that is referred to in a tweet if it exists....

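The content-only estimation idea can be sketched as a naive Bayes ranking of cities by per-city word likelihoods. This is a toy stand-in for the paper's full framework, which additionally classifies words by local geo-scope and smooths over a spatial lattice; all names and data below are made up:

```python
import math
from collections import Counter

def train(city_tweets):
    """Build per-city word-count models and a shared vocabulary."""
    models, vocab = {}, set()
    for city, tweets in city_tweets.items():
        counts = Counter(w for t in tweets for w in t.lower().split())
        models[city] = counts
        vocab |= set(counts)
    return models, vocab

def guess_city(tweet, models, vocab):
    """Rank cities by sum of log P(word | city) with add-one smoothing."""
    best, best_lp = None, -math.inf
    for city, counts in models.items():
        total = sum(counts.values())
        lp = sum(math.log((counts[w] + 1) / (total + len(vocab)))
                 for w in tweet.lower().split())
        if lp > best_lp:
            best, best_lp = city, lp
    return best

data = {  # toy training tweets; locally scoped words do the work
    "Houston": ["rodeo traffic on i45 again", "astros game tonight downtown"],
    "Chicago": ["lakefront trail is frozen", "deep dish with the bears game on"],
}
models, vocab = train(data)
print(guess_city("stuck in i45 traffic before the astros game", models, vocab))
# -> Houston
```

Words like "i45" and "astros" carry the geo-signal here, which mirrors the paper's point that a small set of locally scoped words suffices.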

Proceedings ArticleDOI
01 Aug 1999
TL;DR: An unsupervised, near-linear time text clustering system that offers a number of algorithm choices for each phase, and a refinement to center adjustment, “vector average damping,” that further improves cluster quality.
Abstract: Clustering is a powerful technique for large-scale topic discovery from text. It involves two phases: first, feature extraction maps each document or record to a point in high-dimensional space, then clustering algorithms automatically group the points into a hierarchy of clusters. We describe an unsupervised, near-linear time text clustering system that offers a number of algorithm choices for each phase. We introduce a methodology for measuring the quality of a cluster hierarchy in terms of F-measure, and present the results of experiments comparing different algorithms. The evaluation considers some feature selection parameters (tf-idf and feature vector length) but focuses on the clustering algorithms, namely techniques from Scatter/Gather (buckshot, fractionation, and split/join) and k-means. Our experiments suggest that continuous center adjustment contributes more to cluster quality than seed selection does. It follows that using a simpler seed selection algorithm gives a better time/quality tradeoff. We describe a refinement to center adjustment, “vector average damping,” that further improves cluster quality. We also compare the near-linear time algorithms to a group average greedy agglomerative clustering algorithm to demonstrate the time/quality tradeoff quantitatively.

958 citations
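The F-measure used to score a clustering against gold classes can be sketched directly: each gold class takes the best F score it achieves against any cluster, weighted by class size. This uses flat clusters rather than the paper's hierarchy, with made-up document ids:

```python
def f_measure(classes, clusters):
    """Overall F-measure for a flat clustering: each gold class takes the
    best F score over all clusters, weighted by class size."""
    n = sum(len(c) for c in classes)
    total = 0.0
    for cls in classes:
        best = 0.0
        for clu in clusters:
            overlap = len(set(cls) & set(clu))
            if overlap:
                p = overlap / len(clu)  # precision
                r = overlap / len(cls)  # recall
                best = max(best, 2 * p * r / (p + r))
        total += (len(cls) / n) * best
    return total

classes = [["d1", "d2", "d3"], ["d4", "d5"]]
clusters = [["d1", "d2"], ["d3", "d4", "d5"]]
print(round(f_measure(classes, clusters), 3))  # -> 0.8
```

For a hierarchy, the max in the inner loop would range over every node of the cluster tree rather than a flat partition.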
