Journal ArticleDOI

Tweet Analysis for Real-Time Event Detection and Earthquake Reporting System Development

01 Apr 2013 - IEEE Transactions on Knowledge and Data Engineering (IEEE) - Vol. 25, Iss. 4, pp. 919-931
TL;DR: An earthquake reporting system for use in Japan is developed and an algorithm to monitor tweets and to detect a target event is proposed, which produces a probabilistic spatiotemporal model for the target event that can find the center of the event location.
Abstract: Twitter has received much attention recently. An important characteristic of Twitter is its real-time nature. We investigate the real-time interaction of events such as earthquakes in Twitter and propose an algorithm to monitor tweets and to detect a target event. To detect a target event, we devise a classifier of tweets based on features such as the keywords in a tweet, the number of words, and their context. Subsequently, we produce a probabilistic spatiotemporal model for the target event that can find the center of the event location. We regard each Twitter user as a sensor and apply particle filtering, which is widely used for location estimation. The particle filter works better than other comparable methods for estimating the locations of target events. As an application, we develop an earthquake reporting system for use in Japan. Because of the numerous earthquakes and the large number of Twitter users throughout the country, we can detect an earthquake with high probability (93 percent of earthquakes of Japan Meteorological Agency (JMA) seismic intensity scale 3 or more are detected) merely by monitoring tweets. Our system detects earthquakes promptly, and notification is delivered much faster than JMA broadcast announcements.
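The three feature groups the abstract names (keywords in a tweet, the number of words, and the context of the target word) can be illustrated with a minimal sketch. The function name `tweet_features`, the default query words, the tokenizer, and the feature-key format are all assumptions made for illustration; they are not taken from the paper.

```python
import re

def tweet_features(text, query_words=("earthquake", "shaking")):
    """Build a sparse feature dict for one tweet (illustrative only).

    query_words is a hypothetical list of target-event keywords.
    """
    # Simple lowercase tokenizer; an assumption, not the paper's tokenizer.
    tokens = re.findall(r"[a-z0-9']+", text.lower())

    # Statistical feature: the number of words in the tweet.
    features = {"num_words": len(tokens)}

    for i, tok in enumerate(tokens):
        # Keyword features: every word that occurs in the tweet.
        features["word=" + tok] = 1
        if tok in query_words:
            # Context features: the words directly before and after
            # a query word, when they exist.
            if i > 0:
                features["before=" + tokens[i - 1]] = 1
            if i + 1 < len(tokens):
                features["after=" + tokens[i + 1]] = 1
    return features
```

A dict like this could be passed to any binary classifier that accepts sparse features; which classifier is used and how it is trained is outside what this sketch covers.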


Citations
Journal ArticleDOI
TL;DR: The proposed social media crisis mapping platform for natural disasters uses locations from gazetteer, street map, and volunteered geographic information (VGI) sources for areas at risk of disaster and matches them to geoparsed real-time tweet data streams to generate real-time crisis maps.
Abstract: The proposed social media crisis mapping platform for natural disasters uses locations from gazetteer, street map, and volunteered geographic information (VGI) sources for areas at risk of disaster and matches them to geoparsed real-time tweet data streams. The authors use statistical analysis to generate real-time crisis maps. Geoparsing results are benchmarked against existing published work and evaluated across multilingual datasets. Two case studies compare five-day tweet crisis maps to official post-event impact assessment from the US National Geospatial Agency (NGA), compiled from verified satellite and aerial imagery sources.

363 citations


Cites background from "Tweet Analysis for Real-Time Event ..."

  • ...In the humanitarian sector, this has sparked great interest in developing [...] benefits of Twitter-based detection systems over sensor-based systems are their fast detection speed and low cost [3]. Social media GIS systems can be combined with conventional GIS systems deploying hardware-based sensors, such as in situ seismic sensors or remote sensing aerial photography and satellite imaging....

    [...]

  • ...Currently real-time Geospatial Information Systems (GIS) [2] mostly map social media microblog reports using geotag metadata with long/lat coordinates....

    [...]

  • ...Current Technology. Current real-time geospatial information systems (GIS) mostly map social media microblog reports using geotag metadata with longitude/latitude coordinates [2]. This approach turns social media into a crowdsourcing virtual sensor network, allowing maps of Twitter messages to be plotted....

    [...]

  • ...This geospatial information is stored in a MySQL database, along with any OpenGIS shape data for later visualization on a map....

    [...]

  • ...All match statistics are saved to the database as soon as they’re ready along with the OpenGIS geometry to plot on the crisis map....

    [...]

Journal ArticleDOI
TL;DR: A real-time monitoring system for traffic event detection from Twitter stream analysis that fetches tweets from Twitter according to several search criteria; processes tweets, by applying text mining techniques; and finally performs the classification of tweets.
Abstract: Social networks have been recently employed as a source of information for event detection, with particular reference to road traffic congestion and car accidents. In this paper, we present a real-time monitoring system for traffic event detection from Twitter stream analysis. The system fetches tweets from Twitter according to several search criteria; processes tweets, by applying text mining techniques; and finally performs the classification of tweets. The aim is to assign the appropriate class label to each tweet, as related to a traffic event or not. The traffic detection system was employed for real-time monitoring of several areas of the Italian road network, allowing for detection of traffic events almost in real time, often before online traffic news web sites. We employed the support vector machine as a classification model, and we achieved an accuracy value of 95.75% by solving a binary classification problem (traffic versus nontraffic tweets). We were also able to discriminate if traffic is caused by an external event or not, by solving a multiclass classification problem and obtaining an accuracy value of 88.89%.

303 citations


Cites background from "Tweet Analysis for Real-Time Event ..."

  • ...Network security is involved in organizations, enterprises, and other types of institutions....

    [...]

  • ...The most common and simple way of protecting a network resource is by assigning it a unique name and a corresponding password....

    [...]

  • ...Keyword: Traffic, Network, Network-based anonymization and processing (NAP)...

    [...]

Journal ArticleDOI
TL;DR: This article presents a discussion on eight open challenges for data stream mining, which cover the full cycle of knowledge discovery and involve such problems as protecting data privacy, dealing with legacy systems, handling incomplete and delayed information, analysis of complex data, and evaluation of stream mining algorithms.
Abstract: Every day, huge volumes of sensory, transactional, and web data are continuously generated as streams, which need to be analyzed online as they arrive. Streaming data can be considered as one of the main sources of what is called big data. While predictive modeling for data streams and big data have received a lot of attention over the last decade, many research approaches are typically designed for well-behaved controlled problem settings, overlooking important challenges imposed by real-world applications. This article presents a discussion on eight open challenges for data stream mining. Our goal is to identify gaps between current research and meaningful applications, highlight open problems, and define new application-relevant research directions for data stream mining. The identified challenges cover the full cycle of knowledge discovery and involve such problems as: protecting data privacy, dealing with legacy systems, handling incomplete and delayed information, analysis of complex data, and evaluation of stream mining algorithms. The resulting analysis is illustrated by practical applications and provides general suggestions concerning lines of future research in data stream mining.

260 citations


Cites methods from "Tweet Analysis for Real-Time Event ..."

  • ...For example, since the detection of events is a main prerequisite for analyzing them, the combination of EHA with methods for event detection [36] is an important challenge....

    [...]

Journal ArticleDOI
TL;DR: A big data driven approach for disaster response through sentiment analysis that helps the emergency responders and rescue personnel to develop better strategies for effective information management of the rapidly changing disaster environment.

226 citations

Journal ArticleDOI
TL;DR: In this paper, the 5W (What, Where, When, Who, and Why) model is proposed to detect and describe real-time urban emergency events, and the results show the accuracy and efficiency of the proposed method.
Abstract: Crowdsourcing is a process of acquisition, integration, and analysis of big and heterogeneous data generated by a diversity of sources in urban spaces, such as sensors, devices, vehicles, buildings, and humans. Nowadays in particular, no country, community, or person is immune to urban emergency events. Detection of urban emergency events, e.g., fires, storms, and traffic jams, is of great importance for protecting the security of humans. Recently, social media feeds have been rapidly emerging as a novel platform for providing and disseminating information that is often geographic. The content from social media usually includes references to urban emergency events occurring at, or affecting, specific locations. In this paper, in order to detect and describe real-time urban emergency events, the 5W (What, Where, When, Who, and Why) model is proposed. Firstly, users of social media are set as the target of crowdsourcing. Secondly, the spatial and temporal information from the social media is extracted to detect the real-time event. Thirdly, a GIS-based annotation of the detected urban emergency event is shown. The proposed method is evaluated with extensive case studies based on real urban emergency events. The results show the accuracy and efficiency of the proposed method.

206 citations

References
Journal ArticleDOI
TL;DR: Both optimal and suboptimal Bayesian algorithms for nonlinear/non-Gaussian tracking problems, with a focus on particle filters are reviewed.
Abstract: Increasingly, for many application areas, it is becoming important to include elements of nonlinearity and non-Gaussianity in order to model accurately the underlying dynamics of a physical system. Moreover, it is typically crucial to process data on-line as it arrives, both from the point of view of storage costs as well as for rapid adaptation to changing signal characteristics. In this paper, we review both optimal and suboptimal Bayesian algorithms for nonlinear/non-Gaussian tracking problems, with a focus on particle filters. Particle filters are sequential Monte Carlo methods based on point mass (or "particle") representations of probability densities, which can be applied to any state-space model and which generalize the traditional Kalman filtering methods. Several variants of the particle filter such as SIR, ASIR, and RPF are introduced within a generic framework of the sequential importance sampling (SIS) algorithm. These are discussed and compared with the standard EKF through an illustrative example.
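As a rough illustration of the SIR (sampling importance resampling) variant named in this abstract, the following is a minimal sketch of one predict-weight-resample step for a 2-D location. The random-walk motion model, the Gaussian likelihood, and the names `sir_step`, `estimate`, `noise_std`, and `obs_std` are assumptions chosen for illustration, not the formulation of the reviewed paper or of the earthquake system that cites it.

```python
import math
import random

def sir_step(particles, observation, noise_std=0.5, obs_std=1.0, rng=random):
    """One SIR step for particles given as a list of (x, y) tuples.

    observation is a single noisy (x, y) position report.
    noise_std and obs_std are hypothetical model parameters.
    """
    ox, oy = observation

    # Predict: move each particle with an assumed random-walk model.
    moved = [(x + rng.gauss(0.0, noise_std), y + rng.gauss(0.0, noise_std))
             for x, y in particles]

    # Weight: assumed isotropic Gaussian likelihood of the observation.
    weights = [math.exp(-((x - ox) ** 2 + (y - oy) ** 2) / (2.0 * obs_std ** 2))
               for x, y in moved]

    # Guard against total numerical underflow by falling back to uniform.
    if sum(weights) == 0.0:
        weights = [1.0] * len(moved)

    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))

def estimate(particles):
    """Point estimate: the mean of the (equally weighted) particle set."""
    n = len(particles)
    return (sum(x for x, _ in particles) / n,
            sum(y for _, y in particles) / n)
```

Calling `sir_step` once per incoming observation and then `estimate` on the result gives a running location estimate. Resampling at every step keeps the code short, at the cost of some particle diversity.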

11,409 citations

Journal ArticleDOI
TL;DR: Consider writing, perhaps the first information technology: The ability to capture a symbolic representation of spoken language for long-term storage freed information from the limits of individual memory.
Abstract: Specialized elements of hardware and software, connected by wires, radio waves and infrared, will soon be so ubiquitous that no-one will notice their presence.

9,073 citations

Book ChapterDOI
21 Apr 1998
TL;DR: This paper explores the use of Support Vector Machines for learning text classifiers from examples and analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task.
Abstract: This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and behave robustly over a variety of different learning tasks. Furthermore they are fully automatic, eliminating the need for manual parameter tuning.

8,658 citations


"Tweet Analysis for Real-Time Event ..." refers methods in this paper

  • ...To classify a tweet as a positive class or a negative class, we use a support vector machine [14], which is a widely used machine-learning algorithm....

    [...]

Proceedings ArticleDOI
26 Apr 2010
TL;DR: In this paper, the authors have crawled the entire Twittersphere and found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks.
Abstract: Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological characteristics of Twitter and its power as a new medium of information sharing. We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks [28]. In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap in influence inferred from the number of followers and that from the popularity of one's tweets. We have analyzed the tweets of top trending topics and reported on their temporal behavior and user participation. We have classified the trending topics based on the active period and the tweets and show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet is to reach an average of 1,000 users no matter what the number of followers is of the original tweet. Once retweeted, a tweet gets retweeted almost instantly on next hops, signifying fast diffusion of information after the 1st retweet. To the best of our knowledge this work is the first quantitative study on the entire Twittersphere and information diffusion on it.

6,108 citations

Journal Article
TL;DR: In this article, the authors propose that specialized elements of hardware and software, connected by wires, radio waves and infrared, will soon be so ubiquitous that no-one will notice their presence.
Abstract: Specialized elements of hardware and software, connected by wires, radio waves and infrared, will soon be so ubiquitous that no-one will notice their presence.

5,041 citations