Tweeting Traffic: Analyzing Twitter for generating real-time city traffic insights and predictions

doi:10.1145/2778865.2778874

Proceedings Article•DOI•

Priyam Tejaswin¹, Rohan Kumar¹, Siddharth Gupta¹•Institutions (1)

VIT University¹

20 Mar 2015-pp 9

TL;DR: The method utilizes background knowledge from structured data repositories for entity extraction from tweets for traffic incident clustering and prediction, and presents the Continuous Traffic Management Dashboard system: an automated computer system for generating real-time, historic and predictive traffic insights.

Abstract: Crowd sourced road traffic management is an open, unexplored problem in data science. With the growth of mobile communications and social media networks, more people are expressing their traffic situations in real-time. We explore how this social media data can be analyzed to generate valuable insights, useful for traffic management and city planning. Our method utilizes background knowledge from structured data repositories for entity extraction from tweets. We proceed to use this spatio-temporal data for traffic incident clustering and prediction. With accuracy and precision measurements providing encouraging results, we build on our methods and present our Continuous Traffic Management Dashboard (CTMD) system: an automated computer system for generating real-time, historic and predictive traffic insights.
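
The abstract does not say which repositories or matching rules the authors use, so the following is only a minimal sketch of one common way to do gazetteer-based entity extraction: place names from a background list are matched against tweet tokens, longest match first. The gazetteer entries, the tokenizer, and the function names are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical gazetteer: in practice this would be loaded from a structured
# data repository of road and landmark names. These entries are made up.
GAZETTEER = {"central bridge", "north ring road", "main street"}
MAX_NGRAM = max(len(name.split()) for name in GAZETTEER)


def tokenize(text):
    """Lowercase the text and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_locations(tweet):
    """Return gazetteer entries found in the tweet, preferring longer matches."""
    tokens = tokenize(tweet)
    found = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate n-gram first, then shorter ones.
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]) in GAZETTEER:
                match_len = n
                break
        if match_len:
            found.append(" ".join(tokens[i:i + match_len]))
            i += match_len
        else:
            i += 1
    return found


locations = extract_locations(
    "Huge jam near Central Bridge, avoid North Ring Road! #traffic"
)
```

Pairing each extracted location with the tweet timestamp would give the kind of spatio-temporal record that clustering and prediction steps could consume.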

Citations

Journal Article•DOI•

Social media based transportation research: the state of the work and the networking

Yisheng Lv¹, Yuanyuan Chen¹, Xiqiao Zhang², Yanjie Duan¹, Naiqiang Li Li•Institutions (2)

Chinese Academy of Sciences¹, Harbin Institute of Technology²

16 Jan 2017-IEEE/CAA Journal of Automatica Sinica

TL;DR: This paper reviews social media based transportation research with social network analysis methods, summarizes the main research topics in this field, and reports collaboration patterns at the levels of researchers, institutions, and countries.

Abstract: Recently, there has been an increased interest in the use of social media data as important traffic information sources. In this paper, we review social media based transportation research with social network analysis methods. We summarize main research topics in this field, and report collaboration patterns at levels of researchers, institutions, and countries, respectively. Finally, some future research directions are identified.

87 citations

Cites background from "Tweeting Traffic: Analyzing Twitter..."

...data to cluster and predict traffic incidents [35]....

Journal Article•DOI•

Detecting Traffic Information From Social Media Texts With Deep Learning Approaches

Yuanyuan Chen¹, Yisheng Lv¹, Xiao Wang¹, Lingxi Li², Fei-Yue Wang¹•Institutions (2)

Chinese Academy of Sciences¹, Indiana University – Purdue University Indianapolis²

01 Aug 2019-IEEE Transactions on Intelligent Transportation Systems

TL;DR: This paper applies the continuous bag-of-word model to learn word embedding representations based on a data set of three billion microblogs, and proposes using convolutional neural networks, long short-term memory models and their combination LSTM-CNN and the multi-layer perceptron model based on word vector features to extract traffic relevant microblogs.

Abstract: Mining traffic-relevant information from social media data has become an emerging topic due to the real-time and ubiquitous features of social media. In this paper, we focus on a specific problem in social media mining which is to extract traffic relevant microblogs from Sina Weibo, a Chinese microblogging platform. It is transformed into a machine learning problem of short text classification. First, we apply the continuous bag-of-word model to learn word embedding representations based on a data set of three billion microblogs. Compared to the traditional one-hot vector representation of words, word embedding can capture semantic similarity between words and has been proved effective in natural language processing tasks. Next, we propose using convolutional neural networks (CNNs), long short-term memory (LSTM) models and their combination LSTM-CNN to extract traffic relevant microblogs with the learned word embeddings as inputs. We compare the proposed methods with competitive approaches, including the support vector machine (SVM) model based on a bag of n-gram features, the SVM model based on word vector features, and the multi-layer perceptron model based on word vector features. Experiments show the effectiveness of the proposed deep learning approaches.
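
The contrast between one-hot vectors and word embeddings can be illustrated with a small sketch. The vocabulary and the two-dimensional embedding values below are hypothetical toy numbers chosen for illustration, not vectors learned by the continuous bag-of-words model in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical three-word vocabulary.
VOCAB = ["jam", "congestion", "concert"]


def one_hot(word):
    """One-hot vector: a single 1 at the word's index in VOCAB."""
    return [1.0 if w == word else 0.0 for w in VOCAB]


# Hypothetical dense embeddings (toy values, not learned from data).
EMBEDDINGS = {
    "jam": [0.9, 0.1],
    "congestion": [0.8, 0.2],
    "concert": [0.1, 0.9],
}

# Distinct one-hot vectors are orthogonal, so their similarity is always zero.
one_hot_sim = cosine(one_hot("jam"), one_hot("congestion"))

# Dense vectors can place related words close together.
related_sim = cosine(EMBEDDINGS["jam"], EMBEDDINGS["congestion"])
unrelated_sim = cosine(EMBEDDINGS["jam"], EMBEDDINGS["concert"])
```

With one-hot vectors every pair of distinct words is equally dissimilar, whereas dense vectors can give related words a higher similarity than unrelated ones.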

72 citations

Journal Article•DOI•

Sensing and detecting traffic events using geosocial media data: A review

Shishuo Xu¹,², Songnian Li¹, Richard Wen¹•Institutions (2)

Ryerson University¹, China University of Mining and Technology²

01 Nov 2018-Computers, Environment and Urban Systems

TL;DR: A systematic review of a wide variety of techniques applied in detecting traffic events from geosocial media data, arranged based on their adoption in each stage of an event detection framework developed from the literature review is presented.

33 citations

Proceedings Article•DOI•

Towards Real-Time Road Traffic Analytics using Telco Big Data

Constantinos Costa¹, Georgios Chatzimilioudis¹, Demetrios Zeinalipour-Yazti², Mohamed F. Mokbel³•Institutions (3)

University of Cyprus¹, Max Planck Society², University of Minnesota³

28 Aug 2017

TL;DR: The components of the Traffic-TBD (Traffic Telco Big Data) architecture are outlined, which aims to become an innovative road traffic analytic and prediction system with the following desiderata: provide micro-level traffic modeling and prediction that goes beyond the current state provided by Internet-based navigation enterprises utilizing crowdsourcing.

Abstract: A telecommunication company (telco) is traditionally only perceived as the entity that provides telecommunication services, such as telephony and data communication access to users. However, the IP backbone infrastructure of such entities spanning densely urban spaces and widely rural areas, provides nowadays a unique opportunity to collect immense amounts of mobility data that can provide valuable insights for road traffic management and avoidance. In this paper we outline the components of the Traffic-TBD (Traffic Telco Big Data) architecture, which aims to become an innovative road traffic analytic and prediction system with the following desiderata: i) provide micro-level traffic modeling and prediction that goes beyond the current state provided by Internet-based navigation enterprises utilizing crowdsourcing; ii) retain the location privacy boundaries of users inside their mobile network operator, to avoid the risks of exposing location data to third-party mobile applications; and iii) be available with minimal costs and using existing infrastructure (i.e., cell towers and TBD data streams are readily available inside a telco). Road traffic understanding, management and analytics can minimize the number of road accidents, optimize fuel and energy consumption, avoid unexpected delays, contribute to a macroscopic spatio-temporal understanding of traffic in cities but also to "smart" societies through applications in city planning, public transportation, logistics and fleet management for enterprises, startups and governmental bodies.

26 citations

Cites background from "Tweeting Traffic: Analyzing Twitter..."

...5E based on floating vehicle data [15, 21] and crowdsourced data [7, 12] including data from social networks [20]....

Journal Article•DOI•

Social media as passive geo-participation in transportation planning – how effective are topic modeling & sentiment analysis in comparison with citizen surveys?

Oliver Lock¹, Christopher Pettit¹•Institutions (1)

University of New South Wales¹

21 Sep 2020-Geo-spatial Information Science

TL;DR: This research forms a case study of the use of passively collected forms of big data in cities, focusing on Sydney, Australia. It examines social media data related to public transport performance and forms key recommendations for developing Smart Cities.

Abstract: We live in an era of rapid urbanization as many cities are experiencing an unprecedented rate of population growth and congestion. Public transport is playing an increasingly important role in urba...

24 citations

References

Journal Article•DOI•

Random Forests

Leo Breiman¹•Institutions (1)

University of California, Berkeley¹

01 Oct 2001

TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.

Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, aaa, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
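
The two sources of randomness the abstract describes (an independently sampled random vector per tree, and a random selection of features at each split) can be sketched in a few lines. This is a simplified illustration rather than Breiman's reference implementation: it uses bootstrap sampling, Gini impurity, shallow trees, and majority voting, and it omits the internal (out-of-bag) estimates. All function names, parameters, and the toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, float("inf")
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, n_features, rng, depth=0, max_depth=3):
    """Grow a tree; each node considers only a random subset of features."""
    if len(set(labels)) == 1 or depth == max_depth:
        return majority(labels)
    feature_ids = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, feature_ids)
    if split is None:
        return majority(labels)
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left],
                   n_features, rng, depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right],
                   n_features, rng, depth + 1, max_depth),
    )


def predict_tree(tree, row):
    """Walk from the root to a leaf; leaves are plain (non-tuple) labels."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_features, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    return majority([predict_tree(tree, row) for tree in forest])


# Toy data: two well-separated classes in two dimensions.
rows = [(0, 1), (1, 0), (1, 2), (2, 1), (9, 10), (10, 9), (10, 11), (11, 10)]
labels = ["a"] * 4 + ["b"] * 4
forest = fit_forest(rows, labels)
```

Calling `predict_forest(forest, (0, 0))` then returns the vote of all 25 trees for a point near the first cluster.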

79,257 citations

Posted Content•

NLTK: The Natural Language Toolkit

Edward Loper, Steven Bird

17 May 2002-arXiv: Computation and Language

TL;DR: NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware that covers symbolic and statistical natural language processing.

Abstract: NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware. NLTK covers symbolic and statistical natural language processing, and is interfaced to annotated corpora. Students augment and replace existing components, learn structured programming by example, and manipulate sophisticated models from the outset.

3,345 citations

Proceedings Article•DOI•

NLTK: The Natural Language Toolkit

Steven Bird¹•Institutions (1)

University of Pennsylvania¹

17 Jul 2006

TL;DR: The Natural Language Toolkit has been rewritten, simplifying many linguistic data structures and taking advantage of recent enhancements in the Python language.

Abstract: The Natural Language Toolkit is a suite of program modules, data sets and tutorials supporting research and teaching in computational linguistics and natural language processing. NLTK is written in Python and distributed under the GPL open source license. Over the past year the toolkit has been rewritten, simplifying many linguistic data structures and taking advantage of recent enhancements in the Python language. This paper reports on the simplified toolkit and explains how it is used in teaching NLP.

2,835 citations

Proceedings Article•DOI•

Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments

Kevin Gimpel¹, Nathan Schneider¹, Brendan O'Connor¹, Dipanjan Das¹, Daniel Mills¹, Jacob Eisenstein¹, Michael Heilman¹, Dani Yogatama¹, Jeffrey Flanigan¹, Noah A. Smith¹•Institutions (1)

Carnegie Mellon University¹

19 Jun 2011

TL;DR: A tagset is developed, data is annotated, features are developed, and results nearing 90% accuracy are reported on the problem of part-of-speech tagging for English data from the popular micro-blogging service Twitter.

Abstract: We address the problem of part-of-speech tagging for English data from the popular micro-blogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.

1,053 citations

Journal Article•

A Survey of Binary Similarity and Distance Measures

Seung-Seok Choi, Sung-Hyuk Cha, Charles C. Tappert

01 Feb 2010-Journal on Systemics, Cybernetics and Informatics

TL;DR: This work has collected 76 binary similarity and distance measures used over the last century and reveals their correlations through the hierarchical clustering technique.

Abstract: The binary feature vector is one of the most common representations of patterns and measuring similarity and distance measures play a critical role in many problems such as clustering, classification, etc. Ever since Jaccard proposed a similarity measure to classify ecological species in 1901, numerous binary similarity and distance measures have been proposed in various fields. Applying appropriate measures results in more accurate data analysis. Notwithstanding, few comprehensive surveys on binary measures have been conducted. Hence we collected 76 binary similarity and distance measures used over the last century and reveal their correlations through the hierarchical clustering technique.
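
As a concrete instance of the measures surveyed, the Jaccard similarity of two binary feature vectors is the number of positions where both are 1 divided by the number of positions where at least one is 1. The sketch below is a minimal illustration; the function names are hypothetical, and returning 1.0 for two all-zero vectors is an assumed convention, since the ratio is undefined in that case.

```python
def jaccard_similarity(x, y):
    """Jaccard similarity of two equal-length binary vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    both = sum(1 for a, b in zip(x, y) if a and b)    # positions with 1 in both
    either = sum(1 for a, b in zip(x, y) if a or b)   # positions with 1 in at least one
    # Assumption: two all-zero vectors are treated as identical.
    return both / either if either else 1.0


def jaccard_distance(x, y):
    """Jaccard distance is one minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(x, y)


# One shared 1 out of three positions that are 1 in either vector.
similarity = jaccard_similarity([1, 1, 0, 0], [1, 0, 1, 0])
```

Positions where both vectors are 0 do not affect the result, which is one of the properties that distinguishes Jaccard from measures that count negative matches.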

799 citations