Proceedings Article

Tweeting Traffic: Analyzing Twitter for generating real-time city traffic insights and predictions

20 Mar 2015-pp 9
TL;DR: The method utilizes background knowledge from structured data repositories for entity extraction from tweets for traffic incident clustering and prediction, and presents the Continuous Traffic Management Dashboard system: an automated computer system for generating real-time, historic and predictive traffic insights.
Abstract: Crowdsourced road traffic management is an open, unexplored problem in data science. With the growth of mobile communications and social media networks, more people are reporting their traffic situations in real time. We explore how this social media data can be analyzed to generate valuable insights, useful for traffic management and city planning. Our method utilizes background knowledge from structured data repositories for entity extraction from tweets. We proceed to use this spatio-temporal data for traffic incident clustering and prediction. With accuracy and precision measurements providing encouraging results, we build on our methods and present our Continuous Traffic Management Dashboard (CTMD) system: an automated computer system for generating real-time, historic and predictive traffic insights.
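The entity-extraction step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the background knowledge is a small gazetteer of place names mapped to types; the gazetteer entries, the function name `extract_entities`, and the sample tweets are hypothetical and are not taken from the paper, whose actual method may differ.

```python
import re

# Hypothetical gazetteer standing in for a structured data repository
# (assumption: place name -> entity type; not from the paper).
GAZETTEER = {
    "mg road": "road",
    "silk board": "junction",
    "outer ring road": "road",
}

def extract_entities(tweet, gazetteer=GAZETTEER):
    """Return (name, type) pairs for gazetteer entries found in the tweet."""
    # Lowercase, replace punctuation with spaces, and collapse whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", tweet.lower())
    text = re.sub(r"\s+", " ", text).strip()
    found = []
    # Longer names are checked first, so results run from longest to shortest.
    for name in sorted(gazetteer, key=len, reverse=True):
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, gazetteer[name]))
    return found

print(extract_entities("Accident on MG Road near Silk Board, avoid!"))
```

Exact matching like this misses hashtag forms such as `#MGRoad`; a fuller pipeline would need normalization rules or fuzzy matching for those cases.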
Citations
Journal Article
TL;DR: This paper reviews social media based transportation research with social network analysis methods, and summarizes main research topics in this field, and reports collaboration patterns at levels of researchers, institutions, and countries.
Abstract: Recently, there has been an increased interest in the use of social media data as important traffic information sources. In this paper, we review social media based transportation research with social network analysis methods. We summarize main research topics in this field, and report collaboration patterns at levels of researchers, institutions, and countries, respectively. Finally, some future research directions are identified.

87 citations


Cites background from "Tweeting Traffic: Analyzing Twitter..."

  • ...data to cluster and predict traffic incidents [35]....


Journal Article
TL;DR: This paper applies the continuous bag-of-word model to learn word embedding representations based on a data set of three billion microblogs, and proposes using convolutional neural networks, long short-term memory models and their combination LSTM-CNN and the multi-layer perceptron model based on word vector features to extract traffic relevant microblogs.
Abstract: Mining traffic-relevant information from social media data has become an emerging topic due to the real-time and ubiquitous features of social media. In this paper, we focus on a specific problem in social media mining which is to extract traffic relevant microblogs from Sina Weibo, a Chinese microblogging platform. It is transformed into a machine learning problem of short text classification. First, we apply the continuous bag-of-word model to learn word embedding representations based on a data set of three billion microblogs. Compared to the traditional one-hot vector representation of words, word embedding can capture semantic similarity between words and has been proved effective in natural language processing tasks. Next, we propose using convolutional neural networks (CNNs), long short-term memory (LSTM) models and their combination LSTM-CNN to extract traffic relevant microblogs with the learned word embeddings as inputs. We compare the proposed methods with competitive approaches, including the support vector machine (SVM) model based on a bag of n-gram features, the SVM model based on word vector features, and the multi-layer perceptron model based on word vector features. Experiments show the effectiveness of the proposed deep learning approaches.

72 citations

Journal Article
TL;DR: A systematic review of a wide variety of techniques applied in detecting traffic events from geosocial media data, arranged based on their adoption in each stage of an event detection framework developed from the literature review is presented.

33 citations

Proceedings Article
28 Aug 2017
TL;DR: The components of the Traffic-TBD (Traffic Telco Big Data) architecture are outlined, which aims to become an innovative road traffic analytic and prediction system with the following desiderata: provide micro-level traffic modeling and prediction that goes beyond the current state provided by Internet-based navigation enterprises utilizing crowdsourcing.
Abstract: A telecommunication company (telco) is traditionally only perceived as the entity that provides telecommunication services, such as telephony and data communication access to users. However, the IP backbone infrastructure of such entities spanning densely urban spaces and widely rural areas, provides nowadays a unique opportunity to collect immense amounts of mobility data that can provide valuable insights for road traffic management and avoidance. In this paper we outline the components of the Traffic-TBD (Traffic Telco Big Data) architecture, which aims to become an innovative road traffic analytic and prediction system with the following desiderata: i) provide micro-level traffic modeling and prediction that goes beyond the current state provided by Internet-based navigation enterprises utilizing crowdsourcing; ii) retain the location privacy boundaries of users inside their mobile network operator, to avoid the risks of exposing location data to third-party mobile applications; and iii) be available with minimal costs and using existing infrastructure (i.e., cell towers and TBD data streams are readily available inside a telco). Road traffic understanding, management and analytics can minimize the number of road accidents, optimize fuel and energy consumption, avoid unexpected delays, contribute to a macroscopic spatio-temporal understanding of traffic in cities but also to "smart" societies through applications in city planning, public transportation, logistics and fleet management for enterprises, startups and governmental bodies.

26 citations


Cites background from "Tweeting Traffic: Analyzing Twitter..."

  • ...5E based on floating vehicle data [15, 21] and crowdsourced data [7, 12] including data from social networks [20]....


Journal Article
TL;DR: This research forms a case study of the use of passively collected forms of big data in cities – focusing on Sydney, Australia – and examines social media data related to public transport performance and key recommendations for developing Smart Cities were formed.
Abstract: We live in an era of rapid urbanization as many cities are experiencing an unprecedented rate of population growth and congestion. Public transport is playing an increasingly important role in urba...

24 citations

References
Journal Article
01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, aaa, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.

79,257 citations

Posted Content
TL;DR: NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware that covers symbolic and statistical natural language processing.
Abstract: NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware. NLTK covers symbolic and statistical natural language processing, and is interfaced to annotated corpora. Students augment and replace existing components, learn structured programming by example, and manipulate sophisticated models from the outset.

3,345 citations

Proceedings Article
17 Jul 2006
TL;DR: The Natural Language Toolkit has been rewritten, simplifying many linguistic data structures and taking advantage of recent enhancements in the Python language.
Abstract: The Natural Language Toolkit is a suite of program modules, data sets and tutorials supporting research and teaching in computational linguistics and natural language processing. NLTK is written in Python and distributed under the GPL open source license. Over the past year the toolkit has been rewritten, simplifying many linguistic data structures and taking advantage of recent enhancements in the Python language. This paper reports on the simplified toolkit and explains how it is used in teaching NLP.

2,835 citations

Proceedings Article
19 Jun 2011
TL;DR: A tagset is developed, data is annotated, features are developed, and results nearing 90% accuracy are reported on the problem of part-of-speech tagging for English data from the popular micro-blogging service Twitter.
Abstract: We address the problem of part-of-speech tagging for English data from the popular micro-blogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.

1,053 citations

Journal Article
TL;DR: This work has collected 76 binary similarity and distance measures used over the last century and reveals their correlations through the hierarchical clustering technique.
Abstract: The binary feature vector is one of the most common representations of patterns and measuring similarity and distance measures play a critical role in many problems such as clustering, classification, etc. Ever since Jaccard proposed a similarity measure to classify ecological species in 1901, numerous binary similarity and distance measures have been proposed in various fields. Applying appropriate measures results in more accurate data analysis. Notwithstanding, few comprehensive surveys on binary measures have been conducted. Hence we collected 76 binary similarity and distance measures used over the last century and reveal their correlations through the hierarchical clustering technique.

799 citations
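
The Jaccard measure named in the abstract above can be sketched for binary feature vectors as follows. This is a minimal illustration, not code from the survey; the function names are hypothetical, and returning 1.0 when both vectors are all zeros is an assumed convention, since the ratio is undefined in that case.

```python
def jaccard_similarity(x, y):
    """Jaccard similarity of two equal-length binary vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Positions where both vectors are 1 (intersection).
    both = sum(1 for a, b in zip(x, y) if a and b)
    # Positions where at least one vector is 1 (union).
    either = sum(1 for a, b in zip(x, y) if a or b)
    # Assumed convention: two all-zero vectors count as identical.
    return both / either if either else 1.0

def jaccard_distance(x, y):
    """Jaccard distance, defined as one minus the similarity."""
    return 1.0 - jaccard_similarity(x, y)

print(jaccard_similarity([1, 1, 0, 0], [1, 0, 1, 0]))
```

In this example the vectors share one active position out of three active positions overall, so the similarity is one third.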