Author

Christian Sengstock

Bio: Christian Sengstock is an academic researcher from Heidelberg University. The author has contributed to research in topics: Topic model & Association rule learning. The author has an h-index of 6 and has co-authored 14 publications receiving 375 citations.

Papers
Journal ArticleDOI
01 Aug 2013
TL;DR: This work presents a novel framework to detect localized events in real-time from a Twitter stream and to track the evolution of such events over time, using a stream of tweets from Europe during the 2012 UEFA European Football Championship.
Abstract: Microblogging services such as Twitter, Facebook, and Foursquare have become major sources for information about real-world events. Most approaches that aim at extracting event information from such sources typically use the temporal context of messages. However, exploiting the location information of georeferenced messages is also important to detect localized events, such as public events or emergency situations. Users posting messages close to the location of an event serve as human sensors that describe the event. In this demonstration, we present a novel framework to detect localized events in real-time from a Twitter stream and to track the evolution of such events over time. For this, spatio-temporal characteristics of keywords are continuously extracted to identify meaningful candidates for event descriptions. Then, localized event information is extracted by clustering keywords according to their spatial similarity. To determine the most important events in a (recent) time frame, we introduce a scoring scheme for events. We demonstrate the functionality of our system, called EvenTweet, using a stream of tweets from Europe during the 2012 UEFA European Football Championship.

268 citations

Proceedings ArticleDOI
06 Nov 2012
TL;DR: This work proposes a framework that transforms the unstructured and noisy geographic information in social media into a high-dimensional multivariate signal of geographic semantics, and uses dimensionality reduction to extract latent geographic features.
Abstract: In this work we present a framework for the unsupervised extraction of latent geographic features from georeferenced social media. A geographic feature represents a semantic dimension of a location and can be seen as a sensor that measures a signal of geographic semantics. Our goal is to extract a small number of informative geographic features from social media, to describe and explore geographic space, and for subsequent spatial analysis, e.g., in market research. We propose a framework that, first, transforms the unstructured and noisy geographic information in social media into a high-dimensional multivariate signal of geographic semantics. Then, we use dimensionality reduction to extract latent geographic features. We conduct experiments using two large-scale Flickr data sets covering the LA area and the US. We show that dimensionality reduction techniques extracting sparse latent features find dimensions with higher informational value. In addition, we show that prior normalization can be used as a parameter in the exploration process to extract features representing different geographic characteristics, that is, landmarks, regional phenomena, or global phenomena.

41 citations

Proceedings ArticleDOI
28 Mar 2011
TL;DR: CONQUER is a query suggestion system that efficiently suggests queries for a given partial query and a number of available query context observations, using a suggestion model based on the combined probabilities of sequential query patterns and context observations.
Abstract: Many of today's search engines provide autocompletion while the user is typing a query string. This type of dynamic query suggestion can help users to formulate queries that better represent their search intent during Web search interactions. In this paper, we demonstrate our query suggestion system called CONQUER, which efficiently suggests queries for a given partial query and a number of available query context observations. The context-awareness allows for suggesting queries tailored to a given context, e.g., the user location or the time of day. CONQUER uses a suggestion model that is based on the combined probabilities of sequential query patterns and context observations. For this, the weight of a context in a query suggestion can be adjusted online, for example, based on the learned user behavior or user profiles. We demonstrate the functionality of CONQUER based on 6 million queries from an AOL query log using the time of day and the country domain of the clicked URLs in the search result as context observations.

28 citations

Proceedings ArticleDOI
05 Nov 2013
TL;DR: This paper proposes a novel graph-based regularization procedure that uses spatial co-occurrences of bursty words and allows for computing sound spatial signatures, and evaluates the functionality of the online processing framework using two real-world Twitter datasets.
Abstract: Social networking and microblogging services such as Twitter provide a continuous source of data from which useful information can be extracted. The detection and characterization of bursty words play an important role in processing such data, as bursty words might hint at events or trending topics of social importance upon which actions can be triggered. While there are several approaches to extract bursty words from the content of messages, there is little work that deals with the dynamics of continuous streams of messages, in particular messages that are geo-tagged. In this paper, we present a framework to identify bursty words from Twitter text streams and to describe such words in terms of their spatio-temporal characteristics. Using a time-aware word usage baseline, a sliding window approach over incoming tweets is proposed to identify words that satisfy some burstiness threshold. For these words, a time-varying spatial signature is then determined, which primarily relies on geo-tagged tweets. In order to deal with the noise and the sparsity of geo-tagged tweets, we propose a novel graph-based regularization procedure that uses spatial co-occurrences of bursty words and allows for computing sound spatial signatures. We evaluate the functionality of our online processing framework using two real-world Twitter datasets. The results show that our framework can efficiently and reliably extract bursty words and describe their spatio-temporal evolution over time.

21 citations

Proceedings ArticleDOI
10 Dec 2012
TL;DR: A new general class of interestingness measures based on the spatial distribution of co-location patterns makes it possible to judge the interestingness of a pattern based on properties of the underlying spatial feature distribution.
Abstract: Co-location pattern mining aims at finding subsets of spatial features frequently located together in spatial proximity. The underlying motivation is to model the spatial correlation structure between the features. This allows the discovery of interesting co-location rules (feature interactions) for spatial analysis and prediction tasks. As in association rule mining, a major problem is the huge number of possible patterns and rules. Hence, measures are needed to identify interesting patterns and rules. Existing approaches have so far focused on finding frequent patterns, patterns including rare features, and patterns occurring in small (local) regions. In this paper, we present a new general class of interestingness measures that are based on the spatial distribution of co-location patterns. These measures make it possible to judge the interestingness of a pattern based on properties of the underlying spatial feature distribution. The results are different from standard measures like participation index or confidence. To demonstrate the usefulness of these measures, we apply our approach to the discovery of rules on a subset of the OpenStreetMap point-of-interest data.

21 citations


Cited by
Journal ArticleDOI
TL;DR: A broad survey of this relatively young field of spatio-temporal data mining is presented, and literature is classified into six major categories: clustering, predictive learning, change detection, frequent pattern mining, anomaly detection, and relationship mining.
Abstract: Large volumes of spatio-temporal data are increasingly collected and studied in diverse domains, including climate science, social sciences, neuroscience, epidemiology, transportation, mobile health, and Earth sciences. Spatio-temporal data differ from relational data, for which computational approaches have been developed in the data-mining community for multiple decades, in that both spatial and temporal attributes are available in addition to the actual measurements/attributes. The presence of these attributes introduces additional challenges that need to be dealt with. Approaches for mining spatio-temporal data have been studied for over a decade in the data-mining community. In this article, we present a broad survey of this relatively young field of spatio-temporal data mining. We discuss different types of spatio-temporal data and the relevant data-mining questions that arise in the context of analyzing each of these datasets. Based on the nature of the data-mining problem studied, we classify literature on spatio-temporal data mining into six major categories: clustering, predictive learning, change detection, frequent pattern mining, anomaly detection, and relationship mining. We discuss the various forms of spatio-temporal data-mining problems in each of these categories.

266 citations

Journal ArticleDOI
TL;DR: An up-to-date survey of itemset mining problems is provided, and the relationship to other popular pattern mining problems, such as sequential pattern mining, episode mining, subgraph mining, and association rule mining, is discussed.
Abstract: Itemset mining is an important subfield of data mining, which consists of discovering interesting and useful patterns in transaction databases. The traditional task of frequent itemset mining is to discover groups of items (itemsets) that appear frequently together in transactions made by customers. Although itemset mining was designed for market basket analysis, it can be viewed more generally as the task of discovering groups of attribute values frequently co-occurring in databases. Because of its numerous applications in domains such as bioinformatics, text mining, product recommendation, e-learning, and web click stream analysis, itemset mining has become a popular research area. This study provides an up-to-date survey that can serve both as an introduction and as a guide to recent advances and opportunities in the field. The problem of frequent itemset mining and its applications are described. Moreover, the main approaches and strategies to solve itemset mining problems are presented, along with their characteristics. Limitations of traditional frequent itemset mining approaches are also highlighted, and extensions of the task of itemset mining are presented, such as high-utility itemset mining, rare itemset mining, fuzzy itemset mining, and uncertain itemset mining. This study also discusses research opportunities and the relationship to other popular pattern mining problems, such as sequential pattern mining, episode mining, subgraph mining, and association rule mining. The main open-source libraries of itemset mining implementations are also briefly presented. WIREs Data Mining Knowl Discov 2017, 7:e1207. doi: 10.1002/widm.1207

197 citations

Journal ArticleDOI
TL;DR: A semantic topic model classification and spatial autocorrelation analysis are applied to detect tweets indicating specific human social activities. The results show an overall strong positive correlation with workplace population census data, suggesting that such tweets are a good indicator and representative proxy for analyzing workplace-based activities.

146 citations

Proceedings ArticleDOI
03 Apr 2017
TL;DR: CrossMap, a novel cross-modal representation learning method that uncovers urban dynamics with massive GTSM data, is presented; it not only significantly outperforms state-of-the-art methods for activity recovery and classification, but also achieves much better efficiency.
Abstract: With the ever-increasing urbanization process, systematically modeling people's activities in the urban space is being recognized as a crucial socioeconomic task. This task was nearly impossible years ago due to the lack of reliable data sources, yet the emergence of geo-tagged social media (GTSM) data sheds new light on it. Recently, there have been fruitful studies on discovering geographical topics from GTSM data. However, their high computational costs and strong distributional assumptions about the latent topics hinder them from fully unleashing the power of GTSM. To bridge the gap, we present CrossMap, a novel cross-modal representation learning method that uncovers urban dynamics with massive GTSM data. CrossMap first employs an accelerated mode seeking procedure to detect spatiotemporal hotspots underlying people's activities. Those detected hotspots not only address spatiotemporal variations, but also largely alleviate the sparsity of the GTSM data. With the detected hotspots, CrossMap then jointly embeds all spatial, temporal, and textual units into the same space using two different strategies: one is reconstruction-based and the other is graph-based. Both strategies capture the correlations among the units by encoding their co-occurrence and neighborhood relationships, and learn low-dimensional representations to preserve such correlations. Our experiments demonstrate that CrossMap not only significantly outperforms state-of-the-art methods for activity recovery and classification, but also achieves much better efficiency.

142 citations

Journal ArticleDOI
TL;DR: The STMP integrates heterogeneous big data streams, such as the IoT, smart sensors, and social media, to detect concept drifts, distinguish between recurrent and non-recurrent traffic events, and support impact propagation, traffic flow forecasting, commuter sentiment analysis, and optimized traffic control decisions.
Abstract: The technological landscape of intelligent transport systems (ITS) has been radically transformed by the emergence of the big data streams generated by the Internet of Things (IoT), smart sensors, surveillance feeds, social media, as well as growing infrastructure needs. It is timely and pertinent that ITS harness the potential of artificial intelligence (AI) to develop big data-driven smart traffic management solutions for effective decision-making. The existing AI techniques that function in isolation exhibit clear limitations in developing a comprehensive platform due to the dynamicity of big data streams, high-frequency unlabeled data generation from heterogeneous data sources, and volatility of traffic conditions. In this paper, we propose an expansive smart traffic management platform (STMP) based on unsupervised online incremental machine learning, deep learning, and deep reinforcement learning to address these limitations. The STMP integrates heterogeneous big data streams, such as the IoT, smart sensors, and social media, to detect concept drifts, distinguish between recurrent and non-recurrent traffic events, and support impact propagation, traffic flow forecasting, commuter sentiment analysis, and optimized traffic control decisions. The platform is successfully demonstrated on 190 million records of smart sensor network traffic data generated by 545,851 commuters and corresponding social media data on the arterial road network of Victoria, Australia.

141 citations