Christian Sengstock

Bio: Christian Sengstock is an academic researcher at Heidelberg University whose research covers topics including topic models and association rule learning. The author has an h-index of 6 and has co-authored 14 publications that have received 375 citations.

Papers

Journal Article

EvenTweet: online localized event detection from twitter

Hamed Abdelhaq¹, Christian Sengstock¹, Michael Gertz¹

Heidelberg University¹

01 Aug 2013

TL;DR: This work presents a novel framework to detect localized events in real-time from a Twitter stream and to track the evolution of such events over time, using a stream of tweets from Europe during the 2012 UEFA European Football Championship.

Abstract: Microblogging services such as Twitter, Facebook, and Foursquare have become major sources for information about real-world events. Most approaches that aim at extracting event information from such sources typically use the temporal context of messages. However, exploiting the location information of georeferenced messages, too, is important to detect localized events, such as public events or emergency situations. Users posting messages that are close to the location of an event serve as human sensors to describe an event. In this demonstration, we present a novel framework to detect localized events in real-time from a Twitter stream and to track the evolution of such events over time. For this, spatio-temporal characteristics of keywords are continuously extracted to identify meaningful candidates for event descriptions. Then, localized event information is extracted by clustering keywords according to their spatial similarity. To determine the most important events in a (recent) time frame, we introduce a scoring scheme for events. We demonstrate the functionality of our system, called EvenTweet, using a stream of tweets from Europe during the 2012 UEFA European Football Championship.

268 citations
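
The abstract says localized event information is extracted by clustering keywords according to their spatial similarity. The Python sketch below illustrates that idea only under stated assumptions: each keyword's spatial signature is assumed to be a map from grid cell to tweet count, similarity is assumed to be cosine similarity, and grouping uses a simple single-linkage threshold rule. The names `cosine` and `cluster_keywords`, the 0.5 threshold, and the toy signatures are hypothetical; this is not EvenTweet's published clustering or scoring scheme.

```python
from math import sqrt


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse spatial signatures (grid cell -> tweet count)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_keywords(signatures: dict, threshold: float = 0.5) -> list:
    """Single-linkage grouping: keywords with similar signatures end up in one group."""
    words = sorted(signatures)
    parent = {w: w for w in words}  # union-find forest over keywords

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for i, u in enumerate(words):
        for v in words[i + 1:]:
            if cosine(signatures[u], signatures[v]) >= threshold:
                parent[find(u)] = find(v)  # merge the two groups

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return sorted(groups.values())


# Hypothetical signatures: keyword -> {(row, col) grid cell: count in the current window}
signatures = {
    "goal": {(10, 20): 40, (10, 21): 5},
    "stadium": {(10, 20): 30, (10, 21): 8},
    "fire": {(55, 3): 12},
    "smoke": {(55, 3): 9, (55, 4): 2},
}
print(cluster_keywords(signatures))  # [['fire', 'smoke'], ['goal', 'stadium']]
```

Single linkage keeps the sketch short; the temporal event scoring that the abstract also mentions is not modelled here.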

Proceedings Article

Latent geographic feature extraction from social media

Christian Sengstock¹, Michael Gertz¹

Heidelberg University¹

06 Nov 2012

TL;DR: This work proposes a framework that transforms the unstructured and noisy geographic information in social media into a high-dimensional multivariate signal of geographic semantics, and uses dimensionality reduction to extract latent geographic features.

Abstract: In this work we present a framework for the unsupervised extraction of latent geographic features from georeferenced social media. A geographic feature represents a semantic dimension of a location and can be seen as a sensor that measures a signal of geographic semantics. Our goal is to extract a small number of informative geographic features from social media, to describe and explore geographic space, and for subsequent spatial analysis, e.g., in market research. We propose a framework that, first, transforms the unstructured and noisy geographic information in social media into a high-dimensional multivariate signal of geographic semantics. Then, we use dimensionality reduction to extract latent geographic features. We conduct experiments using two large-scale Flickr data sets covering the LA area and the US. We show that dimensionality reduction techniques extracting sparse latent features find dimensions with higher informational value. In addition, we show that prior normalization can be used as a parameter in the exploration process to extract features representing different geographic characteristics, that is, landmarks, regional phenomena, or global phenomena.

41 citations
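
The framework above turns georeferenced social media into a high-dimensional signal and then applies dimensionality reduction. As a minimal illustration, the sketch below extracts a single latent feature with plain principal component analysis (PCA), computed by power iteration on a location-by-tag count matrix. PCA is a stand-in chosen for brevity: the abstract reports that techniques extracting sparse latent features found more informative dimensions, and those are not shown. The function name `top_component` and the toy matrix are hypothetical.

```python
from math import sqrt


def top_component(X, iters=200):
    """First principal component of X (rows = locations, columns = tag features) by power iteration."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[row[j] - means[j] for j in range(d)] for row in X]  # mean-centred data
    cov = [[sum(r[a] * r[b] for r in C) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance left to explain
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in C]  # each location's value on the feature
    return v, scores


# Hypothetical counts of three tags in four grid cells; the first two tags rise together.
X = [[1, 1, 0], [2, 2, 0], [3, 3, 1], [4, 4, 0]]
weights, scores = top_component(X)
print(weights, scores)
```

On this toy matrix the component puts nearly equal weight on the two co-varying tag columns and little on the third, and the location scores increase from the first row to the last.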

Proceedings Article

CONQUER: a system for efficient context-aware query suggestions

Christian Sengstock¹, Michael Gertz¹

Heidelberg University¹

28 Mar 2011

TL;DR: This work demonstrates CONQUER, a query suggestion system that efficiently suggests queries for a given partial query and a number of available query context observations, using a suggestion model based on the combined probabilities of sequential query patterns and context observations.

Abstract: Many of today's search engines provide autocompletion while the user is typing a query string. This type of dynamic query suggestion can help users to formulate queries that better represent their search intent during Web search interactions. In this paper, we demonstrate our query suggestion system called CONQUER, which efficiently suggests queries for a given partial query and a number of available query context observations. The context-awareness allows for suggesting queries tailored to a given context, e.g., the user location or the time of day. CONQUER uses a suggestion model that is based on the combined probabilities of sequential query patterns and context observations. For this, the weight of a context in a query suggestion can be adjusted online, for example, based on the learned user behavior or user profiles. We demonstrate the functionality of CONQUER based on 6 million queries from an AOL query log using the time of day and the country domain of the clicked URLs in the search result as context observations.

28 citations
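
CONQUER's model combines the probabilities of sequential query patterns with context observations, and the weight of the context can be adjusted online. The sketch below mimics that balance with an assumed linear interpolation between P(query | prefix) and P(query | prefix, context), both estimated from raw counts in a toy log of (query, context) pairs. The interpolation, the function name `suggest`, the weight `w`, and the log are assumptions for illustration, not CONQUER's actual formulation.

```python
from collections import Counter


def suggest(prefix, context, query_log, w=0.5, k=3):
    """Rank completions of `prefix`, interpolating overall and context-specific query frequencies."""
    candidates = Counter(q for q, _ in query_log if q.startswith(prefix))
    total = sum(candidates.values())
    if total == 0:
        return []
    in_ctx = Counter(q for q, c in query_log if q.startswith(prefix) and c == context)
    ctx_total = sum(in_ctx.values())

    def score(q):
        p_prefix = candidates[q] / total                      # P(q | prefix)
        p_ctx = in_ctx[q] / ctx_total if ctx_total else 0.0   # P(q | prefix, context)
        return (1 - w) * p_prefix + w * p_ctx                 # w = context weight

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(candidates, key=lambda q: (-score(q), q))[:k]


# Hypothetical log of (query, time-of-day context) pairs.
log = ([("weather berlin", "morning")] * 3
       + [("weather forecast", "evening")] * 5
       + [("web mail", "morning")] * 2)
print(suggest("we", "morning", log, w=0.0))  # ['weather forecast', 'weather berlin', 'web mail']
print(suggest("we", "morning", log, w=0.8))  # ['weather berlin', 'web mail', 'weather forecast']
```

Raising `w` moves the ranking toward what is typical in the given context, which is the kind of online adjustment the abstract describes.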

Proceedings Article

Spatio-temporal characteristics of bursty words in Twitter streams

Hamed Abdelhaq¹, Michael Gertz¹, Christian Sengstock¹

Heidelberg University¹

05 Nov 2013

TL;DR: This paper proposes a novel graph-based regularization procedure that uses spatial co-occurrences of bursty words and allows for computing sound spatial signatures, and evaluates the functionality of the online processing framework using two real-world Twitter datasets.

Abstract: Social networking and microblogging services such as Twitter provide a continuous source of data from which useful information can be extracted. The detection and characterization of bursty words play an important role in processing such data, as bursty words might hint at events or trending topics of social importance upon which actions can be triggered. While there are several approaches to extract bursty words from the content of messages, there is little work that deals with the dynamics of continuous streams of messages, in particular messages that are geo-tagged. In this paper, we present a framework to identify bursty words from Twitter text streams and to describe such words in terms of their spatio-temporal characteristics. Using a time-aware word usage baseline, a sliding window approach over incoming tweets is proposed to identify words that satisfy some burstiness threshold. For these words, a time-varying spatial signature is then determined, which primarily relies on geo-tagged tweets. In order to deal with the noise and the sparsity of geo-tagged tweets, we propose a novel graph-based regularization procedure that uses spatial co-occurrences of bursty words and allows for computing sound spatial signatures. We evaluate the functionality of our online processing framework using two real-world Twitter datasets. The results show that our framework can efficiently and reliably extract bursty words and describe their spatio-temporal evolution over time.

21 citations
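
The framework identifies bursty words with a sliding window over incoming tweets, compared against a time-aware word usage baseline and a burstiness threshold. The class below is a hypothetical sketch of that step only: the baseline is assumed to be an expected per-window count for each (hour of day, word) pair, and a word is assumed bursty when its window count is at least `min_count` and at least `ratio` times the smoothed baseline. The class name, parameters, and thresholds are assumptions; the spatial signatures and graph-based regularization from the paper are not modelled.

```python
from collections import Counter, deque


class BurstDetector:
    """Sliding-window burst detection against a per-hour-of-day baseline (hypothetical design)."""

    def __init__(self, window_secs, baseline, ratio=3.0, min_count=5):
        self.window_secs = window_secs
        self.baseline = baseline    # {(hour_of_day, word): expected count per window}
        self.ratio = ratio          # burstiness threshold relative to the baseline
        self.min_count = min_count  # ignore words that are too rare in the window
        self.window = deque()       # (timestamp, words) of tweets inside the window
        self.counts = Counter()     # number of tweets in the window containing each word

    def add(self, ts, words):
        self.window.append((ts, words))
        self.counts.update(set(words))
        # Evict tweets that are now older than the window.
        while self.window and self.window[0][0] <= ts - self.window_secs:
            _, old = self.window.popleft()
            self.counts.subtract(set(old))

    def bursty(self, ts):
        hour = (ts // 3600) % 24
        return sorted(
            w for w, c in self.counts.items()
            if c >= self.min_count
            and c / (self.baseline.get((hour, w), 0) + 1) >= self.ratio  # +1 smooths unseen words
        )


det = BurstDetector(600, {(20, "goal"): 1.0, (20, "rain"): 10.0})
for i in range(6):
    det.add(72000 + 10 * i, ["goal", "rain"])  # six tweets shortly after 20:00
print(det.bursty(72050))  # ['goal']
```

"rain" is just as frequent in the window, but its baseline for that hour is high, so only "goal" crosses the threshold.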

Proceedings Article

Spatial Interestingness Measures for Co-location Pattern Mining

Christian Sengstock¹, Michael Gertz¹, Tran Van Canh¹

Heidelberg University¹

10 Dec 2012

TL;DR: This work introduces a new general class of interestingness measures, based on the spatial distribution of co-location patterns, that allow the interestingness of a pattern to be judged from properties of the underlying spatial feature distribution.

Abstract: Co-location pattern mining aims at finding subsets of spatial features frequently located together in spatial proximity. The underlying motivation is to model the spatial correlation structure between the features. This allows the discovery of interesting co-location rules (feature interactions) for spatial analysis and prediction tasks. As in association rule mining, a major problem is the huge number of possible patterns and rules. Hence, measures are needed to identify interesting patterns and rules. Existing approaches have so far focused on finding frequent patterns, patterns including rare features, and patterns occurring in small (local) regions. In this paper, we present a new general class of interestingness measures that are based on the spatial distribution of co-location patterns. These measures allow the interestingness of a pattern to be judged based on properties of the underlying spatial feature distribution. The results differ from those of standard measures like the participation index or confidence. To demonstrate the usefulness of these measures, we apply our approach to the discovery of rules on a subset of the OpenStreetMap point-of-interest data.

21 citations
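
The new measures are contrasted with standard ones such as the participation index. For reference, the sketch below computes that standard baseline, not the paper's spatial measures: a row instance of a pattern picks one instance of each feature so that every pair lies within distance `d`; the participation ratio of a feature is the fraction of its instances that appear in some row instance; and the participation index is the minimum ratio over the pattern's features. The brute-force enumeration, the function name, and the toy point-of-interest coordinates are assumptions suited only to small inputs.

```python
from itertools import combinations, product
from math import dist


def participation_index(instances, pattern, d):
    """
    instances: {feature: [(x, y), ...]}; pattern: tuple of feature names.
    Returns min over features of (participating instances / all instances of that feature).
    """
    participating = {f: set() for f in pattern}
    pools = [list(enumerate(instances[f])) for f in pattern]
    for row in product(*pools):  # every choice of one instance per feature
        if all(dist(a[1], b[1]) <= d for a, b in combinations(row, 2)):  # pairwise neighbours
            for f, (idx, _) in zip(pattern, row):
                participating[f].add(idx)
    return min(len(participating[f]) / len(instances[f]) for f in pattern)


# Hypothetical points of interest.
pois = {
    "cafe": [(0, 0), (10, 10), (50, 50)],
    "bus_stop": [(1, 0), (10, 11)],
}
print(participation_index(pois, ("cafe", "bus_stop"), d=2))  # 2 of 3 cafes participate -> 0.666...
```

Here every bus stop has a nearby cafe (ratio 1.0), but only two of three cafes have a nearby bus stop, so the index is 2/3.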

Cited by

Journal Article

Spatio-Temporal Data Mining: A Survey of Problems and Methods

Gowtham Atluri¹, Anuj Karpatne², Vipin Kumar²

University of Cincinnati¹, University of Minnesota²

22 Aug 2018, ACM Computing Surveys

TL;DR: A broad survey of this relatively young field of spatio-temporal data mining is presented, and literature is classified into six major categories: clustering, predictive learning, change detection, frequent pattern mining, anomaly detection, and relationship mining.

Abstract: Large volumes of spatio-temporal data are increasingly collected and studied in diverse domains, including climate science, social sciences, neuroscience, epidemiology, transportation, mobile health, and Earth sciences. Spatio-temporal data differ from relational data, for which computational approaches have been developed in the data-mining community over multiple decades, in that both spatial and temporal attributes are available in addition to the actual measurements/attributes. The presence of these attributes introduces additional challenges that need to be dealt with. Approaches for mining spatio-temporal data have been studied for over a decade in the data-mining community. In this article, we present a broad survey of this relatively young field of spatio-temporal data mining. We discuss different types of spatio-temporal data and the relevant data-mining questions that arise in the context of analyzing each of these datasets. Based on the nature of the data-mining problem studied, we classify literature on spatio-temporal data mining into six major categories: clustering, predictive learning, change detection, frequent pattern mining, anomaly detection, and relationship mining. We discuss the various forms of spatio-temporal data-mining problems in each of these categories.

266 citations

Journal Article

A survey of itemset mining

Philippe Fournier-Viger¹, Jerry Chun-Wei Lin¹, Bay Vo²˒³, Tin Truong Chi, Ji Zhang⁴, Hoai Bac Le⁵

Harbin Institute of Technology Shenzhen Graduate School¹, Sejong University², Ho Chi Minh City University of Technology³, University of Southern Queensland⁴, Ho Chi Minh City University of Science⁵

01 Jul 2017, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

TL;DR: This study provides an up-to-date survey of itemset mining problems and discusses their relationship to other popular pattern mining problems, such as sequential pattern mining, episode mining, subgraph mining, and association rule mining.

Abstract: Itemset mining is an important subfield of data mining, which consists of discovering interesting and useful patterns in transaction databases. The traditional task of frequent itemset mining is to discover groups of items (itemsets) that appear frequently together in transactions made by customers. Although itemset mining was designed for market basket analysis, it can be viewed more generally as the task of discovering groups of attribute values frequently co-occurring in databases. Because of its numerous applications in domains such as bioinformatics, text mining, product recommendation, e-learning, and web click stream analysis, itemset mining has become a popular research area. This study provides an up-to-date survey that can serve both as an introduction and as a guide to recent advances and opportunities in the field. The problem of frequent itemset mining and its applications are described. Moreover, the main approaches and strategies to solve itemset mining problems are presented, along with their characteristics. Limitations of traditional frequent itemset mining approaches are also highlighted, and extensions of the task of itemset mining are presented, such as high-utility itemset mining, rare itemset mining, fuzzy itemset mining, and uncertain itemset mining. This study also discusses research opportunities and the relationship to other popular pattern mining problems, such as sequential pattern mining, episode mining, subgraph mining, and association rule mining. The main open-source libraries of itemset mining implementations are also briefly presented. WIREs Data Mining Knowl Discov 2017, 7:e1207. doi: 10.1002/widm.1207

197 citations
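
The traditional task described above, finding itemsets that appear together in at least a minimum number of transactions, is classically solved with the Apriori algorithm. The sketch below is a compact, unoptimized Apriori in Python: frequent k-itemsets are joined into (k+1)-candidates, and any candidate with an infrequent subset is pruned before counting. Here `min_support` is an absolute transaction count, and the basket data is invented for illustration; the survey covers many faster algorithms and extensions that this sketch does not reflect.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets that yield a k-itemset.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
for itemset, count in sorted(apriori(baskets, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a minimum support of 2, all three single items are frequent, as are {bread, milk} and {bread, butter}; {milk, butter} occurs only once, so the 3-itemset is pruned without being counted.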

Journal Article

Twitter as an indicator for whereabouts of people? Correlating Twitter with UK census data

Enrico Steiger¹, Rene Westerholt¹, Bernd Resch¹, Alexander Zipf¹

Heidelberg University¹

01 Nov 2015, Computers, Environment and Urban Systems

TL;DR: Semantic topic model classification and spatial autocorrelation analysis are applied to detect tweets indicating specific human social activities. The results show an overall strong positive correlation with workplace population census data, suggesting that Twitter is a good indicator and representative proxy for analyzing workplace-based activities.

146 citations
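
The TL;DR mentions spatial autocorrelation analysis. A common statistic for this is global Moran's I, I = (n / W) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where n is the number of regions and W is the sum of all spatial weights wᵢⱼ. The summary does not say which statistic the study used, so treating it as Moran's I is an assumption; the sketch below uses a hand-written adjacency matrix for four regions in a row and invented values.

```python
def morans_i(values, weights):
    """Global Moran's I. values: n numbers; weights: n x n spatial weight matrix with w_ii = 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]                  # deviations from the mean
    W = sum(sum(row) for row in weights)              # total weight
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / W) * (num / den)


# Four regions in a row; neighbours share an edge (rook adjacency).
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([1, 1, 0, 0], w))  # clustered values -> positive (1/3)
print(morans_i([1, 0, 1, 0], w))  # alternating values -> negative (-1.0)
```

Positive values indicate that similar values cluster in neighbouring regions; negative values indicate that neighbours tend to differ.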

Proceedings Article

Regions, Periods, Activities: Uncovering Urban Dynamics via Cross-Modal Representation Learning

Chao Zhang¹, Keyang Zhang¹, Quan Yuan¹, Haoruo Peng¹, Yu Zheng², Timothy Hanratty³, Shaowen Wang¹, Jiawei Han¹

University of Illinois at Urbana–Champaign¹, Microsoft², United States Army Research Laboratory³

03 Apr 2017

TL;DR: This work presents CrossMap, a novel cross-modal representation learning method that uncovers urban dynamics from massive geo-tagged social media (GTSM) data; it not only significantly outperforms state-of-the-art methods for activity recovery and classification but also achieves much better efficiency.

Abstract: With the ever-increasing urbanization process, systematically modeling people's activities in the urban space is being recognized as a crucial socioeconomic task. This task was nearly impossible years ago due to the lack of reliable data sources, yet the emergence of geo-tagged social media (GTSM) data sheds new light on it. Recently, there have been fruitful studies on discovering geographical topics from GTSM data. However, their high computational costs and strong distributional assumptions about the latent topics hinder them from fully unleashing the power of GTSM. To bridge the gap, we present CrossMap, a novel cross-modal representation learning method that uncovers urban dynamics with massive GTSM data. CrossMap first employs an accelerated mode seeking procedure to detect spatiotemporal hotspots underlying people's activities. Those detected hotspots not only address spatiotemporal variations, but also largely alleviate the sparsity of the GTSM data. With the detected hotspots, CrossMap then jointly embeds all spatial, temporal, and textual units into the same space using two different strategies: one is reconstruction-based and the other is graph-based. Both strategies capture the correlations among the units by encoding their co-occurrence and neighborhood relationships, and learn low-dimensional representations to preserve such correlations. Our experiments demonstrate that CrossMap not only significantly outperforms state-of-the-art methods for activity recovery and classification, but also achieves much better efficiency.

142 citations
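
CrossMap first applies an accelerated mode seeking procedure to detect spatiotemporal hotspots. The sketch below shows the plain, unaccelerated form of mode seeking, mean shift with a flat kernel, on 2-D points: each point repeatedly moves to the mean of all points within `bandwidth` of its current position until it stops moving, and converged positions closer than half a bandwidth are merged into one mode. The flat kernel, the merge rule, the function name, and the toy coordinates are assumptions; CrossMap's acceleration and its joint spatiotemporal treatment are not reproduced.

```python
from math import dist


def mean_shift_modes(points, bandwidth, iters=50, tol=1e-6):
    """Plain mean shift with a flat kernel; returns the distinct modes (hotspot centres)."""
    modes = []
    for p in points:
        x = p
        for _ in range(iters):
            nbrs = [q for q in points if dist(q, x) <= bandwidth]
            if not nbrs:
                break  # drifted away from all data; keep the last position
            new = tuple(sum(c) / len(nbrs) for c in zip(*nbrs))  # mean of neighbours
            if dist(new, x) < tol:
                break  # converged to a mode
            x = new
        # Merge positions that converged to (nearly) the same mode.
        if all(dist(x, m) > bandwidth / 2 for m in modes):
            modes.append(x)
    return modes


# Hypothetical check-in locations forming two hotspots.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(mean_shift_modes(pts, bandwidth=2))  # [(0.5, 0.5), (10.5, 10.5)]
```

Each pass over all points costs O(n) per point, so this plain version is O(n²) per iteration; that cost is what accelerated variants aim to reduce.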

Journal Article

Online Incremental Machine Learning Platform for Big Data-Driven Smart Traffic Management

Dinithi Nallaperuma¹, Rashmika Nawaratne¹, Tharindu Bandaragoda¹, Achini Adikari¹, Su Nguyen¹, Thimal Kempitiya¹, Daswin De Silva¹, Damminda Alahakoon¹, Dakshan Pothuhera²

La Trobe University¹, VicRoads²

11 Jul 2019, IEEE Transactions on Intelligent Transportation Systems

TL;DR: The STMP integrates heterogeneous big data streams, such as the IoT, smart sensors, and social media, to detect concept drift, distinguish between recurrent and non-recurrent traffic events, and support impact propagation analysis, traffic flow forecasting, commuter sentiment analysis, and optimized traffic control decisions.

Abstract: The technological landscape of intelligent transport systems (ITS) has been radically transformed by the emergence of the big data streams generated by the Internet of Things (IoT), smart sensors, surveillance feeds, social media, as well as growing infrastructure needs. It is timely and pertinent that ITS harness the potential of artificial intelligence (AI) to develop big data-driven smart traffic management solutions for effective decision-making. The existing AI techniques that function in isolation exhibit clear limitations in developing a comprehensive platform due to the dynamicity of big data streams, high-frequency unlabeled data generation from heterogeneous data sources, and the volatility of traffic conditions. In this paper, we propose an expansive smart traffic management platform (STMP) based on unsupervised online incremental machine learning, deep learning, and deep reinforcement learning to address these limitations. The STMP integrates heterogeneous big data streams, such as the IoT, smart sensors, and social media, to detect concept drift, distinguish between recurrent and non-recurrent traffic events, and support impact propagation analysis, traffic flow forecasting, commuter sentiment analysis, and optimized traffic control decisions. The platform is successfully demonstrated on 190 million records of smart sensor network traffic data generated by 545,851 commuters and corresponding social media data on the arterial road network of Victoria, Australia.

141 citations