Proceedings ArticleDOI

TrioVecEvent: Embedding-Based Online Local Event Detection in Geo-Tagged Tweet Streams

TLDR
Crowdsourcing is used to evaluate TrioVecEvent, a method that leverages multimodal embeddings for accurate online local event detection and introduces discriminative features that characterize local events well.
Abstract
Detecting local events (e.g., protest, disaster) at their onsets is an important task for a wide spectrum of applications, ranging from disaster control to crime monitoring and place recommendation. Recent years have witnessed growing interest in leveraging geo-tagged tweet streams for online local event detection. Nevertheless, the accuracies of existing methods still remain unsatisfactory for building reliable local event detection systems. We propose TrioVecEvent, a method that leverages multimodal embeddings to achieve accurate online local event detection. The effectiveness of TrioVecEvent is underpinned by its two-step detection scheme. First, it ensures a high coverage of the underlying local events by dividing the tweets in the query window into coherent geo-topic clusters. To generate quality geo-topic clusters, we capture short-text semantics by learning multimodal embeddings of the location, time, and text, and then perform online clustering with a novel Bayesian mixture model. Second, TrioVecEvent considers the geo-topic clusters as candidate events and extracts a set of features for classifying the candidates. Leveraging the multimodal embeddings as background knowledge, we introduce discriminative features that can well characterize local events, which enables pinpointing true local events from the candidate pool with a small amount of training data. We have used crowdsourcing to evaluate TrioVecEvent, and found that it improves the performance of the state-of-the-art method by a large margin.
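The two-step scheme from the abstract can be sketched in code. This is a minimal illustration, not the paper's implementation: plain k-means over the joint location/text-embedding space stands in for the paper's novel Bayesian mixture model, the two cluster features (spatial spread, semantic coherence) are illustrative stand-ins for the paper's feature set, and hand-set thresholds replace the classifier trained on a small amount of labeled data. All function and parameter names are hypothetical.

```python
import numpy as np

def geo_topic_clusters(locs, txt_emb, k, iters=20, seed=0):
    # Step 1: partition tweets in the query window into geo-topic
    # clusters over the joint [location, text-embedding] space.
    # Plain k-means here; the paper uses a Bayesian mixture model
    # with online updates.
    X = np.hstack([locs, txt_emb])
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def candidate_features(locs, txt_emb, labels, k):
    # Step 2a: extract illustrative per-cluster features --
    # spatial spread (mean per-axis std of locations) and semantic
    # coherence (mean pairwise cosine similarity of embeddings).
    feats = []
    for j in range(k):
        idx = labels == j
        if not idx.any():
            feats.append((float("inf"), 0.0))
            continue
        spread = float(locs[idx].std(axis=0).mean())
        E = txt_emb[idx]
        E = E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-12)
        coherence = float((E @ E.T).mean())
        feats.append((spread, coherence))
    return feats

def classify_candidates(feats, max_spread=0.5, min_coherence=0.8):
    # Step 2b: keep clusters that are spatially tight and
    # semantically coherent. Hand-set thresholds stand in for the
    # classifier the paper trains with little supervision.
    return [bool(s < max_spread and c > min_coherence) for s, c in feats]
```

The intuition matches the abstract: step 1 over-generates coherent candidates to ensure high coverage of the underlying events, and step 2 uses features grounded in the embeddings to pinpoint true local events from the candidate pool.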


Citations
Proceedings ArticleDOI

Weakly-Supervised Neural Text Classification

TL;DR: In this article, a pseudo-document generator is used to generate pseudo-labeled documents for model pre-training, and a self-training module is used for model refinement.
Proceedings ArticleDOI

Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks

TL;DR: The HEER algorithm proposes a comprehensive transcription of heterogeneous information networks (HINs), embedding HINs via edge representations that are further coupled with properly-learned heterogeneous metrics.
Proceedings ArticleDOI

MiST: A Multiview and Multimodal Spatial-Temporal Learning Framework for Citywide Abnormal Event Forecasting

TL;DR: A Multi-View and Multi-Modal Spatial-Temporal learning (MiST) framework that forecasts citywide abnormal events by promoting collaboration among different views (spatial, temporal, and semantic) and mapping the multi-modal units into the same latent space.