Proceedings ArticleDOI
TrioVecEvent: Embedding-Based Online Local Event Detection in Geo-Tagged Tweet Streams
Chao Zhang, Liyuan Liu, Dongming Lei, Quan Yuan, Honglei Zhuang, Timothy Hanratty, Jiawei Han
pp. 595-604
TLDR
Crowdsourcing is used to evaluate TrioVecEvent, a method that leverages multimodal embeddings to achieve accurate online local event detection and introduces discriminative features that can well characterize local events.
Abstract
Detecting local events (e.g., protest, disaster) at their onsets is an important task for a wide spectrum of applications, ranging from disaster control to crime monitoring and place recommendation. Recent years have witnessed growing interest in leveraging geo-tagged tweet streams for online local event detection. Nevertheless, the accuracies of existing methods still remain unsatisfactory for building reliable local event detection systems. We propose TrioVecEvent, a method that leverages multimodal embeddings to achieve accurate online local event detection. The effectiveness of TrioVecEvent is underpinned by its two-step detection scheme. First, it ensures a high coverage of the underlying local events by dividing the tweets in the query window into coherent geo-topic clusters. To generate quality geo-topic clusters, we capture short-text semantics by learning multimodal embeddings of the location, time, and text, and then perform online clustering with a novel Bayesian mixture model. Second, TrioVecEvent considers the geo-topic clusters as candidate events and extracts a set of features for classifying the candidates. Leveraging the multimodal embeddings as background knowledge, we introduce discriminative features that can well characterize local events, which enables pinpointing true local events from the candidate pool with a small amount of training data. We have used crowdsourcing to evaluate TrioVecEvent, and found that it improves the performance of the state-of-the-art method by a large margin.
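The two-step scheme described in the abstract can be illustrated with a simplified sketch. Note the hedging: this is not the paper's Bayesian mixture model or its actual feature set; the greedy similarity-threshold clustering, the 5 km distance cutoff, and all function names below are illustrative assumptions standing in for the real components.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def online_geo_topic_clustering(tweets, sim_threshold=0.5, max_km=5.0):
    """Step 1 (simplified stand-in for the paper's Bayesian mixture model):
    each arriving tweet joins the nearest cluster whose centroid is both
    semantically similar and spatially close, else it starts a new cluster."""
    clusters = []  # each: {"members", "centroid" (text vector), "center" (lat, lng)}
    for t in tweets:
        best, best_sim = None, sim_threshold
        for c in clusters:
            sim = cosine(t["vec"], c["centroid"])
            # rough degrees-to-km conversion (~111 km per degree)
            dist_km = 111.0 * np.hypot(t["lat"] - c["center"][0],
                                       t["lng"] - c["center"][1])
            if sim > best_sim and dist_km < max_km:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"members": [t], "centroid": t["vec"].copy(),
                             "center": (t["lat"], t["lng"])})
        else:
            best["members"].append(t)
            n = len(best["members"])
            best["centroid"] += (t["vec"] - best["centroid"]) / n  # running mean
            best["center"] = tuple(np.mean([(m["lat"], m["lng"])
                                            for m in best["members"]], axis=0))
    return clusters

def candidate_features(cluster):
    """Step 2 (illustrative features only): summarize a candidate cluster by
    size, spatial compactness, and semantic coherence for a downstream
    classifier to separate true local events from routine chatter."""
    pts = np.array([(m["lat"], m["lng"]) for m in cluster["members"]])
    coherence = np.mean([cosine(m["vec"], cluster["centroid"])
                         for m in cluster["members"]])
    return {"size": len(cluster["members"]),
            "spatial_std": float(pts.std(axis=0).mean()),
            "coherence": float(coherence)}
```

Two semantically similar, co-located tweets would fall into one cluster here, while a tweet with an unrelated embedding would open a new candidate; the real system replaces both the assignment rule and the features with probabilistic inference and embedding-based features.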
Citations
Proceedings ArticleDOI
Weakly-Supervised Neural Text Classification
TL;DR: This paper proposes a weakly-supervised method for neural text classification: a pseudo-document generator creates pseudo-labeled documents for model pre-training, and a self-training module refines the model; the approach performs well without requiring excessive training data and significantly outperforms baseline methods.
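The pre-train-then-refine idea in this summary can be sketched generically. This is not the cited paper's neural architecture: a nearest-centroid classifier stands in for the neural model, the seed data stands in for the generated pseudo-documents, and the margin-based confidence rule is an assumption.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Fit a minimal classifier: one mean vector per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def nearest_centroid_predict(model, X):
    """Return predicted classes and the full distance matrix."""
    classes, centroids = model
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)], d

def self_train(X_seed, y_seed, X_unlab, rounds=3, margin=0.5):
    """Self-training sketch: fit on (pseudo-)labeled seeds, then repeatedly
    absorb unlabeled points whose nearest centroid is clearly closer than
    the runner-up, and refit on the enlarged training set."""
    model = nearest_centroid_fit(X_seed, y_seed)
    for _ in range(rounds):
        pred, d = nearest_centroid_predict(model, X_unlab)
        d_sorted = np.sort(d, axis=1)
        confident = (d_sorted[:, 1] - d_sorted[:, 0]) > margin
        if not confident.any():
            break
        X = np.vstack([X_seed, X_unlab[confident]])
        y = np.concatenate([y_seed, pred[confident]])
        model = nearest_centroid_fit(X, y)
    return model
```

The design point the sketch preserves is that refinement only uses high-confidence pseudo-labels, so early mistakes are less likely to be amplified across rounds.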
Proceedings ArticleDOI
Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks
TL;DR: The HEER algorithm embeds heterogeneous information networks (HINs) via edge representations that are coupled with properly-learned heterogeneous metrics; experiments demonstrate the effectiveness of the model and the utility of the edge representations and heterogeneity metrics.
Proceedings ArticleDOI
MiST: A Multiview and Multimodal Spatial-Temporal Learning Framework for Citywide Abnormal Event Forecasting
TL;DR: MiST, a multi-view and multi-modal spatial-temporal learning framework, forecasts citywide abnormal events by promoting collaboration among the spatial, temporal, and semantic views and mapping multi-modal units into a shared latent space.
References
Journal ArticleDOI
Latent Dirichlet Allocation
TL;DR: This work proposes a generative probabilistic model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, the mixture of unigrams, and Hofmann's aspect model (probabilistic latent semantic indexing, pLSI).
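LDA's generative story, as summarized above, can be written down directly. This is an illustrative forward sampler only: `alpha` (the Dirichlet prior) and `beta_mat` (per-topic word distributions) are assumed inputs here, whereas in practice they are learned from data.

```python
import numpy as np

def lda_generate(n_docs, doc_len, alpha, beta_mat, rng):
    """Sample documents from LDA's generative process: for each document
    draw a topic mixture theta ~ Dirichlet(alpha); for each word position
    draw a topic z ~ Categorical(theta), then a word w ~ Categorical(beta_mat[z]).
    beta_mat has shape (num_topics, vocab_size)."""
    docs = []
    for _ in range(n_docs):
        theta = rng.dirichlet(alpha)            # per-document topic mixture
        words = []
        for _ in range(doc_len):
            z = rng.choice(len(alpha), p=theta)             # topic for this word
            w = rng.choice(beta_mat.shape[1], p=beta_mat[z])  # word from topic
            words.append(w)
        docs.append(words)
    return docs
```

The per-document `theta` is what distinguishes LDA from the mixture of unigrams: each document mixes several topics instead of being assigned a single one.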
Proceedings Article
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: The Skip-gram model learns high-quality distributed vector representations that capture many precise syntactic and semantic word relationships; the paper presents a simple method for finding phrases in text and a simple alternative to the hierarchical softmax called negative sampling, improving both vector quality and training speed.
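Negative sampling, as summarized above, amounts to a logistic-loss update over one true (center, context) pair and a few sampled negatives. The sketch below is a minimal illustration, not the original word2vec implementation; the learning rate, initialization, and choice of negative IDs are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(W_in, W_out, center, context, neg_ids, lr=0.025):
    """One skip-gram negative-sampling step: raise the score of the true
    (center, context) pair and lower the scores of the sampled negative
    pairs, updating both embedding tables in place. Returns the loss
    before the update."""
    v = W_in[center].copy()
    ids = np.array([context] + list(neg_ids))  # positive first, then negatives
    labels = np.zeros(len(ids))
    labels[0] = 1.0
    u = W_out[ids]                             # (k+1, d) output vectors
    scores = u @ v
    loss = -np.log(sigmoid(scores[0])) - np.log(sigmoid(-scores[1:])).sum()
    g = sigmoid(scores) - labels               # gradient of loss w.r.t. scores
    W_in[center] -= lr * (g @ u)               # update center (input) vector
    W_out[ids] -= lr * np.outer(g, v)          # update context/negative vectors
    return float(loss)
```

Repeating this step on the same pair drives the positive score up and the negative scores down, which is exactly the cheap substitute for the full softmax that makes training fast.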
Book
Machine Learning: A Probabilistic Perspective
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.