Proceedings ArticleDOI

Sumblr: continuous summarization of evolving tweet streams

TLDR
This paper proposes a novel prototype called Sumblr (SUMmarization By stream cLusteRing) for tweet streams, and develops a TCV-Rank summarization technique for generating online summaries and historical summaries of arbitrary time durations.
Abstract
With the explosive growth of microblogging services, short-text messages (also known as tweets) are being created and shared at an unprecedented rate. Tweets in their raw form can be incredibly informative, but also overwhelming. For both end-users and data analysts, it is a nightmare to plow through millions of tweets that contain enormous amounts of noise and redundancy. In this paper, we study continuous tweet summarization as a solution to this problem. While traditional document summarization methods focus on static and small-scale data, we aim to deal with dynamic, quickly arriving, and large-scale tweet streams. We propose a novel prototype called Sumblr (SUMmarization By stream cLusteRing) for tweet streams. We first propose an online tweet stream clustering algorithm to cluster tweets and maintain distilled statistics called Tweet Cluster Vectors. Then we develop a TCV-Rank summarization technique for generating online summaries and historical summaries of arbitrary time durations. Finally, we describe a topic evolvement detection method, which consumes online and historical summaries to produce timelines automatically from tweet streams. Our experiments on large-scale real tweets demonstrate the efficiency and effectiveness of our approach.
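The abstract does not spell out what a Tweet Cluster Vector contains or how tweets are assigned to clusters, so the following is only a minimal sketch of the general idea of one-pass stream clustering with compact per-cluster statistics. The `ClusterSummary` record, the bag-of-words cosine similarity, and the `threshold` value are all illustrative assumptions, not the paper's actual definitions.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ClusterSummary:
    """Hypothetical per-cluster statistics (an assumed stand-in, not the paper's TCV)."""
    term_sum: Counter = field(default_factory=Counter)  # summed term counts
    size: int = 0            # number of tweets absorbed so far
    last_seen: float = 0.0   # timestamp of the most recent tweet

    def add(self, terms: Counter, timestamp: float) -> None:
        # Statistics are additive, so a tweet is absorbed without storing its text.
        self.term_sum.update(terms)
        self.size += 1
        self.last_seen = max(self.last_seen, timestamp)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_stream(tweets, threshold=0.3):
    """Assign each (timestamp, text) pair to its closest cluster in one pass.

    A new cluster is opened when no existing cluster reaches `threshold`
    (an assumed, untuned value).
    """
    clusters = []
    for timestamp, text in tweets:
        terms = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(terms, cluster.term_sum)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is None or best_sim < threshold:
            best = ClusterSummary()
            clusters.append(best)
        best.add(terms, timestamp)
    return clusters


stream = [
    (1.0, "earthquake hits city"),
    (2.0, "earthquake hits city center"),
    (3.0, "new phone released today"),
]
clusters = cluster_stream(stream)
```

Because each summary is additive, a cluster can be updated in constant time per term as tweets arrive, which is the property that makes this style of clustering suitable for fast streams.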


Citations
Journal ArticleDOI

Processing Social Media Messages in Mass Emergency: A Survey

TL;DR: This survey reviews the state of the art in computational methods for processing social media messages, highlights both their contributions and shortcomings, and methodically examines a series of key subproblems ranging from the detection of events to the creation of actionable and useful summaries.
Proceedings ArticleDOI

A dirichlet multinomial mixture model-based approach for short text clustering

TL;DR: This paper proposed a collapsed Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture model for short text clustering and found that GSDMM can infer the number of clusters automatically with a good balance between the completeness and homogeneity of the clustering results, and is fast to converge.
BookDOI

A Practical Guide to Sentiment Analysis

TL;DR: The main aim of this book is to provide a feasible research platform for ambitious researchers to develop practical solutions that benefit society, business, and future research.
Proceedings ArticleDOI

Extracting Situational Information from Microblogs during Disaster Events: a Classification-Summarization Approach

TL;DR: A novel framework which first classifies tweets to extract situational information, and then summarizes the information achieves superior performance compared to state-of-the-art tweet summarization approaches.
Proceedings ArticleDOI

STREAMCUBE: Hierarchical spatio-temporal hashtag clustering for event exploration over the Twitter stream

TL;DR: This paper focuses on hierarchical spatio-temporal hashtag clustering techniques and proposes a data structure called STREAMCUBE, which is an extension of the data cube structure from the database community with spatial and temporal hierarchy.
References
Proceedings Article

ROUGE: A Package for Automatic Evaluation of Summaries

TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, all included in the ROUGE summarization evaluation package, along with their evaluations.
Proceedings ArticleDOI

BIRCH: an efficient data clustering method for very large databases

TL;DR: Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) is a data clustering method that is especially suitable for very large databases.
Journal ArticleDOI

LexRank: graph-based lexical centrality as salience in text summarization

TL;DR: LexRank is a stochastic graph-based method for computing the relative importance of textual units in natural language processing (NLP), based on the concept of eigenvector centrality.
Journal ArticleDOI

The use of MMR, diversity-based reranking for reordering documents and producing summaries

TL;DR: A method for combining query-relevance with information-novelty in the context of text retrieval and summarization and preliminary results indicate some benefits for MMR diversity ranking in document retrieval and in single document summarization.
Book ChapterDOI

A framework for clustering evolving data streams

TL;DR: A fundamentally different philosophy for data stream clustering is discussed which is guided by application-centered requirements and uses the concepts of a pyramidal time frame in conjunction with a microclustering approach.