Sumblr: continuous summarization of evolving tweet streams

doi:10.1145/2484028.2484045

Proceedings Article

Lidan Shou, +3 more

- pp 533-542

TLDR

This paper proposes a novel prototype called Sumblr (SUMmarization By stream cLusteRing) for tweet streams, and develops a TCV-Rank summarization technique for generating online summaries and historical summaries of arbitrary time durations.

Abstract:

With the explosive growth of microblogging services, short-text messages (also known as tweets) are being created and shared at an unprecedented rate. Tweets in their raw form can be incredibly informative, but also overwhelming. For both end-users and data analysts, it is a nightmare to plow through millions of tweets that contain enormous amounts of noise and redundancy. In this paper, we study continuous tweet summarization as a solution to this problem. While traditional document summarization methods focus on static and small-scale data, we aim to deal with dynamic, quickly arriving, and large-scale tweet streams. We propose a novel prototype called Sumblr (SUMmarization By stream cLusteRing) for tweet streams. We first propose an online tweet stream clustering algorithm to cluster tweets and maintain distilled statistics called Tweet Cluster Vectors. Then we develop a TCV-Rank summarization technique for generating online summaries and historical summaries of arbitrary time durations. Finally, we describe a topic evolvement detection method, which consumes online and historical summaries to produce timelines automatically from tweet streams. Our experiments on large-scale real tweets demonstrate the efficiency and effectiveness of our approach.
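The online clustering step described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not define the contents of a Tweet Cluster Vector, so the `ClusterSummary` class, its fields, the whitespace tokenizer, the cosine similarity measure, and the `threshold` value are all assumptions chosen for illustration. The sketch only shows the general idea of absorbing each arriving tweet into a compact per-cluster summary instead of storing every tweet.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ClusterSummary:
    """Hypothetical, simplified stand-in for a cluster's distilled statistics."""
    term_sum: Counter = field(default_factory=Counter)  # summed term counts
    size: int = 0                                        # tweets absorbed so far
    last_seen: float = 0.0                               # latest timestamp seen

    def absorb(self, terms: Counter, timestamp: float) -> None:
        self.term_sum.update(terms)
        self.size += 1
        self.last_seen = max(self.last_seen, timestamp)


def cluster_stream(tweets, threshold=0.3):
    """Assign each (timestamp, text) tweet to its closest cluster, or open a new one.

    The threshold is an assumed tuning parameter, not a value from the paper.
    """
    clusters = []
    for timestamp, text in tweets:
        terms = Counter(text.lower().split())
        best = max(clusters, key=lambda c: cosine(terms, c.term_sum), default=None)
        if best is None or cosine(terms, best.term_sum) < threshold:
            best = ClusterSummary()
            clusters.append(best)
        best.absorb(terms, timestamp)
    return clusters
```

Because each cluster keeps only aggregate counts and a timestamp, memory grows with the number of clusters rather than the number of tweets, which is the property a streaming setting needs.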

Citations

Journal Article

Processing Social Media Messages in Mass Emergency: A Survey

Muhammad Imran, +3 more

- 26 Jun 2015 -

ACM Computing Surveys

TL;DR: This survey covers the state of the art in computational methods for processing social media messages, highlights both their contributions and shortcomings, and methodically examines a series of key subproblems ranging from the detection of events to the creation of actionable and useful summaries.

Proceedings Article

A dirichlet multinomial mixture model-based approach for short text clustering

Jianhua Yin, +1 more

TL;DR: This paper proposed a collapsed Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture model for short text clustering and found that GSDMM can infer the number of clusters automatically with a good balance between the completeness and homogeneity of the clustering results, and is fast to converge.

Book

A Practical Guide to Sentiment Analysis

Erik Cambria, +3 more

TL;DR: The main aim of this book is to provide a feasible research platform for ambitious researchers to develop practical solutions that benefit society, business, and future research.

Proceedings Article

Extracting Situational Information from Microblogs during Disaster Events: a Classification-Summarization Approach

Koustav Rudra, +4 more

TL;DR: A novel framework, which first classifies tweets to extract situational information and then summarizes that information, achieves superior performance compared to state-of-the-art tweet summarization approaches.

Proceedings Article

STREAMCUBE: Hierarchical spatio-temporal hashtag clustering for event exploration over the Twitter stream

Wei Feng, +6 more

TL;DR: This paper focuses on hierarchical spatio-temporal hashtag clustering techniques and proposes a data structure called STREAMCUBE, which is an extension of the data cube structure from the database community with spatial and temporal hierarchy.

References

Proceedings Article

ROUGE: A Package for Automatic Evaluation of Summaries

Chin-Yew Lin

TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, which are included in the ROUGE summarization evaluation package, along with their evaluations.
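As a small illustration of the first of these measures, ROUGE-N recall is the number of reference n-grams that also appear in the candidate summary (with clipped counts), divided by the total number of n-grams in the reference. The sketch below is not the ROUGE package itself: it assumes a single reference, lowercased whitespace tokenization, and no stemming or stopword handling, and the function names are chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N recall against one reference, using clipped n-gram counts."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


candidate = "the cat sat on the mat"
reference = "the cat is on the mat"
print(rouge_n_recall(candidate, reference, n=1))  # 5 of 6 reference unigrams matched
print(rouge_n_recall(candidate, reference, n=2))  # 3 of 5 reference bigrams matched
```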

Proceedings Article

BIRCH: an efficient data clustering method for very large databases

Tian Zhang, +2 more

TL;DR: Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) as discussed by the authors is a data clustering method that is especially suitable for very large databases.

Journal Article

LexRank: graph-based lexical centrality as salience in text summarization

Gunes Erkan, +1 more

- 01 Jul 2004 -

Journal of Artificial Intelligence Resea...

TL;DR: LexRank as discussed by the authors is a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing (NLP), which is based on the concept of eigenvector centrality.

Journal Article

The use of MMR, diversity-based reranking for reordering documents and producing summaries

Jaime Carbonell, +1 more

TL;DR: A method for combining query-relevance with information-novelty in the context of text retrieval and summarization and preliminary results indicate some benefits for MMR diversity ranking in document retrieval and in single document summarization.
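The trade-off between query relevance and novelty can be sketched as a greedy selection loop: at each step, pick the candidate that maximizes `lam * relevance - (1 - lam) * redundancy`, where redundancy is the highest similarity to anything already selected. In this sketch the Jaccard word-overlap similarity, the function names, and the default `lam` value are assumptions made for illustration; any similarity measure could be substituted.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity, used here as an assumed stand-in for any similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def mmr_select(query: str, candidates, k: int = 2, lam: float = 0.5):
    """Greedily pick k candidates, balancing relevance to the query against redundancy."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(c):
            # Redundancy is the highest similarity to any already selected item.
            redundancy = max((jaccard(c, s) for s in selected), default=0.0)
            return lam * jaccard(c, query) - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


docs = [
    "earthquake damage in the city",
    "earthquake damage in city",
    "rescue teams reach city after earthquake",
]
print(mmr_select("earthquake city damage", docs, k=2, lam=0.5))
```

With `lam=0.5` the second pick is the rescue sentence rather than the near-duplicate damage sentence; with `lam=1.0` the selection falls back to pure relevance ranking and both damage sentences are chosen.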

Book Chapter

A framework for clustering evolving data streams

Charu C. Aggarwal, +3 more

TL;DR: A fundamentally different philosophy for data stream clustering is discussed which is guided by application-centered requirements and uses the concepts of a pyramidal time frame in conjunction with a microclustering approach.

Related Papers (5)

LexRank: graph-based lexical centrality as salience in text summarization

Latent dirichlet allocation

ROUGE: A Package for Automatic Evaluation of Summaries

Event Summarization Using Tweets

Dynamic topic models