Proceedings Article
Sumblr: continuous summarization of evolving tweet streams
Lidan Shou, Zhenhua Wang, Ke Chen, Gang Chen
- pp 533-542
TL;DR: This paper proposes a novel prototype called Sumblr (SUMmarization By stream cLusteRing) for tweet streams, and develops a TCV-Rank summarization technique for generating online summaries and historical summaries of arbitrary time durations.
Abstract:
With the explosive growth of microblogging services, short-text messages (also known as tweets) are being created and shared at an unprecedented rate. Tweets in their raw form can be incredibly informative, but also overwhelming. For both end-users and data analysts, it is a nightmare to plow through millions of tweets that contain enormous amounts of noise and redundancy. In this paper, we study continuous tweet summarization as a solution to this problem. While traditional document summarization methods focus on static and small-scale data, we aim to deal with dynamic, quickly arriving, and large-scale tweet streams. We propose a novel prototype called Sumblr (SUMmarization By stream cLusteRing) for tweet streams. We first propose an online tweet stream clustering algorithm to cluster tweets and maintain distilled statistics called Tweet Cluster Vectors (TCVs). Then we develop a TCV-Rank summarization technique for generating online summaries and historical summaries of arbitrary time durations. Finally, we describe a topic evolvement detection method, which consumes online and historical summaries to produce timelines automatically from tweet streams. Our experiments on large-scale real tweets demonstrate the efficiency and effectiveness of our approach.
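The online clustering step described in the abstract can be illustrated with a minimal sketch. It assumes a bag-of-words representation, cosine similarity, and a fixed similarity threshold, none of which the abstract specifies. The `ClusterStats` class, the `cluster_stream` function, and the `threshold` parameter are hypothetical names, and the statistics kept here are a simplified stand-in for the paper's Tweet Cluster Vectors rather than their actual definition.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real tweets need more care)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ClusterStats:
    """Hypothetical, simplified per-cluster statistics (not the paper's exact TCV)."""
    term_sum: Counter = field(default_factory=Counter)  # summed term counts
    n: int = 0                                          # number of tweets absorbed
    ts_sum: float = 0.0                                 # sum of tweet timestamps

    def add(self, vec, ts):
        # Statistics are additive, so a tweet is absorbed in one incremental update.
        self.term_sum.update(vec)
        self.n += 1
        self.ts_sum += ts


def cluster_stream(stream, threshold=0.3):
    """Assign each (timestamp, text) pair to the most similar cluster,
    or open a new cluster when no similarity reaches the threshold."""
    clusters = []
    for ts, text in stream:
        vec = Counter(tokenize(text))
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c.term_sum)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None or best_sim < threshold:
            best = ClusterStats()
            clusters.append(best)
        best.add(vec, ts)
    return clusters
```

Because only additive statistics are stored, each tweet is processed once and can then be discarded, which is the general property that makes this style of clustering suitable for streams.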
Citations
Journal Article
Processing Social Media Messages in Mass Emergency: A Survey
TL;DR: This survey reviews the state of the art in computational methods for processing social media messages, highlights both their contributions and shortcomings, and methodically examines a series of key subproblems ranging from the detection of events to the creation of actionable and useful summaries.
Proceedings Article
A Dirichlet multinomial mixture model-based approach for short text clustering
Jianhua Yin, Jianyong Wang
TL;DR: This paper proposes a collapsed Gibbs sampling algorithm for the Dirichlet Multinomial Mixture model (GSDMM) for short text clustering, and finds that GSDMM can infer the number of clusters automatically with a good balance between the completeness and homogeneity of the clustering results, and is fast to converge.
Book
A Practical Guide to Sentiment Analysis
TL;DR: The main aim of this book is to provide a feasible research platform for researchers developing practical solutions that benefit society, business, and future research.
Proceedings Article
Extracting Situational Information from Microblogs during Disaster Events: a Classification-Summarization Approach
TL;DR: A novel framework which first classifies tweets to extract situational information, and then summarizes the information achieves superior performance compared to state-of-the-art tweet summarization approaches.
Proceedings Article
STREAMCUBE: Hierarchical spatio-temporal hashtag clustering for event exploration over the Twitter stream
TL;DR: This paper focuses on hierarchical spatio-temporal hashtag clustering techniques and proposes a data structure called STREAMCUBE, which is an extension of the data cube structure from the database community with spatial and temporal hierarchy.
References
Proceedings Article
ROUGE: A Package for Automatic Evaluation of Summaries
TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, all included in the ROUGE summarization evaluation package, along with their evaluations.
Proceedings Article
BIRCH: an efficient data clustering method for very large databases
TL;DR: Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) is a data clustering method that is especially suitable for very large databases.
Journal Article
LexRank: graph-based lexical centrality as salience in text summarization
Gunes Erkan, Dragomir R. Radev
TL;DR: LexRank is a stochastic graph-based method for computing the relative importance of textual units in Natural Language Processing (NLP), based on the concept of eigenvector centrality.
Journal Article
The use of MMR, diversity-based reranking for reordering documents and producing summaries
Jaime Carbonell, Jade Goldstein
TL;DR: A method for combining query-relevance with information-novelty in the context of text retrieval and summarization; preliminary results indicate some benefits for MMR diversity ranking in document retrieval and in single document summarization.
Book Chapter
A framework for clustering evolving data streams
TL;DR: A fundamentally different philosophy for data stream clustering is discussed which is guided by application-centered requirements and uses the concepts of a pyramidal time frame in conjunction with a microclustering approach.
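Of the measures named in the ROUGE reference above, ROUGE-N is the simplest to illustrate: it is the recall of n-grams shared between a candidate summary and a reference summary. The sketch below is a simplified illustration and not the ROUGE package itself. It assumes a single reference, lowercase whitespace tokenization, and no stemming or stopword removal, and `rouge_n_recall` is a hypothetical function name.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """N-gram recall of a candidate summary against one reference summary.
    Matches are clipped so a repeated n-gram is not counted more often
    than it occurs in the candidate."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


print(rouge_n_recall("the cat sat", "the cat sat on the mat", n=1))  # 0.5
```

In the example call, three of the six reference unigrams are matched, since "the" occurs twice in the reference but only once in the candidate.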