Open Access Proceedings ArticleDOI

Data Summarization with Social Contexts

TLDR
This paper analyzes Twitter data and discovers two social contexts which are important for topic generation and dissemination, namely (i) the CrowdExp topic score, which captures the influence of both crowd and expert users in Twitter, and (ii) the Retweet topic score, which captures the influence of Twitter users' actions.
Abstract
While social data is being widely used in various applications such as sentiment analysis and trend prediction, its sheer size also presents great challenges for storing, sharing and processing such data. These challenges can be addressed by data summarization, which transforms the original dataset into a smaller, yet still useful, subset. Existing methods find such subsets with objective functions based on data properties such as representativeness or informativeness, but do not exploit social contexts, which are distinct characteristics of social data. Further, to date very little work has focused on topic-preserving data summarization, despite the abundant work on topic modeling. This is a challenging task for two reasons. First, since topic models are based on latent variables, existing methods are not well suited to capture latent topics. Second, it is difficult to find social contexts that provide valuable information for building an effective topic-preserving summarization model. To tackle these challenges, in this paper, we focus on exploiting social contexts to summarize social data while preserving topics in the original dataset. We take Twitter data as a case study. Through analyzing Twitter data, we discover two social contexts which are important for topic generation and dissemination, namely (i) the CrowdExp topic score, which captures the influence of both crowd and expert users in Twitter, and (ii) the Retweet topic score, which captures the influence of Twitter users' actions. We conduct extensive experiments on two real-world Twitter datasets using two applications. The experimental results show that, by leveraging social contexts, our proposed solution can enhance topic-preserving data summarization and improve application performance by up to 18%.
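To make the idea concrete, the following is a minimal sketch (not the authors' code) of topic-preserving summarization as greedy subset selection: each tweet carries a topic distribution (e.g., from LDA) and a social-context weight standing in for the CrowdExp and Retweet scores, and the summary is grown so that its weighted topic mixture stays close, in KL divergence, to that of the full dataset. All function and variable names here are illustrative assumptions.

```python
# Illustrative sketch only: greedy selection of a tweet subset whose
# social-context-weighted topic distribution approximates the topic
# distribution of the full corpus. `social_weights` is a hypothetical
# stand-in for scores such as CrowdExp or Retweet topic scores.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with smoothing to avoid division by zero."""
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def greedy_topic_summary(topic_dists, social_weights, k):
    """Pick k items greedily so the weighted topic mixture of the summary
    stays close to the weighted topic mixture of the whole dataset.

    topic_dists   : (n, T) array, per-tweet topic distributions (e.g., from LDA)
    social_weights: (n,) array, per-tweet social-context scores
    """
    target = (social_weights[:, None] * topic_dists).sum(axis=0)
    chosen, current = [], np.zeros(topic_dists.shape[1])
    for _ in range(k):
        best, best_loss = None, np.inf
        for i in range(len(topic_dists)):
            if i in chosen:
                continue
            cand = current + social_weights[i] * topic_dists[i]
            loss = kl_divergence(target, cand)
            if loss < best_loss:
                best, best_loss = i, loss
        chosen.append(best)
        current = current + social_weights[best] * topic_dists[best]
    return chosen
```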


Citations
Proceedings ArticleDOI

Retaining Data from Streams of Social Platforms with Minimal Regret

TL;DR: This paper proposes techniques to effectively decide which data to retain, such that the induced loss of information (the regret of neglecting certain data) is minimized.
Proceedings Article

Robust Guarantees of Stochastic Greedy Algorithms.

TL;DR: This paper shows that for maximizing a monotone submodular function under a cardinality constraint, iteratively selecting an element whose marginal contribution is approximately maximal in expectation is a sufficient condition to obtain the optimal approximation guarantee with exponentially high probability, assuming the cardinality is sufficiently large.
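The stochastic greedy idea summarized above can be sketched in a few lines: each round evaluates marginal gains only on a small random sample of the remaining elements rather than the whole ground set. The set function f below is a placeholder supplied by the caller; this is an illustration, not the paper's implementation.

```python
# Sketch of stochastic greedy for maximizing a monotone submodular function
# under a cardinality constraint: sample (n/k)*log(1/eps) candidates per round
# and take the one with the largest marginal gain.
import math
import random

def stochastic_greedy(ground_set, f, k, eps=0.1):
    """Select k elements approximately maximizing a monotone submodular f."""
    n = len(ground_set)
    sample_size = max(1, int(math.ceil((n / k) * math.log(1.0 / eps))))
    selected = []
    remaining = set(ground_set)
    for _ in range(k):
        candidates = random.sample(list(remaining),
                                   min(sample_size, len(remaining)))
        base = f(selected)
        best = max(candidates, key=lambda e: f(selected + [e]) - base)
        selected.append(best)
        remaining.discard(best)
    return selected
```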
Journal ArticleDOI

Efficient Representative Subset Selection over Sliding Windows

TL;DR: This work formulates dynamic RSS in data streams as maximizing submodular functions subject to general d-knapsack constraints (SMDK) over sliding windows, and evaluates the efficiency and solution quality of the proposed KW and KW+ frameworks on real-world datasets.
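For intuition, a simplified cost-effective greedy step under a d-knapsack constraint (ignoring the sliding-window machinery of KW and KW+) might look as follows; the gain-per-cost heuristic and all names are illustrative assumptions, not the paper's algorithm.

```python
# Toy greedy selection under a d-knapsack constraint: each element has a
# d-dimensional cost vector, each dimension has a budget, and elements are
# added by marginal gain per unit of their largest cost component.
import numpy as np

def knapsack_greedy(ground_set, f, costs, budgets):
    budgets = np.asarray(budgets, dtype=float)
    selected, used = [], np.zeros_like(budgets)
    remaining = set(ground_set)
    while remaining:
        base = f(selected)
        best, best_ratio = None, 0.0
        for e in remaining:
            c = np.asarray(costs[e], dtype=float)
            if np.any(used + c > budgets):
                continue                          # would violate a knapsack dimension
            gain = f(selected + [e]) - base
            ratio = gain / max(c.max(), 1e-12)    # gain per unit of tightest cost
            if ratio > best_ratio:
                best, best_ratio = e, ratio
        if best is None:
            break
        selected.append(best)
        used += np.asarray(costs[best], dtype=float)
        remaining.discard(best)
    return selected
```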
Book ChapterDOI

Scalable Approximation Algorithm for Graph Summarization

TL;DR: In this paper, a weighted sampling scheme is proposed to sample vertices for merging that will result in the least reconstruction error, and the running time of the algorithm is shown to be polynomial in the number of vertices.
Posted Content

Scalable Approximation Algorithm for Graph Summarization

TL;DR: In this article, a weighted sampling scheme is proposed to sample vertices for merging so as to incur the least reconstruction error; analytical bounds on the running time of the algorithm are given, and an approximation guarantee is proved for the score computation.
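As a rough illustration of the merging idea in the two entries above (and not the paper's algorithm), one can repeatedly sample a vertex pair with probability proportional to neighbourhood overlap and merge it into a supernode, since similar vertices tend to incur small reconstruction error.

```python
# Toy, hedged sketch of graph summarization by merging similar vertices.
# The Jaccard-based sampling weight is an illustrative simplification.
import random

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a | b) else 0.0

def summarize_graph(adj, target_nodes):
    """adj: dict node -> set of neighbours (undirected); merge until target_nodes remain."""
    adj = {u: set(vs) for u, vs in adj.items()}
    while len(adj) > target_nodes:
        nodes = list(adj)
        pairs = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]
        weights = [jaccard(adj[u], adj[v]) + 1e-9 for u, v in pairs]
        u, v = random.choices(pairs, weights=weights, k=1)[0]
        merged = (adj[u] | adj[v]) - {u, v}        # union of the two neighbourhoods
        adj.pop(u), adj.pop(v)
        super_node = f"{u}+{v}"
        for w in merged:
            adj[w].discard(u)
            adj[w].discard(v)
            adj[w].add(super_node)
        adj[super_node] = merged
    return adj
```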
References
Journal ArticleDOI

Latent dirichlet allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
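Both entries above refer to the same LDA model; a minimal example of fitting it with scikit-learn's off-the-shelf implementation on toy documents (the documents and number of topics are arbitrary):

```python
# Fit a 2-topic LDA model on a handful of toy documents and inspect the
# per-document topic distributions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell on monday",
    "investors watch the stock exchange",
]

counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
doc_topics = lda.transform(counts)   # per-document topic distributions
print(doc_topics.round(2))
```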
Journal ArticleDOI

LexRank: graph-based lexical centrality as salience in text summarization

TL;DR: LexRank as discussed by the authors is a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing (NLP), which is based on the concept of eigenvector centrality.
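The LexRank idea above can be made concrete with a compact reimplementation sketch: score sentences by eigenvector centrality (power iteration with damping) on a thresholded cosine-similarity graph. This is an illustration, not the authors' code; the threshold and damping values are arbitrary.

```python
# Simplified LexRank: TF-IDF sentence vectors, thresholded cosine-similarity
# graph, and damped power iteration to obtain centrality scores.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def lexrank(sentences, threshold=0.1, damping=0.85, iters=100):
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(tfidf)
    adj = (sim >= threshold).astype(float)          # threshold the similarity graph
    row_sums = adj.sum(axis=1, keepdims=True)
    transition = adj / np.where(row_sums == 0, 1, row_sums)
    n = len(sentences)
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):                          # power iteration with damping
        scores = (1 - damping) / n + damping * transition.T @ scores
    return scores

sents = ["Cats purr.", "Dogs bark loudly.", "Cats and dogs are pets.", "Markets fell."]
print(lexrank(sents).round(3))
```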
Proceedings ArticleDOI

Recommender systems with social regularization

TL;DR: This paper proposes a matrix factorization framework with social regularization, which can be easily extended to incorporate other contextual information such as social tags, and demonstrates that the proposed approaches outperform other state-of-the-art methods.
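A hedged sketch of the social-regularization idea: a user's latent factors are pulled toward those of their friends in addition to fitting observed ratings. The update rule and hyperparameters below are illustrative, not taken from the paper.

```python
# Matrix factorization with a simple social regularization term, trained by
# stochastic gradient descent over observed (user, item, rating) triples.
import numpy as np

def mf_social(ratings, friends, n_users, n_items, k=10, lr=0.01,
              lam=0.1, beta=0.1, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:                    # observed (user, item, rating)
            err = r - U[u] @ V[i]
            # social term: pull U[u] toward the average of its friends' factors
            social = U[u] - U[friends[u]].mean(axis=0) if friends.get(u) else 0.0
            U[u] += lr * (err * V[i] - lam * U[u] - beta * social)
            V[i] += lr * (err * U[u] - lam * V[i])
    return U, V
```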
Book ChapterDOI

Comparing twitter and traditional media using topic models

TL;DR: This paper empirically compares the content of Twitter with a traditional news medium, The New York Times, using unsupervised topic modeling, and reports findings that are useful for downstream IR and DM applications.