scispace - formally typeset
Search or ask a question
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over the lifetime, 2270 publications have been published within this topic receiving 71850 citations.


Papers
More filters
Proceedings ArticleDOI
01 Sep 2001
TL;DR: This paper defines summarization in terms of a probabilistic language model and uses the definition to explore a new technique for automatically generating topic hierarchies by applying a graph-theoretic algorithm, which is an approximation of the Dominating Set Problem.
Abstract: Hierarchies have long been used for organization, summarization, and access to information. In this paper we define summarization in terms of a probabilistic language model and use the definition to explore a new technique for automatically generating topic hierarchies by applying a graph-theoretic algorithm, which is an approximation of the Dominating Set Problem. The algorithm efficiently chooses terms according to a language model. We compare the new technique to previous methods proposed for constructing topic hierarchies including subsumption and lexical hierarchies, as well as the top TF.IDF terms. Our results show that the new technique consistently performs as well as or better than these other techniques. They also show the usefulness of hierarchies compared with a list of terms.

190 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This paper forms a summarization model which captures common-sense properties of a good summary, and shows that it can be solved as a submodular function maximization with partition matroid constraints, opening the door to a rich body of work from combinatorial optimization.
Abstract: With the proliferation of wearable cameras, the number of videos of users documenting their personal lives using such devices is rapidly increasing. Since such videos may span hours, there is an important need for mechanisms that represent the information content in a compact form (i.e., shorter videos which are more easily browsable/sharable). Motivated by these applications, this paper focuses on the problem of egocentric video summarization. Such videos are usually continuous with significant camera shake and other quality issues. Because of these reasons, there is growing consensus that direct application of standard video summarization tools to such data yields unsatisfactory performance. In this paper, we demonstrate that using gaze tracking information (such as fixation and saccade) significantly helps the summarization task. It allows meaningful comparison of different image frames and enables deriving personalized summaries (gaze provides a sense of the camera wearer's intent). We formulate a summarization model which captures common-sense properties of a good summary, and show that it can be solved as a submodular function maximization with partition matroid constraints, opening the door to a rich body of work from combinatorial optimization. We evaluate our approach on a new gaze-enabled egocentric video dataset (over 15 hours), which will be a valuable standalone resource.

184 citations

Journal ArticleDOI
TL;DR: This work proposes an automatic summarization approach based on the analysis of review articles' internal topic structure to assemble customer concerns and shows that the proposed approach outperforms the peer approaches, i.e. opinion mining and clustering-summarization, in terms of users' responsiveness and its ability to discover the most important topics.
Abstract: Product reviews possess critical information regarding customers' concerns and their experience with the product. Such information is considered essential to firms' business intelligence which can be utilized for the purpose of conceptual design, personalization, product recommendation, better customer understanding, and finally attract more loyal customers. Previous studies of deriving useful information from customer reviews focused mainly on numerical and categorical data. Textual data have been somewhat ignored although they are deemed valuable. Existing methods of opinion mining in processing customer reviews concentrates on counting positive and negative comments of review writers, which is not enough to cover all important topics and concerns across different review articles. Instead, we propose an automatic summarization approach based on the analysis of review articles' internal topic structure to assemble customer concerns. Different from the existing summarization approaches centered on sentence ranking and clustering, our approach discovers and extracts salient topics from a set of online reviews and further ranks these topics. The final summary is then generated based on the ranked topics. The experimental study and evaluation show that the proposed approach outperforms the peer approaches, i.e. opinion mining and clustering-summarization, in terms of users' responsiveness and its ability to discover the most important topics.

184 citations

Journal Article
TL;DR: A linear-time algorithm for lexical chain computation is presented that makes lexical chains a computationally feasible candidate as an intermediate representation for automatic text summarization.
Abstract: While automatic text summarization is an area that has received a great deal of attention in recent research, the problem of efficiency in this task has not been frequently addressed. When the size and quantity of documents available on the Internet and from other sources are considered, the need for a highly efficient tool that produces usable summaries is clear. We present a linear-time algorithm for lexical chain computation. The algorithm makes lexical chains a computationally feasible candidate as an intermediate representation for automatic text summarization. A method for evaluating lexical chains as an intermediate step in summarization is also presented and carried out. Such an evaluation was heretofore not possible because of the computational complexity of previous lexical chains algorithms.

181 citations


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202374
2022160
202152
202061
201947
201852