scispace - formally typeset
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Proceedings ArticleDOI
13 Dec 2010
TL;DR: By defining a proper distortion measure and a new representation method, the combination of the last two models (the linear representation model and the facility location model) achieves good experimental results on the DUC2002 and DUC2004 datasets.
Abstract: Document summarization plays an important role in natural language processing and text mining. This paper proposes several novel information-theoretic models for multi-document summarization. These models treat document summarization as a transmission system and assume that the best summary is the one with minimum distortion. By defining a proper distortion measure and a new representation method, the combination of the last two models (the linear representation model and the facility location model) achieves good experimental results on the DUC2002 and DUC2004 datasets. We also show that the model is highly interpretable and extensible.

18 citations
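The facility location model described above can be illustrated with a small sketch. The paper's actual distortion measure and representation method are not given here, so this example assumes cosine similarity over term-frequency vectors and a standard greedy facility-location selection: pick the summary sentences that best "cover" every document sentence (each sentence is represented by its most similar selected sentence). All function names are hypothetical.

```python
from collections import Counter
import math

def tf_vector(sentence):
    # Term-frequency vector over lowercase whitespace tokens.
    return Counter(sentence.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def facility_location_summary(sentences, k):
    """Greedy facility-location selection: choose k sentences that
    maximize the total similarity between every sentence and its
    closest selected representative (i.e. minimize distortion)."""
    vecs = [tf_vector(s) for s in sentences]
    selected = []
    cover = [0.0] * len(sentences)  # similarity to nearest selected sentence
    for _ in range(min(k, len(sentences))):
        best_gain, best_idx, best_cover = -1.0, None, None
        for i in range(len(sentences)):
            if i in selected:
                continue
            new_cover = [max(cover[j], cosine(vecs[i], vecs[j]))
                         for j in range(len(sentences))]
            gain = sum(new_cover) - sum(cover)
            if gain > best_gain:
                best_gain, best_idx, best_cover = gain, i, new_cover
        selected.append(best_idx)
        cover = best_cover
    return [sentences[i] for i in sorted(selected)]
```

Greedy selection is the usual choice here because coverage objectives of this form are submodular, so the greedy summary comes with a constant-factor approximation guarantee.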

Proceedings ArticleDOI
Roy Bar-Haim, Yoav Kantor, Lilach Eden, Roni Friedman, Dan Lahav, Noam Slonim
01 Nov 2020
TL;DR: This work develops a method for automatic extraction of key points, enabling fully automatic analysis with performance comparable to a human expert, and demonstrates that the applicability of key point analysis goes well beyond argumentation data.
Abstract: When summarizing a collection of views, arguments or opinions on some topic, it is often desirable not only to extract the most salient points, but also to quantify their prevalence. Work on multi-document summarization has traditionally focused on creating textual summaries, which lack this quantitative aspect. Recent work has proposed to summarize arguments by mapping them to a small set of expert-generated key points, where the salience of each key point corresponds to the number of its matching arguments. The current work advances key point analysis in two important respects: first, we develop a method for automatic extraction of key points, which enables fully automatic analysis, and is shown to achieve performance comparable to a human expert. Second, we demonstrate that the applicability of key point analysis goes well beyond argumentation data. Using models trained on publicly available argumentation datasets, we achieve promising results in two additional domains: municipal surveys and user reviews. An additional contribution is an in-depth evaluation of argument-to-key point matching models, where we substantially outperform previous results.

18 citations
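The key point analysis workflow above — match each argument to a key point, then report key-point prevalence as the number of matched arguments — can be sketched in miniature. The actual system uses trained argument-to-key-point matching models; this toy version substitutes a Jaccard token-overlap score and a match threshold, and all names and the threshold value are illustrative assumptions.

```python
def jaccard(a, b):
    # Token-overlap similarity between two strings.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def key_point_analysis(arguments, key_points, threshold=0.2):
    """Map each argument to its best-matching key point (if the score
    clears the threshold) and return key points sorted by prevalence,
    i.e. by the number of matching arguments."""
    counts = {kp: 0 for kp in key_points}
    for arg in arguments:
        best_kp = max(key_points, key=lambda kp: jaccard(arg, kp))
        if jaccard(arg, best_kp) >= threshold:
            counts[best_kp] += 1
    return sorted(counts.items(), key=lambda kv: -kv[1])
```

For example, three municipal-survey comments about park benches and traffic noise would collapse into two key points with prevalence counts 2 and 1, which is the quantitative aspect the abstract argues plain textual summaries lack.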

Proceedings ArticleDOI
24 Nov 2003
TL;DR: Experiments on different genres of musical video, and comparisons with summaries based only on the music track or the video track, indicate that the proposed method produces summaries that better match users' expectations.
Abstract: In this paper, we propose a novel approach to automatically summarize musical videos. The proposed summarization scheme is different from current methods for video summarization. The musical video is separated into the musical and visual tracks. A music summary is created by analyzing the music content based on music features, an adaptive clustering algorithm and musical domain knowledge. Then, shots are detected and clustered in the visual track. Finally, the music video summary is created by aligning the music summary and the clustered video shots. Subjective studies by experienced users have been conducted to evaluate the quality of summarization. Experiments on different genres of musical video, and comparisons with summaries based only on the music track or the video track, indicate that the proposed method produces summaries that better match users' expectations.

18 citations
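The visual-track half of the pipeline above (detect shots, cluster them, then align representatives against the music summary's duration) can be sketched as follows. The paper's shot features and adaptive clustering algorithm are not specified here, so this sketch assumes shots are already reduced to small feature vectors and uses a simple greedy threshold clustering; every name, threshold, and the one-representative-per-cluster policy are assumptions.

```python
def euclid(a, b):
    # Euclidean distance between two equal-length feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cluster_shots(features, threshold):
    """Greedy clustering: each shot joins the first cluster whose seed
    (first member) is within the distance threshold, else it starts a
    new cluster. Returns clusters as lists of shot indices."""
    clusters = []
    for i, f in enumerate(features):
        for c in clusters:
            if euclid(features[c[0]], f) <= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def align_to_music(clusters, shot_durations, target_duration):
    """Pick one representative shot per cluster, in order, until the
    combined duration covers the music-summary length."""
    picked, total = [], 0.0
    for c in clusters:
        if total >= target_duration:
            break
        rep = c[0]
        picked.append(rep)
        total += shot_durations[rep]
    return picked
```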

Proceedings ArticleDOI
21 Mar 2011
TL;DR: Evaluation with the pyramid method indicates that adding a corpus-specific vocabulary to traditional summarization methods improves performance, but not significantly, while the results show that the state-of-the-art summarization method LexRank is not feasible for scientific corpus summarization because of its high computational cost.
Abstract: In this paper, we investigate four approaches to scientific corpus summarization when only gold-standard keyterms are available. MEAD with its built-in default vocabulary, MEAD with a corpus-specific vocabulary extracted by the Keyphrase Extraction Algorithm (KEA), LexRank (a state-of-the-art summarization algorithm based on random walks) and W3SS (a summarization algorithm based on keyword density) are tested on two Computer Science research paper collections. We use a content evaluation method, the pyramid method, instead of the well-known ROUGE metrics, since no gold-standard summaries are available for our data. Evaluation with the pyramid method indicates that adding a corpus-specific vocabulary to the traditional summarization methods improves performance, but not significantly. On the other hand, visual inspection shows that current content evaluation methods, which use only the gold-standard keyterm information, are not intuitive, and the focus must shift toward better evaluation techniques, especially for the multi-document summarization problem. Even though the pyramid method looks for important keyterms in the resulting summaries, it cannot distinguish between a general introductory sentence about the area and a specific sentence on the core idea if they both contain the same keyterm. Our results also show that the state-of-the-art summarization method LexRank is not feasible for scientific corpus summarization because of its high computational cost.

18 citations
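LexRank, the random-walk baseline discussed above, scores each sentence by its centrality in a sentence-similarity graph. A minimal sketch, assuming cosine similarity over term-frequency vectors and a PageRank-style power iteration (parameter values are illustrative): note the pairwise similarity matrix, which makes the method quadratic in the number of sentences — the source of the computational cost the abstract reports for large scientific corpora.

```python
from collections import Counter
import math

def _tf(s):
    return Counter(s.lower().split())

def _cos(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def lexrank(sentences, threshold=0.1, damping=0.85, iters=50):
    """Rank sentences by centrality: build a similarity graph with
    edges above the threshold, row-normalize it into a stochastic
    matrix, and run power iteration. Returns indices, best first."""
    n = len(sentences)
    vecs = [_tf(s) for s in sentences]
    adj = [[1.0 if i != j and _cos(vecs[i], vecs[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    for row in adj:                       # row-normalize; dangling
        s = sum(row)                      # rows become uniform
        for j in range(n):
            row[j] = row[j] / s if s else 1.0 / n
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n
                  + damping * sum(adj[j][i] * scores[j] for j in range(n))
                  for i in range(n)]
    return sorted(range(n), key=lambda i: -scores[i])
```

Sentences that share vocabulary with many others receive high scores, while an off-topic sentence ends up isolated in the graph and ranks last.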

Proceedings ArticleDOI
29 Aug 2013
TL;DR: A novel technique for generating the summarization of domain specific text from a single Web document by using statistical NLP techniques on the text in a reference corpus and on the web document is presented.
Abstract: Due to the exponential growth of web data, tools and mechanisms for automatic summarization of Web documents have become critical. Web data can be accessed from multiple sources, e.g., different Web pages, which makes searching for relevant pieces of information a difficult task. Therefore, an automatic summarizer is vital for reducing human effort. Text summarization is an important activity in the analysis of high volumes of text documents and is currently a major research topic in Natural Language Processing. It is the process of generating a summary of an input document by extracting its representative sentences. In this paper, we present a novel technique for summarizing domain-specific text from a single Web document by applying statistical NLP techniques to the text of a reference corpus and of the web document. The proposed summarizer generates a summary based on a calculated Sentence Weight (SW), which combines the rank of a sentence in the document's content, the number of terms and words in the sentence, and term frequency in the input corpus.

18 citations
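A minimal sketch of the Sentence Weight idea above. The paper's exact SW formula is not given here, so this version assumes a simple combination of two of the listed signals — average reference-corpus term frequency and sentence position in the document — and the function names, the multiplicative combination, and the position scoring are all illustrative assumptions.

```python
from collections import Counter

def sentence_weight(sentence, position, n_sentences, corpus_tf):
    """Hypothetical Sentence Weight (SW): the average reference-corpus
    frequency of the sentence's words, scaled by how early the
    sentence appears in the document."""
    words = sentence.lower().split()
    if not words:
        return 0.0
    tf_score = sum(corpus_tf.get(w, 0) for w in words) / len(words)
    position_score = 1.0 - position / n_sentences  # earlier = higher
    return tf_score * position_score

def summarize(document_sentences, reference_corpus, k=2):
    """Score every sentence, keep the top k, and return them in
    original document order."""
    corpus_tf = Counter(w for s in reference_corpus
                        for w in s.lower().split())
    scored = [(sentence_weight(s, i, len(document_sentences), corpus_tf), i, s)
              for i, s in enumerate(document_sentences)]
    top = sorted(scored, reverse=True)[:k]
    return [s for _, i, s in sorted(top, key=lambda t: t[1])]
```

Returning the selected sentences in document order, rather than score order, keeps the extractive summary readable.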


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations (85% related)
Ontology (information science): 57K papers, 869.1K citations (84% related)
Web page: 50.3K papers, 975.1K citations (83% related)
Recurrent neural network: 29.2K papers, 890K citations (83% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52