Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have appeared within this topic, receiving 71,850 citations.


Papers
Journal Article
TL;DR: This thesis proposes a two-stage framework for summarizing multi-topic Web sites, develops a classification approach for key sentence extraction in the cluster summarization stage, and demonstrates that the proposed clustering summarization approach significantly outperforms the single-topic summarization approach for any given Web site summarization task.
Abstract: Web site summarization, which identifies the essential content covered in a given Web site, plays an important role in Web information management. However, straightforward summarization of an entire Web site, which is large and has diverse content, may lead to a summary heavily biased toward a subset of the main topics covered in the target Web site. In this thesis, we propose a two-stage framework for effective summarization of multi-topic Web sites. The first stage identifies the main topics covered in a Web site and the second stage summarizes each topic separately. In order to identify the different topics covered in a Web site, we perform both text- and link-based clustering. In text-based clustering, we investigate the impact of document representation and feature selection on the clustering quality. In link-based clustering, we study co-citation and bibliographic coupling. We demonstrate that text-based clustering based on the selection of features with high variance over Web pages is reliable and that outgoing links can be used to improve the clustering quality if a rich set of cross links is available. Each individual cluster computed above is summarized using an extraction-based summarization system, which extracts key phrases and key sentences from source documents to generate a summary. The performance of such an extraction-based Web site summarization system depends on its underlying key phrase extraction method. Hence, we conduct a user study to investigate five alternative key phrase extraction methods. Results show that the best method combines linguistic constraints with frequency over the corpus adjusted to take into account nesting of terms. Another important component in an extraction-based summarization system is key sentence extraction. To this end, we design and develop a classification approach in the cluster summarization stage. The classifier uses statistical and linguistic features to determine the topical significance of each sentence. Finally, we evaluate the proposed system via a user study. We demonstrate that the proposed clustering summarization approach significantly outperforms the single-topic summarization approach for any given Web site summarization task.
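The abstract above mentions selecting features with high variance over Web pages before text-based clustering. As a rough illustration only, and not the thesis's actual implementation, the sketch below ranks terms by the variance of their relative frequency across pages. The function name `select_high_variance_terms`, the whitespace tokenizer, and the use of relative term frequency as the document representation are all assumptions made for this example.

```python
from collections import Counter
from statistics import pvariance


def select_high_variance_terms(pages, top_k):
    """Return the top_k terms whose relative frequency varies most across pages.

    Hypothetical helper: the tokenizer and weighting are illustrative choices.
    """
    # Naive whitespace tokenization (an assumption, not the source's method).
    tokenized = [page.lower().split() for page in pages]
    vocab = sorted({tok for toks in tokenized for tok in toks})

    # Relative term frequency of every vocabulary term on every page.
    freqs = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty pages
        freqs.append({term: counts[term] / total for term in vocab})

    # Population variance of each term's frequency over all pages.
    variance = {term: pvariance([f[term] for f in freqs]) for term in vocab}

    # Highest variance first; ties broken alphabetically for determinism.
    ranked = sorted(vocab, key=lambda term: (-variance[term], term))
    return ranked[:top_k]


pages = ["the cat cat cat", "the dog dog dog"]
selected = select_high_variance_terms(pages, top_k=2)
```

In this toy input, "the" has the same relative frequency on both pages, so its variance is zero and it is ranked last; terms concentrated on a few pages rank first, which is the intuition behind using variance to find topic-discriminating features.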

1 citation

Journal Article
TL;DR: This paper proposes a multi-document summarization method based on the concept co-occurrence model that extracts the summary according to the weight and similarity of sentences; experimental results show that the system is effective and feasible.
Abstract: In this paper, we propose a multi-document summarization method based on the concept co-occurrence model. The method uses HowNet to obtain the concepts of words and construct a concept vector space model (CVSM); uses the concept co-occurrence frequency and a lexical attraction and repulsion model to construct the concept co-occurrence model; and uses the concept co-occurrence model and the CVSM to compute the weight of concept sentences. The summary is extracted according to the weight and similarity of sentences. The experimental results show that the system is effective and feasible.

1 citation

Proceedings Article
01 Oct 2015
TL;DR: It is demonstrated that the accuracy of this scoring process can be improved by looking beyond the text found within each input news story, and that summarization performance can be greatly enhanced if it also considers signals and cues from other related news stories.
Abstract: One common approach to single-document news summarization involves scoring and ranking individual sentences within an input story. We demonstrate that the accuracy of this scoring process can be improved by looking beyond the text found within each input news story. Leveraging an external corpus of past news articles, we show that summarization performance can be greatly enhanced if we also consider signals and cues from other related news stories. Working on top of a basic keyword-based summarization system, we expand the set of keywords drawn from the original news stories with keywords from related stories retrieved from the external corpus. With this enhancement, we obtain significant improvements of at least 10% and 16% in ROUGE-1 and ROUGE-2 respectively.
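To make the idea in this abstract concrete, here is a minimal sketch of keyword-based sentence scoring with keyword expansion from related stories. It is not the authors' system: the stopword list, the frequency-based keyword picker, the hit-count scoring, and the names `top_keywords` and `summarize` are all assumptions, and the related stories are simply passed in rather than retrieved from an external corpus.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "for", "by"}


def tokens(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(text, n):
    """The n most frequent tokens; ties are broken alphabetically."""
    counts = Counter(tokens(text))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:n]}


def summarize(story, related_stories, n_keywords=5, n_sentences=1):
    """Pick the sentences of `story` with the most keyword hits.

    Keywords come from the story itself and are expanded with the top
    keywords of each related story (assumed to be retrieved elsewhere).
    """
    keywords = top_keywords(story, n_keywords)
    for related in related_stories:
        keywords |= top_keywords(related, n_keywords)

    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", story) if s.strip()]
    # (score, position, sentence); score counts keyword occurrences.
    scored = [
        (sum(1 for w in tokens(s) if w in keywords), i, s)
        for i, s in enumerate(sentences)
    ]
    best = sorted(scored, key=lambda item: (-item[0], item[1]))[:n_sentences]
    # Return the selected sentences in their original order.
    return [s for _, _, s in sorted(best, key=lambda item: item[1])]


story = "The storm hit the coast as the storm grew. Relief funds arrived."
related = ["Relief funds: relief agencies released funds."]
baseline = summarize(story, [], n_keywords=2)
expanded = summarize(story, related, n_keywords=2)
```

With only the story's own keywords, the first sentence scores highest; once keywords from the related story are added, the second sentence overtakes it. This shows how outside signals can change the ranking, though the size of any real improvement depends on the retrieval and scoring details the abstract does not specify.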

1 citation

Journal Article
TL;DR: This paper constructs a model to explain the relationship between text granularities, proposes a way to measure text similarity based on semantic relations, and adopts a dynamic sentence allocation strategy to produce the summary.

1 citation


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52