SciSpace (formerly Typeset)
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published on this topic, receiving 71,850 citations.


Papers
Book Chapter · DOI
TL;DR: Wang et al. propose a query expansion method that combines multiple query expansion methods to better represent query information and, at the same time, makes a useful attempt at applying query expansion to manifold ranking.
Abstract: Manifold ranking has been successfully applied in query-oriented multi-document summarization. It makes use not only of the relationships among the sentences but also of the relationships between the given query and the sentences. However, the information in the original query is often insufficient, so we present a query expansion method that is combined with manifold ranking to resolve this problem. Our method not only uses the query terms themselves and the WordNet knowledge base to expand the query with synonyms, but also uses information from the document set itself to expand the query in several ways (mean expansion, variance expansion and TextRank expansion). Compared with previous query expansion methods, our method combines multiple query expansion methods to better represent query information, and at the same time makes a useful attempt at applying query expansion to manifold ranking. In addition, we use the degree of word overlap and the proximity between words to calculate the similarity between sentences. We performed experiments on the DUC 2006 and DUC 2007 datasets, and the evaluation results show that the proposed query expansion method can significantly improve system performance and make our system comparable to state-of-the-art systems.

2 citations
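
To make the combination of query expansion and manifold ranking concrete, here is a minimal sketch under stated assumptions, not the authors' system. A small hand-made synonym table stands in for WordNet, sentence similarity is plain word overlap (the proximity feature and the mean, variance and TextRank expansions are not reproduced), and the query enters only through the prior vector y rather than as a graph node. It then runs the usual iterative manifold-ranking update f ← αSf + (1 - α)y. The function names, the synonym table and α = 0.85 are hypothetical choices.

```python
import math

def tokenize(text):
    # Lowercase whitespace tokenization; punctuation stripped from word edges.
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]

def expand_query(query, synonyms):
    # Hypothetical expansion: add synonyms from a hand-made table
    # (a stand-in for WordNet lookups).
    terms = set(tokenize(query))
    for t in list(terms):
        terms.update(synonyms.get(t, []))
    return terms

def overlap_sim(a, b):
    # Word-overlap similarity normalized by the geometric mean of set sizes.
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def manifold_rank(sentences, query_terms, alpha=0.85, iters=100):
    sets = [set(tokenize(s)) for s in sentences]
    n = len(sets)
    # Affinity matrix W with a zero diagonal.
    W = [[overlap_sim(sets[i], sets[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    # Symmetric normalization S = D^(-1/2) W D^(-1/2).
    d = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    # Prior y: relevance of each sentence to the expanded query.
    y = [overlap_sim(s, query_terms) for s in sets]
    f = y[:]
    for _ in range(iters):
        # Spread scores over the sentence graph, pulled back toward the prior.
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f
```

For example, expanding "car sales" with {"car": ["automobile"]} lets a sentence mentioning only "automobile sales" receive a query prior, while a sentence sharing no words with the expanded query and no edges in the graph scores 0.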

Proceedings Article · DOI
10 Feb 2023
TL;DR: In this article, a prototype-driven continuous summarization algorithm (PDSum) is proposed for evolving multi-document sets stream summarization; it builds a lightweight prototype of each document set and exploits it to adapt to new documents while preserving accumulated knowledge from previous documents.
Abstract: Summarizing text-rich documents has long been studied in the literature, but most existing efforts have focused on summarizing a static, predefined multi-document set. With the rapid development of online platforms for generating and distributing text-rich documents, there is an urgent need to continuously summarize dynamically evolving multi-document sets, in which the composition of documents and sets changes over time. This is especially challenging because the summarization must be not only effective in incorporating relevant, novel, and distinctive information from each concurrent multi-document set, but also efficient enough to serve online applications. In this work, we propose a new summarization problem, Evolving Multi-Document sets stream Summarization (EMDS), and introduce PDSum, a novel unsupervised algorithm based on the idea of prototype-driven continuous summarization. PDSum builds a lightweight prototype of each multi-document set and exploits it to adapt to new documents while preserving accumulated knowledge from previous documents. To produce updated summaries, the most representative sentences for each multi-document set are extracted by measuring their similarities to the prototypes. A thorough evaluation on real multi-document set streams demonstrates that PDSum outperforms state-of-the-art unsupervised multi-document summarization algorithms for EMDS in terms of relevance, novelty, and distinctiveness, and that it is robust to various evaluation settings.

2 citations
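
The abstract describes PDSum only at a high level, so the following is a toy illustration of the general idea it names, not PDSum itself: keep one lightweight prototype per document set, fold new documents into it without discarding earlier ones, and extract the sentences most similar to that prototype. Everything concrete here is an assumption: bag-of-words vectors, an exponential-moving-average update with decay = 0.5, cosine similarity, a distinctiveness penalty against other sets' prototypes, and the hypothetical class and method names.

```python
import math
from collections import Counter

def bow(text):
    # Bag-of-words term counts; lowercased, with edge punctuation stripped.
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return Counter(w for w in words if w)

def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class PrototypeSummarizer:
    """Hypothetical prototype-per-set summarizer (not the published PDSum)."""

    def __init__(self, decay=0.5):
        self.decay = decay      # weight kept for the accumulated prototype
        self.prototypes = {}    # set_id -> {term: weight}

    def update(self, set_id, new_docs):
        if not new_docs:
            return
        # Average the new documents' term counts into one batch vector.
        batch = Counter()
        for doc in new_docs:
            batch.update(bow(doc))
        batch = {t: c / len(new_docs) for t, c in batch.items()}
        old = self.prototypes.get(set_id)
        if old is None:
            self.prototypes[set_id] = batch
            return
        # Moving average: blend accumulated knowledge with the new batch.
        keys = set(old) | set(batch)
        self.prototypes[set_id] = {
            t: self.decay * old.get(t, 0.0) + (1 - self.decay) * batch.get(t, 0.0)
            for t in keys
        }

    def summarize(self, set_id, sentences, k=1, distinct_weight=0.5):
        proto = self.prototypes[set_id]
        others = [p for sid, p in self.prototypes.items() if sid != set_id]

        def score(sent):
            v = bow(sent)
            relevance = cosine(v, proto)
            # Penalize sentences that also resemble other sets' prototypes.
            shared = max((cosine(v, p) for p in others), default=0.0)
            return relevance - distinct_weight * shared

        return sorted(sentences, key=score, reverse=True)[:k]
```

The moving average is one simple way to "preserve accumulated knowledge" while adapting to new documents; a smaller decay makes the prototype follow recent documents more closely.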

Proceedings Article · DOI
03 Oct 2013
TL;DR: A multi-document scanning mechanism that simulates the human reading process is proposed; it models human memory of words, associations between words, and three cognitive processes invoked during reading.
Abstract: Automatic text summarization is an important and useful research area in natural language processing and information retrieval. Most current approaches to text summarization do not make full use of the human reading process. This paper proposes a multi-document scanning mechanism that simulates the human reading process. The mechanism simulates human memory of words, associations between words, and three cognitive processes invoked during reading. Changes in human memory of topic words during reading are used to denote sentence significance, and sentences are then ordered and extracted on that basis to form a summary. Experiments on the DUC 2007 test data show that our proposed method is efficient and outperforms two baseline methods.

2 citations
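
The abstract only outlines the cognitive model, so this is a loose sketch of its central idea (sentence significance measured by changes in the reader's memory of topic words) under an assumed activation model, not the paper's mechanism. Each remembered word decays by a fixed factor per sentence read, is boosted when it appears, and spreads part of that boost to associated words; a sentence's score is the resulting change in total topic-word activation. The decay, boost and spread values, the association table, and the function name are hypothetical, and the paper's three cognitive processes are not modeled individually.

```python
def tokenize(text):
    # Lowercase whitespace tokenization; punctuation stripped from word edges.
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]

def memory_scores(sentences, topic_words, associations,
                  decay=0.8, boost=1.0, spread=0.3):
    """Score each sentence by the change it causes in topic-word activation."""
    memory = {}   # word -> current activation level
    scores = []
    for sent in sentences:
        before = sum(memory.get(w, 0.0) for w in topic_words)
        # Forgetting: all remembered words decay as the next sentence is read.
        memory = {w: a * decay for w, a in memory.items()}
        for w in tokenize(sent):
            memory[w] = memory.get(w, 0.0) + boost
            # Association: reading w partially activates related words.
            for assoc in associations.get(w, []):
                memory[assoc] = memory.get(assoc, 0.0) + spread * boost
        after = sum(memory.get(w, 0.0) for w in topic_words)
        scores.append(after - before)
    return scores
```

Ordering sentences by these scores and taking the top few would give an extractive summary; a sentence that mentions no topic words (directly or through an association) scores negative because topic-word memory only decays while it is read.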

26 Jun 2017
TL;DR: Mapping section names in scholarly articles provided relatively good coverage for a large number of articles in the TREC Genomics collection.
Abstract: Table 13: Mapping of section names in scholarly articles
Section name variants → Mapped section name
Abstract → Abstract
Introduction; Background → Introduction
Research Design and Methods; Materials and Methods; Methods → Method
Preliminary Results; Early Results; Results; Result → Result
Conclusion; Summary → Conclusion
Although the section name variants in Table 13 may not be exhaustive and may not include every variation under which a particular section name appears, this mapping strategy provided relatively good coverage for a large number of articles in the TREC Genomics collection. Table 14 indicates the results of the mapping process on the three TREC 2006 Genomics

2 citations
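
The section-name mapping in Table 13 amounts to a lookup from heading variants to a canonical section name. A small sketch, assuming case-insensitive exact matching after trimming whitespace and a trailing colon: the variant lists follow Table 13, while the normalization rule, the None result for unmapped headings, and the names `SECTION_MAP` and `canonical_section` are illustrative assumptions.

```python
# Variant -> canonical section name, following the variants listed in Table 13.
SECTION_MAP = {
    "abstract": "Abstract",
    "introduction": "Introduction",
    "background": "Introduction",
    "research design and methods": "Method",
    "materials and methods": "Method",
    "methods": "Method",
    "preliminary results": "Result",
    "early results": "Result",
    "results": "Result",
    "result": "Result",
    "conclusion": "Conclusion",
    "summary": "Conclusion",
}

def canonical_section(heading):
    # Assumed normalization: trim spaces and a trailing colon, then lowercase.
    key = heading.strip().rstrip(":").strip().lower()
    return SECTION_MAP.get(key)   # None if the variant is not in the table

def coverage(headings):
    # Fraction of headings that the mapping recognizes.
    mapped = sum(1 for h in headings if canonical_section(h) is not None)
    return mapped / len(headings) if headings else 0.0
```

As the excerpt notes, the variant list is not exhaustive, so headings such as "Acknowledgments" fall through unmapped; `coverage` gives a quick per-collection measure of how often that happens.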

Proceedings Article
01 Aug 2018
TL;DR: This paper proposes an extractive multi-document summarization approach based on an ant colony system that optimizes the information coverage of summary sentences, and achieves the best scores on several ROUGE metrics.
Abstract: This paper proposes an extractive multi-document summarization approach based on an ant colony system to optimize the information coverage of summary sentences. The implemented system was evaluated on both the English and Arabic versions of the Text Analysis Conference (TAC) 2011 MultiLing Pilot corpus using ROUGE metrics. The evaluation results are promising compared with those of the participating systems; indeed, our system achieved the best scores on several ROUGE metrics.

2 citations
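
The abstract does not spell out the ant colony formulation, so the following is a generic ant colony system sketch for extractive selection, not the published system. Assumptions: coverage is the number of distinct words covered, the summary must fit a word budget, pheromone sits on sentences rather than on edges (a simplification), each ant picks the next sentence by the ACS pseudo-random proportional rule using pheromone × (coverage gain per word)^β, and only the best-so-far summary receives the global pheromone deposit. Parameter values and function names are illustrative.

```python
import random

def tokenize(text):
    # Lowercase whitespace tokenization; punctuation stripped from word edges.
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]

def acs_summarize(sentences, budget, n_ants=10, n_iters=30,
                  beta=2.0, rho=0.1, q0=0.9, seed=0):
    rng = random.Random(seed)
    sent_words = [set(tokenize(s)) for s in sentences]
    lengths = [len(tokenize(s)) for s in sentences]
    tau0 = 1.0
    tau = [tau0] * len(sentences)   # pheromone per sentence (node-based)
    best, best_cov = [], -1
    for _ in range(n_iters):
        for _ in range(n_ants):
            chosen, used, covered = [], 0, set()
            while True:
                # Candidates that fit the remaining budget and add new words.
                cands = []
                for i in range(len(sentences)):
                    if i in chosen or lengths[i] == 0 or used + lengths[i] > budget:
                        continue
                    gain = len(sent_words[i] - covered)
                    if gain > 0:
                        cands.append((i, tau[i] * (gain / lengths[i]) ** beta))
                if not cands:
                    break
                if rng.random() < q0:
                    # Exploitation: take the most attractive candidate.
                    i = max(cands, key=lambda c: c[1])[0]
                else:
                    # Exploration: roulette-wheel choice proportional to weight.
                    r = rng.random() * sum(w for _, w in cands)
                    acc, i = 0.0, cands[-1][0]
                    for j, w in cands:
                        acc += w
                        if acc >= r:
                            i = j
                            break
                chosen.append(i)
                used += lengths[i]
                covered |= sent_words[i]
                # Local update nudges pheromone back toward tau0.
                tau[i] = (1 - rho) * tau[i] + rho * tau0
            if len(covered) > best_cov:
                best, best_cov = chosen[:], len(covered)
        # Global update: reinforce only the best-so-far summary.
        for i in best:
            tau[i] = (1 - rho) * tau[i] + rho * best_cov
    return sorted(best), best_cov
```

For instance, with the three sentences "Rain fell in the north.", "Rain fell in the north today." and "Floods closed roads in the south." and an 11-word budget, the colony settles on the first and third sentences, which together cover 9 distinct words. A real system would measure coverage over concepts or weighted n-grams rather than raw words.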


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  74
2022  160
2021  52
2020  61
2019  47
2018  52