Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Proceedings Article
01 Aug 2017
TL;DR: This paper develops an extractive summarization system for Indonesian parliamentary meeting minutes using rule-based information extraction with regular expressions, and achieves a ROUGE-2 F-measure of 0.718.
Abstract: Meeting minutes contain many important decisions and facts from a meeting. Because meeting minutes are unstructured documents, summarization is needed to make the main information easy to obtain. Some work in this research area has been done for meeting minutes in English, but none has yet been conducted for Indonesian. Therefore, this paper aims to develop an extractive summarization system for Indonesian parliamentary meeting minutes using rule-based information extraction with regular expressions. The summary structure is defined based on summaries from the House of Representatives of the Republic of Indonesia. Regular expressions are designed to recognize patterns in meeting minutes and then to fill 24 slot values in summary templates. Our summarizer consists of four main processes: preprocessing, information extraction, postprocessing, and template filling. Our experimental results, evaluated with the ROUGE summarization metrics, achieve a ROUGE-2 F-measure of 0.718.
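The four-stage pipeline described in this abstract (preprocessing, information extraction, postprocessing, template filling) can be illustrated with a minimal sketch. The slot names, regular expressions, template, and sample text below are all hypothetical; the paper's actual 24 slots and Indonesian-language patterns are not listed here.

```python
import re

# Hypothetical slot patterns (assumption: the paper's real 24 slots are not shown).
SLOT_PATTERNS = {
    "date": re.compile(r"Date:\s*(\d{1,2} \w+ \d{4})"),
    "chair": re.compile(r"Chaired by:\s*(.+)"),
    "agenda": re.compile(r"Agenda:\s*(.+)"),
}

# Hypothetical summary template with one placeholder per slot.
TEMPLATE = "Meeting on {date}, chaired by {chair}. Agenda: {agenda}."


def preprocess(text):
    """Collapse runs of spaces/tabs and strip each line."""
    return "\n".join(re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())


def extract_slots(text):
    """Return the first match for each slot, or None when its pattern does not match."""
    slots = {}
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        slots[name] = match.group(1).strip() if match else None
    return slots


def postprocess(slots):
    """Fill missing slots with a placeholder and drop trailing periods."""
    return {name: (value.rstrip(".") if value else "unknown") for name, value in slots.items()}


def fill_template(slots):
    """Insert slot values into the summary template."""
    return TEMPLATE.format(**slots)


def summarize_minutes(minutes):
    """Run the four stages in order."""
    return fill_template(postprocess(extract_slots(preprocess(minutes))))


if __name__ == "__main__":
    sample = "Date:   12 March 2015\nChaired by:  A. Example\nAgenda: Budget review.\n"
    print(summarize_minutes(sample))
```

A design note on this sketch: keeping the patterns in a single dictionary means a new slot needs only one pattern and one template placeholder, which is one plausible way to scale toward a larger slot inventory.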

2 citations

Journal Article
TL;DR: In this paper, a text document is compressed by a text summarization system into a new form that conveys the core idea of its content.

2 citations

Proceedings Article
01 May 2004
TL;DR: The process of developing a taxonomy of cohesion problems and corrective revision operators that address such problems is described, as well as an annotation schema for a corpus of 240 extractive, multi-document summaries that have been manually revised to promote cohesion.
Abstract: Multi-document summaries produced via sentence extraction often suffer from a number of cohesion problems, including dangling anaphora, sudden shifts in topic, and incorrect or awkward chronological ordering. Therefore, the development of an automated revision process to correct such problems is a research area of current interest. We present the RevisionBank, a corpus of 240 extractive, multi-document summaries that have been manually revised to promote cohesion. The summaries were revised by six linguistics students using a constrained set of revision operations that we previously developed. In the current paper, we describe the process of developing a taxonomy of cohesion problems and corrective revision operators that address such problems, as well as an annotation schema for our corpus. Finally, we discuss how our taxonomy and corpus can be used for the study of revision-based multi-document summarization as well as for summary evaluation.

2 citations

Journal Article
29 Jun 2018
TL;DR: This paper proposes an inter- and intra-cluster approach consisting of four weighted criteria functions (coherence, coverage, diversity, and inter-cluster analysis), optimized using SaDE (Self-Adaptive Differential Evolution) to get the best summary result.
Abstract: Multi-document summarization is a more challenging task than single-document summarization because of its larger space and the differing content of each document. Hence, some optimization algorithms consider criteria such as relevancy, content coverage, and diversity in producing the best summary. Those weighted criteria are based on the assumption that the documents are already located in the same cluster. However, under certain conditions the documents span many categories, and this needs to be considered too. In this paper, we propose an inter- and intra-cluster approach consisting of four weighted criteria functions (coherence, coverage, diversity, and inter-cluster analysis), optimized using SaDE (Self-Adaptive Differential Evolution) to get the best summary result. The proposed method therefore deals not only with the compactness of each cluster but also with the separation between clusters. Experimental results on the Text Analysis Conference (TAC) 2008 datasets yield better summaries, with average ROUGE-1 precision, recall, and F-measure of 0.77, 0.07, and 0.12, compared to another method that considers only intra-cluster analysis.
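Several entries on this page report ROUGE-N precision, recall, and F-measure. As a rough illustration of how those three numbers relate, here is a simplified sketch of ROUGE-N based on clipped n-gram overlap. It assumes whitespace tokenization, lowercasing, and a single reference summary; the official ROUGE toolkit offers further options (such as stemming and stopword removal) that are not reproduced here, so scores from this sketch are not comparable to published figures.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) from clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = rouge_n("the cat sat", "the cat sat on the mat", n=1)
    print(f"ROUGE-1 precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case every candidate unigram appears in the reference but the candidate covers only half of it, so precision is high and recall is lower; the F-measure is the harmonic mean of the two.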

2 citations

Proceedings Article
01 Dec 2016
TL;DR: This paper studied the effectiveness of event elements at the event level on the CEC corpus, and found that recall and precision were better than those of many other methods and that the average F value of this method can be raised to 0.63, which better generalizes the text content.
Abstract: Traditional automatic summarization suffers from information redundancy and incomplete content coverage, and mainstream automatic summarization has now turned toward extracting words. This paper studies the effectiveness of event elements at the event level. First, the event elements are obtained from the tagged CEC corpus; then an event-element network is built and the importance of each node in the network is calculated; finally, concise summary sentences are selected and the text summary is output in the order of the original text. Experiments on the CEC corpus show better recall and precision than many other methods, and the average F value of this method can be raised to 0.63, which better generalizes the text content.
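The steps in this abstract (build an event-element network, score its nodes, select sentences, output them in original order) can be sketched as follows. This is only an illustration under stated assumptions: event elements are taken as given per sentence (the paper obtains them from CEC annotations), edges link elements that co-occur in a sentence, and node importance is approximated by weighted degree. The paper's actual network construction and importance measure are not specified here.

```python
from collections import defaultdict
from itertools import combinations


def build_network(sentence_elements):
    """Link elements that co-occur in a sentence; edge weight counts co-occurrences."""
    graph = defaultdict(lambda: defaultdict(int))
    for elements in sentence_elements:
        for a, b in combinations(sorted(set(elements)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def node_importance(graph):
    """Weighted degree of each node (an assumed stand-in for the paper's measure)."""
    return {node: sum(neighbors.values()) for node, neighbors in graph.items()}


def summarize_events(sentences, sentence_elements, k=2):
    """Pick the k highest-scoring sentences and return them in original order."""
    importance = node_importance(build_network(sentence_elements))
    scores = [sum(importance.get(e, 0) for e in set(elems)) for elems in sentence_elements]
    # Ties are broken in favor of the earlier sentence.
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sentences = [
        "An earthquake struck the city on Monday.",
        "Rescue teams arrived.",
        "The earthquake injured 20 people in the city.",
        "Officials thanked volunteers.",
    ]
    # Hypothetical hand-tagged event elements, one list per sentence.
    elements = [
        ["earthquake", "city", "Monday"],
        ["rescue teams"],
        ["earthquake", "city", "20 people"],
        ["officials", "volunteers"],
    ]
    for line in summarize_events(sentences, elements, k=2):
        print(line)
```

Weighted degree is the simplest centrality choice; other measures could be substituted in `node_importance` without changing the rest of the sketch.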

2 citations


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52