scispace - formally typeset
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Proceedings Article
01 Aug 2016
TL;DR: MDSWriter is a novel open-source annotation tool for creating multi-document summarization corpora; it divides the complex summarization task into multiple steps, which enables it to efficiently guide the annotators, store all their intermediate results, and record user-system interaction data.
Abstract: In this paper, we present MDSWriter, a novel open-source annotation tool for creating multi-document summarization corpora. A major innovation of our tool is that we divide the complex summarization task into multiple steps, which enables us to efficiently guide the annotators, store all their intermediate results, and record user-system interaction data. This allows for evaluating the individual components of a complex summarization system and learning from the human writing process. MDSWriter is highly flexible and can be adapted to various other tasks.

9 citations

Proceedings Article
Xiaojun Wan
01 Dec 2012
TL;DR: Evaluation results on the most recent TAC2011 dataset demonstrate that the proposed co-ranking method can outperform the original co-ranking method and other baselines.
Abstract: Update summarization is an emerging summarization task of creating a short summary of a set of news articles, under the assumption that the user has already read a given set of earlier articles. In this paper, we propose a new co-ranking method to address the update summarization task. The proposed method integrates two co-ranking processes by adding strict constraints. In comparison with the original co-ranking method, the proposed method can compute more accurate scores of sentences for the purpose of update summarization. Evaluation results on the most recent TAC2011 dataset demonstrate that our proposed method can outperform the original co-ranking method and other baselines.
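The abstract does not give enough detail to reproduce the co-ranking method itself. As a rough illustration of the update setting only, the Python sketch below ranks sentences from the new articles with a PageRank-style graph centrality and then down-weights each sentence by its highest similarity to sentences the user has already read. This is a simpler stand-in, not Wan's co-ranking method: the bag-of-words cosine similarity, the damping factor, the linear novelty penalty and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def update_scores(new_sents, old_sents, damping=0.85, iters=50):
    """Hypothetical sketch: graph centrality on new sentences, penalized by overlap with old ones."""
    new_vecs = [Counter(s.lower().split()) for s in new_sents]
    old_vecs = [Counter(s.lower().split()) for s in old_sents]
    n = len(new_vecs)
    # Weighted similarity graph among the new sentences (no self-loops).
    w = [[cosine(new_vecs[i], new_vecs[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    score = [1.0 / n] * n
    # Power iteration for weighted PageRank; isolated sentences keep only the teleport mass.
    for _ in range(iters):
        score = [
            (1 - damping) / n
            + damping * sum(score[j] * w[j][i] / out_weight[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    # Novelty: 1 minus the highest similarity to any already-read sentence.
    novelty = [1.0 - max((cosine(v, o) for o in old_vecs), default=0.0) for v in new_vecs]
    return [s * nov for s, nov in zip(score, novelty)]


new = ["the storm hit the coast",
       "the storm hit the coast again",
       "officials announced new evacuation orders"]
old = ["the storm hit the coast"]
scores = update_scores(new, old)
print(max(range(len(new)), key=scores.__getitem__))  # prints 2
```

In this toy input the first new sentence repeats an already-read sentence, so its score falls to roughly zero, and the unrelated sentence about evacuation orders ranks highest even though it is poorly connected in the graph.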

9 citations

Patent
16 Dec 2002
TL;DR: A document information processing apparatus generates reduced intermediate information from a document and adds extracted word information to it, yielding summary information that is small yet still contains every word, so it can be used for full-text searching.
Abstract: In a document information processing apparatus, intermediate information is generated based on document information created by a document creation application; the intermediate information contains the same character information as the document information and is used to reduce the amount of the document information. Word information contained in the document information or in the intermediate information is extracted, and summary information is generated by adding the extracted word information to the intermediate information, which has been reduced in amount as needed. The generated summary information not only has a small data volume but also contains all the word information, and is therefore usable for a searching process using character information, such as full-text searching.
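The idea of keeping a reduced copy of a document searchable by appending its word information can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the patented apparatus: treating the intermediate information as a plain truncation to max_chars and tokenizing words with a regular expression are assumptions, and make_search_summary is a hypothetical name.

```python
import re


def make_search_summary(text: str, max_chars: int = 200) -> str:
    """Hypothetical sketch: shrink the text, then re-append every word the shrinking dropped."""
    # Word information from the full document, de-duplicated in first-seen order.
    words = list(dict.fromkeys(re.findall(r"\w+", text.lower())))
    # Stand-in for the reduced intermediate information: a plain truncation.
    reduced = text[:max_chars]
    kept = set(re.findall(r"\w+", reduced.lower()))
    # Append only the words lost by the reduction, so full-text search still matches them.
    missing = [w for w in words if w not in kept]
    return reduced + ("\n" + " ".join(missing) if missing else "")


doc = "alpha beta gamma delta alpha epsilon"
print(make_search_summary(doc, max_chars=10))
# prints:
# alpha beta
# gamma delta epsilon
```

If the truncation cuts a word in half, the fragment does not match the full word, so the whole word is appended and remains findable.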

9 citations

Proceedings Article
01 Dec 2015
TL;DR: This paper comprehensively compares word and paragraph embedding methods for spoken document summarization and proposes a novel summarization framework that takes both relevance and redundancy information into account simultaneously.
Abstract: Representation learning has emerged as a newly active research subject in many machine learning applications because of its excellent performance. As an instantiation, word embedding has been widely used in the natural language processing area. However, as far as we are aware, there are relatively few studies investigating paragraph embedding methods in extractive text or speech summarization. Extractive summarization aims at selecting a set of indicative sentences from a source document to express the most important theme of the document. There is a general consensus that relevance and redundancy are both critical issues for users in a realistic summarization scenario. However, most existing methods determine only the relevance degree between sentences and a given document, while the redundancy degree is calculated in a separate post-processing step. Based on these observations, this paper makes three contributions. First, we comprehensively compare word and paragraph embedding methods for spoken document summarization. Second, we propose a novel summarization framework that takes both relevance and redundancy information into account simultaneously, so that a set of representative sentences can be selected automatically in a one-pass process. Third, we plug paragraph embedding methods into the proposed framework to further enhance summarization performance. Experimental results demonstrate the effectiveness of the proposed methods compared to existing state-of-the-art methods.
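Weighing relevance against redundancy within a single greedy selection pass can be illustrated with Maximal Marginal Relevance (MMR), a classic technique swapped in here for illustration; it is not the framework proposed in the paper. The sketch assumes bag-of-words vectors in place of word or paragraph embeddings, uses the whole document as the relevance target, and lam is a hypothetical trade-off weight.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words vector (stand-in for a word or paragraph embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, k, lam=0.5):
    """Greedy MMR: trade relevance to the document against redundancy with picks so far."""
    doc = bow(" ".join(sentences))
    vecs = [bow(s) for s in sentences]
    selected, candidates = [], list(range(len(sentences)))

    def marginal(i):
        relevance = cosine(vecs[i], doc)
        redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected


sents = ["the model improves summarization",
         "the model improves summarization",
         "speech data is noisy"]
print(mmr_select(sents, k=2))  # prints [0, 2]: the duplicate is skipped
```

The outcome is sensitive to lam: on the same input with lam=0.7, relevance dominates and the duplicate sentence wins the second slot, returning [0, 1].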

9 citations

Proceedings Article
27 Aug 2014
TL;DR: A novel multi-document summarization approach that summarizes news documents by incorporating relevant pictures to improve the readability of the summary, built on a unified semantic link network over concepts, sentences and pictures.
Abstract: As the information explosion becomes more serious, effective and efficient multi-document summarization techniques are increasingly necessary. Previous document summarization approaches mainly focus on text, and the poor readability of their summaries prevents wide practical use. This paper proposes a novel multi-document summarization approach that summarizes news documents by incorporating relevant pictures to improve the readability of the summary. We construct a unified semantic link network on concepts, sentences and pictures, and then propose a mutual reinforcement network method to calculate the saliency scores of the concepts, pictures and sentences simultaneously. An Integer Linear Programming (ILP) model is used to select the important, closely related and succinct sentences and pictures. Experiments show that our approach can generate more readable and understandable summaries.
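A text-only simplification of the two steps in this abstract can be sketched in Python: HITS-style mutual reinforcement between sentences and concepts, followed by exact sentence selection under a word budget. Several assumptions apply: pictures are left out, concepts are simply lowercase words, the selection objective (total weight of covered concepts) is one common choice rather than the paper's ILP formulation, and brute-force search over subsets replaces an ILP solver, which is only practical for a handful of sentences. The function names are hypothetical.

```python
import itertools
import math


def mutual_reinforcement(sentences, iters=30):
    """HITS-style iteration: salient concepts occur in salient sentences, and vice versa."""
    sent_words = [set(s.lower().split()) for s in sentences]
    concepts = sorted(set().union(*sent_words))  # assumption: a concept is just a word
    s_score = [1.0] * len(sentences)
    c_score = {}
    for _ in range(iters):
        c_score = {c: sum(s for s, ws in zip(s_score, sent_words) if c in ws) for c in concepts}
        c_norm = math.sqrt(sum(v * v for v in c_score.values())) or 1.0
        c_score = {c: v / c_norm for c, v in c_score.items()}
        s_score = [sum(c_score[c] for c in ws) for ws in sent_words]
        s_norm = math.sqrt(sum(v * v for v in s_score)) or 1.0
        s_score = [v / s_norm for v in s_score]
    return s_score, c_score


def select_exact(sentences, c_score, budget):
    """Pick the sentence subset within `budget` words that covers the most concept weight."""
    lengths = [len(s.split()) for s in sentences]
    words = [set(s.lower().split()) for s in sentences]
    best, best_val = (), -1.0
    # Exhaustive search finds the optimum of this objective, but is exponential in size.
    for r in range(len(sentences) + 1):
        for combo in itertools.combinations(range(len(sentences)), r):
            if sum(lengths[i] for i in combo) > budget:
                continue
            covered = set().union(*(words[i] for i in combo))
            value = sum(c_score[c] for c in covered)
            if value > best_val:
                best, best_val = combo, value
    return list(best)


sents = ["flood hits city", "city flood damage reported", "team wins cup"]
s_score, c_score = mutual_reinforcement(sents)
print(select_exact(sents, c_score, budget=4))  # prints [1]
```

The two flood sentences reinforce each other through the shared concepts "flood" and "city", so the isolated sports sentence ends up with a near-zero score; with a four-word budget only one sentence fits, and the longer flood sentence covers more concept weight.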

9 citations


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations (85% related)
Ontology (information science): 57K papers, 869.1K citations (84% related)
Web page: 50.3K papers, 975.1K citations (83% related)
Recurrent neural network: 29.2K papers, 890K citations (83% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (83% related)
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  74
2022  160
2021  52
2020  61
2019  47
2018  52