Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published on this topic, receiving 71,850 citations.


Papers
Journal Article
TL;DR: The conceptual graph (CG) formalism as proposed by Sowa is modified and extended to represent the concepts and their relationships in the documents to generate an objective summary of all relevant documents.
Abstract: In this paper we propose a methodology to mine concepts from documents and use these concepts to generate an objective summary of all relevant documents. We use the conceptual graph (CG) formalism as proposed by Sowa to represent the concepts and their relationships in the documents. In the present work we have modified and extended the definition of the concept given by Sowa. The modified and extended definition is discussed in detail in section 2 of this paper. A CG of a set of relevant documents can be considered as a semantic network. The semantic network is generated by automatically extracting a CG for each document and merging them into one. We discuss (i) generation of the semantic network using CGs and (ii) generation of the multi-document summary. Here we use restricted Boltzmann machines, a deep learning technique, for automatically extracting CGs. We have tested our methodology using the MultiLing 2015 corpus. We have obtained encouraging results, which are comparable to those from state-of-the-art systems.

4 citations
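
The merging step mentioned in the abstract above (one graph per document, combined into a single semantic network) can be illustrated with a minimal sketch. This is not the authors' method: the triple representation `(concept, relation, concept)`, the function name `merge_concept_graphs`, and the sample triples are all assumptions made for illustration, and the restricted Boltzmann machine extraction step is not shown.

```python
from collections import Counter

def merge_concept_graphs(graphs):
    """Merge per-document graphs into one network.

    Each graph is an iterable of (concept, relation, concept) triples
    (a hypothetical encoding). The result maps every triple to the
    number of documents that contain it.
    """
    network = Counter()
    for triples in graphs:
        # Count a triple once per document, even if it repeats inside it.
        network.update(set(triples))
    return network

# Hypothetical triples for two documents.
doc_a = [("cat", "agent", "chase"), ("chase", "object", "mouse")]
doc_b = [("cat", "agent", "chase"), ("cat", "attribute", "black")]
network = merge_concept_graphs([doc_a, doc_b])
```

Triples shared by several documents receive higher counts, which a summarizer could use as a rough signal of importance.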

Proceedings Article
15 May 2019
TL;DR: This work hybridizes three components, viz. clustering, word graphs, and neural networks.
Abstract: In this work, we aim to develop an abstractive summarization system in the multi-document setup. The main challenge in this kind of system is the identification of redundant information. Our approach hybridizes three components, viz. clustering, word graphs, and neural networks. In clustering, all the information from multiple documents is divided amongst clusters based on context and importance analysis, such that each cluster contains sentences of a similar context (redundancy identification). Further, shortest-path detection in word graphs reduces the text. Along with that, we use sequence-to-sequence sentence compression and perform paraphrasing using a supervised recurrent neural network to generate an almost completely abstractive summary. Experiments on the DUC 2004 dataset indicate that the proposed system outperforms other systems in terms of metrics like ROUGE[1] and BLEU[2].

4 citations
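
The word-graph step in the abstract above (shortest-path detection over a cluster of similar sentences) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: words are merged by their lowercase surface form, an edge's cost is assumed to be the inverse of how often the word pair occurs, and the clustering and neural components are omitted. The names `build_word_graph` and `shortest_path` are hypothetical.

```python
import heapq
from collections import defaultdict

START, END = "<s>", "</s>"

def build_word_graph(sentences):
    """Build a directed graph whose edges link adjacent words."""
    counts = defaultdict(int)
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        for a, b in zip(tokens, tokens[1:]):
            counts[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), n in counts.items():
        graph[a][b] = 1.0 / n  # assumption: frequent transitions are cheap
    return graph

def shortest_path(graph, source=START, target=END):
    """Dijkstra's algorithm; returns the words on the cheapest path."""
    heap = [(0.0, [source])]
    seen = set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            return path[1:-1]  # drop the start and end markers
        if node in seen:
            continue
        seen.add(node)
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in seen:
                heapq.heappush(heap, (cost + weight, path + [nxt]))
    return []

cluster = [
    "storm hits coast on monday",
    "storm hits coast",
    "big storm hits northern coast",
]
compressed = " ".join(shortest_path(build_word_graph(cluster)))
```

For this toy cluster the cheapest path keeps the words shared by most sentences, so `compressed` is "storm hits coast". With real text, repeated function words create shortcuts through the graph, which is why practical systems add constraints such as a minimum path length.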

Proceedings Article
01 May 2020
TL;DR: This paper proposes GameWikiSum, a new domain-specific dataset for multi-document summarization, which is one hundred times larger than commonly used datasets, and in another domain than news.
Abstract: Today’s research progress in the field of multi-document summarization is obstructed by the small number of available datasets. Since the acquisition of reference summaries is costly, existing datasets contain only hundreds of samples at most, resulting in heavy reliance on hand-crafted features or necessitating additional, manually annotated data. The lack of large corpora therefore hinders the development of sophisticated models. Additionally, most publicly available multi-document summarization corpora are in the news domain, and no analogous dataset exists in the video game domain. In this paper, we propose GameWikiSum, a new domain-specific dataset for multi-document summarization, which is one hundred times larger than commonly used datasets, and in another domain than news. Input documents consist of long professional video game reviews as well as references of their gameplay sections in Wikipedia pages. We analyze the proposed dataset and show that both abstractive and extractive models can be trained on it. We release GameWikiSum for further research: https://github.com/Diego999/GameWikiSum.

4 citations

01 Jan 2008
TL;DR: FastSum is a fast query-based multi-document summarizer based solely on word-frequency features of clusters, documents and topics; it relies on a minimal set of features, leading to fast processing times: 1250 news documents in 60 seconds.
Abstract: We present a fast query-based multi-document summarizer called FastSum based solely on word-frequency features of clusters, documents and topics. Summary sentences are ranked by a regression SVM. The summarizer does not use any expensive NLP techniques such as parsing, tagging of names or even part of speech information. Still, the achieved accuracy is comparable to the best systems presented in recent academic competitions (i.e., Document Understanding Conference (DUC)). Because of a detailed feature analysis using Least Angle Regression (LARS), FastSum can rely on a minimal set of features leading to fast processing times: 1250 news documents in 60 seconds.

4 citations
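
The kind of word-frequency features described in the FastSum abstract above can be sketched as below. The three features and their exact definitions are assumptions chosen for illustration, not the features selected in the paper, and the regression SVM that ranks sentences is not shown. The function name `frequency_features` and the sample texts are hypothetical.

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def frequency_features(sentence, cluster_docs, topic):
    """Return (cluster_freq, doc_freq, topic_overlap) for one sentence.

    Assumes cluster_docs is a non-empty list of non-empty documents.
    """
    words = tokenize(sentence)
    if not words:
        return (0.0, 0.0, 0.0)
    cluster_counts = Counter(w for d in cluster_docs for w in tokenize(d))
    total = sum(cluster_counts.values())
    doc_sets = [set(tokenize(d)) for d in cluster_docs]
    topic_words = set(tokenize(topic))
    # Mean relative frequency of the sentence's words in the whole cluster.
    cluster_freq = sum(cluster_counts[w] for w in words) / (total * len(words))
    # Mean fraction of documents that contain each word.
    doc_freq = sum(sum(w in s for s in doc_sets) for w in words) / (
        len(doc_sets) * len(words)
    )
    # Fraction of the sentence's words that also appear in the topic/query.
    topic_overlap = sum(w in topic_words for w in words) / len(words)
    return (cluster_freq, doc_freq, topic_overlap)

docs = ["flood hits city", "city flood damage"]
features = frequency_features("flood hits city", docs, "flood damage")
```

Each sentence's feature tuple would then be passed to a trained regressor, and the highest-scoring sentences selected for the summary. Because the features need only tokenization and counting, no parser or tagger is involved.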


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52