Multi-document summarization

About: Multi-document summarization is a research topic. To date, 2,270 publications have been published within this topic, receiving 71,850 citations in total.


Papers
Journal Article
TL;DR: This work proposes two new original datasets collected from two Italian news websites with multi-sentence summaries and corresponding articles, and from a dataset obtained by machine translation of a Spanish summarization dataset. Comparisons with models trained on automatically translated datasets demonstrated the superiority of the models obtained from the proposed datasets.
Abstract: Text summarization aims to produce a short summary containing relevant parts from a given text. Due to the lack of data for abstractive summarization on low-resource languages such as Italian, we propose two new original datasets collected from two Italian news websites with multi-sentence summaries and corresponding articles, and from a dataset obtained by machine translation of a Spanish summarization dataset. These two datasets are currently the only two available in Italian for this task. To evaluate the quality of these two datasets, we used them to train a T5-base model and an mBART model, obtaining good results with both. To better evaluate the results obtained, we also compared the same models trained on automatically translated datasets, and the resulting summaries in the same training language, with the automatically translated summaries, which demonstrated the superiority of the models obtained from the proposed datasets.

1 citation

Journal Article
TL;DR: This paper analyzes the regularities of information overlap among the articles about the same Wikipedia entry written in different languages and introduces a hypothesis that the structure of this information overlap is similar to the information overlap structure (pyramid model) used in summarization evaluation.
Abstract: Wikipedia is used as a training corpus for many information selection tasks: summarization, question-answering, etc. The information presented in Wikipedia articles as well as the order in which this information is presented, is treated as the gold standard and is used for improving the quality of information selection systems. However, the Wikipedia articles corresponding to the same entry (person, location, event, etc.) written in different languages have substantial differences regarding what information is included in these articles. In this paper we analyze the regularities of information overlap among the articles about the same Wikipedia entry written in different languages: some information facts are covered in the Wikipedia articles in many languages, while others are covered only in a few languages. We introduce a hypothesis that the structure of this information overlap is similar to the information overlap structure (pyramid model) used in summarization evaluation, as well as the information o...

1 citation
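
The overlap structure described in the abstract above (some facts covered in many language editions, others in only a few) can be illustrated with a small sketch. This is not the authors' method: the function name `build_pyramid`, the fact labels, and the idea of representing each language edition as a set of fact identifiers are all assumptions made for illustration. Each fact is weighted by the number of editions that mention it, and facts with the same weight form one tier of the pyramid.

```python
from collections import Counter, defaultdict


def build_pyramid(facts_by_language):
    """Group facts into tiers by how many language editions mention them.

    facts_by_language: dict mapping a language code to an iterable of
    fact identifiers (hypothetical labels, assumed already extracted).
    Returns a dict mapping weight -> set of facts with that weight.
    """
    weights = Counter()
    for facts in facts_by_language.values():
        # set() ensures a fact is counted once per language edition.
        weights.update(set(facts))

    tiers = defaultdict(set)
    for fact, weight in weights.items():
        tiers[weight].add(fact)
    return dict(tiers)


# Hypothetical example data, not taken from the paper.
example = {
    "en": {"birth_year", "major_award", "hobby"},
    "de": {"birth_year", "major_award"},
    "fr": {"birth_year"},
}
pyramid = build_pyramid(example)
```

In this toy input, the top tier holds the fact shared by all three editions and the bottom tier holds the fact found in only one, which mirrors the many-versus-few pattern the abstract describes.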

Journal Article
TL;DR: This paper summarizes the main automatic abstracting research methods and strategies and divides the methods into three major categories: extraction-based summarization, summarization based on information extraction, and summarization based on understanding.
Abstract: This paper summarizes the main research methods and strategies for automatic abstracting and divides them into three major categories: extraction-based summarization, summarization based on information extraction, and summarization based on understanding. The extraction-based method extracts important sentences from the article to form a digest; the information-extraction method extracts information from the article to fill a prepared framework and then uses a template to output the content; the understanding-based method uses natural language processing technology to generate abstracts. The paper focuses on extraction-based summarization of single-topic and multi-topic articles. After comparing the advantages and disadvantages of a variety of algorithms, a new multi-topic classification method is proposed.

1 citation
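
The first category in the abstract above, forming a digest from important sentences extracted from the article, can be sketched as follows. This is a minimal illustration and not the algorithm from the paper: scoring sentences by average word frequency is an assumed, commonly used heuristic, and the names `split_sentences` and `extract_summary` are hypothetical.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z']+")


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_summary(text, k=2):
    """Return the k highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole text (assumed importance signal).
    freq = Counter(WORD_RE.findall(text.lower()))

    def score(sentence):
        tokens = WORD_RE.findall(sentence.lower())
        # Average frequency, so long sentences are not favored by length alone.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)[:k]
    # Restore document order so the digest reads naturally.
    return [sentences[i] for i in sorted(ranked)]


digest = extract_summary("Cats sleep. Cats sleep a lot. Dogs bark.", k=2)
```

A real system would add stop-word filtering and redundancy control, both of which matter more when sentences come from several documents on the same topic.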

Book Chapter
24 Nov 2013
TL;DR: This study proposes a novel approach, built upon a hierarchical topic model, for automatic evaluation of sentence ordering; the model automatically evaluates sentences to find a plausible order in which to arrange them, producing a more readable summary.
Abstract: Sentence ordering is a difficult but very important task in multi-document summarization. With the aim of producing a coherent and legible summary for multiple documents, this study proposes a novel approach that is built upon a hierarchical topic model for automatic evaluation of sentence ordering. By learning topic correlations from the topic hierarchies, this model is able to automatically evaluate sentences to find a plausible order to arrange them for generating a more readable summary. The experimental results demonstrate that our proposed approach can improve the summarization performance and present a significant enhancement on the sentence ordering for multi-document summarization. In addition, the experimental results show that our model can automatically analyze the topic relationships to infer a strategy for sentence ordering. Human evaluations confirm that the generated summaries, which implement this strategy, demonstrate good linguistic performance in terms of coherence, readability, and redundancy.

1 citation

Proceedings Article
01 Jan 2003

1 citation


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52