scispace - formally typeset
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Posted Content
TL;DR: This work develops SemSentSum, a fully data-driven model that leverages both universal and domain-specific sentence embeddings by building a sentence semantic relation graph; the authors state it is the first to use multiple sentence embeddings for the task of multi-document summarization.
Abstract: Linking facts across documents is a challenging task, as the language used to express the same information in a sentence can vary significantly, which complicates the task of multi-document summarization. Consequently, existing approaches heavily rely on hand-crafted features, which are domain-dependent and hard to craft, or additional annotated data, which is costly to gather. To overcome these limitations, we present a novel method, which makes use of two types of sentence embeddings: universal embeddings, which are trained on a large unrelated corpus, and domain-specific embeddings, which are learned during training. To this end, we develop SemSentSum, a fully data-driven model able to leverage both types of sentence embeddings by building a sentence semantic relation graph. SemSentSum achieves competitive results on two types of summary, consisting of 665 bytes and 100 words. Unlike other state-of-the-art models, neither hand-crafted features nor additional annotated data are necessary, and the method is easily adaptable for other tasks. To our knowledge, we are the first to use multiple sentence embeddings for the task of multi-document summarization.
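The idea of a sentence relation graph can be illustrated with a minimal sketch. It is not the SemSentSum implementation: the cosine similarity measure, the 0.5 threshold, the toy two-dimensional vectors, and the function names `cosine` and `build_relation_graph` are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_relation_graph(embeddings, threshold=0.5):
    """Return a weighted adjacency matrix over sentences.

    An edge links two sentences when their embedding similarity
    reaches the (assumed) threshold; weaker pairs stay unconnected.
    """
    n = len(embeddings)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(embeddings[i], embeddings[j])
            if sim >= threshold:
                adj[i][j] = adj[j][i] = sim
    return adj

# Toy embeddings (hypothetical): sentences 0 and 1 are near-paraphrases.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
graph = build_relation_graph(toy_embeddings)
```

In this toy graph, sentences 0 and 1 are connected while sentence 2 is isolated, which is the kind of cross-sentence link a graph-based summarizer can exploit.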

9 citations

Journal ArticleDOI
TL;DR: A novel ranking framework for social context summarization that combines intra-relation and inter-relation, which integrate the support of local and social information in a mutual reinforcement form; the framework was extensively evaluated on two datasets.
Abstract: A novel ranking framework for social context summarization is proposed. The framework relies on the reinforcement support of social information. 14 features in two groups: distance and statistical are proposed. A new open-domain dataset is created and manually annotated. Combining intra-relation and inter-relation benefits the summarization. Traditional summarization methods only use the internal information of a Web document while ignoring its social information such as tweets from Twitter, which can provide a perspective viewpoint for readers towards an event. This paper proposes a framework named SoRTESum to take advantage of social information such as document content reflection to extract summary sentences and social messages. In order to do that, the summarization was formulated in two steps: scoring and ranking. In the scoring step, the score of a sentence or social message is computed by using intra-relation and inter-relation which integrate the support of local and social information in a mutual reinforcement form. To calculate these relations, 16 features are proposed. After scoring, the summarization is generated by selecting top m ranked sentences and social messages. SoRTESum was extensively evaluated on two datasets. Promising results show that: (i) SoRTESum obtains significant improvements of ROUGE-scores over state-of-the-art baselines and competitive results with the learning to rank approach trained by RankBoost and (ii) combining intra-relation and inter-relation benefits single-document summarization.
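The two-step scoring and ranking formulation can be sketched as follows. This is a simplified, single-pass illustration and not SoRTESum itself: Jaccard word overlap stands in for the paper's features, and the equal weighting (`weight=0.5`) and the names `jaccard` and `rank_top_m` are assumptions.

```python
def jaccard(a, b):
    """Word-overlap similarity between two strings (stand-in feature)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def rank_top_m(group, other_group, m, weight=0.5):
    """Score each item by intra-relation (support from its own group)
    and inter-relation (support from the other group), then keep the top m."""
    scored = []
    for i, item in enumerate(group):
        intra = max((jaccard(item, o) for j, o in enumerate(group) if j != i),
                    default=0.0)
        inter = max((jaccard(item, o) for o in other_group), default=0.0)
        scored.append((weight * intra + (1 - weight) * inter, i))
    # Highest score first; ties keep the original order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [group[i] for _, i in scored[:m]]

# Hypothetical document sentences and social messages.
sentences = ["the storm hit the coast",
             "officials opened shelters",
             "the weather was mild last year"]
tweets = ["storm hit the coast hard", "shelters opened by officials"]

summary_sentences = rank_top_m(sentences, tweets, m=2)
summary_tweets = rank_top_m(tweets, sentences, m=1)
```

Sentences echoed by the tweets rise to the top, and tweets are ranked the same way against the sentences, which mirrors the two directions of support described in the abstract.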

9 citations

Book ChapterDOI
15 Dec 2009
TL;DR: An integrated model for the automatic text summarization problem is introduced that exploits the advantages of different techniques: a diversity-based method, which can filter similar sentences and select the most diverse ones, and a swarm-based method, which differentiates between the most important and less important features.
Abstract: Automatic text summarization systems aim to make their summaries closer to human summaries. Creating a summary under redundancy and summary length constraints is a challenging problem. An automatic text summarization system that exploits the advantages of different techniques in the form of an integrated model could produce a good summary of the original document. In this paper, we introduce an integrated model for the automatic text summarization problem; we exploit the advantages of different techniques in building our model, such as the diversity-based method, which can filter similar sentences and select the most diverse ones, and the swarm-based method, which differentiates between the most important and less important features. The experimental results showed that our model achieved the best performance among all methods used in this study.

9 citations

Book ChapterDOI
18 Apr 2009
TL;DR: This paper models each individual document as a graph and generates a query-specific summary for it; these individual summaries are then intelligently combined to produce the final summary.
Abstract: In this paper, we address the problem of generating a query-specific extractive summary in an efficient manner for a given set of documents. In many of the current solutions, the entire collection of documents is modeled as a single graph which is used for summary generation. Unlike these approaches, in this paper, we model each individual document as a graph and generate a query-specific summary for it. These individual summaries are then intelligently combined to produce the final summary. This approach greatly reduces the computational complexity.
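The summarize-each-document-then-combine pipeline can be sketched in a few lines. This is only an illustration under stated assumptions: plain query word overlap replaces the paper's per-document graph model, the combination step is a simple near-duplicate filter, and the names `summarize_document` and `combine` and the `max_overlap=0.6` cutoff are hypothetical.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())

def summarize_document(sentences, query, k=1):
    """Pick the k sentences of one document that share most words with the query.
    (Stand-in for a graph-based, query-specific summarizer.)"""
    q = tokens(query)
    ranked = sorted(sentences, key=lambda s: len(tokens(s) & q), reverse=True)
    return ranked[:k]

def combine(summaries, max_overlap=0.6):
    """Merge per-document summaries, skipping near-duplicate sentences."""
    final = []
    for summary in summaries:
        for sent in summary:
            t = tokens(sent)
            if not t:
                continue  # ignore empty sentences
            if all(len(t & tokens(f)) / len(t | tokens(f)) <= max_overlap
                   for f in final):
                final.append(sent)
    return final

# Hypothetical three-document collection.
docs = [
    ["solar panels cut energy costs", "the mayor attended the event"],
    ["solar panels are popular", "wind farms also cut energy costs"],
    ["solar panels cut energy costs", "rain is expected"],
]
query = "wind energy costs"
per_doc = [summarize_document(d, query) for d in docs]
final_summary = combine(per_doc)
```

Each document is processed independently, so no graph over the whole collection is needed; the duplicate sentence from the third document is dropped at the combination step.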

9 citations

01 Jan 2003
TL;DR: This paper puts forward two new summarization-by-context algorithms: the first uses both the content and the context, and the second relies only on the elements of the context.
Abstract: This paper addresses the issue of Web document summarization. We define the context of a Web document as the set of pieces of information extracted from the content of all the documents linked to it. We put forward two new summarization-by-context algorithms. The first one uses both the content and the context and the second one relies only on the elements of the context. It is shown that summaries based on the context are usually much more relevant than those only made from the content of the target. Optimal conditions on the size of the content and the context of the document to yield the best summaries are studied.

9 citations


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  74
2022  160
2021  52
2020  61
2019  47
2018  52