Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Journal ArticleDOI
TL;DR: A pattern-based model for generic multi-document summarization is presented, which exploits closed patterns to extract the most salient sentences from a document collection and reduce redundancy in the summary.
Abstract: There are two main categories of multi-document summarization: term-based and ontology-based methods. A term-based method cannot deal with the problems of polysemy and synonymy. An ontology-based approach addresses such problems by taking into account the semantic information of document content, but constructing the ontology requires considerable manual effort. To overcome these open problems, this paper presents a pattern-based model for generic multi-document summarization, which exploits closed patterns to extract the most salient sentences from a document collection and reduce redundancy in the summary. Our method calculates the weight of each sentence in a document collection by accumulating the weights of the closed patterns that cover it, and iteratively selects the sentence with the highest weight and the least similarity to the previously selected sentences, until the length limit is reached. Calculating sentence weights from patterns reduces dimensionality and captures more relevant information. Our method combines the advantages of the term-based and ontology-based models while avoiding their weaknesses. Empirical studies on the benchmark DUC2004 datasets demonstrate that our pattern-based method significantly outperforms the state-of-the-art methods. Multi-document summarization can be used to extract a particular individual's opinions, in the form of closed patterns, from the documents this individual shares in social networks, and hence provides a useful tool for further analyzing the individual's behavior and influence in group activities.

54 citations
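
The selection loop described in the abstract above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: closed-pattern mining is not shown, so the `patterns` mapping (term set to weight) is assumed to be given; the names `sentence_weight`, `select_sentences`, and `max_sim`, the whitespace tokenizer, and the Jaccard similarity cut-off are all assumptions introduced here to stand in for the paper's unspecified redundancy handling.

```python
from typing import Dict, FrozenSet, List


def sentence_weight(terms: FrozenSet[str], patterns: Dict[FrozenSet[str], float]) -> float:
    """Sum the weights of all patterns fully contained in the sentence's terms."""
    return sum(w for pattern, w in patterns.items() if pattern <= terms)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two term sets (assumed redundancy measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_sentences(
    sentences: List[str],
    patterns: Dict[FrozenSet[str], float],
    max_words: int,
    max_sim: float = 0.5,  # hypothetical redundancy threshold
) -> List[str]:
    """Greedily pick high-weight sentences that are not too similar to earlier picks."""
    term_sets = [frozenset(s.lower().split()) for s in sentences]
    weights = [sentence_weight(t, patterns) for t in term_sets]
    # Visit sentences from highest to lowest weight (ties keep document order).
    order = sorted(range(len(sentences)), key=lambda i: -weights[i])
    chosen: List[int] = []
    used_words = 0
    for i in order:
        n_words = len(sentences[i].split())
        if used_words + n_words > max_words:
            continue  # would exceed the summary length limit
        if any(jaccard(term_sets[i], term_sets[j]) > max_sim for j in chosen):
            continue  # too similar to an already selected sentence
        chosen.append(i)
        used_words += n_words
    return [sentences[i] for i in chosen]
```

With patterns weighting `{"storm", "coast"}` at 3.0 and `{"rescue"}` at 2.0, two near-duplicate storm sentences would compete for one slot, and the second would be skipped by the similarity check in favour of a rescue sentence.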

Proceedings ArticleDOI
Lee-Feng Chien1
01 Jul 1995
TL;DR: The proposed approach is an integrated and efficient text access method that performs well both in exact match searching of Boolean queries and in best match searching (ranking) of quasi-natural language queries, and is capable of retrieving gigabytes of Chinese texts very efficiently and intelligently.
Abstract: This paper presents an efficient signature file approach for fast and intelligent retrieval of large Chinese full-text document databases. The proposed approach is an integrated and efficient text access method, which performs well both in exact match searching of Boolean queries and best match searching (ranking) of quasi-natural language queries. Using this approach, the inherent difficulties of Chinese word segmentation and proper noun identification can be effectively reduced, queries can be expressed with non-controlled vocabulary, and the ranking function can be easily implemented without demanding extra space overhead or affecting the retrieval efficiency. The experimental results show that the proposed approach achieves good performance in many ways, especially in the reduction of false drops and space overhead, the speedup of retrieval time, and the capability of best match searching using quasi-natural language queries. In conclusion, the proposed approach is capable of retrieving gigabytes of Chinese texts very efficiently and intelligently.

53 citations

Proceedings Article
23 Aug 2010
TL;DR: An extractive summarization model is proposed to provide an evaluation framework for the position information and results show that word position information is more effective and adaptive than sentence position information.
Abstract: Position information has been proved to be very effective in document summarization, especially in generic summarization. Existing approaches mostly consider the information of sentence positions in a document, based on a sentence position hypothesis that the importance of a sentence decreases with its distance from the beginning of the document. In this paper, we consider another kind of position information, i.e., the word position information, which is based on the ordinal positions of word appearances instead of sentence positions. An extractive summarization model is proposed to provide an evaluation framework for the position information. The resulting systems are evaluated on various data sets to demonstrate the effectiveness of the position information in different summarization tasks. Experimental results show that word position information is more effective and adaptive than sentence position information.

53 citations
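
To make the contrast in the abstract above concrete, here is a minimal sketch of the two kinds of position score. It is not the paper's model: the inverse-proportion weighting (`1 / position`), the per-sentence averaging, the whitespace tokenizer, and the function names are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import List


def sentence_position_scores(sentences: List[str]) -> List[float]:
    """Sentence position hypothesis: importance decays with distance from the start."""
    return [1.0 / (i + 1) for i in range(len(sentences))]


def word_position_scores(sentences: List[str]) -> List[float]:
    """Score each sentence by the ordinal appearance of its words.

    The k-th appearance of a word in the document contributes 1/k, so a
    sentence that introduces new words scores high wherever it occurs.
    """
    appearances = defaultdict(int)  # word -> number of appearances so far
    scores = []
    for sentence in sentences:
        words = sentence.lower().split()
        total = 0.0
        for w in words:
            appearances[w] += 1
            total += 1.0 / appearances[w]
        scores.append(total / len(words) if words else 0.0)
    return scores
```

Under these assumptions, a late sentence made up entirely of first-time words gets the maximum word-position score of 1.0, while its sentence-position score is low simply because it appears late.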

Journal ArticleDOI
TL;DR: An automatic patent summarization method for accurate knowledge abstraction and effective R&D knowledge management combining the concepts of key phrase recognition and significant information density is developed.
Abstract: Purpose – In an era of rapidly expanding digital content, the number of e‐documents and the amount of knowledge frequently overwhelm the R&D teams and often impede intellectual property management. The purpose of this paper is to develop an automatic patent summarization method for accurate knowledge abstraction and effective R&D knowledge management. Design/methodology/approach – This paper develops an integrated approach for automatic patent summary generation combining the concepts of key phrase recognition and significant information density. Significant information density is defined based on the domain‐specific key concepts/phrases, relevant phrases, title phrases, indicator phrases and topic sentences of a given patent document. Findings – The document compression ratio and the knowledge retention ratio are used to measure both quantitative and qualitative outcomes of the new summarization methodology. Both measurements indicate the significant benefits and superior results of the method. Research lim...

53 citations

Posted Content
TL;DR: A novel subset selection technique is proposed that leverages supervision in the form of human-created summaries to perform automatic keyframe-based video summarization; the method is extended to exploit semantic side information about the video's category/genre to guide the transfer process by those training videos semantically consistent with the test input.
Abstract: Video summarization has unprecedented importance to help us digest, browse, and search today's ever-growing video collections. We propose a novel subset selection technique that leverages supervision in the form of human-created summaries to perform automatic keyframe-based video summarization. The main idea is to nonparametrically transfer summary structures from annotated videos to unseen test videos. We show how to extend our method to exploit semantic side information about the video's category/genre to guide the transfer process by those training videos semantically consistent with the test input. We also show how to generalize our method to subshot-based summarization, which not only reduces computational costs but also provides more flexible ways of defining visual similarity across subshots spanning several frames. We conduct extensive evaluation on several benchmarks and demonstrate promising results, outperforming existing methods in several settings.

53 citations


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52