Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have appeared within this topic, receiving 71,850 citations.


Papers
Proceedings Article
04 Jun 2009
TL;DR: A summarizer built on a sentence position policy outperforms most systems that participated in the task-focused summarization evaluations at the Text Analysis Conference (TAC) 2008, and the method works better for short summaries than for longer ones.
Abstract: In this paper, we describe a sentence-position-based summarizer built on a sentence position policy derived from the evaluation testbed of recent summarization tasks at the Document Understanding Conferences (DUC). We show that this summarizer outperforms most systems participating in the task-focused summarization evaluations at the Text Analysis Conference (TAC) 2008. Our experiments also show that the method performs better at producing short summaries (up to 100 words) than longer ones. Further, we discuss the baselines traditionally used for summarization evaluation and suggest reviving an old baseline to suit the current summarization task at TAC: the Update Summarization task.

36 citations

22 Jan 2007
TL;DR: The Document Understanding Conference (DUC) 2005 evaluation had a single user-oriented, question-focused summarization task: to synthesize, from a set of 25-50 documents, a well-organized, fluent answer to a complex question.
Abstract: The Document Understanding Conference (DUC) 2005 evaluation had a single user-oriented, question-focused summarization task, which was to synthesize from a set of 25-50 documents a well-organized, fluent answer to a complex question. The evaluation shows that the best summarization systems have difficulty extracting relevant sentences in response to complex questions (as opposed to representative sentences that might be appropriate to a generic summary). The relatively generous allowance of 250 words for each answer also reveals how difficult it is for current summarization systems to produce fluent text from multiple documents.

36 citations

Proceedings Article
01 Dec 2012
TL;DR: This paper presents an approach to query-focused multi-document summarization that combines single-document summaries using sentence clustering, and reports an average F-measure of 0.33774 on the DUC 2002 multi-document dataset, comparable to the three best-performing systems reported on the same dataset.
Abstract: This paper presents an approach to query-focused multi-document summarization that combines single-document summaries using sentence clustering. Both syntactic and semantic similarity between sentences are used for clustering. Each single-document summary is generated using a document feature, a sentence reference index feature, a location feature, and a concept similarity feature. Sentences from the single-document summaries are clustered, and the top-ranked sentences from each cluster are used to create the multi-document summary. We observed an average F-measure of 0.33774 on the DUC 2002 multi-document dataset, which is comparable to the three best-performing systems reported on the same dataset.
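The cluster-then-select step described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: bag-of-words cosine similarity stands in for the paper's combined syntactic and semantic similarity, a greedy threshold pass stands in for whatever clustering algorithm the paper uses, and the names `cluster_sentences`, `multi_doc_summary`, and `threshold` are hypothetical.

```python
import math
import re
from collections import Counter


def vec(sentence):
    """Bag-of-words term counts (an assumed stand-in for richer features)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_sentences(sentences, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for s in sentences:
        for cluster in clusters:
            if cosine(vec(s), vec(cluster[0])) >= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])  # no similar cluster found: start a new one
    return clusters


def multi_doc_summary(single_doc_summaries, threshold=0.5):
    """Pool sentences from per-document summaries, then keep one per cluster."""
    pooled = [s for summary in single_doc_summaries for s in summary]
    clusters = cluster_sentences(pooled, threshold)

    def representative(cluster):
        # Pick the sentence most similar, in total, to its cluster mates.
        return max(cluster, key=lambda s: sum(cosine(vec(s), vec(t)) for t in cluster))

    return [representative(c) for c in clusters]


summaries = [
    ["The river flooded the town.", "Schools were closed."],
    ["The river flooded the old town.", "Schools were closed on Monday."],
]
print(multi_doc_summary(summaries))
```

Keeping one sentence per cluster is what removes the overlap between the per-document summaries; the threshold value here is arbitrary and would need tuning on real data.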

36 citations

Proceedings Article
13 Oct 1998
TL;DR: This paper develops a method for combining query relevance with information novelty in text retrieval and summarization. The clearest advantage appears in the automated construction of large document and non-redundant multi-document summaries, where MMR results are clearly superior to non-MMR passage selection.
Abstract: This paper develops a method for combining query-relevance with information-novelty in the context of text retrieval and summarization. The Maximal Marginal Relevance (MMR) criterion strives to reduce redundancy while maintaining query relevance in reranking retrieved documents and in selecting appropriate passages for text summarization. Preliminary results indicate some benefits for MMR diversity ranking in ad-hoc query and in single document summarization. The latter are borne out by the trial-run (unofficial) TREC-style evaluation of summarization systems. However, the clearest advantage is demonstrated in the automated construction of large document and non-redundant multi-document summaries, where MMR results are clearly superior to non-MMR passage selection. This paper also discusses our preliminary evaluation of summarization methods for single documents.
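The MMR criterion mentioned in this abstract is commonly written as a greedy selection that, at each step, picks the candidate maximizing λ·Sim(candidate, query) − (1 − λ)·max Sim(candidate, already selected). The sketch below illustrates that rule under assumptions: bag-of-words cosine similarity is used for both similarity functions, and the names `mmr_select`, `lam`, and `k` are illustrative rather than taken from the paper.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased term counts; an assumed, deliberately simple representation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_select(query, passages, k=2, lam=0.7):
    """Greedily pick k passages, trading query relevance against redundancy."""
    q = bag_of_words(query)
    vecs = [bag_of_words(p) for p in passages]
    selected = []
    candidates = list(range(len(passages)))

    def score(i):
        relevance = cosine(vecs[i], q)
        # Redundancy is the highest similarity to anything already chosen.
        redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [passages[i] for i in selected]


passages = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "dogs chase the cat",
]
print(mmr_select("cat mat", passages, k=2, lam=0.5))
```

With `lam=1.0` the rule reduces to plain relevance ranking and would return the duplicate passage twice; lowering `lam` penalizes the duplicate, which is the redundancy-reduction behavior the abstract describes.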

36 citations

Journal Article
TL;DR: A frequent-term-based text summarization algorithm, implemented with open-source technologies such as Java, DISCO, and Porter's stemmer, and verified on a standard text mining corpus.
Abstract: Text summarization is an important activity in the analysis of high-volume text documents. It has a number of applications, and many recent applications use it to improve text analysis and knowledge representation. In this paper, a frequent-term-based text summarization algorithm is designed and implemented in Java. The algorithm works in three steps. In the first step, the document to be summarized is processed by eliminating stop words and applying a stemmer. In the second step, term-frequency data is calculated from the document and frequent terms are selected; semantically equivalent terms are also generated for these selected words. Finally, in the third step, all sentences in the document that contain the frequent terms or their semantic equivalents are filtered for the summary. The algorithm is implemented using open-source technologies such as Java, DISCO, and Porter's stemmer, and verified on a standard text mining corpus.
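The three steps in this abstract can be sketched as below. This is an illustration under loud assumptions, not the paper's Java implementation: `crude_stem` is a toy suffix stripper standing in for Porter's stemmer, the `equivalents` dictionary is a hypothetical stand-in for the semantic lookups the paper obtains from DISCO, and the stop-word list is a tiny sample.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; a real system uses a fuller one).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "in", "to", "it", "on"}


def crude_stem(word):
    """Toy suffix stripper; a placeholder for a real Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Step 1: drop stop words, then stem what remains."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def summarize(document, top_n=1, equivalents=None):
    """Steps 2 and 3: pick frequent terms, expand them, filter sentences."""
    equivalents = equivalents or {}
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

    # Step 2: term frequencies over the whole document, plus equivalents.
    counts = Counter(terms(document))
    frequent = {term for term, _ in counts.most_common(top_n)}
    keywords = set(frequent)
    for term in frequent:
        keywords.update(crude_stem(e) for e in equivalents.get(term, ()))

    # Step 3: keep, in original order, sentences containing any keyword.
    return [s for s in sentences if keywords & set(terms(s))]


doc = (
    "Storms damaged the coast. The storm flooded roads. "
    "Officials praised volunteers. A tempest is rare here."
)
print(summarize(doc, top_n=1, equivalents={"storm": ["tempest"]}))
```

Passing an empty `equivalents` mapping drops the sentence that only mentions "tempest", which shows what the semantic-expansion step adds over plain frequency filtering.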

36 citations


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52