scispace - formally typeset
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Journal Article
TL;DR: It is argued that high-quality summaries should be compact while covering as much information in the original document as possible, and entropy and relevance are extracted as features to leverage the coverage and compactness of summaries.
Abstract: It remains a challenge to generate high-quality summaries that could concisely describe the original document without loss of information. In this paper, we argue that high-quality summaries should be compact while covering as much information in the original document as possible. Encouraged by this idea, we extract entropy and relevance to leverage the coverage and the compactness of summaries. We adopt supervised summarization methods based on regression to leverage these two features. Moreover, experiments on single- and multi-document summarization show that effectively leveraging entropy and relevance could improve the quality of document summarization.
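The entropy and relevance features described above can be sketched with simple token-level definitions (a minimal illustration, not the paper's exact formulations; the function names and scoring choices here are assumptions):

```python
import math
from collections import Counter

def entropy(sentence_tokens):
    # Shannon entropy of the sentence's word distribution:
    # a proxy for how much information the sentence carries.
    counts = Counter(sentence_tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def relevance(sentence_tokens, doc_tokens):
    # How strongly the sentence's words overlap with the
    # document's word frequencies (a coverage proxy).
    doc_counts = Counter(doc_tokens)
    return sum(doc_counts[w] for w in sentence_tokens) / len(doc_tokens)

doc = "the cat sat on the mat the cat slept".split()
s1 = "the cat sat".split()
s2 = "the the the".split()

assert entropy(s2) == 0.0          # a repeated word carries no information
assert entropy(s1) > entropy(s2)   # varied words give higher entropy
assert relevance(s1, doc) > relevance(["dog"], doc)
```

In a supervised setting such as the one the abstract describes, these two scores would be fed to a regression model trained to predict sentence importance.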

1 citation

01 Jan 2005
TL;DR: This paper proposes a multi-document summarization approach based on the physical features and logical structure of the document set, which first clusters similar sentences into several Logical Topics (LTs) and then orders these topics according to the physical features of the documents.
Abstract: With the rapid development of the Internet, multi-document summarization is becoming a very hot research topic. In order to generate a summary that can effectively characterize the original information from the documents, this paper proposes a multi-document summarization approach based on the physical features and logical structure of the document set. This method first clusters similar sentences into several Logical Topics (LTs) and then orders these topics according to the physical features of the documents. After that, sentences used for the summary are extracted from these LTs, and finally the summary is generated via certain sorting algorithms. Our experiments show that the information coverage rate of our method is 8.83% higher than that of methods based solely on logical structure, and 14.31% higher than the Top-N method.
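The sentence-clustering step can be illustrated with a greedy grouping by Jaccard similarity (a hypothetical stand-in for the paper's Logical Topic clustering; the threshold and similarity measure are assumptions):

```python
def jaccard(a, b):
    # Overlap between two token lists, as sets.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def cluster_sentences(sentences, threshold=0.3):
    # Greedily assign each sentence to the first cluster whose
    # representative (first member) is similar enough, otherwise
    # start a new cluster: each cluster approximates one "topic".
    clusters = []
    for s in sentences:
        toks = s.lower().split()
        for cl in clusters:
            if jaccard(toks, cl[0]) >= threshold:
                cl.append(toks)
                break
        else:
            clusters.append([toks])
    return clusters

sents = [
    "the storm hit the coast",
    "a storm struck the coast today",
    "markets rallied on earnings",
]
groups = cluster_sentences(sents)
assert len(groups) == 2   # two storm sentences merge, the third stands alone
```

A real system would then rank these clusters (e.g. by position or frequency features) before extracting one sentence per topic.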

1 citation

Book ChapterDOI
27 Jun 2017
TL;DR: A method that uses Latent Dirichlet Allocation (LDA) topic labels as concepts, instead of n-grams or external resources, to form a multi-document summary.
Abstract: Multi-document summarization is a difficult natural language processing task. Many extractive summarization methods consist of two steps: extract important concepts of the documents and select sentences based on those concepts. In this paper, we introduce a method that uses Latent Dirichlet Allocation (LDA) topic labels as concepts, instead of n-grams or external resources. Sentences are selected based on these topic labels in order to form a summary. Two selection methods are proposed in the paper. Experiments on the DUC2004 dataset have shown that vector-based methods are better, i.e., mapping topic labels and sentences to a word-vector and a letter-trigram vector space to find those sentences that are syntactically and semantically related to the topic labels. Experiments show that the produced summaries are informative, abstractive, and better than the baseline method.
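Assuming topic labels have already been extracted by LDA, the vector-based selection step might look like the following bag-of-words sketch (the cosine scoring and function names are illustrative, not the paper's exact word-vector or letter-trigram method):

```python
import math
from collections import Counter

def cosine(c1, c2):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(c1[w] * c2[w] for w in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def select_sentences(sentences, topic_labels, k=1):
    # Score each sentence by its similarity to the topic-label
    # bag of words and keep the top k for the summary.
    topic_vec = Counter(topic_labels)
    scored = sorted(
        sentences,
        key=lambda s: cosine(Counter(s.lower().split()), topic_vec),
        reverse=True,
    )
    return scored[:k]

labels = ["election", "vote", "ballot"]
sents = ["voters cast a ballot in the election", "the weather was sunny"]
assert select_sentences(sents, labels) == ["voters cast a ballot in the election"]
```

The paper's letter-trigram variant would replace the word Counters with character-trigram counts, which makes the match more robust to morphology.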

1 citation

Book ChapterDOI
01 Jan 2007
TL;DR: A framework for text summarization is proposed, and the possibilities as well as the challenges of using the concepts and methods of computing with words (CW) in text summarization are examined.
Abstract: The theory of “computing with words” (CW) offers mathematical tools to formally represent and reason with perceptive information, which is delivered in natural language text through imprecisely defined terms, concept classes, and chains of thinking and reasoning. It thus provides relevant methods for understanding-based text summarization systems. In this paper we propose a framework for text summarization and examine the possibilities as well as the challenges of using the concepts and methods of CW in text summarization.

1 citation

Posted Content
TL;DR: The concept of membership and aggregate ranking is applied to the problem of supervised multi-document summarization, wherein the important class of sentences is first determined using various supervised learning techniques and then post-processed using the proposed ranking measure.
Abstract: Most problems in machine learning cater to classification, and the objects of the universe are classified into a relevant class. Ranking the classified objects of the universe per decision class is a challenging problem. In this paper we propose a novel Rough Set based membership, called the Rank Measure, to solve this problem. It is utilized for ranking the elements of a particular class. It differs from the Pawlak Rough Set based membership function, which gives an equivalent characterization of the Rough Set based approximations. It becomes paramount to look beyond the traditional approach of computing memberships when handling the inconsistent, erroneous, and missing data typically present in real-world problems. This led us to propose the aggregate Rank Measure. The contribution of the paper is threefold. Firstly, it proposes a Rough Set based measure to be utilized for the numerical characterization of within-class ranking of objects. Secondly, it proposes and establishes the properties of the Rank Measure and the aggregate Rank Measure based membership. Thirdly, we apply the concept of membership and aggregate ranking to the problem of supervised multi-document summarization, wherein the important class of sentences is first determined using various supervised learning techniques and then post-processed using the proposed ranking measure. The results show a significant improvement in accuracy.
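The classical Pawlak rough membership that the Rank Measure departs from can be sketched as follows (the paper's own Rank Measure is not reproduced here; this shows only the standard membership it refines):

```python
def rough_membership(x, X, partition):
    # Pawlak rough-set membership: the fraction of x's equivalence
    # class (the block of the partition containing x) that falls
    # inside the target decision class X.
    block = next(b for b in partition if x in b)
    return len(block & X) / len(block)

# A universe {1..6} partitioned into equivalence classes by
# some (hypothetical) condition attributes.
partition = [{1, 2, 3}, {4, 5}, {6}]
X = {1, 2, 4}   # target decision class

assert rough_membership(1, X, partition) == 2 / 3
assert rough_membership(4, X, partition) == 0.5
assert rough_membership(6, X, partition) == 0.0
```

In the summarization setting the abstract describes, such membership values would be used to rank the sentences that a supervised classifier has assigned to the "important" class.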

1 citation


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52