scispace - formally typeset
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published on this topic, receiving 71,850 citations.


Papers
01 Jan 2016
TL;DR: A sentence-extraction-based automatic multi-document summarization system that employs fuzzy logic and a Genetic Algorithm (GA): the fuzzy rules are optimized with the GA, and sentences are extracted based on each sentence's fuzzy score.
Abstract: In recent times, the generation of multi-document summaries has attracted a lot of attention among researchers. Most text summarization techniques use sentence extraction, in which the salient sentences in the multiple documents are extracted and presented as a summary. In our proposed system, we have developed a sentence-extraction-based automatic multi-document summarization system that employs fuzzy logic and a Genetic Algorithm (GA). First, several features are used to determine the significance of each sentence, so that every sentence in the documents is assigned a feature score. The feature scores are then fed to a fuzzy logic system (an AI technique), in which the fuzzy inference engine decides the importance of the sentences based on the fuzzy rules. The fuzzy rules are optimized with the GA, and sentences are extracted based on each sentence's fuzzy score. A multi-document summary is created from the extracted sentences after removing redundant sentences. The experiments were performed on the DUC 2002 dataset, and the summary is evaluated with measures such as Precision, Recall, and F-measure.
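The pipeline described above (feature scores, fuzzy inference, GA-tuned rules, redundancy removal) can be sketched as follows. This is a minimal sketch under assumptions that do not come from the paper: two hypothetical features already normalized to [0, 1], triangular low/medium/high fuzzy sets, a zero-order Sugeno rule base whose output levels a simple elitist GA tunes against hypothetical reference scores, and Jaccard word overlap as the redundancy test. The abstract does not specify the actual features, rules, or GA operators.

```python
import itertools
import random

# Triangular fuzzy sets over a feature normalized to [0, 1]: (left foot, peak, right foot)
LABELS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}
N_FEATURES = 2  # hypothetical features: (position score, term-frequency score)
RULES = list(itertools.product(LABELS, repeat=N_FEATURES))  # 9 antecedent combinations


def tri(x, a, b, c):
    """Degree of membership of x in the triangular set (a, b, c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_score(features, consequents):
    """Zero-order Sugeno inference: firing-strength-weighted mean of the rule outputs."""
    num = den = 0.0
    for rule, out in zip(RULES, consequents):
        strength = 1.0  # product t-norm over the rule's antecedents
        for x, label in zip(features, rule):
            strength *= tri(x, *LABELS[label])
        num += strength * out
        den += strength
    return num / den if den > 0 else 0.0


def fitness(consequents, data):
    """Negative mean squared error against (hypothetical) reference importance labels."""
    return -sum((fuzzy_score(f, consequents) - y) ** 2 for f, y in data) / len(data)


def evolve(data, pop_size=30, generations=60, sigma=0.1, seed=0):
    """Tune the rule consequents with a simple elitist GA."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in RULES] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, data), reverse=True)
        elite = pop[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, len(RULES))
            child = a[:cut] + b[cut:]  # one-point crossover
            # Gaussian mutation on roughly 20% of genes, clipped to [0, 1]
            child = [min(1.0, max(0.0, g + rng.gauss(0, sigma))) if rng.random() < 0.2 else g
                     for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda c: fitness(c, data))


def jaccard(s, t):
    a, b = set(s.lower().split()), set(t.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def select(sentences, scores, k, max_overlap=0.5):
    """Take sentences in descending fuzzy score, skipping near-duplicates."""
    chosen = []
    for i in sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True):
        if all(jaccard(sentences[i], sentences[j]) <= max_overlap for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return [sentences[i] for i in sorted(chosen)]


# Hypothetical training pairs: (features, reference importance)
DATA = [((0.0, 0.2), 0.1), ((1.0, 0.9), 0.95), ((0.5, 0.5), 0.5), ((0.9, 0.1), 0.6), ((0.2, 0.8), 0.45)]
best = evolve(DATA)
```

Because the best half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next; `select` then applies the redundancy filter to the fuzzy scores.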

3 citations

Journal Article
TL;DR: This paper proposes applying regression models, specifically Support Vector Regression (SVR), to query-focused multi-document summarization based on the significance of sentence position.
Abstract: Document summarization is needed to obtain information effectively and efficiently. One way to summarize documents is to apply machine learning techniques. This paper proposes the application of regression models to query-focused multi-document summarization based on the significance of sentence position. The method used is Support Vector Regression (SVR), which estimates the weight of each sentence in a document set from previously defined sentence features in order to decide which sentences form the summary. A series of evaluations was performed on the DUC 2005 dataset. The resulting summaries have average precision and recall values of 0.0580 and 0.0590 with ROUGE-2, and 0.0997 and 0.1019 with ROUGE-SU4. The proposed regression model measures the significance of sentence position in the document well.
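A rough sketch of SVR-based sentence weighting follows. It assumes a linear SVR with the epsilon-insensitive loss, trained by plain subgradient descent in pure Python instead of a library solver, and two hypothetical features: an inverse-position score and query-term overlap. The paper's actual feature set, kernel, and training data are not given in the abstract.

```python
import math
import re


def words(text):
    return set(re.findall(r"\w+", text.lower()))


def position_score(index):
    """Hypothetical position feature: earlier sentences score higher."""
    return 1.0 / (index + 1)


def query_overlap(sentence, query):
    """Fraction of query terms that also occur in the sentence."""
    q = words(query)
    return len(q & words(sentence)) / len(q) if q else 0.0


def train_linear_svr(X, y, eps=0.05, lam=1e-3, lr=0.1, epochs=2000):
    """Linear SVR: minimize lam/2*||w||^2 + mean(max(0, |y - (w.x + b)| - eps))
    by full-batch subgradient descent with a 1/sqrt(t) step size."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            r = yi - (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if abs(r) > eps:  # outside the epsilon tube: slope of the loss is -sign(r)
                s = -1.0 if r > 0 else 1.0
                for j in range(d):
                    gw[j] += s * xi[j] / n
                gb += s / n
        step = lr / math.sqrt(t + 1)
        w = [wj - step * g for wj, g in zip(w, gw)]
        b -= step * gb
    return w, b


def predict(model, x):
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def rank_sentences(model, feats):
    """Indices of sentences ordered by predicted weight, highest first."""
    return sorted(range(len(feats)), key=lambda i: predict(model, feats[i]), reverse=True)


# Hypothetical training data: [position_score(i), query_overlap] -> reference sentence weight
X = [[1.0, 0.8], [0.5, 0.6], [0.33, 0.2], [0.25, 0.7], [0.2, 0.1]]
y = [0.9, 0.6, 0.3, 0.5, 0.15]
model = train_linear_svr(X, y)

query = "storm damage coast"
sentences = ["Storm damage hit the coast.", "Officials met on Monday.", "The coast road reopened."]
feats = [[position_score(i), query_overlap(s, query)] for i, s in enumerate(sentences)]
print(rank_sentences(model, feats))
```

The top-ranked indices would then be taken, up to the summary length, to build the query-focused summary.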

3 citations

Proceedings Article
01 Jan 2017
TL;DR: The knowledge-based system is able to generate coherent and topically related new sentences by using the syntactic structures and semantic features of the given documents, the knowledge base, and the reasoning engine to create abstractive summaries of texts.
Abstract: This paper describes a knowledge-based system for automatic summarization. The system creates abstractive summaries of texts by generalizing new concepts, detecting main topics, and composing new sentences. It is built on the Cyc development platform, which comprises the world’s largest ontology of common-sense knowledge and a reasoning engine. The system is able to generate coherent and topically related new sentences by using the syntactic structures and semantic features of the given documents, the knowledge base, and the reasoning engine. The system first performs knowledge acquisition by extracting the syntactic structure of each sentence in the given documents and by mapping the words and the relationships between words into the Cyc knowledge base. Next, it performs knowledge discovery by using the Cyc ontology and inference engine. New concepts are abstracted by exploring the ontology of the mapped concepts. Main topics are identified based on the clustering of the concepts. Then, the system performs knowledge representation for human readers by creating new English sentences that summarize the key concepts and the relationships between them. The structures of the composed sentences extend beyond subject-predicate-object triplets by allowing adjective and adverb modifiers. The system was tested on various documents and webpages. The test results showed that the system is capable of creating new sentences that include generalized concepts not mentioned in the original text and of combining information from different parts of the text to form a summary.
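The generalize-then-compose idea can be illustrated with a toy ontology. Everything here is hypothetical: the small `ISA` hierarchy stands in for the Cyc ontology (this is not the Cyc API), `main_topic` replaces the paper's concept clustering with a simple count of shared parent concepts, and `compose` builds only a subject-verb-object sentence with optional adjective and adverb modifiers.

```python
from collections import Counter

# Hypothetical toy is-a hierarchy (child -> parent), standing in for a real ontology
ISA = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal", "animal": "organism",
    "oak": "tree", "tree": "plant", "plant": "organism",
}


def ancestors(concept):
    """The concept followed by its chain of generalizations up to the root."""
    chain = [concept]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain


def generalize(concepts):
    """Most specific concept that subsumes every input concept (None if none does)."""
    shared = set.intersection(*(set(ancestors(c)) for c in concepts))
    return next((a for a in ancestors(concepts[0]) if a in shared), None)


def main_topic(concepts):
    """Simplified topic detection: the parent concept shared by the most mapped concepts."""
    return Counter(ISA.get(c, c) for c in concepts).most_common(1)[0][0]


def compose(subject, verb, obj, adjective=None, adverb=None):
    """Subject-verb-object sentence with optional adjective and adverb modifiers."""
    words = [subject.capitalize()]
    if adverb:
        words.append(adverb)
    words.append(verb)
    if adjective:
        words.append(adjective)
    words.append(obj)
    return " ".join(words) + "."


# "dog" and "cat" are generalized to a concept that neither source sentence mentions
print(compose(generalize(["dog", "cat"]) + "s", "eat", "food", adjective="fresh", adverb="often"))
# Mammals often eat fresh food.
```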

3 citations

Journal Article
TL;DR: This work proposes a clustering-based algorithm that selects the best summary image from among all internal and external images, leveraging the relevance and dominance of images as prior information.
Abstract: Visually summarizing web pages is an attractive approach that gives users an effective and friendly interface for identifying desired content at first glance in search and re-finding tasks. Using the dominant images in web pages is generally reliable for this purpose. However, many web pages have no dominant image. To solve this problem, we first propose a new approach that summarizes web pages without dominant images by retrieving relevant external images from the Internet. However, relevant external images are sometimes unreliable. To take advantage of both kinds of images, we further propose a clustering-based algorithm that selects the best summary image from among all internal and external images. This algorithm leverages the relevance and dominance of images as prior information. Experimental results show that our approach achieves NDCG@1 gains of 0.098 and 0.082 on a human-labeled dataset compared with relevant external images and dominant images, respectively. Our user study also indicates that the images selected by our algorithm are useful as summaries of web pages.
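The selection step can be sketched as follows, under stated assumptions: each candidate image has a hypothetical feature vector plus relevance and dominance scores in [0, 1], the two priors are mixed linearly with a hypothetical weight `alpha`, and a greedy cosine-threshold clustering stands in for the paper's clustering method, which the abstract does not specify. `ndcg_at_1` shows the NDCG@1 metric in its linear-gain form.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(vectors, threshold=0.9):
    """Greedy clustering: join the first cluster whose seed is similar enough, else start one."""
    clusters = []
    for i, v in enumerate(vectors):
        for members in clusters:
            if cosine(v, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def select_image(candidates, alpha=0.5, threshold=0.9):
    """Pick the highest-prior image from the cluster with the largest total prior."""
    prior = [alpha * c["relevance"] + (1 - alpha) * c["dominance"] for c in candidates]
    groups = cluster([c["vec"] for c in candidates], threshold)
    best = max(groups, key=lambda g: sum(prior[i] for i in g))
    return candidates[max(best, key=lambda i: prior[i])]["name"]


def ndcg_at_1(selected_gain, all_gains):
    """NDCG@1 with linear gains: gain of the top result over the best achievable gain."""
    ideal = max(all_gains)
    return selected_gain / ideal if ideal > 0 else 0.0


# Hypothetical candidates: the first two are internal images, the last two external ones
CANDIDATES = [
    {"name": "logo", "vec": [0, 1, 0], "relevance": 0.2, "dominance": 0.9},
    {"name": "product_photo", "vec": [1, 0, 0], "relevance": 0.7, "dominance": 0.6},
    {"name": "ext_a", "vec": [0.95, 0.1, 0], "relevance": 0.8, "dominance": 0.0},
    {"name": "ext_b", "vec": [0.9, 0.2, 0.1], "relevance": 0.6, "dominance": 0.0},
]
print(select_image(CANDIDATES))  # product_photo
```

In this toy case, the dominant `logo` loses because no other candidate resembles it, while the internal product photo is backed by two visually similar external images. This reflects the intuition of combining both image sources.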

3 citations

Journal Article
01 Apr 2021
TL;DR: This paper proposes a Multi-Document Temporal Summarization (MDTS) technique that generates summaries based on temporally related events extracted from multiple documents, and finds that MDTS performs better than the methods it is compared with.
Abstract: The Internet, or Web, contains a massive amount of information, and handling it is a tedious task. Summarization plays a crucial role in extracting or abstracting the key content from multiple sources while preserving its meaning, thereby reducing the complexity of handling the information. Multi-document summarization gives the gist of content collected from multiple documents. Temporal summarization concentrates on temporally related events. This paper proposes a Multi-Document Temporal Summarization (MDTS) technique that generates the summary based on temporally related events extracted from multiple documents. This technique extracts events together with their time stamps. TimeML standard tags are used to extract events and times. These event-time pairs are stored in a structured database for easier operations. Sentence ranking methods are built on the frequency of event occurrences in each sentence. Sentence similarity measures are computed to eliminate redundant sentences from the extracted summary. Depending on the required summary length, the top-ranked sentences are selected to form the summary. Experiments are conducted on the DUC 2006 and DUC 2007 datasets, which were released for the multi-document summarization task. The extracted summaries are evaluated using ROUGE to determine the precision, recall, and F-measure of the generated summaries. The performance of the proposed method is compared with a particle swarm optimization-based algorithm (PSOS), Cat swarm optimization-based summarization (CSOS), and Cuckoo Search based multi-document summarization (MDSCSA). MDTS is found to perform better than these methods. DOI: 10.28991/esj-2021-01268
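The MDTS steps can be sketched with the Python standard library. Assumptions: sentences arrive already annotated with TimeML `EVENT` and `TIMEX3` tags, an in-memory SQLite table plays the role of the structured event-time database, a sentence's score is the sum of the corpus-wide frequencies of the events it mentions (one plausible reading of the abstract), and bag-of-words cosine similarity serves as the redundancy measure. The selected sentences are returned in document order.

```python
import math
import re
import sqlite3
import xml.etree.ElementTree as ET
from collections import Counter


def parse(tagged):
    """Return the plain text, event words, and TIMEX3 values of one TimeML-tagged sentence."""
    root = ET.fromstring(f"<s>{tagged}</s>")
    text = "".join(root.itertext())
    events = [e.text.lower() for e in root.iter("EVENT")]
    times = [t.get("value") for t in root.iter("TIMEX3")]
    return text, events, times


def tokens(text):
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Bag-of-words cosine similarity between two sentences."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(n * cb[w] for w, n in ca.items())
    na = math.sqrt(sum(n * n for n in ca.values()))
    nb = math.sqrt(sum(n * n for n in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


# Sentence score = sum of the corpus-wide frequencies of the events in the sentence
SCORE_SQL = """
SELECT e.sid, SUM(f.n) FROM event_time AS e
JOIN (SELECT event, COUNT(*) AS n FROM event_time GROUP BY event) AS f
  ON e.event = f.event
GROUP BY e.sid
"""


def summarize(tagged_sentences, max_words=20, max_sim=0.6):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE event_time (sid INTEGER, event TEXT, time TEXT)")
    texts = []
    for sid, tagged in enumerate(tagged_sentences):
        text, events, times = parse(tagged)
        texts.append(text)
        stamp = times[0] if times else None  # attach the sentence's first time expression, if any
        conn.executemany("INSERT INTO event_time VALUES (?, ?, ?)",
                         [(sid, ev, stamp) for ev in events])
    scores = dict(conn.execute(SCORE_SQL).fetchall())
    conn.close()
    # Rank sentences that mention at least one event; ties keep document order
    ranked = sorted((i for i in scores if scores[i] > 0), key=lambda i: (-scores[i], i))
    chosen, used = [], 0
    for i in ranked:
        length = len(tokens(texts[i]))
        if used + length > max_words:
            continue  # would exceed the summary length budget
        if any(cosine(texts[i], texts[j]) > max_sim for j in chosen):
            continue  # redundant with an already selected sentence
        chosen.append(i)
        used += length
    return [texts[i] for i in sorted(chosen)]


SENTENCES = [
    'The storm <EVENT eid="e1">hit</EVENT> the coast on '
    '<TIMEX3 tid="t1" type="DATE" value="2021-04-01">April 1</TIMEX3>.',
    'Officials <EVENT eid="e2">evacuated</EVENT> residents after the storm <EVENT eid="e3">hit</EVENT>.',
    'The storm <EVENT eid="e4">hit</EVENT> the coast on April 1 again.',
    'Analysts discussed the weather.',
]
print(summarize(SENTENCES))
# ['The storm hit the coast on April 1.', 'Officials evacuated residents after the storm hit.']
```

The third sentence ties with the first on event score, but it is dropped: either it exceeds the word budget, or, with a larger budget, it is too similar to the first sentence.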

3 citations


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance Metrics
Number of papers on the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52