scispace - formally typeset
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published on this topic, receiving 71,850 citations.


Papers
Proceedings Article · DOI
01 Sep 2016
TL;DR: A novel approach for summarizing Hindi text documents based on linguistic rules is presented, and the system's accuracy is tested in terms of the number of extracted lines that contain important information from the original document.
Abstract: Automatic summarization plays an important role in document processing and information retrieval systems. Generating a summary of a text document is an important task in NLP, and there are many scenarios in which automatically constructed summaries are useful. Text summarization is the process of converting a longer text into a shorter form while preserving its information. A summary saves reading time because it contains fewer lines but still carries all the important information of the original document. In this paper we present a novel approach to summarizing Hindi text documents based on linguistic rules. Dead wood words and phrases are also removed from the original document to reduce its word count. The proposed system is tested on various Hindi inputs, and its accuracy is measured as the number of lines extracted from the original text that contain its important information.

19 citations

Patent
30 Apr 2008
TL;DR: A system is presented that enables automatic summarization of significant events occurring within a collaborative discussion; the summarization promotes efficient review and asynchronous participation, since a user can trigger playback of a series of events that occurred within the discussion.
Abstract: A system (and corresponding method) that enables automatic (and/or manual) summarization of significant events that occur within a collaborative discussion is provided. The summarization promotes efficient review and asynchronous participation where a user can trigger playback of a series of events that occurred within a discussion. The system can automatically summarize ‘high points’ or significant events from within an immersive collaborative environment. ‘World-marks’ or other tags can be employed to mark, locate and/or render the summarized content.

18 citations

Proceedings Article · DOI
28 Dec 2010
TL;DR: This paper presents a topic-driven framework for generating a generic summary from multiple documents and proposes two similarity-measurement methods: a static method, which detects the salience of information, and a dynamic method, which additionally controls redundancy.
Abstract: This paper presents a topic-driven framework for generating a generic summary from multi-documents. Our approach is based on the intuition that, from a statistical point of view, the summary’s probability distribution over the topics should be consistent with the multi-documents’ probability distribution over the inherent topics. Here, the topics are defined as weighted “bag-of-words” and derived by Latent Dirichlet Allocation from a collection of documents, either the given multi-documents or a related large-scale corpus. In this sense, we can represent various kinds of text units, such as words, sentences, summaries, documents and multi-documents, in a single vector space model via their corresponding probability distributions over the derived topics. Therefore, we are able to extract a sentence or summary by calculating the similarity between a sentence/summary and the given multi-documents via their topic probability distributions. In particular, we propose two methods of similarity measurement: the static method and the dynamic method. While the former is employed to detect the salience of information in a static way, the latter further controls redundancy in a dynamic way. In addition, we integrate various popular features to improve performance. Evaluation on the TAC 2008 update summarization task shows encouraging results.

18 citations
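The 2010 topic-driven framework above compares topic distributions of sentences and summaries against the whole collection. The sketch below illustrates the static/dynamic distinction under stated assumptions: the topic distributions are hypothetical values standing in for LDA output, cosine similarity is an assumed measure (the abstract does not name one), and the summary's distribution is assumed to be the mean of its sentences' distributions. The "static" ranking scores each sentence on its own; the "dynamic" selection greedily adds whichever sentence moves the summary's distribution closest to the collection's.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two topic distributions (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_distribution(vectors: List[Vector]) -> List[float]:
    """Assumed summary representation: average of its sentences' topic distributions."""
    k = len(vectors[0])
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(k)]


def static_rank(sentence_topics: List[Vector], collection_topics: Vector) -> List[int]:
    """Static salience: score every sentence independently against the collection."""
    scores = [cosine(s, collection_topics) for s in sentence_topics]
    return sorted(range(len(sentence_topics)), key=lambda i: scores[i], reverse=True)


def dynamic_select(sentence_topics: List[Vector], collection_topics: Vector,
                   budget: int) -> List[int]:
    """Dynamic selection: greedily add the sentence that brings the summary's
    distribution closest to the collection's, which discourages redundant picks."""
    chosen: List[int] = []
    remaining = list(range(len(sentence_topics)))
    while remaining and len(chosen) < budget:
        best = max(
            remaining,
            key=lambda i: cosine(
                mean_distribution([sentence_topics[j] for j in chosen + [i]]),
                collection_topics,
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical two-topic distributions; in the paper these would come from LDA.
collection = [0.5, 0.5]
sentences = [[0.9, 0.1], [0.88, 0.12], [0.1, 0.9]]
print(static_rank(sentences, collection)[:2])    # [1, 0]
print(dynamic_select(sentences, collection, 2))  # [1, 2]
```

The static ranking keeps two sentences that both lean toward the first topic, while the dynamic selection swaps in the third sentence because the pair's combined distribution matches the collection far better. This is only one plausible reading of "controls redundancy in a dynamic way", not the authors' exact formulation.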

Journal Article · DOI
TL;DR: A novel statistical approach to text summarization is proposed that ranks sentences by assigning a weight to each word, with a boost factor added to terms that appear in bold, italic or underlined text, or any combination of these.
Abstract: A lot of work has already been done on automatic text summarization. In this paper we present a novel statistical approach to summarizing a given text. Our approach extracts relevant sentences that convey the actual concept of the input document in concise form. We rank each sentence in the document by assigning a weight value to each of its words, and a boost factor is added to terms that appear in bold, italic or underlined text, or any combination of these. This helps us extract more relevant sentences, which leads to a good summary of the given text.
Keywords: text summarization, sentence extraction, boost factor, term weight

18 citations
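The term-weight and boost-factor scheme in the journal article above can be sketched as follows. The abstract does not say how term weights are computed, so this assumes raw term frequency across the document, a small hypothetical stop-word list, an additive boost of 2.0 (a hypothetical value) for emphasized words, and a sentence score equal to the sum of its word weights. The set of bold/italic/underlined words is passed in directly rather than parsed from formatting.

```python
import re
from collections import Counter
from typing import List, Set

# Hypothetical stop-word list; the paper's preprocessing is not described.
STOPWORDS = {"a", "an", "the", "is", "was", "of", "and", "to", "in"}


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(sentences: List[str], emphasized: Set[str], k: int,
              boost: float = 2.0) -> List[str]:
    """Score sentences by summed term weights and return the top k in document order.

    Term weight is assumed to be raw term frequency; words in `emphasized`
    (bold, italic or underlined in the source) receive an additive `boost`.
    """
    counts = Counter(w for s in sentences for w in tokenize(s))

    def term_weight(word: str) -> float:
        return counts[word] + (boost if word in emphasized else 0.0)

    scores = [sum(term_weight(w) for w in tokenize(s)) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # restore original sentence order


doc = [
    "Summarization reduces reading time.",
    "The weather was pleasant.",
    "Extractive summarization selects sentences.",
]
print(summarize(doc, emphasized={"extractive"}, k=2))
```

With the boost applied to "extractive", the third sentence scores highest (7 versus 5 for the first and 2 for the second), and the two selected sentences are returned in their original order so the summary still reads naturally.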

Book Chapter · DOI
24 Sep 2006
TL;DR: A model for multi-document summarization that maximizes topic coverage and minimizes content redundancy is proposed; it analyzes the topic of each document, the relationships between documents and the central theme of the collection to evaluate sentences.
Abstract: With the increasing volume of online information, it is increasingly important to automatically extract the core content from many information sources. We propose a model for multi-document summarization that maximizes the coverage of topics and minimizes the redundancy of content. Based on a Chinese concept lexicon and corpus, the proposed model analyzes the topic of each document, their relationships and the central theme of the collection to evaluate sentences. We present different approaches to determining which sentences are appropriate for extraction on the basis of sentence weights and their relevance to the related documents. A genetic algorithm is designed to improve the quality of the summary. The experimental results indicate that using a genetic algorithm is useful and effective for improving the quality of multi-document summarization.

18 citations
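The 2006 chapter above uses a genetic algorithm to balance topic coverage against redundancy. Below is a generic GA sketch for that kind of objective, not the authors' design: each chromosome is a bit mask over sentences, fitness is the number of distinct concepts covered minus a penalty (weight `lam`, hypothetical) for concepts covered more than once, and summaries that are empty or longer than `max_sents` score 0. The per-sentence concept sets are hypothetical stand-ins for what the paper derives from its Chinese concept lexicon, and binary tournament selection, one-point crossover, bit-flip mutation and elitism are standard choices assumed here.

```python
import random
from typing import List, Set

Mask = List[int]  # 1 = sentence selected for the summary


def fitness(mask: Mask, concepts: List[Set[str]], max_sents: int,
            lam: float = 0.5) -> float:
    """Concept coverage minus a redundancy penalty; empty or over-long summaries score 0."""
    chosen = [concepts[i] for i, bit in enumerate(mask) if bit]
    if not chosen or len(chosen) > max_sents:
        return 0.0
    covered = set().union(*chosen)
    redundancy = sum(len(c) for c in chosen) - len(covered)  # concepts counted more than once
    return len(covered) - lam * redundancy


def random_mask(n: int, max_sents: int, rng: random.Random) -> Mask:
    """Random feasible mask selecting between 1 and max_sents sentences."""
    picks = set(rng.sample(range(n), rng.randint(1, max_sents)))
    return [1 if i in picks else 0 for i in range(n)]


def genetic_select(concepts: List[Set[str]], max_sents: int, pop_size: int = 20,
                   generations: int = 40, p_mut: float = 0.1, seed: int = 0) -> Mask:
    rng = random.Random(seed)
    n = len(concepts)

    def score(mask: Mask) -> float:
        return fitness(mask, concepts, max_sents)

    def tournament(pop: List[Mask]) -> Mask:
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if score(a) >= score(b) else b

    population = [random_mask(n, max_sents, rng) for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            cut = rng.randint(1, n - 1) if n > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        population = children
        best = max(population, key=score)
    return best


# Hypothetical concept sets, one per sentence.
sentence_concepts = [
    {"genetic", "algorithm"},
    {"genetic", "algorithm", "search"},
    {"summary", "redundancy"},
    {"lexicon"},
]
print(genetic_select(sentence_concepts, max_sents=2))
```

Because the best mask is always carried into the next generation, the returned summary never scores worse than the best mask in the initial population. On this toy input, brute force over the 16 possible masks shows the optimum is the second and third sentences (five distinct concepts, no overlap), which is the kind of selection the GA is searching for.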


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations (85% related)
Ontology (information science): 57K papers, 869.1K citations (84% related)
Web page: 50.3K papers, 975.1K citations (83% related)
Recurrent neural network: 29.2K papers, 890K citations (83% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (83% related)
Performance
Metrics
Number of papers on this topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52