Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published on this topic, receiving 71,850 citations.


Papers
Proceedings ArticleDOI
01 Sep 2015
TL;DR: This paper presents a novel approach to generating an abstractive summary from an extractive summary using the WordNet ontology; experimental results show that the generated summary is well compressed, grammatically correct, and human-readable.
Abstract: Text summarization plays an important role in text mining and natural language processing. As information resources grow tremendously, readers are overloaded with information, and finding the relevant data and manually summarizing it in a short time is a difficult, challenging, and tedious task for a human being. Text summarization aims to compress the source text into a shorter, concise form while preserving its information content and overall meaning. Summarization can be classified into two main categories, i.e., extractive summarization and abstractive summarization. This paper presents a novel approach to generating an abstractive summary from an extractive summary using the WordNet ontology. Experimental results show that the generated summary is well compressed, grammatically correct, and human-readable.

15 citations
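
The paper above does not include an implementation; as a rough illustration of the general idea (post-processing an extractive summary with WordNet to move toward more abstractive wording), the following sketch replaces content words with shorter WordNet synonyms using NLTK. The function names and the synonym-selection rule are assumptions made here for illustration, not the authors' method.

```python
# A minimal sketch (not the authors' pipeline): take an extractive summary
# and lightly "abstract" it by swapping content words for shorter WordNet
# synonyms. Requires: pip install nltk, plus the 'wordnet', 'punkt', and
# POS-tagger data packages.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def shorter_synonym(word, pos_tag):
    """Return a shorter WordNet synonym of `word` for the given POS, if any."""
    wn_pos = {"N": wn.NOUN, "V": wn.VERB, "J": wn.ADJ, "R": wn.ADV}.get(pos_tag[0])
    if wn_pos is None:
        return word
    candidates = {
        lemma.name().replace("_", " ")
        for synset in wn.synsets(word, pos=wn_pos)
        for lemma in synset.lemmas()
    }
    candidates.discard(word)
    shorter = [c for c in candidates if len(c) < len(word)]
    return min(shorter, key=len) if shorter else word

def lightly_abstract(extractive_summary):
    """Rewrite an extractive summary by substituting shorter synonyms."""
    tagged = nltk.pos_tag(nltk.word_tokenize(extractive_summary))
    return " ".join(shorter_synonym(word, tag) for word, tag in tagged)

print(lightly_abstract("The committee postponed the examination indefinitely."))
```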

Journal ArticleDOI
TL;DR: The results achieved by applying graph-based text summarization techniques to large-scale review and feedback data show an improvement over previously published results based on sentence scoring with TF and TF-IDF.
Abstract: Background/Objectives: Supervised techniques use human-generated summaries to select features and parameters for summarization. The main problem with this approach is the reliability of a summary based on human-generated parameters and features, and many studies have shown conflicts in the generated summaries. Due to the diversity of large-scale datasets, supervised summarization techniques also fail to meet the requirements; big-data analytics for text datasets likewise favours unsupervised techniques over supervised ones. Unsupervised summarization systems find representative sentences in large amounts of text. Methods/Statistical Analysis: A co-selection-based evaluation measure is applied to evaluate the proposed work; recall, precision, F-measure, and a similarity measure are determined to assess the outcome for each objective. Findings: Algorithms such as KMeans, MiniBatchKMeans, and graph-based summarization techniques are discussed in full technical detail. The results achieved by applying graph-based text summarization to large-scale review and feedback data show an improvement over previously published results based on sentence scoring with TF and TF-IDF. The graph-based sentence-scoring method is more efficient than the other unsupervised learning techniques applied to extractive text summarization. Application/Improvements: Executing the graph-based algorithm in Spark's GraphX programming environment will ensure acceptable execution times for this type of large-scale review and feedback dataset, which is considered a big-data problem.

15 citations
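
The abstract above compares clustering-based and graph-based sentence selection against TF/TF-IDF scoring but gives no code. A common way to realise graph-based sentence scoring is a TextRank-style pipeline: sentences become nodes, TF-IDF cosine similarities become edge weights, and PageRank ranks the sentences. The sketch below is a generic illustration using scikit-learn and networkx, not the authors' Spark GraphX implementation.

```python
# Generic TextRank-style extractive summarizer (an illustration, not the
# paper's Spark GraphX implementation). Requires scikit-learn, networkx,
# and nltk with the 'punkt' data package.
import networkx as nx
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def graph_summarize(text, num_sentences=3):
    sentences = sent_tokenize(text)
    if len(sentences) <= num_sentences:
        return sentences
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    sim = cosine_similarity(tfidf)          # sentence-to-sentence similarity
    graph = nx.from_numpy_array(sim)        # weighted, undirected graph
    scores = nx.pagerank(graph, weight="weight")
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:num_sentences])   # restore original sentence order
    return [sentences[i] for i in keep]
```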

01 Jan 2004
TL;DR: CL Research's participation in the Document Understanding Conference for 2004 was primarily intended to conduct further experiments with XML-tagged documents containing increasingly rich characterizations of texts, and the Knowledge Management System was extended to include a refined capability for identifying multiword units for use in keyword generation.
Abstract: CL Research's participation in the Document Understanding Conference for 2004 was primarily intended to conduct further experiments in the use of XML-tagged documents containing increasingly richer characterizations of texts. We extended the Knowledge Management System to include (1) a refined capability for identifying multiword units (phrases) for use in keyword generation, (2) the incorporation of word-sense disambiguation to tag senses and identify semantic types, and (3) the integration of question-answering functionality into the summarization framework. We did not devote much effort to refining our system to create summaries for the five tasks, but we achieved reasonable levels of performance. We viewed the length restrictions imposed on the tasks as not providing sufficient flexibility to investigate different modes of summarization, and the tasks of summarizing machine translations of poor quality as not very interesting. We used Tasks 1 and 3 to develop and refine a keyword generation capability, achieving levels of fourth of 18 and fourth of 10 among priority 1 systems. In the more general summarization tasks, our performance was near the bottom of participating systems but still at acceptable levels. We performed much better on quality measures with our extraction-based summaries, with an overall level of third of 14 systems for Task 5. For several quality measures, our performance was somewhat lower; these levels identify specifically those areas of summarization analysis where the use of an XML representation is particularly amenable to improvement. While we will continue to improve our summarization capability within the general guidelines, we believe that summarization is only one part of document understanding and may not represent users' needs for document exploration at a much deeper level.

15 citations
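
The entry above mentions a refined capability for identifying multiword units for keyword generation but gives no detail. One standard, freely available way to find such units is collocation mining, e.g. with NLTK's bigram collocation finder; the sketch below is only an illustrative stand-in for that kind of component, not CL Research's Knowledge Management System.

```python
# Illustrative multiword-unit (collocation) extraction with NLTK; a generic
# stand-in for the phrase identification described above, not the system
# from the paper. Requires nltk with the 'punkt' data package.
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

def multiword_units(text, top_n=10, min_freq=2):
    """Return the top PMI-scored bigrams as candidate multiword keywords."""
    tokens = [t.lower() for t in nltk.word_tokenize(text) if t.isalpha()]
    finder = BigramCollocationFinder.from_words(tokens)
    finder.apply_freq_filter(min_freq)       # drop rare, unreliable bigrams
    measures = BigramAssocMeasures()
    return [" ".join(pair) for pair in finder.nbest(measures.pmi, top_n)]
```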

Journal ArticleDOI
05 Nov 2015
TL;DR: The research provides an empirical analysis of Chinese news content through the proposed TF-Density and Dynamic Centroid Summarization algorithms and shows that the tracking and summarization algorithms for news topics can provide more precise and convenient results for users tracking the news.
Abstract: Purpose – The purpose of this paper is to design and implement new tracking and summarization algorithms for Chinese news content. Based on the proposed methods and algorithms, the authors extract the important sentences that are contained in topic stories and list those sentences in timestamp order to ensure ease of understanding and to visualize multiple news stories on a single screen. Design/methodology/approach – This paper encompasses an investigational approach that implements a new Dynamic Centroid Summarization algorithm in addition to a Term Frequency (TF)-Density algorithm to empirically compute three target parameters, i.e., recall, precision, and F-measure. Findings – The proposed TF-Density algorithm is implemented and compared with the well-known algorithms Term Frequency-Inverse Word Frequency (TF-IWF) and Term Frequency-Inverse Document Frequency (TF-IDF). Three test data sets are configured from Chinese news websites for use during the investigation, and two important findings are obtained that help the authors recognize the important words in the text with greater precision and efficiency. First, the authors evaluate three topic tracking algorithms, i.e., TF-Density, TF-IDF, and TF-IWF, against the said target parameters and find that the recall, precision, and F-measure of the proposed TF-Density algorithm are better than those of the TF-IWF and TF-IDF algorithms. Second, the authors implement a blind test to obtain the results of topic summarization and find that the proposed Dynamic Centroid Summarization process selects topic sentences more accurately than the LexRank process. Research limitations/implications – The results show that the tracking and summarization algorithms for news topics can provide more precise and convenient results for users tracking the news. The analysis and implications are limited to Chinese news content from Chinese news web sites such as Apple Library, UDN, and well-known portals like Yahoo and Google. Originality/value – The research provides an empirical analysis of Chinese news content through the proposed TF-Density and Dynamic Centroid Summarization algorithms. It focusses on improving the means of summarizing a set of news stories for browsing on a single screen and carries implications for innovative word measurements in practice.

15 citations
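
The paper's TF-Density and Dynamic Centroid Summarization algorithms are not reproduced in the abstract. As a point of reference, a plain (static) centroid-based extractive summarizer can be sketched as follows: each sentence is scored by its cosine similarity to the TF-IDF centroid of the whole document, and the top-scoring sentences are kept. This is only a baseline illustration of the centroid idea, not the proposed dynamic variant.

```python
# Baseline centroid-based extractive summarizer: score each sentence by its
# cosine similarity to the document's TF-IDF centroid. Illustrates the
# centroid idea only; it is not the paper's Dynamic Centroid Summarization.
# Requires scikit-learn, numpy, and nltk with the 'punkt' data package.
import numpy as np
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def centroid_summarize(text, num_sentences=3):
    sentences = sent_tokenize(text)
    if len(sentences) <= num_sentences:
        return sentences
    vectors = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    centroid = np.asarray(vectors.mean(axis=0))              # document centroid
    scores = cosine_similarity(vectors, centroid).ravel()
    top = sorted(np.argsort(scores)[::-1][:num_sentences])   # keep original order
    return [sentences[i] for i in top]
```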

Journal ArticleDOI
TL;DR: Three extended versions of the ItemSum summarizer, driven by highlights, annotations, and user skill levels, respectively, have been proposed, and their performance improvements with respect to the baseline version have been validated on benchmark documents.
Abstract: e-Learning platforms allow users with different skills to explore large collections of electronic documents and annotate them with notes and highlights. Generating summaries of these document collections is potentially useful for gaining insights into teaching materials. However, most existing summarizers are general purpose; thus, they consider neither annotations nor user skill levels during the document summarization process. This paper studies the application of a state-of-the-art summarization system, namely the itemset-based summarizer (ItemSum), in an e-learning context. The summarizer produces an ordered sequence of key phrases extracted from a teaching document. The aim of this paper is threefold: 1) evaluate the usefulness of the generated summaries for supporting individual and collective learning activities in a real context; 2) understand to what extent document highlights, annotations, and user skill levels can be used to drive the summarization process; and 3) generate multiple summaries of the same document tailored to users with different skill levels. To accomplish task 1), a crowd-sourced evaluation of the generated summaries was conducted, involving the students of a B.S. course given by a technical university. The results show that the automatically generated summaries reflect, to a large extent, the students’ expectations; hence, they can be useful for supporting learning activities in university-level computer science courses. To address task 2), three extended versions of the ItemSum summarizer, driven by highlights, annotations, and user skill levels, respectively, have been proposed, and their performance improvements with respect to the baseline version have been validated on benchmark documents. Finally, to accomplish task 3), multiple summaries of the same benchmark documents have been generated by considering only the annotations made by users of a given skill level. The results confirm that the summary content reflects the level of expertise of the targeted users.

15 citations
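
The abstract does not describe ItemSum's internals. Purely as an illustration of the general itemset-based idea (mine frequent term sets across sentences, then rank sentences by how much frequent-itemset support they cover), a toy version can be sketched with mlxtend's Apriori implementation. The function names, scoring rule, and thresholds below are assumptions for illustration, not the authors' ItemSum system.

```python
# Toy itemset-based sentence ranking (an illustration of the general idea,
# not the authors' ItemSum system). Frequent term sets are mined across
# sentences with Apriori, and each sentence is scored by the total support
# of the frequent itemsets it covers. Requires: pip install mlxtend nltk pandas.
import pandas as pd
from nltk.tokenize import sent_tokenize, word_tokenize
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

def itemset_summarize(text, num_sentences=3, min_support=0.3):
    sentences = sent_tokenize(text)
    transactions = [
        sorted({w.lower() for w in word_tokenize(s) if w.isalpha()})
        for s in sentences
    ]
    encoder = TransactionEncoder()
    onehot = encoder.fit(transactions).transform(transactions)
    frame = pd.DataFrame(onehot, columns=encoder.columns_)
    frequent = apriori(frame, min_support=min_support, use_colnames=True)

    def score(sentence_terms):
        terms = set(sentence_terms)
        return sum(row.support for row in frequent.itertuples()
                   if set(row.itemsets) <= terms)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(transactions[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:num_sentences])]
```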


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52