Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published on this topic, receiving 71,850 citations.


Papers
Proceedings ArticleDOI
01 Sep 2015
TL;DR: This paper presents a novel approach to generating an abstractive summary from an extractive summary using the WordNet ontology; experimental results show that the generated summary is well compressed, grammatically correct, and human-readable.
Abstract: Text summarization plays an important role in text mining and natural language processing. As information resources grow tremendously, readers are overloaded with information, and finding the relevant data and manually summarizing it in a short time is a difficult, challenging, and tedious task for a human being. Text summarization aims to compress the source text into a shorter, concise form while preserving its information content and overall meaning. Summarization can be classified into two main categories, i.e., extractive summarization and abstractive summarization. This paper presents a novel approach to generating an abstractive summary from an extractive summary using the WordNet ontology. Experimental results show that the generated summary is well compressed, grammatically correct, and human-readable.

15 citations
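
The paper above does not include an implementation; as a rough illustration of the general idea (post-processing an extractive summary with WordNet to move toward more abstractive wording), the following sketch replaces content words with shorter WordNet synonyms using NLTK. The function names and the synonym-selection rule are assumptions made here for illustration, not the authors' method.

```python
# A minimal sketch (not the authors' pipeline): take an extractive summary
# and lightly "abstract" it by swapping content words for shorter WordNet
# synonyms. Requires: pip install nltk, plus the 'wordnet', 'punkt', and
# POS-tagger data packages.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def shorter_synonym(word, pos_tag):
    """Return a shorter WordNet synonym of `word` for the given POS, if any."""
    wn_pos = {"N": wn.NOUN, "V": wn.VERB, "J": wn.ADJ, "R": wn.ADV}.get(pos_tag[0])
    if wn_pos is None:
        return word
    candidates = {
        lemma.name().replace("_", " ")
        for synset in wn.synsets(word, pos=wn_pos)
        for lemma in synset.lemmas()
    }
    candidates.discard(word)
    shorter = [c for c in candidates if len(c) < len(word)]
    return min(shorter, key=len) if shorter else word

def lightly_abstract(extractive_summary):
    """Rewrite an extractive summary by substituting shorter synonyms."""
    tagged = nltk.pos_tag(nltk.word_tokenize(extractive_summary))
    return " ".join(shorter_synonym(word, tag) for word, tag in tagged)

print(lightly_abstract("The committee postponed the examination indefinitely."))
```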

Journal ArticleDOI
TL;DR: The results achieved by applying graph-based text summarization techniques to large-scale review and feedback data show an improvement over previously published results based on sentence scoring with TF and TF-IDF.
Abstract: Background/Objectives: Supervised techniques use human-generated summaries to select features and parameters for summarization. The main problem with this approach is the reliability of a summary based on human-generated parameters and features, and many studies have shown conflicts in the generated summaries. Due to the diversity of large-scale datasets, supervised summarization techniques also fail to meet the requirements; big-data analytics for text datasets likewise favours unsupervised techniques over supervised ones. Unsupervised summarization systems find representative sentences in large amounts of text. Methods/Statistical Analysis: A co-selection-based evaluation measure is applied to evaluate the proposed work; recall, precision, F-measure, and a similarity measure are determined to assess the outcome for each objective. Findings: Algorithms such as KMeans, MiniBatchKMeans, and graph-based summarization techniques are discussed in full technical detail. The results achieved by applying graph-based text summarization to large-scale review and feedback data show an improvement over previously published results based on sentence scoring with TF and TF-IDF. The graph-based sentence-scoring method is more efficient than the other unsupervised learning techniques applied to extractive text summarization. Application/Improvements: Executing the graph-based algorithm in Spark's GraphX programming environment will ensure acceptable execution times for this type of large-scale review and feedback dataset, which is considered a big-data problem.

15 citations
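
The abstract above compares clustering-based and graph-based sentence selection against TF/TF-IDF scoring but gives no code. A common way to realise graph-based sentence scoring is a TextRank-style pipeline: sentences become nodes, TF-IDF cosine similarities become edge weights, and PageRank ranks the sentences. The sketch below is a generic illustration using scikit-learn and networkx, not the authors' Spark GraphX implementation.

```python
# Generic TextRank-style extractive summarizer (an illustration, not the
# paper's Spark GraphX implementation). Requires scikit-learn, networkx,
# and nltk with the 'punkt' data package.
import networkx as nx
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def graph_summarize(text, num_sentences=3):
    sentences = sent_tokenize(text)
    if len(sentences) <= num_sentences:
        return sentences
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    sim = cosine_similarity(tfidf)          # sentence-to-sentence similarity
    graph = nx.from_numpy_array(sim)        # weighted, undirected graph
    scores = nx.pagerank(graph, weight="weight")
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:num_sentences])   # restore original sentence order
    return [sentences[i] for i in keep]
```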

01 Jan 2004
TL;DR: CL Research's participation in the Document Understanding Conference for 2004 was primarily intended to conduct further experiments with XML-tagged documents containing increasingly rich characterizations of texts, and the Knowledge Management System was extended to include a refined capability for identifying multiword units for use in keyword generation.
Abstract: CL Research's participation in the Document Understanding Conference for 2004 was primarily intended to conduct further experiments in the use of XML-tagged documents containing increasingly richer characterizations of texts. We extended the Knowledge Management System to include (1) a refined capability for identifying multiword units (phrases) for use in keyword generation, (2) the incorporation of word-sense disambiguation to tag senses and identify semantic types, and (3) the integration of question-answering functionality into the summarization framework. We did not devote much effort to refining our system to create summaries for the five tasks, but we achieved reasonable levels of performance. We viewed the length restrictions imposed on the tasks as not providing sufficient flexibility to investigate different modes of summarization, and the tasks of summarizing machine translations of poor quality as not very interesting. We used Tasks 1 and 3 to develop and refine a keyword generation capability, achieving levels of fourth of 18 and fourth of 10 among priority 1 systems. In the more general summarization tasks, our performance was near the bottom of participating systems but still at acceptable levels. We performed much better on quality measures with our extraction-based summaries, with an overall level of third of 14 systems for Task 5. For several quality measures, our performance was somewhat lower; these levels identify specifically those areas of summarization analysis where the use of an XML representation is particularly amenable to improvement. While we will continue to improve our summarization capability within the general guidelines, we believe that summarization is only one part of document understanding and may not represent users' needs for document exploration at a much deeper level.

15 citations
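
The entry above mentions a refined capability for identifying multiword units for keyword generation but gives no detail. One standard, freely available way to find such units is collocation mining, e.g. with NLTK's bigram collocation finder; the sketch below is only an illustrative stand-in for that kind of component, not CL Research's Knowledge Management System.

```python
# Illustrative multiword-unit (collocation) extraction with NLTK; a generic
# stand-in for the phrase identification described above, not the system
# from the paper. Requires nltk with the 'punkt' data package.
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

def multiword_units(text, top_n=10, min_freq=2):
    """Return the top PMI-scored bigrams as candidate multiword keywords."""
    tokens = [t.lower() for t in nltk.word_tokenize(text) if t.isalpha()]
    finder = BigramCollocationFinder.from_words(tokens)
    finder.apply_freq_filter(min_freq)       # drop rare, unreliable bigrams
    measures = BigramAssocMeasures()
    return [" ".join(pair) for pair in finder.nbest(measures.pmi, top_n)]
```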

Journal ArticleDOI
05 Nov 2015
TL;DR: The research provides an empirical analysis of Chinese news content through the proposed TF-Density and Dynamic Centroid Summarization algorithms and shows that the tracking and summarization algorithms for news topics can provide more precise and convenient results for users tracking the news.
Abstract: Purpose – The purpose of this paper is to design and implement new tracking and summarization algorithms for Chinese news content. Based on the proposed methods and algorithms, the authors extract the important sentences that are contained in topic stories and list those sentences in timestamp order to ensure ease of understanding and to visualize multiple news stories on a single screen. Design/methodology/approach – This paper encompasses an investigational approach that implements a new Dynamic Centroid Summarization algorithm in addition to a Term Frequency (TF)-Density algorithm to empirically compute three target parameters, i.e., recall, precision, and F-measure. Findings – The proposed TF-Density algorithm is implemented and compared with the well-known algorithms Term Frequency-Inverse Word Frequency (TF-IWF) and Term Frequency-Inverse Document Frequency (TF-IDF). Three test data sets are configured from Chinese news websites for use during the investigation, and two important findings are obtained that help the authors recognize the important words in the text with greater precision and efficiency. First, the authors evaluate three topic tracking algorithms, i.e., TF-Density, TF-IDF, and TF-IWF, against the said target parameters and find that the recall, precision, and F-measure of the proposed TF-Density algorithm are better than those of the TF-IWF and TF-IDF algorithms. Second, the authors implement a blind test to obtain the results of topic summarization and find that the proposed Dynamic Centroid Summarization process selects topic sentences more accurately than the LexRank process. Research limitations/implications – The results show that the tracking and summarization algorithms for news topics can provide more precise and convenient results for users tracking the news. The analysis and implications are limited to Chinese news content from Chinese news web sites such as Apple Library, UDN, and well-known portals like Yahoo and Google. Originality/value – The research provides an empirical analysis of Chinese news content through the proposed TF-Density and Dynamic Centroid Summarization algorithms. It focusses on improving the means of summarizing a set of news stories for browsing on a single screen and carries implications for innovative word measurements in practice.

15 citations
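
The paper's TF-Density and Dynamic Centroid Summarization algorithms are not reproduced in the abstract. As a point of reference, a plain (static) centroid-based extractive summarizer can be sketched as follows: each sentence is scored by its cosine similarity to the TF-IDF centroid of the whole document, and the top-scoring sentences are kept. This is only a baseline illustration of the centroid idea, not the proposed dynamic variant.

```python
# Baseline centroid-based extractive summarizer: score each sentence by its
# cosine similarity to the document's TF-IDF centroid. Illustrates the
# centroid idea only; it is not the paper's Dynamic Centroid Summarization.
# Requires scikit-learn, numpy, and nltk with the 'punkt' data package.
import numpy as np
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def centroid_summarize(text, num_sentences=3):
    sentences = sent_tokenize(text)
    if len(sentences) <= num_sentences:
        return sentences
    vectors = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    centroid = np.asarray(vectors.mean(axis=0))              # document centroid
    scores = cosine_similarity(vectors, centroid).ravel()
    top = sorted(np.argsort(scores)[::-1][:num_sentences])   # keep original order
    return [sentences[i] for i in top]
```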

Journal ArticleDOI
TL;DR: Three extended versions of the ItemSum summarizer, driven by highlights, annotations, and user skill levels, respectively, have been proposed, and their performance improvements with respect to the baseline version have been validated on benchmark documents.
Abstract: e-Learning platforms allow users with different skills to explore large collections of electronic documents and annotate them with notes and highlights. Generating summaries of these document collections is potentially useful for gaining insights into teaching materials. However, most existing summarizers are general purpose; thus, they consider neither annotations nor user skill levels during the document summarization process. This paper studies the application of a state-of-the-art summarization system, namely the itemset-based summarizer (ItemSum), in an e-learning context. The summarizer produces an ordered sequence of key phrases extracted from a teaching document. The aim of this paper is threefold: 1) evaluate the usefulness of the generated summaries for supporting individual and collective learning activities in a real context; 2) understand to what extent document highlights, annotations, and user skill levels can be used to drive the summarization process; and 3) generate multiple summaries of the same document tailored to users with different skill levels. To accomplish task 1), a crowd-sourced evaluation of the generated summaries was conducted, involving the students of a B.S. course given by a technical university. The results show that the automatically generated summaries reflect, to a large extent, the students’ expectations; hence, they can be useful for supporting learning activities in university-level computer science courses. To address task 2), three extended versions of the ItemSum summarizer, driven by highlights, annotations, and user skill levels, respectively, have been proposed, and their performance improvements with respect to the baseline version have been validated on benchmark documents. Finally, to accomplish task 3), multiple summaries of the same benchmark documents have been generated by considering only the annotations made by users of a given skill level. The results confirm that the summary content reflects the level of expertise of the targeted users.

15 citations
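
The abstract does not describe ItemSum's internals. Purely as an illustration of the general itemset-based idea (mine frequent term sets across sentences, then rank sentences by how much frequent-itemset support they cover), a toy version can be sketched with mlxtend's Apriori implementation. The function names, scoring rule, and thresholds below are assumptions for illustration, not the authors' ItemSum system.

```python
# Toy itemset-based sentence ranking (an illustration of the general idea,
# not the authors' ItemSum system). Frequent term sets are mined across
# sentences with Apriori, and each sentence is scored by the total support
# of the frequent itemsets it covers. Requires: pip install mlxtend nltk pandas.
import pandas as pd
from nltk.tokenize import sent_tokenize, word_tokenize
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

def itemset_summarize(text, num_sentences=3, min_support=0.3):
    sentences = sent_tokenize(text)
    transactions = [
        sorted({w.lower() for w in word_tokenize(s) if w.isalpha()})
        for s in sentences
    ]
    encoder = TransactionEncoder()
    onehot = encoder.fit(transactions).transform(transactions)
    frame = pd.DataFrame(onehot, columns=encoder.columns_)
    frequent = apriori(frame, min_support=min_support, use_colnames=True)

    def score(sentence_terms):
        terms = set(sentence_terms)
        return sum(row.support for row in frequent.itertuples()
                   if set(row.itemsets) <= terms)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(transactions[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:num_sentences])]
```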


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52