Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have appeared within this topic, receiving 71,850 citations.


Papers
Proceedings Article
01 Jan 2015
TL;DR: Proposes a multi-document summarization approach that automatically and comprehensively summarizes reports of a particular social or political event; it is shown to be feasible at large scale and is favoured by users.
Abstract: In the big data era, massive numbers of news reports about the latest events are published on the Web. To understand an event thoroughly, a reader has to go through many reports and keep the clues in mind, which is very difficult and usually results in a one-sided interpretation. In this paper, we propose a multi-document summarization approach which summarizes reports of a particular social or political event automatically and comprehensively. To speed up summarization, a pre-summarization step condenses each report to a sub-summary, which reduces the scale of subsequent processing. As an event should be told in chronological order, a timeline is introduced to organize and aggregate event-relevant sub-summaries. For each day's sub-summaries, a key phrase extraction algorithm is used to cluster them into topics and generate a meaningful label for each topic. Finally, a selection criterion is introduced to select relevant and novel sentences for each topic. We perform experiments on a large-scale news dataset, with about 10 million reports collected from news sites. An empirical study shows that our system is feasible in a large-scale environment. An effectiveness evaluation shows that it is favoured by users.
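
The abstract above does not spell out its selection criterion. As a rough illustration of what selecting "relevant and novel" sentences can look like, the sketch below uses a greedy rule in the style of Maximal Marginal Relevance, which is a substitute technique and not necessarily the authors' method. The function names, the bag-of-words representation, and the `trade_off` parameter are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a deliberately simple representation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_sentences(sentences, topic_label, k=2, trade_off=0.7):
    """Greedy MMR-style selection (an assumed criterion, not the paper's):
    reward similarity to the topic label, penalise similarity to
    sentences that were already selected."""
    topic_vec = bag_of_words(topic_label)
    vecs = [bag_of_words(s) for s in sentences]
    chosen = []
    while len(chosen) < min(k, len(sentences)):
        best, best_score = None, float("-inf")
        for i, vec in enumerate(vecs):
            if i in chosen:
                continue
            relevance = cosine(vec, topic_vec)
            redundancy = max((cosine(vec, vecs[j]) for j in chosen), default=0.0)
            score = trade_off * relevance - (1 - trade_off) * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    # Sentences are returned in the order they were selected.
    return [sentences[i] for i in chosen]
```

With a higher `trade_off` the rule favours relevance to the topic label; with a lower one it favours novelty, so near-duplicate sentences from different reports are less likely to be picked together.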

3 citations

Journal Article
TL;DR: A text summarization approach that clusters text units before extracting summary sentences; experiments show that it improves summarization quality.
Abstract: We propose a text summarization approach that clusters text units before extracting summary sentences. Text units are formed by combining sentences based on rhetorical structure information. We use only the rhetorical structure information that is immediately recognizable at the surface level, which keeps the approach as language-independent as possible. Experiments conducted with both Korean and English text collections show that the approach improves the quality of summarization.
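
A minimal sketch of the cluster-then-extract idea follows. It assumes the text units are already formed (the rhetorical-structure step is not reproduced here), and it swaps in single-pass clustering with Jaccard word overlap, since the abstract does not name a clustering method. All function names and the `threshold` value are hypothetical.

```python
import re


def tokens(text):
    """Set of lowercased word tokens in a text unit."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_units(units, threshold=0.3):
    """Single-pass clustering: a unit joins the first cluster whose leader
    (its first member) is similar enough, otherwise it starts a new one."""
    clusters = []
    for idx, unit in enumerate(units):
        for cluster in clusters:
            if jaccard(tokens(unit), tokens(units[cluster[0]])) >= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def representative(units, cluster):
    """Index of the member with the highest total similarity to the others."""
    def centrality(i):
        return sum(jaccard(tokens(units[i]), tokens(units[j]))
                   for j in cluster if j != i)
    return max(cluster, key=centrality)


def summarize(units, threshold=0.3):
    """Extract one representative unit per cluster."""
    return [units[representative(units, c)]
            for c in cluster_units(units, threshold)]
```

Clustering first means each extracted unit stands for a different group of related content, which is one plausible reason such an approach could reduce redundancy in the final summary.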

3 citations

Proceedings Article
10 Nov 2006
TL;DR: Experimental results show that the approach outperforms traditional approaches, including Bisecting K-means (a leading document clustering approach), in cluster quality and clustering reliability, and that it provides a concise but rich text summary in key concepts and sentences.
Abstract: We introduce a method that integrates biomedical literature clustering and summarization using a biomedical ontology. The core of the approach is to identify document cluster models as semantic chunks capturing the core semantic relationships in the ontology-enriched scale-free graphical representation of documents. These document cluster models are used both for document clustering, through document assignment, and for text summarization, through the construction of a Text Semantic Interaction Network (TSIN). Our experimental results show that our approach is superior to traditional approaches, including Bisecting K-means as a leading document clustering approach, in terms of cluster quality and clustering reliability. In addition, our approach provides a concise but rich text summary in key concepts and sentences.

3 citations

Journal Article
TL;DR: The proposed system summarizes a long video and presents its important information to the user as a few lines of text, to benefit students and researchers who have no time to watch long videos to extract the useful data.
Abstract: Automatic summarization is a technique for quickly presenting key information by condensing large sections of material. Summarization may be applied to both text and video, with different methods for presenting the abstract of the subject. In this study, natural language processing is used for automated text summarization of YouTube videos: each video is transcribed and the summarization stages are applied to the transcript. Term frequency-inverse document frequency (TF-IDF), computed from the words and sentences of the text, is used to extract the important keywords for the summary. Some videos are long and boring, or take a long time to present information that could be conveyed in a few minutes. Therefore, the essence of the proposed system is to summarize a long video and present its important information to the user as a few lines of text, to benefit students and researchers who have no time to watch long videos to extract the useful data. The results are evaluated using the ROUGE method on the CNN/DailyMail data set (cnn-dailymail-master).
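
As a hedged illustration of TF-IDF-based extractive summarization of a transcript, the sketch below treats each sentence as a "document", scores it by the mean TF-IDF weight of its words, and keeps the top-scoring sentences in their original order. The sentence splitter, the scoring rule, and the function names are assumptions for this example, not details taken from the paper.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on ., ! or ? followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tfidf_summary(text, n=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    docs = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    # Document frequency: number of sentences that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    total = len(docs)
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Mean TF-IDF over the sentence's tokens; a word that occurs in
        # every sentence gets idf = log(1) = 0 and contributes nothing.
        score = sum((count / len(doc)) * math.log(total / df[word])
                    for word, count in tf.items()) if doc else 0.0
        scores.append(score)
    top = sorted(range(total), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]
```

Normalising by sentence length keeps long sentences from winning purely because they contain more words; other weighting schemes are equally plausible.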

3 citations


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52