Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Proceedings ArticleDOI
06 May 2009
TL;DR: Presents a scalable multimedia model that structures the multimedia scene into incremental Spatial, Temporal and Interactive layers and progressively provides presentation details; the approach has been technically validated on PowerPoint-like documents using a generic MPEG-21-based adaptation framework.
Abstract: The summarization of a multimedia document is a challenge that requires the summarization of the media elements combined into a document, but also relies on an appropriate adaptation of its presentation. In this paper, we present a scalable multimedia model that structures the multimedia scene into incremental Spatial, Temporal and Interactive layers and progressively provides presentation details. Our proposal consists of summarizing such scalable multimedia documents based on three adaptation parameters: a targeted level of expertise, a preferred duration and a level of expectation for extended information. Our approach has been technically validated on PowerPoint-like documents using a generic MPEG-21-based adaptation framework.
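
The sketch below illustrates the layer-selection idea described in this abstract: layers carrying increasing presentation detail are kept or dropped according to the three adaptation parameters. It is a minimal Python approximation, not the paper's MPEG-21 tooling; the Layer class and summarize_scalable_document function are hypothetical names introduced for this example.

```python
# Minimal sketch of the layered-selection idea: a scalable document is split into
# layers of increasing presentation detail, and only the layers compatible with the
# three adaptation parameters are kept. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class Layer:
    kind: str          # "spatial", "temporal" or "interactive"
    detail_level: int  # 0 = core content, higher = more presentation detail
    duration: float    # seconds this layer adds to the presentation

def summarize_scalable_document(layers, expertise, preferred_duration, want_extended):
    """Select layers by target expertise, preferred duration and desire for extras."""
    selected, total = [], 0.0
    # Consider coarse (low-detail) layers first so the core content is always kept.
    for layer in sorted(layers, key=lambda l: l.detail_level):
        if layer.detail_level > expertise and not want_extended:
            continue  # skip details beyond the viewer's level of expertise
        if total + layer.duration > preferred_duration:
            break     # respect the preferred overall duration
        selected.append(layer)
        total += layer.duration
    return selected

# Example: a short, non-expert summary without extended information.
doc = [Layer("spatial", 0, 10), Layer("temporal", 1, 20), Layer("interactive", 2, 30)]
print(summarize_scalable_document(doc, expertise=1, preferred_duration=35, want_extended=False))
```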

1 citation

Proceedings ArticleDOI
13 Apr 2015
TL;DR: A simple, inexpensive and domain-independent system architecture is created for adding semantic analysis to the summarization process; it outperforms the baseline system by more than ten rankings and shows that semantic analysis and light-weight, open-domain techniques have potential.
Abstract: Excess amounts of unstructured data are quickly and easily accessible in digital format, yet there is no way for a human reader to 'ingest and digest' them as quickly. This information overload places too heavy a burden on society for its analysis and execution needs. Focused (i.e. topic, query, question, category, etc.) multi-document summarization is an information reduction solution that has reached a state of the art and now demands exploration of further techniques to model human summarization activity. Such techniques have been mainly extractive, relying on distributional statistics and complex machine learning over corpora in order to perform closely to humans. Consequently, the field needs to move toward more abstractive approaches that model human ways of summarizing. A simple, inexpensive and domain-independent system architecture is created for adding semantic analysis to the summarization process. Our system is novel for two reasons: first, its use of a semantic cue-words feature and semantic class weighting, as a new semantic analysis metric, to determine which sentences carry important information; second, its use of semantic triples clustering to decompose natural language sentences into their most basic meaning, reducing the complexity of processing sentences and capturing more of the likely semantically related information. In competition against the gold-standard baseline system from the Text Analysis Conference on the standardized summarization evaluation metric ROUGE, this work outperforms the baseline system by more than ten rankings. This work shows that semantic analysis and light-weight, open-domain techniques have potential.
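
As a rough illustration of the two ideas named in this abstract, the Python sketch below scores sentences with a semantic cue-word feature and groups subject-predicate-object triples that express the same fact. The cue-word list and the pre-extracted triples are placeholder inputs for the example, not the paper's actual resources.

```python
# Toy sketch: (1) score sentences with a semantic cue-word feature,
# (2) cluster subject-predicate-object triples so near-identical facts are grouped.
from collections import defaultdict

CUE_WORDS = {"because", "therefore", "announced", "caused", "resulted"}  # illustrative list

def cue_word_score(sentence: str) -> int:
    """Count cue words as a crude importance signal for a sentence."""
    return sum(1 for token in sentence.lower().split() if token in CUE_WORDS)

def cluster_triples(triples):
    """Group (subject, predicate, object) triples that share subject and predicate."""
    clusters = defaultdict(list)
    for s, p, o in triples:
        clusters[(s.lower(), p.lower())].append(o)
    return clusters

sentences = [
    "The storm caused severe flooding in the region.",
    "Officials announced an evacuation because rivers kept rising.",
]
ranked = sorted(sentences, key=cue_word_score, reverse=True)
print(ranked[0])
print(cluster_triples([("storm", "caused", "flooding"), ("Storm", "caused", "damage")]))
```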

1 citation

Journal Article
TL;DR: A comparative study is conducted on two types of summarization: opinion summarization using the proposed method, which uses two different sentiment lexicons (VADER and SentiWordNet), against extractive summarization using the established methods Luhn, Latent Semantic Analysis (LSA) and LexRank.
Abstract: Opinion summarization summarizes the opinions in texts, while extractive summarization summarizes texts without considering the opinions they contain. Can opinion summarization be used to produce a better extractive summary? This paper proposes to determine the effectiveness of opinion summarization against extractive text summarization. Sentiment, including emotion, which indicates whether a sentence is positive, negative or neutral, is considered. Sentences with strong sentiment, either positive or negative, are deemed important in text summarization for capturing the sentiments in a story text. Thus, a comparative study is conducted on two types of summarization: opinion summarization using the proposed method, which uses two different sentiment lexicons (VADER and SentiWordNet), against extractive summarization using the established methods Luhn, Latent Semantic Analysis (LSA) and LexRank. An experiment was performed on 20 news stories, comparing summaries generated by the proposed opinion summarization method against summaries generated by the established extractive summarization methods. In the experiment, the VADER sentiment analyzer produced the best score of 0.51 when evaluated against the LSA method using the ROUGE-1 metric. This implies that opinion summarization converges with extractive summarization.
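
A small sketch of sentiment-driven sentence selection with the VADER analyzer (from the vaderSentiment package) is shown below. Keeping the sentences with the strongest compound scores is one plausible reading of the opinion-summarization step described above, not necessarily the authors' exact procedure.

```python
# Sketch of sentiment-driven sentence selection with VADER (pip install vaderSentiment).
# Sentences with the strongest positive or negative compound score form the summary.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def opinion_summary(sentences, top_n=3):
    analyzer = SentimentIntensityAnalyzer()
    # polarity_scores returns {'neg', 'neu', 'pos', 'compound'}; |compound| measures strength.
    scored = [(abs(analyzer.polarity_scores(s)["compound"]), s) for s in sentences]
    scored.sort(reverse=True)
    return [s for _, s in scored[:top_n]]

story = [
    "The council approved the new park.",
    "Residents were thrilled by the decision.",
    "Critics called the budget overrun a disaster.",
    "Construction starts next spring.",
]
print(opinion_summary(story, top_n=2))
```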

1 citation

Journal ArticleDOI
TL;DR: This research work proposes an Entity Aware Text Summarization using Document Clustering (EASDC) technique to extract a summary from multiple documents; it showed an improvement of 1.6 percentage points when compared with the baseline methods TextRank and LexRank.
Abstract: Due to the rapid development of internet technology, social media and popular research article databases have generated large amounts of open textual information. This large amount of textual information leads to 'Big Data'. Textual information about an event or topic can be recorded repeatedly on different websites. Text summarization (TS) is an emerging research field that helps produce a summary from a single document or from multiple documents. Because much of the information in the documents is redundant, some or all of the redundant sentences may be omitted without changing the gist of the document. TS can be organized as an extractive process that collects salient passages from their original positions, rather than being semantic in nature. Preprocessing steps such as handling non-ASCII characters and punctuation, tokenization and lemmatization are involved in generating a summary. This research work proposes an Entity Aware Text Summarization using Document Clustering (EASDC) technique to extract a summary from multiple documents. Named Entity Recognition (NER) plays a vital part in the proposed work: topics and key terms are identified using NER. Extracted entities are ranked with Zipf's law, and sentence clusters are formed using k-means clustering. A cosine-similarity-based technique is used to eliminate similar sentences across the documents and produce a unique summary. The proposed EASDC technique was evaluated on a CNN dataset and showed an improvement of 1.6 percentage points when compared with the baseline methods TextRank and LexRank.
Keywords: Named entity recognition; text summarization; k-means clustering; Zipf's law
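
The sketch below approximates the pipeline stages named in the abstract (NER, Zipf-style entity weighting, k-means sentence clustering, cosine-similarity deduplication) using spaCy and scikit-learn. It is an assumption-laden reconstruction for illustration, not the published EASDC implementation, and it requires the en_core_web_sm spaCy model.

```python
# Rough approximation of the stages named in the abstract; not the published EASDC code.
# Requires: pip install spacy scikit-learn, plus the en_core_web_sm model.
from collections import Counter
import spacy
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

nlp = spacy.load("en_core_web_sm")

def easdc_like_summary(documents, n_clusters=3, dedup_threshold=0.8):
    sentences = [s.text.strip() for d in documents for s in nlp(d).sents]

    # 1) NER + Zipf-style weighting: the k-th most frequent entity gets weight 1/k.
    entity_counts = Counter(e.text.lower() for d in documents for e in nlp(d).ents)
    weights = {e: 1.0 / rank
               for rank, (e, _) in enumerate(entity_counts.most_common(), start=1)}

    # 2) Cluster sentences with TF-IDF + k-means, keep one representative per cluster.
    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(sentences)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)

    def entity_score(sent):
        return sum(w for e, w in weights.items() if e in sent.lower())

    reps = []
    for c in range(n_clusters):
        members = [s for s, l in zip(sentences, labels) if l == c]
        if members:
            reps.append(max(members, key=entity_score))

    # 3) Drop near-duplicate representatives via cosine similarity.
    summary = []
    for s in reps:
        vec = tfidf.transform([s])
        if all(cosine_similarity(vec, tfidf.transform([kept]))[0, 0] < dedup_threshold
               for kept in summary):
            summary.append(s)
    return summary
```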

1 citation

01 Dec 2005
TL;DR: Redundancy in large text collections, such as the web, creates both problems and opportunities for natural language systems and can be exploited to identify important and accurate information for applications such as summarization and question answering.
Abstract: Redundancy in large text collections, such as the web, creates both problems and opportunities for natural language systems. On the one hand, the presence of numerous sources conveying the same information causes difficulties for end users of search engines and news providers; they must read the same information over and over again. On the other hand, redundancy can be exploited to identify important and accurate information for applications such as summarization and question answering.
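
One simple way to exploit this redundancy, sketched below in Python, is to score each sentence by how many other documents contain a near-duplicate of it, measured with TF-IDF cosine similarity. The representation and threshold are arbitrary choices for illustration, not the method described in the paper.

```python
# Illustrative sketch of exploiting redundancy: a sentence is considered important
# if several *other* documents contain a highly similar sentence.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def redundancy_scores(docs, threshold=0.6):
    """docs: list of documents, each a list of sentence strings."""
    sentences = [(i, s) for i, doc in enumerate(docs) for s in doc]
    texts = [s for _, s in sentences]
    sims = cosine_similarity(TfidfVectorizer(stop_words="english").fit_transform(texts))

    scores = []
    for a, (doc_a, sent_a) in enumerate(sentences):
        # Count distinct other documents that restate this sentence.
        supporting = {doc_b for b, (doc_b, _) in enumerate(sentences)
                      if doc_b != doc_a and sims[a, b] >= threshold}
        scores.append((len(supporting), sent_a))
    return sorted(scores, reverse=True)

docs = [
    ["The merger was approved on Monday.", "Shares rose slightly."],
    ["Regulators approved the merger on Monday.", "The CEO will step down."],
    ["The merger received approval Monday.", "Analysts expect layoffs."],
]
print(redundancy_scores(docs)[0])  # the widely repeated merger sentence scores highest
```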

1 citation


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years
Year: Papers
2023: 74
2022: 160
2021: 52
2020: 61
2019: 47
2018: 52