scispace - formally typeset
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published on this topic, receiving 71,850 citations.


Papers
Journal Article
TL;DR: A fully-automated summarizer that compiles comprehensive reviews by extracting important facets and sentiment information based on various sentence features rather than applying complex machine learning algorithms is proposed.
Abstract: Multi-document sentiment analysis is an important natural language processing problem. Summaries generated by these analyzers can greatly reduce the time needed to read a collection of topically related documents and locate the information a user needs. With ever-increasing globalization and technological progress, the analysis of online user reviews of different products is an especially pertinent application of this problem. Popular products now attract far too many user reviews for potential buyers to read them all and extract the most salient product details and the opinions of previous buyers. To address this problem, we propose a fully automated summarizer that reduces the workload of online customers. The proposed system takes a user query and extracts the most relevant and essential comments made by individual reviewers. In contrast to existing multi-document summarization approaches, our summarizer compiles comprehensive reviews by extracting important facets and sentiment information based on various sentence features rather than by applying complex machine learning algorithms. The design of our summarizer is easy to understand and implement, and it requires neither massive training data nor excessive training time. The conducted empirical study shows that the proposed summarization system outperforms current state-of-the-art multi-document sentiment summarization approaches.

3 citations
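The abstract does not spell out its sentence features or how they are combined. As a rough illustration of training-free, feature-based extractive scoring, the Python sketch below ranks review sentences by a hand-weighted sum of three assumed features: query-term overlap, hits in a small sentiment lexicon, and a preference for mid-length sentences. The lexicon, the weights, and the function names (`score_sentence`, `summarize_reviews`) are hypothetical and do not reproduce the authors' method.

```python
import re

# Hypothetical sentiment lexicon; a real system would use a curated resource.
POSITIVE = {"great", "excellent", "good", "love", "sharp"}
NEGATIVE = {"bad", "poor", "slow", "broken", "blurry"}

# Hypothetical hand-set feature weights: no training step is involved.
WEIGHTS = {"query": 2.0, "sentiment": 1.0, "length": 0.5}


def tokenize(text):
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def score_sentence(sentence, query_terms):
    """Weighted sum of simple sentence features (all assumed for illustration)."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    query_hits = sum(1 for t in tokens if t in query_terms)
    sentiment_hits = sum(1 for t in tokens if t in POSITIVE or t in NEGATIVE)
    # Prefer mid-length sentences; very short or very long ones get no bonus.
    length_score = 1.0 if 5 <= len(tokens) <= 25 else 0.0
    return (WEIGHTS["query"] * query_hits
            + WEIGHTS["sentiment"] * sentiment_hits
            + WEIGHTS["length"] * length_score)


def summarize_reviews(reviews, query, k=2):
    """Split reviews into sentences and return the k highest-scoring ones."""
    query_terms = set(tokenize(query))
    sentences = []
    for review in reviews:
        parts = re.split(r"(?<=[.!?])\s+", review)
        sentences.extend(p.strip() for p in parts if p.strip())
    ranked = sorted(sentences, key=lambda s: score_sentence(s, query_terms), reverse=True)
    return ranked[:k]
```

For example, `summarize_reviews(reviews, "battery life", k=2)` over a handful of product reviews returns the two sentences that best combine query terms, sentiment words, and a reasonable length. Because every weight is fixed by hand, the design is easy to inspect, which matches the abstract's emphasis on avoiding training data, but the weights would need tuning on real reviews.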

Proceedings Article
01 Jun 2021
TL;DR: Compared to conventional methods, the hybrid generation approach inspired by traditional concept-to-text systems leads to more faithful, relevant and aggregation-sensitive summarization – while being equally fluent.
Abstract: We present a method for generating comparative summaries that highlight similarities and contradictions in input documents. The key challenge in creating such summaries is the lack of large parallel training data required for training typical summarization systems. To this end, we introduce a hybrid generation approach inspired by traditional concept-to-text systems. To enable accurate comparison between different sources, the model first learns to extract pertinent relations from input documents. The content planning component uses deterministic operators to aggregate these relations after identifying a subset for inclusion into a summary. The surface realization component lexicalizes this information using a text-infilling language model. By separately modeling content selection and realization, we can effectively train them with limited annotations. We implemented and tested the model in the domain of nutrition and health – rife with inconsistencies. Compared to conventional methods, our framework leads to more faithful, relevant and aggregation-sensitive summarization – while being equally fluent.

3 citations
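The separation between a deterministic content planner and a surface realizer can be sketched as follows. This Python sketch assumes relations have already been extracted as hypothetical `(source, subject, effect, polarity)` tuples, with polarity `+1` for "increases" and `-1` for "decreases". One assumed aggregation operator groups relations by subject and effect and labels each multi-source group as an agreement or a contradiction; a plain string template stands in for the text-infilling language model that the abstract describes.

```python
from collections import defaultdict


def aggregate_relations(relations):
    """Deterministic content planning (assumed operator).

    Groups (source, subject, effect, polarity) tuples by (subject, effect) and
    labels each group reported by at least two sources as an agreement
    (one polarity) or a contradiction (conflicting polarities).
    """
    groups = defaultdict(lambda: {"sources": set(), "polarities": set()})
    for source, subject, effect, polarity in relations:
        group = groups[(subject, effect)]
        group["sources"].add(source)
        group["polarities"].add(polarity)

    plan = []
    for (subject, effect), group in sorted(groups.items()):
        if len(group["sources"]) < 2:
            continue  # a comparison needs at least two sources
        label = "contradiction" if len(group["polarities"]) > 1 else "agreement"
        plan.append((label, subject, effect, sorted(group["sources"])))
    return plan


def realize(plan):
    """Template-based surface realization (a stand-in for a language model)."""
    sentences = []
    for label, subject, effect, sources in plan:
        who = " and ".join(sources)
        if label == "agreement":
            sentences.append(f"{who} agree on the link between {subject} and {effect}.")
        else:
            sentences.append(f"{who} disagree on how {subject} affects {effect}.")
    return sentences
```

Keeping the planner deterministic means only relation extraction and realization need learned components, which is the property the abstract credits with making training feasible under limited annotations.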

Journal Article
TL;DR: Experimental results show that there is potential for improving retrieval through query-specific fusion and that analysts found the Detailed Multiple Document Summary to be extremely useful for almost every query, while the Thumbnail sketch was useful in approximately 50% of the queries.
Abstract: A Natural Language Processing-based Information Retrieval System, one of the original systems developed in Phase I of TIPSTER, was the basis of research in TIPSTER III, whose goal was to add two extended capabilities to the core system. Following a description of the multiple levels of linguistic processing developed for the original DR-LINK System, details are provided on research into query-specific data fusion and query-specific cross-document summarization. Experimental results show that there is potential for improving retrieval through query-specific fusion, and that analysts found the Detailed Multiple Document Summary to be extremely useful for almost every query, while the Thumbnail sketch was useful for approximately 50% of the queries.

3 citations
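The abstract does not say which fusion method DR-LINK used. As a stand-in, the Python sketch below implements CombMNZ, a well-known score-fusion technique: each run's scores are min-max normalized, summed per document, and multiplied by the number of runs that retrieved that document. The optional per-run `weights`, which a query-specific scheme might set differently for each query, are an assumption added for illustration.

```python
def min_max_normalize(run):
    """Scale one run's scores to [0, 1]; a run with identical scores maps to 1.0."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_mnz(runs, weights=None):
    """Weighted CombMNZ over a list of {doc_id: score} runs.

    fused(doc) = (sum of weight * normalized score) * (number of runs containing doc)
    Returns (doc_id, fused_score) pairs, best first; ties are broken by doc_id.
    """
    if weights is None:
        weights = [1.0] * len(runs)
    if len(weights) != len(runs):
        raise ValueError("need exactly one weight per run")

    totals, hits = {}, {}
    for run, weight in zip(runs, weights):
        for doc, score in min_max_normalize(run).items():
            totals[doc] = totals.get(doc, 0.0) + weight * score
            hits[doc] = hits.get(doc, 0) + 1

    fused = {doc: totals[doc] * hits[doc] for doc in totals}
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

The multiplication by the hit count rewards documents that several systems agree on; in a query-specific setup, the `weights` could instead favor whichever run tends to work best for a given query type.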

Journal Article
TL;DR: This query-sensitive summarization system performs more promisingly than measures such as cosine similarity and the Jaccard measure, which rely on sparse term-frequency vectors, because it uses the most frequent term sets to measure relevance.
Abstract: Query-sensitive summarization aims at extracting the query-relevant content from web documents. Web page segmentation reduces the run-time overhead of summarization systems by grouping the related content of a web page into segments. At query time, the query-relevant segments of the web page are identified, and important sentences from these segments are extracted to compose the summary. The DOM tree structures of the web documents are used to segment their content: leaf nodes of the DOM trees are merged into segments according to a statistical and linguistic similarity measure. The proposed system has been evaluated intrinsically using a user satisfaction index, and its performance is compared with summarization that does not use preprocessed segments. The system performs more promisingly than measures such as cosine similarity and the Jaccard measure, which rely on sparse term-frequency vectors, because the most frequent term sets are used to measure relevance. Only the relevant segments need to be processed at run time, which reduces the time complexity of the summarization process.

3 citations
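The segmentation step and the baseline measures named in this abstract can be illustrated with the Python standard library. In the sketch below, `html.parser` collects non-empty text nodes as an approximation of DOM leaf nodes, adjacent leaves are merged while their Jaccard similarity with the current segment reaches an assumed threshold, and segments are matched to a query by cosine similarity over term-frequency vectors. These are the baseline measures the paper compares against, not its frequent-term-set measure; the thresholds and function names are hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class LeafTextCollector(HTMLParser):
    """Collects non-empty text nodes in document order (approximate DOM leaves)."""

    def __init__(self):
        super().__init__()
        self.leaves = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.leaves.append(text)


def terms(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard similarity of the two texts' term sets."""
    sa, sb = set(terms(a)), set(terms(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of sparse term-frequency vectors."""
    ca, cb = Counter(terms(a)), Counter(terms(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def segment(leaves, threshold=0.2):
    """Merge each leaf into the current segment if their Jaccard similarity meets the threshold."""
    segments = []
    for leaf in leaves:
        if segments and jaccard(segments[-1], leaf) >= threshold:
            segments[-1] += " " + leaf
        else:
            segments.append(leaf)
    return segments


def relevant_segments(segments, query, min_score=0.1):
    """Segments whose cosine similarity to the query reaches min_score, best first."""
    scored = [(cosine(s, query), s) for s in segments]
    return [s for score, s in sorted(scored, reverse=True) if score >= min_score]
```

Segmentation happens once per page, ahead of query time, so only the cheap `relevant_segments` filter runs per query; this mirrors the abstract's point that processing only the relevant segments reduces run-time cost.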

Book Chapter
01 Apr 2012
TL;DR: This paper investigates how various summarization techniques affect image retrieval performance and shows that significant improvements can be obtained when the summaries are used for indexing.
Abstract: Images with geo-tagging information are increasingly available on the Web. However, such images need to be annotated with additional textual information if they are to be retrievable, since users do not search by geo-coordinates. We propose to automatically generate such textual information by (1) generating toponyms from the geo-tagging information, (2) retrieving Web documents using the toponyms as queries, and (3) summarizing the retrieved documents. The summaries are then used to index the images. In this paper we investigate how various summarization techniques affect image retrieval performance and show that significant improvements can be obtained when the summaries are used for indexing.

3 citations
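Step (1) and the final indexing step of this pipeline can be sketched in Python, assuming a hypothetical three-entry gazetteer: a geo-tag is mapped to the nearest toponym by great-circle (haversine) distance, and each image is then indexed under the terms of its summary. Web retrieval (step 2) and summarization (step 3) are not shown; the summaries are assumed to be given, and nothing here reproduces the chapter's own implementation.

```python
import math
import re
from collections import defaultdict

# Hypothetical miniature gazetteer: (toponym, latitude, longitude).
GAZETTEER = [
    ("Eiffel Tower", 48.8584, 2.2945),
    ("Colosseum", 41.8902, 12.4922),
    ("Big Ben", 51.5007, -0.1246),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth (radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_toponym(lat, lon, gazetteer=GAZETTEER):
    """Step (1): map a geo-tag to the closest gazetteer entry's name."""
    return min(gazetteer, key=lambda g: haversine_km(lat, lon, g[1], g[2]))[0]


def build_index(image_summaries):
    """Index each image id under every term of its (already generated) summary."""
    index = defaultdict(set)
    for image_id, summary in image_summaries.items():
        for term in re.findall(r"[a-z]+", summary.lower()):
            index[term].add(image_id)
    return index


def search(index, query):
    """Return the image ids indexed under every query term (conjunctive match)."""
    term_sets = [index.get(t, set()) for t in re.findall(r"[a-z]+", query.lower())]
    return set.intersection(*term_sets) if term_sets else set()
```

A user searching for "tower paris" can then find an image whose only metadata was a coordinate near the Eiffel Tower, which is the retrievability gap the abstract describes.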


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52