Topic

Multi-document summarization

About: Multi-document summarization is a research topic. To date, 2270 publications have appeared on this topic, receiving 71850 citations.


Papers
Journal Article
TL;DR: This paper applies different supervised learning techniques to build query-focused multi-document summarization systems, where the task is to produce automatic summaries in response to a given query or specific information request stated by the user.
Abstract: In this paper, we apply different supervised learning techniques to build query-focused multi-document summarization systems, where the task is to produce automatic summaries in response to a given query or specific information request stated by the user. A huge amount of labeled data is a prerequisite for supervised training. It is expensive and time-consuming when humans perform the labeling task manually. Automatic labeling can be a good remedy to this problem. We employ five different automatic annotation techniques to build extracts from human abstracts using ROUGE, Basic Element overlap, syntactic similarity measure, semantic similarity measure, and Extended String Subsequence Kernel. The supervised methods we use are Support Vector Machines, Conditional Random Fields, Hidden Markov Models, Maximum Entropy, and two ensemble-based approaches. During different experiments, we analyze the impact of automatic labeling methods on the performance of the applied supervised methods. To our knowledge, no other study has deeply investigated and compared the effects of using different automatic annotation techniques on different supervised learning approaches in the domain of query-focused multi-document summarization.

35 citations
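
The automatic labeling idea in the abstract above (building extracts from human abstracts by scoring document sentences against them) can be illustrated with a minimal sketch. It assumes a simplified ROUGE-1-style unigram recall rather than the official ROUGE toolkit, and it is not the authors' implementation. The function names `rouge1_recall` and `label_sentences` and the `top_k` parameter are hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_recall(candidate, reference):
    """Simplified ROUGE-1 recall: clipped unigram overlap / reference length."""
    cand = Counter(tokens(candidate))
    ref = Counter(tokens(reference))
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


def label_sentences(sentences, abstract, top_k=2):
    """Label the top_k sentences most similar to the abstract as 1, the rest as 0.

    Assumption: a fixed top_k cutoff stands in for whatever selection rule a
    real annotation pipeline would use.
    """
    scores = [rouge1_recall(s, abstract) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    positive = set(ranked[:top_k])
    return [1 if i in positive else 0 for i in range(len(sentences))]


if __name__ == "__main__":
    doc = [
        "The storm flooded the city center.",
        "Officials ordered an evacuation of the coast.",
        "A local bakery won an award.",
    ]
    human_abstract = "A storm flooded the city and officials ordered an evacuation."
    print(label_sentences(doc, human_abstract))  # [1, 1, 0]
```

The resulting 0/1 labels could then serve as training targets for a supervised sentence classifier such as the ones listed in the abstract.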

01 Jan 2016
TL;DR: This thesis proposes the task of multi-viewpoint summarization of multilingual social text streams, by monitoring viewpoints for a running topic and selecting a small set of informative documents.
Abstract: In this dissertation, we continue previous research on understanding social media documents along three lines: summarization, classification and recommendation. Our first line of work is the summarization of social media documents. Considering the task of time-aware tweets summarization, we first focus on the problem of selecting meaningful tweets given a user’s interests and propose a dynamic latent factor model. Thereafter, given a set of opinionated documents, we address the task of summarizing contrastive themes by selecting meaningful sentences to represent contrastive themes in those documents. A viewpoint is a triple consisting of an entity, a topic related to this entity and sentiment towards this topic. In this thesis, we also propose the task of multi-viewpoint summarization of multilingual social text streams, by monitoring viewpoints for a running topic and selecting a small set of informative documents. Our second line of work concerns hierarchical multi-label classification. Hierarchical multi-label classification assigns a document to multiple hierarchical labels. Here, we focus on hierarchical multi-label classification of social text streams, in which we propose a structured learning framework to classify a short text from a social text stream to multiple classes from a predefined hierarchy. Based on a viewpoint extraction model that we propose as part of a multi-viewpoint summarization task, our third line of work applies a latent factor model for predicting item ratings that uses user opinions and social relations to generate explanations.

34 citations

Proceedings Article
12 Feb 2017
TL;DR: This paper introduces Active Video Summarization (AVS), an interactive approach to gather the user's preferences while creating the summary, and introduces a new dataset for customized video summarization (CSumm).
Abstract: To facilitate the browsing of long videos, automatic video summarization provides an excerpt that represents its content. In the case of egocentric and consumer videos, due to their personal nature, adapting the summary to specific user's preferences is desirable. Current approaches to customizable video summarization obtain the user's preferences prior to the summarization process. As a result, the user needs to manually modify the summary to further meet the preferences. In this paper, we introduce Active Video Summarization (AVS), an interactive approach to gather the user's preferences while creating the summary. AVS asks questions about the summary to update it on-line until the user is satisfied. To minimize the interaction, the best segment to inquire next is inferred from the previous feedback. We evaluate AVS in the commonly used UTEgo dataset. We also introduce a new dataset for customized video summarization (CSumm) recorded with a Google Glass. The results show that AVS achieves an excellent compromise between usability and quality. In 41% of the videos, AVS is considered the best over all tested baselines, including summaries manually generated. Also, when looking for specific events in the video, AVS provides an average level of satisfaction higher than those of all other baselines after only six questions to the user.

34 citations

Patent
30 May 2006
TL;DR: This patent describes a method for extraction and summarization of sentiment information related to a particular research subject, which includes accessing sources of information that contain sentiment information related to the research subject and extracting that sentiment information from the sources as opinions related to the research subject.
Abstract: Methods and systems for extraction and summarization of sentiment information related to a particular research subject are disclosed. A method includes accessing sources of information that contain sentiment information that is related to the research subject and extracting the sentiment information from the sources of information as opinions related to the research subject. Opinion categories related to features of the research subject are identified. From this information a summarization of the sentiment information that is related to the particular research subject that includes the identified opinion categories is generated. Subsequently, access is provided to the summarization for graphical presentation.

34 citations

Posted Content
TL;DR: This paper considers the effect of using lexical cohesion features in summarization, and proposes an algorithm based on a knowledge base that can improve performance compared to state-of-the-art summarization approaches.
Abstract: The technology of automatic document summarization is maturing and may provide a solution to the information overload problem. Nowadays, document summarization plays an important role in information retrieval. With a large volume of documents, presenting the user with a summary of each document greatly facilitates the task of finding the desired documents. Document summarization is the process of automatically creating a compressed version of a given document that provides useful information to users, and multi-document summarization aims to produce a summary delivering the majority of the information content from a set of documents about an explicit or implicit main topic. The lexical cohesion structure of the text can be exploited to determine the importance of a sentence or phrase. Lexical chains are useful tools for analyzing the lexical cohesion structure in a text. In this paper, we consider the effect of using lexical cohesion features in summarization and present an algorithm based on a knowledge base. Our algorithm first finds the correct sense of each word, then constructs the lexical chains, removes the lexical chains that score lower than the others, roughly detects topics from the lexical chains, segments the text with respect to the topics, and selects the most important sentences. The experimental results on open benchmark datasets from DUC01 and DUC02 show that our proposed approach can improve performance compared to state-of-the-art summarization approaches.

34 citations
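
The chain-based selection steps listed in the abstract above (construct lexical chains, drop weak chains, select important sentences) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's algorithm: word sense disambiguation, topic detection, and segmentation are omitted, the hand-written `CONCEPTS` map is a hypothetical stand-in for a real knowledge base, and chain strength is approximated by simple member counts.

```python
import re
from collections import defaultdict

# Hypothetical toy knowledge base mapping words to a shared concept.
# A real system would consult a lexical resource after sense disambiguation.
CONCEPTS = {
    "car": "vehicle", "truck": "vehicle", "engine": "vehicle",
    "rain": "weather", "storm": "weather",
}


def words(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def build_chains(sentences):
    """Group word occurrences into chains keyed by concept.

    Each chain member is a (sentence_index, word) pair.
    """
    chains = defaultdict(list)
    for idx, sentence in enumerate(sentences):
        for word in words(sentence):
            concept = CONCEPTS.get(word)
            if concept is not None:
                chains[concept].append((idx, word))
    return dict(chains)


def strong_chains(chains, min_length=2):
    """Keep only chains with at least min_length members (assumed scoring rule)."""
    return {c: m for c, m in chains.items() if len(m) >= min_length}


def select_sentences(sentences, top_k=1, min_length=2):
    """Return the top_k sentences with the most strong-chain members, in document order."""
    chains = strong_chains(build_chains(sentences), min_length)
    scores = [0] * len(sentences)
    for members in chains.values():
        for idx, _word in members:
            scores[idx] += 1
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]


if __name__ == "__main__":
    text = [
        "The car has a new engine.",
        "A truck passed the car.",
        "Rain fell all day.",
    ]
    print(select_sentences(text))  # ['The car has a new engine.']
```

In this example the single-member "weather" chain is discarded, so only sentences touching the "vehicle" chain receive a score.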


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52