
Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
01 Jan 2015
TL;DR: This paper addresses the summarization of forum threads with domain-independent and language-independent methodology, using recurrent neural networks, and evaluates the system on data from four different web forums, covering different domains, languages and user communities.
Abstract: In the DISCOSUMO project, we aim to develop a computational toolkit to automatically summarize discussion forum threads. In this paper, we present the initial design of the toolkit, the data that we work with and the challenges we face. Discussion threads on a single topic can easily consist of hundreds or even thousands of individual contributions, with no obvious way to gain a quick overview of what kind of information is contained within the thread. We address the summarization of forum threads with domain-independent and language-independent methodology. We evaluate our system on data from four different web forums, covering different domains, languages and user communities. Our approach is largely unsupervised, using recurrent neural networks. Evaluation of the first version should point out where in the pipeline supervised techniques and/or heuristics are required to improve our summarization toolbox. If successful, the automatic summarization of discussion forum threads will play an important role in facilitating easy participation in online discussions.

1 citation

Proceedings Article
01 Jan 2009
TL;DR: The results reveal that the approach based on the proposed event-oriented ontology outperformed the traditional text summarization approach in capturing conceptual and procedural knowledge, but the latter was still better at delivering factual knowledge.
Abstract: Document summarization is an important function for knowledge management as a digital library of text documents grows. It allows documents to be presented in a concise manner for easy reading and understanding. Traditionally, document summarization adopts sentence-based mechanisms that identify and extract key sentences from long documents and assemble them. Although that approach is useful for providing an abstract of documents, it cannot extract the relationship or sequence of a set of related events (also called episodes). This paper proposes an event-oriented ontology approach to constructing episodic knowledge to facilitate the understanding of documents. We also empirically evaluated the proposed approach using instruments developed based on Bloom’s Taxonomy. The results reveal that the approach based on the proposed event-oriented ontology outperformed the traditional text summarization approach in capturing conceptual and procedural knowledge, but the latter was still better at delivering factual knowledge.
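The "traditional" sentence-based mechanism this abstract contrasts with (score sentences, extract the key ones, reassemble them) can be sketched as a frequency-based extractive summarizer. This is a minimal illustration, not the paper's method or its baseline: the scoring rule (average word frequency), the tiny stopword list, and the names `tokenize` and `extract_summary` are all assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real systems use far larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "on"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extract_summary(document: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences, in their original document order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    freq = Counter(tokenize(document))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favored just for being long.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Reassemble the selected sentences in the order they appeared.
    return [sentences[i] for i in sorted(ranked[:k])]


print(extract_summary("Cats purr. Cats and dogs play. Birds sing loudly today."))
# ['Cats purr.', 'Cats and dogs play.']
```

Such a summarizer only picks and orders sentences; it has no notion of how events relate, which is the gap the event-oriented ontology approach targets.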

1 citation

Book Chapter · DOI
20 Sep 2010
TL;DR: The focus of this paper is a question answering system, where the answers are retrieved from a collection of textual documents, which includes automatic document summarization and document visualization by means of a semantic graph.
Abstract: The focus of this paper is a question answering system, where the answers are retrieved from a collection of textual documents. The system also includes automatic document summarization and document visualization by means of a semantic graph. The information extracted from the documents is stored as subject-predicate-object triplets, and the indexed terms are expanded using Cyc, a large common sense ontology.
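A rough sketch of storing subject-predicate-object triplets under expanded index terms follows. It assumes a small hand-written synonym table (`EXPANSIONS`, hypothetical) in place of Cyc, which is not queried here, and it indexes only subject and object words; `Triple`, `expand`, and `TripleIndex` are illustrative names, not the system's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


# Hypothetical expansion table standing in for an ontology lookup such as Cyc.
EXPANSIONS = {"automobile": {"car", "vehicle"}, "writer": {"author"}}


def expand(term: str) -> set[str]:
    """A term plus any related terms listed for it in EXPANSIONS."""
    term = term.lower()
    return {term} | EXPANSIONS.get(term, set())


class TripleIndex:
    def __init__(self) -> None:
        self.by_term: dict[str, set[Triple]] = {}

    def add(self, triple: Triple) -> None:
        # Index the triple under each subject/object word and its expansions,
        # so a query using a related term can still find it.
        for text in (triple.subject, triple.obj):
            for word in text.split():
                for term in expand(word):
                    self.by_term.setdefault(term, set()).add(triple)

    def query(self, term: str) -> set[Triple]:
        return set(self.by_term.get(term.lower(), set()))


index = TripleIndex()
index.add(Triple("Ford", "manufactures", "automobile"))
print(index.query("car"))
```

Expanding terms at indexing time, as the abstract describes, keeps queries a single dictionary lookup at the cost of a larger index.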

1 citation

Journal Article · DOI
TL;DR: A two-stage sentence selection approach is proposed and validated with ROUGE on a test corpus by comparison with the traditional sentence selection method; the influence of the chosen token on the quality of the generated summaries is also analyzed.
Abstract: In contrast to the traditional multi-document summarization method of adding sentences one by one to build a summary, a two-stage sentence selection approach is proposed that generates a summary by deleting sentences from a candidate sentence set. The two stages are the acquisition of a candidate sentence set and the optimal selection of sentences. In the first stage, the candidate sentence set is obtained with a redundancy-based sentence selection approach. In the second stage, sentences are deleted from the candidate set according to their contribution to the whole set until the specified summary length is reached. On a test corpus, the ROUGE scores of the summaries obtained by the proposed approach demonstrate its validity compared with the traditional sentence selection method. The influence of the chosen token on the quality of the generated summaries is also analyzed.
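The two stages described in this abstract can be sketched as follows, under assumptions that do not come from the paper: input sentences are already ranked by some salience score, redundancy is measured as word-set Jaccard similarity against a hypothetical threshold `max_sim`, a sentence's "contribution to the whole set" is taken to be the number of distinct words that would be lost if it were deleted, and length is counted in words.

```python
def words(sentence: str) -> set[str]:
    return set(sentence.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stage1_candidates(ranked: list[str], max_sim: float = 0.5) -> list[str]:
    """Stage 1: walk the ranked sentences, skipping any too similar to one kept."""
    kept: list[str] = []
    for s in ranked:
        if all(jaccard(words(s), words(k)) <= max_sim for k in kept):
            kept.append(s)
    return kept


def coverage(sentences: list[str]) -> int:
    """Number of distinct words covered by a set of sentences."""
    return len(set().union(*(words(s) for s in sentences)))


def stage2_delete(candidates: list[str], max_words: int) -> list[str]:
    """Stage 2: delete the least-contributing sentence until the length fits."""
    summary = list(candidates)
    while summary and sum(len(s.split()) for s in summary) > max_words:
        total = coverage(summary)
        # Contribution = distinct words lost if sentence i were deleted.
        losses = [total - coverage(summary[:i] + summary[i + 1:]) for i in range(len(summary))]
        del summary[losses.index(min(losses))]
    return summary


ranked = ["the cat sat on the mat", "the cat sat on a mat", "dogs bark at night", "birds sing"]
candidates = stage1_candidates(ranked)  # drops the near-duplicate second sentence
print(stage2_delete(candidates, max_words=10))  # ['the cat sat on the mat', 'dogs bark at night']
```

Deleting the lowest-contribution sentence first keeps the sentences that cover the most vocabulary found nowhere else in the candidate set; the paper's own contribution measure may differ.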

1 citation

Proceedings Article · DOI
Alpana Dubey, Atul Kumar
09 Dec 2008
TL;DR: This paper considers the problem of generating a summary from Web reviews together with the rank (usefulness) that other users have assigned to these reviews, and proposes a technique that takes ranked reviews as input and generates a summary.
Abstract: We propose a technique for summarizing Web reviews. Information summarization has become an important problem in today's content-saturated world. One example is the World Wide Web, which provides a platform both to publish and to evaluate information. This collaborative nature of the Web lets users write their opinions on certain topics and also evaluate others' opinions by assigning ranks. In this paper we show that this aspect of the Web can be used to generate more useful summaries. We consider the problem of generating a summary from Web reviews and the rank (usefulness) assigned to these reviews by other users, and we study the usefulness of user ranks in the summarization task. Based on this study, we propose a technique that takes ranked reviews as input and generates a summary. We experiment with different variations of the proposed technique and evaluate them against different criteria.
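One way user ranks could feed into summarization is to weight each review's words by its usefulness score before scoring sentences. This is a hypothetical sketch, not the technique proposed in the paper: `summarize_reviews`, the (text, usefulness) input format, and the average-weighted-frequency scoring are all assumptions.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def summarize_reviews(reviews: list[tuple[str, float]], k: int = 2) -> list[str]:
    """reviews: (text, usefulness) pairs; usefulness is assumed non-negative."""
    weights = Counter()
    sentences: list[str] = []
    for text, usefulness in reviews:
        # Words from reviews that users found more useful get more weight.
        for w in tokens(text):
            weights[w] += usefulness
        sentences.extend(split_sentences(text))
    unique = list(dict.fromkeys(sentences))  # drop repeated sentences, keep order

    def score(sentence: str) -> float:
        ws = tokens(sentence)
        return sum(weights[w] for w in ws) / len(ws) if ws else 0.0

    return sorted(unique, key=score, reverse=True)[:k]


reviews = [("Battery life is great. Screen is dim.", 10.0),
           ("Screen is dim. Shipping was slow.", 1.0)]
print(summarize_reviews(reviews))  # ['Screen is dim.', 'Battery life is great.']
```

Swapping the two usefulness scores promotes "Shipping was slow." over "Battery life is great.", which is the kind of rank sensitivity the paper studies.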

1 citation


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance Metrics
Number of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52