Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have appeared within this topic, receiving 71,850 citations.


Papers
Journal Article
TL;DR: Results from the DUC2005 evaluation indicate that the model of concept co-occurrence graph can be put into practice and is capable of digging out subjects hidden deep in the document set.
Abstract: A concept co-occurrence graph model was proposed and applied to automatic multi-document summarization. The model is based on concept counting: it disambiguates the senses of polysemous words using the semantic resource WordNet and merges concepts. It constructs concept co-occurrence graphs and extracts subject concepts from the multi-document set by means of the co-occurrence information between concepts. Subsequently, it builds a vector space model and computes sentence importance in accordance with the subject concepts. Because it generalizes concepts well, the model is capable of digging out subjects hidden deep in the document set. Results from the DUC2005 evaluation indicate that the concept co-occurrence graph model can be put into practice.

1 citation
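
The pipeline described in the abstract above (build a concept co-occurrence graph, extract subject concepts, score sentences against them) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that concepts have already been extracted per sentence (no WordNet disambiguation or merging is performed here), that subject concepts are simply the nodes with the highest weighted degree, and that sentence importance is the count of subject concepts a sentence contains rather than a full vector space model. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(sentences):
    """Count how often each unordered pair of concepts shares a sentence."""
    edges = Counter()
    for concepts in sentences:
        # Sorting makes (a, b) a canonical key for the undirected edge.
        for a, b in combinations(sorted(set(concepts)), 2):
            edges[(a, b)] += 1
    return edges


def subject_concepts(edges, k):
    """Return the k concepts with the highest weighted degree (an assumed criterion)."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    # Sort by degree (descending), then alphabetically so ties are deterministic.
    ranked = sorted(degree, key=lambda c: (-degree[c], c))
    return ranked[:k]


def score_sentences(sentences, subjects):
    """Score each sentence by the number of distinct subject concepts it contains."""
    subjects = set(subjects)
    return [len(subjects & set(concepts)) for concepts in sentences]


# Toy input: each sentence is already reduced to a list of concepts.
sentences = [
    ["flood", "river", "damage"],
    ["flood", "damage", "rescue"],
    ["weather", "forecast"],
]
edges = cooccurrence_graph(sentences)
subjects = subject_concepts(edges, 2)
scores = score_sentences(sentences, subjects)
```

In this toy input, "damage" and "flood" co-occur twice and so dominate the graph; the two sentences mentioning them receive the highest scores.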

Book Chapter
13 Jan 2014
TL;DR: A web based and domain independent automatic text summarization method that focuses on generating an arbitrary length summary by extracting and assigning scores to semantically important information from the document by analyzing term frequencies and tagging certain parts of speech like proper nouns and signal words.
Abstract: In today's digital epoch, people share and read a motley of never ending electronic information, thus either a lot of time is wasted in deciphering all this information, or only a tiny amount of it is actually read. Therefore, it is imperative to contrive a generic text summarization technique. In this paper, we propose a web based and domain independent automatic text summarization method. The method focuses on generating an arbitrary length summary by extracting and assigning scores to semantically important information from the document, by analyzing term frequencies and tagging certain parts of speech like proper nouns and signal words. Another important characteristic of our approach is that it also takes font semantics of the text (like headings and emphasized texts) into consideration while scoring different entities of the document.

1 citation
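
The scoring idea in the abstract above (term frequencies plus bonuses for proper nouns, signal words, and emphasized text) might look something like the sketch below. It is an assumption-laden illustration rather than the paper's method: proper nouns are approximated by capitalized words that are not sentence-initial instead of a real part-of-speech tagger, the signal-word list and all bonus weights are invented for the example, and font semantics are reduced to a set of indices of emphasized sentences.

```python
import re
from collections import Counter

# Hypothetical signal-word list; the paper's actual list is not given.
SIGNAL_WORDS = {"important", "significant", "conclusion"}


def tokenize(text):
    """Split text into alphabetic word tokens."""
    return re.findall(r"[A-Za-z']+", text)


def score_sentences(sentences, emphasized=frozenset(),
                    signal_bonus=1.0, proper_bonus=0.5, emphasis_bonus=2.0):
    """Score sentences by summed term frequency plus assumed bonuses."""
    tf = Counter(w.lower() for s in sentences for w in tokenize(s))
    scores = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        score = float(sum(tf[w.lower()] for w in words))
        score += signal_bonus * sum(w.lower() in SIGNAL_WORDS for w in words)
        # Crude proper-noun proxy: capitalized words after the first word.
        score += proper_bonus * sum(w[0].isupper() for w in words[1:])
        if i in emphasized:  # e.g. headings or bold text in the source page
            score += emphasis_bonus
        scores.append(score)
    return scores


def summarize(sentences, n, **kwargs):
    """Return the n highest-scoring sentences in their original order."""
    scores = score_sentences(sentences, **kwargs)
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:n]
    return [sentences[i] for i in sorted(top)]
```

The `n` parameter reflects the "arbitrary length summary" mentioned in the abstract: the caller chooses how many sentences to keep, and the selected sentences are returned in document order.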

Journal Article
TL;DR: A framework for an ontology-based query language for data summarization is built on a proposed generic ontology structure; a prototype Query by Examples (QBE) project for summarizing data incompleteness demonstrates the effectiveness of the framework.
Abstract: Highlights: • Data summarization query system is discussed. • A generic structure of ontology for data summarization query system is proposed. • A framework of ontology-based query language of data summarization based on the proposed ontology structure is developed. • A prototype project of data summarization ontology-based Query by Examples (QBE) is presented. Data summarization has recently received considerable attention in the knowledge systems community. This paper discusses the design of data summarization query system. Based on an initial analysis of requirement representations in data summarization, the study develops a generic organization of ontology for data summarization query system. Furthermore, this paper proposes a framework of ontology-based query language of data summarization based on the proposed ontology structure. A prototype project of data summarization ontology-based Query by Examples (QBE) for summarizing the data incompleteness demonstrates the effectiveness of the proposed framework.

1 citation

Journal Article
TL;DR: Wang et al. propose a novel Heterogeneous Tree structure-based extractive Summarization (HetTreeSum) model, in which each document is modeled as a tree structure to learn inter-sentence relations and the structural information of the original document is incorporated.
Abstract: Scientific paper summarization aims at generating a short and concise digest while preserving important information of the original document. Currently, scientific paper summarization faces two main challenges. First, inter-sentence relations are hard to learn, especially in the case of long-form scientific papers. Second, structural information of the well-structured scientific papers has not been fully exploited. To overcome the above two challenges, we propose a novel Heterogeneous Tree structure-based extractive Summarization (HetTreeSum) model, where each document is modeled as a tree structure to learn inter-sentence relations and structural information of the original document is incorporated, enabling the tree structure to have a global perspective of the whole document. Then an iterative updating strategy is presented to interactively refine nodes of the tree structure for better contextualized representations, which can further enhance summarization performance. Experimental results on PubMed and arXiv datasets show that our proposed HetTreeSum model achieves significantly better performance than various scientific paper summarization models.

1 citation

27 Sep 2017
TL;DR: A new phrase-based highlighting scheme for automatic summarization is introduced that highlights the phrases in the human summaries and also the corresponding semantically-equivalent phrases in student responses.
Abstract: Educational research has demonstrated that asking students to respond to reflection prompts can improve both teaching and learning. However, summarizing student responses to these prompts is an onerous task for humans and poses challenges for existing summarization methods. From the input perspective, there are three challenges. First, there is a lexical variety problem, because different students tend to use different expressions. Second, there is a length variety problem: student inputs range from single words to multiple sentences. Third, there is a redundancy issue, since some content in student responses is not useful. From the output perspective, there are two additional challenges. First, the human summaries consist of a list of important phrases instead of sentences. Second, from an instructor's perspective, the number of students who have a particular problem or are interested in a particular topic is valuable. The goal of this research is to enhance student response summarization at multiple levels of granularity. At the sentence level, we propose a novel summarization algorithm by extending the traditional ILP-based framework with a low-rank matrix approximation to address the challenge of lexical variety. At the phrase level, we propose a phrase summarization framework that combines phrase extraction, phrase clustering, and phrase ranking. Experimental results show its effectiveness on multiple student response data sets. Also at the phrase level, we propose a quantitative phrase summarization algorithm in order to estimate the number of students who semantically mention the phrases in a summary. We first introduce a new phrase-based highlighting scheme for automatic summarization. It highlights the phrases in the human summaries and also the corresponding semantically-equivalent phrases in student responses. Enabled by the highlighting scheme, we improve the previous phrase-based summarization framework by developing supervised candidate phrase extraction, learning to estimate the phrase similarities, and experimenting with different clustering algorithms to group phrases into clusters. Experimental results show that our proposed methods not only yield better summarization performance evaluated using ROUGE, but also produce summaries that capture the pressing student needs.

1 citation


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  74
2022  160
2021  52
2020  61
2019  47
2018  52