Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have appeared within this topic, receiving 71,850 citations.


Papers
01 Jan 2005
TL;DR: A multi-document summarization method based on Latent Semantic Indexing (LSI) that combines several reports on the same issue into a matrix of terms and sentences, and uses a Singular Value Decomposition (SVD) to reduce the dimension of the matrix and extract features.
Abstract: A multi-document summarization method based on Latent Semantic Indexing (LSI) is proposed. The method combines several reports on the same issue into a matrix of terms and sentences, and uses a Singular Value Decomposition (SVD) to reduce the dimension of the matrix and extract features, and then the sentence similarity is computed. The sentences are clustered according to similarity of sentences. The centroid sentences are selected from each class. Finally, the selected sentences are ordered to generate the summarization. The evaluation and results are presented, which prove that the proposed methods are efficient.
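
The pipeline in this abstract (term-sentence matrix, SVD, sentence similarity, clustering, centroid selection, ordering) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: raw term counts as matrix weights, cosine similarity in the reduced space, a greedy threshold clustering, the "centroid sentence" taken as the member most similar to the rest of its cluster, and original document order for the final ordering. The function names and the parameters `k` and `threshold` are hypothetical.

```python
import re
import numpy as np

def term_sentence_matrix(sentences):
    """Raw term-frequency matrix: rows are terms, columns are sentences."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for t in toks:
            A[index[t], j] += 1.0
    return A

def lsi_sentence_vectors(A, k):
    """Project each sentence into the top-k latent dimensions via SVD."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].T * s[:k]  # shape: (n_sentences, k)

def cosine_similarity_matrix(X):
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty sentences
    Xn = X / norms
    return Xn @ Xn.T

def cluster_by_threshold(sim, threshold):
    """Greedy clustering (assumed): join the first cluster whose seed is similar enough."""
    clusters = []
    for i in range(sim.shape[0]):
        for cluster in clusters:
            if sim[i, cluster[0]] >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def centroid_sentence(cluster, sim):
    """Member with the highest total similarity to its own cluster."""
    return max(cluster, key=lambda i: sim[i, cluster].sum())

def lsi_summarize(sentences, k=2, threshold=0.8):
    A = term_sentence_matrix(sentences)
    k = min(k, min(A.shape))
    sim = cosine_similarity_matrix(lsi_sentence_vectors(A, k))
    clusters = cluster_by_threshold(sim, threshold)
    chosen = sorted(centroid_sentence(c, sim) for c in clusters)  # original order
    return [sentences[i] for i in chosen]
```

Called on sentences pooled from several reports, `lsi_summarize` returns one representative sentence per cluster. In practice `k` and `threshold` would need tuning on the corpus at hand.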

3 citations

01 Jan 2006
TL;DR: The SummIt-BMT system uses a domain ontology, stored in a MySQL database, for information extraction and question answering / summarization in the context of bone marrow transplantation (BMT).
Abstract: This article reports on ontology use for an automatic summarization that "goes the human way". The idea behind it is that human summary users can comprehend and integrate automatic summaries more easily if they and the automatic summarizer share summarization principles and practices. Our current first real-world application is in bone marrow transplantation (BMT). At the core of the SummIt-BMT system, a domain ontology in a MySQL database provides knowledge for human users and system components. SummIt-BMT supports query formulation through an empirically founded scenario interface. Incoming retrieval results are pre-selected by a text retrieval component and submitted to agents reflecting summarization strategies of competent humans. The agents choose from the text passage retrieval result the sentences that best fit the user question as evidenced by ontology propositions occurring in them. The relevant text clips are entered into the answer version of the question scenario and presented with links to their home positions in the source documents. Summarization and information extraction are ontology-based. They use the relatively well-defined concepts for objects and properties and find evidence for relations between them with the help of paraphrases. Discussion concentrates on the ontology and its use for information extraction and question answering / summarization. The system agents are heavy users of the ontology. They typically fetch and combine different types of knowledge from the ontology database: concepts, propositions and their semanto-syntactic schemes, unifiers, paraphrases and query scenario forms. The main achievement of the agents is to keep only text retrieval results that meet user question propositions not only by individual concepts, but also by related units corresponding to phrases or sentences. Our first results are presented in the final section of the paper. They are not yet excellent, but quite good for a start-up team of agents and an ontology that is open for improvement.

3 citations

Proceedings Article
01 Dec 2009
TL;DR: The proposed approach to query-focused multi-document summarization makes use of both the content feature and the relationship feature to select a number of sentences via co-training based semi-supervised learning, which can identify the query-relevant sentences beyond a single point of view.
Abstract: This paper presents a novel approach to query-focused multi-document summarization. As a good biased summary is expected to keep a balance among query relevance, content salience and information diversity, the approach first makes use of both the content feature and the relationship feature to select a number of sentences via co-training based semi-supervised learning, which can identify the query-relevant sentences beyond a single point of view. Then a ranking algorithm based on Markov chain random walks is employed on the relevant sentences, encouraging content salience and information diversity in a unified framework. The final summary, focusing on the integration of relevance, salience and diversity, is created by extracting the several sentences with the highest overall ranking scores. We performed experiments on the DUC2007 dataset, and the evaluation results show that the proposed approach achieves significant improvement over standard baseline approaches and performance comparable to the state-of-the-art systems.
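
The abstract does not spell out its ranking formulation, so the following is only a rough sketch of the general idea, not the paper's algorithm. It assumes a PageRank-style random walk over a sentence-similarity graph for salience, followed by a simple greedy redundancy discount for diversity. This is a two-step stand-in rather than the unified framework the abstract mentions. The function names, the damping factor, and the multiplicative discount are all hypothetical choices, and similarities are assumed to lie in [0, 1].

```python
def random_walk_scores(sim, damping=0.85, iters=100):
    """Stationary scores of a damped random walk over the similarity graph."""
    n = len(sim)
    # Row-normalise similarities into transition probabilities, ignoring self-loops.
    P = []
    for i in range(n):
        row = [sim[i][j] if j != i else 0.0 for j in range(n)]
        total = sum(row)
        P.append([w / total for w in row] if total > 0 else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * P[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores

def select_diverse(sim, scores, m):
    """Greedily pick m sentences, discounting those similar to earlier picks."""
    remaining = dict(enumerate(scores))
    chosen = []
    while remaining and len(chosen) < m:
        best = max(remaining, key=remaining.get)
        chosen.append(best)
        del remaining[best]
        for j in remaining:
            # Assumed redundancy penalty: shrink the score by similarity to the pick.
            remaining[j] *= 1.0 - sim[best][j]
    return chosen
```

With three near-duplicate sentences and one outlier, the walk scores the duplicates highest, but the discount step lets the outlier take the second slot, which is the salience/diversity trade-off the abstract describes.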

3 citations

Proceedings Article
23 Dec 2013
TL;DR: In the proposed system, classification of keywords by higher-ranking topics plays an active role in extracting the summary; the resulting summarization ratio on the social web is 40%-50%.
Abstract: The proposed system is a text summarization system for social web sites. It works by assigning scores to the sentences in the document to be summarized and using the highest-ranking sentences in the summary. Ranking values are based on features extracted from each sentence; a linear combination of feature scores is used. In addition to basic summarization, some attempt is made to address the issue of targeting the text at the user. The intended user is considered to have little background knowledge or reading ability. The system helps by simplifying the individual words used in the summary. In the proposed system, classification of keywords by higher-ranking topics plays an active role in extracting the summary; the resulting summarization ratio on the social web is 40%-50%.
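
As a minimal sketch of scoring sentences by a linear combination of feature scores, the example below keeps the top-scoring fraction of sentences. The abstract does not list its features or weights, so the three features (position, length, keyword density), the `WEIGHTS` values, and the function names are hypothetical placeholders. The `ratio` parameter mirrors the 40%-50% summarization ratio reported above.

```python
import math
import re

# Hypothetical feature weights; the source does not specify them.
WEIGHTS = {"position": 0.2, "length": 0.2, "keywords": 0.6}

def sentence_features(sentence, position, total, keywords):
    """Compute illustrative feature scores, each in the range [0, 1]."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return {
        "position": 1.0 - position / total,       # earlier sentences score higher
        "length": min(len(words) / 20.0, 1.0),    # longer sentences, capped at 20 words
        "keywords": sum(w in keywords for w in words) / max(len(words), 1),
    }

def linear_score(features, weights=WEIGHTS):
    """Linear combination of feature scores."""
    return sum(weights[name] * value for name, value in features.items())

def feature_summarize(sentences, keywords, ratio=0.5):
    """Keep the top `ratio` fraction of sentences, returned in original order."""
    n = len(sentences)
    scored = [
        (linear_score(sentence_features(s, i, n, keywords)), i)
        for i, s in enumerate(sentences)
    ]
    keep = max(1, math.ceil(ratio * n))
    top = sorted(scored, reverse=True)[:keep]
    return [sentences[i] for i in sorted(i for _, i in top)]
```

The word-simplification step mentioned in the abstract is not covered here; it would run as a separate pass over the selected sentences.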

3 citations

Book Chapter
24 Apr 2009
TL;DR: This work proposes an approach for the enrichment of geographical databases (GDB) and especially their semantic component by providing knowledge extracted from web documents to supplement the aspatial data of the GDB.
Abstract: The power of geographic information systems is to help managers make the critical decisions they face daily. The ability to make sound decisions relies upon the availability of relevant information. Typically, spatial databases do not contain much information that could support the decision making process in all situations. To extend the available dataset, we propose an approach for the enrichment of geographical databases (GDB) and especially their semantic component. This enrichment is performed by providing knowledge extracted from web documents to supplement the aspatial data of the GDB. The knowledge extraction process is reached through the generation of condensed representation of the relevant information derived from a web corpus. This process is carried out in a distributed fashion that complies with the multi-agents paradigm.

3 citations


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52