Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Proceedings Article
01 May 2010
TL;DR: A new algorithm for automatic summarization of specialized texts that combines terminological and semantic resources (a term extractor and an ontology). It obtains quite good results, although there appears to be room for improvement.
Abstract: This paper presents a new algorithm for automatic summarization of specialized texts combining terminological and semantic resources: a term extractor and an ontology. The term extractor provides the list of the terms that are present in the text together with their corresponding termhood. The ontology is used to calculate the semantic similarity between the terms found in the main body and those present in the document title. The general idea is to obtain a relevance score for each sentence, taking into account both the "termhood" of the terms found in that sentence and the similarity between those terms and the terms present in the title of the document. The sentences with the highest scores are chosen to form part of the final summary. We evaluate the algorithm with Rouge, comparing the resulting summaries with the summaries of other summarizers. The sentence selection algorithm was also tested as part of a standalone summarizer. In both cases it obtains quite good results, although the perception is that there is room for improvement.

8 citations
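
The sentence-scoring idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how termhood and title similarity are combined, so the weighted average with `alpha`, the `exact_match` stand-in for the ontology-based similarity, and all toy data are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

def score_sentence(terms: List[str], termhood: Dict[str, float],
                   title_terms: List[str],
                   similarity: Callable[[str, str], float],
                   alpha: float = 0.5) -> float:
    """Weighted mix of mean termhood and mean best similarity to a title term.

    The mixing weight alpha is an assumption, not taken from the paper.
    """
    if not terms:
        return 0.0
    mean_termhood = sum(termhood.get(t, 0.0) for t in terms) / len(terms)
    if title_terms:
        mean_sim = sum(max(similarity(t, tt) for tt in title_terms)
                       for t in terms) / len(terms)
    else:
        mean_sim = 0.0
    return alpha * mean_termhood + (1 - alpha) * mean_sim

def summarize(sentences: List[Tuple[str, List[str]]], termhood: Dict[str, float],
              title_terms: List[str], similarity: Callable[[str, str], float],
              k: int = 1) -> List[str]:
    """Return the k highest-scoring sentences, kept in document order."""
    scored = [(score_sentence(terms, termhood, title_terms, similarity), i)
              for i, (_, terms) in enumerate(sentences)]
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]
    return [sentences[i][0] for i in sorted(i for _, i in top)]

def exact_match(a: str, b: str) -> float:
    """Hypothetical stand-in for an ontology-based semantic similarity."""
    return 1.0 if a == b else 0.0

# Toy inputs (invented for illustration only).
toy_termhood = {"insulin": 0.9, "glucose": 0.8, "weather": 0.1}
toy_sentences = [
    ("Insulin regulates glucose.", ["insulin", "glucose"]),
    ("The weather was mild.", ["weather"]),
]
toy_summary = summarize(toy_sentences, toy_termhood, ["insulin"], exact_match, k=1)
```

In a real system, the term list and termhood values would come from the term extractor, and `similarity` would query the ontology.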

Proceedings Article
29 Oct 2012
TL;DR: A novel approach is proposed that integrates query-oriented relevance, information richness, and novelty by treating them as sentence features, so that the generated summary reflects the combined effect of these properties.
Abstract: Query-oriented relevance, information richness and novelty are important requirements in query-focused summarization, which, to a considerable extent, determine the summary quality. Previous work either rarely took into account all of the above demands simultaneously or dealt with only some of them in the dynamic process of choosing sentences to generate a summary. In this paper, we propose a novel approach that integrates all these requirements by treating them as sentence features, so that the generated summary can fully reflect the combined effect of these properties. Experimental results on the DUC2005 and DUC2006 datasets demonstrate the effectiveness of our approach.

8 citations

Patent
15 Dec 2017
TL;DR: A method, computer system, and computer program product for generating a multi-document summary based on a query statement and one or more documents.
Abstract: A method, computer system, and computer program product for generating a multi-document summary is provided. The embodiment may include receiving a query statement, one or more documents, one or more summary constraints, and quality goals. The embodiment may include identifying one or more keywords within the query statement. The embodiment may include performing a sentence selection from the one or more documents based on the one or more identified keywords. The embodiment may include generating a plurality of candidate summaries of the one or more documents based on the performed sentence selection, the goals, and a cross entropy method. The embodiment may include calculating a quality score for each of the plurality of generated candidate summaries using a plurality of quality features. The embodiment may include selecting a candidate summary from the plurality of generated candidate summaries with the highest calculated quality score that also satisfies a quality score threshold.

8 citations
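
The candidate-generation and selection steps described in the patent abstract above can be sketched roughly as follows. This is an illustrative reading, not the patented method: it assumes "cross entropy method" refers to the sampling-based optimization technique, models the summary constraint as a hypothetical `max_words` limit, and uses keyword coverage as the only quality feature. All names and toy data are invented.

```python
import random
from typing import List, Optional, Set, Tuple

def quality(selection: List[bool], sentences: List[str],
            keywords: Set[str], max_words: int) -> float:
    """Keyword coverage of the selected sentences; 0 if empty or too long."""
    chosen = [s for s, keep in zip(sentences, selection) if keep]
    words = [w.lower().strip(".,") for s in chosen for w in s.split()]
    if not words or len(words) > max_words:
        return 0.0
    return len(keywords & set(words)) / len(keywords)

def cross_entropy_summary(sentences: List[str], keywords: Set[str],
                          max_words: int, threshold: float = 0.5,
                          n_samples: int = 50, elite_frac: float = 0.2,
                          n_iters: int = 20, smoothing: float = 0.7,
                          seed: int = 0) -> Tuple[Optional[List[str]], float]:
    """Sample sentence subsets, then shift inclusion probabilities toward
    the best-scoring (elite) samples on each iteration."""
    rng = random.Random(seed)
    probs = [0.5] * len(sentences)
    best_sel, best_score = None, -1.0
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        samples = [[rng.random() < p for p in probs] for _ in range(n_samples)]
        scored = sorted(((quality(s, sentences, keywords, max_words), s)
                         for s in samples), key=lambda p: p[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best_sel = scored[0]
        elite = [s for _, s in scored[:n_elite]]
        for j in range(len(probs)):
            freq = sum(s[j] for s in elite) / n_elite
            probs[j] = smoothing * freq + (1 - smoothing) * probs[j]
    if best_score < threshold:
        return None, best_score  # no candidate satisfies the quality threshold
    return [s for s, keep in zip(sentences, best_sel) if keep], best_score

# Toy inputs (invented for illustration only).
toy_docs = [
    "Flooding closed the main bridge.",
    "Officials opened two shelters.",
    "The bridge will reopen after inspection.",
    "Local teams played on Sunday.",
]
ce_summary, ce_score = cross_entropy_summary(
    toy_docs, {"bridge", "flooding", "shelters"}, max_words=12)
```

The function returns `None` for the summary when no sampled candidate reaches the threshold, mirroring the abstract's requirement that the selected candidate also satisfy a quality score threshold.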

Proceedings Article
22 Sep 2010
TL;DR: A novel approach for summarizing documents retrieved from the Internet is proposed: it captures the semantic nature of a document, expressed in natural language, as a set of RDF triples and clusters those triples to aggregate similar information.
Abstract: Document summarization techniques automatically extract relevant information from different sources with respect to a list of topics: they can be profitably used by a variety of applications, in particular for automatic indexing and categorization, in order to facilitate the production and delivery of new multimedia content. In this paper we propose a novel approach for summarizing documents retrieved from the Internet: we propose to capture the semantic nature of a document, expressed in natural language, in order to retrieve a number of RDF triples and to cluster them, aggregating similar information. An overview of the system and some preliminary results are described.

8 citations

Proceedings Article
17 Nov 2008
TL;DR: This work extracted several types of features for each sentence in the document collection to measure its relevance to the user query, experimented with two well-known unsupervised statistical machine learning techniques (the K-Means and EM algorithms), and evaluated their performance.
Abstract: When a user is served with a ranked list of relevant documents by a standard document search engine, his search task is usually not over. He has to go through the entire contents of each document to judge its relevance and to find the precise piece of information he was looking for. Query-relevant summarization tries to remove this onus from the end user by providing more condensed and direct access to relevant information. Query-relevant summarization is the task of synthesizing a fluent, well-organized summary of the document collection that answers the user's questions. We extracted several features of different types (i.e. lexical, lexical semantic, statistical and cosine similarity) for each of the sentences in the document collection in order to measure its relevance to the user query. We experimented with two well-known unsupervised statistical machine learning techniques, the K-Means and EM algorithms, and evaluated their performance. For all these methods of generating summaries, we have shown the effects of different kinds of features.

8 citations
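
The clustering step mentioned in the abstract above can be illustrated with a small K-Means sketch over per-sentence feature vectors. It is a simplified illustration rather than the authors' system: only two features are used (query-term overlap and bag-of-words cosine similarity), initialization takes the first `k` points for determinism, and the rule "the cluster with the larger centroid is the relevant one" is an assumption. The toy query and sentences are invented.

```python
import math
from collections import Counter
from typing import List, Tuple

def tokens(text: str) -> List[str]:
    return [w.strip(".,?").lower() for w in text.split() if w.strip(".,?")]

def cosine(a: List[str], b: List[str]) -> float:
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def features(sentence: str, query: str) -> List[float]:
    """Two illustrative features: query-term overlap and cosine similarity."""
    s, q = tokens(sentence), tokens(query)
    overlap = len(set(s) & set(q)) / len(set(q)) if q else 0.0
    return [overlap, cosine(s, q)]

def kmeans(points: List[List[float]], k: int,
           n_iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain K-Means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centroids

# Toy inputs (invented for illustration only).
toy_query = "solar panel efficiency"
toy_sents = [
    "Solar panel efficiency depends on temperature.",
    "The committee met on Tuesday.",
    "Panel efficiency drops in hot weather.",
    "Lunch was served at noon.",
]
vectors = [features(s, toy_query) for s in toy_sents]
labels, centroids = kmeans(vectors, k=2)
# Assumption: the cluster whose centroid has the larger feature sum is "relevant".
relevant_cluster = max(range(2), key=lambda c: sum(centroids[c]))
relevant = [s for s, l in zip(toy_sents, labels) if l == relevant_cluster]
```

An EM-based variant would replace the hard assignment step with soft cluster memberships, for example under a Gaussian mixture model.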


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52