Topic
Multi-document summarization
About: Multi-document summarization is a research topic. Over its lifetime, 2270 publications have been published on this topic, receiving 71850 citations.
Papers
•
01 May 2010 TL;DR: A new algorithm for automatic summarization of specialized texts that combines terminological and semantic resources (a term extractor and an ontology); it obtains fairly good results, although there appears to be room for improvement.
Abstract: This paper presents a new algorithm for automatic summarization of specialized texts combining terminological and semantic resources: a term extractor and an ontology. The term extractor provides the list of terms present in the text together with their corresponding termhood. The ontology is used to calculate the semantic similarity between the terms found in the main body and those present in the document title. The general idea is to obtain a relevance score for each sentence, taking into account both the termhood of the terms found in that sentence and the similarity between those terms and the terms present in the title of the document. The sentences with the highest scores are chosen to form the final summary. We evaluate the algorithm with ROUGE, comparing the resulting summaries with those of other summarizers. The sentence selection algorithm was also tested as part of a standalone summarizer. In both cases it obtains fairly good results, although there appears to be room for improvement.
8 citations
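The sentence-scoring idea in the abstract above (termhood of a sentence's terms combined with their similarity to the title terms) can be illustrated with a minimal sketch. The abstract does not give the combination formula, so the weighted average and the `alpha` parameter here are assumptions, and `toy_similarity` is a hypothetical exact-match stand-in for the ontology-based similarity; the function names are illustrative and not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for ontology-based similarity: exact match only."""
    return 1.0 if a == b else 0.0


def sentence_score(
    terms: Sequence[str],
    termhood: Dict[str, float],
    title_terms: Sequence[str],
    similarity: Callable[[str, str], float] = toy_similarity,
    alpha: float = 0.5,
) -> float:
    """Assumed formula: weighted mix of mean termhood and mean best title similarity."""
    if not terms:
        return 0.0
    mean_termhood = sum(termhood.get(t, 0.0) for t in terms) / len(terms)
    if title_terms:
        # For each term, keep its best similarity to any title term.
        mean_title_sim = sum(
            max(similarity(t, tt) for tt in title_terms) for t in terms
        ) / len(terms)
    else:
        mean_title_sim = 0.0
    return alpha * mean_termhood + (1.0 - alpha) * mean_title_sim


def select_sentences(
    sentences: Sequence[Sequence[str]],
    k: int,
    termhood: Dict[str, float],
    title_terms: Sequence[str],
    similarity: Callable[[str, str], float] = toy_similarity,
    alpha: float = 0.5,
) -> List[int]:
    """Return indices of the k highest-scoring sentences, in document order."""
    scores = [
        sentence_score(s, termhood, title_terms, similarity, alpha)
        for s in sentences
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

Each sentence is represented as the list of terms the extractor found in it. A real system would replace `toy_similarity` with a measure computed over the ontology.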
••
29 Oct 2012 TL;DR: A novel approach is proposed that integrates query-oriented relevance, information richness and novelty by treating them as sentence features, so that the generated summary reflects the combined effect of these properties.
Abstract: Query-oriented relevance, information richness and novelty are important requirements in query-focused summarization, which, to a considerable extent, determine the summary quality. Previous work either rarely took all of these demands into account simultaneously or dealt with only some of them in the dynamic process of choosing sentences to generate a summary. In this paper, we propose a novel approach that integrates all these requirements by treating them as sentence features, so that the generated summary reflects the combined effect of these properties. Experimental results on the DUC2005 and DUC2006 datasets demonstrate the effectiveness of our approach.
8 citations
•
IBM
TL;DR: A method, computer system, and computer program product for generating a multi-document summary from a query statement and one or more documents.
Abstract: A method, computer system, and computer program product for generating a multi-document summary is provided. The embodiment may include receiving a query statement, one or more documents, one or more summary constraints, and quality goals. The embodiment may include identifying one or more keywords within the query statement. The embodiment may include performing a sentence selection from the one or more documents based on the one or more identified keywords. The embodiment may include generating a plurality of candidate summaries of the one or more documents based on the performed sentence selection, the goals, and a cross entropy method. The embodiment may include calculating a quality score for each of the plurality of generated candidate summaries using a plurality of quality features. The embodiment may include selecting a candidate summary from the plurality of generated candidate summaries with the highest calculated quality score that also satisfies a quality score threshold.
8 citations
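The candidate-generation step in the abstract above relies on a cross entropy method. The following is a minimal sketch of the generic cross-entropy method applied to sentence subset selection, not the patented procedure: the per-sentence inclusion probabilities, the parameters (`n_samples`, `elite_frac`, `smoothing`, `seed`) and the keyword-coverage quality function are all assumptions introduced for illustration.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple


def make_coverage_quality(
    sentence_keywords: Sequence[Set[str]],
) -> Callable[[Tuple[int, ...]], float]:
    """Hypothetical quality feature: number of distinct query keywords covered."""
    def quality(chosen: Tuple[int, ...]) -> float:
        covered: Set[str] = set()
        for i in chosen:
            covered |= sentence_keywords[i]
        return float(len(covered))
    return quality


def cross_entropy_select(
    lengths: Sequence[int],
    max_len: int,
    quality: Callable[[Tuple[int, ...]], float],
    n_samples: int = 50,
    elite_frac: float = 0.2,
    n_iters: int = 20,
    smoothing: float = 0.7,
    seed: int = 0,
) -> Tuple[Tuple[int, ...], float]:
    """Search for a sentence subset within max_len that maximizes quality."""
    rng = random.Random(seed)
    n = len(lengths)
    probs: List[float] = [0.5] * n  # inclusion probability per sentence
    best: Tuple[int, ...] = ()
    best_score = float("-inf")
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        samples = []
        for _ in range(n_samples):
            chosen = tuple(i for i in range(n) if rng.random() < probs[i])
            # Discard candidates that violate the length constraint.
            if sum(lengths[i] for i in chosen) <= max_len:
                samples.append((quality(chosen), chosen))
        if not samples:
            continue
        samples.sort(key=lambda s: s[0], reverse=True)
        elite = samples[:n_elite]
        if elite[0][0] > best_score:
            best_score, best = elite[0]
        # Move each probability toward its frequency among the elite samples.
        for i in range(n):
            freq = sum(1 for _, c in elite if i in c) / len(elite)
            probs[i] = smoothing * freq + (1.0 - smoothing) * probs[i]
    return best, best_score
```

The returned score could then be compared against a quality score threshold, as the abstract describes, before the candidate is accepted as the final summary.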
••
22 Sep 2010 TL;DR: A novel approach for summarizing documents retrieved from the Internet is proposed that captures the semantic nature of a document, expressed in natural language, in order to extract a number of RDF triples and cluster them, aggregating similar information.
Abstract: Document summarization techniques automatically extract relevant information from different sources with respect to a list of topics. They can be used profitably by a variety of applications, in particular for automatic indexing and categorization, in order to facilitate the production and delivery of new multimedia content. In this paper we propose a novel approach for summarizing documents retrieved from the Internet: we capture the semantic nature of a document, expressed in natural language, in order to extract a number of RDF triples and cluster them, aggregating similar information. An overview of the system and some preliminary results are described.
8 citations
•
17 Nov 2008 TL;DR: This work extracts several types of features for each sentence in the document collection to measure its relevance to the user query, and experiments with two well-known unsupervised statistical machine learning techniques, the K-Means and EM algorithms, evaluating their performance.
Abstract: When a user is served a ranked list of relevant documents by a standard document search engine, his search task is usually not over. He has to go through the entire contents of each document to judge its relevance and to find the precise piece of information he was looking for. Query-relevant summarization tries to remove this onus from the end user by providing more condensed and direct access to relevant information. Query-relevant summarization is the task of synthesizing a fluent, well-organized summary of the document collection that answers the user's questions. We extracted several features of different types (i.e. lexical, lexical semantic, statistical and cosine similarity) for each of the sentences in the document collection in order to measure its relevance to the user query. We experimented with two well-known unsupervised statistical machine learning techniques, the K-Means and EM algorithms, and evaluated their performance. For each of these methods of generating summaries, we show the effect of the different kinds of features.
8 citations
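The K-Means step mentioned in the abstract above can be sketched on per-sentence feature vectors as follows. This is a plain Lloyd's-algorithm illustration under stated assumptions: the feature vectors are made up, the centroids are initialized from the first `k` points for determinism, and the abstract does not say how clusters are mapped to query-relevant and non-relevant sentences, so that step is left out.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: Sequence[Sequence[float]],
    k: int,
    n_iters: int = 100,
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical two-feature vectors, e.g. (lexical overlap, cosine similarity).
features = [(0.9, 0.8), (0.1, 0.2), (0.8, 0.9), (0.2, 0.1)]
labels, centroids = kmeans(features, k=2)
```

With these toy vectors, the first and third sentences end up in one cluster and the second and fourth in the other. An EM-based variant would replace the hard assignments with soft cluster memberships.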