Showing papers on "Multi-document summarization published in 1997"


01 Jan 1997
TL;DR: Empirical results on the identification of strong chains and of significant sentences are presented in this paper, and plans to address short-comings are briefly presented.
Abstract: We investigate one technique to produce a summary of an original text without requiring its full semantic interpretation, but instead relying on a model of the topic progression in the text derived from lexical chains. We present a new algorithm to compute lexical chains in a text, merging several robust knowledge sources: the WordNet thesaurus, a part-of-speech tagger, shallow parser for the identification of nominal groups, and a segmentation algorithm. Summarization proceeds in four steps: the original text is segmented, lexical chains are constructed, strong chains are identified and significant sentences are extracted. We present in this paper empirical results on the identification of strong chains and of significant sentences. Preliminary results indicate that quality indicative summaries are produced. Pending problems are identified. Plans to address these short-comings are briefly presented.

1,047 citations
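
The chain-building, strong-chain and extraction steps named in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the `TOY_CONCEPTS` table is a hypothetical stand-in for a WordNet lookup, the "above the mean length" test for a strong chain is an assumed criterion, and the segmentation step is omitted.

```python
from collections import defaultdict

# Hypothetical stand-in for a thesaurus lookup (word -> concept label).
TOY_CONCEPTS = {
    "car": "vehicle", "truck": "vehicle", "engine": "vehicle",
    "rain": "weather", "storm": "weather",
}


def build_chains(sentences):
    """Group word occurrences by concept; a chain is a list of (sentence index, word)."""
    chains = defaultdict(list)
    for i, sentence in enumerate(sentences):
        for word in sentence.lower().replace(".", "").split():
            concept = TOY_CONCEPTS.get(word)
            if concept is not None:
                chains[concept].append((i, word))
    return dict(chains)


def strong_chains(chains):
    """Keep chains longer than the mean chain length (an assumed criterion)."""
    if not chains:
        return {}
    mean_len = sum(len(members) for members in chains.values()) / len(chains)
    return {c: members for c, members in chains.items() if len(members) > mean_len}


def extract_sentences(sentences, strong):
    """For every strong chain, pick the first sentence in which it appears."""
    picked = sorted({members[0][0] for members in strong.values()})
    return [sentences[i] for i in picked]


def summarize(sentences):
    return extract_sentences(sentences, strong_chains(build_chains(sentences)))
```

With the three sentences "The car had a new engine.", "Rain fell all day." and "The truck followed the car.", the vehicle chain has four members and the weather chain one, so only the vehicle chain counts as strong and the first sentence is extracted.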


Journal Article
01 Mar 1997
TL;DR: This study applies the ideas from the automatic link generation research to attack another important problem in text processing—automatic text summarization, and generates intra-document links between passages of a document.
Abstract: In recent years, information retrieval techniques have been used for automatic generation of semantic hypertext links. This study applies the ideas from the automatic link generation research to attack another important problem in text processing—automatic text summarization. An automatic “general purpose” text summarization tool would be of immense utility in this age of information overload. Using the techniques used (by most automatic hypertext link generation algorithms) for inter-document link generation, we generate intra-document links between passages of a document. Based on the intra-document linkage pattern of a text, we characterize the structure of the text. We apply the knowledge of text structure to do automatic text summarization by passage extraction. We evaluate a set of fifty summaries generated using our techniques by comparing them to paragraph extracts constructed by humans. The automatic summarization methods perform well, especially in view of the fact that the summaries generated by two humans for the same article are surprisingly dissimilar.

525 citations
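
One way to picture the intra-document linking idea in the abstract above: treat each passage as a bag of words, link two passages when they are similar enough, and extract the most heavily linked passages. The cosine measure, the `threshold` value and the "most links wins" selection rule are assumptions made for this sketch; the abstract does not specify them.

```python
import math
from collections import Counter


def vectorize(passage):
    """Bag-of-words term counts for one passage."""
    return Counter(passage.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def link_counts(passages, threshold=0.5):
    """Count, for each passage, how many other passages it is linked to."""
    vectors = [vectorize(p) for p in passages]
    counts = [0] * len(passages)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                counts[i] += 1
                counts[j] += 1
    return counts


def extract_passages(passages, k=1, threshold=0.5):
    """Return the k most-linked passages, kept in document order."""
    counts = link_counts(passages, threshold)
    ranked = sorted(range(len(passages)), key=lambda i: (-counts[i], i))
    return [passages[i] for i in sorted(ranked[:k])]
```

Keeping the selected passages in document order is a design choice that makes the extract read like a shortened version of the original rather than a ranked list.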


Proceedings Article
01 Jul 1997
TL;DR: The system’s architecture is described and details of some of its modules, many of them trained on large corpora of text, are provided.
Abstract: SUMMARIST is an attempt to create a robust automated text summarization system, based on the ‘equation’: summarization = topic identification + interpretation + generation. Each of these stages contains several independent modules, many of them trained on large corpora of text. We describe the system’s architecture and provide details of some of its modules.

484 citations


Journal Article
01 Dec 1997 - Science
TL;DR: In this paper, approaches are outlined for manipulating and accessing texts in arbitrary subject areas in accordance with user needs, and methods are given for determining text themes, traversing texts selectively, and extracting summary statements that reflect text content.
Abstract: Vast amounts of text material are now available in machine-readable form for automatic processing. Here, approaches are outlined for manipulating and accessing texts in arbitrary subject areas in accordance with user needs. In particular, methods are given for determining text themes, traversing texts selectively, and extracting summary statements that reflect text content.

326 citations


Proceedings Article
31 Mar 1997
TL;DR: The automated training and evaluation of an Optimal Position Policy is described, a method of locating the likely positions of topic-bearing sentences based on genre-specific regularities of discourse structure that can be used in applications such as information retrieval, routing, and text summarization.
Abstract: This paper addresses the problem of identifying likely topics of texts by their position in the text. It describes the automated training and evaluation of an Optimal Position Policy, a method of locating the likely positions of topic-bearing sentences based on genre-specific regularities of discourse structure. This method can be used in applications such as information retrieval, routing, and text summarization.

312 citations
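
The position-policy idea in the abstract above can be illustrated with a small sketch: from training documents whose topic-bearing sentences are marked, estimate how often each sentence position carries the topic, then rank positions by that yield. Indexing positions by plain sentence number and the boolean label format are simplifying assumptions for this example, not details taken from the paper.

```python
from collections import defaultdict


def train_position_policy(labeled_docs):
    """Rank sentence positions by how often they are topic-bearing.

    labeled_docs: list of documents, each a list of booleans where
    labels[i] says whether sentence i is topic-bearing (hypothetical format).
    """
    hits = defaultdict(int)
    seen = defaultdict(int)
    for labels in labeled_docs:
        for pos, is_topic in enumerate(labels):
            seen[pos] += 1
            hits[pos] += int(is_topic)
    yields = {pos: hits[pos] / seen[pos] for pos in seen}
    # Highest yield first; ties broken by earlier position.
    return sorted(yields, key=lambda pos: (-yields[pos], pos))


def apply_policy(sentences, policy, k=2):
    """Select the k best-ranked positions that exist in this document."""
    chosen = [pos for pos in policy if pos < len(sentences)][:k]
    return [sentences[pos] for pos in sorted(chosen)]
```

Because the policy is learned from labeled examples, a different genre would simply need its own training set to produce its own ranking.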


Proceedings Article
27 Jul 1997
TL;DR: In this article, the authors describe a method for summarizing similarities and differences in a pair of related documents using a graph representation for text, where concepts denoted by words, phrases, and proper names in the document are represented positionally as nodes in the graph along with edges corresponding to semantic relations between items.
Abstract: We describe a new method for summarizing similarities and differences in a pair of related documents using a graph representation for text. Concepts denoted by words, phrases, and proper names in the document are represented positionally as nodes in the graph along with edges corresponding to semantic relations between items. Given a perspective in terms of which the pair of documents is to be summarized, the algorithm first uses a spreading activation technique to discover, in each document, nodes semantically related to the topic. The activated graphs of each document are then matched to yield a graph corresponding to similarities and differences between the pair, which is rendered in natural language. An evaluation of these techniques has been carried out.

247 citations
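
A rough sketch of the spreading-activation and matching steps described in the abstract above. The graph encoding (a dict of weighted neighbors), the `decay` and `threshold` parameters and the set-based comparison are assumptions for illustration; edge weights are assumed to lie in [0, 1], and the natural-language rendering step is left out.

```python
def spread_activation(graph, seeds, decay=0.5, threshold=0.2):
    """Propagate activation from seed nodes along weighted edges.

    graph: {node: {neighbor: weight}} with weights assumed to be in [0, 1].
    Returns {node: activation} for every node reached above the threshold.
    """
    activation = {node: 1.0 for node in seeds}
    frontier = list(seeds)
    while frontier:
        node = frontier.pop()
        for neighbor, weight in graph.get(node, {}).items():
            value = activation[node] * weight * decay
            # Only update when the new value is strong enough and an improvement.
            if value >= threshold and value > activation.get(neighbor, 0.0):
                activation[neighbor] = value
                frontier.append(neighbor)
    return activation


def compare_documents(graph_a, graph_b, topic):
    """Match the activated nodes of two documents for a given topic."""
    act_a = set(spread_activation(graph_a, topic))
    act_b = set(spread_activation(graph_b, topic))
    return {
        "common": act_a & act_b,
        "only_a": act_a - act_b,
        "only_b": act_b - act_a,
    }
```

The three sets returned by `compare_documents` correspond to the similarities and the two directions of difference that a later generation step would put into words.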


Journal Article
TL;DR: The approach described here exploits the results of recent progress in information extraction to represent salient units of text and the meaningful relations between them, based on an analysis of text cohesion and the context in which the comparison is desired.
Abstract: This summarization approach exploits the results of recent progress in information extraction to represent salient units of text and their relationships. By exploiting meaningful relations between units and the perspective from which the comparison is desired, the summarizer can pinpoint similarities and differences, generate composite summaries, and align text segments. These techniques have also been evaluated.

237 citations


25 Jun 1997
TL;DR: This paper describes a system that constructs a summary by extracting sentences likely to represent the main theme of a document, using a probabilistic model that takes into account lexical and statistical information obtained from a document corpus.
Abstract: This paper describes a system that constructs a summary by extracting sentences that are likely to represent the main theme of a document. As a way of selecting summary sentences, the system uses a probabilistic model that takes into account lexical and statistical information obtained from a document corpus. As such, the system consists of two parts: the training part and the summarization part. The former processes sentences that have been manually tagged for summary sentences and extracts necessary statistical information of various kinds, and the latter uses the information to calculate the likelihood of each sentence to become part of a summary. There are at least three unique aspects of this research. First of all, the system uses a text model to identify different components of a text and eliminates parts of text that are not likely to contain summary sentences. Second, although the probabilistic model stems from an existing model developed for English texts, it applies the model to compute multiple probability values based on several features, and computes the final value by combining pieces of evidence from different sources (features) with the Dempster-Shafer theory. Finally, the system is the first of this kind for Korean texts.

12 citations
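
The abstract above says evidence from several features is combined with the Dempster-Shafer theory. As a generic illustration of Dempster's rule of combination (not the paper's actual features or mass assignments, which the abstract does not give), two mass functions over the frame {summary, other} can be combined as follows. The feature names and numbers in the usage note are hypothetical.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to its mass;
    the masses in each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + x * y
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


SUMMARY = frozenset({"summary"})
OTHER = frozenset({"other"})
EITHER = SUMMARY | OTHER  # mass left uncommitted
```

For example, if a hypothetical position feature gives `{SUMMARY: 0.6, EITHER: 0.4}` and a hypothetical cue-word feature gives `{SUMMARY: 0.5, OTHER: 0.2, EITHER: 0.3}`, the conflicting mass is 0.12 and the combined belief that the sentence belongs in the summary is 0.68 / 0.88, about 0.77.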


Posted Content
TL;DR: This paper describes a new method for summarizing similarities and differences in a pair of related documents using a graph representation for text, in which a spreading activation technique discovers nodes semantically related to the topic.
Abstract: We describe a new method for summarizing similarities and differences in a pair of related documents using a graph representation for text. Concepts denoted by words, phrases, and proper names in the document are represented positionally as nodes in the graph along with edges corresponding to semantic relations between items. Given a perspective in terms of which the pair of documents is to be summarized, the algorithm first uses a spreading activation technique to discover, in each document, nodes semantically related to the topic. The activated graphs of each document are then matched to yield a graph corresponding to similarities and differences between the pair, which is rendered in natural language. An evaluation of these techniques has been carried out.

9 citations


Proceedings Article
01 Jan 1997
TL;DR: The information to include in a summary varies depending on the author's intention and the use of the summary, and the appropriate goals of the extracting process should be set and a guide should be outlined that instructs the system how to meet the tasks.
Abstract: The information to include in a summary varies depending on the author's intention and the use of the summary. To create the best summaries, the appropriate goals of the extracting process should be set and a guide should be outlined that instructs the system how to meet the tasks. The approach described in this report is intended to be a basic architecture to extract a set of concise sentences that are indicated or predicted by goals and contexts. To evaluate a sentence, the sentence selection algorithm simply measures the informativeness of each sentence by comparing with the determined goals, and the algorithm extracts a set of the highest scored sentences by repeat application of this com-

5 citations