
Showing papers on "Multi-document summarization" published in 1998


Journal ArticleDOI
01 Aug 1998
TL;DR: A method for combining query-relevance with information-novelty in the context of text retrieval and summarization and preliminary results indicate some benefits for MMR diversity ranking in document retrieval and in single document summarization.
Abstract: This paper presents a method for combining query-relevance with information-novelty in the context of text retrieval and summarization. The Maximal Marginal Relevance (MMR) criterion strives to reduce redundancy while maintaining query relevance in re-ranking retrieved documents and in selecting appropriate passages for text summarization. Preliminary results indicate some benefits for MMR diversity ranking in document retrieval and in single document summarization. The latter are borne out by the recent results of the SUMMAC conference in the evaluation of summarization systems. However, the clearest advantage is demonstrated in constructing non-redundant multi-document summaries, where MMR results are clearly superior to non-MMR passage selection.

2,365 citations
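
The MMR criterion in the abstract above can be illustrated with a small greedy selection loop. The following is a minimal sketch, not the authors' implementation: it assumes whitespace tokenization and bag-of-words cosine similarity for both the query-relevance and the redundancy term, and the names `cosine`, `mmr_select`, and the weight `lam` are chosen here for illustration. Each step picks the candidate that maximizes `lam * relevance - (1 - lam) * redundancy`, where redundancy is the highest similarity to any passage already selected.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query: str, passages: list[str], k: int, lam: float = 0.7) -> list[int]:
    """Greedily pick up to k passage indices, trading relevance against redundancy."""
    q = Counter(query.lower().split())
    vecs = [Counter(p.lower().split()) for p in passages]
    selected: list[int] = []
    candidates = list(range(len(passages)))

    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(vecs[i], q)
            # Redundancy: similarity to the closest already-selected passage.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam = 1.0` the loop reduces to plain relevance ranking; lowering `lam` pushes near-duplicate passages down the list, which is the behaviour the abstract credits for non-redundant multi-document summaries.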


Patent
Yoshio Nakao
16 Jan 1998
TL;DR: A focused information relevant portion extraction unit extracts a portion related to two types of focused information in a document to be summarized, i.e., user-focused information as information focused on by a user who uses a summary, and author-focused information as information emphasized by an author of the document.
Abstract: A document summarization apparatus or method summarizes an electronic document written in a natural language, and generates an appropriate summary depending on user's focus and user's knowledge. The document summarization apparatus according to the present invention includes, for example, a focused information relevant portion extraction unit, a summary readability improvement unit, and a summary generation unit. The focused information relevant portion extraction unit extracts a portion related to two types of focused information in a document to be summarized based on the two types of focused information, that is, user-focused information as information focused on by a user who uses a summary, and author-focused information as information emphasized by an author of the document to be summarized. In the document to be summarized, the summary readability improvement unit distinguishes user known information already known to a user, and information known through an access log regarded as already known to a user based on a document previously presented to the user when a summary is generated, from other information than these two types of information, and selects an important portion in the document to be summarized. The summary generation unit generates the summary of the document to be summarized based on the selection result of the summary readability improvement unit. Thus, a summary that includes both user-focused information and author-focused information can be generated depending on the knowledge level of a user.

378 citations


Proceedings ArticleDOI
01 Nov 1998
TL;DR: A new approach to extracting information from unstructured documents based on an application ontology that describes a domain of interest is presented, which attained recall ratios in the 80% and 90% range and precision ratios near 98%.
Abstract: We present a new approach to extracting information from unstructured documents based on an application ontology that describes a domain of interest. Starting with such an ontology, we formulate rules to extract constants and context keywords from unstructured documents. For each unstructured document of interest, we extract its constants and keywords and apply a recognizer to organize extracted constants as attribute values of tuples in a generated database schema. To make our approach general, we fix all the processes and change only the ontological description for a different application domain. In experiments we conducted on two different types of unstructured documents taken from the Web, our approach attained recall ratios in the 80% and 90% range and precision ratios near 98%.

199 citations


Proceedings Article
01 Jul 1998
TL;DR: This paper used machine learning on a training corpus of documents and their abstracts to discover salience functions which describe what combination of features is optimal for a given summarization task, which addresses both "generic" and user-focused summaries.
Abstract: A key problem in text summarization is finding a salience function which determines what information in the source should be included in the summary. This paper describes the use of machine learning on a training corpus of documents and their abstracts to discover salience functions which describe what combination of features is optimal for a given summarization task. The method addresses both "generic" and user-focused summaries.

121 citations


Proceedings ArticleDOI
Mark Wasson
10 Aug 1998
TL;DR: Leading text extracts created to support some online Boolean retrieval goals are evaluated for their acceptability as news document summaries.
Abstract: Leading text extracts created to support some online Boolean retrieval goals are evaluated for their acceptability as news document summaries. Results are presented and discussed from the perspective of commercial summarization technology needs.

82 citations



Proceedings ArticleDOI
10 Aug 1998
TL;DR: A trainable and scalable summarization system which utilizes features derived from information retrieval, information extraction, and NLP techniques and on-line resources; system scalability is demonstrated by reporting results on the best combination of summarization features for different document sources.
Abstract: We describe a trainable and scalable summarization system which utilizes features derived from information retrieval, information extraction, and NLP techniques and on-line resources. The system combines these features using a trainable feature combiner learned from summary examples through a machine learning algorithm. We demonstrate system scalability by reporting results on the best combination of summarization features for different document sources. We also present preliminary results from a task-based evaluation on summarization output usability.

68 citations


Proceedings ArticleDOI
10 Aug 1998
TL;DR: The main features are a domain/style-free algorithm and personalization on summarization which reflects readers' interests and preferences and the proposed method is flexible enough to dynamically generate summaries of various sizes.
Abstract: The GDA (Global Document Annotation) project proposes a tag set which allows machines to automatically infer the underlying semantic/pragmatic structure of documents. Its objectives are to promote development and spread of NLP/AI applications to render GDA-tagged documents versatile and intelligent contents, which should motivate WWW (World Wide Web) users to tag their documents as part of content authoring. This paper discusses automatic text summarization based on GDA. Its main features are a domain/style-free algorithm and personalization on summarization which reflects readers' interests and preferences. In order to calculate the importance score of a text element, the algorithm uses spreading activation on an intradocument network which connects text elements via thematic, rhetorical, and coreferential relations. The proposed method is flexible enough to dynamically generate summaries of various sizes. A summary browser supporting personalization is reported as well.

44 citations
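
The abstract above scores text elements by spreading activation over a network of thematic, rhetorical, and coreferential links. The sketch below is a generic illustration of that idea only; it is not the GDA algorithm, whose tag set and weighting are not given here. It assumes undirected weighted links, a fixed external activation for "seed" elements (for example, elements matching a reader's interests), and a `decay` factor; the names `spread_activation` and `top_elements` are hypothetical.

```python
from collections import defaultdict


def spread_activation(links, seeds, decay=0.5, iterations=30):
    """Propagate activation over an undirected weighted network.

    links: iterable of (element_a, element_b, weight) relations.
    seeds: dict mapping element -> external activation injected each round.
    """
    neighbours = defaultdict(list)
    for a, b, w in links:
        neighbours[a].append((b, w))
        neighbours[b].append((a, w))

    nodes = set(neighbours) | set(seeds)
    activation = {n: seeds.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        # Each element keeps its external input and receives a decayed,
        # weighted share of its neighbours' current activation.
        activation = {
            n: seeds.get(n, 0.0)
            + decay * sum(w * activation[m] for m, w in neighbours[n])
            for n in nodes
        }
    return activation


def top_elements(activation, k):
    """Return the k highest-scoring elements, most important first."""
    return sorted(activation, key=activation.get, reverse=True)[:k]
```

Varying `k` gives summaries of different sizes from one scoring pass, and changing `seeds` is one simple way to model personalization. The iteration only settles when `decay` and the link weights are small enough; a larger network may need normalized weights.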


Proceedings ArticleDOI
28 Dec 1998
TL;DR: A hierarchical key-frame summarization algorithm that generates a coarse-to-fine key-frame summary, which facilitates multi-level browsing: the user can quickly discover the content of the video by accessing its coarsest but most compact summary and then view a desired segment of the video with increasingly more detail.
Abstract: We address the problem of key-frame summarization of video in the absence of any a priori information about its content. This is a common problem that is encountered in home videos. We propose a hierarchical key-frame summarization algorithm where a coarse-to-fine key-frame summary is generated. A hierarchical key-frame summary facilitates multi-level browsing where the user can quickly discover the content of the video by accessing its coarsest but most compact summary and then view a desired segment of the video with increasingly more detail. At the finest level, the summary is generated on the basis of color features of video frames, using an extension of a recently proposed key-frame extraction algorithm. The finest level key-frames are recursively clustered using a novel pairwise K-means clustering approach with temporal consecutiveness constraint. We also address summarization of MPEG-2 compressed video without fully decoding the bitstream. We also propose efficient mechanisms that facilitate decoding the video when the hierarchical summary is utilized in browsing and playback of video segments starting at selected key-frames. © (1998) COPYRIGHT SPIE--The International Society for Optical Engineering.

42 citations
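
To make the coarse-to-fine idea in the abstract above concrete, here is a minimal sketch of building a key-frame hierarchy under a temporal consecutiveness constraint. It deliberately swaps in a simpler technique than the paper's pairwise K-means: at each level it greedily merges the two temporally adjacent clusters whose mean feature vectors are closest, until the number of clusters is roughly halved. The feature vectors stand in for per-frame color features, and all function names are hypothetical.

```python
def _dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(features, members):
    """Mean feature vector of the key-frames listed in members."""
    dim = len(features[0])
    return [sum(features[i][d] for i in members) / len(members) for d in range(dim)]


def coarsen(features, clusters, target):
    """Merge adjacent clusters until only `target` remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > target:
        cents = [_centroid(features, c) for c in clusters]
        # Only neighbours in time may merge, so clusters stay contiguous.
        i = min(range(len(clusters) - 1), key=lambda j: _dist(cents[j], cents[j + 1]))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


def build_hierarchy(features):
    """Return levels from finest (one cluster per key-frame) to coarsest."""
    level = [[i] for i in range(len(features))]
    levels = [level]
    while len(level) > 1:
        level = coarsen(features, level, (len(level) + 1) // 2)
        levels.append(level)
    return levels


def representative(features, members):
    """Pick the member key-frame closest to the cluster mean."""
    c = _centroid(features, members)
    return min(members, key=lambda i: _dist(features[i], c))
```

A browser could show one `representative` per cluster at the coarsest level and descend to finer levels for the segment the user selects.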


Proceedings ArticleDOI
13 Oct 1998
TL;DR: A method for combining query-relevance with information-novelty in the context of text retrieval and summarization, where the clearest advantage is demonstrated in the automated construction of large document and non-redundant multi-document summaries, where MMR results are clearly superior to non-MMR passage selection.
Abstract: This paper develops a method for combining query-relevance with information-novelty in the context of text retrieval and summarization. The Maximal Marginal Relevance (MMR) criterion strives to reduce redundancy while maintaining query relevance in reranking retrieved documents and in selecting appropriate passages for text summarization. Preliminary results indicate some benefits for MMR diversity ranking in ad-hoc query and in single document summarization. The latter are borne out by the trial-run (unofficial) TREC-style evaluation of summarization systems. However, the clearest advantage is demonstrated in the automated construction of large document and non-redundant multi-document summaries, where MMR results are clearly superior to non-MMR passage selection. This paper also discusses our preliminary evaluation of summarization methods for single documents.

36 citations




Proceedings ArticleDOI
10 Aug 1998
TL;DR: The role of automated document summarization in building effective search statements is investigated, and the results of the latest evaluation of the system at the annual Text Retrieval Conference (TREC) are discussed.
Abstract: We discuss a semi-interactive approach to information retrieval which consists of two tasks performed in sequence. First, the system assists the searcher in building a comprehensive statement of information need, using automatically generated topical summaries of sample documents. Second, the detailed statement of information need is automatically processed by a series of natural language processing routines in order to derive an optimal search query for a statistical information retrieval system. In this paper, we investigate the role of automated document summarization in building effective search statements. We also discuss the results of the latest evaluation of our system at the annual Text Retrieval Conference (TREC).

01 Jan 1998
TL;DR: The research described here focuses on multi-lingual summarization (MLS), where summaries of documents are produced in their original language; corresponding summaries in English will eventually be generated.
Abstract: The research described here focuses on multi-lingual summarization (MLS). Summaries of documents are produced in their original language; corresponding summaries in English will eventually be generated. The source languages supported are Spanish, Japanese, English and Russian.

Proceedings ArticleDOI
13 Oct 1998
TL;DR: The approach is sentence selection, but includes techniques to improve coherence and also to perform sentence reduction to support robust automatic summarization.
Abstract: We discuss those techniques which, in the opinion of the authors, are needed to support robust automatic summarization. Many of these methods are already incorporated in a multi-lingual summarization system, MINDS, developed at CRL. The approach is sentence selection, but includes techniques to improve coherence and also to perform sentence reduction. Our methods are in distinct contrast to those approaches to summarization by deep analysis of a document followed by text generation.

01 Jan 1998
TL;DR: It is argued that for summarization to be successful, metadata, a manipulable representation of the content of figures, needs to be generated or included initially.
Abstract: When documents include graphics such as diagrams, photos, and data plots, the graphics may also require summarization. This paper discusses essential differences in informational content and rhetorical structure between text and graphics, as well as their interplay. The three approaches to graphics summarization discussed are: Selection, in which a subset of figures is chosen; Merging, in which information in multiple figures is merged into one; and Distillation, in which a single diagram is reduced to a simpler form. These procedures have to consider the content and relations of the graphical elements within figures, the relations among a collection of figures, and the figure captions and discussions of figure content in the running text. We argue that for summarization to be successful, metadata, a manipulable representation of the content of figures, needs to be generated or included initially. Often, the textual references to figures are not very informative, so it will be necessary to generate metadata by diagram parsing, as we have done, or to develop intelligent authoring systems that will allow the author to easily include metadata. This paper introduces this new area of research with manual summarization examples and follows them with a discussion of automated techniques under development. For example, here is how two data graphs might be merged:

Proceedings ArticleDOI
13 Oct 1998
TL;DR: This research is developing a procedure to evaluate the summaries the authors create and hopes to uncover useful metrics and evaluation variables that can be used by others working in this area.
Abstract: Our Tipster Phase III research objective for the Summarization task is to produce a single summary across multiple documents returned from a search on an information retrieval system. An established set of metrics to evaluate the performance of our system is not available in this field at present, so this research is also developing a procedure to evaluate the summaries we create. We hope to uncover useful metrics and evaluation variables that can be used by others working in this area.


Proceedings ArticleDOI
13 Oct 1998
TL;DR: The TIPSTER program sponsored seven research efforts into text summarization, all with different approaches to the problem, and there is considerable interest in automatically producing summaries due to the growth of the Internet and the World Wide Web.
Abstract: Automatic Text Summarization was added as a major research thrust of the TIPSTER program during TIPSTER Phase III, 1996-1998. It is a natural extension of the previously supported research efforts in Information Extraction (IE) and Information Retrieval (IR). There is considerable interest in automatically producing summaries due, in large part, to the growth of the Internet and the World Wide Web. The TIPSTER program sponsored seven research efforts into text summarization, all with different approaches to the problem.

01 Jan 1998
TL;DR: This research will test an evaluation method and metric to compare human assessments with machine output of newstext multiple document summaries and uncover useful metrics and evaluation variables that can be used by other research efforts in this area.
Abstract: This paper describes an ongoing research effort to produce multiple document summaries in response to information requests. Given the absence of tools to evaluate multiple document summaries, this research will test an evaluation method and metric to compare human assessments with machine output of newstext multiple document summaries. Using the DR-LINK information retrieval and analysis system, components of documents and metadata generated during document processing become candidates for use in multiple document summaries. This research is sponsored by the U.S. Government through the Tipster Phase III Text Summarization project. TextWise is a participant in the Tipster Phase III Text Summarization project funded by the U.S. Government. Our research objective is to produce high quality multiple document summaries. An established set of metrics to evaluate the performance of our production of multiple document summaries is not available at present. Therefore, this research effort is also concerned with developing a procedure to evaluate the summaries we create. We hope that we will uncover useful metrics and evaluation variables that can be used by other research efforts in this area. The lack of automatic summarization evaluation tools is directly connected to the need for a comprehensive description of the different types of summaries possible. Automatic text summarization can mean many different things. The summary may be addressing a need of an information seeker (query dependent summary) or it may be independent of any specified information need (generic summary). The summary may represent a single document (single document summary) or a group of documents (multiple document summary). The summary may be an extract of sentences or pieces of text from a document (extract summary) or it may not use any of the actual wording from the source documents (generated text summary). Finally, the summary may provide a general overview of document contents (indicative summary), or it may act as a substitute for the actual document (informative summary). This terminology will be used throughout this report in an attempt to clarify and define the various possible outcomes of automatic text summarization.

Proceedings ArticleDOI
13 Oct 1998
TL;DR: In Phase III, the GE team focused on accurate context indexing of text documents, generation of effective search queries, extended statistical retrieval with constraints, and document abstracting and summarization.
Abstract: In Phase III, the GE team focused on accurate context indexing of text documents, generation of effective search queries, extended statistical retrieval with constraints, and document abstracting and summarization.