Showing papers on "Multi-document summarization" published in 1998

Open Access

Journal Article•DOI•

The use of MMR, diversity-based reranking for reordering documents and producing summaries

Jaime Carbonell¹, Jade Goldstein¹•Institutions (1)

01 Aug 1998

TL;DR: A method for combining query-relevance with information-novelty in the context of text retrieval and summarization; preliminary results indicate some benefits for MMR diversity ranking in document retrieval and in single document summarization.

Abstract: This paper presents a method for combining query-relevance with information-novelty in the context of text retrieval and summarization. The Maximal Marginal Relevance (MMR) criterion strives to reduce redundancy while maintaining query relevance in re-ranking retrieved documents and in selecting appropriate passages for text summarization. Preliminary results indicate some benefits for MMR diversity ranking in document retrieval and in single document summarization. The latter are borne out by the recent results of the SUMMAC conference in the evaluation of summarization systems. However, the clearest advantage is demonstrated in constructing non-redundant multi-document summaries, where MMR results are clearly superior to non-MMR passage selection.
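
The trade-off named in this abstract can be illustrated with a minimal sketch of greedy MMR selection. It assumes the usual formulation, in which each step picks the candidate maximizing lam * relevance-to-query minus (1 - lam) * maximum similarity to anything already selected. The bag-of-words cosine similarity, the whitespace tokenizer, and the names `cosine`, `mmr_select`, and `lam` are illustrative assumptions, not details taken from the listing.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, passages, k, lam=0.7):
    """Greedily pick up to k passage indices by marginal relevance.

    lam=1.0 ranks purely by query relevance; lower values penalize
    passages that resemble ones already selected. (Hypothetical helper.)
    """
    q = Counter(query.lower().split())
    vecs = [Counter(p.lower().split()) for p in passages]
    selected = []
    candidates = set(range(len(passages)))

    def score(i):
        relevance = cosine(vecs[i], q)
        # Redundancy is the similarity to the closest already-selected passage.
        redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        # Sorting makes ties resolve to the lowest index deterministically.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


passages = [
    "mmr reduces redundancy in summaries",
    "mmr reduces redundancy in summaries",
    "query relevance ranks retrieved documents",
]
picked = mmr_select("mmr redundancy relevance", passages, k=2, lam=0.5)
```

With `lam=0.5` the duplicate second passage is skipped in favor of the less relevant but novel third one; with `lam=1.0` the duplicate is selected, which is the non-MMR behavior the abstract contrasts against.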

2,365 citations

Patent•

Summarization apparatus and method

Yoshio Nakao¹•Institutions (1)

Fujitsu¹

16 Jan 1998

TL;DR: A focused information relevant portion extraction unit extracts a portion related to two types of focused information in a document to be summarized, i.e., user-focused information as information focused by a user who uses a summary and author-focused information as information emphasized by an author of the document.

Abstract: A document summarization apparatus or method summarizes an electronic document written in a natural language, and generates an appropriate summary depending on the user's focus and the user's knowledge. The document summarization apparatus according to the present invention includes, for example, a focused information relevant portion extraction unit, a summary readability improvement unit, and a summary generation unit. The focused information relevant portion extraction unit extracts a portion related to two types of focused information in a document to be summarized based on the two types of focused information, that is, user-focused information as information focused by a user who uses a summary, and author-focused information as information emphasized by an author of the document to be summarized. In the document to be summarized, the summary readability improvement unit distinguishes user known information already known to a user, and information known through an access log regarded as already known to a user based on a document previously presented to the user when a summary is generated, from information other than these two types of information, and selects an important portion in the document to be summarized. The summary generation unit generates the summary of the document to be summarized based on the selection result of the summary readability improvement unit. Thus, a summary that includes both user-focused information and author-focused information can be generated depending on the knowledge level of a user.

378 citations

Proceedings Article•DOI•

Ontology-based extraction and structuring of information from data-rich unstructured documents

David W. Embley¹, Douglas M. Campbell¹, Randy Smith¹, Stephen W. Liddle¹•Institutions (1)

Brigham Young University¹

01 Nov 1998

TL;DR: A new approach to extracting information from unstructured documents based on an application ontology that describes a domain of interest is presented, which attained recall ratios in the 80% and 90% range and precision ratios near 98%.

Abstract: We present a new approach to extracting information from unstructured documents based on an application ontology that describes a domain of interest. Starting with such an ontology, we formulate rules to extract constants and context keywords from unstructured documents. For each unstructured document of interest, we extract its constants and keywords and apply a recognizer to organize extracted constants as attribute values of tuples in a generated database schema. To make our approach general, we fix all the processes and change only the ontological description for a different application domain. In experiments we conducted on two different types of unstructured documents taken from the Web, our approach attained recall ratios in the 80% and 90% range and precision ratios near 98%.

199 citations

Proceedings Article•

Machine learning of generic and user-focused summarization

Inderjeet Mani¹, Eric Bloedorn¹•Institutions (1)

Mitre Corporation¹

01 Jul 1998

TL;DR: This paper uses machine learning on a training corpus of documents and their abstracts to discover salience functions which describe what combination of features is optimal for a given summarization task; the method addresses both "generic" and user-focused summaries.

Abstract: A key problem in text summarization is finding a salience function which determines what information in the source should be included in the summary. This paper describes the use of machine learning on a training corpus of documents and their abstracts to discover salience functions which describe what combination of features is optimal for a given summarization task. The method addresses both "generic" and user-focused summaries.

121 citations

Proceedings Article•DOI•

Using Leading Text for News Summaries: Evaluation Results and Implications for Commercial Summarization Applications

Mark Wasson¹•Institutions (1)

Elsevier¹

10 Aug 1998

TL;DR: Leading text extracts created to support some online Boolean retrieval goals are evaluated for their acceptability as news document summaries.

Abstract: Leading text extracts created to support some online Boolean retrieval goals are evaluated for their acceptability as news document summaries. Results are presented and discussed from the perspective of commercial summarization technology needs.

82 citations

Proceedings Article•DOI•

A hierarchical approach to the automatic categorization of medical documents

Luciano Soares de Lima¹, Alberto H. F. Laender¹, Berthier Ribeiro-Neto¹•Institutions (1)

Universidade Federal de Minas Gerais¹

01 Nov 1998

72 citations

Proceedings Article•DOI•

Trainable, Scalable Summarization Using Robust NLP and Machine Learning

Chinatsu Aone¹, Mary Ellen Okurowski², James Gorlinsky¹•Institutions (2)

SRA International¹, United States Department of Defense²

10 Aug 1998

TL;DR: A trainable and scalable summarization system which utilizes features derived from information retrieval, information extraction, and NLP techniques and on-line resources; system scalability is demonstrated by reporting results on the best combination of summarization features for different document sources.

Abstract: We describe a trainable and scalable summarization system which utilizes features derived from information retrieval, information extraction, and NLP techniques and on-line resources. The system combines these features using a trainable feature combiner learned from summary examples through a machine learning algorithm. We demonstrate system scalability by reporting results on the best combination of summarization features for different document sources. We also present preliminary results from a task-based evaluation on summarization output usability.

68 citations

Proceedings Article•DOI•

Automatic Text Summarization Based on the Global Document Annotation

Katashi Nagao, Koiti Hasida

10 Aug 1998

TL;DR: The main features are a domain/style-free algorithm and personalization on summarization which reflects readers' interests and preferences and the proposed method is flexible enough to dynamically generate summaries of various sizes.

Abstract: The GDA (Global Document Annotation) project proposes a tag set which allows machines to automatically infer the underlying semantic/pragmatic structure of documents. Its objectives are to promote development and spread of NLP/AI applications to render GDA-tagged documents versatile and intelligent contents, which should motivate WWW (World Wide Web) users to tag their documents as part of content authoring. This paper discusses automatic text summarization based on GDA. Its main features are a domain/style-free algorithm and personalization on summarization which reflects readers' interests and preferences. In order to calculate the importance score of a text element, the algorithm uses spreading activation on an intradocument network which connects text elements via thematic, rhetorical, and coreferential relations. The proposed method is flexible enough to dynamically generate summaries of various sizes. A summary browser supporting personalization is reported as well.
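
The scoring idea mentioned in this abstract, spreading activation over a network of text elements, can be sketched generically as follows. This is not the GDA algorithm itself: the graph encoding, the `decay` factor, the fixed number of `rounds`, and the function name `spread_activation` are all assumptions chosen for illustration.

```python
def spread_activation(links, seeds, decay=0.5, rounds=3):
    """Propagate activation from seed elements along weighted links.

    links: dict mapping a text element to a list of (neighbor, weight)
           pairs, e.g. thematic, rhetorical, or coreferential relations.
    seeds: dict mapping elements to their initial activation.
    Returns a dict of accumulated activation, usable as importance scores.
    (Hypothetical helper; parameters are illustrative assumptions.)
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(rounds):
        incoming = {}
        for node, energy in frontier.items():
            for neighbor, weight in links.get(node, []):
                # Each hop passes on a decayed, weighted share of the energy.
                incoming[neighbor] = incoming.get(neighbor, 0.0) + energy * weight * decay
        for node, energy in incoming.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = incoming
    return activation


links = {"s1": [("s2", 1.0)], "s2": [("s3", 0.5)]}
scores = spread_activation(links, {"s1": 1.0}, decay=0.5, rounds=2)
# A summary of a given size could then keep the highest-scoring elements.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Seeding the activation from elements that match a reader's interests would be one way to obtain the personalization the abstract describes, and truncating `ranked` at different lengths gives summaries of different sizes.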

44 citations

Proceedings Article•DOI•

Hierarchical video summarization

K. Ratakonda¹, M. Ibrahim Sezan, Regis J. Crinon•Institutions (1)

University of Illinois at Urbana–Champaign¹

28 Dec 1998

TL;DR: A hierarchical key-frame summarization algorithm in which a coarse-to-fine key-frame summary is generated, facilitating multi-level browsing where the user can quickly discover the content of the video by accessing its coarsest but most compact summary and then view a desired segment of the video with increasingly more detail.

Abstract: We address the problem of key-frame summarization of video in the absence of any a priori information about its content. This is a common problem that is encountered in home videos. We propose a hierarchical key-frame summarization algorithm where a coarse-to-fine key-frame summary is generated. A hierarchical key-frame summary facilitates multi-level browsing where the user can quickly discover the content of the video by accessing its coarsest but most compact summary and then view a desired segment of the video with increasingly more detail. At the finest level, the summary is generated on the basis of color features of video frames, using an extension of a recently proposed key-frame extraction algorithm. The finest level key-frames are recursively clustered using a novel pairwise K-means clustering approach with temporal consecutiveness constraint. We also address summarization of MPEG-2 compressed video without fully decoding the bitstream. We also propose efficient mechanisms that facilitate decoding the video when the hierarchical summary is utilized in browsing and playback of video segments starting at selected key-frames. © (1998) COPYRIGHT SPIE--The International Society for Optical Engineering.

42 citations

Proceedings Article•DOI•

Summarization: (1) using MMR for diversity-based reranking and (2) evaluating summaries

Jade Goldstein¹, Jaime G. Carbonell¹•Institutions (1)

Carnegie Mellon University¹

13 Oct 1998

TL;DR: A method for combining query-relevance with information-novelty in the context of text retrieval and summarization, where the clearest advantage is demonstrated in the automated construction of large document and non-redundant multi-document summaries, where MMR results are clearly superior to non-MMR passage selection.

Abstract: This paper develops a method for combining query-relevance with information-novelty in the context of text retrieval and summarization. The Maximal Marginal Relevance (MMR) criterion strives to reduce redundancy while maintaining query relevance in reranking retrieved documents and in selecting appropriate passages for text summarization. Preliminary results indicate some benefits for MMR diversity ranking in ad-hoc query and in single document summarization. The latter are borne out by the trial-run (unofficial) TREC-style evaluation of summarization systems. However, the clearest advantage is demonstrated in the automated construction of large document and non-redundant multi-document summaries, where MMR results are clearly superior to non-MMR passage selection. This paper also discusses our preliminary evaluation of summarization methods for single documents.

36 citations

Proceedings Article•DOI•

Memory-adaptive scheduling for large query execution

Luc Bouganim, Olga Kapitskaia¹, Patrick Valduriez¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

01 Nov 1998

Proceedings Article•DOI•

Information extraction from case law and retrieval of prior cases by partial parsing and query generation

Peter Jackson¹, Khalid Al-Kofahi¹, Chris Kreilick¹, Brian Grom¹•Institutions (1)

West¹

01 Nov 1998

Proceedings Article•DOI•

Summarization-based Query Expansion in Information Retrieval

Tomek Strzalkowski, Jin Wang, G. Bowden Wise

10 Aug 1998

TL;DR: The role of automated document summarization in building effective search statements is investigated, and the results of the latest evaluation of the system at the annual Text Retrieval Conference (TREC) are discussed.

Abstract: We discuss a semi-interactive approach to information retrieval which consists of two tasks performed in sequence. First, the system assists the searcher in building a comprehensive statement of information need, using automatically generated topical summaries of sample documents. Second, the detailed statement of information need is automatically processed by a series of natural language processing routines in order to derive an optimal search query for a statistical information retrieval system. In this paper, we investigate the role of automated document summarization in building effective search statements. We also discuss the results of the latest evaluation of our system at the annual Text Retrieval Conference (TREC).

MINDS - Multi-lingual INteractive Document Summarization

Jim Cowie¹, Kavi Mahesh, Sergei Nirenburg, Remi Zajac•Institutions (1)

New Mexico State University¹

01 Jan 1998

TL;DR: The research described here focuses on multi-lingual summarization (MLS), where summaries of documents are produced in their original language; corresponding summaries in English will eventually be generated.

Abstract: The research described here focuses on multi-lingual summarization (MLS). Summaries of documents are produced in their original language; corresponding summaries in English will eventually be generated. The source languages supported are Spanish, Japanese, English and Russian.

Proceedings Article•DOI•

Improving Robust Domain Independent Summarization

Jim Cowie¹, Eugene Ludovik¹, Hugo Molina-Salgado¹•Institutions (1)

New Mexico State University¹

13 Oct 1998

TL;DR: The approach is sentence selection, but includes techniques to improve coherence and also to perform sentence reduction to support robust automatic summarization.

Abstract: We discuss those techniques which, in the opinion of the authors, are needed to support robust automatic summarization. Many of these methods are already incorporated in a multi-lingual summarization system, MINDS, developed at CRL. The approach is sentence selection, but includes techniques to improve coherence and also to perform sentence reduction. Our methods are in distinct contrast to those approaches to summarization by deep analysis of a document followed by text generation.

Summarization of Documents That Include Graphics

Robert P. Futrelle¹•Institutions (1)

Northeastern University¹

01 Jan 1998

TL;DR: It is argued that for summarization to be successful, metadata, a manipulable representation of the content of figures, needs to be generated or included initially.

Abstract: When documents include graphics such as diagrams, photos, and data plots, the graphics may also require summarization. This paper discusses essential differences in informational content and rhetorical structure between text and graphics, as well as their interplay. The three approaches to graphics summarization discussed are: Selection, in which a subset of figures is chosen; Merging, in which information in multiple figures is merged into one; and Distillation, in which a single diagram is reduced to a simpler form. These procedures have to consider the content and relations of the graphical elements within figures, the relations among a collection of figures, and the figure captions and discussions of figure content in the running text. We argue that for summarization to be successful, metadata, a manipulable representation of the content of figures, needs to be generated or included initially. Often, the textual references to figures are not very informative, so it will be necessary to generate metadata by diagram parsing, as we have done, or to develop intelligent authoring systems that will allow the author to easily include metadata. This paper introduces this new area of research with manual summarization examples and follows them with a discussion of automated techniques under development.

Proceedings Article•DOI•

Multiple & single document summarization using DR-LINK

Mary McKenna¹, Elizabeth D. Liddy¹•Institutions (1)

Syracuse University¹

13 Oct 1998

TL;DR: This research is developing a procedure to evaluate the summaries the authors create and hopes to uncover useful metrics and evaluation variables that can be used by others working in this area.

Abstract: Our Tipster Phase III research objective for the Summarization task is to produce a single summary across multiple documents returned from a search on an information retrieval system. An established set of metrics to evaluate the performance of our system is not available in this field at present, so this research is also developing a procedure to evaluate the summaries we create. We hope to uncover useful metrics and evaluation variables that can be used by others working in this area.

Proceedings Article•

Advantage of Query Biased Summarization in Information Retrieval

A. Tombros

01 Jan 1998

Proceedings Article•DOI•

Automatic Text Summarization in TIPSTER

Therese Firmin, Inderjeet Mani¹•Institutions (1)

Mitre Corporation¹

13 Oct 1998

TL;DR: The TIPSTER program sponsored seven research efforts into text summarization, all with different approaches to the problem, and there is considerable interest in automatically producing summaries due to the growth of the Internet and the World Wide Web.

Abstract: Automatic Text Summarization was added as a major research thrust of the TIPSTER program during TIPSTER Phase III, 1996-1998. It is a natural extension of the previously supported research efforts in Information Extraction (IE) and Information Retrieval (IR). There is considerable interest in automatically producing summaries due, in large part, to the growth of the Internet and the World Wide Web. The TIPSTER program sponsored seven research efforts into text summarization, all with different approaches to the problem.

Evaluation of Automatic Text Summarization Across Multiple Documents

Mary McKenna, Elizabeth D. Liddy, TextWise LLC

01 Jan 1998

TL;DR: This research will test an evaluation method and metric to compare human assessments with machine output of newstext multiple document summaries and uncover useful metrics and evaluation variables that can be used by other research efforts in this area.

Abstract: This paper describes an ongoing research effort to produce multiple document summaries in response to information requests. Given the absence of tools to evaluate multiple document summaries, this research will test an evaluation method and metric to compare human assessments with machine output of newstext multiple document summaries. Using the DR-LINK information retrieval and analysis system, components of documents and metadata generated during document processing become candidates for use in multiple document summaries. This research is sponsored by the U.S. Government through the Tipster Phase III Text Summarization project. TextWise is a participant in the Tipster Phase III Text Summarization project funded by the U.S. Government. Our research objective is to produce high quality multiple document summaries. An established set of metrics to evaluate the performance of our production of multiple document summaries is not available at present. Therefore, this research effort is also concerned with developing a procedure to evaluate the summaries we create. We hope that we will uncover useful metrics and evaluation variables that can be used by other research efforts in this area. The lack of automatic summarization evaluation tools is directly connected to the need for a comprehensive description of the different types of summaries possible. Automatic text summarization can mean many different things. The summary may be addressing a need of an information seeker (query dependent summary) or it may be independent of any specified information need (generic summary). The summary may represent a single document (single document summary) or a group of documents (multiple document summary). The summary may be an extract of sentences or pieces of text from a document (extract summary) or it may not use any of the actual wording from the source documents (generated text summary). Finally, the summary may provide a general overview of document contents (indicative summary), or it may act as a substitute for the actual document (informative summary). This terminology will be used throughout this report in an attempt to clarify and define the various possible outcomes of automatic text summarization.

Proceedings Article•DOI•

Reflections of Accomplishments in Natural Language Based Detection and Summarization

Susan R. Viscuso

13 Oct 1998

TL;DR: In Phase III, the GE team focused on accurate context indexing of text documents, generation of effective search queries, extended statistical retrieval with constraints, and document abstracting and summarization.

Abstract: In Phase III, the GE team focused on accurate context indexing of text documents, generation of effective search queries, extended statistical retrieval with constraints, and document abstracting and summarization.
