
Showing papers on "Multi-document summarization" published in 1995


Journal ArticleDOI
TL;DR: A system is described that performs domain-independent automatic condensation of news from a large commercial news service encompassing 41 different publications; in the evaluation, the lead-based summaries significantly outperformed the “intelligent” summaries.
Abstract: As electronic information access becomes the norm, and the variety of retrievable material increases, automatic methods of summarizing or condensing text will become critical. This paper describes a system that performs domain-independent automatic condensation of news from a large commercial news service encompassing 41 different publications. This system was evaluated against a system that condensed the same articles using only the first portion of the texts (the lead), up to the target length of the summaries. Three lengths of articles were evaluated for 250 documents by both systems, totalling 1500 suitability judgements in all. The outcome of perhaps the largest evaluation of human vs machine summarization performed to date was unexpected. The lead-based summaries outperformed the “intelligent” summaries significantly, achieving acceptability ratings of over 90%, compared to 74.4%. This paper briefly reviews the literature, details the implications of these results, and addresses the remaining hopes for content-based summarization. We expect the results presented here to be useful to other researchers currently investigating the viability of summarization through sentence selection heuristics.
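
The two strategies compared above can be pictured in miniature. The sketch below assumes a plain-text article and a crude term-frequency score; it is only an illustration of lead-based versus heuristic sentence selection, not the evaluated system's actual heuristics:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "on", "that"}

def split_sentences(text):
    # Naive splitter; a production system would use a trained sentence tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def lead_summary(text, target_words=50):
    # Take sentences from the start of the article until the target length is reached.
    out, count = [], 0
    for sent in split_sentences(text):
        out.append(sent)
        count += len(sent.split())
        if count >= target_words:
            break
    return " ".join(out)

def heuristic_summary(text, target_words=50):
    # Score each sentence by the frequency of its content words, then emit the
    # best-scoring sentences (in their original order) up to the target length.
    sentences = split_sentences(text)
    tf = Counter(w.lower() for w in re.findall(r"\w+", text)
                 if w.lower() not in STOPWORDS)
    def score(sent):
        toks = [w.lower() for w in re.findall(r"\w+", sent)]
        return sum(tf[w] for w in toks if w not in STOPWORDS) / max(len(toks), 1)
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, count = [], 0
    for i in ranked:
        chosen.append(i)
        count += len(sentences[i].split())
        if count >= target_words:
            break
    return " ".join(sentences[i] for i in sorted(chosen))
```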

432 citations


Proceedings ArticleDOI
01 Jul 1995
TL;DR: A natural language system that summarizes a series of news articles on the same event, using summarization operators to group together templates from the output of the systems developed for ARPA’s Message Understanding Conferences.
Abstract: We present a natural language system which summarizes a series of news articles on the same event. It uses summarization operators, identified through empirical analysis of a corpus of news summaries, to group together templates from the output of the systems developed for ARPA’s Message Understanding Conferences. Depending on the available resources (e.g., space), summaries of different length can be produced. Our research also provides a methodological framework for future work on the summarization task and on the evaluation of news summarization systems.
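
The operator idea can be pictured with MUC-style templates. The slot names, the grouping key, and the "refinement" operator below are hypothetical and only illustrate how templates describing the same event might be merged; they are not the paper's actual operators:

```python
from collections import defaultdict

templates = [
    {"incident": "bombing", "location": "Bogota", "injured": 5, "source": "article-1"},
    {"incident": "bombing", "location": "Bogota", "injured": 7, "source": "article-2"},
]

def group_by_event(tmpls):
    # Group templates that appear to describe the same incident.
    groups = defaultdict(list)
    for t in tmpls:
        groups[(t["incident"], t["location"])].append(t)
    return groups

def refinement_operator(group):
    # Hypothetical operator: keep the latest figures and note how they changed.
    first, last = group[0], group[-1]
    merged = dict(last)
    if first["injured"] != last["injured"]:
        merged["note"] = f"earlier reports said {first['injured']} injured"
    return merged

for _, group in group_by_event(templates).items():
    print(refinement_operator(group))
```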

375 citations


Journal ArticleDOI
TL;DR: A system, SumGen, is described, which selects key information from an event database by reasoning about event frequencies, frequencies of relations between events, and domain specific importance measures and then aggregates similar information and plans a summary presentation tailored to a stereotypical user.
Abstract: Summarization entails analysis of source material, selection of key information, condensation of this, and generation of a compact summary form. While there have been many investigations into the automatic summarization of text, relatively little attention has been given to the summarization of information from structured information sources such as data or knowledge bases, despite this being a desirable capability for a number of application areas including report generation from databases (e.g. weather, financial, medical) and simulations (e.g. military, manufacturing, economic). After a brief introduction indicating the main elements of summarization and referring to some illustrative approaches to it, this article considers specific issues in the generation of text summaries of event data. It describes a system, SumGen, which selects key information from an event database by reasoning about event frequencies, frequencies of relations between events, and domain-specific importance measures. The article describes how SumGen then aggregates similar information and plans a summary presentation tailored to a stereotypical user. Finally, the article evaluates SumGen performance, and also that of a much more limited second summariser, by assessing information extraction by 22 human subjects from both source and summary texts. This evaluation shows that the use of SumGen reduces average sentence length by approx. 15%, document length by 70%, and time to perform information extraction by 58%.
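
The selection step can be illustrated in miniature. The event types and importance weights below are invented for the example; the sketch only shows the general idea of combining event-type frequency with a domain-specific importance measure, not SumGen's actual reasoning:

```python
from collections import Counter

# Hypothetical domain-specific importance weights.
IMPORTANCE = {"engagement": 3.0, "movement": 1.0, "resupply": 0.5}

events = [
    {"type": "movement", "unit": "A"}, {"type": "movement", "unit": "B"},
    {"type": "engagement", "unit": "A"}, {"type": "resupply", "unit": "C"},
]

def select_key_events(evts, k=2):
    freq = Counter(e["type"] for e in evts)
    def score(e):
        # Rare event types and important event types score higher.
        return IMPORTANCE.get(e["type"], 1.0) / freq[e["type"]]
    return sorted(evts, key=score, reverse=True)[:k]

print(select_key_events(events))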

77 citations


Proceedings ArticleDOI
Lee-Feng Chien
01 Jul 1995
TL;DR: The proposed approach is an integrated and efficient text access method that performs well both in exact match searching of Boolean queries and in best match searching (ranking) of quasi-natural language queries, and it is capable of retrieving gigabytes of Chinese text efficiently and intelligently.
Abstract: This paper presents an efficient signature file approach for fast and intelligent retrieval of large Chinese full-text document databases. The proposed approach is an integrated and efficient text access method, which performs well both in exact match searching of Boolean queries and best match searching (ranking) of quasi-natural language queries. Using this approach, the inherent difficulties of Chinese word segmentation and proper noun identification can be effectively reduced, queries can be expressed with a non-controlled vocabulary, and the ranking function can be implemented easily without demanding extra space overhead or affecting retrieval efficiency. The experimental results show that the proposed approach achieves good performance in many ways, especially in the reduction of false drops and space overhead, the speedup of retrieval time, and the capability of best match searching using quasi-natural language queries. In conclusion, the proposed approach is capable of retrieving gigabytes of Chinese texts very efficiently and intelligently.
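
As a rough illustration of the signature-file idea only (generic superimposed coding over character bigrams, not the paper's particular encoding or parameters):

```python
import hashlib

SIG_BITS = 256   # width of each document signature
HASHES = 3       # bits set per bigram

def bigrams(text):
    return [text[i:i + 2] for i in range(len(text) - 1)]

def signature(text):
    # Superimposed coding: hash every character bigram into a few bit positions.
    sig = 0
    for gram in bigrams(text):
        for k in range(HASHES):
            h = int(hashlib.md5(f"{k}:{gram}".encode("utf-8")).hexdigest(), 16)
            sig |= 1 << (h % SIG_BITS)
    return sig

def matches(query, doc_sig):
    # Candidate match if the document signature covers the query signature;
    # false drops are possible and must be removed by a final text check.
    q = signature(query)
    return (q & doc_sig) == q

docs = {"d1": "中文全文檢索", "d2": "資訊檢索系統"}
sigs = {d: signature(t) for d, t in docs.items()}
print([d for d in docs if matches("檢索", sigs[d])])
```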

53 citations


Proceedings ArticleDOI
01 Jul 1995
TL;DR: There are interaction effects among citations in a search output that affect the physician's judgment of clinical applicability, and physicians select among different information processing strategies when attempting to use the literature to find an answer to a clinical question.
Abstract: We report findings from an exploratory study whose overall goal was to design an online document surrogate for journal articles, customized for use in clinical problem solving. We describe two aspects of literature-based medical decision making. First, there are interaction effects among citations in a search output (or among articles in a stack of articles) that affect the physician's judgment of clinical applicability. Second, physicians select among different information processing strategies when attempting to use the literature to find an answer to a clinical question.

37 citations


Proceedings ArticleDOI
01 Jul 1995
TL;DR: An experimental term selection strategy for document visualization that supports browsing in high recall, low precision document retrieval and classification tasks and increases the clustering tendency of low-dimensional document browsing spaces.
Abstract: An experimental term selection strategy for document visualization is described. Strong discriminators with few co-occurrences increase the clustering tendency of low-dimensional document browsing spaces. Clustering tendency is tested with diagnostic measures adapted from the field of cluster analysis, and confirmed using the VIBE visualization tool. This method supports browsing in high recall, low precision document retrieval and classification tasks.
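
A toy sketch of the selection idea under those two criteria. The document-frequency band and the greedy zero-co-occurrence filter below are assumptions made for illustration, not the paper's diagnostic measures:

```python
docs = [
    {"neural", "network", "training"},
    {"retrieval", "query", "ranking"},
    {"neural", "ranking"},
]

def document_frequency(term):
    return sum(term in d for d in docs)

def cooccurrence(t1, t2):
    return sum(t1 in d and t2 in d for d in docs)

# Candidate discriminators: terms that are neither singletons nor ubiquitous.
vocab = set().union(*docs)
candidates = [t for t in vocab if 1 < document_frequency(t) < len(docs)]

# Greedily keep candidates that do not co-occur with terms already selected,
# so the selected axes pull documents apart in the browsing space.
selected = []
for term in sorted(candidates, key=document_frequency, reverse=True):
    if all(cooccurrence(term, s) == 0 for s in selected):
        selected.append(term)
print(selected)
```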

32 citations


Proceedings ArticleDOI
01 Jul 1995
TL;DR: There was a statistically significant dependence between term-consistency and the terminological styles of searchers on the one hand, and between concept-consistency and searchers’ search strategies on the other hand, as well as clear differences in the experience of the most and least consistent searchers in both information storage and information retrieval.
Abstract: Differences between the most and least consistent searchers are considered. Attention is paid both to term-consistency and concept-consistency. The paper is based on an empirical study in which 32 searchers formulated query statements from 12 search requests. The searchers were also interviewed to obtain information about their experience. There was a statistically significant dependence between term-consistency and the terminological styles of searchers on the one hand, and between concept-consistency and searchers’ search strategies on the other hand. There were also clear differences in the experience of the most and least consistent searchers, both in information storage and in information retrieval.

22 citations


Proceedings ArticleDOI
01 Jul 1995
TL;DR: An approach for estimating the number of elements needed from the basic rankings to compute a given number of elements of the resulting ranking; experiments with a large text database prove the applicability of this approach.
Abstract: In this paper, we consider vague queries in text and fact databases. A vague query can be formulated as a combination of vague criteria. A single database object can meet a vague criterion to a certain degree. We confine ourselves to queries for which the answer can be computed efficiently by (perhaps repetitive) combination of rankings into new rankings. Since users usually will inspect only some of the best answer objects, the corresponding rankings need to be computed only as far as necessary to generate these first answer objects. In this contribution we describe an approach for estimating the number of elements needed from the basic rankings to compute a given number of elements of the resulting ranking. Experiments with a large text database prove the applicability of our approach.
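
A hedged sketch of combining score-sorted rankings with early stopping. This uses a threshold-style stopping rule rather than the paper's estimation method; it reports both the top-k combined result and how many elements were read from each basic ranking:

```python
def top_k(ranking_a, ranking_b, k, combine=lambda a, b: a + b):
    # Each ranking is a list of (object_id, score), sorted by score descending.
    lookup_a, lookup_b = dict(ranking_a), dict(ranking_b)
    combined, top, depth = {}, [], -1
    for depth in range(max(len(ranking_a), len(ranking_b))):
        for ranking in (ranking_a, ranking_b):
            if depth < len(ranking):
                obj, _ = ranking[depth]
                if obj not in combined:
                    # Look up the object's score in the other ranking (random access).
                    combined[obj] = combine(lookup_a.get(obj, 0.0), lookup_b.get(obj, 0.0))
        # No unseen object can beat the combination of the current frontier scores.
        bound = combine(ranking_a[min(depth, len(ranking_a) - 1)][1],
                        ranking_b[min(depth, len(ranking_b) - 1)][1])
        top = sorted(combined.items(), key=lambda x: x[1], reverse=True)[:k]
        if len(top) == k and top[-1][1] >= bound:
            return top, depth + 1
    return top, depth + 1

a = [("d1", 0.9), ("d2", 0.8), ("d3", 0.1)]
b = [("d2", 0.7), ("d3", 0.6), ("d1", 0.2)]
print(top_k(a, b, k=1))  # -> ([('d2', 1.5)], 2): only 2 elements read per ranking
```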

15 citations


Proceedings ArticleDOI
01 Jul 1995
TL;DR: The interface is based on the non-first-normal-form (NF2) relational model, which allows intuitive and systematic modeling of complex documents and provides a truly declarative and powerful interface for the users.
Abstract: Complex documents are used in many environments, e.g., information retrieval (IR). Such documents contain subdocuments, which may contain further subdocuments, and so on. Powerful tools are needed to facilitate their retrieval, restructuring, and analysis. Existing IR systems are poor at complex document restructuring and data aggregation. However, in practice, IR system users would often want to obtain aggregation information on subdocuments of complex documents. In this paper we address this problem and provide a truly declarative and powerful interface for the users. Our interface is based on the non-first-normal-form (NF2) relational model. It allows intuitive and systematic modeling of complex documents.
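
A minimal nested-relation sketch. The book/chapter/section structure is an assumed example rather than the paper's interface or query language; it only shows the kind of per-level aggregation over subdocuments that the authors argue flat IR systems make awkward:

```python
# Each book is a tuple whose "chapters" attribute is itself a relation,
# and each chapter nests a relation of sections.
books = [
    {"title": "IR Handbook",
     "chapters": [
         {"name": "Indexing",  "sections": [{"words": 1200}, {"words": 900}]},
         {"name": "Retrieval", "sections": [{"words": 1500}]},
     ]},
]

# Aggregate at each nesting level: words per chapter, then per book,
# without flattening the document into a single table first.
for book in books:
    per_chapter = {c["name"]: sum(s["words"] for s in c["sections"])
                   for c in book["chapters"]}
    print(book["title"], per_chapter, "total:", sum(per_chapter.values()))
```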

12 citations