
Showing papers presented at the "International ACM SIGIR Conference on Research and Development in Information Retrieval" in 1987


Proceedings ArticleDOI
Joel L. Fagan1
01 Nov 1987
TL;DR: An automatic phrase indexing method based on the term discrimination model is described, and the results of retrieval experiments on five document collections are presented.
Abstract: An automatic phrase indexing method based on the term discrimination model is described, and the results of retrieval experiments on five document collections are presented. Problems related to this non-syntactic phrase construction method are discussed, and some possible solutions are proposed that make use of information about the syntactic structure of document and query texts.

130 citations
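
For illustration, here is a minimal Python sketch of non-syntactic phrase construction. It is not Fagan's actual procedure, which selects phrases via the term discrimination model; it simply promotes frequent adjacent word pairs to indexing phrases, with the stopword list and frequency threshold chosen arbitrarily.

    from collections import Counter

    STOPWORDS = {"the", "of", "a", "and", "in", "to", "is"}  # toy list

    def candidate_phrases(docs, min_freq=2):
        """Collect adjacent non-stopword pairs occurring at least
        min_freq times across the collection (a crude stand-in for
        the term discrimination criterion)."""
        pairs = Counter()
        for text in docs:
            words = [w for w in text.lower().split() if w not in STOPWORDS]
            pairs.update(zip(words, words[1:]))
        return {p for p, n in pairs.items() if n >= min_freq}

    docs = ["information retrieval systems for text databases",
            "evaluation of information retrieval experiments"]
    print(candidate_phrases(docs))  # {('information', 'retrieval')}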



Proceedings ArticleDOI
01 Nov 1987
TL;DR: The problem of retrieving information from large full-text databases is ubiquitous and increasing in importance as low-cost optical storage media become available.
Abstract: The problem of retrieving information from large full-text databases is ubiquitous and increasing in importance as low-cost optical storage media become available. In many cases, simple keyword-based retrieval systems have shown themselves inadequate for the task, and a number of more or less sophisticated alternatives have been proposed. Of particular interest are those that derive from efforts in natural language understanding and which advocate a conceptually oriented approach (Schank et al., 1981; DeJong, 1982; Kolodner, 1983). These efforts emphasize semantically driven text parsing with the goal of understanding only so much of the text as is necessary to perform satisfactory retrieval.

56 citations



Proceedings ArticleDOI
01 Nov 1987
TL;DR: This work proposes to combine the concordance and bit-map approaches, and shows how this can speed up the processing of queries: fast ANDing and ORing of the maps in a preprocessing stage, lead to large I/O savings in collating coordinates of keywords needed to satisfy the metrical and Boolean constraints.
Abstract: In static full-text retrieval systems, which accommodate metrical as well as Boolean operators, the traditional approach to query processing uses a “concordance”, from which large sets of coordinates are retrieved and then merged and/or collated. Alternatively, in a system with l documents, the concordance can be replaced by a set of bit-maps of fixed length l, which are constructed for every different word of the database and serve as occurrence maps. We propose to combine the concordance and bit-map approaches, and show how this can speed up the processing of queries: fast ANDing and ORing of the maps in a preprocessing stage, lead to large I/O savings in collating coordinates of keywords needed to satisfy the metrical and Boolean constraints. Moreover, the bit-maps give partial information on the distribution of the coordinates of the keywords, which can be used when queries must be processed by stages, due to their complexity and the sizes of the involved sets of coordinates. The new techniques are partially implemented at the Responsa Retrieval Project.

29 citations
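
A small Python sketch of the bit-map preprocessing idea, with one bit per document per word (Python integers stand in for the fixed-length maps; the Responsa implementation details are not reproduced here):

    def build_bitmaps(docs):
        """One bit per document for every distinct word (bit i is set
        if the word occurs in document i)."""
        maps = {}
        for i, text in enumerate(docs):
            for w in set(text.lower().split()):
                maps[w] = maps.get(w, 0) | (1 << i)
        return maps

    def and_filter(maps, terms):
        """Preprocessing step: AND the occurrence maps so coordinates
        are fetched only for documents that can possibly satisfy the
        conjunctive query."""
        result = ~0  # all ones
        for t in terms:
            result &= maps.get(t, 0)
        return result

    docs = ["sigir retrieval systems", "text retrieval", "sigir retrieval text"]
    maps = build_bitmaps(docs)
    m = and_filter(maps, ["sigir", "text"])
    print([i for i in range(len(docs)) if m >> i & 1])  # [2]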


Proceedings ArticleDOI
01 Nov 1987
TL;DR: The interaction of suffixing algorithms and ranking techniques in retrieval performance, particularly in an online environment, was investigated and two modifications to ranking techniques were suggested: variable weighting of word variants and selective stemming depending on query length.
Abstract: The interaction of suffixing algorithms and ranking techniques in retrieval performance, particularly in an online environment, was investigated. Three general-purpose suffixing algorithms were used for retrieval on the Cranfield 1400, Medlars, and CACM collections, and the results analysed with several standard evaluation measures. An examination of the retrieval performance using suffixing suggested two modifications to ranking techniques: variable weighting of word variants and selective stemming depending on query length. The experimental data are presented, and the limitations of suffixing in an online environment are discussed.

28 citations
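
A hypothetical sketch of the two suggested modifications, with a toy stemmer and arbitrary weights and length threshold standing in for the paper's tuned choices:

    def expand_query(terms, stem, max_len_for_stemming=3, variant_weight=0.5):
        """Selective stemming: only short queries are expanded, and
        word variants found via the stemmer get a reduced weight
        relative to the original terms."""
        weighted = {t: 1.0 for t in terms}
        if len(terms) <= max_len_for_stemming:
            for t in terms:
                s = stem(t)
                if s != t and s not in weighted:
                    weighted[s] = variant_weight
        return weighted

    toy_stem = lambda w: w[:-1] if w.endswith("s") else w
    print(expand_query(["rankings", "methods"], toy_stem))
    # {'rankings': 1.0, 'methods': 1.0, 'ranking': 0.5, 'method': 0.5}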


Proceedings ArticleDOI
01 Nov 1987
TL;DR: A new cluster maintenance strategy is proposed and its similarity/stability characteristics, cost analysis, and retrieval behavior in comparison with unclustered and completely reclustered database environments have been examined by means of a series of experiments.
Abstract: Partitioning very large databases by clustering is necessary to reduce the space/time complexity of retrieval operations. However, contemporary retrieval environments demand dynamic maintenance of clusters. A new cluster maintenance strategy is proposed, and its similarity/stability characteristics, cost analysis, and retrieval behavior in comparison with unclustered and completely reclustered database environments have been examined by means of a series of experiments.

28 citations
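
The abstract does not spell out the new strategy; the generic sketch below shows only the shape of dynamic maintenance (assign an incoming document to its most similar cluster or open a new one), with cosine similarity and an arbitrary threshold as assumptions:

    import math

    def cosine(u, v):
        """Cosine similarity between sparse term-weight dicts."""
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    def centroid(docs):
        """Mean term-weight vector of a cluster."""
        c = {}
        for d in docs:
            for t, w in d.items():
                c[t] = c.get(t, 0.0) + w / len(docs)
        return c

    def add_document(clusters, doc, threshold=0.3):
        """One maintenance step: join the most similar cluster if it
        is similar enough, otherwise open a new singleton cluster."""
        scored = [(cosine(doc, centroid(c)), c) for c in clusters]
        best_sim, best = max(scored, key=lambda s: s[0], default=(0.0, None))
        if best is not None and best_sim >= threshold:
            best.append(doc)
        else:
            clusters.append([doc])

    clusters = []
    add_document(clusters, {"sigir": 1.0, "retrieval": 2.0})
    add_document(clusters, {"retrieval": 1.0, "clustering": 1.0})
    print(len(clusters))  # 1: the second document joined the first cluster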



Proceedings ArticleDOI
01 Nov 1987
TL;DR: This paper proposes a method and gives the precise semantics of the retrieval operations in a system where imprecision is allowed and suggests a way to handle the uncertainty introduced by imprecise data values.
Abstract: Missing, non-applicable and imprecise values arise frequently in Office Information Systems. There is a need to treat them in a consistent and useful manner. This paper proposes a method and gives the precise semantics of the retrieval operations in a system where imprecision is allowed. It also suggests a way to handle the uncertainty introduced by imprecise data values.

25 citations
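
As an illustration of retrieval semantics under imprecision (a sketch, not the paper's formalism, which also covers missing and non-applicable values), an imprecise value can be stored as an interval and a selection can separate certain answers from merely possible ones:

    MAYBE = "maybe"

    def less_than(value, bound):
        """Three-valued comparison for an imprecise value stored as a
        (low, high) interval; an exact value is (v, v)."""
        low, high = value
        if high < bound:
            return True
        if low >= bound:
            return False
        return MAYBE

    def select(records, field, bound):
        sure = [r for r in records if less_than(r[field], bound) is True]
        possible = [r for r in records if less_than(r[field], bound) == MAYBE]
        return sure, possible

    people = [{"name": "a", "age": (25, 25)},
              {"name": "b", "age": (30, 40)},
              {"name": "c", "age": (45, 50)}]
    print(select(people, "age", 35))  # 'a' is certain, 'b' only possible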


Proceedings ArticleDOI
B.J. Oommen1, D. Ma1
01 Nov 1987
TL;DR: The first solution is relatively fast, but its accuracy is unremarkable in some environments; the second solution, which uses a new variable-structure stochastic automaton, demonstrates an excellent partitioning capability.
Abstract: Let O = {A1, …, AW} be a set of W objects to be partitioned into R classes {P1, …, PR}. The objects are accessed in groups of unknown size, and the sizes of these groups need not be equal. Additionally, the joint access probabilities of the objects are unknown. The intention is that objects accessed together more frequently are located in the same class. This problem has been shown to be NP-hard [15, 16]. In this paper, we propose two stochastic learning automata solutions to the problem. Although the first one is relatively fast, its accuracy is unremarkable in some environments. The second solution, which uses a new variable-structure stochastic automaton, demonstrates an excellent partitioning capability. Experimentally, this solution converges an order of magnitude faster than the best known algorithm in the literature [15, 16].

25 citations
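
A heavily simplified sketch, loosely inspired by object-migrating automata: each object carries a conviction state within its class; co-accessed objects already sharing a class are rewarded, objects in different classes are penalized and migrate once their conviction reaches the boundary. The paper's variable-structure solution also keeps class sizes balanced, which this toy omits.

    import itertools

    DEPTH = 3  # conviction memory of each object's automaton

    def process_access(group, cls, state):
        for a, b in itertools.combinations(group, 2):
            if cls[a] == cls[b]:                      # reward: deepen conviction
                state[a] = min(state[a] + 1, DEPTH)
                state[b] = min(state[b] + 1, DEPTH)
            else:                                     # penalize both objects
                for x, other in ((a, b), (b, a)):
                    if state[x] > 1:
                        state[x] -= 1
                    else:
                        cls[x] = cls[other]           # migrate at the boundary
                        state[x] = 1

    objects = ["A", "B", "C", "D"]
    cls = {o: i for i, o in enumerate(objects)}       # start in singleton classes
    state = {o: 1 for o in objects}
    for _ in range(20):
        process_access(["A", "B"], cls, state)        # A, B are co-accessed
        process_access(["C", "D"], cls, state)        # C, D are co-accessed
    print(cls)  # {'A': 1, 'B': 1, 'C': 3, 'D': 3}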


Proceedings ArticleDOI
01 Nov 1987
TL;DR: The proposed NLP techniques are used to develop a request model based on “conceptual case frames” and to compare this model with the texts of candidate documents; statistical searches carried out using dependency and relative importance information derived from the request models indicate that performance benefits can be obtained.
Abstract: Document retrieval systems have been restricted, by the nature of the task, to techniques that can be used with large numbers of documents and broad domains. The most effective techniques that have been developed are based on the statistics of word occurrences in text. In this paper, we describe an approach to using natural language processing (NLP) techniques for what is essentially a natural language problem - the comparison of a request text with the text of document titles and abstracts. The proposed NLP techniques are used to develop a request model based on “conceptual case frames” and to compare this model with the texts of candidate documents. The request model is also used to provide information to statistical search techniques that identify the candidate documents. As part of a preliminary evaluation of this approach, case frame representations of a set of requests from the CACM collection were constructed. Statistical searches carried out using dependency and relative importance information derived from the request models indicate that performance benefits can be obtained.
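
For orientation, a toy sketch of matching a request case frame against document terms; the slot names, weights, and scoring are illustrative assumptions, not the paper's representation.

    # Slot weights: head slots count more than modifiers (the "relative
    # importance" information mentioned in the abstract).
    WEIGHTS = {"action": 2.0, "object": 2.0, "modifiers": 1.0}

    request_frame = {
        "action": "parsing",
        "object": "queries",
        "modifiers": {"natural", "language"},
    }

    def frame_match(frame, doc_terms):
        """Score a document by how well its terms fill the request-frame
        slots, weighting each slot by its relative importance."""
        terms = set(doc_terms)
        score = 0.0
        for slot, value in frame.items():
            targets = value if isinstance(value, set) else {value}
            score += WEIGHTS[slot] * len(targets & terms) / len(targets)
        return score

    doc = ["parsing", "strategies", "for", "natural", "language", "text"]
    print(frame_match(request_frame, doc))  # 3.0: action + both modifiers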

Proceedings ArticleDOI
01 Nov 1987
TL;DR: Models of document retrieval systems assuming random selection and best-first selection are developed and compared under binary independence and two Poisson independence feature distribution models.
Abstract: Most document retrieval systems based on probabilistic models of feature distributions assume random selection of documents for retrieval. The assumptions of these models are met when documents are randomly selected from the database or when retrieving all available documents. A more suitable model for retrieval of a single document assumes that the best document available is to be retrieved first. Models of document retrieval systems assuming random selection and best-first selection are developed and compared under binary independence and two Poisson independence feature distribution models. Under the best-first model, feature discrimination varies with the number of documents in each relevance class in the database. A weight similar to the Inverse Document Frequency weight and consistent with the best-first model is suggested which does not depend on knowledge of the characteristics of relevant documents.
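
The exact weight derived in the paper is not reproduced in the abstract; for orientation, the classical Inverse Document Frequency weight it is said to resemble is simply:

    import math

    def idf(N, n_t):
        """Classical IDF: N documents in the collection, n_t of them
        containing term t. The paper's best-first weight is of a
        similar flavour but derives from its selection model."""
        return math.log(N / n_t)

    print(round(idf(1400, 70), 3))  # 2.996 for a rare term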

Proceedings ArticleDOI
01 Nov 1987
TL;DR: A family of compression methods using a hash table for searching the prediction information is described; the one-pass versions are especially apt for “on-the-fly” compression of transmitted data and could be a basis for specialized hardware.
Abstract: Knowledge of a short substring constitutes a good basis for guessing the next character in a natural-language text. This observation is fundamental to predictive text compression, i.e. repeated guessing and encoding of subsequent characters. The paper describes a family of such compression methods, using a hash table for searching the prediction information. The experiments show that the methods produce good compression gains and, moreover, are very fast. The one-pass versions are especially apt for “on-the-fly” compression of transmitted data, and could be a basis for specialized hardware.
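
A minimal one-pass sketch of the guess-and-encode idea, assuming a fixed-length character context as the prediction key and a "last seen wins" table update; the paper's family of methods and its hashing details are not reproduced.

    def compress(text, k=3):
        """Predict the next character from the k preceding ones: a
        correct guess costs one flag bit, a miss costs the flag plus
        an 8-bit literal."""
        table, out = {}, []
        for i, ch in enumerate(text):
            ctx = text[max(0, i - k):i]
            if table.get(ctx) == ch:
                out.append((1, None))        # hit: one bit on the wire
            else:
                out.append((0, ch))          # miss: flag bit + literal
            table[ctx] = ch
        return out

    def decompress(stream, k=3):
        table, text = {}, ""
        for hit, ch in stream:
            ctx = text[max(0, len(text) - k):]
            ch = table[ctx] if hit else ch
            text += ch
            table[ctx] = ch
        return text

    s = "abcabcabcabc"
    enc = compress(s)
    assert decompress(enc) == s
    bits = sum(1 if hit else 9 for hit, _ in enc)
    print(bits, "bits vs", 8 * len(s))  # 60 bits vs 96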

Proceedings ArticleDOI
01 Nov 1987
TL;DR: An enhancement of such a clustering scheme is presented by the formulation of the user-oriented clustering as a function-optimization problem, termed the Boundary Selection Problem (BSP).
Abstract: User-oriented clustering schemes enable the classification of documents based upon the user's perception of the similarity between documents, rather than on some similarity function presumed by the designer to represent the user's criteria. In this paper, an enhancement of such a clustering scheme is presented. This is accomplished by formulating user-oriented clustering as a function-optimization problem, termed the Boundary Selection Problem (BSP). Heuristic approaches to solve the BSP are proposed, and a preliminary evaluation of these approaches is provided.


Proceedings ArticleDOI
01 Nov 1987
TL;DR: This approach allows a limited automatic analysis of images belonging to a domain described in advance to the system using a formalism based on fuzzy sets; query processing is based on special access structures generated from the image analysis process.
Abstract: In this paper we address the problem of retrieving images from large image databases, given a partial description of the image content. This approach allows a limited automatic analysis of images belonging to a domain described in advance to the system, using a formalism based on fuzzy sets. The image query processing is based on special access structures generated from the image analysis process.

Proceedings ArticleDOI
01 Nov 1987
TL;DR: An interaction model referring to a knowledge-based model of document description is discussed; it employs the feature "informational zooming" to investigate informational entities on an adequate level of abstraction.
Abstract: User interfaces to information systems can be modelled by providing generalized descriptions of the contributions to the dialog from both partners: user and system. In this paper, we refer to such descriptions as "interaction models". Due to the probable integration of heterogeneous types of information in future information systems, we discuss an interaction model which refers to a knowledge-based model of document description (cf. HAHN/REIMER 86). Using interactive graphics, the model employs the feature "informational zooming" to investigate informational entities on an adequate level of abstraction. The knowledge-based full-text information system TOPIC/TOPOGRAPHIC integrates the presentation of various types of information (topical, factual and textual) into a comprehensive interaction model based on informational objects. Only three operators suffice for accessing the information structures at all levels. This is accomplished by context-dependent menus that are generated dynamically during the dialog if a further specification of the command is needed. Thus a user-friendly access to several layers of information about texts is possible: (1) topical structures of relevant texts at different levels of generality (cascaded abstracts); (2) facts from those texts, automatically extracted during the text analysis; (3) passages from the original text, presented according to the user's zooming operations. A survey of the functionality of the system is given in the appendix.

Proceedings ArticleDOI
01 Nov 1987
TL;DR: A sample interaction with EP-X is discussed, the knowledge representations necessary to support this semantically-based interaction are discussed, preliminary results of empirical studies to evaluate the interface, and recommendations for future directions are made.
Abstract: EP-X (Environmental Pollution eXpert) is a prototype knowledge-based system that assists users in conducting bibliographic searches of the environmental pollution literature. This system combines artificial intelligence and human factors engineering techniques, allowing us to redesign traditional bibliographic information retrieval interfaces. The result supports semantically-based search as opposed to the typical character-string matching approach. This paper discusses a sample interaction with EP-X, the knowledge representations necessary to support this semantically-based interaction, preliminary results of empirical studies to evaluate the interface, and recommendations for future directions.

Proceedings ArticleDOI
P. Schauble1
01 Nov 1987


Proceedings ArticleDOI
01 Nov 1987
TL;DR: Within the framework of the vector space models, a statistical similarity measure between document and query is proposed, which provides a natural and consistent interpretation of term occurrence frequencies obtained from autoindexing.
Abstract: Within the framework of the vector space models, a statistical similarity measure between document and query is proposed. In this approach the assumption that term (or atomic) vectors are pairwise orthogonal is not required. In addition, it provides a natural and consistent interpretation of term occurrence frequencies obtained from autoindexing.
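
A sketch of the general form, under the assumption that term correlations are captured in a Gram matrix G of the term vectors (G = I recovers the usual orthogonal-term cosine); the paper's statistical construction of the measure is not reproduced here.

    import numpy as np

    # Hypothetical term-term correlation matrix G: the off-diagonal 0.6
    # says terms 0 and 1 are related.
    G = np.array([[1.0, 0.6, 0.0],
                  [0.6, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

    def sim(d, q):
        """Similarity without pairwise-orthogonal term vectors: all
        inner products go through the Gram matrix G."""
        num = d @ G @ q
        return num / (np.sqrt(d @ G @ d) * np.sqrt(q @ G @ q))

    d = np.array([2.0, 0.0, 1.0])   # term frequencies from autoindexing
    q = np.array([0.0, 1.0, 0.0])
    print(round(float(sim(d, q)), 3))  # 0.537; the orthogonal model gives 0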

Proceedings ArticleDOI
01 Nov 1987
TL;DR: This paper outlines a method for the automatic construction of a knowledge base, proposing some methods and a domain knowledge model.
Abstract: We attempt in this paper to outline a method for the automatic construction of a knowledge base. We propose some methods and a domain knowledge model. A new idea is to conceive a system that is able, at each phase of its construction, to acquire domain knowledge from all the new information it builds, in particular the indexing terms; the last section is an attempt in this direction.

Proceedings ArticleDOI
01 Nov 1987
TL;DR: The experimental results show that in this case no improvement over a simple coordination match function can be achieved, and models based on probabilistic indexing outperform the ranking procedures using search term weights.
Abstract: The effect of probabilistic search term weighting on the improvement of retrieval quality has been demonstrated in various experiments described in the literature. In this paper, we investigate the feasibility of this method for boolean retrieval with terms from a prescribed indexing vocabulary. This is a quite different test setting in comparison to other experiments where linear retrieval with free text terms was used. The experimental results show that in our case no improvement over a simple coordination match function can be achieved. On the other hand, models based on probabilistic indexing outperform the ranking procedures using search term weights.
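
The baseline the weighted procedures failed to beat is the simple coordination match, i.e. ranking documents by the number of query terms they are indexed with:

    def coordination_match(query_terms, doc_terms):
        """Count how many query terms the document is indexed with."""
        return len(set(query_terms) & set(doc_terms))

    docs = {"d1": ["indexing", "vocabulary"], "d2": ["indexing"]}
    q = ["indexing", "vocabulary", "boolean"]
    print(sorted(docs, key=lambda d: -coordination_match(q, docs[d])))
    # ['d1', 'd2']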

Proceedings ArticleDOI
01 Nov 1987
TL;DR: The results suggest that the parallel architecture of the DAP is not well suited to the variable-length records which characterise bibliographic data.
Abstract: This paper considers the suitability and efficiency of a highly parallel computer, the ICL Distributed Array Processor (DAP), for document clustering. Algorithms are described for the implementation of the single-pass and reallocation clustering methods on the DAP and on a conventional mainframe computer. These methods are used to classify the Cranfield, Vaswani and UKCIS document test collections. The results suggest that the parallel architecture of the DAP is not well suited to the variable-length records which characterise bibliographic data.

Journal ArticleDOI
Gerard Salton1
01 Mar 1987
TL;DR: The conclusion is reached that expert systems are unlikely to provide much relief in ordinary retrieval environments, and that simpler and more effective retrieval systems can be implemented by falling back on methodologies proposed and evaluated over twenty years ago that operate without expert system intervention.
Abstract: The existing bibliographic retrieval systems are too complex to permit direct on-line access by untrained end users. Expert system approaches have been introduced in the hope of simplifying the document indexing, search and retrieval operations and rendering these operations accessible to end users. The expert system approach is examined briefly in this note and the conclusion is reached that expert systems are unlikely to provide much relief in ordinary retrieval environments. Simpler and more effective retrieval systems than those currently in use can be implemented by falling back on methodologies proposed and evaluated over twenty years ago that operate without expert system intervention.

Journal ArticleDOI
01 Mar 1987
TL;DR: Information retrieval has all the elements of a classical decision problem: a set of possible actions, a set of potential states, and a reward or utility attached to each combination of action and state.
Abstract: Information retrieval has all the elements of a classical decision problem: a set of possible actions, a set of potential states, and a reward or utility attached to each combination of action and state. How the actions, states, and utilities are described, however, is variable, and depends very much on the describer's point of view.
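
In the classical decision-theoretic formulation the abstract alludes to, retrieving or rejecting a document d is the action that maximizes expected utility over the unknown relevance state:

    \[
      a^{*} = \arg\max_{a \in \{\text{retrieve},\,\text{reject}\}}
      \sum_{s \in \{\text{rel},\,\text{nonrel}\}} P(s \mid d)\, u(a, s)
    \]

Different descriptions of the actions, states, and utilities then yield different retrieval rules from this same template.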

Proceedings ArticleDOI
01 Nov 1987
TL;DR: The Indexing Aid System is described and illustrated using an extended example, highlighting the knowledge-based capabilities of the system, namely, inheritance and internal retrieval, enforcement of restrictions, and other functions implemented by procedural attachments, which are characteristic of frame-based knowledge representation languages.
Abstract: This report discusses the Indexing Aid Project for conducting research in interactive knowledge-based indexing of the medical literature. After providing an overview and background, we describe and illustrate the Indexing Aid System using an extended example, highlighting the knowledge-based capabilities of the system, namely, inheritance and internal retrieval, enforcement of restrictions, and other functions implemented by procedural attachments, which are characteristic of frame-based knowledge representation languages. A feature which generates reports for evaluating the system is also shown. The paper concludes with discussion of the research plan. The project is part of the Automated Classification and Retrieval Program at the Lister Hill National Center for Biomedical Communications, the research and development arm of the National Library of Medicine.
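
As a reminder of what slot inheritance in a frame language buys, here is a toy sketch (not the Indexing Aid System's actual representation language; the example frames are invented):

    class Frame:
        """Minimal frame with slot inheritance from a parent frame."""
        def __init__(self, name, parent=None, **slots):
            self.name, self.parent, self.slots = name, parent, slots

        def get(self, slot):
            if slot in self.slots:
                return self.slots[slot]
            if self.parent is not None:
                return self.parent.get(slot)   # inherit from the parent
            raise KeyError(slot)

    disease = Frame("Disease", site="unspecified")
    hepatitis = Frame("Hepatitis", parent=disease, site="liver")
    hepatitis_a = Frame("HepatitisA", parent=hepatitis)
    print(hepatitis_a.get("site"))  # 'liver', inherited from Hepatitis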

Proceedings ArticleDOI
01 Nov 1987
TL;DR: It is found that engineering majors exhibit academic background and personality characteristics most like those of skilled searchers and programmers, with contrasting patterns or no discernible patterns in English and psychology majors.
Abstract: The population using information retrieval systems is becoming increasingly diverse. We find a wide range of skills in ability to use these systems; this diverse population must be accommodated by the next generation of systems. This paper reports on a study to identify variables related to information retrieval aptitude, based on results from earlier studies of searchers and programmers. A sample of undergraduate subjects from English, psychology, and engineering majors was given a series of psychometric tests and compared to known populations. We find that engineering majors exhibit academic background and personality characteristics most like those of skilled searchers and programmers, with contrasting patterns or no discernible patterns in English and psychology majors. The strength of most associations increases when restricted to subjects who have either stayed in one major or who have changed major only within one disciplinary area. About half the variance in choice of major can be explained by scores on the tests administered, and a comparable amount of variance in test scores can be explained by the academic background variables.

Proceedings ArticleDOI
01 Nov 1987
TL;DR: Methods of integrating personal computers (PCs) into large information systems, with emphasis on effective use of the storage and processing capabilities of these computers, are outlined, noting that caching in this environment poses unique problems.
Abstract: Information retrieval (IR) systems provide individual remote access to centrally managed data. The current proliferation of personal computer systems, as well as advances in storage and communication technology, have created new possibilities for designing information systems which are easily accessible, economical, and responsive to user needs. This paper outlines methods of integrating personal computers (PCs) into large information systems, with emphasis on effective use of the storage and processing capabilities of these computers. In particular we discuss means for caching retrieved data at PC-equipped user sites, noting that caching in this environment poses unique problems. An event-driven simulation program is described which models information system operation. This simulator is being used to examine caching strategies. Some results of these studies are presented.
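
The paper evaluates caching strategies by simulation; as a point of reference, the bookkeeping of one common policy (least recently used) looks like this (a toy sketch, not the simulator's design):

    from collections import OrderedDict

    class LRUCache:
        """Toy cache for records retrieved to a PC-equipped site."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.items = OrderedDict()

        def get(self, key):
            if key not in self.items:
                return None                      # miss: ask the central system
            self.items.move_to_end(key)          # refresh recency
            return self.items[key]

        def put(self, key, value):
            self.items[key] = value
            self.items.move_to_end(key)
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)   # evict least recently used

    cache = LRUCache(2)
    cache.put("doc1", "...")
    cache.put("doc2", "...")
    cache.get("doc1")
    cache.put("doc3", "...")                     # evicts doc2, not doc1
    print(list(cache.items))                     # ['doc1', 'doc3']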