
Showing papers presented at "International ACM SIGIR Conference on Research and Development in Information Retrieval in 1988"


Journal ArticleDOI
01 May 1988
TL;DR: In a new method for automatic indexing and retrieval, implicit higher-order structure in the association of terms with documents is modeled to improve estimates of term-document association, and therefore the detection of relevant documents on the basis of terms found in queries.
Abstract: In a new method for automatic indexing and retrieval, implicit higher-order structure in the association of terms with documents is modeled to improve estimates of term-document association, and therefore the detection of relevant documents on the basis of terms found in queries. Singular-value decomposition is used to decompose a large term-by-document matrix into 50 to 150 orthogonal factors from which the original matrix can be approximated by linear combination; both documents and terms are represented as vectors in a 50- to 150-dimensional space. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents are ordered by their similarity to the query. Initial tests find this automatic method very promising.

411 citations
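The SVD machinery described in the abstract can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the matrix, the query, and the choice of k = 2 factors are made up, and the query is folded in with the usual pseudo-document mapping q · U_k · S_k⁻¹.

```python
import numpy as np

# Toy term-by-document matrix (rows = terms, columns = documents).
A = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
], dtype=float)

k = 2  # retained factors (the paper uses 50 to 150)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# Documents as k-dimensional vectors (rows of V_k).
doc_vecs = Vtk.T

# Fold a query in as a pseudo-document: q_hat = q U_k S_k^-1.
q = np.array([1, 1, 0, 0], dtype=float)  # query containing terms 0 and 1
q_vec = q @ Uk @ np.diag(1.0 / sk)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# Order documents by similarity to the query in the reduced space.
ranking = sorted(range(A.shape[1]), key=lambda j: -cosine(q_vec, doc_vecs[j]))
```

Because similarity is computed in the reduced factor space, a document can score well against a query even when they share no terms, which is the point of modeling the higher-order structure.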


Journal ArticleDOI
01 May 1988
TL;DR: A series of experiments were run using the Cranfield test collection to discover techniques to select terms for lists of suggested terms gathered from feedback, nearest neighbors, and term variants of original query terms that would be effective for further retrieval.
Abstract: In an era of online retrieval, it is appropriate to offer guidance to users wishing to improve their initial queries. One form of such guidance could be short lists of suggested terms gathered from feedback, nearest neighbors, and term variants of original query terms. To verify this approach, a series of experiments were run using the Cranfield test collection to discover techniques to select terms for these lists that would be effective for further retrieval. The results show that significant improvement can be expected from this approach to query expansion.

206 citations
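One of the simplest selection criteria for such suggested-term lists, ranking candidate terms by how often they occur in documents the user has judged relevant, can be sketched as below. The function name, the toy documents, and the frequency criterion are illustrative assumptions; the paper compares a range of selection techniques.

```python
from collections import Counter

def suggest_terms(query_terms, relevant_docs, k=5):
    """Rank candidate expansion terms by frequency in judged-relevant
    documents, excluding the original query terms."""
    counts = Counter(t for doc in relevant_docs for t in doc)
    for t in query_terms:
        counts.pop(t, None)  # never suggest a term the query already has
    return [term for term, _ in counts.most_common(k)]

# Toy feedback documents, tokenized into index terms.
docs = [["wing", "lift", "flow"],
        ["wing", "flow", "boundary"],
        ["lift", "flow"]]
suggestions = suggest_terms({"wing"}, docs, k=2)
```

In an interactive setting the returned list would be shown to the user, who picks terms to add to the query rather than having them added automatically.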


Proceedings Article
01 Apr 1988
TL;DR: In this paper, spreading activation methods have been recommended in information retrieval to expand the search vocabulary and to complement the retrieved document sets, reminiscent of earlier associative indexing and retrieval systems.
Abstract: Spreading activation methods have been recommended in information retrieval to expand the search vocabulary and to complement the retrieved document sets. The spreading activation strategy is reminiscent of earlier associative indexing and retrieval systems. Some spreading activation procedures are briefly described, and evaluation output is given, reflecting the effectiveness of one of the proposed procedures.

205 citations


Proceedings ArticleDOI
01 May 1988
TL;DR: The spreading activation strategy is reminiscent of earlier associative indexing and retrieval systems and is recommended in information retrieval to expand the search vocabulary and to complement the retrieved document sets.
Abstract: Spreading activation methods have been recommended in information retrieval to expand the search vocabulary and to complement the retrieved document sets. The spreading activation strategy is reminiscent of earlier associative indexing and retrieval systems. Some spreading activation procedures are briefly described, and evaluation output is given, reflecting the effectiveness of one of the proposed procedures.

109 citations
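A minimal sketch of the spreading-activation idea: activation starts at query nodes and propagates along weighted association links, accumulating at related terms and documents. The graph, decay factor, and step count here are invented for illustration; the procedures evaluated in the paper differ in detail.

```python
# Hypothetical term-document association graph; link weights in [0, 1].
graph = {
    "query_term": {"docA": 0.9, "related_term": 0.6},
    "related_term": {"docB": 0.8},
    "docA": {},
    "docB": {},
}

def spread(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed nodes through weighted links,
    attenuating by `decay` at each hop."""
    activation = dict(seeds)
    for _ in range(steps):
        nxt = dict(activation)
        for node, act in activation.items():
            for neighbor, weight in graph.get(node, {}).items():
                nxt[neighbor] = nxt.get(neighbor, 0.0) + decay * act * weight
        activation = nxt
    return activation

act = spread(graph, {"query_term": 1.0})
```

Note how docB, which shares no direct link with the query term, still receives activation through the associated term, which is how the strategy complements the directly retrieved document set.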


Proceedings ArticleDOI
01 May 1988
TL;DR: An appropriate indexing approach and the corresponding structure of the AIR/PHYS system are described, and the conditions of the application as well as problems of further development are discussed.
Abstract: Since October 1985, the automatic indexing system AIR/PHYS has been used in the input production of the physics database of the Fachinformationszentrum Karlsruhe, West Germany. The texts to be indexed are abstracts written in English. The system of descriptors is prescribed. For the application of the AIR/PHYS system, a large-scale dictionary containing more than 600,000 word-descriptor and phrase-descriptor relations has been developed. Most of these relations have been obtained by means of statistical and heuristic methods. In consequence, the relation system is rather imperfect. Therefore, the indexing system needs some fault-tolerating features. An appropriate indexing approach and the corresponding structure of the AIR/PHYS system are described. Finally, the conditions of the application as well as problems of further development are discussed.

83 citations


Proceedings ArticleDOI
01 May 1988
TL;DR: This paper describes one approach to the automatic generation of global thesauri, based on the discrimination value model of Salton, Yang, and Yu and on an appropriate clustering algorithm, which has been implemented and applied to two document collections.
Abstract: The importance of a thesaurus in the successful operation of an information retrieval system is well recognized. Yet techniques which support the automatic generation of thesauri remain largely undiscovered. This paper describes one approach to the automatic generation of global thesauri, based on the discrimination value model of Salton, Yang, and Yu and on an appropriate clustering algorithm. This method has been implemented and applied to two document collections. Preliminary results indicate that this method, which produces improvements in retrieval performance in excess of 10 and 15 percent in the test collections, is viable and worthy of continued investigation.

80 citations


Journal ArticleDOI
01 Sep 1988
TL;DR: A theoretical model of a knowledge-based information retrieval system developed in this thesis specifies the requirements and properties of such a system; in particular, a novel term-similarity function could be defined.
Abstract: Information retrieval can be defined as the extraction of specific information out of a great number of stored information items. Information retrieval systems, used for the retrieval of documents, try to answer more or less precise questions about interesting topics with a number of suitable documents or references to documents. Such systems should contain 'knowledge' about the meaning of questions, about the content of the stored information, and about the particular user's needs for information. Knowledge-based systems claim to be able to store knowledge and draw conclusions from it. The goal of this thesis is to investigate the use of knowledge-based methods and technologies for information retrieval. A knowledge-based information retrieval system should represent its information structures, as well as knowledge, in a common knowledge representation formalism. The retrieval process of the system should employ the inferential methods of the chosen knowledge representation formalism. A subset of first-order logic is chosen for this thesis to represent knowledge. Specially designed retrieval rules represent knowledge for the purpose of retrieval. Retrieval rules capture knowledge about the user's vocabulary, his working domain, and his way of performing document retrieval. The problem of recall and precision of the answers of an information retrieval system is approached by an explicit representation of control knowledge. A theoretical model of a knowledge-based information retrieval system developed in this thesis specifies the requirements and properties of such a system. In particular, a novel term-similarity function could be defined. Properties such as completeness and termination could be derived, and bounds on the overhead of false control strategies could be investigated. The proposed model is implemented in a prototype of a knowledge-based information retrieval system, called KIR. KIR is a single-user system for personal document and knowledge retrieval running on computer workstations. It is implemented using Prolog and Modula-2.

68 citations


Proceedings ArticleDOI
01 May 1988
TL;DR: An investigation of whether linguistic processes can be used as part of a document retrieval strategy, by predefining a level of syntactic analysis of user queries only, suggests that the approach of using linguistic processing in retrieval is valid.
Abstract: Traditional information retrieval has relied on the extensive use of statistical parameters in the implementation of retrieval strategies. This paper sets out to investigate whether linguistic processes can be used as part of a document retrieval strategy. This is done by predefining a level of syntactic analysis of user queries only, to be used as part of the retrieval process. A large series of experiments on an experimental test collection is reported, using a parser for noun phrases as part of the retrieval strategy. The results obtained from the experiments do yield improvements in the level of retrieval effectiveness; given the crude linguistic process used, and the fact that it was applied to queries and not to document texts, this suggests that the approach of using linguistic processing in retrieval is valid.

65 citations


Proceedings ArticleDOI
01 May 1988
TL;DR: The experimental results seem to demonstrate that the model provides a useful framework for the design of an adaptive system and a practical procedure to determine the linear decision function.
Abstract: Based on the concept of user preference, we investigate the linear structure in information retrieval. We also discuss a practical procedure to determine the linear decision function and present an analysis of term weighting. Our experimental results seem to demonstrate that our model provides a useful framework for the design of an adaptive system.

60 citations


Journal ArticleDOI
01 May 1988
TL;DR: An information retrieval system specifically designed for storing and retrieving information about software components is described; it makes use of developments in natural language research to represent component information in a form which encodes semantics as well as syntax.
Abstract: This paper describes an information retrieval system which is specifically designed to be used for storing and retrieving information about software components. Rather than use a retrieval mechanism which is simply based on keyword descriptions, we have made use of developments in natural language research to represent component information in a form which encodes semantics as well as syntax. We call this the component descriptor frame. The paper describes the basic ideas which underlie our system and describes how it can be used for component information retrieval. An example of the system in use is presented. The version of the system described here has been fully implemented and is now being developed as part of a more general reuse support system.

59 citations


Proceedings ArticleDOI
01 May 1988
TL;DR: Experiments indicate that regression methods can help predict relevance, given query-document similarity values for each concept type, and the role of links is shown to be especially beneficial.
Abstract: This report considers combining information to improve retrieval. The vector space model has been extended so different classes of data are associated with distinct concept types and their respective subvectors. Two collections with multiple concept types are described, ISI-1460 and CACM-3204. Experiments indicate that regression methods can help predict relevance, given query-document similarity values for each concept type. After sampling and transformation of data, the coefficient of determination for the best model was .48 (.66) for ISI (CACM). Average precision for the two collections was 11% (31%) better for probabilistic feedback with all types versus with terms only. These findings may be of particular interest to designers of document retrieval or hypertext systems since the role of links is shown to be especially beneficial.

Proceedings ArticleDOI
01 May 1988
TL;DR: The implicit basis of all information retrieval systems is considered to be a logical implication, and the measure of correspondence between a document and a query is transformed into an estimation of the strength (or certainty) of that logical implication.
Abstract: This paper is a contribution to the construction of a general model for information retrieval. As in the paper of Van Rijsbergen ([RIJ86]), the implicit basis of all information retrieval systems is considered to be a logical implication. The measure of correspondence between a document and a query is transformed into an estimation of the strength (or certainty) of this logical implication. Modal logic is shown to be suitable for representing the behavior of information retrieval systems. In existing information retrieval models, several aspects are often mixed; part of this paper is devoted to separating these aspects to give a clearer view of information retrieval systems. The general model is also compared with some existing models to show its generality.

Proceedings ArticleDOI
01 May 1988
TL;DR: This paper describes some aspects of a project with the aim of developing a user-friendly interface to a classical Information Retrieval (IR) System in order to improve the effectiveness of retrieval.
Abstract: This paper describes some aspects of a project whose aim is to develop a user-friendly interface to a classical Information Retrieval (IR) system in order to improve the effectiveness of retrieval. The character-by-character approach to IR has been abandoned in favor of an approach based on the meaning of both the queries and the texts containing the information to be sought. The concept space, locally derived from a thesaurus, is used to represent a query, as well as retrieved documents, in atomic concept units. Dependencies between the search terms are taken into account. The meanings of the query and the retrieved documents (results of Elementary Logical Conjuncts (ELCs)) are compared. The ranking method on the semantic level is used in connection with existing data of a classical IR system. The user enters queries without using complex Boolean expressions.

Proceedings ArticleDOI
01 May 1988
TL;DR: The theory of rough sets, which allows us to classify objects into sets of equivalent members based on their attributes, is introduced and compared to the Boolean, vector and fuzzy models of information retrieval.
Abstract: The theory of rough sets was introduced in [PAWLAK82]. It allows us to classify objects into sets of equivalent members based on their attributes. We may then examine any combination of the same objects (or even their attributes) using the resultant classification. The theory has direct applications in the design and evaluation of classification schemes and the selection of discriminating attributes; Pawlak's papers discuss its application in the domain of medical diagnostic systems. Here we apply it to the design of information retrieval systems accessing collections of documents. Advantages offered by the theory are: the implicit inclusion of Boolean logic; term weighting; and the ability to rank retrieved documents. In the first section we describe the theory, derived from [PAWLAK84, PAWLAK82] and including only its most relevant aspects. In the second section we apply it to information retrieval: we design the approximation space and search strategies, and illustrate the application of relevance feedback to improve document indexing. In section three we compare the rough set formalism to the Boolean, vector, and fuzzy models of information retrieval. Finally, we present a small-scale evaluation of rough sets which indicates its potential in information retrieval.
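The core construction, partitioning documents into indiscernibility classes by their attribute descriptions and then bounding a target set by lower and upper approximations, can be sketched as follows. The documents and index terms are invented for illustration.

```python
# Documents described by their assigned index terms; documents with
# identical descriptions are indiscernible and form one equivalence class.
docs = {
    "d1": frozenset({"ir", "logic"}),
    "d2": frozenset({"ir", "logic"}),
    "d3": frozenset({"ir"}),
    "d4": frozenset({"db"}),
}

def classes(descriptions):
    """Partition objects into equivalence classes of identical descriptions."""
    by_desc = {}
    for doc, desc in descriptions.items():
        by_desc.setdefault(desc, set()).add(doc)
    return list(by_desc.values())

def approximations(target, eq_classes):
    """Lower approximation: classes wholly inside the target set.
    Upper approximation: classes that intersect the target set."""
    lower = set().union(*([c for c in eq_classes if c <= target] or [set()]))
    upper = set().union(*([c for c in eq_classes if c & target] or [set()]))
    return lower, upper

relevant = {"d1", "d3"}
lower, upper = approximations(relevant, classes(docs))
```

Here d1 is judged relevant but is indiscernible from d2, so d1 falls only in the upper approximation: the indexing cannot certainly distinguish it, which is exactly the situation relevance feedback on the indexing is meant to repair.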

Proceedings ArticleDOI
01 May 1988
TL;DR: Several methods are presented which efficiently compress concordances of large full-text retrieval systems and yield savings of up to 49% relative to the non-compressed file.
Abstract: The concordance of a full-text information retrieval system contains, for every different word W of the database, a list L(W) of “coordinates”, each of which describes the exact location of an occurrence of W in the text. The concordance should be compressed, not only for the savings in storage space, but also in order to reduce the number of I/O operations, since the file is usually kept in secondary memory. Several methods are presented which efficiently compress concordances of large full-text retrieval systems. The methods were tested on the concordance of the Responsa Retrieval Project and yield savings of up to 49% relative to the non-compressed file; this is a relative improvement of about 27% over the currently used prefix-omission compression technique.
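Since the coordinates in each list are sorted, a standard way to exploit that (offered as background, not necessarily one of the paper's methods) is to store the gaps between successive coordinates in a variable-length byte code, so that the small gaps which dominate a frequent word's list take a single byte each:

```python
def vbyte_encode(coords):
    """Gap-encode a sorted coordinate list, then emit each gap in a
    variable-byte code: 7 data bits per byte, high bit marks the last byte."""
    out = bytearray()
    prev = 0
    for c in coords:
        gap, prev = c - prev, c
        chunk = []
        while True:
            chunk.insert(0, gap & 0x7F)
            gap >>= 7
            if gap == 0:
                break
        chunk[-1] |= 0x80  # terminator flag on the final byte of each gap
        out.extend(chunk)
    return bytes(out)

def vbyte_decode(data):
    coords, cur, prev = [], 0, 0
    for b in data:
        cur = (cur << 7) | (b & 0x7F)
        if b & 0x80:
            prev += cur
            coords.append(prev)
            cur = 0
    return coords

coords = [3, 7, 11, 900000]
encoded = vbyte_encode(coords)
```

The four coordinates above occupy 6 bytes instead of 16 as fixed 32-bit integers; decoding is a single sequential pass, which matters when the concordance lives in secondary memory.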

Proceedings ArticleDOI
01 May 1988
TL;DR: This work provides an analysis of areas in which natural language and information retrieval come together, and describes a system that joins the two fields by combining technology, choice of application area, and knowledge acquisition techniques.
Abstract: Neither natural language processing nor information retrieval is any longer a young field, but the two areas have yet to achieve a graceful interaction. Mainly, the reason for this incompatibility is that information retrieval technology depends upon relatively simple but robust methods, while natural language processing involves complex knowledge-based systems that have never approached robustness. We provide an analysis of areas in which natural language and information retrieval come together, and describe a system that joins the two fields by combining technology, choice of application area, and knowledge acquisition techniques.

Proceedings ArticleDOI
01 May 1988
TL;DR: The approach to plausible inference for retrieval is explained, and preliminary experiments that test it using a spreading activation search to implement the plausible inference process show that significant effectiveness improvements are possible.
Abstract: Choosing an appropriate document representation and search strategy for document retrieval has been largely guided by achieving good average performance instead of optimizing the results for each individual query. A model of retrieval based on plausible inference gives us a different perspective and suggests that techniques should be found for combining multiple sources of evidence (or search strategies) into an overall assessment of a document's relevance, rather than attempting to pick a single strategy. In this paper, we explain our approach to plausible inference for retrieval and describe some preliminary experiments designed to test this approach. The experiments use a spreading activation search to implement the plausible inference process. The results show that significant effectiveness improvements are possible using this approach.

Proceedings ArticleDOI
01 May 1988
TL;DR: A number of experiments are reported on in which a connectionist simulator is used to support similarity-based reasoning in a frame representation to draw some tentative, mixed conclusions on the potential for a union of KR, IR, and connectionism.
Abstract: Knowledge Representation (KR) systems provide support for Artificial Intelligence systems that reason about relationships between objects in their domains of expertise. Because of their support for inference, KR systems appear to have potential to enrich the kind of retrievals that IR systems might make. Ironically, however, the most useful KR systems are limited to reasoning based on a rigid notion of validity, and thus are awkward to use when relevant but inexact retrievals are desired. We have been exploring the potential of a “connectionist” model—the Boltzmann Machine—to overcome this limitation. We report on a number of experiments in which we use a connectionist simulator to support similarity-based reasoning in a frame representation. We draw some tentative, mixed conclusions on the potential for a union of KR, IR, and connectionism.

Journal ArticleDOI
01 May 1988
TL;DR: This paper is in two parts, following the suggestion that I first comment on my own past experience in information retrieval, and then present my views on the present and future.
Abstract: This paper is in two parts, following the suggestion that I first comment on my own past experience in information retrieval, and then present my views on the present and future.

Proceedings ArticleDOI
E. Wilson1
01 May 1988
TL;DR: A prototype information retrieval system for lawyers, Justus, has been developed on a Sun workstation to run in a Guide hypertext environment and incorporates primary legal sources and secondary sources, such as textbooks and a dictionary.
Abstract: A prototype information retrieval system for lawyers, Justus, has been developed on a Sun workstation to run in a Guide hypertext environment. The hypertext database is created automatically by Justus from machine readable versions of the ordinary printed texts, ideally the publisher's typesetting tapes. The database incorporates primary legal sources, such as statutes and cases, and secondary sources, such as textbooks and a dictionary. Initially, the lawyer may select any document in the system. From this initial document, he may access any other document, or part of any other document, to which reference is made. Reference selection is by a pointing device, such as a mouse. There is no limit on the number of selections that can be made, and no restrictions on the path through the system.

Proceedings ArticleDOI
Nick Belkin1
01 May 1988
TL;DR: A general model of clarity in human-computer systems, of which explanation is one component, is proposed, and a model for explanation by the computer intermediary in information retrieval is proposed.
Abstract: We discuss the complexity of explanation activity in human-human goal-directed dialogue, and suggest that this complexity ought to be taken account of in the design of explanation in human-computer interaction. We propose a general model of clarity in human-computer systems, of which explanation is one component. On the basis of this model, of a model of human-intermediary interaction in the document retrieval situation as one of cooperative model-building for the purpose of developing an appropriate search formulation, and of the results of empirical observation of human user-human intermediary interaction in information systems, we propose a model for explanation by the computer intermediary in information retrieval.

Proceedings ArticleDOI
01 May 1988
TL;DR: The overall organization of the IR-NLI II system is presented, together with a short description of the two main modules implemented so far, namely the Information Retrieval Expert Subsystem and the User Modeling Subsystem.
Abstract: This paper addresses the problem of building expert interfaces to information retrieval systems. In particular, the problem of augmenting the capabilities of such interfaces with user modeling features is discussed and the main benefits of this approach are outlined. The paper presents a prototype system called IR-NLI II, devoted to model by means of artificial intelligence techniques the human intermediary to information retrieval systems. The overall organization of the IR-NLI II system is presented, together with a short description of the two main modules implemented so far, namely the Information Retrieval Expert Subsystem and the User Modeling Subsystem. An example of interaction with IR-NLI II is described. Perspectives and future research directions are finally outlined.

Proceedings ArticleDOI
01 May 1988
TL;DR: An overview of the ongoing research in the Active Data Bases project at the Vrije Universiteit, Amsterdam is given, which is specifying and building a system that helps a user in his search for useful and interesting information in large, complex information systems.
Abstract: This paper gives an overview of the ongoing research in the Active Data Bases project at the Vrije Universiteit, Amsterdam. In this project we are specifying and building a system that helps a user in his search for useful and interesting information in large, complex information systems. The system is able to do this, because it learns from the interaction about the users and the data it contains. The indications of the users are expressed in terms of interests in the data, which serve as building blocks for user and data models. These models are then used to improve the search for interesting data.

Journal ArticleDOI
01 May 1988
TL;DR: IRX as mentioned in this paper is a text retrieval system designed to be a testbed for conducting information retrieval research on statistically-based retrieval strategies in either batch or interactive modes, and is used at the Johns Hopkins University and the Lister Hill Center providing access to databases in human and molecular genetics.
Abstract: IRX is a text retrieval system designed to be a testbed for conducting information retrieval research on statistically-based retrieval strategies in either batch or interactive modes. The modular structure of IRX has permitted major changes in components of the system (e.g., ranking algorithms, parsers, interfaces) without redesign. As an interactive system, IRX is in use at the Johns Hopkins University and the Lister Hill Center, providing access to databases in human and molecular genetics.

Proceedings ArticleDOI
01 May 1988
TL;DR: This paper examines retrieval system evaluation from the perspective of the user and presents evaluation procedures which are appropriate to this perspective and which can be used to isolate the effect of variation in the user interface to the system.
Abstract: Planning the evaluation of an information retrieval system involves two steps: first, a determination of performance descriptors and measures appropriate to the system objectives and, secondly, a development of an evaluation design which ensures the effect of variation in components of interest will be isolated and assessed in an unbiased fashion. This paper examines the question of retrieval system evaluation from the perspective of the user. It presents evaluation procedures which are appropriate to this perspective and which can be used to isolate the effect of variation in the user interface to the system. The general procedure is exemplified by an application to evaluation of an experimental OPAC interface.

Proceedings ArticleDOI
01 May 1988
TL;DR: The basic principles of this approach are presented and compared to more conventional solutions providing only limited extensions and the implementation aspects related to the approach are discussed to show that reasonable performances can be expected.
Abstract: Up to now, most retrieval systems have been founded on a Boolean selection mechanism. It appears that this approach is not powerful enough to deal with some applications, especially when the size (number) of the results must be controlled. In that case, some kind of flexibility is needed in query expression. In this paper, we suggest the use of a fuzzy-sets-based approach. The basic principles of this approach are presented and compared to more conventional solutions providing only limited extensions. Moreover, the implementation aspects related to our approach are discussed to show that reasonable performance can be expected.
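A minimal sketch of fuzzy query evaluation, using the standard connectives min for AND, max for OR, and 1 − x for NOT, shows how graded term weights turn a Boolean query into a ranking. The tiny index and query are made up; the paper's proposal goes beyond these basic connectives.

```python
# Toy index: membership degree of each document in each term's fuzzy set.
index = {
    "d1": {"database": 0.9, "fuzzy": 0.2},
    "d2": {"database": 0.4, "fuzzy": 0.8},
}

def evaluate(query, weights):
    """Evaluate a small query tree against one document's term weights."""
    op = query[0]
    if op == "term":
        return weights.get(query[1], 0.0)
    if op == "and":
        return min(evaluate(sub, weights) for sub in query[1:])
    if op == "or":
        return max(evaluate(sub, weights) for sub in query[1:])
    if op == "not":
        return 1.0 - evaluate(query[1], weights)
    raise ValueError(f"unknown operator: {op}")

q = ("and", ("term", "database"), ("term", "fuzzy"))
ranked = sorted(index, key=lambda d: -evaluate(q, index[d]))
```

Unlike crisp Boolean retrieval, every document gets a score in [0, 1], so the result set can be cut at any desired size, which is precisely the control over result size the abstract motivates.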

Proceedings ArticleDOI
01 May 1988
TL;DR: Two methods are given to improve weighting schemes by using relevance information of a set of queries to estimate parameter values of two independence models in information retrieval — the binary independence model and the non-binary independence model.
Abstract: Two methods are given to improve weighting schemes by using relevance information of a set of queries. The first method is to estimate parameter values of two independence models in information retrieval — the binary independence model and the non-binary independence model. The parameters estimated here are used to calculate optimal weights for terms in a different set of queries. Performance of this estimation is compared to the inverse document frequency method, the cosine measure, and the statistical similarity measure. The second method is to learn optimal weights of the non-binary independence model adaptively by a learning formula. Experiments are performed on three different document collections CISI, MEDLARS, and CRN4NUL for both methods, and results are reported. Both methods show improvements compared to the existing weighting schemes. Experimental results show that the second method gives slightly better performance than the first one, and has simpler implementation.
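For the binary independence model, the standard relevance weight estimated from feedback data (with the usual 0.5 point correction) has the Robertson/Sparck Jones form below. This is the textbook formula, offered as context; the paper's exact estimation procedures may differ.

```python
import math

def rsj_weight(r, R, n, N, c=0.5):
    """Term relevance weight under the binary independence model.
    r = relevant docs containing the term, R = total relevant docs,
    n = docs containing the term, N = collection size, c = correction."""
    return math.log(((r + c) * (N - n - R + r + c)) /
                    ((R - r + c) * (n - r + c)))
```

With no relevance information (r = R = 0) the weight reduces to an idf-like quantity, log((N − n + c)/(n + c)), which is why such weights are naturally compared against the inverse document frequency baseline mentioned in the abstract.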

Proceedings ArticleDOI
P. Simpson1
01 May 1988
TL;DR: A normal form for expression of queries is defined, it is shown that such queries can be automatically produced, if necessary, from a natural-language request for information, and algorithms for translating such queries into equivalent queries on both Boolean and term-vector type retrieval systems are given.
Abstract: The concept of a large-scale information retrieval network incorporating heterogeneous retrieval systems and users is introduced, and the necessary components for enabling term-based searching of any database by untrained end-users are outlined. We define a normal form for expression of queries, show that such queries can be automatically produced, if necessary, from a natural-language request for information, and give algorithms for translating such queries, with little or no loss of expressiveness, into equivalent queries on both Boolean and term-vector type retrieval systems. We conclude with a proposal for extending this approach to arbitrary database models.

Proceedings ArticleDOI
F. Hirabayashi1, H. Matoba1, Y. Kasahara1
01 May 1988
TL;DR: An image retrieval experiment shows that the proposed internal representation and mapping method and the user interface design provide effective tools for information retrieval.
Abstract: Proposed here is an internal representation and mapping method for multimedia information in which retrieval is based on the impression documents are intended to make. A user interface design for a system using this method is also proposed. The proposed internal representation and mapping method represents each desired document impression as an axis in a semantic space. Documents are represented as points in the space. Queries are represented as subspaces. The proposed user interface design employs a method of visual presentation of the semantic space. For evaluation purposes, a prototype system has been developed. An image retrieval experiment shows that the proposed internal representation and mapping method and the user interface design provide effective tools for information retrieval.

Proceedings ArticleDOI
01 May 1988
TL;DR: A mathematical framework for phonographic correction is proposed by defining a similarity relation between phonetically related substrings and a dissimilarity index between strings and providing a simple and efficient algorithm for recognizing words in dictionaries from misspelt inputs including both typographical and phonographic errors.
Abstract: In this paper, we point out that, in applications available to the general public, and/or natural language interfaces, the correction of phonographic errors (which are competence errors) is far more important than the correction of typographical errors (which are simply performance errors). Many studies aimed at the correction of typographical errors have been carried out, but relatively few tackle the problem of phonographic correction, and they are generally based on more or less ad hoc methods. We propose a mathematical framework for phonographic correction by defining a similarity relation between phonetically related substrings and a dissimilarity index between strings. We also provide a simple and efficient algorithm for recognizing words in dictionaries from misspelt inputs including both typographical and phonographic errors.
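The flavor of such a correction scheme can be sketched as a weighted edit distance in which substitutions between phonetically related characters cost less than arbitrary ones. The similarity table and costs here are invented toy values, and the paper defines the relation over substrings rather than single characters.

```python
# Toy table of phonetically related character pairs (illustrative only).
RELATED = {frozenset(p) for p in [("f", "p"), ("c", "k"), ("s", "z")]}

def sub_cost(a, b):
    """Substitutions between related characters are cheap; others cost 1."""
    if a == b:
        return 0.0
    return 0.3 if frozenset((a, b)) in RELATED else 1.0

def distance(s, t):
    """Dynamic-programming edit distance with phonetically weighted
    substitutions; insertions and deletions cost 1."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]))
    return d[m][n]
```

Under this measure a phonographic misspelling like "kat" for "cat" lands much closer to the dictionary entry than an arbitrary typo, so dictionary lookup by smallest distance recovers the intended word.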