
Showing papers on "Ranking (information retrieval)" published in 1979


Journal ArticleDOI
TL;DR: In this paper, the authors consider the situation where no relevance information is available, that is, at the start of the search, and propose strategies based on a probabilistic model for the initial search and an intermediate search.
Abstract: Most probabilistic retrieval models incorporate information about the occurrence of index terms in relevant and non‐relevant documents. In this paper we consider the situation where no relevance information is available, that is, at the start of the search. Based on a probabilistic model, strategies are proposed for the initial search and an intermediate search. Retrieval experiments with the Cranfield collection of 1,400 documents show that this initial search strategy is better than conventional search strategies both in terms of retrieval effectiveness and in terms of the number of queries that retrieve relevant documents. The intermediate search is shown to be a useful substitute for a relevance feedback search. Experiments with queries that do not retrieve relevant documents at high rank positions indicate that a cluster search would be an effective alternative strategy.

399 citations


Journal ArticleDOI
TL;DR: The use of weights in query representation and document indexing is analysed as a generalization of a Boolean retrieval system, and criteria, including self-consistency, are given for the functions used to evaluate the relevance of records to a specific query.
Abstract: The use of weights to denote a query representation and/or the indexing of a document is analysed as a generalization of a Boolean retrieval system. Criteria are given for the functions used to evaluate the relevance of the records to a specific query, including self-consistency. Various mechanisms suggested in the literature for evaluating the relevance of records with regard to a given query are tested and found to be less than satisfactory. A new approach is suggested to avoid some of the perils of a weighted Boolean retrieval system.

161 citations


Journal ArticleDOI
TL;DR: A new method of document retrieval based on the fundamental operations of fuzzy set theory is presented. After basic notions are introduced, the syntax and semantics of the proposed retrieval language are given, and an algorithm allocating documents to particular queries is described and its properties discussed.
Abstract: The aim of a document retrieval system is to issue documents which contain the information needed by a given user of an information system. The process of retrieving documents in response to a given query is carried out by means of the search patterns of these documents and the query. It is thus clear that the quality of this process, i.e. the pertinence of the information system response to the information need of a given user, depends on the degree of accuracy in which document and query contents are represented by their search patterns. It seems obvious that the weighting of descriptors entering document search patterns improves the quality of the document retrieval process. A mathematical apparatus which takes into consideration, in a natural manner, the fact that the grades of importance of the descriptors in document search patterns are of the continuum type, that is, an apparatus adequate to the description of a retrieval system of documents indexed by weighted descriptors, is, among known mathematical methods, the theory of fuzzy sets, formulated by L. A. Zadeh. It is the aim of this paper to present a new method of document retrieval based on the fundamental operations of the fuzzy set theory. We start by introducing basic notions, then the syntax and semantics of the proposed language for document retrieval will be given and an algorithm allocating documents to particular queries will be described and its properties discussed. The basic advantage of the use of the fuzzy set theory for document retrieval system description is that it takes into consideration, in a simple way, the differentiation of the importance of descriptors in document search patterns and the differentiation of the formal relevance grades of particular documents of an information system to a given query. Documents of the highest grades (in the given information system) of formal relevance to the given query may be retrieved by means of the application of simple operations of the fuzzy set theory.

154 citations
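
The abstract above does not spell out the proposed query language or allocation algorithm, so the following is only a minimal sketch of the general idea it describes: documents indexed by weighted descriptors, queries evaluated with fuzzy set operations, and documents ranked by their grade of formal relevance. It assumes Zadeh's standard operations (minimum for AND, maximum for OR, complement for NOT). The index contents and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy index: document id -> {descriptor: weight in [0, 1]}.
INDEX: Dict[str, Dict[str, float]] = {
    "d1": {"fuzzy": 0.9, "retrieval": 0.6},
    "d2": {"fuzzy": 0.2, "retrieval": 0.8, "boolean": 0.7},
    "d3": {"boolean": 0.9},
}


def grade(weights: Dict[str, float], term: str) -> float:
    """Membership grade of a descriptor in a document (0.0 if absent)."""
    return weights.get(term, 0.0)


def f_and(a: float, b: float) -> float:
    """Fuzzy intersection (assumed: Zadeh minimum)."""
    return min(a, b)


def f_or(a: float, b: float) -> float:
    """Fuzzy union (assumed: Zadeh maximum)."""
    return max(a, b)


def f_not(a: float) -> float:
    """Fuzzy complement (assumed: 1 - a)."""
    return 1.0 - a


def rank(
    index: Dict[str, Dict[str, float]],
    query: Callable[[Dict[str, float]], float],
) -> List[Tuple[str, float]]:
    """Score every document with the query and sort by descending grade."""
    scored = [(doc_id, query(weights)) for doc_id, weights in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def example_query(weights: Dict[str, float]) -> float:
    """Illustrative query: fuzzy AND (retrieval OR boolean)."""
    return f_and(
        grade(weights, "fuzzy"),
        f_or(grade(weights, "retrieval"), grade(weights, "boolean")),
    )


if __name__ == "__main__":
    for doc_id, score in rank(INDEX, example_query):
        print(doc_id, score)
```

With crisp weights of 0 and 1 these operations reduce to ordinary Boolean retrieval, which is why this kind of scheme is usually read as a generalization of it.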



Journal ArticleDOI
TL;DR: In this article, the authors explain the philosophy of the indifference-zone approach and the subset-selection approach to ranking and selection procedures, including examples of operating characteristic curves and data applications for selection problems based on the binomial and normal distributions.
Abstract: This expository paper explains the philosophy of the indifference-zone approach and the subset-selection approach to ranking and selection procedures. It includes examples of operating characteristic curves and data applications for selection problems based on the binomial and normal distributions. A variety of different models and goals are provided with a list of references.

27 citations




01 Dec 1979
TL;DR: An approach to query optimization is described that draws on two sources of knowledge: real world constraints on the values for the application domain served by the database; and knowledge about the current structure of the database and the cost of available retrieval processes.
Abstract: An approach to query optimization is described that draws on two sources of knowledge: real world constraints on the values for the application domain served by the database; and knowledge about the current structure of the database and the cost of available retrieval processes. Real world knowledge is embodied in rules that are much like semantic integrity rules. The approach, called "query rephrasing", is to generate semantic equivalents of user queries that cost less to process than the original queries. The operation of a prototype system based on this approach is discussed in the context of simple queries which restrict a single file. The need for heuristics to limit the generation of equivalent queries is also discussed, and a method using "constraint thresholds" derived from a model of the retrieval process is proposed.

13 citations


Journal ArticleDOI
TL;DR: A model of an information retrieval system based on a thesaurus with weights is described, and inclusiveness and two other fundamental properties of the system are given.
Abstract: This paper describes a model of an information retrieval system based on a thesaurus with weights. Definitions are given for the following terms: thesaurus, document description, information query, similarity of queries and descriptions of documents, similarity measure, and accuracy of response. Inclusiveness and two other fundamental properties of the considered system are given.

8 citations


01 Jan 1979
TL;DR: The use of weights to denote a query representation and/or the indexing of a document is analyzed as a generalization of a Boolean retrieval system.
Abstract: The use of weights to denote a query representation and/or the indexing of a document is analyzed as a generalization of a Boolean retrieval system. Criteria are given for the functions used to evaluate the relevance of the records to a specific query. Various mechanisms for evaluating the relevance of records with regard to a given query are tested and found to be less than satisfactory. A new approach is suggested to avoid some of the perils.

4 citations


Journal ArticleDOI
TL;DR: It is postulated that such a system is capable of satisfactorily resolving the major criticisms of the use of citation data for selection purposes: that libraries are diverse in their interests and that no aggregate list can be more than generally relevant.
Abstract: A design for an on-line serials decision-making and collection analysis system is proposed. It is composed of four basic components: citation data, conventional serial records data, utility/cost ratio compilation and journal ranking techniques, and user interface software. The system would have the ability to respond specifically to user interest profiles and to integrate locally generated data. It is postulated that such a system is capable of satisfactorily resolving the major criticisms of the use of citation data for selection purposes: that libraries are diverse in their interests and that no aggregate list can be more than generally relevant; that the inclusion of cost data is essential and that citation ranking without regard to cost can be misleading; and that other relevant data should be considered as well.
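
The abstract does not define how the utility/cost ratio is compiled, so the sketch below is only an illustration of the point it makes about ranking without regard to cost. It assumes utility is a raw citation count and cost is an annual subscription price. The `Journal` class, its fields, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Journal:
    title: str
    citations: int      # assumed utility measure: citation count
    annual_cost: float  # assumed cost measure: yearly subscription price


def rank_by_citations(journals: List[Journal]) -> List[str]:
    """Rank titles by citation count alone, highest first."""
    ordered = sorted(journals, key=lambda j: j.citations, reverse=True)
    return [j.title for j in ordered]


def rank_by_utility_per_cost(journals: List[Journal]) -> List[str]:
    """Rank titles by citations per unit of cost, highest first."""
    ordered = sorted(
        journals, key=lambda j: j.citations / j.annual_cost, reverse=True
    )
    return [j.title for j in ordered]


# Made-up sample data, for illustration only.
SAMPLE = [
    Journal("Journal A", citations=400, annual_cost=200.0),
    Journal("Journal B", citations=900, annual_cost=900.0),
    Journal("Journal C", citations=150, annual_cost=50.0),
]

if __name__ == "__main__":
    print(rank_by_citations(SAMPLE))
    print(rank_by_utility_per_cost(SAMPLE))
```

In this made-up sample the two orderings are exact reverses of each other, which is the kind of divergence the abstract warns about when cost is left out.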


01 Jul 1979
TL;DR: Using global terms to retrieve blocks of information that would otherwise require five to six specific terms is not recommended unless the specific items of information subsumed under the global term are normally retrieved together frequently.
Abstract: This experiment assessed the impact of using two levels of retrieval terms on formulating and inputting query statements. One level used specific terms to retrieve one element of information; the other level used global terms to retrieve blocks of information that would otherwise require five to six specific terms. Participants were then given a set of 48 problems. For each problem, the participants had to write and type a query statement that would satisfy the information requirements. The opportunity to use global terms had no effect either on the time needed to write query statements or on the accuracy of typed query statements. Where the use of global terms was applicable, substantial savings in the time required to input query statements were shown. Except that the global-specific group reported that it made more use of the data name chart before using the data dictionary, the two groups indicated that they went about writing query statements in approximately the same way. Both groups gave high ratings to the value of using global terms. Use of global terms is not recommended unless the specific items of information subsumed under the global term are normally retrieved together frequently.




01 Jan 1979
TL;DR: A map, drawing or chart is part of the material being photographed and the photographer has followed a definite method in “sectioning” the material.
Abstract: 3. When a map, drawing or chart, etc., is part of the material being photographed, the photographer has followed a definite method in “sectioning” the material. It is customary to begin filming at the upper left hand corner of a large sheet and to continue from left to right in equal sections with small overlaps. If necessary, sectioning is continued again, beginning below the first row and continuing on until complete.