
Showing papers on "Ranking (information retrieval)" published in 1986



Proceedings ArticleDOI
01 Sep 1986
TL;DR: It is shown that significant improvements over no term weighting can be made using a combination of weighting measures and normalization for document length; the ability to effectively rank retrieved documents in order of their probable relevance to a query is a critical factor in statistically-based keyword retrieval systems.
Abstract: The ability to effectively rank retrieved documents in order of their probable relevance to a query is a critical factor in statistically-based keyword retrieval systems. This paper summarizes a set of experiments with different methods of term weighting for documents, using measures of term importance within an entire document collection, term importance within a given document, and document length. It is shown that significant improvements over no term weighting can be made using a combination of weighting measures and normalizing for document length.
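
As a rough illustration of the kind of combined weighting the abstract describes (collection-level term importance, within-document importance, and document-length normalization), the sketch below scores documents with an idf-times-tf weight divided by a length factor. The formula and names are illustrative assumptions, not the paper's exact weighting scheme.

    import math
    from collections import Counter

    def score(query_terms, doc_tokens, doc_freq, num_docs):
        """Toy combined weight: collection-level importance (idf) times
        within-document importance (tf), normalized by document length.
        Illustrative assumptions only, not the paper's exact weighting."""
        tf = Counter(doc_tokens)
        length_norm = math.sqrt(len(doc_tokens)) if doc_tokens else 1.0
        total = 0.0
        for term in query_terms:
            if term in tf and doc_freq.get(term, 0) > 0:
                idf = math.log(num_docs / doc_freq[term])
                total += idf * tf[term]
        return total / length_norm

    # Documents would then be ranked for a query by descending score().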

92 citations


Journal ArticleDOI
TL;DR: A Bayesian argument is used to suggest modifications to the Robertson/Sparck Jones relevance weighting formula, to accommodate the addition to the query of terms taken from the relevant documents identified during the search.
Abstract: A Bayesian argument is used to suggest modifications to the Robertson/Sparck Jones relevance weighting formula, to accommodate the addition to the query of terms taken from the relevant documents identified during the search.
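
For context, the baseline Robertson/Sparck Jones relevance weight that the paper's Bayesian argument modifies is commonly written (with the usual 0.5 correction) as

$w_t = \log \frac{(r + 0.5)\,(N - n - R + r + 0.5)}{(R - r + 0.5)\,(n - r + 0.5)}$

where N is the number of documents in the collection, n the number of documents indexed by term t, R the number of known relevant documents, and r the number of relevant documents indexed by t. The modified forms for terms added to the query from the identified relevant documents are given in the paper itself.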

77 citations


Journal ArticleDOI
Pekka Korhonen
TL;DR: This interactive approach is designed to help the decision maker find the most preferred aggregation of the kth-level criteria, which produces the criteria at the (k+1)st level.

45 citations


Journal ArticleDOI
TL;DR: A prototype front-end system, Cirt, which permits weighting, ranking and relevance feedback on a traditional IR system, Data-Star, is described and discussed, together with a project currently under way to evaluate Cirt against traditional retrieval methods under operational conditions.
Abstract: A prototype front-end system, Cirt, which permits weighting, ranking and relevance feedback on a traditional IR system, Data-Star, is described and discussed. Cirt is based on an integrated theory of search term weighting, document ranking and modification of weights based on relevance feedback. Previous laboratory tests on various aspects of the theory have led to the need for further evaluation in an operational environment; the intention of Cirt is to make such evaluation possible. The operating environment is described and the design of the system is discussed, including the machine-to-machine host/front-end dialogue, the user interface and some aspects of the programming. A project currently under way to evaluate Cirt against traditional retrieval methods under operational conditions is described. The article concludes with a brief word on the future prospects for this type of retrieval.

38 citations


Proceedings ArticleDOI
01 Sep 1986
TL;DR: A project which attempts to classify representations of the anomalous states of knowledge of users of document retrieval systems on the basis of structural characteristics of the representations, and which specifies different retrieval strategies and ranking mechanisms for each ASK class.
Abstract: We report on a project which attempts to classify representations of the anomalous states of knowledge (ASKs) of users of document retrieval systems on the basis of structural characteristics of the representations, and which specifies different retrieval strategies and ranking mechanisms for each ASK class. The classification and retrieval strategy specification is based on 53 real problem statements, 35 of which have a total of 250 evaluated documents. Four facets of the ASK structures have been tentatively identified, whose combinations determine the method and order of application of five basic ranking strategies. This work is still in progress, so results presented here are incomplete.

38 citations


Proceedings Article
25 Aug 1986
TL;DR: A new type of database information, called completeness information, is introduced to describe the subsets of the database for which the Closed World Assumption is correct, making it possible to determine whether each answer to a user query is complete or whether any subsets of it are complete.
Abstract: The assumption that a database includes a representation of every occurrence in the real world environment that it models (the Closed World Assumption) is frequently unrealistic, because it is always made on the database as a whole. This paper introduces a new type of database information, called completeness information, to describe the subsets of the database for which this assumption is correct. With completeness information it is possible to determine whether each answer to a user query is complete, or whether any subsets of it are complete. To users, answers which are accompanied by a statement about their completeness are more meaningful. First, the principles of completeness information are defined formally, using an abstract data model. Then, specific methods are described for implementing completeness information in the relational model. With these methods, each relational algebra query can be accompanied with an instantaneous verdict on its completeness (or on the completeness of some of its subsets).

29 citations


Proceedings ArticleDOI
01 Sep 1986
TL;DR: An information retrieval model, named the Generalized Vector Space Model (GVSM), is extended to handle situations where queries are specified as Boolean expressions and it is shown that this unified model has the advantage of incorporating term correlations into the retrieval process.
Abstract: An information retrieval model, named the Generalized Vector Space Model (GVSM), is extended to handle situations where queries are specified as (extended) Boolean expressions. It is shown that this unified model, unlike currently available alternatives, has the advantage of incorporating term correlations into the retrieval process. The query language extension is attractive in the sense that most of the algebraic properties of the strict Boolean language are still preserved. Although the experimental results for extended Boolean retrieval are not always better than the vector processing method, the developments here are significant in helping commercially available retrieval systems benefit from vector-based methods. The proposed scheme is compared to the p-norm model advanced by Salton and coworkers. An important conclusion is that it is desirable to investigate further extensions that can offer the benefits of both proposals.
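
As a point of reference for the p-norm model mentioned in the comparison, the widely cited p-norm scoring rules for unweighted OR and AND clauses over per-term document weights can be sketched as follows. This is a simplified form without query-term weights and is an assumption for illustration, not this paper's GVSM extension.

    def pnorm_or(weights, p):
        """p-norm score for an OR clause over term weights in [0, 1]."""
        n = len(weights)
        return (sum(w ** p for w in weights) / n) ** (1.0 / p)

    def pnorm_and(weights, p):
        """p-norm score for an AND clause over term weights in [0, 1]."""
        n = len(weights)
        return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)

    # p = 1 reduces both clauses to a simple average (pure vector behavior);
    # as p grows large the scores approach strict Boolean max/min behavior.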

22 citations


Journal ArticleDOI
TL;DR: This paper proposes a linear ordering among AVL-trees by integer-pair sequences, called LDP-sequences, and finds that ranking and unranking can be done in $O(n\log^2 n)$ and $O(n\log^3 n)$ time, respectively, after a preprocessing step that takes $O(n^2 \log n)$ time.
Abstract: In this paper, we consider the problem of generating, ranking, and unranking of AVL-trees with n leaves. We represent AVL-trees by integer-pair sequences, called LDP-sequences. Then we propose a linear ordering among these sequences, i.e., among the AVL-trees. The problem of ranking is to determine the order number (rank) of a given tree in this ordering; unranking means constructing the tree of a given rank. The main result is that ranking and unranking can be done in $O(n\log^2 n)$ and $O(n\log^3 n)$ time, respectively, after a preprocessing step that takes $O(n^2 \log n)$ time.

16 citations


Journal ArticleDOI
TL;DR: A procedure for evaluating a software prototype consists of identifying evaluation criteria, defining alternative design approaches, and ranking the alternatives according to the criteria.
Abstract: A procedure for evaluating a software prototype is presented. The need to assess the prototype itself arises from the use of prototyping to demonstrate the feasibility of a design or development strategy. The assessment procedure can also be of use in deciding whether to evolve a prototype into a complete system. The procedure consists of identifying evaluation criteria, defining alternative design approaches, and ranking the alternatives according to the criteria.
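
A minimal sketch of the final step (ranking alternative design approaches against the chosen criteria) using a simple weighted-sum score; the criteria names, weights, and scores below are hypothetical, and the paper's own procedure may rank differently.

    def rank_alternatives(alternatives, weights):
        """alternatives: {name: {criterion: score}}, weights: {criterion: weight}.
        Returns names sorted from best to worst by weighted-sum score."""
        def total(scores):
            return sum(weights[c] * scores.get(c, 0.0) for c in weights)
        return sorted(alternatives, key=lambda name: total(alternatives[name]), reverse=True)

    # Hypothetical example: two prototype strategies scored on three
    # illustrative criteria.
    ranking = rank_alternatives(
        {"evolve prototype": {"performance": 3, "maintainability": 4, "cost fit": 5},
         "rebuild from scratch": {"performance": 5, "maintainability": 5, "cost fit": 2}},
        {"performance": 0.4, "maintainability": 0.4, "cost fit": 0.2},
    )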

13 citations


01 Jan 1986
TL;DR: A chemical ranking and scoring (CRS-Korea) system was developed and proposed for use as the first step in prioritizing toxic chemicals for monitoring and for the detailed risk assessment that might follow as necessary, as discussed by the authors.
Abstract: A chemical ranking and scoring (CRS-Korea) system was developed and proposed for use as the first step in prioritizing toxic chemicals for monitoring and for the detailed risk assessment that might follow as necessary. The CRS-Korea system adopts the basic concept of risk assessment (both human health risk and ecological risk) in that the risk score is determined by the product of a toxicity score and an exposure score. Included in the toxicity category are acute toxicity, chronic/sub-chronic toxicity, carcinogenicity, and other toxicity. The exposure category consists of the quantity released to the environment, bioconcentration, and persistence. A consistent scheme and a comprehensive chemical database are offered in the CRS-Korea system to calculate a score for each component in the two categories by using specific physicochemical, fate, and toxic properties and the quantity of the chemical used. The toxicity score is obtained by adding up all the individual scores for the components in the toxicity category. The exposure score is determined by multiplying the score for the quantity released by the sum of the persistence score and the bioconcentration score. Equal weight is given to the toxicity score and the exposure score. As the CRS-Korea system was applied to identify 50 national priority chemicals, it was found that a significant data gap exists on toxicity and fate properties and that
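
The scoring arithmetic described above can be condensed into a short sketch; the component names follow the abstract, but the score scales and example values are made up for illustration and are not those of the actual CRS-Korea system.

    def crs_korea_risk_score(acute, chronic, carcinogenicity, other_tox,
                             quantity_released, persistence, bioconcentration):
        """Risk score = toxicity score * exposure score (equal weight to both).
        Toxicity score: sum of the individual toxicity component scores.
        Exposure score: quantity-released score * (persistence + bioconcentration).
        Score scales here are illustrative assumptions."""
        toxicity_score = acute + chronic + carcinogenicity + other_tox
        exposure_score = quantity_released * (persistence + bioconcentration)
        return toxicity_score * exposure_score

    # Hypothetical example on a 0-5 scale per component:
    risk = crs_korea_risk_score(acute=3, chronic=2, carcinogenicity=4, other_tox=1,
                                quantity_released=4, persistence=3, bioconcentration=2)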

Journal ArticleDOI
TL;DR: An interactive computer package of ranking and selection procedures called RANKSEL, developed by the author, is described; it is fully conversational, user-friendly, informative, requires little or no knowledge of ranking and selection methodology, and is easy to use.
Abstract: The statistical methodology known as ranking and selection is a well-established branch of mathematical statistics. Despite its potential for use in real-life problems, however, the number of applications of ranking and selection has been very small. One reason for this is that ranking and selection procedures have not until now been available in computer package form. This paper describes an interactive computer package of ranking and selection procedures called RANKSEL, developed by the author. RANKSEL is fully conversational, user-friendly, informative, requires little or no knowledge of ranking and selection methodology, and is easy to use. Examples of each of the procedures in RANKSEL are given.



Journal ArticleDOI
Hiroshi Hojo
TL;DR: Two response models are provided: one for binary ranking and the other for sorting, which assume that the subject perceives any two stimuli as very similar to each other when their dissimilarity is below response thresholds that are associated with those stimuli.
Abstract: The present paper provides two response models: one for binary ranking and the other for sorting. The former is a behavior of choosing, in a random order, only those comparison stimuli which are judged to be very similar to a standard stimulus, and the latter is that of selecting stimuli which are judged very similar to each other to form them into clusters. The key assumption of these models is that the subject perceives any two stimuli as very similar to each other when their dissimilarity, which varies over time, is below response thresholds that are associated with those stimuli. Maximum likelihood estimation procedures are used for the estimation of parameters of these models. The proposed models are applied, for illustrative purposes, to the similarity data collected by the binary ranking and sorting methods. We discuss some advantages of the binary ranking method to be used for collecting similarity data and a practical limitation of our response model for sorting.
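
A small simulation sketch of the key assumption (two stimuli are judged "very similar" when their momentary dissimilarity falls below the response thresholds associated with those stimuli). The noise distribution, the use of the smaller of the two thresholds, and all parameter names are assumptions for illustration, not the paper's estimated model.

    import random

    def judged_similar(base_dissimilarity, threshold_i, threshold_j, noise_sd=0.1):
        """Momentary dissimilarity = base value + zero-mean noise (varies over time).
        The pair is judged very similar if it falls below both stimulus thresholds
        (one reading of "thresholds associated with those stimuli"). The paper
        itself estimates model parameters by maximum likelihood."""
        momentary = base_dissimilarity + random.gauss(0.0, noise_sd)
        return momentary < min(threshold_i, threshold_j)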



Journal ArticleDOI
01 May 1986
TL;DR: The Generalized Vector Space Model (GVSM) [WO84a, WO85], originally proposed to incorporate term correlations into vector processing, is extended with a scheme for handling Boolean queries, a result that helps integrate vector processing capabilities into existing systems that use Boolean retrieval models.
Abstract: In the past, several mathematical models for document retrieval systems have been developed [C82, S83, S83a, T76, WO84]. These models are used to formally represent the basic characteristics, functional components, and the retrieval processes of document retrieval systems. Two basic categories of models that have been employed in information retrieval are the vector processing models and the Boolean retrieval models. In the conventional vector space model (VSM), proposed by Salton [S71, S83], index terms are basic vectors in a vector space. Each document or query is represented as a linear combination of these basic term vectors. The retrieval operation consists of computing the cosine similarity function between a given query vector and the set of document vectors and then ranking documents accordingly. In this approach, the occurrence frequency of a term in a document is interpreted as the component of the document vector along the corresponding basic term vector. The advantages of this model are that it is simple and yet powerful. The vector operations can be performed efficiently enough to handle very large collections. Furthermore, it has been shown that the retrieval effectiveness is significantly higher compared to that of the Boolean retrieval models. However, this vector model has been incorporated into very few commercial systems. In the strict Boolean retrieval systems [BU81, P84] the user query normally consists of index terms that are connected by Boolean operators AND, OR and NOT. The advantage of using Boolean connectives is to provide a better structure to formulate the user query. The major problem in such a system is that there is no provision for associating weights of importance to the terms which are assigned either to the documents or to the queries. In other words, the representation is binary, indicating either the presence or the absence of the various index terms. The output obtained in response to a query is not ranked in any order of presumed importance to the user. In most cases, the AND connectives tend to be too restrictive [BU81]. Most commercially available retrieval systems essentially conform to this model. One of the challenges for researchers in information retrieval has been to achieve greater acceptance of the vector processing models in commercial systems. The main difficulty in this connection is due to the inability of the vector processing systems to handle Boolean queries. In recent years some progress has been made in expressing Boolean queries as vectors [S83a, S83b]. If attractive ways to achieve this are advanced, it would then be possible to modify existing systems to use vector processing techniques without a great deal of cost and effort. Another problem in the conventional vector space model is that it assumes that term vectors are orthogonal. It is generally agreed that terms are correlated and it is necessary to generalize the model to incorporate term correlations. A vector processing model termed the GVSM [WO84a, WO85] was proposed in response to this need. In the GVSM, the queries are assumed to be presented as a list of terms and corresponding weights. Thus, no provision is made for processing Boolean queries. However, the premises of the model naturally lead to a scheme for handling Boolean queries. In this paper we present the details of this scheme. This result will help achieve the aim of integrating vector processing capabilities into existing systems which use Boolean retrieval models.
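
The retrieval operation described for the conventional VSM (cosine similarity between a query vector and each document vector, followed by ranking) is sketched below. Representing the vectors as sparse term-weight dictionaries is an assumption for illustration, not the paper's GVSM construction.

    import math

    def cosine(query_vec, doc_vec):
        """Cosine similarity between sparse term-weight vectors (dicts)."""
        dot = sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())
        qn = math.sqrt(sum(w * w for w in query_vec.values()))
        dn = math.sqrt(sum(w * w for w in doc_vec.values()))
        return dot / (qn * dn) if qn and dn else 0.0

    def rank(query_vec, docs):
        """docs: {doc_id: term-weight dict}. Returns doc_ids best-first."""
        return sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)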


Journal Article
TL;DR: A seven-step process has proven successful for use by committees to attract and sort through written candidate applications, to agree upon a preliminary ranking of candidates and to reach a consensus on a final list of recommendations.
Abstract: The search for new administrators in complex systems is an important activity. The special requirements of academic organizations, particularly those with health centers, present some unique considerations that can confound this important and difficult process. Typically, national searches attract a sizable candidate list composed of persons with diverse backgrounds and experiences, and a committee is empowered to sort through their qualifications. A critical step in the planning of each search is the development of a process that allows participatory decision making while not requiring too much time. Too often the search becomes an unmanageable activity that confuses the searchers and frustrates the administration. A seven-step process has proven successful for use by committees to attract and sort through written candidate applications, to agree upon a preliminary ranking of candidates and to reach a consensus on a final list of recommendations. The process could be applied in almost any organizational setting.