
Showing papers on "Ranking (information retrieval)" published in 1991


Journal ArticleDOI
TL;DR: This article demonstrates that the similar terms identified by cooccurrence data in a query expansion system tend to occur very frequently in the database that is being searched.
Abstract: Term cooccurrence data has been extensively used in document retrieval systems for the identification of indexing terms that are similar to those that have been specified in a user query: these similar terms can then be used to augment the original query statement. Despite the plausibility of this approach to query expansion, the retrieval effectiveness of the expanded queries is often no greater than, or even less than, the effectiveness of the unexpanded queries. This article demonstrates that the similar terms identified by cooccurrence data in a query expansion system tend to occur very frequently in the database that is being searched. Unfortunately, frequent terms tend to discriminate poorly between relevant and nonrelevant documents, and the general effect of query expansion is thus to add terms that do little or nothing to improve the discriminatory power of the original query.

261 citations


Journal ArticleDOI
TL;DR: The empirical results support the hypothesis that the stability of ranking information decreases with decreasing rank for a ranking of 4 alternatives, and indicate the best strategy to combine the ranks is to include rank-specific scale and other bias parameters.

216 citations


Proceedings ArticleDOI
M.F. Wyle
09 Oct 1991
TL;DR: A discussion is given on an intelligent wide area network-based clipping service used as a test-bed for the development and effectiveness measurement of high-performance retrieval algorithms.
Abstract: A discussion is given on an intelligent wide area network-based clipping service used as a test-bed for the development and effectiveness measurement of high-performance retrieval algorithms. The system taps news wire and other information sources aperiodically, several times daily. It maintains a local 100 Mbyte news database that changes at a rate of 10 Mbytes per day. The system stores and manages user interest profiles, each of which includes several query texts. News items are selected for presentation by a completely automatic four-stage information filtering process; expert system rules and database knowledge criteria are used in the first three stages; vector-space ranking is used in the final stage. News items are actively and selectively disseminated to users via electronic mail.
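
The final, vector-space ranking stage might look roughly like the sketch below. It assumes plain term-frequency vectors and cosine similarity, and scores each news item by its best match against any query text in a profile; the abstract does not say which weighting or matching rule the system actually uses, so `term_vector`, `cosine` and `rank_items` are hypothetical names for illustration only.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_items(profile_queries, news_items):
    """Assumed rule: score an item by its best cosine match to any profile query."""
    query_vectors = [term_vector(q) for q in profile_queries]
    scored = [
        (max((cosine(term_vector(item), qv) for qv in query_vectors), default=0.0), item)
        for item in news_items
    ]
    # Highest similarity first; ties keep their input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_items(["network retrieval algorithms"], items)` returns `(score, item)` pairs, so a dissemination step could cut the list at a score threshold or a fixed length.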

203 citations


Journal ArticleDOI
TL;DR: A class of vector filters is developed, which are efficient smoothers in additive noise and can be designed to have detail-preserving characteristics and are used to develop ranked-order type estimators for multivariate image fields.
Abstract: The extension of ranking a set of elements in R to ranking a set of vectors in a p-dimensional space R^p is considered. In the approach presented here, vector ranking reduces to ordering vectors according to a sorted list of vector distances. A statistical analysis of this vector ranking is presented, and these vector ranking concepts are then used to develop ranked-order type estimators for multivariate image fields. A class of vector filters is developed, which are efficient smoothers in additive noise and can be designed to have detail-preserving characteristics. A statistical analysis is developed for the class of filters and a number of simulations were performed in order to quantitatively evaluate their performance. These simulations involve the estimation of both stationary multivariate random signals and color images in additive noise.
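
One common way to order vectors by distance is to rank each vector by the sum of its distances to all the others, so that the lowest-ranked vector acts as a vector median. The sketch below assumes that aggregate-distance rule with the Euclidean metric; the abstract does not spell out the exact distance or the filter design, so this is an illustration rather than the paper's method.

```python
import math

def aggregate_distance_ranking(vectors):
    """Order vectors by the sum of their Euclidean distances to all the others."""
    totals = [sum(math.dist(v, other) for other in vectors) for v in vectors]
    order = sorted(range(len(vectors)), key=lambda i: totals[i])
    return [vectors[i] for i in order]

def vector_median(window):
    """Lowest-ranked vector: the sample closest, in aggregate, to the rest."""
    return aggregate_distance_ranking(window)[0]
```

Applied to a sliding window of colour pixels, `vector_median` always outputs one of the input vectors, and outliers such as impulse noise end up at the high-rank end of the ordering.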

103 citations



Journal ArticleDOI
TL;DR: It is proved that the utility an inquirer receives from the relevant documents he or she retrieves is maximized by selecting those documents with the largest predictive probabilities of relevance.
Abstract: We challenge the probability ranking principle in information retrieval from the perspectives of (1) signal detection-decision theory and (2) utility theory. If three conditions are not met by an IR system that is producing predictive probabilities of relevance, then inquirers may incur costs that are too great by selecting first those documents that the system predicts have the highest probabilities of relevance. These three conditions are that predictive probabilities are well calibrated (predictively accurate); that they are reported with certainty; and that an inquirer independently assesses the relevance of all documents he or she retrieves. When these conditions are met, signal detection analysis with fixed decision-theoretic costs shows that the probability ranking principle is advisable.
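
A small numerical check can make the favourable case concrete. The sketch assumes the three conditions hold and adopts a simple linear utility (a fixed benefit per relevant document and a fixed cost per nonrelevant one); both the utility form and the function names are assumptions for illustration, not the paper's model.

```python
from itertools import combinations

def expected_utility(probabilities, benefit=1.0, cost=1.0):
    """Expected utility of retrieving documents with these relevance probabilities."""
    return sum(p * benefit - (1.0 - p) * cost for p in probabilities)

def select_by_ranking(probabilities, k):
    """Probability ranking: take the k documents with the highest probabilities."""
    return sorted(probabilities, reverse=True)[:k]

def best_utility_by_search(probabilities, k, benefit=1.0, cost=1.0):
    """Exhaustive check: the best expected utility over every k-document subset."""
    return max(
        expected_utility(subset, benefit, cost)
        for subset in combinations(probabilities, k)
    )
```

Under these assumptions the top-k selection matches the exhaustive optimum; the abstract's point is that this can fail when the probabilities are miscalibrated, uncertain, or not assessed independently.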

41 citations


Journal ArticleDOI
TL;DR: This study concludes that retrieval strategies within the rough set model perform significantly better than retrieval within the vector model using the cosine formula, in document ranking as well as in recall.
Abstract: The objective here is to present the results of our study which show that rough approximations contribute to the improvement of recall in information retrieval (IR). The information retrieval literature provides ample evidence of situations in which less than 40% of the relevant documents are retrieved. A major reason for this is the problem of search vocabulary and the burden it imposes on the user who is expected to specify all possible terms that refer to the subject of interest. The theory of rough sets provides a framework for organizing the vocabulary in such a way that this constraint is reduced. The model also provides a set of search strategies that are flexible and user oriented. These strategies are based on approximate descriptions of objects such as queries and documents. This study concludes that retrieval strategies within the rough set model perform significantly better than retrieval within the vector model using the cosine formula, in document ranking as well as in recall. The paper concludes by demonstrating a methodology for document clustering using rough sets. This work is a continuation of earlier work by the author on the application of rough sets to information retrieval.

36 citations


Journal ArticleDOI

32 citations


Proceedings ArticleDOI
01 Aug 1991
TL;DR: Nomenclator is an architecture for providing efficient descriptive (attribute-based) naming in a large internet environment that will eventually incorporate other name services in addition to X.500 as its underlying data repository.
Abstract: Nomenclator is an architecture for providing efficient descriptive (attribute-based) naming in a large internet environment. As a test of the basic design, we have built a Nomenclator prototype that uses X.500 as its underlying data repository. X.500 SEARCH queries that previously took several minutes, can, in many cases, be answered in a matter of seconds. Our system improves descriptive query performance by trimming branches of the X.500 directory tree from the search. These tree-trimming techniques are part of an active catalog that constrains the search space as needed during query processing. The active catalog provides information about the data distribution (meta-data) to constrain query processing on demand. Nomenclator caches both data (responses to queries) and meta-data (data distribution information, tree-trimming techniques, data access techniques) to speed future queries. Nomenclator relieves users of the need to understand the structure of the name space to locate objects quickly in a large, structured name environment. Nomenclator is a meta-level service that will eventually incorporate other name services in addition to X.500. Its techniques for improving performance should be generally applicable to other naming systems. Research supported in part by an AT&T Ph.D. Scholarship, National Science Foundation grants CCR8703373 and CCR-8815928, Office of Naval Research grant N00014-89-J-1222, and a Digital Equipment Corporation External Research Grant.

31 citations


Journal ArticleDOI
TL;DR: A prototype distributed information retrieval system was designed and built using a distributed architecture and using statistical ranking techniques to help provide better service for the end user.
Abstract: Centralized systems continue to dominate the information retrieval market, with increased competition from CD-ROM based systems. As more large organizations begin to implement office automation systems, however, many will find that neither of these types of retrieval systems will satisfy their requirements, especially those requirements involving easy integration into other systems and heavy usage by casual end users. A prototype distributed information retrieval system was designed and built using a distributed architecture and using statistical ranking techniques to help provide better service for the end user. The distributed architecture was shown to be a feasible alternative to centralized or CD-ROM information retrieval, and user testing of the ranking methodology showed both widespread user enthusiasm for this retrieval technique and very fast response times (on the order of one second for 300 megabytes of data).

25 citations




Proceedings ArticleDOI
11 Sep 1991
TL;DR: The compound document processing system (CDPS) is a multimedia information system that provides integrated facilities for storage and retrieval of text, pictures and document images based on the probabilistic retrieval model.
Abstract: The compound document processing system (CDPS) is a multimedia information system that provides integrated facilities for storage and retrieval of text, pictures and document images. It is based on the probabilistic retrieval model which involves ranking the retrieved records in descending order of their similarity to the user's query. Facilities provided by the system include extracting information from scanned multimedia documents, automatic indexing, sharing information between different devices such as electronic mail, facsimile machine, word processor, text formatter and so on.

Book ChapterDOI
16 Oct 1991
TL;DR: The technique for constructing user profiles is developed to enable the information retrieval system to learn individual user interpretation of keywords, by eliciting user opinion, as well as vocabulary, by employing techniques from Personal Construct Theory.
Abstract: An information retrieval system retrieves a set of bibliographic citations in response to a user query. The user formulates a query by selecting keywords that express his/her information needs. In query formulation, different persons may use the same terms to imply different meanings based on their background and experience. The system, however, is unable to perceive different user viewpoints of terms because the system is built to assign a unique meaning to each term. In this paper, we develop the technique for constructing user profiles to enable the system to learn individual user interpretation of keywords. The user profiles are developed by eliciting user opinion, as well as vocabulary, by employing techniques from Personal Construct Theory. The elicited opinion is analyzed through machine learning heuristics. This leads to a user profile that correlates user vocabulary to index terms in system representation of documents. The techniques are experimentally validated.

Proceedings Article
03 Sep 1991
TL;DR: To support Kaleidoscope’s style of user-system interaction, the presence of a high-level data model is critical, and the absence of an explicit model leads to ad hoc grammar design and query translation.
Abstract: Most database interfaces provide poor guidance on ad hoc query formulation, burdening users to learn, and to recall precisely the query language and the database. Kaleidoscope avoids this problem by guiding the user’s query construction actively. Based on a grammar specifying the syntax and semantics of an English-like Query Language (EnQL), the interface generates legitimate query constituents incrementally as menu choices. Additional intraquery guidance ensures the integrity of a partial query. The central theme of this paper is that to support Kaleidoscope’s style of user-system interaction, the presence of a high-level data model is critical. The absence of an explicit model leads to ad hoc grammar design and query translation. Existing models are inadequate for supporting EnQL because of a significant conceptual gap between common English concepts and database representation of such concepts. This paper presents the features of Kaleidoscope, its data model for EnQL, and a mapping to the relational storage.

Journal ArticleDOI
TL;DR: It is proved that the ranking problem for unambiguous context-free languages is NC^1-reducible to the value problem for algebraic formal power series in noncommuting variables and regular languages are not computable by log-space uniform boolean circuits of polynomial size and depth.

Patent
26 Dec 1991
TL;DR: The retrieval result is made to match what the user intends by generating an inverted file in which each keyword extracted by a keyword extracting part is weighted according to where it appears in the document.
Abstract: PURPOSE: To constitute the device so that the result of retrieval is what the user intends, by generating an inverted file in which each keyword extracted by a keyword extracting part is weighted according to its appearance position in the document. CONSTITUTION: The device is provided with a document format understanding part 8 for decoding the format of a document registered in a file 1, and an inverted file 2 is generated by weighting each keyword extracted by a keyword extracting part 3 according to its appearance position in the document, which indicates the keyword's significance in that document. Accordingly, the inverted file 2, containing information on how significant each keyword is in the document that contains it, is generated and can be offered to retrieval, and retrieval can be executed so that documents in which a keyword of the retrieval condition has significant semantics take priority, for example by being ranked higher. In this way, the retrieval result intended by the user is obtained easily.
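
The position-weighted inverted file described in this abstract can be sketched as follows. The position names and weight values in `POSITION_WEIGHTS` are hypothetical, since the abstract gives no concrete figures, and summing weights at query time is just one simple way to let significant occurrences rank higher.

```python
from collections import defaultdict

# Hypothetical weights: the patent abstract does not state actual values.
POSITION_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}

def build_inverted_file(documents):
    """Map keyword -> {doc_id: accumulated position weight}."""
    inverted = defaultdict(lambda: defaultdict(float))
    for doc_id, fields in documents.items():
        for position, text in fields.items():
            weight = POSITION_WEIGHTS.get(position, 1.0)
            for keyword in text.lower().split():
                inverted[keyword][doc_id] += weight
    return inverted

def search(inverted, query):
    """Rank documents by summed keyword weights, highest first."""
    scores = defaultdict(float)
    for keyword in query.lower().split():
        # .get avoids creating empty entries for unseen keywords.
        for doc_id, weight in inverted.get(keyword, {}).items():
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

With this layout, a document that mentions a query keyword in its title outranks one that mentions it only in the body.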

Journal ArticleDOI
TL;DR: In this article, a ranking of the paragraphs comprising a full-text document in order of decreasing similarity with a query, where the similarity for each paragraph is determined by the number of keyword stems that it has in common with the query.
Abstract: Full‐text documents are usually searched by means of a Boolean retrieval algorithm that requires the user to specify the logical relationships between the terms of a query. In this paper, we summarise the results to date of a continuing programme of research at the University of Sheffield to investigate the use of nearest‐neighbour retrieval algorithms for full‐text searching. Given a natural‐language query statement, our methods result in a ranking of the paragraphs comprising a full‐text document in order of decreasing similarity with the query, where the similarity for each paragraph is determined by the number of keyword stems that it has in common with the query. A full‐text document test collection has been created to allow systematic tests of retrieval effectiveness to be carried out. Experiments with this collection demonstrate that nearest‐neighbour searching provides a means for paragraph‐based access to full‐text documents that is of comparable effectiveness to both Boolean and hypertext searching and that index term weighting schemes which have been developed for the searching of bibliographical databases can also be used to improve the effectiveness of retrieval from full‐text databases. A current project is investigating the extent to which a paragraph‐based full‐text retrieval system can be used to augment the explication facilities of an expert system on welding.
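
The paragraph ranking described here, similarity measured as the number of keyword stems shared with the query, can be sketched in a few lines. The suffix-stripping `crude_stem` and the small `STOPWORDS` set are hypothetical stand-ins; the abstract does not say which stemmer or stop list the Sheffield work used.

```python
def crude_stem(word):
    """Hypothetical stand-in stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}

def stems(text):
    """Set of stems for the non-stopword tokens of a text."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {crude_stem(w) for w in words if w and w not in STOPWORDS}

def rank_paragraphs(query, paragraphs):
    """Return (shared_stem_count, paragraph_index), most similar first."""
    query_stems = stems(query)
    scored = [(len(query_stems & stems(p)), i) for i, p in enumerate(paragraphs)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))
```

Ties are broken by document order here; the index term weighting schemes mentioned in the abstract would replace the plain count with a weighted sum.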

Journal ArticleDOI
TL;DR: The objective is to introduce a somewhat new methodology called simply ranking and selection (R&S) by way of agronomic examples and to illustrate why R&S may be preferred over TMC for selecting the best treatment.
Abstract: Selecting the best treatment(s) is a common problem in agricultural research. The usual procedure is to rank all treatment means and to claim the best treatment(s) are the highest (or lowest) subset found not to differ significantly according to a traditional multiple comparison (TMC). A least significant difference (LSD) procedure is applied most frequently. Our objective is to introduce a somewhat new methodology called simply ranking and selection (R&S) by way of agronomic examples and to illustrate why R&S may be preferred over TMC for selecting the best (...)

Book ChapterDOI
01 Jan 1991
TL;DR: It is claimed that the retrieval interface offered by current database management systems is not sufficient for interactive use and the need for empirical studies for the design of future environmental information systems is emphasized.
Abstract: In the design of future environmental systems, the semantics of the data as well as the kind of queries to those systems have to be considered. Environmental data is frequently uncertain and incomplete. Heterogeneous data structures as well as multimedia data have to be managed by the system. For interactive queries, the system should allow vague queries and query formulations that are independent of the specific structure of the data and its representation. For vague queries and imprecise data, methods developed in information retrieval can be applied. Heterogeneous data structures can be handled with concepts from object-oriented database management systems. In multimedia information systems, the problem of full integration of the different media is yet unsolved, especially in case the information a user searches for is stored in different media. We claim that the retrieval interface offered by current database management systems is not sufficient for interactive use. In addition, functions like ranking, browsing, zooming, relevance feedback and cooperative support should be provided. Finally, we emphasize the need for empirical studies for the design of future environmental information systems.


Proceedings ArticleDOI
02 Dec 1991
TL;DR: The authors show that for an n-node tree, one can compute an optimal ranking in O(log^2 n) time using n^2/log n EREW PRAM processors.
Abstract: This paper places the optimal tree ranking problem in NC. A ranking is a labeling of the nodes with natural numbers such that if nodes u and v have the same label then there exists another node with a greater label on the path between them. An optimal ranking is a ranking in which the largest label assigned to any node is as small as possible among all rankings. An O(n) sequential algorithm is known. Researchers have speculated that the problem is P-complete. The authors show that for an n-node tree, one can compute an optimal ranking in O(log^2 n) time using n^2/log n EREW PRAM processors. In fact, their ranking is super critical in that the label assigned to each node is absolutely as small as possible. They achieve their results by introducing and showing that a more general problem, which they call the super critical numbering problem, is in NC. No NC algorithm for the super critical tree ranking problem, approximate or otherwise, was previously known; the only known NC algorithm for optimal tree ranking was an approximate one.
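
The definition of a ranking given in this abstract can be checked directly. The sketch below is a sequential validity checker, not the authors' parallel algorithm: from each node it walks outward through strictly smaller labels, and the labeling is invalid if that walk reaches another node carrying the same label. The adjacency-dictionary representation and the function name are assumptions for illustration.

```python
def is_valid_ranking(adjacency, label):
    """True if every same-label pair has a larger label on the path between them."""
    for start in adjacency:
        limit = label[start]
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour in seen or label[neighbour] > limit:
                    continue  # a larger label separates everything beyond it
                if label[neighbour] == limit:
                    return False  # equal label reached with nothing larger between
                seen.add(neighbour)
                stack.append(neighbour)
    return True
```

For a valid labeling, `max(label.values())` is the quantity an optimal ranking minimises.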

Journal ArticleDOI
TL;DR: A relation in the performance of three of the most important and frequently performed parallel operations on dynamically reconfiguring machines, namely, the data reduction, the ranking and the ranking is shown.
Abstract: We show a relation in the performance of three of the most important and frequently performed parallel operations on dynamically reconfiguring machines, namely, the data reduction, the ranking and ...

Journal ArticleDOI
TL;DR: A precision inserting operation for chamferless parts under vague positional information by using approximate reasoning to find the search area according to the ranking of search areas.

Journal ArticleDOI
01 Apr 1991
TL;DR: The comparison suggests that the linkage facilities in hypertext do not provide a very cost‐effective mechanism for paragraph‐based retrieval.
Abstract: This paper considers the use of a hypertext system, GUIDE, for paragraph‐based searching in full‐text documents. Searching can be effected in GUIDE using both a conventional, word‐based approach and using the inter‐textual linkage facilities. The effectiveness of these retrieval techniques is evaluated by means of searches of three full‐text documents for which relevance data are available. The results of the searches are compared with those obtained from use of a nearest neighbour retrieval system that has been developed for the ranking of paragraphs within full‐text documents. The comparison suggests that the linkage facilities in hypertext do not provide a very cost‐effective mechanism for paragraph‐based retrieval.

Proceedings ArticleDOI
01 Jan 1991
TL;DR: In this article, the importance of each single eigenvalue by deriving accurate ranking indices is discussed, and the essential or critical modes are defined as the ones that contribute the most to a prespecified quadratic performance index, e.g. output energy function.
Abstract: The assessment of the importance of each single eigenvalue by deriving accurate ranking indices is discussed. The essential or critical modes are defined as the ones that contribute the most to a prespecified quadratic performance index, e.g. output energy function. Specifically, three ranking methods for continuous linear time-invariant systems are proposed. The first method is to use the results of A. Feliachi (IEEE Trans. Power Syst., vol.5, no.3, p.783-7, 1990) and derive a new separation method of the overall contribution index in ranking indices. The second approach, called the weighting factor ranking index, is based on the relative importance of joint eigenvalue contributions to the energy function. Finally, a remainder ranking index based on the least contribution to the energy function when the eigenvalue of concern is ignored is derived. A comparison of all these methods is given.

Book ChapterDOI
TL;DR: An incremental learning algorithm is described which builds contextual linkages from user sessions so as to optimise the order of reference display and enhance the relevance of reference listings in an IR System.
Abstract: Current technologies have increased both the quantity of information available and the modes of access to it. General tools to provide access should be adaptable to individual contexts and needs. Our research involves the use of learning and adaptive techniques to improve the quality of an IR System. We outline a fully implemented experimental IRS (Okapi), which uses search term weighting and item ranking, based on a probabilistic model. One of its current deficiencies is that users do not benefit from continual use, since the system does not adapt to particular users or their search topics. Here we describe an incremental learning algorithm which builds contextual linkages from user sessions so as to optimise the order of reference display and enhance the relevance of reference listings.

Proceedings ArticleDOI
07 Apr 1991
TL;DR: The authors show that a query concerning the IS-A relation is reduced to the problem of finding the set of upper and/or lower bounds of a certain element of the DOT algebra and the answer to a query can be expressed by a regular expression, which is obtained by constructing an automaton from the query.
Abstract: Consideration is given to the query processing problem for a class of knowledge-based systems consisting of object names and labels. Each knowledge-based system in this class represents an IS-A relation and inheritance and its two important features are: no distinction between type and entity, and the capability to represent virtual objects. In order to discuss the query processing problem for such a system, a new algebra called DOT algebra is introduced. The authors show that a query concerning the IS-A relation is reduced to the problem of finding the set of upper and/or lower bounds of a certain element of the DOT algebra. The answer to a query can be expressed by a regular expression, which is obtained by constructing an automaton from the query. They also show an implementation plan of the proposed knowledge-based system in a distributed system environment.

Journal ArticleDOI
TL;DR: A possible evaluation framework taxonomy is presented in this paper, based on three root definitions derived with Checkland's Soft Systems Methodology and divided into measurement, ranking and taxonomy.
Abstract: A possible evaluation framework taxonomy is presented in this paper. The taxonomy is based on three root definitions derived with Checkland's Soft Systems Methodology. According to them we distinguish three main evaluation framework categories: measurement, ranking and taxonomy.

Patent
29 Jul 1991
TL;DR: In this paper, a tripartite ranking relation varying relatively with each other centering around a speaker is automatically determined by using a nominal list containing a ranking relation deciding key, an intra-same party ranking relation order table, and an inter-party ranking relation ordering table.
Abstract: PURPOSE: To automatically decide the tripartite ranking relation varying relatively with each other centering around a speaker by using a nominal list containing a ranking relation deciding key, an intra-same party ranking relation order table, and an inter-party ranking relation order table. CONSTITUTION: The conversational sentences inputted via an input part 1 are analyzed by a sentence structure analyzing part 2, and the conversation speakers are extracted and specified by a speaker extracting part 3. Then the relative relation (ranking relation) among those speakers extracted by the part 3 is analyzed by a relative relation analyzing part 4 based on the social knowledge, etc., and with use of a nominal list 5 containing a ranking relation deciding key, an intra-same party ranking relation order table, and an inter-party ranking relation order table. These analyzed results are outputted via an output part 8. Thus it is possible to automatically decide the tripartite relative relation varying relatively with each other centering around a speaker.