
Showing papers on "Ranking (information retrieval) published in 1993"


Proceedings ArticleDOI
01 Jul 1993
TL;DR: This paper presents a probabilistic query expansion model based on a similarity thesaurus which was constructed automatically and results in a notable improvement in the retrieval effectiveness when measured using both recall-precision and usefulness.
Abstract: Query expansion methods have been studied for a long time - with debatable success in many instances. In this paper we present a probabilistic query expansion model based on a similarity thesaurus which was constructed automatically. A similarity thesaurus reflects domain knowledge about the particular collection from which it is constructed. We address the two important issues with query expansion: the selection and the weighting of additional search terms. In contrast to earlier methods, our queries are expanded by adding those terms that are most similar to the concept of the query, rather than selecting terms that are similar to the query terms. Our experiments show that this kind of query expansion results in a notable improvement in the retrieval effectiveness when measured using both recall-precision and usefulness.

773 citations
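The expansion scheme described above scores candidate terms against the query as a whole (a weighted sum of similarities to every query term) rather than against individual query terms. A minimal sketch of that idea follows; the term names and similarity values are illustrative, not taken from the paper:

```python
# Hedged sketch of concept-based query expansion: a candidate term's score
# is its thesaurus similarity to each query term, weighted by that query
# term's weight and summed, so terms close to the query *as a whole* win.

def expand_query(query_weights, term_sim, k=2):
    """query_weights: {term: weight}; term_sim: {(candidate, query_term): sim}."""
    scores = {}
    for (cand, qterm), sim in term_sim.items():
        if qterm in query_weights and cand not in query_weights:
            scores[cand] = scores.get(cand, 0.0) + query_weights[qterm] * sim
    # keep the k candidates most similar to the query concept
    return sorted(scores, key=scores.get, reverse=True)[:k]

sims = {("automobile", "car"): 0.9, ("truck", "car"): 0.5,
        ("automobile", "engine"): 0.4, ("wheel", "engine"): 0.6}
print(expand_query({"car": 1.0, "engine": 0.5}, sims))  # → ['automobile', 'truck']
```

Note how "automobile" outranks "truck" because it is similar to both query terms, which is the paper's point about expanding toward the concept of the whole query.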


Journal ArticleDOI
TL;DR: In this article, a knowledge-based extended Boolean model (kb-ebm) is proposed that evaluates weighted queries and documents effectively and avoids the problems of previous methods.
Abstract: There have been several document ranking methods to calculate the conceptual distance or closeness between a Boolean query and a document. Though they provide good retrieval effectiveness in many cases, they do not support effective weighting schemes for queries and documents, and they also have several problems resulting from inappropriate evaluation of Boolean operators. We propose a new method called Knowledge-Based Extended Boolean Model (kb-ebm) in which Salton's extended Boolean model is incorporated. kb-ebm evaluates weighted queries and documents effectively, and avoids the problems of the previous methods. kb-ebm provides high-quality document rankings by using term dependence information from is-a hierarchies. The performance experiments show that the proposed method closely simulates human behaviour.

266 citations
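kb-ebm incorporates Salton's extended Boolean (p-norm) model, whose soft AND/OR operators sit between strict Boolean matching and vector-space averaging. A minimal sketch of just the p-norm operators (not the knowledge-based extensions) might look like this:

```python
def pnorm_or(weights, p=2.0):
    """Extended Boolean OR: a soft maximum of the term weights."""
    return (sum(w ** p for w in weights) / len(weights)) ** (1.0 / p)

def pnorm_and(weights, p=2.0):
    """Extended Boolean AND: a soft minimum of the term weights."""
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / len(weights)) ** (1.0 / p)

# With p = 2, a document matching one of two OR'ed terms scores ~0.707,
# between strict Boolean (1.0) and plain averaging (0.5).
print(round(pnorm_or([1.0, 0.0]), 3))   # → 0.707
print(round(pnorm_and([1.0, 0.0]), 3))  # → 0.293
```

As p grows, both operators approach strict Boolean evaluation; as p approaches 1, both collapse to the same average, which is why intermediate p values give the graded rankings the model is after.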


Patent
05 Nov 1993
TL;DR: In this paper, a procedure for determining text relevancy is proposed, which can be used to enhance the retrieval of text documents by search queries and can help a user intelligently and rapidly locate information found in large textual databases.
Abstract: This is a procedure for determining text relevancy that can be used to enhance the retrieval of text documents by search queries. This system helps a user intelligently and rapidly locate information found in large textual databases. A first embodiment determines the common meanings between each word in the query and each word in the document. Then an adjustment is made for words in the query that are not in the documents. Further, weights are calculated for both the semantic components in the query and the semantic components in the documents. These weights are multiplied together, and their products are subsequently added to one another to determine a real value number (similarity coefficient) for each document. Finally, the documents are sorted in sequential order according to their real value number from largest to smallest value. Another embodiment is for routing documents to topics/headings (sometimes referred to as filtering). Here, the importance of each word in both topics and documents is calculated. Then, the real value number (similarity coefficient) for each document is determined, and each document is routed one at a time according to its real value number to one or more topics. Finally, once the documents are located with their topics, the documents can be sorted. This system can be used to search and route all kinds of document collections, such as collections of legal documents, medical documents, news stories, and patents.

215 citations
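The ranking step described above (multiply matching component weights, sum the products into a similarity coefficient, sort descending) is essentially a weighted dot product. A hedged sketch with invented component names and weights:

```python
def rank_documents(query, docs):
    """query: {component: weight}; docs: {doc_id: {component: weight}}.
    Returns doc ids sorted by similarity coefficient, largest first."""
    def coefficient(doc):
        # sum of products of matching semantic-component weights
        return sum(w * doc.get(c, 0.0) for c, w in query.items())
    return sorted(docs, key=lambda d: coefficient(docs[d]), reverse=True)

docs = {"d1": {"court": 1.0},
        "d2": {"court": 1.0, "patent": 3.0}}
print(rank_documents({"court": 2.0, "patent": 1.0}, docs))  # → ['d2', 'd1']
```

Here d2 scores 2.0 + 3.0 = 5.0 against d1's 2.0, so it is listed first, matching the patent's "largest to smallest" ordering.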


Patent
Ogawa Yasutsugu1
17 Nov 1993
TL;DR: A document retrieval system includes a query converter for converting the retrieval condition designated by the user into a query in a predetermined normal form in which keywords and at least one of the logical operations AND, OR and NOT are connected, a bibliographical information indicator for indicating a relation between each of the registered documents and keywords, and a keyword connection table having relationship values, each relationship value representing the degree of relationship between two keywords, as discussed by the authors.
Abstract: A document retrieval system retrieves one or a plurality of registered documents from a document database responsive to retrieval conditions designated by a user. The document retrieval system includes a query converter for converting the retrieval condition designated by the user into a query which has a predetermined normal form in which keywords and at least one type of logical operation out of the logical operations AND, OR and NOT are connected, a bibliographical information indicator for indicating a relation between each of said registered documents and keywords, and a keyword connection table having relationship values, each of the relationship values representing the degree of relationship between two keywords. The document retrieval system also includes a selector for referring to the inverted file and the keyword connection table so as to select one or a plurality of registered documents which satisfy the query, and an outputting circuit for outputting the one or a plurality of registered documents selected by the selecting means.

137 citations


Journal ArticleDOI
TL;DR: A blackboard-based document management system that uses a neural network spreading-activation algorithm which lets users traverse multiple thesauri is discussed, and the system's query formation; the retrieving, ranking and selection of documents; and thesaurus activation are described.
Abstract: A blackboard-based document management system that uses a neural network spreading-activation algorithm which lets users traverse multiple thesauri is discussed. Guided by heuristics, the algorithm activates related terms in the thesauri and converges on the most pertinent concepts. The system provides two control modes: a browsing module and an activation module that determine the sequence of operations. With the browsing module, users have full control over which knowledge sources to browse and what terms to select. The system's query formation; the retrieving, ranking and selection of documents; and thesaurus activation are described.

106 citations
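The spreading-activation idea (seed query terms with activation, propagate it along weighted thesaurus links for a few rounds, and treat the most activated terms as the most pertinent concepts) can be sketched as below; the decay factor, link weights, and fixed round count are illustrative stand-ins for the paper's heuristics:

```python
def spread_activation(thesaurus, seeds, decay=0.5, rounds=2):
    """thesaurus: {term: [(related_term, link_weight), ...]};
    seeds: {term: initial_activation}. Returns final activation levels."""
    act = dict(seeds)
    for _ in range(rounds):
        new = dict(act)
        for term, a in act.items():
            for neighbor, w in thesaurus.get(term, []):
                # pass a decayed share of this term's activation downstream
                new[neighbor] = new.get(neighbor, 0.0) + decay * a * w
        act = new
    return act

links = {"neural": [("network", 1.0)], "network": [("graph", 0.5)]}
act = spread_activation(links, {"neural": 1.0})
print(sorted(act, key=act.get, reverse=True))  # most activated terms first
```

Directly linked terms accumulate far more activation than terms two hops away, which is what lets the heuristics converge on a small set of pertinent concepts across multiple thesauri.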


Patent
Graham James Mark1
07 Dec 1993
TL;DR: In this article, the authors present a method and apparatus utilizing automated semantic pattern recognition whereby a user of a manual enters a query regarding information in the text of the manual and the invention displays the locations in which information responsive to the query is located.
Abstract: A method and apparatus utilizing automated semantic pattern recognition whereby a user of a manual enters a query regarding information in the text of the manual and the invention displays the locations in which information responsive to the query is located. The present invention includes a computer that stores a data structure representing the natural language text of a manual, the structure being a tree structure having nodes, wherein each node represents information from associated locations of the text. The nodes at higher levels of the tree represent general categories of information. Nodes at succeedingly lower levels of the tree represent succeedingly more specific categorical subsets of the general categories of information represented at the higher levels by ancestors of the lower level nodes. The query is formatted into a query data structure that is similar in format to the node structure representing the text. The query data structure is compared to each node in the tree. If the degree of similarity between the query and the node exceeds a predetermined threshold value after taking into account the degree of similarity between the ancestors of the node and the query, then the locations associated with the node are displayed on a video monitor as locations containing information responsive to the query.

80 citations


Proceedings Article
01 Jan 1993
Abstract: The evaluation of 6 ranking algorithms for the ranking of terms for query expansion is discussed within the context of an investigation of interactive query expansion and relevance feedback in a real operational environment. The yardstick for the evaluation was provided by the user relevance judgements on the lists of the candidate terms for query expansion. The evaluation focuses on the similarities in the performance of the different algorithms and how the algorithms with similar performance treat terms.

79 citations


Proceedings Article
01 Jan 1993
TL;DR: This work continues work in the TREC 2 environment, performing both routing and ad-hoc experiments, and extends investigations into combining global similarities, giving an overall indication of how a document matches a query, with local similarities identifying a smaller part of the document that matches the query.
Abstract: The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. We continue our work in the TREC 2 environment, performing both routing and ad-hoc experiments. The ad-hoc work extends our investigations into combining global similarities, which give an overall indication of how a document matches a query, with local similarities identifying a smaller part of the document that matches the query. The performance of the ad-hoc runs is good, but it is clear we are not yet taking full advantage of the available local information. Our routing experiments use conventional relevance feedback approaches to routing, but with a much greater degree of query expansion than was done in TREC-1. The length of a query vector is increased by a factor of 5 to 10 by adding terms found in previously seen relevant documents. This approach improves effectiveness by 30-40% over the original query.

75 citations
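A massive relevance-feedback expansion of this kind (growing the query vector several-fold with terms drawn from previously seen relevant documents) can be roughly sketched as follows; the flat beta weight on pooled term frequencies is a simplification, not the Smart group's actual feedback formula:

```python
from collections import Counter

def expand_query(query_vec, relevant_docs, expansion_factor=5, beta=0.5):
    """query_vec: {term: weight}; relevant_docs: list of token lists.
    Grows the query to ~expansion_factor times its original length by
    adding the most frequent terms from seen-relevant documents."""
    target = expansion_factor * len(query_vec)
    pooled = Counter()
    for doc in relevant_docs:
        pooled.update(doc)
    expanded = dict(query_vec)
    for term, freq in pooled.most_common():
        if term in expanded:
            expanded[term] += beta * freq   # boost terms already in the query
        elif len(expanded) < target:
            expanded[term] = beta * freq    # add a new expansion term
    return expanded
```

Usage: `expand_query({"retrieval": 1.0}, judged_relevant_docs)` returns a vector five times longer, which is the scale of expansion the abstract reports.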


Proceedings ArticleDOI
01 Jul 1993
TL;DR: The evaluation of 6 ranking algorithms for the ranking of terms for query expansion is discussed within the context of an investigation of interactive query expansion and relevance feedback in a real operational environment.
Abstract: The evaluation of 6 ranking algorithms for the ranking of terms for query expansion is discussed within the context of an investigation of interactive query expansion and relevance feedback in a real operational environment. The yardstick for the evaluation was provided by the user relevance judgements on the lists of the candidate terms for query expansion. The evaluation focuses on the similarities in the performance of the different algorithms and how the algorithms with similar performance treat terms.

71 citations


Proceedings ArticleDOI
01 Jul 1993
TL;DR: It is discovered that the knowledge about relevance among queries and documents can be used to obtain empirical connections between query terms and the canonical concepts which are used for indexing the content of documents.
Abstract: This paper describes a unique example-based mapping method for document retrieval. We discovered that the knowledge about relevance among queries and documents can be used to obtain empirical connections between query terms and the canonical concepts which are used for indexing the content of documents. These connections do not depend on whether there are shared terms among the queries and documents; therefore, they are especially effective for a mapping from queries to the documents where the concepts are relevant but the terms used by article authors happen to be different from the terms of database users. We employ a Linear Least Squares Fit (LLSF) technique to compute such connections from a collection of queries and documents where the relevance is assigned by humans, and then use these connections in the retrieval of documents where the relevance is unknown. We tested this method on both retrieval and indexing with a set of MEDLINE documents which has been used by other information retrieval systems for evaluations. The effectiveness of the LLSF mapping and the significant improvement over alternative approaches were evident in the tests.

60 citations
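The LLSF technique solves a linear least-squares problem: learn a matrix W mapping term space to concept space from human-judged training pairs, then apply W to new queries. A toy sketch with invented training matrices (not the MEDLINE data):

```python
import numpy as np

# Rows of A: training queries in term space; rows of B: their
# human-assigned canonical concept vectors. All values are invented.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Linear Least Squares Fit: W minimizes ||A @ W - B|| in the Frobenius norm.
W, *_ = np.linalg.lstsq(A, B, rcond=None)

# Map a new query into concept space. The mapping works even when the
# query shares no literal terms with the relevant documents, which is
# the property the abstract emphasizes.
scores = np.array([1.0, 0.0, 1.0]) @ W
```

Because the toy A is square and invertible, W reproduces the training concepts exactly; with more queries than terms, lstsq returns the best-fit mapping instead.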


Journal ArticleDOI
TL;DR: The results show that the proposed heuristic methods perform better than the method proposed by Smeaton and van Rijsbergen in terms of retrieval accuracy, which is used to indicate the percentage of top documents obtained after a number of disk accesses.
Abstract: Most commercial text retrieval systems employ inverted files to improve retrieval speed. This paper is concerned with the implementation of document ranking based on inverted files. Three heuristic methods for implementing the tf × idf weighting strategy, where tf stands for term frequency and idf stands for inverse document frequency, are studied. The basic idea of the heuristic methods is to process the query terms in an order such that as many top documents as possible can be identified without processing all of the query terms. The first heuristic was proposed by Smeaton and van Rijsbergen and serves as the basis for comparison with the other two heuristic methods proposed in this paper. These three heuristics are evaluated and compared by experimental runs based on the number of disk accesses required for partial document ranking, in which the returned documents contain some, but not necessarily all, of the requested number of top documents. The results show that the proposed heuristic methods perform better than the method proposed by Smeaton and van Rijsbergen in terms of retrieval accuracy, which is used to indicate the percentage of top documents obtained after a number of disk accesses. For total document ranking, in which all of the requested number of top documents are guaranteed to be returned, no optimization techniques studied so far can lead to substantial performance gain. To realize the advantage of the proposed heuristics, two methods for estimating the retrieval accuracy are studied. Their accuracies and processing costs are compared. All the experimental runs are based on four test collections made available with the SMART system.
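The core heuristic (process query terms in decreasing weight order and stop before exhausting them, accepting a partial ranking) can be sketched like this; the fixed term budget below is a simple stand-in for the paper's stopping conditions:

```python
import math

def partial_rank(query_terms, postings, n_docs, top_k=2, term_budget=2):
    """postings: {term: {doc_id: tf}}. Process query terms in decreasing
    idf order and stop after term_budget terms (partial document ranking)."""
    def idf(term):
        return math.log(n_docs / max(1, len(postings.get(term, {}))))
    acc = {}  # per-document score accumulators
    for term in sorted(query_terms, key=idf, reverse=True)[:term_budget]:
        w = idf(term)
        for doc, tf in postings.get(term, {}).items():
            acc[doc] = acc.get(doc, 0.0) + tf * w
    return sorted(acc, key=acc.get, reverse=True)[:top_k]
```

Processing rare (high-idf) terms first means the largest score contributions arrive early, so the top documents often stabilize before the common terms' long posting lists are ever read, saving the disk accesses the paper measures.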

Proceedings ArticleDOI
01 Dec 1993
TL;DR: Experiments show that all three of f, DFBB, and dynamic query ordering help to improve the performance of the authors' MQO algorithm, compared with existing methods which use A* and static query ordering.
Abstract: Ahmet Cosar, Ee-Peng Lim, Jaideep Srivastava. Department of Computer Science, University of Minnesota, Minneapolis, MN 55455. In certain database applications such as deductive databases, batch query processing, and recursive query processing, a single query can be transformed into a set of closely related database queries. Great benefits can be obtained by executing a group of related queries all together in a single unified multi-plan instead of executing each query separately. In order to achieve this, Multiple Query Optimization (MQO) identifies common tasks (e.g. common subexpressions, joins, etc.) among a set of query plans and creates a single unified plan (multi-plan) which can be executed to obtain the required outputs for all queries at once. In this paper, a new heuristic function (f), dynamic query ordering heuristics, and Depth-First Branch-and-Bound (DFBB) are defined and experimentally evaluated, and compared with existing methods which use A* and static query ordering. Our experiments show that all three of f, DFBB, and dynamic query ordering help to improve the performance of our MQO algorithm.

Journal ArticleDOI
TL;DR: In this article, the saddle point search and the conformational interconversion paths are discussed with emphasis on the search of low-energy regions, and future developments in saddle point and conformational space search are discussed.
Abstract: Algorithms of conformational space search, namely, methods of ranking conformational isomers of flexible molecules, are discussed with emphasis on the search of low-energy regions. Perspectives on future developments in the saddle point search and the conformational interconversion paths are also mentioned.

Journal ArticleDOI
TL;DR: A real and large-scale application of the Fuzzy Decision-Making Method is presented to decide what sort of computer system for research and education should be renewed in the department of computer engineering.

George E. Heidorn1
01 Jan 1993
TL;DR: This brief paper describes a metric that can be easily computed during either bottom-up or top-down construction of a parse tree for ranking the desirability of alternative parses and discusses the results of using this metric with the EPISTLE system being developed at IBM Research.
Abstract: This brief paper, which is itself an extended abstract for a forthcoming paper, describes a metric that can be easily computed during either bottom-up or top-down construction of a parse tree for ranking the desirability of alternative parses. In its simplest form, the metric tends to prefer trees in which constituents are pushed as far down as possible, but by appropriate modification of a constant in the formula other behavior can be obtained also. This paper includes an introduction to the EPISTLE system being developed at IBM Research and a discussion of the results of using this metric with that system.

Proceedings ArticleDOI
01 Aug 1993
TL;DR: This paper discusses the structured knowledge representation model designed for the FLEXICON system serving both as an internal knowledge representation scheme, in conjunction with statistical ranking, and as an external representation used to summarize legal text for rapid evaluation of the search results.
Abstract: The FLEXICON system was designed to provide legal professionals with an effective and easy-to-use legal text management tool. This paper discusses the structured knowledge representation model designed for the FLEXICON system serving both as an internal knowledge representation scheme, in conjunction with statistical ranking, and as an external representation used to summarize legal text for rapid evaluation of the search results. The model is evaluated and compared to alternative information retrieval models. Experimental test data is presented to demonstrate the model's retrieval effectiveness in comparison to boolean search.

Journal ArticleDOI
TL;DR: The conclusion is that a more realistic and complete view of IR is obtained if the authors do not consider documents and queries to be elements of the same space, which implies that certain restrictions usually applied in the design of an IR system are obviated.
Abstract: Many authors, who adopt the vector space model, take the view that documents, terms, queries, etc., are all elements within the same (conceptual) space. This view seems to be a natural one, given that documents and queries have the same vector notation. We show, however, that the structure of the query space can be very different from that of the document space. To this end, concepts like preference, similarity, term independence, and linearity, both in the document space and in the query space, are discussed. Our conclusion is that a more realistic and complete view of IR is obtained if we do not consider documents and queries to be elements of the same space.

Book ChapterDOI
26 Oct 1993
TL;DR: The VisDB system, which tries to provide visual support not only for the query specification process but also for evaluating query results and, thereafter, refining the query accordingly, introduces the notion of ‘approximate joins’ which allow the user to find data items that only approximately fulfill join conditions.
Abstract: In this paper, we present ideas on how visualization technology can be used to improve the difficult process of querying very large databases. With our VisDB system, we try to provide visual support not only for the query specification process, but also for evaluating query results and, thereafter, refining the query accordingly. The main idea of our system is to represent as many data items as possible by the pixels of the display device. By arranging and coloring the pixels according to their relevance for the query, the user gets a visual impression of the resulting data set and of its relevance for the query. Using an interactive query interface, the user may change the query dynamically and receives immediate feedback by the visual representation of the resulting data set. By using multiple windows for different parts of the query, the user gets visual feedback for each part of the query and, therefore, may more easily understand the overall result. To support complex queries, we introduce the notion of ‘approximate joins’ which allow the user to find data items that only approximately fulfill join conditions. We also present ideas on how our technique may be extended to support the interoperation of heterogeneous databases. Finally, we discuss the performance problems that are caused by interfacing to existing database systems and present ideas to solve these problems by using data structures supporting a multidimensional search of the database.


Book ChapterDOI
01 Jan 1993
TL;DR: Logics is used to model relevance in Information Retrieval: a document is relevant to a query if a formula q representing the query can be inferred from a formula d representing the document.
Abstract: We use logic to model relevance in Information Retrieval: a document is relevant to a query if a formula q representing the query can be inferred from a formula d representing the document. Thus to infer is to retrieve, but because of the nature of aboutness, the inference is often uncertain. Using a framework based on Situation Theory, the representation of documents and queries, inference, and semantic and pragmatic aspects of information can be modelled formally.


Book ChapterDOI
David Bawden1
01 Jan 1993
TL;DR: The concept of molecular dissimilarity is introduced, and shown to be a powerful complement to the well-established notion of molecular similarity, which provides a quantitative assessment of structural variation and diversity.
Abstract: The concept of molecular dissimilarity is introduced, and shown to be a powerful complement to the well-established notion of molecular similarity. It provides a quantitative assessment of structural variation and diversity. Applications within chemical information systems are discussed. These include ranking of search output, selection of representative sets of structures, file screening, data analysis, and creativity stimulation.

Journal ArticleDOI
01 Nov 1993
TL;DR: The basic structure of an intelligent inquiring system is described and the use of a MAM operator to provide a ranking of the relevant items in the information base is discussed.
Abstract: The basic structure of an intelligent inquiring system is described. We discuss the process of generalization of requirements based on the use of fuzzy subsets. The concept of importance modification is introduced. A description of the construction of the envelope of potentially relevant items is presented. The process of criteria aggregation based on MOM and MAM operators is investigated. We discuss the use of a MAM operator to provide a ranking of the relevant items in the information base.


Proceedings ArticleDOI
21 Mar 1993
TL;DR: The results measure the value of processing tailored for different query styles, use of syntactic tags to produce search phrases, recognition and application of generic concepts, and automatic concept extraction based on interword associations in a large text base.
Abstract: Natural language experiments in information retrieval have often been inconclusive due to the lack of large text bases with associated queries and relevance judgments. This paper describes experiments in incremental query processing and indexing with the INQUERY information retrieval system on the TIPSTER queries and document collection. The results measure the value of processing tailored for different query styles, use of syntactic tags to produce search phrases, recognition and application of generic concepts, and automatic concept extraction based on interword associations in a large text base.

Book ChapterDOI
03 Aug 1993
TL;DR: This paper will discuss the development and utilization of three packages for the IBM PC: a decision support system (DSS) called 'COMBI' for multicriteria ranking, a hierarchical hypertext system called 'HHS' for multicriteria evaluation, and a DSS for hierarchical design, called 'SED'.
Abstract: This paper describes the hierarchical components of human-computer systems (HCS). We will discuss the development and utilization of three packages for the IBM PC: a decision support system (DSS) called 'COMBI' for multicriteria ranking, a hierarchical hypertext system called 'HHS' for multicriteria evaluation, and a DSS for hierarchical design, called 'SED'. The study is based on an analysis of HCS components (information, user, and techniques) and major operations (development, representation, correction, learning and using).


Proceedings Article
01 Jan 1993
TL;DR: The novel ranking algorithm of the Coach Metathesaurus browser is presented which is a major module of the coach expert search refinement program and can assist in creating a list of candidate terms useful in augmenting a suboptimal Grateful Med search of MEDLINE.
Abstract: This paper presents the novel ranking algorithm of the Coach Metathesaurus browser which is a major module of the Coach expert search refinement program. An example shows how the ranking algorithm can assist in creating a list of candidate terms useful in augmenting a suboptimal Grateful Med search of MEDLINE.

Proceedings Article
01 Jan 1993
TL;DR: In this paper, the first polynomial-time algorithm to find an optimal edge ranking for trees of constant degree was presented, where the edge ranking is defined as a labeling of the edges using positive integers such that all paths between two edges with the same label contain an intermediate edge with a higher label.
Abstract: An edge ranking of a graph is a labeling of the edges using positive integers such that all paths between two edges with the same label contain an intermediate edge with a higher label. An edge ranking is optimal if the highest label used is as small as possible. The edge-ranking problem has applications in scheduling the manufacture of complex multipart products; it is equivalent to finding the minimum-height edge-separator tree. In this paper we give the first polynomial-time algorithm to find an optimal edge ranking of a tree, placing the problem in P. An interesting feature of the algorithm is an unusual greedy procedure that allows us to narrow an exponential search space down to a polynomial search space containing an optimal solution. An NC algorithm is presented that finds an optimal edge ranking for trees of constant degree. We also prove that a natural decision problem emerging from our sequential algorithm is P-complete.
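The defining condition can be checked directly with a union-find pass: adding edges in increasing label order, a labeling is a valid edge ranking exactly when no component ever absorbs two edges of the current label. A small validity checker (not the paper's optimal-ranking algorithm) might look like this:

```python
def is_edge_ranking(n, edges):
    """edges: (u, v, label) triples on vertices 0..n-1. Valid iff every
    path between two equal-labeled edges contains a higher-labeled edge."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    ordered = sorted(edges, key=lambda e: e[2])
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j][2] == ordered[i][2]:
            j += 1
        marked = set()  # roots of components already holding a current-label edge
        for u, v, _ in ordered[i:j]:
            ru, rv = find(u), find(v)
            if ru in marked or rv in marked:
                return False  # two same-label edges joined only by <=-label edges
            parent[ru] = rv
            marked.add(rv)
        i = j
    return True

# A path 0-1-2-3 can be edge-ranked 1,2,1 but not 1,1,2.
print(is_edge_ranking(4, [(0, 1, 1), (1, 2, 2), (2, 3, 1)]))  # → True
print(is_edge_ranking(4, [(0, 1, 1), (1, 2, 1), (2, 3, 2)]))  # → False
```

In the invalid case the two label-1 edges share vertex 1, so the path between them contains no higher-labeled edge, which the union-find pass detects immediately.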

Proceedings ArticleDOI
06 Apr 1993
TL;DR: A new indexing method for Japanese text databases using the simpie keyword string, which describes the syntactic and semantic characteristics of a word, enables a precise keyword assignment and simplifies dictionary maintenance.
Abstract: This paper describes a new indexing method for Japanese text databases using the simpie keyword string. A compound word is treated as a string of simple words, which are the smallest units in Japanese grammar which still maintain their meanings. As a result, retrieved texts can be ranked, according to the similarity of their meaning and the query, without using a control vocabulary or thesaurus. For automatic indexing, the newly introduced keywordfeature, which describes the syntactic and semantic characteristics of a word, enables a precise keyword assignment and simplifies dictionary maintenance.