Showing papers on "Ranking (information retrieval)" published in 1996


Patent
14 Aug 1996
TL;DR: In this paper, the user enters a query and the system processes the query to generate an alternative representation, which includes conceptual-level abstraction and representations based on complex nominals (CNs), proper nouns (PNs), single terms, text structure, and logical make-up of the query, including mandatory terms.
Abstract: Techniques for generating sophisticated representations of the contents of both queries and documents in a retrieval system by using natural language processing (NLP) techniques to represent, index, and retrieve texts at the multiple levels (e.g., the morphological, lexical, syntactic, semantic, discourse, and pragmatic levels) at which humans construe meaning in writing. The user enters a query and the system processes the query to generate an alternative representation, which includes conceptual-level abstraction and representations based on complex nominals (CNs), proper nouns (PNs), single terms, text structure, and logical make-up of the query, including mandatory terms. After processing the query, the system displays query information to the user, indicating the system's interpretation and representation of the content of the query. The user is then given an opportunity to provide input, in response to which the system modifies the alternative representation of the query. Once the user has provided desired input, the possibly modified representation of the query is matched to the relevant document database, and measures of relevance generated for the documents. A set of documents is presented to the user, who is given an opportunity to select some or all of the documents, typically on the basis of such documents being of particular relevance. The user then initiates the generation of a query representation based on the alternative representations of the selected document(s).

883 citations


Patent
14 Aug 1996
TL;DR: A document retrieval system (20) lets a user enter a query, including a natural query, in a desired one of a plurality of supported languages, and retrieve documents from a database (60) that includes documents in at least one other of the supported languages.
Abstract: A document retrieval system (20) where a user can enter a query, including a natural query, in a desired one of a plurality of supported languages, and retrieve documents from a database (60) that includes documents in at least one other language of the plurality of supported languages. The user need not have any knowledge of the other languages. Each document in the database is subjected to a set of processing steps to generate a language-independent conceptual representation of the subject content of the document. The query is also subjected to a (possibly different) set of processing steps to generate a language-independent conceptual representation of the subject content of the query. Documents are matched to queries based on the conceptual-level contents of the document and query, and, optionally, on the basis of the term-based representation.

667 citations


Patent
14 Aug 1996
TL;DR: In this article, the user enters a query and the system processes the query to generate an alternative representation, which includes conceptual-level abstraction and representations based on complex nominals (CNs), proper nouns (PNs), single terms, text structure, and logical make-up of the query, including mandatory terms.
Abstract: Techniques for generating sophisticated representations of the contents of both queries and documents in a retrieval system by using natural language processing (NLP) techniques to represent, index, and retrieve texts at the multiple levels (e.g., the morphological, lexical, syntactic, semantic, discourse, and pragmatic levels) at which humans construe meaning in writing. The user enters a query and the system processes the query to generate an alternative representation, which includes conceptual-level abstraction and representations based on complex nominals (CNs), proper nouns (PNs), single terms, text structure, and logical make-up of the query, including mandatory terms. After processing the query, the system displays query information to the user, indicating the system's interpretation and representation of the content of the query. The user is then given an opportunity to provide input, in response to which the system modifies the alternative representation of the query. Once the user has provided desired input, the possibly modified representation of the query is matched to the relevant document database, and measures of relevance generated for the documents. A set of documents is presented to the user, who is given an opportunity to select some or all of the documents, typically on the basis of such documents being of particular relevance. The user then initiates the generation of a query representation based on the alternative representations of the selected document(s).

605 citations


Patent
05 Jul 1996
TL;DR: In this article, a method and apparatus for generating responses to queries to a document retrieval system is presented, which responds to a specific request for information by locating and ranking portions of text that may contain the information sought.
Abstract: The present invention relates to a method and apparatus for generating responses to queries to a document retrieval system. The system responds to a specific request for information by locating and ranking portions of text that may contain the information sought. It locates small relevant passages of text (called "hit passages") and ranks them according to an estimate of the degree to which they correspond to the information sought. The system minimizes the number of these hit passages that need to be examined before an information seeker has either found the desired information or can safely conclude that the information sought is not in the collection of texts. A relaxation ranking mechanism is provided to accommodate paraphrase variations that occur between the description of the information sought and the content of the text passages that may constitute suitable answers, by retrieving phrases that are dissimilar to the query phrase to different degrees according to a predefined set of rules, and penalizing the retrieved phrases based upon the degree of this dissimilarity, thus providing the user with a priority organized query hit list.

411 citations


Patent
10 May 1996
TL;DR: In this paper, a system, method, and various software products provide improved information retrieval performance from multiple document databases by retrieving from the multiple document databases, in response to a user query, a set of documents that globally satisfy the query, even though each database maintains independent document indices, term frequency information, and scoring functions.
Abstract: A system, method, and various software products provide improved information retrieval performance from multiple document databases by retrieving from the multiple document databases in response to a user query, a set of documents that globally satisfy the query, even though each database maintains independent document indices, term frequency information, and scoring functions. The global search result approximates, to any desired degree of error, the search results that would have been obtained had the multiple document databases been globally indexed. This is done by sharing at the time the query is executed, a small subset of information about the local relative significance of terms related to the user's query, and from this information, determining a global relative significance of such terms. From the global relative significance, the individual document databases determine their query results, which are then merged into a global set of documents satisfying the query. The shared local relative significance information may be the inverse document frequency of each of a number of terms related to the query, or it may be the total frequency of each of such terms. The global relative significance may correspondingly be a global inverse document frequency, or a global term frequency from which the global inverse document frequency is calculated.

385 citations
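
The idea of deriving a global term significance from per-database statistics can be illustrated with a minimal sketch. It assumes each database shares its collection size and the document frequency of each query term; the function name `global_idf`, the data layout, and the plain logarithmic IDF formula are illustrative assumptions, not details taken from the patent.

```python
import math
from collections import Counter

def global_idf(local_stats):
    """Combine per-database statistics into a global IDF per term.

    local_stats: list of (num_docs, {term: doc_freq}) tuples, one per
    database (a hypothetical layout chosen for this sketch).
    """
    total_docs = sum(num_docs for num_docs, _ in local_stats)
    doc_freq = Counter()
    for _, local_df in local_stats:
        doc_freq.update(local_df)  # sum document frequencies across databases
    # Plain log(N / df); real systems often use smoothed variants.
    return {term: math.log(total_docs / df)
            for term, df in doc_freq.items() if df > 0}

stats = [(100, {"ranking": 10}), (300, {"ranking": 30, "query": 3})]
idf = global_idf(stats)
```

Each database could then score its own documents with the shared `idf` values, so that locally computed scores are comparable when the result lists are merged.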


Patent
12 Nov 1996
TL;DR: In this article, a plurality of text search engines based on substantially different computational searching techniques are combined into a single list of information items, and a ranking process ranks the items in the combined list by utilizing information item ordering data also received from each of the search engines as to the relevance of the information items output by the search engine to the user's request.
Abstract: An information retrieval system is disclosed, wherein the system includes a plurality of text search engines based on substantially different computational searching techniques. By activating each search engine with input from a user information request, the output from each of the search engines is combined into a single list of information items. A ranking process ranks the information items in the combined list by utilizing information item ordering data also received from each of the search engines as to the relevance of the information items output by the search engine to the user's request. Thus, by providing higher rankings to those information items determined to be most relevant to the user's request by each of (or a majority of) the search engines, these information items have been found to be highly consistent in satisfying the user's request for information.

281 citations
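
One simple way to rank a combined list using only the ordering reported by each engine is a Borda count. The abstract does not say which fusion rule the patent uses, so the sketch below is a stand-in technique, and `borda_fuse` is a hypothetical name.

```python
from collections import defaultdict

def borda_fuse(rankings):
    """Fuse several ranked lists (best item first) with a Borda count.

    An item at position pos in a list of length n earns n - pos points;
    items missing from a list earn nothing from it.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] += n - pos
    # Sort by descending score, breaking ties by item name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))

fused = borda_fuse([["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]])
```

Items that most engines place near the top accumulate the most points, which matches the abstract's intent of favoring items judged relevant by each of, or a majority of, the engines.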


Proceedings ArticleDOI
01 Jun 1996
TL;DR: An initial investigation into the use of a 2-step query optimization strategy is described and hybrid-shipping is shown to at least match the best of the two "pure" policies, and in some situations to perform better than both.
Abstract: The construction of high-performance database systems that combine the best aspects of the relational and object-oriented approaches requires the design of client-server architectures that can fully exploit client and server resources in a flexible manner. The two predominant paradigms for client-server query execution are data-shipping and query-shipping. We first define these policies in terms of the restrictions they place on operator site selection during query optimization. We then investigate the performance tradeoffs between them for bulk query processing. While each strategy has advantages, neither one on its own is efficient across a wide range of circumstances. We describe and evaluate a more flexible policy called hybrid-shipping, which can execute queries at clients, servers, or any combination of the two. Hybrid-shipping is shown to at least match the best of the two "pure" policies, and in some situations, to perform better than both. The implementation of hybrid-shipping raises a number of difficult problems for query optimization. We describe an initial investigation into the use of a 2-step query optimization strategy as a way of addressing these issues.

193 citations


Patent
30 Sep 1996
TL;DR: In this paper, a method and system for retrieving information in response to a query by a user is presented, which includes the steps of receiving a signal s having a value corresponding to a relevance-ranking algorithm score of a retrieved document, receiving a signal q having a value corresponding to the number of words in the query and a signal v having a value corresponding to the coordination level of the retrieved document and query (i.e., the degree of overlap between the document terms and the query terms), and generating an adjusted score s1 dependent on the signal s, the signal q and the signal v.
Abstract: A method and system for retrieving information in response to a query by a user. The method includes the steps of receiving a signal s having a value corresponding to a relevance-ranking algorithm score of a retrieved document, receiving a signal q having a value corresponding to the number of words in the query and a signal v having a value corresponding to the coordination level of the retrieved document and query (i.e., the degree of overlap between the document terms and the query terms), and generating an adjusted score s1 dependent on the signal s, the signal q and the signal v. The adjusted score s1 takes the coordination level into account for small values of q and gradually decreases the importance of the coordination level as q increases. The system of this invention includes a computer-based system for carrying out the method of this invention.

182 citations
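
The abstract states the behavior of the adjusted score s1 but not its formula. The sketch below shows one hypothetical function with the described shape: the coordination level v matters for short queries and fades as q grows. The exponential weighting and the `decay` parameter are assumptions made only for illustration.

```python
import math

def adjusted_score(s, q, v, decay=5.0):
    """Hypothetical adjusted score s1 built from s, q, and v.

    s: relevance score from the base ranking algorithm
    q: number of words in the query (q >= 1)
    v: coordination level, taken here as the number of query words
       that also occur in the document (0 <= v <= q)
    decay: assumed constant controlling how quickly the influence of
       the coordination level fades as q grows (not from the patent)
    """
    weight = math.exp(-(q - 1) / decay)  # close to 1 for short queries
    return s * (1.0 + weight * v / q)

short_full = adjusted_score(1.0, q=2, v=2)
short_half = adjusted_score(1.0, q=2, v=1)
```

With these assumptions, a two-word query rewards full overlap noticeably more than half overlap, while for a fifty-word query the two cases receive nearly the same score.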


Journal Article
TL;DR: ProFusion, a meta search engine, sends user queries to multiple underlying search engines in parallel, retrieves and merges the resulting URLs, and identifies and removes duplicates and creates one relevance-ranked list.
Abstract: The explosive growth of the World Wide Web, and the resulting information overload, has led to a mini-explosion in World Wide Web search engines. This mini-explosion, in turn, led to the development of ProFusion, a meta search engine. Educators, like other users, do not have the time to evaluate multiple search engines to knowledgeably select the best for their uses. Nor do they have the time to submit each query to multiple search engines and wade through the resulting flood of good information, duplicated information, irrelevant information, and missing documents. ProFusion sends user queries to multiple underlying search engines in parallel, then retrieves and merges the resulting URLs. It identifies and removes duplicates and creates one relevance-ranked list. If desired, the actual documents can be pre-fetched to remove yet more duplicates and broken links. ProFusion's performance has been compared to the individual search engines and other meta searchers, demonstrating its ability to retrieve more relevant information and present fewer duplicate pages. The system can automatically analyze queries to identify their topic(s) and, based on that analysis, select the most appropriate search engines for the query.

174 citations
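
The merge-and-deduplicate step can be sketched as follows. The sketch assumes each engine returns (URL, score) pairs with scores already on a comparable scale, and that two URLs are duplicates when they agree after a simple normalization; both are assumptions made here, not details of ProFusion itself, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to a comparison key (an assumed, simplistic rule)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"

def merge_results(result_lists):
    """Merge per-engine (url, score) lists into one ranked, duplicate-free list."""
    best = {}
    for results in result_lists:
        for url, score in results:
            key = normalize(url)
            # Keep the highest-scoring copy of each duplicated URL.
            if key not in best or score > best[key][1]:
                best[key] = (url, score)
    return sorted(best.values(), key=lambda pair: -pair[1])

merged = merge_results([
    [("http://Example.com/a/", 0.9), ("http://example.com/b", 0.4)],
    [("http://example.com/a", 0.7), ("http://example.com/c", 0.8)],
])
```

Keeping the maximum score per URL is only one possible choice; summing scores would instead reward pages returned by several engines.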


Proceedings Article
02 Aug 1996
TL;DR: FACT takes a query-centered view of knowledge discovery, in which a discovery request is viewed as a query over the implicit set of possible results supported by a collection of documents, and where background knowledge is used to specify constraints on the desired results of this query process.
Abstract: This paper describes the FACT system for knowledge discovery from text. It discovers associations - patterns of co-occurrence - amongst keywords labeling the items in a collection of textual documents. In addition, FACT is able to use background knowledge about the keywords labeling the documents in its discovery process. FACT takes a query-centered view of knowledge discovery, in which a discovery request is viewed as a query over the implicit set of possible results supported by a collection of documents, and where background knowledge is used to specify constraints on the desired results of this query process. Execution of a knowledge-discovery query is structured so that these background-knowledge constraints can be exploited in the search for possible results. Finally, rather than requiring a user to specify an explicit query expression in the knowledge-discovery query language, FACT presents the user with a simple-to-use graphical interface to the query language, with the language providing a well-defined semantics for the discovery actions performed by a user through the interface.

131 citations


Proceedings ArticleDOI
26 Feb 1996
TL;DR: Four keyword-based search and ranking algorithms for locating relevant WWW pages with respect to user queries, including Boolean Spreading Activation, which extends the notion of word occurrence in the Boolean retrieval model by propagating the occurrence of a query word in a page to other pages linked to it.
Abstract: Applying information retrieval techniques to the World Wide Web (WWW) environment is a challenge, mostly because of its hypertext/hypermedia nature and the richness of the meta-information it provides. We present four keyword-based search and ranking algorithms for locating relevant WWW pages with respect to user queries. The first algorithm, Boolean Spreading Activation, extends the notion of word occurrence in the Boolean retrieval model by propagating the occurrence of a query word in a page to other pages linked to it. The second algorithm, Most-cited, uses the number of citing hyperlinks between potentially relevant WWW pages to increase the relevance scores of the referenced pages over the referencing pages. The third algorithm, TFxIDF vector space model, is based on word distribution statistics. The last algorithm, Vector Spreading Activation, combines TFxIDF with the spreading activation model. We conducted an experiment to evaluate the retrieval effectiveness of these algorithms. From the results of the experiment, we draw conclusions regarding the nature of the WWW environment with respect to document ranking strategies.
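
The first algorithm, Boolean Spreading Activation, can be sketched roughly as below. The abstract gives only the idea of propagating a word occurrence to linked pages, so the two constants (`direct`, `propagated`), the link direction, and the data layout are assumptions made for illustration.

```python
def bsa_scores(pages, neighbors, query, direct=1.0, propagated=0.5):
    """Score pages with a Boolean Spreading Activation style rule.

    pages: {page_id: set of words on the page}
    neighbors: {page_id: set of page_ids linked to that page}
    query: iterable of query words
    A query word found on the page earns `direct`; a word found only on
    a linked page earns the smaller `propagated` credit.
    """
    scores = {}
    for page, words in pages.items():
        total = 0.0
        for term in query:
            if term in words:
                total += direct
            elif any(term in pages[n] for n in neighbors.get(page, ())):
                total += propagated
        scores[page] = total
    return scores

pages = {"p1": {"ranking", "web"}, "p2": {"web"}, "p3": {"cats"}}
neighbors = {"p2": {"p1"}, "p3": set()}
scores = bsa_scores(pages, neighbors, ["ranking", "web"])
```

Here `p2` never mentions "ranking" but still earns partial credit for it, because it is linked to `p1`, which does.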

Proceedings ArticleDOI
01 Jun 1996
TL;DR: In this article, the authors investigate how to optimize the processing of queries over multimedia repositories and define an execution space that is search-minimal, i.e., the set of indexes searched is minimal.
Abstract: Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A selection on these attributes will typically produce not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated with each object, indicating how well the object matches the selection condition (ranking). Also, multimedia repositories may allow access to the attributes of each object only through indexes. We investigate how to optimize the processing of queries over multimedia repositories. A key issue is the choice of the indexes used to search the repository. We define an execution space that is search-minimal, i.e., the set of indexes searched is minimal. Although the general problem of picking an optimal plan in the search-minimal execution space is NP-hard, we solve the problem efficiently when the predicates in the query are independent. We also show that the problem of optimizing queries that ask for a few top-ranked objects can be viewed, in many cases, as that of evaluating selection conditions. Thus, both problems can be viewed together as an extended filtering problem.

Journal ArticleDOI
TL;DR: In this paper, a case study of a hydro-ecological management problem is analyzed by means of multi criterion decision-making (MCDM) techniques, including preference ranking organization (PROMETHEE), geometrical analysis for interactive assistance (GAIA), multi criterion Q-analysis (MCQA-I, II, III), compromise programming (CP), and cooperative game theory (CGT).

Journal ArticleDOI
TL;DR: The architecture and associated algorithms for generating the supported subsuming queries and filters are introduced for Boolean queries composed in one rich front end language, and it is shown that the generated subsuming queries return a minimal number of documents.
Abstract: Searching over heterogeneous information sources is difficult because of the nonuniform query languages. Our approach is to allow a user to compose Boolean queries in one rich front end language. For each user query and target source, we transform the user query into a subsuming query that can be supported by the source but that may return extra documents. The results are then processed by a filter query to yield the correct final result. We introduce the architecture and associated algorithms for generating the supported subsuming queries and filters. We show that generated subsuming queries return a minimal number of documents; we also discuss how minimal cost filters can be obtained. We have implemented prototype versions of these algorithms and demonstrated them on heterogeneous Boolean systems.
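
The subsuming-query-plus-filter idea can be shown with a toy case. The sketch assumes a source that supports only conjunctions of terms, while the user asks for a conjunction with an exclusion; the source model and function names are hypothetical and much simpler than the algorithms the paper describes.

```python
def source_search(docs, required):
    """Hypothetical limited source: supports only 'all of these terms'."""
    return {doc_id for doc_id, words in docs.items() if required <= words}

def answer(docs, required, excluded):
    """Evaluate 'required AND NOT excluded' against the limited source."""
    # Subsuming query: drop the unsupported NOT clause, so the source may
    # return extra documents but never misses a correct one.
    superset = source_search(docs, required)
    # Filter query: remove the extra documents on the client side.
    return {doc_id for doc_id in superset if not (excluded & docs[doc_id])}

docs = {1: {"a", "b"}, 2: {"a", "b", "c"}, 3: {"a"}}
result = answer(docs, required={"a", "b"}, excluded={"c"})
```

The subsuming query retrieves documents 1 and 2, and the filter then discards document 2, leaving the exact answer to the original query.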

Patent
10 Apr 1996
TL;DR: In this article, a system and method provides for indexing and retrieval of stored documents using a decomposition of words in the documents into n-grams, or linear word subunits.
Abstract: A system and method provides for indexing and retrieval of stored documents using a decomposition of words in the documents into n-grams, or linear word subunits. The documents are indexed as pages in a number of banks. For each bank there is a bank index. The individual n-grams identified for each page are stored in the bank index. Each bank index further contains an entry map that indicates whether a given n-gram is present in any of the pages of the bank, and then provides an index to a page map that further indicates which page in the bank contains the n-gram. When a search query is input, the query words are decomposed into their n-grams. The query word n-grams are compared first with entry maps to determine if the query word n-grams appear on any page in the bank. If so, the associated page map is traversed to determine which page in the bank contains the query word n-grams. The n-grams on the page are compared with the query word n-grams to determine the presence of a match therebetween. Matching pages are flagged. When all pages in all banks have been processed, the pages are consolidated with respect to the documents to which they belong, resulting in a list of documents that match the search query. The results are displayed to a user.
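
A much-simplified sketch of n-gram indexing and lookup follows. It uses a single flat index instead of the banks, entry maps, and page maps described above, fixes n at 3, and verifies candidates with a substring check; all of these are simplifying assumptions, and the function names are hypothetical.

```python
from collections import defaultdict

def ngrams(word, n=3):
    """Return the set of character n-grams of a word (whole word if short)."""
    word = word.lower()
    if len(word) <= n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def build_index(pages, n=3):
    """Map each n-gram to the set of page ids whose words contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.split():
            for gram in ngrams(word, n):
                index[gram].add(page_id)
    return index

def search(index, pages, query, n=3):
    """Find pages containing every query word, using n-grams as a prefilter."""
    hits = None
    for word in query.split():
        for gram in ngrams(word, n):
            postings = index.get(gram, set())
            hits = postings if hits is None else hits & postings
    # N-gram overlap only yields candidates, so confirm each against the text.
    words = [w.lower() for w in query.split()]
    return sorted(p for p in (hits or set())
                  if all(w in pages[p].lower() for w in words))

pages = {1: "ranking documents", 2: "bank index", 3: "document ranking banks"}
index = build_index(pages)
matches = search(index, pages, "rank")
```

Because matching works on subunits of words, the query "rank" finds pages containing "ranking", which a whole-word index would miss.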

Patent
09 Aug 1996
TL;DR: In this article, the index is sequentially searched to locate records qualified by a query having terms and operators, the terms correspond to index entries and a weight is assigned to each index entry according to a relative frequency of occurrence of the portion of information in the database.
Abstract: A computer implemented method selectively searches an index of a database according to scores assigned to records of the database located during the searching. The records of the database are indexed by storing index entries in a memory. Each index entry includes a word entry representing a unique portion of information of the database and one or more location entries indicating where the unique portion of information represented by the word entry occurs in the records of the database. A weight is assigned to each index entry according to a relative frequency of occurrence of the portion of information in the database. The index is sequentially searched to locate records qualified by a query having terms and operators. The terms correspond to index entries. The located records are scored according to the number of times portions of information corresponding to the terms of the query occur in the records and their associated weights. The scores and identities of the located records are stored in entries of a ranking list having a predetermined number of entries. In response to searching a predetermined fraction of the index, a determination is made to see if any unlocated records of the database can receive a score higher than one of the records stored in the ranking list using index entries having a lowest weight. If not, the index is searched using only the index entries having weights higher than index entries having the lowest weight.
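
The pruning idea, skipping work for records that can no longer enter a fixed-size ranking list, can be sketched as below. This is a simplified variant rather than the patented procedure: terms are processed from highest to lowest weight, and once no unseen record could beat the current k-th score, only already-located records keep accumulating score. The names and data layout are assumptions.

```python
import heapq

def top_k(postings, weights, k):
    """Return the k best (record_id, score) pairs with simple pruning.

    postings: {term: {record_id: occurrence_count}}
    weights: {term: weight}, higher for rarer terms
    """
    terms = sorted(postings, key=lambda t: weights[t], reverse=True)
    scores = {}
    for i, term in enumerate(terms):
        # Best score an unseen record could still reach from remaining terms.
        remaining_max = sum(
            weights[t] * max(postings[t].values(), default=0)
            for t in terms[i:])
        best = heapq.nlargest(k, scores.values())
        threshold = best[-1] if len(best) == k else 0.0
        admit_new = remaining_max > threshold
        for rec, count in postings[term].items():
            if rec in scores or admit_new:
                scores[rec] = scores.get(rec, 0.0) + weights[term] * count
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])

postings = {"rare": {1: 2, 2: 1}, "common": {1: 1, 2: 1, 3: 5}}
weights = {"rare": 3.0, "common": 0.1}
ranked = top_k(postings, weights, k=2)
```

In this example record 3 is never scored: by the time the low-weight term is reached, its best possible score cannot displace either record already in the list.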

Journal ArticleDOI
TL;DR: It is shown that average precision and recall is not affected by OCR errors across systems for several collections, and it is further shown that the OCR errors and garbage strings generated from the mistranslation of graphic objects increase the size of the index by a wide margin.
Abstract: We give a comprehensive report on our experiments with retrieval from OCR-generated text using systems based on standard models of retrieval. More specifically, we show that average precision and recall is not affected by OCR errors across systems for several collections. The collections used in these experiments include both actual OCR-generated text and standard information retrieval collections corrupted through the simulation of OCR errors. Both the actual and simulation experiments include full-text and abstract-length documents. We also demonstrate that the ranking and feedback methods associated with these models are generally not robust enough to deal with OCR errors. It is further shown that the OCR errors and garbage strings generated from the mistranslation of graphic objects increase the size of the index by a wide margin. We not only point out problems that can arise from applying OCR text within an information retrieval environment, we also suggest solutions to overcome some of these problems.

Journal ArticleDOI
TL;DR: This paper presents an algorithm that combines the Hungarian method and the ranking algorithm for the assignment problem with tour-checking and tour-breaking algorithms and shows that this algorithm finds either a verified optimal or near-optimal solution quickly for moderate size problems.
Abstract: Automated storage and retrieval systems (AS/RS) have made a dramatic impact on material handling and inventory control in warehouses and product systems. A unit-load AS/RS is generic and other AS/RS represent its variations. In this paper, we study a problem of sequencing retrieval requests in a unit-load AS/RS. In a unit-load AS/RS, there are usually multiple openings and a unit-load can be stored in any opening. Given a list of retrieval requests and the locations of openings, this problem seeks a sequence of dual cycles that minimizes total travel time taken by a storage/retrieval machine. Previous researchers believed that this problem is computationally intractable and provided greedy-style heuristic algorithms. In this paper, we present an algorithm that combines the Hungarian method and the ranking algorithm for the assignment problem with tour-checking and tour-breaking algorithms. We show that this algorithm finds either a verified optimal or near-optimal solution quickly for moderate size proble...

Proceedings Article
01 Jan 1996
TL;DR: The major focus this year is on zoning different parts of an initial retrieval ranking, and treating each type of query zone differently as processing continues, as well as experiment with dynamic phrasing.
Abstract: The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. We continue our work in TREC 5, performing runs in the routing, ad-hoc, and foreign language environments. The major focus this year is on zoning different parts of an initial retrieval ranking, and treating each type of query zone differently as processing continues. We also experiment with dynamic phrasing, seeing which words co-occur with original query words in documents judged relevant. Exactly the same procedure is used for foreign language environments as for English; our tenet is that good information retrieval techniques are more powerful than linguistic knowledge.

Journal ArticleDOI
TL;DR: It is shown that average precision and recall is not affected for the full text document collection when the OCR version is compared to its corresponding corrected set and that even though feedback improves retrieval for both collections, it can not be used to compensate for OCR errors caused by badly degraded documents.
Abstract: We report on the performance of the vector space model in the presence of OCR errors. We show that average precision and recall is not affected for our full text document collection when the OCR version is compared to its corresponding corrected set. We do see divergence though between the relevant document rankings of the OCR and corrected collections with different weighting combinations. In particular, we observed that cosine normalization plays a considerable role in the disparity seen between the collections. Furthermore, we show that even though feedback improves retrieval for both collections, it can not be used to compensate for OCR errors caused by badly degraded documents.

Journal ArticleDOI
TL;DR: This article develops a strategy to cope with the problem of users becoming overwhelmed when formulating ad hoc queries, based on ideas from the information retrieval world, in particular the query by navigation mechanism and the stratified hypermedia architecture.
Abstract: Query formulation in the context of large conceptual schemata is known to be a hard problem. When formulating ad hoc queries users may become overwhelmed by the vast amount of information that is stored in the information system, leading to a feeling of being lost in conceptual space. In this article we develop a strategy to cope with this problem. This strategy is based on ideas from the information retrieval world, in particular the query by navigation mechanism and the stratified hypermedia architecture. The stratified hypermedia architecture is used to describe the information contained in the information system on multiple levels of abstraction. When using our approach to the formulation of queries, a user will first formulate a number of simple queries corresponding to linear paths through the information structure. The formulation of the linear paths is the result of the explorative phase of query formulation. Once users have specified a number of these linear paths, they may combine them to form more complex queries. This last process is referred to as query by construction and corresponds to the constructive phase of the query formulation process.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a method to solve the problem of homonymity of homophily in the context of homomorphic data, and no abstracts are available.
Abstract: No abstract available.

Proceedings Article
03 Sep 1996
TL;DR: This work presents a technique for query decomposition, under which the query is shipped exactly once to every site, computed locally, then the local results are shipped to the client, and assembled here into the final result.
Abstract: Recently, several query languages have been proposed for querying information sources whose data is not constrained by a schema, or whose schema is unknown. Examples include LOREL (for querying data combined from several heterogeneous sources), W3QS (for querying the World Wide Web), and UnQL (for querying unstructured data). The natural data model for such languages is that of a rooted, labeled graph. Their main novelty is the ability to express queries which traverse arbitrarily long paths in the graph, typically described by a regular expression. Such queries, however, may prove difficult to evaluate when the data is distributed on several sites, with many edges going between sites. A typical case is that of a collection of WWW sites, with links pointing freely from one site to another (even forming cycles). A naive query shipping strategy may force the query to migrate back and forth between the various sites, leading to poor performance (or even non-termination). We present a technique for query decomposition, under which the query is shipped exactly once to every site, computed locally, then the local results are shipped to the client, and assembled here into the final result. This technique is efficient, in that (a) only data which is part of the final result is shipped from the data sites to the client site, and (b) the total work done locally at all sites does not exceed that needed for computing the (unoptimized) query on a centralized version of the database. We also show that the query decomposition technique can be adapted to derive a simple view maintenance method, for two forms of updates which we introduce for the graph data model.
Proceedings of the 22nd VLDB Conference, Mumbai (Bombay), India, 1996
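For illustration only, the kind of query described in this abstract (a regular path expression evaluated over a rooted, labeled graph) can be sketched on a single machine as a breadth-first search over pairs of graph node and automaton state. This is not the paper's distributed decomposition technique. The toy graph, the hand-written automaton for the expression `link*.title`, and the function name `eval_path_query` are all assumptions made for the example.

```python
from collections import deque


def eval_path_query(graph, root, nfa, start_state, accepting_states):
    """Return the nodes reachable from `root` along a path accepted by the automaton.

    graph: dict mapping node -> list of (edge_label, target_node)
    nfa:   dict mapping (state, edge_label) -> set of next states
    """
    seen = {(root, start_state)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting_states:
            answers.add(node)
        for label, target in graph.get(node, []):
            for next_state in nfa.get((state, label), ()):
                pair = (target, next_state)
                if pair not in seen:  # each (node, state) pair is expanded once
                    seen.add(pair)
                    queue.append(pair)
    return answers


# Hypothetical data: pages "a" and "b" link to each other, forming a cycle.
graph = {
    "root": [("link", "a")],
    "a": [("link", "b")],
    "b": [("link", "a"), ("title", "t")],
}
# Hand-built automaton for link*.title: q0 loops on "link", "title" leads to accepting q1.
nfa = {("q0", "link"): {"q0"}, ("q0", "title"): {"q1"}}

result = eval_path_query(graph, "root", nfa, "q0", {"q1"})
```

Recording visited (node, state) pairs is what keeps this search finite on cyclic link structures, the situation the abstract associates with non-termination under naive query shipping.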

Patent
06 Aug 1996
TL;DR: In this paper, the authors propose a contention protocol in which two nodes connected to a span make provisional allocation for one or more links simultaneously to different restoration routes, in which the higher ranking of the two nodes knows that its provisional allocation will be confirmed, and the lower ranking node knows that it must send a backtrack signature for the capacity that is not available.
Abstract: A method of determining a restoration route (or an additional route) in a fully or partially meshed communications network of nodes, in which the step of sending a route-finder signature from a node to a neighbouring node on one of a plurality of spare links of a span to the neighbouring node comprises the prior substeps of: determining whether the node has a higher or a lower ranking network node identity than that of the neighbouring node; and if it is higher ranking, sending the route-finder signature on the spare link corresponding to the lowest ranking of the node ports associated with said span; or if it is lower ranking, sending the route-finder signature on the spare link corresponding to the highest ranking of the node ports associated with said span. Any contention which occurs because the two nodes connected to a span make provisional allocation for one or more links simultaneously to different restoration routes is dealt with by a contention protocol in which the higher ranking of the two nodes knows that its provisional allocation will be confirmed, and the lower ranking of the two nodes knows that it must send a backtrack signature for the capacity that is not available.
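The port-selection and contention rules in this abstract can be sketched as two small functions. This is a minimal reading of the abstract, not the patented implementation: node identities and port ranks are assumed to be integers where a larger value means a higher ranking, and the names `pick_spare_port` and `resolve_contention` are hypothetical.

```python
def pick_spare_port(node_id, neighbour_id, spare_port_ranks):
    """Choose the local port on which to send the route-finder signature.

    spare_port_ranks: ranks of this node's ports that carry a spare link
    on the span to the neighbour (assumed: larger int = higher ranking).
    """
    if not spare_port_ranks:
        raise ValueError("no spare links available on this span")
    if node_id == neighbour_id:
        raise ValueError("node identities must differ")
    if node_id > neighbour_id:
        # Higher-ranking node: use the lowest-ranking port.
        return min(spare_port_ranks)
    # Lower-ranking node: use the highest-ranking port.
    return max(spare_port_ranks)


def resolve_contention(node_id, neighbour_id):
    """Return (confirming_node, backtracking_node) for a contended span.

    The higher-ranking node keeps its provisional allocation; the
    lower-ranking node must send a backtrack signature.
    """
    if node_id == neighbour_id:
        raise ValueError("node identities must differ")
    return max(node_id, neighbour_id), min(node_id, neighbour_id)
```

Because each end of the span applies the opposite rule (lowest-ranking port at one end, highest-ranking at the other) and the winner of any remaining clash is fixed in advance by node identity, neither node needs an extra negotiation round to learn the outcome.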

Journal Article
TL;DR: This work defines an execution space that is search-minimal, i.e., the set of indexes searched is minimal, and shows that the problem of optimizing queries that ask for a few top-ranked objects can be viewed, in many cases, as that of evaluating selection conditions.
Abstract: Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A selection on these attributes will typically produce not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated with each object, indicating how well the object matches the selection condition (ranking). Also, multimedia repositories may allow access to the attributes of each object only through indexes. We investigate how to optimize the processing of queries over multimedia repositories. A key issue is the choice of the indexes used to search the repository. We define an execution space that is search-minimal, i.e., the set of indexes searched is minimal. Although the general problem of picking an optimal plan in the search-minimal execution space is NP-hard, we solve the problem efficiently when the predicates in the query are independent. We also show that the problem of optimizing queries that ask for a few top-ranked objects can be viewed, in many cases, as that of evaluating selection conditions. Thus, both problems can be viewed together as an extended filtering problem.
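The distinction between filtering and ranking drawn in this abstract, and the remark that a top-ranked query can often be treated as a selection condition, can be illustrated with a toy example. This is only a sketch under assumptions: the grades, object names, and function names are hypothetical, and the threshold is read off after the fact, whereas the paper's optimizer (not described in the abstract) would have to choose it in advance.

```python
import heapq


def filter_by_grade(graded_objects, threshold):
    """Selection condition: keep objects whose grade of match meets the threshold."""
    return [(obj, grade) for obj, grade in graded_objects if grade >= threshold]


def top_k(graded_objects, k):
    """Ranking query: the k objects with the highest grade of match."""
    return heapq.nlargest(k, graded_objects, key=lambda pair: pair[1])


# Hypothetical grades of match produced by some index lookup.
graded = [("img1", 0.2), ("img2", 0.9), ("img3", 0.6), ("img4", 0.75)]

best_two = top_k(graded, 2)
# The grade of the last top-k object acts as the equivalent selection threshold.
threshold = best_two[-1][1]
same_objects = filter_by_grade(graded, threshold)
```

With no ties in the grades, the filter with that threshold returns the same two objects as the ranking query, which is the sense in which both can be handled as an extended filtering problem.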


Patent
06 Dec 1996
TL;DR: An information retrieving device ranks retrieved results by taking the reliability of each information source into account, and presents the results to the user in priority order.
Abstract: PROBLEM TO BE SOLVED: To provide an information retrieving device with which a user can quickly extract the intended information, by ranking retrieved results with the reliability of the information source taken into account and presenting them to the user. SOLUTION: An information retrieving part 3 searches a database 2 according to a retrieval request that the user designates through a retrieval request designating part 1. It transmits the retrieved information and its conformity with the retrieval request to a priority calculating part 6, and transmits the information source to an information source reliability table reading part 5. The information source reliability table reading part 5 reads the reliability of the transmitted information source from an information source reliability table 4, and transmits the reliability to the priority calculating part 6. The priority calculating part 6 then calculates the priority of each retrieved result from the transmitted conformity and the reliability of the information source, and a retrieved result presenting part 7 presents the retrieved results in priority order. COPYRIGHT: (C)1998,JPO
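The data flow in this abstract (look up a source's reliability, combine it with the conformity score, present in priority order) can be sketched as follows. The abstract does not say how the two values are combined, so the product used here is purely an assumption, as are the default reliability for unknown sources and the name `rank_by_priority`.

```python
def rank_by_priority(results, reliability_table, default_reliability=0.5):
    """Order retrieved results by a priority derived from conformity and source reliability.

    results: list of (item, source, conformity) tuples
    reliability_table: dict mapping source -> reliability
    Assumption: priority = conformity * reliability (not stated in the abstract).
    """
    scored = []
    for item, source, conformity in results:
        # Table lookup; sources missing from the table get the assumed default.
        reliability = reliability_table.get(source, default_reliability)
        scored.append((conformity * reliability, item, source))
    # Highest priority first.
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return scored


# Hypothetical retrieved results and reliability table.
results = [("doc A", "blog", 0.9), ("doc B", "journal", 0.6)]
reliability_table = {"blog": 0.5, "journal": 1.0}

ranked = rank_by_priority(results, reliability_table)
```

In this toy input, "doc B" matches the request less closely than "doc A" but comes from the more reliable source, so it is presented first.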

Journal ArticleDOI
TL;DR: In this paper, the formal similarity between the problem of ranking opportunity sets and an Arrovian aggregation problem is demonstrated and then exploited to axiomatize several rules for ranking opportunity sets.

Proceedings ArticleDOI
12 Nov 1996
TL;DR: A method for predicting the cost of a first-answer query plan under an execution model that attempts to reduce wasted effort in join pipelining is provided and a probabilistic technique for predicting query-plan cost under this modified pipelined join execution model is presented.
Abstract: Special support for quickly finding the first-few answers of a query is already appearing in commercial database systems. This support is useful in active databases, when dealing with potentially unmanageable query results, and as a declarative alternative to navigational techniques. In this paper, we discuss query processing techniques for first-answer queries. We provide a method for predicting the cost of a first-answer query plan under an execution model that attempts to reduce wasted effort in join pipelining. We define new statistics necessary for accurate cost prediction, and discuss techniques for obtaining the statistics through traditional statistical measures (e.g. selectivity) and semantic data properties commonly specified through modern OODB and relational schemas. The proposed techniques also apply to all-answer query processing when optimizing for fast delivery of the initial query results.
1 Introduction
Traditional methods for query processing, primarily those based on the relational model, process queries with the goal of materializing the set of all answer tuples with minimal cost. Several applications instead require only the first answer or first-few answers to particular queries, or require the first answers of a query to be delivered as quickly as possible. This is evidenced by increasing support for first-answer query optimization in modern relational systems [11, 16]. First-answer query support is also important in active databases based on production system models, where fast match algorithms lazily enumerate answers to a query one at a time [15]. Object-oriented database systems and knowledge-representation systems support complex structures allowing data to be retrieved through navigation as well as querying. Navigation is often preferable over querying for locating a single object since query engines, usually geared around set-oriented constructs, inevitably touch more data than necessary. A declarative query language with first-answer support can enable more understandable code than navigation, and potentially faster retrieval due to cost-based optimization. Finally, there will always be cases when producing the entire query result is simply too costly. Various search engines (including those for the world wide web) provide functionality for lazily enumerating answers in case of overly general search criteria. In this domain one might argue that all-answer query responses may take infinitely long or an input "table" may be a stream with no known end. Thus only depth-first, first-solution methods are applicable.
This paper presents our work on query processing techniques specifically geared for optimizing and executing first-answer join queries. The techniques also apply to optimizing all-answer queries when the goal is to minimize latency of first-answer delivery instead of overall throughput. The analysis is independent of any storage model, and therefore applies should the database be disk resident, main memory resident, or distributed. We begin by providing a modified pipelined join algorithm that remedies performance problems sometimes exhibited by naive join pipelining. We then present a probabilistic technique for predicting query-plan cost under this modified pipelined join execution model. Though the cost-estimation technique requires database statistics not typically maintained by traditional centralized database systems, the statistics are derivable from those commonly maintained by distributed query processors. We also show how they can often be derived or estimated from selectivity information and semantic information often specified in the form of cardinality constraints (such as existence and functional dependencies) in modern relational, object-oriented, and knowledge-base systems.
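The idea of lazily enumerating answers through a join pipeline can be sketched with Python generators. This shows only naive pipelining, which the text says can perform poorly; the paper's modified pipelined join and its cost model are not described in enough detail here to reproduce. The tables, the `touched` counter, and all function names are hypothetical.

```python
from itertools import islice


def scan(table, touched):
    """Yield rows one at a time, counting how many rows were actually read."""
    for row in table:
        touched[0] += 1
        yield row


def pipelined_join(outer_rows, inner_table, outer_key, inner_key, touched):
    """Naive pipelined nested-loop join: emits each match as soon as it is found."""
    for outer in outer_rows:
        for inner in scan(inner_table, touched):
            if outer[outer_key] == inner[inner_key]:
                yield {**outer, **inner}


def first_answers(plan, k=1):
    """Pull only the first k answers from a lazy plan and then stop."""
    return list(islice(plan, k))


# Hypothetical tables.
customers = [{"cid": 1, "name": "a"}, {"cid": 2, "name": "b"}]
orders = [{"oid": 10, "cid": 1}, {"oid": 11, "cid": 2}, {"oid": 12, "cid": 1}]

touched = [0]
plan = pipelined_join(scan(customers, touched), orders, "cid", "cid", touched)
first = first_answers(plan, 1)
```

In this toy run the first answer is produced after reading one customer row and one order row, whereas materializing the full join would read both customers and scan the orders table once per customer.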

Patent
25 Sep 1996
TL;DR: For use in a system for retrieval of legal information by data processing systems, methods for matching a case identified in the parsing (210, 212) of a legal text within a database (15-16) of case references (16) and methods for ranking (316) the relevance of cases matching (216) a search query are presented in this paper.
Abstract: For use in a system for retrieval of legal information by data processing systems, methods for matching a case identified in the parsing (210, 212) of a legal text within a database (15-16) of case references (16), and methods for ranking (316) the relevance of cases matching (216) a search query.