Showing papers on "Ranking (information retrieval)" published in 1997


Patent
Kelly Wical
21 May 1997
TL;DR: A knowledge base search and retrieval system supporting both factual and concept knowledge base queries is disclosed; its knowledge base stores associations among terminology/categories that have a lexical, semantic, or usage association.
Abstract: A knowledge base search and retrieval system, which includes factual knowledge base queries and concept knowledge base queries, is disclosed. A knowledge base stores associations among terminology/categories that have a lexical, semantical or usage association. Document theme vectors identify the content of documents through themes as well as through classification of the documents in categories that reflects what the documents are primarily about. The factual knowledge base queries identify, in response to an input query, documents relevant to the input query through expansion of the query terms as well as through expansion of themes. The concept knowledge base query does not identify specific documents in response to a query, but specifies terminology that identifies the potential existence of documents in a particular area.

468 citations


Proceedings Article
01 Jul 1997
TL;DR: This paper studies the importance of phrasal translation for dictionary-based cross-language retrieval and shows how phrases in query expansion via local context analysis and local feedback can significantly reduce the error associated with automatic dictionary translation.
Abstract: Dictionary methods for cross-language information retrieval give performance below that for mono-lingual retrieval. Failure to translate multi-term phrases has been shown to be one of the factors responsible for the errors associated with dictionary methods. First, we study the importance of phrasal translation for this approach. Second, we explore the role of phrases in query expansion via local context analysis and local feedback and show how they can be used to significantly reduce the error associated with automatic dictionary translation.

394 citations


Patent
05 Feb 1997
TL;DR: A search engine indexes documents according to the hyperlinks pointing to them: the indexer traverses the hypertext database and records, for each hyperlink, the address of the document it points to and its anchor text.
Abstract: A search engine for retrieving documents pertinent to a query indexes documents in accordance with hyperlinks pointing to those documents. The indexer traverses the hypertext database and finds hypertext information including the address of the document the hyperlinks point to and the anchor text of each hyperlink. The information is stored in an inverted index file, which may also be used to calculate document link vectors for each hyperlink pointing to a particular document. When a query is entered, the search engine finds all document vectors for documents having the query terms in their anchor text. A query vector is also calculated, and the dot product of the query vector and each document link vector is calculated. The dot products relating to a particular document are summed to determine the relevance ranking for each document.

373 citations
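
The ranking step in the abstract above (a dot product between the query vector and each document link vector, summed per document) might be sketched as follows. This is only an illustration: the abstract does not say how vector components are weighted, so raw term counts are assumed here, and the names `rank_by_anchor_text`, `to_vector`, and the example URLs are hypothetical.

```python
from collections import defaultdict

def tokenize(text):
    # Assumption: lowercase whitespace tokenization.
    return text.lower().split()

def to_vector(tokens):
    # Assumption: vector components are raw term counts.
    vec = defaultdict(int)
    for token in tokens:
        vec[token] += 1
    return dict(vec)

def dot(a, b):
    return sum(weight * b.get(term, 0) for term, weight in a.items())

def rank_by_anchor_text(hyperlinks, query):
    """hyperlinks: list of (target_address, anchor_text) pairs."""
    query_vec = to_vector(tokenize(query))
    scores = defaultdict(int)
    for target, anchor in hyperlinks:
        link_vec = to_vector(tokenize(anchor))
        score = dot(query_vec, link_vec)
        if score > 0:
            # Dot products for links pointing at the same document are summed.
            scores[target] += score
    # Highest summed score first; ties broken by address for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

links = [
    ("http://a.example/", "cheap flights"),
    ("http://a.example/", "flights to Paris"),
    ("http://b.example/", "Paris hotels"),
    ("http://c.example/", "gardening tips"),
]
ranking = rank_by_anchor_text(links, "Paris flights")
```

In this toy data the first address collects score from two separate links, which is the effect of summing per-link dot products.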


Patent
Kelly Wical
21 May 1997
TL;DR: A knowledge base search and retrieval system supporting both factual and concept knowledge base queries is disclosed; its knowledge base stores associations among terminology/categories that have a lexical, semantic, or usage association.
Abstract: A knowledge base search and retrieval system, which includes factual knowledge base queries and concept knowledge base queries, is disclosed. A knowledge base stores associations among terminology/categories that have a lexical, semantical or usage association. Document theme vectors identify the content of documents through themes as well as through classification of the documents in categories that reflects what the documents are primarily about. The factual knowledge base queries identify, in response to an input query, documents relevant to the input query through expansion of the query terms as well as through expansion of themes. The concept knowledge base query does not identify specific documents in response to a query, but specifies terminology that identifies the potential existence of documents in a particular area.

333 citations


Proceedings Article
01 Jul 1997
TL;DR: This paper compares a scheme of arbitrary passage retrieval to several other document retrieval and passage retrieval methods and shows experimentally that, compared to these methods, ranking via fixed-length passages is robust and effective.
Abstract: Ranking based on passages addresses some of the shortcomings of whole-document ranking. It provides convenient units of text to return to the user, avoids the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material amongst otherwise irrelevant text. In this paper we explore the potential of passage retrieval, based on an experimental evaluation of the ability of passages to identify relevant documents. We compare our scheme of arbitrary passage retrieval to several other document retrieval and passage retrieval methods; we show experimentally that, compared to these methods, ranking via fixed-length passages is robust and effective. Our experiments also show that, compared to whole-document ranking, ranking via fixed-length arbitrary passages significantly improves retrieval effectiveness, by 8% for TREC disks 2 and 4 and by 18%-37% for the Federal Register collection.

299 citations
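
A minimal sketch of ranking by fixed-length arbitrary passages, under stated assumptions: a passage may start at any word, a document is scored by its best passage, and the passage score is a simple count of query-term occurrences. The abstract above does not give the similarity measure or the combination rule, so both are placeholders, and `best_passage_score` is a hypothetical name.

```python
def best_passage_score(doc_tokens, query_terms, length):
    """Score a document by its best fixed-length window of `length` words."""
    if not doc_tokens:
        return 0
    query = set(query_terms)
    # A passage may start at any word; short documents form a single passage.
    last_start = max(1, len(doc_tokens) - length + 1)
    best = 0
    for start in range(last_start):
        window = doc_tokens[start:start + length]
        # Placeholder similarity: number of query-term occurrences in the window.
        best = max(best, sum(1 for token in window if token in query))
    return best

query = ["passage", "ranking"]
# Both query terms close together inside a long document.
doc_dense = ("filler " * 50 + "passage ranking " + "filler " * 50).split()
# Both query terms present but far apart.
doc_sparse = ("passage " + "filler " * 60 + "ranking").split()

dense_score = best_passage_score(doc_dense, query, length=10)
sparse_score = best_passage_score(doc_sparse, query, length=10)
```

The two documents contain the same query terms, but only the first has them inside one short block, which is the case passage ranking is meant to reward.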


Journal Article
TL;DR: This article describes and evaluates SavvySearch, a metasearch engine, studying how its incrementally acquired metaindex approach to selecting search engines performs with time and experience, and how much experience is required to surpass a simpler categorical approach.
Abstract: Search engines are among the most useful and high-profile resources on the Internet. The problem of finding information on the Internet has been replaced with the problem of knowing where search engines are, what they are designed to retrieve, and how to use them. This article describes and evaluates SavvySearch, a metasearch engine designed to intelligently select and interface with multiple remote search engines. The primary metasearch issue examined is the importance of carefully selecting and ranking remote search engines for user queries. We studied the efficacy of SavvySearch's incrementally acquired metaindex approach to selecting search engines by analyzing the effect of time and experience on performance. We also compared the metaindex approach to the simpler categorical approach and showed how much experience is required to surpass the simple scheme.

270 citations


Patent
26 Mar 1997
TL;DR: A query composed of sub queries, each possibly of a different media type, is used to search a collection of multimedia documents in a database; the interim results are combined in a global result object that is processed according to a user specification.
Abstract: A query comprising sub queries, each of which could be of a different media type, is used to search a collection of multimedia documents in a database. These sub queries are parsed according to media type, and the operators/functions between these sub queries are recorded, creating a set of query objects and query operator objects. The query interface then passes the query objects to the appropriate application programming interfaces (APIs) of the various search engines. Furthermore, it applies the query object operators to the respective interim results obtained by executing a query object. Then the interim results are combined in a global result object that is processed using a user specification to produce a single combined result list that conforms to user specified requirements.

261 citations


Patent
John S. Breese, David Heckerman, Eric Horvitz, Carl M. Kadie, Keiji Kanazawa
28 Feb 1997
TL;DR: Search results are re-ranked using estimates of the probability that each item is already known to the user: discounting or adjusting the ranking values generated by a known search engine as a function of these knowledge probability estimates reduces or eliminates the risk of placing known information near the top of a list of search results.
Abstract: Information retrieval methods and apparatus which involve: 1) the generation of estimates regarding the probability that items included in search results are already known to the user and 2) the use of such knowledge probability estimates to influence the ranking of search results, are described. By discounting the ranking, or adjusting ranking values generated by a known search engine as a function of the knowledge probability estimates, the present invention reduces or eliminates the risk of locating known information near the top of a list of search results. This is advantageous since known information is generally of little interest to a user. In various embodiments the popularity of an item is used to estimate the probability that the item is already known to a user. In addition, in various embodiments one or more user controllable parameters are used in the generation of the knowledge probability estimates and/or the ranking of the search results to give the user an opportunity to have the ranking of the search results accurately reflect the user's knowledge. The present invention is particularly well suited to collaborative filtering based search systems. This is because collaborative filters make recommendations to a user based on historical information relating to, e.g., the popularity of items being considered for recommendation. This same popularity information can be used to estimate a user's knowledge of a database item. Such items may include television shows, music, Internet sites, etc.

257 citations
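
The discounting idea in the abstract above can be illustrated with a small sketch. The specific formula is an assumption made for illustration only: the probability that an item is known is taken to be its popularity scaled by a user-controllable `strength` parameter, and the adjusted score is the original score times one minus that probability. The patent abstract does not specify either choice, and `discount_known` is a hypothetical name.

```python
def discount_known(results, popularity, strength=1.0):
    """Re-rank (item, score) pairs, discounting items the user likely knows.

    popularity maps an item to a fraction in [0, 1]; strength is a
    hypothetical user-controllable parameter (0 disables the discount).
    """
    adjusted = []
    for item, score in results:
        # Assumed estimate: probability of being known grows with popularity.
        p_known = min(1.0, max(0.0, strength * popularity.get(item, 0.0)))
        adjusted.append((item, score * (1.0 - p_known)))
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)

results = [("hit show", 0.9), ("obscure show", 0.6)]
popularity = {"hit show": 0.8, "obscure show": 0.1}

reranked = discount_known(results, popularity)
unchanged = discount_known(results, popularity, strength=0.0)
```

With the discount applied, the less popular item moves to the top even though its original score was lower; with `strength=0.0` the original order is kept.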


25 Jun 1997
TL;DR: It is discovered that once a good basic ranking scheme is being used, the use of phrases does not have a major effect on precision at high ranks, and phrases are more useful at lower ranks where the connection between documents and relevance is more tenuous.
Abstract: As the amount of textual information available through the World Wide Web grows, there is a growing need for high-precision IR systems that enable a user to find useful information from the masses of available textual data. Phrases have traditionally been regarded as precision-enhancing devices and have proved useful as content-identifiers in representing documents. In this study, we compare the usefulness of phrases recognized using linguistic methods and those recognized by statistical techniques. We focus in particular on high-precision retrieval. We discover that once a good basic ranking scheme is being used, the use of phrases does not have a major effect on precision at high ranks. Phrases are more useful at lower ranks where the connection between documents and relevance is more tenuous. Also, we find that the syntactic and statistical methods for recognizing phrases yield comparable performance.

251 citations


Patent
08 Apr 1997
TL;DR: A method and system are presented for assisting a user in solving a new problem case within a selected domain, such as a complex apparatus; the method provides a case database comprising domain knowledge for the selected domain and previously solved cases, each including a plurality of case attributes comprising attribute names and associated values.
Abstract: A method and system for assisting a user in solving a new problem case within a selected domain, such as a complex apparatus. The method comprises the steps of providing a case database comprising domain knowledge for the selected domain and previously solved cases, each previously solved case including a plurality of case attributes, said case attributes comprising case attribute names and associated values, prompting the user to select from the case attributes a set of new problem case attributes considered to be relevant to the new problem case and to provide current values for each of the new problem case attributes, searching the database of solved cases for candidate solved cases that have one or more of the new problem case attributes selected by the user and generating a list of said candidate solved cases, matching the candidate solved cases to the new problem case by comparing the value for each of the case attributes in the new problem case to the value for the same case attribute in each of the candidate solved cases, ranking the candidate solved cases in descending order of similarity and presenting a list of candidate solved cases in order of relevance based upon the ranking, generating additional questions based upon unanswered attributes of the candidate solved cases for which values have not yet been provided by the user, to assist the user to select and provide values for the unanswered attributes and thereby appropriately order the candidate solved cases; and repeating the above steps until the user is satisfied with the list of candidate solved cases.

236 citations


Patent
Kelly Wical
21 May 1997
TL;DR: A knowledge base search and retrieval system supporting both factual and concept knowledge base queries is disclosed; its knowledge base stores associations among terminology/categories that have a lexical, semantic, or usage association.
Abstract: A knowledge base search and retrieval system, which includes factual knowledge base queries and concept knowledge base queries, is disclosed. A knowledge base stores associations among terminology/categories that have a lexical, semantical or usage association. Document theme vectors identify the content of documents through themes as well as through classification of the documents in categories that reflects what the documents are primarily about. The factual knowledge base queries identify, in response to an input query, documents relevant to the input query through expansion of the query terms as well as through expansion of themes. The concept knowledge base query does not identify specific documents in response to a query, but specifies terminology that identifies the potential existence of documents in a particular area.

Patent
Wen-Syan Li, Kasim Selcuk Candan
29 Aug 1997
TL;DR: A computer-implemented method is presented for searching and retrieving images from a database of images using both semantic and cognitive methodologies; it successively refines the search with both methodologies and then ranks the results for presentation to the user.
Abstract: A computer implemented method for searching and retrieving images contained within a database of images in which both semantic and cognitive methodologies are utilized. The method accepts a semantic and cognitive description of an image to be searched from a user, and successively refines the search utilizing semantic and cognitive methodologies and then ranking the results for presentation to the user.

Journal Article
01 Sep 1997
TL;DR: This work ranks the nodes returned by a keyword-based query by examining the links among them, finding “hot spots” on the Web that contain information germane to a user's query, and extends the result set with “interesting” sites that are highly connected to the returned ones.
Abstract: Finding information located somewhere on the World-Wide Web is an error-prone and frustrating task. The WebQuery system offers a powerful new method for searching the Web based on connectivity and content. We do this by examining links among the nodes returned in a keyword-based query. We then rank the nodes, giving the highest rank to the most highly connected nodes. By doing so, we are finding “hot spots” on the Web that contain information germane to a user's query. WebQuery not only ranks and filters the results of a Web query, it also extends the result set beyond what the search engine retrieves, by finding “interesting” sites that are highly connected to those sites returned by the original query. Even with WebQuery filtering and ranking query results, the result sets can be enormous. So, we need to visualize the returned information. We explore several techniques for visualizing this information, including cone trees, 2D graphs, 3D graphs, lists, and bullseyes, and discuss the criteria for using each of the techniques.
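
As a rough illustration of ranking by connectivity, the sketch below counts, for each node returned by a query, the links it shares with other returned nodes and sorts by that count. Treating connectivity as the number of incoming plus outgoing links within the result set is an assumption; the abstract above does not define the measure, and the step that extends the result set is not covered. `rank_by_connectivity` is a hypothetical name.

```python
def rank_by_connectivity(result_urls, links):
    """Order query results so the most highly connected nodes come first.

    links: iterable of (source, target) pairs; only links whose two ends
    are both in the result set are counted.
    """
    nodes = set(result_urls)
    degree = {url: 0 for url in nodes}
    for source, target in set(links):  # ignore repeated identical links
        if source in nodes and target in nodes and source != target:
            degree[source] += 1
            degree[target] += 1
    # Most connected first; ties broken alphabetically for determinism.
    return sorted(nodes, key=lambda url: (-degree[url], url))

results = ["a", "b", "c", "d"]
links = [("a", "b"), ("c", "b"), ("d", "b"), ("a", "c"), ("b", "x")]
ranked = rank_by_connectivity(results, links)
```

Node `b` is linked with every other result and ranks first; the link to `x` is ignored because `x` was not returned by the query.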

Patent
20 May 1997
TL;DR: A method is disclosed for optimizing the cost of searches through a multimedia repository whose objects have at least two different attributes, such as color in a newspaper photograph and text in the subtitle.
Abstract: A method for optimizing the cost of searches through a multimedia repository is disclosed where the repository contains a plurality of objects having at least two different attributes such as color in a newspaper photograph and text in the subtitle. The method comprises selecting a ranking expression, translating the ranking expression into resulting filter conditions and then optimizing the resulting filter conditions to perform the search. A database look-up step is included which determines the cost of performing searches over the various subconditions of the filter condition. The least costly subcondition is searched first to retrieve objects from the multimedia repository. The remaining subconditions are then evaluated on the retrieved objects using either a search step or probe step depending upon the determined cost to perform each. A further database look-up step predicts a grade of match necessary in the translated ranking expression to retrieve at least the number of objects requested in the search.

Patent
Seema Prasad
26 Nov 1997
TL;DR: An automated system optimizes the selection of sources in a distributed information system for query searching; a training set of documents is created for each source by randomly selecting significant portions of its documents.
Abstract: In an information retrieval system, an automated system optimizes selection of sources in a distributed information system for query searching. A training set of documents is created for each source by randomly selecting significant portions of the documents thereof. A test set of documents is created for each source from the documents not included in the training set. Each document in the training and test set is defined in terms of features/attributes and a name as samples representing individual sources. Pattern recognizing means process the samples to recognize patterns in the documents to distinguish one source from another source. Rule generating means provide a set of DNF rules from the patterns as a model representing each source. The test set of documents is expressed in terms of DNF rules. Evaluating means create a final classification model after minimizing any error between the DNF rules for the training and test sets. Query means enable a user to express a query in terms of features/attributes and DNF rules which when applied to the final model automatically select the optimal sources for query searching. The sources may also be expressed in taxonomic groupings which reduces the number of data sources and speeds query searching on a distributive information network by a user.

Proceedings Article
01 Apr 1997
TL;DR: It is argued that delegating the task of meta-data collection to local index servers is a more scalable approach, and a mechanism for integrating distributed autonomous index servers into a cooperative resource discovery system is proposed.
Abstract: Keyword-based search services have become necessary tools for finding information resources on the Internet today. The rapid growth of information on the Internet renders centralized keyword index services incapable of collecting comprehensive resource meta-data in a timely manner. We argue that delegating the task of meta-data collection to local index servers is a more scalable approach. We propose a mechanism for integrating distributed autonomous index servers into a cooperative resource discovery system. Focusing on the retrieval effectiveness of the system, we propose a set of methods, called CVV-based methods, for ranking and selecting index servers with respect to a query, and merging the results returned by the index servers. Through experiments, we evaluate the effectiveness of the CVV-based methods, and compare our server ranking method with methods proposed by other researchers.

Patent
Kelly Wical
12 Nov 1997
TL;DR: The search and retrieval system includes point of view gists for documents, each providing a synopsis of the corresponding document with a slant toward a topic; a research mode uses them to generate a research document that infers an answer to a query from multiple documents.
Abstract: A research mode in a search and retrieval system generates a research document that infers an answer to a query from multiple documents. The search and retrieval system includes point of view gists for documents to provide a synopsis for a corresponding document with a slant toward a topic. To generate a research document, the search and retrieval system processes a query to identify one or more topics related to the query, selects document themes relevant to the query, and then selects point of view gists, based on the document themes, that have a slant towards the topics related to the query. A knowledge base, which includes categories arranged hierarchically, is configured as a directed graph to link those categories having a lexical, semantic or usage association. Through use of the knowledge base, an expanded set of query terms is generated, and research documents are compiled that include point of view gists relevant to the expanded set of query terms. A content processing system, which identifies the themes for a document and classifies the document themes in categories of the knowledge base, is also disclosed.

Proceedings Article
03 Jan 1997
TL;DR: An experimental query interface is proposed that filters NL query statements for search predicates that are derived from constructs on conceptual schemas, thereby avoiding the computational difficulty with full-fledged NL parsing.
Abstract: Natural language (NL) interfaces for database query formulation have always been recognized as a much needed enhancement for end-users. Poor performances with earlier NL systems had led to a lull in research in this field. However, latter-day experiments and systems appear to be sufficiently more promising to warrant continued and further research in this area. This paper proposes an experimental query interface that filters NL query statements for search predicates that are derived from constructs on conceptual schemas, thereby avoiding the computational difficulty with full-fledged NL parsing. A prototype of the Conceptual Query Language (CQL) exists as a front-end to an Oracle relational DBMS.

Proceedings Article
01 Jul 1997
TL;DR: It is concluded that interactive query expansion has good potential, particularly for term sources that are poorer than relevance feedback, but it may be difficult for searchers to realise this potential without experience or training in term selection and free-text search strategies.
Abstract: In query expansion, terms from a source such as relevance feedback are added to the query. This often improves retrieval effectiveness but results are variable across queries. In interactive query expansion (IQE) the automatically-derived terms are instead offered as suggestions to the searcher, who decides which to add. There is little evidence of whether IQE is likely to be effective over multiple iterations in a large scale retrieval context, or whether inexperienced users can achieve this effectiveness in practice. These experiments address these two questions. A small but significant improvement in potential retrieval effectiveness is found. This is consistent across a range of topics. Inexperienced users’ term selections consistently fail to improve on automatic query expansion, however. It is concluded that interactive query expansion has good potential, particularly for term sources that are poorer than relevance feedback. But it may be difficult for searchers to realise this potential without experience or training in term selection and free-text search strategies.

Patent
13 Jun 1997
TL;DR: A method is proposed for classifying a document based on content within a class hierarchy, which comprises a plurality of category nodes stored within a tree data structure, each including a category name corresponding to a unique directory and a category definition comprising a set of defining terms.
Abstract: A method for classifying a document based on content within a class hierarchy. The class hierarchy comprises a plurality of category nodes stored within a tree data structure. Each of the plurality of category nodes includes a category name corresponding to a unique directory and a category definition comprising a set of defining terms. The class hierarchy is searched to determine appropriate categories for classification of the document. The document is then stored in directories corresponding to the categories selected for classification. If no categories are produced by the search, a system administrator is notified of the unsuccessful search.

Patent
Fujun Bi, Ran Li, Shaun Bliss, Reza Nojoomi, Hong Yan 
29 Sep 1997
TL;DR: A multi-element confidence matching system is proposed that compares a user's weighted, multi-element requirement against other users' offers, ranks the matches by search score, and returns those above a predetermined satisfaction level, so that a trader sees not just offers that exactly match their criteria but also ones that come close or fulfill part of, or more than, their needs.
Abstract: The present invention relates to a computer matching system used by a plurality of users and the method therefor, said system comprising a database; an offer creation program means for creating an entity for an offer input by each user in the database and storing said offer therein; and a search engine for comparing and matching a requirement input by a user with other users' offers stored in the database and returning matching results to said user. Advantageously, said requirement includes multiple elements as search criteria, and each of said elements is assigned a weight of importance, so that each matching result has a search score indicating the satisfaction level of said user; said search engine further performs ordering and ranking of said matching results according to their respective search scores, and only the matching results having search scores above a predetermined satisfaction level are returned to said user. Said multi-element confidence matching system can automatically provide the user or trader with the information he is interested in without the intervention of the trader, and give the user the maximum amount of information about offers which may meet their requirement, so that the trader can see not just offers which exactly match their criteria, but also ones which come close or can fulfill part of, or more than, their needs, and may thereby conduct the search efficiently.
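
The weighted multi-element matching described above could look roughly like the following sketch. The scoring rule is an assumption: an element counts as satisfied only on exact equality, and the search score is the satisfied weight divided by the total weight. The abstract does not define the score, and `match_offers` and the sample data are hypothetical.

```python
def match_offers(requirement, offers, threshold):
    """Rank offers against a weighted requirement.

    requirement: element -> (wanted_value, weight_of_importance)
    offers: offer_name -> {element: value}
    Only offers scoring above `threshold` are returned, best first.
    """
    total_weight = sum(weight for _, weight in requirement.values())
    if total_weight == 0:
        return []
    scored = []
    for name, offer in offers.items():
        # Assumed rule: an element is satisfied only by an exact value match.
        satisfied = sum(
            weight
            for element, (wanted, weight) in requirement.items()
            if offer.get(element) == wanted
        )
        score = satisfied / total_weight
        if score > threshold:
            scored.append((name, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

requirement = {
    "product": ("steel", 3),
    "port": ("Shanghai", 2),
    "currency": ("USD", 1),
}
offers = {
    "A": {"product": "steel", "port": "Shanghai", "currency": "EUR"},
    "B": {"product": "steel", "port": "Rotterdam", "currency": "USD"},
    "C": {"product": "copper", "port": "Shanghai", "currency": "USD"},
}
matches = match_offers(requirement, offers, threshold=0.6)
```

None of the three offers matches exactly, yet two of them are returned because they satisfy most of the weighted criteria, which is the partial-match behavior the abstract emphasizes.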

Patent
23 Sep 1997
TL;DR: A method presents clusters of documents in response to a search query, where documents within a cluster are judged related by comparing documents that match one or more query terms to determine how much they have in common with respect to terms appearing infrequently in the collection.
Abstract: A method of presenting clusters of documents in response to a search query where the documents within a cluster are determined to be related to one another. This relationship is assessed by comparing documents which match one or more terms in the query to determine the extent to which the documents have commonality with respect to terms appearing infrequently in the collection of documents. As a consequence, the cluster of documents represents a response or query result that is split across multiple documents. In a further variation the cluster can be constituted by a structured document and an unstructured document.

Patent
12 Sep 1997
TL;DR: A method is presented for selecting the database collections likely to be most relevant for document searching based on an ad hoc query, where each of the databases includes a plurality of documents.
Abstract: A method of selecting the likely most relevant database collections for document searching based on an ad hoc query where each of the databases includes a plurality of documents. Iterative collection selection processing of the databases is performed to obtain consistent relative-ranking collection selection results for each iteration. The method uses a collection selection query and performs the repetitive steps of determining an inverse collection frequency and a document frequency for each database; determining a ranking value for each database; and selecting a subset of the set of databases based on predetermined criteria dependent on the ranking value for each database. The method provides for automated and manual descriptions, boolean selection terms combined with soft terms, and uses term proximity, capitalization, phraseology and other information in establishing a relevance ranking of the collections with respect to the ad hoc query.

25 Jun 1997
TL;DR: This paper describes an internet search engine that helps users formulate queries by navigating a structured, automatically constructed information space called a hyperindex, which aids query term addition and deletion.
Abstract: Often queries to internet search engines consist of one or two terms. As a consequence, the effectiveness of the retrieval suffers. This paper describes an internet search engine that helps the user formulate their query by a process of navigation through a structured, automatically constructed, information space called a hyperindex. In the first part of this paper, the logs of an internet search engine were analyzed to determine the proportions with which different types of query transformation occur. It was found that the primary transformation type was repetition of the previous query. Users also substitute, add and delete terms from a previous query and with lower frequency split compound terms, make changes to spelling, punctuation, and case and use derivative forms of words and abbreviations. The second part of the paper details the hyperindex - which aids the user in query term addition and deletion. The architecture of a hyperindex-based internet search engine is presented. Some initial practical experiences are also discussed.

Patent
28 Jul 1997
TL;DR: Conversion precision is improved by determining the display order of KANA-KANJI (Chinese character) conversion candidates according to a noun phrase list when an inputted reading character string is converted into a KANJI-mixed character string.
Abstract: PROBLEM TO BE SOLVED: To improve conversion precision by determining the display order of candidates of KANA(Japanese syllabary)-KANJI(Chinese character) conversion according to a noun phrase list when an inputted reading character string is converted into a KANJI-mixed character string. SOLUTION: When an integrated document 208 is generated by an integrated document generation module 207, a natural language process module 200 generates its noun phrase list 203. A ranking engine 204 generates a ranking list 205 wherein sentences are rearranged according to ranking by weighting respective noun phrases in the noun phrase list 203 of the inputted integrated document 208 according to the importance in the integrated document 208, deciding the importance of each noun phrase in the integrated document 208 by using the weighting results of the respective noun phrases, and ranking the noun phrases so that noun phrases of high importance are in high positions. A KANA-KANJI conversion part 209 determines conversion candidates for the reading character string and the priority according to the ranking list 205.


Proceedings Article
25 Aug 1997
TL;DR: This paper presents a condition that a source must satisfy so that a meta-broker can extract the top objects for a query from the source without examining its entire contents, and shows an efficient algorithm to extract the top objects from sources that satisfy the given condition.
Abstract: Many sources on the Internet and elsewhere rank the objects in query results according to how well these objects match the original query. For example, a real-estate agent might rank the available houses according to how well they match the user's preferred location and price. In this environment, "meta-brokers" usually query multiple autonomous, heterogeneous sources that might use varying result-ranking strategies. A crucial problem that a meta-broker then faces is extracting from the underlying sources the top objects for a user query according to the meta-broker's ranking function. This problem is challenging because these top objects might not be ranked high by the sources where they appear. In this paper we discuss strategies for solving this "meta-ranking" problem. In particular, we present a condition that a source must satisfy so that a meta-broker can extract the top objects for a query from the source without examining its entire contents. Not only is this condition necessary but it is also sufficient, and we show an efficient algorithm to extract the top objects from sources that satisfy the given condition.

Book ChapterDOI
01 Jan 1997

Patent
Christopher A. Meek
17 Apr 1997
TL;DR: In this article, a method and system for specifying a selection query for a collection of data items is presented, which allows a user to define various conditions (e.g., "Supervisor=Smith") that relate to the collection.
Abstract: A method and system for specifying a selection query for a collection of data items. The system allows a user to define various conditions (e.g., "Supervisor=Smith") that relate to the collection. A unique icon is then assigned to represent each condition. These icons can either be assigned automatically by the system or assigned by a user. When a selection query is to be specified, the system displays a selection query grid. The selection query grid contains a row for each possible combination of the defined conditions. Each possible combination is represented by displaying the icons for the conditions in that combination in the row. A user can then select which combinations should form the selection query by selecting rows of the selection query grid. The selection query is the logical-AND of each condition or logical inverse of each condition of a selected combination and the logical-OR of all the selected combinations. The system then uses this selection query to retrieve the data items from the collection.
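The query construction described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `build_grid` and `make_query`, the boolean-tuple representation of a grid row, and the sample employee records are all hypothetical, and the icon display is omitted.

```python
from itertools import product


def build_grid(conditions):
    """One row per combination; True means the condition, False its inverse."""
    return list(product([True, False], repeat=len(conditions)))


def make_query(conditions, selected_rows):
    """OR over the selected rows; each row is an AND over its conditions."""
    def query(item):
        return any(
            all(cond(item) == wanted for cond, wanted in zip(conditions, row))
            for row in selected_rows
        )
    return query


# Hypothetical collection and conditions for illustration only.
employees = [
    {"name": "Ann", "supervisor": "Smith", "dept": "Sales"},
    {"name": "Bob", "supervisor": "Jones", "dept": "Sales"},
    {"name": "Cy", "supervisor": "Smith", "dept": "R&D"},
]
conditions = [
    lambda e: e["supervisor"] == "Smith",
    lambda e: e["dept"] == "Sales",
]

grid = build_grid(conditions)  # 2 conditions give 4 rows
# Select "Smith AND NOT Sales" and "NOT Smith AND Sales".
selected = [(True, False), (False, True)]
query = make_query(conditions, selected)
result = [e["name"] for e in employees if query(e)]
```

With n conditions the grid has 2^n rows, so enumerating every row is practical only for a small number of conditions.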

Patent
31 Mar 1997
TL;DR: In this paper, the authors present a system to respond to queries of stored information by receiving a query identifying desired information; providing the query as a search request to a search engine; receiving a search result from the search engine, including identifiers for stored documents; and constructing an index from the documents using the identifiers in the search result.
Abstract: Systems and methods consistent with the present invention respond to queries of stored information by receiving a query identifying desired information; providing the query as a search request to a search engine; receiving a search result from the search engine, including identifiers for stored documents; and constructing an index from the documents using the identifiers in the search result.