Showing papers on "Ranking (information retrieval)" published in 2006


Journal ArticleDOI
TL;DR: This survey reviews the interestingness measures for rules and summaries, classifies them from several perspectives, compares their properties, identifies their roles in the data mining process, gives strategies for selecting appropriate measures for applications, and identifies opportunities for future research in this area.
Abstract: Interestingness measures play an important role in data mining, regardless of the kind of patterns being mined. These measures are intended for selecting and ranking patterns according to their potential interest to the user. Good measures also allow the time and space costs of the mining process to be reduced. This survey reviews the interestingness measures for rules and summaries, classifies them from several perspectives, compares their properties, identifies their roles in the data mining process, gives strategies for selecting appropriate measures for applications, and identifies opportunities for future research in this area.

1,198 citations


Proceedings ArticleDOI
06 Aug 2006
TL;DR: In this paper, the authors show that incorporating implicit feedback can augment other features, improving the accuracy of a competitive web search ranking algorithm by as much as 31% relative to the original performance.
Abstract: We show that incorporating user behavior data can significantly improve the ordering of top results in a real web search setting. We examine alternatives for incorporating feedback into the ranking process and explore the contributions of user feedback compared to other common web search features. We report results of a large-scale evaluation over 3,000 queries and 12 million user interactions with a popular web search engine. We show that incorporating implicit feedback can augment other features, improving the accuracy of a competitive web search ranking algorithm by as much as 31% relative to the original performance.

1,119 citations


Book ChapterDOI
11 Jun 2006
TL;DR: This paper presents a formal model and a new search algorithm for folksonomies, called FolkRank, that exploits the structure of the folksonomy; the algorithm is also applied to find communities within the folksonomy and to structure search results.
Abstract: Social bookmark tools are rapidly emerging on the Web. In such systems users are setting up lightweight conceptual structures called folksonomies. The reason for their immediate success is the fact that no specific skills are needed for participating. At the moment, however, the information retrieval support is limited. We present a formal model and a new search algorithm for folksonomies, called FolkRank, that exploits the structure of the folksonomy. The proposed algorithm is also applied to find communities within the folksonomy and is used to structure search results. All findings are demonstrated on a large scale dataset.

980 citations


Proceedings ArticleDOI
23 May 2006
TL;DR: A model for selecting between candidates is built, by using a number of features relating the query-candidate pair, and by fitting the model to human judgments of relevance of query suggestions, which improves the quality of the candidates generated.
Abstract: We introduce the notion of query substitution, that is, generating a new query to replace a user's original search query. Our technique uses modifications based on typical substitutions web searchers make to their queries. In this way the new query is strongly related to the original query, containing terms closely related to all of the original terms. This contrasts with query expansion through pseudo-relevance feedback, which is costly and can lead to query drift. This also contrasts with query relaxation through boolean or TFIDF retrieval, which reduces the specificity of the query. We define a scale for evaluating query substitution, and show that our method performs well at generating new queries related to the original queries. We build a model for selecting between candidates, by using a number of features relating the query-candidate pair, and by fitting the model to human judgments of relevance of query suggestions. This further improves the quality of the candidates generated. Experiments show that our techniques significantly increase coverage and effectiveness in the setting of sponsored search.

707 citations


Proceedings ArticleDOI
Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang, Hsiao-Wuen Hon
06 Aug 2006
TL;DR: This paper modifies the hinge loss of conventional Ranking SVM for document retrieval and optimizes it with gradient descent and quadratic programming; experimental results show that the resulting method, Ranking SVM for IR, can outperform conventional Ranking SVM and other existing methods on two datasets.
Abstract: The paper is concerned with applying learning to rank to document retrieval. Ranking SVM is a typical method of learning to rank. We point out that there are two factors one must consider when applying Ranking SVM, or a "learning to rank" method in general, to document retrieval. First, correctly ranking documents at the top of the result list is crucial for an Information Retrieval system. One must conduct training in a way that such ranked results are accurate. Second, the number of relevant documents can vary from query to query. One must avoid training a model biased toward queries with a large number of relevant documents. Previously, when existing methods that include Ranking SVM were applied to document retrieval, neither of the two factors was taken into consideration. We show it is possible to make modifications in conventional Ranking SVM, so it can be better used for document retrieval. Specifically, we modify the "Hinge Loss" function in Ranking SVM to deal with the problems described above. We employ two methods to conduct optimization on the loss function: gradient descent and quadratic programming. Experimental results show that our method, referred to as Ranking SVM for IR, can outperform the conventional Ranking SVM and other existing methods for document retrieval on two datasets.

648 citations
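
The abstract above describes reweighting the pairwise hinge loss so that errors near the top of the list count more and queries with many relevant documents do not dominate training. A minimal sketch of such a weighted pairwise hinge loss follows, assuming a linear scoring model; `rank_weight`, `query_weight`, and `reg` are hypothetical placeholders and are not the paper's actual parameter values or notation.

```python
import numpy as np

def weighted_pairwise_hinge(w, pairs, reg=0.01):
    """Weighted pairwise hinge loss for a linear scoring model (illustrative).

    pairs: list of (x_pos, x_neg, rank_weight, query_weight) tuples, where
    x_pos should be ranked above x_neg. Both weights are assumptions made
    for this sketch: rank_weight emphasises pairs that matter at the top of
    the list, query_weight down-weights queries contributing many pairs.
    """
    loss = reg * float(np.dot(w, w))  # L2 regularisation term
    for x_pos, x_neg, rank_weight, query_weight in pairs:
        margin = float(np.dot(w, x_pos - x_neg))
        loss += rank_weight * query_weight * max(0.0, 1.0 - margin)
    return loss
```

With both weights fixed at 1.0 this reduces to the ordinary pairwise hinge loss, which makes the effect of the reweighting easy to compare.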


Proceedings Article
04 Dec 2006
TL;DR: This paper proposes a class of simple, flexible algorithms, called LambdaRank, that avoid the difficulty of optimizing rank-based quality measures directly by working with implicit cost functions; the method is described using neural network models and can be extended to any non-smooth and multivariate cost function.
Abstract: The quality measures used in information retrieval are particularly difficult to optimize directly, since they depend on the model scores only through the sorted order of the documents returned for a given query. Thus, the derivatives of the cost with respect to the model parameters are either zero, or are undefined. In this paper, we propose a class of simple, flexible algorithms, called LambdaRank, which avoids these difficulties by working with implicit cost functions. We describe LambdaRank using neural network models, although the idea applies to any differentiable function class. We give necessary and sufficient conditions for the resulting implicit cost function to be convex, and we show that the general method has a simple mechanical interpretation. We demonstrate significantly improved accuracy, over a state-of-the-art ranking algorithm, on several datasets. We also show that LambdaRank provides a method for significantly speeding up the training phase of that ranking algorithm. Although this paper is directed towards ranking, the proposed method can be extended to any non-smooth and multivariate cost functions.

644 citations


Proceedings ArticleDOI
06 Aug 2006
TL;DR: This work considers a number of information retrieval metrics from the literature, including the rank of the first relevant result, the %no metric that penalizes a system only for retrieving no relevant results near the top, and the diversity of retrieved results when queries have multiple interpretations.
Abstract: Traditionally, information retrieval systems aim to maximize the number of relevant documents returned to a user within some window of the top. For that goal, the probability ranking principle, which ranks documents in decreasing order of probability of relevance, is provably optimal. However, there are many scenarios in which that ranking does not optimize for the user's information need. One example is when the user would be satisfied with some limited number of relevant documents, rather than needing all relevant documents. We show that in such a scenario, an attempt to return many relevant documents can actually reduce the chances of finding any relevant documents. We consider a number of information retrieval metrics from the literature, including the rank of the first relevant result, the %no metric that penalizes a system only for retrieving no relevant results near the top, and the diversity of retrieved results when queries have multiple interpretations. We observe that given a probabilistic model of relevance, it is appropriate to rank so as to directly optimize these metrics in expectation. While doing so may be computationally intractable, we show that a simple greedy optimization algorithm that approximately optimizes the given objectives produces rankings for TREC queries that outperform the standard approach based on the probability ranking principle.

371 citations
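
The greedy idea in the abstract above can be illustrated with a toy model. The sketch below assumes the query has a few interpretations with known probabilities and that document relevance is independent given the interpretation; this model, the function name, and the data layout are assumptions made for illustration, not the paper's probabilistic model.

```python
def greedy_diverse_ranking(interp_probs, doc_rel, k):
    """Greedily pick k documents to maximise P(at least one is relevant).

    interp_probs: {interpretation: probability of that interpretation}.
    doc_rel: {doc_id: {interpretation: P(relevant | interpretation)}}.
    Toy assumption: relevance is independent across documents given the
    interpretation of the query.
    """
    # miss[i] = P(no chosen document is relevant | interpretation i)
    miss = {i: 1.0 for i in interp_probs}
    chosen = []
    remaining = set(doc_rel)
    for _ in range(min(k, len(remaining))):
        def gain(d):
            # Marginal increase in P(at least one relevant) from adding d.
            return sum(p * miss[i] * doc_rel[d].get(i, 0.0)
                       for i, p in interp_probs.items())
        best = max(sorted(remaining), key=gain)
        chosen.append(best)
        remaining.remove(best)
        for i in miss:
            miss[i] *= 1.0 - doc_rel[best].get(i, 0.0)
    return chosen
```

In this toy setting, ranking by marginal probability of relevance alone would place two documents about the dominant interpretation first, whereas the greedy rule switches to a document covering the other interpretation once the first is well covered.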


Proceedings ArticleDOI
27 Jun 2006
TL;DR: This paper proposes several algorithms for efficient query processing in geographic search engines, integrate them into an existing web search query processor, and evaluate them on large sets of real data and query traces.
Abstract: Geographic web search engines allow users to constrain and order search results in an intuitive manner by focusing a query on a particular geographic region. Geographic search technology, also called local search, has recently received significant interest from major search engine companies. Academic research in this area has focused primarily on techniques for extracting geographic knowledge from the web. In this paper, we study the problem of efficient query processing in scalable geographic search engines. Query processing is a major bottleneck in standard web search engines, and the main reason for the thousands of machines used by the major engines. Geographic search engine query processing is different in that it requires a combination of text and spatial data processing techniques. We propose several algorithms for efficient query processing in geographic search engines, integrate them into an existing web search query processor, and evaluate them on large sets of real data and query traces.

335 citations


01 Jan 2006
TL;DR: This paper describes the MediaMill team's TRECVID 2006 experiments in two tasks, concept detection and search; for concept detection it uses the MediaMill Challenge, which divides the generic video indexing problem into visual-only, textual-only, early fusion, late fusion, and combined analysis experiments.
Abstract: In this paper we describe our TRECVID 2006 experiments. The MediaMill team participated in two tasks: concept detection and search. For concept detection we use the MediaMill Challenge as experimental platform. The MediaMill Challenge divides the generic video indexing problem into a visual-only, textual-only, early fusion, late fusion, and combined analysis experiment. We provide a baseline implementation for each experiment together with baseline results, which we made available to the TRECVID community. The Challenge package was downloaded more than 80 times and we anticipate that it has been used by several teams for their 2006 submission. Our Challenge experiments focus specifically on visual-only analysis of video (run id: B MM). We extract image features, on global, regional, and keypoint level, which we combine with various supervised learners. A late fusion approach of visual-only analysis methods using geometric mean was our most successful run. With this run we conquer the Challenge baseline by more than 50%. Our concept detection experiments have resulted in the best score for three concepts: i.e. desert, flag us, and charts. What is more, using LSCOM annotations, our visual-only approach generalizes well to a set of 491 concept detectors. To handle such a large thesaurus in retrieval, an engine is developed which automatically selects a set of relevant concept detectors based on text matching and ontology querying. The suggestion engine is evaluated as part of the automatic search task (run id: A-MM) and forms the entry point for our interactive search experiments (run id: A-MM). Here we experiment with query by object matching and two browsers for interactive exploration: the CrossBrowser and the novel RotorBrowser. It was found that the RotorBrowser is able to produce the same results as the CrossBrowser, but with less user interaction. 
Similar to previous years, our best interactive search runs yield top performance, ranking 2nd and 6th overall. Again, a lot has been learned during this year's TRECVID campaign; we highlight the most important lessons at the end of this paper.

301 citations


01 Jan 2006
TL;DR: A modification of the distance based approach called the sign distance is proposed, which is both efficient to evaluate and able to overcome the shortcomings of the previous techniques.

280 citations


Posted Content
TL;DR: This paper forms the ranking problem in a rigorous statistical framework, establishes in particular a tail inequality for degenerate U-processes, and applies it for showing that fast rates of convergence may be achieved under specific noise assumptions, just like in classification.
Abstract: The problem of ranking/ordering instances, instead of simply classifying them, has recently gained much attention in machine learning. In this paper we formulate the ranking problem in a rigorous statistical framework. The goal is to learn a ranking rule for deciding, among two instances, which one is "better," with minimum ranking risk. Since the natural estimates of the risk are of the form of a U-statistic, results of the theory of U-processes are required for investigating the consistency of empirical risk minimizers. We establish in particular a tail inequality for degenerate U-processes, and apply it for showing that fast rates of convergence may be achieved under specific noise assumptions, just like in classification. Convex risk minimization methods are also studied.

Journal ArticleDOI
01 Jan 2006
TL;DR: It is shown that the feedback arc set problem for tournaments is NP-hard under randomized reductions, which settles a conjecture of Bang-Jensen and Thomassen.
Abstract: A tournament is an oriented complete graph. The feedback arc set problem for tournaments is the optimization problem of determining the minimum possible number of edges of a given input tournament T whose reversal makes T acyclic. Ailon, Charikar, and Newman showed that this problem is NP-hard under randomized reductions. Here we show that it is in fact NP-hard. This settles a conjecture of Bang-Jensen and Thomassen.
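
To make the objective in the abstract above concrete, the sketch below counts the arcs that point backwards under a given vertex ordering; reversing exactly those arcs makes the tournament acyclic. The `order_by_wins` helper is a simple illustrative heuristic and is not taken from the paper; all names here are hypothetical.

```python
def backward_arcs(order, beats):
    """Count arcs that point backwards with respect to `order`.

    beats: set of (u, v) pairs meaning u beats v; in a tournament exactly
    one of (u, v) and (v, u) is present for every pair of distinct vertices.
    """
    pos = {v: i for i, v in enumerate(order)}
    return sum(1 for u, v in beats if pos[u] > pos[v])

def order_by_wins(vertices, beats):
    """Heuristic ordering: most wins first, ties broken by vertex label."""
    wins = {v: 0 for v in vertices}
    for u, _ in beats:
        wins[u] += 1
    return sorted(vertices, key=lambda v: (-wins[v], v))
```

Finding the ordering that minimises `backward_arcs` is the hard problem the abstract refers to; the heuristic only gives some ordering to evaluate.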

Proceedings ArticleDOI
17 Jul 2006
TL;DR: It is shown that approximate inference in BAYESUM is possible on large data sets and results in a state-of-the-art summarization system, and that BAYESUM can be understood as a justified query expansion technique in the language modeling for IR framework.
Abstract: We present BAYESUM (for "Bayesian summarization"), a model for sentence extraction in query-focused summarization. BAYESUM leverages the common case in which multiple documents are relevant to a single query. Using these documents as reinforcement for query terms, BAYESUM is not afflicted by the paucity of information in short queries. We show that approximate inference in BAYESUM is possible on large data sets and results in a state-of-the-art summarization system. Furthermore, we show how BAYESUM can be understood as a justified query expansion technique in the language modeling for IR framework.

Proceedings ArticleDOI
06 Nov 2006
TL;DR: This paper proposes a novel approach for predicting and ranking candidate expertise with respect to a query, and demonstrates that applying field-based weighting models improves the ranking of candidates.
Abstract: In an expert search task, the users' need is to identify people who have relevant expertise to a topic of interest. An expert search system predicts and ranks the expertise of a set of candidate persons with respect to the users' query. In this paper, we propose a novel approach for predicting and ranking candidate expertise with respect to a query. We see the problem of ranking experts as a voting problem, which we model by adapting eleven data fusion techniques. We investigate the effectiveness of the voting approach and the associated data fusion techniques across a range of document weighting models, in the context of the TREC 2005 Enterprise track. The evaluation results show that the voting paradigm is very effective, without using any collection specific heuristics. Moreover, we show that improving the quality of the underlying document representation can significantly improve the retrieval performance of the data fusion techniques on an expert search task. In particular, we demonstrate that applying field-based weighting models improves the ranking of candidates. Finally, we demonstrate that the relative performance of the adapted data fusion techniques for the proposed approach is stable regardless of the used weighting models.
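
The voting view described above can be sketched as follows: each retrieved document casts a vote for every candidate associated with it, and votes are aggregated with a data fusion rule. The sketch shows two simple rules, a vote count and a CombSUM-style score sum; whether these match the exact variants adapted in the paper is an assumption, and the function and parameter names are hypothetical.

```python
from collections import defaultdict

def rank_experts(doc_ranking, doc_candidates, method="combsum"):
    """Rank candidates by aggregating votes from retrieved documents.

    doc_ranking: list of (doc_id, score) pairs, best first.
    doc_candidates: {doc_id: iterable of candidate names linked to the doc}.
    method: "combsum" sums document scores; anything else counts votes.
    """
    totals = defaultdict(float)
    for doc_id, score in doc_ranking:
        for cand in doc_candidates.get(doc_id, ()):
            totals[cand] += score if method == "combsum" else 1.0
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(totals, key=lambda c: (-totals[c], c))
```

The two rules can disagree: a candidate tied to one highly scored document may win under the score sum, while a candidate tied to several weaker documents wins under the vote count.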

Proceedings ArticleDOI
16 Oct 2006
TL;DR: XSnippet is developed, a context-sensitive code assistant framework that allows developers to query a sample repository for code snippets that are relevant to the programming task at hand and provides better coverage of tasks and better rankings for best-fit snippets than other code assistant systems.
Abstract: It is common practice for software developers to use examples to guide development efforts. This largely unwritten, yet standard, practice of "develop by example" is often supported by examples bundled with library or framework packages, provided in textbooks, and made available for download on both official and unofficial web sites. However, the vast number of examples that are embedded in the billions of lines of already developed library and framework code are largely untapped. We have developed XSnippet, a context-sensitive code assistant framework that allows developers to query a sample repository for code snippets that are relevant to the programming task at hand. In particular, our work makes three primary contributions. First, a range of queries is provided to allow developers to switch between a context-independent retrieval of code snippets to various degrees of context-sensitive retrieval for object instantiation queries. Second, a novel graph-based code mining algorithm is provided to support the range of queries and enable mining within and across method boundaries. Third, an innovative context-sensitive ranking heuristic is provided that has been experimentally proven to provide better ranking for best-fit code snippets than context-independent heuristics such as shortest path and frequency. Our experimental evaluation has shown that XSnippet has significant potential to assist developers, and provides better coverage of tasks and better rankings for best-fit snippets than other code assistant systems.

Journal ArticleDOI
TL;DR: It is discovered that gender and task significantly influence different kinds of search behaviors discussed here, and this is suggestive of improvements to query-based search interface designs with respect to both their use of space and workflow.
Abstract: To improve search engine effectiveness, we have observed an increased interest in gathering additional feedback about users' information needs that goes beyond the queries they type in. Adaptive search engines use explicit and implicit feedback indicators to model users or search tasks. In order to create appropriate models, it is essential to understand how users interact with search engines, including the determining factors of their actions. Using eye tracking, we extend this understanding by analyzing the sequences and patterns with which users evaluate query results returned to them when using Google. We find that the query result abstracts are viewed in the order of their ranking in only about one fifth of the cases, and only an average of about three abstracts per result page are viewed at all. We also compare search behavior variability with respect to different classes of users and different classes of search tasks to reveal whether user models or task models may be greater predictors of behavior. We discover that gender and task significantly influence different kinds of search behaviors discussed here. The results are suggestive of improvements to query-based search interface designs with respect to both their use of space and workflow.

Patent
11 Apr 2006
TL;DR: A search system is provided that searches for electronic documents and returns a search result in response to a search query; it includes a search engine that executes a search based on the search query term and its equivalent terms.
Abstract: A search system for searching for electronic documents, and providing a search result in response to a search query is provided. The search system includes a processor, a user interface module adapted to receive a search query from a user, the search query having at least one search query term, and a query processing module that analyzes the search query term to identify candidate synonym words. The query processing module also determines which of the candidate synonym words are equivalent terms to the search query term, and in a same sense as the search query term. In addition, the search system includes a search engine that executes a search based on the search query term and the equivalent terms.

Proceedings ArticleDOI
23 May 2006
TL;DR: This paper presents a simple and intuitive method for mining search engine query logs to get fast query recommendations on a large scale industrial strength search engine, and combines this method with a traditional content based similarity method to compensate for the high sparsity of real query log data, and more specifically, the shortness of most query sessions.
Abstract: This paper presents a simple and intuitive method for mining search engine query logs to get fast query recommendations on a large scale industrial strength search engine. In order to get a more comprehensive solution, we combine two methods together. On the one hand, we study and model search engine users' sequential search behavior, and interpret this consecutive search behavior as client-side query refinement, that should form the basis for the search engine's own query refinement process. On the other hand, we combine this method with a traditional content based similarity method to compensate for the high sparsity of real query log data, and more specifically, the shortness of most query sessions. To evaluate our method, we use one hundred days' worth of query logs from SINA's search engine to do off-line mining. Then we analyze three independent editors' evaluations on a query test set. Based on their judgement, our method was found to be effective for finding related queries, despite its simplicity. In addition to the subjective editors' rating, we also perform tests based on actual anonymous user search sessions.

Proceedings ArticleDOI
23 May 2006
TL;DR: This work shows that static ranking based on features independent of the link structure of the Web can significantly outperform PageRank, and uses RankNet, a ranking machine learning algorithm, to combine these and other static features based on anchor text and domain characteristics.
Abstract: Since the publication of Brin and Page's paper on PageRank, many in the Web community have depended on PageRank for the static (query-independent) ordering of Web pages. We show that we can significantly outperform PageRank using features that are independent of the link structure of the Web. We gain a further boost in accuracy by using data on the frequency at which users visit Web pages. We use RankNet, a ranking machine learning algorithm, to combine these and other static features based on anchor text and domain characteristics. The resulting model achieves a static ranking pairwise accuracy of 67.3% (vs. 56.7% for PageRank or 50% for random).
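
The pairwise accuracy figures quoted above can be read as the fraction of item pairs that a scoring function orders the same way as the reference labels. A minimal sketch follows; the treatment of ties (equal labels are skipped, equal scores count as wrong) is an assumption for this illustration and may differ from the paper's exact definition.

```python
from itertools import combinations

def pairwise_accuracy(scores, labels):
    """Fraction of pairs with different labels that `scores` orders correctly.

    Pairs with equal labels are ignored; pairs with equal scores are counted
    as incorrect (an assumption made for this sketch).
    """
    correct = total = 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue
        total += 1
        if (scores[i] - scores[j]) * (labels[i] - labels[j]) > 0:
            correct += 1
    return correct / total if total else 0.0
```

Under this definition a random scorer is expected to land near 0.5, which is consistent with the 50% random baseline mentioned in the abstract.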

01 Jan 2006
TL;DR: A formal model and a new search algorithm for folksonomies, called FolkRank, that exploits the structure of the folksonomy is presented and is used to structure search results.
Abstract: In social bookmark tools users are setting up lightweight conceptual structures called folksonomies. Currently, the information retrieval support is limited. We present a formal model and a new search algorithm for folksonomies, called FolkRank, that exploits the structure of the folksonomy. The proposed algorithm is also applied to find communities within the folksonomy and is used to structure search results. All findings are demonstrated on a large scale dataset. A long version of this paper has been published at the European Semantic Web Conference 2006 [3].

Patent
17 Mar 2006
TL;DR: A method for ranking results returned by a search engine is presented, based on determining a formula with variables and parameters that computes a relevance score for a document and a search query, and ranking the document by that relevance score.
Abstract: The present invention is directed to methods of and systems for ranking results returned by a search engine. A method in accordance with the invention comprises determining a formula having variables and parameters, wherein the formula is for computing a relevance score for a document and a search query; and ranking the document based on the relevance score. Preferably, determining the formula comprises tuning the parameters based on user input. Preferably, the parameters are determined using a machine learning technique, such as one that includes a form of statistical classification.

Proceedings ArticleDOI
06 Aug 2006
TL;DR: This paper proposes a new framework for associating ads with web pages based on Genetic Programming (GP), which learns ranking functions that select the most appropriate ads given the contents of a Web page and that are designed to optimize overall precision and minimize the number of misplacements.
Abstract: Content-targeted advertising, the task of automatically associating ads to a Web page, constitutes a key Web monetization strategy nowadays. Further, it introduces new challenging technical problems and raises interesting questions. For instance, how to design ranking functions able to satisfy conflicting goals such as selecting advertisements (ads) that are relevant to the users and suitable and profitable to the publishers and advertisers? In this paper we propose a new framework for associating ads with web pages based on Genetic Programming (GP). Our GP method aims at learning functions that select the most appropriate ads, given the contents of a Web page. These ranking functions are designed to optimize overall precision and minimize the number of misplacements. By using a real ad collection and web pages from a newspaper, we obtained a gain over a state-of-the-art baseline method of 61.7% in average precision. Further, by evolving individuals to provide good ranking estimations, GP was able to discover ranking functions that are very effective in placing ads in web pages while avoiding irrelevant ones.

Patent
13 Mar 2006
TL;DR: The relevance of the search results for a target query can be judged based on one or more queries in the query log that are related to the target query temporally and/or lexically.
Abstract: A system(s) and/or method(s) that facilitate improving the relevance of search results through utilization of a query log. The relevance of the search results for a target query can be judged based on one or more queries in the log that are related to the target query temporally and/or lexically. The diversity of the top-ranked search results can be increased and/or decreased based on an iterative re-ranking process of the search result set.

Proceedings Article
01 Jan 2006
TL;DR: The goal of the enterprise track is to conduct experiments with enterprise data that reflect the experiences of users in real organizations, such that for example, an email ranking technique that is effective here would be a good choice for deployment in a real multi-user email search application.
Abstract: The goal of the enterprise track is to conduct experiments with enterprise data — intranet pages, email archives, document repositories — that reflect the experiences of users in real organizations, such that for example, an email ranking technique that is effective here would be a good choice for deployment in a real multi-user email search application. This involves both understanding user needs in enterprise search and development of appropriate IR techniques. The enterprise track began in TREC 2005 as the successor to the web track, and this is reflected in the tasks and measures. While the track takes much of its inspiration from the web track, the foci are on search at the enterprise scale, incorporating non-web data and discovering relationships between entities in the organization. As a result, we have created the first test collections for multi-user email search and expert finding. This year the track has continued using the W3C collection, a crawl of the publicly available web of the World Wide Web Consortium performed in June 2004. This collection contains not only web pages but numerous mailing lists, technical documents and other kinds of data that represent the day-to-day operation of the W3C. Details of the collection may be found in the 2005 track overview (Craswell et al., 2005). Additionally, this year we began creating a repository of information derived from the collection by participants. This data is hosted alongside the W3C collection at NIST. There were two tasks this year, email discussion search and expert search, and both represent refinements of the tasks initially done in 2005. NIST developed topics and relevance judgments for the email discussion search task this year. For expert search, rather than relying on found data as last year, the track participants created the topics and relevance judgments. Twenty-five groups took part across the two tasks.

Book ChapterDOI
TL;DR: The challenges and opportunities encountered in adapting ranking-and-selection techniques to stochastic simulation problems are described, along with key theorems, results and analysis tools that have proven useful in extending them to this setting.
Abstract: We describe the basic principles of ranking and selection, a collection of experiment-design techniques for comparing “populations” with the goal of finding the best among them. We then describe the challenges and opportunities encountered in adapting ranking-and-selection techniques to stochastic simulation problems, along with key theorems, results and analysis tools that have proven useful in extending them to this setting. Some specific procedures are presented along with a numerical illustration.

Patent
13 Mar 2006
TL;DR: A web site for user suggestions of products, services or other information is described, where the Suggestor also submits tags with those suggestions; to the extent subsequent users use the same tags to access or purchase the user suggestion, the suggesting user will be rewarded.
Abstract: A web site for user suggestions of products, services or other information. The Suggestor also submits tags with those suggestions. To the extent subsequent users use the same tags to access or purchase the user suggestion, the suggesting user will be rewarded. The present invention also provides mechanisms for disbursing rewards for “finding-and-buying-thru-tags”, ranking suggestions, enabling various privacy preserving communications and deal validation mechanisms among shoppers, Suggestors and their social networks.

Journal ArticleDOI
TL;DR: Six predictors of query performance are studied, which can be generated prior to the retrieval process without the use of relevance scores, showing that these predictors can be useful to infer query performance in practical applications.

Patent
18 Jan 2006
TL;DR: A system is disclosed for generating a search result list in response to a search request from a searcher using a computer network; listings matched from a first database are ordered, at least in part, by a confidence score determined from their relevance compared to listings matched from a second database of general web content.
Abstract: A system is disclosed for generating a search result list in response to a search request from a searcher using a computer network. A first database is maintained that includes a first plurality of search listings. A second database is maintained that includes documents having general web content. A search request is received from the searcher. A first set of search listings is identified from the first database having documents generating a match with the search request and a second set of search listings is identified from the second database having documents generating a match with the search request. A confidence score is determined for each listing from the first set of search listings wherein the confidence score is determined in accordance with a relevance of each listing when compared to the listings of the second set of search listings. The identified search listings from the first set of search listing are ordered in accordance, at least in part, with the confidence score for each search listing.

Book ChapterDOI
05 Nov 2006
TL;DR: AKTiveRank is presented, a prototype system for ranking ontologies based on a number of structural metrics, which addresses the need for methods to evaluate and rank existing ontologies in terms of their relevance to the needs of the knowledge engineer.
Abstract: Ontology search and reuse is becoming increasingly important as the quest for methods to reduce the cost of constructing such knowledge structures continues. A number of ontology libraries and search engines are coming to existence to facilitate locating and retrieving potentially relevant ontologies. The number of ontologies available for reuse is steadily growing, and so is the need for methods to evaluate and rank existing ontologies in terms of their relevance to the needs of the knowledge engineer. This paper presents AKTiveRank, a prototype system for ranking ontologies based on a number of structural metrics.

Patent
Deepa Joshi, John Thrall
20 Dec 2006
TL;DR: A system is described for discovering query intent based on search queries and concept networks; the system may construct frequency vectors from log data corresponding to a submitted query and at least one related query submitted to one or more search engines.
Abstract: A system is described for discovering query intent based on search queries and concept networks. The system may construct frequency vectors from log data corresponding to a submitted query and at least one related query submitted to one or more search engines. The system may also construct a query intent vector based on the frequency vectors. The query intent vector may include frequency scores that represent the intent of the query.