Web query classification

Web query classification is a research topic. Over its lifetime, 11,987 publications have been published within this topic, receiving 339,343 citations.

Papers

Proceedings Article

DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases

Roy Goldman¹, Jennifer Widom¹

Stanford University¹

25 Aug 1997

TL;DR: The theoretical foundations of DataGuides are presented along with an algorithm for their creation and an overview of incremental maintenance, and performance results based on the implementation of DataGuides in the Lore DBMS for semistructured data are provided.

Abstract: In semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are useful for browsing database structure, formulating queries, storing information such as statistics and sample values, and enabling query optimization. This paper presents the theoretical foundations of DataGuides along with an algorithm for their creation and an overview of incremental maintenance. We provide performance results based on our implementation of DataGuides in the Lore DBMS for semistructured data. We also describe the use of DataGuides in Lore, both in the user interface to enable structure browsing and query formulation, and as a means of guiding the query processor and optimizing query execution.

1,341 citations
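The core idea of a strong DataGuide can be sketched in a few lines: every distinct label path in the database appears exactly once in the guide, and each guide node stands for the set of database objects reachable by that path (its target set). The sketch below is a minimal illustration under stated assumptions, not the Lore implementation. The dict-of-edges graph encoding, the function names `build_dataguide` and `follow`, and the sample restaurant data are all hypothetical. Construction works much like the subset construction that turns a nondeterministic automaton into a deterministic one, so it can blow up on adversarial graphs; it also assumes all target sets fit in memory.

```python
from collections import defaultdict


def build_dataguide(edges, root):
    """Build a strong DataGuide for a rooted, labeled graph (sketch).

    edges: dict mapping an object id to a list of (label, child_id) pairs
    (a hypothetical encoding). Returns (start_node, guide), where each guide
    node is a frozenset of database objects (its target set) and
    guide[node] maps label -> next guide node.
    """
    start = frozenset([root])
    guide = {}
    pending = [start]
    while pending:
        targets = pending.pop()
        if targets in guide:
            continue  # this target set already has a guide node
        # Union the children of all objects in the target set, per label.
        by_label = defaultdict(set)
        for obj in targets:
            for label, child in edges.get(obj, []):
                by_label[label].add(child)
        guide[targets] = {label: frozenset(objs) for label, objs in by_label.items()}
        pending.extend(t for t in guide[targets].values() if t not in guide)
    return start, guide


def follow(guide, node, path):
    """Return the target set reached by a label path, or None if the path does not exist."""
    for label in path:
        node = guide[node].get(label)
        if node is None:
            return None
    return node


if __name__ == "__main__":
    # Hypothetical semistructured data: restaurants with irregular fields.
    db = {
        "root": [("restaurant", "r1"), ("restaurant", "r2")],
        "r1": [("name", "n1"), ("address", "a1")],
        "r2": [("name", "n2")],
        "a1": [("city", "c1")],
    }
    start, guide = build_dataguide(db, "root")
    print(len(guide))
    print(follow(guide, start, ["restaurant", "address", "city"]))
```

Because a label path exists in the database exactly when it exists in the guide, a user interface can offer `follow`-style structure browsing without scanning the data itself.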

Journal Article

Analysis of a very large web search engine query log

Craig Silverstein¹, Hannes Marais, Monika Henzinger¹, Michael Moricz

Google¹

01 Sep 1999

TL;DR: It is shown that web users type in short queries, mostly look at the first 10 results only, and seldom modify the query, suggesting that traditional information retrieval techniques may not work well for answering web search requests.

Abstract: In this paper we present an analysis of an AltaVista Search Engine query log consisting of approximately 1 billion entries for search requests over a period of six weeks. This represents almost 285 million user sessions, each an attempt to fill a single information need. We present an analysis of individual queries, query duplication, and query sessions. We also present results of a correlation analysis of the log entries, studying the interaction of terms within queries. Our data supports the conjecture that web users differ significantly from the user assumed in the standard information retrieval literature. Specifically, we show that web users type in short queries, mostly look at the first 10 results only, and seldom modify the query. This suggests that traditional information retrieval techniques may not work well for answering web search requests. The correlation analysis showed that the most highly correlated items are constituents of phrases. This result indicates it may be useful for search engines to consider search terms as parts of phrases even if the user did not explicitly specify them as such.

1,255 citations
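The per-query statistics and term correlation described in this abstract can be illustrated on a toy log. This is an assumption-laden sketch, not the authors' method. It normalizes queries by lowercasing and collapsing whitespace, reports the mean number of terms and the share of repeated submissions, and measures how strongly two terms co-occur using the phi coefficient on a 2×2 presence table, which stands in for whatever correlation statistic the paper actually used. The function names and the sample queries are hypothetical.

```python
import math
from collections import Counter


def normalize(query):
    """Lowercase and collapse whitespace so trivially different submissions match."""
    return " ".join(query.lower().split())


def summarize(queries):
    """Basic query-log statistics (assumes a non-empty log)."""
    norm = [normalize(q) for q in queries]
    counts = Counter(norm)
    lengths = [len(q.split()) for q in norm]
    return {
        "total": len(norm),
        "distinct": len(counts),
        # Fraction of submissions that repeat an earlier query string.
        "duplicate_fraction": 1 - len(counts) / len(norm),
        "mean_terms": sum(lengths) / len(lengths),
    }


def phi(queries, a, b):
    """Phi coefficient for the co-occurrence of terms a and b within queries."""
    n11 = n10 = n01 = n00 = 0
    for q in queries:
        terms = set(normalize(q).split())
        has_a, has_b = a in terms, b in terms
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A term that appears in every query or in none gives an undefined phi; report 0.
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


if __name__ == "__main__":
    log = ["new york hotels", "New York  hotels", "cheap flights", "new york", "weather"]
    print(summarize(log))
    print(phi(log, "new", "york"))
```

In this toy log "new" and "york" always appear together, which is the kind of signal that suggests treating them as a phrase even when the user did not quote it.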

Journal Article

NiagaraCQ: a scalable continuous query system for Internet databases

Jianjun Chen¹, David J. DeWitt¹, Feng Tian¹, Yuan Wang¹

University of Wisconsin-Madison¹

16 May 2000

TL;DR: The design of the NiagaraCQ system is presented together with experimental results on its performance and scalability; to scale, the system also employs incremental evaluation of continuous queries, both pull and push models for detecting changes in heterogeneous data sources, and memory caching.

Abstract: Continuous queries are persistent queries that allow users to receive new results when they become available. While continuous query systems can transform a passive web into an active environment, they need to be able to support millions of queries due to the scale of the Internet. No existing systems have achieved this level of scalability. NiagaraCQ addresses this problem by grouping continuous queries based on the observation that many web queries share similar structures. Grouped queries can share the common computation, tend to fit in memory and can reduce the I/O cost significantly. Furthermore, grouping on selection predicates can eliminate a large number of unnecessary query invocations. Our grouping technique is distinguished from previous group optimization approaches in the following ways. First, we use an incremental group optimization strategy with dynamic re-grouping. New queries are added to existing query groups without having to regroup already installed queries. Second, we use a query-split scheme that requires minimal changes to a general-purpose query engine. Third, NiagaraCQ groups both change-based and timer-based queries in a uniform way. To ensure that NiagaraCQ is scalable, we have also employed other techniques including incremental evaluation of continuous queries, use of both pull and push models for detecting heterogeneous data source changes, and memory caching. This paper presents the design of the NiagaraCQ system and gives some experimental results on the system's performance and scalability.

1,162 citations
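The benefit of grouping on selection predicates can be shown with a toy, in-memory sketch. It assumes every continuous query is a simple equality selection (`attribute = constant`) over one data source: queries with the same source and attribute share one group, the constants of a group live in one lookup table, and each batch of new data is scanned once per group rather than once per query. This is a hedged illustration of the grouping idea only, not NiagaraCQ's design, which works through a general-purpose query engine and a query-split scheme. The class and field names and the stock-quote data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EqualityQuery:
    """A hypothetical continuous query: SELECT * FROM source WHERE attribute = constant."""
    query_id: str
    source: str
    attribute: str
    constant: object


class QueryGroups:
    def __init__(self):
        # Group signature (source, attribute) -> constant -> ids of queries using it.
        self.groups = defaultdict(lambda: defaultdict(list))

    def add(self, q):
        # Incremental grouping: a new query joins (or creates) its group;
        # already installed queries are never regrouped.
        self.groups[(q.source, q.attribute)][q.constant].append(q.query_id)

    def on_new_tuples(self, source, tuples):
        """Evaluate every group over a batch of new tuples from one source.

        Each tuple costs one hash lookup per group instead of one predicate
        test per installed query. Returns query_id -> list of matching tuples.
        """
        results = defaultdict(list)
        for (src, attr), constants in self.groups.items():
            if src != source:
                continue
            for t in tuples:
                for qid in constants.get(t.get(attr), ()):
                    results[qid].append(t)
        return dict(results)


if __name__ == "__main__":
    g = QueryGroups()
    g.add(EqualityQuery("q1", "stocks", "symbol", "INTC"))
    g.add(EqualityQuery("q2", "stocks", "symbol", "MSFT"))
    g.add(EqualityQuery("q3", "stocks", "symbol", "INTC"))
    print(g.on_new_tuples("stocks", [{"symbol": "INTC", "price": 30}]))
```

Queries q1 and q3 share one table entry, so a new INTC quote is matched once and fanned out to both, while q2 is never invoked.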

Journal Article

Topic-sensitive PageRank: a context-sensitive ranking algorithm for Web search

Taher H. Haveliwala¹

Stanford University¹

01 Jul 2003, IEEE Transactions on Knowledge and Data Engineering

TL;DR: It is shown that using linear combinations of these (precomputed) biased PageRank vectors to generate context-specific importance scores for pages at query time can generate more accurate rankings than a single, generic PageRank vector.

Abstract: The original PageRank algorithm for improving the ranking of search-query results computes a single vector, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. For ordinary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared. By using linear combinations of these (precomputed) biased PageRank vectors to generate context-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. We describe techniques for efficiently implementing a large-scale search system based on the topic-sensitive PageRank scheme.

1,161 citations
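The offline/online split described in this abstract can be sketched on a tiny in-memory link graph. Under the assumptions here, one PageRank vector is precomputed per topic by power iteration with the random jump restricted to that topic's pages, and a query's score for each page is the weighted sum of those vectors. The topic weights would normally come from classifying the query (or its surrounding context) into topics; in this sketch they are passed in by hand. The damping factor 0.85, the iteration count, the dangling-page handling, the graph and all function names are assumptions for illustration.

```python
def topic_teleport(nodes, topic_pages):
    """Random-jump vector that is uniform over a topic's pages and zero elsewhere."""
    k = len(topic_pages)
    return {n: (1.0 / k if n in topic_pages else 0.0) for n in nodes}


def biased_pagerank(out_links, nodes, teleport, damping=0.85, iters=100):
    """Power iteration for r = damping * (rank passed along links) + (1 - damping) * teleport."""
    rank = dict(teleport)
    for _ in range(iters):
        new = {n: (1 - damping) * teleport[n] for n in nodes}
        for src in nodes:
            targets = out_links.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
            else:
                # Dangling page: send its rank back through the teleport vector.
                for n in nodes:
                    new[n] += damping * rank[src] * teleport[n]
        rank = new
    return rank


def query_scores(topic_ranks, topic_weights):
    """Query-time score: weighted sum of the precomputed per-topic vectors."""
    pages = next(iter(topic_ranks.values())).keys()
    return {p: sum(w * topic_ranks[t][p] for t, w in topic_weights.items()) for p in pages}


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d"]
    links = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
    topics = {"sports": {"a"}, "arts": {"c"}}
    # Offline: one biased vector per topic.
    ranks = {t: biased_pagerank(links, nodes, topic_teleport(nodes, pages))
             for t, pages in topics.items()}
    # Query time: weights assumed to come from a query classifier.
    print(query_scores(ranks, {"sports": 0.8, "arts": 0.2}))
```

Since the per-topic vectors are computed ahead of time, the query-time work in this sketch is only a weighted sum over a handful of topics per page.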

Journal Article

Searching the Web: the public and their queries

Amanda Spink¹, Dietmar Wolfram², Major B. J. Jansen³, Tefko Saracevic⁴

Pennsylvania State University¹, University of Wisconsin–Milwaukee², University of Maryland, College Park³, Rutgers University⁴

01 Feb 2001, Journal of the Association for Information Science and Technology

TL;DR: It is found that most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features, and the language of Web queries is distinctive.

Abstract: In studying actual Web searching by the public at large, we analyzed over one million Web queries by users of the Excite search engine. We found that most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. A small number of search terms are used with high frequency, and a great many terms are unique; the language of Web queries is distinctive. Queries about recreation and entertainment rank highest. Findings are compared to data from two other large studies of Web queries. This study provides an insight into the public practices and choices in Web searching.

1,153 citations

Metrics

12,090 papers

348,683 citations

No. of papers in the topic in previous years
Year	Papers
2023	32
2022	69
2021	13
2020	13
2019	15
2018	32
