Book ChapterDOI

Caching for realtime search

TLDR
It is shown that a classical cache replacement policy, LRU, completely fails to guarantee freshness over time, whereas the proposed CIPs serve 97% of the queries with fresh results, and it is demonstrated that the computational overhead of the algorithms is minor, and that they even allow reducing the cache's memory footprint.
Abstract
Modern search engines feature real-time indices, which incorporate changes to content within seconds. Search engines also cache search results to reduce user latency and back-end load; without careful real-time management of the results cache, the engine might return stale search results to users despite the effort invested in keeping the underlying index up to date. A recent paper proposed an architectural component called the CIP (cache invalidation predictor), which invalidates supposedly stale cache entries upon index modifications. Initial evaluation showed that CIPs can retain the performance benefits of caching without sacrificing much of the freshness of the search results returned to users; however, it was conducted on a synthetic workload in a simplified setting, under many simplifying assumptions. We propose new CIP heuristics and evaluate them in an authentic environment: on the real evolving corpus and query stream of a large commercial news search engine. Our CIPs operate in conjunction with realistic cache settings, and we use standard metrics to evaluate cache performance. We show that a classical cache replacement policy, LRU, completely fails to guarantee freshness over time, whereas our CIPs serve 97% of the queries with fresh results. Our policies incur a negligible impact on the baseline's cache hit rate, in contrast with traditional age-based invalidation, which must severely degrade cache performance to achieve the same freshness. We demonstrate that the computational overhead of our algorithms is minor, and that they even allow reducing the cache's memory footprint.
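To make the CIP idea concrete, here is a minimal sketch (not the paper's actual algorithm) of an LRU result cache with an invalidation hook: when a document is modified in the index, every cached entry whose result list contains that document is dropped. The class and method names are illustrative assumptions; real predictors must also guess which updated documents would newly *enter* a cached result list.

```python
from collections import OrderedDict

class ResultCache:
    """LRU result cache with a naive CIP-style invalidation hook (sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # query -> list of result doc ids

    def get(self, query):
        if query in self.entries:
            self.entries.move_to_end(query)  # mark as most recently used
            return self.entries[query]
        return None                          # cache miss

    def put(self, query, results):
        self.entries[query] = results
        self.entries.move_to_end(query)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

    def invalidate_on_update(self, doc_id):
        """Naive predictor: drop every cached entry whose result list
        contains the updated document, so the next lookup misses and
        the back-end recomputes fresh results."""
        stale = [q for q, docs in self.entries.items() if doc_id in docs]
        for q in stale:
            del self.entries[q]
        return stale
```

For example, after `put("news", [1, 2])`, a call to `invalidate_on_update(2)` removes the `"news"` entry, so the next `get("news")` misses and falls through to the back-end. Plain LRU, by contrast, would keep serving the stale entry until it aged out of the cache.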


Citations
Proceedings ArticleDOI

Prefetching query results and its impact on search engines

TL;DR: This work proposes offline and online strategies for selecting and ordering queries whose results are to be prefetched, and demonstrates that these strategies are able to improve various performance metrics, including the hit rate, query response time, result freshness, and query degradation rate, relative to a state-of-the-art baseline.
Journal ArticleDOI

Scalability Challenges in Web Search Engines

TL;DR: This book covers the issues involved in the design of three separate systems that are commonly available in every web-scale search engine: web crawling, indexing, and query processing systems.
Book ChapterDOI

Adaptive time-to-live strategies for query result caching in web search engines

TL;DR: The results show that the proposed techniques reduce the fraction of stale results served by the cache and also decrease the fractions of redundant query evaluations on the search engine backend compared to a strategy using a fixed TTL value for all queries.
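A per-query TTL can be sketched as follows. This is an illustrative assumption, not the cited paper's method: each entry carries its own expiry, and one possible adaptive rule halves the TTL when a refresh finds changed results (volatile query) and doubles it when results are unchanged (stable query).

```python
import time

class TTLCache:
    """Result cache with per-query TTL invalidation (illustrative sketch)."""

    def __init__(self, default_ttl):
        self.default_ttl = default_ttl
        self.entries = {}  # query -> (results, expiry_time, ttl)

    def put(self, query, results, ttl=None, now=None):
        now = time.time() if now is None else now
        ttl = self.default_ttl if ttl is None else ttl
        self.entries[query] = (results, now + ttl, ttl)

    def get(self, query, now=None):
        now = time.time() if now is None else now
        hit = self.entries.get(query)
        if hit is None or now >= hit[1]:
            return None  # miss or expired: caller re-evaluates on the back-end
        return hit[0]

    def refresh(self, query, results, changed, now=None):
        """One possible adaptive rule (an assumption for illustration):
        halve the TTL if fresh results differ from the cached ones,
        otherwise double it, within fixed bounds."""
        now = time.time() if now is None else now
        old = self.entries.get(query)
        ttl = self.default_ttl if old is None else old[2]
        ttl = max(1.0, ttl / 2) if changed else min(8 * self.default_ttl, ttl * 2)
        self.put(query, results, ttl=ttl, now=now)
```

A fixed TTL must be set short enough for the most volatile query, making every stable query's entries expire needlessly; the per-query rule lets stable queries keep long TTLs, cutting redundant back-end evaluations.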
Journal ArticleDOI

A machine learning approach for result caching in web search engines

TL;DR: This work presents a machine learning approach to improve the hit rate of a result cache by facilitating a large number of features extracted from search engine query logs and applies the proposed approach to static, dynamic, and static-dynamic caching.
Proceedings ArticleDOI

Online result cache invalidation for real-time web search

TL;DR: This paper presents a new mechanism that identifies and invalidates query results that have become stale in the cache online and demonstrates that the proposed approach induces less processing overhead, ensuring an average throughput 73% higher than that of the baseline approach.
References
Book

Modern Information Retrieval

TL;DR: In this article, the authors present a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective, which provides an up-to-date student oriented treatment of the subject.
Book

The Cache Memory Book

TL;DR: What is Cache Memory?
Journal ArticleDOI

Principles of Optimal Page Replacement

TL;DR: A formal model is presented for paging algorithms under lth-order nonstationary assumptions about program behavior; the model is expressed as a dynamic programming problem whose solution yields an optimal replacement algorithm.
Proceedings ArticleDOI

Predictive caching and prefetching of query results in search engines

TL;DR: PDC (probability driven cache), a novel scheme tailored for caching search results and based on a probabilistic model of search engine users, is presented; prefetching can increase cache hit ratios by 50% for large caches, and can double the hit ratios of small caches.
Journal ArticleDOI

Boosting the performance of Web search engines: Caching and prefetching query results by exploiting historical usage data

TL;DR: This article proposes SDC (Static Dynamic Cache), a new caching strategy aimed at efficiently exploiting the temporal and spatial locality present in the stream of processed queries; the hit ratio of SDC is further improved by using an adaptive prefetching strategy.
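The static-dynamic split can be sketched as follows, under simplifying assumptions (names and the seeding rule are illustrative, not the SDC paper's exact design): a read-only static segment reserves slots for the historically most frequent queries, and an LRU dynamic segment absorbs everything else.

```python
from collections import Counter, OrderedDict

class StaticDynamicCache:
    """Static-dynamic result cache sketch: frequent queries get permanent
    slots; the remaining capacity is a plain LRU segment."""

    def __init__(self, history, static_size, dynamic_size):
        # Reserve static slots for the most frequent queries in the log.
        top = Counter(history).most_common(static_size)
        self.static = {q: None for q, _ in top}  # results filled on first put
        self.dynamic = OrderedDict()
        self.dynamic_size = dynamic_size

    def get(self, query):
        if query in self.static:
            return self.static[query]  # None until first fill (counts as miss)
        if query in self.dynamic:
            self.dynamic.move_to_end(query)
            return self.dynamic[query]
        return None

    def put(self, query, results):
        if query in self.static:
            self.static[query] = results  # static entries are never evicted
            return
        self.dynamic[query] = results
        self.dynamic.move_to_end(query)
        if len(self.dynamic) > self.dynamic_size:
            self.dynamic.popitem(last=False)  # evict least recently used
```

The design intuition: head queries recur so often that pinning them (static segment) protects their entries from being flushed by bursts of one-off tail queries, while the LRU segment still captures short-term temporal locality.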