Book ChapterDOI

Caching for realtime search

TLDR
It is shown that a classical cache replacement policy, LRU, completely fails to guarantee freshness over time, whereas the proposed CIPs serve 97% of the queries with fresh results, and it is demonstrated that the computational overhead of the algorithms is minor, and that they even allow reducing the cache's memory footprint.
Abstract
Modern search engines feature real-time indices, which incorporate changes to content within seconds. Search engines also cache search results to reduce user latency and back-end load; without careful real-time management of the results cache, the engine might return stale search results to users despite the effort invested in keeping the underlying index up to date. A recent paper proposed an architectural component called the CIP (cache invalidation predictor), which invalidates supposedly stale cache entries upon index modifications. Initial evaluation showed that CIPs can retain the performance benefits of caching without sacrificing much of the freshness of the search results returned to users; however, it was conducted on a synthetic workload in a simplified setting, under many simplifying assumptions. We propose new CIP heuristics and evaluate them in an authentic environment: on the real evolving corpus and query stream of a large commercial news search engine. Our CIPs operate in conjunction with realistic cache settings, and we use standard metrics to evaluate cache performance. We show that a classical cache replacement policy, LRU, completely fails to guarantee freshness over time, whereas our CIPs serve 97% of the queries with fresh results. Our policies incur a negligible impact on the baseline's cache hit rate, in contrast with traditional age-based invalidation, which must severely degrade cache performance to achieve the same freshness. We demonstrate that the computational overhead of our algorithms is minor, and that they even allow reducing the cache's memory footprint.
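To make the CIP idea concrete, here is a minimal sketch (not the paper's actual algorithm) of an LRU result cache with an invalidation hook: when a document is modified in the index, every cached entry whose result list contains that document is dropped. The class and method names are illustrative assumptions; real predictors must also guess which updated documents would newly *enter* a cached result list.

```python
from collections import OrderedDict

class ResultCache:
    """LRU result cache with a naive CIP-style invalidation hook (sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # query -> list of result doc ids

    def get(self, query):
        if query in self.entries:
            self.entries.move_to_end(query)  # mark as most recently used
            return self.entries[query]
        return None                          # cache miss

    def put(self, query, results):
        self.entries[query] = results
        self.entries.move_to_end(query)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

    def invalidate_on_update(self, doc_id):
        """Naive predictor: drop every cached entry whose result list
        contains the updated document, so the next lookup misses and
        the back-end recomputes fresh results."""
        stale = [q for q, docs in self.entries.items() if doc_id in docs]
        for q in stale:
            del self.entries[q]
        return stale
```

For example, after `put("news", [1, 2])`, a call to `invalidate_on_update(2)` removes the `"news"` entry, so the next `get("news")` misses and falls through to the back-end. Plain LRU, by contrast, would keep serving the stale entry until it aged out of the cache.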


Citations
Proceedings ArticleDOI

Prefetching query results and its impact on search engines

TL;DR: This work proposes offline and online strategies for selecting and ordering queries whose results are to be prefetched, and demonstrates that these strategies are able to improve various performance metrics, including the hit rate, query response time, result freshness, and query degradation rate, relative to a state-of-the-art baseline.
Journal ArticleDOI

Scalability Challenges in Web Search Engines

TL;DR: This book covers the issues involved in the design of three separate systems that are commonly available in every web-scale search engine: web crawling, indexing, and query processing systems.
Book ChapterDOI

Adaptive time-to-live strategies for query result caching in web search engines

TL;DR: The results show that the proposed techniques reduce the fraction of stale results served by the cache and also decrease the fractions of redundant query evaluations on the search engine backend compared to a strategy using a fixed TTL value for all queries.
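A per-query TTL can be sketched as follows. This is an illustrative assumption, not the cited paper's method: each entry carries its own expiry, and one possible adaptive rule halves the TTL when a refresh finds changed results (volatile query) and doubles it when results are unchanged (stable query).

```python
import time

class TTLCache:
    """Result cache with per-query TTL invalidation (illustrative sketch)."""

    def __init__(self, default_ttl):
        self.default_ttl = default_ttl
        self.entries = {}  # query -> (results, expiry_time, ttl)

    def put(self, query, results, ttl=None, now=None):
        now = time.time() if now is None else now
        ttl = self.default_ttl if ttl is None else ttl
        self.entries[query] = (results, now + ttl, ttl)

    def get(self, query, now=None):
        now = time.time() if now is None else now
        hit = self.entries.get(query)
        if hit is None or now >= hit[1]:
            return None  # miss or expired: caller re-evaluates on the back-end
        return hit[0]

    def refresh(self, query, results, changed, now=None):
        """One possible adaptive rule (an assumption for illustration):
        halve the TTL if fresh results differ from the cached ones,
        otherwise double it, within fixed bounds."""
        now = time.time() if now is None else now
        old = self.entries.get(query)
        ttl = self.default_ttl if old is None else old[2]
        ttl = max(1.0, ttl / 2) if changed else min(8 * self.default_ttl, ttl * 2)
        self.put(query, results, ttl=ttl, now=now)
```

A fixed TTL must be set short enough for the most volatile query, making every stable query's entries expire needlessly; the per-query rule lets stable queries keep long TTLs, cutting redundant back-end evaluations.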
Journal ArticleDOI

A machine learning approach for result caching in web search engines

TL;DR: This work presents a machine learning approach to improve the hit rate of a result cache by facilitating a large number of features extracted from search engine query logs and applies the proposed approach to static, dynamic, and static-dynamic caching.
Proceedings ArticleDOI

Online result cache invalidation for real-time web search

TL;DR: This paper presents a new mechanism that identifies and invalidates query results that have become stale in the cache online and demonstrates that the proposed approach induces less processing overhead, ensuring an average throughput 73% higher than that of the baseline approach.
References
Book

Modern Information Retrieval

TL;DR: In this article, the authors present a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective, which provides an up-to-date student oriented treatment of the subject.
Book

The Cache Memory Book

TL;DR: What is Cache Memory?
Journal ArticleDOI

Principles of Optimal Page Replacement

TL;DR: A formal model is presented for paging algorithms under lth-order nonstationary assumptions about program behavior; the model is expressed as a dynamic programming problem whose solution yields an optimal replacement algorithm.
Proceedings ArticleDOI

Predictive caching and prefetching of query results in search engines

TL;DR: PDC (probability driven cache), a novel scheme tailored for caching search results and based on a probabilistic model of search engine users, is presented; prefetching can increase cache hit ratios by 50% for large caches, and can double the hit ratios of small caches.
Journal ArticleDOI

Boosting the performance of Web search engines: Caching and prefetching query results by exploiting historical usage data

TL;DR: This article proposes SDC (Static Dynamic Cache), a new caching strategy aimed at efficiently exploiting the temporal and spatial locality present in the stream of processed queries; the hit ratio of SDC is further improved by using an adaptive prefetching strategy.
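The static-dynamic split can be sketched as follows, under simplifying assumptions (names and the seeding rule are illustrative, not the SDC paper's exact design): a read-only static segment reserves slots for the historically most frequent queries, and an LRU dynamic segment absorbs everything else.

```python
from collections import Counter, OrderedDict

class StaticDynamicCache:
    """Static-dynamic result cache sketch: frequent queries get permanent
    slots; the remaining capacity is a plain LRU segment."""

    def __init__(self, history, static_size, dynamic_size):
        # Reserve static slots for the most frequent queries in the log.
        top = Counter(history).most_common(static_size)
        self.static = {q: None for q, _ in top}  # results filled on first put
        self.dynamic = OrderedDict()
        self.dynamic_size = dynamic_size

    def get(self, query):
        if query in self.static:
            return self.static[query]  # None until first fill (counts as miss)
        if query in self.dynamic:
            self.dynamic.move_to_end(query)
            return self.dynamic[query]
        return None

    def put(self, query, results):
        if query in self.static:
            self.static[query] = results  # static entries are never evicted
            return
        self.dynamic[query] = results
        self.dynamic.move_to_end(query)
        if len(self.dynamic) > self.dynamic_size:
            self.dynamic.popitem(last=False)  # evict least recently used
```

The design intuition: head queries recur so often that pinning them (static segment) protects their entries from being flushed by bursts of one-off tail queries, while the LRU segment still captures short-term temporal locality.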