Journal Article

Optimizing top-k queries for middleware access: A unified cost-based approach

TLDR
This article identifies and addresses the barriers to realizing a unified framework for optimizing top-k queries in middleware, and develops efficient search schemes over the resulting space of algorithms for identifying the optimal algorithm.
Abstract
This article studies the optimization of top-k queries in middleware. Although many algorithms have been proposed, none is generally applicable across the wide range of possible scenarios: existing algorithms lack both the “generality” to support a wide range of access scenarios and the systematic “adaptivity” to account for runtime specifics. To fill this gap, we take a cost-based optimization approach: by searching at runtime over a space of algorithms, cost-based optimization is general across a wide range of access scenarios, yet adaptive to the specific access costs at runtime. While such optimization has been taken for granted for relational queries from early on, it has been clearly lacking for ranked queries. In this article, we thus identify and address the barriers to realizing such a unified framework. As the first barrier, we need to define a “comprehensive” space, encompassing all possibly optimal algorithms, to search over. As the second barrier and a conflicting goal, such a space should also be “focused” enough to enable efficient search. For SQL queries, which are explicitly composed of relational operators, such a space by definition consists of schedules of relational operators (or “query plans”). In contrast, top-k queries do not have logical tasks analogous to relational operators. We thus define the logical tasks of top-k queries as building blocks to identify a comprehensive and focused space for top-k queries. We then develop efficient search schemes over this space for identifying the optimal algorithm. Our study indicates that our framework not only unifies, but also outperforms, existing algorithms specifically designed for their scenarios.


Citations
Journal Article

A survey of top-k query processing techniques in relational database systems

TL;DR: This survey describes and classifies top-k processing techniques in relational databases, including query models, data access methods, implementation levels, data and query certainty, and supported scoring functions, and shows the implications of each dimension on the design of the underlying techniques.
Journal Article

Efficient processing of exact top-k queries over disk-resident sorted lists

TL;DR: A model for estimating the depths to which each sorted list needs to be processed in the two phases of the top-k query is introduced, so that (most of) the required records can be fetched efficiently through sequential or batched I/Os.
Proceedings Article

A new approach for processing ranked subsequence matching based on ranked union

TL;DR: The proposed algorithm outperforms HLMJ and the adapted PSM, a state-of-the-art index-based merge algorithm supporting non-monotonic distance functions, by up to two to three orders of magnitude, respectively.
Book Chapter

EcoTop: an economic model for dynamic processing of top-k queries in mobile-P2P networks

TL;DR: This work addresses the processing of top-k queries in mobile ad hoc peer-to-peer (M-P2P) networks using a novel economic incentive model, designated as EcoTop, which issues economic rewards to the mobile peers and penalizes peers for sending irrelevant items, thereby incentivizing the optimization of communication traffic.
Proceedings Article

Fast First-Phase Candidate Generation for Cascading Rankers

TL;DR: This work proposes an alternative framework that builds specialized single-term and pairwise index structures, and then during query time selectively accesses these structures based on a cost budget and a set of early termination techniques.
References
Proceedings Article

Access path selection in a relational database management system

TL;DR: System R, described in this paper, is an experimental database management system developed to carry out research on the relational model of data. It chooses access paths for both simple (single-relation) and complex queries (such as joins), given a user specification of desired data as a boolean expression of predicates.
Journal Article

Optimal aggregation algorithms for middleware

TL;DR: An elegant and remarkably simple algorithm ("the threshold algorithm", or TA) is analyzed that is optimal in a much stronger sense than FA, and is essentially optimal, not just for some monotone aggregation functions, but for all of them, and not just in a high-probability worst-case sense, but over every database.
Proceedings Article

Combining Fuzzy Information from Multiple Systems.

TL;DR: An algorithm is given, which has been implemented in Garlic, such that if the conjuncts are independent, then with arbitrarily high probability, the total number of elements retrieved in evaluating the query is sublinear in the database size.
Journal Article

Evaluating top-k queries over web-accessible databases

TL;DR: This article adapts the sequential query processing technique and introduces an efficient algorithm that maximizes source-access parallelism to minimize query response time, while satisfying source-access constraints.
Proceedings Article

Evaluating top-k queries over Web-accessible databases

TL;DR: This paper studies how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces.