Proceedings ArticleDOI

The geometry of binary search trees

TL;DR: It is shown that there exists an online algorithm with the same cost as the greedy offline algorithm of Lucas and Munro, transforming their conjecture into the conjecture that the greedy algorithm is dynamically optimal; a new lower bound for searching in the BST model is also achieved.
Abstract: We present a novel connection between binary search trees (BSTs) and points in the plane satisfying a simple property. Using this correspondence, we achieve the following results:
1. A surprisingly clean restatement in geometric terms of many results and conjectures relating to BSTs and dynamic optimality.
2. A new lower bound for searching in the BST model, which subsumes the previous two known bounds of Wilber [FOCS'86].
3. The first proposal for dynamic optimality not based on splay trees. A natural greedy but offline algorithm was presented by Lucas [1988], and independently by Munro [2000], and was conjectured to be an (additive) approximation of the best binary search tree. We show that there exists an equal-cost online algorithm, transforming the conjecture of Lucas and Munro into the conjecture that the greedy algorithm is dynamically optimal.
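The "simple property" of the abstract can be made concrete. In the geometric view, an access sequence becomes a set of (key, time) points, and a point set corresponds to a valid BST execution exactly when it is arborally satisfied: every pair of points differing in both coordinates spans an axis-aligned rectangle containing a third point of the set. A minimal sketch of that check (function name and point representation are my own, not from the paper):

```python
from itertools import combinations

def is_arborally_satisfied(points):
    """Check arboral satisfaction: for every pair of points p, q that
    differ in both coordinates, the closed axis-aligned rectangle
    spanned by p and q must contain at least one other point."""
    pts = set(points)
    for p, q in combinations(pts, 2):
        if p[0] == q[0] or p[1] == q[1]:
            continue  # axis-aligned pairs impose no constraint
        xlo, xhi = min(p[0], q[0]), max(p[0], q[0])
        ylo, yhi = min(p[1], q[1]), max(p[1], q[1])
        if not any(r != p and r != q
                   and xlo <= r[0] <= xhi and ylo <= r[1] <= yhi
                   for r in pts):
            return False
    return True
```

For example, {(1, 1), (2, 2)} is not satisfied (the rectangle they span is empty), while adding (1, 2) satisfies it; the greedy algorithm of Lucas and Munro can be viewed as adding, at each time step, the minimal set of points that keeps the set satisfied.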


Citations
Posted Content
TL;DR: In this paper, the authors prove the existence of an instance-optimal algorithm $A$ for computing 2-d or 3-d convex hulls: for every sequence $\sigma$ of $n$ points and for every algorithm $A'$ in a certain class $\mathcal{A}$, the running time of $A$ on input $\sigma$ is at most a constant factor times the running time of $A'$ on the worst possible permutation of $\sigma$ for $A'$.
Abstract: We prove the existence of an algorithm $A$ for computing 2-d or 3-d convex hulls that is optimal for every point set in the following sense: for every sequence $\sigma$ of $n$ points and for every algorithm $A'$ in a certain class $\mathcal{A}$, the running time of $A$ on input $\sigma$ is at most a constant factor times the maximum running time of $A'$ on the worst possible permutation of $\sigma$ for $A'$. We establish a stronger property: for every sequence $\sigma$ of points and every algorithm $A'$, the running time of $A$ on $\sigma$ is at most a constant factor times the average running time of $A'$ over all permutations of $\sigma$. We call algorithms satisfying these properties instance-optimal in the order-oblivious and random-order setting. Such instance-optimal algorithms simultaneously subsume output-sensitive algorithms and distribution-dependent average-case algorithms, and all algorithms that do not take advantage of the order of the input or that assume the input is given in a random order. The class $\mathcal{A}$ under consideration consists of all algorithms in a decision tree model where the tests involve only multilinear functions with a constant number of arguments. To establish an instance-specific lower bound, we deviate from traditional Ben-Or-style proofs and adopt a new adversary argument. For 2-d convex hulls, we prove that a version of the well known algorithm by Kirkpatrick and Seidel (1986) or Chan, Snoeyink, and Yap (1995) already attains this lower bound. For 3-d convex hulls, we propose a new algorithm. We further obtain instance-optimal results for a few other standard problems in computational geometry. Our framework also reveals connections to distribution-sensitive data structures and yields new results as a byproduct, for example, on on-line orthogonal range searching in 2-d and on-line halfspace range reporting in 2-d and 3-d.

48 citations

DOI
01 Nov 2007
TL;DR: It is proved that finding an optimal auto-partition is NP-hard; an exact algorithm is proposed for finding optimal rectilinear r-partitions whose running time is polynomial when r is a constant, along with a faster 2-approximation algorithm.
Abstract: Spatial data structures form a core ingredient of many geometric algorithms, both in theory and in practice. Many of these data structures, especially the ones used in practice, are based on partitioning the underlying space (examples are binary space partitions and decompositions of polygons) or partitioning the set of objects (examples are bounding-volume hierarchies). The efficiency of such data structures---and, hence, of the algorithms that use them---depends on certain characteristics of the partitioning. For example, the performance of many algorithms that use binary space partitions (BSPs) depends on the size of the BSPs. Similarly, the performance of answering range queries using bounding-volume hierarchies (BVHs) depends on the so-called crossing number that can be associated with the partitioning on which the BVH is based. Much research has been done on the problem of computing partitionings whose characteristics are good in the worst case. In this thesis, we studied the problem from a different point of view, namely instance-optimality. In particular, we considered the following question: given a class of geometric partitioning structures---like BSPs, simplicial partitions, polygon triangulations, …---and a cost function---like size or crossing number---can we design an algorithm that computes a structure whose cost is optimal or close to optimal for any input instance (rather than only worst-case optimal)? We studied the problem of finding optimal data structures for some of the most important spatial data structures. As an example, given a set of n points and an input parameter r, it has been proved that there are input sets for which any simplicial partition has crossing number Ω(√r). It has also been shown that for any set of n input points and the parameter r one can make a simplicial partition with stabbing number O(√r). However, there are input point sets for which one can make a simplicial partition with lower stabbing number.
As an example, when the points are on a diagonal, one can always make a simplicial partition with stabbing number 1. We started our research by studying BSPs for line segments in the plane, where the cost function is the size of the BSPs. A popular type of BSPs for line segments are the so-called auto-partitions. We proved that finding an optimal auto-partition is NP-hard. In fact, finding out if a set of input segments admits an auto-partition without any cuts is already NP-hard. We also studied the relation between two other types of BSPs, called free and restricted BSPs, and showed that the number of cuts of an optimal restricted BSP for a set of segments in R² is at most twice the number of cuts of an optimal free BSP for that set. The details are presented in Chapter 1 of the thesis. Then we turned our attention to so-called rectilinear r-partitions for planar point sets, with the crossing number as cost function. A rectilinear r-partition of a point set P is a partitioning of P into r subsets, each having roughly |P|/r points. The crossing number of the partition is defined using the bounding boxes of the subsets; in particular, it is the maximum number of bounding boxes that can be intersected by any horizontal or vertical line. We performed some theoretical as well as experimental studies on rectilinear r-partitions. On the theoretical side, we proved that computing a rectilinear r-partition with optimal stabbing number for a given set of points and parameter r is NP-hard. We also proposed an exact algorithm for finding optimal rectilinear r-partitions whose running time is polynomial when r is a constant, and a faster 2-approximation algorithm. Our last theoretical result showed that considering only partitions whose bounding boxes are disjoint is not sufficient for finding optimal rectilinear r-partitions. On the experimental side, we performed a comparison between four different heuristics for constructing rectilinear r-partitions.
The so-called windmill KD-tree gave the best results. Chapter 2 of the thesis describes all the details of our research on rectilinear r-partitions. We studied another spatial data structure in Chapter 3 of the thesis. Decomposition of the interior of polygons is one of the fundamental problems in computational geometry. In the case of a simple polygon one usually wants to make a Steiner triangulation of it, and when we have a rectilinear polygon at hand, one typically wants to make a rectilinear decomposition for it. For this reason there are algorithms which make Steiner triangulations and rectangular decompositions with low stabbing number. These algorithms are worst-case optimal. However, similar to the two previous data structures, there are polygons for which one can make decompositions with lower stabbing numbers. In Chapter 3 we proposed a 3-approximation for finding an optimal rectangular decomposition of a rectilinear polygon. We also proposed an O(1)-approximation for finding an optimal Steiner triangulation of a simple polygon. Finally, in Chapter 4 of the thesis, we considered another optimization problem, namely how to approximate a piecewise-linear function F: R → R with another piecewise-linear function with fewer pieces. Here one can distinguish two versions of the problem. The first one is called the min-k problem; the goal is then to approximate the function within a given error ε such that the resulting function has the minimum number of links. The second one is called the min-ε problem; here the goal is to find an approximation with at most k links (for a given k) such that the error is minimized. These problems have already been studied before. Our contribution is to consider the problem for so-called uncertain functions, where the value of the input function F at its vertices is given as a discrete set of different values, each with an associated probability. We show how to compute an approximation that minimizes the expected error.
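The crossing-number definition in the abstract translates directly into code. The sketch below (names and representation are illustrative, not from the thesis) computes the crossing number of a candidate rectilinear r-partition: the maximum number of subset bounding boxes that any single horizontal or vertical line stabs, found by a sweep over the boxes' x- and y-interval endpoints.

```python
def crossing_number(partition):
    """Crossing number of a rectilinear r-partition.
    `partition` is a list of r point subsets; each subset is a
    non-empty list of (x, y) pairs."""
    def max_overlap(intervals):
        # sweep: +1 at each left endpoint, -1 just after each right
        # endpoint; ties open before close, so touching closed
        # intervals count as overlapping
        events = []
        for lo, hi in intervals:
            events.append((lo, 0, +1))
            events.append((hi, 1, -1))
        best = cur = 0
        for _, _, delta in sorted(events):
            cur += delta
            best = max(best, cur)
        return best

    # bounding-box extents of each subset along each axis
    x_ints = [(min(x for x, _ in s), max(x for x, _ in s)) for s in partition]
    y_ints = [(min(y for _, y in s), max(y for _, y in s)) for s in partition]
    return max(max_overlap(x_ints), max_overlap(y_ints))
```

Consistent with the diagonal example above: splitting diagonal points into consecutive blocks yields disjoint bounding boxes in both axes, hence crossing number 1.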

47 citations

Proceedings ArticleDOI
25 Oct 2009
TL;DR: For 2-d convex hulls, it is proved that a version of the well-known algorithm by Kirkpatrick and Seidel (1986) or Chan, Snoeyink, and Yap (1995) already attains the instance-optimal lower bound; for 3-d convex hulls, a new algorithm is proposed.
Abstract: We prove the existence of an algorithm $A$ for computing 2-d or 3-d convex hulls that is optimal for every point set in the following sense: for every set $S$ of $n$ points and for every algorithm $A'$ in a certain class $\mathcal{A}$, the running time of $A$ on the worst permutation of $S$ for $A$ is at most a constant factor times the running time of $A'$ on the worst permutation of $S$ for $A'$. In fact, we can establish a stronger property: for every $S$ and $A'$, the running time of $A$ on $S$ is at most a constant factor times the average running time of $A'$ over all permutations of $S$. We call algorithms satisfying these properties instance-optimal in the order-oblivious and random-order setting. Such instance-optimal algorithms simultaneously subsume output-sensitive algorithms and distribution-dependent average-case algorithms, and all algorithms that do not take advantage of the order of the input or that assume the input is given in a random order. The class $\mathcal{A}$ under consideration consists of all algorithms in a decision tree model where the tests involve only multilinear functions with a constant number of arguments. To establish an instance-specific lower bound, we deviate from traditional Ben-Or-style proofs and adopt an interesting adversary argument. For 2-d convex hulls, we prove that a version of the well known algorithm by Kirkpatrick and Seidel (1986) or Chan, Snoeyink, and Yap (1995) already attains this lower bound. For 3-d convex hulls, we propose a new algorithm. We further obtain instance-optimal results for a few other standard problems in computational geometry, such as maxima in 2-d and 3-d, orthogonal line segment intersection in 2-d, finding bichromatic $L_\infty$-close pairs in 2-d, off-line orthogonal range searching in 2-d, off-line dominance reporting in 2-d and 3-d, off-line halfspace range reporting in 2-d and 3-d, and off-line point location in 2-d.
The theory we develop also neatly reveals connections to entropy-dependent data structures, and yields as a byproduct new expected-case results, e.g., for on-line orthogonal range counting in 2-d.

45 citations

Proceedings ArticleDOI
17 Jan 2010
TL;DR: This paper improves, reproves, and simplifies several theorems on the performance of data structures based on path compression and search trees, and presents the first asymptotically sharp bound on the length of arbitrary path compressions on arbitrary trees.
Abstract: In this paper we improve, reprove, and simplify several theorems on the performance of data structures based on path compression and search trees. We apply a technique very familiar to computational geometers but still foreign to many researchers in (non-geometric) algorithms and data structures, namely, to bound the complexity of an object via its forbidden substructures.
To analyze an algorithm or data structure in the forbidden substructure framework one proceeds in three discrete steps. First, one transcribes the behavior of the algorithm as some combinatorial object M; for example, M may be a graph, sequence, permutation, matrix, set system, or tree. (The size of M should ideally be linear in the running time.) Second, one shows that M excludes some forbidden substructure P, and third, one bounds the size of any object avoiding this substructure. The power of this framework derives from the fact that M lies in a more pristine environment and that upper bounds on the size of a P-free object M may be reused in different contexts.
Among our results, we present the first asymptotically sharp bound on the length of arbitrary path compressions on arbitrary trees, improving analyses of Tarjan [35] and Seidel and Sharir [31]. We reprove the linear bound on postordered path compressions, due to Lucas [23] and Loebel and Nesetril [22], the linear bound on deque-ordered path compressions, due to Buchsbaum, Sundar, and Tarjan [5], and the sequential access theorem for splay trees, originally due to Tarjan [38]. We disprove a conjecture of Aronov et al. [3] related to the efficiency of their data structure for half-plane proximity queries and provide a significantly cleaner analysis of their structure. With the exception of the sequential access theorem, all our proofs are exceptionally simple. Notably absent are calculations of any kind.
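The three-step framework can be illustrated on the simplest non-trivial forbidden substructure: an alternation abab. A classical Davenport-Schinzel fact (a standard illustration, not a result of this paper) is that a sequence over n symbols with no two equal adjacent entries and no abab subsequence has length at most 2n − 1; so any algorithm whose transcript avoids abab runs in linear time. A small checker for the forbidden pattern (names are mine):

```python
def has_abab(seq):
    """Return True if `seq` contains a subsequence a, b, a, b
    for two distinct symbols a and b (an alternation of length 4)."""
    symbols = set(seq)
    for a in symbols:
        for b in symbols:
            if a == b:
                continue
            # greedily match the alternating subsequence a, b, a, b, ...
            want, count = a, 0
            for s in seq:
                if s == want:
                    count += 1
                    want = b if want == a else a
            if count >= 4:
                return True
    return False
```

Greedy matching suffices here: taking the earliest occurrence of each wanted symbol never shortens the longest alternation.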

34 citations

Book ChapterDOI
John Iacono
01 Jan 2013
TL;DR: In this chapter, the author surveys the progress that has been made in the almost thirty years since the dynamic-optimality conjecture was first formulated, and presents a binary search tree algorithm that is dynamically optimal if any binary search tree algorithm is dynamically optimal.
Abstract: In 1985, Sleator and Tarjan introduced the splay tree, a self-adjusting binary search tree algorithm. Splay trees were conjectured to perform within a constant factor of any offline rotation-based search tree algorithm on every sufficiently long sequence—any binary search tree algorithm that has this property is said to be dynamically optimal. However, currently neither splay trees nor any other tree algorithm is known to be dynamically optimal. Here we survey the progress that has been made in the almost thirty years since the conjecture was first formulated, and present a binary search tree algorithm that is dynamically optimal if any binary search tree algorithm is dynamically optimal.

33 citations

References
Journal ArticleDOI
TL;DR: The splay tree, a self-adjusting form of binary search tree, is developed and analyzed, and found to be as efficient as balanced trees when total running time is the measure of interest.
Abstract: The splay tree, a self-adjusting form of binary search tree, is developed and analyzed. The binary search tree is a data structure for representing tables and lists so that accessing, inserting, and deleting items is easy. On an n-node splay tree, all the standard search tree operations have an amortized time bound of O(log n) per operation, where by “amortized time” is meant the time per operation averaged over a worst-case sequence of operations. Thus splay trees are as efficient as balanced trees when total running time is the measure of interest. In addition, for sufficiently long access sequences, splay trees are as efficient, to within a constant factor, as static optimum search trees. The efficiency of splay trees comes not from an explicit structural constraint, as with balanced trees, but from applying a simple restructuring heuristic, called splaying, whenever the tree is accessed. Extensions of splaying give simplified forms of two other data structures: lexicographic or multidimensional search trees and link/cut trees.
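The splaying heuristic described above is easy to sketch. The following is a minimal recursive splay (search only, no insertion or deletion, and a simplification rather than the authors' exact top-down formulation), showing the zig, zig-zig, and zig-zag rotation cases:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def splay(root, key):
    """Bring the node with `key` (or the last node on its search
    path) to the root via zig, zig-zig, and zig-zag steps."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root                      # key not present: zig stops here
        if key < root.left.key:              # zig-zig: two right rotations
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:            # zig-zag: left, then right
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    else:                                    # mirror image of the above
        if root.right is None:
            return root
        if key > root.right.key:
            root.right.right = splay(root.right.right, key)
            root = rotate_left(root)
        elif key < root.right.key:
            root.right.left = splay(root.right.left, key)
            if root.right.left is not None:
                root.right = rotate_right(root.right)
        return root if root.right is None else rotate_left(root)
```

Splaying a left-path tree 3-2-1 at key 1, for instance, brings 1 to the root and halves the depth of the nodes on the access path, which is the source of the amortized O(log n) bound.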

1,321 citations


"The geometry of binary search trees..." refers background in this paper

  • ...For BSTs, such measures include the entropy bound [ST85],...

    [...]

  • ...This question has fascinated researchers ever since STOC’83, when Sleator and Tarjan [ST85] conjectured that their splay tree is such a “best binary search tree.”...

    [...]

  • ...FUTURE is dynamically optimal! The only previous proposals for dynamic optimality are the original splay trees [ST85], and the multisplay trees of [WDS06] which are a combination of splay trees and Tango trees [DHIP07]....

    [...]

  • ...the working-set bound [ST85], static/dynamic finger bounds [CMSS00, Col00], key-independent optimality [Iac05], the unified bound [Iac01, BCDI07], etc....

    [...]

  • ...Nonetheless, they are known to have many properties that OPT(·) has: static optimality [ST85], the working-set bound [ST85], the dynamic-finger bound [CMSS00, Col00], linear traversal [Tar85], nearoptimal deque behavior [Pet08, Sun92], and near-optimal splitting [Luc88]....

    [...]

Journal ArticleDOI
TL;DR: An elegant and remarkably simple algorithm ("the threshold algorithm", or TA) is analyzed and shown to be optimal in a much stronger sense than Fagin's algorithm (FA): it is essentially optimal not just for some monotone aggregation functions but for all of them, and not just in a high-probability worst-case sense but over every database.
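The threshold algorithm itself is short. The sketch below is a simplified rendering, not the paper's exact formulation (it assumes every object is graded in every list and uses a dict for random access; parameter names are mine): TA reads the sorted lists in lockstep, scores each newly seen object by random access, and stops once the k best scores reach the threshold, i.e. the aggregate of the grades at the current sorted-access depth.

```python
def threshold_algorithm(lists, agg, k=1):
    """`lists`: one list per attribute of (object, grade) pairs,
    each sorted by grade descending. `agg`: monotone aggregation
    function over a list of grades (e.g. sum or min).
    Returns the top-k (score, object) pairs."""
    grade = [dict(l) for l in lists]       # random-access view per list
    scores = {}
    for depth in range(max(len(l) for l in lists)):
        # sorted access: next object of each list, scored via random access
        for l in lists:
            if depth < len(l):
                obj = l[depth][0]
                if obj not in scores:
                    scores[obj] = agg([g[obj] for g in grade])
        # threshold: aggregate of the grades seen at this depth
        threshold = agg([l[min(depth, len(l) - 1)][1] for l in lists])
        top = sorted(scores.items(), key=lambda kv: -kv[1])[:k]
        if len(top) == k and top[-1][1] >= threshold:
            return [(score, obj) for obj, score in top]
    top = sorted(scores.items(), key=lambda kv: -kv[1])[:k]
    return [(score, obj) for obj, score in top]
```

Monotonicity of `agg` is what makes the early stop sound: no unseen object can beat the threshold.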

1,315 citations


"The geometry of binary search trees..." refers background in this paper

  • ...The “theoretical dream” that can unify such work is an instance-optimal algorithm [FLN03]: for any instance S, this algorithm would run on S (almost) as fast as any possible algorithm....

    [...]

Journal ArticleDOI

398 citations

Proceedings ArticleDOI
01 Nov 1986
TL;DR: A tight bound of 2n − 6 is established on the maximum rotation distance between two n-node trees for all large n, using volumetric arguments in hyperbolic 3-space, and a tight bound is also given on the minimum number of tetrahedra needed to dissect a polyhedron in the worst case.
Abstract: A rotation in a binary tree is a local restructuring that changes the tree into another tree. Rotations are useful in the design of tree-based data structures. The rotation distance between a pair of trees is the minimum number of rotations needed to convert one tree into the other. In this paper we establish a tight bound of 2n − 6 on the maximum rotation distance between two n-node trees for all large n, using volumetric arguments in hyperbolic 3-space. Our proof also gives a tight bound on the minimum number of tetrahedra needed to dissect a polyhedron in the worst case, and reveals connections … (This is a revised and expanded version of a paper that appeared in the 18th Annual ACM Symposium on Theory of Computing.)

183 citations


"The geometry of binary search trees..." refers background in this paper

  • ...This equivalence holds because any BST can be converted into any other BST with the same nodes in linear time [STT86]....

    [...]

Journal ArticleDOI
TL;DR: On an n-node splay tree, the amortized cost of an access at distance d from the preceding access is O(log (d+1)) and there is an O(n) initialization cost.
Abstract: The following result is shown: On an n-node splay tree, the amortized cost of an access at distance d from the preceding access is O(log (d+1)). In addition, there is an O(n) initialization cost. The accesses include searches, insertions, and deletions.

130 citations