
An Optimal Dynamic Data Structure for Stabbing-Semigroup Queries

TLDR
A linear-size dynamic data structure is proposed that answers queries in worst-case $O(\log n)$ time and supports both insertions and deletions of intervals in amortized $O(\log n)$ time.
Abstract
Let $S$ be a set of $n$ intervals in $\mathbb{R}$, and let $(\mathbf{S}, +)$ be any commutative semigroup. We assign a weight $\omega(s) \in \mathbf{S}$ to each interval in $S$. For a point $x \in \mathbb{R}$, let $S(x) \subseteq S$ be the set of intervals that contain $x$. Given a point $q \in \mathbb{R}$, the stabbing-semigroup query asks for computing $\sum_{s \in S(q)} \omega(s)$. We propose a linear-size dynamic data structure, under the pointer-machine model, that answers queries in worst-case $O(\log n)$ time and supports both insertions and deletions of intervals in amortized $O(\log n)$ time. It is the first data structure that attains the optimal $O(\log n)$ bound for all three operations. Furthermore, our structure can easily be adapted to external memory, where we obtain a linear-size structure that answers queries and supports updates in $O(\log_B n)$ I/Os, where $B$ is the disk block size. For the restricted case of a nested family of intervals (either every pair of intervals is disjoint or one contains the other), we present a simpler solution based on dynamic trees.


Pankaj K. Agarwal
Lars Arge
Haim Kaplan
Eyal Molad
Robert E. Tarjan
Ke Yi
1 Introduction
Let $S$ be a set of $n$ intervals in $\mathbb{R}$, and let $(\mathbf{S}, +)$ be any commutative semigroup. We assign a
weight $\omega(s) \in \mathbf{S}$ to each interval in $S$. For a point $x \in \mathbb{R}$ and a set $R$ of intervals, let
$R(x) \subseteq R$ be the set of intervals that contain $x$. Given a point $q \in \mathbb{R}$, a stabbing-semigroup query
asks for computing $\sum_{s \in S(q)} \omega(s)$. We are interested in developing a dynamic data structure that
maintains $S$, so that we can answer stabbing-semigroup queries and insert and delete intervals
to/from $S$ efficiently. By taking different semigroups, for instance $(\mathbb{Z}, +)$, $(\mathbb{R}, \max)$,
$(\mathbb{N}, \gcd)$, $(\{0, 1\}, \vee)$, etc., we obtain different applications of our data structure. If every
pair of intervals in $S$ is either disjoint or nested, we call the problem a nested instance of the
stabbing-semigroup problem.
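To make the problem statement concrete, the following is a minimal brute-force sketch of a stabbing-semigroup query. It only illustrates the definition and takes linear time per query; the function name `stab` and the sample intervals are assumptions made for illustration, not part of the paper.

```python
from functools import reduce
from math import gcd

def stab(intervals, q, op):
    """Combine, with the semigroup operation op, the weights of all
    closed intervals (a, b, weight) that contain the point q.

    Returns None when no interval contains q, because a semigroup
    need not have an identity element.
    """
    weights = [w for (a, b, w) in intervals if a <= q <= b]
    return reduce(op, weights) if weights else None

# Hypothetical sample data: (left endpoint, right endpoint, weight).
intervals = [(1, 10, 12), (3, 7, 18), (5, 20, 30)]

print(stab(intervals, 6, lambda x, y: x + y))  # semigroup (Z, +)
print(stab(intervals, 6, max))                 # semigroup (R, max)
print(stab(intervals, 6, gcd))                 # semigroup (N, gcd)
```

The same query point gives a different answer for each semigroup, which is why a single structure parameterized by the operation covers all of these applications.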
Part of this work was done while Arge and Yi were at Duke University. Work by Agarwal was supported by NSF
under grants CNS-05-40347, IIS-07-13498, CCF-09-40671, and CCF-1012254, by ARO grants W911NF-07-1-0376
and W911NF-08-1-0452, by an NIH grant 1P50-GM-08183-01, and by a grant from the U.S.–Israel Binational Science
Foundation. Work by Kaplan and Tarjan was supported by Grant no. 2006204 from the U.S.–Israel Binational Science
Foundation.
Department of Computer Science, Duke University, Durham, NC 27708, USA. Email: pankaj@cs.duke.edu
Department of Computer Science, University of Aarhus, Aarhus, Denmark. Email: large@daimi.au.dk
Department of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel. Email: haimk@tau.ac.il
Department of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel.
Department of Computer Science, Princeton University, Princeton, NJ and Hewlett-Packard, Palo Alto, CA. Email:
ret@cs.princeton.edu
Corresponding author. Department of Computer Science and Engineering, HKUST. Email: yike@cse.ust.hk

The so-called stabbing-max (resp. stabbing-min) problem is the special case of the problem
with the semigroup $(\mathbb{R}, \max)$ (resp. $(\mathbb{R}, \min)$). This problem has applications in object-oriented
programming [11, 12] and IP routing [10, 13, 17]. In IP routing, a router maintains a dynamic table
of prefixes of IP addresses, which is used to pick the outgoing line for each incoming packet. The
decision is made by identifying the longest prefix of the destination address of the packet that is
stored in the table. We can model this problem as a stabbing-min problem in which each prefix
corresponds to an interval whose weight equals its length. The destination address of a packet is a
point, and the shortest interval containing this point corresponds to the longest prefix of the
destination address. Note that the family of intervals in this application is nested.
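The reduction from longest-prefix matching to stabbing-min can be sketched as follows. This is an illustrative brute-force version, assuming IPv4 prefixes in CIDR notation and using Python's standard `ipaddress` module; the helper names `prefix_to_interval` and `longest_prefix` and the sample table are hypothetical.

```python
import ipaddress

def prefix_to_interval(prefix):
    """Map a CIDR prefix to (lo, hi, length): the closed integer interval of
    addresses it covers, with the interval length as its weight."""
    net = ipaddress.ip_network(prefix)
    lo = int(net.network_address)
    hi = int(net.broadcast_address)
    return lo, hi, hi - lo + 1

def longest_prefix(table, addr):
    """Stabbing-min by interval length: the shortest interval containing the
    address corresponds to the longest matching prefix."""
    q = int(ipaddress.ip_address(addr))
    best = None
    for prefix in table:
        lo, hi, length = prefix_to_interval(prefix)
        if lo <= q <= hi and (best is None or length < best[0]):
            best = (length, prefix)   # carry the prefix along with the weight
    return None if best is None else best[1]

# Hypothetical routing table; note that the intervals are nested or disjoint.
table = ["10.0.0.0/8", "10.1.0.0/16", "10.1.2.0/24", "192.168.0.0/16"]
print(longest_prefix(table, "10.1.2.3"))
print(longest_prefix(table, "10.200.0.1"))
```

The linear scan here only stands in for the query; the point of the sketch is the mapping from prefixes to weighted intervals.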
A more general problem arising in routers is IP packet classification. A router often classifies
each incoming packet into a flow according to some fields in the packet header. The router then
processes all packets in the same flow in the same way. To do the classification, the router
maintains a set of rules, each with a priority assigned to it. The highest-priority rule that a packet
obeys determines the flow of the packet. The rules may stipulate range constraints on one or more
fields in the packets (e.g., source/destination IP addresses, source/destination ports), which
corresponds to the one-dimensional or multi-dimensional version of the stabbing-max problem. In many
networking contexts, such as multicast routing protocols and QoS protocols, the set of rules changes
over time, in which case we need the dynamic version of the stabbing-max problem.
Previous work. A linear-size static data structure for the stabbing-semigroup problem that
supports queries in $O(\log n)$ time can be developed using the segment tree [8]: each node stores the
semigroup sum of the intervals associated with it. This structure can be extended to support
insertions of intervals in $O(\log n)$ time, without affecting the asymptotic query time, by using a dynamic
segment tree [18]. However, the problem becomes considerably harder when deletions are allowed.
If the weights are drawn from a group, namely in the stabbing-group problem, deleting an interval $s$
with weight $\omega(s)$ can be implemented by inserting $s$ with weight $-\omega(s)$, with periodic rebuilding
to avoid a space blowup. However, this solution does not apply to the semigroup case because there
are no inverses. By modifying the segment tree so that each node stores the set of intervals
associated with it, a query can be answered in $O(\log n)$ time, but an update takes $O(\log^2 n)$ time and the
size of the data structure becomes $O(n \log n)$. Alternatively, by using an interval tree [9] one can
obtain a linear-size data structure that supports both insertions and deletions in $O(\log n)$ time but
requires $O(\log^2 n)$ time to answer a query. We discuss these structures in more detail in Section 2.
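The group trick mentioned above, deleting by inserting the inverse weight, can be illustrated with a small sketch. Note the substitutions: instead of a segment tree, this example assumes integer endpoints in the range 0..m-1 and the group $(\mathbb{Z}, +)$, and it uses a Fenwick tree over a difference array. The class name `StabbingGroup` is hypothetical.

```python
class StabbingGroup:
    """Stabbing-group sketch for integer points 0..m-1 and weights in (Z, +).

    Assumption: a Fenwick tree over a difference array replaces the
    segment tree discussed in the text.
    """

    def __init__(self, m):
        self.m = m
        self.tree = [0] * (m + 2)   # 1-indexed Fenwick tree over positions 0..m

    def _add(self, pos, w):
        i = pos + 1
        while i <= self.m + 1:
            self.tree[i] += w
            i += i & -i

    def insert(self, a, b, w):
        # Adding w on [a, b] is +w at a and -w just past b in the difference array.
        self._add(a, w)
        self._add(b + 1, -w)

    def delete(self, a, b, w):
        # Deletion is insertion of the inverse weight; this needs a group.
        self.insert(a, b, -w)

    def query(self, q):
        # Prefix sum of the difference array = total weight of intervals stabbed by q.
        i, total = q + 1, 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

s = StabbingGroup(10)
s.insert(2, 5, 7)
s.insert(4, 9, 3)
print(s.query(4))
s.delete(2, 5, 7)
print(s.query(4))
```

The `delete` method has no counterpart for a semigroup such as $(\mathbb{R}, \max)$: once a maximum has been folded into a stored value, there is no inverse weight that removes it.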
Faster data structures have been developed for the stabbing-max problem in the context of the
IP routing problem by exploiting the fact that endpoints of intervals are integers, and using the
RAM model. For example, Feldmann and Muthukrishnan [10] proposed the fat inverted segment
tree (FIS) data structure. The dynamic version of FIS supports queries in $O(\log\log n + \ell)$ time,
where $\ell$ is the number of levels in the segment tree. The space requirement is $O(n^{1+1/\ell})$, and an
insertion or deletion takes $O(n^{1/\ell} \log n)$ time, but there is an upper bound on the total number of
insertions and deletions allowed. Thorup [20], improving the result of Feldmann and Muthukrishnan,
presents a linear-size data structure with $O(\ell)$ query time and $O(n^{1/\ell})$ update time for
$\ell = o(\log n / \log\log n)$, $\min\{\sqrt{\log n / \log\log n}, \log\log N\}$, when the endpoints are integers
not exceeding $N$. See [13] for a survey of such results.
If the input is too large to fit in the main memory, one is interested in an external memory data
structure. In the standard two-level I/O model of computation [2], the machine consists of a finite
main memory and an infinite-size disk. In this model, a block of $B$ consecutive elements can be
transferred between main memory and disk, and this is referred to as one I/O operation. The data
structure is stored in a number of disk blocks, each of size $B$, and the cost of an operation is
measured by the number of I/O operations. See [4, 21] for surveys on external memory data structures.
For the stabbing-semigroup problem, the I/O-efficient interval tree developed by Arge and Vitter [5]
can be used to construct a linear-size data structure for answering a stabbing-semigroup query in
$O(\log_B n)$ I/Os. An interval can be inserted into $S$ using $O(\log_B n)$ I/Os. Their structure can be
modified to handle deletions so that each update takes $O(\log_B n)$ I/Os, but then a query requires
$O(\log_B^2 n)$ I/Os. An I/O-efficient structure for the stabbing-group problem is presented in [22] that
uses linear space, answers a query and performs an update in $O(\log_B n)$ I/Os, but it does not work
for the semigroup problem.
Our results. The results in this paper combine and extend results from two conference
abstracts [1, 14]. Our main result is a linear-size data structure for the stabbing-semigroup problem,
in the pointer-machine model [19] of computation. Our structure answers queries and supports
updates (insertions as well as deletions) in $O(\log n)$ time. The query bound is worst-case while the
update bounds are amortized. Our solution starts from the straightforward solutions based on
interval and segment trees mentioned above. We then combine features of these two data structures
so that query time is $O(\log n)$, update time is $O(\log n \log\log n)$, and the size of the data structure
is $O(n \log\log n)$. Next, we reduce the size and update time by a factor of $\log\log n$ by using a
base tree that is a weight-balanced tree with a large fan-out, in which each fat leaf stores the
endpoints of many intervals. Our approach also leads to a data structure with a similar performance
in the I/O model. More precisely, we obtain a linear-size data structure such that a query can be
answered using $O(\log_B n)$ I/Os (worst-case) and each update takes $O(\log_B n)$ I/Os (amortized).
We also propose a simpler data structure that uses dynamic trees to solve nested instances of the
problem. Finally, we prove that our structure is optimal, in the sense that for certain semigroups
none of the query, insertion, or deletion bounds can be improved without sacrificing the others. The
lower bound is established in the cell-probe model, and in fact holds for the (easier) stabbing-group
problem. Previously an $\Omega(\log n / \log\log n)$ lower bound was known [3] for the problem. Our
structure can be extended to higher dimensions using segment trees in a standard way [14], by paying a
penalty of an $O(\log n)$ factor in both time and space for each additional dimension, but the results
may not be optimal in two or higher dimensions.
The rest of this paper is organized as follows. We begin in Section 2 by describing simple data
structures for the stabbing-semigroup problem that use interval and segment trees. In Section 3, we
describe our structure under the assumption that the endpoints of all intervals belong to a fixed set
$P$ of $O(n)$ points. This allows us to disregard the rebalancing issue of the base tree in our
multi-level structure. We remove this assumption in Section 4, where we describe how to rebalance the
base tree. In Section 5, we describe how our structure can be adapted to external memory. Section
6 presents our data structure for nested instances. We prove the lower bounds in Section 7 and
conclude with some open problems in Section 8.
2 Preliminaries and Basic Data Structures
We denote by $S$ the set of (closed) intervals stored in the structure. We use $n$ to denote the cardinality
of $S$. Note that $n$ changes as $S$ is modified via insertions and deletions. For an interval $x \in S$, we
denote by $\omega(x)$ the weight of $x$. Each weight belongs to a semigroup $\mathbf{S}$. To simplify the presentation,
we assume that all the endpoints of the intervals in $S$, as well as the queries, are distinct. This
assumption can easily be removed by fixing an arbitrary order among identical endpoints.
For any $Y \subseteq S$, let $\omega(Y) = \sum_{s \in Y} \omega(s)$. For a subset $Y \subseteq S$ and a point $q$, we denote by $Y(q)$
the subset of $Y$ consisting of all intervals containing $q$. We assume that the semigroup is a monoid,
i.e., it has an identity element, which we denote by $0$, and define $\omega(\emptyset) = 0$.
Next, we describe the basic building blocks of our main data structure. The basic ingredient we
use is a structure for storing a totally ordered set $X$ such that each $x \in X$ has a weight $\omega(x) \in \mathbf{S}$,
subject to the following operations.
(i) INSERT(x): Insert $x$ into $X$.
(ii) DELETE(x): Delete $x$ from $X$.
(iii) UPDATEWT(x, w): Given $x \in X$ and $w \in \mathbf{S}$, update $\omega(x)$ to be $w$. We can implement this
by deleting $x$ and reinserting $x$ with its new weight.
(iv) WT(X): Return $\omega(X)$.
(v) PREFIXSUM(b): Given $b \in X$, return $\sum_{x \in X,\, x \le b} \omega(x)$.
We implement this data type by a dynamic balanced binary tree [7], in which we maintain the
sum of the weights of the elements in each subtree. Then all operations take time logarithmic in
the size of X, except WT(X), which takes O(1) time. The size of the data structure is linear in |X|.
We can also support a FIND(x) operation that locates x in the search tree in logarithmic time. If X
is unordered, we can still use this data structure by imposing an arbitrary total order on X. We call
such a basic structure a BST (for balanced search tree).
Throughout this paper we shall often use the
same name for the set and the BST representing it.
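The weighted search tree described above can be sketched as follows. This is a minimal illustration, not the balanced tree of [7]: it substitutes a treap, so the logarithmic bounds hold only in expectation, and it assumes distinct keys and a commutative monoid given as an operation plus identity. The names `WeightedBST`, `wt`, `prefix_sum`, and `update_wt` are hypothetical stand-ins for the operations WT, PREFIXSUM, and UPDATEWT.

```python
import math
import random

class _Node:
    def __init__(self, key, weight):
        self.key, self.weight = key, weight
        self.priority = random.random()   # heap priority of the treap
        self.left = self.right = None
        self.total = weight               # monoid sum of the whole subtree

class WeightedBST:
    def __init__(self, op, identity):
        self.op, self.identity, self.root = op, identity, None

    def _total(self, t):
        return self.identity if t is None else t.total

    def _pull(self, t):
        # Recompute the subtree sum of t from its children and its own weight.
        t.total = self.op(self.op(self._total(t.left), t.weight), self._total(t.right))
        return t

    def _split(self, t, key):
        # Returns (subtree with keys < key, subtree with keys >= key).
        if t is None:
            return None, None
        if t.key < key:
            t.right, rest = self._split(t.right, key)
            return self._pull(t), rest
        rest, t.left = self._split(t.left, key)
        return rest, self._pull(t)

    def _merge(self, a, b):
        # Precondition: every key in a is smaller than every key in b.
        if a is None:
            return b
        if b is None:
            return a
        if a.priority > b.priority:
            a.right = self._merge(a.right, b)
            return self._pull(a)
        b.left = self._merge(a, b.left)
        return self._pull(b)

    def insert(self, key, weight):        # assumes key is not yet present
        a, b = self._split(self.root, key)
        self.root = self._merge(self._merge(a, _Node(key, weight)), b)

    def _delete(self, t, key):
        if t is None:
            return None
        if key == t.key:
            return self._merge(t.left, t.right)
        if key < t.key:
            t.left = self._delete(t.left, key)
        else:
            t.right = self._delete(t.right, key)
        return self._pull(t)

    def delete(self, key):
        self.root = self._delete(self.root, key)

    def update_wt(self, key, weight):     # delete and reinsert, as in (iii)
        self.delete(key)
        self.insert(key, weight)

    def wt(self):                         # O(1): the sum is stored at the root
        return self._total(self.root)

    def prefix_sum(self, b):
        # Sum of the weights of all keys <= b, collected along one root-to-leaf path.
        t, acc = self.root, self.identity
        while t is not None:
            if t.key <= b:
                acc = self.op(acc, self.op(self._total(t.left), t.weight))
                t = t.right
            else:
                t = t.left
        return acc

tree = WeightedBST(max, -math.inf)        # the monoid (R, max) with identity -inf
for key, weight in [(5, 3), (1, 10), (8, 7), (3, 2)]:
    tree.insert(key, weight)
print(tree.wt(), tree.prefix_sum(4))
```

Because only subtree sums are stored, a deletion simply recomputes the sums on one path; no inverse element is needed, which is what makes the structure usable for semigroups.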
Our new data structure can be viewed as a mixture of an interval tree and a segment tree [9],
so we start by reviewing these classical structures. We describe the static versions of each of these
structures, but they can be made dynamic using the standard techniques [6].
Interval tree. In this section and Section 3, we assume that the endpoints of all intervals in $S$
that are ever in the structure belong to a fixed set $P$ of $m = O(n)$ points. We divide $\mathbb{R}$ into $m$
atomic intervals by picking an arbitrary separating point between every two consecutive points in
$P$. We consider these atomic intervals closed (except the leftmost and the rightmost one). Let $T$ be
a balanced full binary tree with $m$ leaves. Each node $v \in T$ is associated with an interval $\sigma_v$. If $v$ is
the $i$-th leftmost leaf of $T$, then $\sigma_v$ is the $i$-th leftmost atomic interval. If $v$ is an interior node with
$v_1$ and $v_2$ as its children, then the common endpoint $x_v$ of $\sigma_{v_1}$ and $\sigma_{v_2}$ is stored at $v$, and we set
$\sigma_v = \sigma_{v_1} \cup \sigma_{v_2}$. For a point $x \in \mathbb{R}$, let $\Pi_x$ denote the path in $T$ from the root to the deepest node
$z$ such that $\sigma_z$ contains $x$. Note that for every point $x \notin P$, $\Pi_x$ is a path from the root to the leaf $z$
whose atomic interval contains $x$.
In an interval tree, an interval $s \in S$ is stored at the highest node $v$ such that $x_v \in s$. Let
$S_v \subseteq S$ be the set of intervals stored at $v$; note that $S_v$ is empty if $v$ is a leaf. Let $s = [a, b]$ be
an interval in $S_v$. We split $s$ into two subintervals $s_\ell = [a, x_v]$ and $s_r = [x_v, b]$. We define
$L_v = \{s_\ell \mid s \in S_v\}$, $R_v = \{s_r \mid s \in S_v\}$, $L = \{s_\ell \mid s \in S\}$, and $R = \{s_r \mid s \in S\}$. For a query
point $q \in \mathbb{R}$, $\omega(S(q)) = \omega(L(q)) + \omega(R(q))$. (If $q = x_v$ for some separating point $x_v$, we query with
a point $q^+$ that we consider to be symbolically larger than $q$.) We compute $\omega(L(q))$ and $\omega(R(q))$
separately and return their sum.

We add the following secondary structures to the interval tree to compute $\omega(L(q))$ efficiently;
the construction is symmetric for computing $\omega(R(q))$. For a node $v$, let $E_v$ be the set of the left
endpoints of the intervals in $L_v$. We assign to each point of $E_v$ the weight of the corresponding
interval, and store $E_v$ in a BST. Clearly the total size of the data structure, including all secondary
structures, is $O(n)$.
Let $q$ be a query point. Let $\Pi_q^{\ell} \subseteq \Pi_q$ be the set of nodes $v \in \Pi_q$ such that $v$'s left child
is also in $\Pi_q$. Note that $L(q) \subseteq \bigcup_{v \in \Pi_q^{\ell}} L_v$ and that $\omega(L(q)) = \sum_{v \in \Pi_q^{\ell}} \omega(L_v(q))$. Moreover,
an interval $[a, x_v] \in L_v$ contains $q$ if and only if $a \le q$. To compute $\omega(L(q))$ we traverse the
path $\Pi_q$. At each node $v \in \Pi_q^{\ell}$, we perform a PREFIXSUM(q) query on $E_v$ to obtain the weight
$\sum_{a \in E_v,\, a \le q} \omega(a) = \omega(L_v(q))$ in $O(\log n)$ time. Finally, we sum these weights and return the
overall weight. Since we spend $O(\log n)$ time at each node $v$, the overall query time is $O(\log^2 n)$.
We can insert or delete an interval $s$ in an interval tree in $O(\log n)$ time by finding the node $v$ such
that $s \in S_v$ and updating the BST representing $E_v$.
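The interval-tree decomposition and query walk can be sketched as follows. This is an illustrative simplification under several assumptions: the base tree is implicit (a node is a range `(lo, hi)` of leaf indices), separating points are taken as midpoints between consecutive points of $P$, every interval has two distinct endpoints in $P$, and the secondary structure is a plain dictionary scanned linearly in place of the BST with PREFIXSUM. The class name `IntervalTreeSketch` is hypothetical.

```python
import math

class IntervalTreeSketch:
    def __init__(self, points, op, identity):
        self.p = sorted(points)                        # the fixed endpoint set P
        self.index = {x: i for i, x in enumerate(self.p)}
        self.op, self.identity = op, identity
        self.left_ends = {}    # node -> {a: weight}, playing the role of E_v
        self.right_ends = {}   # node -> {b: weight}, the symmetric set for R_v

    def _x(self, mid):
        # Separating point x_v of the node whose children meet between p[mid-1] and p[mid].
        return (self.p[mid - 1] + self.p[mid]) / 2

    def _home(self, a, b):
        # Highest node v with x_v in [a, b].
        i, j = self.index[a], self.index[b]
        lo, hi = 0, len(self.p)
        while True:
            mid = (lo + hi) // 2
            if j < mid:
                hi = mid
            elif i >= mid:
                lo = mid
            else:
                return (lo, hi)

    def insert(self, a, b, w):
        v = self._home(a, b)
        self.left_ends.setdefault(v, {})[a] = w
        self.right_ends.setdefault(v, {})[b] = w

    def delete(self, a, b):
        v = self._home(a, b)
        del self.left_ends[v][a]
        del self.right_ends[v][b]

    def _fold(self, ends, keep):
        # Stand-in for PREFIXSUM: a linear scan, not a balanced search tree.
        acc = self.identity
        for e, w in ends.items():
            if keep(e):
                acc = self.op(acc, w)
        return acc

    def query(self, q):
        lo, hi, acc = 0, len(self.p), self.identity
        while hi - lo > 1:
            v, mid = (lo, hi), (lo + hi) // 2
            if q < self._x(mid):   # q lies left of x_v: [a, x_v] contains q iff a <= q
                acc = self.op(acc, self._fold(self.left_ends.get(v, {}), lambda a: a <= q))
                hi = mid
            else:                  # q lies right of x_v; a tie is treated as q+
                acc = self.op(acc, self._fold(self.right_ends.get(v, {}), lambda b: b >= q))
                lo = mid
        return acc

tree = IntervalTreeSketch([1, 3, 5, 7, 9, 11, 13, 15], max, -math.inf)
for a, b, w in [(1, 9, 4), (3, 5, 10), (7, 15, 6), (11, 13, 1)]:
    tree.insert(a, b, w)
print(tree.query(4), tree.query(12))
```

Each interval lives at a single node, which gives linear size and cheap updates; the query pays for one secondary lookup per node on the path, which is where the extra logarithmic factor in the text comes from.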
Note that it might be tempting to use dynamic fractional cascading [15] to speed up the query
procedure, but this does not work because the PREFIXSUM(q) operation on a BST actually relies on
retrieving O(log n) weights in the BST, not just one search location. If one stores the prefix sum at
the search location, the update cost of the BST will be high.
Segment tree. A segment tree allows us to compute $\omega(S(q))$ in $O(\log n)$ time, although the
update time is $O(\log^2 n)$ and the size is $O(n \log n)$. The base tree $T$ of the segment tree is the same
as that of the interval tree. However, we now store an interval $s = [a, b]$ at a node $v$ if $\sigma_v \subseteq s$ and
$\sigma_{p(v)} \not\subseteq s$, where $p(v)$ denotes the parent of $v$. Note that the parents of the nodes storing $s$ lie on
$\Pi_a \cup \Pi_b$, and that we can find these nodes in $O(\log n)$ time. Let $\bar{S}_v \subseteq S$ be the set of intervals stored
at $v$ in the segment tree. We maintain $\bar{S}_v$ in a BST (imposing an arbitrary order on these intervals).
For a leaf $z$, let $L_z$ be the set of intervals with an endpoint inside $\sigma_z$. (For now, $L_z$ contains at most
one interval, but denoting it as a set will be convenient later on.) We store $L_z$ at $z$.
Since an interval $s$ is stored at $O(\log n)$ nodes, the total size is $O(n \log n)$. An interval can be
inserted or deleted in $O(\log^2 n)$ time by first finding in $O(\log n)$ time the nodes at which $s$ is stored
and then updating the BST at each such node. For a query point $q \in \mathbb{R}$, let $z$ be the leaf on the path
$\Pi_q$; then we have
$$S(q) = \bigcup_{v \in \Pi_q} \bar{S}_v \cup L_z(q). \tag{1}$$
Since the sets $\bar{S}_v$ for $v \in \Pi_q$ and $L_z$ are pairwise disjoint, $\omega(S(q)) = \sum_{v \in \Pi_q} \omega(\bar{S}_v) + \omega(L_z(q))$.
Therefore, $\omega(S(q))$ can be computed in $O(\log n)$ time by traversing the path $\Pi_q$, retrieving the
value $\omega(\bar{S}_v)$ in $O(1)$ time from the BST representing $\bar{S}_v$ at each node $v \in \Pi_q$, and finally checking
if the interval in $L_z$ contains $q$.
3 An Optimal Data Structure for Fixed Endpoints
In this section, we continue to assume that although the set S of intervals is dynamic, the endpoints
of these intervals belong to a fixed set P of O(n) points. Recall that our assumptions in Section 2
also imply that each point of P is an endpoint of at most one interval. The main result is a linear-size
data structure that answers a stabbing-semigroup query in O(log n) time and performs an update in
O(log n) time. Recall that the segment tree attains an optimal query time whereas the interval tree
