What is the way to handle a stabbing-semigroup query?

A set S of n intervals, whose endpoints belong to a fixed set of O(n) points, can be maintained in a data structure of linear size so that a stabbing-semigroup query can be answered in O(log n) time.

What is the way to solve the stabbing-semigroup query?

A set of n intervals can be maintained in a linear-size data structure so that a stabbing-semigroup query can be answered in O(log n) worst-case time, and an interval can be inserted or deleted in amortized O(log n) time.

How do the authors update the affected left interval structures?

The authors update the affected left interval structures by traversing the subtrees of v′ and v′′ (i.e., the former subtree of v) bottom-up.

How can the authors rebuild the base tree without the secondary structures?

The authors can rebuild the base tree T (without the secondary structures) in O(n) time and perform Θ(n) insertions in O(n log n) time to construct the secondary structures.

What is the easiest way to handle deletions of endpoints?

In order to handle insertions of endpoints, the authors make the base tree T a weight-balanced B-tree with branching factor f and leaf parameter log n [5].

What is the time it takes to rearrange the middle intervals in a node?

The time it takes to perform the split is O(n_v log log n); the split time at an internal node is dominated by the time it takes to rearrange the middle intervals in the subtree of v into their multislab structures.

What is the weight of the element corresponding to the pair v′, v′′?

Finally for every w with v′ ≤ w ≤ v′′, the authors update the weight of the element corresponding to the pair v′, v′′ in the slab BST Mv(w), to be ω(Mv(v′, v′′)).

How do the authors use the lower bound for partial sums?

The authors also show how to use the lower bound by Pǎtraşcu and Demaine [16] for partial sums to prove a lower bound on the trade-off between the query time and the deletion time in a deletion-only data structure for the stabbing-group problem.

What is the function that splits the tree containing v?

This operation connects the tree containing v with the tree containing w by adding an edge with cost c between v and w, with w being the parent.

CUT(v): Splits the tree containing v by removing the edge from v to its parent.

How can the authors solve the partial-sum problem?

A sequence of operations for the partial-sum problem can be solved by performing a sequence of insert and query operations on a dynamic stabbing-group data structure, for the group Z/nZ, as follows.

(Open Access) An Optimal Dynamic Data Structure for Stabbing-Semigroup Queries (2012) | Pankaj K. Agarwal

Q: What contributions have the authors mentioned in the paper "An optimal dynamic data structure for stabbing-semigroup queries∗" ?

The authors propose a linear-size dynamic data structure, under the pointer-machine model, that answers queries in worst-case O(log n) time and supports both insertions and deletions of intervals in amortized O(log n) time. For the restricted case of a nested family of intervals (every pair of intervals is either disjoint or one contains the other), the authors present a simpler solution based on dynamic trees. Furthermore, their structure can easily be adapted to external memory, where the authors obtain a linear-size structure that answers queries and supports updates in O(log_B n) I/Os, where B is the disk block size.

Q: What is the way to solve the stabbing-semigroup problem?

For the stabbing-semigroup problem, the I/O-efficient interval tree developed by Arge and Vitter [5] can be used to construct a linear-size data structure for answering a stabbing-semigroup query in O(log_B n) I/Os.

An Optimal Dynamic Data Structure for Stabbing-Semigroup Queries∗

Pankaj K. Agarwal†, Lars Arge‡, Haim Kaplan, Eyal Molad, Robert E. Tarjan, Ke Yi∗∗

Abstract

Let $S$ be a set of $n$ intervals in $\mathbb{R}$, and let $(\mathcal{S}, +)$ be any commutative semigroup. We assign a weight $\omega(s) \in \mathcal{S}$ to each interval in $S$. For a point $x \in \mathbb{R}$, let $S(x) \subseteq S$ be the set of intervals that contain $x$. Given a point $q \in \mathbb{R}$, the stabbing-semigroup query asks for computing $\sum_{s \in S(q)} \omega(s)$. We propose a linear-size dynamic data structure, under the pointer-machine model, that answers queries in worst-case $O(\log n)$ time, and supports both insertions and deletions of intervals in amortized $O(\log n)$ time. It is the first data structure that attains the optimal $O(\log n)$ bound for all three operations. Furthermore, our structure can easily be adapted to external memory, where we obtain a linear-size structure that answers queries and supports updates in $O(\log_B n)$ I/Os, where $B$ is the disk block size.

For the restricted case of a nested family of intervals (every pair of intervals is either disjoint or one contains the other), we present a simpler solution based on dynamic trees.

1 Introduction

Let $S$ be a set of $n$ intervals in $\mathbb{R}$, and let $(\mathcal{S}, +)$ be any commutative semigroup. We assign a weight $\omega(s) \in \mathcal{S}$ to each interval $s \in S$. For a point $x \in \mathbb{R}$ and a set $R$ of intervals, let $R(x) \subseteq R$ be the set of intervals that contain $x$. Given a point $q \in \mathbb{R}$, a stabbing-semigroup query asks for computing $\sum_{s \in S(q)} \omega(s)$. We are interested in developing a dynamic data structure to maintain $S$, so that we can answer stabbing-semigroup queries and insert and delete intervals to/from $S$ efficiently. By taking different semigroups, for instance $(\mathbb{Z}, +)$, $(\mathbb{R}, \max)$, $(\mathbb{N}, \gcd)$, $(\{0, 1\}, \vee)$, etc., we obtain different applications of our data structure. If every pair of intervals in $S$ is either disjoint or nested, we call the problem a nested instance of the stabbing-semigroup problem.
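To make the definition concrete, the following is a brute-force reference sketch in Python, not the data structure proposed in this paper. It scans every interval, so a query costs time linear in n. The function name `stab`, the tuple layout `(a, b, weight)`, and the explicit `identity` argument are assumptions made for illustration only.

```python
from functools import reduce
from math import gcd

def stab(intervals, q, op, identity):
    """Combine, with op, the weights of all closed intervals [a, b] containing q.

    intervals is a list of (a, b, weight) tuples (a hypothetical layout);
    identity is returned when no interval contains q.
    """
    weights = [w for (a, b, w) in intervals if a <= q <= b]
    return reduce(op, weights, identity)

intervals = [(1, 5, 4), (2, 8, 6), (6, 9, 10)]

# Semigroup (Z, +): the intervals [1, 5] and [2, 8] contain 3.
total = stab(intervals, 3, lambda x, y: x + y, 0)      # 4 + 6 = 10

# Semigroup (R, max), i.e. stabbing-max: [2, 8] and [6, 9] contain 7.
largest = stab(intervals, 7, max, float("-inf"))       # max(6, 10) = 10

# Semigroup (N, gcd) on the same query point.
common = stab(intervals, 7, gcd, 0)                    # gcd(6, 10) = 2
```

Changing only `op` and `identity` switches the application, which is the point of stating the problem over an arbitrary commutative semigroup.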

∗ Part of the work was done while Arge and Yi were at Duke University. Work by Agarwal was supported by NSF under grants CNS-05-40347, IIS-07-13498, CCF-09-40671, and CCF-1012254, by ARO grants W911NF-07-1-0376 and W911NF-08-1-0452, by an NIH grant 1P50-GM-08183-01, and by a grant from the U.S.–Israel Binational Science Foundation. Work by Kaplan and Tarjan was supported by Grant no. 2006204 from the U.S.–Israel Binational Science Foundation.

† Department of Computer Science, Duke University, Durham, NC 27708, USA. Email: pankaj@cs.duke.edu

‡ Department of Computer Science, University of Aarhus, Aarhus, Denmark. Email: large@daimi.au.dk

Department of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel. Email: haimk@tau.ac.il

Department of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel.

Department of Computer Science, Princeton University, Princeton, NJ and Hewlett-Packard, Palo Alto, CA. Email: ret@cs.princeton.edu

∗∗ Corresponding author. Department of Computer Science and Engineering, HKUST. Email: yike@cse.ust.hk

The so-called stabbing-max (resp. stabbing-min) problem is the special case of the problem with the semigroup $(\mathbb{R}, \max)$ (resp. $(\mathbb{R}, \min)$). This problem has applications in object-oriented programming [11, 12] and IP routing [10, 13, 17]. In IP routing, a router maintains a dynamic table of prefixes of IP addresses, which is used to pick the outgoing line for each incoming packet. The decision is made by identifying the longest prefix of the destination address of the packet stored in its table. We can model this problem as a stabbing-min problem where each prefix corresponds to an interval whose weight equals its length. The destination address of a packet is a point, and the shortest interval containing this point corresponds to the longest prefix of the destination address. Note that the family of intervals in this application is nested.

A more general problem arising in routers is IP packet classification. A router often classifies each incoming packet into a flow according to some fields in the packet header. The router then processes in the same way all packets that are in the same flow. To do the classification, the router maintains a set of rules, each with a priority assigned to it. The highest-priority rule that a packet obeys determines the flow of the packet. The rules may stipulate range constraints on one or more fields in the packets (e.g., source/destination IP addresses, source/destination ports), which corresponds to one- or multi-dimensional versions of the stabbing-max problem. In many networking contexts, such as multicast routing protocols and QoS protocols, the set of rules changes over time, in which case we need the dynamic version of the stabbing-max problem.

Previous work. A linear-size static data structure for the stabbing-semigroup problem that supports queries in $O(\log n)$ time can be developed using the segment tree [8]: each node stores the semigroup sum of the intervals associated with it. This structure can be extended to support insertions of intervals in $O(\log n)$ time, without affecting the asymptotic query time, by using a dynamic segment tree [18]. However, the problem becomes considerably harder when deletions are allowed. If the weights are drawn from a group, namely in the stabbing-group problem, deleting an interval $s$ with weight $\omega(s)$ can be implemented by inserting $s$ with weight $-\omega(s)$, with periodic rebuilding to avoid a space blowup. However, this solution does not apply to the semigroup case because there are no inverses. By modifying the segment tree so that each node stores the set of intervals associated with it, a query can be answered in $O(\log n)$ time, but an update takes $O(\log^2 n)$ time and the size of the data structure becomes $O(n \log n)$. Alternatively, by using an interval tree [9] one can obtain a linear-size data structure that supports both insertions and deletions in $O(\log n)$ time but requires $O(\log^2 n)$ time to answer a query. We discuss these structures in more detail in Section 2.

Faster data structures have been developed for the stabbing-max problem in the context of the IP routing problem by exploiting the fact that endpoints of intervals are integers, and using the RAM model. For example, Feldmann and Muthukrishnan [10] proposed the fat inverted segment tree (FIS) data structure. The dynamic version of FIS supports queries in $O(\log\log n + \ell)$ time, where $\ell$ is the number of levels in the segment tree. The space requirement is $O(n^{1+1/\ell})$, and an insertion or deletion takes $O(n^{1/\ell} \log n)$ time, but there is an upper bound on the total number of insertions and deletions allowed. Thorup [20], improving the result of Feldmann and Muthukrishnan, presents a linear-size data structure with $O(\ell)$ query time and $O(n^{1/\ell})$ update time for $\ell = o(\log n/\log\log n)$, $\ell \ge \min\{\log n/\log\log n, \log\log N\}$, when the endpoints are integers not exceeding $N$. See [13] for a survey of such results.

If the input is too large to fit in the main memory, one is interested in an external memory data structure. In the standard two-level I/O model of computation [2], the machine consists of a finite main memory and an infinite-size disk. In this model, a block of $B$ consecutive elements can be transferred between main memory and disk, and this is referred to as one I/O operation. The data structure is stored in a number of disk blocks, each of size $B$, and the cost of an operation is measured by the number of I/O operations. See [4, 21] for surveys on external memory data structures. For the stabbing-semigroup problem, the I/O-efficient interval tree developed by Arge and Vitter [5] can be used to construct a linear-size data structure for answering a stabbing-semigroup query in $O(\log_B n)$ I/Os. An interval can be inserted into $S$ using $O(\log_B n)$ I/Os. Their structure can be modified to handle deletions so that each update takes $O(\log_B n)$ I/Os but then a query requires $O(\log_B^2 n)$ I/Os. An I/O-efficient structure for the stabbing-group problem is presented in [22] that uses linear space, answers a query and performs an update in $O(\log_B n)$ I/Os, but it does not work for the semigroup problem.

Our results. The results in this paper combine and extend results from two conference abstracts [1, 14]. Our main result is a linear-size data structure for the stabbing-semigroup problem, in the pointer-machine model [19] of computation. Our structure answers queries and supports updates (insertions as well as deletions) in $O(\log n)$ time. The query bound is worst-case while the update bounds are amortized. Our solution starts from the straightforward solutions based on interval and segment trees mentioned above. We then combine features of these two data structures so that query time is $O(\log n)$, update time is $O(\log n \log\log n)$, and the size of the data structure is $O(n \log\log n)$. Next, we reduce the size and update time by a factor of $\log\log n$ by using a base tree that is a weight-balanced tree with a large fan-out, in which each fat leaf stores the endpoints of many intervals. Our approach also leads to a data structure with a similar performance in the I/O model. More precisely, we obtain a linear-size data structure such that a query can be answered using $O(\log_B n)$ I/Os (worst-case) and each update takes $O(\log_B n)$ I/Os (amortized). We also propose a simpler data structure that uses dynamic trees to solve nested instances of the problem. Finally, we prove that our structure is optimal, in the sense that for certain semigroups none of the query, insertion, or deletion bounds can be improved without sacrificing the others. The lower bound is established in the cell-probe model, and in fact holds for the (easier) stabbing-group problem. Previously an $\Omega(\log n/\log\log n)$ lower bound was known [3] for the problem. Our structure can be extended to higher dimensions using segment trees in a standard way [14], by paying a penalty of an $O(\log n)$ factor in both time and space for each additional dimension, but the results may not be optimal in two or higher dimensions.

The rest of this paper is organized as follows. We begin in Section 2 by describing simple data structures for the stabbing-semigroup problem that use interval and segment trees. In Section 3, we describe our structure under the assumption that the endpoints of all intervals belong to a fixed set $P$ of $O(n)$ points. This allows us to disregard the rebalancing issue of the base tree in our multi-level structure. We remove this assumption in Section 4, where we describe how to rebalance the base tree. In Section 5, we describe how our structure can be adapted to external memory. Section 6 presents our data structure for nested instances. We prove the lower bounds in Section 7 and conclude with some open problems in Section 8.

2 Preliminaries and Basic Data Structures

We denote by $S$ the set of (closed) intervals stored in the structure. We use $n$ to denote the cardinality of $S$. Note that $n$ changes as $S$ is modified via insertions and deletions. For an interval $x \in S$, we denote by $\omega(x)$ the weight of $x$. Each weight belongs to a semigroup $\mathcal{S}$. To simplify the presentation, we assume that all the endpoints of the intervals in $S$, as well as the queries, are distinct. This assumption can easily be removed by fixing an arbitrary order among identical endpoints.

For any $Y \subset S$, let $\omega(Y) = \sum_{s \in Y} \omega(s)$. For a subset $Y \subset S$ and a point $q$, we denote by $Y(q)$ the subset of $Y$ consisting of all intervals containing $q$. We assume that the semigroup is a monoid, i.e., it has an identity element, which we denote by $0$, and define $\omega(\emptyset) = 0$.

Next, we describe the basic building blocks of our main data structure. The basic ingredient we use is a structure for storing a totally ordered set $X$ such that each $x \in X$ has a weight $\omega(x) \in \mathcal{S}$, subject to the following operations.

(i) INSERT($x$): Insert $x$ into $X$.

(ii) DELETE($x$): Delete $x$ from $X$.

(iii) UPDATEWT($x, w$): Given $x \in X$ and $w \in \mathcal{S}$, update $\omega(x)$ to be $w$. We can implement this by deleting $x$ and reinserting $x$ with its new weight.

(iv) WT($X$): Return $\omega(X)$.

(v) PREFIXSUM($b$): Given $b \in X$, return $\sum_{x \in X,\, x \le b} \omega(x)$.

We implement this data type by a dynamic balanced binary tree [7], in which we maintain the sum of the weights of the elements in each subtree. Then all operations take time logarithmic in the size of $X$, except WT($X$), which takes $O(1)$ time. The size of the data structure is linear in $|X|$. We can also support a FIND($x$) operation that locates $x$ in the search tree in logarithmic time. If $X$ is unordered, we can still use this data structure by imposing an arbitrary total order on $X$. We call such a basic structure a BST (for balanced search tree). Throughout this paper we shall often use the same name for the set and the BST representing it.
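The five operations above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it swaps in a treap (randomized priorities) for the balanced binary tree of [7], so its bounds hold in expectation rather than in the worst case. The class name `WeightedBST`, the method names, and the explicit `op`/`identity` arguments are hypothetical.

```python
import random

class Node:
    def __init__(self, key, weight):
        self.key, self.weight = key, weight
        self.prio = random.random()      # treap priority (assumed balancing scheme)
        self.left = self.right = None
        self.total = weight              # semigroup sum of this node's subtree

class WeightedBST:
    """Search tree in which every node caches the weight sum of its subtree."""

    def __init__(self, op, identity):
        self.op, self.identity, self.root = op, identity, None

    def _total(self, t):
        return t.total if t else self.identity

    def _pull(self, t):
        # Recompute the cached subtree sum from the children.
        left = self.op(self._total(t.left), t.weight)
        t.total = self.op(left, self._total(t.right))
        return t

    def _split(self, t, key, inclusive=False):
        """Split into (keys < key, rest), or (keys <= key, rest) if inclusive."""
        if t is None:
            return None, None
        if t.key < key or (inclusive and t.key == key):
            a, b = self._split(t.right, key, inclusive)
            t.right = a
            return self._pull(t), b
        a, b = self._split(t.left, key, inclusive)
        t.left = b
        return a, self._pull(t)

    def _merge(self, a, b):
        # Every key in a is smaller than every key in b.
        if a is None or b is None:
            return a or b
        if a.prio > b.prio:
            a.right = self._merge(a.right, b)
            return self._pull(a)
        b.left = self._merge(a, b.left)
        return self._pull(b)

    def insert(self, key, weight):
        a, b = self._split(self.root, key)
        self.root = self._merge(self._merge(a, Node(key, weight)), b)

    def delete(self, key):
        a, b = self._split(self.root, key)
        _, c = self._split(b, key, inclusive=True)   # drop the node equal to key
        self.root = self._merge(a, c)

    def update_wt(self, key, weight):
        self.delete(key)
        self.insert(key, weight)

    def wt(self):
        return self._total(self.root)                # O(1): cached at the root

    def prefix_sum(self, b):
        total, t = self.identity, self.root
        while t:
            if t.key <= b:       # t and its whole left subtree are <= b
                total = self.op(self.op(total, self._total(t.left)), t.weight)
                t = t.right
            else:
                t = t.left
        return total

bst = WeightedBST(max, float("-inf"))
for key, w in [(10, 3), (20, 7), (30, 5)]:
    bst.insert(key, w)
```

Note that `prefix_sum` combines a logarithmic number of cached subtree sums along one search path, which is why no inverse operation is ever needed.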

Our new data structure can be viewed as a mixture of an interval tree and a segment tree [9], so we start by reviewing these classical structures. We describe the static versions of each of these structures, but they can be made dynamic using the standard techniques [6].

Interval tree. In this section and Section 3, we assume that the endpoints of all intervals in $S$ that are ever in the structure belong to a fixed set $P$ of $m = O(n)$ points. We divide $\mathbb{R}$ into $m$ atomic intervals by picking an arbitrary separating point between every two consecutive points in $P$. We consider these atomic intervals closed (except the leftmost and the rightmost one). Let $T$ be a balanced full binary tree with $m$ leaves. Each node $v \in T$ is associated with an interval $\sigma_v$. If $v$ is the $i$-th leftmost leaf of $T$, then $\sigma_v$ is the $i$-th leftmost atomic interval. If $v$ is an interior node with $v_\ell$ and $v_r$ as its left and right children, then the common endpoint $x_v$ of $\sigma_{v_\ell}$ and $\sigma_{v_r}$ is stored at $v$, and we set $\sigma_v = \sigma_{v_\ell} \cup \sigma_{v_r}$. For a point $x \in \mathbb{R}$, let $\Pi_x$ denote the path in $T$ from the root to the deepest node $z$ such that $\sigma_z$ contains $x$. Note that for every point $x \notin P$, $\Pi_x$ is a path from the root to the leaf $z$ whose atomic interval contains $x$.

In an interval tree, an interval $s \in S$ is stored at the highest node $v$ such that $x_v \in s$. Let $S_v \subseteq S$ be the set of intervals stored at $v$; note that $S_v$ is empty if $v$ is a leaf. Let $s = [a, b]$ be an interval in $S_v$. We split $s$ into two subintervals $s_\ell = [a, x_v]$ and $s_r = [x_v, b]$, each carrying the weight $\omega(s)$. We define $L_v = \{s_\ell \mid s \in S_v\}$, $R_v = \{s_r \mid s \in S_v\}$, $L = \{s_\ell \mid s \in S\}$, and $R = \{s_r \mid s \in S\}$. For a query point $q \in \mathbb{R}$,

$$\omega(S(q)) = \omega(L(q)) + \omega(R(q)).$$

We compute $\omega(L(q))$ and $\omega(R(q))$ separately and return their sum. If $q = x_v$ for some separating point $x_v$, we query with a point $q^+$ that we consider to be symbolically larger than $q$, so that every interval containing $q$ is counted through exactly one of its two parts.

We add the following secondary structures to the interval tree to compute $\omega(L(q))$ efficiently; the construction is symmetric for computing $\omega(R(q))$. For a node $v$, let $E_v$ be the set of the left endpoints of the intervals in $L_v$. We assign to each point of $E_v$ the weight of the corresponding interval, and store $E_v$ in a BST. Clearly the total size of the data structure, including all secondary structures, is $O(n)$.

Let $q$ be a query point. Let $\Pi_q^\ell \subseteq \Pi_q$ be the set of nodes $v \in \Pi_q$ such that $v$'s left child is also in $\Pi_q$. Note that $L(q) \subseteq \bigcup_{v \in \Pi_q^\ell} L_v$ and that $\omega(L(q)) = \sum_{v \in \Pi_q^\ell} \omega(L_v(q))$. Moreover, for $v \in \Pi_q^\ell$, an interval $[a, x_v] \in L_v$ contains $q$ if and only if $a \le q$. To compute $\omega(L(q))$ we traverse the path $\Pi_q$. At each node $v \in \Pi_q^\ell$, we perform a PREFIXSUM($q$) query on $E_v$ to obtain the weight $\sum_{a \in E_v,\, a \le q} \omega(a) = \omega(L_v(q))$ in $O(\log n)$ time. Finally, we sum these weights and return the overall weight. Since we spend $O(\log n)$ time at each node $v$, the overall query time is $O(\log^2 n)$. We can insert or delete an interval $s$ in an interval tree in $O(\log n)$ time by finding the node $v$ such that $s \in S_v$ and updating the BST representing $E_v$.
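The interval-tree query described above can be sketched as follows. This is a simplified illustration, not the authors' code: the base tree is implicit (a node is the range `(lo, hi)` of leaves below it), and each secondary BST is replaced by a plain list that is scanned linearly, so a query here does not achieve the logarithmic cost per node. The class name `IntervalTree`, the midpoint choice for separating points, and the `(a, b, weight)` tuple layout are assumptions.

```python
from functools import reduce

class IntervalTree:
    """Interval tree over a fixed endpoint set P (assumes at least two points)."""

    def __init__(self, points, op, identity):
        self.p = sorted(points)
        # One separating point between every two consecutive endpoints
        # (midpoints are an arbitrary choice made for this sketch).
        self.sep = [(u + v) / 2 for u, v in zip(self.p, self.p[1:])]
        self.op, self.identity = op, identity
        self.stored = {}     # node (lo, hi) -> list of intervals stored there

    def _home(self, a, b):
        """Return the highest node whose separating point lies in [a, b]."""
        lo, hi = 0, len(self.p) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            x = self.sep[mid]
            if b < x:
                hi = mid
            elif a > x:
                lo = mid + 1
            else:
                return (lo, hi)
        raise ValueError("interval contains no separating point")

    def insert(self, a, b, w):
        self.stored.setdefault(self._home(a, b), []).append((a, b, w))

    def delete(self, a, b, w):
        self.stored[self._home(a, b)].remove((a, b, w))

    def query(self, q):
        total = self.identity
        lo, hi = 0, len(self.p) - 1
        while lo < hi:                   # walk the root-to-leaf path of q
            mid = (lo + hi) // 2
            x = self.sep[mid]
            here = self.stored.get((lo, hi), [])
            if q < x:
                # Path continues left: the left part [a, x] contains q iff a <= q.
                hits = [w for (a, b, w) in here if a <= q]
                hi = mid
            else:
                # Symmetric case (also used when q == x, treating q as
                # symbolically larger): [x, b] contains q iff q <= b.
                hits = [w for (a, b, w) in here if q <= b]
                lo = mid + 1
            total = reduce(self.op, hits, total)
        return total

tree = IntervalTree([1, 2, 5, 6, 8, 9], lambda x, y: x + y, 0)
for interval in [(1, 5, 4), (2, 8, 6), (6, 9, 10)]:
    tree.insert(*interval)
answer = tree.query(3)    # intervals [1, 5] and [2, 8] contain 3
```

Replacing each list by a weight-augmented search tree keyed on the left (respectively right) endpoints would turn each list scan into a single prefix-sum lookup, which is where the per-node logarithmic cost in the text comes from.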

Note that it might be tempting to use dynamic fractional cascading [15] to speed up the query procedure, but this does not work because the PREFIXSUM($q$) operation on a BST actually relies on retrieving $O(\log n)$ weights in the BST, not just one search location. If one stores the prefix sum at the search location, the update cost of the BST will be high.

Segment tree. A segment tree allows us to compute $\omega(S(q))$ in $O(\log n)$ time, although the update time is $O(\log^2 n)$ and the size is $O(n \log n)$. The base tree $T$ of the segment tree is the same as that of the interval tree. However, we now store an interval $s = [a, b]$ at a node $v$ if $\sigma_v \subseteq s$ and $\sigma_{p(v)} \not\subseteq s$, where $p(v)$ denotes the parent of $v$. Note that the parents of the nodes storing $s$ lie on $\Pi_a \cup \Pi_b$, and that we can find these nodes in $O(\log n)$ time. Let $\hat{S}_v \subseteq S$ be the set of intervals stored at $v$ in the segment tree. We maintain $\hat{S}_v$ in a BST (imposing an arbitrary order on these intervals). For a leaf $z$, let $L_z$ be the set of intervals with an endpoint inside $\sigma_z$. (For now, $L_z$ contains at most one interval, but denoting it as a set will be convenient later on.) We store $L_z$ at $z$.

Since an interval $s$ is stored at $O(\log n)$ nodes, the total size is $O(n \log n)$. An interval can be inserted or deleted in $O(\log^2 n)$ time by first finding in $O(\log n)$ time the nodes at which $s$ is stored and then updating the BST at each such node. For a query point $q \in \mathbb{R}$, let $z$ be the leaf on the path $\Pi_q$; then we have

$$S(q) = \bigcup_{v \in \Pi_q} \hat{S}_v \cup L_z(q). \tag{1}$$

Since the sets $\hat{S}_v$ for $v \in \Pi_q$ and $L_z$ are pairwise disjoint, $\omega(S(q)) = \sum_{v \in \Pi_q} \omega(\hat{S}_v) + \omega(L_z(q))$. Therefore, $\omega(S(q))$ can be computed in $O(\log n)$ time by traversing the path $\Pi_q$, retrieving the value $\omega(\hat{S}_v)$ in $O(1)$ time from the BST representing $\hat{S}_v$ at each node $v \in \Pi_q$, and finally checking if the interval in $L_z$ contains $q$.
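The segment-tree variant can be sketched in the same style. Again this is an illustration under stated assumptions rather than the paper's implementation: the per-node sets are plain lists whose sum is recomputed from scratch on every update (the text uses a BST per node instead), the base tree is implicit, and the names `SegmentTree`, `sets`, `agg`, and `leaf` are hypothetical. Every interval is assumed to have two distinct endpoints taken from the fixed point set.

```python
from bisect import bisect_right
from functools import reduce

class SegmentTree:
    """Segment tree over the atomic intervals of a fixed endpoint set P."""

    def __init__(self, points, op, identity):
        self.p = sorted(points)
        self.sep = [(u + v) / 2 for u, v in zip(self.p, self.p[1:])]
        self.index = {x: i for i, x in enumerate(self.p)}
        self.op, self.identity = op, identity
        self.sets = {}   # node (lo, hi) -> intervals stored at that node
        self.agg = {}    # node (lo, hi) -> cached semigroup sum of sets[node]
        self.leaf = {}   # leaf z -> intervals with an endpoint in its atomic interval

    def _nodes(self, lo, hi, i, j):
        """Yield the maximal nodes whose leaves all lie in the range [i, j]."""
        if i > hi or j < lo:
            return
        if i <= lo and hi <= j:
            yield (lo, hi)
            return
        mid = (lo + hi) // 2
        yield from self._nodes(lo, mid, i, j)
        yield from self._nodes(mid + 1, hi, i, j)

    def _change(self, bucket, s, add):
        if add:
            bucket.append(s)
        else:
            bucket.remove(s)

    def _update(self, s, add):
        i, j = self.index[s[0]], self.index[s[1]]
        # Leaves i+1 .. j-1 are fully covered by s; leaves i and j only partly.
        for v in self._nodes(0, len(self.p) - 1, i + 1, j - 1):
            bucket = self.sets.setdefault(v, [])
            self._change(bucket, s, add)
            self.agg[v] = reduce(self.op, (w for _, _, w in bucket), self.identity)
        for z in (i, j):
            self._change(self.leaf.setdefault(z, []), s, add)

    def insert(self, a, b, w):
        self._update((a, b, w), True)

    def delete(self, a, b, w):
        self._update((a, b, w), False)

    def query(self, q):
        z = bisect_right(self.sep, q)    # leaf whose atomic interval contains q
        total, lo, hi = self.identity, 0, len(self.p) - 1
        while True:                      # one cached sum per node on the path
            total = self.op(total, self.agg.get((lo, hi), self.identity))
            if lo == hi:
                break
            mid = (lo + hi) // 2
            lo, hi = (lo, mid) if z <= mid else (mid + 1, hi)
        # Intervals ending inside the leaf must be tested against q directly.
        hits = (w for (a, b, w) in self.leaf.get(z, []) if a <= q <= b)
        return reduce(self.op, hits, total)

seg = SegmentTree([1, 2, 5, 6, 8, 9], lambda x, y: x + y, 0)
for interval in [(1, 5, 4), (2, 8, 6), (6, 9, 10)]:
    seg.insert(*interval)
answer = seg.query(7)     # intervals [2, 8] and [6, 9] contain 7
```

The query touches one cached value per node on a single root-to-leaf path, while an update rewrites the sets at all nodes storing the interval; this is the opposite trade-off from the interval-tree sketch.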

3 An Optimal Data Structure for Fixed Endpoints

In this section, we continue to assume that although the set $S$ of intervals is dynamic, the endpoints of these intervals belong to a fixed set $P$ of $O(n)$ points. Recall that our assumptions in Section 2 also imply that each point of $P$ is an endpoint of at most one interval. The main result is a linear-size data structure that answers a stabbing-semigroup query in $O(\log n)$ time and performs an update in $O(\log n)$ time. Recall that the segment tree attains an optimal query time whereas the interval tree


References

Introduction to Algorithms

Computational Geometry: Algorithms and Applications

Introduction to Algorithms, 2nd edition.

The input/output complexity of sorting and related problems

A data structure for dynamic trees
