An Adaptive Packed-Memory Array
Michael A. Bender
Stony Brook University
and
Haodong Hu
Stony Brook University
The packed-memory array (PMA) is a data structure that maintains a dynamic set of N elements in sorted order
in a Θ(N)-sized array. The idea is to intersperse Θ(N) empty spaces or gaps among the elements so that only
a small number of elements need to be shifted around on an insert or delete. Because the elements are stored
physically in sorted order in memory or on disk, the PMA can be used to support extremely efficient range
queries. Specifically, the cost to scan L consecutive elements is O(1+ L/B) memory transfers.
This paper gives the first adaptive packed-memory array (APMA), which automatically adjusts to the input
pattern. Like the traditional PMA, any pattern of updates costs only O(log^2 N) amortized element moves and
O(1 + (log^2 N)/B) amortized memory transfers per update. However, the APMA performs even better on many
common input distributions, achieving only O(logN) amortized element moves and O(1 + (logN)/B) amortized
memory transfers. The paper analyzes sequential inserts, where the insertions are to the front of the APMA;
hammer inserts, where the insertions “hammer” on one part of the APMA; random inserts, where the insertions
are after random elements in the APMA; and bulk inserts, where for constant α ∈ [0,1], N^α elements are inserted
after random elements in the APMA. The paper then gives simulation results that are consistent with the
asymptotic bounds. For sequential insertions of roughly 1.4 million elements, the APMA has four times fewer
element moves per insertion than the traditional PMA and running times that are more than seven times faster.
Categories and Subject Descriptors: D.1.0 [Programming Techniques]: General; E.1 [Data Structures]: Ar-
rays; E.1 [Data Structures]: Lists, stacks, queues; E.5 [Files]: Sorting/searching; H.3.3 [Information Storage
and Retrieval]: Information Search and Retrieval
General Terms: Algorithms, Experimentation, Performance, Theory.
Additional Key Words and Phrases: Adaptive Packed-Memory Array, Cache Oblivious, Locality Preserving,
Packed-Memory Array, Range Query, Rebalance, Sequential File Maintenance, Sequential Scan, Sparse Array.
1. INTRODUCTION
A classical problem in data structures and databases is how to maintain a dynamic set of
N elements in sorted order in a Θ(N)-sized array. The idea is to intersperse Θ(N) empty
spaces or gaps among the elements so that only a small number of elements need to be
shifted around on an insert or delete. These data structures effectively simulate a library
bookshelf, where gaps on the shelves mean that books are easily added and removed.
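To make the bookshelf analogy concrete, here is a minimal Python sketch (ours, not the paper's code) of an insert into a sorted array with interspersed gaps, where `None` marks an empty slot. Elements are shifted right only until the first gap, so a nearby gap makes the insert cheap.

```python
def insert_after(arr, pos, value):
    """Insert value just after index pos, shifting elements right only
    until the first gap (None); raises if no gap exists to the right."""
    gap = pos + 1
    while gap < len(arr) and arr[gap] is not None:
        gap += 1
    if gap == len(arr):
        raise ValueError("no gap to the right; a rebalance would be needed")
    # Shift the occupied run [pos+1, gap) right by one, then place value.
    arr[pos + 2:gap + 1] = arr[pos + 1:gap]
    arr[pos + 1] = value

# Only one element (the 30) moves, because a gap sits nearby:
a = [10, None, 20, 30, None, 40]
insert_after(a, 2, 25)   # a becomes [10, None, 20, 25, 30, 40]
```

When no gap is near, many elements must shift, which is exactly the situation the PMA's density invariants are designed to keep rare.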
Remarkably, such data structures can be efficient for any pattern of inserts/deletes. Indeed,
it has been known for over two decades that the number of element moves per update is only
O(log^2 N) both amortized [Itai et al. 1981] and in the worst case [Willard 1982; 1986; 1992].
Since these data structures were proposed, this problem has been studied under different
names, including sparse arrays [Itai et al. 1981; Katriel 2002], sequential file maintenance
[Willard 1982; 1986; 1992], and list labeling [Dietz 1982; Dietz and Sleator 1987; Dietz
and Zhang 1990; Dietz et al. 1994]. The problem is also closely related to the
order-maintenance problem [Dietz 1982; Tsakalidis 1984; Dietz and Sleator 1987;
Bender et al. 2002].

Department of Computer Science, Stony Brook University, Stony Brook, NY 11794-4400, USA.
Email: {bender,huhd}@cs.sunysb.edu. This research was supported in part by NSF Grants
CCF 0621439/0621425, CCF 0540897/05414009, CCF 0634793/0632838, and CNS 0627645.
Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use
provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server
notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the
ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific
permission and/or a fee.
© 2007 ACM 1529-3785/2007/0700-0001$5.00
ACM Transactions on Computational Logic, Vol. V, No. N, July 2007, Pages 1–30.
Recently there has been renewed interest in these sparse-array data structures because
of their application in I/O-efficient and cache-oblivious algorithms. The I/O-efficient and
cache-oblivious version of the sparse array is called the packed-memory array (PMA) [Bender
et al. 2000; 2005]. The PMA maintains N elements in sorted order in a Θ(N)-sized
array. It supports the operations insert, delete, and scan. Let B be the number of ele-
ments that fit within a memory block. To insert an element y after a given element x,
when given a pointer to x, or to delete x, costs O(log^2 N) amortized element moves and
O(1 + (log^2 N)/B) amortized memory transfers. The PMA maintains the density invariant
that in any region of size S (for S greater than some small constant value), there are Θ(S)
elements stored in it. To scan L elements after a given element x, when given a pointer to
x, costs Θ(1 + L/B) memory transfers.
The PMA has been used in cache-oblivious B-trees [Bender et al. 2000; Bender et al.
2002; Brodal et al. 2002; Bender et al. 2004; Bender et al. 2005; Bender et al. 2006], con-
current cache-oblivious B-trees [Bender et al. 2005], the cache-oblivious string B-tree [Bender
et al. 2006], and scanning structures [Bender et al. 2002]. A sparse array in the same spirit
as the PMA was independently proposed and used in the locality-preserving B-tree of [Ra-
man 1999], although the asymptotic space bounds are superlinear and therefore inferior to
the linear space bounds of the earlier sparse-array data structures [Itai et al. 1981; Willard
1982; 1986; 1992] and the PMA [Bender et al. 2000; 2005].
We now give more details about how to implement search in a PMA. For example, the
update and scan bounds above assume that we are given a pointer to an element x; we now
show how to find such a pointer. A straightforward approach is to use a standard binary
search, slightly modified to deal with gaps. However, binary search does not have good
data locality. As a result, binary search is not efficient when the PMA resides on disk
because search requires O(1 + log⌈N/B⌉) memory transfers. An alternative approach is to
use a separate index into the array; the index is designed for efficient searches. In [Raman
1999] that index is a B-tree, and in [Bender et al. 2000; Bender et al. 2002; 2004; Bender
et al. 2005] the index is some type of binary search tree, laid out in memory using a so-
called van Emde Boas layout [Prokop 1999; Bender et al. 2000; 2005].
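For illustration only, the "binary search slightly modified to deal with gaps" can be sketched as follows (our code, not the paper's; the paper's indexed variants use a B-tree or a van Emde Boas layout instead). A probe that lands on a gap slides left to the nearest occupied slot.

```python
def search(arr, key):
    """Binary search in a sorted array with None gaps.

    Returns the index of the rightmost element <= key, or -1 if every
    element exceeds key. Gaps are sparse, so sliding off them is cheap."""
    lo, hi = 0, len(arr) - 1
    best = -1
    while lo <= hi:
        mid = (lo + hi) // 2
        probe = mid
        # Slide left off a run of gaps to find a real element.
        while probe >= lo and arr[probe] is None:
            probe -= 1
        if probe < lo:
            # Only gaps in [lo, mid]; the answer lies to the right.
            lo = mid + 1
            continue
        if arr[probe] <= key:
            best = probe
            lo = mid + 1
        else:
            hi = probe - 1
    return best
```

The search visits Θ(logN) widely separated positions, which is exactly why it has poor locality on disk compared with an index laid out for searching.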
The primary use of the PMA in the literature has been for sequential storage in mem-
ory/disk of all the elements of a (cache-oblivious or traditional) B-tree. An early paper
suggesting this idea was [Raman 1999]. The PMA maintains locality of reference at all
granularities and consequently supports extremely efficient sequential scans/range queries
of the elements. The concern with traditional B-trees is that the 2K or 4K sizes of disk
blocks are too small to amortize the cost of disk seeks. Consequently, on modern disks,
random block accesses are well over an order-of-magnitude slower than sequential block
accesses. Thus, locality-preserving B-trees and cache-oblivious B-trees based on PMAs
support range queries that run an order of magnitude faster than those of traditional B-
trees [Bender et al. 2006]. Moreover, since the elements are maintained strictly in sorted
order, these structures do not suffer from aging, unlike most file systems and databases.
The point is that traditional B-trees age: as new blocks are allocated and deallocated to
the B-tree, blocks that are logically near each other end up far from each other on the disk.
The result is that range-query performance suffers.
The PMA is an efficient and promising data structure, but it also has weaknesses. The
main weakness is that the PMA performs relatively poorly on some common insertion
patterns such as sequential inserts. For sequential inserts, the PMA performs near its worst
in terms of the number of elements moved per insert. The PMA's difficulty with sequential
inserts is that the insertions “hammer” on one part of the array, causing many elements to
be shifted around. Although O(log^2 N) amortized element moves and O(1 + (log^2 N)/B)
amortized memory transfers are surprisingly good considering the stringent requirements
on the data order, this is relatively slow compared with traditional B-tree inserts. Moreover,
sequential inserts are common, and B-trees in databases are frequently optimized for this
insertion pattern. It would be better if the PMA could perform near its best, not worst, in
this case.
In contrast, one of the PMA's strengths is its performance on common insertion patterns
such as random inserts. For random inserts, the PMA performs extremely well with only
O(logN) element moves per insert and only O(1 + (logN)/B) memory transfers. This
performance surpasses the guarantees for arbitrary inserts.
Results. This paper proposes an adaptive packed-memory array (abbreviated adaptive
PMA or APMA), which overcomes these deficiencies of the traditional PMA. Our struc-
ture is the first PMA that adapts to insertion patterns, and it gives the largest decrease in
the cost of sparse arrays/sequential-file maintenance in almost two decades. The APMA
retains the same amortized guarantees as the traditional PMA, but adapts to common in-
sertion patterns, such as sequential inserts, random inserts, and bulk inserts, where chunks
of elements are inserted at random locations in the array.
We give the following results for the APMA:
-We first show that the APMA has the “rebalance property”, which ensures that any
pattern of insertions costs only O(1 + (log^2 N)/B) amortized memory transfers and
O(log^2 N) amortized element moves. Because the elements are kept in sorted order in
the APMA, as with the PMA, scans of L elements cost O(1 + L/B) memory transfers.
Thus, the adaptive PMA guarantees performance at least as good as that of the
traditional PMA.
-We next analyze the performance of the APMA under some common insertion patterns.
We show that for sequential inserts, where all the inserts are to the front of the array,
the APMA makes only O(logN) amortized element moves and O(1 + (logN)/B)
amortized memory transfers.
-We generalize this analysis to hammer inserts, where the inserts hammer on any single
element in the array.
-We then turn to random inserts, where each insert occurs after a randomly chosen
element in the array. We establish that the insertion cost is again only O(logN) amortized
element moves and O(1 + (logN)/B) amortized memory transfers.
-We generalize all these previous results by analyzing the case of bulk inserts. In the
bulk-insert insertion pattern, we pick a random element in the array and perform N^α
inserts after it for α ∈ [0,1]. We show that for all values of α ∈ [0,1], the APMA also
only performs O(logN) amortized element moves and O(1 + (logN)/B) amortized
memory transfers.
-We next perform simulations and experiments, measuring the performance of the
APMA on these insertion patterns. For sequential insertions of roughly 1.4 million
elements, the APMA has over four times fewer element moves per insertion than the
traditional PMA and running times that are nearly seven times faster. For bulk
insertions of 1.4 million elements, where f(N) = N^0.6, the APMA has over two times
fewer element moves per insertion than the traditional PMA and running times that
are over three times faster.
2. ADAPTIVE PACKED-MEMORY ARRAY
In this section we introduce the adaptive PMA. We first explain how the adaptive PMA
differs from the traditional PMA. We then show that both PMAs have the same amor-
tized bounds, O(log
2
N) element moves and O(1 + (log
2
N)/B) memory transfers per in-
sert/delete. Thus, adaptivity comes at no extra asymptotic cost.
Description of Traditional and Adaptive PMAs. We first describe how to insert into both
the adaptive and traditional PMAs. Henceforth, PMA with no preceding adjective refers
to either structure. When we insert an element y after an existing element x in the PMA,
we look for a neighborhood around element x that has sufficiently low density, that is,
we look for a subarray that is not storing too many or too few elements. Once we find
a neighborhood of the appropriate density, we rebalance the neighborhood by spacing
out the elements, including y. In the traditional PMA, we rebalance by spacing out the
elements evenly. In the adaptive PMA, we may rebalance the elements unevenly, based on
previous insertions, that is, we leave extra gaps near elements that have recently had inserts
after them.
We deal with a PMA that is too full or too empty as we would with a dynamic hash table.
Namely, we recopy the elements into a new PMA that is a constant factor larger or smaller. In this
paper, this constant is stated as 2. However, the constant could be larger or smaller (say
1.2) with almost no change in running time. This is because most of the cost from element
moves comes from rebalances rather than from recopies.
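As a sketch of this recopying policy (our code, not the paper's; the 0.2 and 0.5 bounds follow the example whole-array densities used later in this section):

```python
def resize_if_needed(arr, d_min=0.2, d_max=0.5, growth=2):
    """Recopy into an array a constant factor larger (or smaller) when
    the overall density leaves [d_min, d_max]; otherwise return arr."""
    elems = [x for x in arr if x is not None]
    cap = len(arr)
    if len(elems) / cap > d_max:
        cap *= growth          # too dense: grow
    elif len(elems) / cap < d_min and cap >= 2 * growth:
        cap //= growth         # too sparse: shrink
    else:
        return arr
    new = [None] * cap
    for k, x in enumerate(elems):       # spread the elements evenly
        new[(k * cap) // len(elems)] = x
    return new
```

Each recopy moves every element once, but recopies happen only after Θ(N) updates, so their amortized cost is dominated by the rebalances.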
We now give some terminology. We divide the PMA into Θ(N/ logN) segments, each
of size Θ(logN), and we let the number of segments be a power of 2. We call a contiguous
group of segments a window. We view the PMA in terms of a tree structure, where the
nodes of the tree are windows. The root node is the window containing all segments, and
a leaf node is a window containing a single segment. A node in the tree that is a window
of 2^i segments has two children, a left child that is the window of the first 2^(i-1) segments
and a right child that is the window of the last 2^(i-1) segments.
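Because windows are aligned runs of a power-of-two number of segments, this tree can stay implicit; simple index arithmetic (our sketch, not the paper's code) recovers any node's segment range and its children:

```python
def window_range(height, index):
    """Segment range (first, last) of the index-th window at this height;
    a window at height i spans 2**i aligned segments."""
    size = 2 ** height
    first = index * size
    return first, first + size - 1

def children(height, index):
    """The two half-windows below a node of height >= 1."""
    assert height >= 1
    return (height - 1, 2 * index), (height - 1, 2 * index + 1)
```

For example, in a PMA with 8 segments, the root is the window at height 3, whose children are the windows over segments 0-3 and 4-7.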
We let the height of the tree be h, so that 2^h = Θ(N/logN) and h = lgN - lglgN + O(1).
The nodes at each height ℓ have an upper density threshold τ_ℓ and a lower density
threshold ρ_ℓ, which together determine the acceptable density of keys within a window of
2^ℓ segments. As the node height increases, the upper density thresholds decrease and the
lower density thresholds increase. Thus, for constant minimum and maximum densities
D_min and D_max, we have

    D_min = ρ_0 < ··· < ρ_h < τ_h < ··· < τ_0 = D_max.    (1)
The density thresholds on windows of intermediate powers of 2 are arithmetically dis-
tributed. For example, the maximum density threshold of a segment can be set to 1.0, the
maximum density threshold of the entire array to 0.5, the minimum density threshold of
the entire array to 0.2, and the minimum density of a segment to 0.1. If the PMA has 32
segments, then the maximum density threshold of a single segment is 1.0, of two segments
is 0.9, of four segments is 0.8, of eight segments is 0.7, of 16 segments is 0.6, and of all 32
segments is 0.5.
More formally, upper and lower density thresholds for nodes at height ℓ are defined as
follows:

    τ_ℓ = τ_h + (τ_0 - τ_h)(h - ℓ)/h,    (2)
    ρ_ℓ = ρ_h - (ρ_h - ρ_0)(h - ℓ)/h.    (3)

Moreover,

    2ρ_h < τ_h,    (4)
because when we double the size of an array that becomes too dense, the new array must
be within the density threshold.¹ Observe that any values of τ_0, τ_h, ρ_0, and ρ_h that satisfy
(1)-(4) and enable the array to have size Θ(N) will work. The important requirement is
that

    τ_(ℓ-1) - τ_ℓ = O(ρ_ℓ - ρ_(ℓ-1)) = O(1/logN).
We now give more details about how to insert element y after an existing element x. If
there is enough space in the leaf (segment) containing x, then we rearrange the elements
within the leaf to make room for y. If the leaf is full, then we find the closest ancestor of the
leaf whose density is within the permitted thresholds and rebalance. To delete an element
x, we remove x from its segment. If the segment falls below its density threshold, then,
as before, we find the smallest enclosing window whose density is within threshold and
rebalance. As described above, if the entire array is above the maximum density threshold
(resp., below the minimum density threshold), then we recopy the keys into a PMA of
twice (resp., half) the size.
We introduce further notation. Let Cap(u_ℓ) denote the number of array positions in
node u_ℓ of height ℓ. Since there are 2^ℓ segments in the node, the capacity is Θ(2^ℓ logN).
Let Gaps(u_ℓ) denote the number of gaps, i.e., unfilled array positions in node u_ℓ. Let
Density(u_ℓ) denote the fraction of elements actually stored in node u_ℓ, i.e., Density(u_ℓ) =
1 - Gaps(u_ℓ)/Cap(u_ℓ).
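Using these definitions, the walk up to the smallest enclosing window that can absorb an insert can be sketched as follows (our code, not the paper's; we use a strict inequality so the chosen window has room for the incoming element, and the τ_ℓ constants of the earlier example):

```python
def density(arr, first, last):
    """Density(u) = 1 - Gaps(u)/Cap(u) over the slots arr[first..last]."""
    cap = last - first + 1
    gaps = sum(1 for x in arr[first:last + 1] if x is None)
    return 1 - gaps / cap

def enclosing_window(arr, seg, seg_size, h, tau0=1.0, tauh=0.5):
    """Walk up from the leaf containing segment seg to the smallest
    window whose density is strictly below its upper threshold."""
    for level in range(h + 1):
        segs = 2 ** level
        first = (seg // segs) * segs * seg_size
        last = first + segs * seg_size - 1
        tau = tauh + (tau0 - tauh) * (h - level) / h   # equation (2)
        if density(arr, first, last) < tau:
            return level, first, last
    return None   # the whole array is too dense: double it

# Four segments of two slots each (h = 2); the leaf and its parent sit
# exactly at their thresholds, so the rebalance window is the whole array.
pma = [1, 2, 3, None, None, None, None, None]
```

Here `enclosing_window(pma, 0, 2, 2)` climbs past the full leaf (density 1.0) and its parent (density 0.75) and settles on the root window.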
Rebalance. We rebalance a node u_ℓ of height ℓ if u_ℓ is within threshold, but we detect
that a child node u_(ℓ-1) is outside of threshold. Any node whose elements are rearranged in
the process of a rebalance is swept. Thus, we sweep a node u_ℓ of height ℓ when we detect
that a child node u_(ℓ-1) is outside of threshold, but now u_ℓ need not be within threshold. Note
that with this rebalance scheme, this tree can be implicitly rather than explicitly maintained.
In this case, a rebalance consists of two scans, one to the left and one to the right of the
insertion point until we find a region of the appropriate density.
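Once the target window is found, the traditional PMA's rebalance simply respaces its elements evenly; a minimal sketch (our code, not the paper's):

```python
def rebalance_evenly(arr, first, last):
    """Respace the elements of arr[first..last] evenly across the window,
    preserving their sorted order (traditional-PMA rebalance)."""
    elems = [x for x in arr[first:last + 1] if x is not None]
    cap = last - first + 1
    arr[first:last + 1] = [None] * cap
    for k, x in enumerate(elems):
        arr[first + (k * cap) // len(elems)] = x

a = [1, 2, 3, None, None, None, None, None]
rebalance_evenly(a, 0, 7)   # a becomes [1, None, 2, None, None, 3, None, None]
```

The adaptive PMA replaces only this last step: it spaces the same elements unevenly, leaving extra gaps near recent insertion points.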
In a traditional PMA we rebalance evenly, whereas in the adaptive PMA we rebalance
unevenly. The idea of the APMA is to store a smaller number of elements in the leaves in
¹There are straightforward ways to generalize (4) to further reduce space usage. Introducing this generalization
here leads to unnecessary complication in presentation.
References
- M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms.
- P. F. Dietz and D. D. Sleator. Two algorithms for maintaining order in a list.
- P. F. Dietz. Maintaining order in a linked list.
- G. S. Brodal, R. Fagerberg, and R. Jacob. Cache oblivious search trees via binary trees of small height.