Journal ArticleDOI

Functional approach to data structures and its use in multidimensional searching

01 Jun 1988-SIAM Journal on Computing (Society for Industrial and Applied Mathematics)-Vol. 17, Iss: 3, pp 427-462
TL;DR: The paper gives, in particular, linear-size data structures for range and rectangle counting in two dimensions with logarithmic query time, obtained through a redefinition of data structures in terms of functional specifications.
Abstract: We establish new upper bounds on the complexity of multidimensional searching. Our results include, in particular, linear-size data structures for range and rectangle counting in two dimensions with logarithmic query time. More generally, we give improved data structures for rectangle problems in any dimension, in a static as well as a dynamic setting. Several of the algorithms we give are simple to implement and might be the solutions of choice in practice. Central to this paper is the nonstandard approach followed to achieve these results. At its root we find a redefinition of data structures in terms of functional specifications.

Summary (1 min read)


Summary

  • The authors establish new upper bounds on the complexity of multidimensional searching.
  • The authors' results include, in particular, linear-size data structures for range and rectangle counting in two dimensions with logarithmic query time.
  • Central to this paper is the nonstandard approach followed to achieve these results.
  • Key words: functional programming, data structures, concrete complexity, multidimensional search, computational geometry, pointer machine, range search, intersection search, rectangle problems. CR categories: 5.25, 3.74, 5.39.



SIAM J. COMPUT. Vol. 17, No. 3, June 1988
© 1988 Society for Industrial and Applied Mathematics

A FUNCTIONAL APPROACH TO DATA STRUCTURES AND ITS USE IN MULTIDIMENSIONAL SEARCHING*

BERNARD CHAZELLE†
Abstract. We establish new upper bounds on the complexity of multidimensional searching. Our results include, in particular, linear-size data structures for range and rectangle counting in two dimensions with logarithmic query time. More generally, we give improved data structures for rectangle problems in any dimension, in a static as well as a dynamic setting. Several of the algorithms we give are simple to implement and might be the solutions of choice in practice. Central to this paper is the nonstandard approach followed to achieve these results. At its root we find a redefinition of data structures in terms of functional specifications.
Key words. functional programming, data structures, concrete complexity, multidimensional search, computational geometry, pointer machine, range search, intersection search, rectangle problems

CR categories. 5.25, 3.74, 5.39
1. Introduction. This paper has two main parts: in §2, we discuss a method for transforming data structures using functional specifications; in the remaining sections, we use such transformations to solve a number of problems in multidimensional searching. To begin with, let us summarize the complexity results of this paper.
The generalization of the notion of rectangle in higher dimensions is called a d-range: it is defined as the Cartesian product of d closed intervals over the reals. Let V be a set of n points in R^d, and let v be a function mapping a point p to an element v(p) in a commutative semigroup (G, +). Let W be a set of n d-ranges.
(1) Range counting: given a d-range q, compute the size of V ∩ q.
(2) Range reporting: given a d-range q, report each point of V ∩ q.
(3) Semigroup range searching: given a d-range q, compute Σ_{p ∈ V ∩ q} v(p).
(4) Range searching for maximum: semigroup range searching with maximum as the semigroup operation.
(5) Rectangle counting: given a d-range q, compute the size of {r ∈ W : q ∩ r ≠ ∅}.
(6) Rectangle reporting: given a d-range q, report each element of {r ∈ W : q ∩ r ≠ ∅}.
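For concreteness, the following brute-force reference implementations pin down the six query types (linear-time baselines, not the data structures developed in this paper; the Python rendition and all names are illustrative only):

```python
# Brute-force reference implementations of the six query types.
# A d-range is a list of d closed intervals [(lo_1, hi_1), ..., (lo_d, hi_d)];
# a point is a tuple of d real coordinates.

def in_range(p, q):
    """Is point p inside the d-range q?"""
    return all(lo <= x <= hi for x, (lo, hi) in zip(p, q))

def ranges_intersect(r, q):
    """Do the d-ranges r and q intersect?"""
    return all(max(lo1, lo2) <= min(hi1, hi2)
               for (lo1, hi1), (lo2, hi2) in zip(r, q))

def range_counting(V, q):                  # problem (1)
    return sum(1 for p in V if in_range(p, q))

def range_reporting(V, q):                 # problem (2)
    return [p for p in V if in_range(p, q)]

def semigroup_range_search(V, v, add, q):  # problem (3); add is the semigroup +
    vals = [v(p) for p in V if in_range(p, q)]
    if not vals:                           # empty sum: no identity is assumed
        return None
    acc = vals[0]
    for x in vals[1:]:
        acc = add(acc, x)
    return acc

def range_search_for_max(V, v, q):         # problem (4): the operation is max
    return semigroup_range_search(V, v, max, q)

def rectangle_counting(W, q):              # problem (5)
    return sum(1 for r in W if ranges_intersect(r, q))

def rectangle_reporting(W, q):             # problem (6)
    return [r for r in W if ranges_intersect(r, q)]
```

Each of these runs in time linear in n per query; the point of the paper is to preprocess V or W so that the same answers come back in polylogarithmic time with little storage.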
In each case, q represents a query to which we expect a fast response. The idea is to do some preprocessing to accommodate incoming queries in a repetitive fashion. Note that range counting (resp., reporting) is a subcase of rectangle counting (resp., reporting). To clarify the exposition (and keep up with tradition), however, we prefer to treat these problems separately. Other well-known problems falling under the umbrella of rectangle searching include point enclosure and orthogonal segment intersection.
The former involves computing the number of d-ranges enclosing a given query point, while the latter, set in two dimensions, calls for computing how many horizontal segments from a given collection intersect a query vertical segment. In both cases, the reduction to rectangle counting is immediate.
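Both reductions amount to viewing points and segments as degenerate rectangles; a minimal sketch (hypothetical helper names, reusing the brute-force rectangle_counting above):

```python
def point_enclosure_count(W, p):
    """How many d-ranges of W enclose the point p?
    Treat p as the degenerate d-range [p_1, p_1] x ... x [p_d, p_d]."""
    return rectangle_counting(W, [(x, x) for x in p])

def segment_stabbing_count(H, x, y1, y2):
    """How many horizontal segments (x1, x2, y) in H meet the vertical
    query segment {x} x [y1, y2]?  Each horizontal segment is the
    degenerate rectangle [x1, x2] x [y, y]; the query is [x, x] x [y1, y2]."""
    W = [[(x1, x2), (y, y)] for (x1, x2, y) in H]
    return rectangle_counting(W, [(x, x), (y1, y2)])
```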
The thrust of our results is to demonstrate the existence of efficient solutions to these problems that use minimum storage. One of the key ideas is to redesign range trees (Bentley [B2]) and segment trees (Bentley [B1], Bentley and Wood [BW]) so as to require only linear storage.

* Received by the editors September 23, 1985; accepted for publication (in revised form) March 9, 1987. A preliminary version of this paper has appeared in the Proceedings of the 26th Annual IEEE Symposium on Foundations of Computer Science, Portland, Oregon, October 1985, pp. 165-174.
† Department of Computer Science, Princeton University, Princeton, New Jersey 08544. This work was begun when the author was at Brown University, Providence, Rhode Island 02912. This work was supported in part by National Science Foundation grant MCS 83-03925.
Since improvements in this area typically involve trimming off logarithmic factors, one must be clear about the models of computation to be used. It is all too easy to "improve" algorithms by encoding astronomical numbers in one computer word or allowing arbitrarily complex arithmetic. To guard us from such dubious tricks, we will assume that the registers and memory cells of our machines can only hold integers in the range [0, n].
In the following, the sign × refers to multiplication, ÷ to division (truncated to the floor), and shift to the operation shift(k) = 2^k, defined for any k (0 ≤ k ≤ ⌊log n⌋). No operation is allowed if the result or any of the operands falls outside of the range [0, n]. This will make our model very weak and thus give all the more significance to our upper bounds.
Remark. As observed in Gabow et al. [GBT], the "orthogonal" nature of the problems listed above makes it possible to work in rank space, that is, to deal not with the coordinates themselves but with their ranks. This is precisely what we will be doing here. The conversion costs logarithmic time per coordinate, but it has the advantage of replacing real numbers by integers in the range [1, n]. It is important to keep in mind that although our data structures will use integers over O(log n) bits internally, no such restriction will be placed on the input and query coordinates. On the contrary, these will be allowed to assume any real values.
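A hedged sketch of the rank-space conversion (one sorted list per axis; every name is illustrative): after an O(n log n) preprocessing sort, each query coordinate is turned into a rank in [1, n] by one binary search, which is the logarithmic cost per coordinate mentioned above.

```python
import bisect

def build_rank_space(points):
    """Map n points with arbitrary real coordinates to points with integer
    coordinates (ranks) in [1, n]; also return the sorted axes for queries."""
    d = len(points[0])
    axes = [sorted(p[i] for p in points) for i in range(d)]
    ranked = [tuple(bisect.bisect_left(axes[i], p[i]) + 1 for i in range(d))
              for p in points]
    return axes, ranked

def query_to_rank_space(axes, q):
    """Convert a d-range with real endpoints into rank space,
    one O(log n) binary search per coordinate."""
    out = []
    for (lo, hi), axis in zip(q, axes):
        lo_rank = bisect.bisect_left(axis, lo) + 1   # first rank with coordinate >= lo
        hi_rank = bisect.bisect_right(axis, hi)      # last rank with coordinate <= hi
        out.append((lo_rank, hi_rank))
    return out
```

A point lies in the original d-range exactly when its rank vector lies in the converted one, so any structure built over the ranked points answers the real-coordinate queries.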
The models of computation we will consider include variants of pointer machines (Tarjan [T]). Recall that the main characteristic of these machines is to forbid any kind of address calculation. New memory cells can be obtained from a free list and are delivered along with pointers to them. A pointer is just a symbolic name, that is, an address whose particular representation is transparent to the machine and on which no arithmetic operation is defined. Only pointers provided by the free list can be used. For the time being, let us assume that the only operations allowed are = and < along with the standard Booleans.
We introduce our models of computation in order of increasing power. Of course, we always assume that the semigroup operations which might be defined (problems 3-4) can be performed in constant time.
(1) An elementary pointer machine (EPM) is a pointer machine endowed with +.
(2) A semi-arithmetic pointer machine (SAPM) is a pointer machine endowed with +, -, ×, ÷.
(3) An arithmetic pointer machine (APM) is a pointer machine endowed with +, -, ×, ÷, shift.
(4) A random access machine (RAM) is endowed with comparisons and +, -, ×, ÷. See Aho et al. [AHU] for details.
Our motivation for distinguishing between these models is twofold: one reason is to show that our basic techniques still work even on the barest machines. Another is to assess the sensitivity of the complexity of query-answering to the model of computation. Next, we briefly discuss these definitions.
First of all, complexity is measured in all cases under the uniform cost criterion (Aho et al. [AHU]). Note that subtraction can be simulated in O(log n) time and O(n) space on an EPM by binary search. Similarly, shift can be simulated on an SAPM in O(log log n) time by binary search in a tree of size O(log n) (this can also be done with constant extra space by repeated squaring). For this reason, we will drop the SAPM model from consideration altogether. Any result mentioned in this paper with regard to an APM also holds on an SAPM, up to within a multiplicative factor of log log n in the time complexity.
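To illustrate the last simulation, here is a short sketch (assuming only the arithmetic operations +, -, ×, ÷ are available and that 2^k fits in the admissible range, i.e., k ≤ ⌊log n⌋): shift(k) = 2^k is computed by repeated squaring, with O(log k) = O(log log n) operations and constant extra space.

```python
def shift(k):
    """Compute shift(k) = 2**k using only multiplication and integer
    division (repeated squaring).  Runs in O(log k) = O(log log n) steps,
    and every intermediate value stays within [0, 2**k], hence within the
    machine's admissible range [0, n] whenever k <= floor(log n)."""
    result, base = 1, 2
    while k > 0:
        if k % 2 == 1:          # low-order bit of k, obtained by division
            result = result * base
        k = k // 2
        if k > 0:               # square only while more bits remain,
            base = base * base  # so that base never exceeds 2**k
    return result
```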
All the operations mentioned above are in the instruction set of any modern computer, so our models are quite realistic. We have omitted shift from the RAM, because this operation can be simulated in constant time by table look-up. (All logarithms are taken to the base 2; throughout this paper, we will use the notation log^c n to designate (log n)^c.)
One final comment concerns the word-size. Suppose that for some application we need to use integers in the range [-n^c, n^c], for some constant c > 0. Any of the machines described above will work just as well. Indeed, we can accommodate any polynomial range by considering virtual words made of a constant number of actual machine words. The simulation will degrade the time performance by only a constant factor. See Knuth [K2] for details on multiple-precision arithmetic.
The complexity results. We have summarized our results for R^2 in Tables 1 and 2. The first concerns the static case. Each pair (x, y) indicates the storage O(x) required by the data structure and the time O(y) to answer a query; we use ε to denote an arbitrary positive real. In the second table one will find our results for the dynamic case. We give successively the storage requirement, the query time, and the time for an insertion or a deletion. In both cases, k indicates the number of objects to be reported plus one (we add a 1 to treat the no-output case uniformly). The time to construct each of these data structures is O(n log n).
TABLE 1
The static case. Entries are (storage, query time) pairs for the RAM, APM, and EPM models; a row may list several space/time tradeoffs.

range/rectangle counting: (n, log n); (n, log n); (n, log n)
range/rectangle reporting: (n log log n, k log log(2n/k) + log n); (n log^ε n, k + log n); (n, k log(2n/k) + log n)
range search for max: (n, log^{1+ε} n); (n log log n, log n log log n); (n log^ε n, log n); (n, log n); (n, log n)
semigroup range search: (n, log^{2+ε} n); (n log log n, log n log log n); (n log^ε n, log n); (n, log n); (n, log n)
TABLE 2
The dynamic case on an EPM.

Problem                     Storage   Query time                    Update time
range/rectangle counting    O(n)      O(log^2 n)                    O(log^2 n)
range reporting             O(n)      O(k log^2(2n/k))              O(log^2 n)
range search for max        O(n)      O(log n log log n)            O(log n log log n)
semigroup range search      O(n)      O(log^4 n)                    O(log^4 n)
rectangle reporting         O(n)      O(k log^2(2n/k) + log^2 n)    O(log^2 n)

430
BERNARD
CHAZELLE
In
the
dynamic
case,
we
have
restricted
ourselves
to
the
EPM
model.
Our
objective
was
only
to
show
that
the
techniques
of
this
paper
could
be
dynamized
even
in
the
weakest
model.
It
is
likely
that
many
of
these
bounds
can
be
significantly
lowered
if
we
are
ready
to
use
a
more
powerful
model
such
as
a
RAM
or
an
APM.
We
leave
these
improvements
as
open
problems.
In the remainder of this paper we will mention upper bounds only in connection with the weakest models in which they hold. This is quite harmless as long as the reader keeps in mind that any result relative to an EPM holds on an APM or a RAM. Similarly, anything one can do on an APM can be done just as well on a RAM.
Also for the sake of exposition, we restrict ourselves to two dimensions (d = 2), but we recall a classical technique (Bentley [B2]) that allows us to extend all our data structures to R^d (d > 2). To obtain the complexity of the resulting algorithms, just multiply each expression in the original complexity by a factor of log^{d-2} n (note: the terms involving k remain unchanged, but a term log^{d-1} n is to be included in the query times).
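The classical technique is the multi-level range tree: a balanced tree on one coordinate whose every node carries a structure of one dimension less for the points in its subtree, so that a query decomposes into O(log n) lower-dimensional queries. A hedged sketch in its simplest form, recursing all the way down to one dimension (plugging efficient two-dimensional structures such as this paper's in at the bottom is what yields the extra log^{d-2} n factor quoted above; all names here are illustrative):

```python
import bisect

class RangeTree:
    """Static multi-level range tree for d-dimensional range counting.

    A balanced tree is built on the last coordinate; every node stores a
    (d-1)-dimensional RangeTree over the points in its subtree, and a query
    is charged to O(log n) canonical nodes per level.  This gives the
    classical O(n log^(d-1) n) space / O(log^d n) query bounds; it is the
    textbook reduction, not this paper's more compact structures.
    Assumes a non-empty list of points, all of the same dimension d >= 1."""

    def __init__(self, points):
        d = len(points[0])
        if d == 1:                                    # base case: a sorted list
            self.keys = sorted(p[0] for p in points)
            self.left = self.right = self.sub = None
            return
        pts = sorted(points, key=lambda p: p[-1])     # sort on the last coordinate
        self.keys = [p[-1] for p in pts]
        self.sub = RangeTree([p[:-1] for p in pts])   # (d-1)-dimensional structure
        if len(pts) > 1:
            mid = len(pts) // 2
            self.left, self.right = RangeTree(pts[:mid]), RangeTree(pts[mid:])
        else:
            self.left = self.right = None

    def count(self, q):
        """Number of points inside the d-range q = [(lo_1, hi_1), ...]."""
        lo, hi = q[-1]
        if hi < self.keys[0] or lo > self.keys[-1]:   # no overlap on this axis
            return 0
        if len(q) == 1:                               # 1-D: two binary searches
            return (bisect.bisect_right(self.keys, hi)
                    - bisect.bisect_left(self.keys, lo))
        if lo <= self.keys[0] and self.keys[-1] <= hi:
            return self.sub.count(q[:-1])             # canonical node: drop a dimension
        return self.left.count(q) + self.right.count(q)
```

For example, RangeTree(pts).count([(0, 5), (2, 9), (1, 4)]) counts the points of a 3-dimensional set pts lying in the box [0, 5] × [2, 9] × [1, 4].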
Update times are best thought of as amortized bounds, that is, averaged out over a sequence of transactions. A general technique can be used in most cases, however, to turn these bounds into worst-case bounds (Willard and Lueker [WL]). Similarly, a method described in Overmars [O] can often be invoked to reduce deletion times by a factor of log n.
Finally, we can use a result of Mehlhorn [Me] and Willard [W1] to show that if we can afford an extra log n factor in query time then the storage can often be reduced by a factor of log log n with each increment in dimension.
We will not consider these variants here, for at least two reasons. The first is that the techniques have already been thoroughly exposed, and it would be tedious but elementary to apply them to our data structures. The second is that these variants are usually too complex to be practical. We will strive in this paper to present data structures that are easy to implement.
We have not succeeded in all cases, but in some we believe that we have. For example, our solutions to range counting are short, simple, and very efficient in practice. To illustrate this point we have included in the paper the code of a Pascal implementation of one of the solutions.
Comparison with previous work. Roughly speaking, our results constitute improvements of a logarithmic factor in storage over previous methods. In particular, we present the first linear-size data structures for range and rectangle counting in two dimensions with logarithmic query times. For these two problems our data structures are essentially memory-compressed versions of the range tree of Bentley [B2], using new implementations of the idea of a downpointer introduced by Willard [W2].
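To make the flavor of such a structure concrete, here is a hedged modern rendition of a memory-compressed range tree for two-dimensional range counting in rank space: each level of the tree keeps one bit per point plus prefix counts, and the counts play the role of downpointers into the next level. This is a wavelet-tree-style sketch written for clarity, not the paper's exact construction; with the per-level counts stored as succinct bit vectors the whole structure takes O(n log n) bits, i.e., O(n) words, and a query costs O(log n).

```python
class RankSpaceCounter2D:
    """2-D orthogonal range counting over the points (i, ys[i]), i = 0..n-1,
    given in rank space (0 <= ys[i] < n).  One bit array plus prefix counts
    per level; a query descends one level per bit of the y-coordinate.
    (Prefix counts are stored naively here; succinct rank structures would
    bring the space down to O(n log n) bits, i.e. O(n) words.)"""

    def __init__(self, ys):
        self.n = len(ys)
        self.height = max(1, (self.n - 1).bit_length())
        self.levels = []                       # (bit, ones_prefix, zeros) per level
        cur = list(ys)
        for bit in range(self.height - 1, -1, -1):
            ones = [0]
            for v in cur:
                ones.append(ones[-1] + ((v >> bit) & 1))
            self.levels.append((bit, ones, len(cur) - ones[-1]))
            # stable partition: values with this bit equal to 0 first, then 1
            cur = ([v for v in cur if not (v >> bit) & 1] +
                   [v for v in cur if (v >> bit) & 1])

    def _count_below(self, l, r, y):
        """Number of positions i in [l, r) whose value is < y."""
        if l >= r or y <= 0:
            return 0
        if y >= (1 << self.height):
            return r - l
        res = 0
        for bit, ones, zeros in self.levels:
            l1, r1 = ones[l], ones[r]          # ones strictly before l and r
            l0, r0 = l - l1, r - r1            # zeros strictly before l and r
            if (y >> bit) & 1:
                res += r0 - l0                 # bit 0 here means "smaller than y"
                l, r = zeros + l1, zeros + r1  # follow the "downpointer" to the 1-side
            else:
                l, r = l0, r0                  # follow it to the 0-side
        return res

    def count(self, x1, x2, y1, y2):
        """Points with x-rank in [x1, x2] and y-rank in [y1, y2] (closed)."""
        return (self._count_below(x1, x2 + 1, y2 + 1)
                - self._count_below(x1, x2 + 1, y1))
```

Combined with the rank-space conversion sketched earlier (with ranks shifted to start at 0), RankSpaceCounter2D(ys).count(x1, x2, y1, y2) answers real-coordinate range counting queries in O(log n) time.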
As regards range and rectangle reporting on a RAM, we improve a method in Chazelle [C1] from (n log n/log log n, k + log n) to (n log^ε n, k + log n). Interestingly, we have shown in Chazelle [C2] that the (n log n/log log n, k + log n) algorithm is optimal on a pointer machine. This constitutes a rare example (outside of hashing) where a pointer machine is provably less powerful than a RAM.
Concerning range search for maximum, we improve over a data structure of Gabow et al. [GBT] from (n log n, log n) to (n log^ε n, log n). As regards semigroup range searching, we present an improvement of a factor of log^{1-ε} n in space (again for any ε > 0) over the algorithm for the same problem in Willard [W1].
In the group model (the special case of semigroup range searching where an inverse operation exists) we could not find any obvious way of taking advantage of the inverse operation to improve on the results in the table (except, of course, for the case of range counting). As a result, our (n log^ε n, log^2 n) algorithm may compare favorably with Willard's (n log n, log n) [W2] in storage requirement, but it is superseded in query time efficiency.
Our other results for the static case represent tradeoffs and cannot be compared with previous work. In the dynamic case, our upper bounds improve previous results by a factor of log n in space but, except for range and rectangle counting, they also entail extra polylogarithmic costs in query and update times. Also, what makes comparisons even more difficult is that in order to prove the generality of our space-reduction techniques we have purposely chosen a very weak model of computation, i.e., the EPM.
2. Functional data structures. In trying to assess whether a particular data structure is optimal or not, it is natural to ask oneself: why is the data stored where it is and not elsewhere? The answer is usually "to facilitate the computation of some functions implicitly associated with the records of the data structure." For example, one will store keys in the nodes of a binary search tree to be able to branch left or right depending on the outcome of a comparison.
An important observation is that nothing demands that a node should store its own key explicitly. All that is required is that whenever the function associated with that node is called, the node had better allow for its prompt evaluation. Having the key stored at the node at all times might be handy, but it certainly is more than is strictly needed.
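A tiny illustration of this point (hypothetical names, not taken from the paper): binary search in a sorted array can be read as a descent in a search tree whose nodes store no keys at all; a node is just an interval, and its key is recomputed whenever the node's branching function is invoked.

```python
def tree_search(a, x):
    """Search a sorted array a for x, viewed as a descent in an implicit
    binary search tree.  A node is the interval [lo, hi); its key is not
    stored anywhere but evaluated on demand as a[(lo + hi) // 2]."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        key = a[mid]             # the node's "function", evaluated when visited
        if x == key:
            return mid
        if x < key:
            hi = mid             # branch left
        else:
            lo = mid + 1         # branch right
    return -1                    # not found
```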
There is a classical example of this fact: in their well-known data structure for planar point location [LP1], Lee and Preparata start out with a balanced tree whose nodes are associated with various lists. Then they make the key remark that many elements in these lists can be removed, because whenever they are needed by the algorithm they will always have been encountered in other lists before. This simple transformation brings down the amount of storage required from quadratic to linear.
However different from the previous one the resulting data structure might be (being much smaller, for one thing), it still has the same functional structure as before. In other words, the functions and arguments associated with each node, as well as their interconnections, have gone unchanged through the transformation. Only the assignment of data has been altered.
This is no isolated case. Actually, many data structures have been discovered through a similar process. We propose to examine this phenomenon in all generality and see if some useful methodology can be derived from it.
There are several ways of looking at data structures from a design point of view. One might choose to treat them as structured mappings of data into memory. This compiler-level view addresses implementation issues and thus tends to be rigid and overspecifying in the early stage of the design process.
Instead, one can take data structures a bit closer to the notion of abstract data type, and think of them as combinatorial structures that can be used and manipulated. For example, a data structure can be modeled as a graph with data stored at the nodes (Earley [Ea]).
Semantic rules can be added to specify how the structure can be modified, and constraints can be placed to enforce certain "shape" criteria (e.g., balance or degree conditions). If needed, formal definitions of data structures can be provided by means of grammars or operational specifications (Gonnet and Tompa [GOT]).
Note that despite the added abstraction of this setting, a data structure is still far removed from an abstract data type (Guttag [G]). In particular, unlike the latter, the former specifies an architecture for communicating data and operating on it.
The framework above favors the treatment of data structures as combinatorial objects. It emphasizes how they are made and used rather than why they are the way they are. This is to be expected, of course, since a data structure may be used for many different purposes, and part of its interpretation is thus best left to the user. Balanced binary trees are a case in point: they can be used as search structures, priority queues, models of parallel architecture, etc. It is thus only natural to delay their interpretation so one can appreciate their versatility. This approach is sound, for it allows us to map

Citations
Journal ArticleDOI
TL;DR: The relationship between text entropy and regularities that show up in index structures and permit compressing them is explained, and the most relevant self-indexes are covered, focusing on how they exploit text compressibility to achieve compact structures that can efficiently solve various search problems.
Abstract: Full-text indexes provide fast substring search over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes that exploit the compressibility of the text, so that their size is a function of the compressed text length. This concept has evolved into self-indexes, which in addition contain enough information to reproduce any text portion, so they replace the text. The exciting possibility of an index that takes space close to that of the compressed text, replaces it, and in addition provides fast search over it, has triggered a wealth of activity and produced surprising results in a very short time, which radically changed the status of this area in less than 5 years. The most successful indexes nowadays are able to obtain almost optimal space and search time simultaneously.In this article we present the main concepts underlying (compressed) self-indexes. We explain the relationship between text entropy and regularities that show up in index structures and permit compressing them. Then we cover the most relevant self-indexes, focusing on how they exploit text compressibility to achieve compact structures that can efficiently solve various search problems. Our aim is to give the background to understand and follow the developments in this area.

834 citations

Book ChapterDOI
11 Oct 2010
TL;DR: This paper uses Lyndon words and introduces the Lyndon structure of runs as a useful tool when computing powers and presents an efficient algorithm for testing primitivity of factors of a string and computing their primitive roots.
Abstract: A breakthrough in the field of text algorithms was the discovery of the fact that the maximal number of runs in a string of length n is O(n) and that they can all be computed in O(n) time. We study some applications of this result. New simpler O(n) time algorithms are presented for a few classical string problems: computing all distinct kth string powers for a given k, in particular squares for k = 2, and finding all local periods in a given string of length n. Additionally, we present an efficient algorithm for testing primitivity of factors of a string and computing their primitive roots. Applications of runs, despite their importance, are underrepresented in existing literature (approximately one page in the paper of Kolpakov & Kucherov, 1999). In this paper we attempt to fill in this gap. We use Lyndon words and introduce the Lyndon structure of runs as a useful tool when computing powers. In problems related to periods we use some versions of the Manhattan skyline problem.

439 citations

01 Jan 2007
TL;DR: This volume provides an excellent opportunity to recapitulate the current status of geometric range searching and to summarize the recent progress in this area.
Abstract: About ten years ago, the field of range searching, especially simplex range searching, was wide open. At that time, neither efficient algorithms nor nontrivial lower bounds were known for most range-searching problems. A series of papers by Haussler and Welzl [161], Clarkson [88, 89], and Clarkson and Shor [92] not only marked the beginning of a new chapter in geometric searching, but also revitalized computational geometry as a whole. Led by these and a number of subsequent papers, tremendous progress has been made in geometric range searching, both in terms of developing efficient data structures and proving nontrivial lower bounds. From a theoretical point of view, range searching is now almost completely solved. The impact of general techniques developed for geometric range searching (ε-nets, 1/r-cuttings, partition trees, multi-level data structures, to name a few) is evident throughout computational geometry. This volume provides an excellent opportunity to recapitulate the current status of geometric range searching and to summarize the recent progress in this area. Range searching arises in a wide range of applications, including geographic information systems, computer graphics, spatial databases, and time-series databases. Furthermore, a variety of geometric problems can be formulated as a range-searching problem. A typical range-searching problem has the following form. Let S be a set of n points in R^d, and let

428 citations


Cites background or methods from "Functional approach to data structu..."

  • ...Asymptotic upper bounds for planar orthogonal range searching, due to Chazelle [52, 55]....

    [...]

  • ...Chazelle [55] has shown that the bounds mentioned in Table 1 hold for this problem also....

    [...]

  • ...34 Pankaj Agarwal and Jeff Erickson Problem Size Query Time Update Time Source Counting n log^2 n log^2 n [55] n k log^2(2n/k) log^2 n [55] n n^ε + k log^2 n [234] Reporting n log n log n log log n + k log n log log n [192] n log n log log n log^{2+ε} n log log n + k log^2 n log log n [234] Semigroup n log^4 n log^4 n [55] Table 6....

    [...]

  • ...Chazelle [55] defines several generalizations of the pointer-machine model that are more appropriate for answering counting and semigroup queries....

    [...]

  • ...The best-known data structures for orthogonal range searching are by Chazelle [52, 55], who used compressed range trees and other techniques to improve the storage and query time....

    [...]

Journal ArticleDOI
TL;DR: To analyze the complexity of the algorithm, an amortization argument based on a new combinatorial theorem on line arrangements is used.
Abstract: The main contribution of this work is an O(n log n + k)-time algorithm for computing all k intersections among n line segments in the plane. This time complexity is easily shown to be optimal. Within the same asymptotic cost, our algorithm can also construct the subdivision of the plane defined by the segments and compute which segment (if any) lies right above (or below) each intersection and each endpoint. The algorithm has been implemented and performs very well. The storage requirement is on the order of n + k in the worst case, but it is considerably lower in practice. To analyze the complexity of the algorithm, an amortization argument based on a new combinatorial theorem on line arrangements is used.

311 citations

Proceedings ArticleDOI
03 Apr 2006
TL;DR: This paper proposes a novel labeling scheme for sparse graphs that ensures that graph reachability queries can be answered in constant time, and provides an alternative scheme to tradeoff query time for label space, which further benefits applications that use tree-like graphs.
Abstract: Graph reachability is fundamental to a wide range of applications, including XML indexing, geographic navigation, Internet routing, ontology queries based on RDF/OWL, etc. Many applications involve huge graphs and require fast answering of reachability queries. Several reachability labeling methods have been proposed for this purpose. They assign labels to the vertices, such that the reachability between any two vertices may be decided using their labels only. For sparse graphs, 2-hop based reachability labeling schemes answer reachability queries efficiently using relatively small label space. However, the labeling process itself is often too time consuming to be practical for large graphs. In this paper, we propose a novel labeling scheme for sparse graphs. Our scheme ensures that graph reachability queries can be answered in constant time. Furthermore, for sparse graphs, the complexity of the labeling process is almost linear, which makes our algorithm applicable to massive datasets. Analytical and experimental results show that our approach is much more efficient than state-of-the-art approaches. Furthermore, our labeling method also provides an alternative scheme to trade off query time for label space, which further benefits applications that use tree-like graphs.

258 citations


Cites background from "Functional approach to data structu..."

  • ...In internal memory, the compressed range-tree [6] can solve the problem in O(log^2 |T|) time with O(|T|) space....

    [...]

References
Book
01 Jan 1974
TL;DR: This text introduces the basic data structures and programming techniques often used in efficient algorithms, and covers use of lists, push-down stacks, queues, trees, and graphs.
Abstract: From the Publisher: With this text, you gain an understanding of the fundamental concepts of algorithms, the very heart of computer science. It introduces the basic data structures and programming techniques often used in efficient algorithms. Covers use of lists, push-down stacks, queues, trees, and graphs. Later chapters go into sorting, searching and graphing algorithms, the string-matching algorithms, and the Schonhage-Strassen integer-multiplication algorithm. Provides numerous graded exercises at the end of each chapter.

9,262 citations

Journal ArticleDOI
TL;DR: An algorithm for a random access machine with uniform cost measure (and a bound of $\Omega (\log n)$ on the number of bits per word) that requires time per query and preprocessing time is presented, assuming that the collection of trees is static.
Abstract: We consider the following problem: Given a collection of rooted trees, answer on-line queries of the form, “What is the nearest common ancestor of vertices x and y?” We show that any pointer machine that solves this problem requires $\Omega (\log \log n)$ time per query in the worst case, where n is the total number of vertices in the trees. On the other hand, we present an algorithm for a random access machine with uniform cost measure (and a bound of $\Omega (\log n)$ on the number of bits per word) that requires $O(1)$ time per query and $O(n)$ preprocessing time, assuming that the collection of trees is static. For a version of the problem in which the trees can change between queries, we obtain an almost-linear-time (and linear-space) algorithm.

1,252 citations

Journal ArticleDOI
TL;DR: Multidimensional divide-and-conquer is discussed, an algorithmic paradigm that can be instantiated in many different ways to yield a number of algorithms and data structures for multidimensional problems.
Abstract: Most results in the field of algorithm design are single algorithms that solve single problems. In this paper we discuss multidimensional divide-and-conquer, an algorithmic paradigm that can be instantiated in many different ways to yield a number of algorithms and data structures for multidimensional problems. We use this paradigm to give best-known solutions to such problems as the ECDF, maxima, range searching, closest pair, and all nearest neighbor problems. The contributions of the paper are on two levels. On the first level are the particular algorithms and data structures given by applying the paradigm. On the second level is the more novel contribution of this paper: a detailed study of an algorithmic paradigm that is specific enough to be described precisely yet general enough to solve a wide variety of problems.

720 citations

Proceedings ArticleDOI
01 Dec 1984
TL;DR: Three techniques in computational geometry are explored: scaling solves a problem by viewing it at increasing levels of numerical precision; activation is a restricted type of update operation, useful in sweep algorithms; the Cartesian tree is a data structure for problems involving maximums and minimums.
Abstract: Three techniques in computational geometry are explored: Scaling solves a problem by viewing it at increasing levels of numerical precision; activation is a restricted type of update operation, useful in sweep algorithms; the Cartesian tree is a data structure for problems involving maximums and minimums. These techniques solve the minimum spanning tree problem in R^k_1 and R^k

579 citations

Journal ArticleDOI
TL;DR: A new data structure that is called a priority search tree, of which two variants are introduced, operations (1), (2), and (3) can be implemented in $O(\log n)$ time, where n is the cardinality of D.
Abstract: Let D be a dynamic set of ordered pairs $[ x,y ]$ over the set $0,1, \cdots ,k - 1$ of integers. Consider the following operations applied to D: (1) Insert (delete) a pair $[ x,y ]$ into (from) D. (2) Given test integers $x0,x1$, and $y1$, among all pairs $[ x,y ]$ in D such that $x0 \leqq x \leqq x1$ and $y \leqq y1$, find a pair whose x is minimal (or maximal). (3) Given test integers $x0$ and $x1$, among all pairs $[ x,y ]$ in D such that $x0 \leqq x \leqq x1$, find a pair whose y is minimal. (4) Given test integers $x0$, $x1$, and $y1$, enumerate those pairs $[x,y ]$ in D such that $x0 \leqq x \leqq x1$ and $y \leqq y1$. Using a new data structure that we call a priority search tree, of which two variants are introduced, operations (1), (2), and (3) can be implemented in $O(\log n)$ time, where n is the cardinality of D. Operation (4) is performed in at most $O(\log n + s)$ time, where s is the number of pairs enumerated. The priority search tree occupies $O(n)$ space. Priority search tree algorithms can...

541 citations