Proceedings ArticleDOI

Parallelization of Plane Sweep Based Voronoi Construction with Compiler Directives

15 Jul 2019-Vol. 1, pp 908-911
TL;DR: A multi-threaded approach where an existing sequential implementation of Fortune's planesweep algorithm is augmented with compiler directives, and the novelty of this fine-grained parallel algorithm lies in exploiting the concurrency available at each event point encountered during the algorithm.
Abstract: Voronoi diagram construction is a common and fundamental problem in computational geometry and spatial computing. Numerous sequential and parallel algorithms for Voronoi diagram construction exist in the literature. This paper presents a multi-threaded approach where we augment an existing sequential implementation of Fortune's planesweep algorithm with compiler directives. The novelty of our fine-grained parallel algorithm lies in exploiting the concurrency available at each event point encountered during the algorithm. On the Intel Xeon E5 CPU, our shared-memory parallelization with OpenMP achieves around 2x speedup compared to the sequential implementation using datasets containing 2k-128k sites.


Marquette University
e-Publications@Marquette
Computer Science Faculty Research and Publications / College of Arts and Sciences
Department of Computer Science
2019

Parallelization of Plane Sweep Based Voronoi Construction with Compiler Directives

Anmol Paudel
Jie Yang
Satish Puri

Follow this and additional works at: https://epublications.marquette.edu/comp_fac
This paper is NOT THE PUBLISHED VERSION; but the author’s final, peer-reviewed manuscript. The
published version may be accessed by following the link in the citation below.
2019 IEEE 62nd International Midwest Symposium on Circuits and Systems, (2019): 908-911. DOI. This
article is © Institute of Electrical and Electronics Engineers (IEEE) and permission has been granted for
this version to appear in e-Publications@Marquette. The Institute of Electrical and Electronics Engineers
(IEEE) does not grant permission for this article to be further copied/distributed or hosted elsewhere
without the express permission of the Institute of Electrical and Electronics Engineers (IEEE).
Parallelization of Plane Sweep Based Voronoi
Construction with Compiler Directives
Anmol Paudel
MSCS Department, Marquette University
Jie Yang
MSCS Department, Marquette University
Satish Puri
MSCS Department, Marquette University
SECTION I. Introduction
Voronoi diagrams are extensively used in computational geometry to partition a plane into multiple regions,
where each region contains exactly one site, and that site is the closest site to all points in that
region. Figure 1 shows a Voronoi diagram with a unique region for each site. Here is a mathematical definition
[1] of a Voronoi region:

Definition 1. Let P := {p_1, p_2, ..., p_n} be a set of n distinct points in the plane; these points are the sites. We define the Voronoi diagram of P as the subdivision of the plane into n cells, one for each site in P, with the property that a point q lies in the cell corresponding to a site p_i if and only if dist(q, p_i) < dist(q, p_j) for each p_j with j != i, where

dist(q, p) := sqrt((q_x - p_x)^2 + (q_y - p_y)^2)
There are different algorithms to construct a Voronoi diagram with n sites as input. A brute-force algorithm
constructs one region at a time. Since each region is the intersection of n-1 half planes, it takes O(nlogn) time
per region, thereby resulting in an O(n^2 logn) time algorithm. The problem has an O(nlogn) lower bound, so an
O(nlogn) algorithm is optimal [2]. The planesweep algorithm that we consider here for parallelization is an
optimal algorithm.
We exploit parallelism in the planesweep algorithm on a per-event basis; however, the order of event
processing is still sequential. This is because of the interdependence between the static and dynamic events
generated by concurrent event processing. We have discovered that there is enough computation in an event
itself to warrant performance improvement in a shared memory environment. These computations include
intersection of neighboring arcs (w.r.t. an event) that is required to generate new events. This is the first work to
identify and report the performance enhancement possible while concurrently maintaining the spatial data
structures (beachline) on a per-event basis.
Fig. 1. Voronoi Diagram
[The dots in the figure are the sites and the lines are the edges of the partitioned regions. It can be observed
that for any arbitrary point in the whole space, the closest site is the one inside the same region as that point.]
This paper is a part of our series of work focused on parallelizing existing spatial and computational geometry
code using compiler directives. Our prior work was successful in the parallelization of the planesweep version of
segment intersection and polygon intersection problems [3], [4], [5]. Existing literature focuses on theoretical
work on parallel algorithms [6], [7]. There are other parallelization approaches that use data decomposition
[2], [8]. However, data decomposition algorithms require expensive merging steps (O(n) time complexity) which
are non-trivial to implement efficiently. Our work does not require explicit data decomposition.
This paper explores the concurrency available in processing each event in Voronoi diagram construction and
uses compiler directives to make an existing implementation of Fortune's algorithm faster with minimal effort.
OpenMP is an application programming interface which enables us to parallelize existing C, C++ or Fortran code
by just adding compiler directives (#pragma) to it. The compiler takes the directives as hints for potential
ways to inject parallelism into the sequential code. Directive-based parallelization can be targeted at
multicore CPUs, GPUs or a combination of both. Adding directives should not affect the correctness of the
results produced, although the order in which results are produced might vary due to concurrency. Compiler
directive based parallelization is more maintainable and performance portable to different multicore
architectures, and removes the hassle of having to change the parallelized code according to changes in
multicore architecture. Even though OpenMP is good for regular parallelism, here we are trying to extract the
irregular and dynamic parallelism exposed by our modified Fortune's algorithm.
SECTION II. Fortune’s Algorithm
Fortune's algorithm is a planesweep algorithm for computing the Voronoi diagram in O(nlogn) time with O(n) space
[9]. Fortune presented a transformation that could be used to compute Voronoi diagrams with a sweepline
technique [9].
Fig. 2. A snapshot of the algorithm showing circle events, a vertical sweep line and beachline made up of arcs.
(Best viewed in color)
In Figure 2, the dark grey dots are the site points. The blue dots are the Voronoi vertices and lines connecting
the blue dots are the Voronoi edges. The vertical blue line is the sweepline. The green and red arcs form the
beachline structure at the sweepline position. The light grey circles are the circle events. As the sweepline
reaches a site point, an arc/parabola corresponding to it is created, which grows as the sweepline progresses
and is clipped by neighbouring arcs or a new arc ahead of it. The collection of active arcs is the beachline.
Algorithm 1 is a simplified algorithmic description of the implementation of Fortune's Algorithm. The focus of
the description here is to show the flow of the algorithm so that the possibilities and limitations of a
directive-based approach can be explored. This algorithmic description is necessary to understand the flow of
execution and the interdependencies among the variables that are key to any directive-based parallelization.
Algorithm 1 Fortune’s Algorithm (Horizontal Sweep)
1: P ← load all points
2: Initialize a bounding box with offset
3: Initialize beachline B // B is of type arc
4: Initialize output O // O is a collection of edges of the partitioned regions
5: Initialize events priority queue // event with minimum x-coordinate is at the top
6: Sort P in ascending order by x-coordinate
7: for each p in P do
8:   while (events.top.x <= p.x) do
9:     ProcessEvent(events.dequeue())
10:  end while
11:  ProcessPoint(p)
12: end for
13: ProcessRemainingEvents()
14: FinishEdges()
In the event data structure, x is the maximum x-location a circle event can affect, and it introduces an event
processing step there. So, x = p.x + radiusOfTheCircle.
Listing 1. Data Structure for Event
struct event {
    var x;      // sweepline position at which this event fires
    point p;
    arc a;
    bool valid; // checked in Algorithm 2; cleared when the event is invalidated
};
Algorithm 2 ProcessEvent(event e)
1: Input event e
2: if (e.valid) then
3: Begin a new Segment s at e.x
4: Remove e.a from beachline B
5: Complete segments e.a.s0 and e.a.s1
6: // Check circle events
CheckCircleEvent(e.a.prev, e.x)
CheckCircleEvent(e.a.next, e.x)
7: end if
A. Parallelizing Fortune’s Algorithm
We start by trying to find opportunities in the algorithm where compiler directives can be inserted for
parallelization. The most obvious choice would be to parallelize the loops. Loop parallelization using directives
is the easiest way to parallelize and usually has very little overhead. Furthermore, internal loops inside nested
loops can also be parallelized.
In Algorithm 1, the for-loops and while-loops cannot be directly parallelized due to interdependencies and
memory side-effects. Algorithm 3 and Algorithm 2 cannot run concurrently due to the interdependence of the site
events and the circle events.
In Algorithm 2, since the entirety of its execution is guarded by a conditional, we need to determine the
possibility of parallelizing this portion if it gets executed. Here, line 4 is dependent on line 3 because we need the segment s

References

  • F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction. Springer-Verlag, 1985. (Cited as [2].)
  • S. Fortune, "A sweepline algorithm for Voronoi diagrams," Algorithmica, vol. 2, pp. 153-174, 1987. (Cited as [9].)
  • Parallel computational geometry (book, 1992), covering models of parallel computation, convex hulls, intersection problems, geometric searching, visibility and separability, nearest neighbours, Voronoi diagrams, geometric optimization, and triangulation of polygons and point sets. (Cited for parallel-algorithms background [6], [7].)
  • Book chapter, 2012. (Cited for parallel-algorithms background [6], [7].)