Marquette University

e-Publications@Marquette

Computer Science Faculty Research and Publications

Computer Science, Department of

2019

Parallelization of Plane Sweep Based Voronoi Construction with Compiler Directives

Anmol Paudel

Jie Yang

Satish Puri

Follow this and additional works at: https://epublications.marquette.edu/comp_fac

Computer Science Faculty Research and Publications/College of Arts and Sciences

This paper is NOT THE PUBLISHED VERSION but the author’s final, peer-reviewed manuscript. The published version may be accessed by following the link in the citation below.

2019 IEEE 62nd International Midwest Symposium on Circuits and Systems, (2019): 908-911. DOI. This article is © Institute of Electrical and Electronics Engineers (IEEE) and permission has been granted for this version to appear in e-Publications@Marquette. Institute of Electrical and Electronics Engineers (IEEE) does not grant permission for this article to be further copied/distributed or hosted elsewhere without the express permission from Institute of Electrical and Electronics Engineers (IEEE).

Parallelization of Plane Sweep Based Voronoi Construction with Compiler Directives

Anmol Paudel

MSCS Department, Marquette University

Jie Yang

MSCS Department, Marquette University

Satish Puri

MSCS Department, Marquette University

SECTION I. Introduction

Voronoi diagrams are extensively used in computational geometry to partition a plane into multiple regions, where each region corresponds to and contains a site, and that site is the closest site to all points in that region. Figure 1 shows a Voronoi diagram with a unique region for each site. Here is a mathematical definition [1] of a Voronoi region:

Definition 1. Let P := {p₁, p₂, …, pₙ} be a set of n distinct points in the plane; these points are the sites. We define the Voronoi diagram of P as the subdivision of the plane into n cells, one for each site in P, with the property that a point q lies in the cell corresponding to a site pᵢ if and only if dist(q, pᵢ) < dist(q, pⱼ) for each pⱼ ∈ P with j ≠ i,

where dist(p, q) := √((p.x − q.x)² + (p.y − q.y)²) is the Euclidean distance between the points p and q.
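For concreteness, this distance function translates directly into code. The following is a minimal C++ sketch of ours (the point type and the dist name are illustrative, not taken from the authors’ implementation):

#include <cmath>

struct point { double x, y; };

// Euclidean distance between p and q, as in Definition 1.
double dist(const point &p, const point &q) {
    double dx = p.x - q.x;
    double dy = p.y - q.y;
    return std::sqrt(dx * dx + dy * dy);
}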

There are different algorithms to construct a Voronoi diagram with n sites as input. A brute-force algorithm constructs one region at a time. Since each region is the intersection of n−1 half-planes, it takes O(n log n) time per region, thereby resulting in an O(n² log n) time algorithm. The problem has an Ω(n log n) lower bound [2], so an O(n log n) algorithm is optimal. The planesweep algorithm that we consider here for parallelization is such an optimal algorithm.

We exploit parallelism in the planesweep algorithm on a per-event basis; however, the order of event processing remains sequential. This is because there is interdependence between the static and dynamic events generated during event processing. We have discovered that there is enough computation within a single event to warrant performance improvement in a shared-memory environment. These computations include the intersection of neighboring arcs (with respect to an event), which is required to generate new events. This is the first work to identify and report the performance enhancement possible while concurrently maintaining the spatial data structures (the beachline) on a per-event basis.

Fig. 1. Voronoi Diagram

[The dots in the figure are the sites and the lines are the edges of the partitioned regions. It can be observed that, for any arbitrary point in the whole space, the closest site is the one inside the same region as that point.]

This paper is part of our series of work focused on parallelizing existing spatial and computational geometry code using compiler directives. Our prior work successfully parallelized the planesweep versions of the segment intersection and polygon intersection problems [3], [4], [5]. Existing literature focuses on theoretical work on parallel algorithms [6], [7]. There are other approaches to parallelization that use data decomposition [2], [8]. However, data-decomposition algorithms require expensive merging steps (of O(n) time complexity) that are non-trivial to implement efficiently. Our work does not require explicit data decomposition.

This paper explores the concurrency available in processing each event in Voronoi diagram construction and uses compiler directives to make an existing implementation of Fortune’s algorithm faster with minimal effort. OpenMP is an application programming interface that enables us to parallelize existing C, C++, or Fortran code by simply adding compiler directives (#pragma) to it. The compiler takes the directives as hints for potential ways to inject parallelism into the sequential code. Directive-based parallelization can be targeted at multicore CPUs, GPUs, or a combination of both. Adding directives should not affect the correctness of the results produced, although the order in which results are produced might vary due to concurrency. Compiler-directive-based parallelization is more maintainable and performance-portable across different multicore architectures, and it removes the hassle of having to change the parallelized code whenever the multicore architecture changes. Even though OpenMP is well suited to regular parallelism, here we are trying to extract the irregular and dynamic parallelism exposed by our modified Fortune’s algorithm.
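As a concrete illustration of this programming model (a generic example of ours, not code from the paper), a sequential loop over independent iterations becomes parallel by adding a single directive:

#include <vector>

// The pragma asks OpenMP to split the loop iterations across the
// available threads; the loop body itself is unchanged.
void scale(std::vector<double> &v, double factor) {
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(v.size()); ++i) {
        v[i] *= factor;   // each iteration is independent
    }
}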

SECTION II. Fortune’s Algorithm

Fortune’s algorithm is a planesweep algorithm for computing a Voronoi diagram in O(n log n) time with O(n) space [9]. Fortune presented a transformation that allows Voronoi diagrams to be computed with a sweepline technique [9].

Fig. 2. A snapshot of the algorithm showing circle events, a vertical sweepline, and the beachline made up of arcs. (Best viewed in color)

In Figure 2, the dark grey dots are the site points. The blue dots are the Voronoi vertices, and the lines connecting the blue dots are the Voronoi edges. The vertical blue line is the sweepline. The green and red arcs form the beachline structure at the sweepline position. The light grey circles are the circle events. As the sweepline reaches a site point, an arc (parabola) corresponding to that site is created; the arc grows as the sweepline progresses and is clipped by neighbouring arcs or by new arcs ahead of it. The collection of active arcs is the beachline.
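For completeness, here is the standard arc equation (this derivation is ours; the paper does not spell it out). Each beachline arc is the locus of points equidistant from its site and the sweepline. With a vertical sweepline at x = ℓ and a site p, a point (x, y) on p’s arc satisfies (x − p.x)² + (y − p.y)² = (x − ℓ)², which solves to

x = ((y − p.y)² + p.x² − ℓ²) / (2(p.x − ℓ))

Intersecting two such parabolas, the breakpoint computation needed when clipping arcs, therefore reduces to solving a quadratic in y.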

Algorithm 1 is a simplified algorithmic description of the implementation of Fortune’s algorithm. The focus of the description is to show the flow of the algorithm so that the possibilities and limitations of a directive-based approach can be explored. This description is necessary to understand the flow of execution and the interdependencies among the variables that are key to any directive-based parallelization.

Algorithm 1 Fortune’s Algorithm (Horizontal Sweep)

1: P ← load all points
2: Initialize a bounding box with offset
3: Initialize beachline B            // B is of type arc
4: Initialize output O               // O is a collection of edges of the partitioned regions
5: Initialize events priority queue  // event with minimum x-coordinate is at the top
6: Sort P in ascending order by x-coordinate
7: for each p in P do
8:     while (events is not empty and events.top.x <= p.x) do
9:         ProcessEvent(events.dequeue())
10:    end while
11:    ProcessPoint(p)
12: end for
13: ProcessRemainingEvents()
14: FinishEdges()

In the event data structure, x is the maximum x-location a circle event can affect, and it triggers the processing of an event there, so x = p.x + radiusOfTheCircle.

Listing 1. Data Structure for Event

struct event {
    double x;   // for a circle event, p.x + radiusOfTheCircle
    point p;    // the point associated with the event
    arc *a;     // the arc the event refers to
};
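In C++, the min-ordering on x required by line 5 of Algorithm 1 can be realized with a standard priority queue. The sketch below is our illustration, with the concrete types filled in by assumption:

#include <queue>
#include <vector>

struct point { double x, y; };
struct arc;                  // beachline node, defined elsewhere

struct event {
    double x;                // for a circle event, p.x + radiusOfTheCircle
    point p;
    arc *a;
};

// Comparator that keeps the event with the minimum x on top.
struct min_x {
    bool operator()(const event &l, const event &r) const {
        return l.x > r.x;    // reversed comparison yields a min-heap
    }
};

std::priority_queue<event, std::vector<event>, min_x> events;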

Algorithm 2 ProcessEvent(event e)

1: Input event e
2: if (e.valid) then
3:     Begin a new Segment s at e.x
4:     Remove e.a from beachline B
5:     Complete segments e.a.s0 and e.a.s1
6:     // Check circle events
       CheckCircleEvent(e.a.prev, e.x)
       CheckCircleEvent(e.a.next, e.x)
7: end if
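The two CheckCircleEvent calls on line 6 inspect different neighbors of the removed arc; this is the kind of per-event concurrency highlighted in Section I. The following OpenMP sketch of that idea is ours: the signature of CheckCircleEvent and the assumption that the two calls touch disjoint data are not taken from the authors’ code.

struct arc;                               // beachline arc (assumed type)
void CheckCircleEvent(arc *a, double x);  // assumed signature, per Algorithm 2

// Run the two neighbor checks from Algorithm 2, line 6, concurrently.
// If both calls can enqueue new events into a shared queue, that queue
// must be protected (e.g., with an omp critical section).
void CheckNeighbors(arc *prev, arc *next, double x) {
    #pragma omp parallel sections
    {
        #pragma omp section
        CheckCircleEvent(prev, x);
        #pragma omp section
        CheckCircleEvent(next, x);
    }
}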

A. Parallelizing Fortune’s Algorithm

We start by trying to find opportunities in the algorithm where compiler directives can be inserted for parallelization. The most obvious choice is to parallelize the loops. Loop parallelization using directives is the easiest approach and usually has very low overhead. Furthermore, inner loops inside nested loops can also be parallelized.

In Algorithm 1, the for-loop and the while-loop cannot be directly parallelized due to interdependencies and memory side effects. Algorithm 3 and Algorithm 2 cannot run concurrently due to the interdependence of the site events and the circle events.

In Algorithm 2, since the entirety of its execution is based on a conditional, we need to determine the possibility of parallelizing this portion when it gets executed. Here, line 4 is dependent on line 3 because we need the segment s