
Fast Computation of Database Operations using Graphics
Processors
Naga K. Govindaraju Brandon Lloyd Wei Wang Ming Lin Dinesh Manocha
University of North Carolina at Chapel Hill
{naga, blloyd, weiwang, lin, dm}@cs.unc.edu
http://gamma.cs.unc.edu/DataBase
ABSTRACT
We present new algorithms for performing fast computa-
tion of several common database operations on commod-
ity graphics processors. Specifically, we consider operations
such as conjunctive selections, aggregations, and semi-linear
queries, which are essential computational components of
typical database, data warehousing, and data mining appli-
cations. While graphics processing units (GPUs) have been
designed for fast display of geometric primitives, we utilize
the inherent pipelining and parallelism, single instruction
and multiple data (SIMD) capabilities, and vector process-
ing functionality of GPUs, for evaluating boolean predicate
combinations and semi-linear queries on attributes and exe-
cuting database operations efficiently. Our algorithms take
into account some of the limitations of the programming
model of current GPUs and perform no data rearrange-
ments. Our algorithms have been implemented on a pro-
grammable GPU (e.g. NVIDIA’s GeForce FX 5900) and
applied to databases consisting of up to a million records.
We have compared their performance with an optimized im-
plementation of CPU-based algorithms. Our experiments
indicate that the graphics processor available on commodity
computer systems is an effective co-processor for performing
database operations.
Keywords: graphics processor, query optimization, selec-
tion query, aggregation, selectivity analysis, semi-linear query.
1. INTRODUCTION
As database technology becomes pervasive, Database Man-
agement Systems (DBMSs) have been deployed in a wide
variety of applications. The rapid growth of data volume
for the past decades has intensified the need for high-speed
database management systems. Most database queries and,
more recently, data warehousing and data mining applica-
tions, are very data- and computation-intensive and there-
fore demand high processing power. Researchers have ac-
tively sought to design and develop architectures and algo-
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage, and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
SIGMOD 2004 June 13-18, 2004, Paris, France.
Copyright 2004 ACM 1-58113-859-8/04/06 ...$5.00.
rithms for faster query execution. Special attention has been
given to increasing the performance of selection, aggregation,
and join operations on large databases. These operations
are widely used as fundamental primitives for building com-
plex database queries and for supporting on-line analytic
processing (OLAP) and data mining procedures. The effi-
ciency of these operations has a significant impact on the
performance of a database system.
As the current trend of database architecture moves from
disk-based systems towards main-memory databases, appli-
cations have become increasingly computation- and memory-
bound. Recent work [3, 21] investigating the processor and
memory behaviors of current DBMSs has demonstrated a
significant increase in the query execution time due to mem-
ory stalls (on account of data and instruction misses), branch
mispredictions, and resource stalls (due to instruction de-
pendencies and hardware specific characteristics). Increased
attention has been given to redesigning traditional database
algorithms for fully utilizing the available architectural fea-
tures and for exploiting parallel execution possibilities, min-
imizing memory and resource stalls, and reducing branch
mispredictions [2, 5, 20, 24, 31, 32, 34, 37].
1.1 Graphics Processing Units
In this paper, we exploit the computational power of graph-
ics processing units (GPUs) for database operations. In the
last decade, high-performance 3D graphics hardware has be-
come as ubiquitous as floating-point hardware. Graphics
processors are now a part of almost every personal computer,
game console, or workstation. In fact, the two major com-
putational components of a desktop computer system are its
main central processing unit (CPU) and its graphics processing unit (GPU). While
CPUs are used for general purpose computation, GPUs have
been primarily designed for transforming, rendering, and
texturing geometric primitives, such as triangles. The driv-
ing application of GPUs has been fast rendering for visual
simulation, virtual reality, and computer gaming.
GPUs are increasingly being used as co-processors to CPUs.
GPUs are extremely fast and are capable of processing tens
of millions of geometric primitives per second. The peak
performance of GPUs has been increasing at the rate of
2.5 to 3.0 times a year, much faster than Moore's law
for CPUs. At this rate, the GPU’s peak performance may
move into the teraflop range by 2006 [19]. Most of this per-
formance arises from multiple processing units and stream
processing. The GPU treats the vertices and pixels consti-
tuting graphics primitives as streams. Multiple vertex and

pixel processing engines on a GPU are connected via data
flows. These processing engines perform simple operations
in parallel.
Recently, GPUs have become programmable, allowing a
user to write fragment programs that are executed on pixel
processing engines. The pixel processing engines have di-
rect access to the texture memory and can perform vector
operations with floating point arithmetic. These capabil-
ities have been successfully exploited for many geometric
and scientific applications. As graphics hardware becomes
increasingly programmable and powerful, the roles of CPUs
and GPUs in computing are being redefined.
1.2 Main Contributions
In this paper, we present novel algorithms for fast com-
putation of database operations on GPUs. The operations
include predicates, boolean combinations, and aggregations.
We utilize the SIMD capabilities of pixel processing engines
within a GPU to perform these operations efficiently. We
have used these algorithms for selection queries on one or
more attributes and generic aggregation queries including
selectivity analysis on large databases.
Our algorithms take into account some of the limitations
of the current programming model of GPUs which make it
difficult to perform data rearrangement. We present novel
algorithms for performing multi-attribute comparisons, semi-
linear queries, range queries, computing the kth largest num-
ber, and other aggregates. These algorithms have been im-
plemented using fragment programs and have been applied
to large databases composed of up to a million records. The
performance of these algorithms depends on the instruction
sets available for fragment programs, the number of frag-
ment processors, and the underlying clock rate of the GPU.
We also perform a preliminary comparison between GPU-
based algorithms running on a NVIDIA GeForceFX 5900 Ul-
tra graphics processor and optimized CPU-based algorithms
running on dual 2.8 GHz Intel Xeon processors.
We show that algorithms for semi-linear and selection
queries map very well to GPUs and we are able to ob-
tain significant performance improvement over CPU-based
implementations. The algorithms for aggregates obtain a
modest gain of a 2 to 4 times speedup over CPU-based imple-
mentations. Overall, the GPU can be used as an effective
co-processor for many database operations.
1.3 Organization
The rest of the paper is organized as follows. We briefly
survey related work on database operations and use of GPUs
for geometric and scientific computing in Section 2. We give
an overview of the graphics architectural pipeline in Section
3. We present algorithms for database operations includ-
ing predicates, boolean combinations, and aggregations in
Section 4. We describe their implementation in Section 5
and compare their performance with optimized CPU-based
implementations. We analyze the performance in Section 6
and outline the cases where GPU-based algorithms can offer
considerable gain over CPU-based algorithms.
2. RELATED WORK
In this section, we highlight the related research in main-
memory database operations and general purpose computa-
tion using GPUs.
2.1 Hardware Accelerated Database Opera-
tions
Many acceleration techniques have been proposed for data-
base operations. Ailamaki et al. [3] analyzed the execution
time of commercial DBMSs and observed that almost half
of the time is spent in stalls. This indicates that the perfor-
mance of a DBMS can be significantly improved by reducing
stalls.
Meki and Kambayashi used a vector processor for accel-
erating the execution of relational database operations in-
cluding selection, projection, and join [24]. To utilize the
efficiency of pipelining and parallelism that a vector pro-
cessor provides, the implementation of each operation was
redesigned for increasing the vectorization rate and the vec-
tor length. The limitation of using a vector processor is that
the load-store instruction can have high latency [37].
Modern CPUs have SIMD instructions that allow a single
basic operation to be performed on multiple data elements
in parallel. Zhu and Ross described SIMD implementation
of many important database operations including sequential
scans, aggregation, indexed searches, and joins [37]. Consid-
erable performance gains were achieved by exploiting the in-
herent parallelism of SIMD instructions and reducing branch
mispredictions.
Recently, Sun et al. presented the use of graphics processors
for spatial selections and joins [35]. They use color blending
capabilities available on graphics processors to test if two
polygons intersect in screen-space. Their experiments on
graphics processors indicate a speedup of nearly 5 times on
intersection joins and within-distance joins when compared
against their software implementation. The technique fo-
cuses on pruning intersections between triangles based on
their 2D overlap and is quite conservative.
2.2 General-Purpose Computing Using GPUs
In theory, GPUs are capable of performing any computa-
tion that can be mapped to the stream-computing model.
This model has been exploited for ray-tracing [29], global
illumination [30] and geometric computations [22].
The programming model of GPUs is somewhat limited,
mainly due to the lack of random access writes. This limi-
tation makes it more difficult to implement many data struc-
tures and common algorithms such as sorting. Purcell et al.
[30] present an implementation of bitonic merge sort, where
the output routing from one step to another is known in
advance. The algorithm is implemented as a fragment pro-
gram and each stage of the sorting algorithm is performed
as one rendering pass. However, the algorithm can be quite
slow for database operations on large databases.
GPUs have been used for performing many discretized
geometric computations [22]. These include using stencil
buffer hardware for interference computations [33], using
depth-buffer hardware to perform distance field and proxim-
ity computations [15], and visibility queries for interactive
walkthroughs and shadow generation [12].
High throughput and direct access to texture memory
make fragment processors powerful computation engines for
certain numerical algorithms, including dense matrix-matrix
multiplication [18], general purpose vector processing [36],
visual simulation based on coupled-map lattices [13], linear
algebra operations [17], sparse matrix solvers for conjugate
gradient and multigrid [4], a multigrid solver for boundary
value problems [11], geometric computations [1, 16], etc.

3. OVERVIEW
In this section, we introduce the basic functionality avail-
able on GPUs and give an overview of the architectural
pipeline. More details are given in [9].
3.1 Graphics Pipeline
A GPU is designed to rapidly transform the geometric
description of a scene into the pixels on the screen that con-
stitute a final image. Pixels are stored on the graphics card
in a frame-buffer. The frame buffer is conceptually divided
into three buffers according to the different values stored at
each pixel:
Color Buffer: Stores the color components of each
pixel in the frame-buffer. Color is typically divided
into red, green, and blue channels with an alpha chan-
nel that is used for blending effects.
Depth Buffer: Stores a depth value associated with
each pixel. The depth is used to determine surface
visibility.
Stencil Buffer: Stores a stencil value for each pixel.
It is called the stencil buffer because it is typically
used for enabling/disabling writes to portions of the
frame-buffer.
Figure 1: Graphics architectural pipeline overview: This
figure shows the various units of a modern GPU. Each unit
is designed for performing a specific operation efficiently.
The transformation of geometric primitives (points, lines,
triangles, etc.) to pixels is performed by the graphics pipeline,
consisting of several functional units, each optimized for per-
forming a specific operation. Figure 1 shows the various stages
involved in rendering a primitive.
Vertex Processing Engine: This unit receives ver-
tices as input and transforms them to points on the
screen.
Setup Engine: Transformed vertex data is streamed
to the setup engine which generates slope and initial
value information for color, depth, and other param-
eters associated with the primitive vertices. This in-
formation is used during rasterization for constructing
fragments at each pixel location covered by the prim-
itive.
Pixel Processing Engines: Before the fragments
are written as pixels to the frame buffer, they pass
through the pixel processing engines or fragment pro-
cessors. A series of tests can be used for discarding a
fragment before it is written to the frame buffer. Each
test performs a comparison using a user-specified re-
lational operator and discards the fragment if the test
fails.
Alpha test: Compares a fragment’s alpha value
to a user-specified reference value.
Stencil test: Compares the stencil value of a
fragment’s corresponding pixel with a user-specified
reference value.
Depth test: Compares the depth value of a frag-
ment to the depth value of the corresponding pixel
in the frame buffer.
The relational operator can be any of the following: =, <,
>, ≤, ≥, and ≠. In addition, there are two operators, never
and always, that do not require a reference value.
Current generations of GPUs have a pixel processing en-
gine that is programmable. The user can supply a custom
fragment program to be executed on each fragment. For ex-
ample, a fragment program can compute the alpha value of
a fragment as a complex function of the fragment’s other
color components or its depth.
3.2 Visibility and Occlusion Queries
Current GPUs can perform visibility and occlusion queries
[27]. When a primitive is rasterized, it is converted to frag-
ments. These fragments may or may not be written to pixels
in the frame buffer, depending on whether they pass the
alpha, stencil, and depth tests. An occlusion query re-
turns the pixel pass count, the number of fragments that
pass the different tests. We use these queries for performing
aggregation computations (see Section 4).
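A minimal C++ sketch of reading back the pixel pass count, assuming the OpenGL 1.5 / ARB_occlusion_query interface (an OpenGL 1.5 header or extension loader) and a hypothetical renderQuad() helper that issues the geometry, is:

#include <GL/gl.h>

void renderQuad();  // hypothetical helper: renders the screen-filling quadrilateral

// Returns the number of fragments that passed the alpha, stencil, and depth tests.
GLuint countPassingFragments() {
    GLuint query, passed = 0;
    glGenQueries(1, &query);
    glBeginQuery(GL_SAMPLES_PASSED, query);
    renderQuad();
    glEndQuery(GL_SAMPLES_PASSED);
    glGetQueryObjectuiv(query, GL_QUERY_RESULT, &passed);  // blocks until the result is ready
    glDeleteQueries(1, &query);
    return passed;                                         // the "pixel pass count"
}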
3.3 Data Representation on the GPUs
Our goal is to utilize the inherent parallelism and vector
processing capabilities of the GPUs for database operations.
A key aspect is the underlying data representation.
Data is stored on the GPU as textures. Textures are 2D
arrays of values. They are usually used for applying images
to rendered surfaces. They may contain multiple channels.
For example, an RGBA texture has four color channels -
red, green, blue, and alpha. A number of different data for-
mats can be used for textures including 8-bit bytes, 16-bit
integers, and floating point. We store data in textures in the
floating-point format. This format can precisely represent
integers up to 24 bits.
To perform computations on the values stored in a tex-
ture, we render a single quadrilateral that covers the win-
dow. The texture is applied to the quadrilateral such that
the individual elements of the texture, texels, line up with
the pixels in the frame-buffer. Rendering the textured quadri-
lateral causes a fragment to be generated for every data
value in the texture. Fragment programs are used for per-
forming computations using the data value from the texture.
Then the alpha, stencil, and depth tests can be used to per-
form comparisons.
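A simplified C++/OpenGL sketch of this setup is shown below. It assumes a floating-point internal format (GL_RGBA32F, from ARB_texture_float or later; the hardware used in this paper exposed vendor-specific float formats), an identity transformation, and legacy immediate-mode rendering.

#include <GL/gl.h>

// Upload one RGBA texel per record and stream the data through the fragment
// processors by drawing a single window-sized textured quadrilateral.
void uploadAndProcess(const float* records, int width, int height) {
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0,
                 GL_RGBA, GL_FLOAT, records);

    // Each texel lines up with one pixel, so one fragment (and one
    // fragment-program invocation) is generated per data value.
    glEnable(GL_TEXTURE_2D);
    glBegin(GL_QUADS);
      glTexCoord2f(0, 0); glVertex2f(-1, -1);
      glTexCoord2f(1, 0); glVertex2f( 1, -1);
      glTexCoord2f(1, 1); glVertex2f( 1,  1);
      glTexCoord2f(0, 1); glVertex2f(-1,  1);
    glEnd();
}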
3.4 Stencil Tests
Graphics processors use stencil tests for restricting com-
putations to a portion of the frame-buffer based on the value
in the stencil buffer. Abstractly, we can consider the stencil
buffer as a mask on the screen. Each fragment that enters

the pixel processing engine corresponds to a pixel in the
frame-buffer. The stencil test compares the stencil value of
a fragment’s corresponding pixel against a reference value.
Fragments that fail the comparison operation are rejected
from the rasterization pipeline.
Stencil operations can modify the stencil value of a frag-
ment’s corresponding pixel. Examples of such stencil oper-
ations include
KEEP: Keep the stencil value in stencil buffer. We
use this operation if we do not want to modify the
stencil value.
INCR: Increment the stencil value by one.
DECR: Decrement the stencil value by one.
ZERO: Set the stencil value to zero.
REPLACE: Set the stencil value to the reference
value.
INVERT: Bitwise invert the stencil value.
For each fragment there are three possible outcomes based
on the stencil and depth tests. Based on the outcome of the
tests, the corresponding stencil operation is performed:
Op1: when a fragment fails the stencil test,
Op2: when a fragment passes the stencil test and fails
the depth test,
Op3: when the fragment passes the stencil and depth
tests.
We illustrate these operations with the following pseudo-
code for the StencilOp routine:
StencilOp( Op1, Op2, Op3)
if (stencil test passed) /* perform stencil test */
/* fragment passed stencil test */
if(depth test passed) /* perform depth test */
/* fragment passed stencil and depth test */
perform Op3 on stencil value
else
/* fragment passed stencil test */
/* but failed depth test */
perform Op2 on stencil value
end if
else
/* fragment failed stencil test */
perform Op1 on stencil value
end if
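In OpenGL terms, the StencilOp routine corresponds to glStencilOp and the stencil test itself to glStencilFunc. The following sketch shows one example combination (pass where the stencil value equals 1, then increment it), not the complete set of configurations we use.

#include <GL/gl.h>

// Pass where the stencil value equals 1; keep on stencil fail (Op1) and on
// depth fail (Op2), increment when both tests pass (Op3).
void stencilPassEqualOneThenIncrement() {
    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_EQUAL, 1, 0xFF);
    glStencilOp(GL_KEEP, GL_KEEP, GL_INCR);  // Op1, Op2, Op3
}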
4. BASIC DATABASE OPERATIONS USING
GPUS
In this section, we give a brief overview of basic database
operations that are performed efficiently on a GPU. Given
a relational table T of m attributes (a_1, a_2, ..., a_m), a basic
SQL query is in the form of
SELECT A
FROM T
WHERE C
where A may be a list of attributes or aggregations (SUM,
COUNT, AVG, MIN, MAX) defined on individual attributes,
and C is a boolean combination (using AND, OR, EXIST,
NOT EXIST) of predicates that have the form a_i op a_j
or a_i op constant. The operator op may be any of the
following: =, ≠, >, ≥, <, ≤. In essence, queries specified in
this form involve three categories of basic operations: pred-
icates, boolean combinations, and aggregations. Our goal
is to design efficient algorithms for performing these opera-
tions using graphics processors.
Predicates: Predicates in the form of a_i op constant
can be evaluated via the depth test and stencil test.
The comparison between two attributes, a_i op a_j, can
be transformed into a semi-linear query a_i - a_j op 0,
which can be executed on the GPUs.
Boolean combinations: A boolean combination of pred-
icates can always be rewritten in a conjunctive normal
form (CNF). The stencil test can be used repeatedly
for evaluating a series of logical operators with the in-
termediate results stored in the stencil buffer.
Aggregations: This category includes simple operations
such as COUNT, SUM, AVG, MIN, MAX, all of which
can be implemented using the counting capability of
the occlusion queries on GPUs.
To perform these operations on a relational table using
GPUs, we store the attributes of each record in multiple
channels of a single texel, or the same texel location in mul-
tiple textures.
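For concreteness, a record with four attributes can be packed so that its attributes occupy the red, green, blue, and alpha channels of a single texel. The C++ sketch below, with illustrative field names, prepares such an array for upload as a texture (Section 3.3).

#include <vector>

struct Record { float a1, a2, a3, a4; };

// One RGBA texel per record: R <- a1, G <- a2, B <- a3, A <- a4.
std::vector<float> packForTexture(const std::vector<Record>& table) {
    std::vector<float> texels;
    texels.reserve(table.size() * 4);
    for (const Record& rec : table) {
        texels.push_back(rec.a1);
        texels.push_back(rec.a2);
        texels.push_back(rec.a3);
        texels.push_back(rec.a4);
    }
    return texels;   // upload with glTexImage2D as in Section 3.3
}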
4.1 Predicate Evaluation
In this section, we present novel GPU-based algorithms for
performing comparisons as well as the semi-linear queries.
4.1.1 Comparison between an Attribute and a Constant
We can implement a comparison between an attribute
tex and a constant d by using the depth test function-
ality of graphics hardware. The stencil buffer can be config-
ured to store the result of the depth test. This is important
not only for evaluating a single comparison but also for con-
structing more complex boolean combinations of multiple
predicates.
To use the depth test for performing comparisons, at-
tribute values need to be stored in the depth buffer. We
use a simple fragment program for copying the attribute
values from the texture memory to the depth buffer.
A comparison operation against a depth value d is imple-
mented by rendering a screen filling quadrilateral with depth
d. In this operation, the rasterization hardware uses the
comparison function for testing each attribute value stored
in the depth buffer against d. The comparison function is
specified using the depth function. Routine 4.1 describes
the pseudo-code for our implementation.
4.1.2 Comparison between Two Attributes
The comparison between two attributes, a_i op a_j, can
be transformed into a special semi-linear query (a_i - a_j op
0), which can be performed very efficiently using the vector
processors on the GPUs. Here, we propose a fast algorithm
that can perform any general semi-linear query on GPUs.

Compare( tex, op, d )
1 CopyToDepth( tex )
2 set depth test function to op
3 RenderQuad( d )
CopyToDepth( tex )
1 set up fragment program
2 RenderTexturedQuad( tex )
ROUTINE 4.1: Compare compares the attribute values
stored in texture tex against d using the comparison function
op. CopyToDepth called on line 1 copies the attribute values in
tex into the depth buffer. CopyToDepth uses a simple fragment
program on each pixel of the screen for performing the copy oper-
ation. On line 2, the depth test is configured to use the compar-
ison operator op. The function RenderQuad(d) called on line
3 generates a fragment at a specified depth d for each pixel on
the screen. Rasterization hardware compares the fragment depth
d against the attribute values in depth buffer using the operation
op.
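One possible realization of Compare in C++/OpenGL is sketched below. Here copyAttributesToDepth is a hypothetical stand-in for the fragment-program copy pass of line 1, and since the OpenGL depth test compares the incoming fragment depth d against the stored value, the operator passed to glDepthFunc may need to be mirrored (e.g., attribute < d becomes GL_GREATER). Attribute values are assumed to have been scaled into the valid depth range.

#include <GL/gl.h>

void copyAttributesToDepth(GLuint attributeTex);   // hypothetical helper for line 1

void compare(GLuint attributeTex, GLenum op, float d) {
    copyAttributesToDepth(attributeTex);            // line 1: attributes -> depth buffer

    // Line 2: configure the depth test with the (possibly mirrored) operator.
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(op);
    glDepthMask(GL_FALSE);                          // compare only; keep stored attribute values
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);

    // Record the outcome in the stencil buffer for later boolean combination.
    glClearStencil(0);
    glClear(GL_STENCIL_BUFFER_BIT);
    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_ALWAYS, 1, 0xFF);
    glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);      // passing fragments set stencil to 1

    // Line 3: screen-filling quad at depth d, one fragment per record
    // (transform and depth-range mapping details omitted).
    glBegin(GL_QUADS);
      glVertex3f(-1.0f, -1.0f, d);
      glVertex3f( 1.0f, -1.0f, d);
      glVertex3f( 1.0f,  1.0f, d);
      glVertex3f(-1.0f,  1.0f, d);
    glEnd();
}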
Semi-linear Queries on GPUs
Applications encountered in Geographical Information Sys-
tems (GIS), geometric modeling, and spatial databases de-
fine geometric data objects as linear inequalities of the at-
tributes in a relational database [28]. Such geometric data
objects are called semi-linear sets. GPUs are capable of fast
computation on semi-linear sets. A linear combination of m
attributes is represented as:
Σ_{i=1}^{m} s_i · a_i

where each s_i is a scalar multiplier and each a_i is an attribute
of a record in the database. The above expression can be
considered as a dot product of two vectors s and a, where
s = (s_1, s_2, ..., s_m) and a = (a_1, a_2, ..., a_m).
Semilinear( tex, s, op, b )
1 enable fragment program SemilinearFP( s, op, b )
2 RenderTexturedQuad( tex )
SemilinearFP( s, op, b )
1 a = value from tex
2 if not ( dot( s, a ) op b )
3 discard fragment
ROUTINE 4.2: Semilinear computes the semi-linear query
by performing a linear combination of the attribute values in tex
with the scalar constants in s. Using the operator op, it compares
the scalar value resulting from the linear combination with b. To
perform this operation, we render a screen-filling quad and generate
fragments on which the semi-linear query is executed. For each
fragment, the fragment program SemilinearFP discards fragments
that fail the query.
Semilinear computes the semi-linear query:
(s · a) op b
where op is a comparison operator and b is a scalar con-
stant. The attributes a_i are stored in separate channels in
the texture tex. There is a limit of four channels per texture.
Longer vectors can be split into multiple textures, each with
four components. The fragment program SemilinearFP()
performs the dot product of a texel from tex with s and
compares the result to b. It discards the fragment if the
comparison fails. Line 2 renders a textured quadrilateral
using the fragment program. Semilinear maps very well
to the parallel pixel processing as well as vector processing
capabilities available on the GPUs. This algorithm can also
be extended for evaluating polynomial queries.
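To make the per-record computation explicit, the following CPU reference in C++ evaluates the same query with op fixed to < for illustration; SemilinearFP evaluates the identical expression for all records in parallel, one fragment per record.

#include <array>
#include <cstddef>
#include <vector>

// Returns one flag per record: true if the record satisfies (s · a) < b.
std::vector<bool> semilinearCPU(const std::vector<std::array<float, 4>>& records,
                                const std::array<float, 4>& s, float b) {
    std::vector<bool> result(records.size());
    for (std::size_t r = 0; r < records.size(); ++r) {
        float dot = 0.0f;
        for (int i = 0; i < 4; ++i)
            dot += s[i] * records[r][i];   // s · a
        result[r] = (dot < b);             // op fixed to < in this sketch
    }
    return result;
}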
EvalCNF( A )
1 Clear Stencil to 1.
2 For each of A_i, i = 1, .., k
3 do
4   if ( mod(i, 2) ) /* valid stencil value is 1 */
5     Stencil Test to pass if stencil value is equal to 1
6     StencilOp(KEEP, KEEP, INCR)
7   else /* valid stencil value is 2 */
8     Stencil Test to pass if stencil value is equal to 2
9     StencilOp(KEEP, KEEP, DECR)
10  endif
11  For each B^i_j, j = 1, .., m_i
12  do
13    Perform B^i_j using Compare
14  end for
15  if ( mod(i, 2) ) /* valid stencil value is 2 */
16    if a stencil value on screen is 1, replace it with 0
17  else /* valid stencil value is 1 */
18    if a stencil value on screen is 2, replace it with 0
19  endif
20 end for
ROUTINE 4.3: EvalCNF is used to evaluate a CNF ex-
pression. Initially, the stencil is initialized to 1. This is used
for performing TRUE AND A_1. While evaluating each formula
A_i, Line 4 sets the appropriate stencil test and stencil operations
based on whether i is even or odd. If i is even, valid portions
on screen have stencil value 2. Otherwise, valid portions have
stencil value 1. Lines 11-14 invalidate portions on screen that
satisfy (A_1 ∧ A_2 ∧ ... ∧ A_{i-1}) and fail (A_1 ∧ A_2 ∧ ... ∧ A_i). Lines
15-19 compute the disjunction of the B^i_j for each predicate A_i. At
the end of line 19, valid portions on screen have stencil value 2
if i is odd and 1 otherwise. At the end of line 20, records
corresponding to non-zero stencil values satisfy A.
4.2 Boolean Combination
Complex boolean combinations are often formed by com-
bining simple predicates with the logical operators AND,
OR, NOT. In these cases, the stencil operation is specified
to store the result of a predicate. We use the function Sten-
cilOp (as defined in Section 3.4) to initialize the appropriate
stencil operation for storing the result in stencil buffer.
Our algorithm evaluates a boolean expression represented
as a CNF expression. We assume that the CNF expression
has no NOT operators. If a simple predicate in this ex-
pression has a NOT operator, we can invert the comparison
operation and eliminate the NOT operator. A CNF expres-
sion C_k is represented as A_1 ∧ A_2 ∧ ... ∧ A_k, where each A_i is
represented as B^i_1 ∨ B^i_2 ∨ ... ∨ B^i_{m_i}. Each B^i_j, j = 1, 2, .., m_i,
is a simple predicate.
The CNF C_k can be evaluated using the recursion C_k =
C_{k-1} ∧ A_k. C_0 is considered as TRUE. We use the pseu-
docode in Routine 4.3 for evaluating C_k. Our approach uses
three stencil values, 0, 1, and 2, for validating data. Data values
corresponding to the stencil value 0 are always invalid. Ini-
tially, the stencil values are initialized to 1. If i is the iter-
ation value for the loop in line 2, lines 3-19 evaluate C_i.
The valid stencil value is 1 or 2 depending on whether i is
even or odd, respectively. At the end of line 19, portions on
the screen with non-zero stencil value satisfy the CNF C_k.
We can easily modify our algorithm for handling a boolean
expression represented as a DNF.
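For reference, the per-record boolean value that EvalCNF accumulates through the stencil buffer corresponds to the following CPU sketch, where each inner vector holds the already-evaluated predicates B^i_j of one clause A_i (names are illustrative):

#include <vector>

// True if the record satisfies A = A_1 AND ... AND A_k, with each clause
// A_i = B^i_1 OR ... OR B^i_{m_i}.
bool evalCNF(const std::vector<std::vector<bool>>& clauses) {
    for (const std::vector<bool>& clause : clauses) {   // conjunction over the A_i
        bool anyTrue = false;
        for (bool predicate : clause)                    // disjunction over the B^i_j
            anyTrue = anyTrue || predicate;
        if (!anyTrue) return false;                      // record invalidated (stencil set to 0)
    }
    return true;                                         // non-zero stencil: record satisfies A
}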

References
Sparse matrix solvers on the GPU: conjugate gradients and multigrid.
Cg: a system for programming graphics hardware in a C-like language.
Ray tracing on programmable graphics hardware.
Fast computation of generalized Voronoi diagrams using graphics hardware.
DBMSs on a Modern Processor: Where Does Time Go?