Shubhabrata Sengupta

Other affiliations: Baidu, Nvidia

Bio: Shubhabrata Sengupta is an academic researcher from the University of California, Davis. The author has contributed to research on the topics of data structures and graphics hardware. The author has an h-index of 15 and has co-authored 25 publications receiving 2,087 citations. Previous affiliations of Shubhabrata Sengupta include Baidu and Nvidia.

Papers

Proceedings Article

Scan primitives for GPU computing

Shubhabrata Sengupta¹, Mark J. Harris², Yao Zhang¹, John D. Owens¹

University of California, Davis¹, Nvidia²

04 Aug 2007

TL;DR: Using the scan primitives, this work presents novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyzes the performance of the scan primitives, of several sort algorithms that use them, and of a graphical shallow-water fluid simulation that uses the scan framework for a tridiagonal matrix solver.

Abstract: The scan primitives are powerful, general-purpose data-parallel primitives that are building blocks for a broad range of applications. We describe GPU implementations of these primitives, specifically an efficient formulation and implementation of segmented scan, on NVIDIA GPUs using the CUDA API. Using the scan primitives, we show novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyze the performance of the scan primitives, several sort algorithms that use the scan primitives, and a graphical shallow-water fluid simulation using the scan framework for a tridiagonal matrix solver.
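
To make the scan and segmented-scan semantics concrete, here is a minimal sequential Python reference; it is not the paper's CUDA implementation. It assumes segments are marked by a head-flag array (1 at the first element of each segment), and the function names `exclusive_scan`, `segmented_inclusive_scan`, and `compact` are hypothetical. The `compact` helper shows one classic use of scan: an exclusive scan of 0/1 flags yields each kept element's output address, the same pattern split-based radix sorts rely on.

```python
import operator
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def exclusive_scan(xs: Sequence[T], op: Callable[[T, T], T], identity: T) -> List[T]:
    """out[i] = identity op xs[0] op ... op xs[i-1] (sequential reference)."""
    out: List[T] = []
    acc = identity
    for x in xs:
        out.append(acc)
        acc = op(acc, x)
    return out

def segmented_inclusive_scan(xs, flags, op, identity):
    """Inclusive scan that restarts at every element whose head flag is 1."""
    out = []
    acc = identity
    for x, head in zip(xs, flags):
        acc = x if head else op(acc, x)
        out.append(acc)
    return out

def compact(xs, keep):
    """Stream compaction: an exclusive scan of 0/1 flags gives each kept element its output index."""
    flags = [1 if k else 0 for k in keep]
    if not flags:
        return []
    addr = exclusive_scan(flags, operator.add, 0)
    out = [None] * (addr[-1] + flags[-1])  # total number of kept elements
    for x, f, a in zip(xs, flags, addr):
        if f:
            out[a] = x  # each write targets a distinct index, so a GPU could scatter in parallel
    return out
```

For example, `exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3], operator.add, 0)` returns `[0, 3, 4, 11, 11, 15, 16, 22]`, and `segmented_inclusive_scan([1, 2, 3, 4, 5, 6], [1, 0, 0, 1, 0, 1], operator.add, 0)` returns `[1, 3, 6, 4, 9, 6]`.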

635 citations

Journal Article

Fast BVH Construction on GPUs

Christian Lauterbach¹, Michael Garland², Shubhabrata Sengupta³, David Luebke², Dinesh Manocha¹

University of North Carolina at Chapel Hill¹, Nvidia², University of California, Davis³

01 Apr 2009 • Computer Graphics Forum

TL;DR: Preliminary results show that current GPU architectures can compete with CPU implementations of hierarchy construction running on multicore systems and can construct hierarchies of models with up to several million triangles and use them for fast ray tracing or other applications.

Abstract: We present two novel parallel algorithms for rapidly constructing bounding volume hierarchies on manycore GPUs. The first uses a linear ordering derived from spatial Morton codes to build hierarchies extremely quickly and with high parallel scalability. The second is a top-down approach that uses the surface area heuristic (SAH) to build hierarchies optimized for fast ray tracing. Both algorithms are combined into a hybrid algorithm that removes existing bottlenecks in GPU construction performance and scalability, significantly decreasing build time. The resulting hierarchies are close in quality to optimized SAH hierarchies, but the construction process is substantially faster, leading to a significant net benefit when both construction and traversal cost are accounted for. Our preliminary results show that current GPU architectures can compete with CPU implementations of hierarchy construction running on multicore systems. In practice, we can construct hierarchies of models with up to several million triangles and use them for fast ray tracing or other applications.
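
The Morton-code linear ordering can be sketched as follows. This is a minimal Python illustration, assuming primitive centroids normalized to the unit cube and 10 bits per axis (a 30-bit code); the bit-spreading constants are a widely used formulation rather than something taken from this paper, and the names `expand_bits`, `morton3d`, and `morton_order` are hypothetical. Sorting primitives by these codes places spatially nearby primitives next to each other, which is the ordering a Morton-based builder then partitions into a hierarchy.

```python
from typing import List, Sequence, Tuple

def expand_bits(v: int) -> int:
    """Spread the low 10 bits of v so that two zero bits separate each original bit."""
    v &= 0x3FF
    v = (v * 0x00010001) & 0xFF0000FF
    v = (v * 0x00000101) & 0x0F00F00F
    v = (v * 0x00000011) & 0xC30C30C3
    v = (v * 0x00000005) & 0x49249249
    return v

def morton3d(x: float, y: float, z: float) -> int:
    """30-bit Morton code for a point in [0, 1]^3 (10 bits per axis, an assumed resolution)."""
    def quantize(t: float) -> int:
        # Map [0, 1] onto the integer grid 0..1023, clamping out-of-range inputs.
        return min(max(int(t * 1024.0), 0), 1023)
    xx, yy, zz = (expand_bits(quantize(t)) for t in (x, y, z))
    # Interleave so each bit level contributes an (x, y, z) triple, x most significant.
    return (xx << 2) | (yy << 1) | zz

def morton_order(centroids: Sequence[Tuple[float, float, float]]) -> List[int]:
    """Indices of the primitives sorted by the Morton code of their centroids."""
    return sorted(range(len(centroids)), key=lambda i: morton3d(*centroids[i]))
```

On a GPU the sort step would typically be a parallel radix sort over the codes; Python's `sorted` stands in for it here.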

414 citations

Journal Article

Real-time parallel hashing on the GPU

Dan A. Alcantara¹, Andrei Sharf¹, Fatemeh Abbasinejad¹, Shubhabrata Sengupta¹, Michael Mitzenmacher², John D. Owens¹, Nina Amenta¹

University of California, Davis¹, Harvard University²

01 Dec 2009

TL;DR: An efficient data-parallel algorithm for building large hash tables of millions of elements in real time. It considers two construction approaches: classical sparse perfect hashing, and cuckoo hashing, which packs elements densely by allowing an element to be stored in one of multiple possible locations.

Abstract: We demonstrate an efficient data-parallel algorithm for building large hash tables of millions of elements in real-time. We consider two parallel algorithms for the construction: a classical sparse perfect hashing approach, and cuckoo hashing, which packs elements densely by allowing an element to be stored in one of multiple possible locations. Our construction is a hybrid approach that uses both algorithms. We measure the construction time, access time, and memory usage of our implementations and demonstrate real-time performance on large datasets: for 5 million key-value pairs, we construct a hash table in 35.7 ms using 1.42 times as much memory as the input data itself, and we can access all the elements in that hash table in 15.3 ms. For comparison, sorting the same data requires 36.6 ms, but accessing all the elements via binary search requires 79.5 ms. Furthermore, we show how our hashing methods can be applied to two graphics applications: 3D surface intersection for moving data and geometric hashing for image matching.
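
A minimal sequential sketch of cuckoo hashing, for intuition only: the paper's construction is a parallel hybrid, whereas this Python version inserts one key at a time. The table size, the `(a*key + b) mod P mod size` hash family and its constants, the `max_iters` eviction limit, and the class name `CuckooTable` are all assumptions. Each key may live in any of its k candidate slots, so a lookup probes at most k locations; an insertion that cannot settle within `max_iters` evictions reports failure and leaves one displaced entry out, so the caller is expected to rebuild from the full input with new hash functions.

```python
from typing import List, Optional, Sequence, Tuple

P = 4294967291  # largest prime below 2**32; an assumed choice for the hash family

class CuckooTable:
    """Sequential cuckoo hash table with k hash functions over one array (illustrative)."""

    def __init__(self, size: int, hash_params: Sequence[Tuple[int, int]], max_iters: int = 100):
        self.size = size
        self.params = list(hash_params)  # one (a, b) pair per hash function
        self.max_iters = max_iters
        self.slots: List[Optional[Tuple[int, int]]] = [None] * size

    def _h(self, i: int, key: int) -> int:
        a, b = self.params[i]
        return ((a * key + b) % P) % self.size

    def _locations(self, key: int) -> List[int]:
        return [self._h(i, key) for i in range(len(self.params))]

    def lookup(self, key: int) -> Optional[int]:
        # At most k probes, one per hash function.
        for s in self._locations(key):
            entry = self.slots[s]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key: int, value: int) -> bool:
        # Overwrite in place if the key is already stored.
        for s in self._locations(key):
            entry = self.slots[s]
            if entry is not None and entry[0] == key:
                self.slots[s] = (key, value)
                return True
        carried: Optional[Tuple[int, int]] = (key, value)
        slot = self._h(0, key)
        for _ in range(self.max_iters):
            # Place the carried entry and pick up whatever occupied the slot.
            carried, self.slots[slot] = self.slots[slot], carried
            if carried is None:
                return True
            # The evicted entry moves on to its next candidate location.
            locs = self._locations(carried[0])
            slot = locs[(locs.index(slot) + 1) % len(locs)]
        return False  # one entry is left out; rebuild with new hash functions
```

With the hypothetical parameters `CuckooTable(16, [(3, 1), (7, 5), (11, 2)])`, inserting keys 0 through 7 and then key 16 forces one eviction: 16 collides with 0 at slot 1, and 0 moves to its second candidate slot.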

194 citations

Journal Article

Glift: Generic, efficient, random-access GPU data structures

Aaron Lefohn¹, Shubhabrata Sengupta¹, Joe Kniss², Robert Strzodka³, John D. Owens¹

University of California, Davis¹, University of Utah², Stanford University³

01 Jan 2006 • ACM Transactions on Graphics

TL;DR: Glift, an abstraction and generic template library for defining complex, random-access graphics processor (GPU) data structures, is presented and several new GPU data structures are characterized and implemented using reusable Glift components.

Abstract: This article presents Glift, an abstraction and generic template library for defining complex, random-access graphics processor (GPU) data structures. Like modern CPU data structure libraries, Glift enables GPU programmers to separate algorithms from data structure definitions, thereby greatly simplifying algorithmic development and enabling reusable and interchangeable data structures. We characterize a large body of previously published GPU data structures in terms of our abstraction and present several new GPU data structures. The structures, a stack, quadtree, and octree, are explained using simple Glift concepts and implemented using reusable Glift components. We also describe two applications of these structures not previously demonstrated on GPUs: adaptive shadow maps and octree three-dimensional paint. Last, we show that our example Glift data structures perform comparably to handwritten implementations while requiring only a fraction of the programming effort.

174 citations

Efficient Parallel Scan Algorithms for GPUs

Shubhabrata Sengupta, Mark J. Harris, Michael Garland

01 Jan 2011

TL;DR: This paper describes the design of efficient scan and segmented scan parallel primitives in CUDA for execution on GPUs using a divide-and-conquer approach and demonstrates that this design methodology results in routines that are simple, highly efficient, and free of irregular access patterns that lead to memory bank conflicts.

Abstract: Scan and segmented scan algorithms are crucial building blocks for a great many data-parallel algorithms. Segmented scan and related primitives also provide the necessary support for the flattening transform, which allows for nested data-parallel programs to be compiled into flat data-parallel languages. In this paper, we describe the design of efficient scan and segmented scan parallel primitives in CUDA for execution on GPUs. Our algorithms are designed using a divide-and-conquer approach that builds all scan primitives on top of a set of primitive intra-warp scan routines. We demonstrate that this design methodology results in routines that are simple, highly efficient, and free of irregular access patterns that lead to memory bank conflicts. These algorithms form the basis for current and upcoming releases of the widely used CUDPP library.
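
The divide-and-conquer structure can be simulated in Python: a lock-step, log-step (Hillis-Steele style) inclusive scan within each warp, followed by a scan of the per-warp totals whose results are added back as carry-ins. This is a model of the data flow under stated assumptions, not the paper's CUDA kernels or the CUDPP API; the warp width of 32, the log-step pattern inside a warp, and the function names `warp_inclusive_scan` and `block_inclusive_scan` are assumptions.

```python
import operator
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
WARP_SIZE = 32  # assumed warp width

def warp_inclusive_scan(vals: Sequence[T], op: Callable[[T, T], T] = operator.add) -> List[T]:
    """Simulate a lock-step, log-step inclusive scan over the lanes of one warp."""
    assert len(vals) <= WARP_SIZE
    x = list(vals)
    offset = 1
    while offset < len(x):
        prev = x[:]  # every lane reads before any lane writes in this step
        for lane in range(offset, len(x)):
            # Left operand covers earlier elements, so non-commutative ops stay correct.
            x[lane] = op(prev[lane - offset], prev[lane])
        offset *= 2
    return x

def block_inclusive_scan(vals: Sequence[T], op: Callable[[T, T], T] = operator.add) -> List[T]:
    """Scan each warp-sized chunk, scan the chunk totals, then add each chunk's carry-in."""
    chunks = [vals[i:i + WARP_SIZE] for i in range(0, len(vals), WARP_SIZE)]
    scanned = [warp_inclusive_scan(c, op) for c in chunks]
    if len(scanned) <= 1:
        return scanned[0] if scanned else []
    totals = [s[-1] for s in scanned]
    carries = block_inclusive_scan(totals, op)  # recurses only beyond WARP_SIZE chunks
    out = list(scanned[0])
    for w in range(1, len(scanned)):
        out.extend(op(carries[w - 1], v) for v in scanned[w])
    return out
```

The `prev = x[:]` copy models the requirement that all lanes read before any lane writes within a step; on a GPU that ordering has to be guaranteed by the execution model or explicit synchronization. For instance, `block_inclusive_scan(list(range(100)))` equals `[i * (i + 1) // 2 for i in range(100)]`.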

160 citations

Cited by

Proceedings Article

Rodinia: A benchmark suite for heterogeneous computing

Shuai Che¹, Michael Boyer¹, Jiayuan Meng¹, David Tarjan¹, Jeremy W. Sheaffer¹, Sang-Ha Lee¹, Kevin Skadron¹

University of Virginia¹

04 Oct 2009

TL;DR: This characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.

Abstract: This paper presents and characterizes Rodinia, a benchmark suite for heterogeneous computing. To help architects study emerging platforms such as GPUs (Graphics Processing Units), Rodinia includes applications and kernels which target multi-core CPU and GPU platforms. The choice of applications is inspired by Berkeley's dwarf taxonomy. Our characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.

2,697 citations

Proceedings Article

Scalable parallel programming with CUDA

John R. Nickolls¹, Ian Buck¹, Michael Garland¹, Kevin Skadron²

Nvidia¹, University of Virginia²

11 Aug 2008

TL;DR: Presents a collection of slides covering the following topics: CUDA parallel programming model; CUDA toolkit and libraries; performance optimization; and application development.

Abstract: The advent of multicore CPUs and manycore GPUs means that mainstream processor chips are now parallel systems. Furthermore, their parallelism continues to scale with Moore's law. The challenge is to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism to manycore GPUs with widely varying numbers of cores.

2,216 citations

Journal Article

A Survey of General-Purpose Computation on Graphics Hardware

John D. Owens¹, David Luebke², Naga K. Govindaraju³, Mark J. Harris², Jens Krüger⁴, Aaron Lefohn, Timothy John Purcell²

University of California, Davis¹, Nvidia², Microsoft³, Technische Universität München⁴

01 Mar 2007 • Computer Graphics Forum

TL;DR: This report describes, summarizes, and analyzes the latest research in mapping general-purpose computation to graphics hardware.

Abstract: The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has made graphics hardware a compelling platform for computationally demanding tasks in a wide variety of application domains. In this report, we describe, summarize, and analyze the latest research in mapping general-purpose computation to graphics hardware. We begin with the technical motivations that underlie general-purpose computation on graphics processors (GPGPU) and describe the hardware and software developments that have led to the recent interest in this field. We then aim the main body of this report at two separate audiences. First, we describe the techniques used in mapping general-purpose computation to graphics hardware. We believe these techniques will be generally useful for researchers who plan to develop the next generation of GPGPU algorithms and techniques. Second, we survey and categorize the latest developments in general-purpose application development on graphics hardware. This survey should be of particular interest to researchers who want to use the latest GPGPU applications in their own systems.

1,998 citations

Proceedings Article

A Survey of General-Purpose Computation on Graphics Hardware

John D. Owens¹, David Luebke², Naga K. Govindaraju³, Mark J. Harris², Jens Krüger⁴, Aaron Lefohn, Timothy John Purcell²

University of California, Davis¹, Nvidia², Microsoft³, Technische Universität München⁴

01 Jan 2005

TL;DR: The techniques used in mapping general-purpose computation to graphics hardware will be generally useful for researchers who plan to develop the next generation of GPGPU algorithms and techniques.

1,728 citations

Journal Article

Scalable Parallel Programming with CUDA: Is CUDA the parallel programming model that application developers have been waiting for?

John R. Nickolls¹, Ian Buck¹, Michael Garland¹, Kevin Skadron²

Nvidia¹, University of Virginia²

01 Mar 2008 • ACM Queue

TL;DR: In this article, the authors present a framework to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism on manycore GPUs with widely varying numbers of cores.

Abstract: The advent of multicore CPUs and manycore GPUs means that mainstream processor chips are now parallel systems. Furthermore, their parallelism continues to scale with Moore’s law. The challenge is to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism to manycore GPUs with widely varying numbers of cores.

1,148 citations
