Topic

CUDA Pinned memory

About: CUDA pinned memory (page-locked host memory) is a research topic. Over its lifetime, 1,097 publications have been published within this topic, receiving 30,198 citations.
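
Pinned (page-locked) host memory is what gives the topic its name: because the operating system cannot page such memory out, the GPU's DMA engine can read and write it directly, which speeds up host-device copies and is a prerequisite for truly asynchronous transfers. Below is a minimal sketch using the standard CUDA runtime calls; the kernel, buffer size, and error-check macro are illustrative, not drawn from any paper on this page.

    // Minimal sketch: pinned host memory enabling asynchronous copies.
    // Illustrative only; error handling reduced to a single check macro.
    #include <cuda_runtime.h>
    #include <cstdio>

    #define CHECK(call) do { cudaError_t e = (call); \
        if (e != cudaSuccess) { printf("CUDA error: %s\n", cudaGetErrorString(e)); return 1; } } while (0)

    __global__ void scale(float *d, int n) {   // illustrative kernel
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= 2.0f;
    }

    int main() {
        const int n = 1 << 20;
        float *h, *d;
        CHECK(cudaMallocHost(&h, n * sizeof(float)));  // pinned (page-locked) host buffer
        CHECK(cudaMalloc(&d, n * sizeof(float)));
        for (int i = 0; i < n; ++i) h[i] = 1.0f;

        cudaStream_t s;
        CHECK(cudaStreamCreate(&s));
        // cudaMemcpyAsync is only truly asynchronous with pinned host memory.
        CHECK(cudaMemcpyAsync(d, h, n * sizeof(float), cudaMemcpyHostToDevice, s));
        scale<<<(n + 255) / 256, 256, 0, s>>>(d, n);
        CHECK(cudaMemcpyAsync(h, d, n * sizeof(float), cudaMemcpyDeviceToHost, s));
        CHECK(cudaStreamSynchronize(s));

        CHECK(cudaStreamDestroy(s));
        CHECK(cudaFree(d));
        CHECK(cudaFreeHost(h));                        // pinned memory is freed with cudaFreeHost
        return 0;
    }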


Papers
Journal ArticleDOI
TL;DR: As computer CPUs get faster, primary memories tend to be organized in parallel banks; this paper discusses important questions in the design and use of such memories.
Abstract: As computer CPUs get faster, primary memories tend to be organized in parallel banks. The fastest machines now being developed can fetch of the order of 100 words in parallel. Unless memory and compiler designers are careful, serious memory conflicts and resulting performance degradation may result. Some of the important questions of design and use of such memories are discussed.
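
The bank-conflict problem the paper describes for CPU primary memory has a well-known analogue in CUDA's shared memory, which is divided into banks that serialize concurrent accesses from a warp. The classic mitigation is to pad the leading dimension of a shared array. The sketch below (a standard tiled transpose, offered as my illustration rather than anything from the paper) assumes a square width-by-width image, width a multiple of 32, and 32x32 thread blocks.

    // Shared-memory bank conflicts and the classic padding fix.
    // A 32x32 tile read column-wise would hit the same bank in every
    // thread of a warp; padding each row to 33 floats spreads the
    // accesses across distinct banks.
    __global__ void transpose_tile(const float *in, float *out, int width) {
        __shared__ float tile[32][33];   // 33, not 32: avoids bank conflicts

        int x = blockIdx.x * 32 + threadIdx.x;
        int y = blockIdx.y * 32 + threadIdx.y;
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];
        __syncthreads();

        // Swapped block indices give the transposed output location.
        x = blockIdx.y * 32 + threadIdx.x;
        y = blockIdx.x * 32 + threadIdx.y;
        out[y * width + x] = tile[threadIdx.x][threadIdx.y];  // column read, conflict-free with padding
    }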

306 citations

Book ChapterDOI
28 Nov 2008
TL;DR: CUDA-lite, an enhancement to CUDA, is presented, along with preliminary results indicating that auto-generated code can perform comparably to hand-written code.
Abstract: The computer industry has transitioned into multi-core and many-core parallel systems. The CUDA programming environment from NVIDIA is an attempt to make programming many-core GPUs more accessible to programmers. However, there are still many burdens placed upon the programmer to maximize performance when using CUDA. One such burden is dealing with the complex memory hierarchy. Efficient and correct usage of the various memories is essential, making a difference of 2-17x in performance. Currently, the task of determining the appropriate memory to use and the coding of data transfer between memories is still left to the programmer. We believe that this task can be better performed by automated tools. We present CUDA-lite, an enhancement to CUDA, as one such tool. We leverage programmer knowledge via annotations to perform transformations and show preliminary results that indicate auto-generated code can have performance comparable to hand coding.
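
One memory-hierarchy burden that tools like CUDA-lite automate is arranging coalesced global-memory access, a major contributor to the 2-17x performance range cited above. The two hypothetical kernels below (hand-written for illustration, not CUDA-lite output) copy the same 2D array; only the mapping of threads to elements differs, and only the second issues coalesced loads.

    // Uncoalesced: one thread per row. Within a warp, consecutive threads
    // access addresses that differ by `pitch`, so each load by a warp
    // touches many separate memory segments.
    __global__ void copy_rows_slow(const float *in, float *out, int pitch, int n) {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n)
            for (int col = 0; col < pitch; ++col)
                out[row * pitch + col] = in[row * pitch + col];
    }

    // Coalesced: one thread per column. Within a warp, consecutive threads
    // read consecutive addresses, so each warp load is served by a minimal
    // number of memory transactions.
    __global__ void copy_rows_fast(const float *in, float *out, int pitch, int n) {
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (col < pitch)
            for (int row = 0; row < n; ++row)
                out[row * pitch + col] = in[row * pitch + col];
    }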

257 citations

Proceedings ArticleDOI
29 Aug 2004
TL;DR: By combining memory objects with floating-point fragment programs, this work has implemented a particle engine that entirely avoids the transfer of particle data at runtime.
Abstract: We present a system for real-time animation and rendering of large particle sets using GPU computation and memory objects in OpenGL. Memory objects can be used both as containers for geometry data stored on the graphics card and as render targets, providing an effective means for the manipulation and rendering of particle data on the GPU. To fully take advantage of this mechanism, efficient GPU realizations of algorithms used to perform particle manipulation are essential. Our system implements a versatile particle engine, including inter-particle collisions and visibility sorting. By combining memory objects with floating-point fragment programs, we have implemented a particle engine that entirely avoids the transfer of particle data at runtime. Our system can be seen as a forerunner of a new class of graphics algorithms, exploiting memory objects or similar concepts on upcoming graphics hardware to avoid bus bandwidth becoming the major performance bottleneck.
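
The paper's central trick, keeping particle state resident in GPU memory so the bus is never crossed per frame, predates CUDA but translates into it directly. A hedged CUDA analogue follows (the paper itself uses OpenGL memory objects and floating-point fragment programs): allocate state once on the device and update it in place every frame. The names and the integration scheme are illustrative.

    // CUDA analogue of the paper's approach: particle state lives in device
    // memory for the whole run; the host never copies it back per frame.
    #include <cuda_runtime.h>

    struct Particles {       // illustrative structure-of-arrays layout
        float4 *pos;         // xyz position, w unused
        float4 *vel;         // xyz velocity, w unused
    };

    __global__ void step(float4 *pos, float4 *vel, int n, float dt) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        vel[i].y -= 9.81f * dt;          // gravity (simple Euler integration)
        pos[i].x += vel[i].x * dt;
        pos[i].y += vel[i].y * dt;
        pos[i].z += vel[i].z * dt;
    }

    void simulate(Particles p, int n, int frames, float dt) {
        // Each frame touches only device memory; a renderer would read the
        // same buffers via graphics interop rather than a host round trip.
        for (int f = 0; f < frames; ++f)
            step<<<(n + 255) / 256, 256>>>(p.pos, p.vel, n, dt);
        cudaDeviceSynchronize();
    }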

255 citations

Proceedings ArticleDOI
06 Nov 2005
TL;DR: This paper proposes using GPUs in approximately the reverse way: to assist in "converting pictures into numbers" (i.e. computer vision) and provides a simple API which implements some common computer vision algorithms.
Abstract: Graphics and vision are approximate inverses of each other: ordinarily Graphics Processing Units (GPUs) are used to convert "numbers into pictures" (i.e. computer graphics). In this paper, we propose using GPUs in approximately the reverse way: to assist in "converting pictures into numbers" (i.e. computer vision). The OpenVIDIA project uses single or multiple graphics cards to accelerate image analysis and computer vision. It is a library and API aimed at providing a graphics hardware accelerated processing framework for image processing and computer vision. OpenVIDIA explores the creation of a parallel computer architecture built entirely from commodity hardware, using multiple Graphics Processing Units in parallel as a general-purpose parallel computer architecture. It provides a simple API which implements some common computer vision algorithms. Many components can be used immediately, and because the project is open source, the code is intended to serve as templates and examples for how similar algorithms are mapped onto graphics hardware. Implemented are image processing techniques (Canny edge detection, filtering), image feature handling (identifying and matching features) and image registration, to name a few.
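
For a flavor of how such vision primitives map onto GPU threads, here is a generic CUDA sketch of a 3x3 box filter, one thread per output pixel. This is my illustration of the pattern; OpenVIDIA itself targets fragment programs on graphics hardware rather than CUDA.

    // Generic GPU image-processing primitive of the kind OpenVIDIA
    // accelerates: each thread computes one output pixel of a 3x3 box
    // filter, clamping the window at the image borders.
    __global__ void box3x3(const unsigned char *in, unsigned char *out,
                           int width, int height) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        int sum = 0, count = 0;
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int nx = x + dx, ny = y + dy;
                if (nx >= 0 && nx < width && ny >= 0 && ny < height) {
                    sum += in[ny * width + nx];
                    ++count;
                }
            }
        out[y * width + x] = (unsigned char)(sum / count);
    }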

250 citations

Book ChapterDOI
20 Mar 2010
TL;DR: An automatic code transformation system that generates parallel CUDA code from sequential input C code for regular (affine) programs; the generated code's performance is quite close to that of hand-optimized CUDA code and considerably better than the benchmarks' performance on a multicore CPU.
Abstract: Graphics Processing Units (GPUs) offer tremendous computational power. CUDA (Compute Unified Device Architecture) provides a multi-threaded parallel programming model, facilitating high performance implementations of general-purpose computations. However, the explicitly managed memory hierarchy and multi-level parallel view make manual development of high-performance CUDA code rather complicated. Hence the automatic transformation of sequential input programs into efficient parallel CUDA programs is of considerable interest. This paper describes an automatic code transformation system that generates parallel CUDA code from input sequential C code, for regular (affine) programs. Using and adapting publicly available tools that have made polyhedral compiler optimization practically effective, we develop a C-to-CUDA transformation system that generates two-level parallel CUDA code that is optimized for efficient data access. The performance of automatically generated code is compared with manually optimized CUDA code for a number of benchmarks. The performance of the automatically generated CUDA code is quite close to hand-optimized CUDA code and considerably better than the benchmarks' performance on a multicore CPU.
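
To make "regular (affine) programs" concrete: these are loop nests whose bounds and array subscripts are affine functions of the loop indices, which is what lets polyhedral tools analyze and parallelize them automatically. The sketch below shows the shape of such a transformation on a trivial SAXPY loop; it is hand-written, and the system's actual output is tiled, two-level parallel code optimized for data access.

    // Input: a regular (affine) sequential loop nest in C.
    void saxpy_seq(int n, float a, const float *x, float *y) {
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    // Output shape: the parallel loop dimension mapped onto CUDA threads.
    // (Hand-written sketch; generated code would also tile and stage data.)
    __global__ void saxpy_cuda(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    // Illustrative launch: saxpy_cuda<<<(n + 255) / 256, 256>>>(n, a, d_x, d_y);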

229 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations (84% related)
Cache: 59.1K papers, 976.6K citations (84% related)
Mobile computing: 51.3K papers, 1M citations (81% related)
Scheduling (computing): 78.6K papers, 1.3M citations (79% related)
Web service: 57.6K papers, 989K citations (79% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2021    1
2020    3
2019    7
2018    7
2017    45
2016    54