Journal Article

GPU Implementation of Image Convolution Using Sparse Model with Efficient Storage Format

01 Jan 2018, Vol. 10, Iss. 1, pp. 54-70
Abstract: With the growth of data-parallel computing, GPU computing for non-graphics applications such as image processing has become a focus of research. Convolution is an integral operation in filtering, smoothing, and edge detection. In this article, convolution is formulated as a sparse linear system and computed using Sparse Matrix-Vector Multiplication (SpMV). SpMV using the Compressed Sparse Row (CSR) format shows better CPU performance than conventional convolution. To overcome the stalling of threads on short rows in the GPU implementation of CSR SpMV, a more efficient model is proposed that uses the Adaptive Compressed Row Storage (A-CSR) format. Using CSR in the convolution process yields speedups of 1.45x for image smoothing and 1.159x for edge detection over conventional convolution. On the GPU platform, the adaptive CSR format achieves an average speedup of 2.05x for image smoothing and 1.58x for edge detection.
Keywords: Convolution, CSR, Edge Detection and Image Smoothing, GPU, SpMV
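As a hedged illustration of the sparse formulation described in the abstract (not the authors' code, and not their A-CSR kernel), the Python sketch below builds the CSR arrays of a matrix A whose product with the row-major flattened image gives a same-size filtered image, then multiplies with a plain CSR SpMV loop. The function names, zero padding at the borders, and the use of correlation (kernel not flipped, which equals convolution for point-symmetric kernels such as a box blur or the Laplacian) are assumptions of this sketch.

```python
from typing import List, Tuple

Image = List[List[float]]


def conv_matrix_csr(h: int, w: int, kernel: Image) -> Tuple[List[float], List[int], List[int]]:
    """Return CSR arrays (values, col_idx, row_ptr) of the (h*w) x (h*w) matrix A
    such that A @ vec(img) is the zero-padded, same-size correlation of img with kernel."""
    kh, kw = len(kernel), len(kernel[0])
    ry, rx = kh // 2, kw // 2  # kernel anchored at its center
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for y in range(h):
        for x in range(w):  # one matrix row per output pixel, row-major order
            for dy in range(kh):
                for dx in range(kw):
                    sy, sx = y + dy - ry, x + dx - rx
                    # Skip taps outside the image (zero padding) and zero kernel weights.
                    if 0 <= sy < h and 0 <= sx < w and kernel[dy][dx] != 0:
                        values.append(kernel[dy][dx])
                        col_idx.append(sy * w + sx)
            row_ptr.append(len(values))  # end of this pixel's row
    return values, col_idx, row_ptr


def spmv_csr(values: List[float], col_idx: List[int], row_ptr: List[int], x: List[float]) -> List[float]:
    """y = A @ x with A in CSR form: one dot product per matrix row."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def direct_filter(img: Image, kernel: Image) -> Image:
    """Reference: the same zero-padded correlation computed directly on the image."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ry, rx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(kh):
                for dx in range(kw):
                    sy, sx = y + dy - ry, x + dx - rx
                    if 0 <= sy < h and 0 <= sx < w:
                        out[y][x] += kernel[dy][dx] * img[sy][sx]
    return out


if __name__ == "__main__":
    img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    laplacian = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
    vals, cols, ptr = conv_matrix_csr(4, 4, laplacian)
    flat = [p for row in img for p in row]
    print(spmv_csr(vals, cols, ptr, flat))
```

Every matrix row holds at most kh*kw entries (at most 9 for a 3x3 kernel), so the rows are short and nearly uniform in length; that is the short-row situation the abstract says stalls threads in a GPU CSR SpMV.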
Citations
Journal Article
TL;DR: In this article, the authors use convolution-based filtering to correct degraded images; such filtering can be used for smoothing, sharpening, noise reduction, and edge detection.
Abstract: Image filtering is a common technique in digital image processing that can be used to change the aesthetic appearance of a picture. Noise, that is, distracting visual artifacts, can lower the overall quality of a picture, which is why image enhancement techniques are required. Filtering can be used in a variety of ways, including smoothing, sharpening, noise reduction, and edge detection. In this piece, we use convolutional techniques to correct degraded images. The first step is a point-by-point multiplication of the frequency-domain representation of the input image by a black image that has a small white rectangle at its center. This filter keeps only the lowest harmonics and removes the higher ones. Because the high frequencies of the input image are filtered out, the resulting spatial-domain image should look like a blurrier version of the original. A larger white rectangle W therefore preserves more detail, because more of the high-frequency components of the input image I are retained.
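The frequency-domain step described in this abstract amounts to an ideal low-pass filter, sketched below in Python. This is a minimal illustration under our own assumptions, not the authors' code: it uses a naive DFT so it stays dependency-free (practical code would use an FFT), the function names are hypothetical, and the "white rectangle" is expressed as a bound on signed frequency indices, which corresponds to a centered rectangle once the spectrum is shifted so that zero frequency sits in the middle.

```python
import cmath
from typing import List


def dft2(a: List[List[complex]], inverse: bool = False) -> List[List[complex]]:
    """Naive 2D DFT, O(n^2 m^2): adequate only for tiny illustrative images."""
    n, m = len(a), len(a[0])
    sign = 1 if inverse else -1
    out = [[0j] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            s = 0j
            for x in range(n):
                for y in range(m):
                    s += a[x][y] * cmath.exp(sign * 2j * cmath.pi * (u * x / n + v * y / m))
            out[u][v] = s / (n * m) if inverse else s  # 1/(nm) normalization on the inverse
    return out


def signed_freq(k: int, size: int) -> int:
    """Map DFT index k to a signed frequency; 0 is the center of a shifted spectrum."""
    return k if k <= size // 2 else k - size


def ideal_lowpass(img: List[List[float]], ry: int, rx: int) -> List[List[float]]:
    """Keep frequencies with |fy| <= ry and |fx| <= rx (the white rectangle); zero the rest."""
    n, m = len(img), len(img[0])
    spectrum = dft2([[complex(p) for p in row] for row in img])
    for u in range(n):
        for v in range(m):
            if abs(signed_freq(u, n)) > ry or abs(signed_freq(v, m)) > rx:
                spectrum[u][v] = 0j  # black part of the mask
    back = dft2(spectrum, inverse=True)
    # The mask is symmetric under (u, v) -> (-u, -v), so the result is real up to rounding.
    return [[z.real for z in row] for row in back]
```

With ry = rx = 0 only the zero-frequency (mean) component survives and the output is a flat image at the input's mean; widening the rectangle to cover every index returns the input unchanged, matching the abstract's remark that a larger W preserves more detail.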

1 citation

References
Journal Article
TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described, and a new multilevel coarsening scheme is proposed to facilitate graph visualization of its matrices.
Abstract: We describe the University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications. The Collection is widely used by the numerical linear algebra community for the development and performance evaluation of sparse matrix algorithms. It allows for robust and repeatable experiments: robust because performance results with artificially generated matrices can be misleading, and repeatable because matrices are curated and made publicly available in many formats. Its matrices cover a wide spectrum of domains, including those arising from problems with underlying 2D or 3D geometry (such as structural engineering, computational fluid dynamics, model reduction, electromagnetics, semiconductor devices, thermodynamics, materials, acoustics, computer graphics/vision, robotics/kinematics, and other discretizations) and those that typically do not have such geometry (optimization, circuit simulation, economic and financial modeling, theoretical and quantum chemistry, chemical process simulation, mathematics and statistics, power networks, and other networks and graphs). We provide software for accessing and managing the Collection from MATLAB™, Mathematica™, Fortran, and C, as well as an online search capability. Graph visualization of the matrices is provided, and a new multilevel coarsening scheme is proposed to facilitate this task.
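The abstract notes that the Collection's matrices are distributed in several formats. As a hedged sketch of getting one into the CSR arrays used for SpMV, the Python below parses the text-based Matrix Market coordinate format; the choice of that format as the example, the function name, and the restriction to real/integer/pattern fields with general/symmetric storage are simplifications of this sketch, not the Collection's own software.

```python
from typing import List, Tuple


def read_matrix_market_csr(text: str) -> Tuple[int, int, List[float], List[int], List[int]]:
    """Parse a Matrix Market 'coordinate' file into (n_rows, n_cols, values, col_idx, row_ptr).
    Handles real/integer/pattern fields and general/symmetric storage only."""
    lines = text.splitlines()
    header = lines[0].lower().split()
    if len(header) < 5 or header[:3] != ["%%matrixmarket", "matrix", "coordinate"]:
        raise ValueError("expected a '%%MatrixMarket matrix coordinate ...' header")
    field, symmetry = header[3], header[4]
    if field not in ("real", "integer", "pattern") or symmetry not in ("general", "symmetric"):
        raise ValueError(f"unsupported field/symmetry: {field}/{symmetry}")
    # Skip comment lines (starting with '%') and blank lines after the header.
    body = [ln for ln in lines[1:] if ln.strip() and not ln.lstrip().startswith("%")]
    n_rows, n_cols, nnz = (int(t) for t in body[0].split())
    triples: List[Tuple[int, int, float]] = []
    for ln in body[1:1 + nnz]:
        parts = ln.split()
        i, j = int(parts[0]) - 1, int(parts[1]) - 1  # indices in the file are 1-based
        v = 1.0 if field == "pattern" else float(parts[2])
        triples.append((i, j, v))
        if symmetry == "symmetric" and i != j:
            triples.append((j, i, v))  # only one triangle is stored; mirror it
    triples.sort()  # row-major order, columns ascending within each row
    row_ptr = [0] * (n_rows + 1)
    for i, _, _ in triples:
        row_ptr[i + 1] += 1  # count entries per row
    for r in range(n_rows):
        row_ptr[r + 1] += row_ptr[r]  # prefix sum turns counts into row offsets
    col_idx = [j for _, j, _ in triples]
    values = [v for _, _, v in triples]
    return n_rows, n_cols, values, col_idx, row_ptr
```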

3,456 citations

Journal Article
TL;DR: In this article, a new version of the Perona and Malik theory for edge detection and image restoration is proposed, which keeps all the improvements of the original model and avoids its drawbacks.
Abstract: A new version of the Perona and Malik theory for edge detection and image restoration is proposed. This new version keeps all the improvements of the original model and avoids its drawbacks: it is proved to be stable in the presence of noise, with existence and uniqueness results. Numerical experiments on natural images are presented.
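The abstract does not state its modified equation, so the Python sketch below only illustrates the original Perona-Malik explicit scheme that this paper builds on: nonlinear diffusion whose conductance g shrinks across large intensity differences, so edges diffuse far less than flat regions. The function name, the parameters kappa and lam, the choice g(s) = 1/(1 + (s/kappa)^2), and the reflecting boundary are assumptions of this sketch.

```python
from typing import List


def perona_malik_step(img: List[List[float]], kappa: float, lam: float = 0.2) -> List[List[float]]:
    """One explicit step of the original Perona-Malik scheme on a 4-neighbour grid:
        I <- I + lam * sum_d g(|D_d I|) * D_d I,   g(s) = 1 / (1 + (s / kappa)^2).
    Neighbours outside the image contribute no flux (reflecting boundary)."""
    if not 0.0 < lam <= 0.25:
        raise ValueError("the explicit 4-neighbour scheme needs 0 < lam <= 1/4 for stability")
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(h):
        for j in range(w):
            flux = 0.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    d = img[ni][nj] - img[i][j]  # directional difference D_d I
                    flux += d / (1.0 + (d / kappa) ** 2)  # g(|d|) * d: small across strong edges
            out[i][j] = img[i][j] + lam * flux
    return out
```

On a step edge of height 100 with kappa = 10, the pixel next to the edge moves by about 0.2 per step instead of the 20 that linear diffusion (g = 1) would give. Because each pairwise flux enters its two pixels with opposite signs, the total intensity is conserved.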

2,565 citations

Nathan Bell, Michael Garland
01 Jan 2008
TL;DR: The authors develop data structures and algorithms for SpMV that are efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU, including methods that exploit several common forms of matrix structure and alternatives that accommodate greater irregularity.
Abstract: The massive parallelism of graphics processing units (GPUs) offers tremendous performance in many high-performance computing applications. While dense linear algebra readily maps to such platforms, harnessing this potential for sparse matrix computations presents additional challenges. Given its role in iterative methods for solving sparse linear systems and eigenvalue problems, sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In this paper we discuss data structures and algorithms for SpMV that are efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU. Given the memory-bound nature of SpMV, we emphasize memory bandwidth efficiency and compact storage formats. We consider a broad spectrum of sparse matrices, from those that are well-structured and regular to highly irregular matrices with large imbalances in the distribution of nonzeros per matrix row. We develop methods to exploit several common forms of matrix structure while offering alternatives which accommodate greater irregularity. On structured, grid-based matrices we achieve performance of 36 GFLOP/s in single precision and 16 GFLOP/s in double precision on a GeForce GTX 280 GPU. For unstructured finite-element matrices, we observe performance in excess of 15 GFLOP/s and 10 GFLOP/s in single and double precision respectively. These results compare favorably to prior state-of-the-art studies of SpMV methods on conventional multicore processors. Our double precision SpMV performance is generally two and a half times that of a Cell BE with 8 SPEs and more than ten times greater than that of a quad-core Intel Clovertown system.
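The abstract mentions compact formats that exploit regular matrix structure without naming them. One padded format commonly used for such matrices on GPUs is ELLPACK (ELL); the Python sketch below is our own illustration under that assumption (hypothetical function names, serial loops standing in for GPU threads), not code from the paper.

```python
from typing import List, Tuple


def csr_to_ell(values: List[float], col_idx: List[int],
               row_ptr: List[int]) -> Tuple[List[List[float]], List[List[int]]]:
    """Convert CSR to ELLPACK: every row is padded to the longest row's length.
    Storage is slot-major: data[k][i] is the k-th stored entry of row i, so for a
    fixed k the entries of consecutive rows are contiguous (coalesced loads when
    one GPU thread handles one row)."""
    n_rows = len(row_ptr) - 1
    width = max((row_ptr[i + 1] - row_ptr[i] for i in range(n_rows)), default=0)
    # Padding slots hold value 0.0 at column 0, so they add nothing to the product.
    data = [[0.0] * n_rows for _ in range(width)]
    cols = [[0] * n_rows for _ in range(width)]
    for i in range(n_rows):
        for k, p in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            data[k][i] = values[p]
            cols[k][i] = col_idx[p]
    return data, cols


def spmv_ell(data: List[List[float]], cols: List[List[int]],
             x: List[float], n_rows: int) -> List[float]:
    """y = A @ x for A in ELL form; the inner loop over rows i is what GPU threads run in parallel."""
    y = [0.0] * n_rows
    for k in range(len(data)):
        for i in range(n_rows):
            y[i] += data[k][i] * x[cols[k][i]]
    return y
```

ELL pays off when row lengths are nearly uniform, as with grid stencils; one very long row forces every row to be padded to its length, which is why irregular matrices call for other formats.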

795 citations

Journal Article
TL;DR: This review presents the past and present work on GPU accelerated medical image processing, and is meant to serve as an overview and introduction to existing GPU implementations.

360 citations

Proceedings Article
16 Nov 2014
TL;DR: This work proposes a novel algorithm, CSR-Adaptive, which keeps the CSR format intact and maps well to GPUs, and achieves an average speedup of 14.7× over existing CSR-based algorithms and 2.3× over clSpMV cocktail, which uses an assortment of matrix formats.
Abstract: The performance of sparse matrix vector multiplication (SpMV) is important to computational scientists. Compressed sparse row (CSR) is the most frequently used format to store sparse matrices. However, CSR-based SpMV on graphics processing units (GPUs) has poor performance due to irregular memory access patterns, load imbalance, and reduced parallelism. This has led researchers to propose new storage formats. Unfortunately, dynamically transforming CSR into these formats has significant runtime and storage overheads. We propose a novel algorithm, CSR-Adaptive, which keeps the CSR format intact and maps well to GPUs. Our implementation addresses the aforementioned challenges by (i) efficiently accessing DRAM by streaming data into the local scratchpad memory and (ii) dynamically assigning different numbers of rows to each parallel GPU compute unit. CSR-Adaptive achieves an average speedup of 14.7× over existing CSR-based algorithms and 2.3× over clSpMV cocktail, which uses an assortment of matrix formats.
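The two ideas in this abstract, streaming a block's data into scratchpad memory and assigning a variable number of rows per compute unit, can be sketched serially in Python. This is a simplified illustration based only on the abstract, not the published kernel: the capacity parameter (how many nonzeros fit in the scratchpad per block), the function names, and the treatment of a row longer than the capacity (placed in its own block and reduced serially here) are assumptions.

```python
from typing import List


def build_row_blocks(row_ptr: List[int], capacity: int) -> List[int]:
    """Group consecutive rows so each block's nonzeros fit in `capacity` scratchpad slots.
    Returns boundaries b: block t covers rows b[t] .. b[t+1]-1. A row longer than
    `capacity` is placed alone in its own block."""
    n_rows = len(row_ptr) - 1
    blocks = [0]
    start = i = 0
    while i < n_rows:
        nnz = row_ptr[i + 1] - row_ptr[start]  # nonzeros in rows start..i
        if nnz <= capacity:
            i += 1  # row i fits; try to extend the block
        elif i == start:
            i += 1  # a single over-long row gets a block of its own
            blocks.append(i)
            start = i
        else:
            blocks.append(i)  # close the block just before row i
            start = i
    if start < n_rows:
        blocks.append(n_rows)
    return blocks


def spmv_csr_adaptive(values: List[float], col_idx: List[int], row_ptr: List[int],
                      x: List[float], capacity: int) -> List[float]:
    """Serial stand-in for a block-per-compute-unit CSR SpMV: each block first streams
    its products into a local buffer (the 'scratchpad'), then reduces every row from it.
    For an over-long row the buffer exceeds capacity; a GPU would need another strategy."""
    y = [0.0] * (len(row_ptr) - 1)
    blocks = build_row_blocks(row_ptr, capacity)
    for b in range(len(blocks) - 1):
        first, last = blocks[b], blocks[b + 1]
        base = row_ptr[first]
        local = [values[k] * x[col_idx[k]] for k in range(base, row_ptr[last])]
        for r in range(first, last):
            y[r] = sum(local[row_ptr[r] - base:row_ptr[r + 1] - base], 0.0)
    return y
```

Because blocks are built from the unchanged row_ptr array, the CSR data itself is never converted, which mirrors the abstract's point that CSR-Adaptive keeps the CSR format intact.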

182 citations