Proceedings Article

A Scalable LDPC Decoder on GPU

TL;DR
This paper proposes a compact data packing scheme to reduce the number of global memory accesses and a parity-check matrix representation to reduce constant memory latency. The resulting decoder achieves a throughput of 160 Mbps, which is comparable to dedicated hardware solutions.
Abstract
A flexible and scalable approach for LDPC decoding on a CUDA-based Graphics Processing Unit (GPU) is presented in this paper. Layered decoding is a popular method for LDPC decoding and is known for its fast convergence. However, efficient implementation of the layered decoding algorithm on a GPU is challenging due to the limited amount of data parallelism available in this algorithm. To overcome this problem, a kernel execution configuration that can decode multiple codewords simultaneously on the GPU is developed. This paper proposes a compact data packing scheme to reduce the number of global memory accesses and a parity-check matrix representation to reduce constant memory latency. Global memory bandwidth efficiency is improved by coalescing simultaneous memory accesses of threads in a half-warp into a single memory transaction. Asynchronous data transfers are used to hide host memory latency by overlapping kernel execution with data transfers between CPU and GPU. The proposed GPU implementation of the LDPC decoder performs two orders of magnitude faster than an LDPC decoder on a CPU and four times faster than the previously reported LDPC decoder on a GPU. This implementation achieves a throughput of 160 Mbps, which is comparable to dedicated hardware solutions.
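
As a rough illustration of the layered decoding mentioned in the abstract, the following Python sketch runs a layered min-sum decoder over a small batch of codewords. It is not the paper's CUDA implementation: the toy (7,4) Hamming parity-check matrix, the min-sum check-node update, and the names `H_ROWS`, `syndrome_is_zero`, `layered_min_sum`, and `decode_batch` are all assumptions made for this example. The batch is decoded sequentially here, whereas the paper decodes multiple codewords simultaneously on the GPU.

```python
# Hypothetical toy parity-check matrix (a (7,4) Hamming code), stored as the
# list of bit indices that take part in each check row. This is an assumption
# for illustration, not the code used in the paper.
H_ROWS = [
    [0, 1, 3, 4],
    [0, 2, 3, 5],
    [1, 2, 3, 6],
]


def syndrome_is_zero(bits, rows=H_ROWS):
    """Return True when every parity check is satisfied."""
    return all(sum(bits[n] for n in row) % 2 == 0 for row in rows)


def layered_min_sum(channel_llr, rows=H_ROWS, max_iters=10):
    """Decode one codeword; a positive LLR means bit 0 is more likely."""
    total = list(channel_llr)                       # a posteriori LLR per bit
    check_msg = [[0.0] * len(row) for row in rows]  # check-to-bit messages

    bits = [1 if v < 0 else 0 for v in total]
    for _ in range(max_iters):
        if syndrome_is_zero(bits, rows):
            break                                   # valid codeword: stop early
        for m, row in enumerate(rows):              # one layer per check row
            # Remove this layer's previous contribution from the totals.
            extrinsic = [total[n] - check_msg[m][i] for i, n in enumerate(row)]
            for i, n in enumerate(row):
                others = extrinsic[:i] + extrinsic[i + 1:]
                sign = -1.0 if sum(v < 0 for v in others) % 2 else 1.0
                new_msg = sign * min(abs(v) for v in others)
                check_msg[m][i] = new_msg
                # The updated total is visible to the next layer right away,
                # which is what gives layered decoding its fast convergence.
                total[n] = extrinsic[i] + new_msg
        bits = [1 if v < 0 else 0 for v in total]
    return bits


def decode_batch(batch_llr, rows=H_ROWS, max_iters=10):
    """Decode several codewords; each one is independent of the others."""
    return [layered_min_sum(llr, rows, max_iters) for llr in batch_llr]
```

Because the codewords in a batch do not interact, `decode_batch` exposes parallelism that a single layered decode lacks, which is the motivation the abstract gives for decoding multiple codewords at once.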


Citations
Proceedings Article

High throughput low latency LDPC decoding on GPU for SDR systems

TL;DR: This paper presents optimization techniques for a parallel LDPC decoder including algorithm optimization, fully coalesced memory access, asynchronous data transfer and multi-stream concurrent kernel execution for modern GPU architectures.
Journal Article

High Throughput LDPC Decoder on GPU

TL;DR: This work designed a multi-codeword parallel decoder with fully coalesced memory access that achieved more than 550Mbps throughput on Compute Unified Device Architecture (CUDA) enabled GPU.
Proceedings Article

A multi-standard efficient column-layered LDPC decoder for Software Defined Radio on GPUs

TL;DR: A multi-standard high-throughput column-layered (CL) low-density parity-check (LDPC) decoder for Software-Defined Radio (SDR) on a Graphics Processing Unit (GPU) platform is presented; it can achieve a performance improvement of 3.0x compared to the existing fastest GPU-based implementation.
Journal Article

Complexity analysis of software defined DVB-T2 physical layer

TL;DR: A complexity analysis of the software defined implementation of the modulator/demodulator parts of a DVB-T2 transmitter and receiver is performed and implementing these computationally heavy blocks on other architectures that would still allow them to be implemented in software and thus be easily reconfigurable is discussed.
Journal Article

A Survey on Programmable LDPC Decoders

TL;DR: A survey of the most relevant publications of the past decade on programmable LDPC decoders. It looks at the advantages and disadvantages of parallel architectures and data-parallel programming models, and assesses how design space exploration is pursued with regard to key characteristics of the underlying code and features of the decoding algorithm.