A High-Performance Energy-Efficient Architecture for FIR Adaptive Filter Based on New Distributed Arithmetic Formulation of Block LMS Algorithm

doi:10.1109/TSP.2012.2226453

Journal Article

Basant Kumar Mohanty, +1 more

01 Feb 2013, IEEE Transactions on Signal Processing, Vol. 61, Iss. 4, pp. 921-932

TLDR

The paper presents an efficient distributed-arithmetic (DA) formulation of the block least mean square (BLMS) algorithm. It uses a novel look-up table (LUT)-sharing technique to compute the filter outputs and the weight-increment terms, and it saves a significant number of adders, which constitute a major component of DA-based structures.

Abstract:

In this paper, we present an efficient distributed-arithmetic (DA) formulation for the implementation of the block least mean square (BLMS) algorithm. The proposed DA-based design uses a novel look-up table (LUT)-sharing technique for the computation of the filter outputs and weight-increment terms of the BLMS algorithm. It also offers a significant saving of adders, which constitute a major component of DA-based structures. In addition, we suggest a novel LUT-based weight-updating scheme for the BLMS algorithm, in which only one set of LUTs out of M sets needs to be modified in every iteration, where N = ML, and N and L are, respectively, the filter length and the input block size. Based on the proposed DA formulation, we derive a parallel architecture for the implementation of a BLMS adaptive digital filter (ADF). Compared with the best of the existing DA-based LMS structures, the proposed one involves nearly L/6 times as many adders and L times as many LUT words, and offers nearly L times the throughput. It requires nearly 25% more flip-flops and does not involve variable shifters like those of the existing structures. It involves fewer LUT accesses per output (LAPO) than the existing structure for block sizes larger than 4. For block size 8 and filter length 64, the proposed structure involves 2.47 times more adders, 15% more flip-flops, and 43% less LAPO than the best of the existing structures, and offers 5.22 times higher throughput. The number of adders of the proposed structure does not increase proportionately with block size, and the number of flip-flops is independent of block size. This is a major advantage of the proposed structure for reducing its area-delay product (ADP), particularly when a large-order ADF is implemented for higher block sizes. ASIC synthesis results show that the proposed structure for filter length 64 has almost 14% and 30% less ADP, and 25% and 37% less EPO, than the best of the existing structures for block sizes 4 and 8, respectively.
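The abstract names two ingredients, a LUT-based distributed-arithmetic inner product and a block LMS weight update, without giving the formulation itself. The following is only a minimal Python sketch of the textbook versions of those two ideas, not the paper's LUT-sharing scheme or its architecture. The function names (`build_lut`, `da_inner_product`, `blms_block`), the single 2^N-word LUT, the two's-complement integer inputs, and the step size `mu` are all assumptions made for illustration.

```python
def build_lut(weights):
    """Precompute every partial sum of the weights.

    Bit i of the LUT address selects weights[i], so the table has
    2**len(weights) words (only practical for short weight groups).
    """
    n = len(weights)
    return [sum(w for i, w in enumerate(weights) if (addr >> i) & 1)
            for addr in range(1 << n)]


def da_inner_product(lut, samples, bits):
    """Inner product of the LUT's weights with `bits`-bit two's-complement samples.

    One LUT read per bit plane replaces the multiplications; the
    multiplication by 2**b below is a plain shift in hardware.
    """
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one LUT address.
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> b) & 1) << i
        term = lut[addr] * (1 << b)
        # The most significant bit carries negative weight in two's complement.
        acc += -term if b == bits - 1 else term
    return acc


def blms_block(weights, input_vectors, desired, mu, bits):
    """One generic BLMS iteration over a block of L samples.

    input_vectors holds L tap-input vectors (each of length N), one per
    output of the block. All L outputs use the same weights, and the
    weights are updated once per block from the accumulated error terms.
    """
    lut = build_lut(weights)  # weights are fixed within a block
    outputs = [da_inner_product(lut, x_vec, bits) for x_vec in input_vectors]
    errors = [d - y for d, y in zip(desired, outputs)]
    new_weights = [
        w + mu * sum(e * x_vec[i] for e, x_vec in zip(errors, input_vectors))
        for i, w in enumerate(weights)
    ]
    return outputs, errors, new_weights
```

As a usage example, `da_inner_product(build_lut([3, -1, 2, 5]), [7, -8, 0, -3], 4)` reads the LUT four times, once per bit plane, and should agree with the direct sum of products of the same weights and samples. Rebuilding the whole LUT every block, as `blms_block` does here, is the naive approach. The abstract states that the proposed scheme modifies only one of M sets of LUTs per iteration, which this sketch does not attempt to reproduce.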
