Showing papers by "Naresh R. Shanbhag" published in 2001


Journal ArticleDOI
TL;DR: New high-speed VLSI architectures for decoding Reed-Solomon codes with the Berlekamp-Massey algorithm are presented, which require approximately 25% fewer multipliers and a simpler control structure than the architectures based on the popular extended Euclidean algorithm.
Abstract: New high-speed VLSI architectures for decoding Reed-Solomon codes with the Berlekamp-Massey algorithm are presented in this paper. The speed bottleneck in the Berlekamp-Massey algorithm is in the iterative computation of discrepancies followed by the updating of the error-locator polynomial. This bottleneck is eliminated via a series of algorithmic transformations that result in a fully systolic architecture in which a single array of processors computes both the error-locator and the error-evaluator polynomials. In contrast to conventional Berlekamp-Massey architectures in which the critical path passes through two multipliers and 1 + ⌈log₂(t+1)⌉ adders, the critical path in the proposed architecture passes through only one multiplier and one adder, which is comparable to the critical path in architectures based on the extended Euclidean algorithm. More interestingly, the proposed architecture requires approximately 25% fewer multipliers and a simpler control structure than the architectures based on the popular extended Euclidean algorithm. For block-interleaved Reed-Solomon codes, embedding the interleaver memory into the decoder results in a further reduction of the critical path delay to just one XOR gate and one multiplexer, leading to speed-ups of as much as an order of magnitude over conventional architectures.

335 citations
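
The two steps that the abstract above names as the bottleneck (computing a discrepancy, then updating the polynomial) can be seen in a minimal software sketch of the textbook Berlekamp-Massey iteration. This is not the paper's systolic architecture: it assumes the binary field GF(2) instead of the GF(2^m) symbols used by Reed-Solomon codes, and the function name `berlekamp_massey_gf2` is hypothetical.

```python
def berlekamp_massey_gf2(s):
    """Textbook Berlekamp-Massey over GF(2) (an illustrative assumption;
    Reed-Solomon decoders run the same recursion over GF(2^m)).

    Returns (C, L): the connection polynomial C as a list of bits with
    C[0] == 1, and the length L of the shortest LFSR that generates s.
    """
    n = len(s)
    C = [1] + [0] * n   # current connection polynomial
    B = [1] + [0] * n   # copy of C from before the last length change
    L = 0               # current LFSR length
    m = 1               # iterations since the last length change
    for i in range(n):
        # Step 1: discrepancy between s[i] and the LFSR's prediction.
        d = s[i]
        for j in range(1, L + 1):
            d ^= C[j] & s[i - j]
        if d == 0:
            m += 1
            continue
        # Step 2: polynomial update, C(x) <- C(x) + x^m * B(x).
        T = C[:]
        for j in range(n + 1 - m):
            C[j + m] ^= B[j]
        if 2 * L <= i:
            L = i + 1 - L
            B = T
            m = 1
        else:
            m += 1
    return C[:L + 1], L


# Bits produced by the recurrence s[i] = s[i-1] XOR s[i-3].
sequence = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0]
poly, length = berlekamp_massey_gf2(sequence)
```

Step 2 cannot start until step 1 has finished, and step 1 is a sum over up to L products. That serial dependence is the bottleneck the abstract refers to.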


Journal ArticleDOI
TL;DR: A prediction-based error-control scheme is proposed to enhance the performance of the filtering algorithm in the presence of errors due to soft computations, and algorithmic noise-tolerance schemes can also be used to improve the performance of DSP algorithms in the presence of bit-error rates of up to 10⁻³ due to deep submicron (DSM) noise.
Abstract: In this paper, we propose a framework for low-energy digital signal processing (DSP), where the supply voltage is scaled beyond the critical voltage imposed by the requirement to match the critical path delay to the throughput. This deliberate introduction of input-dependent errors leads to degradation in the algorithmic performance, which is compensated for via algorithmic noise-tolerance (ANT) schemes. The resulting setup, which comprises the DSP architecture operating at subcritical voltage and the error-control scheme, is referred to as soft DSP. The effectiveness of the proposed scheme is enhanced when arithmetic units with a higher "delay imbalance" are employed. A prediction-based error-control scheme is proposed to enhance the performance of the filtering algorithm in the presence of errors due to soft computations. For a frequency selective filter, it is shown that the proposed scheme provides 60-81% reduction in energy dissipation for filter bandwidths up to 0.5π (where 2π corresponds to the sampling frequency f_s) over that achieved via conventional architecture and voltage scaling, with a maximum of 0.5-dB degradation in the output signal-to-noise ratio (SNR_o). It is also shown that the proposed algorithmic noise-tolerance schemes can be used to improve the performance of DSP algorithms in the presence of bit-error rates of up to 10⁻³ due to deep submicron (DSM) noise.

278 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose a new circuit technique for designing noise-tolerant dynamic logic, motivated by the observation that voltage scaling aggravates the crosstalk noise problem and reduces circuit noise immunity.
Abstract: This paper describes a new circuit technique for designing noise-tolerant dynamic logic. It is shown that voltage scaling aggravates the crosstalk noise problem and reduces circuit noise immunity, motivating the need for noise-tolerant circuit design. In a 0.35-μm CMOS technology and at a given supply voltage, the proposed technique provides an improvement in noise immunity of 1.8× (for an AND gate) and 2.5× (for an adder carry chain) over domino at the same speed. A multiply-accumulate circuit has been designed and fabricated using a 0.35-μm process to verify this technique. Experimental results indicate that the proposed technique provides a significant improvement in the noise immunity of dynamic circuits (>2.4×) with only a modest increase in power dissipation (15%) and no loss in throughput.

66 citations


Journal ArticleDOI
01 Feb 2001
TL;DR: This paper presents an energy-optimized image transmission system for indoor wireless applications which exploits the variabilities in the image data and the wireless multipath channel by employing dynamic algorithm transformations and joint source-channel coding.
Abstract: In this paper, we focus on the total-system-energy minimization of a wireless image transmission system including both digital and analog components. Traditionally, digital power consumption has been ignored in system design, since transmit power has been the most significant component. However, as we move to an era of pico-cell environments and as more complex signal processing algorithms are being used at higher data rates, the digital power consumption of these systems becomes an issue. We present an energy-optimized image transmission system for indoor wireless applications which exploits the variabilities in the image data and the wireless multipath channel by employing dynamic algorithm transformations and joint source-channel coding. The variability in the image data is characterized by the rate-distortion curve, and the variability in the channel characteristics is characterized by the path-loss and impulse response of the channel. The system hardware configuration space is characterized by the error-correction capability of the channel encoder/decoder, number of powered-up fingers in the RAKE receiver, and transmit power of the power amplifier. An optimization algorithm is utilized to obtain energy-optimal configurations subject to end-to-end performance constraints. The proposed design is tested over QCIF images, IMT-2000 channels and 0.18μm, 2.5 V CMOS technology parameters. Simulation results over various images, various distances, two different channels, and two different rates show that the average energy savings in utilizing a total-system-energy minimization over a fixed system (designed for the worst image, the worst channel and the maximum distance) are 53.6% and 67.3%, respectively, for short-range (under 20 m) and long-range (over 20 m) systems.

49 citations


Proceedings ArticleDOI
01 Dec 2001
TL;DR: A low-power digital filtering technique based on voltage overscaling (VOS) and a novel algorithmic noise-tolerant (ANT) technique referred to as reduced precision redundancy (RPR) are proposed.
Abstract: We propose a low-power digital filtering technique based on voltage overscaling (VOS) and a novel algorithmic noise-tolerant (ANT) technique referred to as reduced precision redundancy (RPR). VOS implies scaling of the supply voltage beyond the critical voltage required for correct operation. RPR involves having a reduced precision replica whose output can be employed as the final output in case the original filter computes erroneously. In addition, an LSB estimator is also employed to compensate for information loss in the LSBs. For frequency selective filtering, it is shown that the proposed technique provides 30% energy savings over an optimally scaled (i.e., the supply voltage equals the critical voltage) present day system. Energy savings of up to 65% can be achieved if a SNR loss of 1.5 dB is tolerated.

27 citations
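
The replica idea in the abstract above can be sketched in a few lines. The abstract does not say how an erroneous output is detected or how the replica is built, so this sketch assumes a simple rule: the main output is rejected whenever it differs from the replica by more than the worst-case quantization error of the replica. The function names, the 3-bit truncation, the sample data and the injected bit flip are all hypothetical, and the LSB estimator mentioned in the abstract is not modelled.

```python
def fir(x, h):
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n-k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def rpr_correct(y_main, y_replica, threshold):
    """Keep each main output unless it is implausibly far from the replica.

    The threshold test is an assumed decision rule, not taken from the paper.
    """
    return [r if abs(m - r) > threshold else m
            for m, r in zip(y_main, y_replica)]


x = [100, 120, 90, 60, 80]   # non-negative input samples (made-up data)
h = [3, 2, 1]                # filter taps (made-up data)
shift = 3                    # replica drops the 3 least significant input bits

y_exact = fir(x, h)

# Reduced-precision replica: filter the truncated inputs, then rescale.
y_replica = [v << shift for v in fir([xi >> shift for xi in x], h)]

# Worst-case replica error: each input loses at most 2**shift - 1.
threshold = ((1 << shift) - 1) * sum(abs(c) for c in h)

# Emulate a voltage-overscaling timing error as a flipped high-order bit.
y_vos = list(y_exact)
y_vos[2] ^= 1 << 10

y_out = rpr_correct(y_vos, y_replica, threshold)
```

In this toy run the corrupted sample is replaced by the replica's coarser value, so the residual error is bounded by the replica's quantization error instead of by the magnitude of the flipped bit.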


Proceedings ArticleDOI
01 Jan 2001
TL;DR: An integrated circuit implementation of a soft DSP based low-power digital filter in a 0.35 μm, 3.3 V CMOS process is presented, which reduces energy dissipation over optimally voltage-scaled systems with less than 1 dB loss in SNR.
Abstract: In this paper we present an integrated circuit implementation of a soft DSP based low-power digital filter in a 0.35 μm, 3.3 V CMOS process. Soft DSP is a low-power technique that employs voltage overscaling (VOS) and algorithmic noise-tolerance (ANT) to push the limits of energy-efficiency beyond that achievable by voltage scaling alone. VOS refers to scaling the supply voltage beyond the limit imposed by the throughput constraints. ANT is an algorithmic level error-control technique that is employed to restore the algorithmic performance degradation in terms of output signal-to-noise ratio (SNR) caused by VOS. Measured results indicate 40%-67% reduction in energy dissipation over optimally voltage-scaled systems with less than 1 dB loss in SNR for a wide range of filter bandwidths (0.05 f_s to 0.25 f_s, where f_s is the sampling frequency).

12 citations


Proceedings ArticleDOI
06 Aug 2001
TL;DR: In this paper, a low-power technique, denoted as MIMO-AEC, is proposed to reduce energy dissipation in multi-input-multi-output (MIMO) signal processing systems.
Abstract: Presented in this paper is a low-power technique, denoted as MIMO-AEC, to reduce energy dissipation in multi-input-multi-output (MIMO) signal processing systems. The proposed technique extends a previously proposed adaptive error-cancellation (AEC) technique to MIMO systems by employing an algorithm transformation denoted as MIMO-DECOR. The purpose of MIMO-DECOR is to reduce complexity by exploiting correlations inherent in MIMO systems, thereby improving the effectiveness of AEC. We employ the MIMO-AEC in the design of a low-power Gigabit Ethernet 1000Base-T device. Simulation results demonstrate 44.3% - 25.2% overhead reduction due to MIMO-DECOR and 69.1% - 64.2% energy savings over conventional implementations with no loss in algorithmic performance.

10 citations


Proceedings Article
01 Jan 2001
TL;DR: In this paper, a 256x32b 4-read, 4-write ported register file for 6 GHz operation in 1.2 V, 0.13 μm technology is described.
Abstract: This paper describes a 256x32b 4-read, 4-write ported register file for 6 GHz operation in 1.2 V, 0.13 μm technology. The local bitline uses a pseudo-static leakage tolerant scheme to achieve 8% faster read performance and 36% higher DC noise robustness (with 6x active leakage reduction) compared to a dual-Vt scheme optimized for high performance.

9 citations


Proceedings ArticleDOI
09 May 2001
TL;DR: This paper presents the first integrated circuit implementation of a Hermitian decoder, thereby proving its practical viability; the chip architecture, based on Koetter's decoding algorithm, consists of an array of sixteen interdependent Berlekamp-Massey algorithm blocks.
Abstract: This paper presents the first integrated circuit implementation of a Hermitian decoder, thereby proving its practical viability. Hermitian codes provide much larger block lengths (n=4080) compared to that of the popular Reed-Solomon (RS) codes (n=256) over the same field (GF(256)). This translates to a coding gain of 0.6 dB for the same rate. However, Hermitian codes were deemed to be too complex to implement until the emergence of a recent algorithmic breakthrough which made the complexity of Hermitian decoders comparable to that of RS codes. Based on Koetter's decoding algorithm, the chip architecture consists of an array of sixteen interdependent Berlekamp-Massey algorithm (BMA) blocks. Thus, the same IC can be used for decoding RS codes as well. The decoder IC is designed in a 3.3 V, 0.35 μm, four-metal CMOS process and can correct up to t=60 errors per block of n=4080 words at a rate of 400 Mb/s. The IC prototype consumes 3.0 W with a 50 MHz clock.

7 citations


Proceedings Article
01 Jan 2001
TL;DR: This paper presents the first integrated circuit implementation of a Hermitian decoder, thereby proving its practical viability; the chip architecture, based on Koetter's decoding algorithm, consists of an array of sixteen interdependent Berlekamp-Massey algorithm (BMA) blocks.
Abstract: This paper presents the first integrated circuit implementation of a Hermitian decoder, thereby proving its practical viability. Hermitian codes provide much larger block lengths (n = 4080) compared to that of the popular Reed-Solomon (RS) codes (n = 256) over the same field (GF(256)). This translates to a coding gain of 0.6 dB for the same rate. However, Hermitian codes were deemed to be too complex to implement until the emergence of a recent algorithmic breakthrough which made the complexity of Hermitian decoders comparable to that of RS codes. Based on Koetter's decoding algorithm, the chip architecture consists of an array of sixteen interdependent Berlekamp-Massey algorithm (BMA) blocks. Thus, the same IC can be used for decoding RS codes as well. The decoder IC is designed in a 3.3 V, 0.35 μm, four-metal CMOS process and can correct up to t = 60 errors per block of n = 4080 words at a rate of 400 Mb/s. The IC prototype consumes 3.0 W with a 50 MHz clock.

7 citations


Proceedings ArticleDOI
26 Sep 2001
TL;DR: Simulation results show that the power consumption in a butterfly functional unit of an FFT processor can be reduced by 44% over a conventional voltage-scaled system without any SNR loss in the context of a typical orthogonal frequency division multiplexing (OFDM) based WLAN system.
Abstract: We propose a technique for designing low-power fast Fourier transform (FFT) processors with applications in next generation wireless LAN and wireless access systems. The proposed low-power technique is based on the general principle of soft digital signal processing where voltage overscaling (VOS) (scaling the supply voltage beyond the critical voltage V_dd-crit required for correct operation) is applied in conjunction with algorithmic noise-tolerance (ANT) techniques. We propose an ANT technique referred to as reduced precision redundancy for compensating the degradation in the signal-to-noise ratio (SNR) at an FFT output due to VOS. Simulation results using the proposed scheme with 0.25 μm standard CMOS technology show that the power consumption in a butterfly functional unit of an FFT processor can be reduced by 44% over a conventional voltage-scaled system without any SNR loss in the context of a typical orthogonal frequency division multiplexing (OFDM) based WLAN system.