Showing papers by "Keshab K. Parhi published in 2001"

PDF

Open Access

Proceedings Article•DOI•

On finite precision implementation of low density parity check codes decoder

[...]

Tong Zhang¹, Zhongfeng Wang¹, Keshab K. Parhi¹•Institutions (1)

06 May 2001

TL;DR: Simulation results indicate that the quantization scheme for the LDPC decoder is effective in approximating the infinite precision implementation, and 4 bits and 6 bits are adequate for representing the received data and extrinsic information.

...read moreread less

Abstract: In this paper, we analyze the finite precision effects on the decoding performance of Gallager's low density parity check (LDPC) codes and develop optimal finite word lengths of variables as far as the tradeoffs between the performance and hardware complexity are concerned. We have found that 4 bits and 6 bits are adequate for representing the received data and extrinsic information, respectively. Simulation results indicate that the quantization scheme we have developed for the LDPC decoder is effective in approximating the infinite precision implementation.

...read moreread less

122 citations

Journal Article•DOI•

Systematic design of original and modified Mastrovito multipliers for general irreducible polynomials

[...]

Tong Zhang¹, Keshab K. Parhi¹•Institutions (1)

University of Minnesota¹

01 Jul 2001-IEEE Transactions on Computers

TL;DR: A systematic design of modified Mastrovito multiplier, which is suitable for GF(2/sup m/) generated by high-Hamming weight irreducible polynomials is proposed, which effectively exploits the spatial correlation of elements in Mastovito product matrix to reduce the complexity.

...read moreread less

Abstract: This paper considers the design of bit-parallel dedicated finite field multipliers using standard basis. An explicit algorithm is proposed for efficient construction of Mastrovito product matrix, based on which we present a systematic design of Mastrovito multiplier applicable to GF(2/sup m/) generated by an arbitrary irreducible polynomial. This design effectively exploits the spatial correlation of elements in Mastrovito product matrix to reduce the complexity. Using a similar methodology, we propose a systematic design of modified Mastrovito multiplier, which is suitable for GF(2/sup m/) generated by high-Hamming weight irreducible polynomials. For both original and modified Mastrovito multipliers, the developed multiplier architectures are highly modular, which is desirable for VLSI hardware implementation. Applying the proposed algorithm and design approach, we study the Mastrovito multipliers for several special irreducible polynomials, such as trinomial and equally-spaced-polynomial, and the obtained complexity results match the best known results. Moreover, we have discovered several new special irreducible polynomials which also lead to low-complexity Mastrovito multipliers.

...read moreread less

116 citations

Proceedings Article•DOI•

Low-power 4-2 and 5-2 compressors

[...]

Krishnaswamy Venkatesh Prasad, Keshab K. Parhi

01 Jan 2001

TL;DR: This paper explores various low power higher order compressors such as 4-2 and 5-2 compressor units, which are building blocks for binary multipliers.

...read moreread less

Abstract: This paper explores various low power higher order compressors such as 4-2 and 5-2 compressor units. These compressors are building blocks for binary multipliers. Various circuit architectures for 4-2 compressors are compared with respect to their delay and power consumption. The different circuits are simulated using HSPICE. A new circuit for a 5-2 compressor is then presented which is 12% faster and consumes 37% less power.

...read moreread less

116 citations

Proceedings Article•DOI•

VLSI implementation-oriented (3, k)-regular low-density parity-check codes

[...]

Tong Zhang¹, Keshab K. Parhi•Institutions (1)

University of Minnesota¹

26 Sep 2001

TL;DR: This work proposes a joint code and decoder design approach to construct a class of (3, k)-regular LDPC codes which exactly fit a partly parallel decoder implementation and have a very good performance.

...read moreread less

Abstract: In the past few years, Gallager's low-density parity-check (LDPC) codes received a lot of attention and many efforts have been devoted to analyzing and improving their error-correcting performance. However, little consideration has been given to the LDPC decoder VLSI implementation. The straightforward fully parallel decoder architecture usually incurs too high complexity for many practical purposes and should be transformed to a partly parallel realization. Unfortunately, due to the randomness of LDPC codes, it is nearly impossible to develop an effective transformation for an arbitrarily given LDPC code. We propose a joint code and decoder design approach to construct a class of (3, k)-regular LDPC codes which exactly fit a partly parallel decoder implementation and have a very good performance. Moreover, for such LDPC codes, we propose a systematic, efficient encoding scheme by effectively exploiting the sparseness of its parity check matrix.

...read moreread less

91 citations

Patent•

System and method for high speed communications using digital signal processing

[...]

Oscar E. Agazzi¹, Gottfried Ungerboeck², Keshab K. Parhi², Christian A. J. Lutkemeyer², Pieter Vorenkamp², Kevin T. Chan², Myles H. Wakayama² - Show less +3 more•Institutions (2)

Broadcom¹, Avago Technologies²

28 Feb 2001

TL;DR: In this paper, various systems and methods related to equalization precoding in a communications channel are disclosed, in one implementation precoding is performed on signals transmitted over an optical channel.

...read moreread less

Abstract: Various systems and methods related to equalization precoding in a communications channel are disclosed In one implementation precoding is performed on signals transmitted over an optical channel In one implementation precoding and decoding operations are performed in parallel to facilitate high speed processing in relatively low cost circuits Initialization of the precoders may be realized by transmitting information related to the characteristics of the channel between transceiver pairs

...read moreread less

82 citations

Proceedings Article•DOI•

Joint code and decoder design for implementation-oriented (3, k)-regular LDPC codes

[...]

Tong Zhang¹, Keshab K. Parhi•Institutions (1)

University of Minnesota¹

01 Jan 2001

TL;DR: This work proposes a joint code and decoder design approach to construct a class of (3, k)-regular LDPC codes which exactly fit to a partly parallel decoder implementation.

...read moreread less

Abstract: Gallager's low-density parity-check (LDPC) codes have recently received a lot of attention because of their excellent performance. The decoder hardware implementation is obviously one of the most crucial issues determining the extent of LDPC applications in the real world. The straightforward fully parallel decoder architecture usually incurs too high complexity for many practical purposes and should be transformed to a partly parallel realization. We propose a joint code and decoder design approach to construct a class of (3, k)-regular LDPC codes which exactly fit to a partly parallel decoder implementation. The partly parallel decoder architecture is suitable for efficient VLSI implementation and it has been shown that the jointly developed (3, k)-regular LDPC codes have very good performance.

...read moreread less

45 citations

Journal Article•DOI•

Approaches to low-power implementations of DSP systems

[...]

Keshab K. Parhi¹•Institutions (1)

University of Minnesota¹

01 Oct 2001-IEEE Transactions on Circuits and Systems I-regular Papers

TL;DR: This paper reviews several approaches for low-power implementations of building blocks for digital subscriber line (DSL) systems and shows that use of separate Galois field functional units for multiply-accumulate and degree reduction can reduce the energy consumption of RS coders dramatically.

...read moreread less

Abstract: Reduction of power consumption is significantly important for all high-performance digital VLSI systems. This paper reviews several approaches for low-power implementations of building blocks for digital subscriber line (DSL) systems. Low-power implementations of Reed-Solomon (RS) coders, fast Fourier transforms (FFTs), FIR filters, and equalizers, and reduction of power consumption by use of dual supply voltages are addressed. It is shown that use of separate Galois field functional units for multiply-accumulate and degree reduction can reduce the energy consumption of RS coders dramatically. A hybrid feedforward and feedback commutator scheme-based FFT is shown to require less area and full hardware utilization efficiency. Reduction of switching activity at one or both inputs of the multipliers is a key to reduction of power consumption in FIR filters and equalizers. The switching activity can be reduced by use of transpose structure and by time-multiplexing of an unfolded filter. A well established retiming approach can be generalized to find those noncritical gates which can be operated with lower supply voltages to reduce the overall system power consumption.

...read moreread less

34 citations

Proceedings Article•DOI•

A class of efficient-encoding generalized low-density parity-check codes

[...]

Tong Zhang¹, Keshab K. Parhi•Institutions (1)

University of Minnesota¹

07 May 2001

TL;DR: It is shown that such GLD codes have equally good performance and a systematic approach to construct an approximate upper triangular GLD parity check matrix which defines a class of efficient-encodingGLD codes is proposed.

...read moreread less

Abstract: In this paper, we investigate an efficient encoding approach for generalized low-density (GLD) parity check codes, a generalization of Gallager's (1962, 1963) low-density parity check (LDPC) codes. We propose a systematic approach to construct an approximate upper triangular GLD parity check matrix which defines a class of efficient-encoding GLD codes. It is shown that such GLD codes have equally good performance. By effectively exploiting structure sharing in the encoding process, we also present a hardware/software codesign for practical encoder implementation of these efficient-encoding GLD codes.

...read moreread less

31 citations

Proceedings Article•DOI•

High-performance, low-complexity decoding of generalized low-density parity-check codes

[...]

Tong Zhang¹, Keshab K. Parhi•Institutions (1)

University of Minnesota¹

25 Nov 2001

TL;DR: It is shown that Max-Log-MAP is an attractive SISO decoding algorithm for GLD coding scheme, considering the trade-off between performance and complexity in the practical implementations, and two techniques are proposed to effectively reduce the decoding complexity without any performance degradation.

...read moreread less

Abstract: A class of pseudo-random compound error-correcting codes, called generalized low-density (GLD) parity-check codes, has been proposed recently. As a generalization of Gallager's low-density parity-check (LDPC) codes, GLD codes are also asymptotically good in the sense of minimum distance criterion and can be effectively decoded based on iterative soft-input soft-output (SISO) decoding of individual constituent codes. The code performance and decoding complexity of GLD codes are heavily dependent on the employed SISO decoding algorithm. In this paper, we show that Max-Log-MAP is an attractive SISO decoding algorithm for GLD coding scheme, considering the trade-off between performance and complexity in the practical implementations. A normalized Max-Log-MAP is presented to improve the GLD code performance significantly compared with using conventional Max-Log-MAP. Moreover, we propose two techniques, decoding task scheduling and reduced search Max-Log-MAP, to effectively reduce the decoding complexity without any performance degradation.

...read moreread less

21 citations

Proceedings Article•DOI•

Energy efficient signaling in deep submicron CMOS technology

[...]

Imed Ben Dhaou, Vijay Sundararajan¹, Hannu Tenhunen², Keshab K. Parhi•Institutions (2)

University of Minnesota¹, Royal Institute of Technology²

01 Jan 2001

TL;DR: An efficient technique for energy savings in DSM technology based on low-voltage signaling over long on-chip interconnect with repeater insertion to tolerate DSM noise and to achieve an acceptable delay is proposed.

...read moreread less

Abstract: In this paper we propose an efficient technique for energy savings in DSM technology The core of this method is based on low-voltage signaling over long on-chip interconnect with repeater insertion to tolerate DSM noise and to achieve an acceptable delay We elaborate a heuristic algorithm, called VIJIM, for repeater insertion VIJIM algorithm has been implemented to design a robust inverter chain for on-chip signaling using 025 /spl mu/m, 25 V, 6-metal-layers CMOS process An average of 70% of energy-saving has been achieved by reducing the supply voltage from 25 V down to 15 K

...read moreread less

19 citations

Proceedings Article•DOI•

Area-efficient high speed decoding schemes for turbo/MAP decoders

[...]

Zhongfeng Wang, Zhipei Chi, Keshab K. Parhi

01 Jan 2001

TL;DR: Two types of area-efficient parallel decoding schemes are proposed and the application of the pipeline-interleaving technique to parallel turbo decoding architectures is presented.

...read moreread less

Abstract: Turbo decoders inherently have a large latency and low throughput due to iterative decoding. To increase the throughput and reduce the latency, high speed decoding schemes have to be employed. In this paper, following a discussion on basic parallel decoding architectures, two types of area-efficient parallel decoding schemes are proposed. Detailed comparison on storage requirement, number of computation units and the overall decoding latency is provided for various decoding schemes with different levels of parallelism. Hybrid parallel decoding schemes are proposed as an attractive solution for very high level parallelism implementations. Simulation results demonstrate that the proposed area-efficient parallel decoding schemes introduce no performance degradation in general. The application of the pipeline-interleaving technique to parallel turbo decoding architectures is also presented.

...read moreread less

Proceedings Article•DOI•

A study on the performance, complexity tradeoffs of block turbo decoder design

[...]

Zhipei Chi¹, Leilei Song¹, Keshab K. Parhi•Institutions (1)

University of Minnesota¹

06 May 2001

TL;DR: In this article, the tradeoffs between VLSI implementation complexity and performance of block turbo decoder are investigated, and low complexity design strategies on choosing the scaling factor of the log extrinsic information, reducing the number of hard decision decodings and reducing the complexity of general hard-decision BCH decoders when softdecision decoding are utilized.

...read moreread less

Abstract: In this paper, results from a study of the tradeoffs between VLSI implementation complexity and performance of block turbo decoder are presented. Specifically, we address low complexity design strategies on choosing the scaling factor of the log extrinsic information, reducing the number of hard decision decodings and reducing the complexity of general hard-decision BCH decoders when soft-decision decodings are utilized.

...read moreread less

Custom VLSI design of efficient low latency and low power finite field multiplier for Reed-Solomon codec

[...]

Lijun Gao¹, Keshab K. Parhi¹•Institutions (1)

University of Minnesota¹

28 May 2001

TL;DR: The issue of VLSI design of low latency/low power finite field multipliers is addressed and methods from logic structure, circuit design and physical mapping aspects are presented and an irregular balanced-tree parallel multiplier is proposed.

...read moreread less

Abstract: The issue of VLSI design of low latency/low power finite field multipliers is addressed and methods from logic structure, circuit design and physical mapping aspects are presented. With proposed architecture and physical mapping, an irregular balanced-tree parallel multiplier con be implemented as easy as a regular multiplier. The custom VLSI implementations of these multipliers over GF(2/sup m/) show that the irregular multiplier has 53% smaller delay and 58% less power consumption than a regular multiplier.

...read moreread less

Proceedings Article•DOI•

Custom VLSI design of efficient low latency and low power finite field multiplier for Reed-Solomon codec

[...]

Lijun Gao¹, Keshab K. Parhi¹•Institutions (1)

University of Minnesota¹

06 May 2001

TL;DR: In this article, the issue of VLSI design of low latency/low power finite field multipliers is addressed and methods from logic structure, circuit design and physical mapping aspects are presented.

...read moreread less

Journal Article•DOI•

A unified algebraic transformation approach for parallel recursive and adaptive filtering and SVD algorithms

[...]

Jun Ma¹, Keshab K. Parhi, E.F. Deprettere¹•Institutions (1)

University of Minnesota¹

01 Feb 2001-IEEE Transactions on Signal Processing

TL;DR: A unified algebraic transformation approach is presented for designing parallel recursive and adaptive digital filters and singular value decomposition (SVD) algorithms, based on the explorations of some algebraic properties of the target algorithms' representations.

...read moreread less

Abstract: In this paper, a unified algebraic transformation approach is presented for designing parallel recursive and adaptive digital filters and singular value decomposition (SVD) algorithms. The approach is based on the explorations of some algebraic properties of the target algorithms' representations. Several typical modern digital signal processing examples are presented to illustrate the applications of the technique. They include the cascaded orthogonal recursive digital filter, the Givens rotation-based adaptive inverse QR algorithm for channel equalization, and the QR decomposition-based SVD algorithms. All three examples exhibit similar throughput constraints. There exist long feedback loops in the algorithms' signal flow graph representation, and the critical path is proportional to the size of the problem. Applying the proposed algebraic transformation techniques, parallel architectures are obtained for all three examples. For cascade orthogonal recursive filter, retiming transformation and orthogonal matrix decompositions (or pseudo-commutativity) are applied to obtain parallel filter architectures with critical path of five Givens rotations. For adaptive inverse QR algorithm, the commutativity and associativity of the matrix multiplications are applied to obtain parallel architectures with critical path of either four Givens rotations or three Givens rotations plus two multiply-add operations, whichever turns out to be larger. For SVD algorithms, retiming and associativity of the matrix multiplications are applied to derive parallel architectures with critical path of eight Givens rotations. The critical paths of all parallel architectures are independent of the problem size as compared with being proportional to the problem size in the original sequential algorithms. Parallelism is achieved at the expense of slight increase (or the same for the SVD case) in the algorithms' computational complexity.

...read moreread less

Proceedings Article•DOI•

A very low complexity block turbo decoder composed of extended Hamming codes

[...]

Yongrui Chen¹, Keshab K. Parhi•Institutions (1)

University of Minnesota¹

25 Nov 2001

TL;DR: This paper presents a very low complexity block turbo decoder composed of extended Hamming codes and new efficient complexity reduction algorithms are proposed including simplifying the extrinsic information computation and soft inputs updating algorithm.

...read moreread less

Abstract: This paper presents a very low complexity block turbo decoder composed of extended Hamming codes. New efficient complexity reduction algorithms are proposed including simplifying the extrinsic information computation and soft inputs updating algorithm. For performance evaluation, [eHamming (32,26,4)]/sup 2/ and [eHamming (64,57,4)]/sup 2/ block turbo code transmitted over an AWGN channel using BPSK modulation are considered. Extra 0.3 dB to 0.4 dB coding gain is obtained if compared with the scheme proposed in Pyndiah et al., (1996), and the hardware overhead is negligible. The complexity of our new block turbo decoder is about ten times less than that of the near-optimum block turbo decoder with a performance degradation of only 0.5 dB. Other schemes such as reduction of test patterns in the Chase algorithm and memory saving techniques are also presented.

...read moreread less

Journal Article•DOI•

Finite Wordlength Analysis and Adaptive Decoding for Turbo/MAP Decoders

[...]

Zhongfeng Wang¹, Hiroshi Suzuki², Keshab K. Parhi¹•Institutions (2)

University of Minnesota¹, Kawasaki Steel Corporation²

01 Nov 2001

TL;DR: Finite precision effects on the performance of Turbo decoders are analyzed and the optimal word lengths of variables are determined considering tradeoffs between the performance and the hardware cost.

...read moreread less

Abstract: Turbo decoders inherently require large hardware for VLSI implementation as a large amount of memory is required to store incoming data and intermediate computation results. Design of highly efficient Turbo decoders requires reduction of hardware size and power consumption. In this paper, finite precision effects on the performance of Turbo decoders are analyzed and the optimal word lengths of variables are determined considering tradeoffs between the performance and the hardware cost. It is shown that the performance degradation from the infinite precision is negligible if 4 bits are used for received bits and 6 bits for the extrinsic information. The state metrics normalization method suitable for Turbo decoders is also discussed. This method requires small amount of hardware and its speed does not depend on the number of states. Furthermore, we propose a novel adaptive decoding approach which does not lead to performance degradation and is suitable for VLSI implementation.

...read moreread less

Journal Article•DOI•

Hybrid annihilation transformation (HAT) for pipelining QRD-based least-square adaptive filters

[...]

Z. Chi¹, Jun Ma¹, Keshab K. Parhi•Institutions (1)

University of Minnesota¹

01 Jul 2001-IEEE Transactions on Circuits and Systems Ii: Analog and Digital Signal Processing

TL;DR: HAT is presented as a solution to break the bottleneck of a high-throughput implementation introduced by the inherent recursive computation in the QRD based adaptive filters and allows a linear speedup in the throughput rate by a linear increase in hardware complexity.

...read moreread less

Abstract: A novel transformation, referred to as hybrid annihilation transformation (HAT), for pipelining the QR decomposition (QRD) based least square adaptive filters has been developed. HAT provides a unified framework for the derivation of high-throughput/low-power VLSI architectures of three kinds of QRD adaptive filters, namely, QRD recursive least-square (LS) adaptive filters, QRD LS lattice adaptive filters, and QRD multichannel LS lattice adaptive filters. In this paper, HAT is presented as a solution to break the bottleneck of a high-throughput implementation introduced by the inherent recursive computation in the QRD based adaptive filters. The most important feature of the proposed solution is that it does not introduce any approximation in the entire filtering process. Therefore, it causes no performance degradation no matter how deep the filter is pipelined. It allows a linear speedup in the throughput rate by a linear increase in hardware complexity. The sampling rate can be traded off for power reduction with lower supply voltage for applications where high-speed is not required. The proposed transformation is addressed both analytically, with mathematical proofs, and experimentally, with computer simulation results on its applications in wireless code division multiple access (CDMA) communications, conventional digital communications and multichannel linear predictions.

...read moreread less

Proceedings Article•

IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation

[...]

Tong Zhang, Keshab K. Parhi

01 Jan 2001

Proceedings Article•DOI•

A unified adder design

[...]

Yuke Wang, Keshab K. Parhi

01 Jan 2001

TL;DR: It is demonstrated that the conditional-sum adders and carry-select adders have redundant sum logic and sub-optimal carry logic; therefore they should be eliminated and the most efficient way to generate carries is by layers instead of by groups, where layers are non-consecutive non-equal length collections.

...read moreread less

Abstract: We present a unified framework to compare and optimize various adders in the literature including carry-ripple adders, carry-look-ahead adders, prefix-adders, canonic adders, block-based carry-look-ahead adders, carry-skip adders, conditional-sum adders, carry-select adders and hybrid adders. The logic of all those adders can be separated into two parts: the carry logic and the sum logic, while carries can be generated by a uniformed carry operator implemented by any of the prefix operator, the FCO operator, MUX, the AND-OR gates and pass-transistors. The most efficient way to generate carries is by layers instead of by groups, where layers are non-consecutive non-equal length collections. We demonstrate that the conditional-sum adders and carry-select adders have redundant sum logic and sub-optimal carry logic; therefore they should be eliminated. We show how to design efficient carry logic by layers of carries and improve some prefix addition algorithms such as the Brent-Kung's adder for carry generation. The ideas discussed here are independent of the implementation technology and therefore useful for all implementations and technologies.

...read moreread less

Journal Article•DOI•

Vector processing of wavelet coefficients for robust image denoising

[...]

M. Zervakis¹, Vijay Sundararajan², Keshab K. Parhi²•Institutions (2)

Technical University of Crete¹, University of Minnesota²

01 May 2001-Image and Vision Computing

TL;DR: A wavelet-domain robust denoising algorithm, which efficiently removes both Gaussian as well as Gaussian mixed with impulse noise, and is established by simulation results over a variety of images.

...read moreread less

Proceedings Article•DOI•

Sign extension reduction by propagated-carry selection

[...]

Sang-Min Kim¹, Jin-Gyun Chung¹, Keshab K. Parhi•Institutions (1)

University of Minnesota¹

01 Dec 2001

TL;DR: To reduce the overhead due to sign extension, a new method is proposed based on the fact that carry propagation in the sign-extension part can be controlled such that a desired input bit can be propagated as a carry.

...read moreread less

Abstract: To reduce the area and power consumption in constant coefficient multiplications, the coefficient can be encoded using the canonic signed digit (CSD) representation. When the partial product terms are added depending on the nonzero bit positions in the CSD-encoded multiplier, all sign bits are properly extended before the addition takes place. In this paper, to reduce the overhead due to sign extension, a new method is proposed based on the fact that carry propagation in the sign-extension part can be controlled such that a desired input bit can be propagated as a carry. Also, a fixed-width multiplier design method suitable for CSD multiplications is proposed. By combining these two methods, it is shown that significant hardware saving can be achieved.

...read moreread less

Proceedings Article•DOI•

A study on the performance, power consumption tradeoffs of short frame turbo decoder design

[...]

Zhipei Chi¹, Zhongfeng Wang¹, Keshab K. Parhi•Institutions (1)

University of Minnesota¹

26 Sep 2001

TL;DR: It is shown that significant coding gains can be achieved by actually increasing the coding rate with negligible increase in power consumption, and that performance improvement is demonstrated over both AWGN and Rayleigh flat fading channels.

...read moreread less

Abstract: Protecting short frames using turbo coding is a challenging task because of the small interleave size and the need for transmission efficiency. We explore possible trade-off between power consumption (estimated by the average number of iterations) and performance of turbo decoders when short frame turbo codes are used. Three encoding/decoding schemes are proposed to improve performance of turbo decoder in terms of frame/bit error rate, and to increase the data transmission efficiency whether ARQ protocols are performed or not. Specifically, turbo decoding metrics aided short CRC codes are applied to terminated trellis codes, tail-biting encoded trellis codes and CRC embedded trellis codes with a two-fold purpose: to stop the iterative decoding processes and to detect decoding errors at the last iteration. We show that significant coding gains can be achieved by actually increasing the coding rate with negligible increase in power consumption. Performance improvement is demonstrated over both AWGN and Rayleigh flat fading channels.

...read moreread less

Journal Article•DOI•

A Novel Low-power Shared Division and Square-root Architecture Using the GST Algorithm

[...]

Martin Kuhlmann¹, Keshab K. Parhi¹•Institutions (1)

Broadcom¹

01 Dec 2001-Vlsi Design

TL;DR: A GSTsquare-root architecture is developed without requiring either a multiplication to update the scaled square-root quotient in each iteration or a division/multiplication by the scaling factor after completing the square- root iterations.

...read moreread less

Abstract: Although SRT division and square-root approaches and GST division approach have been known for long time, square-root architectures based on the GST approach have not been proposed so far which do not require a final division/multiplication of the scale factor. A GST square-root architecture is developed without requiring either a multiplication to update the scaled square-root quotient in each iteration or a division/multiplication by the scaling factor after completing the square-root iterations. Additionally, quantitative comparison of speed and power consumption of GST and SRT division/square-root units are presented. Shared divider and square-root units are designed based on the SRT and the GST approaches, in minimally and maximally redundant radix-4 representations. Simulations demonstrate that the worst-case overall latency of the minimally-redundant GST architecture is 35% smaller compared to the SRT. Alternatively, for a fixed latency, the minimally-redundant GST architecture based division and square-root operations consume 32% and 28% less power, respectively, compared to the maximally-redundant SRT approach.

...read moreread less

Proceedings Article•DOI•

Models for power consumption and power grid noise due to datapath transition activity

[...]

Lijun Gao¹, Keshab K. Parhi¹•Institutions (1)

University of Minnesota¹

01 Mar 2001

TL;DR: This research was supported by Defense Advanced Research Project Agency under contract number DA/DABT63-96-C-0050 and the dual bit type (DBT) model for estimating the average transit ion activity was further developed.

...read moreread less

Abstract: The average power consumption is proportional to the average value of transition activity, i.e., transition probability, and the variance of transition activity determines the strength of power grid noise. In this paper, for the first time, a simple accurate model for estimating the variance of transition activity was proposed, and the dual bit type (DBT) model for estimating the average transit ion activity was further developed. Tile model for estimating transition activity variance is based on linearly modeling the spatial correlation of bit-level transition activity, which leads to low computational complexity for computing the variance with very good estimation accuracy. The previous DBT model is made complete with the equation derived in this paper for computing the transition probabili ty beyond the breakpoint BP~. In addition to DSP computational architecture and algorithm designs, the proposed simple models are of great significance for power grid noise decoupling and chip floor-planning designs. 1. I N T R O D U C T I O N Over the past few years, the low-power design issues have received much attention mostly due to the emergence of mobile applications, where the average power consumption determines the lifetime of a battery. The average power consumption also plays an important role on the reliability and the packaging cost of high performance ICs, such as microprocessors which consume large power in order to run at high clock fi-equency. As semi-conductor technology moves rapidly into deep submicron (DSM) stage, the design challenge for delivering reliable ICs becomes more severe due to very low supply voltage, shrinked wire width and much tighter noise margin constraints. Therefore, in DSM technology, tile knowledge of the average power consumption *This research was supported by Defense Advanced Research Project Agency under contract number DA/DABT63-96-C-0050 Permission to make digital or hard copies of all or part of this work tbr personal or classroom use is granted without Ibc provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific pcmaission and/or a t?e. GLSVLS12001 West Lafayette. Indiana. USA Copyright ACM 2001 1-58113-351-0/01/03...$5-00 121 alone is not sufficient for designing a reliable IC. Both tile peak power consumption and the power grid noise have a heavy weight in design process. These two new factors can cause the malfunction of a digital circuit or the performance degradation of an analog circuit in a mixed signal IC chip, which prevents a system-on-chip (SOC) design from being implemented. From this viewpoint, the accurate knowledge and control of the average power consumption, the peak power consumption and tile power grid noise are crucial for successful chip designs. For CMOS circuits, there are three types of power consumption, i.e., static (or leakage), short circuit and dynamic (or switching). The switching power consumption arises when an output node makes a voltage transition from low to high, which is usually from 0V to power supply Vdd [1], and can be represented as P(n) = a(n)CL V~dfclk (1) where P(n) is the power consumption at clock cycle n; o~(n) is the transition activity [2][3] at clock cycle n, whose value is 1 for transit ion (0 to 1) or 0 otherwise; CL is the capacitance load at the output. The average switching power is the average value of equation (1) with respect to ca(n) and can be represented as 2 P = caCLVddfcl k (2), where ca = E[a(n)] , referred as transition probability in [2][3], and ca(n) is assumed stationary. The values of capacitance load of circuits can be extracted using the method presented in [2][3]. To obtain the values of transition probability a for general logic circuits, it requires comprehensive simulations for input vector space. However, for DSP datapaths and components, the values of ca can be more readily computed from signal statistical properties. In [2][3], the relationship between transit ion probabili ty and signal statistics was provided as the dual bit type (DBT) model. In [4][5], more treatments of this issue were presented, where the bit-level lag-1 temporal correlation is modeled instead, and the transition probabil i ty is computed from the bit-level temporal correlation and the bit-level probability. However, in [4][5], a complex procedure using signal probability density function (pd~ is required to compute the bit-level probability. In spite of above aspects, none of these three approaches address the second order statistical properties of transition activity, which determine the magnitude of power grid noise. In this paper, the DBT model is further developed and a forPermission to make digital or hard copies of part or all of this work or personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior sp cific permission and/or a fee. GLSVLSI 2001, est afayette, Indiana, S © ACM 2001 1-58113-351-0/00/03...$5.0 mula for computing the transition probability beyond breakpoint BPz is derived. Thereafter, the proposed model for computing power grid noise is presented. In this paper, all discussions axe limited to datapath situations because up to 40% of the total power may be dissipated in buses, drivers and multiplexors [6]. For components, the transition activity will be larger than that of datapaths due to glitching [5] and this problem can be solved separately. The remainder of this paper has four sections. In section 2, the DBT model is summarized and the augmentation to it is presented. In section 3, a linear model for characterizing the spatial correlation of bit-level transition activity is proposed to compute the variance of transition activity and thus the power grid noise. In section, 4, the information about inStantaneous and peak power consul,nptions are illustrated using transition activity distribution plots. Section 5 is the conclusion of this work. 2. AVERAGE TRANSITION ACTIVITY The first part of this section serves as a summary of the DBT model presented in [2][3]. The remaining part shows the augmentation to the DBT model, which makes the DBT model complete. 2.1 Preliminaries For a stationary signal x(n) with mean value of # = E[x(n)], its vaxiance 0 -2 is given by 0 2 = E [ ( x ( n ) #)2] = E[x~(n) ] _ #2 (3) and its lag-1 temporal correlation p is defined as: E[(x(n) # ) ( x ( n 1) #)] p = E[(x(n) ~)2] E[~(~0x(n 1 ) ] #~ = 0-2 (4) The square-root of variance, 0-, is referred as staxl,daxd deviat ion of the signal. When the signal x(n) is represented as a B-bit binary vector ( xBl , (n ) , . ' ' , xl (n), x0(n)), the quantity ai(n), given by ai(n) = xi(n 1)zi(n), (5) is referred as bit-level transition activity of i-th bit, and the quantity A(n), given by

...read moreread less

Proceedings Article•DOI•

Energy efficient signaling in DSM CMOS technology

[...]

Imed Ben Dhaou, Hannu Tenhunen¹, Vijay Sundararajan², Keshab K. Parhi•Institutions (2)

Royal Institute of Technology¹, University of Minnesota²

06 May 2001

TL;DR: A signaling scheme based on low-swing combined with repeater insertion and resizing is derived for both cascaded inverters and inverter chains and leads to a substantial energy-saving ratio without speed degradation.

...read moreread less

Abstract: The problem of efficient signaling over on-chip interconnect in DSM technology is addressed. A signaling scheme based on low-swing combined with repeater insertion and resizing is derived for both cascaded inverters and inverter chains. The proposed scheme has been implemented in 0.25 /spl mu/m 2.5 V, 6 metal layers CMOS process. HSPICE results showed that our scheme leads to a substantial energy-saving ratio without speed degradation.

...read moreread less

Patent•

Systeme et procede destines a des communications a vitesse elevee utilisant un traitement de signal numerique

[...]

Oscar E. Agazzi, Gottfried Ungerboeck, Keshab K. Parhi, Christian A. J. Lutkemeyer, Pieter Vorenkamp, Kevin T. Chan, Myles Wakayama - Show less +3 more

28 Feb 2001

TL;DR: In this paper, the precodage d'egalization dans a canal de communications is discussed, and it is possible to realiser une initialisation des precodeurs par emission d'information se rapportant aux caracteristiques du canal entre des paires d'emetteurs-recepteurs.

...read moreread less

Abstract: L'invention concerne des systemes et des procedes varies concernant le precodage d'egalisation dans un canal de communications. Dans une realisation, la realisation du precodage s'effectue sur des signaux emis sur un canal optique. Dans une autre realisation, les operations de precodage et de decodage sont realises en parallele afin de faciliter un traitement a vitesse elevee dans des circuits relativement peu couteux. Il est possible de realiser une initialisation des precodeurs par emission d'information se rapportant aux caracteristiques du canal entre des paires d'emetteurs-recepteurs.

...read moreread less

Report•DOI•

Low-Power VLSI Architectures for Error Control Coding and Wavelets

[...]

Keshab K. Parhi

12 Oct 2001

TL;DR: In this article, a brief summary of the results supported by the grant during the period from May 1, 1998 to November 30, 2001 is provided, where the authors addressed design of high-speed, low energy, low-area architectures for signal processing systems and error control coders.

...read moreread less

Abstract: : This final report provides a brief summary of our research results supported by the above grant during the period from May 1,1998 to November 30, 2001. Our research has addressed design of high-speed, low-energy, low-area architectures for signal processing systems and error control coders. Contributions in the area of error control coding architectures include design of low-energy and low-complexity finite field arithmetic architectures and Reed-Solomon (RS) codecs. High- performance and low-power architectures for low-density parity-check (LDPC) codes have been developed.

...read moreread less