
Showing papers by "Naresh R. Shanbhag" published in 2011


Journal Article
TL;DR: Architectures are presented for bivariate polynomial interpolation and factorization, the two main steps in algebraic soft-decision decoding of Reed-Solomon codes, and their latency and hardware requirements are determined.
Abstract: Soft-decision decoding of Reed-Solomon codes delivers significant coding gains over classical minimum distance decoding. In this paper, we present architectures for polynomial interpolation and factorization, the two main steps of the soft-decoding algorithm. We introduce an algorithmic transformation for reducing the iterations required in generating the interpolation polynomial and present efficient architectures by sharing computations. We also describe algorithmic transformations for further reducing the interpolation and factorization latency. An area-efficient, folded-pipelined version of the interpolation architecture is also described. Finally, we present an example of a Reed-Solomon soft decoder that uses the presented architectures and achieves a throughput of 250 Mbps.
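As a concrete picture of the interpolation step, here is a simplified, multiplicity-one form of Kötter's interpolation algorithm in Python. It is a sketch under stated assumptions: it works over a hypothetical prime field GF(929) rather than the GF(2^m) fields used by practical Reed-Solomon decoders, stores polynomials as plain dictionaries, and breaks weighted-degree ties by index. It does not reproduce the paper's algorithmic transformations, computation sharing, or factorization step, and all function names are illustrative.

```python
P = 929  # hypothetical prime field GF(929); practical RS decoders use GF(2^m)


def wdeg(poly, w):
    """(1, w)-weighted degree of a bivariate polynomial {(i, j): coeff}."""
    return max(i + w * j for (i, j) in poly) if poly else -1


def evaluate(poly, x, y, p=P):
    """Evaluate Q(x, y) over GF(p)."""
    return sum(c * pow(x, i, p) * pow(y, j, p) for (i, j), c in poly.items()) % p


def combine(a, f, b, g, p=P):
    """Return a*f - b*g over GF(p), dropping zero coefficients."""
    out = {}
    for m, c in f.items():
        out[m] = (out.get(m, 0) + a * c) % p
    for m, c in g.items():
        out[m] = (out.get(m, 0) - b * c) % p
    return {m: c for m, c in out.items() if c}


def times_x_minus(poly, a, p=P):
    """Return (x - a) * poly over GF(p)."""
    out = {}
    for (i, j), c in poly.items():
        out[(i + 1, j)] = (out.get((i + 1, j), 0) + c) % p
        out[(i, j)] = (out.get((i, j), 0) - a * c) % p
    return {m: c for m, c in out.items() if c}


def koetter_interpolate(points, k, max_y_deg, p=P):
    """Multiplicity-1 Koetter interpolation: find a low (1, k-1)-weighted-degree
    Q(x, y) with Q(a, b) = 0 for every (a, b) in points."""
    w = k - 1
    g = [{(0, j): 1} for j in range(max_y_deg + 1)]  # candidates 1, y, y^2, ...
    for a, b in points:
        delta = [evaluate(gj, a, b, p) for gj in g]  # discrepancies at this point
        nonzero = [j for j, d in enumerate(delta) if d]
        if not nonzero:
            continue
        pivot = min(nonzero, key=lambda j: wdeg(g[j], w))
        for j in nonzero:
            if j != pivot:
                # Cancel this candidate's discrepancy using the pivot polynomial.
                g[j] = combine(delta[pivot], g[j], delta[j], g[pivot], p)
        g[pivot] = times_x_minus(g[pivot], a, p)  # pivot now vanishes at x = a
    return min(g, key=lambda q: wdeg(q, w))
```

Each received point requires evaluating every candidate polynomial and then updating all of them, so the number of points and candidates drives the iteration count. A factorization step would then search Q for factors of the form y − f(x); that step is not shown.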

38 citations


Proceedings Article
01 Nov 2011
TL;DR: In this article, the voltage domains of microprocessors or other digital circuit elements are connected in series, which enhances overall system efficiency and performance by allowing both the digital circuits and the power delivery circuits to operate in their region of highest efficiency.
Abstract: This paper analyzes an alternative for system-level power configuration of digital circuits. To overcome the "power wall" and enable low supply voltages for digital circuits, the voltage domains of microprocessors or other digital circuit elements are connected in series. This enhances overall system efficiency and performance by allowing both the digital circuits and the power delivery circuits to operate in their region of highest efficiency. Power consumption is dramatically reduced because multiple, independent voltage levels enable each processor to operate at its local minimum energy point. A comparative analysis of series and parallel voltage domains concludes that series voltage domains consume less power. Various voltage regulation schemes, spanning software, firmware, and hardware, are presented. Some of the challenges and opportunities of series circuits are discussed.
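Back-of-the-envelope arithmetic can illustrate one reason series stacking helps power delivery. This is a minimal sketch assuming a single lumped supply-path resistance and perfectly balanced domain currents. It is not the paper's comparative analysis, and it ignores the regulation needed to absorb current mismatch between stacked domains; the function name and numbers are hypothetical.

```python
def delivery_loss(n_domains, v_domain, p_domain, r_path, series):
    """I^2 R loss in a shared supply path of resistance r_path (ohms).

    Parallel: n domains at v_domain draw n * I from one low-voltage rail.
    Series: the same current I flows through the whole stack, which is fed
    at n * v_domain. Assumes perfectly balanced domain currents.
    """
    i_domain = p_domain / v_domain
    i_supply = i_domain if series else n_domains * i_domain
    return i_supply ** 2 * r_path


# Four hypothetical 1 W domains at 0.5 V behind a 1 milliohm path.
parallel = delivery_loss(4, 0.5, 1.0, 1e-3, series=False)  # 8 A through the path
stacked = delivery_loss(4, 0.5, 1.0, 1e-3, series=True)    # 2 A through the path
```

In this example stacking cuts the current through the shared path from 8 A to 2 A, so the resistive loss falls from 0.064 W to 0.004 W, a factor of 16 (N squared for N = 4).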

18 citations


Proceedings Article
01 Aug 2011
TL;DR: This paper addresses the problem of designing energy-efficient embedded systems by jointly optimizing the power consumption of both the DC-DC converter and the computational core. It proposes a reconfigurable core architecture that improves the converter efficiency by 2.3X at C-MEOP and brings energy consumption at S-MEOP and C-MEOP within 4% of each other, while improving throughput in the subthreshold region by at least 8X.
Abstract: This paper addresses the problem of designing energy-efficient embedded systems by jointly optimizing the power consumption of both the DC-DC converter and the computational core. Past work has shown that there exists a minimum energy operating point (MEOP) in the subthreshold region for computational cores (C-MEOP), at which the dynamic and leakage powers are balanced. The MEOP is defined by the 3-tuple consisting of the optimum energy consumption E∗, optimum voltage V∗, and optimum frequency f∗. First, we show that the DC-DC converter losses in dynamic voltage scaling (DVS) cause the overall system MEOP (S-MEOP) to differ significantly from C-MEOP. Simulations in a 130-nm, 1.2V commercial CMOS process show that operation at S-MEOP yields 45.5% energy savings over operating at the core voltage V∗_C suggested by C-MEOP. The DC-DC converter efficiency is also improved by 2.2X. Second, we show that architectural techniques such as parallelization cause the S-MEOP to approach C-MEOP. Thus, it is sufficient to track C-MEOP (a much easier task on-chip) in order to account for process variations. We show that, when parallelization is employed, DC-DC converter losses decrease in the subthreshold region but increase in the superthreshold region. This observation leads us to propose a reconfigurable core architecture that improves the converter efficiency by 2.3X at C-MEOP and brings energy consumption at S-MEOP and C-MEOP within 4% of each other, while improving throughput in the subthreshold region by at least 8X. Finally, we show that pipelining, which has been proposed to decrease core energy at C-MEOP while improving throughput [1], adversely affects the S-MEOP. The pipelined-core system energy at S-MEOP is 85% lower than the pipelined-core system energy when operating at the C-MEOP voltage V∗_C.
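The difference between C-MEOP and S-MEOP can be sketched numerically. The Python below uses a toy core-energy model (dynamic C·V² energy plus a leakage term that grows as the operation time stretches exponentially in subthreshold) and a hypothetical converter-efficiency curve that falls at low output voltage. Every constant is an assumption chosen only so that an interior minimum exists; none comes from the paper's 130-nm simulations.

```python
import math

S = 1.5 * 0.0259   # hypothetical n * V_T (V): subthreshold slope factor times thermal voltage
C_EFF = 1.0        # normalized switched capacitance per operation
A_LEAK = 200.0     # hypothetical leakage-energy coefficient, relative to C_EFF


def core_energy(v):
    """Toy per-operation core energy: dynamic C*V^2 plus leakage V*I_leak*T_op,
    where T_op grows roughly as exp(-V / S) in subthreshold."""
    return C_EFF * v * v + A_LEAK * v * math.exp(-v / S)


def converter_efficiency(v, eta_max=0.9, v_loss=0.1):
    """Hypothetical DC-DC efficiency curve that drops at low output voltage."""
    return eta_max * v / (v + v_loss)


def system_energy(v):
    """Energy drawn from the source: core energy divided by converter efficiency."""
    return core_energy(v) / converter_efficiency(v)


grid = [0.2 + 0.005 * i for i in range(201)]  # 0.2 V to 1.2 V
c_meop = min(grid, key=core_energy)    # core-only minimum energy point
s_meop = min(grid, key=system_energy)  # system-level minimum energy point
```

Because the converter in this toy model is less efficient at lower output voltages, any grid point below C-MEOP has both higher core energy and lower efficiency, so S-MEOP can only land at or above C-MEOP here.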

16 citations


Proceedings Article
20 Oct 2011
TL;DR: A 256-tap PN code acquisition filter in a 180nm CMOS process employing statistical system-level error compensation is presented, achieving a 5.8× improvement in energy-efficiency over conventional error-free designs, and improvements of 3.79× in energy-efficiency and 2170× in error tolerance over existing error-tolerant designs.
Abstract: We present a 256-tap PN code acquisition filter in a 180nm CMOS process employing statistical system-level error compensation. Under voltage overscaling (VOS), a near-constant detection probability (P_det) above 90% with a 5.8× reduction in energy is achieved at a supply voltage 27% below the point of first failure (PoFF), with an error rate (p_e) of 0.868. This is an improvement of 5.8× in energy-efficiency over conventional error-free designs, and of 3.79× in energy-efficiency and 2170× in error tolerance over existing error-tolerant designs.
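PN code acquisition amounts to correlating the received chips against the spreading code at every candidate offset and declaring detection when the peak exceeds a threshold. Below is a minimal Python sketch assuming a hypothetical seeded random ±1 code in place of a true PN sequence, additive Gaussian noise, and a hypothetical threshold of 128. It models neither voltage-overscaling errors nor the paper's statistical error compensation.

```python
import random


def make_pn_code(length, seed=1):
    """Hypothetical +/-1 spreading code from a seeded RNG (stand-in for a real PN sequence)."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def acquire(received, code, threshold):
    """Exhaustive search: correlate every cyclic offset and report the strongest one."""
    n = len(code)
    best_offset, best_metric = 0, float("-inf")
    for offset in range(n):
        # One output of an n-tap correlator aligned at this offset.
        metric = sum(received[(k + offset) % n] * code[k] for k in range(n))
        if metric > best_metric:
            best_offset, best_metric = offset, metric
    return best_offset, best_metric, best_metric > threshold


code = make_pn_code(256)
rng = random.Random(7)
true_offset = 37
received = [code[(m - true_offset) % 256] + rng.gauss(0.0, 1.0) for m in range(256)]
offset, metric, detected = acquire(received, code, threshold=128)
```

At the correct offset the correlation is close to 256 (the code length), while misaligned offsets stay far smaller, which is what makes a fixed threshold usable.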

16 citations


Proceedings Article
01 Dec 2011
TL;DR: A novel low-complexity, energy-efficient reconfigurable reduced dimension maximum likelihood (RRDML) multiple-input multiple-output (MIMO) detector is proposed that achieves up to 62.3% power savings with at most 3.7% BER loss compared to ML-based receivers.
Abstract: In this paper, a novel low-complexity, energy-efficient reconfigurable reduced dimension maximum likelihood (RRDML) multiple-input multiple-output (MIMO) detector is proposed. RRDML is based on RDML [1], in which maximum likelihood (ML) detection is applied to a sub-dimension of the received vector and linear detection is used for the remaining dimension. The channel condition number is employed to configure RRDML. For a 4×4 MIMO system with 16-QAM modulation over a Rayleigh fading channel, Verilog simulations in a commercial 45nm, 1.2V CMOS process show that RRDML achieves up to 62.3% power savings with at most 3.7% BER loss compared to ML-based receivers.
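The reduced-dimension idea (ML search over a subset of streams, linear detection for the rest) can be sketched with NumPy. Assumptions: ML search over one stream, least-squares zero-forcing for the other three, an unnormalized 16-QAM alphabet, and a hypothetical rule that falls back to plain linear detection when the channel condition number is below `cond_threshold`. The actual partitioning used by the cited RDML detector and the RRDML configuration thresholds are not given in the abstract, and the function names are illustrative.

```python
import numpy as np

LEVELS = (-3, -1, 1, 3)
QAM16 = np.array([complex(i, q) for i in LEVELS for q in LEVELS])  # unnormalized 16-QAM


def slice_qam(z):
    """Map each entry of z to the nearest 16-QAM point."""
    idx = np.argmin(np.abs(z[:, None] - QAM16[None, :]), axis=1)
    return QAM16[idx]


def zf_detect(y, H):
    """Linear (zero-forcing / least-squares) detection followed by slicing."""
    x, *_ = np.linalg.lstsq(H, y, rcond=None)
    return slice_qam(x)


def reduced_dim_ml(y, H):
    """Exhaustive ML search over stream 0; zero-forcing for the remaining streams."""
    best, best_metric = None, np.inf
    for s0 in QAM16:
        rest = zf_detect(y - H[:, 0] * s0, H[:, 1:])
        cand = np.concatenate(([s0], rest))
        metric = np.linalg.norm(y - H @ cand) ** 2  # ML distance of this candidate
        if metric < best_metric:
            best, best_metric = cand, metric
    return best


def configurable_detect(y, H, cond_threshold=10.0):
    """Hypothetical configuration rule: linear detection for well-conditioned
    channels, reduced-dimension ML search otherwise."""
    if np.linalg.cond(H) < cond_threshold:
        return zf_detect(y, H)
    return reduced_dim_ml(y, H)
```

The search visits 16 candidates instead of the 16^4 = 65536 that a full 4×4 ML search would enumerate, which is where this family of detectors gets its complexity reduction.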

10 citations


Patent
09 Dec 2011
TL;DR: A system includes a circuit having a plurality of electronic function (EF) blocks interconnected in series, a power source unit coupled to the circuit for supplying power to the EF blocks, and a control unit coupled to each of the EF blocks and to the power source unit.
Abstract: A system includes a circuit having a plurality of electronic function blocks interconnected in series, a power source unit coupled to the circuit for supplying power to the plurality of electronic function blocks, and a control unit coupled to each of the plurality of electronic function blocks and to the power source unit. The control unit is configured to monitor the activity level of each of the electronic function blocks and to adjust the activity level of each of the plurality of electronic function blocks. The control unit determines a voltage level suitable for the corresponding adjusted activity level, and adjusts the power supplied to each of the plurality of electronic function blocks in order to achieve the corresponding determined voltage level at each of the plurality of electronic function blocks.
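The claimed control flow (monitor activity, adjust activity, choose a voltage per block, set the supply) can be sketched as a single control step. Everything quantitative here is hypothetical: the voltage limits, the linear activity-to-voltage map, and the policy of nudging each block toward the mean activity. The only structural fact taken from the claim is that the blocks are in series, so the source voltage is the sum of the block voltages.

```python
V_MIN, V_MAX = 0.4, 1.0  # hypothetical per-block voltage limits (V)
GAIN = 0.5               # hypothetical step size toward the mean activity


def voltage_for_activity(activity):
    """Hypothetical monotone map from normalized activity in [0, 1] to block voltage."""
    a = min(max(activity, 0.0), 1.0)
    return V_MIN + a * (V_MAX - V_MIN)


def control_step(measured):
    """One pass of: monitor activity -> adjust activity -> pick voltage -> set supply."""
    mean = sum(measured) / len(measured)
    # Hypothetical policy: nudge each block toward the mean activity level.
    adjusted = [a + GAIN * (mean - a) for a in measured]
    block_voltages = [voltage_for_activity(a) for a in adjusted]
    # Blocks are stacked in series, so the source supplies the sum of block voltages.
    return adjusted, block_voltages, sum(block_voltages)


adjusted, block_voltages, v_source = control_step([0.2, 0.8])
```

For two blocks measured at 20% and 80% activity, this policy moves them to 35% and 65%, assigns 0.61 V and 0.79 V, and asks the source for 1.40 V.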

9 citations


Proceedings Article
14 Mar 2011
TL;DR: A simple additive error model for timing errors in arithmetic computations due to PVT variations is proposed, and a characterization methodology is presented to obtain the proposed model parameters, thus enabling efficient implementations of emerging stochastic computing techniques.
Abstract: This paper makes a case for developing statistical timing error models of DSP kernels implemented in nanoscale circuit fabrics. Recently, stochastic computation techniques have been proposed [1], [2], [3], where the explicit use of error statistics in system design has been shown to significantly enhance robustness and energy-efficiency. However, obtaining the error statistics at different process, voltage, and temperature (PVT) corners is hard. This paper: 1) proposes a simple additive error model for timing errors in arithmetic computations due to PVT variations, 2) analyzes the relationship between error statistics and parameters, specifically the input statistics, and 3) presents a characterization methodology to obtain the proposed model parameters, thus enabling efficient implementations of emerging stochastic computing techniques. Key results include the following observations: 1) the output error statistics are a weak function of the input statistics, and 2) the output error statistics depend upon the ones-probability profile of the input word. These observations enable a one-time off-line statistical error characterization of DSP kernels, similar to the delay and power characterization done presently for standard cells and IP cores. The proposed error model is derived for a number of DSP kernels in a commercial 45nm CMOS process.
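The additive model writes the erroneous output as y_hat = y + e and characterizes e by its probability mass function (PMF). A minimal Monte Carlo sketch follows. As a stand-in for real timing errors, it uses a hypothetical adder model in which each sum bit only sees carries generated in the k positions below it (as if longer carry chains missed the clock edge), and it draws input bits that are 1 with probability `p_one`. It is not the paper's 45nm characterization flow, and the function names are illustrative.

```python
import random
from collections import Counter


def truncated_carry_add(a, b, width, k):
    """Hypothetical timing-error proxy for a width-bit adder: each sum bit only
    sees carries generated within the k bit positions below it."""
    result = 0
    for i in range(width):
        lo = max(0, i - k)
        span = i - lo
        mask = (1 << span) - 1
        # Carry into bit i computed from the window [lo, i) only.
        carry = ((((a >> lo) & mask) + ((b >> lo) & mask)) >> span) & 1
        bit = ((a >> i) ^ (b >> i) ^ carry) & 1
        result |= bit << i
    return result


def characterize_error_pmf(width, k, p_one, trials=5000, seed=0):
    """Monte Carlo estimate of P(e) in y_hat = y + e, for inputs whose bits
    are 1 with probability p_one."""
    rng = random.Random(seed)
    full = (1 << width) - 1

    def draw():
        return sum(int(rng.random() < p_one) << i for i in range(width))

    counts = Counter()
    for _ in range(trials):
        a, b = draw(), draw()
        exact = (a + b) & full
        counts[truncated_carry_add(a, b, width, k) - exact] += 1
    return {e: c / trials for e, c in sorted(counts.items())}
```

Calling `characterize_error_pmf` with different `p_one` values is one way to probe how the error PMF depends on the ones-probability of the inputs; with `k >= width` the model is error-free and the PMF collapses to `{0: 1.0}`.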

6 citations


Proceedings Article
01 Dec 2011
TL;DR: This paper studies the power savings and relaxed component specifications that BER-optimal ADCs offer in a 90 nm, 1.2V CMOS process, using flash ADC component models that capture the bandwidth limitation of pre-amplifiers and the metastability of latches.
Abstract: We recently explored the concept of using BER-optimal ADCs for high-speed links. In this paper, we study the benefits of BER-optimal ADCs in terms of power savings and relaxation of component specifications in a 90 nm, 1.2V CMOS process. These analyses are based on component models for a flash ADC that capture the bandwidth limitation of pre-amplifiers and the metastability of latches. We show that in the presence of these ADC non-idealities, a 3-bit BER-optimal ADC can provide a 3 dB ADC shaping gain over a 4-bit conventional ADC. The one-bit reduction offers power savings of 75% in the VGA and 50% in the ADC. Further, the 3 dB ADC shaping gain can be traded off for a 50% reduction of transmit driver power, a 75% reduction of the pre-amplifier bandwidth, or a saving of one latch stage that leads to a 20% additional power reduction in the ADC.
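A rough comparator count shows why removing one bit from a flash ADC roughly halves its size: a B-bit flash converter uses 2^B − 1 comparators. Treating ADC power as proportional to comparator count is an assumption made here for illustration, not a statement of the paper's power model.

```python
def flash_comparators(bits):
    """A B-bit flash ADC resolves 2**B levels with 2**B - 1 comparators."""
    return 2 ** bits - 1


four, three = flash_comparators(4), flash_comparators(3)  # 15 and 7 comparators
saving = 1 - three / four                                  # 8/15, about 53%
```

Under that proportionality assumption, going from 4 to 3 bits removes 8 of 15 comparators, in line with the roughly 50% ADC power saving quoted above.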

3 citations


Proceedings Article
14 Mar 2011
TL;DR: The benefits of the proposed system-aware mixed-signal design approach are illustrated in the context of analog-to-digital converters for high-speed links, and the CAD challenges that arise in designing system-assisted mixed-signal circuits are described.
Abstract: In this paper, we propose a system-assisted analog mixed-signal (SAMS) design paradigm whereby the mixed-signal components of a system are designed in an application-aware manner in order to minimize power and enhance robustness in nanoscale process technologies. In a SAMS-based communication link, the digital and analog blocks from the output of the information source at the transmitter to the input of the decision device in the receiver are treated as part of the composite channel. This comprehensive systems-level view enables us to compensate for impairments of not just the physical communication channel but also the intervening circuit blocks, most notably the analog/mixed-signal blocks. This is in stark contrast to what is done today, which is to treat the analog components in the transmitter and the analog front-end at the receiver as transparent waveform preservers. The benefits of the proposed system-aware mixed-signal design approach are illustrated in the context of analog-to-digital converters (ADCs) for high-speed links. CAD challenges that arise in designing system-assisted mixed-signal circuits are also described.

2 citations


Proceedings Article
22 May 2011
TL;DR: A technique is presented that relaxes the need to preserve the exact frequency response and instead considers a least-squares formulation in conjunction with the pipelined architecture, enabling a simple pipelined architecture based on a polyphase decomposition of the original filter.
Abstract: Current techniques used in pipelining recursive filters require significant hardware complexity. These techniques attempt to preserve the exact frequency response of the original filter while constructing a pipelined architecture. We present a technique that relaxes the need to preserve the exact frequency response and instead considers a least-squares formulation in conjunction with the pipelined architecture. The benefit of this design is that it greatly reduces the complexity of the pipelined circuit, while enabling a simple pipelined architecture based on a polyphase decomposition of the original filter.
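For context on the exact-response techniques the abstract contrasts with, here is the standard look-ahead transformation of a first-order recursive filter. It preserves the frequency response exactly but pays for the longer feedback loop with extra feed-forward taps. This is not the paper's least-squares method; the names are illustrative and zero initial state is assumed.

```python
def first_order_iir(x, a):
    """Reference recursion y[n] = a*y[n-1] + x[n] with zero initial state."""
    y, prev = [], 0.0
    for xn in x:
        prev = a * prev + xn
        y.append(prev)
    return y


def lookahead_iir(x, a, m):
    """M-step look-ahead form y[n] = a**m * y[n-m] + sum_{k<m} a**k * x[n-k].

    The feedback loop now spans m delays (room for m pipeline stages), at the
    cost of m - 1 extra feed-forward taps.
    """
    taps = [a ** k for k in range(m)]
    y = []
    for n in range(len(x)):
        acc = sum(taps[k] * x[n - k] for k in range(m) if n - k >= 0)
        if n >= m:
            acc += a ** m * y[n - m]
        y.append(acc)
    return y
```

Both functions produce the same output sequence; the feed-forward section grows linearly with the number of pipeline stages m, which is the kind of overhead a least-squares approximation can trade against exactness.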