
Showing papers by "Massoud Pedram" published in 2007


Journal Article•DOI•
TL;DR: Although nanoelectronics won't replace CMOS for some time, research is needed now to develop the architectures, methods, and tools to maximally leverage nanoscale devices and terascale capacity.
Abstract: Although nanoelectronics won't replace CMOS for some time, research is needed now to develop the architectures, methods, and tools to maximally leverage nanoscale devices and terascale capacity. Addressing the complementary architectural and system issues involved requires greater collaboration at all levels. The effective use of nanotechnology calls for total system solutions

88 citations


Proceedings Article•DOI•
23 Jan 2007
TL;DR: An accurate model is presented to calculate the short circuit energy dissipation of logic cells using a current-based logic cell model, which constructs the output voltage waveform for a given noisy input waveform.
Abstract: An accurate model is presented to calculate the short circuit energy dissipation of logic cells. The short circuit current is highly dependent on the input and output voltage values. Therefore the actual shape of the voltage signal waveforms at the input and output of the cell should be considered in order to precisely calculate the short circuit energy dissipation. Previous approaches such as the approximation of the crosstalk induced noisy waveforms with saturated ramps can lead to short circuit energy estimation errors as high as an order of magnitude for a minimum sized inverter. To resolve this shortcoming, a current-based logic cell model is utilized, which constructs the output voltage waveform for a given noisy input waveform. The input and output voltage waveforms are then used to calculate the short circuit current, and hence, short circuit energy dissipation. A characterization process is executed for each logic cell in the standard cell library to model the relevant electrical parameters e.g., the parasitic capacitances and nonlinear current sources. Additionally, our model is capable of calculating the short circuit energy dissipation caused by glitches in VLSI circuits, which in some cases can be a key contributor to the total circuit energy dissipation. Experimental results show an average error of about 1% and a maximum error of 3% compared to SPICE for different types of logic cells under noisy input waveforms including glitches while the runtime speedup is up to a factor of 16,000.

25 citations
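The last step described in the abstract above, turning a short circuit current waveform into an energy figure, amounts to integrating the supply voltage times the current over the transition. The following is a minimal sketch of that step only, assuming the current samples are already available; it does not reproduce the paper's current-based cell model. The function name and the triangular test pulse are hypothetical.

```python
def short_circuit_energy(times, currents, vdd):
    """Approximate E_sc = Vdd * integral(i_sc dt) with the trapezoidal rule.

    times    -- sample instants in seconds (increasing)
    currents -- short circuit current samples in amperes, same length
    vdd      -- supply voltage in volts
    """
    if len(times) != len(currents):
        raise ValueError("times and currents must have the same length")
    charge = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        # Area of one trapezoid between consecutive samples.
        charge += 0.5 * (currents[k] + currents[k - 1]) * dt
    return vdd * charge


# Hypothetical triangular current pulse: 0 -> 20 uA -> 0 over 100 ps.
t = [0.0, 50e-12, 100e-12]
i_sc = [0.0, 20e-6, 0.0]
energy = short_circuit_energy(t, i_sc, vdd=1.2)
```

Because the integral follows the sampled waveform rather than a saturated-ramp approximation, a glitch simply shows up as extra area under the current curve.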


Proceedings Article•DOI•
16 Apr 2007
TL;DR: Experimental results with a RISC processor demonstrate the effectiveness of the proposed stochastic framework and show that the proposed variability-aware power management technique ensures robust system-wide energy savings under probabilistic variations.
Abstract: This paper tackles the problem of dynamic power management (DPM) in nanoscale CMOS design technologies that are typically affected by increasing levels of process, voltage, and temperature (PVT) variations and fluctuations. This uncertainty significantly undermines the accuracy and effectiveness of traditional DPM approaches. More specifically, a stochastic framework is proposed to improve the accuracy of decision making in power management, while considering the manufacturing process and/or design-induced uncertainties. A key characteristic of the framework is that uncertainties are effectively captured by a partially observable semi-Markov decision process. As a result, the proposed framework brings the underlying probabilistic PVT effects to the forefront of power management policy determination. Experimental results with a RISC processor demonstrate the effectiveness of the technique and show that the proposed variability-aware power management technique ensures robust system-wide energy savings under probabilistic variations.

24 citations


Proceedings Article•DOI•
27 Aug 2007
TL;DR: This paper shows how to design an efficient power delivery network for a complex system-on-a-chip (SoC) so as to enable dynamic power management through assignment of appropriate voltage level (and the corresponding clock frequency) to each function block in the SoC.
Abstract: Dynamic voltage scaling (DVS) is known to be one of the most efficient techniques for power reduction of integrated circuits. Efficient low voltage DC-DC conversion is a key enabler for the design of any DVS technique. In this paper we show how to design an efficient power delivery network for a complex system-on-a-chip (SoC) so as to enable dynamic power management through assignment of appropriate voltage level (and the corresponding clock frequency) to each function block in the SoC. We show that the proposed technique reduces the power loss of the power delivery network by an average of 34% while reducing its cost by an average of 8%.

21 citations


Proceedings Article•DOI•
04 Jun 2007
TL;DR: This paper formulates the problem of selecting the best set of regulators in a tree topology as a dynamic program, solves it efficiently, and demonstrates the efficacy of the proposed formulation and solution.
Abstract: High efficiency low voltage DC-DC conversion is a key enabler to the design of power-efficient integrated circuits. Typically a star configuration of the DC-DC converters, where only one converter resides between the source and each load, is used to deliver currents with appropriate voltage levels to different loads in the circuit. In this paper we show that using a tree topology of suitably chosen voltage regulators between the power source and loads yields higher power efficiency in the power delivery network. We formulate the problem of selecting the best set of regulators in a tree topology as a dynamic program and efficiently solve it. Experimental results demonstrate the efficacy of the proposed problem formulation and solution.

18 citations


Proceedings Article•DOI•
23 Apr 2007
TL;DR: A hierarchical wireless sensor network with mobile overlays, along with a mobility-aware multi-hop routing scheme, is presented and analyzed in order to optimize the network lifetime, delay, and local storage size.
Abstract: Recent technological advances have led to the emergence of small battery-powered sensors with considerable, albeit limited, processing and communication capabilities. Wireless sensor networks have gained considerable attention in applications where spatially distributed events are to be monitored with minimal delay. We present and analyze a hierarchical wireless sensor network with mobile overlays, along with a mobility-aware multi-hop routing scheme, in order to optimize the network lifetime, delay, and local storage size. Fixed event aggregation relays and mobile relays are used to collect events from the sensors and send them to a central base station. We analyze the effects of various system parameters on the network performance, and formulate a convex optimization problem for maximizing the network lifetime subject to constraints on local storage, delay, and maintenance cost. Network behavior is studied and analytical results are validated through simulations.

16 citations


Proceedings Article•DOI•
27 Aug 2007
TL;DR: Experimental results show that depending on the activity factor of the circuit, the proposed technique can significantly reduce the power consumption of the global bus interconnects.
Abstract: This paper addresses the problem of power-optimal repeater insertion for global buses in the presence of crosstalk noise. The MTCMOS technique, which inserts high-Vth sleep transistors, is used to reduce the leakage power consumption in the idle mode. We simultaneously calculate the repeater sizes, repeater distances, and the size of the sleep transistors to minimize the power dissipation. The effect of crosstalk coupling capacitance on propagation delay and (switching and short circuit) power dissipation is considered. Experimental results show that depending on the activity factor of the circuit, the proposed technique can significantly reduce the power consumption of the global bus interconnects.

9 citations


Proceedings Article•DOI•
11 Mar 2007
TL;DR: An optimal algorithm for linearly placing the allocated sleep transistors on each standard cell row so as to minimize the performance degradation of the MTCMOS circuit, which is in part due to unwanted voltage drops on its virtual ground network.
Abstract: The Multi-Threshold CMOS (MTCMOS) technology has become a popular technique for standby power reduction. This technology utilizes high-Vth sleep transistors to reduce subthreshold leakage currents during the standby mode of CMOS VLSI circuits. The performance of MTCMOS circuits strongly depends on the size of the sleep transistors and the parasitics on the virtual ground network. Given a placed netlist of a row-based MTCMOS design and the number of sleep transistor cells on each standard cell row, this paper introduces an optimal algorithm for linearly placing the allocated sleep transistors on each standard cell row so as to minimize the performance degradation of the MTCMOS circuit, which is in part due to unwanted voltage drops on its virtual ground network. Experimental results show that, compared to existing methods of placing the sleep transistors on cell rows, the proposed technique results in up to 11% reduction in the critical path delay of the circuit.

6 citations


Proceedings Article•DOI•
05 Nov 2007
TL;DR: In this article, the sizing and placement problems of charge-recycling transistors in CR-MTCMOS can be formulated as a linear programming problem, and hence, can be efficiently solved using standard mathematical programming packages.
Abstract: A downside of using Multi-Threshold CMOS (MTCMOS) technique for leakage reduction is the energy consumption during transitions between sleep and active modes. Previously, a charge recycling (CR) MTCMOS architecture was proposed to reduce the large amount of energy consumption that occurs during the mode transitions in power-gated circuits. Considering the RC parasitics of the virtual ground and VDD lines, proper sizing and placement of charge-recycling transistors is key to achieving the maximum power saving. In this paper, we show that the sizing and placement problems of charge-recycling transistors in CR-MTCMOS can be formulated as a linear programming problem, and hence, can be efficiently solved using standard mathematical programming packages. The proposed sizing and placement techniques allow us to employ the CR-MTCMOS solution in large row-based standard cell layouts while achieving nearly the full potential of this power-gating architecture, i.e., we achieve 44% saving in switching energy due to the mode transition in CR-MTCMOS compared to standard MTCMOS.

4 citations


Journal Article•DOI•
TL;DR: Two new approaches for doing variational gate TA for Gaussian and non-Gaussian sources of variation in parameterized sigmaTA are presented.
Abstract: As technology scales down, timing verification of digital integrated circuits becomes an extremely difficult task due to the gate and wire variability. Therefore, statistical timing analysis (denoted by sigmaTA) is becoming unavoidable. In this paper, two new approaches for doing variational gate TA for Gaussian and non-Gaussian sources of variation in parameterized sigmaTA are presented. To start, a variational RC-pi load is approximated by using a canonical first-order model. Next, an accurate variational gate TA (VGTA) technique, which accounts for variational RC-pi loads, variational input transitions, and a variation-aware gate library, is introduced. The proposed method relies on a static effective-capacitance calculation method and its variational form. Experimental results demonstrate that VGTA exhibits an average error of 4% for gate delay and output transition time with respect to the Monte Carlo simulation with 10^4 samples. Next, a more efficient VGTA [called Fast VGTA (F-VGTA)] based on a single-iteration variational effective capacitance calculation is presented. Experimental results show that F-VGTA achieves an average error of 7% for gate delay and output transition time with respect to the Monte Carlo simulation with 10^4 samples but with runtimes that are about two times faster than VGTA.

4 citations
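The "canonical first-order model" mentioned in the abstract above is commonly written as a nominal value plus a linear combination of variation sources. The sketch below illustrates that general form only, assuming zero-mean, unit-variance, mutually independent sources; it is not the paper's VGTA or F-VGTA algorithm. The class name, the two-source setup, and the numeric values (read them as picoseconds) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Canonical:
    """d = nominal + sum(sens[i] * dX_i) + rand * dR, with unit-variance sources."""
    nominal: float
    sens: tuple   # sensitivities to shared (global) variation sources
    rand: float   # sensitivity to the purely independent random term

    def mean(self):
        # All sources are zero-mean, so the mean is the nominal value.
        return self.nominal

    def std(self):
        # Independent unit-variance sources: variances add as squared sensitivities.
        return math.sqrt(sum(a * a for a in self.sens) + self.rand ** 2)

    def __add__(self, other):
        if len(self.sens) != len(other.sens):
            raise ValueError("operands must share the same variation sources")
        return Canonical(
            self.nominal + other.nominal,
            # Shared sources are fully correlated, so sensitivities add linearly.
            tuple(a + b for a, b in zip(self.sens, other.sens)),
            # Independent random terms combine as a root-sum-square.
            math.hypot(self.rand, other.rand),
        )


gate = Canonical(50.0, (3.0, 0.0), 4.0)
wire = Canonical(20.0, (1.0, 2.0), 3.0)
path = gate + wire
```

Keeping the shared sensitivities separate from the independent term is what lets correlation between a gate and its load survive the addition; collapsing everything into a single standard deviation would lose it.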


Journal Article•DOI•
TL;DR: A synthesis technique targeted toward coarse-grained antifuse-based field-programmable gate arrays (FPGAs) and an interconnect-aware clustering algorithm that assigns logic cells to individual macrocells so as to minimize the routing costs is presented.
Abstract: In this paper, we present a synthesis technique targeted toward coarse-grained antifuse-based field-programmable gate arrays (FPGAs). A macrologic cell, in this class of FPGAs, has multiple inputs and multiple outputs. A library of small logic cells can be generated from this macrocell and used to map the target netlist. First, we calculate the minimum number of macrologic cells required to map a given circuit by using either a dynamic programming or a linear programming technique. Given this minimum number of macrologic cells, we introduce an interconnect-aware clustering algorithm that assigns logic cells to individual macrocells so as to minimize the routing costs. Alternatively, a timing slack-driven clustering algorithm is presented where timing criticalities of nodes in a network are calculated and used to determine the final packing into the macrocells so as to minimize the number of the macrocells on the critical paths. When compared to results from a commercial tool, our two synthesis techniques reduce the number of macrologic cells by 12% and the maximum depth by 35%, respectively.

Proceedings Article•DOI•
26 Mar 2007
TL;DR: A new unified modeling framework, called the extended queuing Petri net (EQPN), is presented, which combines extended stochastic Petri net and G/M/1 queuing models, to realize the design of reliable systems during the design time, while improving the accuracy and robustness of power optimization for high-speed scalable networking systems.
Abstract: The need to bring high-quality systems to market at an ever-increasing pace is driving the use of system-level models early in the design process. This paper presents a new unified modeling framework, called the extended queuing Petri net (EQPN), which combines extended stochastic Petri net and G/M/1 queuing models, to realize the design of reliable systems during the design time, while improving the accuracy and robustness of power optimization for high-speed scalable networking systems. The EQPN model is employed to represent the performance behaviors and to minimize energy consumption of the system under performance constraints through mathematical programming formulations. Being able to model the system with the EQPN would enable the users to accomplish the design of a reliable and optimized system at the beginning of the design cycle. The proposed system model is compared with existing stochastic models under real simulation data. Experimental results demonstrate the effectiveness of the modeling framework and show that our proposed energy optimization techniques ensure robust system-wide energy savings under tight performance constraints.

Proceedings Article•DOI•
23 Jan 2007
TL;DR: A flow-through-queue (FTQ) based power management method, which allows some of the tasks involved in processing the frame data to be offloaded and can achieve system-wide energy savings under tighter performance constraints is proposed.
Abstract: This paper presents a novel architectural mechanism and a power management structure for the design of an energy-efficient gigabit Ethernet controller. Key characteristics of such a controller are the low latency and high bandwidth required to meet the pressing demands of extremely high frame and control data, which in turn cause difficulties in managing power dissipation. We propose a flow-through-queue (FTQ) based power management method, which allows some of the tasks involved in processing the frame data to be offloaded. This in turn enables utilization of multiple clock rates and multiple voltages for different cores inside the Ethernet controller. A modeling approach based on semi-Markov decision process (SMDP) and queuing models is employed, which allows one to apply mathematical programming formulations for energy optimization under performance constraints. The proposed gigabit Ethernet controller is designed with a 130nm CMOS technology that includes both high and low threshold voltages. Experimental results show that the proposed power optimization method can achieve system-wide energy savings under tighter performance constraints.

Proceedings Article•DOI•
TL;DR: In this article, a technique based on the sensitivity of the output to input waveform is presented for accurate propagation of delay information through a gate for the purpose of static timing analysis (STA) in the presence of noise.
Abstract: A technique based on the sensitivity of the output to input waveform is presented for accurate propagation of delay information through a gate for the purpose of static timing analysis (STA) in the presence of noise. Conventional STA tools represent a waveform by its arrival time and slope. However, this is not an accurate way of modeling the waveform for the purpose of noise analysis. The key contribution of our work is the development of a method that allows efficient propagation of equivalent waveforms throughout the circuit. Experimental results demonstrate higher accuracy of the proposed sensitivity-based gate delay propagation technique, SGDP, compared to the best of existing approaches. SGDP is compatible with the current level of gate characterization in conventional ASIC cell libraries, and as a result, it can be easily incorporated into commercial STA tools to improve their accuracy.

Posted Content•
TL;DR: In this article, a pixel transformation function that maximizes backlight dimming while maintaining a pre-specified image distortion level for a liquid crystal display is proposed, which maps the original image histogram to a new histogram with lower dynamic range.
Abstract: In this paper, a method is proposed for finding a pixel transformation function that maximizes backlight dimming while maintaining a pre-specified image distortion level for a liquid crystal display. This is achieved by finding a pixel transformation function, which maps the original image histogram to a new histogram with lower dynamic range. Next the contrast of the transformed image is enhanced so as to compensate for brightness loss that would arise from backlight dimming. The proposed approach relies on an accurate definition of the image distortion which takes into account both the pixel value differences and a model of the human visual system and is amenable to highly efficient hardware realization. Experimental results show that the histogram equalization for backlight scaling method results in about 45% power saving with an effective distortion rate of 5% and 65% power saving for a 20% distortion rate. This is significantly higher power savings compared to previously reported backlight dimming approaches.

01 Jan 2007
TL;DR: It is shown that assuming temporal independence or even using first-order temporal models is not sufficient, that is, the inaccuracies induced in steady-state and transition probability calculations are significant for most of the analyzed benchmarks, and experimental results show that, if the order of the source is underestimated, the obtained probability values can be more than 100% off from the correct ones.
Abstract: This paper illustrates, analytically and quantitatively, the effect of high-order temporal correlations on steady-state and transition probabilities in Finite State Machines (FSMs). As the main theoretical contribution, we extend the previous work done on steady-state probability calculation in FSMs to account for complex spatiotemporal correlations which are present at the primary inputs when the target machine models real hardware and receives data from real applications. More precisely: 1) using the concept of constrained reachability analysis, the correct set of Chapman-Kolmogorov equations is constructed; and 2) based on stochastic complementation and iterative aggregation/disaggregation techniques, exact and approximate techniques for finding the state occupancy probabilities in the target machine are presented. From a practical point of view, we show that assuming temporal independence or even using first-order temporal models is not sufficient, that is, the inaccuracies induced in steady-state and transition probability calculations are significant for most of the analyzed benchmarks. Experimental results show that, if the order of the source is underestimated, not only is the set of reachable states incorrectly determined, but the obtained probability values can also be more than 100% off from the correct ones.
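For orientation, the steady-state (state occupancy) probabilities discussed in the abstract above are the fixed point of the Chapman-Kolmogorov equations, pi = pi * P, for a state transition matrix P. The sketch below solves that fixed point by plain power iteration for a small, already-given matrix. It assumes an ergodic chain and is only the baseline calculation; it is not the stochastic complementation or aggregation/disaggregation technique of the paper, and it does not model high-order input correlations. The function name and the two-state matrix are hypothetical.

```python
def steady_state(P, tol=1e-12, max_iter=10_000):
    """Return pi with pi = pi * P for a row-stochastic matrix P (list of rows)."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One Chapman-Kolmogorov step: next[j] = sum_i pi[i] * P[i][j].
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


# Hypothetical two-state machine: rows are current states, columns next states.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = steady_state(P)
```

For this matrix the balance condition 0.1 * pi[0] = 0.5 * pi[1] gives pi = (5/6, 1/6), which the iteration reproduces.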

01 Jan 2007
TL;DR: It is shown that the average switching activity can be predicted without simulation using either entropy or informational energy averages, and two new measures relying on these concepts are developed.
Abstract: The problem of estimating the power consumption at logic and register transfer levels is addressed from an information theoretical point of view. It is shown that the average switching activity can be predicted without simulation using either entropy or informational energy averages. Consequently, two new measures relying on these concepts are developed. The accuracy of these models is investigated using common benchmarks and results are reported.
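To make the two quantities named in the abstract above concrete, the sketch below evaluates entropy and informational energy for a single binary signal and relates them to its toggle probability. It assumes a temporally independent signal that is 1 with probability p, and it is a textbook single-bit illustration, not the paper's circuit-level measures. All function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a binary signal that is 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def informational_energy(p):
    """Sum of squared symbol probabilities: p^2 + (1 - p)^2."""
    return p * p + (1 - p) * (1 - p)


def switching_activity(p):
    """Toggle probability between consecutive cycles under temporal independence."""
    return 2 * p * (1 - p)


probabilities = [0.1, 0.25, 0.5, 0.9]
table = [(p, entropy(p), informational_energy(p), switching_activity(p))
         for p in probabilities]
```

Under this independence assumption the toggle probability equals one minus the informational energy, since 1 - p^2 - (1 - p)^2 = 2p(1 - p), and both the entropy and the toggle probability peak at p = 0.5.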