
Showing papers by "Massoud Pedram" published in 2009


Book
01 Jan 2009
TL;DR: An edited volume on low power design, organized into three parts: technology and circuit design, logic and module design, and architecture and system design.
Abstract: Preface. 1. Introduction J.M. Rabaey, et al. Part I: Technology and circuit design levels. 2. Device and technology impact on low power electronics Chenming Hu. 3. Low power circuit technologies C. Svensson, Dake Liu. 4. Energy-recovery CMOS W.C. Athas. 5. Low power clock distribution J.G. Xi, W.W.-M. Dai. Part II: Logic and module design levels. 6. Logic synthesis and module design levels M. Pedram. 7. Low power arithmetic components T.K. Callaway, E.E. Swartzlander. 8. Low power memory design K. Itoh. Part III: Architecture and system design levels. 9. Low-power microprocessor design S. Gary. 10. Portable video-on-demand in wireless communication T.H. Meng, et al. 11. Algorithm and architectural level methodologies R. Mehra, et al. Index.

784 citations


Proceedings ArticleDOI
19 Aug 2009
TL;DR: The resulting optimization problem is formulated as an Integer Linear Programming problem and a heuristic algorithm that solves it in polynomial time is presented, showing an average of 13% power saving for different data center utilization rates.
Abstract: This paper focuses on power minimization in a data center accounting for both the information technology equipment and the air conditioning power usage. In particular we address the server consolidation (on/off state assignment) concurrently with the task assignment. We formulate the resulting optimization problem as an Integer Linear Programming problem and present a heuristic algorithm that solves it in polynomial time. Experimental results show an average of 13% power saving for different data center utilization rates compared to a baseline task assignment technique, which does not perform server consolidation.

233 citations
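
To illustrate why consolidating tasks onto fewer powered-on servers can save power, here is a minimal sketch. It is not the paper's ILP or its heuristic: it uses a generic first-fit-decreasing packing, a hypothetical linear server power model (`idle_w`, `peak_w`), and it ignores the air conditioning power that the paper accounts for. All function names are illustrative.

```python
def server_power(utilization, idle_w=100.0, peak_w=200.0):
    """Power of a powered-on server under a linear model (hypothetical numbers)."""
    return idle_w + (peak_w - idle_w) * utilization


def round_robin(task_loads, n_servers):
    """Baseline: spread tasks across all servers without consolidation."""
    loads = [0.0] * n_servers
    for k, task in enumerate(task_loads):
        loads[k % n_servers] += task
    return loads


def first_fit_decreasing(task_loads, n_servers):
    """Pack tasks onto as few servers as possible (generic bin-packing heuristic)."""
    loads = [0.0] * n_servers
    for task in sorted(task_loads, reverse=True):
        for i in range(n_servers):
            if loads[i] + task <= 1.0 + 1e-9:  # server capacity is 1.0
                loads[i] += task
                break
        else:
            raise ValueError("tasks do not fit on the available servers")
    return loads


def total_power(loads, switch_off_idle):
    """Sum server power; idle servers draw nothing only if they are switched off."""
    return sum(server_power(u) for u in loads if u > 0 or not switch_off_idle)


tasks = [0.5, 0.5, 0.5, 0.5]
baseline = total_power(round_robin(tasks, 4), switch_off_idle=False)
consolidated = total_power(first_fit_decreasing(tasks, 4), switch_off_idle=True)
```

With these assumed numbers, four half-loaded servers draw more than two fully loaded ones, because the idle power of the two switched-off servers is eliminated.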


Journal ArticleDOI
TL;DR: Two static random access memory (SRAM) cells that reduce the static power dissipation due to gate and subthreshold leakage currents are presented.
Abstract: In this paper, two static random access memory (SRAM) cells that reduce the static power dissipation due to gate and subthreshold leakage currents are presented. The first cell structure results in reduced gate voltages for the NMOS pass transistors, and thus lowers the gate leakage current. It reduces the subthreshold leakage current by increasing the ground level during the idle (inactive) mode. The second cell structure makes use of PMOS pass transistors to lower the gate leakage current. In addition, dual threshold voltage technology with forward body biasing is utilized with this structure to reduce the subthreshold leakage while maintaining performance. Compared to a conventional SRAM cell, the first cell structure decreases the total gate leakage current by 66% and the idle power by 58% and increases the access time by approximately 2% while the second cell structure reduces the total gate leakage current by 27% and the idle power by 37% with no access time degradation.

84 citations


Journal ArticleDOI
TL;DR: A low-power structure called bypass zero, feed A directly (BZ-FAD) for shift-and-add multipliers is proposed, which considerably lowers the switching activity of conventional multipliers.
Abstract: In this paper, a low-power structure called bypass zero, feed A directly (BZ-FAD) for shift-and-add multipliers is proposed. The architecture considerably lowers the switching activity of conventional multipliers. The modifications to the multiplier, which multiplies A by B, include removing the shifting of the B register, feeding A directly to the adder, bypassing the adder whenever possible, using a ring counter instead of a binary counter, and removing the partial product shift. The architecture makes use of a low-power ring counter proposed in this work. Simulation results for 32-bit radix-2 multipliers show that the BZ-FAD architecture lowers the total switching activity by up to 76% and power consumption by up to 30% when compared to the conventional architecture. The proposed multiplier can be used for low-power applications where speed is not a primary design parameter.

60 citations
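
The "bypass zero" idea in the abstract above can be sketched behaviorally: in a radix-2 shift-and-add multiplier, the adder does useful work only when the current bit of B is 1. The following is a software model under that assumption, not the BZ-FAD hardware itself (it does not model the ring counter, the register arrangement, or switching activity); the function name and the `adder_activations` counter are illustrative.

```python
def shift_add_multiply(a, b, width=32, bypass_zero=True):
    """Behavioral radix-2 shift-and-add multiply of two unsigned integers.

    Returns (product, adder_activations). With bypass_zero=True the adder is
    skipped whenever the current bit of b is 0.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    adder_activations = 0
    for i in range(width):
        bit = (b >> i) & 1
        if bit == 0 and bypass_zero:
            continue  # bypass the adder for a zero multiplier bit
        product += (a << i) if bit else 0  # add the shifted multiplicand (or zero)
        adder_activations += 1
    return product, adder_activations
```

For example, multiplying by `b = 0b1010` with `width=8` activates the adder twice with bypassing and eight times without, while the product is the same in both cases.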


Journal ArticleDOI
TL;DR: This paper presents an empirically derived synthetic traffic model based on the Negative Exponential Distribution (NED) for homogeneous and heterogeneous networks-on-chip (NoCs) of any dimensionality and shows that the NED traffic profile is more similar to realistic traffic profiles than conventional synthetic profiles are.
Abstract: In this paper, we present an empirically derived synthetic traffic model based on the Negative Exponential Distribution (NED) for homogeneous and heterogeneous networks-on-chip (NoCs) of any dimensionality. Compared to conventional synthetic traffic profiles, this synthetic traffic profile accurately captures key statistical behavior of realistic traces obtained by running different applications on NoCs. To assess the usefulness of this new NoC traffic model, the average number of packet hops for the proposed traffic profile is compared with those of some synthetic and realistic traffic patterns. The results show that the NED traffic profile is more similar to the realistic traffic profiles than the conventional synthetic ones are. Adding this traffic profile to the existing profiles improves the design and characterization of NoCs.

55 citations
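
As a rough illustration of a negative-exponential traffic profile, the sketch below picks a destination in a 2-D mesh with probability proportional to exp(-alpha * hops). The dependence on Manhattan hop distance, the mesh topology, and the decay parameter `alpha` are assumptions made for this example; the abstract does not give the model's exact form.

```python
import math
import random


def ned_destination_probs(src, width, height, alpha=1.0):
    """Destination probabilities for one source node in a width x height mesh.

    Assumes the probability decays as exp(-alpha * hop_distance).
    """
    sx, sy = src
    weights = {}
    for x in range(width):
        for y in range(height):
            if (x, y) == src:
                continue  # a node does not send to itself
            hops = abs(x - sx) + abs(y - sy)
            weights[(x, y)] = math.exp(-alpha * hops)
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


def sample_destination(src, width, height, alpha=1.0, rng=random):
    """Draw one destination according to the probabilities above."""
    probs = ned_destination_probs(src, width, height, alpha)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[n] for n in nodes], k=1)[0]
```

Setting `alpha` to 0 recovers uniform random traffic, which makes it easy to compare average hop counts against a conventional synthetic profile.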


Journal ArticleDOI
TL;DR: It is shown that, in a SoC design with static-voltage assignment, a multilevel tree topology of suitably chosen dc-dc converters between the power source and loads can result in higher power efficiency in the PDN.
Abstract: This paper introduces techniques for power-efficient design of power-delivery network (PDN) in multiple voltage-island system-on-chip (SoC) designs. The first technique is targeted to SoC designs with static-voltage assignment, while the second technique is pertinent to SoC designs with dynamic-voltage scaling (DVS) capability. Conventionally, a single-level configuration of dc-dc converters, where exactly one converter resides between the power source and each load, is used to deliver currents at appropriate voltage levels to different loads on the chip. In the presence of DVS capability, each dc-dc converter in this network should be able to adjust its output voltage. In the first part of this paper, it is shown that, in a SoC design with static-voltage assignment, a multilevel tree topology of suitably chosen dc-dc converters between the power source and loads can result in higher power efficiency in the PDN. The problem is formulated as a combinatorial problem and is efficiently solved by dynamic programming. In the second part of this paper, a new technique is presented to design the PDN for a SoC design to support DVS. In this technique, the PDN is composed of two layers. In the first layer, dc-dc converters with fixed output voltages are used to generate all voltage levels that are needed by different loads in the SoC design. In the second layer of the PDN, a power-switch network is used to dynamically connect the power-supply terminals of each load to the appropriate dc-dc converter output in the first layer. Experimental results demonstrate the efficacy of both techniques.

47 citations


Proceedings ArticleDOI
16 Mar 2009
TL;DR: A tri-modal switch cell is presented that enables implementation of multimodal power gating, including active, data-retentive drowsy, and deep sleep modes, and an additional low leakage data-retentive mode.
Abstract: Designing a power-gating structure with high performance in the active mode and low leakage and short wakeup time during standby mode is an important and challenging task. This paper presents a tri-modal switch cell that enables implementation of multimodal power gating, including active, data-retentive drowsy, and deep sleep modes. A circuit realization and design methodology are presented that allow one to take advantage of the ultra low leakage deep sleep mode, low leakage, but very fast wakeup, drowsy mode, and an additional low leakage data-retentive mode. Experimental results demonstrate the benefits of this new switch and corresponding power gating technique.

33 citations


Journal ArticleDOI
TL;DR: Measurement results show that the proposed techniques significantly reduce the motion‐blur artifacts, enhance the static contrast ratio by about 3×, and reduce the power consumption by 10% on average.
Abstract: A 1-D LED backlight scanning and a 2-D local dimming technique for large LCD TVs are presented. These techniques not only reduce the motion blur artifacts by means of impulse representation of images in video but also increase the static contrast ratio by means of local dimming in the image(s). Both techniques exploit a unique feature of LED backlight in large LCD TVs in which the whole panel is divided into a pre-defined number of regions such that the luminance in each region is independently controllable. The proposed techniques are implemented in a Xilinx FPGA and demonstrated on a Samsung 40-inch LCD TV. Measurement results show that the proposed techniques significantly reduce the motion blur artifacts, enhance the static contrast ratio by about 3X, and reduce the power consumption by 10% on average. Keywords- LED backlight, backlight scanning, backlight local dimming, motion blur, contrast ratio, low power.

17 citations


Journal ArticleDOI
TL;DR: A predictive-flow-queue-based packet interface architecture is presented, which adjusts the operating frequency of different functional blocks at a fine granularity so as to minimize the total system energy dissipation while attaining performance goals.
Abstract: This paper presents an energy-efficient packet interface architecture and a power management technique for gigabit Ethernet controllers, where low latency and high bandwidth are required to meet the pressing demands of very high frame-rate data. More specifically, a predictive-flow-queue (PFQ)-based packet interface architecture is presented, which adjusts the operating frequency of different functional blocks at a fine granularity so as to minimize the total system energy dissipation while attaining performance goals. A key feature of the proposed architecture is the implementation of a runtime workload prediction method for the network traffic along with a continuous frequency adjustment mechanism, which enables one to eliminate the latency and energy penalties associated with discrete power mode transitions. Furthermore, a stochastic modeling framework based on Markovian decision processes and queuing models is employed, which makes it possible to adopt a precise mathematical programming formulation for the energy optimization under performance constraints. Experimental results with a designed 65-nm Gb Ethernet controller show that the proposed interface architecture and continuous frequency scaling result in system-wide energy savings while meeting performance specifications.

14 citations


Journal ArticleDOI
TL;DR: A stochastic framework to improve the accuracy of decision making during dynamic power management, while considering manufacturing process and/or environment induced uncertainties is presented.
Abstract: This paper tackles the problem of dynamic power management (DPM) in nanoscale CMOS design technologies that are typically affected by increasing levels of process and temperature variations and fluctuations due to the randomness in the behavior of silicon structure. This uncertainty undermines the accuracy and effectiveness of traditional DPM approaches. This paper presents a stochastic framework to improve the accuracy of decision making during dynamic power management, while considering manufacturing process and/or environment induced uncertainties. More precisely, variability and uncertainty at the system level are captured by a partially observable semi-Markov decision process with interval-based definition of states while the policy optimization problem is formulated as a mathematical program based on this model. Experimental results with a RISC processor in 65-nm technology demonstrate the effectiveness of the technique and show that the proposed uncertainty-aware power management technique ensures system-wide energy savings under statistical circuit parameter variations.

11 citations


Journal ArticleDOI
TL;DR: Two methods for on-chip serial communication are presented in which the clocks of the transmitter and the receiver are generated with two separate ring oscillators that are identical but can have a small frequency difference; in the second method, a single physical line is used to transmit both data and control information, further reducing the power dissipation.
Abstract: This paper presents two novel methods for on-chip serial communication in which the clocks of the transmitter and the receiver are generated with two separate ring oscillators. These oscillators are identical, although they can have a small frequency difference. In the first method, a strobe line that toggles exactly once with every frame of n-bit data is used to activate the oscillators. Local counters are used to count the number of bits in the data frame and to stop the local oscillators when the frame has been processed. In the second method, a single physical line is used to transmit both data and (in-band) control information, further reducing the power dissipation. The data transmission is controlled by the output of a starter flip-flop that indicates the empty/full state of an input buffer, whereas the data reception is controlled by the decoding of a "1" start bit and a "0" end bit, both of which are added to the n-bit data word to form a frame. Circuit simulation results demonstrate that both communication methods have high bandwidth and low power dissipation.
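
The framing used by the second method (a "1" start bit and a "0" end bit added to the n-bit data word) can be sketched in software as follows. This models only the frame format described in the abstract, not the ring oscillators or the circuit; the MSB-first bit order and the function names are assumptions.

```python
def encode_frame(word, n_bits):
    """Build a frame: start bit 1, then n data bits (MSB first, assumed), then end bit 0."""
    if not 0 <= word < (1 << n_bits):
        raise ValueError("word does not fit in n_bits")
    data = [(word >> i) & 1 for i in reversed(range(n_bits))]
    return [1] + data + [0]


def decode_frame(frame, n_bits):
    """Check the start and end bits and recover the n-bit data word."""
    if len(frame) != n_bits + 2:
        raise ValueError("unexpected frame length")
    if frame[0] != 1 or frame[-1] != 0:
        raise ValueError("missing start or end bit")
    word = 0
    for bit in frame[1:-1]:
        word = (word << 1) | bit
    return word
```

The receiver can therefore detect the beginning of a frame from the line itself, which is what removes the need for a separate strobe line in this sketch.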

Journal ArticleDOI
TL;DR: A forecasting-based dynamic virtual channel allocation technique for reducing the power consumption of network on chips is introduced based on an exponential smoothing forecasting method that filters out short-term traffic fluctuations.
Abstract: In this paper, a forecasting-based dynamic virtual channel allocation technique for reducing the power consumption of network on chips is introduced. Based on the network traffic as well as past link and total virtual channel utilizations, the technique dynamically forecasts the number of virtual channels that should be active. It is based on an exponential smoothing forecasting method that filters out short-term traffic fluctuations. In this technique, for low (high) traffic loads, a small (large) number of VCs are allocated to the corresponding input channel. To assess the efficacy of the proposed method, the network on chip has been simulated using uniform, transpose, hotspot, NED, and realistic GSM voice codec traffic profiles. Simulation results show that up to a 35% reduction in the buffer power consumption and up to 20% savings in the overall router power consumption may be achieved. The area and power dissipation overheads of the technique are negligible.
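
A minimal sketch of forecasting-based virtual channel (VC) allocation with simple exponential smoothing is shown below. The smoothing factor `alpha`, the use of a single utilization value per interval, and the rule that maps the forecast to a VC count are assumptions for illustration; the paper's forecast also draws on link and total VC utilizations, which are not modeled here.

```python
import math


def smooth(previous_forecast, observation, alpha=0.3):
    """One step of simple exponential smoothing."""
    return alpha * observation + (1 - alpha) * previous_forecast


def forecast_active_vcs(utilizations, max_vcs=4, alpha=0.3):
    """Map smoothed utilization (0.0 to 1.0) to a number of active VCs per interval."""
    if not utilizations:
        return []
    forecast = utilizations[0]  # seed the forecast with the first observation
    allocations = []
    for u in utilizations:
        forecast = smooth(forecast, u, alpha)
        # Keep at least one VC active and never exceed the physical maximum.
        vcs = max(1, min(max_vcs, math.ceil(forecast * max_vcs)))
        allocations.append(vcs)
    return allocations
```

Because the forecast is a weighted average of past observations, a one-interval traffic burst raises the allocation only partially, which is the filtering of short-term fluctuations that the abstract describes.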

Proceedings ArticleDOI
20 Apr 2009
TL;DR: Experimental results indicate that adaptive waveform representation results in higher compression ratios than representing the waveforms as a function of a fixed set of basis functions.
Abstract: This paper describes a waveform compression technique suitable for the efficient utilization, storage and interchange of the emerging current source model (CSM) based cell libraries. The technique is based on pre-processing a collection of voltage/current waveforms for the cells in the library and then constructing an orthogonal time-voltage/time-current waveform basis using singular-value decomposition. Compression is achieved by representing all waveforms as linear combination coefficients of an adaptive subset of the basis waveforms. Experimental results indicate that adaptive waveform representation results in higher compression ratios than representing the waveforms as a function of a fixed set of basis functions. Interpolation and further compression are obtained by representing the coefficients as simple functions of various parameters, e.g., input slew, load capacitance, supply voltage, and temperature. The methods introduced in this paper are tested and validated on several industrial-strength libraries, with spectacular compression results.

Proceedings ArticleDOI
04 Oct 2009
TL;DR: Wrong-path instruction Clock Gating (WPCG) detects wrong-path instructions in the event of branch mis-prediction and prevents them from being issued to the FUs, and subsequently, disables the clock of these FUs along with reducing the stress on register file and cache.
Abstract: In this paper we present deterministic clock gating schemes for various microarchitectural blocks of a modern out-of-order superscalar processor. We propose to make use of 1) idle stages of the pipelined functional units (FUs) and 2) wrong-path instruction execution during branch mis-prediction, in order to clock gate various stages of FUs. The baseline Pipelined Functional unit Clock Gating (PFCG), presented for evaluation purposes only, disables the clock on idle stages and thus results in 13.93% chip-wide energy savings. Wrong-path instruction Clock Gating (WPCG) detects wrong-path instructions in the event of branch mis-prediction and prevents them from being issued to the FUs, and subsequently disables the clock of these FUs along with reducing the stress on the register file and cache. Simulations demonstrate that more than 92% of all wrong-path instructions can be detected and stopped from being executed. The WPCG architecture results in 16.26% chip-wide energy savings, which is 2.33% more than that of the baseline PFCG scheme.

Proceedings ArticleDOI
10 May 2009
TL;DR: This presentation will first explain what is meant by green computing and how greenness of information processing may be quantified, and energy-efficient computing paradigms which utilize chip multi-processing, multiple-voltage domains, dynamic voltage/frequency scaling, and power/clock gating techniques will be reviewed.
Abstract: Digital information management is the key enabler for the unprecedented rise in productivity and efficiency gains experienced by the world economies during the 21st century. Information processing systems have thus become essential to the functioning of business, service, academic, and governmental institutions. As institutions increase their offerings of digital information services, the demand for computation and storage capability also increases. Examples include online banking, e-filing of taxes, music and video downloads, online shipment tracking, real-time inventory/supply-chain management, electronic medical recording, insurance database management, surveillance and disaster recovery. It is estimated that, in some industries, the number of records that must be retained is growing at a CAGR of 50 percent or greater. This exponential increase in the digital intensity of human existence is driven by many factors, including ease of use and availability of a rich set of information technology (IT) devices and services. Indeed, it would be difficult to imagine how significant societal transformations that better our world could occur without the productivity and innovation enabled by IT. Unfortunately, the energy cost and carbon footprint of IT devices and services have become exorbitant. Moreover, current technological and digital service utilization trends result in a doubling of the energy cost of the IT infrastructure and its carbon footprint in less than five years. In an energy-constrained world, this consumption trend is unsustainable and comes at increasingly unacceptable societal and environmental costs. This presentation will first explain what is meant by green computing and how greenness of information processing may be quantified. Next, energy-efficient computing paradigms which utilize chip multi-processing, multiple-voltage domains, dynamic voltage/frequency scaling, and power/clock gating techniques will be reviewed. Finally, techniques for improving performance per Watt of large-scale information processing and storage systems (e.g., a data center), including hierarchical dynamic power management, task placement and scheduling, energy balancing, resource virtualization, and application optimizations that dynamically configure hardware for higher efficiency will be discussed.

Journal ArticleDOI
TL;DR: The problem of low-power fanout optimization is reduced to an inverter-chain optimization problem, and the minimization of the total power consumption of an inverter chain is formulated as a geometric program.
Abstract: This paper addresses the problem of low-power fanout optimization for near-continuous-size inverter libraries. It is demonstrated that, because they neglect short-circuit current, previous techniques proposed to optimize the area of a fanout tree may result in excessive power consumption. This paper describes how the problem of low-power fanout optimization can be reduced to an inverter-chain optimization problem and formulates the minimization of the total power consumption of an inverter chain as a geometric program. Moreover, it describes an efficient method to minimize the total power consumption of a fanout tree by using multiple-channel-length (multi-Lgate) and multiple-threshold-voltage (multi-Vt) techniques. Experimental results show that the proposed technique can reduce the power consumption of the fanout trees by an average of 11.17% over the SIS fanout-optimization program.

Proceedings ArticleDOI
11 Jan 2009
TL;DR: The notion of network durability is presented, which captures the spatiotemporal life/death patterns of devices in a wireless network by examining the time evolution of spatial patterns according to which devices are progressively forced to exit the network having exhausted their energy resource.
Abstract: Given the criticality of energy awareness in wireless networks, it has become essential to devise an improved definition of the network lifetime at the system design stage. The new definition must capture the life profile of the network while accounting for its functionality and specific design parameters. This paper presents the notion of network durability, which captures the spatiotemporal life/death patterns of devices in a wireless network by examining the time evolution of spatial patterns according to which devices are progressively forced to exit the network having exhausted their energy resource. Using network durability, we show how networks can satisfy different levels of monitoring criticality, even when they exhibit the same conventionally defined lifetime. Finally, as an example application, we consider a heterogeneous location-aware modulation scheme where the proposed durability model is effectively employed to characterize the network lifetime.

01 Jan 2009
TL;DR: Experimental results with a designed 65-nm Gb Ethernet controller show that the proposed interface architecture and continuous frequency scaling result in system-wide energy savings while meeting performance specifications.
Abstract: This paper presents an energy-efficient packet interface architecture and a power management technique for gigabit Ethernet controllers, where low latency and high bandwidth are required to meet the pressing demands of very high frame-rate data. More specifically, a predictive-flow-queue (PFQ)-based packet interface architecture is presented, which adjusts the operating frequency of different functional blocks at a fine granularity so as to minimize the total system energy dissipation while attaining performance goals. A key feature of the proposed architecture is the implementation of a runtime workload prediction method for the network traffic along with a continuous frequency adjustment mechanism, which enables one to eliminate the latency and energy penalties associated with discrete power mode transitions. Furthermore, a stochastic modeling framework based on Markovian decision processes and queuing models is employed, which makes it possible to adopt a precise mathematical programming formulation for the energy optimization under performance constraints. Experimental results with a designed 65-nm Gb Ethernet controller show that the proposed interface architecture and continuous frequency scaling result in system-wide energy savings while meeting performance specifications.