
Showing papers by "Massoud Pedram" published in 2004


Journal Article•DOI•
TL;DR: Two runtime mechanisms for reducing the leakage current of a CMOS circuit are described and a design technique for applying the minimum leakage input to a sequential circuit is presented, which shows that it is possible to reduce the leakage by an average of 25% with practically no delay penalty.
Abstract: The first part of this paper describes two runtime mechanisms for reducing the leakage current of a CMOS circuit. In both cases, it is assumed that the system or environment produces a "sleep" signal that can be used to indicate that the circuit is in a standby mode. In the first method, the "sleep" signal is used to shift a new set of external inputs and pre-selected internal signals into the circuit with the goal of setting the logic values of all of the internal signals so as to minimize the total leakage current in the circuit. This minimization is possible because the leakage current of a CMOS gate is strongly dependent on the input combination applied to its inputs. In the second method, nMOS and pMOS transistors are added to some of the gates in the circuit to increase the controllability of the internal signals of the circuit and decrease the leakage current of the gates using the "stack effect". This is, however, done carefully so that the minimum leakage is achieved subject to a delay constraint for all input-output paths in the circuit. In both cases, Boolean satisfiability is used to formulate the problems, which are subsequently solved by employing a highly efficient SAT solver. Experimental results on the combinational circuits in the MCNC91 benchmark suite demonstrate that it is possible to reduce the leakage current in combinational circuits by an average of 25% with only a 5% delay penalty. The second part of this paper presents a design technique for applying the minimum leakage input to a sequential circuit. The proposed method uses the built-in scan-chains in a VLSI circuit to drive it with the minimum leakage vector when it enters the sleep mode. The use of these scan registers eliminates the area and delay overhead of the additional circuitry that would otherwise be needed to apply the minimum leakage vector to the circuit. Experimental results on the sequential circuits in the MCNC91 benchmark suite show that, by using the proposed method, it is possible to reduce the leakage by an average of 25% with practically no delay penalty.

293 citations
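
The input-vector dependence described in the abstract above can be illustrated with a small, hedged sketch: a brute-force search over the primary inputs of a toy two-gate circuit, using illustrative per-input leakage numbers for a 2-input NAND gate. The paper's actual method encodes the search as a Boolean satisfiability problem so that it scales to real circuits; the netlist and leakage values below are assumptions for illustration only.

from itertools import product

# Illustrative leakage (arbitrary units) of a 2-input NAND gate as a function of
# its input combination; stacked "off" transistors leak far less than a single
# "off" transistor (the stack effect). These numbers are made up.
NAND2_LEAKAGE = {(0, 0): 1.0, (0, 1): 2.6, (1, 0): 2.4, (1, 1): 11.8}

def nand(a, b):
    return 1 - (a & b)

def total_leakage(x):
    # Toy circuit: g1 = NAND(x0, x1); output = NAND(g1, x2).
    g1 = nand(x[0], x[1])
    return NAND2_LEAKAGE[(x[0], x[1])] + NAND2_LEAKAGE[(g1, x[2])]

# Exhaustive search for the minimum leakage vector (feasible only for toy
# circuits; the paper formulates this as a SAT problem instead).
best = min(product((0, 1), repeat=3), key=total_leakage)
print("minimum leakage vector:", best, "-> leakage:", total_leakage(best))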


Proceedings Article•DOI•
09 Aug 2004
TL;DR: In this article, the CPU workload is decomposed into two parts: on-chip and off-chip, and the workload decomposition itself is performed at run time based on statistics reported by a performance monitoring unit (PMU) without a need for application profiling or compiler support.
Abstract: This paper presents a technique called "workload decomposition" in which the CPU workload is decomposed into two parts: on-chip and off-chip. The on-chip workload signifies the CPU clock cycles that are required to execute instructions in the CPU whereas the off-chip workload captures the number of external memory access clock cycles that are required to perform external memory transactions. When combined with a dynamic voltage and frequency scaling (DVFS) technique to minimize the energy consumption, this workload decomposition method results in higher energy savings. The workload decomposition itself is performed at run time based on statistics reported by a performance monitoring unit (PMU) without a need for application profiling or compiler support. We have implemented the proposed DVFS with workload decomposition technique on the BitsyX platform, an Intel PXA255-based platform manufactured by ADS Inc., and performed detailed energy measurements. These measurements show that, for a number of widely used software applications, a CPU energy saving of 80% can be achieved for memory-bound programs while satisfying the user-specified timing constraints.

217 citations
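
The frequency-setting rule implied by this decomposition can be sketched as follows (a simplified model, not the authors' implementation): the on-chip portion of the execution time shrinks in proportion to the clock frequency, while the off-chip (memory) portion stays roughly constant, so the scheduler can pick the lowest frequency that still meets the deadline. The frequency steps and cycle counts below are illustrative assumptions.

def pick_frequency(w_onchip_cycles, t_offchip_s, deadline_s, freqs_hz):
    # Execution time model: T(f) = W_onchip / f + T_offchip, where only the
    # on-chip work stretches when the clock is slowed.
    for f in sorted(freqs_hz):                # try the slowest setting first
        if w_onchip_cycles / f + t_offchip_s <= deadline_s:
            return f
    return max(freqs_hz)                      # no slack: run at full speed

# Example: 40 M on-chip cycles, 30 ms of external memory time, 250 ms deadline,
# PXA255-like frequency steps (all numbers are illustrative).
print(pick_frequency(40e6, 0.030, 0.250, [100e6, 200e6, 300e6, 400e6]))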


Journal Article•DOI•
TL;DR: This paper presents a concurrent brightness and contrast scaling (CBCS) technique for a cold cathode fluorescent lamp (CCFL) backlit TFT-LCD display and proposes the contrast distortion metric to quantify the image quality loss after backlight scaling.
Abstract: This paper presents a concurrent brightness and contrast scaling (CBCS) technique for a cold cathode fluorescent lamp (CCFL) backlit TFT-LCD display. The proposed technique aims at conserving power by reducing the backlight illumination while retaining the image fidelity through preservation of the image contrast. First, we explain how CCFL works and show how to model the non-linearity between its backlight illumination and power consumption. Next, we propose the contrast distortion metric to quantify the image quality loss after backlight scaling. Finally, we formulate and optimally solve the CBCS optimization problem subject to contrast distortion. Experimental results show that an average of 3.7X power saving can be achieved with only 10% of contrast distortion.

193 citations
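
The basic trade-off can be sketched with a toy version of backlight scaling (not the CBCS formulation itself): dim the backlight to a fraction of full brightness, boost the pixel transmittance to compensate, and measure how much of the image is lost to saturation. The per-pixel error below is a simple stand-in for the paper's contrast distortion metric, and the test image is synthetic.

import numpy as np

def scale_backlight(image_u8, dim):
    # Perceived luminance ~ backlight * transmittance, so boosting the pixel
    # values by 1/dim compensates for a dimmed backlight, except where the
    # boosted values saturate at full scale.
    img = image_u8.astype(float) / 255.0
    compensated = np.clip(img / dim, 0.0, 1.0)
    perceived = compensated * dim
    distortion = float(np.mean(np.abs(perceived - img)))   # toy metric only
    return (compensated * 255).astype(np.uint8), distortion

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(480, 640), dtype=np.uint8)
_, d = scale_backlight(frame, dim=0.6)
print(f"mean luminance error at 60% backlight: {d:.4f}")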


Proceedings Article•DOI•
16 Feb 2004
TL;DR: The proposed DVFS technique relies on dynamically-constructed regression models that allow the CPU to calculate the expected workload and slack time for the next time slot, and thus, adjust its voltage and frequency in order to save energy while meeting soft timing constraints.
Abstract: This paper presents an intra-process dynamic voltage and frequency scaling (DVFS) technique targeted toward non real-time applications running on an embedded system platform. The key idea is to make use of runtime information about the external memory access statistics in order to perform CPU voltage and frequency scaling with the goal of minimizing the energy consumption while translucently controlling the performance penalty. The proposed DVFS technique relies on dynamically-constructed regression models that allow the CPU to calculate the expected workload and slack time for the next time slot, and thus, adjust its voltage and frequency in order to save energy while meeting soft timing constraints. This is in turn achieved by estimating and exploiting the ratio of the total off-chip access time to the total on-chip computation time. The proposed technique has been implemented on an XScale-based embedded system platform and actual energy savings have been calculated by current measurements in hardware. For memory-bound programs, a CPU energy saving of more than 70% with a performance degradation of 12% was achieved. For CPU-bound programs, a 15% to 60% CPU energy saving was achieved at the cost of a 5-20% performance penalty.

149 citations
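
A minimal sketch of the regression idea, under the assumption that the next slot's on-chip workload can be fitted against a single PMU-reported predictor (instruction count is used here purely for illustration); the paper's actual models and predictor set may differ.

class WorkloadRegressor:
    """Running least-squares fit of on-chip workload (cycles) against one
    PMU-reported predictor over recent time slots; a simplified stand-in for
    the paper's dynamically constructed regression models."""
    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, x, y):
        self.n += 1
        self.sx += x; self.sy += y
        self.sxx += x * x; self.sxy += x * y

    def predict(self, x):
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            return self.sy / self.n if self.n else 0.0
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope * x + intercept

reg = WorkloadRegressor()
for insns, cycles in [(1.0e6, 1.9e6), (2.0e6, 4.1e6), (3.0e6, 6.0e6)]:
    reg.update(insns, cycles)
print("predicted on-chip cycles for 4M instructions:", reg.predict(4.0e6))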


Proceedings Article•DOI•
16 Feb 2004
TL;DR: This paper presents a concurrent brightness and contrast scaling (CBCS) technique for a cold cathode fluorescent lamp (CCFL) backlit TFT-LCD display, and proposes the contrast distortion metric to quantify the image quality loss after backlight scaling.
Abstract: This paper presents a concurrent brightness and contrast scaling (CBCS) technique for a cold cathode fluorescent lamp (CCFL) backlit TFT-LCD display. The proposed technique aims at conserving power by reducing the backlight illumination while retaining the image fidelity through preservation of the image contrast. First, we explain how CCFL works and show how to model the non-linearity between its backlight illumination and power consumption. Next, we propose the contrast distortion metric to quantify the image quality loss after backlight scaling. Finally, we formulate and optimally solve the CBCS optimization problem with the objective of minimizing the fidelity and power metrics. Experimental results show that an average of 3.7X power saving can be achieved with only 10% of contrast distortion.

115 citations


Journal Article•DOI•
TL;DR: In this article, a backlight power management framework and trade-offs in the extended dynamic-luminance-scaling design space in terms of energy reduction, performance penalty, and image quality are explored.
Abstract: Thin-film transistor liquid-crystal displays are systems widely used to support full-featured multimedia. For such systems, backlight is a major source of power dissipation. This article introduces a backlight power management framework and explores trade-offs in the extended dynamic-luminance-scaling design space in terms of energy reduction, performance penalty, and image quality.

95 citations


Proceedings Article•DOI•
07 Nov 2004
TL;DR: This work presents a dynamic voltage and frequency scaling (DVFS) technique that minimizes the total system energy consumption for performing a task while satisfying a given execution time constraint and implements this technique on the BitsyX platform.
Abstract: This work presents a dynamic voltage and frequency scaling (DVFS) technique that minimizes the total system energy consumption for performing a task while satisfying a given execution time constraint. We first show that in order to guarantee minimum energy for task execution by using DVFS, it is essential to divide the system power into active and standby power components. Next, we present a new DVFS technique, which considers not only the active power, but also the standby component of the system power. This is in sharp contrast with previous DVFS techniques, which only consider the active power component. We have implemented the proposed DVFS technique on the BitsyX platform, an Intel PXA255-based platform manufactured by ADS Inc., and report detailed power measurements on this platform. These measurements show that, compared to conventional DVFS techniques, an additional system energy saving of up to 18% can be achieved while satisfying the user-specified timing constraints.

70 citations
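
The effect of the standby component can be seen in a small, hedged energy model (illustrative power numbers, not measured BitsyX values): while the task runs, the CPU and the rest of the system draw power; once it finishes, the system drops to a low sleep power until the deadline. Running at the lowest feasible frequency is then no longer automatically optimal.

P_REST = 0.50    # W drawn by memory/peripherals while the task runs (assumed)
P_SLEEP = 0.05   # W drawn by the whole system after the task completes (assumed)

def best_setting(w_cycles, deadline_s, cpu_settings):
    # cpu_settings: list of (freq_hz, cpu_active_power_w) pairs (illustrative).
    # E(f) = (P_cpu(f) + P_REST) * T_exec + P_SLEEP * (deadline - T_exec)
    best = None
    for f, p_cpu in cpu_settings:
        t_exec = w_cycles / f
        if t_exec > deadline_s:
            continue                         # this setting misses the deadline
        energy = (p_cpu + P_REST) * t_exec + P_SLEEP * (deadline_s - t_exec)
        if best is None or energy < best[1]:
            best = (f, energy)
    return best

settings = [(100e6, 0.20), (200e6, 0.50), (400e6, 1.80)]
print(best_setting(30e6, 0.5, settings))     # the middle setting wins here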


Proceedings Article•DOI•
27 Jan 2004
TL;DR: This work presents an adaptive and incremental re-compression technique to maintain efficiency under frequent partial frame buffer updates, based on a run-length encoding for on-the-fly compression, with a negligible burden in resources and time.
Abstract: Despite the limited power available in a battery-operated hand-held device, a display system must still have sufficient resolution and color depth to deliver the necessary information. We introduce some methodologies for frame buffer compression that efficiently reduce the power consumption of display systems and thus distinctly extend battery life for hand-held applications. Our algorithm is based on a run-length encoding for on-the-fly compression, with a negligible burden in resources and time. We present an adaptive and incremental re-compression technique to maintain efficiency under frequent partial frame buffer updates. We reduce frame buffer activity by about 30% to 90% on average for various hand-held applications. We have implemented an LCD controller with frame buffer compression occupying 1,026 slices and 960 flip-flops in a Xilinx Spartan-II FPGA, which has an equivalent gate count of 65,000 gates. It consumes 30mW more power and occupies 10% more silicon area than an LCD controller without frame buffer compression, but reduces the power consumption of the frame buffer memory by 400mW.

56 citations
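
A minimal run-length encoder/decoder illustrating the kind of on-the-fly compression the abstract refers to; the adaptive, incremental re-compression under partial frame buffer updates is not shown.

def rle_encode(pixels):
    # Run-length encode a row of pixel values as [value, run] pairs.
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return runs

def rle_decode(runs):
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

row = [0, 0, 0, 255, 255, 17, 17, 17, 17]
encoded = rle_encode(row)
assert rle_decode(encoded) == row
print(encoded)  # [[0, 3], [255, 2], [17, 4]]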


Proceedings Article•DOI•
07 Jun 2004
TL;DR: A prediction error compensation method, called inter-frame compensation, is proposed in which the on-chip workload prediction error is diffused into subsequent frames such that run time frame rates change smoothly.
Abstract: This paper describes a dynamic voltage and frequency scaling (DVFS) technique for MPEG decoding to reduce the energy consumption using the computational workload decomposition. This technique decomposes the workload for decoding a frame into on-chip and off-chip workloads. The execution time required for the on-chip workload is CPU frequency-dependent, whereas the off-chip workload execution time does not change, regardless of the CPU frequency, resulting in the maximum energy savings by setting the minimum frequency during off-chip workload execution time, without causing any delay penalty. This workload decomposition is performed using a performance-monitoring unit (PMU) in the XScale processor, which provides various statistics at run time, such as cache hits/misses and CPU stalls due to data dependencies. The on-chip workload for an incoming frame is predicted using a frame-based history so that the processor voltage and frequency can be scaled to provide the exact amount of computing power needed to decode the frame. To guarantee a quality of service (QoS) constraint, a prediction error compensation method, called inter-frame compensation, is proposed in which the on-chip workload prediction error is diffused into subsequent frames such that run-time frame rates change smoothly. The proposed DVFS algorithm has been implemented on an XScale-based testbed. Detailed current measurements on this platform demonstrate significant CPU energy savings ranging from 50% to 80% depending on the video clip.

42 citations
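
A simplified sketch of the prediction-with-compensation idea: a moving average of recent frames plus a fraction of the last prediction error carried forward. The history length, diffusion factor, and workload numbers are assumptions, not the paper's parameters.

from collections import deque

class FramePredictor:
    def __init__(self, history=4, diffuse=0.5):
        self.hist = deque(maxlen=history)   # recent on-chip workloads (cycles)
        self.diffuse = diffuse              # fraction of error carried forward
        self.carry = 0.0

    def predict(self):
        if not self.hist:
            return 0.0
        return max(sum(self.hist) / len(self.hist) + self.carry, 0.0)

    def observe(self, actual):
        # Diffuse the prediction error into the next frame's budget so that
        # the frame rate changes smoothly rather than abruptly.
        if self.hist:
            self.carry = self.diffuse * (actual - self.predict())
        self.hist.append(actual)

frames = [8e6, 9e6, 7e6, 12e6, 10e6]        # per-frame on-chip cycles (made up)
pred = FramePredictor()
pred.observe(frames[0])
for cycles in frames[1:]:
    print(f"predicted {pred.predict():.2e}  actual {cycles:.2e}")
    pred.observe(cycles)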


Proceedings Article•DOI•
21 Mar 2004
TL;DR: This paper presents a lifetime-aware multicast routing algorithm that maximizes the ad hoc network lifetime by finding routing solutions that minimize the variance of the remaining energies of the nodes in the network.
Abstract: One of the main design constraints in mobile ad hoc networks (MANETs) is their limited energy supply. Hence, routing algorithms must be developed with the energy consumption of the nodes in the network as a primary concern. In MANETs, every node has to perform the functions of a router, so if some nodes die early due to lack of energy and/or the network becomes fragmented, it will not be possible for other nodes in the network to communicate with each other. This paper presents a lifetime-aware multicast routing algorithm that maximizes the ad hoc network lifetime by finding routing solutions that minimize the variance of the remaining energies of the nodes in the network. Extensive simulation results are provided to evaluate the performance of the new routing algorithm with respect to a number of different metrics and in comparison with a variety of existing multicast routing algorithms.

41 citations
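
The selection criterion can be sketched as follows: among candidate multicast trees, prefer the one that leaves the nodes' residual energies most balanced. Candidate-tree construction and the actual routing protocol are omitted, and the energy figures are illustrative.

from statistics import pvariance

def residual_variance(energy, tree_cost):
    # Variance of the nodes' remaining energy if this multicast tree is used.
    # energy: {node: joules remaining}; tree_cost: {node: joules this tree
    # would spend at that node} (forwarding nodes only).
    remaining = [energy[n] - tree_cost.get(n, 0.0) for n in energy]
    return pvariance(remaining)

def pick_tree(energy, candidate_trees):
    # Illustrative selection rule, not the paper's protocol.
    return min(candidate_trees, key=lambda t: residual_variance(energy, t))

energy = {"a": 5.0, "b": 2.0, "c": 4.0, "d": 4.5}
trees = [{"a": 1.0, "b": 1.0},       # routes through the weakest node b
         {"a": 1.0, "c": 1.0}]       # spares b
print(pick_tree(energy, trees))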


Journal Article•DOI•
TL;DR: This paper presents a low-power encoding technique, called chromatic encoding, for the digital visual interface standard (DVI), a digital serial video interface that reduces power consumption by minimizing the transition counts on the DVI.
Abstract: This paper presents a low-power encoding technique, called chromatic encoding, for the digital visual interface standard (DVI), a digital serial video interface. Chromatic encoding reduces power consumption by minimizing the transition counts on the DVI. This technique relies on the notion of tonal locality, i.e., the observation - first made in this paper - that the signal differences between adjacent pixels in images follow a Gaussian distribution. Based on this observation, an optimal code assignment is performed to minimize the transition counts. Furthermore, the three-color channels of the DVI may be reciprocally encoded to achieve even more power saving. The idea is that given the signal values from the three-color channels, one or two of these channels are encoded by reciprocal differences with a number of redundant bits used to indicate the selection. The channel selection problem is formulated as a minimum spanning tree problem and solved accordingly. The proposed technique requires only three redundant bits for each 24-bit pixel. Experimental results show up to a 75% power reduction in the DVI.
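
The "tonal locality" observation can be checked directly: differences between adjacent pixels in a scanline cluster tightly around zero, which is what makes a transition-minimizing code assignment for the differences worthwhile. The scanline below is synthetic; the paper's optimal code assignment and channel selection steps are not shown.

from collections import Counter

def difference_histogram(scanline):
    # Adjacent-pixel differences; on natural images these follow a narrow,
    # roughly Gaussian distribution centered at zero.
    return Counter(b - a for a, b in zip(scanline, scanline[1:]))

scanline = [100, 101, 103, 102, 104, 104, 105, 103, 102, 104]
print(sorted(difference_histogram(scanline).items()))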

Journal Article•DOI•
TL;DR: A class of irredundant low-power techniques is introduced for encoding instruction or data source words before they are transmitted over buses; these sector-based techniques are quite effective in reducing the number of interpattern transitions on the bus, while incurring rather small power and delay overheads.
Abstract: In this paper, we introduce a class of irredundant low-power techniques for encoding instruction or data source words before they are transmitted over buses. The key idea is to partition the source-word space into a number of sectors with unique identifiers called sector heads. These sectors can, for example, correspond to address spaces for the code, heap, and stack segments of one or more application programs. Each source word is then dynamically mapped to the appropriate sector and is encoded with respect to the sector head. In general, the sectors may be determined a priori or can dynamically be updated based on the source word that was last encountered in that sector. These sector-based encoding techniques are quite effective in reducing the number of interpattern transitions on the bus, while incurring rather small power and delay overheads. For a computer system without an on-chip cache, the proposed techniques decrease the switching activity of data address and multiplexed address buses by an average of 55% and 67%, respectively. For a system with on-chip cache, up to 55% transition reduction is achieved on a multiplexed address bus between the internal cache and the external memory. Assuming a 10 pF per line bus capacitance, we show that, by using the proposed encoding techniques, a power reduction of up to 52% can be achieved for an external data address bus and 42% for the multiplexed bus between cache and main memory.
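
A hedged sketch of the sector idea (simplified: static sector heads and XOR-based offsets, whereas the paper's techniques are irredundant and can update sectors dynamically): when a trace interleaves accesses to distant regions such as the heap and the stack, sending each address relative to its sector head keeps the high-order bus lines quiet.

def transitions(words, width=32):
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(words, words[1:]))

# Illustrative sector heads for code, heap and stack address regions (assumed).
SECTORS = [0x0000_0000, 0x1000_0000, 0x7FFF_0000]

def sector_encode(addr):
    # Encode an address relative to its nearest sector head (simplified).
    head = max((h for h in SECTORS if h <= addr), default=0)
    return addr ^ head

# A trace that alternates between stack and heap accesses toggles many
# high-order bits when sent raw, but stays quiet once sector-relative.
trace = [0x7FFF_0010, 0x1000_0040, 0x7FFF_0014, 0x1000_0044, 0x7FFF_0018]
print("raw bus transitions    :", transitions(trace))
print("sector-relative on bus :", transitions([sector_encode(a) for a in trace]))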

Proceedings Article•DOI•
27 Jan 2004
TL;DR: This paper presents a solution to the problem of designing interconnects for memory devices as an over-the-cell channel routing problem under pre-specified routing topologies and performance constraints, and proposes TANAR, a routing method that significantly reduces both crosstalk for noise-sensitive nets and delay for timing-critical nets while minimizing channel height.
Abstract: This paper presents a solution to the problem of designing interconnects for memory devices. More precisely, it solves the automatic routing problem of memory peripheral circuits as an over-the-cell channel routing problem under pre-specified routing topologies and performance constraints. The proposed routing method, named TANAR, consists of two steps: a performance-driven net partitioning step, which constructs a routing topology for each net according to performance constraints, and a performance-driven track assignment step, which reduces the crosstalk noise. Experimental results demonstrate that TANAR significantly reduces both crosstalk for noise-sensitive nets and delay for timing-critical nets, while minimizing channel height.


Proceedings Article•DOI•
16 Feb 2004
TL;DR: This paper addresses a few fundamental issues that make the design process particularly challenging and offers a holistic perspective towards a coherent design methodology.
Abstract: Multimedia systems play a central part in many human activities. Due to the significant advances in the VLSI technology, there is an increasing demand for portable multimedia appliances capable of handling advanced algorithms required in all forms of communication. Over the years, we have witnessed a steady move from standalone (or desktop) multimedia to deeply distributed multimedia systems. Whereas desktop-based systems are mainly optimized based on the performance constraints, power consumption is the key design constraint for multimedia devices that draw their energy from batteries. The overall goal of successful design is then to find the best mapping of the target multimedia application onto the architectural resources, while satisfying an imposed set of design constraints (e.g. minimum power dissipation, maximum performance) and specified QoS metrics (e.g. end-to-end latency, jitter, loss rate) which directly impact the media quality. This paper addresses a few fundamental issues that make the design process particularly challenging and offers a holistic perspective towards a coherent design methodology.

Book Chapter•DOI•
01 Jan 2004
TL;DR: This chapter presents transmittance scaling, a technique aimed at conserving power in a transmissive TFT-LCD with a cold cathode fluorescent lamp (CCFL) backlight by reducing the backlight illumination while compensating for the luminance loss.
Abstract: This chapter presents transmittance scaling, a technique aimed at conserving power in a transmissive TFT-LCD with a cold cathode fluorescent lamp (CCFL) backlight by reducing the backlight illumination while compensating for the luminance loss. This goal is accomplished by adjusting the transmittance function of the TFT-LCD panel while meeting an upper bound on a contrast distortion metric. Experimental results show that an average of 3.7X power saving can be achieved for still images with a mere 10% contrast distortion.

Proceedings Article•DOI•
27 Jan 2004
TL;DR: A new synthesis flow for anti-fuse based FPGAs with multiple-output logic cells is presented, in which both mapping and packing are solved by dynamic programming-based approaches, and results are provided to assess the effectiveness of the proposed mapping and packing techniques.
Abstract: We present a new synthesis flow for anti-fuse based FPGAs with multiple-output logic cells. The flow consists of two steps: mapping and packing. The mapper finds mapping solutions using a dynamic programming-based approach that identifies the best match at each node of the decomposed target circuit. After this mapping step is completed, the resulting netlist of cells is optimally packed into a netlist of logic cells by using a multi-dimensional coin change problem formulation, which is again solved by a dynamic programming-based approach. Experimental results for Quicklogic's pASIC3 logic family are provided to assess the effectiveness of the proposed mapping and packing techniques.
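
The packing step's multi-dimensional coin change formulation can be sketched with memoized dynamic programming: the "coins" are the combinations of mapped cell types that one physical logic cell can absorb, and the goal is to cover the mapped netlist's demand vector with as few logic cells as possible. The capacity vectors below are invented for illustration and do not describe QuickLogic's pASIC3 cell.

from functools import lru_cache

# "Coins": combinations of mapped cell types (here two types, A and B) that
# one physical logic cell can absorb (illustrative capacity vectors).
COINS = [(2, 0), (0, 1), (1, 1)]

@lru_cache(maxsize=None)
def min_cells(demand):
    # Minimum number of logic cells whose capacity vectors sum exactly to
    # `demand`; returns inf if the demand cannot be met exactly.
    if all(d == 0 for d in demand):
        return 0
    best = float("inf")
    for coin in COINS:
        if all(c <= d for c, d in zip(coin, demand)):
            rest = tuple(d - c for c, d in zip(coin, demand))
            best = min(best, 1 + min_cells(rest))
    return best

# Pack 5 mapped cells of type A and 3 of type B.
print(min_cells((5, 3)))   # -> 4 logic cells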

Patent•
20 Sep 2004
TL;DR: In this paper, a method for reducing transitions on a bus that includes receiving an input trace and constructing a Markov source correlating to the input trace is presented, which can either minimize or maximize an objective function associated with the trace.
Abstract: A method for reducing transitions on a bus is provided that includes receiving an input trace and constructing a Markov source correlating to the input trace. The method also includes identifying an encoding technique, which can either minimize or maximize an objective function associated with the input trace.
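
A minimal sketch of the first step described in the claim, constructing a first-order Markov source (empirical next-symbol probabilities) from an observed trace; the subsequent selection of an encoding technique that minimizes or maximizes the objective function is not shown.

from collections import Counter, defaultdict

def markov_source(trace):
    # For each observed symbol, estimate the empirical probability of the
    # symbol that follows it in the trace.
    counts = defaultdict(Counter)
    for prev, cur in zip(trace, trace[1:]):
        counts[prev][cur] += 1
    return {s: {t: n / sum(c.values()) for t, n in c.items()}
            for s, c in counts.items()}

trace = ["A", "B", "A", "B", "B", "A", "C", "A", "B"]
for state, dist in markov_source(trace).items():
    print(state, dist)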

Proceedings Article•DOI•
26 Apr 2004
TL;DR: Experimental results show that on a large industrial circuit, using a state-of-the-art commercial timing analysis tool that incorporates TFA, the algorithm was able to achieve delay and slew estimation accuracies quite comparable to those of full-blown AWE-based calculators at runtimes that were only 14% higher than those of a simple Elmore-delay calculator.
Abstract: This paper describes an efficient threshold-based filtering algorithm (TFA) for calculating the interconnect delay and slew (transition time) in high-speed VLSI circuits. The key idea is to divide the circuit nets into three groups of low, medium and high complexity nets, whereby for low and medium complexity nets either the first moment of the impulse response or the first and second moments are used. For the high-complexity nets, which are encountered infrequently, TFA resorts to the AWE method. The key contribution of the paper is a very effective and efficient way of classifying the nets into these three groups. Experimental results show that, on a large industrial circuit using a state-of-the-art commercial timing analysis tool that incorporates TFA, we were able to achieve delay and slew estimation accuracies quite comparable to those of full-blown AWE-based calculators, at runtimes that were only 14% higher than those of a simple Elmore-delay calculator.
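
For the low-complexity nets, the first moment of the impulse response is the classical Elmore delay, which can be computed for an RC tree in two linear passes, as in the hedged sketch below (the tree, resistances, and capacitances are illustrative; the second-moment and AWE paths used for more complex nets are not shown).

def _depth(i, parent):
    d = 0
    while parent[i] >= 0:
        i = parent[i]
        d += 1
    return d

def elmore_delays(parent, r_edge, cap):
    # parent[i]: index of node i's parent (-1 for the driver/root)
    # r_edge[i]: resistance of the segment from parent[i] to node i
    # cap[i]   : capacitance lumped at node i
    n = len(parent)
    # Pass 1 (leaves up): downstream capacitance seen at each node.
    c_down = list(cap)
    for i in sorted(range(n), key=lambda k: -_depth(k, parent)):
        if parent[i] >= 0:
            c_down[parent[i]] += c_down[i]
    # Pass 2 (root down): Elmore delay = parent's delay + R_edge * C_downstream.
    delay = [0.0] * n
    for i in sorted(range(n), key=lambda k: _depth(k, parent)):
        if parent[i] >= 0:
            delay[i] = delay[parent[i]] + r_edge[i] * c_down[i]
    return delay

# Driver (node 0) -> node 1 -> node 2; R in ohms, C in farads (illustrative).
print(elmore_delays(parent=[-1, 0, 1], r_edge=[0.0, 100.0, 50.0],
                    cap=[0.0, 1e-12, 2e-12]))   # -> [0.0, 3e-10, 4e-10]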

Proceedings Article•DOI•
16 Feb 2004
TL;DR: A dynamic energy management policy is presented for a wireless video streaming system consisting of a battery-powered client and server, such that the maximum system lifetime is achieved while satisfying a given minimum video quality requirement.
Abstract: This paper presents a dynamic energy management policy for a wireless video streaming system consisting of a battery-powered client and server. The paper starts from the observation that the video quality in wireless streaming is a function of three factors: the encoding aptitude of the server, the decoding aptitude of the client, and the wireless channel. Based on this observation, the energy consumption of a wireless video streaming system is modeled and analyzed. Using the proposed model, the optimal energy assignment to each video frame is determined such that the maximum system lifetime is achieved while satisfying a given minimum video quality requirement. Experimental results show that the proposed policy increases the system lifetime by 20%.