Author

Kimish Patel

Bio: Kimish Patel is an academic researcher from University of Southern California. The author has contributed to research in topics: Cache & Cache-only memory architecture. The author has an h-index of 9 and has co-authored 21 publications receiving 249 citations. Previous affiliations of Kimish Patel include Nvidia & Polytechnic University of Turin.

Papers
Book ChapterDOI
21 Sep 2005
TL;DR: A value-based selective refresh scheme in which both horizontal and vertical clusters of zeros are identified and used to selectively deactivate refresh of such clusters achieves an average net reduction of 31% in the number of refresh operations, evaluated on a set of typical embedded applications.
Abstract: A large part of DRAM idle power consumption is due to the refresh operation. This is exacerbated by (i) the increasing amount of memory devoted to caches, which filter out many accesses to DRAM, and (ii) the increased temperature of chips, which increases leakage and thus shortens data retention times. The well-known structured distribution of zeros in a memory, combined with the observation that cells containing zeros in a DRAM do not need to be refreshed, can be exploited to reduce the number of unnecessary refresh operations. We propose a value-based selective refresh scheme in which both horizontal and vertical clusters of zeros are identified and used to selectively deactivate refresh of such clusters. As a result, our technique achieves an average net reduction of 31% in the number of refresh operations, evaluated on a set of typical embedded applications.

27 citations
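
The selective refresh idea in the abstract above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the authors' hardware scheme: it models horizontal clusters only, treats each cluster as one refresh operation, and the names `count_refreshes` and `cluster_width` are hypothetical.

```python
from typing import List, Tuple


def count_refreshes(rows: List[List[int]], cluster_width: int) -> Tuple[int, int]:
    """Return (baseline, selective) refresh counts for one refresh pass.

    Assumption: each row is split into horizontal clusters of
    `cluster_width` words, and refreshing one cluster costs one operation.
    A cluster whose words are all zero is skipped in the selective count.
    """
    baseline = 0
    selective = 0
    for row in rows:
        for start in range(0, len(row), cluster_width):
            cluster = row[start:start + cluster_width]
            baseline += 1  # a conventional pass refreshes every cluster
            if any(word != 0 for word in cluster):
                selective += 1  # only non-zero clusters are refreshed
    return baseline, selective


# Made-up memory contents, purely for illustration.
rows = [[0, 0, 0, 0], [0, 0, 5, 0], [1, 0, 0, 0]]
baseline, selective = count_refreshes(rows, cluster_width=2)
print(baseline, selective)  # 6 2
```

The saving in this toy model depends entirely on how zeros are distributed in the made-up data; it says nothing about the 31% figure reported in the abstract.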

Proceedings ArticleDOI
04 Oct 2006
TL;DR: An effective dynamic thermal management (DTM) scheme for MPEG-2 decoding that allows some degree of spatiotemporal quality degradation so that the microprocessor chip stays in a thermally safe state of operation, albeit with a certain amount of image/video quality loss.
Abstract: In this paper, we propose an effective dynamic thermal management (DTM) scheme for MPEG-2 decoding by allowing some degree of spatiotemporal quality degradation. Given a target MPEG-2 decoding time, we dynamically select either an intra-frame spatial degradation or an inter-frame temporal degradation strategy to ensure that the microprocessor chip stays in a thermally safe state of operation, albeit with a certain amount of image/video quality loss. For our experiments, we use the MPEG-2 decoder program of MediaBench and modify/combine Wattch and HotSpot for the power and thermal simulations and measurements, respectively. Our experimental results show that we achieve a thermally safe state with a spatial quality degradation of 0.12 Root Mean Square Error (RMSE) and a frame drop rate of 12.5% on average.

27 citations

Journal ArticleDOI
TL;DR: Static and dynamic diffusion imaging metrics show correlation with conventional imaging scores, reveal spatial heterogeneity, and provide means to differentiate dermatomyositis patients from controls.
Abstract: The original version of this article, published on 04 June 2018, unfortunately contained a mistake.

24 citations

Journal ArticleDOI
TL;DR: In this article, a dynamic thermal management (DTM) algorithm based on accurate estimation of the workload of frames in a group of pictures (GOP) in an MPEG-2 video stream and slack borrowing across the GOP frames in order to achieve a thermally safe state of operation in microprocessors during the video decoding process is presented.
Abstract: In this paper, we present a dynamic thermal management (DTM) algorithm based on: 1) accurate estimation of the workload of frames in a group of pictures (GOP) in an MPEG-2 video stream and 2) slack borrowing across the GOP frames in order to achieve a thermally safe state of operation in microprocessors during the video decoding process. The proposed DTM algorithm employs dynamic voltage and frequency scaling (DVFS) while considering the frame-rate-dependent GOP deadline, variance of the frame decoding times within the GOP, and a maximum chip temperature constraint. If it becomes necessary to sacrifice video quality or violate the GOP deadline due to a low temperature bound, then the (intra-frame) spatial quality degradation and the (inter-frame) temporal quality degradation will be applied to the GOP. Experimental results demonstrate the competence and efficiency of the proposed online DTM algorithm.

24 citations

Proceedings ArticleDOI
02 Oct 2005
TL;DR: The proposed architecture allows the locality information to be updated dynamically and, unlike previous approaches, works virtually independently of the size and position of the updates to the display frames.
Abstract: We propose a technique to reduce the energy consumption of the frame buffer memory, based on the spatial locality of images and display frames. Our scheme reduces energy by selectively avoiding reads from the frame buffer when identical adjacent pixels are detected. This is made possible by using an auxiliary memory that stores the locality information. The proposed architecture allows the locality information to be updated dynamically and, unlike previous approaches, works virtually independently of the size and position of the updates to the display frames. Experimental results evaluated on a set of typical graphical applications show a reduction of about 40% in frame buffer reads.

22 citations
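
The read-avoidance idea in the abstract above can be sketched in a few lines. This is an illustrative model only, assuming one locality bit per pixel that marks it as identical to its left neighbour; the abstract does not specify the encoding used by the auxiliary memory, and `build_locality_bits` and `scan_out` are hypothetical names.

```python
from typing import List, Tuple


def build_locality_bits(line: List[int]) -> List[bool]:
    """Auxiliary-memory contents: True where a pixel equals its left neighbour."""
    return [i > 0 and line[i] == line[i - 1] for i in range(len(line))]


def scan_out(line: List[int], same_as_prev: List[bool]) -> Tuple[List[int], int]:
    """Produce the displayed pixels and count frame buffer reads.

    When the locality bit is set, the previously read value is reused
    and the frame buffer (`line`) is not accessed for that pixel.
    """
    pixels: List[int] = []
    reads = 0
    last = 0  # overwritten on the first pixel, whose bit is always False
    for i, same in enumerate(same_as_prev):
        if not same:
            last = line[i]  # an actual frame buffer read
            reads += 1
        pixels.append(last)
    return pixels, reads


# Made-up scan line, purely for illustration.
line = [7, 7, 7, 3, 3, 9]
bits = build_locality_bits(line)
pixels, reads = scan_out(line, bits)
print(reads, pixels == line)  # 3 True
```

In this sketch the locality bits must be rebuilt whenever the line changes; how the proposed architecture performs that update is not described in the abstract.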


Cited by
Journal ArticleDOI
09 Jun 2012
TL;DR: This paper proposes RAIDR (Retention-Aware Intelligent DRAM Refresh), a low-cost mechanism that identifies and skips unnecessary refreshes using knowledge of cell retention times: it groups DRAM rows into retention time bins and applies a different refresh rate to each bin.
Abstract: Dynamic random-access memory (DRAM) is the building block of modern main memory systems. DRAM cells must be periodically refreshed to prevent loss of data. These refresh operations waste energy and degrade system performance by interfering with memory accesses. The negative effects of DRAM refresh increase as DRAM device capacity increases. Existing DRAM devices refresh all cells at a rate determined by the leakiest cell in the device. However, most DRAM cells can retain data for significantly longer. Therefore, many of these refreshes are unnecessary. In this paper, we propose RAIDR (Retention-Aware Intelligent DRAM Refresh), a low-cost mechanism that can identify and skip unnecessary refreshes using knowledge of cell retention times. Our key idea is to group DRAM rows into retention time bins and apply a different refresh rate to each bin. As a result, rows containing leaky cells are refreshed as frequently as normal, while most rows are refreshed less frequently. RAIDR uses Bloom filters to efficiently implement retention time bins. RAIDR requires no modification to DRAM and minimal modification to the memory controller. In an 8-core system with 32 GB DRAM, RAIDR achieves a 74.6% refresh reduction, an average DRAM power reduction of 16.1%, and an average system performance improvement of 8.6% over existing systems, at a modest storage overhead of 1.25 KB in the memory controller. RAIDR's benefits are robust to variation in DRAM system configuration, and increase as memory capacity increases.

520 citations
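
The binning idea in the RAIDR abstract above can be sketched as a lookup over Bloom filters. This is a minimal illustration, not the paper's memory-controller design: the intervals (64, 128, 256 ms), the filter sizes, and the names `BloomFilter` and `refresh_interval_ms` are assumptions chosen for the example.

```python
import hashlib
from typing import Iterator, List, Tuple


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key: int) -> Iterator[int]:
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key: int) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


def refresh_interval_ms(row: int,
                        bins: List[Tuple[int, BloomFilter]],
                        default_ms: int) -> int:
    """Return the refresh interval for `row`.

    `bins` is ordered from shortest to longest interval. A row that
    matches no filter falls into the default (longest) interval.
    """
    for interval_ms, bloom in bins:
        if row in bloom:
            return interval_ms
    return default_ms


# Hypothetical profiling result: row 3 is leaky, row 10 somewhat leaky.
fast, medium = BloomFilter(1024, 3), BloomFilter(1024, 3)
fast.add(3)
medium.add(10)
bins = [(64, fast), (128, medium)]
print(refresh_interval_ms(3, bins, default_ms=256))  # 64
```

Because a Bloom filter has no false negatives, a false positive in this sketch can only place a row in a shorter-interval bin, so it is refreshed more often than needed rather than less.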

Proceedings ArticleDOI
05 Mar 2011
TL;DR: Flikker exposes and leverages an interesting trade-off between energy consumption and hardware correctness, and shows that many applications are naturally tolerant to errors in the non-critical data, and in the vast majority of cases, the errors have little or no impact on the application's final outcome.
Abstract: Energy has become a first-class design constraint in computer systems. Memory is a significant contributor to total system power. This paper introduces Flikker, an application-level technique to reduce refresh power in DRAM memories. Flikker enables developers to specify critical and non-critical data in programs and the runtime system allocates this data in separate parts of memory. The portion of memory containing critical data is refreshed at the regular refresh-rate, while the portion containing non-critical data is refreshed at substantially lower rates. This partitioning saves energy at the cost of a modest increase in data corruption in the non-critical data. Flikker thus exposes and leverages an interesting trade-off between energy consumption and hardware correctness. We show that many applications are naturally tolerant to errors in the non-critical data, and in the vast majority of cases, the errors have little or no impact on the application's final outcome. We also find that Flikker can save between 20-25% of the power consumed by the memory sub-system in a mobile device, with negligible impact on application performance. Flikker is implemented almost entirely in software, and requires only modest changes to the hardware.

457 citations

Patent
06 Oct 2009
TL;DR: A graphics processing system in which the graphics processor generates an array of graphics data in a first rendering pass, records information indicating regions of the array that have a particular characteristic, and uses that information to control reading of the array in a subsequent rendering pass.
Abstract: A graphics processing system includes a graphics processor and a memory for storing data to be used by and generated by the graphics processor. In a first rendering pass, the graphics processor generates an array of graphics data and stores the generated array of graphics data in the memory. The array of graphics data generated in the first rendering pass is used in a subsequent rendering pass. In the first rendering pass, the graphics processor determines one or more regions of the array of graphics data that have a particular characteristic, and generates information indicative of the one or more regions. In the subsequent rendering pass, the graphics processor uses the information indicative of the one or more regions to control the reading of the array of graphics data when it is to be used in the subsequent rendering pass.

337 citations

Journal ArticleDOI
TL;DR: The overall objective of this survey is to give microprocessor designers a broad perspective on various aspects of designing thermal-aware microprocessors and to guide future thermal management studies.
Abstract: Microprocessor design has recently encountered many constraints such as power, energy, reliability, and temperature. Among these challenging issues, temperature-related issues have become especially important within the past several years. We summarize recent thermal management techniques for microprocessors, focusing on those that affect or rely on the microarchitecture. We categorize thermal management techniques into six main categories: temperature monitoring, microarchitectural techniques, floorplanning, OS/compiler techniques, liquid cooling techniques, and thermal reliability/security. Temperature monitoring, a requirement for Dynamic Thermal Management (DTM), includes temperature estimation and sensor placement techniques for accurate temperature measurement or estimation. Microarchitectural techniques include both static and dynamic thermal management techniques that control hardware structures. Floorplanning covers a range of thermal-aware floorplanning techniques for 2D and 3D microprocessors. OS/compiler techniques include thermal-aware task scheduling and instruction scheduling techniques. Liquid cooling techniques are higher-capacity alternatives to conventional air cooling techniques. Thermal reliability/security issues cover temperature-dependent reliability modeling, Dynamic Reliability Management (DRM), and malicious codes that specifically cause overheating. Temperature-related issues will only become more challenging as process technology continues to evolve and transistor densities scale up faster than power per transistor scales down. The overall objective of this survey is to give microprocessor designers a broad perspective on various aspects of designing thermal-aware microprocessors and to guide future thermal management studies.

201 citations

Proceedings ArticleDOI
05 Nov 2007
TL;DR: It is proved that the problem of performance optimization for a set of periodic tasks with discrete voltage/frequency states under thermal constraints is NP-hard, and a pseudo-polynomial optimal algorithm and a fully polynomial time approximation technique (FPTAS) are presented.
Abstract: The paper addresses the problem of performance optimization for a set of periodic tasks with discrete voltage/frequency states under thermal constraints. We prove that the problem is NP-hard, and present a pseudo-polynomial optimal algorithm and a fully polynomial time approximation technique (FPTAS) for the problem. The FPTAS technique is able to generate solutions in polynomial time that are guaranteed to be within a designer-specified quality bound (QB) (say, within 1% of the optimal). We evaluate our techniques by experimentation with multimedia and synthetic benchmarks mapped on a 70 nm CMOS technology processor. The experimental results demonstrate that our techniques are able to match optimal solutions when QB is set at 5%, and can generate solutions that are quite close to optimal ( 25%) for large task sets with 120 nodes (while the optimal solution takes several hundred seconds). We also analyze the effect of different thermal parameters, such as the initial temperature, the final temperature and the thermal resistance.

181 citations