scispace - formally typeset
Search or ask a question
Author

Song Liu

Bio: Song Liu is an academic researcher from Northwestern University. The author has contributed to research in topics: Dram & Energy consumption. The author has an hindex of 6, co-authored 9 publications receiving 561 citations.

Papers
More filters
Proceedings ArticleDOI
05 Mar 2011
TL;DR: Flikker exposes and leverages an interesting trade-off between energy consumption and hardware correctness, and shows that many applications are naturally tolerant to errors in the non-critical data, and in the vast majority of cases, the errors have little or no impact on the application's final outcome.
Abstract: Energy has become a first-class design constraint in computer systems. Memory is a significant contributor to total system power. This paper introduces Flikker, an application-level technique to reduce refresh power in DRAM memories. Flikker enables developers to specify critical and non-critical data in programs and the runtime system allocates this data in separate parts of memory. The portion of memory containing critical data is refreshed at the regular refresh-rate, while the portion containing non-critical data is refreshed at substantially lower rates. This partitioning saves energy at the cost of a modest increase in data corruption in the non-critical data. Flikker thus exposes and leverages an interesting trade-off between energy consumption and hardware correctness. We show that many applications are naturally tolerant to errors in the non-critical data, and in the vast majority of cases, the errors have little or no impact on the application's final outcome. We also find that Flikker can save between 20-25% of the power consumed by the memory sub-system in a mobile device, with negligible impact on application performance. Flikker is implemented almost entirely in software, and requires only modest changes to the hardware.

457 citations

Proceedings ArticleDOI
12 Feb 2011
TL;DR: This work develops a thermal model to estimate the temperature of DRAM chips and proposes three hardware and software schemes to reduce peak temperatures, and introduces a new cache line replacement policy that reduces the number of accesses to the overheatingDRAM chips.
Abstract: The performance of the main memory is an important factor on overall system performance. To improve DRAM performance, designers have been increasing chip densities and the number of memory modules. However, these approaches increase power consumption and operating temperatures: temperatures in existing DRAM modules can rise to over 95°C. Another important property of DRAM temperature is the large variation in DRAM chip temperatures. In this paper, we present our analysis collected from measurements on a real system indicating that temperatures across DRAM chips can vary by over 10°C. This work aims to minimize this variation as well as the peak DRAM temperature. We first develop a thermal model to estimate the temperature of DRAM chips and validate this model against real temperature measurements. We then propose three hardware and software schemes to reduce peak temperatures. The first technique introduces a new cache line replacement policy that reduces the number of accesses to the overheating DRAM chips. The second technique utilizes a Memory Write Buffer to improve the access efficiency of the overheated chips. The third scheme intelligently allocates pages to relatively cooler ranks of the DIMM. Our experiments show that in a high performance memory system, our schemes reduce the peak DRAM chip temperature by as much as 8.39°C over 10 workloads (5.36°C on average). Our schemes also improve performance mainly due to reduction in thermal emergencies: for a baseline system with memory bandwidth throttling scheme, the IPC is improved by as much as 15.8% (4.1% on average).

56 citations

07 Oct 2009
TL;DR: Flicker explores a novel and interesting trade-off between energy consumption and hardware correctness, and shows that many mobile applications are naturally tolerant to errors in the non-critical data, and in the vast majority of cases, the errors have little or no impact on the application’s final outcome.
Abstract: Mobile devices are left in sleep mode for long periods of time But even while in sleep mode, the contents of DRAM memory need to be periodically refreshed, which consumes a significant fraction of power in mobile devices This paper introduces Flicker, an application-level technique to reduce refresh power in DRAM memories Flicker enables developers to specify critical and non-critical data in programs and the runtime system allocates this data in separate parts of memory The portion of memory containing critical data is refreshed at the regular refresh-rate, while the portion containing non-critical data is refreshed at substantially lower rates This saves energy at the cost of a modest increase in data corruption in the non-critical data Flicker thus explores a novel and interesting trade-off between energy consumption and hardware correctness We show that many mobile applications are naturally tolerant to errors in the non-critical data, and in the vast majority of cases, the errors have little or no impact on the application’s final outcome We also find that Flicker can save between 20-25% of the power consumed by the memory subsystem in a mobile device, with negligible impact on application performance Flicker is implemented almost entirely in software, and requires only modest changes to the application, operating system and hardware

30 citations

Proceedings ArticleDOI
08 Jun 2008
TL;DR: The experiments show that a system with a 64-entry PHA-WB could reduce the total DRAM power consumption by up to 22.0% (9.6% on average) and the peak and average temperature reductions are 6.1degC and 2.2degC, respectively.
Abstract: Technological advances enable modern processors to utilize increasingly larger DRAMs with rising access frequencies. This is leading to high power consumption and operating temperature in DRAM chips. As a result, temperature management has become a real and pressing issue in high performance DRAM systems. Traditional low power techniques are not suitable for high performance DRAM systems with high bandwidth. In this paper, we propose and evaluate a customized DRAM low power technique based on page hit aware write buffer (PHA-WB). Our proposed approach reduces DRAM system power consumption and temperature without any performance penalty. Our experiments show that a system with a 64-entry PHA-WB could reduce the total DRAM power consumption by up to 22.0% (9.6% on average). The peak and average temperature reductions are 6.1degC and 2.1degC, respectively.

25 citations

Proceedings ArticleDOI
07 Jun 2008
TL;DR: This paper combines the throughput-aware page-hit-aware write buffer (TAP) with low-power-state-based techniques for further power and temperature reduction, namely, TAP-low, which reduces DRAM system power consumption and temperature without any performance penalty.
Abstract: With rising capacities and higher accessing frequencies, high-performance DRAMs are providing increasing memory access bandwidth to the processors. However, the increasing DRAM performance comes with the price of higher power consumption and temperature in DRAM chips. Traditional low power approaches for DRAM systems focus on utilizing low power modes, which is not always suitable for high performance systems. Existing DRAM temperature management techniques, on the other hand, utilize generic temperature management methods inherited from those applied on processor cores. These methods reduce DRAM temperature by controlling the number of DRAM accesses, similar to throttling the processor core, which incurs significant performance penalty. In this paper, we propose a customized low power technique for high performance DRAM systems, namely the Page Hit Aware Write Buffer (PHA-WB). The PHA-WB improves DRAM page hit rate by buffering write operations that may incur page misses. This approach reduces DRAM system power consumption and temperature without any performance penalty. Our proposed Throughput-Aware PHA-WB (TAP) dynamically configures the write buffer for different applications and workloads, thus achieves the best trade off between DRAM power reduction and buffer power overhead. Our experiments show that a system with TAP could reduce the total DRAM power consumption by up to 18.36% (8.64% on average). The steady-state temperature can be reduced by as much as 5.10°C and by 1.93°C on average across eight representative workloads.

15 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: A survey of techniques for approximate computing (AC), which discusses strategies for finding approximable program portions and monitoring output quality, techniques for using AC in different processing units, processor components, memory technologies, and so forth, as well as programming frameworks for AC.
Abstract: Approximate computing trades off computation quality with effort expended, and as rising performance demands confront plateauing resource budgets, approximate computing has become not merely attractive, but even imperative. In this article, we present a survey of techniques for approximate computing (AC). We discuss strategies for finding approximable program portions and monitoring output quality, techniques for using AC in different processing units (e.g., CPU, GPU, and FPGA), processor components, memory technologies, and so forth, as well as programming frameworks for AC. We classify these techniques based on several key characteristics to emphasize their similarities and differences. The aim of this article is to provide insights to researchers into working of AC techniques and inspire more efforts in this area to make AC the mainstream computing approach in future systems.

890 citations

Journal ArticleDOI
04 Jun 2011
TL;DR: EnerJ is developed, an extension to Java that adds approximate data types and a hardware architecture that offers explicit approximate storage and computation and allows a programmer to control explicitly how information flows from approximate data to precise data.
Abstract: Energy is increasingly a first-order concern in computer systems. Exploiting energy-accuracy trade-offs is an attractive choice in applications that can tolerate inaccuracies. Recent work has explored exposing this trade-off in programming models. A key challenge, though, is how to isolate parts of the program that must be precise from those that can be approximated so that a program functions correctly even as quality of service degrades.We propose using type qualifiers to declare data that may be subject to approximate computation. Using these types, the system automatically maps approximate variables to low-power storage, uses low-power operations, and even applies more energy-efficient algorithms provided by the programmer. In addition, the system can statically guarantee isolation of the precise program component from the approximate component. This allows a programmer to control explicitly how information flows from approximate data to precise data. Importantly, employing static analysis eliminates the need for dynamic checks, further improving energy savings. As a proof of concept, we develop EnerJ, an extension to Java that adds approximate data types. We also propose a hardware architecture that offers explicit approximate storage and computation. We port several applications to EnerJ and show that our extensions are expressive and effective; a small number of annotations lead to significant potential energy savings (10%-50%) at very little accuracy cost.

680 citations

Proceedings ArticleDOI
01 Dec 2012
TL;DR: A programming model is defined that allows programmers to identify approximable code regions -- code that can produce imprecise but acceptable results and is faster and more energy efficient than executing the original code.
Abstract: This paper describes a learning-based approach to the acceleration of approximate programs. We describe the \emph{Parrot transformation}, a program transformation that selects and trains a neural network to mimic a region of imperative code. After the learning phase, the compiler replaces the original code with an invocation of a low-power accelerator called a \emph{neural processing unit} (NPU). The NPU is tightly coupled to the processor pipeline to accelerate small code regions. Since neural networks produce inherently approximate results, we define a programming model that allows programmers to identify approximable code regions -- code that can produce imprecise but acceptable results. Offloading approximable code regions to NPUs is faster and more energy efficient than executing the original code. For a set of diverse applications, NPU acceleration provides whole-application speedup of 2.3x and energy savings of 3.0x on average with quality loss of at most 9.6%.

532 citations

Journal ArticleDOI
09 Jun 2012
TL;DR: This paper proposes RAIDR (Retention-Aware Intelligent DRAM Refresh), a low-cost mechanism that can identify and skip unnecessary refreshes using knowledge of cell retention times and group DRAM rows into retention time bins and apply a different refresh rate to each bin.
Abstract: Dynamic random-access memory (DRAM) is the building block of modern main memory systems. DRAM cells must be periodically refreshed to prevent loss of data. These refresh operations waste energy and degrade system performance by interfering with memory accesses. The negative effects of DRAM refresh increase as DRAM device capacity increases. Existing DRAM devices refresh all cells at a rate determined by the leakiest cell in the device. However, most DRAM cells can retain data for significantly longer. Therefore, many of these refreshes are unnecessary. In this paper, we propose RAIDR (Retention-Aware Intelligent DRAM Refresh), a low-cost mechanism that can identify and skip unnecessary refreshes using knowledge of cell retention times. Our key idea is to group DRAM rows into retention time bins and apply a different refresh rate to each bin. As a result, rows containing leaky cells are refreshed as frequently as normal, while most rows are refreshed less frequently. RAIDR uses Bloom filters to efficiently implement retention time bins. RAIDR requires no modification to DRAM and minimal modification to the memory controller. In an 8-core system with 32 GB DRAM, RAIDR achieves a 74.6% refresh reduction, an average DRAM power reduction of 16.1%, and an average system performance improvement of 8.6% over existing systems, at a modest storage overhead of 1.25 KB in the memory controller. RAIDR's benefits are robust to variation in DRAM system configuration, and increase as memory capacity increases.

520 citations

Proceedings ArticleDOI
21 Apr 2013
TL;DR: It is shown that an optimized, equal capacity STT-RAM main memory can provide performance comparable to DRAM main memory, with an average 60% reduction in main memory energy.
Abstract: In this paper, we explore the possibility of using STT-RAM technology to completely replace DRAM in main memory. Our goal is to make STT-RAM performance comparable to DRAM while providing substantial power savings. Towards this goal, we first analyze the performance and energy of STT-RAM, and then identify key optimizations that can be employed to improve its characteristics. Specifically, using partial write and row buffer write bypass, we show that STT-RAM main memory performance and energy can be significantly improved. Our experiments indicate that an optimized, equal capacity STT-RAM main memory can provide performance comparable to DRAM main memory, with an average 60% reduction in main memory energy.

478 citations