
Compilers, Architecture, and Synthesis for Embedded Systems 

About: Compilers, Architecture, and Synthesis for Embedded Systems is an academic conference. The conference publishes mainly in the areas of compilers and caches. Over its lifetime, 549 publications have appeared at the conference, receiving 14,593 citations in total.


Papers
Proceedings ArticleDOI
Seon-Yeong Park, Dawoon Jung, Jeong-Uk Kang, Jin-Soo Kim, Joonwon Lee
22 Oct 2006
TL;DR: The Clean-First LRU (CFLRU) replacement algorithm is proposed, which exploits the characteristics of flash memory and reduces the average replacement cost by 28.4% in the swap system and by 26.2% in the buffer cache compared with the LRU algorithm.
Abstract: In most operating systems, which are customized for disk-based storage systems, the replacement algorithm considers only the number of memory hits. However, flash memory has different read and write costs in terms of both time and energy, so a replacement algorithm for flash memory should consider not only the hit count but also the replacement cost incurred by selecting dirty victims. The replacement cost of a dirty page is higher than that of a clean page with regard to both access time and energy consumption. In this paper, we propose the Clean-First LRU (CFLRU) replacement algorithm, which exploits the characteristics of flash memory. CFLRU splits the LRU list into a working region and a clean-first region and adopts a policy that preferentially evicts clean pages from the clean-first region, as long as the number of page hits in the working region is kept at a suitable level. In trace-driven simulation, the proposed algorithm reduces the average replacement cost by 28.4% in the swap system and by 26.2% in the buffer cache, compared with the LRU algorithm. We also implement the CFLRU algorithm in the Linux kernel and discuss some optimization issues.
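The eviction rule is easy to sketch. Below is a minimal Python illustration of the clean-first idea, not the authors' kernel implementation: the tail of the LRU list forms a clean-first window whose clean pages are evicted before any dirty page. The class name and window parameter are hypothetical.

```python
from collections import OrderedDict

class CFLRUCache:
    """Minimal sketch of Clean-First LRU: within a clean-first window at the
    LRU end of the list, clean pages are evicted before dirty ones, because
    writing a dirty victim back to flash is far more expensive."""

    def __init__(self, capacity, window):
        self.capacity = capacity     # total number of page frames
        self.window = window         # size of the clean-first region
        self.pages = OrderedDict()   # page -> dirty flag, MRU entry at the end

    def access(self, page, is_write=False):
        if page in self.pages:
            dirty = self.pages.pop(page) or is_write
            self.pages[page] = dirty             # move to the MRU position
            return
        if len(self.pages) >= self.capacity:
            self._evict()
        self.pages[page] = is_write

    def _evict(self):
        lru_order = list(self.pages.items())     # LRU first, MRU last
        for page, dirty in lru_order[:self.window]:
            if not dirty:                        # prefer the LRU clean page
                del self.pages[page]
                return
        victim, _ = lru_order[0]                 # no clean page in the window:
        del self.pages[victim]                   # fall back to plain LRU
                                                 # (a real system flushes it first)
```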

434 citations

Proceedings ArticleDOI
16 Nov 2001
TL;DR: Based on profiling information on computation time and data sharing at the level of procedure calls, a cost graph is constructed for a given application program and a partition scheme is applied to statically divide the program into server tasks and client tasks such that the energy consumed by the program is minimized.
Abstract: We consider handheld computing devices which are connected to a server (or a powerful desktop machine) via a wireless LAN. On such devices, it is often possible to save energy on the handheld by offloading its computation to the server. In this work, based on profiling information on computation time and data sharing at the level of procedure calls, we construct a cost graph for a given application program. We then apply a partition scheme to statically divide the program into server tasks and client tasks such that the energy consumed by the program is minimized. Experiments are performed on a suite of multimedia benchmarks. Results show considerable energy savings for several programs through offloading.
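As a rough illustration of the trade-off the partitioner evaluates, the Python sketch below assigns each profiled procedure to whichever side has the lower estimated energy. The procedure names and numbers are hypothetical, and, unlike the paper's cost-graph formulation, this toy version ignores data sharing between calls.

```python
def partition(procedures):
    """Toy sketch (not the paper's algorithm): place each profiled procedure
    on the handheld or the server, whichever costs less energy.

    Each entry: (name, local_energy, remote_compute_energy, transfer_energy),
    where transfer_energy models shipping data over the wireless LAN."""
    placement = {}
    for name, local_e, remote_e, xfer_e in procedures:
        offload_cost = remote_e + xfer_e
        placement[name] = "server" if offload_cost < local_e else "client"
    return placement

# Hypothetical profiling numbers (millijoules) for a media decoder:
procs = [("parse_header", 0.4, 0.1, 1.2),
         ("idct",         9.0, 0.5, 2.0),
         ("render",       1.5, 0.2, 6.0)]
print(partition(procs))   # idct is cheaper to offload; the others stay local
```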

309 citations

Proceedings ArticleDOI
30 Oct 2003
TL;DR: A dynamic allocation method for global and stack data that accounts for changing program requirements at runtime, has no software-caching tags, requires no run-time checks, has extremely low overheads, and yields 100% predictable memory access times is presented.
Abstract: This paper presents a highly predictable, low-overhead and yet dynamic memory allocation strategy for embedded systems with scratch-pad memory. A scratch-pad is a fast compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees versus a cache and by its significantly lower overheads in energy consumption, area and overall runtime, even with a simple allocation scheme [4]. Existing scratch-pad allocation methods are of two types. First, software-caching schemes emulate the workings of a hardware cache in software. Instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption and SRAM space for tags, and deliver poor real-time guarantees just like hardware caches. A second category of algorithms partitions variables at compile time into the two banks. For example, our previous work in [3] derives a provably optimal static allocation for global and stack variables and achieves a speedup over all earlier methods. However, a drawback of such static allocation schemes is that they do not account for dynamic program behavior. It is easy to see why a data allocation that never changes at runtime cannot achieve the full locality benefits of a cache. In this paper we present a dynamic allocation method for global and stack data that, for the first time, (i) accounts for changing program requirements at runtime, (ii) has no software-caching tags, (iii) requires no run-time checks, (iv) has extremely low overheads, and (v) yields 100% predictable memory access times. In this method, data that is about to be accessed frequently is copied into the SRAM using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary. When compared to a provably optimal static allocation, our results show runtime reductions ranging from 11% to 38%, averaging 31.2%, using no additional hardware support. With hardware support for pseudo-DMA and full DMA, which is already provided in some commercial systems, the runtime reductions increase to 33.4% and 34.2%, respectively.
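The runtime side of the method amounts to copy and evict operations that the compiler inserts at chosen program points. The Python sketch below is only a conceptual model of that behavior under an assumed scratch-pad size and hypothetical block names; the real method selects what to copy at compile time and emits the copies as native code or DMA transfers.

```python
class ScratchPad:
    """Conceptual model of compiler-managed scratch-pad (SPM) allocation:
    at fixed program points, data about to be accessed frequently is copied
    into fast SRAM, and older data is evicted (written back) to DRAM."""

    def __init__(self, size_bytes):
        self.size = size_bytes
        self.used = 0
        self.resident = {}      # block name -> size in bytes

    def bring_in(self, name, size):
        # Stands in for the compiler-inserted copy code before a hot region.
        while self.used + size > self.size and self.resident:
            victim = next(iter(self.resident))        # evict the oldest block
            self.used -= self.resident.pop(victim)    # (write back to DRAM)
        self.resident[name] = size
        self.used += size

spm = ScratchPad(4096)
spm.bring_in("coeff_table", 1024)   # before the filter loop
spm.bring_in("frame_buf", 3584)     # next hot region; coeff_table gets evicted
```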

236 citations

Proceedings ArticleDOI
08 Oct 2002
TL;DR: An energy-aware scheduling policy for non-real-time operating systems that benefits from event counters is proposed and energy measurements of the target architecture under variable load show the advantage of the proposed approach.
Abstract: Scalability of the core frequency is a common feature of low-power processor architectures. Many heuristics for frequency scaling have been proposed in the past to find the best trade-off between energy efficiency and computational performance. With complex applications exhibiting unpredictable behavior, these heuristics cannot reliably adjust the operating point of the hardware because they do not know where the energy is spent and why performance is lost. Embedded hardware monitors in the form of event counters have proven to offer valuable information in the field of performance analysis. We demonstrate that counter values can also reveal the power-specific characteristics of a thread. In this paper we propose an energy-aware scheduling policy for non-real-time operating systems that benefits from event counters. By exploiting the information from these counters, the scheduler determines the appropriate clock frequency for each individual thread running in a time-sharing environment. A recurrent analysis of the thread-specific energy and performance profile allows the frequency to be adjusted to behavioral changes of the application. While the clock frequency may vary over a wide range, application performance should only suffer slightly (e.g. a 10% performance loss compared with execution at the highest clock speed). Because of the similarity to a car's cruise control, we call our scheduling policy Process Cruise Control. This adaptive clock scaling is accomplished by the operating system without any application support. Process Cruise Control has been implemented on the Intel XScale architecture, which offers a variety of frequencies and a set of configurable event counters. Energy measurements of the target architecture under variable load show the advantage of the proposed approach.
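The core decision can be sketched as picking, from a discrete set of frequencies, the lowest one whose predicted slowdown for a thread stays within the tolerated performance loss. The Python sketch below uses an assumed frequency table and a simple counter-derived memory-boundedness estimate; it illustrates the idea, not the published policy.

```python
FREQUENCIES_MHZ = [200, 300, 400, 600]   # assumed XScale-like frequency steps

def pick_frequency(mem_stall_cycles, total_cycles, max_loss=0.10):
    """Pick the lowest clock whose predicted slowdown stays within max_loss.

    mem_stall_cycles and total_cycles come from per-thread event counters;
    time spent waiting on memory does not scale with the core clock, so a
    memory-bound thread can be slowed down with little performance loss."""
    mem_fraction = mem_stall_cycles / total_cycles
    f_max = FREQUENCIES_MHZ[-1]
    for f in FREQUENCIES_MHZ:                        # try the slowest first
        slowdown = (1 - mem_fraction) * (f_max / f - 1)
        if slowdown <= max_loss:
            return f
    return f_max

# A thread stalling on memory 90% of the time fits a 10% loss budget at 300 MHz:
print(pick_frequency(mem_stall_cycles=90_000, total_cycles=100_000))
```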

231 citations

Proceedings ArticleDOI
22 Sep 2004
TL;DR: This paper presents an efficient algorithm for exact enumeration of all possible candidate instructions given the dataflow graph (DFG) corresponding to a code fragment, and achieves orders of magnitude speedup in enumerating these candidate custom instructions for very large DFGs.
Abstract: Extensible processors allow addition of application-specific custom instructions to the core instruction set architecture. However, it is computationally expensive to automatically select the optimal set of custom instructions. Therefore, heuristic techniques are often employed to quickly search the design space. In this paper, we present an efficient algorithm for exact enumeration of all possible candidate instructions given the dataflow graph (DFG) corresponding to a code fragment. Even though this is similar to the "subgraph enumeration" problem (which is exponential), we find that most subgraphs are not feasible candidates for various reasons. In fact, the number of candidates is quite small compared to the size of the DFG. Compared to previous approaches, our technique achieves orders of magnitude speedup in enumerating these candidate custom instructions for very large DFGs.
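To make the pruning concrete, the Python sketch below brute-forces small subgraphs of a toy DFG and keeps only those that are convex and fit assumed input/output limits (e.g. register-file ports). The paper's contribution is an exact enumeration that avoids this brute force; the sketch only illustrates the feasibility conditions that eliminate most subgraphs.

```python
from itertools import combinations

def enumerate_candidates(nodes, edges, max_in=4, max_out=2, max_size=4):
    """Toy enumeration: keep subgraphs that are convex and obey assumed
    limits on external inputs and outputs."""
    succ = {n: {v for u, v in edges if u == n} for n in nodes}

    def io_ok(sub):
        ins = {u for u, v in edges if v in sub and u not in sub}
        outs = {u for u, v in edges if u in sub and v not in sub}
        return len(ins) <= max_in and len(outs) <= max_out

    def convex(sub):
        # Not convex if some path leaves the subgraph and re-enters it.
        frontier = {v for u in sub for v in succ[u] if v not in sub}
        seen = set()
        while frontier:
            x = frontier.pop()
            if x in seen:
                continue
            seen.add(x)
            if succ[x] & sub:
                return False
            frontier |= {v for v in succ[x] if v not in sub}
        return True

    return [sub for k in range(1, max_size + 1)
            for sub in map(frozenset, combinations(nodes, k))
            if io_ok(sub) and convex(sub)]

# Toy DFG: a -> b -> d and a -> c -> d ({a, d} alone is not convex).
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(enumerate_candidates(nodes, edges))
```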

176 citations

Performance Metrics
No. of papers from the Conference in previous years
Year    Papers
2021    2
2020    8
2018    15
2017    22
2016    21
2015    26