
Showing papers in "ACM Transactions on Embedded Computing Systems in 2002"


Journal Article
TL;DR: This article presents a compiler strategy that automatically partitions the data among the memory units, and shows that this strategy is optimal, relative to the profile run, among all static partitions for global and stack data.
Abstract: This article presents a technique for the efficient compiler management of software-exposed heterogeneous memory. In many lower-end embedded chips, often used in microcontrollers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, internal DRAM, external DRAM, and ROM are visible directly to the software, without automatic management by a hardware caching mechanism. Instead, the memory units are mapped to different portions of the address space. Caches are avoided due to their cost and power consumption, and because they make it difficult to guarantee real-time performance. For this important class of embedded chips, the allocation of data to different memory units to maximize performance is the responsibility of the software. Current practice typically leaves it to the programmer to partition the data among different memory units. We present a compiler strategy that automatically partitions the data among the memory units. We show that this strategy is optimal, relative to the profile run, among all static partitions for global and stack data. For the first time, our allocation scheme for stacks distributes the stack among multiple memory units. For global and stack data, the scheme is provably equal to or better than any other compiler scheme or set of programmer annotations. Results from our benchmarks show a 44.2% reduction in runtime from using our distributed stack strategy vs. a unified stack, and a further 11.8% reduction in runtime from using a linear optimization strategy for allocation vs. a simpler greedy strategy, both measured with the SRAM sized at 20% of the total data size. For some programs, placing less than 5% of the data in SRAM achieves a similar speedup.
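
To make the comparison concrete, here is a minimal sketch of the kind of greedy, profile-driven static partitioning that the linear optimization is measured against: variables are ranked by profiled accesses per byte and placed in scratch-pad SRAM until it fills. The data types, numbers, and ranking heuristic are illustrative assumptions, not the paper's actual formulation.

```python
# Greedy, profile-driven static allocation of globals/stack slots to
# scratch-pad SRAM vs. external DRAM. Illustrative stand-in for the
# "simpler greedy strategy" the article compares against.
from dataclasses import dataclass

@dataclass
class Var:
    name: str
    size: int      # bytes
    accesses: int  # access count from a profile run

def greedy_partition(variables, sram_capacity):
    """Place the most frequently accessed bytes in SRAM first."""
    sram, dram, free = [], [], sram_capacity
    # Density = profiled accesses per byte: the classic knapsack heuristic.
    for v in sorted(variables, key=lambda v: v.accesses / v.size, reverse=True):
        if v.size <= free:
            sram.append(v)
            free -= v.size
        else:
            dram.append(v)
    return sram, dram

if __name__ == "__main__":
    vars_ = [Var("coeffs", 512, 90_000), Var("frame_buf", 4096, 20_000),
             Var("lut", 1024, 60_000), Var("log", 8192, 500)]
    sram, dram = greedy_partition(vars_, sram_capacity=2048)
    print("SRAM:", [v.name for v in sram])   # hottest data per byte
    print("DRAM:", [v.name for v in dram])
```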

338 citations


Journal Article
TL;DR: This work demonstrates that a small number of distinct values tend to occur very frequently in memory, that the identity of these frequent values remains stable over the entire execution of the program, and that the values are scattered fairly uniformly across the allocated memory.
Abstract: By analyzing the behavior of a set of benchmarks, we demonstrate that a small number of distinct values tend to occur very frequently in memory. On average, only eight of these frequent values were found to occupy 48% of memory locations for the benchmarks studied. In addition, we demonstrate that the identity of the frequent values remains stable over the entire execution of the program, and that these values are scattered fairly uniformly across the allocated memory. We present three different algorithms for finding frequent values and experimentally demonstrate their effectiveness. Each of these algorithms is designed to suit a different application scenario. Since the contents of memory exhibit frequent value locality, frequent values can be expected in the data streams that flow across different points in the memory hierarchy. We exploit this observation to develop two low-power designs: a low-power level-one data cache and a low-power external data bus. In each of these applications a different form of encoding of frequent values is employed to obtain a low-power design. We also experimentally demonstrate the effectiveness of these designs.
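
To illustrate the mechanism, the following is a hypothetical sketch of frequent-value encoding: a small table of the most common values is built from the memory contents, and each word then travels either as a short table index or as a raw word. The simple counting pass here is a stand-in; the paper's three algorithms differ in how and when they fix the table.

```python
from collections import Counter

FV_SLOTS = 8  # the abstract reports ~8 values covering 48% of locations

def find_frequent_values(memory_words, slots=FV_SLOTS):
    # One simple stand-in for the paper's algorithms: count a (sample of
    # the) memory image and keep the top-k values.
    return [value for value, _ in Counter(memory_words).most_common(slots)]

def encode_word(word, fv_table):
    # Frequent-value encoding for a low-power bus: a frequent word travels
    # as a small index (few bits toggle); any other word passes through raw.
    if word in fv_table:
        return ("FV", fv_table.index(word))  # log2(slots)-bit payload
    return ("RAW", word)                     # full-width payload

def decode_word(tag, payload, fv_table):
    return fv_table[payload] if tag == "FV" else payload

if __name__ == "__main__":
    mem = [0, 0, 1, 0xFFFFFFFF, 0, 1, 7, 0, 1, 0]
    table = find_frequent_values(mem)
    coded = [encode_word(w, table) for w in mem]
    assert [decode_word(t, p, table) for t, p in coded] == mem
    print("frequent values:", table)
```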

92 citations


Journal Article
TL;DR: This article presents a GC-controlled leakage energy optimization technique that shuts off memory banks that do not hold live data and examines its sensitivity to different parameters such as bank size, the garbage collection frequency, object allocation style, compaction style, and compaction frequency.
Abstract: Java has been widely adopted as one of the software platforms for the seamless integration of diverse computing devices. Over the last year, there has been great momentum in adopting Java technology in devices such as cellphones, PDAs, and pagers, where optimizing energy consumption is critical. Since the Java virtual machine (JVM), the cornerstone of Java technology, is traditionally tuned for performance, taking energy consumption into account requires reevaluation, and possibly redesign, of the virtual machine. This motivates us to tune specific components of the virtual machine for a battery-operated architecture. As embedded JVMs are designed to run for long periods of time on limited-memory embedded systems, creating and managing Java objects is of critical importance. The garbage collector (GC) is the part of the JVM responsible for the automatic reclamation of unused memory. This article shows that the GC is important not only for limited-memory systems but also for energy-constrained architectures. This article focuses on tuning the GC to reduce energy consumption in a multibanked memory architecture. Tuning the GC is important not because the collector itself consumes a sizeable portion of overall energy during execution, but because it influences the energy consumed in memory during application execution. In particular, we present a GC-controlled leakage energy optimization technique that shuts off memory banks that do not hold live data. Using two different commercial GCs and a suite of thirteen mobile applications, we evaluate the effectiveness of the GC-controlled energy optimization technique and study its sensitivity to parameters such as bank size, garbage collection frequency, object allocation style, compaction style, and compaction frequency. We observe that the energy consumption of an embedded Java application can be significantly higher if the GC parameters are not tuned appropriately. Further, we note that the object allocation pattern and the number of memory banks available in the underlying architecture limit how effectively GC parameters can be used to optimize memory energy consumption.
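
A minimal sketch of the shutoff idea, under an assumed bank size, object model, and power-state hook (none of which are taken from the paper): after each collection, any bank containing no live object is switched to a low-leakage state, which is most effective when compaction has clustered the survivors into few banks.

```python
# GC-controlled bank shutoff: after a (preferably compacting) collection,
# banks holding no live data are put into a low-leakage state.
BANK_SIZE = 64 * 1024  # illustrative bank size

def banks_spanned(obj_addr, obj_size):
    first = obj_addr // BANK_SIZE
    last = (obj_addr + obj_size - 1) // BANK_SIZE
    return range(first, last + 1)

def after_gc(live_objects, num_banks):
    """live_objects: iterable of (address, size) pairs surviving collection."""
    live_banks = set()
    for addr, size in live_objects:
        live_banks.update(banks_spanned(addr, size))
    for bank in range(num_banks):
        if bank in live_banks:
            set_bank_state(bank, "active")
        else:
            set_bank_state(bank, "sleep")  # cut leakage until next allocation

def set_bank_state(bank, state):
    # Stand-in for a memory-controller hook; real hardware would expose
    # a per-bank power-mode register.
    print(f"bank {bank}: {state}")

if __name__ == "__main__":
    # After compaction, live data occupies only the first two banks.
    after_gc([(0, 40_000), (70_000, 10_000)], num_banks=8)
```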

38 citations


Journal Article
TL;DR: This article presents the design of a simple hardware-controlled cache system that offers high performance with low power consumption and low hardware cost, along with a simple dynamic fetching mechanism that supports different fetch sizes.
Abstract: This article presents the design of a simple hardware-controlled, high-performance cache system. The design supports fast access time, utilization of temporal and spatial locality that adapts to the given application, and a simple dynamic fetching mechanism with different fetch sizes. Support for dynamically varying the fetch size makes the cache equally effective for general-purpose and multimedia applications. Our cache organization and operational mechanism are designed to exploit temporal and spatial locality selectively and adaptively. Simulation shows that the average memory access time of the proposed cache equals that of a conventional direct-mapped cache with eight times as much space. The simulations also show that our cache achieves better performance than a 2-way or 4-way set-associative cache with twice as much space. Compared with a victim cache with a 32-byte block size, the average miss ratio improves by about 41% for general applications and 60% for multimedia applications. Power consumption of the proposed cache is also around 10% to 60% lower than that of the other cache systems we examine. Our cache system thus offers high performance with low power consumption and low hardware cost.
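
As a rough illustration of dynamic fetch sizing (the policy, thresholds, and block size below are invented for the sketch, not the article's hardware design), a saturating counter can widen the fetch when misses land close together, capturing spatial locality for streaming code while keeping single-block fetches for scattered accesses.

```python
BLOCK = 32  # bytes per cache block (illustrative)

class AdaptiveFetchCache:
    """Toy model: on a miss, fetch 1, 2, or 4 blocks depending on a
    saturating counter that tracks whether recent misses were nearby."""

    def __init__(self):
        self.resident = set()   # resident block numbers (capacity ignored)
        self.last_miss = None   # block number of the previous miss
        self.spatial = 0        # saturating counter in [0, 3]

    def access(self, addr):
        blk = addr // BLOCK
        if blk in self.resident:
            return "hit"
        # Nearby consecutive misses suggest streaming; distant ones decay it.
        if self.last_miss is not None and abs(blk - self.last_miss) <= 4:
            self.spatial = min(self.spatial + 1, 3)
        else:
            self.spatial = max(self.spatial - 1, 0)
        self.last_miss = blk
        fetch = (1, 1, 2, 4)[self.spatial]  # blocks brought in on this miss
        self.resident.update(range(blk, blk + fetch))
        return f"miss, fetched {fetch} block(s)"

if __name__ == "__main__":
    cache = AdaptiveFetchCache()
    for addr in range(0, 512, 32):  # a streaming sweep widens the fetch
        print(hex(addr), cache.access(addr))
```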

12 citations


Journal Article
TL;DR: In this inaugural issue, a variety of papers are presented to cover the breadth and depth of the intended scope of JAIT, including a survey paper by Khan et al. on machine learning techniques for text document classification.
Abstract: In this inaugural issue, we present a variety of papers to cover, as much as possible, the breadth and depth of the intended scope of JAIT. We begin with a survey paper by Khan et al. on machine learning techniques for text document classification. With the proliferation of electronic documents on the web and elsewhere, it is increasingly important to be able to classify such e-documents for proper management. Their paper presents a timely review of some of the more prominent theories and methods of document classification and text mining for e-documents.

10 citations


Journal Article
TL;DR: This paper presents a new system-level exploration and optimization method for selecting customized implementations of dynamic data sets, as encountered in telecom network, database, and multimedia applications; the method also makes it possible to further raise the abstraction level of the initial specification.
Abstract: We present a new exploration and optimization method at the system level to select customized implementations for dynamic data sets, as encountered in telecom network, database, and multimedia applications. Our method fits in the context of embedded system synthesis for such applications, and makes it possible to further raise the abstraction level of the initial specification, where dynamic data sets can be specified without low-level details. The method is suited to both hardware and software implementations. In this paper, it mainly aims at minimizing the average memory power, although it can also be driven by other cost functions such as memory size and performance. Compared with existing methods, for large dynamic data sets, it can save up to 90% of the average memory power, while still saving up to 80% of the average memory size.
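
The exploration loop can be sketched abstractly: score each candidate implementation of a dynamic data set against a profiled operation mix under the chosen cost function, then keep the cheapest. The candidates and per-operation energy numbers below are made-up placeholders, not the paper's models.

```python
# Profiled mix of operations on the abstract dynamic set (hypothetical).
PROFILE = {"insert": 10_000, "lookup": 120_000, "delete": 8_000}

# Hypothetical energy per operation (nJ) and static size (bytes) per candidate.
CANDIDATES = {
    "singly-linked list": {"insert": 4, "lookup": 260, "delete": 250, "size": 12_000},
    "dynamic array":      {"insert": 9, "lookup": 35,  "delete": 300, "size": 8_000},
    "hash table":         {"insert": 14, "lookup": 6,  "delete": 15,  "size": 20_000},
}

def avg_power_score(costs, profile):
    """Total energy over the profiled run; dividing by a fixed runtime
    would give average power, so the minimum is the same either way."""
    return sum(costs[op] * count for op, count in profile.items())

def explore(candidates, profile, cost_fn):
    # The method can also be driven by other cost functions, e.g. memory
    # size: cost_fn=lambda c, p: c["size"].
    return min(candidates, key=lambda name: cost_fn(candidates[name], profile))

if __name__ == "__main__":
    best = explore(CANDIDATES, PROFILE, avg_power_score)
    print("selected implementation:", best)  # hash table wins on this mix
```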

2 citations


Journal Article
TL;DR: Embedded systems differ from general-purpose systems in two main aspects: while general-purpose systems run a myriad of unrelated software packages, each with potentially very different performance requirements and dynamic behaviors compared to the rest, embedded systems perform a single function their entire lifetime, and thus execute the same code day in, day out.
Abstract: Embedded systems differ from general-purpose systems in two main aspects. First, the two systems are designed for very different purposes: while general-purpose systems run a myriad of unrelated software packages, each with potentially very different performance requirements and dynamic behaviors compared to the rest, embedded systems perform a single function their entire lifetime, and thus execute the same code day in, day out, until the system is discarded or a software upgrade is performed. Second, while performance is the primary (in many instances, the only) quality by which a general-purpose system is judged, optimal embedded-system designs usually represent trade-offs between several goals, including manufacturing costs (e.g., die area, testability, etc.), energy consumption, and performance.

As a result, we see two very different design strategies. General-purpose systems are typically overbuilt; by definition, they are expected by the consumer to run all possible software applications thrown at them. Such systems are designed to handle the average case very well and the worst case at least tolerably well. Were they optimized for any particular task, they would likely become less than optimal for all dissimilar tasks; therefore, general-purpose systems are optimized for nothing in particular. They make up for this in raw performance and pure number-crunching. The average notebook computer can perform orders of magnitude more operations per second than a word processor or email client requires, tasks to which the average notebook is frequently relegated, but because the general-purpose system may be expected to handle virtually anything at any time, it must have the number-crunching ability of a supercomputer, just in case. On the other hand, because embedded systems are expected to handle only one task, it is not only possible but highly beneficial to optimize an embedded design for its one task. Thus, if general-purpose systems are overbuilt, the goal for an embedded system is to be appropriately built. In addition, since effort spent at design time is amortized over the life of a product, and many embedded systems have long lifetimes (tens of years), many embedded design houses will expend significant resources up front to optimize a design, employing techniques not generally used in general-purpose systems (e.g., compiler optimizations that require many days or weeks to perform).

One of the most critical resources in embedded systems, one that receives much of the attention of embedded-system engineers, CAD tool designers, compiler writers, and researchers, is the memory resource. Memory, whether SRAM or DRAM, usually represents one of the more costly components in an embedded system, especially if the memory is located on-CPU, since once the CPU is fabricated, the memory size cannot be increased. In nearly all system-on-chip designs, and in many microcontrollers as well, memory accounts for the lion's share of available die area; moreover, memory is one of the primary consumers of energy.

1 citation