Showing papers in "ACM Transactions on Embedded Computing Systems in 2005"


Journal ArticleDOI
TL;DR: APIT, a novel localization algorithm that is range-free, is presented and it is shown that the APIT scheme performs best when an irregular radio pattern and random node placement are considered, and low communication overhead is desired.
Abstract: With the proliferation of location dependent applications in sensor networks, location awareness becomes an essential capability of sensor nodes. Because coarse accuracy is sufficient for most sensor network applications, solutions in range-free localization are being pursued as a cost-effective alternative to more expensive range-based approaches. In this paper, we present APIT, a novel localization algorithm that is range-free. We show that our APIT scheme performs best when an irregular radio pattern and random node placement are considered, and low communication overhead is desired. We compare our work, via extensive simulation, with three state-of-the-art range-free localization schemes to identify the preferable system configurations of each. In addition, we provide insight into the impact of localization accuracy on various location dependent applications and suggestions on improving their performance in the presence of such inaccuracy.

263 citations


Journal ArticleDOI
TL;DR: A method of translating discrete-time Simulink models to Lustre programs is presented, which has been implemented in a prototype tool called S2L and has been used in the context of a European research project to translate two automotive controller models provided by Audi.
Abstract: We present a method of translating discrete-time Simulink models to Lustre programs. Our method consists of three steps: type inference, clock inference, and hierarchical bottom-up translation. In the process, we explain and formalize the typing and timing mechanisms of Simulink. The method has been implemented in a prototype tool called S2L, which has been used in the context of a European research project to translate two automotive controller models provided by Audi.

203 citations


Journal ArticleDOI
TL;DR: A voltage allocation technique is proposed that produces a feasible task schedule with optimal processor energy consumption and is based on an efficient linear programming (LP) formulation, which solves the allocation problems optimally in polynomial time.
Abstract: This paper presents important, new results of a study on the problem of task scheduling and voltage allocation in dynamically variable voltage processors, the purpose of which was minimization of processor energy consumption. The contributions are twofold: (1) For given multiple discrete supply voltages and tasks with arbitrary arrival-time/deadline constraints, we propose a voltage allocation technique that produces a feasible task schedule with optimal processor energy consumption. (2) We then extend the problem to include the case in which tasks have nonuniform load (i.e., switched) capacitances and solve it optimally. The proposed technique, called Alloc-vt, in (1) is based on the prior results in [Yao, Demers and Shenker. 1995. In Proceedings of IEEE Symposium on Foundations of Computer Science. 374--382] (which is optimal for dynamically continuously variable voltages, but not for discrete ones) and [Ishihara and Yasuura. 1998. In Proceedings of International Symposium on Low Power Electronics and Design. 197--202] (which is optimal for a single task, but not for multiple tasks), whereas the proposed technique, called Alloc-vtcap, in (2) is based on an efficient linear programming (LP) formulation. Both techniques solve the allocation problems optimally in polynomial time.
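The single-task building block cited above can be illustrated with a small sketch. It assumes the commonly stated form of the Ishihara and Yasuura result: with discrete levels, a task that must finish by its deadline runs on the two levels that bracket the ideal continuous speed. The function name `two_level_split`, the frequency list, and the cycle counts are hypothetical, and this is not the Alloc-vt algorithm itself.

```python
from bisect import bisect_left

def two_level_split(cycles, deadline, freqs):
    """Split `cycles` between the two discrete frequencies that bracket the
    ideal continuous frequency cycles/deadline, so that the task finishes
    exactly at the deadline. Returns a list of (frequency, cycles) pairs."""
    freqs = sorted(freqs)
    ideal = cycles / deadline
    if ideal > freqs[-1]:
        raise ValueError("deadline cannot be met even at the highest frequency")
    i = bisect_left(freqs, ideal)      # index of the first frequency >= ideal
    hi = freqs[i]
    if i == 0 or hi == ideal:
        # The ideal speed is itself a level, or lies below the lowest level.
        return [(hi, cycles)]
    lo = freqs[i - 1]
    # Solve c_lo + c_hi = cycles and c_lo/lo + c_hi/hi = deadline.
    c_hi = (cycles - lo * deadline) * hi / (hi - lo)
    c_lo = cycles - c_hi
    return [(lo, c_lo), (hi, c_hi)]

# Hypothetical task: 600 cycles due in 10 time units, levels 40/80/100.
print(two_level_split(600, 10, [40, 80, 100]))
```

In this example the ideal speed is 60, so the work is split between the levels 40 and 80, and each part takes 5 time units.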

162 citations


Journal ArticleDOI
TL;DR: A study of 23 programs drawn from Powerstone, MediaBench, and Spec2000 benchmark suites shows that the configurable cache tuned to each program saved energy for every program compared to a conventional four-way set-associative cache as well as compared to a conventional direct-mapped cache, with an average savings of energy related to memory access of over 40%.
Abstract: Energy consumption is a major concern in many embedded computing systems. Several studies have shown that cache memories account for about 50% of the total energy consumed in these systems. The performance of a given cache architecture is determined, to a large degree, by the behavior of the application executing on the architecture. Desktop systems have to accommodate a very wide range of applications and therefore the cache architecture is usually set by the manufacturer as a best compromise given current applications, technology, and cost. Unlike desktop systems, embedded systems are designed to run a small range of well-defined applications. In this context, a cache architecture that is tuned for that narrow range of applications can have both increased performance as well as lower energy consumption. We introduce a novel cache architecture intended for embedded microprocessor platforms. The cache has three software-configurable parameters that can be tuned to particular applications. First, the cache's associativity can be configured to be direct-mapped, two-way, or four-way set-associative, using a novel technique we call way concatenation. Second, the cache's total size can be configured by shutting down ways. Finally, the cache's line size can be configured to have 16, 32, or 64 bytes. A study of 23 programs drawn from Powerstone, MediaBench, and Spec2000 benchmark suites shows that the configurable cache tuned to each program saved energy for every program compared to a conventional four-way set-associative cache as well as compared to a conventional direct-mapped cache, with an average savings of energy related to memory access of over 40%.
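To make the three tunable parameters concrete, the following sketch computes how an address would be divided into tag, index, and offset bits for each associativity and line size listed in the abstract. It is ordinary cache-geometry arithmetic, not the paper's way-concatenation hardware. The 8 KB capacity, the 32-bit address width, and the function name `cache_geometry` are assumptions made for illustration.

```python
def cache_geometry(total_bytes, ways, line_bytes, addr_bits=32):
    """Return the number of sets and the address-field widths of a
    set-associative cache. All sizes must be powers of two."""
    for value in (total_bytes, ways, line_bytes):
        if value <= 0 or value & (value - 1):
            raise ValueError("sizes must be positive powers of two")
    sets = total_bytes // (ways * line_bytes)
    if sets == 0:
        raise ValueError("cache too small for this configuration")
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return {"sets": sets, "offset_bits": offset_bits,
            "index_bits": index_bits, "tag_bits": tag_bits}

# Enumerate the associativities and line sizes named in the abstract
# for a hypothetical 8 KB cache.
for ways in (1, 2, 4):
    for line_bytes in (16, 32, 64):
        print(ways, line_bytes, cache_geometry(8192, ways, line_bytes))
```

Halving the associativity at a fixed capacity doubles the number of sets, so one tag bit moves into the index. Shutting down ways can be modeled here by lowering both `total_bytes` and `ways`.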

100 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose a stack analysis tool that statically guarantees stack safety of interrupt-driven embedded software using an approach based on context-sensitive dataflow analysis of object code.
Abstract: An important correctness criterion for software running on embedded microcontrollers is stack safety: a guarantee that the call stack does not overflow. Our first contribution is a method for statically guaranteeing stack safety of interrupt-driven embedded software using an approach based on context-sensitive dataflow analysis of object code. We have implemented a prototype stack analysis tool that targets software for Atmel AVR microcontrollers and tested it on embedded applications compiled from up to 30,000 lines of C. We experimentally validate the accuracy of the tool, which runs in under 10 sec on the largest programs that we tested. The second contribution of this paper is the development of two novel ways to reduce stack memory requirements of embedded software.
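A much simplified view of a static stack bound is sketched below. It walks a call graph with known per-function frame sizes and, as a deliberately coarse assumption, adds the worst-case depth of every interrupt handler once, as if each could preempt at the deepest point and none could re-enter. The paper's tool instead analyzes object code with context-sensitive dataflow analysis. The function names, frame sizes, and graph here are hypothetical.

```python
def worst_case_depth(graph, root, _active=None):
    """Worst-case stack bytes used by `root` and everything it calls.
    `graph` maps a function name to (frame_bytes, [callee names])."""
    active = set() if _active is None else _active
    if root in active:
        raise ValueError(f"recursion through {root!r}: no static bound")
    frame, callees = graph[root]
    active.add(root)
    deepest = max((worst_case_depth(graph, c, active) for c in callees),
                  default=0)
    active.remove(root)
    return frame + deepest

def stack_bound(graph, main, handlers):
    """Coarse bound: main's depth plus each handler's depth once
    (assumes every handler may nest once and none is re-entrant)."""
    return worst_case_depth(graph, main) + sum(
        worst_case_depth(graph, h) for h in handlers)

# Hypothetical firmware call graph with frame sizes in bytes.
graph = {
    "main": (16, ["read", "log"]),
    "read": (8, ["spi"]),
    "spi": (4, []),
    "log": (24, []),
    "timer_isr": (12, ["spi"]),
    "uart_isr": (10, []),
}
print(worst_case_depth(graph, "main"))                      # 40
print(stack_bound(graph, "main", ["timer_isr", "uart_isr"]))  # 66
```

The bound is safe only under the stated preemption assumption; handlers that re-enable interrupts and can nest repeatedly would need a different model.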

81 citations


Journal ArticleDOI
TL;DR: This paper presents the reflections that took place in the European Network of Excellence Artist leading us to propose principles and structured contents for building curricula on embedded software and systems.
Abstract: The design of embedded real-time systems requires skills from multiple specific disciplines, including, but not limited to, control, computer science, and electronics. This often involves experts from differing backgrounds, who do not recognize that they address similar, if not identical, issues from complementary angles. Design methodologies are lacking in rigor and discipline so that demonstrating correctness of an embedded design, if at all possible, is a very expensive proposition that may delay significantly the introduction of a critical product. While the economic importance of embedded systems is widely acknowledged, academia has not paid enough attention to the education of a community of high-quality embedded system designers, an obvious difficulty being the need of interdisciplinarity in a period where specialization has been the target of most education systems. This paper presents the reflections that took place in the European Network of Excellence Artist leading us to propose principles and structured contents for building curricula on embedded software and systems.

80 citations


Journal ArticleDOI
TL;DR: In this paper, the authors describe embedded system coursework during the first four years of university education (the U.S. undergraduate level) and describe lessons learned from teaching courses in many of these areas, as well as general skills taught and approaches used.
Abstract: Embedded systems encompass a wide range of applications, technologies, and disciplines, necessitating a broad approach to education. We describe embedded system coursework during the first 4 years of university education (the U.S. undergraduate level). Embedded application curriculum areas include: small and single-microcontroller applications, control systems, distributed embedded control, system-on-chip, networking, embedded PCs, critical systems, robotics, computer peripherals, wireless data systems, signal processing, and command and control. Additional cross-cutting skills that are important to embedded system designers include: security, dependability, energy-aware computing, software/systems engineering, real-time computing, and human--computer interaction. We describe lessons learned from teaching courses in many of these areas, as well as general skills taught and approaches used, including a heavy emphasis on course projects to teach system skills.

75 citations


Journal ArticleDOI
TL;DR: This paper presents an offline I/O device scheduling algorithm called energy-optimal device scheduler (EDS), which determines the start time of each job such that the energy consumption of the I/O devices is minimized, and a heuristic called maximum device overlap (MDO) to generate near-optimal solutions in polynomial time.
Abstract: Software-controlled (or dynamic) power management (DPM) in embedded systems has emerged as an attractive alternative to inflexible hardware solutions. However, DPM via I/O device scheduling for hard real-time systems has received relatively little attention. In this paper, we present an offline I/O device scheduling algorithm called energy-optimal device scheduler (EDS). For a given set of jobs, it determines the start time of each job such that the energy consumption of the I/O devices is minimized. EDS also ensures that no real-time constraint is violated. The device schedules are provably energy optimal under hard real-time job deadlines. Temporal and energy-based pruning are used to reduce the search space significantly. Since the I/O device scheduling problem is NP-complete, we also describe a heuristic called maximum device overlap (MDO) to generate near-optimal solutions in polynomial time. We present experimental results to show that EDS and MDO reduce the energy consumption of I/O devices significantly for hard real-time systems.

69 citations


Journal ArticleDOI
TL;DR: From the analysis it is concluded that embedded systems has a thematic identity and a functional legitimacy, which implies that the subject would benefit from being taught with an exemplifying selection and using an interactive communication, meaning that the education should move from teaching “something of everything” toward “everything of something.”
Abstract: This paper provides an analysis of embedded systems education using a didactic approach. Didactics is a field of educational studies mostly referring to research aimed at investigating what's unique with a particular subject and how this subject ought to be taught. From the analysis we conclude that embedded systems has a thematic identity and a functional legitimacy. This implies that the subject would benefit from being taught with an exemplifying selection and using an interactive communication, meaning that the education should move from teaching “something of everything” toward “everything of something.” The interactive communication aims at adapting the education toward the individual student, which is feasible if using educational methods inspired by project-organized and problem-based learning. This educational setting is also advantageous as it prepares the students for a future career as embedded system engineers. The conclusions drawn from the analysis correlate with our own experiences from education in mechatronics as well as with a recently published study of 21 companies in Sweden dealing with industrial software engineering.

59 citations


Journal ArticleDOI
TL;DR: The efforts conducted at Vanderbilt University to establish a curriculum that addresses the needs of embedded software and systems are described and current efforts in using learning technology to construct, manage, and deliver sophisticated computer-aided learning modules that can supplement the traditional course structure in the individual disciplines through out-of-class and in-class use are described.
Abstract: Embedded software and systems are at the intersection of electrical engineering, computer engineering, and computer science, with increasing importance in mechanical engineering. Despite the clear need for knowledge of systems modeling and analysis (covered in electrical and other engineering disciplines) and analysis of computational processes (covered in computer science), few academic programs have integrated the two disciplines into a cohesive program of study. This paper describes the efforts conducted at Vanderbilt University to establish a curriculum that addresses the needs of embedded software and systems. Given the compartmentalized nature of traditional engineering schools, where each discipline has an independent program of study, we have had to devise innovative ways to bring together the two disciplines. The paper also describes our current efforts in using learning technology to construct, manage, and deliver sophisticated computer-aided learning modules that can supplement the traditional course structure in the individual disciplines through out-of-class and in-class use.

53 citations


Journal ArticleDOI
TL;DR: A unique approach to embedded software protection that utilizes a hardware/software codesign methodology is introduced, demonstrating that this framework can be the successful basis for the development of embedded applications that meet a wide range of security and performance requirements.
Abstract: The new-found ubiquity of embedded processors in consumer and industrial applications brings with it an intensified focus on security, as a strong level of trust in the system software is crucial to their widespread deployment. The growing area of software protection attempts to address the key steps used by hackers in attacking a software system. In this paper, we introduce a unique approach to embedded software protection that utilizes a hardware/software codesign methodology. Results demonstrate that this framework can be the successful basis for the development of embedded applications that meet a wide range of security and performance requirements.

Journal ArticleDOI
TL;DR: This work is the first attempt to systematically tackle energy macromodeling of an embedded OS and presents experimental results for two well-known embedded OSs, namely, μC/OS and embedded Linux OS.
Abstract: As embedded systems get more complex, deployment of embedded operating systems (OSs) as software run-time engines has become common. In particular, this trend is true even for battery-powered embedded systems, where maximizing battery life is a primary concern. In such OS-driven embedded software, the overall energy consumption depends very much on which OS is used and how the OS is used. Therefore, the energy effects of the OS need to be studied in order to design low-energy systems effectively. In this paper, we discuss the motivation for performing OS energy characterization and propose a methodology to perform the characterization systematically. The methodology consists of two parts. The first part is analysis, which is concerned with identifying a set of components that can be used to characterize the OS energy consumption, called energy characteristics. The second part is macromodeling, which is concerned with obtaining quantitative macromodels for the energy characteristics. It involves the process of experiment design, data collection, and macromodel fitting. The OS energy macromodels can be used conveniently as OS energy estimators in high-level or architectural optimization of embedded systems for low-energy consumption. As far as we know, this work is the first attempt to systematically tackle energy macromodeling of an embedded OS. To demonstrate our approach, we present experimental results for two well-known embedded OSs, namely, μC/OS and embedded Linux OS.

Journal ArticleDOI
TL;DR: A compiler technique is described that ensures that dereferencing dangling pointers to freed memory does not violate memory safety, without annotations, run-time checks, or garbage collection, and works for arbitrary type-safe C programs.
Abstract: Traditional approaches to enforcing memory safety of programs rely heavily on run-time checks of memory accesses and on garbage collection, both of which are unattractive for embedded applications. The goal of our work is to develop advanced compiler techniques for enforcing memory safety with minimal run-time overheads. In this paper, we describe a set of compiler techniques that, together with minor semantic restrictions on C programs and no new syntax, ensure memory safety and provide most of the error-detection capabilities of type-safe languages, without using garbage collection, and with no run-time software checks, (on systems with standard hardware support for memory management). The language permits arbitrary pointer-based data structures, explicit deallocation of dynamically allocated memory, and restricted array operations. One of the key results of this paper is a compiler technique that ensures that dereferencing dangling pointers to freed memory does not violate memory safety, without annotations, run-time checks, or garbage collection, and works for arbitrary type-safe C programs. Furthermore, we present a new interprocedural analysis for static array bounds checking under certain assumptions. For a diverse set of embedded C programs, we show that we are able to ensure memory safety of pointer and dynamic memory usage in all these programs with no run-time software checks (on systems with standard hardware memory protection), requiring only minor restructuring to conform to simple type restrictions. Static array bounds checking fails for roughly half the programs we study due to complex array references, and these are the only cases where explicit run-time software checks would be needed under our language and system assumptions.

Journal ArticleDOI
TL;DR: The considerations that are driving the curriculum development are presented, along with the search for fundamentals of embedded system science rather than embedded system design techniques, an approach that today is rather unique.
Abstract: Embedded systems have been a traditional area of strength in the research agenda of the University of California at Berkeley. In parallel to this effort, a pattern of graduate and undergraduate classes has emerged that is the result of a distillation process of the research results. In this paper, we present the considerations that are driving our curriculum development and we review our undergraduate and graduate program. In particular, we describe in detail a graduate class (EECS249: Design of Embedded Systems: Modeling, Validation and Synthesis) that has been taught for six years. A common feature of our education agenda is the search for fundamentals of embedded system science rather than embedded system design techniques, an approach that today is rather unique.

Journal ArticleDOI
TL;DR: A dynamic DElay-constrained minimum-Energy Dissemination (DEED) scheme that increases the probability that packets arrive at users within an upper-bound end-to-end delay (UBED) and minimizes energy consumption in both building the d-tree and disseminating data to mobile sinks.
Abstract: Disseminating data generated by sensors to users is one of the useful functions of sensor networks. In probable real-time applications of sensor networks, multiple mobile users should receive data within their end-to-end delay constraint. In this paper, we propose a dynamic DElay-constrained minimum-Energy Dissemination (DEED) scheme. A dissemination tree (d-tree) is updated in a distributed way without regenerating the tree from scratch, such that energy consumption of the tree is minimized while satisfying end-to-end delay constraints. The d-tree is adjusted using delay estimation based on geometric distance. DEED increases the probability that packets arrive at users within an upper-bound end-to-end delay (UBED) and minimizes energy consumption in both building the d-tree and disseminating data to mobile sinks. Evaluation results show that DEED makes each node consume small energy resources and maintains fewer UBED misses when compared to Directed Diffusion and other baselines for sensor networks.

Journal ArticleDOI
TL;DR: The curriculum has a strong mathematics and basic science base, an in-depth exposure to engineering science and design of systems implemented with digital hardware and software, and coverage of two prominent application areas of embedded systems.
Abstract: The paper presents a curriculum for a 4-year undergraduate program in Embedded System Engineering (ESE). The curriculum was developed using a two-step approach. First, a body of education knowledge for Embedded System Engineering was defined. The body consists of sixteen knowledge areas. Each area is composed of several knowledge units, some designated as core and others as electives. The minimum lecture time for the core of each knowledge area is identified. The Body of Knowledge for Computer Engineering, developed by the IEEE-CS/ACM task force for Computing Curricula, was used as a reference. The education knowledge for ESE then served as the base for the development of the program curriculum. The curriculum has a strong mathematics and basic science base, an in-depth exposure to engineering science and design of systems implemented with digital hardware and software, and coverage of two prominent application areas of embedded systems. The curriculum core takes approximately 3 years of the program; the remaining part is elective.

Journal ArticleDOI
TL;DR: A schedulability analysis for applications consisting of mixed event-triggered and time-triggered processes and messages, and a worst-case queuing delay analysis for the gateways, responsible for routing inter-cluster traffic, are proposed.
Abstract: We present an approach to frame packing for multicluster distributed embedded systems consisting of time-triggered and event-triggered clusters, interconnected via gateways. In our approach, the application messages are packed into frames such that the application is schedulable, thus the end-to-end message communication constraints are satisfied. We have proposed a schedulability analysis for applications consisting of mixed event-triggered and time-triggered processes and messages, and a worst-case queuing delay analysis for the gateways, responsible for routing inter-cluster traffic. Optimization heuristics for frame packing aiming at producing a schedulable system have been proposed. Extensive experiments and a real-life example show the efficiency of our frame-packing approach.

Journal ArticleDOI
TL;DR: A generalization of the supervisory control problem proposed by Ramadge and Wonham, which is able to exchange the coaccessibility requirement by any condition that could be used in model checking, and delivers algorithms to solve the generalized synthesis problem.
Abstract: Model checking and supervisor synthesis have been successful in solving different design problems related to discrete systems in the last decades. In this paper, we analyze some advantages and drawbacks of these approaches and combine them for mutual improvement. We achieve this through a generalization of the supervisory control problem proposed by Ramadge and Wonham. The objective of that problem is to synthesize a supervisor which constrains a system's behavior according to a given specification, ensuring controllability and coaccessibility. By introducing a new representation of the solution using systems of μ-calculus equations, we are able to handle these two conditions separately and thus to exchange the coaccessibility requirement by any condition that could be used in model checking. Well-known results on μ-calculus model checking allow us to easily assess the computational complexity of any generalization. Moreover, the model checking approach also delivers algorithms to solve the generalized synthesis problem. We include an example in which the coaccessibility requirement is replaced by fairness constraints. The paper also contains an analysis of related work by several authors.

Journal ArticleDOI
TL;DR: Preliminary results with several array-intensive applications and varying input sizes show that the DST approach outperforms classical iteration space-oriented tiling as well as a data-oriented approach that considers each nest in isolation.
Abstract: Improving locality of data references is becoming increasingly important due to the increasing gap between processor cycle times and off-chip memory access latencies. Improving data locality not only improves effective memory access time but also reduces memory system energy consumption due to data references. An optimizing compiler can play an important role in enhancing data locality in array-intensive embedded media applications with regular data access patterns. This paper presents a compiler-based data space-oriented tiling approach (DST). In this strategy, the data space (e.g., an array of signals) is logically divided into chunks (called data tiles) and each data tile is processed in turn. In processing a data tile, our approach traverses the entire iteration space of all nests in the code and executes all iterations (potentially coming from different nests) that access the data tile being processed. In doing so, it also takes data dependences into account. Since a data space is common across all nests that access it, DST can potentially achieve better results than traditional iteration space (loop) tiling by exploiting internest data locality. We also present an example application of DST for improving the effectiveness of a scratch pad memory (SPM) for data accesses. SPMs are alternatives to conventional cache memories in the embedded computing world. These small on-chip memories, like caches, provide fast and low-power access to data, but they differ from conventional data caches in that their contents are managed by the compiler instead of hardware. We have implemented DST in a source-to-source translator and quantified its benefits using a simulator. Our preliminary results with several array-intensive applications and varying input sizes show that our approach outperforms classical iteration space-oriented tiling as well as a data-oriented approach that considers each nest in isolation.
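The idea of processing one data tile at a time across several loop nests can be shown on a toy example. The sketch below assumes two hypothetical nests that both read the same array `a` with different access patterns and have no dependences between them, so their iterations can be reordered freely. A real compiler would have to derive the iteration sets and honor dependences, which this hand-written version does not attempt.

```python
def nest_by_nest(a):
    """Reference order: run each loop nest to completion in turn."""
    n = len(a)
    b, c = [0] * n, [0] * n
    for i in range(n):              # nest 1 reads a[i]
        b[i] = 2 * a[i]
    for j in range(n):              # nest 2 reads a[n - 1 - j]
        c[j] = a[n - 1 - j] + 1
    return b, c

def data_tiled(a, tile):
    """Data-space-oriented order: for each tile of `a`, execute the
    iterations of BOTH nests that touch an element of that tile."""
    n = len(a)
    b, c = [0] * n, [0] * n
    for start in range(0, n, tile):
        end = min(start + tile, n)
        for i in range(start, end):     # nest 1 iterations using this tile
            b[i] = 2 * a[i]
        for i in range(start, end):     # nest 2 iterations using this tile
            j = n - 1 - i               # the iteration that reads a[i]
            c[j] = a[i] + 1
    return b, c

a = list(range(10))
print(data_tiled(a, 4) == nest_by_nest(a))  # True
```

Each element of `a` is brought in once per tile and reused by both nests before the next tile is touched, which is the internest reuse the abstract refers to. The tile size of 4 is arbitrary.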

Journal ArticleDOI
TL;DR: A data flow graph transformation method coupled with efficient scheduling and allocation is used to automatically synthesize a Multi-Mode system from its behavior-level specifications, which can easily switch configurations throughout the set of configurations it is designed for.
Abstract: In this paper, we present a novel design methodology for synthesizing multiple configurations (or modes) into a single programmable core that can be used in embedded systems. Recent portable applications require reconfigurability of a system along with efficiency in terms of power, performance, and area. The field programmable gate arrays (FPGAs) provide a reconfigurable platform; however, they are slower in speed with significantly higher power and area than achievable by a customized application-specific integrated circuit (ASIC). Implementation of a system in either FPGA or ASIC represents a trade-off between programmability and design efficiency. In this work, we have developed techniques to realize efficient reconfigurable cores for a set of user-specified applications. The resultant system, named as multimode system, can easily switch configurations throughout the set of configurations it is designed for. A data flow graph transformation method coupled with efficient scheduling and allocation is used to automatically synthesize a Multi-Mode system from its behavior-level specifications. Experimental results on several applications demonstrate that our implementations can achieve about 60X power reduction on average and run 35X faster over corresponding FPGA implementations.

Journal ArticleDOI
TL;DR: A novel Energy-Aware Compilation (EAC) framework that estimates and optimizes energy consumption of a given code, taking as input the architectural and technological parameters, energy models, and energy/performance/code size constraints.
Abstract: The demand for high-performance architectures and powerful battery-operated mobile devices has accentuated the need for power optimization. While many power-oriented hardware optimization techniques have been proposed and incorporated in current systems, the increasingly critical power constraints have made it essential to look for software-level optimizations as well. The compiler can play a pivotal role in addressing the power constraints of a system as it wields a significant influence on the application's runtime behavior. This paper presents a novel Energy-Aware Compilation (EAC) framework that estimates and optimizes energy consumption of a given code, taking as input the architectural and technological parameters, energy models, and energy/performance/code size constraints. The framework has been validated using a cycle-accurate architectural-level energy simulator and found to be within a 6% error margin while providing significant estimation speedup. The estimation speed of EAC is the key to the number of optimization alternatives that can be explored within a reasonable compilation time. As shown in this paper, EAC allows compiler writers and system designers to investigate power-performance tradeoffs of traditional compiler optimizations and to develop energy-conscious high-level code transformations.

Journal ArticleDOI
TL;DR: An algorithm for scheduling a set of nonrecurrent tasks (or jobs) with FIFO real-time constraints so as to minimize the total energy consumed when the tasks are performed on a dynamically variable voltage processor is presented.
Abstract: We present an algorithm for scheduling a set of nonrecurrent tasks (or jobs) with FIFO real-time constraints so as to minimize the total energy consumed when the tasks are performed on a dynamically variable voltage processor. Our algorithm runs in linear time and thus, in this case, is an improvement over the classical algorithm of Yao et al. It was inspired by considering the problem as a shortest-path problem. We also propose an algorithm to deal with the case where the processor has only a limited number of clock frequencies. This algorithm gives the optimum schedule with the minimum number of speed changes, which is important when the speed switching overhead cannot be neglected. All our algorithms are linear in the number of tasks if the arrivals and deadlines are sorted and otherwise need O(N log N) time. These complexities are shown to be the best possible. Finally, we extend our results to fluid tasks and to nonconvex cost functions.
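For context, the classical algorithm of Yao et al. that this abstract compares against starts by finding the time interval with the highest work density, which fixes the peak speed of the energy-optimal schedule. The sketch below is a brute-force version of that first step only, not the linear-time method of the paper. The job tuples and the name `critical_interval` are hypothetical.

```python
def critical_interval(jobs):
    """jobs: list of (arrival, deadline, work).
    Return (intensity, (start, end)) for the interval whose contained
    jobs demand the most work per unit time."""
    starts = sorted({a for a, _, _ in jobs})
    ends = sorted({d for _, d, _ in jobs})
    best = (0.0, None)
    for s in starts:
        for e in ends:
            if e <= s:
                continue
            # Only jobs that must run entirely inside [s, e] count.
            work = sum(w for a, d, w in jobs if a >= s and d <= e)
            intensity = work / (e - s)
            if intensity > best[0]:
                best = (intensity, (s, e))
    return best

# Hypothetical FIFO job set: arrival order equals deadline order.
jobs = [(0, 2, 1), (1, 4, 6), (3, 8, 2)]
print(critical_interval(jobs))  # (2.0, (1, 4))
```

The classical algorithm then runs the jobs of that interval at this speed, removes them, compresses the timeline, and repeats. The abstract's contribution is obtaining the same optimum in linear time when the constraints are FIFO and already sorted.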

Journal ArticleDOI
TL;DR: A novel approach that enhances the performance of 16-bit Thumb code by incorporating Augmenting eXtensions (AX) and ensuring that coalescing does not introduce pipeline delays or increase cycle time thereby resulting in reduction of both instruction counts and cycle counts.
Abstract: In the embedded domain, memory usage and energy consumption are critical constraints. Embedded processors such as the ARM and MIPS provide a 16-bit instruction set (called Thumb in the case of the ARM family of processors) in addition to the 32-bit instruction set to address these concerns. Using 16-bit instructions one can achieve code size reduction and instruction cache energy savings at the cost of performance. This paper presents a novel approach that enhances the performance of 16-bit Thumb code. We have observed that throughout Thumb code there exist Thumb instruction pairs that are equivalent to a single ARM instruction. We have developed enhancements to the processor microarchitecture and the Thumb instruction set to exploit this property. We enhance the Thumb instruction set by incorporating Augmenting eXtensions (AX). A Thumb instruction pair that can be combined into a single ARM instruction is replaced by an AXThumb instruction pair by the compiler. The AX instruction is coalesced with the immediately following Thumb instruction to generate a single ARM instruction at decode time. The enhanced microarchitecture ensures that coalescing does not introduce pipeline delays or increase cycle time, thereby resulting in reduction of both instruction counts and cycle counts. Using AX instructions and coalescing hardware we are also able to support efficient predicated execution in 16-bit mode.

Journal ArticleDOI
TL;DR: This work extends the ESTEREL language with a new gotopause construct, which behaves as a noninstantaneous jump instruction compatible with concurrency, and derives very efficient, correct-by-construction algorithms to verify and transform loops at compile time, using static analysis and program rewriting techniques.
Abstract: ESTEREL is a synchronous design language for the specification of reactive systems. Thanks to its compact formal semantics, code generation for ESTEREL is essentially provably correct. In practice, due to the many intricacies of an optimizing compiler, an actual proof would be in order. To begin with, we need a precise description of an efficient translation scheme, into some lower-level formalism. We tackle this issue on a specific part of the compilation process: the translation of loop constructs. First, because of instantaneous loops, programs may generate runtime errors, which cannot be tolerated for embedded systems, and have to be predicted and prevented at compile time. Second, because of schizophrenia, loops must be partly unfolded, making C code generation, as well as logic synthesis, nonlinear in general. Clever expansion strategies are required to minimize the unfolding. We first characterize these two difficulties w.r.t. the formal semantics of ESTEREL. We then derive very efficient, correct-by-construction algorithms to verify and transform loops at compile time, using static analysis and program rewriting techniques. With this aim in view, we extend the language with a new gotopause construct, which we use to encode loops. It behaves as a noninstantaneous jump instruction compatible with concurrency.

Journal ArticleDOI
TL;DR: The six-consortium architecture and the organization and programs of ESW are introduced and the embedded software curriculum developed by ESW is described, which will be implemented in most universities and colleges in Taiwan to promote the capabilities of embedded software design and implementations.
Abstract: The advancement of semiconductor manufacturing technology makes it practical to place a traditional board-level embedded system on a single chip. The evolvement of system-on-chip (SoC) techniques presents new challenges for integrated circuit designs as well as embedded software and systems. To address these challenges, the Ministry of Education (MOE) of Taiwan has been running the VLSI Circuits and Systems Education Program since 1996. This program adopts a top-down approach by forming six domain-specific, intercollegiate consortia. The Embedded Software (ESW) consortium addresses the challenges of embedded software for SoC systems. This paper first introduces the six-consortium architecture and the organization and programs of ESW. We next describe the embedded software curriculum developed by ESW. This curriculum will later be implemented in most universities and colleges in Taiwan to promote the capabilities of embedded software design and implementations. Finally, we present an execution summary of ESW 2004.

Journal ArticleDOI
TL;DR: A new Cache-Aware Code Allocation Technique (CAT) is presented, which transforms the structure of programs so that their behavior toward memory can meet the locality features the cache is able to exploit.
Abstract: In the embedded domain, the gap between memory and processor performance and the increase in application complexity need to be supported without wasting precious system resources: die size, power, etc. For these reasons, effective exploitation of small and simple cache memories is of the utmost importance. However, programs running on such caches can experience serious inefficiencies due to cache conflicts. We present a new Cache-Aware Code Allocation Technique (CAT), which transforms the structure of programs so that their behavior toward memory can meet the locality features the cache is able to exploit. The proposed approach uses detailed information of program execution to place program areas into memory and employs the new idea of “look-forward estimation” that helps to seek better global layouts during the placement of each area. CAT-optimized programs outperform the original ones achieving the same miss rate on two times, and sometimes four times, smaller caches. Moreover, CAT improves the instruction miss rate by more than 40% if compared to the best procedure-reordering algorithm. CAT performances derive from the increased number of cache lines that support the execution of optimized applications and from a more balanced load on them.

Journal ArticleDOI
TL;DR: A DISE-based implementation of postfetch decompression is presented and it is shown that it naturally supports customized program-specific decompression dictionaries, enables parameterized decompression allowing similar-but-not-identical instruction sequences to share dictionary entries, and uses no decompression-specific hardware.
Abstract: Code compression coupled with dynamic decompression is an important technique for both embedded and general-purpose microprocessors. Postfetch decompression, in which decompression is performed after the compressed instructions have been fetched, allows the instruction cache to store compressed code but requires a highly efficient decompression implementation. We propose implementing postfetch decompression using a new hardware facility called dynamic instruction stream editing (DISE). DISE provides a programmable decoder---similar in structure to those in many IA-32 processors---that is used to add functionality to an application by injecting custom code snippets into its fetched instruction stream. We present a DISE-based implementation of postfetch decompression and show that it naturally supports customized program-specific decompression dictionaries, enables parameterized decompression allowing similar-but-not-identical instruction sequences to share dictionary entries, and uses no decompression-specific hardware. We present extensive experimental results showing the virtue of this approach and evaluating the factors that impact its efficacy. We also present implementation-neutral results that give insight into the characteristics of any postfetch decompression technique. Our experiments not only demonstrate significant reduction in code size (up to 35%) but also significant improvements in performance (up to 20%) and energy (up to 10%).
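
The idea of parameterized dictionary decompression mentioned in this abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions: the dictionary contents, the pseudo-assembly syntax, and the names `DICTIONARY` and `decompress` are hypothetical and do not describe the DISE hardware or its actual encoding. It shows how similar-but-not-identical instruction sequences can share one dictionary entry by passing the differing fields as parameters.

```python
# Hypothetical dictionary: entry id -> instruction templates with parameter slots.
# {0} stands for a register name, {1} for a stack offset.
DICTIONARY = {
    0: ["lw {0}, {1}(sp)", "addi {0}, {0}, 1", "sw {0}, {1}(sp)"],
}


def decompress(stream):
    """Expand a compressed stream into a plain instruction list.

    Each item is either an uncompressed instruction (a string) or a
    codeword (a tuple of entry id followed by its parameters).
    """
    expanded = []
    for item in stream:
        if isinstance(item, tuple):
            entry_id, *params = item
            # Fill the shared template with this occurrence's parameters.
            expanded.extend(t.format(*params) for t in DICTIONARY[entry_id])
        else:
            expanded.append(item)
    return expanded


compressed = [(0, "r1", 8), "nop", (0, "r2", 12)]
program = decompress(compressed)
```

Here three stream items expand to seven instructions, and the two codewords reuse the same entry even though they name different registers and offsets.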

Journal ArticleDOI
TL;DR: This paper presents code restructuring techniques for array-based and pointer-intensive applications for reducing data cache leakage energy consumption and indicates that the proposed compiler-based strategy reduces the cache energy consumption significantly.
Abstract: Silicon technology advances have made it possible to pack millions of transistors---switching at high clock speeds---on a single chip. While these advances bring unprecedented performance to electronic products, they also pose difficult power/energy consumption problems. For example, large number of transistors in dense on-chip cache memories consume significant static (leakage) power even if the cache is not used by the current computation. While previous compiler research studied code and data restructuring for improving data cache performance, to our knowledge, there exists no compiler-based study that targets data cache leakage power consumption. In this paper, we present code restructuring techniques for array-based and pointer-intensive applications for reducing data cache leakage energy consumption. The idea is to let the compiler analyze the application code and insert instructions that turn off cache lines that keep variables not used by the current computation. This turning-off does not destroy contents of a cache line and waking up the cache line (when it is accessed later) does not incur much overhead. Due to inherent data locality in applications, we find that, at a given time, only a small portion of the data cache needs to be active; the remaining part can be placed into a leakage-saving mode (state); i.e., they can be turned off. Our experimental results indicate that the proposed compiler-based strategy reduces the cache energy consumption significantly. We also demonstrate how different compiler optimizations can increase the effectiveness of our strategy.

Journal ArticleDOI
TL;DR: This paper presents an approach for analyzing data reuse properties of loop nests, and gives algorithms to simulate the footprints of array references in their reuse space and presents an optimization algorithm to compute the cache configurations for each loop nest.
Abstract: Classical compiler optimizations assume a fixed cache architecture and modify the program to take best advantage of it. In some cases, this may not be the best strategy because each nest might work best with a different cache configuration and transforming a nest for a given fixed cache configuration may not be possible due to data and control dependences. Working with a fixed cache configuration can also increase energy consumption in loops where the best required configuration is smaller than the default (fixed) one. In this paper, we take an alternate approach and modify the cache configuration for each nest, depending on the access pattern exhibited by the nest. We call this technique compiler-directed cache polymorphism (CDCP). More specifically, in this paper, we make the following contributions. First, we present an approach for analyzing data reuse properties of loop nests. Second, we give algorithms to simulate the footprints of array references in their reuse space. Third, based on our reuse analysis, we present an optimization algorithm to compute the cache configurations for each loop nest. Our experimental results show that CDCP is very effective in finding the near-optimal data cache configurations for different nests in array-intensive applications.

Journal ArticleDOI
TL;DR: An approach called “selective formalism” allows the use of CSP to be limited to specifying the control portion of a system, while the rest of its functionality is supplied in the form of C++ modules.
Abstract: CSP (communicating sequential processes) is a useful algebraic notation for creating a hierarchical behavioral specification for concurrent systems, due to its formal interprocess synchronization and communication semantics. CSP specifications are amenable to simulation and formal verification by model-checking tools. A translator has been created to synthesize C++ code from CSP for execution with an object-oriented framework called CSP++, thereby making CSP specifications directly executable. To overcome the drawback that CSP is neither a full-featured nor popular programming language, an approach called “selective formalism” allows the use of CSP to be limited to specifying the control portion of a system, while the rest of its functionality is supplied in the form of C++ modules. These are activated through association with abstract events in the CSP specification. This is a new means of bringing convergence between a formal method and a popular programming language. It is believed that this methodology can be extended to hardware/software codesign for embedded systems.