scispace - formally typeset
Search or ask a question

Showing papers in "ACM Transactions in Embedded Computing Systems in 2004"


Journal ArticleDOI
TL;DR: A virtual force algorithm (VFA) is proposed as a sensor deployment strategy to enhance the coverage after an initial random placement of sensors to improve the coverage of cluster-based distributed sensor networks.
Abstract: The effectiveness of cluster-based distributed sensor networks depends to a large extent on the coverage provided by the sensor deployment. We propose a virtual force algorithm (VFA) as a sensor deployment strategy to enhance the coverage after an initial random placement of sensors. For a given number of sensors, the VFA algorithm attempts to maximize the sensor field coverage. A judicious combination of attractive and repulsive forces is used to determine the new sensor locations that improve the coverage. Once the effective sensor positions are identified, a one-time movement with energy consideration incorporated is carried out, that is, the sensors are redeployed, to these positions. We also propose a novel probabilistic target localization algorithm that is executed by the cluster head. The localization results are used by the cluster head to query only a few sensors (out of those that report the presence of a target) for more detailed information. Simulation results are presented to demonstrate the effectiveness of the proposed approach.

518 citations


Journal ArticleDOI
TL;DR: An introduction to the challenges involved in secure embedded system design is provided, recent advances in addressing them are discussed, and opportunities for future research are identified.
Abstract: Many modern electronic systems---including personal computers, PDAs, cell phones, network routers, smart cards, and networked sensors to name a few---need to access, store, manipulate, or communicate sensitive information, making security a serious concern in their design. Embedded systems, which account for a wide range of products from the electronics, semiconductor, telecommunications, and networking industries, face some of the most demanding security concerns---on the one hand, they are often highly resource constrained, while on the other hand, they frequently need to operate in physically insecure environments.Security has been the subject of intensive research in the context of general-purpose computing and communications systems. However, security is often misconstrued by embedded system designers as the addition of features, such as specific cryptographic algorithms and security protocols, to the system. In reality, it is a new dimension that designers should consider throughout the design process, along with other metrics such as cost, performance, and power.The challenges unique to embedded systems require new approaches to security covering all aspects of embedded system design from architecture to implementation. Security processing, which refers to the computations that must be performed in a system for the purpose of security, can easily overwhelm the computational capabilities of processors in both low- and high-end embedded systems. This challenge, which we refer to as the "security processing gap," is compounded by increases in the amounts of data manipulated and the data rates that need to be achieved. Equally daunting is the "battery gap" in battery-powered embedded systems, which is caused by the disparity between rapidly increasing energy requirements for secure operation and slow improvements in battery technology. The final challenge is the "assurance gap," which relates to the gap between functional security measures (e.g., security services, protocols, and their constituent cryptographic algorithms) and actual secure implementations. This paper provides an introduction to the challenges involved in secure embedded system design, discusses recent advances in addressing them, and identifies opportunities for future research.

484 citations


Journal ArticleDOI
TL;DR: The development of a scalable broadcast authentication scheme based on μTESLA, a broadcast authentication protocol whose scalability is limited by its unicast-based initial parameter distribution, and the experimental results obtained through simulation are presented.
Abstract: Broadcast authentication is a fundamental security service in distributed sensor networks. This paper presents the development of a scalable broadcast authentication scheme named multilevel μTESLA based on μTESLA, a broadcast authentication protocol whose scalability is limited by its unicast-based initial parameter distribution. Multilevel μTESLA satisfies several nice properties, including low overhead, tolerance of message loss, scalability to large networks, and resistance to replay attacks as well as denial-of-service attacks. This paper also presents the experimental results obtained through simulation, which demonstrate the performance of the proposed scheme under severe denial-of-service attacks and poor channel quality.

381 citations


Journal ArticleDOI
TL;DR: In this paper, the authors describe the motivation, design, implementation, and experimental evaluation (on sharply resource-constrained devices) of a self-configuring localization system using radio beacons.
Abstract: Embedded networked sensors promise to revolutionize the way we interact with our physical environment and require scalable, ad hoc deployable and energy-efficient node localization/positioning.This paper describes the motivation, design, implementation, and experimental evaluation (on sharply resource-constrained devices) of a self-configuring localization system using radio beacons. We identify beacon density as an important parameter in determining localization quality, which saturates at a transition density. We develop algorithms to improve localization quality by (i) automating placement of new beacons at low densities (HEAP) and (ii) rotating functionality among redundant beacons while increasing system lifetime at high densities (STROBE).

269 citations


Journal ArticleDOI
TL;DR: A real-time garbage-collection mechanism, which provides a guaranteed performance, for hard real- time systems, and a wear-leveling method, which is executed as a non-real-time service, to resolve the endurance problem of flash memory.
Abstract: Flash-memory technology is becoming critical in building embedded systems applications because of its shock-resistant, power economic, and nonvolatile nature. With the recent technology breakthroughs in both capacity and reliability, flash-memory storage systems are now very popular in many types of embedded systems. However, because flash memory is a write-once and bulk-erase medium, we need a translation layer and a garbage-collection mechanism to provide applications a transparent storage service. In the past work, various techniques were introduced to improve the garbage-collection mechanism. These techniques aimed at both performance and endurance issues, but they all failed in providing applications a guaranteed performance. In this paper, we propose a real-time garbage-collection mechanism, which provides a guaranteed performance, for hard real-time systems. On the other hand, the proposed mechanism supports non-real-time tasks so that the potential bandwidth of the storage system can be fully utilized. A wear-leveling method, which is executed as a non-real-time service, is presented to resolve the endurance problem of flash memory. The capability of the proposed mechanism is demonstrated by a series of experiments over our system prototype.

262 citations


Journal ArticleDOI
TL;DR: This work takes advantage of queuing delay and the broadcast nature of wireless communication to concatenate network units into an aggregate using a novel adaptive feedback scheme to schedule the delivery of this aggregate to the MAC layer for transmission.
Abstract: Sensor networks, a novel paradigm in distributed wireless communication technology, have been proposed for various applications including military surveillance and environmental monitoring. These systems deploy heterogeneous collections of sensors capable of observing and reporting on various dynamic properties of their surroundings in a time sensitive manner. Such systems suffer bandwidth, energy, and throughput constraints that limit the quantity of information transferred from end-to-end. These factors coupled with unpredictable traffic patterns and dynamic network topologies make the task of designing optimal protocols for such networks difficult. Mechanisms to perform data-centric aggregation utilizing application-specific knowledge provide a means to augmenting throughput, but have limitations due to their lack of adaptation and reliance on application-specific decisions. We, therefore, propose a novel aggregation scheme that adaptively performs application-independent data aggregation in a time sensitive manner. Our work isolates aggregation decisions into a module that resides between the network and the data-link layer and does not require any modifications to the currently existing MAC and network layer protocols. We take advantage of queuing delay and the broadcast nature of wireless communication to concatenate network units into an aggregate using a novel adaptive feedback scheme to schedule the delivery of this aggregate to the MAC layer for transmission. In our evaluation we show that end-to-end transmission delay is reduced by as much as 80p under heavy traffic loads. Additionally, we show as much as a 50p reduction in transmission energy consumption with an overall reduction in header overhead. Theoretical analysis, simulation, and a test-bed implementation on Berkeley's MICA motes are provided to validate our claims.

237 citations


Journal ArticleDOI
TL;DR: This contribution provides a state-of-the-art description of security issues on FPGAs, both from the system and implementation perspectives, and summarizes both public and symmetric-key algorithm implementations on FGPAs.
Abstract: In the last decade, it has become apparent that embedded systems are integral parts of our every day lives. The wireless nature of many embedded applications as well as their omnipresence has made the need for security and privacy preserving mechanisms particularly important. Thus, as field programmable gate arrays (FPGAs) become integral parts of embedded systems, it is imperative to consider their security as a whole. This contribution provides a state-of-the-art description of security issues on FPGAs, both from the system and implementation perspectives. We discuss the advantages of reconfigurable hardware for cryptographic applications, show potential security problems of FPGAs, and provide a list of open research problems. Moreover, we summarize both public and symmetric-key algorithm implementations on FPGAs.

203 citations


Journal ArticleDOI
TL;DR: A rigorous definition of leakage immunity is given and several leakage detection tests are presented, which confirm the probable existence of secret-correlated emanations and indicate how likely the leakage is.
Abstract: In addition to its usual complexity assumptions, cryptography silently assumes that information can be physically protected in a single location. As one can easily imagine, real-life devices are not ideal and information may leak through different physical channels.This paper gives a rigorous definition of leakage immunity and presents several leakage detection tests. In these tests, failure confirms the probable existence of secret-correlated emanations and indicates how likely the leakage is. Success does not refute the existence of emanations but indicates that significant emanations were not detected on the strength of the evidence presented, which of course, leaves the door open to reconsider the situation if further evidence comes to hand at a later date.

182 citations


Journal ArticleDOI
TL;DR: This work presents a lightweight security protocol (LiSP) that makes a tradeoff between security and resource consumption via efficient rekeying and shows that LiSP reduces resource consumption significantly, while requiring only three hash computations, on average, and a storage space for eight keys.
Abstract: Small low-cost sensor devices with limited resources are being used widely to build a self-organizing wireless network for various applications, such as situation monitoring and asset surveillance. Making such a sensor network secure is crucial to their intended applications, yet challenging due to the severe resource constraints in each sensor device. We present a lightweight security protocol (LiSP) that makes a tradeoff between security and resource consumption via efficient rekeying. The heart of the protocol is the novel rekeying mechanism that offers (1) efficient key broadcast without requiring retransmission/ACKs, (2) authentication for each key-disclosure without incurring additional overhead, (3) the ability of detecting/recovering lost keys, (4) seamless key refreshment without disrupting ongoing data encryption/decryption, and (5) robustness to inter-node clock skews. Furthermore, these benefits are preserved in conventional contention-based medium access control protocols that do not support reliable broadcast. Our performance evaluation shows that LiSP reduces resource consumption significantly, while requiring only three hash computations, on average, and a storage space for eight keys.

120 citations


Journal ArticleDOI
TL;DR: These experiments represent the most comprehensive hardware/software partitioning study published to date and found that moving critical code to hardware resulted in average speedups of 3 to 5 and average energy savings of 35% to 70%, with average hardware requirements of only 5000 to 10,000 gates.
Abstract: We present results of extensive hardware/software partitioning experiments on numerous benchmarks. We describe our loop-oriented partitioning methodology for moving critical code from hardware to software. Our benchmarks included programs from PowerStone, MediaBench, and NetBench. Our experiments included estimated results for partitioning using an 8051 8-bit microcontroller or a 32-bit MIPS microprocessor for the software, and using on-chip configurable logic or custom application-specific integrated circuit hardware for the hardware. Additional experiments involved actual measurements taken from several physical implementations of hardware/software partitionings on real single-chip microprocessor/configurable-logic devices. We also estimated results assuming voltage scalable processors. We provide performance, energy, and size data for all of the experiments. We found that the benchmarks spent an average of 80p of their execution time in only 3p of their code, amounting to only about 200 bytes of critical code. For various experiments, we found that moving critical code to hardware resulted in average speedups of 3 to 5 and average energy savings of 35p to 70p, with average hardware requirements of only 5000 to 10,000 gates. To our knowledge, these experiments represent the most comprehensive hardware/software partitioning study published to date.

106 citations


Journal ArticleDOI
TL;DR: This work introduces on-chip hardware implementing an efficient cache tuning heuristic that can automatically, transparently, and dynamically tune the cache to an executing program, completely transparently to the programmer.
Abstract: Memory accesses often account for about half of a microprocessor system's power consumption. Customizing a microprocessor cache's total size, line size, and associativity to a particular program is well known to have tremendous benefits for performance and power. Customizing caches has until recently been restricted to core-based flows, in which a new chip will be fabricated. However, several configurable cache architectures have been proposed recently for use in prefabricated microprocessor platforms. Tuning those caches to a program is still, however, a cumbersome task left for designers, assisted in part by recent computer-aided design (CAD) tuning aids. We propose to move that CAD on-chip, which can greatly increase the acceptance of tunable caches. We introduce on-chip hardware implementing an efficient cache tuning heuristic that can automatically, transparently, and dynamically tune the cache to an executing program. Our heuristic seeks not only to reduce the number of configurations that must be examined, but also traverses the search space in a way that minimizes costly cache flushes. By simulating numerous Powerstone and MediaBench benchmarks, we show that such a dynamic self-tuning cache saves on average 40p of total memory access energy over a standard nontuned reference cache.

Journal ArticleDOI
TL;DR: An integrated heuristic methodology is proposed, which allows the scheduler to handle power-aware real-time tasks with low cost while maximizing the use of the available resources and without jeopardizing the temporal constraints of the system.
Abstract: In this paper, we propose a novel scheduling framework for a dynamic real-time environment with energy constraints. This framework dynamically adjusts the CPU voltage/frequency so that no task in the system misses its deadline and the total energy savings of the system are maximized. In this paper, we consider only realistic, discrete-level speeds.Each task in the system consumes a certain amount of energy, which depends on a speed chosen for execution. The process of selecting speeds for execution while maximizing the energy savings of the system requires the exploration of a large number of combinations, which is too time consuming to be computed online. Thus, we propose an integrated heuristic methodology, which executes an optimization procedure in a low computation time. This scheme allows the scheduler to handle power-aware real-time tasks with low cost while maximizing the use of the available resources and without jeopardizing the temporal constraints of the system. Simulation results show that our heuristic methodology is able to generate power-aware scheduling solutions with near-optimal performance.

Journal ArticleDOI
TL;DR: This work shows that combining the string-matching methodology, hashing engine supported by most network processors, and characteristics of current Snort signatures frequently improves performance and reduces number of memory accesses compared to conventional string- matching algorithms.
Abstract: Network intrusion detection systems (NIDSs) are one of the latest developments in security. The matching of packet strings against collected signatures dominates signature-based NIDS performance. Network processors are also one of the fastest growing segments of the semiconductor market, because they are designed to provide scalable and flexible solutions that can accommodate change quickly and economically. This work presents a fast string-matching algorithm (called FNP) over the network processor platform that conducts matching sets of patterns in parallel. This design also supports numerous practical features such as case-sensitive string matching, signature prioritization, and multiple-content signatures. This efficient multiple-pattern matching algorithm utilizes the hardware facilities provided by typical network processors instead of employing the external lookup co-processors. To verify the efficiency and practicability of the proposed algorithm, it was implemented on the Vitesse IQ2000 network processor platform. The searching patterns used in the present experiments are derived from the well-known Snort ruleset cited by most open-source and commercial NIDSs. This work shows that combining our string-matching methodology, hashing engine supported by most network processors, and characteristics of current Snort signatures frequently improves performance and reduces number of memory accesses compared to conventional string-matching algorithms. Another contribution of this work is to highlight that, besides total number of searching patterns, shortest pattern length is also a major influence on NIDS multipattern matching algorithm performance.

Journal ArticleDOI
TL;DR: Results demonstrate the benefits of the approach (achieving similar performance to a static configuration solution but using half of the resources) and the hardware configuration prefetch unit is useful in applications with low level of parallelism.
Abstract: Dynamic scheduling for system-on-chip (SoC) platforms has become an important field of research due to the emerging range of applications with dynamic behavior (e.g., MPEG-4). Dynamically reconfigurable architectures are an interesting solution for this type of applications. Scheduling for dynamically reconfigurable architectures might be classified in two major broad categories: (1) static scheduling techniques or (2) use of an operating system (OS) for reconfigurable computing. However, research efforts demonstrate a trend to move tasks traditionally assigned to the OS into hardware (thus increasing performance and reducing power).In this paper, we introduce a methodology for dynamically reconfigurable architectures. The dynamic scheduling of tasks to several reconfigurable units is performed by a hardware-based multitasking support unit. Two different versions of the microarchitecture are possible (with or without a hardware configuration prefetch unit). The dynamic scheduling algorithms are also explained. Both algorithms try to minimize the reconfiguration overhead by overlapping the execution of tasks with device reconfigurations.An exhaustive study (using the developed simulation and performance analysis framework) of this novel proposal is presented, and the effect of the microarchitecture parameters has been studied. Results demonstrate the benefits of our approach (achieving similar performance to a static configuration solution but using half of the resources). The hardware configuration prefetch unit is useful (i.e., minimize the execution time) in applications with low level of parallelism.

Journal ArticleDOI
TL;DR: This article presents an approach for obtaining the expected deadline miss ratio of the tasks and of the task graphs in an analytic way, and aims to keep the analysis cost low, in terms of required analysis time and memory, while considering as general classes of target application models as possible.
Abstract: In the past decade, the limitations of models considering fixed (worst-case) task execution times have been acknowledged for large application classes within soft real-time systems. A more realistic model considers the tasks having varying execution times with given probability distributions. Considering such a model with specified task execution time probability distribution functions, an important performance indicator of the system is the expected deadline miss ratio of the tasks and of the task graphs. This article presents an approach for obtaining this indicator in an analytic way.Our goal is to keep the analysis cost low, in terms of required analysis time and memory, while considering as general classes of target application models as possible. The following main assumptions have been made on the applications that are modeled as sets of task graphs: the tasks are periodic, the task execution times have given generalized probability distribution functions, the task execution deadlines are given and arbitrary, the scheduling policy can belong to practically any class of non-preemptive scheduling policies, and a designer supplied maximum number of concurrent instantiations of the same task graph is tolerated in the system.Experiments show the efficiency of the proposed technique for monoprocessor systems.

Journal ArticleDOI
TL;DR: An adaptive checkpointing scheme in which the checkpointing interval for a task is dynamically adjusted during execution, and checkpoints are inserted based not only on the available slack, but also on the occurrences of faults is presented.
Abstract: Safety-critical embedded systems often operate in harsh environmental conditions that necessitate fault-tolerant computing techniques. In addition, many safety-critical systems execute real-time applications that require strict adherence to task deadlines. These embedded systems are also energy-constrained, since system lifetime is determined largely by the battery lifetime. In this paper, we investigate dynamic adaptation techniques based on checkpointing and dynamic voltage scaling (DVS) for fault tolerance and power management. We first present schedulability tests that provide the criteria under which checkpointing can provide fault tolerance and real-time guarantees. We then present an adaptive checkpointing scheme in which the checkpointing interval for a task is dynamically adjusted during execution, and checkpoints are inserted based not only on the available slack, but also on the occurrences of faults. Next, we combine adaptive checkpointing with DVS to achieve power reduction. Finally, we develop an adaptive checkpointing scheme for a set of multiple tasks in real-time systems. An offline preprocessing based on linear programming is used to determine the parameters that are provided as inputs to the online adaptive checkpointing procedure. Simulation results show that compared to previous methods, the proposed adaptive checkpointing approach increases the likelihood of timely task completion in the presence of faults. When combined with DVS, adaptive checkpointing also leads to considerable energy savings.

Journal ArticleDOI
TL;DR: This paper presents an efficient optimal algorithm for minimizing the run-time reconfiguration (context switching) delay of executing an application on a dynamically adaptable system, and proves the optimality and the efficiency of the algorithm.
Abstract: Reconfiguration delay is one of the major barriers in the way of dynamically adapting a system to its application requirements. The run-time reconfiguration delay is quite comparable to the application latency for many classes of applications and might even dominate the application run-time. In this paper, we present an efficient optimal algorithm for minimizing the run-time reconfiguration (context switching) delay of executing an application on a dynamically adaptable system. The system is composed of a number of cameras with embedded reconfigurable resources collaborating in order to track an object. The operations required to execute in order to track the object are revealed to the system at run-time and can change according to a number of parameters, such as the target shape and proximity. Similarly, we can assume that the applications comprising tasks are already scheduled and each of them has to be realized on the reconfigurable fabric in order to be executed.The modeling and the algorithm are both applicable to partially reconfigurable platforms as well as multi-FPGA systems. The algorithm can be directly applied to minimize the application run-time for the typical classes of applications, where the actual execution delay of the basic operations is negligible compared to the reconfiguration delay. We prove the optimality and the efficiency of our algorithm. We report the experimental results, which demonstrate a 2.5--40p improvement on the total run-time reconfiguration delay as compared to other heuristics.

Journal ArticleDOI
TL;DR: Two digit-serial architectures for normal basis multipliers over (GF(2m)) have the same gate count and gate delay and an algorithm that can considerably reduce the redundancy is developed.
Abstract: In this article, two digit-serial architectures for normal basis multipliers over (GF(2m)) are presented. These two structures have the same gate count and gate delay. We also consider two special cases of optimal normal bases for the two digit-serial architectures. A straightforward implementation leaves gate redundancy in both of them. An algorithm that can considerably reduce the redundancy is also developed. The proposed architectures are compared with the existing ones in terms of gate and time complexities.

Journal ArticleDOI
TL;DR: This paper uses a recently proposed radio power management technique, dynamic modulation scaling (DMS), as a control knob to enable energy-latency trade-offs during wireless packet transmission, and presents E2WFQ, an energy efficient version of the weighted fair queuing algorithm for fair packet scheduling.
Abstract: As embedded systems are being networked, often wirelessly, an increasingly larger share of their total energy budget is due to the communication. This necessitates the development of power management techniques that address communication subsystems, such as radios, as opposed to computation subsystems, such as embedded processors, to which most of the research effort thus far has been devoted. In this paper, we present techniques for energy efficient packet scheduling and fair queuing in wireless communication systems. Our techniques are based on an extensive slack management approach that dynamically adapts the output rate of the system in accordance with the input packet arrival rate. We use a recently proposed radio power management technique, dynamic modulation scaling (DMS), as a control knob to enable energy-latency trade-offs during wireless packet transmission. We first analyze a single input stream scenario, and describe a rate adaptation technique that results in significantly lower energy consumption (reductions of up to 10 ×), while still bounding the resulting packet delays. By appropriately setting the various parameters of our algorithm, the system can be made to traverse the energy-latency-fidelity trade-off space. We extend our techniques to a multiple input stream scenario, and present E2WFQ, an energy efficient version of the weighted fair queuing (WFQ) algorithm for fair packet scheduling. Simulation results show that large energy savings can be obtained through the use of E2WFQ, with only a small, bounded increase in worst case packet latency. Further, our results demonstrate that E2WFQ does not adversely affect the throughput allocation (and hence, fairness) of WFQ.

Journal ArticleDOI
TL;DR: A simple dynamic voltage scheduling technique, which suits multimedia applications well and, in case of soft real-time applications, allows all idle intervals of the processor to be fully exploited by using buffers, to achieve significant power reduction in real-world multimedia applications.
Abstract: Power-efficient design of multimedia applications becomes more important as they are used increasingly in many embedded systems. We propose a simple dynamic voltage scheduling (DVS) technique, which suits multimedia applications well and, in case of soft real-time applications, allows all idle intervals of the processor to be fully exploited by using buffers. Our main theme is to determine the minimum buffer size to maximize energy saving in three cases: (i) single task, (ii) multiple subtask, and (iii) multitask. We also present a technique of adjusting task deadlines for further reducing energy consumption in the multiple-subtask and multitask cases. Unlike other DVS techniques using buffers, we guarantee to meet the real-time latency constraint. Experimental results show that the proposed technique does indeed achieve significant power reduction in real-world multimedia applications.

Journal ArticleDOI
TL;DR: This contribution appears to be the first thorough comparison of two public-key families, namely elliptic curve (ECC) and hyperelliptic curve cryptosystems on a wide range of embedded processor types (ARM, ColdFire, PowerPC).
Abstract: It is widely recognized that data security will play a central role in future IT systems. Providing public-key cryptographic primitives, which are the core tools for security, is often difficult on embedded processor due to computational, memory, and power constraints. This contribution appears to be the first thorough comparison of two public-key families, namely elliptic curve (ECC) and hyperelliptic curve cryptosystems on a wide range of embedded processor types (ARM, ColdFire, PowerPC). We investigated the influence of the processor type, resources, and architecture regarding throughput. Further, we improved previously known HECC algorithms resulting in a more efficient arithmetic.

Journal ArticleDOI
TL;DR: Computer-aided design tools are presented, which complement conventional FPGA design environments to enable the specification, simulation, synthesis, automatic placement and routing, partial configuration generation and control of partially reconfigurable designs.
Abstract: This paper presents a top-down designer-driven design flow for creating hardware that exploits partial run-time reconfiguration. Computer-aided design (CAD) tools are presented, which complement conventional FPGA design environments to enable the specification, simulation (both functional and timing), synthesis, automatic placement and routing, partial configuration generation and control of partially reconfigurable designs. Collectively these tools constitute the dynamic circuit switching CAD framework. A partially reconfigurable Viterbi decoder design is presented to demonstrate the design flow and illustrate possible power consumption reductions and performance improvements through the exploitation of partial reconfiguration.

Journal ArticleDOI
TL;DR: Simulation results show that the proposed voltage scheduling algorithms dramatically reduce processor energy consumption over non-power-aware scheduling algorithms and consistently outperform the static speed scheme in a wide range of system and workload conditions.
Abstract: As mobile computing is getting popular, there is a growing need for techniques that minimize energy consumption on battery-powered mobile devices. Processor voltage scheduling can effectively reduce processor energy consumption by lowering the processor speed. In this paper, we study voltage scheduling for real-time periodic tasks with non-preemptible sections. Three schemes are proposed: The static speed algorithm derives the minimum static feasible speed based on the stack resource policy. Due to blocking, this static speed is usually higher than the speed required for scheduling fully preemptible tasks (called the utilization speed). Two dynamic speed algorithms are then introduced to further reduce energy consumption. The novel dual speed algorithm operates the processor at the utilization speed whenever possible and switches to the higher static speed only when blocking occurs. The dual speed dynamic reclaiming algorithm reserves time budget for each job, reclaims the unused time budget from completed jobs and redistributes it to subsequent jobs so they can run at a lower speed whenever possible. Feasibility conditions for real-time task sets have been derived and proved mathematically. Simulation results show that the proposed voltage scheduling algorithms dramatically reduce processor energy consumption over non-power-aware scheduling algorithms. Furthermore, the two dynamic speed algorithms consistently outperform the static speed scheme in a wide range of system and workload conditions.

Journal ArticleDOI
TL;DR: This paper presents a relatively simple processor with a dynamically reconfigurable datapath acting as an accelerating coprocessor, and presents a methodology for the design of these solutions and illustrates it with two complete case studies: an MPEG2 coder and a GSM coder to show how significant speedups can be obtained using relatively little hardware.
Abstract: Increasing nonrecurring engineering and mask costs are making it harder to turn to hardwired application specific integrated circuit (ASIC) solutions for high-performance applications. The volume required to amortize these high costs has been increasing, making it increasingly expensive to afford ASIC solutions for medium-volume products. This has led to designers seeking programmable solutions of varying sorts using these so-called programmable platforms. These programmable platforms span a large range from bit-level programmable field programmable gate arrays to word-level programmable application-specific, and in some cases even general-purpose processors. The programmability comes with a power and performance overhead. Attempts to reduce this overhead typically involve making some core hardwired ASIC like logic blocks accessible to the programmable elements. This paper presents one such hybrid solution in this space---a relatively simple processor with a dynamically reconfigurable datapath acting as an accelerating coprocessor. This datapath consists of hardwired function units and reconfigurable interconnect. We present a methodology for the design of these solutions and illustrate it with two complete case studies: an MPEG2 coder, and a GSM coder, to show how significant speedups can be obtained using relatively little hardware. This work is part of the MESCAL project, which is geared towards developing design environments for the development of application-specific platforms.

Journal ArticleDOI
TL;DR: This work deals with errors caused by such faults in polynomial basis multipliers over finite fields of characteristic two and presents a scheme to correct single errors and both bit-parallel and bit-serial fault tolerant multipliers are proposed.
Abstract: Cryptographic schemes, such as authentication, confidentiality, and integrity, rely on computations in very large finite fields, whose hardware realization may require millions of logic gates. In a straightforward design, even a single fault in such a complex circuit is likely to yield an incorrect result and may be exploited by an attacker to break the cryptosystem. In this regard, we consider computing over finite fields in presence of certain faults in multiplier circuits. Our work reported here deals with errors caused by such faults in polynomial basis multipliers over finite fields of characteristic two and presents a scheme to correct single errors. Towards this, pertinent theoretical results are derived, and both bit-parallel and bit-serial fault tolerant multipliers are proposed.

Journal ArticleDOI
TL;DR: An efficient heuristic is presented that identifies optimized supply voltages by not only "simply" exploiting slack time, but under the additional consideration of the power profiles, effectively minimizes the energy dissipation of heterogeneous architectures, including power-managed processing elements, effectively.
Abstract: We present an iterative schedule optimization for multirate system specifications, mapped onto heterogeneous distributed architectures containing dynamic voltage scalable processing elements (DVS-PEs). To achieve a high degree of energy reduction, we formulate a generalized DVS problem, taking into account the power variations among the executing tasks. An efficient heuristic is presented that identifies optimized supply voltages by not only "simply" exploiting slack time, but under the additional consideration of the power profiles. Thereby, this algorithm minimizes the energy dissipation of heterogeneous architectures, including power-managed processing elements, effectively. Further, we address the simultaneous schedule optimization toward timing behavior and DVS utilization by integrating the proposed DVS heuristic into a genetic list scheduling approach. We investigate and analyze the possible energy reduction at both steps of the co-synthesis (voltage scaling and scheduling), including the power variations effects. Extensive experiments indicate that the presented work produces solutions with high quality.

Journal ArticleDOI
TL;DR: A novel method of modeling and evaluating RTR-reduced HW--SW partitions is presented, and the techniques of computation-reconfiguration overlap and the retention of circuitry between reconfigurations are used within this model to explore the possibilities of RTR reduction.
Abstract: The hardware--software (HW--SW) partitioning of applications to dynamically reconfigurable embedded systems allows for customization of their hardware resources during run-time to meet the demands of executing applications. The run-time reconfiguration (RTR) of such systems can have an impact on the HW--SW partitioning strategy and the system performance. It is therefore important to consider approaches to optimally reduce the RTR overhead during the HW--SW partitioning stage. In order to examine potential benefits in performance, it is necessary to develop a method to model and evaluate the RTR. In this paper, a novel method of modeling and evaluating such RTR-reduced HW--SW partitions is presented. The techniques of computation-reconfiguration overlap and the retention of circuitry between reconfigurations are used within this model to explore the possibilities of RTR reduction. The integration of this model into the authors' current genetic-algorithm-driven HW--SW partitioner is also presented, with two applications used to illustrate the benefits of RTR-reduced exploration during HW--SW partitioning.

Journal ArticleDOI
TL;DR: This paper addresses automatic validation of processor, memory, and coprocessor pipelines described in an ADL by presenting a graph-based modeling that captures both structure and behavior of the architecture.
Abstract: Verification is one of the most complex and expensive tasks in the current Systems-on-Chip design process. Many existing approaches employ a bottom-up approach to pipeline validation, where the functionality of an existing pipelined processor is, in essence, reverse-engineered from its RT-level implementation. Our validation technique is complementary to these bottom-up approaches. Our approach leverages the system architect's knowledge about the behavior of the pipelined architecture, through architecture description language (ADL) constructs, and thus allows a powerful top-down approach to pipeline validation. The most important requirement in top-down validation process is to ensure that the specification (reference model) is golden. This paper addresses automatic validation of processor, memory, and coprocessor pipelines described in an ADL. We present a graph-based modeling that captures both structure and behavior of the architecture. Based on this model, we present algorithms to ensure that the static behavior of the pipeline is well formed by analyzing the structural aspects of the specification. We applied our methodology to verify specification of several realistic architectures from different architectural domains to demonstrate the usefulness of our approach.

Journal ArticleDOI
TL;DR: A step towards code-size-aware compilation is presented, phrase register allocation and code generation as an integer linear programming problem where the upper bound on the code size can simply be expressed as an additional constraint.
Abstract: Most compilers ignore the problems of limited code space in embedded systems. Designers of embedded software often have no better alternative than to manually reduce the size of the source code or even the compiled code. Besides being tedious and error prone, such optimization results in obfuscated code that is difficult to maintain and reuse. In this paper, we present a step towards code-size-aware compilation. We phrase register allocation and code generation as an integer linear programming problem where the upper bound on the code size can simply be expressed as an additional constraint. The resulting compiler, when applied to six commercial microcontroller programs, generates code nearly as compact as carefully crafted code.

Journal ArticleDOI
TL;DR: For the first time, differential power analysis results on a VLIW using real power measurements are presented and show that the processor instruction level parallelism and large bus size contribute in making differential powerAnalysis attacks extremely difficult.
Abstract: Embedded wireless devices require secure high-performance cryptography in addition to low-cost and low-energy dissipation. This paper presents for the first time a design methodology for security on a VLIW complex DSP-embedded processor core. Elliptic curve cryptography is used to demonstrate the design for security methodology. Results are verified with real dynamic power measurements and show that compared to previous research a 79p improvement in performance is achieved. Modification of power traces are performed to resist simple power analysis attack with up to 39p overhead in performance, up to 49p overheads in energy dissipation, and up to 11p overhead in code size. Simple power analysis on the VLIW DSP core is shown to be more correlated to routine ordering than individual instructions. For the first time, differential power analysis results on a VLIW using real power measurements are presented. Results show that the processor instruction level parallelism and large bus size contribute in making differential power analysis attacks extremely difficult. This research is important for industry since efficient yet secure cryptography is crucial for wireless communication devices.