
Showing papers in "ACM Transactions on Embedded Computing Systems in 2010"


Journal ArticleDOI
TL;DR: MEDiSN is a wireless sensor network for monitoring patients' physiological data in hospitals and during disaster events that can scale from tens to at least five hundred PMs, effectively protect application packets from congestive and corruptive losses, and deliver medically actionable data.
Abstract: Staff shortages and an increasingly aging population are straining the ability of emergency departments to provide high-quality care. At the same time, there is a growing concern about hospitals' ability to provide effective care during disaster events. For these reasons, tools that automate patient monitoring have the potential to greatly improve efficiency and quality of health care. Towards this goal, we have developed MEDiSN, a wireless sensor network for monitoring patients' physiological data in hospitals and during disaster events. MEDiSN comprises Physiological Monitors (PMs), which are custom-built, patient-worn motes that sample, encrypt, and sign physiological data, and Relay Points (RPs) that self-organize into a multi-hop wireless backbone for carrying physiological data. Moreover, MEDiSN includes a back-end server that persistently stores medical data and presents them to authenticated GUI clients. The combination of MEDiSN's two-tier architecture and optimized rate control protocols allows it to address the compound challenge of reliably delivering large volumes of data while meeting the application's QoS requirements. Results from extensive simulations, testbed experiments, and multiple pilot hospital deployments show that MEDiSN can scale from tens to at least five hundred PMs, effectively protect application packets from congestive and corruptive losses, and deliver medically actionable data.

236 citations


Journal ArticleDOI
TL;DR: In this article, the state space is built using a tailored simulator, which abstracts from time, handles nondeterminism, and creates an overapproximation of the behavior shown by the real microcontroller.
Abstract: The interest of industries in model checking software for microcontrollers is increasing. However, there are currently no appropriate tools that can be applied by embedded systems developers for the direct verification of software for microcontrollers without the need for manual modeling. This article describes a new approach to model checking software for microcontrollers, which verifies the assembly code of the software. The state space is built using a tailored simulator, which abstracts from time, handles nondeterminism, and creates an overapproximation of the behavior shown by the real microcontroller. Within this simulator, we apply abstraction techniques to tackle the state-explosion problem. In our approach, we combine different formal methods, namely, model checking, static analysis, and abstract interpretation. We also combine explicit and symbolic model checking techniques. This article presents a case study using several programs to demonstrate the efficiency of the applied abstraction techniques and to show the applicability of this approach.

84 citations
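
A toy illustration of the explicit-state idea sketched in the abstract (not the authors' tool, and far simpler than real assembly-level model checking): a nondeterministic input is handled by enumerating all of its values, which is the kind of over-approximation that makes the analysis conservative. The state layout and the three-instruction "program" are invented for this example.

    /* Toy explicit-state reachability over an abstract 8-bit machine state.
     * Nondeterministic input is handled by branching over all possible values. */
    #include <stdio.h>
    #include <stdbool.h>
    #include <string.h>

    typedef struct { unsigned char pc, acc; } State;   /* abstract machine state */

    #define MAX_STATES 65536
    static State visited[MAX_STATES];
    static int nvisited = 0;

    static bool seen(State s) {
        for (int i = 0; i < nvisited; i++)
            if (memcmp(&visited[i], &s, sizeof s) == 0) return true;
        return false;
    }

    /* Successor relation of a 3-instruction program:
     * pc 0: acc = IN (nondeterministic)   pc 1: acc++   pc 2: halt */
    static int successors(State s, State out[256]) {
        int n = 0;
        switch (s.pc) {
        case 0: for (int v = 0; v < 256; v++) out[n++] = (State){1, (unsigned char)v}; break;
        case 1: out[n++] = (State){2, (unsigned char)(s.acc + 1)}; break;
        default: break;                        /* halted: no successors */
        }
        return n;
    }

    int main(void) {
        State queue[MAX_STATES]; int head = 0, tail = 0;
        queue[tail++] = (State){0, 0};
        while (head < tail) {
            State s = queue[head++];
            if (seen(s)) continue;
            visited[nvisited++] = s;
            if (s.pc == 2 && s.acc == 0)       /* "bad" state: acc wrapped around */
                printf("property violated at pc=%u acc=%u\n", s.pc, s.acc);
            State succ[256];
            int n = successors(s, succ);
            for (int i = 0; i < n; i++) if (tail < MAX_STATES) queue[tail++] = succ[i];
        }
        printf("explored %d abstract states\n", nvisited);
        return 0;
    }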


Journal ArticleDOI
TL;DR: This work proposes a novel superblock-based FTL scheme that combines a set of adjacent logical blocks into a superblock, offering the flexibility of fine-grain address translation while reducing the memory overhead to the level of coarse-grain address translation.
Abstract: In NAND flash-based storage systems, an intermediate software layer called a Flash Translation Layer (FTL) is usually employed to hide the erase-before-write characteristics of NAND flash memory. We propose a novel superblock-based FTL scheme, which combines a set of adjacent logical blocks into a superblock. In the proposed Superblock FTL, superblocks are mapped at coarse granularity, while pages inside the superblock are mapped freely at fine granularity to any location in several physical blocks. To reduce extra storage and flash memory operations, the fine-grain mapping information is stored in the spare area of NAND flash memory. This hybrid address translation scheme has the flexibility provided by fine-grain address translation, while reducing the memory overhead to the level of coarse-grain address translation. Our experimental results show that the proposed FTL scheme significantly outperforms previous block-mapped FTL schemes with roughly the same memory overhead.

81 citations
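
To make the hybrid mapping concrete, here is a minimal lookup sketch in the spirit of the scheme above; it is not the authors' code, the sizes are arbitrary, and the fine-grain map is kept in RAM here, whereas the paper stores it in the NAND spare area.

    /* Illustrative hybrid address translation: coarse superblock map plus a
     * fine per-page map. Sizes and structures are assumptions of this sketch. */
    #include <stdint.h>
    #include <stdio.h>

    #define PAGES_PER_BLOCK      64
    #define BLOCKS_PER_SUPERBLK   4             /* adjacent logical blocks grouped */
    #define PAGES_PER_SUPERBLK   (PAGES_PER_BLOCK * BLOCKS_PER_SUPERBLK)
    #define NUM_SUPERBLOCKS      16

    typedef struct {
        uint32_t phys_block[BLOCKS_PER_SUPERBLK];   /* coarse: physical blocks owned */
        uint16_t fine_map[PAGES_PER_SUPERBLK];      /* fine: logical page -> slot    */
    } Superblock;

    static Superblock sb_table[NUM_SUPERBLOCKS];

    /* Translate a logical page number to a (physical block, page offset) pair. */
    static void translate(uint32_t lpn, uint32_t *pblock, uint32_t *poff) {
        uint32_t sb   = lpn / PAGES_PER_SUPERBLK;   /* coarse-grain step */
        uint32_t off  = lpn % PAGES_PER_SUPERBLK;
        uint16_t slot = sb_table[sb].fine_map[off]; /* fine-grain step   */
        *pblock = sb_table[sb].phys_block[slot / PAGES_PER_BLOCK];
        *poff   = slot % PAGES_PER_BLOCK;
    }

    int main(void) {
        /* Fake a mapping: superblock 0 owns physical blocks 100..103, and logical
         * page 5 currently lives in slot 130 (block 102, offset 2). */
        for (int i = 0; i < BLOCKS_PER_SUPERBLK; i++) sb_table[0].phys_block[i] = 100 + i;
        sb_table[0].fine_map[5] = 130;

        uint32_t pb, po;
        translate(5, &pb, &po);
        printf("logical page 5 -> physical block %u, page %u\n", pb, po);
        return 0;
    }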


Journal ArticleDOI
TL;DR: In this article, three different arbitration policies are presented, evaluated, and compared with respect to their real-time and average-case performance: a fixed priority, a fair-based, and a time-sliced arbiter.
Abstract: Chip-multiprocessors are an emerging trend for embedded systems. In this article, we introduce a real-time Java multiprocessor called JopCMP. It is a symmetric shared-memory multiprocessor, and consists of up to eight Java Optimized Processor (JOP) cores, an arbitration control device, and a shared memory. All components are interconnected via a system-on-chip bus. The arbiter synchronizes the access of multiple CPUs to the shared main memory. In this article, three different arbitration policies are presented, evaluated, and compared with respect to their real-time and average-case performance: a fixed priority, a fair-based, and a time-sliced arbiter. Tasks running on different CPUs of a chip-multiprocessor (CMP) influence each other's execution times when accessing a shared memory. Therefore, the system needs an arbiter that is able to limit the worst-case execution time of a task running on a CPU, even though tasks executing simultaneously on other CPUs access the main memory. Our research shows that timing analysis is in fact possible for homogeneous multiprocessor systems with a shared memory. The timing analysis of tasks, executing on the CMP using time-sliced memory arbitration, leads to viable worst-case execution time bounds. The time-sliced arbiter divides the memory access time into equal time slots, one time slot for each CPU. This memory arbitration scheme allows for a calculation of upper bounds of Java application worst-case execution times, depending on the number of CPUs, the time slot size, and the memory access time. Examples of worst-case execution time calculation are presented, and the analyzed results of a real-world application task are compared to measured execution time results. Finally, we evaluate the tradeoffs when using a time-predictable solution compared to using average-case optimized chip-multiprocessors, applying three different benchmarks. These experiments are carried out by executing the programs on the CMP prototype.

78 citations
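
The dependence of worst-case memory latency on the number of CPUs, the slot size, and the access time can be illustrated with a simple conservative bound. This is an illustration only, not the paper's exact analysis: a request that just misses its own slot waits out the rest of that slot plus every other CPU's slot before it is served.

    /* Illustrative conservative bound under a time-sliced (TDMA) arbiter.
     * All quantities are in clock cycles; assumes the access fits in one slot. */
    #include <stdio.h>

    static unsigned wcet_mem_access(unsigned n_cpus, unsigned slot, unsigned access)
    {
        unsigned wait = (slot - 1)              /* remainder of the current slot */
                      + (n_cpus - 1) * slot;    /* every other CPU's slot        */
        return wait + access;                   /* then the access itself        */
    }

    int main(void) {
        /* Example: 4 CPUs, 6-cycle slots, 4-cycle memory access. */
        printf("worst-case access latency: %u cycles\n", wcet_mem_access(4, 6, 4));
        return 0;
    }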


Journal ArticleDOI
TL;DR: This article presents Microsearch, a search system suitable for embedded devices used in ubiquitous computing environments; it adopts Information Retrieval techniques for query resolution and proposes a new space-efficient top-k query resolution algorithm.
Abstract: In this article, we present Microsearch, a search system suitable for embedded devices used in ubiquitous computing environments. Akin to a desktop search engine, Microsearch indexes the information inside a small device, and accurately resolves a user's queries. Given the limited hardware, conventional search engine design and algorithms cannot be used. We adopt Information Retrieval (IR) techniques for query resolution, and propose a new space-efficient top-k query resolution algorithm. A theoretical model of Microsearch is given to better understand the trade-offs in design parameters. Evaluation is done via actual implementation on off-the-shelf hardware.

71 citations
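
The space-efficient top-k idea can be illustrated generically (this is not Microsearch's algorithm): keeping only the k best-scoring entries in a fixed-size min-heap bounds memory at O(k) no matter how many index entries are scored. Scores and document ids below are made up.

    /* Generic bounded-memory top-k selection with a fixed-size min-heap. */
    #include <stdio.h>

    #define K 3

    typedef struct { int doc; float score; } Hit;

    static Hit heap[K];
    static int heap_len = 0;

    static void sift(int i) {                    /* restore min-heap order downward */
        for (;;) {
            int l = 2*i + 1, r = 2*i + 2, m = i;
            if (l < heap_len && heap[l].score < heap[m].score) m = l;
            if (r < heap_len && heap[r].score < heap[m].score) m = r;
            if (m == i) break;
            Hit t = heap[i]; heap[i] = heap[m]; heap[m] = t;
            i = m;
        }
    }

    static void offer(int doc, float score) {
        if (heap_len < K) {                      /* heap not full: insert, bubble up */
            int i = heap_len++;
            heap[i] = (Hit){doc, score};
            while (i > 0 && heap[(i-1)/2].score > heap[i].score) {
                Hit t = heap[i]; heap[i] = heap[(i-1)/2]; heap[(i-1)/2] = t;
                i = (i-1)/2;
            }
        } else if (score > heap[0].score) {      /* better than current k-th best */
            heap[0] = (Hit){doc, score};
            sift(0);
        }
    }

    int main(void) {
        float scores[] = {0.2f, 0.9f, 0.4f, 0.7f, 0.1f, 0.8f};
        for (int d = 0; d < 6; d++) offer(d, scores[d]);
        for (int i = 0; i < heap_len; i++)
            printf("doc %d score %.1f\n", heap[i].doc, heap[i].score);
        return 0;
    }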


Journal ArticleDOI
TL;DR: This special issue presents some significant research work on the use of Java in embedded real-time systems, based on innovative ideas presented and discussed during the 6th International Workshop on Java Technologies for Real-Time and Embedded Systems (JTRES) in September 2008.
Abstract: Real-time and embedded applications cover an extremely wide variety of domains, each with stringent requirements. This breadth of domains results in a deep diversity of requirements, including highly precise timing characteristics, small memory footprints, flexible sensor and actuator interfaces, and robust safety characteristics. Many embedded and real-time applications can be categorized as mission-critical or safety-critical, in which critical human infrastructures (e.g., automobile braking) and even human life (e.g., aircraft flight control) are sometimes at stake. Processing in such application domains often involves distributed processing in which communication and synchronization and their resulting integration present additional challenges. A major part of the cost of creating such real-time and embedded applications is spent in the often complex software development, integration, verification, and validation of the resulting systems. Therefore, it is essential that the production of real-time embedded systems exploit languages, tools, and methods that simultaneously enable high software productivity with highly robust structures and resource management. In mainstream IT applications, the Java programming language has become an attractive choice because of its safety, productivity, relatively low maintenance costs, as well as the wide availability of well-trained developers. However, although it has excellent software engineering characteristics, Java has often been deemed unsuitable for developing real-time embedded systems, mainly due to its under-specification of thread scheduling and the presence of garbage collection. To address these problems, some significant extensions to Java have been introduced, such as the Real-Time Specification for Java (RTSJ) developed by an Expert Group within the Java Community Process. In addition, there has been rapid progress in processor and sensor technologies in both single-node and distributed applications. This progress, combined with the expanding diversity of application domains, is placing enormous demands on the facilities that the Java run-time environment must provide. This special issue presents some significant research work on the use of Java in embedded real-time systems. The papers presented here are based on innovative ideas presented and discussed during the 6th International Workshop on Java Technologies for Real-Time and Embedded Systems (JTRES) in September 2008. Selected papers from the workshop were invited to be expanded for this special issue in addition to an open call for papers soliciting novel contributions on the topics of the workshop. This special issue is divided into three

54 citations


Journal ArticleDOI
TL;DR: It will be shown how it is possible to formalize and constrain mobility characteristics by combining, and in some cases extending, several formal methods to achieve a higher degree of dependability.
Abstract: Wireless and pervasive healthcare applications typically present critical requirements from the point of view of functional correctness, reliability, availability, security, and safety. In contrast to the case of classic safety-critical applications, the behavior of wireless and pervasive applications is affected by the movements and location of users and resources. This article presents a methodology to formally express requirements in safety-critical wireless and pervasive healthcare applications in order to achieve a higher degree of dependability. In particular, it will be shown how it is possible to formalize and constrain mobility characteristics by combining, and in some cases extending, several formal methods. The article also describes a rigorous specification process. Finally, it concludes with a case study of a real safety-critical pervasive healthcare application that is going to be deployed in a city hospital.

49 citations


Journal ArticleDOI
TL;DR: This article proposes MMSN, which takes advantage of multifrequency availability while, at the same time, taking into consideration the restrictions of wireless sensor networks, and exhibits the prominent ability to utilize parallel transmissions among neighboring nodes.
Abstract: Multifrequency media access control has been well understood in general wireless ad hoc networks, while in wireless sensor networks, researchers still focus on single-frequency solutions. In wireless sensor networks, each device is typically equipped with a single radio transceiver and applications adopt much smaller packet sizes compared to those in general wireless ad hoc networks. Hence, the multifrequency MAC protocols proposed for general wireless ad hoc networks are not suitable for wireless sensor network applications, which we further demonstrate through our simulation experiments. In this article, we propose MMSN, which takes advantage of multifrequency availability while, at the same time, taking into consideration the restrictions of wireless sensor networks. In MMSN, four frequency assignment options are provided to meet different application requirements. A scalable media access is designed with efficient broadcast support. Also, an optimal nonuniform back-off algorithm is derived and its lightweight approximation is implemented in MMSN, which significantly reduces congestion in the time synchronized media access design. Through extensive experiments, MMSN exhibits the prominent ability to utilize parallel transmissions among neighboring nodes. When multiple physical frequencies are available, it also achieves increased energy efficiency, demonstrating the ability to work against radio interference and the tolerance to a wide range of measured time synchronization errors.

48 citations


Journal ArticleDOI
TL;DR: Applying the design approach to a Network-on-Chip (NoC) platform demonstrates the design process and the benefits of using the novel approach, including function partitioning of agents and hierarchical monitoring operations on parallel SoCs.
Abstract: A hierarchical agent framework is proposed to construct a monitoring layer towards self-aware parallel systems-on-chip (SoCs). With monitoring services as a new design dimension, systems are capable of observing and reconfiguring themselves dynamically at all levels of granularity, based on application requirements and platform conditions. Agents with hierarchical priorities work adaptively and cooperatively to maintain and improve system performance in the presence of variations and faults. Function partitioning of agents and hierarchical monitoring operations on parallel SoCs are analyzed. Applying the design approach to the Network-on-Chip (NoC) platform demonstrates the design process and the benefits of using the novel approach.

44 citations


Journal ArticleDOI
TL;DR: This work proposes a technique which leverages configurable data caches to address the problem of energy inefficiency and intertask interference in multitasking embedded systems, and introduces a profile-based, off-line algorithm, which identifies a beneficial cache partitioning.
Abstract: We propose a technique that leverages configurable data caches to address the problem of energy inefficiency and intertask interference in multitasking embedded systems. Data caches are often necessary to provide the required memory bandwidth. However, caches introduce two important problems for embedded systems. Caches contribute to a significant amount of power as they typically occupy a large part of the chip and are accessed frequently. In nanometer technologies, such large structures contribute significantly to the total leakage power as well. Additionally, cache outcomes in multitasking environments are notoriously difficult to predict, if not impossible, thus resulting in poor real-time guarantees. We study the effect of multiprogramming workloads on the data cache in a preemptive multitasking environment, and propose a technique which leverages configurable cache architectures to not only eliminate intertask cache interference, but also to significantly reduce both dynamic and leakage power. By mapping tasks to different cache partitions, interference is completely eliminated. Dynamic and leakage power are significantly reduced as only a subset of the cache is active at any moment. We introduce a profile-based, off-line algorithm, which identifies a beneficial cache partitioning. The OS configures the data cache during context-switch by activating the corresponding partition. Our experiments on a large set of multitasking benchmarks demonstrate that our technique not only efficiently eliminates intertask interference, but also significantly reduces both dynamic and leakage power.

40 citations
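
A rough sketch of the overall flow, with invented numbers and no real cache-control interface: an off-line step carves the cache ways into disjoint per-task partitions sized from profiles, and a context-switch hook activates only the running task's partition so the remaining ways can stay powered down.

    /* Illustrative partitioning flow; not the authors' algorithm or a real cache API. */
    #include <stdio.h>

    #define TOTAL_WAYS 8
    #define NTASKS     3

    /* Profiled "ways that capture most of the working set", assumed given. */
    static const int profiled_ways[NTASKS] = {4, 2, 2};

    static int first_way[NTASKS];   /* partition start, filled off-line */

    static void partition_offline(void) {
        int next = 0;
        for (int t = 0; t < NTASKS; t++) {
            first_way[t] = next;
            next += profiled_ways[t];       /* disjoint ranges: no intertask interference */
        }
        if (next > TOTAL_WAYS) printf("profile does not fit, would need trimming\n");
    }

    /* Stand-in for the OS hook that programs the configurable cache. */
    static void on_context_switch(int task) {
        unsigned enable_mask = 0;
        for (int w = 0; w < profiled_ways[task]; w++)
            enable_mask |= 1u << (first_way[task] + w);
        printf("task %d: enable ways mask 0x%02x (others stay off)\n", task, enable_mask);
    }

    int main(void) {
        partition_offline();
        for (int t = 0; t < NTASKS; t++) on_context_switch(t);
        return 0;
    }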


Journal ArticleDOI
TL;DR: This work identifies the software-based compression algorithms that are most appropriate for use in low-power embedded systems and presents a novel framework for dynamic data memory compression and in-RAM filesystem compression in embedded systems.
Abstract: Memory is a scarce resource during embedded system design. Increasing memory often increases packaging costs, cooling costs, size, and power consumption. This article presents CRAMES, a novel and efficient software-based RAM compression technique for embedded systems. The goal of CRAMES is to dramatically increase effective memory capacity without hardware or application design changes, while maintaining high performance and low energy consumption. To achieve this goal, CRAMES takes advantage of an operating system's virtual memory infrastructure by storing swapped-out pages in compressed format. It dynamically adjusts the size of the compressed RAM area, protecting applications capable of running without it from performance or energy consumption penalties. In addition to compressing working data sets, CRAMES also enables efficient in-RAM filesystem compression, thereby further increasing RAM capacity. CRAMES was implemented as a loadable module for the Linux kernel and evaluated on a battery-powered embedded system. Experimental results indicate that CRAMES is capable of doubling the amount of RAM available to applications running on the original system hardware. Execution time and energy consumption for a broad range of examples are rarely affected. When physical RAM is reduced to 62.5% of its original quantity, CRAMES enables the target embedded system to support the same applications with reasonable performance and energy consumption penalties (on average 9.5% and 10.5%), while without CRAMES those applications either may not execute or suffer from extreme performance degradation or instability. In addition to presenting a novel framework for dynamic data memory compression and in-RAM filesystem compression in embedded systems, this work identifies the software-based compression algorithms that are most appropriate for use in low-power embedded systems.
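
The swap-out/swap-in path can be illustrated in user space with an off-the-shelf compressor (CRAMES itself is a Linux kernel module, and the paper studies which algorithms suit low-power devices; zlib is used below purely for convenience). Build with -lz.

    /* User-space illustration of compressed swapping; not CRAMES itself. */
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <zlib.h>

    #define PAGE_SIZE 4096

    int main(void) {
        unsigned char page[PAGE_SIZE];
        memset(page, 'A', PAGE_SIZE);                 /* a very compressible "page" */

        /* "Swap out": compress the page into the compressed RAM area. */
        uLongf clen = compressBound(PAGE_SIZE);
        unsigned char *cpage = malloc(clen);
        if (!cpage || compress2(cpage, &clen, page, PAGE_SIZE, Z_BEST_SPEED) != Z_OK) {
            fprintf(stderr, "compress failed\n");
            return 1;
        }
        printf("page stored in %lu of %d bytes\n", (unsigned long)clen, PAGE_SIZE);

        /* "Swap in": decompress on a page fault. */
        unsigned char restored[PAGE_SIZE];
        uLongf dlen = PAGE_SIZE;
        if (uncompress(restored, &dlen, cpage, clen) != Z_OK || dlen != PAGE_SIZE
            || memcmp(restored, page, PAGE_SIZE) != 0) {
            fprintf(stderr, "decompress failed\n");
            return 1;
        }
        printf("page restored intact\n");
        free(cpage);
        return 0;
    }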

Journal ArticleDOI
TL;DR: An interruptible copy unit is presented that implements nonblocking object copy and can be interrupted after a single word move, and is evaluated as a real-time garbage collector that uses the proposed techniques on a Java processor.
Abstract: A real-time garbage collector has to fulfill two basic properties: ensure that programs with bounded allocation rates do not run out of memory and provide short blocking times. Even for incremental garbage collectors, two major sources of blocking exist, namely, root scanning and heap compaction. Finding root nodes of an object graph is an integral part of tracing garbage collectors and cannot be circumvented. Heap compaction is necessary to avoid potentially unbounded heap fragmentation, which in turn would lead to unacceptably high memory consumption. In this article, we propose solutions to both issues. Thread stacks are local to a thread, and root scanning, therefore, only needs to be atomic with respect to the thread whose stack is scanned. This fact can be utilized by either blocking only the thread whose stack is scanned, or by delegating the responsibility for root scanning to the application threads. The latter solution eliminates blocking due to root scanning completely. The impact of this solution on the execution time of a garbage collector is shown for two different variants of such a root scanning algorithm. During heap compaction, objects are copied. Copying is usually performed atomically to avoid interference with application threads, which could render the state of an object inconsistent. Copying of large objects and especially large arrays introduces long blocking times that are unacceptable for real-time systems. In this article, an interruptible copy unit is presented that implements nonblocking object copy. The unit can be interrupted after a single word move. We evaluate a real-time garbage collector that uses the proposed techniques on a Java processor. With this garbage collector, it is possible to run high-priority hard real-time tasks at 10 kHz parallel to the garbage collection task on a 100 MHz system.
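
A software-only illustration of the interruptibility idea (the article implements it as a hardware copy unit on a Java processor): the copy can be preempted after every single word move, so the blocking time seen by a high-priority task is one word copy. The preemption flag and the stand-in urgent task are assumptions of this sketch.

    /* Software sketch of an interruptible, word-granular copy. */
    #include <stdio.h>
    #include <stddef.h>

    static volatile int preempt_requested;      /* set by an interrupt / scheduler */

    static void do_higher_priority_work(void) { /* stand-in for the preempting task */
        preempt_requested = 0;
        printf("  ...preempted after a single word, ran urgent task...\n");
    }

    static void interruptible_copy(long *dst, const long *src, size_t nwords) {
        for (size_t i = 0; i < nwords; i++) {
            dst[i] = src[i];                    /* the only non-preemptible step */
            if (preempt_requested)              /* safe interruption point       */
                do_higher_priority_work();
        }
    }

    int main(void) {
        long src[8] = {1, 2, 3, 4, 5, 6, 7, 8}, dst[8] = {0};
        preempt_requested = 1;                  /* pretend a task became ready   */
        interruptible_copy(dst, src, 8);
        printf("copied: last word = %ld\n", dst[7]);
        return 0;
    }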

Journal ArticleDOI
TL;DR: GUSTO is the first tool of its kind to provide automatic generation of a variety of general-purpose matrix inversion architectures with different parameterization options, and provides an optimized application-specific architecture with an average of 59% area decrease and 3X throughput increase over its general-purpose architecture.
Abstract: Matrix inversion is a common function found in many algorithms used in wireless communication systems. As FPGAs become an increasingly attractive platform for wireless communication, it is important to understand the trade-offs in designing a matrix inversion core on an FPGA. This article describes a matrix inversion core generator tool, GUSTO, that we developed to ease the design space exploration across different matrix inversion architectures. GUSTO is the first tool of its kind to provide automatic generation of a variety of general-purpose matrix inversion architectures with different parameterization options. GUSTO also provides an optimized application-specific architecture with an average of 59% area decrease and 3X throughput increase over its general-purpose architecture. The optimized architectures generated by GUSTO provide comparable results to published matrix inversion architecture implementations, but offer the advantage of providing the designer the ability to study the trade-offs between architectures with different design parameters.

Journal ArticleDOI
TL;DR: This article introduces MobiSense, a novel mobile health monitoring system for ambulatory patients that is able to detect body postures such as lying, sitting, and standing, and walking speed, by utilizing a rule-based heuristic activity classification scheme based on the extended Kalman (EK) Filtering algorithm.
Abstract: This article introduces MobiSense, a novel mobile health monitoring system for ambulatory patients. MobiSense resides in a mobile device, communicates with a set of body sensor devices attached to the wearer, and processes data from these sensors. MobiSense is able to detect body postures such as lying, sitting, and standing, and walking speed, by utilizing our rule-based heuristic activity classification scheme based on the extended Kalman (EK) Filtering algorithm. Furthermore, the proposed system is capable of controlling each of the sensor devices, and performing resource reconfiguration and management schemes (sensor sleep/wake-up mode). The architecture of MobiSense is highlighted and discussed in depth. The system has been implemented, and its prototype is showcased. We have also carried out rigorous performance measurements of the system including real-time and query latency as well as the power consumption of the sensor nodes. The accuracy of our activity classifier scheme has been evaluated by involving several human subjects, and we found promising results.

Journal ArticleDOI
TL;DR: A new design dimension is added to the traditional TLM refinement process to represent network configuration alternatives and a system/network simulation taxonomy is investigated aiming at precisely identifying the role of cosimulation in system/ network design-space exploration.
Abstract: This article presents a methodology for the design of Networked Embedded Systems (NESs), which extends Transaction Level Modeling (TLM) to perform system/network design-space exploration. As a result, a new design dimension is added to the traditional TLM refinement process to represent network configuration alternatives. Each network configuration can be used to drive both architecture exploration and system validation after each refinement step. A system/network simulation taxonomy is investigated aiming at precisely identifying the role of cosimulation in system/network design-space exploration. Furthermore, a general criterion to map functionalities to system and network models is presented. As a case study, the proposed methodology is applied to the design of a Voice-over-IP client.

Journal ArticleDOI
TL;DR: A measurement-based framework in which the postural position as it pertains to a given wireless link is first inferred based on the measured RF signal strength and packet drops, and optimal power assignment is done by fitting those measurement results into a model describing the relationship between the assigned power and the resulting signal strength.
Abstract: This article presents a novel transmission power assignment mechanism for on-body wireless links formed between severely energy-constrained wearable and implanted sensors. The key idea is to develop a measurement-based framework in which the postural position as it pertains to a given wireless link is first inferred based on the measured RF signal strength and packet drops. Then optimal power assignment is done by fitting those measurement results into a model describing the relationship between the assigned power and the resulting signal strength. A closed-loop power control mechanism is then added for iterative convergence to the optimal power level as a response to both intra- and inter-posture body movements. This provides a practical paradigm for on-body power assignment, which cannot leverage the existing mechanisms in the literature because they rely on localization, which is not realistic for on-body sensors. Extensive experimental results are provided to demonstrate the model building and algorithm performance on a prototype body area network. The proposed mechanism has also been compared with a number of other closed-loop mechanisms and an experimental benchmark.
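
A generic closed-loop sketch, not the authors' algorithm or model: the transmitter nudges its power level up when the measured signal strength drops below the target and down when it sits comfortably above it, so the link settles at roughly the lowest power that still holds after a posture change. The channel model and all constants are invented.

    /* Generic closed-loop transmit power control toward a target RSSI. */
    #include <stdio.h>

    #define TARGET_RSSI    -85      /* dBm the receiver needs       */
    #define HYSTERESIS       3      /* dB band to avoid oscillation */
    #define MIN_POWER        1      /* abstract radio power levels  */
    #define MAX_POWER       31

    /* Pretend channel: RSSI improves ~1 dB per power step; path_loss stands in
     * for the posture-dependent attenuation the paper models from measurements. */
    static int measured_rssi(int power, int path_loss) { return power - path_loss; }

    int main(void) {
        int power = MAX_POWER, path_loss = 110;       /* start safe, unknown posture */
        for (int round = 0; round < 40; round++) {
            int rssi = measured_rssi(power, path_loss);
            if (rssi < TARGET_RSSI && power < MAX_POWER)                   power++;
            else if (rssi > TARGET_RSSI + HYSTERESIS && power > MIN_POWER) power--;
            if (round == 20) path_loss = 115;          /* posture change mid-run */
        }
        printf("settled at power level %d (rssi %d dBm)\n",
               power, measured_rssi(power, path_loss));
        return 0;
    }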

Journal ArticleDOI
TL;DR: The article presents the bare model and the main programming patterns that are associated with the NhRo model, which aims to avoid garbage collection in the remote invocation process, improving predictability and memory isolation of distributed Java-based real-time applications.
Abstract: This article presents an approach to providing real-time support for Java's Remote Method Invocation (RMI) and its integration with the RTSJ memory model in order to leave out garbage collection. A new construct for remote objects, called No-heap Remote object (NhRo), is introduced. The use of a NhRo guarantees that memory required to perform a remote invocation (at the server side) does not use heap memory. Thus, the aim is to avoid garbage collection in the remote invocation process, improving predictability and memory isolation of distributed Java-based real-time applications. The article presents the bare model and the main programming patterns that are associated with the NhRo model. Sun RMI implementation has been modified to integrate the NhRo model in both static and dynamic environments.

Journal ArticleDOI
TL;DR: In this article, the authors propose a new loop scheduling with memory management technique, Iterational Retiming with Partitioning (IRP), that can completely hide memory latencies for applications with multidimensional loops on architectures like the CELL processor.
Abstract: The widening gap between processor and memory performance is the main bottleneck for modern computer systems to achieve high processor utilization. To hide memory latency, a variety of techniques have been proposed—from intermediate fast memories (caches) to various prefetching and memory management techniques. In this article, we propose a new loop scheduling with memory management technique, Iterational Retiming with Partitioning (IRP), that can completely hide memory latencies for applications with multidimensional loops on architectures like the CELL processor. In IRP, the iteration space is first partitioned carefully. Then a two-part schedule, consisting of processor and memory parts, is produced such that the execution time of the memory part never exceeds the execution time of the processor part. These two parts are executed simultaneously and complete memory latency hiding is reached. In this article, we prove that such an optimal two-part schedule can always be achieved given the right partition size and shape. Experiments on DSP benchmarks show that IRP consistently produces optimal solutions as well as significant improvement over previous techniques.
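
The two-part schedule can be pictured with ordinary double buffering (schematic only, not the IRP algorithm): while partition i is being computed, partition i+1 is fetched, so memory latency is fully hidden whenever the fetch time does not exceed the compute time. A plain memcpy stands in for the DMA transfer that would run concurrently on a real platform.

    /* Double-buffered overlap of "memory part" and "processor part" (schematic). */
    #include <stdio.h>
    #include <string.h>

    #define PART_SIZE 1024
    #define NPARTS    4

    static int backing_store[NPARTS][PART_SIZE];     /* stands in for off-chip memory */
    static int buf[2][PART_SIZE];                    /* on-chip double buffer         */

    static void fetch(int part, int which) {         /* "memory part" of the schedule */
        memcpy(buf[which], backing_store[part], sizeof buf[which]);
    }

    static long compute(const int *data) {           /* "processor part"              */
        long sum = 0;
        for (int i = 0; i < PART_SIZE; i++) sum += data[i];
        return sum;
    }

    int main(void) {
        for (int p = 0; p < NPARTS; p++)
            for (int i = 0; i < PART_SIZE; i++) backing_store[p][i] = p;

        long total = 0;
        fetch(0, 0);                                  /* prologue: first partition       */
        for (int p = 0; p < NPARTS; p++) {
            int cur = p & 1;
            if (p + 1 < NPARTS) fetch(p + 1, !cur);   /* would run concurrently as DMA   */
            total += compute(buf[cur]);               /* while this partition computes   */
        }
        printf("checksum over all partitions: %ld\n", total);
        return 0;
    }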

Journal ArticleDOI
TL;DR: The proposed approach is a two-level scheduling framework, where the first level is the RTSJ priority scheduler and the second level is under application control, which allows the concurrent use of multiple application-defined scheduling policies.
Abstract: This article presents a viable solution to introducing flexible scheduling in the Real-Time Specification for Java (RTSJ), in the form of a flexible scheduling framework. The framework allows the concurrent use of multiple application-defined scheduling policies, each scheduling a subset of the total set of threads. Moreover, all threads, regardless of the policy under which they are scheduled, are permitted to share common resources. Thus, the framework can accommodate a variety of interworking applications (soft, firm, and hard) running under the RTSJ. The proposed approach is a two-level scheduling framework, where the first level is the RTSJ priority scheduler and the second level is under application control. This article describes the framework's protocol, examines the different types of scheduling policies that can be supported, and evaluates the proposed framework by measuring its execution cost. A description of an application-defined Earliest-Deadline-First (EDF) scheduler illustrates how the interface can be used. Minimum backward-compatible changes to the RTSJ specification are discussed to motivate the required interface. The only assumptions made about the underlying real-time operating system are that it supports preemptive priority-based dispatching of threads and that changes to priorities have immediate effect.

Journal ArticleDOI
TL;DR: This article proposes a method with polynomial complexity to find the global optimum of an NP-hard model partitioning problem for 75% of occurrences under some practical conditions.
Abstract: This article takes a theoretical approach to focus on the algorithmic properties of hardware/software partitioning. It proposes a method with polynomial complexity to find the global optimum of an NP-hard model partitioning problem for 75% of occurrences under some practical conditions. The global optimum is approached with a lower bound distance for the remaining 25%. Furthermore, this approach ensures finding the 2-approximate of the global optimum partition in 97% of instances where technical assumptions exist. The strategy is based on intelligently changing the parameters of the polynomial model of the partitioning problem to force it to produce (or approach) the exact solution to the NP-hard model.

Journal ArticleDOI
TL;DR: A new romization scheme is proposed that allows the system to be started within a virtual execution environment, and thus to be fully deployed off-board before being transferred to its real execution support, resulting in a very low-footprint, custom-tailored embedded system.
Abstract: This article presents a new way to deploy and customize embedded virtual machine based operating systems for very restrained devices. Due to the specificity of restrained embedded devices (large usage of read-only memory, very little writable memory available, …), these systems are typically deployed off-board, in a process called romization. However, current romization solutions do not allow a complete deployment to take place outside of the execution device: they are capable of converting system components and applications into their executable form, but are unable to perform any operation that would require the system to be running. This results in a good part of the deployment being performed by the target device, at the cost of longer startup times, bloat with code and data that are only executed once at startup, and suboptimal memory placement of data structures. In this article, we propose a new romization scheme that allows the system to be started within a virtual execution environment, and thus to be fully deployed off-board before being transferred to its real execution support. We then take advantage of all the information provided by the deployed state in order to analyze and customize it, resulting in a very low-footprint, custom-tailored embedded system. The Java platform is used as a support to implement our romization architecture and perform our experiments. For the evaluated set of embedded applications, we were able to obtain embedded systems whose memory footprint was lower than that of their J2ME counterpart, while being based on a full-fledged J2SE environment.

Journal ArticleDOI
TL;DR: An FPGA-arbitrated parallel architecture is described that allows unqualified commercial devices to be incorporated into a computational device with aggregate reliability figures similar to those of traditional space-qualified alternatives.
Abstract: Spacecraft typically employ rare and expensive radiation-tolerant, radiation-hardened, or at least military-qualified parts for computational and other mission-critical subsystems. Reasons include reliability in the harsh environment of space, and systems compatibility or heritage with previous missions. The overriding reliability concern leads most satellite computing systems to be rather conservative in design, avoiding novel or commercial-off-the-shelf components. This article describes an alternative approach: an FPGA-arbitrated parallel architecture that allows unqualified commercial devices to be incorporated into a computational device with aggregate reliability figures similar to those of traditional space-qualified alternatives. Apart from the obvious cost benefits in moving to commercial-off-the-shelf devices, these are attractive in situations where lower power consumption and/or higher processing performance are required. The latter argument is of particular importance at a time when the gap between required and available processing capability in satellites is widening. An analysis compares the proposed architecture to typical alternatives, maintaining the risk of failure within required levels, and discusses key applications for the parallel architecture.
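
One classic way an FPGA arbiter can mask a fault in an unqualified device is majority voting across redundant processors; the sketch below is only meant to make that idea concrete and is not taken from the article.

    /* Illustrative 2-of-3 majority voter over redundant processor outputs. */
    #include <stdio.h>
    #include <stdint.h>

    /* Bitwise vote: each output bit is the majority of the three input bits. */
    static uint32_t vote3(uint32_t a, uint32_t b, uint32_t c) {
        return (a & b) | (a & c) | (b & c);
    }

    int main(void) {
        uint32_t good = 0xDEADBEEF;
        uint32_t from_cpu[3] = {good, good ^ 0x00000400, good};  /* one flipped bit */
        uint32_t result = vote3(from_cpu[0], from_cpu[1], from_cpu[2]);
        printf("voted result 0x%08X (%s)\n", result,
               result == good ? "upset masked" : "vote failed");
        return 0;
    }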

Journal ArticleDOI
TL;DR: This article examines the AEH techniques used in some popular RTSJ implementations and proposes two efficient AEH models, which require fewer threads on average and give a certain level of configurability to AEH.
Abstract: The Real-Time Specification for Java (RTSJ) is becoming mature. It has been implemented, formed the basis for research, and been used in serious applications. Some strengths and weaknesses are emerging. One of the areas that requires further elaboration is asynchronous event handling (AEH). The primary goal for handlers in the RTSJ is to have a lightweight concurrency mechanism. Some implementations will, however, simply map a handler to a real-time thread, which undermines the original motivations and introduces performance penalties. However, it is generally unclear how to map handlers to real-time threads effectively. Also, the support for nonblocking handlers in the RTSJ is criticized as lacking in configurability, as implementations are unable to take advantage of them. This article, therefore, examines the AEH techniques used in some popular RTSJ implementations and proposes two efficient AEH models for the RTSJ. We then define formal models of the RTSJ AEH implementations using the automata formalism provided by the UPPAAL model checking tool. Using the automata models, their properties are explored and verified. In the proposed models, blocking and nonblocking handlers are serviced by different algorithms. In this way, it is possible to assign a real-time thread to a handler at the right time in the right place while maintaining the fewest possible threads overall and to give a certain level of configurability to AEH. We also have implemented the proposed models on an existing RTSJ implementation, jRate, and executed a set of performance tests that measure their respective dispatch and multiple-handler completion latencies. The results from the tests and the verifications indicate that the proposed models require fewer threads on average with better performance than other approaches.

Journal ArticleDOI
TL;DR: A distributed, peer-to-peer gesture recognition system along with a software architecture modeling technique and authority control protocol for ubiquitous cameras and a service-oriented software architecture to dynamically reconfigure services when system state changes are proposed.
Abstract: In this article, we describe a distributed, peer-to-peer gesture recognition system along with a software architecture modeling technique and authority control protocol for ubiquitous cameras. This system performs gesture recognition in real time by combining imagery from multiple cameras without using a central server. We propose a system architecture that uses a network of inexpensive cameras to perform in-network video processing. A methodology for transforming a well-designed single-node algorithm into a distributed system is also proposed. Applications for ubiquitous cameras can be modeled as the composition of a finite-state machine of the system, functional services, and middleware. A service-oriented software architecture is proposed to dynamically reconfigure services when the system state changes. By exchanging data and control messages between neighboring sensors, each node can maintain a broader view of the environment with integrated video-processing results. Our prototype system is built on Windows machines, and uses standard video cameras as sensors and a local network as a communication channel.

Journal ArticleDOI
TL;DR: An upper bound on lock-free retries under the UAM is derived with utility accrual scheduling, the first such result, and the tradeoffs between lock-free and lock-based sharing under UAM are established.
Abstract: We consider lock-free synchronization for dynamic embedded real-time systems that are subject to resource overloads and arbitrary activity arrivals. We model activity arrival behaviors using the unimodal arbitrary arrival model (or UAM). UAM embodies a stronger “adversary” than most traditional arrival models. We derive an upper bound on lock-free retries under the UAM with utility accrual scheduling—the first such result. We establish the tradeoffs between lock-free and lock-based sharing under UAM. These include conditions under which activities' accrued timeliness utility is greater under lock-free than lock-based, and the consequent lower and upper bound on the total accrued utility that is possible with lock-free and lock-based sharing. We confirm our analytical results with a POSIX RTOS implementation.
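
For readers unfamiliar with the mechanism being analyzed, a generic lock-free retry loop looks like the following (this is not the paper's analysis; the bound the article derives is on how many of these retries the UAM adversary can force):

    /* Generic lock-free update via compare-and-swap; interference shows up as retries. */
    #include <stdatomic.h>
    #include <stdio.h>

    static _Atomic long shared_total;

    /* Add amount to the shared total without locks; returns the retries it needed. */
    static int lockfree_add(long amount) {
        int retries = 0;
        long old = atomic_load(&shared_total);
        while (!atomic_compare_exchange_weak(&shared_total, &old, old + amount))
            retries++;      /* another activity won the race: old was refreshed, retry */
        return retries;
    }

    int main(void) {
        for (int i = 0; i < 1000; i++) lockfree_add(2);
        printf("total = %ld\n", (long)atomic_load(&shared_total));
        return 0;
    }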

Journal ArticleDOI
TL;DR: The proposed approach enables a bus arbiter to use much less RT mode in providing a Real-Time (RT) guarantee and, therefore, gives the arbiter more opportunity to employ non-RT modes to achieve better overall QoS.
Abstract: In an advanced System-on-Chip (SoC) for real-time applications, the arbiter of its on-chip communication subsystem needs to support multiple QoS criteria while providing a hard real-time guarantee. To fulfill both objectives, the arbitration algorithm must dynamically switch between NonReal-Time (NRT) and Real-Time (RT) modes such that use of the RT mode is minimized to best accommodate the overall QoS criteria. In this article, we define a model for this problem, and propose optimal solutions to its associated problems with static and dynamic warning-zone-length assignment. Compared with previous works, the proposed approach enables a bus arbiter to use much less RT mode in providing a Real-Time (RT) guarantee and, therefore, gives the arbiter more opportunity to employ non-RT modes to achieve better overall QoS. Experimental results show that the proposed approach reduces RT mode usage by as much as 37.1%. Moreover, that reduction in RT mode usage helps cut the execution time by 27.0% when applying our approach to an industrial DRAM controller. Another case study on an AMBA-compliant ultra-high-resolution H.264 decoder IP shows that the proposed approach reduces RT mode usage by 26.4%, which leads to an average reduction of 10.4% in decoding time. Finally, when implementing a 16-master arbiter, it costs only 6.9K and 9.5K gates of overhead using the proposed static and dynamic approaches, respectively. Therefore, the proposed approach is suitable for real-time SoC applications.
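
A simplified sketch of the mode-switch rule with invented numbers and field names (the article's actual contribution, optimally sizing the warning zone, is not reproduced here): a requester is served by the QoS-oriented policy until the slack to its hard deadline shrinks into the warning zone, at which point the arbiter switches it to RT mode.

    /* Simplified warning-zone mode switch; values and fields are illustrative. */
    #include <stdio.h>
    #include <stdbool.h>

    typedef struct {
        int deadline;        /* cycle by which the transfer must finish    */
        int remaining;       /* cycles of service the transfer still needs */
        int warning_zone;    /* slack threshold that forces RT mode        */
    } Requester;

    static bool needs_rt_mode(const Requester *r, int now) {
        int slack = (r->deadline - now) - r->remaining;
        return slack <= r->warning_zone;       /* inside the zone: guarantee it now */
    }

    int main(void) {
        Requester video = { .deadline = 1000, .remaining = 40, .warning_zone = 50 };
        for (int now = 880; now <= 920; now += 10)
            printf("t=%d -> %s mode\n", now,
                   needs_rt_mode(&video, now) ? "RT" : "NRT (serve by QoS policy)");
        return 0;
    }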

Journal ArticleDOI
TL;DR: A self-adjusting flash translation layer is proposed with low memory requirements to provide efficient address mapping and low garbage collection overhead, while controlling main memory usage of the flashtranslation layer.
Abstract: The capacity of flash memory storage systems has been growing at a speed similar to many other storage systems. In order to properly manage the product cost, vendors face serious challenges in resource-limited embedded systems. In this article, a self-adjusting flash translation layer is proposed with low memory requirements. The objective of the design is to provide efficient address mapping and low garbage collection overhead, while controlling main memory usage of the flash translation layer. The capability of the design is evaluated over realistic workloads and benchmarks. System performance is also guaranteed under low memory requirements.


Journal ArticleDOI
TL;DR: This article presents a new software-based online memory compression algorithm for embedded systems that has a competitive compression ratio but is twice as fast, and allows existing applications to execute with less physical memory.
Abstract: Online memory compression is a technology that increases the amount of memory available to applications by dynamically compressing and decompressing their working datasets on demand. It has proven extremely useful in embedded systems with tight physical RAM constraints. The technology can be used to increase functionality, reduce size, and reduce cost, without modifying applications or hardware. This article presents a new software-based online memory compression algorithm for embedded systems. In comparison with the best algorithms used in online memory compression, our new algorithm has a competitive compression ratio but is twice as fast. In addition, we describe several practical problems encountered in developing an online memory compression infrastructure and present solutions. We present a method of adaptively managing the uncompressed and compressed memory regions during application execution. This memory management scheme adapts to the predicted memory requirements of applications. It permits efficient compression for a wide range of applications. We have evaluated our techniques on a portable embedded device and have found that the memory available to applications can be increased by 2.5× with negligible performance and power consumption penalties, and with no changes to hardware or applications. Our techniques allow existing applications to execute with less physical memory. They also allow applications with larger working datasets to execute on unchanged embedded system hardware, thereby increasing functionality.

Journal ArticleDOI
TL;DR: This work presents a high-level study of the performance, power, and thermal behavior of tile-based architectures with a large number of cores executing flow-based packet workloads, and proposes a load-balancing policy of assigning packets to cores that minimizes the communication latency while featuring a hotspot-free thermal profile.
Abstract: Massive multi-core architectures provide a computation platform with high processing throughput, enabling the efficient processing of workloads with a significant degree of thread-level parallelism found in networking environments. Communication-centric workloads, like those in LAN and WAN environments, are fundamentally composed of sets of packets, named flows. The packets within a flow usually have dependencies among them, which reduce the amount of parallelism. However, packets of different flows tend to have very few or no dependencies among them, and thus can exploit thread-level parallelism to its fullest extent. Therefore, in massive tile-based multi-core architectures, it is important that the processing of the packets of a particular flow takes place in a set of cores physically close to each other to minimize the communication latency among those cores. Moreover, it is also desirable to spread out the processing of the different flows across all the cores of the processor in order to minimize the stress on a reduced number of cores, thus minimizing the potential for thermal hotspots and increasing the reliability of the processor. In addition, the burst-like nature of packet-based workloads renders most of the cores idle most of the time, enabling large power savings by power gating these idle cores. This work presents a high-level study of the performance, power, and thermal behavior of tile-based architectures with a large number of cores executing flow-based packet workloads, and proposes a load-balancing policy of assigning packets to cores that minimizes the communication latency while featuring a hotspot-free thermal profile.
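
A toy version of the placement idea, not the paper's policy: packets of the same flow hash to one small cluster of adjacent tiles (keeping a flow's cores physically close), while different flows land on different clusters so load and heat spread across the die. Mesh size, cluster size, and the hash are arbitrary choices.

    /* Toy flow-to-cluster placement on a tile mesh. */
    #include <stdio.h>
    #include <stdint.h>

    #define MESH_DIM         8                     /* 8x8 tile mesh                  */
    #define CLUSTER_DIM      2                     /* a flow stays in a 2x2 cluster  */
    #define CLUSTERS_PER_ROW (MESH_DIM / CLUSTER_DIM)

    typedef struct { uint32_t src_ip, dst_ip; uint16_t src_port, dst_port; } Flow;

    static uint32_t flow_hash(const Flow *f) {      /* simple mixing hash             */
        uint32_t h = f->src_ip * 2654435761u ^ f->dst_ip;
        h ^= (uint32_t)f->src_port << 16 ^ f->dst_port;
        return h * 2654435761u;
    }

    /* Pick the (x, y) tile for the next packet of a flow. */
    static void place(const Flow *f, unsigned pkt_seq, int *x, int *y) {
        uint32_t h = flow_hash(f);
        int cx = (h % (CLUSTERS_PER_ROW * CLUSTERS_PER_ROW)) % CLUSTERS_PER_ROW;
        int cy = (h % (CLUSTERS_PER_ROW * CLUSTERS_PER_ROW)) / CLUSTERS_PER_ROW;
        *x = cx * CLUSTER_DIM + (pkt_seq % CLUSTER_DIM);           /* round-robin    */
        *y = cy * CLUSTER_DIM + (pkt_seq / CLUSTER_DIM) % CLUSTER_DIM;
    }

    int main(void) {
        Flow a = {0x0A000001, 0x0A000002, 1234, 80};
        Flow b = {0x0A000003, 0x0A000004, 5555, 443};
        for (unsigned s = 0; s < 4; s++) {
            int ax, ay, bx, by;
            place(&a, s, &ax, &ay);
            place(&b, s, &bx, &by);
            printf("pkt %u: flow A -> tile (%d,%d), flow B -> tile (%d,%d)\n",
                   s, ax, ay, bx, by);
        }
        return 0;
    }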