Showing papers in "IEEE Embedded Systems Letters in 2011"

Intermediate Fabrics: Virtual Architectures for Near-Instant FPGA Compilation

TL;DR: The design of the versatile Opal platform that couples a Cortex M3 MCU with two IEEE 802.15.4 radios for supporting sensing applications with high transfer rates without sacrificing communication range is presented.

Abstract: Design of current sensor network platforms has favored low power operation at the cost of communication throughput or range, which severely limits support for real-time monitoring applications with high throughput requirements. This letter presents the design of the versatile Opal platform that couples a Cortex M3 MCU with two IEEE 802.15.4 radios for supporting sensing applications with high transfer rates without sacrificing communication range. We present experiments that evaluate Opal's throughput and range when operating with one or two radios, and we compare these results with an Iris-based node and TelosB nodes. We introduce the spatial energy cost metric that measures the energy to transfer one bit of information in a unit area for comparing the performance of the platforms. The results show that Opal operating with dual radios increases the throughput compared to single radio platforms with the same data-rate by a factor of 3.7, without sacrificing communication range. Opal operating with one radio can deliver a 460% increase in throughput over other single radio nodes at reduced range. We also analyze the implications of Opal's design for multihop communication, showing that the dual radio architecture removes the bandwidth bottleneck in multihop communications that is inherent to single radio platforms.

62 citations

Journal Article•DOI•

[...]

Greg Stitt¹, James Coole¹•Institutions (1)

University of Florida¹

A Custom FPGA Processor for Physical Model Ordinary Differential Equation Solving

TL;DR: In this letter, virtual reconfigurable architectures, referred to as intermediate fabrics, are evaluated, which enable near-instant placement and routing of applications for commercial FPGAs.

Abstract: Field-programmable gate arrays (FPGAs) suffer from lower application design productivity than other devices, which is largely due to compilation taking hours or even days. Making FPGA compilation comparable to software compilation is critical for continued FPGA usage due to competitive technologies, such as graphics-processing units, that use languages with runtime compilation models. In this letter, we evaluate virtual reconfigurable architectures, referred to as intermediate fabrics, which enable near-instant placement and routing of applications for commercial FPGAs.

46 citations

Journal Article•DOI•

[...]

Chen Huang¹, Frank Vahid¹, Tony Givargis²•Institutions (2)

University of California, Riverside¹, University of California, Irvine²

Managing Battery and Supercapacitor Resources for Real-Time Sporadic Workloads

TL;DR: A resource-efficient custom processor, the differential equation processing element (DEPE), designed specifically for efficient solution of ODEs on FPGAs, is introduced along with its accompanying compilation tools.

Abstract: Models of physical systems, such as of human physiology or of chemical reactions, are typically comprised of numerous ordinary differential equations (ODEs). Today's designers commonly consider simulating physical models utilizing field-programmable gate arrays (FPGAs). This letter introduces a resource-efficient custom processor, the differential equation processing element (DEPE), designed specifically for efficient solution of ODEs on FPGAs, and also introduces its accompanying compilation tools. We show that a single DEPE on a Xilinx Virtex6 130T FPGA executes several physiological models faster than real-time while requiring only a few hundred FPGA lookup tables (LUTs). Experiments with a commercial high-level synthesis (HLS) tool show that while a single DEPE is 5-50× slower than HLS circuits, DEPE is 10-200× smaller. We show that a single DEPE is only 10× slower than a relatively massive and costly 3 GHz Pentium 4 desktop processor for ODE solving, and its speed is also competitive with a 700 MHz TI digital signal processor and a 450 MHz ARM9 processor. DEPE is 4×-17× faster than a Xilinx MicroBlaze soft-core processor and 3×-6× smaller. DEPE thus represents an excellent processing element for use by itself for small physical models, and in future parallel networks for larger models.

29 citations

Journal Article•DOI•

[...]

Chandan Krishna¹•Institutions (1)

University of Massachusetts Amherst¹

Efficient On-Chip Task Scheduler and Allocator for Reconfigurable Operating Systems

TL;DR: This letter characterizes such energy sources by means of two performance measures: expected time before the first task failure and the fraction of tasks that fail before the battery dies, and presents semi-Markov models to evaluate these measures.

Abstract: Batteries and supercapacitors are complementary: batteries have a high energy-to-weight ratio but are limited in the power levels they can support; supercapacitors can provide high levels of power while they have a much lower energy-to-weight ratio. A battery-supercapacitor duo can therefore prove useful in embedded systems serving sporadic, energy-intensive tasks: the battery charges the capacitor at a low, fairly steady rate, which maximizes the energy that can be drawn from it, while the supercapacitor satisfies the impulse power demands of the application. In this letter, we characterize such energy sources by means of two performance measures: expected time before the first task failure and the fraction of tasks that fail before the battery dies. For the case of rare (but energy-intensive) sporadic tasks, we present semi-Markov models to evaluate these measures. For more frequent task arrivals, we provide simulation results. This letter demonstrates the impact of various parameters on our performance measures: power draw, capacitor sizing, and the battery rest scheduling policy.

27 citations

Journal Article•DOI•

[...]

Chuan Hong¹, Khaled Benkrid¹, Xabier Iturbe¹, Ali Ebrahim¹, Tughrul Arslan¹•Institutions (1)

University of Edinburgh¹

Lossless Hyperspectral Image Compression System-Based on HW/SW Codesign

TL;DR: A novel fault-tolerant allocating algorithm called “best-fit empty area compact (BF-EAC),” and its on-chip implementation on a Xilinx Virtex-4 field-programmable gate array (FPGA), which circumvents emerging faults while maintaining more compact empty areas for emerging tasks.

Abstract: This letter presents efficient and modular task scheduler and allocator support for dynamically and partially reconfigurable electronic systems. This enables hardware tasks to be preempted and arbitrarily placed at an optimal position on the chip on-the-fly. In particular, we present a novel fault-tolerant allocating algorithm called “best-fit empty area compact (BF-EAC),” and its on-chip implementation on a Xilinx Virtex-4 field-programmable gate array (FPGA), which circumvents emerging faults while maintaining more compact empty areas for emerging tasks. We also present an implementation of the early deadline first (EDF) scheduling heuristic used to optimize the chronological order of execution of hardware tasks to meet real time constraints. Put together, the placement and scheduling architecture efficiently exploits chip resources with a μs-grade computing speed and a lightweight footprint (less than 500 slices).
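
The EDF heuristic mentioned in this abstract (commonly expanded as earliest deadline first) can be illustrated in software. The following is a minimal sketch only, not the letter's on-chip implementation: it assumes non-preemptive execution on a single resource, and the function name `edf_order` and the task tuple layout are hypothetical choices made for this example.

```python
import heapq


def edf_order(tasks):
    """Simulate non-preemptive EDF on one resource (illustrative assumption).

    tasks: list of (name, release_time, exec_time, deadline) tuples.
    Returns a list of (name, finish_time, met_deadline) in execution order.
    """
    pending = sorted(tasks, key=lambda t: t[1])  # order by release time
    ready = []  # min-heap keyed by absolute deadline
    time, i, schedule = 0, 0, []
    while i < len(pending) or ready:
        # Admit every task that has been released by the current time.
        while i < len(pending) and pending[i][1] <= time:
            name, _release, exec_time, deadline = pending[i]
            heapq.heappush(ready, (deadline, name, exec_time))
            i += 1
        if not ready:
            time = pending[i][1]  # idle until the next release
            continue
        # Run the ready task whose deadline is nearest.
        deadline, name, exec_time = heapq.heappop(ready)
        time += exec_time
        schedule.append((name, time, time <= deadline))
    return schedule


if __name__ == "__main__":
    demo = [("A", 0, 2, 10), ("B", 0, 1, 3), ("C", 1, 2, 5)]
    for entry in edf_order(demo):
        print(entry)
```

The heap keeps the ready task with the nearest deadline at the front, so each scheduling decision costs O(log n) in this software model; a hardware scheduler such as the one described in the letter would realize the selection differently.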

23 citations

Journal Article•DOI•

[...]

Yin-Tsung Hwang¹, Cheng-Chen Lin¹, Ruei-Ting Hung¹•Institutions (1)

National Chung Hsing University¹

An Energy-Efficient Heterogeneous System for Embedded Learning and Classification

TL;DR: The design and implementation of a lossless compression system for hyperspectral images on a processor-plus-field-programmable gate array (FPGA)-based embedded platform shows a 21× speed-up compared to a purely software implementation, with performance bounded by the software section that realizes the entropy coder.

Abstract: The design and implementation of a lossless compression system for hyperspectral images on a processor-plus-field-programmable gate array (FPGA)-based embedded platform are presented. Software execution time of the compression algorithm was profiled first, leading to the decision to accelerate the most time-consuming interband prediction module in hardware. Efficient algorithm-to-hardware mapping led to a high-throughput accelerator design in FPGA capable of processing 16.5 M pixels/s. A set of optimization techniques was applied systematically to enhance the overall system performance. These include a hierarchical memory access scheme to resolve the bus bandwidth limitation, DMA-assisted data transfers to shorten the hardware/software (HW/SW) communication, and various coding styles and compiler options to optimize the software execution. The final result shows a 21× speed-up compared to a purely software implementation, and the performance was actually bounded by the software section in realizing an entropy coder. A 27× speed-up can be achieved if a simplified coder is used.

19 citations

Journal Article•DOI•

[...]

Abhinandan Majumdar¹, Srihari Cadambi¹, Srimat Chakradhar¹•Institutions (1)

Princeton University¹

A Novel Soft Error Detection and Correction Circuit for Embedded Reconfigurable Systems

TL;DR: This letter builds a low-power system using an Atom processor, an ION, a GPU, and a field-programmable gate array (FPGA)-based custom accelerator, and study its performance and power characteristics using four representative workloads.

Abstract: Embedded learning applications in automobiles, surveillance, robotics, and defense are computationally intensive, and process large amounts of real-time data. Systems for such workloads have to balance stringent performance constraints within limited power budgets. High performance computer processing units (CPUs) and graphics processing units (GPUs) cannot be used in an embedded platform due to power issues. In this letter, we propose a low power heterogeneous system consisting of an Atom processor supported by multiple accelerators that target these workloads, and seek to find if such a system can satisfy performance requirements in an energy-efficient manner. We build our low-power system using an Atom processor, an ION, a GPU, and a field-programmable gate array (FPGA)-based custom accelerator, and study its performance and power characteristics using four representative workloads. With such a system, we show an energy improvement of 42-85% over a server comprising a 2.27 GHz quadcore Xeon coupled to a 1.3 GHz 240 core Tesla GPU.

19 citations

Journal Article•DOI•

[...]

Qian Zhao¹, Yoshihiro Ichinomiya¹, Motoki Amagasaki¹, Masahiro Iida¹, Toshinori Sueyoshi¹•Institutions (1)

Kumamoto University¹

Lazy Versus Eager Conflict Detection in Software Transactional Memory: A Real-Time Schedulability Perspective

TL;DR: A Hamming code based error detection and correction (EDAC) circuit that can protect the configuration memory of a reconfigurable device from SEUs and DEUs is developed.

Abstract: As the size of integrated circuits has reached the nanoscale, embedded memories are more sensitive to single-event upsets (SEUs) or double-event upsets (DEUs), due to their low threshold voltage. In particular, reconfigurable systems, containing a large number of configuration memories to implement customer circuits, are more likely to suffer from soft errors caused by SEUs and DEUs. In this letter, we develop a Hamming code based error detection and correction (EDAC) circuit that can protect the configuration memory of a reconfigurable device from SEUs. Evaluation reveals that compared to the conventional triple modular redundancy (TMR) protected field-programmable gate array (FPGA) tile, the proposed EDAC protected FPGA tile shows about 2.3 times better dependability on the influence of DEUs. Moreover, as the FPGA array size increases, the dependability advantage of EDAC increases exponentially. The main drawback of EDAC is that it has about 1.6 times greater area overhead than TMR.
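
To illustrate how a Hamming code detects and corrects a single flipped bit, here is a minimal sketch using the textbook Hamming(7,4) code. This is an assumption made for illustration: the abstract does not state which Hamming code parameters the EDAC circuit uses, and the function names are hypothetical. Plain Hamming(7,4) corrects only single-bit errors; handling double upsets requires additional measures not shown here.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions 1..7 hold p1, p2, d1, p3, d2, d3, d4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Correct at most one flipped bit; return (data_bits, syndrome).

    The syndrome is 0 for a clean word, otherwise the 1-based
    position of the bit that was flipped.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # inject a single-event upset at position 5
    print(hamming74_correct(word))
```

Because each codeword position has a unique pattern of parity-check membership, the three syndrome bits read as a binary number point directly at the corrupted position.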

15 citations

Journal Article•DOI•

[...]

Chaitanya Belwal¹, Albert M. K. Cheng¹•Institutions (1)

University of Houston¹

Custom Microcoded Dynamic Memory Management for Distributed On-Chip Memory Organizations

TL;DR: This work presents a real-time scheduling perspective analysis of lazy and eager conflict detection policies used in STM, and presents an abstract model for this analysis.

Abstract: Transactional memory is a mechanism for controlling access to shared resources in concurrent programs. Though originally implemented in hardware, software implementations of transactional memory are now available as library extensions in all major programming languages. Lately, variants of software transactional memory (STM) with real-time support have been presented. The conflict detection policy used in STM, which can be of lazy or eager type, determines the point at which transactions are aborted. The conflict detection policy can have a significant effect on the schedulability of tasks sharing common resources. Using an abstract model, we present a real-time scheduling perspective analysis of lazy and eager conflict detection policies used in STM.

14 citations

Journal Article•DOI•

[...]

Iraklis Anagnostopoulos¹, Sotirios Xydis¹, Alexandros Bartzas¹, Zhonghai Lu², Dimitrios Soudris¹, Axel Jantsch²•Institutions (2)

National and Kapodistrian University of Athens¹, Royal Institute of Technology²

Data Archival to SD Card Via Hardware Description Language

TL;DR: This work addresses the problem of providing customized microcoded DMM on MPSoC platforms with distributed memory organization by providing a solution that can serve approximately 7× more allocation requests compared to pure distributed memory platforms and perform 25% faster than the corresponding high-level implementation in C language.

Abstract: Multiprocessor system-on-chip (MPSoCs) have attracted significant attention since they are recognized as a scalable paradigm to interconnect and organize a high number of cores. Current multicore embedded systems exhibit increased levels of dynamic behavior, leading to unexpected memory footprint variations unknown at design time. Dynamic memory management (DMM) is a promising solution for such types of dynamic systems. Although some efficient dynamic memory managers have been proposed for conventional bus-based MPSoC platforms, there are no DMM solutions regarding the constraints and the opportunities delivered by the physical distribution of multiple memory nodes of the platform. In this work, we address the problem of providing customized microcoded DMM on MPSoC platforms with distributed memory organization. Customization is enabled at application- and platform-level. Results show that customized microcoded DMM can serve approximately 7× more allocation requests compared to pure distributed memory platforms and perform 25% faster than the corresponding high-level implementation in C language.

14 citations

Journal Article•DOI•

[...]

Omar Elkeelany¹, Vivekanand Savant Todakar¹•Institutions (1)

Tennessee Technological University¹

A Tabu-Based Partitioning and Layer Assignment Algorithm for 3-D FPGAs

TL;DR: The design of an efficient, real-time data archival system to a secure digital flash memory card via reconfigurable hardware is presented; bidirectional access is shown to take place correctly, and data integrity is verified using cyclic redundancy code in both the field-programmable gate array (FPGA) chip and the SD card controller.

Abstract: The main objective of this letter is to present the design of an efficient, real-time data archival system to a secure digital flash memory card via reconfigurable hardware. The data access from the SD card is implemented completely using Verilog and hence, there is no use of any microcontroller or on-chip general purpose processors. And since the complete design is a single-purpose system, no extra hardware is required. The design has four independent modules for the required different operations on the SD memory card. These four modules are for single-block write, multiple-block write, single-block read, and multiple-block read operations. We show how the bidirectional access takes place correctly and the data integrity has been verified using cyclic redundancy code in both field-programmable gate array (FPGA) chip and the SD card controller.
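
As a rough illustration of the cyclic redundancy check mentioned in this abstract, the sketch below computes the 7-bit CRC that the SD protocol attaches to command frames (generator polynomial x^7 + x^3 + 1). The letter implements its logic in Verilog and does not spell out which CRC variants it covers, so this Python model, the function names, and the choice of CRC7 are assumptions for illustration only.

```python
def crc7(data: bytes) -> int:
    """Bitwise CRC-7 with polynomial x^7 + x^3 + 1, initial value 0."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):  # most significant bit first
            in_bit = (byte >> bit) & 1
            msb = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if in_bit ^ msb:
                crc ^= 0x09  # low bits of the generator polynomial
    return crc


def sd_command_frame(index: int, argument: int) -> bytes:
    """Build a 6-byte SD command frame: start bits, index, argument, CRC7, end bit."""
    body = bytes([0x40 | (index & 0x3F)]) + argument.to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 1])


if __name__ == "__main__":
    # CMD0 (GO_IDLE_STATE) with a zero argument.
    print(sd_command_frame(0, 0).hex())
```

The bit-serial loop mirrors a shift-register implementation, which is the form such a check would typically take in a hardware description language.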

Journal Article•DOI•

[...]

Kostas Siozios¹, Dimitrios Soudris¹•Institutions (1)

National and Kapodistrian University of Athens¹

Scalable Many-Domain Power Gating in Coarse-Grained Reconfigurable Processor Arrays

TL;DR: A TSV-aware partitioning algorithm that enables higher performance for application implementation onto 3-D field-programmable gate arrays (FPGAs) and leads to a more efficient utilization of the available (fabricated) interlayer connectivity.

Abstract: Integrating more functionality in a smaller form factor with higher performance and lower power consumption is pushing semiconductor technology scaling to its limits. Three-dimensional (3-D) chip stacking is touted as the silver bullet technology that can keep Moore's momentum and fuel the next wave of consumer electronics products. This letter introduces a TSV-aware partitioning algorithm that enables higher performance for application implementation onto 3-D field-programmable gate arrays (FPGAs). Unlike other algorithms that minimize the number of connections among layers, our solution leads to a more efficient utilization of the available (fabricated) interlayer connectivity. Experimental results show average reductions in delay and power consumption of about 28% and 26%, respectively, compared to similar 3-D computer-aided design (CAD) tools.

Journal Article•DOI•

[...]

Dmitrij Kissler¹, D Gran¹, Zoran Salcic², Frank Hannig¹, Jürgen Teich¹•Institutions (2)

University of Erlangen-Nuremberg¹, University of Auckland²

Reconfigurable Architecture for ZQDCT Using Computational Complexity Prediction and Bitstream Relocation

TL;DR: This letter presents a systematic approach to efficiently handle a very large number of power domains in modern coarse-grained reconfigurable arrays in order to tightly match the different computational demands of processed algorithms with corresponding power consumption.

Abstract: This letter presents a systematic approach to efficiently handle a very large number of power domains in modern coarse-grained reconfigurable arrays in order to tightly match the different computational demands of processed algorithms with corresponding power consumption. It is based on a new highly scalable and generic power control network and additionally uses the state-of-the-art common power format based front-to-backend design methodology for a fully automated implementation. The power management is transparent to the user and is seamlessly integrated into the overall reconfiguration process: reconfiguration-controlled power gating. Furthermore, for the first time, a coarse-grained reconfigurable case study design with as many as 24 switchable power domains with detailed results on power savings and overheads is presented. The application of the proposed technique results in 60% active leakage and 90% standby leakage power reduction for several digital signal processing algorithms.

Journal Article•DOI•

[...]

Jian Huang¹, Jooheung Lee¹•Institutions (1)

University of Central Florida¹

PI and PID Regulation Approaches for Performance-Constrained Adaptive Multiprocessor System-on-Chip

TL;DR: A hybrid model-based quality priority algorithm is developed to reduce power consumption, required hardware resources, and computation time with a small quality degradation in ZQDCT computation.

Abstract: Due to the high computational complexity of discrete cosine transform (DCT) computation, prediction of zero quantized DCT (ZQDCT) coefficients has been extensively studied as a way to reduce that complexity. In this letter, we propose a reconfigurable architecture to support ZQDCT computation. Twelve different modes of DCT computation, including zonal coding, multiblock processing, and parallel-sequential stage mode, can be performed using the proposed architecture. We develop a hybrid model-based quality priority algorithm to reduce power consumption, required hardware resources, and computation time with a small quality degradation.

Journal Article•DOI•

[...]

Gabriel Marchesan Almeida¹, Remi Busseuil¹, Luciano Ost¹, Florent Bruguier¹, Gilles Sassatelli¹, Pascal Benoit¹, Lionel Torres¹, Michel Robert¹•Institutions (1)

University of Montpellier¹

Task Dependency Analysis for Regression Test Selection of Embedded Programs

TL;DR: A novel approach based on the utilization of PI and PID controllers, widely used in control automation, for optimizing resources utilization in Multiprocessor System-on-Chip (MPSoC).

Abstract: Adaptive multiprocessor systems are appearing as a promising solution for dealing with complex and unpredictable scenarios. Given the large variety of possible use cases that these platforms must support and the resulting workload variability, offline approaches are no longer sufficient because they do not allow coping with time changing workloads. This letter presents a novel approach based on the utilization of PI and PID controllers, widely used in control automation, for optimizing resources utilization in Multiprocessor System-on-Chip (MPSoC). Several architecture characteristics such as response time during frequency changing, noise and perturbations are modeled and validated in a high-level model and results are compared to information obtained on a homogeneous MPSoC platform prototype. Power and energy consumption figures are discussed and two controllers are proposed: 1) PI-; and 2) PID-based controllers. Results show the system capability of adapting under disturbing conditions while ensuring application performance constraints and reducing energy consumption.
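
For readers unfamiliar with the PI and PID regulation named in this abstract, a generic discrete-time controller can be sketched as follows. This is a textbook form under stated assumptions, not the controllers proposed in the letter: the class name `PID`, the gain values, and the toy plant in the usage example are all hypothetical. Setting `kd=0` gives a PI controller.

```python
class PID:
    """Textbook discrete PID controller (illustrative, not the letter's design)."""

    def __init__(self, kp, ki=0.0, kd=0.0, dt=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first sample

    def step(self, setpoint, measured):
        """Return the control output for one sampling period."""
        error = setpoint - measured
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


if __name__ == "__main__":
    # Toy plant (an assumption): the output accumulates the control signal.
    controller = PID(kp=0.5)
    throughput = 0.0
    for _ in range(20):
        throughput += controller.step(10.0, throughput)
    print(round(throughput, 4))
```

In an MPSoC setting the measured value might be an application throughput and the output a frequency adjustment, but that mapping is only a plausible reading of the abstract, not a detail it states.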

Journal Article•DOI•

[...]

Swarnendu Biswas¹, Rajib Mall¹, Manoranjan Satpathy•Institutions (1)

Indian Institute of Technology Kharagpur¹

Island-Based Adaptable Embedded System Design

TL;DR: It is argued that execution dependencies among tasks need to be suitably considered in various embedded software engineering activities such as debugging, regression testing, and computation of complexity metrics.

Abstract: Execution dependencies arise among the tasks of an embedded program due to issues such as task priority, task precedence, and intertask communication. We argue that execution dependencies among tasks need to be suitably considered in various embedded software engineering activities such as debugging, regression testing, and computation of complexity metrics. In this letter, we discuss how task execution dependencies among real-time tasks can be identified from static code analysis. Subsequently, we briefly describe an application of our analysis to regression test selection.

Journal Article•DOI•

[...]

Ivan Beretta¹, Vincenzo Rana, David Atienza¹, Donatella Sciuto•Institutions (1)

École Normale Supérieure¹

Scheduling Conditions for Real-Time Software Transactional Memory

TL;DR: This work proposes a design methodology that combines an efficient reconfigurable architecture and a related mapping flow that couples an efficient area usage and an adaptable communication infrastructure for island-based hardware architecture.

Abstract: Nowadays, hardware devices are meant to host the execution of many complex, multicore applications, whose functional and nonfunctional requirements vary according to the specific working domain. In this work, we propose a design methodology that combines an efficient reconfigurable architecture and a related mapping flow. In particular, the proposed island-based hardware architecture couples an efficient area usage and an adaptable communication infrastructure. The proposed mapping flow distributes the cores on the device to optimize both performance and reconfiguration related metrics.

Journal Article•DOI•

[...]

Chaitanya Belwal¹, Albert M. K. Cheng¹•Institutions (1)

University of Houston¹

A Kleene Algebra of Tagged System Actors

TL;DR: This letter formally derives a utilization-based necessary and sufficient scheduling condition for an STM system using lazy conflict detection, noting that the execution semantics of STM differ from the classical preemptive or nonpreemptive model.

Abstract: Software transactional memory (STM) is a transactional mechanism for controlling access to shared resources in memory. Recently, variants of STM with real-time support have been presented. Due to its abort-restart nature, the execution semantics of STM are different from the classical preemptive or nonpreemptive model. In this letter, we formally derive a utilization-based necessary and sufficient scheduling condition for an STM system using lazy conflict detection.

Journal Article•DOI•

[...]

Soumyajit Dey¹, Dipankar Sarkar¹, Anupam Basu¹•Institutions (1)

Indian Institute of Technology Kharagpur¹

Cross-Layer Optimizations in Solid-State Drives

TL;DR: This work provides a representation of tagged systems using the semantics of Kleene algebra that facilitates the usage of standard off-the-shelf theorem provers for reasoning about such systems for both behavioral verification through equivalence checking and property verification.

Abstract: The tagged signal model (TSM) is a formal framework for modeling heterogeneous embedded systems. In the present work, we provide a representation of tagged systems using the semantics of Kleene algebra. Such an algebraic representation facilitates the usage of standard off-the-shelf theorem provers for reasoning about such systems for both behavioral verification through equivalence checking and property verification.

Journal Article•DOI•

[...]

Jia-Hao Wang¹, Hsin-Hung Chen¹, Wei-Jian Su¹, Da-Wei Chang¹•Institutions (1)

National Cheng Kung University¹

A Low-Overhead Partition-Oriented ERfair Scheduler for Hard Real-Time Embedded Systems

TL;DR: Cross-layer optimization techniques involving the cooperation between the buffer management layer and the FTL are proposed in this letter to maximize the degree of parallelism while keeping a low garbage collection overhead.

Abstract: Solid-state drives (SSDs) utilize parallel architectures to improve their IO throughput. Although log buffer based flash translation layers (FTLs) are widely employed in SSDs, little has been addressed on the issue of placing pages under a parallel architecture in such FTLs. In this letter, we evaluate three possible page placement policies and show that there is no best policy due to the tradeoff between the degree of parallelism and the garbage collection overhead. To achieve high performance SSDs, cross-layer optimization techniques involving the cooperation between the buffer management layer and the FTL are proposed in this letter to maximize the degree of parallelism while keeping a low garbage collection overhead. The basic idea is to let the FTL keep the garbage collection overhead low by eliminating high-cost cross-channel live page copying, while making the buffer management layer responsible for maximizing the degree of parallelism. Simulation results on five realistic or benchmark based workloads show that the proposed techniques reduce the response time of the SSD by up to 79%.

Journal Article•DOI•

[...]

Arnab Sarkar¹, Amit Shanker¹, Sujoy Ghose¹, Partha Chakrabarti¹•Institutions (1)

Indian Institute of Technology Kharagpur¹

Nonvolatile Memory Partitioning Scheme for Technology-Based Performance-Reliability Tradeoff

TL;DR: Experimental results reveal that POES incurs almost no migrations at low workloads and achieves up to 32 times reduction in the number of migrations suffered with respect to the global ERfair scheduler on a set of two to 16 processors even when the average system load is as high as 85%.

Abstract: This letter presents partition-oriented ERfair scheduler (POES), a low-overhead proportional fair scheduler for hard real-time multiprocessor embedded systems. POES achieves lower overheads using an online partitioning/merging mechanism that retains the optimal schedulability of a fully global scheduler by merging processor groups as resources become critical while using partitioning for fast scheduling at other times. The principal objective is to remain only just as global at any given instant of time as is necessary to maintain ERfair schedulability of the system throughout the schedule length. Experimental results reveal that POES incurs almost no migrations at low workloads and achieves up to 32 times reduction in the number of migrations suffered with respect to the global ERfair scheduler on a set of two to 16 processors even when the average system load is as high as 85%. Theoretical analysis proves that POES typically has the same amortized complexity as that of the global ERfair algorithm.

Journal Article•DOI•

[...]

Cristian Zambelli, Davide Bertozzi, Andrea Chimenton¹, Piero Olivo•Institutions (1)

Intel¹

A Dynamic Resource Management and Scheduling Environment for Embedded Multimedia and Communications Platforms

TL;DR: A methodology is proposed, managed by the memory controller, that optimizes the data reliability at the physical level for critical data while exploiting the transaction performance for noncritical data by partitioning the memory addressable space into different functional blocks.

Abstract: The need to improve nonvolatile memory reliability in embedded systems is a key design concern. We here propose a methodology, managed by the memory controller, that optimizes the data reliability at the physical level for critical data while exploiting the transaction performance for noncritical data. The reliability-performance tradeoff is obtained by partitioning the memory addressable space into different functional blocks, each one written by means of a specific optimized writing algorithm. The feasibility of the method is demonstrated by a case study exploiting the features of phase change memories (PCMs).

Journal Article•DOI•

[...]

Arshdeep Bahga¹, Vijay K. Madisetti¹•Institutions (1)

Georgia Institute of Technology¹

Parallelism, Performance, and Energy-Efficiency Tradeoffs for In Situ Sensor Data Processing

TL;DR: The design of a resource manager and master scheduler for the OpenCLosE environment, that allows efficient realization of multiple applications within a multitasked platform, is described.

Abstract: We present a framework, OpenCLosE, for dynamic resource management and scheduling of applications written in open compute language (OpenCL) for heterogeneous multimedia and graphics platforms, such as those found in multimedia smartphones and automotive infotainment clusters. We describe the design of a resource manager and master scheduler for the OpenCLosE environment, that allows efficient realization of multiple applications within a multitasked platform.

Journal Article•DOI•

Phillip Stanley-Marbell¹•Institutions (1)

IBM¹

Scenario-Based Specification of Automotive Requirements With Quantitative Constraints and Synthesis of SL/SF Monitors

TL;DR: This letter presents the design, implementation, and evaluation of a miniature, energy-scalable, 24-processor module, L24, for compute-intensive in situ sensor data processing tasks, finding an optimum operating voltage that minimizes either time-to-solution, energy usage, or the energy-delay product.

Abstract: The in situ processing of vast amounts of data, available intermittently in networks of sensors, motivates investigation of means for achieving high performance when required, but ultralow-power dissipation when idle. One approach is the use of embedded multiprocessor systems, leading to tradeoffs between parallelism, performance, energy-efficiency, and cost. To evaluate these tradeoffs, and to gain insight for future system designs, this letter presents the design, implementation, and evaluation of a miniature, energy-scalable, 24-processor module, L24, for compute-intensive in situ sensor data processing tasks. The platform provides idle power dissipation over an order of magnitude lower than systems employing a monolithic processor of equivalent performance, while dynamic power dissipation remains competitive. Taking into account both application computation and interprocessor communication demands, it is shown that there may exist an optimum operating voltage that minimizes either time-to-solution, energy usage, or the energy-delay product. This optimum operating point is formulated analytically, calibrated with system measurements and instruction-level microarchitectural simulation, and evaluated for the hardware platform and application presented.

Journal Article•DOI•

Silky Arora¹, A A Gadkari¹, S. Ramesh¹•Institutions (1)

General Motors¹

Coroutine-Based Synthesis of Efficient Embedded Software From SystemC Models

TL;DR: An algorithm for automatic synthesis of SL/SF monitors from ESC-QC specifications, which is used for generating monitors for verification of controller models from active safety and body control applications.

Abstract: Requirements of embedded systems often describe the system behavior with quantitative constraints over parameters such as timing, memory, and other resources. In this letter, we present a visual language suited for scenario-based specification of requirements with quantitative constraints. Our language, known as event sequence charts with quantitative constraints (ESC-QC), is inspired by message sequence charts (MSC) and their variants. We introduce ESC-QC notations through an example from automotive requirements and then describe the formal syntax and semantics. Besides being useful for formal documentation and analysis of system requirements, ESC-QC specifications can be translated into monitors and used for run-time verification of designs. In automotive systems, Simulink/Stateflow (SL/SF) is widely used for the design of control systems. We have developed an algorithm for automatic synthesis of SL/SF monitors from ESC-QC specifications. We have used this algorithm to generate monitors for verification of controller models from active safety and body control applications.

Journal Article•DOI•

Weichen Liu¹, Jiang Xu¹, Jogesh K. Muppala¹, Wei Zhang², Xiaowen Wu¹, Yaoyao Ye¹•Institutions (2)

Hong Kong University of Science and Technology¹, Nanyang Technological University²

Process Algebra as a Common Framework for Hardware/Software Coverification

TL;DR: An approach to automatic software synthesis from SystemC based on coroutines, instead of the traditional approaches based on real-time operating system (RTOS) threads, which substantially reduces runtime overheads compared to the thread-based approaches.

Abstract: SystemC is a widely used electronic system-level (ESL) design language that can be used to model both hardware and software at different stages of system design. There has been a lot of research on behavioral synthesis of hardware from SystemC, but relatively little work on synthesizing embedded software from SystemC designs. In this letter, we present an approach to automatic software synthesis from SystemC based on coroutines instead of the traditional approaches based on real-time operating system (RTOS) threads. Performance evaluation results on some realistic applications show that our approach results in an impressive reduction of runtime overheads compared to the thread-based approaches.

Journal Article•DOI•

Matthias Raffelsieper¹, Mohammad Reza Mousavi¹, J Sleuters•Institutions (1)

Eindhoven University of Technology¹