
Showing papers in "Design Automation for Embedded Systems in 2002"


Journal ArticleDOI
TL;DR: This paper compares three heuristic search algorithms: genetic algorithm (GA), simulated annealing (SA) and tabu search (TS), for hardware–software partitioning and shows that TS is superior to SA and GA in terms of both search time and quality of solutions.
Abstract: This paper compares three heuristic search algorithms: genetic algorithm (GA), simulated annealing (SA) and tabu search (TS), for hardware–software partitioning. The algorithms operate on functional blocks for designs represented as directed acyclic graphs, with the objective of minimising processing time under various hardware area constraints. The comparison involves a model for calculating processing time based on a non-increasing first-fit algorithm to schedule tasks, given that shared resource conflicts do not occur. The results show that TS is superior to SA and GA in terms of both search time and quality of solutions. In addition, we have implemented an intensification strategy in TS called penalty reward, which can further improve the quality of results.

142 citations
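As an illustration of the kind of move-based search the comparison above covers, here is a minimal tabu-search sketch for hardware–software partitioning. The task data, cost model (serial execution time under an area budget), and tabu tenure are all invented for illustration; the paper's scheduling-based processing-time model and its penalty-reward intensification are not reproduced here.

```python
import random

def cost(assign, tasks, area_budget):
    """Total processing time under a toy serial model; infeasible if the
    hardware area budget is exceeded."""
    time = sum(t["hw_time"] if a else t["sw_time"] for a, t in zip(assign, tasks))
    area = sum(t["hw_area"] for a, t in zip(assign, tasks) if a)
    return time if area <= area_budget else float("inf")

def tabu_partition(tasks, area_budget, iters=200, tenure=4, seed=0):
    rng = random.Random(seed)
    n = len(tasks)
    cur = [False] * n                      # start with everything in software
    best, best_cost = cur[:], cost(cur, tasks, area_budget)
    tabu = {}                              # task index -> iteration until which it is tabu
    for it in range(iters):
        candidates = []
        for i in range(n):
            nxt = cur[:]
            nxt[i] = not nxt[i]            # move: flip one task between HW and SW
            c = cost(nxt, tasks, area_budget)
            # aspiration: a tabu move is allowed if it beats the best known cost
            if tabu.get(i, -1) < it or c < best_cost:
                candidates.append((c, i, nxt))
        if not candidates:
            continue
        c, i, cur = min(candidates, key=lambda x: (x[0], rng.random()))
        tabu[i] = it + tenure              # forbid undoing this move for `tenure` iterations
        if c < best_cost:
            best, best_cost = cur[:], c
    return best, best_cost
```

On a tiny three-task instance the loop quickly settles on moving the two most profitable tasks to hardware while the third stays in software to respect the area budget.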


Journal ArticleDOI
TL;DR: A modular, flexible, and scalable heterogeneous multi-processor architecture template based on distributed shared memory is proposed and an efficient and transparent protocol for communication and (re)configuration is presented, enabling incremental design.
Abstract: The key issue in the design of Systems-on-a-Chip (SoC) is to trade off efficiency against flexibility, and time to market versus cost. Current deep submicron processing technologies enable integration of multiple software programmable processors (e.g., CPUs, DSPs) and dedicated hardware components into a single cost-efficient IC. Our top-down design methodology with various abstraction levels helps design these ICs in a reasonable amount of time. This methodology starts with a high-level executable specification, and converges towards a silicon implementation. A major task in the design process is to ensure that all components (hardware and software) communicate with each other correctly. In this article, we tackle this problem in the context of the signal processing domain in two ways: we propose a modular, flexible, and scalable heterogeneous multi-processor architecture template based on distributed shared memory, and we present an efficient and transparent protocol for communication and (re)configuration. The protocol implementations have been incorporated in libraries, which allows quick traversal of the various abstraction levels, so enabling incremental design. The design decisions to be taken at each abstraction level are evaluated by means of (co-)simulation. Prototyping is used too, to verify the system's functional correctness. The effectiveness of our approach is illustrated by a design case of a multi-standard video and image codec.

93 citations


Journal ArticleDOI
TL;DR: A number of high-level intermediate representations for compiling dataflow programs onto self-timed DSP platforms are reviewed, including representations for modeling the placement of interprocessor communication (IPC) operations; separating synchronization from data transfer during IPC; modeling and optimizing linear orderings of communication operations; performing accurate design space exploration under communication resource contention.
Abstract: Self-timed scheduling is an attractive implementation style for multiprocessor DSP systems due to its ability to exploit predictability in application behavior, its avoidance of over-constrained synchronization, and its simplified clocking requirements. However, analysis and optimization of self-timed systems under real-time constraints is challenging due to the complex, irregular dynamics of self-timed operation. In this paper, we review a number of high-level intermediate representations for compiling dataflow programs onto self-timed DSP platforms, including representations for modeling the placement of interprocessor communication (IPC) operations; separating synchronization from data transfer during IPC; modeling and optimizing linear orderings of communication operations; performing accurate design space exploration under communication resource contention; and exploring alternative processor assignments during the synthesis process. We review the structure of these representations, and discuss efficient techniques that operate on them to streamline scheduling, communication synthesis, and power management of multiprocessor DSP implementations.

61 citations


Journal ArticleDOI
TL;DR: A design flow is presented that finds critical software loops automatically and manually re-implements them in configurable logic by expressing them in SA-C, a C language variation supporting a dataflow computation model and designed to specify and map DSP applications onto reconfigurable logic.
Abstract: We examine the energy and performance benefits that can be obtained by re-mapping frequently executed loops from a microprocessor to reconfigurable logic. We present a design flow that finds critical software loops automatically and manually re-implements these loops in configurable logic by expressing them in SA-C, a C language variation supporting a dataflow computation model and designed to specify and map DSP applications onto reconfigurable logic. We apply this design flow to several examples from the MediaBench benchmark suite and report the energy and performance improvements.

50 citations


Journal ArticleDOI
TL;DR: A system-level design methodology for the efficient exploration of the architectural parameters of the memory sub-systems, from the energy-delay joint perspective, based on the EDP metric taking into consideration both performance and energy constraints is proposed.
Abstract: In this paper, we propose a system-level design methodology for the efficient exploration of the architectural parameters of the memory sub-systems, from the energy-delay joint perspective. The aim is to find the best configuration of the memory hierarchy without performing the exhaustive analysis of the parameters space. The target system architecture includes the processor, separated instruction and data caches, the main memory, and the system buses. To achieve a fast convergence toward the near-optimal configuration, the proposed methodology adopts an iterative local-search algorithm based on the sensitivity analysis of the cost function with respect to the tuning parameters of the memory sub-system architecture. The exploration strategy is based on the Energy-Delay Product (EDP) metric, taking into consideration both performance and energy constraints. The effectiveness of the proposed methodology has been demonstrated through the design space exploration of a real-world case study: the optimization of the memory hierarchy of a MicroSPARC2-based system executing the set of Mediabench benchmarks for multimedia applications. Experimental results have shown an optimization speedup of 2 orders of magnitude with respect to the full search approach, while the near-optimal system-level configuration found is within approximately 2% of the optimal full-search configuration.

44 citations
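The sensitivity-driven exploration above can be illustrated with a toy Energy-Delay Product search. The analytic energy and delay models below are invented stand-ins for simulation results, and the one-parameter-at-a-time greedy descent is only a sketch of the iterative local-search idea, not the paper's algorithm:

```python
def edp(cfg):
    """Toy energy-delay model over (log2 icache size, log2 dcache size).
    Bigger caches cut miss delay but cost energy; purely illustrative."""
    ic, dc = cfg
    delay = 100 / (1 + ic) + 100 / (1 + dc)      # fewer misses with bigger caches
    energy = 2 + 0.3 * (2 ** ic + 2 ** dc)       # per-access energy grows with size
    return energy * delay

def local_search(start, lo=0, hi=6):
    """Greedy sensitivity-style descent: repeatedly move one cache parameter
    by one step in the direction that improves EDP the most."""
    cur = start
    while True:
        neighbors = []
        for i in (0, 1):
            for step in (-1, 1):
                v = list(cur)
                v[i] += step
                if lo <= v[i] <= hi:
                    neighbors.append(tuple(v))
        best = min(neighbors, key=edp)
        if edp(best) >= edp(cur):                # no neighbor improves: done
            return cur
        cur = best
```

On this toy model the descent evaluates only a handful of configurations yet reaches the same optimum an exhaustive sweep of the 7×7 grid would find, which is the kind of saving versus full search the abstract reports.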


Journal ArticleDOI
TL;DR: It is shown that high-performing FPGA realizations can in principle be obtained in a fraction of the design time currently employed to realize a parameterized implementation.
Abstract: Compaan is a software tool that is capable of automatically translating nested loop programs, written in Matlab, into parallel process network descriptions suitable for implementation in hardware. In this article, we show a methodology and tool to convert these process networks into FPGA implementations. We will show that we can in principle obtain high-performing realizations in a fraction of the design time currently employed to realize a parameterized implementation. This allows us to rapidly explore a range of transformations, such as loop unrolling and skewing, to generate a circuit that meets the requirements of a particular application. The QR decomposition algorithm is used to demonstrate the capability of the tool. We present results showing how the number of clock cycles and calculations-per-second vary with these transformations using a simple implementation of the function units. We also provide an indication of what we expect to achieve in the near future once the tools are completed and the transformations are applied to parallel, highly pipelined implementations of the function units.

43 citations


Journal ArticleDOI
TL;DR: A formal approach to the development of embedded controllers for a railway, using correctness-preserving refinement to add implementation detail to the models and to decompose them into sub-systems, arriving at models of individual controllers; the B Method is used as the formal notation and methodology.
Abstract: We describe a formal approach to the development of embedded controllers for a railway. The approach starts with a system-level specification modeling the system under control and the desired control behavior. Correctness-preserving refinement is then used to add more and more implementation detail to the models and to decompose the models into sub-systems to arrive at models of individual controllers. The B Method is used as the formal notation and methodology.

31 citations


Journal ArticleDOI
TL;DR: A novel, efficient, small and very simple hardware unit, the SoC Lock Cache (SoCLC), which resolves the critical section (CS) interactions among multiple processors and improves performance in terms of lock latency, lock delay and bandwidth consumption in a shared-memory multiprocessor SoC.
Abstract: In this dissertation, we implement efficient lock-based synchronization by a novel, high performance, simple and scalable hardware technique and associated software for a target shared-memory multiprocessor System-on-a-Chip (SoC). The custom hardware part of our solution is provided in the form of an intellectual property (IP) hardware unit which we call the SoC Lock Cache (SoCLC). SoCLC provides effective lock hand-off by reducing on-chip memory traffic and improving performance in terms of lock latency, lock delay and bandwidth consumption. The proposed solution is independent from the memory hierarchy, cache protocol and the processor architectures used in the SoC, which enables easily applicable implementations of the SoCLC (e.g., as a reconfigurable or partially/fully custom logic), and which distinguishes SoCLC from previous approaches. Furthermore, the SoCLC mechanism has been extended to support priority inheritance with an immediate priority ceiling protocol (IPCP) implemented in hardware, which enhances the hard real-time performance of the system. Our experimental results in a four-processor SoC indicate that SoCLC can achieve up to 37% overall speedup over spin-lock and up to 48% overall speedup over MCS for a microbenchmark with false sharing. The priority inheritance implemented as part of the SoCLC hardware, on the other hand, achieves 1.43X speedup in overall execution time of a robot application when compared to the priority inheritance implementation under the Atalanta real-time operating system. Furthermore, it has been shown that with the IPCP mechanism integrated into the SoCLC, all of the tasks of the robot application could meet their deadlines (e.g., a high priority task with 250us worst case response time could complete its execution in 93us with SoCLC, however the same task missed its deadline by completing its execution in 283us without SoCLC). 
Therefore, with IPCP support, our solution can provide better real-time guarantees for real-time systems. To automate SoCLC design, we have also developed an SoCLC-generator tool, PARLAK, that generates user specified configurations of a custom SoCLC. We used PARLAK to generate SoCLCs from a version for two processors with 32 lock variables occupying 2,520 gates up to a version for fourteen processors with 256 lock variables occupying 78,240 gates.

27 citations


Journal ArticleDOI
TL;DR: A new design tool framework called IMPACCT is proposed, which correctly combines the state-of-the-art techniques at the system level, thereby saving even experienced designers from many pitfalls of system-level power management.
Abstract: Power-aware systems are those that must exploit a wide range of power/performance trade-offs in order to adapt to the power availability and application requirements. They require the integration of many novel power management techniques, ranging from voltage scaling to subsystem shutdown. However, those techniques do not always compose synergistically with each other; in fact, they can combine subtractively and often yield counterintuitive, and sometimes incorrect, results in the context of a complete system. This can become a serious problem as more of these power-aware systems are being deployed in mission-critical applications. To address the problem of technique integration for power-aware embedded systems, we propose a new design tool framework called IMPACCT and the associated design methodology. The system modeling methodology includes an application model for capturing timing/power constraints and mode dependencies at the system level. The tool performs power-aware scheduling and mode selection to ensure that all timing/power constraints are satisfied and that all overhead is taken into account. IMPACCT then synthesizes the implementation targeting a symmetric multiprocessor platform. Experimental results show that the increased dynamic range of power/performance settings enabled a Mars rover to achieve significant acceleration while using less energy. More importantly, our tool correctly combines the state-of-the-art techniques at the system level, thereby saving even experienced designers from many pitfalls of system-level power management.

20 citations


Journal ArticleDOI
TL;DR: This work presents an approach that extends instruction and data cache modeling from basic blocks to program segments thereby increasing the overall running time analysis precision and combines it with data flow analysis based prediction of cache line contents.
Abstract: Verification of software running time is essential in embedded system design with real-time constraints. Simulation with incomplete test patterns is unsafe for complex architectures when software running times are input data dependent. Formal analysis of such dependencies leads to software running time intervals rather than single values. These intervals depend on program properties, execution paths and states of processes, as well as on the target architecture. In the target architecture, caches have a major influence on software running time. Current cache analysis techniques as a part of running time analysis approaches combine basic block level cache modeling with explicit or implicit program path analysis. We present an approach that extends instruction and data cache modeling from basic blocks to program segments, thereby increasing the overall running time analysis precision. We combine it with data flow analysis based prediction of cache line contents. This novel cache analysis approach shows high precision in the presented experiments.

16 citations


Journal ArticleDOI
TL;DR: GEZEL is proposed, a design environment consisting of a design language and an implementation methodology for domain-specific processors, such as the security processors used to implement cryptographic algorithms under high-throughput and/or low-energy-consumption constraints.
Abstract: Security processors are used to implement cryptographic algorithms with high throughput and/or low energy consumption constraints. The design of these processors is a balancing act between flexibility and energy consumption. The target is to create a processor with just enough programmability to cover a set of algorithms--an application domain. This paper proposes GEZEL, a design environment consisting of a design language and an implementation methodology that can be used for such domain specific processors. We use the security domain as driver, and discuss the impact of the domain on the target architecture. We also present a methodology to create, refine and verify a security processor.

Journal ArticleDOI
TL;DR: A fast and simple algorithm for sharing resources in multiprocessor systems is presented, together with an innovative procedure for assigning a preemption threshold to tasks; together these allow the use of a single user stack.
Abstract: The primary goal for real-time kernel software for single and multiple-processor-on-a-chip systems is to support the design of timely and cost-effective systems. The kernel must provide time guarantees, in order to predict the timely behavior of the application; an extremely fast response time, in order not to waste computing power outside of the application cycles; and it must save as much RAM space as possible in order to reduce the overall cost of the chip. The research on real-time software systems has produced algorithms that make it possible to effectively schedule system resources while guaranteeing the deadlines of the application, and to group tasks into a very small number of non-preemptive sets which require much less RAM for stack space. Unfortunately, up to now the research focus has been on time guarantees rather than on the optimization of RAM usage. Furthermore, these techniques do not apply to multiprocessor architectures, which are likely to be widely used in future microcontrollers. This paper presents innovative scheduling and optimization algorithms that effectively solve the problem of guaranteeing schedulability with extremely low operating system overhead while minimizing RAM usage. We developed a fast and simple algorithm for sharing resources in multiprocessor systems, together with an innovative procedure for assigning a preemption threshold to tasks. These allow the use of a single user stack. The experimental part shows the effectiveness of a simulated-annealing-based tool that finds a schedulable system configuration starting from the selection of a near-optimal task allocation. When used in conjunction with our preemption threshold assignment algorithm, our tool further reduces the RAM usage in multiprocessor systems.
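The preemption-threshold idea above can be sketched concretely: two tasks can share a non-preemptive group when neither task's priority exceeds the other's threshold, and a single shared stack then only needs room for one frame per group. The greedy first-fit grouping and the task data below are illustrative, not the paper's assignment algorithm:

```python
def can_share(t1, t2):
    """Two tasks are mutually non-preemptive (so they can share a group) when
    neither has priority above the other's preemption threshold."""
    return t1["pri"] <= t2["thresh"] and t2["pri"] <= t1["thresh"]

def group_tasks(tasks):
    """Greedy first-fit grouping into mutually non-preemptive sets,
    highest priority first."""
    groups = []
    for t in sorted(tasks, key=lambda t: -t["pri"]):
        for g in groups:
            if all(can_share(t, u) for u in g):
                g.append(t)
                break
        else:
            groups.append([t])
    return groups

def stack_bound(tasks):
    """Worst-case shared-stack size: at most one task per group can be on
    the stack at any time, so sum the per-group maxima."""
    return sum(max(t["stack"] for t in g) for g in group_tasks(tasks))
```

With four tasks whose thresholds pair them into two groups, the bound is the sum of the two largest per-group frames rather than the sum of all four frames under full preemption.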

Journal ArticleDOI
TL;DR: A novel method (HASoC) for developing embedded systems targeted at system-on-a-chip implementations; the method supports a lifecycle that explicitly separates the behavior of a system from its implementation technology.
Abstract: We present a novel method (HASoC) for developing embedded systems that are targeted at system-on-a-chip implementations. The object-oriented development method is based on the experiences of using our existing MOOSE technique and supports a lifecycle that explicitly separates the behavior of a system from its implementation technology. The design process, which uses a notation based on extensions to UML-RT, begins with the incremental development and validation of an abstract executable model of a system. Subsequently, this model is partitioned into hardware and software sub-systems to create a committed model, which is mapped onto a system platform that defines the implementation environment. The methodology emphasizes the reuse of pre-existing hardware and software platforms to ease the development process. A partial example application is presented in order to illustrate the main concepts in our methodology.

Journal ArticleDOI
TL;DR: The proposed approach has been applied to the Lx family of scalable embedded VLIW processors, jointly designed by STMicroelectronics and HP Labs, and demonstrated an average accuracy of 5% of the instruction-level estimation engine with respect to the RTL engine, with an average speed-up of four orders of magnitude.
Abstract: This paper describes a technique for modeling and estimating the power consumption at the system level for embedded VLIW (Very Long Instruction Word) architectures. The method is based on a hierarchy of dynamic power estimation engines: from the instruction level down to the gate/transistor level. Power macro-models have been developed for the main components of the system: the VLIW core, the register file, the instruction and data caches. The main goal is to define a system-level simulation framework for the dynamic profiling of the power behavior during the software execution, providing also a break-down of the power contributions due to the single components of the system. The proposed approach has been applied to the Lx family of scalable embedded VLIW processors, jointly designed by STMicroelectronics and HP Labs. Experimental results, carried out over a set of benchmarks for embedded multimedia applications, have demonstrated an average accuracy of 5% of the instruction-level estimation engine with respect to the RTL engine, with an average speed-up of four orders of magnitude.

Journal ArticleDOI
TL;DR: This paper presents a methodology for designing and evaluating high-speed data acquisition systems using reprogrammable platforms.
Abstract: Complex embedded systems that do not target mass markets often have design and engineering costs that exceed production costs. One example is the triggering and data acquisition system (DAQ) integrated into high-energy physics experiments. Parameterizable and reprogrammable architectures are natural candidates as platforms for specialized embedded systems like high-speed data acquisition systems. In order to facilitate the design of specialized embedded systems, design strategies and tools are needed that greatly increase the efficiency of the design process. End-user programmability of reprogrammable platforms is essential, because system designers, without training in low-level programming languages, are required to change the base design, compare designs, and generate configuration data for the reprogrammable platforms. This paper presents a methodology for designing and evaluating high-speed data acquisition systems using reprogrammable platforms.

Journal ArticleDOI
TL;DR: How LOPOCOS can support the system designer in identifying energy-efficient hardware/software implementations for the desired embedded systems is demonstrated by highlighting the necessary optimization steps during design space exploration for DVS-enabled architectures.
Abstract: In this paper, we introduce the LOPOCOS (Low Power Co-synthesis) system, a prototype CAD tool for system-level co-design. LOPOCOS targets the design of energy-efficient embedded systems implemented as heterogeneous distributed architectures. In particular, it is designed to solve the specific problems involved in architectures that include dynamic voltage scalable (DVS) processors. The aim of this paper is to demonstrate how LOPOCOS can support the system designer in identifying energy-efficient hardware/software implementations for the desired embedded systems, highlighting the necessary optimization steps during design space exploration for DVS-enabled architectures. The optimization steps carried out in LOPOCOS involve component allocation and task/communication mapping as well as scheduling and dynamic voltage scaling. LOPOCOS has the following key features, which contribute to this energy efficiency. During voltage scaling, power profile information of task execution is taken into account, which improves the accuracy of the energy estimation. A combined optimization for scheduling and communication mapping, based on a genetic algorithm, simultaneously optimizes execution order and communication mapping towards the utilization of the DVS processors and the timing behaviour. Furthermore, a separation of task and communication mapping allows a more effective implementation of both optimization steps. Extensive experiments are conducted to demonstrate the efficiency of LOPOCOS. We report up to 38% higher energy reductions compared to previous co-synthesis techniques for DVS systems. The investigations include a real-life example of an optical flow detection algorithm.
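For orientation, the energy leverage that DVS gives a co-synthesis tool like the one above rests on the common simplified model f ∝ V and E_cycle ∝ V²: stretching execution into available slack by a factor k divides dynamic energy by roughly k². A minimal sketch under that assumption (the function name and all numbers are illustrative, not from the paper):

```python
def dvs_energy(tasks_cycles, f_max, deadline):
    """Normalized dynamic energy when the clock is scaled uniformly so the
    task set just meets its deadline. Assumes the simplified model f ∝ V and
    E_cycle ∝ V^2, so slowing down by factor k divides energy by k^2."""
    total = sum(tasks_cycles)
    t_at_fmax = total / f_max
    if t_at_fmax > deadline:
        raise ValueError("infeasible even at f_max")
    k = deadline / t_at_fmax           # available slowdown factor (>= 1)
    v_scale = 1.0 / k                  # voltage scales with frequency
    return total * v_scale ** 2        # in units of E_cycle at nominal voltage
```

For example, two tasks totalling 5M cycles on a 100 MHz processor finish in 0.05 s; with a 0.1 s deadline the clock can be halved, cutting dynamic energy to a quarter of the full-speed value.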

Journal ArticleDOI
TL;DR: A new technique for the detection of Integrated Circuits within images of Printed Circuit Boards autonomously and without the need to be assisted by CAD data is presented, and results showing the reduction in complexity when compared to a Hough Transform are presented.
Abstract: This paper presents a new technique for the detection of Integrated Circuits within images of Printed Circuit Boards autonomously and without the need to be assisted by CAD data. The technique is a key part of a suite of algorithms targeted at an embedded System On Chip architecture based on the ARM7 platform for real-time detection of PCB images for diagnostic purposes. The technique has a significant reduction in complexity when compared to conventional approaches such as the Hough Transform. The reduction in complexity makes the approach ideal for an embedded vision application such as the one described in this paper. This paper presents the technique, the target embedded architecture and results showing the reduction in complexity when compared to a Hough Transform.

Journal ArticleDOI
TL;DR: A compositional framework, together with its supporting toolset, for hardware/software co-design, based on Interval Temporal Logic and its executable subset, Tempura; refinement derives both the software and hardware parts of the implementation from a single formal specification of the system, while preserving all properties of the system specification.
Abstract: We describe a compositional framework, together with its supporting toolset, for hardware/software co-design. Our framework is an integration of a formal approach within a traditional design flow. The formal approach is based on Interval Temporal Logic and its executable subset, Tempura. Refinement is the key element in our framework because it derives the software and hardware parts of the implementation from a single formal specification of the system, while preserving all properties of the system specification. During refinement, simulation is used to choose the appropriate refinement rules, which are applied automatically in the HOL system. The framework is illustrated with two case studies. The work presented is part of a UK collaborative research project between the Software Technology Research Laboratory at De Montfort University and the Oxford University Computing Laboratory.

Journal ArticleDOI
TL;DR: The rationale for developing the distributed semantics of Virtuoso's microkernel is described, together with some of the implementation issues; extensions of the model towards heterogeneous embedded target systems are discussed.
Abstract: Virtuoso VSP is a fully distributed real-time operating system originally developed on the Inmos transputer. Its generic architecture is based on a small but very fast nanokernel and a portable preemptive microkernel. It was subsequently ported, in single and virtual single processor implementations, to a wide range of processors. This paper describes the rationale for developing the distributed semantics of Virtuoso's microkernel and describes some of the implementation issues. The analysis is based on the parallel DSP implementations, as these push the performance limits most for hard real-time applications. Extensions of the model towards heterogeneous embedded target systems are discussed.

Journal ArticleDOI
TL;DR: The energy benefits of combining the configurable features of voltage scaling and cache way shutdown in a single platform are illustrated and methods to assist a designer to tune such a platform to a particular software task and to particular energy optimization criteria are described.
Abstract: System-on-a-chip platform manufacturers are increasingly adding configurable features that provide power and performance flexibility, in order to increase a platform's applicability to a variety of embedded computing systems. We illustrate the energy benefits of combining the configurable features of voltage scaling and cache way shutdown in a single platform. We describe methods to assist a designer to tune such a platform to a particular software task and to particular energy optimization criteria.
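A tiny sketch of the tuning problem described above: score the cross-product of voltage levels and active cache ways under toy delay/energy models, and keep the cheapest configuration that meets a performance constraint. All models and numbers here are invented for illustration; the methods in the paper are not reproduced:

```python
# Toy configurable platform: supply voltages (frequency assumed ∝ V)
# and the number of active cache ways.
VOLTAGES = [1.0, 1.2, 1.4]
WAYS = [1, 2, 4]

def delay(v, ways, base_cycles=1e6, miss_penalty=2e5):
    f = 1e8 * v                          # Hz; toy model f ∝ V
    miss_cycles = miss_penalty * (4 / ways)  # fewer ways -> more miss cycles
    return (base_cycles + miss_cycles) / f   # seconds

def energy(v, ways, base_cycles=1e6, miss_penalty=2e5):
    miss_cycles = miss_penalty * (4 / ways)
    cycles = base_cycles + miss_cycles
    # per-cycle energy grows with V^2 and with the number of powered ways
    return cycles * v ** 2 * (0.5 + 0.125 * ways)

def tune(deadline):
    """Exhaustive tuning: cheapest (voltage, ways) pair meeting the deadline."""
    feasible = [(energy(v, w), v, w) for v in VOLTAGES for w in WAYS
                if delay(v, w) <= deadline]
    if not feasible:
        raise ValueError("no configuration meets the deadline")
    return min(feasible)[1:]
```

Note how the best setting shifts with the constraint: a tight deadline favors all ways powered at low voltage, while a looser one lets the tuner shut ways down too.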

Journal ArticleDOI
TL;DR: A software pipelining framework, CALiBeR (Cluster Aware Load Balancing Retiming Algorithm), suitable for compilers targeting clustered embedded VLIW processors; experiments demonstrate that the algorithm compares favorably with one of the best state-of-the-art algorithms.
Abstract: This paper proposes a software pipelining framework, CALiBeR (Cluster Aware Load Balancing Retiming Algorithm), suitable for compilers targeting clustered embedded VLIW processors. CALiBeR can be used by embedded system designers to explore different code optimization alternatives, that is, high-quality customized retiming solutions for desired throughput and program memory size requirements, while minimizing register pressure. An extensive set of experimental results is presented, demonstrating that our algorithm compares favorably with one of the best state-of-the-art algorithms, achieving up to 50% improvement in performance and up to 47% improvement in register requirements. In order to empirically assess the effectiveness of clustering for high-ILP applications, additional experiments are presented contrasting the performance achieved by software-pipelined kernels executing on clustered and on centralized machines.

Journal ArticleDOI
TL;DR: This paper presents a novel approach to computing tight upper bounds on the processor utilization for general real-time systems where tasks are composed of subtasks and precedence constraints may exist among subtasks of the same task.
Abstract: This paper presents a novel approach to computing tight upper bounds on the processor utilization for general real-time systems where tasks are composed of subtasks and precedence constraints may exist among subtasks of the same task. By careful analysis of preemption effects among tasks, the problem is formulated as a set of linear programming (LP) problems. Observations are made to reduce the number of LP problem instances required to be solved, which greatly improves the computation time of the utilization bounds. Furthermore, additional constraints are allowed to be included under certain circumstances to improve the quality of the bounds.
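The LP-based bounds of the paper are not reproduced here; as a reference point, the classical Liu and Layland rate-monotonic utilization bound that such analyses tighten can be sketched as:

```python
def ll_bound(n):
    """Classical Liu-Layland bound for n independent periodic tasks under
    rate-monotonic scheduling: schedulable if U <= n(2^(1/n) - 1)."""
    return n * (2 ** (1 / n) - 1)

def rm_schedulable(tasks):
    """Sufficient (not necessary) test: tasks are (wcet, period) pairs."""
    u = sum(c / t for c, t in tasks)
    return u <= ll_bound(len(tasks))
```

The bound decreases toward ln 2 ≈ 0.693 as n grows; bounds tailored to task structure, such as the LP-derived ones above, admit higher utilizations.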

Journal ArticleDOI
TL;DR: This paper describes the development of CADRE (Configurable Asynchronous DSP for Reduced Energy), a 750K transistor, high performance, low-power digital signal processor IP block intended for digital mobile phone chipsets.
Abstract: Asynchronous design techniques have a number of compelling features that make them suited for complex system-on-chip designs. However, it is necessary to develop practical and efficient design techniques to overcome the present shortage of commercial design tools. This paper describes the development of CADRE (Configurable Asynchronous DSP for Reduced Energy), a 750K-transistor, high-performance, low-power digital signal processor IP block intended for digital mobile phone chipsets. A short time period was available for the project, so a methodology was developed that allowed high-level simulation of the design at the earliest possible stage within the conventional schematic entry environment and simulation tools used for later circuit-level performance and power consumption assessment. Initial modeling was based on C behavioral models of the various data and control components, with the many asynchronous control circuits required generated automatically from their specifications. This enabled design options to be explored early. Unusual features of the design, such as the Register Bank, which is designed to exploit data access patterns, are presented along with the power and performance results of the processor as a whole.

Journal ArticleDOI
TL;DR: The necessary and sufficient condition for achieving the maximum throughput in a given pipeline operating under modulo scheduling is established, based on which a methodology for designing the hardware pipelines that achieve such a throughput is developed.
Abstract: Exploiting instruction-level parallelism (ILP) is extremely important for achieving high performance in application specific instruction set processors (ASIPs) and embedded processors. Unlike conventional general purpose processors, ASIPs and embedded processors typically run a single application and hence must be optimized extensively for this in order to extract maximum performance. Further, low power and low cost requirements of ASIPs may demand reuse of pipeline stages causing pipelines with complex structural hazards. In such architectures, exploiting higher ILP is a major challenge to the designer. Existing techniques deal with either scheduling hardware pipelines to obtain higher throughput or software pipelining--an instruction scheduling technique for iterative computation--for exploiting greater ILP. We integrate these techniques to co-schedule hardware and software pipelines to achieve greater instruction throughput. In this paper, we develop the underlying theory of Co-Scheduling, called the Modulo-Scheduled Pipeline (or MS-Pipeline) theory. More specifically, we establish the necessary and sufficient condition for achieving the maximum throughput in a given pipeline operating under modulo scheduling. Further, we establish a sufficient condition to achieve a specified throughput, based on which we also develop a methodology for designing the hardware pipelines that achieve such a throughput. Further, we present initial experimental results which help to establish the usefulness of MS-pipeline theory in software pipelining. As the proposed theory helps to analyze and improve the throughput of Modulo-Scheduled Pipelines (MS-pipelines), it is especially useful in designing ASIPs and embedded processors.
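The throughput conditions above concern modulo-scheduled pipelines; the standard lower bound on the initiation interval (MII) that modulo scheduling works against can be sketched as follows. The operation counts and dependence cycles below are illustrative examples, not data from the paper:

```python
import math

def res_mii(op_counts, resource_counts):
    """Resource-constrained bound: each resource class limits the initiation
    interval to ceil(uses per iteration / available units)."""
    return max(math.ceil(op_counts[r] / resource_counts[r]) for r in op_counts)

def rec_mii(cycles):
    """Recurrence-constrained bound: for each dependence cycle (given as a
    (total latency, total dependence distance) pair),
    II >= ceil(latency / distance)."""
    return max(math.ceil(lat / dist) for lat, dist in cycles)

def mii(op_counts, resource_counts, cycles):
    """Minimum initiation interval: the larger of the two bounds."""
    return max(res_mii(op_counts, resource_counts),
               rec_mii(cycles))
```

For a loop with 6 ALU ops on 2 ALUs and 3 memory ops on 1 port, resources alone allow II = 3, but a dependence cycle of latency 4 and distance 1 pushes the achievable II to 4; structural hazards from reused pipeline stages, as studied above, tighten the resource side further.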