Author

Michael Glaß

Bio: Michael Glaß is an academic researcher from University of Ulm. The author has contributed to research in topics: Design space exploration & Solver. The author has an h-index of 5, co-authored 12 publications receiving 86 citations. Previous affiliations of Michael Glaß include University of Erlangen-Nuremberg.

Papers
Journal Article
TL;DR: A novel meta-heuristic DSE approach eliminates architectural symmetries by abstracting the problem to a clustering of tasks and their mapping to processor types; experiments show that a DSE equipped with the symmetry-eliminating search space and the proposed learning techniques clearly outperforms a state-of-the-art approach from the literature in terms of the quality of the gained implementation classes.
Abstract: Large-scale many-core systems are able to execute concurrently changing mixes of different parallel applications. Hybrid application mapping combines the strengths of design-time exploration/analysis of resource constellations for task-to-core mappings with the flexibility of choosing concrete mappings at run time. However, state-of-the-art design space exploration (DSE) techniques so far ignore the problem of symmetries in modern heterogeneous architectures: not only recurring patterns in the architecture but also the mapping of tasks to instances of the same processor type may unnecessarily increase the search space through redundant, symmetrical implementations, which typically degrades the quality of the DSE. As a remedy, we propose a novel meta-heuristic DSE approach that eliminates architectural symmetries by abstracting the problem to a clustering of tasks and their mapping to processor types. However, we demonstrate that simple task clustering and type mappings may again introduce encoding symmetries in our search space. Thus, we present a formulation of the task clustering and type mapping as a 0–1 integer linear program (ILP) which eliminates all architectural as well as encoding symmetries from the search space. We also contribute a formal feasibility check to ensure that only implementations with at least one feasible concrete mapping are considered. To further improve the search process for feasible solutions, we apply satisfiability modulo theories-like learning techniques: from each infeasible implementation, we extract conditions why the implementation is infeasible and enrich our 0–1 ILP by additional constraints continuously during the DSE. Experimental results show that a DSE equipped with the novel symmetry-eliminating search space and the proposed learning techniques clearly outperforms a state-of-the-art approach known from literature in terms of the quality of the gained implementation classes.
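
To make the clustering-and-type-mapping encoding concrete, here is a minimal 0-1 ILP sketch in Python using the open-source PuLP modeller. It only illustrates the symmetry-breaking idea, not the paper's actual formulation: the variables y (task-to-cluster), z (cluster-to-type), u (cluster used), the toy objective, and the instance are all assumptions.

import pulp

tasks = range(4)                    # toy instance: 4 tasks
types = ["RISC", "DSP"]             # 2 processor types
clusters = range(len(tasks))        # at most one cluster per task

prob = pulp.LpProblem("cluster_and_type_map", pulp.LpMinimize)

# y[t][c] = 1 iff task t belongs to cluster c
y = pulp.LpVariable.dicts("y", (tasks, clusters), cat="Binary")
# z[c][p] = 1 iff cluster c is mapped to processor type p
z = pulp.LpVariable.dicts("z", (clusters, types), cat="Binary")
# u[c] = 1 iff cluster c is used at all
u = pulp.LpVariable.dicts("u", clusters, cat="Binary")

for t in tasks:
    # every task belongs to exactly one cluster
    prob += pulp.lpSum(y[t][c] for c in clusters) == 1
    # encoding-symmetry breaking: task t may only open clusters 0..t, so
    # permuting cluster indices cannot yield a second, equivalent encoding
    for c in clusters:
        if c > t:
            prob += y[t][c] == 0

for c in clusters:
    # a used cluster gets exactly one processor type, an unused one gets none
    prob += pulp.lpSum(z[c][p] for p in types) == u[c]
    for t in tasks:
        prob += y[t][c] <= u[c]

# toy objective: prefer few clusters (stand-in for the real DSE objectives)
prob += pulp.lpSum(u[c] for c in clusters)
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("clusters used:", int(pulp.value(prob.objective)))

The y[t][c] == 0 constraint for c > t is the standard representative-based symmetry break for set partitioning: each cluster must be opened by its lowest-indexed task.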

29 citations

Book Chapter
28 Oct 2013
TL;DR: This work introduces a model transformation framework that converts a Simulink model to an executable specification, written in an actor-oriented modeling language, which serves as the input to well-established Electronic System Level (ESL) design flows and enables Design Space Exploration (DSE) and automatic code generation for both hardware and software.
Abstract: Matlab/Simulink is today's de-facto standard for model-based design in domains such as control engineering and signal processing. Particular strengths of Simulink are rapid design and algorithm exploration. Moreover, commercial tools are available to generate embedded C or HDL code directly from a Simulink model. On the other hand, Simulink models are purely functional models and, hence, designers cannot seamlessly consider the architecture that a Simulink model is later implemented on. In particular, it is not possible to explore the different architectural alternatives and investigate the arising interactions and side effects directly within Simulink. To benefit from Matlab/Simulink's algorithm exploration capabilities and overcome the outlined drawbacks, this work introduces a model transformation framework that converts a Simulink model to an executable specification, written in an actor-oriented modeling language. This specification then serves as the input to well-established Electronic System Level (ESL) design flows that, e.g., enable Design Space Exploration (DSE) and automatic code generation for both hardware and software. We also present a validation technique that checks functional correctness by comparing the original Simulink model with the generated specification in a co-simulation environment. The co-simulation can also be used to evaluate the performance of implementation candidates during DSE. As a case study, we present and investigate a torque-vectoring application from an electric vehicle.
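
The validation step lends itself to a short sketch: drive both models with the same stimulus and compare the output traces sample by sample. The Python below is a hypothetical stand-in with placeholder callables instead of real Simulink and actor-model backends:

def simulink_model(u):        # placeholder for the original Simulink model
    return 2.0 * u + 1.0

def generated_actor_spec(u):  # placeholder for the generated specification
    return 2.0 * u + 1.0

def cosim_validate(stimulus, tol=1e-9):
    """Return True iff both models agree on every sample of the stimulus."""
    for k, sample in enumerate(stimulus):
        ref, out = simulink_model(sample), generated_actor_spec(sample)
        if abs(ref - out) > tol:
            print(f"mismatch at sample {k}: {ref} vs {out}")
            return False
    return True

print(cosim_validate([0.0, 0.5, 1.0, -3.25]))   # True for these stand-ins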

16 citations

Proceedings Article
01 Sep 2016
TL;DR: It is shown that statically proven quality guarantees may be enforced on many multi-core architectures by the presented hybrid mapping approach; a real-world case study from the domain of heterogeneous robot vision demonstrates the capabilities of this approach to guarantee statically analyzed best- and worst-case timing requirements on latency and throughput.
Abstract: The predictability of execution qualities including timeliness, power consumption, and fault-tolerability is of utmost importance for the successful introduction of multi-core architectures in embedded systems requiring guarantees rather than best-effort behavior. Examples are real-time and/or safety-critical parallel applications. In particular for future many-core architectures, analysis tools for proving such properties to hold for a given application irrespective of other workload either suffer from computational complexity, or their sound bounds are of no practical interest due to severe interference of resources and software at multiple levels. In view of abundant computational and memory resources becoming available, we propose to apply the principles of invasive computing to avoid sharing of resources at run time as much as possible. We subsequently show that statically proven quality guarantees may be enforced on many multi-core architectures by the presented hybrid mapping approach. Rather than fixed resource mappings, this approach provides only constellations of resource allocations to the run-time system, which searches for such constellations and assigns the invader a suitable claim of resources, if possible. We have implemented this hybrid approach and the interface to the language InvadeX10, a library-based extension of the X10 programming language. In this extension, so-called requirements on execution qualities such as deadlines (e.g., in the form of latency constraints) may be annotated to individual programs or even program segments. These are then translated into satisfying resource constellations that must be found at run time before a parallel application is admitted to start, or to continue, in view of its required execution qualities. We give a real-world case study from the domain of heterogeneous robot vision to demonstrate the capabilities of this approach to guarantee statically analyzed best- and worst-case timing requirements on latency and throughput.
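
The run-time side of this hybrid scheme can be sketched as a feasibility check: given a resource constellation, here abstracted to required core counts per type, either claim a matching set of free cores or refuse admission. The Python below is illustrative only; the names and data structures are assumptions, not the InvadeX10 interface:

from collections import Counter

def find_claim(constellation, free_cores):
    """constellation: {"RISC": 2, "DSP": 1}; free_cores: [(core_id, type), ...].
    Returns a claim (list of core ids) or None if no feasible claim exists."""
    available = Counter(core_type for _, core_type in free_cores)
    if any(available[ty] < n for ty, n in constellation.items()):
        return None  # admission fails: the guarantees cannot be enforced now
    claim, needed = [], dict(constellation)
    for core_id, core_type in free_cores:
        if needed.get(core_type, 0) > 0:
            claim.append(core_id)
            needed[core_type] -= 1
    return claim

free = [(0, "RISC"), (1, "RISC"), (2, "DSP"), (3, "DSP")]
print(find_claim({"RISC": 2, "DSP": 1}, free))  # [0, 1, 2]
print(find_claim({"DSP": 3}, free))             # None -> do not admit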

16 citations

Proceedings Article
01 Jul 2016
TL;DR: This paper proposes a novel top-down system synthesis approach with additional support for the composition of subsystems that is based on the use of hierarchical mapping edges and a list-based scheduling algorithm using distributed priority queues.
Abstract: Typically, state-of-the-art approaches in system synthesis do not consider the trend in embedded systems design towards systems-of-systems, where optimized subsystems exist from previous projects or as 3rd-party IP. In this paper, we propose a novel top-down system synthesis approach with additional support for the composition of subsystems that is based on the use of hierarchical mapping edges and a list-based scheduling algorithm using distributed priority queues. The proposed method not only enables the composition of existing subsystems; experimental results also show a significant reduction of the design space while maintaining a good quality of the implemented systems. Especially for large network-on-chip (NoC) systems, our approach outperforms an existing top-down methodology by nearly 50% in solving time and by 11% in average quality.
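
As a rough illustration of list-based scheduling with distributed priority queues, the Python sketch below keeps one priority queue per resource and drains each queue in priority order. Task data, priorities, and execution times are invented, and the real approach additionally handles hierarchical mapping edges and inter-task dependencies:

import heapq
from collections import defaultdict

# (task, mapped resource, priority, execution time): illustrative values only
tasks = [("t0", "cpu0", 0, 3), ("t1", "cpu0", 1, 2),
         ("t2", "cpu1", 0, 4), ("t3", "cpu1", 2, 1)]

queues = defaultdict(list)            # one priority queue per resource
for name, res, prio, wcet in tasks:
    heapq.heappush(queues[res], (prio, name, wcet))

schedule, finish = [], {}
for res, queue in queues.items():
    t = 0.0
    while queue:                      # drain this resource in priority order
        prio, name, wcet = heapq.heappop(queue)
        schedule.append((name, res, t, t + wcet))
        t += wcet
    finish[res] = t

for entry in sorted(schedule, key=lambda e: (e[1], e[2])):
    print(entry)
print("makespan:", max(finish.values()))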

8 citations

Proceedings Article
01 Oct 2019
TL;DR: A data-driven approach for detecting workload scenarios and exploring scenario-optimized mappings based on a collection of input data is provided; with it, the latency of two exemplary applications, ray tracing and image stitching, is significantly improved.
Abstract: For applications whose workload and execution behavior significantly vary with the input, a single mapping of application tasks to a given target architecture is insufficient. A single mapping may deliver a high-quality solution for the average case but rarely exploits the specific execution behavior of concurrent tasks triggered by each input tuple. For example, tasks with higher computational demands under certain input should be mapped onto high-performance resources of the heterogeneous architecture. This necessitates mappings that are specialized for specific input data. Yet, due to the large number of input combinations, determining a separate optimized mapping for each individual input workload is not feasible for most applications. As a remedy, we propose to group input data with similar execution characteristics into a selected, small number of so-called workload scenarios for which we supply optimized mappings. In this paper, we provide a data-driven approach for detecting workload scenarios and exploring scenario-optimized mappings based on a collection of input data. The identification of scenarios and the determination of optimized mappings are interdependent: for the data-driven identification of workload scenarios, we have to measure the profiles when executing the application with the given input data for different application mappings. However, to come up with scenario-optimized application mappings, the workload scenarios have to be known. We tackle this interdependence problem by proposing a cyclic design methodology that optimizes both aspects in an iterative fashion. It is shown that with our approach, the latency of two exemplary applications, a ray tracing as well as an image stitching application, can be significantly improved compared to methods that ignore workload scenarios or do not perform the proposed iterative refinement. Furthermore, we demonstrate that our proposal can be used in the context of a hybrid application mapping methodology.
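
The cyclic methodology can be caricatured as an alternation between regrouping inputs into scenarios and re-optimizing one mapping per scenario until a fixed point is reached. The Python below is a synthetic sketch under strong assumptions (a one-dimensional input feature, a closed-form latency model in place of real profiling, and two mappings); it illustrates only the iterative structure:

import random
random.seed(1)

# synthetic inputs drawn from two workload regimes (light vs. heavy)
inputs = [random.gauss(0.2, 0.05) for _ in range(10)] + \
         [random.gauss(0.8, 0.05) for _ in range(10)]
mappings = ["low_power", "high_perf"]

def measured_latency(x, m):   # stands in for profiling a real execution
    return 2.5 * x if m == "low_power" else 0.9 + 0.3 * x

def best_mapping(group):      # scenario-optimized mapping for one group
    return min(mappings, key=lambda m: sum(measured_latency(x, m) for x in group))

groups = [inputs[:10], inputs[10:]]               # arbitrary initial split
for _ in range(20):
    maps = [best_mapping(g) for g in groups]      # mapping step
    new_groups = [[], []]
    for x in inputs:                              # scenario step: regroup
        new_groups[min(range(2), key=lambda i: measured_latency(x, maps[i]))].append(x)
    if new_groups == groups:
        break                                     # fixed point reached
    groups = new_groups

for g, m in zip(groups, maps):
    if g:
        print(f"scenario with {len(g)} inputs (mean {sum(g)/len(g):.2f}) -> {m}")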

7 citations


Cited by
Proceedings Article
01 Jan 2011
TL;DR: This is the first approach in fault-tolerant task scheduling that considers permanent and transient faults in a unified manner; the effectiveness of the approach is illustrated using several case studies.
Abstract: Reliability is a major requirement for most safety-related systems. To meet this requirement, fault-tolerant techniques such as hardware replication and software re-execution are often utilized. In this paper, we tackle the problem of analysis and optimization of fault-tolerant task scheduling for multiprocessor embedded systems. A set of existing fault- and process-models is adopted, and a Binary Tree Analysis (BTA) is proposed to compute the system-level reliability in the presence of software/hardware redundancy. The BTA is integrated into a multi-objective evolutionary algorithm via a two-step encoding to perform reliability-aware design optimization. The optimization results contain the mapping of tasks to processing elements, the exact task and message schedule, and the fault-tolerance policy assignment. Based on the observation that permanent faults need to be considered together with transient faults to achieve optimal system design, we propose a virtual mapping technique to take both types of faults into account. To the best of our knowledge, this is the first approach in fault-tolerant task scheduling that considers permanent and transient faults in a unified manner. The effectiveness of our approach is illustrated using several case studies.
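
The redundancy arithmetic underlying such an analysis is compact: assuming independent replicas or re-executions, a task fails only if every copy fails, so reliability composes as 1 - prod(1 - r_i). A small numeric sketch with illustrative fault probabilities, not the paper's models:

from math import prod

def reliability_with_redundancy(per_copy_reliability):
    """Reliability of a task given independent redundant executions."""
    return 1.0 - prod(1.0 - r for r in per_copy_reliability)

single = 0.99                                            # one execution
print(reliability_with_redundancy([single]))             # 0.99
print(reliability_with_redundancy([single, single]))     # re-execution: 0.9999
print(reliability_with_redundancy([single, 0.95, 0.95])) # mixed replicas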

54 citations

Journal Article
TL;DR: This paper presents a detailed comparative analysis and categorization of application mapping approaches with current trends in NoC design implementation, and identifies the best technique in each category based on the evaluation of performance results.
Abstract: Network-on-chip (NoC) is evolving as a better substitute for incorporating a large number of cores on a single system-on-chip (SoC). The dependency on multi-core systems to accomplish the high-performance constraints of composite embedded applications is on the rise. This leads to the realization of efficient mapping approaches for such complex applications. The significance of efficient application mapping approaches has increased ever since embedded applications have become more complex and performance-oriented. This paper presents a detailed comparative analysis and categorization of application mapping approaches with current trends in NoC design implementation. These approaches aim to improve the performance of the whole system by optimizing communication cost, energy, power consumption, and latency. Apart from the categorization of the discussed approaches, a comparison of communication cost, power, energy, and latency of the NoC system is carried out on real applications like VOPD and MPEG4. Moreover, the best technique in each category is identified based on the evaluation of performance results.
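
For reference, the communication-cost metric that such NoC mapping comparisons typically report is the sum, over all communicating task pairs, of traffic volume times hop distance on the mesh. A minimal sketch with toy values (not VOPD/MPEG4 data):

def hops(a, b):
    """Manhattan (XY-routing) hop distance between two mesh tiles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def communication_cost(edges, placement):
    """edges: {(src_task, dst_task): volume}; placement: task -> (x, y) tile."""
    return sum(vol * hops(placement[s], placement[d])
               for (s, d), vol in edges.items())

edges = {("t0", "t1"): 70, ("t1", "t2"): 362, ("t0", "t2"): 27}
placement_a = {"t0": (0, 0), "t1": (0, 1), "t2": (1, 1)}
placement_b = {"t0": (0, 0), "t1": (1, 1), "t2": (0, 1)}
print(communication_cost(edges, placement_a))  # 486: heavy pair kept adjacent
print(communication_cost(edges, placement_b))  # 529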

31 citations

Journal Article
TL;DR: This paper presents a survey regarding various aspects of designing time-critical computing systems.
Abstract: Time-critical computing systems are enablers for various important application domains, such as avionics, automotive, spacecraft, and the IoT. All these applications would benefit immensely from the design of computing systems that fulfill strict timing requirements. This paper presents a survey regarding various aspects of designing time-critical computing systems. —Partha Pande, Washington State University

25 citations

Journal Article
TL;DR: An overview and classification of mapping algorithms is provided that facilitates graphical interpretation of the known techniques, detailing the methodologies along with performance, energy consumption, communication cost, reliability, and thermal management on different target architectures.
Abstract: Multicore systems are in demand due to their high performance, making application mapping an important research area in this field. Breaking an application into multiple parallel tasks efficiently and making task-core assignment decisions can drastically influence system performance. This has created an urgency to find potent mapping techniques that can handle the complexity of these systems. Task assignment methods are governed by the application model, user requirements, and architecture model. This paper provides an overview and classification of mapping algorithms that facilitates graphical interpretation of the known techniques. It details the mapping methodologies along with performance, energy consumption, communication cost, reliability, or thermal management on different target architectures. Upcoming trends and open research areas are also discussed.

17 citations

Journal Article
TL;DR: The experimental results demonstrate that the proposed adaptive DSE strategies clearly outperform a state-of-the-art DSE approach known from literature in terms of the quality of the gained implementations as well as exploration times.
Abstract: State-of-the-art system synthesis techniques employ meta-heuristic optimization techniques for Design Space Exploration (DSE) to tailor application execution, e.g., defined by a dataflow graph, for a given target platform. Unfortunately, the performance evaluation of each implementation candidate is computationally very expensive, in particular on recent multi-core platforms, as this involves compilation to and extensive evaluation on the target hardware. Applying heuristics for performance evaluation on the one hand allows for a reduction of the exploration time but on the other hand may deteriorate the convergence of the optimization technique toward performance-optimal solutions with respect to the target platform. To address this problem, we propose DSE strategies that are able to dynamically trade off between (i) approximating heuristics to guide the exploration and (ii) accurate performance evaluation, i.e., compilation of the application and subsequent performance measurement on the target platform. Technically, this is achieved by introducing a set of additional, but easily computable guiding objective functions, and varying the set of objective functions that are evaluated during the DSE adaptively. One major advantage of these guiding objectives is that they are generically applicable for dataflow models without having to apply any configuration techniques to tailor their parameters to the specific use case. We show this for synthetic benchmarks as well as a real-world control application. Moreover, the experimental results demonstrate that our proposed adaptive DSE strategies clearly outperform a state-of-the-art DSE approach known from literature in terms of the quality of the gained implementations as well as exploration times. Amongst others, we show a case for a two-core implementation where after about 3 hours of exploration time one of our proposed adaptive DSE strategies already obtains a 60% higher performance value than obtained by the state-of-the-art approach. Even when the state-of-the-art approach is given a total exploration time of more than 2 weeks to optimize this value, the proposed adaptive DSE strategy features a 20% higher performance value after a total exploration time of about 4 days.
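
The underlying trade-off, cheap guiding objectives for broad exploration versus expensive accurate evaluation for the most promising candidates, can be sketched in a few lines. The Python below is a synthetic caricature, not the paper's adaptive strategies: it screens candidates with a noisy heuristic and spends the accurate-evaluation budget only on a shortlist:

import random
random.seed(42)

def cheap_heuristic(cand):       # fast, approximate performance estimate
    return sum(cand) + random.uniform(-0.5, 0.5)   # noisy proxy

def accurate_measurement(cand):  # stands in for compile + run on hardware
    return sum(cand)             # "true" performance (higher is better)

def adaptive_dse(n_candidates=200, measure_budget=10):
    candidates = [[random.random() for _ in range(4)] for _ in range(n_candidates)]
    # guiding phase: rank every candidate by the cheap objective
    ranked = sorted(candidates, key=cheap_heuristic, reverse=True)
    # accurate phase: spend the expensive budget only on the shortlist
    best = max(ranked[:measure_budget], key=accurate_measurement)
    return best, accurate_measurement(best)

best, perf = adaptive_dse()
print(f"best measured performance: {perf:.3f}")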

17 citations