Author

Sang-Il Han

Bio: Sang-Il Han is an academic researcher from Seoul National University. The author has contributed to research in topics: MPSoC & Code generation. The author has an h-index of 7 and has co-authored 11 publications receiving 206 citations.

Papers
Proceedings Article
04 Jun 2007
TL;DR: Experimental results show that the design flow can generate various MPSoC architectures from Simulink CAAM correctly and efficiently, allowing processor and task design space exploration at different abstraction levels.
Abstract: System-level design methodologies have been introduced as a solution to handle the design complexity of embedded multiprocessor SoC (MPSoC) systems. In this paper we describe a system-level design flow starting from Simulink specification, focusing on concurrent hardware and software design and verification at four different abstraction levels: Simulink Combined Algorithm and Architecture Model (CAAM), Virtual Architecture, Transaction-accurate Model and Virtual Prototype. We used two multimedia applications, Motion-JPEG and H.264, to evaluate this design flow. Experimental results show that our design flow can generate various MPSoC architectures from Simulink CAAM correctly and efficiently, allowing processor and task design space exploration at different abstraction levels.

55 citations

Proceedings Article
07 Jun 2004
TL;DR: The proposed Distributed Memory Server comprises high-performance, flexible memory service access points (MSAPs), which execute data transfers without intervention of the processing elements, a data network, and a control network; it can handle direct massive data transfer between the distributed memories of an MPSoC.
Abstract: Massive data transfer encountered in emerging multimedia embedded applications requires an architecture that supports both a highly distributed memory structure and multiprocessor computation. The key issue that needs to be solved is then how to manage data transfers between large numbers of distributed memories. To overcome this issue, our paper proposes a scalable Distributed Memory Server (DMS) for multiprocessor SoC (MPSoC). The proposed DMS is composed of: (1) high-performance and flexible memory service access points (MSAPs), which execute data transfers without intervention of the processing elements, (2) a data network, and (3) a control network. It can handle direct massive data transfer between the distributed memories of an MPSoC. The scalability and flexibility of the proposed DMS are illustrated through the implementation of an MPEG4 video encoder for QCIF and CIF formats. The experiments show clearly how the DMS can be adapted to accommodate different SoC configurations requiring various data transfer bandwidths. Synthesis results show that bandwidth can scale up to 28.8 GB/sec.

49 citations

Proceedings Article
24 Jul 2006
TL;DR: This work generates a memory-efficient C code from a restricted Simulink model, which can represent both data and control dependency explicitly, by applying two buffer memory optimization techniques: copy removal and buffer sharing.
Abstract: Reduction of the on-chip memory size is a key issue in video codec system design. Because video codec applications involve complex algorithms that are both data-intensive and control-dependent, memory optimization based on global and precise analysis of data and control dependency is required. We generate a memory-efficient C code from a restricted Simulink model, which can represent both data and control dependency explicitly, by applying two buffer memory optimization techniques: copy removal and buffer sharing. Copy removal is performed while parsing the Simulink model. Buffer sharing requires global scheduling and formal lifetime analysis. Experimental results on an H.264 video decoder show that the buffer memory size and execution time of the C code generated by the proposed method are 71% and 32% less than those of the C code produced by Simulink's C code generator, respectively. When compared to the hand written C code, the memory size was reduced by 27% while its execution time was increased by only 3%.

33 citations
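
The abstract above says that buffer sharing relies on global scheduling and lifetime analysis. The sketch below is only an illustration of that general idea, not the paper's algorithm: it assumes each buffer's lifetime is already known as a pair of schedule steps, and it greedily lets buffers whose lifetimes do not overlap occupy the same memory slot. The `Buffer` type, the `share_buffers` and `total_size` helpers, and all sizes and schedule steps are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    name: str
    size: int        # bytes (illustrative values only)
    first_use: int   # schedule step at which the buffer is first written
    last_use: int    # schedule step at which the buffer is last read


def share_buffers(buffers):
    """Greedily place buffers with non-overlapping lifetimes in the same slot."""
    slots = []  # each slot: {"size", "free_after", "members"}
    for buf in sorted(buffers, key=lambda b: b.first_use):
        for slot in slots:
            # A slot can be reused only if its current occupant is dead
            # strictly before this buffer becomes live.
            if slot["free_after"] < buf.first_use:
                slot["size"] = max(slot["size"], buf.size)
                slot["free_after"] = buf.last_use
                slot["members"].append(buf.name)
                break
        else:
            slots.append({"size": buf.size,
                          "free_after": buf.last_use,
                          "members": [buf.name]})
    return slots


def total_size(slots):
    """Memory needed when every slot is as large as its biggest member."""
    return sum(slot["size"] for slot in slots)


if __name__ == "__main__":
    demo = [
        Buffer("A", 64, 0, 1),
        Buffer("B", 32, 1, 2),
        Buffer("C", 64, 2, 3),
        Buffer("D", 16, 3, 4),
    ]
    shared = share_buffers(demo)
    print([s["members"] for s in shared], total_size(shared))
```

With these made-up inputs, the four buffers would need 176 bytes if allocated separately, while the shared layout needs two slots. A real flow would derive the lifetimes from the schedule of the Simulink model instead of taking them as given.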

Journal Article
TL;DR: A joint Simulink-SystemC design flow is presented that enables mixed hardware/software refinement and simulation early in the design process; its applicability is demonstrated on two real video applications.

24 citations

Proceedings Article
20 Apr 2007
TL;DR: A Simulink-based multithread code generation method is presented, which applies the Message Aggregation optimization technique to reduce the number of inter-processor communications and, with it, the communication overhead.
Abstract: Heterogeneous MPSoCs present unique opportunities for emerging embedded applications, which require both high performance and programmability. However, software programming for these MPSoC architectures involves tedious and error-prone tasks, so automatic code generation tools are required. A code generation method based on fine-grain specification can provide more design space and optimization opportunities, such as exploiting fine-level parallelism and more efficient partitions. When partitioned, however, fine-grain models may require a large number of inter-processor communications, decreasing the overall system performance. This paper presents a Simulink-based multithread code generation method, which applies the Message Aggregation optimization technique to reduce the number of inter-processor communications. This technique reduces the communication overhead in terms of execution time, by reducing the number of messages exchanged, and in terms of memory size, by reducing the number of channels. The paper also presents experimental results for one multimedia application, showing the performance improvements and memory reduction obtained with the Message Aggregation technique.

17 citations
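
The abstract above describes Message Aggregation as reducing both the number of messages exchanged and the number of channels. As a rough illustration only, and not the paper's method, the sketch below merges every message that shares the same sender and receiver processor into a single aggregated message on one channel. The `aggregate_messages` helper, the tuple layout, and the processor and signal names are all hypothetical, and the sketch ignores the scheduling constraints a real code generator would have to respect before merging messages.

```python
from collections import defaultdict


def aggregate_messages(messages):
    """Merge messages that share a (source, destination) processor pair.

    `messages` is a list of (src_cpu, dst_cpu, signal_name, size_in_bytes)
    tuples. The result maps each (src_cpu, dst_cpu) pair to one aggregated
    message carrying all signals for that pair.
    """
    grouped = defaultdict(list)
    for src, dst, signal, size in messages:
        grouped[(src, dst)].append((signal, size))
    return {
        pair: {
            "signals": [signal for signal, _ in items],
            "bytes": sum(size for _, size in items),
        }
        for pair, items in grouped.items()
    }


if __name__ == "__main__":
    demo = [
        ("cpu0", "cpu1", "a", 4),
        ("cpu0", "cpu1", "b", 8),
        ("cpu1", "cpu0", "c", 4),
        ("cpu0", "cpu1", "d", 2),
    ]
    channels = aggregate_messages(demo)
    print(len(demo), "messages ->", len(channels), "channels")
```

In this made-up case four fine-grain messages collapse into two channels, one per direction between the two processors.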


Cited by
Journal Article
TL;DR: This paper develops and proposes a novel classification for ESL synthesis tools, and presents six different academic approaches in this context based on common principles and needs that are ultimately required for a true ESL synthesis solution.
Abstract: With ever-increasing system complexities, all major semiconductor roadmaps have identified the need for moving to higher levels of abstraction in order to increase productivity in electronic system design. Most recently, many approaches and tools that claim to realize and support a design process at the so-called electronic system level (ESL) have emerged. However, faced with the vast complexity challenges, in most cases at best, only partial solutions are available. In this paper, we develop and propose a novel classification for ESL synthesis tools, and we will present six different academic approaches in this context. Based on these observations, we can identify such common principles and needs as they are leading toward and are ultimately required for a true ESL synthesis solution, covering the whole design process from specification to implementation for complete systems across hardware and software boundaries.

174 citations

Journal Article
TL;DR: The motivation here is to prove that, when exploiting the processing speed of FPGAs (less than 6 μs), it is possible to enhance the control bandwidth.
Abstract: The aim of this paper is to quantify the benefit of using field-programmable gate arrays (FPGAs) to implement complex control algorithms in hardware. As a benchmark, the authors have chosen a sensorless speed controller for a synchronous motor. The estimation of the rotor position and speed is achieved using an extended Kalman filter (EKF), eliminating the need for the corresponding mechanical sensors. Due to the EKF complexity, such a sensorless controller is systematically implemented in software on a digital signal processor (DSP) device. The execution time is frequently evaluated at several tens or hundreds of microseconds. The motivation here is to prove that, when exploiting the processing speed of FPGAs (less than 6 μs), it is possible to enhance the control bandwidth. To reach this objective, a comparison between the developed FPGA-based sensorless speed controller and its DSP-based counterpart is made. The same sensorless controller (with the same complexity) has been implemented in both cases. To support this comparison, simulation, hardware-in-the-loop, and experimental tests are presented.

118 citations

Proceedings Article
20 Apr 2009
TL;DR: A transformation of the transaction level model to a graph-based model and symbolic representation that allows multi-objective optimization is presented and results from optimizing a motion-JPEG decoder illustrate the effectiveness of the proposed approach.
Abstract: In this paper, a novel design space exploration approach is proposed that enables a concurrent optimization of the topology, the process binding, and the communication routing of a system. Given an application model written in SystemC TLM 2.0, the proposed approach performs a fully automatic optimization by a simultaneous resource allocation, task binding, data mapping, and transaction routing for MPSoC platforms. To cope with the huge complexity of the design space, a transformation of the transaction level model to a graph-based model and symbolic representation that allows multi-objective optimization is presented. Results from optimizing a Motion-JPEG decoder illustrate the effectiveness of the proposed approach.

68 citations

Journal Article
TL;DR: A middleware infrastructure supporting dynamic task allocation for NUMA architectures is presented and an extensive characterization of its impact on multimedia soft real-time applications using a software FM Radio benchmark is performed.
Abstract: Multiprocessor systems on chips (MPSoCs) are envisioned as the future of embedded platforms such as game engines, smartphones and palmtop computers. One of the main challenges preventing the widespread diffusion of these systems is the efficient mapping of multitask multimedia applications onto processing elements. Dynamic solutions based on task migration have recently been explored to perform run-time reallocation of tasks to maximize performance and optimize energy consumption. Even if task migration can provide high flexibility, its overhead must be carefully evaluated when applied to soft real-time applications. In fact, these applications impose deadlines that may be missed during the migration process. In this paper we first present a middleware infrastructure supporting dynamic task allocation for NUMA architectures. Then we perform an extensive characterization of its impact on multimedia soft real-time applications using a software FM Radio benchmark.

53 citations

Proceedings Article
07 Mar 2005
TL;DR: This paper presents lightweight support for message passing, and has made the library as flexible as possible such that it can optimally match the application with the target architecture.
Abstract: With the advent of multi-processor systems on a chip, interest in message passing libraries has revived. Message passing helps in mastering the design complexity of parallel systems. However, to satisfy the stringent energy budget of embedded applications, the message passing overhead should be limited. Recently, several hardware extensions have been proposed for reducing the transfer cost on a distributed memory architecture. Unfortunately, they ignore the synchronization cost between sender and receiver and/or require many dedicated hardware blocks. To overcome these limitations, we present in this paper lightweight support for message passing. Moreover, we have made our library as flexible as possible such that we can optimally match the application with the target architecture. We demonstrate the benefits of our approach by means of representative benchmarks from the multimedia domain.

42 citations