

# MPSoC Programming using the MAPS Compiler

Rainer Leupers, <u>Jeronimo Castrillon</u>, Institute for Integrated Signal Processing Systems RWTH Aachen University, Germany

> ASP-DAC Taipei, Jan. 2010




#### Motivation: MPSoCs and the Productivity Gap

# Multi-Processor Systems on Chip are a reality

Increased HW and SW complexity



The productivity gap: requirements double every 10 months, while HW/SW productivity doubles only every 2 years (Ecker, Mueller, Doemer, 2008)

→ Need better support for SW development in the MPSoC era







# MAPS: Bridging the Productivity Gap

# MAPS: MPSoC Application Programming Studio



Source: Virtual Platform of Shapes RDT, SSS RWTH Aachen



Source: Chen, NTU, MPSoC 2008

- Flexible input specification: 85% of embedded programmers use C/C++ (<u>www.eetimes.com</u>)
  - Legacy C-code and partitioning
  - Explicitly parallel C-like programming model (KPN)
- Abstraction & retargetability:
  - Abstract APIs for early SW design
  - Code generation hides HW dependent SW
- Functional validation:
  - Abstract simulator (HVP), no processor-specific tool chains involved
- Mapping & Scheduling frameworks:
  - Manage the huge design space
- Multiple applications of different classes (real-time, best effort)







- Motivation
- MAPS Overview
- Sequential and Parallel Flows
- Results
- Conclusions and Outlook





# MAPS Flow Overview



# **MAPS:** Graphical User Interface










# Sequential Flow: How it started...



Sequential flow as presented at DAC 2008

#### Key points:

1. Analysis phase: Traces for Dynamic Data Flow Analysis
2. New analysis granularity: "Coupled" blocks as opposed to basic blocks, functions, ...
3. Performance estimation: annotated 3-address-code IR via cost table
4. Heuristic for hierarchical code partitioning

- Simple code generation for TCT platform (TiTech, Tokyo)
- Execution on TCT virtual/real platform
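
Key point 3 can be illustrated with a minimal sketch. It assumes that each block of three-address code is a list of opcode names, that the cost table maps an opcode to a cycle count for one processing element type, and that the trace supplies an execution count per block. The opcode names, cycle values, and function names below are hypothetical and are not taken from MAPS.

```python
# Hypothetical cost table: cycles per three-address-code operation on one PE type.
COST_TABLE = {"add": 1, "mul": 3, "load": 2, "store": 2, "branch": 2}


def estimate_block_cost(ir_ops, cost_table):
    """Sum the table cost of every operation in one IR block."""
    return sum(cost_table[op] for op in ir_ops)


def estimate_trace_cost(blocks, exec_counts, cost_table):
    """Weight each block's cost by how often the trace executed it."""
    return sum(
        estimate_block_cost(ops, cost_table) * exec_counts[name]
        for name, ops in blocks.items()
    )


# Two illustrative blocks: a setup block run once and a loop body run ten times.
blocks = {
    "bb0": ["load", "load", "mul", "store"],
    "bb1": ["load", "add", "store", "branch"],
}
exec_counts = {"bb0": 1, "bb1": 10}

total = estimate_trace_cost(blocks, exec_counts, COST_TABLE)
print(total)
```

Swapping in a different cost table gives an estimate for a different processor type without re-running the application, which is the appeal of a table-based estimate on an annotated IR.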



Analyze Strongly Connected Components (SCCs): this improves parallel efficiency, i.e. fewer PEs for a similar execution time



- SCCs are recognized, and a heuristic merges blocks to improve parallel efficiency
- Special care is taken with nested SCCs
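
The slides do not describe the merging heuristic itself, so the following is only a sketch of the first step: recognizing SCCs in a block dependence graph and collapsing each one into a single block. It uses Tarjan's algorithm as a stand-in, merges every SCC unconditionally, and does not treat nested SCCs specially. The graph and block names are hypothetical.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps a node to a list of successor nodes."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of an SCC: pop its members off the stack.
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(frozenset(comp))

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


def merge_sccs(graph):
    """Collapse each SCC into one block; return the block map and acyclic edges."""
    block_of = {}
    for comp in strongly_connected_components(graph):
        for v in comp:
            block_of[v] = comp
    edges = {
        (block_of[u], block_of[v])
        for u, succs in graph.items()
        for v in succs
        if block_of[u] != block_of[v]
    }
    return block_of, edges


# Hypothetical dependence graph: B and C depend on each other cyclically.
deps = {"A": ["B"], "B": ["C"], "C": ["B", "D"], "D": []}
block_of, edges = merge_sccs(deps)
print(sorted(block_of["B"]), len(edges))
```

After merging, the remaining graph is acyclic, so the merged blocks can be ordered and assigned to PEs without a cyclic dependence crossing a partition boundary.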





 Balance partitions of functions in different locations of the Call Graph





- Dataflow programming models are gaining acceptance every day. Which one to use?
  - HSDFs, SDFs, MRDFs, CFDF, KPN...



- MAPS programming model: Based on the Kahn Process Networks (KPN) Model of Computation (MoC)
  - Better expressiveness compared to other models
  - Simple semantics
  - More difficult to analyze and derive plausible schedules
    - Although comparable when handling multiple applications





Pragma extensions to represent KPN applications. Example: RLE decoding.
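
The pragma-annotated listing from the slide is not reproduced here. As a rough illustration of the KPN semantics that such an RLE decoder relies on, the sketch below models each process as a thread and each channel as an unbounded FIFO with blocking reads. It is plain Python, not the MAPS pragma syntax. The process names, the `END` marker, and the `(count, symbol)` token format are assumptions made for this example.

```python
import queue
import threading

END = None  # end-of-stream marker (an assumption of this sketch)


def source(pairs, out):
    """Emit run-length encoded (count, symbol) tokens."""
    for count, symbol in pairs:
        out.put((count, symbol))
    out.put(END)


def rle_decode(inp, out):
    """Expand each (count, symbol) token into count copies of symbol."""
    while True:
        token = inp.get()  # blocking read, as in a Kahn process
        if token is END:
            out.put(END)
            return
        count, symbol = token
        for _ in range(count):
            out.put(symbol)


def sink(inp, result):
    """Collect decoded symbols until the stream ends."""
    while True:
        token = inp.get()
        if token is END:
            return
        result.append(token)


def run_network(pairs):
    encoded, decoded, result = queue.Queue(), queue.Queue(), []
    processes = [
        threading.Thread(target=source, args=(pairs, encoded)),
        threading.Thread(target=rle_decode, args=(encoded, decoded)),
        threading.Thread(target=sink, args=(decoded, result)),
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return "".join(result)


decoded_text = run_network([(3, "a"), (1, "b"), (2, "c")])
print(decoded_text)
```

Because processes communicate only through FIFO channels and block on reads, the decoded output does not depend on how the threads are interleaved.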



# GUI equivalent editor/viewer:







# Parallel Flow



Parallel flow; details to appear at DATE, Mar. 2010

#### Key points:

1. Intermediate *pthread* code generation for tracing
2. "Sequentialized" processes analyzed by traditional MAPS
3. KPN tracer generates KPN traces
4. Modular framework for scheduling and mapping: RR, RRWS, priority-based, FIFO, ...
5. TRM allows comparing different schedules

- The scheduler descriptor can be used to generate code directly
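
To give a feel for what comparing schedules on recorded traces can look like, here is a deliberately simplified sketch. It is not the MAPS scheduling framework. It assumes a single PE, per-process lists of segment costs, no blocking on channels, and only two policies: round-robin that switches after every segment, and FIFO that runs each process to completion. All names and numbers are hypothetical.

```python
from collections import deque


def replay(segment_costs, policy):
    """Replay per-process segment costs on one PE; return finish time per process.

    policy "rr" switches to the next process after every segment;
    policy "fifo" runs each process to completion in arrival order.
    """
    if policy not in ("rr", "fifo"):
        raise ValueError(f"unknown policy: {policy}")
    ready = deque((name, deque(costs)) for name, costs in segment_costs.items())
    clock, finish = 0, {}
    while ready:
        name, costs = ready.popleft()
        if policy == "rr":
            if costs:
                clock += costs.popleft()
        else:
            clock += sum(costs)
            costs.clear()
        if costs:
            ready.append((name, costs))  # still has segments: back of the queue
        else:
            finish[name] = clock
    return finish


# Hypothetical segment costs (in cycles) for two processes.
segment_costs = {"long": [5, 5, 5], "short": [1]}
rr_finish = replay(segment_costs, "rr")
fifo_finish = replay(segment_costs, "fifo")
print(rr_finish, fifo_finish)
```

In this toy case both policies finish all work at the same time, but round-robin lets the short process finish much earlier, which is the kind of difference a schedule comparison is meant to expose.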



#### Parallel Flow: What is a KPN Trace?

- A sequential trace is a series of basic blocks
- The KPN tracer identifies in which BBs channels were accessed



A trace is a sequence of segments, where a segment is a sequence of BBs with a channel access in its last BB
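
The segment definition above can be sketched as follows, assuming the sequential trace is a list of basic-block identifiers and the tracer has already produced the set of BBs that access a channel. The handling of trailing BBs after the last channel access is an assumption of this sketch. All names are hypothetical.

```python
def split_into_segments(bb_trace, channel_bbs):
    """Cut a BB trace into segments that each end at a channel-access BB."""
    segments, current = [], []
    for bb in bb_trace:
        current.append(bb)
        if bb in channel_bbs:
            segments.append(current)
            current = []
    if current:
        # Trailing BBs after the last channel access: kept as a final
        # segment here (an assumption; the slides do not cover this case).
        segments.append(current)
    return segments


# Hypothetical trace of a loop whose body (bb2, bb3) writes a channel in bb3.
trace = ["bb1", "bb2", "bb3", "bb2", "bb3", "bb4"]
channel_bbs = {"bb3"}
segments = split_into_segments(trace, channel_bbs)
print(segments)
```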







# Handling Multiple Applications

- Applications organized into classes:
  - Hard/soft real time
  - Best effort



The Application Concurrency Graph (ACG) serves to describe the interaction among applications



- A sub-graph of the ACG represents a use case or multi-application scenario
- Schedules for different applications are computed separately
- Use-case analysis via composition
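
The slides do not define the ACG formally, so the sketch below rests on several labeled assumptions: the ACG is an undirected graph in which an edge means two applications may run at the same time, a use case is a set of applications that are pairwise connected, and composition simply adds up the per-PE load of each application's separately computed schedule. The application labels reuse names from the results slide. The load figures are invented for illustration.

```python
from itertools import combinations


def use_cases(apps, concurrent_pairs):
    """Enumerate application sets whose members are pairwise connected in the ACG."""
    edges = {frozenset(pair) for pair in concurrent_pairs}
    found = []
    for size in range(1, len(apps) + 1):
        for group in combinations(apps, size):
            if all(frozenset(p) in edges for p in combinations(group, 2)):
                found.append(group)
    return found


def compose_load(group, load_percent):
    """Add up the per-PE load of separately scheduled applications."""
    total = {}
    for app in group:
        for pe, load in load_percent[app].items():
            total[pe] = total.get(pe, 0) + load
    return total


apps = ["jpeg", "gsm", "mpeg2"]
concurrent_pairs = [("jpeg", "gsm"), ("gsm", "mpeg2")]
# Invented per-PE load (percent) taken from each application's own schedule.
load_percent = {
    "jpeg": {"pe0": 40},
    "gsm": {"pe0": 30, "pe1": 20},
    "mpeg2": {"pe1": 70},
}

cases = use_cases(apps, concurrent_pairs)
feasible = [
    group
    for group in cases
    if all(load <= 100 for load in compose_load(group, load_percent).values())
]
print(cases)
print(feasible)
```

The point of composing per-application results is that each application is scheduled once, and every use case is then checked from those results instead of being scheduled from scratch.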










## New partitioning passes: a toy example



#### Results: Parallel & Overall Flow

- The parallel flow has been tested on several real-life applications:
  - MPEG2, JPEG, GSM, MIMO,...
- MAPS usability fully tested:
  - Parsing/tracing/profiling
  - Functional validation
- Later verification on different back-ends
  - TI-OMAP, TCT, OSIP




















- MAPS: a fairly complete tool set for MPSoC programming was presented:
  - Sequential (C) & parallel (KPN) input specification
  - Abstraction: functional simulation, APIs
  - Mapping & scheduling of single and multiple applications to heterogeneous MPSoCs

... in a user-friendly, Eclipse-based GUI

- Current & future work in MAPS
  - C extensions instead of *pragmas*, aka: **CPN**
  - Compiler development: CLANG, LLVM
  - Better performance estimation techniques: TotalProf
  - Improving mapping and scheduling heuristics
  - Research on *composability* for KPNs





# Thank You! Questions??

Acknowledgments:

*This work has been supported by the UMIC (Ultra High-Speed Mobile Information and Communication) research centre. <u>www.umic.rwth-aachen.de</u>* 

The team:





maps@iss.rwth-aachen.de



