### ABSTRACT


SystemC promises an environment for faster hardware/software design-space exploration.

As systems become more complex, the simulation performance of the corresponding SystemC RTL designs deteriorates significantly, because the current SystemC standard makes no effort to parallelize its simulation kernel and so cannot take advantage of the multi-core platforms that are now the norm.

In this work, we present a translator that transforms synthesizable SystemC designs into programs that execute in parallel on an NVIDIA graphics processing unit (GPU).

The translator retains the discrete-event semantics of the original designs by applying semantics-preserving transformations. The resulting programs exploit the hardware parallelism for improved simulation efficiency.

Preliminary experiments show a simulation speed-up of approximately 30x-100x.

### CONTACT

Mahesh Nanjundappa, FERMAT Lab, Virginia Tech. Email: knmahesh@vt.edu. Phone: 540-425-1151. Website: http://filebox.vt.edu/users/knmahesh/

### INTRODUCTION

SystemC, a modeling and simulation language, is used for early trade-off analysis and design-space exploration.

The OSCI implementation of the SystemC simulation kernel, being a discrete-event simulation kernel, makes no attempt to take advantage of recent parallel architectures.

We present a source-to-source (S2S) translator that transforms SystemC designs into CUDA programs that execute in parallel on NVIDIA Graphics Processing Units (GPUs).

Preliminary experiments show speed-ups of roughly 30x to 100x on certain benchmarks.

### PARALLEL SYSTEMC KERNEL

Parallelism is extracted by mapping each runnable process to a thread in a different warp and executing these threads in parallel on the GPU.
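One way to give every process its own warp can be sketched with plain index arithmetic, written here in Python so it can be checked on a CPU. Threads are grouped into warps of 32 by consecutive thread index, so assigning process `p` the thread index `p * 32` places each process in a distinct warp. The function names and the choice of lane 0 are assumptions made for illustration, not details taken from SCGPSim.

```python
WARP_SIZE = 32  # warp width on NVIDIA GPUs


def thread_for_process(process_id: int, warp_size: int = WARP_SIZE) -> int:
    """Hypothetical mapping: process p runs on lane 0 of warp p."""
    return process_id * warp_size


def warp_of_thread(thread_index: int, warp_size: int = WARP_SIZE) -> int:
    """Warp that a linear thread index falls into."""
    return thread_index // warp_size


def threads_to_launch(num_processes: int, warp_size: int = WARP_SIZE) -> int:
    """Total threads needed so that every process gets its own warp."""
    return num_processes * warp_size


if __name__ == "__main__":
    for p in range(4):
        t = thread_for_process(p)
        print(f"process {p} -> thread {t} (warp {warp_of_thread(t)})")
```

A plausible motivation for this layout is that threads within one warp execute in lockstep, so unrelated process bodies sharing a warp would be serialized; the poster does not state its rationale.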

To prevent *race* conditions, a *double buffering* mechanism is used.
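A minimal sketch of double buffering, assuming it follows the usual evaluate/update scheme of SystemC signals: each signal keeps a current value that processes read and a next value that processes write, and the next values are published only after every process has run. The `Signal` class, `swap_process`, and `delta_cycle` are hypothetical names for illustration and are not taken from SCGPSim.

```python
class Signal:
    """A signal with a current value (read side) and a next value (write side)."""

    def __init__(self, initial=0):
        self.current = initial
        self.next = initial

    def read(self):
        return self.current

    def write(self, value):
        self.next = value  # not visible to readers until update()

    def update(self):
        changed = self.next != self.current
        self.current = self.next
        return changed


def swap_process(a, b):
    """Two hypothetical processes that exchange the values of a and b."""
    def proc_a():
        a.write(b.read())

    def proc_b():
        b.write(a.read())

    return [proc_a, proc_b]


def delta_cycle(processes, signals):
    # Evaluate phase: every process reads only `current` and writes only `next`,
    # so the order in which processes run does not change the result.
    for proc in processes:
        proc()
    # Update phase: publish all writes at once.
    return [s.update() for s in signals]
```

With `a = Signal(1)` and `b = Signal(2)`, one call to `delta_cycle(swap_process(a, b), [a, b])` leaves `a` at 2 and `b` at 1 whichever process runs first. With a single buffer, the second process would read the value the first had just overwritten.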





# SCGPSim: A fast SystemC simulator on GPUs

## Mahesh Nanjundappa<sup>1</sup>, Hiren D. Patel<sup>2</sup>, Bijoy A. Jose<sup>1</sup> and Sandeep K. Shukla<sup>1</sup>

<sup>1</sup>FERMAT Lab, Virginia Polytechnic Institute and State University, <sup>2</sup>University of Waterloo

**Barrier synchronization** is used to synchronize the threads.
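The role of the barrier can be sketched on a CPU with Python's `threading.Barrier`. The example is a hypothetical shift-register model with one thread per stage, not the kernel generated by SCGPSim, and the poster does not say which CUDA primitive is used. Each thread writes its next value, waits at the barrier, and the barrier's action publishes all next values before any thread starts the following cycle.

```python
import threading


def simulate(num_stages, num_cycles, inputs):
    """Shift values through `num_stages` registers, one thread per stage."""
    current = [0] * num_stages  # values visible during the evaluate phase
    nxt = [0] * num_stages      # values written during the evaluate phase
    feed = iter(inputs)

    def publish():
        # Runs in exactly one thread once all threads have reached the barrier.
        current[:] = nxt

    barrier = threading.Barrier(num_stages, action=publish)

    def stage(i):
        for _ in range(num_cycles):
            # Evaluate: stage 0 samples the input, stage i copies stage i-1.
            nxt[i] = next(feed, 0) if i == 0 else current[i - 1]
            # Wait until every stage has written; `publish` then runs once.
            barrier.wait()

    threads = [threading.Thread(target=stage, args=(i,)) for i in range(num_stages)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return current
```

Calling `simulate(3, 3, [1, 2, 3])` returns `[3, 2, 1]`. Python threads do not run this in parallel in the way GPU threads would; the sketch only shows how the barrier separates the evaluate and update phases.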

### Fig1: A Parallel SystemC Simulation Kernel

Legend: thread doing simulation work; thread idle, waiting for synchronization.







| Design          | CPU (sec) | CPU+GPU (sec) | Speed Up |
|-----------------|-----------|---------------|----------|
| Pipeline AES    | 120.1     | 3.916         | 30.66    |
| FIR             | 51.9      | 1.37          | 37.88    |
| 10 Stage Buffer | 28        | 0.277         | 101.083  |
| Simple ALU      | 13        | 0.146         | 89.041   |
| 3 Stage Buffer  | 11        | 0.276         | 39.85    |
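The Speed Up column appears to be the ratio of the CPU time to the CPU+GPU time. The sketch below recomputes it from the timings in the table; the function and variable names are illustrative only.

```python
# (design, CPU seconds, CPU+GPU seconds) copied from the table above
RESULTS = [
    ("Pipeline AES", 120.1, 3.916),
    ("FIR", 51.9, 1.37),
    ("10 Stage Buffer", 28.0, 0.277),
    ("Simple ALU", 13.0, 0.146),
    ("3 Stage Buffer", 11.0, 0.276),
]


def speed_up(cpu_seconds, gpu_seconds):
    """Speed-up as the ratio of CPU-only time to CPU+GPU time."""
    return cpu_seconds / gpu_seconds


def speed_ups(results=RESULTS):
    return {name: speed_up(cpu, gpu) for name, cpu, gpu in results}


if __name__ == "__main__":
    for name, value in speed_ups().items():
        print(f"{name:16s} {value:8.3f}x")
```

The recomputed ratios agree with the tabulated values to within rounding and span roughly 30x to 101x, consistent with the range quoted in the abstract.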