### Low-Overhead Hard Real-time Aware Interconnect Network Router

#### Michel A. Kinsy

Department of Computer and Information Science University of Oregon



#### Srinivas Devadas

Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology







- The convergence of application-specific processors and multicore systems
- A new phase in which two distinct computing architectures are emerging:



Massively parallel homogeneous cores



Heterogeneous many-core architectures



#### Homogeneous Many-cores



- Tilera-like homogeneous architectures: servers, cloud computing...
- Great for trivially parallel applications
- Having similar cores on the die makes the manufacturing and testing process more manageable
- Homogeneous collection of cores also keeps the software support model simple
- Lack of specialization of hardware to different tasks







- Integration of heterogeneous technologies
- These SoC architectures have a large number of processing units:
  - programmable RISC/CISC cores, memory, DSPs, and accelerator function units/ASICs
- Logic diagram of the hardware processing units in a typical smart phone today







SoC architectures need more research and standardization to encourage a high degree of reusability



SoC design complexity trends [International Technology Roadmap for Semiconductors 2011 Report]





- Designing heterogeneous many-core systems involves many of the same challenges as general-purpose homogeneous architectures, plus a few added design constraints
- SoC architectures are deployed in many computation environments that require concurrent execution of several tasks with different, sometimes opposite, performance goals
- Integration of different services on the same computing platform requires rethinking of the cores, the memory hierarchy, and the interconnect network
- On these platforms, we need to think not only in terms of tasks and task parallelism, but also services and service guarantees
- Quality of service at the on-chip network level

































#### **Conventional Router**









The Hard Real-time Support (HRES) router is able to:

- Decouple hard real-time and best-effort traffic using a two-datapath routing scheme
- Maximize link throughput and guarantee hard real-time timing constraints
- Provide fairness of link utilization between the two classes of traffic
- Be acknowledgment-free, retransmission-free, and lossless
- Be deadlock- and livelock-free with no modification to the buffered datapath of the router
- Incur low hardware overhead for supporting hard real-time communication
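To make the two-datapath idea concrete, here is a minimal Python sketch of how one output link might be shared between the two traffic classes. It is an illustration under stated assumptions, not the HRES microarchitecture: the names `OutputLink`, `arbitrate`, and `simulate` are hypothetical; real-time flits are assumed to travel on a bufferless path and to take the link whenever one is present; and at most one real-time flit is assumed to contend for a link per cycle, which is what the one-real-time-flow-per-link constraint (3) in the routing formulation is meant to ensure. Best-effort flits wait in the ordinary buffered datapath, so in this model nothing is dropped or retransmitted.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Set


@dataclass
class OutputLink:
    """Hypothetical model of one output link shared by two datapaths."""
    # Bufferless real-time datapath: holds at most one flit per cycle (assumption).
    rt_in: Optional[str] = None
    # Buffered best-effort datapath: stands in for the router's normal input buffers.
    be_queue: Deque[str] = field(default_factory=deque)

    def arbitrate(self) -> Optional[str]:
        """Return the flit that uses the link this cycle, or None if the link idles."""
        # A real-time flit has no buffer to wait in, so it always takes the link.
        if self.rt_in is not None:
            flit, self.rt_in = self.rt_in, None
            return flit
        # Otherwise the link goes to the oldest buffered best-effort flit.
        if self.be_queue:
            return self.be_queue.popleft()
        return None


def simulate(rt_arrivals: Set[int], be_flits: List[str], cycles: int) -> List[Optional[str]]:
    """Drive one link for `cycles` cycles; `rt_arrivals` lists real-time arrival cycles."""
    link = OutputLink(be_queue=deque(be_flits))
    sent: List[Optional[str]] = []
    for t in range(cycles):
        if t in rt_arrivals:
            # Assumed: at most one real-time flit per link per cycle (cf. constraint (3)).
            link.rt_in = f"RT{t}"
        sent.append(link.arbitrate())
    return sent


if __name__ == "__main__":
    print(simulate({0, 2}, ["BE0", "BE1", "BE2"], 6))
```

In this sketch each real-time flit leaves in the cycle it arrives, while best-effort flits fill every cycle the real-time path leaves idle. Fairness is not enforced by the arbiter itself; it relies on the routing step keeping real-time load on each link within a bounded share, as in constraint (4).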

#### **Hard Real-time Support (HRES) router**











#### **Other Routers**





#### **Two-network router**

#### **Single shared crossbar router**





|                        | Number of ports | Cell area |
|------------------------|-----------------|-----------|
| HRES router            | 392          | 47190.44  |
| Two-Network router     | 721          | 51766.84  |
| Single crossbar router | 392          | 52812.94  |
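The relative savings implied by the table can be checked with a few lines of Python. This is plain arithmetic on the reported cell-area figures; the table does not state the area unit, so the result is expressed only as a percentage, and the helper name `area_saving_pct` is hypothetical.

```python
# Cell areas copied from the table (unit not stated in the source).
CELL_AREA = {
    "HRES router": 47190.44,
    "Two-Network router": 51766.84,
    "Single crossbar router": 52812.94,
}


def area_saving_pct(baseline: str, other: str) -> float:
    """Percentage by which the `baseline` design is smaller than the `other` design."""
    return (CELL_AREA[other] - CELL_AREA[baseline]) / CELL_AREA[other] * 100.0


if __name__ == "__main__":
    for name in ("Two-Network router", "Single crossbar router"):
        print(f"HRES vs {name}: {area_saving_pct('HRES router', name):.1f}% smaller")
```

By this arithmetic, the HRES router uses roughly 8.8% less cell area than the two-network router and about 10.6% less than the single crossbar router.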

- Two-network router duplicates wires and logic
  - Leads to more cell area
  - Requires changes to the network interface
- Single large crossbar: the switch arbitration logic is modified to give priority to real-time traffic
  - Lengthens the switch arbitration datapath and the router critical path
  - Real-time and normal traffic requests must be serialized
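The single-crossbar alternative can be pictured as a two-level switch arbiter. The sketch below is a hypothetical illustration, not the evaluated design: the function `priority_rr_arbiter` and the round-robin policy within each class are assumptions. For one output port, any real-time request is granted before any best-effort request, and both request sets pass through the same arbiter, which is where the serialization and the longer arbitration path come from.

```python
from typing import Optional, Set, Tuple


def priority_rr_arbiter(
    rt_requests: Set[int], be_requests: Set[int], last_grant: int, num_inputs: int
) -> Tuple[Optional[int], Optional[str]]:
    """Grant one input port for a single output port of a shared crossbar.

    Real-time requests are served first; within a class, the search starts
    just after `last_grant` (round-robin). Returns (input_port, class) or (None, None).
    """

    def round_robin(requests: Set[int]) -> Optional[int]:
        # Scan input ports in order, starting one past the last granted port.
        for offset in range(1, num_inputs + 1):
            port = (last_grant + offset) % num_inputs
            if port in requests:
                return port
        return None

    # First level: real-time traffic has strict priority.
    port = round_robin(rt_requests)
    if port is not None:
        return port, "RT"
    # Second level: best-effort traffic wins only when no real-time request exists.
    port = round_robin(be_requests)
    if port is not None:
        return port, "BE"
    return None, None


if __name__ == "__main__":
    # 5-input router: input 3 holds a real-time flit, inputs 0 and 1 hold best-effort flits.
    print(priority_rr_arbiter({3}, {0, 1}, last_grant=0, num_inputs=5))  # (3, 'RT')
    print(priority_rr_arbiter(set(), {0, 1}, last_grant=3, num_inputs=5))  # (0, 'BE')
```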



#### **Area and Power**








#### **QoS-Aware On-Chip Routing**



$$\sum_{i} \sum_{(u,v)\in E} f_{i[r]}(u,v) \cdot w_{i[r]}(u,v) \tag{1}$$

subject to:

$$\forall i,\ \forall (u,v) \in E \quad f_{i[r]}(u,v) \in S_{[r]}(u,v) \tag{2}$$

$$\left|S_{[r]}(u,v)\right| \le 1 \tag{3}$$

Non-starvation:

$$0 \le \sum_{i} f_{i[r]}(u,v) \le c_{[r]}(u,v) \le c(u,v) \tag{4}$$

$$\forall (u,v) \in E \quad \left(\sum_{i} f_{i}(u,v) + \sum_{i} f_{i[r]}(u,v)\right) \le c(u,v) \tag{5}$$

$$\forall i \quad f_{i[r]} \in K_{i[r]} \tag{6}$$

$$\left|K_{i[r]}\right| \le k_{i} \tag{7}$$

$$\forall i,\ \forall j \in \{1, \dots, k_{i}\} \quad \frac{\sum_{(u,v)\in E} f_{(i,j)[r]}(u,v)}{f_{(i,j)[r]}} \le \mathit{deadline}_{i} \tag{8}$$
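One way to read this formulation is as a feasibility check on a candidate real-time route assignment. The Python sketch below evaluates objective (1) and tests constraints (3), (4), (5), (7), and (8) on a toy mesh. Several simplifications are assumptions, not part of the source: the names `RTFlow` and `check_routing` are hypothetical; the weight $w_{i[r]}(u,v)$ is taken per link rather than per flow; each route $j$ of flow $i$ carries the same rate $f_{(i,j)[r]}$ on every hop; and the real-time capacity bound in (4) is passed in as `rt_cap`. Under the constant-rate assumption, the left-hand side of (8) reduces to the hop count of route $j$, so the deadline is expressed in hops. The link capacities and loads in the example are illustrative values only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


@dataclass
class RTFlow:
    """Hypothetical real-time flow i with up to k_i routes (constraint (7))."""
    name: str
    paths: List[List[str]]  # each route is a node sequence
    rate: float             # f_{(i,j)[r]}, assumed constant along every route
    k: int                  # k_i, maximum number of routes
    deadline: float         # deadline_i, in hops under the constant-rate assumption


def check_routing(
    flows: List[RTFlow],
    cap: Dict[Edge, float],      # c(u,v)
    rt_cap: Dict[Edge, float],   # real-time capacity bound in (4)
    be_load: Dict[Edge, float],  # sum_i f_i(u,v), best-effort load
    weight: Dict[Edge, float],   # link weight, simplified to one value per link
) -> Tuple[float, List[str]]:
    """Return (value of objective (1), list of violated constraints)."""
    rt_load = {e: 0.0 for e in cap}
    rt_flows_on = {e: set() for e in cap}
    objective = 0.0
    violations: List[str] = []

    for fl in flows:
        if len(fl.paths) > fl.k:
            violations.append(f"(7) {fl.name} uses {len(fl.paths)} > {fl.k} routes")
        for path in fl.paths:
            edges = list(zip(path, path[1:]))
            # (8): per-link flow summed along the route, divided by the route rate.
            if sum(fl.rate for _ in edges) / fl.rate > fl.deadline + 1e-9:
                violations.append(f"(8) {fl.name} route {path} misses its deadline")
            for e in edges:
                rt_load[e] += fl.rate
                rt_flows_on[e].add(fl.name)
                objective += fl.rate * weight[e]  # one term of objective (1)

    for e in cap:
        if len(rt_flows_on[e]) > 1:  # (2)-(3): at most one real-time flow per link
            violations.append(f"(3) link {e} carries {len(rt_flows_on[e])} real-time flows")
        if not (0.0 <= rt_load[e] <= rt_cap[e] <= cap[e]):
            violations.append(f"(4) real-time load on {e} exceeds its share")
        if be_load.get(e, 0.0) + rt_load[e] > cap[e] + 1e-9:
            violations.append(f"(5) link {e} is over capacity")
    return objective, violations


if __name__ == "__main__":
    # Toy 2x2 mesh with links A-B, B-D, A-C, C-D (illustrative values).
    links = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D")]
    cap = {e: 1.0 for e in links}
    rt_cap = {e: 0.5 for e in links}
    weight = {e: 1.0 for e in links}
    flows = [
        RTFlow("R1", [["A", "B", "D"]], rate=0.3, k=1, deadline=2),
        RTFlow("R2", [["A", "C", "D"]], rate=0.4, k=1, deadline=2),
    ]
    print(check_routing(flows, cap, rt_cap, {("A", "B"): 0.6}, weight))
```

Adding a third real-time flow over A-B-D to this example would trigger a constraint (3) violation, since that link could then no longer be dedicated to a single real-time flow.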



- A modern automobile can contain as many as 70 electronic control units (ECUs)\*
- In a hybrid electric vehicle (HEV), a motor drive with a controlled inverter system is needed to drive the electric motor powerfully and efficiently
- Mixed-criticality application: hard real-time and best-effort traffic





#### **Evaluation Results**


















## Thank You!

# More Information at [http://caes.cs.uoregon.edu/](http://caes.cs.uoregon.edu/)

