#### Range and Bitmask Analysis for Hardware Optimization in High-Level Synthesis

Jan 25, 2013 ASP-DAC



Marcel Gort Jason Anderson



# Motivation

- Software programs mostly use standard 32- and 64-bit data types to represent variables.
  - However, a loop counter that only counts to 100 does not need 32 bits!
  - Software is over-engineered in this sense, which is fine because processor datapaths are fixed-width.

# LegUp

- LegUp is an open-source high-level synthesis framework built within the LLVM compiler framework.
  - C to Verilog (supports CHStone benchmarks).
  - Targets pure HW or a processor/accelerator system.
  - Automated verification.
- Developed at the University of Toronto.
- Freely downloadable at <a href="http://legup.eecg.utoronto.ca">legup.eecg.utoronto.ca</a>

# Motivation

- High-level synthesis (HLS) generates hardware from a software program.
- Unlike in software, the efficiency of that hardware depends on the bit-level representation of variables.
- HLS needs bitwidth analysis to find the minimum bit-level representation for each variable.

# This work

- Created a new bitmask analysis approach and combined it with existing variable range analysis techniques.
- Built bitwidth analysis pass into LegUp HLS.

- Minimize variable bitwidths by propagating constants through the program instructions.
- Variables represented in one of two ways:
  - 1) As a min/max value, e.g. -2 -> 2
  - 2) As a bitmask of known values (0 or 1), unknowns (?), or sign-extended bits (S), e.g. "S?10" (focus of our work)
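The two representations above can be turned into bitwidths with a few lines of Python. This is a minimal sketch, not LegUp code: the function names `range_width` and `mask_width` are hypothetical, and the reading of `S` (a leading bit that merely copies the first non-`S` bit below it, so it needs no storage of its own) is an assumption made for illustration.

```python
def range_width(lo: int, hi: int) -> int:
    """Minimum number of bits needed to hold every integer in [lo, hi]."""
    if lo > hi:
        raise ValueError("empty range")
    if lo >= 0:
        # Unsigned: enough bits for the largest value (at least one bit).
        return max(1, hi.bit_length())
    # Signed two's complement: magnitude bits plus one sign bit.
    return max((-lo - 1).bit_length(), hi.bit_length()) + 1


def mask_width(mask: str) -> int:
    """Bits that must be stored for a bitmask string written MSB first.

    Assumption: every leading 'S' is a copy of the first non-'S' bit below
    it, so it can be regenerated by sign extension instead of being stored.
    """
    if not mask or set(mask) - set("01?S"):
        raise ValueError("mask must use only 0, 1, ? and S")
    stored = mask.lstrip("S")
    if "S" in stored:
        raise ValueError("S may only appear in the leading positions")
    return max(1, len(stored))


print(range_width(-2, 2))    # range -2 -> 2
print(range_width(0, 100))   # loop counter that counts to 100
print(mask_width("S?10"))    # bitmask "S?10"
```

Under these assumptions the range -2 -> 2 fits in 3 signed bits, a counter up to 100 fits in 7 unsigned bits, and "S?10" needs 3 stored bits.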

- Software program represented as a control dataflow graph (CDFG) of LLVM operators.
- Traverse CDFG in forward and backward directions, propagating bitwidths through operators.
- For each LLVM operator, we created forward and backward transit functions.
  - e.g. Xor, Shl, Ashr, Mul, Div, etc.
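To make the idea of per-operator functions concrete, here is a small sketch of what forward functions for And, Xor and Shl, and a backward function for Shl, could look like. It is an illustration only, not the LegUp implementation: masks are plain MSB-first strings over `0`, `1`, `?` (the `S` symbol is left out for brevity), shift amounts are assumed to be non-negative constants, and all function names are hypothetical.

```python
def _check(a: str, b: str) -> None:
    if len(a) != len(b):
        raise ValueError("masks must have the same width")


def forward_and(a: str, b: str) -> str:
    """Forward function for And: a known 0 on either side forces a 0."""
    _check(a, b)
    out = []
    for x, y in zip(a, b):
        if x == "0" or y == "0":
            out.append("0")
        elif x == "1" and y == "1":
            out.append("1")
        else:
            out.append("?")
    return "".join(out)


def forward_xor(a: str, b: str) -> str:
    """Forward function for Xor: the result is known only if both bits are."""
    _check(a, b)
    out = []
    for x, y in zip(a, b):
        if "?" in (x, y):
            out.append("?")
        else:
            out.append("1" if x != y else "0")
    return "".join(out)


def forward_shl(a: str, k: int) -> str:
    """Forward function for Shl by constant k: k known zeros enter at the LSB."""
    k = min(k, len(a))
    return a[k:] + "0" * k


def backward_shl(demanded: int, k: int) -> int:
    """Backward function for Shl by constant k.

    If users only read the low `demanded` bits of the result, only the low
    `demanded - k` bits of the shifted operand can influence them.
    """
    return max(0, demanded - k)


print(forward_and("????", "0011"))
print(forward_xor("1?10", "1100"))
print(forward_shl("????", 2))
print(backward_shl(8, 3))
```

The forward functions push known bits from operands to results; the backward function pushes the set of bits that are actually used from a result back to its operands.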



#### Forward





















#### Backward

[Slide build: long multiplication ???0 x ????; rows shown in order: ???0, ???00, ???000, + ???000]
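The multiplication example can be mirrored by a deliberately conservative forward function for Mul. The sketch below tracks only trailing zeros: every partial product of ???0 x ???? is either zero or a shifted copy of ???0, so bit 0 of the sum is known to be 0. The function names are hypothetical and the result is truncated to the operand width, which is an assumption of this sketch.

```python
def trailing_zeros(mask: str) -> int:
    """Count low-order bits of an MSB-first mask that are known to be 0."""
    return len(mask) - len(mask.rstrip("0"))


def forward_mul(a: str, b: str) -> str:
    """Conservative forward function for Mul on equal-width masks.

    Only trailing zeros are tracked: a * b is divisible by 2**(za + zb),
    where za and zb are the known trailing zeros of a and b. Every other
    result bit is reported as unknown.
    """
    if len(a) != len(b):
        raise ValueError("masks must have the same width")
    width = len(a)
    zeros = min(width, trailing_zeros(a) + trailing_zeros(b))
    return "?" * (width - zeros) + "0" * zeros


print(forward_mul("???0", "????"))
print(forward_mul("??00", "???0"))
```

A range-only analysis of the same product would have no way to express that the least significant bit is constant.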





















#### Range vs. Bitmask analysis







#### Range and bitmask analyses are complementary
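One way to see why the two analyses complement each other is the sketch below. The numbers are made up for illustration and are not taken from the original slides: a 32-bit value `i << 2`, where `i` is a loop counter in 0 -> 100. The helper names are hypothetical, and the value is assumed to be non-negative.

```python
def refine_with_range(mask: str, hi: int) -> str:
    """Mark as 0 every mask bit above the highest bit a value <= hi can set.

    Assumes the value is non-negative and lies in [0, hi].
    """
    width = len(mask)
    live = min(width, hi.bit_length())
    return "0" * (width - live) + mask[width - live:]


def unknown_bits(mask: str) -> int:
    """Number of bits that still need real hardware (not constant)."""
    return mask.count("?")


# Bitmask analysis alone: the shift makes the two low bits known zeros,
# but nothing bounds the high bits.
mask_only = "?" * 30 + "00"

# Range analysis alone: i << 2 lies in 0 -> 400, which needs 9 bits,
# but the range cannot express that the two low bits are constant.
range_only_width = (400).bit_length()

# Combined: the range clears the high bits, the mask clears the low bits.
combined = refine_with_range(mask_only, 400)

print(unknown_bits(mask_only), range_only_width, unknown_bits(combined))
```

In this toy case bitmask analysis alone leaves 30 unknown bits, range analysis alone leaves 9, and the combination leaves 7.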

- Target Altera Cyclone II FPGAs.
- Used 10 CHStone benchmarks.
  - All circuits were simulated with ModelSim after bitwidth reduction, using the golden inputs provided with CHStone, to verify correct functionality.

- Bitwidth analysis LLVM pass.
  - Result: Sum of instruction widths.
- LegUp HLS LLVM pass uses bitwidth analysis to generate minimized RTL.
- Quartus generates an optimized FPGA implementation.
  - It also minimizes bitwidth!
  - Results: Area in LUTs and registers, speed in Fmax.



- 5 flows
  - Bitmask analysis by itself
  - Range analysis by itself (Campos et al., 2012)
  - Range & Bitmask analysis
  - Profiling-based dynamic range analysis
  - Profiling-based dynamic range analysis & bitmask analysis
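The profiling-based flows differ from the static ones in that ranges come from observed executions rather than from the program text. A minimal sketch of that idea follows; the `RangeProfiler` class and the instrumented loop are hypothetical and only illustrate the principle, not how the profiling was actually implemented in this work.

```python
class RangeProfiler:
    """Records the minimum and maximum value observed for each variable."""

    def __init__(self):
        self.ranges = {}

    def observe(self, name, value):
        # Widen the stored (min, max) pair to include the new value.
        lo, hi = self.ranges.get(name, (value, value))
        self.ranges[name] = (min(lo, value), max(hi, value))
        return value


# Instrumented run of a small loop on one input set.
profiler = RangeProfiler()
total = 0
for i in range(101):
    profiler.observe("i", i)
    total = profiler.observe("total", total + i)

for name, (lo, hi) in profiler.ranges.items():
    # All observed values are non-negative here, so bit_length() suffices.
    print(name, lo, hi, max(1, hi.bit_length()))
```

Ranges obtained this way are only valid for the inputs that were profiled, so hardware sized from them is not guaranteed to be correct for other inputs.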













#### Area Results



## **Conclusions and Future work**

- Opportunities exist to optimize instruction bitwidths in HLS that are not present in RTL synthesis.
  - 9% area improvement over Quartus.
- Using range and bitmask analysis approaches together yields better results than using either in isolation.
- Excellent dynamic range-analysis results show that program information can be used to further reduce area.
  - In a hybrid system: minimized HW with a SW fallback.
  - User hints for variable use.

Bitwidth minimization will be part of the LegUp 3.0 release. Soon to be available at:

#### http://legup.eecg.utoronto.ca