Yun Chou | National Taiwan University | 1 Publication | 2 Citations

Papers

Proceedings Article

VLSI Structure-aware Placement for Convolutional Neural Network Accelerator Units

Yun Chou¹, Jhih-Wei Hsu¹, Yao-Wen Chang¹, Tung-Chieh Chen

National Taiwan University¹

05 Dec 2021

TL;DR: This paper proposes a kernel-based placement framework for CNN accelerator units, which extracts kernels from the circuit and inserts kernel-based regions to guide placement and minimize routing congestion.

Abstract: AI-dedicated hardware designs are growing dramatically for various AI applications. These designs often contain highly connected circuit structures, reflecting the complicated structure of neural networks, such as convolutional layers and fully-connected layers. As a result, such dense interconnections incur severe congestion problems in physical design that cannot be solved by conventional placement methods. This paper proposes a novel placement framework for CNN accelerator units, which extracts kernels from the circuit and inserts kernel-based regions to guide placement and minimize routing congestion. Experimental results show that our framework effectively reduces global routing congestion without wirelength degradation, significantly outperforming leading commercial tools.

2 citations

Cited by

Book Chapter

Modified Floating Point Adder and Multiplier IP Design

01 Jan 2023

TL;DR: In this article, the IEEE 754 format for single/double precision floating-point architecture is used for adding and subtracting real numbers in a single-input single-output (SISO) architecture.

Abstract: The advent of digital circuits made it easy to realize many mathematical operations using binary Boolean functions. The only problem was that the mathematical operations could run at significantly high speed and with great accuracy only for unsigned or signed integers. Various architectures were able to accelerate the performance of mathematical operations on integers. But most real-world problems require operations on real numbers, and hence either fixed-point or, more importantly, floating-point arithmetic is necessary. It is possible to run different algorithms to execute operations on floating-point numbers in an architecture designed for integers, but the number of CPU cycles required increases substantially with the complexity of the problems involved, the precision required, and the accuracy as well. It is possible to develop architectures for fixed-point real numbers, but reusing such an architecture for higher precision is quite difficult. For high accuracy or precision, the architecture hardware utilization increases exponentially. The conversion of fixed point to floating point increases hardware as well. Hence, we have the IEEE 754 format for single/double precision floating-point architecture. The algorithms provide a fixed register architecture, thereby enabling ease of architecture design along with certain acceleration if required. The algorithm predominantly consists of registers to hold the sign, exponent and mantissa, along with modules to perform the required arithmetic operations, normalization modules, rounding-off modules, and bypass circuitry to accelerate addition/subtraction and multiplication with 0, 1, etc. The IP designed here for the adder and multiplier follows the IEEE 754 format for single precision. Bypass modules have been designed for the adder and multiplier for addition and multiplication with zero. The final output is available after one clock cycle. This clock cycle is necessary to load the registers with the final answers. The normalization modules use a chain structure of multiplexers to select the required inputs based on the comparator and for easy swapping. By default, the complete modules compute zero (reset) if inputs are not applied. The control logic is designed with basic gates to determine the direction of normalization, giving faster computation and less hardware overhead. Universal Shift Registers are used to enable bidirectional shift in the normalization of multiplication, and shift registers are used for the addition operation.

Keywords: Floating point adder, IP, Multiplier
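The sign, exponent and mantissa registers and the zero bypass mentioned in the abstract can be illustrated in software. The following is a minimal sketch, not the chapter's hardware design: it assumes the standard IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits), and the function names `decode_single`, `encode_single` and `multiply_with_zero_bypass` are hypothetical. Special values (infinity, NaN) and overflow are deliberately ignored.

```python
import struct


def decode_single(x: float) -> tuple[int, int, int]:
    """Round x to IEEE 754 single precision and return (sign, biased exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits (mantissa without the hidden 1)
    return sign, exponent, fraction


def encode_single(sign: int, exponent: int, fraction: int) -> float:
    """Pack the three fields back into a single-precision value."""
    bits = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def multiply_with_zero_bypass(a: float, b: float) -> float:
    """Hypothetical bypass: if either operand is zero, skip the multiplier path.

    Assumption: infinity and NaN operands are not handled in this sketch.
    """
    sign_a, exp_a, frac_a = decode_single(a)
    sign_b, exp_b, frac_b = decode_single(b)
    if (exp_a == 0 and frac_a == 0) or (exp_b == 0 and frac_b == 0):
        # Bypass path: result is zero with the XOR of the operand signs.
        return encode_single(sign_a ^ sign_b, 0, 0)
    # Normal path: multiply, then round the product to single precision.
    return struct.unpack(">f", struct.pack(">f", a * b))[0]
```

For example, `decode_single(1.0)` yields the fields `(0, 127, 0)`: a positive sign, a biased exponent of 127 (true exponent 0), and an all-zero fraction.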

Proceedings Article

Routability-aware Placement Guidance Generation for Mixed-size Designs

05 Apr 2023

TL;DR: In this article, the authors explore using placement guidance to mitigate routing congestion and reduce design rule violations for mixed-size designs: a graph neural network extracts the underlying knowledge of a mixed-size design and generates an embedding for each standard cell.

Abstract: Placement is a critical step in a modern physical design flow, and the routability of the placement result is a major issue that must be taken into account. In this work, we explore the possibility of using placement guidance to mitigate routing congestion and reduce design rule violations for mixed-size designs. By extracting the underlying knowledge of a mixed-size design using a graph neural network, we generate an embedding for each standard cell. Based on the embeddings, we cluster standard cells into groups and create the placement guidance. By adding the placement guidance to a commercial place-and-route tool, the tool will strive to avoid the fragmentation of standard cells with dense connections in the placement stage. Experimental results show that our placement guidance generation methodology helps the commercial tool reduce routing overflow by 26% and design rule violations by 65% for the test cases.
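The step of clustering standard cells by their embeddings can be sketched in a few lines. The abstract does not say which clustering algorithm the authors use, so the sketch below substitutes plain k-means (Lloyd's algorithm) as a stand-in. The cell names, the 2-D embedding values, and the helper names `kmeans` and `group_cells` are all hypothetical; a real flow would take the embeddings from the graph neural network.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; returns one cluster label per point.

    Assumption: the first k points are used as initial centroids, a
    simplification that keeps the sketch deterministic.
    """
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def group_cells(embeddings, k):
    """Map each cluster label to the list of cell names assigned to it."""
    names = list(embeddings)
    labels = kmeans([embeddings[n] for n in names], k)
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(label, []).append(name)
    return groups


# Hypothetical toy embeddings: two tightly connected groups of cells.
toy_embeddings = {
    "inv_a": (0.0, 0.1),
    "mul_a": (5.0, 5.1),
    "inv_b": (0.1, 0.0),
    "mul_b": (5.1, 5.0),
}
cell_groups = group_cells(toy_embeddings, k=2)
```

Each resulting group would then correspond to one piece of placement guidance, asking the tool to keep the cells of that group close together.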
