Showing papers on "Physical design" published in 2021

Open Access

Journal Article•DOI•

A graph placement methodology for fast chip design

Azalia Mirhoseini¹, Anna Goldie¹ ², Mustafa Yazgan¹, Joe Wenjie Jiang¹, Ebrahim M. Songhori¹, Shen Wang¹, Young-Joon Lee¹, Eric Johnson¹, Omkar Pathak¹, Azade Nazi¹, Jiwoo Pak¹, Andy Tong¹, Kavya Srinivasa¹, William Hang², Emre Tuncer¹, Quoc V. Le¹, James Laudon¹, Richard Ho¹, Roger Carpenter¹, Jeffrey Dean¹•Institutions (2)

Google¹, Stanford University²

09 Jun 2021-Nature

TL;DR: In this article, the authors presented a deep reinforcement learning approach to chip floorplanning, which can automatically generate chip floorplans that are superior or comparable to those produced by humans in all key metrics, including power consumption, performance and chip area.

Abstract: Chip floorplanning is the engineering task of designing the physical layout of a computer chip. Despite five decades of research¹, chip floorplanning has defied automation, requiring months of intense effort by physical design engineers to produce manufacturable layouts. Here we present a deep reinforcement learning approach to chip floorplanning. In under six hours, our method automatically generates chip floorplans that are superior or comparable to those produced by humans in all key metrics, including power consumption, performance and chip area. To achieve this, we pose chip floorplanning as a reinforcement learning problem, and develop an edge-based graph convolutional neural network architecture capable of learning rich and transferable representations of the chip. As a result, our method utilizes past experience to become better and faster at solving new instances of the problem, allowing chip design to be performed by artificial agents with more experience than any human designer. Our method was used to design the next generation of Google’s artificial intelligence (AI) accelerators, and has the potential to save thousands of hours of human effort for each new generation. Finally, we believe that more powerful AI-designed hardware will fuel advances in AI, creating a symbiotic relationship between the two fields. Machine learning tools are used to greatly accelerate chip layout design by posing chip floorplanning as a reinforcement learning problem and using neural networks to generate high-performance chip layouts.
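The abstract does not spell out the training signal. A common choice for learned placers, assumed here rather than taken from the abstract, is a cheap proxy metric such as half-perimeter wirelength (HPWL). The sketch below computes HPWL over hypothetical net and placement data and negates it as an illustrative reward; the names `hpwl` and `proxy_reward` are made up for this example, and the congestion and density terms a fuller proxy would include are omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical data model: a net is a list of pin names, and a placement
# maps each pin name to its (x, y) location.
Net = List[str]
Placement = Dict[str, Tuple[float, float]]


def hpwl(net: Net, positions: Placement) -> float:
    """Half-perimeter of the bounding box enclosing all pins of one net."""
    xs = [positions[p][0] for p in net]
    ys = [positions[p][1] for p in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def proxy_reward(nets: List[Net], positions: Placement) -> float:
    """Illustrative RL reward (assumption): negative total HPWL, so shorter
    estimated wiring earns a higher reward. Congestion/density terms omitted."""
    return -sum(hpwl(net, positions) for net in nets)


positions = {"a": (0.0, 0.0), "b": (3.0, 1.0), "c": (1.0, 4.0)}
nets = [["a", "b"], ["a", "b", "c"]]
print(proxy_reward(nets, positions))  # -11.0
```

HPWL is popular as a proxy because it is exact for two-pin nets and cheap to recompute after every macro move.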

124 citations

Journal Article•DOI•

A combined three and five inputs majority gate-based high performance coplanar full adder in quantum-dot cellular automata

Fahimeh Danehdaran, Shaahin Angizi¹, Milad Bagherian Khosroshahy², Keivan Navi², Nader Bagherzadeh³•Institutions (3)

University of Central Florida¹, Shahid Beheshti University², University of California, Irvine³

01 Jun 2021-International Journal of Information Technology

TL;DR: An alternative approach for the streamlined physical design of quantum-dot cellular automata (QCA) full-adder circuits in which the placement of input cells and wire crossing congestion are substantially reduced.

Abstract: Nowadays, arithmetic computing is an important subject in computer architectures, in which the one-bit full-adder gate plays a significant role. Thus, efficient design of such a full-adder component can benefit the overall efficiency of the entire system. In this paper, a novel method for the design and simulation of a combined majority gate toward realization of the one-bit full-adder gate is proposed. We investigate an alternative approach for the streamlined physical design of quantum-dot cellular automata (QCA) full-adder circuits in which the placement of input cells and wire crossing congestion are substantially reduced. The proposed method has outstanding characteristics such as low complexity, reduced area consumption, simplified physical design, and an ultra-high-speed one-bit full-adder. Based on simulation results, the proposed design provides a 33.33% reduction in area and a 20.00% improvement in complexity, as well as a 10.49% reduction in power consumption at 1 Ek.
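The logic behind combining three- and five-input majority gates can be checked exhaustively. A minimal sketch, assuming the widely used QCA formulation Cout = MAJ3(A, B, Cin) and Sum = MAJ5(A, B, Cin, ¬Cout, ¬Cout); it models only Boolean behavior, not the authors' cell-level layout, clocking zones, or wire crossings.

```python
from itertools import product
from typing import Tuple


def maj3(a: int, b: int, c: int) -> int:
    """Three-input majority gate: 1 if at least two inputs are 1."""
    return int(a + b + c >= 2)


def maj5(a: int, b: int, c: int, d: int, e: int) -> int:
    """Five-input majority gate: 1 if at least three inputs are 1."""
    return int(a + b + c + d + e >= 3)


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Assumed majority-logic full adder; returns (sum, carry_out)."""
    cout = maj3(a, b, cin)             # carry = MAJ3(A, B, Cin)
    ncout = 1 - cout                   # inverter on the carry
    s = maj5(a, b, cin, ncout, ncout)  # sum = MAJ5(A, B, Cin, ~Cout, ~Cout)
    return s, cout


# Exhaustive check against ordinary binary addition.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
print("all 8 input combinations match")
```

Feeding ¬Cout into two of the five inputs gives it double weight, which is what lets a single MAJ5 replace the two cascaded MAJ3 gates of the classic sum expression.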

17 citations

Proceedings Article•DOI•

Global Placement with Deep Learning-Enabled Explicit Routability Optimization

Siting Liu¹, Qi Sun¹, Peiyu Liao¹, Yibo Lin², Bei Yu¹•Institutions (2)

The Chinese University of Hong Kong¹, Peking University²

01 Feb 2021

TL;DR: In this article, a fully convolutional network model is proposed to predict congestion hotspots and then incorporate this prediction model into a placement engine, DREAMPlace, to get a more route-friendly result.

Abstract: Placement and routing (PnR) is the most time-consuming part of the physical design flow. Recognizing the routing performance ahead of time can assist designers and design tools to optimize placement results in advance. In this paper, we propose a fully convolutional network model to predict congestion hotspots and then incorporate this prediction model into a placement engine, DREAMPlace, to get a more route-friendly result. The experimental results on ISPD2015 benchmarks show that with the superior accuracy of the prediction model, our proposed approach can achieve up to 9.05% reduction in congestion rate and 5.30% reduction in routed wirelength compared with the state of the art.

16 citations

Proceedings Article•DOI•

Analog Layout Placement for FinFET Technology Using Reinforcement Learning

Mehrnaz Ahmadi¹, Lihong Zhang¹•Institutions (1)

St. John's University¹

22 May 2021

TL;DR: This paper proposes a reinforcement-learning-based method that can fully automate analog layout placement optimization; it is applicable to any unseen analog placement scenario and meets the requirements of analog layout placement designs in advanced FinFET technology.

Abstract: Despite all efforts made to ease analog layout generation, designers' expertise is still in high demand in analog IC physical design. Recently, some endeavors have started to leverage artificial intelligence (AI) to tackle the complexity of analog layout optimization and reduce the reliance on designers' experience in the design process. However, these methods, which mainly rely on previous designs, are not effective on unseen data (or scenarios) that were not included in the AI training. In this paper, we have proposed a reinforcement-learning-based method that can fully automate analog layout placement optimization. It is not only applicable to any unseen analog placement scenario, but can also meet the requirements of analog layout placement designs in advanced FinFET technology. Our experimental results show that the proposed method can place analog modules subject to the defined objectives 77x faster than conventional analytical methods (e.g., conjugate gradient) without compromising optimization accuracy.

15 citations

Journal Article•DOI•

Flow-Based Microfluidic Biochips with Distributed Channel Storage: Synthesis, Physical Design, and Wash Optimization

Xing Huang¹, Wenzhong Guo², Zhisheng Chen², Bing Li¹, Tsung-Yi Ho³, Ulf Schlichtmann¹•Institutions (3)

Technische Universität München¹, Fuzhou University², National Tsing Hua University³

26 Jan 2021-IEEE Transactions on Computers

TL;DR: This article formulates the first practical system-level design and wash optimization problem for microfluidic biochips with a distributed channel-storage architecture, considering high-level synthesis, physical design, and wash optimization simultaneously, and presents a top-down design flow to solve this problem systematically.

Abstract: System-architecture design optimization of flow-based microfluidic biochips has been extensively investigated over the past decade. Most of the prior work, however, is still based on chip architectures with dedicated storage units, which not only limits the performance of biochips but also increases their fabrication cost. To overcome this limitation, a distributed channel-storage architecture can be implemented, where fluid samples are cached temporarily in flow channels instead of in a dedicated storage unit. This new concept of fluid storage, however, requires a careful arrangement of fluid samples to enable the channels to fulfill their dual functions. Moreover, to avoid cross-contamination between fluidic flows, wash operations are necessary to remove the residue left in flow channels. In this paper, we formulate the first system-level design and wash optimization problem for microfluidic biochips with distributed channel storage, considering high-level synthesis, physical design, and wash optimization simultaneously. Given the protocol of a biochemical application and the corresponding design requirements, our goal is to generate a chip architecture with minimized cost. Meanwhile, the bioassay can be executed efficiently with an optimized wash scheme. Experimental results confirm that our approach leads to short bioassay completion times, low chip cost, and high wash efficiency.

11 citations

Proceedings Article•DOI•

Snap-3D: A Constrained Placement-Driven Physical Design Methodology for Face-to-Face-Bonded 3D ICs

Pruek Vanna-iampikul¹, Chengjia Shao¹, Yi-Chen Lu¹, Sai Pentapati¹, Sung Kyu Lim¹•Institutions (1)

Georgia Institute of Technology¹

22 Mar 2021

TL;DR: In this article, a constraint-driven placement approach is proposed to build commercial-quality 3D ICs. Existing 3D flows, by contrast, first build 2D designs with commercial 2D tools and then convert them into 3D designs, a transformation that degrades design quality.

Abstract: 3D integration technology is one of the leading options that can advance Moore's Law beyond conventional scaling. Due to the absence of commercial 3D placers and routers, existing 3D physical design flows rely heavily on 2D commercial tools to handle 3D IC physical synthesis. Specifically, these flows build 2D designs first and then convert them into 3D designs. However, several works demonstrate that design quality degrades during this 2D-to-3D transformation. In this paper, we overcome this issue with Snap-3D, a constraint-driven placement approach to build commercial-quality 3D ICs. Our key idea is based on the observation that if the standard cell height is reduced by half and the cells are partitioned into multiple tiers, any commercial 2D placer can place them onto the row structure and naturally achieve high-quality 3D placement. This methodology is shown to optimize power, performance, and area (PPA) metrics across different tiers simultaneously and minimize the aforementioned design quality loss. Experimental results on 7 industrial designs demonstrate that Snap-3D achieves up to 5.4% wirelength, 10.1% power, and 92.3% total negative slack improvements compared with state-of-the-art 3D design flows.
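One way to picture the half-height trick: once each tier's cells are confined to alternate half-height rows, a 2D row index folds back into a (tier, full-height row) pair. The sketch below assumes an even/odd interleaving of rows across a two-tier face-to-face stack; that interleaving rule and the helper name `fold_half_height_rows` are illustrative assumptions, not details stated in the abstract.

```python
from typing import Dict, Tuple


def fold_half_height_rows(
    cell_rows: Dict[str, int], num_tiers: int = 2
) -> Dict[str, Tuple[int, int]]:
    """Map each cell's half-height 2D row index to (tier, full-height row).

    Assumption: rows are interleaved across tiers, so 2D row r belongs to
    tier r % num_tiers and sits in full-height row r // num_tiers of that
    tier. A cell's x coordinate is unchanged by the folding.
    """
    return {cell: (r % num_tiers, r // num_tiers) for cell, r in cell_rows.items()}


# Hypothetical output of a 2D placer that worked on half-height rows.
placed = {"u1": 0, "u2": 1, "u3": 4, "u4": 7}
print(fold_half_height_rows(placed))
# {'u1': (0, 0), 'u2': (1, 0), 'u3': (0, 2), 'u4': (1, 3)}
```

Under this assumption, cells that are neighbors in the 2D placement end up either side by side on one tier or stacked face-to-face, which is why wirelength estimates from the 2D placer carry over to the 3D result.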

10 citations

Proceedings Article•DOI•

TAP-2.5D: A Thermally-Aware Chiplet Placement Methodology for 2.5D Systems

Yenai Ma¹, Leila Delshadtehrani¹, Cansu Demirkiran¹, José L. Abellán², Ajay Joshi¹•Institutions (2)

Boston University¹, University of Murcia²

01 Feb 2021

TL;DR: TAP-2.5D is the first open-source network routing and thermally-aware chiplet placement methodology for heterogeneous 2.5D systems; it strategically inserts spacing between chiplets to jointly minimize temperature and total wirelength, which in turn increases the thermal design power envelope of the overall system.

Abstract: Heterogeneous systems are commonly used today to sustain the historic benefits we have achieved through technology scaling. 2.5D integration technology provides a cost-effective solution for designing heterogeneous systems. The traditional physical design of a 2.5D heterogeneous system closely packs the chiplets to minimize wirelength, but this leads to a thermally-inefficient design. We propose TAP-2.5D: the first open-source network routing and thermally-aware chiplet placement methodology for heterogeneous 2.5D systems. TAP-2.5D strategically inserts spacing between chiplets to jointly minimize the temperature and total wirelength, and in turn, increases the thermal design power envelope of the overall system. We present three case studies demonstrating the usage and efficacy of TAP-2.5D.

10 citations

Proceedings Article•DOI•

Design Considerations for Edge Neural Network Accelerators: An Industry Perspective

Arnab Raha¹, Kim Sang Kyun¹, Deepak A. Mathaikutty¹, Guruguhanathan Venkataramanan¹, Debabrata Mohapatra¹, Raymond Sung¹, Cormac Brick¹, Gautham N. Chinya¹•Institutions (1)

Intel¹

01 Feb 2021

TL;DR: In this article, the authors focus on Deep Neural Networks (DNN) and how to build efficient deep neural network accelerators through microarchitectural exploration, energy efficient memory hierarchies, flexible dataflow distribution, domain-specific compute optimizations and finally hardware-software co-design techniques.

Abstract: AI acceleration is one of the most actively researched fields in IP and system design. The introduction of specialized AI accelerators in the cloud and at the edge has made it possible to deploy large-scale AI solutions that automate tasks that were previously not possible without AI. The growth of big data, and of the compute horsepower needed to process this data into business intelligence, is key for several companies in gaining a competitive edge. AI workloads are both data- and compute-intensive, and improving their efficiency often requires an end-to-end solution. In this perspective paper, we identify key considerations for the design of AI accelerators. The focus of this paper is on Deep Neural Networks (DNN) and how to build efficient deep neural network accelerators through microarchitectural exploration, energy-efficient memory hierarchies, flexible dataflow distribution, domain-specific compute optimizations and, finally, hardware-software co-design techniques. The importance of interconnect topology and the impact of its scaling on the physical design of an AI accelerator is also a key consideration described in this paper. In the future, the energy efficiency of these accelerators may rely on approximate computing, compute-in-memory and runtime flexibility for significant improvement.

9 citations

Journal Article•DOI•

Analog and Mixed-Signal Layout Automation Using Digital Place-and-Route Tools

Po-Hsuan Wei¹, Boris Murmann¹•Institutions (1)

Stanford University¹

27 Aug 2021-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This work presents an AMS layout generation flow that leverages digital place-and-route (PnR) tools, amortizes setup cost with reusable primitives, and prunes layout candidates using a fast evaluation scheme.

Abstract: Today’s analog and mixed-signal (AMS) layout flow requires long manual iterations and does not leverage computing resources for data-driven optimization. This issue is further compounded by the explosion of design rules and layout-dependent effects (LDEs). We present an AMS layout generation flow that leverages digital place-and-route (PnR) tools, amortizes setup cost with reusable primitives, and prunes layout candidates using a fast evaluation scheme. We also analyze LDEs and parasitics and investigate unique challenges and mitigation strategies associated with using digital PnR tools for AMS circuits. These insights are validated with a generated StrongARM comparator and a voltage-controlled oscillator (VCO). The VCO layout was optimized in 2 h and fabricated in 16-nm FinFET CMOS. Silicon measurement results of the VCO closely track the simulation, verifying the methodology from netlist to silicon.

9 citations

Journal Article•DOI•

A survey on design and synthesis techniques for photonic integrated circuits

Sumit Sharma¹, Sudip Roy¹•Institutions (1)

Indian Institute of Technology Roorkee¹

01 May 2021-The Journal of Supercomputing

TL;DR: In this paper, the authors present an extensive survey of recent work reported in the literature on the domains of logic circuit design, synthesis, and physical design automation for implementing photonic integrated circuits (PICs).

Abstract: In recent years, silicon photonics (Si-photonics) has received significant attention among researchers due to its complementary metal-oxide-semiconductor (CMOS) compatibility and its high-speed, low-power characteristics. The integration of electronic and optical circuits on a single chip has opened up new directions of research in the domain of digital logic design and synthesis of photonic integrated circuits (PICs). Several optical switching devices using different technologies have been designed and experimentally demonstrated, which further helps in implementing PICs. To efficiently design larger, more complex and reliable PICs, photonic design automation techniques are being explored, just as electronic design automation techniques have been investigated for very-large-scale integration (VLSI) circuits. This paper presents an extensive survey of recent work reported in the literature on the domains of logic circuit design, synthesis, and physical design automation for implementing PICs. The survey starts with fundamental optical concepts and then moves to the latest research domains of design and synthesis of PICs. Finally, we provide a discussion on the challenges and the future research directions toward practically realizing Si-photonics and PICs.

9 citations

Proceedings Article•DOI•

A Unified Printed Circuit Board Routing Algorithm With Complicated Constraints and Differential Pairs

Ting-Chou Lin¹, Devon J. Merrill¹, Yen-Yi Wu¹, Chester Holtz¹, Chung-Kuan Cheng¹•Institutions (1)

University of California, San Diego¹

18 Jan 2021

TL;DR: In this article, the authors propose a full-board routing algorithm that can handle multiple real-world complicated constraints to facilitate the printed circuit board routing and produce high-quality manufacturable layouts.

Abstract: The printed circuit board (PCB) routing problem has been studied extensively in recent years. Due to continually growing net/pin counts, extremely high pin density, and unique physical constraints, the manual routing of PCBs has become a time-consuming task to reach design closure. Previous works break down the problem into escape routing and area routing and focus on these problems separately. However, there is always a gap between these two problems, requiring a massive amount of human effort to fine-tune the algorithms back and forth. Besides, previous works on area routing mainly focus on routing between escape-routed ball-grid-array (BGA) packages. Nevertheless, in practice, many components are not in the form of BGA packages, such as passive devices, decoupling capacitors, and through-hole pin arrays. To mitigate the deficiencies of previous works, we propose a full-board routing algorithm that can handle multiple real-world complicated constraints to facilitate printed circuit board routing and produce high-quality manufacturable layouts. Experimental results show that our algorithm is effective and efficient. Specifically, for all given test cases, our router can achieve 100% routability without any design rule violation, while the other two state-of-the-art routers fail to complete the routing for some test cases and incur design rule violations.

Proceedings Article•DOI•

Top-down Physical Design of Soft Embedded FPGA Fabrics

Prashanth Mohan¹, Oguz Atli¹, Onur Kibar¹, Mohammed Zackriya¹, Larry Pileggi¹, Ken Mai¹•Institutions (1)

Carnegie Mellon University¹

17 Feb 2021

TL;DR: Soft embedded FPGA redaction as discussed by the authors is a hardware obfuscation approach that allows the designer to substitute security-critical IP blocks within a design with a synthesizable eFPGA fabric.

Abstract: In recent years, IC reverse engineering and IC fabrication supply chain security have grown to become significant economic and security threats for designers, system integrators, and end customers. Many of the existing logic locking and obfuscation techniques have been shown to be vulnerable to attack once the attacker has access to the design netlist, either through reverse engineering or through an untrusted fabrication facility. We introduce soft embedded FPGA redaction, a hardware obfuscation approach that allows the designer to substitute security-critical IP blocks within a design with a synthesizable eFPGA fabric. This method fully conceals the logic and the routing of the critical IP and is compatible with standard ASIC flows for easy integration and process portability. To demonstrate eFPGA redaction, we obfuscate a RISC-V control path and a GPS P-code generator. We also show that the modified netlists are resilient to SAT attacks with moderate VLSI overheads. The secure RISC-V design has 1.89x area and 2.36x delay overhead, while the GPS design has 1.39x area and negligible delay overhead when implemented in an industrial 22nm FinFET CMOS process.

Proceedings Article•DOI•

ILP-Based Global Routing Optimization with Cell Movements

Tiago Augusto Fontana¹, Erfan Aghaeekiasaraee², Renan Netto¹, Sheiny Almeida¹, Upma Gandh², Aysa Fakheri Tabrizi², David T. Westwick², Laleh Behjat², Jose Luis Guntzel¹•Institutions (2)

Universidade Federal de Santa Catarina¹, University of Calgary²

07 Jul 2021

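TL;DR: In this article, an ILP-based global routing with cell movement (ILP-GRC) method is proposed that simultaneously moves cells and routes nets, reducing wirelength by 16% and 7% on average following academic and commercial global routers, respectively.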

Abstract: The placement and routing processes are key parts of the physical design of Integrated Circuits (IC). They directly impact the circuit’s performance, area, power consumption, and reliability. During the physical design flow, placement and routing problems are usually tackled using a divide-and-conquer approach to reduce the complexity and size of modern circuits. The increase in complexity and size of circuits means that small inefficiencies in the placement solution can be amplified during routing and severely impact the quality and convergence of the design. In this work, we propose an Integer Linear Programming (ILP)-based Global Routing with Cell Movement (ILP-GRC) that simultaneously performs cell movements and routes nets. The proposed method enables the designer to relocate cells that can lead to routing issues at an early stage and without compromising the quality with respect to wirelength and the number of vias. The proposed model is evaluated using ISPD2018 contest benchmarks. Results show that ILP-GRC is able to reduce wirelength on average by 16% and 7% following academic and commercial global routers, respectively, while moving only 2% of cells on average.
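For orientation, a textbook path-based ILP for global routing is shown below. It is a generic formulation under stated assumptions, not the ILP-GRC model itself. Here N is the set of nets, R_n a set of candidate routes for net n, w_{n,r} the cost of route r (for example, wirelength plus a via penalty), E the set of global-routing edges, and c_e the capacity of edge e. One illustrative way to fold in cell movement (an assumption) is to add to R_n the routes computed for each candidate position of a movable cell; a faithful model would then also need coupling constraints forcing all nets of that cell to agree on one position, which are omitted here.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{n \in N} \sum_{r \in R_n} w_{n,r}\, x_{n,r} \\
\text{s.t.} \quad & \sum_{r \in R_n} x_{n,r} = 1 && \forall n \in N \\
& \sum_{n \in N} \sum_{r \in R_n :\, e \in r} x_{n,r} \le c_e && \forall e \in E \\
& x_{n,r} \in \{0, 1\} && \forall n \in N,\ r \in R_n
\end{aligned}
```

The first constraint selects exactly one route per net, and the second keeps the demand on every routing edge within its capacity.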

Proceedings Article•DOI•

Read your Circuit: Leveraging Word Embedding to Guide Logic Optimization

Walter Lau Neto¹, Matheus T. Moreira, Luca Amaru², Cunxi Yu¹, Pierre-Emmanuel Gaillardon¹•Institutions (2)

University of Utah¹, Synopsys²

18 Jan 2021

TL;DR: In this paper, the authors formalize logic paths as sentences, with the gates being a bag of words, and show how word embedding can be leveraged to represent generic paths and predict if a given path is likely to be critical post-PnR.

Abstract: To tackle the involved complexity, Electronic Design Automation (EDA) tools are broken into well-defined steps, each operating at a different abstraction level. Higher levels of abstraction shorten the flow run-time while sacrificing correlation with the physical circuit implementation. Bridging this gap between logic synthesis tools and physical design (PnR) tools is key to improving Quality of Results (QoR), while possibly shortening the time-to-market. To address this problem, in this work, we formalize logic paths as sentences, with the gates being a bag of words. Thus, we show how word embedding can be leveraged to represent generic paths and predict whether a given path is likely to be critical post-PnR. We demonstrate the effectiveness of our approach, with accuracy above 90% for our test cases. Finally, we go a step further and introduce an intelligent and non-intrusive flow that uses this information to guide optimization. Our flow achieves improvements of up to 15.53% in area-delay product (ADP) and 18.56% in power-delay product (PDP) compared to a standard flow.
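To make the "paths as sentences" idea concrete, the sketch below treats each gate type on a timing path as a word and represents the path as the average of its word vectors, which is one standard way to turn a bag of words into a fixed-length vector. The tiny embedding table and cell names are hypothetical; in practice the vectors would be learned (for example, with a word2vec-style model), and a classifier trained on the resulting path vectors would predict post-PnR criticality.

```python
from typing import Dict, List

# Hypothetical 3-dimensional embeddings for a few standard-cell types.
# Real embeddings would be learned from a corpus of netlist paths.
EMBEDDINGS: Dict[str, List[float]] = {
    "INV_X1": [0.1, 0.0, 0.2],
    "NAND2_X1": [0.4, 0.1, 0.0],
    "XOR2_X1": [0.9, 0.3, 0.1],
    "DFF_X1": [0.0, 0.8, 0.5],
}


def path_vector(path: List[str]) -> List[float]:
    """Bag-of-words path representation: the mean of the gate (word) vectors.
    Gate order is ignored, as in a bag of words."""
    dim = len(next(iter(EMBEDDINGS.values())))
    total = [0.0] * dim
    for gate in path:
        for i, value in enumerate(EMBEDDINGS[gate]):
            total[i] += value
    return [t / len(path) for t in total]


sentence = ["DFF_X1", "NAND2_X1", "XOR2_X1", "INV_X1"]  # one timing path
print(path_vector(sentence))
```

Averaging yields the same vector length for paths of any depth, which is what lets a single downstream classifier score every path in the design.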

Proceedings Article•DOI•

GRA-LPO: Graph Convolution Based Leakage Power Optimization

Uday Mallappa¹, Chung-Kuan Cheng¹•Institutions (1)

University of California, San Diego¹

18 Jan 2021

TL;DR: In this paper, the authors use graph convolution neural networks to predict per-cell recoverable slack, and translate these slack values to equivalent power savings, and apply this model to several graphs with various logic-cone structures.

Abstract: Static power consumption is a critical challenge for IC designs, particularly for mobile and IoT applications. Modern design flows include a final post-layout leakage recovery step that is embedded in signoff static timing analysis tools. The goal of such recovery is to use the positive slack (if any) to recover leakage power by swapping cells with footprint-compatible variants. Though such swaps leave the routing unaltered, the hard constraint is not to introduce any new timing violations. This process can require up to tens of hours of runtime, just before tapeout, when schedule and resource constraints are tightest. Physical design teams can benefit greatly from a fast predictor of the leakage recovery step: if the eventual recovery will be too small, the entire step can be skipped, and the resources can be allocated elsewhere. If we represent the circuit netlist as a graph with cells as vertices and nets connecting these cells as edges, the leakage recovery step is an optimization step on this graph. If we can learn these optimizations over several graphs with various logic-cone structures, we can generalize the learning to unseen graphs. Using graph convolutional neural networks, we develop a learning-based model that predicts per-cell recoverable slack, and we translate these slack values to equivalent power savings. For designs of up to 1.6M instances, our inference step takes less than 12 seconds on a Tesla P100 GPU, with feature extraction and post-processing steps consuming an additional 420 seconds. The model is accurate, with relative error under 6.2% for the design-specific context.
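The graph view described here (cells as vertices, nets as edges) is exactly the input a graph convolution consumes. Below is a minimal NumPy sketch of one standard graph-convolution layer, using the symmetric-normalized propagation rule of Kipf and Welling, stacked twice to produce one score per cell. The adjacency matrix, random features, and random weights are assumptions for illustration; GRA-LPO's actual architecture, features, and slack-to-power mapping are not described in the abstract.

```python
import numpy as np


def gcn_layer(adj: np.ndarray, feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj:    (n, n) symmetric 0/1 adjacency between cells (1 if they share a net)
    feats:  (n, f) per-cell input features (hypothetical, e.g. slack, drive strength)
    weight: (f, k) learnable weights (random here)
    """
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops
    deg = a_hat.sum(axis=1)                  # degree including the self-loop
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt   # symmetric normalization
    return np.maximum(norm @ feats @ weight, 0.0)  # ReLU


# Toy 4-cell netlist: cell 1 shares a two-pin net with each of cells 0, 2, 3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3))
hidden = gcn_layer(adj, feats, rng.normal(size=(3, 8)))
# One non-negative score per cell, standing in for predicted recoverable slack.
out = gcn_layer(adj, hidden, rng.normal(size=(8, 1)))
print(out.shape)  # (4, 1)
```

Each layer mixes a cell's features with those of its net neighbors, so stacking k layers lets a prediction depend on the logic cone within k hops, which is what makes the learned model transferable to unseen graphs.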

Proceedings Article•DOI•

Design Benefits of Hybrid Bonding for 3D Integration

Theodros Nigussie, Tse-Han Pan¹, Steve Lipa¹, W. Shepherd Pitts¹, Javi DeLaCruz, Paul D. Franzon¹•Institutions (1)

North Carolina State University¹

01 Jun 2021

TL;DR: In this article, the authors present electrical and thermal analyses of 3D digital designs using hybrid bonding, specifically using the design rules and other properties of the XPERI DBI® technology at a 1.6 µm pad pitch.

Abstract: We present electrical and thermal analyses of 3D digital designs using hybrid bonding, specifically using the design rules and other properties of the XPERI DBI® technology at a 1.6 µm pad pitch. We also discuss the advantages of hybrid bonding over thermo-compression bonding (TCB) and 2D designs. Commercial 3D physical design tools were not mature when we did this work, so we developed a methodology that builds on 2D tools. Our design flow includes scripts for optimal assignment of bonding locations, partitioning of the netlist and delay constraints, and optimization techniques that involve iterating on delay constraints. Various partitioning schemes were analyzed, including targeting long nets, managing flip-flop distribution between tiers, and minimum-cut partitioning using an open-source tool. Because analysis results can vary from design to design, we propose potential metrics that can be used to identify designs that may benefit from 3DIC technology. Analysis results showed that we were able to reduce routed wirelength by up to 57%. Logic power and total power decreased by up to 34% and 22%, respectively. Silicon area also improved by 11%.

This work was supported, in part, by Xperi. DISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited.

Journal Article•DOI•

Congestion Prediction in FPGA Using Regression Based Learning Methods

Pingakshya Goswami, Dinesh Bhatia

18 Aug 2021-Electronics

TL;DR: A methodology for a post-placement, machine learning-based routing congestion prediction model for FPGAs, which shows significant improvement in terms of accuracy measured as mean absolute error and prediction time when compared against the latest state-of-the-art works.

Abstract: Design closure in general VLSI physical design flows and FPGA physical design flows is an important and time-consuming problem. Routing itself can consume as much as 70% of the total design time. Accurate congestion estimation during the early stages of the design flow can help alleviate last-minute routing-related surprises. This paper has described a methodology for a post-placement, machine learning-based routing congestion prediction model for FPGAs. Routing congestion is modeled as a regression problem. We have described the methods for generating training data, feature extractions, training, regression models, validation, and deployment approaches. We have tested our prediction model by using ISPD 2016 FPGA benchmarks. Our prediction method reports a very accurate localized congestion value in each channel around a configurable logic block (CLB). The localized congestion is predicted in both vertical and horizontal directions. We demonstrate the effectiveness of our model on completely unseen designs that are not initially part of the training data set. The generated results show significant improvement in terms of accuracy measured as mean absolute error and prediction time when compared against the latest state-of-the-art works.
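Framing congestion as regression means fitting a function from per-location features to a congestion value and scoring it by mean absolute error (MAE). The sketch below uses ordinary least squares on synthetic data with two hypothetical features per channel location (think of pin density and local net count near a CLB). The paper's actual features and regression models are not given in this abstract, and the numbers here are made up for illustration only.

```python
import numpy as np


def fit_linear(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares fit of y ≈ [x, 1] @ w; returns w with the bias last."""
    x1 = np.hstack([x, np.ones((x.shape[0], 1))])
    w, *_ = np.linalg.lstsq(x1, y, rcond=None)
    return w


def predict(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Apply the fitted linear model (with bias) to feature rows x."""
    return np.hstack([x, np.ones((x.shape[0], 1))]) @ w


def mean_absolute_error(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Average absolute difference between true and predicted congestion."""
    return float(np.mean(np.abs(y_true - y_pred)))


# Synthetic stand-in data: 2 hypothetical features per channel location.
rng = np.random.default_rng(1)
x_train = rng.uniform(0, 1, size=(200, 2))
y_train = 0.6 * x_train[:, 0] + 0.3 * x_train[:, 1] + rng.normal(0, 0.02, 200)
x_test = rng.uniform(0, 1, size=(50, 2))  # locations from an "unseen" design
y_test = 0.6 * x_test[:, 0] + 0.3 * x_test[:, 1]

w = fit_linear(x_train, y_train)
print("MAE:", mean_absolute_error(y_test, predict(x_test, w)))
```

Since the abstract predicts congestion separately in the vertical and horizontal directions, one natural arrangement (again an assumption) is to fit two such models, one per direction.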

Proceedings Article•DOI•

Power delivery and thermal-aware arm-based multi-tier 3D architecture

Lingjun Zhu¹, Tuan Ta², Rossana Liu, Rahul Mathur, Xiaoqing Xu, Shidhartha Das, Ankit Kaul¹, Alejandro Rico, Doug Joseph, Brian Cline, Sung Kyu Lim¹•Institutions (2)

Georgia Institute of Technology¹, Cornell University²

26 Jul 2021

TL;DR: In this paper, the authors explore and evaluate multiple design options for an Arm Neoverse-based 3D architecture, focusing on power and thermals at a 7nm process and sub-10 µm pitch.

Abstract: 3D integration is becoming a cost-effective way to incorporate more CPU cores and memory to improve the performance of computing systems. Meanwhile, due to the higher power density, power delivery and thermal issues become more significant in multi-tier 3DICs. In this paper, we explore and evaluate multiple design options for an Arm Neoverse-based 3D architecture, focusing on power and thermals at a 7nm process and sub-10 µm pitch. Using a rapid voltage-drop and thermal analysis methodology, we model a system with a 32-core CPU layer and up to 4 layers of system-level caches, and quantify the trade-offs between performance, cost, voltage drop, and temperature. A 3-layer configuration shows a good balance, with a 17% IPC gain and 17% lower cost, while incurring a 15mV worse voltage drop and 8.5°C higher temperature compared with 2D. Our studies suggest that the co-optimization of system architecture, technology, and physical design is key for high-performance 3D systems.

Proceedings Article•DOI•

Impact of Physical Design on PUF Behavior: A Statistical Study

Sayed Elgendy¹, Eslam Yahya Tawfik¹•Institutions (1)

Ohio State University¹

22 May 2021

TL;DR: This work conducts a statistical analysis to quantify the effect of physical layout on the randomness of multiple copies of the same PUF structure deployed relatively close together on the same FPGA die.

Abstract: FPGA-based designs dominate many applications because of their faster implementation, configurability, and low design cost. These applications need a root of trust to be secured against malicious activities. Physical unclonable functions (PUFs) are a promising security primitive that can be used to identify silicon dies. To effectively distinguish between different dies, a PUF should satisfy a set of quality metrics. FPGA-based PUFs are sensitive to factors such as systematic variation, placement, and routing. In this work, we conduct a statistical analysis to quantify the effect of physical layout on the randomness of multiple copies of the same PUF structure deployed relatively close together on the same FPGA die. As a case study, we adopt an FPGA-based ring PUF structure known as the bistable ring PUF. The results show that only 2 of the 64 PUF instances exhibit good randomness. Even if 40%-60% randomness is considered an acceptable range, only 10 of the 64 PUFs pass this criterion. Moreover, most of the remaining PUFs are extremely biased toward a single state. As a result, 84.4% of the PUFs under test end up non-functional because of the physical design of the FPGA.
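
The acceptance criterion above can be made concrete with a short sketch. It assumes, as is common in PUF evaluation, that "randomness" is measured as uniformity, the fraction of 1 responses a PUF produces across challenges; the study's exact metric may differ. The response traces are hypothetical. The arithmetic also shows that the reported 84.4% is consistent with 54 of 64 PUFs (all but the 10 that pass the 40%-60% band) being non-functional.

```python
def uniformity(responses):
    """Fraction of 1 bits in a PUF's responses (ideal value: 0.5)."""
    if not responses:
        raise ValueError("need at least one response bit")
    return sum(responses) / len(responses)


def passes_band(responses, low=0.40, high=0.60):
    """True if the PUF's uniformity lies in the accepted [low, high] band."""
    return low <= uniformity(responses) <= high


def nonfunctional_fraction(num_passing, total):
    """Share of PUF instances that fail the acceptance criterion."""
    return (total - num_passing) / total


# Hypothetical response traces: one balanced PUF, one stuck near a single state.
balanced = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
biased = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
print(passes_band(balanced), passes_band(biased))  # True False

# 10 of 64 PUFs pass the 40%-60% band in the study.
print(f"{nonfunctional_fraction(10, 64):.1%}")  # 84.4%
```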

Journal Article•DOI•

PathDriver+: Enhanced Path-Driven Architecture Design for Flow-Based Microfluidic Biochips

Xing Huang¹, Youlin Pan², Grace Li Zhang¹, Bing Li¹, Wenzhong Guo², Tsung-Yi Ho³, Ulf Schlichtmann¹•Institutions (3)

Technische Universität München¹, Fuzhou University², National Tsing Hua University³

10 Aug 2021-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: PathDriver+ integrates the actual fluid manipulations into both high-level synthesis and physical design of microfluidic biochips, which has not been considered in prior work.

Abstract: Continuous-flow microfluidic biochips have attracted considerable research interest in recent years. Inside such a chip, fluid samples of milliliter volumes are efficiently transported between devices (e.g., mixers and heaters) to automatically perform various laboratory procedures in biology and biochemistry. Each transportation task, however, requires an exclusive flow path composed of multiple contiguous microchannels during its execution period. Meanwhile, excess/waste fluids should be discarded through independent flow paths connected to waste ports. All these paths are etched in a tiny chip area using multilayer soft lithography and driven by flow ports connected to external pressure sources, forming a highly integrated chip architecture that determines the final performance of biochips. In this paper, we propose a new and practical design flow called PathDriver+ for the architecture design of microfluidic biochips, integrating the actual fluid manipulations into both high-level synthesis and physical design, which has never been considered in prior work. With this design flow, highly efficient chip architectures with a flow-path network that enables the actual fluid transportation and removal can be constructed automatically. In addition, fluid volume management between devices and flow-path minimization are realized for the first time, thus ensuring the correctness of assay outcomes while reducing the complexity of chip architectures. Diagonal channel routing is also implemented to fundamentally improve chip performance. The tradeoff between the numbers of channel intersections and fluidic ports is evaluated to further reduce the fabrication cost of biochips. Experimental results on multiple benchmarks confirm that the proposed design flow leads to high assay execution efficiency and low overall chip cost.

Journal Article•DOI•

Design and implementation of energy-efficient near-threshold standard cell library for IoT applications

AbdelRahman Hesham¹, Amin Nassar¹, Hassan Mostafa¹•Institutions (1)

Cairo University¹

01 Sep 2021-AEU - International Journal of Electronics and Communications

TL;DR: This paper introduces a low-energy, minimum-area CMOS standard cell library suited to IoT applications, using two techniques to achieve significant energy savings.

Abstract: This paper introduces a low-energy, minimum-area CMOS standard cell library suited to IoT applications. The paper uses two techniques to achieve significant energy savings. The first is to design the library to operate in the near-threshold voltage (NTV) region. The second is to create cell layouts with the minimum area achievable in a given technology process. To partially recover the speed lost by operating in the NTV region, pMOS performance is boosted by a proposed body-biasing technique that connects the pMOS body to ground. Furthermore, minimum energy consumption is considered when selecting the library supply voltage and the transistor sizing of each cell, while keeping the library operating in the range of 1 MHz to 20 MHz, which is sufficient for IoT applications. Another challenge of NTV operation is the sensitivity of performance to process variations; this sensitivity is analyzed, and a design solution is provided to ensure timing closure despite it. The UMC 130 nm CMOS process technology was used to design and characterize the proposed library. Library timing and physical views were created to enable its use in both synthesis and physical design tools. The library was benchmarked on three cryptographic algorithms, AEGIS-128, ASCON, and AEZ, to show its benefit for IoT applications. The maximum achieved frequencies for these cores are 14 MHz, 18 MHz, and 16 MHz, and the corresponding energy consumption values are 4.25 pJ, 10.03 pJ, and 30.57 pJ, respectively.

Journal Article•DOI•

High-Performance Logic-on-Memory Monolithic 3-D IC Designs for Arm Cortex-A Processors

Lingjun Zhu¹, Lennart Bamberg², Sai Pentapati¹, Kyungwook Chang³, Francky Catthoor⁴, Dragomir Milojevic⁴, Manu Komalan⁴, Brian Cline, Saurabh Sinha, Xiaoqing Xu, Alberto Garcia-Ortiz², Sung Kyu Lim¹•Institutions (4)

Georgia Institute of Technology¹, University of Bremen², Sungkyunkwan University³, Katholieke Universiteit Leuven⁴

30 Apr 2021-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: In this article, the authors developed a new physical design flow that optimally places and routes cache modules in one tier and logic gates in the other, and developed a signoff analysis tool flow to evaluate power, performance, area (PPA), thermal, and voltage-drop quality for given M3-D designs.

Abstract: Monolithic 3-D IC (M3-D) is a promising solution to improve the performance and energy efficiency of modern processors. However, designers face challenges in design tools and methodologies, especially for power and thermal verification. We developed a new physical design flow that optimally places and routes cache modules in one tier and logic gates in the other. Our tool also builds high-quality clock and power delivery networks targeting logic-on-memory M3-D designs. Finally, we developed a sign-off analysis tool flow to evaluate the power, performance, area (PPA), thermal, and voltage-drop quality of given M3-D designs. Using our complete register transfer level (RTL)-to-Graphic Design System (GDS) tool flow, we designed commercial-quality 2-D and M3-D implementations of Arm Cortex-A7 and Cortex-A53 processors in a commercial 28-nm technology. Experimental results show that our 3-D processors offer 20% (A7) and 21% (A53) performance gains compared with their 2-D commercial counterparts. The voltage-drop degradation of our 3-D Cortex-A7 and Cortex-A53 processors is less than 3% of the supply voltage, while the temperature increase is 10.71 °C and 13.04 °C, respectively.

Proceedings Article•DOI•

Asymmetric Aging Avoidance EDA Tool

Freddy Gabbay¹, Avi Mendelson², Basel Salameh², Majd Ganaiem²•Institutions (2)

Ruppin Academic Center¹, Technion – Israel Institute of Technology²

23 Aug 2021

TL;DR: In this paper, the authors proposed a flow and a tool to minimize the asymmetric aging effect in data path design structures, which can be straightforwardly integrated as part of standard design flows of large-scale ICs.

Abstract: The latest process technologies have become highly susceptible to asymmetric aging, whereby the timing of logical elements degrades at unequal rates over the element lifetime, causing severe reliability concerns. Although several tools are available to handle asymmetric aging, they mainly rely on circuit or physical design approaches and offer limited capability to handle large-scale ICs. In this paper, we introduce a flow and a tool to minimize the asymmetric aging effect in datapath design structures. The proposed tool can be straightforwardly integrated into standard design flows of large-scale ICs. In addition, the tool can automatically analyze designs at RTL or gate level and identify logical elements that are susceptible to asymmetric aging. As part of the design flow, the tool automatically embeds special logic circuitry in the design to eliminate asymmetric aging. Our experimental analysis shows that the proposed design flow can minimize the asymmetric aging effect and eliminate reliability concerns while introducing minor power and silicon area overhead.

Journal Article•DOI•

High-Level Annotation of Routing Congestion for Xilinx Vivado HLS Designs

Osama Bin Tariq¹, Junnan Shan¹, Georgios Floros², Christos Sotiriou², Mario R. Casu¹, Mihai Teodor Lazarescu¹, Luciano Lavagno¹•Institutions (2)

Polytechnic University of Turin¹, University of Thessaly²

19 Mar 2021-IEEE Access

TL;DR: In this article, a back-annotation flow is proposed to summarize the routing congestion issues at the source level by analyzing the reports from the FPGA physical design tools and the internal debugging files of the HLS tools.

Abstract: Ever since transistor cost stopped decreasing, customized programmable platforms, such as field-programmable gate arrays (FPGAs), have become a major way to improve software execution performance and energy consumption. While software developers can use high-level synthesis (HLS) to speed up register-transfer level (RTL) code generation from C++ or OpenCL source code, placement and routing issues, such as congestion, can still prevent the generation of an FPGA programming bitstream or dramatically reduce FPGA implementation performance. Congestion reports from physical design tools refer to thousands of RTL signal names instead of developer-accessible identifiers and statements, which considerably complicates developers' understanding and resolution of the issues at the source level. We propose a high-level back-annotation flow that summarizes routing congestion issues at the source level by analyzing the reports from the FPGA physical design tools and the internal debugging files of the HLS tools. Our flow describes congestion using comments back-annotated on the source code and identifies whether the congestion is caused by on-chip memories or by DSP units (multipliers/adders), the shared resources that are very often associated with routing problems on FPGAs. We demonstrate on realistic large designs how the information provided by our flow helps to quickly spot congestion causes at the source level and to solve them using appropriate HLS directives.
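
The core of such a back-annotation step, mapping congested RTL signal names back to source statements and summarizing them as source comments, can be sketched as follows. Everything here is hypothetical: the signal names, the `signal_to_source` table (which, in the flow described above, would be recovered from the HLS tool's internal debugging files), and the comment format. The sketch does not parse real Vivado HLS or Vivado reports.

```python
from collections import Counter


def summarize_congestion(congested_signals, signal_to_source):
    """Count congested RTL signals per (file, line) source location."""
    counts = Counter()
    for signal in congested_signals:
        location = signal_to_source.get(signal)
        if location is not None:  # skip signals with no known source origin
            counts[location] += 1
    return counts


def annotate_source(lines, file_name, counts):
    """Append a congestion comment to each source line that has hits."""
    annotated = []
    for lineno, text in enumerate(lines, start=1):
        hits = counts.get((file_name, lineno), 0)
        if hits:
            plural = "" if hits == 1 else "s"
            annotated.append(f"{text}  // congestion: {hits} RTL signal{plural}")
        else:
            annotated.append(text)
    return annotated


# Hypothetical data: two congested signals come from the multiply on line 2.
signal_to_source = {
    "mul_ln2_1": ("kernel.cpp", 2),
    "mul_ln2_2": ("kernel.cpp", 2),
    "buf_addr_0": ("kernel.cpp", 3),
}
congested = ["mul_ln2_1", "mul_ln2_2", "buf_addr_0", "unknown_net"]
source = ["for (int i = 0; i < N; i++) {", "  acc += a[i] * b[i];", "  buf[i] = acc;", "}"]

counts = summarize_congestion(congested, signal_to_source)
for line in annotate_source(source, "kernel.cpp", counts):
    print(line)
```

Aggregating per source location, rather than listing raw RTL names, is what turns thousands of report entries into a handful of statements a developer can act on.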

Journal Article•DOI•

A hybrid circuits-cloud: Development of a low-cost secure cloud-based collaborative platform for A/D circuits in virtual hardware E-lab

Shaffee Mayoof, Hasan Al-Aswad¹, Sameer Aljeshi¹, Ahmed Tarafa¹, Wael Elmedany¹•Institutions (1)

College of Information Technology¹

01 Jun 2021-Ain Shams Engineering Journal

TL;DR: In this article, a hybrid circuits-cloud platform that enables students to design, simulate, and model both analog and digital electronic systems is presented.

Proceedings Article•DOI•

Physical Synthesis for Advanced Neural Network Processors

Zhuolun He¹, Peiyu Liao¹, Siting Liu¹, Yuzhe Ma¹, Yibo Lin², Bei Yu¹•Institutions (2)

The Chinese University of Hong Kong¹, Peking University²

18 Jan 2021

TL;DR: In this paper, the authors review several physical synthesis techniques for advanced neural network processors and argue that datapath design is an essential methodology in the above procedures due to the organized computational graph of neural networks.

Abstract: The remarkable breakthroughs in deep learning have led to a dramatic thirst for computational resources to tackle interesting real-world problems. Various neural network processors have been proposed for this purpose, yet far less attention has been paid to physical synthesis for such specialized processors, especially in advanced technology nodes. In this paper, we review several physical synthesis techniques for advanced neural network processors. We especially argue that datapath design is an essential methodology in these procedures because of the organized computational graph of neural networks. As a case study, we investigate a wafer-scale deep learning accelerator placement problem in detail.

Proceedings Article•DOI•

A Scalable and Robust Hierarchical Floorplanning to Enable 24-hour Prototyping for 100k-LUT FPGAs

Ganesh Gore¹, Xifan Tang¹, Pierre-Emmanuel Gaillardon¹•Institutions (1)

University of Utah¹

22 Mar 2021

TL;DR: In this paper, the authors present scalable and adaptive hierarchical floorplanning strategies that significantly reduce physical design runtime and enable millions-of-LUT FPGA layout implementations using standard ASIC toolchains.

Abstract: Physical design for Field Programmable Gate Arrays (FPGAs) is challenging and time-consuming, primarily because a full-custom approach is used to aggressively optimize the Performance, Power and Area (PPA) of the FPGA design. The growing number of FPGA applications demands novel architectures and shorter development cycles. The use of an automated toolchain is essential to reduce end-to-end development time. This paper presents scalable and adaptive hierarchical floorplanning strategies that significantly reduce physical design runtime and enable millions-of-LUT FPGA layout implementations using standard ASIC toolchains. The approach mainly exploits the regularity of the design and creates the necessary feedthroughs for global and clock nets, eliminating the need for global optimization. To validate this approach, we implemented full-chip layouts for modern FPGA fabrics with logic capacities ranging from 40 to 100k LUTs in a commercial 12nm technology. Our results show that the physical implementation of a 128k-LUT FPGA fabric can be achieved within 24 hours, which has not been demonstrated by any previous work. Compared with previous work, an 8x runtime reduction is obtained when implementing a 2.5k-LUT FPGA device.

Journal Article•DOI•

Convex restrictions in physical design

Guillermo Angeris¹, Jelena Vuckovic¹, Stephen Boyd¹•Institutions (1)

Stanford University¹

21 Jun 2021-Scientific Reports

TL;DR: In this article, physical design problems in which each design parameter is the ratio of two field variables, as in photonic design with real scalar fields and diffusion-type systems, are shown to reduce to a convex optimization problem that can be solved efficiently and globally, given the sign of an optimal field at every point.

Abstract: In a physical design problem, the designer chooses values of some physical parameters, within limits, to optimize the resulting field. We focus on the specific case in which each physical design parameter is the ratio of two field variables. This form occurs for photonic design with real scalar fields, diffusion-type systems, and others. We show that such problems can be reduced to a convex optimization problem, and therefore solved efficiently and globally, given the sign of an optimal field at every point. This observation suggests a heuristic in which the signs of the field are iteratively updated. This heuristic appears to have good practical performance on diffusion-type problems (including thermal design and resistive circuit design) and some control problems, while exhibiting moderate performance on photonic design problems. We also show that in many practical cases there exist globally optimal designs whose design parameters are maximized or minimized at each point in the domain, i.e., that there is a discrete globally optimal structure.
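
Why a known sign makes the problem convex can be shown in a few lines. The notation is ours and may differ from the paper's: at each point i, the design parameter theta_i is the ratio of two field variables z_i and y_i (assumed nonzero), it is bounded between theta_i^min and theta_i^max, and s_i is the assumed sign of y_i. Multiplying the bounds by |y_i| = s_i y_i >= 0 preserves the inequalities and yields constraints that are linear in the fields:

```latex
\theta_i = \frac{z_i}{y_i}, \qquad
\theta_i^{\min} \le \theta_i \le \theta_i^{\max}, \qquad
s_i = \operatorname{sign}(y_i) \in \{-1, +1\}

% Multiply the bounds by |y_i| = s_i y_i \ge 0, using \theta_i |y_i| = s_i z_i:
\theta_i^{\min}\, s_i y_i \;\le\; s_i z_i \;\le\; \theta_i^{\max}\, s_i y_i,
\qquad s_i y_i \ge 0
```

With every s_i fixed, these are linear inequalities in (y, z); assuming the field equations are also linear in (y, z) and the objective is convex, the restricted problem is a convex program. Under this reading, the sign-update heuristic amounts to re-solving it with revised s_i.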

Proceedings Article•DOI•

Development of an IP-cores Libraries as Part of the Design Flow of Integrated Circuits on FPGA

V.M. Khvatov¹, D. A. Zheleznikov¹•Institutions (1)

Russian Academy of Sciences¹

26 Jan 2021

TL;DR: This article describes the types and forms of IP-core libraries used in the design flow developed by IPPM RAS for Russian FPGAs, and the challenges of developing libraries for logic synthesis and automatic mapping onto the existing basis.

Abstract: An IP-core is a block with a complex function that can be reused in integrated circuit design. There are two types of FPGA IP-cores: hard IP-cores and soft IP-cores. Hard IP-cores have an exact location and pre-routed interconnects, while soft IP-cores can be synthesized from logic elements and should be placed and routed. To use IP-cores in an automated design flow for integrated circuits on FPGAs, it is necessary to develop IP-core libraries that allow blocks to be identified at every stage of the flow. This article shows the types and forms of IP-core libraries used in the design flow developed by IPPM RAS for Russian FPGAs. It describes the challenges of developing libraries for logic synthesis and automatic mapping onto the existing basis. The paper presents the libraries needed by the CAD tools at every stage of physical design: clustering, placement, and routing. It also considers the distinctive features of soft and hard IP-core libraries and the methods of forming them, taking the FPGA architecture into account.

Proceedings Article•DOI•

Automatic Surrogate Model Generation and Debugging of Analog/Mixed-Signal Designs Via Collaborative Stimulus Generation and Machine Learning

Jun Yang Lei¹, Abhijit Chatterjee¹•Institutions (1)

Georgia Institute of Technology¹

18 Jan 2021

TL;DR: In this article, the authors present a guided test generation algorithm that explores the input stimulus space and generates new stimuli which are likely to excite differences between the model and its netlist description.

Abstract: In top-down analog and mixed-signal design, a key problem is to ensure that the netlist or physical design does not contain unanticipated behaviors. Mismatches between netlist-level circuit descriptions and high-level behavioral models need to be captured at all stages of the design process, both for accurate system-level simulation and for fast convergence of the design. To support this, we present a guided test generation algorithm that explores the input stimulus space and generates new stimuli that are likely to excite differences between the model and its netlist description. Subsequently, a recurrent neural network (RNN)-based learning model is used to learn divergent model and netlist behaviors and absorb them into the model to minimize these differences. The process is repeated iteratively, and in each iteration a Bayesian optimization algorithm is used to find optimal RNN hyperparameters that maximize behavior learning. The result is a circuit-accurate behavioral model that is also much faster to simulate than a circuit simulator. A further sub-goal is design bug diagnosis: tracing the source of observed behavioral anomalies down to individual modules or fine levels of circuit detail. An optimization-based diagnosis approach using Volterra learning kernels, which is easily integrated into circuit simulators, is proposed. Results on representative circuits are presented.
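
The idea of steering stimuli toward model/netlist disagreement can be illustrated with a toy sketch. It deliberately swaps the paper's guided test-generation algorithm for plain random search, and it uses two hypothetical stand-ins: a linear behavioral model and a saturating "netlist" response. The RNN learning and Bayesian hyperparameter optimization steps are not shown.

```python
import math
import random


def behavioral_model(stimulus):
    """Hypothetical high-level model: an ideal amplifier with gain 2."""
    return [2.0 * x for x in stimulus]


def netlist_sim(stimulus):
    """Hypothetical circuit-level response: the same amplifier, but saturating."""
    return [math.tanh(2.0 * x) for x in stimulus]


def divergence(stimulus, model, netlist):
    """Worst-case absolute mismatch between model and netlist outputs."""
    return max(abs(m - n) for m, n in zip(model(stimulus), netlist(stimulus)))


def find_divergent_stimuli(model, netlist, n_candidates=200, length=16, k=3, seed=0):
    """Random search: keep the k candidate stimuli that expose the largest mismatch."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_candidates):
        stimulus = [rng.uniform(-1.0, 1.0) for _ in range(length)]
        scored.append((divergence(stimulus, model, netlist), stimulus))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


for score, _ in find_divergent_stimuli(behavioral_model, netlist_sim):
    print(f"mismatch {score:.3f}")
```

In this toy setup, stimuli with large amplitudes score highest because that is where the stand-in netlist saturates; in the flow described above, such high-mismatch stimuli would then drive the model-update step.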
