Showing papers on "Physical design" published in 2021


Journal ArticleDOI
09 Jun 2021-Nature
TL;DR: In this article, the authors presented a deep reinforcement learning approach to chip floorplanning, which can automatically generate chip floorplans that are superior or comparable to those produced by humans in all key metrics, including power consumption, performance and chip area.
Abstract: Chip floorplanning is the engineering task of designing the physical layout of a computer chip. Despite five decades of research, chip floorplanning has defied automation, requiring months of intense effort by physical design engineers to produce manufacturable layouts. Here we present a deep reinforcement learning approach to chip floorplanning. In under six hours, our method automatically generates chip floorplans that are superior or comparable to those produced by humans in all key metrics, including power consumption, performance and chip area. To achieve this, we pose chip floorplanning as a reinforcement learning problem, and develop an edge-based graph convolutional neural network architecture capable of learning rich and transferable representations of the chip. As a result, our method utilizes past experience to become better and faster at solving new instances of the problem, allowing chip design to be performed by artificial agents with more experience than any human designer. Our method was used to design the next generation of Google’s artificial intelligence (AI) accelerators, and has the potential to save thousands of hours of human effort for each new generation. Finally, we believe that more powerful AI-designed hardware will fuel advances in AI, creating a symbiotic relationship between the two fields. Machine learning tools are used to greatly accelerate chip layout design, by posing chip floorplanning as a reinforcement learning problem and using neural networks to generate high-performance chip layouts.

124 citations


Journal ArticleDOI
TL;DR: An alternative approach for the streamlined physical design of quantum-dot cellular automata (QCA) full-adder circuits in which the placement of input cells and wire crossing congestion are substantially reduced.
Abstract: Nowadays, arithmetic computing is an important subject in computer architectures in which the one-bit full-adder gate plays a significant role. Thus, efficient design of such full-adder component can be beneficial to the overall efficiency of the entire system. In this essay, a novel method for the design and simulation of a combined majority gate toward realization of the one-bit full-adder gate is proposed. We inspect an alternative approach for the streamlined physical design of quantum-dot cellular automata (QCA) full-adder circuits in which the placement of input cells and wire crossing congestion are substantially reduced. The proposed method has outstanding characteristics such as low complexity, reduced area consumption, simplified physical design, and ultra-high speed one-bit full-adder. Based on simulation results the proposed design provides 33.33% reduction in area and 20.00% improvement in complexity as well as 10.49% in 1 Ek reduction in power consumption.

17 citations


Proceedings ArticleDOI
01 Feb 2021
TL;DR: In this article, a fully convolutional network model is proposed to predict congestion hotspots and then incorporate this prediction model into a placement engine, DREAMPlace, to get a more route-friendly result.
Abstract: Placement and routing (PnR) is the most time-consuming part of the physical design flow. Recognizing the routing performance ahead of time can assist designers and design tools to optimize placement results in advance. In this paper, we propose a fully convolutional network model to predict congestion hotspots and then incorporate this prediction model into a placement engine, DREAMPlace, to get a more route-friendly result. The experimental results on ISPD2015 benchmarks show that with the superior accuracy of the prediction model, our proposed approach can achieve up to 9.05% reduction in congestion rate and 5.30% reduction in routed wirelength compared with the state-of-the-art.

16 citations


Proceedings ArticleDOI
22 May 2021
TL;DR: This paper proposes a reinforcement-learning-based method that can fully automate analog layout placement optimization; it is not only applicable to any unseen analog placement scenarios, but can also meet the requirements of analog layout placement designs in the advanced FinFET technology.
Abstract: Despite all efforts being made to ease analog layout generation, the designers' expertise is still highly demanded in the process of analog IC physical design. Recently, some endeavors started to leverage artificial intelligence (AI) to tackle the complexity of analog layout optimization and alleviate the high demand for the designers' experience in the design process. However, these methods, which mainly rely on using the previous designs, are not effective to the unseen data (or scenarios) that were not included in the AI training. In this paper, we have proposed a reinforcement-learning-based method that can fully automate analog layout placement optimization. It is not only applicable to any unseen analog placement scenarios, but also can meet the requirements of analog layout placement designs in the advanced FinFET technology. Our experimental results show that the proposed method can place analog modules subject to the defined objectives 77x faster than the conventional analytical methods (e.g., conjugate gradient) without compromising the optimization accuracy.

15 citations


Journal ArticleDOI
TL;DR: This article formulates the first practical system-level design and wash optimization problem for microfluidic biochips with distributed channel-storage architecture, considering high-level synthesis, physical design, and wash optimization simultaneously, and presents a top-down design flow to solve this problem systematically.
Abstract: System-architecture design optimization of flow-based microfluidic biochips has been extensively investigated over the past decade. Most of the prior work, however, is still based on chip architectures with dedicated storage units and this, not only limits the performance of biochips, but also increases their fabrication cost. To overcome this limitation, a distributed channel-storage architecture can be implemented, where fluid samples can be cached temporarily in flow channels instead of using a dedicated storage. This new concept of fluid storage, however, requires a careful arrangement of fluid samples to enable the channels to fulfill their dual functions. Moreover, to avoid cross-contamination between fluidic flows, wash operations are necessary to remove the residue left in flow channels. In this paper, we formulate the first system level design and wash optimization problem for microfluidic biochips with distributed channel storage, considering high-level synthesis, physical design, and wash optimization simultaneously. Given the protocol of a biochemical application and the corresponding design requirements, our goal is to generate a chip architecture with minimized cost. Meanwhile the bioassay can be executed efficiently with an optimized wash scheme. Experimental results confirm that our approach leads to short completion time of bioassays, low chip cost, and high wash efficiency.

11 citations


Proceedings ArticleDOI
22 Mar 2021
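TL;DR: In this article, Snap-3D, a constraint-driven placement approach, is proposed to build commercial-quality 3D ICs. Standard cell heights are contracted by one half and the cells are partitioned into multiple tiers, so that any commercial 2D placer can place them onto the row structure and achieve high-quality 3D placement.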
Abstract: 3D integration technology is one of the leading options that can advance Moore's Law beyond conventional scaling. Due to the absence of commercial 3D placers and routers, existing 3D physical design flows rely heavily on 2D commercial tools to handle 3D IC physical synthesis. Specifically, these flows build 2D designs first and then convert them into 3D designs. However, several works demonstrate that design qualities degrade during this 2D-3D transformation. In this paper, we overcome this issue with our Snap-3D, a constraint-driven placement approach to build commercial-quality 3D ICs. Our key idea is based on the observation that if the standard cell height is contracted by one half and partitioned into multiple tiers, any commercial 2D placer can place them onto the row structure and naturally achieve high-quality 3D placement. This methodology is shown to optimize power, performance, and area (PPA) metrics across different tiers simultaneously and minimize the aforementioned design quality loss. Experimental results on 7 industrial designs demonstrate that Snap-3D achieves up to 5.4% wirelength, 10.1% power, and 92.3% total negative slack improvements compared with state-of-the-art 3D design flows.

10 citations


Proceedings ArticleDOI
01 Feb 2021
TL;DR: TAP-2.5D as discussed by the authors is the first open-source network routing and thermally-aware chiplet placement methodology for heterogeneous 2.5D systems, which strategically inserts spacing between chiplets to jointly minimize the temperature and total wirelength, and in turn increases the thermal design power envelope of the overall system.
Abstract: Heterogeneous systems are commonly used today to sustain the historic benefits we have achieved through technology scaling. 2.5D integration technology provides a cost-effective solution for designing heterogeneous systems. The traditional physical design of a 2.5D heterogeneous system closely packs the chiplets to minimize wirelength, but this leads to a thermally-inefficient design. We propose TAP-2.5D: the first open-source network routing and thermally-aware chiplet placement methodology for heterogeneous 2.5D systems. TAP-2.5D strategically inserts spacing between chiplets to jointly minimize the temperature and total wirelength, and in turn, increases the thermal design power envelope of the overall system. We present three case studies demonstrating the usage and efficacy of TAP-2.5D.

10 citations


Proceedings ArticleDOI
01 Feb 2021
TL;DR: In this article, the authors focus on Deep Neural Networks (DNN) and how to build efficient deep neural network accelerators through microarchitectural exploration, energy efficient memory hierarchies, flexible dataflow distribution, domain-specific compute optimizations and finally hardware-software co-design techniques.
Abstract: AI acceleration is one of the most actively researched fields in IP and system design. The introduction of specialized AI accelerators in the cloud and at the edge has made it possible to deploy large-scale AI solutions that automate tasks that were previously not possible without AI. The growth of big data and the compute horsepower needed to process this data to provide business intelligence is key for several companies in gaining a competitive edge. AI workloads are both data and compute intensive, and improving their efficiency often requires an end-to-end solution. In this perspective paper, we identify key considerations for the design of AI accelerators. The focus of this paper is on Deep Neural Networks (DNN) and how to build efficient deep neural network accelerators through microarchitectural exploration, energy efficient memory hierarchies, flexible dataflow distribution, domain-specific compute optimizations and finally hardware-software co-design techniques. The importance of interconnect topology and the impact of its scaling on the physical design of an AI accelerator is also a key consideration that is described in this paper. In the future, the energy efficiency of these accelerators may rely on approximation computing, compute-in-memory and runtime flexibility for significant improvement.

9 citations


Journal ArticleDOI
TL;DR: This work presents an AMS layout generation flow that leverages digital place-and-route (PnR) tools, amortizes setup cost with reusable primitives, and prunes layout candidates using a fast evaluation scheme.
Abstract: Today’s analog and mixed-signal (AMS) layout flow requires long manual iterations and does not leverage computing resources for data-driven optimization. This issue is further compounded by the explosion of design rules and layout-dependent effects (LDEs). We present an AMS layout generation flow that leverages digital place-and-route (PnR) tools, amortizes setup cost with reusable primitives, and prunes layout candidates using a fast evaluation scheme. We also analyze LDEs and parasitics and investigate unique challenges and mitigation strategies associated with using digital PnR tools for AMS circuits. These insights are validated with a generated StrongARM comparator and a voltage-controlled oscillator (VCO). The VCO layout was optimized in 2 h and fabricated in 16-nm FinFET CMOS. Silicon measurement results of the VCO closely track the simulation, verifying the methodology from netlist to silicon.

9 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present an extensive survey of recent work reported in the literature on the domains of logic circuit design, synthesis, and physical design automation for implementing photonic integrated circuits (PICs).
Abstract: In recent years, silicon photonics (Si-photonics) have received significant attention among researchers due to complementary metal-oxide semiconductor compatibility, and the characteristics of high-speed and low-power dissipation. The integration of electronic and optical circuits on a single chip has opened up new directions of research in the domain of digital logic design and synthesis of photonic integrated circuits (PICs). Several optical switching devices using different technologies have been designed and experimentally demonstrated, which further helps in implementing PICs. In order to efficiently design larger, complex and reliable PICs, the photonic design automation techniques are being explored as electronic design automation techniques have been investigated in case of very large-scale integration circuits. This paper presents an extensive survey of recent work reported in the literature on the domains of logic circuit design, synthesis, and physical design automation for implementing PICs. The aim of this survey is to start with the fundamental optical concepts and then move to the latest research domains of design and synthesis of PICs. Finally, we provide a discussion on the challenges and the future research directions toward practically realizing the Si-photonics and PICs.

9 citations


Proceedings ArticleDOI
18 Jan 2021
TL;DR: In this article, the authors propose a full-board routing algorithm that can handle multiple real-world complicated constraints to facilitate the printed circuit board routing and produce high-quality manufacturable layouts.
Abstract: The printed circuit board (PCB) routing problem has been studied extensively in recent years. Due to continually growing net/pin counts, extremely high pin density, and unique physical constraints, the manual routing of PCBs has become a time-consuming task to reach design closure. Previous works break down the problem into escape routing and area routing and focus on these problems separately. However, there is always a gap between these two problems, requiring a massive amount of human effort to fine-tune the algorithms back and forth. Besides, previous works on area routing mainly focus on routing between escaping routed ball-grid-array (BGA) packages. Nevertheless, in practice, many components are not in the form of BGA packages, such as passive devices, decoupling capacitors, and through-hole pin arrays. To mitigate the deficiencies of previous works, we propose a full-board routing algorithm that can handle multiple real-world complicated constraints to facilitate the printed circuit board routing and produce high-quality manufacturable layouts. Experimental results show that our algorithm is effective and efficient. Specifically, for all given test cases, our router can achieve 100% routability without any design rule violation, while the other two state-of-the-art routers fail to complete the routing for some test cases and incur design rule violations.

Proceedings ArticleDOI
17 Feb 2021
TL;DR: Soft embedded FPGA redaction as discussed by the authors is a hardware obfuscation approach that allows the designer to substitute security-critical IP blocks within a design with a synthesizable eFPGA fabric.
Abstract: In recent years, IC reverse engineering and IC fabrication supply chain security have grown to become significant economic and security threats for designers, system integrators, and end customers. Many of the existing logic locking and obfuscation techniques have been shown to be vulnerable to attack once the attacker has access to the design netlist, either through reverse engineering or through an untrusted fabrication facility. We introduce soft embedded FPGA redaction, a hardware obfuscation approach that allows the designer to substitute security-critical IP blocks within a design with a synthesizable eFPGA fabric. This method fully conceals the logic and the routing of the critical IP and is compatible with standard ASIC flows for easy integration and process portability. To demonstrate eFPGA redaction, we obfuscate a RISC-V control path and a GPS P-code generator. We also show that the modified netlists are resilient to SAT attacks with moderate VLSI overheads. The secure RISC-V design has 1.89x area and 2.36x delay overhead while the GPS design has 1.39x area and negligible delay overhead when implemented on an industrial 22nm FinFET CMOS process.

Proceedings ArticleDOI
07 Jul 2021
TL;DR: In this article, an ILP-based global routing with cell movement (ILP-GRC) is proposed to reduce the wirelength of ICs by 16% and 7% on average.
Abstract: The placement and routing processes are key parts of the physical design of Integrated Circuits (IC). They directly impact the circuit’s performance, area, power consumption, and reliability. During the physical design flow, placement and routing problems are usually tackled using a divide-and-conquer approach to reduce the complexity and size of modern circuits. The increase in complexity and size of circuits means that small inefficiencies in the placement solution can be amplified during routing and severely impact the quality and convergence of the design. In this work, we propose an Integer Linear Programming (ILP)-based Global Routing with Cell Movement (ILP-GRC) that simultaneously performs cell movements and routes nets. The proposed method enables the designer to relocate cells that can lead to routing issues at an early stage and without compromising the quality with respect to wirelength and the number of vias. The proposed model is evaluated using ISPD2018 contest benchmarks. Results show that ILP-GRC is able to reduce wirelength on average by 16% and 7% following academic and commercial global routers, respectively, while moving only 2% of cells on average.
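To make the link between cell movement and wirelength concrete, here is a minimal sketch using half-perimeter wirelength (HPWL), a standard bounding-box estimate. It is not the ILP model from the paper: the netlist, the coordinates, and the function names `hpwl` and `total_hpwl` are all hypothetical, chosen only to show how relocating a single cell can shrink the estimated wirelength of every net attached to it.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net, given (x, y) pin positions."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets, positions):
    """Sum the HPWL over all nets; nets maps net name -> list of cell names."""
    return sum(hpwl([positions[c] for c in cells]) for cells in nets.values())


# Hypothetical placement: cell "d" sits far away from the cells it connects to.
positions = {"a": (0, 0), "b": (4, 0), "c": (4, 3), "d": (10, 8)}
nets = {"n1": ["a", "b", "d"], "n2": ["c", "d"]}

before = total_hpwl(nets, positions)

# Move only cell "d" closer to its neighbours and re-estimate.
moved = dict(positions, d=(5, 3))
after = total_hpwl(nets, moved)

print(before, after)
```

In this toy case both nets touch the moved cell, so one relocation reduces the estimate for the whole design. A real flow would also have to check legality, congestion, and via count, which this sketch ignores.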

Proceedings ArticleDOI
18 Jan 2021
TL;DR: In this paper, the authors formalize logic paths as sentences, with the gates being a bag of words, and show how word embedding can be leveraged to represent generic paths and predict if a given path is likely to be critical post-PnR.
Abstract: To tackle the involved complexity, Electronic Design Automation (EDA) tools are broken into well-defined steps, each operating at a different abstraction level. Higher levels of abstraction shorten the flow run-time while sacrificing correlation with the physical circuit implementation. Bridging this gap between logic synthesis tools and physical design (PnR) tools is key to improving Quality of Results (QoR), while possibly shortening the time-to-market. To address this problem, in this work, we formalize logic paths as sentences, with the gates being a bag of words. Thus, we show how word embedding can be leveraged to represent generic paths and predict if a given path is likely to be critical post-PnR. We present the effectiveness of our approach, with accuracy over 90% for our test-cases. Finally, we go a step further and introduce an intelligent and non-intrusive flow that uses this information to guide optimization. Our flow presents up to 15.53% area delay product (ADP) and 18.56% power delay product (PDP), compared to a standard flow.
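As a rough illustration of the path-as-sentence idea, the sketch below encodes a logic path as a bag of gate-type counts. The gate names, the vocabulary, and the function `path_to_bag_of_words` are hypothetical, and a plain count vector is used in place of the learned word embedding described in the abstract, which is not reproduced here.

```python
from collections import Counter


def path_to_bag_of_words(path, vocabulary):
    """Return a gate-count vector for a path, ordered by the vocabulary."""
    counts = Counter(path)
    return [counts.get(gate, 0) for gate in vocabulary]


# Hypothetical gate vocabulary and one logic path (launch flop to capture flop).
vocabulary = ["DFF", "INV", "NAND2", "NOR2", "XOR2"]
path = ["DFF", "NAND2", "INV", "NAND2", "XOR2", "DFF"]

vector = path_to_bag_of_words(path, vocabulary)
print(vector)
```

Because a bag of words discards ordering, two paths with the same gates in a different order map to the same vector. A classifier trained on such vectors could then label paths as likely critical or not, but that step is outside this sketch.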

Proceedings ArticleDOI
18 Jan 2021
TL;DR: In this paper, the authors use graph convolution neural networks to predict per-cell recoverable slack, and translate these slack values to equivalent power savings, and apply this model to several graphs with various logic-cone structures.
Abstract: Static power consumption is a critical challenge for IC designs, particularly for mobile and IoT applications. A final post-layout step in modern design flows involves a leakage recovery step that is embedded in signoff static timing analysis tools. The goal of such recovery is to make use of the positive slack (if any) and recover the leakage power by performing cell swaps with footprint-compatible variants. Though such swaps result in unaltered routing, the hard constraint is not to introduce any new timing violations. This process can require up to tens of hours of runtime, just before the tapeout, when schedule and resource constraints are tightest. The physical design teams can benefit greatly from a fast predictor of the leakage recovery step: if the eventual recovery will be too small, the entire step can be skipped, and the resources can be allocated elsewhere. If we represent the circuit netlist as a graph with cells as vertices and nets connecting these cells as edges, the leakage recovery step is an optimization step on this graph. If we can learn these optimizations over several graphs with various logic-cone structures, we can generalize the learning to unseen graphs. Using graph convolution neural networks, we develop a learning-based model that predicts per-cell recoverable slack and translates these slack values to equivalent power savings. For designs up to 1.6M instances, our inference step takes less than 12 seconds on a Tesla P100 GPU, with additional feature extraction and post-processing steps consuming 420 seconds. The model is accurate with relative error under 6.2%, for the design-specific context.
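The graph representation mentioned in the abstract (cells as vertices, nets as edges) can be sketched as follows. This is only an illustration under stated assumptions: the netlist format, the cell and net names, and the function `netlist_to_graph` are hypothetical, and multi-pin nets are expanded into cliques, which is one common modelling choice and not necessarily the one used by the authors.

```python
from itertools import combinations


def netlist_to_graph(nets):
    """Build an undirected cell-adjacency map from {net_name: [cell, ...]}.

    Assumption: every net becomes a clique over the cells it connects.
    """
    adjacency = {}
    for cells in nets.values():
        unique_cells = sorted(set(cells))
        for cell in unique_cells:
            adjacency.setdefault(cell, set())
        for a, b in combinations(unique_cells, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Hypothetical netlist: n1 is a two-pin net, n2 connects three cells.
nets = {"n1": ["u1", "u2"], "n2": ["u2", "u3", "u4"]}
graph = netlist_to_graph(nets)

for cell in sorted(graph):
    print(cell, sorted(graph[cell]))
```

An adjacency structure like this is the usual input to a graph convolution layer, where each cell aggregates features from its neighbours. Per-cell features such as slack or cell type would be attached separately.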

Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this article, the authors present electrical and thermal analyses of 3D digital designs using hybrid bonding, specifically using the design rules, and other properties, for the XPERI DBI® technology at a $\mathrm{1.6}\ \mu \mathrm{m}$ pad pitch.
Abstract: We present electrical and thermal analyses of 3D digital designs using hybrid bonding, specifically using the design rules, and other properties, for the XPERI DBI® technology at a $\mathrm{1.6}\ \mu \mathrm{m}$ pad pitch. We also go over the advantages of hybrid bonding over thermo-compression bonding (TCB) and 2D designs. Commercial 3D physical design tools were not mature when we did this work, so we came up with a methodology that builds on 2D tools. Our design flow includes scripts for optimal assignment of bonding locations, partitioning of netlist and delay constraints, and optimization techniques that involve iterating on delay constraints. Various partitioning schemes that include targeting long nets, managing flip-flop distribution between tiers, and minimum cut partitioning using an open source tool were analyzed. Because analysis results could vary from design to design, we propose potential metrics that can be used to identify designs that may benefit from 3DIC technology. Analysis results showed that we were able to reduce routed wire length by up to 57%. Logic power and total power decreased by up to 34% and 22% respectively. Silicon area also improved by 11%. This work was supported, in part, by Xperi. DISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited.

Journal ArticleDOI
TL;DR: A methodology for a post-placement, machine learning-based routing congestion prediction model for FPGAs, which shows significant improvement in terms of accuracy measured as mean absolute error and prediction time when compared against the latest state-of-the-art works.
Abstract: Design closure in general VLSI physical design flows and FPGA physical design flows is an important and time-consuming problem. Routing itself can consume as much as 70% of the total design time. Accurate congestion estimation during the early stages of the design flow can help alleviate last-minute routing-related surprises. This paper has described a methodology for a post-placement, machine learning-based routing congestion prediction model for FPGAs. Routing congestion is modeled as a regression problem. We have described the methods for generating training data, feature extractions, training, regression models, validation, and deployment approaches. We have tested our prediction model by using ISPD 2016 FPGA benchmarks. Our prediction method reports a very accurate localized congestion value in each channel around a configurable logic block (CLB). The localized congestion is predicted in both vertical and horizontal directions. We demonstrate the effectiveness of our model on completely unseen designs that are not initially part of the training data set. The generated results show significant improvement in terms of accuracy measured as mean absolute error and prediction time when compared against the latest state-of-the-art works.
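The accuracy metric named in the abstract, mean absolute error (MAE), is simple to state in code. The sketch below is a generic MAE calculation over per-channel congestion values. The numbers and the function name `mean_absolute_error` are hypothetical and do not come from the paper's benchmarks.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over paired samples."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical congestion values (routing demand / capacity) for four channels.
predicted = [0.50, 0.75, 1.00, 0.25]
actual = [0.25, 0.75, 0.50, 0.50]

mae = mean_absolute_error(predicted, actual)
print(mae)
```

A lower MAE means the predicted congestion in each channel is, on average, closer to the value observed after routing.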

Proceedings ArticleDOI
26 Jul 2021
TL;DR: In this paper, the authors explore and evaluate multiple design options for an Arm Neoverse-based 3D architecture focusing on power and thermals at 7nm process and sub-10$\mu $m pitch.
Abstract: 3D integration is becoming a cost-effective way to incorporate more CPU cores and memory to improve the performance of computing systems. Meanwhile, due to the higher power density, power delivery and thermal issues become more significant in multi-tier 3DICs. In this paper, we explore and evaluate multiple design options for an Arm Neoverse-based 3D architecture focusing on power and thermals at 7nm process and sub-10$\mu $m pitch. Using a rapid voltage-drop and thermal analysis methodology, we model a system with a 32-core CPU layer and up to 4 layers of system-level caches, and quantify the trade-offs between performance, cost, voltage-drop, and temperature. A 3-layer configuration shows a good balance with 17% IPC gain and 17% lower cost, while incurring 15mV worse voltage drop and 8.5°C higher temperature compared with 2D. Our studies suggest that the co-optimization of system architecture, technology, and physical design is key for high-performance 3D systems.

Proceedings ArticleDOI
22 May 2021
TL;DR: This work conducts a statistical analysis to quantify the effect of physical layout on the randomness of multiple copies of the same PUF structure that are deployed relatively close together on the same FPGA die.
Abstract: FPGA-based designs dominate many applications for their faster implementation, configurability, and low design cost. These applications need a root of trust to be secured against malicious activities. Physical unclonable functions (PUFs) are a promising security primitive that can be used to identify silicon dies. To effectively distinguish between different dies, a PUF should satisfy a set of quality metrics. FPGA-based PUFs are susceptible to parameters such as systematic variation and placement and routing. In this work, we conduct a statistical analysis to quantify the effect of physical layout on the randomness of multiple copies of the same PUF structure that are deployed relatively close together on the same FPGA die. As a case study, we have adopted an FPGA-based ring PUF structure known as the bistable ring PUF. The results show that only 2 out of the 64 PUF structures show good randomness behavior. Even when considering 40%-60% randomness as an acceptable range, only 10 out of 64 PUFs pass this criterion. Moreover, most of the remaining PUFs are extremely biased toward a single state. As a result, 84.4% of the PUFs under test end up as non-functional PUFs due to the physical design of the FPGA.
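The reported 84.4% follows directly from the counts in the abstract: if 10 of 64 PUFs pass the 40%-60% criterion, the other 54 fail, and 54/64 = 84.375%. The sketch below reproduces that arithmetic and adds a hypothetical pass/fail check. Interpreting "randomness" as the fraction of 1s in a PUF's responses is an assumption made here for illustration, as is the function name `is_acceptably_random`.

```python
def is_acceptably_random(responses, low=0.40, high=0.60):
    """Assumed criterion: the fraction of 1-bits lies within [low, high]."""
    ones_fraction = sum(responses) / len(responses)
    return low <= ones_fraction <= high


# Counts taken from the abstract.
total_pufs = 64
passing_pufs = 10
non_functional_pct = 100 * (total_pufs - passing_pufs) / total_pufs
print(f"{non_functional_pct:.3f}% of PUFs fail the 40%-60% criterion")

# Hypothetical response bit-strings: one balanced, one stuck at a single state.
balanced = [1, 0, 1, 0, 1, 0, 0, 1]
biased = [1, 1, 1, 1, 1, 1, 1, 1]
print(is_acceptably_random(balanced), is_acceptably_random(biased))
```

The second example mirrors the abstract's observation that most failing PUFs are heavily biased toward one state, which drives the 1-bit fraction far outside the acceptable band.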

Journal ArticleDOI
TL;DR: PathDriver+ as mentioned in this paper integrates the actual fluid manipulations into both high-level synthesis and physical design of microfluidic biochips, which has never been considered in prior work.
Abstract: Continuous-flow microfluidic biochips have attracted high research interest over the past years. Inside such a chip, fluid samples of milliliter volumes are efficiently transported between devices (e.g., mixers, heaters, etc.) to automatically perform various laboratory procedures in biology and biochemistry. Each transportation task, however, requires an exclusive flow path composed of multiple contiguous microchannels during its execution period. Excess/waste fluids, in the meantime, should be discarded by independent flow paths connected to waste ports. All these paths are etched in a very tiny chip area using multilayer soft lithography and driven by flow ports connecting with external pressure sources, forming a highly integrated chip architecture that determines the final performance of biochips. In this paper, we propose a new and practical design flow called PathDriver+ for the architecture design of microfluidic biochips, integrating the actual fluid manipulations into both high-level synthesis and physical design, which has never been considered in prior work. With this design flow, highly efficient chip architectures with a flow-path network that enables the actual fluid transportation and removal can be constructed automatically. Meanwhile, fluid volume management between devices and flowpath minimization are realized for the first time, thus ensuring the correctness of assay outcomes while reducing the complexity of chip architectures. Additionally, diagonal channel routing is implemented to fundamentally improve the chip performance. The tradeoff between the numbers of channel intersections and fluidic ports is evaluated to further reduce the fabrication cost of biochips. Experimental results on multiple benchmarks confirm that the proposed design flow leads to high assay execution efficiency and low overall chip cost.

Journal ArticleDOI
TL;DR: A low-energy, minimum-area CMOS standard-cell library suited to IoT applications is introduced in this paper, which uses two solutions to provide significant energy saving.
Abstract: A low-energy, minimum-area CMOS standard-cell library suited to IoT applications is introduced in this paper. The paper uses two solutions to provide significant energy saving. The first is to design the library to operate in the Near-Threshold Voltage (NTV) region. The second is to create cell layouts at the minimum possible area that can be achieved for a given technology process. To partially recover the speed loss due to operating in the NTV region, the pMOS performance is boosted by a proposed body biasing technique that connects the pMOS body to ground. Furthermore, minimum energy consumption is considered in the selection of the library supply voltage and of each cell's transistor sizing, while keeping the library performing in the range of 1 MHz up to 20 MHz. This range is sufficient for IoT applications. Another challenge for NTV operation is performance sensitivity to process variations, which is analyzed, and a design solution is provided to assure timing closure with such sensitivity. The UMC 130 nm CMOS process technology was used to design and characterize the proposed library. Library timing and physical views were created to enable its usage in both synthesis and physical design tools. The library was benchmarked on three cryptography algorithms to show the benefit for IoT applications. The algorithms used are AEGIS-128, ASCON, and AEZ. The maximum achieved frequency for these cores is 14 MHz, 18 MHz, and 16 MHz, and the corresponding energy consumption is 4.25 pJ, 10.03 pJ, and 30.57 pJ, respectively.

Journal ArticleDOI
TL;DR: In this article, the authors developed a new physical design flow that optimally places and routes cache modules in one tier and logic gates in the other, and developed a signoff analysis tool flow to evaluate power, performance, area (PPA), thermal, and voltage-drop quality for given M3-D designs.
Abstract: Monolithic 3-D IC (M3-D) is a promising solution to improve the performance and energy efficiency of modern processors. However, designers face challenges in design tools and methodologies, especially for power and thermal verification. We developed a new physical design flow that optimally places and routes cache modules in one tier and logic gates in the other. Our tool also builds high-quality clock and power delivery networks targeting logic-on-memory M3-D designs. Finally, we developed a sign-off analysis tool flow to evaluate power, performance, area (PPA), thermal, and voltage-drop quality for given M3-D designs. Using our complete register transfer level (RTL)-to-Graphic Design System (GDS) tool flow, we designed commercial-quality 2-D and M3-D implementations of Arm Cortex-A7 and Cortex-A53 processors in a commercial 28-nm technology. Experimental results show that our 3-D processors offer 20% (A7) and 21% (A53) performance gain, compared with their 2-D commercial counterparts. The voltage-drop degradation of our 3-D Cortex-A7 and Cortex-A53 processors is less than 3% of the supply voltage, while the temperature increase is 10.71 °C and 13.04 °C, respectively.

Proceedings ArticleDOI
23 Aug 2021
TL;DR: In this paper, the authors proposed a flow and a tool to minimize the asymmetric aging effect in data path design structures, which can be straightforwardly integrated as part of standard design flows of large-scale ICs.
Abstract: The latest process technologies have become highly susceptible to asymmetric aging, whereby the timing of logical elements degrades at unequal rates over the element lifetime, causing severe reliability concerns. Although several tools are available to handle asymmetric aging, such tools mainly rely on circuit or physical design approaches and offer a limited capability to handle large-scale ICs. In this paper, we introduce a flow and a tool to minimize the asymmetric aging effect in data path design structures. The proposed tool can be straightforwardly integrated as part of standard design flows of large-scale ICs. In addition, the tool can automatically analyze various designs at RTL or gate level and identify logical elements that are susceptible to asymmetric aging. As part of the design flow, the tool automatically embeds special logic circuitry in the design to eliminate asymmetric aging. Our experimental analysis shows that the proposed design flow can minimize the asymmetric aging effect and eliminate reliability concerns while introducing minor power and silicon area overhead.

Journal ArticleDOI
TL;DR: In this article, a back-annotation flow is proposed to summarize the routing congestion issues at the source level by analyzing the reports from the FPGA physical design tools and the internal debugging files of the HLS tools.
Abstract: Ever since transistor cost stopped decreasing, customized programmable platforms, such as field-programmable gate arrays (FPGAs), became a major way to improve software execution performance and energy consumption. While software developers can use high-level synthesis (HLS) to speed up register-transfer level (RTL) code generation from C++ or OpenCL source code, placement and routing issues, such as congestion, can still prevent achieving an FPGA programming bitstream or dramatically reduce the FPGA implementation performance. Congestion reports from physical design tools refer to thousands of RTL signal names instead of developer-accessible identifiers and statements, considerably complicating the developer understanding and resolution of the issues at the source level. We propose a high-level back-annotation flow that summarizes the routing congestion issues at the source level by analyzing the reports from the FPGA physical design tools and the internal debugging files of the HLS tools. Our flow describes congestion using comments back-annotated on the source code and identifies if the congestion causes are the on-chip memories or the DSP units (multipliers/adders), which are the shared resources very often associated with routing problems on FPGAs. We demonstrate on realistic large designs how the information provided by our flow helps to quickly spot congestion causes at the source level and to solve them using appropriate HLS directives.

Journal ArticleDOI
TL;DR: In this article, a hybrid-circuits cloud-based platform that enables students to design, simulate, and model both analog and digital electronic systems is presented. However, it is not suitable for use in virtual learning.

Proceedings ArticleDOI
18 Jan 2021
TL;DR: In this paper, the authors review several physical synthesis techniques for advanced neural network processors and argue that datapath design is an essential methodology in the above procedures due to the organized computational graph of neural networks.
Abstract: The remarkable breakthroughs in deep learning have led to a dramatic thirst for computational resources to tackle interesting real-world problems. Various neural network processors have been proposed for the purpose, yet far fewer discussions have been made on the physical synthesis for such specialized processors, especially in advanced technology nodes. In this paper, we review several physical synthesis techniques for advanced neural network processors. We especially argue that datapath design is an essential methodology in the above procedures due to the organized computational graph of neural networks. As a case study, we investigate a wafer-scale deep learning accelerator placement problem in detail.

Proceedings ArticleDOI
22 Mar 2021
TL;DR: In this paper, the authors present scalable and adaptive hierarchical floorplanning strategies to significantly reduce the physical design runtime and enable millions-of-LUT FPGA layout implementations using standard ASIC toolchains.
Abstract: Physical design for Field Programmable Gate Arrays (FPGAs) is challenging and time-consuming, primarily due to the use of a full-custom approach to aggressively optimize the Performance, Power and Area (PPA) of the FPGA design. The growing number of FPGA applications demands novel architectures and shorter development cycles. The use of an automated toolchain is essential to reduce end-to-end development time. This paper presents scalable and adaptive hierarchical floorplanning strategies to significantly reduce the physical design runtime and enable millions-of-LUT FPGA layout implementations using standard ASIC toolchains. This approach mainly exploits the regularity of the design and performs the necessary feedthrough creation for global and clock nets to eliminate any requirement for global optimizations. To validate this approach, we implemented full-chip layouts for modern FPGA fabric with logic capacity ranging from 40 to 100k LUTs using a commercial 12nm technology. Our results show that the physical implementation of a 128k-LUT FPGA fabric can be achieved within 24 hours, which has not been demonstrated by any previous work. Compared to previous work, a runtime reduction of 8x is obtained for implementing a 2.5k-LUT FPGA device.

Journal ArticleDOI
TL;DR: In this article, the authors show that physical design problems in which each design parameter is the ratio of two field variables, as in photonic design with real scalar fields and diffusion-type systems, can be reduced to a convex optimization problem and solved globally, given the sign of an optimal field at every point.
Abstract: In a physical design problem, the designer chooses values of some physical parameters, within limits, to optimize the resulting field. We focus on the specific case in which each physical design parameter is the ratio of two field variables. This form occurs for photonic design with real scalar fields, diffusion-type systems, and others. We show that such problems can be reduced to a convex optimization problem, and therefore efficiently solved globally, given the sign of an optimal field at every point. This observation suggests a heuristic, in which the signs of the field are iteratively updated. This heuristic appears to have good practical performance on diffusion-type problems (including thermal design and resistive circuit design) and some control problems, while exhibiting moderate performance on photonic design problems. We also show that in many practical cases there exist globally optimal designs whose design parameters are maximized or minimized at each point in the domain, i.e., that there is a discrete globally optimal structure.

Proceedings ArticleDOI
26 Jan 2021
TL;DR: This article shows the types and forms of IP-core libraries used in the design flow developed by IPPM RAS for Russian FPGAs, and describes the challenges of developing libraries for logic synthesis and of automatic mapping onto an existing basis.
Abstract: An IP-core is a block with a complex function that can be reused in integrated circuit design. There are two types of FPGA IP-cores: hard and soft. Hard IP-cores have an exact location and pre-routed interconnects, while soft IP-cores can be synthesized from logic elements and still need to be placed and routed. To use IP-cores in an automated FPGA design flow, it is necessary to develop IP-core libraries that allow blocks to be identified at every stage of the flow. This article shows the types and forms of IP-core libraries used in the design flow developed by IPPM RAS for Russian FPGAs. It describes the challenges of developing libraries for logic synthesis and of automatic mapping onto an existing basis. The paper presents the libraries needed by CAD tools at every stage of physical design: clustering, placement, and routing. It also considers the distinct features of soft and hard IP-core libraries and the methods of forming them, taking into account the FPGA architecture.

Proceedings ArticleDOI
18 Jan 2021
TL;DR: In this article, the authors present a guided test generation algorithm that explores the input stimulus space and generates new stimuli which are likely to excite differences between the model and its netlist description.
Abstract: In top-down analog and mixed-signal design, a key problem is to ensure that the netlist or physical design does not contain unanticipated behaviors. Mismatches between netlist level circuit descriptions and high level behavioral models need to be captured at all stages of the design process for accuracy of system level simulation as well as fast convergence of the design. To support the above, we present a guided test generation algorithm that explores the input stimulus space and generates new stimuli which are likely to excite differences between the model and its netlist description. Subsequently, a recurrent neural network (RNN) based learning model is used to learn divergent model and netlist behaviors and absorb them into the model to minimize these differences. The process is repeated iteratively and in each iteration, a Bayesian optimization algorithm is used to find optimal RNN hyperparameters to maximize behavior learning. The result is a circuit-accurate behavioral model that is also much faster to simulate than a circuit simulator. In addition, another sub-goal is to perform design bug diagnosis to track the source of observed behavioral anomalies down to individual modules or small levels of circuit detail. An optimization-based diagnosis approach using Volterra learning kernels that is easily integrated into circuit simulators is proposed. Results on representative circuits are presented.