Showing papers on "Very-large-scale integration" published in 2019


Proceedings Article
02 Jun 2019
TL;DR: A novel GPU-accelerated placement framework DREAMPlace is proposed, by casting the analytical placement problem equivalently to training a neural network, to achieve over 30 times speedup in global placement without quality degradation compared to the state-of-the-art multi-threaded placer RePlAce.
Abstract: Placement for very-large-scale integrated (VLSI) circuits is one of the most important steps for design closure. This paper proposes a novel GPU-accelerated placement framework DREAMPlace, by casting the analytical placement problem equivalently to training a neural network. Implemented on top of a widely-adopted deep learning toolkit PyTorch, with customized key kernels for wirelength and density computations, DREAMPlace can achieve over $30\times$ speedup in global placement without quality degradation compared to the state-of-the-art multi-threaded placer RePlAce. We believe this work shall open up new directions for revisiting classical EDA problems with advancement in AI hardware and software.
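The analogy between analytical placement and neural network training can be illustrated with a toy example: cell coordinates play the role of trainable parameters, a wirelength objective plays the role of the loss, and gradient descent updates the positions. The sketch below is a one-dimensional illustration in plain Python, assuming a quadratic two-pin wirelength as a stand-in objective and omitting any density term; the names `quadratic_wirelength_grad` and `place`, the netlist, and the step size are hypothetical and are not DREAMPlace's kernels or API.

```python
def quadratic_wirelength_grad(x, nets):
    """Gradient of sum over two-pin nets of (x_a - x_b)**2 for every node."""
    grad = {name: 0.0 for name in x}
    for a, b in nets:
        d = x[a] - x[b]
        grad[a] += 2.0 * d
        grad[b] -= 2.0 * d
    return grad


def place(x, nets, movable, lr=0.1, steps=200):
    """Plain gradient descent on the movable nodes; fixed pads never move."""
    x = dict(x)
    for _ in range(steps):
        grad = quadratic_wirelength_grad(x, nets)
        for name in movable:
            x[name] -= lr * grad[name]
    return x


# Hypothetical netlist: one movable cell tied to two fixed pads at 0 and 10.
positions = place(
    {"pad_l": 0.0, "pad_r": 10.0, "cell": 1.0},
    nets=[("pad_l", "cell"), ("cell", "pad_r")],
    movable=["cell"],
)
print(positions["cell"])  # converges towards the midpoint between the pads
```

In this toy setting the loss is convex, so the cell settles midway between its two pads; a real placer would add a density penalty to spread cells apart.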

57 citations


Journal Article
TL;DR: A new PRBG method called “modified dual-CLCG” and its very large-scale integration (VLSI) architecture are proposed in this paper to mitigate the excessive memory usage, high initial clock latency, and sub-maximal sequence length of the existing dual-CLCG architecture.
Abstract: A pseudorandom bit generator (PRBG) is an essential component for securing data during transmission and storage in various cryptography applications. Among popular existing PRBG methods such as linear feedback shift register (LFSR), linear congruential generator (LCG), coupled LCG (CLCG), and dual-coupled LCG (dual-CLCG), the latter proves to be more secure. This method relies on inequality comparisons that lead to generating pseudorandom bits at a non-uniform time interval. Hence, a new architecture of the existing dual-CLCG method is developed that generates pseudorandom bits at a uniform clock rate. However, this architecture experiences several drawbacks such as excessive memory usage and high initial clock latency, and fails to achieve the maximum length sequence. Therefore, a new PRBG method called “modified dual-CLCG” and its very large-scale integration (VLSI) architecture are proposed in this paper to mitigate the aforesaid problems. The novel contribution of the proposed PRBG method is to generate pseudorandom bits at a uniform clock rate with one initial clock delay and minimum hardware complexity. Moreover, the proposed PRBG method passes all 15 benchmark tests of the NIST standard and achieves the maximal period of $2^{n}$. The proposed architecture is implemented using Verilog-HDL and prototyped on a commercially available FPGA device.
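The non-uniform output interval of a comparison-based generator can be seen in a small Python sketch. It assumes the formulation in which four LCGs run in lockstep and the comparison of one pair is emitted only on ticks where the comparison of the other pair holds; that rule, the constants, the seeds, and the helper names `lcg`, `dual_clcg_bits`, and `period` are illustrative assumptions, not the paper's parameters or its modified architecture.

```python
def lcg(a, c, seed, n_bits):
    """Yield successive states of x -> (a*x + c) mod 2**n_bits."""
    mask = (1 << n_bits) - 1
    x = seed & mask
    while True:
        x = (a * x + c) & mask
        yield x


def dual_clcg_bits(params, n_bits, steps):
    """Run four LCGs for `steps` ticks; emit (tick, x > y) only when p > q."""
    x, y, p, q = (lcg(a, c, s, n_bits) for a, c, s in params)
    out = []
    for tick in range(steps):
        xv, yv, pv, qv = next(x), next(y), next(p), next(q)
        if pv > qv:  # assumed gating comparison; other ticks emit nothing
            out.append((tick, 1 if xv > yv else 0))
    return out


def period(a, c, seed, n_bits):
    """Count steps until a single LCG returns to its first state."""
    gen = lcg(a, c, seed, n_bits)
    first = next(gen)
    count = 1
    while next(gen) != first:
        count += 1
    return count


# Illustrative constants: a % 4 == 1 and odd c give full period 2**n (Hull-Dobell).
PARAMS = [(5, 3, 1), (9, 7, 2), (13, 5, 3), (17, 11, 4)]
bits = dual_clcg_bits(PARAMS, n_bits=8, steps=64)
print(len(bits), "bits emitted in 64 ticks")
print(period(5, 3, 0, 8))
```

Because some ticks emit nothing, the gaps between consecutive emitted bits vary, which is the timing irregularity the abstract describes.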

33 citations


Journal Article
TL;DR: A variation- and noise-tolerant learning algorithm and a postsilicon process variation compensation technique, neither requiring additional monitoring circuitry, are proposed to reduce the accuracy degradation in the corrupted fully connected network.
Abstract: Recently, analog and mixed-signal neural network processors have been extensively studied due to their better energy efficiency and small footprint. However, analog computing is more vulnerable to circuit nonidealities such as process variation than its digital counterpart. On-chip calibration circuits can be adopted to measure and compensate for those effects, but they lead to unavoidable area and power overheads. In this brief, we propose a variation- and noise-tolerant learning algorithm and a postsilicon process variation compensation technique that does not require any additional monitoring circuitry. The proposed techniques reduce the accuracy degradation in the corrupted fully connected network down to 1% under a large amount of variation, including 10% unit capacitor mismatch, 8-mVrms comparator noise, and 20-mVrms comparator offset.

31 citations


Proceedings Article
01 Oct 2019
TL;DR: This work presents a graph-based deep learning method for quickly predicting logic-induced routing congestion hotspots from a gate-level netlist before placement, which can provide early feedback to designers and EDA tools, indicating logic that may be difficult to route.
Abstract: As feature size shrinks, routing constraints become a more significant limiting factor to the manufacturability of VLSI designs. Routing congestion significantly impacts quality metrics such as area and timing performance, but congestion is not known accurately until late in the design cycle, after placement and routing. This can lead to unpleasant surprises during the design process. Accordingly, early prediction of routing requirements would enable design engineers to iterate faster, with more confidence that their designs were routable and high quality. Additionally, routability estimates can inform placement itself, preemptively eliminating routing problems. In this work, we present a graph-based deep learning method for quickly predicting logic-induced routing congestion hotspots from a gate-level netlist before placement. This model can provide early feedback to designers and EDA tools, indicating logic that may be difficult to route. Compared to using previous congestion prediction metrics to predict congestion hotspots without placement information, our solution provides a 29% increase in the Kendall ranking correlation score. Because our focus is on predicting congestion due to local logic structure, which manifests itself on lower metal layers, we also report accuracy for predicting lower metal layer congestion. When predicting congestion for the lower metal layers, the benefit of our solution over previous metrics increases to 75%. Additionally, our approach is fast. On a circuit with 1.3 million cells, our approach takes 19 seconds to predict congestion, compared with 10–60 minutes for other methods.
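The Kendall ranking correlation used as the quality metric above compares how two orderings agree pair by pair. A minimal Python sketch of the tie-free form (tau-a) follows; the function name `kendall_tau` and the sample congestion scores are made up for illustration and are not data from the paper.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total pairs, assuming no ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-region scores: predicted vs. measured congestion.
predicted = [0.9, 0.1, 0.5, 0.7]
measured = [0.8, 0.2, 0.4, 0.9]
tau = kendall_tau(predicted, measured)
print(tau)
```

A value of 1 means the predicted ranking of hotspots matches the measured ranking exactly, and -1 means it is fully reversed.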

28 citations


Journal Article
TL;DR: The study indicates that the inverse $V_{ds}$-dependence of the threshold voltage of the negative-capacitance field-effect transistor is not only acceptable but also beneficial for the speed performance of both static and pass-transistor logic (PTL) circuits, especially for PTL at low $V_{DD}$.
Abstract: This paper examines metal–ferroelectric–insulator–semiconductor negative-capacitance FinFET (NC-FinFET) based VLSI subsystem-level logic circuits. For the first time, with the aid of a short-channel NC-FinFET compact model, we confirm the functionality and evaluate the standby-power/switching-energy/delay performance of large logic circuits (e.g., dynamic 4-bit Manchester carry-chain adder and the formal hierarchical 32-bit carry-look-ahead adder) employing 14-nm ultra-low-power NC-FinFETs. Our study indicates that the inverse $V_{ds}$-dependence of threshold voltage ($V_T$), also known as the negative drain-induced barrier lowering, of negative-capacitance field-effect transistor is not only acceptable but also beneficial for the speed performance of both the static and pass-transistor logic (PTL) circuits, especially for the PTL at low $V_{DD}$.

25 citations


Proceedings Article
02 Jun 2019
TL;DR: This work presents a predictive model-based method that takes as inputs the results of an ASIC HLS DSE and automatically, without the need to re-explore the behavioral description, finds the Pareto-optimal micro-architectures for the target FPGA.
Abstract: One of the advantages of High-Level Synthesis (HLS), also called C-based VLSI design, over traditional RT-level VLSI design flows is that multiple micro-architectures of unique area vs. performance can be automatically generated by setting different synthesis options, typically in the form of synthesis directives specified as pragmas in the source code. This design space exploration (DSE) is very time-consuming and can easily take multiple days for complex designs. At the same time, and because of the complexity in designing large ASICs, verification teams now routinely make use of emulation and prototyping to test the circuit before the silicon is taped out. This also allows the embedded software designers to start their work earlier in the design process and thus further reduces the turnaround time (TAT). In this work, we present a method to automatically re-optimize ASIC designs specified as behavioral descriptions for HLS to FPGAs for emulation and prototyping, based on the observation that synthesis directives that lead to efficient micro-architectures for ASICs do not directly translate into optimal micro-architectures in FPGAs. This implies that the HLS DSE process would have to be completely repeated for the target FPGA. To avoid this, this work presents a predictive model-based method that takes as inputs the results of an ASIC HLS DSE and automatically, without the need to re-explore the behavioral description, finds the Pareto-optimal micro-architectures for the target FPGA. Experimental results comparing our predictive-model based method vs. completely re-exploring the search space show that our proposed method works well. CCS Concepts: Hardware → Functional verification; Reconfigurable logic and FPGAs; High-level and register-transfer level synthesis.
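The notion of Pareto-optimal micro-architectures can be made concrete with a short Python sketch: a design is kept if no other design is at least as good in both area and latency and strictly better in one. The function name `pareto_front`, the design labels, and the numbers are hypothetical; the paper's predictive model is not reproduced here.

```python
def pareto_front(designs):
    """Return names of designs not dominated in (area, latency), both minimized."""
    front = []
    for name, area, latency in designs:
        dominated = any(
            a <= area and l <= latency and (a < area or l < latency)
            for _, a, l in designs
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical DSE results: (directive set, area units, latency cycles).
designs = [
    ("unroll_none", 100, 40),
    ("unroll_2", 150, 25),
    ("unroll_4", 260, 25),
    ("pipelined", 220, 12),
    ("bloated", 300, 45),
]
front = pareto_front(designs)
print(front)
```

The quadratic scan is adequate for the few hundred points a directive sweep typically yields; larger sets would usually be sorted by one objective first.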

25 citations


Journal Article
TL;DR: Very large scale integration (VLSI) decoder architectures for product-like codes for systems with strict throughput and power dissipation requirements are presented, designed to minimize data transfers in and out of memory blocks, and to use parallel noniterative component decoders.
Abstract: Implementing forward error correction (FEC) for modern long-haul fiber-optic communication systems is a challenge, since these high-throughput systems require FEC circuits that can combine high coding gains and energy-efficient operation. We present very large scale integration (VLSI) decoder architectures for product-like codes for systems with strict throughput and power dissipation requirements. To reduce energy dissipation, our architectures are designed to minimize data transfers in and out of memory blocks, and to use parallel noniterative component decoders. Using a mature 28-nm VLSI process technology node, we showcase different product and staircase decoder implementations that have the capacity to exceed 1-Tb/s information throughputs with energy efficiencies of around 2 pJ/b.
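As a rough consistency check on the two headline figures, throughput multiplied by energy per bit gives the decoder power. The estimate below assumes the quoted energy efficiency is per information bit and covers the decoder only; the symbols $P$, $R$, and $E_b$ are introduced here for illustration.

```latex
P \approx R \cdot E_b
  = \left(1 \times 10^{12}\ \mathrm{b/s}\right)
    \left(2 \times 10^{-12}\ \mathrm{J/b}\right)
  = 2\ \mathrm{W}
```

Under those assumptions, a decoder running at 1 Tb/s would dissipate on the order of 2 W.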

23 citations


Journal Article
TL;DR: In this SC-RNN, a hybrid structure is developed by utilizing SC designs and binary circuits to improve the hardware efficiency without significant loss of accuracy and achieves a higher noise tolerance compared to binary implementations.
Abstract: Recurrent neural networks (RNNs) are widely used to solve a large class of recognition problems, including prediction, machine translation, and speech recognition. The hardware implementation of RNNs is, however, challenging due to the high area and energy consumption of these networks. Recently, stochastic computing (SC) has been considered for implementing neural networks and reducing the hardware consumption. In this paper, we propose an energy-efficient and noise-tolerant long short-term memory-based RNN using SC. In this SC-RNN, a hybrid structure is developed by utilizing SC designs and binary circuits to improve the hardware efficiency without significant loss of accuracy. The area and energy consumption of the proposed design are 1.6%–2.3% and 6.5%–11.2%, respectively, of those of a 32-bit floating-point (FP) implementation. The SC-RNN requires significantly smaller area and lower energy consumption in most cases compared to an 8-bit fixed-point implementation. The proposed design achieves a higher noise tolerance compared to binary implementations. The inference accuracy is 10% to 13% higher than that of an FP design when the noise level is high in the computation process.
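For readers unfamiliar with stochastic computing, the basic idea can be sketched in a few lines of Python: a value in [0, 1] is encoded as the fraction of ones in a random bitstream, and a single AND gate then multiplies two independent streams. This shows only the general unipolar SC principle, not the paper's hybrid SC-RNN design; the helper names, stream length, and seed are illustrative assumptions.

```python
import random


def to_stream(p, length, rng):
    """Encode probability p as a random bitstream with P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_stream(bits):
    """Decode a bitstream back to a value: the fraction of ones."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Bitwise AND of two independent unipolar streams estimates the product."""
    return [a & b for a, b in zip(a_bits, b_bits)]


rng = random.Random(0)  # fixed seed so the run is repeatable
a = to_stream(0.5, 100_000, rng)
b = to_stream(0.4, 100_000, rng)
product_estimate = from_stream(sc_multiply(a, b))
print(product_estimate)  # close to 0.5 * 0.4, with sampling noise
```

A single flipped bit changes the decoded value by only 1/length, which is the usual intuition behind the noise tolerance claimed for SC designs.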

23 citations


Journal Article
TL;DR: A reversible logic cryptography design (RLCD) architecture is introduced, and the RLCD-LFSR method improves ASIC performance by more than 7% compared to conventional methods.

23 citations


Journal Article
TL;DR: This paper explores the integration of MAGIC NOR gates within large-scale memory crossbar arrays by evaluating both analytically and numerically different non-ideality parameters that influence the logic gate performance.

21 citations


Journal Article
TL;DR: The experimental validation of the algorithm VLSI implementation proves the possibility of conducting accurate seizure detection using quickly-mountable dry-electrode headsets without the need for uncomfortable/painful through-hair electrodes or adhesive gels.
Abstract: A patient-specific epilepsy diagnostic solution in the form of a wireless wearable ambulatory device is presented. First, the design, VLSI implementation, and experimental validation of a resource-optimized machine learning algorithm for epilepsy seizure detection are described. Next, the development of a mini-PCB that integrates a low-power wireless data transceiver and a programmable processor for hosting the seizure detection algorithm is discussed. The algorithm uses only EEG signals from the frontal lobe electrodes while yielding a seizure detection sensitivity and specificity competitive with standard full EEG systems. The experimental validation of the algorithm VLSI implementation proves the possibility of conducting accurate seizure detection using quickly-mountable dry-electrode headsets without the need for uncomfortable/painful through-hair electrodes or adhesive gels. Details of design and optimization of the algorithm, the VLSI implementation, and the mini-PCB development are presented and resource optimization techniques are discussed. The optimized implementation is uploaded on a low-power Microsemi Igloo FPGA, requires 1237 logic elements, consumes 110 $\mu$W dynamic power, and yields a minimum detection latency of 10.2 $\mu$s. The measurement results from the FPGA implementation on data from 23 patients (198 seizures in total) show a seizure detection sensitivity and specificity of 92.5% and 80.1%, respectively. Comparison to the state of the art is presented from system integration, the VLSI implementation, and the wireless communication perspectives.

Journal Article
TL;DR: Four approximate subtractors are proposed based on the approximate computing at logic level using Karnaugh map (K-map) simplification to offer better error tolerant capabilities for image processing.
Abstract: Approximate computing is a promising technique for energy-efficient Very Large Scale Integration (VLSI) system design and is best suited for error-resilient applications, such as signal processing and multimedia. Approximate computing reduces accuracy, but still provides significant and faster results with low power consumption. It is attractive for arithmetic circuits. Four approximate subtractors are proposed based on approximate computing at the logic level using Karnaugh map (K-map) simplification. This paper deals with the design approach of various approximate subtractors and dividers for image processing to tolerate the minimal loss of quality. The proposed designs offer better error-tolerant capabilities for image processing.
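The trade-off behind logic-level approximation can be illustrated by comparing an exact 1-bit full subtractor with a simplified one over the whole truth table. In the Python sketch below, the approximate borrow-out (`bout = b`) is a hypothetical simplification chosen for illustration only; it is not one of the four subtractors proposed in the paper, and the function names are likewise assumptions.

```python
from itertools import product


def exact_full_subtractor(a, b, bin_):
    """Exact 1-bit full subtractor: returns (difference, borrow-out)."""
    diff = a ^ b ^ bin_
    bout = ((1 - a) & b) | ((1 - (a ^ b)) & bin_)
    return diff, bout


def approx_full_subtractor(a, b, bin_):
    """Hypothetical approximation: borrow-out reduced to a single literal."""
    diff = a ^ b ^ bin_
    bout = b  # assumed K-map-style simplification, not the paper's design
    return diff, bout


def error_cases(exact, approx):
    """List the input combinations where the two circuits disagree."""
    return [
        bits for bits in product((0, 1), repeat=3)
        if exact(*bits) != approx(*bits)
    ]


errors = error_cases(exact_full_subtractor, approx_full_subtractor)
print(errors, len(errors) / 8)
```

Enumerating the disagreeing rows this way gives the error rate that would be weighed against the gates saved by the simplification.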

Journal Article
TL;DR: The proposed VLSI implementation of a wavelet packet transform-based architecture results in an improved performance for ECG signals taken from the Fantasia Database as well as from the self-recorded database, as compared to the WT wavelet architectures which result in distorted outputs.

Proceedings Article
25 Mar 2019
TL;DR: This work proposes dynamic programming-based single-row and double-row detailed placement optimizations to maximize the power staple insertion in a post-placement flow and proposes metaheuristics to improve the quality of result.
Abstract: Power Delivery Network (PDN) is one of the most challenging topics in modern VLSI design. Due to aggressive technology node scaling, resistance of back-end-of-line (BEOL) layers increases dramatically in sub-10nm VLSI, causing high supply voltage (IR) drop. To solve this problem, pre-placed or post-placed power staples are inserted in pin-access layers to connect adjacent power rails and reduce PDN resistance, at the cost of reduced routing flexibility, or reduced power staple insertion opportunity. In this work, we propose dynamic programming-based single-row and double-row detailed placement optimizations to maximize the power staple insertion in a post-placement flow. We further propose metaheuristics to improve the quality of result. Compared to the traditional post-placement flow, we achieve up to 13.2% (10 mV) reduction in IR drop, with almost no WNS degradation.

Journal Article
TL;DR: A novel 10T SRAM architecture is proposed in this paper which operates in three modes (active, park, standby or hold) to provide better stability and reduced delay in active mode, reduced leakage current in standby mode, and retention of the logic state in park mode.
Abstract: Static or leakage power is the dominating component of total power dissipation in deep nanometer technologies below 90 nm; it has increased from 18% at 130 nm to 54% at 65 nm technology due to continued device and voltage scaling. Static random access memory (SRAM) is a type of RAM in which data is not written permanently and which does not need to be refreshed periodically. Different techniques have been applied to the SRAM cell to reduce leakage power without affecting its performance. A novel 10T SRAM architecture is proposed in this paper which operates in three modes (active, park, standby or hold). The main objective of the proposed architecture is to provide better stability and reduced delay in active mode, reduced leakage current in standby mode, and retention of the logic state in park mode. Design metrics such as static and dynamic power, delay, power delay product, energy, energy delay product, rise and fall time, slew rate, and static noise margin are taken into account. All the circuits were designed using the Synopsys EDA tool and simulated in 30 nm technology. Simulation results show that the proposed SRAM is much better than conventional and other SRAM cells designed using hybrid techniques.

Journal Article
TL;DR: The proposed work is a key step towards SiC-based very large-scale integrated (VLSI) circuits implementation for high-temperature applications.
Abstract: A Process Design Kit (PDK) has been developed to realize complex integrated circuits in Silicon Carbide (SiC) bipolar low-power technology. The PDK development process included basic device modeling, and design of gate library and parameterized cells. A transistor–transistor logic (TTL)-based PDK gate library design will also be discussed with delay, power, noise margin, and fan-out as the main design criteria to tolerate the threshold voltage shift and the beta ($\beta$) and collector current ($I_C$) variation of SiC devices as temperature increases. The PDK-based complex digital ICs design flow based on layout, physical verification, and in-house fabrication process will also be demonstrated. Both combinational and sequential circuits have been designed, such as a 720-device ALU and a 520-device 4-bit counter. All the integrated circuits and devices are fully characterized up to 500 °C. The inverter and a D-type flip-flop (DFF) are characterized as benchmark standard cells. The proposed work is a key step towards SiC-based very large-scale integrated (VLSI) circuits implementation for high-temperature applications.

01 Jan 2019
TL;DR: In this article, the performance of the low-power edge-triggered D flip-flop and the dual-edge-triggered static pulsed flip-flop was investigated in the Cadence Virtuoso environment using 180 nm technology.
Abstract: The size of electronic devices has been decreasing day by day as VLSI technology scales down. For any electronic device, reliability is one of the best performance indicators, since it decides the lifetime of the device. In this paper, we investigate the performance of the low-power edge-triggered D flip-flop and the dual-edge-triggered static pulsed flip-flop. The circuits are simulated in the Cadence Virtuoso environment using 180 nm technology. To estimate the reliability, we use Monte Carlo analysis applied at different corners, namely SS, SF, FS, FF, and TT.

Journal Article
TL;DR: The comparisons of the proposed design with respect to different parameters of the existing MUX(s) along with their corresponding graphical representations prove the robustness of the suggested multiplexer architecture.
Abstract: Quantum Dot Cellular Automata (QCA) is an alternative to the existing conventional CMOS technology due to its low power intake, faster speed, and smaller size. A multiplexer is a very important logical block in VLSI designs. In this paper, a 2:1 multiplexer (MUX) architecture is proposed, analyzed, and compared with related existing architectures. The kink energy of the proposed circuit has been calculated and hazard analysis has been completed successfully. All designs in this paper are simulated, checked, and verified using the popular QCADesigner tool. The comparisons of the proposed design with respect to different parameters of the existing MUX(s) along with their corresponding graphical representations prove the robustness of the proposed multiplexer.

Journal Article
TL;DR: Binary-weighted convolutional neural networks architecture is proposed, which provides high throughput and low power dissipation, and reduces computational and hardware complexity, storage complexity, critical path delay, bandwidth requirements and accuracy.

Journal Article
TL;DR: A novel ultra-low-power Sleepy CMOS-Sleepy Stack technique is presented for nanoscale VLSI technologies, and eight prior techniques are compared with the proposed technique.
Abstract: This paper presents a novel ultra-low-power Sleepy CMOS-Sleepy Stack (SC-SS) technique for nano scale VLSI technologies. Eight prior techniques are taken for comparison with proposed technique on 6...

Journal Article
TL;DR: In this paper, two low-power, high-speed hybrid 1-bit full adder cells employing both pass transistor and transmission gate logics are presented, which aim to minimise power dissipation.
Abstract: This paper presents two low-power, high-speed hybrid 1-bit full adder cells employing both pass transistor and transmission gate logics. These designs aim to minimise power dissipation and red...

Journal Article
TL;DR: This study applies an evolutionary technique, the symbiotic organisms search (SOS) algorithm, to the optimal design of two analogue very-large-scale integration circuits, sizing the MOS transistors to optimise the area occupied by each circuit.
Abstract: This study proposes optimal designs of two different analogue very-large-scale integration circuits based on an evolutionary technique, namely the symbiotic organisms search (SOS) algorithm. The configurations considered here are a nulling resistor compensation based complementary metal–oxide–semiconductor (CMOS) two-stage op-amp and a two-stage CMOS op-amp with robust bias circuit. The prime goal of this work is the sizing of metal–oxide–semiconductor (MOS) transistors employing the SOS algorithm to optimise the area occupied by the individual circuit. Design results based on the SOS algorithm are validated with SPICE simulation. SPICE simulation results reveal that all the design specifications are firmly satisfied for both circuits. Moreover, SPICE-based results show that the SOS algorithm provides much better results than the earlier reported techniques regarding the gain, MOS area, and power dissipation for the abovementioned op-amp circuits.

Journal Article
TL;DR: A robust RTL to C translation method called VeriIntel2C is proposed to abstract RTL descriptions (written in Verilog) into ANSI-C descriptions optimized for HLS DSE by generating a large number of loops and arrays.

Journal Article
TL;DR: This work reviews some popular circuit level SET mitigation techniques developed for combinational logic and compares them with respect to area, power and delay overheads.
Abstract: Soft errors created due to propagation of single event transients are a significant reliability challenge in modern VLSI. With advances in CMOS technology scaling, circuits become increasingly more sensitive to transient pulses caused by energetic particles. This work reviews some popular circuit level SET mitigation techniques developed for combinational logic and compares them with respect to area, power and delay overheads.

Journal Article
TL;DR: The efficiency of Vedic mathematics and advances in low-power VLSI are combined in this paper; the CNTFET design reduces the power by about 95% and offers controllability of the threshold voltage.

Journal Article
TL;DR: An encyclopedia of various general and special purpose microprocessors proposed so far is developed, together with the complete design flow, the available electronic design and automation tools, and an evaluation of those works in terms of area on the die and performance metrics.
Abstract: Ongoing miniaturization in VLSI circuits continues to pose challenges to the conventionally used synchronous design style in microprocessors. These include the distribution of the clock in the GHz range, robustness to delay variations, reduction in electromagnetic interference, and energy conservation, to name a few. Asynchronous logic has been known for its ability to address the aforementioned challenges by means of closed-loop handshake protocols, instead of notorious clock signals. Because of these advantages, there have been numerous attempts at building general and special purpose microprocessors during the last three decades. Still, however, the number of asynchronous processors commercially available is small, mainly due to insufficient electronic design and automation tool support, an ambiguous design flow and testing mechanisms for asynchronous logic and, most importantly, the absence of a forum to look for relevant works explaining the design steps and tools for such microprocessors. This paper is intended to bridge this gap by 1) reviewing the design principles of asynchronous logic, including classification, signaling conventions, and pipelining approaches; 2) presenting the complete design flow and available electronic design and automation tools; 3) developing an encyclopedia of various general and special purpose microprocessors proposed so far; and 4) presenting an evaluation of those works in terms of area on the die and performance metrics. This paper will also serve as a guideline for asynchronous microprocessor design and implementation in all phases from specification to tape-out.

Proceedings Article
01 Feb 2019
TL;DR: Three mathematical models of a placement problem in VLSI design are offered, where modules are squares, octagons, and rhombuses, and 45°-rotation is permitted.
Abstract: Three mathematical models of a placement problem in VLSI design are offered, where modules are squares, octagons, and rhombuses, and 45°-rotation is permitted. For that, the generalized placement problem of signed permutation polytopes within a signed permutation polytope is constructed based on the concept of a circular-like object. A way to formalize 45°-rotation is presented, which applies the theory of continuous functional representations of discrete sets.

Journal Article
TL;DR: The proposed CNTFET-based SRAM cell, which targets power-efficient operation at a reasonable data transfer speed, has the potential to be exploited as the basic platform for modern high-performance large memory arrays.
Abstract: In recent years, carbon nanotube FETs with their astounding electrical properties have been in the spotlight of nanoelectronics designers. Therefore, they have been introduced as a promising candidate for VLSI applications. The aim of this work is to present a robust energy-efficient SRAM cell based on wrap-gate CNTFET transistors. The proposed SRAM cell has been designed in a particular way that mitigates the need to utilize complex bit-conditioning circuitries to precharge the bit-lines during operations. Moreover, the proposed design utilizes high-threshold voltage multi-tube CNTFET transistors which are biased in the near-threshold region to achieve power-efficient operation at a reasonable data transfer speed. To benchmark the functionality of the proposed SRAM cell, performance parameters including power, delay, etc. have been evaluated through rigorous simulations. The simulation results demonstrate that the proposed SRAM consumes 14.59 pW and 1.25 nW of static and dynamic power, respectively (at $V_{dd}$ = 0.5 V). The proposed design has 180 mV and 340 mV read and write static noise margins, respectively, and no failure was observed in up to 5000 repetitions in Monte Carlo simulations. Based on the simulation results, the proposed CNTFET-based SRAM cell has the potential to be exploited as the basic platform for modern high-performance large memory arrays.


Journal Article
TL;DR: This study provides a mechanism to integrate five state-of-the-art design tools in one single design project and can help electrical engineering programs meet Accreditation Board for Engineering and Technology student outcome (k).
Abstract: Demand for microelectronics products has seen a recent explosion due to their increased adoption in high‐performance data storage, networking, and Internet of Things applications. Not only do such products need to provide high performance, they are often integrated in mixed-signal environments that include both analog and digital circuits. This has posed a challenge to faculty who teach microelectronics design in senior undergraduate and graduate electrical engineering courses. It is becoming increasingly difficult to upgrade microelectronics curricula so that students are equipped with the proper skills to utilize design tools presently common in the industry. This study provides a mechanism to integrate five state-of-the-art design tools in one single design project. The tools are Custom Compiler, HSPICE, Verilog Compiler Simulator (VCS), IC Validator, and Design Compiler. Students, through a design project, conduct the design, layout, and simulations of a static random‐access memory array. The project utilizes both the full‐custom and the semi‐custom flows. One full design is created and integrated where students do the design and layout of transistors in specific circuits and generate synthesized circuits automatically from a high‐level description language. This study can serve as a resource for senior undergraduate students, graduate students, faculty, and practicing engineers. Finally, it can help electrical engineering programs meet Accreditation Board for Engineering and Technology student outcome (k), which is an ability to use the techniques, skills, and modern engineering tools necessary for engineering practice.