
Showing papers in "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems in 2016"


Journal ArticleDOI
TL;DR: This work presents a first-published methodology for evaluating HLS tools and uses it to compare one commercial and three academic tools on a common set of C benchmarks, performing an in-depth evaluation in terms of performance and resource usage.
Abstract: High-level synthesis (HLS) is increasingly popular for the design of high-performance and energy-efficient heterogeneous systems, shortening time-to-market and addressing today’s system complexity. HLS allows designers to work at a higher level of abstraction by using a software program to specify the hardware functionality. Additionally, HLS is particularly interesting for designing field-programmable gate array circuits, where hardware implementations can be easily refined and replaced in the target device. Recent years have seen much activity in the HLS research community, with a plethora of HLS tool offerings, from both industry and academia. All these tools may have different input languages, perform different internal optimizations, and produce results of different quality, even for the very same input description. Hence, it is challenging to compare their performance and understand which is the best for the hardware to be implemented. We present a comprehensive analysis of recent HLS tools, as well as an overview of the areas of active interest in the HLS research community. We also present a first-published methodology to evaluate different HLS tools. We use our methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at an in-depth evaluation in terms of performance and resource usage.

433 citations


Journal ArticleDOI
TL;DR: A new security metric and a method to deliver strong logic locking are introduced, and it is demonstrated that an attacker can decipher the locked netlist, in a time linear in the number of keys, by sensitizing the key-bits to the output.
Abstract: Due to the globalization of the integrated circuit (IC) design flow, rogue elements in the supply chain can pirate ICs, overbuild ICs, and insert hardware Trojans. EPIC locks the design by randomly inserting additional gates; only a correct key makes the design produce correct outputs. We demonstrate that an attacker can decipher the locked netlist, in a time linear in the number of keys, by sensitizing the key-bits to the output. We then develop techniques to fix this vulnerability and make an attacker’s effort truly exponential in the number of inserted keys. We introduce a new security metric and a method to deliver strong logic locking.

287 citations
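The key-sensitization attack described in the abstract above can be illustrated with a deliberately simple toy model. Everything below is hypothetical (the circuit structure, key, and function names are not EPIC's actual construction): the point is only that if each key bit can be propagated to an observable output, an attacker with an activated chip recovers the key with effort linear in the key length.

```python
# Toy locked circuit (hypothetical): key bit k[i] is XORed into output i,
# so out[i] = in[i] XOR k[i]. Real locking inserts key gates deep in the netlist.
def locked_circuit(inputs, key):
    return [a ^ b for a, b in zip(inputs, key)]

SECRET_KEY = [1, 0, 1, 1]

def oracle(inputs):
    # A functional (activated) chip the attacker can query as a black box.
    return locked_circuit(inputs, SECRET_KEY)

def sensitization_attack(n_bits):
    recovered = []
    for i in range(n_bits):
        # Drive an input pattern that sensitizes key bit i to output i.
        # With this toy structure, the all-zero pattern does so directly.
        out = oracle([0] * n_bits)
        recovered.append(out[i])
    return recovered

assert sensitization_attack(4) == SECRET_KEY  # one query per bit: linear effort
```

The paper's defense makes such sensitizing patterns hard to find, pushing the attacker's effort toward exponential in the number of inserted key gates.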


Journal ArticleDOI
TL;DR: This paper provides a review of different mechanisms that manipulate the state of a nano-magnet using current-induced spin-transfer torque and demonstrates how such mechanisms have been engineered to develop device structures for energy-efficient on-chip memory and logic.
Abstract: As CMOS technology begins to face significant scaling challenges, considerable research efforts are being directed to investigate alternative device technologies that can serve as a replacement for CMOS. Spintronic devices, which utilize the spin of electrons as the state variable for computation, have recently emerged as one of the leading candidates for post-CMOS technology. Recent experiments have shown that a nano-magnet can be switched by a spin-polarized current and this has led to a number of novel device proposals over the past few years. In this paper, we provide a review of different mechanisms that manipulate the state of a nano-magnet using current-induced spin-transfer torque and demonstrate how such mechanisms have been engineered to develop device structures for energy-efficient on-chip memory and logic.

194 citations


Journal ArticleDOI
TL;DR: This paper proposes Hibernus++ to intelligently adapt the hibernate and restore thresholds in response to source dynamics and system load properties, which provides an average 16% reduction in energy consumption and an improvement of 17% in application execution time over state-of-the-art approaches.
Abstract: Energy harvesters are being used to power autonomous systems, but their output power is variable and intermittent. To sustain computation, these systems integrate batteries or supercapacitors to smooth out rapid changes in harvester output. Energy storage devices require time for charging and increase the size, mass, and cost of systems. The field of transient computing moves away from this approach, by powering the system directly from the harvester output. To prevent an application from having to restart computation after a power outage, approaches such as Hibernus allow these systems to hibernate when supply failure is imminent. When the supply reaches the operating threshold, the last saved state is restored and the operation is continued from the point it was interrupted. This paper proposes Hibernus++ to intelligently adapt the hibernate and restore thresholds in response to source dynamics and system load properties. Specifically, capabilities are built into the system to autonomously characterize the hardware platform and its performance during hibernation in order to set the hibernation threshold at a point which minimizes wasted energy and maximizes computation time. Similarly, the system auto-calibrates the restore threshold depending on the balance of energy supply and consumption in order to maximize computation time. Hibernus++ is validated both theoretically and experimentally on microcontroller hardware using both synthesized and real energy harvesters. Results show that Hibernus++ provides an average 16% reduction in energy consumption and an improvement of 17% in application execution time over state-of-the-art approaches.

170 citations
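The threshold-adaptation idea behind Hibernus++ can be sketched with a back-of-the-envelope energy balance. All constants below are hypothetical, not taken from the paper: the hibernate voltage is set just high enough that the charge remaining on the supply capacitor above the brown-out voltage covers the measured cost of saving state to nonvolatile memory.

```python
import math

V_MIN = 1.8      # MCU brown-out voltage (V) -- hypothetical
C = 100e-6       # supply/decoupling capacitance (F) -- hypothetical
E_SAVE = 5e-6    # measured energy to snapshot state to NVM (J) -- hypothetical

def hibernate_threshold():
    # Pick V_hib so that the capacitor energy available between V_hib and
    # V_MIN, 0.5 * C * (V_hib**2 - V_MIN**2), exactly covers the snapshot cost.
    return math.sqrt(V_MIN**2 + 2 * E_SAVE / C)

v_hib = hibernate_threshold()
assert V_MIN < v_hib < V_MIN + 0.1   # threshold sits just above brown-out
```

Setting the threshold this tightly is what minimizes wasted energy: any higher and the system hibernates earlier than necessary, sacrificing computation time.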


Journal ArticleDOI
TL;DR: This paper proposes a paradigm shift in representing and optimizing logic by using only majority (MAJ) and inversion (INV) functions as basic operations, and develops powerful Boolean methods exploiting global properties of MIGs, such as bit-error masking.
Abstract: In this paper, we propose a paradigm shift in representing and optimizing logic by using only majority (MAJ) and inversion (INV) functions as basic operations. We represent logic functions by a majority-inverter graph (MIG): a directed acyclic graph consisting of three-input majority nodes and regular/complemented edges. We optimize MIGs via a new Boolean algebra, based exclusively on majority and inversion operations, that we formally axiomatize in this paper. As a complement to MIG algebraic optimization, we develop powerful Boolean methods exploiting global properties of MIGs, such as bit-error masking. MIG algebraic and Boolean methods together attain very high optimization quality. Considering the set of IWLS’05 benchmarks, our MIG optimizer (MIGhty) enables a 7% depth reduction in LUT-6 circuits mapped by ABC while also reducing size and power activity, with respect to similar and-inverter graph (AIG) optimization. Focusing on arithmetic-intensive benchmarks instead, MIGhty enables a 16% depth reduction in LUT-6 circuits mapped by ABC, again with respect to similar AIG optimization. Employed as a front-end to a delay-critical 22-nm application-specific integrated circuit flow (logic synthesis + physical design), MIGhty reduces the average delay/area/power by 13%/4%/3%, respectively, over 31 academic and industrial benchmarks. We also demonstrate delay/area/power improvements of 10%/10%/5% for a commercial FPGA flow.

166 citations
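The three-input majority node, the sole gate primitive in an MIG, is easy to state directly. The sketch below (plain Python, purely illustrative) also checks two properties that make majority a good basis: fixing one input to a constant recovers AND/OR, and majority is self-dual.

```python
def maj(a, b, c):
    # Three-input majority: 1 when at least two of the inputs are 1.
    return (a & b) | (b & c) | (a & c)

bits = (0, 1)

# MAJ subsumes AND and OR: fix one input to 0 (AND) or to 1 (OR).
assert all(maj(a, b, 0) == (a & b) for a in bits for b in bits)
assert all(maj(a, b, 1) == (a | b) for a in bits for b in bits)

# Majority is self-dual: complementing all inputs complements the output,
# which is one reason complemented edges pair naturally with MAJ nodes.
assert all(maj(1 - a, 1 - b, 1 - c) == 1 - maj(a, b, c)
           for a in bits for b in bits for c in bits)
```

Because MAJ generalizes both AND and OR, any AIG is trivially an MIG, so MIG optimization can never do worse than starting from the AIG representation.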


Journal ArticleDOI
TL;DR: The universal, scalable, efficient (USE), and easily manufacturable clocking scheme solves one of the most limiting factors of existing clocking schemes: the implementation of feedback paths and easy routing of QCA-based circuits.
Abstract: Quantum-dot cellular automata (QCA) is an emerging technology, conceived in the face of the nanoscale limitations of CMOS circuits, with exceptional integration density, impressive switching frequency, and remarkably low-power characteristics. Several of the current challenges toward the progress of QCA technology are related to the automation of the design process and integration into existing design flows. In this regard, this paper proposes the universal, scalable, efficient (USE), and easily manufacturable clocking scheme. It solves one of the most limiting factors of existing clocking schemes: the implementation of feedback paths and easy routing of QCA-based circuits. Consequently, USE considerably facilitates the development of standard cell libraries and design tools for this technology, while also avoiding thermodynamic problems. Case studies presented in this paper reveal an area reduction of up to a factor of 5 and a delay decrease of up to a factor of 3 in comparison with an existing advanced clocking scheme.

130 citations


Journal ArticleDOI
TL;DR: Simulation results using state-of-the-art tools on several publicly available circuits show that the proposed approach can detect HTs with a high accuracy rate; a comparison with a previously proposed approach is also conducted.
Abstract: Due to design and fabrication outsourcing to foundries, the problem of malicious modifications to integrated circuits (ICs), also known as hardware Trojans (HTs), has attracted attention in academia as well as industry. To reduce the risks associated with Trojans, researchers have proposed different approaches to detect them. Among these approaches, test-time detection approaches have drawn the greatest attention. Many test-time approaches assume the existence of a Trojan-free (TF) chip/model, also known as a “golden model.” Prior works suggest using reverse engineering (RE) to identify such TF ICs for the golden model. However, they did not state how to do this efficiently. In fact, RE is a very costly process which consumes significant time and intensive manual effort. It is also very error-prone. In this paper, we propose an innovative and robust RE scheme to identify the TF ICs. We reformulate the Trojan-detection problem as a clustering problem. We then adapt a widely used machine learning method, K-means clustering, to solve our problem. Simulation results using state-of-the-art tools on several publicly available circuits show that the proposed approach can detect HTs with a high accuracy rate. A comparison of this approach with our previously proposed approach [1] is also conducted. Both the limitations and application scenarios of the two methods are discussed in detail.

114 citations
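The clustering step can be sketched with a minimal one-dimensional K-means. This is a pure-Python illustration only: the paper operates on features extracted from reverse-engineered chip images, not on the scalar toy data used here.

```python
def kmeans_1d(xs, k=2, iters=20):
    # Spread-out initialization: start the centers at the data extremes.
    centers = [min(xs), max(xs)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Move each center to its cluster mean (keep it if the cluster emptied).
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical per-chip features: Trojan-free chips near 1.0, modified near 5.0.
features = [0.9, 1.0, 1.1, 4.8, 5.0, 5.2]
centers, clusters = kmeans_1d(features)
assert sorted(len(c) for c in clusters) == [3, 3]  # two clean groups emerge
```

In the paper's setting, the larger or better-matching cluster identifies the Trojan-free population, which can then serve as the golden model for test-time detection.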


Journal ArticleDOI
TL;DR: A two-stage energy-efficient temperature-aware task scheduling scheme for heterogeneous real-time multiprocessor system-on-chip (MPSoC) systems that minimizes the system dynamic energy consumption under the constraint of task deadlines and reduces the temperature-dependent system leakage energy consumption.
Abstract: With the continuous scaling of CMOS devices, increases in power density and system integration level have not only resulted in huge energy consumption but also led to elevated chip temperatures. Thus, energy-efficient task scheduling with thermal considerations has become a pressing research issue in computing systems, especially for real-time embedded systems with limited cooling techniques. In this paper, we design a two-stage energy-efficient temperature-aware task scheduling scheme for heterogeneous real-time multiprocessor system-on-chip (MPSoC) systems. In the first stage, we analyze the energy optimality of assigning real-time tasks to multiple processors of an MPSoC system, and design a task assignment heuristic that minimizes the system dynamic energy consumption under the constraint of task deadlines. In the second stage, the optimality of minimizing the peak temperature of a processor is investigated, and a slack distribution heuristic is proposed to improve the temperature profile of each processor under the thermal constraint, thereby reducing the temperature-dependent system leakage energy consumption. Through these two stages, the overall system energy consumption is minimized. Experimental results have demonstrated the effectiveness of our scheme.

114 citations


Journal ArticleDOI
TL;DR: The impact of process parameter variations on various design metrics of the proposed cell is presented and compared with conventional differential 6T (D6T), transmission gate-based 8T (TG8T), and single-ended 8T (SE8T) SRAM cells.
Abstract: Low-power and noise-tolerant static random access memory (SRAM) cells are in high demand today. This paper presents a stable differential SRAM cell that consumes low power. The proposed cell has a similar structure to the conventional 6T SRAM cell with the addition of two buffer transistors, one tail transistor, and one complementary word line. Due to the stacking effect, the proposed cell achieves lower power dissipation. In this paper, the impact of process parameter variations on various design metrics of the proposed cell is presented and compared with conventional differential 6T (D6T), transmission gate-based 8T (TG8T), and single-ended 8T (SE8T) SRAM cells. The impact of process variations, such as threshold voltage and length, on different design metrics of an SRAM cell, such as read static noise margin (RSNM), read access time (T_RA), and write access time (T_WA), is also presented. The proposed cell achieves a 1.12×/1.43×/5.62× improvement in T_RA compared to TG8T/D6T/SE8T, at a penalty of 1.1×/4.88× in T_WA compared to D6T/TG8T and 1.19×/1.18× in read/write power consumption compared to D6T. An improvement of 1.12×/2.15× in RSNM is observed compared to D6T/TG8T. The proposed cell consumes 5.38× less power during hold mode and also shows a 2.33× narrower spread in hold power at V_DD = 0.4 V compared to the D6T SRAM cell.

110 citations


Journal ArticleDOI
TL;DR: A refined definition of QMDDs is presented and significantly improved computational methods for their use and manipulation are provided and it is shown that the resulting representation satisfies important criteria for a decision diagram, i.e., compactness and canonicity.
Abstract: Quantum mechanical phenomena such as phase shifts, superposition, and entanglement show promise for use in computation. Suitable technologies for the modeling and design of quantum computers and other information processing techniques that exploit quantum mechanical principles are coming within sight. Quantum algorithms that significantly speed up the process of solving several important computation problems have been proposed in the past. The most common representation of quantum mechanical phenomena is the transformation matrix. However, transformation matrices grow exponentially with the size of a quantum system and, thus, pose significant challenges for the efficient representation and manipulation of quantum functionality. In order to address this problem, first approaches for the representation of quantum systems in terms of decision diagrams have been proposed. One very promising approach is given by Quantum Multiple-Valued Decision Diagrams (QMDDs), which are able to efficiently represent transformation matrices and also inherently support the multiple-valued basis states offered by many physical quantum systems. However, the initial proposal of QMDDs lacked a formal basis and did not allow, e.g., changing the variable order, an established core functionality in decision diagrams which is crucial for determining more compact representations. Because of this, the full potential of QMDDs, or of decision diagrams for quantum functionality in general, has not been fully exploited yet. In this paper, we present a refined definition of QMDDs for the general quantum case. Furthermore, we provide significantly improved computational methods for their use and manipulation and show that the resulting representation satisfies important criteria for a decision diagram, i.e., compactness and canonicity. An experimental evaluation confirms the efficiency of QMDDs.

109 citations


Journal ArticleDOI
TL;DR: A combination of new design techniques and new memory technologies is presented that detects a wide variety of hardware Trojans during IC testing and also during system operation in the field, and can prevent a wide range of attacks during synthesis, place-and-route, and fabrication of ICs.
Abstract: There are increasing concerns about possible malicious modifications of integrated circuits (ICs) used in critical applications. Such attacks are often referred to as hardware Trojans. While many techniques focus on hardware Trojan detection during IC testing, it is still possible for attacks to go undetected. Using a combination of new design techniques and new memory technologies, we present a new approach that detects a wide variety of hardware Trojans during IC testing and also during system operation in the field. Our approach can also prevent a wide variety of attacks during synthesis, place-and-route, and fabrication of ICs. It can be applied to any digital system, and can be tuned for both traditional and split-manufacturing methods. We demonstrate its applicability for both application-specific integrated circuits and field-programmable gate arrays. Using fabricated test chips with Trojan emulation capabilities and also using simulations, we demonstrate: 1) the area and power costs of our approach can range between 7.4%–165% and 7%–60%, respectively, depending on the design and the attacks targeted; 2) the speed impact can be minimal (close to 0%); 3) our approach can detect 99.998% of Trojans (emulated using test chips) that do not require detailed knowledge of the design being attacked; 4) our approach can prevent 99.98% of specific attacks (simulated) that utilize detailed knowledge of the design being attacked (e.g., through reverse engineering); and 5) our approach never produces any false positives, i.e., it does not report attacks when the IC operates correctly.

Journal ArticleDOI
TL;DR: This paper pioneers a converter-less PV power system with maximum power point tracking that directly supplies power to the load without power converters or an energy storage element, and achieves an overall system efficiency of 87.1% over a day.
Abstract: Energy harvesting from the natural environment offers a range of benefits for the Internet of Things. Scavenging energy from photovoltaic (PV) cells is one of the most practical solutions in terms of power density among existing energy harvesting sources. PV power systems mandate maximum power point tracking (MPPT) to scavenge the maximum possible solar energy. In general, a switching-mode power converter, an MPPT charger, controls the charging current to the energy storage element (a battery or equivalent), and the energy storage element provides power to the load device. The mismatch between the maximum power point (MPP) current and the load current is managed by the energy storage element. However, such an architecture causes significant energy loss (typically over 20%), significant weight/volume, and a high cost due to the cascaded power converters and the energy storage element. This paper pioneers a converter-less PV power system with MPPT that directly supplies power to the load without power converters or an energy storage element. The proposed system uses a nonvolatile microprocessor to enable extremely fine-grain dynamic power management in a few hundred microseconds. This makes it possible to match the load current with the MPP current. We present detailed modeling, simulation, and optimization of the proposed energy harvesting system, including the radio frequency transceiver. Experiments show that the proposed setup achieves an overall system efficiency of 87.1% over a day, 30.6% higher than conventional MPPT methods in actual measurements, and thus a significantly higher duty cycle under weak solar irradiance.
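For context, the conventional MPPT baseline the paper improves upon is often implemented as a perturb-and-observe (P&O) hill climber. The sketch below shows that baseline idea on a synthetic power-voltage curve; the curve shape and step size are invented for illustration, and the paper's converter-less approach instead matches the load current to the MPP current rather than perturbing a converter's operating point.

```python
def p_and_o(power_at, v0, step=0.05, iters=50):
    # Perturb-and-observe MPPT: step the operating voltage, keep the
    # direction while power increases, and reverse when power drops.
    v, p_prev, direction = v0, power_at(v0), +1
    for _ in range(iters):
        v += direction * step
        p = power_at(v)
        if p < p_prev:
            direction = -direction
        p_prev = p
    return v

def pv_curve(v):
    # Synthetic P-V curve with its maximum power point at v = 0.6 V.
    return 1.0 - (v - 0.6) ** 2

v_mpp = p_and_o(pv_curve, v0=0.3)
assert abs(v_mpp - 0.6) < 0.11   # converges to, then dithers around, the MPP
```

The dithering around the MPP (the tracker oscillates within one step of the optimum) is one of the inefficiencies the paper's fine-grain load matching avoids.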

Journal ArticleDOI
TL;DR: A novel distributed low-storage clone detection protocol (LSCD) for WSNs is presented and extensive simulations demonstrate that the lifetime, storage requirements, and detection probability of the protocol are substantially improved over competing solutions from the literature.
Abstract: Cyber-physical systems (CPSs) have recently become an important research field not only because of their important and varied application scenarios, including transportation systems, smart homes, surveillance systems, and wearable devices but also because the fundamental infrastructure has yet to be well addressed. Wireless sensor networks (WSNs), as a type of supporting infrastructure, play an irreplaceable role in CPS design. Specifically, secure communication in WSNs is vital because information transferred in the networks can be easily stolen or replaced. Therefore, this paper presents a novel distributed low-storage clone detection protocol (LSCD) for WSNs. We first design a detection route along the perpendicular direction of a witness path with witness nodes deployed in a ring path. This ensures that the detection route must encounter the witness path because the distance between any two detection routes must be smaller than the witness path length. In the LSCD protocol, clone detection is processed in a nonhotspot region where a large amount of energy remains, which can improve energy efficiency as well as network lifetime. Extensive simulations demonstrate that the lifetime, storage requirements, and detection probability of our protocol are substantially improved over competing solutions from the literature.

Journal ArticleDOI
TL;DR: A novel approach and techniques are presented for physics-based electromigration (EM) assessment in power delivery networks of very large scale integration systems; the proposed method leads to less conservative lifetime estimates than the existing Blech–Black-based methods.
Abstract: This paper presents a novel approach and techniques for physics-based electromigration (EM) assessment in the power delivery networks of very large scale integration systems. An increase in the voltage drop above the threshold level, caused by EM-induced increases in the resistances of individual interconnect branches, is considered as the failure criterion. It replaces the currently employed conservative weakest-branch criterion, which does not account for the essential redundancy in current propagation that exists in power-ground (P/G) networks. The EM-induced increase in the resistance of individual grid branches is described in the approximation of the recently developed physics-based formalism for void nucleation and growth. An approach to calculating the void nucleation times in the group of branches comprising an interconnect tree is implemented. As a result, P/G networks become time-varying linear networks. A developed technique for calculating the hydrostatic stress evolution inside a multibranch interconnect tree makes it possible to avoid the overly optimistic time-to-failure predictions made by Blech–Black analysis of individual branches of an interconnect tree. Experimental results obtained on a number of International Business Machines Corporation benchmark circuits show that the proposed method leads to less conservative lifetime estimates than the existing Blech–Black-based methods. They also reveal that EM-induced failure is more likely to happen where the hydrostatic stress predicted by the initial current density is large, and is more likely to happen at longer times when the saturated void volume effect is taken into account.

Journal ArticleDOI
TL;DR: NumChecker is presented, a new virtual machine (VM) monitor based framework to detect and identify control-flow modifying kernel rootkits in a guest VM by measuring the number of certain hardware events that occur during the system call's execution.
Abstract: Kernel rootkits are formidable threats to computer systems. They are stealthy and can have unrestricted access to system resources. This paper presents NumChecker, a new virtual machine (VM) monitor based framework to detect and identify control-flow modifying kernel rootkits in a guest VM. NumChecker detects and identifies malicious modifications to a system call in the guest VM by measuring the number of certain hardware events that occur during the system call’s execution. To automatically count these events, NumChecker leverages the hardware performance counters (HPCs), which exist in modern processors. By using HPCs, the checking cost is significantly reduced and the tamper-resistance is enhanced. We implement a prototype of NumChecker on Linux with the kernel-based VM. An HPC-based two-phase kernel rootkit detection and identification technique is presented and evaluated on a number of real-world kernel rootkits. The results demonstrate its practicality and effectiveness.
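The core check in the abstract above is simple to illustrate: count a hardware event during a system call's execution and flag large deviations from a clean-VM baseline. Everything below is a hypothetical mock (fixed counts instead of real HPC reads, made-up baselines and tolerance) intended only to show the comparison logic, not NumChecker's actual implementation.

```python
# Hypothetical clean-VM baselines: HPC event counts observed per system call.
BASELINE = {"read": 120, "open": 95}

def is_clean(syscall, measured_count, tolerance=0.10):
    # Flag the syscall if the measured event count deviates from the
    # baseline by more than the tolerance: a hooked syscall executes
    # extra attacker code, which inflates the hardware event count.
    expected = BASELINE[syscall]
    return abs(measured_count - expected) / expected <= tolerance

assert is_clean("read", 124)          # small run-to-run noise: accepted
assert not is_clean("read", 300)      # large inflation: likely rootkit hook
```

Because the counting happens in hardware and is read from the hypervisor rather than the guest, the check is both cheap and difficult for in-guest malware to tamper with.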

Journal ArticleDOI
TL;DR: The proposed approach is implemented as a power governor in Linux and extensively validated on an ARM Cortex-A8 running different benchmark applications, showing that with intra- and inter-application variations, it can effectively minimize energy consumption by up to 33% compared to the existing approaches.
Abstract: Embedded systems execute applications with varying performance requirements. These applications exercise the hardware differently depending on the computation task, generating varying workloads with time. Energy minimization with such workload and performance variations within (intra) and across (inter) applications is particularly challenging. To address this challenge, we propose an online approach, capable of minimizing energy through adaptation to these variations. At the core of this approach is a reinforcement learning algorithm that suitably selects the appropriate voltage/frequency scaling (VFS) based on workload predictions to meet the applications’ performance requirements. The adaptation is then facilitated and expedited through learning transfer, which uses the interaction between the application, runtime, and hardware layers to adjust the VFS. The proposed approach is implemented as a power governor in Linux and extensively validated on an ARM Cortex-A8 running different benchmark applications. We show that with intra- and inter-application variations, our proposed approach can effectively minimize energy consumption by up to 33% compared to the existing approaches. Scaling the approach to multicore systems, we also demonstrate that it can minimize energy by up to 18% with a 2× reduction in the learning time when compared with an existing approach.
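The learning core of such a governor can be sketched as tabular Q-learning over (workload class, frequency level) pairs. The state discretization, reward, and constants below are invented for illustration and are not the paper's actual formulation.

```python
import random

N_WORKLOADS, N_FREQS = 3, 4          # hypothetical discretization
Q = {(w, f): 0.0 for w in range(N_WORKLOADS) for f in range(N_FREQS)}

def choose_freq(workload, eps=0.1):
    # Epsilon-greedy policy: mostly exploit the best known frequency
    # level for this workload class, occasionally explore.
    if random.random() < eps:
        return random.randrange(N_FREQS)
    return max(range(N_FREQS), key=lambda f: Q[(workload, f)])

def update(workload, freq, reward, next_workload, alpha=0.5, gamma=0.9):
    # Standard Q-learning update toward reward + discounted best next value.
    best_next = max(Q[(next_workload, f)] for f in range(N_FREQS))
    Q[(workload, freq)] += alpha * (reward + gamma * best_next - Q[(workload, freq)])

# Reward frequency level 1 for workload 0 a few times; it becomes preferred.
for _ in range(20):
    update(0, 1, reward=1.0, next_workload=0)
assert max(range(N_FREQS), key=lambda f: Q[(0, f)]) == 1
```

In a real governor, the reward would combine energy consumed and whether the performance requirement was met; the paper's learning-transfer mechanism additionally seeds the table across applications to cut the learning time.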

Journal ArticleDOI
TL;DR: The key feature of this paper is the development of a modified Fedorov search algorithm based on the D-optimal criterion that expeditiously locates a highly sparse set of nodes within the multidimensional random space where the original network needs to be probed.
Abstract: This paper presents a novel linear regression-based polynomial chaos (PC) approach for the efficient multidimensional uncertainty quantification of general distributed and lumped high-speed circuit networks. The key feature of this paper is the development of a modified Fedorov search algorithm based on the D-optimal criterion that expeditiously locates a highly sparse set of nodes within the multidimensional random space where the original network needs to be probed. Specifically, the number of selected nodes is kept equal to the number of unknown PC coefficients of the network response, thereby making this approach substantially more efficient than the conventional linear regression approach, which is based on an oversampling methodology. Additionally, due to the D-optimal criterion, this approach ensures highly accurate recovery of the PC coefficients. The validity of the proposed approach is established through multiple numerical examples.
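The D-optimal selection idea can be shown in miniature: choose the n-point design maximizing det(XᵀX) for a one-variable linear model. The brute-force enumeration below is a simplified stand-in for the paper's modified Fedorov algorithm, which reaches the same goal by iterative point exchange rather than enumeration; the model and candidate set are invented for illustration.

```python
from itertools import combinations

def info_det(points):
    # det(X^T X) for the 2-parameter model y = c0 + c1*x, rows of X = [1, x].
    n = len(points)
    s1 = sum(points)
    s2 = sum(x * x for x in points)
    return n * s2 - s1 * s1

def d_optimal(candidates, n):
    # Brute-force stand-in for the modified Fedorov exchange search:
    # pick the n-point design with the largest information determinant.
    return max(combinations(candidates, n), key=info_det)

best = d_optimal([-1.0, -0.5, 0.0, 0.5, 1.0], n=2)
assert set(best) == {-1.0, 1.0}   # extreme points carry the most information
```

Keeping n equal to the number of unknown coefficients, as the paper does, makes the regression system square, which is exactly why the determinant criterion (rather than oversampling) must guarantee that the selected points are well conditioned.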

Journal ArticleDOI
TL;DR: “power-neutral” operation is proposed, a new paradigm for transient computing systems, whereby the instantaneous power consumption of the system must match the instantaneous harvested power.
Abstract: Transient computing systems do not have energy storage, and operate directly from energy harvesting. These systems are often faced with the inherent challenge of a low-current or transient power supply. In this paper, we propose “power-neutral” operation, a new paradigm for such systems, whereby the instantaneous power consumption of the system must match the instantaneous harvested power. Power neutrality is achieved using a control algorithm for dynamic frequency scaling, modulating system performance gracefully in response to the incoming power. A detailed system model is used to determine design parameters for selecting the system voltage thresholds at which the operating frequency will be raised or lowered, or the system will be hibernated. The proposed control algorithm for power-neutral operation is experimentally validated using a microcontroller incorporating voltage-threshold-based interrupts for frequency scaling. The microcontroller is powered directly from real energy harvesters; results demonstrate that a power-neutral system sustains operation for 4%–88% longer with up to 21% speedup in application execution.
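The threshold-driven control loop can be sketched as a small state machine. All voltage thresholds and frequency levels below are hypothetical, chosen only to illustrate the decision logic: supply voltage rising means harvested power exceeds consumption (speed up), falling means the opposite (slow down), and approaching brown-out triggers hibernation.

```python
FREQS_MHZ = [1, 4, 8, 16, 24]        # hypothetical DFS levels
V_LOW, V_HIGH, V_HIB = 2.0, 3.0, 1.9  # hypothetical voltage thresholds (V)

def next_state(vcc, level):
    # One step of the power-neutral controller, given the measured supply
    # voltage and the current frequency-level index.
    if vcc < V_HIB:
        return "hibernate", level                       # imminent supply failure
    if vcc < V_LOW and level > 0:
        return "run", level - 1                         # consuming > harvesting
    if vcc > V_HIGH and level < len(FREQS_MHZ) - 1:
        return "run", level + 1                         # surplus harvested power
    return "run", level                                 # roughly power-neutral

assert next_state(3.2, 2) == ("run", 3)
assert next_state(1.95, 2) == ("run", 1)
assert next_state(1.5, 0) == ("hibernate", 0)
```

Because the supply capacitor integrates the power mismatch, regulating its voltage into the dead band between the thresholds is what keeps consumption matched to harvest.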

Journal ArticleDOI
TL;DR: A cross-layer design framework for resource-constrained cyber-physical systems that combines control-theoretic methods at the functional layer and cybersecurity techniques at the embedded platform layer, and addresses security together with other design metrics such as control performance under resource and real-time constraints is proposed.
Abstract: Security attacks may have disruptive consequences on cyber-physical systems, and lead to significant social and economic losses. Building secure cyber-physical systems is particularly challenging due to the variety of attack surfaces from the cyber and physical components, and often to limited computation and communication resources. In this paper, we propose a cross-layer design framework for resource-constrained cyber-physical systems. The framework combines control-theoretic methods at the functional layer and cybersecurity techniques at the embedded platform layer, and addresses security together with other design metrics such as control performance under resource and real-time constraints. We use the concept of interface variables to capture the interactions between control and platform layers, and quantitatively model the relation among system security, performance, and schedulability via interface variables. The general codesign framework is customized and refined to the automotive domain, and its effectiveness is demonstrated through an industrial case study and a set of synthetic examples.

Journal ArticleDOI
TL;DR: A reliable small-signal model parameter extraction method for GaN high electron mobility transistor (HEMT) on Si substrate has been developed and validated with respect to different gate width devices and shows reliable and physically relevant results.
Abstract: In this paper, a reliable small-signal model parameter extraction method for GaN high electron mobility transistors (HEMTs) on Si substrates has been developed and validated for devices of different gate widths. The main advantages of this approach are its accuracy and its reliance on only pinched-off and unbiased S-parameter measurements. The developed procedure shows reliable and physically relevant results for the investigated devices that scale with the gate width. Very good agreement is obtained between small- and large-signal simulations and measurements of the considered GaN HEMTs.

Journal ArticleDOI
TL;DR: This paper demonstrates, for the first time, a first-principle-based analytical solution of the stress evolution in a multibranch tree by de-coupling the individual segments through proper boundary conditions (BCs) accounting for the interactions between different branches.
Abstract: Electromigration (EM) in very large scale integration (VLSI) interconnects has become one of the major reliability issues for current and future VLSI technologies. However, existing EM modeling and analysis techniques are mainly developed for a single wire. For practical VLSI chips, the elemental EM reliability unit, called an interconnect tree, is a multibranch interconnect segment consisting of continuously connected, highly conductive metal (Cu) lines terminated by diffusion barriers and located within a single level of metallization. The EM effects in those branches are not independent and have to be considered simultaneously. In this paper, we demonstrate, for the first time, a first principle-based analytical solution of this problem. We have derived the analytical expressions describing the hydrostatic stress evolution in several typical interconnect trees: 1) the straight-line three-terminal wires; 2) the T-shaped four-terminal wires; and 3) the cross-shaped five-terminal wires. The new approach solves the stress evolution in a multibranch tree by de-coupling the individual segments through the proper boundary conditions (BCs) accounting for the interactions between different branches. By using the Laplace transformation technique, analytical solutions are obtained for each type of interconnect tree. The analytical solutions, expressed in terms of a set of auxiliary basis functions using the complementary error function, agree well with the numerical analysis results. Our analysis further demonstrates that using the first two dominant basis functions leads to an error of only 0.5%, which is sufficient for practical EM analysis.
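The paper's analytical solutions are built from complementary-error-function basis terms. As a point of reference only, the sketch below evaluates the classic single-segment erfc similarity solution of a diffusion-type stress equation with a fixed stress at the terminated end; it does not reproduce the paper's multibranch basis functions or BCs, and the names `kappa` and `sigma_b` are our own.

```python
import math

def stress_profile(x, t, kappa, sigma_b):
    """Transient stress via the textbook erfc similarity solution of the
    diffusion-type equation  d(sigma)/dt = kappa * d2(sigma)/dx2  on a
    semi-infinite segment with fixed boundary stress sigma_b at x = 0."""
    if t <= 0:
        return 0.0
    return sigma_b * math.erfc(x / (2.0 * math.sqrt(kappa * t)))
```

The profile decays monotonically away from the boundary and relaxes toward `sigma_b` everywhere as `t` grows, mirroring the qualitative behavior of the stress build-up described above.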

Journal ArticleDOI
TL;DR: A tool is developed that maps graphs of add/sub/mult nodes to DSP blocks on Xilinx FPGAs, ensuring maximum throughput; it offers a 100% improvement in frequency over standard pipelined code and 23% over a Vivado HLS synthesis implementation, while retaining code portability, at the cost of a modest increase in logic resource usage.
Abstract: The digital signal processing (DSP) blocks on modern field programmable gate arrays (FPGAs) are highly capable and support a variety of different datapath configurations. Unfortunately, inference in synthesis tools can fail to result in circuits that reach maximum DSP block throughput. We have developed a tool that maps graphs of add/sub/mult nodes to DSP blocks on Xilinx FPGAs, ensuring maximum throughput. This is done by delaying scheduling until after the graph has been partitioned onto DSP blocks, then scheduling based on their pipeline structure, resulting in a throughput-optimized implementation. Our tool prepares equivalent implementations using a variety of other methods, including high-level synthesis (HLS), for comparison. We show that the proposed approach offers an improvement in frequency of 100% over standard pipelined code, and 23% over the Vivado HLS synthesis implementation, while retaining code portability, at the cost of a modest increase in logic resource usage.

Journal ArticleDOI
TL;DR: An LDE-aware analytical analog placement algorithm is presented to mitigate the influence of the LDEs while improving circuit performance and Experimental results show that the placement algorithm can effectively and efficiently reduce theLDE-induced variations and improve circuit performance.
Abstract: Layout-dependent effects (LDEs) have become a critical issue in modern analog and mixed-signal circuit designs. The three major sources of LDEs, well proximity, length of oxide diffusion, and oxide-to-oxide spacing, significantly affect the threshold voltage and mobility of devices in advanced technology nodes. In this paper, we propose the first work to consider the three major sources of LDEs during analog placement. We first transform the three LDE models into nonlinear analytical placement models. Then an LDE-aware analytical analog placement algorithm is presented to mitigate the influence of the LDEs while improving circuit performance. Experimental results show that our placement algorithm can effectively and efficiently reduce the LDE-induced variations and improve circuit performance.

Journal ArticleDOI
TL;DR: Two circuit examples designed in a commercial 32 nm CMOS silicon on insulator process demonstrate that the proposed BMF method achieves up to $9\times $ runtime speed-up over the traditional modeling technique without surrendering any accuracy.
Abstract: Efficient performance modeling of today’s analog and mixed-signal circuits is an important yet challenging task, due to the high-dimensional variation space and expensive circuit simulation. In this paper, we propose a novel performance modeling algorithm that is referred to as Bayesian model fusion (BMF) to address this challenge. The key idea of BMF is to borrow the information collected from an early stage (e.g., schematic level) to facilitate efficient performance modeling at a late stage (e.g., post layout). Such a goal is achieved by statistically modeling the performance correlation between early and late stages through Bayesian inference. Furthermore, to make the proposed BMF method of practical utility, four implementation issues, including: 1) prior mapping; 2) missing prior knowledge; 3) fast solver; and 4) prior and hyper-parameter selection, are carefully considered in this paper. Two circuit examples designed in a commercial 32 nm CMOS silicon-on-insulator process demonstrate that the proposed BMF method achieves up to $9\times $ runtime speed-up over the traditional modeling technique without surrendering any accuracy.
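As a minimal illustration of the fusion idea only (not the paper's prior-mapping or fast-solver machinery), the sketch below performs a one-dimensional Gaussian Bayesian update: a prior obtained from the early (schematic-level) stage is combined with a few late (post-layout) simulation samples. All names and variances here are hypothetical.

```python
def fuse(prior_mean, prior_var, samples, noise_var):
    """Posterior (mean, variance) of a scalar performance metric:
    Gaussian prior from the early stage fused with late-stage samples
    observed under Gaussian noise of variance noise_var."""
    n = len(samples)
    precision = 1.0 / prior_var + n / noise_var   # posterior precision
    mean = (prior_mean / prior_var + sum(samples) / noise_var) / precision
    return mean, 1.0 / precision
```

With no late-stage samples the posterior equals the prior; as samples accumulate, the estimate shifts toward the late-stage data, which is the basic mechanism BMF exploits to cut the number of expensive post-layout simulations.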

Journal ArticleDOI
TL;DR: The results show that the algebraic approach to functional verification of gate-level, integer arithmetic circuits wins over the state-of-the-art SAT/satisfiability modulo theory solvers by several orders of magnitude of CPU time.
Abstract: This paper presents an algebraic approach to functional verification of gate-level, integer arithmetic circuits. It is based on extracting a unique bit-level polynomial function computed by the circuit directly from its gate-level implementation. The method can be used to verify the arithmetic function computed by the circuit against its known specification, or to extract an arithmetic function implemented by the circuit. Experiments were performed on arithmetic circuits synthesized and mapped onto standard cells using ABC system. The results demonstrate scalability of the method to large arithmetic circuits, such as multipliers, multiply-accumulate, and other elements of arithmetic datapaths with up to 512-bit operands and over 2 million gates. The results show that our approach wins over the state-of-the-art SAT/satisfiability modulo theory solvers by several orders of magnitude of CPU time. The procedure has linear runtime and memory complexity, measured by the number of logic gates.
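A minimal sketch of the underlying idea, backward rewriting of an output signature through gate-level algebraic models (AND -> a*b, XOR -> a+b-2ab), shown here on a half adder. Polynomials are dicts from monomials (frozensets of variables, so x*x = x is implicit) to integer coefficients; the paper's engine handles mapped netlists with millions of gates, which this toy representation does not attempt.

```python
def pmul(p, q):
    """Multiply two polynomials over Boolean variables."""
    r = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = m1 | m2          # set union implements x*x = x
            r[m] = r.get(m, 0) + c1 * c2
    return {m: c for m, c in r.items() if c}

def padd(p, q):
    """Add two polynomials, dropping cancelled terms."""
    r = dict(p)
    for m, c in q.items():
        r[m] = r.get(m, 0) + c
    return {m: c for m, c in r.items() if c}

def var(v):
    return {frozenset([v]): 1}

def const(k):
    return {frozenset(): k} if k else {}

def substitute(sig, out, gate_poly):
    """Rewrite signature sig by replacing variable out with its gate polynomial."""
    r = {}
    for m, c in sig.items():
        if out in m:
            r = padd(r, pmul({frozenset(m - {out}): c}, gate_poly))
        else:
            r = padd(r, {m: c})
    return r

# Half adder: s = a XOR b, cout = a AND b; specification signature 2*cout + s.
a, b = var('a'), var('b')
xor = padd(padd(a, b), pmul(const(-2), pmul(a, b)))   # a + b - 2ab
land = pmul(a, b)                                      # ab
sig = padd(pmul(const(2), var('cout')), var('s'))
sig = substitute(sig, 'cout', land)   # -> 2ab + s
sig = substitute(sig, 's', xor)       # -> a + b  (the 2ab terms cancel)
```

The final signature `a + b` is exactly the arithmetic function of a half adder, which is how the extracted polynomial is checked against (or used to recover) the specification.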

Journal ArticleDOI
TL;DR: Two novel adaptive routing algorithms, namely coarse and fine-grained look-ahead algorithms, are proposed in this paper to enhance 2-D mesh/torus NoC system fault-tolerant capabilities.
Abstract: Fault tolerance and adaptive capabilities are challenges for modern networks-on-chip (NoC) due to the increase in physical defects in advanced manufacturing processes. Two novel adaptive routing algorithms, namely coarse and fine-grained (FG) look-ahead algorithms, are proposed in this paper to enhance 2-D mesh/torus NoC system fault-tolerant capabilities. These strategies use fault flag codes from neighboring nodes to obtain the status or conditions of real-time traffic in an NoC region, then calculate the path weights and choose the route to forward packets. This approach enables the router to minimize congestion for the adjacent connected channels and also to bypass a path with faulty channels by looking ahead at distant neighboring router paths. The novelty of the proposed routing algorithms is the weighted path selection strategies, which make near-optimal routing decisions to maintain the NoC system performance under high fault rates. Results show that the proposed routing algorithms can achieve performance improvement compared to other state-of-the-art works under various traffic loads and high fault rates. The routing algorithm with FG look-ahead capability achieves a higher throughput compared with the coarse-grained approach under complex fault patterns. The hardware area/power overheads of both routing approaches are relatively low, which does not prohibit scalability for large-scale NoC implementations.
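The weighted path selection can be pictured as follows: each candidate output port is scored from local congestion plus penalties derived from neighbors' fault flags, and the minimum-weight port wins. The penalty constants and tuple layout below are illustrative stand-ins, not the paper's actual weight functions.

```python
FAULT_PENALTY = 1000  # illustrative constant; real weights are design-tuned

def select_port(candidates):
    """Pick the output port with the lowest weighted cost.

    Each candidate is (port, buffer_occupancy, local_fault, lookahead_fault),
    where lookahead_fault is a fault flag reported for a distant neighbor."""
    def weight(c):
        _, occupancy, local_fault, la_fault = c
        w = occupancy                      # congestion of the adjacent channel
        if local_fault:
            w += FAULT_PENALTY             # bypass directly faulty channels
        if la_fault:
            w += FAULT_PENALTY // 4        # softer penalty for distant faults
        return w
    return min(candidates, key=weight)[0]
```

A congested but healthy port can still beat a lightly loaded port whose look-ahead flags report faults further along the path, which is the behavior the look-ahead schemes aim for.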

Journal ArticleDOI
TL;DR: This paper proposes SVR-NoC, a network-onchip (NoC) latency model using support vector regression (SVR), and proposes a learning framework that relies on SVR to collect training data and predict the traffic flow latency.
Abstract: In this paper, we propose SVR-NoC, a network-on-chip (NoC) latency model using support vector regression (SVR). More specifically, based on the application communication information and the NoC routing algorithm, the channel and source queue waiting times are first estimated using an analytical queuing model with two equivalent queues. To improve the prediction accuracy, the queuing theory-based delay estimations are included as features in the learning process. We then propose a learning framework that relies on SVR to collect training data and predict the traffic flow latency. The proposed learning methods can be used to analyze various traffic scenarios for the target NoC platform. Experimental results on both synthetic and real-application traffic demonstrate an average prediction error of less than 12% in network saturation load, as well as a more than $100\times $ speedup compared to cycle-accurate simulations.
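The queuing-theory-based delay feature can be illustrated with the textbook M/M/1 mean waiting time. The paper's analytical model uses two equivalent queues, so this single-queue formula is only a stand-in for the kind of estimate that gets fed to the regressor as a feature.

```python
def mm1_wait(arrival_rate, service_rate):
    """Mean M/M/1 queue waiting time W_q = rho / (mu * (1 - rho)),
    usable as an analytical feature alongside measured latencies."""
    rho = arrival_rate / service_rate      # channel utilization
    if rho >= 1.0:
        return float('inf')                # saturated channel
    return rho / (service_rate * (1.0 - rho))
```

The estimate diverges as the channel approaches saturation, which is precisely the regime where a learned correction on top of the analytical feature pays off.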

Journal ArticleDOI
TL;DR: A new placement approach is proposed which can handle designs with any number of double-row height standard cells; compared with two alternative detailed placement methods on mixed-height synchronous designs, it achieves much better quality and robustness.
Abstract: Conventional detailed placement algorithms typically assume all standard cells in the design have the same height. However, as the complexity and design requirements increase in modern very large-scale integration design, designs with mixed single-row height and double-row height standard cells have come into existence in order to address the emerging standard cell design challenges. A detailed placement algorithm that does not consider these double-row height cells will either have to deal with a lot of movable macros or waste a significant amount of placement area, depending on what type of techniques are used to accommodate such designs. This paper proposes a new placement approach which can handle designs with any number of double-row height standard cells. We transform a design with mixed-height standard cells into one which only contains same-height standard cells by pairing up single-row height cells into double-row height. Then conventional detailed placement algorithms can be applied. In particular, we generate cell pair candidates by formulating a maximum weighted matching problem. A subset of the cell pair candidates are then carefully selected to form double-row height cells based on the local bin density. A refinement procedure is performed at the end to further improve our placement quality. We compare our approach with two alternative detailed placement methods on mixed-height asynchronous and synchronous designs. The experimental results show that our approach can achieve much better quality and robustness.
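The cell-pair generation step formulates a maximum weighted matching problem. As a rough sketch of the idea (a greedy approximation, not the optimal matching used in the paper, and with hypothetical edge weights):

```python
def greedy_pairing(edges):
    """Greedy approximation of maximum weighted matching.

    edges: list of (weight, cell_u, cell_v); a higher weight means the two
    single-row cells are a better pair (e.g., similar width, nearby placement).
    Returns disjoint pairs to be merged into double-row height cells."""
    matched, pairs = set(), []
    for w, u, v in sorted(edges, reverse=True):   # best candidates first
        if u not in matched and v not in matched:
            matched.update((u, v))
            pairs.append((u, v))
    return pairs
```

A greedy pass is a 1/2-approximation of the optimal matching weight; an exact solver (e.g., the blossom algorithm) would be used where, as in the paper, the matching quality directly affects placement quality.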

Journal ArticleDOI
TL;DR: A novel integrated approach for managing the read-disturb problem and proposes a proactive data migration technique which is effective in reducing large variations in I/O response times of the existing on-demand read reclaim (RR) technique.
Abstract: The read-disturb problem is emerging as one of the main reliability issues in high-density NAND flash memory. A read-disturb error, which causes data loss, occurs to a page when a large number of reads are performed to its neighboring pages. In this paper, we propose a novel integrated approach for managing the read-disturb problem. Our approach is based on our key observations from the NAND physics that the read disturbance to neighboring pages is a function of the read voltage and the read time. Since the read disturbance has an exponential dependence on the read voltage, lowering the read voltage can improve the read-disturb resistance of a NAND block. By modifying NAND chips to support multiple read modes with different read voltages, our approach allows a flash translation layer module to exploit the tradeoff between the read disturbance and write speed. Since the read disturbance is also proportional to the read time, our approach exploits the difference in the read time among different NAND pages so that frequently read pages can be less intensively read-disturbed using fast page reads. By intelligently relocating read-intensive data to read-disturb resistant blocks and pages, our approach can reduce a large portion of the time overhead from managing read-disturb errors. We also propose a proactive data migration technique which is effective in reducing large variations in I/O response times of the existing on-demand read reclaim (RR) technique. Our experimental results show that our proposed techniques can reduce the execution time overhead by 73% over the existing read-disturb management technique while reducing I/O response time fluctuations during RR activations.
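The bookkeeping behind such a scheme can be sketched as a per-block disturb budget in which low-read-voltage reads consume less of the budget than fast reads, triggering read reclaim when the budget is exhausted. The constants below are made up for illustration and are not NAND datasheet values.

```python
DISTURB_LIMIT = 100000                         # illustrative reclaim threshold
READ_COST = {'fast': 1.0, 'low_voltage': 0.25}  # lower read voltage disturbs less

class Block:
    """Toy FTL-side disturb accounting for one NAND block."""
    def __init__(self):
        self.disturb = 0.0

    def read(self, mode='fast'):
        """Charge the block's disturb budget; True means schedule read reclaim."""
        self.disturb += READ_COST[mode]
        return self.disturb >= DISTURB_LIMIT
```

Relocating read-intensive data to blocks read in the `low_voltage` mode stretches the interval between reclaims, which is the tradeoff between read disturbance and speed that the proposed approach exploits.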

Journal ArticleDOI
TL;DR: An efficient process variation (PV)-aware mask optimization framework, namely PVOPC, to simultaneously minimize EPE and PV band with fast convergence is presented, which includes EPE-sensitivity-driven dynamic fragmentation, PV-aware EPE modeling, and correction with three new EPE-converging techniques and a systematic subresolution-assisted feature insertion algorithm.
Abstract: As nanometer technology advances, conventional optical proximity correction (OPC) that minimizes the edge placement error (EPE) at the nominal process condition alone often leads to poor process windows. To improve the mask printability across various process corners, process-window OPC optimizes EPE for multiple process corners, but often suffers from long runtimes due to repeated lithographic simulations. This paper presents an efficient process variation (PV)-aware mask optimization framework, namely PVOPC, to simultaneously minimize EPE and PV band with fast convergence. The PVOPC framework includes EPE-sensitivity-driven dynamic fragmentation, PV-aware EPE modeling, and correction with three new EPE-converging techniques and a systematic subresolution-assisted feature insertion algorithm. Experimental results show that our approach efficiently achieves high-quality EPE and PV band results.
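A toy version of a PV-aware correction objective, combining the nominal EPE of a mask fragment with its spread across process corners. The weighting `alpha` and the corner names are our invention, not the paper's cost function.

```python
def pv_aware_cost(epe_by_corner, alpha=0.5):
    """Score one mask fragment: nominal EPE plus a penalty proportional to
    the EPE spread across corners (a crude proxy for the PV band)."""
    nominal = epe_by_corner['nominal']
    band = max(epe_by_corner.values()) - min(epe_by_corner.values())
    return abs(nominal) + alpha * band
```

Minimizing such a combined cost, rather than the nominal EPE alone, is what distinguishes a PV-aware correction from conventional single-corner OPC: a fragment that prints perfectly at nominal but swings widely across corners still scores poorly.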