
Showing papers in "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems in 2013"


Journal ArticleDOI
TL;DR: This paper proposes logic complexity reduction at the transistor level as an alternative approach to take advantage of the relaxation of numerical accuracy, and demonstrates the utility of these approximate adders in two digital signal processing architectures with specific quality constraints.
Abstract: Low power is an imperative requirement for portable multimedia devices employing various signal processing algorithms and architectures. In most multimedia applications, human beings can gather useful information from slightly erroneous outputs. Therefore, we do not need to produce exactly correct numerical outputs. Previous research in this context exploits error resiliency primarily through voltage overscaling, utilizing algorithmic and architectural techniques to mitigate the resulting errors. In this paper, we propose logic complexity reduction at the transistor level as an alternative approach to take advantage of the relaxation of numerical accuracy. We demonstrate this concept by proposing various imprecise or approximate full adder cells with reduced complexity at the transistor level, and utilize them to design approximate multi-bit adders. In addition to the inherent reduction in switched capacitance, our techniques result in significantly shorter critical paths, enabling voltage scaling. We design architectures for video and image compression algorithms using the proposed approximate arithmetic units and evaluate them to demonstrate the efficacy of our approach. We also derive simple mathematical models for error and power consumption of these approximate adders. Furthermore, we demonstrate the utility of these approximate adders in two digital signal processing architectures (discrete cosine transform and finite impulse response filter) with specific quality constraints. Simulation results indicate up to 69% power savings using the proposed approximate adders, when compared to existing implementations using accurate adders.
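As a rough illustration of the idea (not the specific transistor-level cells proposed in the paper), the sketch below models a hypothetical approximate full adder that simplifies the sum logic and enumerates its error rate over all input combinations:

```python
# Hypothetical approximate full adder: carry-out is computed exactly,
# but the sum bit is approximated as the complement of the carry-out.
# This is only an illustrative simplification, not the cells from the paper.

def exact_fa(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout

def approx_fa(a, b, cin):
    _, cout = exact_fa(a, b, cin)
    return 1 - cout, cout        # sum approximated as NOT(cout)

errors = 0
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            if approx_fa(a, b, cin) != exact_fa(a, b, cin):
                errors += 1

print(f"erroneous input combinations: {errors}/8")   # 2/8 for this approximation
```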

637 citations


Journal ArticleDOI
TL;DR: An algorithm for computing depth-optimal decompositions of logical operations, leveraging a meet-in-the-middle technique to provide a significant speedup over simple brute force algorithms is presented.
Abstract: We present an algorithm for computing depth-optimal decompositions of logical operations, leveraging a meet-in-the-middle technique to provide a significant speedup over simple brute force algorithms. As an illustration of our method, we implemented this algorithm and found factorizations of commonly used quantum logical operations into elementary gates in the Clifford+T set. In particular, we report a decomposition of the Toffoli gate over the set of Clifford and T gates. Our decomposition achieves a total T-depth of 3, thereby providing a 40% reduction over the previously best known decomposition for the Toffoli gate. Due to the size of the search space, the algorithm is only practical for small parameters, such as the number of qubits, and the number of gates in an optimal implementation.
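To show the search strategy only (not the quantum-specific Clifford+T machinery or the T-depth cost function), here is a toy meet-in-the-middle search over a made-up gate library acting on a tiny state space: all half-length sequences are enumerated once and indexed, so a matching full-length decomposition is found without enumerating the full space.

```python
# Toy meet-in-the-middle search: find a sequence of "gates" (functions on a
# small state space) whose composition equals a target function.
from itertools import product

STATES = tuple(range(4))

# A tiny gate library, each gate given as a tuple mapping state -> state.
GATES = {
    "inc": tuple((s + 1) % 4 for s in STATES),
    "swap01": (1, 0, 2, 3),
    "swap23": (0, 1, 3, 2),
}

def compose(seq):
    """Function table of applying the gates in seq, left to right."""
    table = list(STATES)
    for g in seq:
        table = [GATES[g][s] for s in table]
    return tuple(table)

def mitm_search(target, half_len):
    # Enumerate all sequences of length half_len once, index them by their table.
    first_half = {}
    for seq in product(GATES, repeat=half_len):
        first_half.setdefault(compose(seq), seq)
    # seq1 + seq2 matches the target iff compose(seq1) equals the target
    # "undone" by seq2; all gates here are bijections, so we can invert seq2.
    for seq2 in product(GATES, repeat=half_len):
        t2 = compose(seq2)
        needed = tuple(t2.index(target[s]) for s in STATES)
        if needed in first_half:
            return first_half[needed] + seq2
    return None

target = compose(("inc", "swap01", "inc", "swap23"))
print(mitm_search(target, 2))
```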

495 citations


Journal ArticleDOI
TL;DR: The proposed SPICE model builds on existing models and is correlated against several published device characterization data with an average error of 6.04%.
Abstract: This paper presents a SPICE model for memristive devices. It builds on existing models and is correlated against several published device characterization data with an average error of 6.04%. When compared to existing alternatives, the proposed model can more accurately simulate a wide range of published memristors. The model is also tested in large circuits with up to 256 memristors, and was less likely to cause convergence errors when compared to other models. We show that the model can be used to study the impact of memristive device variation within a circuit. We examine the impact of nonuniformity in device state variable dynamics and conductivity on individual memristors as well as a four memristor read/write circuit. These studies show that the model can be used to predict how variation in a memristor wafer may impact circuit performance.
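For readers unfamiliar with memristor modeling, the following minimal sketch integrates the classic linear ion drift model (Strukov et al.), in which a normalized state variable controls the device resistance. It is not the generalized SPICE model of the paper; it only shows the state-variable/conductivity coupling that such models capture.

```python
# Minimal sketch of the classic linear ion drift memristor model, not the
# generalized SPICE model from the paper. State x = w/D in [0, 1] sets R.
import math

R_ON, R_OFF = 100.0, 16e3          # ohm
D = 10e-9                          # device thickness, m
MU_V = 1e-14                       # ion mobility, m^2 s^-1 V^-1

def simulate(v_of_t, dt, steps, x0=0.1):
    x = x0                          # normalized state w/D
    out = []
    for n in range(steps):
        v = v_of_t(n * dt)
        r = R_ON * x + R_OFF * (1.0 - x)
        i = v / r
        x += MU_V * R_ON / D**2 * i * dt   # dx/dt = mu_v * R_ON / D^2 * i
        x = min(max(x, 0.0), 1.0)          # hard window (simplest choice)
        out.append((n * dt, v, i, r))
    return out

# Drive with a 1 V, 1 Hz sine wave; the i-v trace forms the pinched
# hysteresis loop characteristic of memristive devices.
trace = simulate(lambda t: math.sin(2 * math.pi * t), dt=1e-4, steps=20000)
print("final resistance: %.1f ohm" % trace[-1][3])
```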

217 citations


Journal ArticleDOI
TL;DR: In this article, an intrusive spectral simulator for statistical circuit analysis is presented, which employs the recently developed generalized polynomial chaos expansion to perform uncertainty quantification of nonlinear transistor circuits with both Gaussian and non-Gaussian random parameters.
Abstract: Uncertainties have become a major concern in integrated circuit design. In order to avoid the huge number of repeated simulations in conventional Monte Carlo flows, this paper presents an intrusive spectral simulator for statistical circuit analysis. Our simulator employs the recently developed generalized polynomial chaos expansion to perform uncertainty quantification of nonlinear transistor circuits with both Gaussian and non-Gaussian random parameters. We modify the nonintrusive stochastic collocation (SC) method and develop an intrusive variant called stochastic testing (ST) method. Compared with the popular intrusive stochastic Galerkin (SG) method, the coupled deterministic equations resulting from our proposed ST method can be solved in a decoupled manner at each time point. At the same time, ST requires fewer samples and allows more flexible time step size controls than directly using a nonintrusive SC solver. These two properties make ST more efficient than SG and than existing SC methods, and more suitable for time-domain circuit simulation. Simulation results of several digital, analog and RF circuits are reported. Since our algorithm is based on generic mathematical models, the proposed ST algorithm can be applied to many other engineering problems.
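As a minimal sketch of the flavor of the approach (a scalar toy model, not the paper's intrusive transistor-level simulator), the snippet below fits a generalized polynomial chaos expansion in a single Gaussian parameter by evaluating the model at a small set of testing points and solving a square linear system, then compares the resulting mean with Monte Carlo.

```python
# gPC expansion in one Gaussian parameter, fitted by evaluating the model at a
# few testing points and solving a square system. Illustrative only.
import numpy as np
from numpy.polynomial import hermite_e as He   # probabilists' Hermite polynomials

def model(xi):
    """Toy circuit response as a function of a standard-normal parameter xi."""
    return np.exp(0.3 * xi) + 0.1 * xi**2

order = 4
# Use Gauss-Hermite(e) nodes as the testing points (one per unknown coefficient).
nodes, _ = He.hermegauss(order + 1)

# Matrix whose column k holds He_k evaluated at the testing points.
V = np.column_stack([He.hermeval(nodes, np.eye(order + 1)[k])
                     for k in range(order + 1)])
coeffs = np.linalg.solve(V, model(nodes))

# The mean of the response is the 0th coefficient; compare with Monte Carlo.
mc = model(np.random.default_rng(0).standard_normal(200_000)).mean()
print("gPC mean estimate:", coeffs[0], " Monte Carlo mean:", mc)
```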

167 citations


Journal ArticleDOI
TL;DR: This paper outlines specific sensing mechanisms that have been developed and their potential use in building underdesigned and opportunistic computing machines, whose software stack opportunistically adapts to sensed or modeled hardware.
Abstract: Microelectronic circuits exhibit increasing variations in performance, power consumption, and reliability parameters across the manufactured parts and across use of these parts over time in the field. These variations have led to increasing use of overdesign and guardbands in design and test to ensure yield and reliability with respect to a rigid set of datasheet specifications. This paper explores the possibility of constructing computing machines that purposely expose hardware variations to various layers of the system stack including software. This leads to the vision of underdesigned hardware that utilizes a software stack that opportunistically adapts to a sensed or modeled hardware. The envisioned underdesigned and opportunistic computing (UnO) machines face a number of challenges related to the sensing infrastructure and software interfaces that can effectively utilize the sensory data. In this paper, we outline specific sensing mechanisms that we have developed and their potential use in building UnO machines.

153 citations


Journal ArticleDOI
TL;DR: Two bounded-length maze routing (BLMR) algorithms are presented that perform much faster routing than traditional maze routing algorithms and a rectilinear Steiner minimum tree aware routing scheme is proposed to guide heuristic-BLMR and monotonic routing to build a routing tree with shorter wirelength.
Abstract: Modern global routers employ various routing methods to improve routing speed and quality. Maze routing is the most time-consuming process for existing global routing algorithms. This paper presents two bounded-length maze routing (BLMR) algorithms (optimal-BLMR and heuristic-BLMR) that perform much faster routing than traditional maze routing algorithms. In addition, a rectilinear Steiner minimum tree aware routing scheme is proposed to guide heuristic-BLMR and monotonic routing to build a routing tree with shorter wirelength. This paper also proposes a parallel multithreaded collision-aware global router based on a previous sequential global router (SGR). Unlike the partitioning-based strategy, the proposed parallel router uses a task-based concurrency strategy. Finally, a 3-D wirelength optimization technique is proposed to further refine the 3-D routing results. Experimental results reveal that the proposed SGR uses less wirelength and runs faster than most other state-of-the-art global routers with different sets of parameters. Compared to the proposed SGR, the proposed parallel router yields almost the same routing quality with average 2.71- and 3.12-fold speedups on overflow-free and hard-to-route cases, respectively, when running on a 4-core system.
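To make the bounded-length idea concrete (without the cost functions of optimal-/heuristic-BLMR or the parallel router), here is a minimal grid BFS that refuses to expand any path whose length could exceed a given wirelength bound:

```python
# Minimal sketch of bounded-length maze routing on a grid: a BFS that prunes
# any expansion whose length plus remaining Manhattan distance exceeds the bound.
from collections import deque

def bounded_maze_route(grid, src, dst, max_len):
    """grid[y][x] == 1 marks a blocked cell; returns a path or None."""
    h, w = len(grid), len(grid[0])
    best = {src: 0}
    prev = {}
    q = deque([src])
    while q:
        x, y = q.popleft()
        if (x, y) == dst:
            path = [(x, y)]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and not grid[ny][nx]:
                d = best[(x, y)] + 1
                # Prune: even the remaining Manhattan distance must fit the bound.
                if d + abs(dst[0] - nx) + abs(dst[1] - ny) > max_len:
                    continue
                if d < best.get((nx, ny), float("inf")):
                    best[(nx, ny)] = d
                    prev[(nx, ny)] = (x, y)
                    q.append((nx, ny))
    return None

blocked = [[0, 0, 0, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 0]]
print(bounded_maze_route(blocked, (0, 0), (3, 0), max_len=5))
```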

142 citations


Journal ArticleDOI
TL;DR: A physical-aware system reconfiguration technique that uses sensor data at intermediate checkpoints to dynamically reconfigure the biochip and a cyberphysical resynthesis technique is used to recompute electrode-actuation sequences, thereby deriving new schedules, module placement, and droplet routing pathways, with minimum impact on the time-to-response.
Abstract: Droplet-based digital microfluidics technology has now come of age, and software-controlled biochips for healthcare applications are starting to emerge. However, today's digital microfluidic biochips suffer from the drawback that there is no feedback to the control software from the underlying hardware platform. Due to the lack of precision inherent in biochemical experiments, errors are likely during droplet manipulation; error recovery based on the repetition of experiments leads to wastage of expensive reagents and hard-to-prepare samples. By exploiting recent advances in the integration of optical detectors (sensors) into a digital microfluidics biochip, we present a physical-aware system reconfiguration technique that uses sensor data at intermediate checkpoints to dynamically reconfigure the biochip. A cyberphysical resynthesis technique is used to recompute electrode-actuation sequences, thereby deriving new schedules, module placement, and droplet routing pathways, with minimum impact on the time-to-response.

126 citations


Journal ArticleDOI
TL;DR: The DVFS transition overhead is redefined including the underclocking-related losses in a DVFS-enabled microprocessor, additional inductor IR losses, and power losses due to discontinuous-mode DC-DC conversion.
Abstract: Dynamic voltage and frequency scaling (DVFS) has been studied for well over a decade. Nevertheless, existing DVFS transition overhead models suffer from significant inaccuracies; for example, by incorrectly accounting for the effect of DC–DC converters, frequency synthesizers, voltage, and frequency change policies on energy losses incurred during mode transitions. Incorrect and/or inaccurate DVFS transition overhead models prevent one from determining the precise break-even time and thus forfeit some of the energy saving that is ideally achievable. This paper introduces accurate DVFS transition overhead models for both energy consumption and delay. In particular, we redefine the DVFS transition overhead to include the underclocking-related losses in a DVFS-enabled microprocessor, additional inductor IR losses, and power losses due to discontinuous-mode DC–DC conversion. We report the transition overheads for representative desktop, mobile, and low-power processors. We also present a DVFS transition overhead macromodel for use by high-level DVFS schedulers.
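To illustrate the role the transition overhead plays for a scheduler, the small sketch below computes a break-even residence time from a lumped per-transition energy loss. The constants and the simple two-mode model are invented for illustration; the paper derives the overhead from detailed converter and processor models.

```python
# Illustrative break-even computation: how long must the processor stay in the
# low-voltage/low-frequency mode for the saved power to amortize the energy
# overhead of the two mode transitions? Constants below are made up.

P_HIGH = 12.0      # W, power in the high V/f mode
P_LOW = 5.0        # W, power in the low V/f mode
E_TRANS = 40e-6    # J, energy lost per transition

def break_even_time(p_high, p_low, e_trans, transitions=2):
    """Minimum residence time in the low mode that pays back the transitions."""
    return transitions * e_trans / (p_high - p_low)

t_be = break_even_time(P_HIGH, P_LOW, E_TRANS)
print(f"break-even residence time: {t_be * 1e6:.1f} us")
```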

122 citations


Journal ArticleDOI
TL;DR: This paper surveys key design for manufacturing issues for extreme scaling with emerging nanolithography technologies, including double/multiple patterning lithography, extreme ultraviolet lithography, and electron-beam lithography.
Abstract: In this paper, we survey key design for manufacturing issues for extreme scaling with emerging nanolithography technologies, including double/multiple patterning lithography, extreme ultraviolet lithography, and electron-beam lithography. These nanolithography and nanopatterning technologies have different manufacturing processes and pose their own unique challenges to very large scale integration (VLSI) physical design, mask synthesis, and so on. It is essential to have close VLSI design and underlying process technology co-optimization to achieve high product quality (power/performance, etc.) and yield while making future scaling cost-effective and worthwhile. Recent results and examples will be discussed to show the enablement and effectiveness of such design and process integration, including lithography model/analysis, mask synthesis, and lithography friendly physical design.

113 citations


Journal ArticleDOI
TL;DR: This paper presents a 3-D mesh-based ONoC for MPSoCs, and new low-cost nonblocking 4 × 4, 5 × 5, 6 × 6, and 7 × 7 optical routers for dimension-order routing in the 3-D mesh-based ONoC, and proposes an optimized floorplan.
Abstract: Optical networks-on-chip (ONoCs) are emerging communication architectures that can potentially offer ultrahigh communication bandwidth and low latency to multiprocessor systems-on-chip (MPSoCs). In addition to ONoC architectures, 3-D integrated technologies offer an opportunity to continue performance improvements with higher integration densities. In this paper, we present a 3-D mesh-based ONoC for MPSoCs, and new low-cost nonblocking 4 × 4, 5 × 5, 6 × 6, and 7 × 7 optical routers for dimension-order routing in the 3-D mesh-based ONoC. Besides, we propose an optimized floorplan for the 3-D mesh-based ONoC. The floorplan follows the regular 3-D mesh topology but implements all optical routers in a single optical layer. The floorplan is optimized to minimize the number of extra waveguide crossings caused when merging the 3-D ONoC to one optical layer. Based on a set of real applications and uniform traffic pattern, we develop a SystemC-based cycle-accurate NoC simulator and compare the 3-D mesh-based ONoC with the matched 2-D mesh-based ONoC and 2-D electronic NoC for performance and energy efficiency. Additionally, we quantitatively analyze thermal effects on the 3-D 8 × 8 × 2 mesh-based ONoC.

104 citations


Journal ArticleDOI
TL;DR: This work proposes a smart diagnosis method based on two ML classification models, namely, artificial neural networks (ANNs) and support-vector machines (SVMs) that can learn from repair history and accurately localize the root cause of a failure.
Abstract: Increasing integration densities and high operating speeds lead to subtle manifestation of defects at the board level. Functional fault diagnosis is, therefore, necessary for board-level product qualification. However, ambiguous diagnosis results lead to long debug times and even wrong repair actions, which significantly increase repair cost and adversely impact yield. Advanced machine-learning (ML) techniques offer an unprecedented opportunity to increase the accuracy of board-level functional diagnosis and reduce high-volume manufacturing cost through successful repair. We propose a smart diagnosis method based on two ML classification models, namely, artificial neural networks (ANNs) and support-vector machines (SVMs) that can learn from repair history and accurately localize the root cause of a failure. Fine-grained fault syndromes extracted from failure logs and corresponding repair actions are used to train the classification models. We also propose a decision machine based on weighted-majority voting, which combines the benefits of ANNs and SVMs. Three complex boards from the industry, currently in volume production, and additional synthetic data, are used to validate the proposed methods in terms of diagnostic accuracy, resolution, and quantifiable improvement over current diagnostic software.
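A minimal sketch of the two-classifier idea is shown below: train a neural network and an SVM on (fault-syndrome, repair-action) pairs and combine them by weighted-majority voting. scikit-learn is assumed, and the syndrome data here is synthetic, not board repair history.

```python
# Train an MLP and an SVM on synthetic syndrome data and combine them by
# weighted voting; illustrative only, not the paper's industrial setup.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_syndromes, n_classes = 40, 4
X = rng.integers(0, 2, size=(600, n_syndromes)).astype(float)
# Synthetic ground truth: the repair action depends on a few syndrome bits.
y = (X[:, 0] + 2 * X[:, 1] + X[:, 2] * X[:, 3]).astype(int) % n_classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)
svm = SVC(kernel="rbf", probability=True, random_state=0).fit(X_tr, y_tr)

# Weighted-majority vote: weight each model by its training accuracy.
w_ann, w_svm = ann.score(X_tr, y_tr), svm.score(X_tr, y_tr)
votes = w_ann * ann.predict_proba(X_te) + w_svm * svm.predict_proba(X_te)
combined = votes.argmax(axis=1)

print("ANN:", ann.score(X_te, y_te),
      "SVM:", svm.score(X_te, y_te),
      "vote:", (combined == y_te).mean())
```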

Journal ArticleDOI
TL;DR: The automatic layout generation is demonstrated here using the LAYGEN II tool for typical analog circuit structures, and the results in GDSII format were validated using the industrial grade verification tool Calibre®.
Abstract: This paper describes an innovative design automation tool, LAYGEN II, for analog integrated circuit (IC) layout generation based on template descriptions and on evolutionary computation techniques. LAYGEN II was developed giving special emphasis to the reusability of expert knowledge and to the efficiency of retargeting operations. The designer specifies the sized circuit-level structure, the required technology and also, the layout template consisting of technology and specification independent high-level layout guidelines. For placement, the topological relations present in the template are extracted to a nonslicing B*-tree layout representation, and the tool automatically merges devices and improves the floorplan quality. For routing an optimization kernel consisting of a tailored version of the multiobjective multiconstraint evolutionary algorithm NSGA-II is used. The Router optimizes all nets simultaneously and uses a built-in engine to evaluate each of the layout solutions. The automatic layout generation is demonstrated here using the LAYGEN II tool for typical analog circuit structures, and the results in GDSII format were validated using the industrial grade verification tool Calibre®.

Journal ArticleDOI
TL;DR: A floating random walk (FRW) solver, called RWCap, is presented for the capacitance extraction of very-large-scale integration (VLSI) interconnects and it is demonstrated that the parallel RWCap is over 6× faster than its serial-computing version.
Abstract: A floating random walk (FRW) solver, called RWCap, is presented for the capacitance extraction of very-large-scale integration (VLSI) interconnects. An approach, including the numerical characterization of the cross-interface transition probability and weight value, is proposed to accelerate the extraction of structures with multiple dielectric layers. A comprehensive variance reduction scheme based on importance sampling and stratified sampling is proposed to improve the convergence rate of the FRW algorithm. Finally, the space management technique using an octree data structure and the parallel computing technique are presented to further improve the efficiency. Numerical experiments are carried out with test cases generated under the 180 and 45-nm process technologies. They demonstrate that the proposed multidielectric FRW algorithm achieves up to 160× speedup over the FRW algorithm using spherical transition domains to cross dielectric interfaces, with very small memory overhead. The variance reduction techniques further bring 3× or more speedup without memory overhead or loss of accuracy. The RWCap also outperforms other existing FRW algorithms and fast boundary element method solvers in terms of computational time or scalability. The experiments on an 8-core CPU machine show that the parallel RWCap is over 6× faster than its serial-computing version.
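The Monte Carlo primitive underlying floating-random-walk field solvers can be seen in a trivially simple setting: a random walk between two conductors estimates the electrostatic potential, which in 1-D has a known closed form. The sketch below shows only this primitive, not the paper's 3-D multidielectric transitions or variance-reduction machinery.

```python
# Toy 1-D problem: conductors at x = 0 (held at 0 V) and x = N (1 V).
# A symmetric random walk started at x0 hits the 1 V conductor with
# probability x0/N, which equals the Laplace-equation potential at x0.
import random

def walk_potential(x0, N, n_walks=200_000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_walks):
        x = x0
        while 0 < x < N:
            x += rng.choice((-1, 1))
        hits += (x == N)          # reached the 1 V conductor
    return hits / n_walks

N, x0 = 10, 3
print("estimated potential:", walk_potential(x0, N), " exact:", x0 / N)
```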

Journal ArticleDOI
TL;DR: The results show that GoldMine can generate complex, high coverage assertions for sequential as well as combinational designs in RTL, thereby minimizing human effort in this process.
Abstract: We present GoldMine, a methodology for generating assertions automatically in hardware. Our method involves a combination of data mining and static analysis of the register transfer level (RTL) design. The RTL design is first simulated to generate data about the design's dynamic behavior. The generated data is then mined for “candidate assertions” that are likely to be invariants. The data mining algorithm is a decision-tree-based supervised learning algorithm. These candidate assertions are then passed through a formal verification engine to filter out the spurious candidates. The assertions that are attested as true by the formal engine are system invariants. These are then evaluated by a process of designer ranking that is provided as feedback to the data mining engine. We demonstrate the scalability of GoldMine by showing assertion generation of the RTL of Sun's OpenSparc T2 many-threaded processor. Our results show that GoldMine can generate complex, high coverage assertions for sequential as well as combinational designs in RTL, thereby minimizing human effort in this process. GoldMine assertions distill the random input stimulus space and can be used for calibrating directed tests. They can be used in a regression test suite of an evolving RTL. They are also useful in providing differing perspectives from the designer, as well as hints to designers for manually writing assertions.
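The mine-then-check flow can be illustrated on a toy scale: propose candidate implications that hold on every simulated sample, then check the survivors exhaustively (standing in for the formal engine). GoldMine itself mines with decision-tree learning and checks with formal verification; the block and the mining rule below are invented for illustration.

```python
# Tiny sketch of the mine-then-check flow on a combinational block:
# candidates are single-literal implications (input_i == v) -> (output == w).
from itertools import product

def dut(a, b, c):                 # design under test: a simple combinational block
    return (a & b) | c

inputs = ["a", "b", "c"]

# 1) "Simulate": sample a subset of the input space.
trace = [dict(zip(inputs, vec), out=dut(*vec)) for vec in
         [(0, 0, 1), (1, 1, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]]

# 2) Mine candidates: implications consistent with every trace sample.
candidates = []
for var, val, out_val in product(inputs, (0, 1), (0, 1)):
    if all(s["out"] == out_val for s in trace if s[var] == val) and \
       any(s[var] == val for s in trace):
        candidates.append((var, val, out_val))

# 3) "Formally" check the candidates (exhaustive here because the DUT is tiny).
proven = [c for c in candidates
          if all(dut(*vec) == c[2]
                 for vec in product((0, 1), repeat=3)
                 if dict(zip(inputs, vec))[c[0]] == c[1])]

print("candidates:", candidates)
print("proven assertions:", proven)   # only ('c', 1, 1): c == 1 -> out == 1
```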

Journal ArticleDOI
TL;DR: A novel TSV repair framework is presented, including a hardware redundancy architecture that enables faulty TSVs to be repaired by redundant TSVs that are farther apart, the corresponding repair algorithm and the redundancy architecture construction, which can improve the manufacturing yield for 3-D-stacked ICs.
Abstract: 3-D-stacked integrated circuits (ICs) that employ through-silicon vias (TSVs) to connect multiple dies vertically have gained wide-spread interest in the semiconductor industry. In order to be commercially viable, the assembly yield for 3-D-stacked ICs must be as high as possible, requiring TSVs to be reparable. Existing techniques typically assume TSV faults to be uniformly distributed and use neighboring TSVs to repair faulty ones, if any. In practice, however, clustered TSV faults are quite common due to the fact that the TSV bonding quality depends on surface roughness and cleanness of silicon dies, rendering prior TSV redundancy solutions less effective. Furthermore, existing techniques consume a lot of redundant TSVs that are still costly in the current TSV process. This inefficient TSV redundancy can limit the amount of TSVs that is allowed to use and may even become the obstacle to commercial production. To resolve this problem, we present a novel TSV repair framework, including a hardware redundancy architecture that enables faulty TSVs to be repaired by redundant TSVs that are farther apart, the corresponding repair algorithm and the redundancy architecture construction. By doing so, the manufacturing yield for 3-D-stacked ICs can be dramatically improved, as demonstrated in our experimental results.

Journal ArticleDOI
TL;DR: This paper addresses formal verification of combinational arithmetic circuits over Galois fields of the type F_{2^k} using a computer-algebra/algebraic-geometry-based approach and demonstrates the ability of this approach to verify the correctness of, and detect bugs in, up to 163-bit circuits in F_{2^{163}}, whereas verification utilizing contemporary techniques proves infeasible.
Abstract: Galois field arithmetic is a critical component in communication and security-related hardware, requiring dedicated arithmetic architectures for better performance. In many Galois field applications, such as cryptography, the data-path size in the circuits can be very large. Formal verification of such circuits is beyond the capabilities of contemporary verification techniques. This paper addresses formal verification of combinational arithmetic circuits over Galois fields of the type F_{2^k} using a computer-algebra/algebraic-geometry-based approach. The verification problem is formulated as membership testing of a given specification polynomial in a corresponding ideal generated by the circuit constraints. Ideal membership testing requires the computation of a Gröbner basis, which is computationally very expensive. To overcome this limitation, we analyze the circuit topology and derive a term order to represent the polynomials. Subsequently, using the theory of Gröbner bases over F_{2^k}, we show that this term order renders the set of polynomials itself a minimal Gröbner basis of this ideal. Consequently, the verification test reduces to a much simpler case of Gröbner basis reduction via polynomial division, significantly enhancing verification efficiency. To further improve our approach, we exploit the concepts presented in the F4 algorithm for Gröbner bases, and show that the verification test can be formulated as Gaussian elimination on a matrix representation of the problem. Finally, we demonstrate the ability of our approach to verify the correctness of, and detect bugs in, up to 163-bit circuits in F_{2^{163}}, whereas verification utilizing contemporary techniques proves infeasible.
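The "reduce the specification polynomial by the circuit polynomials" idea can be shown on a toy scale over GF(2) in the Boolean ring, using substitution in reverse topological order. This is only an illustration of polynomial reduction; it does not reproduce the paper's Gröbner-basis theory over F_{2^k} or its F4-style matrix formulation.

```python
# A polynomial over GF(2) is a set of monomials; a monomial is a frozenset of
# variables (x^2 = x in the Boolean ring). Reducing the spec to the empty set
# means the circuit implements the specification.

def poly(*monomials):
    return set(frozenset(m) for m in monomials)

def add(p, q):          # XOR of monomial sets (coefficients mod 2)
    return p ^ q

def mul(p, q):          # product with idempotent variables (x*x = x)
    r = set()
    for a in p:
        for b in q:
            r ^= {a | b}
    return r

def substitute(p, var, defn):
    """Replace 'var' in p by the polynomial 'defn'."""
    out = set()
    for m in p:
        if var in m:
            out = add(out, mul(poly(m - {var}), defn))
        else:
            out = add(out, poly(m))
    return out

# Circuit: the sum bit of a full adder built from two XOR gates:
#   t = a + b, s = t + cin   (XOR is addition over GF(2))
gates = [("s", poly({"t"}, {"cin"})),
         ("t", poly({"a"}, {"b"}))]

# Specification polynomial: s + a + b + cin, which should reduce to 0.
spec = poly({"s"}, {"a"}, {"b"}, {"cin"})

# Reduce by substituting gate definitions in reverse topological order.
for out_var, defn in gates:
    spec = substitute(spec, out_var, defn)

print("reduced spec:", spec)   # empty set -> circuit matches the specification
```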

Journal ArticleDOI
TL;DR: An adaptive sparse matrix solver called NICSLU is proposed, which uses a multithreaded parallel LU factorization algorithm on shared-memory computers with multicore/multisocket central processing units to accelerate circuit simulation.
Abstract: The sparse matrix solver has become a bottleneck in simulation program with integrated circuit emphasis (SPICE)-like circuit simulators. It is difficult to parallelize the solver because of the high data dependency during the numeric LU factorization and the irregular structure of circuit matrices. This paper proposes an adaptive sparse matrix solver called NICSLU, which uses a multithreaded parallel LU factorization algorithm on shared-memory computers with multicore/multisocket central processing units to accelerate circuit simulation. The solver can be used in all the SPICE-like circuit simulators. A simple method is proposed to predict whether a matrix is suitable for parallel factorization, such that each matrix can achieve optimal performance. The experimental results on 35 matrices reveal that NICSLU achieves speedups of 2.08×-8.57× (on the geometric mean), compared with KLU, with 1-12 threads, for the matrices which are suitable for the parallel algorithm. NICSLU can be downloaded from http://nicslu.weebly.com.

Journal ArticleDOI
TL;DR: This paper proposes a new 3-D cell placement algorithm that can additionally consider the sizes of TSVs and the physical positions for TSV insertion during placement, and can achieve the best routed wirelength, TSV counts, and total silicon area, in shortest running time.
Abstract: Through-silicon vias (TSVs) are required for transmitting signals among different dies for the 3-D integrated circuit (IC) technology. The significant silicon areas occupied by TSVs bring critical challenges for 3-D IC placement. Unlike most published 3-D placement works that only minimize the number of TSVs during placement due to the limitations in their techniques, this paper proposes a new 3-D cell placement algorithm that can additionally consider the sizes of TSVs and the physical positions for TSV insertion during placement. The algorithm consists of three stages: 1) 3-D analytical global placement with density optimization and whitespace reservation for TSVs; 2) TSV insertion and TSV-aware legalization; and 3) layer-by-layer detailed placement. In particular, the global placement is based on a novel weighted-average (WA) wirelength model, giving the first published model that can outperform the well-known log-sum-exp wirelength model theoretically and empirically. Also, a scheme is proposed to enhance the numerical stability of the WA wirelength model. Furthermore, 3-D routing can easily be accomplished by traditional 2-D routers since the physical positions of TSVs are determined during placement. Experimental results show the effectiveness of our algorithm. Compared with state-of-the-art 3-D cell placement works, our algorithm can achieve the best routed wirelength, TSV counts, and total silicon area, in shortest running time.
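For reference, the weighted-average (WA) and log-sum-exp (LSE) wirelength models are both smooth approximations of half-perimeter wirelength along one axis. The formulas below follow the standard definitions in the placement literature; the paper's contribution is, among other things, a numerical-stability scheme for the WA model and its use inside a full TSV-aware 3-D placer.

```python
# WA versus LSE smooth wirelength along one axis, compared with exact HPWL.
import numpy as np

def hpwl(x):
    return x.max() - x.min()

def lse_wl(x, gamma):
    return gamma * (np.log(np.exp(x / gamma).sum())
                    + np.log(np.exp(-x / gamma).sum()))

def wa_wl(x, gamma):
    ex, enx = np.exp(x / gamma), np.exp(-x / gamma)
    return (x * ex).sum() / ex.sum() - (x * enx).sum() / enx.sum()

pins = np.array([2.0, 3.5, 7.0, 9.0])
for gamma in (4.0, 1.0, 0.25):
    print(f"gamma={gamma:5}: HPWL={hpwl(pins):.2f}  "
          f"LSE={lse_wl(pins, gamma):.2f}  WA={wa_wl(pins, gamma):.2f}")
```

As gamma shrinks, both models converge to the exact half-perimeter value; the interest of the WA form is its tighter approximation for a given gamma.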

Journal ArticleDOI
TL;DR: It is argued that a split manufacturing approach to hardware trust based on 3-D integration is viable and provides several advantages over other approaches.
Abstract: Securing the supply chain of integrated circuits is of utmost importance to computer security. In addition to counterfeit microelectronics, the theft or malicious modification of designs in the foundry can result in catastrophic damage to critical systems and large projects. In this letter, we describe a 3-D architecture that splits a design into two separate tiers: one tier that contains critical security functions is manufactured in a trusted foundry; another tier is manufactured in an unsecured foundry. We argue that a split manufacturing approach to hardware trust based on 3-D integration is viable and provides several advantages over other approaches.

Journal ArticleDOI
TL;DR: This paper examines the GC process and proposes a semipreemptible GC (PGC) scheme that allows GC processing to be preempted while pending I/O requests in the queue are serviced, and further enhances flash performance by pipelining internal GC operations and merging them with pending I/O requests whenever possible.
Abstract: Unlike hard disks, flash devices use out-of-place update operations and require a garbage collection (GC) process to reclaim invalid pages and create free blocks. This GC process is a major cause of performance degradation when running concurrently with other I/O operations, as internal bandwidth is consumed to reclaim these invalid pages. The invocation of the GC process is generally governed by a low watermark on free blocks and other internal device metrics that different workloads meet at different intervals. This results in an I/O performance that is highly dependent on workload characteristics. In this paper, we examine the GC process and propose a semipreemptible GC (PGC) scheme that allows GC processing to be preempted while pending I/O requests in the queue are serviced. Moreover, we further enhance flash performance by pipelining internal GC operations and merging them with pending I/O requests whenever possible. Our experimental evaluation of this semi-PGC scheme with realistic workloads demonstrates both improved performance and reduced performance variability. Write-dominant workloads show up to a 66.56% improvement in average response time with a 83.30% reduced variance in response time compared to the non-PGC scheme. In addition, we explore opportunities of a new NAND flash device that supports suspend/resume commands for read, write, and erase operations for fully PGC (F-PGC). Our experiments with an F-PGC enabled flash device show that request response time can be improved by up to 14.57% compared to semi-PGC.
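A toy event-loop sketch of the core preemption decision follows: between individual page copies of a GC victim block, check the host I/O queue and service pending requests first. All structures, timings, and the workload are invented for illustration; the paper additionally pipelines and merges GC operations with host I/O.

```python
# Toy model: GC copies valid pages one at a time and, if semipreemptible,
# yields to any host I/O that has arrived between page copies.
from collections import deque

PAGE_COPY_US, HOST_IO_US = 200, 100

def run(valid_pages_to_move, host_arrivals, preemptible=True):
    """host_arrivals: request arrival times (us). Returns response times."""
    queue, t, responses, moved = deque(), 0, [], 0
    arrivals = deque(sorted(host_arrivals))
    while moved < valid_pages_to_move or arrivals or queue:
        while arrivals and arrivals[0] <= t:
            queue.append(arrivals.popleft())
        if queue and (preemptible or moved >= valid_pages_to_move):
            arrived = queue.popleft()
            t += HOST_IO_US
            responses.append(t - arrived)
        elif moved < valid_pages_to_move:
            t += PAGE_COPY_US          # copy one valid page of the victim block
            moved += 1
        else:
            t = arrivals[0]            # idle until the next request arrives
    return responses

arrivals = [50, 250, 450]
for mode in (False, True):
    r = run(valid_pages_to_move=8, host_arrivals=arrivals, preemptible=mode)
    print(("semi-preemptible" if mode else "non-preemptible"),
          "avg response:", sum(r) / len(r), "us")
```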

Journal ArticleDOI
TL;DR: A concurrent error detection technique called recomputing with permuted operands (REPO) is developed that is cost effective in advanced encryption standard (AES) and a secure hash function Grøstl and achieves close to 100% fault coverage for multiple byte faults.
Abstract: Naturally occurring and maliciously injected faults reduce the reliability of cryptographic hardware and may leak confidential information. We develop a concurrent error detection (CED) technique called recomputing with permuted operands (REPO). We show that it is cost effective in the advanced encryption standard (AES) and the secure hash function Grøstl. We provide experimental results and formal proofs to show that REPO detects all single-bit and single-byte faults. Experimental results show that REPO achieves close to 100% fault coverage for multiple byte faults. The hardware and throughput overheads are compared with those of previously reported CED techniques on two Xilinx Virtex FPGAs. The hardware overhead is 12.4%-27.3%, and the throughput is 1.2-23 Gbps, depending on the AES architecture, FPGA family, and detection latency. The performance overhead ranges from 10% to 100% depending on the security level. Moreover, the proposed technique can be integrated into various block cipher modes of operation. We also discuss the limitation of REPO and its potential vulnerabilities.
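The recompute-with-permuted-operands principle can be illustrated on a toy byte-wise operation: compute normally, recompute with byte-permuted operands, un-permute the result, and compare. A fault pinned to one physical byte lane corrupts different logical bytes in the two runs, so the comparison catches it. The datapath, permutation, and fault model below are invented; the paper applies the idea to AES and Grøstl hardware.

```python
# REPO-style check on a toy per-lane operation with an optional stuck lane.
LANES = 4
PERM = [1, 2, 3, 0]                      # rotate the byte lanes by one

def bytewise_op(a, b, fault_lane=None):
    """Toy datapath: per-lane modular add; optionally a stuck fault in one lane."""
    out = [(x + y) % 256 for x, y in zip(a, b)]
    if fault_lane is not None:
        out[fault_lane] = 0xFF           # stuck-at fault in that physical lane
    return out

def permute(v):
    return [v[PERM[i]] for i in range(LANES)]

def unpermute(v):
    out = [0] * LANES
    for i in range(LANES):
        out[PERM[i]] = v[i]
    return out

def repo_check(a, b, fault_lane=None):
    normal = bytewise_op(a, b, fault_lane)
    permuted = unpermute(bytewise_op(permute(a), permute(b), fault_lane))
    return normal == permuted            # False -> fault detected

a, b = [1, 2, 3, 4], [10, 20, 30, 40]
print("fault-free consistent:", repo_check(a, b))            # True
print("faulty lane detected:", not repo_check(a, b, 2))       # True
```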

Journal ArticleDOI
TL;DR: Two polynomial time algorithms, RDPM and RDPM-DUP, have been proposed to generate near-optimal data placement with minimum total cost, effectively utilizing SPMs on multicore systems.
Abstract: Scratch pad memories (SPM) are attractive alternatives for caches on multicore systems since caches are relatively expensive in terms of area and energy consumption. The key to effectively utilizing SPMs on multicore systems is the data placement algorithm. In this paper, two polynomial time algorithms, regional data placement for multicore (RDPM) and regional data placement for multicore with duplication (RDPM-DUP), have been proposed to generate near-optimal data placement with minimum total cost. RDPM keeps only one copy of each data item, while RDPM-DUP allows data duplication. Experimental results show that the proposed RDPM algorithm alone can reduce the time cost of memory accesses by 32.68% on average compared with existing algorithms. With data duplication, the RDPM-DUP algorithm further reduces the time cost by 40.87%. In terms of energy consumption, the proposed RDPM algorithm with exclusive copies can reduce the total cost by 33.47% on average. When RDPM-DUP is applied, the improvement increases up to 38.15% on average.

Journal ArticleDOI
TL;DR: A rigorous analytical thermal model has been formulated for the analysis of self-heating effects in FinFETs, under both steady-state and transient stress conditions, which is critical for improving circuit performance and electrical overstress/electrostatic discharge (ESD) reliability.
Abstract: A rigorous analytical thermal model has been formulated for the analysis of self-heating effects in FinFETs, under both steady-state and transient stress conditions. 3-D self-consistent electrothermal simulations, tuned with experimentally measured electrical characteristics, were used to understand the nature of self-heating in FinFETs and calibrate the proposed model. The accuracy of the model has been demonstrated for a wide range of multifin devices by comparing it against finite element simulations. The model has been applied to carry out a detailed sensitivity analysis of self-heating with respect to various FinFET parameters and structures, which are critical for improving circuit performance and electrical overstress/electrostatic discharge (ESD) reliability. The transient model has been used to estimate the thermal time constants of these devices and predict the sensitivity of power-to-failure to various device parameters, for both long and short pulse ESD situations. Suitable modifications to the model are also proposed for evaluating the thermal characteristics of production level FinFET (or Tri-gate FET) structures involving metal-gates, body-tied bulk FinFETs, and trench contacts.

Journal ArticleDOI
TL;DR: A comprehensive comparative analysis of virtual channels and multiple physical networks, including an analytical model, synthesis-based designs with both FPGAs and standard-cell libraries, and system-level simulations identifies the scenarios where each method is best suited to achieve high performance, very low power dissipation, and increased design flexibility.
Abstract: Virtual channels (VC) and multiple physical (MP) networks are two alternative methods to provide better performance, support quality-of-service, and avoid protocol deadlocks in packet-switched network-on-chip design. Since contention can be dynamically resolved, VCs give lower zero-load packet latency than MPs; however, MPs can be built with simpler routers and narrower channels, which improves the target clock frequency, power dissipation, and area occupation. In this paper, we present a comprehensive comparative analysis of these two design approaches, including an analytical model, synthesis-based designs with both FPGAs and standard-cell libraries, and system-level simulations. The result of our analysis shows that one solution does not outperform the other in all the tested scenarios. Instead, each approach has its own specific strengths and weaknesses. Hence, we identify the scenarios where each method is best suited to achieve high performance, very low power dissipation, and increased design flexibility.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed technique can achieve reliability-oriented placement for DMFBs without excessive actuation in each electrode, while optimizing bioassay completion time.
Abstract: In recent studies, digital microfluidic biochips (DMFBs) have been a promising solution for lab-on-a-chip and bio-assay experiments because of their flexible application and low fabrication cost. However, the reliability problem is an imperative issue to guarantee the valid function of DMFBs. The reliability of DMFBs decreases when electrodes are excessively actuated, preventing droplets on DMFBs from being controlled successfully. Because the placement for bio-assays in DMFBs is a key step in generating corresponding actuating signals, the reliability of DMFBs must be considered during biochip placement to avoid excessive actuation. Although researchers have proposed several DMFB placement algorithms, they have failed to consider the reliability issue. In addition, previous algorithms were all based on the simulated-annealing (SA) method, which is time consuming and does not guarantee an optimal solution. This paper proposes the first reliability-oriented non-SA placement algorithm for DMFBs. This approach considers the reliability problem during placement, and uses the 3-D deferred decision making (3D-DDM) technique to enumerate only possible placement solutions. Large-scale DMFB placement can be synthesized efficiently by partitioning the operation sequential graph of bioassays. Experimental results demonstrate that the proposed technique can achieve reliability-oriented placement for DMFBs without excessive actuation in each electrode, while optimizing bioassay completion time.

Journal ArticleDOI
TL;DR: A multitarget sample preparation algorithm that extensively exploits the ideas of waste recycling and intermediate droplet sharing to reduce both reactant usage and waste amount for digital microfluidic biochips is proposed.
Abstract: Sample preparation is one of the essential processes in biochemical reactions. Raw reactants are diluted in this process to achieve given target concentrations. A bioassay may require several different target concentrations of a reactant. Both the dilution operation count and the reactant usage can be minimized if multiple target concentrations are considered simultaneously during sample preparation. Hence, in this paper, we propose a multitarget sample preparation algorithm that extensively exploits the ideas of waste recycling and intermediate droplet sharing to reduce both reactant usage and waste amount for digital microfluidic biochips. Experimental results show that our waste recycling algorithm can reduce the waste and operation count by 48% and 37%, respectively, as compared to an existing state-of-the-art multitarget sample preparation method if the number of target concentrations is ten. The reduction can be up to 97% and 73% when the number of target concentrations goes even higher.
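For context, under the (1:1) mixing model commonly assumed on digital microfluidic biochips, each mix step averages the concentrations of two unit droplets, so a single target concentration with n bits of precision is reachable by a bit-by-bit dilution sequence of n-1 mixes. The sketch below shows only this single-target baseline, not the paper's multitarget waste-recycling and droplet-sharing algorithm.

```python
# Baseline bit-by-bit dilution under the (1:1) mixing model.
def dilution_sequence(target, bits=8):
    """Return mix steps reaching `target` (in [0,1], precision 2**-bits)."""
    code = round(target * (1 << bits))
    seq, conc = [], None
    for i in range(bits):                        # scan bits from LSB to MSB
        unit = 1.0 if (code >> i) & 1 else 0.0   # pure reactant (1) or buffer (0)
        if conc is None:
            conc = unit
        else:
            seq.append((conc, unit, (conc + unit) / 2))
            conc = (conc + unit) / 2
    return conc, seq

final, steps = dilution_sequence(0.3671875)      # = 94/256
print("achieved concentration:", final)
for c1, c2, out in steps:
    print(f"mix {c1:.6f} with {c2:.6f} -> {out:.6f}")
```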

Journal ArticleDOI
TL;DR: This paper presents a versatile prebond TSV test method applicable before wafer thinning, when the deep end of the TSV is inaccessible because it is buried in the still-thick wafer.
Abstract: Testing the quality of prebond through-silicon vias (TSV) is a vital part of the Known-Good-Die test that is often necessary to retain a high compound yield for 3-D stacked integrated circuits. In this paper, we present a versatile prebond TSV test method applicable before wafer thinning, when the deep end of the TSV is inaccessible because it is buried in the still-thick wafer. Technical merits include: 1) the ability to handle both the resistive open fault and the leakage fault in the same test structure; 2) a capability that allows a user to have a better measure of the severity of the fault; and 3) an all-digital, easy-to-implement design-for-testability circuit.

Journal ArticleDOI
TL;DR: This paper proposes an accurate model of SystemC and three complementary encodings of SystemC to finite-state processes, sequential and threaded programming models, and shows the effectiveness of the threaded and of the finite-model encodings to prove and disprove properties, respectively.
Abstract: SystemC is an increasingly used language for writing executable specifications of systems-on-chip. The verification of SystemC, however, is a very difficult challenge. Simulation features great scalability, but can miss important defects. On the other hand, formal verification of SystemC is extremely hard because of the presence of threads, and the intricacies of the communication and scheduling mechanisms. In this paper, we explore formal verification for SystemC by means of software model checking techniques, which have demonstrated substantial progress in recent years. We propose an accurate model of SystemC and three complementary encodings of SystemC to finite-state processes, sequential and threaded programming models. We implement the proposed approaches in a tool chain and carry out a thorough experimental evaluation using several benchmarks taken from the literature on SystemC verification, and experimenting with different state-of-the-art software model checkers. The results clearly show the applicability and efficiency of the proposed approaches. In particular, the results show the effectiveness of the threaded and of the finite-model encodings to prove and disprove properties, respectively.

Journal ArticleDOI
TL;DR: This paper is the first to formally describe the global charge allocation problem in HEES systems, namely, distributing a specified level of incoming power to a subset of destination EES banks so that maximum charge allocation efficiency is achieved.
Abstract: A hybrid electrical energy storage (HEES) system consists of multiple banks of heterogeneous electrical energy storage (EES) elements placed between a power source and some load devices and providing charge storage and retrieval functions. For an HEES system to perform its desired functions of 1) reducing electricity costs by storing electricity obtained from the power grid at off-peak times when its price is lower, for use at peak times instead of electricity that must be bought then at higher prices, and 2) alleviating problems, such as excessive power fluctuation and undependable power supply, which are associated with the use of large amounts of renewable energy on the grid, appropriate charge management policies must be developed in order to efficiently store and retrieve electrical energy while attaining performance metrics that are close to the respective best values across the constituent EES banks in the HEES system. This paper is the first to formally describe the global charge allocation problem in HEES systems, namely, distributing a specified level of incoming power to a subset of destination EES banks so that maximum charge allocation efficiency is achieved. The problem is formulated as a mixed integer nonlinear program with the objective function set to the global charge allocation efficiency and the constraints capturing key requirements and features of the system, such as the energy conservation law, power conversion losses in the chargers, the rate capacity, and self-discharge effects in the EES elements. A rigorous algorithm is provided to obtain near-optimal charge allocation efficiency under a daily charge allocation schedule. A photovoltaic array is used as an example of the power source for the charge allocation process, and a heuristic is provided to predict the solar radiation level with a high accuracy. Simulation results using this photovoltaic cell array and a representative HEES system demonstrate up to 25% gain in the charge allocation efficiency by employing the proposed algorithm.

Journal ArticleDOI
TL;DR: The experimental results show that this analytical approach is effective for achieving tradeoffs between the wirelength and the through-silicon-via (TSV) number, and suggest that considering the thermal effects of TSVs is necessary and effective during the placement stage.
Abstract: In this paper, we present a high-quality analytical 3-D placement framework. We propose using a Huber-based local smoothing technique to work with a Helmholtz-based global smoothing technique to handle the nonoverlapping constraints. The experimental results show that this analytical approach is effective for achieving tradeoffs between the wirelength and the through-silicon-via (TSV) number. Compared to the state-of-the-art 3-D placer ntuplace3d, our placer achieves more than 20% wirelength reduction, on average, with a similar number of TSVs. Furthermore, we extend this analytical 3-D placement framework with thermal awareness. While 2-D thermal-aware placement simply follows uniform power distribution to minimize temperature, we show that the same criterion does not work for 3-D ICs. Instead, we are able to prove that when the TSV area in each bin is proportional to the lumped power consumption of that bin and the bins in all tiers directly above it, the peak temperature is minimized. Based on this criterion, we implement thermal awareness in our analytical 3-D placement framework. Compared with a TSV oblivious method, which only results in an 8% peak temperature reduction, our method reduces the peak temperature by 34%, on average, with slightly less wirelength overhead. These results suggest that considering the thermal effects of TSVs is necessary and effective during the placement stage.