
Showing papers in "ACM Journal on Emerging Technologies in Computing Systems in 2009"


Journal ArticleDOI
TL;DR: Two techniques are proposed for designing scan chains in 3D ICs under given constraints on the number of through-silicon vias (TSVs): the first, based on a genetic algorithm (GA), addresses the ordering of cells in a single scan chain; the second, based on integer linear programming (ILP), addresses single-scan-chain ordering as well as the partitioning of scan flip-flops into multiple chains.
Abstract: Scan chains are widely used to improve the testability of integrated circuit (IC) designs and to facilitate fault diagnosis. For traditional 2D IC design, a number of design techniques have been proposed in the literature for scan-chain routing and scan-cell partitioning. However, these techniques are not effective for three-dimensional (3D) technologies, which have recently emerged as a promising means to continue technology scaling. In this article, we propose two techniques for designing scan chains in 3D ICs, with given constraints on the number of through-silicon vias (TSVs). The first technique is based on a genetic algorithm (GA), and it addresses the ordering of cells in a single scan chain. The second optimization technique is based on integer linear programming (ILP); it addresses single-scan-chain ordering as well as the partitioning of scan flip-flops into multiple scan chains. We compare these two methods by conducting experiments on a set of ISCAS'89 benchmark circuits. The first conclusion obtained from the results is that 3D scan-chain optimization achieves significant wirelength reduction compared to 2D counterparts. The second conclusion is that the ILP-based technique provides lower bounds on the scan-chain interconnect length for 3D ICs, and it offers considerable reduction in wirelength compared to the GA-based heuristic method.
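
A minimal sketch of the GA-based single-chain ordering, under an assumed toy cost model rather than the paper's actual formulation: cells carry (x, y, layer) coordinates, intra-layer wirelength is Manhattan distance, each layer crossing consumes one TSV, the chromosome is a cell permutation, and orderings that exceed the TSV budget are penalized.

```python
import random

def chain_cost(order, cells, tsv_budget, penalty=10_000):
    """Wirelength of a scan order plus a penalty if TSV use exceeds the
    budget. cells: dict mapping cell id -> (x, y, layer)."""
    wirelength, tsvs = 0, 0
    for a, b in zip(order, order[1:]):
        (x1, y1, l1), (x2, y2, l2) = cells[a], cells[b]
        wirelength += abs(x1 - x2) + abs(y1 - y2)
        tsvs += abs(l1 - l2)  # each layer crossing needs one TSV
    return wirelength + (penalty if tsvs > tsv_budget else 0)

def order_crossover(p1, p2):
    """OX crossover: copy a slice from p1, fill the rest in p2's order."""
    i, j = sorted(random.sample(range(len(p1)), 2))
    hole = set(p1[i:j])
    filler = [g for g in p2 if g not in hole]
    return filler[:i] + p1[i:j] + filler[i:]

def ga_scan_order(cells, tsv_budget, pop=50, gens=200):
    ids = list(cells)
    population = [random.sample(ids, len(ids)) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda o: chain_cost(o, cells, tsv_budget))
        elite = population[: pop // 2]
        children = [order_crossover(*random.sample(elite, 2)) for _ in elite]
        for c in children:  # occasional swap mutation
            if random.random() < 0.2:
                a, b = random.sample(range(len(c)), 2)
                c[a], c[b] = c[b], c[a]
        population = elite + children
    return min(population, key=lambda o: chain_cost(o, cells, tsv_budget))
```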

38 citations


Journal ArticleDOI
TL;DR: A high-performance reconfigurable architecture, called NATURE, that utilizes CMOS logic and nano RAMs is proposed; it supports fine-grain runtime reconfiguration and temporal logic folding of a circuit before it is mapped to the architecture.
Abstract: Rapid progress on nanodevices points to a promising direction for future circuit design. However, since nanofabrication techniques are not yet mature, implementation of nanocircuits, at least on a large scale, is infeasible in the near future. To ease fabrication and overcome the problem of high defect levels in nanotechnology, hybrid nano/CMOS reconfigurable architectures are attractive choices. Moreover, if the current photolithography fabrication process can be used to manufacture the hybrid chips, the benefits of nanotechnologies can be realized today.

Traditional reconfigurable architectures can only support partial or coarse-grain runtime reconfiguration due to their limited on-chip storage and long off-chip reconfiguration latency. Recent progress on nano Random Access Memories (RAMs), such as carbon nanotube-based RAM (NRAM), Phase-Change Memory (PCM), magnetoresistive RAM (MRAM), etc., provides us with a chance to realize on-chip fine-grain runtime reconfiguration. These nano RAMs have good compatibility with the current fabrication process. By utilizing them in the hybrid design, we can take advantage of both CMOS and nanotechnology, and greatly improve the logic density, resource utilization, and performance of a design.

In this article, we propose a high-performance reconfigurable architecture, called NATURE, that utilizes CMOS logic and nano RAMs. An automatic design flow for NATURE is presented in Part II of the article. In NATURE, the highly dense nonvolatile nano RAMs are distributed throughout the chip to allow large embedded on-chip configuration storage, which enables fast reading and hence supports fine-grain runtime reconfiguration and temporal logic folding of a circuit before it is mapped to the architecture. Temporal logic folding can significantly increase the logic density of NATURE (by over an order of magnitude for large circuits) while remaining competitive in performance and power consumption. For ease of exposition, we use NRAMs to illustrate the various concepts in this article, owing to their excellent properties; other nano RAMs can be used instead. Experimental results based on NRAMs establish the efficacy of NATURE.
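
A toy illustration of temporal logic folding, with invented structure: several logic stages time-share one physical LUT array whose configuration is reloaded each cycle from on-chip (nano RAM) configuration storage, instead of occupying several stages' worth of silicon.

```python
# "Configuration copies" held in on-chip nano RAM: one per folding cycle.
# Each configuration here is a single 2-input LUT truth table (a dict).
CONFIGS = [
    {(a, b): a & b for a in (0, 1) for b in (0, 1)},   # cycle 0: AND
    {(a, b): a ^ b for a in (0, 1) for b in (0, 1)},   # cycle 1: XOR
    {(a, b): a | b for a in (0, 1) for b in (0, 1)},   # cycle 2: OR
]

def folded_evaluate(x, y):
    """Evaluate ((x AND y) XOR y) OR x on ONE physical LUT, reconfigured
    every cycle, instead of three LUTs laid out in space."""
    value = x
    second_operand = [y, y, x]            # the LUT's other input per cycle
    for cycle, lut in enumerate(CONFIGS):  # reload configuration each cycle
        value = lut[(value, second_operand[cycle])]
    return value

assert folded_evaluate(1, 1) == ((1 & 1) ^ 1) | 1
```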

25 citations


Journal ArticleDOI
TL;DR: A significant generalization of TCMS (Threshold Control through Multiple Supply Voltages) to the design of any logic circuit is proposed; the scheme diverges significantly from the conventional multiple-supply-voltage schemes considered in the past and obviates the need for voltage level-converters.
Abstract: According to Moore's law, the number of transistors in a chip doubles every 18 months. The increased transistor count leads to increased power density. Thus, in modern circuits, power efficiency is a central determinant of circuit efficiency. With scaling, leakage power accounts for an increasingly large portion of the total power consumption in deep submicron technologies (>40%).

FinFET technology has been proposed as a promising alternative to deep submicron bulk CMOS technology because of its better scalability, short-channel characteristics, and ability to suppress leakage current and mitigate device-to-device variability when compared to bulk CMOS. The subthreshold slope of a FinFET is approximately 60mV/decade, which is close to ideal.

In this article, we propose a methodology for low-power FinFET-based circuit synthesis. A mechanism called TCMS (Threshold Control through Multiple Supply Voltages) was previously proposed for improving the power efficiency of FinFET-based global interconnects. We propose a significant generalization of TCMS to the design of any logic circuit. This scheme represents a significant divergence from the conventional multiple-supply-voltage schemes considered in the past. It also obviates the need for voltage level-converters. We employ accurate delay and power estimates using table look-up methods based on HSPICE simulations for supply voltage and threshold voltage optimization. Experimental results demonstrate that TCMS can provide power savings of 67.6% and device area savings of 65.2% under relaxed delay constraints. Two other variants of TCMS are also proposed that yield similar benefits. We compare our scheme to extended cluster voltage scaling (ECVS), a popular dual-Vdd scheme presented in the literature. ECVS makes use of voltage level-converters. Even when these level-converters are assumed to have zero delay, which significantly favors ECVS in time-constrained power optimization, TCMS still outperforms ECVS.
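
A sketch of the table-look-up estimation step, assuming gates are pre-characterized in HSPICE on a (Vdd, Vth) grid and queried with bilinear interpolation; the grid values below are invented.

```python
import bisect

# Hypothetical HSPICE-characterized gate delay (ps) on a (Vdd, Vth) grid.
VDD_PTS = [0.8, 0.9, 1.0]            # supply voltage points (V)
VTH_PTS = [0.20, 0.25, 0.30]         # threshold voltage points (V)
DELAY = [[18.0, 22.0, 28.0],         # Vdd = 0.8 V row
         [14.0, 17.0, 21.0],         # Vdd = 0.9 V row
         [11.0, 13.0, 16.0]]         # Vdd = 1.0 V row

def _bracket(pts, v):
    """Grid interval (i, i+1) containing v, plus the fractional position."""
    i = max(0, min(bisect.bisect_right(pts, v) - 1, len(pts) - 2))
    t = (v - pts[i]) / (pts[i + 1] - pts[i])
    return i, min(max(t, 0.0), 1.0)

def lookup_delay(vdd, vth):
    """Bilinear interpolation over the characterization table."""
    i, s = _bracket(VDD_PTS, vdd)
    j, t = _bracket(VTH_PTS, vth)
    lo = DELAY[i][j] * (1 - t) + DELAY[i][j + 1] * t
    hi = DELAY[i + 1][j] * (1 - t) + DELAY[i + 1][j + 1] * t
    return lo * (1 - s) + hi * s

# e.g. a point between grid samples interpolates its four corners:
print(lookup_delay(0.85, 0.22))
```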

24 citations


Journal ArticleDOI
TL;DR: Efficient parallel testing and diagnosis algorithms are presented that can detect and locate single as well as multiple faults in a microfluidic array without flooding the array, a problem that has hampered realistic implementation of several existing strategies.
Abstract: Microfluidics-based biochips consist of microfluidic arrays on rigid substrates through which the movement of fluids is tightly controlled to facilitate biological reactions. Biochips are soon expected to revolutionize biosensing, clinical diagnostics, environmental monitoring, and drug discovery. Critical to the deployment of biochips in such diverse areas is the dependability of these systems. Thus, robust testing and diagnosis techniques are required to ensure an adequate level of system dependability. Due to the underlying mixed technology and mixed energy domains, such biochips exhibit unique failure mechanisms and defects. In this article, efficient parallel testing and diagnosis algorithms are presented that can detect and locate single as well as multiple faults in a microfluidic array without flooding the array, a problem that has hampered realistic implementation of several existing strategies. The fault diagnosis algorithms are well suited to built-in self-test, which could drastically reduce the operating cost of microfluidic biochips. The proposed algorithms can be used both for testing and fault diagnosis during field operation and for increasing yield during the manufacturing phase of the biochip. Furthermore, these algorithms can be applied to both online and offline testing and diagnosis. Analytical results suggest that these strategies can be used to design highly dependable biochip systems.
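
A simplified sketch of parallel droplet-based diagnosis, under assumed semantics: one test droplet traverses each row and each column (disjoint paths within a pass, so no flooding), a droplet that fails to reach the sink flags its path, and intersecting failed rows with failed columns localizes candidate faults. This is the generic row/column idea, not necessarily the paper's exact algorithm.

```python
def diagnose(grid_faults, rows, cols):
    """grid_faults: set of (r, c) defective cells. A test droplet routed
    along a row or column is 'lost' if its path crosses a defective cell."""
    bad_rows = {r for r in range(rows)
                if any((r, c) in grid_faults for c in range(cols))}
    bad_cols = {c for c in range(cols)
                if any((r, c) in grid_faults for r in range(rows))}
    # Candidate fault sites: intersections of failed rows and columns.
    return {(r, c) for r in bad_rows for c in bad_cols}

# A single fault is located exactly; multiple faults may produce spurious
# intersections that extra passes with shifted paths would prune.
print(diagnose({(1, 2)}, rows=4, cols=4))   # -> {(1, 2)}
```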

21 citations


Journal ArticleDOI
TL;DR: A nanodevice paradigm, graphene nanoelectronics, is introduced, and it is predicted that graphene-based devices may replace carbon nanotube devices and become major building blocks for future nanoscale computing.
Abstract: The continued miniaturization of silicon-based electronic circuits is fast approaching its physical limitations. It is unlikely that advances in miniaturization, following the so-called Moore's Law, can continue in the foreseeable future. Nanoelectronics has to go beyond silicon technology. New device paradigms based on nanoscale materials, such as molecular electronic devices, spin devices, and carbon-based devices, will emerge. In this article, we introduce a nanodevice paradigm: graphene nanoelectronics. Owing to graphene's unique quantum effects and electronic properties, researchers predict that graphene-based devices may replace carbon nanotube devices and become major building blocks for future nanoscale computing. To demonstrate these unique electronic properties, we present some of our recent designs, namely a graphene-based switch, a negative differential resistance (NDR) device, and a random access memory (RAM) array. Since these basic devices are the building blocks for large-scale circuits, our findings can help researchers construct useful computing systems and study graphene-based circuit performance in the future.

18 citations


Journal ArticleDOI
TL;DR: This article analyzes a novel, QCA-based, Programmable Logic Array (PLA) structure, develops an implementation-independent fault model, and introduces techniques for mapping Boolean logic functions to a defective PLA.
Abstract: Defect tolerance will be critical in any system with nanoscale feature sizes. This article examines some fundamental aspects of defect tolerance for a reconfigurable system based on Quantum-dot Cellular Automata (QCA). We analyze a novel, QCA-based, Programmable Logic Array (PLA) structure, develop an implementation-independent fault model, and discuss how expected defects and faults might affect yield. Within this context, we introduce techniques for mapping Boolean logic functions to a defective QCA-based PLA. Simulation results show that our new mapping techniques can achieve higher yields than existing techniques.
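
One plausible way to realize such defect-aware mapping, sketched here as an assumption rather than the article's method, is bipartite matching: each product term must land on a PLA row whose defective crosspoints avoid the columns the term needs.

```python
def compatible(term, row_defects):
    """A term (set of required column indices) fits a row if none of the
    row's defective crosspoints is a column the term needs."""
    return not (term & row_defects)

def map_terms(terms, rows_defects):
    """Kuhn-style augmenting-path bipartite matching of terms to rows."""
    match = {}  # row index -> term index

    def try_assign(t, seen):
        for r, defects in enumerate(rows_defects):
            if r in seen or not compatible(terms[t], defects):
                continue
            seen.add(r)
            # Take a free row, or evict its occupant to another row.
            if r not in match or try_assign(match[r], seen):
                match[r] = t
                return True
        return False

    for t in range(len(terms)):
        if not try_assign(t, set()):
            return None  # this defect map cannot realize the function
    return {t: r for r, t in match.items()}

# Terms as sets of needed columns; per-row defective columns likewise.
print(map_terms([{0, 2}, {1}], [{1}, {0}, set()]))  # -> {0: 0, 1: 1}
```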

16 citations


Journal ArticleDOI
TL;DR: Experimental results demonstrate that NanoMap can judiciously trade off area and delay targeting different optimization goals, and effectively exploit the advantages of NATURE.
Abstract: In Part I of this work, a hybrid nano/CMOS reconfigurable architecture, called NATURE, was described. It is composed of CMOS reconfigurable logic and interconnect fabric, and nonvolatile nano on-chip memory. Through its support for cycle-by-cycle runtime reconfiguration and a highly efficient computation model, temporal logic folding, NATURE improves logic density and area-delay product by more than an order of magnitude compared to existing CMOS-based field-programmable gate arrays (FPGAs). NATURE can be fabricated using mainstream photolithography fabrication techniques. Thus, it offers a currently commercially feasible architecture with high performance, superior logic density, and excellent runtime design flexibility.

In Part II of this work, we present an integrated design and optimization flow for NATURE, called NanoMap. Given an input design specified in register-transfer level (RTL) and/or gate-level VHDL, NanoMap optimizes and implements the design on NATURE through logic mapping, temporal clustering, temporal placement, and routing. As opposed to other design tools for traditional FPGAs, NanoMap supports and leverages temporal logic folding by integrating novel mapping techniques. It can automatically explore and identify the best temporal logic folding configuration, targeting area, delay, or area-delay product optimization. A force-directed scheduling technique is used to optimize and balance resource usage across different folding cycles. By supporting logic folding, NanoMap can provide significant design flexibility in performing area-delay trade-offs under various user-specified constraints. We present details of the mapping procedure and results for different architectural instances. Experimental results demonstrate that NanoMap can judiciously trade off area and delay targeting different optimization goals, and effectively exploit the advantages of NATURE.

Part I of this work will appear in JETC Vol. 5, No. 4.
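
A compact sketch of force-directed scheduling in its usual textbook form (mobility windows, a distribution graph, and a "force" that penalizes crowding a busy cycle); all details are assumptions, not NanoMap's implementation.

```python
def force_directed_schedule(ops):
    """ops: dict op -> (asap, alap) mobility window (inclusive cycles).
    Fixes one op at a time into the cycle of minimum 'force', i.e. the
    smallest deviation from the expected usage distribution."""
    horizon = max(alap for _, alap in ops.values()) + 1
    fixed = {}

    def distribution():
        dg = [0.0] * horizon
        for op, (a, b) in ops.items():
            if op in fixed:
                dg[fixed[op]] += 1.0
            else:
                for c in range(a, b + 1):
                    dg[c] += 1.0 / (b - a + 1)  # spread by mobility
        return dg

    while len(fixed) < len(ops):
        dg = distribution()
        best = None
        for op, (a, b) in ops.items():
            if op in fixed:
                continue
            for c in range(a, b + 1):
                force = dg[c] - sum(dg[a:b + 1]) / (b - a + 1)
                if best is None or force < best[0]:
                    best = (force, op, c)
        _, op, c = best
        fixed[op] = c
    return fixed

# Balances three ops with windows (0,1), (0,1), (1,1) across two cycles.
print(force_directed_schedule({"a": (0, 1), "b": (0, 1), "c": (1, 1)}))
```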

15 citations


Journal ArticleDOI
TL;DR: A quantum-dot cellular automata (QCA) design of an n×m-bit, shift-register-based memory architecture is presented that provides size and latency improvements over other known one-bit memory cells through its novel clocking scheme.
Abstract: A quantum-dot cellular automata (QCA) design of an n×m-bit, shift-register-based memory architecture is presented. The architecture maintains data in a stable conformation, contrary to the traditional data-in-motion concept for QCA architectures. The memory architecture is based on an existing dual-phase-synchronized, line-based, one-bit QCA memory cell building block that provides size and latency improvements over other known one-bit memory cells through its novel clocking scheme. Read/write latencies up to ~2X lower than those of the existing tile-based architecture with three-phase, line-based memory cells are obtained. Simulations with QCADesigner and HDLQ are performed on a sample 4×8-bit memory architecture implementation.

10 citations


Journal ArticleDOI
TL;DR: An application-independent defect-tolerant scheme for reconfigurable crossbar array nanoarchitectures is presented; the main feature of this approach is that the existence and location of defective resources within the nano-fabric are hidden from the entire design flow, resulting in minimum post-fabrication customization per chip and minimum changes to the design and synthesis flow.
Abstract: It is anticipated that the number of defects in nanoscale devices fabricated using a bottom-up self-assembly process is significantly higher than that for CMOS devices fabricated by conventional top-down lithography patterning. This is mainly because of the inherent lack of control in self-assembly fabrication, as well as the atomic scale of the devices. The goal of defect tolerance, as an integral part of nano computing, is to obtain error-free computation from such fabrics containing defective elements.

In this article, an application-independent defect-tolerant scheme for reconfigurable crossbar array nanoarchitectures is presented. The main feature of this approach is that the existence and location of defective resources within the nano-fabric are hidden from the entire design flow, resulting in minimum post-fabrication customization per chip and minimum changes to the entire design and synthesis flow. It is also shown how to drastically reduce the area overhead associated with this flow. The proposed technique requires extraction of regular yet incomplete defect-free subsets, in contrast to the previously proposed complete defect-free subsets. This can greatly reduce the area overhead required for defect tolerance without sacrificing logic mapping or signal routing capabilities. Extensive simulation results confirm a considerable reduction in the area overhead without any negative impact on the usability of the modified defect-free subsets.
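
A sketch of the extraction step under a simple assumed model: the crossbar is a 0/1 defect matrix, and the worst row or column is greedily discarded until the kept subset contains at most a threshold number of defective crosspoints. A threshold above zero yields the "regular yet incomplete" subsets the article advocates; the actual algorithm and threshold policy are not reproduced here.

```python
def extract_subset(defects, max_bad=0):
    """defects: list of lists, 1 marks a defective crosspoint. Greedily
    drop the worst row/column until at most `max_bad` defective
    crosspoints remain among the kept rows x kept columns."""
    rows = set(range(len(defects)))
    cols = set(range(len(defects[0])))

    def bad_in(r=None, c=None):
        if r is not None:
            return sum(defects[r][j] for j in cols)
        return sum(defects[i][c] for i in rows)

    while rows and cols and \
            sum(defects[i][j] for i in rows for j in cols) > max_bad:
        worst_row = max(rows, key=lambda r: bad_in(r=r))
        worst_col = max(cols, key=lambda c: bad_in(c=c))
        if bad_in(r=worst_row) >= bad_in(c=worst_col):
            rows.remove(worst_row)
        else:
            cols.remove(worst_col)
    return sorted(rows), sorted(cols)

defects = [[0, 1, 0],
           [0, 0, 0],
           [1, 0, 0]]
print(extract_subset(defects))             # complete defect-free subset
print(extract_subset(defects, max_bad=1))  # incomplete subset: larger
```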

9 citations


Journal ArticleDOI
TL;DR: A multipurpose simulator for ballistic nanostructures, based on classical mechanics of electrons at the Fermi level, has been successfully implemented and results provide design guidelines for devices which operate on ballistic transport principles.
Abstract: A multipurpose simulator for ballistic nanostructures, based on classical mechanics of electrons at the Fermi level, has been successfully implemented. Despite the simplicity of the model, the simulator successfully reproduces a number of experimental results, and is shown to consistently match observed current-voltage characteristics and magnetoresistance phenomena. The simulator results provide design guidelines for devices which operate on ballistic transport principles. Using the simulator, preliminary logic structures have been designed based on the ballistic deflection transistor.
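
A toy version of the simulator's core model, with invented geometry and constants: electrons fly in straight lines at the Fermi velocity and reflect specularly off the walls of a rectangular channel; real device shapes, magnetic deflection, and collection statistics are omitted.

```python
import math, random

def trace_electron(width, height, steps=1500, dt=1e-15, v_f=1.6e6):
    """Ballistic transport: straight flight at the Fermi velocity v_f
    (m/s) with specular reflection at the walls of a width x height (m)
    rectangular channel. Returns the exit side, or None if still inside."""
    x, y = 0.0, height / 2          # inject at the left edge, mid-channel
    theta = random.uniform(-math.pi / 3, math.pi / 3)  # injection cone
    vx, vy = v_f * math.cos(theta), v_f * math.sin(theta)
    for _ in range(steps):
        x, y = x + vx * dt, y + vy * dt
        if y < 0 or y > height:      # specular bounce off top/bottom wall
            vy = -vy
            y = min(max(y, 0.0), height)
        if x >= width:
            return "right"           # collected at the right contact
        if x < 0:
            return "left"
    return None

hits = sum(trace_electron(1e-6, 2e-7) == "right" for _ in range(1000))
print("transmission ~", hits / 1000)
```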

9 citations


Journal ArticleDOI
TL;DR: A hybrid nanowire-CMOS architecture is proposed and evaluated that addresses all three problems—namely high defect rates, unlocated defects, and transient faults—at the same time by using multiple levels of redundancy and majority voters.
Abstract: As the end of the semiconductor roadmap for CMOS approaches, architectures based on nanoscale molecular devices are attracting attention. Among several alternatives, silicon nanowires and carbon nanotubes are the two most promising nanotechnologies according to the ITRS. These technologies may enable scaling deep into the nanometer regime. However, they suffer from very defect-prone manufacturing processes. Although the reconfigurability property of the nanoscale devices can be used to tolerate high defect rates, it may not be possible to locate all defects. With very high device densities, testing each component may not be possible because of time or technology restrictions. This points to a scenario in which even though the devices are tested, the tests are not very comprehensive at locating defects, and hence the shipped chips are still defective. Moreover, the devices in the nanometer range will be susceptible to transient faults which can produce arbitrary soft errors. Despite these drawbacks, it is possible to make nanoscale architectures practical and realistic by introducing defect and fault tolerance. In this article, we propose and evaluate a hybrid nanowire-CMOS architecture that addresses all three problems—namely high defect rates, unlocated defects, and transient faults—at the same time. This goal is achieved by using multiple levels of redundancy and majority voters. A key aspect of the architecture is that it contains a judicious balance of both nanoscale and traditional CMOS components. A companion to the architecture is a compiler with heuristics to quickly determine if logic can be mapped onto partially defective nanoscale elements. The heuristics make it possible to introduce defect-awareness in placement and routing. The architecture and compiler are evaluated by applying the complete design flow to several benchmarks.
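
A small numeric sketch of why layered majority voting helps, assuming independent module failures and (unrealistically) perfect voters; p is the probability that a single module computes correctly.

```python
from math import comb

def majority_ok(p, n):
    """Probability an n-module majority voter outputs correctly when each
    module is independently correct with probability p (perfect voter)."""
    k = n // 2 + 1
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def layered(p, n, levels):
    """Cascade: each level's voted output feeds the next level's modules."""
    for _ in range(levels):
        p = majority_ok(p, n)
    return p

# With p = 0.9 per nanoscale module, two levels of 3-way voting:
print(majority_ok(0.9, 3))   # 0.972
print(layered(0.9, 3, 2))    # ~0.9977
```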

Journal ArticleDOI
TL;DR: A thorough design space exploration is conducted to optimize NATURE, a high-performance hybrid nano/CMOS reconfigurable architecture with superior logic density and outstanding design flexibility, which is very attractive for deployment in cost-conscious embedded systems.
Abstract: In recent years, research on nanotechnology has advanced rapidly. Novel nanodevices have been developed, such as those based on carbon nanotubes, nanowires, etc. Using these emerging nanodevices, diverse nanoarchitectures have been proposed. Among them, hybrid nano/CMOS reconfigurable architectures have attracted attention because of their advantages in performance, integration density, and fault tolerance. Recently, a high-performance hybrid nano/CMOS reconfigurable architecture, called NATURE, was presented. NATURE comprises CMOS reconfigurable logic and interconnect fabric, and CMOS-fabrication-compatible nanomemory. High-density, fast nano RAMs are distributed in NATURE as on-chip storage to store multiple reconfiguration copies for each reconfigurable element. It enables cycle-by-cycle runtime reconfiguration and a highly efficient computational model, called temporal logic folding. Through logic folding, NATURE provides more than an order of magnitude improvement in logic density and area-delay product, and significant design flexibility in performing area-delay trade-offs, at the same technology node. Moreover, NATURE can be fabricated using mainstream photolithography fabrication techniques. Hence, it offers a currently commercially viable reconfigurable architecture with high performance, superior logic density, and outstanding design flexibility, which is very attractive for deployment in cost-conscious embedded systems.

In order to fully explore the potential of NATURE and further improve its performance, in this article, a thorough design space exploration is conducted to optimize its architecture. Investigations in terms of different logic element architectures, interconnect designs, and various technologies for nano RAMs are presented. Nano RAMs can not only be used as storage for configuration bits, but the high density of nano RAMs also makes them excellent candidates for large-capacity on-chip data storage in NATURE. Many logic- and memory-intensive applications, such as video and image processing, require large storage of temporal results. To enhance the capability of NATURE for implementing such applications, we investigate the design of nano data memory structures in NATURE and explore the impact of memory density. Experimental results demonstrate significant throughput improvements due to area saving from logic folding and parallel data processing.

Journal ArticleDOI
TL;DR: By using dynamic redundancy allocation, the massive parallelism is exploited to jointly achieve fault (defect/error) tolerance and high performance, and the effectiveness of the proposed nanoarchitecture technique is demonstrated.
Abstract: Nanoelectronic devices are considered to be the computational fabrics for the emerging nanocomputing systems due to their ultra-high speed and integration density. However, the imperfect bottom-up self-assembly fabrication leads to excessive defects that have become a barrier for achieving reliable computing. In addition, transient errors continue to be a problem. The massive parallelism rendered by nanoscale integration opens up new opportunities but also poses challenges on how to manage such massive resources for reliable and high-performance computing. In this paper, we propose a nanoarchitecture solution to address these emerging challenges. By using dynamic redundancy allocation, the massive parallelism is exploited to jointly achieve fault (defect/error) tolerance and high performance. Simulation results demonstrate the effectiveness of the proposed technique under a range of fault rates and operating conditions.
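
A sketch of the dynamic-redundancy idea under an invented policy: given an observed error rate, the controller picks the smallest odd replication factor whose majority vote meets a target error bound, then splits the parallel fabric into that many-way redundant groups, trading throughput for reliability.

```python
from math import comb

def allocate(units, error_rate, target=1e-6):
    """Split `units` parallel processing elements into groups of r
    redundant copies each, choosing the smallest odd r whose majority
    vote pushes the group error probability below `target`."""
    def group_err(p, r):
        k = r // 2 + 1
        return 1 - sum(comb(r, i) * (1 - p)**i * p**(r - i)
                       for i in range(k, r + 1))
    r = 1
    while group_err(error_rate, r) > target and r < units:
        r += 2            # keep replication odd for clean voting
    return units // r, r  # (independent groups, copies per group)

# Low fault rate -> many fast groups; high fault rate -> fewer, safer ones.
print(allocate(64, 1e-4))   # (21, 3)
print(allocate(64, 1e-2))   # fewer groups, more copies per group
```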

Journal ArticleDOI
TL;DR: A new nanosystem architecture is proposed that employs nanowire crossbars for Digital Signal Processing (DSP) applications; the architecture features good scalability and viability for various DSP applications.
Abstract: Emerging technologies such as silicon NanoWires (NW) and Carbon NanoTubes (CNT) have shown great potential for building the next generation of computing systems at the nano scale. However, the excessive number of defects originating from bottom-up fabrication (such as a self-assembly process) poses a pressing challenge for achieving scalable system integration. This article proposes a new nanosystem architecture that employs nanowire crossbars for Digital Signal Processing (DSP) applications. Distributed arithmetic is utilized such that complex signal processing computation can be mapped into regular memory operations, thus making this architecture well suited for implementation by nanowire crossbars. Furthermore, the inherent features of DSP-type computation provide new insights to remedy errors (as the logic/computational manifestation of defects). A new defect/error-tolerant technique that exploits algorithmic error compensation is proposed; at the system level, different trade-offs between output correctness and performance are established while retaining low overhead in its implementation. As an instance of its application, the proposed approach has been applied to a generic DSP nanosystem performing frequency-selective filtering. Simulation results show that the proposed nanoDSP introduces only a minor performance degradation under high defect rates and at a range of operational conditions. The proposed technique also features good scalability and viability for various DSP applications.
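
Distributed arithmetic itself is standard and can be sketched concretely: an N-tap inner product is computed bit-serially against a 2^N-entry lookup table of partial coefficient sums, one table read per input bit-slice, which is exactly the regular memory operation a crossbar implements well. Sizes and values below are illustrative.

```python
def da_fir(coeffs, xs, bits=8):
    """Inner product sum(c*x) via distributed arithmetic: precompute a
    LUT of all 2^N partial sums of coefficients, then accumulate one
    input bit-slice per cycle with shift-add (unsigned inputs here)."""
    n = len(coeffs)
    lut = [sum(c for bit, c in enumerate(coeffs) if addr >> bit & 1)
           for addr in range(1 << n)]      # the crossbar-resident table
    acc = 0
    for b in range(bits):                  # one memory lookup per bit
        addr = sum(((x >> b) & 1) << k for k, x in enumerate(xs))
        acc += lut[addr] << b              # shift-add accumulation
    return acc

coeffs, xs = [3, -1, 4, 2], [10, 7, 255, 0]
assert da_fir(coeffs, xs) == sum(c * x for c, x in zip(coeffs, xs))
```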

Journal ArticleDOI
TL;DR: This article develops the methodology and an automated synthesis flow to support two different asynchronous design approaches (Micropipelines and Four-phase Dual-rail) for system designs using nano-crossbar logic stages and CMOS interface data-storage elements.
Abstract: Among the emerging alternatives to CMOS, molecular-electronics-based diode-resistor crossbar fabrics have generated considerable interest in recent times. Logic circuit design with future nanoscale molecular devices using dense and regular crossbar fabrics is promising in terms of integration density, performance, and power dissipation. However, circuit design using molecular switches involves some major challenges: 1) the lack of voltage gain in these switches, which prevents logic cascading; 2) large output voltage level degradation; 3) vulnerability to parameter variations, which affects yield and robustness of operation; and 4) a high defect rate. In this article, we analyze some of these challenges and investigate the effectiveness of an asynchronous design methodology in a hybrid system design platform using molecular crossbars and CMOS interfacing elements. We explore different approaches to asynchronous circuit design and compare their suitability in terms of several circuit design parameters. We then develop the methodology and an automated synthesis flow to support two different asynchronous design approaches (Micropipelines and Four-phase Dual-rail) for system designs using nano-crossbar logic stages and CMOS interface data-storage elements. Circuit-level simulation results for several benchmarks show considerable advantages in performance and robustness at moderate area and power overhead, compared to two different synchronous implementations.
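
A behavioral sketch of four-phase dual-rail signaling, the second style the flow supports, in its textbook form: each bit travels on two rails, (1,0) and (0,1) encode data, (0,0) is the spacer, and completion detection fires when every bit pair is valid. Anything beyond the encoding itself is an assumption.

```python
# Four-phase dual-rail: each bit b is carried on (true-rail, false-rail).
SPACER = (0, 0)

def encode(bit):
    return (1, 0) if bit else (0, 1)

def complete(word):
    """Completion detection: every bit pair holds valid (non-spacer) data."""
    return all(pair in ((1, 0), (0, 1)) for pair in word)

def send(bits):
    """Four phases per token:
    1) sender asserts data; 2) receiver raises ack on completion;
    3) sender returns the rails to spacer; 4) receiver drops ack."""
    word = [encode(b) for b in bits]    # phase 1: assert data
    assert complete(word)               # phase 2: receiver acknowledges
    word = [SPACER] * len(bits)         # phase 3: return to spacer
    assert not complete(word)           # phase 4: ack released
    return True

send([1, 0, 1, 1])
```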

Journal ArticleDOI
TL;DR: This article proposes to exploit programmable threshold voltage quantum dot (QD) transistors to reduce leakage, thereby improving the energy efficiency of mobile computing, and demonstrates significant leakage reduction over conventional techniques.
Abstract: Power consumption poses one of the fundamental barriers to deploying mobile computing devices in energy-constrained situations with varying operating conditions. In particular, leakage power is projected to increase exponentially in future semiconductor process nodes. This challenging problem calls for a renewed focus on power-performance optimization at all levels of design abstraction, from novel device structures to fundamental shifts in design paradigm. In this article, we propose to exploit programmable threshold voltage quantum dot (QD) transistors to reduce leakage, thereby improving the energy efficiency of mobile computing. The unique programmability and reconfigurability enabled by QD transistors extend our capability in design optimization for new power-performance trade-offs. Simulation results demonstrate significant leakage reduction over conventional techniques.
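
The leverage comes from the exponential dependence of subthreshold leakage on threshold voltage; a back-of-the-envelope model in the standard textbook form (constants invented) shows why programming Vth upward on idle logic pays off.

```python
import math

def leak_current(vth, i0=1e-6, n=1.5, v_t=0.0259):
    """Subthreshold leakage I = I0 * exp(-Vth / (n * vT)) at room
    temperature. I0 and n are illustrative process constants; vT = kT/q."""
    return i0 * math.exp(-vth / (n * v_t))

# Programming Vth up from 0.20 V to 0.35 V on an idle block:
print(leak_current(0.20) / leak_current(0.35))   # ~47x less leakage
```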

Journal ArticleDOI
Sandeep K. Shukla1
TL;DR: Engineers and scientists are now engaged in finding alternatives to silicon-based computing such that nanoscale computation can be realized with molecular dynamics, quantum effects, and other nontraditional material and computation paradigms.
Abstract: As silicon technology reaches the lower end of the nanometer range of feature sizes (45nm CMOS technology is already in use), the continuation of Moore's law-based scaling of silicon technology faces several challenges. The reduced feature size implies a larger number of transistors per unit area of silicon, which provides scope for newer features in our computation capabilities, coupled with the problems of increased defect rates and susceptibility to transient faults. Since defect rates can go up to 10% or more, the traditional discarding of silicon chips based on defects would reduce yields to such low levels that alternative measures of yield enhancement are imperative. One possible way is to enhance the computing logic and micro-architectures with defect- and fault-tolerance features that would make computation robust against such high levels of defects and faults, hence increasing yields.

On the other hand, engineers and scientists are now engaged in finding alternatives to silicon-based computing, such that nanoscale computation can be realized with molecular dynamics, quantum effects, and other nontraditional materials and computation paradigms. Molecular transistors, DNA-scaffolding-based computation fabrics, carbon nanotube-based field-effect transistors, carbon nanotube-based PLA-type fabrics, and many other technological advances are emerging in various academic and industrial labs. However, many of these technologies cannot depend on traditional lithographic techniques for manufacturing because of their small dimensions. As a result, self-assembly techniques, and various ways of fortifying them so that nature engineers such systems rather than us having to devise engineering methods to circumvent nature, are being worked out. Self-assembly, when not tempered with such techniques, would also lead to yield problems; hence these techniques and innovations are necessary for nanoscale computing with such technologies to be usable.

While both low-dimension nanometer-scale silicon technology and nontraditional technologies grapple with these issues of defects and faults, reliability engineering is playing an important part, from the lowest level of manufacturing techniques to the architecture, micro-architecture, and logic-level stages of the electronic design flow. While these developments have renewed interest in the early work on fault tolerance and defect tolerance, going as far back as Von Neumann's original work in the early 1950s, some of the techniques imply so much area and latency overhead that one has…

Journal ArticleDOI
TL;DR: This article investigates, via analytic modeling, how a magnetic QCA wire should be organized to provide the highest reliability and develops a guideline for selecting the most reliable wire organization during the circuit design process.
Abstract: This article investigates, via analytic modeling, how a magnetic QCA wire should be organized to provide the highest reliability. We compare a nonredundant wire and two redundant wire organizations. For all three organizations, a fault rate per unit length is used for comparison; additionally, since extra components are necessary to implement the redundant organizations, these components are faulty as well. We show that the difference between these two fault rates is the main driver for selecting a wire organization. Lastly, we develop a guideline for selecting the most reliable wire organization during the circuit design process.
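
The comparison can be reproduced in miniature with an independent-fault model, with segment length and fault rates invented: a nonredundant wire works only if every cell works, while a triplicated, periodically voted wire survives minority faults at the cost of fallible voters.

```python
import math

def nonredundant(n, p_cell):
    """Wire of n cells works iff every cell works (independent faults)."""
    return (1 - p_cell) ** n

def triplicated(n, p_cell, p_voter, seg=8):
    """Three parallel wire copies, majority-voted every `seg` cells: a
    segment works if at least 2 of 3 copies work AND its voter works."""
    p_copy = (1 - p_cell) ** seg
    p_seg = (3 * p_copy**2 * (1 - p_copy) + p_copy**3) * (1 - p_voter)
    return p_seg ** math.ceil(n / seg)

# The redundant organization wins only while the voter fault rate stays
# low enough relative to the per-cell rate -- the trade-off the article
# models when deriving its wire-organization guideline.
n, p_cell = 64, 0.01
print(nonredundant(n, p_cell))               # ~0.53
print(triplicated(n, p_cell, p_voter=0.005)) # ~0.84
```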