Showing papers on "Very-large-scale integration" published in 2015


Proceedings ArticleDOI
01 Dec 2015
TL;DR: A full custom hardware implementation of a deep neural network, built using multiple neuromorphic VLSI devices that integrate analog neuron and synapse circuits together with digital asynchronous logic circuits is presented.
Abstract: We present a full custom hardware implementation of a deep neural network, built using multiple neuromorphic VLSI devices that integrate analog neuron and synapse circuits together with digital asynchronous logic circuits. The deep network comprises an event-based convolutional stage for feature extraction connected to a spike-based learning stage for feature classification. We describe the properties of the chips used to implement the network and present preliminary experimental results that validate the approach proposed.

140 citations


Proceedings ArticleDOI
18 Oct 2015
TL;DR: This paper proposes a novel design methodology for logic circuits targeting memristor crossbars that supports the execution of Boolean logic functions within a constant number of steps, independent of their functionality.
Abstract: As CMOS technology gradually scales down to inherent physical device limits, significant challenges emerge related to scalability, leakage, reliability, etc. Alternative technologies are under research for next-generation VLSI circuits. The memristor is one of the promising candidates due to its scalability, practically zero leakage, non-volatility, etc. This paper proposes a novel design methodology for logic circuits targeting memristor crossbars. This methodology allows the optimization of the design of logic functions and their automatic mapping onto the memristor crossbar. More importantly, this methodology supports the execution of Boolean logic functions within a constant number of steps, independent of their functionality. To illustrate the potential of the proposed methodology, multi-bit adders and multipliers are explored; their incurred delay, area and energy costs are analyzed. The comparison of our approach with state-of-the-art Boolean logic circuits for memristor crossbar architectures shows significant improvement in both delay (4 to 500×) and energy consumption (1.22 to 3.71×). The area overhead may decrease (down to 44%) or increase (up to 17%) depending on the circuit's functionality and logic optimization level.

92 citations


Journal ArticleDOI
Genggeng Liu1, Wenzhong Guo1, Yuzhen Niu1, Guolong Chen1, Xing Huang1 
01 May 2015
TL;DR: Experimental results indicate that the proposed MOPSO is worthy of being studied in the field of multi-objective optimization problems, and the proposed algorithm has a better tradeoff between the wire length and radius of the routing tree and has achieved a better delay value.
Abstract: Constructing a timing-driven Steiner tree is very important in the VLSI performance-driven routing stage. Meanwhile, non-Manhattan architecture is supported by several manufacturing technologies and is now well appreciated in the chip manufacturing circle. However, limited progress has been reported on the non-Manhattan performance-driven routing problem. In this paper, an efficient algorithm, namely, TOST_BR_MOPSO, is presented to construct the minimum-cost spanning tree with a minimum radius for performance-driven routing in Octilinear architecture (one type of non-Manhattan architecture) based on multi-objective particle swarm optimization (MOPSO) and the Elmore delay model. Edge transformation is employed in our algorithm to give the particles the ability to reach the optimal solution, while Union-Find partition is used to prevent the generation of invalid solutions. To reduce the number of bends, which is one of the key factors of chip manufacturability, we also present an edge-vertex encoding strategy combined with edge transformation. To the best of our knowledge, no approach has been proposed to optimize the number of bends in the process of constructing the non-Manhattan timing-driven Steiner tree. Moreover, Markov chain theory is used to prove the global convergence of our proposed algorithm. Experimental results indicate that the proposed MOPSO is worthy of being studied in the field of multi-objective optimization problems, and our algorithm has a better tradeoff between the wire length and radius of the routing tree and has achieved a better delay value. Meanwhile, by combining edge transformation with the encoding strategy, the proposed algorithm can reduce the number of bends by nearly 20%.

82 citations
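
The abstract above relies on the Elmore delay model without spelling it out. As a rough illustration only (this is not the TOST_BR_MOPSO algorithm, and the function name, the node numbering, and the lumped RC values are assumptions made for this sketch), the Elmore delay of every node in an RC routing tree can be computed in two passes:

```python
def elmore_delays(parent, resistance, capacitance):
    """Return the Elmore delay from the root (node 0) to every node of an RC tree.

    parent[i]      -- index of the parent of node i (parent[0] is None for the root)
    resistance[i]  -- resistance of the wire segment from parent[i] to node i
    capacitance[i] -- capacitance lumped at node i
    Assumption of this sketch: nodes are numbered so that parent[i] < i.
    """
    n = len(parent)
    # Pass 1 (leaves to root): capacitance of the subtree hanging below each node.
    downstream = list(capacitance)
    for i in range(n - 1, 0, -1):
        downstream[parent[i]] += downstream[i]
    # Pass 2 (root to leaves): each segment adds R * (capacitance it has to charge).
    delay = [0.0] * n
    for i in range(1, n):
        delay[i] = delay[parent[i]] + resistance[i] * downstream[i]
    return delay


# Hypothetical two-segment net: driver (node 0) -> node 1 -> node 2.
delays = elmore_delays([None, 0, 1], [0.0, 1.0, 2.0], [0.0, 1.0, 3.0])
# delays == [0.0, 4.0, 10.0]
```

A router that trades wire length against radius, as described above, would evaluate a delay estimate of this kind for each candidate tree.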


Journal ArticleDOI
Xing Huang1, Genggeng Liu1, Wenzhong Guo1, Yuzhen Niu1, Guolong Chen1 
TL;DR: This is the first time to specially solve the single-layer obstacle-avoiding problem in X-architecture for a given set of pins and obstacles and achieves the best solution quality in a reasonable runtime among the existing algorithms.
Abstract: Obstacle-avoiding Steiner minimal tree (OASMT) construction has become a focus problem in the physical design of modern very large-scale integration (VLSI) chips. In this article, an effective algorithm is presented to construct an OASMT based on X-architecture for a given set of pins and obstacles. First, a special particle swarm optimization (PSO) algorithm is proposed that incorporates the classic genetic algorithm (GA), which greatly improves its search capability. Second, a pretreatment strategy is put forward to deal with obstacles and pins, which provides fast information lookup for the whole algorithm by generating a precomputed lookup table. Third, we present an efficient adjustment method, which enables particles to avoid all the obstacles by introducing some corner points of obstacles. Finally, a refinement method is discussed to further enhance the quality of the final routing tree, which can improve the quality of the solution by 7.93% on average. To the best of our knowledge, this is the first work to specifically solve the single-layer obstacle-avoiding problem in X-architecture. Experimental results show that the proposed algorithm can further shorten wirelength in the presence of obstacles, and that it achieves the best solution quality in a reasonable runtime among the existing algorithms.

64 citations


Journal ArticleDOI
TL;DR: The Quantum dot Cellular Automata (QCA) can be such an architecture at nano-scale and thus emerges as a viable alternative for the current CMOS VLSI and its effectiveness is further established through synthesis of configurable logic block (CLB) for field programmable gate arrays (FPGAs).

63 citations


Proceedings ArticleDOI
24 May 2015
TL;DR: This work proposes an FPGA design for soft-output data detection in orthogonal frequency-division multiplexing (OFDM)-based large-scale (multi-user) MIMO systems that uses a modified version of the conjugate gradient least square (CGLS) algorithm.
Abstract: We propose an FPGA design for soft-output data detection in orthogonal frequency-division multiplexing (OFDM)-based large-scale (multi-user) MIMO systems. To reduce the high computational complexity of data detection, our design uses a modified version of the conjugate gradient least square (CGLS) algorithm. In contrast to existing linear detection algorithms for massive MIMO systems, our method avoids two of the most complex tasks, namely Gram-matrix computation and matrix inversion, while still being able to compute soft-outputs. Our architecture uses an array of reconfigurable processing elements to compute the CGLS algorithm in a hardware-efficient manner. Implementation results on Xilinx Virtex-7 FPGA for a 128 antenna, 8 user large-scale MIMO system show that our design only uses 70% of the area-delay product of the competitive method, while exhibiting superior error-rate performance.

56 citations
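
To make the claim about avoiding the Gram matrix and matrix inversion concrete, here is a minimal sketch of the textbook CGLS iteration. It is not the modified variant or the FPGA architecture from the paper; it assumes real-valued data (a MIMO detector would work with complex channel matrices), and the function name and tolerance are illustrative choices.

```python
def cgls(A, b, iterations, tol=1e-12):
    """Approximate argmin_x ||A x - b||_2 with the standard CGLS iteration.

    A is a list of m rows of length n, b has length m. Only products with A
    and A^T are used: A^T A is never formed and no matrix is inverted.
    """
    m, n = len(A), len(A[0])

    def matvec(v):   # A v
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]

    def rmatvec(u):  # A^T u
        return [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)        # residual b - A x (x starts at zero)
    s = rmatvec(r)     # A^T r, the negative gradient of the least-squares cost
    p = list(s)        # search direction
    gamma = dot(s, s)
    for _ in range(iterations):
        if gamma <= tol:   # gradient has vanished: least-squares solution reached
            break
        q = matvec(p)
        alpha = gamma / dot(q, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(r)
        gamma_new = dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x


# Hypothetical 3x2 system whose least-squares solution is x = [1, 2].
x_hat = cgls([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]], [1.0, 4.0, 5.0], iterations=5)
```

Each iteration costs one product with A and one with its transpose, which is the property that makes this family of methods attractive when forming and inverting the Gram matrix is too expensive.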


Journal ArticleDOI
TL;DR: The quantum implementation of primitive reversible gate has been presented and the proposed gates have been designed and simulated using QCADesigner.
Abstract: Quantum Dot Cellular Automata (QCA) is a rising innovation which seems to be a good competitor for the next generation of digital systems and is widely utilized as a part of advanced frameworks. It is an appealing substitute for ordinary CMOS technology because of its diminutive size, faster speed, extreme scalability, ultralow power consumption and better switching frequency. The realization of quantum computation is not possible without reversible logic. Reversible logic has enlarged operations in quantum computation. Generally, reversible computing is executed when a system is composed of reversible gates. It has numerous fields of use, such as applied science, quantum dot cellular automata, low power VLSI circuits, low power CMOS, digital signal processing, and computer graphics. In this paper, the quantum implementation of primitive reversible gates is presented. The proposed gates have been designed and simulated using QCADesigner. General Terms: Quantum Cellular Automata and Reversible Logic Gates

54 citations


Journal ArticleDOI
TL;DR: The methods discussed in the paper can be used in the design of emerging low-power digital systems having lowest complexity at the cost of a loss in accuracy: the optimal trade-off of computational accuracy for lowest possible complexity and power.
Abstract: The DCT and the DWT are used in a number of emerging DSP applications, such as HD video compression, biomedical imaging, and smart antenna beamformers for wireless communications and radar. Of late, there has been much interest in fast algorithms for the computation of the above transforms using multiplier-free approximations because they result in low power and low complexity systems. Approximate methods rely on the trade-off of accuracy for lower power and/or circuit complexity/chip-area. This paper provides a detailed review of VLSI architectures and CAS implementations for both DCT/DWTs, which can be designed either for higher accuracy or for low power consumption. This article covers recent theoretical advancements on discrete transforms in addition to an overview of existing VLSI architectures. The paper also discusses error-free VLSI architectures that provide high accuracy systems and approximate architectures that offer high computational gain, making them highly attractive for real-world applications that are subject to constraints in both chip-area and power. The methods discussed in the paper can be used in the design of emerging low-power digital systems having lowest complexity at the cost of a loss in accuracy: the optimal trade-off of computational accuracy for lowest possible complexity and power. A complete synopsis of available techniques, algorithms and FPGA/VLSI realizations is discussed in the paper.

54 citations


BookDOI
01 Jan 2015
B.K. Kaushik, M.K. Majumder. Carbon Nanotube Based VLSI Interconnects.

49 citations


Journal ArticleDOI
Genggeng Liu1, Wenzhong Guo1, Rongrong Li1, Yuzhen Niu1, Guolong Chen1 
TL;DR: To the best knowledge, XGRouter is the first work to use a concurrent algorithm to solve the global routing problem in X-architecture and can produce solutions of higher quality than other global routers.
Abstract: This paper presents a high-quality very large scale integration (VLSI) global router in X-architecture, called XGRouter, that relies heavily on integer linear programming (ILP) techniques, a partition strategy and particle swarm optimization (PSO). A new ILP formulation is proposed, which can achieve a more uniform routing solution than other formulations and can be effectively solved by the proposed PSO. To effectively use the new ILP formulation, a partition strategy that decomposes a large-sized problem into some small-sized sub-problems is adopted and the routing region is extended progressively from the most congested region. In the post-processing stage of XGRouter, maze routing based on a new routing edge cost is designed to further optimize the total wire length and maintain the congestion uniformity. To the best of our knowledge, XGRouter is the first work to use a concurrent algorithm to solve the global routing problem in X-architecture. Experimental results show that XGRouter can produce solutions of higher quality than other global routers. And, like several state-of-the-art global routers, XGRouter has no overflow.

48 citations


Proceedings ArticleDOI
12 Mar 2015
TL;DR: This paper will discuss key techniques and recent results of machine learning and pattern matching, with their applications in physical design.
Abstract: Machine learning (ML) and pattern matching (PM) are powerful computer science techniques which can derive knowledge from big data, and provide prediction and matching. Since nanometer VLSI design and manufacturing have extremely high complexity and gigantic data, there has been a surge recently in applying and adapting machine learning and pattern matching techniques in VLSI physical design (including physical verification), e.g., lithography hotspot detection and data/pattern-driven physical design, as ML and PM can raise the level of abstraction from detailed physics-based simulations and provide reasonably good quality-of-result. In this paper, we will discuss key techniques and recent results of machine learning and pattern matching, with their applications in physical design.

Journal ArticleDOI
TL;DR: A novel compact multilayer CNN model based on nanometer scale resonant tunneling diodes (RTDs) and memristors is presented and offers advantages of powerful processing capability as well as high compactness, versatility, and possibility of very large scale integration (VLSI) circuit implementations.

Journal ArticleDOI
TL;DR: This paper presents some highly scalable reversible logic gates for the QCA technology; compared with CMOS technology, the proposed layout offers a size reduction of up to 233 times.
Abstract: Conventional lithography-based VLSI design technology has been deployed to optimize low-power computing and higher-scale integration of semiconductor components. However, this downscaling trend confronts Complementary Metal–Oxide–Semiconductor (CMOS) technology with serious challenges of increased tunneling and leakage current in the nanoscale regime. To overcome the physical restrictions of CMOS, Quantum-dot Cellular Automata (QCA) technology has been dedicated to the nanoscale and embraces a new information transformation technique. However, QCA is limited to the design of sequential and combinational circuits only. This paper presents some highly scalable reversible logic gates for the QCA technology. In addition, compared with CMOS technology, the proposed layout offers a size reduction of up to 233 times.

Journal ArticleDOI
TL;DR: This brief proposes a novel design scheme for approximate adders and comparators to significantly reduce energy consumption while maintaining a very low error rate and critical path delay.
Abstract: This brief proposes a novel design scheme for approximate adders and comparators to significantly reduce energy consumption while maintaining a very low error rate. The considerably improved error rate and critical path delay stem from the employed carry prediction technique that leverages the information from less significant input bits in a parallel manner. The proposed designs have been adopted in a VLSI-based neuromorphic character recognition chip with unsupervised learning implemented on chip. The approximation errors of the proposed arithmetic units have been shown to have negligible impact on the training process while achieving good energy efficiency.

Book
14 Aug 2015
TL;DR: Compact Models for Integrated Circuit Design: Conventional Transistors and Beyond as discussed by the authors provides a modern treatise on compact models for circuit computer-aided design (CAD) and provides a balanced presentation of compact modeling crucial for addressing current modeling challenges and understanding new models for emerging devices.
Abstract: Compact Models for Integrated Circuit Design: Conventional Transistors and Beyond provides a modern treatise on compact models for circuit computer-aided design (CAD). Written by an author with more than 25 years of industry experience in semiconductor processes, devices, and circuit CAD, and more than 10 years of academic experience in teaching compact modeling courses, this first-of-its-kind book on compact SPICE models for very-large-scale-integrated (VLSI) chip design offers a balanced presentation of compact modeling crucial for addressing current modeling challenges and understanding new models for emerging devices. Starting from basic semiconductor physics and covering state-of-the-art device regimes from conventional micron to nanometer, this text: presents industry standard models for bipolar-junction transistors (BJTs), metal-oxide-semiconductor (MOS) field-effect-transistors (FETs), FinFETs, and tunnel field-effect transistors (TFETs), along with statistical MOS models; discusses the major issue of process variability, which severely impacts device and circuit performance in advanced technologies and requires statistical compact models; promotes further research of the evolution and development of compact models for VLSI circuit design and analysis; supplies fundamental and practical knowledge necessary for efficient integrated circuit (IC) design using nanoscale devices; and includes exercise problems at the end of each chapter and extensive references at the end of the book. Compact Models for Integrated Circuit Design: Conventional Transistors and Beyond is intended for senior undergraduate and graduate courses in electrical and electronics engineering as well as for researchers and practitioners working in the area of electron devices. However, even those unfamiliar with semiconductor physics gain a solid grasp of compact modeling concepts from this book.

Journal ArticleDOI
TL;DR: This paper presents a fault-tolerant irregular topology-generation method for application-specific NoC designs and demonstrates that the method is able to determine fault-tolerant topologies with negligible area increase and better energy values.
Abstract: As the technology sizes of integrated circuits (ICs) scale down rapidly, current transistor densities on chips dramatically increase. While nanometer feature sizes allow denser chip designs in each technology generation, fabricated ICs become more susceptible to wear-outs, causing operation failure. Even a single link failure within an on-chip fabric can halt communication between application blocks, which makes the entire chip useless. In this paper, we aim to make faulty chips designed with network-on-chip (NoC) communication usable. Specifically, we present a fault-tolerant irregular topology-generation method for application-specific NoC designs. The designed NoC topology allows a different routing path if there is a link failure on the default routing path. Additionally, we present a simulated annealing-based application mapping algorithm aiming to minimize total energy consumption of the NoC design. We compare fault-tolerant topologies with nonfault-tolerant application-specific irregular topologies on energy consumption, performance, and area using multimedia benchmarks and custom-generated graphs. Our results demonstrate that our method is able to determine fault-tolerant topologies with negligible area increase and better energy values.

Proceedings ArticleDOI
Karim Arabi1, Kambiz Samadi1, Yang Du1
29 Mar 2015
TL;DR: 3D VLSI (3DV) is an emerging 3D integration technology that unlike packaging-driven 3D technologies can deliver orders of magnitude more integration densities due to extremely small sizes of vertical vias.
Abstract: As the semiconductor industry faces serious challenges extending the CMOS roadmap, traditional cost reduction benefits that accompanied power/performance/area (PPA) advantages of successive technology nodes have decreased due to a myriad of process integration challenges and increased variability, reliability, power and thermal constraints. 3D integration technologies have been pursued as a potential solution to help integrate more functions within the confined available dimensions of advanced mobile devices. 3D VLSI (3DV) is an emerging 3D integration technology that, unlike packaging-driven 3D technologies (e.g., 2.5D, TSV-based 3D, etc.), can deliver orders of magnitude more integration density due to the extremely small sizes of vertical vias. In this paper, we describe the 3DV technology and its current benefits and challenges. We also survey recent literature that shows the potential of 3DV to help continue the Moore's law trajectory beyond 2D.

Proceedings ArticleDOI
26 Mar 2015
TL;DR: A modified full adder using multiplexer is proposed to achieve low power consumption of multiplier and shows an average reduction of 37.45% in power consumption, 45.75% in area, and 17.65% in delay compared to the existing approaches.
Abstract: Achieving high speed integrated circuits with low power consumption is a major concern for VLSI circuit designers. Most arithmetic operations are done using a multiplier, which is the major power consuming element in digital circuits. Basically, the process of multiplication is realized in hardware in terms of shift and add operations. The optimization of the adder has led to improvement in the performance of the multiplier. In this paper, a modified full adder using multiplexers is proposed to achieve low power consumption of the multiplier. To analyze the efficiency of the proposed design, the conventional Wallace tree multiplier structure is used. The designs are developed using Verilog HDL and the functionalities are verified through simulation using Quartus II. The designs are synthesized in Synopsys Design Compiler using SAED90nm CMOS technology. The ASIC synthesis results of the proposed multiplier show an average reduction of 37.45% in power consumption, 45.75% in area, and 17.65% in delay compared to the existing approaches.
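
The abstract does not give the modified adder's circuit, so the following is only a generic bit-level model of how a full adder can be assembled from 2-to-1 multiplexers; the function names and the particular mux arrangement are assumptions for illustration, not the paper's design.

```python
def mux2(sel, d0, d1):
    """2-to-1 multiplexer on single bits: returns d1 when sel is 1, else d0."""
    return d1 if sel else d0


def full_adder(a, b, cin):
    """One-bit full adder expressed only with 2-to-1 multiplexers."""
    not_b = mux2(b, 1, 0)          # inverter realised as a mux
    p = mux2(a, b, not_b)          # propagate signal: a XOR b
    not_cin = mux2(cin, 1, 0)
    s = mux2(p, cin, not_cin)      # sum: p XOR cin
    cout = mux2(p, a, cin)         # carry: a when a == b, otherwise cin
    return s, cout


# Exhaustive check against integer addition over all eight input combinations.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert s + 2 * cout == a + b + cin
```

In a Wallace tree multiplier, cells of this kind compress the partial-product columns, which is why the cost of the adder cell dominates the multiplier's power and area.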

Book
01 Jan 2015

Journal ArticleDOI
TL;DR: The proposed threshold logic outperforms previous memristive-CMOS logic cells on every aspect, however, they indicate a lower chip area, lower total harmonic distortion, and controllable leakage power, but a higher power dissipation with respect to CMOS logic.
Abstract: Brain-inspired circuits can provide an alternative solution to implement computing architectures taking advantage of fault tolerance and generalization ability of logic gates. In this brief, we advance over the memristive threshold circuit configuration consisting of memristive averaging circuit in combination with operational amplifier and/or CMOS inverters in application to realizing complex computing circuits. The developed memristive threshold logic gates are used for designing fast Fourier transform and multiplication circuits useful for modern microprocessors. Overall, the proposed threshold logic outperforms previous memristive-CMOS logic cells on every aspect, however, they indicate a lower chip area, lower total harmonic distortion, and controllable leakage power, but a higher power dissipation with respect to CMOS logic.

Journal ArticleDOI
TL;DR: This paper cooptimizes algorithm, architecture, circuit, and device for real-time energy-efficient on-chip hardware acceleration of sparse coding and shows that a 65 nm implementation of the CMOS ASIC and PARCA schemes accelerates sparse coding computation by 394× and 2140×, respectively, compared to software running on an eight-core CPU.
Abstract: Many recent advances in sparse coding led to its wide adoption in signal processing, pattern classification, and object recognition applications. Even with improved performance in state-of-the-art algorithms and the hardware platform of CPUs/GPUs, solving a sparse coding problem still requires expensive computations, making real-time large-scale learning a very challenging problem. In this paper, we cooptimize algorithm, architecture, circuit, and device for real-time energy-efficient on-chip hardware acceleration of sparse coding. The principle of hardware acceleration is to recognize the properties of learning algorithms, which involve many parallel operations of data fetch and matrix/vector multiplication/addition. Today's von Neumann architecture, however, is not suitable for such parallelization, due to the separation of memory and the computing unit that makes sequential operations inevitable. This principle drives both the selection of algorithms and the design evolution from CPU to CMOS application-specific integrated circuits (ASIC) to the parallel architecture with resistive crosspoint array (PARCA) that we propose. The CMOS ASIC scheme implements sparse coding with SRAM dictionaries and all-digital circuits, and PARCA employs resistive-RAM dictionaries with special read and write circuits. We show that a 65 nm implementation of the CMOS ASIC and PARCA schemes accelerates sparse coding computation by $394\times$ and $2140\times$, respectively, compared to software running on an eight-core CPU. Simulated power for both hardware schemes lies in the milliwatt range, making them viable for portable single-chip learning applications.

Journal ArticleDOI
TL;DR: The proposed chip comprises a speaker feature extraction (SFE) module, an SVM module, and a decision module that performs autocorrelation analysis, linear predictive coefficient (LPC) extraction, and LPC-to-cepstrum conversion.
Abstract: This brief presents the chip implementation of a support vector machine (SVM)-based speaker verification system. The proposed chip comprises a speaker feature extraction (SFE) module, an SVM module, and a decision module. The SFE module performs autocorrelation analysis, linear predictive coefficient (LPC) extraction, and LPC-to-cepstrum conversion. The SVM module includes a Gaussian kernel unit and a scaling unit. The purpose of the Gaussian kernel unit is first to evaluate the kernel value of a test vector and a support vector. Four Gaussian kernel processing elements (GK-PEs) are designed to process four support vectors simultaneously. Each GK-PE is designed in a pipelined fashion and is capable of performing 2-norm and exponential operations. An enhanced CORDIC architecture is proposed to calculate the exponential value. In addition to the Gaussian kernel unit, a scaling unit is also developed for use in the SVM module. The scaling unit is used to perform scaling multiplications and the remaining operations of SVM decision value evaluation. Finally, the decision module accumulates the frame scores that are generated by all of the test frames, and then compares the result with a threshold to see if the test utterance is spoken by the claimed speaker. This designed chip is characterized by its high speed and its ability to handle a large number of support vectors in the SVM. The prototype chip is a semicustom chip that is fabricated using Taiwan Semiconductor Manufacturing Company 0.90-nm CMOS technology on a die with a size of $\sim 7.9 \times 7.9$ mm$^{2}$.
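
As a software-level sketch of what the Gaussian kernel unit and the scaling unit compute together, the standard Gaussian-kernel SVM decision value can be written as below. The function names, the `gamma` parameter, and the folding of each support vector's coefficient and label into a single weight are assumptions of this sketch; the chip's fixed-point arithmetic and CORDIC-based exponential are not modelled.

```python
import math


def gaussian_kernel(x, sv, gamma):
    """exp(-gamma * ||x - sv||^2): squared 2-norm followed by an exponential."""
    squared_distance = sum((xi - si) ** 2 for xi, si in zip(x, sv))
    return math.exp(-gamma * squared_distance)


def svm_decision_value(x, support_vectors, weights, bias, gamma):
    """Weighted sum of kernel values plus bias.

    weights[i] is assumed to hold (multiplier * label) for support vector i,
    so the loop below corresponds to the scaling multiplications.
    """
    total = bias
    for sv, w in zip(support_vectors, weights):
        total += w * gaussian_kernel(x, sv, gamma)
    return total


# Hypothetical two-support-vector model evaluated on one feature frame.
score = svm_decision_value([0.0, 0.0], [[0.0, 0.0], [1.0, 0.0]], [1.0, -1.0], 0.0, 1.0)
```

A decision stage like the one described above would accumulate such per-frame scores over an utterance and compare the total with a threshold.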

Journal ArticleDOI
TL;DR: This paper presents reconfigurable reversible computing-based cryptography, and a generic reconfiguration-based VLSI design-for-security methodology, and prevents software- or hardware-based code injection attacks based on a SPARC V8 LEON2 processor.
Abstract: Reconfigurable computing is a critical technology for achieving nanoelectronic systems of yield and reliability. In this paper, we show that reconfigurable computing is also a critical technology for achieving hardware security in the presence of supply chain adversaries. Specifically, reconfigurable implementation of a given logic function achieves design obfuscation, while reconfiguration for different logic functions further achieves moving target defense. We further present reconfigurable reversible computing-based cryptography, and a generic reconfiguration-based VLSI design-for-security methodology. In our case studies based on a SPARC V8 LEON2 processor, we prevent software- or hardware-based code injection attacks at a cost of 0.72% area increase, negligible power consumption increase and no performance degradation; we further prevent a hardware Trojan from gaining unauthorized memory access at a cost of 4.42% area increase, negligible power consumption increase, and 11.30% critical path delay increase.

BookDOI
01 Jan 2015
TL;DR: This book discusses Hardware/Software Partitioning for Embedded Systems, Design Space Exploration for Scheduling and Allocation in High Level Synthesis of Datapaths, and Design Flow from Algorithm to RTL using Evolutionary Exploration Approach.
Abstract: Introduction to Multi-Objective Evolutionary Algorithms.- Hardware/Software Partitioning for Embedded Systems.- Circuit Partitioning for VLSI Layout.- Design of Operational Amplifier.- Design Space Exploration for Scheduling and Allocation in High Level Synthesis of Datapaths.- Design Space Exploration of Datapath (Architecture) in High Level Synthesis for Computation Intensive Applications.- Design Flow from Algorithm to RTL using Evolutionary Exploration Approach.- Crosstalk Delay Fault Test Generation.- Scheduling in Heterogeneous Distributed Systems.

Journal ArticleDOI
TL;DR: In this paper, a functional test approach specified for complex multifunctional VLSI devices is presented and the basic radiation test procedure is discussed in application to some typical examples; the main difficulty is to organize an informative and quick functional test directly under irradiation.
Abstract: Total ionizing dose (TID) effects and radiation tests of complex multifunctional very-large-scale integration (VLSI) integrated circuits (ICs) raise some particularities as compared to conventional "simple" ICs. The main difficulty is to organize an informative and quick functional test directly under irradiation. A functional test approach specified for complex multifunctional VLSI devices is presented, and the basic radiation test procedure is discussed in application to some typical examples.

Journal ArticleDOI
TL;DR: How the programmable neuromorphic system proposed can be configured to implement specific spike-based synaptic plasticity rules is demonstrated and how it can be utilised in a cognitive task is depicted.
Abstract: Hardware implementations of spiking neural networks offer promising solutions for computational tasks that require compact and low-power computing technologies. As these solutions depend on both the specific network architecture and the type of learning algorithm used, it is important to develop spiking neural network devices that offer the possibility to reconfigure their network topology and to implement different types of learning mechanisms. Here we present a neuromorphic multi-neuron VLSI device with on-chip programmable event-based hybrid analog/digital circuits; the event-based nature of the input/output signals allows the use of address-event representation infrastructures for configuring arbitrary network architectures, while the programmable synaptic efficacy circuits allow the implementation of different types of spike-based learning mechanisms. The main contributions of this article are to demonstrate how the programmable neuromorphic system proposed can be configured to implement specific spike-based synaptic plasticity rules and to depict how it can be utilised in a cognitive task. Specifically, we explore the implementation of different spike-timing plasticity learning rules online in a hybrid system comprising a workstation and the neuromorphic VLSI device interfaced to it, and we demonstrate how, after training, the VLSI device can perform binary classification of correlated patterns as a standalone component (i.e., without requiring a computer).

Journal ArticleDOI
TL;DR: A novel hybrid memristor-CMOS XOR/XNOR logic circuit that offers several advantages such as combinational circuit behavior, simpler operation and lower hardware overhead than existing solutions is introduced.

Journal ArticleDOI
TL;DR: This brief proposes an unbalanced pull-up/down network, together with an inverse narrow-width technique, to improve the operating speed of the individual logic cell to save power and die area in the process of device sizing and topology optimization.
Abstract: Ultralow-energy biomedical applications have urged the development of a subthreshold VLSI logic family in standard CMOS. This brief proposes an unbalanced pull-up/down network, together with an inverse narrow-width technique, to improve the operating speed of the individual logic cell. Effective logical efforts save both power and die area in the process of device sizing and topology optimization. Three experimental 14-tap 8-bit finite impulse response filters optimized for ultralow-voltage operation were fabricated in 0.18-$\mu$m CMOS. Measurements show that the optimized 0.45 and 0.6 V libraries achieve minimum energy operations at 100 kHz, with a figure-of-merit of 0.365 (at 0.31 V) and 0.4632 (at 0.39 V), respectively. They correspond to 35.96% and 18.74% improvements, and the overall performances are well comparable with the state of the art.

Journal ArticleDOI
TL;DR: Experimental results reveal that the proposed AES architectures offer superior performance than the existing VLSI architectures in terms of power, throughput and critical path delay.

Journal ArticleDOI
TL;DR: This paper presents a symbolic framework to model soft errors in both synchronous and asynchronous designs, and is the first time that a decision diagram based soft error identification approach is proposed for asynchronous circuits.