Showing papers on "Very-large-scale integration published in 2014"


Journal ArticleDOI
TL;DR: This work proposes a new approximate matrix inversion algorithm relying on a Neumann series expansion, which substantially reduces the complexity of linear data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems.
Abstract: Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to be one of the key technologies in next-generation multi-user cellular systems based on the upcoming 3GPP LTE Release 12 standard, for example. In this work, we propose, to the best of our knowledge, the first VLSI design enabling high-throughput data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate matrix inversion algorithm relying on a Neumann series expansion, which substantially reduces the complexity of linear data detection. We analyze the associated error, and we compare its performance and complexity to those of an exact linear detector. We present corresponding VLSI architectures, which perform exact and approximate soft-output detection for large-scale MIMO systems with various antenna/user configurations. Reference implementation results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to achieve more than 600 Mb/s for a 128 antenna, 8 user 3GPP LTE-based large-scale MIMO system. We finally provide a performance/complexity trade-off comparison using the presented FPGA designs, which reveals that the detector circuit of choice is determined by the ratio between BS antennas and users, as well as the desired error-rate performance.
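
The Neumann-series idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the matrix A is split into its diagonal part D and off-diagonal part E, so that inv(A) is approximated by the first few terms of the sum over n of (-inv(D) E)^n inv(D). The split, the function names, and the toy 3 × 3 matrix are illustrative assumptions, not the paper's detector or its VLSI architecture.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def neumann_inverse(A, terms):
    """Approximate inv(A) with `terms` terms of the Neumann series.

    A is split as D + E (diagonal plus off-diagonal part). The series
    converges when powers of inv(D) E shrink, e.g. for a strongly
    diagonally dominant A.
    """
    n = len(A)
    d_inv = [1.0 / A[i][i] for i in range(n)]
    D_inv = [[d_inv[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    # M = -inv(D) E, the matrix whose powers form the series
    M = [[0.0 if i == j else -d_inv[i] * A[i][j] for j in range(n)]
         for i in range(n)]
    term = D_inv                          # n = 0 term: inv(D)
    approx = [row[:] for row in D_inv]
    for _ in range(terms - 1):
        term = matmul(M, term)            # next term: M^n inv(D)
        approx = [[approx[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return approx

def max_abs_error(A, A_inv):
    """Largest absolute entry of A @ A_inv - I."""
    n = len(A)
    prod = matmul(A, A_inv)
    return max(abs(prod[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

# Toy diagonally dominant matrix (hypothetical data)
A = [[10.0, 1.0, 0.5],
     [1.0, 8.0, 1.0],
     [0.5, 1.0, 9.0]]
```

Comparing max_abs_error(A, neumann_inverse(A, k)) for increasing k shows the trade-off the abstract refers to: fewer terms cost fewer matrix products but leave a larger residual.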

363 citations


Book
12 Mar 2014

131 citations


Journal ArticleDOI
TL;DR: The current status of High K dielectrics in Very Large Scale Integrated circuit (VLSI) manufacturing for leading edge Dynamic Random Access Memory (DRAM) and Complementary Metal Oxide Semiconductor (CMOS) applications is summarized along with the deposition methods and general equipment types employed.
Abstract: The current status of High K dielectrics in Very Large Scale Integrated circuit (VLSI) manufacturing for leading edge Dynamic Random Access Memory (DRAM) and Complementary Metal Oxide Semiconductor (CMOS) applications is summarized along with the deposition methods and general equipment types employed. Emerging applications for High K dielectrics in future CMOS are described as well for implementations in 10 nm and beyond nodes. Additional emerging applications for High K dielectrics include Resistive RAM memories, Metal-Insulator-Metal (MIM) diodes, Ferroelectric logic and memory devices, and as mask layers for patterning. Atomic Layer Deposition (ALD) is a common and proven deposition method for all of the applications discussed for use in future VLSI manufacturing.

121 citations


Journal ArticleDOI
TL;DR: In this article, the authors provide a vision for codesigning 3D IC architecture and integrated cooling systems and provide a new level of codesign approach with circuit, software and thermal designers working together.
Abstract: In an effort to increase processor speeds, 3D IC architecture is being aggressively pursued by researchers and chip manufacturers. This architecture allows extremely high level of integration with enhanced electrical performance and expanded functionality, and facilitates realization of VLSI and ULSI technologies. However, utilizing the third dimension to provide additional device layers poses thermal challenges due to the increased heat dissipation and complex electrical interconnects among different layers. The conflicting needs of the cooling system requiring larger flow passage dimensions to limit the pressure drop, and the IC architecture necessitating short interconnect distances to reduce signal latency warrant paradigm shifts in both of their design approach. Additional considerations include the effects due to temperature nonuniformity, localized hot spots, complex fluidic connections, and mechanical design. This paper reviews the advances in 3D IC cooling in the last decade and provides a vision for codesigning 3D IC architecture and integrated cooling systems. For heat fluxes of 50-100 W/cm² on each side of a chip in a 3D IC package, the current single-phase cooling technology is projected to provide adequate cooling, albeit with high pressure drops. For future applications with coolant surface heat fluxes from 100 to 500 W/cm², significant changes need to be made in both electrical and cooling technologies through a new level of codesign. Effectively mitigating the high temperatures surrounding local hot spots remains a challenging issue. The codesign approach with circuit, software and thermal designers working together is seen as essential. The through silicon vias (TSVs) in the current designs place a stringent limit on the channel height in the cooling layer. It is projected that integration of wireless network on chip architecture could alleviate these height restrictions since the data bandwidth is independent of the communication lengths. Microchannels that are 200 μm or larger in depth are expected to allow dissipation of large heat fluxes with significantly lower pressure drops. [DOI: 10.1115/1.4027175]

120 citations


Proceedings ArticleDOI
24 Mar 2014
TL;DR: This work proposes to achieve ASIC design obfuscation based on embedded reconfigurable logic which is determined by the end user and unknown to any party in the supply chain to severely limit a supply chain adversary's ability to subvert a VLSI system with back doors or logic bombs.
Abstract: Hardware is the foundation and the root of trust of any security system. However, in today's global IC industry, an IP provider, an IC design house, a CAD company, or a foundry may subvert a VLSI system with back doors or logic bombs. Such a supply chain adversary's capability is rooted in his knowledge on the hardware design. Successful hardware design obfuscation would severely limit a supply chain adversary's capability if not preventing all supply chain attacks. However, not all designs are obfuscatable in traditional technologies. We propose to achieve ASIC design obfuscation based on embedded reconfigurable logic which is determined by the end user and unknown to any party in the supply chain. Combined with other security techniques, embedded reconfigurable logic can provide the root of ASIC design obfuscation, data confidentiality and tamper-proofness. As a case study, we evaluate hardware-based code injection attacks and reconfiguration-based instruction set obfuscation based on an open source SPARC processor LEON2. We prevent program monitor Trojan attacks and increase the area of a minimum code injection Trojan with a 1KB ROM by 2.38% for every 1% area increase of the LEON2 processor.

87 citations


Journal ArticleDOI
TL;DR: A hybrid analog/digital very large scale integration (VLSI) implementation of a spiking neural network with programmable synaptic weights and experimental results demonstrating the correct operation of all the circuits present on the chip are presented.
Abstract: We present a hybrid analog/digital very large scale integration (VLSI) implementation of a spiking neural network with programmable synaptic weights. The synaptic weight values are stored in an asynchronous Static Random Access Memory (SRAM) module, which is interfaced to a fast current-mode event-driven DAC for producing synaptic currents with the appropriate amplitude values. These currents are further integrated by current-mode integrator synapses to produce biophysically realistic temporal dynamics. The synapse output currents are then integrated by compact and efficient integrate and fire silicon neuron circuits with spike-frequency adaptation and adjustable refractory period and spike-reset voltage settings. The fabricated chip comprises a total of 32 × 32 SRAM cells, 4 × 32 synapse circuits and 32 × 1 silicon neurons. It acts as a transceiver, receiving asynchronous events in input, performing neural computation with hybrid analog/digital circuits on the input spikes, and eventually producing digital asynchronous events in output. Input, output, and synaptic weight values are transmitted to/from the chip using a common communication protocol based on the Address Event Representation (AER). Using this representation it is possible to interface the device to a workstation or a micro-controller and explore the effect of different types of Spike-Timing Dependent Plasticity (STDP) learning algorithms for updating the synaptic weights values in the SRAM module. We present experimental results demonstrating the correct operation of all the circuits present on the chip.
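
For readers unfamiliar with the neuron model named in this abstract, the following is a minimal software sketch of a leaky integrate-and-fire neuron with a refractory period, a reset value, and spike-frequency adaptation. The discrete-time update rule and every parameter value are assumptions chosen for illustration. The chip itself uses analog current-mode circuits, which this sketch does not model.

```python
def simulate_neuron(input_current, steps, dt=1e-3, tau=20e-3, threshold=1.0,
                    reset=0.0, refractory=2e-3, adapt_step=0.2,
                    tau_adapt=100e-3):
    """Return spike times (seconds) of a toy adaptive integrate-and-fire neuron."""
    v = reset        # membrane state
    adapt = 0.0      # adaptation variable, incremented at every spike
    wait = 0.0       # remaining refractory time
    spikes = []
    for k in range(steps):
        adapt -= dt * adapt / tau_adapt            # adaptation decays over time
        if wait > 0.0:                             # no integration while refractory
            wait -= dt
            continue
        v += dt * (-v + input_current - adapt) / tau   # leaky integration
        if v >= threshold:
            spikes.append(k * dt)
            v = reset                              # spike-reset value
            wait = refractory                      # start refractory period
            adapt += adapt_step                    # spike-frequency adaptation
    return spikes
```

With a constant supra-threshold input, the inter-spike intervals returned by this sketch lengthen over time, which is the qualitative effect of spike-frequency adaptation.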

86 citations


Journal ArticleDOI
15 Jul 2014
TL;DR: This paper sheds light on the vulnerabilities in the very large scale integration (VLSI) design and fabrication flow, and surveys design-for-trust (DfTr) techniques that aim at regaining trust in IC design.
Abstract: Designers use third-party intellectual property (IP) cores and outsource various steps in their integrated circuit (IC) design flow, including fabrication. As a result, security vulnerabilities have been emerging, forcing IC designers and end-users to reevaluate their trust in hardware. If an attacker gets hold of an unprotected design, attacks such as reverse engineering, insertion of malicious circuits, and IP piracy are possible. In this paper, we shed light on the vulnerabilities in very large scale integration (VLSI) design and fabrication flow, and survey design-for-trust (DfTr) techniques that aim at regaining trust in IC design. We elaborate on four DfTr techniques: logic encryption, split manufacturing, IC camouflaging, and Trojan activation. These techniques have been developed by reusing VLSI test principles.
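
Of the four techniques listed, logic encryption is the easiest to show in a few lines. The sketch below assumes the common formulation in which XOR and XNOR key gates are inserted on internal wires, so that the circuit computes its original function only under the correct key. The toy circuit, the gate placement, and the key value are hypothetical and not taken from the paper.

```python
from itertools import product

def original(a, b, c):
    """Toy combinational circuit: (a AND b) OR c."""
    return (a & b) | c

def locked(a, b, c, k0, k1):
    """The same circuit with two key gates inserted.

    k0 drives an XOR key gate (transparent when k0 == 0).
    k1 drives an XNOR key gate (transparent when k1 == 1).
    """
    w = (a & b) ^ k0           # XOR key gate on an internal wire
    return (w | c) ^ k1 ^ 1    # XNOR key gate on the output

CORRECT_KEY = (0, 1)

def matches_original(key):
    """True if the locked circuit equals the original on every input."""
    return all(locked(a, b, c, *key) == original(a, b, c)
               for a, b, c in product((0, 1), repeat=3))
```

In this toy case only CORRECT_KEY passes matches_original; each of the other three keys corrupts at least one output.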

84 citations


Journal ArticleDOI
TL;DR: To the best of the authors' knowledge, this work proves for the first time the effectiveness of LPA attacks in a real scenario where on-chip noise and process variations are taken into account.
Abstract: This paper extends the analysis of the effectiveness of Leakage Power Analysis (LPA) attacks to cryptographic VLSI circuits on which circuit level countermeasures against Differential Power Analysis (DPA) are adopted. Security metrics used for assessing the DPA-resistance of crypto core implementations, such as the minimum number to disclosure (MTD) and the asymptotic correlation coefficient, have been extended to the case of LPA. The LPA-resistance has been evaluated in terms of MTD as a function of the on chip noise. Noise variances up to 10000 times greater than the signal variance have been taken into account and LPA attacks have been successfully executed for all the logic styles under analysis using less than 100000 measurements. Moreover the role of process variations has been investigated through extensive Monte Carlo simulations in order to evaluate their impact on the leakage model for the logic styles under analysis. Results show that LPA attacks can be successfully carried out on the different anti-DPA logic styles even in presence of process variations. To the best of our knowledge, this work proves for the first time the effectiveness of LPA attacks in a real scenario where on chip noise and process variations are taken into account.

72 citations


Journal ArticleDOI
01 Apr 2014-ACS Nano
TL;DR: This work demonstrates the first very large scale integration (VLSI)-compatible approach to realizing CNFET digital circuits at highly scaled technology nodes, with devices ranging from 90 nm to sub-20 nm channel lengths.
Abstract: Carbon nanotube (CNT) field-effect transistors (CNFETs) are a promising emerging technology projected to achieve over an order of magnitude improvement in energy-delay product, a metric of performance and energy efficiency, compared to silicon-based circuits. However, due to substantial imperfections inherent with CNTs, the promise of CNFETs has yet to be fully realized. Techniques to overcome these imperfections have yielded promising results, but thus far only at large technology nodes (1 μm device size). Here we demonstrate the first very large scale integration (VLSI)-compatible approach to realizing CNFET digital circuits at highly scaled technology nodes, with devices ranging from 90 nm to sub-20 nm channel lengths. We demonstrate inverters functioning at 1 MHz and a fully integrated CNFET infrared light sensor and interface circuit at 32 nm channel length. This demonstrates the feasibility of realizing more complex CNFET circuits at highly scaled technology nodes.

71 citations


Journal ArticleDOI
TL;DR: A generalized methodology is described that determines the major set of device parameters sensitive to random and systematic process variability in nanoscale MOSFET devices, maps each variability-sensitive device parameter to the corresponding parameter of the target compact model, and generates statistical compact MOSFET models for variability-aware VLSI circuit design.
Abstract: This paper presents a systematic methodology to develop compact MOSFET models for process variability-aware VLSI circuit design. Process variability in scaled CMOS technologies severely impacts the functionality, yield, and reliability of advanced integrated circuit devices, circuits, and systems. Therefore, variability-aware circuit design techniques are required for realistic assessment of the impact of random and systematic process variability in advanced VLSI circuit performance. However, variability-aware circuit design requires compact MOSFET variability models for computer analysis of the impact of process variability in VLSI circuit design. This paper describes a generalized methodology to determine the major set of device parameters sensitive to random and systematic process variability in nanoscale MOSFET devices, map each variability-sensitive device parameter to the corresponding compact model parameter of the target compact model, and generate statistical compact MOSFET models for variability-aware VLSI circuit design.

71 citations


Journal ArticleDOI
TL;DR: A memory-based in-place architecture is presented for the FFT processor that performs 64000-point finite-field FFT operations using a radix-16 computing unit and 16 dual-port SRAMs, adopting a special prime as the base of the finite field.
Abstract: This paper presents the design of a power- and area-efficient high-speed 768000-bit multiplier, based on fast Fourier transform multiplication for fully homomorphic encryption operations. A memory-based in-place architecture is presented for the FFT processor that performs 64000-point finite-field FFT operations using a radix-16 computing unit and 16 dual-port SRAMs. By adopting a special prime as the base of the finite field, the radix-16 calculations are simplified to requiring only additions and shift operations. A two-stage carry-look-ahead scheme is employed to resolve carries and obtain the multiplication result. The multiplier design is validated by comparing its results with the GNU Multiple Precision (GMP) arithmetic library. The proposed design has been synthesized using 90-nm process technology with an estimated die area of 45.3 mm². At 200 MHz, the large-number multiplier offers roughly twice the performance of a previous implementation on an NVIDIA C2050 graphics processor unit and is 29 times faster than the Xeon X5650 CPU, while at the same time consuming a modest 0.97 W.
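
The remark about a special prime reducing the radix-16 work to additions and shifts can be illustrated at toy scale. The sketch below assumes the small Fermat prime 257 and a length-16 transform, neither of which is taken from the paper. Since 2^8 = -1 modulo 257, the number 2 has order 16, so every twiddle factor is a power of two and each twiddle product is a left shift followed by a modular reduction. The pointwise products and the final carry resolution are done with ordinary integer arithmetic here.

```python
P = 257     # toy prime 2**8 + 1 (assumption, not the paper's prime)
N = 16      # transform length: 2 has multiplicative order 16 modulo 257
BASE = 4    # operands are split into base-4 digits
DIGITS = 8  # digits per operand, so operands must be below 4**8 = 65536

def ntt(values, inverse=False):
    """Length-16 finite-field transform using 2 as the root of unity."""
    out = []
    for k in range(N):
        acc = 0
        for j, x in enumerate(values):
            shift = (j * k) % N
            if inverse:
                shift = (N - shift) % N          # 2**-s equals 2**(16 - s) mod 257
            acc = (acc + (x << shift)) % P       # twiddle product is a shift
        out.append(acc)
    if inverse:
        n_inv = pow(N, P - 2, P)                 # 1/16 modulo 257
        out = [(v * n_inv) % P for v in out]
    return out

def to_digits(n):
    """Little-endian base-4 digits, zero-padded to the transform length."""
    d = []
    for _ in range(DIGITS):
        n, r = divmod(n, BASE)
        d.append(r)
    if n:
        raise ValueError("operand too large for this toy transform")
    return d + [0] * (N - DIGITS)

def multiply(a, b):
    """Multiply two integers below 65536 through the transform domain."""
    fa, fb = ntt(to_digits(a)), ntt(to_digits(b))
    conv = ntt([(x * y) % P for x, y in zip(fa, fb)], inverse=True)
    # Each convolution coefficient is at most 8 * 9 = 72, so it is exact mod 257.
    return sum(c * BASE**i for i, c in enumerate(conv))
```

For example, multiply(12345, 54321) can be checked against the built-in product, in the same spirit as the validation against GMP described in the abstract.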

Journal ArticleDOI
TL;DR: A rate-0.96 (68254, 65536) shortened Euclidean geometry low-density parity-check code and its VLSI implementation for high-throughput NAND Flash memory systems are presented and compared with a BCH (Bose-Chaudhuri-Hocquenghem) decoding circuit showing comparable error-correcting performance and throughput.
Abstract: The reliability of data stored in high-density Flash memory devices tends to decrease rapidly because of the reduced cell size and multilevel cell technology. Soft-decision error correction algorithms that use multiple-precision sensing for reading memory can solve this problem; however, they require very complex hardware for high-throughput decoding. In this paper, we present a rate-0.96 (68254, 65536) shortened Euclidean geometry low-density parity-check code and its VLSI implementation for high-throughput NAND Flash memory systems. The design employs the normalized a posteriori probability (APP)-based algorithm, serial schedule, and conditional update, which lead to simple functional units, halved decoding iterations, and low-power consumption, respectively. A pipelined-parallel architecture is adopted for high-throughput decoding, and memory-reduction techniques are employed to minimize the chip size. The proposed decoder is implemented in 0.13-μm CMOS technology, and the chip size and energy consumption of the decoder are compared with those of a BCH (Bose-Chaudhuri-Hocquenghem) decoding circuit showing comparable error-correcting performance and throughput.

Proceedings ArticleDOI
04 May 2014
TL;DR: This work develops a digital VLSI for phoneme recognition using deep neural networks and assesses the design in terms of throughput, chip size, and power consumption.
Abstract: Deep neural networks show very good performance in phoneme and speech recognition applications when compared to previously used GMM (Gaussian Mixture Model)-based ones. However, efficient implementation of deep neural networks is difficult because the network size needs to be very large when high recognition accuracy is demanded. In this work, we develop a digital VLSI for phoneme recognition using deep neural networks and assess the design in terms of throughput, chip size, and power consumption. The developed VLSI employs a fixed-point optimization method that uses only +Δ, 0, and -Δ to represent each weight. The design employs 1,024 simple processing units in each layer, which can however be scaled easily according to the needed throughput, and the throughput of the architecture varies from 62.5 to 1,000 times the real-time processing speed.
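
A minimal sketch of what a three-level weight representation looks like in software follows. The nearest-level rounding rule and the function names are assumptions for illustration. The abstract does not describe how its fixed-point optimization method selects Δ or the weights, and that method is not reproduced here.

```python
def ternarize(weights, delta):
    """Map each weight to the nearest of -delta, 0, +delta (assumed rule)."""
    out = []
    for w in weights:
        if w > delta / 2:
            out.append(delta)
        elif w < -delta / 2:
            out.append(-delta)
        else:
            out.append(0.0)
    return out

def ternary_dot(inputs, levels, delta):
    """Dot product with ternary weights.

    Each input is added, skipped, or subtracted according to the sign of
    its weight, and the sum is scaled by delta once at the end.
    """
    acc = 0.0
    for x, w in zip(inputs, levels):
        if w > 0:
            acc += x
        elif w < 0:
            acc -= x
    return acc * delta
```

With only three levels, the inner loop of a dot product needs no per-weight multiplier, only an adder and a sign decision.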

Journal ArticleDOI
TL;DR: An ungrouped backward recursion technique is presented for computing the backward state metrics of the logarithmic-Bahl-Cocke-Jelinek-Raviv (LBCJR) algorithm used in MAP decoders.
Abstract: This work focuses on the VLSI design aspect of high-speed maximum a posteriori (MAP) probability decoders which are intrinsic building-blocks of parallel turbo decoders. For the logarithmic-Bahl-Cocke-Jelinek-Raviv (LBCJR) algorithm used in MAP decoders, we have presented an ungrouped backward recursion technique for the computation of backward state metrics. Unlike the conventional decoder architectures, a MAP decoder based on this technique can be extensively pipelined and retimed to achieve higher clock frequency. Additionally, the state metric normalization technique employed in the design of an add-compare-select-unit (ACSU) has reduced critical path delay of our decoder architecture. We have designed and implemented turbo decoders with 8 and 64 parallel MAP decoders in 90 nm CMOS technology. VLSI implementation of an 8× parallel turbo-decoder has achieved a maximum throughput of 439 Mbps with 0.11 nJ/bit/iteration energy-efficiency. Similarly, a 64× parallel turbo-decoder has achieved a maximum throughput of 3.3 Gbps with an energy-efficiency of 0.079 nJ/bit/iteration. These high-throughput decoders meet peak data-rates of 3GPP-LTE and LTE-Advanced standards.
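
As background for the backward state metrics mentioned here, the sketch below shows a generic textbook-style backward recursion step of a log-domain BCJR decoder, built on the max* (Jacobian logarithm) operator. It does not implement the paper's ungrouped recursion or its ACSU. The data layout, the use of None for missing branches, and the normalization by the metric of state 0 are assumptions made for illustration.

```python
import math

def max_star(a, b):
    """Jacobian logarithm: log(exp(a) + exp(b)), computed stably."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def backward_step(beta_next, gamma):
    """One backward recursion step in the log domain.

    beta_next[t] is the backward metric of state t at time k + 1.
    gamma[s][t] is the branch metric from state s to state t, or None
    if the trellis has no such branch. Every state is assumed to have
    at least one outgoing branch.
    Returns the metrics at time k, normalized so that state 0 is 0.
    """
    beta = []
    for row in gamma:
        acc = None
        for t, g in enumerate(row):
            if g is None:
                continue
            candidate = g + beta_next[t]
            acc = candidate if acc is None else max_star(acc, candidate)
        beta.append(acc)
    # Normalization keeps the metrics bounded over long blocks.
    return [b - beta[0] for b in beta]
```

Because each step depends on the result of the previous one, this recursion sits on the critical path of a MAP decoder, which is why the abstract emphasizes pipelining and normalization.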

Journal ArticleDOI
TL;DR: An effective hybrid multi-objective partitioning algorithm, based on discrete particle swarm optimization with a local search strategy, called MDPSO-LS, is presented to solve VLSI two-way partitioning with simultaneous cutsize and circuit delay minimization.
Abstract: Very large scale integration (VLSI) circuit partitioning is an important problem in design automation of VLSI chips and multichip systems; it is an NP-hard combinatorial optimization problem. In this paper, an effective hybrid multi-objective partitioning algorithm, based on discrete particle swarm optimization (DPSO) with a local search strategy, called MDPSO-LS, is presented to solve VLSI two-way partitioning with simultaneous cutsize and circuit delay minimization. Inspired by the physics of genetic algorithms, uniform crossover and random two-point exchange operators are designed to avoid generating infeasible solutions. Furthermore, the phenotype sharing function of the objective space is applied to circuit partitioning to obtain a better approximation of the true Pareto front, and the theory of Markov chains is used to prove global convergence. To improve the ability of local exploration, the Fiduccia-Mattheyses (FM) strategy is also applied to further improve the cutsize of each particle, and a local search strategy for improving the circuit delay objective is also designed. Experiments on ISCAS89 benchmark circuits show that the proposed algorithm is efficient.
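
The cutsize objective and the per-cell gain on which FM-style local search relies can be sketched in a few lines. The netlist below is hypothetical, the gain is recomputed from scratch rather than maintained incrementally as a real FM implementation would, and neither the balance constraint nor the delay objective is modeled.

```python
def cut_size(nets, side):
    """Number of nets that have cells on both sides of a two-way partition."""
    return sum(1 for net in nets if len({side[cell] for cell in net}) == 2)

def move_gain(nets, side, cell):
    """Reduction in cut size if `cell` moves to the other side."""
    flipped = dict(side)
    flipped[cell] = 1 - side[cell]
    return cut_size(nets, side) - cut_size(nets, flipped)

# Hypothetical netlist: each net is a tuple of the cells it connects.
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c", "d")]
side = {"a": 0, "b": 1, "c": 0, "d": 1}
```

In this example all four nets are cut, and moving cell "b" to side 0 uncuts two of them, so move_gain(nets, side, "b") is 2.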

Book
21 Apr 2014
TL;DR: This book presents a systematic introduction to, and treatment of, the key field-solver methods for RC extraction of VLSI interconnects and substrate coupling in mixed-signal ICs.
Abstract: Resistance and capacitance (RC) extraction is an essential step in modeling the interconnection wires and substrate coupling effect in nanometer-technology integrated circuits (IC). The field-solver techniques for RC extraction guarantee the accuracy of modeling, and are becoming increasingly important in meeting the demand for accurate modeling and simulation of VLSI designs. Advanced Field-Solver Techniques for RC Extraction of Integrated Circuits presents a systematic introduction to, and treatment of, the key field-solver methods for RC extraction of VLSI interconnects and substrate coupling in mixed-signal ICs. Various field-solver techniques are explained in detail, with real-world examples to illustrate the advantages and disadvantages of each algorithm. This book will benefit graduate students and researchers in the field of electrical and computer engineering as well as engineers working in the IC design and design automation industries. Dr. Wenjian Yu is an Associate Professor at the Department of Computer Science and Technology at Tsinghua University in China; Dr. Xiren Wang is an R&D Engineer at Cadence Design Systems in the USA.

Proceedings ArticleDOI
27 Aug 2014
TL;DR: This work explores the use of SC to implement a representative complex matrix operation, namely eigenvector computation, and applies it to a training task for visual face recognition, and shows that the SC design has performance comparable to its conventional binary counterpart, while being able to trade computation time for accuracy.
Abstract: Stochastic computing (SC) is a re-emerging technique to process probability data encoded in digital bit-streams. Its main advantage is that arithmetic operations can be implemented by extremely small and low-power logic circuits. This makes SC suitable for signal-processing applications involving matrix operations whose VLSI implementation is very costly. Previous SC approaches only address basic matrix operations with relatively low accuracy needs. We explore the use of SC to implement a representative complex matrix operation, namely eigenvector computation. We apply it to a training task for visual face recognition, and show that our SC design has performance comparable to its conventional binary counterpart, while being able to trade computation time for accuracy.
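
The claim that SC arithmetic needs only very small logic circuits can be illustrated with the standard unipolar multiplier, where a single AND gate per bit multiplies two independent bit-streams. This sketch shows only that basic operation and the time/accuracy trade-off. The stream lengths, seed, and function names are illustrative assumptions, and the paper's eigenvector computation is not reproduced.

```python
import random

def encode(p, length, rng):
    """Unipolar stochastic number: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def decode(bits):
    """Estimate the encoded value as the fraction of ones."""
    return sum(bits) / len(bits)

def sc_multiply(x_bits, y_bits):
    """Multiply two independent unipolar streams with a bitwise AND."""
    return [a & b for a, b in zip(x_bits, y_bits)]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
short_est = decode(sc_multiply(encode(0.5, 64, rng), encode(0.8, 64, rng)))
long_est = decode(sc_multiply(encode(0.5, 65536, rng), encode(0.8, 65536, rng)))
```

Both estimates target 0.5 × 0.8 = 0.4. The longer stream takes more clock cycles but has a much smaller statistical error, which is the computation-time-for-accuracy trade mentioned in the abstract.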

Journal ArticleDOI
TL;DR: This work targets the design of reversible ALU (arithmetic logic unit) in QCA framework and proposes a new “Reversible QCA” (RQCA), which is established to be more effective than the existing ALU.
Abstract: Reversible logic is emerging as a prospective logic design style for implementing ultra-low-power VLSI circuits. It promises low-power consuming circuits by nullifying the energy dissipation in irreversible logic. On the other hand, as a potential alternative to CMOS technology, Quantum-dot Cellular Automata (QCA) promises energy efficient digital design with high device density and high computing speed. The integration of reversible logic in QCA circuit is expected to be effective in addressing the issue of energy dissipation at nano scale regime. This work targets the design of reversible ALU (arithmetic logic unit) in QCA framework and proposes a new “Reversible QCA” (RQCA). The primary design focus is on optimizing the number of reversible gates, quantum cost and the garbage outputs that are the most important hindrances in realizing reversible logic. Besides optimization, the fault coverage capability of RQCA under missing/additional cell deposition defects is analysed. The scope of reversible logic is further outstretched by introducing a novel DFT (design for testability) architecture around the reversible ALU that reduces testing overhead. The performance of proposed ALU is evaluated, subjected to different faults, and is established to be more effective than the existing ALU.
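
To make the notions of reversible gates and garbage outputs concrete, the sketch below uses the well-known Toffoli gate as a stand-in. It is not the RQCA gate proposed in the paper, whose definition the abstract does not give. A gate is reversible when its input-to-output mapping is a bijection, and outputs that are produced only to preserve that bijection are counted as garbage.

```python
from itertools import product

def toffoli(a, b, c):
    """Controlled-controlled-NOT: flips c when a and b are both 1."""
    return a, b, c ^ (a & b)

def is_reversible(gate, width):
    """True if the gate maps the 2**width input patterns to distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=width)}
    return len(outputs) == 2 ** width

def reversible_and(a, b):
    """AND computed reversibly with a constant-0 ancilla input.

    Returns (result, garbage): the two pass-through outputs are not
    needed by the caller and count as garbage outputs.
    """
    g0, g1, result = toffoli(a, b, 0)
    return result, (g0, g1)
```

Applying toffoli twice returns the original inputs, so the gate is its own inverse.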

Journal ArticleDOI
TL;DR: In this article, a neuro-inspired, hardware-friendly readout stage for the liquid state machine (LSM), a popular model for reservoir computing, has been proposed, which incorporates neurons having multiple dendrites with a lumped nonlinearity (two compartment model).
Abstract: In this paper, we describe a new neuro-inspired, hardware-friendly readout stage for the liquid state machine (LSM), a popular model for reservoir computing. Compared to the parallel perceptron architecture trained by the p-delta algorithm, which is the state of the art in terms of performance of readout stages, our readout architecture and learning algorithm can attain better performance with significantly less synaptic resources making it attractive for VLSI implementation. Inspired by the nonlinear properties of dendrites in biological neurons, our readout stage incorporates neurons having multiple dendrites with a lumped nonlinearity (two compartment model). The number of synaptic connections on each branch is significantly lower than the total number of connections from the liquid neurons and the learning algorithm tries to find the best 'combination' of input connections on each branch to reduce the error. Hence, the learning involves network rewiring (NRW) of the readout network similar to structural plasticity observed in its biological counterparts. We show that compared to a single perceptron using analog weights, this architecture for the readout can attain, even by using the same number of binary valued synapses, up to 3.3 times less error for a two-class spike train classification problem and 2.4 times less error for an input rate approximation task. Even with 60 times larger synapses, a group of 60 parallel perceptrons cannot attain the performance of the proposed dendritically enhanced readout. An additional advantage of this method for hardware implementations is that the 'choice' of connectivity can be easily implemented exploiting address event representation (AER) protocols commonly used in current neuromorphic systems where the connection matrix is stored in memory. Also, due to the use of binary synapses, our proposed method is more robust against statistical variations.

Journal ArticleDOI
TL;DR: The following articles are retracted because after thorough investigation evidence points towards them having at least one author or being reviewed by at least one reviewer who has been implicated in the peer review ring and/or citation ring.
Abstract: In 2013 the Editor of Journal of Vibration and Control and SAGE became aware of a peer review ring involving assumed and fabricated identities that appeared to centre around Peter Chen at National Pingtung University of Education, Taiwan (NPUE). SAGE and the Editor then began a complex investigation into the case during the rest of 2013 and 2014. Following an unsatisfactory response from Peter Chen, NPUE was notified. NPUE were serious in addressing the Journal and SAGE’s concerns. NPUE confirmed that the institution was investigating Peter Chen. SAGE subsequently uncovered a citation ring involving the above mentioned author and others. We regret that individual authors have compromised the academic record by perverting the peer review process and apologise to readers. On uncovering problems with peer review and citation SAGE immediately put steps in place to avoid similar vulnerability of the Journal to exploitation in the future. More information may be found at www.sagepub.co.uk/JVC_Statement_2014. The Journal and SAGE understand from NPUE that Peter Chen has resigned his post at NPUE. The following articles are retracted because after thorough investigation evidence points towards them having at least one author or being reviewed by at least one reviewer who has been implicated in the peer review ring and/or citation ring. All authors have had an opportunity to respond to the allegations and proposed actions. OnlineFirst articles (these articles will not be published in an issue) Chen CY, Chen T-H, Chen Y-H, Yu S-E and Chung P-Y (2013) Information technology system modeling an integrated C-TAM-TPB model to the validation of ocean tidal analyses Journal of Vibration and Control Epub ahead of print 7 May 2013. doi: 10.1177/1077546312472924 Chang R-F, Chen CY, Su F-P and Lin H-C (2013) A two-step approach for broadband digital signal processing technique Journal of Vibration and Control Epub ahead of print 26 April 2013. 
doi: 10.1177/1077546312472925
Chen TH, Chang CJ, Yu SE, Chung PY and Liu C-K (2013) Nonlinear information analysis and system management technique: the influence of design experience and control complexity Journal of Vibration and Control Epub ahead of print 12 April 2013. doi: 10.1177/1077546312473321
Chen CY, Shih BY, Chen YH, Yu SE and Liu YC (2013) The exploration of a 3T flow model using vibrating NXT: II. Model validation Journal of Vibration and Control Epub ahead of print 10 April 2013. doi: 10.1177/1077546312470481
Chen CY, Shih BY, Chen YH, Yu SE and Liu YC (2013) The exploration of 3T flow model using vibrating NXT: I. model formulation Journal of Vibration and Control Epub ahead of print 6 February 2013. doi: 10.1177/1077546312467360
Lin M-L and Chen C-W (2013) Stability analysis of fuzzy-based NN modeling for ecosystems using fuzzy Lyapunov methods Journal of Vibration and Control Epub ahead of print 6 February 2013. doi: 10.1177/1077546312466687
Chen CY, Chen TH, Chen YH and Chiu J (2012) A multi-stage method for deterministic-statistical analysis: a mathematical case and measurement studies Journal of Vibration and Control Epub ahead of print 20 December 2012. doi: 10.1177/1077546312466579
Shih BY, Lin MC and Chen CY (2012) Autonomous navigation system for radiofrequency identification mobile robot e-book reader Journal of Vibration and Control Epub ahead of print 13 December 2012. doi: 10.1177/1077546312466578
Chang RF, Chen CY, Su FP, Lin HC and Lu C-K (2012) Multiphase SUMO robot based on an agile modeling-driven process for a small mobile robot Journal of Vibration and Control Epub ahead of print 13 December 2012. doi: 10.1177/1077546312464993
Shih B-Y, Lin Y-K, Cheng M-H, Chen C-Y and Chiu C-P (2012) The development of an application program interactive game-based information system Journal of Vibration and Control Epub ahead of print 12 December 2012. doi: 10.1177/1077546312464682
Chen C-Y, Chang C-J and Lin C-H (2012) On dynamic access control in web 2.0 and cloud interactive information hub: technologies Journal of Vibration and Control Epub ahead of print 12 December 2012. doi: 10.1177/1077546312464992
Shin BY, Chen CY and Hsu KH (2012) Robot cross platform system using innovative interactive theory and selection algorithms for Android application Journal of Vibration and Control Epub ahead of print 13 November 2012. doi: 10.1177/1077546312463757

Articles published in an issue

Chen C-W (2014) Applications of neural-network-based fuzzy logic control to a nonlinear time-delay chaotic system Journal of Vibration and Control 20 (4): 589-605. Epub ahead of print 5 November 2012. doi: 10.1177/1077546312461370
Chen C-W (2014) A review of intelligent algorithm approaches and neural-fuzzy stability criteria for time-delay tension leg platform systems Journal of Vibration and Control 20 (4): 561-575. Epub ahead of print 5 November 2012. doi: 10.1177/1077546312463759
Chen C-Y, Chang C-J and Lin C-H (2014) On dynamic access control in web 2.0 and cloud interactive information hub: trends and theories Journal of Vibration and Control 20 (4): 548-560. Epub ahead of print 5 November 2012. doi: 10.1177/1077546312463762
Lin M-L and Chen C-W (2014) Stability conditions for ecosystem modeling using the fuzzy Lyapunov method Journal of Vibration and Control 20 (2): 290-302. Epub ahead of print 23 October 2012. doi: 10.1177/1077546312451301
Chen C-H, Kuo C-M, Hsieh S-H and Chen C-Y (2014) Highly efficient very-large-scale integration (VLSI) implementation of probabilistic neural network image interpolator Journal of Vibration and Control 20 (2): 218-224. Epub ahead of print 22 October 2012. doi: 10.1177/1077546312458822
Chen C-Y (2014) Wave vibration and simulation in dissipative media described by irregular boundary surfaces: a mathematical formulation Journal of Vibration and Control 20 (2): 191-203. Epub ahead of print 22 October 2012. doi: 10.1177/1077546312464258
Chen C-H, Yao T-K, Dai J-H and Chen C-Y (2014) A pipelined multiprocessor system-on-a-chip (SoC) design methodology for streaming signal processing Journal of Vibration and Control 20 (2): 163-178. Epub ahead of print 16 October 2012. doi: 10.1177/1077546312458821
Lin M-L and Chen C-W (2014) Fuzzy neural modeling for n-degree ecosystems using the linear matrix inequality approach Journal of Vibration and Control 20 (1): 82-93. Epub ahead of print 8 October 2012. doi: 10.1177/1077546312458533
Chen C-H, Wu W-X and Chen C-Y (2013) Ant-inspired collective problem-solving systems Journal of Vibration and Control 19 (16): 2481-2490. Epub ahead of print 18 September 2012. doi: 10.1177/1077546312456231
Chen C-H, Yao T-K, Kuo C-M and Chen C-Y (2013) Evolutionary design of constructive multilayer feedforward neural network Journal of Vibration and Control 19 (16): 2413-2420. Epub ahead of print 12 September 2012. doi: 10.1177/1077546312456726
Chen C-W (2013) Applications of the fuzzy-neural Lyapunov criterion to multiple time-delay systems Journal of Vibration and Control 19 (13): 2054-2067. Epub ahead of print 16 August 2012. doi: 10.1177/1077546312451034
Chung P-Y, Chen Y-H, Walter L and Chen C-Y (2013) Influence and dynamics of a mobile robot control on mechanical components Journal of Vibration and Control 19 (13): 1923-1935. Epub ahead of print 20 July 2012. doi: 10.1177/1077546312452184
Chen C-W (2013) Neural network-based fuzzy logic parallel distributed compensation controller for structural system Journal of Vibration and Control 19 (11): 1709-1727. Epub ahead of print 22 June 2012. doi: 10.1177/1077546312442233
Chen C-W, Yeh K, Yang H-C, Liu KFR and Liu C-C (2013) A critical review of structural system control by the large-scaled neural network linear-deferential-inclusion-based criterion Journal of Vibration and Control 19 (11): 1658-1673. Epub ahead of print 18 June 2012. doi: 10.1177/1077546312443377
Chen C-H, Kuo C-M, Chen C-Y and Dai J-H (2013) The design and synthesis using hierarchical robotic discrete-event modeling Journal of Vibration and Control 19 (11): 1603-1613. Epub ahead of print 27 June 2012. doi: 10.1177/1077546312449645
Chang CJ, Chen CY and Chou I-T (2013) The design of information and communication technologies: telecom MOD strength machines Journal of Vibration and Control 19 (10): 1499-1513. Epub ahead of print 27 June 2012. doi: 10.1177/1077546312449644
Shih B-Y, Chen C-Y, Li K-H, Wu T-Y, Chen G-Y (2013) A novel NXT control method for implementing force sensing and recycling in a training robot Journal of Vibration and Control 19 (10): 1443-1459. Epub ahead of print 1 June 2012. doi: 10.1177/1077546312446361
Chen C-W, Chen P-C and Chiang W-L (2013) Modified intelligent genetic algorithm-based adaptive neural network control for uncertain structural systems Journal of Vibration and Control 19 (9): 1333-1347. Epub ahead of print 31 May 2012. doi: 10.1177/1077546312442232
Chen C-Y, Shih B-Y, Shih C-H and Wang L-H (2013) Enhancing robust and stability control of a humanoid biped robot: system identification approach. Journal of Vibration and Control 19 (8): 1199-1207. Epub ahead of print 26 April 2012. doi: 10.1177/1077546312442947
Chang C-J, Chen C-Y and Huang C-W (2013) Applications for medical recovery using wireless control of a bluetooth ball with a hybrid G-sensor and human-computer interface technology Journal of Vibration and Control 19 (8): 1139-1151. Epub ahead of print 24 April 2012. doi: 10.1177/1077546312442948
Hsu W-K, Chiou D-J, Chen C-W, Liu M-Y, Chiang W-L and Huang P-C (2013) Sensitivity of initial damage detection for steel structures using the Hilbert-Huang transform method Journal of Vibration and Control 19 (6): 857-878. Epub ahead of print 29 February 2012. doi: 10.1177/1077546311434794
Chen C-Y, Shih B-Y, Shih C-H and Wang L-H (2013) Human–machine interface for the motion control of humanoid biped robots using a graphical user interface Motion Editor Journal of Vibration and Control 19 (6): 814-820. Epub ahead of print 23 February 2012. doi: 10.1177/1077546312437804
Chen C-Y (201

Journal ArticleDOI
TL;DR: The on/off logic (ONOFIC) approach inserts two extra transistors (an NMOS and a PMOS) within the logic block to improve the power dissipation and propagation delay of logic circuits.
Abstract: Shrinking device dimensions increases the device density on the chip and thus reduces the overall chip area required for logic implementation. Minimising the chip area is not the only optimisation factor for a VLSI chip designer. Other equally important performance parameters, such as power dissipation and propagation delay, must also be considered. A major concern in power dissipation is the large leakage current in the deep submicron (DSM) regime. Many leakage reduction techniques have been applied to reduce the leakage current in the DSM regime, but they have their own limitations. Our proposed on/off logic (ONOFIC) approach gives an excellent trade-off between power dissipation and propagation delay for designing nanoscale CMOS circuits. It inserts two extra transistors (an NMOS and a PMOS) within the logic block. The exact on/off level of the ONOFIC block improves the power dissipation and propagation delay of the logic circuits. In this article, ONOFIC appro

Proceedings ArticleDOI
06 Mar 2014
TL;DR: A high-performance eDRAM based on a 22nm tri-gate CMOS technology is introduced, which enables the integration of an eDRAM cell into the logic technology platform and features a well-balanced configuration to achieve both optimal array efficiency and bandwidth.
Abstract: CMOS technology scaling continues to drive higher levels of integration in VLSI design, which adds more compute engines on a die. To meet the overall performance-scaling needs, high-speed and high-bandwidth memory is becoming increasingly important. Conventional VLSI systems often rely on on-die SRAMs to address the performance gap between CPU and main memory, DRAM. However, with the rapid growth in capacity needs for high-performance memory, SRAM is not always sufficient to meet the demands of bandwidth-intense applications. Embedded DRAM (eDRAM) has been explored as an alternative to satisfy the high-performance and density needs in memory [1-3]. In this paper, a high-performance eDRAM based on a 22nm tri-gate CMOS technology is introduced. This eDRAM technology enables the integration of an eDRAM cell into the logic technology platform [4]. The design features a well-balanced configuration to achieve both optimal array efficiency and bandwidth. By leveraging the high-performance and low-voltage tri-gate transistor at 22nm generation, the eDRAM achieves a wide range in operating voltage, from 1.1V down to 0.7V, which is essential for low-power logic applications.

Proceedings ArticleDOI
01 Jun 2014
TL;DR: This paper proposes a parallel event-based method for calibrating appropriately the synaptic weights and demonstrates the method by encoding and decoding arbitrary mathematical functions, and by implementing dynamical systems via recurrent connections.
Abstract: Brain-inspired, spike-based computation in electronic systems is being investigated for developing alternative, non-conventional computing technologies. The Neural Engineering Framework provides a method for programming these devices to implement computation. In this paper we apply this approach to perform arbitrary mathematical computation using a mixed signal analog/digital neuromorphic multi-neuron VLSI chip. This is achieved by means of a network of spiking neurons with multiple weighted connections. The synaptic weights are stored in a 4-bit on-chip programmable SRAM block. We propose a parallel event-based method for calibrating appropriately the synaptic weights and demonstrate the method by encoding and decoding arbitrary mathematical functions, and by implementing dynamical systems via recurrent connections.

Journal ArticleDOI
TL;DR: A new neuro-inspired, hardware-friendly readout stage for the liquid state machine (LSM), a popular model for reservoir computing, which can attain better performance with significantly less synaptic resources making it attractive for VLSI implementation.
Abstract: In this paper, we describe a new neuro-inspired, hardware-friendly readout stage for the liquid state machine (LSM), a popular model for reservoir computing. Compared to the parallel perceptron architecture trained by the p-delta algorithm, which is the state of the art in terms of performance of readout stages, our readout architecture and learning algorithm can attain better performance with significantly less synaptic resources making it attractive for VLSI implementation. Inspired by the nonlinear properties of dendrites in biological neurons, our readout stage incorporates neurons having multiple dendrites with a lumped nonlinearity. The number of synaptic connections on each branch is significantly lower than the total number of connections from the liquid neurons and the learning algorithm tries to find the best 'combination' of input connections on each branch to reduce the error. Hence, the learning involves network rewiring (NRW) of the readout network similar to structural plasticity observed in its biological counterparts. We show that compared to a single perceptron using analog weights, this architecture for the readout can attain, even by using the same number of binary valued synapses, up to 3.3 times less error for a two-class spike train classification problem and 2.4 times less error for an input rate approximation task. Even with 60 times larger synapses, a group of 60 parallel perceptrons cannot attain the performance of the proposed dendritically enhanced readout. An additional advantage of this method for hardware implementations is that the 'choice' of connectivity can be easily implemented exploiting address event representation (AER) protocols commonly used in current neuromorphic systems where the connection matrix is stored in memory. Also, due to the use of binary synapses, our proposed method is more robust against statistical variations.

Book
17 Nov 2014
TL;DR: This book covers MOS fabrication technology, MOS transistors, inverters and combinational circuits, sources of power dissipation, supply voltage scaling, switched capacitance minimization, leakage power minimization, adiabatic logic circuits, battery-aware systems, and low-power software approaches.
Abstract: Introduction.- MOS Fabrication Technology.- MOS Transistors.- MOS Inverters.- MOS Combinational Circuits.- Sources of Power Dissipation.- Supply Voltage Scaling for Low Power.- Switched Capacitance Minimization.- Leakage Power Minimization.- Adiabatic Logic Circuits.- Battery-Aware Systems.- Low Power Software Approaches.

Journal ArticleDOI
TL;DR: A field programmable gate array (FPGA) based very large scale integration (VLSI) architecture of the RCM-RW algorithm for digital images is proposed, which can serve the purpose of media authentication in a real-time environment.

Journal ArticleDOI
TL;DR: To meet the computation demand of guided filtering in full-HD video, a double integral image architecture for guided filter ASIC design is proposed, together with a reformation of the guided filter formula that prevents the error resulting from truncation in the fractional part and allows the regularization parameter ε to be modified on the user's demand.
Abstract: Filtering is widely used in image and video processing for various applications. Recently, the guided filter has been proposed and has become one of the popular filtering methods. In this paper, to meet the computation demand of guided filtering in full-HD video, a double integral image architecture for guided filter ASIC design is proposed. In addition, a reformation of the guided filter formula is proposed, which can prevent the error resulting from truncation in the fractional part and modify the regularization parameter ε on the user's demand. The hardware architecture of the guided image filter is then proposed and can be embedded in mobile devices to achieve real-time HD applications. To the best of our knowledge, this paper is also the first ASIC design for the guided image filter. With a TSMC 90-nm cell library, the design can operate at 100 MHz and support Full-HD (1920 × 1080) at 30 frame/s with 929K gate counts and 32 KB of on-chip memory. Moreover, in terms of hardware efficiency, our architecture is also the best compared with previous works using the bilateral filter.
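
For readers unfamiliar with integral images, the following is a minimal software sketch of the general idea that box sums (which the guided filter needs for its local means) can be read from a summed-area table with four lookups, regardless of window size. It is not the paper's double integral image hardware architecture; the function names `integral_image` and `box_sum` are illustrative assumptions.

```python
def integral_image(img):
    """Build a summed-area table with one extra zero row and column.

    sat[y][x] holds the sum of img[0:y][0:x] (half-open ranges).
    """
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat, top, left, bottom, right):
    """Sum of img[top:bottom][left:right] using four table lookups."""
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    table = integral_image(image)
    # Sum over the bottom-right 2x2 window: 5 + 6 + 8 + 9
    print(box_sum(table, 1, 1, 3, 3))
```

Dividing a box sum by the window area gives the local mean, so the cost per pixel stays constant as the filter radius grows.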

Proceedings ArticleDOI
06 Mar 2014
TL;DR: A time-stamp-based optic flow algorithm is devised and implemented, modified from the conventional EMD algorithm to give an optimum partitioning of hardware blocks in analog and digital domains as well as assign adequate allocation of pixel-level, column-parallel, and chip-level processing.
Abstract: Miniaturized low-power artificial compound eyes in a small form factor and a low payload can be a promising approach to provide wide-field information for micro-air-vehicle (MAV) applications. Recently, research efforts have been made to realize bio-inspired artificial compound eyes to mimic the wide field of view (FoV) of insect visual organs by implementing photoreceptors to independently face different angles [1-2]. However, these approaches have drawbacks. They use complicated fabrication processes to form a hemispherical lens configuration and secure an independent optical path to each photoreceptor. We take a simple and practical approach to realize wide-field optic flow sensing in a pseudo-hemispherical configuration by mounting a number of 2D array optic flow sensors on a flexible PCB module as shown in Figure 7.2.1. In this scheme, the 2D optic flow sensor should meet the requirements of MAV applications: extremely low power consumption while maintaining robust optic flow generation. Conventional optic flow algorithms, such as Lucas-and-Kanade, require huge amounts of numerical calculations; therefore, they require substantial digital hardware (CPU and/or FPGA), resulting in large power consumption [3-4]. As an alternative approach for low-power implementation, bio-inspired elementary motion detector (EMD) based algorithms (or neuromorphic algorithms) have been studied and implemented in analog VLSI circuits for autonomous navigation [5-6]. However, pure analog signal processing is easily susceptible to temperature and process variations and it is difficult to scale the pixel size or apply low-power design techniques because extensive analog processing is implemented in pixel-level circuits. 
In this work, we have devised and implemented a time-stamp-based optic flow algorithm, which is modified from the conventional EMD algorithm to give an optimum partitioning of hardware blocks in analog and digital domains as well as assign adequate allocation of pixel-level, column-parallel, and chip-level processing. Temporal filtering, which may require huge hardware resources if implemented in the digital domain, remains in a pixel-level analog processing unit. Feature detection is implemented using digital circuits that are column parallel. The embedded digital core decodes the 2D time-stamp information into velocity using chip-level processing. Finally, the estimated 16b optic flow data are compressed and transmitted to the host through a 4-wired Serial Peripheral Interface (SPI) bus.
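
As a rough illustration of how time stamps can be decoded into velocity, the sketch below uses a generic 1-D time-of-travel estimate: the speed between two neighbouring pixels is the pixel pitch divided by the difference of the times at which a feature was detected at each. This is an assumption-laden toy, not the chip's actual algorithm; the function name `time_of_travel_velocity`, the `pixel_pitch` parameter, and the use of `None` for missing detections are all hypothetical.

```python
def time_of_travel_velocity(timestamps, pixel_pitch=1.0):
    """Estimate 1-D velocity between neighbouring pixels from time stamps.

    timestamps[i] is the time (in seconds) at which a feature was detected
    at pixel i, or None if no feature was seen there. The result has one
    entry per neighbouring pair, in pitch units per second; a positive
    value means motion toward higher pixel indices. Pairs with a missing
    or identical time stamp yield None.
    """
    velocities = []
    for t_a, t_b in zip(timestamps, timestamps[1:]):
        if t_a is None or t_b is None or t_a == t_b:
            velocities.append(None)  # velocity undefined for this pair
        else:
            velocities.append(pixel_pitch / (t_b - t_a))
    return velocities


if __name__ == "__main__":
    # A feature crossing pixels 0 -> 1 in 0.5 s; pixel 2 saw nothing.
    print(time_of_travel_velocity([0.0, 0.5, None, 1.0]))
```

Because only time stamps need to be stored and subtracted, a scheme of this kind avoids the dense numerical computation that gradient-based methods such as Lucas-Kanade require.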

Proceedings ArticleDOI
01 Jan 2014
TL;DR: To enable the capacitance extraction of chip-scale large VLSI layout using the floating random walk (FRW) algorithm, two techniques are proposed, including a virtual Gaussian surface sampling technique that makes efficient random sampling on the Gaussian surface for complex nets with vias, and optimizes the sampling scheme to reduce the time of random walk.
Abstract: To enable the capacitance extraction of chip-scale large VLSI layout using the floating random walk (FRW) algorithm, two techniques are proposed. The first one is a virtual Gaussian surface sampling technique. It makes efficient random sampling on the Gaussian surface for complex nets with vias, and optimizes the sampling scheme to reduce the time of random walk. The other one is a parallelized, improved construction approach for the Octree-based space management structure. It can be over 5000X faster than the existing approach and provides the same convenience to the FRW procedure. Numerical experiments on large cases with up to half a million conductors validate the proposed techniques, and demonstrate a fast FRW solver for chip-scale extraction tasks.

Journal ArticleDOI
TL;DR: This investigation permits to understand the potential of TFETs and their advantages over traditional devices within a unitary framework that is based on fair design and comparison from device to circuit level, as well as to develop clear design perspectives in the context of ULV/ULP VLSI digital circuits.
Abstract: In Part II of this paper, the potential of tunnel FETs (TFETs) for ultra-low voltage (ULV)/ultra-low power (ULP) operation at 32-nm node is investigated through Verilog-A simulations of appropriate reference circuits. Critical issues arising at ultra-low voltages are analyzed, including static robustness of TFET logic gates, performance degradation, and sensitivity to process variations. Guidelines to design ultra-low energy standard cell libraries are derived. The minimum energy point is analyzed in a wide range of conditions, and guidelines for microarchitectural optimization for ultra-low energy are introduced. Voltage scalability of static RAM memories is also analyzed as main limitation to aggressive voltage scaling of very large scale integration (VLSI) systems, and improved precharge schemes are introduced to reduce leakage. The impact of variations of the main device parameters on VLSI digital circuits is investigated to identify the most critical variations that need to be controlled at process level. This investigation permits to understand the potential of TFETs and their advantages over traditional devices within a unitary framework that is based on fair design and comparison from device to circuit level, as well as to develop clear design perspectives in the context of ULV/ULP VLSI digital circuits.