
Showing papers in "ACM Journal on Emerging Technologies in Computing Systems in 2015"


Journal ArticleDOI
TL;DR: This review gives an overview of the status and prospects of spin-based devices and circuits that are currently under intense investigation and development across the world, and particularly addresses their merits and challenges for practical applications.
Abstract: Conventional MOS integrated circuits and systems suffer severe power and scalability challenges as technology scales into ultra-deep-submicron nodes (e.g., below 40nm). Both static and dynamic power dissipations are increasing, caused mainly by the intrinsic leakage currents and large data traffic. Alternative approaches beyond charge-only-based electronics, and in particular, spin-based devices, show promising potential to overcome these issues by adding the spin freedom of electrons to electronic circuits. Spintronics provides data non-volatility, fast data access, and low-power operation, and has now become a hot topic in both academia and industry for achieving ultra-low-power circuits and systems. The ITRS report on emerging research devices identified the magnetic tunnel junction (MTJ) nanopillar (one of the Spintronics nanodevices) as one of the most promising technologies to be part of future micro-electronic circuits. In this review we will give an overview of the status and prospects of spin-based devices and circuits that are currently under intense investigation and development across the world, and address particularly their merits and challenges for practical applications. We will also show that, with the rapid development of Spintronics, some novel computing architectures and paradigms beyond the classic von Neumann architecture have recently been emerging for next-generation ultra-low-power circuits and systems.

168 citations


Journal ArticleDOI
TL;DR: Experimental results show that highly efficient checkpointing translates to a significant speedup in program execution time and a reduction in application-level energy consumption.
Abstract: Transiently Powered Computers (TPCs) are a new class of batteryless embedded systems that depend solely on energy harvested from external sources for performing computations. Enabling long-running computations on TPCs is a major challenge due to the highly intermittent nature of the power supply. We present QuickRecall, an in-situ checkpointing technique for TPCs using FRAM that consumes only 30nJ while decreasing the time taken for saving and restoring a checkpoint to only 21.06μs, which is over two orders of magnitude lower than the corresponding overhead using flash. We have implemented and evaluated QuickRecall using the TI MSP430FR5739 FRAM-enabled microcontroller. Experimental results show that our highly efficient checkpointing translates to a significant speedup (1.25x-8.4x) in program execution time and a reduction (∼3x) in application-level energy consumption.

76 citations


Journal ArticleDOI
TL;DR: A brain-inspired reconfigurable digital neuromorphic processor (DNP) architecture for large-scale spiking neural networks is presented and the functionality of the proposed DNP architecture is demonstrated by realizing an unsupervised-learning based character recognition system.
Abstract: This article presents a brain-inspired reconfigurable digital neuromorphic processor (DNP) architecture for large-scale spiking neural networks. The proposed architecture integrates an arbitrary number of N digital leaky integrate-and-fire (LIF) silicon neurons to mimic their biological counterparts and on-chip learning circuits to realize spike-timing-dependent plasticity (STDP) learning rules. We leverage memristor nanodevices to build an N×N crossbar array to store not only multibit synaptic weight values but also network configuration data with significantly reduced area overhead. Additionally, the crossbar array is designed to be accessible both column- and row-wise to expedite the synaptic weight update process for learning. The proposed digital pulse width modulator (PWM) produces binary pulses with various durations for reading and writing the multilevel memristive crossbar. The proposed column based analog-to-digital conversion (ADC) scheme efficiently accumulates the presynaptic weights of each neuron and reduces silicon area overhead by using a shared arithmetic unit to process the LIF operations of all N neurons. With 256 silicon neurons, learning circuits and 64K synapses, the power dissipation and area of our DNP are 6.45 mW and 1.86 mm2, respectively, when implemented in a 90-nm CMOS technology. The functionality of the proposed DNP architecture is demonstrated by realizing an unsupervised-learning based character recognition system.
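The abstract names two standard building blocks, leaky integrate-and-fire neurons and STDP learning. As a rough illustration of those generic rules only (not of the paper's fixed-point digital circuits or its memristive crossbar), here is a minimal discrete-time sketch in Python, where the weight clipping loosely stands in for bounded multilevel synapses:

```python
import numpy as np

def lif_step(v, i_syn, v_rest=0.0, v_thresh=1.0, leak=0.9):
    """One discrete-time leaky integrate-and-fire update for a vector of neurons.
    Returns the new membrane potentials and a boolean spike vector."""
    v = v_rest + leak * (v - v_rest) + i_syn   # leak toward rest, add synaptic input
    spikes = v >= v_thresh
    v[spikes] = v_rest                          # reset neurons that fired
    return v, spikes

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP: potentiate when pre fires before post, depress otherwise.
    t_pre and t_post are the last spike times of the pre/post neurons."""
    dt = t_post - t_pre
    dw = np.where(dt >= 0, a_plus * np.exp(-dt / tau), -a_minus * np.exp(dt / tau))
    return np.clip(w + dw, 0.0, 1.0)            # keep weights in a bounded (multilevel) range
```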

75 citations


Journal ArticleDOI
TL;DR: This article patterns neural activities across multiple timescales and encodes sensory information using time-dependent temporal scales; the proposed spiking neuron is compact, low power, and robust.
Abstract: This article presents our research towards developing novel and fundamental methodologies for data representation using spike-timing-dependent encoding. Time encoding efficiently maps a signal's amplitude information into a spike time sequence that represents the input data and offers perfect recovery for band-limited stimuli. In this article, we pattern the neural activities across multiple timescales and encode the sensory information using time-dependent temporal scales. The spike encoding methodologies for autonomous classification of time-series signatures are explored using near-chaotic reservoir computing. The proposed spiking neuron is compact, low power, and robust. These results are expected to lead to an agile hardware implementation of time encoding as a signal conditioner for dynamical neural processor designs.
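The claim that time encoding maps amplitude into spike times with perfect recovery for band-limited stimuli is usually associated with integrate-and-fire time encoding machines. The sketch below shows only that generic idea, not the authors' encoder; the bias, kappa, and delta parameters are illustrative assumptions:

```python
import numpy as np

def if_time_encoder(x, dt, bias=1.0, kappa=1.0, delta=0.05):
    """Integrate-and-fire time encoding: accumulate (x + bias)/kappa and emit a
    spike time whenever the integral crosses delta, then subtract the threshold.
    Amplitude information thus ends up in the spacing of the spike times."""
    spike_times, acc = [], 0.0
    for n, sample in enumerate(x):
        acc += (sample + bias) * dt / kappa
        if acc >= delta:
            spike_times.append(n * dt)
            acc -= delta
    return spike_times

# Example: encode a slow sinusoid sampled at 1 kHz.
t = np.arange(0.0, 1.0, 1e-3)
spikes = if_time_encoder(0.5 * np.sin(2 * np.pi * 3 * t), dt=1e-3)
```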

48 citations


Journal ArticleDOI
TL;DR: It is proved that determining an optimal embedding is coNP-hard even for restricted cases, and heuristic and exact methods are proposed for determining both the number of additional lines and a corresponding embedding.
Abstract: Reversible logic represents the basis for many emerging technologies and has recently been intensively studied. However, most of the Boolean functions of practical interest are irreversible and must be embedded into a reversible function before they can be synthesized. Thus far, an optimal embedding is guaranteed only for small functions, whereas a significant overhead results when large functions are considered. We study this issue in this article. We prove that determining an optimal embedding is coNP-hard even for restricted cases. Then, we propose heuristic and exact methods for determining both the number of additional lines and a corresponding embedding. For these approaches, we consider sums of products and binary decision diagrams as function representations. Experimental evaluations show the applicability of the approaches for large functions. Consequently, the reversible embedding of large functions is enabled as a precursor to subsequent synthesis.
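For intuition on the "number of additional lines": a commonly used lower bound says an irreversible function needs at least ceil(log2(mu)) garbage outputs, where mu is the largest number of input patterns sharing one output pattern. The sketch below computes only that bound; the article's contribution lies in the hardness result and in actually constructing embeddings:

```python
from collections import Counter
from math import ceil, log2

def min_garbage_lines(truth_table):
    """Lower bound on the garbage outputs needed to embed an irreversible
    function into a reversible one: ceil(log2(mu)), with mu the maximum
    number of inputs mapped to the same output pattern."""
    mu = max(Counter(truth_table.values()).values())
    return ceil(log2(mu)) if mu > 1 else 0

# Example: a 2-input AND maps three inputs to 0, so mu = 3 and at least
# ceil(log2(3)) = 2 garbage outputs are required.
and_table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
print(min_garbage_lines(and_table))  # -> 2
```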

48 citations


Journal ArticleDOI
TL;DR: In this article, the authors report a complete performance simulation software tool capable of searching the hardware design space by varying resource architecture and technology parameters, synthesizing and scheduling a fault-tolerant quantum algorithm within the hardware constraints, quantifying the performance metrics such as the execution time and the failure probability of the algorithm, and analyzing the breakdown of these metrics to highlight the performance bottlenecks and visualizing resource utilization.
Abstract: The optimal design of a fault-tolerant quantum computer involves finding an appropriate balance between the burden of large-scale integration of noisy components and the load of improving the reliability of hardware technology. This balance can be evaluated by quantitatively modeling the execution of quantum logic operations on a realistic quantum hardware containing limited computational resources. In this work, we report a complete performance simulation software tool capable of (1) searching the hardware design space by varying resource architecture and technology parameters, (2) synthesizing and scheduling a fault-tolerant quantum algorithm within the hardware constraints, (3) quantifying the performance metrics such as the execution time and the failure probability of the algorithm, and (4) analyzing the breakdown of these metrics to highlight the performance bottlenecks and visualizing resource utilization to evaluate the adequacy of the chosen design. Using this tool, we investigate a vast design space for implementing key building blocks of Shor’s algorithm to factor a 1,024-bit number with a baseline budget of 1.5 million qubits. We show that a trapped-ion quantum computer designed with twice as many qubits and one-tenth of the baseline infidelity of the communication channel can factor a 2,048-bit integer in less than 5 months.
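As a back-of-the-envelope companion to the two metrics the tool quantifies, execution time and failure probability compose roughly as below. The numbers in the example are hypothetical and not taken from the paper, whose tool derives both metrics from detailed scheduling on a concrete resource architecture:

```python
def first_order_estimates(n_logical_ops, circuit_depth, cycle_time_s, p_logical):
    """Crude estimates: total time scales with circuit depth and cycle time,
    and the algorithm fails if any of its logical operations fails."""
    exec_time = circuit_depth * cycle_time_s
    p_fail = 1.0 - (1.0 - p_logical) ** n_logical_ops
    return exec_time, p_fail

# Hypothetical numbers, for illustration only:
t, p = first_order_estimates(n_logical_ops=1e12, circuit_depth=1e10,
                             cycle_time_s=1e-6, p_logical=1e-15)
print(f"~{t / 86400:.1f} days, failure probability ~{p:.1e}")
```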

30 citations


Journal ArticleDOI
TL;DR: This article surveys several recent techniques that aim to offset these challenges and fully leverage the potential of near-threshold voltage computing, classifying them along several dimensions and highlighting their similarities and differences.
Abstract: Energy efficiency has now become the primary obstacle in scaling the performance of all classes of computing systems. Low-voltage computing, specifically, near-threshold voltage computing (NTC), which involves operating the transistor very close to and yet above its threshold voltage, holds the promise of providing many-fold improvement in energy efficiency. However, use of NTC also presents several challenges such as increased parametric variation, failure rate, and performance loss. This article surveys several recent techniques that aim to offset these challenges for fully leveraging the potential of NTC. By classifying these techniques along several dimensions, we also highlight their similarities and differences. It is hoped that this article will provide insights into state-of-the-art NTC techniques to researchers and system designers and inspire further research in this field.

27 citations


Journal ArticleDOI
TL;DR: The article demonstrates how the proposed programmable neuromorphic system can be configured to implement specific spike-based synaptic plasticity rules and depicts how it can be utilised in a cognitive task.
Abstract: Hardware implementations of spiking neural networks offer promising solutions for computational tasks that require compact and low-power computing technologies. As these solutions depend on both the specific network architecture and the type of learning algorithm used, it is important to develop spiking neural network devices that offer the possibility to reconfigure their network topology and to implement different types of learning mechanisms. Here we present a neuromorphic multi-neuron VLSI device with on-chip programmable event-based hybrid analog/digital circuits; the event-based nature of the input/output signals allows the use of address-event representation infrastructures for configuring arbitrary network architectures, while the programmable synaptic efficacy circuits allow the implementation of different types of spike-based learning mechanisms. The main contributions of this article are to demonstrate how the programmable neuromorphic system proposed can be configured to implement specific spike-based synaptic plasticity rules and to depict how it can be utilised in a cognitive task. Specifically, we explore the implementation of different spike-timing plasticity learning rules online in a hybrid system comprising a workstation and the neuromorphic VLSI device interfaced to it, and we demonstrate how, after training, the VLSI device can perform, as a standalone component (i.e., without requiring a computer), binary classification of correlated patterns.

22 citations


Journal ArticleDOI
TL;DR: An efficient energy harvesting system compatible with various environmental sources, such as light, heat, or wind energy, is proposed; it takes advantage of double-level capacitors not only to prolong system lifetime but also to enable robust booting of the system from an exhausted-energy state.
Abstract: To design autonomous wireless sensor networks (WSNs) with a theoretically infinite lifetime, energy harvesting (EH) techniques have recently been considered as promising approaches. Ambient sources can provide everlasting additional energy for WSN nodes and exclude their dependence on batteries. In this article, an efficient energy harvesting system which is compatible with various environmental sources, such as light, heat, or wind energy, is proposed. Our platform takes advantage of double-level capacitors not only to prolong system lifetime but also to enable robust booting of the system from an exhausted-energy state. Simulations and experiments show that our multiple-energy-sources converter (MESC) can achieve a booting time on the order of seconds. Although capacitors offer virtually unlimited recharge cycles, they suffer higher leakage than rechargeable batteries. Increasing their size can decrease the system performance due to leakage energy. Therefore, an energy-neutral design framework providing a methodology to determine the minimum size of those storage devices satisfying energy-neutral operation (ENO) and maximizing system quality-of-service (QoS) in EH nodes, when using a given energy source, is proposed. Experiments validating this framework are performed on a real WSN platform with both photovoltaic cells and thermal generators in an indoor environment. Moreover, simulations on OMNET++ show that the energy storage optimized from our design framework is utilized up to 93.86%.
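A simplified reading of the sizing problem behind the energy-neutral framework: the storage capacitor must cover the worst cumulative deficit between harvested and consumed energy within its usable voltage window. The sketch below implements only that first-order condition with made-up trace values; the paper's framework additionally models capacitor leakage and QoS:

```python
def min_storage_capacitance(harvest_w, load_w, dt_s, v_max, v_min):
    """Size C so that discharging from v_max to v_min covers the worst
    cumulative energy shortfall between load and harvest over the trace."""
    deficit, worst = 0.0, 0.0
    for p_in, p_out in zip(harvest_w, load_w):
        deficit = max(0.0, deficit + (p_out - p_in) * dt_s)  # shortfall so far (J)
        worst = max(worst, deficit)
    return 2.0 * worst / (v_max ** 2 - v_min ** 2)

# Hypothetical indoor trace: 1 mW harvested during the day, nothing at night,
# 0.5 mW average load, 1 s resolution over 24 h.
harvest = [1e-3] * 43200 + [0.0] * 43200
load = [0.5e-3] * 86400
print(min_storage_capacitance(harvest, load, 1.0, v_max=5.0, v_min=2.0))  # ~2 F
```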

22 citations


Journal ArticleDOI
TL;DR: This article presents a simulation framework for power-performance analysis of multicore architectures, with specific focus on the NoC, that integrates accurate power gating and DVFS models also encompassing their timing and power overheads.
Abstract: Networks-on-chip (NoCs) are a widely recognized viable interconnection paradigm to support the multi-core revolution. One of the major design issues of multicore architectures is still power, which can no longer be attributed mainly to the cores, since the NoC contribution to the overall energy budget is significant. To face both static and dynamic power while balancing NoC performance, different actuators have been exploited in the literature, mainly dynamic voltage frequency scaling (DVFS) and power gating. Typically, simulation-based tools are employed to explore the huge design space by adopting simplified models of the components. As a consequence, the majority of state-of-the-art works on NoC power-performance optimization do not accurately consider timing and power overheads of actuators, or (even worse) do not consider them at all, with the risk of overestimating the benefits of the proposed methodologies. This article presents a simulation framework for power-performance analysis of multicore architectures with specific focus on the NoC. It integrates accurate power gating and DVFS models encompassing also their timing and power overheads. The value added by our proposal is manifold: (i) DVFS and power gating actuators are modeled starting from SPICE-level simulations; (ii) such models have been integrated in the simulation environment; (iii) policy analysis support is plugged into the framework to enable assessment of different policies; (iv) a flexible GALS (globally asynchronous locally synchronous) support is provided, covering both handshake and FIFO re-synchronization schemes. To demonstrate both the flexibility and extensibility of our proposal, two simple policies exploiting the modeled actuators are discussed in the article.
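A toy example of the kind of policy such a framework is meant to evaluate: power-gate a router only when the predicted idle period repays the gating overheads. The coefficients below are placeholders; in the article those overheads come from SPICE-level characterization:

```python
def should_power_gate(predicted_idle_s, e_overhead_j, p_leak_w, wakeup_s):
    """Gate only if the idle period exceeds the break-even time implied by the
    gating energy overhead, treating the wakeup latency as unusable idle time."""
    break_even_s = e_overhead_j / p_leak_w   # time for leakage savings to repay the overhead
    return predicted_idle_s - wakeup_s > break_even_s

# Hypothetical numbers: 2 nJ overhead per gate/ungate cycle, 5 mW leakage,
# 100 ns wakeup, 1 us predicted idle window.
print(should_power_gate(1e-6, 2e-9, 5e-3, 100e-9))  # True: 400 ns break-even
```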

22 citations


Journal ArticleDOI
TL;DR: Two fault-tolerant protocols are presented, the trade-off in computational overhead is shown using the ten-bit quantum carry-lookahead adder as an example, and it is suggested that Alice send input qubits encoded with an error correction code instead of single input qubits.
Abstract: Blind quantum computation is an appealing use of quantum information technology because it can conceal both the client's data and the algorithm itself from the server. However, problems need to be solved in the practical use of blind quantum computation and fault-tolerance is a major challenge. Broadbent et al. proposed running error correction over blind quantum computation, and Morimae and Fujii proposed using fault-tolerant entangled qubits as the resource for blind quantum computation. Both approaches impose severe demands on the teleportation channel, the former requiring unrealistic data rates and the latter near-perfect fidelity. To extend the application range of blind quantum computation, we suggest that Alice send input qubits encoded with an error correction code instead of single input qubits. Two fault-tolerant protocols are presented and we show the trade-off in computational overhead using the ten-bit quantum carry-lookahead adder as an example. Though these two fault-tolerant protocols require the client to have more quantum computing ability than the approaches from prior work, they provide better fault-tolerance when the client and the server are connected by realistic quantum repeater networks.

Journal ArticleDOI
TL;DR: In this article, the authors argue that the neural circuit approach, in which networks of neuronal elements that model brain circuitry are constructed, allows the development of practical applications and the exploration of brain function, and they present case studies of spiking neural networks in vision and recognition tasks based on one instantiation of a simulation environment.
Abstract: Neuromorphic engineering is a fast growing field with great potential in both understanding the function of the brain, and constructing practical artifacts that build upon this understanding. For these novel chips and hardware to be useful, hardware-compatible applications and simulation tools are needed. We argue that the neural circuit approach, in which networks of neuronal elements that model brain circuitry are constructed, allows the development of practical applications and the exploration of brain function. At this level of abstraction, networks of 10^5 neurons or larger can be efficiently simulated, but still preserve the neuronal and synaptic dynamics that appear to be important for brain function. Because the neural circuit level supports spiking neural networks and the prevalent Addressable Event Representation (AER) communication scheme, it fits well with many existing neuromorphic hardware and simulation tools. To show how this approach can be applied, we present case studies of spiking neural networks in vision and recognition tasks based on one instantiation of a simulation environment. However, there are now many hardware options, simulation environments, and applications in this emerging field. These approaches and other considerations are discussed.

Journal ArticleDOI
TL;DR: A mixed integer nonlinear programming model is proposed for the placement and scheduling of quantum circuits such that latency is minimized; the model is proved reducible to a quadratic assignment problem, a well-known NP-complete combinatorial optimization problem.
Abstract: Recent works on quantum physical design have pushed the scheduling and placement of quantum circuits into prominent positions. In this article, a mixed integer nonlinear programming model is proposed for the placement and scheduling of quantum circuits in such a way that latency is minimized. The proposed model determines locations of gates and the sequence of operations. The proposed model is proved reducible to a quadratic assignment problem, which is a well-known NP-complete combinatorial optimization problem. Since it is impossible to find the optimal solution of this NP-complete problem for large quantum circuits within a reasonable amount of time, a metaheuristic solution method is developed for the proposed model. Some experiments are conducted to evaluate the performance of the developed solution approach. Experimental results show that the proposed approach improves average latency by about 24.09% for the attempted benchmarks.
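To make the reduction concrete: a quadratic assignment problem scores a placement by summing interaction weights times the distances between assigned locations, and a metaheuristic such as simulated annealing searches over placements. The sketch below is a generic illustration under those assumptions, not the paper's specific model, objective, or solution method:

```python
import math, random

def qap_cost(assign, flow, dist):
    """Quadratic-assignment cost: interaction weight between operations i and j
    times the distance between the locations they are assigned to."""
    n = len(assign)
    return sum(flow[i][j] * dist[assign[i]][assign[j]]
               for i in range(n) for j in range(n))

def anneal_placement(flow, dist, iters=20000, t0=10.0, alpha=0.9995, seed=0):
    """Bare-bones simulated annealing over placements: propose swaps of two
    locations, accept improvements always and worsenings with a temperature-
    dependent probability."""
    rng = random.Random(seed)
    n = len(flow)
    cur = list(range(n))
    cur_cost = qap_cost(cur, flow, dist)
    best, best_cost, t = cur[:], cur_cost, t0
    for _ in range(iters):
        i, j = rng.sample(range(n), 2)
        cur[i], cur[j] = cur[j], cur[i]                 # propose swapping two locations
        new_cost = qap_cost(cur, flow, dist)
        if new_cost <= cur_cost or rng.random() < math.exp((cur_cost - new_cost) / t):
            cur_cost = new_cost                         # accept the move
            if new_cost < best_cost:
                best, best_cost = cur[:], new_cost
        else:
            cur[i], cur[j] = cur[j], cur[i]             # reject: undo the swap
        t *= alpha
    return best, best_cost
```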

Journal ArticleDOI
TL;DR: This article proposes PROTON+, a fast tool for placement and routing of 3D ONoCs minimizing the total laser power, and studies optimal positions of memory controllers for the first time.
Abstract: Optical Networks-on-Chip (ONoCs) are a promising technology to overcome the bottleneck of low bandwidth of electronic Networks-on-Chip. Recent research discusses power and performance benefits of ONoCs based on their system-level design, while layout effects are typically overlooked. As a consequence, laser power requirements are inaccurately computed from the logic scheme but do not consider the layout. In this article, we propose PROTON+, a fast tool for placement and routing of 3D ONoCs minimizing the total laser power. Using our tool, the required laser power of the system can be decreased by up to 94% compared to a state-of-the-art manually designed layout. In addition, with the help of our tool, we study the physical design space of ONoC topologies. For this purpose, topology synthesis methods (e.g., global connectivity and network partitioning) as well as different objective function weights are analyzed in order to minimize the maximum insertion loss and ultimately the system’s laser power consumption. For the first time, we study optimal positions of memory controllers. A comparison of our algorithm to a state-of-the-art placer for electronic circuits shows the need for a different set of tools custom-tailored for the particular requirements of optical interconnects.
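The link between layout and laser power can be sketched as follows: each optical path accumulates propagation, crossing, and bend losses, and the laser must overcome the worst-case path loss down to the detector sensitivity. All coefficients below are illustrative placeholders rather than values from the paper:

```python
def path_loss_db(length_cm, n_crossings, n_bends,
                 prop_db_per_cm=1.0, crossing_db=0.15, bend_db=0.005):
    """Insertion loss of one optical path as a sum of per-element losses (dB)."""
    return (length_cm * prop_db_per_cm
            + n_crossings * crossing_db
            + n_bends * bend_db)

def laser_power_mw(worst_loss_db, detector_sensitivity_dbm=-20.0, margin_db=1.0):
    """The laser must be strong enough that the weakest signal still reaches
    the detector sensitivity; layout-aware placement and routing shrink the
    worst-case loss and hence the required laser power."""
    p_dbm = detector_sensitivity_dbm + worst_loss_db + margin_db
    return 10 ** (p_dbm / 10.0)

worst = max(path_loss_db(2.5, 40, 6), path_loss_db(1.8, 55, 4))
print(f"required laser power ~{laser_power_mw(worst):.2f} mW per wavelength")
```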

Journal ArticleDOI
TL;DR: A transversal survey on energy-efficient techniques ranging from devices to architectures is presented, showing how novel and emerging devices provide new opportunities to extend the trend toward low-power design.
Abstract: Nowadays, power consumption is one of the main limitations of electronic systems. In this context, novel and emerging devices provide new opportunities to extend the trend toward low-power design. In this survey article, we present a transversal survey on energy-efficient techniques ranging from devices to architectures. The current trends of device research, with fully depleted planar devices, tri-gate geometries, and gate-all-around structures, allow us to reach increasingly higher levels of performance while reducing the associated power. In addition, beyond the simple device property enhancements, emerging devices also lead to innovations at the circuit and architectural levels. In particular, devices whose properties can be tuned through additional terminals enable a fine and dynamic control of the device threshold. They also enable designers to realize logic gates and to implement power-related techniques in a compact way unreachable with standard technologies. These innovations reduce power consumption at the gate level and unlock new means of actuation in architectural solutions like adaptive voltage and frequency scaling.

Journal ArticleDOI
TL;DR: This article investigates the design of a neuro-inspired logic block (NLB) dedicated to on-chip function learning and proposes a learning strategy and supervised learning methods to demonstrate the ability to learn logic functions with memristive nanodevices.
Abstract: Scaling down beyond CMOS transistors requires the combination of new computing paradigms and novel devices. In this context, neuromorphic architectures are developed to achieve robust and ultra-low-power computing systems. Memristive nanodevices are often associated with this architecture to efficiently implement synapses at ultra-high density. In this article, we investigate the design of a neuro-inspired logic block (NLB) dedicated to on-chip function learning and propose a learning strategy. It is composed of an array of memristive nanodevices acting as synapses associated with neuronal circuits. Supervised learning methods are proposed for different types of memristive nanodevices, and simulations are performed to demonstrate the ability to learn logic functions with memristive nanodevices. Benefiting from a compact implementation of neuron circuits and the optimization of the learning process, this architecture requires a small number of nanodevices and moderate power consumption.

Journal ArticleDOI
TL;DR: This article presents a fully autonomous and battery-less circuit solution for piezoelectric energy harvesting based on discrete components in a low-cost PCB technology, which achieves a comparable performance in a 32 × 43 mm2 footprint.
Abstract: In the field of energy harvesting there is a growing interest in power management circuits with intrinsic sub-μA current consumptions, in order to operate efficiently with very low levels of available power. In this context, integrated circuits proved to be a viable solution with high associated nonrecurring costs and design risks. As an alternative, this article presents a fully autonomous and battery-less circuit solution for piezoelectric energy harvesting based on discrete components in a low-cost PCB technology, which achieves a comparable performance in a 32 × 43 mm2 footprint. The power management circuit implements synchronous electric charge extraction (SECE) with a passive bootstrap circuit from fully discharged states. Circuit characterization showed that the circuit consumes less than 1μA with a 3V output and may achieve energy conversion efficiencies of up to 85%. In addition, the circuit is specifically designed for operating with input and output voltages up to 20V, which grants a significant flexibility in the choice of transducers and energy storage capacitors.

Journal ArticleDOI
TL;DR: This work proposes a write-aware STTRAM-based RF architecture (WarRF), which contains two techniques: Split Bank Write modifies the arbitrator design to increase the parallelism of read and write accesses in the same bank; Write Pool reduces the number of repeated write accesses to RFs.
Abstract: The massively parallel processing capacity of GPGPUs requires a large register file (RF), and its size keeps increasing to support more concurrent threads from generation to generation. Using traditional SRAM-based RFs, there are concerns in both area cost and energy consumption, and soon they will become unrealistic. In this work, we analyze the feasibility of using STTRAM-based RF designs, which have benefits in terms of smaller silicon area and zero standby leakage power. However, STTRAM long write latency and high write energy bring new challenges. Therefore, we propose a write-aware STTRAM-based RF architecture (WarRF), which contains two techniques: Split Bank Write modifies the arbitrator design to increase the parallelism of read and write accesses in the same bank; Write Pool reduces the number of repeated write accesses to RFs. Our experiment shows that the performance of STTRAM-based RF is improved by 13% and up to 23% after adopting WarRF. In addition, the energy consumption is reduced by 38% on average compared to SRAM-based RFs.

Journal ArticleDOI
TL;DR: A methodology based on both ab-initio simulations and post-processing of data for analyzing an mQCA system from an electronic point of view is identified, and an assessment of the energy of an mQCA, one of the most promising features of this technology, is started.
Abstract: Molecular quantum-dot cellular automata (mQCA) is an emerging paradigm for nanoscale computation. Its revolutionary features are the expected operating frequencies (THz), the high device densities, the noncryogenic working temperature, and, above all, the limited power densities. The main drawback of this technology is a consequence of one of its very main advantages, that is, the extremely small size of a single molecule. Device prototyping and the fabrication of a simple circuit are limited by lack of control in the technological process [Pulimeno et al. 2013a]. Moreover, high defectivity might strongly impact the correct behavior of mQCA devices. Another challenging point is the lack of a solid method for analyzing and simulating mQCA behavior and performance, either in ideal or defective conditions. Our contribution in this article is threefold: (i) We identify a methodology based on both ab-initio simulations and post-processing of data for analyzing an mQCA system adopting an electronic point of view (we baptized this method as “MoSQuiTo”); (ii) we assess the performance of an mQCA device (in this case, a bis-ferrocene molecule) working in nonideal conditions, using as a reference the information on fabrication-critical issues and on the possible defects that we are obtaining while conducting our own ongoing experiments on mQCA; (iii) we determine and assess the electrostatic energy stored in a bis-ferrocene molecule both in an oxidized and reduced form. Results presented here consist of quantitative information for an mQCA device working in manifold driving conditions and subjected to defects. This information is given in terms of: (a) output voltage; (b) safe operating area (SOA); (c) electrostatic energy; and (d) relation between SOA and energy, that is, possible energy reduction subject to reliability and functionality constraints. The whole analysis is a first fundamental step toward the study of a complex mQCA circuit. It gives important suggestions on possible improvements of the technological processes. Moreover, it starts an interesting assessment on the energy of an mQCA, one of the most promising features of this technology.

Journal ArticleDOI
TL;DR: An adaptive write scheme adjusts the write pulses to address variations in memristive arrays, resulting in 7×–11× average energy saving in case studies, and helps shorten the test time of memory march algorithms.
Abstract: Recent advances in access-transistor-free memristive crossbars have demonstrated the potential of memristor arrays as high-density and ultra-low-power memory. However, with considerable variations in the write-time characteristics of individual memristors, conventional fixed-pulse write schemes cannot guarantee reliable completion of the write operations and waste a significant amount of energy. We propose an adaptive write scheme that adaptively adjusts the write pulses to address such variations in memristive arrays, resulting in 7×–11× average energy saving in our case studies. Our scheme embeds an online monitor to detect the completion of a write operation and takes into account the parasitic effect of line-shared devices in access-transistor-free crossbars. This feature also helps shorten the test time of memory march algorithms by eliminating the need for a verifying read right after a write, which is commonly employed in the test sequences of march algorithms.
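A rough sketch of completion-triggered writing under stated assumptions: the write voltage stays applied only until an online monitor detects switching, so slow devices get longer effective pulses and fast ones are not over-driven. The cell and monitor objects are hypothetical handles; the paper's monitor is an analog circuit that also accounts for line-shared parasitics:

```python
def adaptive_write(cell, monitor, v_write, dt=10e-9, timeout=2e-6):
    """Apply the write voltage and poll an online completion monitor,
    terminating the pulse as soon as the device has switched."""
    t, energy = 0.0, 0.0
    cell.assert_voltage(v_write)
    try:
        while t < timeout:
            t += dt
            energy += v_write * cell.current() * dt   # energy delivered so far
            if monitor.switch_detected():              # completion: end the pulse early
                return t, energy
        raise RuntimeError("write did not complete before timeout")
    finally:
        cell.assert_voltage(0.0)                       # always remove the write voltage
```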

Journal ArticleDOI
TL;DR: This article proposes modeling cortical plasticity in mammalian brains as a problem of estimating a probability density function that would correspond to the nature and the richness of the environment perceived through multiple modalities and defines and develops a novel neural model solving the problem in a distributed and sparse manner.
Abstract: Neurobiological systems have often been a source of inspiration for computational science and engineering, but in the past their impact has also been limited by the understanding of biological models. Today, new technologies lead to an equilibrium situation where powerful and complex computers bring new biological knowledge of the brain behavior. At this point, we possess sufficient understanding both to imagine new brain-inspired computing paradigms and to sustain a classical paradigm which is reaching its programming and intellectual limitations. In this context, we propose to reconsider the computation problem first in the specific domain of mobile robotics. Our main proposal consists in considering computation as part of a global adaptive system, composed of sensors, actuators, a source of energy and a controlling unit. During the adaptation process, the proposed brain-inspired computing structure not only executes the tasks of the application but also reacts to the external stimulation and acts on the emergent behavior of the system. This approach is inspired by cortical plasticity in mammalian brains and suggests developing the computation architecture along with the system's experience. This article proposes modeling this plasticity as a problem of estimating a probability density function. This function would correspond to the nature and the richness of the environment perceived through multiple modalities. We define and develop a novel neural model solving the problem in a distributed and sparse manner. We then integrate this neural map into a bio-inspired hardware substrate that brings the plasticity property into parallel many-core architectures. The approach is then called Hardware Plasticity. The results show that the self-organization properties of our model solve the problem of multimodal sensory data clusterization. The properties of the proposed model allow envisaging the deployment of this adaptation layer into hardware architectures embedded into the robot's body in order to build intelligent controllers.

Journal ArticleDOI
TL;DR: A FinFET-based low-swing clocking methodology is introduced to preserve the dynamic power savings of low-swing clocking while minimizing these three negative effects, facilitated through an efficient use of FinFET technology.
Abstract: A low-swing clocking methodology is introduced to achieve low-power operation at the 20nm FinFET technology node. Low-swing clock trees are used in existing methodologies in order to decrease the dynamic power consumption, in a trade-off involving three issues: (1) the effect of leakage power consumption, which becomes more dominant as the process scales below 32nm; (2) the increase in insertion delay, resulting in a high clock skew; and (3) the difficulty in driving the existing DFF sinks with a low-swing clock signal without a timing violation. In this article, a FinFET-based low-swing clocking methodology is introduced to preserve the dynamic power savings of low-swing clocking while minimizing these three negative effects, facilitated through an efficient use of FinFET technology. At scaled performance constraints, the proposed methodology at 20nm FinFET leads to 42% total power savings (clock network+DFF) compared to a FinFET-based full-swing counterpart at the same frequency (3 GHz), thanks to the dynamic power savings of low-swing clocking, and 3% power savings compared to a CMOS-based low-swing implementation running at half the frequency (1.5 GHz), thanks to the leakage power savings of FinFET technology.

Journal ArticleDOI
TL;DR: A Single Writer Multiple Reader (SWMR) bus-based crossbar mNoC is presented that can achieve more than 88% reduction in energy for a 64×64 crossbar compared to similar ring resonator based designs and can scale to a 256×256 crossbar with an average 10% performance improvement and 54% energy reduction.
Abstract: Moore's law and the continuity of device scaling have led to an increasing number of cores/nodes on a chip, creating a need for new mechanisms to achieve high-performance and power-efficient Network-on-Chip (NoC). Nanophotonics-based NoCs provide for higher bandwidth and more power-efficient designs than electronic networks. Present approaches often use an external laser source, ring resonators, and waveguides. However, they still suffer from important limitations: large static power consumption, and limited network scalability. In this article, we explore the use of emerging molecular-scale devices to construct nanophotonic networks: the Molecular-scale Network-on-Chip (mNoC). We leverage on-chip emitters such as quantum dot LEDs, which provide electrical-to-optical signal modulation, and chromophores, which provide optical signal filtering for receivers. These devices replace the ring resonators and the external laser source used in contemporary nanophotonic NoCs. They reduce energy consumption or enable scaling to larger crossbars for a reduced energy budget. We present a Single Writer Multiple Reader (SWMR) bus-based crossbar mNoC. Our evaluation shows that an mNoC can achieve more than 88% reduction in energy for a 64×64 crossbar compared to similar ring-resonator-based designs. Additionally, an mNoC can scale to a 256×256 crossbar with an average 10% performance improvement and 54% energy reduction.

Journal ArticleDOI
TL;DR: This article aims to define a reliability criterion for NoC and provide a framework for quantifying this reliability as it relates to TSV issues, and for the first time, the reliability criterion is reduced to a tractable closed-form expression that requires a single Monte Carlo simulation.
Abstract: The network-on-chip (NoC) technology allows for integration of a manycore design on a single chip for higher efficiency and scalability. Three-dimensional (3D) NoCs offer several advantages over two-dimensional (2D) NoCs. Through-silicon via (TSV) technology is one of the candidates for implementation of 3D NoCs. TSV reliability analysis is still challenging for 3D NoC designers because of their unique electrical, thermal, and physical characteristics. After providing an overview of common TSV issues, this article aims to define a reliability criterion for NoC and provide a framework for quantifying this reliability as it relates to TSV issues. TSV issues are modeled as a time-invariant failure probability. Also, a reliability criterion for TSV-based NoC is defined. The relationship between NoC reliability and TSV failure is quantified. For the first time, the reliability criterion is reduced to a tractable closed-form expression that requires a single Monte Carlo simulation. Importantly, the Monte Carlo simulation depends only on network geometry. To demonstrate our proposed method, the reliability criterion of a simple 8×8×8 NoC supported by an 8×8×7 network of TSVs is calculated.
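As a small illustration of a geometry-only Monte Carlo of the kind the article relies on, the sketch below samples independent TSV failures over the 8×8×7 TSV network and uses a simple stand-in criterion (every layer interface keeps at least one working TSV); the paper's actual reliability criterion is different, but it is likewise evaluated from the network geometry and a time-invariant failure probability:

```python
import random

def mc_vertical_connectivity(p_fail, tsvs_per_interface=64, interfaces=7,
                             trials=100_000, seed=1):
    """Estimate the probability that every layer-to-layer interface retains at
    least one functional TSV when each TSV fails independently with p_fail.
    The estimate depends only on the network geometry and p_fail."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        if all(any(rng.random() > p_fail for _ in range(tsvs_per_interface))
               for _ in range(interfaces)):
            ok += 1
    return ok / trials

print(mc_vertical_connectivity(p_fail=0.05))  # ~1.0: 64 TSVs per interface is ample
```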

Journal ArticleDOI
TL;DR: The proposed MN-MATE, an elastic resource management architecture for a single cloud node with manycores, on-chip DRAM, and a large amount of off-chip DRAM and NVRAM, improves system performance and reduces energy consumption.
Abstract: The recent advent of manycore systems increases the need for a larger but faster memory hierarchy. Emerging next-generation memories such as on-chip DRAM and nonvolatile memory (NVRAM) are promising candidates for the replacement of DRAM-only main memory. Combined with the manycore trend, this gives an opportunity to rethink the conventional resource management system with a memory hierarchy for a single cloud node. In an attempt to mitigate the energy and memory problems, we propose MN-MATE, an elastic resource management architecture for a single cloud node with manycores, on-chip DRAM, and a large amount of off-chip DRAM and NVRAM. In MN-MATE, the hypervisor places consolidated VMs and balances memory among them. Based on the monitored information about the allocated memory, a guest OS co-schedules tasks accessing different types of memory with complementary access intensity. Polymorphic management of the DRAM hierarchy accelerates average memory access speed inside each guest OS. A guest OS reduces energy consumption with small performance loss based on the NVRAM-aware data placement policy and the hybrid page cache. A new lightweight kernel is developed to reduce the overhead from the guest OS for scientific applications. Experimental results show that our techniques in the MN-MATE platform improve system performance and reduce energy consumption.

Journal ArticleDOI
TL;DR: This article introduces a hardware-friendly model adapted from the CNFT, namely the RSDNF model (randomly spiking dynamic neural fields), which achieves scalable parallel implementations on digital hardware while maintaining the behavioral properties of CNFT models.
Abstract: Bio-inspired neural computation attracts a lot of attention as a possible solution for the future challenges in designing computational resources. Dynamic neural fields (DNF) provide cortically inspired models of neural populations to which computation can be applied for a wide variety of tasks, such as perception and sensorimotor control. DNFs are often derived from continuous neural field theory (CNFT). In spite of the parallel structure and regularity of CNFT models, few studies of hardware implementations have been carried out targeting embedded real-time processing. In this article, a hardware-friendly model adapted from the CNFT is introduced, namely the RSDNF model (randomly spiking dynamic neural fields). Thanks to their simplified 2D structure, RSDNFs achieve scalable parallel implementations on digital hardware while maintaining the behavioral properties of CNFT models. Spike-based computations within neurons in the field are introduced to reduce interneuron connection bandwidth. Additionally, local stochastic spike propagation ensures inhibition and excitation broadcast without a fully connected network. The behavioral soundness and robustness of the model in the presence of noise and distracters is fully validated through software and hardware. A field programmable gate array (FPGA) implementation shows how the RSDNF model ensures a level of density and scalability out of reach for previous hardware implementations of dynamic neural field models.
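For reference, dynamic neural field models of this kind are typically derived from the Amari-style CNFT equation below, where u is the field potential, w a local-excitation/broader-inhibition lateral kernel, f a firing-rate nonlinearity, s the external input, and h a resting level; the RSDNF model can be read as replacing the dense lateral integral with randomized spike propagation between neighbors:

```latex
\tau \,\frac{\partial u(\mathbf{x},t)}{\partial t}
  = -u(\mathbf{x},t)
  + \int_{\Omega} w(\mathbf{x}-\mathbf{x}')\, f\!\left(u(\mathbf{x}',t)\right)\, d\mathbf{x}'
  + s(\mathbf{x},t) + h
```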

Journal ArticleDOI
TL;DR: This article proposes computational simplifications and architectural optimizations of the original GBNN that lead to significant complexity and area reduction without affecting either memorizing or retrieving performance.
Abstract: The brain processes information through a complex hierarchical associative memory organization that is distributed across a complex neural network. The GBNN associative memory model has recently been proposed as a new class of recurrent clustered neural network that presents higher efficiency than the classical models. In this article, we propose computational simplifications and architectural optimizations of the original GBNN. This work leads to significant complexity and area reduction without affecting either memorizing or retrieving performance. The obtained results open new perspectives in the design of neuromorphic hardware to support large-scale general-purpose neural algorithms.

Journal ArticleDOI
TL;DR: This work proposes a new device--multilevel DWM with shift-based write (ML-DWM-SW)--that is capable of storing 2 bits in a single device that achieves improved write efficiency and features decoupled read-write paths, enabling independent optimizations of read and write operations.
Abstract: Spintronic memories are considered to be promising candidates for future on-chip memories due to their high density, nonvolatility, and near-zero leakage. However, they also face challenges such as high write energy and latency and limited read speed due to single-ended sensing. Further, the conflicting requirements of read and write operations lead to stringent design constraints that severely compromise their benefits. Recently, domain wall memory was proposed as a spintronic memory that has a potential for very high density by storing multiple bits in the domains of a ferromagnetic nanowire. While reliable operation of DWM with multiple domains faces many challenges, single-bit cells that utilize domain wall motion for writes have been experimentally demonstrated [Fukami et al. 2009]. This bit-cell, which we refer to as Domain Wall Memory with Shift-based Write (DWM-SW), achieves improved write efficiency and features decoupled read-write paths, enabling independent optimizations of read and write operations. However, these benefits are achieved at the cost of sacrificing the original goal of improved density. In this work, we explore multilevel storage as a new direction to enhance the density benefits of DWM-SW. At the device level, we propose a new device--multilevel DWM with shift-based write (ML-DWM-SW)--that is capable of storing 2 bits in a single device. At the circuit level, we propose an ML-DWM-SW based bit-cell design and layout. The ML-DWM-SW bit-cell incurs no additional area overhead compared to the DWM-SW bit-cell despite storing an additional bit, thereby achieving roughly twice the density. However, it requires a two-step write operation and has data-dependent read and write energies, which pose unique challenges. To address these issues, we propose suitable architectural optimizations: (i) intra-word interleaving and (ii) bit encoding. We design “all-spin” cache architectures using the proposed ML-DWM-SW bit-cell for both general purpose processors as well as general purpose graphics processing units (GPGPUs). We perform an iso-capacity replacement of SRAM with spintronic memories and study the energy and area benefits at iso-performance conditions. For general purpose processors, the ML-DWM-SW cache achieves 10X reduction in energy and 4.4X reduction in cache area compared to an SRAM cache and 2X and 1.7X reduction in energy and area, respectively, compared to an STT-MRAM cache. For GPGPUs, the ML-DWM-SW cache achieves 5.3X reduction in energy and 3.6X area reduction compared to SRAM and 3.5X energy reduction and 1.9X area reduction compared to STT-MRAM.

Journal ArticleDOI
TL;DR: An energy-efficient deadlock-free routing algorithm is proposed for 3D mesh topologies where vertical connections partially exist; by introducing some rules for selecting elevators, it eliminates the dedicated virtual channel requirement.
Abstract: 3D integrated circuits (3D ICs) using through-silicon vias (TSVs) make it possible to envision the stacking of dies with different functions and technologies, using a 3D network-on-chip (NoC) as an interconnect backbone. However, partial vertical connection in 3D NoCs seems unavoidable because of the large overhead of the TSV itself (e.g., large footprint, low fabrication yield, additional fabrication processes) as well as the heterogeneity in dimension. This article proposes an energy-efficient deadlock-free routing algorithm for 3D mesh topologies where vertical connections partially exist. By introducing some rules for selecting elevators (i.e., vertical links between dies), the routing algorithm can eliminate the dedicated virtual channel requirement. In this article, the rules themselves as well as the proof of deadlock freedom are given. By eliminating the virtual channels for deadlock avoidance, the proposed routing algorithm reduces the energy consumption by 38.9% compared to a conventional routing algorithm. When the virtual channel is used for reducing head-of-line blocking, the proposed routing algorithm increases performance by up to 23.1% and by 6.9% on average.
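The overall shape of elevator-based routing on a partially connected 3D mesh can be sketched as below: route in-plane to a chosen elevator, traverse it vertically, then finish in-plane. The elevator_for selection function is a hypothetical placeholder; the article's contribution is precisely the selection rules that make such routing deadlock-free without a dedicated virtual channel:

```python
def _steps(a, b):
    """Intermediate coordinates strictly after a, up to and including b."""
    if a == b:
        return []
    step = 1 if b > a else -1
    return list(range(a + step, b + step, step))

def xy_route(x0, y0, x1, y1, z):
    """Dimension-ordered XY routing on a single layer: move along X, then Y."""
    return [(x, y0, z) for x in _steps(x0, x1)] + [(x1, y, z) for y in _steps(y0, y1)]

def route_partial_3d(src, dst, elevator_for):
    """Build the hop list: XY to the elevator, ride it vertically, XY to dst.
    elevator_for(src_layer, dst_layer, xy) is a hypothetical selection rule."""
    (x, y, z), (dx, dy, dz) = src, dst
    hops = []
    if z != dz:
        ex, ey = elevator_for(z, dz, (x, y))
        hops += xy_route(x, y, ex, ey, z)                     # reach the elevator column
        hops += [(ex, ey, layer) for layer in _steps(z, dz)]  # ride it vertically
        x, y = ex, ey
    hops += xy_route(x, y, dx, dy, dz)                        # finish on the destination layer
    return hops
```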

Journal ArticleDOI
TL;DR: A 3D multicore architecture that provides poolable cache resources and a runtime management policy to improve energy efficiency in 3D systems by utilizing the flexible heterogeneity of cache resources are introduced.
Abstract: Resource pooling, where multiple architectural components are shared among cores, is a promising technique for improving system energy efficiency and reducing total chip area. 3D stacked multicore processors enable efficient pooling of cache resources owing to the short interconnect latency between vertically stacked layers. This article first introduces a 3D multicore architecture that provides poolable cache resources. We then propose a runtime management policy to improve energy efficiency in 3D systems by utilizing the flexible heterogeneity of cache resources. Our policy dynamically allocates jobs to cores on the 3D system while partitioning cache resources based on cache hungriness of the jobs. We investigate the impact of the proposed cache resource pooling architecture and management policy in 3D systems, both with and without on-chip DRAM. We evaluate the performance, energy efficiency, and thermal behavior for a wide range of workloads running on 3D systems. Experimental results demonstrate that the proposed architecture and policy reduce system energy-delay product (EDP) and energy-delay-area product (EDAP) by 18.8% and 36.1% on average, respectively, in comparison to 3D processors with static cache sizes.
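A deliberately simplified stand-in for a hungriness-driven partitioning step (the article's runtime policy also handles job-to-core allocation and thermal behavior): give every core a minimum share of the pooled cache and hand the remaining ways to the hungriest jobs first. Field names and numbers are illustrative assumptions:

```python
def partition_pooled_cache(jobs, total_ways):
    """Greedy partition: one guaranteed way per core, then grant spare ways to
    jobs in decreasing order of their measured cache hungriness, capped by
    each job's demand."""
    shares = {job["core"]: 1 for job in jobs}               # at least one way per core
    spare = total_ways - len(jobs)
    for job in sorted(jobs, key=lambda j: j["hungriness"], reverse=True):
        grant = min(spare, job["demand"] - 1)               # top up the hungriest jobs first
        shares[job["core"]] += grant
        spare -= grant
        if spare == 0:
            break
    return shares

jobs = [{"core": 0, "hungriness": 0.9, "demand": 8},
        {"core": 1, "hungriness": 0.2, "demand": 4},
        {"core": 2, "hungriness": 0.6, "demand": 6}]
print(partition_pooled_cache(jobs, total_ways=16))          # {0: 8, 1: 2, 2: 6}
```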