Showing papers in "ACM Journal on Emerging Technologies in Computing Systems in 2016"


Journal ArticleDOI
TL;DR: This survey of RE and anti-RE techniques on the chip, board, and system levels should be of interest to both governmental and industrial bodies whose critical systems and intellectual property (IP) require protection from foreign enemies and counterfeiters who possess advanced RE capabilities.
Abstract: The reverse engineering (RE) of electronic chips and systems can be used with honest and dishonest intentions. To inhibit RE for those with dishonest intentions (e.g., piracy and counterfeiting), it is important that the community is aware of the state-of-the-art capabilities available to attackers today. In this article, we will be presenting a survey of RE and anti-RE techniques on the chip, board, and system levels. We also highlight the current challenges and limitations of anti-RE and the research needed to overcome them. This survey should be of interest to both governmental and industrial bodies whose critical systems and intellectual property (IP) require protection from foreign enemies and counterfeiters who possess advanced RE capabilities.

208 citations


Journal ArticleDOI
TL;DR: Security is treated as a design metric for emerging nano-architectures, and simulation results indicate that highly efficient and secure circuit structures can be achieved via the use of non-CMOS devices.
Abstract: Hardware security concerns such as intellectual property (IP) piracy and hardware Trojans have triggered research into circuit protection and malicious logic detection from various design perspectives. In this article, emerging technologies are investigated by leveraging their unique properties for applications in the hardware security domain. Security, for the first time, will be treated as one design metric for emerging nano-architecture. Five example circuit structures including camouflaging gates, polymorphic gates, current/voltage-based circuit protectors, and current-based XOR logic are designed to show the high efficiency of silicon nanowire FETs and graphene SymFET in applications such as circuit protection and IP piracy prevention. Simulation results indicate that highly efficient and secure circuit structures can be achieved via the use of non-CMOS devices.

77 citations


Journal ArticleDOI
TL;DR: Comprehensive analysis on machine-learning algorithms suggests that Support Vector Machine (SVM) and K-Nearest Neighbor (K-NN) have better attack detection efficiency.
Abstract: In this article, we propose a real-time anomaly detection framework for an NoC-based many-core architecture. We assume that processing cores and memories are safe and that anomalies are introduced through the communication medium (i.e., the router). The article targets three different attacks, namely, traffic diversion, route looping, and core address spoofing attacks. The attacks are detected by using machine-learning techniques. Comprehensive analysis on machine-learning algorithms suggests that Support Vector Machine (SVM) and K-Nearest Neighbor (K-NN) have better attack detection efficiency. It has been observed that both algorithms have accuracy in the range of 94% to 97%. Additional hardware complexity analysis advocates SVM to be implemented on hardware. To test the framework, we implement a condition-based attack insertion module; attacks are performed intra- and intercluster. The proposed real-time anomaly detection framework is fully placed and routed on Xilinx Virtex-7 FPGA. Post-place-and-route implementation results show that SVM has 12% to 2% area overhead and 3% to 1% power overhead for the quad-core and 16-core implementation, respectively. It is also observed that it takes 25% to 18% of the total execution time to detect an anomaly in transferred packets for quad-core and 16-core, respectively. The proposed framework achieves 65% reduction in area overhead and is 3 times faster compared to previously published work.

53 citations
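
To make the classification step in the abstract above concrete, the following is a minimal K-NN sketch in plain Python. It is not the article's implementation: the two features (hop count and inter-arrival time in cycles), the training points, and the name `knn_predict` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical packet features: (hop count, inter-arrival time in cycles).
TRAIN = [
    ((2, 10), "normal"), ((3, 12), "normal"),
    ((2, 11), "normal"), ((4, 13), "normal"),
    ((9, 40), "attack"), ((10, 42), "attack"), ((8, 39), "attack"),
]

print(knn_predict(TRAIN, (3, 11)))  # a packet close to the normal cluster
print(knn_predict(TRAIN, (9, 41)))  # a packet close to the attack cluster
```

A hardware detector would work on fixed-point features extracted in the router; the sketch only shows the distance-and-vote logic.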


Journal ArticleDOI
TL;DR: This article explores the benefits of having a non-volatile processor to enable ultra-low-power IoT devices; magnetic random access memory is a promising candidate, as it combines non-volatility, high density, reasonable latency, and low leakage.
Abstract: Over the past few years, a new era of smart connected devices has emerged in the market to enable the future world of the Internet of Things (IoT). A key requirement for IoT applications is the power consumption to allow very high autonomy in the case of battery-powered systems. Depending on the application, such devices will be most of the time in a low-power mode (sleep mode) and will wake up only when there is a task to accomplish (active mode). Emerging non-volatile memory technologies are seen as a very attractive solution to design ultra-low-power systems. Among these technologies, magnetic random access memory is a promising candidate, as it combines non-volatility, high density, reasonable latency, and low leakage. Integration of non-volatility as a new feature of memories has the great potential to allow full data retention after a complete shutdown with a fast wake-up time. This article explores the benefits of having a non-volatile processor to enable ultra-low-power IoT devices.

42 citations


Journal ArticleDOI
TL;DR: This article proposes an innovative PUF design based on STT-MRAM memory that exploits the high variability affecting the electrical resistance of the Magnetic Tunnel Junction (MTJ) device in anti-parallel magnetization and demonstrates that the proposed solution is robust, unclonable, and unpredictable.
Abstract: Physically Unclonable Functions (PUFs) are emerging cryptographic primitives used to implement low-cost device authentication and secure secret key generation. Weak PUFs (i.e., devices able to generate a single signature or to deal with a limited number of challenges) are widely discussed in literature. One of the most investigated solutions today is based on SRAMs. However, the rapid development of low-power, high-density, high-performance SoCs has pushed the embedded memories to their limits and opened the field to the development of emerging memory technologies. The Spin-Transfer-Torque Magnetic Random Access Memory (STT-MRAM) has emerged as a promising choice for embedded memories due to its reduced read/write latency and high CMOS integration capability. In this article, we propose an innovative PUF design based on STT-MRAM memory. We exploit the high variability affecting the electrical resistance of the Magnetic Tunnel Junction (MTJ) device in anti-parallel magnetization. We will demonstrate that the proposed solution is robust, unclonable, and unpredictable.

39 citations


Journal ArticleDOI
TL;DR: A survey of architectural techniques for using DWM to design components in both CPUs and GPUs is presented, covering techniques related to performance, energy, and reliability as well as works that compare DWM with other memory technologies.
Abstract: Recent trends of increasing core-count and bandwidth/memory wall have motivated researchers to explore novel memory technologies for designing processor components such as cache, register file, shared memory, and so on. Domain-wall memory (DWM), also known as racetrack memory, is a promising emerging technology due to its non-volatility and very high density. However, use of DWM presents challenges due to characteristics of both DWM itself (e.g., requirement of shift operations, variable latency) and processor components. Recently, several techniques have been proposed to address these challenges. This article presents a survey of architectural techniques for using DWM for designing components in both CPU and GPU. We discuss techniques related to performance, energy, and reliability and also discuss works that compare DWM with other memory technologies. We also highlight the opportunities and obstacles in using DWM for designing processor components. This survey is expected to spark further research in this area and be useful for researchers, chip designers, and computer architects.

35 citations


Journal ArticleDOI
TL;DR: This article investigates and compares the layouts and parasitic capacitances and resistances of HVTFETs with FinFETs, and models the analytical parasitics in SPICE in order to analyze their impact.
Abstract: Vertical tunnel field-effect transistors (VTFETs) have been extensively explored to overcome the scaling limits and to improve on-current (ION) compared to standard lateral device structures for the future technologies. The benefits in terms of reduced footprint, high ION and feasibility of fabrication have been demonstrated in several works. Among various VTFETs, the asymmetric heterojunction vertical tunnel FETs (HVTFETs) have emerged as one of the promising alternatives to standard transistors for low-voltage applications. However, while such device-level benefits without parasitics have been widely investigated, logic-gate design with parasitics and layout implications are not clear. In this article, we investigate and compare the layouts and parasitic capacitances and resistances of HVTFETs with FinFETs. Due to the vertical device structure of HVTFETs, a smaller footprint is observed compared to FinFETs in cells with small fan-in. However, for high fan-in cells, HVTFETs exhibit area overheads due to infeasibility of contact sharing in parallel and series transistors. These area overheads also lead to approximately 48% higher parasitic capacitance and resistance compared to FinFETs when the number of parallel and series connections increases. Further, in order to analyze the impact of parasitics, we modeled the analytical parasitics in SPICE. The models for both HVTFETs and FinFETs with parasitics were used to simulate a 15-stage inverter-based ring oscillator (RO) in order to compare the delay and energy. Our simulation results clearly show that HVTFETs exhibit less delay at a VDD

28 citations


Journal ArticleDOI
TL;DR: It is observed that a smaller fin width improves the RF and analog performance of junctionless accumulation-mode bulk FinFETs, which makes the device a candidate for high-efficiency RF integrated circuit design.
Abstract: In this article, the RF and analog performance of junctionless accumulation-mode bulk FinFETs is analyzed as the fin width is varied, with a view to using the device in high-efficiency RF integrated circuit design. The RF/analog performance evaluation has been carried out using the ATLAS 3D device simulator in terms of figure-of-merit metrics such as transconductance (gm), gate-to-source/drain capacitances (Cgg), cutoff frequency (fT), and maximum frequency of oscillation (fmax). Apart from the RF/analog performance investigation, the variation of the ON-current to OFF-current ratio (ION/IOFF) and the transconductance generation factor (gm/Ids) has also been studied. From this study, it is observed that smaller fin width of the device improves its performance.

21 citations


Journal ArticleDOI
TL;DR: A novel storage strategy optimized for feedforward neural networks is proposed in this work, which greatly reduces the energy and area cost of the memristor array and its peripherals.
Abstract: Due to their nonvolatile nature, excellent scalability, and high density, memristive nanodevices provide a promising solution for low-cost on-chip storage. Integrating memristor-based synaptic crossbars into digital neuromorphic processors (DNPs) may facilitate efficient realization of brain-inspired computing. This article investigates architectural design exploration of DNPs with memristive synapses by proposing two synapse readout schemes. The key design tradeoffs involving different analog-to-digital conversions and memory accessing styles are thoroughly investigated. A novel storage strategy optimized for feedforward neural networks is proposed in this work, which greatly reduces the energy and area cost of the memristor array and its peripherals.

20 citations


Journal ArticleDOI
TL;DR: Spintronic physically unclonable functions (PUFs) are proposed that exploit security-specific properties of domain wall memory (DWM) for security, trust, and authentication; they show promising results in terms of randomness, stability, and resistance to attacks.
Abstract: We propose spintronic physically unclonable functions (PUFs) to exploit security-specific properties of domain wall memory (DWM) for security, trust, and authentication. We note that the nonlinear dynamics of domain walls (DWs) in the physical magnetic system is an untapped source of entropy that can be leveraged for hardware security. The spatial and temporal randomness in the physical system is employed in conjunction with microscopic and macroscopic properties such as stochastic DW motion, stochastic pinning/depinning, and serial access to realize novel relay-PUF and memory-PUF designs. The proposed PUFs show promising results (∼50% interdie Hamming distance (HD) and 10% to 20% intradie HD) in terms of randomness, stability, and resistance to attacks. We have investigated noninvasive attacks, such as machine learning and magnetic field attack, and have assessed the PUFs' resilience.

20 citations
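
The interdie and intradie Hamming distance figures quoted in the abstract above are standard PUF quality metrics. The sketch below shows one common way to compute them from response bit strings. The function names and the 8-bit responses are hypothetical and invented for illustration; they are not data from the article.

```python
from itertools import combinations

def hamming_fraction(a: str, b: str) -> float:
    """Fraction of bit positions in which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("responses must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def inter_die_hd(responses: list[str]) -> float:
    """Mean pairwise HD between different dies answering the same challenge."""
    pairs = list(combinations(responses, 2))
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)

def intra_die_hd(reference: str, repeats: list[str]) -> float:
    """Mean HD between one die's reference response and its re-measurements."""
    return sum(hamming_fraction(reference, r) for r in repeats) / len(repeats)

# Hypothetical 8-bit responses (illustration only).
dies = ["10110010", "01100111", "11011000"]
remeasured = ["10110010", "10110011", "10100010"]

print(f"interdie HD: {inter_die_hd(dies):.3f}")
print(f"intradie HD: {intra_die_hd(dies[0], remeasured):.3f}")
```

An interdie value near 0.5 indicates that different dies produce uncorrelated responses, while a low intradie value indicates that a single die reproduces its own response reliably.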


Journal ArticleDOI
TL;DR: There is a growing concern regarding the trustworthiness and reliability of the hardware underlying all information systems on which modern society is reliant, and hardware-based attacks and hardware-based security primitives rooted in emerging technologies are investigated.
Abstract: There is a growing concern regarding the trustworthiness and reliability of the hardware underlying all information systems on which modern society is reliant. Trustworthy and reliable semiconductor supply chain, hardware components, and platforms are essential to all critical infrastructures including financial, healthcare, transportation, and energy. Traditionally, the information systems underlying all critical infrastructures were being protected—specifically the authenticity, integrity, and confidentiality of the information was being ensured—using security protocols implemented in software running on hardware platforms that were assumed to be trustworthy and reliable. However, this assumption is no longer true; an increasing number of attacks are being reported on the hardware root of trust [https://isis.poly.edu/esc/2014/index.html]. Since 2008, NYU has been organizing the annual Embedded Security Challenge (ESC) to demonstrate the ease and feasibility of hardware-based attacks on information systems. As part of this annual event, ESC2014 challenged the hardware security and emerging technologies communities to investigate hardware-based attacks and hardware-based security primitives rooted in emerging technologies according to the tutorial papers on this topic [Rajendran et al. 2012, 2015]. ESC 2014 had three phases [https://isis.poly.edu/esc/2014/index.html]. In phase 1, 14 teams submitted a 2-page proposal that described an emerging technology, the structure and operation of the security primitives that exploited the unique characteristics of the chosen emerging technology, the threat model that the security primitives target, the security metrics used to evaluate the security primitives and applications of the developed security primitives. Ten promising proposals were down-selected for Phase 2 of ESC 2014. In this phase, participants developed and evaluated their emerging technology-based security primitives. 
In the ESC 2014 finals held at NYU in November 2014, as part of the annual NYU Cyber Security Awareness Week, the ten finalists demonstrated and presented their security primitives and submitted a final report. Examples of security primitives included, but were not limited to, cryptographically secure pseudo-random number generators, public-key and private-key cryptography, one-way hash functions, and physical unclonable functions. Emerging technologies that were considered include: graphene transistors, atomic switches, memristors, Mott field effect transistor, spin FET, all-spin-logic, spin-wave devices, orthogonal spin-transfer random access memory, magneto-resistive random access memory, spintronic devices, nanomagnets, nano-electromechanical switches and phase-change memory. The finalists included Case Western Reserve University, Rochester Institute of Technology, University of Central Florida, University of Illinois at Urbana-Champaign, University of

Journal ArticleDOI
TL;DR: This work explores the limits and opportunities for using photonic links to design the NoC architecture for a future Kilocore system, and investigates the use of prefetching and aggressive non-blocking caches.
Abstract: The increasing core count in manycore systems requires a corresponding large Network-on-chip (NoC) bandwidth to support the overlying applications. However, it is not possible to provide this large bandwidth in an energy-efficient manner using electrical link technology. To overcome this issue, photonic link technology has been proposed as a replacement. This work explores the limits and opportunities for using photonic links to design the NoC architecture for a future Kilocore system. Three different NoC designs are explored: ElecNoC, an electrical concentrated two-dimensional- (2D) mesh NoC; HybNoC, an electrical concentrated 2D mesh with a photonic multi-crossbar NoC; and PhotoNoC, a photonic multi-bus NoC. We consider both private and shared cache architectures and, to leverage the large bandwidth density of photonic links, we investigate the use of prefetching and aggressive non-blocking caches. Our analysis using contemporary Big Data workloads shows that the non-blocking caches with a shared LLC can best leverage the large bandwidth of the photonic links in the Kilocore system. Moreover, compared to ElecNoC-based and HybNoC-based Kilocore systems, a PhotoNoC-based Kilocore system achieves up to 2.5× and 1.5× better performance, respectively, and can support up to 2.1× and 1.1× higher bandwidth, respectively, while dissipating comparable power in the overall system.

Journal ArticleDOI
TL;DR: A synthesis algorithm, featuring both input-variable ordering and dynamic product term ordering, is proposed for area minimization; it can achieve an area reduction of up to 29% as compared to existing state-of-the-art techniques.
Abstract: Power dissipation has become a pressing issue of concern in the designs of most electronic systems as fabrication processes enter even deeper submicron regions. More specifically, leakage power plays a dominant role in system power dissipation. An emerging circuit design style, the reconfigurable single-electron transistor (SET) array, has been proposed for continuing Moore's Law due to its ultra-low leakage power consumption. Recently, several works have been proposed to address the issues related to automated synthesis for the reconfigurable SET array. Nevertheless, all of those existing approaches consider mandatory fabrication constraints of SET array merely in late synthesis stages. In this article, we propose a synthesis algorithm, featuring input-variable ordering and dynamic product term ordering, for area minimization. The fabrication constraints are taken into account at every synthesis stage of the proposed flow to guarantee better synthesis outcomes. We also develop a simulated annealing-based postprocess to find a proper phase assignment of each input variable for further area reduction. Experimental results show that our new methodology can achieve up to 29% area reduction as compared to existing state-of-the-art techniques.

Journal ArticleDOI
TL;DR: This work introduces a novel attack methodology to recover the secret key employed in implementations of the Elliptic Curve Digital Signature Algorithm, exploiting the information leakage induced when altering the execution of the modular arithmetic operations used in the signature primitive.
Abstract: Elliptic curve cryptosystems proved to be well suited for securing systems with constrained resources like embedded and portable devices. In a fault-based attack, errors are induced during the computation of a cryptographic primitive, and the results are collected to derive information about the secret key safely stored in the device. We introduce a novel attack methodology to recover the secret key employed in implementations of the Elliptic Curve Digital Signature Algorithm. Our attack exploits the information leakage induced when altering the execution of the modular arithmetic operations used in the signature primitive and does not rely on the underlying elliptic curve mathematical structure, thus being applicable to all standardized curves. We provide both a validation of the feasibility of the attack, even employing common off-the-shelf hardware to perform the required computations, and a low-cost countermeasure to counteract it.

Journal ArticleDOI
TL;DR: A model for the beat frequency detector-based high-speed TRNG (BFD-TRNG) is proposed, and the key contribution of the proposed approach lies in fitting the model to measured data and the ability to use the model to predict performance of BFD-TRNGs that have not been fabricated.
Abstract: True random number generators (TRNGs) are crucial components for the security of cryptographic systems. In contrast to pseudo-random number generators (PRNGs), TRNGs provide higher security by extracting randomness from physical phenomena. To evaluate a TRNG, statistical properties of the circuit model and raw bitstream should be studied. In this article, a model for the beat frequency detector-based high-speed TRNG (BFD-TRNG) is proposed. The parameters of the model are extracted from the experimental data of a test chip. A statistical analysis of the proposed model is carried out to derive mean and variance of the counter values of the TRNG. Our statistical analysis results show that mean of the counter values is inversely proportional to the frequency difference of the two ring oscillators (ROSCs), whereas the dynamic range of the counter values increases linearly with standard deviation of environmental noise and decreases with increase of the frequency difference. Without the measurements from the test data, a model cannot be created; similarly, without a model, performance of a TRNG cannot be predicted. The key contribution of the proposed approach lies in fitting the model to measured data and the ability to use the model to predict performance of BFD-TRNGs that have not been fabricated. Several novel alternate BFD-TRNG architectures are also proposed; these include parallel BFD, cascade BFD, and parallel-cascade BFD. These TRNGs are analyzed using the proposed model, and it is shown that the parallel BFD structure requires less area per bit, whereas the cascade BFD structure has a larger dynamic range while maintaining the same mean of the counter values as the original BFD-TRNG. It is shown that 3.25M and 4M random bits can be obtained per counter value from parallel BFD and parallel-cascade BFD, respectively, where M counter values are computed in parallel.
Furthermore, the statistical analysis results illustrate that BFD-TRNGs have better randomness and less cost per bit than other existing ROSC-TRNG designs. For example, it is shown that BFD-TRNGs accumulate 150% more jitter than the original two-oscillator TRNG and that parallel BFD-TRNGs require one-third power and one-half area for same number of random bits for a specified period.
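
The abstract above states that the mean counter value is inversely proportional to the frequency difference of the two ring oscillators. The sketch below illustrates that relationship under an idealized, noise-free assumption: the counter is taken to count sampling-clock cycles over one beat period, so the count is roughly the sampling frequency divided by the frequency difference. This is an illustrative simplification, not the article's fitted model, and the function name and frequencies are hypothetical.

```python
def ideal_mean_count(f_sampling_hz: float, f_sampled_hz: float) -> float:
    """Idealized count of sampling-clock cycles in one beat period.

    Assumption: beat frequency = |f_sampling - f_sampled| and no jitter.
    """
    delta = abs(f_sampling_hz - f_sampled_hz)
    if delta == 0:
        raise ValueError("oscillator frequencies must differ")
    return f_sampling_hz / delta

# Hypothetical ring-oscillator frequencies (illustration only).
f_b = 200e6
for delta in (0.5e6, 1e6, 2e6):
    count = ideal_mean_count(f_b, f_b + delta)
    print(f"frequency difference {delta / 1e6:.1f} MHz -> mean count {count:.0f}")
```

Doubling the frequency difference halves the idealized count, which matches the inverse proportionality described in the abstract. In a real BFD-TRNG the randomness comes from jitter spreading the counts around this mean.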

Journal ArticleDOI
TL;DR: This article presents a detailed survey and review of the areas of computer architecture and software systems that are oriented to PCM devices, and identifies key technical challenges that need to be addressed before this memory technology can be leveraged to build high-performance computer systems.
Abstract: With dramatic growth of data and rapid enhancement of computing powers, data accesses become the bottleneck restricting overall performance of a computer system. Emerging phase-change memory (PCM) is byte-addressable like DRAM, persistent like hard disks and Flash SSD, and about four orders of magnitude faster than hard disks or Flash SSDs for typical file system I/Os. The maturity of PCM from research to production provides a new opportunity for improving the I/O performance of a system. However, PCM also has some weaknesses, for example, long write latency, limited write endurance, and high active energy. Existing processor cache systems, main memory systems, and online storage systems are unable to leverage the advantages of PCM, and/or to mitigate PCM’s drawbacks. The reason behind this incompetence is that they are designed and optimized for SRAM, DRAM memory, and hard drives, respectively, instead of PCM memory. There have been some efforts concentrating on rethinking computer architectures and software systems for PCM. This article presents a detailed survey and review of the areas of computer architecture and software systems that are oriented to PCM devices. First, we identify key technical challenges that need to be addressed before this memory technology can be leveraged, in the form of processor cache, main memory, and online storage, to build high-performance computer systems. Second, we examine various designs of computer architectures and software systems that are PCM aware. Finally, we obtain several helpful observations and propose a few suggestions on how to leverage PCM to optimize the performance of a computer system.

Journal ArticleDOI
TL;DR: In this paper, failure-aware ECC (FaECC) is proposed to mask permanent faults while maintaining the same correction capability for soft errors without increased number of encoded bits.
Abstract: Spin-Transfer Torque MRAMs are attractive due to their non-volatility, high density, and zero leakage. However, STT-MRAMs suffer from poor reliability due to shared read and write paths. Additionally, conflicting requirements for data retention and writeability (both related to the energy barrier height of the storage device) make design more challenging. Furthermore, the energy barrier height depends on the geometry of the storage device. Any variations in the geometry of the storage device lead to variations in the energy barrier height. In order to address the poor reliability of STT-MRAMs, usage of Error Correcting Codes (ECC) has been proposed. Unlike traditional CMOS memory technologies, ECC is expected to correct both soft and hard errors in STT-MRAMs. To achieve acceptable yield with low write power, stronger ECC is required, resulting in increased number of encoded bits and degraded memory capacity. In this article, we propose Failure-aware ECC (FaECC), which masks permanent faults while maintaining the same correction capability for soft errors without increased number of encoded bits. Furthermore, we investigate the impact of process variations on run-time reliability of STT-MRAMs. In order to analyze the effectiveness of our methodology, we developed a cross-layer simulation framework that consists of device, circuit and array level analysis of STT-MRAM memory arrays. Our results show that using FaECC relaxes the requirements on the energy barrier height, which reduces the write energy and results in smaller access transistor size and memory array area.

Journal ArticleDOI
TL;DR: A wear-resistant page allocation algorithm is developed, which exploits the diverse write characteristics of different program segments to improve PCM write endurance with almost no extra remapping cost in terms of energy and performance.
Abstract: Improving the endurance of phase change memory (PCM) is a fundamental issue when PCM technology is considered as an alternative to main memory usage. Existing wear-leveling techniques overcome this challenge through constantly remapping hot virtual pages, thus engendering a fair amount of extra write operations to PCM and imposing considerable performance and energy overhead. Our observation is that it is unnecessary to fully balance the accesses to different physical page frames during the execution of each process. Instead, since endurance is a lifetime factor, the hot virtual pages of different processes can be mapped to different physical pages in the PCM. Leveraging this property, we develop a wear-resistant page allocation algorithm, which exploits the diverse write characteristics of different program segments to improve PCM write endurance with almost no extra remapping cost in terms of energy and performance. The results of experiments conducted based on SPEC benchmarks show that the proposed technique can prolong PCM lifetime by hundreds of times with nearly zero searching and remapping overhead.
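
The idea in the abstract above, placing the write-hot pages of each new process on physical frames that have seen little wear instead of remapping pages at run time, can be illustrated with a simple greedy allocator. This is a sketch of the general idea only, not the article's algorithm: the function name, the wear counters, and the per-segment write estimates are all hypothetical.

```python
def allocate_pages(free_frame_wear: dict[int, int],
                   pages: list[tuple[str, int]]) -> dict[str, int]:
    """Map each page to a free frame, giving the hottest pages the least-worn frames.

    free_frame_wear: frame number -> writes absorbed so far.
    pages: (page name, expected number of writes) for the new process.
    """
    if len(pages) > len(free_frame_wear):
        raise ValueError("not enough free frames")
    # Least-worn frames first; ties broken by frame number for determinism.
    frames = sorted(free_frame_wear, key=lambda f: (free_frame_wear[f], f))
    # Hottest pages first.
    hot_first = sorted(pages, key=lambda p: -p[1])
    return {name: frame for (name, _), frame in zip(hot_first, frames)}

# Hypothetical wear counters and segment write estimates (illustration only).
wear = {0: 900, 1: 50, 2: 400, 3: 10}
process_pages = [("code", 0), ("stack", 800), ("heap", 300)]

mapping = allocate_pages(wear, process_pages)
print(mapping)
```

Because the decision is made once at allocation time, no pages are migrated later, which is the source of the low remapping overhead that the abstract emphasizes.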

Journal ArticleDOI
TL;DR: This work develops a fully characterized system-on-chip from the basic cell up to the system architecture in a 40nm LP hybrid CMOS/magnetic process and implements a check-pointing methodology based on the regular interrupt routines of a processor to enable a fast power on and off functionality.
Abstract: The most widely used embedded memory technology, static random access memory (SRAM), is heading toward scaling problems in advanced technology nodes due to the leakage currents caused by the quantum tunneling effect. As an alternative, spin-transfer torque magnetic RAM (STT-MRAM) technology shows comparable performance in terms of speed and power consumption and much better performance in terms of density and leakage. Moreover, MRAM brings up new paradigms in system design thanks to its inherent nonvolatility, which allows the definition of new instant-on/off policies and leakage current optimization. Based on our compact model, we have developed a fully characterized system-on-chip from the basic cell up to the system architecture in a 40nm LP hybrid CMOS/magnetic process. Through simulations, first we demonstrate that STT-MRAM is a candidate for the memory part of embedded systems, and second we implement a check-pointing methodology based on the regular interrupt routines of a processor to enable a fast power on and off functionality. Using a synthetic benchmark developed in high-level programming languages intended to be representative of integer system performance, our method shows that having MRAM instead of SRAM in an embedded design brings up important energy savings. The influence of the check-pointing routine on power consumption is finally evaluated with regard to various shutdown and restart behaviors.

Journal ArticleDOI
TL;DR: This work investigates the relation between the number of gates and the number of splitters, which affect the signal strength, in optical circuits by considering a variety of synthesis approaches.
Abstract: Optical circuits are considered a promising emerging technology for applications in ultra-high-speed networks or interconnects. However, the development of (automatic) synthesis approaches for such circuits is still in its infancy. Although first generic and automatic synthesis approaches have been proposed, no clear understanding exists yet on how to keep the costs of the resulting circuits as small as possible. In the domain of optical circuits, this is particularly interesting for the number of gates and the effect of so-called splitters on the signal strength. In this work, we investigate this relation by considering a variety of (existing as well as proposed) synthesis approaches for optical circuits. Our investigations show that reducing the number of gates and reducing the number of splitters are contradictory optimization objectives. Furthermore, the performance of synthesis guided with respect to gate efficiency as well as synthesis guided with respect to splitter freeness is evaluated and an overhead factor between the contradictory metrics is experimentally determined.

Journal ArticleDOI
TL;DR: Two techniques for improving geometry-based MRAM PUFs are studied: choosing the cell geometry to improve the reliability of the digital signature, and a new MRAM PUF architecture in which resistances in MRAM cells are used to generate analog voltage outputs. The threat resilience of the improved MRAM PUF to probing-, tampering-, reuse-, and simulation-based attacks is also discussed.
Abstract: In this work, we have studied two novel techniques to enhance the performance of existing geometry-based magnetoresistive RAM physically unclonable function (MRAM PUF). Geometry-based MRAM PUFs rely only on geometric variations in MRAM cells that generate preferred ground state in cells and form the basis of digital signature generation. Here we study two novel ways to improve the performance of the geometry-based PUF signature. First, we study how the choice between specific geometries can enhance the reliability of the digital signature. Using fabrications and simulations, we study how the rectangular shape in the PUF cells is more susceptible to lithography-based geometric variations than the elliptical shape of the same aspect ratio. The choice of rectangular over elliptical masks in the lithography process can therefore improve the reliability of the digital signature from PUF. Second, we present an MRAM PUF architecture and study how resistances in MRAM cells can be used to generate analog voltage outputs that are easier to detect if probed by an adversary. In the new PUF architecture, we have the choice between selection of rows and columns to generate unique and hard-to-predict analog voltage outputs. For a 64-bit response, the analog voltage output can range between 20 and 500 mV, making it tough for an adversary to guess over this wide range of voltages. This work ends with a discussion on the threat resilience ability of the new improved MRAM PUF to attacks from probing-, tampering-, reuse-, and simulation-based models.

Journal ArticleDOI
TL;DR: An approach to the synthesis of secure real-time applications mapped on distributed embedded systems, which focuses on preventing fault injection attacks of the security protection on processing units, and proposes an efficient algorithm based on the fruit fly optimization algorithm.
Abstract: Fault injection attack has been a serious threat to security-critical embedded systems for a long time, yet existing research ignores addressing of the problem from a system-level perspective. This article presents an approach to the synthesis of secure real-time applications mapped on distributed embedded systems, which focuses on preventing fault injection attacks of the security protection on processing units. We utilize symmetric cryptographic service to protect confidentiality and deploy fault detection within a confidential algorithm to resist fault injection attacks. Several fault detection schemes are identified, and their fault coverage rates and time overheads are derived and measured. Our synthesis approach makes efforts to determine the best fault detection schemes for the encryption/decryption of messages such that the overall security strength of detecting a fault injection attack is maximized and the deadline constraint of the real-time applications is guaranteed. Due to the complexity of the problem, we propose an efficient algorithm based on the fruit fly optimization algorithm, and we compare it to the simulated annealing approach. Extensive experiments and a real-life application evaluation demonstrate the superiority of our approach.

Journal ArticleDOI
TL;DR: The experimental results indicate that the robustness of this newly proposed design is significantly enhanced in comparison with its fault-tolerant wire-based counterparts in the presence of various faulty regions under both synthetic and application-specific traffic patterns.
Abstract: Wireless Network-on-Chip (WNoC) architectures have emerged as a promising interconnection infrastructure to address the performance limitations of traditional wire-based multihop NoCs. Nevertheless, WNoC systems encounter high failure rates due to problems pertaining to the integration and manufacturing of wireless interconnections in nano-domain technology. As a result, permanent failures may lead to the formation of faulty regions of any shape in the interconnection network, which can break down the whole system. This issue has not been investigated in previous studies on WNoC architectures. Our solution advocates the adoption of communication structures with both node and link on disjoint paths. On the other hand, the imposed costs of WNoC design must be reasonable. Hence, a novel approach to designing an optimized fault-tolerant hybrid hierarchical WNoC architecture that enhances performance while minimizing system costs is proposed. The experimental results indicate that the robustness of this newly proposed design is significantly enhanced in comparison with its fault-tolerant wire-based counterparts in the presence of various faulty regions under both synthetic and application-specific traffic patterns.

Journal ArticleDOI
TL;DR: A novel software-hardware co-designed solution (i.e., Red-Shield) is proposed, which consists of three optimizations to overcome the limitations of the existing solutions to combat the read disturbance of STT-RAM.
Abstract: To address the high energy consumption issue of SRAM on GPUs, emerging Spin-Transfer Torque RAM (STT-RAM) memory technology has been intensively studied to build GPU register files for better energy efficiency, thanks to its benefits of low leakage power, high density, and good scalability. However, STT-RAM suffers from the read disturbance issue, which stems from the fact that the voltage difference between read current and write current becomes smaller as technology scales. The read disturbance leads to high error rates for read operations, which cannot be effectively protected by the SEC-DED ECC on large-capacity register files of GPUs. Prior schemes (e.g., read-restore) to mitigate the read disturbance usually incur either non-trivial performance loss or excessive energy overhead and are thus not applicable to the GPU register file design, which aims to achieve both high performance and energy efficiency. To combat the read disturbance, we propose a novel software-hardware co-designed solution (i.e., Red-Shield), which consists of three optimizations to overcome the limitations of the existing solutions. First, we identify dead reads at the compilation stage and augment instructions to avoid unnecessary restores. Second, we employ a small read buffer to accommodate register reads with high access locality to further reduce restores. Third, we propose an adaptive restore mechanism to selectively pick the suitable restore scheme according to the busy status of the corresponding register banks. Experimental results show that our proposed design can effectively mitigate the performance loss and energy overhead caused by restore operations while still maintaining the reliability of reads.
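The abstract does not say how dead reads are identified, so the following is only a minimal sketch under stated assumptions: a read is treated as "dead" when the value it reads is never read again before the register is overwritten, the code is straight-line (no branches), and no register is live after the last instruction. The function name `find_dead_reads` and the `(dest, sources)` instruction encoding are hypothetical, chosen for illustration.

```python
def find_dead_reads(program):
    """Return sorted (instruction_index, register) pairs for dead reads.

    `program` is a list of (dest, sources) tuples in execution order.
    Assumption: straight-line code, and no register is live at the end.
    """
    live = set()  # registers whose current value is still read later on
    dead = set()
    # Walk backward so that `live` describes what follows each instruction.
    for index in range(len(program) - 1, -1, -1):
        dest, sources = program[index]
        for src in set(sources):
            # The value read is never needed again if this instruction
            # overwrites it, or if nothing later reads it.
            if src == dest or src not in live:
                dead.add((index, src))
        live.discard(dest)    # the old value of dest ends here
        live.update(sources)  # sources must be valid before this point
    return sorted(dead)


# Hypothetical four-instruction example.
program = [
    ("r2", ("r0", "r1")),  # r2 = r0 + r1
    ("r3", ("r2", "r0")),  # r3 = r2 * r0
    ("r0", ("r3",)),       # r0 = r3
    ("r4", ("r0", "r2")),  # r4 = r0 + r2
]

if __name__ == "__main__":
    for index, register in find_dead_reads(program):
        print(f"instruction {index}: read of {register} needs no restore")
```

Under these assumptions, a read flagged here could skip the restore step because the disturbed value is never consumed again. A real compiler pass would also need control flow and live-out information.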

Journal ArticleDOI
TL;DR: A new technique to reduce the magnet count for an ASL majority gate but still ensure correct functioning through layout optimization methods is proposed and a standard cell library with diverse functionality is built, which results in circuits that are 12.90% faster, consume 26.16% less energy, and are 33.56% more area efficient.
Abstract: All-Spin Logic (ASL) devices provide a promising spintronics-based alternative for Boolean logic implementations in the post-Complementary Metal-Oxide Semiconductor (CMOS) era. In principle, any logic functionality can be implemented in ASL. In practice, the performance of an ASL gate is significantly affected by layout choices, but such implications have not been adequately explored in the past. This article proposes a systematic approach for building standard cells in ASL, which are a basic building block in an overall design methodology for implementing large ASL-based circuits. We first propose a new technique to reduce the magnet count for an ASL majority gate but still ensure correct functioning through layout optimization methods. Building on physics-based analysis, we then build a standard cell library with diverse functionality and characterize the library for delay, energy, and area. We perform delay-optimized technology mapping on ISCAS85 benchmark circuits using our library. Our approach results in circuits that are 12.90% faster, consume 26.16% less energy, and are 33.56% more area efficient compared to a standard cell library that does not incorporate layout-based optimization techniques of our work.
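As a minimal illustration of why the majority gate is a useful primitive for such a cell library, the sketch below uses standard Boolean logic only and does not model the authors' layout or magnet-count optimization. Fixing one input of a three-input majority gate to 0 gives AND, and fixing it to 1 gives OR. The function names are hypothetical.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority: 1 when at least two inputs are 1."""
    return 1 if (a + b + c) >= 2 else 0


def and_from_majority(a: int, b: int) -> int:
    """AND obtained by tying the third majority input to 0."""
    return majority(a, b, 0)


def or_from_majority(a: int, b: int) -> int:
    """OR obtained by tying the third majority input to 1."""
    return majority(a, b, 1)


if __name__ == "__main__":
    print("a b AND OR")
    for a, b in product((0, 1), repeat=2):
        print(a, b, and_from_majority(a, b), or_from_majority(a, b))
```

Together with inversion, these two derived gates are enough to express any Boolean function.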

Journal ArticleDOI
TL;DR: This work proposes eight source authentication mechanisms that can achieve a level of security similar to that of SHA-3 from a router configuration perspective without causing a significant area and power increase.
Abstract: It is known that maliciously configured Network-on-Chip routers can enable an attacker to launch different attacks inside a Multiprocessor System-on-Chip. A source authentication mechanism for router configuration packets can prevent such a vulnerability. This ensures that a router is configured only by configuration packets sent by a trusted configuration source. A conventional method such as Secure Hash Algorithm-3 (SHA-3) can provide the required source authentication in a router, but with a router area overhead of 1355.25% compared to a normal router area. We propose eight source authentication mechanisms that can achieve a level of security similar to that of SHA-3 from a router configuration perspective without causing a significant area and power increase. Moreover, the processing time of our proposed techniques is 1/100th that of the SHA-3 implementation. Most of our proposed techniques use different timing channel watermarking methods to transfer source authentication data to the receiver router. We also propose the individual packet-based stream authentication technique and combinations of this technique with timing channel watermarking techniques. It is shown that, among all of our proposed techniques, the maximum router area increase required is 28.32% compared to a normal router.

Journal ArticleDOI
TL;DR: An index for IAQ that considers both thermal comfort and non-toxicity is proposed and calculated for offices in several European countries, using data available from previous studies, and for Portugal as well.
Abstract: In 2002, the European Commission (EU) issued a Directive aiming to reduce the energy consumption of buildings, which was adopted by the EU member states and came into force in 2006. Portugal adopted it by issuing law decrees in 2006 that considered not only the energy-saving aspects but also additional specific measures aiming to protect indoor air quality (IAQ). This new legislation is now being enforced, and it will be necessary to define compliance acceptance levels for the prescribed indoor air limits. The use of comfort or environmental indexes could be of considerable help in improving the evaluation of IAQ. This paper presents a proposal for an IAQ index that considers both thermal comfort and non-toxicity. The proposed index was calculated for offices in several European countries, using data available from previous studies, and for Portugal as well. Bearing in mind that few data exist, this study is consistent with the proposed index, as the obtained values are similar to those for Greece, which has several similarities with the Portuguese situation.

Journal ArticleDOI
TL;DR: It is shown, conversely to what is generally assumed, that frontside injection can provide even better results compared to backside injection, especially for low-cost beams with a large laser spot.
Abstract: The development of cryptographic devices was followed by the development of so-called implementation attacks, which are intended to retrieve secret information exploiting the hardware itself. Among these attacks, fault attacks can be used to disturb the circuit while performing a computation to retrieve the secret. Among possible means of injecting a fault, laser beams have proven to be accurate and powerful. The laser can be used to illuminate the circuit either from its frontside (i.e., where metal interconnections are first encountered) or from the backside (i.e., through the substrate). Historically, frontside injection was preferred because it does not require the die to be thinned. Nevertheless, due to the increasing integration of metal layers in modern technologies, frontside injections do not allow targeting of any desired location. Indeed, metal lines act as mirrors, and they reflect and refract most of the energy provided by the laser beam. Conversely, backside injections, although more difficult to set up, allow an increase of the resolution of the target location and remove the drawbacks of the frontside technique. This article compares experimental results from frontside and backside fault injections. The effectiveness of the two techniques is measured in terms of exploitable errors on an AES circuit (i.e., errors that can be used to extract the value of the secret key used during the encryption process). We will show, conversely to what is generally assumed, that frontside injection can provide even better results compared to backside injection, especially for low-cost beams with a large laser spot.

Journal ArticleDOI
TL;DR: A simplified phase model is proposed to perform phase and frequency synchronization prediction based on a synthesis of earlier models, enabling the effective and efficient simulation of the large numbers of oscillators required for practical computing systems.
Abstract: Building oscillator-based computing systems with emerging nano-device technologies has become a promising solution for unconventional computing tasks like computer vision and pattern recognition. However, simulation and analysis of these computing systems are both time- and compute-intensive due to the nonlinearity of new devices and the complex behavior of coupled oscillators. In order to speed up the simulation of coupled oscillator systems, we propose a simplified phase model to perform phase and frequency synchronization prediction based on a synthesis of earlier models. Our model can predict the frequency-locking behavior with several orders of magnitude speedup compared to direct evaluation, enabling the effective and efficient simulation of the large numbers of oscillators required for practical computing systems. We demonstrate the oscillator-based computing paradigm with three applications: pattern matching, convolution, and image segmentation. The simulations with these models are sped up by factors of 780, 300, and 1120, respectively, in our tests.
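The abstract does not give the authors' phase model, so the sketch below substitutes the textbook Kuramoto model for two coupled oscillators to illustrate what predicting frequency locking from a phase-only description can look like. All names, parameter values, and the forward-Euler integrator are assumptions made for illustration. For this two-oscillator case, locking is expected when the frequency difference does not exceed twice the coupling strength.

```python
import math


def simulate_two_oscillators(omega1, omega2, coupling, dt=0.01, steps=20000):
    """Forward-Euler integration of two Kuramoto-coupled phase oscillators.

    Returns the mean frequency of each oscillator measured over the
    second half of the run, after initial transients have decayed.
    """
    theta1, theta2 = 0.0, 0.0
    half = steps // 2
    start1 = start2 = 0.0
    for step in range(steps):
        if step == half:
            start1, start2 = theta1, theta2
        # Each oscillator is pulled toward the phase of the other.
        rate1 = omega1 + coupling * math.sin(theta2 - theta1)
        rate2 = omega2 + coupling * math.sin(theta1 - theta2)
        theta1 += dt * rate1
        theta2 += dt * rate2
    window = (steps - half) * dt
    return (theta1 - start1) / window, (theta2 - start2) / window


def is_frequency_locked(omega1, omega2, coupling, tol=1e-3):
    """True when both oscillators settle to the same mean frequency."""
    f1, f2 = simulate_two_oscillators(omega1, omega2, coupling)
    return abs(f1 - f2) < tol


if __name__ == "__main__":
    # Natural frequencies differ by 0.2 rad/s in both cases.
    print("strong coupling locked:", is_frequency_locked(1.0, 1.2, 0.5))
    print("weak coupling locked:  ", is_frequency_locked(1.0, 1.2, 0.05))
```

A phase-only model like this tracks one state variable per oscillator instead of the full device waveform, which is the general reason such reductions can be much cheaper than direct circuit-level evaluation.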

Journal ArticleDOI
TL;DR: It is shown that the considered devices are able to intrinsically tolerate a rather high number of faults, and this property is exploited to build a robust and scalable adder whose area, performance, and leakage power characteristics are improved by 15%, 18%, and 12%, respectively, when compared to an equivalent FinFET solution at the 22nm technology node.
Abstract: This article first explores the effects of faults on circuits implemented with controllable-polarity transistors. We propose a new fault model that suits the characteristics of these devices, and we report the results of a SPICE-based analysis of the effects of faults on the behavior of some basic gates implemented with them. We thereby show that the considered devices are able to intrinsically tolerate a rather high number of faults. Finally, we exploit this property to build a robust and scalable adder whose area, performance, and leakage power characteristics are improved by 15%, 18%, and 12%, respectively, when compared to an equivalent FinFET solution at the 22nm technology node.