
Showing papers in "IEEE Embedded Systems Letters in 2018"


Journal ArticleDOI
TL;DR: The proposed design, even with the additional error recovery module, is more accurate, requires less hardware, and consumes less power than previously proposed 4–2 compressor-based approximate multiplier designs.
Abstract: Approximate multiplication is a common operation used in approximate computing methods for high performance and low power computing. Power-efficient circuits for approximate multiplication can be realized with an approximate 4–2 compressor. This letter presents a novel design that uses a modification of a previous approximate 4–2 compressor design and adds an error recovery module. The proposed design, even with the additional error recovery module, is more accurate, requires less hardware, and consumes less power than previously proposed 4–2 compressor-based approximate multiplier designs.

130 citations
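As a rough illustration of the underlying idea only (not the letter's actual compressor design or its error recovery module), the following Python sketch compares exact 4-2 compression against a simplified approximate variant and tabulates the error distribution over all input patterns:

# Behavioural sketch of an exact vs. a simplified approximate 4-2 compressor.
# The approximate logic below is only illustrative; the letter's actual design
# (and its error-recovery module) differs.
from itertools import product

def exact_value(x1, x2, x3, x4):
    """Exact 4-2 compression (carry-in ignored for simplicity): value = x1+x2+x3+x4."""
    return x1 + x2 + x3 + x4

def approx_compressor(x1, x2, x3, x4):
    """Illustrative approximate 4-2 compressor: cout dropped, simplified sum/carry logic."""
    sum_  = x1 | x2 | x3 | x4            # cheap OR instead of the exact XOR tree
    carry = (x1 & x2) | (x3 & x4)        # cheap AND/OR carry
    return sum_ + 2 * carry              # reconstructed arithmetic value

errors = {}
for bits in product((0, 1), repeat=4):
    err = approx_compressor(*bits) - exact_value(*bits)
    errors[err] = errors.get(err, 0) + 1

print("error distribution over all 16 inputs:", errors)
# An error-recovery module, as in the letter, would add correction terms for the
# input patterns that contribute the largest errors.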


Journal ArticleDOI
TL;DR: This letter discusses extensions to HLS tools for creating secure heterogeneous architectures for system-on-chip architectures.
Abstract: High-level synthesis (HLS) tools have made significant progress in the past few years, improving the design productivity for hardware accelerators and becoming mainstream in industry to create specialized system-on-chip architectures. Increasing the level of security of these heterogeneous architectures is becoming critical. However, state-of-the-art security countermeasures are still applied only to the code executing on the processor cores or manually implemented into the generated components, leading to suboptimal and sometimes even insecure designs. This letter discusses extensions to HLS tools for creating secure heterogeneous architectures.

67 citations


Journal ArticleDOI
TL;DR: Experimental results indicated that the encryption layer and the IPS increased the security of the link between the PLC and the supervisory software, preventing interception, injection, and DoS attacks.
Abstract: During their nascent stages, programmable logic controllers (PLCs) were made robust to sustain tough industrial environments, but little care was taken to raise defenses against potential cyberthreats. The recent interconnectivity of legacy PLCs and supervisory control and data acquisition (SCADA) systems with corporate networks and the Internet has significantly increased the threats to critical infrastructure. To counter these threats, researchers have put their efforts into finding defense mechanisms that can protect the SCADA network and the PLCs. Encryption and intrusion prevention systems (IPSs) have been used by many organizations to protect data and the network against cyber-attacks. However, since PLC vendors do not make information about their hardware or software available, it becomes challenging for researchers to embed security mechanisms into their devices. This letter describes an alternative design using an open source PLC that was modified to encrypt all data it sends over the network, independently of the protocol used. Additionally, a machine learning-based IPS was added to the PLC network stack, providing a secure mechanism against network flood attacks such as denial of service (DoS). Experimental results indicated that the encryption layer and the IPS increased the security of the link between the PLC and the supervisory software, preventing interception, injection, and DoS attacks.

63 citations


Journal ArticleDOI
TL;DR: In this paper, the authors discuss the need for energy transparency in software development, emphasize how such transparency can be realized to help tackle the IoT energy challenge, and propose a concept that makes a program's energy consumption visible, from hardware to software.
Abstract: The Internet of Things (IoT) sparks a whole new world of embedded applications. Most of these applications are based on deeply embedded systems that have to operate on limited or unreliable sources of energy, such as batteries or energy harvesters. Meeting the energy requirements for such applications is a hard challenge, which threatens the future growth of the IoT. Software has the ultimate control over hardware. Therefore, its role is significant in optimizing the energy consumption of a system. Currently, programmers have no feedback on how their software affects the energy consumption of a system. Such feedback can be enabled by energy transparency, a concept that makes a program’s energy consumption visible, from hardware to software. This letter discusses the need for energy transparency in software development and emphasizes how such transparency can be realized to help tackle the IoT energy challenge.

39 citations


Journal ArticleDOI
TL;DR: A taxonomy that classifies approximate computing techniques according to salient features: visibility, determinism, and coarseness is presented to address questions about the correctability, reproducibility, and control over accuracy–efficiency tradeoffs of different techniques.
Abstract: Approximate computing is the idea that systems can gain performance and energy efficiency if they expend less effort on producing a “perfect” answer. Approximate computing techniques propose various ways of exposing and exploiting accuracy–efficiency tradeoffs. We present a taxonomy that classifies approximate computing techniques according to salient features: visibility, determinism, and coarseness. These axes allow us to address questions about the correctability, reproducibility, and control over accuracy–efficiency tradeoffs of different techniques. We use this taxonomy to inform research challenges in approximate architectures, compilers, and applications.

39 citations


Journal ArticleDOI
TL;DR: This letter demonstrates that introducing diverse building blocks to implement the multiplier rather than cloning one building block achieves higher precision results, and shows that the proposed heterogeneous multiplier achieves more precise outputs than the tested circuits while improving performance and power tradeoffs.
Abstract: Approximate computing is a design paradigm considered for a range of applications that can tolerate some loss of accuracy. In fact, the bottleneck in conventional digital design techniques can be eliminated to achieve higher performance and energy efficiency by compromising accuracy. In this letter, a new architecture that engages accuracy as a design parameter is presented, where an approximate parallel multiplier using heterogeneous blocks is implemented. Based on design space exploration, we demonstrate that introducing diverse building blocks to implement the multiplier, rather than cloning one building block, achieves higher precision results. We report experimental results for precision, delay, and power dissipation, and compare with three previous approximate designs. Our results show that the proposed heterogeneous multiplier achieves more precise outputs than the tested circuits while improving performance and power tradeoffs.

36 citations
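The following Python sketch illustrates the heterogeneous idea on a toy 4x4 multiplier built from 2x2 blocks; the block designs and the mix explored in the letter are different, this only shows how mixing accurate and approximate building blocks by significance changes precision:

# Illustrative sketch of a block-based approximate multiplier: a 4x4 multiply is
# decomposed into four 2x2 sub-multiplications; "heterogeneous" here means mixing
# accurate and approximate 2x2 blocks by significance. The actual block designs
# and mix explored in the letter are not reproduced here.

def mul2x2_exact(a, b):
    return a * b

def mul2x2_approx(a, b):
    # Classic single-error 2x2 approximation: 3*3 is reported as 7 instead of 9.
    return 7 if (a == 3 and b == 3) else a * b

def mul4x4(a, b, blocks):
    """blocks maps ('lo'/'hi', 'lo'/'hi') operand halves to a 2x2 multiplier function."""
    al, ah = a & 0x3, (a >> 2) & 0x3
    bl, bh = b & 0x3, (b >> 2) & 0x3
    return (blocks[('lo', 'lo')](al, bl)
            + (blocks[('hi', 'lo')](ah, bl) << 2)
            + (blocks[('lo', 'hi')](al, bh) << 2)
            + (blocks[('hi', 'hi')](ah, bh) << 4))

# Heterogeneous mix: approximate only the least-significant block.
hetero = {('lo', 'lo'): mul2x2_approx, ('hi', 'lo'): mul2x2_exact,
          ('lo', 'hi'): mul2x2_exact,  ('hi', 'hi'): mul2x2_exact}
# Homogeneous mix: every block approximate (cheaper, but less precise).
homo = {k: mul2x2_approx for k in hetero}

def mean_abs_error(blocks):
    errs = [abs(mul4x4(a, b, blocks) - a * b) for a in range(16) for b in range(16)]
    return sum(errs) / len(errs)

print("mean |error| heterogeneous:", mean_abs_error(hetero))
print("mean |error| homogeneous:  ", mean_abs_error(homo))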


Journal ArticleDOI
TL;DR: This letter proposes a fast update mechanism for an SRAM-based TCAM and implements it on a Xilinx Virtex-6 field-programmable gate array; it is the first proposal of a content-update module for an SRAM-based TCAM, and it consumes the fewest possible clock cycles to update a TCAM word.
Abstract: Static random-access memory (SRAM)-based ternary content-addressable memory (TCAM) is an alternative to traditional TCAM in which the use of SRAM improves memory access speed, scalability, cost, and storage density compared to conventional TCAM. To use SRAM-based TCAMs confidently in applications, an update module (UM) is essential; the UM replaces the old TCAM contents with fresh contents. This letter proposes a fast update mechanism for an SRAM-based TCAM and implements it on a Xilinx Virtex-6 field-programmable gate array. To the best of the authors’ knowledge, this is the first proposal of a content-update module for an SRAM-based TCAM, and it consumes the fewest possible clock cycles to update a TCAM word.

26 citations
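A behavioural Python sketch of an SRAM-emulated TCAM with a simple word-update routine is given below; the letter's FPGA update mechanism and its cycle counts are not modelled, and the key/word sizes are arbitrary toy values:

# Behavioural sketch of an SRAM-emulated TCAM with a simple word-update routine.
# Real designs partition the key and use FPGA block RAMs; the update mechanism
# proposed in the letter (and its clock-cycle cost) is not modelled here.

KEY_BITS = 4          # width of the ternary words
NUM_WORDS = 8         # TCAM depth

# "SRAM": for every possible key value, a bit-vector of words that match it.
sram = [0] * (1 << KEY_BITS)

def word_matches(pattern, key):
    """pattern is a string over {'0','1','x'}, MSB first."""
    for i, p in enumerate(pattern):
        bit = (key >> (KEY_BITS - 1 - i)) & 1
        if p != 'x' and int(p) != bit:
            return False
    return True

def update_word(index, pattern):
    """Rewrite the match bit of one TCAM word in every SRAM row."""
    for key in range(1 << KEY_BITS):
        if word_matches(pattern, key):
            sram[key] |= (1 << index)
        else:
            sram[key] &= ~(1 << index)

def search(key):
    """Return indices of all matching words in one 'SRAM read'."""
    hits = sram[key]
    return [i for i in range(NUM_WORDS) if (hits >> i) & 1]

update_word(0, '10xx')
update_word(1, '1x01')
print(search(0b1001))   # -> [0, 1]
print(search(0b1011))   # -> [0]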


Journal ArticleDOI
TL;DR: The proposed 6T SRAM architecture uses three supply voltages to improve the static noise margin during read and write modes and also reduces leakage current in retention mode, hence, it allows aggressive supply voltage scaling for low power multimedia applications.
Abstract: A voltage-scalable SRAM architecture suitable for video applications where energy can be traded with output signal quality is presented. The proposed 6T SRAM architecture uses three supply voltages to improve the static noise margin during read and write modes and also reduces leakage current in retention mode; hence, it allows aggressive supply voltage scaling for low-power multimedia applications. Simulation results in IBM/Global Foundries cmos32soi 32-nm CMOS technology show a 69% power saving and a 63% improvement in image quality for the proposed array compared to a conventional single-supply 64-kB 6T SRAM at 0.70 V, 20 MHz. The proposed design also allows a dynamic power-quality tradeoff at run time and makes the 6T SRAM array a suitable power-efficient memory for different video/image applications.

22 citations


Journal ArticleDOI
TL;DR: The concept of a hierarchical dynamic goal manager that considers the priority, significance, and constraints of each application, while holistically coupling the overlapping and/or contradicting goals of different applications to satisfy embedded system constraints is presented.
Abstract: Many-core systems are highly complex and require thorough orchestration of different goals across the computing abstraction stack to satisfy embedded system constraints. Contemporary resource management approaches typically focus on a fixed objective, while neglecting the need for replanning (i.e., updating the objective function). This trend is particularly observable in existing resource allocation and application mapping approaches that allocate a task to a tile to maximize a fixed objective (e.g., the cores’ and network’s performance), while minimizing others (e.g., latency and power consumption). However, embedded system goals typically vary over time, and also over abstraction levels, requiring a new approach to orchestrate these varying goals. We motivate the problem by showcasing conflicts resulting from state-of-the-art fixed-objective resource allocation approaches, and highlight the need to incorporate dynamic goal management from the very early stages of design. We then present the concept of a hierarchical dynamic goal manager that considers the priority, significance, and constraints of each application, while holistically coupling the overlapping and/or contradicting goals of different applications to satisfy embedded system constraints.

16 citations


Journal ArticleDOI
TL;DR: A nonvolatile approximate lookup table, called Nvalt, is designed to significantly accelerate graphics processing unit (GPU) computation; a similarity metric appropriate for binary representation is defined by exploiting the analog characteristics of the nonvolatile content addressable memory.
Abstract: In this letter, we design a nonvolatile approximate lookup table, called Nvalt, to significantly accelerate graphics processing unit (GPU) computation. Our design stores high-frequency input patterns within an approximate Nvalt to model each application’s functionality. Nvalt searches for and returns the stored data best matching the input data to produce an approximate output. We define a similarity metric, appropriate for binary representation, by exploiting the analog characteristics of the nonvolatile content addressable memory. Our design controls the fraction of the application that runs on the approximate Nvalt versus the accurate GPU cores in order to tune the level of accuracy to user requirements. Our evaluation on seven general GPU applications shows that Nvalt can improve energy efficiency by 4.5× and performance by 5.7× on average while providing less than 10% average relative error compared to the baseline GPU.

16 citations
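The nearest-match idea can be sketched in a few lines of Python; Hamming distance stands in for the analog CAM similarity metric, and the stored patterns, the toy target function, and the query value below are all illustrative assumptions, not the letter's setup:

# Sketch of the approximate-lookup idea: store high-frequency input patterns and,
# on a query, return the value of the closest stored pattern under a bitwise
# similarity metric (Hamming distance here). The analog CAM search and GPU
# integration described in the letter are not modelled.

def hamming(a, b):
    return bin(a ^ b).count('1')

class ApproxLUT:
    def __init__(self):
        self.entries = []          # list of (input_pattern, output_value)

    def store(self, pattern, value):
        self.entries.append((pattern, value))

    def lookup(self, query):
        # Return the stored output whose input pattern best matches the query.
        pattern, value = min(self.entries, key=lambda e: hamming(e[0], query))
        return value

# Toy use: approximate x*x for 8-bit inputs with a handful of profiled entries.
lut = ApproxLUT()
for x in (16, 64, 112, 160, 208):   # "high-frequency" inputs from profiling
    lut.store(x, x * x)

query = 66
print("approx:", lut.lookup(query), " exact:", query * query)
# The error depends on how well the profiled patterns cover the actual inputs;
# a denser table trades storage for accuracy.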


Journal ArticleDOI
TL;DR: A hardware-based countermeasure against return address corruption in the processor stack is proposed and validated on the OpenRISC core with a minimal hardware modification of the targeted core and an easy integration at the application level.
Abstract: With the emergence of Internet of Things, embedded devices are increasingly the target of software attacks. The aim of these attacks is to maliciously modify the behavior of the software being executed by the device. The work presented in this letter has been developed for the Cyber Security Awareness Week Embedded Security Challenge. This contest focuses on memory corruption issues, such as stack overflow vulnerabilities. These low level vulnerabilities are the result of code errors. Once exploited, they allow an attacker to write arbitrary data in memory without limitations. We detail in this letter a hardware-based countermeasure against return address corruption in the processor stack. First, several exploitation techniques targeting stack return addresses are discussed; then, a lightweight hardware countermeasure is proposed and validated on the OpenRISC core. The proposed countermeasure follows the shadow stack concept, with a minimal hardware modification of the targeted core and easy integration at the application level.
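A minimal software model of the shadow-stack check might look as follows; the letter implements the equivalent logic in hardware alongside the OpenRISC core, so this sketch only shows the checking behaviour with made-up addresses:

# Minimal software model of the shadow-stack idea: every call pushes the return
# address to a protected copy, and every return is checked against it.

class ShadowStack:
    def __init__(self):
        self._stack = []

    def on_call(self, return_addr):
        self._stack.append(return_addr)

    def on_return(self, return_addr_from_memory):
        expected = self._stack.pop()
        if return_addr_from_memory != expected:
            raise RuntimeError(
                f"return address corrupted: got {return_addr_from_memory:#x}, "
                f"expected {expected:#x}")
        return expected

shadow = ShadowStack()
shadow.on_call(0x1000)            # call site pushes its return address
shadow.on_return(0x1000)          # normal return: passes

shadow.on_call(0x2000)
try:
    shadow.on_return(0x41414141)  # overwritten by a stack-smashing payload
except RuntimeError as e:
    print("attack detected:", e)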

Journal ArticleDOI
TL;DR: A novel technique is presented that clusters iterations according to data statistics and applies different approximations in each cluster to maximize performance gains; performance can be improved by up to 76%, with up to 21.7% stemming from clustering.
Abstract: Approximate computing (AC) obtains performance or energy gains by trading off computational accuracy in error-tolerant applications. At the hardware level, AC-aware high-level synthesis tools apply operation-level approximations to generate a quality-reduced register-transfer level design from an accurate high-level description. While existing tools primarily target energy savings, we focus on performance optimizations. Loops are often the most performance-critical application code structures. In a loop, iterations can have different impacts on output quality due to an inherent data-dependency of approximations. Exploiting iteration-wise data variations, we present a novel technique that clusters iterations according to data statistics and applies different approximations in each cluster to maximize performance gains. Performance can be improved by up to 76%, with up to 21.7% stemming from clustering.
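A simplified Python sketch of the clustering idea is shown below; input magnitude stands in for the richer data statistics, truncation stands in for the operation-level approximations, and the thresholds and levels are arbitrary assumptions rather than the letter's HLS flow or quality model:

# Sketch of data-driven clustering of loop iterations: iterations are grouped by
# a simple statistic of their input data and each cluster gets a different
# approximation level (here: number of truncated low-order bits).
import random

random.seed(0)
data = [random.uniform(0, 100) for _ in range(1000)]

def truncate(x, bits_dropped):
    """Crude value approximation: quantise x by dropping 'bits_dropped' LSBs."""
    step = 1 << bits_dropped
    return (int(x) // step) * step

# Cluster iterations by input magnitude (a stand-in for richer data statistics).
clusters = {'small': [], 'large': []}
for i, x in enumerate(data):
    clusters['small' if x < 50 else 'large'].append(i)

# Large values tolerate more truncation (smaller relative error), so approximate
# them more aggressively; small values are kept nearly accurate.
level = {'small': 1, 'large': 4}

approx_sum = 0.0
for name, idxs in clusters.items():
    for i in idxs:
        approx_sum += truncate(data[i], level[name])

exact_sum = sum(data)
print(f"relative error: {abs(exact_sum - approx_sum) / exact_sum:.3%}")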

Journal ArticleDOI
TL;DR: This letter presents the first fault injection attack based on JTAG, using the example of a privilege escalation attack, and details how this tool can be used either to check the feasibility of this attack by fault injection or to perform an actual attack.
Abstract: Fault injection attacks are widespread in the domain of smart cards and microcontrollers but have not yet become common on complex embedded microprocessors, such as the systems on chip found in smartphones, tablets, and automotive systems. The main explanation is the difficulty involved in injecting a fault at the right place and at the right time to make these attacks effective on such devices. However, for development and debugging, these devices provide new tools that can also be considered as enabling attacks. One of them, the JTAG debug tool, is present on most electronic devices today. In this letter, we present the first fault injection attack based on JTAG. Using the example of a privilege escalation attack, we detail how this tool can be used either to check the feasibility of this attack by fault injection or to perform an actual attack.

Journal ArticleDOI
TL;DR: A novel methodology to calculate the arithmetic error rate (AER) for deterministic approximate adder architectures, where the calculation of each output bit is restricted to a subset of the input bits, denoted as visibilities.
Abstract: In this letter, we present a novel methodology to calculate the arithmetic error rate (AER) for deterministic approximate adder architectures, where the calculation of each output bit is restricted to a subset of the input bits, denoted as visibilities. Such architectures have been widely proposed in the literature and are, e.g., obtained when splitting the carry chain in a carry-propagate adder into partitions each computed by a separate parallel adder, or when removing carry-lookahead operators in a parallel prefix adder. Our contribution is a unified calculus for determining the AER for: 1) such deterministic approximate adder architectures making use of visibilities and 2) the general case of arbitrarily (also nonuniformly) distributed input bits.
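For intuition, the AER of a simple block-partitioned approximate adder can be obtained by exhaustive enumeration for small widths; the letter instead derives it analytically, also for nonuniform input distributions, so the brute-force script below is only a reference point:

# Sketch of computing the arithmetic error rate (AER) of a simple approximate
# adder whose carry chain is split into independent blocks (each block's
# carry-in is assumed 0). Exhaustive enumeration over uniform inputs replaces
# the closed-form calculus developed in the letter.
from itertools import product

N, BLOCK = 8, 4                 # adder width and sub-adder width

def approx_add(a, b):
    result = 0
    for lo in range(0, N, BLOCK):
        mask = (1 << BLOCK) - 1
        blk = ((a >> lo) & mask) + ((b >> lo) & mask)
        result |= (blk & mask) << lo        # carry out of each block is dropped
    return result

errors = sum(1 for a, b in product(range(1 << N), repeat=2)
             if approx_add(a, b) != ((a + b) & ((1 << N) - 1)))
print(f"AER (uniform inputs): {errors / (1 << (2 * N)):.4f}")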

Journal ArticleDOI
TL;DR: This letter surveys memory safety in the context of embedded processors, and describes different attacks that can subvert the legitimate control flow, with a special focus on return-oriented programming.
Abstract: For more than two decades, memory safety violations and control-flow integrity attacks have been a prominent threat to the security of computer systems. Unlike general-purpose systems that are updated regularly, application-constrained devices typically run monolithic firmware that may not be updated in the lifetime of the device after being deployed in the field. Hence, the need for protections against memory corruption becomes even more prominent. In this letter, we survey memory safety in the context of embedded processors, and describe different attacks that can subvert the legitimate control flow, with a special focus on return-oriented programming. Based on common attack trends, we formulate the anatomy of typical memory corruption attacks and discuss powerful mitigation techniques that have been reported in the literature.

Journal ArticleDOI
TL;DR: In this paper, the self-correcting behavior of iterative algorithms is used to reduce the computational effort and bandwidth required for the execution of the discussed algorithm, especially when targeting special accelerator hardware.
Abstract: Approximate computing has been shown to provide new ways to improve the performance and power consumption of error-resilient applications. While many of these applications can be found in image processing, data classification, or machine learning, we demonstrate its suitability to a problem from scientific computing. Utilizing the self-correcting behavior of iterative algorithms, we show that approximate computing can be applied to the calculation of inverse matrix pth roots, which are required in many applications in scientific computing. Results show great opportunities to reduce the computational effort and bandwidth required for the execution of the discussed algorithm, especially when targeting special accelerator hardware.
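A small numpy sketch of the effect, assuming a Newton-type iteration X_{k+1} = (1/p) * X_k * ((p+1)*I - X_k^p * A) converging to A^(-1/p) and emulating approximate hardware by rounding intermediate products to float16, is given below; the accelerator mapping and the specific algorithm in the letter are not modelled:

# Reduced-precision inverse matrix p-th root via a self-correcting iteration.
# The float16 rounding mimics approximate arithmetic; the iteration still
# drives the residual down to roughly the level of the injected error.
import numpy as np

def inv_proot(A, p, iters=30, approx=False):
    n = A.shape[0]
    X = np.eye(n)
    for _ in range(iters):
        Xp = np.linalg.matrix_power(X, p)
        T = (p + 1) * np.eye(n) - Xp @ A
        if approx:
            T = T.astype(np.float16).astype(np.float64)   # emulate low precision
        X = (X @ T) / p
    return X

rng = np.random.default_rng(1)
E = rng.normal(scale=0.02, size=(4, 4))
A = np.eye(4) + (E + E.T) / 2            # symmetric matrix close to the identity

for approx in (False, True):
    X = inv_proot(A, p=2, approx=approx)
    residual = np.linalg.norm(np.linalg.matrix_power(X, 2) @ A - np.eye(4))
    print(f"approx={approx}: ||X^p A - I|| = {residual:.2e}")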

Journal ArticleDOI
TL;DR: Experimental results show that SVMDVA is an effective technique for reducing hotspot occurrences and increasing throughput for 3D-MCPs and can effectively limit the temperature increase in 3D-MCPs.
Abstract: Hotspots occur frequently in 3-D multicore processors (3D-MCPs) and they may adversely impact the reliability of the system and its lifetime. We present a support-vector-machine (SVM)-based dynamic voltage assignment (SVMDVA) strategy to select voltages among low-power and high-performance operating modes for reducing hotspots and optimizing performance in 3D-MCPs. The proposed SVMDVA can be employed in online, thermally constrained task schedulers. First, we reveal two different thermal regions of 3D-MCPs and extract key features of each region. Based on these key features, SVM models are constructed to predict the thermal behavior and the best operating mode of 3D-MCPs during runtime. By monitoring workload and temperature behavior with these SVM models, SVMDVA can effectively limit the temperature increase in 3D-MCPs. This is extremely important for accurately predicting the thermal behavior and providing the optimum operating condition of 3D-MCPs to achieve the best system performance. Experimental results show that SVMDVA is an effective technique for reducing hotspot occurrences (by 57.19%) and increasing throughput (by 25.41%) for 3D-MCPs.
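The mode-selection step can be sketched with an off-the-shelf SVM (scikit-learn's SVC); the features (temperature, workload intensity), the labels, and the toy policy below are synthetic placeholders, not the letter's thermal regions or feature set:

# Sketch of SVM-based operating-mode selection: learn a classifier that maps
# runtime features to a voltage/operating mode, then query it at runtime.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic training data: [temperature (C), workload (IPC)] -> mode label,
# 1 = high-performance voltage is safe, 0 = switch to low-power voltage.
temp = rng.uniform(40, 95, size=400)
load = rng.uniform(0.2, 2.0, size=400)
X = np.column_stack([temp, load])
y = ((temp + 15 * load) < 100).astype(int)    # toy ground-truth policy

clf = SVC(kernel="rbf", C=10.0).fit(X, y)

# Runtime: predict the mode for the current sensor/performance-counter readings.
samples = np.array([[55.0, 1.5], [88.0, 1.8]])
print(clf.predict(samples))    # expected to be close to [1 0] for this toy policy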

Journal ArticleDOI
TL;DR: The relationship between hardware failures and the AC paradigm is discussed and the opportunities of efficient test methods to improve yields of hardware blocks underlying AC components, as well as the associated conceptual and practical challenges are presented.
Abstract: As it becomes more and more evident that approximate computing (AC) is a key enabler for next-generation computer architectures, physical realization of AC systems is gaining in relevance. This letter discusses a crucial and under-investigated area: the relationship between hardware failures and the AC paradigm. We present the opportunities of efficient test methods to improve yields of hardware blocks underlying AC components, as well as the associated conceptual and practical challenges. We distinguish between incomplete designs and underprovisioned circuits and describe shortcomings of existing test methods for both these classes of AC blocks. We also discuss the intricate interplay of approximate circuitry with failures that manifest themselves during the device’s lifetime.

Journal ArticleDOI
TL;DR: Gandalf as discussed by the authors is a compiler assisted hardware extension for the OpenRISC processor that thwarts all forms of memory-based attacks and achieves locality and incurs minimal overheads in the hardware.
Abstract: Illegal memory accesses are a serious security vulnerability that have been exploited on numerous occasions. In this letter, we present Gandalf, a compiler assisted hardware extension for the OpenRISC processor that thwarts all forms of memory-based attacks. We associate lightweight capabilities to all program variables, which are checked at run time by the hardware. Gandalf is transparent to the user and does not require significant OS modifications. Moreover, it achieves locality and incurs minimal overheads in the hardware. We demonstrate these features with a customized Linux kernel executing SPEC2006 benchmarks. To the best of our knowledge, this is the first work to demonstrate a complete solution for hardware-based memory protection schemes for embedded platforms.

Journal ArticleDOI
TL;DR: This letter investigates the dependencies of resource consumption on the granularity of coarse-grained function definitions using the extended database management function of CyberWorkBench and shows that a small footprint is achieved, especially with the dynamically reconfigurable technique.
Abstract: This letter describes a newly established design framework with a layered architecture of processing elements (PEs) exploiting high-level synthesis, along with its evaluation results. The design framework was developed for intelligent sensor nodes of Internet of Things applications that collaborate with cloud systems, in which small footprint and low power consumption were major concerns. The design framework consists of a four-layer PE architecture combined with the extended database management function of a high-level synthesis tool. We investigated the dependencies of resource consumption on the granularity of coarse-grained function definitions using the extended database management function of CyberWorkBench. The evaluation results showed that a small footprint was achieved, especially with the dynamically reconfigurable technique.

Journal ArticleDOI
TL;DR: This letter presents a novel approach for determining under what conditions a software verification result is valid for approximate hardware, and derives a set of constraints which—when met by the AC hardware—guarantees the verification result to carry over to AC.
Abstract: Approximate computing (AC) is an emerging paradigm for energy-efficient computation. The basic idea of AC is to sacrifice high precision for low energy by allowing hardware to carry out “approximately correct” calculations. This provides a major challenge for software quality assurance: programs successfully verified to be correct might be erroneous on approximate hardware. In this letter, we present a novel approach for determining under what conditions a software verification result is valid for approximate hardware. To this end, we compute the allowed tolerances for AC hardware from successful verification runs. More precisely, we derive a set of constraints which—when met by the AC hardware—guarantees the verification result to carry over to AC. On the practical side, we furthermore: 1) show how to extract tolerances from verification runs employing predicate abstraction as verification technology and 2) show how to check such constraints on hardware designs. We have implemented all techniques, and exemplify them on example C programs and a number of recently proposed approximate adders.
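As a toy illustration of the second step (checking a derived constraint against a concrete design), assume verification yielded the hypothetical tolerance "the adder may under-approximate the true sum by at most a small bound and must never over-approximate it"; for a small carry-splitting adder this can be checked exhaustively in Python. The constraint, the adder, and the bit-widths are assumptions for illustration, not the letter's predicate-abstraction flow:

# Exhaustively check a hypothetical tolerance constraint against an 8-bit adder
# that drops the carry between its two 4-bit halves.
from itertools import product

N, BLOCK = 8, 4

def approx_add(a, b):
    mask = (1 << BLOCK) - 1
    lo = ((a & mask) + (b & mask)) & mask             # lost carry -> under-approximation
    hi = (((a >> BLOCK) + (b >> BLOCK)) & mask) << BLOCK
    return hi | lo

def satisfies_tolerance(max_underestimate):
    for a, b in product(range(1 << N), repeat=2):
        if a + b >= (1 << N):
            continue                                  # overflowing sums handled separately
        err = (a + b) - approx_add(a, b)              # positive = under-approximation
        if err < 0 or err > max_underestimate:
            return False, (a, b, err)                 # counterexample found
    return True, None

print(satisfies_tolerance(3))    # -> (False, counterexample): this adder is too coarse
print(satisfies_tolerance(16))   # -> (True, None): verification result carries over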

Journal ArticleDOI
TL;DR: This letter presents a PLC error detection and correction technique utilizing shifted time redundancy, enabled by a full crossbar connection network, to successfully detect, isolate, and mitigate cybersecurity attacks and faults in PLC devices in a factory setting where multiple parallel assembly lines complete the same task.
Abstract: As frequently buying, programming, testing, and utilizing new programmable logic controllers (PLCs) is cost and time prohibitive, many factories utilize legacy systems which are potentially insecure. This letter presents a PLC error detection and correction technique utilizing shifted time redundancy, enabled by a full crossbar connection network. It aims to successfully detect, isolate, and mitigate cybersecurity attacks and faults in PLC devices in a factory setting where multiple parallel assembly lines complete the same task. A trust scoring system is utilized to rate each PLC’s trustworthiness and reduce the necessary overhead.

Journal ArticleDOI
TL;DR: This letter analyzes the memory exploitation codes developed as part of the Cyber Security Awareness Week-2016 competition, which are based on unsecured memcpy and return address modification by buffer overflow on OpenRISC and RISC-V architectures, and adds eight new instructions to handle the four exploits by designing a dedicated hardware stack and a module for checking against buffer overflow.
Abstract: Customized instructions have typically been used for enhancing the performance of embedded systems. However, the use of dedicated instructions for security has been rather limited. On the contrary, modern processors are crippled by the threats of memory integrity attacks, which typically target the control flow of a program and are mitigated at the software level. In this letter, we analyze the memory exploitation codes developed as part of the Cyber Security Awareness Week-2016 competition, which are based on unsecured memcpy and return address modification by buffer overflow on OpenRISC and RISC-V architectures, and implement protections at the hardware level. We added eight new instructions to handle the four exploits by designing a dedicated hardware stack and a module that checks for buffer overflows. We have also performed a validation on the RISC-V platform and introduced two new custom instructions to ensure security from unbounded memcpy. The proposed countermeasures and the new instructions are validated on a field-programmable gate array platform.

Journal ArticleDOI
TL;DR: This letter runs Caffe on various hardware platforms using different computation setups to train LeNet-5 on the MNIST dataset and finds that the speedups vary considerably and that the scalability of multicore CPUs differs across the stages of the network.
Abstract: Deep learning systems composed of multiple layers are increasingly deployed in diverse areas nowadays. To achieve good performance, multicore CPUs and accelerators are widely used in real systems. Previous studies show that GPUs can significantly speed up computation in deep neural networks, while performance does not scale very well on multicore CPUs. In this letter, we run Caffe on various hardware platforms using different computation setups to train LeNet-5 on the MNIST dataset and measure the individual time durations of the forward and backward passes for each layer. We find that the speedups vary considerably and that the scalability of the multicore CPU varies when processing different stages of the network. Based on this observation, we show it is worth applying different policies for each layer separately to achieve the best overall performance. In addition, our benchmarking results can be used as references for developing dedicated acceleration methods for individual layers of the network.

Journal ArticleDOI
TL;DR: A theoretically sound model for skipping loop executions without compromising stability is used as the basis for developing an SAT-based approach for synthesizing such schedulable loop execution patterns over multihop (wireless) control networks.
Abstract: We propose pattern-based execution of control loops as a preferable alternative to traditional fully periodic execution for implementing a set of embedded control systems where the sensors, actuators, and control nodes are connected via a shared wireless network. We use a theoretically sound model for skipping loop executions without compromising stability as the basis for developing an SAT-based approach for synthesizing such schedulable loop execution patterns over multihop (wireless) control networks. We demonstrate superior control performance as compared to controllers designed under the fully periodic solutions.
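A toy simulation of pattern-based execution on a scalar plant is sketched below; it assumes skipped slots simply apply no control, and the plant parameters and patterns are arbitrary, so it only illustrates that a well-chosen pattern can preserve stability with fewer loop executions, not the letter's SAT-based synthesis or network model:

# Pattern-based loop execution on a toy scalar plant x[k+1] = a*x[k] + b*u[k].
# In slots marked '1' the control loop runs and applies state feedback u = -K*x;
# in '0' slots the execution is skipped (modelled here as applying no control).

a, b, K = 1.2, 1.0, 0.7          # unstable plant, closed-loop pole a - b*K = 0.5

def simulate(pattern, steps=30, x0=1.0):
    x = x0
    trace = [x]
    for k in range(steps):
        u = -K * x if pattern[k % len(pattern)] == '1' else 0.0
        x = a * x + b * u
        trace.append(x)
    return trace

full = simulate('1')             # fully periodic execution
skip = simulate('110')           # every third execution skipped, still contracting

print(f"|x[30]| fully periodic: {abs(full[-1]):.2e}")
print(f"|x[30]| pattern '110':  {abs(skip[-1]):.2e}")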

Journal ArticleDOI
TL;DR: In this paper, the authors propose an architecture that reduces the amount of data handled in the architecture-level realization of the basic median filtering operation on images, carried out by a parallel and pipelined median filter design.
Abstract: Existing 2-D median filters in the literature are computationally intensive. It is proposed to optimally reduce the amount of data handled in the architecture-level realization of the basic median filtering operation on images. The proposed architecture reads 4 pixels at a time from the input image, the 4 pixels forming one word on a 32-bit hardware processing system; the subsequent processing is carried out by a parallel and pipelined median filter architecture. Two read operations supply eight input pixels, which results in the generation of four output pixels after an initial latency. The proposed architecture offers a reduced number of read operations and increased speed.
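A simplified software model of the word-wise access (four packed 8-bit pixels per 32-bit read) feeding a 3x3 median filter is sketched below; the parallel/pipelined datapath, the exact read schedule, and the latency of the proposed architecture are not modelled:

# Word-wise pixel access for a 3x3 median filter: rows are stored as 32-bit
# words of four packed 8-bit pixels, and the filter unpacks words as it scans.

def pack_row(pixels):
    """Pack a row of 8-bit pixels into 32-bit words (4 pixels per word)."""
    words = []
    for i in range(0, len(pixels), 4):
        w = 0
        for j, p in enumerate(pixels[i:i + 4]):
            w |= (p & 0xFF) << (8 * j)
        words.append(w)
    return words

def unpack(word):
    return [(word >> (8 * j)) & 0xFF for j in range(4)]

def median3x3(image):
    """image: list of equal-length rows of 8-bit pixels, width divisible by 4."""
    h, w = len(image), len(image[0])
    packed = [pack_row(row) for row in image]
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        rows = [sum((unpack(wd) for wd in packed[y + dy]), []) for dy in (-1, 0, 1)]
        for x in range(1, w - 1):
            window = [rows[r][x + dx] for r in range(3) for dx in (-1, 0, 1)]
            out[y][x] = sorted(window)[4]
    return out

img = [[10, 200, 10, 10, 10, 10, 10, 10] for _ in range(5)]
img[2][2] = 255                       # salt noise
print(median3x3(img)[2][2])           # -> 10, noise removed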

Journal ArticleDOI
TL;DR: This letter shows that embedded digital signal processors (DSPs) can benefit from dynamic branch prediction while staying within strict memory limitations, and addresses the design cost efficiency by offering memory saving optimizations.
Abstract: This letter presents a novel approach for designing a dynamic branch predictor. The proposed design, called the decoupled predictor, separates the prediction-making stage from the prediction-update stage of the scheme. This separation is intended to tailor each part for its particular task and to reduce unnecessary references to these parts. In this letter, we show that embedded digital signal processors (DSPs) can benefit from dynamic branch prediction while staying within strict memory limitations. This letter presents a detailed description of the predictor architecture and evaluates its performance through a set of trace-driven simulations of embedded DSP applications. Finally, we address the design’s cost efficiency by offering memory-saving optimizations.
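The predict/update separation can be illustrated with a plain bimodal predictor written as two decoupled stages; the table size, trace, and write-skipping heuristic below are assumptions for illustration, and the letter's actual decoupled organisation and memory-saving optimisations are not reproduced:

# Bimodal predictor with 2-bit saturating counters, split into separate
# predict() and update() stages to mirror the decoupling idea.

class BimodalPredictor:
    def __init__(self, entries=256):
        self.entries = entries
        self.counters = [2] * entries        # start weakly taken
        self.lookups = self.updates = 0

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def predict(self, pc):
        self.lookups += 1
        return self.counters[self._index(pc)] >= 2   # True = predict taken

    def update(self, pc, taken):
        # Only touch the table when the counter actually has to move,
        # skipping redundant writes (one motivation for decoupling the stages).
        i = self._index(pc)
        c = self.counters[i]
        new = min(3, c + 1) if taken else max(0, c - 1)
        if new != c:
            self.counters[i] = new
            self.updates += 1

bp = BimodalPredictor()
trace = [(0x40, True)] * 20 + [(0x40, False)] * 2 + [(0x40, True)] * 20
correct = 0
for pc, taken in trace:
    correct += (bp.predict(pc) == taken)
    bp.update(pc, taken)
print(f"accuracy {correct}/{len(trace)}, table writes {bp.updates}")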

Journal ArticleDOI
TL;DR: This work adopts an opportunity study of instruction-based distinction of the read implementation to take advantage of both approximation techniques while enhancing the application's QoR, and shows that the proposed method increases the application's QoR substantially compared to QoR-oblivious use of the read approximation techniques.
Abstract: Although the read-disturb spin-transfer torque RAM approximation technique improves performance, it may involve both an approximate read and an approximate write at the same time, so it may degrade the application quality of result (QoR) considerably. On the other hand, the incorrect-read-decision approximation technique improves power without corrupting the stored data. We adopt an opportunity study of instruction-based distinction of the read implementation to take advantage of both approximation techniques while enhancing the application’s QoR. We evaluated the proposed method using a set of state-of-the-art benchmarks. The experimental results show that our method increases the application’s QoR substantially (i.e., by up to about 24 dB) compared to QoR-oblivious use of the read approximation techniques.

Journal ArticleDOI
TL;DR: Inspired by the operation of a polymorphic virus, a novel threat model for NTC is proposed, referred to as a focally induced fault attack (FIFA), which employs a machine learning framework to ascertain the circuit vulnerabilities and generates targeted software modules to cause a breach of end-user privacy.
Abstract: In this letter, we explore the emerging security threats of near-threshold computing (NTC). Researchers have shown that the delay sensitivity of a circuit to supply voltage variation tremendously increases, as the circuit’s operating conditions shift from traditional super-threshold values to NTC values. As a result, NTC systems become extremely vulnerable to timing fault attacks, jeopardizing trustworthy computing. Inspired by the operation of a polymorphic virus, we propose a novel threat model for NTC, referred to as a focally induced fault attack (FIFA). FIFA employs a machine learning framework to ascertain the circuit vulnerabilities and generates targeted software modules to cause a breach of end-user privacy. Our experimental results, obtained from a rigorous machine learning approach, indicate the efficacy of FIFA on a low-power mobile platform.