scispace - formally typeset

Showing papers in "Journal of Low Power Electronics and Applications" in 2020


Journal Article
TL;DR: This review argues that memristive logic families which can implement the MAJORITY gate and NOT should be favored for in-memory computing; it compares one-bit full adders implemented in a memory array using different logic primitives and underscores the efficiency of the majority-based implementation.
Abstract: As we approach the end of Moore’s law, many alternative devices are being explored to satisfy the performance requirements of modern integrated circuits. At the same time, the movement of data between processing and memory units in contemporary computing systems (the ‘von Neumann bottleneck’ or ‘memory wall’) necessitates a paradigm shift in the way data is processed. Emerging resistance switching memories (memristors) show promising signs of overcoming the ‘memory wall’ by enabling computation in the memory array. Majority logic is a type of Boolean logic that has been found to be an efficient logic primitive due to its expressive power. In this review, the efficiency of majority logic is analyzed from the perspective of in-memory computing. Recently reported methods to implement the majority gate in Resistive RAM (ReRAM) arrays are reviewed and compared. Conventional CMOS implementations accommodate heterogeneous logic gates (NAND, NOR, XOR), whereas in-memory implementations usually accommodate a single, homogeneous gate type (only IMPLY, only NAND, or only MAJORITY). In view of this, memristive logic families that can implement the MAJORITY gate and NOT (which together form a functionally complete set) are to be favored for in-memory computing. One-bit full adders implemented in a memory array using different logic primitives are compared, and the efficiency of the majority-based implementation is underscored. To investigate whether this efficiency extends to n-bit adders, eight-bit adders implemented in a memory array using different logic primitives are also compared. Parallel-prefix adders implemented in majority logic can reduce the latency of in-memory adders by 50–70% compared with IMPLY, NAND, NOR and other similar logic primitives.

22 citations
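The MAJORITY-plus-NOT full adder discussed in the first review can be checked at the Boolean level. The Python sketch below models no ReRAM array; it uses the well-known majority-inverter form of the full adder (carry = MAJ(a, b, cin), sum = MAJ(NOT carry, cin, MAJ(a, b, NOT cin))), which may differ from the exact gate sequences compared in the review. The eight-bit adder here is a plain ripple-carry chain, not one of the parallel-prefix adders the review reports as fastest.

```python
from itertools import product


def maj(a: int, b: int, c: int) -> int:
    """Three-input MAJORITY: 1 if at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0


def inv(a: int) -> int:
    """NOT gate, needed to make MAJORITY functionally complete."""
    return 1 - a


def full_adder_maj(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder built from 3 MAJ and 2 NOT operations."""
    cout = maj(a, b, cin)
    s = maj(inv(cout), cin, maj(a, b, inv(cin)))
    return s, cout


def ripple_add(x: int, y: int, width: int = 8) -> int:
    """Chain `width` majority full adders (ripple-carry, not parallel-prefix)."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder_maj((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)  # final carry becomes bit `width`


# Exhaustively check the one-bit adder against integer addition.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder_maj(a, b, cin)
    assert 2 * cout + s == a + b + cin
```

For example, `ripple_add(200, 100)` returns 300, with the final carry appearing as bit 8.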


Journal Article
TL;DR: This paper presents the design, implementation and test results of a flood-monitoring system based on LoRa technology, tested in a real-world scenario. The system is designed modularly, so that different types of sensors can be interfaced without significant hardware changes to the proposed node architecture.
Abstract: The development of Internet of Things (IoT) systems is evolving rapidly, thanks in part to newly available low-power wide area network (LPWAN) technologies, which are used for environmental monitoring and to prevent potentially dangerous situations with smaller and less expensive physical structures. This paper presents the design, implementation and test results of a flood-monitoring system based on LoRa technology, tested in a real-world scenario. The entire system is designed modularly, so that different types of sensors can be interfaced without significant hardware changes to the proposed node architecture. The data are acquired by a device equipped with sensors and a microcontroller and connected to a LoRa wireless module, which sends them to a web platform where they are processed and stored and where an alarm function is triggered in case of flooding.

22 citations
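The data path of the flood-monitoring system (sensor node, LoRa uplink, web-side alarm) can be illustrated with a minimal Python sketch. Everything concrete here is hypothetical and not taken from the paper: the 4-byte payload layout (node id plus water level in millimetres), the `Reading` type, and the 500 mm alarm threshold. Small fixed-size binary payloads are used because they keep LoRa airtime, and hence node energy, low.

```python
import struct
from dataclasses import dataclass

# Hypothetical payload layout: node id (uint16) + water level in mm (uint16),
# little-endian, 4 bytes in total.
PAYLOAD_FORMAT = "<HH"

# Hypothetical alarm threshold; a real deployment would calibrate it per site.
ALARM_LEVEL_MM = 500


@dataclass
class Reading:
    node_id: int
    water_level_mm: int


def encode(reading: Reading) -> bytes:
    """Node side: pack a reading into a compact binary payload."""
    return struct.pack(PAYLOAD_FORMAT, reading.node_id, reading.water_level_mm)


def decode(payload: bytes) -> Reading:
    """Server side: unpack a payload forwarded by the LoRa gateway."""
    node_id, level = struct.unpack(PAYLOAD_FORMAT, payload)
    return Reading(node_id, level)


def flooded_nodes(readings: list[Reading], threshold_mm: int = ALARM_LEVEL_MM) -> list[int]:
    """Return the ids of nodes whose water level meets or exceeds the threshold."""
    return sorted({r.node_id for r in readings if r.water_level_mm >= threshold_mm})
```

For instance, decoding the payloads `encode(Reading(1, 120))` and `encode(Reading(2, 640))` and passing them to `flooded_nodes` flags only node 2. Adding a new sensor type would, in this sketch, mean defining a new payload format, which mirrors the modularity goal stated in the abstract.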


Journal Article
TL;DR: This article presents an integrated circuit that enhances the immunity of ion-sensitive field-effect transistors (ISFETs) to temperature and accurately compensates for the effect of temperature variation on the measured pH values at low power consumption.
Abstract: pH measurements are widely used in agriculture, biomedical engineering, the food industry, environmental studies, etc. Several healthcare and biomedical research studies have reported that all aqueous samples have their pH tested at some point in their lifecycle, for example in the diagnosis of diseases or susceptibility, wound healing and cellular internalization. The ion-sensitive field-effect transistor (ISFET) is capable of pH measurement. This use of the ISFET has become popular because it allows sensing, preprocessing and computational circuitry to be integrated on a single chip, enabling miniaturization and portability. However, the data extracted from the sensor are affected by temperature variation. This paper presents a new integrated circuit that enhances the immunity of ISFETs to temperature. To this end, the considered ISFET macro model is analyzed and validated with experimental data. Moreover, we investigate the temperature dependence of the current-voltage (I-V) characteristics. Accordingly, an improved conditioning circuit is designed to reduce the temperature sensitivity of the pH values measured by the ISFET sensor. The numerical validation results show that the developed solution accurately compensates for temperature variation in the measured pH values at low power consumption.

21 citations
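Why temperature matters for ISFET pH readings can be seen from the ideal Nernstian sensitivity, ln(10)·kT/q, which is about 59.16 mV/pH at 25 °C and grows linearly with absolute temperature. The Python sketch below is a toy, not the paper's macro model or circuit: it assumes an ideal sensor whose output moves by exactly one Nernst slope per pH unit away from an arbitrary pH 7 reference voltage (the sign convention and `v_ph7` are assumptions), and it ignores the additional temperature dependences of a real ISFET's I-V characteristics.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def nernst_slope(temp_c: float) -> float:
    """Ideal pH sensitivity ln(10) * kT / q, in volts per pH unit."""
    t_kelvin = temp_c + 273.15
    return math.log(10) * K_B * t_kelvin / Q_E


def sensor_voltage(ph: float, temp_c: float, v_ph7: float = 0.0) -> float:
    """Toy ideal sensor: one Nernst slope per pH unit away from pH 7.
    The sign convention and v_ph7 are illustrative assumptions."""
    return v_ph7 + nernst_slope(temp_c) * (7.0 - ph)


def ph_from_voltage(v: float, temp_c: float, v_ph7: float = 0.0) -> float:
    """Invert sensor_voltage using the slope at the assumed temperature."""
    return 7.0 - (v - v_ph7) / nernst_slope(temp_c)


# A pH 4 sample measured at 45 degrees C:
v = sensor_voltage(4.0, 45.0)
uncompensated = ph_from_voltage(v, 25.0)  # wrongly assumes the 25 C slope
compensated = ph_from_voltage(v, 45.0)    # uses the actual temperature
```

In this toy, ignoring the 20 °C temperature rise turns a pH 4.0 sample into a reading of roughly 3.8, while using the correct slope recovers 4.0. Even this idealized error illustrates why the paper's conditioning circuit targets temperature sensitivity.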


Journal Article
TL;DR: The proposed device exhibited better performance compared to the P2110 commercial device, allowing a maximum distance of operation of up to 22 meters from the dedicated RF power source, making it suitable even for IoT (Internet of Things) applications.
Abstract: This paper presents the design and implementation of two front-ends for RF (Radio Frequency) energy harvesting and compares them with a commercial device, the P2110 by Powercast Co. (Pittsburgh, PA, USA). Both devices are implemented on a discrete-element board with microstrip lines combined with lumped elements and are optimized for two different input power levels (−10 dBm and 10 dBm, respectively) at GSM900 frequencies. The load was fixed at 5 kΩ after a load-pull analysis of the systems. The rectifier stages use two different Schottky diodes in two different topologies: a single diode and a 2-stage Dickson charge pump. The second design is compared with the P2110 by generating RF fields at 915 MHz with the Powercast Powerspot. The main aim of this work is to design simple, efficient, low-cost devices that can be used as a power supply for low-power autonomous sensors, perform better than current state-of-the-art solutions, and provide an acceptable voltage level on the load. Measurements were conducted for input powers from −20 dBm to 10 dBm; the best power conversion efficiency (PCE), 70% at 915 MHz, is obtained with the second design. In particular, the proposed device performed better than the P2110 commercial device, allowing a maximum operating distance of up to 22 meters from the dedicated RF power source, which makes it suitable even for IoT (Internet of Things) applications.

15 citations
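The power conversion efficiency (PCE) quoted for the RF harvesters is the DC power delivered to the load divided by the RF input power. The Python sketch below applies that definition to the 5 kΩ load from the abstract. The 0 dBm input used in the example is an illustrative choice only; the abstract does not state the input power at which the 70% peak was measured.

```python
import math


def dbm_to_watts(p_dbm: float) -> float:
    """Convert power from dBm to watts: P = 1 mW * 10^(dBm / 10)."""
    return 1e-3 * 10 ** (p_dbm / 10)


def pce(v_load: float, r_load: float, p_in_dbm: float) -> float:
    """Power conversion efficiency: DC power in the load over RF input power."""
    p_out = v_load ** 2 / r_load
    return p_out / dbm_to_watts(p_in_dbm)


def load_voltage(efficiency: float, r_load: float, p_in_dbm: float) -> float:
    """DC voltage across r_load for a given efficiency and RF input power."""
    return math.sqrt(efficiency * dbm_to_watts(p_in_dbm) * r_load)


# Illustrative operating point: 70% efficiency at 0 dBm into 5 kOhm.
v_example = load_voltage(0.70, 5_000.0, 0.0)
```

At that illustrative point, `v_example` is about 1.87 V. The same two functions work in either direction: from a measured load voltage to PCE, or from a target PCE to the voltage the load would see.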


Journal Article
TL;DR: This paper reports power management circuits for RF energy harvesters suitable for integration in wireless sensor nodes, used to supply an integrated temperature sensor with an analog-to-digital converter.
Abstract: This paper describes the design and implementation of power management circuits for RF energy harvesters suitable for integration in wireless sensor nodes. In particular, we report the power management circuits used to supply an integrated temperature sensor with an analog-to-digital converter. A DC-DC boost converter efficiently transfers the energy harvested by a generic radio-frequency rectifier into a charge reservoir, whereas a linear regulator scales the supply voltage to a suitable value for the sensing and conversion circuit. Implemented in a 65 nm CMOS technology, the power management system achieves a measured overall efficiency of 20% with an available power of 4.5 μW at the DC-DC converter input. The system can sustain a temperature measurement rate of one sample per second with an RF input power of −28 dBm, making it compatible with the power levels available in generic outdoor environments.

14 citations
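The efficiency and available-power figures of the power management system translate directly into an energy budget per measurement. The Python sketch below does that arithmetic under one idealizing assumption: all power leaving the power management chain is spent on the sensor and ADC (no other loads or leakage).

```python
def delivered_power(p_in_w: float, efficiency: float) -> float:
    """Power reaching the load after the power management chain."""
    return p_in_w * efficiency


def energy_per_sample(p_load_w: float, sample_rate_hz: float) -> float:
    """Upper bound on the energy one measurement may use, assuming the
    delivered power is spent entirely on sampling (an idealization)."""
    return p_load_w / sample_rate_hz


p_load = delivered_power(4.5e-6, 0.20)   # 20% of 4.5 uW
budget = energy_per_sample(p_load, 1.0)  # one sample per second
```

The result is 0.9 μW at the load, which at one sample per second leaves at most about 0.9 μJ per temperature conversion.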


Journal Article
TL;DR: An accurate and detailed analysis and design of six widely used complementary metal-oxide-semiconductor (CMOS) SISO ST circuits is presented and new design equations provide better accuracy and insights, as broad assumptions of original derivations were avoided.
Abstract: Schmitt trigger (ST) circuits are widely used integrated circuit (IC) blocks with hysteretic input/output (I/O) characteristics. With I/O characteristics similar to those of a living neuron, STs reject noise and provide stability to the systems in which they are deployed. Indeed, single-input/single-output (SISO) STs are likely candidates to be the core unit element in artificial neural networks (ANNs), due not only to their similar I/O characteristics but also to their low power consumption and small silicon footprints. This paper presents an accurate and detailed analysis and design of six widely used complementary metal-oxide-semiconductor (CMOS) SISO ST circuits. The hysteresis characteristics of these ST circuits were derived for hand calculations and compared with the original design equations and with simulation results. Simulations were carried out in a well-established 0.35 μm/3.3 V analog/mixed-signal CMOS process. Additionally, simulations were performed over a wide range of supplies and process variations, but only the 3.3 V supply results are presented. Most of the new design equations provide better accuracy and insight, as the broad assumptions of the original derivations were avoided.

14 citations
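The design equations for the Schmitt triggers yield two switching thresholds. The behavioral Python model below shows only what those thresholds do: produce hysteresis and reject noise near the switching point. It is not one of the six analyzed circuits. It is non-inverting (several CMOS STs are inverting), and its threshold values are purely illustrative. The 3.3 V output level simply matches the supply used for the reported results.

```python
class SchmittTrigger:
    """Behavioral non-inverting Schmitt trigger with hysteresis.

    The output switches high only when the input rises above v_th_high and
    switches low only when it falls below v_th_low; in between, it holds its
    previous state. Thresholds are illustrative, not taken from the paper.
    """

    def __init__(self, v_th_low: float, v_th_high: float, v_dd: float = 3.3):
        if v_th_low >= v_th_high:
            raise ValueError("v_th_low must be below v_th_high")
        self.v_th_low = v_th_low
        self.v_th_high = v_th_high
        self.v_dd = v_dd
        self.state = False  # output starts low

    def step(self, v_in: float) -> float:
        """Process one input sample and return the output voltage."""
        if v_in > self.v_th_high:
            self.state = True
        elif v_in < self.v_th_low:
            self.state = False
        return self.v_dd if self.state else 0.0

    @property
    def hysteresis(self) -> float:
        """Width of the hysteresis window, v_th_high - v_th_low."""
        return self.v_th_high - self.v_th_low
```

Feeding the noisy input sequence 0.0, 1.5, 2.2, 1.8, 1.3, 1.1, 1.6 V into `SchmittTrigger(1.2, 2.1)` yields 0, 0, 3.3, 3.3, 3.3, 0, 0 V. The dips to 1.8 V and 1.3 V do not toggle the output, which a single-threshold comparator at 1.65 V would have done.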


Journal Article
TL;DR: In this paper, a fully integrated switched-capacitor (SC) DC-DC converter that steps down 2.0 V to 0.9 V with a peak efficiency of 80% is implemented in a 0.18 μm CMOS process.
Abstract: A fully integrated switched-capacitor (SC) DC-DC converter that steps down 2.0 V to 0.9 V with a peak efficiency of 80% is implemented in a 0.18 μm CMOS process. An ultra-low-power voltage-controlled oscillator that generates a wide range of switching frequencies is proposed to extend battery runtime. An efficiency >70% for load currents in the range of 12 μA to 17.8 mA is achieved by implementing a novel adaptively biased pulse frequency modulation (ABPFM) technique in the controller. A symmetric charge-discharge topology with two-phase time interleaving is used as the power stage to reduce the output voltage ripple to <72 mV over the entire load current range.

11 citations
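A useful reference point for the 80% peak efficiency of the SC converter is the ideal bound of each regulator type. A linear regulator can never exceed Vout/Vin (45% for 2.0 V to 0.9 V). An ideal SC converter whose no-load output is M·Vin cannot exceed Vout/(M·Vin), because its input current is fixed at M times the output current. The Python sketch below computes both bounds. The 1/2 conversion ratio in the example is a hypothetical assumption; the abstract does not state the ratio used.

```python
def linear_regulator_max_eff(v_in: float, v_out: float) -> float:
    """A linear regulator cannot exceed Vout / Vin (quiescent current ignored)."""
    return v_out / v_in


def sc_intrinsic_max_eff(v_in: float, v_out: float, ratio: float) -> float:
    """Intrinsic (charge-sharing) efficiency limit of an ideal SC converter
    whose no-load output is ratio * v_in: eta <= v_out / (ratio * v_in)."""
    v_ideal = ratio * v_in
    if v_out > v_ideal:
        raise ValueError("v_out cannot exceed the ideal no-load output")
    return v_out / v_ideal


ldo_bound = linear_regulator_max_eff(2.0, 0.9)     # 0.45
sc_bound = sc_intrinsic_max_eff(2.0, 0.9, ratio=0.5)  # 0.90 for a 2:1 ratio
```

If the converter did use a 2:1 ratio, the ideal bound would be 90%. A measured 80% peak would then lie below that bound, as it must once switching and parasitic losses are included. Lowering the switching frequency at light loads, as pulse frequency modulation does, is one way to keep those switching losses in proportion to the load.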


Journal Article
TL;DR: This work analyses the feasibility of integrating a machine learning classifier inside a low-power embedded system in order to obtain information from the user’s gait in real-time and prevent future injuries.
Abstract: Abnormal foot postures can be measured while walking through plantar pressures under both dynamic and static conditions. Detecting them may help prevent injuries to the lower limbs such as fractures, ankle sprains or plantar fasciitis. This information can be obtained with an embedded instrumented insole containing pressure sensors and a low-power microcontroller. However, these sensors are placed at sparse locations inside the insole, so it is not easy to correlate their values manually with the gait type; this is why a machine learning system is needed. In this work, we analyse the feasibility of integrating a machine learning classifier inside a low-power embedded system in order to obtain information about the user’s gait in real time and prevent future injuries. Moreover, we analyse the execution times, the power consumption and the model effectiveness. The machine learning classifier is trained on an acquired dataset of 3000+ steps from 6 different users. The results show that the system achieves an accuracy above 99%, and the power consumption tests yield a battery autonomy of over 25 days.

11 citations


Journal Article
TL;DR: A simple scheme to implement class AB low-voltage fully differential amplifiers that do not require an output common-mode feedback network (CMFN) is introduced; the amplifiers have a rail-to-rail output signal swing and high rejection of common-mode input signals.
Abstract: A simple scheme to implement class AB low-voltage fully differential amplifiers that do not require an output common-mode feedback network (CMFN) is introduced. It provides a rail-to-rail output signal swing and high rejection of common-mode input signals. It operates in strong inversion with ±300 mV supplies in a 180 nm CMOS process. It uses an auxiliary amplifier that minimizes supply requirements by setting the op-amp input terminals very close to one of the rails and that also serves as a common-mode feedback network to generate complementary output signals. The scheme is verified with simulation results of an amplifier operating in strong inversion with a 10 pF load capacitance, which consumes 25 µW and has a gain-bandwidth product (GBW) of 16.1 MHz, a slew rate (SR) of 8.4 V/µs, a small-signal figure of merit (FOM_SS) of 6.49 MHz·pF/µW, a large-signal figure of merit (FOM_LS) of 3.39 (V/µs)·pF/µW, and a current efficiency (CE) of 2.03.

9 citations
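The amplifier figures of merit can be cross-checked against the other reported numbers. The Python sketch below assumes the conventional definitions FOM_SS = GBW·C_L/P and FOM_LS = SR·C_L/P, which the abstract does not spell out. It plugs in the reported GBW, SR, load capacitance and power.

```python
def fom_small_signal(gbw_mhz: float, c_load_pf: float, power_uw: float) -> float:
    """FOM_SS = GBW * C_L / P, in MHz*pF/uW (conventional definition, assumed)."""
    return gbw_mhz * c_load_pf / power_uw


def fom_large_signal(sr_v_per_us: float, c_load_pf: float, power_uw: float) -> float:
    """FOM_LS = SR * C_L / P, in (V/us)*pF/uW (conventional definition, assumed)."""
    return sr_v_per_us * c_load_pf / power_uw


fom_ss = fom_small_signal(16.1, 10.0, 25.0)
fom_ls = fom_large_signal(8.4, 10.0, 25.0)
```

These inputs give 6.44 and 3.36, within about 1% of the reported 6.49 and 3.39. The small gap is presumably due to the power figure of 25 µW being rounded.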


Journal Article
TL;DR: A low-noise instrumentation amplifier dedicated to a nano- and micro-electro-mechanical system (M&NEMS) microphone for use in Internet of Things (IoT) applications is presented.
Abstract: A low-noise instrumentation amplifier dedicated to a nano- and micro-electro-mechanical system (M&NEMS) microphone for use in Internet of Things (IoT) applications is presented. The piezoresistive sensor consists of silicon nanowires, and the electronic interface is an instrumentation amplifier. To design an instrumentation amplifier for IoT applications, trade-offs among power consumption, gain, noise and sensitivity are discussed. Because the amplifier is the most critical noise contributor, a delay-time chopper stabilization (CHS) technique is implemented around it to eliminate its offset and 1/f noise. The low-noise instrumentation amplifier is implemented in a 65-nm CMOS (complementary metal–oxide–semiconductor) technology. The supply voltage is 2.5 V, the power consumption is 0.4 mW and the core area is 1 mm². The M&NEMS microphone and the amplifier circuit were fabricated and measured. Over a signal bandwidth of 20 kHz, the measured signal-to-noise ratio (SNR) is 77 dB.

9 citations
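The principle of chopper stabilization used in the microphone amplifier can be shown with an idealized unity-gain model. The input is multiplied by a ±1 square wave, the amplifier adds its offset, and the output is multiplied by the same square wave. The signal is restored, while the offset is moved to the chopping frequency, where a low-pass filter removes it. The Python sketch below models low-frequency error only as a DC offset. It ignores the delay-time compensation used in the paper, residual offset from switch charge injection, and actual 1/f noise spectra.

```python
def chopper_demo(signal: list[float], offset: float, period: int = 2) -> list[float]:
    """Idealized chopper stabilization around a unity-gain amplifier.

    period is the chopping period in samples and must be even.
    """
    if period < 2 or period % 2:
        raise ValueError("period must be an even number of samples >= 2")
    half = period // 2
    out = []
    for n, x in enumerate(signal):
        m = 1.0 if (n // half) % 2 == 0 else -1.0  # +/-1 chopping square wave
        amplified = x * m + offset  # amplifier adds its offset after modulation
        out.append(amplified * m)   # demodulation yields x + offset * m
    return out


def moving_average(xs: list[float], window: int) -> list[float]:
    """Simple low-pass filter; a window of one full chopping period
    cancels the modulated offset exactly in this ideal model."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]
```

With a 1 mV input and a 10 mV offset, the raw chopped output alternates between 11 mV and −9 mV. Averaging over one chopping period returns the 1 mV signal.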


Journal Article
TL;DR: An overview of the main challenges encountered when employing HAM is provided, a collection of state-of-the-art techniques and methodologies proposed to address these challenges are surveyed, and possible future directions are outlined.
Abstract: Many-core platforms are rapidly expanding in various embedded areas, as they provide the scalable computational power required to meet the ever-growing performance demands of embedded applications and systems. However, the huge design space of possible task mappings, the unpredictable workload dynamism, and the numerous non-functional requirements of applications in terms of timing, reliability, safety, and so forth impose significant challenges when designing many-core systems. Hybrid Application Mapping (HAM) is an emerging class of design methodologies for many-core systems that address these challenges via an incremental (per-application) mapping scheme: the mapping process is divided into (i) a design-time Design Space Exploration (DSE) step per application to obtain a set of high-quality mapping options and (ii) a run-time system management step in which applications are launched dynamically (on demand) using the precomputed mappings. This paper provides an overview of HAM and the design methodologies developed in line with it. We introduce the basics of HAM and elaborate on the way it addresses the major challenges of application mapping in many-core systems. We provide an overview of the main challenges encountered when employing HAM and survey a collection of state-of-the-art techniques and methodologies proposed to address these challenges. We finally present an overview of open topics and challenges in HAM, provide a summary of emerging trends for addressing them, particularly using machine learning, and outline possible future directions. While there exists a large body of HAM methodologies, the techniques studied in this paper are developed, to a large extent, within the scope of invasive computing. Invasive computing introduces resource awareness into applications and employs explicit resource reservation to enable incremental application mapping and dynamic system management.
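HAM's two-step scheme for many-core mapping can be sketched in Python under simplifying assumptions. At design time, a set of candidate mappings is reduced to its Pareto front. At run time, the system picks one precomputed option that fits the currently free resources. The `MappingOption` fields (cores, latency, energy) and the lowest-energy selection policy are hypothetical illustrations; the surveyed methodologies use richer mapping representations and constraints.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MappingOption:
    """One precomputed mapping of an application (hypothetical fields)."""
    cores: int          # number of cores the mapping reserves
    latency_ms: float   # predicted latency
    energy_mj: float    # predicted energy


def dominates(a: MappingOption, b: MappingOption) -> bool:
    """a dominates b if it is no worse in every objective and better in one."""
    no_worse = a.cores <= b.cores and a.latency_ms <= b.latency_ms and a.energy_mj <= b.energy_mj
    better = a.cores < b.cores or a.latency_ms < b.latency_ms or a.energy_mj < b.energy_mj
    return no_worse and better


def pareto_front(options: list[MappingOption]) -> list[MappingOption]:
    """Design time (DSE): keep only the non-dominated mapping options."""
    return [o for o in options if not any(dominates(p, o) for p in options)]


def select_at_runtime(front: list[MappingOption], free_cores: int,
                      deadline_ms: float) -> Optional[MappingOption]:
    """Run time: among feasible options, pick the lowest-energy one (or None)."""
    feasible = [o for o in front if o.cores <= free_cores and o.latency_ms <= deadline_ms]
    return min(feasible, key=lambda o: o.energy_mj, default=None)
```

The expensive exploration runs offline, once per application. The run-time step is only a filter over a short precomputed list, which keeps on-demand application launches cheap.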

Journal Article
TL;DR: In this article, the authors developed a new topology of active continuous-time second-order bandpass filter with maximum resonant frequency in the range of 1 GHz and wide electrically tunable quality factor requiring a very limited quiescent current consumption below 10 μA.
Abstract: Fully Depleted Silicon on Insulator (FD-SOI) CMOS technology offers the possibility of circuit performance optimization with reduction of both topology complexity and power consumption. These advantages are fully exploited in this paper in order to develop a new topology of active continuous-time second-order bandpass filter with maximum resonant frequency in the range of 1 GHz and wide electrically tunable quality factor requiring a very limited quiescent current consumption below 10 μA. Preliminary simulations that were carried out using the 28-nm FD-SOI technology from STMicroelectronics show that the designed example can operate up to 1.3 GHz of resonant frequency with tunable Q ranging from 90 to 370, while only requiring 6 μA standby current under 1-V supply.
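For the FD-SOI bandpass filter, the link between resonant frequency, quality factor and bandwidth follows the standard second-order bandpass response H(s) = (ω0/Q)s / (s² + (ω0/Q)s + ω0²). This response has unity gain at f0 and a −3 dB bandwidth of f0/Q. The Python sketch below evaluates that textbook response; it does not model the paper's specific topology.

```python
import math


def bandpass_gain(f_hz: float, f0_hz: float, q: float) -> float:
    """|H(j*2*pi*f)| of the normalized second-order bandpass
    H(s) = (w0/Q) s / (s^2 + (w0/Q) s + w0^2), which has unity gain at f0."""
    w, w0 = 2 * math.pi * f_hz, 2 * math.pi * f0_hz
    s = complex(0.0, w)
    h = (w0 / q) * s / (s * s + (w0 / q) * s + w0 * w0)
    return abs(h)


def bandwidth_3db(f0_hz: float, q: float) -> float:
    """-3 dB bandwidth of the second-order bandpass: f0 / Q."""
    return f0_hz / q
```

If both ends of the reported Q range (90 to 370) were reached at 1.3 GHz, the −3 dB bandwidth would span roughly 14.4 MHz down to 3.5 MHz. This shows how wide a selectivity range the electrically tunable Q provides.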

Journal Article
TL;DR: The major trends in managing PIM- and NMP-based DCC systems are surveyed, and a review of the landscape of resource management techniques employed by system designers for such systems is provided.
Abstract: Due to the amount of data involved in emerging deep learning and big data applications, operations related to data movement have quickly become a bottleneck. Data-centric computing (DCC), as enabled by processing-in-memory (PIM) and near-memory processing (NMP) paradigms, aims to accelerate these types of applications by moving the computation closer to the data. Over the past few years, researchers have proposed various memory architectures that enable DCC systems, such as logic layers in 3D-stacked memories or charge-sharing-based bitwise operations in dynamic random-access memory (DRAM). However, application-specific memory access patterns, power and thermal concerns, memory technology limitations, and inconsistent performance gains complicate the offloading of computation in DCC systems. Therefore, designing intelligent resource management techniques for computation offloading is vital for leveraging the potential offered by this new paradigm. In this article, we survey the major trends in managing PIM and NMP-based DCC systems and provide a review of the landscape of resource management techniques employed by system designers for such systems. Additionally, we discuss the future challenges and opportunities in DCC management.

Journal Article
TL;DR: The fractional-order lung impedance model of the human respiratory tree is implemented in this paper using Operational Transconductance Amplifiers, and partial fraction expansion is used to reduce the spread of the required time constants and scaling factors.
Abstract: The fractional-order lung impedance model of the human respiratory tree is implemented in this paper using Operational Transconductance Amplifiers (OTAs). The use of such active elements offers electronic adjustment of the impedance characteristics in terms of both element values and orders. As the MOS transistors in the OTAs are biased in the weak inversion region, the power dissipation and the dc bias voltage are also minimized. In addition, partial fraction expansion has been utilized to reduce the spread of the required time constants and scaling factors. The performance of the proposed scheme has been evaluated at post-layout level, using MOS transistor models provided by the 0.35 μm Austria Mikro Systeme CMOS process, and the Cadence IC design suite.
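What makes the lung impedance model "fractional-order" is terms of the form s^α with non-integer α, whose phase is a constant −απ/2 (for 1/s^α) at every frequency. The Python sketch below evaluates an assumed model form, Z(s) = R + L·s^α + 1/(C·s^β), of the kind used in the fractional-order respiratory literature. The paper's exact model, parameter values and the OTA-based approximation (including the partial fraction expansion step) are not reproduced here.

```python
import cmath
import math


def frac_power(s: complex, order: float) -> complex:
    """Principal value of s**order; for s = j*w this is w**order * exp(j*order*pi/2)."""
    return cmath.exp(order * cmath.log(s))


def lung_impedance(f_hz: float, r: float, l: float, c: float,
                   alpha: float, beta: float) -> complex:
    """Assumed fractional-order model Z(s) = R + L s^alpha + 1/(C s^beta).
    Parameter values passed in are illustrative, not from the paper."""
    s = complex(0.0, 2 * math.pi * f_hz)
    return r + l * frac_power(s, alpha) + 1.0 / (c * frac_power(s, beta))
```

With alpha = beta = 1, the model reduces to an ordinary series R-L-C impedance, which is a quick sanity check. With fractional orders, each term contributes a frequency-independent phase. That constant phase is why an electronically adjustable order is useful in an OTA realization.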

Journal Article
TL;DR: The results highlight that LiM architectures have a clear advantage over Von Neumann architectures, allowing a reduction in energy consumption while increasing the overall speed of the circuit.
Abstract: Recently, the Logic-in-Memory (LiM) concept has been widely studied in the literature. This paradigm represents one of the most efficient ways to overcome the limitations of the von Neumann architecture: by placing simple logic circuits inside or near a memory element, it is possible to compute locally without fetching data from the main memory. Although this concept offers many advantages from a theoretical point of view, its implementation can increase the complexity of the memory itself, leading to a more sophisticated design flow. Binary Neural Networks (BNNs) have been chosen as a case study. BNNs binarize both weights and inputs, transforming multiply-and-accumulate into a simpler bitwise logical operation while maintaining high accuracy, which makes them well suited to a LiM implementation. In this paper, we present two circuits implementing a BNN model in CMOS technology. The first one, called the Out-Of-Memory (OOM) architecture, follows a standard von Neumann structure. The same architecture was then redesigned so that the critical part of the algorithm runs in a modified memory that is also capable of executing logic calculations. By comparing the OOM and LiM architectures, we aim to evaluate whether the Logic-in-Memory paradigm is worthwhile. The results highlight that LiM architectures have a clear advantage over von Neumann architectures, reducing energy consumption while increasing the overall speed of the circuit.
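In a BNN, multiply-and-accumulate reduces to a bitwise operation: if +1 is encoded as bit 1 and −1 as bit 0, the dot product of two n-element {−1, +1} vectors equals 2·popcount(XNOR(a, w)) − n. The Python sketch below demonstrates that identity in software. It illustrates the arithmetic a LiM array can exploit and is not the paper's circuit.

```python
def binarize(xs: list[float]) -> list[int]:
    """Sign binarization: +1 for x >= 0, else -1."""
    return [1 if x >= 0 else -1 for x in xs]


def to_bits(vals: list[int]) -> int:
    """Encode +1 as bit 1 and -1 as bit 0 (bit i holds element i)."""
    word = 0
    for i, v in enumerate(vals):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two {-1,+1}^n vectors from their bit encodings:
    dot = 2 * popcount(XNOR(a, w)) - n."""
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")  # XNOR, then popcount
    return 2 * matches - n
```

Each matching bit contributes +1 to the dot product and each mismatch contributes −1, which is where 2·matches − n comes from. XNOR and bit counting are cheap to place next to memory cells, whereas multipliers are not.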

Journal Article
TL;DR: It is demonstrated that the utilization of an FPAA-based signal preprocessor can greatly improve the flexibility and power consumption of wireless sensor nodes.
Abstract: The wireless sensor nodes used in a growing number of remote sensing applications are deployed in inaccessible locations or are subject to severe energy constraints. Audio-based sensing offers flexibility in node placement and is popular in low-power schemes. Thus, in this paper, a node architecture with low power consumption and in-the-field reconfigurability is evaluated in the context of an acoustic vehicle detection and classification (hereafter “AVDC”) scenario. The proposed architecture uses an always-on field-programmable analog array (FPAA) as a low-power event detector that selectively wakes a microcontroller unit (MCU) when a significant event is detected. When awoken, the MCU verifies the vehicle class asserted by the FPAA and transmits the relevant information. The AVDC system is trained by solving a classification problem using a lexicographic, nonlinear programming algorithm. On a testing dataset comprising data from ten cars, ten trucks, and 40 s of wind noise, the AVDC system has a detection accuracy of 100%, a classification accuracy of 95%, and no false alarms. The mean power draw of the FPAA is 43 μW, and the mean power consumption of the MCU and radio during the validation and wireless transmission process is 40.9 mW. Overall, this paper demonstrates that an FPAA-based signal preprocessor can greatly improve the flexibility and power consumption of wireless sensor nodes.
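The benefit of the FPAA wake-up architecture comes from duty cycling. The node's mean power is roughly the always-on FPAA power plus the MCU/radio active power weighted by the fraction of time they are awake. The Python sketch below uses the two reported power figures. The active fraction is a hypothetical parameter, since the abstract gives no event rate, and the MCU's sleep power is neglected.

```python
def mean_node_power(p_fpaa_w: float, p_active_w: float, active_fraction: float) -> float:
    """Average power of an always-on FPAA plus an MCU/radio that is awake
    only a fraction of the time (MCU sleep power neglected, an assumption)."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be in [0, 1]")
    return p_fpaa_w + active_fraction * p_active_w


P_FPAA = 43e-6       # reported mean FPAA power, W
P_ACTIVE = 40.9e-3   # reported MCU + radio power while validating/transmitting, W

# Hypothetical: MCU and radio awake 0.1% of the time.
p_mean = mean_node_power(P_FPAA, P_ACTIVE, 0.001)
```

At a 0.1% active fraction, the mean power is about 84 μW, versus about 40.9 mW if the MCU and radio stayed on. The MCU's share equals the FPAA's 43 μW at an active fraction of roughly 0.1%, so rare events are where the architecture pays off most.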

Journal Article
TL;DR: This position paper argues for the development of techniques for quantifying the ‘degree of secureness’ of embedded system design instances such that these can be incorporated in a multi-objective optimization process.
Abstract: As modern embedded systems become more and more ubiquitous and interconnected, they attract worldwide attention from attackers, and security is more important than ever during the design of these systems. Moreover, given the ever-increasing complexity of the applications that run on these systems, it becomes increasingly difficult to meet all security criteria. While extra-functional design objectives such as performance and power/energy consumption are typically taken into account already during the very early stages of embedded systems design, system security is still mostly considered an afterthought. That is, security is usually not considered in the process of (early) design-space exploration of embedded systems, the critical multi-objective optimization process that aims at optimizing the extra-functional behavior of a design. This position paper argues for the development of techniques for quantifying the ‘degree of secureness’ of embedded system design instances such that these can be incorporated in a multi-objective optimization process. Such technology would make it possible to optimize the security aspects of embedded systems during the earliest design phases and to study the trade-offs between security and the other design objectives, such as performance, power consumption and cost.

Journal Article
TL;DR: In this article, threshold voltage instability in commercial silicon carbide (SiC) power metal-oxide-semiconductor field-effect transistors (MOSFETs) was evaluated using devices from two different manufacturers.
Abstract: In this study, threshold voltage instability in commercial silicon carbide (SiC) power metal-oxide-semiconductor field-effect transistors (MOSFETs) was evaluated using devices from two different manufacturers. The characterization process included PBTI (positive bias temperature instability) and pulsed I-V measurements of the devices to determine the degradation of their electrical parameters. This work proposes an experimental procedure to characterize SiC power MOSFETs following two characterization methods: (1) using the one-spot drop-down (OSDD) measurement technique to assess the temperature dependence of the threshold voltage while the devices are subjected to high temperatures and different gate voltage stresses; and (2) processing the measurement data to obtain the variation of the hysteresis characteristics and the damage effect on the threshold voltage. Finally, based on the results, it was concluded that charge trapping does not damage the commercial devices, owing to the reduced value of the recovery voltage, when a small negative voltage is applied over a long stress time. The motivation of this research was to estimate the impact and importance of bias temperature instability for the application fields of SiC power n-MOSFETs. The importance of this study lies in identifying the aforementioned behavior in applications where SiC power n-MOSFETs work together with complementary MOS (CMOS) circuits.

Journal Article
TL;DR: A comparison between different neural spike detection algorithms is presented to find the optimum one for in vivo implanted EOSFET (electrolyte–oxide–semiconductor field-effect transistor) sensors, together with a figure of merit based on accuracy and resource consumption.
Abstract: This work presents a comparison between different neural spike detection algorithms to find the optimum one for in vivo implanted EOSFET (electrolyte–oxide–semiconductor field-effect transistor) sensors. EOSFET arrays are planar sensors capable of sensing the electrical activity of nearby neuron populations in both in vitro cultures and in vivo experiments. They are characterized by a high, cell-like resolution and low invasiveness compared with probes with passive electrodes, but they exhibit a higher noise power that requires ad hoc spike detection algorithms to detect relevant biological activity. Algorithms for implanted devices require good detection accuracy and low power consumption due to the limited power budget of implanted devices. A figure of merit (FoM) based on accuracy and resource consumption is presented and used to compare different algorithms from the literature, such as the smoothed nonlinear energy operator and correlation-based algorithms. A multi-transistor array (MTA) sensor of 7 honeycomb pixels with a 30 μm² area is simulated with Neurocube to generate a signal, which is then used to validate the algorithms’ performance. The results allow us to numerically determine the most efficient algorithm under the power constraints of implantable devices and to characterize its performance in terms of accuracy and resource usage.
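The smoothed nonlinear energy operator (SNEO), one of the spike detection algorithms compared above, computes ψ[n] = x[n]² − x[n−1]·x[n+1], smooths it, and flags threshold crossings. The Python sketch below is a minimal version of that idea. It makes the following assumptions: the rectangular smoothing window (Bartlett or Hamming windows are common alternatives), the window length, and the threshold rule C·mean(SNEO) with C = 8. It does not reproduce the paper's FoM or its Neurocube signals.

```python
def neo(x: list[float]) -> list[float]:
    """Nonlinear energy operator psi[n] = x[n]^2 - x[n-1]*x[n+1] (ends set to 0)."""
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] * x[n] - x[n - 1] * x[n + 1]
    return psi


def smooth(x: list[float], window: int) -> list[float]:
    """Same-length moving average (rectangular window, an assumed choice)."""
    half = window // 2
    out = []
    for n in range(len(x)):
        seg = x[max(0, n - half): n + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def sneo_detect(x: list[float], window: int = 3, c: float = 8.0) -> list[int]:
    """Return indices where the smoothed NEO rises through C * mean(SNEO).
    window and c are assumed tuning parameters."""
    s = smooth(neo(x), window)
    thr = c * sum(s) / len(s)
    return [n for n in range(1, len(s)) if s[n] >= thr > s[n - 1]]
```

The NEO emphasizes signals that are both large and fast-changing, which suits brief spikes in broadband noise. Each sample costs only two multiplications and a subtraction, which helps explain why this algorithm family is attractive under an implant's power budget.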

Journal Article
TL;DR: This paper dives deep into DNN architecture to uncover some unique challenges and opportunities for operation in the NTC paradigm, and reveals the severity of timing errors and their impact on inference accuracy at NTC.
Abstract: AI evolution is accelerating, and Deep Neural Network (DNN) inference accelerators are at the forefront of the ad hoc architectures evolving to support the immense throughput required for AI computation. However, much more energy-efficient design paradigms are indispensable to realize the complete potential of AI evolution and curtail energy consumption. The Near-Threshold Computing (NTC) design paradigm can serve as the best candidate for providing the required energy efficiency. However, NTC operation is plagued by numerous performance and reliability concerns arising from timing errors. In this paper, we dive deep into DNN architecture to uncover some unique challenges and opportunities for operation in the NTC paradigm. By performing rigorous simulations of a TPU systolic array, we reveal the severity of timing errors and their impact on inference accuracy at NTC. We analyze various attributes (such as the data–delay relationship, delay disparity within arithmetic units, utilization pattern, hardware homogeneity and workload characteristics) and uncover unique localized and global techniques to deal with timing errors in NTC.

Journal Article
TL;DR: This study explored, by means of simulations, a case study and three figures of merit used in the transconductance-to-drain-current method, and concludes for the first time that the method should be reformulated.
Abstract: The transconductance-to-drain-current (gm/ID) method is a transistor sizing methodology commonly used in CMOS technology. In this study, we used simulations to explore a case study and three figures of merit used in the method, and we conclude for the first time that the method should be reformulated. The study was performed on a commercial 28 nm Ultra-Thin Body and Buried oxide Fully Depleted Silicon-On-Insulator (UTBB FD-SOI) technology with low-threshold-voltage NFETs, and the simulations were performed with the Spectre Circuit Simulator using the device model card. To our knowledge, no previous attempts have been made to assess the capability of the method. Our results indicate that the method should be reformulated, or considered incomplete, for use with this technology, which has ramifications for process modeling, simulation and circuit design.
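The quantity at the heart of the gm/ID method is gm/ID = d(ln ID)/dVGS, the transconductance obtained per unit of bias current. In weak inversion, an exponential drain current gives the well-known upper bound 1/(n·UT). The Python sketch below computes gm/ID numerically from a toy subthreshold model. The model, I0 and the slope factor n = 1.3 are hypothetical, and it does not capture the 28 nm FD-SOI device behavior the study examines.

```python
import math

U_T = 0.02585  # thermal voltage kT/q at about 300 K, in volts


def id_weak_inversion(vgs: float, i0: float = 1e-9, n: float = 1.3) -> float:
    """Toy subthreshold model I_D = I0 * exp(VGS / (n * U_T)); i0 and n are hypothetical."""
    return i0 * math.exp(vgs / (n * U_T))


def gm_over_id(id_func, vgs: float, dv: float = 1e-4) -> float:
    """gm/ID = d ln(I_D) / dVGS, estimated with a central finite difference."""
    return (math.log(id_func(vgs + dv)) - math.log(id_func(vgs - dv))) / (2 * dv)
```

For this toy model, `gm_over_id(id_weak_inversion, 0.2)` returns about 29.8 V⁻¹, matching 1/(1.3·UT). With a real device model card, the same finite-difference routine would trace how gm/ID falls from that plateau toward strong inversion, which is the curve the sizing method relies on.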

Journal Article
TL;DR: This paper proposes a design space exploration framework for OFDMA-based RF-NoC architectures, which takes advantage of both real application benchmarks simulated with Sniper and an RF-NoC architecture modeled with Noxim, and uses the proposed framework to finely configure a routing algorithm.
Abstract: The Network-on-Chip (NoC) paradigm has been proposed as a promising solution for handling a high degree of integration in multi-/many-core architectures. Despite their advantages, wired NoC infrastructures face several performance issues in multi-hop, long-distance communications. RF-NoC is an attractive solution offering high performance and multicast/broadcast capabilities. However, managing RF links is a critical aspect that depends on both application-dependent and architectural parameters. This paper proposes a design space exploration framework for OFDMA-based RF-NoC architectures, which takes advantage of both real application benchmarks simulated with Sniper and an RF-NoC architecture modeled with Noxim. We used the proposed framework to finely configure a routing algorithm working with real traffic, achieving up to 45% delay reduction compared with a wired NoC setup under similar conditions.

Journal ArticleDOI
TL;DR: This investigation reveals that different latency constraints can be met even under continuous inference, albeit with a severe accuracy penalty imposed by thermal constraints, and empirically demonstrates that thermal behavior does not benefit from topology scaling, as the on-chip temperature still reaches critical values that affect reliability and user satisfaction.
Abstract: Embedded Convolutional Neural Networks (ConvNets) are driving the evolution of ubiquitous systems that can sense and understand the environment autonomously. Because these networks are highly complex, aggressive compression is needed to meet the specifications of portable end-nodes. A variety of algorithmic optimizations are available today, from custom quantization and filter pruning to modular topology scaling, which enable fine-tuning of the hyperparameters and the right balance between quality, performance, and resource usage. Nonetheless, implementing systems capable of sustaining continuous inference over long periods is still a primary concern, since the limited thermal design power of general-purpose embedded CPUs prevents execution at maximum speed. Neglecting this aspect may result in substantial mismatches and violations of the design constraints. The objective of this work was to assess topology scaling as a design knob to control the performance and thermal stability of inference engines for image classification. To this end, we built a characterization framework to inspect both the functional (accuracy) and non-functional (latency and temperature) metrics of two ConvNet models, MobileNet and MnasNet, ported onto a commercial low-power CPU, the ARM Cortex-A15. Our investigation reveals that different latency constraints can be met even under continuous inference, albeit with a severe accuracy penalty imposed by thermal constraints. Moreover, we empirically demonstrate that thermal behavior does not benefit from topology scaling, as the on-chip temperature still reaches critical values that affect reliability and user satisfaction.
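Topology scaling in MobileNet-style networks is usually expressed through a width multiplier α and a resolution multiplier ρ. The sketch below assumes those are the knobs meant here and computes the multiply-accumulate (MAC) count of one depthwise-separable layer, DK·DK·αM·F² + αM·αN·F² with F = ρ·DF. This shows how cost, and hence heat, scales with the multipliers. It is a per-layer illustration, not the characterization framework used in the work.

```python
def dw_separable_macs(dk: int, m: int, n: int, df: int,
                      alpha: float = 1.0, rho: float = 1.0) -> int:
    """MACs of one depthwise-separable layer.

    dk: kernel size, m/n: input/output channels, df: output feature-map side,
    alpha: width multiplier, rho: resolution multiplier.
    """
    m_s = max(1, round(alpha * m))   # thinned input channels
    n_s = max(1, round(alpha * n))   # thinned output channels
    f = max(1, round(rho * df))      # scaled feature-map side
    depthwise = dk * dk * m_s * f * f  # one dk x dk filter per input channel
    pointwise = m_s * n_s * f * f      # 1x1 convolution mixing channels
    return depthwise + pointwise


full = dw_separable_macs(3, 512, 512, 14)
half = dw_separable_macs(3, 512, 512, 14, alpha=0.5)
```

Halving α cuts this layer's cost to roughly a quarter, because the dominant pointwise term scales with α². Sustained power and temperature, however, also depend on how long the CPU stays busy, which the MAC count alone does not capture.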

Journal ArticleDOI
TL;DR: Kriging-based surrogate models of the circuits' performances were constructed and then used within a metaheuristic-based optimization kernel to optimize the circuits' sizing.
Abstract: Low-voltage low-power (LVLP) circuit design and optimization is a hard and time-consuming task. In this study, we apply the newly proposed meta-modelling technique to alleviate such burdens. Kriging-based surrogate models of the circuits' performances were constructed and then used within a metaheuristic-based optimization kernel to optimize the circuits' sizing. The JAYA algorithm was used for this purpose. Three topologies of CMOS second-generation current conveyors (CCII) were considered to showcase the proposed approach. The achieved performances were compared with those obtained using conventional LVLP circuit sizing techniques, and we show that our approach offers interesting results.
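The basic JAYA update moves each candidate toward the best solution and away from the worst one: x' = x + r1(x_best − |x|) − r2(x_worst − |x|), with greedy acceptance. The minimal sketch below shows that update in a sizing-style loop. The objective `surrogate_cost` is a hypothetical smooth stand-in for a trained Kriging surrogate, and the bounds and optimum are invented, so this illustrates the optimizer rather than the paper's circuits.

```python
import random
from typing import Callable, List, Sequence, Tuple


def jaya(objective: Callable[[List[float]], float],
         bounds: Sequence[Tuple[float, float]],
         pop_size: int = 20, iters: int = 200, seed: int = 0):
    """Minimize `objective` with the basic JAYA update (no tuning parameters)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    for _ in range(iters):
        best = pop[min(range(pop_size), key=fit.__getitem__)]
        worst = pop[max(range(pop_size), key=fit.__getitem__)]
        for i in range(pop_size):
            cand = []
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                x = pop[i][j]
                # Move toward the best solution and away from the worst one
                v = x + r1 * (best[j] - abs(x)) - r2 * (worst[j] - abs(x))
                cand.append(min(max(v, lo), hi))  # clip to the sizing bounds
            f = objective(cand)
            if f < fit[i]:                         # greedy acceptance
                pop[i], fit[i] = cand, f
    k = min(range(pop_size), key=fit.__getitem__)
    return pop[k], fit[k]


def surrogate_cost(x: List[float]) -> float:
    """Hypothetical stand-in for a Kriging surrogate over two device sizes."""
    w, l = x
    return (w - 2.0) ** 2 + (l - 3.0) ** 2


best, cost = jaya(surrogate_cost, [(0.1, 10.0), (0.1, 10.0)])
```

A surrogate fits this loop well because each call is cheap. The expensive SPICE runs are spent only on building the surrogate and on verifying the final sizing.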

Journal ArticleDOI
TL;DR: The design of an approximate comparator used for performing mantissa products in floating-point multipliers is examined, and the design space of approximate comparators for designing efficient approximate comparator-enabled multipliers (AxCEM) is explored.
Abstract: Floating-point multipliers are a key component of nearly all forms of modern computing systems. Most data-intensive applications, such as deep neural networks (DNNs), expend the majority of their resources and energy budget on floating-point multiplication. The error-resilient nature of these applications suggests employing approximate computing to improve the energy efficiency, performance, and area of floating-point multipliers. Prior work has shown that employing hardware-oriented approximation for computing the mantissa product may result in significant system energy reduction at the cost of an acceptable computational error. This article examines the design of an approximate comparator used for performing mantissa products in floating-point multipliers. First, we illustrate the use of exact comparators for improving the power, area, and delay of floating-point multipliers. Then, we explore the design space of approximate comparators for designing efficient approximate comparator-enabled multipliers (AxCEM). Our simulation results indicate that the proposed architecture can achieve a 66% reduction in power dissipation, a 66% reduction in die area, and a 71% decrease in delay. Compared with state-of-the-art approximate floating-point multipliers, the accuracy loss in DNN applications due to the proposed AxCEM is less than 0.06%.
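A classic approximation of the mantissa product shows what "hardware-oriented approximation" can mean. The sketch below uses Mitchell's logarithmic multiplication, a different and older technique, not the comparator-based AxCEM scheme. For x = 2^e·(1+f), it takes log2(1+f) ≈ f, so the mantissa product (1+fa)(1+fb) becomes an addition of fa and fb. This always underestimates the exact product, by at most 1/9 (about 11.1%).

```python
import math


def decompose(v: float):
    """Split v != 0 into (sign, exponent e, fraction f) with |v| = 2**e * (1 + f)."""
    m, e = math.frexp(abs(v))          # |v| = m * 2**e with 0.5 <= m < 1
    return math.copysign(1.0, v), e - 1, 2.0 * m - 1.0


def mitchell_mul(a: float, b: float) -> float:
    """Approximate a*b by adding exponents and fractions (Mitchell's method)."""
    if a == 0.0 or b == 0.0:
        return 0.0
    sa, ea, fa = decompose(a)
    sb, eb, fb = decompose(b)
    s = fa + fb                        # approximate log2 of the mantissa product
    if s < 1.0:
        mant, exp = 1.0 + s, ea + eb   # no carry into the exponent
    else:
        mant, exp = s, ea + eb + 1     # carry: 2**(1 + (s - 1)) ~ 2 * s
    return sa * sb * math.ldexp(mant, exp)
```

For example, `mitchell_mul(2.0, 3.0)` is exact because one fraction is zero. `mitchell_mul(1.5, 1.5)` gives 2.0 instead of 2.25, which is the worst case.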

Journal ArticleDOI
TL;DR: This paper presents an architecture for interpolation filters that trades quality for energy and power efficiency by exploiting approximate interpolation filters and by halving the amount of required memory with respect to state-of-the-art implementations.
Abstract: High Efficiency Video Coding (HEVC) is a recent video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC). HEVC offers better compression than preceding standards, but it suffers from high computational complexity. In particular, one of the most time-consuming blocks in HEVC is the fractional-sample interpolation filter, which is used in both the encoding and the decoding processes. Integrating different state-of-the-art techniques, this paper presents an architecture for interpolation filters that trades quality for energy and power efficiency by exploiting approximate interpolation filters and by halving the amount of required memory with respect to state-of-the-art implementations.
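The exact filters that approximate designs start from can be sketched as follows. HEVC luma interpolation uses 8-tap filters: (−1, 4, −11, 40, 40, −11, 4, −1) for the half-sample position and (−1, 4, −10, 58, 17, −5, 1, 0) for the quarter-sample position, each summing to 64. The sketch is deliberately simplified. It filters a single 8-bit row with one rounding step, (acc + 32) >> 6, and edge replication. Real HEVC keeps higher-precision intermediates for 2-D separable filtering and bi-prediction, and the paper's approximate filters are not shown.

```python
from typing import List, Sequence

HALF_PEL = (-1, 4, -11, 40, 40, -11, 4, -1)      # HEVC luma half-sample taps
QUARTER_PEL = (-1, 4, -10, 58, 17, -5, 1, 0)     # HEVC luma quarter-sample taps


def interpolate_row(row: Sequence[int], taps: Sequence[int],
                    bit_depth: int = 8) -> List[int]:
    """Fractional sample between row[i] and row[i+1] for every i (simplified)."""
    n = len(row)
    max_val = (1 << bit_depth) - 1
    out = []
    for i in range(n):
        acc = 0
        for k, c in enumerate(taps):
            j = min(max(i + k - 3, 0), n - 1)    # taps cover row[i-3..i+4], edges replicated
            acc += c * row[j]
        out.append(min(max((acc + 32) >> 6, 0), max_val))  # divide by 64, round, clip
    return out


ramp = [10 * i for i in range(10)]
halves = interpolate_row(ramp, HALF_PEL)
```

On a linear ramp, the half-sample outputs away from the borders land exactly midway between neighbors; for example, the value between 30 and 40 is 35. The multiply-heavy inner loop is the part that approximate designs simplify.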

Journal ArticleDOI
TL;DR: A reinforcement learning based approach is proposed that jointly optimizes profit and energy in the allocation of jobs to available resources without requiring prior profiling information; a network-aware server consolidation algorithm is also presented to address the under-utilization of servers.
Abstract: Servers in a data center are underutilized due to over-provisioning, which contributes heavily to the high power consumption of data centers. Recent research on optimizing the energy consumption of High Performance Computing (HPC) data centers mostly focuses on consolidation of Virtual Machines (VMs) and on dynamic voltage and frequency scaling (DVFS). These approaches are inherently hardware-based, are frequently specific to individual systems, and often rely on simulation due to lack of access to HPC data centers. Other approaches require profiling information on the jobs in the HPC system to be available before run-time. In this paper, we propose a reinforcement learning based approach that jointly optimizes profit and energy in the allocation of jobs to available resources, without the need for such prior information. The approach is implemented in a software scheduler used to allocate real applications from the Princeton Application Repository for Shared-Memory Computers (PARSEC) benchmark suite to a number of hardware nodes realized with Odroid-XU3 boards. Experiments show that the proposed approach increases the profit earned by 40% while simultaneously reducing energy consumption by 20% when compared with a heuristic-based approach. We also present a network-aware server consolidation algorithm for HPC data centers, called Bandwidth-Constrained Consolidation (BCC), which addresses the under-utilization of servers. Our experiments show that the BCC consolidation technique can reduce the power consumption of a data center by up to 37%.
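A toy learner shows the idea of learning an allocation that trades profit against energy without prior job profiles. The sketch below is a stateless epsilon-greedy (bandit) simplification of reinforcement learning, not the paper's formulation, whose states, actions, and rewards are not given here. The reward is profit minus a weighted energy cost. The node names, profits, energies, and `energy_weight` are all hypothetical. The scheduler learns each node's value only from observed rewards.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    profit_per_job: float   # hypothetical revenue for finishing a job on this node
    energy_per_job: float   # hypothetical energy (J) spent per job


def reward(node: Node, energy_weight: float) -> float:
    """Joint objective: profit minus weighted energy cost."""
    return node.profit_per_job - energy_weight * node.energy_per_job


def train_scheduler(nodes: List[Node], episodes: int = 500, epsilon: float = 0.2,
                    energy_weight: float = 0.5, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    q = [0.0] * len(nodes)        # estimated value of sending a job to each node
    counts = [0] * len(nodes)
    for _ in range(episodes):
        if rng.random() < epsilon:
            a = rng.randrange(len(nodes))                  # explore
        else:
            a = max(range(len(nodes)), key=q.__getitem__)  # exploit
        r = reward(nodes[a], energy_weight)
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]                     # incremental sample mean
    return q


nodes = [Node("a", 10.0, 8.0), Node("b", 9.0, 2.0), Node("c", 12.0, 14.0)]
q = train_scheduler(nodes)
```

Changing `energy_weight` shifts the learned preference. With a weight of 0.5 the frugal node "b" wins, while with a weight of 0 the most profitable node "c" wins, which is the profit/energy trade-off in miniature.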

Journal ArticleDOI
TL;DR: A low-noise current readout circuit with a 140 dB input dynamic range and a noise floor of 10 fArms/√Hz is presented, built from a programmable bidirectional input current gain stage followed by an integrator-based analog-to-pulse conversion stage.
Abstract: Designing low-noise current readout circuits at high speed is challenging. Preamplification stages are needed to amplify weak input currents before they are processed by a conventional integrator-based readout. However, a high-current-gain preamplification stage usually limits the dynamic range. This article presents a low-noise current readout circuit with a 140 dB input dynamic range and a noise floor of 10 fArms/√Hz. The architecture uses a programmable bidirectional input current gain stage followed by an integrator-based analog-to-pulse conversion stage. The programmable current gain setting enables a higher overall input dynamic range. The readout circuit is designed in 0.18 μm CMOS and consumes 10.3 mW from a 1.8 V supply. The circuit has been verified using post-layout simulations.
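Some quick arithmetic helps interpret the two headline numbers. For currents, a dynamic range in dB is usually 20·log10(I_max/I_min), so 140 dB is a ratio of 10^7. A white noise floor integrates over bandwidth as density × √BW. The sketch below assumes that convention and a hypothetical 1 kHz bandwidth. The paper's figure spans its programmable gain settings, so treat this as an order-of-magnitude illustration only.

```python
import math


def db_to_ratio(dr_db: float) -> float:
    """Current ratio for a dynamic range in dB (20*log10 convention)."""
    return 10 ** (dr_db / 20)


def rms_noise(density_a_per_rthz: float, bandwidth_hz: float) -> float:
    """Integrated rms noise current of a white noise floor over a bandwidth."""
    return density_a_per_rthz * math.sqrt(bandwidth_hz)


ratio = db_to_ratio(140)            # 1e7
i_min = rms_noise(10e-15, 1e3)      # about 316 fA rms at a hypothetical 1 kHz
i_max = i_min * ratio               # about 3.16 uA at the top of the range
```

Under these assumptions, a single fixed-gain stage would have to span about 316 fA to 3.16 µA. This is the motivation for the programmable current gain ahead of the integrator.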

Journal ArticleDOI
TL;DR: A typical foundry perspective is presented and a detailed description of the chemical mechanical polishing process and the coverage dependency is provided, followed by a comprehensive description of coverage rules needed for dielectric, poly, and Cu layers used in advanced technologies.
Abstract: The continuous scaling needed for higher density and better performance has introduced new challenges for the planarization processes. This has resulted in new definitions of the layout coverage rules developed by the foundry and provided to the designers. In advanced technologies, the set of rules considers both the global and the local coverage of layers ranging from the front-end-of-line (FEOL) dielectric layers to the back-end-of-line (BEOL) Cu and Al layers, to support high-k/metal gate process integration. For advanced technologies, integrated circuit (IC) manufacturers developed a new set of rules for dummy feature insertion in order to meet coverage limits. New models and utilities for fill insertion were developed, taking into consideration the design coverage, thermal effects, sensitive signal lines, critical analog and RF devices such as inductors, and double patterning requirements, among others. To minimize proximity effects, cell insertion was also introduced. This review is based on published data from leading IC manufacturers, carefully integrated with new experimental data accumulated by the authors. We aim to present a typical foundry perspective. The review provides a detailed description of the chemical mechanical polishing (CMP) process and its coverage dependency, followed by a comprehensive description of the coverage rules needed for the dielectric, poly, and Cu layers used in advanced technologies. Coverage rule verification data are then presented. RF-related aspects of some rules, such as the size of dummy features and their distance from inductors, are discussed along with additional design-for-manufacturing layout recommendations developed by the industry.
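The split between global and local coverage can be shown with a simple density check on a rasterized layout. The sketch below measures global coverage over the whole grid and local coverage in stepped square windows, then flags windows outside a density band. The grid representation, window size, step, and the 20% to 80% limits are hypothetical placeholders, not any foundry's actual rule values.

```python
from typing import List, Tuple

Layout = List[List[int]]  # 1 = drawn feature (or fill) in a grid cell, 0 = empty


def global_coverage(layout: Layout) -> float:
    cells = sum(len(row) for row in layout)
    return sum(sum(row) for row in layout) / cells


def window_densities(layout: Layout, win: int,
                     step: int) -> List[Tuple[Tuple[int, int], float]]:
    """Local coverage of every win x win window, stepped by `step` cells."""
    rows, cols = len(layout), len(layout[0])
    out = []
    for r in range(0, rows - win + 1, step):
        for c in range(0, cols - win + 1, step):
            filled = sum(layout[i][j] for i in range(r, r + win)
                         for j in range(c, c + win))
            out.append(((r, c), filled / (win * win)))
    return out


def coverage_violations(layout: Layout, win: int, step: int,
                        min_density: float, max_density: float):
    """Windows whose local coverage falls outside [min_density, max_density]."""
    return [(pos, d) for pos, d in window_densities(layout, win, step)
            if not (min_density <= d <= max_density)]


layout = [[1, 1, 0, 0] for _ in range(4)]
```

In this example, the layout meets a 50% global target. Yet every 2×2 window is either fully covered or empty, so all of them violate the local band. This mismatch is why dummy fill has to be inserted locally, not just budgeted globally.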

Journal ArticleDOI
TL;DR: Two new electronically tunable current-mode square-wave generators are introduced, and their results demonstrate good agreement with the theoretical values.
Abstract: Two new electronically tunable current-mode square-wave generators are introduced in this paper. The first proposed square-wave generator uses one Operational Trans-resistance Amplifier (OTRA), two passive components, and two NMOS depletion-mode transistors. This circuit generates a square wave with an almost equal, fixed duty cycle. The second proposed circuit can control the on-duty and off-duty cycles independently with the help of two passive components, two NMOS depletion-mode transistors, and two diodes. The frequency of the proposed circuits can be adjusted with the connected passive components, and electronic tuning can also be achieved. The measured results included in the paper show a linear variation of the time period, compared with an existing OTRA-based square-wave generator. The performance of the proposed circuits is examined using SPICE models. The circuits are built on a laboratory breadboard using a commercially available Current Feedback Operational Amplifier (AD844AN), with the passive components connected externally, and are tested for square-wave generation. The obtained results demonstrate good agreement with the theoretical values.