Showing papers in "IEEE Journal on Emerging and Selected Topics in Circuits and Systems in 2018"


Journal ArticleDOI
TL;DR: For the first time, the applicability and practicality of approximate multipliers in multiple-input multiple-output antenna communication systems with error control coding are shown.
Abstract: Approximate computing has been considered to improve the accuracy-performance tradeoff in error-tolerant applications. For many of these applications, multiplication is a key arithmetic operation. Given that approximate compressors are a key element in the design of power-efficient approximate multipliers, we first propose an initial approximate 4:2 compressor that introduces a rather large error to the output. However, the number of faulty rows in the compressor’s truth table is significantly reduced by encoding its inputs using generate and propagate signals. Based on this improved compressor, two $4\times 4$ multipliers are designed with different accuracies and are then used as building blocks for scaling up to $16\times 16$ and $32\times 32$ multipliers. According to the mean relative error distance (MRED), the most accurate of the proposed $16\times 16$ unsigned designs has a 44% smaller power-delay product (PDP) compared to other designs with comparable accuracy. The radix-4 signed Booth multiplier constructed using the proposed compressor achieves a 52% reduction in the PDP-MRED product compared to other approximate Booth multipliers with comparable accuracy. The proposed multipliers outperform other approximate designs in image sharpening and joint photographic experts group applications by achieving higher quality outputs with lower power consumption. For the first time, we show the applicability and practicality of approximate multipliers in multiple-input multiple-output antenna communication systems with error control coding.

134 citations
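
The abstract above ranks designs by mean relative error distance (MRED). As a rough illustration of how that metric is computed, the sketch below evaluates it exhaustively for a toy 4-bit multiplier. The function `approx_mul` is a hypothetical stand-in that simply clears the two lowest product bits; it is not the compressor-based design from the paper. Skipping input pairs whose exact product is zero is also an assumption made here to avoid division by zero.

```python
def approx_mul(a: int, b: int) -> int:
    """Hypothetical approximate multiplier: clear the two lowest product bits."""
    return (a * b) & ~0b11


def exact_mul(a: int, b: int) -> int:
    """Reference exact multiplier."""
    return a * b


def mred(multiplier, bits: int = 4) -> float:
    """Mean relative error distance over all nonzero unsigned input pairs."""
    total, count = 0.0, 0
    for a in range(1, 2 ** bits):
        for b in range(1, 2 ** bits):
            exact = a * b
            # Relative error distance for this input pair.
            total += abs(multiplier(a, b) - exact) / exact
            count += 1
    return total / count


if __name__ == "__main__":
    print(f"MRED of the toy multiplier: {mred(approx_mul):.4f}")
```

An exact multiplier gives an MRED of zero, so the metric can be read as the average fractional deviation introduced by the approximation.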


Journal ArticleDOI
TL;DR: A mapping algorithm with inner fault tolerance is proposed to convert matrix parameters into RRAM conductances in RCS and tolerate SAFs by fully exploring the available mapping space to ensure that RCS is effective when the percentage of faulty RRAM cells is high.
Abstract: Emerging metal-oxide resistive switching random-access memory (RRAM) devices and RRAM crossbars have demonstrated their potential in boosting the speed and energy-efficiency of analog matrix-vector multiplication. However, due to the immature fabrication technology, commonly occurring Stuck-At-Faults (SAFs) seriously degrade the computational accuracy of an RRAM-based computing system (RCS). In this paper, we present a fault-tolerant framework for RCS. A mapping algorithm with inner fault tolerance is proposed to convert matrix parameters into RRAM conductances in RCS and tolerate SAFs by fully exploring the available mapping space. Two baseline redundancy schemes are proposed to ensure that RCS is effective when the percentage of faulty RRAM cells is high. To reduce the number of redundant RRAM cells when the SAFs follow a non-uniform distribution or an unknown distribution, a distribution-aware redundancy scheme and a re-configurable redundancy scheme are proposed to provide dynamic fault tolerance. Simulation results show that the baseline redundancy schemes can improve the recognition accuracy on the MNIST data set to almost the same level as the RRAM-fault-free case, with an energy overhead of approximately 30%. When SAFs follow a non-uniform or an unknown distribution, the distribution-aware and re-configurable schemes can reduce the number of redundant RRAM cells from more than 200% to less than 40% and 60%, respectively, without reducing the recognition accuracy.

117 citations


Journal ArticleDOI
TL;DR: In this article, a detailed circuit and device analysis of a training accelerator may serve as a foundation for further architecture-level studies, and the possible gains over a similar digital-only version of this accelerator block suggest that continued optimization of analog resistive memories is valuable.
Abstract: Neural networks are an increasingly attractive algorithm for natural language processing and pattern recognition. Deep networks with >50 M parameters are made possible by modern graphics processing unit clusters operating at […]. The analog accelerator has a $270\times$ energy and $540\times$ latency advantage over a similar block utilizing only digital ReRAM and takes only 11 fJ per multiply and accumulate. Compared with an SRAM-based accelerator, the energy is $430\times$ better and latency is $34\times$ better. Although training accuracy is degraded in the analog accelerator, several options to improve this are presented. The possible gains over a similar digital-only version of this accelerator block suggest that continued optimization of analog resistive memories is valuable. This detailed circuit and device analysis of a training accelerator may serve as a foundation for further architecture-level studies.

104 citations


Journal ArticleDOI
TL;DR: A survey of recent works in developing neuromorphic or neuro-inspired hardware systems, focusing on those systems which can either learn from data in an unsupervised or online supervised manner, and present algorithms and architectures developed specially to support on-chip learning.
Abstract: In this paper, we present a survey of recent works in developing neuromorphic or neuro-inspired hardware systems. In particular, we focus on those systems which can either learn from data in an unsupervised or online supervised manner. We present algorithms and architectures developed specially to support on-chip learning. Emphasis is placed on hardware friendly modifications of standard algorithms, such as backpropagation, as well as novel algorithms, such as structural plasticity, developed specially for low-resolution synapses. We cover works related to both spike-based and more traditional non-spike-based algorithms. This is followed by developments in novel devices, such as floating-gate MOS, memristors, and spintronic devices. CMOS circuit innovations for on-chip learning and CMOS interface circuits for post-CMOS devices, such as memristors, are presented. Common architectures, such as crossbar or island style arrays, are discussed, along with their relative merits and demerits. Finally, we present some possible applications of neuromorphic hardware, such as brain–machine interfaces, robotics, etc., and identify future research trends in the field.

90 citations


Journal ArticleDOI
TL;DR: This work proposes an efficient hardware architecture to implement gradient boosted trees in applications under stringent power, area, and delay constraints, such as medical devices, and introduces the concepts of asynchronous tree operation and sequential feature extraction to achieve an unprecedented energy and area efficiency.
Abstract: Biomedical applications often require classifiers that are both accurate and cheap to implement. Today, deep neural networks achieve state-of-the-art accuracy in most learning tasks that involve large data sets of unstructured data. However, the application of deep learning techniques may not be beneficial in problems with limited training sets and computational resources, or under domain-specific test time constraints. Among other algorithms, ensembles of decision trees, particularly gradient boosted models, have recently been very successful in machine learning competitions. Here, we propose an efficient hardware architecture to implement gradient boosted trees in applications under stringent power, area, and delay constraints, such as medical devices. Specifically, we introduce the concepts of asynchronous tree operation and sequential feature extraction to achieve an unprecedented energy and area efficiency. The proposed architecture is evaluated in automated seizure detection for epilepsy, using 3074 h of intracranial EEG data from 26 patients with 393 seizures. Average F1 scores of 99.23% and 87.86% are achieved for random and block-wise splitting of data into train/test sets, respectively, with an average detection latency of 1.1 s. The proposed classifier is fabricated in a 65-nm TSMC process, consuming 41.2 nJ/class in a total area of $540\times 1850\,\mathrm{\mu m}^{2}$. This design improves on the state of the art with a $27\times$ reduction in energy-area-latency product. Moreover, the proposed gradient-boosting architecture offers the flexibility to accommodate variable tree counts specific to each patient, to trade predictive accuracy against energy. This patient-specific and energy-quality scalable classifier holds great promise for low-power sensor data classification in biomedical applications.

87 citations


Journal ArticleDOI
TL;DR: A two-layer perceptron network is successfully trained online and the classification accuracy of MNIST handwritten digit data set is improved by using 6-/8-b analog synapses, respectively, with extremely high asymmetric nonlinearity.
Abstract: Asymmetric nonlinear weight update is considered as one of the major obstacles for realizing hardware neural networks based on analog resistive synapses, because it significantly compromises the online training capability. This paper provides new solutions to this critical issue through co-optimization with the hardware-applicable deep-learning algorithms. New insights on engineering activation functions and a threshold weight update scheme effectively suppress the undesirable training noise induced by inaccurate weight update. We successfully trained a two-layer perceptron network online and improved the classification accuracy of MNIST handwritten digit data set to 87.8%/94.8% by using 6-/8-b analog synapses, respectively, with extremely high asymmetric nonlinearity.

81 citations


Journal ArticleDOI
TL;DR: Comparative analysis of the classification performance under different sleep stage patterns with prior works has been carried out to show the significant improvements over state-of-the-art solutions, and suggest that the proposed scheme is suitable for long-term sleep monitoring.
Abstract: Sleep stage estimation is crucial to the evaluation of sleep quality and is a proven biometric in diagnosing cardiovascular diseases. In this paper, we design a continuous wave (CW) Doppler radar to accurately measure sleep-related signals, including respiration, heartbeat, and body movement. Body movement index, respiration per minute (RPM), variance of RPM, amplitude difference accumulation (ADA) of respiration, rapid eye movement parameter, sample entropy, heartbeat per minute (HPM), variance of HPM, ADA of heartbeat, deep parameter, and time feature have been extracted and fed into different machine learning classifiers. A total of 11 all night polysomnography recordings from 13 healthy examinees were used to validate the proposed CW Doppler radar system and the ability to detect sleep stage information from it. Comparative studies and statistical results have shown that the subspace K-nearest neighbor algorithm outperforms the other classifiers with the highest accuracy of up to 86.6%. With the Relief F algorithm, features have been ranked, and the selected feature subsets have been preliminary tested to identify the optimal feature subset. Meanwhile, comparative analysis of our classification performance under different sleep stage patterns with prior works has been carried out to show the significant improvements over state-of-the-art solutions. These results suggest that the proposed scheme is suitable for long-term sleep monitoring.

67 citations


Journal ArticleDOI
TL;DR: An energy-efficient and high throughput architecture for convolutional neural networks (CNN) employing a deep in-memory architecture, to embed energy-efficient low swing mixed-signal computations in the periphery of the SRAM bitcell array.
Abstract: This paper presents an energy-efficient and high throughput architecture for convolutional neural networks (CNN). Architectural and circuit techniques are proposed to address the dominant energy and delay costs associated with data movement in CNNs. The proposed architecture employs a deep in-memory architecture, to embed energy-efficient low swing mixed-signal computations in the periphery of the SRAM bitcell array. An efficient data access pattern and a mixed-signal multiplier are proposed to exploit data reuse opportunities in convolution. Silicon-validated energy, delay, and behavioral models of the proposed architecture are developed and employed to perform large-scale system simulations. System-level simulations using these models show >97% detection accuracy on the MNIST data set, along with $4.9\times $ and $2.4\times $ improvements in energy efficiency and throughput, respectively, leading to $11.9\times $ reduction in energy-delay product as compared with a conventional (SRAM + digital processor) architecture.

66 citations
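
The reported energy-delay product (EDP) figure can be checked against the two individual gains. The arithmetic below assumes that the energy-efficiency improvement equals the ratio of energy per operation and that the throughput improvement equals the ratio of delays; the subscripts "conv" (conventional) and "prop" (proposed) are labels introduced here for illustration.

$$
\frac{\mathrm{EDP}_{\text{conv}}}{\mathrm{EDP}_{\text{prop}}}
= \frac{E_{\text{conv}}}{E_{\text{prop}}}\cdot\frac{D_{\text{conv}}}{D_{\text{prop}}}
\approx 4.9 \times 2.4 = 11.76
$$

Under those assumptions the product of the rounded factors is 11.76, which is consistent with the reported $11.9\times$ if the individual factors were rounded before being quoted.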


Journal ArticleDOI
TL;DR: This paper investigates a technique to attain the requisite signal to noise ratio by dc offset management, and presents a detailed exploration of the unique features in respiration signals using noncontact CW Doppler radar.
Abstract: A low distortion dc coupled CW radar system with high signal to noise ratio is capable of accurate representation of respiration in human subjects. We propose to test the hypothesis that a non-contact physiological radar monitoring system, which measures and characterizes subtle body kinematics, can be made to resolve patterns accurately enough to recognize an individual’s identity. This paper investigates a technique to attain the requisite signal to noise ratio by dc offset management. A detailed exploration of the unique features in respiration signals using noncontact CW Doppler radar is presented. A proposed dynamic segmentation technique allowed detection of various unique features and patterns. KMN nearest neighbor and majority vote algorithms were implemented in software for this radar-based unique identification system. The system was tested and validated for six test subjects with a 95% success rate. Fractal analysis of minor components of the linearly demodulated radar signal is also presented for additional improvement in accuracy. This paper is believed to be significant as radar unique identification of human subjects has many potential applications, including security, health monitoring, IoT applications, and virtual reality.

55 citations


Journal ArticleDOI
TL;DR: This paper proposes an integer convolutional neural network (CNN) implementation, Integer-Net, as a memory-efficient unified hardware-friendly CNN framework, and discusses the structure of the integer convolution to improve the computational gain and reduce the inference time, both of which are crucial for real-time application.
Abstract: Outstanding seizure detection algorithms have been developed over the past two decades. Despite this success, their implementations as part of implantable or wearable devices are still limited. These works are mainly based on heavily handcrafted feature extraction, which is computationally expensive and is shown to be data set specific. These issues greatly limit the applicability of such methods to hardware implementation, including in-silicon implementations such as application specific integrated circuits. In this paper, we propose an integer convolutional neural network (CNN) implementation, Integer-Net, as a memory-efficient unified hardware-friendly CNN framework. The performance of Integer-Net is evaluated with multiple time-series data sets consisting of intracranial and scalp electroencephalogram (EEG) signals. Integer-Net shows a consistent seizure detection performance across three data sets: Freiburg Hospital intracranial EEG data set, Children’s Hospital of Boston-MIT scalp EEG data set, and UPenn and Mayo Clinic’s seizure detection data set. Our experimental results show that a 4-bit Integer-Net leads to only a 2% drop in accuracy compared with a 32-bit real-value resolution CNN model, while offering more than 7 times improvement in memory efficiency. We discuss the structure of the integer convolution to improve the computational gain and reduce the inference time, both of which are crucial for real-time application.

54 citations
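
To make the 4-bit versus 32-bit comparison above concrete, here is a minimal sketch of uniform symmetric quantization of weights to signed 4-bit integers. This scheme, the function names `quantize` and `dequantize`, and the sample weights are all assumptions chosen for illustration; the abstract does not say how Integer-Net maps real values to integers. From bit width alone the ideal storage ratio is 32/4 = 8, against which the reported "more than 7 times" can be compared.

```python
def quantize(weights, bits=4):
    """Uniform symmetric quantization to signed integers (illustrative scheme only)."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for signed 4-bit codes
    peak = max(abs(w) for w in weights)
    scale = peak / qmax if peak else 1.0       # real value represented by one integer step
    # Round to the nearest integer step and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


weights = [0.70, -0.30, 0.12, 0.00]
codes, scale = quantize(weights)
restored = dequantize(codes, scale)
ideal_memory_ratio = 32 / 4                    # storage ratio from bit width alone
```

Each restored weight differs from the original by at most half a quantization step, which is the accuracy cost traded for the smaller memory footprint.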


Journal ArticleDOI
TL;DR: In this paper, an adaptive weight decay mechanism with the traditional spike timing dependent plasticity (STDP) learning was proposed to model adaptivity in SNNs for digit recognition.
Abstract: A fundamental feature of learning in animals is the “ability to forget” that allows an organism to perceive, model, and make decisions from disparate streams of information and adapt to changing environments. Against this backdrop, we present a novel unsupervised learning mechanism adaptive synaptic plasticity (ASP) for improved recognition with spiking neural networks (SNNs) for real time online learning in a dynamic environment. We incorporate an adaptive weight decay mechanism with the traditional spike timing dependent plasticity (STDP) learning to model adaptivity in SNNs. The leak rate of the synaptic weights is modulated based on the temporal correlation between the spiking patterns of the pre- and post-synaptic neurons. This mechanism helps in gradual forgetting of insignificant data while retaining significant, yet old, information. ASP, thus, maintains a balance between forgetting and immediate learning to construct a stable-plastic self-adaptive SNN for continuously changing inputs. We demonstrate that the proposed learning methodology addresses catastrophic forgetting, while yielding significantly improved accuracy over the conventional STDP learning method for digit recognition applications. In addition, we observe that the proposed learning model automatically encodes selective attention toward relevant features in the input data, while eliminating the influence of background noise (or denoising) further improving the robustness of the ASP learning.

Journal ArticleDOI
TL;DR: The results show that the proposed RF system, comprising an ultra-wideband transmitter-receiver antenna and a highly sensitive planar sensor for wireless sensing of glucose and saline concentration, possesses strong potential for biomedical applications.
Abstract: Wireless RF sensors are the building blocks of next-generation sensing techniques that are essential for the Internet of Things. In this paper, an RF system comprising an ultra-wideband transmitter-receiver antenna and a highly sensitive planar sensor is proposed for wireless sensing of glucose and saline concentration, which are currently being used in various biomedical applications. The proposed wireless sensor technique is cost-effective, with both the antennas and the sensor fabricated on an economical FR4 substrate; the ultra-wideband antennas operate over 1–18 GHz with a highest gain of 7 dBi. The wireless sensor of the proposed RF system has a high sensitivity of 60 MHz per unit change in the dielectric constant of the liquid sample and operates over a broad frequency range of 1.5 to 5.9 GHz. The sensors are designed, fabricated, and tested for characterizing various biomedically relevant samples: saline and glucose solutions at various concentrations. Our results show that the proposed wireless sensing system possesses strong potential for biomedical applications.

Journal ArticleDOI
TL;DR: This work provides a proof of concept for unsupervised learning by STDP in memristive networks, providing insight into the dynamics of stochastic learning and supporting the understanding and design of neuromorphic networks with emerging memory devices.
Abstract: Hardware processors for neuromorphic computing are gaining significant interest as they offer the possibility of real in-memory computing, thus by-passing the limitations of speed and energy consumption of the von Neumann architecture. One of the major limitations of current neuromorphic technology is the lack of bio-realistic and scalable devices to improve the current design of artificial synapses and neurons. To overcome these limitations, the emerging technology of resistive switching memory has attracted wide interest as a nano-scaled synaptic element. This paper describes the implementation of a perceptron-like neuromorphic hardware capable of spike-timing dependent plasticity (STDP), and its operation under stochastic learning conditions. The learning algorithm of a single or multiple patterns, consisting of either static or dynamic visual input data, is described. The impact of noise is studied with respect to learning efficiency (false fire, true fire) and learning time. Finally, the impact of stochastic learning rule, such as the inversion of the time dependence of potentiation and depression in STDP, is considered. Overall, the work provides a proof of concept for unsupervised learning by STDP in memristive networks, providing insight into the dynamics of stochastic learning and supporting the understanding and design of neuromorphic networks with emerging memory devices.

Journal ArticleDOI
TL;DR: This paper presents a memristive neuromorphic system for improved power and area efficiency, and includes synchronous digital long term plasticity, an online learning methodology that helps the system train the neural networks during the operation phase and improves the efficiency in learning considering the power consumption and area overhead.
Abstract: Neuromorphic computing is a non-von Neumann computer architecture for the post Moore’s law era of computing. Since a main focus of the post Moore’s law era is energy-efficient computing with fewer resources and less area, neuromorphic computing contributes effectively to this research. In this paper, we present a memristive neuromorphic system for improved power and area efficiency. Our particular mixed-signal approach implements neural networks with spiking events in a synchronous way. Moreover, the use of nano-scale memristive devices saves both area and power in the system. We also provide device-level considerations that make the system more energy-efficient. The proposed system additionally includes synchronous digital long term plasticity, an online learning methodology that helps the system train the neural networks during the operation phase and improves the efficiency in learning considering the power consumption and area overhead.

Journal ArticleDOI
TL;DR: A multiple-layer classification method is introduced for comprehensive human motion recognition, including the largest number of motion types ever studied with an ultra-wideband radar system, and could be beneficial in smart homes and senior care.
Abstract: Human motion recognition is crucial for surveillance, search and rescue operations, smart homes, and senior care. In daily life, there exist various kinds of human motions with widely different characteristics that nevertheless exhibit some clustering features, which makes recognition difficult. In this paper, a multiple-layer classification method is introduced for comprehensive human motion recognition, including the largest number of motion types ever studied with an ultra-wideband radar system. First, in the pre-screening layer, information in the time-range domain is used to distinguish in situ motions and non-in situ motions. According to different kinds of human motions, the weighted range-time-frequency transform method is proposed to obtain corresponding spectrograms. Then physical empirical features and principal component analysis-based features are extracted for the classifiers to identify the specific in situ and non-in situ motions, respectively. Extensive experiments have been conducted, achieving the highest accuracy rate of up to 94.4% and 95.3% for in situ motions and non-in situ motions, respectively. The interferences of individual diversity on the proposed method are also investigated. The proposed method could be beneficial in smart homes and senior care.

Journal ArticleDOI
TL;DR: This paper provides a comprehensive overview of the recent advances in the field of wireless contactless sensing circuits and systems for healthcare and biomedical applications with special emphasis on wireless implantable devices, radar-based techniques to detect human motions, wireless neural interfacing and prosthesis, and the characterization of biological materials by means of imaging approaches.
Abstract: This paper provides a comprehensive overview of the recent advances in the field of wireless contactless sensing circuits and systems for healthcare and biomedical applications. In particular, special emphasis is made on wireless implantable devices, radar-based techniques to detect human motions, such as vital signs or gestures, wireless neural interfacing and prosthesis, and the characterization of biological materials by means of imaging approaches as well as the determination of their electrical properties. It is believed that this overview can serve as a starting point to the biomedical wireless-sensing topic and could encourage researchers and practitioners to continue works in the exciting area of wireless technologies with application to healthcare and biomedical contexts.

Journal ArticleDOI
TL;DR: This paper aims to take stock of recent advances in the field of energy-quality scalable circuits and systems, as a promising direction to continue the historical exponential energy downscaling under diminished returns from technology and voltage scaling.
Abstract: This paper aims to take stock of recent advances in the field of energy-quality (EQ) scalable circuits and systems, as a promising direction to continue the historical exponential energy downscaling under diminished returns from technology and voltage scaling. EQ-scalable systems explicitly trade off energy and quality at different levels of abstraction and sub-systems, dealing with “quality” as an explicit design requirement, and reducing energy whenever the application, the task, or the dataset allows quality degradation (e.g., vision and machine learning). A general framework for EQ-scalable systems based on the concept of quality slack is presented along with scalable architectures. A taxonomy of techniques to trade off energy and quality, a VLSI perspective, and possible quality control strategies are then discussed. The state of the art is surveyed to put the advances in its different sub-fields into a unitary perspective, emphasizing the on-going and prospective trends. At the component level, the generality of the EQ-scaling concept is shown through several examples, ranging from logic to analog circuits, to memories, data converters, and accelerators. Interesting implications of the joint adoption of EQ scaling and machine learning are also discussed, suggesting that their synergy gives ample room for further energy and performance improvements. From a level of abstraction viewpoint, EQ scaling is discussed from the circuit level to architectures, the hardware–software interface, the programming language, the compiler level, and run-time adaptation. Several case studies are discussed to put EQ scaling in the context of real-world applications.

Journal ArticleDOI
TL;DR: The model proved to be accurate for both the heart rate and respiration rate detection within 5% error margin when compared with conventional contact sensor readings and in the presence of more than one subject, only slight degradation has been observed.
Abstract: An electromagnetic model for heart rate and respiration rate detection using the scattered fields of incident plane waves on a dielectric model of a human subject has been developed. The model approximates the torso by an equivalent homogeneous dielectric layer, and utilizes an accelerated parallel version of the multi-level fast multipole algorithm to speed up the computation. Non-contact measurements using an ultra-wideband radar are utilized to experimentally validate the model. The model proved to be accurate for both the heart rate and respiration rate detection within a 5% error margin when compared with conventional contact sensor readings. The agreement between measured and simulated results is good for distances up to 3 m, and at various subject orientations with respect to the radar boresight. Meanwhile, in the presence of more than one subject, only slight degradation has been observed unless one subject is almost blocked by another. This accurate model would have broader impacts as it can be utilized to investigate various human activities and motion scenarios, and can be used, as well, to fine-tune the signal processing techniques and radar system development.

Journal ArticleDOI
TL;DR: An Internet of Things (IoT)-based selective, sensitive, quick, and inexpensive device for the quantification of CTx-1 levels in serum and for data transmission to an IoT-based cloud server is reported.
Abstract: Early detection of disease is essential for an efficient treatment. Bone loss can be detected and monitored by regular measurement of serum or urine C-terminal telopeptide of type 1 collagen (CTx-1). Therefore, rapid, portable, and low-cost point-of-care devices are highly desirable. In this paper, we have reported an Internet of Things (IoT)-based selective, sensitive, quick, and inexpensive device for the quantification of CTx-1 levels in serum. A capacitive interdigital sensor was coated with artificial antibodies, prepared by molecular imprinting technology. Electrochemical impedance spectroscopy was used to evaluate the resistive and capacitive properties of the sample solutions. A microcontroller-based system was developed for the measurement of the level of CTx-1 in serum and for data transmission to an IoT-based cloud server. The data can be provided to the medical practitioner and a detailed investigation can start for early detection and treatment. The developed sensing system responded linearly in a range of 0.1 to 2.5 ppb, which covers the normal reference range of CTx-1 in serum, with a limit of detection of 0.09 ppb. The results demonstrated that the proposed portable biosensing system could provide a rapid, simple, and selective approach for CTx-1 measurement in serum. Sheep serum samples were tested using the proposed system and the validation of the results was done using an enzyme-linked immunosorbent assay kit.

Journal ArticleDOI
TL;DR: The experiments on recognition benchmarks show that cross-layer approximation provides substantial improvements in energy efficiency for different accuracy/quality requirements, and a synergistic framework for combining the approximation techniques to achieve maximal energy benefits from approximate DNNs is proposed.
Abstract: Deep neural networks (DNNs) have emerged as the state-of-the-art technique in a wide range of machine learning tasks for analytics and computer vision in the next generation of embedded (mobile, IoT, and wearable) devices. Despite their success, they suffer from high energy requirements. In recent years, the inherent error resiliency of DNNs has been exploited by introducing approximations at either the algorithmic or the hardware levels (individually) to obtain energy savings while incurring tolerable accuracy degradation. However, there is a need for investigating the overall energy-accuracy trade-offs arising from the introduction of approximations at different levels in complex DNNs. We perform a comprehensive analysis to determine the effectiveness of cross-layer approximations for the energy-efficient realization of large-scale DNNs. The approximations considered are as follows: 1) use of lower complexity networks (containing lesser number of layers and/or neurons per layer); 2) pruning of synaptic weights; 3) approximate multiplication operation in the neuronal multiply-and-accumulate computation; and 4) approximate write/read operations to/from the synaptic memory. Our experiments on recognition benchmarks (MNIST and CIFAR10) show that cross-layer approximation provides substantial improvements in energy efficiency for different accuracy/quality requirements. Furthermore, we propose a synergistic framework for combining the approximation techniques to achieve maximal energy benefits from approximate DNNs.
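
Of the four approximations listed above, pruning of synaptic weights is the simplest to sketch. The example below zeroes the smallest-magnitude fraction of a weight list. Magnitude-based selection, the function name `prune_by_magnitude`, and the `fraction` parameter are assumptions made for illustration; the abstract does not state which pruning criterion the authors use.

```python
def prune_by_magnitude(weights, fraction):
    """Zero out the given fraction of weights with the smallest absolute value."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    n_prune = int(len(weights) * fraction)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


layer = [0.9, -0.05, 0.4, 0.01]
sparse_layer = prune_by_magnitude(layer, 0.5)
```

Zeroed weights need no multiply-and-accumulate and no memory read, which is where the energy saving of this approximation would come from.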

Journal ArticleDOI
TL;DR: The first high-energy stimulator that can be controlled wirelessly and integrated into a gastric bioelectrical activity monitoring system and can be used for treating functional gastrointestinal disorders is reported.
Abstract: The purpose of this paper is to develop and validate a miniature system that can wirelessly acquire gastric electrical activity called slow waves and deliver high-energy electrical pulses to modulate its activity. The system is composed of a front-end unit and an external stationary back-end unit that is connected to a computer. The front-end unit contains a recording module with three channels and a single-channel stimulation module. Commercial off-the-shelf components were used to develop front- and back-end units. A graphical user interface was designed in LabVIEW to process and display the recorded data in real-time and store the data for off-line analysis. The system was successfully validated on bench top and in vivo in porcine models. The bench-top studies showed an appropriate frequency response for analog conditioning and digitization resolution to acquire gastric slow waves. The system was able to deliver electrical pulses at amplitudes up to 10 mA to a load smaller than $880~\Omega$. Simultaneous acquisition of the slow waves from all three channels was demonstrated in vivo. The system was able to modulate the slow wave activity by either suppressing or entraining it. This paper reports the first high-energy stimulator that can be controlled wirelessly and integrated into a gastric bioelectrical activity monitoring system. The system can be used for treating functional gastrointestinal disorders.

Journal ArticleDOI
TL;DR: An always-on event-driven asynchronous wake-up circuit with trainable pattern recognition capabilities to duty-cycle power-constrained Internet-of-Things (IoT) sensor nodes and a novel asynchronous digital logic classifier for sequential pattern recognition are presented.
Abstract: We report an always-on event-driven asynchronous wake-up circuit with trainable pattern recognition capabilities to duty-cycle power-constrained Internet-of-Things (IoT) sensor nodes. The wake-up circuit is based on a level-crossing analog-to-digital converter (LC-ADC) employed as a feature-extraction block with automatic activity-sampling rate scaling behavior. A novel asynchronous digital logic classifier for sequential pattern recognition is presented. It is driven by the LC-ADC activity and trained to minimize classification errors due to falsely detected events. As a proof of concept, a prototype of the wake-up circuit is fabricated in 130-nm CMOS technology within 0.054 mm$^2$ of active area, covering up to 2.6 kHz of input signal bandwidth. The prototype was first validated by interfacing it with a commercial accelerometer to classify hand gestures in real time, reaching 81% accuracy with only 2.2 $\mu\text{W}$ at a 1-V supply. To highlight the flexibility of the design, a second application, detecting pathologic electrocardiogram beats, is also discussed.
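The "automatic activity-sampling rate scaling" of a level-crossing converter can be pictured with a small behavioral model: an event is emitted only when the input moves a full level away from the last reference, so a quiet input produces no events at all. The sketch below is a generic software model, not the authors' circuit; the function name, the integer sample values, and the level spacing `delta` are hypothetical.

```python
def level_crossing_events(samples, delta):
    """Return (sample_index, direction) events for a level-crossing sampler.

    An event is produced each time the input has moved by one level of
    height `delta` above (+1) or below (-1) the current reference level.
    """
    events = []
    ref = samples[0]  # the reference starts at the first input value
    for i, x in enumerate(samples[1:], start=1):
        while x - ref >= delta:   # input crossed one or more levels upward
            ref += delta
            events.append((i, +1))
        while ref - x >= delta:   # input crossed one or more levels downward
            ref -= delta
            events.append((i, -1))
    return events


# Hypothetical inputs in integer millivolts, with levels spaced 10 mV apart.
quiet = level_crossing_events([7, 7, 7, 7], 10)
active = level_crossing_events([0, 5, 30, 10], 10)
```

Because the event rate follows the signal activity, a downstream classifier driven by these events stays idle while the input is quiet.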

Journal ArticleDOI
TL;DR: It is demonstrated that a general purpose neural processor, using a single tile size, compares reasonably with highly specialized neural processor designs using multiple, optimally sized custom crossbars, indicating that general purpose memristive crossbar-based neural processors are practical.
Abstract: Memristor crossbar arrays have been proposed for use in synthetic neuron hardware due to their high density, non-volatile programmable conductances, and ability to perform dot-product computation efficiently. However, fitting complete neural networks into multiple, uniformly sized crossbars often results in poor memristor and neuron utilization. In this paper, we propose the use of smaller crossbar tiles that can be flexibly combined to create a variety of crossbar sizes and aspect ratios to more closely fit the ideal sizes required. We examine the throughput/power and throughput/area metrics of custom versus tiled designs for three different neural network applications, representing widely differing design points. We demonstrate that a general purpose neural processor, using a single tile size, compares reasonably with highly specialized neural processor designs using multiple, optimally sized custom crossbars. These results indicate that general purpose memristive crossbar-based neural processors are practical.
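The utilization argument can be made concrete with a little arithmetic: a weight matrix mapped onto fixed-size square tiles occupies a whole number of tiles in each dimension, and every unused cell in those tiles is wasted. The sketch below counts tiles and computes the fraction of allocated cells that actually hold weights. The layer shape and tile sizes are hypothetical values chosen for illustration, not figures from the paper.

```python
import math


def tiles_needed(rows, cols, tile):
    """Number of tile x tile crossbars needed to hold a rows x cols weight matrix."""
    return math.ceil(rows / tile) * math.ceil(cols / tile)


def utilization(rows, cols, tile):
    """Fraction of allocated crossbar cells that actually store a weight."""
    allocated = tiles_needed(rows, cols, tile) * tile * tile
    return (rows * cols) / allocated


# Hypothetical 100 x 10 layer mapped two ways.
single_large = utilization(100, 10, 128)   # one 128 x 128 crossbar
small_tiles = utilization(100, 10, 16)     # seven 16 x 16 tiles
```

In this example the tall, narrow layer fills only a small part of the single large crossbar, while the combination of small tiles fits its shape much more closely.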

Journal ArticleDOI
TL;DR: The in vivo experiments verified that epileptic seizures could be suppressed by the electrical stimulation provided by the proposed stimulator, and the reliability measurements verified that the proposed stimulator is robust for electrical stimulation in medical applications.
Abstract: A high-voltage-tolerant and power-efficient stimulator with adaptive power supply is proposed and realized in a 0.18-$\mu\text{m}$ 1.8-V/3.3-V CMOS process. The self-adaption bias technique and stacked MOS configuration are used to prevent issues of electrical overstress and gate-oxide reliability in low-voltage transistors. The on-chip high-voltage generator uses a pulse-skip regulation scheme to generate a variable dc supply voltage for the stimulator by detecting the headroom voltage on the electrode sites. With a dc input voltage of 3.3 V, the on-chip high-voltage generator provides an adjustable dc output voltage from 6.7 to 12.3 V at a step of 0.8 V, which results in a maximal system power efficiency of 56% at a 2400-$\mu\text{A}$ stimulus current. The charge mismatch of the stimulator is down to 1.7% in the whole stimulus current range of 200–3000 $\mu\text{A}$. The in vivo experiments verified that epileptic seizures could be suppressed by the electrical stimulation provided by the proposed stimulator. In addition, the reliability measurements verified that the proposed stimulator is robust for electrical stimulation in medical applications.

Journal ArticleDOI
TL;DR: A roofline model for cascaded systems is proposed, system-level trade-offs are derived, and the approach's validity is demonstrated through a visual classification case study.
Abstract: Recently, there has been an increasing demand for advanced classification capabilities embedded on wearable battery-constrained devices, such as smartphones or watches. Achieving such functionality within a tight power and energy budget has proven a real challenge, specifically for large-scale neural network-based applications. Previously, cascaded systems have been proposed to minimize energy consumption for such applications, either by using a single wake-up stage or by using a linear or tree-based cascade of consecutive classifiers that allow early termination. In this paper, we expand upon these concepts by generalizing cascades to hierarchical cascaded processing, where a hierarchy of increasingly complex classifiers, each designed and trained for a specific subtask, is used. This hierarchical approach outperforms the wake-up-based approach by up to 2 orders of magnitude in energy consumption at iso-accuracy, specifically in systems with sparse input data such as speech recognition and visual object detection. This paper presents a general design framework for such systems and illustrates how to optimize them toward minimum energy consumption. The paper further proposes a roofline model for cascaded systems, derives system-level trade-offs, and demonstrates the approach's validity through a visual classification case study.
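The energy benefit of early termination can be sketched with a simple expected-value model for a linear cascade: every input pays for the first stage, but only the fraction passed on by each stage pays for the next. This is a simplified illustration of the general idea and not the paper's design framework or roofline model; the stage energies and pass rates below are hypothetical.

```python
def expected_energy(stages):
    """Expected energy per input for a linear cascade with early termination.

    `stages` is a list of (energy_per_input, pass_rate) pairs ordered from the
    cheapest stage to the most complex one. `pass_rate` is the fraction of
    inputs a stage forwards to the next stage.
    """
    total = 0.0
    reach = 1.0  # fraction of inputs that reach the current stage
    for energy, pass_rate in stages:
        total += reach * energy
        reach *= pass_rate
    return total


# Hypothetical three-stage cascade (arbitrary energy units).
cascade = expected_energy([(1.0, 0.25), (8.0, 0.5), (64.0, 1.0)])
# Running only the most complex classifier on every input.
always_on = expected_energy([(64.0, 1.0)])
```

With sparse inputs, where the early stages reject most samples, the expensive final stage runs rarely and the average energy per input falls well below that of the always-on classifier.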

Journal ArticleDOI
TL;DR: The CCBA outperforms both state-of-the-art and truncated adders for high-accuracy and low-power circuits, confirming the value of the proposed concept for building highly efficient approximate or precision-scalable hardware accelerators.
Abstract: This paper introduces a novel method for designing approximate circuits by fabricating and exploiting false timing paths, i.e., critical paths that cannot be logically activated. This allows timing constraints to be strongly relaxed while guaranteeing minimal and controlled behavioral change. This technique is applied to an approximate adder architecture, called the Carry Cut-Back Adder (CCBA), in which high-significance stages can cut the carry propagation chain at lower-significance positions. This lightweight approach prevents the logic activation of the carry chain, improving performance and energy efficiency while guaranteeing low worst-case errors. A design methodology is presented along with implementation, error optimization, and design-space minimization. The CCBA is proven capable of extremely high accuracy while displaying significant circuit savings. For a worst-case precision of 99.999%, energy savings up to 36% are demonstrated compared with exact adders. Finally, an industry-oriented comparison of 32-bit approximate and truncated adders is carried out for mean and worst-case relative errors. The CCBA outperforms both state-of-the-art and truncated adders for high-accuracy and low-power circuits, confirming the value of the proposed concept for building highly efficient approximate or precision-scalable hardware accelerators.
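For readers unfamiliar with the truncated adders used as a comparison baseline, the sketch below models one in software: the `k` least-significant bits of both operands are dropped before the addition, and the relative error against the exact sum is computed. This models only the truncation baseline under an assumed definition, not the CCBA itself, whose carry-cutting logic is not detailed in the abstract. The function names and operand values are hypothetical.

```python
def truncated_add(a, b, k):
    """Add two non-negative integers after clearing their k least-significant bits."""
    mask = ~((1 << k) - 1)
    return (a & mask) + (b & mask)


def relative_error(a, b, k):
    """Relative error of the truncated sum with respect to the exact sum (a + b > 0)."""
    exact = a + b
    return abs(exact - truncated_add(a, b, k)) / exact


# Hypothetical 4-bit operands with the two lowest bits truncated.
approx = truncated_add(0b1011, 0b0110, 2)   # 8 + 4
err = relative_error(0b1011, 0b0110, 2)     # |17 - 12| / 17
```

The absolute error of this baseline is bounded by `2 * (2**k - 1)`, reached when all truncated bits of both operands are ones.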

Journal ArticleDOI
TL;DR: It is shown that SDLC can achieve up to an order of magnitude energy savings, and reductions of 65% in critical delay and almost 45% in silicon area can be achieved for a 128-bit multiplier, compared with an accurate equivalent.
Abstract: Approximate arithmetic has recently emerged as a promising paradigm for many imprecision-tolerant applications. It can offer substantial reductions in circuit complexity, delay, and energy consumption by relaxing accuracy requirements. In this paper, we propose a novel energy-efficient approximate multiplier design using a significance-driven logic compression (SDLC) approach. Fundamental to this approach is an algorithmic and configurable lossy compression of the partial product rows based on their progressive bit significance. This is followed by the commutative remapping of the resulting product terms to reduce the number of product rows. As such, the complexity of the multiplier in terms of logic cell counts and lengths of critical paths is drastically reduced. A number of multipliers with different bit-widths (4-bit to 128-bit) are designed in SystemVerilog and synthesized using Synopsys Design Compiler. Post-synthesis experiments showed that up to an order of magnitude energy savings, and reductions of 65% in critical delay and almost 45% in silicon area, can be achieved for a 128-bit multiplier, compared with an accurate equivalent. These gains are achieved with low accuracy losses estimated at less than 0.0028 mean relative error. Additionally, we demonstrate the performance-energy-quality tradeoffs for different degrees of compression, achieved through configurable logic clustering. To evaluate the effectiveness of the proposed approach, three case studies were set up. First, a Gaussian blur filter was designed, which demonstrated up to 80% energy reduction with a meagre loss of image quality. Second, we evaluated our approach in a machine learning application using a perceptron classifier, which showed up to 74% energy reduction with a negligible error rate. Third, the proposed multiplier designs were used in a power-constrained image processing application. We showed that SDLC can achieve a $60\times$ improvement in computation capability, with potential to be employed in ubiquitous systems.

Journal ArticleDOI
TL;DR: A baseband multi-beamforming method based on the spatial Fourier transform that has the potential to reduce circuit area and power requirements while meeting the bandwidth requirements of emerging 5G baseband systems is explored.
Abstract: Emerging millimeter-wave (mmW) wireless systems require beamforming and multiple-input multiple-output (MIMO) approaches in order to mitigate path loss, obstructions, and attenuation of the communication channel. Sharp mmW beams are essential for this purpose and must support baseband bandwidths of at least 1 GHz to facilitate higher system capacity. This paper explores a baseband multi-beamforming method based on the spatial Fourier transform. Approximate computing techniques are used to propose a low-complexity fast algorithm with sparse factorizations that neatly map to integer $W/L$ ratios in CMOS current mirrors. The resulting approximate fast Fourier transform (FFT) can thus be efficiently realized using CMOS analog integrated circuits to generate multiple, parallel mmW beams in both transmit and receive modes. The paper proposes both 8- and 16-point approximate-FFT algorithms together with circuit theory and design information for 65-nm CMOS implementations. Post-layout simulations of the 8-point circuit in Cadence Spectre provide well-defined mmW beam shapes, a baseband bandwidth of 2.7 GHz, a power consumption of 70 mW, and a dynamic range >42.2 dB. Preliminary experimental results confirm the basic functionality of the 8-beam circuit. Schematic-level analysis of the 16-beam I/Q version shows worst-case and average side lobe levels of −10.2 dB and −12.2 dB at 1 GHz bandwidth, and −9.1 dB and −11.3 dB at 1.5 GHz bandwidth. The proposed multi-beam architectures have the potential to reduce circuit area and power requirements while meeting the bandwidth requirements of emerging 5G baseband systems.

Journal ArticleDOI
TL;DR: A miniature inductorless impulse-radio ultra-wideband transmitter–receiver and radar for wireless short-range communication and vital-sign sensing is presented; it is capable of sensing human respiratory rate when the time interval is measured by a fast-sampling digital oscilloscope.
Abstract: This paper presents a miniature inductorless impulse-radio ultra-wideband transmitter–receiver and radar for wireless short-range communication and vital-sign sensing. The all-digital transmitter generates impulse-radio ultra-wideband pulses using an edge-combining technique and consumes 21.6 pJ/pulse at 10 Mb/s. A novel active-inductor-based technique is proposed for the low-noise amplifier that achieves ultra-wideband input impedance matching in the receiver. The non-coherent receiver employs a simple demodulation and synchronization circuit to achieve self-synchronization without any on-chip or external oscillator. Consuming a total power of 6.4 mW, the receiver attains −64-dBm sensitivity at 10 Mb/s. The chip is implemented in a 65-nm digital CMOS process and occupies only 0.04 mm$^2$. Measurements show that the transmitter–receiver and radar is capable of sensing human respiratory rate when the time interval is measured by a fast-sampling digital oscilloscope.

Journal ArticleDOI
TL;DR: Measurement results indicate that the proposed system with the adaptive array processing technique is effective for the noncontact measurement of the heart rate of a specific person when there is more than one person in the scene.
Abstract: A noncontact measurement of the heart rate of a specific person using an $X$-band radar system with a four-element antenna array is presented. The system comprises a 2-D planar wide-beam antenna array with a four-channel network analyzer. The direction of arrival is estimated using the Capon method, and the directionally constrained minimization-of-power algorithm is then applied to the received signals. Signals from the four channels are adaptively combined to enhance the accuracy in estimating the instantaneous heart rate in a multi-person scenario. Measurement results indicate that the proposed system with the adaptive array processing technique is effective for the noncontact measurement of the heart rate of a specific person when there is more than one person in the scene.