
Showing papers in "IEEE Journal on Emerging and Selected Topics in Circuits and Systems in 2011"


Journal ArticleDOI
TL;DR: This paper explicitly highlights several subtle but essential design differences that distinguish energy harvesting systems from battery-powered embedded systems and envisions the necessity and importance of developing a simulation tool that enables design space exploration and quick performance evaluation at the early design phase.
Abstract: Micro-scale energy harvesting has emerged as an attractive and increasingly feasible option to alleviate the power supply challenge in a variety of low power applications, such as wireless sensor networks and implantable biomedical devices. While the basic idea and system composition of micro-scale energy harvesting systems have been explored and applied in a number of prototypes in recent years, designing efficient micro-scale energy harvesting systems requires an in-depth understanding of various design factors and tradeoffs. This paper provides an overview of the area of micro-scale energy harvesting and addresses various challenges and considerations involved from circuit, architecture and system perspectives. This paper explicitly highlights several subtle but essential design differences that distinguish energy harvesting systems from battery-powered embedded systems. Moreover, we envision the necessity and importance of developing a simulation tool that enables design space exploration and quick performance evaluation at the early design phase. The practical issues, challenges and considerations for implementation of this envisaged simulation tool are discussed in this paper.

117 citations


Journal ArticleDOI
TL;DR: This paper presents a solid foundation for implementing analog vector-matrix multipliers (VMMs) in field-programmable analog arrays (FPAAs) and details the aspects of VMM topology choice, the performance metrics, and the methods and tools involved in FPAA synthesis.
Abstract: This paper presents a solid foundation for implementing analog vector-matrix multipliers (VMMs) in field-programmable analog arrays (FPAAs). Custom analog VMMs have been demonstrated to be 1000 times more power efficient than commercial digital implementations. However, no previous analog VMM discussion has carefully provided all of the implementation and performance considerations needed to utilize such a system. We utilize the FPAA because it provides an ideal platform for embedding low-power analog processing into larger systems. FPAAs allow the analog processing system to be rapidly prototyped, implemented at low cost, and easily reconfigured in the field. This paper can double as a complete analog VMM design specification, as well as a systematic tutorial on developing general systems with FPAA hardware. We detail the aspects of VMM topology choice, completely analyze the performance metrics, and describe the methods and tools involved in FPAA synthesis.

103 citations


Journal ArticleDOI
TL;DR: A custom integrated noncontact sensor front-end amplifier that fully bootstraps internal and external parasitic impedances is designed and fabricated; DC stability without the need for external large-valued resistances is ensured by an ac-bootstrapped, low-leakage, on-chip biasing network.
Abstract: Noncontact electrocardiogram/electroencephalogram/electromyogram electrodes, which operate primarily through capacitive coupling, have been extensively studied for unobtrusive physiological monitoring. Previous implementations using discrete off-the-shelf amplifiers have been encumbered by the need for manually tuned input capacitance neutralization networks and complex dc-biasing schemes. We have designed and fabricated a custom integrated noncontact sensor front-end amplifier that fully bootstraps internal and external parasitic impedances. DC stability without the need for external large-valued resistances is ensured by an ac-bootstrapped, low-leakage, on-chip biasing network. The amplifier achieves, without neutralization, input impedance of 60 fF $\Vert$ 50 T$\Omega$, input-referred noise of 0.05 fA/$\sqrt{\rm Hz}$ and 200 nV/$\sqrt{\rm Hz}$ at 1 Hz, and power consumption of 1.5 $\mu$A per channel at 3.3 V supply voltage. Stable frequency response is demonstrated below 0.05 Hz with electrode coupling capacitances as low as 0.5 pF.

102 citations


Journal ArticleDOI
TL;DR: In this methodology, a modified system design is proposed to optimize the area/noise/linearity performance and a novel linear pseudo-resistor with a wide range of tunability is also proposed.
Abstract: In this paper, an in-depth design methodology for fully-integrated tunable low-noise amplifiers for neural recording applications is presented. In this methodology, a modified system design is proposed to optimize the area/noise/linearity performance. A novel linear pseudo-resistor with a wide range of tunability is also proposed. As a case study, a low-noise tunable and reconfigurable amplifier for neural recording applications is designed and simulated in a 0.18 $\mu{\rm m}$ complementary metal–oxide–semiconductor process in all process corners. Simulated characteristics of the amplifier include tunable gain of 54 dB, tunable high-cutoff frequency of 10 kHz, programmable low-cutoff frequency ranging from 4 to 300 Hz, and power consumption of 20.8 $\mu{\rm W}$ at 1.8 V. According to postlayout simulations, integrated input-referred noise of the amplifier is 2.6 $\mu{\rm V}_{\rm rms}$ and 2.38 $\mu{\rm V}_{\rm rms}$ over the 0.5 Hz–50 kHz frequency range for low-cutoff frequency of 4 and 300 Hz, respectively. The amplifier also provides output voltage swing of 1 ${\rm V}_{\rm P-P}$ with total harmonic distortion of -46.24 dB at 300 Hz, and -45.97 dB at 10 kHz.

87 citations


Journal ArticleDOI
TL;DR: The reliable operation at the energy-minimum voltage of the various SCM architectures in a 65-nm CMOS technology considering within-die process parameter variations is demonstrated by means of Monte Carlo circuit simulation and the area of the best SCM architecture is compared to recent sub-VT SRAM designs.
Abstract: In this paper, standard-cell based memories (SCMs) are proposed as an alternative to full-custom sub-VT SRAM macros for ultra-low-power systems requiring small memory blocks. The energy per memory access as well as the maximum achievable throughput in the sub-VT domain of various SCM architectures are evaluated by means of a gate-level sub-VT characterization model, building on data extracted from fully placed, routed, and back-annotated netlists. The reliable operation at the energy-minimum voltage of the various SCM architectures in a 65-nm CMOS technology considering within-die process parameter variations is demonstrated by means of Monte Carlo circuit simulation. Finally, the energy per memory access, the achievable throughput, and the area of the best SCM architecture are compared to recent sub-VT SRAM designs.

80 citations


Journal ArticleDOI
TL;DR: The most important unreliability effects in nanometer CMOS technologies are reviewed and transistor aging models, intended for accurate circuit simulation, are described and efficient methods for circuit reliability simulation and analysis are discussed.
Abstract: Integrated analog circuit design in nanometer CMOS technologies brings forth new and significant reliability challenges. Ever-increasing process variability effects and transistor wear-out phenomena such as BTI, hot carrier degradation and dielectric breakdown force designers to use large design margins and to increase the uncertainty on the circuit lifetime. To help designers to tackle these problems at design time (i.e., Design For Reliability, or DFR), accurate transistor aging models, efficient circuit reliability analysis methods and novel design techniques are needed. The paper overviews the current state of the art in DFR for analog circuits. The most important unreliability effects in nanometer CMOS technologies are reviewed and transistor aging models, intended for accurate circuit simulation, are described. Also, efficient methods for circuit reliability simulation and analysis are discussed. These methods can help designers to analyze their circuits and to identify weak spots. Finally, cost-effective design techniques for more resilient and self-healing analog circuits are studied.

74 citations


Journal ArticleDOI
TL;DR: An overview of the root causes of on-chip variations associated with printing finer geometry features, increased atomic-scale effects, and increased on-chip power densities is presented.
Abstract: Nanometer-scale circuits are fundamentally different from those built in their predecessor technologies in that they are subject to a wide range of new effects that induce on-chip variations. These include effects associated with printing finer geometry features, increased atomic-scale effects, and increased on-chip power densities, and are manifested as variations in process and environmental parameters and as circuit aging effects. The impact of such variations on key circuit performance metrics is quite significant, resulting in parametric variations in the timing and power, and potentially catastrophic failure due to reliability and aging effects. Such problems have led to a revolution in the way that chips are designed in the presence of such uncertainties, both in terms of performance analysis and optimization. This paper presents an overview of the root causes of these variations and approaches for overcoming their effects.

65 citations


Journal ArticleDOI
TL;DR: This paper presents a partitioning, mapping, routing and interface optimization framework for energy-efficient voltage-frequency island (VFI) based networks-on-chip and proves that this framework achieves better power-performance trade-offs.
Abstract: In this paper, we present a partitioning, mapping, routing and interface optimization framework for energy-efficient voltage-frequency island (VFI) based networks-on-chip. Unlike the recent work that performs tile partitioning only with voltage-frequency assignment for a given mesh network layout, our framework consists of three key VFI-aware components, i.e., VFI-aware core partitioning with voltage and frequency assignment, VFI-aware mapping, and VFI-aware routing path allocation. In addition, we develop a VFI interface and its insertion algorithm to easily satisfy performance constraints. Our methodology unifies cores that use the same voltage and frequency into a single VFI. Thus, our technique considerably reduces VFI overheads, such as mixed-clock first-in, first-out buffers and voltage level converters, by up to 82% and energy consumption by up to 10% compared with the state-of-the-art work. It proves that our global energy optimization framework achieves better power-performance trade-offs.

63 citations


Journal ArticleDOI
TL;DR: This paper discusses two promising device candidates (Tunnel-FET and Magnetic-RAM) for introducing technological diversity in the multicores and analyzes their integration in the processor and cache hierarchy in detail.
Abstract: Heterogeneous multicores are envisioned to be a promising design paradigm to combat today's challenges of power, memory, and reliability walls that are impeding chip design using deep submicron technology. Future multicores are expected to integrate multiple different cores, including GPGPUs, custom accelerators and configurable cores. In this paper, we introduce an important dimension, technology, using which heterogeneity can be introduced in multicores to improve their energy-performance envelope. Specifically, we analyze the benefits of heterogeneous technologies for processor cores and cache subsystems. We discuss two promising device candidates (Tunnel-FET and Magnetic-RAM) for introducing technological diversity in the multicores and analyze their integration in the processor and cache hierarchy in detail. Our analysis shows that introducing this kind of heterogeneity can significantly enhance the performance and energy behavior of future multicore systems.

63 citations


Journal ArticleDOI
TL;DR: Three techniques that can enable a sea change in robust system design through cost-effective tolerance and prediction of failures in hardware during system operation are described: 1) efficient soft error resilience; 2) circuit failure prediction; and 3) effective on-line self-test and diagnostics.
Abstract: Today's mainstream electronic systems typically assume that transistors and interconnects operate correctly over their useful lifetime. With enormous complexity and significantly increased vulnerability to failures compared to the past, future system designs cannot rely on such assumptions. For coming generations of silicon technologies, several causes of hardware reliability failures, largely benign in the past, are becoming significant at the system level. Robust system design is essential to ensure that future systems perform correctly despite rising complexity and increasing disturbances. This paper describes three techniques that can enable a sea change in robust system design through cost-effective tolerance and prediction of failures in hardware during system operation: 1) efficient soft error resilience; 2) circuit failure prediction; and 3) effective on-line self-test and diagnostics. The need for global optimization across multiple abstraction layers is also demonstrated.

62 citations


Journal ArticleDOI
TL;DR: Soft-edge clocking, body-biasing, mismatch-tolerant memories, asynchronous operation and low-skew clock networks are presented to mitigate variability in the near threshold VDD regime.
Abstract: Near threshold computing has recently gained significant interest due to its potential to address the prohibitive increase of power consumption in a wide spectrum of modern VLSI circuits. This tutorial paper starts by reviewing the benefits and challenges of near threshold computing. We focus on the challenge of variability and discuss circuit and architecture solutions tailored to three different circuit fabrics: logic, memory, and clock distribution. Soft-edge clocking, body-biasing, mismatch-tolerant memories, asynchronous operation and low-skew clock networks are presented to mitigate variability in the near threshold VDD regime.

Journal ArticleDOI
TL;DR: A neural amplifier in UMC 130 nm, 1P8M complementary metal-oxide-semiconductor (CMOS) technology that achieves a noise efficiency factor of 2.58 and a low noise design technique which minimizes the noise contribution of the load circuitry is described.
Abstract: Chronic recording of neural signals is indispensable in designing efficient brain-machine interfaces and in elucidating human neurophysiology. The advent of multichannel micro-electrode arrays has driven the need for electronics to record neural signals from many neurons. The dynamic range of the system can vary over time due to changes in electrode-neuron distance and background noise. We propose a neural amplifier in UMC 130 nm, 1P8M complementary metal-oxide-semiconductor (CMOS) technology. It can be biased adaptively from 200 nA to 2 μA, modulating input-referred noise from 9.92 μV to 3.9 μV. We also describe a low-noise design technique which minimizes the noise contribution of the load circuitry. Optimum sizing of the input transistors minimizes the accentuation of the input-referred noise of the amplifier and obviates the need for large input capacitance. The amplifier achieves a noise efficiency factor of 2.58. The amplifier can pass signals from 5 Hz to 7 kHz, and its bandwidth can be tuned to reject local field potentials (LFP) and power line interference. The amplifier achieves a mid-band voltage gain of 37 dB. In vitro experiments are performed to validate the applicability of the neural low-noise amplifier in neural recording systems.
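Note: the abstract quotes a noise efficiency factor (NEF) without stating its definition. As a hedged reminder, the form most commonly used in the neural amplifier literature is shown below; that this paper uses exactly this form is an assumption. Here $V_{\rm ni,rms}$ is the integrated input-referred rms noise, $I_{\rm tot}$ the total supply current, $U_T$ the thermal voltage, $k$ Boltzmann's constant, $T$ the absolute temperature, and ${\rm BW}$ the amplifier bandwidth.

```latex
\mathrm{NEF} = V_{\mathrm{ni,rms}}
  \sqrt{\frac{2\, I_{\mathrm{tot}}}{\pi \cdot U_T \cdot 4kT \cdot \mathrm{BW}}}
```

Under this definition, a lower NEF means less input-referred noise for a given supply current and bandwidth.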

Journal ArticleDOI
TL;DR: Low power consumption due to spin transfer torque current induced switching and clocking along with the reasonable magneto-resistance (MR) distinguishing the two energy minimum states of the device, make these devices a promising candidate in MQCA realization.
Abstract: In this paper, we report magnetic quantum cellular automata (MQCA) realization using multi-layer cells with tilted polarizer reference layer with a particular focus on the critical need to shift toward the multi-layer cells as elemental entities from the conventional single-domain nanomagnets. We have reported a novel spin-transfer torque current-induced clocking scheme, theoretically derived the clocking current, and shown the reduction in power consumption achieved against the traditional mechanism of clocking using magnetic fields typically generated from overhead or underneath wires. We have modeled the multi-layer cell behavior in Verilog-A along with the underlying algorithm used in implementing the neighbor interaction between the cells. This paper reports the switching and clocking current magnitudes, their direction and the power consumption associated with switching and clocking operation. Finally, we present the simulation results from Verilog-A model of switching, clocking and neighbor interaction. Low power consumption due to spin transfer torque current induced switching and clocking along with the reasonable magneto-resistance (MR) distinguishing the two energy minimum states of the device, make these devices a promising candidate in MQCA realization.

Journal ArticleDOI
TL;DR: This paper describes how ultra-low-power analog circuitry can be integrated with sensor nodes to create energy-efficient sensor networks and presents a custom analog front-end which performs spectral analysis at a fraction of the power used by a digital counterpart.
Abstract: Preprocessing of data before transmission is recommended for many sensor network applications to reduce communication and improve energy efficiency. However, constraints on memory, speed, and energy currently limit the processing capabilities within a sensor network. In this paper, we describe how ultra-low-power analog circuitry can be integrated with sensor nodes to create energy-efficient sensor networks. To demonstrate this concept, we present a custom analog front-end which performs spectral analysis at a fraction of the power used by a digital counterpart. Furthermore, we show that the front-end can be combined with existing sensor nodes to 1) selectively wake up the mote based upon spectral content of the signal, thus increasing battery life without missing interesting events, and to 2) achieve low-power signal analysis using an analog spectral decomposition block, freeing up digital computation resources for higher-level analysis. Experiments in the context of vehicle classification show improved performance for our ASP-interfaced mote over an all-digital implementation.

Journal ArticleDOI
TL;DR: It is demonstrated that a P300-based BCI is definitely feasible in ambulatory condition and a recommended approach is given for the development of a real-time application.
Abstract: Brain-computer interfaces (BCIs) enable their users to interact with their surrounding environment using the activity of their brain only, without activating any muscle. This technology provides severely disabled people with an alternative means to communicate or control any electric device. On the other hand, BCI applications are more and more dedicated to healthier people, with the aim of giving them access to augmented reality or new rehabilitation tools. As it is noninvasive, light and relatively cheap, electroencephalography (EEG) is the most used acquisition technique to record cerebral activity of the BCI users. However, when using this type of BCI, user movements are likely to provoke motions of the measuring electrodes, which can severely damage the EEG quality. Thus, current BCI technology requires that the user sits and performs as few movements as possible. This is of course a strong limitation of BCI for use in ordinary life. Very recently, preliminary studies have been published in the literature and suggest that BCI applications can be realized even in the physically moving context. In this paper, we thoroughly investigate the possibility of developing a P300-based BCI system in ambulatory condition. The study is based on experimental data recorded with seven subjects executing a visual P300 speller-like discrimination task while simultaneously walking at different speeds on a treadmill. It is demonstrated that a P300-based BCI is definitely feasible in such conditions. Different artifact correction methods are described and discussed in detail. To conclude, a recommended approach is given for the development of a real-time application.

Journal ArticleDOI
TL;DR: This study theoretically quantified the improvement in accuracy of a BCI system when using error potentials for correcting the output decision, in the general case of multiclass BCI and studied in simulation the performance of the closed-loop system in order to evaluate its ability to adapt to the changes in the mental states of the user.
Abstract: New paradigms for brain–computer interfacing (BCI), such as those based on imagination of task characteristics, require long training periods, have limited accuracy, and lack adaptation to the changes in the users' conditions. Error potentials generated in response to an error made by the translation algorithm can be used to improve the performance of a BCI, as feedback extracted from the user and fed into the BCI system. The present study addresses the inclusion of error potentials in a BCI system based on the decoding of movement-related cortical potentials (MRCPs) associated with the speed of a task. First, we theoretically quantified the improvement in accuracy of a BCI system when using error potentials for correcting the output decision, in the general case of multiclass BCI. The derived theoretical expressions can be used during the design phase of any BCI system. They were applied to experimentally estimated accuracies in decoding MRCPs and error potentials. Second, we studied in simulation the performance of the closed-loop system in order to evaluate its ability to adapt to the changes in the mental states of the user. By setting the parameters of the simulator to experimentally determined values, we showed that updating the learning set with the examples estimated as correct based on the decoding of error potentials leads to convergence to the optimal solution.

Journal ArticleDOI
TL;DR: A hybrid brain–computer interface (BCI) system that combines a self-paced BCI and an eye-tracker, and a method that adaptively updates the BCI classifier, are proposed for text-entry applications.
Abstract: A hybrid brain–computer interface (BCI) system that combines a self-paced BCI and an eye-tracker is proposed for text-entry applications. To make a text-entry of a letter/word, the user must gaze at the target for at least a specific period of time (called the dwell time) and then activate the self-paced BCI with an attempted hand extension. Although the self-paced BCI is available for use at any time, a built-in sleep mode is activated when the user is not looking at a letter/word or when the user gazes at a letter/word for less than the dwell time. Such a design has the advantage of greatly minimizing the false positive outcomes compared to the state-of-the-art self-paced BCIs. To further improve the system's performance, a method that adaptively updates the BCI classifier is also proposed. The results from seven able-bodied individuals show great improvements compared to the pure self-paced BCI. For dwell times of 0.75 and 1.00 s, the number of false-positives/minute is significantly reduced to 2.5 and 1.7, at acceptable average true positive rates of 54.5% and 54.1%, respectively.

Journal ArticleDOI
TL;DR: A Booth multiplier with area efficiency, power efficiency, and high accuracy is achieved using the proposed GPEB, which is the most power-efficient of the compared methods.
Abstract: In this paper, a closed form of compensation function for fixed-width Booth multipliers using generalized probabilistic estimation bias (GPEB) is proposed. Based on the probabilistic estimation from the truncation part, the GPEB circuit can be easily built according to the proposed systematic steps. The GPEB fixed-width multipliers with variable-correction outperform the existing compensation circuits in reducing error. An 8 × 8 GPEB Booth multiplier achieves more than 88% reduction in absolute average error compared with the traditional direct truncation (D-T) multiplier, and more than 32% area savings is obtained in the GPEB Booth multiplier compared with the post-truncation (P-T) Booth multiplier. At the same power consumption, the GPEB Booth multipliers can achieve higher accuracy than the existing works. Besides, considering power efficiency together with accuracy, the proposed GPEB Booth multiplier is the most power-efficient of the compared methods. Furthermore, the GPEB Booth multipliers are implemented in the circuit of two-dimensional discrete cosine transform (DCT). Compared with traditional Booth multiplier applications, the proposed 2-D DCT cores can reduce area cost by about 18% with a penalty of only 0.8 dB in peak signal-to-noise ratio (PSNR). Consequently, a Booth multiplier with area efficiency, power efficiency, and high accuracy is achieved using the proposed GPEB.

Journal ArticleDOI
TL;DR: A programmable charge pump driven by a direct digital synthesizer (DDS) is proposed, and the topology of multiple supercapacitors is dynamically reconfigured to maximize charging efficiency and minimize voltage-dependent leakage, expanding the zones of effective charging.
Abstract: Micro-solar energy harvesting systems have achieved efficient operations through maximum power point tracking (MPPT) and maximum power transfer tracking (MPTT) techniques. However, they may have chargers with relatively high power thresholds, below which they have 0% efficiency. As a result, these harvesters either require much larger panels than necessary, or they fail to sustain extended periods of poor weather. To address this problem, we propose to generalize MPTT to MCZT, for Maximum Charging Zone Tracking, to expand the zones of effective charging. To cover the wide dynamic range of solar irradiation, we propose a programmable charge pump driven by a direct digital synthesizer (DDS). In addition, we dynamically reconfigure the topology of multiple supercapacitors to maximize charging efficiency and minimize voltage-dependent leakage. Experimental results from simulation and measurement show that under the high solar irradiance of 1000 W/m$^2$, our MPTT part achieves 40%-50% faster charging time than one without MPTT; and under low solar irradiation of 300 W/m$^2$, the boost-up operation of our system enables fully charging the supercapacitors, thereby extending the harvesting time zone from 10:00 am-07:10 pm to 8:20 am-8:00 pm even on a sunny day, all with an MPTT overhead of 1.5 mW.

Journal ArticleDOI
TL;DR: This work argues that designers should evaluate the design of system-on-chip implementations in terms of average power for an entire workload, including active and idle periods, not just the metric of energy-per-instruction.
Abstract: Networks of ultra-low-power nodes capable of sensing, computation, and wireless communication have applications in medicine, science, industrial automation, and security. Reducing power consumption requires the development of system-on-chip implementations that must provide both energy efficiency and adequate performance to meet the demands of the long deployment lifetimes and bursts of computation that characterize wireless sensor network (WSN) applications. Therefore, this work argues that designers should evaluate the design in terms of average power for an entire workload, including active and idle periods, not just the metric of energy-per-instruction.
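Note: the workload-level metric argued for above can be illustrated with a short calculation. The sketch below is not taken from the paper. The function name, the two designs, and all power and timing figures are hypothetical, and both designs are assumed to need the same active time per period.

```python
def average_power(p_active_w, t_active_s, p_idle_w, t_idle_s):
    """Time-weighted average power (watts) over one active/idle period."""
    total_time_s = t_active_s + t_idle_s
    if total_time_s <= 0:
        raise ValueError("total period must be positive")
    energy_j = p_active_w * t_active_s + p_idle_w * t_idle_s
    return energy_j / total_time_s


# Hypothetical duty cycle: 5 ms of computation in every 1 s period.
T_ACTIVE_S, T_IDLE_S = 0.005, 0.995

# Hypothetical design A: lower active power, higher idle power.
design_a_w = average_power(1.0e-3, T_ACTIVE_S, 5.0e-6, T_IDLE_S)

# Hypothetical design B: higher active power, lower idle power.
design_b_w = average_power(1.5e-3, T_ACTIVE_S, 0.5e-6, T_IDLE_S)

print(f"design A: {design_a_w * 1e6:.3f} uW, design B: {design_b_w * 1e6:.3f} uW")
```

With these made-up numbers, design B draws 50% more power while active yet has the lower average power over the whole period, because the node is idle 99.5% of the time.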

Journal ArticleDOI
TL;DR: This paper proposes PowerSleep, a smart power-saving scheme by carefully choosing an execution speed for the server with DVS and sleep periods while putting the system in the sleep power mode with DPM, and presents how to minimize the mean power consumption of the server under the given mean response time constraint.
Abstract: Reducing the power consumption while maintaining the response time constraint has been an important goal in server system design. One of the techniques widely explored in the literature to achieve this goal is dynamic voltage scaling (DVS). However, DVS is not efficient in modern systems where the overall power consumption includes a large portion of static power consumption. In this paper, we aim to reduce the static power consumption by dynamic power management (DPM) with sleep model in addition to DVS. To maximize the sleep efficiency, we propose PowerSleep, a smart power-saving scheme by carefully choosing an execution speed for the server with DVS and sleep periods while putting the system in the sleep power mode with DPM. By modeling the system with M/G/1/PS queuing model and further significant extensions, we present how to minimize the mean power consumption of the server under the given mean response time constraint. Simulation results show that our smart PowerSleep scheme significantly outperforms the simple power-saving scheme which adopts sleep mode.

Journal ArticleDOI
TL;DR: A low-power biomedical signal processor based on reduced instruction set computer (RISC) architecture for real-time seizure detection is implemented to achieve low power consumption and perform continuous and real-time processing.
Abstract: Epilepsy is one of the most common neurological disorders, with a worldwide prevalence of approximately 1%. A considerable portion of epilepsy patients cannot be treated sufficiently by today's available therapies. Implantable closed-loop neurostimulation is an innovative and effective method for seizure control. A real-time seizure detector is the kernel of a closed-loop seizure controller. In this paper, a low-power biomedical signal processor based on reduced instruction set computer (RISC) architecture for real-time seizure detection is implemented to achieve low power consumption and perform continuous and real-time processing. The low-power processor is implemented in a 0.18 $\mu$m complementary metal–oxide–semiconductor technology to verify functionality and capability. The measurement results show the implemented processor can reduce power consumption by over 90% compared with our previous prototype, which was implemented on an enhanced 8051 microprocessor. This seizure detector was applied to the continuous EEG signals of four Long–Evans rats with spontaneous absence seizures. It also processed a 24 h long-term and uninterrupted EEG sequence. The developed seizure detector can be applied for online seizure monitoring and integrated with an electrical stimulator to form a closed-loop seizure controller in the future.

Journal ArticleDOI
TL;DR: A 32-channel recording ASIC that provides low-noise amplification and analog filtering, and also includes a 12-bit analog-to-digital conversion function, and offers programmable output rates through a serial peripheral interface (SPI).
Abstract: Monitoring of electrocorticography signals using multi-electrode arrays creates new opportunities for neural prosthetic applications. In this paper, we present a 32-channel recording ASIC that provides low-noise amplification and analog filtering. It also includes a 12-bit analog-to-digital conversion function, and offers programmable output rates through a serial peripheral interface (SPI). The targeted application is a remote-powered wireless implantable ECoG recording system. Each recording channel has a measured 0.7 $\mu{\rm V}_{\rm rms}$ input-referred noise on a [0.5–300 Hz] bandwidth. The device was fabricated in a 0.35 $\mu{\rm m}$ complementary metal–oxide–semiconductor process for a total die area of 86 ${\rm mm}^{2}$ with an analog power consumption limited to 134 $\mu{\rm W}$ per channel.

Journal ArticleDOI
Sherief Reda
TL;DR: New techniques for thermal and power characterization of real computing devices are described and it is shown how the measurements from infrared imaging, embedded thermal sensors, and current meters can be integrated to accurately characterize the temperatures and power of computing devices during operation.
Abstract: Power and temperature are key design concerns in modern computing systems. Power minimization is essential for battery-operated devices and for large-scale data center facilities. The spatial and temporal allocation of within-die power consumption leads to thermal gradients and hot spots during operation. Temperature impacts key circuit metrics such as reliability, speed, and leakage power, and it is a major constraint on improving the performance of high-end computing devices. Due to the enormous complexities and sheer number of modeling parameters of state-of-the-art designs, pre-silicon power and thermal models cannot be trusted blindly. It is necessary to complement pre-silicon analysis with post-silicon thermal and power characterization on the fabricated devices, and then to use the characterization results to improve the design during re-spins before ramp and production. In this paper, we describe new techniques for thermal and power characterization of real computing devices. We show how the measurements from infrared imaging, embedded thermal sensors, and current meters can be integrated to accurately characterize the temperatures and power of computing devices during operation. We describe the key algorithmic and experimental techniques required to overcome the challenges encountered when working with real devices. We present characterization results of a dual-core processor and a programmable logic device.

Journal ArticleDOI
TL;DR: This work integrates hierarchical data sampling in the hardware to accelerate the clustering speed and develops a “Bayesian-Information-Criterion (BIC) Processor” to estimate the number of clusters of K-Means.
Abstract: A power-efficient K-Means hardware architecture that can automatically estimate the number of clusters in the clustering process is proposed. The contributions of this work include two main aspects. The first is the integration of hierarchical data sampling in the hardware to accelerate the clustering speed. The second is the development of the “Bayesian-Information-Criterion (BIC) Processor” to estimate the number of clusters of K-Means. The architecture of the “BIC Processor” is designed based on the simplification of the BIC computations, and the precision of the logarithm function is also analyzed. The experiments show that the proposed architecture can be employed in different multimedia applications, such as motion segmentation and edge-adaptive noise reduction. In addition, the gate count of the hardware is 51 K in a 90-nm complementary metal-oxide-semiconductor technology. It is also shown that this work can achieve high efficiency compared with a GPU, and the power consumption scales well with the number of clusters and the number of dimensions. The power consumption ranges between 10.72 and 12.95 mW in different modes when the operating frequency is 233 MHz.
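
The idea of letting BIC pick the number of clusters can be sketched in software. The following is a minimal illustration only, not the paper's simplified "BIC Processor": it assumes an X-means-style BIC for spherical Gaussian clusters, a deterministic farthest-first initialization, and toy 1-D data. All function names (`kmeans`, `bic`, `estimate_k`) are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_init(points, k):
    """Deterministic seeding (an assumption): start at the first point,
    then repeatedly add the point farthest from its nearest centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, labels

def bic(points, centroids, labels):
    """X-means-style BIC (higher is better). Assumes n > k and a
    non-zero residual sum of squares."""
    n, d, k = len(points), len(points[0]), len(centroids)
    rss = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    var = rss / (d * (n - k))
    sizes = [labels.count(j) for j in range(k)]
    log_lik = (sum(s * math.log(s / n) for s in sizes if s > 0)
               - 0.5 * n * d * math.log(2 * math.pi * var)
               - 0.5 * d * (n - k))
    n_params = (k - 1) + k * d + 1  # mixing weights, centroids, variance
    return log_lik - 0.5 * n_params * math.log(n)

def estimate_k(points, candidates):
    """Run K-Means for each candidate K and keep the best BIC."""
    scores = {}
    for k in candidates:
        centroids, labels = kmeans(points, k)
        scores[k] = bic(points, centroids, labels)
    return max(scores, key=scores.get), scores

# Toy data: three tight 1-D groups around 0, 10 and 20.
offsets = (-1.0, -0.5, 0.0, 0.5, 1.0)
data = [(c + o,) for c in (0.0, 10.0, 20.0) for o in offsets]
best_k, scores = estimate_k(data, range(1, 6))
print(best_k)
```

On this toy data the score peaks at K = 3: merging groups inflates the variance term, while over-splitting is penalized by the cluster-size and parameter-count terms.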

Journal ArticleDOI
TL;DR: Key points in the design of the rectifier are discussed as well as practical limitations of the circuit, and the maximum measured output voltages for the proposed single-stage and four-stage rectifiers were 0.8 and 3.5 V, respectively.
Abstract: In this paper, a self-powered rectifier is proposed with an intended application in energy harvesting systems and wireless sensor networks. Key points in the design of the rectifier are discussed as well as practical limitations of the circuit. The proposed self-powered rectifier was designed and fabricated in a standard 0.5-μm complementary metal-oxide-semiconductor process using standard components and poly-poly capacitors. Measurement results are presented for single-stage and four-stage rectifiers with resistive loads from 10 kΩ to 10 MΩ and sinusoidal input amplitudes from 100 to 1000 mVpk. For the specified input and loading conditions, the maximum measured output voltages for the proposed single-stage and four-stage rectifiers were 0.8 and 3.5 V, respectively.

Journal ArticleDOI
TL;DR: The proposed Hebbian eigenfilter technique enables real-time multichannel spike sorting, and leads the way towards the next generation of motor and cognitive neuro-prosthetic devices.
Abstract: Real-time multichannel neuronal signal recording has spawned broad applications in neuro-prostheses and neuro-rehabilitation. Detecting and discriminating neuronal spikes from multiple spike trains in real time requires significant computational effort and presents major challenges for hardware design in terms of hardware area and power consumption. This paper presents a Hebbian eigenfilter spike sorting algorithm, in which principal components analysis (PCA) is conducted through Hebbian learning. The eigenfilter eliminates the need for computationally expensive covariance analysis and eigenvalue decomposition in traditional PCA algorithms and, most importantly, is amenable to low-cost hardware implementation. Scalable and efficient hardware architectures for real-time multichannel spike sorting are also presented. In addition, folding techniques for hardware sharing are proposed for better utilization of computing resources among multiple channels. The throughput, accuracy and power consumption of our Hebbian eigenfilter are thoroughly evaluated through synthetic and real spike trains. The proposed Hebbian eigenfilter technique enables real-time multichannel spike sorting, and leads the way towards the next generation of motor and cognitive neuro-prosthetic devices.
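
To illustrate how Hebbian learning can extract a principal component without forming a covariance matrix or running an eigendecomposition, here is a minimal sketch using Oja's rule. Oja's rule is swapped in as a well-known Hebbian PCA update; the abstract does not state the paper's exact learning rule. The function name, learning rate, epoch count and toy samples are all assumptions.

```python
import math

def oja_first_component(samples, lr=0.01, epochs=200):
    """Estimate the first principal direction of zero-mean samples.

    Each update uses only the current sample x and the output y = w.x,
    so no covariance matrix is ever stored.
    """
    dim = len(samples[0])
    # Assumed starting point; it must not be orthogonal to the answer.
    w = [1.0] + [0.0] * (dim - 1)
    for _ in range(epochs):
        for x in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))
            # Hebbian term lr*y*x plus Oja's decay term -lr*y*y*w,
            # which keeps the weight vector near unit length.
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

# Zero-mean toy samples: large spread along (1, 1), small along (1, -1).
samples = [(t + s, t - s)
           for t in (-2.0, -1.0, 1.0, 2.0)
           for s in (-0.2, 0.2)]

w = oja_first_component(samples)
norm = math.sqrt(sum(wi * wi for wi in w))
# Cosine of the angle between w and the true direction (1, 1)/sqrt(2).
alignment = abs(w[0] + w[1]) / (math.sqrt(2.0) * norm)
print(w, norm, alignment)
```

Projecting a sample onto `w` then gives a one-dimensional feature. The per-sample update needs only multiply-accumulate operations, which is consistent with the abstract's point that the approach suits low-cost hardware.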

Journal ArticleDOI
TL;DR: It is shown that for both nanoscale complementary metal-oxide-semiconductor (CMOS) as well as emerging non-CMOS [spin torque transfer random access memory (STTRAM)] memory technologies, such a co-design solution can achieve significant improvement in system EDP over a conventional FPGA framework.
Abstract: Reconfigurable computing frameworks such as field programmable gate array (FPGA) provide flexibility to map arbitrary applications. However, their intrinsic flexibility comes at the cost of significantly worse performance and power dissipation than their custom counterparts. Existing design solutions such as voltage scaling and multi-threshold assignment typically trade off energy for performance or vice versa. In this paper, we show that an integrated circuit-architecture-software co-design approach can be extremely effective to simultaneously improve the power and performance of a reconfigurable hardware framework, leading to large improvement in energy-delay product (EDP). First, we select a spatio-temporal reconfigurable computing architecture based on a two-dimensional (2-D) memory array. Applications are mapped to memory as multiple-input multiple-output lookup tables (LUTs) and are evaluated in a temporal manner inside a computing element. Multiple such computing elements communicate spatially through programmable interconnects. Next, we exploit the read-dominant memory access pattern in reconfigurable hardware to design an asymmetric memory cell, which provides higher read performance and lower read power, leading to improvement in the overall EDP during operation. We note that the proposed memory cell is also asymmetric in terms of its content, providing better read power for one of the logic states (logic “0” or “1”). Based on this observation, we next propose a content-aware application mapping approach, which tries to maximize the logic “0” or logic “1” content in the lookup tables. A design flow is presented to incorporate the proposed architecture, asymmetric memory cell design and content-aware mapping. We show that for both nanoscale complementary metal-oxide-semiconductor (CMOS) [static random access memory (SRAM)] as well as emerging non-CMOS [spin torque transfer random access memory (STTRAM)] memory technologies, such a co-design solution can achieve significant improvement in system EDP over a conventional FPGA framework.
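
The content-aware mapping idea, biasing LUT contents toward the logic state that the asymmetric cell reads more cheaply, can be illustrated with a small sketch. This is a hypothetical mechanism, not the paper's mapping flow: it assumes each LUT output can be stored complemented and re-inverted on read, and the names `content_aware_store` and `read_lut` are invented for illustration.

```python
def favored_fraction(lut_bits, favored=0):
    """Fraction of LUT entries that hold the cheaper-to-read state."""
    return sum(1 for b in lut_bits if b == favored) / len(lut_bits)

def content_aware_store(lut_bits, favored=0):
    """Return (stored_bits, invert_output).

    Assumption: an output inverter is available, so a LUT dominated by
    the costly state can be stored complemented instead.
    """
    if favored_fraction(lut_bits, favored) >= 0.5:
        return list(lut_bits), False
    return [1 - b for b in lut_bits], True

def read_lut(stored_bits, invert_output, address):
    """Look up one entry and undo the complement if it was applied."""
    bit = stored_bits[address]
    return 1 - bit if invert_output else bit

# 2-input OR: three of four entries are 1, the costly state if 0 is favored.
or_lut = [0, 1, 1, 1]
stored, inverted = content_aware_store(or_lut, favored=0)
recovered = [read_lut(stored, inverted, a) for a in range(len(or_lut))]
print(stored, inverted, favored_fraction(stored), recovered)
```

The stored table holds mostly the favored state while every read still returns the original function value.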

Journal ArticleDOI
TL;DR: This paper provides a spike-based implementation of the HMAX model, demonstrating its ability to perform biologically-plausible MAX computations as well as classify basic shapes.
Abstract: Object recognition and categorization are computationally difficult tasks that are performed effortlessly by humans. Attempts have been made to emulate the computations in different parts of the primate cortex to gain a better understanding of the cortex and to design brain–machine interfaces that speak the same language as the brain. The HMAX model proposed by Riesenhuber and Poggio and extended by Serre attempts to truly model the visual cortex. In this paper, we provide a spike-based implementation of the HMAX model, demonstrating its ability to perform biologically-plausible MAX computations as well as classify basic shapes. The spike-based model consists of 2514 neurons and 17 305 synapses (S1 Layer: 576 neurons and 7488 synapses, C1 Layer: 720 neurons and 2880 synapses, S2 Layer: 576 neurons and 1152 synapses, C2 Layer: 640 neurons and 5760 synapses, and Classifier: 2 neurons and 25 synapses). Without the limits of the retina model, it will take the system 2 min to recognize rectangles and triangles in 24 × 24 pixel images. This can be reduced to 4.8 s by rearranging the lookup table so that neurons which have similar responses to the same input(s) can be placed on the same row and affected in parallel.
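
The layer figures quoted in the abstract can be checked with a short tally. This sketch only sums the stated per-layer counts and compares the two quoted recognition times; the dictionary layout and variable names are illustrative.

```python
# (neurons, synapses) per layer, as quoted in the abstract.
layers = {
    "S1": (576, 7488),
    "C1": (720, 2880),
    "S2": (576, 1152),
    "C2": (640, 5760),
    "Classifier": (2, 25),
}

total_neurons = sum(n for n, _ in layers.values())
total_synapses = sum(s for _, s in layers.values())

# Recognition time: 2 min before and 4.8 s after rearranging the lookup table.
speedup = (2 * 60) / 4.8

print(total_neurons, total_synapses, speedup)
```

The per-layer counts add up to the stated totals of 2514 neurons and 17 305 synapses, and the lookup-table rearrangement corresponds to roughly a 25-fold reduction in recognition time.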

Journal ArticleDOI
TL;DR: The embedded subthreshold SRAMs designed for a quality-scalable H.264 video decoder IP adopt power-gating techniques and multi-output dynamic circuits in order to achieve a low VDDmin, a small area overhead, and a higher operating speed.
Abstract: The design of embedded subthreshold SRAMs for a quality-scalable H.264 video decoder IP is presented in this paper. In addition to the conventional 7T SRAM bitcell, we adopted power-gating techniques and multi-output dynamic circuits in order to achieve a low VDDmin, a small area overhead, and a higher operating speed. A 256 × 32 90-nm SRAM macro was designed for verifying the proposed design techniques. The H.264 IP provides energy-efficient scalable video decoding of 42.8 pJ/cycle for QCIF and 235 pJ/cycle for HD720 at 0.3 V and 0.7 V, respectively.