
Showing papers in "IEEE Transactions on Circuits and Systems" in 2014


Journal Article
TL;DR: The potential conservativeness of the distributed robust pinning synchronization conditions is reduced by an evolutionary-algorithm-based optimization method that combines a constrained-optimization evolutionary algorithm with a convex optimization method and aims to improve on traditional optimization methods.
Abstract: This paper deals with robust adaptive synchronization of dynamical networks with stochastic coupling by means of evolutionary algorithms. The complex networks under consideration have three features: 1) the coupling term enters stochastically; 2) the node dynamics contain uncertainties; and 3) distributed pinning synchronization is also considered. Using Lyapunov function methods and stochastic analysis techniques, conditions for distributed robust synchronization and for distributed robust pinning synchronization of dynamical networks are obtained, respectively, in terms of sets of inequalities. The impacts of degree information, stochastic coupling, and uncertainties on synchronization performance, i.e., mean control gain and convergence rate, are derived theoretically. The potential conservativeness of the distributed robust pinning synchronization conditions is reduced by an evolutionary-algorithm-based optimization method that combines a constrained-optimization evolutionary algorithm with a convex optimization method and aims to improve on traditional optimization methods. Simulations are provided to illustrate the effectiveness and applicability of the results.

170 citations


Journal Article
TL;DR: By means of a time-dependent Lyapunov function and the comparison principle, several sufficient conditions are established under which nonlinear dynamical networks with heterogeneous impulsive effects are exponentially synchronized to a desired state.
Abstract: In this paper, the synchronization problem is investigated for a class of nonlinear delayed dynamical networks with heterogeneous impulsive effects. Heterogeneous impulses are inhomogeneous in both the time and space domains: the impulsive strength differs from node to node and also varies from one impulsive instant to the next. The goal is to derive synchronization criteria under which nonlinear delayed dynamical networks with heterogeneous impulses synchronize to a desired state. Using a time-dependent Lyapunov function and the comparison principle, several sufficient conditions are established under which such networks are exponentially synchronized to a desired state. An example is given to show the effectiveness of the proposed results.

164 citations


Journal Article
TL;DR: A 60-dB-gain bulk-driven Miller OTA operating from a 0.25-V supply in a 130-nm digital CMOS process shows that an enhanced bulk-driven differential pair combined with a distributed layout can help overcome some of the constraints that nanometer CMOS processes impose on high-performance analog circuits in the weak-inversion region.
Abstract: This paper presents a 60-dB-gain bulk-driven Miller OTA operating from a 0.25-V power supply in a 130-nm digital CMOS process. The amplifier operates in the weak-inversion region, with an input bulk-driven differential pair that uses positive-feedback source degeneration for transconductance enhancement. In addition, a distributed layout configuration is used for all transistors to mitigate the effect of halo implants and thus obtain higher output impedance. Combining these two approaches, we experimentally demonstrate a gain of over 60 dB with just 18-nW power consumption from a 0.25-V supply. The enhanced bulk-driven differential pair and distributed layout can help overcome some of the constraints that nanometer CMOS processes impose on high-performance analog circuits in the weak-inversion region.

147 citations


Journal Article
TL;DR: This work addresses design and implementation issues of a 24 GHz rectenna developed to demonstrate the feasibility of wireless power harvesting and transmission techniques toward the millimeter-wave regime.
Abstract: This work addresses design and implementation issues of a 24 GHz rectenna developed to demonstrate the feasibility of wireless power harvesting and transmission (WPT) techniques toward the millimeter-wave regime. The proposed structure integrates a compact circularly polarized substrate integrated waveguide (SIW) cavity-backed antenna array with a self-biased rectifier using commercial Schottky diodes. The antenna and the rectifier are individually designed, optimized, fabricated, and measured, and are then integrated into one circuit to validate the studied rectenna architecture. The maximum measured conversion efficiency and DC voltage are 24% and 0.6 V, respectively, for an input power density of 10 mW/cm².

135 citations


Journal Article
TL;DR: A direct connection between oscillator measurements, in terms of the measured single-sideband PN spectrum, and optimal communication system performance, in terms of the resulting error vector magnitude (EVM) due to PN, is mathematically derived and analyzed.
Abstract: Oscillator phase noise (PN) is one of the major problems affecting the performance of communication systems. In this paper, a direct connection between oscillator measurements, in terms of the measured single-sideband PN spectrum, and optimal communication system performance, in terms of the resulting error vector magnitude (EVM) due to PN, is mathematically derived and analyzed. First, a statistical model of the PN that accounts for white and colored noise sources is derived. This model is then used to derive the modified Bayesian Cramér-Rao bound on PN estimation, which in turn yields an EVM bound on system performance. The analysis shows that the influence of the different noise regions strongly depends on the communication bandwidth, i.e., the symbol rate. For high-symbol-rate communication systems, cumulative PN near the carrier is of relatively low importance compared with white PN far from the carrier. The results also show that 1/f³ noise is more predictable than 1/f² noise and, in a fair comparison, degrades performance less.
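For orientation only, the conventional small-angle approximation (not the Bayesian Cramér-Rao-based EVM bound derived in the paper) links a measured SSB phase-noise spectrum L(f) to an rms phase error and, for small errors, to EVM. The integration limits f₁ and f₂ and the example numbers are illustrative assumptions. Raising f₁ (for instance because the receiver tracks slow phase drift over a block of symbols) shrinks the contribution of near-carrier noise, which is consistent with the abstract's observation for high symbol rates.

```latex
\sigma_\varphi^2 \approx 2\int_{f_1}^{f_2} \mathcal{L}(f)\,df,
\qquad
\mathrm{EVM}_{\mathrm{rms}} \approx \sigma_\varphi \quad (\sigma_\varphi \ll 1)

% Pure 1/f^2 region, \mathcal{L}(f) = \mathcal{L}_0 (f_0/f)^2:
\sigma_\varphi^2 \approx 2\,\mathcal{L}_0 f_0^2 \left(\frac{1}{f_1} - \frac{1}{f_2}\right)

% Example: \mathcal{L}_0 = -110\ \mathrm{dBc/Hz} = 10^{-11}\ \mathrm{Hz^{-1}} at f_0 = 1 MHz,
% f_1 = 1 MHz, f_2 = 1 GHz:
\sigma_\varphi^2 \approx 2 \cdot 10^{-11} \cdot 10^{12} \cdot (10^{-6} - 10^{-9})
\approx 2.0 \times 10^{-5}\ \mathrm{rad^2},
\quad \sigma_\varphi \approx 4.5 \times 10^{-3}\ \mathrm{rad}
\;\Rightarrow\; \mathrm{EVM} \approx 0.45\%
```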

122 citations


Journal Article
TL;DR: A new bit-interleaving 12T subthreshold SRAM cell with Data-Aware Power-Cutoff (DAPC) write assist is presented; the write assist improves write-ability under the increased device variations at low supply voltage in deep sub-100 nm processes.
Abstract: This paper presents a new bit-interleaving 12T subthreshold SRAM cell with Data-Aware Power-Cutoff (DAPC) write assist, which improves write-ability and mitigates the increased device variations at low supply voltage in deep sub-100 nm processes. The cell's disturb-free feature enables a bit-interleaving architecture, which reduces multiple-bit errors in a single word and, combined with error checking and correction (ECC), enhances soft-error immunity. The proposed 12T cell is demonstrated in a 4 kb SRAM macro implemented in 40 nm general-purpose (40GP) CMOS technology. The test chip operates from typical VDD down to 350 mV (about 100 mV below the threshold voltage), with VDDMIN limited by the read operation. Data can be written successfully for VDD down to 300 mV. The measured maximum operating frequency is 11.5 MHz, with a total power consumption of 22 μW at 350 mV and 25 °C.
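As a rough consistency check, and assuming the 22 μW figure was measured while the macro runs at its 11.5 MHz maximum frequency (the abstract places the two numbers together but does not state this explicitly), the energy per clock cycle of the whole 4 kb macro is approximately:

```latex
E_{\text{cycle}} = \frac{P}{f} = \frac{22\ \mu\text{W}}{11.5\ \text{MHz}} \approx 1.9\ \text{pJ per cycle}
```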

118 citations


Journal Article
TL;DR: A novel CMOS bandgap reference with high-order curvature compensation, based on MOS transistors operating in the weak-inversion region and implemented in standard 0.18 μm CMOS technology, is proposed; it suits low-power applications that require high-precision references.
Abstract: This paper proposes a novel CMOS bandgap reference (BGR) with high-order curvature compensation that uses MOS transistors operating in the weak-inversion region. The mechanism of the proposed curvature-compensation technique is analyzed thoroughly, and the corresponding BGR circuit is implemented in standard 0.18 μm CMOS technology. Experimental results show that the proposed BGR achieves a temperature coefficient of 4.5 ppm/°C over the temperature range of −40 °C to 120 °C at a 1.2 V supply voltage, while consuming only 36 μA. It also achieves a line regulation of 0.054%/V. It is suitable for low-power applications requiring high-precision references.
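Assuming the common box-method definition of the temperature coefficient (the abstract does not say which definition is used), 4.5 ppm/°C over the 160 °C span bounds the total relative drift of the reference voltage:

```latex
\mathrm{TC} = \frac{V_{\max} - V_{\min}}{V_{\mathrm{avg}}\,(T_{\max} - T_{\min})} \times 10^{6}\ \ \mathrm{ppm/^{\circ}C}

\frac{V_{\max} - V_{\min}}{V_{\mathrm{avg}}}
= 4.5 \times 10^{-6}\,/^{\circ}\mathrm{C} \times 160\,^{\circ}\mathrm{C}
= 7.2 \times 10^{-4} \approx 0.072\%
```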

117 citations


Journal Article
TL;DR: Sufficient conditions are derived for the global asymptotic synchronization of a class of identical nonlinear oscillators coupled through a linear time-invariant network; because the synchronization condition is independent of the number of oscillators, the approach facilitates modular design.
Abstract: Sufficient conditions are derived for the global asymptotic synchronization of a class of identical nonlinear oscillators coupled through a linear time-invariant network. In particular, we focus on systems where oscillators are connected to a common node through identical branch impedances. For such networks, it is shown that the synchronization condition is independent of the number of oscillators and the value of the load impedance connected to the common node. Theoretical findings are then leveraged to control a system of parallel single-phase voltage source inverters serving an impedance load in an islanded microgrid application. The ensuing paradigm: i) does not necessitate communication between inverters, ii) is independent of system load, and iii) facilitates a modular design approach because the synchronization condition is independent of the number of oscillators. We present both simulation and experimental case studies to validate the analytical results and demonstrate the proposed application.

117 citations


Journal Article
TL;DR: A novel 8-point DCT approximation that requires only 14 additions and no multiplications is introduced and compared with state-of-the-art DCT approximations in terms of both algorithmic complexity and peak signal-to-noise ratio.
Abstract: The need for low energy consumption in video processing systems such as HEVC, driven by the multimedia market, has led to extensive development of fast algorithms for efficiently approximating the 2-D DCT. The DCT is employed in a multitude of compression standards because of its remarkable energy-compaction properties. Multiplier-free approximate DCTs have been proposed that offer superior compression performance at very low circuit complexity. Such approximations can be realized in digital VLSI hardware using only additions and subtractions, leading to significant reductions in chip area and power consumption compared with conventional DCTs and integer transforms. In this paper, we introduce a novel 8-point DCT approximation that requires only 14 addition operations and no multiplications. The proposed transform has low computational complexity and is compared with state-of-the-art DCT approximations in terms of both algorithmic complexity and peak signal-to-noise ratio. The proposed DCT approximation is a candidate for reconfigurable video standards such as HEVC. The proposed transform and several other DCT approximations are mapped to systolic-array digital architectures, physically realized as digital prototype circuits using FPGA technology, and mapped to 45 nm CMOS technology.
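The abstract does not give the paper's 14-addition matrix, so the sketch below uses a different, well-known multiplier-free approximation to illustrate the shared idea: the rounded DCT obtained by rounding twice the orthonormal 8-point DCT-II matrix to the nearest integer (the Cintra-Bayer construction). Every entry is 0 or ±1, so the forward transform needs only additions and subtractions. The function names are illustrative, and this is not the authors' transform.

```python
import math

N = 8

def rounded_dct_matrix(n=N):
    """Round 2 * (orthonormal DCT-II matrix); for n = 8 every entry is -1, 0, or 1."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([round(2 * scale * math.cos((2 * i + 1) * k * math.pi / (2 * n)))
                     for i in range(n)])
    return rows

def apply_multiplier_free(T, x):
    """Compute y = T x with additions/subtractions only (T has entries in {-1, 0, 1})."""
    y = []
    for row in T:
        acc = 0
        for t, xi in zip(row, x):
            if t == 1:
                acc += xi      # +1 entry: add the sample
            elif t == -1:
                acc -= xi      # -1 entry: subtract the sample
            # 0 entry: the sample is skipped entirely
        y.append(acc)
    return y

T = rounded_dct_matrix()
y = apply_multiplier_free(T, [10, 12, 14, 16, 16, 14, 12, 10])
```

This naive evaluation is not addition-minimal; fast factorizations are what bring the count down. The rows of this particular matrix are mutually orthogonal, with T·Tᵀ = diag(8, 6, 4, 6, 8, 6, 4, 6), so an inverse follows as x = Tᵀ D⁻¹ y with D equal to that diagonal.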

112 citations


Journal Article
TL;DR: A novel reformulation of the last stage of SC decoding is presented that significantly reduces critical path and hardware complexity and lets the new decoder decode 2 bits simultaneously instead of 1.
Abstract: Polar codes have emerged as important error-correction codes because of their capacity-achieving property. The successive cancellation (SC) algorithm is viewed as a good candidate for hardware polar decoders because of its low complexity. However, for (n, k) polar codes, the long SC latency of (2n-2) is a bottleneck for high-throughput polar decoders. In this paper, we present a novel reformulation of the last stage of SC decoding, which yields two benefits. First, the critical path and hardware complexity of the last stage of the SC algorithm are significantly reduced. Second, 2 bits can be decoded simultaneously instead of 1. As a result, the new decoder, referred to as the 2b-SC decoder, reduces latency from (2n-2) to (1.5n-2) without performance loss. Additionally, overlapped-scheduling, precomputation, and look-ahead techniques are used to design two further decoders, referred to as the 2b-SC-Overlapped-scheduling decoder and the 2b-SC-Precomputation decoder, respectively. All three architectures offer significant advantages in throughput and hardware efficiency. Compared with the lowest-latency SC decoder previously known, the 2b-SC-Precomputation decoder has 25% less latency. Synthesis results show that the proposed (1024, 512) 2b-SC-Precomputation decoder achieves at least a 4× increase in throughput and a 40% increase in hardware efficiency.
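A minimal sketch for illustration, not the authors' architecture: a recursive SC decoder with min-sum LLR updates whose last stage, `decode_pair`, resolves a length-2 node in one step. For nonzero LLRs a and b with both bits unfrozen, the usual f/g sequence simplifies to û₀ = h(a) ⊕ h(b) and û₁ = h(b), where h is the hard decision, so no adder is needed. This closed form is our own derivation and the paper's reformulation may differ. Assumptions: frozen bits are 0, the code length is at least 2, and encoding is x = u·F^{⊗m} with F = [[1, 0], [1, 1]] in natural (not bit-reversed) order.

```python
def hard(llr):
    """Hard decision: non-negative LLR -> bit 0, negative LLR -> bit 1."""
    return 0 if llr >= 0 else 1

def f_minsum(a, b):
    """Check-node (f) update, min-sum approximation."""
    sign = 1 if (a >= 0) == (b >= 0) else -1
    return sign * min(abs(a), abs(b))

def g_update(a, b, u):
    """Variable-node (g) update given the partial-sum bit u."""
    return b + a if u == 0 else b - a

def polar_encode(u):
    """x = u * F^(kron m), F = [[1, 0], [1, 1]], natural order."""
    if len(u) == 1:
        return list(u)
    h = len(u) // 2
    upper = [p ^ q for p, q in zip(u[:h], u[h:])]
    return polar_encode(upper) + polar_encode(u[h:])

def decode_pair(a, b, frozen0, frozen1):
    """Last stage: decide both bits of a length-2 node at once."""
    if frozen0 and frozen1:
        u0, u1 = 0, 0
    elif frozen0:
        u0, u1 = 0, hard(a + b)
    elif frozen1:
        u0, u1 = hard(a) ^ hard(b), 0
    else:
        u0, u1 = hard(a) ^ hard(b), hard(b)
    return [u0, u1], [u0 ^ u1, u1]  # decoded bits, re-encoded partial sums

def sc_decode(llr, frozen):
    """Return (decoded u, re-encoded codeword) for a node of length 2^m, m >= 1."""
    if len(llr) == 2:
        return decode_pair(llr[0], llr[1], frozen[0], frozen[1])
    h = len(llr) // 2
    a, b = llr[:h], llr[h:]
    u_left, x_left = sc_decode([f_minsum(p, q) for p, q in zip(a, b)], frozen[:h])
    u_right, x_right = sc_decode([g_update(p, q, s) for p, q, s in zip(a, b, x_left)],
                                 frozen[h:])
    return u_left + u_right, [p ^ q for p, q in zip(x_left, x_right)] + x_right

# Example: length-8 code with information bits at positions 3, 5, 6, 7 (an illustrative choice).
frozen = [True, True, True, False, True, False, False, False]
u = [0, 0, 0, 1, 0, 1, 1, 0]
x = polar_encode(u)
llr = [4.0 if bit == 0 else -4.0 for bit in x]  # noiseless BPSK-style LLRs
u_hat, x_hat = sc_decode(llr, frozen)
```

For scale, the abstract's latency formulas give 2n-2 = 2046 versus 1.5n-2 = 1534 for n = 1024.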

109 citations


Journal Article
TL;DR: A novel level shifter is presented whose operating range extends from a deep-subthreshold voltage to the standard supply voltage and covers both upward and downward level conversion; it is designed for practical applications.
Abstract: Wide-range level shifters play critical roles in ultra-low-voltage circuits and systems. Although state-of-the-art level shifters can convert a subthreshold voltage to the standard supply voltage, their operating ranges may be limited, which restricts the flexibility of dynamic voltage scaling. This paper therefore presents a novel level shifter whose operating range extends from a deep-subthreshold voltage to the standard supply voltage and includes both upward and downward level conversion. The proposed level shifter is a hybrid structure comprising a modified Wilson current mirror and generic CMOS logic gates. The design was verified by simulation and measurement in a 65-nm technology. Measurements show that the minimum operating voltage of the proposed level shifter is below 200 mV. In addition to the operating range, the delay, power consumption, and duty cycle of the proposed level shifter were designed for practical applications.

Journal Article
TL;DR: A concurrent dual-band uneven GaN Doherty power amplifier for two widely spaced frequencies is proposed, in which adaptive power division is realized by a frequency-dependent uneven power divider together with the input-matching nonlinearities of the two cells of the Doherty PA.
Abstract: A concurrent dual-band uneven GaN Doherty power amplifier (PA) for applications at two widely spaced frequencies is proposed in this paper. To avoid an early load-modulation drop caused by the soft turn-on characteristic of the peaking device, adaptive power division is realized by a frequency-dependent uneven power divider together with the input-matching nonlinearities of the two cells of the Doherty PA. Owing to the adaptive power division, the proposed dual-band uneven Doherty PA achieves a power-added efficiency of 45% and 41% at 6 dB backoff from saturation at 850 MHz and 2330 MHz, respectively. The gain of the proposed Doherty PA is also enhanced to 19 dB and 13 dB in the two bands. Furthermore, a more accurate two-dimensional joint digital predistortion (2D-JDPD) model is applied to linearize the PA and simultaneously compensate for in-phase and quadrature (I/Q) imbalance. With this model, the adjacent channel power ratio (ACPR) is improved to better than −47.1 dBc and −49.4 dBc in the lower and upper bands at an average output power of 31.75 dBm, while a drain efficiency of 26.7% is obtained.

Journal Article
TL;DR: To achieve a design trade-off between consensus regulation performance and control energy consumption, a quadratic cost function is constructed from the state errors among agents and the control inputs of all agents, and guaranteed-cost consensus problems are introduced.
Abstract: Guaranteed-cost consensus problems for high-order singular multi-agent systems with switching topologies are investigated. First, to achieve a design trade-off between consensus regulation performance and control energy consumption, a quadratic cost function is constructed from the state errors among agents and the control inputs of all agents, and guaranteed-cost consensus problems are introduced. Then, based on linear matrix inequality techniques, sufficient conditions for guaranteed-cost consensus and consensualization are presented, respectively; these conditions guarantee the scalability of singular multi-agent systems because the dimensions of all variables in them are independent of the number of agents. Moreover, an upper bound of the cost function is determined, explicit expressions of the consensus functions are given on the basis of the Second Equivalent Form, and it is shown that the consensus functions depend on the average of the initial states of all agents but are independent of the switching topologies. Finally, applications of the theoretical results to multi-agent supporting systems are shown.
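As a much-simplified illustration (first-order integrator agents, not the high-order singular systems or the guaranteed-cost design of the paper), the sketch below runs discrete-time consensus x ← x − ε·L x while switching between two undirected graphs. Every undirected Laplacian has zero column sums, so the average of the states is invariant under each topology, and the agents converge to the initial average whatever the switching sequence, provided the graphs are jointly connected and ε is small enough. The step size ε = 0.2, the graphs, and the function names are assumptions.

```python
import numpy as np

def laplacian(n, edges):
    """Laplacian of an undirected graph given as a list of (i, j) edges."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L

def switching_consensus(x0, laplacians, steps, eps=0.2):
    """Iterate x <- x - eps * L x, cycling through the given topologies."""
    x = np.asarray(x0, dtype=float)
    for k in range(steps):
        x = x - eps * laplacians[k % len(laplacians)] @ x
    return x

# Two topologies that are each disconnected but jointly form a 4-node ring.
G1 = laplacian(4, [(0, 1), (2, 3)])
G2 = laplacian(4, [(1, 2), (3, 0)])
x_final = switching_consensus([1.0, 2.0, 3.0, 10.0], [G1, G2], steps=200)
```

Here every agent ends up near 4.0, the average of the initial states, even though neither graph is connected on its own.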

Journal Article
TL;DR: This paper demonstrates that a stochastic flash ADC can be synthesized entirely from Verilog code and a standard digital library; a prototype IC fabricated in 90 nm CMOS implements a 2047-comparator version of the proposed architecture.
Abstract: This paper demonstrates that a stochastic flash ADC can be synthesized entirely from Verilog code and a standard digital library. An analog comparator constructed from two cross-coupled 3-input digital NAND gates, which can be described in Verilog, is introduced. The synthesized comparators have random, Gaussian-distributed offsets that serve as virtual voltage references for a flash ADC. A piecewise-linear inverse Gaussian CDF function corrects the nonlinearity introduced by the Gaussian offset distribution. The prototype IC is fabricated in 90 nm CMOS and implements a 2047-comparator version of the proposed architecture. All components, including the comparators, the ones adder, and the piecewise inverse Gaussian function, are implemented in Verilog. Conventional digital synthesis and place-and-route are then used to generate the physical layout, making this the first fully synthesized ADC. An SNDR of 35.9 dB (without calibration) is achieved at 210 MSPS from the Verilog-synthesized design.
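A behavioral sketch of the stochastic-flash principle: each of the 2047 comparators trips when the input exceeds its own random offset, so the ones count follows the Gaussian CDF of the input, and applying the inverse CDF linearizes the transfer curve. The offset spread `sigma`, the seed, and the function names are assumptions. The sketch uses the exact inverse CDF from Python's `statistics.NormalDist`, whereas the chip uses a piecewise-linear approximation, and it ignores noise and all non-idealities other than the offsets.

```python
import random
from statistics import NormalDist

def make_offsets(n=2047, sigma=1.0, seed=0):
    """Random Gaussian comparator offsets, acting as virtual voltage references."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def stochastic_flash(vin, offsets, sigma=1.0):
    """Convert vin: count tripped comparators, then invert the Gaussian CDF."""
    n = len(offsets)
    ones = sum(1 for v_os in offsets if vin > v_os)  # ones-adder output
    p = min(max(ones, 1), n - 1) / n                 # keep the CDF argument inside (0, 1)
    return NormalDist(0.0, sigma).inv_cdf(p)         # undo the Gaussian S-curve

offsets = make_offsets()
codes = [stochastic_flash(v, offsets) for v in (-0.5, 0.0, 0.5)]
```

Inputs well beyond a few sigma saturate, because nearly all or none of the comparators trip; the usable input range is therefore set by the offset spread.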

Journal Article
TL;DR: This study reveals a lack of predictivity in the first class of models, regardless of the applied window function, and concludes that only the physics-based model fulfills most of the basic evaluation criteria.
Abstract: Highly accurate and predictive models of resistive switching devices are needed to enable future memory and logic design. A widely used approach is memristive modeling, which treats resistive switches as dynamical systems. Here we introduce three evaluation criteria for memristor models: plausibility of the I-V characteristics, sufficient nonlinearity of the switching kinetics, and correct prediction of the behavior of two antiserially connected devices. We analyze two classes of models: the first comprises common linear memristor models, and the second widely used nonlinear memristive models. The linear memristor models are based on Strukov's original memristor model extended by different window functions, while the nonlinear models include Pickett's physics-based memristor model and models derived from it. This study reveals a lack of predictivity in the first class of models, regardless of the applied window function. Only the physics-based model fulfills most of the basic evaluation criteria.
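To make the first model class concrete, the sketch below simulates a linear ion-drift memristor of the kind usually attributed to Strukov et al., in normalized form (state x = w/D, with μ_v·R_on/D² lumped into one constant `k`) and with the Joglekar window f(x) = 1 − (2x − 1)^(2p), driven by a sine voltage and integrated with forward Euler. All parameter values and names are illustrative assumptions, not any specific variant evaluated in the paper. Apart from the window, the state velocity is linear in the current, which is the property the second criterion (sufficiently nonlinear switching kinetics) probes.

```python
import math

def simulate_linear_drift(r_on=100.0, r_off=16e3, k=1e4, p=1, x0=0.1,
                          v_amp=1.0, freq=1.0, dt=1e-4, periods=1):
    """Linear drift: M(x) = r_on*x + r_off*(1 - x), dx/dt = k * i * f(x)."""
    volts, currents, xs = [], [], []
    x = x0
    steps = int(round(periods / (freq * dt)))
    for n in range(steps + 1):
        v = v_amp * math.sin(2 * math.pi * freq * n * dt)
        m = r_on * x + r_off * (1.0 - x)            # state-dependent resistance
        i = v / m
        volts.append(v)
        currents.append(i)
        xs.append(x)
        window = 1.0 - (2.0 * x - 1.0) ** (2 * p)   # Joglekar window: zero at x = 0 and x = 1
        x = min(max(x + dt * k * i * window, 0.0), 1.0)  # Euler step, clipped to [0, 1]
    return volts, currents, xs

v, i, x = simulate_linear_drift()
```

Plotting `i` against `v` gives the pinched hysteresis loop: the current is zero whenever the voltage is zero, while the positive half-cycle lowers the resistance and the negative half-cycle raises it again.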

Journal Article
TL;DR: A theoretical investigation of synchronous non-volatile logic gates based on resistive-switching memories (RS-NVL) is presented, and special design techniques and strategies are proposed to optimize the structure according to the different resistive characteristics of NVMs.
Abstract: Emerging non-volatile memories (NVMs) based on the resistive switching (RS) mechanism, such as STT-MRAM, OxRRAM, and CBRAM, are under intense R&D investigation in both academia and industry. They offer high write/read speed, low power, and good endurance (e.g., > 10¹² cycles) beyond mainstream NVMs, which allows them to be embedded directly with logic units for computing. This integration could significantly increase power and die-area efficiency and thus overcome the power/speed bottlenecks of modern VLSI circuits. This paper first presents a theoretical investigation of synchronous NV logic gates based on RS memories (RS-NVL). Special design techniques and strategies are proposed to optimize the structure according to the different resistive characteristics of NVMs. To validate this study, we simulated a non-volatile full adder (NVFA) with two types of NVM, STT-MRAM and OxRRAM, using a 40 nm CMOS design kit and compact models that include the relevant physics and experimental parameters. The results show power, speed, and area gains compared with a synchronized CMOS FA while maintaining good reliability.

Journal Article
TL;DR: The results show that the proposed cell not only tolerates an upset at any of its sensitive nodes, regardless of upset polarity and strength, but also recovers from multiple-node upsets induced by charge sharing on its fixed nodes, independent of the stored value.
Abstract: In this paper, a novel low-power, highly reliable radiation-hardened memory cell using 12 transistors (RHM-12T) is proposed in TSMC 65 nm CMOS technology to provide sufficient immunity against single-event upsets. The results show that the proposed cell not only tolerates an upset at any of its sensitive nodes, regardless of upset polarity and strength, but also recovers from multiple-node upsets induced by charge sharing on the fixed nodes, independent of the stored value. Moreover, the proposed cell has comparable or lower overheads in static power, area, and access time compared with previous radiation-hardened memory cells.

Journal Article
TL;DR: A power-efficient magnetic flip-flop (MFF) architecture that addresses the high write energy of STT-MRAM by combining checkpointing operation, power gating, and self-enable mechanisms is proposed; simulation results confirm its lower power consumption compared with a conventional CMOS FF and other structures.
Abstract: Advanced computing systems suffer from high static power due to rapidly rising leakage currents in deep sub-micron MOS technologies. Fast-access non-volatile memories (NVMs) are under intense investigation for integration into flip-flops or computing memories, allowing the system to be powered off in standby and thus saving power. Spin Transfer Torque MRAM (STT-MRAM) is considered the most promising NVM for this purpose thanks to its high speed, low power, and virtually unlimited endurance. However, one disadvantage of STT-MRAM for computing is the relatively high write energy of the resulting Magnetic Flip-Flop (MFF). In this paper, we propose a power-efficient MFF architecture that addresses this challenge by combining checkpointing operation, power gating, and self-enable mechanisms. Thanks to the 3-D implementation of STT-MRAM, multiple non-volatile storage elements can be integrated locally in a conventional FF without significant area overhead. We performed electrical simulations (transient and statistical) using an accurate SPICE model of STT-MRAM and an industrial 40 nm CMOS design kit to validate its functional behavior and evaluate its performance. The simulation results confirm its lower power consumption compared with a conventional CMOS FF and other structures.

Journal Article
TL;DR: To verify the ZVS of the main switches and efficiency improvement of the proposed bidirectional dc-dc converter, theoretical analysis and experimental results from a 200 W prototype are discussed.
Abstract: A soft-switching bidirectional dc-dc converter using a lossless active snubber is proposed in this paper. In the proposed converter, zero-voltage-switching (ZVS) of main switches is achieved by utilizing an active snubber which consists of auxiliary switches, diodes, an inductor, and a capacitor. Although conduction losses associated with additional components increase, switching losses are significantly reduced due to the ZVS operation of main switches. Therefore, total efficiency is improved. Moreover, there is no reverse-recovery problem of the intrinsic body diodes of the switches. To verify the ZVS of the main switches and efficiency improvement of the proposed bidirectional dc-dc converter, theoretical analysis and experimental results from a 200 W prototype are discussed.

Journal Article
TL;DR: This paper presents a new system-identification approach that adapts dynamically to the sparseness level of the system, and thus works well in both sparse and non-sparse environments, with much lower complexity than existing algorithms.
Abstract: In practice, one often encounters systems that have a sparse impulse response, with the degree of sparseness varying over time. This paper presents a new approach to identifying such systems that adapts dynamically to the sparseness level of the system and thus works well in both sparse and non-sparse environments. The proposed scheme uses an adaptive convex combination of the LMS algorithm and the recently proposed sparsity-aware zero-attractor LMS (ZA-LMS) algorithm. It is shown that for non-sparse systems the proposed combined filter always converges to the LMS algorithm, which is the better of the two filters in the non-sparse case because it yields a lower steady-state excess mean-square error (EMSE). For semi-sparse systems, on the other hand, it converges to a solution that yields a lower steady-state EMSE than either of the component filters. For highly sparse systems, depending on the value of a proportionality constant in the ZA-LMS algorithm, the proposed combined filter may either converge to the ZA-LMS-based filter or, as in the semi-sparse case, produce a solution that outperforms both constituent filters. A simplified update formula for the mixing parameter of the adaptive convex combination is also presented. The proposed algorithm has much lower complexity than existing algorithms, and its claimed robustness against variable sparsity is well supported by simulation results.
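A minimal sketch of the combination idea, assuming the standard sigmoid-parameterized convex combination (λ = sigmoid(a), with a adapted by stochastic gradient and clipped to [−4, 4]) rather than the paper's simplified mixing-parameter update, which the abstract does not give. Component 1 is plain LMS; component 2 is ZA-LMS with the zero-attractor term −ρ·sgn(w). The step sizes μ, ρ, μ_a and the function name are illustrative assumptions.

```python
import numpy as np

def convex_lms_zalms(x, d, taps, mu=0.01, rho=1e-4, mu_a=10.0):
    """Adaptive convex combination of LMS and zero-attractor LMS (ZA-LMS).

    Returns the combined weight vector and the error of the combined output.
    """
    w_lms = np.zeros(taps)
    w_za = np.zeros(taps)
    a = 0.0                       # mixing parameter, lambda = sigmoid(a)
    buf = np.zeros(taps)          # tapped delay line: buf[k] = x[n - k]
    err = np.empty(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        y1, y2 = w_lms @ buf, w_za @ buf
        lam = 1.0 / (1.0 + np.exp(-a))
        e = d[n] - (lam * y1 + (1.0 - lam) * y2)
        # Each component filter adapts on its own error.
        w_lms += mu * (d[n] - y1) * buf
        w_za += mu * (d[n] - y2) * buf - rho * np.sign(w_za)  # attractor pulls taps toward 0
        # Gradient step on the combined squared error with respect to a, then clip.
        a = float(np.clip(a + mu_a * e * (y1 - y2) * lam * (1.0 - lam), -4.0, 4.0))
        err[n] = e
    lam = 1.0 / (1.0 + np.exp(-a))
    return lam * w_lms + (1.0 - lam) * w_za, err
```

Clipping a keeps λ away from exactly 0 or 1, so the mixing parameter can still move if the sparsity of the unknown system changes later.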

Journal Article
TL;DR: Radar measurements with the PRN and the frequency-modulated continuous-wave (FMCW) principles show comparable results, and the PRN radar proves to be a real alternative to the FMCW radar.
Abstract: This paper describes a fully integrated 77-GHz ultra-wideband pseudo-random noise (PRN) radar transceiver in a silicon-germanium technology. The transceiver is equipped with a programmable pseudo-random binary sequence (PRBS) generator, which is realized in a current-mode logic topology and can be operated at a clock rate of up to 4.25 GHz, enabling a range resolution of 3.5 cm. The signal generation unit is simplified by a frequency multiplier that creates the 76.5-GHz carrier from a single 4.25-GHz input, which also clocks the PRBS generator. The transceiver achieves a phase noise of −105.3 dBc/Hz at 1-MHz offset, a transmit output power of 6.2 dBm, a receive gain of 24 dB, and an input-referred 1-dB compression point of −14 dBm. Track-and-hold circuits in the receive path allow a sub-sampling technique that reduces the IF data rate to 1 MHz. Radar measurements with two PRN transceivers using different primitive polynomials were performed concurrently to demonstrate a fundamental function of the programmable PRBS generator. Radar measurements with the PRN and the FMCW principles show comparable results, and the PRN radar proves to be a real alternative to the FMCW radar.
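Two quantities in this abstract can be checked with a few lines of Python. The range resolution of a PRN radar is set by the chip duration, ΔR = c/(2·f_clk), which for the 4.25-GHz clock gives about 3.5 cm. The sketch also models a programmable PRBS generator as a Fibonacci linear-feedback shift register (LFSR), a common textbook structure assumed here rather than taken from the chip. The tap sets (7, 6) and (7, 3) are example choices whose feedback polynomials are primitive of degree 7, not the polynomials used in the paper; a primitive degree-m polynomial yields the maximal period 2^m − 1.

```python
C = 299_792_458.0  # speed of light, m/s

def range_resolution(f_clk_hz):
    """Range resolution of a PRN radar whose chip duration is 1 / f_clk."""
    return C / (2.0 * f_clk_hz)

def lfsr_step(state, taps):
    """One Fibonacci-LFSR step; taps are 1-based bit positions, highest (degree) first."""
    degree = taps[0]
    newbit = 0
    for t in taps:
        newbit ^= (state >> (t - 1)) & 1
    return ((state << 1) | newbit) & ((1 << degree) - 1), newbit

def prbs_bits(taps, count, seed=1):
    """First `count` output bits of the PRBS."""
    state, bits = seed, []
    for _ in range(count):
        state, bit = lfsr_step(state, taps)
        bits.append(bit)
    return bits

def prbs_period(taps, seed=1):
    """Number of steps until the register returns to its (non-zero) seed state."""
    state, steps = lfsr_step(seed, taps)[0], 1
    while state != seed:
        state = lfsr_step(state, taps)[0]
        steps += 1
    return steps
```

Switching the tap set is what makes the generator "programmable" in this sketch: two transceivers with different maximal-length sequences have low cross-correlation, which is the motivation for the concurrent two-transceiver measurement.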

Journal Article
TL;DR: With three independent gates, dual-threshold-voltage design is achievable through a wiring scheme on an uncommitted pattern, and a range of logic functions is also obtained by replacing VDD and GND with complementary input signals.
Abstract: Silicon nanowire transistors with Schottky-barrier contacts exhibit both n-type and p-type characteristics under different bias conditions. Polarity controllability of such transistors has further been demonstrated using an additional polarity gate: the device can be configured as n-type or p-type by controlling the polarity-gate voltage. This paper extends this approach to three independent gates and shows its value for implementing dual-threshold-voltage configurable circuits. The polarity and threshold voltage of uncommitted devices are determined by applying different bias patterns to the three gates. Uncommitted logic gates can thus be configured to implement different logic functions, targeting either high-performance or low-leakage applications. Dual-threshold-voltage design is thereby achievable through a wiring scheme on an uncommitted pattern. With the polarity controllability of the three-independent-gate device, a range of logic functions is also obtained by replacing VDD and GND with complementary input signals. Synthesis results for the ISCAS'85 and VTR sequential benchmark circuits with these devices show, before place and route, comparable performance and a 51% reduction in leakage power consumption compared with a 22-nm low-standby-power FinFET technology.

Journal Article
TL;DR: A novel technique that allows the use of low-speed ADCs by applying spectral extrapolation to the band-limited feedback signal is proposed, enabling efficient implementation of DPD for very wideband signals.
Abstract: With ever-increasing demands for higher data rates, wider bandwidth is required to improve throughput. This trend, however, poses design challenges for digital predistortion (DPD) in many respects. Sampling the broadband power amplifier (PA) output signal requires high-speed analog-to-digital converters (ADCs), which tend to be the most expensive components in a transmitter with DPD. For conventional DPD, the ADC sampling rate has to be several times the original signal bandwidth to cover the out-of-band intermodulation products generated by the nonlinear PA. In this paper, a novel technique that allows the use of low-speed ADCs by applying spectral extrapolation to the band-limited feedback signal is proposed. This enables efficient implementation of DPD for very wideband signals. Experimental results demonstrate that, with the proposed technique, the bandwidth of the acquisition path can be even narrower than that of the original signal. In addition, satisfactory linearization performance has been achieved with wideband signals of up to 160 MHz bandwidth.

Journal Article
TL;DR: This design is the first fully implemented wormhole router with packet branching that can never deadlock; its effectiveness is demonstrated in Neurogrid, a million-neuron neuromorphic system consisting of sixteen chips.
Abstract: We present a tree router for multichip systems that guarantees deadlock-free multicast packet routing without dropping packets or restricting their length. Multicast routing is required to connect the computational units of massively parallel systems efficiently when each unit is connected to thousands of others residing on multiple chips, as is the case in neuromorphic systems. Our tree router implements this one-to-many routing by branching, that is, by recursively broadcasting the packet within a specified subtree. Within this subtree, the packet is accepted only by chips that have been programmed to do so. This approach boosts throughput because memory look-ups are avoided en route, and it keeps the header compact because the header only specifies the route to the subtree's root. Deadlock is avoided by routing in two phases, an upward phase and a downward phase, and by restricting branching to the downward phase. This design is the first fully implemented wormhole router with packet branching that can never deadlock. Its effectiveness is demonstrated in Neurogrid, a million-neuron neuromorphic system consisting of sixteen chips. Each chip has a 256 × 256 silicon-neuron array integrated with a full-custom asynchronous VLSI implementation of the router, which delivers up to 1.17 G words/s across the sixteen-chip network with less than 1 μs of jitter.
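The two-phase scheme can be sketched on an abstract tree. The names are hypothetical, and none of the wormhole flow control or hardware detail is modeled. A packet climbs until it reaches an ancestor of the target subtree root (upward phase, no branching), descends along a single path to that root, and only then branches, broadcasting through the whole subtree, where only chips programmed to accept it keep a copy. The abstract credits confining branching to the downward phase for deadlock freedom; the sketch shows the routing pattern only, not that proof.

```python
def route(parent, children, src, subtree_root, accepts):
    """Return (upward path, downward path to subtree root, chips that accept the packet)."""
    # Ancestors of the target subtree root, including the root itself.
    target_ancestors = set()
    node = subtree_root
    while node is not None:
        target_ancestors.add(node)
        node = parent.get(node)

    # Upward phase: climb until reaching an ancestor of the subtree root; no branching.
    up = [src]
    while up[-1] not in target_ancestors:
        up.append(parent[up[-1]])
    turn = up[-1]

    # Downward phase, part 1: a single path from the turnaround node to the subtree root.
    down = []
    node = subtree_root
    while node != turn:
        down.append(node)
        node = parent[node]
    down.reverse()

    # Downward phase, part 2: branch (broadcast) through the subtree; each chip filters.
    delivered, stack = [], [subtree_root]
    while stack:
        node = stack.pop()
        if node in accepts:
            delivered.append(node)
        stack.extend(children.get(node, []))
    return up, down, sorted(delivered)

# A 7-node binary tree rooted at 0.
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
result = route(parent, children, src=3, subtree_root=2, accepts={5, 6})
```

Here the packet goes up 3 → 1 → 0, down to 2, and is then delivered to chips 5 and 6. The header only needs to encode the path to node 2, which mirrors the compact-header argument in the abstract.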

Journal ArticleDOI
TL;DR: A low-complexity algorithm and architecture for computing the power spectral density (PSD) with the Welch method are presented, together with a low-complexity architecture, based on the proposed PSD algorithm, for a special case of the short-time Fourier transform.
Abstract: This paper presents a low-complexity algorithm and architecture to compute the power spectral density (PSD) using the Welch method. The Welch algorithm provides a good estimate of the spectral power at the cost of high computational complexity. We propose a modified approach that reduces the computational complexity of the Welch PSD computation for 50% overlap. In the proposed approach, an N/2-point FFT is computed, where N is the window length, and is merged with the FFT of the previous N/2 samples to generate the N-point FFT of the overlapped segment. This requires replacing the time-domain windowing operation with a convolution in the frequency domain. Fortunately, for raised-cosine windows this frequency-domain filtering requires only a symmetric 3-tap or 5-tap FIR filter. The proposed method computes (L+1) N/2-point FFTs instead of L N-point FFTs, where L is the number of overlapping segments. In the proposed merged-FFT approach, the even samples are computed exactly, while the odd samples require a half-sample delay shift and are estimated using a bidirectional fractional-delay filter. The complexity reduction comes at the cost of a slight performance loss due to the approximation used to implement the fractional-delay filter; the loss is about 8% with a 2-multiplier fractional-delay filter. A novel architecture based on the proposed algorithm is presented. The proposed architecture not only consumes 33% less energy than the original method but also reduces latency by about 44% for 8 overlapping segments. Furthermore, a low-complexity architecture is presented to compute a special case of the short-time Fourier transform based on the proposed PSD computation algorithm.
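Two identities behind the merged-FFT approach can be checked numerically with a plain DFT. First, multiplying by a periodic Hann window (a raised-cosine window) in time is equivalent to the symmetric 3-tap filter [-1/4, 1/2, -1/4] applied circularly across the DFT bins. Second, with 50% overlap each N-point segment is [previous half, current half], so its even bins are exactly the sum of the two N/2-point DFTs, and the previous half's DFT can be reused from the preceding segment. This is a minimal sketch: the function names are hypothetical, a direct O(N²) DFT stands in for an FFT, and the odd bins, which the paper estimates with a fractional-delay filter, are not modeled.

```python
import cmath
import math


def dft(x):
    """Direct O(n^2) DFT used as a stand-in for an FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def hann_in_frequency(X):
    """Periodic Hann window applied as a 3-tap circular convolution on DFT bins.

    w[t] = 0.5 - 0.5*cos(2*pi*t/N)  <=>  Xw[k] = 0.5*X[k] - 0.25*X[k-1] - 0.25*X[k+1]
    """
    n = len(X)
    return [0.5 * X[k] - 0.25 * X[(k - 1) % n] - 0.25 * X[(k + 1) % n]
            for k in range(n)]


def merged_even_bins(prev_half_dft, curr_half_dft):
    """Even bins X[2m] of the N-point DFT of [prev_half, curr_half], exactly P[m] + C[m]."""
    return [p + c for p, c in zip(prev_half_dft, curr_half_dft)]


# Check both identities on a short, arbitrary real segment.
N = 8
x = [math.sin(0.7 * t) + 0.1 * t for t in range(N)]
w = [0.5 - 0.5 * math.cos(2 * math.pi * t / N) for t in range(N)]
window_err = max(abs(a - b) for a, b in
                 zip(dft([xi * wi for xi, wi in zip(x, w)]), hann_in_frequency(dft(x))))
full = dft(x)
even = merged_even_bins(dft(x[:N // 2]), dft(x[N // 2:]))
merge_err = max(abs(full[2 * m] - even[m]) for m in range(N // 2))
print(window_err, merge_err)  # both at floating-point round-off level
```

Note that windowing in frequency needs the neighbouring odd bins of the merged spectrum, which is why those bins must be estimated at all.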

Journal ArticleDOI
TL;DR: A new concept of a true random number generator (TRNG) is proposed in which the direct proximity of the metastable point is not mandatory; instead, the transition times of two devices are compared.
Abstract: The paper introduces a new concept of a true random number generator (TRNG). Most metastability-based solutions rely on the uncertainty of the logical output state of a device (flip-flop, D-latch) that is meant to resolve from an exact metastable point. However, it has been shown that the metastable point of a bistable circuit (which is practically impossible to reach) does not guarantee absolute randomness or sufficient entropy. We propose the concept of a device in which the direct proximity of the metastable point is not mandatory. In our concept, the transition times of two devices are compared. Such a construction is less sensitive to the proximity of the metastable point, temperature fluctuations, and power supply instabilities. The paper briefly describes metastability phenomena in general and other known metastability-based TRNG concepts. A new concept of a dual-metastability time-competitive generator is presented, analyzed both numerically and theoretically, and verified with a sample circuit implementation. Empirical and statistical test results are presented.
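Comparing two transition times can be illustrated with a toy Monte Carlo model. Each device's transition time is assumed to be a nominal delay plus Gaussian jitter, and a bit is 1 when device A transitions first. The model, its parameters, and the `competitive_bits` helper are assumptions for illustration, not the paper's circuit or analysis. Within this model, a common-mode delay shift (standing in for a temperature or supply change that affects both devices equally) cancels out of the comparison, while a mismatch between the devices biases the output.

```python
import random


def competitive_bits(n, sigma=1.0, mismatch=0.0, common_mode=0.0, seed=None):
    """Toy time-competitive bit source (delays in arbitrary units).

    mismatch    -- extra delay of device A only (device imbalance)
    common_mode -- delay added to both devices (e.g. a shared supply droop)
    """
    rng = random.Random(seed)
    nominal = 10.0
    bits = []
    for _ in range(n):
        t_a = nominal + common_mode + mismatch + rng.gauss(0.0, sigma)
        t_b = nominal + common_mode + rng.gauss(0.0, sigma)
        bits.append(1 if t_a < t_b else 0)  # 1 when device A wins the race
    return bits


balanced = competitive_bits(10000, seed=1)
shifted = competitive_bits(10000, common_mode=3.0, seed=1)
skewed = competitive_bits(10000, mismatch=2.0, seed=1)
print(sum(balanced) / 10000, shifted == balanced, sum(skewed) / 10000)
```

With matched devices the ones-fraction sits near 0.5 and is unchanged by the common-mode shift, whereas a 2σ mismatch pulls it toward roughly 0.08, so any real design still needs matching or post-processing.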

Journal ArticleDOI
TL;DR: The proposed nvSRAM-based FPGA system significantly accelerates the loading speed to less than 1 ns with 2.54 fJ/cell loading energy and achieves 174 times reduction in active leakage power and 15,000 times increase in retention time.
Abstract: High leakage current has been one of the critical issues in SRAM-based Field Programmable Gate Arrays (FPGAs). In recent works, resistive non-volatile memories (NVMs) have been used to tackle this issue owing to their superior energy efficiency and fast power-on speed. Phase Change Memory (PCM) is one of the most promising resistive NVMs, with the advantages of low cost, high density, and a high resistance ratio. However, most of the reported PCM-based FPGAs suffer from significant active leakage power and reliability issues. This paper presents a PCM-based non-volatile SRAM (nvSRAM) with low active leakage power and high reliability, both achieved by biasing the PCM cells at 0 V during FPGA operation. Compared to the state of the art, the proposed nvSRAM-based 4-input lookup table (LUT) achieves a 174-fold reduction in active leakage power and a 15,000-fold increase in retention time. In addition, the proposed nvSRAM-based FPGA system significantly accelerates loading, to less than 1 ns with 2.54 fJ/cell loading energy.
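A 4-input LUT stores its function as 2^4 = 16 configuration bits; in the proposed FPGA these bits reside in nvSRAM cells, and the four inputs simply select one stored bit. The sketch below models that behavior and, assuming only the 16 truth-table cells of one LUT are loaded at the reported 2.54 fJ/cell (peripheral and routing cells ignored), gives 16 × 2.54 = 40.64 fJ per LUT. The helper names and this per-LUT accounting are illustrative assumptions.

```python
def lut4(config_bits, a, b, c, d):
    """Evaluate a 4-input LUT whose 16 stored bits encode the truth table."""
    if len(config_bits) != 16:
        raise ValueError("a 4-input LUT stores 2**4 = 16 configuration bits")
    index = (a << 3) | (b << 2) | (c << 1) | d  # the inputs select one stored bit
    return config_bits[index]


def loading_energy_fj(num_cells, energy_per_cell_fj=2.54):
    """Energy (fJ) to load num_cells nvSRAM cells at the reported per-cell figure."""
    return num_cells * energy_per_cell_fj


AND4 = [0] * 15 + [1]                              # 1 only for inputs 1111 (index 15)
XOR4 = [bin(i).count("1") % 2 for i in range(16)]  # parity of the four inputs
print(lut4(AND4, 1, 1, 1, 1), lut4(XOR4, 1, 1, 0, 0), loading_energy_fj(16))
```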

Journal ArticleDOI
TL;DR: This paper investigates the sine qua non condition of existence of equilibria for electrical systems with external sources furnishing constant power to the loads, which is a scenario encountered in modern applications.
Abstract: In this paper, we investigate the sine qua non condition for the existence of equilibria of electrical systems with external (AC or DC) sources furnishing constant power to the loads, a scenario encountered in modern applications. Two general cases are considered: the system is i) linear time-invariant or ii) nonlinear, with dynamic behavior described by a port-Hamiltonian model with constant dissipation and a switching interconnection matrix. The latter class includes the practically important case of power converters. For both cases, necessary and sufficient conditions for the existence of equilibria are given in the form of an inequality: an upper bound on the power dissipated in steady state must exceed the extracted constant power. An equilibrium exists if and only if this inequality is satisfied.
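The flavor of such a condition is visible in the simplest instance of case i): a DC source V with series resistance R feeding an ideal constant-power load P. Kirchhoff's voltage law gives v = V - R·i and the load imposes v·i = P, so R·i² - V·i + P = 0, which has a real solution if and only if P ≤ V²/(4R), the maximum power the source can deliver through R. This textbook example is only an illustration, not the paper's general port-Hamiltonian result, and the function name is hypothetical.

```python
import math


def cpl_equilibria(v_source, r_series, p_load):
    """Operating points (v_load, i) of a DC source V with series resistance R
    feeding an ideal constant-power load P (v_load * i = P)."""
    # KVL: v_load = V - R*i; load: v_load*i = P  =>  R*i**2 - V*i + P = 0
    disc = v_source ** 2 - 4.0 * r_series * p_load
    if disc < 0:
        return []  # P > V**2 / (4R): no equilibrium exists
    root = math.sqrt(disc)
    currents = {(v_source - root) / (2.0 * r_series),
                (v_source + root) / (2.0 * r_series)}  # set merges the double root
    return sorted((v_source - r_series * i, i) for i in currents)


print(cpl_equilibria(10.0, 1.0, 16.0))  # [(2.0, 8.0), (8.0, 2.0)]
print(cpl_equilibria(10.0, 1.0, 30.0))  # [] because 30 W > 10**2 / 4 = 25 W
```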

Journal ArticleDOI
TL;DR: It is shown that the direct-form LMS adaptive filter has nearly the same critical path as its transpose-form counterpart, but provides much faster convergence and lower register complexity.
Abstract: This paper presents a precise analysis of the critical path of the least-mean-square (LMS) adaptive filter in order to derive architectures for high-speed, low-complexity implementation. It is shown that the direct-form LMS adaptive filter has nearly the same critical path as its transpose-form counterpart, but provides much faster convergence and lower register complexity. The critical-path evaluation further shows that no pipelining is required to implement a direct-form LMS adaptive filter in most practical cases, and that it can be realized with a very small adaptation delay when a very high sampling rate is required. Based on these findings, this paper proposes three structures of the LMS adaptive filter: (i) Design 1, with no adaptation delays; (ii) Design 2, with only one adaptation delay; and (iii) Design 3, with two adaptation delays. Design 1 involves the minimum area and the minimum energy per sample (EPS). The best existing direct-form structure requires 80.4% more area and 41.9% more EPS than Design 1. Designs 2 and 3 involve slightly more EPS than Design 1 but offer nearly twice and thrice the maximum usable frequency (MUF), at a cost of 55.0% and 60.6% more area, respectively.
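The three designs differ in the number of adaptation delays D (0, 1, or 2). Below is a behavioral sketch of a direct-form LMS filter whose weight update uses the error and input vector from D samples earlier, i.e. the delayed-LMS recursion w(n+1) = w(n) + μ·e(n-D)·u(n-D). It models the arithmetic only, not the critical path or the hardware, and the function names, step size, and test system are assumptions.

```python
import random
from collections import deque


def delayed_lms(x, desired, num_taps, mu, delay=0):
    """Direct-form LMS with `delay` adaptation delays (delay=0 is plain LMS)."""
    w = [0.0] * num_taps
    pending = deque(maxlen=delay + 1)  # most recent (error, input vector) pairs
    errors = []
    for n in range(len(x)):
        # Tapped-delay-line input [x[n], x[n-1], ..., x[n-num_taps+1]].
        u = [x[n - k] if n >= k else 0.0 for k in range(num_taps)]
        y = sum(wi * ui for wi, ui in zip(w, u))
        e = desired[n] - y
        errors.append(e)
        pending.append((e, u))
        if len(pending) == delay + 1:
            e_old, u_old = pending[0]  # error and input from `delay` samples ago
            w = [wi + mu * e_old * ui for wi, ui in zip(w, u_old)]
    return w, errors


def system_id_data(h, n, seed=0):
    """Hypothetical test data: white Gaussian input filtered by FIR h (no noise)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    d = [sum(h[k] * x[i - k] for k in range(len(h)) if i >= k) for i in range(n)]
    return x, d


h_true = [0.5, -0.3, 0.2]
x, d = system_id_data(h_true, 3000)
for D in (0, 1, 2):  # delays of Designs 1, 2 and 3
    w, _ = delayed_lms(x, d, len(h_true), mu=0.05, delay=D)
    print(D, [round(wi, 4) for wi in w])
```

In this noise-free system-identification example all three delay settings converge to the unknown coefficients; a larger D mainly slows convergence and narrows the stable step-size range, which is why keeping the adaptation delay small matters.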

Journal ArticleDOI
TL;DR: For the first time, ReRAM-based Non-Volatile Flip-Flop (NVFF) topologies which are optimized for low-voltage operation (including near-VT and sub-VT operation) are presented and compared.
Abstract: The total power budget of Ultra-Low Power (ULP) VLSI Systems-on-Chip (SoCs) is often dominated by the leakage power of embedded memories and status registers. On the one hand, scaling the supply voltage down to the near-threshold (near-$V_{\rm T}$) or even the subthreshold (sub-$V_{\rm T}$) domain is a commonly used, efficient technique to reduce both leakage power and active energy dissipation. On the other hand, emerging CMOS-compatible device technologies such as Resistive Memories (ReRAMs) enable non-volatile, on-chip data storage and zero-leakage sleep periods. For the first time, we present and compare ReRAM-based Non-Volatile Flip-Flop (NVFF) topologies that are optimized for low-voltage operation (including near-$V_{\rm T}$ and sub-$V_{\rm T}$ operation). Three low-voltage NVFF circuit topologies are proposed and evaluated in terms of energy dissipation and reliability. For topologies with two complementarily programmed ReRAM devices, Monte Carlo simulations accounting for parametric variations confirm reliable data restore from the ReRAM devices at a sub-$V_{\rm T}$ voltage as low as 400 mV. A topology using a single ReRAM device exhibits lower write energy but requires a near-$V_{\rm T}$ voltage for a robust read. Energy characterization is performed at nominal, near-$V_{\rm T}$, and sub-$V_{\rm T}$ supply voltages. The minimum energy point is reached for near-$V_{\rm T}$ read operation, with a total read+write energy of 735 fJ.
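Why two complementarily programmed devices can be read more robustly than one can be illustrated with a toy Monte Carlo model. Resistances are assumed log-normal, a single device is compared against a fixed reference at the geometric mean of the low- and high-resistance states (LRS/HRS), and a complementary pair is compared device against device. The differential comparison sees the full log(HRS/LRS) gap instead of half of it, against only √2 times the noise, so its error rate is lower. The variation model, the on/off ratio, the reference placement, and the function name are assumptions; the sketch ignores the flip-flop circuitry and supply-voltage effects studied in the paper.

```python
import math
import random


def read_error_rates(n, sigma, on_off_ratio=10.0, seed=0):
    """Toy Monte Carlo: error rates of single-device vs complementary read.

    Log-resistances get Gaussian variation (log-normal resistances). LRS is
    normalized to 1 and HRS = on_off_ratio. A stored '1' is assumed to put
    device A in LRS and its complementary partner B in HRS.
    """
    rng = random.Random(seed)
    log_lrs, log_hrs = 0.0, math.log(on_off_ratio)
    log_ref = 0.5 * (log_lrs + log_hrs)  # fixed reference at the geometric mean
    single_errors = complementary_errors = 0
    for _ in range(n):
        a = log_lrs + rng.gauss(0.0, sigma)  # device A (should read as LRS)
        b = log_hrs + rng.gauss(0.0, sigma)  # device B (should read as HRS)
        single_errors += a > log_ref         # single device vs reference
        complementary_errors += a > b        # A compared directly against B
    return single_errors / n, complementary_errors / n


single, complementary = read_error_rates(20000, sigma=0.8)
print(single, complementary)
```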