Showing papers on "Clock gating" published in 2017


Journal ArticleDOI
TL;DR: A novel DNU tolerant latch design is proposed that is designed specifically to provide additional reliability when clock gating is used. A TNU tolerant latch is also proposed and is shown to provide superior soft error resiliency while incurring a 40 percent overhead compared to DNU tolerant designs.
Abstract: As the process feature size continues to scale down, the susceptibility of logic circuits to radiation induced error has increased. This trend has led to the increase in sensitivity of circuits to multi-node upsets. Previously, work has been done to harden latches against single event upsets (SEU). Currently, there has been a concerted effort to design latches that are tolerant to double node upsets (DNU) and triple node upsets (TNU). In this paper, we first propose a novel DNU tolerant latch design. The latch is designed specifically to provide additional reliability when clock gating is used. Through experimentation, it is shown that the DNU tolerant latch is 11.3 percent more power efficient than existing latch designs suited for clock gating. In addition to the DNU tolerant design, we propose the first TNU tolerant latch. The TNU tolerant latch is shown to provide superior soft error resiliency while incurring a 40 percent overhead compared to DNU tolerant designs.

69 citations


Patent
30 Sep 2017
TL;DR: In this paper, the mechanisms and apparatuses relating to configurable clock gating in spatial arrays are described, and a configuration controller coupled to a first processing element and a second processing element of the plurality of processing elements is described.
Abstract: Methods and apparatuses relating to configurable clock gating in spatial arrays are described. In one embodiment, a processor includes processing elements; an interconnect network between the processing elements; and a configuration controller, coupled to a first processing element and a second processing element of the plurality of processing elements and the first processing element having an output coupled to an input of the second processing element, to configure the second processing element to clock gate at least one clocked component of the second processing element, and configure the first processing element to send a reenable signal on the interconnect network to the second processing element to reenable the at least one clocked component of the second processing element when data is to be sent from the first processing element to the second processing element.

25 citations


Journal ArticleDOI
01 Apr 2017
TL;DR: Dataflow specifications are adopted as a starting point to feature power minimization in coarse-grained reconfigurable embedded systems and the validity of this model-based approach has been proved over the reconfigured computing core of a multi-functional coprocessor for image processing applications.
Abstract: Modern embedded systems, to accommodate different applications or functionalities over the same substrate and provide flexibility at the hardware level, are often resource redundant and, consequently, power hungry. Therefore, dedicated design frameworks are required to implement efficient runtime reconfigurable platforms. Such frameworks, to challenge this scenario, need also to offer application specific support for power management. In this work, we adopt dataflow specifications as a starting point to feature power minimization in coarse-grained reconfigurable embedded systems. The proposed flow is composed of two subsequent steps: 1) the characterization of the optimal topological system specification(s) and 2) the identification of disjointed logic regions. These latter are then used to implement clock and power gating methodologies. The validity of this model-based approach has been proved over the reconfigurable computing core of a multi-functional coprocessor for image processing applications. Results have been assessed targeting both an ASIC 90 nm technology and a 45 nm one.

21 citations


Journal ArticleDOI
TL;DR: By applying the proposed method to JPEG compression standard, it is shown that the reconstructed images of the proposed structure have favorable quality and compression ratio, in addition to achieving lower power consumption and area overhead.
Abstract: Discrete Cosine Transform (DCT) is known as one of the most significant, fundamental, and prevailing transforms in data compression, watermarking, and medical digital image processing applications for secure storing, sending, and transmitting in cyberspace. This paper addresses a low power, area efficient, and low complexity structure of COordinate Rotation DIgital Computer (CORDIC)-based DCT for the Wireless Capsule Endoscopy (WCE) application. In this work, a novel structure called low-power Lookahead (LPLA) CORDIC is developed to overcome the problems of power supply restriction and small dimensions while maintaining the image quality in the WCE. The key idea in the proposed structure is that the successive computations of the CORDIC algorithm are performed by utilizing the pipeline property of Lookahead (LA) CORDIC. This allows a single common hardware block to be used for the two CORDIC outputs. Thus, the number of adder and shifter blocks required for CORDIC computation decreases in the proposed structure. Moreover, we utilize pipelining, clock gating and data gating techniques, which result in further reduction of power consumption. The simulation results in TSMC 0.13 µm technology at a 1.2 V power supply demonstrate that the proposed LPLA CORDIC-based DCT structure consumes, on average, 64% of the power, 43% of the area, and 8% of the power-delay-product (PDP) of the most efficient structures reported in the literature. Furthermore, by applying the proposed method to the JPEG compression standard, it is shown that the reconstructed images of the proposed structure have favorable quality and compression ratio, in addition to achieving lower power consumption and area overhead. According to the aforementioned features, the proposed LPLA CORDIC-based DCT structure is suitable for use in the compressor part of a wireless endoscopy capsule and in other battery-based low-power, high-quality systems.
Highlights: In this paper, a DCT-based compressor using LPLA CORDIC in the WCE application is proposed. The main idea is based on employing a single common hardware for two CORDIC outputs. Clock gating and pipeline techniques are used for further reduction in power and glitch. The number of adder and shifter blocks is reduced compared with other known structures. Simulation results show 64%, 43% and 8% reduction in power, area and PDP respectively.

19 citations


Journal ArticleDOI
TL;DR: A set of techniques that, considering the dynamic streaming behavior of algorithms, can achieve power savings by selectively switching off parts of the circuits when they are temporarily inactive are introduced.
Abstract: This paper investigates the reduction of dynamic power for streaming applications yielded by asynchronous dataflow designs by using clock gating techniques. Streaming applications constitute a very broad class of computing algorithms in areas such as signal processing, digital media coding, cryptography, video analytics, network routing, packet processing, etc. This paper introduces a set of techniques that, considering the dynamic streaming behavior of algorithms, can achieve power savings by selectively switching off parts of the circuits when they are temporarily inactive. The techniques being independent from the semantic of the application can be applied to any application and can be integrated into the synthesis stage of a high-level dataflow design flow. Experimental results of at-size applications synthesized on field-programmable gate arrays platforms demonstrate power reductions achievable with no loss in data throughput.

16 citations


Journal ArticleDOI
TL;DR: A dual-lane DC-to-12.5 Gb/s all-rate clock and data recovery IC with a single LC voltage-controlled oscillator is fabricated in a 90 nm CMOS that features an automatic loop gain control scheme that adjusts the bandwidth of a CDR in the background for optimal bit error rate (BER) performance.
Abstract: A dual-lane DC-to-12.5 Gb/s all-rate clock and data recovery (CDR) IC with a single LC voltage-controlled oscillator is fabricated in a 90 nm CMOS. An all-rate clock divider with an asynchronous phase calibration scheme is employed to generate all-rate clock signals without a phase mismatch or duty cycle distortion. The IC features an automatic loop gain control scheme that adjusts the bandwidth of a CDR in the background for optimal bit error rate (BER) performance by monitoring the phase difference between the incoming data and the recovered clock signal. The proposed CDR consumes 244 mW at 12.5 Gb/s under dual-lane operation with an input sensitivity of 12 mVpp,diff. The CDR supports referenceless all-rate operation with a BER < 10^-12 on PRBS31 and compensates for 20 dB of channel loss using a continuous-time linear equalizer (CTLE), a one-tap decision feedback equalizer (DFE), and a three-tap pre-emphasis filter. The power efficiency of the test chip is 9.76 mW/Gb/s.
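The quoted power efficiency follows from the other figures in the abstract if the 244 mW is divided by the aggregate throughput of both lanes. The short check below is only a sketch of that arithmetic; the function name is hypothetical, and the reading that efficiency is computed over the two-lane aggregate rate is an assumption that happens to reproduce the stated value.

```python
# Figures quoted in the abstract above.
TOTAL_POWER_MW = 244.0   # power under dual-lane operation
LANE_RATE_GBPS = 12.5    # data rate per lane
LANES = 2                # dual-lane operation


def power_efficiency_mw_per_gbps(power_mw, rate_gbps, n_lanes):
    """Power per unit of aggregate throughput, in mW/(Gb/s).

    Assumption: efficiency is taken over the sum of all lane rates.
    """
    return power_mw / (rate_gbps * n_lanes)


efficiency = power_efficiency_mw_per_gbps(TOTAL_POWER_MW, LANE_RATE_GBPS, LANES)
print(f"{efficiency:.2f} mW/Gb/s")  # prints "9.76 mW/Gb/s"
```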

16 citations


Proceedings ArticleDOI
06 Apr 2017
TL;DR: The functional verification of the conventional and reversible Kogge-stone Adder, Vedic Multiplier and Barrel Shifter is performed using Verilog in Xilinx ISE, and the power, delay and area of the Kogge-stone Adder, Vedic Multiplier and Barrel Shifter are computed using Cadence RTL compiler software in 90 nm technology.
Abstract: The modern systems are denser and faster. But these systems consume more power. The variants of power dissipation are dynamic power, leakage power, short circuit power and static power dissipation. Adders, Shifters and Multipliers are the essential building blocks of any Digital Signal Processor (DSP) architecture. Hence design of these building blocks to dissipate less power is of utmost importance in digital system design. In order to reduce this power dissipation, there are many low power approaches such as Multi-Vth, reducing the voltage swing, clock gating, use of reversible logic gates etc. The main advantage of designing circuits using reversible logic gates is that the designed circuits will be compatible with the available resources. Reversible logic gates are those logic gates which have the same number of inputs and outputs and in which all the outputs are unique for a given input combination. These gates are used to reduce the power dissipation due to loss of information bits. The functional verification of the conventional and reversible Kogge-stone Adder, Vedic Multiplier and Barrel Shifter is performed using Verilog in Xilinx ISE, and the power, delay and area of the Kogge-stone Adder, Vedic Multiplier and Barrel Shifter are computed using Cadence RTL compiler software in 90 nm technology. The proposed reversible Vedic Multiplier consumed 0.621% less power than the conventional Vedic Multiplier. The power dissipation of the reversible Barrel Shifter is found to be 26.49% less than that of the conventional Barrel Shifter.

16 citations


Posted Content
TL;DR: This paper presents an investigation of the power consumption overhead of the DPR process using a high-speed digital oscilloscope and the shunt resistor method; results in terms of reconfiguration time and power consumption overhead for Virtex 5 FPGAs are shown.
Abstract: In the context of embedded systems design, two important challenges are still under investigation. First, improve real-time data processing, reconfigurability, scalability, and self-adjusting capabilities of hardware components. Second, reduce power consumption through low-power design techniques such as clock gating, logic gating, and dynamic partial reconfiguration (DPR) capabilities. Today, several applications, e.g., cryptography, software-defined radio or aerospace missions, exploit the benefits of DPR of programmable logic devices. DPR allows a well-defined reconfigurable FPGA region to be modified during runtime. However, it introduces an overhead in terms of power consumption and time during the reconfiguration phase. In this paper, we present an investigation of the power consumption overhead of the DPR process using a high-speed digital oscilloscope and the shunt resistor method. Results in terms of reconfiguration time and power consumption overhead for Virtex 5 FPGAs are shown.

13 citations


Patent
20 Sep 2017
TL;DR: In this article, a clock gating circuit is described, which includes an input circuit configured to receive an enable signal and a clock enable circuitry that receives an input clock signal, and a latch that captures and stores an enabled state of the enable signal.
Abstract: A clock gating circuit is disclosed. The clock gating circuit includes an input circuit configured to receive an enable signal and clock enable circuitry configured to receive an input clock signal. The clock gating circuit also includes a latch that captures and stores an enabled state of the enable signal when the enable signal is asserted. An output circuit is coupled to the latch, and provides an output signal corresponding to a state of the clock signal when the latch is storing the enabled state. The clock gating circuit is arranged such that, when the latch is not storing the enabled state, no dynamic power is consumed responsive to state changes of the input clock signal.
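The behaviour described here (a latch holds the enable state, and the output follows the clock only while that state is stored) can be illustrated with a small behavioural model. The sketch below models the conventional textbook latch-plus-AND clock gate, not the specific circuit claimed in the patent; the function name and the sample waveforms are hypothetical.

```python
def gated_clock(clk_samples, en_samples):
    """Return gated-clock samples for a latch-plus-AND clock gate.

    Assumption: the latch is transparent while clk is 0 and holds its value
    while clk is 1, so a change on enable during the high phase cannot
    truncate or glitch the output pulse.
    """
    latched_en = 0
    out = []
    for clk, en in zip(clk_samples, en_samples):
        if clk == 0:
            latched_en = en          # latch is transparent in the low phase
        out.append(clk & latched_en) # output follows clk only when enabled
    return out


# Hypothetical waveforms, one sample per half clock period.
clk = [0, 1, 0, 1, 0, 1, 0, 1]
en = [0, 0, 1, 1, 1, 0, 0, 0]
gclk = gated_clock(clk, en)
print(gclk)  # prints [0, 0, 0, 1, 0, 1, 0, 0]
```

In this trace the enable drops during the third high phase, yet the output still delivers a full pulse because the latch holds the enabled state until the clock returns low. While the latch holds 0, the output never toggles, which is the source of the dynamic power saving.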

12 citations


Proceedings ArticleDOI
01 Feb 2017
TL;DR: A fully integrated adiabatic clocking scheme that efficiently synthesizes n-step clock waveforms from 1MHz to 2GHz via a switched-capacitor DC-AC multi-level inverter topology, theoretically reducing power by 1/n without using any magnetic component.
Abstract: Clock distribution in modern SoCs consumes a significant fraction of total chip power. To reduce clock distribution power, resonant clocking schemes, where an inductive reactance is used to cancel the capacitive reactance of global clock networks at a given resonance frequency, f_o, have been proposed. Conventionally, such schemes are only suitable at high multi-GHz frequencies in order to be able to place the employed inductors on chip [1, 2]. Since many modern energy-efficient SoC designs optimize for clock frequencies and V_DD to the MHz and near-threshold regimes, respectively, there is a need to develop low-power clock distribution schemes that can work across increasingly wider operating ranges. Recent work in quasi-continuous resonant clocking has proposed intermittent cancelation of global clock-tree capacitance during edge transitions; however, such techniques require large off-chip inductors and are limited to 0.98MHz [3] and 150MHz [4], respectively, owing to the need to operate well below resonance (i.e., f_o/10). Thus, while prior art has shown power reduction for targeted applications, these schemes all require large on- or off-chip magnetics, and do not meet the MHz-to-GHz frequency-range needs of modern DVFS-enabled SoCs. To address these problems, this paper introduces a fully integrated adiabatic clocking scheme that efficiently synthesizes n-step clock waveforms from 1MHz to 2GHz via a switched-capacitor DC-AC multi-level inverter topology, theoretically reducing power by 1/n without using any magnetic component.
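The theoretical 1/n factor can be checked with the standard stepwise-charging argument: charging a capacitance C through a resistive switch by a voltage step ΔV dissipates ½·C·ΔV², so n equal steps of V/n dissipate n·½·C·(V/n)² = ½·C·V²/n in total. The sketch below evaluates that expression; it assumes ideal, fully settled steps and ignores the overhead of the step-generating circuitry, and the capacitance and supply values are hypothetical.

```python
def stepwise_charging_loss(c_farads, v_volts, n_steps):
    """Energy (J) dissipated while charging C from 0 to V in n equal steps.

    Assumption: each step fully settles and dissipates 0.5 * C * (V/n)**2.
    """
    step = v_volts / n_steps
    return n_steps * 0.5 * c_farads * step ** 2


C_CLK = 100e-12  # hypothetical clock network capacitance, 100 pF
VDD = 1.0        # hypothetical supply voltage, 1 V

conventional = stepwise_charging_loss(C_CLK, VDD, 1)
for n in (1, 2, 4, 8):
    loss = stepwise_charging_loss(C_CLK, VDD, n)
    print(f"n={n}: {loss / conventional:.3f} of the single-step loss")
```

With these assumptions the ratio printed for each n is 1/n, matching the scaling stated in the abstract.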

11 citations


Patent
Martin Clara
07 Feb 2017
TL;DR: In this paper, a differential phase adjustment approach measures the phase imbalance and corrects the differential clock input signals used for generating clock signals which drive the digital-to-analog converter or the analog-to-digital converter.
Abstract: Differential clock phase imbalance can produce undesirable spurious content at a digital to analog converter output, or interleaving spurs on an analog-to-digital converter output spectrum, or more generally, in interleaving circuit architectures that depend on rising and falling edges of a differential input clock for triggering digital-to-analog conversion or analog-to-digital conversion. A differential phase adjustment approach measures for the phase imbalance and corrects the differential clock input signals used for generating clock signals which drive the digital-to-analog converter or the analog-to-digital converter. The approach can reduce or eliminate this phase imbalance, thereby reducing detrimental effects due to phase imbalance or differential clock skew.

Journal ArticleDOI
TL;DR: Experimental results indicate that the K-means clustering heuristic significantly reduces the clock power by clustering modules with similar switching behavior and close proximity, and the SA algorithm effectively inserts the shutdown gates to a 3D clock tree, while considering control TSV’s placement.
Abstract: We propose efficient algorithms to construct a low-power clock tree for through-silicon-via (TSV)-based 3D-ICs. We use shutdown gates to save clock trees’ dynamic power, which selectively turn off certain clock tree branches to avoid unnecessary clock activities when the modules in these tree branches are inactive. While this clock gating technique has been extensively studied in 2D circuits, its application in 3D-ICs is unclear. In 3D-ICs, a shutdown gate is connected to a control signal unit through control TSVs, which may cause placement conflicts with existing clock TSVs in the layout due to TSV’s large physical dimension. We develop a two-phase clock tree synthesis design flow for 3D-ICs: (1) 3D abstract clock tree generation based on K-means clustering and (2) clock tree embedding with simultaneous shutdown gates’ insertion based on simulated annealing (SA) and a force-directed TSV placer. Experimental results indicate that (1) the K-means clustering heuristic significantly reduces the clock power by clustering modules with similar switching behavior and close proximity, and (2) the SA algorithm effectively inserts the shutdown gates to a 3D clock tree, while considering control TSV’s placement. Compared with previous 3D clock tree synthesis techniques, our K-means clustering-based approach achieves larger reduction in clock tree power consumption while ensuring zero clock skew.
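The first phase, clustering modules that are both physically close and similar in switching behavior, can be sketched with plain K-means over a feature vector per module. The code below is an illustration only and not the authors' algorithm: the feature choice (x, y, switching activity, all normalized to 0..1), the fixed initial centroids, and the sample data are assumptions.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[k])  # keep an empty cluster in place
        centroids = updated
    return labels, centroids


# Hypothetical modules: (x, y, switching activity).
modules = [
    (0.10, 0.20, 0.90), (0.20, 0.10, 0.80), (0.15, 0.25, 0.85),
    (0.90, 0.80, 0.10), (0.80, 0.90, 0.20), (0.85, 0.75, 0.15),
]
labels, centers = kmeans(modules, centroids=[modules[0], modules[3]])
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Modules that end up in the same cluster are candidates for sharing a clock tree branch behind one shutdown gate, since they are near each other and tend to be idle at the same time.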

Journal ArticleDOI
TL;DR: This paper proposes the first CM clock synthesis (CMCS) methodology to reduce the overall clock network power with low skew and can integrate with traditional clock routing followed by transmitter and receiver sizing.
Abstract: In a high-performance VLSI design, the clock network consumes a significant amount of power. While most existing methodologies use voltage-mode (VM) signaling, these clock distributions lose a tremendous amount of dynamic power to charge/discharge the large global clock capacitance. New circuit approaches for current-mode (CM) clocking save significant clock power, but have been limited to only symmetric networks, while most application specific integrated circuits have asymmetric clock distributions. In this paper, we propose the first CM clock synthesis (CMCS) methodology to reduce the overall clock network power with low skew. The method can integrate with traditional clock routing followed by transmitter and receiver sizing. We validate the proposed methodology on ISPD 2009 and 2010 industrial benchmarks using extracted SPICE models of networks distributed over areas of 1.4–275.6 mm² and consisting of 81–2249 sinks. This methodology saves 39%–84% average power with similar skew on the benchmarks in 45-nm CMOS technology simulations at clock frequencies ranging from 1 to 3 GHz. In addition, the CMCS methodology takes 2.4–9.1× less running time and consumes 20%–26% less transistor area compared with synthesized, buffered VM clock distributions.

Proceedings ArticleDOI
S. Aakash, A. Anisha, G. Jaswanth Das, T. Abhiram, J. P. Anita
20 Apr 2017
TL;DR: In this article, an improved version of the conventional comparator design for high-speed functioning and low power consumption has been proposed to overcome the challenges faced due to the digital change.
Abstract: In the fast moving digital world, it becomes imperative to constantly come up with innovation in digitization. The analog to digital converter is the second most widely used device in the world of electronic circuits. ADCs are composed of dynamic comparators. To overcome the challenges faced due to the digital change, improved versions of the conventional comparator design for high-speed functioning and low power consumption have been proposed. Area is another main factor to keep in mind in the design of these dynamic comparators. 180 nm CMOS technology and a constant supply voltage of 0.8 V have been used. A conventional double tail comparator has been designed by adding transistors without hindering the functionality. This provides faster, more efficient modification of the comparator design. A new design for a dynamic regenerative double-tail comparator has been proposed which uses clock-gating techniques. This further reduces the power consumption and provides higher speed by reducing the delay time of the circuit.

Journal ArticleDOI
TL;DR: A probabilistic model is implemented to maximize the expected energy savings by grouping FFs in increasing order of their data-to-clock toggling probabilities, and it is shown to achieve the power savings of 23% and 17%, respectively, compared with designs with ordinary FFs.
Abstract: Data-driven clock gated (DDCG) and multibit flip-flops (MBFFs) are two low-power design techniques that are usually treated separately. Combining these techniques into a single grouping algorithm and design flow enables further power savings. We study MBFF multiplicity and its synergy with FF data-to-clock toggling probabilities. A probabilistic model is implemented to maximize the expected energy savings by grouping FFs in increasing order of their data-to-clock toggling probabilities. We present a front-end design flow, guided by physical layout considerations for a 65-nm 32-bit MIPS and a 28-nm industrial network processor. It is shown to achieve the power savings of 23% and 17%, respectively, compared with designs with ordinary FFs. About half of the savings was due to integrating the DDCG into the MBFFs.
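The grouping idea, sorting flip-flops by toggling probability so that rarely toggling FFs share a gated clock, can be sketched as follows. This is an illustration rather than the paper's probabilistic model: it assumes FFs toggle independently, that a group must be clocked whenever any of its FFs toggles, and the probabilities are hypothetical.

```python
from math import prod


def group_by_toggle_probability(probs, k):
    """Sort FF indices by increasing toggling probability and cut into groups of k."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    return [order[i:i + k] for i in range(0, len(order), k)]


def group_clock_probability(probs, group):
    """Probability that the group's shared clock must be enabled in a cycle.

    Assumption: FFs toggle independently, and the group is clocked if any
    member toggles.
    """
    return 1.0 - prod(1.0 - probs[i] for i in group)


# Hypothetical data-to-clock toggling probabilities for four FFs.
probs = [0.90, 0.05, 0.80, 0.10]

sorted_groups = group_by_toggle_probability(probs, k=2)
naive_groups = [[0, 1], [2, 3]]  # grouping in netlist order, for comparison

sorted_cost = sum(group_clock_probability(probs, g) for g in sorted_groups)
naive_cost = sum(group_clock_probability(probs, g) for g in naive_groups)
print(sorted_groups)  # prints [[1, 3], [2, 0]]
print(f"sorted: {sorted_cost:.3f}, naive: {naive_cost:.3f}")
```

Under these assumptions the sorted grouping has the lower total clock-enable probability, because pairing a busy FF with a quiet one forces the quiet FF to be clocked almost every cycle.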

Journal ArticleDOI
TL;DR: In this article, a method has been proposed by which one can reduce the clock jitter and achieve almost flat frequency clock output from the phase-locked loop (PLL), independent of the power supply voltage fluctuation.
Abstract: In this paper, a method has been proposed by which one can reduce the clock jitter and achieve almost flat frequency clock output from the phase-locked loop (PLL), independent of the power supply voltage fluctuation. These voltage fluctuations occur when a given chip comes out from the sleep mode to the active mode. This causes the chip to draw a sudden surge of current, which in turn produces L·dI/dt noise. That causes the voltage to drop and also to oscillate at the power delivery network’s resonance frequency. This power supply noise causes clock jitter. The voltage-controlled oscillator of the proposed PLL is designed at 45-nm technology such that when there is supply voltage variation, it is automatically corrected by a feedback methodology having only 11-ps response time delay, compared to 588-ps clock period. Simulation results show that, for the proposed new PLL design, the number of places where the clock periods are altered due to this power supply voltage fluctuation is reduced. The performance of the proposed PLL design in terms of reduction of clock jitter, caused by the variation of power supply voltage and the flatness of the frequency versus power supply voltages, is tested by feeding the clock to a circuit (c17 of ISCAS’85) for the conventional methodology and also for our new methodology. It has been shown that, using the proposed method, the clock jitter caused by the power supply noise can be reduced by about 50% compared to the conventional design methodology.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper evaluates energy consumption of data types, operators, control statements, exception, and object in Java at a granular level to help in standardizing the energy consumption traits of Java which can be leveraged by software developers to generate energy efficient code in future.
Abstract: There has been a 10,000-fold increase in performance of supercomputers since 1992 but only a 300-fold improvement in performance per watt. Dynamic adaptation of hardware techniques, such as fine-grain clock gating, power gating and dynamic voltage/frequency scaling, has been used for many years to improve the computer's energy efficiency. However, recent demands of exascale computation, as well as the increasing carbon footprint, require new breakthroughs to make ICT systems more energy efficient. Energy efficient software has not been well studied in the last decade. In this paper, we take an early step to investigate the energy efficiency of Java, which is one of the most common languages used in ICT systems. We evaluate energy consumption of data types, operators, control statements, exception, and object in Java at a granular level. Intel Running Average Power Limit (RAPL) technology is applied to measure the relative power consumption of small code snippets. Several observations are found, and these results will help in standardizing the energy consumption traits of Java which can be leveraged by software developers to generate energy efficient code in future.

Journal ArticleDOI
TL;DR: This brief describes the design and implementation of a 250-Mb/s to 6-Gb/s single-loop referenceless clock and data recovery circuit using the clock frequency multiplier and the referenceless frequency acquisition circuit to cover a wide-range data rate.
Abstract: This brief describes the design and implementation of a 250-Mb/s to 6-Gb/s single-loop referenceless clock and data recovery circuit. The clock frequency multiplier and the referenceless frequency acquisition circuit are used to cover a wide-range data rate. The clock frequency multiplier is proposed to generate the 6-GHz clock with low jitter. In addition, the voltage-controlled oscillator operates at 1/5-rate frequency of the sampling clock, which has a merit of low power consumption. The proposed circuit achieves 9.56-ps rms jitter, consumes 13.2 mW at 6 Gb/s, and occupies 0.0944 mm² in a 65-nm CMOS technology.

Proceedings ArticleDOI
01 Jan 2017
Abstract: A mechanical circuit has been demonstrated that harnesses squegging to convert −50dBm of input continuous-wave (CW) energy into a local 1-kHz clock output while consuming three orders less local battery power than a typical real-time clock (RTC). Unlike a previous clock receiver that relied on a modulated RF input, this clock generator converts a CW input — no modulation needed — to a clock output via squegging of an impacting micromechanical resonant switch (“resoswitch”). Here, impact-induced disruption compels the device's resonating element to lose oscillation amplitude (hence stop impacting), then recover to impact again, only to again lose amplitude, in a periodic and repeatable fashion. The resulting time domain waveform, with periodic peaks and valleys, then provides a stable frequency that serves as a local on-board clock for low data rate applications. By dispensing with the need for a positive feedback sustaining amplifier, this CW-powered mechanical clock generator operates with only 0.8nW of battery power when outputting a triangle-wave into 0.8pF, which is 1250× lower than the μW of a typical RTC.

Proceedings ArticleDOI
24 Jun 2017
TL;DR: The primary contribution of CHARSTAR is optimizing reconfiguration mechanisms to become clock hierarchy aware, which improves processor energy efficiency by 20–25%, with efficiency improvements of roughly 2x in comparison to a naive power gating mechanism.
Abstract: High-performance architectures are over-provisioned with resources to extract the maximum achievable performance out of applications. Two sources of avoidable power dissipation are the leakage power from underutilized resources, along with clock power from the clock hierarchy that feeds these resources. Most reconfiguration mechanisms either focus solely on power gating execution resources alone or in addition, simply turn off the immediate clock tree segment which supplied the clock to those resources. These proposals neither attempt to gate further up the clock hierarchy nor do they involve the clock hierarchy in influencing the reconfiguration decisions. The primary contribution of CHARSTAR is optimizing reconfiguration mechanisms to become clock hierarchy aware. Resource gating decisions are cognizant of the power consumed by each node in the clock hierarchy and additionally, entire branches of the clock tree are greedily shut down whenever possible. The CHARSTAR design is further optimized for balanced spatio-temporal reconfiguration and also enables efficient joint control of resource and frequency scaling. The proposal is implemented by leveraging the inherent advantages of spatial architectures, utilizing a control mechanism driven by a lightweight offline trained neural predictor. CHARSTAR, when deployed on the CRIB tiled microarchitecture, improves processor energy efficiency by 20-25%, with efficiency improvements of roughly 2x in comparison to a naive power gating mechanism. Alternatively, it improves performance by 10-20% under varying power and energy constraints.

Patent
30 Mar 2017
TL;DR: In this article, a duty cycle corrector (DCC) is proposed to correct the duty cycle of a clock signal by detecting a difference in phase between the output clock signal and the intermediate clock signal.
Abstract: Apparatuses and methods for correcting a duty cycle of a clock signal are described. An example apparatus includes: a duty cycle corrector (DCC) that receives an input clock signal and a control signal and produces an output clock signal responsive, at least in part, to the input clock signal and the control signal; a circuit that divides a frequency of the input clock signal by a positive even integer and generates an intermediate clock signal; and a phase detector that generates the control signal responsive, at least in part, to a difference in phase between the output clock signal and the intermediate clock signal.

Patent
19 Apr 2017
TL;DR: A universal dynamic aging system for Virtex-5 FPGAs (field programmable gate arrays) is proposed, in which the status of the aging function module is automatically adjusted according to the temperature of the chip and aging response data are generated.
Abstract: The invention relates to a universal dynamic aging system for Virtex-5 FPGAs (field programmable gate arrays). The universal dynamic aging system for the Virtex-5 FPGAs includes an upper computer, a program-controlled power source, a programmer, an aging signal board, a high temperature test box, an aging test board and an aging FPGA; the upper computer generates a power-on instruction, sends an aging FPGA configuration bit stream, receives and displays aging response data; the program-controlled power source supplies power; the programmer completes the bit stream configuration of the aging FPGA; the aging signal board generates aging excitation signals; the high temperature test box adjusts the temperature of the aging signal board and the aging FPGA; and the aging FPGA includes a clock management module, a system monitoring module, a clock gating module and an aging function module. According to the universal dynamic aging system for the Virtex-5 FPGAs (field programmable gate arrays), the status of the aging function module is automatically adjusted according to the temperature of a chip, and aging response data are generated.

Proceedings ArticleDOI
05 May 2017
TL;DR: The technique involves symbolic simulation-based co-analysis of a processor's hardware design and a software binary to derive profitable and safe power gating decisions for a given set of module-oblivious domains when the software binary is run on the processor.
Abstract: The increasingly-stringent power and energy requirements of emerging embedded applications have led to a strong recent interest in aggressive power gating techniques. Conventional techniques for aggressive power gating perform module-based power gating in processors, where power domains correspond to RTL modules. We observe that there can be significant power benefits from module-oblivious power gating, where power domains can include an arbitrary set of gates, possibly from multiple RTL modules. However, since it is not possible to infer the activity of module-oblivious power domains from software alone, conventional software-based power management techniques cannot be applied for module-oblivious power gating in processors. Also, since module-oblivious domains are not encapsulated with a well-defined port list and functionality like RTL modules, hardware-based management of module-oblivious domains is prohibitively expensive. In this paper, we present a technique for low-cost management of module-oblivious power domains in embedded processors. The technique involves symbolic simulation-based co-analysis of a processor's hardware design and a software binary to derive profitable and safe power gating decisions for a given set of module-oblivious domains when the software binary is run on the processor. Our technique is automated, does not require programmer intervention, and incurs low management overhead. We demonstrate that module-oblivious power gating based on our technique reduces leakage energy by 2x with respect to state-of-the-art aggressive module-based power gating for a common embedded processor.

Proceedings ArticleDOI
28 May 2017
TL;DR: The associated clock signal generation techniques are presented in this paper, which range from particular non-overlapping clocks and extensive buffering to synchronous resetting for adjusting the order of the clock signals in each time-interleaved channel.
Abstract: A clock generation system for a 1GS/s 8-bit subranging time-interleaved analog-to-digital converter (ADC) is introduced. General timing considerations for time-interleaved ADCs are reviewed prior to describing the design methodology for a prototype ADC. This hybrid ADC architecture contains four time-interleaved combined sample-and-hold and capacitive digital-to-analog converter (SHDAC) circuits as front-end sample-and-hold for a flash stage and for a time-interleaved successive approximation stage, which minimizes the errors due to sampling time mismatches between the two stages. The associated clock signal generation techniques that enable this hybrid ADC design approach are presented in this paper, which range from particular non-overlapping clocks and extensive buffering to synchronous resetting for adjusting the order of the clock signals in each time-interleaved channel. The transistor-level and layout-level clock generation circuits were designed and simulated in 130nm CMOS technology, and consume 388mW from a 1.2V supply. The standard deviation of the timing skews between time-interleaved channels is less than 1ps based on Monte Carlo simulations. To evaluate the feasibility of the clock generation approach, post-layout simulations were conducted with the interconnected ADC core layout and routed clock generation circuits. The hybrid ADC achieved an effective number of bits (ENOB) of 7.39 with a sampling frequency of 1GHz and an input frequency close to the Nyquist rate.
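
As a rough illustration of the timing budget implied by four interleaved channels at 1GS/s, the sketch below assigns ideal sampling instants to the channels in round-robin order. The round-robin, equally spaced schedule is an assumption (the usual ideal case), and the function name is hypothetical; the abstract does not describe the actual clock ordering.

```python
from fractions import Fraction


def channel_sample_times(fs_hz, n_channels, n_samples):
    """Assign ideal sampling instants (in seconds) round-robin to channels."""
    ts = Fraction(1, fs_hz)  # aggregate sample period
    schedule = {ch: [] for ch in range(n_channels)}
    for k in range(n_samples):
        schedule[k % n_channels].append(k * ts)
    return schedule


# Values taken from the abstract: 1 GS/s aggregate rate, four channels.
sched = channel_sample_times(1_000_000_000, 4, 8)

# Each channel samples once every four aggregate periods (4 ns, i.e. 250 MS/s),
# and neighbouring channels are offset by one aggregate period (1 ns).
per_channel_period = sched[0][1] - sched[0][0]
channel_offset = sched[1][0] - sched[0][0]
```

Under these assumptions, the reported skew standard deviation of less than 1ps is below 0.1% of the 1 ns spacing between adjacent channels.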

Journal ArticleDOI
TL;DR: The proposed hybrid verification approach is described that represents an integral part of the suggested design methodology, and consists of formal and informal techniques, enabling the verification process to begin at the very early specification stage of the system development.
Abstract: Nowadays, power is a dominant factor that constrains highly integrated hardware-system designs. The resulting problems of high power density, which causes chip overheating, or of a limited power source in modern Internet-of-Things devices are most commonly addressed through dynamic power management. This method enables the use of power-reduction techniques such as clock gating, power gating, or voltage and frequency scaling. Since adopting power management is quite difficult in modern complex systems, new approaches are evolving that aim to simplify the design of power-constrained systems. We have also proposed such an approach, utilizing the system level of design abstraction and increased automation in the design process. In this paper, the proposed hybrid verification approach is described that represents an integral part of the suggested design methodology. It consists of formal and informal techniques, enabling the verification process to begin at the very early specification stage of the system development. Our approach helps a designer to create a correct and consistent power-management specification and verifies whether the specified power intent is preserved after design refinement. The continuous automated verification steps can quickly find errors at early design stages and thus reduce the number of design re-spins, which speeds up the overall development process.

Proceedings ArticleDOI
01 Dec 2017
TL;DR: Gating logic is incorporated to offer a solution to PSN occurrence in CMOS circuits by controlling di/dt, which is generated by the linear current ramp of present-day high-performance CPUs.
Abstract: With the continuous advancement of CMOS process technologies, threats to the noise immunity of CMOS circuits and to the power they consume are growing. At present, although many techniques exist for power reduction, the study of power-supply noise (PSN) under those techniques is almost unattended in the literature. Modern clock gating is one of the best techniques to reduce dynamic and static power dissipation by curbing the switching activity of the operating clock as well as blocking the direct path between the power lines during logic transitions. Therefore, in this paper, we have incorporated gating logic to offer a solution to PSN occurrence in CMOS circuits by controlling di/dt, which is generated by the linear current ramp of present-day high-performance CPUs. It is observed that the gated architectures generate much less di/dt than their non-gated counterparts, resulting in a notable reduction in PSN.
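
The dynamic-power side of this argument follows the standard switching-power relation P = alpha * C * Vdd^2 * f, where alpha is the switching activity. The sketch below applies it to a clock net that is enabled for only a fraction of cycles. All numeric values and function names are hypothetical and are not taken from the paper, and the model does not cover static power or di/dt.

```python
def dynamic_power(alpha, c_farads, vdd, f_hz):
    """Standard switching power: alpha * C * Vdd^2 * f (watts)."""
    return alpha * c_farads * vdd ** 2 * f_hz


def gated_clock_power(c_clk, vdd, f_hz, enabled_fraction):
    """Clock-net power when the clock reaches the load only part of the time.

    An ungated clock net charges its capacitance once per cycle (alpha = 1);
    gating scales that activity by the fraction of enabled cycles.
    """
    if not 0.0 <= enabled_fraction <= 1.0:
        raise ValueError("enabled_fraction must be between 0 and 1")
    return dynamic_power(enabled_fraction, c_clk, vdd, f_hz)


# Hypothetical numbers: 10 pF of clock load, 1.0 V supply, 1 GHz clock.
p_ungated = gated_clock_power(10e-12, 1.0, 1e9, 1.0)
p_gated = gated_clock_power(10e-12, 1.0, 1e9, 0.25)
saving = 1.0 - p_gated / p_ungated
```

With the clock enabled for a quarter of the cycles, this simple model predicts a 75% reduction in clock-net switching power.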

Proceedings ArticleDOI
01 Aug 2017
TL;DR: The experimental results show that the proposed approach improves power consumption by up to 58%, 62% and 67% for Hamming code, duplication with parity and TMR, respectively.
Abstract: The trade-off between power consumption and fault tolerance in embedded processors has been highlighted in recent years. This paper proposes an approach to reduce the dynamic power of the conventional fault-tolerant techniques used in the processor's register file without affecting the effectiveness of the techniques. The power reduction mechanism is based on alleviating the dynamic power of unused registers in the register file. To evaluate the proposed approach, it has been applied to three conventional fault-tolerant techniques: Single-bit Error Correction-Double-bit Error Detection (SEC-DED) code, duplication with parity, and Triple Modular Redundancy (TMR). As a case study, this approach is also employed in two processors, OpenRISC 1200 and LEON III, which are popular in embedded applications. The experimental results show that the proposed approach improves power consumption by up to 58%, 62% and 67% for Hamming code, duplication with parity and TMR, respectively.
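
To make the idea concrete, here is a toy model of a TMR register file in which only the addressed register's copies are clocked on a write. The abstract does not describe the actual gating mechanism, so the class, its counter, and the register count are hypothetical; only the 2-of-3 majority vote is standard TMR.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote, as used by TMR."""
    return (a & b) | (a & c) | (b & c)


class GatedTmrRegisterFile:
    """Toy TMR register file that counts how many register copies get clocked."""

    def __init__(self, n_regs):
        self.n_regs = n_regs
        self.copies = [[0] * n_regs for _ in range(3)]
        self.clocked_updates = 0

    def write(self, idx, value):
        # Only the three copies of the addressed register see a clock edge;
        # every other register stays gated off for this cycle.
        for copy in self.copies:
            copy[idx] = value
        self.clocked_updates += 3

    def read(self, idx):
        a, b, c = (copy[idx] for copy in self.copies)
        return majority(a, b, c)


rf = GatedTmrRegisterFile(32)
rf.write(5, 0xA5)
rf.copies[1][5] ^= 0x04            # inject a single-bit upset in one copy
recovered = rf.read(5)             # the vote masks the upset
ungated_updates = 3 * rf.n_regs    # copies clocked per cycle without gating
```

In this model a write clocks 3 register copies instead of 96, while the majority vote still masks an upset in any single copy.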

Journal ArticleDOI
TL;DR: The fixed-width pulse feedback technique is the most effective of all techniques at reducing clock jitter effects at high sampling frequencies, while the switched-capacitor-resistor and switched-shaped current techniques have the best performance at medium frequencies or below.
Abstract: It is well known that continuous-time Delta-Sigma modulators are very sensitive to clock jitter effects. In the literature, a number of techniques have been proposed to cope with them. In this brief, we present a detailed review and comparison of the reported techniques. While the effectiveness in reducing clock jitter effects may be of most importance in this comparison, we also consider other performance metrics such as the circuit complexity and overhead of implementing the technique, the power consumption overhead of the technique, the synthesis complexity incurred in system-level design, the extensibility of the technique from single-bit to multi-bit operation, and robustness to process variation. When clock jitter is relatively large, the fixed-width pulse feedback technique is the most effective of all techniques at reducing clock jitter effects at high sampling frequencies, while the switched-capacitor-resistor and switched-shaped current techniques have the best performance at medium frequencies or below.

Book ChapterDOI
01 Jan 2017
TL;DR: A 32-bit RISC processor architecture supporting logical, memory, and branching instructions is developed, using the clock gating technique to reduce the power of the RISC core.
Abstract: Here we developed a 32-bit RISC processor architecture using the clock gating technique to perform logical, memory, and branching instructions. Different blocks are used to fetch, decode, execute, and perform memory read/write, forming a four-stage pipeline. A Harvard architecture is used, which contains separate memory spaces for data and program. To reduce the power of the RISC core, the clock gating technique is used at the architectural level as an effective low-power method. The pipeline architecture can be further enhanced using Verilog; simulation is carried out using the ModelSim tool, and the design is implemented on an FPGA board.

Proceedings ArticleDOI
24 Jul 2017
TL;DR: The proposed FPGA-based neural processor system is up to 29% more energy efficient than a baseline LSM processor with little extra hardware overhead and uses the spoken English letters adopted from the TI46 speech recognition corpus as a benchmark.
Abstract: As a model of recurrent spiking neural networks, the Liquid State Machine (LSM) offers a powerful brain-inspired computing platform for pattern recognition and machine learning applications. Because it operates by processing neural spiking activities, the LSM naturally lends itself to an efficient hardware implementation via exploitation of the typical sparse firing patterns that emerge from the recurrent neural network and smart processing of computational tasks that are orchestrated by different firing events at runtime. We explore these opportunities by presenting an LSM processor architecture with integrated on-chip learning and its FPGA implementation. Our LSM processor leverages the sparsity of firing activities to allow for efficient event-driven processing and activity-dependent clock gating. Using the spoken English letters adopted from the TI46 [1] speech recognition corpus as a benchmark, we show that the proposed FPGA-based neural processor system is up to 29% more energy efficient than a baseline LSM processor with little extra hardware overhead.