Showing papers on "Clock gating" published in 2016


Journal ArticleDOI
TL;DR: An area-efficient, globally asynchronous, locally synchronous network-on-chip (NoC) architecture for a hard real-time multiprocessor platform that uses statically scheduled time-division multiplexing (TDM) to control the communication over a structure of routers, links, and network interfaces (NIs).
Abstract: In this paper, we present an area-efficient, globally asynchronous, locally synchronous network-on-chip (NoC) architecture for a hard real-time multiprocessor platform. The NoC implements message-passing communication between processor cores. It uses statically scheduled time-division multiplexing (TDM) to control the communication over a structure of routers, links, and network interfaces (NIs) to offer real-time guarantees. The area-efficient design is a result of two contributions: 1) asynchronous routers combined with TDM scheduling and 2) a novel NI microarchitecture. Together they result in a design in which data are transferred in a pipelined fashion, from the local memory of the sending core to the local memory of the receiving core, without any dynamic arbitration, buffering, or clock synchronization. The routers use two-phase bundled-data handshake latches based on the Mousetrap latch controller and are extended with a clock gating mechanism to reduce the energy consumption. The NIs integrate the direct memory access functionality and the TDM schedule, and use dual-ported local memories to avoid buffering, flow control, and synchronization. To verify the design, we have implemented a 4 × 4 bitorus NoC in 65-nm CMOS technology and we present results on area, speed, and energy consumption for the router, NI, NoC, and postlayout.

89 citations


Journal ArticleDOI
TL;DR: In this paper, a scan-programmable LDO macro in a low-power 0.13-μm technology operating down to 1.07× the transistor $V_{\rm TH}$ and featuring greater than 90% current efficiency across a 50× current range was presented.
Abstract: Digitally implementable LDOs embedded within digital functional units augment their analog counterparts for ultrafine-grained power management in digital ICs. Digital load circuits represent load currents with large and infrequent current transients and require a wide voltage range of operation, preferably down to the threshold voltage ($V_{\rm TH}$) of the transistor. This paper presents a discrete-time, fully digital, scan-programmable LDO macro in a low-power 0.13-μm technology operating down to 1.07× the transistor $V_{\rm TH}$ and featuring greater than 90% current efficiency across a 50× current range through fine-grained clock gating and adaptive control. An 8× improvement in transient response time to large load steps is achieved through switched mode control. Both transient and steady-state operation models and measurements of the LDO are presented.

86 citations


Proceedings ArticleDOI
12 Mar 2016
TL;DR: This work defines a configurable Software Defined Radio (SDR) baseband processor design for the Internet of Things (IoT) and introduces several architectural optimizations to this design: streaming registers, variable bit width datapath, dedicated ALUs for critical kernels, and an optimized flexible reduction network.
Abstract: In this paper, we define a configurable Software Defined Radio (SDR) baseband processor design for the Internet of Things (IoT). We analyzed the fundamental algorithms in communications systems on IoT devices to enable a microarchitecture design that supports many IoT standards and custom nonstandard communications. Based on this analysis, we propose a custom SIMD execution model coupled with a scalar unit. We introduce several architectural optimizations to this design: streaming registers, variable bit width datapath, dedicated ALUs for critical kernels, and an optimized flexible reduction network. We employ voltage scaling and clock gating to further reduce the power, while more than a 100% time margin has been reserved for reliable operation in the near-threshold region. Together our architectural enhancements lead to a 71× power reduction compared to a classic general purpose SDR SIMD architecture. Our IoT SDR datapath has sub-mW power consumption based on SPICE simulation, and is placed and routed to fit within an area of 0.074 mm² in a 28 nm process. We implemented several essential elementary signal processing kernels and combined them to demonstrate two end-to-end upper bound systems, 802.15.4-OQPSK and Bluetooth Low Energy. Our full SDR baseband system consists of a configurable SIMD with a control plane MCU and memory. For comparison, the best commercial wireless transceiver consumes 23.8 mW for the entire wireless system (digital/RF/analog). We show that our digital system power is below 2 mW, in other words only 8% of the total system power. The wireless system is dominated by RF/analog power consumption, thus the price of flexibility that SDR affords is small. We believe this work is unique in demonstrating the value of baseband SDR in the low power IoT domain.

35 citations


Journal ArticleDOI
Saleh Masoodian, Arun Rao, Jiaju Ma, Kofi Odame, Eric R. Fossum
TL;DR: In this article, a pathfinder binary image sensor for exploring low power dissipation needed for future implementation of gigajot single-bit quanta image sensor (QIS) devices is presented.
Abstract: This paper presents a pathfinder binary image sensor for exploring low-power dissipation needed for future implementation of gigajot single-bit quanta image sensor (QIS) devices. Using a charge-transfer amplifier design in the readout signal chain and pseudostatic clock gating units for row and column addressing, the 1-Mpixel binary image sensor operating at 1000 frames/s dissipates only 20-mW total power consumption, including I/O pads. The gain and analog-to-digital converter stages together dissipate 2.5 pJ/b, successfully paving the way for future gigajot QIS sensor designs.

26 citations
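
The power and energy figures in the abstract above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming that "1-Mpixel" means 10^6 one-bit pixels and that every pixel is read out once per frame; the variable names are illustrative and the derived values are not stated in the source.

```python
# Assumption: "1-Mpixel" is taken as 10**6 single-bit pixels, each read once per frame.
pixels = 1_000_000
frames_per_s = 1000
bit_rate = pixels * frames_per_s            # bits per second leaving the array

total_power_w = 20e-3                       # 20 mW total, including I/O pads (from the abstract)
adc_energy_j_per_bit = 2.5e-12              # 2.5 pJ/b for gain + ADC stages (from the abstract)

# Derived quantities (not stated in the source).
total_energy_pj_per_bit = total_power_w / bit_rate * 1e12
adc_power_mw = adc_energy_j_per_bit * bit_rate * 1e3

print(f"bit rate: {bit_rate / 1e9:.1f} Gb/s")
print(f"whole-chip energy: {total_energy_pj_per_bit:.1f} pJ/b")
print(f"gain + ADC power: {adc_power_mw:.1f} mW")
```

Under these assumptions the gain and ADC stages would account for a small share of the 20-mW total, which is consistent with the abstract's emphasis on the readout chain.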


Proceedings ArticleDOI
05 Jun 2016
TL;DR: This work proposes a new approach, clock overgating, for the design of approximate circuits at the Register Transfer Level (RTL), to gate the clock signal to selected Flip-Flops in the circuit, even during execution cycles in which the circuit functionality is sensitive to their state.
Abstract: Approximate computing is an emerging paradigm to improve the efficiency of computing systems by leveraging the intrinsic resilience of applications to their computations being executed in an approximate manner. Prior efforts on approximate hardware design have largely focused on circuit-level techniques. We propose a new approach, clock overgating, for the design of approximate circuits at the Register Transfer Level (RTL). The key idea is to gate the clock signal to selected Flip-Flops (FFs) in the circuit, even during execution cycles in which the circuit functionality is sensitive to their state. This saves power in the clock tree, the FF itself and in its downstream logic, while a quality loss ensues if the erroneous FF state propagates to the circuit output. We develop a systematic methodology to identify an energy-efficient overgating configuration for any given circuit and quality constraint. Towards this end, we develop 3 key strategies (significance-based overgating, grouping FFs into overgating islands, and utilizing internal signals of the circuit as triggers for overgating) that efficiently prune the large space of possible overgating configurations. We evaluate clock overgating by designing approximate versions of 6 machine learning accelerators, and demonstrate energy benefits of 1.36x on average (and up to 1.80x) for negligible (<0.5%) loss in application quality (classification accuracy).

23 citations


Proceedings ArticleDOI
23 May 2016
TL;DR: A 256-channel TDC with a reasonably high performance is implemented in a single chip of Xilinx Kintex-7 FPGA and the results show that the time resolution of dual-channel measurements is in the range of 27.1 ps to 56.2 ps with a very low sensitivity to ambient temperature.
Abstract: One of the advantages of using a multi-phase clock embedded in field programmable gate arrays (FPGAs) to construct a time-to-digital converter (TDC) is its great multi-channel capability. However, many technical aspects limit the time resolution that can be achieved in the TDC architecture. In this paper, a series of solutions to these technical challenges are proposed and a 256-channel TDC with a reasonably high performance is implemented in a single chip of Xilinx Kintex-7 FPGA. The test results show that the time resolution of dual-channel measurements is in the range of 27.1 ps to 56.2 ps with a very low sensitivity to ambient temperature. The measurement dead time is three system clock cycles, namely 4.3 ns, which means the peak measurement throughput of the TDC can reach 233 M events per second. The improved performance will make TDCs with this architecture more applicable to a variety of applications.

23 citations
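
The dead time and peak throughput quoted in the abstract above are reciprocals of each other. A minimal sketch of that relation follows; the function name is illustrative, and the system clock period is inferred from the stated figures rather than given in the source.

```python
def peak_throughput_events_per_s(dead_time_s: float) -> float:
    """Peak event rate when at most one event is accepted per dead-time interval."""
    return 1.0 / dead_time_s


dead_time_s = 4.3e-9      # stated in the abstract
dead_time_cycles = 3      # stated in the abstract

rate = peak_throughput_events_per_s(dead_time_s)

# Inferred, not stated in the source: the implied system clock period.
clock_period_s = dead_time_s / dead_time_cycles

print(f"peak throughput: {rate / 1e6:.0f} M events/s")   # prints "peak throughput: 233 M events/s"
print(f"implied clock period: {clock_period_s * 1e9:.2f} ns")
```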


Proceedings ArticleDOI
01 Jan 2016
TL;DR: This paper proposes a new approach: splitting clock domains at a very fine level, down to the level of a handful of gates, by adopting the paradigm of stochastic computation, where signal values are encoded as random bit streams.
Abstract: Clock distribution networks (CDNs) are costly in high-performance ASICs. This paper proposes a new approach: splitting clock domains at a very fine level, down to the level of a handful of gates. Each domain is synchronized with an inexpensive clock signal, generated locally. This is possible by adopting the paradigm of stochastic computation, where signal values are encoded as random bit streams. The design method is illustrated with the synthesis of circuits for applications in signal and image processing.

23 citations


Proceedings ArticleDOI
12 Mar 2016
TL;DR: In this paper, a distributed power delivery and process variation model at functional unit granularity is developed and used to simulate supply voltage behavior in a multicore GPU system, and a variation-aware solution for dynamically reducing voltage margins is presented.
Abstract: Voltage noise and manufacturing process variation represent significant reliability challenges for modern microprocessors. Voltage noise is caused by rapid changes in processor activity that can lead to timing violations and errors. Process variation is caused by manufacturing challenges in low-nanometer technologies and can lead to significant heterogeneity in performance and reliability across the chip. To ensure correct execution under worst-case conditions, chip designers generally add operating margins that are often unnecessarily conservative for most use cases, which results in wasted energy. This paper investigates the combined effects of process variation and voltage noise on modern GPU architectures. A distributed power delivery and process variation model at functional unit granularity was developed and used to simulate supply voltage behavior in a multicore GPU system. We observed that, just like in CPUs, large changes in power demand can lead to significant voltage droops. We also note that process variation makes some cores much more vulnerable to noise than others in the same GPU. Therefore, protecting the chip against large voltage droops by using fixed and uniform voltage guardbands is costly and inefficient. This paper presents core tunneling, a variation-aware solution for dynamically reducing voltage margins. The system relies on hardware critical path monitors to detect voltage noise conditions and quickly reacts by clock-gating vulnerable cores to prevent timing violations. This allows a substantial reduction in voltage margins. Since clock gating is enabled infrequently and only on the most vulnerable cores, the performance impact of core tunneling is very low. On average, core tunneling reduces energy consumption by 15%.

23 citations


Journal ArticleDOI
TL;DR: In this paper, a novel high-performance, low-cost, and fully SEDU-immune latch, referred to as HSMUF, is presented to tolerate SEDUs when any arbitrary pair of nodes is affected by a particle strike.
Abstract: With technology scaling, soft errors due to radiation-induced single event double-upsets (SEDUs), which affect two nodes through charge sharing, have become a prominent concern in nanoscale CMOS technology. Existing hardened schemes are either not fully SEDU-immune or incur excessive cost penalties in propagation delay, silicon area, and power dissipation. A novel high-performance, low-cost, and fully SEDU-immune latch, referred to as HSMUF, is presented to tolerate an SEDU when any arbitrary pair of nodes is affected by a particle strike. The latch mainly consists of a clock gating-based triple path DICE and a multiple-input Muller C-element. Simulation results demonstrate the SEDU-immunity and a 99.73% area–power–delay product saving for the HSMUF latch, compared with the SEDU fully immune DNCS-SEUT latch.

21 citations


Journal ArticleDOI
TL;DR: A novel D-flip-flop cell that maximizes power savings by enabling low-voltage/swing operation throughout the entire clock network and a novel clock tree synthesis algorithm to ensure that the same timing constraints are satisfied.
Abstract: A low-voltage/swing clocking methodology is developed through both circuit and algorithmic innovations. The primary objective is to significantly reduce the power consumed by the clock network while maintaining the circuit performance the same. The methodology consists of two primary components: 1) a novel D-flip-flop (DFF) cell that maximizes power savings by enabling low-voltage/swing operation throughout the entire clock network and 2) a novel clock tree synthesis algorithm to ensure that the same timing constraints (i.e., clock frequency, skew, and slew) are satisfied. The proposed methodology is integrated within an industrial design flow. Experimental results on ISCAS’89 benchmark circuits demonstrate that the overall power consumed by the clock tree can be reduced by up to 27% and 44% in, respectively, 32- and 45-nm technologies while satisfying the same timing constraints. Furthermore, the proposed low-swing DFF cell maintains the clock-to-Q delay the same while achieving up to 32% and 15% power savings in the overall flip-flop power of the benchmark circuits at, respectively, 1- and 1.5-GHz clock frequencies.

20 citations


Journal ArticleDOI
TL;DR: In this article, two types of clock networks including clock mesh and a buffered clock tree in a daisy-chain style were utilized to synchronize 5 DFF chains and fabricated in a 28-nm bulk CMOS technology.
Abstract: Two types of clock networks including clock mesh and a buffered clock tree in a daisy-chain style were utilized to synchronize 5 DFF chains and fabricated in a 28 nm bulk CMOS technology. Alpha and proton particles did not trigger any errors indicating the significant single event tolerance of these clock networks. Heavy ion results for the data input pattern of checkerboard (alternate 1 and 0) are presented showing few occurrences of burst errors induced by single event transients (SETs) in the buffered clock tree at relatively high LET values. The same phenomena were observed in laser tests. Clock mesh is therefore proven to be less sensitive to SETs, if pre-mesh drivers do not generate transients. Otherwise, clock mesh possesses lower tolerance, as demonstrated in previous work. Moreover, these burst errors occurred (1) simultaneously in a DFF chain and its subsequent chains, or (2) in a single chain with subsequent chains unaffected. The distinct mechanisms of these burst errors were found to be the electrical masking effect of the daisy-chain clock buffers.

Proceedings ArticleDOI
27 Jun 2016
TL;DR: A novel AES microarchitecture with 32-bit datapath optimized for low-power and low-energy consumption targeting IoT applications and uses simple shift registers for key/data storage and permutation to minimize the area, and the power/energy consumption.
Abstract: In this paper, we propose a novel AES microarchitecture with 32-bit datapath optimized for low-power and low-energy consumption targeting IoT applications. The proposed design uses simple shift registers for key/data storage and permutation to minimize the area, and the power/energy consumption. These shift registers also minimize the control logics in the key expansion and the encryption path. The proposed architecture is further optimized for area and/or power/energy consumption by selecting a suitable implementation of S-boxes and applying the clock gating technique. The implementation results in TSMC 65nm technology show that our design can save 20% of area or 20% of energy per bit at the same area when compared with the current 32-bit datapath designs. Our design also occupies smaller core area with lower energy per bit and at least 4 times higher in throughput in comparison with other 8-bit designs in the same technology node.

Journal ArticleDOI
TL;DR: The proposed “virtual clock” system improves the efficiency of circuits layout, substantially reducing interconnections overhead and boosting the reliability of the majority voter, and allows to globally reduce dynamic power consumption by considerably shrinking circuits area.
Abstract: Among emerging technologies nanomagnet logic (NML) has recently received particular attention. NML uses magnets as constitutive elements, and this leads to logic circuits where there is no need of an external power supply to maintain their logic state. As a consequence, a system with intrinsic memory and zero stand-by power consumption can be envisioned. Despite the interesting nature of NML, a fundamental open problem still calls for a solution that could really boost the NML technology: the clock system. It constrains the layout of circuits and leads to a potentially high dynamic power consumption if not carefully conceived. The first clock system developed was based on the generation of a magnetic field through an on-chip current. After that other types of NML, based on several different types of clock systems, were proposed to improve clocking. We present here our proposal for a new clock delivery method. We named this system “virtual clock.” It offers several important advantages over previous solutions. First, it notably simplifies the clock generation network, reducing the complexity of the fabrication process. It improves the efficiency of circuits layout, substantially reducing interconnections overhead and boosting the reliability of the majority voter. It enables the fabrication of in-plane NML circuits with two layers, while they were confined to one single layer up to now. Finally, it allows to globally reduce dynamic power consumption by considerably shrinking circuits area. Overall the “virtual clock” system that we propose represents an important step forward in the development of the NML technology.

Proceedings ArticleDOI
01 Sep 2016
TL;DR: A novel latch design is proposed in which all internal and external nodes are capable of recovering the previous value after a single or double node upset, which offers higher speed, lower power consumption and lower area requirements compared to all existing DNU tolerant latches.
Abstract: Due to technology scaling, radiation-induced errors which cause a double node upset (DNU) have become more common in data storage elements. All current designs either suffer from high area and performance overhead or are vulnerable to an error after a DNU, thus making them unsuitable for clock gating. A novel latch design is proposed in which all internal and external nodes are capable of recovering the previous value after a single or double node upset. The proposed latch offers higher speed, lower power consumption and lower area requirements compared to all existing DNU tolerant latches capable of recovering all nodes.

Journal ArticleDOI
TL;DR: An FPGA architecture that enables dynamically controlled power gating, in which FPGA resources can be selectively powered down at run-time, is presented; it could lead to significant overall energy savings for applications having modules with long idle times.
Abstract: Leakage power is an important component of the total power consumption in field-programmable gate arrays (FPGAs) built using 90-nm and smaller technology nodes. Power gating was shown to be effective at reducing the leakage power. Previous techniques focus on turning OFF unused FPGA resources at configuration time; the benefit of this approach depends on resource utilization. In this paper, we present an FPGA architecture that enables dynamically controlled power gating, in which FPGA resources can be selectively powered down at run-time. This could lead to significant overall energy savings for applications having modules with long idle times. We also present a CAD flow that can be used to map applications to the proposed architecture. We study the area and power tradeoffs by varying the different FPGA architecture parameters and power gating granularity. The proposed CAD flow is used to map a set of benchmark circuits that have multiple power-gated modules to the proposed architecture. Power savings of up to 83% are achievable for these circuits. Finally, we study a control system of a robot that is used in endoscopy. Using the proposed architecture combined with clock gating results in up to 19% energy savings in this application.

Proceedings ArticleDOI
15 Jun 2016
TL;DR: Leveraging channel hardening in massive MIMO, a symbol hardening technique is proposed to reduce MPD's complexity by more than 60% with minimal SNR loss.
Abstract: A 0.58 mm² 40 nm CMOS message-passing detector (MPD) is designed for a 256-QAM massive MIMO system supporting 32 concurrent mobile users in each time-frequency resource. Leveraging channel hardening in massive MIMO, a symbol hardening technique is proposed to reduce MPD's complexity by more than 60% with minimal SNR loss. The MPD is implemented in a 4-layer 2-way interleaved architecture to enable a 2.76 Gb/s throughput (average 4.9 iterations at 27 dB SNR with early termination) using 76% smaller area than a fully parallel architecture. With dynamic precision control and clock gating to exploit algorithmic properties, the energy is reduced to 79.8 pJ/b (or 2.49 pJ/b per TX antenna).
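
The per-antenna energy figure in the abstract above follows from dividing the per-bit energy by the number of transmitters. A minimal sketch, assuming one TX antenna per mobile user (the abstract gives 32 users but does not spell out the antenna count); variable names are illustrative.

```python
energy_pj_per_bit = 79.8   # stated in the abstract
users = 32                 # stated in the abstract
tx_antennas = users        # assumption: one TX antenna per user

energy_pj_per_bit_per_antenna = energy_pj_per_bit / tx_antennas
print(f"{energy_pj_per_bit_per_antenna:.2f} pJ/b per TX antenna")   # prints "2.49 pJ/b per TX antenna"
```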

Journal ArticleDOI
TL;DR: A phase tracking procedure is proposed to enable the ADCDR to track the phase drift due to the voltage and temperature variations, and the proposed all-digital clock and data recovery (ADCDR) circuit, which is well suited for today's CMOS process scaling, enables the receiver to achieve low power and area consumption.
Abstract: This paper presents a robust energy/area-efficient receiver fabricated in a 28-nm CMOS process. The receiver consists of eight data lanes plus one forwarded-clock lane supporting the hypertransport standard for high-density chip-to-chip links. The proposed all-digital clock and data recovery (ADCDR) circuit, which is well suited for today’s CMOS process scaling, enables the receiver to achieve low power and area consumption. The ADCDR can enter into open loop after lock-in to save power and avoid the clock dithering phenomenon. Moreover, to compensate for the open loop, a phase tracking procedure is proposed to enable the ADCDR to track the phase drift due to the voltage and temperature variations. Furthermore, the all-digital delay-locked loop circuit integrated in the ADCDR can generate accurate multiphase clocks with the proposed calibrated locking algorithm in the presence of process variations. The precise multiphase clocks are essential for the half-rate sampling and Alexander-type phase detecting. Measurement results show that the receiver can operate at a data rate of 6.4 Gbits/s with a bit error rate […], consuming 7.5 mW per lane (1.2 pJ/bit) under a 0.85 V power supply. With ADCDR’s phase tracking, the receiver performs better in jitter tolerance and achieves a 500-kHz bandwidth, which is high enough to track the phase drift. The receiver core occupies an area of 0.02 mm$^{2}$ per lane.
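
The energy-per-bit figure quoted in the abstract above is the per-lane power divided by the per-lane data rate. A minimal sketch of that check, with illustrative variable names:

```python
power_per_lane_w = 7.5e-3      # 7.5 mW per lane (from the abstract)
data_rate_bits_per_s = 6.4e9   # 6.4 Gbits/s (from the abstract)

energy_pj_per_bit = power_per_lane_w / data_rate_bits_per_s * 1e12
print(f"{energy_pj_per_bit:.2f} pJ/bit")   # about 1.17, which the abstract rounds to 1.2
```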

Patent
25 Mar 2016
TL;DR: In this article, a semiconductor memory device is described that includes a command decoder configured to generate an auto-sync signal in response to a command for writing data at a memory cell or reading data from a memory cell, and an internal data clock generating circuit configured to phase synchronize a second clock, having a clock frequency higher than that of a first clock, with the first clock in response to the auto-sync signal.
Abstract: A semiconductor memory device includes a command decoder configured to generate an auto-sync signal in response to a command for writing data at a memory cell or reading data from a memory cell, and an internal data clock generating circuit configured to phase synchronize a second clock, having a clock frequency higher than a clock frequency of a first clock, with the first clock in response to the auto-sync signal.

Proceedings ArticleDOI
01 Sep 2016
TL;DR: In this article, the authors investigated the power consumption overhead of the dynamic partial reconfiguration (DPR) process using a high-speed digital oscilloscope and the shunt resistor method.
Abstract: In the context of embedded systems design, two important challenges are still under investigation. The first is to improve the real-time data processing, reconfigurability, scalability, and self-adjusting capabilities of hardware components. The second is to reduce power consumption through low-power design techniques such as clock gating, logic gating, and dynamic partial reconfiguration (DPR). Today, several applications, e.g., cryptography, software-defined radio, or aerospace missions, exploit the benefits of DPR of programmable logic devices. DPR allows a well-defined reconfigurable FPGA region to be modified during runtime. However, it introduces an overhead in terms of power consumption and time during the reconfiguration phase. In this paper, we present an investigation of the power consumption overhead of the DPR process using a high-speed digital oscilloscope and the shunt resistor method. Results in terms of reconfiguration time and power consumption overhead for Virtex 5 FPGAs are shown.

Proceedings ArticleDOI
22 May 2016
TL;DR: In this paper, the first work for the synthesis of OCV-aware top-level activity-driven clock trees is presented, with the objective to minimize the weighted sum of the worst timing slack and the power consumption.
Abstract: Clock gating is recognized as one of the most effective techniques to reduce the dynamic power consumption. Much research effort has been devoted to building activity-driven clock trees for low power designs. On the other hand, as the feature size continues to shrink, the on-chip-variation (OCV) effect has become a serious concern, especially for the clock skew of the top-level clock tree. Based on this observation, in this paper, we present the first work for the synthesis of OCV-aware top-level activity-driven clock trees. In our approach, the clock skew variation is considered during the top-level activity-driven clock tree synthesis. Our objective is to minimize the weighted sum of the worst timing slack and the power consumption. Compared with previous works, benchmark data consistently show that our approach can greatly increase the worst timing slack with a small overhead on the power consumption.

Journal ArticleDOI
TL;DR: This paper has applied the concept of variable frequency together with CG in a 3-b up counter to demonstrate that one can construct a design where the clock can be modulated during its functional operation without any functional failure.
Abstract: The performance of a silicon chip depends on the operating voltage of the chip. The chip should be designed using proper power and ground bond pads to minimize the power supply noise. When the chip starts working from the sleep mode, the sudden rise in current inside the chip causes $L\,dI/dt$ noise. This noise reduces the power supply voltage, which in turn reduces the operating frequency of the chip. This forces chip manufacturers to sell high-performance chips at a lower operating frequency. In order to ramp this current slowly, an innovative and fundamentally new method is implemented. In this proposed method, we have increased the operating frequency inside the chip slowly (from $f_{\textrm{min}}$ to $f_{\textrm{max}}$) to control the current ramp and at the same time performed clock gating (CG) to minimize noise by suppressing the current drawn by the device. We have applied this concept of variable frequency together with CG in a 3-b up counter to demonstrate that one can construct a design where the clock can be modulated during its functional operation without any functional failure.

Proceedings ArticleDOI
15 Jun 2016
TL;DR: This work presents a 1.40 mm² 40 nm CMOS sparse neuromorphic processor that implements a two-layer convolutional restricted Boltzmann machine (CRBM) for inference and a support vector machine (SVM) classifier, and that incorporates sparse convolvers to realize sparsity-proportional workload reduction.
Abstract: Sparsity is a brain-inspired property that enables a significant reduction in workload and power dissipation of deep learning. This work presents a 1.40 mm² 40 nm CMOS sparse neuromorphic processor that implements a two-layer convolutional restricted Boltzmann machine (CRBM) for inference and a support vector machine (SVM) classifier. The processor incorporates sparse convolvers to realize sparsity-proportional workload reduction. The architecture is parallelized along a non-sparse dimension to minimize stalling. At 0.9 V and 240 MHz, the processor achieves an effective 898.2 GOPS performance, dissipating 140.9 mW. Using sparsity, we reduce the workload, datapath power consumption and area by 3.4×, 3.3× and 1.74×, respectively. The design uses latch-based memory to reduce area and dynamic clock gating to save power.

Patent
10 Jun 2016
TL;DR: In this article, a device including a first core, a master clock core, and a clock gating circuit is described; the master clock core generates a master clock signal, and the clock gating circuit gates that signal in response to a stall signal from the first core.
Abstract: A device may include a first core, a master clock core, and a clock gating circuit. The master clock core may generate a master clock signal. The clock gating circuit may clock gate the master clock signal in response to a stall signal from the first core.
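
The behavior described in the abstract above can be illustrated with a small cycle-level model. This is a sketch only: the patent abstract does not describe the internals of the clock gating circuit, so the latch-based gating cell below (enable captured while the clock is low, then ANDed with the clock) is an assumed textbook structure, and the function and signal names are hypothetical.

```python
def gate_clock(clk_samples, enable_samples):
    """Behavioral model of a latch-based clock gating cell (assumed structure).

    The enable is captured only while the clock is low, so the gated clock
    never changes in the middle of a high phase (no truncated pulses).
    """
    latched_enable = 0
    gated = []
    for clk, enable in zip(clk_samples, enable_samples):
        if clk == 0:
            latched_enable = enable      # latch is transparent while the clock is low
        gated.append(clk & latched_enable)
    return gated


# Hypothetical stimulus: the first core raises `stall` for two half-cycles.
master_clock = [0, 1, 0, 1, 0, 1, 0, 1]
stall        = [0, 0, 0, 0, 1, 1, 0, 0]
enable       = [1 - s for s in stall]    # clock runs only when the core is not stalled

gated = gate_clock(master_clock, enable)
print(gated)   # [0, 1, 0, 1, 0, 0, 0, 1]
```

The third master clock pulse is suppressed while the stall is asserted, and the clock resumes cleanly on the next full cycle.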

Proceedings ArticleDOI
10 Apr 2016
TL;DR: DualSync is proposed, a synchronization approach for low-power wireless networks under dynamic working condition that maintains an accurate clock model to closely trace the relationship between clock skew and the influencing factors and incorporates an error-driven mechanism to facilitate interplay between Inter-Sync and Self-Sync so as to preserve high synchronization accuracy while minimizing communication cost.
Abstract: The low-cost crystal oscillators embedded in wireless sensor nodes are prone to be affected by their working condition, leading to undesired variation of clock skew. To preserve synchronized clocks, nodes have to undergo frequent re-synchronization to cope with the time-varying clock skew, which in turn means excessive energy consumption. In this paper, we propose DualSync, a synchronization approach for low-power wireless networks under dynamic working condition. By utilizing time-stamp exchanges and local measurement of temperature and voltage, DualSync maintains an accurate clock model to closely trace the relationship between clock skew and the influencing factors. We further incorporate an error-driven mechanism to facilitate interplay between Inter-Sync and Self-Sync, so as to preserve high synchronization accuracy while minimizing communication cost. We evaluate the performance of DualSync across various scenarios and compare it with state-of-art approaches. The experimental results illustrate the superior performance of DualSync in terms of both accuracy and energy efficiency.

Proceedings ArticleDOI
01 Oct 2016
TL;DR: A new clock gating technique incorporating a Leakage Control Transistor is presented; the proposed gating logic shows a reduction in power, delay and latency and is reported to outperform existing works.
Abstract: The continuously growing demand for portable battery-powered electronic devices calls for nanoelectronic circuit designs for ultra-low-power applications that reduce dynamic power, static power and short-circuit power. In the sequential circuit elements of an IC, a notable amount of power is dissipated by the rapid switching of high-frequency clock signals that carry no data or information. This needless clock switching during the HOLD phase of either ‘logic 1’ or ‘logic 0’ can be eliminated using a gated clock. In this paper, we present a new clock gating technique incorporating a Leakage Control Transistor. The technique is employed to trigger a D flip-flop using 90nm PTM technology at a 1.1V power supply. We observe a reduction in power, delay and latency using the proposed gating logic, which outperforms existing works. The simulation is also performed at smaller technology nodes (65nm, 45nm and 32nm) to observe the change in delay, dynamic power and static power of the circuit.

Patent
09 Aug 2016
TL;DR: In this article, a method for clock data recovery circuit calibration is described, which includes configuring a first clock recovery circuit to provide a clock signal that has a first frequency and that includes a single pulse for each symbol transmitted on a 3-wire, 3-phase interface.
Abstract: Methods, apparatus, and systems for clock calibration are disclosed. A method for clock data recovery circuit calibration includes configuring a first clock recovery circuit to provide a clock signal that has a first frequency and that includes a single pulse for each symbol transmitted on a 3-wire, 3-phase interface, and calibrating the first clock recovery circuit by incrementally increasing a delay period provided by a delay element of the first clock recovery circuit until the clock signal provided by the first clock recovery circuit has a frequency that is less than the first frequency and, when the first clock recovery circuit has a frequency that is less than the first frequency, incrementally decreasing the delay period provided by the delay element of the first clock recovery circuit until the clock signal provided by the first clock recovery circuit has a frequency that matches the first frequency.
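
The two-phase calibration described in this abstract (lengthen the delay until the recovered clock drops below the expected frequency, then shorten it until the frequency matches again) can be sketched as a simple search loop. This is a minimal illustration, not the patented implementation: `calibrate_delay`, `measure_frequency`, `toy_frequency`, the step size, and the threshold model (the recovered clock halves once the delay exceeds a limit) are all assumptions introduced here.

```python
def calibrate_delay(measure_frequency, target_frequency, step=1, max_delay=1000):
    """Search for a delay setting using the two-phase procedure (sketch).

    measure_frequency is a hypothetical callable that returns the recovered
    clock frequency for a given delay setting.
    """
    delay = 0

    # Phase 1: increase the delay until the recovered clock is too slow.
    while measure_frequency(delay) >= target_frequency:
        delay += step
        if delay > max_delay:
            raise RuntimeError("frequency never dropped below the target")

    # Phase 2: decrease the delay until the frequency matches the target again.
    while measure_frequency(delay) != target_frequency:
        delay -= step
        if delay < 0:
            raise RuntimeError("frequency never returned to the target")

    return delay


def toy_frequency(delay, target=1000, limit=7):
    """Toy model (assumption): one pulse per symbol up to `limit`, half beyond."""
    return target if delay <= limit else target // 2


print(calibrate_delay(toy_frequency, 1000))  # 7
```

With the toy model, phase 1 stops at a delay of 8 (the first setting that loses pulses) and phase 2 steps back to 7, the largest delay that still yields one pulse per symbol.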

Patent
29 Sep 2016
TL;DR: In this paper, an upstream server of a transport network generates clock difference data indicating a time difference between a server clock signal and a client clock signal, which have an asynchronous timing relationship with respect to one another.
Abstract: Apparatus and methods for asynchronous clock mapping are provided herein. In certain configurations, an upstream server of a transport network generates clock difference data indicating a time difference between a server clock signal and a client clock signal, which have an asynchronous timing relationship with respect to one another. The clock difference data is generated with high precision by using one or more time-to-digital converters (TDCs). The clock difference data is included in a transmitted data stream, and is used by a downstream server to recover client information with enhanced accuracy.

Proceedings ArticleDOI
16 May 2016
TL;DR: A way to model in advance power and clock gating costs on the basis of the functional, technological and architectural parameters of the baseline CGR system is proposed, saving designer effort and time.
Abstract: Power reduction is one of the biggest challenges in modern systems and tends to become a severe issue when dealing with complex scenarios. To provide high performance and flexibility, designers often opt for coarse-grained reconfigurable (CGR) systems. Nevertheless, these systems require specific attention to the power problem, since a large set of resources may be underutilized while computing a certain task. This paper focuses on this issue. Targeting CGR devices, we propose a way to model in advance power and clock gating costs on the basis of the functional, technological and architectural parameters of the baseline CGR system. The proposed flow guides designers towards optimal implementations, saving designer effort and time.

Proceedings ArticleDOI
21 Feb 2016
TL;DR: It is shown how this capability to generate customized clock trees can provide better performance through reduced clock loss while maintaining the ability to handle the large number of clock domains that modern systems require.
Abstract: We present the clock architecture of the Stratix 10 FPGA, which uses a routable clock network rather than the fixed clock networks of previous generations. We describe the flexibility provided by this routable clock network and how arbitrarily sized clock trees can be synthesized and placed anywhere on the FPGA. We show how this capability to generate customized clock trees can provide better performance through reduced clock loss while maintaining the ability to handle the large number of clock domains that modern systems require. We experimentally demonstrate how a routable clock tree reduces the clock loss of the user design implementation by up to 6% of clock insertion delay.

Proceedings ArticleDOI
23 Mar 2016
TL;DR: The proposed ADPLL uses a power-optimized digital loop filter instead of the conventional one; it is implemented in Verilog HDL and synthesized with Cadence RTL Compiler in gpdk 45 nm technology.
Abstract: This paper presents a power-efficient design of an All Digital Phase Locked Loop (ADPLL). The proposed ADPLL uses a power-optimized digital loop filter instead of the conventional one. The power optimization of the digital loop filter is carried out with the aid of a clock gating technique without degrading the performance of the overall system. The proposed architecture is implemented in Verilog HDL and synthesized with Cadence RTL Compiler in gpdk 45 nm technology. To validate its functionality, verification and simulation are done using the Cadence IES (Incisive Enterprise Simulator) tool. The power consumption of this ADPLL is 0.704 µW at a center frequency of 625 kHz. The total chip area is 207 µm².