Showing papers on "Clock gating" published in 2016

Open Access

Journal Article•DOI•

Argo: A Real-Time Network-on-Chip Architecture With an Efficient GALS Implementation

Evangelia Kasapaki¹, Martin Schoeberl¹, Rasmus Bo Sørensen¹, Christoph Thomas Muller¹, Kees Goossens², Jens Sparsø¹•Institutions (2)

Technical University of Denmark¹, Eindhoven University of Technology²

01 Feb 2016-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: An area-efficient, globally asynchronous, locally synchronous network-on-chip (NoC) architecture for a hard real-time multiprocessor platform that uses statically scheduled time-division multiplexing (TDM) to control the communication over a structure of routers, links, and network interfaces (NIs).

Abstract: In this paper, we present an area-efficient, globally asynchronous, locally synchronous network-on-chip (NoC) architecture for a hard real-time multiprocessor platform. The NoC implements message-passing communication between processor cores. It uses statically scheduled time-division multiplexing (TDM) to control the communication over a structure of routers, links, and network interfaces (NIs) to offer real-time guarantees. The area-efficient design is a result of two contributions: 1) asynchronous routers combined with TDM scheduling and 2) a novel NI microarchitecture. Together they result in a design in which data are transferred in a pipelined fashion, from the local memory of the sending core to the local memory of the receiving core, without any dynamic arbitration, buffering, or clock synchronization. The routers use two-phase bundled-data handshake latches based on the Mousetrap latch controller and are extended with a clock gating mechanism to reduce the energy consumption. The NIs integrate the direct memory access functionality and the TDM schedule, and use dual-ported local memories to avoid buffering, flow control, and synchronization. To verify the design, we have implemented a $4 \times 4$ bitorus NoC in 65-nm CMOS technology and we present results on area, speed, and energy consumption for the router, NI, NoC, and postlayout.

89 citations

Journal Article•DOI•

All-Digital Low-Dropout Regulator With Adaptive Control and Reduced Dynamic Stability for Digital Load Circuits

Saad Bin Nasir¹, Samantak Gangopadhyay¹, Arijit Raychowdhury¹•Institutions (1)

Georgia Institute of Technology¹

19 Jan 2016-IEEE Transactions on Power Electronics

TL;DR: In this paper, a scan-programmable LDO macro in a low-power 0.13-μm technology, operating down to 1.07× the transistor $V_{\rm TH}$ and featuring greater than 90% current efficiency across a 50× current range, is presented.

Abstract: Digitally implementable LDOs embedded within digital functional units augment their analog counterparts for ultrafine-grained power management in digital ICs. Digital load circuits represent load currents with large and infrequent current transients and require a wide voltage range of operation, preferably down to the threshold voltage ($V_{\rm TH}$) of the transistor. This paper presents a discrete-time, fully digital, scan-programmable LDO macro in a low-power 0.13-μm technology operating down to 1.07× the transistor $V_{\rm TH}$ and featuring greater than 90% current efficiency across a 50× current range through fine-grained clock gating and adaptive control. An 8× improvement in transient response time to large load steps is achieved through switched mode control. Both transient and steady-state operation models and measurements of the LDO are presented.

86 citations

Proceedings Article•DOI•

A low power software-defined-radio baseband processor for the Internet of Things

Yajing Chen¹, Shengshuo Lu¹, Hun-Seok Kim¹, David Blaauw¹, Ronald G. Dreslinski¹, Trevor Mudge¹•Institutions (1)

University of Michigan¹

12 Mar 2016

TL;DR: This work defines a configurable Software Defined Radio (SDR) baseband processor design for the Internet of Things (IoT) and introduces several architectural optimizations to this design: streaming registers, variable bit width datapath, dedicated ALUs for critical kernels, and an optimized flexible reduction network.

Abstract: In this paper, we define a configurable Software Defined Radio (SDR) baseband processor design for the Internet of Things (IoT). We analyzed the fundamental algorithms in communications systems on IoT devices to enable a microarchitecture design that supports many IoT standards and custom nonstandard communications. Based on this analysis, we propose a custom SIMD execution model coupled with a scalar unit. We introduce several architectural optimizations to this design: streaming registers, variable bit width datapath, dedicated ALUs for critical kernels, and an optimized flexible reduction network. We employ voltage scaling and clock gating to further reduce the power, while more than a 100% time margin has been reserved for reliable operation in the near-threshold region. Together our architectural enhancements lead to a 71× power reduction compared to a classic general purpose SDR SIMD architecture. Our IoT SDR datapath has sub-mW power consumption based on SPICE simulation, and is placed and routed to fit within an area of 0.074 mm² in a 28 nm process. We implemented several essential elementary signal processing kernels and combined them to demonstrate two end-to-end upper bound systems, 802.15.4-OQPSK and Bluetooth Low Energy. Our full SDR baseband system consists of a configurable SIMD with a control plane MCU and memory. For comparison, the best commercial wireless transceiver consumes 23.8 mW for the entire wireless system (digital/RF/analog). We show that our digital system power is below 2 mW, in other words only 8% of the total system power. The wireless system is dominated by RF/analog power consumption, thus the price of flexibility that SDR affords is small. We believe this work is unique in demonstrating the value of baseband SDR in the low power IoT domain.
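
The 8% figure quoted in this abstract can be checked with a short calculation. The sketch below is only an arithmetic illustration: it takes the 2 mW upper bound and the 23.8 mW commercial transceiver figure from the abstract, and the variable names are illustrative rather than anything the authors define.

```python
# Figures quoted in the abstract; variable names are illustrative.
digital_power_mw = 2.0        # upper bound on the SDR digital system power
total_system_power_mw = 23.8  # commercial transceiver, digital + RF + analog

# Share of the reference system budget taken by the digital SDR baseband.
digital_share = digital_power_mw / total_system_power_mw

print(f"Digital share of total system power: {digital_share:.1%}")
```

Because 2 mW is quoted as an upper bound, the computed share is an upper bound as well.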

35 citations

Journal Article•DOI•

A 2.5 pJ/b Binary Image Sensor as a Pathfinder for Quanta Image Sensors

Saleh Masoodian¹, Arun Rao¹, Jiaju Ma¹, Kofi Odame¹, Eric R. Fossum¹•Institutions (1)

Dartmouth College¹

01 Jan 2016-IEEE Transactions on Electron Devices

TL;DR: In this article, a pathfinder binary image sensor for exploring low power dissipation needed for future implementation of gigajot single-bit quanta image sensor (QIS) devices is presented.

Abstract: This paper presents a pathfinder binary image sensor for exploring low-power dissipation needed for future implementation of gigajot single-bit quanta image sensor (QIS) devices. Using a charge-transfer amplifier design in the readout signal chain and pseudostatic clock gating units for row and column addressing, the 1-Mpixel binary image sensor operating at 1000 frames/s dissipates only 20-mW total power consumption, including I/O pads. The gain and analog-to-digital converter stages together dissipate 2.5 pJ/b, successfully paving the way for future gigajot QIS sensor designs.
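
As a rough consistency check on these numbers, the sketch below converts the quoted 2.5 pJ/b into a power figure. It assumes that "1-Mpixel" means exactly 10^6 pixels and that each pixel yields one bit per frame; both are assumptions made for illustration, and the variable names are not from the paper.

```python
# Assumptions (not stated in the abstract): 1 Mpixel = 1e6 pixels,
# one bit per pixel per frame.
pixels = 1_000_000
frames_per_s = 1000
bits_per_pixel = 1

energy_per_bit_j = 2.5e-12  # gain + ADC stages, from the abstract
total_power_w = 20e-3       # total chip power including I/O pads

bit_rate = pixels * frames_per_s * bits_per_pixel  # bits per second
readout_power_w = bit_rate * energy_per_bit_j      # gain + ADC power

print(f"Bit rate: {bit_rate / 1e9:.1f} Gb/s")
print(f"Gain + ADC power: {readout_power_w * 1e3:.2f} mW")
print(f"Share of total power: {readout_power_w / total_power_w:.1%}")
```

Under these assumptions the gain and ADC stages account for a small part of the 20 mW total, with the remainder going to the other blocks and the I/O pads.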

26 citations

Proceedings Article•DOI•

Designing approximate circuits using clock overgating

Younghoon Kim¹, Swagath Venkataramani¹, Kaushik Roy¹, Anand Raghunathan¹•Institutions (1)

Purdue University¹

05 Jun 2016

TL;DR: This work proposes a new approach, clock overgating, for the design of approximate circuits at the Register Transfer Level (RTL), to gate the clock signal to selected Flip-Flops in the circuit, even during execution cycles in which the circuit functionality is sensitive to their state.

Abstract: Approximate computing is an emerging paradigm to improve the efficiency of computing systems by leveraging the intrinsic resilience of applications to their computations being executed in an approximate manner. Prior efforts on approximate hardware design have largely focused on circuit-level techniques. We propose a new approach, clock overgating, for the design of approximate circuits at the Register Transfer Level (RTL). The key idea is to gate the clock signal to selected Flip-Flops (FFs) in the circuit, even during execution cycles in which the circuit functionality is sensitive to their state. This saves power in the clock tree, the FF itself and in its downstream logic, while a quality loss ensues if the erroneous FF state propagates to the circuit output. We develop a systematic methodology to identify an energy-efficient overgating configuration for any given circuit and quality constraint. Towards this end, we develop 3 key strategies (significance-based overgating, grouping FFs into overgating islands, and utilizing internal signals of the circuit as triggers for overgating) that efficiently prune the large space of possible overgating configurations. We evaluate clock overgating by designing approximate versions of 6 machine learning accelerators, and demonstrate energy benefits of 1.36x on average (and up to 1.80x) for negligible (<0.5%) loss in application quality (classification accuracy).

23 citations

Proceedings Article•DOI•

A 256-channel multi-phase clock sampling-based time-to-digital converter implemented in a Kintex-7 FPGA

Yonggang Wang¹, Peng Kuang¹, Chong Liu¹•Institutions (1)

University of Science and Technology of China¹

23 May 2016

TL;DR: A 256-channel TDC with a reasonably high performance is implemented in a single chip of Xilinx Kintex-7 FPGA and the results show that the time resolution of dual-channel measurements is in the range of 27.1 ps to 56.2 ps with a very low sensitivity to ambient temperature.

Abstract: One of the advantages of using a multi-phase clock embedded in field programmable gate arrays (FPGAs) to construct a time-to-digital converter (TDC) is its great multi-channel capability. However, many technical aspects limit the time resolution that can be achieved in the TDC architecture. In this paper, a series of solutions to these technical challenges are proposed and a 256-channel TDC with a reasonably high performance is implemented in a single chip of Xilinx Kintex-7 FPGA. The test results show that the time resolution of dual-channel measurements is in the range of 27.1 ps to 56.2 ps with a very low sensitivity to ambient temperature. The measurement dead time is three system clock cycles, namely 4.3 ns, which means the peak measurement throughput of the TDC can reach 233 M events per second. The improved performance will make TDCs with this architecture more applicable to a variety of applications.
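
The throughput figure follows from the dead time: if a channel can accept at most one event per dead-time interval, the peak rate is the reciprocal of that interval. A minimal sketch of the calculation, using only the 4.3 ns value from the abstract (the function name is illustrative):

```python
def peak_throughput_events_per_s(dead_time_s: float) -> float:
    """Peak event rate when at most one event fits in each dead-time window."""
    return 1.0 / dead_time_s


dead_time_s = 4.3e-9  # three system clock cycles, as quoted in the abstract
throughput = peak_throughput_events_per_s(dead_time_s)

print(f"Peak throughput: {throughput / 1e6:.1f} M events/s")
```

The result rounds to the 233 M events per second stated in the abstract.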

23 citations

Proceedings Article•DOI•

Polysynchronous stochastic circuits

M. Hassan Najafi¹, David J. Lilja¹, Marc D. Riedel¹, Kia Bazargan¹•Institutions (1)

University of Minnesota¹

01 Jan 2016

TL;DR: This paper proposes a new approach: splitting clock domains at a very fine level, down to the level of a handful of gates, by adopting the paradigm of stochastic computation, where signal values are encoded as random bit streams.

Abstract: Clock distribution networks (CDNs) are costly in high-performance ASICs. This paper proposes a new approach: splitting clock domains at a very fine level, down to the level of a handful of gates. Each domain is synchronized with an inexpensive clock signal, generated locally. This is possible by adopting the paradigm of stochastic computation, where signal values are encoded as random bit streams. The design method is illustrated with the synthesis of circuits for applications in signal and image processing.

23 citations

Proceedings Article•DOI•

Core tunneling: Variation-aware voltage noise mitigation in GPUs

Renji Thomas¹, Kristin Barber¹, Naser Sedaghati¹, Li Zhou¹, Radu Teodorescu¹•Institutions (1)

Ohio State University¹

12 Mar 2016

TL;DR: In this paper, a distributed power delivery and process variation model at functional unit granularity is developed and used to simulate supply voltage behavior in a multicore GPU system, and a variation-aware solution for dynamically reducing voltage margins is presented.

Abstract: Voltage noise and manufacturing process variation represent significant reliability challenges for modern microprocessors. Voltage noise is caused by rapid changes in processor activity that can lead to timing violations and errors. Process variation is caused by manufacturing challenges in low-nanometer technologies and can lead to significant heterogeneity in performance and reliability across the chip. To ensure correct execution under worst-case conditions, chip designers generally add operating margins that are often unnecessarily conservative for most use cases, which results in wasted energy. This paper investigates the combined effects of process variation and voltage noise on modern GPU architectures. A distributed power delivery and process variation model at functional unit granularity was developed and used to simulate supply voltage behavior in a multicore GPU system. We observed that, just like in CPUs, large changes in power demand can lead to significant voltage droops. We also note that process variation makes some cores much more vulnerable to noise than others in the same GPU. Therefore, protecting the chip against large voltage droops by using fixed and uniform voltage guardbands is costly and inefficient. This paper presents core tunneling, a variation-aware solution for dynamically reducing voltage margins. The system relies on hardware critical path monitors to detect voltage noise conditions and quickly reacts by clock-gating vulnerable cores to prevent timing violations. This allows a substantial reduction in voltage margins. Since clock gating is enabled infrequently and only on the most vulnerable cores, the performance impact of core tunneling is very low. On average, core tunneling reduces energy consumption by 15%.

23 citations

Journal Article•DOI•

High-performance, low-cost, and highly reliable radiation hardened latch design

Aibin Yan¹, Huaguo Liang, Zhengfeng Huang, Cuiyun Jiang¹•Institutions (1)

Hefei University¹

21 Jan 2016-Electronics Letters

TL;DR: In this paper, a novel high-performance, low-cost, and fully SEDU-immune latch, referred to as HSMUF, is presented to tolerate SEDUs when any arbitrary combination pair of nodes is affected by a particle striking.

Abstract: As technology scales, soft errors due to radiation-induced single event double-upsets (SEDUs), which affect two nodes through charge sharing, have become a prominent concern in nanoscale CMOS technology. Existing hardened schemes are either not fully SEDU-immune or incur excessive cost penalties in propagation delay, silicon area, and power dissipation. A novel high-performance, low-cost, and fully SEDU-immune latch, referred to as HSMUF, is presented to tolerate an SEDU when any arbitrary pair of nodes is affected by a particle strike. The latch mainly consists of a clock gating-based triple path DICE and a multiple-input Muller C-element. Simulation results demonstrate the SEDU-immunity and a 99.73% area–power–delay product saving for the HSMUF latch, compared with the fully SEDU-immune DNCS-SEUT latch.

21 citations

Journal Article•DOI•

Design Methodology for Voltage-Scaled Clock Distribution Networks

Can Sitik¹, Weicheng Liu², Baris Taskin¹, Emre Salman²•Institutions (2)

Drexel University¹, Stony Brook University²

01 Oct 2016-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A novel D-flip-flop cell that maximizes power savings by enabling low-voltage/swing operation throughout the entire clock network and a novel clock tree synthesis algorithm to ensure that the same timing constraints are satisfied.

Abstract: A low-voltage/swing clocking methodology is developed through both circuit and algorithmic innovations. The primary objective is to significantly reduce the power consumed by the clock network while maintaining the circuit performance the same. The methodology consists of two primary components: 1) a novel D-flip-flop (DFF) cell that maximizes power savings by enabling low-voltage/swing operation throughout the entire clock network and 2) a novel clock tree synthesis algorithm to ensure that the same timing constraints (i.e., clock frequency, skew, and slew) are satisfied. The proposed methodology is integrated within an industrial design flow. Experimental results on ISCAS’89 benchmark circuits demonstrate that the overall power consumed by the clock tree can be reduced by up to 27% and 44% in, respectively, 32- and 45-nm technologies while satisfying the same timing constraints. Furthermore, the proposed low-swing DFF cell maintains the clock-to-Q delay the same while achieving up to 32% and 15% power savings in the overall flip-flop power of the benchmark circuits at, respectively, 1- and 1.5-GHz clock frequencies.

20 citations

Journal Article•DOI•

Single-Event Transient Sensitivity Evaluation of Clock Networks at 28-nm CMOS Technology

Haibin Wang¹, N. N. Mahatme², L. Chen¹, M. Newton¹, Yuanqing Li¹, Rui Liu¹, Mo Chen¹, Bharat L. Bhuva³, K. Lilja, S.-J. Wen⁴, Rick Wong⁴, R. Fung⁴, Sanghyeon Baeg⁵•Institutions (5)

University of Saskatchewan¹, Freescale Semiconductor², Vanderbilt University³, Cisco Systems, Inc.⁴, Hanyang University⁵

15 Feb 2016-IEEE Transactions on Nuclear Science

TL;DR: In this article, two types of clock networks including clock mesh and a buffered clock tree in a daisy-chain style were utilized to synchronize 5 DFF chains and fabricated in a 28-nm bulk CMOS technology.

Abstract: Two types of clock networks including clock mesh and a buffered clock tree in a daisy-chain style were utilized to synchronize 5 DFF chains and fabricated in a 28 nm bulk CMOS technology. Alpha and proton particles did not trigger any errors indicating the significant single event tolerance of these clock networks. Heavy ion results for the data input pattern of checkerboard (alternate 1 and 0) are presented showing few occurrences of burst errors induced by single event transients (SETs) in the buffered clock tree at relatively high LET values. The same phenomena were observed in laser tests. Clock mesh is therefore proven to be less sensitive to SETs, if pre-mesh drivers do not generate transients. Otherwise, clock mesh possesses lower tolerance, as demonstrated in previous work. Moreover, these burst errors occurred (1) simultaneously in a DFF chain and its subsequent chains, or (2) in a single chain with subsequent chains unaffected. The distinct mechanisms of these burst errors were found to be the electrical masking effect of the daisy-chain clock buffers.

Proceedings Article•DOI•

Ultra low-power and low-energy 32-bit datapath AES architecture for IoT applications

Duy-Hieu Bui¹, Diego Puschini¹, Simone Bacles-Min¹, Edith Beigne¹, Xuan-Tu Tran²•Institutions (2)

University of Grenoble¹, University of Engineering and Technology, Lahore²

27 Jun 2016

TL;DR: A novel AES microarchitecture with 32-bit datapath optimized for low-power and low-energy consumption targeting IoT applications and uses simple shift registers for key/data storage and permutation to minimize the area, and the power/energy consumption.

Abstract: In this paper, we propose a novel AES microarchitecture with a 32-bit datapath optimized for low-power and low-energy consumption targeting IoT applications. The proposed design uses simple shift registers for key/data storage and permutation to minimize the area and the power/energy consumption. These shift registers also minimize the control logic in the key expansion and the encryption path. The proposed architecture is further optimized for area and/or power/energy consumption by selecting a suitable implementation of S-boxes and applying the clock gating technique. The implementation results in TSMC 65nm technology show that our design can save 20% of area or 20% of energy per bit at the same area when compared with the current 32-bit datapath designs. Our design also occupies a smaller core area with lower energy per bit and at least 4 times higher throughput in comparison with other 8-bit designs in the same technology node.

Journal Article•DOI•

Virtual Clocking for NanoMagnet Logic

Marco Vacca¹, Fabrizio Cairo¹, Giovanna Turvani¹, Fabrizio Riente¹, Maurizio Zamboni¹, Mariagrazia Graziano¹•Institutions (1)

Polytechnic University of Turin¹

01 Nov 2016-IEEE Transactions on Nanotechnology

TL;DR: The proposed “virtual clock” system improves the efficiency of circuit layout, substantially reducing interconnection overhead and boosting the reliability of the majority voter, and reduces dynamic power consumption globally by considerably shrinking circuit area.

Abstract: Among emerging technologies, nanomagnet logic (NML) has recently received particular attention. NML uses magnets as constitutive elements, and this leads to logic circuits that need no external power supply to maintain their logic state. As a consequence, a system with intrinsic memory and zero stand-by power consumption can be envisioned. Despite the interesting nature of NML, a fundamental open problem still calls for a solution that could really boost the NML technology: the clock system. It constrains the layout of circuits and leads to a potentially high dynamic power consumption if not carefully conceived. The first clock system developed was based on the generation of a magnetic field through an on-chip current. After that, other types of NML, based on several different types of clock systems, were proposed to improve clocking. We present here our proposal for a new clock delivery method. We named this system “virtual clock.” It offers several important advantages over previous solutions. First, it notably simplifies the clock generation network, reducing the complexity of the fabrication process. It improves the efficiency of circuit layout, substantially reducing interconnection overhead and boosting the reliability of the majority voter. It enables the fabrication of in-plane NML circuits with two layers, while they were confined to one single layer up to now. Finally, it reduces dynamic power consumption globally by considerably shrinking circuit area. Overall, the “virtual clock” system that we propose represents an important step forward in the development of the NML technology.

Proceedings Article•DOI•

A Highly Robust Double Node Upset Tolerant latch

Adam Watkins¹, Spyros Tragoudas•Institutions (1)

Los Alamos National Laboratory¹

01 Sep 2016

TL;DR: A novel latch design is proposed in which all internal and external nodes are capable of recovering the previous value after a single or double node upset, which offers higher speed, lower power consumption and lower area requirements compared to all existing DNU tolerant latches.

Abstract: Due to technology scaling, radiation-induced errors that cause a double node upset (DNU) have become more common in data storage elements. All current designs either suffer from high area and performance overhead or are vulnerable to an error after a DNU, thus making them unsuitable for clock gating. A novel latch design is proposed in which all internal and external nodes are capable of recovering the previous value after a single or double node upset. The proposed latch offers higher speed, lower power consumption, and lower area requirements compared to all existing DNU-tolerant latches capable of recovering all nodes.

Journal Article•DOI•

An FPGA Architecture and CAD Flow Supporting Dynamically Controlled Power Gating

Assem A. M. Bsoul¹, Steven J. E. Wilton¹, Kuen Hung Tsoi², Wayne Luk²•Institutions (2)

University of British Columbia¹, Imperial College London²

01 Jan 2016-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: An FPGA architecture that enables dynamically controlled power gating, in which FPGAs resources can be selectively powered down at run-time is presented, which could lead to significant overall energy savings for applications having modules with long idle times.

Abstract: Leakage power is an important component of the total power consumption in field-programmable gate arrays (FPGAs) built using 90-nm and smaller technology nodes. Power gating was shown to be effective at reducing the leakage power. Previous techniques focus on turning OFF unused FPGA resources at configuration time; the benefit of this approach depends on resource utilization. In this paper, we present an FPGA architecture that enables dynamically controlled power gating, in which FPGA resources can be selectively powered down at run-time. This could lead to significant overall energy savings for applications having modules with long idle times. We also present a CAD flow that can be used to map applications to the proposed architecture. We study the area and power tradeoffs by varying the different FPGA architecture parameters and power gating granularity. The proposed CAD flow is used to map a set of benchmark circuits that have multiple power-gated modules to the proposed architecture. Power savings of up to 83% are achievable for these circuits. Finally, we study a control system of a robot that is used in endoscopy. Using the proposed architecture combined with clock gating results in up to 19% energy savings in this application.

Proceedings Article•DOI•

A 0.58mm² 2.76Gb/s 79.8pJ/b 256-QAM massive MIMO message-passing detector

Wei Tang¹, Chia-Hsiang Chen¹, Zhengya Zhang¹•Institutions (1)

University of Michigan¹

15 Jun 2016

TL;DR: Leveraging channel hardening in massive MIMO, a symbol hardening technique is proposed to reduce MPD's complexity by more than 60% with minimal SNR loss.

Abstract: A 0.58mm² 40nm CMOS message-passing detector (MPD) is designed for a 256-QAM massive MIMO system supporting 32 concurrent mobile users in each time-frequency resource. Leveraging channel hardening in massive MIMO, a symbol hardening technique is proposed to reduce MPD's complexity by more than 60% with minimal SNR loss. The MPD is implemented in a 4-layer 2-way interleaved architecture to enable a 2.76Gb/s throughput (average 4.9 iterations at 27dB SNR with early termination) using 76% smaller area than a fully parallel architecture. With dynamic precision control and clock gating to exploit algorithmic properties, the energy is reduced to 79.8pJ/b (or 2.49pJ/b per TX antenna).
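
The per-antenna figure is consistent with dividing the energy per bit across the 32 users, if each user is assumed to contribute one TX antenna. That one-antenna-per-user reading is an assumption made here, not something the abstract states outright. The sketch below also derives the power implied by the quoted throughput and energy per bit; the variable names are illustrative.

```python
energy_per_bit_pj = 79.8  # from the abstract
throughput_bps = 2.76e9   # from the abstract
tx_antennas = 32          # assumption: one TX antenna per concurrent user

# Energy per bit attributed to each TX antenna.
energy_per_bit_per_antenna_pj = energy_per_bit_pj / tx_antennas

# Power implied by running at the quoted throughput and energy per bit.
implied_power_w = throughput_bps * energy_per_bit_pj * 1e-12

print(f"Per-antenna energy: {energy_per_bit_per_antenna_pj:.2f} pJ/b")
print(f"Implied power: {implied_power_w * 1e3:.0f} mW")
```

The implied power is a derived quantity and may differ from the measured power reported in the paper itself.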

Journal Article•DOI•

A Robust Energy/Area-Efficient Forwarded-Clock Receiver With All-Digital Clock and Data Recovery in 28-nm CMOS for High-Density Interconnects

Shuai Chen¹, Hao Li², Patrick Chiang²•Institutions (2)

Chinese Academy of Sciences¹, Oregon State University²

01 Feb 2016-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A phase tracking procedure is proposed to enable the ADCDR to track the phase drift due to the voltage and temperature variations, and the proposed all-digital clock and data recovery (ADCDR) circuit, which is well suited for today's CMOS process scaling, enables the receiver to achieve low power and area consumption.

Abstract: This paper presents a robust energy/area-efficient receiver fabricated in a 28-nm CMOS process. The receiver consists of eight data lanes plus one forwarded-clock lane supporting the HyperTransport standard for high-density chip-to-chip links. The proposed all-digital clock and data recovery (ADCDR) circuit, which is well suited for today’s CMOS process scaling, enables the receiver to achieve low power and area consumption. The ADCDR can enter into open loop after lock-in to save power and avoid the clock dithering phenomenon. Moreover, to compensate for the open loop, a phase tracking procedure is proposed to enable the ADCDR to track the phase drift due to the voltage and temperature variations. Furthermore, the all-digital delay-locked loop circuit integrated in the ADCDR can generate accurate multiphase clocks with the proposed calibrated locking algorithm in the presence of process variations. The precise multiphase clocks are essential for the half-rate sampling and Alexander-type phase detecting. Measurement results show that the receiver can operate at a data rate of 6.4 Gbits/s with a bit error rate $ , consuming 7.5 mW per lane (1.2 pJ/bit) under a 0.85 V power supply. With ADCDR’s phase tracking, the receiver performs better in jitter tolerance and achieves a 500-kHz bandwidth, which is high enough to track the phase drift. The receiver core occupies an area of 0.02 mm$^{2}$ per lane.
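
The energy-per-bit figure in parentheses is the per-lane power divided by the per-lane data rate. A minimal sketch of that conversion, using only values quoted in the abstract (variable names are illustrative):

```python
power_per_lane_w = 7.5e-3  # 7.5 mW per lane, from the abstract
data_rate_bps = 6.4e9      # 6.4 Gbit/s per lane, from the abstract
data_lanes = 8             # eight data lanes, from the abstract

# Energy spent per received bit on one lane.
energy_per_bit_j = power_per_lane_w / data_rate_bps

# Aggregate payload rate across the data lanes.
aggregate_rate_bps = data_lanes * data_rate_bps

print(f"Energy per bit: {energy_per_bit_j * 1e12:.2f} pJ/bit")
print(f"Aggregate data rate: {aggregate_rate_bps / 1e9:.1f} Gbit/s")
```

The computed energy per bit rounds to the 1.2 pJ/bit given in the abstract.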

Patent•

Semiconductor memory device having clock generation scheme based on command

Seung-Jun Shin¹, Doo Su Yeon¹, Oh Tae Young¹•Institutions (1)

Samsung¹

25 Mar 2016

TL;DR: In this article, a command decoder is configured to generate an auto-sync signal in response to a command for writing data at memory cells or reading data from a memory cell, and an internal data clock generating circuit configured to phase synchronize a second clock, having a clock frequency higher than a first clock.

Abstract: A semiconductor memory device includes a command decoder configured to generate an auto-sync signal in response to a command for writing data at a memory cell or reading data from a memory cell, and an internal data clock generating circuit configured to phase synchronize a second clock, having a clock frequency higher than a clock frequency of a first clock, with the first clock in response to the auto-sync signal.

Proceedings Article•DOI•

Accurate measurement of power consumption overhead during FPGA dynamic partial reconfiguration

Amor Nafkha¹, Yves Louet¹•Institutions (1)

CentraleSupélec¹

01 Sep 2016

TL;DR: In this article, the authors investigated the power consumption overhead of the dynamic partial reconfiguration (DPR) process using a high-speed digital oscilloscope and the shunt resistor method.

Abstract: In the context of embedded systems design, two important challenges are still under investigation. The first is to improve the real-time data processing, reconfigurability, scalability, and self-adjusting capabilities of hardware components. The second is to reduce power consumption through low-power design techniques such as clock gating, logic gating, and dynamic partial reconfiguration (DPR). Today, several applications, e.g., cryptography, software-defined radio, or aerospace missions, exploit the benefits of DPR of programmable logic devices. DPR allows a well-defined reconfigurable FPGA region to be modified during runtime. However, it introduces an overhead in terms of power consumption and time during the reconfiguration phase. In this paper, we present an investigation of the power consumption overhead of the DPR process using a high-speed digital oscilloscope and the shunt resistor method. Results in terms of reconfiguration time and power consumption overhead for Virtex 5 FPGAs are shown.

Proceedings Article•DOI•

Top-level activity-driven clock tree synthesis with clock skew variation considered

Te-Jui Wang¹, Shih-Hsu Huang¹, Wei-Kai Cheng¹, Yih-Chih Chou²•Institutions (2)

Chung Yuan Christian University¹, Global Unichip Corporation²

22 May 2016

TL;DR: In this paper, the first work for the synthesis of OCV-aware top-level activity-driven clock trees is presented, with the objective to minimize the weighted sum of the worst timing slack and the power consumption.

Abstract: Clock gating is recognized as one of the most effective techniques to reduce dynamic power consumption. Much research effort has been devoted to building activity-driven clock trees for low-power designs. On the other hand, as the feature size continues to shrink, the on-chip-variation (OCV) effect has become a serious concern, especially for the clock skew of the top-level clock tree. Based on this observation, in this paper, we present the first work on the synthesis of OCV-aware top-level activity-driven clock trees. In our approach, the clock skew variation is considered during the top-level activity-driven clock tree synthesis. Our objective is to minimize the weighted sum of the worst timing slack and the power consumption. Compared with previous works, benchmark data consistently show that our approach can greatly increase the worst timing slack with a small overhead on the power consumption.

Journal Article•DOI•

Reduction of Noise Using Continuously Changing Variable Clock and Clock Gating for IC Chips

Suman Bhowmik¹, Debajit Deb¹, Sambhu Nath Pradhan¹, Bidyut K. Bhattacharyya¹•Institutions (1)

National Institute of Technology Agartala¹

01 Jun 2016-IEEE Transactions on Components, Packaging and Manufacturing Technology

TL;DR: This paper has applied the concept of variable frequency together with CG in a 3-b up counter to demonstrate that one can construct a design where the clock can be modulated during its functional operation without any functional failure.

Abstract: The performance of a silicon chip depends on its operating voltage. The chip should be designed with proper power and ground bond pads to minimize the power supply noise. When the chip wakes from sleep mode, the sudden rise in current inside the chip causes $L\,dI/dt$ noise. This noise reduces the power supply voltage, which in turn reduces the operating frequency of the chip. As a result, chip manufacturers are forced to sell high-performance chips at a lower operating frequency. In order to ramp this current slowly, an innovative and fundamentally new method is implemented. In the proposed method, we increase the operating frequency inside the chip slowly (from $f_{\textrm {min}}$ to $f_{\textrm {max}}$) to control the current ramp and at the same time perform clock gating (CG) to minimize noise by suppressing the current drawn by the device. We have applied this concept of variable frequency together with CG in a 3-b up counter to demonstrate that one can construct a design where the clock can be modulated during its functional operation without any functional failure.

Proceedings Article•DOI•

A 1.40mm² 141mW 898GOPS sparse neuromorphic processor in 40nm CMOS

Phil Knag¹, Chester Liu¹, Zhengya Zhang¹•Institutions (1)

University of Michigan¹

15 Jun 2016

TL;DR: This work presents a 1.40mm² 40nm CMOS sparse neuromorphic processor that implements a two-layer convolutional restricted Boltzmann machine (CRBM) for inference and a support vector machine (SVM) classifier, and that incorporates sparse convolvers to realize sparsity-proportional workload reduction.

Abstract: Sparsity is a brain-inspired property that enables a significant reduction in workload and power dissipation of deep learning. This work presents a 1.40mm² 40nm CMOS sparse neuromorphic processor that implements a two-layer convolutional restricted Boltzmann machine (CRBM) for inference and a support vector machine (SVM) classifier. The processor incorporates sparse convolvers to realize sparsity-proportional workload reduction. The architecture is parallelized along a non-sparse dimension to minimize stalling. At 0.9V and 240MHz, the processor achieves an effective 898.2GOPS performance, dissipating 140.9mW. Using sparsity, we reduce the workload, datapath power consumption and area by 3.4×, 3.3× and 1.74×, respectively. The design uses latch-based memory to reduce area and dynamic clock gating to save power.

Patent•

Apparatus including core and clock gating circuit and method of operating same

Edgar Barber¹, Ronen Golan¹, Eli Elmoalem¹•Institutions (1)

SanDisk¹

10 Jun 2016

TL;DR: A device includes a first core, a master clock core that generates a master clock signal, and a clock gating circuit that gates the master clock signal in response to a stall signal from the first core.

Abstract: A device may include a first core, a master clock core, and a clock gating circuit. The master clock core may generate a master clock signal. The clock gating circuit may clock gate the master clock signal in response to a stall signal from the first core.
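
The gating behaviour described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the abstract does not say how the gate is built, so the latch-based enable (updated only while the clock is low, a common way to avoid glitches) and the name `gate_clock` are hypothetical choices made for illustration, not the patented circuit.

```python
def gate_clock(clk, stall):
    """Return the gated clock for sampled `clk` and `stall` waveforms (lists of 0/1).

    Assumption: the enable is held in a latch that is transparent only while
    the clock is low, so a stall change during a high phase cannot cut or
    create a partial pulse.
    """
    enable = True  # latched enable; the clock passes until a stall is captured
    gated = []
    for c, s in zip(clk, stall):
        if c == 0:
            enable = not s  # latch is transparent while the clock is low
        gated.append(1 if (c and enable) else 0)
    return gated


if __name__ == "__main__":
    clk = [0, 1, 0, 1, 0, 1, 0, 1]
    stall = [0, 0, 1, 1, 0, 0, 0, 1]
    print(gate_clock(clk, stall))
```

In this toy waveform the second clock pulse is suppressed because the stall was captured during the preceding low phase, while a stall that rises during the last high phase does not truncate that pulse.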

Proceedings Article•DOI•

DualSync: Taming clock skew variation for synchronization in low-power wireless networks

Meng Jin¹, Tianzhang Xing¹, Xiaojiang Chen¹, Xin Meng¹, Dingyi Fang¹, Yuan He²•Institutions (2)

Northwest University (China)¹, Tsinghua University²

10 Apr 2016

TL;DR: DualSync is proposed, a synchronization approach for low-power wireless networks under dynamic working conditions that maintains an accurate clock model to closely trace the relationship between clock skew and the influencing factors, and that incorporates an error-driven mechanism to facilitate interplay between Inter-Sync and Self-Sync so as to preserve high synchronization accuracy while minimizing communication cost.

Abstract: The low-cost crystal oscillators embedded in wireless sensor nodes are easily affected by their working conditions, leading to undesired variation of clock skew. To preserve synchronized clocks, nodes have to undergo frequent re-synchronization to cope with the time-varying clock skew, which in turn means excessive energy consumption. In this paper, we propose DualSync, a synchronization approach for low-power wireless networks under dynamic working conditions. By utilizing time-stamp exchanges and local measurement of temperature and voltage, DualSync maintains an accurate clock model to closely trace the relationship between clock skew and the influencing factors. We further incorporate an error-driven mechanism to facilitate interplay between Inter-Sync and Self-Sync, so as to preserve high synchronization accuracy while minimizing communication cost. We evaluate the performance of DualSync across various scenarios and compare it with state-of-the-art approaches. The experimental results illustrate the superior performance of DualSync in terms of both accuracy and energy efficiency.

Proceedings Article•DOI•

A 90 nm leakage control transistor based clock gating for low power flip flop applications

Pritam Bhattacharjee, Alak Majumder, Tushar Dhabal Das¹•Institutions (1)

National Institute of Technology, Arunachal Pradesh¹

01 Oct 2016

TL;DR: A new clock gating technique incorporating a Leakage Control Transistor is presented, and a reduction in power, delay and latency relative to existing works is observed with the proposed gating logic.

Abstract: The continuously growing demand for portable battery-powered electronic devices calls for nanoelectronic circuit designs for ultra-low-power applications that reduce dynamic power, static power and short-circuit power. In the sequential circuit elements of an IC, a notable amount of power is dissipated by the rapid switching of high-frequency clock signals that do not capture any new data. This needless switching of the clock during the HOLD phase of either ‘logic 1’ or ‘logic 0’ can be eliminated using a gated clock. In this paper, we present a new clock gating technique incorporating a Leakage Control Transistor. The technique is employed to trigger a D flip-flop using 90nm PTM technology at a 1.1V power supply. We observe a reduction in power, delay and latency with the proposed gating logic compared with existing works. The simulation is also performed at smaller technology nodes, namely 65nm, 45nm and 32nm, to observe the change in delay, dynamic power and static power of the circuit.

Patent•

Multiphase clock data recovery circuit calibration

Duan Ying¹, Chulkyu Lee¹, Dang Harry¹, Ohjoon Kwon¹•Institutions (1)

Qualcomm¹

09 Aug 2016

TL;DR: In this patent, a method for clock data recovery circuit calibration is described, which includes configuring a first clock recovery circuit to provide a clock signal that has a first frequency and that includes a single pulse for each symbol transmitted on a 3-wire, 3-phase interface.

Abstract: Methods, apparatus, and systems for clock calibration are disclosed. A method for clock data recovery circuit calibration includes configuring a first clock recovery circuit to provide a clock signal that has a first frequency and that includes a single pulse for each symbol transmitted on a 3-wire, 3-phase interface, and calibrating the first clock recovery circuit by incrementally increasing a delay period provided by a delay element of the first clock recovery circuit until the clock signal provided by the first clock recovery circuit has a frequency that is less than the first frequency and, when the first clock recovery circuit has a frequency that is less than the first frequency, incrementally decreasing the delay period provided by the delay element of the first clock recovery circuit until the clock signal provided by the first clock recovery circuit has a frequency that matches the first frequency.
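
The two-phase delay search described in this abstract can be sketched in a few lines of Python. Everything named here is hypothetical: `calibrate_delay`, `toy_circuit`, the picosecond units, and the assumption that an over-long delay halves the recovered clock frequency are illustrative stand-ins for hardware the abstract does not detail.

```python
def calibrate_delay(measure_frequency, first_frequency, start_delay, step, max_delay):
    """Search for a delay setting using the increase-then-decrease procedure.

    `measure_frequency(delay)` is an assumed callback that returns the clock
    frequency produced by the recovery circuit at a given delay setting.
    """
    delay = start_delay
    # Phase 1: lengthen the delay until the recovered clock falls below the target.
    while measure_frequency(delay) >= first_frequency:
        delay += step
        if delay > max_delay:
            raise RuntimeError("frequency never dropped below the target")
    # Phase 2: shorten the delay until the recovered clock matches the target again.
    while measure_frequency(delay) != first_frequency:
        delay -= step
        if delay < 0:
            raise RuntimeError("frequency never returned to the target")
    return delay


def toy_circuit(delay_ps, symbol_hz=1_000_000, limit_ps=1000):
    """Toy model (assumption): pulses are missed once the delay reaches limit_ps."""
    return symbol_hz if delay_ps < limit_ps else symbol_hz // 2


if __name__ == "__main__":
    print(calibrate_delay(toy_circuit, 1_000_000, start_delay=200, step=50, max_delay=5000))
```

With this toy model the search settles one step below the point where pulses start to be missed, which is the longest delay that still yields one pulse per symbol.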

Patent•

Apparatus and methods for asynchronous clock mapping

Xiaopeng Song¹, Yiming Zhao¹, Yi Wang¹•Institutions (1)

Analog Devices¹

29 Sep 2016

TL;DR: In this patent, an upstream server of a transport network generates clock difference data indicating a time difference between a server clock signal and a client clock signal, which have an asynchronous timing relationship with respect to one another.

Abstract: Apparatus and methods for asynchronous clock mapping are provided herein. In certain configurations, an upstream server of a transport network generates clock difference data indicating a time difference between a server clock signal and a client clock signal, which have an asynchronous timing relationship with respect to one another. The clock difference data is generated with high precision by using one or more time-to-digital converters (TDCs). The clock difference data is included in a transmitted data stream, and is used by a downstream server to recover client information with enhanced accuracy.

Proceedings Article•DOI•

Power and clock gating modelling in coarse grained reconfigurable systems

Tiziana Fanni¹, Carlo Sau¹, Paolo Meloni¹, Luigi Raffo¹, Francesca Palumbo²•Institutions (2)

University of Cagliari¹, University of Sassari²

16 May 2016

TL;DR: A way to model in advance power and clock gating costs on the basis of the functional, technological and architectural parameters of the baseline CGR system is proposed, saving designer effort and time.

Abstract: Power reduction is one of the biggest challenges in modern systems and tends to become a severe issue when dealing with complex scenarios. To provide high performance and flexibility, designers often opt for coarse-grained reconfigurable (CGR) systems. Nevertheless, these systems require specific attention to the power problem, since a large set of resources may be underutilized while computing a certain task. This paper focuses on this issue. Targeting CGR devices, we propose a way to model in advance power and clock gating costs on the basis of the functional, technological and architectural parameters of the baseline CGR system. The proposed flow guides designers towards optimal implementations, saving designer effort and time.

Proceedings Article•DOI•

Stratix™ 10 High Performance Routable Clock Networks

Carl Ebeling¹, Dana How¹, David Lewis¹, Herman Schmit¹•Institutions (1)

Altera¹

21 Feb 2016

TL;DR: It is shown how this capability to generate customized clock trees can provide better performance through reduced clock loss while maintaining the ability to handle the large number of clock domains that modern systems require.

Abstract: We present the clock architecture of the Stratix™ 10 FPGA, which uses a routable clock network rather than the fixed clock networks of previous generations. We describe the flexibility provided by this routable clock network and how arbitrarily sized clock trees can be synthesized and placed anywhere on the FPGA. We show how this capability to generate customized clock trees can provide better performance through reduced clock loss while maintaining the ability to handle the large number of clock domains that modern systems require. We experimentally demonstrate how a routable clock tree reduces the clock loss of the user design implementation by up to 6% of clock insertion delay.

Proceedings Article•DOI•

Design of power efficient All Digital Phase Locked Loop (ADPLL)

Nitesh Tripathi¹, Sambhu Nath Pradhan¹•Institutions (1)

National Institute of Technology Agartala¹

23 Mar 2016

TL;DR: The proposed ADPLL uses a power-optimized digital loop filter instead of the conventional one; it is implemented in Verilog HDL and synthesized with Cadence RTL Compiler using gpdk 45 nm technology.

Abstract: This paper presents a power-efficient design of an All Digital Phase Locked Loop (ADPLL). The proposed ADPLL uses a power-optimized digital loop filter instead of the conventional one. The power optimization of the digital loop filter is carried out with the aid of clock gating, without degrading the performance of the overall system. The proposed architecture is implemented in Verilog HDL and synthesized with Cadence RTL Compiler using gpdk 45 nm technology. To validate its functionality, verification and simulation are performed with the Cadence IES (Incisive Enterprise Simulator) tool. The power consumption of this ADPLL is 0.704 µW at a center frequency of 625 kHz. The total chip area is 207 µm².
