Proceedings Article

Buffer reduction algorithm for mesh-based clock distribution

01 Oct 2014-pp 1-4
TL;DR: This short paper proposes a buffer reduction algorithm which can reduce the power dissipated in clock meshes by 15-18% at the cost of 10-20 ps increase in skew when compared to the previously published work.
Abstract: In deep sub-micron technology, mesh-based clock distribution is becoming a preferred method to distribute the clock since it is tolerant to process variations. Buffers are placed on the mesh nodes to drive the mesh wire capacitance and the large load capacitance of clock sinks. In this short paper, we propose a buffer reduction algorithm which can reduce the power dissipated in clock meshes. We calculate the importance of each buffer by the impact its removal has on the clock latency and clock slew at sinks. We then calculate a rank for each buffer, and buffers with lower ranks are removed. Our buffer reduction algorithm is able to achieve a 15–18% reduction in power at the cost of a 10–20 ps increase in skew when compared to the previously published work.
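The rank-then-remove idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the ranking formula, so everything specific here is an assumption made for illustration: the `BufferImpact` record, the equal weights `w_latency` and `w_slew`, the normalization by the maximum impact, and the `remove_fraction` budget are all hypothetical, and the per-buffer impact values are assumed to come from a circuit simulation that is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferImpact:
    """Hypothetical record of what happens at the sinks if one buffer is removed."""
    name: str
    latency_increase_ps: float  # assumed: rise in sink clock latency after removal
    slew_increase_ps: float     # assumed: rise in sink clock slew after removal


def rank_buffers(impacts, w_latency=0.5, w_slew=0.5):
    """Return {name: rank}; a higher rank means the buffer matters more.

    The weighted sum of normalized impacts is an illustrative choice,
    not the formula used in the paper.
    """
    max_lat = max((b.latency_increase_ps for b in impacts), default=0.0) or 1.0
    max_slew = max((b.slew_increase_ps for b in impacts), default=0.0) or 1.0
    return {
        b.name: w_latency * b.latency_increase_ps / max_lat
        + w_slew * b.slew_increase_ps / max_slew
        for b in impacts
    }


def select_buffers_to_remove(impacts, remove_fraction=0.25):
    """Pick the lowest-ranked buffers, up to an assumed removal budget."""
    ranks = rank_buffers(impacts)
    n_remove = int(len(impacts) * remove_fraction)
    # Sort by rank, breaking ties by name so the result is deterministic.
    ordered = sorted(ranks, key=lambda name: (ranks[name], name))
    return ordered[:n_remove]


impacts = [
    BufferImpact("B0", 8.0, 12.0),
    BufferImpact("B1", 1.0, 2.0),
    BufferImpact("B2", 4.0, 3.0),
    BufferImpact("B3", 0.5, 1.0),
]
print(select_buffers_to_remove(impacts, remove_fraction=0.5))
```

In this toy input, `B3` and `B1` have the smallest effect on latency and slew, so they are the removal candidates. A practical flow would likely re-simulate after each removal, since the impact of one buffer depends on which neighbours are still present; this sketch ranks once for simplicity.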
Citations
Journal Article
TL;DR: A novel design platform, merging and replacing of multiple multiplexers and dividers (MRMMD), is developed to intelligently identify those suspicious clock architectures and resynthesize them into a power-and-area effective and less complicated clock structure.
Abstract: To trigger events for application-specific data transfer among registers in a multimillion-gate system-on-chip (SoC), various kinds of clock signals, selectively driven by different frequency-dependent sources and/or dividers (DIVs), are usually centralized in one or more clock generation modules, where clock gating cells (CGCs), multiplexers (MUXes) and DIVs are used to create the clocks required by different functional operations in an SoC. These modules will introduce uncommon and longer timing paths for clock propagations and further make the clock tree synthesis (CTS) process become more challenging due to the on-chip-variation (OCV) effects. In addition, high volume of switching activities in the increased number of clock logic cells will consume more power. In this article, a novel design platform, merging and replacing of multiple multiplexers and dividers (MRMMD), is developed to intelligently identify those suspicious clock architectures and resynthesize them into a power-and-area effective and less complicated clock structure. Using our resynthesis platform, not only the number of clock-related timing paths and their corresponding logic levels can be reduced, but also the corresponding analysis and implementations of clock skew minimizations during CTS become much easier. The experimental results implemented in TSMC 55- and 28-nm process nodes on optimizing some industrial clock architectures showed that significant reductions of area, power, latency, skew and clock path, logic level, OCV impact, total wire length, and implementation runtime are achieved using our MRMMD platform.

4 citations


Cites methods from "Buffer reduction algorithm for mesh..."

  • ...A buffer reduction method for mesh-based clock distribution to achieve smaller clock network area was shown [8]....

    [...]

Book Chapter
01 Jan 2020
TL;DR: Skew minimization design should be introduced in VLSI physical design at the early stages of SoC design, where it has the highest benefits for QoR.
Abstract: The most critical constraints that determine the performance of a system on chip (SoC) are area and power. As technology scales down, innovative clock tree design techniques are required to improve the skew. Hence, skew minimization design should be introduced in VLSI physical design at the early stages of SoC design, where it has the highest benefits for QoR. In this paper, a skew balance methodology using an H-tree is introduced in multisource CTS design.
References
Journal Article
TL;DR: In this paper, a microarchitecture-aware model for process variation is proposed, including both random and systematic effects, and the model is specified using a small number of highly intuitive parameters.
Abstract: Within-die parameter variation poses a major challenge to high-performance microprocessor design, negatively impacting a processor's frequency and leakage power. Addressing this problem, this paper proposes a microarchitecture-aware model for process variation, including both random and systematic effects. The model is specified using a small number of highly intuitive parameters. Using the variation model, this paper also proposes a framework to model timing errors caused by parameter variation. The model yields the failure rate of microarchitectural blocks as a function of clock frequency and the amount of variation. With the combination of the variation model and the error model, we have VARIUS, a comprehensive model that is capable of producing detailed statistics of timing errors as a function of different process parameters and operating conditions. We propose possible applications of VARIUS to microarchitectural research.

386 citations

Journal Article
TL;DR: The MeshWorks framework is presented, the first comprehensive automated framework for planning, synthesis, and optimization of clock mesh networks that addresses the above issues and can achieve an additional reduction of 31% in buffer area, 21% in wirelength, and 23% in power.
Abstract: Clock mesh networks are well known for their variation tolerance. But their usage is limited to high-end designs due to the significantly high resource requirements compared to clock trees and the lack of automatic mesh synthesis tools. Most existing works on clock mesh networks either deal with semi-custom design or perform optimizations on a given clock mesh. However, the problem of obtaining a good initial clock mesh has not been addressed. Also, the problem of achieving a smooth tradeoff between variation tolerance and resource requirements has not been addressed adequately. In this paper, we present our MeshWorks framework, the first comprehensive automated framework for planning, synthesis, and optimization of clock mesh networks that addresses the above issues. Experimental results suggest that our algorithms can achieve an additional reduction of 31% in buffer area, 21% in wirelength, and 23% in power, compared to the best previous work, with similar worst case maximum frequency. We also demonstrate the effectiveness of our framework under several practical issues such as blockages, multiple clocks, uneven load distribution, and electromigration violations.

26 citations


"Buffer reduction algorithm for mesh..." refers background or methods in this paper

  • ...In [1], the buffers are placed using a set-cover algorithm with a discrete buffer library....

    [...]

  • ...A detailed study has been made on leaf level clock mesh synthesis in [1], [2] and [3]....

    [...]

  • ...But this comes at the cost of increased power dissipation since mesh has increased wire capacitance [1]....

    [...]
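The set-cover formulation of buffer placement mentioned in the first excerpt above can be sketched as follows. This is the standard greedy approximation for weighted set cover, shown only to illustrate the idea; the excerpt does not say which set-cover algorithm [1] uses. The candidate names (a mesh node paired with a buffer size from a discrete library), the load names, and the costs are all hypothetical.

```python
def greedy_set_cover(universe, candidates):
    """Greedy weighted set cover.

    universe:   iterable of loads that must be driven (hypothetical names).
    candidates: {name: (covered_loads, cost)}, where each candidate stands for
                one buffer size placed at one mesh node, and cost is positive.
    Returns the chosen candidate names in selection order.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best_name, best_ratio = None, 0.0
        # Iterate in sorted order so ties are broken deterministically.
        for name in sorted(candidates):
            covered, cost = candidates[name]
            gain = len(uncovered & covered)
            if gain and gain / cost > best_ratio:
                best_name, best_ratio = name, gain / cost
        if best_name is None:
            raise ValueError("some loads cannot be covered by any candidate")
        chosen.append(best_name)
        uncovered -= candidates[best_name][0]
    return chosen


loads = {"s1", "s2", "s3", "s4"}
candidates = {
    "n0_x4": ({"s1", "s2", "s3"}, 2.0),  # large buffer at node n0
    "n1_x2": ({"s3", "s4"}, 2.0),        # medium buffer at node n1
    "n2_x1": ({"s4"}, 1.0),              # small buffer at node n2
}
print(greedy_set_cover(loads, candidates))
```

Each round picks the candidate that covers the most still-uncovered loads per unit cost, so here the large buffer at `n0` is chosen first and the small buffer at `n2` finishes the cover.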

Journal Article
TL;DR: A novel clock tree synthesizer with dual-MST geometric approach of perfect matching is developed for symmetric clock tree construction and a special technique of buffer sizing is introduced to reduce the variation effect.

17 citations


"Buffer reduction algorithm for mesh..." refers background in this paper

  • ...This increased skew (up to 30 ps) is still acceptable since tree-based distribution has reported skew of 45-70 ps on the same ISPD2010 benchmarks under similar simulation conditions and variations ([8])....

    [...]

Journal Article
TL;DR: This work presents two techniques to optimize high-performance clock meshes, the first of which is a mesh perturbation methodology for nonuniform mesh routing and the second a skew-aware buffer placement through iterative buffer deletion.
Abstract: Clock meshes are extremely effective at producing low-skew regional clock networks that are tolerant of environmental and process variations. For this reason, clock meshes are used in most high-performance designs, but this robustness consumes significant power. In this work, we present two techniques to optimize high-performance clock meshes. The first technique is a mesh perturbation methodology for nonuniform mesh routing. The second technique is a skew-aware buffer placement through iterative buffer deletion. We demonstrate how these optimizations can achieve significant power reductions and a near elimination of short-circuit power. In addition, the total wire length is decreased, the number of required buffers is decreased, and both skew and robustness are improved on average when variation is considered.

16 citations


"Buffer reduction algorithm for mesh..." refers background or methods or result in this paper

  • ...The IBD of [2] is applied to the same ISPD2010 benchmarks under the same simulation conditions as ours....

    [...]

  • ...To ascertain the effectiveness of our algorithm, we compare our results with skew and power of [2] in Table IV and note that our buffer reduction algorithm can achieve 15-18% reduction in power at the cost of increased skew....

    [...]

  • ...A detailed study has been made on leaf level clock mesh synthesis in [1], [2] and [3]....

    [...]

  • ...To the best of our knowledge, the Iterative Buffer Deletion algorithm (IBD) presented in [2] is the only published work on buffer reduction in clock mesh....

    [...]

  • ...Table columns: ispd10 benchmark; This work, Skew (ps) and Pwr (mW); IBD of [2], Skew (ps) and Pwr (mW); % Pwr reduction. First row: cns06, 21....

    [...]

Journal Article
TL;DR: Different methods to manage skew and skew variations within tree and non-tree clock distribution networks are reviewed and compared and metrics to determine the most power efficient technique for a given circuit are discussed and verified with simulation.
Abstract: Power is a primary concern in modern circuits. Clock distribution networks, in particular, are an essential element of a synchronous digital circuit and a significant power consumer. Clock distribution networks are subject to clock skew due to process, voltage, and temperature (PVT) variations and load imbalances. A target skew between sequentially-adjacent registers can be obtained in a balanced low power clock tree using techniques such as buffer and wire sizing. Existing skew mitigation techniques in tree-based clock distribution networks, however, are not efficient in coping with post design variations; whereas the latest non-tree mesh-based solutions reliably handle skew variations, albeit with a significant increase in dissipated power. Alternatively, crosslink-based methods provide low power and variation-efficient skew solutions. Existing crosslink-based methods, however, only address skew at the network topology level and do not target low power consumption. Different methods to manage skew and skew variations within tree and non-tree clock distribution networks are reviewed and compared in this paper. Guidelines for inserting crosslinks within a buffered low power clock tree are provided. Metrics to determine the most power efficient technique for a given circuit are discussed and verified with simulation.

9 citations


"Buffer reduction algorithm for mesh..." refers background in this paper

  • ...In [4], the authors observe that this SC power dissipation is a linear function of inter-buffer skew....

    [...]