Author

Raghavendra K

Bio: Raghavendra K is an academic researcher from the Indian Institutes of Technology. The author has contributed to research in the topics of Cache and Shared memory, has an h-index of 1, and has co-authored 2 publications receiving 5 citations.
Topics: Cache, Shared memory, Queue, Access time, Router

Papers
Proceedings ArticleDOI
10 Mar 2008
TL;DR: Experimental results reveal that the proposed process variation aware issue queue design recovers most of the lost performance due to process variation and incurs a performance penalty of less than 2% with respect to the performance of issue queues without process variation.
Abstract: In sub-90 nm process technology it becomes harder to control the fabrication process, which in turn causes variations between the design-time parameters and the fabricated parameters. Variations in the critical process parameters can result in significant fluctuations in the switching speed and leakage power consumption of different transistors on the same chip. In this paper, we study the impact of process variation on issue queues. Due to process variation, issue queue entries can have variable access latencies. To work with issue queues of non-uniform access latency, we propose a process-variation-aware issue queue design that exploits the operands of instructions that are already ready at dispatch time. Experimental results reveal that, for a 64-entry issue queue with half of the entries affected by process variation, our technique recovers most of the performance lost to process variation and incurs a performance penalty of less than 2% with respect to issue queues without process variation.
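
As a toy illustration of the dispatch-time idea above (an instruction whose source operands are all ready at dispatch no longer depends on fast wakeup, so it can tolerate a slow, variation-affected entry), the following Python sketch steers instructions between fast and slow issue-queue partitions. The partition sizes, the steering policy, and the data structures are illustrative assumptions, not the design evaluated in the paper.

```python
# Hypothetical sketch of dispatch-time steering into a PV-affected issue queue.
# Partition sizes and the steering policy are illustrative assumptions,
# not the design described in the paper.
from collections import deque

FAST, SLOW = "fast", "slow"

class IssueQueue:
    def __init__(self, n_entries=64, n_slow=32):
        # Assume half of the entries are slowed down by process variation.
        self.free = {FAST: deque(range(n_entries - n_slow)),
                     SLOW: deque(range(n_entries - n_slow, n_entries))}

    def dispatch(self, instr):
        """Steer an instruction to a slow entry only if all of its source
        operands are already ready, so it will not rely on fast wakeup."""
        prefer = SLOW if all(instr["ready"]) else FAST
        fallback = FAST if prefer == SLOW else SLOW
        for kind in (prefer, fallback):
            if self.free[kind]:
                return kind, self.free[kind].popleft()
        return None  # queue full: dispatch stalls

iq = IssueQueue()
print(iq.dispatch({"op": "add", "ready": [True, True]}))   # -> slow entry
print(iq.dispatch({"op": "mul", "ready": [False, True]}))  # -> fast entry
```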

5 citations

Proceedings ArticleDOI
01 Oct 2020
TL;DR: A congestion management technique in the LLC is proposed that equips the NoC router with a small storage to keep copies of heavily shared cache blocks, together with a prediction classifier in the LLC controller to identify such blocks.
Abstract: Multiple cores in a tiled multi-core processor are connected using a network-on-chip mechanism. All these cores share the last-level cache (LLC). For large-sized LLCs, generally, non-uniform cache architecture design is considered, where the LLC is split into multiple slices. Accessing highly shared cache blocks from an LLC slice by several cores simultaneously results in congestion at the LLC, which in turn increases the access latency. To deal with this issue, we propose a congestion management technique in the LLC that equips the NoC router with small storage to keep a copy of heavily shared cache blocks. To identify highly shared cache blocks, we also propose a prediction classifier in the LLC controller. We implement our technique in Sniper, an architectural simulator for multi-core systems, and evaluate its effectiveness by running a set of parallel benchmarks. Our experimental results show that the proposed technique is effective in reducing the LLC access time.
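
To make the mechanism concrete, here is a minimal sketch of how an LLC-side classifier might count sharers and replicate blocks it predicts to be heavily shared into a small router buffer. The sharing threshold, the buffer capacity, and the LRU replacement are assumptions made for illustration; the paper's actual predictor and storage organization may differ.

```python
# Illustrative sketch of a sharer-count classifier that flags heavily shared
# LLC blocks for replication in a small NoC-router buffer. The threshold,
# buffer capacity, and LRU policy are assumptions, not the paper's design.
from collections import OrderedDict

SHARE_THRESHOLD = 4       # distinct cores before a block is considered "hot"
ROUTER_BUFFER_BLOCKS = 8  # small per-router storage (assumed capacity)

class SharingClassifier:
    def __init__(self):
        self.sharers = {}                   # block address -> set of core ids
        self.router_buffer = OrderedDict()  # block address -> data (LRU order)

    def on_llc_access(self, core_id, addr, data):
        self.sharers.setdefault(addr, set()).add(core_id)
        if len(self.sharers[addr]) >= SHARE_THRESHOLD:
            # Predicted hot: keep a copy near the router to absorb future requests.
            self.router_buffer[addr] = data
            self.router_buffer.move_to_end(addr)
            if len(self.router_buffer) > ROUTER_BUFFER_BLOCKS:
                self.router_buffer.popitem(last=False)  # evict LRU copy

    def lookup(self, addr):
        return self.router_buffer.get(addr)  # a hit avoids the congested LLC slice

clf = SharingClassifier()
for core in range(5):
    clf.on_llc_access(core, 0x80, data="blockA")
print(clf.lookup(0x80))  # "blockA" once the sharer count crosses the threshold
```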

2 citations


Cited by
Journal ArticleDOI
TL;DR: A survey of architectural techniques for managing process variation (PV) in modern processors is presented and these techniques are classified based on several important parameters to bring out their similarities and differences.
Abstract: Process variation—deviation in parameters from their nominal specifications—threatens to slow down and even pause technological scaling, and mitigation of it is the way to continue the benefits of chip miniaturization. In this article, we present a survey of architectural techniques for managing process variation (PV) in modern processors. We also classify these techniques based on several important parameters to bring out their similarities and differences. The aim of this article is to provide insights to researchers into the state of the art in PV management techniques and motivate them to further improve these techniques for designing PV-resilient processors of tomorrow.

68 citations

Proceedings ArticleDOI
24 Jun 2008
TL;DR: This paper provides a comprehensive analysis of the effects of PV on the microprocessor issue queue and proposes mechanisms that allow fast and slow issue-queue entries to co-exist, in turn enabling instruction dispatch, issue, and forwarding to proceed with minimal stalls.
Abstract: The last few years have witnessed an unprecedented explosion in transistor densities. Diminutive feature sizes have enabled microprocessor designers to break the billion-transistors-per-chip mark. However, various new reliability challenges such as process variation (PV) have emerged that can no longer be ignored by chip designers. In this paper, we provide a comprehensive analysis of the effects of PV on the microprocessor's issue queue. Variations can slow down issue queue entries and result in as much as 20.5% performance degradation. To counter this, we look at different solutions that include instruction steering, operand- and port-switching mechanisms. Given that PV is non-deterministic at design time, our mechanisms allow the fast and slow issue-queue entries to co-exist, in turn enabling instruction dispatch, issue and forwarding to proceed with minimal stalls. Evaluation on a detailed simulation environment indicates that the proposed mechanisms can reduce performance degradation due to PV to a low 1.3%.
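
The abstract names operand switching as one of the mechanisms; one plausible reading of it is sketched below: when process variation leaves one tag comparator of an issue-queue entry slower than the other, the operand that is still waiting for wakeup is placed on the fast slot. The two-slot entry model and the swap condition are assumptions made for illustration only, not the paper's definition.

```python
# Hypothetical illustration of operand switching: if one tag comparator in an
# issue-queue entry is slowed by process variation, place the operand that is
# still waiting for wakeup on the fast comparator slot. The entry model and
# the per-slot speeds are assumptions for illustration.

def place_operands(entry_speeds, operands):
    """entry_speeds: ('fast'|'slow', 'fast'|'slow') for the entry's two slots.
    operands: two (tag, ready) pairs for the instruction's source operands.
    Returns the operands reordered so a waiting operand sits on a fast slot."""
    assert len(entry_speeds) == len(operands) == 2
    (s0, s1), (op0, op1) = entry_speeds, operands
    # Swap only when it moves a not-yet-ready operand from a slow to a fast slot.
    if s0 == "slow" and s1 == "fast" and not op0[1] and op1[1]:
        return [op1, op0]
    if s1 == "slow" and s0 == "fast" and not op1[1] and op0[1]:
        return [op0, op1]
    return [op0, op1]

print(place_operands(("slow", "fast"), [("r3", False), ("r7", True)]))
# -> [('r7', True), ('r3', False)] : the waiting tag r3 now uses the fast slot
```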

8 citations

Proceedings ArticleDOI
04 May 2008
TL;DR: The goal of this work is to find the optimal frequency that balances performance against power in the presence of core asymmetry, and it is demonstrated that traditional task scheduling techniques need to be revisited to mitigate the effects of process variations.
Abstract: Faced with the challenge of finding ways to use an ever-growing transistor budget, microarchitects have moved towards chip multiprocessors (CMPs) as an attractive solution. CMPs have become a common way of reducing chip complexity and power consumption while maintaining high performance. Multiple cores are replicated on a single chip, resulting in a potential linear scaling of performance, and cores are becoming sufficiently small with technology scaling. As technology continues to scale, inter-die and intra-die variations in process parameters can have a significant impact on performance and power consumption, leading to asymmetry among cores that were designed to be symmetric. Adaptive voltage scaling can be used to bring all cores to the same performance level, leaving only core-to-core power variations. The goal of our work is to find the optimal frequency that balances performance against power in the presence of this asymmetry. We also demonstrate that traditional task scheduling techniques need to be revisited to mitigate the effects of process variations.
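
The trade-off described above can be pictured with a toy model: to run variation-affected cores at one common frequency, the slower cores need a higher supply voltage and therefore more power, so raising the common frequency eventually costs more in power than it returns in throughput. The dynamic-power expression, the per-core voltage values, and the throughput-per-watt objective below are simplifying assumptions, not the paper's formulation.

```python
# Toy model of picking a single chip-wide frequency for PV-asymmetric cores.
# The power model (P ~ C * V^2 * f), the per-core voltages, and the
# throughput-per-watt objective are illustrative assumptions only.

def chip_power(freq_ghz, core_voltages_at_freq, c_eff=1.0):
    # Slower cores need a higher supply voltage to sustain the common frequency.
    return sum(c_eff * v * v * freq_ghz for v in core_voltages_at_freq)

def best_common_frequency(candidates):
    """candidates: list of (freq_ghz, [per-core voltage needed at that freq])."""
    def throughput_per_watt(item):
        f, volts = item
        return (f * len(volts)) / chip_power(f, volts)
    return max(candidates, key=throughput_per_watt)[0]

# Example: at higher frequencies the slow (PV-affected) cores need much more voltage.
candidates = [
    (2.0, [0.90, 0.92, 0.95, 1.05]),
    (2.4, [0.95, 0.98, 1.05, 1.25]),
    (2.8, [1.00, 1.05, 1.20, 1.45]),
]
print(best_common_frequency(candidates))  # frequency with the best throughput/watt
```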

8 citations

Dissertation
01 Jan 2013
TL;DR: This thesis presents combinatorial digital design in a 65 nm transistor technology operating in the near/sub-threshold region and proposes new standard digital building blocks optimized for sub-threshold performance.
Abstract: This thesis shows combinatorial digital design using 65 nm transistor technology operating in the near/sub-threshold region. The goal is a 16-by-9-bit adder for micro-beamforming, optimized for power consumption under a speed requirement of 50 MHz per operation. To optimize the addition of the sixteen 9-bit numbers, different building blocks are studied to find those best suited for low power consumption, robustness, and regular layout design, without breaking the speed requirement. New standard digital building blocks optimized for sub-threshold performance are proposed, and a way to make regular layout designs is shown. As a final result, a 16-by-9-bit adder layout is presented with a delay of 17.7 ns (56.5 MHz) and a power consumption of 25 µW at 20 °C, and a delay of 10 ns (100 MHz) with a power consumption of 36.2 µW at 80 °C. The design is built from 6736 transistors and uses an area of 240 µm × 84 µm ≈ 20,160 µm² (about 0.02 mm²).
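
The delay, frequency, and area figures can be cross-checked with simple unit arithmetic, which is also how the area quoted above was restated:

```python
# Quick arithmetic check of the figures quoted in the abstract.

for delay_ns in (17.7, 10.0):
    print(f"{delay_ns} ns  ->  {1e3 / delay_ns:.1f} MHz")  # 56.5 MHz and 100.0 MHz

width_um, height_um = 240, 84
area_um2 = width_um * height_um
print(f"area = {area_um2} um^2 = {area_um2 / 1e6:.4f} mm^2")  # 20160 um^2, ~0.02 mm^2
```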

3 citations

Proceedings ArticleDOI
16 Mar 2009
TL;DR: This paper investigates exploiting the statistical features of circuit delay by cascading dependent instructions to reduce variations, and examines how efficiently instruction cascading improves the performance yield of processors.
Abstract: As semiconductor technologies are aggressively advanced, the problem of parameter variations is emerging. Process variations in transistors affect circuit delay, resulting in serious yield loss. Given this situation, variation-aware designs for yield enhancement are of interest to researchers. This paper investigates exploiting the statistical features of circuit delay and cascading dependent instructions to reduce variations. Combining statistical static timing analysis at the circuit level with performance evaluation at the processor level, this paper tries to unveil how efficiently instruction cascading improves the performance yield of processors. Cascading instructions increases logic depth and decreases the standard deviation of the circuit delay, which might improve the performance yield of microprocessors. Unfortunately, however, it is found that variability reduction at the circuit level does not always mean yield enhancement at the microarchitecture level.
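
The statistical intuition behind cascading is that an n-stage path's delay is the sum of n stage delays; for independent, identically distributed stages the mean grows linearly with n while the standard deviation grows only with the square root of n, so the relative variation shrinks as 1/sqrt(n). The snippet below illustrates that scaling; the per-stage mean and sigma are arbitrary illustrative values.

```python
# Relative delay variation of a cascaded path, assuming n independent,
# identically distributed stage delays: mean scales as n, sigma as sqrt(n),
# so sigma/mean shrinks as 1/sqrt(n). Stage mean/sigma are illustrative values.
import math

stage_mean_ps, stage_sigma_ps = 20.0, 3.0

for n in (1, 4, 16, 64):
    path_mean = n * stage_mean_ps
    path_sigma = math.sqrt(n) * stage_sigma_ps
    print(f"depth {n:3d}: mean {path_mean:7.1f} ps, "
          f"sigma {path_sigma:5.1f} ps, sigma/mean {path_sigma / path_mean:.3f}")
```

Note that the same arithmetic shows the nominal path delay growing with depth, which hints at why, as the abstract concludes, a lower relative sigma at the circuit level does not automatically translate into better yield at the microarchitecture level.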

1 citation