Showing papers by "Madhu Mutyam" published in 2013


Proceedings Article
18 Mar 2013
TL;DR: An adaptive deflection router, DeBAR, that uses a minimal set of central buffers to accommodate a fraction of mis-routed flits and reduces the average flit latency and the deflection rate, and improves the throughput with respect to the existing minimally buffered deflection routers without any change in the critical path.
Abstract: Energy efficiency of the underlying communication framework plays a major role in the performance of multicore systems. NoCs with buffer-less routing are gaining popularity due to their simple router design, low power consumption, and load-balancing capacity. With a minimal number of buffers, deflection routers evenly distribute the traffic across links. In this paper, we propose an adaptive deflection router, DeBAR, that uses a minimal set of central buffers to accommodate a fraction of mis-routed flits. DeBAR incorporates a hybrid flit ejection mechanism that gives the effect of dual ejection with a single ejection port, an innovative adaptive routing algorithm, and selective flit buffering based on flit marking. Our proposed router design reduces the average flit latency and the deflection rate, and improves the throughput with respect to the existing minimally buffered deflection routers without any change in the critical path.

31 citations
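
For readers unfamiliar with the deflection routing that this abstract builds on, the following is a minimal sketch of generic deflection port allocation in a 2D mesh router. It is not DeBAR's algorithm: the oldest-first priority, the port names, the coordinate convention, and the helper names `productive_ports` and `assign_ports` are all assumptions made for illustration. It also assumes that at most four flits arrive per cycle and that flits destined for the local node have already been ejected.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
Flit = Tuple[str, int, Coord]  # (flit id, age in cycles, destination); hypothetical layout

PORTS = ["N", "E", "S", "W"]  # assumed output ports of a 2D mesh router


def productive_ports(cur: Coord, dst: Coord) -> List[str]:
    """Ports that move a flit closer to dst (x grows east, y grows north)."""
    (cx, cy), (dx, dy) = cur, dst
    ports = []
    if dx > cx:
        ports.append("E")
    if dx < cx:
        ports.append("W")
    if dy > cy:
        ports.append("N")
    if dy < cy:
        ports.append("S")
    return ports


def assign_ports(cur: Coord, flits: List[Flit]) -> Tuple[Dict[str, str], List[str]]:
    """Give every flit an output port; flits that lose arbitration are deflected.

    Assumed policy: older flits pick first. A flit takes a free productive
    port if one exists, otherwise any free port (a deflection).
    """
    free = list(PORTS)
    assignment: Dict[str, str] = {}
    deflected: List[str] = []
    for flit_id, _age, dst in sorted(flits, key=lambda f: -f[1]):
        wanted = [p for p in productive_ports(cur, dst) if p in free]
        if wanted:
            port = wanted[0]
        else:
            port = free[0]  # no productive port left: mis-route the flit
            deflected.append(flit_id)
        free.remove(port)
        assignment[flit_id] = port
    return assignment, deflected


if __name__ == "__main__":
    flits = [("a", 5, (3, 1)), ("b", 9, (2, 1)), ("c", 1, (1, 0))]
    print(assign_ports((1, 1), flits))
```

In this example, flits `a` and `b` both want the east port; the older flit `b` wins and `a` is deflected. A minimally buffered design such as the one described above would instead hold a fraction of such mis-routed flits in a small buffer.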


Book Chapter
19 Feb 2013
TL;DR: Experimental evaluation of the ACR technique for 2-core and 4-core systems using the SPEC CPU 2000 and 2006 benchmark suites shows significant speed-up improvement over the least recently used and thread-aware dynamic re-reference interval prediction techniques.
Abstract: Current multicore processors employ a multi-level cache hierarchy with one or two levels of private caches and a shared last-level cache (LLC). Efficient cache replacement policies at the LLC are essential for reducing off-chip memory traffic as well as contention for memory bandwidth. Cache replacement techniques designed for unicore LLCs may not be efficient for multicore LLCs, because a multicore LLC can be shared by simultaneously running applications with varying access behavior. One application may dominate another by flooding the cache with requests and evicting the other application's useful data. This paper proposes a new cache replacement policy for shared LLCs called Application-aware Cache Replacement (ACR). The ACR policy prevents a high-access-rate application from victimizing a low-access-rate application. It dynamically keeps track of the maximum lifetime of cache lines in the shared LLC for each concurrent application, which helps utilize the cache space efficiently. Experimental evaluation of the ACR technique for 2-core and 4-core systems using the SPEC CPU 2000 and 2006 benchmark suites shows significant speed-up improvement over the least recently used and thread-aware dynamic re-reference interval prediction techniques.

13 citations


Proceedings Article
07 Nov 2013
TL;DR: Experimental results show that SLIDER reduces average flit latency, channel wastage, and deflection rate, and increases throughput in the network when compared to the state-of-the-art minimally buffered deflection routers.
Abstract: Network-on-Chip (NoC) provides a scalable communication interface for processing cores in large multicore systems. An efficient NoC router should not only minimize the average packet latency of the network but also have minimum pipeline latency, area, and power. Area and power overheads limit the scalability and popularity of traditional input-buffered routers. In this context, minimally buffered deflection routers are emerging as a cost-effective alternative. We propose SLIDER, Smart Late Injection DEflection Router, which uses side buffers to accommodate a fraction of deflected flits. The main contributions of this work are smart late injection and selective flit preemption. In SLIDER, the injection stage is placed at the end of the router pipeline. This reduces contention in the arbitration stage, eliminates unwanted intra-router movement of flits, and effectively utilizes idle output channels. We parallelize independent operations in the router pipeline and reduce the pipeline latency by 25%. Experimental results on synthetic and real workloads show that SLIDER reduces average flit latency, channel wastage, and deflection rate, and increases throughput in the network when compared to the state-of-the-art minimally buffered deflection routers.

10 citations


Journal Article
TL;DR: A novel flow-control mechanism is proposed that addresses cost/performance constraints in torus networks and ensures deadlock freedom; it achieves significant improvement in throughput, as compared with the traditional design, while using significantly fewer buffers.
Abstract: The challenge for on-chip networks is to provide low-latency communication within a very low power budget. To reduce latency while maintaining the simplicity of a mesh topology, the torus topology has been proposed. As the torus topology has an inherent circular dependency, additional effort is needed to prevent deadlock, even if deadlock-free routing algorithms are used. The authors propose a novel flow-control mechanism to address cost/performance constraints in torus networks and ensure deadlock freedom. They achieve flow control by using a prevention mechanism and ensure deadlock freedom while requiring only a single packet buffer per input port. They simplify the router design by having a simple switch allocator that prioritises in-flight packets, and a single packet buffer per input port that eliminates the need for virtual channels. They also propose a mechanism to avoid starvation that can arise because of the prioritised arbitration. Experimental validation reveals that the authors' design achieves significant improvement in throughput, as compared with the traditional design, while using significantly fewer buffers.

1 citation