Showing papers by "Madhu Mutyam" published in 2013

Open Access

Proceedings Article

DeBAR: deflection based adaptive router with minimal buffering

John Jose¹, Bhawna Nayak¹, Kranthi Kumar¹, Madhu Mutyam¹•Institutions (1)

18 Mar 2013

TL;DR: DeBAR is an adaptive deflection router that uses a minimal set of central buffers to accommodate a fraction of mis-routed flits. It reduces the average flit latency and the deflection rate and improves throughput relative to existing minimally buffered deflection routers, without any change in the critical path.

Abstract: Energy efficiency of the underlying communication framework plays a major role in the performance of multicore systems. NoCs with buffer-less routing are gaining popularity due to simplicity in the router design, low power consumption, and load balancing capacity. With a minimal number of buffers, deflection routers evenly distribute the traffic across links. In this paper, we propose an adaptive deflection router, DeBAR, that uses a minimal set of central buffers to accommodate a fraction of mis-routed flits. DeBAR incorporates a hybrid flit ejection mechanism that gives the effect of dual ejection with a single ejection port, an innovative adaptive routing algorithm, and selective flit buffering based on flit marking. Our proposed router design reduces the average flit latency and the deflection rate, and improves the throughput with respect to the existing minimally buffered deflection routers without any change in the critical path.
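As a rough illustration of the general idea behind minimally buffered deflection routing (not DeBAR's actual algorithm, which the abstract does not spell out), the sketch below allocates output ports at one interior mesh router. The names `productive_ports` and `allocate`, the oldest-first priority, and the `buffer_slots` parameter are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
Flit = Tuple[int, Coord]  # (age, destination); hypothetical flit format

PORTS = ("N", "E", "S", "W")

def productive_ports(cur: Coord, dst: Coord) -> List[str]:
    """Ports that move a flit closer to dst on a 2D mesh (y grows northward)."""
    (x, y), (dx, dy) = cur, dst
    ports = []
    if dx > x:
        ports.append("E")
    if dx < x:
        ports.append("W")
    if dy > y:
        ports.append("N")
    if dy < y:
        ports.append("S")
    return ports

def allocate(cur: Coord, flits: List[Flit], buffer_slots: int):
    """Assign at most four in-transit flits to output ports, oldest first.

    A flit that loses its productive port goes into a small side buffer
    if a slot is free; otherwise it is deflected to any free port.
    Assumes an interior router (all four ports exist) and dst != cur.
    """
    free = list(PORTS)
    assignment: Dict[str, Flit] = {}
    buffered: List[Flit] = []
    for flit in sorted(flits, key=lambda f: -f[0]):  # oldest flit first
        wanted = [p for p in productive_ports(cur, flit[1]) if p in free]
        if wanted:
            port = wanted[0]              # productive hop
        elif len(buffered) < buffer_slots:
            buffered.append(flit)         # absorb instead of mis-routing
            continue
        else:
            port = free[0]                # deflection (mis-route)
        free.remove(port)
        assignment[port] = flit
    return assignment, buffered
```

With `buffer_slots=0` this behaves like a purely bufferless deflection router; a small positive value shows how a few buffers can absorb flits that would otherwise be deflected.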

31 citations

Book Chapter

An application-aware cache replacement policy for last-level caches

Tripti S. Warrier¹, B. Anupama¹, Madhu Mutyam¹•Institutions (1)

Indian Institute of Technology Madras¹

19 Feb 2013

TL;DR: Experimental evaluation of the ACR technique for 2-core and 4-core systems using the SPEC CPU 2000 and 2006 benchmark suites shows significant speed-up improvement over the least recently used and thread-aware dynamic re-reference interval prediction techniques.

Abstract: Current-day multicore processors employ a multi-level cache hierarchy with one or two levels of private caches and a shared last-level cache (LLC). Efficient cache replacement policies at the LLC are essential for reducing off-chip memory traffic as well as contention for memory bandwidth. Cache replacement techniques for unicore LLCs may not be efficient for multicore LLCs because a multicore LLC can be shared by simultaneously running applications with varying access behavior. One application may dominate another by flooding the cache with requests and evicting the other application's useful data. This paper proposes a new cache replacement policy for shared LLCs called Application-aware Cache Replacement (ACR). The ACR policy prevents a low-access-rate application from being victimized by a high-access-rate application. It dynamically keeps track of the maximum lifetime of cache lines in the shared LLC for each concurrent application, which helps utilize the cache space efficiently. Experimental evaluation of the ACR technique for 2-core and 4-core systems using the SPEC CPU 2000 and 2006 benchmark suites shows significant speed-up improvement over the least recently used and thread-aware dynamic re-reference interval prediction techniques.

13 citations

Proceedings Article

SLIDER: Smart Late Injection DEflection Router for mesh NoCs

Bhawna Nayak¹, John Jose¹, Madhu Mutyam¹•Institutions (1)

Indian Institute of Technology Madras¹

07 Nov 2013

TL;DR: Experimental results show that SLIDER reduces average flit latency, channel wastage, and deflection rate, and increases throughput in the network when compared to the state-of-the-art minimally buffered deflection routers.

Abstract: Network-on-Chip (NoC) provides a scalable communication interface for processing cores in large multicore systems. An efficient NoC router should not only minimize the average packet latency of the network but also have minimum pipeline latency, area, and power. Area and power overheads are affecting the scalability and popularity of traditional input-buffered routers. In this context, minimally buffered deflection routers are emerging as a cost-effective alternative. We propose SLIDER, Smart Late Injection DEflection Router, which uses side buffers to accommodate a fraction of deflected flits. The main contributions of this work are smart late injection and selective flit preemption. In SLIDER, the injection stage is kept at the end of the router pipeline. This reduces the contention in the arbitration stage, eliminates unwanted intra-router movement of flits, and effectively utilizes the idle output channels. We parallelize independent operations in the router pipeline and reduce the pipeline latency by 25%. Experimental results on synthetic and real workloads show that SLIDER reduces average flit latency, channel wastage, and deflection rate, and increases throughput in the network when compared to the state-of-the-art minimally buffered deflection routers.

10 citations

Journal Article

Prevention slot flow-control mechanism for low latency torus network-on-chip

Arpit Joshi, Prasanna Venkatesh, Madhu Mutyam

31 Oct 2013 - IET Computers and Digital Techniques

TL;DR: A novel flow-control mechanism is proposed to address cost/performance constraints in torus networks and ensure deadlock freedom. It achieves significant improvement in throughput compared with the traditional design while using significantly fewer buffers.

Abstract: The challenge for on-chip networks is to provide low-latency communication within a very low power budget. The torus topology has been proposed to reduce latency while maintaining the simplicity of a mesh topology. As the torus topology has an inherent circular dependency, additional effort is needed to prevent deadlock, even if deadlock-free routing algorithms are used. The authors propose a novel flow-control mechanism to address cost/performance constraints in torus networks and ensure deadlock freedom. They achieve flow control by using a prevention mechanism and ensure deadlock freedom while requiring only a single packet buffer per input port. They simplify the router design by having a simple switch allocator that prioritises in-flight packets, and a single packet buffer per input port that eliminates the need for virtual channels. They also propose a mechanism to avoid starvation that can arise because of the prioritised arbitration. Experimental validation reveals that the authors' design achieves significant improvement in throughput, as compared with the traditional design, while using significantly fewer buffers.
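To make the latency argument for a torus concrete, the following minimal sketch compares minimal hop counts between two nodes on a k x k mesh and on a k x k torus. The function names and the coordinate convention are assumptions for illustration; the paper's flow-control mechanism itself is not modelled here.

```python
from typing import Tuple

Coord = Tuple[int, int]

def mesh_hops(src: Coord, dst: Coord) -> int:
    """Minimal hop count on a 2D mesh (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

def torus_hops(src: Coord, dst: Coord, k: int) -> int:
    """Minimal hop count on a k x k torus.

    Each dimension is a ring, so a packet may travel either way
    around it; the shorter direction is taken per dimension.
    """
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    return min(dx, k - dx) + min(dy, k - dy)

if __name__ == "__main__":
    k = 8
    corner_a, corner_b = (0, 0), (k - 1, k - 1)
    print(mesh_hops(corner_a, corner_b))      # 14 hops across the mesh
    print(torus_hops(corner_a, corner_b, k))  # 2 hops via wraparound links
```

The same wraparound links that shorten corner-to-corner paths also close each row and column into a ring, which is the circular dependency the abstract refers to.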

1 citation