Author

Amrutur Bharadwaj

Bio: Amrutur Bharadwaj is an academic researcher from the Indian Institute of Science. The author has contributed to research topics including Programmable Interrupt Controller and Clock rate. The author has an h-index of 4 and has co-authored 7 publications receiving 89 citations.

Papers
Proceedings ArticleDOI
09 Dec 2006
TL;DR: Through simulation studies, this paper establishes the superiority of the molecular cache (a cache built as an aggregation of molecules), which offers a 29% power advantage over an equivalently performing traditional cache.
Abstract: CMPs enable simultaneous execution of multiple applications on the same platform, where they share cache resources. Diversity in the cache access patterns of these simultaneously executing applications can potentially trigger inter-application interference, leading to cache pollution. Whereas a large cache can ameliorate this problem, the larger power consumption that comes with increasing cache size, amplified at sub-100nm technologies, makes this solution prohibitive. In this paper, in order to address power-aware cache performance, we propose a caching structure that provides: 1. Definition of application-specific cache partitions as aggregations of caching units (molecules), where the parameters of each molecule, namely size, associativity, and line size, are chosen so that its power consumption and access time are optimal for the given technology. 2. Application-specific resizing of cache partitions, with variable and adaptive associativity per cache line, way size, and variable line size. 3. A replacement policy that is transparent to the partition in terms of size and heterogeneity in associativity and line size. Through simulation studies, we establish the superiority of the molecular cache (a cache built as an aggregation of molecules), which offers a 29% power advantage over an equivalently performing traditional cache.

63 citations
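The abstract above describes partitions built from heterogeneous caching units. As a rough illustration of that structure only, here is a minimal Python sketch of a partition as an aggregation of "molecules", each with its own size, associativity, and line size; the class names and parameter values are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of a "molecular" cache partition: a partition is an
# aggregation of small caching units (molecules), each with its own size,
# associativity, and line size.
from dataclasses import dataclass

@dataclass
class Molecule:
    size_bytes: int       # total capacity of this molecule
    associativity: int    # ways per set
    line_bytes: int       # cache line size

    @property
    def num_sets(self) -> int:
        return self.size_bytes // (self.associativity * self.line_bytes)

class Partition:
    """An application-specific partition built from molecules."""
    def __init__(self, app_id: str):
        self.app_id = app_id
        self.molecules: list[Molecule] = []

    def grow(self, m: Molecule):
        self.molecules.append(m)       # resize up by adding a molecule

    def shrink(self):
        if self.molecules:
            self.molecules.pop()       # resize down by releasing a molecule

    def capacity(self) -> int:
        return sum(m.size_bytes for m in self.molecules)

# A co-running application receives a partition tuned to its access pattern.
p = Partition("streaming_app")
p.grow(Molecule(size_bytes=8192, associativity=2, line_bytes=128))
p.grow(Molecule(size_bytes=4096, associativity=4, line_bytes=64))
print(p.capacity())  # 12288 bytes across heterogeneous molecules
```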

Journal ArticleDOI
TL;DR: Results show that the cumulative distribution function of leakage current of ISCAS'85 circuits can be predicted accurately with the error in mean and standard deviation, compared to Monte Carlo-based simulations, being less than 1% and 2% respectively across a range of voltage and temperature values.
Abstract: Artificial neural networks (ANNs) have shown great promise in modeling circuit parameters for computer-aided design applications. Leakage currents, which depend on process parameters, supply voltage, and temperature, can be modeled accurately with ANNs. However, the complex nature of the ANN model, with the standard sigmoidal activation functions, does not allow analytical expressions for its mean and variance. We propose the use of a new activation function that allows us to derive an analytical expression for the mean and a semi-analytical expression for the variance of the ANN-based leakage model. To the best of our knowledge, this is the first result in this direction. Our neural network model also includes voltage and temperature as input parameters, thereby enabling voltage- and temperature-aware statistical leakage analysis (SLA). All existing SLA frameworks are closely tied to the exponential polynomial leakage model and hence fail to work with sophisticated ANN models. In this paper, we also set up an SLA framework that can efficiently work with these ANN models. Results show that the cumulative distribution function of the leakage current of ISCAS'85 circuits can be predicted accurately, with the error in mean and standard deviation, compared to Monte Carlo-based simulations, being less than 1% and 2% respectively across a range of voltage and temperature values.

19 citations
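The paper's key move is choosing an activation function whose moments have closed forms under Gaussian process variation. The paper's actual activation function is not given here; as a stand-in illustration, the sketch below assumes a probit (Gaussian-CDF) activation, for which E[Phi(z)] = Phi(mu / sqrt(1 + sigma^2)) when z ~ N(mu, sigma^2), and checks that identity against Monte Carlo sampling.

```python
# Illustration of why activation choice matters for analytical moments.
# Assumption: a probit (Gaussian-CDF) activation, for which the expectation
# under a Gaussian pre-activation has a closed form:
#   z ~ N(mu, sigma^2)  =>  E[Phi(z)] = Phi(mu / sqrt(1 + sigma^2)).
# The paper's actual activation may differ; this is only a sketch.
from math import erf, sqrt
import numpy as np

def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

mu, sigma = 0.4, 0.9  # pre-activation mean/std under process variation

analytical = phi(mu / sqrt(1.0 + sigma**2))

rng = np.random.default_rng(0)
samples = rng.normal(mu, sigma, size=200_000)
monte_carlo = float(np.mean([phi(z) for z in samples]))

print(f"analytical={analytical:.5f}  monte_carlo={monte_carlo:.5f}")
# The two agree closely, mirroring the paper's goal of replacing Monte
# Carlo leakage statistics with analytical expressions.
```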

Proceedings ArticleDOI
01 Jan 2020
TL;DR: CORNET is a co-simulation middleware for applications involving multi-robot systems, such as networks of Unmanned Aerial Vehicle (UAV) systems; it integrates existing tools to simulate flight dynamics and network-related aspects.
Abstract: This paper describes CORNET, a co-simulation middleware for applications involving multi-robot systems such as networks of Unmanned Aerial Vehicle (UAV) systems. Designing such systems requires knowledge of the flight dynamics of UAVs and of the communication links connecting UAVs with each other or with the ground control station. Moreover, UAV networks are dynamic and distinct from other ad-hoc networks, and they require protocols that can adapt to high mobility, dynamic topology, and changing link quality on power-constrained platforms. It is therefore necessary to co-design the UAV path-planning algorithms and the communication protocols. The proposed co-simulation framework integrates existing tools to simulate flight dynamics and network-related aspects: Gazebo with the Robot Operating System (ROS) is used as the physical UAV simulator and NS-3 as the network simulator, jointly capturing the cyber-physical system (CPS) aspects of multi-UAV systems. A particular aspect we address is synchronizing time and position across the two simulation environments, and we provide APIs to allow easy migration of the algorithms to real platforms.

17 citations
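The synchronization problem the abstract highlights, keeping time and positions consistent across two independent simulators, can be illustrated with a lockstep loop. The sketch below uses stand-in classes, not the actual Gazebo, ROS, or NS-3 APIs, and not CORNET's real interface.

```python
# Minimal sketch of lockstep time synchronization between a physics
# simulator (e.g. Gazebo/ROS) and a network simulator (e.g. NS-3), the core
# idea behind a co-simulation middleware like CORNET. Stand-in classes only.
class PhysicsSim:
    def __init__(self):
        self.positions = {"uav0": (0.0, 0.0, 10.0)}

    def step(self, dt: float):
        x, y, z = self.positions["uav0"]
        self.positions["uav0"] = (x + 1.0 * dt, y, z)  # fly along x at 1 m/s

class NetworkSim:
    def step(self, dt: float, node_positions: dict):
        # A real network simulator would recompute link quality here from
        # the updated node positions.
        pass

def cosimulate(t_end: float, dt: float):
    phys, net = PhysicsSim(), NetworkSim()
    t = 0.0
    while t < t_end:
        phys.step(dt)                    # advance flight dynamics
        net.step(dt, phys.positions)     # mirror positions into the network sim
        t += dt                          # both simulators share one clock
    return phys.positions

print(cosimulate(t_end=5.0, dt=0.1))
```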

Proceedings ArticleDOI
29 Mar 2018
TL;DR: This work focuses on a new processor for a System on Chip: a 32-bit, 5-stage pipelined processor with a memory subsystem with virtual memory support, an interrupt controller, a memory error control module, and a UART.
Abstract: The emergence of System-on-Chip technology has brought opportunities in the form of reduced cycle time, superior performance, and time-to-market advantages. Our work focuses on a new processor for a System on Chip. The system has a 32-bit, 5-stage pipelined processor, a memory subsystem with virtual memory support, an interrupt controller, a memory error control module, and a UART. The processor is based on the RISC-V ISA and supports the Integer, Multiply, and Atomic instruction subsets. The memory subsystem includes split caches and translation lookaside buffers. The interrupt controller supports four levels of preemptive priority, and preemption can be programmed for individual interrupts. The memory error control module provides single error correction and double error detection for main memory. The Wishbone B.3 bus standard is adopted as the on-chip bus protocol. The design is implemented on a Virtex-7 (XC7VX485tffg1761-2) board and achieves a peak clock frequency of 100 MHz.

11 citations
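The interrupt controller's policy, four preemptive priority levels with per-interrupt preemption enables, can be modeled behaviorally. The sketch below is an illustrative Python model, not the RTL from the paper; the line count and defaults are assumptions.

```python
# Behavioral sketch of the arbitration policy described above: four
# preemptive priority levels, with preemption programmable per interrupt.
class InterruptController:
    LEVELS = 4                                   # priority 0 (highest) .. 3

    def __init__(self, num_lines: int):
        self.priority = [3] * num_lines          # programmable per line
        self.preemptible = [True] * num_lines    # preemption enable per line
        self.pending = set()
        self.active = None                       # currently serviced line

    def raise_irq(self, line: int):
        self.pending.add(line)

    def arbitrate(self):
        """Return the line to service next, or None if nothing dispatches."""
        if not self.pending:
            return None
        best = min(self.pending, key=lambda l: self.priority[l])
        if self.active is None:
            return best
        # Preempt only if enabled for the new line and strictly higher priority.
        if self.preemptible[best] and self.priority[best] < self.priority[self.active]:
            return best
        return None

ic = InterruptController(num_lines=8)
ic.priority[2] = 0         # line 2: highest priority
ic.raise_irq(5)
ic.active = 5              # line 5 is being serviced
ic.raise_irq(2)
print(ic.arbitrate())      # -> 2 preempts the active lower-priority line
```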

Proceedings ArticleDOI
23 Jul 2020
TL;DR: A high-performance general-purpose processor system, based on the open-source RISC-V instruction set architecture, with a 32-bit 5-stage pipelined core, separate 8 KB I-Cache and D-Cache, and virtual memory support.
Abstract: A processor is the core component of an electronic system. In this work, we present a high-performance general-purpose processor system based on the open-source RISC-V instruction set architecture. Our processor has a 32-bit 5-stage pipelined core with separate 8 KB I-Cache and D-Cache, and supports a virtual memory system. The processor supports the integer, atomic, and floating-point (single- and double-precision) instruction subsets of the RISC-V ISA. A nested vectored interrupt unit and a dedicated floating-point execution unit are included in the system to improve its real-time performance. To improve execution speed, a branch prediction unit and a hardware Economic Value Added (EVA) replacement policy for the I-Cache and D-Cache are implemented. The processor's performance is evaluated using CoreMark, achieving 3.32 CoreMark/MHz. The design is implemented on Xilinx's Virtex-7 (XC7VX485tffg1761-2) FPGA and has a maximum clock frequency of 60 MHz.

2 citations
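Of the speedups mentioned above, the branch prediction unit is the easiest to illustrate. The paper does not specify its predictor's organization, so the sketch below is a generic 2-bit saturating-counter predictor; the table size and indexing scheme are assumptions.

```python
# Sketch of a branch prediction unit of the kind mentioned above: a table
# of 2-bit saturating counters indexed by PC bits. Parameters illustrative.
class TwoBitPredictor:
    def __init__(self, entries: int = 1024):
        self.entries = entries
        self.table = [1] * entries        # 0..1 predict not-taken, 2..3 taken

    def _index(self, pc: int) -> int:
        return (pc >> 2) % self.entries   # drop byte offset, index by PC bits

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)   # saturate upward
        else:
            self.table[i] = max(0, self.table[i] - 1)   # saturate downward

bp = TwoBitPredictor()
for outcome in [True, True, False, True]:   # a mostly-taken loop branch
    print(bp.predict(0x400), outcome)       # prediction vs. actual outcome
    bp.update(0x400, outcome)
```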


Cited by
Journal ArticleDOI
01 May 2006
TL;DR: This paper presents CMP cooperative caching, a unified framework that manages a CMP's aggregate on-chip cache resources by forming an aggregate "shared" cache through cooperation among private caches; the scheme performs robustly over a range of system/cache sizes and memory latencies.
Abstract: This paper presents CMP Cooperative Caching, a unified framework to manage a CMP's aggregate on-chip cache resources. Cooperative caching combines the strengths of private and shared cache organizations by forming an aggregate "shared" cache through cooperation among private caches. Locally active data are attracted to the private caches by their accessing processors to reduce remote on-chip references, while globally active data are cooperatively identified and kept in the aggregate cache to reduce off-chip accesses. Examples of cooperation include cache-to-cache transfers of clean data, replication-aware data replacement, and global replacement of inactive data. These policies can be implemented by modifying an existing cache replacement policy and cache coherence protocol, or by the new implementation of a directory-based protocol presented in this paper. Our evaluation using full-system simulation shows that cooperative caching achieves an off-chip miss rate similar to that of a shared cache, and a local cache hit rate similar to that of using private caches. Cooperative caching performs robustly over a range of system/cache sizes and memory latencies. For an 8-core CMP with 1MB L2 cache per core, the best cooperative caching scheme improves the performance of multithreaded commercial workloads by 5-11% compared with a shared cache and 4-38% compared with private caches. For a 4-core CMP running multiprogrammed SPEC2000 workloads, cooperative caching is on average 11% and 6% faster than shared and private cache organizations, respectively.

377 citations
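One of the cooperation mechanisms the abstract names, cache-to-cache transfer of clean data, amounts to probing sibling private caches before paying an off-chip access. The sketch below is a deliberately simplified behavioral model, not the paper's coherence protocol.

```python
# Behavioral sketch of cooperative caching: on a local miss, probe sibling
# private caches (clean cache-to-cache transfer) before going off-chip.
class Core:
    def __init__(self):
        self.cache = {}                  # addr -> line data (a private cache)

def access(cores: list, who: int, addr: int) -> str:
    me = cores[who]
    if addr in me.cache:
        return "local hit"
    for other in cores:                  # cooperation: probe sibling caches
        if other is not me and addr in other.cache:
            me.cache[addr] = other.cache[addr]   # clean cache-to-cache transfer
            return "remote on-chip hit"
    me.cache[addr] = b"line"             # otherwise fetch off-chip
    return "off-chip miss"

cores = [Core() for _ in range(4)]
cores[1].cache[0xABC] = b"line"
print(access(cores, who=0, addr=0xABC))  # -> remote on-chip hit
print(access(cores, who=0, addr=0xDEF))  # -> off-chip miss
```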

Proceedings ArticleDOI
04 Jun 2011
TL;DR: This work presents Vantage, a novel cache partitioning technique that overcomes the limitations of existing schemes: caches can have tens of partitions with sizes specified at cache line granularity, while maintaining high associativity and strong isolation among partitions.
Abstract: Cache partitioning has a wide range of uses in CMPs, from guaranteeing quality of service and controlled sharing to security-related techniques. However, existing cache partitioning schemes (such as way-partitioning) are limited to coarse-grain allocations, can only support few partitions, and reduce cache associativity, hurting performance. Hence, these techniques can only be applied to CMPs with 2-4 cores, but fail to scale to tens of cores. We present Vantage, a novel cache partitioning technique that overcomes the limitations of existing schemes: caches can have tens of partitions with sizes specified at cache line granularity, while maintaining high associativity and strong isolation among partitions. Vantage leverages cache arrays with good hashing and associativity, which enable soft-pinning a large portion of cache lines. It enforces capacity allocations by controlling the replacement process. Unlike prior schemes, Vantage provides strict isolation guarantees by partitioning most (e.g. 90%) of the cache instead of all of it. Vantage is derived from analytical models, which allow us to provide strong guarantees and bounds on associativity and sizing independent of the number of partitions and their behaviors. It is simple to implement, requiring around 1.5% state overhead and simple changes to the cache controller. We evaluate Vantage using extensive simulations. On a 32-core system, using 350 multiprogrammed workloads and one partition per core, partitioning the last-level cache with conventional techniques degrades throughput for 71% of the workloads versus an unpartitioned cache (by 7% average, 25% maximum degradation), even when using 64-way caches. In contrast, Vantage improves throughput for 98% of the workloads, by 8% on average (up to 20%), using a 4-way cache.

229 citations
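Vantage's central idea, as stated above, is to enforce partition capacities by controlling the replacement process rather than restricting where lines may live. The sketch below captures only that skeleton; the real scheme uses analytical models and demotes lines into an unmanaged region, which this toy model omits.

```python
# Simplified sketch of enforcing partition sizes through replacement:
# evict from the partition most over its target, not from the inserter.
import random

class PartitionedCache:
    def __init__(self, targets: dict):
        self.targets = targets                    # partition -> target lines
        self.lines = []                           # list of (partition, addr)

    def size(self, part: int) -> int:
        return sum(1 for p, _ in self.lines if p == part)

    def insert(self, part: int, addr: int):
        total = sum(self.targets.values())
        if len(self.lines) >= total:
            # Pick the victim partition by overshoot above its target.
            victim_part = max(self.targets,
                              key=lambda p: self.size(p) - self.targets[p])
            candidates = [l for l in self.lines if l[0] == victim_part]
            self.lines.remove(random.choice(candidates))
        self.lines.append((part, addr))

cache = PartitionedCache(targets={0: 3, 1: 1})
for a in range(4):
    cache.insert(0, a)           # partition 0 overshoots its target...
cache.insert(1, 100)             # ...and is trimmed when partition 1 inserts
print(cache.size(0), cache.size(1))   # -> 3 1
```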

Proceedings ArticleDOI
09 Jun 2007
TL;DR: This paper develops CMP memory systems for server consolidation where most sharing occurs within Virtual Machines (VMs), and develops the central idea of imposing a two-level virtual coherence hierarchy on a physically flat CMP that harmonizes with VM assignment.
Abstract: Server consolidation is becoming an increasingly popular technique to manage and utilize systems. This paper develops CMP memory systems for server consolidation where most sharing occurs within Virtual Machines (VMs). Our memory systems maximize shared memory accesses serviced within a VM, minimize interference among separate VMs, facilitate dynamic reassignment of VMs to processors and memory, and support content-based page sharing among VMs. We begin with a tiled architecture where each of 64 tiles contains a processor, private L1 caches, and an L2 bank. First, we reveal why single-level directory designs fail to meet workload consolidation goals. Second, we develop the paper's central idea of imposing a two-level virtual (or logical) coherence hierarchy on a physically flat CMP that harmonizes with VM assignment. Third, we show that the best of our two virtual hierarchy (VH) variants performs 12-58% better than the best alternative flat directory protocol when consolidating Apache, OLTP, and Zeus commercial workloads on our simulated 64-core CMP.

219 citations
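The two-level lookup the abstract describes can be pictured as: try a VM-local home tile first, and escalate to a global directory only on a local miss. The sketch below is purely illustrative; the directory structures and tile assignment are hypothetical, not the paper's protocol.

```python
# Sketch of a two-level virtual coherence hierarchy: a level-1 (VM-local)
# directory lookup, falling back to a level-2 (global) directory.
def lookup(addr: int, vm_tiles: list, l1_dir: dict, l2_dir: dict) -> str:
    home_l1 = vm_tiles[addr % len(vm_tiles)]      # VM-local home tile
    if (home_l1, addr) in l1_dir:
        return f"resolved within VM at tile {home_l1}"
    if addr in l2_dir:
        return f"resolved by global directory at tile {l2_dir[addr]}"
    return "fetched from memory"

vm0_tiles = [0, 1, 2, 3]              # tiles currently assigned to VM 0
l1_dir = {(3, 0x2B): "Shared"}        # line 0x2B cached within VM 0
l2_dir = {0x7F: 42}                   # line 0x7F owned by tile 42 elsewhere
print(lookup(0x2B, vm0_tiles, l1_dir, l2_dir))   # resolved within the VM
print(lookup(0x7F, vm0_tiles, l1_dir, l2_dir))   # escalates to global level
```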

Proceedings ArticleDOI
06 Mar 2009
TL;DR: These mechanisms allow the hardware and OS to dynamically manage cache capacity per thread as well as optimize placement of data shared by multiple threads.
Abstract: In future multi-cores, large amounts of delay and power will be spent accessing data in large L2/L3 caches. It has been recently shown that OS-based page coloring allows a non-uniform cache architecture (NUCA) to provide low latencies and not be hindered by complex data search mechanisms. In this work, we extend that concept with mechanisms that dynamically move data within caches. The key innovation is the use of a shadow address space to allow hardware control of data placement in the L2 cache while being largely transparent to the user application and off-chip world. These mechanisms allow the hardware and OS to dynamically manage cache capacity per thread as well as optimize placement of data shared by multiple threads. We show an average IPC improvement of 10-20% for multi-programmed workloads with capacity allocation policies and an average IPC improvement of 8% for multi-threaded workloads with policies for shared page placement.

99 citations
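The OS-based page coloring that this work builds on maps physical pages to cache regions through the index bits shared by the page number. The sketch below shows only that generic mechanism, not the paper's shadow-address-space hardware; the constants are illustrative.

```python
# Sketch of OS page coloring: the cache-index bits contained in the physical
# page number define a page "color", so the allocator can pick a free page
# of a requested color to steer data to a given L2 region.
PAGE_SHIFT = 12          # 4 KB pages
NUM_COLORS = 64          # distinct colors spanned by the page-number bits

def page_color(phys_addr: int) -> int:
    return (phys_addr >> PAGE_SHIFT) % NUM_COLORS

def alloc_page(free_pages: list, want_color: int) -> int:
    """Allocate a free physical page of the requested color."""
    for p in free_pages:
        if page_color(p) == want_color:
            free_pages.remove(p)
            return p
    raise MemoryError("no free page of requested color")

free = [n << PAGE_SHIFT for n in range(256)]
p = alloc_page(free, want_color=5)
print(hex(p), page_color(p))    # -> 0x5000 5
```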

Proceedings ArticleDOI
01 Feb 2018
TL;DR: KPart is presented, a hybrid cache partitioning-sharing technique that sidesteps the limitations of way-partitioning and unlocks significant performance on current systems, and achieves most of the performance of more advanced partitioning techniques that are not yet available in hardware.
Abstract: Cache partitioning is now available in commercial hardware. In theory, software can leverage cache partitioning to use the last-level cache better and improve performance. In practice, however, current systems implement way-partitioning, which offers a limited number of partitions and often hurts performance. These limitations squander the performance potential of smart cache management. We present KPart, a hybrid cache partitioning-sharing technique that sidesteps the limitations of way-partitioning and unlocks significant performance on current systems. KPart first groups applications into clusters, then partitions the cache among these clusters. To build clusters, KPart relies on a novel technique to estimate the performance loss an application suffers when sharing a partition. KPart automatically chooses the number of clusters, balancing the isolation benefits of way-partitioning with its potential performance impact. KPart uses detailed profiling information to make these decisions. This information can be gathered either offline, or online at low overhead using a novel profiling mechanism. We evaluate KPart in a real system and in simulation. KPart improves throughput by 24% on average (up to 79%) on an Intel Broadwell-D system, whereas prior per-application partitioning policies improve throughput by just 1.7% on average and hurt 30% of workloads. Simulation results show that KPart achieves most of the performance of more advanced partitioning techniques that are not yet available in hardware.

96 citations
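After KPart groups applications into clusters, the cache must be divided among those clusters. A classic way to size partitions, sketched below under the assumption of known per-cluster miss curves, is to hand out ways greedily by marginal utility; the clustering step and profiling machinery of KPart itself are not reproduced here.

```python
# Sketch of partition sizing by marginal utility: give each next way to the
# cluster whose miss count drops the most. Miss curves are assumed inputs
# (misses as a function of allocated ways), not KPart's actual profiles.
def allocate_ways(miss_curves: dict, total_ways: int) -> dict:
    alloc = {c: 1 for c in miss_curves}          # every cluster gets 1 way
    for _ in range(total_ways - len(alloc)):
        # Marginal utility of one more way = drop in misses it would buy.
        best = max(miss_curves,
                   key=lambda c: miss_curves[c][alloc[c] - 1]
                               - miss_curves[c][alloc[c]])
        alloc[best] += 1
    return alloc

curves = {
    "clusterA": [100, 60, 40, 35, 33, 32, 32, 32],   # cache-friendly
    "clusterB": [80, 78, 77, 77, 77, 77, 77, 77],    # streaming-like
}
print(allocate_ways(curves, total_ways=8))   # most ways go to clusterA
```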