scispace - formally typeset
Search or ask a question
Author

Neel Gala

Bio: Neel Gala is an academic researcher from Indian Institute of Technology Madras. The author has contributed to research in topics: Overhead (computing) & Microarchitecture. The author has an hindex of 5, co-authored 18 publications receiving 117 citations. Previous affiliations of Neel Gala include Indian Institutes of Technology & Texas Instruments.

Papers
More filters
Proceedings ArticleDOI
25 Jun 2017
TL;DR: This work presents a unified hardware framework for handling spatial and temporal memory attacks with a RISC-V based micro-architecture with an enhanced application binary interface that enables software layers to use these features to protect sensitive data.
Abstract: With increased usage of compute cores for sensitive applications, including e-commerce, there is a need to provide additional hardware support for securing information from memory based attacks. This work presents a unified hardware framework for handling spatial and temporal memory attacks. The paper integrates the proposed hardware framework with a RISC-V based micro-architecture with an enhanced application binary interface that enables software layers to use these features to protect sensitive data. We demonstrate the effectiveness of the proposed scheme through practical case studies in addition to taking the design through a VLSI CAD design flow. The proposed processor reduces the metadata storage overhead up to 4 x in comparison with the existing solutions, while incurring an area overhead of just 1914 LUTs and 2197 flip flops on an FPGA, without affecting the critical path delay of the processor.

39 citations

Proceedings ArticleDOI
22 Nov 2015
TL;DR: SHAKTI-F, a RISC-V based SEE-tolerant micro-processor architecture that provides a solution to the reliability issues mentioned above, is presented and there is a 45% reduction in power consumption due to introduction of fault tolerance.
Abstract: Deeply scaled CMOS circuits are vulnerable to soft and hard errors. These errors pose reliability concerns, especially for systems used in radiation-prone environments like space and nuclear applications. This paper presents SHAKTI-F, a RISC-V based SEE-tolerant micro-processor architecture that provides a solution to the reliability issues mentioned above. The proposed architecture uses error correcting codes (ECC) to tolerate errors in registers and memories, while it employs a combination of space and time redundancy based techniques to tolerate errors in the ALU. Two novel re-computation techniques for detecting errors for the addition/subtraction and multiplication modules are proposed. The scheme also identifies parts of the circuitry that need to be radiation hardened thus providing a total protection to SEEs. The proposed scheme provides fine-grain error detection capability that help in localization of the error to a specific functional unit and isolating the same, rather than the entire processor or a large module within a processor. This provides a graceful degradation and/or fail-safe shutdown capability to the processor. The HDL model of the processor was validated by simulating it with randomly induced SEEs. The proposed scheme adds an extra penalty of only 20% on the core area and 25% penalty on the performance when compared with conventional systems. This is very less when compared to the penalty incurred by employing schemes including double modular and triple modular redundancy. Interestingly, there is a 45% reduction in power consumption due to introduction of faulttolerance. The resulting system runs at 330M Hz on a 55nm technology node, which is sufficient for the class of applications these cores are utilized for.

31 citations

Proceedings ArticleDOI
04 Jan 2016
TL;DR: This workshop plans to talk about the various designs that the SHAKTI initiative is working on and how they can be used, and the need for open-source hardware design and the rationale behind the choices of ISA and HDL.
Abstract: State-of-the-Art Computer Architecture research at Indian Academic Institutions is majorly restricted due to unavailability of processor design models that are close to current commercially available cores. The SHAKTI processor initiative aims at breaking this barrier between Academia and Industry by providing open-source Processor and SoC designs. With the advent of RISC-V ISA by UC Berkeley, we have a simple, clean and most importantly an open source ISA that can be used to design processors which have the potential to match the current day processors in the market. The initiative is also driven to provide substantial information about design decisions and promote more competitive learning environment in academia. The processors from SHAKTI will help in aiding research related to architecture, where one can run simulations on the actual hardware and obtain much accurate results, rather than settling with a lesser accurate software simulation. Since these processors and designs are targeted for real-world use, they can also be freely adopted by industries, thereby supporting the initiative further in terms of product-driven research.In this workshop we plan to talk about the various designs that we are working on and how they can be used. The proposed tutorial consists of four parts. The first part emphasizes on the need for open-source hardware design and the rationale behind our choices of ISA and HDL. The second part covers about our microcontroller class (C-class) core and the verification and debug environment. The third part presents our flagship processor which is the industrial class (I-class) core; the various design decisions involved and some performance metrics. The final part concludes on the note of future work and upcoming releases under SHAKTI.

31 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This work proposes PEASE, a Programmable Event-driven processor Architecture for SNN Evaluation, a method to map any given SNN to PEASE such that the workload is balanced across SPUs and SPU clusters, while pipeling across layers of the network to improve performance.
Abstract: Spiking neural networks (SNNs) represent the third generation of neural networks and are expected to enable new classes of machine learning applications. However, evaluating large-scale SNNs (e.g., of the scale of the visual cortex) on power-constrained systems requires significant improvements in computing efficiency. A unique attribute of SNNs is their event-driven nature—information is encoded as a series of spikes, and work is dynamically generated as spikes propagate through the network. Therefore, parallel implementations of SNNs on multi-cores and GPGPUs are severely limited by communication and synchronization overheads. Recent years have seen great interest in deep learning accelerators for non-spiking neural networks, however, these architectures are not well suited to the dynamic, irregular parallelism in SNNs. Prior efforts on specialized SNN hardware utilize spatial architectures, wherein each neuron is allocated a dedicated processing element, and large networks are realized by connecting multiple chips into a system. While suitable for large-scale systems, this approach is not a good match to size or cost constrained mobile devices. We propose PEASE, a Programmable Event-driven processor Architecture for SNN Evaluation. PEASE comprises of Spike Processing Units (SPUs) that are dynamically scheduled to execute computations triggered by a spike. Instructions to the SPUs are dynamically generated by Spike Schedulers (SSs) that utilize event queues to track unprocessed spikes and identify neurons that need to be evaluated. The memory hierarchy in PEASE is fully software managed, and the processing elements are interconnected using a two-tiered bus-ring topology matching the communication characteristics of SNNs. We propose a method to map any given SNN to PEASE such that the workload is balanced across SPUs and SPU clusters, while pipelining across layers of the network to improve performance. We implemented PEASE at the RTL level and synthesized it to IBM 45 technology. Across 6 SNN benchmarks, our 64-SPU configuration of PEASE achieves 7.1×−17.5× and 2.6×−5.8× speedups, respectively, over software implementations on an Intel Xeon E5-2680 CPU and NVIDIA Tesla K40C GPU. The energy reductions over the CPU and GPU are 71×−179× and 198×−467×, respectively.

25 citations

Journal ArticleDOI
TL;DR: This article provides insights on how the Single-Precision Floating Point (“F”) extension of RISC-V can be leveraged to support posit arithmetic and presents the implementation details of a parameterized and feature-complete posit Floating Point Unit (FPU).
Abstract: Owing to the failure of Dennard’s scaling, the past decade has seen a steep growth of prominent new paradigms leveraging opportunities in computer architecture. Two technologies of interest are Posit and RISC-V. Posit was introduced in mid-2017 as a viable alternative to IEEE-754, and RISC-V provides a commercial-grade open source Instruction Set Architecture (ISA). In this article, we bring these two technologies together and propose a Configurable Posit Enabled RISC-V Core called PERI. The article provides insights on how the Single-Precision Floating Point (“F”) extension of RISC-V can be leveraged to support posit arithmetic. We also present the implementation details of a parameterized and feature-complete posit Floating Point Unit (FPU). The configurability and the parameterization features of this unit generate optimal hardware, which caters to the accuracy and energy/area tradeoffs imposed by the applications, a feature not possible with IEEE-754 implementation. The posit FPU has been integrated with the RISC-V compliant SHAKTI C-class core as an execution unit. To further leverage the potential of posit, we enhance our posit FPU to support two different exponent sizes (with posit-size being 32-bits), thereby enabling multiple-precision at runtime. To enable the compilation and execution of C programs on PERI, we have made minimal modifications to the GNU C Compiler (GCC), targeting the “F” extension of the RISC-V. We compare posit with IEEE-754 in terms of hardware area, application accuracy, and runtime. We also present an alternate methodology of integrating the posit FPU with the RISC-V core as an accelerator using the custom opcode space of RISC-V.

11 citations


Cited by
More filters
Journal ArticleDOI
Florian Zaruba1, Luca Benini1
TL;DR: A thorough power, performance, and efficiency analysis of the RISC-V ISA targeting baseline “application class” functionality, i.e., supporting the Linux OS and its application environment based on the authors' open-source single-issue in-order implementation of the 64-bit ISA variant (RV64GC) called Ariane.
Abstract: The open-source RISC-V instruction set architecture (ISA) is gaining traction, both in industry and academia. The ISA is designed to scale from microcontrollers to server-class processors. Furthermore, openness promotes the availability of various open-source and commercial implementations. Our main contribution in this paper is a thorough power, performance, and efficiency analysis of the RISC-V ISA targeting baseline “application class” functionality, i.e., supporting the Linux OS and its application environment based on our open-source single-issue in-order implementation of the 64-bit ISA variant (RV64GC) called Ariane. Our analysis is based on a detailed power and efficiency analysis of the RISC-V ISA extracted from silicon measurements and calibrated simulation of an Ariane instance (RV64IMC) taped-out in GlobalFoundries 22FDX technology. Ariane runs at up to 1.7-GHz, achieves up to 40-Gop/sW energy efficiency, which is superior to similar cores presented in the literature. We provide insight into the interplay between functionality required for the application-class execution (e.g., virtual memory, caches, and multiple modes of privileged operation) and energy cost. We also compare Ariane with RISCY, a simpler and a slower microcontroller-class core. Our analysis confirms that supporting application-class execution implies a nonnegligible energy-efficiency loss and that compute performance is more cost-effectively boosted by instruction extensions (e.g., packed SIMD) rather than the high-frequency operation.

195 citations

Journal ArticleDOI
TL;DR: In this article, the authors present the state of the art of hardware implementations of spiking neural networks and the current trends in algorithm elaboration from model selection to training mechanisms, and describe the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
Abstract: Neuromorphic computing is henceforth a major research field for both academic and industrial actors. As opposed to Von Neumann machines, brain-inspired processors aim at bringing closer the memory and the computational elements to efficiently evaluate machine learning algorithms. Recently, spiking neural networks, a generation of cognitive algorithms employing computational primitives mimicking neuron and synapse operational principles, have become an important part of deep learning. They are expected to improve the computational performance and efficiency of neural networks, but they are best suited for hardware able to support their temporal dynamics. In this survey, we present the state of the art of hardware implementations of spiking neural networks and the current trends in algorithm elaboration from model selection to training mechanisms. The scope of existing solutions is extensive; we thus present the general framework and study on a case-by-case basis the relevant particularities. We describe the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level and discuss their related advantages and challenges.

112 citations

Journal ArticleDOI
TL;DR: This survey presents the state of the art of hardware implementations of spiking neural networks and the current trends in algorithm elaboration from model selection to training mechanisms and describes the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
Abstract: Neuromorphic computing is henceforth a major research field for both academic and industrial actors. As opposed to Von Neumann machines, brain-inspired processors aim at bringing closer the memory and the computational elements to efficiently evaluate machine-learning algorithms. Recently, Spiking Neural Networks, a generation of cognitive algorithms employing computational primitives mimicking neuron and synapse operational principles, have become an important part of deep learning. They are expected to improve the computational performance and efficiency of neural networks, but are best suited for hardware able to support their temporal dynamics. In this survey, we present the state of the art of hardware implementations of spiking neural networks and the current trends in algorithm elaboration from model selection to training mechanisms. The scope of existing solutions is extensive; we thus present the general framework and study on a case-by-case basis the relevant particularities. We describe the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level and discuss their related advantages and challenges.

73 citations

Proceedings ArticleDOI
30 May 2020
TL;DR: Xuantie-910 is an industry leading 64-bit high performance embedded RISC-V processor from Alibaba T-Head division that features custom extensions to arithmetic operation, bit manipulation, load and store, TLB and cache operations, and implements the 0.7.1 stable release of RISCV vector extension specification for high efficiency vector processing.
Abstract: The open source RISC-V ISA has been quickly gaining momentum. This paper presents Xuantie-910, an industry leading 64-bit high performance embedded RISC-V processor from Alibaba T-Head division. It is fully based on the RV64GCV instruction set and it features custom extensions to arithmetic operation, bit manipulation, load and store, TLB and cache operations. It also implements the 0.7.1 stable release of RISC-V vector extension specification for high efficiency vector processing. Xuantie-910 supports multi-core multi-cluster SMP with cache coherence. Each cluster contains 1 to 4 core(s) capable of booting the Linux operating system. Each single core utilizes the state-of-the-art 12-stage deep pipeline, out-of-order, multi-issue superscalar architecture, achieving a maximum clock frequency of 2.5 GHz in the typical process, voltage and temperature condition in a TSMC 12nm FinFET process technology. Each single core with the vector execution unit costs an area of 0.8 mm2 (excluding the L2 cache). The toolchain is enhanced significantly to support the vector extension and custom extensions. Through hardware and toolchain co-optimization, to date Xuantie-910 delivers the highest performance (in terms of IPC, speed, and power efficiency) for a number of industrial control flow and data computing benchmarks, when compared with its predecessors in the RISC-V family. Xuantie-910 FPGA implementation has been deployed in the data centers of Alibaba Cloud, for application-specific acceleration (e.g., blockchain transaction). The ASIC deployment at low-cost SoC applications, such as IoT endpoints and edge computing, is planned to facilitate Alibaba's end-to-end and cloud-to-edge computing infrastructure.

55 citations

Journal ArticleDOI
TL;DR: The development of compact e-nose design and calculation over the last few decades is reviewed, possible future trends are discussed, and the development of on-chip calculation and wireless computing is focused on.
Abstract: An electronic nose (e-nose) is a measuring instrument that mimics human olfaction and outputs ‘fingerprint’ information of mixed gases or odors. Generally speaking, an e-nose is mainly composed of two parts: a gas sensing system (gas sensor arrays, gas transmission paths) and an information processing system (microprocessor and related hardware, pattern recognition algorithms). It has been more than 30 years since the e-nose concept was introduced in the 1980s. Since then, e-noses have evolved from being large in size, expensive, and power-hungry instruments to portable, low cost devices with low power consumption. This paper reviews the development of compact e-nose design and calculation over the last few decades, and discusses possible future trends. Regarding the compact e-nose design, which is related to its size and weight, this paper mainly summarizes the development of sensor array design, hardware circuit design, gas path (i.e. the path through which the mixed gases to be measured flow inside the e-nose system) and sampling design, as well as portable design. For the compact e-nose calculation, which is directly related to its rapidity of detection, this review focuses on the development of on-chip calculation and wireless computing. The future trends of compact e-noses include the integration with the internet of things, wearable e-noses, and mobile e-nose systems.

47 citations