scispace - formally typeset
Search or ask a question
Author

Gautham N. Chinya

Bio: Gautham N. Chinya is an academic researcher from Intel. The author has contributed to research in topics: Thread (computing) & Instruction set. The author has an hindex of 20, co-authored 80 publications receiving 2662 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: Loihi is a 60-mm2 chip fabricated in Intels 14-nm process that advances the state-of-the-art modeling of spiking neural networks in silicon, and can solve LASSO optimization problems with over three orders of magnitude superior energy-delay-product compared to conventional solvers running on a CPU iso-process/voltage/area.
Abstract: Loihi is a 60-mm2 chip fabricated in Intels 14-nm process that advances the state-of-the-art modeling of spiking neural networks in silicon. It integrates a wide range of novel features for the field, such as hierarchical connectivity, dendritic compartments, synaptic delays, and, most importantly, programmable synaptic learning rules. Running a spiking convolutional form of the Locally Competitive Algorithm, Loihi can solve LASSO optimization problems with over three orders of magnitude superior energy-delay-product compared to conventional solvers running on a CPU iso-process/voltage/area. This provides an unambiguous example of spike-based computation, outperforming all known conventional solutions.

2,331 citations

Proceedings ArticleDOI
10 Jun 2007
TL;DR: Exoskeleton Sequencer (EXO), an architecture to represent heterogeneous accelerators as ISA-based MIMD architecture resources, and C for Heterogeneous Integration (CHI), an integrated C/C++ programming environment that supports accelerator-specific inline assembly and domain-specific languages are presented.
Abstract: Future mainstream microprocessors will likely integrate specialized accelerators, such as GPUs, onto a single die to achieve better performance and power efficiency. However, it remains a keen challenge to program such a heterogeneous multicore platform, since these specialized accelerators feature ISAs and functionality that are significantly different from the general purpose CPU cores. In this paper, we present EXOCHI: (1) Exoskeleton Sequencer(EXO), an architecture to represent heterogeneous acceleratorsas ISA-based MIMD architecture resources, and a shared virtual memory heterogeneous multithreaded program execution model that tightly couples specialized accelerator cores with generalpurpose CPU cores, and (2) C for Heterogeneous Integration(CHI), an integrated C/C++ programming environment that supports accelerator-specific inline assembly and domain-specific languages. The CHI compiler extends the OpenMP pragma for heterogeneous multithreading programming, and produces a single fat binary with code sections corresponding to different instruction sets. The runtime can judiciously spread parallel computation across the heterogeneous cores to optimize performance and power.We have prototyped the EXO architecture on a physical heterogeneous platform consisting of an Intel® Core™ 2 Duo processor and an 8-core 32-thread Intel® Graphics Media Accelerator X3000. In addition, we have implemented the CHI integrated programming environment with the Intel® C++ Compiler, runtime toolset, and debugger. On the EXO prototype system, we have enhanced a suite of production-quality media kernels for video and image processing to utilize the accelerator through the CHI programming interface, achieving significant speedup (1.41X to10.97X) over execution on the IA32 CPU alone.

169 citations

Journal ArticleDOI
TL;DR: The authors present the Loihi toolchain, which consists of an intuitive Python-based API for specifying SNNs, a compiler and runtime for building and executing SNN’s on LoihI, and several target platforms (Loihi silicon, FPGA, and functional simulator).
Abstract: Loihi is Intel’s novel, manycore neuromorphic processor and is the first of its kind to feature a microcode-programmable learning engine that enables on-chip training of spiking neural networks (SNNs). The authors present the Loihi toolchain, which consists of an intuitive Python-based API for specifying SNNs, a compiler and runtime for building and executing SNNs on Loihi, and several target platforms (Loihi silicon, FPGA, and functional simulator). To showcase the toolchain, the authors describe how to build, train, and use a SNN to classify handwritten digits from the MNIST database.

108 citations

Patent
30 Jun 2005
TL;DR: In this paper, a method for managing user-level threads on a first instruction sequencer in response to executing user level instructions on a second instruction sequencers that is under control of an application level program is presented.
Abstract: In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application level program. A first user-level thread is run on the second instruction sequencer and contains one or more user level instructions. A first user level instruction has at least 1) a field that makes reference to one or more instruction sequencers or 2) implicitly references with a pointer to code that specifically addresses one or more instruction sequencers when the code is executed.

97 citations

Patent
21 Mar 2014
TL;DR: In this article, a memory management unit (MMU) has entries to store virtual address to physical address translations, where each entry includes a location indicator to indicate whether a memory location for the corresponding entry is present in a local or remote memory.
Abstract: In one embodiment, the present invention includes a memory management unit (MMU) having entries to store virtual address to physical address translations, where each entry includes a location indicator to indicate whether a memory location for the corresponding entry is present in a local or remote memory. In this way, a common virtual memory space can be shared between the two memories, which may be separated by one or more non-coherent links. Other embodiments are described and claimed.

91 citations


Cited by
More filters
Proceedings ArticleDOI
20 Feb 2008
TL;DR: This work discusses the GeForce 8800 GTX processor's organization, features, and generalized optimization strategies, and achieves increased performance by reordering accesses to off-chip memory to combine requests to the same or contiguous memory locations and apply classical optimizations to reduce the number of executed operations.
Abstract: GPUs have recently attracted the attention of many application developers as commodity data-parallel coprocessors. The newest generations of GPU architecture provide easier programmability and increased generality while maintaining the tremendous memory bandwidth and computational power of traditional GPUs. This opportunity should redirect efforts in GPGPU research from ad hoc porting of applications to establishing principles and strategies that allow efficient mapping of computation to graphics hardware. In this work we discuss the GeForce 8800 GTX processor's organization, features, and generalized optimization strategies. Key to performance on this platform is using massive multithreading to utilize the large number of cores and hide global memory latency. To achieve this, developers face the challenge of striking the right balance between each thread's resource usage and the number of simultaneously active threads. The resources to manage include the number of registers and the amount of on-chip memory used per thread, number of threads per multiprocessor, and global memory bandwidth. We also obtain increased performance by reordering accesses to off-chip memory to combine requests to the same or contiguous memory locations and apply classical optimizations to reduce the number of executed operations. We apply these strategies across a variety of applications and domains and achieve between a 10.5X to 457X speedup in kernel codes and between 1.16X to 431X total application speedup.

993 citations

Journal ArticleDOI
27 Nov 2019-Nature
TL;DR: An overview of the developments in neuromorphic computing for both algorithms and hardware is provided and the fundamentals of learning and hardware frameworks are highlighted, with emphasis on algorithm–hardware codesign.
Abstract: Guided by brain-like ‘spiking’ computational frameworks, neuromorphic computing—brain-inspired computing for machine intelligence—promises to realize artificial intelligence while reducing the energy requirements of computing platforms. This interdisciplinary field began with the implementation of silicon circuits for biological neural routines, but has evolved to encompass the hardware implementation of algorithms with spike-based encoding and event-driven representations. Here we provide an overview of the developments in neuromorphic computing for both algorithms and hardware and highlight the fundamentals of learning and hardware frameworks. We discuss the main challenges and the future prospects of neuromorphic computing, with emphasis on algorithm–hardware codesign. The authors review the advantages and future prospects of neuromorphic computing, a multidisciplinary engineering concept for energy-efficient artificial intelligence with brain-inspired functionality.

877 citations

Journal ArticleDOI
TL;DR: This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras.
Abstract: Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of is), very high dynamic range (140dB vs. 60dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.

697 citations

Journal ArticleDOI
TL;DR: In this paper, the spin degree of freedom of electrons and/or holes, which can also interact with their orbital moments, is described with respect to the spin generation methods as detailed in Sections 2-~-9.

614 citations

Journal ArticleDOI
01 Jul 2018
TL;DR: This Review Article examines the development of organic neuromorphic devices, considering the different switching mechanisms used in the devices and the challenges the field faces in delivering neuromorphic computing applications.
Abstract: Neuromorphic computing could address the inherent limitations of conventional silicon technology in dedicated machine learning applications. Recent work on silicon-based asynchronous spiking neural networks and large crossbar arrays of two-terminal memristive devices has led to the development of promising neuromorphic systems. However, delivering a compact and efficient parallel computing technology that is capable of embedding artificial neural networks in hardware remains a significant challenge. Organic electronic materials offer an attractive option for such systems and could provide biocompatible and relatively inexpensive neuromorphic devices with low-energy switching and excellent tunability. Here, we review the development of organic neuromorphic devices. We consider different resistance-switching mechanisms, which typically rely on electrochemical doping or charge trapping, and report approaches that enhance state retention and conductance tuning. We also discuss the challenges the field faces in implementing low-power neuromorphic computing, such as device downscaling and improving device speed. Finally, we highlight early demonstrations of device integration into arrays, and consider future directions and potential applications of this technology.

568 citations