
Showing papers by "Rathinakumar Appuswamy" published in 2020


Proceedings Article
30 Apr 2020
TL;DR: This work introduces a novel means to estimate and scale the task loss gradient at each weight and activation layer's quantizer step size, such that it can be learned in conjunction with other network parameters.
Abstract: Deep networks run with low-precision operations at inference time offer power and space advantages over high-precision alternatives, but must overcome the challenge of maintaining high accuracy as precision decreases. Here, we present a method for training such networks, Learned Step Size Quantization, that achieves the highest accuracy to date on the ImageNet dataset when using models, from a variety of architectures, with weights and activations quantized to 2, 3, or 4 bits of precision, and that can train 3-bit models that reach full-precision baseline accuracy. Our approach builds upon existing methods for learning weights in quantized networks by improving how the quantizer itself is configured. Specifically, we introduce a novel means to estimate and scale the task loss gradient at each weight and activation layer's quantizer step size, such that it can be learned in conjunction with other network parameters. This approach can work with different levels of precision as needed for a given system and requires only a simple modification of existing training code.

303 citations
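To make the quantizer concrete, the sketch below renders the described scheme in NumPy: a forward pass that scales, clips, and rounds values with a learnable step size, and the step-size gradient under a straight-through estimator for the rounding. The function names are illustrative, and the gradient scale and initialization follow the LSQ paper's formulation as commonly reported; treat this as a sketch under those assumptions, not the authors' code.

```python
import numpy as np

def lsq_quantize(v, s, q_n, q_p):
    """Forward pass: scale by the learned step size s, clip to the
    integer level range [-q_n, q_p], round, and rescale."""
    v_bar = np.clip(v / s, -q_n, q_p)
    return np.round(v_bar) * s

def lsq_step_size_grad(v, s, q_n, q_p):
    """Gradient of the quantizer output w.r.t. s, treating round()
    as the identity (straight-through estimator)."""
    v_bar = v / s
    inside = (v_bar > -q_n) & (v_bar < q_p)
    # Inside the clip range the derivative is round(v/s) - v/s;
    # outside, it is the clip boundary itself (-q_n or q_p).
    grad = np.where(inside, np.round(v_bar) - v_bar,
                    np.clip(v_bar, -q_n, q_p))
    # LSQ scales the step-size gradient by 1/sqrt(N * q_p) so its
    # magnitude stays commensurate with the weight gradients.
    return grad / np.sqrt(v.size * q_p)

# Example: quantize weights to 3 bits (q_n = 4, q_p = 3 for signed values).
w = np.random.randn(1024).astype(np.float32)
s = 2.0 * np.abs(w).mean() / np.sqrt(3)  # initialization suggested in the paper
w_q = lsq_quantize(w, s, q_n=4, q_p=3)
```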


Patent
02 Apr 2020
TL;DR: In this patent, an array of neural cores is adapted to compute, in parallel, an output activation tensor of a neural network layer, with a network operatively connected to each of the neural cores.
Abstract: Parallel processing among arrays of physical neural cores is provided. An array of neural cores is adapted to compute, in parallel, an output activation tensor of a neural network layer. A network is operatively connected to each of the neural cores. The output activation tensor is distributed across the neural cores. An input activation tensor is distributed across the neural cores. A weight tensor is distributed across the neural cores. Each neural core's computation comprises multiplying elements of a portion of the input activation tensor at that core with elements of a portion of the weight tensor at that core, and storing the summed products in a partial sum corresponding to an element of the output activation tensor. Each element of the output activation tensor is computed by accumulating all of the partial sums corresponding to that element via the network. The partial sums for each element of the output activation tensor are computed in a sequence of steps whose order is described by tracing a path through the weight tensor that visits every weight tensor element that contributes to any partial sum.

1 citation
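A minimal sketch of the partial-sum scheme, assuming a fully connected layer and modeling each core as holding one slice of the input activation tensor and one slice of the weight tensor; the shapes, function name, and use of NumPy are illustrative, not taken from the patent.

```python
import numpy as np

def parallel_matvec(weights, x, num_cores):
    """Illustrative model of the scheme for y = W @ x: the input
    dimension is split across cores, each core multiplies its weight
    slice with its input slice, and the per-core partial sums are
    accumulated (the on-chip network's role) into each output element."""
    w_slices = np.array_split(weights, num_cores, axis=1)  # weight tensor distributed
    x_slices = np.array_split(x, num_cores)                # input activations distributed
    # Each core's computation: products of its slices, summed into one
    # partial sum per output element.
    partials = [w @ xs for w, xs in zip(w_slices, x_slices)]
    # Accumulating all partial sums for each element yields the output
    # activation tensor.
    return np.sum(partials, axis=0)

W = np.random.randn(8, 32)
x = np.random.randn(32)
assert np.allclose(parallel_matvec(W, x, num_cores=4), W @ x)
```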


Patent
16 Jan 2020
TL;DR: In this patent, a plurality of distributed neural cores is provided with hierarchical parallelism, and each of the neural cores is assigned a subset of the output activations of a layer of a neural network.
Abstract: Networks of distributed neural cores are provided with hierarchical parallelism. In various embodiments, a plurality of neural cores is provided. Each of the plurality of neural cores comprises a plurality of vector compute units configured to operate in parallel. Each of the plurality of neural cores is configured to compute in parallel output activations by applying its plurality of vector compute units to input activations. Each of the plurality of neural cores is assigned a subset of output activations of a layer of a neural network for computation. Upon receipt of a subset of input activations of the layer of the neural network, each of the plurality of neural cores computes a partial sum for each of its assigned output activations, and computes its assigned output activations from at least the computed partial sums.
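A rough sketch of the two-level parallelism, again for a fully connected layer: the outer split assigns each core a subset of output activations, and the inner split stands in for the vector compute units producing partial sums within a core. All names and shapes are illustrative assumptions.

```python
import numpy as np

def hierarchical_layer(weights, x, num_cores, units_per_core):
    """Two-level parallelism: each core owns a subset of output
    activations; within a core, each vector compute unit produces a
    partial sum over its slice of the input activations."""
    outputs = []
    for w_core in np.array_split(weights, num_cores, axis=0):  # core level: output subsets
        x_slices = np.array_split(x, units_per_core)
        w_slices = np.array_split(w_core, units_per_core, axis=1)
        # Unit level: each vector compute unit computes a partial sum.
        partials = [w @ xs for w, xs in zip(w_slices, x_slices)]
        outputs.append(np.sum(partials, axis=0))  # core combines its partial sums
    return np.concatenate(outputs)

W = np.random.randn(8, 32)
x = np.random.randn(32)
assert np.allclose(hierarchical_layer(W, x, num_cores=4, units_per_core=2), W @ x)
```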

Patent
12 May 2020
TL;DR: In this patent, a neurosynaptic system is described comprising a delay unit for receiving and buffering axonal inputs, and a neural computation unit for generating neuronal outputs by performing a set of computations based on at least one axonal input received by the delay unit.
Abstract: Embodiments of the invention provide a neurosynaptic system comprising a delay unit for receiving and buffering axonal inputs, and a neural computation unit for generating neuronal outputs by performing a set of computations based on at least one axonal input received by the delay unit. The system further comprises a permutation unit for receiving external inputs to the system, and transmitting external outputs from the system. The permutation unit maps each external input received as either an axonal input to the delay unit or an external output from the system. The permutation unit maps each neuronal output generated by the neural computation unit as either an axonal input to the delay unit or an external output from the system. The neural computation unit comprises multiple electronic neurons, multiple electronic axons, and a plurality of electronic synapse devices interconnecting the neurons with the axons.
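The routing behavior of the permutation unit can be sketched as a simple table that sends each signal either into the delay unit or out of the system; the class name, signal encoding, and use of a deque as a stand-in delay buffer are assumptions for illustration only.

```python
from collections import deque

class PermutationUnit:
    """Illustrative routing map in the spirit of the described
    permutation unit: every external input and every neuronal output
    is mapped either to the delay unit (as an axonal input) or to an
    external output of the system."""
    def __init__(self, route_to_delay):
        # route_to_delay: ids of signals routed into the delay unit;
        # everything else leaves the system as an external output.
        self.route_to_delay = route_to_delay
        self.delay_buffer = deque()  # stands in for the delay unit
        self.external_out = []

    def route(self, signal_id, value):
        if signal_id in self.route_to_delay:
            self.delay_buffer.append((signal_id, value))  # axonal input, buffered
        else:
            self.external_out.append((signal_id, value))  # external output

# External inputs and neuronal outputs pass through the same map:
pu = PermutationUnit(route_to_delay={"axon0", "axon1"})
pu.route("axon0", 1.0)  # external input -> delay unit
pu.route("out3", 0.5)   # neuronal output -> leaves the system
```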

Patent
28 May 2020
TL;DR: In this patent, a device for controlling neural inference processor cores via a compound instruction set architecture is presented; it includes an instruction memory comprising a plurality of instructions for controlling a neural inference processor core.
Abstract: A device for controlling neural inference processor cores is provided, including a compound instruction set architecture. The device comprises an instruction memory, which comprises a plurality of instructions for controlling a neural inference processor core. Each of the plurality of instructions comprises a control operation. The device further comprises a program counter. The device further comprises at least one loop counter register. The device is adapted to execute the plurality of instructions. Executing the plurality of instructions comprises: reading an instruction from the instruction memory based on a value of the program counter; updating the at least one loop counter register according to the control operation of the instruction; and updating the program counter according to the control operation of the instruction and a value of the at least one loop counter register.
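A minimal sketch of the described control flow, assuming a toy instruction encoding: each step reads the instruction at the program counter, updates the loop counter register according to the instruction's control operation, and then updates the program counter from the control operation and the counter's value. The encoding and names below are hypothetical.

```python
def run_program(instructions, max_steps=1000):
    """Toy model of the control flow. An instruction is a tuple:
      ("loop", count)        -- load the loop counter register
      ("branch_nz", target)  -- decrement the counter; jump to target if nonzero
      ("op", name)           -- placeholder core-control operation
    """
    pc = 0            # program counter
    loop_counter = 0  # one loop counter register
    trace = []
    for _ in range(max_steps):
        if pc >= len(instructions):
            break
        kind, arg = instructions[pc]  # read instruction at the program counter
        if kind == "loop":
            loop_counter = arg        # update the loop counter register
            pc += 1
        elif kind == "branch_nz":
            loop_counter -= 1
            pc = arg if loop_counter > 0 else pc + 1  # pc depends on the counter
        else:
            trace.append(arg)         # issue a core-control operation
            pc += 1
    return trace

# Repeats the "compute" operation three times without storing an
# unrolled instruction stream:
prog = [("loop", 3), ("op", "compute"), ("branch_nz", 1)]
assert run_program(prog) == ["compute"] * 3
```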