
Showing papers by "Rathinakumar Appuswamy" published in 2020


Proceedings Article
30 Apr 2020
TL;DR: This work introduces a novel means to estimate and scale the task loss gradient at each weight and activation layer's quantizer step size, such that it can be learned in conjunction with other network parameters.
Abstract: Deep networks run with low-precision operations at inference time offer power and space advantages over high-precision alternatives, but must overcome the challenge of maintaining high accuracy as precision decreases. Here, we present a method for training such networks, Learned Step Size Quantization, that achieves the highest accuracy to date on the ImageNet dataset when using models, from a variety of architectures, with weights and activations quantized to 2, 3, or 4 bits of precision, and that can train 3-bit models that reach full-precision baseline accuracy. Our approach builds upon existing methods for learning weights in quantized networks by improving how the quantizer itself is configured. Specifically, we introduce a novel means to estimate and scale the task loss gradient at each weight and activation layer's quantizer step size, such that it can be learned in conjunction with other network parameters. This approach can work with different levels of precision as needed for a given system and requires only a simple modification of existing training code.

303 citations
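To make the quantizer concrete, the sketch below renders the described scheme in NumPy: a forward pass that scales, clips, and rounds values with a learnable step size, and the step-size gradient under a straight-through estimator for the rounding. The function names are illustrative, and the gradient scale and initialization follow the LSQ paper's formulation as commonly reported; treat this as a sketch under those assumptions, not the authors' code.

```python
import numpy as np

def lsq_quantize(v, s, q_n, q_p):
    """Forward pass: scale by the learned step size s, clip to the
    integer level range [-q_n, q_p], round, and rescale."""
    v_bar = np.clip(v / s, -q_n, q_p)
    return np.round(v_bar) * s

def lsq_step_size_grad(v, s, q_n, q_p):
    """Gradient of the quantizer output w.r.t. s, treating round()
    as the identity (straight-through estimator)."""
    v_bar = v / s
    inside = (v_bar > -q_n) & (v_bar < q_p)
    # Inside the clip range the derivative is round(v/s) - v/s;
    # outside, it is the clip boundary itself (-q_n or q_p).
    grad = np.where(inside, np.round(v_bar) - v_bar,
                    np.clip(v_bar, -q_n, q_p))
    # LSQ scales the step-size gradient by 1/sqrt(N * q_p) so its
    # magnitude stays commensurate with the weight gradients.
    return grad / np.sqrt(v.size * q_p)

# Example: quantize weights to 3 bits (q_n = 4, q_p = 3 for signed values).
w = np.random.randn(1024).astype(np.float32)
s = 2.0 * np.abs(w).mean() / np.sqrt(3)  # initialization suggested in the paper
w_q = lsq_quantize(w, s, q_n=4, q_p=3)
```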


Patent
02 Apr 2020
TL;DR: In this patent, an array of neural cores is adapted to compute, in parallel, an output activation tensor of a neural network layer, with a network operatively connected to each of the neural cores.
Abstract: Parallel processing among arrays of physical neural cores is provided. An array of neural cores is adapted to compute, in parallel, an output activation tensor of a neural network layer. A network is operatively connected to each of the neural cores. The output activation tensor is distributed across the neural cores. An input activation tensor is distributed across the neural cores. A weight tensor is distributed across the neural cores. Each neural core's computation comprises multiplying elements of a portion of the input activation tensor at that core with elements of a portion of the weight tensor at that core, and storing the summed products in a partial sum corresponding to an element of the output activation tensor. Each element of the output activation tensor is computed by accumulating all of the partial sums corresponding to that element via the network. The partial sums for each element of the output activation tensor are computed in a sequence of steps whose order is described by tracing a path through the weight tensor that visits every weight tensor element that contributes to any partial sum.

1 citation
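A minimal sketch of the partial-sum scheme, assuming a fully connected layer and modeling each core as holding one slice of the input activation tensor and one slice of the weight tensor; the shapes, function name, and use of NumPy are illustrative, not taken from the patent.

```python
import numpy as np

def parallel_matvec(weights, x, num_cores):
    """Illustrative model of the scheme for y = W @ x: the input
    dimension is split across cores, each core multiplies its weight
    slice with its input slice, and the per-core partial sums are
    accumulated (the on-chip network's role) into each output element."""
    w_slices = np.array_split(weights, num_cores, axis=1)  # weight tensor distributed
    x_slices = np.array_split(x, num_cores)                # input activations distributed
    # Each core's computation: products of its slices, summed into one
    # partial sum per output element.
    partials = [w @ xs for w, xs in zip(w_slices, x_slices)]
    # Accumulating all partial sums for each element yields the output
    # activation tensor.
    return np.sum(partials, axis=0)

W = np.random.randn(8, 32)
x = np.random.randn(32)
assert np.allclose(parallel_matvec(W, x, num_cores=4), W @ x)
```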


Patent
16 Jan 2020
TL;DR: In this patent, a plurality of distributed neural cores is provided with hierarchical parallelism, and each of the neural cores is assigned a subset of the output activations of a layer of a neural network.
Abstract: Networks of distributed neural cores are provided with hierarchical parallelism. In various embodiments, a plurality of neural cores is provided. Each of the plurality of neural cores comprises a plurality of vector compute units configured to operate in parallel. Each of the plurality of neural cores is configured to compute in parallel output activations by applying its plurality of vector compute units to input activations. Each of the plurality of neural cores is assigned a subset of output activations of a layer of a neural network for computation. Upon receipt of a subset of input activations of the layer of the neural network, each of the plurality of neural cores computes a partial sum for each of its assigned output activations, and computes its assigned output activations from at least the computed partial sums.
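A rough sketch of the two-level parallelism, again for a fully connected layer: the outer split assigns each core a subset of output activations, and the inner split stands in for the vector compute units producing partial sums within a core. All names and shapes are illustrative assumptions.

```python
import numpy as np

def hierarchical_layer(weights, x, num_cores, units_per_core):
    """Two-level parallelism: each core owns a subset of output
    activations; within a core, each vector compute unit produces a
    partial sum over its slice of the input activations."""
    outputs = []
    for w_core in np.array_split(weights, num_cores, axis=0):  # core level: output subsets
        x_slices = np.array_split(x, units_per_core)
        w_slices = np.array_split(w_core, units_per_core, axis=1)
        # Unit level: each vector compute unit computes a partial sum.
        partials = [w @ xs for w, xs in zip(w_slices, x_slices)]
        outputs.append(np.sum(partials, axis=0))  # core combines its partial sums
    return np.concatenate(outputs)

W = np.random.randn(8, 32)
x = np.random.randn(32)
assert np.allclose(hierarchical_layer(W, x, num_cores=4, units_per_core=2), W @ x)
```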

Patent
12 May 2020
TL;DR: In this patent, a neurosynaptic system is described comprising a delay unit for receiving and buffering axonal inputs, and a neural computation unit for generating neuronal outputs by performing a set of computations based on at least one axonal input received by the delay unit.
Abstract: Embodiments of the invention provide a neurosynaptic system comprising a delay unit for receiving and buffering axonal inputs, and a neural computation unit for generating neuronal outputs by performing a set of computations based on at least one axonal input received by the delay unit. The system further comprises a permutation unit for receiving external inputs to the system, and transmitting external outputs from the system. The permutation unit maps each external input received as either an axonal input to the delay unit or an external output from the system. The permutation unit maps each neuronal output generated by the neural computation unit as either an axonal input to the delay unit or an external output from the system. The neural computation unit comprises multiple electronic neurons, multiple electronic axons, and a plurality of electronic synapse devices interconnecting the neurons with the axons.
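The routing behavior of the permutation unit can be sketched as a simple table that sends each signal either into the delay unit or out of the system; the class name, signal encoding, and use of a deque as a stand-in delay buffer are assumptions for illustration only.

```python
from collections import deque

class PermutationUnit:
    """Illustrative routing map in the spirit of the described
    permutation unit: every external input and every neuronal output
    is mapped either to the delay unit (as an axonal input) or to an
    external output of the system."""
    def __init__(self, route_to_delay):
        # route_to_delay: ids of signals routed into the delay unit;
        # everything else leaves the system as an external output.
        self.route_to_delay = route_to_delay
        self.delay_buffer = deque()  # stands in for the delay unit
        self.external_out = []

    def route(self, signal_id, value):
        if signal_id in self.route_to_delay:
            self.delay_buffer.append((signal_id, value))  # axonal input, buffered
        else:
            self.external_out.append((signal_id, value))  # external output

# External inputs and neuronal outputs pass through the same map:
pu = PermutationUnit(route_to_delay={"axon0", "axon1"})
pu.route("axon0", 1.0)  # external input -> delay unit
pu.route("out3", 0.5)   # neuronal output -> leaves the system
```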

Patent
28 May 2020
TL;DR: In this patent, a device for controlling neural inference processor cores via a compound instruction set architecture is presented; it includes an instruction memory comprising a plurality of instructions for controlling a neural inference processor core.
Abstract: A device for controlling neural inference processor cores is provided, including a compound instruction set architecture. The device comprises an instruction memory, which comprises a plurality of instructions for controlling a neural inference processor core. Each of the plurality of instructions comprises a control operation. The device further comprises a program counter. The device further comprises at least one loop counter register. The device is adapted to execute the plurality of instructions. Executing the plurality of instructions comprises: reading an instruction from the instruction memory based on a value of the program counter; updating the at least one loop counter register according to the control operation of the instruction; and updating the program counter according to the control operation of the instruction and a value of the at least one loop counter register.
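A minimal sketch of the described control flow, assuming a toy instruction encoding: each step reads the instruction at the program counter, updates the loop counter register according to the instruction's control operation, and then updates the program counter from the control operation and the counter's value. The encoding and names below are hypothetical.

```python
def run_program(instructions, max_steps=1000):
    """Toy model of the control flow. An instruction is a tuple:
      ("loop", count)        -- load the loop counter register
      ("branch_nz", target)  -- decrement the counter; jump to target if nonzero
      ("op", name)           -- placeholder core-control operation
    """
    pc = 0            # program counter
    loop_counter = 0  # one loop counter register
    trace = []
    for _ in range(max_steps):
        if pc >= len(instructions):
            break
        kind, arg = instructions[pc]  # read instruction at the program counter
        if kind == "loop":
            loop_counter = arg        # update the loop counter register
            pc += 1
        elif kind == "branch_nz":
            loop_counter -= 1
            pc = arg if loop_counter > 0 else pc + 1  # pc depends on the counter
        else:
            trace.append(arg)         # issue a core-control operation
            pc += 1
    return trace

# Repeats the "compute" operation three times without storing an
# unrolled instruction stream:
prog = [("loop", 3), ("op", "compute"), ("branch_nz", 1)]
assert run_program(prog) == ["compute"] * 3
```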