Patent

Method and apparatus for distributed and cooperative computation in artificial neural networks

TLDR
In this patent, an apparatus and method for distributed and cooperative computation in artificial neural networks are described. The apparatus comprises an input/output (I/O) interface and a plurality of processing units communicatively coupled to the I/O interface to receive data for input neurons and the synaptic weights associated with each of the input neurons, each unit processing at least a portion of that data to generate partial results.
Abstract
An apparatus and method are described for distributed and cooperative computation in artificial neural networks. For example, one embodiment of an apparatus comprises: an input/output (I/O) interface; a plurality of processing units communicatively coupled to the I/O interface to receive data for input neurons and synaptic weights associated with each of the input neurons, each of the plurality of processing units to process at least a portion of the data for the input neurons and synaptic weights to generate partial results; and an interconnect communicatively coupling the plurality of processing units, each of the processing units to share the partial results with one or more other processing units over the interconnect, the other processing units using the partial results to generate additional partial results or final results. The processing units may share data including input neurons and weights over a shared input bus.
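To make the data flow in the abstract concrete, here is a minimal Python sketch (an illustration, not the patented implementation): each processing unit holds one slice of the input neurons together with the matching columns of the synaptic weight matrix, computes partial sums for every output neuron, and the partial results are then combined as if shared over the interconnect. The class names, the column-wise slicing, and the single reduction step are all assumptions made for clarity.

```python
# Minimal sketch of distributed partial-result computation, as described in
# the abstract. The slicing strategy, class names, and the reduction over a
# simulated interconnect are illustrative assumptions, not the patented design.

class ProcessingUnit:
    def __init__(self, weight_slice, input_slice):
        # Each unit holds only its portion of the input neurons and the
        # synaptic weights associated with them (rows: outputs, cols: inputs).
        self.weights = weight_slice
        self.inputs = input_slice

    def partial_results(self):
        # Partial sums for every output neuron, using only this unit's slice.
        return [sum(w * x for w, x in zip(row, self.inputs))
                for row in self.weights]

def distribute(inputs, weights, n_units):
    """Split input neurons (and matching weight columns) across units."""
    chunk = (len(inputs) + n_units - 1) // n_units
    units = []
    for u in range(n_units):
        lo, hi = u * chunk, min((u + 1) * chunk, len(inputs))
        w_slice = [row[lo:hi] for row in weights]
        units.append(ProcessingUnit(w_slice, inputs[lo:hi]))
    return units

def interconnect_reduce(units):
    """Simulate units sharing partial results over the interconnect:
    element-wise accumulation yields the final output neurons."""
    partials = [u.partial_results() for u in units]
    return [sum(vals) for vals in zip(*partials)]

# Usage: 4 input neurons, 2 output neurons, split across 2 processing units.
inputs = [1.0, 2.0, 3.0, 4.0]
weights = [[0.1, 0.2, 0.3, 0.4],   # weights for output neuron 0
           [0.5, 0.6, 0.7, 0.8]]   # weights for output neuron 1
units = distribute(inputs, weights, n_units=2)
print(interconnect_reduce(units))  # [3.0, 7.0]
```

Slicing by input neurons means each unit produces a partial sum for every output, so the interconnect only needs to carry accumulations; slicing by output neurons would instead let each unit finish its own outputs locally.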


Citations
Patent

Layer-based operations scheduling to optimise memory for CNN applications

TL;DR: In this article, a method of configuring a System-on-Chip (SoC) to execute a Convolutional Neural Network (CNN) is presented; the method receives scheduling schemes, each specifying a sequence of operations executable by the Processing Units (PUs) of the SoC.
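As a rough illustration of the layer-based scheduling idea (not the patented algorithm), the sketch below scores each candidate scheduling scheme by its peak live-buffer usage and picks the cheapest; the cost model and the (alloc, free) encoding of operations are assumptions.

```python
# Illustrative sketch only: picking, among candidate scheduling schemes, the
# operation sequence with the lowest peak buffer usage. The cost model and
# the (bytes_allocated, bytes_freed) encoding are assumptions.

def peak_memory(schedule):
    """schedule: list of (bytes_allocated, bytes_freed) per operation."""
    live = peak = 0
    for alloc, freed in schedule:
        live += alloc
        peak = max(peak, live)
        live -= freed
    return peak

def pick_schedule(schemes):
    return min(schemes, key=peak_memory)

# Two candidate operation sequences for the same CNN layers.
scheme_a = [(100, 0), (200, 100), (150, 200)]    # peak 350 bytes
scheme_b = [(100, 100), (200, 200), (150, 150)]  # peak 200 bytes
assert pick_schedule([scheme_a, scheme_b]) is scheme_b
```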
Patent

Neural network accelerator with parameters resident on chip

TL;DR: In this paper, the authors present an accelerator that includes a computing unit; a first memory bank for storing input activations; and a second memory bank configured to store enough of the neural network parameters on the computing unit to allow latency below a specified level and throughput above a specified level.
Patent

Network on chip switch interconnect

TL;DR: In this article, a network on chip is disclosed that includes a semiconductor die and switches disposed on the die; each switch has ports configured to receive packets from, and transmit packets to, at least two other switches.
Patent

Acceleration processing unit based on convolutional neural network and array structure thereof

TL;DR: In this article, an acceleration processing unit based on a convolutional neural network performs convolution operations on local data comprising multiple multimedia data; the unit includes a multiplier and an adder.
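The TL;DR names a multiplier and an adder, i.e. a multiply-accumulate (MAC) datapath. A hypothetical sketch of such a unit streaming local data through a 1-D convolution, with the valid-window convention and array shapes chosen purely for illustration:

```python
# Hypothetical sketch of the multiplier-plus-adder (MAC) datapath named in
# the TL;DR, applied to a 1-D convolution over local data. The valid-window
# convention and data shapes are assumptions.

def mac(acc, a, b):
    # One multiplier and one adder: acc + a * b.
    return acc + a * b

def conv1d_valid(data, kernel):
    out = []
    for i in range(len(data) - len(kernel) + 1):
        acc = 0
        for j, k in enumerate(kernel):
            acc = mac(acc, data[i + j], k)  # stream local data through the MAC
        out.append(acc)
    return out

print(conv1d_valid([1, 2, 3, 4], [1, 0, -1]))  # [-2, -2]
```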
Patent

Accelerated mathematical engine

TL;DR: In this paper, an accelerated mathematical engine is applied to image processing such that convolution of an image is accelerated by using a two-dimensional matrix processor comprising sub-circuits that include an ALU, output register and shadow register.
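The output/shadow-register pairing mentioned above is a double-buffering trick: latching the ALU result into a shadow register lets the next accumulation start while the previous result is still being read out. A rough sketch under that assumption (the two-phase latch protocol below is illustrative, not taken from the patent):

```python
# Rough sketch of the output/shadow-register idea in the TL;DR: each
# sub-circuit latches its ALU result into a shadow register so the next
# accumulation can begin while the previous result drains. The two-phase
# protocol is an assumption for illustration.

class SubCircuit:
    def __init__(self):
        self.acc = 0        # ALU accumulator (working value)
        self.shadow = None  # latched copy, stable for readout

    def accumulate(self, a, b):
        self.acc += a * b   # ALU: multiply-accumulate

    def latch(self):
        # Copy the result to the shadow register and clear the accumulator,
        # freeing the ALU for the next tile while this result is read out.
        self.shadow = self.acc
        self.acc = 0

pe = SubCircuit()
for a, b in [(1, 2), (3, 4)]:
    pe.accumulate(a, b)
pe.latch()
pe.accumulate(5, 6)        # next tile starts immediately
print(pe.shadow, pe.acc)   # 14 30
```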
References
Proceedings Article

DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning

TL;DR: This study designs an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy, and shows that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s in a small footprint.
Proceedings Article

A dynamically configurable coprocessor for convolutional neural networks

TL;DR: This is the first CNN architecture to achieve real-time video stream processing (25 to 30 frames per second) on a wide range of object detection and recognition tasks.
Proceedings Article

NeuFlow: A runtime reconfigurable dataflow processor for vision

TL;DR: A scalable dataflow hardware architecture optimized for the computation of general-purpose vision algorithms — neuFlow — and a dataflow compiler — luaFlow — that transforms high-level flow-graph representations of these algorithms into machine code for neuFlow are presented.
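As a toy illustration of the lowering step the TL;DR describes (written in Python, not the real luaFlow compiler), the sketch below emits a flow graph's nodes as a linear instruction stream in dependency order; the graph encoding and the instruction format are assumptions.

```python
# Toy illustration of compiling a high-level flow graph into a linear
# instruction stream by emitting nodes in dependency (topological) order.
# Graph format and "instruction" encoding are assumptions; this is not the
# real luaFlow toolchain.

def lower(graph):
    """graph: {node: [dependencies]} -> list of (op, inputs) instructions."""
    emitted, order = set(), []

    def visit(node):
        if node in emitted:
            return
        for dep in graph.get(node, []):
            visit(dep)   # emit dependencies before their consumers
        emitted.add(node)
        order.append((node, tuple(graph.get(node, []))))

    for node in graph:
        visit(node)
    return order

flow = {"conv1": [], "relu1": ["conv1"], "pool1": ["relu1"],
        "conv2": ["pool1"], "out": ["conv2"]}
for instr in lower(flow):
    print(instr)  # conv1 first, out last
```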
Proceedings Article

A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks

TL;DR: The nn-X system is presented, a scalable, low-power coprocessor for enabling real-time execution of deep neural networks, able to achieve a peak performance of 227 G-ops/s, which translates to a performance per power improvement of 10 to 100 times that of conventional mobile and desktop processors.