Patent
Method and apparatus for distributed and cooperative computation in artificial neural networks
Frederico Pratas, Ayose Falcón, Marc Lupon, Fernando Latorre, Pedro Lopez, Enric Herrero Abellanas, Georgios Tournavitis +6 more
TLDR
In this paper, an apparatus and method for distributed and cooperative computation in artificial neural networks is described, which comprises an input/output (I/O) interface and a plurality of processing units communicatively coupled to the I/O interface to receive data for input neurons and synaptic weights associated with each of the input neurons, each unit processing at least a portion of that data to generate partial results.

Abstract
An apparatus and method are described for distributed and cooperative computation in artificial neural networks. For example, one embodiment of an apparatus comprises: an input/output (I/O) interface; a plurality of processing units communicatively coupled to the I/O interface to receive data for input neurons and synaptic weights associated with each of the input neurons, each of the plurality of processing units to process at least a portion of the data for the input neurons and synaptic weights to generate partial results; and an interconnect communicatively coupling the plurality of processing units, each of the processing units to share the partial results with one or more other processing units over the interconnect, the other processing units using the partial results to generate additional partial results or final results. The processing units may share data, including input neurons and weights, over the shared input bus.
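The computation pattern the abstract describes can be sketched in software: each processing unit handles a slice of the input neurons and their synaptic weights, produces a partial weighted sum, and the partial results are combined into a final result. The sketch below is illustrative only (the function and variable names are assumptions, and the shared interconnect is modeled as a plain Python list), not the patented hardware design.

```python
def partial_result(inputs, weights):
    """One processing unit: weighted sum over its slice of input neurons."""
    return sum(x * w for x, w in zip(inputs, weights))

def distributed_neuron(inputs, weights, num_units):
    """Split the work across num_units units and combine the shared partial results."""
    chunk = (len(inputs) + num_units - 1) // num_units  # slice size per unit
    partials = [
        partial_result(inputs[i:i + chunk], weights[i:i + chunk])
        for i in range(0, len(inputs), chunk)
    ]
    # In hardware the partials travel over the interconnect; here we just sum them.
    return sum(partials)

# Example: 8 input neurons split across 4 processing units
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
w = [0.5] * 8
print(distributed_neuron(x, w, 4))  # 18.0
```

Because each partial sum is independent, the units can also forward partials to neighbors to build "additional partial results," as the abstract notes, rather than reducing them in one place.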
Citations
Patent
Layer-based operations scheduling to optimise memory for CNN applications
Ambrose Jude Angelo, Ahmed Iftekhar, Yachide Yusuke, Bokhari Haseeb, Peddersen Jorgen, Parameswaran Sridevan +5 more
TL;DR: In this article, a method of configuring a System-on-Chip (SoC) to execute a Convolutional Neural Network (CNN) is presented: scheduling schemes are received, each specifying a sequence of operations executable by the Processing Units (PUs) of the SoC.
Patent
Neural network accelerator with parameters resident on chip
TL;DR: In this paper, the authors present an accelerator that includes a computing unit; a first memory bank for storing input activations; and a second memory bank configured to store a sufficient amount of the neural network parameters on the computing unit to allow for latency below a specified level with throughput above a specified level.
Patent
Network on chip switch interconnect
Swarbrick Ian A, Sagheer Ahmad +1 more
TL;DR: In this article, a disclosed network on chip includes a semiconductor die and switches disposed on the die, each switch has ports configured to receive packets from and transmit packets to at least two other switches.
Patent
Acceleration processing unit based on convolutional neural network and array structure thereof
TL;DR: In this article, an acceleration processing unit based on a convolutional neural network is used to perform convolution operations on local data comprising multiple items of multimedia data; the unit includes a multiplier and an adder.
Patent
Accelerated mathematical engine
TL;DR: In this paper, an accelerated mathematical engine is applied to image processing such that convolution of an image is accelerated by using a two-dimensional matrix processor comprising sub-circuits that include an ALU, output register and shadow register.
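A matrix processor accelerates convolution by recasting it as a matrix product: the image is unrolled ("im2col") so each output pixel's receptive field becomes one row, and the kernel becomes a flat vector. This sketch shows the general technique only, not the patented engine; all names are illustrative, and, as is conventional in CNNs, the "convolution" is cross-correlation (no kernel flip).

```python
def im2col(image, k):
    """Unroll all k x k patches of a 2-D image (list of lists) into rows."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows

def conv2d(image, kernel):
    """Valid convolution as a matrix-vector product over unrolled patches."""
    k = len(kernel)
    flat_kernel = [kernel[di][dj] for di in range(k) for dj in range(k)]
    return [sum(a * b for a, b in zip(row, flat_kernel))
            for row in im2col(image, k)]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ker = [[1, 0],
       [0, 1]]
print(conv2d(img, ker))  # [6, 8, 12, 14]
```

Once convolution is a matrix product, each row-times-kernel dot product maps naturally onto a grid of multiply-accumulate sub-circuits like the ALUs the patent describes.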
References
Proceedings ArticleDOI
DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning
TL;DR: This study designs an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy, and shows that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s in a small footprint.
Journal ArticleDOI
A reconfigurable fabric for accelerating large-scale datacenter services
Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen F. Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James R. Larus, Eric C. Peterson, Simon Pope, Aaron L. Smith, Jason Thong, Phillip Yi Xiao, Doug Burger +22 more
TL;DR: The requirements and architecture of the fabric are described, the critical engineering challenges and solutions needed to make the system robust in the presence of failures are detailed, and the performance, power, and resilience of the system when ranking candidate documents are measured.
Proceedings ArticleDOI
A dynamically configurable coprocessor for convolutional neural networks
TL;DR: This is the first CNN architecture to achieve real-time video stream processing (25 to 30 frames per second) on a wide range of object detection and recognition tasks.
Proceedings ArticleDOI
NeuFlow: A runtime reconfigurable dataflow processor for vision
TL;DR: A scalable dataflow hardware architecture optimized for the computation of general-purpose vision algorithms — neuFlow — and a dataflow compiler — luaFlow — that transforms high-level flow-graph representations of these algorithms into machine code for neuFlow are presented.
Proceedings ArticleDOI
A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks
TL;DR: The nn-X system is presented, a scalable, low-power coprocessor for enabling real-time execution of deep neural networks, able to achieve a peak performance of 227 G-ops/s, which translates to a performance per power improvement of 10 to 100 times that of conventional mobile and desktop processors.