Patent
Method and apparatus for distributed and cooperative computation in artificial neural networks
Frederico Pratas, Ayose Falcón, Marc Lupon, Fernando Latorre, Pedro Lopez, Enric Herrero Abellanas, Georgios Tournavitis +6 more
TLDR
In this paper, an apparatus and method for distributed and cooperative computation in artificial neural networks is described, which comprises an input/output (I/O) interface and a plurality of processing units communicatively coupled to the I/O interface to receive data for input neurons and synaptic weights associated with each of the input neurons, each unit processing at least a portion of that data to generate partial results.

Abstract
An apparatus and method are described for distributed and cooperative computation in artificial neural networks. For example, one embodiment of an apparatus comprises: an input/output (I/O) interface; a plurality of processing units communicatively coupled to the I/O interface to receive data for input neurons and synaptic weights associated with each of the input neurons, each of the plurality of processing units to process at least a portion of the data for the input neurons and synaptic weights to generate partial results; and an interconnect communicatively coupling the plurality of processing units, each of the processing units to share the partial results with one or more other processing units over the interconnect, the other processing units using the partial results to generate additional partial results or final results. The processing units may share data, including input neurons and weights, over the shared input bus.
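The computation pattern the abstract describes can be sketched in software: each processing unit handles a slice of the input neurons and their synaptic weights, produces a partial weighted sum, and the partial results are combined into a final result. The sketch below is illustrative only (the function and variable names are assumptions, and the shared interconnect is modeled as a plain Python list), not the patented hardware design.

```python
def partial_result(inputs, weights):
    """One processing unit: weighted sum over its slice of input neurons."""
    return sum(x * w for x, w in zip(inputs, weights))

def distributed_neuron(inputs, weights, num_units):
    """Split the work across num_units units and combine the shared partial results."""
    chunk = (len(inputs) + num_units - 1) // num_units  # slice size per unit
    partials = [
        partial_result(inputs[i:i + chunk], weights[i:i + chunk])
        for i in range(0, len(inputs), chunk)
    ]
    # In hardware the partials travel over the interconnect; here we just sum them.
    return sum(partials)

# Example: 8 input neurons split across 4 processing units
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
w = [0.5] * 8
print(distributed_neuron(x, w, 4))  # 18.0
```

Because each partial sum is independent, the units can also forward partials to neighbors to build "additional partial results," as the abstract notes, rather than reducing them in one place.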
Citations
Patent
Layer-based operations scheduling to optimise memory for CNN applications
Ambrose Jude Angelo, Ahmed Iftekhar, Yachide Yusuke, Bokhari Haseeb, Peddersen Jorgen, Parameswaran Sridevan +5 more
TL;DR: In this article, a method of configuring a System-on-Chip (SoC) to execute a Convolutional Neural Network (CNN) is presented: scheduling schemes are received, each specifying a sequence of operations executable by the Processing Units (PUs) of the SoC.
Patent
Neural network accelerator with parameters resident on chip
TL;DR: In this paper, the authors present an accelerator that includes a computing unit; a first memory bank for storing input activations; and a second memory bank configured to store a sufficient amount of the neural network parameters on the computing unit to allow for latency below a specified level with throughput above a specified level.
Patent
Network on chip switch interconnect
Swarbrick Ian A, Sagheer Ahmad +1 more
TL;DR: In this article, a disclosed network on chip includes a semiconductor die and switches disposed on the die, each switch has ports configured to receive packets from and transmit packets to at least two other switches.
Patent
Acceleration processing unit based on convolutional neural network and array structure thereof
TL;DR: In this article, an acceleration processing unit based on a convolutional neural network is used to perform convolution operations on local data comprising multiple items of multimedia data; the unit includes a multiplier and an adder.
Patent
Accelerated mathematical engine
TL;DR: In this paper, an accelerated mathematical engine is applied to image processing such that convolution of an image is accelerated by using a two-dimensional matrix processor comprising sub-circuits that include an ALU, output register and shadow register.
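A matrix processor accelerates convolution by recasting it as a matrix product: the image is unrolled ("im2col") so each output pixel's receptive field becomes one row, and the kernel becomes a flat vector. This sketch shows the general technique only, not the patented engine; all names are illustrative, and, as is conventional in CNNs, the "convolution" is cross-correlation (no kernel flip).

```python
def im2col(image, k):
    """Unroll all k x k patches of a 2-D image (list of lists) into rows."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows

def conv2d(image, kernel):
    """Valid convolution as a matrix-vector product over unrolled patches."""
    k = len(kernel)
    flat_kernel = [kernel[di][dj] for di in range(k) for dj in range(k)]
    return [sum(a * b for a, b in zip(row, flat_kernel))
            for row in im2col(image, k)]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ker = [[1, 0],
       [0, 1]]
print(conv2d(img, ker))  # [6, 8, 12, 14]
```

Once convolution is a matrix product, each row-times-kernel dot product maps naturally onto a grid of multiply-accumulate sub-circuits like the ALUs the patent describes.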
References
Proceedings ArticleDOI
DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning
TL;DR: This study designs an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy, and shows that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s in a small footprint.
Journal ArticleDOI
A reconfigurable fabric for accelerating large-scale datacenter services
Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen F. Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James R. Larus, Eric C. Peterson, Simon Pope, Aaron L. Smith, Jason Thong, Phillip Yi Xiao, Doug Burger +22 more
TL;DR: The requirements and architecture of the fabric are described, the critical engineering challenges and solutions needed to make the system robust in the presence of failures are detailed, and the performance, power, and resilience of the system when ranking candidate documents are measured.
Proceedings ArticleDOI
A dynamically configurable coprocessor for convolutional neural networks
TL;DR: This is the first CNN architecture to achieve real-time video stream processing (25 to 30 frames per second) on a wide range of object detection and recognition tasks.
Proceedings ArticleDOI
NeuFlow: A runtime reconfigurable dataflow processor for vision
TL;DR: A scalable dataflow hardware architecture optimized for the computation of general-purpose vision algorithms — neuFlow — and a dataflow compiler — luaFlow — that transforms high-level flow-graph representations of these algorithms into machine code for neuFlow are presented.
Proceedings ArticleDOI
A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks
TL;DR: The nn-X system is presented, a scalable, low-power coprocessor for enabling real-time execution of deep neural networks, able to achieve a peak performance of 227 G-ops/s, which translates to a performance per power improvement of 10 to 100 times that of conventional mobile and desktop processors.