scispace - formally typeset
Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over the lifetime, 1515 publications have been published within this topic receiving 25546 citations.


Papers
Proceedings ArticleDOI
22 May 2008
TL;DR: It is proved that input patterns can be encoded in the synaptic weights by local Hebbian delay-learning; after learning, the firing time of an output neuron reflects the distance of the evaluated pattern to its learned input pattern, thus realizing a kind of RBF neuron.
Abstract: In this paper we describe a novel, hardware-implementation-friendly model of spiking neurons with "sparse temporal coding". This model is then used to implement a neural network on an FPGA platform, yielding a high degree of parallelism. The first section of this paper discusses the biological background of spiking neural networks, i.e. the structure and functionality of natural neurons, which form the basis of the artificially built ones presented afterwards. With a clustering application in mind, we prove that input patterns can be encoded in the synaptic weights by local Hebbian delay-learning, where, after learning, the firing time of an output neuron reflects the distance of the evaluated pattern to its learned input pattern, thus realizing a kind of RBF neuron. Further in the paper, we show that temporal spike-time coding and Hebbian learning are a viable means for unsupervised computation in a network of spiking neurons, as the network is capable of clustering realistic data. The modular neuron structure and the multiplier-less, fully parallel FPGA hardware implementation of the network are described, and the signals acquired during and after the learning phase are given, with an interpretation of the results compared to other results reported in the specific literature.
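The firing-time-as-distance idea can be illustrated with a toy spike-response model (a sketch only: the alpha-shaped postsynaptic potential, the time constant, and the averaged-threshold rule are assumptions, not the paper's hardware model). When the learned synaptic delays mirror the input spike pattern, the delayed spikes arrive in synchrony and the output neuron fires early; a mismatched pattern fires later, so firing time behaves like an RBF-style distance measure.

```python
import numpy as np

def firing_time(input_spikes, delays, threshold=0.9, t_max=20.0, dt=0.1, tau=2.0):
    """Toy spike-response neuron: each input spike, shifted by its synaptic
    delay, contributes an alpha-shaped PSP; the neuron fires at the first
    time the average potential crosses the threshold."""
    arrival = np.asarray(input_spikes) + np.asarray(delays)
    for step in range(int(t_max / dt)):
        t = step * dt
        s = t - arrival
        # alpha function peaks at 1.0 when s == tau
        psp = np.where(s > 0, (s / tau) * np.exp(1 - s / tau), 0.0)
        if psp.mean() >= threshold:
            return t
    return None  # no spike within the observation window
```

With delays learned to complement the pattern (all spikes arrive together at t = 3), the neuron fires at about t ≈ 4.3, while uniform, mismatched delays with the same total delay fire later, at about t ≈ 4.9.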

11 citations

Patent
10 May 2017
TL;DR: In this paper, the authors propose a self-adaptive rate control method for stream data processing, built on a common data-receiving message queue and a distributed big-data computing framework, aimed at preserving the real-time performance and stability of a mass data processing system.
Abstract: The invention belongs to the technical field of computer applications and relates to a self-adaptive rate control method for stream data processing. Based on a common data-receiving message queue and a distributed big-data computing framework, the method adjusts the degree of parallelism of data processing through a pre-fragmentation mode according to the condition of the current computing cluster, and dynamically adjusts the quantity of data the cluster currently processes with a self-adaptive real-time rate control method, so that the stability of the computing cluster is ensured and the delay of data stream processing is reduced. As big data gradually penetrates the industries, the range of applications for real-time processing of mass data is expanding, and the real-time performance and stability of a mass data processing system are therefore quite important. Without increasing the quantity of cluster hardware or the task programming complexity, the method enhances the stability and processing efficiency of the computing cluster to a certain extent.
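A minimal sketch of one step of such a rate controller (the batch-sizing formula, the smoothing factor, and the latency target are illustrative assumptions; the patent's actual algorithm is not given in this abstract): estimate the cluster's sustainable throughput from the last processed batch, then size the next batch so it fits within a target latency window.

```python
def adapt_batch_size(current_batch, processed, elapsed_s,
                     target_latency_s=1.0, alpha=0.7):
    """One step of a hypothetical adaptive rate controller: derive the
    throughput the cluster just sustained, compute the batch size that
    would fit the target latency, and smooth toward it with an
    exponential moving average to avoid oscillation."""
    rate = processed / elapsed_s             # records/s the cluster sustained
    desired = rate * target_latency_s        # records fitting one latency window
    return int(alpha * current_batch + (1 - alpha) * desired)
```

For example, if a cluster processed 1000 records in 2 s (500 records/s) with a 1 s latency target, a current batch of 2000 is smoothed down to 1550; repeated application converges toward the sustainable 500-record batch.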

11 citations

Journal ArticleDOI
TL;DR: This paper presents several parallel, multiwavefront algorithms based on two approaches, identification and elimination, for verifying association patterns specified in queries, thus introducing a higher degree of parallelism into query processing.

11 citations

Proceedings ArticleDOI
28 Apr 2018
TL;DR: An 8-bit fixed-point parallel multiply-accumulate (MAC) unit architecture is proposed to provide a fully customized MAC unit for Convolutional Neural Networks (CNNs) instead of depending on the conventional DSP blocks and embedded memory units in the FPGA's silicon fabric.
Abstract: Deep neural network algorithms have proven their enormous capabilities in a wide range of artificial intelligence applications, especially printed/handwritten text recognition, multimedia processing, robotics, and many other high-end technological trends. The most challenging aspect nowadays is overcoming the extreme computational processing demands of applying such algorithms, especially in real-time systems. Recently, the Field Programmable Gate Array (FPGA) has been considered one of the optimal hardware accelerator platforms for deep neural network architectures due to its large adaptability and the high degree of parallelism it offers. In this paper, the proposed 8-bit fixed-point parallel multiply-accumulate (MAC) unit architecture aims to provide a fully customized MAC unit for Convolutional Neural Networks (CNNs) instead of depending on the conventional DSP blocks and embedded memory units in the FPGA's silicon fabric. The proposed unit is designed in VHDL and can perform at computational speeds of up to 4.17 Giga Operations per Second (GOPS) using high-density FPGAs.
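As a rough software model of what an 8-bit fixed-point MAC unit computes (the widened accumulator and saturating overflow behavior are common hardware choices assumed here; the paper's VHDL design is not reproduced): two signed 8-bit operands produce a 16-bit product that is added into a wider, saturating accumulator.

```python
def mac8(acc, a, b):
    """Software model of an 8-bit fixed-point MAC step: multiply two
    signed 8-bit operands (product fits in 16 bits) and accumulate into
    a 32-bit register, saturating at the register's limits."""
    assert -128 <= a <= 127 and -128 <= b <= 127
    acc += a * b
    lo, hi = -(1 << 31), (1 << 31) - 1
    return max(lo, min(hi, acc))

def dot8(xs, ws):
    """Dot product built from repeated MAC steps, as a CNN kernel would
    chain them (in hardware the steps run in parallel lanes)."""
    acc = 0
    for a, b in zip(xs, ws):
        acc = mac8(acc, a, b)
    return acc
```

For example, `dot8([1, 2, 3], [4, 5, 6])` accumulates 4 + 10 + 18 = 32, and an accumulator already at its positive limit stays saturated rather than wrapping.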

11 citations

Journal ArticleDOI
TL;DR: In this article, a GPU implementation of the Sources Reconstruction Method (SRM) applied to antenna characterization is presented; it is based on a compute-bound algorithm with a high degree of parallelism.
Abstract: The Sources Reconstruction Method (SRM) is a noninvasive technique for, among other applications, antenna characterization. The SRM is based on obtaining a distribution of equivalent currents that radiate the same field as the antenna under test. The computation of these currents requires solving a linear system, usually ill-posed, that may be very computationally demanding for commercial antennas. Graphics Processing Units (GPUs) are an interesting hardware choice for solving compute-bound problems that are prone to parallelism. In this paper, we present an implementation on GPUs of the SRM applied to antenna characterization that is based on a compute-bound algorithm with a high degree of parallelism. The GPU implementation introduced in this work provides a dramatic reduction in time cost compared to our CPU implementation and, in addition, keeps the low memory footprint of the latter. For the sake of illustration, the equivalent currents are obtained on a base station antenna array and a helix antenna working at practical frequencies. Quasi real-time results are obtained on a desktop workstation.
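The core numerical step, solving an ill-posed linear system for the equivalent currents, can be sketched as a Tikhonov-regularized least-squares solve (a sketch under stated assumptions: the matrix name Z for the field-to-current coupling, the normal-equations form, and the regularization constant are illustrative; the paper does not specify its solver in this abstract).

```python
import numpy as np

def equivalent_currents(Z, e, reg=1e-6):
    """Solve Z j ~= e for the current coefficients j via Tikhonov-
    regularized normal equations: (Z^H Z + reg*I) j = Z^H e. The
    regularization term stabilizes the ill-posed system; on a GPU the
    dense matrix products here are the compute-bound, parallel part."""
    n = Z.shape[1]
    A = Z.conj().T @ Z + reg * np.eye(n)
    return np.linalg.solve(A, Z.conj().T @ e)
```

In a well-conditioned test case where the measured field is generated from known currents, the solve recovers those currents to within the regularization error.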

11 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations (85% related)
Scheduling (computing): 78.6K papers, 1.3M citations (83% related)
Network packet: 159.7K papers, 2.2M citations (80% related)
Web service: 57.6K papers, 989K citations (80% related)
Quality of service: 77.1K papers, 996.6K citations (79% related)
Performance
Metrics
No. of papers in the topic in previous years
Year: Papers
2022: 1
2021: 47
2020: 48
2019: 52
2018: 70
2017: 75