Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over its lifetime, 1,515 publications have been published within this topic, receiving 25,546 citations.

Papers

Journal Article

Genetic algorithms for parallel code optimization

Ender Özcan¹, E. Onbasioglu¹

Yeditepe University¹

19 Jun 2004

TL;DR: A steady-state memetic algorithm is compared with a transgenerational memetic algorithm using different crossover operators and hill-climbing methods to find the best number of processors and the best data distribution method for each stage of a parallel program.

Abstract: Determining the optimum data distribution, degree of parallelism, and communication structure on distributed-memory machines for a given algorithm is not a straightforward task. Assuming that a parallel algorithm consists of consecutive stages, a genetic algorithm (GA) is proposed to find the best number of processors and the best data distribution method for each stage of the parallel algorithm. A steady-state genetic algorithm is compared with a transgenerational genetic algorithm using different crossover operators. Performance is evaluated in terms of the total execution time of the program, including communication and computation times. Computation-intensive, communication-intensive, and mixed implementations are used in the experiments. The GA provides satisfactory results for these illustrative examples.
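
To make the search space concrete, the following is a minimal Python sketch of a steady-state GA over per-stage (processor count, data distribution) choices. It assumes a toy cost model that is not the paper's: the stage workloads, the `block`/`cyclic` distributions, the overhead constants, and all function names (`total_time`, `steady_state_ga`, `exhaustive_best`) are hypothetical. It uses binary tournament selection and uniform crossover and has no hill-climbing step.

```python
import itertools
import random

# Hypothetical 4-stage program (illustrative numbers, not from the paper).
STAGE_WORK = [800.0, 200.0, 1200.0, 400.0]          # work units per stage
STAGE_PREF = ["block", "cyclic", "block", "cyclic"]  # distribution each stage favors
PROC_CHOICES = [1, 2, 4, 8, 16]
DISTRIBUTIONS = ["block", "cyclic"]
COMM_PER_PROC = 5.0      # assumed communication overhead per processor used
MISMATCH_PENALTY = 30.0  # assumed cost of using a non-preferred distribution
REDIST_COST = 40.0       # assumed cost of changing (processors, distribution) between stages


def total_time(chrom):
    """Toy execution-time model: computation + communication + redistribution."""
    t = 0.0
    for s, (p, d) in enumerate(chrom):
        t += STAGE_WORK[s] / p + COMM_PER_PROC * p
        if d != STAGE_PREF[s]:
            t += MISMATCH_PENALTY
        if s > 0 and (p, d) != chrom[s - 1]:
            t += REDIST_COST
    return t


def random_gene(rng):
    return (rng.choice(PROC_CHOICES), rng.choice(DISTRIBUTIONS))


def steady_state_ga(pop_size=30, iterations=2000, mutation_rate=0.1, seed=0):
    """Steady-state GA: each iteration makes one child, which replaces the worst member if better."""
    rng = random.Random(seed)
    pop = [[random_gene(rng) for _ in STAGE_WORK] for _ in range(pop_size)]
    for _ in range(iterations):
        # Binary tournament selection of two parents.
        a = min(rng.sample(pop, 2), key=total_time)
        b = min(rng.sample(pop, 2), key=total_time)
        # Uniform crossover: each stage's gene comes from either parent.
        child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]
        # Mutation: occasionally replace a stage's gene with a random one.
        child = [random_gene(rng) if rng.random() < mutation_rate else g for g in child]
        worst = max(range(pop_size), key=lambda i: total_time(pop[i]))
        if total_time(child) < total_time(pop[worst]):
            pop[worst] = child
    return min(pop, key=total_time)


def exhaustive_best():
    """Brute-force baseline; feasible only because this toy space has 10**4 points."""
    genes = [(p, d) for p in PROC_CHOICES for d in DISTRIBUTIONS]
    candidates = (list(c) for c in itertools.product(genes, repeat=len(STAGE_WORK)))
    return min(candidates, key=total_time)
```

Calling `steady_state_ga()` and comparing `total_time` of its result with that of `exhaustive_best()` shows how close the GA gets on this small instance. Exhaustive search stops being an option once the number of stages or per-stage choices grows, which is the situation a GA is meant for.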

31 citations

Proceedings Article

Towards highly parallel event processing through reconfigurable hardware

Mohammad Sadoghi¹, Harsh Singh¹, Hans-Arno Jacobsen¹

University of Toronto¹

13 Jun 2011

TL;DR: In this article, the authors present an efficient event processing platform that supports high-frequency and low-latency event matching over reconfigurable hardware, where each solution is formulated as a design trade-off between the degree of parallelism and the desired application requirement.

Abstract: We present fpga-ToPSS (Toronto Publish/Subscribe System), an efficient event processing platform that supports high-frequency and low-latency event matching. fpga-ToPSS is built over reconfigurable hardware (FPGAs) to achieve line-rate processing by exploring various degrees of parallelism. Furthermore, each of our proposed FPGA-based designs is geared towards a unique application requirement, such as flexibility, adaptability, scalability, or pure performance, such that each solution is specifically optimized to attain a high level of parallelism. Therefore, each solution is formulated as a design trade-off between the degree of parallelism and the desired application requirement. Moreover, our event processing engine supports Boolean expression matching with an expressive predicate language applicable to a wide range of applications, including real-time data analysis, algorithmic trading, targeted advertisement, and (complex) event processing.
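
The matching problem itself can be stated in a few lines of software. The sketch below is a hypothetical Python illustration, not fpga-ToPSS: it restricts subscriptions to conjunctions of attribute/operator/value predicates (the paper describes a more expressive predicate language), and the names `Predicate`, `matches`, and `match_event` are invented for this example.

```python
import operator
from dataclasses import dataclass

# Comparison operators a predicate may use (an illustrative subset).
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass(frozen=True)
class Predicate:
    attr: str
    op: str
    value: object

    def holds(self, event):
        # A predicate over an attribute the event does not carry is false.
        return self.attr in event and OPS[self.op](event[self.attr], self.value)


def matches(subscription, event):
    """A subscription here is a conjunction (AND) of predicates."""
    return all(p.holds(event) for p in subscription)


def match_event(subscriptions, event):
    """Return the ids of all subscriptions the event satisfies.

    Each subscription is checked independently of the others; that independence is
    what a hardware matcher can exploit by evaluating many subscriptions at once.
    This software version simply checks them one after another.
    """
    return [sid for sid, sub in subscriptions.items() if matches(sub, event)]


subscriptions = {
    "s1": [Predicate("price", "<", 100.0), Predicate("volume", ">=", 1000)],
    "s2": [Predicate("symbol", "=", "ACME")],
    "s3": [Predicate("price", "<", 100.0)],
}
event = {"symbol": "ACME", "price": 95.0, "volume": 500}
print(match_event(subscriptions, event))  # ['s2', 's3']
```

Because no subscription's result depends on another's, the loop in `match_event` is the part a hardware design can spread across parallel matching units; how fpga-ToPSS actually trades that parallelism against flexibility or scalability is described in the paper and is not reproduced here.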

31 citations

Book Chapter

Massively Parallel Video Networks

Joao Carreira, Viorica Patraucean, Laurent Mazaré, Andrew Zisserman, Simon Osindero

08 Sep 2018

TL;DR: In this article, causal video understanding models are proposed that improve the efficiency of video processing by maximising throughput, minimising latency, and reducing the number of clock cycles through operation pipelining and multi-rate clocks.

Abstract: We introduce a class of causal video understanding models that aims to improve efficiency of video processing by maximising throughput, minimising latency, and reducing the number of clock cycles. Leveraging operation pipelining and multi-rate clocks, these models perform a minimal amount of computation (e.g. as few as four convolutional layers) for each frame per timestep to produce an output. The models are still very deep, with dozens of such operations being performed but in a pipelined fashion that enables depth-parallel computation. We illustrate the proposed principles by applying them to existing image architectures and analyse their behaviour on two video tasks: action recognition and human keypoint localisation. The results show that a significant degree of parallelism, and implicitly speedup, can be achieved with little loss in performance.
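
The pipelining idea can be illustrated with a toy simulation, assuming each layer is a plain function and every layer reads only the output its predecessor produced on the previous tick. This is a hypothetical Python sketch (`pipelined_run` and the three lambda "layers" are invented) and it omits the paper's multi-rate clocks.

```python
_EMPTY = object()  # marks a pipeline register that holds no data yet


def pipelined_run(frames, layers):
    """Run `layers` as a synchronous pipeline over a stream of frames.

    On every tick all layers fire at once: layer k consumes what layer k-1
    produced on the previous tick. Returns (tick, output) pairs per finished frame.
    """
    depth = len(layers)
    regs = [_EMPTY] * depth  # regs[k]: output of layer k from the last tick
    results = []
    # Feed the frames, then depth - 1 empty ticks to drain the pipeline.
    stream = list(frames) + [_EMPTY] * (depth - 1)
    for tick, x in enumerate(stream):
        inputs = [x] + regs[:-1]  # read every register before any is overwritten
        regs = [layer(v) if v is not _EMPTY else _EMPTY for layer, v in zip(layers, inputs)]
        if regs[-1] is not _EMPTY:
            results.append((tick, regs[-1]))
    return results


layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
print(pipelined_run([1, 2, 3], layers))  # [(2, 1), (3, 3), (4, 5)]
```

Once the pipeline is full, one frame finishes per tick even though each frame still passes through every layer, so each tick costs one layer per stage rather than the whole network; the price is a latency of `depth - 1` extra ticks before the first output appears.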

31 citations

Journal Article

On the Computation of Reducible Invariant Tori on a Parallel Computer

Àngel Jorba¹, Estrella Olmedo¹

University of Barcelona¹

14 Oct 2009 - SIAM Journal on Applied Dynamical Systems

TL;DR: The algorithm presents a high degree of parallelism, and the computational effort grows linearly with the number of Fourier modes needed to represent the solution; for these reasons, it is a very good option for computing quasi-periodic solutions with several basic frequencies.

Abstract: We present an algorithm for the computation of reducible invariant tori of discrete dynamical systems that is suitable for tori of dimensions larger than 1. It is based on a quadratically convergent scheme that approximates, at the same time, the Fourier series of the torus, its Floquet transformation, and its Floquet matrix. The Floquet matrix describes the linearization of the dynamics around the torus and, hence, its linear stability. The algorithm presents a high degree of parallelism, and the computational effort grows linearly with the number of Fourier modes needed to represent the solution. For these reasons, it is a very good option for computing quasi-periodic solutions with several basic frequencies. The paper includes some examples (flows) to show the efficiency of the method on a parallel computer. In these flows, we compute invariant tori of dimensions up to 5 by taking suitable sections.

31 citations

Journal Article

Efficient GPU-Based Electromagnetic Transient Simulation for Power Systems With Thread-Oriented Transformation and Automatic Code Generation

Yankan Song¹, Ying Chen¹, Shaowei Huang¹, Yin Xu², Zhitong Yu¹, Wei Xue¹

Tsinghua University¹, Beijing Jiaotong University²

07 May 2018 - IEEE Access

TL;DR: An efficient GPU-based parallel EMT simulator is designed that significantly accelerates EMT simulations compared with a CPU-based program, and code automation tools improve computational efficiency by substantially reducing addressing and memory access.

Abstract: Electromagnetic transient (EMT) simulation is the most accurate and most computationally intensive form of power-system simulation. Past research has shown the potential of accelerating such simulations using graphics processing units (GPUs). In this paper, an efficient GPU-based parallel EMT simulator is designed. Thread-oriented model transformations are first proposed for the electrical and control systems. Following the transformations, the electrical system is represented by connected networks of large numbers of primitive electrical elements, whose computations can be cast as large batches of fused multiply-add operations plus the solution of a linear system of equations. The control systems are represented by a layered directed acyclic graph of primitive control elements that can be processed by single-instruction-multiple-threads (SIMT) groups. Finally, code automation tools are designed to generate the GPU kernels. Compared with past work, the proposed model transformations increase the degree of parallelism. Most importantly, the code automation tools improve computational efficiency by substantially reducing addressing and memory accesses, and make the implementation of the algorithm more general and convenient. Test systems of different sizes were created by connecting multiple IEEE 33-bus distribution systems and adding distributed generators. Simulations were performed on NVIDIA K20X and P100 cards. The results indicate that the proposed method significantly accelerates EMT simulations compared with a CPU-based program. Real-time performance was also achieved under certain conditions.
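
The layered DAG for the control system suggests level scheduling: group elements so that every element's inputs are computed in an earlier layer, then evaluate each layer in parallel. The Python sketch below shows that grouping with Kahn's algorithm; whether the paper builds its layers exactly this way is an assumption, and the control diagram (`v_meas`, `v_pi`, and so on) and the function name `dag_layers` are hypothetical.

```python
from collections import defaultdict


def dag_layers(nodes, edges):
    """Split a DAG into layers (level scheduling via Kahn's algorithm).

    Every node's inputs lie in earlier layers, so all nodes within one layer can
    be evaluated at the same time, e.g. one GPU thread per element.
    """
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    current = [n for n in nodes if indegree[n] == 0]
    layers = []
    while current:
        layers.append(current)
        nxt = []
        for u in current:
            for v in successors[u]:
                indegree[v] -= 1
                if indegree[v] == 0:  # all inputs of v are now scheduled
                    nxt.append(v)
        current = nxt
    if sum(len(layer) for layer in layers) != len(nodes):
        raise ValueError("graph contains a cycle")
    return layers


# Hypothetical control diagram: two measurement -> error -> PI chains feeding a sum.
nodes = ["v_meas", "i_meas", "v_err", "i_err", "v_pi", "i_pi", "sum"]
edges = [("v_meas", "v_err"), ("i_meas", "i_err"), ("v_err", "v_pi"),
         ("i_err", "i_pi"), ("v_pi", "sum"), ("i_pi", "sum")]
layers = dag_layers(nodes, edges)
print(layers)
# [['v_meas', 'i_meas'], ['v_err', 'i_err'], ['v_pi', 'i_pi'], ['sum']]
print(max(len(layer) for layer in layers))  # 2
```

The width of the widest layer bounds how many elements can be evaluated concurrently, while the number of layers sets how many sequential steps one simulation time step needs; in this toy diagram, seven elements take four steps.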

31 citations

Metrics

Papers: 1,515

Citations: 27,447

No. of papers in the topic in previous years
Year	Papers
2022	1
2021	47
2020	48
2019	52
2018	70
2017	75
