Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over its lifetime, 1,515 publications have been published on this topic, receiving 25,546 citations.


Papers
Book Chapter
01 Jan 1996
TL;DR: This chapter discusses the parallelization, using MPI on distributed-memory architectures, of the sequential third-generation wave model SWAN (Simulating WAves Nearshore), which computes wind-generated waves in coastal regions.
Abstract: This chapter discusses the parallelization of the sequential code SWAN (Simulating WAves Nearshore) on distributed-memory architectures using MPI. Efficient parallel algorithms are required to calculate spectra of random, short-crested, wind-generated waves in coastal regions using the third-generation wave model SWAN. The propagation schemes used in SWAN are fully implicit, so that they can also be used to compute waves in shallow water. Two strategies for parallelizing these schemes are presented: (1) the block Jacobi approximation, which has a high degree of parallelism, and (2) the block wavefront approach, which is parallelizable to a large extent. Unlike the first, the latter converges exactly as the sequential method does. Numerical experiments with a real-life application are run on a dedicated Beowulf cluster. They show that good speedups are achieved with the block wavefront approach, as long as the computational domain is not divided into slices that are too thin. For the block Jacobi method, a considerable decline in performance is observed, attributable to the numerical overhead arising from tripling the number of iterations.

8 citations
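The contrast between the two strategies is easy to sketch in code. Below is a minimal NumPy illustration of the wavefront idea for a generic implicit upwind stencil; it is not SWAN's actual implementation, and the 1/3-weighted update and toy grid are assumptions made for the example. Cells on the same anti-diagonal depend only on already-updated neighbours, so each diagonal can be processed in parallel, whereas block Jacobi trades convergence for parallelism by sweeping subdomains with stale halo values.

```python
import numpy as np

def wavefront_sweep(u, f):
    # One sweep of an implicit first-order upwind stencil on an n-by-m
    # grid: u[i, j] depends on the already-updated u[i-1, j] and
    # u[i, j-1], so all cells on one anti-diagonal (i + j == d) are
    # mutually independent and each diagonal can be updated in parallel
    # (vectorised here). A block Jacobi variant would instead sweep all
    # subdomains concurrently using stale neighbour values, gaining
    # parallelism at the cost of extra outer iterations.
    n, m = u.shape
    for d in range(2, n + m - 1):        # anti-diagonals of the interior
        i = np.arange(max(1, d - (m - 1)), min(n - 1, d - 1) + 1)
        j = d - i
        u[i, j] = (f[i, j] + u[i - 1, j] + u[i, j - 1]) / 3.0
    return u

# Toy usage: 6x6 grid with a fixed zero boundary, iterated until settled.
u, f = np.zeros((6, 6)), np.ones((6, 6))
for _ in range(50):
    u = wavefront_sweep(u, f)
```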

Journal Article
18 Mar 2019, PeerJ
TL;DR: This paper proposes using speculation to unleash parallelism when it is uncertain whether some tasks will modify data, and formalizes a new methodology to enable speculative execution in a graph of tasks.
Abstract: Task-based programming models have demonstrated their efficiency in the development of scientific applications on modern high-performance platforms. They delegate the management of parallelization to the runtime system (RS), which is in charge of data coherency, scheduling, and assigning work to the computational units. However, some applications have a limited degree of parallelism, such that no matter how efficient the RS implementation is, they may not scale on modern multicore CPUs. In this paper, we propose using speculation to unleash parallelism when it is uncertain whether some tasks will modify data, and we formalize a new methodology to enable speculative execution in a graph of tasks. This description is partially implemented in our new C++ RS, called SPETABARU, which is capable of executing tasks in advance when others are not certain to modify the data. We study the behavior of our approach on Monte Carlo and replica-exchange Monte Carlo simulations.

8 citations
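As a rough illustration of the speculation idea (SPETABARU itself is a C++ runtime; the function below is a hypothetical sketch, not its API), a successor task can be started on a private copy of the data while the uncertain task executes, with re-execution as the fallback on mis-speculation. A real runtime would run the two calls concurrently on worker threads; they are sequential here for clarity.

```python
import copy, random

def run_with_speculation(uncertain_task, successor, data):
    # Toy sketch of speculative task execution (not the SPETABARU API).
    # `uncertain_task(data)` only sometimes writes to `data` and returns
    # True when it did. Rather than serialising `successor` behind it,
    # start `successor` speculatively on a private copy of the data.
    snapshot = copy.deepcopy(data)       # version for the speculative run
    speculative = successor(snapshot)    # would run concurrently in a real RS
    wrote = uncertain_task(data)
    if not wrote:
        return speculative               # speculation valid: latency hidden
    return successor(data)               # mis-speculation: re-execute

# Hypothetical example: a Monte Carlo-style move accepted 30% of the time.
def maybe_move(state):
    if random.random() < 0.3:
        state["x"] += 1
        return True
    return False

print(run_with_speculation(maybe_move, lambda s: s["x"] * 2, {"x": 1}))
```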

Journal Article
Jiaxi Hu, Bingzhe Li, Cong Ma, David J. Lilja, Steven J. Koester
TL;DR: A scalable SNG based on the spin-Hall effect (SHE), capable of generating multiple independent stochastic streams simultaneously, that takes advantage of the efficient charge-to-spin conversion of the spin-Hall material and the intrinsic stochasticity of nanomagnets.
Abstract: Stochastic computing (SC) is a promising technology for low-cost hardware designs, but it suffers from long latency. Although parallel processing can efficiently shorten the latency, it requires duplicated stochastic number generators (SNGs), which cause substantial hardware overhead. This paper proposes a scalable SNG based on the spin-Hall effect (SHE) that is capable of generating multiple independent stochastic streams simultaneously. The design takes advantage of the efficient charge-to-spin conversion of the spin-Hall material and the intrinsic stochasticity of nanomagnets. Compared to previous spintronic SNGs, the SHE-SNG reduces area by 1.6×–7.8× and power by 4.9×–13× while increasing the degree of parallelism from 1 to 16. Compared to CMOS-based SNGs, the proposed SNG achieves 24×–120× and 53× reductions in area and power, respectively. Finally, three benchmarks were implemented; the results indicate that SC implementations with the proposed SHE-SNG achieve a 1.2×–29× reduction in hardware resources compared to implementations with previous CMOS- and spintronic-based designs while scaling the degree of parallelism from 1 to 64.

8 citations
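For readers unfamiliar with stochastic computing, the toy sketch below shows why multiple independent streams matter: a unipolar value p is encoded as a bitstream whose fraction of 1s is p, multiplication reduces to a bitwise AND, and running several independent lanes in parallel divides the latency. A software RNG stands in for the hardware randomness source (an LFSR plus comparator in CMOS designs, or the intrinsic switching randomness of SHE-driven nanomagnets in this paper); the lane count mirrors the degree of parallelism.

```python
import numpy as np

rng = np.random.default_rng(0)

def sng(p, length, streams=1):
    # Toy stochastic number generator: `streams` independent bitstreams
    # whose expected fraction of 1s is p. In hardware the RNG would be an
    # LFSR plus comparator (CMOS) or, in this paper, the intrinsic
    # switching randomness of SHE-driven nanomagnets; `streams` plays the
    # role of the degree of parallelism.
    return rng.random((streams, length)) < p

a = sng(0.5, 1024, streams=16)
b = sng(0.25, 1024, streams=16)
prod = a & b                   # bitwise AND multiplies unipolar values
print(prod.mean(axis=1))       # each of the 16 lanes estimates 0.5 * 0.25
```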

Journal Article
TL;DR: The Data Synchronized Pipeline Architecture (DSPA) presented in this paper allows a high degree of parallelism in the pipeline, even when some resources behave unpredictably.

8 citations
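Only the TL;DR is available here, so the following is a generic, hypothetical sketch rather than the DSPA design: a pipeline whose stages synchronize on data availability through blocking queues, one way to keep the rest of a pipeline busy when a resource behaves unpredictably.

```python
import queue, threading

def stage(fn, inbox, outbox):
    # Each stage blocks on its input queue, so synchronisation follows
    # the data: a slow or stalled stage delays only its consumers, while
    # upstream stages keep producing into the buffer.
    while (item := inbox.get()) is not None:
        outbox.put(fn(item))
    outbox.put(None)                     # propagate end-of-stream

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(lambda x: x + 1, q1, q2)).start()
threading.Thread(target=stage, args=(lambda x: x * 2, q2, q3)).start()
for i in range(5):
    q1.put(i)
q1.put(None)
while (r := q3.get()) is not None:
    print(r)                             # (0+1)*2, (1+1)*2, ...
```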

Proceedings Article
10 Oct 2011
TL;DR: Tarragon, which is based on dataflow, targets latency-tolerant scientific computations and achieves high performance, in many cases exceeding that of equivalent latency-tolerant, hard-coded MPI implementations.
Abstract: In current practice, scientific programmers and HPC users are required to develop code that exposes a high degree of parallelism, exhibits high locality, dynamically adapts to the available resources, and hides communication latency. Hiding communication latency is crucial to realize the potential of today's distributed memory machines with highly parallel processing modules, and technological trends indicate that communication latencies will continue to be an issue as the performance gap between computation and communication widens. However, under Bulk Synchronous Parallel models, the predominant paradigm in scientific computing, scheduling is embedded into the application code. All the phases of a computation are defined and laid out as a linear sequence of operations, limiting overlap and the program's ability to adapt to communication delays.

In this paper we present an alternative model, called Tarragon, to overcome the limitations of Bulk Synchronous Parallelism. Tarragon, which is based on dataflow, targets latency-tolerant scientific computations. Tarragon supports a task-dependency graph abstraction in which tasks, the basic unit of computation, are organized as a graph according to their data dependencies, i.e., task precedence. In addition to the task graph, Tarragon supports metadata abstractions, annotations to the task graph, to express locality information and scheduling policies to improve performance.

Tarragon's functionality and underlying programming methodology are demonstrated on three classes of computations used in scientific domains: structured grids, sparse linear algebra, and dynamic programming. In the application studies, Tarragon implementations achieve high performance, in many cases exceeding the performance of equivalent latency-tolerant, hard-coded MPI implementations.

8 citations
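The dataflow execution model is straightforward to illustrate. The sketch below is an illustration of the general model, not Tarragon's C++ API: a task is submitted the moment its last predecessor finishes, so independent tasks, such as an interior computation and a halo exchange, overlap automatically, which is exactly the latency hiding the paper targets.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def run_task_graph(tasks, deps, workers=4):
    # tasks: name -> callable; deps: name -> prerequisite names.
    # A task becomes runnable the moment its last prerequisite finishes,
    # so independent tasks overlap automatically (a sketch of the
    # dataflow model only, not Tarragon's C++ API).
    pending = {t: len(deps.get(t, ())) for t in tasks}
    children = defaultdict(list)
    for t, ds in deps.items():
        for d in ds:
            children[d].append(t)
    lock, done = threading.Lock(), threading.Event()
    remaining = [len(tasks)]
    pool = ThreadPoolExecutor(workers)

    def run(t):
        tasks[t]()
        ready = []
        with lock:                       # release successors atomically
            remaining[0] -= 1
            if remaining[0] == 0:
                done.set()
            for c in children[t]:
                pending[c] -= 1
                if pending[c] == 0:
                    ready.append(c)
        for c in ready:
            pool.submit(run, c)

    for t, n in list(pending.items()):
        if n == 0:                       # submit the root tasks
            pool.submit(run, t)
    done.wait()
    pool.shutdown()

# Hypothetical usage: the interior computation overlaps the halo exchange.
run_task_graph(
    {"halo": lambda: print("exchange halo"),
     "interior": lambda: print("compute interior"),
     "boundary": lambda: print("compute boundary")},
    {"boundary": ["halo"]})
```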


Network Information

Related Topics (5)
Server: 79.5K papers, 1.4M citations (85% related)
Scheduling (computing): 78.6K papers, 1.3M citations (83% related)
Network packet: 159.7K papers, 2.2M citations (80% related)
Web service: 57.6K papers, 989K citations (80% related)
Quality of service: 77.1K papers, 996.6K citations (79% related)
Performance Metrics

Number of papers in the topic in previous years:

Year    Papers
2022    1
2021    47
2020    48
2019    52
2018    70
2017    75