
Degree of parallelism

About: Degree of parallelism is a research topic. Over the lifetime, 1515 publications have been published within this topic receiving 25546 citations.


Papers
Proceedings ArticleDOI
05 Jun 2017
TL;DR: This paper makes a preliminary attempt to develop the dataflow insight into a specialized graph accelerator, which the authors believe opens a wide range of opportunities to improve the performance of computation and memory access for large-scale graph processing.
Abstract: Existing graph processing frameworks greatly improve the performance of the memory subsystem, but they remain bound to the underlying modern processor, which leads to inefficiencies for graph processing in the form of low instruction-level parallelism and high branch misprediction. According to our comprehensive micro-architectural study, these inefficiencies mainly arise from a wealth of dependencies, the serial semantics of instruction streams, and complex conditional instructions in graph processing. In this paper, we propose that a fundamental shift of approach is necessary to break through these inefficiencies of the underlying processor via the dataflow paradigm. Applying the dataflow approach to graph processing is appealing for two reasons. First, because the execution and retirement of instructions in the dataflow model depend only on the availability of input data, a high degree of parallelism can be provided to relax the heavy dependencies and serial semantics. Second, dataflow can reduce the cost of branch misprediction by executing all branches of a conditional instruction simultaneously. Consequently, we make a preliminary attempt to develop this dataflow insight into a specialized graph accelerator. We believe that our work opens a wide range of opportunities to improve the performance of computation and memory access for large-scale graph processing.
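The dataflow firing rule the abstract relies on, that an instruction executes as soon as its operands are available rather than when a program counter reaches it, can be illustrated with a toy interpreter. This is only a sketch of the general principle; the graph structure and names here are invented for illustration and are not the paper's accelerator design.

```python
# Toy dataflow evaluation: a node "fires" (executes) the moment every one
# of its inputs carries a value. There is no program counter, so nodes
# whose operands are ready are mutually independent and could run in
# parallel on a dataflow machine.
def run_dataflow(graph, tokens):
    # graph: {node name: (function, [input names], output name)}
    tokens = dict(tokens)          # values produced so far
    pending = dict(graph)          # nodes that have not fired yet
    fired = []
    progress = True
    while pending and progress:
        progress = False
        for name, (fn, ins, out) in list(pending.items()):
            if all(i in tokens for i in ins):      # the firing rule
                tokens[out] = fn(*(tokens[i] for i in ins))
                fired.append(name)
                del pending[name]
                progress = True
    return tokens, fired

# (x + y) * (x - y): "add" and "sub" share no data dependency, so a
# dataflow machine could execute them simultaneously; "mul" fires only
# once both of their results exist.
graph = {
    "add": (lambda a, b: a + b, ["x", "y"], "s"),
    "sub": (lambda a, b: a - b, ["x", "y"], "d"),
    "mul": (lambda a, b: a * b, ["s", "d"], "p"),
}
tokens, order = run_dataflow(graph, {"x": 5, "y": 3})
print(tokens["p"])   # 8 * 2 = 16
```

The same rule is what lets a dataflow machine evaluate both sides of a conditional at once: each branch fires as soon as its own inputs arrive, independent of the predicate.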

8 citations

01 Feb 1985
TL;DR: This thesis explores the issues involved in developing a framework for circuit simulation that exploits the locality exhibited by circuit operation to achieve a high degree of parallelism, and presents the design and implementation of the circuit simulator PRISM.
Abstract: Integrated circuit technology has been advancing at a phenomenal rate over the last several years, and promises to continue to do so. If circuit design is to keep pace with fabrication technology, radically new approaches to computer-aided design will be necessary. One appealing approach is general purpose parallel processing. This thesis explores the issues involved in developing a framework for circuit simulation which exploits the locality exhibited by circuit operation to achieve a high degree of parallelism. This framework maps the topology of the circuit onto the multiprocessor, assigning the simulation of individual partitions to separate processors. A new form of synchronization is developed, based upon a history maintenance and roll back strategy. The circuit simulator PRISM was designed and implemented to determine the efficacy of this approach. The results of several preliminary experiments are reported, along with an analysis of the behavior of PRISM.
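The "history maintenance and roll back" synchronization the abstract describes can be sketched as optimistic execution: each partition advances its local simulation time without waiting, keeps checkpoints, and rolls back when a message arrives from its past. The class and method names below are illustrative, not PRISM's actual interfaces.

```python
# Sketch of optimistic synchronization via history maintenance and
# rollback: a partition simulates ahead optimistically; a "straggler"
# message with an earlier timestamp forces it back to a saved checkpoint.
class Partition:
    def __init__(self):
        self.time = 0
        self.state = 0
        self.history = [(0, 0)]            # checkpoints: (time, state)

    def step(self, dt, delta):
        """Advance local simulation time optimistically."""
        self.time += dt
        self.state += delta
        self.history.append((self.time, self.state))

    def receive(self, msg_time, delta):
        """On a message from the past, discard checkpoints newer than
        msg_time, restore the latest surviving one, then apply the
        message; the discarded interval would be re-simulated."""
        if msg_time < self.time:
            while len(self.history) > 1 and self.history[-1][0] > msg_time:
                self.history.pop()
            self.time, self.state = self.history[-1]
        self.step(msg_time - self.time, delta)

p = Partition()
p.step(10, 5)        # local time 10, state 5
p.step(10, 5)        # local time 20, state 10
p.receive(10, 1)     # straggler at t=10: roll back, then apply it
print(p.time, p.state)   # 10 6
```

The payoff is that partitions never block waiting for each other; the cost, as the thesis's experiments would measure, is the work thrown away whenever a rollback fires.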

8 citations

Journal ArticleDOI
TL;DR: This work shows some possible approaches to the optimization of satellite image processing algorithms on a range of different platforms, discussing the implementation in OpenCL of the classic Brightness Temperature Difference ash-cloud detection algorithm.
Abstract: Satellite image processing algorithms often offer a very high degree of parallelism (e.g., pixel-by-pixel processing) that make them optimal candidates for execution on high-performance parallel computing hardware such as modern graphic processing units (GPUs) and multicore CPUs with vector processing capabilities. By using the OpenCL computing standard, a single implementation of a parallel algorithm can be deployed on a wide range of hardware platforms. However, achieving the best performance on each individual platform may still require a custom implementation. We show some possible approaches to the optimization of satellite image processing algorithms on a range of different platforms, discussing the implementation in OpenCL of the classic Brightness Temperature Difference ash-cloud detection algorithm.
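The "pixel-by-pixel" parallelism the abstract mentions means each output pixel depends only on the corresponding input pixels, so every pixel maps naturally to an independent OpenCL work-item. The plain-Python stand-in below shows the per-pixel structure of a Brightness Temperature Difference test; the band values and the -2 K threshold are made-up illustrations, not the paper's calibration.

```python
# Per-pixel Brightness Temperature Difference (BTD) classification.
# Each output pixel is a function of the matching input pixels only,
# which is exactly what makes the algorithm embarrassingly parallel:
# an OpenCL kernel would compute one pixel per work-item.
def btd_ash_mask(bt_11um, bt_12um, threshold=-2.0):
    return [[(1 if (b11 - b12) < threshold else 0)
             for b11, b12 in zip(row11, row12)]
            for row11, row12 in zip(bt_11um, bt_12um)]

# Tiny 2x2 example with invented brightness temperatures (kelvin).
bt11 = [[250.0, 260.0],
        [255.0, 245.0]]
bt12 = [[253.0, 259.0],
        [256.0, 250.0]]
print(btd_ash_mask(bt11, bt12))   # [[1, 0], [0, 1]]
```

The platform-specific tuning the paper discusses (vector widths, work-group sizes, memory layout) sits entirely outside this per-pixel function, which is why one OpenCL source can target both GPUs and vector CPUs.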

8 citations

Journal ArticleDOI
17 May 1988
TL;DR: The GREEDY network is presented, a new interconnection network (IN) for tightly coupled multiprocessors (TCMs), together with an original and cost-effective hardware synchronization mechanism; combined, these allow a very high degree of parallelism to be achieved at execution time on a very large spectrum of loops.
Abstract: To satisfy the growing need for computing power, a high degree of parallelism will be necessary in future supercomputers. Up to the late 70s, supercomputers were either multiprocessors (SIMD-MIMD) or pipelined monoprocessors. Current commercial products combine these two levels of parallelism. Effective performance will depend on the spectrum of algorithms which is actually run in parallel. In a previous paper [Je86], we presented the DSPA processor, a pipelined processor that performs well on a very large family of loops. In this paper, we present the GREEDY network, a new interconnection network (IN) for tightly coupled multiprocessors (TCMs). We then propose an original and cost-effective hardware synchronization mechanism. When DSPA processors are connected to a shared memory through a GREEDY network and synchronized by our mechanism, a very high degree of parallelism may be achieved at execution time on a very large spectrum of loops, including loops where the independence of successive iterations cannot be checked at compile time, e.g. loop 1:

DO 1 I=1,N
1 A(P(I)) = A(Q(I))
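The loop A(P(I)) = A(Q(I)) writes through one index array and reads through another, so whether its iterations are independent depends entirely on the run-time contents of P and Q; no compiler can prove it statically. A simple (and conservative) run-time test, shown below as an illustration rather than the paper's hardware mechanism, can make that decision just before execution.

```python
# Run-time independence test for a loop of the form A[P[i]] = A[Q[i]].
# Iterations can safely run in parallel if no two iterations write the
# same location and no iteration reads a location any iteration writes.
# (Conservative: it also rejects the harmless case P[i] == Q[i].)
def iterations_independent(P, Q):
    writes = set(P)
    return len(writes) == len(P) and writes.isdisjoint(Q)

print(iterations_independent([0, 1, 2], [3, 4, 5]))   # True: parallel-safe
print(iterations_independent([0, 1, 2], [1, 2, 3]))   # False: iteration i
                                                      # reads what i-1 wrote
```

This is exactly the class of loop where hardware synchronization pays off: instead of serializing whenever the compiler cannot prove independence, the machine resolves the dependence as the index values become known.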

8 citations

Journal ArticleDOI
TL;DR: A model of system performance for parallel processing on clustered multiprocessors is developed which unifies multiprogramming with speedup and scaled-speedup and heuristics are developed that relate cluster size to parallel fraction of a program and to process scaling factors.
Abstract: A model of system performance for parallel processing on clustered multiprocessors is developed which unifies multiprogramming with speedup and scaled speedup. The model is used to explore processor-to-process allocation alternatives for executing a workload consisting of multiple processes. Heuristics are developed that relate cluster size to the parallel fraction of a program and to process scaling factors. The basic analytical model is made more sophisticated by incorporating considerations that affect the realizable speedup, including explicit process scaling, Degree of Parallelism (DOP) as a discrete function, and communication overhead. New developments incorporate nonuniform workload, interconnection network probability of acceptance of requests, nonuniform memory access, and multithreaded processes.
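The two limits the model unifies, fixed-size speedup and scaled speedup, are the standard Amdahl and Gustafson formulas, sketched below in terms of the parallel fraction f and processor count n. This is only the textbook starting point, not the paper's full clustered-multiprocessor model with DOP as a discrete function and communication overhead.

```python
# Classic speedup bounds for a program with parallel fraction f on n
# processors. Amdahl fixes the problem size, so the serial fraction
# (1 - f) caps the gain at 1/(1 - f); Gustafson scales the parallel
# part with n, so speedup keeps growing.
def amdahl_speedup(f, n):
    """Fixed-size speedup: time (1 - f) + f/n versus time 1."""
    return 1.0 / ((1.0 - f) + f / n)

def gustafson_speedup(f, n):
    """Scaled speedup: the parallel workload grows with n."""
    return (1.0 - f) + f * n

print(amdahl_speedup(0.9, 64))      # ~8.8, capped near 1/(1-f) = 10
print(gustafson_speedup(0.9, 64))   # ~57.7, still growing with n
```

The gap between the two numbers for the same f and n is why the paper's heuristics tie cluster size to both the parallel fraction and the process scaling factor: which regime applies depends on whether the workload scales with the machine.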

8 citations


Network Information
Related Topics (5)
Server
79.5K papers, 1.4M citations
85% related
Scheduling (computing)
78.6K papers, 1.3M citations
83% related
Network packet
159.7K papers, 2.2M citations
80% related
Web service
57.6K papers, 989K citations
80% related
Quality of service
77.1K papers, 996.6K citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    1
2021    47
2020    48
2019    52
2018    70
2017    75