scispace - formally typeset
Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over the lifetime, 1515 publications have been published within this topic receiving 25546 citations.


Papers
Proceedings ArticleDOI
05 Feb 2014
TL;DR: The results show that adaptivity is a strictly necessary requirement for reducing energy consumption in STM systems: without it, no acceptable level of energy efficiency can be reached.
Abstract: Energy efficiency is becoming a pressing issue, especially in large data centers, where it entails at the same time a non-negligible management cost, an increased hardware fault probability, and a significant environmental footprint. In this paper, we study how Software Transactional Memories (STM) can provide benefits for both power saving and the applications' overall execution performance. This is related to the fact that encapsulating shared-data accesses within transactions gives the STM middleware the freedom to both ensure consistency and reduce the actual data contention, the latter having been shown to affect the overall power needed to complete the application's execution. We have selected a set of self-adaptive extensions to existing STM middlewares (namely, TinySTM and R-STM) to show how self-adapting computation can better capture the actual degree of parallelism and/or logical contention on shared data, further enhancing the intrinsic benefits provided by STM. Of course, this benefit comes at a cost, namely the execution time required by the proposed approaches to tune the execution parameters for reducing power consumption and enhancing execution performance. Nevertheless, the results show that adaptivity is a strictly necessary requirement for reducing energy consumption in STM systems: without it, no acceptable level of energy efficiency can be reached.

10 citations
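The core idea of self-adapting the degree of parallelism, as in the STM work above, can be illustrated with a minimal hill-climbing sketch. This is not the paper's actual tuning algorithm; `tune_parallelism` and `measure_throughput` are hypothetical names, and the stopping rule (back off as soon as throughput stops improving, i.e. when contention starts to dominate) is a deliberate simplification.

```python
def tune_parallelism(measure_throughput, max_threads, step=1):
    """Hill-climb the number of active threads toward the highest
    measured throughput; a simplified stand-in for self-adaptive
    STM concurrency tuning."""
    best_n, best_t = 1, measure_throughput(1)
    n = 1 + step
    while n <= max_threads:
        t = measure_throughput(n)
        if t <= best_t:
            # Throughput stopped improving: contention (and wasted
            # aborts/energy) now outweighs the extra parallelism.
            break
        best_n, best_t = n, t
        n += step
    return best_n
```

In a real STM runtime, `measure_throughput` would be replaced by online measurements (e.g. commits per joule), and the search would be re-run periodically as the workload's contention profile changes.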

Dissertation
23 Feb 1998
TL;DR: A new polynomial-time algorithm is described, outperforming other current methods in terms of both complexity and application domain, and a general framework so as to handle any kind of dependences, by possibly producing approximate dependences is presented.
Abstract: Array dataflow dependence analysis is paramount for automatic parallelization. The description of dependences at the operation and array element level has been shown to improve significantly the output of many code optimizations. But this kind of analysis has two main issues: its high cost and its scope, which is limited to a small number of programs. We first describe a new polynomial-time algorithm, outperforming other current methods in terms of both complexity and application domain. Then, in the continuity of the work done by J.-F. Collard, we present a general framework to handle any kind of dependences, possibly by producing approximate dependences. The model of programs is extended to any reducible control graph and any kind of references to array elements. An original method, called iterative analysis, finds relations between non-affine constraints so as to improve the accuracy of the method. Besides, we provide a criterion ensuring that the approximation obtained is the best with respect to the information gathered on non-affine constraints by other analyses. Finally, several traditional applications of dataflow analyses are adapted to our method in order to take advantage of its results, and we detail more specifically an array expansion that is a trade-off between run-time overhead, memory requirement and degree of parallelism.

10 citations
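As a flavor of the kind of question array dependence analysis answers, here is the classic GCD test for affine subscripts. It is far simpler than the dissertation's exact polynomial-time analysis (the function name `may_depend` is ours): it only decides whether a write to `A[a*i + b]` and a read of `A[c*j + d]` can ever touch the same element, answering conservatively.

```python
from math import gcd

def may_depend(a, b, c, d, n):
    """GCD test: can A[a*i + b] (write) and A[c*j + d] (read)
    reference the same element for some iterations 0 <= i, j < n?
    A simplified stand-in for exact array dataflow analysis."""
    # A common element requires an integer solution of a*i - c*j = d - b.
    # Such a solution exists only if gcd(a, c) divides (d - b).
    if (d - b) % gcd(a, c) != 0:
        return False   # no integer solution: loop iterations are independent
    return True        # conservatively assume a dependence may exist
```

If `may_depend` returns False, the loop can be parallelized outright; if True, a more precise (and more expensive) analysis, like the one described above, is needed to decide.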

Book ChapterDOI
26 Aug 2003
TL;DR: RoCL is a communication library that aims to exploit the low-level communication facilities of today’s cluster networking hardware and to merge, via the resource oriented paradigm, those facilities and the high-level degree of parallelism achieved on SMP systems through multi-threading.
Abstract: RoCL is a communication library that aims to exploit the low-level communication facilities of today’s cluster networking hardware and to merge, via the resource oriented paradigm, those facilities and the high-level degree of parallelism achieved on SMP systems through multi-threading.

10 citations

01 Jan 1989
TL;DR: A novel partitioning strategy is outlined for maximizing the degree of parallelism in structural analysis and design that was implemented on the CRAY X-MP/4 and the Alliant FX/8 computers.
Abstract: A review is given of the recent advances in computer technology that are likely to impact structural analysis and design. The computational needs for future structures technology are described. The characteristics of new and projected computing systems are summarized. Advances in programming environments, numerical algorithms, and computational strategies for new computing systems are reviewed, and a novel partitioning strategy is outlined for maximizing the degree of parallelism. The strategy is designed for computers with a shared memory and a small number of powerful processors (or a small number of clusters of medium-range processors). It is based on approximating the response of the structure by a combination of symmetric and antisymmetric response vectors, each obtained using a fraction of the degrees of freedom of the original finite element model. The strategy was implemented on the CRAY X-MP/4 and the Alliant FX/8 computers. For nonlinear dynamic problems on the CRAY X-MP with four CPUs, it resulted in an order of magnitude reduction in total analysis time, compared with the direct analysis on a single-CPU CRAY X-MP machine.

10 citations
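The symmetric/antisymmetric decomposition at the heart of the partitioning strategy above can be shown in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the reflection is modeled as a simple index permutation, and `sym_antisym_split` is a hypothetical name.

```python
import numpy as np

def sym_antisym_split(u, perm):
    """Split a response vector u into symmetric and antisymmetric
    parts with respect to a reflection, given as an index permutation.
    Each part can then be solved on a separate processor using a
    fraction of the degrees of freedom."""
    pu = u[perm]                 # reflected copy of the response
    u_sym = 0.5 * (u + pu)       # invariant under the reflection
    u_anti = 0.5 * (u - pu)      # negated by the reflection
    return u_sym, u_anti
```

Since `u = u_sym + u_anti` exactly, the two reduced problems can be assigned to different processors and their solutions summed, which is the source of the parallelism the paper exploits.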

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A novel framework is presented for implementing portable and scalable data-intensive applications on reconfigurable hardware featuring Field-Programmable Gate Arrays and memory, together with a new method to automatically select a task's optimal degree of parallelism on an FPGA for a given hardware platform.
Abstract: This paper presents a novel framework for implementing portable and scalable data-intensive applications on reconfigurable hardware. Instead of using expensive “reconfigurable supercomputers”, we focus our work on standard PCs and PCI-Express extension cards featuring Field-Programmable Gate Arrays (FPGAs) and memory. In our framework, we exploit task-level parallelism by manually partitioning applications into several parallel tasks using a communication API for data streams. This also allows pure software implementations on PCs without FPGA cards. If an FPGA accelerator is present, the same API calls transfer data between the PC's CPU and the FPGA. The tasks implemented in hardware can then exploit instruction-level and pipelining parallelism as well. Furthermore, the framework consists of hardware implementation rules which enable portable and scalable designs. Device-specific hardware wrappers hide the FPGA's and board's idiosyncrasies from the application developer. We also present a new method to automatically select a task's optimal degree of parallelism on an FPGA for a given hardware platform, i.e., to generate a hardware design which optimally uses the available communication bandwidth between the PC and the FPGA. Experimental results show the feasibility of our approach.

10 citations
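The bandwidth-driven selection of the degree of parallelism described above can be sketched with a toy model. This is our simplification, not the paper's method: `optimal_dop` and its parameters are hypothetical names, and real selection would also account for FPGA area and clock constraints.

```python
import math

def optimal_dop(link_bw, unit_rate, max_units):
    """Pick the number of replicated processing units so that their
    aggregate data rate just saturates the host<->FPGA link.
    link_bw and unit_rate share any common unit (e.g. MB/s).
    A simplified model: replicating beyond the bandwidth bound only
    wastes area, since units would starve for data."""
    bandwidth_bound = max(1, math.floor(link_bw / unit_rate))
    return min(bandwidth_bound, max_units)
```

For example, with a 1000 MB/s PCIe link and units each consuming 300 MB/s, a fourth unit could never be kept busy, so three replicas is the useful maximum.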


Network Information
Related Topics (5)

Server: 79.5K papers, 1.4M citations (85% related)
Scheduling (computing): 78.6K papers, 1.3M citations (83% related)
Network packet: 159.7K papers, 2.2M citations (80% related)
Web service: 57.6K papers, 989K citations (80% related)
Quality of service: 77.1K papers, 996.6K citations (79% related)
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2022    1
2021    47
2020    48
2019    52
2018    70
2017    75