Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over its lifetime, 1,515 publications have been published within this topic, receiving 25,546 citations.


Papers
Journal ArticleDOI
TL;DR: A dynamic partial-parallel data layout (DPPDL) is proposed for green video surveillance storage; it dynamically allocates storage space with an appropriate degree of partial parallelism according to the performance requirement.
Abstract: Video surveillance requires storing massive amounts of video data, which results in a rapid increase in storage energy consumption. With the popularization of video surveillance, green storage for it is becoming very attractive. Existing energy-saving methods for massive storage mostly concentrate on data centers, which mainly see random access, whereas video surveillance storage has inherent workload characteristics and access patterns that can be fully exploited to save more energy. A dynamic partial-parallel data layout (DPPDL) is proposed for green video surveillance storage. It adopts a dynamic partial-parallel strategy, which dynamically allocates storage space with an appropriate degree of partial parallelism according to the performance requirement. Partial parallelism benefits energy conservation by scheduling only a subset of the disks to work, while a dynamic degree of parallelism provides appropriate performance for workloads of varying intensity. DPPDL is evaluated on a simulated video surveillance system consisting of 60–300 cameras at 1920 × 1080 pixels. The experiments show that DPPDL is the most energy efficient, while tolerating a single disk failure and providing more than a 20% performance margin. On average, it saves 7%, 19%, 31%, 36%, 56%, and 59% more energy than CacheRAID, Semi-RAID, Hibernator, MAID, eRAID5, and PARAID, respectively.

46 citations
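
A minimal sketch of the idea behind DPPDL's dynamic degree of partial parallelism: activate only as many disks as the current recording workload needs, keep a performance margin, and leave the remaining disks in standby. All numeric parameters below (per-camera bitrate, per-disk write bandwidth, array size) are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch (not the authors' implementation) of choosing a degree of
# partial parallelism for video-surveillance storage. Parameter values are
# assumptions for illustration only.
import math

def choose_active_disks(num_cameras: int,
                        mbps_per_camera: float = 8.0,     # assumed 1080p stream bitrate
                        disk_write_mbps: float = 120.0,   # assumed sustained disk write bandwidth
                        margin: float = 0.20,             # >20% performance margin, as in the paper
                        total_disks: int = 12) -> int:
    """Return how many disks to keep spinning; the rest can stay in standby."""
    required_mbps = num_cameras * mbps_per_camera * (1.0 + margin)
    active = math.ceil(required_mbps / disk_write_mbps)
    # Never exceed the array size; at least one disk must stay active.
    return max(1, min(active, total_disks))

if __name__ == "__main__":
    for cams in (60, 180, 300):
        print(cams, "cameras ->", choose_active_disks(cams), "active disks")
```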

Proceedings ArticleDOI
13 Dec 2014
TL;DR: This work presents a general analysis framework for automatically and efficiently mapping nested patterns onto GPUs, compares the performance of the automatically selected mappings to hand-optimized implementations on multiple benchmarks, and shows that the average performance gap on 7 out of 8 benchmarks is 24%.
Abstract: Recent work has explored using higher-level languages to improve programmer productivity on GPUs. These languages often utilize high-level computation patterns (e.g., Map and Reduce) that encode parallel semantics to enable automatic compilation to GPU kernels. However, the problem of efficiently mapping patterns to GPU hardware becomes significantly more difficult when the patterns are nested, which is common in non-trivial applications. To address this issue, we present a general analysis framework for automatically and efficiently mapping nested patterns onto GPUs. The analysis maps nested patterns onto a logical multidimensional domain and parameterizes the block size and degree of parallelism in each dimension. We then add GPU-specific hard and soft constraints to prune the space of possible mappings and select the best mapping. We also perform multiple compiler optimizations that are guided by the mapping to avoid dynamic memory allocations and automatically utilize shared memory within GPU kernels. We compare the performance of our automatically selected mappings to hand-optimized implementations on multiple benchmarks and show that the average performance gap on 7 out of 8 benchmarks is 24%. Furthermore, our mapping strategy outperforms simple 1D mappings and existing 2D mappings by up to 28.6x and 9.6x, respectively.

46 citations
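
The block-shape part of the search described above can be pictured as enumerating candidate 2D thread-block configurations, discarding those that violate GPU hard constraints, and ranking the survivors with a soft-constraint score. The sketch below illustrates that idea for a two-level nested Map; the size limits and the scoring heuristic are assumptions for illustration, not the paper's actual analysis.

```python
# Illustrative sketch of a mapping search (not the paper's compiler):
# enumerate 2D thread-block shapes for a nested Map over an (outer x inner)
# domain, prune with GPU hard constraints, and rank with a simple soft score.
from itertools import product

MAX_THREADS_PER_BLOCK = 1024   # hard limit on current CUDA GPUs
WARP_SIZE = 32

def candidate_mappings(outer: int, inner: int):
    sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
    for bx, by in product(sizes, sizes):
        if bx * by > MAX_THREADS_PER_BLOCK:        # hard: block size limit
            continue
        if bx > inner or by > outer:               # hard: block must fit the domain
            continue
        # Soft score: prefer full warps along the inner (coalesced) dimension
        # and a reasonably large total block.
        score = (1.0 if bx % WARP_SIZE == 0 else 0.25) * (bx * by)
        yield score, (bx, by)

def best_mapping(outer: int, inner: int):
    return max(candidate_mappings(outer, inner))[1]

if __name__ == "__main__":
    print(best_mapping(outer=10_000, inner=256))   # e.g. (256, 4)
```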

Proceedings ArticleDOI
27 Mar 2017
TL;DR: This work presents an implementation of the pure version of the Smith-Waterman algorithm, including the key architectural optimizations needed to achieve the highest possible performance on a given platform, and leverages the Berkeley roofline model to track performance and guide the optimizations.
Abstract: Smith-Waterman is a dynamic programming algorithm that plays a key role in the modern genomics pipeline, as it is guaranteed to find the optimal local alignment between two strings of data. The state of the art includes many hardware acceleration solutions implemented to exploit the high degree of parallelism available in this algorithm. The majority of these implementations use heuristics to increase the performance of the system at the expense of the accuracy of the result. In this work, we present an implementation of the pure version of the algorithm. We include the key architectural optimizations needed to achieve the highest possible performance for a given platform, and we leverage the Berkeley roofline model to track performance and guide the optimizations. To achieve scalability, our custom design comprises systolic arrays, data compression features, and shift registers, while a custom port mapping strategy aims to maximize performance. Our designs are built leveraging an OpenCL-based design entry, namely Xilinx SDAccel, in conjunction with Xilinx Virtex 7 and Kintex Ultrascale platforms. Our final design achieves a performance of 42.47 GCUPS (giga cell updates per second) with an energy efficiency of 1.6988 GCUPS/W. This represents an improvement of 1.72x in performance and energy efficiency over previously published FPGA implementations, and an 8.49x improvement in energy efficiency over comparable GPU implementations.

45 citations
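
The high degree of parallelism these accelerators exploit comes from the structure of the Smith-Waterman recurrence: all cells on the same anti-diagonal of the scoring matrix are mutually independent, which is exactly the wavefront a systolic array pipelines. Below is a reference sketch of the recurrence with a linear gap penalty; the scoring parameters are assumed for illustration and may differ from the paper's.

```python
# Reference sketch of Smith-Waterman with a linear gap penalty, swept along
# anti-diagonals (i + j constant) to make the available parallelism explicit:
# every cell on one anti-diagonal depends only on earlier anti-diagonals.

def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for d in range(2, rows + cols - 1):            # anti-diagonal index d = i + j
        for i in range(max(1, d - cols + 1), min(rows - 1, d - 1) + 1):
            j = d - i                              # cells in this loop are independent
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,     # match / mismatch
                          H[i - 1][j] + gap,       # gap in b
                          H[i][j - 1] + gap)       # gap in a
            best = max(best, H[i][j])
    return best

if __name__ == "__main__":
    print(smith_waterman("GGTTGACTA", "TGTTACGG"))  # optimal local alignment score
```

In a systolic-array realization, one processing element is typically assigned per column of the matrix, so each clock cycle advances the computation by one anti-diagonal.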

Journal ArticleDOI
TL;DR: This article presents an iterative method by which a test can be dichotomized into parallel halves ensuring maximum split-half reliability; no assumption is made regarding the form or availability of a reference test.
Abstract: The paper addresses an iterative method by which a test can be dichotomized into parallel halves ensuring maximum split-half reliability. The method assumes the availability of data on scores of binary items. Since it aims at splitting a test into parallel halves, no assumption is made regarding the form or availability of a reference test. Empirical verification is also provided, and other properties of the iterative method are discussed. New measures of the degree of parallelism are given. Simultaneous testing of the single multidimensional hypothesis of equality of mean, variance, and correlation of parallel tests can also be carried out by testing the equality of regression lines of test scores on scores of each of the parallel halves, by ANOVA, or by Mahalanobis D2. The iterative method can be extended to find the split-half reliability of a battery of tests. The method thus answers the long-standing question of splitting a test uniquely into parallel halves while ensuring the maximum value of the split-half reliability, and it may be adopted when reporting a test.

44 citations
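
The quantity the iterative method maximizes can be made concrete with a small sketch: for one candidate split of the items into two halves, correlate the half scores and apply the Spearman-Brown correction. The toy data and the particular split below are assumptions for illustration; the paper's iteration searches over such splits to maximize this value.

```python
# Minimal sketch: split-half reliability of a binary-item test for a given
# partition of the items, stepped up with the Spearman-Brown formula.
from statistics import correlation   # Python 3.10+

def split_half_reliability(scores, half_a, half_b):
    """scores: list of per-person binary item vectors; half_a/half_b: item indices."""
    a = [sum(person[i] for i in half_a) for person in scores]
    b = [sum(person[i] for i in half_b) for person in scores]
    r = correlation(a, b)          # correlation between the two half scores
    return 2 * r / (1 + r)         # Spearman-Brown corrected reliability

if __name__ == "__main__":
    # 6 respondents x 4 binary items (toy data, assumed for illustration).
    scores = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0],
              [1, 1, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1]]
    print(split_half_reliability(scores, half_a=[0, 2], half_b=[1, 3]))
```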

Proceedings ArticleDOI
J. Rivoir
14 Jul 2004
TL;DR: In this paper, the authors show quantitatively that parallel test is a much more effective test cost reduction method than low-cost ATE, because it reduces all test cost contributors, not only the capital cost of ATE.
Abstract: Today's manufacturers of high-volume consumer devices are under tremendous cost pressure and consequently under extreme pressure to reduce the cost of test. Low-cost ATE has often been promoted as the obvious solution. Parallel test is another well-known approach, in which multiple devices are tested in parallel (multi-site test) and/or multiple blocks within one device are tested in parallel (concurrent test). This paper shows quantitatively that parallel test is a much more effective test cost reduction method than low-cost ATE, because it reduces all test cost contributors, not only the capital cost of ATE. It also shows that the optimum number of sites is relatively insensitive to ATE capital cost, operating cost, yield, and various limiting factors, but the cost benefits diminish quickly if limited independent ATE resources reduce the degree of parallelism and force a partially sequential test.

44 citations
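
Why parallel test attacks every cost contributor can be seen from a toy cost-per-device model: ATE capital and operating cost and handler time are amortized over the number of sites, while an imperfect multi-site efficiency term captures the partially sequential testing that limited independent ATE resources force. The model form and all parameter values below are illustrative assumptions, not Rivoir's equations.

```python
# Toy multi-site test cost model (an assumption for illustration).

def cost_per_device(n_sites: int,
                    test_time_s: float = 2.0,      # single-site test time (assumed)
                    ate_rate_per_s: float = 0.02,  # ATE capital + operating cost per second (assumed)
                    index_time_s: float = 0.5,     # handler index/load time per insertion (assumed)
                    multisite_eff: float = 0.95) -> float:
    # With multi-site efficiency MSE, the N-site test time grows as
    # T(N) = T(1) * (1 + (N - 1) * (1 - MSE)); MSE = 1 means perfect parallelism.
    t_n = test_time_s * (1 + (n_sites - 1) * (1 - multisite_eff))
    return (t_n + index_time_s) * ate_rate_per_s / n_sites

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        print(f"{n:2d} sites -> {cost_per_device(n):.4f} $/device")
```

In this toy model the cost per device keeps falling as sites are added, but the marginal saving shrinks as the efficiency loss accumulates, mirroring the diminishing returns the paper describes.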


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 85% related
Scheduling (computing): 78.6K papers, 1.3M citations, 83% related
Network packet: 159.7K papers, 2.2M citations, 80% related
Web service: 57.6K papers, 989K citations, 80% related
Quality of service: 77.1K papers, 996.6K citations, 79% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2022    1
2021    47
2020    48
2019    52
2018    70
2017    75