Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over its lifetime, 1,515 publications have been published within this topic, receiving 25,546 citations.


Papers
Journal ArticleDOI
TL;DR: A dynamic partial-parallel data layout (DPPDL) is proposed for green video surveillance storage; it dynamically allocates storage space with an appropriate degree of partial parallelism according to the performance requirement.
Abstract: Video surveillance requires storing massive amounts of video data, which results in a rapid increase in storage energy consumption. With the popularization of video surveillance, green storage for it is becoming very attractive. Existing energy-saving methods for massive storage mostly concentrate on data centers, which mainly see random access, whereas video surveillance storage has inherent workload characteristics and access patterns that can be fully exploited to save more energy. A dynamic partial-parallel data layout (DPPDL) is proposed for green video surveillance storage. It adopts a dynamic partial-parallel strategy, which dynamically allocates storage space with an appropriate degree of partial parallelism according to the performance requirement. Partial parallelism benefits energy conservation by scheduling only a subset of the disks to work, while a dynamic degree of parallelism provides appropriate performance for workloads of varying intensity. DPPDL is evaluated on a simulated video surveillance system consisting of 60–300 cameras at 1920 × 1080 pixels. The experiments show that DPPDL is the most energy efficient, while tolerating a single disk failure and providing more than a 20% performance margin. On average, it saves 7%, 19%, 31%, 36%, 56%, and 59% more energy than CacheRAID, Semi-RAID, Hibernator, MAID, eRAID5, and PARAID, respectively.

46 citations
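
A minimal sketch of the idea behind DPPDL's dynamic degree of partial parallelism: activate only as many disks as the current recording workload needs, keep a performance margin, and leave the remaining disks in standby. All numeric parameters below (per-camera bitrate, per-disk write bandwidth, array size) are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch (not the authors' implementation) of choosing a degree of
# partial parallelism for video-surveillance storage. Parameter values are
# assumptions for illustration only.
import math

def choose_active_disks(num_cameras: int,
                        mbps_per_camera: float = 8.0,     # assumed 1080p stream bitrate
                        disk_write_mbps: float = 120.0,   # assumed sustained disk write bandwidth
                        margin: float = 0.20,             # >20% performance margin, as in the paper
                        total_disks: int = 12) -> int:
    """Return how many disks to keep spinning; the rest can stay in standby."""
    required_mbps = num_cameras * mbps_per_camera * (1.0 + margin)
    active = math.ceil(required_mbps / disk_write_mbps)
    # Never exceed the array size; at least one disk must stay active.
    return max(1, min(active, total_disks))

if __name__ == "__main__":
    for cams in (60, 180, 300):
        print(cams, "cameras ->", choose_active_disks(cams), "active disks")
```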

Proceedings ArticleDOI
13 Dec 2014
TL;DR: This work presents a general analysis framework for automatically and efficiently mapping nested patterns onto GPUs, compares the performance of the automatically selected mappings to hand-optimized implementations on multiple benchmarks, and shows that the average performance gap on 7 out of 8 benchmarks is 24%.
Abstract: Recent work has explored using higher-level languages to improve programmer productivity on GPUs. These languages often utilize high-level computation patterns (e.g., Map and Reduce) that encode parallel semantics to enable automatic compilation to GPU kernels. However, the problem of efficiently mapping patterns to GPU hardware becomes significantly more difficult when the patterns are nested, which is common in non-trivial applications. To address this issue, we present a general analysis framework for automatically and efficiently mapping nested patterns onto GPUs. The analysis maps nested patterns onto a logical multidimensional domain and parameterizes the block size and degree of parallelism in each dimension. We then add GPU-specific hard and soft constraints to prune the space of possible mappings and select the best mapping. We also perform multiple compiler optimizations that are guided by the mapping to avoid dynamic memory allocations and automatically utilize shared memory within GPU kernels. We compare the performance of our automatically selected mappings to hand-optimized implementations on multiple benchmarks and show that the average performance gap on 7 out of 8 benchmarks is 24%. Furthermore, our mapping strategy outperforms simple 1D mappings and existing 2D mappings by up to 28.6x and 9.6x, respectively.

46 citations
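
The block-shape part of the search described above can be pictured as enumerating candidate 2D thread-block configurations, discarding those that violate GPU hard constraints, and ranking the survivors with a soft-constraint score. The sketch below illustrates that idea for a two-level nested Map; the size limits and the scoring heuristic are assumptions for illustration, not the paper's actual analysis.

```python
# Illustrative sketch of a mapping search (not the paper's compiler):
# enumerate 2D thread-block shapes for a nested Map over an (outer x inner)
# domain, prune with GPU hard constraints, and rank with a simple soft score.
from itertools import product

MAX_THREADS_PER_BLOCK = 1024   # hard limit on current CUDA GPUs
WARP_SIZE = 32

def candidate_mappings(outer: int, inner: int):
    sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
    for bx, by in product(sizes, sizes):
        if bx * by > MAX_THREADS_PER_BLOCK:        # hard: block size limit
            continue
        if bx > inner or by > outer:               # hard: block must fit the domain
            continue
        # Soft score: prefer full warps along the inner (coalesced) dimension
        # and a reasonably large total block.
        score = (1.0 if bx % WARP_SIZE == 0 else 0.25) * (bx * by)
        yield score, (bx, by)

def best_mapping(outer: int, inner: int):
    return max(candidate_mappings(outer, inner))[1]

if __name__ == "__main__":
    print(best_mapping(outer=10_000, inner=256))   # e.g. (256, 4)
```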

Proceedings ArticleDOI
27 Mar 2017
TL;DR: This work presents an implementation of the pure version of the Smith-Waterman algorithm, including the key architectural optimizations needed to achieve the highest possible performance on a given platform, and leverages the Berkeley roofline model to track performance and guide the optimizations.
Abstract: Smith-Waterman is a dynamic programming algorithm that plays a key role in the modern genomics pipeline, as it is guaranteed to find the optimal local alignment between two strings of data. The state of the art includes many hardware acceleration solutions implemented to exploit the high degree of parallelism available in this algorithm. The majority of these implementations use heuristics to increase the performance of the system at the expense of the accuracy of the result. In this work, we present an implementation of the pure version of the algorithm. We include the key architectural optimizations needed to achieve the highest possible performance for a given platform, and we leverage the Berkeley roofline model to track performance and guide the optimizations. To achieve scalability, our custom design comprises systolic arrays, data compression features, and shift registers, while a custom port mapping strategy aims to maximize performance. Our designs are built leveraging an OpenCL-based design entry, namely Xilinx SDAccel, in conjunction with Xilinx Virtex 7 and Kintex Ultrascale platforms. Our final design achieves a performance of 42.47 GCUPS (giga cell updates per second) with an energy efficiency of 1.6988 GCUPS/W. This represents an improvement of 1.72x in performance and energy efficiency over previously published FPGA implementations, and an 8.49x improvement in energy efficiency over comparable GPU implementations.

45 citations
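
The high degree of parallelism these accelerators exploit comes from the structure of the Smith-Waterman recurrence: all cells on the same anti-diagonal of the scoring matrix are mutually independent, which is exactly the wavefront a systolic array pipelines. Below is a reference sketch of the recurrence with a linear gap penalty; the scoring parameters are assumed for illustration and may differ from the paper's.

```python
# Reference sketch of Smith-Waterman with a linear gap penalty, swept along
# anti-diagonals (i + j constant) to make the available parallelism explicit:
# every cell on one anti-diagonal depends only on earlier anti-diagonals.

def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for d in range(2, rows + cols - 1):            # anti-diagonal index d = i + j
        for i in range(max(1, d - cols + 1), min(rows - 1, d - 1) + 1):
            j = d - i                              # cells in this loop are independent
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,     # match / mismatch
                          H[i - 1][j] + gap,       # gap in b
                          H[i][j - 1] + gap)       # gap in a
            best = max(best, H[i][j])
    return best

if __name__ == "__main__":
    print(smith_waterman("GGTTGACTA", "TGTTACGG"))  # optimal local alignment score
```

In a systolic-array realization, one processing element is typically assigned per column of the matrix, so each clock cycle advances the computation by one anti-diagonal.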

Journal ArticleDOI
TL;DR: This article presents an iterative method by which a test can be dichotomized into parallel halves ensuring maximum split-half reliability; no assumption is made regarding the form or availability of a reference test.
Abstract: The paper addresses an iterative method by which a test can be dichotomized into parallel halves ensuring maximum split-half reliability. The method assumes the availability of data on scores of binary items. Since it aims at splitting a test into parallel halves, no assumption is made regarding the form or availability of a reference test. Empirical verification is also provided, and other properties of the iterative method are discussed. New measures of the degree of parallelism are given. Simultaneous testing of the single multidimensional hypothesis of equality of mean, variance, and correlation of parallel tests can also be carried out by testing the equality of regression lines of test scores on scores of each of the parallel halves, by ANOVA, or by Mahalanobis D2. The iterative method can be extended to find the split-half reliability of a battery of tests. The method thus answers the long-standing question of splitting a test uniquely into parallel halves while ensuring the maximum value of the split-half reliability, and it may be adopted when reporting a test.

44 citations
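
The quantity the iterative method maximizes can be made concrete with a small sketch: for one candidate split of the items into two halves, correlate the half scores and apply the Spearman-Brown correction. The toy data and the particular split below are assumptions for illustration; the paper's iteration searches over such splits to maximize this value.

```python
# Minimal sketch: split-half reliability of a binary-item test for a given
# partition of the items, stepped up with the Spearman-Brown formula.
from statistics import correlation   # Python 3.10+

def split_half_reliability(scores, half_a, half_b):
    """scores: list of per-person binary item vectors; half_a/half_b: item indices."""
    a = [sum(person[i] for i in half_a) for person in scores]
    b = [sum(person[i] for i in half_b) for person in scores]
    r = correlation(a, b)          # correlation between the two half scores
    return 2 * r / (1 + r)         # Spearman-Brown corrected reliability

if __name__ == "__main__":
    # 6 respondents x 4 binary items (toy data, assumed for illustration).
    scores = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0],
              [1, 1, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1]]
    print(split_half_reliability(scores, half_a=[0, 2], half_b=[1, 3]))
```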

Proceedings ArticleDOI
J. Rivoir
14 Jul 2004
TL;DR: In this paper, the authors show quantitatively that parallel test is a much more effective test cost reduction method than low-cost ATE, because it reduces all test cost contributors, not only the capital cost of ATE.
Abstract: Today's manufacturers of high-volume consumer devices are under tremendous cost pressure and consequently under extreme pressure to reduce the cost of test. Low-cost ATE has often been promoted as the obvious solution. Parallel test is another well-known approach, in which multiple devices are tested in parallel (multi-site test) and/or multiple blocks within one device are tested in parallel (concurrent test). This paper shows quantitatively that parallel test is a much more effective test cost reduction method than low-cost ATE, because it reduces all test cost contributors, not only the capital cost of ATE. It also shows that the optimum number of sites is relatively insensitive to ATE capital cost, operating cost, yield, and various limiting factors, but the cost benefits diminish quickly if limited independent ATE resources reduce the degree of parallelism and force a partially sequential test.

44 citations
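
Why parallel test attacks every cost contributor can be seen from a toy cost-per-device model: ATE capital and operating cost and handler time are amortized over the number of sites, while an imperfect multi-site efficiency term captures the partially sequential testing that limited independent ATE resources force. The model form and all parameter values below are illustrative assumptions, not Rivoir's equations.

```python
# Toy multi-site test cost model (an assumption for illustration).

def cost_per_device(n_sites: int,
                    test_time_s: float = 2.0,      # single-site test time (assumed)
                    ate_rate_per_s: float = 0.02,  # ATE capital + operating cost per second (assumed)
                    index_time_s: float = 0.5,     # handler index/load time per insertion (assumed)
                    multisite_eff: float = 0.95) -> float:
    # With multi-site efficiency MSE, the N-site test time grows as
    # T(N) = T(1) * (1 + (N - 1) * (1 - MSE)); MSE = 1 means perfect parallelism.
    t_n = test_time_s * (1 + (n_sites - 1) * (1 - multisite_eff))
    return (t_n + index_time_s) * ate_rate_per_s / n_sites

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        print(f"{n:2d} sites -> {cost_per_device(n):.4f} $/device")
```

In this toy model the cost per device keeps falling as sites are added, but the marginal saving shrinks as the efficiency loss accumulates, mirroring the diminishing returns the paper describes.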


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 85% related
Scheduling (computing): 78.6K papers, 1.3M citations, 83% related
Network packet: 159.7K papers, 2.2M citations, 80% related
Web service: 57.6K papers, 989K citations, 80% related
Quality of service: 77.1K papers, 996.6K citations, 79% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2022    1
2021    47
2020    48
2019    52
2018    70
2017    75