Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over the lifetime, 1515 publications have been published within this topic receiving 25546 citations.


Papers
Proceedings ArticleDOI
01 Sep 2021
TL;DR: Zhang et al. propose a near-data processing (NDP) architecture based on 3D-stacking that distributes key operations across NDP cores to exploit a high degree of parallelism and high memory bandwidth.
Abstract: De novo assembly of genomes for which there is no reference is essential for novel species discovery and metagenomics. In this work, we accelerate two key performance bottlenecks of DBG-based assembly, graph construction and graph traversal, with a near-data processing (NDP) architecture based on 3D-stacking. The proposed framework distributes key operations across NDP cores to exploit a high degree of parallelism and high memory bandwidth. We propose several optimizations based on domain-specific properties to improve the performance of our design. We integrate the proposed techniques into an existing DBG assembly tool, and our simulation-based evaluation shows that the proposed NDP implementation can improve the performance of graph construction by 33× and traversal by 16× compared to the state-of-the-art.
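The paper targets custom 3D-stacked hardware, but the partitioning idea behind distributing graph construction can be illustrated in ordinary software. The sketch below is purely illustrative (k-mer length, shard count, and function names are assumptions, not taken from the paper): k-mers are assigned to shards by a deterministic hash so that each shard of the de Bruijn graph can be built independently, which is roughly the kind of independence the NDP cores exploit.

```python
# Illustrative software analogy, not the paper's hardware design.
import zlib
from collections import defaultdict

K = 31           # k-mer length (assumed value)
NUM_SHARDS = 8   # stand-in for the number of NDP cores

def shard_of(kmer: str) -> int:
    # Deterministic hash so every worker agrees on k-mer ownership.
    return zlib.crc32(kmer.encode()) % NUM_SHARDS

def build_dbg_shards(reads):
    """Build one de Bruijn graph shard per 'core'; each shard maps a
    k-mer to the set of its successor k-mers seen in the reads."""
    shards = [defaultdict(set) for _ in range(NUM_SHARDS)]
    for read in reads:
        for i in range(len(read) - K):
            node, succ = read[i:i + K], read[i + 1:i + K + 1]
            shards[shard_of(node)][node].add(succ)  # edge owned by node's shard
    return shards
```

Each shard can then be traversed in parallel; walks that cross a shard boundary would require cross-core communication, which is where hardware support for high memory bandwidth pays off.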

4 citations

Patent
11 Apr 2012
TL;DR: In this article, a signal capture method for a satellite navigation system and a corresponding apparatus are described, in which a test of whether the phases of an energy peak value are consistent is combined with a comparison of the energy peak value against a threshold, so that whether a satellite signal is visible, and the frequency and code phase of that signal, can be determined.
Abstract: The invention provides a signal capture method in a satellite navigation system and an apparatus thereof. In the method and apparatus of the invention, a test of whether the phases of an energy peak value are consistent is combined with a comparison of the energy peak value against a threshold, so that whether a satellite signal is visible, and the frequency and code phase of the satellite signal, can be determined. The traditional capture approach of comparing the detected energy peak value against a threshold alone is not used, so capture time can be effectively reduced. With the degree of parallelism kept unchanged, a receiver using the method of the invention can be started rapidly; alternatively, if the same start-up time is required, the method allows the degree of parallelism of the energy monitor to be reduced, which lowers hardware implementation cost.
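As a rough illustration of the combined test described above, the following sketch (hypothetical function, data layout, and tolerances, none of which come from the patent) declares a satellite visible only when the strongest correlation-energy peak both exceeds a threshold and stays at a consistent location across successive integration rounds.

```python
# Hypothetical sketch of combining a peak-consistency test with a threshold test.
import numpy as np

def acquire(energy_maps, threshold, phase_tolerance=2):
    """energy_maps: list of 2-D arrays (Doppler bins x code phases),
    one per integration round (assumed input format)."""
    peaks = []
    for emap in energy_maps:
        dop, phase = np.unravel_index(np.argmax(emap), emap.shape)
        peaks.append((dop, phase, emap[dop, phase]))

    dops, phases, vals = zip(*peaks)
    # Consistency test: the peak location barely moves between rounds.
    consistent = (max(phases) - min(phases) <= phase_tolerance
                  and max(dops) - min(dops) <= 1)
    # Threshold test on the strongest observed peak.
    strong = max(vals) > threshold

    if consistent and strong:
        return {"visible": True, "doppler_bin": dops[-1], "code_phase": phases[-1]}
    return {"visible": False}
```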

4 citations

01 Aug 2013
TL;DR: This position paper puts forward possible ways to achieve preconditioned Krylov solvers that efficiently use all the resources on many-core chips and are extremely scalable on massively parallel machines.
Abstract: For many HPC codes the major cost lies in the solution of large sparse linear systems [1]. For many problems, the methods of choice for such systems are preconditioned Krylov solvers. However, Krylov solvers are hard to scale to a large number of cores due to two main bottlenecks: the inter-node latency and the on-node bandwidth. In this position paper, we review recently proposed techniques to overcome each of these bottlenecks and we put forward possible ways to achieve preconditioned Krylov solvers that efficiently use all the resources on many-core chips and are extremely scalable on massively parallel machines. A key ingredient in our approach is the use of stencil compilers.

Future supercomputers will have a large number of nodes, each being a many-core processor. In addition, the cores will feature vector processing units (VPUs) with very long vectors. New algorithms and software should exploit these three levels of parallelism. On massively parallel machines, global communication should be avoided as much as possible: it is very expensive due to the large latency on the wire, unless it can be overlapped with calculations. In Krylov solvers, there are usually at least two such global communication phases per iteration, used for orthogonalization and normalization of the Krylov basis vectors. In the standard formulation of most Krylov methods, there is no possibility to overlap this communication with local work, which leads to a bulk synchronous execution pattern that leaves many resources idle.

Recently, pipelined Krylov methods [6, 7] reorganized the algorithms to need only one global reduction per iteration. The reduction's latency can be overlapped with other work such as the (preconditioned) sparse matrix-vector product ((P)SpMV). While the reduction takes place in the background, new Krylov basis vectors can be computed using the (P)SpMV. An orthogonalization and normalization step is performed only once enough (P)SpMVs have been computed to completely hide the global communication latency. This deferred orthogonalization obviously changes the numerical properties of the Krylov algorithm; however, since only very few (P)SpMVs are required to completely hide the global latency, numerical stability is only mildly affected. This can be remedied by introducing shifts in the (P)SpMV that prevent the basis vectors from aligning with the dominant eigenvector, which results in an improved Krylov basis. Pipelined methods lift the main bottleneck for scaling Krylov solvers to extreme numbers of cores, and the resulting solver scales as well as the (P)SpMV.

For many applications, good scalability for the sparse matrix-vector product (SpMV) can be achieved even on 100k cores, if the problem is partitioned such that there is only local communication. For the preconditioner, there is typically a trade-off between parallelism and efficiency; we expect pipelined methods to be better suited for cheap preconditioners with a high degree of parallelism. Although the (P)SpMV may scale well on a distributed memory system, the on-node performance may still be poor. Within a node, the different threads have to share the available
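The core trick of the pipelined methods cited above is hiding the latency of the single global reduction behind local work. The minimal mpi4py sketch below shows only that overlap, a non-blocking all-reduce running while the local sparse matrix-vector product is computed; it is not the actual pipelined recurrences of [6, 7], and the function and variable names are assumptions for illustration.

```python
# Minimal sketch of latency hiding with a non-blocking reduction (mpi4py).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

def overlapped_step(A_local, p_local, r_local):
    # Local contribution to a global dot product needed later in the iteration.
    local_dot = np.array([r_local @ r_local])
    global_dot = np.empty(1)
    req = comm.Iallreduce(local_dot, global_dot, op=MPI.SUM)  # starts in the background

    s_local = A_local @ p_local   # local SpMV runs while the reduction is in flight

    req.Wait()                    # the reduction result is only needed now
    return s_local, global_dot[0]
```

In a bulk synchronous formulation the dot product would block before the SpMV could start; reordering the algorithm so that the two can overlap is exactly what removes the latency bottleneck described above.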

4 citations

Journal ArticleDOI
TL;DR: This work describes a multiphase partitioning approach for the general case of DOALL loops that achieves a computation-plus-communication load-balanced partitioning through static data and iteration-space distribution.
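As a small illustration of static iteration-space distribution for a DOALL loop (the helper below is hypothetical and not taken from the paper), contiguous iteration blocks can be sized from statically known per-iteration costs so that each worker receives roughly equal total work rather than an equal number of iterations.

```python
# Illustrative static partitioning of a DOALL iteration space by cost.
import numpy as np

def balanced_blocks(costs, num_workers):
    """Split range(len(costs)) into num_workers contiguous blocks whose
    summed costs are approximately equal (costs assumed known statically)."""
    cum = np.cumsum(costs)
    total = cum[-1]
    bounds = [0]
    for w in range(1, num_workers):
        # first index whose cumulative cost reaches w/num_workers of the total
        bounds.append(int(np.searchsorted(cum, total * w / num_workers)))
    bounds.append(len(costs))
    return [(bounds[i], bounds[i + 1]) for i in range(num_workers)]

# Example: iteration i costs about i (e.g. a triangular loop nest),
# so later blocks contain fewer iterations.
print(balanced_blocks(np.arange(1, 1001), 4))
```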

4 citations

Proceedings ArticleDOI
29 Mar 1976
TL;DR: The power of this modeling technique with respect to comprehensibility, accuracy of representation and ease of validation and modification is demonstrated by application to modeling of the UT-2 operating system for the CDC 6000 series computer system.
Abstract: This paper defines and determines a graph model of a computer system in a form applicable to system performance analysis. The power of this modeling technique with respect to comprehensibility, accuracy of representation, and ease of validation and modification is demonstrated by application to modeling of the UT-2 operating system for the CDC 6000 series computer system. This multiprocessor, multi-programmed operating system with its high degree of parallelism provides an excellent test for the utility and range of application of graph models in performance evaluation. A programmed representation of the kernel monitor is used; all other system processes are represented in graph form and are input data to the simulator. A generally applicable technique for extracting graph representations of processes from event trace data is described and applied to the event trace generated by the UT-2 operating system. The technique is both complete and general and may be profitably applied to either partial or complete models of any type of complex computer system process where data or techniques for automated graph construction are available or can be applied.
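The sketch below illustrates, under assumed event names and trace format (the paper's own trace format is not given here), the kind of automated graph construction from an event trace that the abstract describes: counting transitions between successive events of each process yields a small transition graph that could serve as simulator input.

```python
# Illustrative graph extraction from an event trace (assumed format).
from collections import Counter, defaultdict

def graph_from_trace(trace):
    """trace: list of (process_id, event_name) tuples in time order.
    Returns, per process, a map event -> Counter of successor events."""
    last_event = {}
    graphs = defaultdict(lambda: defaultdict(Counter))
    for pid, event in trace:
        if pid in last_event:
            graphs[pid][last_event[pid]][event] += 1
        last_event[pid] = event
    return graphs

trace = [(1, "request_io"), (2, "compute"), (1, "wait"),
         (1, "io_done"), (2, "request_io"), (1, "compute")]
for pid, g in graph_from_trace(trace).items():
    print(pid, {k: dict(v) for k, v in g.items()})
```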

4 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 85% related
Scheduling (computing): 78.6K papers, 1.3M citations, 83% related
Network packet: 159.7K papers, 2.2M citations, 80% related
Web service: 57.6K papers, 989K citations, 80% related
Quality of service: 77.1K papers, 996.6K citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2022  1
2021  47
2020  48
2019  52
2018  70
2017  75