Proceedings Article

Comparative study on Boruvka's implementation on hetrogenous platform with cache analysis

TL;DR: The paper presents a comparative timing study of two implementations of Boruvka's algorithm on different multi-core CPU architectures and shows that the Compressed Sparse Row (CSR) implementation outperforms the other on all platforms for different readily available benchmarks.
Abstract: This paper analyzes the performance of Boruvka's algorithm by examining different cache performance parameters in VTune Amplifier. A Minimum Spanning Tree (MST) can be computed with various algorithms, Boruvka's being the oldest and an efficient one. MSTs are widely used in applications such as search engines, pattern recognition, routing algorithms, and network design. The paper presents a comparative timing study of two different implementations of Boruvka's algorithm on different multi-core CPU architectures. It highlights the importance of GPU computing in the modern era and shows that the Compressed Sparse Row (CSR) implementation of Boruvka's algorithm outperforms the other on all platforms for different readily available benchmarks.
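As an illustration of the CSR-based approach named in the abstract, the following is a minimal sequential sketch of Boruvka's algorithm over a graph stored in CSR arrays. It is not the paper's implementation: the function name `boruvka_csr`, the array names `row_ptr`, `col_idx` and `weights`, the union-find bookkeeping, and the small example graph are all assumptions made for this sketch.

```python
def boruvka_csr(row_ptr, col_idx, weights):
    """Return (total_weight, mst_edges) for an undirected graph in CSR form.

    Assumed layout: the neighbours of vertex v are col_idx[row_ptr[v]:row_ptr[v + 1]],
    with matching edge weights, and every undirected edge is stored twice.
    """
    n = len(row_ptr) - 1
    parent = list(range(n))  # union-find forest over components

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    total, mst, components = 0, [], n
    while components > 1:
        cheapest = {}  # component root -> (weight, u, v) of its lightest outgoing edge
        for u in range(n):
            ru = find(u)
            for k in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[k]
                if find(v) == ru:
                    continue  # edge stays inside the component
                # (weight, min, max) gives a consistent tie-break between equal weights
                cand = (weights[k], min(u, v), max(u, v))
                if ru not in cheapest or cand < cheapest[ru]:
                    cheapest[ru] = cand
        if not cheapest:
            break  # graph is disconnected; a spanning forest was built
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:  # skip an edge already added from the other side
                parent[ru] = rv
                total += w
                mst.append((u, v, w))
                components -= 1
    return total, mst


# Hypothetical 4-vertex graph: edges 0-1 (1), 0-2 (4), 1-2 (2), 1-3 (5), 2-3 (3)
row_ptr = [0, 2, 5, 8, 10]
col_idx = [1, 2, 0, 2, 3, 0, 1, 3, 1, 2]
weights = [1, 4, 1, 2, 5, 4, 2, 3, 5, 3]
total, mst = boruvka_csr(row_ptr, col_idx, weights)
print(total, mst)  # total weight is 6
```

Each round scans the contiguous CSR rows once, which is the kind of access pattern a cache analysis would look at; the sketch itself makes no claim about measured performance.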
Citations
Proceedings Article
01 Jan 2017
TL;DR: In this paper, the authors present state-of-the-art performance tools for leading-edge HPC systems founded on the Score-P community instrumentation and measurement infrastructure, demonstrating how they can be used for performance engineering of effective scientific applications based on standard MPI or OpenMP.
Abstract: This tutorial presents state-of-the-art performance tools for leading-edge HPC systems founded on the Score-P community instrumentation and measurement infrastructure, demonstrating how they can be used for performance engineering of effective scientific applications based on standard MPI or OpenMP and now common mixed-mode hybrid parallelizations. Parallel performance evaluation tools from the Virtual Institute - High Productivity Supercomputing (VI-HPS) are introduced and featured in hands-on exercises with Periscope, Scalasca, Vampir and TAU. We cover all aspects of performance engineering practice, including instrumentation, measurement (profiling and tracing, timing and hardware counters), data storage, analysis and visualization. Emphasis is placed on how tools are used in combination for identifying performance problems and investigating optimization alternatives, illustrated with a case study using a major application code.
References
Journal Article
TL;DR: The minimum spanning tree problem has several apparently independent sources and algorithmic solutions, which appeared in Czechoslovakia, France, and Poland, going back to the beginning of this century; the paper explores and compares these works and their motivations.
Abstract: It is standard practice among authors discussing the minimum spanning tree problem to refer to the work of Kruskal (1956) and Prim (1957) as the sources of the problem and its first efficient solutions, despite the citation by both of Boruvka (1926) as a predecessor. In fact, there are several apparently independent sources and algorithmic solutions of the problem. They have appeared in Czechoslovakia, France, and Poland, going back to the beginning of this century. We shall explore and compare these works and their motivations, and relate them to the most recent advances on the minimum spanning tree problem.

788 citations


"Comparative study on Boruvka's impl..." refers background in this paper

  • ...These systems contain multiple processors with a powerful CPU and GPU core to accelerate the computation [1]....


Journal Article
TL;DR: The first English translation of both of Borůvka's pioneering works, which are generally regarded as a cornerstone of combinatorial optimization, is presented.

322 citations


"Comparative study on Boruvka's impl..." refers background or methods in this paper

  • ...Traditionally, the GPU operates in coordination with the CPU, where the CPU offloads parallel parts of the computation to the GPU [4]....


  • ...In a GPU, the cache is used to localize data during volume rendering [17], while in a CPU, it is used to localize data during memory references....


01 Jan 2012
TL;DR: The aim of this article is to simplify the process of getting started with GPU programming, by giving an overview of current GPU programming strategies, profile-driven development, and an outlook to future trends.
Abstract: Over the last decade, there has been a growing interest in the use of graphics processing units (GPUs) for nongraphics applications. From early academic proof-of-concept papers around the year 2000, the use of GPUs has now matured to a point where there are countless industrial applications. Together with the expanding use of GPUs, we have also seen a tremendous development in the programming languages and tools, and getting started programming GPUs has never been easier. However, whilst getting started with GPU programming can be simple, being able to fully utilize GPU hardware is an art that can take months and years to master. The aim of this article is to simplify this process, by giving an overview of current GPU programming strategies, profile driven development, and an outlook to future trends.

240 citations


"Comparative study on Boruvka's impl..." refers methods in this paper

  • ...In the second method [12], we use the CSR (Compressed Sparse Row) format to represent the graph and then find the MST. The MST-solver algorithm is mostly implemented on the CPU....
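
The CSR representation mentioned in this excerpt can be sketched as follows. This is a generic construction from an undirected edge list, assuming each edge is stored once per endpoint; the function name `edges_to_csr` and the array names are hypothetical and not taken from the paper.

```python
def edges_to_csr(num_vertices, edges):
    """Convert undirected (u, v, w) edges into CSR arrays (row_ptr, col_idx, weights)."""
    # Count how many adjacency entries each vertex needs.
    degree = [0] * num_vertices
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1

    # Prefix sum of the degrees gives the start offset of every row.
    row_ptr = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_ptr[v + 1] = row_ptr[v] + degree[v]

    col_idx = [0] * row_ptr[-1]
    weights = [0] * row_ptr[-1]
    cursor = row_ptr[:-1].copy()  # next free slot in each row
    for u, v, w in edges:
        for a, b in ((u, v), (v, u)):  # store the edge in both directions
            col_idx[cursor[a]] = b
            weights[cursor[a]] = w
            cursor[a] += 1
    return row_ptr, col_idx, weights


# Hypothetical triangle graph with edge weights 1, 4 and 2
row_ptr, col_idx, weights = edges_to_csr(3, [(0, 1, 1), (0, 2, 4), (1, 2, 2)])
print(row_ptr)  # [0, 2, 4, 6]
print(col_idx)  # [1, 2, 0, 2, 0, 1]
print(weights)  # [1, 4, 1, 2, 4, 2]
```

The three flat arrays keep each vertex's neighbours contiguous in memory, which is the property that makes CSR attractive when cache behaviour matters.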


Proceedings Article
01 Aug 2009
TL;DR: This paper presents a minimum spanning tree algorithm on Nvidia GPUs under CUDA, as a recursive formulation of Borůvka's approach for undirected graphs, implemented using scalable primitives such as scan, segmented scan and split.
Abstract: Graphics Processor Units are used for many general-purpose computations owing to the high compute power available on them. Regular, data-parallel algorithms map well to the SIMD architecture of current GPUs. Irregular algorithms on discrete structures like graphs are harder to map to them. Efficient data-mapping primitives can play a crucial role in mapping such algorithms onto the GPU. In this paper, we present a minimum spanning tree algorithm on Nvidia GPUs under CUDA, as a recursive formulation of Boruvka's approach for undirected graphs. We implement it using scalable primitives such as scan, segmented scan and split. The irregular steps of supervertex formation and recursive graph construction are mapped to primitives like split to categories involving vertex ids and edge weights. We obtain 30 to 50 times speedup over the CPU implementation on most graphs and 3 to 10 times speedup over our previous GPU implementation. We construct the minimum spanning tree on a 5 million node and 30 million edge graph in under 1 second on one quarter of the Tesla S1070 GPU.
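
To make the segmented scan primitive named in this abstract concrete, here is a sequential Python sketch of a segmented min-scan used to find the lightest edge of every vertex. It only illustrates the semantics of the primitive; the cited work uses parallel CUDA primitives, and the helper names `segment_flags` and `segmented_min_scan` as well as the example arrays are assumptions of this sketch.

```python
def segment_flags(row_ptr):
    """Mark the first adjacency entry of every vertex with 1 (segment head)."""
    flags = [0] * row_ptr[-1]
    for start in row_ptr[:-1]:
        if start < len(flags):  # guard against trailing vertices with no edges
            flags[start] = 1
    return flags


def segmented_min_scan(values, flags):
    """Inclusive min-scan that restarts at every position where flags[i] is 1."""
    out = []
    for value, flag in zip(values, flags):
        if flag or not out:
            out.append(value)                # new segment: restart the running minimum
        else:
            out.append(min(out[-1], value))  # continue the current segment
    return out


# Hypothetical edge weights grouped per vertex (4 vertices, 10 adjacency entries)
row_ptr = [0, 2, 5, 8, 10]
weights = [1, 4, 1, 2, 5, 4, 2, 3, 5, 3]
scanned = segmented_min_scan(weights, segment_flags(row_ptr))
# The last entry of each segment holds that vertex's minimum edge weight.
min_per_vertex = [scanned[row_ptr[v + 1] - 1] for v in range(len(row_ptr) - 1)]
print(scanned)         # [1, 1, 1, 1, 1, 4, 2, 2, 5, 3]
print(min_per_vertex)  # [1, 1, 2, 3]
```

On a GPU the same result would be produced by a parallel segmented scan over all edges at once; the loop here is only meant to show what that primitive computes.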

126 citations


"Comparative study on Boruvka's impl..." refers background in this paper

  • ...Traditionally, the GPU operates in coordination with the CPU, where the CPU offloads parallel parts of the computation to the GPU [4]....


Journal Article
TL;DR: This work demonstrates the implementation of the FETI method to a hybrid CPU–GPU computing environment and reveals the tremendous potential of this type of hybrid computing environment as a result of the full exploitation of multi-core CPU hardware resources and the intrinsic software and hardware features of the GPUs.

111 citations


"Comparative study on Boruvka's impl..." refers background in this paper

  • ...Keywords: performance; Boruvka's; cache; VTune; MST; oldest; time; architectures; GPU; CSR; outperforms; benchmarks. I. INTRODUCTION. In the new era of modern computing, new types of heterogeneous computers have begun to emerge [15]....
