Falcon: A Graph Manipulation Language for Heterogeneous Systems
TL;DR
A domain-specific language (DSL), Falcon, is proposed for implementing graph algorithms; it abstracts the hardware, provides constructs to write explicitly parallel programs at a higher level, and can work with general algorithms that may change the graph structure.

Abstract
Graph algorithms have been shown to possess enough parallelism to keep several computing resources busy—even hundreds of cores on a GPU. Unfortunately, tuning their implementation for efficient execution on a particular hardware configuration of heterogeneous systems consisting of multicore CPUs and GPUs is challenging, time-consuming, and error prone. To address these issues, we propose a domain-specific language (DSL), Falcon, for implementing graph algorithms that (i) abstracts the hardware, (ii) provides constructs to write explicitly parallel programs at a higher level, and (iii) can work with general algorithms that may change the graph structure (morph algorithms). We illustrate the usage of our DSL to implement local computation algorithms (that do not change the graph structure) and morph algorithms such as Delaunay mesh refinement, survey propagation, and dynamic SSSP on GPU and multicore CPUs. Using a set of benchmark graphs, we illustrate that the generated code performs close to the state-of-the-art hand-tuned implementations.
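The abstract names dynamic SSSP among the algorithms expressible in Falcon. To illustrate the kind of per-edge operator that such a DSL would let the programmer mark as explicitly parallel, here is a minimal sequential Python sketch of fixed-point edge relaxation (Bellman-Ford style); this is an illustrative sketch, not Falcon syntax, and the function name is our own:

```python
INF = float("inf")

def sssp(num_nodes, edges, source):
    """Single-source shortest paths by repeated edge relaxation:
    relax every edge until no distance changes (a fixed point)."""
    dist = [INF] * num_nodes
    dist[source] = 0
    changed = True
    while changed:
        changed = False
        # In a graph DSL this loop would be a parallel "foreach edge"
        # construct that the compiler maps onto CPU threads or a GPU.
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
    return dist

edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
print(sssp(4, edges, 0))  # shortest distances from node 0
```

In the parallel setting, the relaxation of `dist[v]` would need an atomic minimum update, which is exactly the kind of detail a DSL like Falcon hides behind its higher-level constructs.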
Citations
Journal ArticleDOI
A Unified Cryptoprocessor for Lattice-Based Signature and Key-Exchange
Aikata Aikata, Ahmet Can Mert, David Jacquemin, Amitabh Das, Donald Matthews, Santosh Ghosh, Sujoy Sinha Roy, et al.
TL;DR: The cryptoprocessor architecture has been optimized targeting the signature scheme 'CRYSTALS-Dilithium' and the key encapsulation mechanism (KEM) 'Saber', both part of NIST's post-quantum cryptography standardization project.
Proceedings ArticleDOI
DH-Falcon: A Language for Large-Scale Graph Processing on Distributed Heterogeneous Systems
TL;DR: DH-Falcon is presented, a graph DSL (domain-specific language) that can be used to implement parallel algorithms for large-scale graphs, targeting distributed heterogeneous (CPU and GPU) clusters, and gains a speedup of up to 13×.
Journal ArticleDOI
Take your MEDS: Digital Signatures from Matrix Code Equivalence
Tung Chou, Ruben Niederhagen, Edoardo Persichetti, Tovohery Randrianarisoa, Krijn Reijnders, Simona Samardjiska, Monika Trimoska, et al.
Proceedings ArticleDOI
iTurboGraph: Scaling and Automating Incremental Graph Analytics
TL;DR: A domain-specific language, LNGA, for incremental neighbor-centric graph analytics (NGA) on large-scale graphs is presented, addressing two limitations of previous systems: poor usability, owing to the difficulty of programming incremental NGA algorithms, and limited scalability and efficiency, owing to the cost of maintaining intermediate results for graph traversals.
Proceedings ArticleDOI
Efficient execution of graph algorithms on CPU with SIMD extensions
Ruohuang Zheng, Sreepathi Pai, et al.
TL;DR: The authors retarget an existing GPU graph-algorithm compiler to obtain the first graph framework that uses SIMD extensions on CPUs to execute graph algorithms efficiently; they evaluate the compiler on 10 benchmarks and 3 graphs across 3 different CPUs, and also compare against a GPU.
References
Journal ArticleDOI
The evolution of random graphs
Journal ArticleDOI
A bridging model for parallel computation
TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate for this role, with results quantifying its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.
Proceedings ArticleDOI
Pregel: a system for large-scale graph processing
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski, et al.
TL;DR: A model for processing large graphs is presented, designed for efficient, scalable, and fault-tolerant implementation on clusters of thousands of commodity computers; its implied synchronicity makes reasoning about programs easier.
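The vertex-centric, superstep-based model that Pregel introduces can be sketched in a few lines of Python. This is a hypothetical simplified simulation, not Google's actual API: in each superstep, active vertices process incoming messages, may update their value and message their neighbours, and all messages are delivered synchronously at the start of the next superstep.

```python
def pregel_min_label(adj):
    """Propagate the minimum vertex id through an undirected graph
    (a simple connected-components labelling) using synchronous
    Pregel-style supersteps."""
    value = {v: v for v in adj}        # each vertex starts with its own id
    changed = set(adj)                 # vertices that send this superstep
    while changed:                     # loop until every vertex halts
        # Message delivery is synchronous: all sends from one superstep
        # arrive together at the start of the next.
        inbox = {v: [] for v in adj}
        for u in changed:
            for v in adj[u]:
                inbox[v].append(value[u])
        changed = set()
        for v, msgs in inbox.items():
            if msgs and min(msgs) < value[v]:
                value[v] = min(msgs)   # vertex updates and stays active
                changed.add(v)
    return value

adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
print(pregel_min_label(adj))  # two components, labelled 0 and 3
```

The "implied synchronicity" noted in the TL;DR shows up here as the barrier between filling `inbox` and processing it: no vertex sees a message sent in its own superstep.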
Proceedings ArticleDOI
Scalable parallel programming with CUDA
TL;DR: Presents a collection of slides covering the following topics: CUDA parallel programming model; CUDA toolkit and libraries; performance optimization; and application development.