Proceedings Article

Optimizing remote data transfers in X10

TLDR
A new optimization, AT-Opt, is presented that minimizes the amount of data serialized and communicated during place-change operations in X10 programs. Compared to the current X10 compiler, it achieves geometric mean speedups of 8.61× on a two-node Intel system and 5.57× on a two-node AMD system.
Abstract
X10 is a partitioned global address space (PGAS) programming language that supports the notion of places; a place consists of some data and some lightweight tasks called activities. Each activity runs at a place and may invoke a place-change operation (using the at-construct) to synchronously perform some computation at another place. These place-change operations need to copy all the required data from the current place to the remote place. However, identifying the required data for each place-change operation is a non-trivial task, especially in irregular applications (such as graph applications) that contain large numbers of cross-referencing objects, not all of which may actually be required at the remote place. In this paper, we present a new optimization, AT-Opt, that minimizes the amount of data serialized and communicated during place-change operations. AT-Opt uses a novel abstraction called the abstract-place-tree to capture the place-change operations in the program. For each place-change operation, AT-Opt uses a novel inter-procedural analysis to precisely identify the data required at the remote place, in terms of the variables in the current scope. AT-Opt then emits the appropriate code to copy the identified data items to the remote place. We have implemented AT-Opt in the x10v2.6.0 compiler and tested it on the IMSuite benchmark kernels. Compared to the current X10 compiler, the AT-Opt optimized code achieved geometric mean speedups of 8.61× and 5.57× on a two-node (32 cores each) Intel system and a two-node (16 cores each) AMD system, respectively.
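To make the cost described in the abstract concrete, the following is a toy Python model, not X10 code and not the AT-Opt analysis itself. It assumes a hypothetical `Node` type with cross-references and a hypothetical remote computation that reads only `root.value`. It compares a conservative copy of everything reachable from a captured variable with a copy of just the field the remote computation needs. All names here (`Node`, `build_ring`, `naive_payload`, `minimal_payload`, `remote_body`) are illustrative assumptions.

```python
class Node:
    """A graph vertex that cross-references its neighbours (hypothetical type)."""

    def __init__(self, node_id, value):
        self.node_id = node_id
        self.value = value
        self.neighbors = []  # references to other Node objects


def build_ring(n):
    """Build n nodes where node i points to node (i + 1) % n."""
    nodes = [Node(i, i * 10 + 7) for i in range(n)]
    for i, node in enumerate(nodes):
        node.neighbors.append(nodes[(i + 1) % n])
    return nodes


def reachable_nodes(root):
    """Return every node reachable from root by following neighbor references."""
    seen = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen[id(node)] = node
        stack.extend(node.neighbors)
    return list(seen.values())


def naive_payload(root):
    """Conservative copy: one record per node reachable from the captured variable."""
    return [
        (n.node_id, n.value, [m.node_id for m in n.neighbors])
        for n in reachable_nodes(root)
    ]


def minimal_payload(root):
    """Copy only what the remote body reads (assumed here to be root.value)."""
    return {"value": root.value}


def remote_body(payload):
    """Stand-in for the computation performed at the remote place."""
    return payload["value"] + 1


nodes = build_ring(50)
print(len(naive_payload(nodes[0])))   # records shipped by the conservative copy
print(minimal_payload(nodes[0]))      # data shipped when only root.value is needed
print(remote_body(minimal_payload(nodes[0])))
```

In this sketch the conservative copy ships a record for every node in the ring, while the reduced payload carries a single field. The gap grows with the size of the cross-referenced structure, which is the situation the abstract highlights for graph applications.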


Citations
Posted Content

Memory-efficient array redistribution through portable collective communication

TL;DR: In this paper, a type-directed approach to synthesizing array redistributions as sequences of MPI-style collective operations is presented; it is shown to be memory-efficient and to perform no excessive data transfers.
Journal Article

Optimizing Remote Communication in X10

TL;DR: AT-Com, a scheme to optimize X10 code with place-change operations, consists of two inter-related new optimizations: AT-Opt, which minimizes the amount of data serialized and communicated during place-change operations, and AT-Pruning, which identifies and elides redundant place-change operations and executes place-change operations in parallel.
References
Proceedings Article

Global Data Re-allocation via Communication Aggregation in Chapel

TL;DR: This work analyzes Chapel's standard Block and Cyclic distribution modules and optimizes the communication routines for array assignments by performing aggregation, finding that the implemented techniques can lead to significant reductions in communication time.
Book Chapter

Static Detection of Place Locality and Elimination of Runtime Checks

TL;DR: A novel framework for statically establishing place locality in X10 is presented, based on a static abstraction of activities (threads) incorporating places and an extension to classical escape analysis to track the abstract-activities to which an object can escape.
Proceedings Article

Optimizing shared data accesses in distributed-memory X10 systems

TL;DR: This work incorporates a directory-based protocol into the runtime system of X10 - a Partitioned-Global-Address-Space (PGAS) programming language - to manage read-mostly, producer-consumer, stencil, and migratory variables and introduces a new shared-variable access-pattern profiler that is used by a new coherence-policy manager to decide which protocol should be used for each shared variable.