
Showing papers by "Sameer Shende published in 2013"


Proceedings ArticleDOI
10 Jun 2013
TL;DR: The DOE-funded XPRESS project and the role of autonomic performance support in Exascale systems are described and results are presented that highlight the challenges of highly integrative observation and runtime analysis.
Abstract: Extreme-scale computing requires a new perspective on the role of performance observation in the Exascale system software stack. Because of the anticipated high concurrency and dynamic operation in these systems, it is no longer reasonable to expect that a post-mortem performance measurement and analysis methodology will suffice. Rather, there is a strong need for performance observation that merges first- and third-person observation, in situ analysis, and introspection across stack layers, and that serves online dynamic feedback and adaptation. In this paper we describe the DOE-funded XPRESS project and the role of autonomic performance support in Exascale systems. XPRESS will build an integrated Exascale software stack (called OpenX) that supports the ParalleX execution model and is targeted towards future Exascale platforms. An initial version of an autonomic performance environment called APEX has been developed for OpenX using the current TAU performance technology, and results are presented that highlight the challenges of highly integrative observation and runtime analysis.
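The contrast the abstract draws, in situ observation with online feedback versus post-mortem analysis, can be illustrated with a toy monitor. This is a generic sketch in Python, not the APEX or TAU API: the `InSituMonitor` class and its methods are hypothetical names invented for illustration.

```python
import time
from collections import defaultdict

class InSituMonitor:
    """Toy in-situ performance monitor (illustrative only): records
    per-region timings during execution so the runtime can query them
    online, rather than waiting for a post-mortem trace."""
    def __init__(self):
        self.samples = defaultdict(list)

    def region(self, name):
        # context manager that times one named code region
        return _Timer(self, name)

    def record(self, name, elapsed):
        self.samples[name].append(elapsed)

    def mean(self, name):
        s = self.samples[name]
        return sum(s) / len(s) if s else 0.0

class _Timer:
    def __init__(self, monitor, name):
        self.monitor, self.name = monitor, name
    def __enter__(self):
        self.start = time.perf_counter()
        return self
    def __exit__(self, *exc):
        self.monitor.record(self.name, time.perf_counter() - self.start)

mon = InSituMonitor()
for _ in range(3):
    with mon.region("step"):
        sum(range(10000))  # stand-in for a computation phase
# a runtime could now consult mon.mean("step") and adapt while running
```

The key design point is that the measurement data stays resident and queryable during the run, which is what enables the dynamic feedback loop the paper argues Exascale systems will require.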

22 citations


Proceedings ArticleDOI
18 Dec 2013
TL;DR: This paper proposes a language, MIL, for the development of program analysis tools based on static binary instrumentation to ease the integration of static, global program analysis with instrumentation and shows how this enables both a precise targeting of the code regions to analyze and a better understanding of the optimized program behavior.
Abstract: As software complexity increases, the analysis of code behavior during its execution is becoming more important. Instrumentation techniques, through the insertion of code directly into binaries, are essential for program analyses used in debugging, runtime profiling, and performance evaluation. In the context of high-performance parallel applications, building an instrumentation framework is quite challenging. One difficulty is the need to capture both coarse-grain behavior, such as the execution time of different functions, and finer-grain actions, in order to pinpoint performance issues. In this paper, we propose a language, MIL, for the development of program analysis tools based on static binary instrumentation. The key feature of MIL is to ease the integration of static, global program analysis with instrumentation. We show how this enables both a precise targeting of the code regions to analyze and a better understanding of the optimized program behavior.

18 citations


Proceedings ArticleDOI
10 Jun 2013
TL;DR: A set of static and dynamic scheduling algorithms for block-sparse tensor contractions within the NWChem computational chemistry code for different degrees of sparsity (and therefore load imbalance) are explored.
Abstract: Developing effective yet scalable load-balancing methods for irregular computations is critical to the successful application of simulations in a variety of disciplines at petascale and beyond. This paper explores a set of static and dynamic scheduling algorithms for block-sparse tensor contractions within the NWChem computational chemistry code for different degrees of sparsity (and therefore load imbalance). In this particular application, a relatively large amount of task information can be obtained at minimal cost, which enables the use of static partitioning techniques that take the entire task list as input. However, fully static partitioning is incapable of dealing with dynamic variation of task costs, such as from transient network contention or operating system noise, so we also consider hybrid schemes that utilize dynamic scheduling within subgroups. These two schemes, which have not been previously implemented in NWChem or its proxies (i.e. quantum chemistry mini-apps), are compared to the original centralized dynamic load-balancing algorithm as well as an improved centralized scheme. In all cases, we separate the scheduling of tasks from the execution of tasks into an inspector phase and an executor phase. The impact of these methods upon the application is substantial on a large InfiniBand cluster: execution time is reduced by as much as 50% at scale. The technique is applicable to any scientific application requiring load balance where performance models or estimations of kernel execution times are available.
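The inspector/executor split described above can be sketched in a few lines. This is a minimal Python illustration, not NWChem code: the inspector statically partitions tasks across workers with greedy longest-processing-time (LPT) scheduling, a standard static-partitioning heuristic chosen here for illustration, using the cheaply obtained cost estimates; the executor then runs each worker's preassigned list.

```python
import heapq

def inspect(tasks, costs, nworkers):
    """Inspector phase: build a static per-worker plan with greedy LPT,
    assigning the most expensive tasks first to the least-loaded worker."""
    heap = [(0.0, w) for w in range(nworkers)]  # (accumulated cost, worker)
    heapq.heapify(heap)
    plan = [[] for _ in range(nworkers)]
    for t in sorted(tasks, key=lambda t: -costs[t]):
        load, w = heapq.heappop(heap)
        plan[w].append(t)
        heapq.heappush(heap, (load + costs[t], w))
    return plan

def execute(plan, run):
    """Executor phase: each worker simply runs its preassigned tasks."""
    return [[run(t) for t in part] for part in plan]

# example: eight "blocks" with skewed cost estimates, two workers
costs = {f"block{i}": c for i, c in enumerate([9, 7, 5, 3, 2, 2, 1, 1])}
plan = inspect(list(costs), costs, nworkers=2)
loads = [sum(costs[t] for t in part) for part in plan]  # balanced loads
```

Because all costs are known up front, the inspector pays its scheduling cost once; the hybrid schemes in the paper additionally allow dynamic stealing within subgroups to absorb transient cost variation that no static plan can anticipate.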

15 citations


Proceedings ArticleDOI
01 Oct 2013
TL;DR: This paper explores a set of static and dynamic scheduling algorithms for block-sparse tensor contractions within the NWChem computational chemistry code for different degrees of sparsity (and therefore load imbalance) in order to develop effective yet scalable load-balancing methods for irregular computations.
Abstract: Developing effective yet scalable load-balancing methods for irregular computations is critical to the successful application of simulations in a variety of disciplines at petascale and beyond. This paper explores a set of static and dynamic scheduling algorithms for block-sparse tensor contractions within the NWChem computational chemistry code for different degrees of sparsity (and therefore load imbalance). In this particular application, a relatively large amount of task information can be obtained at minimal cost, which enables the use of static partitioning techniques that take the entire task list as input. However, fully static partitioning is incapable of dealing with dynamic variation of task costs, such as from transient network contention or operating system noise, so we also consider hybrid schemes that utilize dynamic scheduling within subgroups. These two schemes, which have not been previously implemented in NWChem or its proxies (i.e. quantum chemistry mini-apps), are compared to the original centralized dynamic load-balancing algorithm as well as an improved centralized scheme. In all cases, we separate the scheduling of tasks from the execution of tasks into an inspector phase and an executor phase. The impact of these methods upon the application is substantial on a large InfiniBand cluster: execution time is reduced by as much as 50% at scale. The technique is applicable to any scientific application requiring load balance where performance models or estimations of kernel execution times are available.

11 citations


Proceedings ArticleDOI
17 Nov 2013
TL;DR: This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively, and studies the resulting performance.
Abstract: This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multi-core processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program, and study the resulting performance. Our scaling studies show that the performance bottleneck was the implementation of the collective sum procedure. Replacing the sequential procedure with a binary tree procedure improved the scaling performance of the program. This bottleneck will be resolved in the future by new collective procedures in Fortran 2015.
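The sequential-versus-tree distinction behind the bottleneck fix can be shown with a small sketch. This is a generic Python illustration of a binary-tree reduction, not the paper's Fortran coarray code: pairwise partial sums are combined in ceil(log2(n)) rounds instead of n-1 strictly sequential additions, so within each round the pair combinations are independent and could proceed in parallel across images.

```python
def tree_sum(values):
    """Binary-tree reduction: repeatedly combine adjacent pairs until
    one value remains. For n inputs this takes ceil(log2(n)) rounds,
    versus n-1 dependent steps for a sequential accumulation."""
    vals = list(values)
    while len(vals) > 1:
        paired = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:          # odd leftover carries to the next round
            paired.append(vals[-1])
        vals = paired
    return vals[0] if vals else 0

# eight contributions combine in 3 rounds rather than 7 sequential adds
total = tree_sum(range(8))
```

The Fortran 2018 standard (developed under the working name Fortran 2015) provides this pattern as the intrinsic collective CO_SUM, which is the resolution the abstract anticipates.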

9 citations