Author

Zachary W. Parchman

Bio: Zachary W. Parchman is an academic researcher from Tennessee Technological University. The author has contributed to research in the topics of non-volatile random-access memory and shared memory. The author has an h-index of 2 and has co-authored 5 publications receiving 13 citations.

Papers
Proceedings ArticleDOI
01 Aug 2017
TL;DR: This work proposes and develops the SHARed data-structure centric Programming abstraction (SharP), which provides a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and a unified programming abstraction for Big-Compute and Big-Data applications.
Abstract: The pre-exascale systems are expected to have a significant amount of hierarchical and heterogeneous on-node memory, and this trend of system architecture in extreme-scale systems is expected to continue into the exascale era. Along with hierarchical-heterogeneous memory, the system typically has a high-performing network and a compute accelerator. This system architecture is not only effective for running traditional High Performance Computing (HPC) applications (Big-Compute), but also for running data-intensive HPC applications and Big-Data applications. As a consequence, there is a growing desire to have a single system serve the needs of both Big-Compute and Big-Data applications. Though the system architecture supports the convergence of Big-Compute and Big-Data, the programming models have yet to evolve to support either hierarchical-heterogeneous memory systems or the convergence. In this work, we propose and develop the SHARed data-structure centric Programming abstraction (SharP) to address both of these goals, i.e., provide (1) a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and (2) a unified programming abstraction for Big-Compute and Big-Data applications. To evaluate SharP, we implement a Stencil benchmark using SharP, port QMCPack, a petascale-capable application, and adapt the Memcached ecosystem, a popular Big-Data framework, to use SharP, and quantify the performance and productivity advantages. Additionally, we demonstrate the simplicity of using SharP on different memories including DRAM, High-bandwidth Memory (HBM), and non-volatile random access memory (NVRAM).
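As a rough illustration of the data-structure centric idea described above, the sketch below allocates named data objects that carry placement metadata (a memory tier) and can be looked up by name rather than by raw pointer. Every identifier and type in it is invented for illustration; this is not the real SharP API, and a real implementation would back each tier with the appropriate allocator (DRAM, HBM, NVRAM) and make the objects visible across processes.

/* Toy data-object registry; all names are hypothetical, not the SharP API. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TIER_DRAM, TIER_HBM, TIER_NVRAM } tier_t;  /* assumed memory tiers */

typedef struct {
    char   name[32];   /* data objects are addressed by name, not by pointer */
    tier_t tier;       /* requested placement */
    size_t bytes;
    void  *mem;
} data_obj;

#define MAX_OBJS 16
static data_obj registry[MAX_OBJS];
static int nobjs;

static data_obj *obj_create(const char *name, size_t bytes, tier_t tier) {
    if (nobjs == MAX_OBJS) return NULL;
    data_obj *o = &registry[nobjs++];
    strncpy(o->name, name, sizeof o->name - 1);
    o->tier  = tier;
    o->bytes = bytes;
    o->mem   = malloc(bytes);   /* toy: plain DRAM stands in for every tier */
    return o;
}

static data_obj *obj_lookup(const char *name) {
    for (int i = 0; i < nobjs; i++)
        if (!strcmp(registry[i].name, name)) return &registry[i];
    return NULL;
}

int main(void) {
    obj_create("stencil.grid", 1 << 20, TIER_HBM);    /* hot working set -> HBM */
    obj_create("kv.store",     1 << 24, TIER_NVRAM);  /* bulky, persistent data -> NVRAM */
    data_obj *g = obj_lookup("stencil.grid");
    if (g) printf("%s: %zu bytes requested on tier %d\n", g->name, g->bytes, (int)g->tier);
    return 0;
}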

9 citations

Proceedings ArticleDOI
01 Sep 2017
TL;DR: SharP Hash's high performance is obtained through the use of high-performing networks and one-sided semantics, and its performance characteristics are demonstrated with a synthetic micro-benchmark and an implementation of a Key Value (KV) store, Memcached.
Abstract: A high-performing distributed hash is critical for achieving performance in many applications and system software using extreme-scale systems. It is also a central part of many Big-Data frameworks including Memcached, file systems, and job schedulers. However, there is a lack of high-performing distributed hash implementations. In this work, we propose, design, and implement SharP Hash, a high-performing, RDMA-based distributed hash for extreme-scale systems. SharP Hash's high performance is obtained through the use of high-performing networks and one-sided semantics. We perform an evaluation of SharP Hash and demonstrate its performance characteristics with a synthetic micro-benchmark and an implementation of a Key Value (KV) store, Memcached.
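A minimal sketch of the one-sided style of access described above is shown below, written against standard OpenSHMEM rather than SharP Hash's own API (which is not given here). Keys are hashed to an owning processing element (PE), and values are written and read with one-sided puts and gets into a symmetric table; the bucket layout, hash function, and absence of collision handling are simplifications for illustration.

/* One-sided KV sketch over OpenSHMEM; build e.g.: oshcc kv.c && oshrun -np 4 ./a.out */
#include <shmem.h>
#include <stdint.h>
#include <stdio.h>

#define NBUCKETS 1024
#define VAL_LEN  64

static char table[NBUCKETS][VAL_LEN];   /* symmetric: same address on every PE */

static uint64_t hash64(uint64_t k) {    /* simple mixer, stand-in for a real hash */
    k ^= k >> 33; k *= 0xff51afd7ed558ccdULL; k ^= k >> 33;
    return k;
}

static void kv_put(uint64_t key, const char *val) {
    uint64_t h  = hash64(key);
    int owner   = (int)(h % shmem_n_pes());          /* PE that owns this key */
    size_t slot = (size_t)((h / shmem_n_pes()) % NBUCKETS);
    shmem_putmem(table[slot], val, VAL_LEN, owner);   /* one-sided remote write */
}

static void kv_get(uint64_t key, char *out) {
    uint64_t h  = hash64(key);
    int owner   = (int)(h % shmem_n_pes());
    size_t slot = (size_t)((h / shmem_n_pes()) % NBUCKETS);
    shmem_getmem(out, table[slot], VAL_LEN, owner);   /* one-sided remote read */
}

int main(void) {
    shmem_init();
    if (shmem_my_pe() == 0) {
        char v[VAL_LEN] = "hello";
        kv_put(42, v);
    }
    shmem_barrier_all();                /* completes the put and synchronizes PEs */
    char out[VAL_LEN];
    kv_get(42, out);
    printf("PE %d read: %s\n", shmem_my_pe(), out);
    shmem_finalize();
    return 0;
}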

6 citations

Book ChapterDOI
27 Aug 2018
TL;DR: The Unified Memory Allocator (UMA) of the SHARed data-structure centric Programming abstraction (SharP) library is presented, which provides a unified interface for memory allocations across DRAM, HBM, and NVRAM and is extensible to support future memory types.
Abstract: The pre-exascale systems will soon be deployed with a deep, complex memory hierarchy composed of many heterogeneous memories. This presents multiple challenges for users, including how to allocate data objects with locality between memories and devices for the various memories in these systems, which include DRAM, High-bandwidth Memory (HBM), and non-volatile random access memory (NVRAM), and how to perform these allocations while providing portability for their application. Currently, the user can make use of multiple, disjoint libraries to allocate data objects on these memories. However, it is difficult to obtain locality between memories and devices when using libraries that are unaware of each other. This paper presents the Unified Memory Allocator (UMA) of the SHARed data-structure centric Programming abstraction (SharP) library, which provides a unified interface for memory allocations across DRAM, HBM, and NVRAM and is extensible to support future memory types. In addition, the SharP UMA allows for portability between systems by supporting both explicit and implicit, intent-based memory allocations. To demonstrate the ease of use of the SharP UMA, we have extended both Open MPI and OpenSHMEM-X to support SharP. We validate this work by evaluating the performance implications and the intent-based approach with synthetic benchmarks as well as adaptations of the Graph500 benchmark.
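The sketch below illustrates the explicit versus implicit, intent-based allocation contrast described above, with memkind standing in as the backend for DRAM and HBM. The uma_* names and intent values are invented for illustration and are not the SharP UMA interface; NVRAM is omitted for brevity (memkind can back it with a file-backed PMEM kind in the same pattern).

/* Intent-based vs. explicit tier allocation; build e.g.: cc uma.c -lmemkind */
#include <memkind.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { INTENT_DEFAULT, INTENT_BANDWIDTH, INTENT_CAPACITY } intent_t;  /* hypothetical */

/* implicit, intent-based allocation: the allocator picks the tier */
static void *uma_alloc_intent(size_t bytes, intent_t intent) {
    memkind_t kind = (intent == INTENT_BANDWIDTH)
                         ? MEMKIND_HBW_PREFERRED   /* HBM if present, else fall back to DRAM */
                         : MEMKIND_DEFAULT;        /* plain DRAM */
    return memkind_malloc(kind, bytes);
}

/* explicit allocation: the caller names the tier directly */
static void *uma_alloc_explicit(size_t bytes, memkind_t kind) {
    return memkind_malloc(kind, bytes);
}

int main(void) {
    double *hot  = uma_alloc_intent(1 << 20, INTENT_BANDWIDTH);   /* portable: state the intent */
    double *cold = uma_alloc_explicit(1 << 24, MEMKIND_DEFAULT);  /* pinned explicitly to DRAM  */
    printf("hot=%p cold=%p\n", (void *)hot, (void *)cold);
    memkind_free(NULL, hot);    /* NULL kind: memkind detects the kind from the pointer */
    memkind_free(NULL, cold);
    return 0;
}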

1 citation

Proceedings ArticleDOI
01 Mar 2018
TL;DR: This work shows the need for taking a holistic approach towards data-centric abstractions, shows how these approaches were realized in the SharP library, a data-structure centric programming abstraction, and applies these approaches to a variety of applications that demonstrate their usefulness.
Abstract: Extreme-scale applications (i.e., Big-Compute) are becoming increasingly data-intensive, i.e., producing and consuming increasingly large amounts of data. The HPC systems traditionally used for these applications are now used for Big-Data applications such as data analytics, social network analysis, machine learning, and genomics. As a consequence of these trends, the system architecture should be flexible and data-centric. This can already be witnessed in the pre-exascale systems with TBs of on-node hierarchical and heterogeneous memories, PBs of system memory, low-latency, high-throughput networks, and many threaded cores. As such, the pre-exascale systems suit the needs of both Big-Compute and Big-Data applications. Though the system architecture is flexible enough to support both Big-Compute and Big-Data, we argue there is a software gap. Particularly, we need data-centric abstractions to leverage the full potential of the system, i.e., there is a need for native support for data resilience, the ability to express data locality and affinity, mechanisms to reduce data movement, the ability to share data, and abstractions to express a user's data usage and data access patterns. In this paper, we (i) show the need for taking a holistic approach towards data-centric abstractions, (ii) show how these approaches were realized in the SHARed data-structure centric Programming abstraction (SharP) library, and (iii) apply these approaches to a variety of applications that demonstrate their usefulness. Particularly, we apply these approaches to QMCPack and the Graph500 benchmark and demonstrate the advantages of this approach on extreme-scale systems.

1 citation

Proceedings ArticleDOI
31 May 2016
TL;DR: This work presents an application-level library to "checkpoint" and restore data, extensions of NPB benchmarks for fault tolerance based on different strategies, and some preliminary results that show the impact of such fault-tolerance strategies on the application execution.
Abstract: In the world of high-performance computing, fault tolerance and application resilience are becoming some of the primary concerns because of increasing hardware failures and memory corruptions. While the research community has been investigating various options, from system-level solutions to application-level solutions, standards such as the Message Passing Interface (MPI) are also starting to include such capabilities. The current proposal for MPI fault tolerance is centered around the User-Level Failure Mitigation (ULFM) concept, which provides means for fault detection and recovery of the MPI layer. This approach does not address application-level recovery, which is currently left to application developers. In this work, we present a modification of some of the benchmarks of the NAS Parallel Benchmarks (NPB) to include support for the ULFM capabilities as well as application-level strategies and mechanisms for application-level failure recovery. As such, we present: (i) an application-level library to "checkpoint" and restore data, (ii) extensions of NPB benchmarks for fault tolerance based on different strategies, (iii) a fault injection tool, and (iv) some preliminary results that show the impact of such fault-tolerance strategies on the application execution.
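The sketch below shows the general shape of the recovery pattern described above: critical data is kept in an in-memory "checkpoint", MPI errors are returned rather than fatal, and on a detected failure the communicator is shrunk and the data rolled back. It requires a ULFM-enabled MPI (the MPIX_* calls); it is an illustration of the ULFM recovery idiom, not the paper's actual library or benchmark modifications.

/* Application-level checkpoint/rollback with ULFM-style recovery (sketch). */
#include <mpi.h>
#include <mpi-ext.h>   /* MPIX_Comm_revoke / MPIX_Comm_shrink; location may vary by MPI */
#include <string.h>

#define N 1024
static double data[N], checkpoint[N];

static void take_checkpoint(void)    { memcpy(checkpoint, data, sizeof data); }
static void restore_checkpoint(void) { memcpy(data, checkpoint, sizeof data); }

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    MPI_Comm comm = MPI_COMM_WORLD;
    MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN);   /* failures are reported, not fatal */

    for (int i = 0; i < N; i++) data[i] = i;
    take_checkpoint();

    for (int iter = 0; iter < 10; iter++) {
        double local = data[0], global;
        int rc = MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);
        if (rc != MPI_SUCCESS) {
            /* A peer failed or the communicator was revoked (a real code would
             * check MPI_Error_class for MPIX_ERR_PROC_FAILED): rebuild and roll back. */
            MPIX_Comm_revoke(comm);
            MPI_Comm shrunk;
            MPIX_Comm_shrink(comm, &shrunk);
            if (comm != MPI_COMM_WORLD) MPI_Comm_free(&comm);
            comm = shrunk;
            MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN);
            restore_checkpoint();
            iter--;                       /* redo the failed iteration */
            continue;
        }
        data[0] = global;                 /* forward progress */
        if (iter % 5 == 0) take_checkpoint();
    }

    if (comm != MPI_COMM_WORLD) MPI_Comm_free(&comm);
    MPI_Finalize();
    return 0;
}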

Cited by
Journal ArticleDOI
TL;DR: A black-box Fork-Join model is proposed that covers a wide range of Fork-Join structures for the prediction of tail and mean latency, with predictors called ForkTail and ForkMean, respectively; it can be used as a powerful tool to aid the design of tail- and mean-latency-guaranteed job scheduling and resource provisioning, especially at high load, for datacenter applications.
Abstract: The workflows of the predominant datacenter services are underlaid by various Fork-Join structures. Due to the lack of good understanding of the performance of Fork-Join structures in general, today's datacenters often operate under low resource utilization to meet stringent service level objectives (SLOs), e.g., in terms of tail and/or mean latency, for such services. Hence, to achieve high resource utilization, while meeting stringent SLOs, it is of paramount importance to be able to accurately predict the tail and/or mean latency for a broad range of Fork-Join structures of practical interest. In this article, we propose a black-box Fork-Join model that covers a wide range of Fork-Join structures for the prediction of tail and mean latency, with the resulting predictors called ForkTail and ForkMean, respectively. We derive highly computationally efficient, empirical expressions for tail and mean latency as functions of the means and variances of task response times. Our extensive testing results based on model-based and trace-driven simulations, as well as a real-world case study in a cloud environment, demonstrate that the models can consistently predict the tail and mean latency within 20 and 15 percent prediction errors at 80 and 90 percent load levels, respectively, for heavy-tailed workloads, and at any load levels for light-tailed workloads. Moreover, our sensitivity analysis demonstrates that such errors can be well compensated for with no more than 7 percent resource overprovisioning. Consequently, the proposed prediction model can be used as a powerful tool to aid the design of tail- and mean-latency-guaranteed job scheduling and resource provisioning, especially at high load, for datacenter applications.
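The closed-form predictors are not reproduced in this abstract, so the sketch below only illustrates the quantity they predict: a job that forks K tasks completes when its slowest task does, and its tail latency is a high percentile of that maximum. The fan-out, the exponential task-time distribution, and the chosen percentile are assumptions made for the illustration, not the paper's workload model.

/* Monte Carlo estimate of fork-join mean and p99 latency; build e.g.: cc fj.c -lm */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define K    32        /* fan-out of the fork stage (assumed) */
#define JOBS 100000    /* simulated jobs */

static double exp_sample(double mean) {            /* exponential task response time */
    double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return -mean * log(u);
}

static int cmp(const void *a, const void *b) {
    double d = *(const double *)a - *(const double *)b;
    return (d > 0) - (d < 0);
}

int main(void) {
    static double lat[JOBS];
    double sum = 0.0;
    srand(1);
    for (int j = 0; j < JOBS; j++) {
        double m = 0.0;
        for (int k = 0; k < K; k++) {               /* the join waits for the slowest task */
            double t = exp_sample(1.0);
            if (t > m) m = t;
        }
        lat[j] = m;
        sum += m;
    }
    qsort(lat, JOBS, sizeof lat[0], cmp);
    printf("mean fork-join latency ~ %.3f\n", sum / JOBS);
    printf("p99 tail latency       ~ %.3f\n", lat[(int)(0.99 * JOBS)]);
    return 0;
}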

19 citations

Journal ArticleDOI
TL;DR: It is shown that for a rack (cabinet) configuration and applications similar to Cori, a central processing unit with intra-rack disaggregation has a 99.5% probability of finding all the resources it requires inside its rack.
Abstract: The expected halt of traditional technology scaling is motivating increased heterogeneity in high-performance computing (HPC) systems with the emergence of numerous specialized accelerators. As heterogeneity increases, so does the risk of underutilizing expensive hardware resources if we preserve today’s rigid node configuration and reservation strategies. This has sparked interest in resource disaggregation to enable finer-grain allocation of hardware resources to applications. However, there is currently no data-driven study of what range of disaggregation is appropriate in HPC. To that end, we perform a detailed analysis of key metrics sampled in NERSC’s Cori, a production HPC system that executes a diverse open-science HPC workload. In addition, we profile a variety of deep-learning applications to represent an emerging workload. We show that for a rack (cabinet) configuration and applications similar to Cori, a central processing unit with intra-rack disaggregation has a 99.5% probability of finding all the resources it requires inside its rack. In addition, ideal intra-rack resource disaggregation in Cori could reduce memory and NIC resources by 5.36% to 69.01% and still satisfy the worst-case average rack utilization.

13 citations

Journal ArticleDOI
TL;DR: This paper sheds light upon how big data frameworks can be ported to HPC platforms as a preliminary step towards the convergence of the big data and exascale computing ecosystems.
Abstract: The dawn of exascale computing and its convergence with big data analytics has greatly spurred research interest. The reasons are straightforward. Traditionally, high performance computing (HPC) systems have been used for scientific applications involving a majority of compute-intensive tasks. At the same time, the proliferation of big data resulted in the design of data-intensive processing paradigms like the Apache big data stack. Big data generated at a high pace necessitates faster processing mechanisms for getting insights in real time. For this, HPC systems may serve as a panacea for solving big data problems. Though HPC systems have the capability to give promising results for big data, directly integrating them with existing data-intensive frameworks like the Apache big data stack is not straightforward due to the challenges associated with them. This triggers research on seamlessly integrating these two paradigms based on an interoperable framework, programming model, and system architecture. The aim of this paper is to assess the progress made in the HPC world as an effort to augment it with big data analytics support. As an outcome of this, a taxonomy showing the factors to be considered for augmenting HPC systems with big data support has been put forth. This paper sheds light upon how big data frameworks can be ported to HPC platforms as a preliminary step towards the convergence of the big data and exascale computing ecosystems. The focus is given to research issues related to augmenting HPC paradigms with big data frameworks and corresponding approaches to address those issues. This paper also discusses data-intensive as well as compute-intensive processing paradigms, benchmark suites and workloads, and future directions in the domain of integrating HPC with big data analytics.

11 citations

Proceedings ArticleDOI
11 Nov 2018
TL;DR: BESPOKV is presented, an adaptive, extensible, and scale-out KV store framework that decouples the KV store design into the control plane for distributed management and the data plane for the local data store, and transparently enables a scalable and fault-tolerant distributed KV store service.
Abstract: Enterprise KV stores are not well suited for HPC applications, and entail customization and cumbersome end-to-end KV design to meet HPC application needs. To this end, in this paper we present bespoKV, an adaptive, extensible, and scale-out KV store framework. bespoKV decouples the KV store design into the control plane for distributed management and the data plane for the local data store. bespoKV takes as input a single-server KV store, called a datalet, and transparently enables a scalable and fault-tolerant distributed KV store service. The resulting distributed stores are also adaptive to consistency or topology requirement changes and can be easily extended for new types of services. Experiments show that bespoKV-enabled distributed KV stores scale horizontally to a large number of nodes, and perform comparably to, and sometimes better than, state-of-the-art systems.
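As a toy illustration of the control-plane/data-plane decoupling described above, the sketch below wraps a minimal single-server store (standing in for a "datalet") behind a routing function that decides which node owns a key. All names and types are invented; this is not the bespoKV API, and a real deployment would forward requests over the network rather than index a local array of stores.

/* Datalet-style decoupling, toy version: routing (control plane) vs. storage (data plane). */
#include <stdio.h>
#include <string.h>

#define NODES 4
#define SLOTS 16

typedef struct {                 /* data plane: one single-server KV store */
    char keys[SLOTS][32];
    char vals[SLOTS][32];
} datalet;

static void datalet_put(datalet *d, const char *k, const char *v) {
    for (int i = 0; i < SLOTS; i++)
        if (!d->keys[i][0] || !strcmp(d->keys[i], k)) {
            strncpy(d->keys[i], k, 31);
            strncpy(d->vals[i], v, 31);
            return;
        }
}

static const char *datalet_get(datalet *d, const char *k) {
    for (int i = 0; i < SLOTS; i++)
        if (!strcmp(d->keys[i], k)) return d->vals[i];
    return NULL;
}

/* control plane: decide which node owns a key */
static int route(const char *k) {
    unsigned h = 5381;
    while (*k) h = h * 33u + (unsigned char)*k++;
    return (int)(h % NODES);
}

int main(void) {
    static datalet cluster[NODES];               /* stand-in for remote nodes */
    datalet_put(&cluster[route("user:42")], "user:42", "alice");
    const char *v = datalet_get(&cluster[route("user:42")], "user:42");
    printf("%s\n", v ? v : "(miss)");
    return 0;
}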

10 citations

Proceedings ArticleDOI
01 Sep 2017
TL;DR: SharP Hash's high performance is obtained through the use of high-performing networks and one-sided semantics, and its performance characteristics are demonstrated with a synthetic micro-benchmark and an implementation of a Key Value (KV) store, Memcached.
Abstract: A high-performing distributed hash is critical for achieving performance in many applications and system software using extreme-scale systems. It is also a central part of many Big-Data frameworks including Memcached, file systems, and job schedulers. However, there is a lack of high-performing distributed hash implementations. In this work, we propose, design, and implement SharP Hash, a high-performing, RDMA-based distributed hash for extreme-scale systems. SharP Hash's high performance is obtained through the use of high-performing networks and one-sided semantics. We perform an evaluation of SharP Hash and demonstrate its performance characteristics with a synthetic micro-benchmark and an implementation of a Key Value (KV) store, Memcached.

6 citations