Author

Ferrol Aderholdt

Bio: Ferrol Aderholdt is an academic researcher from Oak Ridge National Laboratory. The author has contributed to research in topics including virtual machines and virtualization. He has an h-index of 4, having co-authored 21 publications that have received 60 citations. Previous affiliations of Ferrol Aderholdt include Middle Tennessee State University and Tennessee Technological University.

Papers
Book Chapter
02 Aug 2016
TL;DR: The experimental results show that OpenSHMEM-UCX outperforms the vendor-supplied OpenSHMEM implementation in most cases on the Cray XK system, by up to 40% in message rate and up to 70% in the execution of application kernels.
Abstract: The OpenSHMEM reference implementation was developed towards the goal of an open source and high-performing OpenSHMEM implementation. To achieve portability and performance across various networks, the OpenSHMEM reference implementation uses GASNet and UCCS for network operations. Recently, new network layers have emerged with the promise of providing high performance, scalability, and portability for HPC applications. In this paper, we modify the OpenSHMEM reference implementation to use the UCX framework for network operations. Then, we evaluate its performance and scalability on Cray XK systems to understand UCX's suitability for developing the OpenSHMEM programming model. Further, we develop a benchmark called SHOMS for evaluating the OpenSHMEM implementation. Our experimental results show that OpenSHMEM-UCX outperforms the vendor-supplied OpenSHMEM implementation in most cases on the Cray XK system, by up to 40% with respect to message rate and up to 70% for the execution of application kernels.

12 citations

Proceedings Article
26 May 2014
TL;DR: This paper presents a method of checkpointing VMs by utilizing virtual machine introspection (VMI), which can determine which pages of memory within the guest are used or free and can thereby reduce the number of pages written to disk during a checkpoint.
Abstract: Cloud Computing environments rely heavily on system-level virtualization. This is due to the inherent benefits of virtualization, including fault tolerance through checkpoint/restart (C/R) mechanisms. Because clouds are the abstraction of large datacenters and large datacenters have a higher potential for failure, it is imperative that a C/R mechanism for such an environment provide minimal latency as well as a small checkpoint file size. Recently, there has been much research into C/R with respect to virtual machines (VMs), providing excellent solutions to reduce either checkpoint latency or checkpoint file size. However, these approaches do not provide both. This paper presents a method of checkpointing VMs by utilizing virtual machine introspection (VMI). Through the use of VMI, we are able to determine which pages of memory within the guest are used or free and are better able to reduce the number of pages written to disk during a checkpoint. We have validated this work by using various benchmarks to measure the latency along with the checkpoint size. With respect to checkpoint file size, our approach results in file sizes within 24% or less of the actual used memory within the guest. Additionally, the checkpoint latency of our approach is up to 52% faster than KVM's default method.
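The core VMI-guided optimization, writing out only the pages the guest actually uses, can be sketched as follows. This is a minimal illustration, not the paper's KVM-based implementation; the page list and the used/free map are hypothetical inputs that introspection of the guest's memory allocator would supply:

```python
# Sketch: skip guest pages marked free so the checkpoint image contains
# only memory the guest actually uses (hypothetical data, not KVM code).

PAGE_SIZE = 4096

def checkpoint(pages, used_map):
    """pages: list of bytes objects (guest memory pages);
    used_map: list of bools derived from introspecting the guest.
    Returns the checkpoint image: (index, page) pairs for used pages only."""
    return [(i, pages[i]) for i, used in enumerate(used_map) if used]

# Example: a 4-page guest where only pages 0 and 2 are in use.
pages = [bytes([i]) * PAGE_SIZE for i in range(4)]
used_map = [True, False, True, False]
image = checkpoint(pages, used_map)
print(len(image))  # 2 pages written instead of 4
```

Restoring such a checkpoint would write the saved pages back at their recorded indices and treat every other page as zero-filled.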

10 citations

Proceedings Article
01 Aug 2017
TL;DR: This work proposes and develops the SHARed data-structure centric Programming abstraction (SharP): a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and a unified programming abstraction for Big-Compute and Big-Data applications.
Abstract: Pre-exascale systems are expected to have a significant amount of hierarchical and heterogeneous on-node memory, and this trend in system architecture is expected to continue into the exascale era. Along with hierarchical-heterogeneous memory, such a system typically has a high-performing network and a compute accelerator. This system architecture is effective not only for running traditional High Performance Computing (HPC) applications (Big-Compute), but also for running data-intensive HPC applications and Big-Data applications. As a consequence, there is a growing desire to have a single system serve the needs of both Big-Compute and Big-Data applications. Though the system architecture supports the convergence of Big-Compute and Big-Data, programming models have yet to evolve to support either hierarchical-heterogeneous memory systems or the convergence. In this work, we propose and develop the programming abstraction called SHARed data-structure centric Programming abstraction (SharP) to address both of these goals, i.e., to provide (1) a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and (2) a unified programming abstraction for Big-Compute and Big-Data applications. To evaluate SharP, we implement a stencil benchmark using SharP, port QMCPack, a petascale-capable application, and adapt the Memcached ecosystem, a popular Big-Data framework, to use SharP, and we quantify the performance and productivity advantages. Additionally, we demonstrate the simplicity of using SharP on different memories, including DRAM, High-Bandwidth Memory (HBM), and non-volatile random access memory (NVRAM).
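The idea of placing data structures across memory tiers by intent rather than by raw address can be illustrated with a toy allocator. This is not the SharP API; the tier names, capacities, and the `allocate` signature are assumptions made for the sketch:

```python
# Toy sketch (not the SharP API): choose a memory tier for a data
# structure from a placement hint, falling back to another tier when
# the preferred one is full. Capacities and tier names are illustrative.

class TieredAllocator:
    def __init__(self, capacities):
        self.free = dict(capacities)  # bytes remaining per tier

    def allocate(self, nbytes, hint="DRAM"):
        # Try the hinted tier first, then the rest in declaration order.
        order = [hint] + [t for t in self.free if t != hint]
        for tier in order:
            if self.free[tier] >= nbytes:
                self.free[tier] -= nbytes
                return tier
        raise MemoryError("no tier can hold the allocation")

alloc = TieredAllocator({"HBM": 16 << 10, "DRAM": 1 << 20, "NVRAM": 8 << 20})
print(alloc.allocate(8 << 10, hint="HBM"))   # placed in HBM
print(alloc.allocate(16 << 10, hint="HBM"))  # HBM is now full; falls back to DRAM
```

A data-structure-centric abstraction layers structure-level operations (arrays, queues, hashes) on top of such tier-aware placement, so the same application code runs whether the data lands in DRAM, HBM, or NVRAM.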

9 citations

Proceedings Article
01 Sep 2017
TL;DR: SharP Hash's high performance is obtained through the use of high-performing networks and one-sided semantics and its performance characteristics are demonstrated with a synthetic micro-benchmark and implementation of a Key Value (KV) store, Memcached.
Abstract: A high-performing distributed hash is critical for achieving performance in many applications and in system software on extreme-scale systems. It is also a central part of many Big-Data frameworks, including Memcached, file systems, and job schedulers. However, there is a lack of high-performing distributed hash implementations. In this work, we propose, design, and implement SharP Hash, a high-performing, RDMA-based distributed hash for extreme-scale systems. SharP Hash's high performance is obtained through the use of high-performing networks and one-sided semantics. We perform an evaluation of SharP Hash and demonstrate its performance characteristics with a synthetic micro-benchmark and an implementation of a Key-Value (KV) store, Memcached.
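The placement step of such a distributed hash can be sketched as follows: hashing a key to a home node and a slot is what lets a remote peer access the entry with a one-sided RDMA read or write, without involving the home node's CPU. The node count, slot count, and function name below are illustrative assumptions, not SharP Hash's actual design:

```python
# Sketch of key placement in a distributed hash: every node can compute
# the same (home node, slot) for a key, enabling one-sided remote access.
import hashlib

N_NODES = 8
SLOTS_PER_NODE = 1024

def locate(key: str):
    # A stable hash (unlike Python's built-in hash()) so all nodes agree.
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return h % N_NODES, (h // N_NODES) % SLOTS_PER_NODE

node, slot = locate("user:42")
print(node, slot)
```

With the slot known in advance, a get or put becomes a single RDMA operation to a pre-registered memory region on the home node, which is where the one-sided performance advantage over request/reply designs comes from.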

6 citations

Proceedings Article
21 May 2012
TL;DR: This paper proposes a novel diskless checkpointing technique on clusters of virtual machines that splits virtual machines into sets of orthogonal RAID systems and distributes parity evenly across the cluster, similar to a RAID-5 configuration but using VM images as data elements.
Abstract: Today's high-end computing systems are facing a crisis of high failure rates due to increased numbers of components. Recent studies have shown that traditional fault tolerant techniques incur overheads that more than double execution times on these highly parallel machines. Thus, future high-end computing must be able to provide adequate fault tolerance at an acceptable cost, or the burdens of fault management will severely affect the viability of such systems. Cluster virtualization offers a potentially unique solution for fault management, but brings significant overhead, especially for I/O. In this paper, we propose a novel diskless checkpointing technique on clusters of virtual machines. Our technique splits virtual machines into sets of orthogonal RAID systems and distributes parity evenly across the cluster, similar to a RAID-5 configuration, but using VM images as data elements. Our theoretical analysis shows that our technique significantly reduces the overhead associated with checkpointing by removing the disk I/O bottleneck.
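The parity scheme can be illustrated with plain XOR, the same operation RAID-5 uses: the parity of a set of equally sized VM images allows any single lost image to be rebuilt from the survivors. A minimal sketch, with byte-level XOR standing in for the paper's distributed implementation:

```python
# Sketch of RAID-5-style parity over VM images: XOR all images to form a
# parity block; any one lost image equals the XOR of the survivors + parity.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# Three tiny stand-in "VM images" of equal size.
images = [bytes([1, 2, 3, 4]), bytes([9, 9, 9, 9]), bytes([0, 7, 0, 7])]
parity = xor_blocks(images)

# Node holding image 1 fails; rebuild it from the other images plus parity.
recovered = xor_blocks([images[0], images[2], parity])
assert recovered == images[1]
print("recovered")
```

Because the parity lives in the memory of peer nodes rather than on disk, a checkpoint never touches the parallel file system, which is how the technique removes the disk I/O bottleneck.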

5 citations


Cited by
Proceedings Article
17 Jun 2013
TL;DR: A novel user-space file system stores data in main memory and transparently spills over to other storage, like local flash memory or the parallel file system, as needed; this extends the reach of libraries like SCR to systems where they otherwise could not be used.
Abstract: With the massive scale of high-performance computing systems, long-running scientific parallel applications periodically save the state of their execution to files called checkpoints to recover from system failures. Checkpoints are stored on external parallel file systems, but limited bandwidth makes this a time-consuming operation. Multilevel checkpointing systems, like the Scalable Checkpoint/Restart (SCR) library, alleviate this bottleneck by caching checkpoints in storage located close to the compute nodes. However, most large-scale systems do not provide file storage on compute nodes, preventing the use of SCR. We have implemented a novel user-space file system that stores data in main memory and transparently spills over to other storage, like local flash memory or the parallel file system, as needed. This technique extends the reach of libraries like SCR to systems where they otherwise could not be used. Furthermore, we expose file contents for Remote Direct Memory Access, allowing external tools to copy checkpoints to the parallel file system in the background with reduced CPU interruption. Our file system scales linearly with node count and delivers 1 PB/s throughput at three million MPI processes, which is 20x faster than the system RAM disk and 1000x faster than the parallel file system.
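The spill-over behavior can be sketched with a toy two-tier store: a fixed-size in-memory cache holds checkpoint files until its capacity is exceeded, after which writes transparently land in a slower tier. The class, capacities, and file names below are illustrative; a dict stands in for flash or the parallel file system:

```python
# Toy sketch of memory storage with transparent spill-over to a slower
# tier, in the spirit of the user-space file system described above.

class SpillStore:
    def __init__(self, mem_capacity):
        self.mem_capacity = mem_capacity
        self.mem_used = 0
        self.memory = {}  # fast tier (main memory)
        self.spill = {}   # slow tier stand-in (flash / parallel FS)

    def write(self, name, data):
        if self.mem_used + len(data) <= self.mem_capacity:
            self.memory[name] = data
            self.mem_used += len(data)
        else:
            self.spill[name] = data  # transparent overflow

    def read(self, name):
        # The caller never needs to know which tier holds the file.
        if name in self.memory:
            return self.memory[name]
        return self.spill[name]

store = SpillStore(mem_capacity=100)
store.write("ckpt.0", b"x" * 80)  # fits in memory
store.write("ckpt.1", b"y" * 50)  # exceeds capacity, spills over
assert store.read("ckpt.1") == b"y" * 50
print(sorted(store.spill))  # ['ckpt.1']
```

A real implementation would also migrate data between tiers in the background; the point of the sketch is only that the tier boundary is invisible to readers.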

56 citations

01 Jan 2001
TL;DR: Using a single piece of software, a virtual machine is created on which Windows 98, DOS, Linux, and even the famous UNIX can be installed: an introduction to VMware Workstation.
Abstract: Listen up: this is not handing you a real physical machine, but rather using a piece of software to virtualize another machine. On this virtual machine you can install Windows 98, DOS, Linux, and even the famous UNIX. Pretty cool, right? This is the new friend I am introducing to you today: VMware Workstation.

37 citations

Journal Article
TL;DR: Overall, this work observes that great advancements have been made in tackling the two primary exascale challenges, energy efficiency and fault tolerance, but it still foresees two major concerns: the lack of suitable programming tools and the growing gap between processor performance and data bandwidth.
Abstract: The next generation of supercomputers will break the exascale barrier. Soon we will have systems capable of at least one quintillion (billion billion) floating-point operations per second (10^18 FLOPS). Tremendous amounts of work have been invested into identifying and overcoming the challenges of the exascale era. In this work, we present an overview of these efforts and provide insight into the important trends, developments, and exciting research opportunities in exascale computing. We use a three-stage approach in which we (1) discuss various exascale landmark studies, (2) use data-driven techniques to analyze the large collection of related literature, and (3) discuss eight research areas in depth based on influential articles. Overall, we observe that great advancements have been made in tackling the two primary exascale challenges: energy efficiency and fault tolerance. However, as we look forward, we still foresee two major concerns: the lack of suitable programming tools and the growing gap between processor performance and data bandwidth (i.e., memory, storage, networks). Although we will certainly reach exascale soon, without additional research, these issues could potentially limit the applicability of exascale computing.

37 citations

Proceedings Article
14 Mar 2015
TL;DR: A deep dive on VMI consistency aspects is presented to understand the sources of inconsistency in observed VM state and shows that, contrary to common expectation, pause-and-introspect based VMI techniques achieve very little to improve consistency despite their substantial performance impact.
Abstract: While there are a variety of existing virtual machine introspection (VMI) techniques, their latency, overhead, complexity and consistency trade-offs are not clear. In this work, we address this gap by first organizing the various existing VMI techniques into a taxonomy based upon their operational principles, so that they can be put into context. Next we perform a thorough exploration of their trade-offs both qualitatively and quantitatively. We present a comprehensive set of observations and best practices for efficient, accurate and consistent VMI operation based on our experiences with these techniques. Our results show the stunning range of variations in performance, complexity and overhead with different VMI techniques. We further present a deep dive on VMI consistency aspects to understand the sources of inconsistency in observed VM state and show that, contrary to common expectation, pause-and-introspect based VMI techniques achieve very little to improve consistency despite their substantial performance impact.

35 citations

Proceedings Article
01 Dec 2012
TL;DR: This paper describes the techniques for virtualizing a CCI, the types of attacks on a VCCI, and the vulnerabilities of VMMs, and it assesses the significance of security tools and techniques for securing a VCCI.
Abstract: A multi-tenant Cloud Computing Infrastructure (CCI) consists of several Virtual Machines (VMs) running on the same physical platform by using virtualization techniques. The VMs are monitored and managed by kernel-based software, i.e., a Virtual Machine Monitor (VMM) or hypervisor, which is the main component of a Virtualized Cloud Computing Infrastructure (VCCI). Due to software-based vulnerabilities, VMMs are exposed to security attacks that may be launched by inside or outside attackers. In order to form a secure VCCI, the VMM must be protected by implementing strong security tools and techniques such as Encryption and Key Management (EKM), Access Control Mechanisms (ACMs), Intrusion Detection Tools (IDTs), Virtual Trusted Platform Modules (vTPMs), Virtual Firewalls (VFs), and Trusted Virtual Domains (TVDs). In this research paper we describe the techniques for virtualizing a CCI, the types of attacks on a VCCI, and the vulnerabilities of VMMs, and we critically assess the significance of security tools and techniques for securing a VCCI.

28 citations