Author

Jeanine Cook

Bio: Jeanine Cook is an academic researcher from Sandia National Laboratories. The author has contributed to research in topics: Proxy (statistics) & Cache. The author has an h-index of 11, has co-authored 53 publications, and has received 653 citations. Previous affiliations of Jeanine Cook include New Mexico State University & Los Alamos National Laboratory.


Papers
Journal ArticleDOI
29 Mar 2011
TL;DR: The Structural Simulation Toolkit (SST) as discussed by the authors is an open, modular, parallel, multi-criteria, multi-scale simulation framework for HPC systems that includes a number of processor, memory, and network models.
Abstract: As supercomputers grow, understanding their behavior and performance has become increasingly challenging. New hurdles in scalability, programmability, power consumption, reliability, cost, and cooling are emerging, along with new technologies such as 3D integration, GP-GPUs, silicon-photonics, and other "game changers". Currently, the HPC community lacks a unified toolset to evaluate these technologies and design for these challenges. To address this problem, a number of institutions have joined together to create the Structural Simulation Toolkit (SST), an open, modular, parallel, multi-criteria, multi-scale simulation framework. The SST includes a number of processor, memory, and network models. The SST has been used in a variety of network, memory, and application studies and aims to become the standard simulation framework for designing and procuring HPC systems.
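For context, SST models are typically assembled from a Python configuration script that instantiates components and wires them together with links. The sketch below illustrates that style, but the element names, port names, parameters, and latencies are illustrative assumptions rather than a working SST configuration.

# Illustrative sketch of an SST-style Python configuration.
# Element/port names and parameter values are assumptions, not a tested config.
import sst

# A simple processor model component (element library and name assumed)
cpu = sst.Component("cpu0", "miranda.BaseCPU")
cpu.addParams({"clock": "2.4GHz"})

# A memory controller component (element library and name assumed)
mem = sst.Component("memory0", "memHierarchy.MemController")
mem.addParams({"clock": "1GHz"})

# Connect the two components with a link (port names and latencies assumed)
link = sst.Link("cpu_mem_link")
link.connect((cpu, "cache_link", "1ns"), (mem, "direct_link", "1ns"))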

270 citations

Proceedings ArticleDOI
16 Nov 2014
TL;DR: A set of abstract machine models is presented, and parameters are applied to one of these models to demonstrate how a proxy architecture can enable a more concrete exploration of how well application codes map onto future architectures.
Abstract: To achieve exascale computing, fundamental hardware architectures must change. This will significantly impact scientific applications that run on current high performance computing (HPC) systems, many of which codify years of scientific domain knowledge and refinements for contemporary computer systems. To adapt to exascale architectures, developers must be able to reason about new hardware and determine what programming models and algorithms will provide the best blend of performance and energy efficiency in the future. An abstract machine model is designed to expose to the application developers and system software only the aspects of the machine that are important or relevant to performance and code structure. These models are intended as communication aids between application developers and hardware architects during the co-design process. A proxy architecture is a parameterized version of an abstract machine model, with parameters added to elucidate potential speeds and capacities of key hardware components. These more detailed architectural models enable discussion among the developers of analytic models and simulators and computer hardware architects, and they allow for application performance analysis, system software development, and hardware optimization opportunities. In this paper, we present a set of abstract machine models and show how they might be used to help software developers prepare for exascale. We then apply parameters to one of these models to demonstrate how a proxy architecture can enable a more concrete exploration of how well application codes map onto future architectures.
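As a rough illustration of what "adding parameters to an abstract machine model" can look like, the sketch below defines a hypothetical proxy-architecture record and uses it in a simple roofline-style bound. The parameter names and values are assumptions for illustration, not the models or numbers from the paper.

# Hypothetical proxy architecture: an abstract machine model whose key
# components carry explicit parameters. All values below are illustrative.
from dataclasses import dataclass

@dataclass
class ProxyArchitecture:
    cores_per_node: int       # homogeneous compute cores per node
    flops_per_core: float     # peak FLOP/s per core
    mem_bandwidth: float      # bytes/s from main memory into the node
    nic_bandwidth: float      # bytes/s injected into the network

    def attainable_flops(self, arithmetic_intensity: float) -> float:
        """Roofline-style bound: min(peak compute, intensity * memory bandwidth)."""
        peak = self.cores_per_node * self.flops_per_core
        return min(peak, arithmetic_intensity * self.mem_bandwidth)

# Example: is a kernel with 0.25 FLOPs/byte compute- or memory-bound here?
node = ProxyArchitecture(cores_per_node=64, flops_per_core=16e9,
                         mem_bandwidth=200e9, nic_bandwidth=25e9)
print(node.attainable_flops(0.25))   # 50 GFLOP/s -> memory-bound on this model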

62 citations

Proceedings ArticleDOI
08 Sep 2015
TL;DR: This paper characterizes task scheduling overheads and shows metrics to determine optimal task size, the first step toward the goal of dynamically adapting task size to optimize parallel performance.
Abstract: As High Performance Computing moves toward Exascale, where parallel applications will be expected to run on millions of cores concurrently, every component of the computational model must perform optimally. One such component, the task scheduler, can potentially be optimized to runtime application requirements. We focus our study using a task-based runtime system, one possible solution towards Exascale computation. Based on task size and scheduler, the overheads associated with task scheduling vary. Therefore, to minimize overheads and optimize performance, either the task size or the scheduler must adapt. In this paper, we focus on adapting the task size, which can be easily done statically and potentially done dynamically. To this end, we first show how scheduling overheads change with task size or granularity. We then propose and execute a methodology to characterize these overheads and dynamically measure the effects of task granularity. The HPX runtime system [1] employs asynchronous fine-grained task scheduling and incorporates a dynamic performance modeling capability, providing an ideal experimental platform. Using the performance counter capabilities in HPX, we characterize task scheduling overheads and show metrics to determine optimal task size. This is the first step toward the goal of dynamically adapting task size to optimize parallel performance.
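The grain-size trade-off described above can be illustrated with a toy measurement outside of HPX: fix the total work, vary how many tasks it is split into, and attribute the runtime above the coarse-grained baseline to per-task scheduling overhead. The sketch below is a generic Python illustration of that methodology, not the HPX performance-counter mechanism used in the paper.

# Toy characterization of task-scheduling overhead versus task granularity.
# Generic illustration only; the paper measures this with HPX's own counters.
import time
from concurrent.futures import ThreadPoolExecutor

TOTAL_ITERATIONS = 1_000_000

def work(iterations):
    s = 0
    for i in range(iterations):
        s += i
    return s

def run_with_grain(num_tasks, pool):
    grain = TOTAL_ITERATIONS // num_tasks
    start = time.perf_counter()
    futures = [pool.submit(work, grain) for _ in range(num_tasks)]
    for f in futures:
        f.result()
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=4) as pool:
    baseline = run_with_grain(4, pool)            # coarse grain: near-ideal time
    for num_tasks in (16, 256, 4096):
        t = run_with_grain(num_tasks, pool)
        overhead_per_task = (t - baseline) / num_tasks
        print(num_tasks, t, overhead_per_task)    # overhead grows as grain shrinks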

33 citations

Proceedings ArticleDOI
27 Sep 2005
TL;DR: This work quantifies the estimation error of the event-counts in the multiplexed mode, which indicates that as many as 42% of sampled intervals are estimated with error greater than 10%, and proposes new estimation algorithms that result in an accuracy improvement of up to 40%.
Abstract: On-chip performance counters are gaining popularity as an analysis and validation tool. Most contemporary processors have between two and six physical counters that can monitor an equal number of unique events simultaneously at fixed sampling periods. Through multiplexing and estimation, an even greater number of unique events can be monitored in a single program execution. When a program is sampled in multiplexed mode using round-robin scheduling of a specified event set, the number of events that are physically counted during each sampling period is limited by the number of counters that can be simultaneously accessed. During this period, the remaining events of the multiplexed event-set are not monitored, but their counts are estimated. Our work quantifies the estimation error of the event-counts in the multiplexed mode, which indicates that as many as 42% of sampled intervals are estimated with error greater than 10%. We propose new estimation algorithms that result in an accuracy improvement of up to 40%.
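To make the multiplexing-and-estimation idea concrete, the sketch below shows the simplest estimator commonly used with round-robin multiplexing: linearly scaling each event's observed count by the fraction of the run during which its counter was actually scheduled. The event names and numbers are hypothetical, and the paper's improved estimators go beyond this naive scaling.

# Naive linear-scaling estimation for multiplexed performance counters.
# Hypothetical data; the paper proposes more accurate estimators than this.

# Each event is physically counted only during the intervals assigned to it
# by round-robin scheduling; its total count is then scaled up by the ratio
# of total program time to the time the event was actually counted.
def estimate_total(observed_count, time_counted, total_time):
    return observed_count * (total_time / time_counted)

# Example: 4 events multiplexed onto 2 physical counters, so each event is
# physically counted during only half of the run.
total_time = 1.0                        # seconds (hypothetical)
events = {"L1_misses":  (1_200_000, 0.5),
          "L2_misses":  (300_000, 0.5),
          "branches":   (9_000_000, 0.5),
          "TLB_misses": (40_000, 0.5)}

for name, (count, counted_time) in events.items():
    print(name, estimate_total(count, counted_time, total_time))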

28 citations

Journal ArticleDOI
TL;DR: When used with an earthquake simulation and an aerospace application, the proposed modeling framework reduces energy consumption by up to 48.65 percent and 30.67 percent, respectively.
Abstract: Energy-efficient scientific applications require insight into how high-performance computing system features impact the applications' power and performance. This insight results from the development of performance and power models. When used with an earthquake simulation and an aerospace application, a proposed modeling framework reduces energy consumption by up to 48.65 percent and 30.67 percent, respectively.
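A minimal sketch of the kind of model such a framework rests on, assuming a DVFS setting where the compute-bound fraction of runtime scales with frequency, the memory-bound fraction does not, and dynamic power grows roughly with the cube of frequency. All coefficients below are made up; the paper's framework is considerably more detailed.

# Toy power/performance model used to pick an energy-minimizing CPU frequency.
# Illustrative assumptions only, not the framework proposed in the paper.
F_MAX = 3.0e9        # maximum frequency (Hz)
T_COMPUTE = 60.0     # seconds of compute-bound work at F_MAX
T_MEMORY = 40.0      # seconds of memory-bound work (frequency-insensitive)
P_STATIC = 40.0      # static/leakage power (W)
C_DYN = 60.0         # dynamic power at F_MAX (W), scaled as (f / F_MAX)**3

def runtime(f):
    return T_COMPUTE * (F_MAX / f) + T_MEMORY

def power(f):
    return P_STATIC + C_DYN * (f / F_MAX) ** 3

def energy(f):
    return power(f) * runtime(f)

# Sweep candidate frequencies and report the energy-minimizing one.
candidates = [1.2e9, 1.6e9, 2.0e9, 2.4e9, 2.8e9, 3.0e9]
best = min(candidates, key=energy)
print(best, energy(best))   # running below F_MAX can win when work is memory-bound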

28 citations


Cited by
01 May 1993
TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.
Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently—those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90% and a 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.
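Of the three decompositions described, the spatial one is the easiest to picture: the simulation box is cut into a processor grid, each processor owns the atoms inside its sub-box, and only atoms within the force cutoff of a boundary need to be exchanged as ghosts. The sketch below is a hypothetical illustration of that assignment step, not the paper's implementation.

# Hypothetical sketch of spatial decomposition for short-range MD:
# assign each atom to the processor that owns its sub-box of the domain.
import numpy as np

BOX = 100.0             # cubic box edge length (arbitrary units)
GRID = (4, 4, 4)        # processor grid (64 ranks)
CUTOFF = 2.5            # short-range force cutoff

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, BOX, size=(10_000, 3))     # toy atom coordinates

cell = np.array([BOX / g for g in GRID])                 # sub-box dimensions
owner_index = np.floor(positions / cell).astype(int)     # (ix, iy, iz) per atom
owner_rank = (owner_index[:, 0] * GRID[1] * GRID[2]
              + owner_index[:, 1] * GRID[2]
              + owner_index[:, 2])

# An atom is a ghost candidate if it lies within CUTOFF of any face of its sub-box.
local = positions - owner_index * cell
near_boundary = np.any((local < CUTOFF) | (local > cell - CUTOFF), axis=1)
print(np.bincount(owner_rank).mean(), near_boundary.mean())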

29,323 citations

Journal ArticleDOI
TL;DR: The process of validating DRAMSim2 timing against manufacturer Verilog models in an effort to prove the accuracy of simulation results is described.
Abstract: In this paper we present DRAMSim2, a cycle-accurate memory system simulator. The goal of DRAMSim2 is to be an accurate and publicly available DDR2/3 memory system model which can be used in both full system and trace-based simulations. We describe the process of validating DRAMSim2 timing against manufacturer Verilog models in an effort to prove the accuracy of simulation results. We outline the combination of DRAMSim2 with a cycle-accurate x86 simulator that can be used to perform full system simulations. Finally, we discuss DRAMVis, a visualization tool that can be used to graph and compare the results of DRAMSim2 simulations.
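To show what "trace-based" use of such a simulator means in practice, the sketch below drives a stand-in memory model from a list of timestamped requests and advances it one cycle at a time. The class and method names are hypothetical placeholders, not the DRAMSim2 C++ API.

# Hypothetical trace-driven loop around a cycle-accurate memory model.
# MemoryModel and its methods are placeholders, not the DRAMSim2 API.
class MemoryModel:
    """Stand-in for a cycle-accurate DRAM model with a fixed service latency."""
    LATENCY = 40                              # cycles (illustrative)

    def __init__(self):
        self.in_flight = []                   # (finish_cycle, address, is_write)
        self.completed = 0

    def add_transaction(self, cycle, address, is_write):
        self.in_flight.append((cycle + self.LATENCY, address, is_write))

    def update(self, cycle):
        done = [t for t in self.in_flight if t[0] <= cycle]
        self.in_flight = [t for t in self.in_flight if t[0] > cycle]
        self.completed += len(done)

# Trace entries: (issue_cycle, address, is_write) -- hypothetical format.
trace = [(0, 0x1000, False), (3, 0x2040, True), (10, 0x1000, False)]

mem = MemoryModel()
for cycle in range(100):
    for issue, addr, wr in trace:
        if issue == cycle:
            mem.add_transaction(cycle, addr, wr)
    mem.update(cycle)                          # advance the model one cycle
print(mem.completed)                           # all three requests retire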

860 citations

Journal ArticleDOI
TL;DR: A motivation and brief review are presented of the ongoing effort to port Quantum ESPRESSO onto heterogeneous architectures based on hardware accelerators, which will overcome the energy constraints currently hindering the way toward exascale computing.
Abstract: Quantum ESPRESSO is an open-source distribution of computer codes for quantum-mechanical materials modeling, based on density-functional theory, pseudopotentials, and plane waves, and renowned for its performance on a wide range of hardware architectures, from laptops to massively parallel computers, as well as for the breadth of its applications. In this paper, we present a motivation and brief review of the ongoing effort to port Quantum ESPRESSO onto heterogeneous architectures based on hardware accelerators, which will overcome the energy constraints that are currently hindering the way toward exascale computing.

543 citations