
Showing papers by "Robert E. Walkup" published in 2008


Proceedings ArticleDOI
20 Apr 2008
TL;DR: A first implementation provides 256 concurrent 64-bit counters, offering up to a 64x increase in the number of counters compared to the performance monitors typically found in microprocessors today, dramatically expanding the capabilities of counter-based performance tuning.
Abstract: We present a novel performance monitor architecture, implemented in the Blue Gene/P™ supercomputer. This performance monitor supports the tracking of a large number of concurrent events by using a hybrid counter architecture. The counters keep their low-order bits in registers that are updated concurrently, while the high-order counter data is maintained in a dense SRAM array that is updated from the registers on a regular basis. The performance monitoring architecture includes support for per-event thresholding and fast event notification, using a two-phase interrupt-arming and triggering protocol. A first implementation provides 256 concurrent 64-bit counters, offering up to a 64x increase in the number of counters compared to the performance monitors typically found in microprocessors today, thereby dramatically expanding the capabilities of counter-based performance tuning.
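The split between concurrently updated low-order registers and an SRAM-backed high-order array can be illustrated with a small software analogue. The sketch below is purely illustrative: the names, the 16-bit register width, and the spill policy are hypothetical choices, not the hardware's; the point is only that a periodic spill, scheduled well before the narrow counters wrap, yields full-width counts while keeping the per-event update cheap.

```c
#include <stdio.h>
#include <stdint.h>

#define NUM_COUNTERS 256

/* Hypothetical software analogue of the hybrid counter scheme:
 * low-order bits live in narrow, concurrently updated "registers";
 * high-order data lives in a dense 64-bit array standing in for
 * the SRAM. */
static uint16_t low_bits[NUM_COUNTERS]; /* fast, narrow registers    */
static uint64_t sram[NUM_COUNTERS];     /* dense, wide backing store */

/* Count one event: only the narrow register is touched. */
static void count_event(int id) { low_bits[id]++; }

/* Periodic spill: fold the registers into the wide counters and
 * clear them, well before the 16-bit values can wrap. */
static void spill_to_sram(void)
{
    for (int i = 0; i < NUM_COUNTERS; i++) {
        sram[i] += low_bits[i];
        low_bits[i] = 0;
    }
}

/* Full 64-bit count = SRAM part + not-yet-spilled register part. */
static uint64_t read_counter(int id) { return sram[id] + low_bits[id]; }

int main(void)
{
    for (int i = 0; i < 40000; i++) count_event(7);
    spill_to_sram();                 /* absorb into high-order state */
    for (int i = 0; i < 30000; i++) count_event(7);
    printf("counter 7 = %llu\n", (unsigned long long)read_counter(7));
    return 0;
}
```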

35 citations


Journal ArticleDOI
TL;DR: The Gyrokinetic Toroidal Code (GTC) was developed to study the global influence of microturbulence on particle and energy confinement, and has been optimized on the IBM Blue Gene/L™ (BG/L) computer, achieving essentially linear scaling on more than 30,000 processors.
Abstract: As the global energy economy makes the transition from fossil fuels toward cleaner alternatives, nuclear fusion becomes an attractive potential solution for satisfying growing needs. Fusion, the power source of the stars, has been the focus of active research since the early 1950s. While progress has been impressive, especially for magnetically confined plasma devices called tokamaks, the design of a practical power plant remains an outstanding challenge. A key topic of current interest is microturbulence, which is believed to be responsible for the unacceptably large leakage of energy and particles out of the hot plasma core. Understanding and controlling this process is of utmost importance for operating current devices and designing future ones. In addressing such issues, the Gyrokinetic Toroidal Code (GTC) was developed to study the global influence of microturbulence on particle and energy confinement. It has been optimized on the IBM Blue Gene/L™ (BG/L) computer, achieving essentially linear scaling on more than 30,000 processors. A full simulation of unprecedented phase-space resolution was carried out with 32,768 processors on the BG/L supercomputer located at the IBM T. J. Watson Research Center, providing new insights into the influence of collisions on microturbulence.

34 citations


Journal ArticleDOI
01 Jul 2008
TL;DR: In this article, the authors carried out a nature run involving an idealized high-resolution rotating fluid on the hemisphere, at a size and resolution never before attempted, and used it to investigate, via simulation, scales that span the k⁻³ to k⁻⁵/³ kinetic energy spectral transition.
Abstract: The Weather Research and Forecast (WRF) model is a model of the atmosphere for mesoscale research and operational numerical weather prediction (NWP). A petascale problem for WRF is a nature run that provides very high-resolution 'truth' against which coarser simulations or perturbation runs may be compared for purposes of studying predictability, stochastic parameterization, and fundamental dynamics. We carried out a nature run involving an idealized high-resolution rotating fluid on the hemisphere, at a size and resolution never before attempted, and used it to investigate scales that span the k⁻³ to k⁻⁵/³ kinetic energy spectral transition. We used up to 15,360 processors of the New York Blue IBM BG/L machine at Stony Brook University and Brookhaven National Laboratory. The grid we employed has 4486 by 4486 horizontal grid points and 101 vertical levels (2 billion cells) at 5 km resolution; this is 32 times larger than the previously largest WRF benchmark, the 63-million-cell, 2.5-km resolution CONUS case [10]. To solve a problem of this size, we worked through issues of parallel I/O and scalability and employed more processors than had ever been used in a WRF run. We achieved a sustained 3.4 Tflop/s on the New York Blue system, inputting and then generating an enormous amount of data to produce a scientifically meaningful result. More than 200 GB of data was input to initialize the run, which then generated output datasets of 40 GB for each simulated hour. The cost of output was considered a key component of our investigation. We then ran the same problem on more than 12K processors of the XT4 system at NERSC and achieved 8.8 Tflop/s. Our primary result, however, is not just scalability and a high Tflop/s number, but the capture of atmospheric features never before represented by simulation, taking an important step toward understanding weather predictability at high resolution.
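The grid and I/O figures quoted in the abstract can be sanity-checked with simple arithmetic. The short program below reproduces the 2-billion-cell count and the 32x ratio from the stated dimensions; the per-cell output estimate it derives from the 40 GB/hour figure is our inference, not a number from the paper.

```c
#include <stdio.h>

int main(void)
{
    long long nx = 4486, ny = 4486, nz = 101;  /* grid from the abstract */
    long long cells = nx * ny * nz;

    printf("grid cells: %.2f billion\n", cells / 1e9);        /* ~2.03 */
    printf("vs 63M-cell CONUS benchmark: %.1fx\n", cells / 63e6); /* ~32x */

    /* 40 GB of output per simulated hour implies roughly: */
    double bytes_per_cell = 40e9 / (double)cells;
    printf("output per cell per hour: ~%.0f bytes "
           "(about five 4-byte fields)\n", bytes_per_cell);
    return 0;
}
```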

26 citations


Journal ArticleDOI
TL;DR: The High Order Method Modeling Environment is a scalable, spectral-element-based prototype for the Community Atmospheric Model component of the Community Climate System Model; its semi-implicit time integration circumvents the time-step restrictions associated with gravity waves.
Abstract: The High Order Method Modeling Environment is a scalable, spectral-element-based prototype for the Community Atmospheric Model component of the Community Climate System Model. The 3D moist primitive equations are solved on the cubed sphere with a hybrid pressure (η) vertical coordinate, using an Emanuel convective parametrization for moist processes. Semi-implicit time integration, based on a preconditioned conjugate gradient solver, circumvents the time-step restrictions associated with gravity waves. Benchmarks for two standard test problems at 10 km horizontal resolution have been run on Blue Gene/L. Results obtained on a 32-rack Blue Gene/L system (65,536 processors, 183.5-teraflop peak) show sustained performance of 8.0 teraflops on 32,768 processors for the moist Held–Suarez test problem in coprocessor mode and 11.3 teraflops on 32,768 processors for the aquaplanet test problem running in virtual node mode.
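The abstract credits a preconditioned conjugate gradient solver for removing the gravity-wave time-step limit. As a generic illustration of that solver class, here is a minimal PCG with a Jacobi (diagonal) preconditioner on a small symmetric positive definite test system. HOMME's actual Helmholtz operator and preconditioner are not described in the abstract, so this is only a sketch of the method's skeleton, with a dense matrix assumed for brevity.

```c
#include <math.h>
#include <stdio.h>

#define N 3

/* y = A * x for a small dense matrix (stand-in for the real operator). */
static void matvec(const double A[N][N], const double x[N], double y[N])
{
    for (int i = 0; i < N; i++) {
        y[i] = 0.0;
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * x[j];
    }
}

static double dot(const double a[N], const double b[N])
{
    double s = 0.0;
    for (int i = 0; i < N; i++) s += a[i] * b[i];
    return s;
}

int main(void)
{
    /* Symmetric positive definite test system A x = b. */
    double A[N][N] = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}};
    double b[N] = {1, 2, 3};
    double x[N] = {0, 0, 0};
    double r[N], z[N], p[N], Ap[N];

    /* r = b - A x; z = M^-1 r with M = diag(A); p = z. */
    matvec(A, x, r);
    for (int i = 0; i < N; i++) r[i] = b[i] - r[i];
    for (int i = 0; i < N; i++) z[i] = r[i] / A[i][i];
    for (int i = 0; i < N; i++) p[i] = z[i];
    double rz = dot(r, z);

    for (int it = 0; it < 100 && sqrt(dot(r, r)) > 1e-10; it++) {
        matvec(A, p, Ap);
        double alpha = rz / dot(p, Ap);       /* step length          */
        for (int i = 0; i < N; i++) x[i] += alpha * p[i];
        for (int i = 0; i < N; i++) r[i] -= alpha * Ap[i];
        for (int i = 0; i < N; i++) z[i] = r[i] / A[i][i];
        double rz_new = dot(r, z);
        double beta = rz_new / rz;            /* direction update     */
        rz = rz_new;
        for (int i = 0; i < N; i++) p[i] = z[i] + beta * p[i];
    }
    printf("x = %.6f %.6f %.6f\n", x[0], x[1], x[2]);
    return 0;
}
```

The preconditioner is what makes the semi-implicit step pay off: each implicit solve must converge in few iterations, or the larger time step gained by treating gravity waves implicitly is lost to solver cost.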

11 citations