Author

Matthew Wolf

Other affiliations: Georgia Institute of Technology
Bio: Matthew Wolf is an academic researcher at Oak Ridge National Laboratory. He has contributed to research topics including Workflow and Data visualization. He has an h-index of 26 and has co-authored 131 publications receiving 2,782 citations. His previous affiliations include the Georgia Institute of Technology.


Papers
Journal ArticleDOI
TL;DR: Describes the startling observations made in the last half decade of I/O research and development, and details some of the challenges that remain as the Exascale era approaches.
Abstract: Applications running on leadership platforms are increasingly bottlenecked by storage input/output (I/O). In an effort to combat the growing disparity between I/O throughput and compute capability, we created the Adaptable IO System (ADIOS) in 2005. Focusing on putting users first with a service-oriented architecture, we combined cutting-edge research into new I/O techniques with a design effort to create near-optimal I/O methods. As a result, ADIOS provides the highest level of synchronous I/O performance for a number of mission-critical applications at various Department of Energy Leadership Computing Facilities. Meanwhile, ADIOS is leading the push for next-generation techniques, including staging and data-processing pipelines. In this paper, we describe the startling observations we have made in the last half decade of I/O research and development, and elaborate on the lessons we have learned along this journey. We also detail some of the challenges that remain as we look toward the coming Exascale era. Copyright © 2013 John Wiley & Sons, Ltd.

201 citations
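
The abstract above describes ADIOS at the architectural level. As a concrete illustration, the sketch below writes one variable per simulation step using the C++ API of ADIOS2, the current incarnation of the library (the 2005-era API differed); the variable name, array size, and step count are illustrative assumptions, not details from the paper.

```cpp
// Minimal sketch: writing one variable per simulation step with the ADIOS2
// C++ API (ADIOS2 is the current incarnation of the library; the 2005-era
// API differed). Variable name, array size, and step count are illustrative.
#include <adios2.h>
#include <cstddef>
#include <vector>

int main()
{
    const std::size_t N = 1024;              // illustrative local array size
    std::vector<double> field(N, 1.0);       // stand-in for simulation output

    adios2::ADIOS adios;                     // serial context; pass an MPI communicator in parallel runs
    adios2::IO io = adios.DeclareIO("SimOutput");

    // Global shape equals local count here because this sketch is serial.
    auto var = io.DefineVariable<double>("field", {N}, {0}, {N});

    adios2::Engine writer = io.Open("output.bp", adios2::Mode::Write);
    for (int step = 0; step < 5; ++step)
    {
        writer.BeginStep();                  // one I/O step per simulation step
        writer.Put(var, field.data());
        writer.EndStep();                    // the engine may flush asynchronously
    }
    writer.Close();
    return 0;
}
```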

Journal ArticleDOI
TL;DR: Experimental evaluations of the flexible 'DataStager' framework establish both the necessity of intelligent data staging and the high performance of the approach, using the GTC fusion modeling code and benchmarks running on 1000+ processors.
Abstract: Known challenges for petascale machines are that (1) the costs of I/O for high performance applications can be substantial, especially for output tasks like checkpointing, and (2) noise from I/O actions can inject undesirable delays into the runtimes of such codes on individual compute nodes. This paper introduces the flexible 'DataStager' framework for data staging, along with alternative services within it, that jointly address (1) and (2). Data staging services, which move output data from compute nodes to staging or I/O nodes prior to storage, are used to reduce I/O overheads on applications' total processing times, and explicit management of data staging offers reduced perturbation when extracting output data from a petascale machine's compute partition. Experimental evaluations of DataStager on the Cray XT machine at Oak Ridge National Laboratory establish both the necessity of intelligent data staging and the high performance of our approach, using the GTC fusion modeling code and benchmarks running on 1000+ processors.

199 citations
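
DataStager's core idea is to move output off the compute nodes asynchronously so the application keeps computing while data drains to staging nodes. The sketch below shows that pattern in generic MPI; it is not DataStager's actual code, and the rank roles, message tag, and buffer size are assumptions for illustration.

```cpp
// Generic sketch of the staging pattern: a compute rank ships its output to
// a staging rank with a nonblocking send and keeps computing while the
// transfer drains; the staging rank absorbs the file-system cost. Not
// DataStager's actual code; roles, tag, and sizes are illustrative.
#include <mpi.h>
#include <fstream>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int STAGER = size - 1;             // last rank plays the staging node
    const int N = 1 << 20;
    std::vector<double> out(N, double(rank));

    if (rank != STAGER)
    {
        MPI_Request req;
        MPI_Isend(out.data(), N, MPI_DOUBLE, STAGER, 0, MPI_COMM_WORLD, &req);
        // ... compute the next timestep here, overlapped with the send ...
        MPI_Wait(&req, MPI_STATUS_IGNORE);   // complete before reusing the buffer
    }
    else
    {
        std::ofstream f("staged.bin", std::ios::binary);
        std::vector<double> buf(N);
        for (int msgs = 0; msgs < size - 1; ++msgs)
        {
            MPI_Recv(buf.data(), N, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            f.write(reinterpret_cast<const char*>(buf.data()),
                    static_cast<std::streamsize>(N * sizeof(double)));
        }
    }
    MPI_Finalize();
    return 0;
}
```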

Proceedings ArticleDOI
19 Apr 2010
TL;DR: PreDatA, short for Preparatory Data Analytics, is an approach to preparing and characterizing data while it is being produced by the large scale simulations running on peta-scale machines that enhances the scalability and flexibility of the current I/O stack on HEC platforms.
Abstract: Peta-scale scientific applications running on High End Computing (HEC) platforms can generate large volumes of data. For high performance storage, and in order to be useful to science end users, such data must be organized in its layout, indexed, sorted, and otherwise manipulated for subsequent data presentation, visualization, and detailed analysis. In addition, scientists desire to gain insights into selected data characteristics 'hidden' or 'latent' in these massive datasets while the data is being produced by simulations. PreDatA, short for Preparatory Data Analytics, is an approach to preparing and characterizing data while it is being produced by large-scale simulations running on peta-scale machines. By dedicating additional compute nodes on the machine as 'staging' nodes and by staging simulations' output data through these nodes, PreDatA can exploit their computational power to perform select data manipulations with lower latency than attainable by first moving data into file systems and storage. Such in-transit manipulations are supported by the PreDatA middleware through asynchronous data movement to reduce write latency, application-specific operations on streaming data that are able to discover latent data characteristics, and appropriate data reorganization and metadata annotation to speed up subsequent data access. PreDatA enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and inspection, as well as data exchange between concurrently running simulations.

173 citations
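
A minimal sketch of the in-transit idea: as a chunk of simulation output passes through a staging node, compute cheap summary metadata (here, a per-chunk min/max) and record it as an annotation so later analysis can skip uninteresting chunks without touching the file system. The ChunkIndex struct and the chunking scheme are illustrative, not PreDatA's actual format.

```cpp
// Sketch of in-transit characterization: as a chunk of output passes through
// a staging node, record cheap summary metadata (per-chunk min/max) so later
// analysis can skip uninteresting chunks without reading them back. The
// ChunkIndex layout is illustrative, not PreDatA's actual annotation format.
#include <cstddef>
#include <limits>
#include <vector>

struct ChunkIndex
{
    std::size_t offset;   // where the chunk lands in the output file
    double      min;
    double      max;
};

ChunkIndex characterize(const std::vector<double>& chunk, std::size_t offset)
{
    double lo = std::numeric_limits<double>::infinity();
    double hi = -lo;
    for (double v : chunk)
    {
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }
    return {offset, lo, hi};   // written alongside the data as metadata
}
```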

Proceedings ArticleDOI
13 Nov 2010
TL;DR: These measurements motivate developing a 'managed' IO approach that uses adaptive algorithms to vary the IO system workload based on current levels and areas of use; this approach achieves higher overall performance and less variability in both a typical usage environment and with artificially introduced levels of 'noise'.
Abstract: Significant challenges exist for achieving peak, or even consistent, levels of performance when using I/O systems at scale. They stem from sharing I/O system resources across the processes of single large-scale applications and/or multiple simultaneous programs, causing internal and external interference, which in turn causes substantial reductions in I/O performance. This paper presents measurements of interference effects for two different file systems at multiple supercomputing sites. These measurements motivate developing a 'managed' I/O approach using adaptive algorithms that vary the I/O system workload based on current levels and areas of use. An implementation of these methods deployed for the shared, general scratch storage system on Oak Ridge National Laboratory machines achieves higher overall performance and less variability in both a typical usage environment and with artificially introduced levels of 'noise'. The latter serves to clearly delineate and illustrate potential problems arising from shared system usage and the advantages derived from actively managing it.

172 citations
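
A toy sketch of what a 'managed' I/O policy can look like: observe the effective bandwidth of each write, treat a drop as a proxy for interference from other jobs, and shrink the next request to reduce pressure, ramping back up as bandwidth recovers. The thresholds, bounds, and step factors below are invented for illustration and are not the paper's actual adaptive algorithms.

```cpp
// Toy sketch of a 'managed' I/O policy: treat a drop in observed write
// bandwidth as a proxy for interference and shrink the next request to
// reduce pressure, ramping back up when bandwidth recovers. Thresholds,
// bounds, and step factors are invented for illustration.
#include <algorithm>
#include <cstddef>

struct AdaptiveWriter
{
    std::size_t request_bytes = std::size_t(1) << 24; // current granularity, 16 MiB
    double target_bw = 1e9;                           // assumed healthy bandwidth, bytes/s

    void observe(std::size_t bytes, double seconds)
    {
        const double bw = double(bytes) / seconds;
        if (bw < 0.5 * target_bw)                     // congested: back off
            request_bytes = std::max(request_bytes / 2, std::size_t(1) << 20);
        else if (bw > 0.9 * target_bw)                // healthy: ramp up
            request_bytes = std::min(request_bytes * 2, std::size_t(1) << 26);
    }
};
```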

Proceedings ArticleDOI
11 Jun 2009
TL;DR: Experimental evaluations of DataStager on the Cray XT machine at Oak Ridge National Laboratory establish the necessity of intelligent data staging and the high performance of the approach, using the GTC fusion modeling code and benchmarks running on 1000+ processors.
Abstract: Known challenges for petascale machines are that (1) the costs of I/O for high performance applications can be substantial, especially for output tasks like checkpointing, and (2) noise from I/O actions can inject undesirable delays into the runtimes of such codes on individual compute nodes. This paper introduces the flexible 'DataStager' framework for data staging, along with alternative services within it, that jointly address (1) and (2). Data staging services, which move output data from compute nodes to staging or I/O nodes prior to storage, are used to reduce I/O overheads on applications' total processing times, and explicit management of data staging offers reduced perturbation when extracting output data from a petascale machine's compute partition. Experimental evaluations of DataStager on the Cray XT machine at Oak Ridge National Laboratory establish both the necessity of intelligent data staging and the high performance of our approach, using the GTC fusion modeling code and benchmarks running on 1000+ processors.

147 citations


Cited by
01 May 1993
TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.
Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently: those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90% and a 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.

29,323 citations
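
Of the three strategies the paper compares, atom decomposition is the simplest to sketch: each of P processors owns a fixed block of atoms and computes forces on just those atoms, at the cost of replicating and all-gathering every position each step (the spatial decomposition the paper favors at scale avoids that O(N) exchange). The toy 1-D pair force below is placeholder physics, and N is assumed divisible by the number of ranks.

```cpp
// Sketch of atom decomposition: each rank owns a fixed block of atoms and
// computes only their forces; positions stay replicated via an all-gather.
// Placeholder physics; assumes N is divisible by the number of ranks.
#include <mpi.h>
#include <cstddef>
#include <vector>

static double force(double xi, double xj)   // toy 1-D pair force, not Lennard-Jones
{
    double r = xj - xi;
    return r / (r * r * r * r + 1e-12);     // illustrative short-range falloff
}

void step(std::vector<double>& x,           // all N positions, replicated on every rank
          int rank, int nprocs)
{
    const std::size_t N = x.size();
    const std::size_t lo = rank * N / nprocs, hi = (rank + 1) * N / nprocs;

    std::vector<double> f(N, 0.0);
    for (std::size_t i = lo; i < hi; ++i)   // my fixed subset of atoms
        for (std::size_t j = 0; j < N; ++j)
            if (j != i) f[i] += force(x[i], x[j]);

    for (std::size_t i = lo; i < hi; ++i)   // crude position update stand-in
        x[i] += 1e-4 * f[i];

    // Every rank rebroadcasts its owned block so positions stay replicated.
    MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                  x.data(), int(N) / nprocs, MPI_DOUBLE, MPI_COMM_WORLD);
}
```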

Journal Article
TL;DR: This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiments.
Abstract: THE DESIGN AND ANALYSIS OF EXPERIMENTS. By Oscar Kempthorne. New York, John Wiley and Sons, Inc., 1952. 631 pp. $8.50. This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiments. It is necessary to have some facility with algebraic notation and manipulation to be able to use the volume intelligently. The problems are presented from the theoretical point of view, without such practical examples as would be helpful for those not acquainted with mathematics. The mathematical justification for the techniques is given. As a somewhat advanced treatment of the design and analysis of experiments, this volume will be interesting and helpful for many who approach statistics theoretically as well as practically. With emphasis on the "why," and with description given broadly, the author relates the subject matter to the general theory of statistics and to the general problem of experimental inference. MARGARET J. ROBERTSON

13,333 citations

Journal ArticleDOI
TL;DR: It is shown how to generate random symmetric structures, and how to introduce 'smart' variation operators that learn about preferable local environments; these substantially improve the efficiency of the evolutionary algorithm USPEX and allow reliable prediction of structures with up to ∼200 atoms in the unit cell.

1,010 citations
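
Stripped of its crystallographic operators, the evolutionary loop behind a method like USPEX looks like the sketch below: keep a population of candidates, rank them by energy, and breed the next generation with variation operators. The real-valued vectors, Gaussian mutation, and quadratic 'energy' here are placeholders; USPEX's actual operators act on symmetric crystal structures.

```cpp
// Generic evolutionary-search sketch (placeholder fitness and operators;
// USPEX's real variation operators act on crystal structures, not vectors).
#include <algorithm>
#include <random>
#include <vector>

using Genome = std::vector<double>;

double energy(const Genome& g)                // toy fitness: sum of squares
{
    double e = 0.0;
    for (double x : g) e += x * x;
    return e;
}

int main()
{
    std::mt19937 rng(7);
    std::normal_distribution<double> noise(0.0, 0.3);
    std::uniform_real_distribution<double> init(-5.0, 5.0);

    const int POP = 20, DIM = 6, GENERATIONS = 100;
    std::vector<Genome> pop(POP, Genome(DIM));
    for (auto& g : pop) for (auto& x : g) x = init(rng);

    for (int gen = 0; gen < GENERATIONS; ++gen)
    {
        // Rank by energy; low energy means fit, as in structure prediction.
        std::sort(pop.begin(), pop.end(),
                  [](const Genome& a, const Genome& b)
                  { return energy(a) < energy(b); });

        // Replace the worst half with mutated copies of the best half.
        for (int i = POP / 2; i < POP; ++i)
        {
            pop[i] = pop[i - POP / 2];               // 'heredity' stand-in
            for (auto& x : pop[i]) x += noise(rng);  // Gaussian mutation
        }
    }
    return 0;
}
```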

Journal ArticleDOI
27 Aug 1999 - Science
TL;DR: Some recent progress in finding the global minima of potential energy functions is described, focusing on applications of the simple "basin-hopping" approach to atomic and molecular clusters and more complicated hypersurface deformation techniques for crystals and biomolecules.
Abstract: Finding the optimal solution to a complex optimization problem is of great importance in many fields, ranging from protein structure prediction to the design of microprocessor circuitry. Some recent progress in finding the global minima of potential energy functions is described, focusing on applications of the simple "basin-hopping" approach to atomic and molecular clusters and more complicated hypersurface deformation techniques for crystals and biomolecules. These methods have produced promising results and should enable larger and more complex systems to be treated in the future.

973 citations
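
Basin-hopping itself is compact enough to sketch directly: perturb the current point, locally minimize into the bottom of the new basin, then accept or reject the move with a Metropolis test. The rugged 1-D test function, step size, and temperature below are illustrative choices, not values from the paper.

```cpp
// Minimal basin-hopping sketch on a 1-D toy landscape: perturb, quench to a
// local minimum, then accept with a Metropolis test. All constants are
// illustrative.
#include <cmath>
#include <random>

double f(double x) { return x * x + 10.0 * std::sin(3.0 * x); }  // rugged toy landscape

double local_min(double x)                  // crude gradient-descent stand-in
{
    for (int i = 0; i < 200; ++i)
    {
        double g = (f(x + 1e-6) - f(x - 1e-6)) / 2e-6;  // numerical gradient
        x -= 0.01 * g;
    }
    return x;
}

double basin_hop(double x0, int hops, double stepsize, double T)
{
    std::mt19937 rng(42);
    std::normal_distribution<double> jump(0.0, stepsize);
    std::uniform_real_distribution<double> u(0.0, 1.0);

    double x = local_min(x0), best = x;
    for (int i = 0; i < hops; ++i)
    {
        double cand = local_min(x + jump(rng));          // hop, then quench
        if (f(cand) < f(x) || u(rng) < std::exp((f(x) - f(cand)) / T))
            x = cand;                                    // Metropolis acceptance
        if (f(x) < f(best)) best = x;
    }
    return best;
}
```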

Journal ArticleDOI
TL;DR: This paper discusses approaches and environments for carrying out analytics on Clouds for Big Data applications, identifies possible gaps in technology, and provides recommendations to the research community on future directions for Cloud-supported Big Data computing and analytics solutions.

773 citations