
Showing papers by "Michael Wilde" published in 2021


Journal ArticleDOI
TL;DR: AutoML-GA, an automated active learning approach for surrogate-based optimization of internal combustion engines, uses a Bayesian optimization technique to find the best machine learning hyperparameters based on an initial dataset obtained from a small number of CFD simulations.
Abstract: In recent years, the use of machine learning-based surrogate models for computational fluid dynamics (CFD) simulations has emerged as a promising technique for reducing the computational cost associated with engine design optimization.

11 citations


DOI
16 Mar 2021
TL;DR: The Workflows Community Summit 2021 as mentioned in this paper provided a view of the state of the art and identified crucial research challenges in the workflow community, which were translated into 6 broad themes for the summit, each of them being the object of a focused discussion.
Abstract: Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure. Due to the popularity of workflows, workflow management systems (WMSs) have been developed to provide abstractions for creating and executing workflows conveniently, efficiently, and portably. While these efforts are all worthwhile, there are now hundreds of independent WMSs, many of which are moribund. As a result, the WMS landscape is segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. Consequently, many teams, small and large, still elect to build their own custom workflow solution rather than adopt, or build upon, existing WMSs. This current state of the WMS landscape negatively impacts workflow users, developers, and researchers. The "Workflows Community Summit" was held online on January 13, 2021. The overarching goal of the summit was to develop a view of the state of the art and identify crucial research challenges in the workflow community. Prior to the summit, a survey sent to stakeholders in the workflow community (including both developers of WMSs and users of workflows) helped to identify key challenges in this community, which were translated into 6 broad themes for the summit, each of them being the object of a focused discussion led by a volunteer member of the community. This report documents and organizes the wealth of information provided by the participants before, during, and after the summit.

8 citations


Posted ContentDOI
TL;DR: Workflows are increasingly composed of tasks that perform computations such as short machine learning inference, multi-node simulations, and long-running machine learning model training, and thus increasingly rely on heterogeneous architectures that include not only CPUs but also GPUs and accelerators.
Abstract: Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms. Workflows will play a crucial role in the data-oriented and post-Moore's computing landscape as they democratize the application of cutting-edge research techniques, computationally intensive methods, and use of new computing platforms. As workflows continue to be adopted by scientific projects and user communities, they are becoming more complex. Workflows are increasingly composed of tasks that perform computations such as short machine learning inference, multi-node simulations, and long-running machine learning model training, and thus increasingly rely on heterogeneous architectures that include not only CPUs but also GPUs and accelerators. The workflow management system (WMS) technology landscape is currently segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. Another fundamental problem is that there are conflicting theoretical bases and abstractions for a WMS. Systems that use the same underlying abstractions can likely be translated between one another, which is not the case for systems that use different abstractions.

4 citations


Posted ContentDOI
TL;DR: The Workflows Community Summit 2021 as discussed by the authors provided a view of the state of the art and identified crucial research challenges in the workflow community, which were translated into 6 broad themes for the summit, each of them being the object of a focused discussion.
Abstract: Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure. Due to the popularity of workflows, workflow management systems (WMSs) have been developed to provide abstractions for creating and executing workflows conveniently, efficiently, and portably. While these efforts are all worthwhile, there are now hundreds of independent WMSs, many of which are moribund. As a result, the WMS landscape is segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. Consequently, many teams, small and large, still elect to build their own custom workflow solution rather than adopt, or build upon, existing WMSs. This current state of the WMS landscape negatively impacts workflow users, developers, and researchers. The "Workflows Community Summit" was held online on January 13, 2021. The overarching goal of the summit was to develop a view of the state of the art and identify crucial research challenges in the workflow community. Prior to the summit, a survey sent to stakeholders in the workflow community (including both developers of WMSs and users of workflows) helped to identify key challenges in this community, which were translated into 6 broad themes for the summit, each of them being the object of a focused discussion led by a volunteer member of the community. This report documents and organizes the wealth of information provided by the participants before, during, and after the summit.

2 citations


Posted Content
TL;DR: In this article, an automated active learning approach, AutoML-GA, was proposed for surrogate-based optimization of internal combustion engines, where a Bayesian optimization technique was used to find the best machine learning hyperparameters based on an initial dataset obtained from a small number of computational fluid dynamics simulations.
Abstract: In recent years, the use of machine learning-based surrogate models for computational fluid dynamics (CFD) simulations has emerged as a promising technique for reducing the computational cost associated with engine design optimization. However, such methods still suffer from drawbacks. One main disadvantage is that the default machine learning (ML) hyperparameters are often severely suboptimal for a given problem. This has often been addressed by manually trying out different hyperparameter settings, but this solution is ineffective in a high-dimensional hyperparameter space. Besides this problem, the amount of data needed for training is also not known a priori. In response to these issues, the present work describes and validates an automated active learning approach, AutoML-GA, for surrogate-based optimization of internal combustion engines. In this approach, a Bayesian optimization technique is used to find the best machine learning hyperparameters based on an initial dataset obtained from a small number of CFD simulations. Subsequently, a genetic algorithm is employed to locate the design optimum on the ML surrogate surface. In the vicinity of the design optimum, the solution is refined by repeatedly running CFD simulations at the projected optimum and adding the newly obtained data to the training dataset. It is demonstrated that AutoML-GA leads to a better optimum with a lower number of CFD simulations, compared to the use of default hyperparameters. The proposed framework offers the advantage of being a more hands-off approach that can be readily utilized by researchers and engineers in industry who do not have extensive machine learning expertise.
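
A minimal sketch of the loop described above, under stated assumptions: run_cfd_simulation is a hypothetical placeholder for the real solver, plain random search stands in for the paper's Bayesian hyperparameter optimizer, scikit-learn's Gaussian process serves as the ML surrogate, and the genetic algorithm is a few-line toy. None of these names or modeling choices come from the paper.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)

    def run_cfd_simulation(x):
        # Hypothetical placeholder: a cheap analytic function instead of CFD.
        return float(np.sum((x - 0.3) ** 2))

    def tune_hyperparameters(X, y, n_trials=20):
        # Stand-in for the paper's Bayesian hyperparameter optimization:
        # random search over the kernel length scale, scored by CV error.
        best_ls, best_score = 1.0, -np.inf
        for ls in rng.uniform(0.05, 2.0, n_trials):
            gp = GaussianProcessRegressor(kernel=RBF(length_scale=ls),
                                          optimizer=None)
            score = cross_val_score(gp, X, y, cv=3,
                                    scoring="neg_mean_squared_error").mean()
            if score > best_score:
                best_ls, best_score = ls, score
        return best_ls

    def ga_minimize(predict, dim, pop=40, gens=30):
        # Toy genetic algorithm locating the optimum on the surrogate surface.
        P = rng.uniform(0, 1, (pop, dim))
        for _ in range(gens):
            elite = P[np.argsort(predict(P))[: pop // 2]]        # selection
            kids = elite[rng.integers(0, len(elite), pop - len(elite))]
            kids = kids + rng.normal(0, 0.05, kids.shape)        # mutation
            P = np.clip(np.vstack([elite, kids]), 0, 1)
        return P[np.argmin(predict(P))]

    dim = 3
    X = rng.uniform(0, 1, (10, dim))                 # small initial CFD dataset
    y = np.array([run_cfd_simulation(x) for x in X])
    for _ in range(5):                               # active learning loop
        ls = tune_hyperparameters(X, y)
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=ls),
                                      optimizer=None).fit(X, y)
        x_opt = ga_minimize(lambda P: gp.predict(P), dim)
        y_opt = run_cfd_simulation(x_opt)            # refine near the optimum
        X, y = np.vstack([X, x_opt]), np.append(y, y_opt)
    print("best design:", X[np.argmin(y)], "objective:", y.min())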

Journal ArticleDOI
TL;DR: Parsl as discussed by the authors is a parallel programming library for Python that aims to make it easy to specify parallelism in programs and to realize that parallelism on arbitrary parallel and distributed computing systems.
Abstract: Parsl is a parallel programming library for Python that aims to make it easy to specify parallelism in programs and to realize that parallelism on arbitrary parallel and distributed computing systems. Parsl relies on developers annotating Python functions (wrapping either Python or external applications) to indicate that these functions may be executed concurrently. Developers can then link together functions via the exchange of data. Parsl establishes a dynamic dependency graph and sends tasks for execution on connected resources when dependencies are resolved. Parsl's runtime system enables different compute resources to be used, from laptops to supercomputers, without modification to the Parsl program.
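
To make the idiom concrete, here is a minimal sketch using Parsl's documented python_app decorator and its local-threads test configuration; the simulate and summarize functions are invented for illustration, and a real deployment would swap in a cluster or cloud configuration.

    import parsl
    from parsl import python_app
    from parsl.configs.local_threads import config  # laptop-scale test config

    parsl.load(config)

    @python_app
    def simulate(seed):
        # Annotated function: invocations may execute concurrently.
        import random
        random.seed(seed)
        return random.random()

    @python_app
    def summarize(*values):
        return sum(values) / len(values)

    # Passing futures as arguments links tasks into a dependency graph;
    # summarize() runs only after every simulate() task has finished.
    futures = [simulate(i) for i in range(4)]
    print(summarize(*futures).result())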

Posted Content
TL;DR: In this paper, an automated active learning approach for surrogate-based optimization of internal combustion engines, AutoML-GA, is presented, where a Bayesian optimization technique is used to find the best machine learning hyperparameters based on an initial dataset obtained from a small number of CFD simulations, and a genetic algorithm is employed to locate the design optimum on the surrogate surface trained with the optimal hyperparameters.
Abstract: In recent years, the use of machine learning techniques as surrogate models for computational fluid dynamics (CFD) simulations has emerged as a promising method for reducing the computational cost associated with engine design optimization. However, such methods still suffer from drawbacks. One main disadvantage of such methods is that the default machine learning hyperparameters are often severely suboptimal for a given problem. This has often been addressed by manually trying out different hyperparameter settings, but this solution is ineffective in a high-dimensional hyperparameter space. Besides this problem, the amount of data needed for training is also not known a priori. In response to these issues, this work describes and validates an automated active learning approach for surrogate-based optimization of internal combustion engines, AutoML-GA. In this approach, a Bayesian optimization technique is used to find the best machine learning hyperparameters based on an initial dataset obtained from a small number of CFD simulations. Subsequently, a genetic algorithm is employed to locate the design optimum on the surrogate surface trained with the optimal hyperparameters. In the vicinity of the design optimum, the solution is refined by repeatedly running CFD simulations at the projected optimum and adding the newly obtained data to the training dataset. It is shown that this approach leads to a better optimum with a lower number of CFD simulations, compared to the use of default hyperparameters. The developed approach offers the advantage of being a more hands-off approach that can be easily applied by researchers and engineers in industry who do not have a machine learning background.

Posted Content
TL;DR: In this paper, the authors compare the AIMES middleware with the Swift workflow scripting language and runtime, integrate the two systems for distributed execution of Swift workflows on Pilot-Jobs managed by the middleware, and use this insight to execute a multi-stage workflow across five production-grade resources.
Abstract: Resource selection and task placement for distributed execution poses conceptual and implementation difficulties. Although resource selection and task placement are at the core of many tools and workflow systems, the methods are ad hoc rather than being based on models. Consequently, partial and non-interoperable implementations proliferate. We address both the conceptual and implementation difficulties by experimentally characterizing diverse modalities of resource selection and task placement. We compare the architectures and capabilities of two systems: the AIMES middleware and Swift workflow scripting language and runtime. We integrate these systems to enable the distributed execution of Swift workflows on Pilot-Jobs managed by the AIMES middleware. Our experiments characterize and compare alternative execution strategies by measuring the time to completion of heterogeneous uncoupled workloads executed at diverse scale and on multiple resources. We measure the adverse effects of pilot fragmentation and early binding of tasks to resources and the benefits of backfill scheduling across pilots on multiple resources. We then use this insight to execute a multi-stage workflow across five production-grade resources. We discuss the importance and implications for other tools and workflow systems.
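
As a toy illustration (not code from the paper) of two placement modalities the experiments characterize, the sketch below contrasts early binding, which fixes tasks to pilots up front, with backfill-style late binding, where whichever pilot frees up first pulls the next task; the task runtimes and names are invented.

    import heapq

    tasks = [9, 1, 1, 1, 1, 1]   # invented task runtimes (arbitrary units)
    n_pilots = 2

    def early_binding(tasks, n_pilots):
        # Tasks are bound to pilots before execution (round-robin here),
        # so one pilot can run long while the others sit idle.
        finish = [0.0] * n_pilots
        for i, t in enumerate(tasks):
            finish[i % n_pilots] += t
        return max(finish)

    def backfill(tasks, n_pilots):
        # Late binding: the pilot that frees up first takes the next task.
        finish = [0.0] * n_pilots
        heapq.heapify(finish)
        for t in tasks:
            heapq.heappush(finish, heapq.heappop(finish) + t)
        return max(finish)

    print("early binding makespan:", early_binding(tasks, n_pilots))  # 11.0
    print("backfill makespan:", backfill(tasks, n_pilots))            # 9.0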