
Showing papers by "Michael Wilde" published in 2021


Journal ArticleDOI
TL;DR: AutoML-GA, an automated active learning approach for surrogate-based optimization of internal combustion engines, uses a Bayesian optimization technique to find the best machine learning hyperparameters based on an initial dataset obtained from a small number of CFD simulations.
Abstract: In recent years, the use of machine learning-based surrogate models for computational fluid dynamics (CFD) simulations has emerged as a promising technique for reducing the computational cost associated with engine design optimization.

11 citations


DOI
16 Mar 2021
TL;DR: The Workflows Community Summit 2021 as mentioned in this paper provided a view of the state of the art and identified crucial research challenges in the workflow community, which were translated into 6 broad themes for the summit, each of them being the object of a focused discussion.
Abstract: Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure. Due to the popularity of workflows, workflow management systems (WMSs) have been developed to provide abstractions for creating and executing workflows conveniently, efficiently, and portably. While these efforts are all worthwhile, there are now hundreds of independent WMSs, many of which are moribund. As a result, the WMS landscape is segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. Consequently, many teams, small and large, still elect to build their own custom workflow solution rather than adopt, or build upon, existing WMSs. This current state of the WMS landscape negatively impacts workflow users, developers, and researchers. The "Workflows Community Summit" was held online on January 13, 2021. The overarching goal of the summit was to develop a view of the state of the art and identify crucial research challenges in the workflow community. Prior to the summit, a survey sent to stakeholders in the workflow community (including both developers of WMSs and users of workflows) helped to identify key challenges in this community, which were translated into 6 broad themes for the summit, each of them being the object of a focused discussion led by a volunteer member of the community. This report documents and organizes the wealth of information provided by the participants before, during, and after the summit.

8 citations


Posted ContentDOI
TL;DR: Workflows are increasingly composed of tasks that perform computations such as short machine learning inference, multi-node simulations, and long-running machine learning model training, and thus increasingly rely on heterogeneous architectures that include not only CPUs but also GPUs and accelerators.
Abstract: Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms. Workflows will play a crucial role in the data-oriented and post-Moore's computing landscape as they democratize the application of cutting-edge research techniques, computationally intensive methods, and use of new computing platforms. As workflows continue to be adopted by scientific projects and user communities, they are becoming more complex. Workflows are increasingly composed of tasks that perform computations such as short machine learning inference, multi-node simulations, and long-running machine learning model training, and thus increasingly rely on heterogeneous architectures that include not only CPUs but also GPUs and accelerators. The workflow management system (WMS) technology landscape is currently segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. Another fundamental problem is that there are conflicting theoretical bases and abstractions for a WMS. Systems that use the same underlying abstractions can likely be translated between one another, which is not the case for systems that use different abstractions.

4 citations


Posted ContentDOI
TL;DR: The Workflows Community Summit 2021 as discussed by the authors provided a view of the state of the art and identified crucial research challenges in the workflow community, which were translated into 6 broad themes for the summit, each of them being the object of a focused discussion.
Abstract: Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure. Due to the popularity of workflows, workflow management systems (WMSs) have been developed to provide abstractions for creating and executing workflows conveniently, efficiently, and portably. While these efforts are all worthwhile, there are now hundreds of independent WMSs, many of which are moribund. As a result, the WMS landscape is segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. Consequently, many teams, small and large, still elect to build their own custom workflow solution rather than adopt, or build upon, existing WMSs. This current state of the WMS landscape negatively impacts workflow users, developers, and researchers. The "Workflows Community Summit" was held online on January 13, 2021. The overarching goal of the summit was to develop a view of the state of the art and identify crucial research challenges in the workflow community. Prior to the summit, a survey sent to stakeholders in the workflow community (including both developers of WMSs and users of workflows) helped to identify key challenges in this community, which were translated into 6 broad themes for the summit, each of them being the object of a focused discussion led by a volunteer member of the community. This report documents and organizes the wealth of information provided by the participants before, during, and after the summit.

2 citations


Posted Content
TL;DR: In this article, an automated active learning approach, AutoML-GA, was proposed for surrogate-based optimization of internal combustion engines, where a Bayesian optimization technique was used to find the best machine learning hyperparameters based on an initial dataset obtained from a small number of computational fluid dynamics simulations.
Abstract: In recent years, the use of machine learning-based surrogate models for computational fluid dynamics (CFD) simulations has emerged as a promising technique for reducing the computational cost associated with engine design optimization. However, such methods still suffer from drawbacks. One main disadvantage is that the default machine learning (ML) hyperparameters are often severely suboptimal for a given problem. This has often been addressed by manually trying out different hyperparameter settings, but this solution is ineffective in a high-dimensional hyperparameter space. Besides this problem, the amount of data needed for training is also not known a priori. In response to these issues, the present work describes and validates an automated active learning approach, AutoML-GA, for surrogate-based optimization of internal combustion engines. In this approach, a Bayesian optimization technique is used to find the best machine learning hyperparameters based on an initial dataset obtained from a small number of CFD simulations. Subsequently, a genetic algorithm is employed to locate the design optimum on the ML surrogate surface. In the vicinity of the design optimum, the solution is refined by repeatedly running CFD simulations at the projected optimum and adding the newly obtained data to the training dataset. It is demonstrated that AutoML-GA leads to a better optimum with a lower number of CFD simulations, compared to the use of default hyperparameters. The proposed framework offers the advantage of being a more hands-off approach that can be readily utilized by researchers and engineers in industry who do not have extensive machine learning expertise.
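
A minimal sketch of the loop described above, under stated assumptions: run_cfd_simulation is a hypothetical placeholder for the real solver, plain random search stands in for the paper's Bayesian hyperparameter optimizer, scikit-learn's Gaussian process serves as the ML surrogate, and the genetic algorithm is a few-line toy. None of these names or modeling choices come from the paper.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)

    def run_cfd_simulation(x):
        # Hypothetical placeholder: a cheap analytic function instead of CFD.
        return float(np.sum((x - 0.3) ** 2))

    def tune_hyperparameters(X, y, n_trials=20):
        # Stand-in for the paper's Bayesian hyperparameter optimization:
        # random search over the kernel length scale, scored by CV error.
        best_ls, best_score = 1.0, -np.inf
        for ls in rng.uniform(0.05, 2.0, n_trials):
            gp = GaussianProcessRegressor(kernel=RBF(length_scale=ls),
                                          optimizer=None)
            score = cross_val_score(gp, X, y, cv=3,
                                    scoring="neg_mean_squared_error").mean()
            if score > best_score:
                best_ls, best_score = ls, score
        return best_ls

    def ga_minimize(predict, dim, pop=40, gens=30):
        # Toy genetic algorithm locating the optimum on the surrogate surface.
        P = rng.uniform(0, 1, (pop, dim))
        for _ in range(gens):
            elite = P[np.argsort(predict(P))[: pop // 2]]        # selection
            kids = elite[rng.integers(0, len(elite), pop - len(elite))]
            kids = kids + rng.normal(0, 0.05, kids.shape)        # mutation
            P = np.clip(np.vstack([elite, kids]), 0, 1)
        return P[np.argmin(predict(P))]

    dim = 3
    X = rng.uniform(0, 1, (10, dim))                 # small initial CFD dataset
    y = np.array([run_cfd_simulation(x) for x in X])
    for _ in range(5):                               # active learning loop
        ls = tune_hyperparameters(X, y)
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=ls),
                                      optimizer=None).fit(X, y)
        x_opt = ga_minimize(lambda P: gp.predict(P), dim)
        y_opt = run_cfd_simulation(x_opt)            # refine near the optimum
        X, y = np.vstack([X, x_opt]), np.append(y, y_opt)
    print("best design:", X[np.argmin(y)], "objective:", y.min())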

Journal ArticleDOI
TL;DR: Parsl as discussed by the authors is a parallel programming library for Python that aims to make it easy to specify parallelism in programs and to realize that parallelism on arbitrary parallel and distributed computing systems.
Abstract: Parsl is a parallel programming library for Python that aims to make it easy to specify parallelism in programs and to realize that parallelism on arbitrary parallel and distributed computing systems. Parsl relies on developers annotating Python functions (wrapping either Python or external applications) to indicate that these functions may be executed concurrently. Developers can then link together functions via the exchange of data. Parsl establishes a dynamic dependency graph and sends tasks for execution on connected resources when dependencies are resolved. Parsl's runtime system enables different compute resources to be used, from laptops to supercomputers, without modification to the Parsl program.
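
To make the idiom concrete, here is a minimal sketch using Parsl's documented python_app decorator and its local-threads test configuration; the simulate and summarize functions are invented for illustration, and a real deployment would swap in a cluster or cloud configuration.

    import parsl
    from parsl import python_app
    from parsl.configs.local_threads import config  # laptop-scale test config

    parsl.load(config)

    @python_app
    def simulate(seed):
        # Annotated function: invocations may execute concurrently.
        import random
        random.seed(seed)
        return random.random()

    @python_app
    def summarize(*values):
        return sum(values) / len(values)

    # Passing futures as arguments links tasks into a dependency graph;
    # summarize() runs only after every simulate() task has finished.
    futures = [simulate(i) for i in range(4)]
    print(summarize(*futures).result())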

Posted Content
TL;DR: In this paper, an automated active learning approach for surrogate-based optimization of internal combustion engines, AutoML-GA, is presented, where a Bayesian optimization technique is used to find the best machine learning hyperparameters based on an initial dataset obtained from a small number of CFD simulations, and a genetic algorithm is employed to locate the design optimum on the surrogate surface trained with the optimal hyperparameters.
Abstract: In recent years, the use of machine learning techniques as surrogate models for computational fluid dynamics (CFD) simulations has emerged as a promising method for reducing the computational cost associated with engine design optimization. However, such methods still suffer from drawbacks. One main disadvantage of such methods is that the default machine learning hyperparameters are often severely suboptimal for a given problem. This has often been addressed by manually trying out different hyperparameter settings, but this solution is ineffective in a high-dimensional hyperparameter space. Besides this problem, the amount of data needed for training is also not known a priori. In response to these issues, this work describes and validates an automated active learning approach for surrogate-based optimization of internal combustion engines, AutoML-GA. In this approach, a Bayesian optimization technique is used to find the best machine learning hyperparameters based on an initial dataset obtained from a small number of CFD simulations. Subsequently, a genetic algorithm is employed to locate the design optimum on the surrogate surface trained with the optimal hyperparameters. In the vicinity of the design optimum, the solution is refined by repeatedly running CFD simulations at the projected optimum and adding the newly obtained data to the training dataset. It is shown that this approach leads to a better optimum with a lower number of CFD simulations, compared to the use of default hyperparameters. The developed approach offers the advantage of being a more hands-off approach that can be easily applied by researchers and engineers in industry who do not have a machine learning background.

Posted Content
TL;DR: In this paper, the authors compare the AIMES middleware with the Swift workflow scripting language and runtime, integrate the two systems for distributed execution of Swift workflows on Pilot-Jobs managed by the middleware, and use this insight to execute a multi-stage workflow across five production-grade resources.
Abstract: Resource selection and task placement for distributed execution poses conceptual and implementation difficulties. Although resource selection and task placement are at the core of many tools and workflow systems, the methods are ad hoc rather than being based on models. Consequently, partial and non-interoperable implementations proliferate. We address both the conceptual and implementation difficulties by experimentally characterizing diverse modalities of resource selection and task placement. We compare the architectures and capabilities of two systems: the AIMES middleware and Swift workflow scripting language and runtime. We integrate these systems to enable the distributed execution of Swift workflows on Pilot-Jobs managed by the AIMES middleware. Our experiments characterize and compare alternative execution strategies by measuring the time to completion of heterogeneous uncoupled workloads executed at diverse scale and on multiple resources. We measure the adverse effects of pilot fragmentation and early binding of tasks to resources and the benefits of backfill scheduling across pilots on multiple resources. We then use this insight to execute a multi-stage workflow across five production-grade resources. We discuss the importance and implications for other tools and workflow systems.
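
As a toy illustration (not code from the paper) of two placement modalities the experiments characterize, the sketch below contrasts early binding, which fixes tasks to pilots up front, with backfill-style late binding, where whichever pilot frees up first pulls the next task; the task runtimes and names are invented.

    import heapq

    tasks = [9, 1, 1, 1, 1, 1]   # invented task runtimes (arbitrary units)
    n_pilots = 2

    def early_binding(tasks, n_pilots):
        # Tasks are bound to pilots before execution (round-robin here),
        # so one pilot can run long while the others sit idle.
        finish = [0.0] * n_pilots
        for i, t in enumerate(tasks):
            finish[i % n_pilots] += t
        return max(finish)

    def backfill(tasks, n_pilots):
        # Late binding: the pilot that frees up first takes the next task.
        finish = [0.0] * n_pilots
        heapq.heapify(finish)
        for t in tasks:
            heapq.heappush(finish, heapq.heappop(finish) + t)
        return max(finish)

    print("early binding makespan:", early_binding(tasks, n_pilots))  # 11.0
    print("backfill makespan:", backfill(tasks, n_pilots))            # 9.0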