
Showing papers by Michael Wilde published in 2019


Proceedings Article · DOI
17 Jun 2019
TL;DR: Parsl is a parallel scripting library that augments Python with simple, scalable, and flexible constructs for encoding parallelism. These constructs let Parsl build a dynamic dependency graph of components that it can then execute efficiently on one or many processors.
Abstract: High-level programming languages such as Python are increasingly used to provide intuitive interfaces to libraries written in lower-level languages and for assembling applications from various components. This migration towards orchestration rather than implementation, coupled with the growing need for parallel computing (e.g., due to big data and the end of Moore's law), necessitates rethinking how parallelism is expressed in programs. Here, we present Parsl, a parallel scripting library that augments Python with simple, scalable, and flexible constructs for encoding parallelism. These constructs allow Parsl to construct a dynamic dependency graph of components that it can then execute efficiently on one or many processors. Parsl is designed for scalability, with an extensible set of executors tailored to different use cases, such as low-latency, high-throughput, or extreme-scale execution. We show, via experiments on the Blue Waters supercomputer, that Parsl executors can allow Python scripts to execute components with as little as 5 ms of overhead, scale to more than 250000 workers across more than 8000 nodes, and process upward of 1200 tasks per second. Other Parsl features simplify the construction and execution of composite programs by supporting elastic provisioning and scaling of infrastructure, fault-tolerant execution, and integrated wide-area data management. We show that these capabilities satisfy the needs of many-task, interactive, online, and machine learning applications in fields such as biology, cosmology, and materials science.

117 citations
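The abstract's core mechanism, in which annotated Python calls return immediately and passing one call's result into another builds a dependency graph at run time, can be illustrated without Parsl itself. The sketch below is a standard-library analogue, not Parsl's API: the `app` decorator, the shared thread pool, and the example functions `double` and `add` are hypothetical names chosen for illustration.

```python
import functools
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stand-in for an executor; Parsl's real executors differ.
_pool = ThreadPoolExecutor(max_workers=4)


def _resolve(value):
    """Return the concrete value behind a Future, or the value itself."""
    return value.result() if isinstance(value, Future) else value


def app(fn):
    """Hypothetical decorator: calling fn returns a Future instead of a value.

    Any Future passed as an argument is treated as a dependency; the task
    waits for it before running fn, so chained calls form a dependency graph.
    """
    @functools.wraps(fn)
    def submit(*args, **kwargs):
        def run():
            real_args = [_resolve(a) for a in args]
            real_kwargs = {k: _resolve(v) for k, v in kwargs.items()}
            return fn(*real_args, **real_kwargs)
        return _pool.submit(run)
    return submit


@app
def double(x):
    return 2 * x


@app
def add(a, b):
    return a + b


# The two double() calls are independent and may run concurrently;
# add() depends on both, so it runs only after they finish.
total = add(double(3), double(4))
print(total.result())  # 14
```

Blocking a worker on its inputs is the simplest possible design here. It cannot deadlock because CPython's `ThreadPoolExecutor` hands out tasks in submission order and a task's inputs are always submitted before it, but a fuller runtime would defer launching a task until its inputs are ready instead of tying up a worker.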




01 Jan 2019
TL;DR: Parsl (Parallel Scripting Library), a Python library for programming and executing data-oriented workflows in parallel, addresses the problems of orchestrating and managing applications and data and of customizing execution for specific environments.
Abstract: Computational and data-driven research practices have significantly changed over the past decade to encompass new analysis models such as interactive and online computing. Science gateways are simultaneously evolving to support this transforming landscape with the aim to enable transparent, scalable execution of a variety of analyses. Science gateways often rely on workflow management systems to represent and execute analyses efficiently and reliably. However, integrating workflow systems in science gateways can be challenging, especially as analyses become more interactive and dynamic, requiring sophisticated orchestration and management of applications and data, and customization for specific execution environments. Parsl (Parallel Scripting Library), a Python library for programming and executing data-oriented workflows in parallel, addresses these problems. Developers simply annotate a Python script with Parsl directives wrapping either Python functions or calls to external applications. Parsl manages the execution of the script on clusters, clouds, grids, and other resources; orchestrates required data movement; and manages the execution of Python functions and external applications in parallel. The Parsl library can be easily integrated into Python-based gateways, allowing for simple management and scaling of workflows.
Keywords: Parsl, Parallel scripting, Python, Scientific Workflows

22 citations
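The abstract notes that Parsl directives can wrap either Python functions or calls to external applications. The stdlib-only sketch below shows one way a wrapper for external commands could work: the decorated function only builds a command string, and the wrapper runs it in a worker thread and returns a Future. It is loosely inspired by the idea behind Parsl's `bash_app` but is not its implementation; the `shell_app` decorator, its return type (a `subprocess.CompletedProcess`), and the `greet` example are assumptions for illustration.

```python
import functools
import shlex
import subprocess
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical worker pool standing in for a real execution resource.
_pool = ThreadPoolExecutor(max_workers=4)


def shell_app(fn):
    """Hypothetical decorator: fn returns a command line; the wrapper runs it.

    Returns a Future resolving to subprocess.CompletedProcess. Future
    arguments are resolved first, so a shell step can consume the results
    of earlier steps. A nonzero exit status raises CalledProcessError
    when .result() is called.
    """
    @functools.wraps(fn)
    def submit(*args, **kwargs):
        def run():
            real_args = [a.result() if isinstance(a, Future) else a for a in args]
            real_kwargs = {k: (v.result() if isinstance(v, Future) else v)
                           for k, v in kwargs.items()}
            cmd = fn(*real_args, **real_kwargs)
            # shell=True matches the "command string" style; quote untrusted input.
            return subprocess.run(cmd, shell=True, capture_output=True,
                                  text=True, check=True)
        return _pool.submit(run)
    return submit


@shell_app
def greet(name):
    # Quote the argument so spaces or shell metacharacters stay literal.
    return f"echo hello {shlex.quote(name)}"


result = greet("parsl").result()
print(result.stdout.strip())  # hello parsl
```

Keeping the command construction in ordinary Python means arguments, quoting, and dependencies are handled the same way as for pure-Python tasks; only the final step differs.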


Proceedings Article · DOI
28 Jul 2019
TL;DR: Parsl makes it straightforward for developers to implement parallelism in Python by annotating functions that can be executed asynchronously and in parallel, and to scale analyses from a laptop to thousands of nodes on a supercomputer or distributed system.
Abstract: Python is increasingly the lingua franca of scientific computing. It is used as a higher-level language to wrap lower-level libraries and to compose scripts from various independent components. However, scaling and moving Python programs from laptops to supercomputers remains a challenge. Here we present Parsl, a parallel scripting library for Python. Parsl makes it straightforward for developers to implement parallelism in Python by annotating functions that can be executed asynchronously and in parallel, and to scale analyses from a laptop to thousands of nodes on a supercomputer or distributed system. We examine how Parsl is implemented, focusing on syntax and usage. We describe two scientific use cases in which Parsl's intuitive and scalable parallelism is used.

13 citations
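The claim that annotated code can scale "from a laptop to thousands of nodes" rests on keeping the function body independent of where it runs, so that only the executor binding changes. The toy sketch below illustrates that separation with two standard-library thread pools of different sizes; it says nothing about Parsl's actual executors. The `app_on` decorator factory, the `simulate` workload (a 0.1 s sleep), and `run_sweep` are hypothetical names for illustration.

```python
import functools
import time
from concurrent.futures import Executor, ThreadPoolExecutor


def app_on(executor: Executor):
    """Hypothetical decorator factory: bind a function to a given executor.

    The function body never mentions the executor, so moving work to a
    larger pool changes only this binding, not the application code.
    """
    def decorate(fn):
        @functools.wraps(fn)
        def submit(*args):
            return executor.submit(fn, *args)
        return submit
    return decorate


def simulate(seed):
    time.sleep(0.1)  # stand-in for a real computation
    return seed * seed


def run_sweep(executor, n):
    task = app_on(executor)(simulate)
    futures = [task(i) for i in range(n)]    # fan out: each call returns at once
    return sum(f.result() for f in futures)  # fan in: wait for every result


with ThreadPoolExecutor(max_workers=1) as small, \
        ThreadPoolExecutor(max_workers=8) as large:
    for label, ex in (("1 worker", small), ("8 workers", large)):
        start = time.perf_counter()
        total = run_sweep(ex, 8)  # 140 on either pool
        elapsed = time.perf_counter() - start
        print(f"{label}: total={total}, {elapsed:.2f} s")
```

Both runs compute the same result; with eight workers the eight independent sleeps overlap, so the sweep should finish in roughly one task's duration rather than eight.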