
Showing papers on "Fortran published in 2016"


ReportDOI
01 Apr 2016
TL;DR: This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers.
Abstract: This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear and nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, C++, Python, and MATLAB (sequential). PETSc provides many of the mechanisms needed within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background, experience programming in C, C++ or Fortran, and experience using a debugger such as gdb or dbx, it may require a significant amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates may make the efficient implementation of many application codes simpler than "rolling them yourself." For many tasks a package such as MATLAB is often the best tool; PETSc is not intended for the classes of problems for which effective MATLAB code can be written. PETSc also has a MATLAB interface, so portions of your code can be written in MATLAB to "try out" the PETSc solvers; the resulting code will not be scalable, however, because MATLAB is currently inherently not scalable. PETSc should not be used to attempt to provide a "parallel linear solver" in an otherwise sequential code: certainly not all parts of a previously sequential code need to be parallelized, but the matrix-generation portion must be parallelized to expect any kind of reasonable performance. Do not expect to generate your matrix sequentially and then "use PETSc" to solve the linear system in parallel. Since PETSc is under continued development, small changes in usage and calling sequences of routines will occur. PETSc is supported; see the web site http://www.mcs.anl.gov/petsc for information on contacting support. A list of publications and web sites that feature work involving PETSc may be found at http://www.mcs.anl.gov/petsc/publications. We welcome any reports of corrections for this document.
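To make the entry concrete, here is a minimal sketch of a PETSc linear solve driven from Fortran, assuming the modern PETSc Fortran module bindings (a generic illustration of mine, not an example from the manual): a 1D Laplacian is assembled in parallel and solved with a runtime-configurable KSP solver.

```fortran
program petsc_demo
#include <petsc/finclude/petscksp.h>
  use petscksp
  implicit none
  Mat            A
  Vec            x, b
  KSP            ksp
  PetscErrorCode ierr
  PetscInt       i, istart, iend, n
  PetscScalar    v

  n = 100
  call PetscInitialize(PETSC_NULL_CHARACTER, ierr)

  ! Parallel assembly: each rank fills only the matrix rows it owns.
  call MatCreate(PETSC_COMM_WORLD, A, ierr)
  call MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n, ierr)
  call MatSetFromOptions(A, ierr)
  call MatSetUp(A, ierr)
  call MatGetOwnershipRange(A, istart, iend, ierr)
  do i = istart, iend - 1
    v = 2.0;  call MatSetValue(A, i, i, v, INSERT_VALUES, ierr)
    v = -1.0
    if (i > 0)     call MatSetValue(A, i, i - 1, v, INSERT_VALUES, ierr)
    if (i < n - 1) call MatSetValue(A, i, i + 1, v, INSERT_VALUES, ierr)
  end do
  call MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY, ierr)
  call MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY, ierr)

  call MatCreateVecs(A, x, b, ierr)
  v = 1.0; call VecSet(b, v, ierr)

  ! Solver type and tolerances are chosen at run time, e.g. -ksp_type cg.
  call KSPCreate(PETSC_COMM_WORLD, ksp, ierr)
  call KSPSetOperators(ksp, A, A, ierr)
  call KSPSetFromOptions(ksp, ierr)
  call KSPSolve(ksp, b, x, ierr)

  call KSPDestroy(ksp, ierr); call VecDestroy(x, ierr)
  call VecDestroy(b, ierr);   call MatDestroy(A, ierr)
  call PetscFinalize(ierr)
end program petsc_demo
```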

430 citations


Journal ArticleDOI
TL;DR: The iEBE-VISHNU code package, as discussed by the authors, performs event-by-event simulations for relativistic heavy-ion collisions using a hybrid approach based on (2+1)-dimensional viscous hydrodynamics coupled to a hadronic cascade model.

316 citations


Journal ArticleDOI
TL;DR: This paper discusses the modular approach to atomistic machine learning through the development of the open-source Atomistic Machine-learning Package (Amp), which allows for representations of both the total and atom-centered potential energy surface, in both periodic and non-periodic systems.

302 citations


Journal ArticleDOI
TL;DR: The SEISCOPE optimization toolbox is a set of FORTRAN 90 routines which implement first-order and second-order methods for the solution of large-scale nonlinear optimization problems, such as traveltime tomography, least-squares migration, or full-waveform inversion.
Abstract: The SEISCOPE optimization toolbox is a set of FORTRAN 90 routines, which implement first-order methods (steepest-descent and nonlinear conjugate gradient) and second-order methods (l-BFGS and truncated Newton), for the solution of large-scale nonlinear optimization problems. An efficient line-search strategy ensures the robustness of these implementations. The routines are proposed as black boxes easy to interface with any computational code, where such large-scale minimization problems have to be solved. Traveltime tomography, least-squares migration, or full-waveform inversion are examples of such problems in the context of geophysics. Integrating the toolbox for solving this class of problems presents two advantages. First, it helps to separate the routines depending on the physics of the problem from the ones related to the minimization itself, thanks to the reverse communication protocol. This enhances flexibility in code development and maintenance. Second, it allows us to switch easily between the different optimization methods.
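The reverse communication protocol mentioned above works by returning control to the calling program whenever the optimizer needs a new function or gradient value, so the toolbox never calls user code directly. Below is a toy, self-contained illustration of the protocol built around a fixed-step gradient-descent "optimizer"; the routine names, flag values, and trivial update rule are mine, not the toolbox's actual API.

```fortran
! Toy reverse-communication optimizer (hypothetical names, not the
! SEISCOPE API): the optimizer asks the caller for evaluations via a flag.
module rc_optim
  implicit none
  real(8), parameter :: step = 1.0d-1, gtol = 1.0d-6
contains
  subroutine descent_step(n, x, fcost, grad, flag)
    integer, intent(in)         :: n
    real(8), intent(inout)      :: x(n)
    real(8), intent(in)         :: fcost, grad(n)
    character(4), intent(inout) :: flag
    if (flag == 'INIT') then
      flag = 'GRAD'                      ! request f and grad at initial x
    else if (sqrt(sum(grad**2)) < gtol) then
      flag = 'CONV'                      ! converged: caller leaves the loop
    else
      x = x - step*grad                  ! update the iterate
      flag = 'GRAD'                      ! request evaluation at the new x
    end if
  end subroutine descent_step
end module rc_optim

program demo
  use rc_optim
  implicit none
  real(8)      :: x(2) = [2.0d0, -1.5d0], f, g(2)
  character(4) :: flag = 'INIT'
  do
    call descent_step(2, x, f, g, flag)
    if (flag == 'CONV') exit
    f = sum(x**2)      ! the "physics" misfit code lives on the caller's side
    g = 2.0d0*x
  end do
  print '(a,2f10.6)', ' minimum at x =', x
end program demo
```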

133 citations


Journal ArticleDOI
TL;DR: HOS-ocean is an efficient High-Order Spectral code developed to solve the deterministic propagation of nonlinear wavefields in open ocean; it is released as open source, developed and distributed under the terms of the GNU General Public License (GPLv3).

98 citations


Journal ArticleDOI
TL;DR: A new release of the QCDLoop library based on a modern object-oriented framework is presented, with new features such as the extension to complex masses, the possibility to perform computations in double and quadruple precision simultaneously, and useful caching mechanisms to improve computational speed.

68 citations


Proceedings ArticleDOI
02 Jun 2016
TL;DR: The benefits of verified lifting are demonstrated by first automatically summarizing Fortran source code into a high-level predicate language, and subsequently translating the lifted summaries into Halide, with the translated code achieving median performance speedups of 4.1X and up to 24X for non-trivial stencils as compared to the original implementation.
Abstract: This paper demonstrates a novel combination of program synthesis and verification to lift stencil computations from low-level Fortran code to a high-level summary expressed using a predicate language. The technique is sound and mostly automated, and leverages counter-example guided inductive synthesis (CEGIS) to find provably correct translations. Lifting existing code to a high-performance description language has a number of benefits, including maintainability and performance portability. For example, our experiments show that the lifted summaries can enable domain specific compilers to do a better job of parallelization as compared to an off-the-shelf compiler working on the original code, and can even support fully automatic migration to hardware accelerators such as GPUs. We have implemented verified lifting in a system called STNG and have evaluated it using microbenchmarks, mini-apps, and real-world applications. We demonstrate the benefits of verified lifting by first automatically summarizing Fortran source code into a high-level predicate language, and subsequently translating the lifted summaries into Halide, with the translated code achieving median performance speedups of 4.1X and up to 24X for non-trivial stencils as compared to the original implementation.
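For context, the stencil loops that such verified-lifting tools target have the shape of the following illustrative Fortran fragment (my example, not one of the paper's benchmarks): each output point is a fixed arithmetic combination of a small neighborhood of inputs, which is exactly the structure a predicate-language summary or a Halide pipeline can express.

```fortran
! Illustrative 1D heat-equation stencil (not from the STNG benchmark suite).
subroutine heat_step(n, u, unew, alpha)
  implicit none
  integer, intent(in)  :: n
  real(8), intent(in)  :: u(n), alpha
  real(8), intent(out) :: unew(n)
  integer :: i
  unew(1) = u(1); unew(n) = u(n)          ! fixed boundary values
  do i = 2, n - 1
    ! Each output depends only on a fixed neighborhood of inputs,
    ! which is what makes the loop liftable to a high-level stencil.
    unew(i) = u(i) + alpha*(u(i-1) - 2.0d0*u(i) + u(i+1))
  end do
end subroutine heat_step
```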

64 citations


Proceedings ArticleDOI
10 Jul 2016
TL;DR: Verificarlo as mentioned in this paper is an extension to the LLVM compiler to automatically use Monte Carlo arithmetic in a transparent way for the end-user, and it supports all the major languages including C, C++, and Fortran.
Abstract: Numerical accuracy of floating point computation is a well studied topic which has not made its way to the end-user in scientific computing. Yet, it has become a critical issue with the recent requirements for code modernization to harness new highly parallel hardware and perform higher resolution computation. To democratize numerical accuracy analysis, it is important to propose tools and methodologies to study large use cases in a reliable and automatic way. In this paper, we propose verificarlo, an extension to the LLVM compiler to automatically use Monte Carlo Arithmetic in a transparent way for the end-user. It supports all the major languages including C, C++, and Fortran. Unlike source-to-source approaches, our implementation captures the influence of compiler optimizations on the numerical accuracy. We illustrate how Monte Carlo Arithmetic using the verificarlo tool outperforms the existing approaches on various use cases and is a step toward automatic numerical analysis.
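As a reminder of what is at stake, here is a toy Fortran illustration of catastrophic cancellation, the kind of silent accuracy loss that Monte Carlo Arithmetic is designed to expose (my example, not a verificarlo test case): at single precision the spacing between representable numbers near 1e7 is about 1, so the added 0.5 is lost before the subtraction even happens.

```fortran
! Catastrophic cancellation demo: subtracting nearly equal numbers
! destroys significant digits (illustrative, not a verificarlo case).
program cancellation
  implicit none
  real(4) :: a, b
  real(8) :: ad, bd
  a  = 1.0e7 + 0.5     ! 0.5 is below the rounding unit at this magnitude
  b  = 1.0e7
  ad = 1.0d7 + 0.5d0
  bd = 1.0d7
  print *, 'single precision: ', a - b     ! prints 0.0: digits already lost
  print *, 'double precision: ', ad - bd   ! prints 0.5 as expected
end program cancellation
```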

64 citations


Journal ArticleDOI
TL;DR: New versions of previously published Fortran and C programs for solving the Gross–Pitaevskii equation for a Bose–Einstein condensate with contact interaction in one, two and three spatial dimensions in imaginary and real time are presented, yielding both stationary and non-stationary solutions.

58 citations


Journal ArticleDOI
TL;DR: A pure scientific Python pseudo-spectral DNS code that nearly matches the performance of C++ for thousands of processors and billions of unknowns is described, along with a version optimized through Cython that is found to match the speed of C++.

51 citations


Journal ArticleDOI
TL;DR: An R-matrix Fortran package based on the Lagrange-mesh method is presented to solve coupled-channel problems in nuclear physics; it deals with open and closed channels simultaneously, without the numerical instability associated with closed channels.

01 Jan 2016
TL;DR: SQOPT is a software package for minimizing a convex quadratic function subject to both equality and inequality constraints, and is part of the SNOPT package for large-scale nonlinearly constrained optimization.
Abstract: SQOPT is a software package for minimizing a convex quadratic function subject to both equality and inequality constraints. SQOPT may also be used for linear programming and for finding a feasible point for a set of linear equalities and inequalities. SQOPT uses a two-phase, active-set, reduced-Hessian method. It is most efficient on problems with relatively few degrees of freedom (for example, if only some of the variables appear in the quadratic term, or the number of active constraints and bounds is nearly as large as the number of variables). However, unlike previous versions of SQOPT, there is no limit on the number of degrees of freedom. SQOPT is primarily intended for large linear and quadratic problems with sparse constraint matrices. A quadratic term (1/2)x^T Hx in the objective function is represented by a user subroutine that returns the product Hx for a given vector x. SQOPT uses stable numerical methods throughout and includes a reliable basis package (for maintaining sparse LU factors of the basis matrix), a practical antidegeneracy procedure, scaling, and elastic bounds on any number of constraints and variables. SQOPT is part of the SNOPT package for large-scale nonlinearly constrained optimization. The source code is re-entrant and suitable for any machine with a Fortran compiler. SQOPT may be called from a driver program in Fortran, MATLAB, or C/C++ with the new interface based on the Fortran 2003 standard on Fortran-C interoperability. An f2c translation of SQOPT to the C language is still provided, although this feature will be discontinued in the future (users should migrate to the new C/C++ interface). SQOPT can also be used as a stand-alone package, reading data in the MPS format used by commercial mathematical programming systems.
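The matrix-free convention above means the user never forms H explicitly; they only supply a routine that applies it to a vector. A minimal sketch for H = diag(1, ..., n) follows; note that SQOPT's actual qpHx callback carries additional status and workspace arguments not shown here, so this interface is illustrative only.

```fortran
! Sketch of a user Hessian-product routine for H = diag(1..n).
! SQOPT's real qpHx interface has extra status/workspace arguments.
subroutine my_Hx(nnH, x, Hx)
  implicit none
  integer, intent(in)  :: nnH
  real(8), intent(in)  :: x(nnH)
  real(8), intent(out) :: Hx(nnH)
  integer :: i
  do i = 1, nnH
    Hx(i) = real(i, 8)*x(i)   ! apply H column-free: H is never stored
  end do
end subroutine my_Hx
```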

Journal ArticleDOI
TL;DR: OpenACC, as discussed by the authors, offers a high-level approach based on compiler directives to mark regions of existing C, C++, or Fortran code to run on accelerators; this directly addresses code portability, leaving support of each different accelerator to compilers, but one has to carefully assess the relative costs of portable approaches versus computing efficiency.
Abstract: An increasingly large number of HPC systems rely on heterogeneous architectures combining traditional multi-core CPUs with power efficient accelerators. Designing efficient applications for these systems has been troublesome in the past, as accelerators usually had to be programmed using device-specific programming languages, threatening maintainability, portability, and correctness. Several new programming environments try to tackle this problem. Among them, OpenACC offers a high-level approach based on compiler directives to mark regions of existing C, C++, or Fortran codes to run on accelerators. This approach directly addresses code portability, leaving to compilers the support of each different accelerator, but one has to carefully assess the relative costs of portable approaches versus computing efficiency. In this paper, we address precisely this issue, using as a test-bench a massively parallel lattice Boltzmann algorithm. We first describe our multi-node implementation and optimization of the algorithm, using OpenACC and MPI. We then benchmark the code on a variety of processors, including traditional CPUs and GPUs, and make accurate performance comparisons with other GPU implementations of the same algorithm using CUDA and OpenCL. We also assess the performance impact associated with portable programming, and the actual portability and performance-portability of OpenACC-based applications across several state-of-the-art architectures. Copyright © 2016 John Wiley & Sons, Ltd.
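For readers unfamiliar with the model, OpenACC keeps the original loop nests and adds directives around them; the sketch below shows the general pattern on a generic relaxation kernel (my illustration, not the paper's lattice Boltzmann code). The data region keeps arrays resident on the device across kernel launches, which is usually the key to performance.

```fortran
! Generic OpenACC-annotated relaxation kernel (not the paper's LB code).
subroutine relax(n, a, b)
  implicit none
  integer, intent(in)    :: n
  real(8), intent(in)    :: a(n,n)
  real(8), intent(inout) :: b(n,n)
  integer :: i, j
  !$acc data copyin(a) copy(b)       ! keep arrays resident on the device
  !$acc parallel loop collapse(2)    ! compiler maps the loop nest to the GPU
  do j = 2, n - 1
    do i = 2, n - 1
      b(i,j) = 0.25d0*(a(i-1,j) + a(i+1,j) + a(i,j-1) + a(i,j+1))
    end do
  end do
  !$acc end parallel loop
  !$acc end data
end subroutine relax
```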

Journal ArticleDOI
TL;DR: The Vofi library has been developed to accurately calculate the volume fraction field demarcated by implicitly-defined fluid interfaces in Cartesian grids with cubic cells; it computes the integration limits along two coordinate directions and the local height function, which is the integrand of a double Gauss–Legendre integration with a variable number of nodes.

Journal ArticleDOI
TL;DR: A hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier–Stokes solver Nek5000, is presented; the implementation significantly minimizes modification of the existing CPU code while extending the simulation capability of the code to GPU architectures.
Abstract: We present a hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier–Stokes solver Nek5000. The implementation is based on OpenACC and CUDA Fortran for local parallelization of the compute-intensive matrix–matrix multiplication part, which significantly minimizes the modification of the existing CPU code while extending the simulation capability of the code to GPU architectures. Our discussion includes the GPU results of OpenACC interoperating with CUDA Fortran and the gather–scatter operations with GPUDirect communication. We demonstrate performance of up to 552 Tflops on 16,384 GPUs of the OLCF Cray XK7 Titan.
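CUDA Fortran, used for the compute-intensive parts, allows GPU kernels to be written directly in Fortran syntax; below is a minimal self-contained sketch of the language features involved (a generic example of mine, not Nekbone's tuned matrix-matrix kernels).

```fortran
! Minimal CUDA Fortran sketch: a device kernel in Fortran plus its
! host-side chevron launch (generic, not Nekbone's code).
module kernels
  use cudafor
  implicit none
contains
  attributes(global) subroutine axpy(n, alpha, x, y)
    integer, value :: n
    real(8), value :: alpha
    real(8) :: x(n), y(n)            ! kernel dummies live in device memory
    integer :: i
    i = threadIdx%x + (blockIdx%x - 1)*blockDim%x
    if (i <= n) y(i) = y(i) + alpha*x(i)
  end subroutine axpy
end module kernels

program demo
  use cudafor
  use kernels
  implicit none
  integer, parameter :: n = 1024
  real(8), device :: x_d(n), y_d(n)
  real(8) :: y(n)
  x_d = 1.0d0; y_d = 2.0d0                      ! host-to-device copies
  call axpy<<<(n + 255)/256, 256>>>(n, 0.5d0, x_d, y_d)
  y = y_d                                       ! device-to-host copy
  print *, 'y(1) =', y(1)                       ! expect 2.5
end program demo
```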

Journal ArticleDOI
01 May 2016
TL;DR: An assessment of the performance of the code is presented, showing a significant improvement in the code running-time achieved by preconditioning, while strong scaling tests show a factor of 10 speed-up using 16 threads.
Abstract: We present the design and implementation of a spectral code, called SpectralPlasmaSolver (SPS), for the solution of the multi-dimensional Vlasov-Maxwell equations. The method is based on a Hermite-Fourier decomposition of the particle distribution function. The code is written in Fortran and uses the PETSc library for solving the non-linear equations and preconditioning and the FFTW library for the convolutions. SPS is parallelized for shared-memory machines using OpenMP. As a verification example, we discuss simulations of the two-dimensional Orszag-Tang vortex problem and successfully compare them against a fully kinetic Particle-In-Cell simulation. An assessment of the performance of the code is presented, showing a significant improvement in the code running-time achieved by preconditioning, while strong scaling tests show a factor of 10 speed-up using 16 threads.

Journal ArticleDOI
TL;DR: A kinetic Monte Carlo model of a reversible addition–fragmentation chain transfer (RAFT) process is presented and offers an efficient method for predicting average properties and molecular weight distributions of the polymer species, including the bivariate molecular weight distribution of the intermediate two-arm adduct.
Abstract: A kinetic Monte Carlo model of a reversible addition–fragmentation chain transfer (RAFT) process is presented. The algorithm has been developed and implemented in Julia for the three main RAFT theories under current discussion (slow fragmentation, intermediate radical termination, and intermediate radical termination with oligomers). Julia is a modern programming language designed to achieve high performance in numerical and scientific computing. Thanks to a careful optimization of the code, it is possible to simulate a RAFT reaction scheme in short computing times for any of the three theories. The code is benchmarked against other programming languages (MATLAB, Python, FORTRAN, and C), showing that Julia presents advantages for this particular system. The model offers an efficient method for predicting average properties and molecular weight distributions of the polymer species, including the bivariate molecular weight distribution of the intermediate two-arm adduct. The proposed model can also be employed...
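The kinetic Monte Carlo core of such a model is a Gillespie-type stochastic simulation step: draw an exponential waiting time from the total rate, then select a reaction channel with probability proportional to its rate. A generic sketch of one step in Fortran (the paper's implementation is in Julia, and its reaction network is far larger):

```fortran
! One Gillespie SSA step (generic kinetic Monte Carlo, not the paper's code).
subroutine ssa_step(nrxn, rates, t, j)
  implicit none
  integer, intent(in)    :: nrxn
  real(8), intent(in)    :: rates(nrxn)   ! propensity of each channel
  real(8), intent(inout) :: t             ! simulation time
  integer, intent(out)   :: j             ! selected reaction channel
  real(8) :: a0, r1, r2, cum
  a0 = sum(rates)
  call random_number(r1); call random_number(r2)
  t = t - log(1.0d0 - r1)/a0              ! exponential waiting time
  cum = 0.0d0
  do j = 1, nrxn                          ! pick j with prob rates(j)/a0
    cum = cum + rates(j)
    if (cum >= r2*a0) return
  end do
  j = nrxn                                ! guard against round-off
end subroutine ssa_step
```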

Book ChapterDOI
19 Jun 2016
TL;DR: TiDA as mentioned in this paper is a multicore programming model based on tiling and implemented as C++ and Fortran libraries, which hides the details of data decomposition, cache locality optimizations, and memory affinity management.
Abstract: The high energy costs for data movement compared to computation give paramount importance to data locality management in programs. Managing data locality manually is not a trivial task and also complicates programming. Tiling is a well-known approach that provides both data locality and parallelism in an application. However, there is no standard programming construct to express tiling at the application level. We have developed a multicore programming model, TiDA, based on tiling and implemented the model as C++ and Fortran libraries. The proposed programming model has three high-level abstractions: tiles, regions, and tile iterators. These abstractions in the library hide the details of data decomposition, cache locality optimizations, and memory affinity management in the application. In this paper we unveil the internals of the library and demonstrate the performance and programmability advantages of the model on five applications on multiple NUMA nodes. The library achieves up to 2.10x speedup over OpenMP in a single compute node for simple kernels, and up to 22x improvement over a single thread for a more complex combustion proxy application (SMC) on 24 cores. The MPI+TiDA implementation of geometric multigrid demonstrates a 30.9% performance improvement over MPI+OpenMP when scaling to 3072 cores (excluding MPI communication overheads; 8.5% otherwise).
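Tiling itself is the classic cache-blocking transformation sketched below; TiDA's contribution is to hide this pattern, together with memory affinity management, behind its tile and region abstractions. The sketch is a generic hand-written illustration of the transformation, not the TiDA API.

```fortran
! Hand-written loop tiling for cache locality (generic illustration;
! TiDA packages this pattern behind tile/region abstractions).
subroutine smooth_tiled(n, a, b)
  implicit none
  integer, parameter  :: tile = 64
  integer, intent(in) :: n
  real(8), intent(in)  :: a(n,n)
  real(8), intent(out) :: b(n,n)
  integer :: i, j, ii, jj
  do jj = 2, n - 1, tile                       ! iterate over tiles
    do ii = 2, n - 1, tile
      do j = jj, min(jj + tile - 1, n - 1)     ! iterate inside a tile
        do i = ii, min(ii + tile - 1, n - 1)
          b(i,j) = 0.2d0*(a(i,j) + a(i-1,j) + a(i+1,j) &
                          + a(i,j-1) + a(i,j+1))
        end do
      end do
    end do
  end do
end subroutine smooth_tiled
```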

Journal ArticleDOI
TL;DR: This paper proposes two parallel SCE-UA methods, implemented on an Intel multi-core CPU and an NVIDIA many-core GPU using OpenMP and CUDA Fortran, respectively; the Griewank benchmark function is adopted to test and compare the performance of the serial and parallel SCE-UA methods.
Abstract: The famous global optimization SCE-UA method, which has been widely used in the field of environmental model parameter calibration, is an effective and robust method. However, the SCE-UA method has a high computational load, which prohibits its application to high-dimensional and complex problems. In recent years, computer hardware, such as multi-core CPUs and many-core GPUs, has improved significantly. This much more powerful new hardware and its software ecosystems provide an opportunity to accelerate the SCE-UA method. In this paper, we propose two parallel SCE-UA methods and implement them on an Intel multi-core CPU and an NVIDIA many-core GPU using OpenMP and CUDA Fortran, respectively. The Griewank benchmark function is adopted to test and compare the performance of the serial and parallel SCE-UA methods. Based on the results of the comparison, some useful advice is given on how to properly use the parallel SCE-UA methods.
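The Griewank benchmark is f(x) = 1 + sum_i x_i^2/4000 - prod_i cos(x_i/sqrt(i)), and evaluating a population of candidate points is embarrassingly parallel, which is what both parallel variants exploit. An OpenMP illustration of that population evaluation (my sketch, not the paper's code):

```fortran
! Parallel evaluation of the Griewank function over a population of
! candidate points with OpenMP (illustrative, not the paper's code).
module griewank_mod
  implicit none
contains
  pure function griewank(x) result(f)
    real(8), intent(in) :: x(:)
    real(8) :: f
    integer :: i
    f = 1.0d0 + sum(x**2)/4000.0d0 &
        - product([(cos(x(i)/sqrt(real(i, 8))), i = 1, size(x))])
  end function griewank
end module griewank_mod

program eval_pop
  use griewank_mod
  implicit none
  integer, parameter :: npop = 10000, ndim = 20
  real(8) :: pop(ndim, npop), fit(npop)
  integer :: k
  call random_number(pop)
  pop = 200.0d0*pop - 100.0d0          ! candidates in [-100, 100]^ndim
  !$omp parallel do                    ! each member evaluated independently
  do k = 1, npop
    fit(k) = griewank(pop(:, k))
  end do
  !$omp end parallel do
  print *, 'best fitness:', minval(fit)
end program eval_pop
```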

Journal ArticleDOI
01 Jan 2016
TL;DR: This paper presents a Python-based tool that greatly simplifies the generation of computational kernels from Fortran-based applications; the tool has been used to extract more than thirty computational kernels from a million-line climate simulation model.
Abstract: Computational kernels, which are small pieces of software that selectively capture the characteristics of larger applications, have been used successfully for decades. Kernels allow for testing a compiler's ability to optimize code, evaluating the performance of future hardware, and reproducing compiler bugs. Unfortunately they can be rather time-consuming to create and do not always accurately represent the full complexity of large scientific applications. Furthermore, expert knowledge is often required to create such kernels. In this paper, we present a Python-based tool that greatly simplifies the generation of computational kernels from Fortran-based applications. Our tool automatically extracts partial source code of a larger Fortran application into a stand-alone executable kernel. Additionally, our tool also generates the state data necessary for proper execution and verification of the extracted kernel. We have utilized our tool to extract more than thirty computational kernels from a million-line climate simulation model. Our extracted kernels have been used for a variety of purposes including: code modernization, identification of limitations in compiler optimizations, numerical algorithm debugging, compiler bug reporting, and procurement benchmarking.

Journal ArticleDOI
TL;DR: EMUstack, an open-source implementation of the Scattering Matrix Method (SMM) for solving field problems in layered media, is described; the scattering matrices are calculated in terms of Bloch modes that are found using the Finite Element Method (FEM).

Journal ArticleDOI
TL;DR: The proposed software, named PDoublePop, implements a client–server model for parallel genetic algorithms, with advanced features for the local genetic algorithms such as an enhanced stopping rule, an advanced mutation scheme, and periodic application of a local search procedure, in order to solve optimization problems.

Proceedings ArticleDOI
04 Jan 2016
TL;DR: An examination of the code's performance demonstrates good parallel scaling, as well as an implementation of the FR method with a computational cost/degree-of-freedom/time-step that is essentially independent of the solution order of accuracy for structured geometries.
Abstract: A computational fluid dynamics code based on the flux reconstruction (FR) method is currently being developed at NASA Glenn Research Center to ultimately provide a large-eddy simulation capability that is both accurate and efficient for complex aeropropulsion flows. The FR approach offers a simple and efficient method that is easy to implement and accurate to an arbitrary order on common grid cell geometries. The governing compressible Navier-Stokes equations are discretized in time using various explicit Runge-Kutta schemes, with the default being the 3-stage/3rd-order strong stability preserving scheme. The code is written in modern Fortran (i.e., Fortran 2008) and parallelization is attained through MPI for execution on distributed-memory high-performance computing systems. An h-refinement study of the isentropic Euler vortex problem is able to empirically demonstrate the capability of the FR method to achieve super-accuracy for inviscid flows. Additionally, the code is applied to the Taylor-Green vortex problem, performing numerous implicit large-eddy simulations across a range of grid resolutions and solution orders. The solution found by a pseudo-spectral code is commonly used as a reference solution to this problem, and the FR code is able to reproduce this solution using approximately the same grid resolution. Finally, an examination of the code's performance demonstrates good parallel scaling, as well as an implementation of the FR method with a computational cost/degree-of-freedom/time-step that is essentially independent of the solution order of accuracy for structured geometries.
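The default time integrator mentioned above is the 3-stage Shu-Osher strong-stability-preserving scheme SSPRK(3,3), whose standard coefficients are sketched below applied to a trivial right-hand side; the rhs routine here is a placeholder standing in for the FR spatial residual operator, not the code's actual implementation.

```fortran
! One Shu-Osher SSPRK(3,3) step, applied here to du/dt = -u;
! rhs() is a placeholder for the FR spatial residual operator.
module ssprk3
  implicit none
contains
  subroutine rhs(n, u, r)
    integer, intent(in)  :: n
    real(8), intent(in)  :: u(n)
    real(8), intent(out) :: r(n)
    r = -u                                                  ! toy residual
  end subroutine rhs

  subroutine step(n, u, dt)
    integer, intent(in)    :: n
    real(8), intent(inout) :: u(n)
    real(8), intent(in)    :: dt
    real(8) :: u1(n), u2(n), r(n)
    call rhs(n, u,  r); u1 = u + dt*r                       ! stage 1
    call rhs(n, u1, r); u2 = 0.75d0*u + 0.25d0*(u1 + dt*r)  ! stage 2
    call rhs(n, u2, r); u  = (u + 2.0d0*(u2 + dt*r))/3.0d0  ! stage 3
  end subroutine step
end module ssprk3
```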

Posted Content
TL;DR: This paper briefly demonstrates how to construct a Continuous Time Stochastic Model using multivariate time series data, and how to estimate the embedded parameters.
Abstract: ctsmr is an R package providing a general framework for identifying and estimating partially observed continuous-discrete time grey-box models. The estimation is based on maximum likelihood principles and Kalman filtering efficiently implemented in Fortran. This paper briefly demonstrates how to construct a Continuous Time Stochastic Model using multivariate time series data, and how to estimate the embedded parameters. The setup provides a unique framework for statistical modeling of physical phenomena, and the approach is often called grey-box modeling. Finally three examples are provided to demonstrate the capabilities of ctsmr.
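Concretely, maximum likelihood estimation of this kind evaluates a Gaussian log-likelihood built from the Kalman filter innovations at each observation time; in standard textbook notation (my summary of the well-known formulas, not the package's documentation):

```latex
% Innovation form of the Gaussian log-likelihood used in Kalman-filter
% based maximum likelihood (standard textbook form, notation mine).
\begin{align*}
  e_k &= y_k - C\,\hat{x}_{k|k-1}, \qquad
  S_k = C\,P_{k|k-1}\,C^{\top} + R,\\
  \log L(\theta) &= -\tfrac{1}{2}\sum_{k=1}^{N}
    \left( \log\det S_k + e_k^{\top} S_k^{-1} e_k + d\,\log 2\pi \right).
\end{align*}
```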

Journal ArticleDOI
TL;DR: The pengeom subroutines (a subset of the penelope code) track particles through the material structure, independently of the details of the physics models adopted to describe the interactions, in Monte Carlo simulations of radiation transport with arbitrary interaction models.

Journal Article
TL;DR: It is shown that Automatic Differentiation operators can be provided in a dynamic language without sacrificing numeric performance; to achieve this, general forward and reverse AD functions are added to a simple high-level dynamic language, and support for them is included in an aggressive optimizing compiler.
Abstract: We show that Automatic Differentiation (AD) operators can be provided in a dynamic language without sacrificing numeric performance. To achieve this, general forward and reverse AD functions are added to a simple high-level dynamic language, and support for them is included in an aggressive optimizing compiler. Novel technical mechanisms are discussed, which have the ability to migrate the AD transformations from run-time to compile-time. The resulting system, although only a research prototype, exhibits startlingly good performance. In fact, despite the potential inefficiencies entailed by support of a functional-programming language and a first-class AD operator, performance is competitive with the fastest available preprocessor-based Fortran AD systems. On benchmarks involving nested use of the AD operators, it can even dramatically exceed their performance.
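Forward-mode AD, one of the two modes provided, is easiest to illustrate with dual numbers; in Fortran the same idea can be expressed with a derived type and operator overloading. This is a minimal sketch of the general technique, unrelated to the paper's compiler-based implementation.

```fortran
! Forward-mode AD via dual numbers (minimal sketch of the technique;
! the paper instead implements AD inside an optimizing compiler).
module dual_mod
  implicit none
  type :: dual
    real(8) :: v   ! value
    real(8) :: d   ! derivative carried alongside the value
  end type dual
  interface operator(*)
    module procedure mul
  end interface
  interface operator(+)
    module procedure add
  end interface
contains
  elemental function mul(a, b) result(c)
    type(dual), intent(in) :: a, b
    type(dual) :: c
    c = dual(a%v*b%v, a%v*b%d + a%d*b%v)   ! product rule
  end function mul
  elemental function add(a, b) result(c)
    type(dual), intent(in) :: a, b
    type(dual) :: c
    c = dual(a%v + b%v, a%d + b%d)         ! sum rule
  end function add
end module dual_mod

program demo
  use dual_mod
  implicit none
  type(dual) :: x, y
  x = dual(3.0d0, 1.0d0)      ! seed dx/dx = 1
  y = x*x + x                 ! y = x^2 + x
  print *, y%v, y%d           ! 12 and dy/dx = 2x + 1 = 7
end program demo
```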

Journal ArticleDOI
TL;DR: This article describes Fortran codes produced, or organized, for the generation of the following random objects: numbers, probability vectors, unitary matrices, and quantum state vectors and density matrices.
Abstract: The usefulness of generating random configurations is recognized in many areas of knowledge. Fortran was born for scientific computing and has been one of the main programming languages in this area since then, and several ongoing projects targeting its betterment indicate that it will keep this status in the decades to come. In this article, we describe Fortran codes produced, or organized, for the generation of the following random objects: numbers, probability vectors, unitary matrices, and quantum state vectors and density matrices. Some matrix functions are also included and may be of independent interest.
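As one example of such a generator, a probability vector distributed uniformly on the simplex can be obtained by normalizing exponential deviates, a standard construction (a sketch of mine, not the paper's routine):

```fortran
! Uniform sampling from the probability simplex by normalizing
! exponential deviates (standard construction; not the paper's code).
subroutine random_prob_vector(n, p)
  implicit none
  integer, intent(in)  :: n
  real(8), intent(out) :: p(n)
  call random_number(p)       ! uniforms in [0, 1)
  p = -log(1.0d0 - p)         ! exponential(1) deviates
  p = p/sum(p)                ! normalize: p >= 0 and sum(p) = 1
end subroutine random_prob_vector
```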

Proceedings ArticleDOI
13 Mar 2016
TL;DR: This paper presents recent research exploring techniques to gain compiler auto-vectorization for unstructured mesh applications from the CFD domain, showing that auto-vectorization yields considerable performance improvements.
Abstract: For modern x86-based CPUs with increasingly longer vector lengths, achieving good vectorization has become very important for gaining higher performance. Using very explicit SIMD vector programming techniques has been shown to give near-optimal performance; however, they are difficult to implement for all classes of applications, particularly ones with very irregular memory accesses, and they usually require considerable refactoring of the code. Vector intrinsics are also not available for languages such as Fortran, which is still heavily used in large production applications. The alternative is to depend on compiler auto-vectorization, which has usually been less effective in vectorizing codes with irregular memory access patterns. In this paper we present recent research exploring techniques to gain compiler auto-vectorization for unstructured mesh applications. A key contribution is details on software techniques that achieve auto-vectorization for a large production-grade unstructured mesh application from the CFD domain, so as to benefit from the vector units on the latest Intel processors without a significant code rewrite. We use code generation tools in the OP2 domain specific library to apply the auto-vectorizing optimizations automatically to the production code base, and we further explore the performance of the application compared to the performance with other parallelizations, such as on the latest NVIDIA GPUs. We see that there are considerable performance improvements with auto-vectorization: the most compute-intensive parallel loops in the large CFD application show speedups of nearly 40% on a 20-core Intel Haswell system compared to their non-vectorized versions. However, not all loops gain from vectorization; loops with less computational intensity lose performance due to the associated overheads.
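In Fortran, where vector intrinsics are unavailable, the portable way to assist the compiler is through directives such as OpenMP's !$omp simd; the generic sketch below shows the kind of indirectly addressed loop at issue (my illustration, not OP2-generated code). The gather through the index array is precisely what normally defeats auto-vectorization.

```fortran
! Directive-assisted vectorization of an indirectly addressed loop
! (generic sketch, not OP2-generated code).
subroutine edge_flux(nedge, idx, q, flux)
  implicit none
  integer, intent(in)  :: nedge, idx(nedge)
  real(8), intent(in)  :: q(:)
  real(8), intent(out) :: flux(nedge)
  integer :: e
  !$omp simd
  do e = 1, nedge
    flux(e) = 0.5d0*q(idx(e))**2   ! indirect read (gather)
  end do
end subroutine edge_flux
```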

DOI
Akihiro Hayashi, Jun Shirako, Ettore Tiotto, Robert Ho, Vivek Sarkar
13 Nov 2016
TL;DR: To study potential performance improvements from compiling and optimizing high-level GPU programs, a set of OpenMP 4.x benchmarks is evaluated and a comparative performance analysis is conducted between hand-written CUDA programs and GPU programs automatically generated by the IBM XL and clang/LLVM compilers.
Abstract: While GPUs are increasingly popular for high-performance computing, optimizing the performance of GPU programs is a time-consuming and non-trivial process in general. This complexity stems from the low abstraction level of standard GPU programming models such as CUDA and OpenCL: programmers are required to orchestrate low-level operations in order to exploit the full capability of GPUs. In terms of software productivity and portability, a more attractive approach would be to facilitate GPU programming by providing high-level abstractions for expressing parallel algorithms. OpenMP is a directive-based shared memory parallel programming model and has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP's high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran languages, without exposing too many details of GPU architectures. However, such high-level parallel programming strategies generally impose additional program optimizations on compilers, which could result in lower performance than fully hand-tuned code with low-level programming models. To study potential performance improvements by compiling and optimizing high-level GPU programs, in this paper we 1) evaluate a set of OpenMP 4.x benchmarks on an IBM POWER8 and NVIDIA Tesla GPU platform and 2) conduct a comparative performance analysis between hand-written CUDA programs and GPU programs automatically generated by the IBM XL and clang/LLVM compilers.
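In Fortran, the OpenMP 4.x accelerator extensions look like the sketch below: target offloads the region, map clauses manage data transfer, and teams distribute parallel do expresses the GPU parallelism. This is a generic example of mine, not one of the evaluated benchmarks.

```fortran
! Generic OpenMP 4.x GPU offload in Fortran (not one of the paper's
! benchmarks): vector update y = y + a*x executed on the device.
subroutine daxpy_target(n, a, x, y)
  implicit none
  integer, intent(in)    :: n
  real(8), intent(in)    :: a, x(n)
  real(8), intent(inout) :: y(n)
  integer :: i
  !$omp target teams distribute parallel do map(to: x) map(tofrom: y)
  do i = 1, n
    y(i) = y(i) + a*x(i)
  end do
  !$omp end target teams distribute parallel do
end subroutine daxpy_target
```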

Journal ArticleDOI
Bill Long
TL;DR: The next Fortran standard will include enhancements to the parallel programming model based on coarrays that is part of Fortran 2008, including support for teams of images, synchronization using events, collective and atomic operations, enhanced error reporting, and continued execution after the failure of an image.
Abstract: The next Fortran standard will include enhancements to the parallel programming model based on coarrays that is part of Fortran 2008. Included will be support for teams of images, synchronization using events, collective and atomic operations, enhanced error reporting, and continued execution after the failure of an image.
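For readers unfamiliar with the Fortran 2008 coarray model these enhancements build on, a minimal example follows (a sketch of mine): each image holds its own copy of a coarray variable, and image 1 combines the contributions after a synchronization. The manual reduction loop shown here is exactly the pattern the new collectives such as co_sum replace.

```fortran
! Minimal Fortran 2008 coarray program: each image computes a partial
! value, image 1 combines them after synchronization. The additions
! described above include collectives (e.g. co_sum) that replace the
! manual reduction loop below.
program coarray_sum
  implicit none
  real    :: partial[*]            ! one copy of 'partial' per image
  real    :: total
  integer :: img
  partial = real(this_image())     ! each image contributes its own value
  sync all                         ! make all partial values visible
  if (this_image() == 1) then
    total = 0.0
    do img = 1, num_images()
      total = total + partial[img] ! remote read from image 'img'
    end do
    print *, 'total =', total      ! n(n+1)/2 for n images
  end if
end program coarray_sum
```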