
Showing papers on "Fortran published in 2016"


ReportDOI
01 Apr 2016
TL;DR: This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers.
Abstract: This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear and nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, C++, Python, and MATLAB (sequential). PETSc provides many of the mechanisms needed within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background, experience programming in C, C++ or Fortran, and experience using a debugger such as gdb or dbx, it may require a significant amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates may make the efficient implementation of many application codes simpler than "rolling them yourself." For many tasks a package such as MATLAB is often the best tool; PETSc is not intended for the classes of problems for which effective MATLAB code can be written. PETSc also has a MATLAB interface, so portions of your code can be written in MATLAB to "try out" the PETSc solvers; the resulting code will not be scalable, however, because MATLAB is currently inherently not scalable. PETSc should not be used to attempt to provide a "parallel linear solver" in an otherwise sequential code: certainly not all parts of a previously sequential code need to be parallelized, but the matrix-generation portion must be parallelized to expect any kind of reasonable performance. Do not expect to generate your matrix sequentially and then "use PETSc" to solve the linear system in parallel. Since PETSc is under continued development, small changes in usage and calling sequences of routines will occur. PETSc is supported; see the web site http://www.mcs.anl.gov/petsc for information on contacting support. A list of publications and web sites that feature work involving PETSc may be found at http://www.mcs.anl.gov/petsc/publications. We welcome any reports of corrections for this document.
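To make the entry concrete, here is a minimal sketch of a PETSc linear solve driven from Fortran, assuming the modern PETSc Fortran module bindings (a generic illustration of mine, not an example from the manual): a 1D Laplacian is assembled in parallel and solved with a runtime-configurable KSP solver.

```fortran
program petsc_demo
#include <petsc/finclude/petscksp.h>
  use petscksp
  implicit none
  Mat            A
  Vec            x, b
  KSP            ksp
  PetscErrorCode ierr
  PetscInt       i, istart, iend, n
  PetscScalar    v

  n = 100
  call PetscInitialize(PETSC_NULL_CHARACTER, ierr)

  ! Parallel assembly: each rank fills only the matrix rows it owns.
  call MatCreate(PETSC_COMM_WORLD, A, ierr)
  call MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n, ierr)
  call MatSetFromOptions(A, ierr)
  call MatSetUp(A, ierr)
  call MatGetOwnershipRange(A, istart, iend, ierr)
  do i = istart, iend - 1
    v = 2.0;  call MatSetValue(A, i, i, v, INSERT_VALUES, ierr)
    v = -1.0
    if (i > 0)     call MatSetValue(A, i, i - 1, v, INSERT_VALUES, ierr)
    if (i < n - 1) call MatSetValue(A, i, i + 1, v, INSERT_VALUES, ierr)
  end do
  call MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY, ierr)
  call MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY, ierr)

  call MatCreateVecs(A, x, b, ierr)
  v = 1.0; call VecSet(b, v, ierr)

  ! Solver type and tolerances are chosen at run time, e.g. -ksp_type cg.
  call KSPCreate(PETSC_COMM_WORLD, ksp, ierr)
  call KSPSetOperators(ksp, A, A, ierr)
  call KSPSetFromOptions(ksp, ierr)
  call KSPSolve(ksp, b, x, ierr)

  call KSPDestroy(ksp, ierr); call VecDestroy(x, ierr)
  call VecDestroy(b, ierr);   call MatDestroy(A, ierr)
  call PetscFinalize(ierr)
end program petsc_demo
```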

430 citations


Journal ArticleDOI
TL;DR: The iEBE-VISHNU code package, as discussed by the authors, performs event-by-event simulations for relativistic heavy-ion collisions using a hybrid approach based on (2+1)-dimensional viscous hydrodynamics coupled to a hadronic cascade model.

316 citations


Journal ArticleDOI
TL;DR: This paper discusses the modular approach to atomistic machine learning through the development of the open-source Atomistic Machine-learning Package (Amp), which allows for representations of both the total and atom-centered potential energy surface, in both periodic and non-periodic systems.

302 citations


Journal ArticleDOI
TL;DR: The SEISCOPE optimization toolbox is a set of FORTRAN 90 routines which implement first-order and second-order methods for the solution of large-scale nonlinear optimization problems, such as traveltime tomography, least-squares migration, or full-waveform inversion.
Abstract: The SEISCOPE optimization toolbox is a set of FORTRAN 90 routines, which implement first-order methods (steepest-descent and nonlinear conjugate gradient) and second-order methods (l-BFGS and truncated Newton), for the solution of large-scale nonlinear optimization problems. An efficient line-search strategy ensures the robustness of these implementations. The routines are proposed as black boxes easy to interface with any computational code, where such large-scale minimization problems have to be solved. Traveltime tomography, least-squares migration, or full-waveform inversion are examples of such problems in the context of geophysics. Integrating the toolbox for solving this class of problems presents two advantages. First, it helps to separate the routines depending on the physics of the problem from the ones related to the minimization itself, thanks to the reverse communication protocol. This enhances flexibility in code development and maintenance. Second, it allows us to switch easily between the different optimization methods.
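The reverse communication protocol mentioned above works by returning control to the calling program whenever the optimizer needs a new function or gradient value, so the toolbox never calls user code directly. Below is a toy, self-contained illustration of the protocol built around a fixed-step gradient-descent "optimizer"; the routine names, flag values, and trivial update rule are mine, not the toolbox's actual API.

```fortran
! Toy reverse-communication optimizer (hypothetical names, not the
! SEISCOPE API): the optimizer asks the caller for evaluations via a flag.
module rc_optim
  implicit none
  real(8), parameter :: step = 1.0d-1, gtol = 1.0d-6
contains
  subroutine descent_step(n, x, fcost, grad, flag)
    integer, intent(in)         :: n
    real(8), intent(inout)      :: x(n)
    real(8), intent(in)         :: fcost, grad(n)
    character(4), intent(inout) :: flag
    if (flag == 'INIT') then
      flag = 'GRAD'                      ! request f and grad at initial x
    else if (sqrt(sum(grad**2)) < gtol) then
      flag = 'CONV'                      ! converged: caller leaves the loop
    else
      x = x - step*grad                  ! update the iterate
      flag = 'GRAD'                      ! request evaluation at the new x
    end if
  end subroutine descent_step
end module rc_optim

program demo
  use rc_optim
  implicit none
  real(8)      :: x(2) = [2.0d0, -1.5d0], f, g(2)
  character(4) :: flag = 'INIT'
  do
    call descent_step(2, x, f, g, flag)
    if (flag == 'CONV') exit
    f = sum(x**2)      ! the "physics" misfit code lives on the caller's side
    g = 2.0d0*x
  end do
  print '(a,2f10.6)', ' minimum at x =', x
end program demo
```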

133 citations


Journal ArticleDOI
TL;DR: HOS-ocean is an efficient High-Order Spectral code developed to solve the deterministic propagation of nonlinear wavefields in open ocean; it is released as open source, developed and distributed under the terms of the GNU General Public License (GPLv3).

98 citations


Journal ArticleDOI
TL;DR: A new release of the QCDLoop library based on a modern object-oriented framework is presented, with new features such as the extension to complex masses, the possibility to perform computations in double and quadruple precision simultaneously, and useful caching mechanisms to improve computational speed.

68 citations


Proceedings ArticleDOI
02 Jun 2016
TL;DR: The benefits of verified lifting are demonstrated by first automatically summarizing Fortran source code into a high-level predicate language, and subsequently translating the lifted summaries into Halide, with the translated code achieving median performance speedups of 4.1X and up to 24X for non-trivial stencils as compared to the original implementation.
Abstract: This paper demonstrates a novel combination of program synthesis and verification to lift stencil computations from low-level Fortran code to a high-level summary expressed using a predicate language. The technique is sound and mostly automated, and leverages counter-example guided inductive synthesis (CEGIS) to find provably correct translations. Lifting existing code to a high-performance description language has a number of benefits, including maintainability and performance portability. For example, our experiments show that the lifted summaries can enable domain specific compilers to do a better job of parallelization as compared to an off-the-shelf compiler working on the original code, and can even support fully automatic migration to hardware accelerators such as GPUs. We have implemented verified lifting in a system called STNG and have evaluated it using microbenchmarks, mini-apps, and real-world applications. We demonstrate the benefits of verified lifting by first automatically summarizing Fortran source code into a high-level predicate language, and subsequently translating the lifted summaries into Halide, with the translated code achieving median performance speedups of 4.1X and up to 24X for non-trivial stencils as compared to the original implementation.
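For context, the stencil loops that such verified-lifting tools target have the shape of the following illustrative Fortran fragment (my example, not one of the paper's benchmarks): each output point is a fixed arithmetic combination of a small neighborhood of inputs, which is exactly the structure a predicate-language summary or a Halide pipeline can express.

```fortran
! Illustrative 1D heat-equation stencil (not from the STNG benchmark suite).
subroutine heat_step(n, u, unew, alpha)
  implicit none
  integer, intent(in)  :: n
  real(8), intent(in)  :: u(n), alpha
  real(8), intent(out) :: unew(n)
  integer :: i
  unew(1) = u(1); unew(n) = u(n)          ! fixed boundary values
  do i = 2, n - 1
    ! Each output depends only on a fixed neighborhood of inputs,
    ! which is what makes the loop liftable to a high-level stencil.
    unew(i) = u(i) + alpha*(u(i-1) - 2.0d0*u(i) + u(i+1))
  end do
end subroutine heat_step
```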

64 citations


Proceedings ArticleDOI
10 Jul 2016
TL;DR: Verificarlo as mentioned in this paper is an extension to the LLVM compiler to automatically use Monte Carlo arithmetic in a transparent way for the end-user, and it supports all the major languages including C, C++, and Fortran.
Abstract: Numerical accuracy of floating point computation is a well studied topic which has not made its way to the end-user in scientific computing. Yet, it has become a critical issue with the recent requirements for code modernization to harness new highly parallel hardware and perform higher resolution computation. To democratize numerical accuracy analysis, it is important to propose tools and methodologies to study large use cases in a reliable and automatic way. In this paper, we propose verificarlo, an extension to the LLVM compiler to automatically use Monte Carlo Arithmetic in a transparent way for the end-user. It supports all the major languages including C, C++, and Fortran. Unlike source-to-source approaches, our implementation captures the influence of compiler optimizations on the numerical accuracy. We illustrate how Monte Carlo Arithmetic using the verificarlo tool outperforms the existing approaches on various use cases and is a step toward automatic numerical analysis.
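As a reminder of what is at stake, here is a toy Fortran illustration of catastrophic cancellation, the kind of silent accuracy loss that Monte Carlo Arithmetic is designed to expose (my example, not a verificarlo test case): at single precision the spacing between representable numbers near 1e7 is about 1, so the added 0.5 is lost before the subtraction even happens.

```fortran
! Catastrophic cancellation demo: subtracting nearly equal numbers
! destroys significant digits (illustrative, not a verificarlo case).
program cancellation
  implicit none
  real(4) :: a, b
  real(8) :: ad, bd
  a  = 1.0e7 + 0.5     ! 0.5 is below the rounding unit at this magnitude
  b  = 1.0e7
  ad = 1.0d7 + 0.5d0
  bd = 1.0d7
  print *, 'single precision: ', a - b     ! prints 0.0: digits already lost
  print *, 'double precision: ', ad - bd   ! prints 0.5 as expected
end program cancellation
```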

64 citations


Journal ArticleDOI
TL;DR: New versions of previously published Fortran and C programs for solving the Gross–Pitaevskii equation for a Bose–Einstein condensate with contact interaction in one, two and three spatial dimensions in imaginary and real time are presented, yielding both stationary and non-stationary solutions.

58 citations


Journal ArticleDOI
TL;DR: A pure scientific Python pseudo-spectral DNS code that nearly matches the performance of C++ for thousands of processors and billions of unknowns is described, along with a version optimized through Cython that is found to match the speed of C++.

51 citations


Journal ArticleDOI
TL;DR: An R-matrix Fortran package based on the Lagrange-mesh method is presented to solve coupled-channel problems in nuclear physics; it deals with open and closed channels simultaneously, without the numerical instability associated with closed channels.

01 Jan 2016
TL;DR: SQOPT is a software package for minimizing a convex quadratic function subject to both equality and inequality constraints, and is part of the SNOPT package for large-scale nonlinearly constrained optimization.
Abstract: SQOPT is a software package for minimizing a convex quadratic function subject to both equality and inequality constraints. SQOPT may also be used for linear programming and for finding a feasible point for a set of linear equalities and inequalities. SQOPT uses a two-phase, active-set, reduced-Hessian method. It is most efficient on problems with relatively few degrees of freedom (for example, if only some of the variables appear in the quadratic term, or the number of active constraints and bounds is nearly as large as the number of variables). However, unlike previous versions of SQOPT, there is no limit on the number of degrees of freedom. SQOPT is primarily intended for large linear and quadratic problems with sparse constraint matrices. A quadratic term (1/2)x^T Hx in the objective function is represented by a user subroutine that returns the product Hx for a given vector x. SQOPT uses stable numerical methods throughout and includes a reliable basis package (for maintaining sparse LU factors of the basis matrix), a practical antidegeneracy procedure, scaling, and elastic bounds on any number of constraints and variables. SQOPT is part of the SNOPT package for large-scale nonlinearly constrained optimization. The source code is re-entrant and suitable for any machine with a Fortran compiler. SQOPT may be called from a driver program in Fortran, MATLAB, or C/C++ with the new interface based on the Fortran 2003 standard on Fortran-C interoperability. An f2c translation of SQOPT to the C language is still provided, although this feature will be discontinued in the future (users should migrate to the new C/C++ interface). SQOPT can also be used as a stand-alone package, reading data in the MPS format used by commercial mathematical programming systems.
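The matrix-free convention above means the user never forms H explicitly; they only supply a routine that applies it to a vector. A minimal sketch for H = diag(1, ..., n) follows; note that SQOPT's actual qpHx callback carries additional status and workspace arguments not shown here, so this interface is illustrative only.

```fortran
! Sketch of a user Hessian-product routine for H = diag(1..n).
! SQOPT's real qpHx interface has extra status/workspace arguments.
subroutine my_Hx(nnH, x, Hx)
  implicit none
  integer, intent(in)  :: nnH
  real(8), intent(in)  :: x(nnH)
  real(8), intent(out) :: Hx(nnH)
  integer :: i
  do i = 1, nnH
    Hx(i) = real(i, 8)*x(i)   ! apply H column-free: H is never stored
  end do
end subroutine my_Hx
```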

Journal ArticleDOI
TL;DR: OpenACC, as discussed by the authors, offers a high-level approach based on compiler directives to mark regions of existing C, C++, or Fortran code to run on accelerators; this directly addresses code portability, leaving support of each different accelerator to compilers, but one has to carefully assess the relative costs of portable approaches versus computing efficiency.
Abstract: An increasingly large number of HPC systems rely on heterogeneous architectures combining traditional multi-core CPUs with power efficient accelerators. Designing efficient applications for these systems has been troublesome in the past, as accelerators usually had to be programmed using device-specific programming languages, threatening maintainability, portability, and correctness. Several new programming environments try to tackle this problem. Among them, OpenACC offers a high-level approach based on compiler directives to mark regions of existing C, C++, or Fortran codes to run on accelerators. This approach directly addresses code portability, leaving to compilers the support of each different accelerator, but one has to carefully assess the relative costs of portable approaches versus computing efficiency. In this paper, we address precisely this issue, using as a test-bench a massively parallel lattice Boltzmann algorithm. We first describe our multi-node implementation and optimization of the algorithm, using OpenACC and MPI. We then benchmark the code on a variety of processors, including traditional CPUs and GPUs, and make accurate performance comparisons with other GPU implementations of the same algorithm using CUDA and OpenCL. We also assess the performance impact associated with portable programming, and the actual portability and performance-portability of OpenACC-based applications across several state-of-the-art architectures. Copyright © 2016 John Wiley & Sons, Ltd.
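For readers unfamiliar with the model, OpenACC keeps the original loop nests and adds directives around them; the sketch below shows the general pattern on a generic relaxation kernel (my illustration, not the paper's lattice Boltzmann code). The data region keeps arrays resident on the device across kernel launches, which is usually the key to performance.

```fortran
! Generic OpenACC-annotated relaxation kernel (not the paper's LB code).
subroutine relax(n, a, b)
  implicit none
  integer, intent(in)    :: n
  real(8), intent(in)    :: a(n,n)
  real(8), intent(inout) :: b(n,n)
  integer :: i, j
  !$acc data copyin(a) copy(b)       ! keep arrays resident on the device
  !$acc parallel loop collapse(2)    ! compiler maps the loop nest to the GPU
  do j = 2, n - 1
    do i = 2, n - 1
      b(i,j) = 0.25d0*(a(i-1,j) + a(i+1,j) + a(i,j-1) + a(i,j+1))
    end do
  end do
  !$acc end parallel loop
  !$acc end data
end subroutine relax
```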

Journal ArticleDOI
TL;DR: The Vofi library has been developed to accurately calculate the volume fraction field demarcated by implicitly-defined fluid interfaces in Cartesian grids with cubic cells; it computes the integration limits along two coordinate directions and the local height function, which is the integrand of a double Gauss–Legendre integration with a variable number of nodes.

Journal ArticleDOI
TL;DR: A hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier–Stokes solver Nek5000, is presented; the implementation significantly minimizes modification of the existing CPU code while extending the simulation capability of the code to GPU architectures.
Abstract: We present a hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier–Stokes solver Nek5000. The implementation is based on OpenACC and CUDA Fortran for local parallelization of the compute-intensive matrix–matrix multiplication part, which significantly minimizes the modification of the existing CPU code while extending the simulation capability of the code to GPU architectures. Our discussion includes the GPU results of OpenACC interoperating with CUDA Fortran and the gather–scatter operations with GPUDirect communication. We demonstrate performance of up to 552 Tflops on 16,384 GPUs of the OLCF Cray XK7 Titan.
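CUDA Fortran, used for the compute-intensive parts, allows GPU kernels to be written directly in Fortran syntax; below is a minimal self-contained sketch of the language features involved (a generic example of mine, not Nekbone's tuned matrix-matrix kernels).

```fortran
! Minimal CUDA Fortran sketch: a device kernel in Fortran plus its
! host-side chevron launch (generic, not Nekbone's code).
module kernels
  use cudafor
  implicit none
contains
  attributes(global) subroutine axpy(n, alpha, x, y)
    integer, value :: n
    real(8), value :: alpha
    real(8) :: x(n), y(n)            ! kernel dummies live in device memory
    integer :: i
    i = threadIdx%x + (blockIdx%x - 1)*blockDim%x
    if (i <= n) y(i) = y(i) + alpha*x(i)
  end subroutine axpy
end module kernels

program demo
  use cudafor
  use kernels
  implicit none
  integer, parameter :: n = 1024
  real(8), device :: x_d(n), y_d(n)
  real(8) :: y(n)
  x_d = 1.0d0; y_d = 2.0d0                      ! host-to-device copies
  call axpy<<<(n + 255)/256, 256>>>(n, 0.5d0, x_d, y_d)
  y = y_d                                       ! device-to-host copy
  print *, 'y(1) =', y(1)                       ! expect 2.5
end program demo
```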

Journal ArticleDOI
01 May 2016
TL;DR: An assessment of the performance of the code is presented, showing a significant improvement in the code running-time achieved by preconditioning, while strong scaling tests show a factor of 10 speed-up using 16 threads.
Abstract: We present the design and implementation of a spectral code, called SpectralPlasmaSolver (SPS), for the solution of the multi-dimensional Vlasov-Maxwell equations. The method is based on a Hermite-Fourier decomposition of the particle distribution function. The code is written in Fortran and uses the PETSc library for solving the non-linear equations and preconditioning and the FFTW library for the convolutions. SPS is parallelized for shared-memory machines using OpenMP. As a verification example, we discuss simulations of the two-dimensional Orszag-Tang vortex problem and successfully compare them against a fully kinetic Particle-In-Cell simulation. An assessment of the performance of the code is presented, showing a significant improvement in the code running-time achieved by preconditioning, while strong scaling tests show a factor of 10 speed-up using 16 threads.

Journal ArticleDOI
TL;DR: A kinetic Monte Carlo model of a reversible addition–fragmentation chain transfer (RAFT) process is presented and offers an efficient method for predicting average properties and molecular weight distributions of the polymer species, including the bivariate molecular weight distribution of the intermediate two-arm adduct.
Abstract: A kinetic Monte Carlo model of a reversible addition–fragmentation chain transfer (RAFT) process is presented. The algorithm has been developed and implemented in Julia for the three main RAFT theories under current discussion (slow fragmentation, intermediate radical termination, and intermediate radical termination with oligomers). Julia is a modern programming language designed to achieve high performance in numerical and scientific computing. Thanks to a careful optimization of the code, it is possible to simulate a RAFT reaction scheme in short computing times for any of the three theories. The code is benchmarked against other programming languages (MATLAB, Python, FORTRAN, and C), showing that Julia presents advantages for this particular system. The model offers an efficient method for predicting average properties and molecular weight distributions of the polymer species, including the bivariate molecular weight distribution of the intermediate two-arm adduct. The proposed model can also be employed...
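The kinetic Monte Carlo core of such a model is a Gillespie-type stochastic simulation step: draw an exponential waiting time from the total rate, then select a reaction channel with probability proportional to its rate. A generic sketch of one step in Fortran (the paper's implementation is in Julia, and its reaction network is far larger):

```fortran
! One Gillespie SSA step (generic kinetic Monte Carlo, not the paper's code).
subroutine ssa_step(nrxn, rates, t, j)
  implicit none
  integer, intent(in)    :: nrxn
  real(8), intent(in)    :: rates(nrxn)   ! propensity of each channel
  real(8), intent(inout) :: t             ! simulation time
  integer, intent(out)   :: j             ! selected reaction channel
  real(8) :: a0, r1, r2, cum
  a0 = sum(rates)
  call random_number(r1); call random_number(r2)
  t = t - log(1.0d0 - r1)/a0              ! exponential waiting time
  cum = 0.0d0
  do j = 1, nrxn                          ! pick j with prob rates(j)/a0
    cum = cum + rates(j)
    if (cum >= r2*a0) return
  end do
  j = nrxn                                ! guard against round-off
end subroutine ssa_step
```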

Book ChapterDOI
19 Jun 2016
TL;DR: TiDA as mentioned in this paper is a multicore programming model based on tiling and implemented as C++ and Fortran libraries, which hides the details of data decomposition, cache locality optimizations, and memory affinity management.
Abstract: The high energy costs for data movement compared to computation give paramount importance to data locality management in programs. Managing data locality manually is not a trivial task and also complicates programming. Tiling is a well-known approach that provides both data locality and parallelism in an application. However, there is no standard programming construct to express tiling at the application level. We have developed a multicore programming model, TiDA, based on tiling and implemented the model as C++ and Fortran libraries. The proposed programming model has three high-level abstractions: tiles, regions, and tile iterators. These abstractions in the library hide the details of data decomposition, cache locality optimizations, and memory affinity management in the application. In this paper we unveil the internals of the library and demonstrate the performance and programmability advantages of the model on five applications on multiple NUMA nodes. The library achieves up to 2.10x speedup over OpenMP in a single compute node for simple kernels, and up to 22x improvement over a single thread for a more complex combustion proxy application (SMC) on 24 cores. The MPI+TiDA implementation of geometric multigrid demonstrates a 30.9% performance improvement over MPI+OpenMP when scaling to 3072 cores (excluding MPI communication overheads; 8.5% otherwise).
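Tiling itself is the classic cache-blocking transformation sketched below; TiDA's contribution is to hide this pattern, together with memory affinity management, behind its tile and region abstractions. The sketch is a generic hand-written illustration of the transformation, not the TiDA API.

```fortran
! Hand-written loop tiling for cache locality (generic illustration;
! TiDA packages this pattern behind tile/region abstractions).
subroutine smooth_tiled(n, a, b)
  implicit none
  integer, parameter  :: tile = 64
  integer, intent(in) :: n
  real(8), intent(in)  :: a(n,n)
  real(8), intent(out) :: b(n,n)
  integer :: i, j, ii, jj
  do jj = 2, n - 1, tile                       ! iterate over tiles
    do ii = 2, n - 1, tile
      do j = jj, min(jj + tile - 1, n - 1)     ! iterate inside a tile
        do i = ii, min(ii + tile - 1, n - 1)
          b(i,j) = 0.2d0*(a(i,j) + a(i-1,j) + a(i+1,j) &
                          + a(i,j-1) + a(i,j+1))
        end do
      end do
    end do
  end do
end subroutine smooth_tiled
```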

Journal ArticleDOI
TL;DR: This paper proposes two parallel SCE-UA methods, implemented on an Intel multi-core CPU and an NVIDIA many-core GPU using OpenMP and CUDA Fortran, respectively; the Griewank benchmark function is adopted to test and compare the performance of the serial and parallel SCE-UA methods.
Abstract: The famous global optimization SCE-UA method, which has been widely used in the field of environmental model parameter calibration, is an effective and robust method. However, the SCE-UA method has a high computational load, which prohibits its application to high-dimensional and complex problems. In recent years, computer hardware, such as multi-core CPUs and many-core GPUs, has improved significantly. This much more powerful new hardware and its software ecosystems provide an opportunity to accelerate the SCE-UA method. In this paper, we propose two parallel SCE-UA methods and implement them on an Intel multi-core CPU and an NVIDIA many-core GPU using OpenMP and CUDA Fortran, respectively. The Griewank benchmark function is adopted to test and compare the performance of the serial and parallel SCE-UA methods. Based on the results of the comparison, some useful advice is given on how to properly use the parallel SCE-UA methods.
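The Griewank benchmark is f(x) = 1 + sum_i x_i^2/4000 - prod_i cos(x_i/sqrt(i)), and evaluating a population of candidate points is embarrassingly parallel, which is what both parallel variants exploit. An OpenMP illustration of that population evaluation (my sketch, not the paper's code):

```fortran
! Parallel evaluation of the Griewank function over a population of
! candidate points with OpenMP (illustrative, not the paper's code).
module griewank_mod
  implicit none
contains
  pure function griewank(x) result(f)
    real(8), intent(in) :: x(:)
    real(8) :: f
    integer :: i
    f = 1.0d0 + sum(x**2)/4000.0d0 &
        - product([(cos(x(i)/sqrt(real(i, 8))), i = 1, size(x))])
  end function griewank
end module griewank_mod

program eval_pop
  use griewank_mod
  implicit none
  integer, parameter :: npop = 10000, ndim = 20
  real(8) :: pop(ndim, npop), fit(npop)
  integer :: k
  call random_number(pop)
  pop = 200.0d0*pop - 100.0d0          ! candidates in [-100, 100]^ndim
  !$omp parallel do                    ! each member evaluated independently
  do k = 1, npop
    fit(k) = griewank(pop(:, k))
  end do
  !$omp end parallel do
  print *, 'best fitness:', minval(fit)
end program eval_pop
```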

Journal ArticleDOI
01 Jan 2016
TL;DR: This paper presents a Python-based tool that greatly simplifies the generation of computational kernels from Fortran-based applications; the tool has been used to extract more than thirty computational kernels from a million-line climate simulation model.
Abstract: Computational kernels, which are small pieces of software that selectively capture the characteristics of larger applications, have been used successfully for decades. Kernels allow for testing a compiler's ability to optimize code, evaluating the performance of future hardware, and reproducing compiler bugs. Unfortunately they can be rather time-consuming to create and do not always accurately represent the full complexity of large scientific applications. Furthermore, expert knowledge is often required to create such kernels. In this paper, we present a Python-based tool that greatly simplifies the generation of computational kernels from Fortran-based applications. Our tool automatically extracts partial source code of a larger Fortran application into a stand-alone executable kernel. Additionally, our tool also generates the state data necessary for proper execution and verification of the extracted kernel. We have utilized our tool to extract more than thirty computational kernels from a million-line climate simulation model. Our extracted kernels have been used for a variety of purposes including: code modernization, identification of limitations in compiler optimizations, numerical algorithm debugging, compiler bug reporting, and procurement benchmarking.

Journal ArticleDOI
TL;DR: EMUstack, an open-source implementation of the Scattering Matrix Method (SMM) for solving field problems in layered media, is described; the scattering matrices are calculated in terms of Bloch modes that are found using the Finite Element Method (FEM).

Journal ArticleDOI
TL;DR: The proposed software, named PDoublePop, implements a client–server model for parallel genetic algorithms, with advanced features for the local genetic algorithms such as an enhanced stopping rule, an advanced mutation scheme, and periodic application of a local search procedure, in order to solve optimization problems.

Proceedings ArticleDOI
04 Jan 2016
TL;DR: An examination of the code's performance demonstrates good parallel scaling, as well as an implementation of the FR method with a computational cost/degree-of-freedom/time-step that is essentially independent of the solution order of accuracy for structured geometries.
Abstract: A computational fluid dynamics code based on the flux reconstruction (FR) method is currently being developed at NASA Glenn Research Center to ultimately provide a large-eddy simulation capability that is both accurate and efficient for complex aeropropulsion flows. The FR approach offers a simple and efficient method that is easy to implement and accurate to an arbitrary order on common grid cell geometries. The governing compressible Navier-Stokes equations are discretized in time using various explicit Runge-Kutta schemes, with the default being the 3-stage/3rd-order strong stability preserving scheme. The code is written in modern Fortran (i.e., Fortran 2008) and parallelization is attained through MPI for execution on distributed-memory high-performance computing systems. An h-refinement study of the isentropic Euler vortex problem is able to empirically demonstrate the capability of the FR method to achieve super-accuracy for inviscid flows. Additionally, the code is applied to the Taylor-Green vortex problem, performing numerous implicit large-eddy simulations across a range of grid resolutions and solution orders. The solution found by a pseudo-spectral code is commonly used as a reference solution to this problem, and the FR code is able to reproduce this solution using approximately the same grid resolution. Finally, an examination of the code's performance demonstrates good parallel scaling, as well as an implementation of the FR method with a computational cost/degree-of-freedom/time-step that is essentially independent of the solution order of accuracy for structured geometries.
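The default time integrator mentioned above is the 3-stage Shu-Osher strong-stability-preserving scheme SSPRK(3,3), whose standard coefficients are sketched below applied to a trivial right-hand side; the rhs routine here is a placeholder standing in for the FR spatial residual operator, not the code's actual implementation.

```fortran
! One Shu-Osher SSPRK(3,3) step, applied here to du/dt = -u;
! rhs() is a placeholder for the FR spatial residual operator.
module ssprk3
  implicit none
contains
  subroutine rhs(n, u, r)
    integer, intent(in)  :: n
    real(8), intent(in)  :: u(n)
    real(8), intent(out) :: r(n)
    r = -u                                                  ! toy residual
  end subroutine rhs

  subroutine step(n, u, dt)
    integer, intent(in)    :: n
    real(8), intent(inout) :: u(n)
    real(8), intent(in)    :: dt
    real(8) :: u1(n), u2(n), r(n)
    call rhs(n, u,  r); u1 = u + dt*r                       ! stage 1
    call rhs(n, u1, r); u2 = 0.75d0*u + 0.25d0*(u1 + dt*r)  ! stage 2
    call rhs(n, u2, r); u  = (u + 2.0d0*(u2 + dt*r))/3.0d0  ! stage 3
  end subroutine step
end module ssprk3
```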

Posted Content
TL;DR: This paper briefly demonstrates how to construct a Continuous Time Stochastic Model using multivariate time series data, and how to estimate the embedded parameters.
Abstract: ctsmr is an R package providing a general framework for identifying and estimating partially observed continuous-discrete time grey-box models. The estimation is based on maximum likelihood principles and Kalman filtering efficiently implemented in Fortran. This paper briefly demonstrates how to construct a Continuous Time Stochastic Model using multivariate time series data, and how to estimate the embedded parameters. The setup provides a unique framework for statistical modeling of physical phenomena, and the approach is often called grey-box modeling. Finally three examples are provided to demonstrate the capabilities of ctsmr.
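Concretely, maximum likelihood estimation of this kind evaluates a Gaussian log-likelihood built from the Kalman filter innovations at each observation time; in standard textbook notation (my summary of the well-known formulas, not the package's documentation):

```latex
% Innovation form of the Gaussian log-likelihood used in Kalman-filter
% based maximum likelihood (standard textbook form, notation mine).
\begin{align*}
  e_k &= y_k - C\,\hat{x}_{k|k-1}, \qquad
  S_k = C\,P_{k|k-1}\,C^{\top} + R,\\
  \log L(\theta) &= -\tfrac{1}{2}\sum_{k=1}^{N}
    \left( \log\det S_k + e_k^{\top} S_k^{-1} e_k + d\,\log 2\pi \right).
\end{align*}
```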

Journal ArticleDOI
TL;DR: The pengeom subroutines (a subset of the penelope code) track particles through the material structure, independently of the details of the physics models adopted to describe the interactions, in Monte Carlo simulations of radiation transport with arbitrary interaction models.

Journal Article
TL;DR: It is shown that Automatic Differentiation operators can be provided in a dynamic language without sacrificing numeric performance; to achieve this, general forward and reverse AD functions are added to a simple high-level dynamic language, and support for them is included in an aggressive optimizing compiler.
Abstract: We show that Automatic Differentiation (AD) operators can be provided in a dynamic language without sacrificing numeric performance. To achieve this, general forward and reverse AD functions are added to a simple high-level dynamic language, and support for them is included in an aggressive optimizing compiler. Novel technical mechanisms are discussed, which have the ability to migrate the AD transformations from run-time to compile-time. The resulting system, although only a research prototype, exhibits startlingly good performance. In fact, despite the potential inefficiencies entailed by support of a functional-programming language and a first-class AD operator, performance is competitive with the fastest available preprocessor-based Fortran AD systems. On benchmarks involving nested use of the AD operators, it can even dramatically exceed their performance.
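Forward-mode AD, one of the two modes provided, is easiest to illustrate with dual numbers; in Fortran the same idea can be expressed with a derived type and operator overloading. This is a minimal sketch of the general technique, unrelated to the paper's compiler-based implementation.

```fortran
! Forward-mode AD via dual numbers (minimal sketch of the technique;
! the paper instead implements AD inside an optimizing compiler).
module dual_mod
  implicit none
  type :: dual
    real(8) :: v   ! value
    real(8) :: d   ! derivative carried alongside the value
  end type dual
  interface operator(*)
    module procedure mul
  end interface
  interface operator(+)
    module procedure add
  end interface
contains
  elemental function mul(a, b) result(c)
    type(dual), intent(in) :: a, b
    type(dual) :: c
    c = dual(a%v*b%v, a%v*b%d + a%d*b%v)   ! product rule
  end function mul
  elemental function add(a, b) result(c)
    type(dual), intent(in) :: a, b
    type(dual) :: c
    c = dual(a%v + b%v, a%d + b%d)         ! sum rule
  end function add
end module dual_mod

program demo
  use dual_mod
  implicit none
  type(dual) :: x, y
  x = dual(3.0d0, 1.0d0)      ! seed dx/dx = 1
  y = x*x + x                 ! y = x^2 + x
  print *, y%v, y%d           ! 12 and dy/dx = 2x + 1 = 7
end program demo
```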

Journal ArticleDOI
TL;DR: This article describes Fortran codes produced, or organized, for the generation of the following random objects: numbers, probability vectors, unitary matrices, and quantum state vectors and density matrices.
Abstract: The usefulness of generating random configurations is recognized in many areas of knowledge. Fortran was born for scientific computing and has been one of the main programming languages in this area since then, and several ongoing projects targeting its betterment indicate that it will keep this status in the decades to come. In this article, we describe Fortran codes produced, or organized, for the generation of the following random objects: numbers, probability vectors, unitary matrices, and quantum state vectors and density matrices. Some matrix functions are also included and may be of independent interest.
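As one example of such a generator, a probability vector distributed uniformly on the simplex can be obtained by normalizing exponential deviates, a standard construction (a sketch of mine, not the paper's routine):

```fortran
! Uniform sampling from the probability simplex by normalizing
! exponential deviates (standard construction; not the paper's code).
subroutine random_prob_vector(n, p)
  implicit none
  integer, intent(in)  :: n
  real(8), intent(out) :: p(n)
  call random_number(p)       ! uniforms in [0, 1)
  p = -log(1.0d0 - p)         ! exponential(1) deviates
  p = p/sum(p)                ! normalize: p >= 0 and sum(p) = 1
end subroutine random_prob_vector
```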

Proceedings ArticleDOI
13 Mar 2016
TL;DR: This paper presents recent research exploring techniques to gain compiler auto-vectorization for unstructured mesh applications from the CFD domain, showing that auto-vectorization yields considerable performance improvements.
Abstract: For modern x86-based CPUs with increasingly longer vector lengths, achieving good vectorization has become very important for gaining higher performance. Using very explicit SIMD vector programming techniques has been shown to give near-optimal performance; however, they are difficult to implement for all classes of applications, particularly ones with very irregular memory accesses, and they usually require considerable refactoring of the code. Vector intrinsics are also not available for languages such as Fortran, which is still heavily used in large production applications. The alternative is to depend on compiler auto-vectorization, which has usually been less effective in vectorizing codes with irregular memory access patterns. In this paper we present recent research exploring techniques to gain compiler auto-vectorization for unstructured mesh applications. A key contribution is details on software techniques that achieve auto-vectorization for a large production-grade unstructured mesh application from the CFD domain, so as to benefit from the vector units on the latest Intel processors without a significant code rewrite. We use code generation tools in the OP2 domain specific library to apply the auto-vectorizing optimizations automatically to the production code base, and we further explore the performance of the application compared to the performance with other parallelizations, such as on the latest NVIDIA GPUs. We see that there are considerable performance improvements with auto-vectorization: the most compute-intensive parallel loops in the large CFD application show speedups of nearly 40% on a 20-core Intel Haswell system compared to their non-vectorized versions. However, not all loops gain from vectorization; loops with less computational intensity lose performance due to the associated overheads.
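In Fortran, where vector intrinsics are unavailable, the portable way to assist the compiler is through directives such as OpenMP's !$omp simd; the generic sketch below shows the kind of indirectly addressed loop at issue (my illustration, not OP2-generated code). The gather through the index array is precisely what normally defeats auto-vectorization.

```fortran
! Directive-assisted vectorization of an indirectly addressed loop
! (generic sketch, not OP2-generated code).
subroutine edge_flux(nedge, idx, q, flux)
  implicit none
  integer, intent(in)  :: nedge, idx(nedge)
  real(8), intent(in)  :: q(:)
  real(8), intent(out) :: flux(nedge)
  integer :: e
  !$omp simd
  do e = 1, nedge
    flux(e) = 0.5d0*q(idx(e))**2   ! indirect read (gather)
  end do
end subroutine edge_flux
```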

DOI
Akihiro Hayashi, Jun Shirako, Ettore Tiotto, Robert Ho, Vivek Sarkar
13 Nov 2016
TL;DR: To study potential performance improvements from compiling and optimizing high-level GPU programs, a set of OpenMP 4.x benchmarks is evaluated and a comparative performance analysis is conducted between hand-written CUDA programs and GPU programs automatically generated by the IBM XL and clang/LLVM compilers.
Abstract: While GPUs are increasingly popular for high-performance computing, optimizing the performance of GPU programs is a time-consuming and non-trivial process in general. This complexity stems from the low abstraction level of standard GPU programming models such as CUDA and OpenCL: programmers are required to orchestrate low-level operations in order to exploit the full capability of GPUs. In terms of software productivity and portability, a more attractive approach would be to facilitate GPU programming by providing high-level abstractions for expressing parallel algorithms. OpenMP is a directive-based shared memory parallel programming model and has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP's high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran languages, without exposing too many details of GPU architectures. However, such high-level parallel programming strategies generally impose additional program optimizations on compilers, which could result in lower performance than fully hand-tuned code with low-level programming models. To study potential performance improvements by compiling and optimizing high-level GPU programs, in this paper we 1) evaluate a set of OpenMP 4.x benchmarks on an IBM POWER8 and NVIDIA Tesla GPU platform and 2) conduct a comparative performance analysis between hand-written CUDA programs and GPU programs automatically generated by the IBM XL and clang/LLVM compilers.
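In Fortran, the OpenMP 4.x accelerator extensions look like the sketch below: target offloads the region, map clauses manage data transfer, and teams distribute parallel do expresses the GPU parallelism. This is a generic example of mine, not one of the evaluated benchmarks.

```fortran
! Generic OpenMP 4.x GPU offload in Fortran (not one of the paper's
! benchmarks): vector update y = y + a*x executed on the device.
subroutine daxpy_target(n, a, x, y)
  implicit none
  integer, intent(in)    :: n
  real(8), intent(in)    :: a, x(n)
  real(8), intent(inout) :: y(n)
  integer :: i
  !$omp target teams distribute parallel do map(to: x) map(tofrom: y)
  do i = 1, n
    y(i) = y(i) + a*x(i)
  end do
  !$omp end target teams distribute parallel do
end subroutine daxpy_target
```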

Journal ArticleDOI
Bill Long
TL;DR: The next Fortran standard will include enhancements to the parallel programming model based on coarrays that is part of Fortran 2008, including support for teams of images, synchronization using events, collective and atomic operations, enhanced error reporting, and continued execution after the failure of an image.
Abstract: The next Fortran standard will include enhancements to the parallel programming model based on coarrays that is part of Fortran 2008. Included will be support for teams of images, synchronization using events, collective and atomic operations, enhanced error reporting, and continued execution after the failure of an image.
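For readers unfamiliar with the Fortran 2008 coarray model these enhancements build on, a minimal example follows (a sketch of mine): each image holds its own copy of a coarray variable, and image 1 combines the contributions after a synchronization. The manual reduction loop shown here is exactly the pattern the new collectives such as co_sum replace.

```fortran
! Minimal Fortran 2008 coarray program: each image computes a partial
! value, image 1 combines them after synchronization. The additions
! described above include collectives (e.g. co_sum) that replace the
! manual reduction loop below.
program coarray_sum
  implicit none
  real    :: partial[*]            ! one copy of 'partial' per image
  real    :: total
  integer :: img
  partial = real(this_image())     ! each image contributes its own value
  sync all                         ! make all partial values visible
  if (this_image() == 1) then
    total = 0.0
    do img = 1, num_images()
      total = total + partial[img] ! remote read from image 'img'
    end do
    print *, 'total =', total      ! n(n+1)/2 for n images
  end if
end program coarray_sum
```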