Showing papers in "arXiv: Computational Physics in 2013"


Journal ArticleDOI
TL;DR: In this article, a new molecular simulation toolkit composed of some recently developed force fields and specified models is presented to study the self-assembly, phase transition, and other properties of polymeric systems at mesoscopic scale by utilizing the computational power of GPUs.
Abstract: A new molecular simulation toolkit composed of some recently developed force fields and specified models is presented to study the self-assembly, phase transition, and other properties of polymeric systems at mesoscopic scale by utilizing the computational power of GPUs. In addition, the hierarchical self-assembly of soft anisotropic particles and the problems related to polymerization can be studied by corresponding models included in this toolkit.

168 citations


Journal ArticleDOI
TL;DR: A simple and general implementation of Hamiltonian replica exchange for the popular molecular-dynamics software GROMACS is presented in this paper, where arbitrarily different Hamiltonians can be used for the different replicas without incurring any significant performance penalty.
Abstract: A simple and general implementation of Hamiltonian replica exchange for the popular molecular-dynamics software GROMACS is presented. In this implementation, arbitrarily different Hamiltonians can be used for the different replicas without incurring any significant performance penalty. The implementation was validated on a simple toy model - alanine dipeptide in water - and applied to study the rearrangement of an RNA tetraloop, where it was used to compare recently proposed force-field corrections.

152 citations
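
The exchange step behind Hamiltonian replica exchange can be illustrated with the standard Metropolis swap test. The sketch below assumes two replicas at the same inverse temperature `beta` with different potentials; the function name `hrex_accept` and its arguments are hypothetical and are not taken from GROMACS or from the implementation described in the abstract.

```python
import math
import random


def hrex_accept(beta, u_i_xi, u_j_xj, u_i_xj, u_j_xi, rng=random.random):
    """Metropolis test for swapping configurations between replicas i and j.

    u_a_xb is the potential energy of Hamiltonian a evaluated at the
    configuration currently held by replica b (hypothetical naming).
    """
    # Energy change of the combined system if the two configurations swap.
    delta = beta * ((u_i_xj + u_j_xi) - (u_i_xi + u_j_xj))
    if delta <= 0.0:
        return True  # downhill or neutral swaps are always accepted
    # Uphill swaps are accepted with probability exp(-delta).
    return rng() < math.exp(-delta)


if __name__ == "__main__":
    # Swapping lowers the total energy here, so the swap is accepted.
    print(hrex_accept(1.0, 0.0, 0.0, -1.0, -1.0))
```

Because each replica only needs the two cross energies at exchange time, the test itself places no restriction on how different the Hamiltonians are.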


Journal ArticleDOI
TL;DR: In this article, a wide variety of numerical methods are evaluated and compared for solving the stochastic differential equations encountered in molecular dynamics, based on the application of deterministic impulses, drifts and Brownian motions in some combination.
Abstract: A wide variety of numerical methods are evaluated and compared for solving the stochastic differential equations encountered in molecular dynamics. The methods are based on the application of deterministic impulses, drifts, and Brownian motions in some combination. The Baker-Campbell-Hausdorff expansion is used to study sampling accuracy following recent work by the authors, which allows determination of the stepsize-dependent bias in configurational averaging. For harmonic oscillators, configurational averaging is exact for certain schemes, which may result in improved performance in the modelling of biomolecules where bond stretches play a prominent role. For general systems, an optimal method can be identified that has very low bias compared to alternatives. In simulations of the alanine dipeptide reported here (both solvated and unsolvated), higher accuracy is obtained without loss of computational efficiency, while allowing large timestep, and with no impairment of the conformational exploration rate (the effective diffusion rate observed in simulation). The optimal scheme is a uniformly better performing algorithm for molecular sampling, with overall efficiency improvements of 25% or more in practical timestep size achievable in vacuum, and with reductions in the error of configurational averages of a factor of ten or more attainable in solvated simulations at large timestep.

107 citations
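
To make the "impulses, drifts, and Brownian motions in some combination" concrete, here is a minimal sketch of one such splitting for Langevin dynamics, in the ordering usually written BAOAB in the literature (B = impulse, A = drift, O = Ornstein-Uhlenbeck noise). The abstract above does not name its optimal scheme, so this ordering is an assumption chosen for illustration; the function names and parameter values are hypothetical.

```python
import math
import random


def baoab_step(q, p, force, dt, gamma, kT, gauss, mass=1.0):
    """One Langevin step: half impulse, half drift, noise, half drift, half impulse."""
    p += 0.5 * dt * force(q)                      # B: deterministic impulse
    q += 0.5 * dt * p / mass                      # A: drift
    c = math.exp(-gamma * dt)                     # O: exact OU momentum update
    p = c * p + math.sqrt((1.0 - c * c) * mass * kT) * gauss(0.0, 1.0)
    q += 0.5 * dt * p / mass                      # A: drift
    p += 0.5 * dt * force(q)                      # B: deterministic impulse
    return q, p


def sample_mean_q2(n_steps, dt, gamma=1.0, kT=1.0, k=1.0, seed=0):
    """Average of q^2 for a 1D harmonic oscillator with spring constant k."""
    rng = random.Random(seed)
    q, p, acc = 0.0, 0.0, 0.0
    for _ in range(n_steps):
        q, p = baoab_step(q, p, lambda x: -k * x, dt, gamma, kT, rng.gauss)
        acc += q * q
    return acc / n_steps


if __name__ == "__main__":
    # The canonical value of <q^2> is kT / k; compare at a fairly large step.
    print(sample_mean_q2(200000, dt=0.5))
```

A harmonic oscillator is used because the abstract singles it out as a case where configurational averages can be exact; the printed estimate still carries ordinary sampling noise.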


Journal ArticleDOI
TL;DR: An algorithm used for Full Configuration Interaction Quantum Monte Carlo (FCIQMC), which is implemented and available in MOLPRO and as a standalone code, and is designed for high-level parallelism and linear-scaling with walker number is explored.
Abstract: For many decades, quantum chemical method development has been dominated by algorithms which involve increasingly complex series of tensor contractions over one-electron orbital spaces. Procedures for their derivation and implementation have evolved to require the minimum amount of logic and rely heavily on computationally efficient library-based matrix algebra and optimized paging schemes. In this regard, the recent development of exact stochastic quantum chemical algorithms to reduce computational scaling and memory overhead requires a contrasting algorithmic philosophy, but one which when implemented efficiently can often achieve higher accuracy/cost ratios with small random errors. Additionally, they can exploit the continuing trend for massive parallelization which hinders the progress of deterministic high-level quantum chemical algorithms. In the Quantum Monte Carlo community, stochastic algorithms are ubiquitous but the discrete Fock space of quantum chemical methods is often unfamiliar, and the methods introduce new concepts required for algorithmic efficiency. In this paper, we explore these concepts and detail an algorithm used for Full Configuration Interaction Quantum Monte Carlo (FCIQMC), which is implemented and available in MOLPRO and as a standalone code, and is designed for high-level parallelism and linear-scaling with walker number. Many of the algorithms are also in use in, or can be transferred to, other stochastic quantum chemical methods and implementations. We apply these algorithms to the strongly correlated Chromium dimer, to demonstrate their efficiency and parallelism.

101 citations


Journal ArticleDOI
TL;DR: In this article, the authors present concise, computationally efficient formulas for several quantities of interest (including absorbed and scattered power, optical force (radiation pressure), and torque) in scattering calculations performed using the boundary element method (BEM).
Abstract: We present concise, computationally efficient formulas for several quantities of interest -- including absorbed and scattered power, optical force (radiation pressure), and torque -- in scattering calculations performed using the boundary-element method (BEM) [also known as the method of moments (MOM)]. Our formulas compute the quantities of interest directly from the BEM surface currents with no need ever to compute the scattered electromagnetic fields. We derive our new formulas and demonstrate their effectiveness by computing power, force, and torque in a number of example geometries. Free, open-source software implementations of our formulas are available for download online.

81 citations


Journal ArticleDOI
TL;DR: The multi-layer multi-configuration time-dependent Hartree method for bosons (ML-MCTDHB), a variational numerically exact ab initio method for studying the quantum dynamics and stationary properties of general bosonic systems, is developed.
Abstract: We develop the multi-layer multi-configuration time-dependent Hartree method for bosons (ML-MCTDHB), a variational numerically exact ab-initio method for studying the quantum dynamics and stationary properties of bosonic systems. ML-MCTDHB takes advantage of the permutation symmetry of identical bosons, which allows for investigations of the quantum dynamics from few to many-body systems. Moreover, the multi-layer feature enables ML-MCTDHB to describe mixed bosonic systems consisting of arbitrarily many species. Multi-dimensional as well as mixed-dimensional systems can be accurately and efficiently simulated via the multi-layer expansion scheme. We provide a detailed account of the underlying theory and the corresponding implementation. We also demonstrate the superior performance by applying the method to the tunneling dynamics of bosonic ensembles in a one-dimensional double well potential, where a single-species bosonic ensemble of various correlation strengths and a weakly interacting two-species bosonic ensemble are considered.

81 citations


Posted Content
TL;DR: DDSCAT 7.3 as discussed by the authors is an open-source Fortran-90 software package applying the discrete dipole approximation to calculate scattering and absorption of electromagnetic waves by targets with arbitrary geometries and complex refractive index.
Abstract: DDSCAT 7.3 is an open-source Fortran-90 software package applying the discrete dipole approximation to calculate scattering and absorption of electromagnetic waves by targets with arbitrary geometries and complex refractive index. The targets may be isolated entities (e.g., dust particles), but may also be 1-d or 2-d periodic arrays of "target unit cells", allowing calculation of absorption, scattering, and electric fields around arrays of nanostructures. The theory of the DDA and its implementation in DDSCAT is presented in Draine (1988) and Draine & Flatau (1994), and its extension to periodic structures in Draine & Flatau (2008), and efficient near-field calculations in Flatau & Draine (2012). DDSCAT 7.3 includes support for MPI, OpenMP, and the Intel Math Kernel Library (MKL). DDSCAT supports calculations for a variety of target geometries. Target materials may be both inhomogeneous and anisotropic. It is straightforward for the user to "import" arbitrary target geometries into the code. DDSCAT automatically calculates total cross sections for absorption and scattering and selected elements of the Mueller scattering intensity matrix for user-specified scattering directions. DDSCAT 7.3 can efficiently calculate E and B throughout a user-specified volume containing the target. This User Guide explains how to use DDSCAT 7.3 to carry out electromagnetic scattering calculations, including use of DDPOSTPROCESS, a Fortran-90 code to perform calculations with E and B at user-selected locations near the target. A number of changes have been made since the last release, DDSCAT 7.2.

72 citations


Journal ArticleDOI
TL;DR: For the computation of static hysteresis loops the steepest descent minimizer is faster than a Landau-Lifshitz micromagnetic solver by more than a factor of two.
Abstract: We present a steepest descent energy minimization scheme for micromagnetics. The method searches on a curve that lies on the sphere which keeps the magnitude of the magnetization vector constant. The step size is selected according to a modified Barzilai-Borwein method. Standard linear tetrahedral finite elements are used for space discretization. For the computation of static hysteresis loops the steepest descent minimizer is faster than a Landau-Lifshitz micromagnetic solver by more than a factor of two. The speed up on a graphic processor is 48 as compared to the fastest single-core CPU implementation.

66 citations
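
The step-size rule mentioned in this abstract can be illustrated with the plain Barzilai-Borwein formula, alpha = (s·s)/(s·y), where s is the change in the iterate and y the change in the gradient. The sketch below is unconstrained and uses the unmodified rule, so it shows neither the paper's modification nor its search along a curve on the sphere; `bb_minimize`, its parameters, and the test problem are hypothetical.

```python
def bb_minimize(grad, x0, n_iter=200, alpha0=1e-2, tol=1e-12):
    """Gradient descent with the plain Barzilai-Borwein (BB1) step size."""
    x = list(x0)
    g = grad(x)
    alpha = alpha0
    for _ in range(n_iter):
        if sum(gi * gi for gi in g) < tol:        # squared gradient norm
            break
        x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]     # change in the iterate
        y = [a - b for a, b in zip(g_new, g)]     # change in the gradient
        sy = sum(a * b for a, b in zip(s, y))
        ss = sum(a * a for a in s)
        # Fall back to the initial step if the curvature estimate is unusable.
        alpha = ss / sy if sy > 0.0 else alpha0
        x, g = x_new, g_new
    return x


if __name__ == "__main__":
    # Gradient of f(x) = 0.5*(x0^2 + 10*x1^2) - x0 - x1, minimized at (1, 0.1).
    grad = lambda x: [x[0] - 1.0, 10.0 * x[1] - 1.0]
    print(bb_minimize(grad, [0.0, 0.0]))
```

The rule needs only two vectors from the previous iteration and no line search, which is part of why it is cheap per step; the iteration is not monotone in the energy.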


Journal ArticleDOI
TL;DR: A class of high‐order accurate cell‐centered arbitrary Lagrangian–Eulerian (ALE) one‐step ADER weighted essentially non‐oscillatory (WENO) finite volume schemes is presented for the solution of nonlinear hyperbolic conservation laws on two‐dimensional unstructured triangular meshes.
Abstract: In this paper we present a class of high order accurate cell-centered Arbitrary-Eulerian-Lagrangian (ALE) one-step ADER-WENO finite volume schemes for the solution of nonlinear hyperbolic conservation laws on two-dimensional unstructured triangular meshes. High order of accuracy in space is achieved by a WENO reconstruction algorithm, while a local space-time Galerkin predictor allows the schemes to be high order accurate also in time by using an element-local weak formulation of the governing PDE on moving meshes. The mesh motion can be computed by choosing among three different node solvers, which are for the first time compared with each other in this article: the node velocity may be obtained i) either as an arithmetic average among the states surrounding the node, or, ii) as a solution of multiple one-dimensional half-Riemann problems around a vertex, or, iii) by solving approximately a multidimensional Riemann problem around each vertex of the mesh using the genuinely multidimensional HLL Riemann. Once the vertex velocity and thus the new node location has been determined by the node solver, the local mesh motion is then constructed by straight edges connecting the vertex positions at the old time level with the new ones at the next time level. If necessary, a rezoning step can be introduced here to overcome mesh tangling or highly deformed elements. We apply the high order algorithm presented in this paper to the Euler equations of compressible gas dynamics as well as to the ideal classical and relativistic MHD equations. We show numerical convergence results up to fifth order of accuracy in space and time together with some classical numerical test problems for each hyperbolic system under consideration.

62 citations


Journal ArticleDOI
TL;DR: An algorithm for sampling exactly from the normal distribution that reads some number of uniformly distributed random digits in a given base and generates an initial portion of the representation of a normal deviate in the same base with mean cost that scales linearly in the precision.
Abstract: An algorithm for sampling exactly from the normal distribution is given. The algorithm reads some number of uniformly distributed random digits in a given base and generates an initial portion of the representation of a normal deviate in the same base. Thereafter, uniform random digits are copied directly into the representation of the normal deviate. Thus, in contrast to existing methods, it is possible to generate normal deviates exactly rounded to any precision with a mean cost that scales linearly in the precision. The method performs no extended precision arithmetic, calls no transcendental functions, and, indeed, uses no floating point arithmetic whatsoever; it uses only simple integer operations. It can easily be adapted to sample exactly from the discrete normal distribution whose parameters are rational numbers.

54 citations


Journal ArticleDOI
TL;DR: Results for current-generation GPUs from AMD and Nvidia show that the scheme, implemented in the free code Octopus, can reach a sustained performance of up to 90 GFlops for a single GPU, representing a significant speed-up when compared to the CPU version of the code.
Abstract: We discuss the application of graphical processing units (GPUs) to accelerate real-space density functional theory (DFT) calculations. To make our implementation efficient, we have developed a scheme to expose the data parallelism available in the DFT approach; this is applied to the different procedures required for a real-space DFT calculation. We present results for current-generation GPUs from AMD and Nvidia, which show that our scheme, implemented in the free code Octopus, can reach a sustained performance of up to 90 GFlops for a single GPU, representing a significant speed-up when compared to the CPU version of the code. Moreover, for some systems our implementation can outperform a GPU Gaussian basis set code, showing that the real-space approach is a competitive alternative for DFT simulations on GPUs.

Journal ArticleDOI
TL;DR: In this paper, a set of stochastic isokinetic equations of motion is introduced that are shown to be rigorously ergodic and that can be integrated using a multiple time-stepping algorithm that can be easily implemented in existing molecular dynamics codes.
Abstract: Molecular dynamics is one of the most commonly used approaches for studying the dynamics and statistical distributions of many physical, chemical, and biological systems using atomistic or coarse-grained models. It is often the case, however, that the interparticle forces drive motion on many time scales, and the efficiency of a calculation is limited by the choice of time step, which must be sufficiently small that the fastest force components are accurately integrated. Multiple time-stepping algorithms partially alleviate this inefficiency by assigning to each time scale an appropriately chosen step-size. However, such approaches are limited by resonance phenomena, wherein motion on the fastest time scales limits the step sizes associated with slower time scales. In atomistic models of biomolecular systems, for example, resonances limit the largest time step to around 5-6 fs. In this paper, we introduce a set of stochastic isokinetic equations of motion that are shown to be rigorously ergodic and that can be integrated using a multiple time-stepping algorithm that can be easily implemented in existing molecular dynamics codes. The technique is applied to a simple, illustrative problem and then to a more realistic system, namely, a flexible water model. Using this approach outer time steps as large as 100 fs are shown to be possible.
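
The idea of assigning each time scale its own step size can be sketched with a plain deterministic two-level scheme of the reversible RESPA type: slow forces kick the momentum once per outer step, while fast forces are integrated with several inner velocity Verlet steps. This is not the stochastic isokinetic scheme introduced in the paper; the function names, the force split, and all parameter values are hypothetical.

```python
def respa_step(q, p, f_fast, f_slow, dt_outer, n_inner, mass=1.0):
    """One outer step: slow half kick, n_inner fast substeps, slow half kick."""
    dt_inner = dt_outer / n_inner
    p += 0.5 * dt_outer * f_slow(q)               # slow force, outer step
    for _ in range(n_inner):                      # fast force, inner steps
        p += 0.5 * dt_inner * f_fast(q)
        q += dt_inner * p / mass
        p += 0.5 * dt_inner * f_fast(q)
    p += 0.5 * dt_outer * f_slow(q)               # slow force, outer step
    return q, p


def relative_energy_drift(n_steps, dt_outer, n_inner, k_fast=100.0, k_slow=1.0):
    """Relative energy error for one particle on a stiff and a soft spring."""
    energy = lambda q, p: 0.5 * p * p + 0.5 * (k_fast + k_slow) * q * q
    q, p = 1.0, 0.0
    e0 = energy(q, p)
    for _ in range(n_steps):
        q, p = respa_step(q, p, lambda x: -k_fast * x, lambda x: -k_slow * x,
                          dt_outer, n_inner)
    return abs(energy(q, p) - e0) / e0


if __name__ == "__main__":
    print(relative_energy_drift(2000, dt_outer=0.05, n_inner=10))
```

In a scheme of this kind the outer step cannot be increased freely: once it approaches a fraction of the fastest period, the resonances described in the abstract appear, which is the limitation the isokinetic approach is meant to overcome.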

Journal ArticleDOI
TL;DR: This work presents a numerical model for predicting the performance of ionic wind devices or electrostatic fluid accelerators, whose main benefit is the ability to accurately predict the amount of charge injected at the corona electrode.
Abstract: Ionic wind devices or electrostatic fluid accelerators are becoming of increasing interest as tools for thermal management, in particular for semiconductor devices. In this work, we present a numerical model for predicting the performance of such devices, whose main benefit is the ability to accurately predict the amount of charge injected at the corona electrode. Our multiphysics numerical model consists of a highly nonlinear strongly coupled set of PDEs including the Navier-Stokes equations for fluid flow, Poisson's equation for electrostatic potential, charge continuity and heat transfer equations. To solve this system we employ a staggered solution algorithm that generalizes Gummel's algorithm for charge transport in semiconductors. Predictions of our simulations are validated by comparison with experimental measurements and are shown to closely match. Finally, our simulation tool is used to estimate the effectiveness of the design of an electrohydrodynamic cooling apparatus for power electronics applications.

Journal ArticleDOI
TL;DR: In this paper, an explicitly correlated plane wave basis for periodic wave function expansions at the level of second-order Møller-Plesset perturbation theory (MP2) was investigated and compared to conventional MP2 theory in a finite homogeneous electron gas model.
Abstract: We present an investigation into the use of an explicitly correlated plane wave basis for periodic wavefunction expansions at the level of second-order Møller-Plesset perturbation theory (MP2). The convergence of the electronic correlation energy with respect to the one-electron basis set is investigated and compared to conventional MP2 theory in a finite homogeneous electron gas model. In addition to the widely used Slater-type geminal correlation factor, we also derive and investigate a novel correlation factor that we term Yukawa-Coulomb. The Yukawa-Coulomb correlation factor is motivated by analytic results for two electrons in a box and allows for a further improved convergence of the correlation energies with respect to the employed basis set. We find the combination of the infinitely delocalized plane waves and local short-ranged geminals provides a complementary, and rapidly convergent basis for the description of periodic wavefunctions. We hope that this approach will expand the scope of discrete wavefunction expansions in periodic systems.

Journal ArticleDOI
TL;DR: This paper proposes error estimation and controlled-fidelity model reduction methods based on Path-Space Information Theory, combined with statistical parametric estimation of rates for non-equilibrium stationary processes, and proposes an asymptotically equivalent method related to maximum likelihood estimators for stochastic processes.
Abstract: In this paper we focus on the development of new methods suitable for efficient and reliable coarse-graining of non-equilibrium molecular systems. In this context, we propose error estimation and controlled-fidelity model reduction methods based on Path-Space Information Theory, and combine it with statistical parametric estimation of rates for non-equilibrium stationary processes. The approach we propose extends the applicability of existing information-based methods for deriving parametrized coarse-grained models to Non-Equilibrium systems with Stationary States (NESS). In the context of coarse-graining it allows for constructing optimal parametrized Markovian coarse-grained dynamics, by minimizing information loss (due to coarse-graining) on the path space. Furthermore, the associated path-space Fisher Information Matrix can provide confidence intervals for the corresponding parameter estimators. We demonstrate the proposed coarse-graining method in a non-equilibrium system with diffusing interacting particles, driven by out-of-equilibrium boundary conditions.

Journal ArticleDOI
TL;DR: Libsharp as discussed by the authors is a code library for spherical harmonic transforms (SHTs), which evolved from the libpsht library, addressing several of its shortcomings, such as adding MPI support for distributed memory systems and SHTs of fields with arbitrary spin, but also supporting new developments in CPU instruction sets like AVX or fused multiply-accumulate (FMA) instructions.
Abstract: We present libsharp, a code library for spherical harmonic transforms (SHTs), which evolved from the libpsht library, addressing several of its shortcomings, such as adding MPI support for distributed memory systems and SHTs of fields with arbitrary spin, but also supporting new developments in CPU instruction sets like the Advanced Vector Extensions (AVX) or fused multiply-accumulate (FMA) instructions. The library is implemented in portable C99 and provides an interface that can be easily accessed from other programming languages such as C++, Fortran, Python etc. Generally, libsharp's performance is at least on par with that of its predecessor; however, significant improvements were made to the algorithms for scalar SHTs, which are roughly twice as fast when using the same CPU capabilities. The library is available at this http URL under the terms of the GNU General Public License.

Posted Content
TL;DR: HOOMD-blue as discussed by the authors is a particle simulation engine designed for nano- and colloidal-scale molecular dynamics and hard particle Monte Carlo simulations, which has been actively developed since March 2007 and has been available as open source since August 2008.
Abstract: HOOMD-blue is a particle simulation engine designed for nano- and colloidal-scale molecular dynamics and hard particle Monte Carlo simulations. It has been actively developed since March 2007 and available open source since August 2008. HOOMD-blue is a Python package with a high performance C++/CUDA backend that we built from the ground up for GPU acceleration. The Python interface allows users to combine HOOMD-blue with other packages in the Python ecosystem to create simulation and analysis workflows. We employ software engineering practices to develop, test, maintain, and expand the code.

Journal ArticleDOI
TL;DR: This paper uses the Adaptive Two-Regime Method for an in-depth study of front propagation in a stochastic reaction-diffusion system which has its mean-field model given in terms of the Fisher equation.
Abstract: The Adaptive Two-Regime Method (ATRM) is developed for hybrid (multiscale) stochastic simulation of reaction-diffusion problems. It efficiently couples detailed Brownian dynamics simulations with coarser lattice-based models. The ATRM is a generalization of the previously developed Two-Regime Method [Flegg et al., Journal of the Royal Society Interface, 2012] to multiscale problems which require a dynamic selection of regions where detailed Brownian dynamics simulation is used. Typical applications include front propagation and spatio-temporal oscillations. In this paper, the ATRM is used for an in-depth study of front propagation in a stochastic reaction-diffusion system which has its mean-field model given in terms of the Fisher equation [Fisher, Annals of Eugenics, 1937]. It exhibits a travelling reaction front which is sensitive to stochastic fluctuations at the leading edge of the wavefront. Previous studies into stochastic effects on the Fisher wave propagation speed have focused on lattice-based models, but there has been limited progress using off-lattice (Brownian dynamics) models, which suffer due to their high computational cost, particularly at the high molecular numbers that are necessary to approach the Fisher mean-field model. By modelling only the wavefront itself with the off-lattice model, it is shown that the ATRM leads to the same Fisher wave results as purely off-lattice models, but at a fraction of the computational cost. The error analysis of the ATRM is also presented for a morphogen gradient model.

Journal ArticleDOI
TL;DR: This article derives a family of extended variable integrators for the Generalized Langevin equation with a positive Prony series memory kernel using stability and error analysis and implements the corresponding numerical algorithm in the LAMMPS MD software package.
Abstract: Generalized Langevin dynamics (GLD) arise in the modeling of a number of systems, ranging from structured fluids that exhibit a viscoelastic mechanical response, to biological systems, and other media that exhibit anomalous diffusive phenomena. Molecular dynamics (MD) simulations that include GLD in conjunction with external and/or pairwise forces require the development of numerical integrators that are efficient, stable, and have known convergence properties. In this article, we derive a family of extended variable integrators for the Generalized Langevin equation (GLE) with a positive Prony series memory kernel. Using stability and error analysis, we identify a superlative choice of parameters and implement the corresponding numerical algorithm in the LAMMPS MD software package. Salient features of the algorithm include exact conservation of the first and second moments of the equilibrium velocity distribution in some important cases, stable behavior in the limit of conventional Langevin dynamics, and the use of a convolution-free formalism that obviates the need for explicit storage of the time history of particle velocities. Capability is demonstrated with respect to accuracy in numerous canonical examples, stability in certain limits, and an exemplary application in which the effect of a harmonic confining potential is mapped onto a memory kernel.

Posted Content
TL;DR: In this paper, the authors describe the architectural modifications required to implement peer-to-peer access to NVIDIA Fermi- and Kepler-class GPUs on an FPGA-based cluster interconnect.
Abstract: Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, basically by avoiding staging to host memory, they require specific hardware features which are not available on current generation network adapters. In this paper we describe the architectural modifications required to implement peer-to-peer access to NVIDIA Fermi- and Kepler-class GPUs on an FPGA-based cluster interconnect. Besides, the current software implementation, which integrates this feature by minimally extending the RDMA programming model, is discussed, as well as some issues raised while employing it in a higher level API like MPI. Finally, the current limits of the technique are studied by analyzing the performance improvements on low-level benchmarks and on two GPU-accelerated applications, showing when and how they seem to benefit from the GPU peer-to-peer method.

Journal ArticleDOI
TL;DR: In this article, the origin of the fixed-node errors in quantum Monte Carlo calculations is investigated, and the key features which affect the fixed-node errors are shown to be the differences in electron density and the degree of node nonlinearity.
Abstract: We elucidate the origin of large differences (two-fold or more) in the fixed-node errors between the first- vs second-row systems for single-configuration trial wave functions in quantum Monte Carlo calculations. This significant difference in the fixed-node biases is studied across a set of atoms, molecules, and also Si, C solid crystals. The analysis is done over valence isoelectronic systems that share similar correlation energies, bond patterns, geometries, ground states, and symmetries. We show that the key features which affect the fixed-node errors are the differences in electron density and the degree of node nonlinearity. The findings reveal how the accuracy of the quantum Monte Carlo varies across a variety of systems, provide new perspectives on the origins of the fixed-node biases in electronic structure calculations of molecular and condensed systems, and carry implications for pseudopotential constructions for heavy elements.

Journal ArticleDOI
TL;DR: In this paper, the authors present a computational framework for quasi-static brittle fracture in 3D solids, based on the concept of configurational mechanics, consistent with Griffith's theory.
Abstract: This paper presents a computational framework for quasi-static brittle fracture in three-dimensional solids. The paper sets out the theoretical basis for determining the initiation and direction of propagating cracks based on the concept of configurational mechanics, consistent with Griffith's theory. Resolution of the propagating crack by the finite element mesh is achieved by restricting cracks to element faces and adapting the mesh to align it with the predicted crack direction. A local mesh improvement procedure is developed to maximise mesh quality in order to improve both accuracy and solution robustness and to remove the influence of the initial mesh on the direction of propagating cracks. An arc-length control technique is derived to enable the dissipative load path to be traced. A hierarchical hp-refinement strategy is implemented in order to improve both the approximation of displacements and crack geometry. The performance of this modelling approach is demonstrated on two numerical examples that qualitatively illustrate its ability to predict complex crack paths. All problems are three-dimensional, including a torsion problem that results in the accurate prediction of a doubly-curved crack.

Journal ArticleDOI
TL;DR: This paper presents an efficient and widely applicable method, called discrete flow mapping, for solving flow problems for high-frequency linear wave fields on triangulated surfaces, together with an application in structural dynamics: determining the vibro-acoustic response of a cast aluminium car body component.
Abstract: Energy distributions of high frequency linear wave fields are often modelled in terms of flow or transport equations with ray dynamics given by a Hamiltonian vector field in phase space. Applications arise in underwater and room acoustics, vibro-acoustics, seismology, electromagnetics, and quantum mechanics. Related flow problems based on general conservation laws are used, for example, in weather forecasting or molecular dynamics simulations. Solutions to these flow equations are often large scale, complex and high-dimensional, leading to formidable challenges for numerical approximation methods. This paper presents an efficient and widely applicable method, called discrete flow mapping, for solving such problems on triangulated surfaces. An application in structural dynamics - determining the vibro-acoustic response of a cast aluminium car body component - is presented.

Posted Content
TL;DR: In this paper, the relationship between current sources and the resulting electromagnetic waves in FDTD simulations is discussed, including the effects of dispersion and discretization, and a simple technique to separate incident and scattered fields is described to compensate for imperfect equivalent currents.
Abstract: This chapter discusses the relationships between current sources and the resulting electromagnetic waves in FDTD simulations. First, the "total-field/scattered-field" approach to creating incident plane waves is reviewed and seen to be a special case of the well-known principle of equivalence in electromagnetism: this can be used to construct "equivalent" current sources for any desired incident field, including waveguide modes. The effects of dispersion and discretization are discussed, and a simple technique to separate incident and scattered fields is described in order to compensate for imperfect equivalent currents. The important concept of the local density of states (LDOS) is reviewed, which elucidates the relationship between current sources and the resulting fields, including enhancement of the LDOS via mode cutoffs (Van Hove singularities) and resonant cavities (Purcell enhancement). We also address various other source techniques such as covering a wide range of frequencies and incident angles in a small number of simulations for waves incident on a periodic surface, sources to excite eigenmodes in rectangular supercells of periodic systems, moving sources, and thermal sources via a Monte Carlo/Langevin approach.

Posted Content
TL;DR: In this paper, the authors study gradient-based methods for solving the direct minimization problem by constructing new iterations along the gradient on the Stiefel manifold, which can outperform SCF consistently on many practically large systems.
Abstract: Density functional theory (DFT) in electronic structure calculations can be formulated as either a nonlinear eigenvalue problem or a direct minimization problem. The most widely used approach for solving the former is the so-called self-consistent field (SCF) iteration. A common observation is that the convergence of SCF is not well understood theoretically, while approaches with convergence guarantees for solving the latter are often not numerically competitive with SCF. In this paper, we study gradient-type methods for solving the direct minimization problem by constructing new iterations along the gradient on the Stiefel manifold. Global convergence (i.e., convergence to a stationary point from any initial solution) as well as the local convergence rate follow directly from the standard theory for optimization on manifolds. A major computational advantage is that the solution of linear eigenvalue problems is no longer needed. The main costs of our approaches arise from assembling the total energy functional and its gradient and from the projection onto the manifold. These tasks are cheaper than eigenvalue computation, and they are often more suitable for parallelization as long as the evaluation of the total energy functional and its gradient is efficient. Numerical results show that they can outperform SCF consistently on many practically large systems.
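The idea of iterating along the gradient while staying on the Stiefel manifold can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy quadratic energy E(X) = tr(XᵀAX) stands in for the DFT total energy functional, the QR factorization is one common choice of retraction onto the manifold, and the fixed step size `tau` and the function names are hypothetical.

```python
import numpy as np

def energy(A, X):
    # Toy stand-in for the total energy: E(X) = tr(X^T A X).
    return float(np.trace(X.T @ A @ X))

def riemannian_gradient(A, X):
    # Euclidean gradient of the toy energy, projected onto the tangent
    # space of the Stiefel manifold {X : X^T X = I} at X.
    G = 2.0 * A @ X
    XtG = X.T @ G
    return G - X @ (0.5 * (XtG + XtG.T))

def qr_retraction(Y):
    # Map a full-rank matrix back onto the manifold via thin QR.
    Q, R = np.linalg.qr(Y)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs  # flip column signs so that diag(R) is nonnegative

def stiefel_gradient_descent(A, X0, tau, n_iter):
    # Fixed-step gradient descent; no eigenvalue problem is solved.
    X = X0
    for _ in range(n_iter):
        X = qr_retraction(X - tau * riemannian_gradient(A, X))
    return X

n, p = 20, 3
A = np.diag(np.arange(1.0, n + 1.0))       # known spectrum 1, 2, ..., n
rng = np.random.default_rng(0)
X0 = qr_retraction(rng.standard_normal((n, p)))
tau = 0.2 / np.linalg.norm(A, 2)           # assumed conservative step size
X = stiefel_gradient_descent(A, X0, tau, 5000)
print(energy(A, X0), energy(A, X))
```

For this toy energy the minimum over orthonormal X is the sum of the p smallest eigenvalues of A, which gives a simple check that the iteration reaches a minimizer using only matrix products and a QR factorization.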

Journal ArticleDOI
TL;DR: Two schemes that enable efficient time-scale separation in ab initio calculations are presented: one based on fragment decomposition and the other on range separation of the Coulomb operator in the electronic Hamiltonian.
Abstract: Multiple time-scale algorithms exploit the natural separation of time-scales in chemical systems to greatly improve the efficiency of molecular dynamics simulations. Although the utility of these methods in systems where the interactions are described by empirical potentials is now well established, their application to ab initio molecular dynamics calculations has been limited by difficulties associated with splitting the ab initio potential into fast and slowly varying components. Here we show that such a time-scale separation is possible using two different schemes: one based on fragment decomposition and the other on range separation of the Coulomb operator in the electronic Hamiltonian. We demonstrate for both water clusters and a solvated hydroxide ion that multiple time-scale molecular dynamics allows for outer time steps of 2.5 fs, which are as large as those obtained when such schemes are applied to empirical potentials, while still allowing for bonds to be broken and reformed throughout the dynamics. This permits computational speedups of up to 4.4x, compared to standard Born-Oppenheimer ab initio molecular dynamics with a 0.5 fs time step, while maintaining the same energy conservation and accuracy.
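The fast/slow force splitting behind multiple time-scale integration can be sketched on a toy problem. The code below is a generic reversible multiple time-step (r-RESPA style) velocity Verlet step, not the authors' ab initio scheme: the one-dimensional particle, the force constants `K_FAST` and `K_SLOW`, the reduced units, and the inner/outer step ratio of 5 are all assumptions chosen for illustration.

```python
K_FAST = 1.0    # stiff harmonic term, integrated with the small inner step
K_SLOW = 0.01   # weak quartic term, integrated with the large outer step

def fast_force(x):
    return -K_FAST * x

def slow_force(x):
    return -K_SLOW * x ** 3

def total_energy(x, v, mass=1.0):
    return 0.5 * mass * v * v + 0.5 * K_FAST * x * x + 0.25 * K_SLOW * x ** 4

def respa_step(x, v, dt_outer, n_inner, mass=1.0):
    # Half kick from the slow force over the outer step.
    v += 0.5 * dt_outer * slow_force(x) / mass
    # Inner velocity Verlet loop that uses only the fast force.
    dt_inner = dt_outer / n_inner
    for _ in range(n_inner):
        v += 0.5 * dt_inner * fast_force(x) / mass
        x += dt_inner * v
        v += 0.5 * dt_inner * fast_force(x) / mass
    # Closing half kick from the slow force at the new position.
    v += 0.5 * dt_outer * slow_force(x) / mass
    return x, v

def run(n_outer, dt_outer=0.25, n_inner=5):
    x, v = 1.0, 0.0
    energies = [total_energy(x, v)]
    for _ in range(n_outer):
        x, v = respa_step(x, v, dt_outer, n_inner)
        energies.append(total_energy(x, v))
    return energies

energies = run(2000)
max_drift = max(abs(e - energies[0]) for e in energies) / energies[0]
print(max_drift)
```

In a real application the slow force is the expensive one, so evaluating it only at the outer step is where the saving comes from. A production code would also cache the closing slow-force evaluation and reuse it as the opening kick of the next step.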

Posted Content
TL;DR: For real-valued, nonnegative image reconstruction, the image of interest is shown to be an optimal solution of the formulated l1 minimization in the noise-free case, and the proposed approach is fast, accurate and robust to measurement noise.
Abstract: Phase retrieval refers to a classical nonconvex problem of recovering a signal from its Fourier magnitude measurements. Inspired by the compressed sensing technique, signal sparsity is exploited in recent studies of phase retrieval to reduce the required number of measurements, known as compressive phase retrieval (CPR). In this paper, l1 minimization problems are formulated for CPR to exploit the signal sparsity, and alternating direction algorithms are presented for solving them. For real-valued, nonnegative image reconstruction, the image of interest is shown to be an optimal solution of the formulated l1 minimization in the noise-free case. Numerical simulations demonstrate that the proposed approach is fast, accurate and robust to measurement noise.

Journal ArticleDOI
TL;DR: A new tracking algorithm based on the Hough transform will be evaluated for the first time on a multi-core Intel Xeon E5-2697v2 CPU, an NVIDIA Tesla K20c GPU, and an Intel Xeon Phi 7120 coprocessor.
Abstract: Recent innovations focused around parallel processing, either through systems containing multiple processors or processors containing multiple cores, hold great promise for enhancing the performance of the trigger at the LHC and extending its physics program. The flexibility of the CMS/ATLAS trigger system allows for easy integration of computational accelerators, such as NVIDIA's Tesla Graphics Processing Unit (GPU) or Intel's Xeon Phi, in the High Level Trigger. These accelerators have the potential to provide faster or more energy efficient event selection, thus opening up possibilities for new complex triggers that were not previously feasible. At the same time, it is crucial to explore the performance limits achievable on the latest generation multicore CPUs with the use of the best software optimization methods. In this article, a new tracking algorithm based on the Hough transform will be evaluated for the first time on a multi-core Intel Xeon E5-2697v2 CPU, an NVIDIA Tesla K20c GPU, and an Intel Xeon Phi 7120 coprocessor. Preliminary time performance will be presented.

Journal ArticleDOI
TL;DR: A generic technique is presented for evaluating singular and nonsingular 4-dimensional integrals over triangle-product domains; it generalizes the Taylor-Duffy approach and improves its efficiency by showing how the dimension of the final numerical integral may often be reduced by one.
Abstract: We present a generic technique, automated by computer-algebra systems and available as open-source software [scuff-em], for efficient numerical evaluation of a large family of singular and nonsingular 4-dimensional integrals over triangle-product domains, such as those arising in the boundary-element method (BEM) of computational electromagnetism. To date, practical implementation of BEM solvers has often required the aggregation of multiple disparate integral-evaluation schemes to treat all of the distinct types of integrals needed for a given BEM formulation; in contrast, our technique allows many different types of integrals to be handled by the same algorithm and the same code implementation. Our method is a significant generalization of the Taylor-Duffy approach [Taylor2003, Duffy1982], which was originally presented for just a single type of integrand; in addition to generalizing this technique to a broad class of integrands, we also achieve a significant improvement in its efficiency by showing how the dimension of the final numerical integral may often be reduced by one. In particular, if $n$ is the number of common vertices between the two triangles, in many cases we can reduce the dimension of the integral from $4-n$ to $3-n$, obtaining a closed-form analytical result for $n=3$ (the common-triangle case).

Journal ArticleDOI
TL;DR: This paper presents a Graphics Processing Unit (GPU) implementation of the Semiclassical Initial Value Representation (SC-IVR) propagator for vibrational molecular spectroscopy calculations, which significantly reduces computational time and power consumption; semiclassical GPU calculations are shown to be environmentally friendly.
Abstract: This paper presents a Graphics Processing Unit (GPU) implementation of the Semiclassical Initial Value Representation (SC-IVR) propagator for vibrational molecular spectroscopy calculations. The time-averaging formulation of the SC-IVR for power spectrum calculations is employed. Details about the GPU implementation of the semiclassical code are provided. Four molecules with an increasing number of atoms are considered, and the GPU-calculated vibrational frequencies perfectly match the benchmark values. The computational time scaling of two GPUs (NVIDIA Tesla C2075 and Kepler K20) versus two CPUs (Intel Core i5 and Intel Xeon E5-2687W), respectively, and the critical issues related to the GPU implementation are discussed. The resulting reduction in computational time and power consumption is significant, and semiclassical GPU calculations are shown to be environmentally friendly.