
A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics

TLDR
Domain-aware expression templates combined with SIMD instructions are shown to provide a significant speed-up over the classical low-level style programming techniques.
About
This article is published in Computer Physics Communications. The article was published on 2017-07-01 and is currently open access. It has received 23 citations to date. The article focuses on the topics: Tensor contraction & Tensor product.


Citations
Journal ArticleDOI

A curvilinear high order finite element framework for electromechanics: From linearised electro-elasticity to massively deformable dielectric elastomers

TL;DR: In this paper, a high-order finite element implementation of the convex multi-variable electro-elasticity for large deformations large electric fields analyses and its particularisation to the case of small strains through a staggered scheme is presented.
Journal ArticleDOI

A Pipeline Computing Method of SpTV for Three-Order Tensors on CPU and GPU

TL;DR: Tensors have drawn a growing attention in many applications, such as physics, engineering science, social networks, recommended systems, and so on.
Journal ArticleDOI

On a family of numerical models for couple stress based flexoelectricity for continua and beams

TL;DR: In this article, the axial curvature vector is used as a strain gradient measure and a skew-symmetric couple stress theory is proposed to model the electric enthalpy in terms of curvature and electric field.
Journal ArticleDOI

A reduced mixed finite-element formulation for modeling the viscoelastic response of electro-active polymers at finite deformation:

TL;DR: In this article, a parameter identification procedure has been held for characterizing the widely used dielectric elastomer VHB, which has been performed using various experimental setups and parameters.
Journal ArticleDOI

Fourth-order tensor algebraic operations and matrix representation in continuum mechanics

TL;DR: A system of cyclic tensor algebra for operations involving fourth-order tensors that is objectively and consistently defined in three ways that each fall into one of three universal classes to facilitate programming for numerical computing.
References
Book

Artificial Intelligence: A Modern Approach

TL;DR: In this article, the authors present a comprehensive introduction to the theory and practice of artificial intelligence for modern applications, including game playing, planning and acting, and reinforcement learning with neural networks.
Journal ArticleDOI

Introduction to algorithms: 4. Turtle graphics

TL;DR: In this article, a language similar to logo is used to draw geometric pictures using this language and programs are developed to draw geometrical pictures using it, which is similar to the one we use in this paper.
Journal ArticleDOI

OpenMP: an industry standard API for shared-memory programming

L. Dagum, +1 more
TL;DR: At its most elemental level, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran (and separately, C and C++ to express shared memory parallelism) and leaves the base language unspecified.
Book

Automated Solution of Differential Equations by the Finite Element Method: The FEniCS Book

TL;DR: This book is a tutorial written by researchers and developers behind the FEniCS Project and explores an advanced, expressive approach to the development of mathematical software.
Frequently Asked Questions (9)
Q1. How many FLOPs are needed for the by-pair tensor contraction scheme?

For the L3 cache, a reduction on the order of 10^6, and for tensor networks not fitting in any cache, a reduction on the order of 10^7, in floating point operations is required for the by-pair tensor contraction scheme to be beneficial.

The paper presents aspects of implementation of a new high performance tensor contraction framework for the numerical analysis of coupled and multi-physics problems on streaming architectures. In addition to explicit SIMD instructions and smart expression templates, the framework introduces domain specific constructs for the tensor cross product and its associated algebra recently rediscovered by Bonet et al. [1, 2] in the context of solid mechanics. The two key ingredients of the presented expression template engine are as follows. First, the capability to mathematically transform complex chains of operations to simpler equivalent expressions, while potentially avoiding routes with higher levels of computational complexity and, second, the capability to perform a compile time depth-first search to find the optimal contraction indices of a large tensor network in order to minimise the number of floating point operations. Every aspect of the framework is examined through relevant performance benchmarks, including the impact of data parallelism on the performance of isomorphic and nonisomorphic tensor products, the FLOP and memory I/O optimality in the evaluation of tensor networks, the compilation cost and memory footprint of the framework and the performance of tensor cross product kernels. In this context, domain-aware expression templates are shown to provide a significant speed-up over the classical low-level style programming techniques.
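The depth-first search over contraction orders mentioned above can be illustrated with a small runtime sketch. This is not the framework's compile-time implementation: the names `IndexSet`, `iteration_count`, `min_pairwise_cost` and `single_nest_cost` are hypothetical, the cost model simply counts loop iterations (one multiply-add each), and every summed index is assumed to appear in exactly two tensors.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical encoding: bit i of an IndexSet is set when a tensor carries index i.
using IndexSet = std::uint32_t;

// Number of loop iterations needed to visit every combination of the indices in `set`.
inline std::uint64_t iteration_count(IndexSet set,
                                     const std::vector<std::uint64_t>& extent) {
    std::uint64_t n = 1;
    for (std::size_t i = 0; i < extent.size(); ++i) {
        if (set & (IndexSet{1} << i)) n *= extent[i];
    }
    return n;
}

// Depth-first search over every pairwise contraction order.
// Returns the smallest total iteration count found.
// Assumption: an index shared by two tensors is summed and disappears (XOR).
inline std::uint64_t min_pairwise_cost(const std::vector<IndexSet>& tensors,
                                       const std::vector<std::uint64_t>& extent) {
    if (tensors.size() < 2) return 0;  // nothing left to contract
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (std::size_t a = 0; a < tensors.size(); ++a) {
        for (std::size_t b = a + 1; b < tensors.size(); ++b) {
            // Contracting a with b loops over the union of their indices.
            const std::uint64_t step =
                iteration_count(tensors[a] | tensors[b], extent);
            // Build the reduced network: all other tensors plus the result.
            std::vector<IndexSet> rest;
            for (std::size_t k = 0; k < tensors.size(); ++k) {
                if (k != a && k != b) rest.push_back(tensors[k]);
            }
            rest.push_back(tensors[a] ^ tensors[b]);
            const std::uint64_t total = step + min_pairwise_cost(rest, extent);
            if (total < best) best = total;
        }
    }
    return best;
}

// Cost of evaluating the whole network in one loop nest over all indices.
inline std::uint64_t single_nest_cost(const std::vector<IndexSet>& tensors,
                                      const std::vector<std::uint64_t>& extent) {
    IndexSet all = 0;
    for (IndexSet t : tensors) all |= t;
    return iteration_count(all, extent);
}
```

For a chain A_ij B_jk C_kl with extents i = 10, j = 2, k = 10, l = 2, the search picks A(BC) at 80 iterations, against 400 for the single loop nest. The search itself is exhaustive, so its cost grows quickly with the number of tensors.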

To study the various aspects of the above optimisation levels, a singleton comprising one 7th order tensor A and one 8th order tensor B is considered. The goal here is to study Fastor's internal optimisation schemes with realistic compiler flags (also in order to be consistent with the other benchmarks). Further build profiling reveals that, unlike ICC and Clang, GCC stores all large variadic templates and static arrays on the stack in order to perform global optimisation for fixed indices, but does not optimise the memory I/O. A deeper insight can be gained through a comparison of the different optimisation levels presented in Table 2. Next, the compilation aspect of operation minimisation is studied.

The fundamental design principle that all tensor frameworks rely on is the concept of expression templates in C++ [13, 34, 35], which provides a powerful means for lazy or on-demand evaluation of arbitrary chained operators as well as delaying the evaluation of certain tensor algebraic operations. 
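The lazy evaluation that expression templates provide can be sketched in a few lines. The types `Expr`, `Vec` and `Sum` below are hypothetical and far simpler than any real tensor framework. They only show that `a + b + c` builds a lightweight expression object and that the arithmetic happens in a single loop on assignment.

```cpp
#include <array>
#include <cstddef>

// CRTP base: lets operator+ accept any expression type without virtual calls.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Fixed-size vector that can be constructed from any expression.
template <std::size_t N>
struct Vec : Expr<Vec<N>> {
    std::array<double, N> data{};

    Vec() = default;

    // Evaluation happens here, in one loop, with no intermediate vectors.
    template <class E>
    Vec(const Expr<E>& e) {
        for (std::size_t i = 0; i < N; ++i) data[i] = e.self()[i];
    }

    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
};

// Node representing lhs + rhs; it stores references and computes on demand.
// The references are only valid within the full expression that built them.
template <class L, class R>
struct Sum : Expr<Sum<L, R>> {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Building the expression tree costs nothing at run time beyond two references.
template <class L, class R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}
```

Writing `Vec<3> r = a + b + c;` produces a `Sum<Sum<Vec<3>, Vec<3>>, Vec<3>>` and evaluates each element once. Because the whole expression is visible as a type, a framework can also inspect and rewrite it at compile time before any evaluation takes place.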

For high order elements, nodal Lagrange basis functions with optimal nodal placements [60, 72] are chosen to guarantee the stability and p-convergence property of the basis functions.

Note that data for GCC 6.2.0 for 4 index contraction and lower is not available for optimisation level -DOPT=2, due to stall and excessive memory footprint. 

This optimisation level is indeed equivalent to writing the contraction loop nest explicitly as multiple nested for loops and relying on the compiler for further optimisations. 
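As a point of reference, an explicitly written contraction loop nest might look like the following sketch for C_ik = A_ij B_jk. The function name `contract_ij_jk` is hypothetical. The extents are template parameters, so the compiler sees fixed trip counts and is left to unroll or vectorise the loops on its own.

```cpp
#include <cstddef>

// Explicit loop nest for C_ik = sum_j A_ij * B_jk with compile-time extents.
template <std::size_t I, std::size_t J, std::size_t K>
void contract_ij_jk(const double (&A)[I][J],
                    const double (&B)[J][K],
                    double (&C)[I][K]) {
    for (std::size_t i = 0; i < I; ++i) {
        for (std::size_t k = 0; k < K; ++k) {
            double sum = 0.0;
            // Innermost loop runs over the contracted index j.
            for (std::size_t j = 0; j < J; ++j) {
                sum += A[i][j] * B[j][k];
            }
            C[i][k] = sum;
        }
    }
}
```

A higher-order contraction follows the same pattern with one `for` loop per index, which is why the depth of the nest grows with the number of distinct indices involved.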

As described in subsection 3.6, generating the Cartesian product of iteration space and further the indices of tensors metaprogrammatically can lead to an increase in compilation time. 
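To give a feel for why this costs compile time, the sketch below generates the Cartesian product of an iteration space inside a C++14 `constexpr` function. It is an illustration under assumed names (`IndexTable`, `product`, `cartesian_product`), not the metaprogram used in the paper. The table holds one row per point of the iteration space, so its size is the product of all extents.

```cpp
#include <cstddef>

// Hypothetical table type: one row of Rank indices per iteration-space point.
template <std::size_t Rank, std::size_t Total>
struct IndexTable {
    std::size_t idx[Total][Rank];
};

// Compile-time product of a list of extents.
constexpr std::size_t product() { return 1; }

template <class... Rest>
constexpr std::size_t product(std::size_t first, Rest... rest) {
    return first * product(rest...);
}

// Builds every multi-index of the iteration space, last index varying fastest.
template <std::size_t... Extents>
constexpr IndexTable<sizeof...(Extents), product(Extents...)> cartesian_product() {
    static_assert(sizeof...(Extents) > 0, "at least one extent is required");
    constexpr std::size_t rank = sizeof...(Extents);
    constexpr std::size_t total = product(Extents...);
    constexpr std::size_t extent[rank] = {Extents...};

    IndexTable<rank, total> out{};
    for (std::size_t n = 0; n < total; ++n) {
        std::size_t rem = n;
        // Decode the flat counter n into one index per dimension.
        for (std::size_t d = rank; d-- > 0;) {
            out.idx[n][d] = rem % extent[d];
            rem /= extent[d];
        }
    }
    return out;
}
```

With `constexpr auto t = cartesian_product<2, 3>();` the compiler evaluates all six rows itself. For tensors of high order the row count grows multiplicatively, so both the constant-evaluation work and the size of the resulting static data increase accordingly.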

In the next subsections, the multiple stages of designing a tensor contraction framework using modern C++ features are presented, with the point of departure being the explicit SIMD vector types.