
Showing papers on "Computation published in 2021"


Book
21 Aug 2021
TL;DR: Covers finite-volume methods for incompressible flows, computation of turbulent flows, acceleration of computations, and solution of algebraic systems of equations.
Abstract: Modeling of Continuum Mechanical Problems.- Discretization of Problem Domain.- Finite-Volume Methods.- Finite-Element Methods.- Time Discretization.- Solution of Algebraic Systems of Equations.- Properties of Numerical Methods.- Finite-Element Methods in Structural Mechanics.- Finite-Volume Methods for Incompressible Flows.- Computation of Turbulent Flows.- Acceleration of Computations.

135 citations


Journal ArticleDOI
15 Mar 2021
TL;DR: This work implements new randomized protocols that find very high quality contraction paths for arbitrary and large tensor networks, and introduces a hyper-optimization approach, where both the method applied and its algorithmic parameters are tuned during the path finding.
Abstract: Tensor networks represent the state-of-the-art in computational methods across many disciplines, including the classical simulation of quantum many-body systems and quantum circuits. Several applications of current interest give rise to tensor networks with irregular geometries. Finding the best possible contraction path for such networks is a central problem, with an exponential effect on computation time and memory footprint. In this work, we implement new randomized protocols that find very high quality contraction paths for arbitrary and large tensor networks. We test our methods on a variety of benchmarks, including the random quantum circuit instances recently implemented on Google quantum chips. We find that the paths obtained can be very close to optimal, and often many orders of magnitude better than the most established approaches. As different underlying geometries suit different methods, we also introduce a hyper-optimization approach, where both the method applied and its algorithmic parameters are tuned during the path finding. The increase in quality of contraction schemes found has significant practical implications for the simulation of quantum many-body systems and particularly for the benchmarking of new quantum chips. Concretely, we estimate a speed-up of over 10,000$\times$ compared to the original expectation for the classical simulation of the Sycamore `supremacy' circuits.
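
The practical role of such path optimizers can be illustrated with the general-purpose opt_einsum library (a stand-in here, not the authors' own tooling): its 'random-greedy' preset performs the same kind of randomized path search described above, and its path report shows the estimated cost a chosen path implies.

```python
# Illustrative sketch (not the authors' code): comparing contraction paths found
# by a deterministic greedy search vs. a randomized greedy search, using the
# opt_einsum library on a small ring of rank-3 tensors.
import numpy as np
import opt_einsum as oe

eq = "abc,cde,efg,ghi,ija->bdfhj"                 # ring of 5 tensors sharing bond indices
arrays = [np.random.rand(4, 4, 4) for _ in range(5)]

for optimizer in ("greedy", "random-greedy"):
    path, info = oe.contract_path(eq, *arrays, optimize=optimizer)
    print(optimizer)
    print(info)   # summary includes the estimated FLOP count and largest intermediate
```

For larger, irregular networks the randomized search usually reports a noticeably lower FLOP estimate than a single greedy pass, which is the effect the hyper-optimization approach above exploits.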

101 citations


Journal ArticleDOI
TL;DR: A novel GPU-accelerated placement framework DREAMPlace is proposed, by casting the analytical placement problem equivalently to training a neural network, to achieve speedup in global placement without quality degradation compared to the state-of-the-art multithreaded placer RePlAce.
Abstract: Placement for very large-scale integrated (VLSI) circuits is one of the most important steps for design closure. We propose a novel GPU-accelerated placement framework, DREAMPlace, by casting the analytical placement problem equivalently to training a neural network. Implemented on top of a widely adopted deep learning toolkit, PyTorch, with customized key kernels for wirelength and density computations, DREAMPlace can achieve around $40\times$ speedup in global placement without quality degradation compared to the state-of-the-art multithreaded placer RePlAce. We believe this work shall open up new directions for revisiting classical EDA problems with advancements in AI hardware and software.
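
The core idea, placement as gradient-based training, can be sketched in a few lines of PyTorch (a hedged toy, not the DREAMPlace kernels): cell coordinates are the trainable parameters, a log-sum-exp wirelength stands in for HPWL, and a simple out-of-die penalty plays the role of the density term.

```python
# Minimal sketch of "placement as neural-network training" (hedged; not the
# actual DREAMPlace kernels). Cell coordinates are trainable parameters, a
# smooth log-sum-exp wirelength approximates HPWL, and a soft penalty keeps
# cells inside the die; Adam performs the "training".
import torch

torch.manual_seed(0)
n_cells, n_nets, die = 200, 150, 100.0
# Each (hypothetical) net connects a random subset of cells, given as index lists.
nets = [torch.randint(0, n_cells, (torch.randint(2, 6, (1,)).item(),)) for _ in range(n_nets)]

pos = torch.nn.Parameter(torch.rand(n_cells, 2) * die)   # (x, y) per cell
opt = torch.optim.Adam([pos], lr=1.0)
gamma = 4.0                                              # smoothing temperature

def lse_wirelength(p):
    wl = 0.0
    for net in nets:
        q = p[net]                                       # pins of this net
        # log-sum-exp approximation of (max - min) per axis
        wl = wl + (gamma * (torch.logsumexp(q / gamma, 0)
                            + torch.logsumexp(-q / gamma, 0))).sum()
    return wl

for step in range(200):
    opt.zero_grad()
    density = ((pos - die / 2).abs() - die / 2).clamp(min=0).pow(2).sum()  # out-of-die penalty
    loss = lse_wirelength(pos) + 10.0 * density
    loss.backward()
    opt.step()

print("final smoothed wirelength:", lse_wirelength(pos).item())
```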

87 citations


Journal Article
TL;DR: Predictive coding converges asymptotically (and in practice, rapidly) to exact backprop gradients on arbitrary computation graphs using only local learning rules, raising the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
Abstract: The backpropagation of error (backprop) is a powerful algorithm for training machine learning architectures through end-to-end differentiation. Recently it has been shown that backprop in multilayer-perceptrons (MLPs) can be approximated using predictive coding, a biologically-plausible process theory of cortical computation which relies solely on local and Hebbian updates. The power of backprop, however, lies not in its instantiation in MLPs, but rather in the concept of automatic differentiation which allows for the optimisation of any differentiable program expressed as a computation graph. Here, we demonstrate that predictive coding converges asymptotically (and in practice rapidly) to exact backprop gradients on arbitrary computation graphs using only local learning rules. We apply this result to develop a straightforward strategy to translate core machine learning architectures into their predictive coding equivalents. We construct predictive coding CNNs, RNNs, and the more complex LSTMs, which include a non-layer-like branching internal graph structure and multiplicative interactions. Our models perform equivalently to backprop on challenging machine learning benchmarks, while utilising only local and (mostly) Hebbian plasticity. Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry, and may also contribute to the development of completely distributed neuromorphic architectures.

81 citations


Journal ArticleDOI
TL;DR: For a given DNN-based application, DNNOff first rewrites the source code to implement a special program structure supporting on-demand offloading and, at runtime, automatically determines the offloading scheme.
Abstract: Deep neural networks (DNNs) have become increasingly popular in industrial IoT scenarios. Due to high demands on computational capability, it is hard for DNN-based applications to directly run on intelligent end devices with limited resources. Computation offloading technology offers a feasible solution by offloading some computation-intensive tasks to the cloud or edges. Supporting such capability is not easy due to two aspects: (1) Adaptability: offloading should occur dynamically among computation nodes. (2) Effectiveness: it needs to be determined which parts are worth offloading. This paper proposes a novel approach, called DNNOff. For a given DNN-based application, DNNOff first rewrites the source code to implement a special program structure supporting on-demand offloading, and at runtime, automatically determines the offloading scheme. We evaluated DNNOff on a real-world intelligent application, with three DNN models. Our results show that, compared with other approaches, DNNOff reduces response time by 12.4%-66.6% on average.
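
The offloading-scheme decision itself reduces to choosing a partition point in the layer graph. The sketch below (hypothetical profiling numbers, not DNNOff's actual estimator) picks the split that minimizes device time plus transfer time plus server time.

```python
# Hedged sketch of an offloading decision for a layered DNN: choose the layer
# after which execution moves to the edge/cloud. Numbers below are hypothetical
# profiling results, not from the DNNOff paper.
local_ms  = [12, 30, 45, 45, 20, 5]          # per-layer latency on the device
remote_ms = [1.5, 4, 6, 6, 2.5, 0.7]         # per-layer latency on the edge server
act_mb    = [3.0, 1.5, 0.8, 0.4, 0.1, 0.01]  # activation size after each layer (MB)
bandwidth_mbps = 40.0                        # uplink bandwidth

def total_latency(split):
    """Layers [0, split) run on the device, layers [split, n) run remotely."""
    n = len(local_ms)
    if split == n:                  # fully local: nothing to transfer
        upload_mb = 0.0
    elif split == 0:                # fully remote: ship the raw input (assumed 4 MB)
        upload_mb = 4.0
    else:                           # ship the activation at the split point
        upload_mb = act_mb[split - 1]
    upload_ms = upload_mb * 8 / bandwidth_mbps * 1000
    return sum(local_ms[:split]) + upload_ms + sum(remote_ms[split:])

best = min(range(len(local_ms) + 1), key=total_latency)
print("offload after layer", best, "->", round(total_latency(best), 1), "ms")
```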

73 citations


Journal ArticleDOI
TL;DR: A novel framework, named LyDROO, is proposed that combines the advantages of Lyapunov optimization and deep reinforcement learning (DRL), and guarantees to satisfy all the long-term constraints by solving the per-frame MINLP subproblems that are much smaller in size.
Abstract: Opportunistic computation offloading is an effective method to improve the computation performance of mobile-edge computing (MEC) networks under dynamic edge environment. In this paper, we consider a multi-user MEC network with time-varying wireless channels and stochastic user task data arrivals in sequential time frames. In particular, we aim to design an online computation offloading algorithm to maximize the network data processing capability subject to the long-term data queue stability and average power constraints. The online algorithm is practical in the sense that the decisions for each time frame are made without the assumption of knowing the future realizations of random channel conditions and data arrivals. We formulate the problem as a multi-stage stochastic mixed integer non-linear programming (MINLP) problem that jointly determines the binary offloading (each user computes the task either locally or at the edge server) and system resource allocation decisions in sequential time frames. To address the coupling in the decisions of different time frames, we propose a novel framework, named LyDROO, that combines the advantages of Lyapunov optimization and deep reinforcement learning (DRL). Specifically, LyDROO first applies Lyapunov optimization to decouple the multi-stage stochastic MINLP into deterministic per-frame MINLP subproblems. By doing so, it guarantees to satisfy all the long-term constraints by solving the per-frame subproblems that are much smaller in size. Then, LyDROO integrates model-based optimization and model-free DRL to solve the per-frame MINLP problems with very low computational complexity. Simulation results show that under various network setups, the proposed LyDROO achieves optimal computation performance while stabilizing all queues in the system. Besides, it induces very low computation time that is particularly suitable for real-time implementation in fast fading environments.
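
The Lyapunov decoupling step can be sketched as follows (a hedged toy with synthetic rates and a brute-force per-frame solver; LyDROO's DRL actor and resource-allocation subroutine are omitted): data queues turn the long-term stability constraint into per-frame weights (V + Q_i) on each user's rate.

```python
# Hedged sketch of the Lyapunov decoupling in LyDROO-style offloading: data
# queues convert long-term stability constraints into per-frame weights, and
# each frame a (here: brute-force) subproblem picks local vs. edge execution
# per user. Rates and arrivals are synthetic; the DRL component is omitted.
import itertools, random

random.seed(0)
N, T, V = 3, 200, 20.0                 # users, frames, throughput-vs-queue weight
Q = [0.0] * N                          # data queues (one per user)

for t in range(T):
    arrival = [random.uniform(0, 2) for _ in range(N)]          # Mb arriving this frame
    r_local = [random.uniform(0.5, 1.5) for _ in range(N)]      # Mb processed if local
    r_edge  = [random.uniform(1.0, 3.0) for _ in range(N)]      # Mb processed if offloaded

    # Per-frame subproblem: choose a binary offloading vector maximizing
    # sum_i (V + Q_i) * rate_i, with at most 2 users offloading (toy constraint).
    best, best_x = -1.0, None
    for x in itertools.product([0, 1], repeat=N):
        if sum(x) > 2:
            continue
        val = sum((V + Q[i]) * (r_edge[i] if x[i] else r_local[i]) for i in range(N))
        if val > best:
            best, best_x = val, x

    for i in range(N):                 # queue update: Q <- max(Q + arrivals - served, 0)
        served = r_edge[i] if best_x[i] else r_local[i]
        Q[i] = max(Q[i] + arrival[i] - served, 0.0)

print("final queue backlogs:", [round(q, 2) for q in Q])
```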

68 citations


Journal ArticleDOI
01 Jan 2021
TL;DR: In this article, a Burgers-like equation with complex solutions is defined in Hilbert space and solved with an example in order to smooth the sonic processing of a simple turbulent flow.
Abstract: Emerging as a new field, quantum computation has reinvented the fundamentals of Computer Science and knowledge theory in a manner consistent with quantum physics. The fact that quantum computation has superior features and new phenomena compared with classical computation provides benefits in proving mathematical theories. With advances in technology, nonlinear partial differential equations are used in almost every area, and many difficulties have been overcome by the solutions of these equations. In particular, the complex solutions of the KdV and Burgers equations have been shown to be useful in modeling a simple turbulent flow. In this study, a Burgers-like equation with complex solutions is defined in Hilbert space and solved with an example. In addition, these solutions are analyzed. Thanks to the quantum Burgers-like equation, the nonlinear differential equation is solved by linearization. The change of pattern over time makes the result linear. This means that the quantum Burgers-like equation can be used to smooth sonic processing.

63 citations


Journal ArticleDOI
TL;DR: This work provides a fast algorithm for computing Jacobians for heterogeneous agents, a technique to substantially reduce dimensionality, a rapid procedure for likelihood-based estimation, a determinacy condition for the sequence space, and a method to solve nonlinear perfect-foresight transitions.
Abstract: We propose a general and highly efficient method for solving and estimating general equilibrium heterogeneous-agent models with aggregate shocks in discrete time. Our approach relies on the rapid computation of sequence-space Jacobians—the derivatives of perfect-foresight equilibrium mappings between aggregate sequences around the steady state. Our main contribution is a fast algorithm for calculating Jacobians for a large class of heterogeneous-agent problems. We combine this algorithm with a systematic approach to composing and inverting Jacobians to solve for general equilibrium impulse responses. We obtain a rapid procedure for likelihood-based estimation and computation of nonlinear perfect-foresight transitions. We apply our methods to three canonical heterogeneous-agent models: a neoclassical model, a New Keynesian model with one asset, and a New Keynesian model with two assets.
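
Numerically, the "compose and invert Jacobians" step is a single linear solve in the sequence space. The toy below uses made-up Jacobian blocks H_U and H_Z of an equilibrium condition H(U, Z) = 0 to compute a general-equilibrium impulse response dU = -H_U^{-1} H_Z dZ; the matrices are illustrative, not derived from an actual model.

```python
# Hedged numerical illustration of the sequence-space idea: once the Jacobians
# of the equilibrium condition H(U, Z) = 0 with respect to the endogenous path
# U and the shock path Z are known (here: toy matrices), the general-equilibrium
# impulse response is one linear solve: dU = -H_U^{-1} H_Z dZ.
import numpy as np

T = 50                                      # truncation horizon (time periods)
# Toy Jacobians: H_U near-identity with some intertemporal spillover,
# H_Z a simple lagged pass-through. These stand in for model-derived blocks.
H_U = np.eye(T) + 0.2 * np.eye(T, k=1) + 0.1 * np.eye(T, k=-1)
H_Z = 0.5 * np.eye(T) + 0.25 * np.eye(T, k=-1)

dZ = np.zeros(T)
dZ[0] = 1.0                                 # one-time shock at date 0
dU = -np.linalg.solve(H_U, H_Z @ dZ)        # GE impulse response of U to the shock

print("impulse response (first 5 periods):", np.round(dU[:5], 4))
```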

62 citations


Journal ArticleDOI
TL;DR: A quantum solver of contracted eigenvalue equations is introduced, the quantum analog of classical methods for the energies and reduced density matrices of ground and excited states and achieves an exponential speed-up over its classical counterpart.
Abstract: The accurate computation of ground and excited states of many-fermion quantum systems is one of the most consequential, contemporary challenges in the physical and computational sciences whose solution stands to benefit significantly from the advent of quantum computing devices. Existing methodologies using phase estimation or variational algorithms have potential drawbacks such as deep circuits requiring substantial error correction or nontrivial high-dimensional classical optimization. Here, we introduce a quantum solver of contracted eigenvalue equations, the quantum analog of classical methods for the energies and reduced density matrices of ground and excited states. The solver does not require deep circuits or difficult classical optimization and achieves an exponential speed-up over its classical counterpart. We demonstrate the algorithm through computations on both a quantum simulator and two IBM quantum processing units.

60 citations


Journal ArticleDOI
TL;DR: This work presents an optimization based method that can accurately compute the phase factors using standard double precision arithmetic operations and demonstrates the performance of this approach with applications to Hamiltonian simulation, eigenvalue filtering, and the quantum linear system problems.
Abstract: Quantum signal processing (QSP) is a powerful quantum algorithm to exactly implement matrix polynomials on quantum computers. Asymptotic analysis of quantum algorithms based on QSP has shown that asymptotically optimal results can in principle be obtained for a range of tasks, such as Hamiltonian simulation and the quantum linear system problem. A further benefit of QSP is that it uses a minimal number of ancilla qubits, which facilitates its implementation on near-to-intermediate term quantum architectures. However, there is so far no classically stable algorithm allowing computation of the phase factors that are needed to build QSP circuits. Existing methods require the use of variable precision arithmetic and can only be applied to polynomials of a relatively low degree. We present here an optimization-based method that can accurately compute the phase factors using standard double precision arithmetic operations. We demonstrate the performance of this approach with applications to Hamiltonian simulation, eigenvalue filtering, and quantum linear system problems. Our numerical results show that the optimization algorithm can find phase factors to accurately approximate polynomials of degree larger than $10\,000$ with errors below $10^{-12}$.
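
At tiny degree, the optimization idea can be sketched directly (hedged: the QSP convention below, the Chebyshev-node grid, and plain L-BFGS are illustrative choices; the paper's method uses a more careful formulation to reach degrees beyond 10,000).

```python
# Hedged, tiny-degree sketch of finding QSP phase factors by optimization.
# Convention assumed: W(x) = [[x, i*sqrt(1-x^2)], [i*sqrt(1-x^2), x]] and
# U(x, phi) = e^{i phi_0 Z} prod_k W(x) e^{i phi_k Z}; we match Re U_00 to f(x).
import numpy as np
from scipy.optimize import minimize

def qsp_value(x, phis):
    w = np.array([[x, 1j * np.sqrt(1 - x**2)],
                  [1j * np.sqrt(1 - x**2), x]])
    u = np.diag([np.exp(1j * phis[0]), np.exp(-1j * phis[0])])
    for p in phis[1:]:
        u = u @ w @ np.diag([np.exp(1j * p), np.exp(-1j * p)])
    return u[0, 0].real

degree = 3
target = lambda x: x**3                        # odd target polynomial with |f| <= 1
xs = np.cos((2 * np.arange(1, 20) - 1) * np.pi / 40)   # Chebyshev-node fitting grid

def residual(phis):
    return sum((qsp_value(x, phis) - target(x))**2 for x in xs)

phi0 = np.full(degree + 1, np.pi / 4)          # simple symmetric initial guess
res = minimize(residual, phi0, method="L-BFGS-B")
print("least-squares residual:", res.fun)
```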

59 citations


Proceedings ArticleDOI
Ang Li, Jingwei Sun, Xiao Zeng, Mi Zhang, Hai Li, Yiran Chen
15 Nov 2021
TL;DR: FedMask as discussed by the authors is a communication and computation efficient federated learning framework for on-device deep learning applications, where each device learns a sparse binary mask (i.e., 1 bit per network parameter) while keeping the parameters of each local model unchanged.
Abstract: Recent advancements in deep neural networks (DNN) enabled various mobile deep learning applications. However, it is technically challenging to locally train a DNN model due to limited data on devices like mobile phones. Federated learning (FL) is a distributed machine learning paradigm which allows for model training on decentralized data residing on devices without breaching data privacy. Hence, FL becomes a natural choice for deploying on-device deep learning applications. However, the data residing across devices is intrinsically statistically heterogeneous (i.e., non-IID data distribution) and mobile devices usually have limited communication bandwidth to transfer local updates. Such statistical heterogeneity and communication bandwidth limit are two major bottlenecks that hinder applying FL in practice. In addition, considering mobile devices usually have limited computational resources, improving computation efficiency of training and running DNNs is critical to developing on-device deep learning applications. In this paper, we present FedMask - a communication and computation efficient FL framework. By applying FedMask, each device can learn a personalized and structured sparse DNN, which can run efficiently on devices. To achieve this, each device learns a sparse binary mask (i.e., 1 bit per network parameter) while keeping the parameters of each local model unchanged; only these binary masks will be communicated between the server and the devices. Instead of learning a shared global model in classic FL, each device obtains a personalized and structured sparse model that is composed by applying the learned binary mask to the fixed parameters of the local model. Our experiments show that compared with status quo approaches, FedMask improves the inference accuracy by 28.47% and reduces the communication cost and the computation cost by 34.48X and 2.44X. FedMask also achieves 1.56X inference speedup and reduces the energy consumption by 1.78X.
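
The core trick, frozen local weights plus a learned 1-bit mask per parameter, can be sketched in PyTorch as below (hedged: a single masked layer with a straight-through estimator; FedMask's federated aggregation of masks is not shown).

```python
# Hedged sketch of FedMask's core mechanism: keep the local weights frozen and
# learn a 1-bit mask per parameter via real-valued scores with a
# straight-through estimator. Federated aggregation of masks is omitted.
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    def __init__(self, in_f, out_f):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.1,
                                   requires_grad=False)             # frozen weights
        self.score = nn.Parameter(0.01 * torch.randn(out_f, in_f))  # learnable mask scores

    def forward(self, x):
        soft = torch.sigmoid(self.score)
        hard = (soft > 0.5).float()
        mask = hard + soft - soft.detach()   # straight-through: hard forward, soft backward
        return torch.nn.functional.linear(x, self.weight * mask)

layer = MaskedLinear(16, 4)
opt = torch.optim.SGD([layer.score], lr=0.5)     # only the mask scores are trained
x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
for _ in range(50):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(layer(x), y)
    loss.backward()
    opt.step()
print("kept fraction of weights:", (torch.sigmoid(layer.score) > 0.5).float().mean().item())
```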

Journal ArticleDOI
TL;DR: In this article, the authors proposed power control algorithms for parallel computation and for successive computation in the expanded compute-and-forward (ECF) framework, to exploit the performance gain and thereby improve the system performance.
Abstract: Cell-free massive multiple-input multiple-output (MIMO) employs a large number of distributed access points (APs) to serve a small number of user equipments (UEs) via the same time/frequency resource. Due to the strong macro diversity gain, cell-free massive MIMO can considerably improve the achievable sum-rate compared to conventional cellular massive MIMO. However, the performance of cell-free massive MIMO is upper limited by inter-user interference (IUI) when employing simple maximum ratio combining (MRC) at receivers. To harness IUI, the expanded compute-and-forward (ECF) framework is adopted. In particular, we propose power control algorithms for parallel computation and for successive computation in the ECF framework, to exploit the performance gain and thereby improve the system performance. Furthermore, we propose an AP selection scheme and the application of different decoding orders for the successive computation. Finally, numerical results demonstrate that ECF frameworks outperform the conventional CF and MRC frameworks in terms of achievable sum-rate.

Proceedings Article
03 May 2021
TL;DR: GShard as mentioned in this paper is a module composed of a set of lightweight annotation APIs and an extension to the XLA compiler to enable large scale models with up to trillions of parameters, which is critical for improving the model quality in many real-world machine learning applications with vast amounts of training data and compute.
Abstract: Neural network scaling has been critical for improving the model quality in many real-world machine learning applications with vast amounts of training data and compute. Although this trend of scaling is affirmed to be a sure-fire approach for better model quality, there are challenges on the path such as the computation cost, ease of programming, and efficient implementation on parallel devices. In this paper we demonstrate conditional computation as a remedy to the above-mentioned impediments, and demonstrate its efficacy and utility. We make extensive use of GShard, a module composed of a set of lightweight annotation APIs and an extension to the XLA compiler to enable large scale models with up to trillions of parameters. GShard and conditional computation enable us to scale up a multilingual neural machine translation Transformer model with Sparsely-Gated Mixture-of-Experts. We demonstrate that such a giant model with 600 billion parameters can efficiently be trained on 2048 TPU v3 cores in 4 days to achieve far superior quality for translation from 100 languages to English compared to the prior art.
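
Conditional computation with a Sparsely-Gated Mixture-of-Experts layer can be sketched as follows (hedged, single-device toy: GShard's sharding annotations, expert capacity limits, and load-balancing loss are omitted). Each token is routed to only two of the experts, so compute grows far more slowly than parameter count.

```python
# Hedged sketch of conditional computation with a top-2 Mixture-of-Experts
# layer (single device). Each token is processed by only 2 of the experts.
import torch
import torch.nn as nn

class Top2MoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=8):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts))

    def forward(self, x):                                 # x: (tokens, d_model)
        logits = self.gate(x)
        weights, idx = logits.softmax(-1).topk(2, dim=-1) # top-2 experts per token
        weights = weights / weights.sum(-1, keepdim=True) # renormalize the pair
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(2):
                sel = idx[:, slot] == e                   # tokens routed to expert e
                if sel.any():
                    out[sel] += weights[sel, slot:slot + 1] * expert(x[sel])
        return out

tokens = torch.randn(10, 64)
print(Top2MoE()(tokens).shape)      # torch.Size([10, 64])
```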

Journal ArticleDOI
TL;DR: This paper designs online computation offloading mechanisms to minimize the time average expected task execution delay under the constraint of average energy consumption, and combines the multi-armed bandit framework for an online learning based MEC server selection algorithm.
Abstract: By offloading tasks from the mobile device (MD) to its nearby deployed access points (APs), each of which is connected to one server for task processing, computation offloading can strike a balance between MD’s task execution delay and energy consumption in mobile edge computing (MEC) systems. Considering communication and computation dynamics in MEC systems, we aim to design online computation offloading mechanisms in this paper to minimize the time average expected task execution delay under the constraint of average energy consumption. Firstly, with known current channel gains between the MD and APs as well as available computing capability at MEC servers, we leverage the Lyapunov optimization framework to make an optimal one-slot decision on MD’s transmit power allocation and MEC server selection. On this basis, we then consider a more realistic scenario, where it is difficult to capture current available computing capability at MEC servers, and combine the multi-armed bandit framework for an online learning based MEC server selection algorithm. Finally, through theoretical analyses and extensive simulations, we demonstrate the near-optimality and feasibility of our proposed algorithms, and present that our proposed algorithms fully explore the interplay between communication and computation with enriched user experience and reduced energy consumption.
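
The bandit side of the server-selection problem can be sketched with UCB1 over servers whose service quality is initially unknown (hedged toy with made-up delays; the paper's Lyapunov-based power allocation is omitted).

```python
# Hedged sketch of online MEC server selection as a multi-armed bandit (UCB1).
# "Reward" is a negated, normalized task delay; true server speeds are hidden
# from the learner.
import math, random

random.seed(1)
true_delay = [0.8, 0.5, 1.2, 0.65]          # mean delay of each (hypothetical) server
counts, means = [0] * 4, [0.0] * 4

def pull(server):                            # observe a noisy delay, return reward in [0, 1]
    d = max(0.05, random.gauss(true_delay[server], 0.1))
    return max(0.0, 1.0 - d / 2.0)

for t in range(1, 2001):
    if 0 in counts:                          # play each server once first
        s = counts.index(0)
    else:                                    # UCB1 index: mean + exploration bonus
        s = max(range(4), key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
    r = pull(s)
    counts[s] += 1
    means[s] += (r - means[s]) / counts[s]

print("pulls per server:", counts)           # should concentrate on the fastest server (index 1)
```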

Proceedings Article
Ziang Cao, Changhong Fu, Junjie Ye, Bowen Li, Yiming Li
31 Jul 2021
TL;DR: Cao et al. as mentioned in this paper proposed an efficient and effective hierarchical feature transformer (HiFT) for aerial tracking, where hierarchical similarity maps generated by multi-level convolutional layers are fed into the feature transformer to achieve the interactive fusion of spatial (shallow layers) and semantic cues (deep layers).
Abstract: Most existing Siamese-based tracking methods execute the classification and regression of the target object based on the similarity maps. However, they either employ a single map from the last convolutional layer which degrades the localization accuracy in complex scenarios or separately use multiple maps for decision making, introducing intractable computations for aerial mobile platforms. Thus, in this work, we propose an efficient and effective hierarchical feature transformer (HiFT) for aerial tracking. Hierarchical similarity maps generated by multi-level convolutional layers are fed into the feature transformer to achieve the interactive fusion of spatial (shallow layers) and semantics cues (deep layers). Consequently, not only the global contextual information can be raised, facilitating the target search, but also our end-to-end architecture with the transformer can efficiently learn the interdependencies among multi-level features, thereby discovering a tracking-tailored feature space with strong discriminability. Comprehensive evaluations on four aerial benchmarks have proven the effectiveness of HiFT. Real-world tests on the aerial platform have strongly validated its practicability with a real-time speed. Our code is available at this https URL.

Posted Content
TL;DR: In this article, the authors presented a method, classical entanglement forging, that harnesses classical resources to capture quantum correlations and double the size of the system that can be simulated on quantum hardware.
Abstract: Quantum computers are promising for simulations of chemical and physical systems, but the limited capabilities of today's quantum processors permit only small, and often approximate, simulations. Here we present a method, classical entanglement forging, that harnesses classical resources to capture quantum correlations and double the size of the system that can be simulated on quantum hardware. Shifting some of the computation to classical post-processing allows us to represent ten spin-orbitals on five qubits of an IBM Quantum processor to compute the ground state energy of the water molecule in the most accurate simulation to date. We discuss conditions for applicability of classical entanglement forging and present a roadmap for scaling to larger problems.


Proceedings ArticleDOI
TL;DR: MixFaceNets as discussed by the authors is a set of extremely efficient and high throughput models for accurate face verification, which are inspired by Mixed Depthwise Convolutional Kernels (MDCK).
Abstract: In this paper, we present a set of extremely efficient and high throughput models for accurate face verification, MixFaceNets, which are inspired by Mixed Depthwise Convolutional Kernels. Extensive experimental evaluations on the Labeled Faces in the Wild (LFW), AgeDB, MegaFace, and IARPA Janus Benchmarks IJB-B and IJB-C datasets have shown the effectiveness of our MixFaceNets for applications requiring extremely low computational complexity. Under the same level of computational complexity (≤ 500M FLOPs), our MixFaceNets outperform MobileFaceNets on all the evaluated datasets, achieving 99.60% accuracy on LFW, 97.05% accuracy on AgeDB-30, 93.60 TAR (at FAR1e-6) on MegaFace, 90.94 TAR (at FAR1e-4) on IJB-B and 93.08 TAR (at FAR1e-4) on IJB-C. With computational complexity between 500M and 1G FLOPs, our MixFaceNets achieved results comparable to the top-ranked models, while using significantly fewer FLOPs and less computational overhead, which proves the practical value of our proposed MixFaceNets. All training codes, pre-trained models, and training logs have been made available at https://github.com/fdbtrs/mixfacenets.
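
The building block named in the abstract, a mixed depthwise convolution, splits the channels into groups and applies depthwise kernels of different sizes to each group. A hedged PyTorch sketch (not the full MixFaceNets architecture):

```python
# Hedged sketch of a mixed depthwise convolution: channels are split into
# groups, each group gets a depthwise conv with a different kernel size, and
# the results are concatenated.
import torch
import torch.nn as nn

class MixDepthwiseConv(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.splits = [channels // len(kernel_sizes)] * len(kernel_sizes)
        self.splits[0] += channels - sum(self.splits)        # absorb the remainder
        self.convs = nn.ModuleList(
            nn.Conv2d(c, c, k, padding=k // 2, groups=c)     # depthwise: groups = channels
            for c, k in zip(self.splits, kernel_sizes))

    def forward(self, x):
        chunks = torch.split(x, self.splits, dim=1)
        return torch.cat([conv(c) for conv, c in zip(self.convs, chunks)], dim=1)

x = torch.randn(2, 64, 56, 56)
print(MixDepthwiseConv(64)(x).shape)         # torch.Size([2, 64, 56, 56])
```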

Journal ArticleDOI
TL;DR: In this article, the effect of buoyancy parameters along with radiation on magneto-hydrodynamic (MHD) micro-polar nano-fluid flow over a stretching/shrinking sheet is taken into consideration.

Journal ArticleDOI
TL;DR: A quantum primitive called fast inversion is introduced, which can be used as a preconditioner for solving quantum linear systems, along with two efficient approaches for computing matrix functions, based on the contour-integral formulation and the inverse transform, respectively.
Abstract: Preconditioning is the most widely used and effective way for treating ill-conditioned linear systems in the context of classical iterative linear system solvers. We introduce a quantum primitive called fast inversion, which can be used as a preconditioner for solving quantum linear systems. The key idea of fast inversion is to directly block encode a matrix inverse through a quantum circuit implementing the inversion of eigenvalues via classical arithmetics. We demonstrate the application of preconditioned linear system solvers for computing single-particle Green's functions of quantum many-body systems, which are widely used in quantum physics, chemistry, and materials science. We analyze the complexities in three scenarios: the Hubbard model, the quantum many-body Hamiltonian in the plane-wave-dual basis, and the Schwinger model. We also provide a method for performing Green's function calculation in second quantization within a fixed-particle manifold and note that this approach may be valuable for simulation more broadly. Aside from solving linear systems, fast inversion also allows us to develop fast algorithms for computing matrix functions, such as the efficient preparation of Gibbs states. We introduce two efficient approaches for such a task, based on the contour-integral formulation and the inverse transform, respectively.
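
A purely classical toy conveys why the fast-inversion preconditioner helps (hedged: no block encodings here): when A = M + C with M trivially invertible and C modest, M^{-1}A is far better conditioned than A, so a solver built on it converges much faster.

```python
# Hedged classical illustration of the preconditioning idea behind fast
# inversion: if A = M + C with M easy to invert (here: diagonal) and C small,
# then M^{-1} A is much better conditioned than A. The quantum block-encoding
# machinery of the paper is not represented.
import numpy as np

rng = np.random.default_rng(0)
n = 200
M = np.diag(np.geomspace(1.0, 1e4, n))       # ill-conditioned but trivially invertible
C = rng.standard_normal((n, n)) * 0.01       # small off-diagonal correction
A = M + C

print("cond(A)      =", f"{np.linalg.cond(A):.2e}")
print("cond(M^-1 A) =", f"{np.linalg.cond(np.linalg.solve(M, A)):.2e}")
```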

Posted ContentDOI
TL;DR: Li et al. as mentioned in this paper proposed a lightweight self-attentive network (LSAN) for sequential recommendation, where each item embedding is composed by merging a group of selected base embedding vectors derived from substantially smaller embedding matrices.
Abstract: Modern deep neural networks (DNNs) have greatly facilitated the development of sequential recommender systems by achieving state-of-the-art recommendation performance on various sequential recommendation tasks. Given a sequence of interacted items, existing DNN-based sequential recommenders commonly embed each item into a unique vector to support subsequent computations of the user interest. However, due to the potentially large number of items, the over-parameterised item embedding matrix of a sequential recommender has become a memory bottleneck for efficient deployment in resource-constrained environments, e.g., smartphones and other edge devices. Furthermore, we observe that the widely-used multi-head self-attention, though being effective in modelling sequential dependencies among items, heavily relies on redundant attention units to fully capture both global and local item-item transition patterns within a sequence. In this paper, we introduce a novel lightweight self-attentive network (LSAN) for sequential recommendation. To aggressively compress the original embedding matrix, LSAN leverages the notion of compositional embeddings, where each item embedding is composed by merging a group of selected base embedding vectors derived from substantially smaller embedding matrices. Meanwhile, to account for the intrinsic dynamics of each item, we further propose a temporal context-aware embedding composition scheme. Besides, we develop an innovative twin-attention network that alleviates the redundancy of the traditional multi-head self-attention while retaining full capacity for capturing long- and short-term (i.e., global and local) item dependencies. Comprehensive experiments demonstrate that LSAN significantly advances the accuracy and memory efficiency of existing sequential recommenders.
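
The compositional-embedding idea can be sketched with the quotient-remainder trick (hedged: this is one common composition scheme; LSAN's temporal context-aware composition and twin-attention network are not shown). Each item id indexes two small base tables whose vectors are merged, shrinking the table from n_items rows to roughly 2*sqrt(n_items).

```python
# Hedged sketch of compositional item embeddings: each item id indexes two
# small base tables (quotient and remainder of the id) and the two base
# vectors are merged element-wise.
import math
import torch
import torch.nn as nn

class CompositionalEmbedding(nn.Module):
    def __init__(self, n_items, dim):
        super().__init__()
        self.base = math.ceil(math.sqrt(n_items))
        self.quotient = nn.Embedding(self.base, dim)
        self.remainder = nn.Embedding(self.base, dim)

    def forward(self, item_ids):                       # (batch, seq_len) of item ids
        q = self.quotient(item_ids // self.base)
        r = self.remainder(item_ids % self.base)
        return q * r                                   # element-wise merge of base vectors

emb = CompositionalEmbedding(n_items=1_000_000, dim=64)
ids = torch.randint(0, 1_000_000, (8, 20))
print(emb(ids).shape)                                  # torch.Size([8, 20, 64])
# Parameter count: 2 * 1000 * 64 instead of 1_000_000 * 64.
```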

Journal ArticleDOI
TL;DR: A reduction method based on direct normal form computation for large finite element (FE) models is detailed, avoiding the computation of the complete eigenfunctions spectrum and making a direct link with the parametrisation of invariant manifolds.
Abstract: Dimensionality reduction in mechanical vibratory systems poses challenges for distributed structures including geometric nonlinearities, mainly because of the lack of invariance of the linear subspaces. A reduction method based on direct normal form computation for large finite element (FE) models is here detailed. The main advantage resides in operating directly from the physical space, hence avoiding the computation of the complete eigenfunctions spectrum. Explicit solutions are given, thus enabling a fully non-intrusive version of the reduction method. The reduced dynamics is obtained from the normal form of the geometrically nonlinear mechanical problem, free of non-resonant monomials, and truncated to the selected master coordinates, thus making a direct link with the parametrisation of invariant manifolds. The method is fully expressed with a complex-valued formalism by detailing the homological equations in a systematic manner, and the link with real-valued expressions is established. A special emphasis is put on the treatment of second-order internal resonances and the specific case of a 1:2 resonance is made explicit. Finally, applications to large-scale models of micro-electro-mechanical structures featuring 1:2 and 1:3 resonances are reported, along with considerations on computational efficiency.

Journal ArticleDOI
TL;DR: Both qualitative and quantitative evaluations were conducted to verify that the proposed I2I translation method can achieve better performance in terms of image quality, diversity and semantic similarity to the input and reference images compared to state-of-the-art works.
Abstract: Image-to-Image (I2I) translation is a hot topic in academia, and it has also been applied in real-world industry for tasks like image synthesis, super-resolution, and colorization. However, traditional I2I translation methods train on data from two or more domains together. This requires substantial computational resources. Moreover, the results are of lower quality, and they contain many more artifacts. The training process could be unstable when the data in different domains are not balanced, and mode collapse is more likely to happen. We propose a new I2I translation method that generates a new model in the target domain via a series of model transformations on a pretrained StyleGAN2 model in the source domain. After that, we propose an inversion method to achieve the conversion between an image and its latent vector. By feeding the latent vector into the generated model, we can perform I2I translation between the source domain and target domain. Both qualitative and quantitative evaluations were conducted to prove that the proposed method can achieve outstanding performance in terms of image quality, diversity and semantic similarity to the input and reference images compared to state-of-the-art works.

Journal ArticleDOI
TL;DR: In this article, a finite element model based on a new hyperbolic shear deformation theory was established to investigate the static bending, free vibration, and buckling of functionally graded sandwich plates with porosity.

Journal ArticleDOI
TL;DR: In this paper, the authors explore quantum phase estimation in its adaptive version, which exploits dynamic circuits, and compare the results to a nonadaptive implementation of the same algorithm, and demonstrate that the version of real-time quantum computing with dynamic circuits can yield results comparable to an approach involving classical asynchronous postprocessing.
Abstract: To date, quantum computation on real, physical devices has largely been limited to simple, time-ordered sequences of unitary operations followed by a final projective measurement. As hardware platforms for quantum computing continue to mature in size and capability, it is imperative to enable quantum circuits beyond their conventional construction. Here we break into the realm of dynamic quantum circuits on a superconducting-based quantum system. Dynamic quantum circuits not only involve the evolution of the quantum state throughout the computation but also periodic measurements of qubits midcircuit and concurrent processing of the resulting classical information on timescales shorter than the execution times of the circuits. Using noisy quantum hardware, we explore one of the most fundamental quantum algorithms, quantum phase estimation, in its adaptive version, which exploits dynamic circuits, and compare the results to a nonadaptive implementation of the same algorithm. We demonstrate that the version of real-time quantum computing with dynamic circuits can yield results comparable to an approach involving classical asynchronous postprocessing, thus opening the door to a new realm of available algorithms on real quantum systems.
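
The adaptive algorithm in question, iterative phase estimation with a single ancilla, measures one bit per round and feeds a classically computed phase correction into the next round, which is exactly what mid-circuit measurement and classical feedforward enable. A noiseless classical simulation (hedged sketch, no hardware or device SDK code):

```python
# Hedged classical simulation of adaptive (iterative) phase estimation: one
# ancilla, m rounds, each round measuring one bit of the eigenphase and feeding
# a classical phase correction forward. Noiseless, so each bit is deterministic.
import math, random

phi = 0.359375            # true eigenphase = 0.010111 in binary (m = 6 bits)
m = 6
bits = [0] * (m + 1)      # bits[k] will hold phi_k (1-indexed)

for k in range(m, 0, -1):                       # least-significant bit first
    # feedback angle cancels the already-measured lower bits
    feedback = -2 * math.pi * sum(bits[j] * 2 ** (k - 1 - j) for j in range(k + 1, m + 1))
    angle = 2 * math.pi * (2 ** (k - 1)) * phi + feedback
    p1 = math.sin(angle / 2) ** 2               # P(measure 1) after the final Hadamard
    bits[k] = 1 if random.random() < p1 else 0

estimate = sum(bits[k] * 2 ** (-k) for k in range(1, m + 1))
print("true phase:", phi, " estimate:", estimate)
```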

Book ChapterDOI
01 Jan 2021
TL;DR: The previous literature on quantum machine learning is reviewed and its current status is summarized, postulating that quantum computers may overtake classical computers on machine learning tasks.
Abstract: Quantum machine learning is at the intersection of two of the most sought-after research areas: quantum computing and classical machine learning. Quantum machine learning investigates how results from the quantum world can be used to solve problems from machine learning. The amount of data needed to reliably train a classical computation model is ever-growing and is reaching the limits of what normal computing devices can handle. In such a scenario, quantum computation can aid in continuing training with huge data. Quantum machine learning looks to devise learning algorithms faster than their classical counterparts. Classical machine learning is about trying to find patterns in data and using those patterns to predict further events. Quantum systems, on the other hand, produce atypical patterns which are not producible by classical systems, thereby postulating that quantum computers may overtake classical computers on machine learning tasks. Here, we review the previous literature on quantum machine learning and provide its current status.

Journal ArticleDOI
TL;DR: This work proposes a lightweight graph reordering methodology, incorporated with a GCN accelerator architecture that equips a customized cache design to fully utilize the graph-level data reuse, and proposes a mapping methodology aware of data reuse and task-level parallelism to handle various graphs inputs effectively.
Abstract: The graph convolutional network (GCN) emerges as a promising direction to learn the inductive representation in graph data commonly used in widespread applications, such as E-commerce, social networks, and knowledge graphs. However, learning from graphs is non-trivial because of its mixed computation model involving both graph analytics and neural network computing. To this end, we decompose the GCN learning into two hierarchical paradigms: graph-level and node-level computing. Such a hierarchical paradigm facilitates the software and hardware accelerations for GCN learning. We propose a lightweight graph reordering methodology, incorporated with a GCN accelerator architecture that equips a customized cache design to fully utilize the graph-level data reuse. We also propose a mapping methodology aware of data reuse and task-level parallelism to handle various graph inputs effectively. Results show that the Rubik accelerator design improves energy efficiency by 26.3x to 1375.2x compared with GPU platforms across different datasets and GCN models.

Proceedings Article
18 May 2021
TL;DR: In this paper, the complexity of computing the SHAP explanation is shown to be #P-hard for logistic regression models over fully-factorized data distributions, and even for naive Bayes distributions.
Abstract: SHAP explanations are a popular feature-attribution mechanism for explainable AI. They use game-theoretic notions to measure the influence of individual features on the prediction of a machine learning model. Despite a lot of recent interest from both academia and industry, it is not known whether SHAP explanations of common machine learning models can be computed efficiently. In this paper, we establish the complexity of computing the SHAP explanation in three important settings. First, we consider fully-factorized data distributions, and show that the complexity of computing the SHAP explanation is the same as the complexity of computing the expected value of the model. This fully-factorized setting is often used to simplify the SHAP computation, yet our results show that the computation can be intractable for commonly used models such as logistic regression. Going beyond fully-factorized distributions, we show that computing SHAP explanations is already intractable for a very simple setting: computing SHAP explanations of trivial classifiers over naive Bayes distributions. Finally, we show that even computing SHAP over the empirical distribution is #P-hard.
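
For intuition, exact SHAP values for a tiny logistic-regression model over three independent Bernoulli features can be computed by brute force, enumerating coalitions and their completions; this exponential enumeration illustrates the cost that, per the paper's #P-hardness results, cannot in general be avoided for such models. The weights and marginals below are made up.

```python
# Hedged brute-force SHAP computation for a tiny logistic-regression model over
# three independent (fully-factorized) Bernoulli features: the value of a
# coalition S is E[f(x_S, X_notS)], computed by exact enumeration.
import itertools, math

w, b = [1.5, -2.0, 0.8], 0.3                  # hypothetical logistic-regression model
p = [0.6, 0.3, 0.5]                           # marginals of the independent features
x = [1, 0, 1]                                 # instance being explained
n = 3

def f(z):
    return 1.0 / (1.0 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))

def value(S):
    """E[f(X) | X_S = x_S] under the fully-factorized distribution."""
    free = [i for i in range(n) if i not in S]
    total = 0.0
    for combo in itertools.product([0, 1], repeat=len(free)):
        z, prob = list(x), 1.0
        for i, v in zip(free, combo):
            z[i] = v
            prob *= p[i] if v == 1 else 1 - p[i]
        total += prob * f(z)
    return total

shap = []
for i in range(n):
    others = [j for j in range(n) if j != i]
    total = 0.0
    for size in range(n):
        for S in itertools.combinations(others, size):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            total += weight * (value(set(S) | {i}) - value(set(S)))
    shap.append(total)

print("SHAP values:", [round(s, 4) for s in shap])
# Efficiency check: the values should sum to f(x) - E[f(X)].
print("sum + E[f] =", round(sum(shap) + value(set()), 4), " f(x) =", round(f(x), 4))
```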

Journal ArticleDOI
TL;DR: An event-triggered heuristic dynamic programming (HDP) (λ)-based optimal control strategy is proposed that takes a long-term prediction parameter λ into account in an iterative manner, accelerating the learning rate and reducing the computation complexity.
Abstract: The heuristic dynamic programming (HDP) (λ)-based optimal control strategy, which takes a long-term prediction parameter λ into account in an iterative manner, noticeably accelerates the learning rate. The computation complexity caused by the state-associated extra variable in the λ-return value computation of the traditional value-gradient learning method can be reduced. However, as the iteration number increases, calculation costs grow dramatically, which poses a huge challenge for the optimal control process with limited bandwidth and computational units. In this article, we propose an event-triggered HDP (ETHDP) (λ) optimal control strategy for nonlinear discrete-time (NDT) systems with unknown dynamics. The iterative relation for the λ-return of the final target value is derived first. The event-triggered condition ensuring system stability is designed to reduce the computation and communication requirements. Next, we build a model-actor-critic neural network (NN) structure, in which the model NN evaluates the system state to obtain the λ-return of the current-time target value, which is used to obtain the real-time update errors of the critic NN. The event-triggered optimal control signal and the one-step-return value are approximated by the actor and critic NNs, respectively. Then, the event-trigger-based uniformly ultimately bounded (UUB) stability of the system state and NN weight errors is demonstrated by applying the Lyapunov technique. Finally, we illustrate the effectiveness of our proposed ETHDP (λ) strategy with two cases.

Journal ArticleDOI
TL;DR: In this article, materials requirements for such integrated systems are considered, with a focus on problems that hinder current progress towards practical quantum computation, and suggestions are given for how materials scientists and trapped-ion technologists can work together to develop materials-based integration and noise-mitigation strategies to enable the next generation of trapped-ion quantum computers.
Abstract: Trapped-ion quantum information processors store information in atomic ions maintained in position in free space by electric fields. Quantum logic is enacted through manipulation of the ions’ internal and shared motional quantum states using optical and microwave signals. Although trapped ions show great promise for quantum-enhanced computation, sensing and communication, materials research is needed to design traps that allow for improved performance by means of integration of system components, including optics and electronics for ion-qubit control, while minimizing the near-ubiquitous electric-field noise produced by trap-electrode surfaces. In this Review, we consider the materials requirements for such integrated systems, with a focus on problems that hinder current progress towards practical quantum computation. We give suggestions for how materials scientists and trapped-ion technologists can work together to develop materials-based integration and noise-mitigation strategies to enable the next generation of trapped-ion quantum computers. Trapped-ion qubits have great potential for quantum computation, but materials improvements are needed. This Review surveys materials opportunities to improve the performance of trapped-ion qubits, from understanding the surface science that leads to electric-field noise to developing methods for building ion traps with integrated optics and electronics.