
Showing papers on "Maxima and minima" published in 2020


Journal ArticleDOI
TL;DR: It is proved that in the proposed method, the gradient descent algorithms are not attracted to sub-optimal critical points or local minima under practical conditions on the initialization and learning rate, and that the gradient dynamics of the proposed method is not achievable by base methods with any (adaptive) learning rates.
Abstract: We propose two approaches of locally adaptive activation functions namely, layer-wise and neuron-wise locally adaptive activation functions, which improve the performance of deep and physics-informed neural networks. The local adaptation of activation function is achieved by introducing a scalable parameter in each layer (layer-wise) and for every neuron (neuron-wise) separately, and then optimizing it using a variant of stochastic gradient descent algorithm. In order to further increase the training speed, an activation slope-based slope recovery term is added in the loss function, which further accelerates convergence, thereby reducing the training cost. On the theoretical side, we prove that in the proposed method, the gradient descent algorithms are not attracted to sub-optimal critical points or local minima under practical conditions on the initialization and learning rate, and that the gradient dynamics of the proposed method is not achievable by base methods with any (adaptive) learning rates. We further show that the adaptive activation methods accelerate the convergence by implicitly multiplying conditioning matrices to the gradient of the base method without any explicit computation of the conditioning matrix and the matrix-vector product. The different adaptive activation functions are shown to induce different implicit conditioning matrices. Furthermore, the proposed methods with the slope recovery are shown to accelerate the training process.
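As a rough illustration of the layer-wise adaptive activation idea described above, the following PyTorch sketch attaches one trainable slope parameter per layer and adds a slope-recovery penalty to the loss; the scaling factor `n`, the initial value of `a`, and the exact form of the penalty are assumptions for illustration, not the authors' formulation.

```python
import torch
import torch.nn as nn

class AdaptiveTanhLayer(nn.Module):
    """Linear layer followed by tanh with a trainable, layer-wise slope."""
    def __init__(self, in_dim, out_dim, n=10.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.n = n                                  # fixed scaling factor (assumed)
        self.a = nn.Parameter(torch.tensor(0.1))    # trainable slope, one per layer

    def forward(self, x):
        return torch.tanh(self.n * self.a * self.linear(x))

def slope_recovery(layers):
    """Illustrative slope-recovery term: encourages larger average slopes."""
    s = torch.stack([torch.exp(layer.a) for layer in layers]).mean()
    return 1.0 / s

# usage: total loss = data/physics loss + slope-recovery term
layers = nn.ModuleList([AdaptiveTanhLayer(2, 20), AdaptiveTanhLayer(20, 1)])
x = torch.randn(8, 2)
out = x
for layer in layers:
    out = layer(out)
loss = out.pow(2).mean() + slope_recovery(layers)
loss.backward()
```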

159 citations


Journal ArticleDOI
TL;DR: This paper introduces Point2Mesh, a technique for reconstructing a surface mesh from an input point cloud that is robust to non-ideal conditions, and shows that shrink-wrapping a point cloud with a self-prior converges to a desirable solution.
Abstract: In this paper, we introduce Point2Mesh, a technique for reconstructing a surface mesh from an input point cloud. Instead of explicitly specifying a prior that encodes the expected shape properties, the prior is defined automatically using the input point cloud, which we refer to as a self-prior. The self-prior encapsulates reoccurring geometric repetitions from a single shape within the weights of a deep neural network. We optimize the network weights to deform an initial mesh to shrink-wrap a single input point cloud. This explicitly considers the entire reconstructed shape, since shared local kernels are calculated to fit the overall object. The convolutional kernels are optimized globally across the entire shape, which inherently encourages local-scale geometric self-similarity across the shape surface. We show that shrink-wrapping a point cloud with a self-prior converges to a desirable solution; compared to a prescribed smoothness prior, which often becomes trapped in undesirable local minima. While the performance of traditional reconstruction approaches degrades in non-ideal conditions that are often present in real world scanning, i.e., unoriented normals, noise and missing (low density) parts, Point2Mesh is robust to non-ideal conditions. We demonstrate the performance of Point2Mesh on a large variety of shapes with varying complexity.
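A heavily simplified sketch of the shrink-wrapping loop described above: a small MLP (standing in for the paper's convolutional self-prior network) predicts vertex offsets for an initial mesh, and its weights are optimized to minimize a Chamfer distance to the input point cloud. The network architecture, loss, and placeholder tensors are assumptions for illustration.

```python
import torch
import torch.nn as nn

def chamfer(a, b):
    """Symmetric Chamfer distance between two point sets of shape (N, 3) and (M, 3)."""
    d = torch.cdist(a, b)                           # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# assumed inputs: initial mesh vertices (e.g., a coarse convex hull) and the scan
init_verts = torch.rand(500, 3)                     # placeholder initial mesh vertices
point_cloud = torch.rand(2000, 3)                   # placeholder input point cloud

# the "self-prior": a network whose weights are optimized for this single shape
net = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                    nn.Linear(128, 128), nn.ReLU(),
                    nn.Linear(128, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    opt.zero_grad()
    verts = init_verts + net(init_verts)            # deform the initial mesh
    loss = chamfer(verts, point_cloud)              # fit the deformed surface to the scan
    loss.backward()
    opt.step()
```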

137 citations


Journal ArticleDOI
TL;DR: A novel convolutional neural network architecture, termed the contrast source network, is proposed to learn the noise space components of the radiation operator; together with a multiresolution strategy, it helps produce high-resolution solutions without any significant increase in computational cost.
Abstract: In this paper, we introduce a deep-learning-based framework to solve electromagnetic inverse scattering problems. This framework builds on and extends the capabilities of existing physics-based inversion algorithms. These algorithms, such as the contrast source inversion, subspace-optimization method, and their variants face a problem of getting trapped in false local minima when recovering objects with high permittivity. We propose a novel convolutional neural network architecture, termed the contrast source network, that learns the noise space components of the radiation operator. Together with the signal space components directly estimated from the data, we iteratively refine the solution and show convergence to the correct solution in cases where traditional techniques fail without any significant increase in computational time. We also propose a novel multiresolution strategy that helps in producing high resolution solutions without any significant increase in computational costs. Through extensive numerical experiments, we demonstrate the ability to recover high permittivity objects that include homogeneous, heterogeneous, and lossy scatterers.
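The signal/noise-space split mentioned above can be illustrated with a plain SVD of a (random, placeholder) measurement operator; in the paper the noise-space component is predicted by the contrast source network, which is replaced here by a zero placeholder in this hedged sketch.

```python
import numpy as np

# assumed setup: A is a discretized radiation operator mapping contrast sources
# to scattered-field data; y is the measured data vector
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 256))
y = rng.standard_normal(64)

U, s, Vt = np.linalg.svd(A, full_matrices=True)
k = 20                                   # assumed signal-space dimension (truncation)
V_sig, V_noise = Vt[:k].T, Vt[k:].T      # signal- and noise-space bases

# signal-space component estimated directly from the data (minimum-norm solution)
w_sig = V_sig @ ((U[:, :k].T @ y) / s[:k])

# noise-space component: in the paper this is predicted by the contrast source
# network; here a zero placeholder stands in for that learned correction
w_noise = V_noise @ np.zeros(V_noise.shape[1])

contrast_source = w_sig + w_noise
```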

109 citations


Journal ArticleDOI
17 Nov 2020
TL;DR: In this paper, the convergence of the natural gradient optimizer for the variational quantum eigensolver is demonstrated across multiple spin chain systems.
Abstract: This paper shows the convergence of the natural gradient optimizer for the variational quantum eigensolver across multiple spin chain systems.
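A generic sketch of a (quantum) natural-gradient update of the kind used by such optimizers: the gradient is preconditioned by a regularized metric tensor. The quadratic toy energy and the diagonal shift are illustrative assumptions, not the paper's VQE setup.

```python
import numpy as np

def natural_gradient_step(theta, grad, metric, lr=0.1, eps=1e-6):
    """One natural-gradient update: theta <- theta - lr * (F + eps*I)^-1 grad.

    `metric` is the (Fubini-Study / Fisher) metric tensor F evaluated at theta;
    the small diagonal shift eps keeps the linear solve well conditioned.
    """
    F = metric + eps * np.eye(len(theta))
    return theta - lr * np.linalg.solve(F, grad)

# toy usage with a quadratic "energy" E(theta) = 0.5 * theta^T H theta
H = np.diag([10.0, 0.1])
theta = np.array([1.0, 1.0])
for _ in range(50):
    grad = H @ theta
    metric = H            # placeholder metric; a real VQE evaluates the metric on-device
    theta = natural_gradient_step(theta, grad, metric)
```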

104 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider regularity issues for minima of non-autonomous functionals in the Calculus of Variations exhibiting non-uniform ellipticity features.
Abstract: We consider regularity issues for minima of non-autonomous functionals in the Calculus of Variations exhibiting non-uniform ellipticity features. We provide a few sharp regularity results for local minimizers that also cover the case of functionals with nearly linear growth. The analysis is carried out provided certain necessary approximation-in-energy conditions are satisfied. These are related to the occurrence of the so-called Lavrentiev phenomenon that non-autonomous functionals might exhibit, and which is a natural obstruction to regularity. In the case of vector valued problems, we concentrate on higher gradient integrability of minima. Instead, in the scalar case, we prove local Lipschitz estimates. We also present an approach via a variant of Moser’s iteration technique that allows to reduce the analysis of several non-uniformly elliptic problems to that for uniformly elliptic ones.
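As context, two standard model functionals with the non-uniform ellipticity features discussed above (nearly linear growth, and the non-autonomous double-phase energy associated with the Lavrentiev phenomenon); these are illustrative examples and not necessarily the exact class treated in the paper.

```latex
% Nearly linear growth:
\mathcal{F}_1(w) \;=\; \int_{\Omega} |Dw|\,\log\!\left(1+|Dw|\right)\,dx ,
\qquad
% Double-phase (non-autonomous) energy, with 0 \le a(\cdot) Hölder continuous and 1 < p < q:
\mathcal{F}_2(w) \;=\; \int_{\Omega} \left(|Dw|^{p} + a(x)\,|Dw|^{q}\right) dx .
```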

85 citations


Journal ArticleDOI
TL;DR: In this paper, a linear reduction diversity technique (LRD) and a local minima elimination method (MEM) are used to improve the best-so-far solution of the problem.

79 citations


Journal ArticleDOI
TL;DR: In this article, a global optimization with first-principle energy expressions of atomistic structure is proposed to identify initial stages of the edge oxidation and oxygen intercalation of graphene sheets on the Ir(111) surface.
Abstract: We propose a scheme for global optimization with first-principles energy expressions of atomistic structure. While unfolding its search, the method actively learns a surrogate model of the potential energy landscape on which it performs a number of local relaxations (exploitation) and further structural searches (exploration). Assuming Gaussian processes, deploying two separate kernel widths to better capture rough features of the energy landscape while retaining a good resolution of local minima, an acquisition function is used to decide on which of the resulting structures is the more promising and should be treated at the first-principles level. The method is demonstrated to outperform by 2 orders of magnitude a well established first-principles based evolutionary algorithm in finding surface reconstructions. Finally, global optimization with first-principles energy expressions is utilized to identify initial stages of the edge oxidation and oxygen intercalation of graphene sheets on the Ir(111) surface.
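A hedged sketch of the surrogate-plus-acquisition step described above, using scikit-learn: a Gaussian process with two RBF kernel widths is fitted to previously evaluated energies, and a lower-confidence-bound acquisition selects the next structure for a first-principles evaluation. The kernel form, length scales, and κ are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# surrogate with two kernel widths: a broad kernel for rough landscape features
# plus a narrow one to resolve individual local minima (length scales assumed)
kernel = (ConstantKernel(1.0) * RBF(length_scale=2.0)
          + ConstantKernel(1.0) * RBF(length_scale=0.2))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(15, 1))              # structures already evaluated at DFT level
E = np.sin(3 * X[:, 0]) + 0.1 * X[:, 0] ** 2      # stand-in for first-principles energies
gp.fit(X, E)

# acquisition: lower confidence bound, E_pred - kappa * sigma (kappa assumed)
Xc = np.linspace(-3, 3, 400).reshape(-1, 1)       # candidate structures
mu, sigma = gp.predict(Xc, return_std=True)
kappa = 2.0
best = Xc[np.argmin(mu - kappa * sigma)]          # next structure for a DFT evaluation
```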

75 citations


Journal ArticleDOI
TL;DR: A unified identification framework called constrained subspace method for structured state-space models (COSMOS) is presented, where the structure is defined by a user-specified linear or polynomial parametrization.
Abstract: In this article, a unified identification framework called constrained subspace method for structured state-space models (COSMOS) is presented, where the structure is defined by a user-specified linear or polynomial parametrization. The new approach operates directly on the input and output data, which differs from the traditional two-step method that first obtains a state-space realization and then estimates the system parameters. The new identification framework relies on a subspace-inspired linear regression problem which may not yield a consistent estimate in the presence of process noise. To alleviate this problem, structured and low-rank constraints are imposed on the linear regression formulation in terms of a finite set of system Markov parameters and the user-specified model parameters. The nonconvex nature of the constrained optimization problem is dealt with by transforming the problem into a difference-of-convex optimization problem, which is then handled by a sequential convex programming strategy. Numerical simulation examples show that the proposed identification method is less prone to converging to poor local minima than the classical prediction-error method initialized with random values, but at the cost of a heavier computational burden.

74 citations


Journal ArticleDOI
TL;DR: In this article, a robust design for federated learning that mitigates the effect of noise is proposed: the training problem is formulated as a parallel optimization for each node under an expectation-based model and a worst-case model, and a sampling-based successive convex approximation algorithm is used to develop a feasible training scheme that handles the unavailable maximum/minimum noise condition and the non-convexity of the objective function.

Abstract: Federated learning is a communication-efficient training process that alternates between local training at the edge devices and averaging of the updated local models at the central server. Nevertheless, perfect acquisition of the local models over wireless links is impractical due to noise, which also seriously degrades federated learning. To tackle this challenge, in this paper we propose a robust design for federated learning that mitigates the effect of noise. Considering the noise in the two aforementioned steps, we first formulate the training problem as a parallel optimization for each node under an expectation-based model and a worst-case model. Due to the non-convexity of the problem, a regularizer approximation method is proposed to make it tractable. Regarding the worst-case model, we utilize a sampling-based successive convex approximation algorithm to develop a feasible training scheme that handles the unavailable maximum/minimum noise condition and the non-convexity of the objective function. Furthermore, the convergence rates of both new designs are analyzed from a theoretical point of view. Finally, the improvement in prediction accuracy and the reduction in loss function value of the proposed designs are demonstrated via simulation.
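A toy sketch of the noisy federated-averaging setting considered above (local training, noisy model uploads, averaging at the server); the regularizer approximation and successive convex approximation steps of the proposed designs are omitted, and the linear-regression data are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim, rounds, lr, noise_std = 5, 10, 20, 0.1, 0.05

# per-node synthetic linear-regression data
A = [rng.standard_normal((50, dim)) for _ in range(num_nodes)]
b = [a @ np.ones(dim) + 0.1 * rng.standard_normal(50) for a in A]

w_global = np.zeros(dim)
for _ in range(rounds):
    uploads = []
    for a, y in zip(A, b):
        w = w_global.copy()
        for _ in range(5):                           # local gradient steps
            w -= lr * a.T @ (a @ w - y) / len(y)
        # imperfect acquisition: the server receives a noisy version of the model
        uploads.append(w + noise_std * rng.standard_normal(dim))
    w_global = np.mean(uploads, axis=0)              # noisy averaging at the server
```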

74 citations


Proceedings ArticleDOI
TL;DR: Using the statistical properties from Monte-Carlo Markov chains of images, it is shown how this code can place statistical limits on image features such as unseen binary companions.
Abstract: We present a flexible code created for imaging from the bispectrum and visibility-squared. By using a simulated annealing method, we limit the probability of converging to local chi-squared minima as can occur when traditional imaging methods are used on data sets with limited phase information. We present the results of our code used on a simulated data set utilizing a number of regularization schemes including maximum entropy. Using the statistical properties from Monte-Carlo Markov chains of images, we show how this code can place statistical limits on image features such as unseen binary companions.
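A generic simulated-annealing skeleton of the kind described above, minimizing a chi-squared misfit over image pixels; the cooling schedule, proposal move, and forward model are placeholders rather than the actual code's implementation.

```python
import numpy as np

def chi2(image, data, forward):
    """Chi-squared misfit between observables predicted from `image` and `data`."""
    return np.sum((forward(image) - data) ** 2)

def simulated_annealing(data, forward, shape, steps=20000, T0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    img = rng.random(shape)
    cost = chi2(img, data, forward)
    for k in range(steps):
        T = T0 * (1.0 - k / steps) + 1e-6               # linear cooling schedule (assumed)
        trial = img.copy()
        idx = tuple(rng.integers(0, s) for s in shape)  # perturb one random pixel
        trial[idx] = max(trial[idx] + 0.1 * rng.standard_normal(), 0.0)
        c = chi2(trial, data, forward)
        # accept downhill moves always, uphill moves with probability exp(-dC/T);
        # the resulting Metropolis chain also provides MCMC samples of images
        if c < cost or rng.random() < np.exp((cost - c) / T):
            img, cost = trial, c
    return img

# toy usage: the "observables" are a simple linear projection of the image
rng = np.random.default_rng(1)
truth = rng.random((8, 8))
forward = lambda im: im.mean(axis=0)                    # placeholder forward model
recon = simulated_annealing(forward(truth), forward, truth.shape)
```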

65 citations


Journal ArticleDOI
TL;DR: This paper proposes a unified nonlocal Laplace operator, which converges to the classical Laplacian as one of the operator parameters, the nonlocal interaction radius δ, goes to zero, and to the fractional Laplacian as δ goes to infinity; the operator thus forms a super-set of the classical Laplacian and fractional Laplace operators and has the potential to fit a broad spectrum of data sets.

Journal ArticleDOI
TL;DR: A general numerical method is introduced that constructs the pathway map, which guides the understanding of how a physical system moves on the energy landscape.
Abstract: How do we search for the entire family tree of possible intermediate states, without unwanted random guesses, starting from a stationary state on the energy landscape all the way down to energy minima? Here we introduce a general numerical method that constructs the pathway map, which guides our understanding of how a physical system moves on the energy landscape. The method identifies the transition state between energy minima and the energy barrier associated with such a state. As an example, we solve the Landau--de Gennes energy incorporating the Dirichlet boundary conditions to model a liquid crystal confined in a square box; we illustrate the basic concepts by examining the multiple stationary solutions and the connected pathway maps of the model.

Posted Content
TL;DR: This work develops a density diffusion theory (DDT) to reveal how minima selection quantitatively depends on the minima sharpness and the hyperparameters, and is the first to theoretically and empirically prove that, benefiting from the Hessian-dependent covariance of stochastic gradient noise, SGD favors flat minima exponentially more than sharp minima.
Abstract: Stochastic Gradient Descent (SGD) and its variants are mainstream methods for training deep networks in practice. SGD is known to find a flat minimum that often generalizes well. However, it is mathematically unclear how deep learning can select a flat minimum among so many minima. To answer the question quantitatively, we develop a density diffusion theory (DDT) to reveal how minima selection quantitatively depends on the minima sharpness and the hyperparameters. To the best of our knowledge, we are the first to theoretically and empirically prove that, benefiting from the Hessian-dependent covariance of stochastic gradient noise, SGD favors flat minima exponentially more than sharp minima, while Gradient Descent (GD) with injected white noise favors flat minima only polynomially more than sharp minima. We also reveal that the number of iterations required to escape from minima grows exponentially with the ratio of the batch size to the learning rate, so either a small learning rate or large-batch training requires exponentially many iterations to escape. Thus, large-batch training cannot search flat minima efficiently in a realistic computational time.
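A crude one-dimensional toy, not the paper's derivation, that illustrates the qualitative claim: with Hessian-dependent (SGD-like) gradient noise, a sharp quadratic well is escaped far faster than a flat one, whereas with curvature-independent injected noise the preference is much weaker. All parameters here are assumptions.

```python
import numpy as np

def mean_escape_steps(curvature, sgd_like, lr=0.02, base_noise=1.0,
                      barrier=0.05, trials=100, max_steps=100000):
    """Average number of noisy-gradient steps needed to leave a quadratic well
    0.5 * curvature * x^2, with escape declared at |x| = sqrt(2 * barrier / curvature).

    If sgd_like is True, the gradient-noise scale grows with the curvature
    (a crude stand-in for SGD's Hessian-dependent noise covariance); otherwise
    the noise is curvature-independent, as for GD with injected white noise.
    """
    rng = np.random.default_rng(0)
    noise = base_noise * (np.sqrt(curvature) if sgd_like else 1.0)
    x_escape = np.sqrt(2 * barrier / curvature)
    steps = []
    for _ in range(trials):
        x, t = 0.0, 0
        while abs(x) < x_escape and t < max_steps:
            x -= lr * (curvature * x + noise * rng.standard_normal())
            t += 1
        steps.append(t)
    return float(np.mean(steps))

for curvature in (1.0, 25.0):                 # flat vs sharp minimum
    print(curvature,
          mean_escape_steps(curvature, sgd_like=True),
          mean_escape_steps(curvature, sgd_like=False))
```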

Journal ArticleDOI
TL;DR: A novel machine-learning approach based on an evolutionary algorithm, namely Cuckoo search (CS), is proposed to solve the local minimum problem of ML in a radical way; it outperforms CS, ML, and other hybrid ML methods in terms of accuracy and considerably reduces computational costs compared to CS.

Journal ArticleDOI
TL;DR: For the random over-complete tensor decomposition problem, the authors show that for any small constant ε > 0, among the set of points with function values a (1+ε)-factor larger than the expectation of the function, all local maxima are approximate global maxima.
Abstract: Non-convex optimization with local search heuristics has been widely used in machine learning, achieving many state-of-the-art results. It becomes increasingly important to understand why they can work for these NP-hard problems on typical data. The landscape of many objective functions in learning has been conjectured to have the geometric property that “all local optima are (approximately) global optima”, and thus they can be solved efficiently by local search algorithms. However, establishing such property can be very difficult. In this paper, we analyze the optimization landscape of the random over-complete tensor decomposition problem, which has many applications in unsupervised learning, especially in learning latent variable models. In practice, it can be efficiently solved by gradient ascent on a non-convex objective. We show that for any small constant $\varepsilon > 0$, among the set of points with function values a $(1+\varepsilon)$-factor larger than the expectation of the function, all the local maxima are approximate global maxima. Previously, the best-known result only characterizes the geometry in small neighborhoods around the true components. Our result implies that even with an initialization that is barely better than a random guess, the gradient ascent algorithm is guaranteed to solve this problem. However, achieving such an initialization with a random guess would still require a super-polynomial number of attempts. Our main technique uses the Kac–Rice formula and random matrix theory. To the best of our knowledge, this is the first time the Kac–Rice formula has been successfully applied to counting the number of local optima of a highly structured random polynomial with dependent coefficients.
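A minimal sketch of the non-convex objective in question: projected gradient ascent of T(x, x, x) = Σ_i ⟨a_i, x⟩³ over the unit sphere for a random over-complete set of components; the dimensions and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 150                                   # dimension and number of components (n > d: over-complete)
A = rng.standard_normal((n, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)    # random unit components a_i

def f(x):
    """Tensor objective T(x, x, x) = sum_i <a_i, x>^3 for T = sum_i a_i^{(x)3}."""
    return np.sum((A @ x) ** 3)

def grad(x):
    return 3 * A.T @ ((A @ x) ** 2)

# projected gradient ascent on the unit sphere from a random start
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
for _ in range(500):
    x = x + 0.05 * grad(x)
    x /= np.linalg.norm(x)

# a good local maximum typically aligns with one of the hidden components
print("best alignment with a component:", np.max(np.abs(A @ x)))
```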

Proceedings ArticleDOI
01 Nov 2020
TL;DR: Analyzing the loss landscape, it is shown that masking and finetuning produce models that reside in minima that can be connected by a line segment with nearly constant test accuracy, confirming that masking can be utilized as an efficient alternative to finetuning.
Abstract: We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning. Extensive evaluations of masking BERT, RoBERTa, and DistilBERT on eleven diverse NLP tasks show that our masking scheme yields performance comparable to finetuning, yet has a much smaller memory footprint when several tasks need to be inferred. Intrinsic evaluations show that representations computed by our binary masked language models encode information necessary for solving downstream tasks. Analyzing the loss landscape, we show that masking and finetuning produce models that reside in minima that can be connected by a line segment with nearly constant test accuracy. This confirms that masking can be utilized as an efficient alternative to finetuning.
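A hedged sketch of learning binary masks over frozen pretrained weights with a straight-through estimator; the score initialization, threshold, and wrapping of a single linear layer are assumptions rather than the paper's exact masking scheme for BERT-style models.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Frozen pretrained linear layer whose weights are selected by a learned binary mask."""
    def __init__(self, pretrained: nn.Linear, threshold: float = 0.0):
        super().__init__()
        self.weight = nn.Parameter(pretrained.weight.detach(), requires_grad=False)
        self.bias = nn.Parameter(pretrained.bias.detach(), requires_grad=False)
        self.scores = nn.Parameter(0.01 * torch.randn_like(self.weight))  # real-valued mask logits
        self.threshold = threshold

    def forward(self, x):
        hard = (self.scores > self.threshold).float()
        # straight-through estimator: forward uses the hard 0/1 mask,
        # backward passes gradients through as if the mask were the raw scores
        mask = hard + self.scores - self.scores.detach()
        return nn.functional.linear(x, self.weight * mask, self.bias)

# usage: wrap a "pretrained" layer and train only the mask scores
layer = MaskedLinear(nn.Linear(16, 4))
opt = torch.optim.Adam([layer.scores], lr=1e-2)
x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
loss = nn.functional.cross_entropy(layer(x), y)
loss.backward()
opt.step()
```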

Journal ArticleDOI
TL;DR: In this article, the authors show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points, and that the minimizers of the cross-entropy loss overlap with the WFM of error loss.
Abstract: Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to be able to find good minimizers without getting stuck in local critical points and such minimizers are often satisfactory at avoiding overfitting. How these 2 features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic nonconvex 1- and 2-layer neural network models that learn random patterns and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message-passing algorithms that focus their search on wide flat regions of minimizers. In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessian, and their generalization performance on real data.

Proceedings Article
12 Jul 2020
TL;DR: A modified notion of flatness, normalized flat minima, is introduced that does not suffer from the known scale-dependence issues and might provide a better hierarchy in the hypothesis class; the paper also highlights a similar scale dependence in existing matrix-norm-based generalization error bounds.
Abstract: The notion of flat minima has played a key role in generalization studies of deep learning models. However, existing definitions of flatness are known to be sensitive to the rescaling of parameters. This issue suggests that previous definitions of flatness might not be a good measure of generalization, because generalization is invariant to such rescalings. In this paper, from the PAC-Bayesian perspective, we scrutinize the discussion concerning flat minima and introduce the notion of normalized flat minima, which is free from the known scale-dependence issues. Additionally, we highlight that existing matrix-norm-based generalization error bounds exhibit a scale dependence similar to that of the existing flat minima definitions. Our modified notion of flatness does not suffer from this insufficiency either, suggesting it might provide a better hierarchy in the hypothesis class.
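A small numerical illustration of the scale-dependence issue discussed above: rescaling the two layers of a ReLU network leaves the function unchanged but changes a naive weight-space flatness proxy. The proxy used here (average loss increase under random weight perturbations) is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((32, 8)), rng.standard_normal((1, 32))
X = rng.standard_normal((200, 8))
y = np.sin(X[:, 0])

def predict(W1, W2, X):
    return (np.maximum(X @ W1.T, 0.0) @ W2.T).ravel()   # 2-layer ReLU network

def loss(W1, W2):
    return np.mean((predict(W1, W2, X) - y) ** 2)

def sharpness(W1, W2, eps=1e-2, trials=50):
    """Naive flatness proxy: average loss increase under random weight perturbations."""
    base = loss(W1, W2)
    rises = [loss(W1 + eps * rng.standard_normal(W1.shape),
                  W2 + eps * rng.standard_normal(W2.shape)) - base
             for _ in range(trials)]
    return np.mean(rises)

alpha = 10.0                                            # rescaling that preserves the function
W1s, W2s = alpha * W1, W2 / alpha                       # ReLU is positively homogeneous
print(np.allclose(predict(W1, W2, X), predict(W1s, W2s, X)))  # True: same function
print(sharpness(W1, W2), sharpness(W1s, W2s))           # different "flatness" values
```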

Book ChapterDOI
23 Aug 2020
TL;DR: In this article, a non-linear quadratic program is proposed to identify regions in the parameter space that contain unique minima with guarantees that at least one of them will be the global minimum.
Abstract: An approach for estimating the pose of a camera given a set of 3D points and their corresponding 2D image projections is presented. It formulates the problem as a non-linear quadratic program and identifies regions in the parameter space that contain unique minima with guarantees that at least one of them will be the global minimum. Each regional minimum is computed with a sequential quadratic programming scheme. These premises result in an algorithm that always determines the global minima of the perspective-n-point problem for any number of input correspondences, regardless of possible coplanar arrangements of the imaged 3D points. For its implementation, the algorithm merely requires ordinary operations available in any standard off-the-shelf linear algebra library. Comparative evaluation demonstrates that the algorithm achieves state-of-the-art results at a consistently low computational cost.

Journal ArticleDOI
TL;DR: New light is shed on the smoothness of optimization problems arising in prediction error parameter estimation of linear and nonlinear systems and the use of multiple shooting as a viable solution is proposed.

Journal ArticleDOI
TL;DR: The proposed visualisation technique successfully captures the local minima properties exhibited by the neural network loss surfaces, and can be used for the purpose of fitness landscape analysis of neural networks.

Journal ArticleDOI
TL;DR: In this paper, an active learning method is proposed to automatically sample data points for constructing globally accurate reactive potential energy surfaces (PESs) using neural networks (NNs), which can alternatively minimize the negative of the squared difference surface (NSDS) given by two different NN models to actively locate the point where the PES is least confident.
Abstract: An efficient and trajectory-free active learning method is proposed to automatically sample data points for constructing globally accurate reactive potential energy surfaces (PESs) using neural networks (NNs). Although NNs do not provide the predictive variance as the Gaussian process regression does, we can alternatively minimize the negative of the squared difference surface (NSDS) given by two different NN models to actively locate the point where the PES is least confident. A batch of points in the minima of this NSDS can be iteratively added into the training set to improve the PES. The configuration space is gradually and globally covered without the need to run classical trajectory (or equivalently molecular dynamics) simulations. Through refitting the available analytical PESs of H3 and OH3 reactive systems, we demonstrate the efficiency and robustness of this new strategy, which enables fast convergence of the reactive PESs with respect to the number of points in terms of quantum scattering probabilities.
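A toy sketch of the NSDS-based active learning loop: two neural-network fits that differ only in initialization act as the committee, and the candidate geometry where their squared difference is largest (i.e., where the NSDS is minimal) is added to the training set. The 1-D surrogate "PES", scikit-learn models, and grid-based candidate pool are assumptions for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
true_pes = lambda r: (1 - np.exp(-(r - 1.5))) ** 2        # toy 1-D Morse-like "PES"

X = rng.uniform(0.8, 4.0, size=(20, 1))                   # initial training geometries
y = true_pes(X[:, 0])

for it in range(5):
    # two NN fits differing only in initialization form the committee
    nets = [MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000,
                         random_state=s).fit(X, y) for s in (0, 1)]
    cand = np.linspace(0.8, 4.0, 500).reshape(-1, 1)       # candidate pool of geometries
    nsds = -(nets[0].predict(cand) - nets[1].predict(cand)) ** 2
    pick = cand[np.argmin(nsds)]                           # least-confident geometry
    X = np.vstack([X, pick[None, :]])                      # add it to the training set
    y = np.append(y, true_pes(pick[0]))                    # with its (ab initio) energy
```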

Journal ArticleDOI
TL;DR: The experimental results show that ISECNN can increase generalization ability without decreasing prediction accuracy; compared with traditional DL and machine learning methods, the results validate the potential of ISECNN in the fault diagnosis field.

Journal ArticleDOI
TL;DR: A new human intelligence-based metaheuristic optimization technique, aggrandized class topper optimization (CTO), is proposed to solve ELD and CEED problems; the result analysis shows that the proposed algorithm provides better and more effective results in almost every test case.
Abstract: Optimization techniques are widely used to solve large and complex economic load dispatch (ELD) and combined emission economic dispatch (CEED) problems in power systems, and they can solve these problems in a short computational time. In this article, a new human intelligence-based metaheuristic optimization technique, aggrandized class topper optimization (CTO), is proposed to solve ELD and CEED problems. The proposed algorithm is an upgraded form of classical CTO in which the concept of remedial classes is incorporated to enhance the learning ability of the weak students of a class. To validate the exploration, exploitation, convergence, and local minima avoidance capabilities of the proposed algorithm, 29 benchmark functions are considered. Furthermore, seven test cases for the ELD problem and four test cases for the CEED problem are considered to test the effectiveness of the proposed algorithm on these complex problems. The result analysis shows that the proposed algorithm provides better and more effective results in almost every test case.

Journal ArticleDOI
TL;DR: This work analytically shows that the multilayered structure holds the key to optimizability: Fixing the number of parameters and increasing network depth, theNumber of stationary points in the loss function decreases, minima become more clustered in parameter space, and the trade-off between the depth and width of minima becomes less severe.
Abstract: Deep neural networks are workhorse models in machine learning with multiple layers of nonlinear functions composed in series. Their loss function is highly nonconvex, yet empirically even gradient descent minimization is sufficient to arrive at accurate and predictive models. It is hitherto unknown why deep neural networks are easily optimizable. We analyze the energy landscape of a spin glass model of deep neural networks using random matrix theory and algebraic geometry. We analytically show that the multilayered structure holds the key to optimizability: Fixing the number of parameters and increasing network depth, the number of stationary points in the loss function decreases, minima become more clustered in parameter space, and the trade-off between the depth and width of minima becomes less severe. Our analytical results are numerically verified through comparison with neural networks trained on a set of classical benchmark datasets. Our model uncovers generic design principles of machine learning models.

Journal ArticleDOI
TL;DR: It is not known how many modes a mixture of $k$ Gaussians in $d$ dimensions can have; this paper gives improved lower bounds and the first upper bound on the maximum number of modes, provided it is finite.
Abstract: Gaussian mixture models are widely used in Statistics. A fundamental aspect of these distributions is the study of the local maxima of the density, or modes. In particular, it is not known how many modes a mixture of $k$ Gaussians in $d$ dimensions can have. We give a brief account of this problem's history. Then, we give improved lower bounds and the first upper bound on the maximum number of modes, provided it is finite.

Proceedings Article
01 Jan 2020
TL;DR: It is shown that regularization seems to provide SGD with an escape route: once heuristics such as data augmentation are used, starting from a complex model (adversarial initialization) has no effect on the test accuracy.
Abstract: Several recent works have aimed to explain why severely overparameterized models, generalize well when trained by Stochastic Gradient Descent (SGD). The emergent consensus explanation has two parts: the first is that there are "no bad local minima", while the second is that SGD performs implicit regularization by having a bias towards low complexity models. We revisit both of these ideas in the context of image classification with common deep neural network architectures. Our first finding is that there exist bad global minima, i.e., models that fit the training set perfectly, yet have poor generalization. Our second finding is that given only unlabeled training data, we can easily construct initializations that will cause SGD to quickly converge to such bad global minima. For example, on CIFAR, CINIC10, and (Restricted) ImageNet, this can be achieved by starting SGD at a model derived by fitting random labels on the training data: while subsequent SGD training (with the correct labels) will reach zero training error, the resulting model will exhibit a test accuracy degradation of up to 40% compared to training from a random initialization. Finally, we show that regularization seems to provide SGD with an escape route: once heuristics such as data augmentation are used, starting from a complex model (adversarial initialization) has no effect on the test accuracy.
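A minimal sketch of the adversarial-initialization procedure described above (pretrain on random labels, then train on the true labels, and compare with training from scratch). The toy data and model are assumptions and need not reproduce the large accuracy gap reported on the image benchmarks.

```python
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 2))

def train(model, X, y, epochs=200, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.cross_entropy(model(X), y).backward()
        opt.step()
    return model

torch.manual_seed(0)
X = torch.randn(512, 20)
y_true = (X[:, 0] > 0).long()                      # toy ground-truth labels
y_rand = torch.randint(0, 2, (512,))               # random labels

# adversarial initialization: first fit random labels, then train on the true ones
adv = train(train(make_model(), X, y_rand), X, y_true)
ref = train(make_model(), X, y_true)               # baseline: train from a random init

X_test = torch.randn(512, 20)
y_test = (X_test[:, 0] > 0).long()
for name, m in (("adversarial init", adv), ("random init", ref)):
    acc = (m(X_test).argmax(1) == y_test).float().mean().item()
    print(name, acc)
```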

Proceedings Article
01 Jan 2020
TL;DR: This work proposes a more general framework to investigate the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected by introducing an inexpensive heuristic referred to as neuron alignment.
Abstract: The loss landscapes of deep neural networks are not well understood due to their high nonconvexity. Empirically, the local minima of these loss functions can be connected by a learned curve in model space, along which the loss remains nearly constant; a feature known as mode connectivity. Yet, current curve finding algorithms do not consider the influence of symmetry in the loss surface created by model weight permutations. We propose a more general framework to investigate the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected. To approximate the optimal permutation, we introduce an inexpensive heuristic referred to as neuron alignment. Neuron alignment promotes similarity between the distribution of intermediate activations of models along the curve. We provide theoretical analysis establishing the benefit of alignment to mode connectivity based on this simple heuristic. We empirically verify that the permutation given by alignment is locally optimal via a proximal alternating minimization scheme. Empirically, optimizing the weight permutation is critical for efficiently learning a simple, planar, low-loss curve between networks that successfully generalizes. Our alignment method can significantly alleviate the recently identified robust loss barrier on the path connecting two adversarial robust models and find more robust and accurate models on the path.
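A hedged, simplified version of activation-based neuron alignment: the hidden units of two networks are matched by maximizing the correlation of their activations with the Hungarian algorithm. Random weights stand in for trained networks, and only one hidden layer is aligned.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# two "trained" one-hidden-layer ReLU networks (random placeholders) and shared inputs
d, h = 10, 16
W1a, W1b = rng.standard_normal((h, d)), rng.standard_normal((h, d))
X = rng.standard_normal((500, d))

acts_a = np.maximum(X @ W1a.T, 0.0)                 # hidden activations of network A
acts_b = np.maximum(X @ W1b.T, 0.0)                 # hidden activations of network B

# cross-correlation of activations between every pair of hidden neurons
a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
corr = a.T @ b / len(X)                             # (h, h) correlation matrix

# neuron alignment: permutation of B's neurons maximizing total correlation
row, col = linear_sum_assignment(-corr)             # Hungarian algorithm (maximize)
W1b_aligned = W1b[col]                              # permute B's first-layer rows
# (the second-layer columns of B would be permuted by the same `col`)
```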

Proceedings Article
12 Jul 2020
TL;DR: In this paper, the authors show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization.
Abstract: The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization. More specifically, we prove that SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large. This result is a consequence of the fact that the parameters found by SGD are increasingly dropout stable as the network becomes wider. We show that, if we remove part of the neurons (and suitably rescale the remaining ones), the change in loss is independent of the total number of neurons, and it depends only on how many neurons are left. Our results exhibit a mild dependence on the input dimension: they are dimension-free for two-layer networks and depend linearly on the dimension for multilayer networks. We validate our theoretical findings with numerical experiments for different architectures and classification tasks.