Showing papers on "Maxima and minima" published in 2021


Journal ArticleDOI
TL;DR: A novel approach combines the fast convergence of gradient descent (GD) training for ANNs with the global search capacity of EAs to train the network; the proposed method is shown to be superior to traditional ANNs, other hybrid ANNs, and HGACS in terms of accuracy, while significantly reducing computational time compared with HGACS.

81 citations


Journal ArticleDOI
TL;DR: This paper proposes and analyzes zeroth-order stochastic approximation algorithms for nonconvex and convex optimization, with a focus on constrained optimization, the high-dimensional setting, and saddle-point avoidance.
Abstract: In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization, with a focus on constrained optimization, the high-dimensional setting, and saddle-point avoidance. To handle constrained optimization, we first propose generalizations of the conditional gradient algorithm achieving rates similar to the standard stochastic gradient algorithm using only zeroth-order information. To facilitate zeroth-order optimization in high dimensions, we explore the advantages of structural sparsity assumptions. Specifically, (i) we highlight an implicit regularization phenomenon where the standard stochastic gradient algorithm with zeroth-order information adapts to the sparsity of the problem at hand by just varying the step size, and (ii) we propose a truncated stochastic gradient algorithm with zeroth-order information, whose rate of convergence depends only poly-logarithmically on the dimensionality. We next focus on avoiding saddle points in the nonconvex setting. Toward that end, we interpret the Gaussian smoothing technique for estimating gradients from zeroth-order information as an instantiation of the first-order Stein's identity. Based on this, we provide a novel estimator of the Hessian matrix of a function, with cost linear in the dimension, that uses only zeroth-order information and is based on the second-order Stein's identity. We then provide a zeroth-order variant of the cubic regularized Newton method for avoiding saddle points and discuss its rate of convergence to local minima.
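
To make the zeroth-order idea concrete, here is a minimal sketch of a Gaussian-smoothing gradient estimator built from function evaluations only; the function name, sample count, and smoothing radius are illustrative choices, not the paper's exact scheme.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, num_samples=20, rng=None):
    """Forward-difference Gaussian-smoothing estimate of grad f(x),
    using only function values (zeroth-order information)."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(x, dtype=float)
    fx = f(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.size)          # Gaussian direction
        g += (f(x + mu * u) - fx) / mu * u       # directional difference quotient
    return g / num_samples
```

Such an estimator can be dropped into a standard projected or conditional gradient loop whenever only function evaluations are available.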

65 citations


Journal ArticleDOI
TL;DR: In this paper, a Representation Invariance Loss (RIL) is proposed to optimize bounding box regression for rotating objects; it treats multiple representations of an oriented object as multiple equivalent local minima and hence transforms bounding box regression into an adaptive matching process with these local minima.
Abstract: Arbitrary-oriented objects exist widely in natural scenes, and thus oriented object detection has received extensive attention in recent years. The mainstream rotation detectors use oriented bounding boxes (OBB) or quadrilateral bounding boxes (QBB) to represent the rotating objects. However, these methods suffer from the representation ambiguity for oriented object definition, which leads to suboptimal regression optimization and the inconsistency between the loss metric and the localization accuracy of the predictions. In this paper, we propose a Representation Invariance Loss (RIL) to optimize the bounding box regression for the rotating objects. Specifically, RIL treats multiple representations of an oriented object as multiple equivalent local minima, and hence transforms bounding box regression into an adaptive matching process with these local minima. Then, the Hungarian matching algorithm is adopted to obtain the optimal regression strategy. We also propose a normalized rotation loss to alleviate the weak correlation between different variables and their unbalanced loss contribution in OBB representation. Extensive experiments on remote sensing datasets and scene text datasets show that our method achieves consistent and substantial improvement. The source code and trained models are available at this https URL.
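
As a rough illustration of the matching step, the sketch below pairs predicted box parameter vectors with a set of equivalent ground-truth representations via the Hungarian algorithm; the (cx, cy, w, h, theta) layout and the plain Euclidean cost are assumptions for illustration, not the loss actually used in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matched_regression_cost(preds, equiv_reprs):
    """Match predictions to equivalent representations of an oriented box
    (e.g. angle shifted by 90 degrees with width/height swapped) and sum
    the matched costs. Both inputs are (n, 5) arrays of (cx, cy, w, h, theta)."""
    cost = np.linalg.norm(preds[:, None, :] - equiv_reprs[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # Hungarian matching
    return cost[rows, cols].sum()
```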

57 citations


Journal ArticleDOI
TL;DR: The Improved Shuffled-based Jaya (IS-Jaya) algorithm is proposed; it uses a shuffling process to gain superior exploration capability in the search mechanism and can be an effective tool for solving discrete size optimization of skeletal structures.

34 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compare the phase transition patterns suggested by the vacuum structure at the critical temperatures, at which local minima are degenerate, with those obtained from computing the probability for nucleation via tunneling through the barrier separating local minima.
Abstract: Electroweak baryogenesis is an attractive mechanism to generate the baryon asymmetry of the Universe via a strong first order electroweak phase transition. We compare the phase transition patterns suggested by the vacuum structure at the critical temperatures, at which local minima are degenerate, with those obtained from computing the probability for nucleation via tunneling through the barrier separating local minima. Heuristically, nucleation becomes difficult if the barrier between the local minima is too high, or if the distance (in field space) between the minima is too large. As an example of a model exhibiting such behavior, we study the Next-to-Minimal Supersymmetric Standard Model, whose scalar sector contains two SU(2) doublets and one gauge singlet. We find that the calculation of the nucleation probabilities prefers different regions of parameter space for a strong first order electroweak phase transition than the calculation based solely on the critical temperatures. Our results demonstrate that analyzing only the vacuum structure via the critical temperatures can provide a misleading picture of the phase transition patterns, and, in turn, of the parameter space suitable for electroweak baryogenesis.

31 citations


Journal ArticleDOI
TL;DR: In this article, a representation invariance loss (RIL) is proposed to optimize the bounding box regression for the rotating objects in remote sensing images, which treats multiple representations of an oriented object as multiple equivalent local minima.
Abstract: Arbitrary-oriented objects exist widely in remote sensing images. The mainstream rotation detectors use oriented bounding boxes (OBBs) or quadrilateral bounding boxes (QBBs) to represent the rotating objects. However, these methods suffer from the representation ambiguity for oriented object definition, which leads to suboptimal regression optimization and the inconsistency between the loss metric and the localization accuracy of the predictions. In this letter, we propose a representation invariance loss (RIL) to optimize the bounding box regression for the rotating objects in the remote sensing images. RIL treats multiple representations of an oriented object as multiple equivalent local minima and hence transforms bounding box regression into an adaptive matching process with these local minima. Next, the Hungarian matching algorithm is adopted to obtain the optimal regression strategy. Besides, we propose a normalized rotation loss to alleviate the weak correlation between different variables and their unbalanced loss contribution in OBB representation. Extensive experiments on remote sensing datasets show that our method achieves consistent and substantial improvement. The code and models are available at https://github.com/ming71/RIDet to facilitate future research.

31 citations


Journal ArticleDOI
TL;DR: A new parameter-free measure for the specific purpose of quickly and accurately assessing the similarity between two given long time series, which outperforms DTW while providing competitive results against popular distance-based classifiers and is orders of magnitude faster than DTW.
Abstract: The problem of similarity measures is a major area of interest within the field of time series classification (TSC). With the ubiquity of long time series and the increasing demand for analyzing them on limited-resource devices, there is a crucial need for efficient and accurate measures to deal with this kind of data. In fact, there is a plethora of good time series similarity measures in the literature. However, most existing methods achieve good performance for short time series, but their effectiveness decreases quickly as the time series get longer. In this paper, we develop a new parameter-free measure for the specific purpose of quickly and accurately assessing the similarity between two given long time series. The proposed "Local Extrema Dynamic Time Warping" (LE-DTW) consists of two steps. The first is a time series representation technique that starts by reducing the dimensionality of a given time series using its local extrema. Next, it physically separates the minima and maxima points for more intuitiveness and consistency of the so-obtained time series representation. The second step consists of adapting the Dynamic Time Warping (DTW) measure so as to evaluate the score of similarity between the generated representations. We test the performance of LE-DTW on a wide range of real-world problems from the UCR time series archive for TSC. Experimental results indicate that for short time series, the proposed method achieves reasonable classification accuracy as compared to DTW. However, for long time series, LE-DTW performs much better. Indeed, it outperforms DTW while providing competitive results against popular distance-based classifiers. Moreover, in terms of efficiency, LE-DTW is orders of magnitude faster than DTW.
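
A minimal sketch of the first step, under the assumption that "local extrema" means interior points where the discrete slope changes sign; the paper's exact extraction and separation rules may differ.

```python
import numpy as np

def local_extrema_representation(ts):
    """Reduce a series to its local extrema, returning the minima and the
    maxima as two separate (index, value) sequences."""
    ts = np.asarray(ts, dtype=float)
    d = np.diff(ts)
    max_idx = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1   # rise then fall
    min_idx = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1   # fall then rise
    return (min_idx, ts[min_idx]), (max_idx, ts[max_idx])
```

DTW (or any elastic measure) can then be applied to the much shorter reduced sequences.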

29 citations


Journal ArticleDOI
TL;DR: This paper explores a simple, computationally feasible method, which provides k-means with a set of initial seeds to cluster datasets of arbitrary dimensions, and shows that it is feasible and more efficient than previous proposals in many situations.
Abstract: The k-means algorithm is widely used in various research fields because of its fast convergence to the cost function minima; however, it frequently gets stuck in local optima as it is sensitive to initial conditions. This paper explores a simple, computationally feasible method, which provides k-means with a set of initial seeds to cluster datasets of arbitrary dimensions. Our technique consists of two stages: firstly, we use the original data space to obtain a set of prototypes (cluster centers) by applying k-means to bootstrap replications of the data and, secondly, we cluster the space of centers, which has tighter (thus easier to separate) groups, and search the deepest point in each assembled cluster using a depth notion. We test this method with simulated and real data, compare it with commonly used k-means initialization algorithms, and show that it is feasible and more efficient than previous proposals in many situations.
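
A hedged sketch of the two-stage seeding idea described above; the bootstrap count, the use of scikit-learn's KMeans, and the nearest-to-mean stand-in for the data-depth step are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def bootstrap_kmeans_seeds(X, k, n_boot=20, seed=0):
    """Stage 1: collect cluster centers from k-means runs on bootstrap
    replicates. Stage 2: cluster the centers themselves and return one
    representative per group as the initial seeds for the final k-means."""
    rng = np.random.default_rng(seed)
    centers = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))            # bootstrap replicate
        km = KMeans(n_clusters=k, n_init=5, random_state=seed).fit(X[idx])
        centers.append(km.cluster_centers_)
    centers = np.vstack(centers)
    km2 = KMeans(n_clusters=k, n_init=5, random_state=seed).fit(centers)
    seeds = []
    for j in range(k):
        group = centers[km2.labels_ == j]
        d = np.linalg.norm(group - group.mean(axis=0), axis=1)
        seeds.append(group[np.argmin(d)])                     # proxy for the "deepest" point
    return np.array(seeds)

# The returned seeds can be passed to KMeans(n_clusters=k, init=seeds, n_init=1).
```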

27 citations


Journal ArticleDOI
TL;DR: In this paper, an initialization of the QAOA parameters based on the Trotterized quantum annealing (TQA) protocol is proposed; it circumvents the issue of false minima for a broad range of time steps, yielding the same performance as the best result out of an exponentially scaling number of random initializations.
Abstract: The quantum approximate optimization algorithm (QAOA) is a prospective near-term quantum algorithm due to its modest circuit depth and promising benchmarks. However, the external parameter optimization required in QAOA could become a performance bottleneck. This motivates studies of the optimization landscape and the search for heuristic ways of parameter initialization. In this work we visualize the optimization landscape of the QAOA applied to the MaxCut problem on random graphs, demonstrating that random initialization of the QAOA is prone to converging to local minima with sub-optimal performance. We introduce an initialization of the QAOA parameters based on the Trotterized quantum annealing (TQA) protocol, parameterized by the Trotter time step. We find that the TQA initialization allows one to circumvent the issue of false minima for a broad range of time steps, yielding the same performance as the best result out of an exponentially scaling number of random initializations. Moreover, we demonstrate that the optimal value of the time step coincides with the point of proliferation of Trotter errors in quantum annealing. Our results suggest practical ways of initializing QAOA protocols on near-term quantum devices and reveal new connections between QAOA and quantum annealing.
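
The TQA-style initialization can be written down in a few lines; the linear-ramp form below is an assumed reading of the protocol (an annealing schedule discretized with Trotter step dt), with `p` the QAOA depth.

```python
import numpy as np

def tqa_init(p, dt):
    """Linear annealing ramp discretized into p QAOA layers with Trotter
    step dt: gamma grows from dt/p to dt, beta shrinks from (1 - 1/p)*dt to 0."""
    k = np.arange(1, p + 1)
    gamma = (k / p) * dt
    beta = (1.0 - k / p) * dt
    return gamma, beta

# Example: angles for a depth-6 circuit with time step 0.75,
# used as the starting point of the classical optimizer.
gamma0, beta0 = tqa_init(p=6, dt=0.75)
```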

27 citations


Journal ArticleDOI
TL;DR: An extensive study of a new metaheuristic algorithm called Smell Agent Optimization (SAO) on some CEC numerical optimization benchmark functions and Hybrid Renewable Energy System (HRES) engineering problems revealed that the SAO can find the global optimum in 76% of the benchmark functions.
Abstract: This paper presents an extensive study of a new metaheuristic algorithm called Smell Agent Optimization (SAO) on some CEC numerical optimization benchmark functions and Hybrid Renewable Energy System (HRES) engineering problems. The SAO models the relationships between a smell agent and an object evaporating smell molecules. These relationships are modelled as three separate modes called the sniffing, trailing and random modes. The sniffing mode simulates the smell perception capability of the agent as the smell molecules diffuse from a smell source towards the agent. The trailing mode simulates the capability of the agent to track part of the smell molecules until their source is identified. The random mode is a strategy employed by the agent to avoid getting stuck in local minima. Thirty-seven commonly used CEC benchmark functions and an HRES engineering problem are tested, and the results are compared with six other metaheuristic methods. Experimental results revealed that the SAO can find the global optimum in 76% of the benchmark functions. Similarly, statistical results showed that the SAO also obtained the most cost-effective HRES design compared to the benchmarked algorithms.

26 citations


Journal ArticleDOI
TL;DR: A new architecture with a multipatch feature pyramid network (MPFP-Net) is proposed to achieve better accuracy compared to several state-of-the-art object detection models and introduces an effective method to regularize the residual values and make the fusion transition layers strictly norm-preserving.
Abstract: Object detection is a challenging task in remote sensing because objects occupy only a few pixels in the images, and the models are required to simultaneously learn object locations and detection. Even though the established approaches perform well for objects of regular sizes, they achieve weak performance when analyzing small ones or get stuck in local minima (e.g., false object parts). Two possible issues stand in their way. First, the existing methods struggle to perform stably on the detection of small objects because of the complicated background. Second, most of the standard methods use handcrafted features and do not work well on the detection of object parts that are missing. We here address the above issues and propose a new architecture with a multipatch feature pyramid network (MPFP-Net). Different from current models that, during training, only pursue the most discriminative patches, in MPFP-Net the patches are divided into class-affiliated subsets, in which the patches are related, and, based on the primary loss function, a sequence of smooth loss functions is determined for the subsets to improve the model for collecting small object parts. To enhance the feature representation for patch selection, we introduce an effective method to regularize the residual values and make the fusion transition layers strictly norm-preserving. The network contains bottom-up and crosswise connections to fuse the features of different scales to achieve better accuracy compared to several state-of-the-art object detection models. Also, the developed architecture is more efficient than the baselines.

Journal ArticleDOI
TL;DR: An arithmetic optimization algorithm (AOA) is hybridized with the slime mould algorithm (SMA) to address the issues of limited internal memory and slow convergence at local minima; the resulting hybrid is termed HAOASMA in this paper.
Abstract: An arithmetic optimization algorithm (AOA) is hybridized with the slime mould algorithm (SMA) to address the issues of limited internal memory and slow convergence at local minima; the hybrid is termed HAOASMA. A lens opposition-based learning strategy is also integrated with the hybrid algorithm, which enhances the population diversity of the hybrid optimizer to accelerate convergence. The local best ($P_{\text{best}}$) and global best ($g_{\text{best}}$) of SMA initialize the AOA's search process. The $P_{\text{best}}$ obtained from AOA in turn re-initializes the SMA to further exploit the search space. In this way, the developed hybrid algorithm utilizes the exploitation and exploration capabilities of SMA and AOA, respectively. The developed HAOASMA has been compared on twenty-three benchmark functions at different dimensions with the basic SMA, AOA and six renowned meta-heuristic algorithms. HAOASMA has also been applied to classical engineering design problems. The performance of HAOASMA is significantly superior to that of the basic SMA, AOA and the other meta-heuristic algorithms.

Journal ArticleDOI
Yu Feng1, Yuhai Tu1
TL;DR: In this article, the authors investigated the connection between SGD learning dynamics and the loss function landscape and found a robust inverse relation between the weight variance and the landscape flatness, which is the opposite of the fluctuation-response relation in equilibrium statistical physics; SGD thus serves as a landscape-dependent annealing algorithm.
Abstract: Despite tremendous success of the stochastic gradient descent (SGD) algorithm in deep learning, little is known about how SGD finds generalizable solutions at flat minima of the loss function in high-dimensional weight space. Here, we investigate the connection between SGD learning dynamics and the loss function landscape. A principal component analysis (PCA) shows that SGD dynamics follow a low-dimensional drift-diffusion motion in the weight space. Around a solution found by SGD, the loss function landscape can be characterized by its flatness in each PCA direction. Remarkably, our study reveals a robust inverse relation between the weight variance and the landscape flatness in all PCA directions, which is the opposite to the fluctuation-response relation (aka Einstein relation) in equilibrium statistical physics. To understand the inverse variance-flatness relation, we develop a phenomenological theory of SGD based on statistical properties of the ensemble of minibatch loss functions. We find that both the anisotropic SGD noise strength (temperature) and its correlation time depend inversely on the landscape flatness in each PCA direction. Our results suggest that SGD serves as a landscape-dependent annealing algorithm. The effective temperature decreases with the landscape flatness so the system seeks out (prefers) flat minima over sharp ones. Based on these insights, an algorithm with landscape-dependent constraints is developed to mitigate catastrophic forgetting efficiently when learning multiple tasks sequentially. In general, our work provides a theoretical framework to understand learning dynamics, which may eventually lead to better algorithms for different learning tasks.
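
As a rough illustration of the PCA-based analysis described above, the sketch below extracts the principal drift-diffusion directions of SGD weight iterates and the trajectory variance along them; it is a generic PCA of trajectory snapshots, not the paper's full flatness measurement.

```python
import numpy as np

def trajectory_pca(weight_snapshots, k=10):
    """PCA of SGD weight iterates: rows are flattened weight vectors sampled
    along training. Returns the top-k principal directions and the variance
    of the trajectory along each of them."""
    W = np.asarray(weight_snapshots, dtype=float)
    Wc = W - W.mean(axis=0)                        # center the trajectory
    _, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    variance = s**2 / (len(W) - 1)
    return Vt[:k], variance[:k]
```

The loss can then be profiled along each returned direction to estimate its flatness and compare it against the variance.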

Journal ArticleDOI
TL;DR: This article shows that for sufficiently overparameterized nonlinear models, SMD with a (small enough) fixed step size converges to a global minimum that is "very close" (in Bregman divergence) to the minimum-potential interpolating solution, thus attaining approximate implicit regularization.
Abstract: Most modern learning problems are highly overparameterized, i.e., have many more model parameters than the number of training data points. As a result, the training loss may have infinitely many global minima (parameter vectors that perfectly "interpolate" the training data). It is therefore imperative to understand which interpolating solutions we converge to, how they depend on the initialization and learning algorithm, and whether they yield different test errors. In this article, we study these questions for the family of stochastic mirror descent (SMD) algorithms, of which stochastic gradient descent (SGD) is a special case. Recently, it has been shown that for overparameterized linear models, SMD converges to the closest global minimum to the initialization point, where closeness is in terms of the Bregman divergence corresponding to the potential function of the mirror descent. With appropriate initialization, this yields convergence to the minimum-potential interpolating solution, a phenomenon referred to as implicit regularization. On the theory side, we show that for sufficiently overparameterized nonlinear models, SMD with a (small enough) fixed step size converges to a global minimum that is "very close" (in Bregman divergence) to the minimum-potential interpolating solution, thus attaining approximate implicit regularization. On the empirical side, our experiments on the MNIST and CIFAR-10 datasets consistently confirm that the above phenomenon occurs in practical scenarios. They further indicate a clear difference in the generalization performances of different SMD algorithms: experiments on the CIFAR-10 dataset with different regularizers, l₁ to encourage sparsity, l₂ (SGD) to encourage small Euclidean norm, and l∞ to discourage large components, surprisingly show that the l∞ norm consistently yields better generalization performance than SGD, which in turn generalizes better than the l₁ norm.
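
For concreteness, a minimal sketch of one SMD update with a q-norm potential psi(w) = (1/q)*||w||_q^q for q > 1; q = 2 recovers plain SGD. This is a generic mirror-descent step, not the exact experimental setup of the paper.

```python
import numpy as np

def smd_step(w, grad, lr, q=2.0):
    """One stochastic mirror descent step for psi(w) = (1/q) * ||w||_q^q.
    The mirror map is grad psi(w) = sign(w) * |w|**(q-1); the update is done
    in the dual (mirror) space and then mapped back to the primal space."""
    z = np.sign(w) * np.abs(w) ** (q - 1) - lr * grad
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))
```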

Journal ArticleDOI
TL;DR: This work considers the case where the measurement samples within typically very small and disconnected subsets are coherently linked to each other — which is a reasonable assumption for the objective of antenna measurements.
Abstract: Phase retrieval is in general a non-convex and non-linear task and the corresponding algorithms struggle with the issue of local minima. We consider the case where the measurement samples within typically very small and disconnected subsets are coherently linked to each other — which is a reasonable assumption for our objective of antenna measurements. Two classes of measurement setups are discussed which can provide this kind of extra information: multi-probe systems and holographic measurements with multiple reference signals. We propose several formulations of the corresponding phase retrieval problem. The simplest of these formulations poses a linear system of equations similar to an eigenvalue problem where a unique non-trivial null-space vector needs to be found. Accurate phase reconstruction for partially coherent observations is, thus, possible by a reliable solution process and with judgment of the solution quality. Under ideal, noise-free conditions, the required sampling density is less than two times the number of unknowns. Noise and other observation errors increase this value slightly. Simulations for Gaussian random matrices and for antenna measurement scenarios demonstrate that reliable phase reconstruction is possible with the presented approach.

Journal ArticleDOI
TL;DR: A bilevel optimization approach for the estimation of parameters in nonlocal image denoising models that investigates the differentiability of the solution operator in function spaces and derives a first-order optimality system that characterizes local minima.
Abstract: We propose a bilevel optimization approach for the estimation of parameters in nonlocal image denoising models. The parameters we consider are both the fidelity weight and weights within the kernel of the nonlocal operator. In both cases, we investigate the differentiability of the solution operator in function spaces and derive a first-order optimality system that characterizes local minima. For the numerical solution of the problems, we use a second-order trust-region algorithm in combination with a finite element discretization of the nonlocal denoising models and introduce a computational strategy for the solution of the resulting dense linear systems. Several experiments illustrate the applicability and effectiveness of our approach.

Journal ArticleDOI
Pai Liu1, Yi Yan1, Xiaopeng Zhang1, Yangjun Luo1, Zhan Kang1 
TL;DR: In this article, an effective gradient-free framework for periodic microstructure design is presented; it exhibits powerful global searching capabilities and requires no sensitivity information. The optimization problem is known to have multiple local minima, and most gradient-based topology optimization methods depend significantly on the initial guess of the microstructural geometry, thus requiring designer experience.

Journal ArticleDOI
TL;DR: Numerical experiments executed on two classes of randomly generated test functions show promising behavior of global optimization methods that use the introduced local tuning techniques to speed up the global search.

Journal ArticleDOI
TL;DR: Wang et al. propose a progressive knowledge transfer-based multitask convolutional neural network (PKT-MCNN) to address the problems of intra/inter-class distance imbalance and poor local minima.
Abstract: In modern industry, large-scale fault diagnosis of complex systems is emerging and becoming increasingly important. Most deep learning-based methods perform well when the number of fault types is small, but cannot converge to satisfactory results when handling large-scale fault diagnosis, because the huge number of fault types leads to the problems of intra/inter-class distance imbalance and poor local minima in neural networks. To address the above problems, a progressive knowledge transfer-based multitask convolutional neural network (PKT-MCNN) is proposed. First, to construct the coarse-to-fine knowledge structure intelligently, a structure learning algorithm is proposed that clusters fault types into different coarse-grained nodes. Thus, the intra/inter-class distance imbalance problem can be mitigated by spreading similar tasks into different nodes. Then, an MCNN architecture is designed to learn the coarse- and fine-grained tasks simultaneously and extract more general fault information, thereby pushing the algorithm away from poor local minima. Last but not least, a PKT algorithm is proposed, which can not only transfer the coarse-grained knowledge to the fine-grained task and further alleviate the intra/inter-class distance imbalance in feature space, but also regulate the different learning stages by progressively adjusting the attention weight given to each task. To verify the effectiveness of the proposed method, a dataset of a nuclear power system with 66 fault types was collected and analyzed. The results demonstrate that the proposed method can be a promising tool for large-scale fault diagnosis.

Journal ArticleDOI
TL;DR: In this paper, nonasymptotic estimates from above and below on the integrated density of states of the Schrödinger operator $L = -\Delta + V$ were established using a counting function for the minima of the localization landscape.

Journal ArticleDOI
TL;DR: An algorithm for the minimization of the energy of magnetic systems is presented and applied to the analysis of thermal configurations of a ferromagnet to identify inherent structures, i.e. the nearest local energy minima, as a function of temperature.

Proceedings Article
03 May 2021
TL;DR: In this paper, the Hessian-dependent covariance of stochastic gradient noise was used to show that SGD favors flat minima exponentially more than sharp minima, while Gradient Descent (GD) with injected white noise favors flat minima only polynomially more than sharp minima.
Abstract: Stochastic Gradient Descent (SGD) and its variants are mainstream methods for training deep networks in practice. SGD is known to find a flat minimum that often generalizes well. However, it is mathematically unclear how deep learning can select a flat minimum among so many minima. To answer the question quantitatively, we develop a density diffusion theory (DDT) to reveal how minima selection quantitatively depends on the minima sharpness and the hyperparameters. To the best of our knowledge, we are the first to theoretically and empirically prove that, benefiting from the Hessian-dependent covariance of stochastic gradient noise, SGD favors flat minima exponentially more than sharp minima, while Gradient Descent (GD) with injected white noise favors flat minima only polynomially more than sharp minima. We also reveal that either a small learning rate or large-batch training requires exponentially many iterations to escape from minima, in terms of the ratio of the batch size and learning rate. Thus, large-batch training cannot search flat minima efficiently in a realistic computational time.

Journal ArticleDOI
13 May 2021
TL;DR: In this article, the authors explore some mathematical features of the loss landscape of overparameterized neural networks, and show that the loss function looks like a typical function from $\mathbb{R}^d$.
Abstract: We explore some mathematical features of the loss landscape of overparameterized neural networks. A priori, one might imagine that the loss function looks like a typical function from $\mathbb{R}^d...

Proceedings ArticleDOI
11 Jul 2021
TL;DR: In this paper, the authors provide a comprehensive overview on SA and its accelerated variants, and propose a novel SA scheme called curious simulated annealing, combining the assets of two recent acceleration strategies.
Abstract: Finding the global minimum of a nonconvex optimization problem is a notoriously hard task appearing in numerous applications, from signal processing to machine learning. Simulated annealing (SA) is a family of stochastic optimization methods where an artificial temperature controls the exploration of the search space while preserving convergence to the global minima. SA is efficient, easy to implement, and theoretically sound, but suffers from a slow convergence rate. The purpose of this work is two-fold. First, we provide a comprehensive overview on SA and its accelerated variants. Second, we propose a novel SA scheme called curious simulated annealing, combining the assets of two recent acceleration strategies. Theoretical guarantees of this algorithm are provided. Its performance with respect to existing methods is illustrated on practical examples.
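
A minimal, generic SA loop for reference (not the accelerated "curious" variant proposed in the paper); the geometric cooling schedule and the acceptance rule are the textbook choices.

```python
import math
import random

def simulated_annealing(f, x0, neighbor, t0=1.0, cooling=0.999, n_iter=10000):
    """Classic SA: always accept improving moves, accept worsening moves
    with probability exp(-delta/T), and cool the temperature geometrically."""
    x, fx, t = x0, f(x0), t0
    best_x, best_f = x, fx
    for _ in range(n_iter):
        y = neighbor(x)                     # propose a random nearby candidate
        fy = f(y)
        if fy < fx or random.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling                        # lower the temperature
    return best_x, best_f
```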

Journal ArticleDOI
TL;DR: A new general-purpose parallel surrogate global optimization method, PODS, that reduces the number of model simulations as well as the human time needed for proper calibration of these multimodal problems without derivatives is presented.
Abstract: Parameter calibration for computationally expensive environmental models (e.g., hydrodynamic models) is challenging because of limits on computing budget and on human time for analysis, and because the optimization problem can have multiple local minima and no available derivatives. We present a new general-purpose parallel surrogate global optimization method, Parallel Optimization with Dynamic coordinate search using Surrogates (PODS), that reduces the number of model simulations as well as the human time needed for proper calibration of these multimodal problems without derivatives. PODS outperforms state-of-the-art parallel surrogate algorithms and a heuristic method, Parallel Differential Evolution (P-DE), on all eight well-known test problems. We further apply PODS to the parameter calibration of two expensive (5 h per simulation), three-dimensional hydrodynamic models with the assistance of High-Performance Computing (HPC). Results indicate that PODS outperforms the popularly used P-DE algorithm in speed (about twice as fast) and accuracy with 24 parallel processors.

Journal ArticleDOI
01 Feb 2021
TL;DR: The proposed ModPSO-CNN algorithm fuses modified particle swarm optimization (ModPSO) with backpropagation (BP) and a convolutional neural network (CNN) to improve performance by avoiding premature convergence and local minima.
Abstract: Training optimization plays a vital role in the development of convolutional neural networks (CNNs). CNNs are hard to train because of the presence of multiple local minima. The optimization problem for a CNN is non-convex and hence has multiple local minima. If any of the chosen hyper-parameters are not appropriate, training will end up at a bad local minimum, which leads to poor performance. Hence, proper optimization of the training algorithm for a CNN is the key to converging to a good local minimum. Therefore, in this paper, we introduce an evolutionary convolutional neural network (ModPSO-CNN) algorithm. The proposed algorithm is a fusion of modified particle swarm optimization (ModPSO), backpropagation (BP) and a convolutional neural network (CNN). The training of the CNN combines ModPSO with the backpropagation (BP) algorithm to encourage performance improvement by avoiding premature convergence and local minima. ModPSO has adaptive, dynamic and improved parameters to handle the issues in training CNNs. The adaptive and dynamic parameters bring a proper balance between the global and local search ability, while an improved parameter keeps the diversity of the swarm. The proposed ModPSO algorithm is validated on three standard mathematical test functions and compared with three variants of the benchmark PSO algorithm. Furthermore, the performance of the proposed ModPSO-CNN is also compared with other training algorithms, focusing on the analysis of computational cost, convergence and accuracy on standard classification problems, such as the CIFAR-10 dataset and a face and skin detection dataset.
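
For reference, a single standard PSO velocity/position update is sketched below; the inertia and acceleration coefficients are textbook defaults, and the adaptive/dynamic modifications of ModPSO are not reproduced.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One canonical PSO update: inertia term plus attraction toward the
    particle's personal best and the swarm's global best."""
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v
```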

Journal ArticleDOI
TL;DR: This paper presents an algorithm to perform a systematic exploratory search for the solutions of the optimization problem via second-order methods without a good initial guess, which combines the techniques of deflation, barrier methods and primal-dual active set solvers in a novel way.
Abstract: Topology optimization problems often support multiple local minima due to a lack of convexity. Typically, gradient-based techniques combined with continuation in model parameters are used to promot...

Journal ArticleDOI
TL;DR: An original alternative to this approach through norm-homotopy optimization, combined with an efficient technique to compute the structural response, is shown to outperform direct H∞ optimization in terms of speed and performance.

Journal ArticleDOI
TL;DR: In this article, the authors investigate the potential of applying a state-of-the-art local derivative-free solver, Py-BOBYQA, to global optimization problems.
Abstract: We investigate the potential of applying a state-of-the-art, local derivative-free solver, Py-BOBYQA to global optimization problems. In particular, we demonstrate the potential of a restarts proce...

Posted Content
Jian-Feng Cai, Anil Lalwani1, Meng Huang, Dong Li, Yang Wang 
TL;DR: This paper shows that the smoothed amplitude flow model for phase retrieval has a benign geometric structure under the optimal sampling complexity, and that the gradient descent algorithm with random initialization performs well, even compared with state-of-the-art algorithms with spectral initialization, in empirical success rate and convergence speed.
Abstract: The problem of recovering a signal $\mathbf{x}\in \mathbb{R}^n$ from a set of magnitude measurements $y_i=|\langle \mathbf{a}_i, \mathbf{x} \rangle |, \; i=1,\ldots,m$ is referred to as phase retrieval, which has many applications in the physical sciences and engineering. In this paper we show that the smoothed amplitude flow model for phase retrieval has a benign geometric structure under the optimal sampling complexity. In particular, we show that when the measurements $\mathbf{a}_i\in \mathbb{R}^n$ are Gaussian random vectors and the number of measurements $m\ge Cn$, our smoothed amplitude flow model has no spurious local minimizers with high probability, i.e., the target solution $\mathbf{x}$ is the unique global minimizer (up to a global phase) and the loss function has a negative directional curvature around each saddle point. Due to this benign geometric landscape, the phase retrieval problem can be solved by gradient descent algorithms without spectral initialization. Numerical experiments show that the gradient descent algorithm with random initialization performs well, even compared with state-of-the-art algorithms with spectral initialization, in empirical success rate and convergence speed.
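
To illustrate the kind of iteration involved, here is a sketch of plain amplitude-flow phase retrieval by gradient descent with random initialization; the paper's smoothed loss and step-size analysis are not reproduced, and the learning rate and iteration count are arbitrary.

```python
import numpy as np

def amplitude_flow_gd(A, y, lr=0.5, n_iter=2000, rng=None):
    """Minimize (1/2m) * sum_i (|a_i^T x| - y_i)^2 by gradient descent,
    starting from a random point (no spectral initialization).
    A is (m, n); y holds the magnitude measurements |a_i^T x_true|."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    x = rng.standard_normal(n)
    for _ in range(n_iter):
        z = A @ x
        grad = A.T @ ((np.abs(z) - y) * np.sign(z)) / m
        x -= lr * grad
    return x   # recovers x_true up to a global sign
```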