Showing papers on "Maxima and minima" published in 2021


Journal ArticleDOI
TL;DR: A novel approach combines the fast convergence of gradient descent (GD) training for ANNs with the global search capacity of EAs to train the network; the proposed method is shown to be superior to traditional ANNs, other hybrid ANNs, and HGACS in terms of accuracy, while significantly reducing computational time compared with HGACS.

81 citations


Journal ArticleDOI
TL;DR: This paper proposes and analyzes zeroth-order stochastic approximation algorithms for nonconvex and convex optimization, with a focus on constrained optimization, the high-dimensional setting, and saddle-point avoidance.
Abstract: In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization, with a focus on constrained optimization, the high-dimensional setting, and saddle-point avoidance. To handle constrained optimization, we first propose generalizations of the conditional gradient algorithm achieving rates similar to the standard stochastic gradient algorithm using only zeroth-order information. To facilitate zeroth-order optimization in high dimensions, we explore the advantages of structural sparsity assumptions. Specifically, (i) we highlight an implicit regularization phenomenon where the standard stochastic gradient algorithm with zeroth-order information adapts to the sparsity of the problem at hand by just varying the step size, and (ii) we propose a truncated stochastic gradient algorithm with zeroth-order information, whose rate of convergence depends only poly-logarithmically on the dimensionality. We next focus on avoiding saddle points in the nonconvex setting. Toward that end, we interpret the Gaussian smoothing technique for estimating gradients from zeroth-order information as an instantiation of the first-order Stein's identity. Based on this, we provide a novel estimator of the Hessian matrix of a function, with cost linear in the dimension, that uses only zeroth-order information and is based on the second-order Stein's identity. We then provide a zeroth-order variant of the cubic regularized Newton method for avoiding saddle points and discuss its rate of convergence to local minima.
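
To make the zeroth-order idea concrete, here is a minimal sketch of a Gaussian-smoothing gradient estimator built from function evaluations only; the function name, sample count, and smoothing radius are illustrative choices, not the paper's exact scheme.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, num_samples=20, rng=None):
    """Forward-difference Gaussian-smoothing estimate of grad f(x),
    using only function values (zeroth-order information)."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(x, dtype=float)
    fx = f(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.size)          # Gaussian direction
        g += (f(x + mu * u) - fx) / mu * u       # directional difference quotient
    return g / num_samples
```

Such an estimator can be dropped into a standard projected or conditional gradient loop whenever only function evaluations are available.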

65 citations


Journal ArticleDOI
TL;DR: In this paper, a Representation Invariance Loss (RIL) is proposed to optimize bounding box regression for rotating objects; it treats multiple representations of an oriented object as multiple equivalent local minima and hence transforms bounding box regression into an adaptive matching process with these local minima.
Abstract: Arbitrary-oriented objects exist widely in natural scenes, and thus oriented object detection has received extensive attention in recent years. The mainstream rotation detectors use oriented bounding boxes (OBB) or quadrilateral bounding boxes (QBB) to represent the rotating objects. However, these methods suffer from the representation ambiguity for oriented object definition, which leads to suboptimal regression optimization and the inconsistency between the loss metric and the localization accuracy of the predictions. In this paper, we propose a Representation Invariance Loss (RIL) to optimize the bounding box regression for the rotating objects. Specifically, RIL treats multiple representations of an oriented object as multiple equivalent local minima, and hence transforms bounding box regression into an adaptive matching process with these local minima. Then, the Hungarian matching algorithm is adopted to obtain the optimal regression strategy. We also propose a normalized rotation loss to alleviate the weak correlation between different variables and their unbalanced loss contribution in OBB representation. Extensive experiments on remote sensing datasets and scene text datasets show that our method achieves consistent and substantial improvement. The source code and trained models are available at this https URL.
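
As a rough illustration of the matching step, the sketch below pairs predicted box parameter vectors with a set of equivalent ground-truth representations via the Hungarian algorithm; the (cx, cy, w, h, theta) layout and the plain Euclidean cost are assumptions for illustration, not the loss actually used in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matched_regression_cost(preds, equiv_reprs):
    """Match predictions to equivalent representations of an oriented box
    (e.g. angle shifted by 90 degrees with width/height swapped) and sum
    the matched costs. Both inputs are (n, 5) arrays of (cx, cy, w, h, theta)."""
    cost = np.linalg.norm(preds[:, None, :] - equiv_reprs[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # Hungarian matching
    return cost[rows, cols].sum()
```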

57 citations


Journal ArticleDOI
TL;DR: The Improved Shuffled-based Jaya (IS-Jaya) algorithm is proposed; it uses a shuffling process to gain superior exploration capability in the search mechanism and can be an effective tool for solving discrete size optimization of skeletal structures.

34 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compare the phase transition patterns suggested by the vacuum structure at the critical temperatures, at which local minima are degenerate, with those obtained from computing the probability for nucleation via tunneling through the barrier separating local minima.
Abstract: Electroweak baryogenesis is an attractive mechanism to generate the baryon asymmetry of the Universe via a strong first order electroweak phase transition. We compare the phase transition patterns suggested by the vacuum structure at the critical temperatures, at which local minima are degenerate, with those obtained from computing the probability for nucleation via tunneling through the barrier separating local minima. Heuristically, nucleation becomes difficult if the barrier between the local minima is too high, or if the distance (in field space) between the minima is too large. As an example of a model exhibiting such behavior, we study the Next-to-Minimal Supersymmetric Standard Model, whose scalar sector contains two SU(2) doublets and one gauge singlet. We find that the calculation of the nucleation probabilities prefers different regions of parameter space for a strong first order electroweak phase transition than the calculation based solely on the critical temperatures. Our results demonstrate that analyzing only the vacuum structure via the critical temperatures can provide a misleading picture of the phase transition patterns, and, in turn, of the parameter space suitable for electroweak baryogenesis.

31 citations


Journal ArticleDOI
TL;DR: In this article, a representation invariance loss (RIL) is proposed to optimize the bounding box regression for the rotating objects in remote sensing images, which treats multiple representations of an oriented object as multiple equivalent local minima.
Abstract: Arbitrary-oriented objects exist widely in remote sensing images. The mainstream rotation detectors use oriented bounding boxes (OBBs) or quadrilateral bounding boxes (QBBs) to represent the rotating objects. However, these methods suffer from the representation ambiguity for oriented object definition, which leads to suboptimal regression optimization and the inconsistency between the loss metric and the localization accuracy of the predictions. In this letter, we propose a representation invariance loss (RIL) to optimize the bounding box regression for the rotating objects in the remote sensing images. RIL treats multiple representations of an oriented object as multiple equivalent local minima and hence transforms bounding box regression into an adaptive matching process with these local minima. Next, the Hungarian matching algorithm is adopted to obtain the optimal regression strategy. Besides, we propose a normalized rotation loss to alleviate the weak correlation between different variables and their unbalanced loss contribution in OBB representation. Extensive experiments on remote sensing datasets show that our method achieves consistent and substantial improvement. The code and models are available at https://github.com/ming71/RIDet to facilitate future research.

31 citations


Journal ArticleDOI
TL;DR: A new parameter-free measure for the specific purpose of quickly and accurately assessing the similarity between two given long time series, which outperforms DTW while providing competitive results against popular distance-based classifiers and is orders of magnitude faster than DTW.
Abstract: The problem of similarity measures is a major area of interest within the field of time series classification (TSC). With the ubiquity of long time series and the increasing demand for analyzing them on limited-resource devices, there is a crucial need for efficient and accurate measures to deal with this kind of data. In fact, there is a plethora of good time series similarity measures in the literature. However, most existing methods achieve good performance for short time series, but their effectiveness decreases quickly as the time series get longer. In this paper, we develop a new parameter-free measure for the specific purpose of quickly and accurately assessing the similarity between two given long time series. The proposed "Local Extrema Dynamic Time Warping" (LE-DTW) consists of two steps. The first is a time series representation technique that starts by reducing the dimensionality of a given time series using its local extrema. Next, it physically separates the minima and maxima points for more intuitiveness and consistency of the so-obtained time series representation. The second step consists of adapting the Dynamic Time Warping (DTW) measure so as to evaluate the score of similarity between the generated representations. We test the performance of LE-DTW on a wide range of real-world problems from the UCR time series archive for TSC. Experimental results indicate that for short time series, the proposed method achieves reasonable classification accuracy as compared to DTW. However, for long time series, LE-DTW performs much better. Indeed, it outperforms DTW while providing competitive results against popular distance-based classifiers. Moreover, in terms of efficiency, LE-DTW is orders of magnitude faster than DTW.
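
A minimal sketch of the first step, under the assumption that "local extrema" means interior points where the discrete slope changes sign; the paper's exact extraction and separation rules may differ.

```python
import numpy as np

def local_extrema_representation(ts):
    """Reduce a series to its local extrema, returning the minima and the
    maxima as two separate (index, value) sequences."""
    ts = np.asarray(ts, dtype=float)
    d = np.diff(ts)
    max_idx = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1   # rise then fall
    min_idx = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1   # fall then rise
    return (min_idx, ts[min_idx]), (max_idx, ts[max_idx])
```

DTW (or any elastic measure) can then be applied to the much shorter reduced sequences.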

29 citations


Journal ArticleDOI
TL;DR: This paper explores a simple, computationally feasible method, which provides k-means with a set of initial seeds to cluster datasets of arbitrary dimensions, and shows that it is feasible and more efficient than previous proposals in many situations.
Abstract: The k-means algorithm is widely used in various research fields because of its fast convergence to the cost function minima; however, it frequently gets stuck in local optima as it is sensitive to initial conditions. This paper explores a simple, computationally feasible method, which provides k-means with a set of initial seeds to cluster datasets of arbitrary dimensions. Our technique consists of two stages: firstly, we use the original data space to obtain a set of prototypes (cluster centers) by applying k-means to bootstrap replications of the data and, secondly, we cluster the space of centers, which has tighter (thus easier to separate) groups, and search the deepest point in each assembled cluster using a depth notion. We test this method with simulated and real data, compare it with commonly used k-means initialization algorithms, and show that it is feasible and more efficient than previous proposals in many situations.
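
A hedged sketch of the two-stage seeding idea described above; the bootstrap count, the use of scikit-learn's KMeans, and the nearest-to-mean stand-in for the data-depth step are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def bootstrap_kmeans_seeds(X, k, n_boot=20, seed=0):
    """Stage 1: collect cluster centers from k-means runs on bootstrap
    replicates. Stage 2: cluster the centers themselves and return one
    representative per group as the initial seeds for the final k-means."""
    rng = np.random.default_rng(seed)
    centers = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))            # bootstrap replicate
        km = KMeans(n_clusters=k, n_init=5, random_state=seed).fit(X[idx])
        centers.append(km.cluster_centers_)
    centers = np.vstack(centers)
    km2 = KMeans(n_clusters=k, n_init=5, random_state=seed).fit(centers)
    seeds = []
    for j in range(k):
        group = centers[km2.labels_ == j]
        d = np.linalg.norm(group - group.mean(axis=0), axis=1)
        seeds.append(group[np.argmin(d)])                     # proxy for the "deepest" point
    return np.array(seeds)

# The returned seeds can be passed to KMeans(n_clusters=k, init=seeds, n_init=1).
```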

27 citations


Journal ArticleDOI
TL;DR: In this paper, an initialization of the QAOA parameters based on the Trotterized quantum annealing (TQA) protocol is proposed; it circumvents the issue of false minima for a broad range of time steps, yielding the same performance as the best result out of an exponentially scaling number of random initializations.
Abstract: The quantum approximate optimization algorithm (QAOA) is a prospective near-term quantum algorithm due to its modest circuit depth and promising benchmarks. However, the external parameter optimization required in QAOA could become a performance bottleneck. This motivates studies of the optimization landscape and the search for heuristic ways of parameter initialization. In this work we visualize the optimization landscape of the QAOA applied to the MaxCut problem on random graphs, demonstrating that random initialization of the QAOA is prone to converging to local minima with sub-optimal performance. We introduce an initialization of the QAOA parameters based on the Trotterized quantum annealing (TQA) protocol, parameterized by the Trotter time step. We find that the TQA initialization allows one to circumvent the issue of false minima for a broad range of time steps, yielding the same performance as the best result out of an exponentially scaling number of random initializations. Moreover, we demonstrate that the optimal value of the time step coincides with the point of proliferation of Trotter errors in quantum annealing. Our results suggest practical ways of initializing QAOA protocols on near-term quantum devices and reveal new connections between QAOA and quantum annealing.
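
The TQA-style initialization can be written down in a few lines; the linear-ramp form below is an assumed reading of the protocol (an annealing schedule discretized with Trotter step dt), with `p` the QAOA depth.

```python
import numpy as np

def tqa_init(p, dt):
    """Linear annealing ramp discretized into p QAOA layers with Trotter
    step dt: gamma grows from dt/p to dt, beta shrinks from (1 - 1/p)*dt to 0."""
    k = np.arange(1, p + 1)
    gamma = (k / p) * dt
    beta = (1.0 - k / p) * dt
    return gamma, beta

# Example: angles for a depth-6 circuit with time step 0.75,
# used as the starting point of the classical optimizer.
gamma0, beta0 = tqa_init(p=6, dt=0.75)
```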

27 citations


Journal ArticleDOI
TL;DR: An extensive study of a new metaheuristic algorithm called Smell Agent Optimization (SAO) on some CEC numerical optimization benchmark functions and Hybrid Renewable Energy System (HRES) engineering problems revealed that the SAO can find the global optimum in 76% of the benchmark functions.
Abstract: This paper presents an extensive study of a new metaheuristic algorithm called Smell Agent Optimization (SAO) on some CEC numerical optimization benchmark functions and Hybrid Renewable Energy System (HRES) engineering problems. The SAO models the relationships between a smell agent and an object evaporating smell molecules. These relationships are modelled as three separate modes called the sniffing, trailing and random modes. The sniffing mode simulates the smell perception capability of the agent as the smell molecules diffuse from a smell source towards the agent. The trailing mode simulates the capability of the agent to track part of the smell molecules until their source is identified. The random mode is a strategy employed by the agent to avoid getting stuck in local minima. Thirty-seven commonly used CEC benchmark functions and an HRES engineering problem are tested, and the results are compared with six other metaheuristic methods. Experimental results revealed that the SAO can find the global optimum in 76% of the benchmark functions. Similarly, statistical results showed that the SAO also obtained the most cost-effective HRES design compared to the benchmarked algorithms.

26 citations


Journal ArticleDOI
TL;DR: A new architecture with a multipatch feature pyramid network (MPFP-Net) is proposed to achieve better accuracy compared to several state-of-the-art object detection models and introduces an effective method to regularize the residual values and make the fusion transition layers strictly norm-preserving.
Abstract: Object detection is a challenging task in remote sensing because objects occupy only a few pixels in the images, and the models are required to simultaneously learn object locations and detection. Even though the established approaches perform well for objects of regular sizes, they achieve weak performance when analyzing small ones or get stuck in local minima (e.g., false object parts). Two possible issues stand in their way. First, the existing methods struggle to perform stably on the detection of small objects because of the complicated background. Second, most of the standard methods use handcrafted features and do not work well on the detection of object parts that are missing. We here address the above issues and propose a new architecture with a multipatch feature pyramid network (MPFP-Net). Different from current models that, during training, only pursue the most discriminative patches, in MPFP-Net the patches are divided into class-affiliated subsets, in which the patches are related, and, based on the primary loss function, a sequence of smooth loss functions is determined for the subsets to improve the model for collecting small object parts. To enhance the feature representation for patch selection, we introduce an effective method to regularize the residual values and make the fusion transition layers strictly norm-preserving. The network contains bottom-up and crosswise connections to fuse the features of different scales to achieve better accuracy compared to several state-of-the-art object detection models. Also, the developed architecture is more efficient than the baselines.

Journal ArticleDOI
TL;DR: An arithmetic optimization algorithm (AOA) is hybridized with the slime mould algorithm (SMA) to address the issues of limited internal memory and slow convergence at local minima; the resulting hybrid is termed HAOASMA in this paper.
Abstract: An arithmetic optimization algorithm (AOA) is hybridized with the slime mould algorithm (SMA) to address the issues of limited internal memory and slow convergence at local minima; the hybrid is termed HAOASMA. A lens opposition-based learning strategy is also integrated with the hybrid algorithm, which enhances the population diversity of the hybrid optimizer to accelerate convergence. The local best ($P_{\text{best}}$) and global best ($g_{\text{best}}$) of SMA initialize the AOA's search process. The $P_{\text{best}}$ obtained from AOA in turn re-initializes the SMA to further exploit the search space. In this way, the developed hybrid algorithm utilizes the exploitation and exploration capabilities of SMA and AOA, respectively. The developed HAOASMA has been compared on twenty-three benchmark functions at different dimensions with the basic SMA, AOA and six renowned meta-heuristic algorithms. HAOASMA has also been applied to classical engineering design problems. The performance of HAOASMA is significantly superior to that of the basic SMA, AOA and the other meta-heuristic algorithms.

Journal ArticleDOI
Yu Feng1, Yuhai Tu1
TL;DR: In this article, the authors investigated the connection between SGD learning dynamics and the loss function landscape and found a robust inverse relation between the weight variance and the landscape flatness, which is the opposite of the fluctuation-response relation in equilibrium statistical physics; SGD thus serves as a landscape-dependent annealing algorithm.
Abstract: Despite tremendous success of the stochastic gradient descent (SGD) algorithm in deep learning, little is known about how SGD finds generalizable solutions at flat minima of the loss function in high-dimensional weight space. Here, we investigate the connection between SGD learning dynamics and the loss function landscape. A principal component analysis (PCA) shows that SGD dynamics follow a low-dimensional drift-diffusion motion in the weight space. Around a solution found by SGD, the loss function landscape can be characterized by its flatness in each PCA direction. Remarkably, our study reveals a robust inverse relation between the weight variance and the landscape flatness in all PCA directions, which is the opposite to the fluctuation-response relation (aka Einstein relation) in equilibrium statistical physics. To understand the inverse variance-flatness relation, we develop a phenomenological theory of SGD based on statistical properties of the ensemble of minibatch loss functions. We find that both the anisotropic SGD noise strength (temperature) and its correlation time depend inversely on the landscape flatness in each PCA direction. Our results suggest that SGD serves as a landscape-dependent annealing algorithm. The effective temperature decreases with the landscape flatness so the system seeks out (prefers) flat minima over sharp ones. Based on these insights, an algorithm with landscape-dependent constraints is developed to mitigate catastrophic forgetting efficiently when learning multiple tasks sequentially. In general, our work provides a theoretical framework to understand learning dynamics, which may eventually lead to better algorithms for different learning tasks.
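
As a rough illustration of the PCA-based analysis described above, the sketch below extracts the principal drift-diffusion directions of SGD weight iterates and the trajectory variance along them; it is a generic PCA of trajectory snapshots, not the paper's full flatness measurement.

```python
import numpy as np

def trajectory_pca(weight_snapshots, k=10):
    """PCA of SGD weight iterates: rows are flattened weight vectors sampled
    along training. Returns the top-k principal directions and the variance
    of the trajectory along each of them."""
    W = np.asarray(weight_snapshots, dtype=float)
    Wc = W - W.mean(axis=0)                        # center the trajectory
    _, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    variance = s**2 / (len(W) - 1)
    return Vt[:k], variance[:k]
```

The loss can then be profiled along each returned direction to estimate its flatness and compare it against the variance.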

Journal ArticleDOI
TL;DR: This article shows that for sufficiently overparameterized nonlinear models, SMD with a (small enough) fixed step size converges to a global minimum that is "very close" (in Bregman divergence) to the minimum-potential interpolating solution, thus attaining approximate implicit regularization.
Abstract: Most modern learning problems are highly overparameterized, i.e., have many more model parameters than the number of training data points. As a result, the training loss may have infinitely many global minima (parameter vectors that perfectly "interpolate" the training data). It is therefore imperative to understand which interpolating solutions we converge to, how they depend on the initialization and learning algorithm, and whether they yield different test errors. In this article, we study these questions for the family of stochastic mirror descent (SMD) algorithms, of which stochastic gradient descent (SGD) is a special case. Recently, it has been shown that for overparameterized linear models, SMD converges to the closest global minimum to the initialization point, where closeness is in terms of the Bregman divergence corresponding to the potential function of the mirror descent. With appropriate initialization, this yields convergence to the minimum-potential interpolating solution, a phenomenon referred to as implicit regularization. On the theory side, we show that for sufficiently overparameterized nonlinear models, SMD with a (small enough) fixed step size converges to a global minimum that is "very close" (in Bregman divergence) to the minimum-potential interpolating solution, thus attaining approximate implicit regularization. On the empirical side, our experiments on the MNIST and CIFAR-10 datasets consistently confirm that the above phenomenon occurs in practical scenarios. They further indicate a clear difference in the generalization performances of different SMD algorithms: experiments on the CIFAR-10 dataset with different regularizers, l₁ to encourage sparsity, l₂ (SGD) to encourage small Euclidean norm, and l∞ to discourage large components, surprisingly show that the l∞ norm consistently yields better generalization performance than SGD, which in turn generalizes better than the l₁ norm.
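
For concreteness, a minimal sketch of one SMD update with a q-norm potential psi(w) = (1/q)*||w||_q^q for q > 1; q = 2 recovers plain SGD. This is a generic mirror-descent step, not the exact experimental setup of the paper.

```python
import numpy as np

def smd_step(w, grad, lr, q=2.0):
    """One stochastic mirror descent step for psi(w) = (1/q) * ||w||_q^q.
    The mirror map is grad psi(w) = sign(w) * |w|**(q-1); the update is done
    in the dual (mirror) space and then mapped back to the primal space."""
    z = np.sign(w) * np.abs(w) ** (q - 1) - lr * grad
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))
```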

Journal ArticleDOI
TL;DR: This work considers the case where the measurement samples within typically very small and disconnected subsets are coherently linked to each other — which is a reasonable assumption for the objective of antenna measurements.
Abstract: Phase retrieval is in general a non-convex and non-linear task and the corresponding algorithms struggle with the issue of local minima. We consider the case where the measurement samples within typically very small and disconnected subsets are coherently linked to each other — which is a reasonable assumption for our objective of antenna measurements. Two classes of measurement setups are discussed which can provide this kind of extra information: multi-probe systems and holographic measurements with multiple reference signals. We propose several formulations of the corresponding phase retrieval problem. The simplest of these formulations poses a linear system of equations similar to an eigenvalue problem where a unique non-trivial null-space vector needs to be found. Accurate phase reconstruction for partially coherent observations is, thus, possible by a reliable solution process and with judgment of the solution quality. Under ideal, noise-free conditions, the required sampling density is less than two times the number of unknowns. Noise and other observation errors increase this value slightly. Simulations for Gaussian random matrices and for antenna measurement scenarios demonstrate that reliable phase reconstruction is possible with the presented approach.

Journal ArticleDOI
TL;DR: A bilevel optimization approach for the estimation of parameters in nonlocal image denoising models that investigates the differentiability of the solution operator in function spaces and derives a first-order optimality system that characterizes local minima.
Abstract: We propose a bilevel optimization approach for the estimation of parameters in nonlocal image denoising models. The parameters we consider are both the fidelity weight and weights within the kernel of the nonlocal operator. In both cases, we investigate the differentiability of the solution operator in function spaces and derive a first-order optimality system that characterizes local minima. For the numerical solution of the problems, we use a second-order trust-region algorithm in combination with a finite element discretization of the nonlocal denoising models and introduce a computational strategy for the solution of the resulting dense linear systems. Several experiments illustrate the applicability and effectiveness of our approach.

Journal ArticleDOI
Pai Liu1, Yi Yan1, Xiaopeng Zhang1, Yangjun Luo1, Zhan Kang1 
TL;DR: In this article, an effective gradient-free framework for periodic microstructure design is presented; it exhibits powerful global searching capabilities and requires no sensitivity information. The optimization problem is known to have multiple local minima, and most gradient-based topology optimization methods depend significantly on the initial guess of the microstructural geometry, thus requiring designer experience.

Journal ArticleDOI
TL;DR: Numerical experiments executed on two classes of randomly generated test functions show promising behavior of global optimization methods that use the introduced local tuning techniques to speed up the global search.

Journal ArticleDOI
TL;DR: Wang et al. propose a progressive knowledge transfer-based multitask convolutional neural network (PKT-MCNN) to address the problems of intra/inter-class distance imbalance and poor local minima.
Abstract: In modern industry, large-scale fault diagnosis of complex systems is emerging and becoming increasingly important. Most deep learning-based methods perform well when the number of fault types is small, but cannot converge to satisfactory results when handling large-scale fault diagnosis, because the huge number of fault types leads to the problems of intra/inter-class distance imbalance and poor local minima in neural networks. To address the above problems, a progressive knowledge transfer-based multitask convolutional neural network (PKT-MCNN) is proposed. First, to construct the coarse-to-fine knowledge structure intelligently, a structure learning algorithm is proposed that clusters fault types into different coarse-grained nodes. Thus, the intra/inter-class distance imbalance problem can be mitigated by spreading similar tasks into different nodes. Then, an MCNN architecture is designed to learn the coarse- and fine-grained tasks simultaneously and extract more general fault information, thereby pushing the algorithm away from poor local minima. Last but not least, a PKT algorithm is proposed, which can not only transfer the coarse-grained knowledge to the fine-grained task and further alleviate the intra/inter-class distance imbalance in feature space, but also regulate the different learning stages by progressively adjusting the attention weight given to each task. To verify the effectiveness of the proposed method, a dataset of a nuclear power system with 66 fault types was collected and analyzed. The results demonstrate that the proposed method can be a promising tool for large-scale fault diagnosis.

Journal ArticleDOI
TL;DR: In this paper, nonasymptotic estimates from above and below on the integrated density of states of the Schrödinger operator $L = -\Delta + V$ were established using a counting function for the minima of the localization landscape.

Journal ArticleDOI
TL;DR: An algorithm for the minimization of the energy of magnetic systems is presented and applied to the analysis of thermal configurations of a ferromagnet to identify inherent structures, i.e. the nearest local energy minima, as a function of temperature.

Proceedings Article
03 May 2021
TL;DR: In this paper, the Hessian-dependent covariance of stochastic gradient noise was used to show that SGD favors flat minima exponentially more than sharp minima, while Gradient Descent (GD) with injected white noise favors flat minima only polynomially more than sharp minima.
Abstract: Stochastic Gradient Descent (SGD) and its variants are mainstream methods for training deep networks in practice. SGD is known to find a flat minimum that often generalizes well. However, it is mathematically unclear how deep learning can select a flat minimum among so many minima. To answer the question quantitatively, we develop a density diffusion theory (DDT) to reveal how minima selection quantitatively depends on the minima sharpness and the hyperparameters. To the best of our knowledge, we are the first to theoretically and empirically prove that, benefiting from the Hessian-dependent covariance of stochastic gradient noise, SGD favors flat minima exponentially more than sharp minima, while Gradient Descent (GD) with injected white noise favors flat minima only polynomially more than sharp minima. We also reveal that either a small learning rate or large-batch training requires exponentially many iterations to escape from minima, in terms of the ratio of the batch size and learning rate. Thus, large-batch training cannot search flat minima efficiently in a realistic computational time.

Journal ArticleDOI
13 May 2021
TL;DR: In this article, the authors explore some mathematical features of the loss landscape of overparameterized neural networks, and show that the loss function looks like a typical function from $\mathbb{R}^d$.
Abstract: We explore some mathematical features of the loss landscape of overparameterized neural networks. A priori, one might imagine that the loss function looks like a typical function from $\mathbb{R}^d...

Proceedings ArticleDOI
11 Jul 2021
TL;DR: In this paper, the authors provide a comprehensive overview on SA and its accelerated variants, and propose a novel SA scheme called curious simulated annealing, combining the assets of two recent acceleration strategies.
Abstract: Finding the global minimum of a nonconvex optimization problem is a notoriously hard task appearing in numerous applications, from signal processing to machine learning. Simulated annealing (SA) is a family of stochastic optimization methods where an artificial temperature controls the exploration of the search space while preserving convergence to the global minima. SA is efficient, easy to implement, and theoretically sound, but suffers from a slow convergence rate. The purpose of this work is two-fold. First, we provide a comprehensive overview on SA and its accelerated variants. Second, we propose a novel SA scheme called curious simulated annealing, combining the assets of two recent acceleration strategies. Theoretical guarantees of this algorithm are provided. Its performance with respect to existing methods is illustrated on practical examples.
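
A minimal, generic SA loop for reference (not the accelerated "curious" variant proposed in the paper); the geometric cooling schedule and the acceptance rule are the textbook choices.

```python
import math
import random

def simulated_annealing(f, x0, neighbor, t0=1.0, cooling=0.999, n_iter=10000):
    """Classic SA: always accept improving moves, accept worsening moves
    with probability exp(-delta/T), and cool the temperature geometrically."""
    x, fx, t = x0, f(x0), t0
    best_x, best_f = x, fx
    for _ in range(n_iter):
        y = neighbor(x)                     # propose a random nearby candidate
        fy = f(y)
        if fy < fx or random.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling                        # lower the temperature
    return best_x, best_f
```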

Journal ArticleDOI
TL;DR: A new general-purpose parallel surrogate global optimization method, PODS, that reduces the number of model simulations as well as the human time needed for proper calibration of these multimodal problems without derivatives is presented.
Abstract: Parameter calibration for computationally expensive environmental models (e.g., hydrodynamic models) is challenging because of limits on computing budget and on human time for analysis, and because the optimization problem can have multiple local minima and no available derivatives. We present a new general-purpose parallel surrogate global optimization method, Parallel Optimization with Dynamic coordinate search using Surrogates (PODS), that reduces the number of model simulations as well as the human time needed for proper calibration of these multimodal problems without derivatives. PODS outperforms state-of-the-art parallel surrogate algorithms and a heuristic method, Parallel Differential Evolution (P-DE), on all eight well-known test problems. We further apply PODS to the parameter calibration of two expensive (5 h per simulation), three-dimensional hydrodynamic models with the assistance of High-Performance Computing (HPC). Results indicate that PODS outperforms the popularly used P-DE algorithm in speed (about twice as fast) and accuracy with 24 parallel processors.

Journal ArticleDOI
01 Feb 2021
TL;DR: The proposed ModPSO-CNN algorithm fuses modified particle swarm optimization (ModPSO) with backpropagation (BP) and a convolutional neural network (CNN) to improve performance by avoiding premature convergence and local minima.
Abstract: Training optimization plays a vital role in the development of convolutional neural networks (CNNs). CNNs are hard to train because of the presence of multiple local minima. The optimization problem for a CNN is non-convex and hence has multiple local minima. If any of the chosen hyper-parameters are not appropriate, training will end up at a bad local minimum, which leads to poor performance. Hence, proper optimization of the training algorithm for a CNN is the key to converging to a good local minimum. Therefore, in this paper, we introduce an evolutionary convolutional neural network (ModPSO-CNN) algorithm. The proposed algorithm is a fusion of modified particle swarm optimization (ModPSO), backpropagation (BP) and a convolutional neural network (CNN). The training of the CNN combines ModPSO with the backpropagation (BP) algorithm to encourage performance improvement by avoiding premature convergence and local minima. ModPSO has adaptive, dynamic and improved parameters to handle the issues in training CNNs. The adaptive and dynamic parameters bring a proper balance between the global and local search ability, while an improved parameter keeps the diversity of the swarm. The proposed ModPSO algorithm is validated on three standard mathematical test functions and compared with three variants of the benchmark PSO algorithm. Furthermore, the performance of the proposed ModPSO-CNN is also compared with other training algorithms, focusing on the analysis of computational cost, convergence and accuracy on standard classification problems, such as the CIFAR-10 dataset and a face and skin detection dataset.
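
For reference, a single standard PSO velocity/position update is sketched below; the inertia and acceleration coefficients are textbook defaults, and the adaptive/dynamic modifications of ModPSO are not reproduced.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One canonical PSO update: inertia term plus attraction toward the
    particle's personal best and the swarm's global best."""
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v
```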

Journal ArticleDOI
TL;DR: This paper presents an algorithm to perform a systematic exploratory search for the solutions of the optimization problem via second-order methods without a good initial guess, which combines the techniques of deflation, barrier methods and primal-dual active set solvers in a novel way.
Abstract: Topology optimization problems often support multiple local minima due to a lack of convexity. Typically, gradient-based techniques combined with continuation in model parameters are used to promot...

Journal ArticleDOI
TL;DR: An original alternative to this approach through norm-homotopy optimization, combined with an efficient technique to compute the structural response, is shown to outperform direct H∞ optimization in terms of speed and performance.

Journal ArticleDOI
TL;DR: In this article, the authors investigate the potential of applying a state-of-the-art local derivative-free solver, Py-BOBYQA, to global optimization problems.
Abstract: We investigate the potential of applying a state-of-the-art, local derivative-free solver, Py-BOBYQA to global optimization problems. In particular, we demonstrate the potential of a restarts proce...

Posted Content
Jian-Feng Cai, Anil Lalwani1, Meng Huang, Dong Li, Yang Wang 
TL;DR: This paper shows that the smoothed amplitude flow model for phase retrieval has a benign geometric structure under the optimal sampling complexity, and that the gradient descent algorithm with random initialization performs well, even compared with state-of-the-art algorithms with spectral initialization, in empirical success rate and convergence speed.
Abstract: The problem of recovering a signal $\mathbf{x}\in \mathbb{R}^n$ from a set of magnitude measurements $y_i=|\langle \mathbf{a}_i, \mathbf{x} \rangle |, \; i=1,\ldots,m$ is referred to as phase retrieval, which has many applications in the physical sciences and engineering. In this paper we show that the smoothed amplitude flow model for phase retrieval has a benign geometric structure under the optimal sampling complexity. In particular, we show that when the measurements $\mathbf{a}_i\in \mathbb{R}^n$ are Gaussian random vectors and the number of measurements $m\ge Cn$, our smoothed amplitude flow model has no spurious local minimizers with high probability, i.e., the target solution $\mathbf{x}$ is the unique global minimizer (up to a global phase) and the loss function has a negative directional curvature around each saddle point. Due to this benign geometric landscape, the phase retrieval problem can be solved by gradient descent algorithms without spectral initialization. Numerical experiments show that the gradient descent algorithm with random initialization performs well, even compared with state-of-the-art algorithms with spectral initialization, in empirical success rate and convergence speed.
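
To illustrate the kind of iteration involved, here is a sketch of plain amplitude-flow phase retrieval by gradient descent with random initialization; the paper's smoothed loss and step-size analysis are not reproduced, and the learning rate and iteration count are arbitrary.

```python
import numpy as np

def amplitude_flow_gd(A, y, lr=0.5, n_iter=2000, rng=None):
    """Minimize (1/2m) * sum_i (|a_i^T x| - y_i)^2 by gradient descent,
    starting from a random point (no spectral initialization).
    A is (m, n); y holds the magnitude measurements |a_i^T x_true|."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    x = rng.standard_normal(n)
    for _ in range(n_iter):
        z = A @ x
        grad = A.T @ ((np.abs(z) - y) * np.sign(z)) / m
        x -= lr * grad
    return x   # recovers x_true up to a global sign
```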