
Showing papers on "Maxima and minima" published in 2017


Posted Content
TL;DR: Through this analysis, it is found that three factors – learning rate, batch size and the variance of the loss gradients – control the trade-off between the depth and width of the minima found by SGD, with wider minima favoured by a higher ratio of learning rate to batch size.
Abstract: We investigate the dynamical and convergent properties of stochastic gradient descent (SGD) applied to Deep Neural Networks (DNNs). Characterizing the relation between learning rate, batch size and the properties of the final minima, such as width or generalization, remains an open question. In order to tackle this problem we investigate the previously proposed approximation of SGD by a stochastic differential equation (SDE). We theoretically argue that three factors - learning rate, batch size and gradient covariance - influence the minima found by SGD. In particular we find that the ratio of learning rate to batch size is a key determinant of SGD dynamics and of the width of the final minima, and that higher values of the ratio lead to wider minima and often better generalization. We confirm these findings experimentally. Further, we include experiments which show that learning rate schedules can be replaced with batch size schedules and that the ratio of learning rate to batch size is an important factor influencing the memorization process.
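
As a quick illustration of the learning-rate-to-batch-size ratio discussed above, here is a minimal numpy sketch (a toy quadratic of our own, not the authors' experiments): two settings that share the same ratio lr/B produce a similar stationary spread of the SGD iterates, while a smaller ratio produces a visibly smaller one.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=100_000)          # 1-D "dataset"; per-example loss is (w - x)^2 / 2

def run_sgd(lr, batch, steps=20_000):
    w, tail = 0.0, []
    for t in range(steps):
        x = rng.choice(data, size=batch)
        w -= lr * (w - x.mean())         # minibatch gradient step
        if t > steps // 2:
            tail.append(w)               # record the second half of the trajectory
    return np.var(tail)                  # spread of the iterates around the minimum

print(run_sgd(lr=0.02, batch=8))         # ratio lr/B = 0.0025
print(run_sgd(lr=0.08, batch=32))        # same ratio -> similar spread
print(run_sgd(lr=0.02, batch=32))        # smaller ratio -> noticeably smaller spread
```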

386 citations


Posted Content
TL;DR: It is argued that most notions of flatness are problematic for deep models and cannot be directly applied to explain generalization; focusing on deep networks with rectifier units, the particular geometry of parameter space induced by the inherent symmetries these architectures exhibit is exploited to build equivalent models corresponding to arbitrarily sharper minima.
Abstract: Despite their overwhelming capacity to overfit, deep learning architectures tend to generalize relatively well to unseen data, allowing them to be deployed in practice. However, explaining why this is the case is still an open area of research. One standing hypothesis that is gaining popularity, e.g. Hochreiter & Schmidhuber (1997); Keskar et al. (2017), is that the flatness of minima of the loss function found by stochastic gradient based methods results in good generalization. This paper argues that most notions of flatness are problematic for deep models and cannot be directly applied to explain generalization. Specifically, when focusing on deep networks with rectifier units, we can exploit the particular geometry of parameter space induced by the inherent symmetries that these architectures exhibit to build equivalent models corresponding to arbitrarily sharper minima. Furthermore, if we are allowed to reparametrize a function, the geometry of its parameters can change drastically without affecting its generalization properties.
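
The symmetry being exploited is the positive rescaling invariance of rectifier networks; in our notation, for a two-layer slice and any $\alpha > 0$,

$$ \tfrac{1}{\alpha}\, W_2\, \mathrm{relu}(\alpha W_1 x) = W_2\, \mathrm{relu}(W_1 x), $$

so moving along this orbit leaves the function (and hence its generalization) unchanged while rescaling the Hessian, which is why Hessian-based sharpness measures can be made arbitrarily large or small.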

341 citations


Proceedings Article
06 Aug 2017
TL;DR: The authors argue that most notions of flatness are problematic for deep models and cannot be directly applied to explain generalization, and exploit the particular geometry of parameter space induced by the inherent symmetries that these architectures exhibit to build equivalent models corresponding to arbitrarily sharper minima.
Abstract: Despite their overwhelming capacity to overfit, deep learning architectures tend to generalize relatively well to unseen data, allowing them to be deployed in practice. However, explaining why this is the case is still an open area of research. One standing hypothesis that is gaining popularity, e.g. Hochreiter & Schmidhuber (1997); Keskar et al. (2017), is that the flatness of minima of the loss function found by stochastic gradient based methods results in good generalization. This paper argues that most notions of flatness are problematic for deep models and cannot be directly applied to explain generalization. Specifically, when focusing on deep networks with rectifier units, we can exploit the particular geometry of parameter space induced by the inherent symmetries that these architectures exhibit to build equivalent models corresponding to arbitrarily sharper minima. Furthermore, if we are allowed to reparametrize a function, the geometry of its parameters can change drastically without affecting its generalization properties.

323 citations


Posted Content
TL;DR: In this paper, the authors developed a new framework that captures the landscape common to several non-convex low-rank matrix problems, including matrix sensing, matrix completion and robust PCA.
Abstract: In this paper we develop a new framework that captures the landscape common to several non-convex low-rank matrix problems, including matrix sensing, matrix completion and robust PCA. In particular, we show for all of the above problems (including asymmetric cases): 1) all local minima are also globally optimal; 2) no high-order saddle points exist. These results explain why simple algorithms such as stochastic gradient descent converge globally and efficiently optimize these non-convex objective functions in practice. Our framework connects and simplifies the existing analyses of optimization landscapes for matrix sensing and symmetric matrix completion. The framework naturally leads to new results for asymmetric matrix completion and robust PCA.

295 citations


Proceedings Article
06 Aug 2017
TL;DR: In this article, the authors show that perturbed gradient descent can escape saddle points almost for free, in a number of iterations which depends only poly-logarithmically on dimension.
Abstract: This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free"). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. When all saddle points are non-degenerate, all second-order stationary points are local minima, and our result thus shows that perturbed gradient descent can escape saddle points almost for free. Our results can be directly applied to many machine learning applications, including deep learning. As a particular concrete example of such an application, we show that our results can be used directly to establish sharp global convergence rates for matrix factorization. Our results rely on a novel characterization of the geometry around saddle points, which may be of independent interest to the non-convex optimization community.
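
A toy numpy sketch of the perturbation idea (ours, not the paper's implementation): plain gradient descent started on the stable manifold of the saddle of f(x, y) = (x^2 - y^2)/2 stalls there, while adding a small random perturbation whenever the gradient is tiny lets the iterate escape along the negative-curvature direction.

```python
import numpy as np

def grad(p):                      # gradient of f(x, y) = (x^2 - y^2) / 2
    return np.array([p[0], -p[1]])

def descend(perturb, steps=200, eta=0.1, radius=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    p = np.array([1.0, 0.0])      # on the stable manifold of the saddle at (0, 0)
    for _ in range(steps):
        g = grad(p)
        if perturb and np.linalg.norm(g) < 1e-6:
            p = p + rng.uniform(-radius, radius, size=2)   # small ball perturbation
        p = p - eta * g
    return p

print(descend(perturb=False))     # stays stuck near the saddle (0, 0)
print(descend(perturb=True))      # |y| grows: the saddle has been escaped
```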

280 citations


Posted Content
TL;DR: This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension, showing that perturbed gradient descent can escape saddle points almost for free.
Abstract: This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free"). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. When all saddle points are non-degenerate, all second-order stationary points are local minima, and our result thus shows that perturbed gradient descent can escape saddle points almost for free. Our results can be directly applied to many machine learning applications, including deep learning. As a particular concrete example of such an application, we show that our results can be used directly to establish sharp global convergence rates for matrix factorization. Our results rely on a novel characterization of the geometry around saddle points, which may be of independent interest to the non-convex optimization community.

259 citations


Proceedings Article
06 Aug 2017
TL;DR: A new framework that captures the landscape common to several non-convex low-rank matrix problems, including matrix sensing, matrix completion and robust PCA, shows that all local minima are also globally optimal and that no high-order saddle points exist.
Abstract: In this paper we develop a new framework that captures the landscape common to several non-convex low-rank matrix problems, including matrix sensing, matrix completion and robust PCA. In particular, we show for all of the above problems (including asymmetric cases): 1) all local minima are also globally optimal; 2) no high-order saddle points exist. These results explain why simple algorithms such as stochastic gradient descent converge globally and efficiently optimize these non-convex objective functions in practice. Our framework connects and simplifies the existing analyses of optimization landscapes for matrix sensing and symmetric matrix completion. The framework naturally leads to new results for asymmetric matrix completion and robust PCA.

202 citations


Proceedings Article
04 Dec 2017
TL;DR: This work proposes a learning based motion capture model that optimizes neural network weights that predict 3D shape and skeleton configurations given a monocular RGB video and shows that the proposed model improves with experience and converges to low-error solutions where previous optimization methods fail.
Abstract: Current state-of-the-art solutions for motion capture from a single camera are optimization driven: they optimize the parameters of a 3D human model so that its re-projection matches measurements in the video (e.g. person segmentation, optical flow, keypoint detections etc.). Optimization models are susceptible to local minima. This has been the bottleneck that has forced the use of clean, green-screen-like backgrounds at capture time, manual initialization, or switching to multiple cameras as the input source. In this work, we propose a learning based motion capture model for single camera input. Instead of optimizing mesh and skeleton parameters directly, our model optimizes neural network weights that predict 3D shape and skeleton configurations given a monocular RGB video. Our model is trained using a combination of strong supervision from synthetic data, and self-supervision from differentiable rendering of (a) skeletal keypoints, (b) dense 3D mesh motion, and (c) human-background segmentation, in an end-to-end framework. Empirically we show our model combines the best of both worlds of supervised learning and test-time optimization: supervised learning initializes the model parameters in the right regime, ensuring good pose and surface initialization at test time, without manual effort. Self-supervision by back-propagating through differentiable rendering allows (unsupervised) adaptation of the model to the test data, and offers a much tighter fit than a pretrained fixed model. We show that the proposed model improves with experience and converges to low-error solutions where previous optimization methods fail.

201 citations


Posted Content
TL;DR: The underlying reasons why deep neural networks often generalize well are investigated, and it is shown that the characteristic of the loss landscape that explains the good generalization capability is the volume of the basin of attraction of good minima.
Abstract: It is widely observed that deep learning models with learned parameters generalize well, even with many more model parameters than the number of training samples. We systematically investigate the underlying reasons why deep neural networks often generalize well, and reveal the difference between the minima (with the same training error) that generalize well and those that don't. We show that it is the characteristics of the landscape of the loss function that explain the good generalization capability. For the loss landscape of deep networks, the volume of the basin of attraction of good minima dominates that of poor minima, which guarantees that optimization methods with random initialization converge to good minima. We theoretically justify our findings by analyzing 2-layer neural networks, and show that the low-complexity solutions have a small norm of the Hessian matrix with respect to the model parameters. For deeper networks, extensive numerical evidence helps to support our arguments.

167 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: There are sufficient conditions to guarantee that local minima are globally optimal and that a local descent strategy can reach a global minimum from any initialization.
Abstract: The past few years have seen a dramatic increase in the performance of recognition systems thanks to the introduction of deep networks for representation learning. However, the mathematical reasons for this success remain elusive. A key issue is that the neural network training problem is nonconvex, hence optimization algorithms may not return a global minimum. This paper provides sufficient conditions to guarantee that local minima are globally optimal and that a local descent strategy can reach a global minimum from any initialization. Our conditions require both the network output and the regularization to be positively homogeneous functions of the network parameters, with the regularization being designed to control the network size. Our results apply to networks with one hidden layer, where size is measured by the number of neurons in the hidden layer, and multiple deep subnetworks connected in parallel, where size is measured by the number of subnetworks.
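
Concretely, positive homogeneity of degree p means that scaling all network parameters θ by any α > 0 rescales both the network output Φ and the regularizer Θ by the same power (our notation):

$$ \Phi(x;\alpha\theta) = \alpha^{p}\,\Phi(x;\theta), \qquad \Theta(\alpha\theta) = \alpha^{p}\,\Theta(\theta), \qquad \alpha > 0, $$

which holds, for instance, for ReLU networks paired with products of per-layer weight norms as the regularizer.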

132 citations


Posted Content
TL;DR: This paper presents a range estimation algorithm that uses a combination of local search and linear programming to efficiently find the maximum and minimum values taken by the outputs of the NN over the given input set, and demonstrates the effectiveness of the proposed approach for verification of NNs used in automated control as well as those used in classification.
Abstract: Deep neural networks (NN) are extensively used for machine learning tasks such as image classification, perception and control of autonomous systems. Increasingly, these deep NNs are also being deployed in high-assurance applications. Thus, there is a pressing need for developing techniques to verify neural networks to check whether certain user-expected properties are satisfied. In this paper, we study a specific verification problem of computing a guaranteed range for the output of a deep neural network given a set of inputs represented as a convex polyhedron. Range estimation is a key primitive for verifying deep NNs. We present an efficient range estimation algorithm that uses a combination of local search and linear programming problems to efficiently find the maximum and minimum values taken by the outputs of the NN over the given input set. In contrast to recently proposed "monolithic" optimization approaches, we use local gradient descent to repeatedly find and eliminate local minima of the function. The final global optimum is certified using a mixed integer programming instance. We implement our approach and compare it with Reluplex, a recently proposed solver for deep neural networks. We demonstrate the effectiveness of the proposed approach for verification of NNs used in automated control as well as those used in classification.
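
A hedged sketch of the local-search phase only (the linear-programming and mixed-integer certification steps described above are omitted): projected gradient descent with random restarts inside a box-shaped input set, applied to a tiny ReLU network with made-up weights, to estimate the minimum and maximum outputs.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)     # made-up 2-8-1 ReLU network
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)
lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])    # input set: a box

def forward(x):
    return float(W2 @ np.maximum(W1 @ x + b1, 0.0) + b2)

def num_grad(x, eps=1e-5):                               # finite-difference gradient
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (forward(x + e) - forward(x - e)) / (2 * eps)
    return g

def local_search(sign, restarts=20, steps=200, eta=0.05):
    best = np.inf                                        # best value of sign * output
    for _ in range(restarts):
        x = rng.uniform(lo, hi)
        for _ in range(steps):
            x = np.clip(x - eta * sign * num_grad(x), lo, hi)
        best = min(best, sign * forward(x))
    return sign * best

print("estimated min:", local_search(+1))
print("estimated max:", local_search(-1))
```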

Posted Content
TL;DR: It is proved that without nonlinearity, depth alone does not create bad local minima, although it induces a non-convex loss surface.
Abstract: In deep learning, depth, as well as nonlinearity, creates non-convex loss surfaces. Does depth alone, then, create bad local minima? In this paper, we prove that without nonlinearity, depth alone does not create bad local minima, although it induces a non-convex loss surface. Using this insight, we greatly simplify a recently proposed proof to show that all of the local minima of feedforward deep linear neural networks are global minima. Our theoretical results generalize previous results with fewer assumptions, and this analysis provides a method to show similar results beyond square loss in deep linear models.
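
For concreteness, the squared-loss objective of a depth-H linear network referred to above is (our notation)

$$ L(W_1,\dots,W_H) = \tfrac{1}{2}\,\big\lVert W_H W_{H-1} \cdots W_1 X - Y \big\rVert_F^2 , $$

which is non-convex in the weights (for example, the rescaling $(W_1, W_2) \mapsto (c\,W_1, c^{-1} W_2)$ leaves the product unchanged), and yet, by the result above, every local minimum is a global minimum.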

Proceedings Article
01 Aug 2017
TL;DR: In this article, an efficient block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks is presented, which is competitive with state-of-the-art first-order optimisation methods.
Abstract: We present an efficient block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks. Our resulting algorithm is competitive against state-of-the-art first-order optimisation methods, with sometimes significant improvement in optimisation performance. Unlike first-order methods, for which hyperparameter tuning of the optimisation parameters is often a laborious process, our approach can provide good performance even when used with default settings. A side result of our work is that for piecewise linear transfer functions, the network objective function can have no differentiable local maxima, which may partially explain why such transfer functions facilitate effective optimisation.
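
For reference (our notation, not the paper's derivation), the generalized Gauss-Newton matrix being approximated is

$$ G = \mathbb{E}\!\left[\, J_\theta^{\top} H_{\mathcal{L}}\, J_\theta \,\right] \;\approx\; \operatorname{blockdiag}\big(G_1,\dots,G_L\big), $$

where $J_\theta$ is the Jacobian of the network outputs with respect to the parameters, $H_{\mathcal{L}}$ is the Hessian of the loss with respect to the outputs, and the block-diagonal approximation keeps only the blocks associated with each layer's weights.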

Journal ArticleDOI
25 Feb 2017
TL;DR: In this paper, the authors consider the minimization of a general objective function over the set of rectangular matrices that have rank at most r. Despite the nonconvexity that results from factorizing the variable, recent studies in matrix completion and sensing have shown that the factored problem has no spurious local minima and obeys the strict saddle property.
Abstract: This paper considers the minimization of a general objective function $f(\boldsymbol{X})$ over the set of rectangular $n\times m$ matrices that have rank at most $r$ . To reduce the computational burden, we factorize the variable $\boldsymbol{X}$ into a product of two smaller matrices and optimize over these two matrices instead of $\boldsymbol{X}$ . Despite the resulting nonconvexity, recent studies in matrix completion and sensing have shown that the factored problem has no spurious local minima and obeys the so-called strict saddle property (the function has a directional negative curvature at all critical points but local minima). We analyze the global geometry for a general and yet well-conditioned objective function $f(\boldsymbol{X})$ whose restricted strong convexity and restricted strong smoothness constants are comparable. In particular, we show that the reformulated objective function has no spurious local minima and obeys the strict saddle property. These geometric properties imply that a number of iterative optimization algorithms (such as gradient descent) can provably solve the factored problem with global convergence.
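
In the abstract's notation, the factored problem being analyzed typically takes the form (the balancing term is the standard choice in this literature; the paper's exact regularizer may differ)

$$ \min_{U\in\mathbb{R}^{n\times r},\; V\in\mathbb{R}^{m\times r}} \; f\big(UV^{\top}\big) \;+\; \frac{\mu}{4}\,\big\lVert U^{\top}U - V^{\top}V \big\rVert_F^2 , $$

where the second term penalizes imbalance between the two factors without changing the set of products $UV^{\top}$ that can be represented.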

Posted Content
TL;DR: This article showed that almost all local minima are globally optimal for a fully connected network with squared loss and analytic activation function, given that the number of hidden units of one layer of the network is larger than the number of training points and the network structure from this layer on is pyramidal.
Abstract: While the optimization problem behind deep neural networks is highly non-convex, it is frequently observed in practice that training deep networks seems possible without getting stuck in suboptimal points. It has been argued that this is the case as all local minima are close to being globally optimal. We show that this is (almost) true, in fact almost all local minima are globally optimal, for a fully connected network with squared loss and analytic activation function given that the number of hidden units of one layer of the network is larger than the number of training points and the network structure from this layer on is pyramidal.

Proceedings Article
06 Aug 2017
TL;DR: The authors showed that almost all local minima are globally optimal for a fully connected network with squared loss and analytic activation function, given that the number of hidden units of one layer of the network is larger than the number of training points and the network structure from this layer on is pyramidal.
Abstract: While the optimization problem behind deep neural networks is highly non-convex, it is frequently observed in practice that training deep networks seems possible without getting stuck in suboptimal points. It has been argued that this is the case as all local minima are close to being globally optimal. We show that this is (almost) true, in fact almost all local minima are globally optimal, for a fully connected network with squared loss and analytic activation function given that the number of hidden units of one layer of the network is larger than the number of training points and the network structure from this layer on is pyramidal.

Proceedings Article
18 Feb 2017
TL;DR: It is proved that for empirical risk minimization, if the empirical risk is point-wise close to the (smooth) population risk, then the algorithm achieves an approximate local minimum of the population risk in polynomial time, escaping suboptimal local minima that only exist in the empirical risk.
Abstract: We study the Stochastic Gradient Langevin Dynamics (SGLD) algorithm for non-convex optimization. The algorithm performs stochastic gradient descent, where in each step it injects appropriately scaled Gaussian noise to the update. We analyze the algorithm's hitting time to an arbitrary subset of the parameter space. Two results follow from our general theory: First, we prove that for empirical risk minimization, if the empirical risk is point-wise close to the (smooth) population risk, then the algorithm achieves an approximate local minimum of the population risk in polynomial time, escaping suboptimal local minima that only exist in the empirical risk. Second, we show that SGLD improves on one of the best known learnability results for learning linear classifiers under the zero-one loss.
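
A minimal sketch of the SGLD update on a toy double-well loss standing in for an empirical risk (step size and temperature are arbitrary choices of ours): each step is a gradient step plus Gaussian noise scaled by the step size and the inverse temperature, which is what lets the iterate hop out of the shallower minimum.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(w):                          # d/dw of f(w) = (w^2 - 1)^2 + 0.3 * w
    return 4.0 * w * (w * w - 1.0) + 0.3

w, eta, beta = 1.0, 1e-3, 5.0         # start in the shallower well near w = +1
for _ in range(200_000):
    noise = np.sqrt(2.0 * eta / beta) * rng.normal()     # Langevin noise
    w = w - eta * grad(w) + noise

print(w)   # with these settings the chain usually ends near the deeper minimum, w ≈ -1
```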

Journal ArticleDOI
TL;DR: The Tsinghua Global Minimum (TGMin) algorithm, as discussed by the authors, is based on the basin-hopping algorithm and is used to find the global minima of nanoclusters as well as periodic systems.

Journal ArticleDOI
TL;DR: This work proposes a high-dimensional bias potential method (NN2B) based on two machine learning algorithms: the nearest neighbor density estimator (NNDE) and the artificial neural network (ANN) for the bias potential approximation.
Abstract: The free energy calculations of complex chemical and biological systems with molecular dynamics (MD) are inefficient due to multiple local minima separated by high-energy barriers. The minima can be escaped using an enhanced sampling method such as metadynamics, which applies bias (i.e., importance sampling) along a set of collective variables (CV), but the maximum number of CVs (or dimensions) is severely limited. We propose a high-dimensional bias potential method (NN2B) based on two machine learning algorithms: the nearest neighbor density estimator (NNDE) and the artificial neural network (ANN) for the bias potential approximation. The bias potential is constructed iteratively from short biased MD simulations, accounting for correlation among CVs. Our method is capable of achieving ergodic sampling and calculating the free energy of polypeptides with up to an 8-dimensional bias potential.

Proceedings Article
01 Jun 2017
TL;DR: For the random over-complete tensor decomposition problem, this article showed that for any small constant ε > 0, among the set of points with function values a (1+ε)-factor larger than the expectation of the function, all the local maxima are approximate global maxima.
Abstract: Non-convex optimization with local search heuristics has been widely used in machine learning, achieving many state-of-the-art results. It becomes increasingly important to understand why they can work for these NP-hard problems on typical data. The landscape of many objective functions in learning has been conjectured to have the geometric property that ``all local optima are (approximately) global optima'', and thus they can be solved efficiently by local search algorithms. However, establishing such a property can be very difficult. In this paper, we analyze the optimization landscape of the random over-complete tensor decomposition problem, which has many applications in unsupervised learning, especially in learning latent variable models. In practice, it can be efficiently solved by gradient ascent on a non-convex objective. We show that for any small constant $\epsilon > 0$, among the set of points with function values $(1+\epsilon)$-factor larger than the expectation of the function, all the local maxima are approximate global maxima. Previously, the best-known result only characterizes the geometry in small neighborhoods around the true components. Our result implies that even with an initialization that is barely better than a random guess, the gradient ascent algorithm is guaranteed to solve this problem. Our main technique uses the Kac-Rice formula and random matrix theory. To the best of our knowledge, this is the first time the Kac-Rice formula has been successfully applied to counting the number of local minima of a highly-structured random polynomial with dependent coefficients.
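
In this setting, the non-convex objective maximized by gradient ascent is (for a random over-complete 4th-order tensor with components $a_1,\dots,a_n$; notation ours)

$$ \max_{\lVert x \rVert = 1} \; T(x,x,x,x) \;=\; \sum_{i=1}^{n} \langle a_i, x \rangle^4 , $$

and the result above says that every local maximum lying above the stated function-value threshold is an approximate global maximum.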

Journal ArticleDOI
TL;DR: A new discrete upwinding strategy leading to local-extremum-bounded low order approximations with compact stencils, high order variational stabilization based on the difference between two gradient approximations, and new localized limiting techniques for antidiffusive element contributions are proposed.

Journal ArticleDOI
TL;DR: This paper considers polygonal meshes and employs the virtual element method (VEM) to solve two classes of paradigmatic topology optimization problems, one governed by nearly-incompressible and compressible linear elasticity and the other by Stokes equations.
Abstract: It is well known that the solution of topology optimization problems may be affected both by the geometric properties of the computational mesh, which can steer the minimization process towards local (and non-physical) minima, and by the accuracy of the method employed to discretize the underlying differential problem, which may not be able to correctly capture the physics of the problem. In light of the above remarks, in this paper we consider polygonal meshes and employ the virtual element method (VEM) to solve two classes of paradigmatic topology optimization problems, one governed by nearly-incompressible and compressible linear elasticity and the other by Stokes equations. Several numerical results show the virtues of our polygonal VEM based approach with respect to more standard methods.

Journal ArticleDOI
TL;DR: The dynamics of driven-dissipative systems is shown to be well suited for achieving efficient combinatorial optimization, and the heterogeneity in amplitude can be reduced by setting the parameters of the driving signal near a regime, called the dynamic phase transition, where the analog spins' DC components map the global minima of the Ising Hamiltonian more accurately, which, in turn, increases the quality of the solutions found.
Abstract: The dynamics of driven-dissipative systems is shown to be well suited for achieving efficient combinatorial optimization. The proposed method can be applied to solve any combinatorial optimization problem that is equivalent to minimizing an Ising Hamiltonian. Moreover, the dynamics considered can be implemented using various physical systems, as it is based on generic dynamics---the normal form of the supercritical pitchfork bifurcation. The computational principle of the proposed method relies on a hybrid analog-digital representation of the binary Ising spins obtained by considering the gradient descent of a Lyapunov function that is the sum of an analog Ising Hamiltonian and archetypal single- or double-well potentials. By gradually changing the shape of the latter potentials from a single- to a double-well shape, it can be shown that the first nonzero steady states to become stable are associated with global minima of the Ising Hamiltonian, under the approximation that all analog spins have the same amplitude. In the more general case, the heterogeneity in amplitude between analog spins induces the stabilization of local minima, which reduces the quality of solutions to combinatorial optimization problems. However, we show that the heterogeneity in amplitude can be reduced by setting the parameters of the driving signal near a regime, called the dynamic phase transition, where the analog spins' DC components map the global minima of the Ising Hamiltonian more accurately, which, in turn, increases the quality of the solutions found. Lastly, we discuss the possibility of a physical implementation of the proposed method using networks of degenerate optical parametric oscillators.
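
Schematically (our notation, not the paper's exact equations), the gradient dynamics described above descend a Lyapunov function of the form

$$ V(x) = -\frac{1}{2}\sum_{i\ne j} J_{ij}\, x_i x_j \;+\; \sum_i \left( -\frac{a}{2}\, x_i^2 + \frac{1}{4}\, x_i^4 \right) , $$

where the first term is the analog Ising Hamiltonian, the on-site term is the single/double-well potential associated with the supercritical pitchfork normal form, gradually increasing a deforms the wells from single to double, and the binary spins are read out as sign(x_i).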

Posted Content
Zeyuan Allen-Zhu
TL;DR: A stochastic algorithm is designed to train any smooth neural network to $\varepsilon$-approximate local minima using $O(\varepsilon^{-3.25})$ backpropagations; more broadly, it finds $\varepsilon$-approximate local minima of any smooth nonconvex function at rate $O(\varepsilon^{-3.25})$, with only oracle access to stochastic gradients.
Abstract: We design a stochastic algorithm to train any smooth neural network to $\varepsilon$-approximate local minima, using $O(\varepsilon^{-3.25})$ backpropagations. The best result was essentially $O(\varepsilon^{-4})$ by SGD. More broadly, it finds $\varepsilon$-approximate local minima of any smooth nonconvex function at rate $O(\varepsilon^{-3.25})$, with only oracle access to stochastic gradients.
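
For reference, the usual definition of an $\varepsilon$-approximate (second-order) local minimum in this line of work, for a function with $\rho$-Lipschitz Hessian, is (conventions for the constants vary slightly between papers)

$$ \lVert \nabla f(x) \rVert \le \varepsilon , \qquad \lambda_{\min}\!\big(\nabla^2 f(x)\big) \ge -\sqrt{\rho\,\varepsilon} . $$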

Journal ArticleDOI
TL;DR: The phase diagrams of numerous materials of technological importance feature high-symmetry high-temperature phases that exhibit phonon instabilities, and the authors propose to compute the free energy of such phases by exploring the system's potential-energy surface through discrete sampling of local minima by means of a lattice gas Monte Carlo approach.
Abstract: The phase diagram of numerous materials of technological importance features high-symmetry high-temperature phases that exhibit phonon instabilities. Leading examples include shape-memory alloys, as well as ferroelectric, refractory, and structural materials. The thermodynamics of these phases have proven challenging to handle by atomistic computational thermodynamic techniques due to the occurrence of constant anharmonicity-driven hopping between local low-symmetry distortions, while maintaining a high-symmetry time-averaged structure. To compute the free energy in such phases, we propose to explore the system's potential-energy surface by discrete sampling of local minima by means of a lattice gas Monte Carlo approach and by continuous sampling by means of a lattice dynamics approach in the vicinity of each local minimum. Given the proximity of the local minima, it is necessary to carefully partition phase space by using a Voronoi tessellation to constrain the domain of integration of the partition function, in order to avoid double-counting artifacts and enable an accurate harmonic treatment near each local minimum. We consider the bcc phase of titanium as a prototypical example to illustrate our approach.
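
In equation form (our notation), the construction described above approximates the partition function as a sum over local minima, each integrated only over its own Voronoi cell $\Omega_i$:

$$ Z \;\approx\; \sum_{i} \int_{\Omega_i} e^{-\beta E(\mathbf{r})}\, d\mathbf{r} , $$

with each cell integral evaluated in the harmonic (lattice-dynamics) approximation around its minimum, so that no region of configuration space is counted twice.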

Journal ArticleDOI
TL;DR: Pearson residual error optimization (PRO) optimizes least mean square error (LSE) reduction while reducing the probability of under- or over-fitting, which ensures optimal PV design parameter identification.

Journal ArticleDOI
TL;DR: A novel technique is described that renders theories of N axions tractable and can be used to efficiently analyze a large class of periodic potentials of arbitrary dimension, and it is found that in a broad class of random theories, the potential is smooth over diameters enhanced by N^{3/2} compared to the typical scale of the potential.
Abstract: We describe a novel technique that renders theories of N axions tractable, and more generally can be used to efficiently analyze a large class of periodic potentials of arbitrary dimension. Such potentials are complex energy landscapes with a number of local minima that scales as $$ \sqrt{N!} $$ , and so for large N appear to be analytically and numerically intractable. Our method is based on uncovering a set of approximate symmetries that exist in addition to the N periods. These approximate symmetries, which are exponentially close to exact, allow us to locate the minima very efficiently and accurately and to analyze other characteristics of the potential. We apply our framework to evaluate the diameters of flat regions suitable for slow-roll inflation, which unifies, corrects and extends several forms of “axion alignment” previously observed in the literature. We find that in a broad class of random theories, the potential is smooth over diameters enhanced by $N^{3/2}$ compared to the typical scale of the potential. A Mathematica implementation of our framework is available online.
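
A typical potential in this class takes the form (notation ours; the paper's conventions may differ)

$$ V(\boldsymbol{\theta}) = \sum_{i=1}^{P} \Lambda_i^4 \left[ 1 - \cos\!\Big( \sum_{j=1}^{N} \mathcal{Q}_{ij}\, \theta_j \Big) \right] , $$

with $P \ge N$ cosine terms coupling the N axions through the charge matrix $\mathcal{Q}$.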

Posted Content
24 Apr 2017
TL;DR: This paper empirically investigates the geometry of the loss functions for state-of-the-art networks with multiple stochastic optimization methods through several experiments that are visualized on polygons, to understand how and when these stochastic optimization methods find local minima.
Abstract: The training of deep neural networks is a high-dimensional optimization problem with respect to the loss function of a model. Unfortunately, these functions are of high dimension and non-convex and hence difficult to characterize. In this paper, we empirically investigate the geometry of the loss functions for state-of-the-art networks with multiple stochastic optimization methods. We do this through several experiments that are visualized on polygons to understand how and when these stochastic optimization methods find minima.
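
A hedged sketch of the kind of slice used in such visualizations (not the authors' code): evaluate a loss along straight-line interpolations between parameter vectors found by different optimizers; the edges of the resulting "polygon", whose vertices are the located minima, reveal barriers and the local flatness around each minimum.

```python
import numpy as np

def loss(w):                                   # stand-in loss; a real network's loss goes here
    return float(np.sum((w ** 2 - 1.0) ** 2))

w_a = np.array([1.0, -1.0, 1.0])               # pretend: minimum found by optimizer A
w_b = np.array([-1.0, 1.0, 1.0])               # pretend: minimum found by optimizer B

for alpha in np.linspace(0.0, 1.0, 11):
    w = (1.0 - alpha) * w_a + alpha * w_b      # one edge of the polygon
    print(f"alpha={alpha:.1f}  loss={loss(w):.3f}")
```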

Posted Content
24 Apr 2017
TL;DR: It is demonstrated that in this scenario one can construct counter-examples (datasets or initialization schemes) for which the network does become susceptible to bad local minima over the weight space.
Abstract: There has been a lot of recent interest in trying to characterize the error surface of deep models. This stems from a long-standing question. Given that deep networks are highly nonlinear systems optimized by local gradient methods, why do they not seem to be affected by bad local minima? It is widely believed that training of deep models using gradient methods works so well because the error surface either has no local minima, or if they exist they need to be close in value to the global minimum. It is known that such results hold under strong assumptions which are not satisfied by real models. In this paper we present examples showing that for such theorems to be true, additional assumptions on the data, initialization schemes and/or the model classes have to be made. We look at the particular case of finite size datasets. We demonstrate that in this scenario one can construct counter-examples (datasets or initialization schemes) for which the network does become susceptible to bad local minima over the weight space.

Journal ArticleDOI
TL;DR: The results of numerical experiments show that, although the proposed parallel EGO algorithm needs more evaluations to find the optimum compared to the standard EGO algorithm, it is able to reduce the number of optimization cycles.
Abstract: Most parallel efficient global optimization (EGO) algorithms focus only on the parallel architectures for producing multiple updating points, but pay little attention to the balance between the global search (i.e., sampling in different areas of the search space) and local search (i.e., sampling more intensely in one promising area of the search space) of the updating points. In this study, a novel approach is proposed to apply this idea to further accelerate the search of parallel EGO algorithms. In each cycle of the proposed algorithm, all local maxima of the expected improvement (EI) function are identified by a multi-modal optimization algorithm. Then the local EI maxima with values greater than a threshold are selected, and candidates are sampled around these selected EI maxima. The results of numerical experiments show that, although the proposed parallel EGO algorithm needs more evaluations to find the optimum compared to the standard EGO algorithm, it is able to reduce the number of optimization cycles. Moreover, the proposed parallel EGO algorithm achieves better results in terms of both the number of cycles and the number of evaluations compared to a state-of-the-art parallel EGO algorithm over six test problems.
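
For reference, the expected improvement criterion whose local maxima are enumerated above is the standard one for minimization, with Kriging posterior mean $\mu(x)$ and standard deviation $\sigma(x)$, best observed value $f_{\min}$, and $\Phi$, $\phi$ the standard normal CDF and PDF:

$$ \mathrm{EI}(x) = \big(f_{\min}-\mu(x)\big)\,\Phi\!\left(\frac{f_{\min}-\mu(x)}{\sigma(x)}\right) + \sigma(x)\,\phi\!\left(\frac{f_{\min}-\mu(x)}{\sigma(x)}\right) . $$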