
Showing papers by "Stanley Osher published in 2018"


Journal ArticleDOI
TL;DR: This work discusses how to impose boundary conditions on irregular domains and free boundaries, as well as the extension of level-set methods to adaptive Cartesian grids and parallel architectures.

289 citations


Journal ArticleDOI
TL;DR: Stochastic homogenization theory allows us to better understand the convergence of the algorithm, and a stochastic control interpretation is used to prove that a modified algorithm converges faster than SGD in expectation.
Abstract: Entropy-SGD is a first-order optimization method which has been used successfully to train deep neural networks. This algorithm, which was motivated by statistical physics, is now interpreted as gradient descent on a modified loss function. The modified, or relaxed, loss function is the solution of a viscous Hamilton–Jacobi partial differential equation (PDE). Experimental results on modern, high-dimensional neural networks demonstrate that the algorithm converges faster than the benchmark stochastic gradient descent (SGD). Well-established PDE regularity results allow us to analyze the geometry of the relaxed energy landscape, confirming empirical evidence. Stochastic homogenization theory allows us to better understand the convergence of the algorithm. A stochastic control interpretation is used to prove that a modified algorithm converges faster than SGD in expectation.

135 citations
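For illustration, here is a minimal NumPy sketch of one Entropy-SGD outer step: the gradient of the relaxed (local-entropy) loss is estimated with a short inner Langevin loop, then the weights move against that gradient. The step sizes, noise level, and averaging rate are illustrative assumptions, not values from the paper.

```python
import numpy as np

def entropy_sgd_step(x, grad_f, eta=0.1, gamma=0.01, sgld_steps=20,
                     sgld_lr=0.01, noise=1e-3, alpha=0.25, rng=None):
    # One outer step of Entropy-SGD.  The gradient of the relaxed
    # (local-entropy) loss at x is (x - <y>) / gamma, where <y> is the mean of
    # the Gibbs measure  p(y) ~ exp(-f(y) - |y - x|^2 / (2*gamma)).
    # The mean is estimated with a short stochastic-gradient Langevin loop.
    rng = np.random.default_rng() if rng is None else rng
    y, mu = x.copy(), x.copy()
    for _ in range(sgld_steps):
        g = grad_f(y) + (y - x) / gamma                    # gradient of the Gibbs potential
        y = y - sgld_lr * g + noise * np.sqrt(sgld_lr) * rng.standard_normal(x.shape)
        mu = (1.0 - alpha) * mu + alpha * y                # running estimate of <y>
    return x - eta * (x - mu) / gamma                      # descend the relaxed loss
```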


Journal ArticleDOI
TL;DR: This paper adopts a primal-dual algorithm from compressed sensing and image processing, which uses very simple updates at each iteration and is shown to converge very rapidly.
Abstract: We propose a new algorithm to approximate the Earth Mover’s distance (EMD). Our main idea is motivated by the theory of optimal transport, in which EMD can be reformulated as a familiar $L_1$-type minimization. We use a regularization which gives us a unique solution for this $L_1$-type problem. The new regularized minimization is very similar to problems which have been solved in the fields of compressed sensing and image processing, where several fast methods are available. In this paper, we adopt a primal-dual algorithm developed in those fields, which uses very simple updates at each iteration and is shown to converge very rapidly. Several numerical examples are provided.

78 citations
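As a concrete illustration of the primal-dual iteration, the NumPy sketch below approximates EMD between two densities of equal total mass on a periodic 1D grid, by solving min_m ||m||_1 subject to div(m) + rho1 - rho0 = 0. The 1D setting, periodic boundary conditions, and step sizes are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def emd_1d(rho0, rho1, h=1.0, mu=1e-2, tau=1e-2, iters=50000):
    # Approximate EMD(rho0, rho1) on a periodic 1D grid (equal total mass assumed)
    # by solving   min_m ||m||_1   subject to   div(m) + rho1 - rho0 = 0
    # with a Chambolle-Pock style primal-dual iteration (shrink = soft threshold).
    # Stability requires mu * tau * ||div||^2 <= 1, i.e. roughly mu * tau <= h^2 / 4.
    shrink = lambda v, lam: np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
    grad = lambda p: (np.roll(p, -1) - p) / h      # forward difference
    div = lambda f: (f - np.roll(f, 1)) / h        # backward difference (= -grad^T)
    m = np.zeros_like(rho0)                        # flux (primal variable)
    phi = np.zeros_like(rho0)                      # Kantorovich potential (dual variable)
    for _ in range(iters):
        m_new = shrink(m + mu * grad(phi), mu)
        phi = phi + tau * (div(2.0 * m_new - m) + rho1 - rho0)
        m = m_new
    return h * np.abs(m).sum()
```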


Journal ArticleDOI
TL;DR: Newton’s method, which converges to the minimizer at a quadratic rate, is adopted to approximate the optimal transport distance.
Abstract: We propose a fast algorithm to approximate the optimal transport distance. The main idea is to add a Fisher information regularization to the dynamical formulation of the problem, originally introduced by Benamou and Brenier. The regularized problem is shown to be smooth and strictly convex, thus many classical fast algorithms are available. In this paper, we adopt Newton’s method, which converges to the minimizer at a quadratic rate. Several numerical examples are provided.

49 citations


Posted Content
TL;DR: A class of very simple modifications of gradient descent and stochastic gradient descent can dramatically reduce the variance, allow a larger step size, and improve the generalization accuracy when applied to a large variety of machine learning problems.
Abstract: We propose a class of very simple modifications of gradient descent and stochastic gradient descent. We show that when applied to a large variety of machine learning problems, ranging from logistic regression to deep neural nets, the proposed surrogates can dramatically reduce the variance, allow a larger step size, and improve the generalization accuracy. The methods only involve multiplying the usual (stochastic) gradient by the inverse of a positive definite matrix (which can be computed efficiently by FFT) with a low condition number coming from a one-dimensional discrete Laplacian or its high-order generalizations. The surrogate gradient also preserves the mean, increases the smallest component, and decreases the largest component. The theory of Hamilton-Jacobi partial differential equations demonstrates that the implicit version of the new algorithm is almost the same as doing gradient descent on a new function which (i) has the same global minima as the original function and (ii) is "more convex". Moreover, we show that optimization algorithms with these surrogates converge uniformly in the discrete Sobolev $H_\sigma^p$ sense and reduce the optimality gap for convex optimization problems. The code is available at: \url{this https URL}

34 citations
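The surrogate described above amounts to multiplying the (stochastic) gradient by the inverse of I - sigma * Laplacian for a one-dimensional discrete Laplacian; with periodic boundary conditions that matrix is circulant, so the solve reduces to a pair of FFTs. A minimal NumPy sketch, with the step size and sigma as illustrative choices:

```python
import numpy as np

def laplacian_smooth(grad, sigma=1.0):
    # Multiply the flattened gradient by (I - sigma * Laplacian)^{-1}, where the
    # 1D discrete Laplacian uses periodic boundary conditions.  The matrix is
    # circulant, so it is diagonalized by the DFT and the solve costs two FFTs.
    g = np.asarray(grad, dtype=float).ravel()
    n = g.size
    k = np.arange(n)
    eig = 1.0 + 2.0 * sigma - 2.0 * sigma * np.cos(2.0 * np.pi * k / n)
    smoothed = np.real(np.fft.ifft(np.fft.fft(g) / eig))
    return smoothed.reshape(np.shape(grad))

def smoothed_gradient_descent(grad_fn, x0, lr=0.1, sigma=1.0, steps=100):
    # Plain gradient descent driven by the Laplacian-smoothed surrogate gradient.
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * laplacian_smooth(grad_fn(x), sigma)
    return x
```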


Journal ArticleDOI
01 Jan 2018
TL;DR: A method for solving a large class of non-convex Hamilton-Jacobi partial differential equations (HJ PDE) that yields decoupled subproblems, which can be solved in an embarrassingly parallel fashion.
Abstract: In this paper, we develop a method for solving a large class of non-convex Hamilton-Jacobi partial differential equations (HJ PDE). The method yields decoupled subproblems, which can be solved in an embarrassingly parallel fashion. The complexity of the resulting algorithm is polynomial in the problem dimension; hence, it overcomes the curse of dimensionality [1, 2]. We extend previous work in [6] and apply the Hopf formula to solve HJ PDE involving non-convex Hamiltonians. We propose an ADMM approach for finding the minimizer associated with the Hopf formula. Some explicit formulae of proximal maps, as well as newly-defined stretch operators, are used in the numerical solutions of ADMM subproblems. Our approach is expected to have wide applications in continuous dynamic games, control theory problems, and elsewhere.

32 citations
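To see why the Hopf formula decouples the problem, the sketch below evaluates u(x, t) at a single point by minimizing the Hopf objective with an off-the-shelf quasi-Newton solver, rather than the ADMM splitting developed in the paper. The quadratic (convex) Hamiltonian and initial data are illustrative choices with a known closed form.

```python
import numpy as np
from scipy.optimize import minimize

def hopf_solution(x, t, H, J_star):
    # Hopf formula for u_t + H(grad u) = 0, u(., 0) = J with J convex:
    #     u(x, t) = sup_p { <x, p> - J*(p) - t * H(p) }.
    # Each (x, t) is an independent, grid-free, unconstrained optimization,
    # which is what makes the approach embarrassingly parallel.
    obj = lambda p: J_star(p) + t * H(p) - x @ p
    res = minimize(obj, np.zeros_like(x), method="BFGS")
    return -res.fun

# Illustrative data: J(x) = |x|^2/2 (so J*(p) = |p|^2/2) and H(p) = |p|^2/2,
# for which the exact solution is |x|^2 / (2 * (1 + t)).
x, t = np.array([1.0, -2.0, 0.5, 3.0]), 0.4
u = hopf_solution(x, t, H=lambda p: 0.5 * p @ p, J_star=lambda p: 0.5 * p @ p)
assert abs(u - 0.5 * (x @ x) / (1.0 + t)) < 1e-6
```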


Posted Content
TL;DR: In this article, the authors propose a simple yet effective ResNets ensemble algorithm to boost the accuracy of the robustly trained model on both clean and adversarial images, which leads to a robust model with a natural accuracy of 85.62% on clean images and a robust accuracy of 57.94% under 20 iterations of the IFGSM attack on CIFAR10.
Abstract: Empirical adversarial risk minimization (EARM) is a widely used mathematical framework to robustly train deep neural nets (DNNs) that are resistant to adversarial attacks. However, both natural and robust accuracies, in classifying clean and adversarial images, respectively, of the trained robust models are far from satisfactory. In this work, we unify the theory of optimal control of transport equations with the practice of training and testing of ResNets. Based on this unified viewpoint, we propose a simple yet effective ResNets ensemble algorithm to boost the accuracy of the robustly trained model on both clean and adversarial images. The proposed algorithm consists of two components: First, we modify the base ResNets by injecting Gaussian noise of a specified variance into the output of each residual mapping. Second, we average over the outputs of multiple jointly trained modified ResNets to get the final prediction. These two steps give an approximation to the Feynman-Kac formula for representing the solution of a transport equation with viscosity, or a convection-diffusion equation. For the CIFAR10 benchmark, this simple algorithm leads to a robust model with a natural accuracy of 85.62% on clean images and a robust accuracy of 57.94% under 20 iterations of the IFGSM attack, which outperforms the current state of the art in defending against IFGSM attacks on CIFAR10. Both natural and robust accuracies of the proposed ResNets ensemble can be improved dynamically as the building-block ResNet advances. The code is available at: \url{this https URL}.

28 citations
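A minimal PyTorch sketch of the two components: Gaussian noise injected after each residual mapping, and averaging of the ensemble members' class probabilities. The block architecture and noise level are illustrative, not the exact ResNet configuration used in the paper.

```python
import torch
import torch.nn as nn

class NoisyResidualBlock(nn.Module):
    # Residual block whose residual mapping is perturbed by Gaussian noise
    # during training (noise level here is an illustrative choice).
    def __init__(self, channels, noise_std=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels))
        self.noise_std = noise_std

    def forward(self, x):
        out = self.body(x)
        if self.training and self.noise_std > 0:
            out = out + self.noise_std * torch.randn_like(out)
        return torch.relu(x + out)

def ensemble_predict(models, x):
    # Average the class probabilities of several jointly trained noisy ResNets.
    return torch.stack([torch.softmax(m(x), dim=1) for m in models]).mean(dim=0)
```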


Journal ArticleDOI
01 Apr 2018
TL;DR: Presented is a new method for calculating the time-optimal guidance control for a multiple-vehicle pursuit-evasion system; the value function is computed efficiently in high-dimensional space, without a discrete grid, using the generalized Hopf formula.
Abstract: Presented is a new method for calculating the time-optimal guidance control for a multiple vehicle pursuit-evasion system. A joint differential game of $k$ pursuing vehicles relative to the evader is constructed, and a Hamilton–Jacobi–Isaacs equation that describes the evolution of the value function is formulated. The value function is built such that the terminal cost is the squared distance from the boundary of the terminal surface. Additionally, all vehicles are assumed to have bounded controls. Typically, a joint state space constructed in this way would have too large a dimension to be solved with existing grid-based approaches. The value function is computed efficiently in high-dimensional space, without a discrete grid, using the generalized Hopf formula. The optimal time-to-reach is iteratively solved, and the optimal control is inferred from the gradient of the value function.

26 citations


Book ChapterDOI
06 Oct 2018
TL;DR: In this paper, the Wasserstein-2 metric proximal is applied to the generators to define a parametrization-invariant natural gradient by pulling back optimal transport structures from probability space to parameter space.
Abstract: We introduce a new method for training generative adversarial networks by applying the Wasserstein-2 metric proximal to the generators. The approach is based on Wasserstein information geometry. It defines a parametrization-invariant natural gradient by pulling back optimal transport structures from probability space to parameter space. We obtain easy-to-implement iterative regularizers for the parameter updates of implicit deep generative models. Our experiments demonstrate that this method improves the speed and stability of training in terms of wall-clock time and Fréchet Inception Distance.

19 citations
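One simple instance of such an iterative regularizer, sketched below under stated assumptions: penalize the squared distance between the current and previous generators' outputs on the same latent batch, which approximates a proximal step in the Wasserstein-2 metric on generated samples rather than on raw parameters. The WGAN-style generator loss, names, and weight lam are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def proximal_generator_loss(generator, generator_prev, discriminator, z, lam=1.0):
    # Adversarial generator loss plus a Wasserstein-2 style proximal penalty:
    # the squared distance between current and previous generator outputs on
    # the same latent batch (an easy-to-implement surrogate for a proximal
    # step on the generated distribution).
    fake = generator(z)
    adv = -discriminator(fake).mean()            # WGAN-style generator loss (assumption)
    with torch.no_grad():
        fake_prev = generator_prev(z)            # frozen copy from the previous update
    diff = (fake - fake_prev).flatten(start_dim=1)
    prox = diff.pow(2).sum(dim=1).mean()
    return adv + prox / (2.0 * lam)
```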


Journal ArticleDOI
TL;DR: The experimental results show that the computational cost is reduced significantly with the help of WNLL, and the results in image inpainting and denoising are also better than the original LDMM and competitive with state-of-the-art methods.
Abstract: In this paper we use the idea of the weighted nonlocal Laplacian (WNLL) (Shi et al. in J Sci Comput, 2017) to deal with the constraints in the low dimensional manifold model (LDMM) (Osher et al. in SIAM J Imaging Sci, 2017). In the original LDMM, the constraints are enforced by the point integral method. The point integral method provides a correct way to deal with the constraints; however, it is not very efficient because the symmetry of the original Laplace–Beltrami operator is destroyed. WNLL provides another way to enforce the constraints in LDMM. In WNLL, the discretized system is symmetric and sparse, and hence it can be solved very fast. Our experimental results show that the computational cost is reduced significantly with the help of WNLL. Moreover, the results in image inpainting and denoising are also better than the original LDMM and competitive with state-of-the-art methods.

18 citations
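A small dense-matrix sketch of the weighted nonlocal Laplacian interpolation itself: given a symmetric weight matrix W and values on a labeled subset S, the unlabeled values solve a symmetric positive-definite linear system in which the labeled terms are upweighted by mu = |P| / |S|. The normal equations below follow from a standard calculation on the WNLL energy; the dense solve is for brevity, not the paper's implementation.

```python
import numpy as np

def wnll_interpolate(W, labeled_idx, labels):
    # Weighted nonlocal Laplacian (WNLL) interpolation on a graph.
    # Minimizes  sum_{i not in S} sum_j w_ij (u_i - u_j)^2
    #          + (|P|/|S|) * sum_{i in S} sum_j w_ij (u_i - u_j)^2
    # over u, with u fixed to `labels` on the labeled set S.
    n = W.shape[0]
    S = np.asarray(labeled_idx)
    U = np.setdiff1d(np.arange(n), S)
    mu = n / len(S)                                   # scaling weight |P| / |S|
    W_UU, W_US = W[np.ix_(U, U)], W[np.ix_(U, S)]
    # Symmetric positive-definite system for the unlabeled values.
    A = 2.0 * (np.diag(W_UU.sum(1)) - W_UU) + (1.0 + mu) * np.diag(W_US.sum(1))
    b = (1.0 + mu) * (W_US @ labels)
    u = np.zeros(n)
    u[S] = labels
    u[U] = np.linalg.solve(A, b)
    return u
```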


Journal ArticleDOI
TL;DR: In this paper, band-limited a priori knowledge of the Fourier or Radon spectrum is used to obtain the geometric parameters in the data, such as the dominant slope or curvature.
Abstract: We propose a new decomposition algorithm for seismic data based on band-limited a priori knowledge of the Fourier or Radon spectrum. This decomposition is called geometric mode decomposition (GMD), as it decomposes a 2D signal into components consisting of linear or parabolic features. Rather than using a predefined frame, GMD adaptively obtains the geometric parameters in the data, such as the dominant slope or curvature. GMD is solved by alternately pursuing the geometric parameters and the corresponding modes in the Fourier or Radon domain. The geometric parameters are obtained from the weighted center of the corresponding mode's energy spectrum. The mode is obtained by applying a Wiener filter whose design is based on a certain band-limited property. We apply GMD to seismic event splitting, noise attenuation, interpolation, and demultiple. The results show that our method is a promising adaptive tool for seismic signal processing, in comparison with the Fourier and curvelet transforms, empirical mode decomposition (EMD), and variational mode decomposition (VMD) methods.

Journal ArticleDOI
TL;DR: This work proposes a new algorithm to solve the unbalanced and partial $L_1$ Monge–Kantorovich problems that is scalable and parallel, with iterations that are conceptually simple, computationally cheap, and easy to parallelize.
Abstract: We propose a new algorithm to solve the unbalanced and partial $L_1$ Monge–Kantorovich problems. The proposed method is a first-order primal-dual method that is scalable and parallel. The method's iterations are conceptually simple, computationally cheap, and easy to parallelize. We provide several numerical examples solved on a CUDA GPU, which demonstrate the method's practical effectiveness.

Posted Content
TL;DR: This work improves the robustness of deep neural nets to adversarial attacks by using an interpolating function as the output activation, and combines this data-dependent activation with total variation minimization on adversarial images and training data augmentation.
Abstract: We improve the robustness of deep neural nets (DNNs) to adversarial attacks by using an interpolating function as the output activation. This data-dependent activation remarkably improves both the generalization and robustness of DNNs. On the CIFAR10 benchmark, we raise the robust accuracy of the adversarially trained ResNet20 from about 46% to about 69% under the state-of-the-art Iterative Fast Gradient Sign Method (IFGSM) based adversarial attack. When we combine this data-dependent activation with total variation minimization on adversarial images and training data augmentation, we achieve an improvement in robust accuracy of 38.9% for ResNet56 under the strongest IFGSM attack. Furthermore, we provide an intuitive explanation of our defense by analyzing the geometry of the feature space.

Posted Content
TL;DR: It is proved that, within a certain regime, the untargeted FGSM can fool any convolutional neural net (CNN) with ReLU activation, and the targeted FGSM can mislead any CNN with ReLU activation into classifying any given image as any prescribed class.
Abstract: In this paper, we analyze the efficacy of the fast gradient sign method (FGSM) and the Carlini-Wagner L2 (CW-L2) attack. We prove that, within a certain regime, the untargeted FGSM can fool any convolutional neural net (CNN) with ReLU activation, and the targeted FGSM can mislead any CNN with ReLU activation into classifying any given image as any prescribed class. For a special two-layer neural network, a linear layer followed by the softmax output activation, we show that the CW-L2 attack increases the ratio of the classification probability between the target and ground-truth classes. Moreover, we provide numerical results to verify all our theoretical results.
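For reference, a minimal PyTorch sketch of the FGSM attack analyzed above, in both untargeted and targeted form; the [0, 1] clamp assumes image pixels normalized to that range.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon, targeted=False):
    # Fast gradient sign method.
    # Untargeted: take one epsilon-sized step up the loss of the true label y.
    # Targeted:   take one epsilon-sized step down the loss of a target label y.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    step = -epsilon if targeted else epsilon
    return (x_adv + step * x_adv.grad.sign()).detach().clamp(0.0, 1.0)
```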

Posted Content
26 Nov 2018
TL;DR: It is shown that, with projected gradient descent adversarial training, even an ensemble of two ResNet20s achieves 5% higher accuracy against the strongest iterative fast gradient sign attack than the state-of-the-art adversarial defense algorithm.
Abstract: We propose a simple yet powerful ResNet ensemble algorithm which consists of two components: First, we modify the base ResNet by adding Gaussian noise of a specified variance to the output of each original residual mapping. Second, we average over the outputs of multiple parallel and jointly trained modified ResNets to get the final prediction. Heuristically, these two simple steps give an approximation to the well-known Feynman-Kac formula for representing the solution of a transport equation with viscosity, or a convection-diffusion equation. This simple ensemble algorithm improves neural nets' generalizability and robustness to adversarial attacks. In particular, for the CIFAR10 benchmark with projected gradient descent adversarial training, we show that even an ensemble of two ResNet20s leads to a 5% higher accuracy against the strongest iterative fast gradient sign attack than the state-of-the-art adversarial defense algorithm.

Journal ArticleDOI
TL;DR: A low dimensional manifold model is applied to interpolate scientific data from regular and irregular samplings with a significant amount of missing information, via alternating minimization with respect to the manifold and the data set.

Proceedings ArticleDOI
01 Aug 2018
TL;DR: This work presents a primal-dual method for efficient numerical solution and shows how the resulting optimal trajectory can be generated directly from the solution of the Hopf formula, without further optimization.
Abstract: Presented is a method for efficient computation of the Hamilton-Jacobi (HJ) equation for time-optimal control problems using the generalized Hopf formula. Typically, numerical methods to solve the HJ equation rely on a discrete grid of the solution space and exhibit exponential scaling with dimension. The generalized Hopf formula avoids the use of grids and numerical gradients by formulating an unconstrained convex optimization problem. The solution at each point is completely independent, which allows a massively parallel implementation if solutions at multiple points are desired. This work presents a primal-dual method for efficient numerical solution and shows how the resulting optimal trajectory can be generated directly from the solution of the Hopf formula, without further optimization. Examples presented have execution times on the order of milliseconds, and experiments show that computation scales approximately polynomially in dimension with very small high-order coefficients.

Posted Content
TL;DR: The analysis of the convergence of the weighted nonlocal Laplacian (WNLL) on high dimensional randomly distributed data reveals the importance of the scaling weight of WNLL for high dimensional data interpolation.
Abstract: We analyze the convergence of the weighted nonlocal Laplacian (WNLL) on high dimensional randomly distributed data. The analysis reveals the importance of the scaling weight $\mu \sim |P|/|S|$, where $|P|$ and $|S|$ are the numbers of total and labeled data points, respectively. The result gives a theoretical foundation of WNLL for high dimensional data interpolation.

Journal ArticleDOI
TL;DR: A parallel method for solving the eikonal equation associated with level set redistancing using the Hopf–Lax formulation and extending the work of Lee et al.

Posted Content
TL;DR: Novel deep neural network structures that can be inherited from all existing DNNs with almost the same level of complexity are proposed, and it is shown that the paradigm successfully resolves the lack-of-data issue.
Abstract: Though deep neural networks (DNNs) achieve remarkable performance in many artificial intelligence tasks, the lack of training instances remains a notorious challenge. As the network goes deeper, the generalization accuracy decays rapidly when massive amounts of training data are lacking. In this paper, we propose novel deep neural network structures that can be inherited from all existing DNNs with almost the same level of complexity, and develop simple training algorithms. We show that our paradigm successfully resolves the lack-of-data issue. Tests on the CIFAR10 and CIFAR100 image recognition datasets show that the new paradigm leads to 20% to 30% relative error rate reduction compared to the base DNNs. The intuition behind our algorithms for deep residual networks stems from the theory of partial differential equation (PDE) control problems. Code will be made available.

Posted Content
TL;DR: This work derives a simple formulation for dynamical optimal transport problems constrained to a parameterized probability subset, as arise in application problems such as deep learning.
Abstract: We propose dynamical optimal transport (OT) problems constrained to a parameterized probability subset. In application problems such as deep learning, the probability distribution is often generated by a parameterized mapping function. In this case, we derive a simple formulation for the constrained dynamical OT.

Proceedings ArticleDOI
01 Sep 2018
TL;DR: In this paper, the dimension of the manifold is directly used as a regularizer in a variational functional, which is solved efficiently by alternating direction of minimization and weighted nonlocal Laplacian.
Abstract: We present a scalable low dimensional manifold model for the reconstruction of noisy and incomplete hyperspectral images. The model is based on the observation that the spatial-spectral blocks of a hyperspectral image typically lie close to a collection of low dimensional manifolds. To exploit this, the dimension of the manifold is directly used as a regularizer in a variational functional, which is solved efficiently by alternating direction of minimization and the weighted nonlocal Laplacian. Unlike general 3D images, a hyperspectral image can share the same similarity matrix across all spectral bands; therefore, the resulting algorithm is much more scalable than that for general 3D data [1]. Numerical experiments on the reconstruction of hyperspectral images from sparse and noisy sampling demonstrate the superiority of our proposed algorithm in terms of both speed and accuracy.