
Showing papers by "Stanley Osher published in 2018"


Journal ArticleDOI
TL;DR: This work discusses how to impose boundary conditions on irregular domains and free boundaries, as well as the extension of level-set methods to adaptive Cartesian grids and parallel architectures.

289 citations


Journal ArticleDOI
TL;DR: Stochastic homogenization theory allows us to better understand the convergence of the algorithm, and a stochastic control interpretation is used to prove that a modified algorithm converges faster than SGD in expectation.
Abstract: Entropy-SGD is a first-order optimization method which has been used successfully to train deep neural networks. This algorithm, which was motivated by statistical physics, is now interpreted as gradient descent on a modified loss function. The modified, or relaxed, loss function is the solution of a viscous Hamilton–Jacobi partial differential equation (PDE). Experimental results on modern, high-dimensional neural networks demonstrate that the algorithm converges faster than the benchmark stochastic gradient descent (SGD). Well-established PDE regularity results allow us to analyze the geometry of the relaxed energy landscape, confirming empirical evidence. Stochastic homogenization theory allows us to better understand the convergence of the algorithm. A stochastic control interpretation is used to prove that a modified algorithm converges faster than SGD in expectation.

135 citations
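For illustration, here is a minimal NumPy sketch of one Entropy-SGD outer step: the gradient of the relaxed (local-entropy) loss is estimated with a short inner Langevin loop, then the weights move against that gradient. The step sizes, noise level, and averaging rate are illustrative assumptions, not values from the paper.

```python
import numpy as np

def entropy_sgd_step(x, grad_f, eta=0.1, gamma=0.01, sgld_steps=20,
                     sgld_lr=0.01, noise=1e-3, alpha=0.25, rng=None):
    # One outer step of Entropy-SGD.  The gradient of the relaxed
    # (local-entropy) loss at x is (x - <y>) / gamma, where <y> is the mean of
    # the Gibbs measure  p(y) ~ exp(-f(y) - |y - x|^2 / (2*gamma)).
    # The mean is estimated with a short stochastic-gradient Langevin loop.
    rng = np.random.default_rng() if rng is None else rng
    y, mu = x.copy(), x.copy()
    for _ in range(sgld_steps):
        g = grad_f(y) + (y - x) / gamma                    # gradient of the Gibbs potential
        y = y - sgld_lr * g + noise * np.sqrt(sgld_lr) * rng.standard_normal(x.shape)
        mu = (1.0 - alpha) * mu + alpha * y                # running estimate of <y>
    return x - eta * (x - mu) / gamma                      # descend the relaxed loss
```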


Journal ArticleDOI
TL;DR: This paper adopts a primal-dual algorithm from compressed sensing and image processing, which uses very simple updates at each iteration and is shown to converge very rapidly.
Abstract: We propose a new algorithm to approximate the Earth Mover’s distance (EMD). Our main idea is motivated by the theory of optimal transport, in which EMD can be reformulated as a familiar $L_1$-type minimization. We use a regularization which gives us a unique solution for this $L_1$-type problem. The new regularized minimization is very similar to problems which have been solved in the fields of compressed sensing and image processing, where several fast methods are available. In this paper, we adopt a primal-dual algorithm developed in those fields, which uses very simple updates at each iteration and is shown to converge very rapidly. Several numerical examples are provided.

78 citations
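As a concrete illustration of the primal-dual iteration, the NumPy sketch below approximates EMD between two densities of equal total mass on a periodic 1D grid, by solving min_m ||m||_1 subject to div(m) + rho1 - rho0 = 0. The 1D setting, periodic boundary conditions, and step sizes are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def emd_1d(rho0, rho1, h=1.0, mu=1e-2, tau=1e-2, iters=50000):
    # Approximate EMD(rho0, rho1) on a periodic 1D grid (equal total mass assumed)
    # by solving   min_m ||m||_1   subject to   div(m) + rho1 - rho0 = 0
    # with a Chambolle-Pock style primal-dual iteration (shrink = soft threshold).
    # Stability requires mu * tau * ||div||^2 <= 1, i.e. roughly mu * tau <= h^2 / 4.
    shrink = lambda v, lam: np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
    grad = lambda p: (np.roll(p, -1) - p) / h      # forward difference
    div = lambda f: (f - np.roll(f, 1)) / h        # backward difference (= -grad^T)
    m = np.zeros_like(rho0)                        # flux (primal variable)
    phi = np.zeros_like(rho0)                      # Kantorovich potential (dual variable)
    for _ in range(iters):
        m_new = shrink(m + mu * grad(phi), mu)
        phi = phi + tau * (div(2.0 * m_new - m) + rho1 - rho0)
        m = m_new
    return h * np.abs(m).sum()
```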


Journal ArticleDOI
TL;DR: Newton’s method, which converges to the minimizer at a quadratic rate, is adopted to approximate the optimal transport distance.
Abstract: We propose a fast algorithm to approximate the optimal transport distance. The main idea is to add a Fisher information regularization to the dynamical formulation of the problem, originally introduced by Benamou and Brenier. The regularized problem is shown to be smooth and strictly convex, thus many classical fast algorithms are available. In this paper, we adopt Newton’s method, which converges to the minimizer at a quadratic rate. Several numerical examples are provided.

49 citations


Posted Content
TL;DR: A class of very simple modifications of gradient descent and stochastic gradient descent can dramatically reduce the variance, allow a larger step size, and improve the generalization accuracy when applied to a large variety of machine learning problems.
Abstract: We propose a class of very simple modifications of gradient descent and stochastic gradient descent. We show that when applied to a large variety of machine learning problems, ranging from logistic regression to deep neural nets, the proposed surrogates can dramatically reduce the variance, allow a larger step size, and improve the generalization accuracy. The methods only involve multiplying the usual (stochastic) gradient by the inverse of a positive definite matrix (which can be computed efficiently by FFT) with a low condition number coming from a one-dimensional discrete Laplacian or its high-order generalizations. The surrogate gradient also preserves the mean, increases the smallest component, and decreases the largest component. The theory of Hamilton-Jacobi partial differential equations demonstrates that the implicit version of the new algorithm is almost the same as doing gradient descent on a new function which (i) has the same global minima as the original function and (ii) is "more convex". Moreover, we show that optimization algorithms with these surrogates converge uniformly in the discrete Sobolev $H_\sigma^p$ sense and reduce the optimality gap for convex optimization problems. The code is available at: \url{this https URL}

34 citations
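The surrogate described above amounts to multiplying the (stochastic) gradient by the inverse of I - sigma * Laplacian for a one-dimensional discrete Laplacian; with periodic boundary conditions that matrix is circulant, so the solve reduces to a pair of FFTs. A minimal NumPy sketch, with the step size and sigma as illustrative choices:

```python
import numpy as np

def laplacian_smooth(grad, sigma=1.0):
    # Multiply the flattened gradient by (I - sigma * Laplacian)^{-1}, where the
    # 1D discrete Laplacian uses periodic boundary conditions.  The matrix is
    # circulant, so it is diagonalized by the DFT and the solve costs two FFTs.
    g = np.asarray(grad, dtype=float).ravel()
    n = g.size
    k = np.arange(n)
    eig = 1.0 + 2.0 * sigma - 2.0 * sigma * np.cos(2.0 * np.pi * k / n)
    smoothed = np.real(np.fft.ifft(np.fft.fft(g) / eig))
    return smoothed.reshape(np.shape(grad))

def smoothed_gradient_descent(grad_fn, x0, lr=0.1, sigma=1.0, steps=100):
    # Plain gradient descent driven by the Laplacian-smoothed surrogate gradient.
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * laplacian_smooth(grad_fn(x), sigma)
    return x
```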


Journal ArticleDOI
01 Jan 2018
TL;DR: A method for solving a large class of non-convex Hamilton-Jacobi partial differential equations (HJ PDE) that yields decoupled subproblems, which can be solved in an embarrassingly parallel fashion.
Abstract: In this paper, we develop a method for solving a large class of non-convex Hamilton-Jacobi partial differential equations (HJ PDE). The method yields decoupled subproblems, which can be solved in an embarrassingly parallel fashion. The complexity of the resulting algorithm is polynomial in the problem dimension; hence, it overcomes the curse of dimensionality [1, 2]. We extend previous work in [6] and apply the Hopf formula to solve HJ PDE involving non-convex Hamiltonians. We propose an ADMM approach for finding the minimizer associated with the Hopf formula. Some explicit formulae of proximal maps, as well as newly-defined stretch operators, are used in the numerical solutions of ADMM subproblems. Our approach is expected to have wide applications in continuous dynamic games, control theory problems, and elsewhere.

32 citations
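To see why the Hopf formula decouples the problem, the sketch below evaluates u(x, t) at a single point by minimizing the Hopf objective with an off-the-shelf quasi-Newton solver, rather than the ADMM splitting developed in the paper. The quadratic (convex) Hamiltonian and initial data are illustrative choices with a known closed form.

```python
import numpy as np
from scipy.optimize import minimize

def hopf_solution(x, t, H, J_star):
    # Hopf formula for u_t + H(grad u) = 0, u(., 0) = J with J convex:
    #     u(x, t) = sup_p { <x, p> - J*(p) - t * H(p) }.
    # Each (x, t) is an independent, grid-free, unconstrained optimization,
    # which is what makes the approach embarrassingly parallel.
    obj = lambda p: J_star(p) + t * H(p) - x @ p
    res = minimize(obj, np.zeros_like(x), method="BFGS")
    return -res.fun

# Illustrative data: J(x) = |x|^2/2 (so J*(p) = |p|^2/2) and H(p) = |p|^2/2,
# for which the exact solution is |x|^2 / (2 * (1 + t)).
x, t = np.array([1.0, -2.0, 0.5, 3.0]), 0.4
u = hopf_solution(x, t, H=lambda p: 0.5 * p @ p, J_star=lambda p: 0.5 * p @ p)
assert abs(u - 0.5 * (x @ x) / (1.0 + t)) < 1e-6
```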


Posted Content
TL;DR: In this article, the authors propose a simple yet effective ResNets ensemble algorithm to boost the accuracy of the robustly trained model on both clean and adversarial images, which leads to a robust model with a natural accuracy of 85.62% on clean images and a robust accuracy of 57.94% under 20 iterations of the IFGSM attack on CIFAR10.
Abstract: Empirical adversarial risk minimization (EARM) is a widely used mathematical framework to robustly train deep neural nets (DNNs) that are resistant to adversarial attacks. However, both natural and robust accuracies, in classifying clean and adversarial images, respectively, of the trained robust models are far from satisfactory. In this work, we unify the theory of optimal control of transport equations with the practice of training and testing of ResNets. Based on this unified viewpoint, we propose a simple yet effective ResNets ensemble algorithm to boost the accuracy of the robustly trained model on both clean and adversarial images. The proposed algorithm consists of two components: First, we modify the base ResNets by injecting Gaussian noise of a specified variance into the output of each residual mapping. Second, we average over the outputs of multiple jointly trained modified ResNets to get the final prediction. These two steps give an approximation to the Feynman-Kac formula for representing the solution of a transport equation with viscosity, or a convection-diffusion equation. For the CIFAR10 benchmark, this simple algorithm leads to a robust model with a natural accuracy of 85.62% on clean images and a robust accuracy of 57.94% under 20 iterations of the IFGSM attack, which outperforms the current state of the art in defending against IFGSM attacks on CIFAR10. Both natural and robust accuracies of the proposed ResNets ensemble can be improved dynamically as the building-block ResNet advances. The code is available at: \url{this https URL}.

28 citations
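A minimal PyTorch sketch of the two components: Gaussian noise injected after each residual mapping, and averaging of the ensemble members' class probabilities. The block architecture and noise level are illustrative, not the exact ResNet configuration used in the paper.

```python
import torch
import torch.nn as nn

class NoisyResidualBlock(nn.Module):
    # Residual block whose residual mapping is perturbed by Gaussian noise
    # during training (noise level here is an illustrative choice).
    def __init__(self, channels, noise_std=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels))
        self.noise_std = noise_std

    def forward(self, x):
        out = self.body(x)
        if self.training and self.noise_std > 0:
            out = out + self.noise_std * torch.randn_like(out)
        return torch.relu(x + out)

def ensemble_predict(models, x):
    # Average the class probabilities of several jointly trained noisy ResNets.
    return torch.stack([torch.softmax(m(x), dim=1) for m in models]).mean(dim=0)
```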


Journal ArticleDOI
01 Apr 2018
TL;DR: Presented is a new method for calculating the time-optimal guidance control for a multiple-vehicle pursuit-evasion system; the value function is computed efficiently in high-dimensional space, without a discrete grid, using the generalized Hopf formula.
Abstract: Presented is a new method for calculating the time-optimal guidance control for a multiple vehicle pursuit-evasion system. A joint differential game of $k$ pursuing vehicles relative to the evader is constructed, and a Hamilton–Jacobi–Isaacs equation that describes the evolution of the value function is formulated. The value function is built such that the terminal cost is the squared distance from the boundary of the terminal surface. Additionally, all vehicles are assumed to have bounded controls. Typically, a joint state space constructed in this way would have too large a dimension to be solved with existing grid-based approaches. The value function is computed efficiently in high-dimensional space, without a discrete grid, using the generalized Hopf formula. The optimal time-to-reach is iteratively solved, and the optimal control is inferred from the gradient of the value function.

26 citations


Book ChapterDOI
06 Oct 2018
TL;DR: In this paper, the Wasserstein-2 metric proximal is applied to the generators to define a parametrization-invariant natural gradient by pulling back optimal transport structures from probability space to parameter space.
Abstract: We introduce a new method for training generative adversarial networks by applying the Wasserstein-2 metric proximal to the generators. The approach is based on Wasserstein information geometry. It defines a parametrization-invariant natural gradient by pulling back optimal transport structures from probability space to parameter space. We obtain easy-to-implement iterative regularizers for the parameter updates of implicit deep generative models. Our experiments demonstrate that this method improves the speed and stability of training in terms of wall-clock time and Fréchet Inception Distance.

19 citations
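One simple instance of such an iterative regularizer, sketched below under stated assumptions: penalize the squared distance between the current and previous generators' outputs on the same latent batch, which approximates a proximal step in the Wasserstein-2 metric on generated samples rather than on raw parameters. The WGAN-style generator loss, names, and weight lam are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def proximal_generator_loss(generator, generator_prev, discriminator, z, lam=1.0):
    # Adversarial generator loss plus a Wasserstein-2 style proximal penalty:
    # the squared distance between current and previous generator outputs on
    # the same latent batch (an easy-to-implement surrogate for a proximal
    # step on the generated distribution).
    fake = generator(z)
    adv = -discriminator(fake).mean()            # WGAN-style generator loss (assumption)
    with torch.no_grad():
        fake_prev = generator_prev(z)            # frozen copy from the previous update
    diff = (fake - fake_prev).flatten(start_dim=1)
    prox = diff.pow(2).sum(dim=1).mean()
    return adv + prox / (2.0 * lam)
```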


Journal ArticleDOI
TL;DR: The experimental results show that the computational cost is reduced significantly with the help of WNLL, and the results in image inpainting and denoising are also better than the original LDMM and competitive with state-of-the-art methods.
Abstract: In this paper we use the idea of the weighted nonlocal Laplacian (WNLL) (Shi et al. in J Sci Comput, 2017) to deal with the constraints in the low dimensional manifold model (LDMM) (Osher et al. in SIAM J Imaging Sci, 2017). In the original LDMM, the constraints are enforced by the point integral method. The point integral method provides a correct way to deal with the constraints; however, it is not very efficient because the symmetry of the original Laplace–Beltrami operator is destroyed. WNLL provides another way to enforce the constraints in LDMM. In WNLL, the discretized system is symmetric and sparse, and hence it can be solved very fast. Our experimental results show that the computational cost is reduced significantly with the help of WNLL. Moreover, the results in image inpainting and denoising are also better than the original LDMM and competitive with state-of-the-art methods.

18 citations
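A small dense-matrix sketch of the weighted nonlocal Laplacian interpolation itself: given a symmetric weight matrix W and values on a labeled subset S, the unlabeled values solve a symmetric positive-definite linear system in which the labeled terms are upweighted by mu = |P| / |S|. The normal equations below follow from a standard calculation on the WNLL energy; the dense solve is for brevity, not the paper's implementation.

```python
import numpy as np

def wnll_interpolate(W, labeled_idx, labels):
    # Weighted nonlocal Laplacian (WNLL) interpolation on a graph.
    # Minimizes  sum_{i not in S} sum_j w_ij (u_i - u_j)^2
    #          + (|P|/|S|) * sum_{i in S} sum_j w_ij (u_i - u_j)^2
    # over u, with u fixed to `labels` on the labeled set S.
    n = W.shape[0]
    S = np.asarray(labeled_idx)
    U = np.setdiff1d(np.arange(n), S)
    mu = n / len(S)                                   # scaling weight |P| / |S|
    W_UU, W_US = W[np.ix_(U, U)], W[np.ix_(U, S)]
    # Symmetric positive-definite system for the unlabeled values.
    A = 2.0 * (np.diag(W_UU.sum(1)) - W_UU) + (1.0 + mu) * np.diag(W_US.sum(1))
    b = (1.0 + mu) * (W_US @ labels)
    u = np.zeros(n)
    u[S] = labels
    u[U] = np.linalg.solve(A, b)
    return u
```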


Journal ArticleDOI
TL;DR: In this paper, band-limited a priori knowledge of the Fourier or Radon spectrum is used to obtain the geometric parameters in the data, such as the dominant slope or curvature.
Abstract: We propose a new decomposition algorithm for seismic data based on band-limited a priori knowledge of the Fourier or Radon spectrum. This decomposition is called geometric mode decomposition (GMD), as it decomposes a 2D signal into components consisting of linear or parabolic features. Rather than using a predefined frame, GMD adaptively obtains the geometric parameters in the data, such as the dominant slope or curvature. GMD is solved by alternately pursuing the geometric parameters and the corresponding modes in the Fourier or Radon domain. The geometric parameters are obtained from the weighted center of the corresponding mode's energy spectrum. The mode is obtained by applying a Wiener filter whose design is based on a certain band-limited property. We apply GMD to seismic event splitting, noise attenuation, interpolation, and demultiple. The results show that our method is a promising adaptive tool for seismic signal processing, in comparison with the Fourier and curvelet transforms, empirical mode decomposition (EMD), and variational mode decomposition (VMD) methods.

Journal ArticleDOI
TL;DR: This work proposes a new algorithm to solve the unbalanced and partial $L_1$ Monge–Kantorovich problems that is scalable and parallel, with iterations that are conceptually simple, computationally cheap, and easy to parallelize.
Abstract: We propose a new algorithm to solve the unbalanced and partial $L_1$ Monge–Kantorovich problems. The proposed method is a first-order primal-dual method that is scalable and parallel. The method's iterations are conceptually simple, computationally cheap, and easy to parallelize. We provide several numerical examples solved on a CUDA GPU, which demonstrate the method's practical effectiveness.

Posted Content
TL;DR: This work improves the robustness of deep neural nets to adversarial attacks by using an interpolating function as the output activation, and combines this data-dependent activation with total variation minimization on adversarial images and training data augmentation.
Abstract: We improve the robustness of deep neural nets (DNNs) to adversarial attacks by using an interpolating function as the output activation. This data-dependent activation remarkably improves both the generalization and robustness of DNNs. On the CIFAR10 benchmark, we raise the robust accuracy of the adversarially trained ResNet20 from about 46% to about 69% under the state-of-the-art Iterative Fast Gradient Sign Method (IFGSM) based adversarial attack. When we combine this data-dependent activation with total variation minimization on adversarial images and training data augmentation, we achieve an improvement in robust accuracy of 38.9% for ResNet56 under the strongest IFGSM attack. Furthermore, we provide an intuitive explanation of our defense by analyzing the geometry of the feature space.

Posted Content
TL;DR: It is proved that, within a certain regime, the untargeted FGSM can fool any convolutional neural net (CNN) with ReLU activation, and the targeted FGSM can mislead any CNN with ReLU activation into classifying any given image as any prescribed class.
Abstract: In this paper, we analyze the efficacy of the fast gradient sign method (FGSM) and the Carlini-Wagner L2 (CW-L2) attack. We prove that, within a certain regime, the untargeted FGSM can fool any convolutional neural net (CNN) with ReLU activation, and the targeted FGSM can mislead any CNN with ReLU activation into classifying any given image as any prescribed class. For a special two-layer neural network, a linear layer followed by the softmax output activation, we show that the CW-L2 attack increases the ratio of the classification probability between the target and ground-truth classes. Moreover, we provide numerical results to verify all our theoretical results.
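For reference, a minimal PyTorch sketch of the FGSM attack analyzed above, in both untargeted and targeted form; the [0, 1] clamp assumes image pixels normalized to that range.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon, targeted=False):
    # Fast gradient sign method.
    # Untargeted: take one epsilon-sized step up the loss of the true label y.
    # Targeted:   take one epsilon-sized step down the loss of a target label y.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    step = -epsilon if targeted else epsilon
    return (x_adv + step * x_adv.grad.sign()).detach().clamp(0.0, 1.0)
```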

Posted Content
26 Nov 2018
TL;DR: It is shown that, with projected gradient descent adversarial training, even an ensemble of two ResNet20s achieves 5% higher accuracy against the strongest iterative fast gradient sign attack than the state-of-the-art adversarial defense algorithm.
Abstract: We propose a simple yet powerful ResNet ensemble algorithm which consists of two components: First, we modify the base ResNet by adding Gaussian noise of a specified variance to the output of each original residual mapping. Second, we average over the outputs of multiple parallel and jointly trained modified ResNets to get the final prediction. Heuristically, these two simple steps give an approximation to the well-known Feynman-Kac formula for representing the solution of a transport equation with viscosity, or a convection-diffusion equation. This simple ensemble algorithm improves neural nets' generalizability and robustness to adversarial attacks. In particular, for the CIFAR10 benchmark with projected gradient descent adversarial training, we show that even an ensemble of two ResNet20s leads to a 5% higher accuracy against the strongest iterative fast gradient sign attack than the state-of-the-art adversarial defense algorithm.

Journal ArticleDOI
TL;DR: A low dimensional manifold model is applied to interpolate scientific data from regular and irregular samplings with a significant amount of missing information, via alternating minimization with respect to the manifold and the data set.

Proceedings ArticleDOI
01 Aug 2018
TL;DR: This work presents a primal-dual method for efficient numerical solution and shows how the resulting optimal trajectory can be generated directly from the solution of the Hopf formula, without further optimization.
Abstract: Presented is a method for efficient computation of the Hamilton-Jacobi (HJ) equation for time-optimal control problems using the generalized Hopf formula. Typically, numerical methods to solve the HJ equation rely on a discrete grid of the solution space and exhibit exponential scaling with dimension. The generalized Hopf formula avoids the use of grids and numerical gradients by formulating an unconstrained convex optimization problem. The solution at each point is completely independent, which allows a massively parallel implementation if solutions at multiple points are desired. This work presents a primal-dual method for efficient numerical solution and shows how the resulting optimal trajectory can be generated directly from the solution of the Hopf formula, without further optimization. Examples presented have execution times on the order of milliseconds, and experiments show that computation scales approximately polynomially in dimension with very small high-order coefficients.

Posted Content
TL;DR: The analysis of the convergence of the weighted nonlocal Laplacian (WNLL) on high dimensional randomly distributed data reveals the importance of the scaling weight of WNLL for high dimensional data interpolation.
Abstract: We analyze the convergence of the weighted nonlocal Laplacian (WNLL) on high dimensional randomly distributed data. The analysis reveals the importance of the scaling weight $\mu \sim |P|/|S|$, where $|P|$ and $|S|$ are the numbers of total and labeled data points, respectively. The result gives a theoretical foundation of WNLL for high dimensional data interpolation.

Journal ArticleDOI
TL;DR: A parallel method for solving the eikonal equation associated with level set redistancing using the Hopf–Lax formulation and extending the work of Lee et al.

Posted Content
TL;DR: Novel deep neural network structures that can be inherited from all existing DNNs with almost the same level of complexity are proposed, and it is shown that the paradigm successfully resolves the lack-of-data issue.
Abstract: Though deep neural networks (DNNs) achieve remarkable performance in many artificial intelligence tasks, the lack of training instances remains a notorious challenge. As the network goes deeper, the generalization accuracy decays rapidly when massive amounts of training data are lacking. In this paper, we propose novel deep neural network structures that can be inherited from all existing DNNs with almost the same level of complexity, and develop simple training algorithms. We show that our paradigm successfully resolves the lack-of-data issue. Tests on the CIFAR10 and CIFAR100 image recognition datasets show that the new paradigm leads to 20% to 30% relative error rate reduction compared to the base DNNs. The intuition behind our algorithms for deep residual networks stems from the theory of partial differential equation (PDE) control problems. Code will be made available.

Posted Content
TL;DR: This work derives a simple formulation for dynamical optimal transport problems constrained to a parameterized probability subset, as arise in application problems such as deep learning.
Abstract: We propose dynamical optimal transport (OT) problems constrained to a parameterized probability subset. In application problems such as deep learning, the probability distribution is often generated by a parameterized mapping function. In this case, we derive a simple formulation for the constrained dynamical OT.

Proceedings ArticleDOI
01 Sep 2018
TL;DR: In this paper, the dimension of the manifold is directly used as a regularizer in a variational functional, which is solved efficiently by alternating direction of minimization and weighted nonlocal Laplacian.
Abstract: We present a scalable low dimensional manifold model for the reconstruction of noisy and incomplete hyperspectral images. The model is based on the observation that the spatial-spectral blocks of a hyperspectral image typically lie close to a collection of low dimensional manifolds. To exploit this, the dimension of the manifold is directly used as a regularizer in a variational functional, which is solved efficiently by alternating direction of minimization and the weighted nonlocal Laplacian. Unlike general 3D images, a hyperspectral image can share the same similarity matrix across all spectral bands; therefore, the resulting algorithm is much more scalable than that for general 3D data [1]. Numerical experiments on the reconstruction of hyperspectral images from sparse and noisy sampling demonstrate the superiority of our proposed algorithm in terms of both speed and accuracy.