Author

Stanley Osher

Bio: Stanley Osher is an academic researcher from University of California, Los Angeles. The author has contributed to research in topics: Level set method & Hyperbolic partial differential equation. The author has an h-index of 114 and has co-authored 510 publications receiving 104,028 citations. Previous affiliations of Stanley Osher include University of Minnesota & University of Innsbruck.


Papers
Journal ArticleDOI
TL;DR: This work improves the robustness of deep neural nets to adversarial attacks by using an interpolating function as the output activation, achieving a 38.9% improvement in robust accuracy for ResNet56 under the strongest IFGSM attack.
Abstract: We improve the robustness of deep neural nets (DNNs) to adversarial attacks by using an interpolating function as the output activation. This data-dependent activation remarkably improves both the generalization and robustness of DNNs. On the CIFAR10 benchmark, we raise the robust accuracy of the adversarially trained ResNet20 from ~46% to ~69% under the state-of-the-art Iterative Fast Gradient Sign Method (IFGSM) adversarial attack. When we combine this data-dependent activation with total variation minimization on adversarial images and training data augmentation, we achieve an improvement in robust accuracy of 38.9% for ResNet56 under the strongest IFGSM attack. Furthermore, we provide an intuitive explanation of our defense by analyzing the geometry of the feature space.
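For context, the IFGSM attack referenced above is the standard iterated FGSM. A minimal sketch (not the authors' code; the model, loss, and step sizes are placeholder assumptions):

```python
import torch

def ifgsm(model, loss_fn, x, y, eps=0.03, alpha=0.005, steps=10):
    """Iterative FGSM: ascend the loss by signed-gradient steps,
    projecting back onto the L-infinity ball of radius eps around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project onto the eps-ball and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()
```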

4 citations

Posted Content
TL;DR: This work proposes dynamical optimal transport (OT) problems constrained to a parameterized probability subset, as arises in applications such as deep learning, and derives a simple formulation for the constrained problem.
Abstract: We propose dynamical optimal transport (OT) problems constrained to a parameterized probability subset. In application problems such as deep learning, the probability distribution is often generated by a parameterized mapping function. In this case, we derive a simple formulation for the constrained dynamical OT problem.
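For reference, the unconstrained dynamical OT problem this builds on is the Benamou-Brenier formulation; in generic notation (not necessarily the paper's),

$$
W_2^2(\rho_0,\rho_1) \;=\; \min_{\rho,\,v}\; \int_0^1\!\!\int_{\mathbb{R}^n} \rho(x,t)\,\|v(x,t)\|^2 \,dx\,dt
\quad \text{s.t.}\quad
\partial_t \rho + \nabla\cdot(\rho v) = 0,\;\; \rho(\cdot,0)=\rho_0,\;\; \rho(\cdot,1)=\rho_1 .
$$

The constrained variant additionally restricts the density path to a parameterized family, \(\rho(\cdot,t) = \rho_{\theta(t)}\), so the optimization runs over the parameters \(\theta(t)\) rather than over all densities.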

4 citations

Journal ArticleDOI
TL;DR: An energy-efficient velocity control algorithm for a large number of UAVs, based on mean field games and the G-prox primal-dual hybrid gradient (PDHG) method, is proposed; it lets the UAVs avoid obstacles and provide communication services for a search-and-rescue team while minimizing their energy consumption.
Abstract: In this paper, we propose an energy-efficient velocity control algorithm for a large number of UAVs based on mean field games. In particular, we first formulate the velocity control problem for a large number of UAVs as a differential game, where we jointly consider the energy consumption, channel capacity, and obstacle avoidance in the cost function. Meanwhile, the state dynamics describe the motion of UAVs under the influence of the wind. Then we derive the corresponding mean field game for a large number of UAVs and solve it with the G-prox primal-dual hybrid gradient (PDHG) method, using its underlying variational primal-dual structure. Scalability analysis shows that the computational complexity of the proposed method is independent of the number of UAVs. Based on the PDHG method, we conduct a comprehensive experiment in which we show the fast convergence of our energy-efficient velocity control algorithm via the convergence of the residual errors of the Hamilton-Jacobi-Bellman equation and the Fokker-Planck-Kolmogorov equation. The experiment also shows that a large number of UAVs can avoid obstacles and provide communication services for the search and rescue team while minimizing their energy consumption.
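For orientation, a mean field game of this kind couples a Hamilton-Jacobi-Bellman (HJB) equation for the value function \(u\) with a Fokker-Planck-Kolmogorov (FPK) equation for the population density \(m\); a generic form (not the paper's exact model) is

$$
\begin{aligned}
-\partial_t u - \nu\,\Delta u + H(x, \nabla u) &= f(x, m),\\
\partial_t m - \nu\,\Delta m - \nabla\cdot\big(m\,\nabla_p H(x, \nabla u)\big) &= 0,
\end{aligned}
$$

with an initial density \(m(\cdot,0)=m_0\) and a terminal condition \(u(\cdot,T)=g(x, m(\cdot,T))\); the residuals of these two equations are the convergence diagnostics mentioned above.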

4 citations

Journal ArticleDOI
TL;DR: The Hamilton-Jacobi-based Moreau adaptive descent (HJ-MAD) as discussed by the authors is a zero-order algorithm with guaranteed convergence to global minima, assuming continuity of the objective function.
Abstract: Computing tasks may often be posed as optimization problems. The objective functions for real-world scenarios are often nonconvex and/or nondifferentiable. State-of-the-art methods for solving these problems typically only guarantee convergence to local minima. This work presents Hamilton-Jacobi-based Moreau adaptive descent (HJ-MAD), a zero-order algorithm with guaranteed convergence to global minima, assuming continuity of the objective function. The core idea is to compute gradients of the Moreau envelope of the objective (which is “piece-wise convex”) with adaptive smoothing parameters. Gradients of the Moreau envelope (i.e., proximal operators) are approximated via the Hopf-Lax formula for the viscous Hamilton-Jacobi equation. Our numerical examples illustrate global convergence.
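To unpack the key objects in the abstract: the Moreau envelope of \(f\) with smoothing parameter \(\delta>0\) and its gradient-to-prox identity are (standard definitions)

$$
u_\delta(x) = \min_{y}\Big\{ f(y) + \tfrac{1}{2\delta}\|y-x\|^2 \Big\},
\qquad
\nabla u_\delta(x) = \frac{x - \operatorname{prox}_{\delta f}(x)}{\delta}.
$$

By the Hopf-Lax formula, \(u_\delta(x)=u(x,\delta)\) for the Hamilton-Jacobi equation \(\partial_t u + \tfrac12\|\nabla u\|^2 = 0\), \(u(\cdot,0)=f\); its viscous regularization \(\partial_t u + \tfrac12\|\nabla u\|^2 = \tfrac{\varepsilon}{2}\Delta u\) has the Cole-Hopf solution

$$
u^{\varepsilon}(x,t) = -\varepsilon \log \int (2\pi\varepsilon t)^{-n/2} \exp\!\Big(\!-\tfrac{1}{\varepsilon}\Big(f(y) + \tfrac{\|x-y\|^2}{2t}\Big)\Big)\, dy,
$$

an integral that can be estimated by sampling alone, which is what makes a zero-order (derivative-free) approximation of \(\nabla u_\delta\) possible.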

4 citations

Book ChapterDOI
01 Jan 2016
TL;DR: A new and remarkably fast algorithm for solving a large class of high dimensional Hamilton-Jacobi (H-J) initial value problems arising in optimal control and elsewhere is outlined.
Abstract: In this chapter we briefly outline a new and remarkably fast algorithm for solving a large class of high-dimensional Hamilton-Jacobi (H-J) initial value problems arising in optimal control and elsewhere [1]. This is done without the use of grids or numerical approximations. Moreover, by using the level set method [8] we can rapidly compute projections of a point in \(\mathbb{R}^{n}\), for large n, onto a fairly arbitrary compact set [2]. The method seems to generalize widely beyond what we will present here, to some nonconvex Hamiltonians, new linear programming algorithms, differential games, and perhaps state-dependent Hamiltonians.
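The grid-free speed comes from representing the HJ solution pointwise via the classical Hopf-Lax and Hopf formulas, which turn evaluation of \(u(x,t)\) into a finite-dimensional optimization solvable by first-order methods; in standard form (for Hamiltonian \(H\) and initial data \(J\), stated here generically),

$$
u(x,t) = \min_{y}\Big\{ J(y) + t\,H^{*}\!\Big(\frac{x-y}{t}\Big) \Big\}
\qquad \text{(Hopf-Lax, } H \text{ convex)},
$$
$$
u(x,t) = \sup_{p}\big\{ \langle x, p\rangle - J^{*}(p) - t\,H(p) \big\}
\qquad \text{(Hopf, } J \text{ convex)},
$$

where \({}^{*}\) denotes the convex conjugate. Each point evaluation is independent of every other, so no grid in \(\mathbb{R}^{n}\) is needed and the computation parallelizes trivially.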

4 citations


Cited by
Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22-layer-deep network, the quality of which is assessed in the context of classification and detection.
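A minimal sketch of an Inception-style module as described: parallel 1x1, 3x3, and 5x5 convolutions plus pooling, with 1x1 dimension reductions, concatenated along the channel axis (the channel counts below are illustrative assumptions, not taken from the paper):

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Inception-style block: parallel branches concatenated on channels."""
    def __init__(self, in_ch, c1, c3r, c3, c5r, c5, pp):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, kernel_size=1)        # 1x1 branch
        self.b3 = nn.Sequential(                             # 1x1 reduce, then 3x3
            nn.Conv2d(in_ch, c3r, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c3r, c3, kernel_size=3, padding=1))
        self.b5 = nn.Sequential(                             # 1x1 reduce, then 5x5
            nn.Conv2d(in_ch, c5r, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c5r, c5, kernel_size=5, padding=2))
        self.bp = nn.Sequential(                             # 3x3 max-pool, then 1x1 project
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pp, kernel_size=1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

# Example: 192 input channels -> 64 + 128 + 32 + 32 = 256 output channels.
out = InceptionModule(192, 64, 96, 128, 16, 32, 32)(torch.randn(1, 192, 28, 28))
```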

40,257 citations

Journal ArticleDOI


08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one: it seemed an odd beast at first, an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

01 May 1993
TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.
Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models that can be difficult to parallelize efficiently, namely those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine that allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90%, and a 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.
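A minimal sketch of the first strategy (atom decomposition) using mpi4py; the toy Lennard-Jones force routine, system size, and update step are placeholder assumptions, not the paper's code:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_atoms = 1024                       # total atoms, assumed divisible by size
n_local = n_atoms // size            # fixed subset of atoms owned by this rank

def lj_forces(my_pos, all_pos):
    """Toy Lennard-Jones forces on this rank's atoms from all atoms."""
    f = np.zeros_like(my_pos)
    for i, xi in enumerate(my_pos):
        d = all_pos - xi
        r2 = (d * d).sum(axis=1)
        mask = r2 > 1e-12             # skip self-interaction
        inv6 = 1.0 / r2[mask] ** 3
        coef = 24.0 * (2.0 * inv6 ** 2 - inv6) / r2[mask]
        f[i] = -(coef[:, None] * d[mask]).sum(axis=0)
    return f

pos = np.random.rand(n_local, 3)      # this rank's atom positions
for step in range(10):
    # Each rank gathers every atom's position, then computes forces
    # only for its fixed subset (the atom-decomposition scheme).
    all_pos = np.vstack(comm.allgather(pos))
    forces = lj_forces(pos, all_pos)
    pos += 1e-4 * forces              # toy explicit update (no velocities)
```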

29,323 citations

Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
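The method's iterations, in the review's scaled form for minimizing \(f(x) + g(z)\) subject to \(Ax + Bz = c\):

$$
\begin{aligned}
x^{k+1} &:= \operatorname*{argmin}_x \Big( f(x) + \tfrac{\rho}{2}\,\|Ax + Bz^k - c + u^k\|_2^2 \Big),\\
z^{k+1} &:= \operatorname*{argmin}_z \Big( g(z) + \tfrac{\rho}{2}\,\|Ax^{k+1} + Bz - c + u^k\|_2^2 \Big),\\
u^{k+1} &:= u^k + Ax^{k+1} + Bz^{k+1} - c ,
\end{aligned}
$$

where \(u\) is the scaled dual variable and \(\rho > 0\) the penalty parameter; each subproblem touches only \(f\) or only \(g\), which is what makes the splitting amenable to distributed computation.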

17,433 citations