Author

Stephan Wojtowytsch

Bio: Stephan Wojtowytsch is an academic researcher from Princeton University. The author has contributed to research in topics: Artificial neural network & Willmore energy. The author has an h-index of 8 and has co-authored 36 publications receiving 261 citations. Previous affiliations of Stephan Wojtowytsch include Carnegie Mellon University & Durham University.

Papers
Posted Content
TL;DR: The purpose of this article is to review the achievements made in the last few years towards the understanding of the reasons behind the success and subtleties of neural network-based machine learning.
Abstract: The purpose of this article is to review the achievements made in the last few years towards understanding the reasons behind the success and subtleties of neural network-based machine learning. In the tradition of good old applied mathematics, we will give attention not only to rigorous mathematical results, but also to the insight we have gained from careful numerical experiments as well as from the analysis of simplified models. Along the way, we also list the open problems which we believe to be the most important topics for further study. This is not a complete overview of this quickly moving field, but we hope to provide a perspective which may be helpful especially to new researchers in the area.

85 citations

Posted Content
TL;DR: It is shown that functions whose singular set is fractal or curved (for example distance functions from smooth submanifolds) cannot be represented by infinitely wide two-layer networks with finite path-norm.
Abstract: We study the natural function space for infinitely wide two-layer neural networks with ReLU activation (Barron space) and establish different representation formulae. In two cases, we describe the space explicitly up to isomorphism. Using a convenient representation, we study the pointwise properties of two-layer networks and show that functions whose singular set is fractal or curved (for example, distance functions from smooth submanifolds) cannot be represented by infinitely wide two-layer networks with finite path-norm. We use this structure theorem to show that the only $C^1$-diffeomorphisms which preserve Barron space are affine. Furthermore, we show that every Barron function can be decomposed as the sum of a bounded and a positively one-homogeneous function and that there exist Barron functions which decay rapidly at infinity and are globally Lebesgue-integrable. This result suggests that two-layer neural networks may be able to approximate a greater variety of functions than commonly believed.
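For orientation, the Barron-space setting referred to above is commonly written as follows (a standard formulation in the literature, paraphrased here rather than quoted from the abstract): an infinitely wide two-layer ReLU network is represented as

$f(x) = \int a \, \max(w^{\top} x + b, \, 0) \, d\mu(a, w, b)$

for a probability measure $\mu$ over the parameters $(a, w, b)$, and the path-norm (Barron norm) is $\|f\|_{\mathcal{B}} = \inf_{\mu} \mathbb{E}_{\mu}\big[\,|a|\,(\|w\| + |b|)\,\big]$, where the infimum runs over all measures $\mu$ representing $f$; the precise choice of norm on $w$ varies between articles.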

45 citations

Posted Content
TL;DR: A necessary and sufficient condition for the convergence to minimum Bayes risk when training two-layer ReLU-networks by gradient descent in the mean field regime with omni-directional initial parameter distribution is described.
Abstract: We describe a necessary and sufficient condition for the convergence to minimum Bayes risk when training two-layer ReLU-networks by gradient descent in the mean field regime with omni-directional initial parameter distribution. This article extends recent results of Chizat and Bach to ReLU-activated networks and to the situation in which there are no parameters which exactly achieve MBR. The condition does not depend on the initialization of parameters and concerns only the weak convergence of the realization of the neural network, not its parameter distribution.
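As background on the mean-field regime discussed here (standard notation, not taken from the paper itself): the network is described by a distribution $\pi$ over neurons, with realization $f_{\pi}(x) = \mathbb{E}_{(a,w)\sim\pi}\big[a \, \max(w^{\top}x, 0)\big]$, and in the infinite-width limit gradient descent on the risk $R$ corresponds to a Wasserstein gradient flow

$\partial_t \pi_t = \nabla \cdot \Big( \pi_t \, \nabla \tfrac{\delta R}{\delta \pi}(\pi_t) \Big).$

Convergence to minimum Bayes risk is then a statement about the weak limit of the realizations $f_{\pi_t}$, not about the parameter distributions $\pi_t$ themselves.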

34 citations

Journal ArticleDOI
TL;DR: In this paper, a phase field approximation based on De Giorgi's diffuse Willmore functional is proposed to solve the variational problem of minimizing the Willmore energy in the class of connected surfaces.
Abstract: This article is concerned with the problem of minimising the Willmore energy in the class of connected surfaces with prescribed area which are confined to a small container. We propose a phase field approximation to this variational problem based on De Giorgi’s diffuse Willmore functional. Our main contribution is a penalisation term which ensures connectedness in the sharp interface limit. The penalisation of disconnectedness is based on a geodesic distance which is chosen to be small between two points that lie on the same connected component of the transition layer of the phase field. We prove that in two dimensions, sequences of phase fields with uniformly bounded diffuse Willmore energy and diffuse area converge uniformly to the zeros of a double-well potential away from the support of a limiting measure. In three dimensions, we show that they converge $\mathcal{H}^1$-almost everywhere on curves. This enables us to show $\Gamma$-convergence to a sharp interface problem that only allows for connected structures. The results also imply Hausdorff convergence of the level sets in two dimensions and a similar result in three dimensions. Furthermore, we present numerical evidence of the effectiveness of our model. The implementation couples Dijkstra’s algorithm, used to compute the topological penalty, with a finite element approach for the Willmore term.
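For context, De Giorgi's diffuse Willmore functional referred to above is commonly written (one standard convention; constants may differ from those used in the paper) as

$\mathcal{W}_{\varepsilon}(u) = \frac{1}{2\varepsilon} \int_{\Omega} \Big( \varepsilon \, \Delta u - \frac{1}{\varepsilon} W'(u) \Big)^2 \, dx,$

where $W$ is a double-well potential such as $W(u) = \frac{1}{4}(1-u^2)^2$, considered alongside the diffuse area (Modica-Mortola) functional $\int_{\Omega} \frac{\varepsilon}{2}|\nabla u|^2 + \frac{1}{\varepsilon}W(u) \, dx$; the connectedness penalty described in the abstract is added to this diffuse energy.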

29 citations

Posted Content
TL;DR: It is proved that gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than $t^{-4/(d-2)}$ under mean field scaling; thus gradient descent training for fitting reasonably smooth, but truly high-dimensional, data may be subject to the curse of dimensionality.
Abstract: We prove that the gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than $t^{-4/(d-2)}$ under mean field scaling. Thus gradient descent training for fitting reasonably smooth, but truly high-dimensional data may be subject to the curse of dimensionality. We present numerical evidence that gradient descent training with general Lipschitz target functions becomes slower and slower as the dimension increases, but converges at approximately the same rate in all dimensions when the target function lies in the natural function space for two-layer ReLU networks.
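As a rough illustration of the kind of numerical evidence mentioned above, here is a minimal sketch (not the authors' code; the target function, hyperparameters, and scaling choices are placeholders) of training a mean-field-scaled two-layer ReLU network by full-batch gradient descent on a simple Lipschitz target in several dimensions, to observe how the decay of the risk changes with the dimension:

```python
# Illustrative sketch only: mean-field-scaled two-layer ReLU network trained by
# full-batch gradient descent on a Lipschitz target, tracking the final risk
# as the input dimension d grows. All choices below are assumptions.
import numpy as np

def run_experiment(d, m=512, n=1024, steps=2000, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    y = np.linalg.norm(X, axis=1)                 # a simple Lipschitz target, |x|
    a = rng.standard_normal(m)                    # output weights
    W = rng.standard_normal((m, d)) / np.sqrt(d)  # inner weights
    for _ in range(steps):
        pre = X @ W.T                             # (n, m) pre-activations
        act = np.maximum(pre, 0.0)                # ReLU
        f = act @ a / m                           # mean-field scaling: average over neurons
        err = f - y
        risk = 0.5 * np.mean(err ** 2)
        grad_a = act.T @ err / (n * m)
        grad_W = ((err[:, None] * (pre > 0) * a[None, :]).T @ X) / (n * m)
        a -= lr * m * grad_a                      # step size rescaled for mean-field parametrization
        W -= lr * m * grad_W
    return risk

for d in (2, 8, 32, 128):
    print(d, run_experiment(d))
```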

28 citations


Cited by
Reference EntryDOI
15 Oct 2004

2,118 citations

Journal ArticleDOI
TL;DR: It is shown that the full set of hydromagnetic equations admits five more integrals, besides the energy integral, if dissipative processes are absent, complementing an earlier integral which made it possible to formulate a variational principle for the force-free magnetic fields.
Abstract: It has been shown that

$I_1 = \int \mathbf{A} \cdot \mathbf{H} \, dV$, (1)

where A represents the magnetic vector potential, is an integral of the hydromagnetic equations. This integral made it possible to formulate a variational principle for the force-free magnetic fields. The integral expresses the fact that motions cannot transform a given field into an entirely arbitrary different field if the conductivity of the medium is considered infinite. In this paper we shall show that the full set of hydromagnetic equations admits five more integrals, besides the energy integral, if dissipative processes are absent. These integrals, as we shall presently verify, are

$I_2 = \int \mathbf{H} \cdot \mathbf{v} \, dV$, (2)
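The variational principle alluded to above is usually stated as follows (a summary from general knowledge of the subject, not taken from this abstract): among all fields in a closed volume with a prescribed value of the integral $\int \mathbf{A} \cdot \mathbf{H} \, dV$, minimisers of the magnetic energy $\frac{1}{8\pi}\int |\mathbf{H}|^2 \, dV$ satisfy

$\nabla \times \mathbf{H} = \alpha \, \mathbf{H}$

with constant $\alpha$, i.e. they are linear force-free fields.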

1,858 citations

Posted Content
TL;DR: It is shown that the limits of the gradient flow on exponentially tailed losses can be fully characterized as a max-margin classifier in a certain non-Hilbertian space of functions.
Abstract: Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks. Towards understanding this phenomenon, we analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations. We show that the limits of the gradient flow on exponentially tailed losses can be fully characterized as a max-margin classifier in a certain non-Hilbertian space of functions. In the presence of hidden low-dimensional structures, the resulting margin is independent of the ambient dimension, which leads to strong generalization bounds. In contrast, training only the output layer implicitly solves a kernel support vector machine, which a priori does not enjoy such adaptivity. Our analysis of training is non-quantitative in terms of running time, but we prove computational guarantees in simplified settings by showing equivalences with online mirror descent. Finally, numerical experiments suggest that our analysis describes well the practical behavior of two-layer neural networks with ReLU activation and confirm the statistical benefits of this implicit bias.
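Concretely, the characterization referred to above is usually stated as follows (a paraphrase in standard notation, not a quotation from the paper): for data $(x_i, y_i)$ with labels $y_i \in \{\pm 1\}$, the predictor selected in the limit of the gradient flow on an exponentially tailed loss solves, up to normalisation,

$\max_{\|f\|_{\mathcal{F}_1} \le 1} \ \min_i \ y_i f(x_i),$

where $\mathcal{F}_1$ denotes the variation-norm (Barron-type) function space associated with the homogeneous activation; this is a max-margin problem in a non-Hilbertian space, in contrast with the reproducing-kernel-Hilbert-space margin obtained when only the output layer is trained.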

197 citations