Author

Stephan Wojtowytsch

Bio: Stephan Wojtowytsch is an academic researcher from Princeton University. The author has contributed to research in topics: Artificial neural network & Willmore energy. The author has an h-index of 8 and has co-authored 36 publications receiving 261 citations. Previous affiliations of Stephan Wojtowytsch include Carnegie Mellon University & Durham University.

Papers
Posted Content
TL;DR: The purpose of this article is to review the achievements made in the last few years towards the understanding of the reasons behind the success and subtleties of neural network-based machine learning.
Abstract: The purpose of this article is to review the achievements made in the last few years towards understanding the reasons behind the success and subtleties of neural network-based machine learning. In the tradition of good old applied mathematics, we will give attention not only to rigorous mathematical results, but also to the insight we have gained from careful numerical experiments as well as from the analysis of simplified models. Along the way, we also list the open problems which we believe to be the most important topics for further study. This is not a complete overview of this quickly moving field, but we hope to provide a perspective which may be helpful especially to new researchers in the area.

85 citations

Posted Content
TL;DR: It is shown that functions whose singular set is fractal or curved (for example distance functions from smooth submanifolds) cannot be represented by infinitely wide two-layer networks with finite path-norm.
Abstract: We study the natural function space for infinitely wide two-layer neural networks with ReLU activation (Barron space) and establish different representation formulae. In two cases, we describe the space explicitly up to isomorphism. Using a convenient representation, we study the pointwise properties of two-layer networks and show that functions whose singular set is fractal or curved (for example, distance functions from smooth submanifolds) cannot be represented by infinitely wide two-layer networks with finite path-norm. We use this structure theorem to show that the only $C^1$-diffeomorphisms which preserve Barron space are affine. Furthermore, we show that every Barron function can be decomposed as the sum of a bounded and a positively one-homogeneous function and that there exist Barron functions which decay rapidly at infinity and are globally Lebesgue-integrable. This result suggests that two-layer neural networks may be able to approximate a greater variety of functions than commonly believed.
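For orientation, the Barron-space setting referred to above is commonly written as follows (a standard formulation in the literature, paraphrased here rather than quoted from the abstract): an infinitely wide two-layer ReLU network is represented as

$f(x) = \int a \, \max(w^{\top} x + b, \, 0) \, d\mu(a, w, b)$

for a probability measure $\mu$ over the parameters $(a, w, b)$, and the path-norm (Barron norm) is $\|f\|_{\mathcal{B}} = \inf_{\mu} \mathbb{E}_{\mu}\big[\,|a|\,(\|w\| + |b|)\,\big]$, where the infimum runs over all measures $\mu$ representing $f$; the precise choice of norm on $w$ varies between articles.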

45 citations

Posted Content
TL;DR: A necessary and sufficient condition for the convergence to minimum Bayes risk when training two-layer ReLU-networks by gradient descent in the mean field regime with omni-directional initial parameter distribution is described.
Abstract: We describe a necessary and sufficient condition for the convergence to minimum Bayes risk when training two-layer ReLU-networks by gradient descent in the mean field regime with omni-directional initial parameter distribution. This article extends recent results of Chizat and Bach to ReLU-activated networks and to the situation in which there are no parameters which exactly achieve MBR. The condition does not depend on the initialization of parameters and concerns only the weak convergence of the realization of the neural network, not its parameter distribution.
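As background on the mean-field regime discussed here (standard notation, not taken from the paper itself): the network is described by a distribution $\pi$ over neurons, with realization $f_{\pi}(x) = \mathbb{E}_{(a,w)\sim\pi}\big[a \, \max(w^{\top}x, 0)\big]$, and in the infinite-width limit gradient descent on the risk $R$ corresponds to a Wasserstein gradient flow

$\partial_t \pi_t = \nabla \cdot \Big( \pi_t \, \nabla \tfrac{\delta R}{\delta \pi}(\pi_t) \Big).$

Convergence to minimum Bayes risk is then a statement about the weak limit of the realizations $f_{\pi_t}$, not about the parameter distributions $\pi_t$ themselves.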

34 citations

Journal ArticleDOI
TL;DR: In this paper, a phase field approximation based on De Giorgi's diffuse Willmore functional is proposed to solve the variational problem of minimizing the Willmore energy in the class of connected surfaces.
Abstract: This article is concerned with the problem of minimising the Willmore energy in the class of connected surfaces with prescribed area which are confined to a small container. We propose a phase field approximation to this variational problem based on De Giorgi’s diffuse Willmore functional. Our main contribution is a penalisation term which ensures connectedness in the sharp interface limit. The penalisation of disconnectedness is based on a geodesic distance which is chosen to be small between two points that lie on the same connected component of the transition layer of the phase field. We prove that in two dimensions, sequences of phase fields with uniformly bounded diffuse Willmore energy and diffuse area converge uniformly to the zeros of a double-well potential away from the support of a limiting measure. In three dimensions, we show that they converge $\mathcal{H}^1$-almost everywhere on curves. This enables us to show $\Gamma$-convergence to a sharp interface problem that only allows for connected structures. The results also imply Hausdorff convergence of the level sets in two dimensions and a similar result in three dimensions. Furthermore, we present numerical evidence of the effectiveness of our model. The implementation couples Dijkstra’s algorithm, used to compute the topological penalty, with a finite element approach for the Willmore term.
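For context, De Giorgi's diffuse Willmore functional referred to above is commonly written (one standard convention; constants may differ from those used in the paper) as

$\mathcal{W}_{\varepsilon}(u) = \frac{1}{2\varepsilon} \int_{\Omega} \Big( \varepsilon \, \Delta u - \frac{1}{\varepsilon} W'(u) \Big)^2 \, dx,$

where $W$ is a double-well potential such as $W(u) = \frac{1}{4}(1-u^2)^2$, considered alongside the diffuse area (Modica-Mortola) functional $\int_{\Omega} \frac{\varepsilon}{2}|\nabla u|^2 + \frac{1}{\varepsilon}W(u) \, dx$; the connectedness penalty described in the abstract is added to this diffuse energy.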

29 citations

Posted Content
TL;DR: It is proved that gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than $t^{-4/(d-2)}$ under mean field scaling; thus gradient descent training for fitting reasonably smooth, but truly high-dimensional, data may be subject to the curse of dimensionality.
Abstract: We prove that the gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than $t^{-4/(d-2)}$ under mean field scaling. Thus gradient descent training for fitting reasonably smooth, but truly high-dimensional data may be subject to the curse of dimensionality. We present numerical evidence that gradient descent training with general Lipschitz target functions becomes slower and slower as the dimension increases, but converges at approximately the same rate in all dimensions when the target function lies in the natural function space for two-layer ReLU networks.
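As a rough illustration of the kind of numerical evidence mentioned above, here is a minimal sketch (not the authors' code; the target function, hyperparameters, and scaling choices are placeholders) of training a mean-field-scaled two-layer ReLU network by full-batch gradient descent on a simple Lipschitz target in several dimensions, to observe how the decay of the risk changes with the dimension:

```python
# Illustrative sketch only: mean-field-scaled two-layer ReLU network trained by
# full-batch gradient descent on a Lipschitz target, tracking the final risk
# as the input dimension d grows. All choices below are assumptions.
import numpy as np

def run_experiment(d, m=512, n=1024, steps=2000, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    y = np.linalg.norm(X, axis=1)                 # a simple Lipschitz target, |x|
    a = rng.standard_normal(m)                    # output weights
    W = rng.standard_normal((m, d)) / np.sqrt(d)  # inner weights
    for _ in range(steps):
        pre = X @ W.T                             # (n, m) pre-activations
        act = np.maximum(pre, 0.0)                # ReLU
        f = act @ a / m                           # mean-field scaling: average over neurons
        err = f - y
        risk = 0.5 * np.mean(err ** 2)
        grad_a = act.T @ err / (n * m)
        grad_W = ((err[:, None] * (pre > 0) * a[None, :]).T @ X) / (n * m)
        a -= lr * m * grad_a                      # step size rescaled for mean-field parametrization
        W -= lr * m * grad_W
    return risk

for d in (2, 8, 32, 128):
    print(d, run_experiment(d))
```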

28 citations


Cited by
Reference EntryDOI
15 Oct 2004

2,118 citations

Journal ArticleDOI
TL;DR: It is shown that the full set of hydromagnetic equations admits five more integrals, besides the energy integral, if dissipative processes are absent, complementing an earlier integral which made it possible to formulate a variational principle for the force-free magnetic fields.
Abstract: It has been shown that

$I_1 = \int \mathbf{A} \cdot \mathbf{H} \, dV$, (1)

where A represents the magnetic vector potential, is an integral of the hydromagnetic equations. This integral made it possible to formulate a variational principle for the force-free magnetic fields. The integral expresses the fact that motions cannot transform a given field into an entirely arbitrary different field if the conductivity of the medium is considered infinite. In this paper we shall show that the full set of hydromagnetic equations admits five more integrals, besides the energy integral, if dissipative processes are absent. These integrals, as we shall presently verify, are

$I_2 = \int \mathbf{H} \cdot \mathbf{v} \, dV$, (2)
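The variational principle alluded to above is usually stated as follows (a summary from general knowledge of the subject, not taken from this abstract): among all fields in a closed volume with a prescribed value of the integral $\int \mathbf{A} \cdot \mathbf{H} \, dV$, minimisers of the magnetic energy $\frac{1}{8\pi}\int |\mathbf{H}|^2 \, dV$ satisfy

$\nabla \times \mathbf{H} = \alpha \, \mathbf{H}$

with constant $\alpha$, i.e. they are linear force-free fields.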

1,858 citations

Posted Content
TL;DR: It is shown that the limits of the gradient flow on exponentially tailed losses can be fully characterized as a max-margin classifier in a certain non-Hilbertian space of functions.
Abstract: Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks. Towards understanding this phenomenon, we analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations. We show that the limits of the gradient flow on exponentially tailed losses can be fully characterized as a max-margin classifier in a certain non-Hilbertian space of functions. In the presence of hidden low-dimensional structures, the resulting margin is independent of the ambient dimension, which leads to strong generalization bounds. In contrast, training only the output layer implicitly solves a kernel support vector machine, which a priori does not enjoy such adaptivity. Our analysis of training is non-quantitative in terms of running time, but we prove computational guarantees in simplified settings by showing equivalences with online mirror descent. Finally, numerical experiments suggest that our analysis describes well the practical behavior of two-layer neural networks with ReLU activation and confirm the statistical benefits of this implicit bias.
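Concretely, the characterization referred to above is usually stated as follows (a paraphrase in standard notation, not a quotation from the paper): for data $(x_i, y_i)$ with labels $y_i \in \{\pm 1\}$, the predictor selected in the limit of the gradient flow on an exponentially tailed loss solves, up to normalisation,

$\max_{\|f\|_{\mathcal{F}_1} \le 1} \ \min_i \ y_i f(x_i),$

where $\mathcal{F}_1$ denotes the variation-norm (Barron-type) function space associated with the homogeneous activation; this is a max-margin problem in a non-Hilbertian space, in contrast with the reproducing-kernel-Hilbert-space margin obtained when only the output layer is trained.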

197 citations