Author

Ronen Eldan

Bio: Ronen Eldan is an academic researcher from the Weizmann Institute of Science. The author has contributed to research in topics: Mathematics & Convex body. The author has an h-index of 22 and has co-authored 96 publications receiving 2,249 citations. Previous affiliations of Ronen Eldan include University of Washington & Microsoft.


Papers
Proceedings Article
06 Jun 2016
TL;DR: In this article, it was shown that there is a simple (approximately radial) function on R^d, expressible by a small 3-layer feedforward neural network, that cannot be approximated by any 2-layer network to more than a certain constant accuracy unless its width is exponential in the dimension.
Abstract: We show that there is a simple (approximately radial) function on R^d, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy, unless its width is exponential in the dimension. The result holds for virtually all known activation functions, including rectified linear units, sigmoids and thresholds, and formally demonstrates that depth (even if increased by 1) can be exponentially more valuable than width for standard feedforward neural networks. Moreover, compared to related results in the context of Boolean functions, our result requires fewer assumptions, and the proof techniques and construction are very different.
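
The separation is between networks with one hidden layer ("2-layer") and two hidden layers ("3-layer"). Purely as an illustrative sketch of these objects (random weights, arbitrary sizes; this is not the paper's explicit construction), the following NumPy snippet defines a radial target, i.e. a function of the Euclidean norm alone, and the two network shapes the theorem compares:

```python
# Illustrative sketch only: shows what "2-layer" vs "3-layer" feedforward
# networks mean in the theorem, and an example of a radial target function.
# Weights are random; this is NOT the paper's explicit construction.
import numpy as np

d = 500                      # input dimension
rng = np.random.default_rng(0)

def radial_target(x):
    # A function that depends on x only through its Euclidean norm.
    return np.sin(np.linalg.norm(x, axis=-1))

def relu(z):
    return np.maximum(z, 0.0)

def two_layer(x, W1, b1, w2):
    # "2-layer" network: one hidden layer, then a linear output.
    return relu(x @ W1 + b1) @ w2

def three_layer(x, W1, b1, W2, b2, w3):
    # "3-layer" network: two hidden layers, then a linear output.
    h = relu(x @ W1 + b1)
    return relu(h @ W2 + b2) @ w3

width = 64                   # modest width; the theorem says a 2-layer net
                             # would need width exponential in d to keep up
W1 = rng.normal(size=(d, width)) / np.sqrt(d)
b1 = rng.normal(size=width)
W2 = rng.normal(size=(width, width)) / np.sqrt(width)
b2 = rng.normal(size=width)
w2 = rng.normal(size=width)
w3 = rng.normal(size=width)

x = rng.normal(size=(10, d))
print(radial_target(x))
print(two_layer(x, W1, b1, w2))
print(three_layer(x, W1, b1, W2, b2, w3))
```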

490 citations

Posted Content
TL;DR: In this paper, it was shown that there is a simple (approximately radial) function, expressible by a small 3-layer feed-forward neural network, that cannot be approximated by any 2-layer network to more than a certain constant accuracy unless its width is exponential in the dimension.
Abstract: We show that there is a simple (approximately radial) function on $\mathbb{R}^d$, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network, to more than a certain constant accuracy, unless its width is exponential in the dimension. The result holds for virtually all known activation functions, including rectified linear units, sigmoids and thresholds, and formally demonstrates that depth (even if increased by 1) can be exponentially more valuable than width for standard feedforward neural networks. Moreover, compared to related results in the context of Boolean functions, our result requires fewer assumptions, and the proof techniques and construction are very different.

413 citations

Journal ArticleDOI
TL;DR: In this paper, an early version of GPT-4 was investigated while it was still in active development by OpenAI, and it was shown to solve novel and difficult tasks spanning mathematics, coding, vision, medicine, law, psychology and more, without requiring any special prompting.
Abstract: Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.

318 citations

Journal ArticleDOI
TL;DR: In this article, the authors studied the problem of detecting the presence of an underlying high-dimensional geometric structure in a random graph, where each vertex corresponds to a latent independent random vector uniformly distributed on the sphere S^{d-1} and two vertices are connected if the corresponding latent vectors are close enough.
Abstract: We study the problem of detecting the presence of an underlying high-dimensional geometric structure in a random graph. Under the null hypothesis, the observed graph is a realization of an Erdős-Rényi random graph G(n,p). Under the alternative, the graph is generated from the G(n,p,d) model, where each vertex corresponds to a latent independent random vector uniformly distributed on the sphere S^{d-1}, and two vertices are connected if the corresponding latent vectors are close enough. In the dense regime (i.e., p is a constant), we propose a near-optimal and computationally efficient testing procedure based on a new quantity which we call signed triangles. The proof of the detection lower bound is based on a new bound on the total variation distance between a Wishart matrix and an appropriately normalized GOE matrix. In the sparse regime, we make a conjecture for the optimal detection boundary. We conclude the paper with some preliminary steps on the problem of estimating the dimension in G(n,p,d). © 2016 Wiley Periodicals, Inc. Random Struct. Alg., 49, 503–532, 2016
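
The signed-triangle statistic is the sum, over vertex triples, of products of centered adjacency entries, which can be read off the trace of the cubed centered adjacency matrix. A minimal sketch in NumPy, assuming a dense adjacency matrix and omitting the normalization and threshold of the actual test:

```python
# Minimal sketch of the signed-triangle statistic
#   tau(G) = sum over i<j<k of (A_ij - p)(A_ik - p)(A_jk - p),
# computed via the trace of the cubed centered adjacency matrix.
# Not the paper's full test (no normalization or detection threshold).
import numpy as np

def signed_triangles(A, p):
    """A: symmetric 0/1 adjacency matrix with zero diagonal; p: edge density."""
    B = A - p
    np.fill_diagonal(B, 0.0)           # keep diagonal terms out of the count
    # With a zero diagonal, trace(B^3) counts each unordered triple 6 times.
    return np.trace(B @ B @ B) / 6.0

# Example: an Erdos-Renyi graph G(n, p), where the statistic concentrates near 0.
n, p = 200, 0.5
rng = np.random.default_rng(1)
upper = rng.random((n, n)) < p
A = np.triu(upper, 1).astype(float)
A = A + A.T
print(signed_triangles(A, p))
```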

120 citations

Posted Content
TL;DR: In this paper, the authors considered the adversarial convex bandit problem, built the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n)\sqrt{T}$-regret for it, and showed that a simple variant of the algorithm can be run in $\mathrm{poly}(n \log(T))$-time per step at the cost of an additional $\mathrm{poly}(n) T^{o(1)}$ factor in the regret.
Abstract: We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n) \sqrt{T}$-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves $\tilde{O}(n^{9.5} \sqrt{T})$-regret, and we show that a simple variant of this algorithm can be run in $\mathrm{poly}(n \log(T))$-time per step at the cost of an additional $\mathrm{poly}(n) T^{o(1)}$ factor in the regret. These results improve upon the $\tilde{O}(n^{11} \sqrt{T})$-regret and $\exp(\mathrm{poly}(T))$-time result of the first two authors, and the $\log(T)^{\mathrm{poly}(n)} \sqrt{T}$-regret and $\log(T)^{\mathrm{poly}(n)}$-time result of Hazan and Li. Furthermore we conjecture that another variant of the algorithm could achieve $\tilde{O}(n^{1.5} \sqrt{T})$-regret, and moreover that this regret is unimprovable (the current best lower bound being $\Omega(n \sqrt{T})$ and it is achieved with linear functions). For the simpler situation of zeroth order stochastic convex optimization this corresponds to the conjecture that the optimal query complexity is of order $n^3 / \epsilon^2$.
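
The kernel-based algorithm itself is involved; purely as background for ingredient (iii), the sketch below runs exponential weights with bandit (single-point) feedback on a discretized one-dimensional domain and a learning rate that grows over time. The grid, loss function, and learning-rate schedule are arbitrary illustrative choices, not the paper's algorithm.

```python
# Background sketch only: exponential weights with bandit feedback on a grid,
# using a growing learning rate. Illustrates ingredient (iii) in spirit; it is
# not the paper's kernel-based poly(T)-time algorithm.
import numpy as np

rng = np.random.default_rng(0)
K = 101                                  # grid points discretizing [0, 1]
grid = np.linspace(0.0, 1.0, K)
T = 2000
weights = np.ones(K)

def adversary_loss(x, t):
    # Some convex loss in x at round t (illustrative choice).
    return (x - 0.3 - 0.2 * np.sin(t / 200.0)) ** 2

total_loss = 0.0
for t in range(1, T + 1):
    probs = weights / weights.sum()
    i = rng.choice(K, p=probs)           # play a point drawn from the weights
    loss = adversary_loss(grid[i], t)    # only this value is observed (bandit)
    total_loss += loss
    est = np.zeros(K)
    est[i] = loss / probs[i]             # importance-weighted loss estimate
    eta = 0.5 * np.sqrt(np.log(K)) * (t ** 0.25) / K   # increasing learning rate
    weights *= np.exp(-eta * est)
    weights /= weights.max()             # numerical stabilization

print("average loss:", total_loss / T)
```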

107 citations


Cited by
Posted Content
TL;DR: The authors showed that deep neural networks can easily fit a random labeling of the training data, that this phenomenon is qualitatively unaffected by explicit regularization, and that it occurs even if the true images are replaced by completely unstructured random noise.
Abstract: Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points as it usually does in practice. We interpret our experimental findings by comparison with traditional models.
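
The randomization test is straightforward to reproduce at toy scale. A hedged sketch in PyTorch, where the architecture, optimizer, and data sizes are arbitrary stand-ins for the paper's CIFAR-10/ImageNet experiments:

```python
# Small-scale sketch of the randomization test: a modest two-hidden-layer MLP
# driven to fit random labels on pure-noise inputs. Sizes, architecture, and
# optimizer are arbitrary stand-ins for the paper's full-scale experiments.
import torch
from torch import nn

torch.manual_seed(0)
n, dim, classes = 512, 256, 10
X = torch.randn(n, dim)                      # "images" = unstructured noise
y = torch.randint(0, classes, (n,))          # completely random labels

model = nn.Sequential(
    nn.Linear(dim, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, classes),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

train_acc = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"final training loss {loss.item():.4f}, training accuracy {train_acc:.2f}")
# With more parameters than data points, the network memorizes the random
# labels (training accuracy approaches 1), even though generalization is
# impossible by construction.
```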

2,523 citations

Book ChapterDOI
01 Jan 1998
TL;DR: In this paper, the authors explore questions of existence and uniqueness for solutions to stochastic differential equations and offer a study of their properties, using diffusion processes as a model of a Markov process with continuous sample paths.
Abstract: We explore in this chapter questions of existence and uniqueness for solutions to stochastic differential equations and offer a study of their properties. This endeavor is really a study of diffusion processes. Loosely speaking, the term diffusion is attributed to a Markov process which has continuous sample paths and can be characterized in terms of its infinitesimal generator.
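
For a concrete picture of a diffusion process, the sketch below runs an Euler-Maruyama discretization of a one-dimensional SDE $dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t$; the drift and diffusion coefficients chosen here (an Ornstein-Uhlenbeck process) are an illustrative example, not taken from the chapter.

```python
# Illustrative Euler-Maruyama simulation of dX_t = b(X_t) dt + sigma(X_t) dW_t,
# here with b(x) = -x and sigma(x) = 1 (an Ornstein-Uhlenbeck process).
# Purely a numerical illustration of a diffusion; not from the chapter.
import numpy as np

def euler_maruyama(b, sigma, x0, T=5.0, steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = np.empty(steps + 1)
    x[0] = x0
    for k in range(steps):
        dw = rng.normal(scale=np.sqrt(dt))          # Brownian increment
        x[k + 1] = x[k] + b(x[k]) * dt + sigma(x[k]) * dw
    return x

path = euler_maruyama(b=lambda x: -x, sigma=lambda x: 1.0, x0=2.0)
print(path[::1000])   # a few sample values along the simulated path
```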

2,446 citations

Journal ArticleDOI
TL;DR: These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.
Abstract: Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small gap between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points, as it usually does in practice. We interpret our experimental findings by comparison with traditional models. We supplement this republication with a new section at the end summarizing recent progress in the field since the original version of this paper.

1,664 citations

Book
Sébastien Bubeck
28 Oct 2015
TL;DR: This monograph presents the main complexity theorems in convex optimization and their corresponding algorithms, and provides a gentle introduction to structural optimization with FISTA, saddle-point mirror prox (Nemirovski's alternative to Nesterov's smoothing), and a concise description of interior point methods.
Abstract: This monograph presents the main complexity theorems in convex optimization and their corresponding algorithms. Starting from the fundamental theory of black-box optimization, the material progresses towards recent advances in structural optimization and stochastic optimization. Our presentation of black-box optimization, strongly influenced by the seminal book of Nesterov, includes the analysis of cutting plane methods, as well as accelerated gradient descent schemes. We also pay special attention to non-Euclidean settings (relevant algorithms include Frank-Wolfe, mirror descent, and dual averaging) and discuss their relevance in machine learning. We provide a gentle introduction to structural optimization with FISTA (to optimize a sum of a smooth and a simple non-smooth term), saddle-point mirror prox (Nemirovski's alternative to Nesterov's smoothing), and a concise description of interior point methods. In stochastic optimization we discuss stochastic gradient descent, mini-batches, random coordinate descent, and sublinear algorithms. We also briefly touch upon convex relaxation of combinatorial problems and the use of randomness to round solutions, as well as random-walk-based methods.
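
As a small taste of the black-box algorithms surveyed, here is a minimal sketch of the Frank-Wolfe (conditional gradient) method on the probability simplex with a least-squares objective; the problem instance and the classical 2/(t+2) step size are standard textbook choices, not specific to the monograph.

```python
# Minimal sketch of the Frank-Wolfe (conditional gradient) method on the
# probability simplex, minimizing f(x) = 0.5 * ||A x - b||^2.
# Standard textbook setup; the step size 2/(t+2) is the classical choice.
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 30
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

def grad(x):
    return A.T @ (A @ x - b)

x = np.ones(n) / n                     # start at the simplex barycenter
for t in range(200):
    g = grad(x)
    s = np.zeros(n)
    s[np.argmin(g)] = 1.0              # linear minimization oracle over the simplex
    gamma = 2.0 / (t + 2.0)
    x = (1.0 - gamma) * x + gamma * s  # convex combination stays in the simplex

print("objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```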

1,213 citations