Author

Eric B. Baum

Bio: Eric B. Baum is an academic researcher from Princeton University. The author has contributed to research in topics: Blocks world & Probability distribution. The author has an h-index of 16 and has co-authored 26 publications receiving 5,861 citations.

Papers
Proceedings ArticleDOI
05 Jul 1995
TL;DR: Culling is near optimal for this problem, highly noise tolerant, and the best known approach in some regimes; new large deviation bounds on the associated submartingale make it possible to determine the running time of the algorithm.
Abstract: We analyze the performance of a Genetic Type Algorithm we call Culling and a variety of other algorithms on a problem we refer to as ASP. Culling is near optimal for this problem, highly noise tolerant, and the best known approach in some regimes. We show that the problem of learning the Ising perceptron is reducible to noisy ASP. These results provide an example of a rigorous analysis of GAs and give insight into when and how GAs can beat competing methods. To analyze the genetic algorithm, we view it as a special type of submartingale. We prove some new large deviation bounds on this submartingale which enable us to determine the running time of the algorithm.
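
As a rough illustration of the "culling" idea — repeatedly expand a population and keep only the fittest survivors — here is a minimal sketch on a toy bit-string objective. The fitness function, mutation rate, and population sizes are arbitrary placeholders; this is not the exact algorithm or the ASP problem analyzed in the paper.

```python
import random

# Culling-style selection loop on a toy problem (maximize the number of 1 bits).
# Each generation, parents produce mutated offspring, then the combined pool is
# "culled" back down to the fittest pop_size individuals.

def fitness(bits):
    return sum(bits)  # toy objective: count of 1 bits

def mutate(bits, rate=0.05):
    return [b ^ (random.random() < rate) for b in bits]

def culling_ga(n_bits=64, pop_size=50, offspring_per_parent=4, generations=100):
    population = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [mutate(p) for p in population for _ in range(offspring_per_parent)]
        # Cull: keep only the top pop_size individuals by fitness.
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)

best = culling_ga()
print(fitness(best), "ones out of 64")
```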

4,520 citations

Journal ArticleDOI
01 Jan 1988
TL;DR: It is shown that if m ≥ O(W/∊ log N/∊) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 − ∊/2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 − ∊ of future test examples drawn from the same distribution.
Abstract: We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size. Assume 0 < e ≤ 1/8. We show that if m ≥ O(W/e log N/e) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 - e/2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 - e of future test examples drawn from the same distribution. Conversely, for fully-connected feedforward nets with one hidden layer, any learning algorithm using fewer than Ω(W/e) random training examples will, for some distributions of examples consistent with an appropriate weight choice, fail at least some fixed fraction of the time to find a weight choice that will correctly classify more than a 1 - e fraction of the future test examples.
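
As a rough illustration of how the sample bound scales, the sketch below evaluates m ≈ (W/ε) · log(N/ε) for a few accuracy targets. The constants hidden in the O(·) notation are ignored, so the numbers indicate scaling only, not actual guarantees.

```python
import math

# Rough scaling of the Baum-Haussler sample bound m ~ (W/eps) * log(N/eps).
# Constants from the O(.) bound are omitted; outputs are indicative only.

def sample_bound(weights_W, nodes_N, eps):
    return (weights_W / eps) * math.log(nodes_N / eps)

for eps in (0.1, 0.05, 0.01):
    m = sample_bound(weights_W=10_000, nodes_N=100, eps=eps)
    print(f"eps={eps}: roughly {m:,.0f} examples")
```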

1,649 citations

Journal ArticleDOI
TL;DR: This work shows that deciding if the target car can legally exit the grid is PSPACE-complete, and uses a lazy form of dual-rail reversible logic such that movement of "output" cars can only occur if logical combinations of "input" cars can also move.

87 citations

Proceedings Article
01 Oct 1990
TL;DR: Empirical tests show that the method can also learn far more complicated functions such as randomly generated networks with 200 hidden units, and requires only 30 minutes of CPU time to learn 200-bit parity to 99.7% accuracy.
Abstract: While the network loading problem for 2-layer threshold nets is NP-hard when learning from examples alone (as with backpropagation), (Baum, 91) has now proved that a learner can employ queries to evade the hidden unit credit assignment problem and PAC-load nets with up to four hidden units in polynomial time. Empirical tests show that the method can also learn far more complicated functions such as randomly generated networks with 200 hidden units. The algorithm easily approximates Wieland's 2-spirals function using a single layer of 50 hidden units, and requires only 30 minutes of CPU time to learn 200-bit parity to 99.7% accuracy.
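
Loosely speaking, what queries buy the learner is the ability to probe the target function at points of its own choosing, for example to locate a decision boundary by bisection between a positive and a negative example. The sketch below shows only that boundary-finding ingredient against a hypothetical membership oracle; it is not Baum's full PAC-loading algorithm, and the oracle and tolerance are illustrative placeholders.

```python
import numpy as np

# One ingredient of query learning: given a membership oracle for an unknown
# threshold function, bisect along the segment between a positive and a
# negative example to locate a point (approximately) on the decision boundary.

rng = np.random.default_rng(0)
w_true, b_true = rng.normal(size=5), 0.3          # hidden halfspace (unknown to learner)

def oracle(x):
    """Membership query: does the hidden function label x positive?"""
    return float(np.dot(w_true, x) + b_true) > 0

def find_boundary_point(x_pos, x_neg, tol=1e-6):
    """Bisect between a positive and a negative point until within tol of the boundary."""
    lo, hi = x_neg, x_pos
    while np.linalg.norm(hi - lo) > tol:
        mid = (lo + hi) / 2
        if oracle(mid):
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

x_pos = next(x for x in rng.normal(size=(1000, 5)) if oracle(x))
x_neg = next(x for x in rng.normal(size=(1000, 5)) if not oracle(x))
p = find_boundary_point(x_pos, x_neg)
print("distance of recovered point from true hyperplane:",
      abs(np.dot(w_true, p) + b_true) / np.linalg.norm(w_true))
```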

86 citations


Cited by
Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.
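
As an illustration of one model family the book covers, a radial basis function network can be fitted in closed form once the basis centres are fixed. A minimal sketch follows; the target function, centre placement, width, and ridge penalty are arbitrary choices for the example.

```python
import numpy as np

# Minimal radial-basis-function (RBF) network fit to a noisy 1-D target.
# With fixed Gaussian centres, the output weights are obtained by
# regularized linear least squares.

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)   # noisy target

centres = np.linspace(0, 1, 10)        # fixed basis centres (arbitrary for the sketch)
width = 0.1                            # shared Gaussian width

Phi = np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2 * width ** 2))
lam = 1e-3                             # small ridge penalty for numerical stability
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(centres.size), Phi.T @ y)

y_hat = Phi @ w
print("training RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))
```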

19,056 citations

Journal ArticleDOI
01 Aug 1997
TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone–Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
Abstract: In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update Littlestone–Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games, and prediction of points in R^n. In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of the new boosting algorithm to the problem of learning functions whose range, rather than being binary, is an arbitrary finite set or a bounded segment of the real line.
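
A minimal sketch of a Hedge-style multiplicative weight-update rule of the kind described above, run on synthetic losses; the loss matrix and the choice of β are arbitrary for the illustration.

```python
import numpy as np

# Hedge-style multiplicative weight update for apportioning weight among
# N options ("experts"). Each round, every option suffers a loss in [0, 1];
# its weight is multiplied by beta**loss, so persistently poor options fade.

rng = np.random.default_rng(2)
n_options, n_rounds = 5, 500
losses = rng.uniform(size=(n_rounds, n_options))
losses[:, 3] *= 0.5                    # make option 3 the best in hindsight

beta = 0.95
w = np.ones(n_options)
learner_loss = 0.0
for t in range(n_rounds):
    p = w / w.sum()                    # distribution over options this round
    learner_loss += p @ losses[t]      # expected loss of the mixture
    w *= beta ** losses[t]             # multiplicative update

best_fixed = losses.sum(axis=0).min()
print("learner loss:", round(learner_loss, 1), " best single option:", round(best_fixed, 1))
```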

15,813 citations

Journal ArticleDOI
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

14,635 citations

Journal ArticleDOI
TL;DR: It is demonstrated that finite linear combinations of compositions of a fixed, univariate function and a set of affine functionals can uniformly approximate any continuous function of n real variables with support in the unit hypercube.
Abstract: In this paper we demonstrate that finite linear combinations of compositions of a fixed, univariate function and a set of affine functionals can uniformly approximate any continuous function of n real variables with support in the unit hypercube; only mild conditions are imposed on the univariate function. Our results settle an open question about representability in the class of single hidden layer neural networks. In particular, we show that arbitrary decision regions can be arbitrarily well approximated by continuous feedforward neural networks with only a single internal, hidden layer and any continuous sigmoidal nonlinearity. The paper discusses approximation properties of other possible types of nonlinearities that might be implemented by artificial neural networks.
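
A minimal numerical illustration of this kind of approximation: draw random affine functionals, pass them through a fixed sigmoid, and fit only the output-layer coefficients by least squares. The target function, number of hidden units, and parameter ranges are arbitrary choices for the sketch.

```python
import numpy as np

# Approximating a continuous function on [0, 1] by a finite linear combination
# of sigmoids of affine functionals, i.e. a single hidden layer. The hidden
# affine maps are random; only the output coefficients are fitted.

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 400)
target = np.abs(np.sin(4 * np.pi * x)) * x           # some continuous target

n_hidden = 80
a = rng.normal(scale=20, size=n_hidden)              # slopes of affine functionals a*x + b
b = rng.uniform(-20, 20, size=n_hidden)              # offsets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H = sigmoid(np.outer(x, a) + b)                      # hidden-layer activations
coef, *_ = np.linalg.lstsq(H, target, rcond=None)    # fit output weights only
approx = H @ coef

print("max absolute error:", np.max(np.abs(approx - target)))
```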

12,286 citations

Proceedings ArticleDOI
01 Jul 1992
TL;DR: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented; it is applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions.
Abstract: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.
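
A minimal sketch of training such a maximal-margin classifier today, using scikit-learn's SVC (a modern soft-margin descendant of the approach, not the original 1992 implementation); the toy data, kernel, and regularization constant are illustrative only.

```python
import numpy as np
from sklearn.svm import SVC

# Maximal-margin classification on toy 2-D data with scikit-learn's SVC.
# The support vectors reported are the training patterns closest to the
# decision boundary, as described in the abstract above.

rng = np.random.default_rng(4)
X_pos = rng.normal(loc=[+2, +2], size=(50, 2))
X_neg = rng.normal(loc=[-2, -2], size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [-1] * 50)

clf = SVC(kernel="linear", C=10.0)     # "poly" or "rbf" kernels also apply
clf.fit(X, y)

print("support vectors per class:", clf.n_support_)
print("training accuracy:", clf.score(X, y))
```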

11,211 citations