Journal ArticleDOI

Approximation by superpositions of a sigmoidal function

01 Dec 1989-Mathematics of Control, Signals, and Systems (Springer-Verlag)-Vol. 2, Iss: 4, pp 303-314
TL;DR: It is demonstrated that finite linear combinations of compositions of a fixed, univariate function and a set of affine functionals can uniformly approximate any continuous function of n real variables with support in the unit hypercube.
Abstract: In this paper we demonstrate that finite linear combinations of compositions of a fixed, univariate function and a set of affine functionals can uniformly approximate any continuous function of n real variables with support in the unit hypercube; only mild conditions are imposed on the univariate function. Our results settle an open question about representability in the class of single hidden layer neural networks. In particular, we show that arbitrary decision regions can be arbitrarily well approximated by continuous feedforward neural networks with only a single internal, hidden layer and any continuous sigmoidal nonlinearity. The paper discusses approximation properties of other possible types of nonlinearities that might be implemented by artificial neural networks.
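
For orientation, a brief restatement of the main result (paraphrased here, not quoted from the paper): the approximants are finite sums of a single sigmoidal function composed with affine functionals,

G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left( y_j^{\mathsf T} x + \theta_j \right), \qquad y_j \in \mathbb{R}^n, \; \alpha_j, \theta_j \in \mathbb{R},

where \sigma is any continuous sigmoidal function, meaning \sigma(t) \to 1 as t \to +\infty and \sigma(t) \to 0 as t \to -\infty. The theorem states that sums of this form are dense, in the supremum norm, in C(I_n), the continuous functions on the unit hypercube I_n = [0,1]^n.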


Citations
Journal ArticleDOI
01 Jan 1998
TL;DR: Gradient-based learning with convolutional neural networks is shown to classify high-dimensional patterns such as handwritten characters with minimal preprocessing, and a new learning paradigm, graph transformer networks (GTN), allows multi-module recognition systems to be trained globally with gradient-based methods.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
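
As a toy illustration of gradient-based learning of a nonlinear decision surface (this is not the LeNet or GTN system described above; the data, architecture, and hyperparameters below are invented for the example), a one-hidden-layer sigmoid network can be trained by backpropagation on a synthetic two-class problem:

# Minimal sketch: gradient-based training of a one-hidden-layer sigmoid network
# on a synthetic two-class problem. Illustrative only; not code from the paper.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: points inside vs. outside a circle of radius 0.6 in [-1, 1]^2.
n = 400
X = rng.uniform(-1, 1, size=(n, 2))
y = (np.linalg.norm(X, axis=1) > 0.6).astype(float)

H = 16                                   # number of hidden units (arbitrary)
W1 = rng.normal(0.0, 1.0, size=(2, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 1.0, size=(H, 1)); b2 = np.zeros(1)
lr = 0.5

for step in range(2000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)             # hidden activations, shape (n, H)
    p = sigmoid(h @ W2 + b2).ravel()     # predicted probability of class 1
    # Backward pass for the average cross-entropy loss (gradients by hand).
    d_out = (p - y)[:, None] / n         # dL/d(output pre-activation)
    gW2 = h.T @ d_out
    gb2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * h * (1 - h)   # chain rule through the hidden sigmoids
    gW1 = X.T @ d_h
    gb1 = d_h.sum(axis=0)
    # Plain gradient-descent update.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print("training accuracy:", ((p > 0.5) == (y > 0.5)).mean())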

42,067 citations

Book
Vladimir Vapnik
01 Jan 1995
TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

40,147 citations


Cites background from "Approximation by superpositions of ..."

  • ...In 1989 Cybenko proved that using a superposition of sigmoid functions (neurons) one can approximate any smooth function (Cybenko, 1989)....

    [...]

Book
18 Nov 2016
TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Book
01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.

37,989 citations


Cites background from "Approximation by superpositions of ..."

  • ...However an ANN with a single hidden layer having a large enough finite number of sigmoid units can approximate any continuous function on a compact region of the network’s input space to any degree of accuracy (Cybenko, 1989)....

    [...]

Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations

References
Journal ArticleDOI
TL;DR: It is rigorously established that standard multilayer feedforward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy, provided sufficiently many hidden units are available.
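
A squashing function in this reference is a non-decreasing function with limit 0 at minus infinity and limit 1 at plus infinity, so continuous sigmoids are a special case. The following is a small numerical illustration of the density claim, not code from either paper; the target function, the random ridge directions, and the least-squares fit of only the outer coefficients are arbitrary choices made here:

# Illustrative sketch: fitting sums of sigmoids to a continuous function on [0, 1]
# and reporting the worst-case (sup-norm) error as the number of units grows.
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = np.linspace(0.0, 1.0, 512)
f = np.sin(4 * np.pi * x) + 0.5 * x          # an arbitrary continuous target

for N in (2, 8, 32, 128):
    # Random inner weights/offsets; only the outer coefficients alpha are fitted,
    # i.e. a least-squares fit of sum_j alpha_j * sigmoid(w_j * x + b_j) to f.
    w = rng.normal(0.0, 20.0, size=N)
    b = rng.uniform(-20.0, 20.0, size=N)
    Phi = sigmoid(np.outer(x, w) + b)        # design matrix, shape (512, N)
    alpha, *_ = np.linalg.lstsq(Phi, f, rcond=None)
    sup_err = np.max(np.abs(Phi @ alpha - f))
    print(f"N = {N:4d} sigmoid units -> sup-norm error ~ {sup_err:.3f}")

With this crude random-feature fit the error typically, though not necessarily monotonically, shrinks as N grows, which is the qualitative content of the density theorems.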

18,794 citations

Book
01 Jan 1973

14,545 citations

Book
01 Jan 1966
TL;DR: In this book, the Riesz representation theorem is used to describe the regularity properties of Borel measures, and consequences of the Radon-Nikodym theorem are developed, within a broader treatment of measure, integration, and complex analysis.
Abstract: Preface. Prologue: The Exponential Function. Chapter 1, Abstract Integration: Set-theoretic notations and terminology; The concept of measurability; Simple functions; Elementary properties of measures; Arithmetic in [0, ∞]; Integration of positive functions; Integration of complex functions; The role played by sets of measure zero; Exercises. Chapter 2, Positive Borel Measures: Vector spaces; Topological preliminaries; The Riesz representation theorem; Regularity properties of Borel measures; Lebesgue measure; Continuity properties of measurable functions; Exercises. Chapter 3, Lp-Spaces: Convex functions and inequalities; The Lp-spaces; Approximation by continuous functions; Exercises. Chapter 4, Elementary Hilbert Space Theory: Inner products and linear functionals; Orthonormal sets; Trigonometric series; Exercises. Chapter 5, Examples of Banach Space Techniques: Banach spaces; Consequences of Baire's theorem; Fourier series of continuous functions; Fourier coefficients of L1-functions; The Hahn-Banach theorem; An abstract approach to the Poisson integral; Exercises. Chapter 6, Complex Measures: Total variation; Absolute continuity; Consequences of the Radon-Nikodym theorem; Bounded linear functionals on Lp; The Riesz representation theorem; Exercises. Chapter 7, Differentiation: Derivatives of measures; The fundamental theorem of Calculus; Differentiable transformations; Exercises. Chapter 8, Integration on Product Spaces: Measurability on cartesian products; Product measures; The Fubini theorem; Completion of product measures; Convolutions; Distribution functions; Exercises. Chapter 9, Fourier Transforms: Formal properties; The inversion theorem; The Plancherel theorem; The Banach algebra L1; Exercises. Chapter 10, Elementary Properties of Holomorphic Functions: Complex differentiation; Integration over paths; The local Cauchy theorem; The power series representation; The open mapping theorem; The global Cauchy theorem; The calculus of residues; Exercises. Chapter 11, Harmonic Functions: The Cauchy-Riemann equations; The Poisson integral; The mean value property; Boundary behavior of Poisson integrals; Representation theorems; Exercises. Chapter 12, The Maximum Modulus Principle: Introduction; The Schwarz lemma; The Phragmen-Lindelof method; An interpolation theorem; A converse of the maximum modulus theorem; Exercises. Chapter 13, Approximation by Rational Functions: Preparation; Runge's theorem; The Mittag-Leffler theorem; Simply connected regions; Exercises. Chapter 14, Conformal Mapping: Preservation of angles; Linear fractional transformations; Normal families; The Riemann mapping theorem; The class L; Continuity at the boundary; Conformal mapping of an annulus; Exercises. Chapter 15, Zeros of Holomorphic Functions: Infinite Products; The Weierstrass factorization theorem; An interpolation problem; Jensen's formula; Blaschke products; The Muntz-Szasz theorem; Exercises. Chapter 16, Analytic Continuation: Regular points and singular points; Continuation along curves; The monodromy theorem; Construction of a modular function; The Picard theorem; Exercises. Chapter 17, Hp-Spaces: Subharmonic functions; The spaces Hp and N; The theorem of F. and M. Riesz; Factorization theorems; The shift operator; Conjugate functions; Exercises. Chapter 18, Elementary Theory of Banach Algebras: Introduction; The invertible elements; Ideals and homomorphisms; Applications; Exercises. Chapter 19, Holomorphic Fourier Transforms: Introduction; Two theorems of Paley and Wiener; Quasi-analytic classes; The Denjoy-Carleman theorem; Exercises. Chapter 20, Uniform Approximation by Polynomials: Introduction; Some lemmas; Mergelyan's theorem; Exercises. Appendix: Hausdorff's Maximality Theorem. Notes and Comments. Bibliography. List of Special Symbols. Index.

9,642 citations

Journal ArticleDOI
TL;DR: This paper provides an introduction to the field of artificial neural nets by reviewing six important neural net models that can be used for pattern classification and exploring how some existing classification and clustering algorithms can be performed using simple neuron-like components.
Abstract: Artificial neural net models have been studied for many years in the hope of achieving human-like performance in the fields of speech and image recognition. These models are composed of many nonlinear computational elements operating in parallel and arranged in patterns reminiscent of biological neural nets. Computational elements or nodes are connected via weights that are typically adapted during use to improve performance. There has been a recent resurgence in the field of artificial neural nets caused by new net topologies and algorithms, analog VLSI implementation techniques, and the belief that massive parallelism is essential for high performance speech and image recognition. This paper provides an introduction to the field of artificial neural nets by reviewing six important neural net models that can be used for pattern classification. These nets are highly parallel building blocks that illustrate neural net components and design principles and can be used to construct more complex systems. In addition to describing these nets, a major emphasis is placed on exploring how some existing classification and clustering algorithms can be performed using simple neuron-like components. Single-layer nets can implement algorithms required by Gaussian maximum-likelihood classifiers and optimum minimum-error classifiers for binary patterns corrupted by noise. More generally, the decision regions required by any classification algorithm can be generated in a straightforward manner by three-layer feed-forward nets.
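
As a small illustrative check of one claim above (not code from the paper; the class means, covariance, and sample sizes are invented): for two Gaussian classes with a shared covariance and equal priors, the maximum-likelihood decision rule is linear in the input, so a single-layer unit can implement it exactly.

# Illustrative sketch: the Gaussian maximum-likelihood rule for two classes with a
# shared covariance reduces to a single linear unit w.x + c > 0.
import numpy as np

rng = np.random.default_rng(2)
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.8]])
Sigma_inv = np.linalg.inv(Sigma)

# Single-layer weights derived from the Gaussian parameters (equal priors assumed).
w = Sigma_inv @ (mu1 - mu0)
c = -0.5 * (mu1 @ Sigma_inv @ mu1 - mu0 @ Sigma_inv @ mu0)

# Sample from both classes and compare the linear rule with a direct likelihood test.
X = np.vstack([rng.multivariate_normal(mu0, Sigma, size=250),
               rng.multivariate_normal(mu1, Sigma, size=250)])

def log_lik(xpt, mu):
    d = xpt - mu
    return -0.5 * d @ Sigma_inv @ d          # up to a constant shared by both classes

linear_rule = X @ w + c > 0
ml_rule = np.array([log_lik(xpt, mu1) > log_lik(xpt, mu0) for xpt in X])
print("agreement between linear unit and ML rule:", (linear_rule == ml_rule).mean())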

7,798 citations