Journal ArticleDOI

Approximation by superpositions of a sigmoidal function

01 Dec 1989-Mathematics of Control, Signals, and Systems (Springer-Verlag)-Vol. 2, Iss: 4, pp 303-314
TL;DR: It is demonstrated that finite linear combinations of compositions of a fixed, univariate function and a set of affine functionals can uniformly approximate any continuous function of n real variables with support in the unit hypercube.
Abstract: In this paper we demonstrate that finite linear combinations of compositions of a fixed, univariate function and a set of affine functionals can uniformly approximate any continuous function of n real variables with support in the unit hypercube; only mild conditions are imposed on the univariate function. Our results settle an open question about representability in the class of single hidden layer neural networks. In particular, we show that arbitrary decision regions can be arbitrarily well approximated by continuous feedforward neural networks with only a single internal, hidden layer and any continuous sigmoidal nonlinearity. The paper discusses approximation properties of other possible types of nonlinearities that might be implemented by artificial neural networks.
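
For orientation, a brief restatement of the main result (paraphrased here, not quoted from the paper): the approximants are finite sums of a single sigmoidal function composed with affine functionals,

G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left( y_j^{\mathsf T} x + \theta_j \right), \qquad y_j \in \mathbb{R}^n, \; \alpha_j, \theta_j \in \mathbb{R},

where \sigma is any continuous sigmoidal function, meaning \sigma(t) \to 1 as t \to +\infty and \sigma(t) \to 0 as t \to -\infty. The theorem states that sums of this form are dense, in the supremum norm, in C(I_n), the continuous functions on the unit hypercube I_n = [0,1]^n.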


Citations
Journal ArticleDOI
01 Jan 1998
TL;DR: Gradient-based learning with convolutional neural networks is shown to classify high-dimensional patterns such as handwritten characters with minimal preprocessing, and a new learning paradigm, graph transformer networks (GTN), allows multi-module recognition systems to be trained globally with gradient-based methods.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
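
As a toy illustration of gradient-based learning of a nonlinear decision surface (this is not the LeNet or GTN system described above; the data, architecture, and hyperparameters below are invented for the example), a one-hidden-layer sigmoid network can be trained by backpropagation on a synthetic two-class problem:

# Minimal sketch: gradient-based training of a one-hidden-layer sigmoid network
# on a synthetic two-class problem. Illustrative only; not code from the paper.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: points inside vs. outside a circle of radius 0.6 in [-1, 1]^2.
n = 400
X = rng.uniform(-1, 1, size=(n, 2))
y = (np.linalg.norm(X, axis=1) > 0.6).astype(float)

H = 16                                   # number of hidden units (arbitrary)
W1 = rng.normal(0.0, 1.0, size=(2, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 1.0, size=(H, 1)); b2 = np.zeros(1)
lr = 0.5

for step in range(2000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)             # hidden activations, shape (n, H)
    p = sigmoid(h @ W2 + b2).ravel()     # predicted probability of class 1
    # Backward pass for the average cross-entropy loss (gradients by hand).
    d_out = (p - y)[:, None] / n         # dL/d(output pre-activation)
    gW2 = h.T @ d_out
    gb2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * h * (1 - h)   # chain rule through the hidden sigmoids
    gW1 = X.T @ d_h
    gb1 = d_h.sum(axis=0)
    # Plain gradient-descent update.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print("training accuracy:", ((p > 0.5) == (y > 0.5)).mean())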

42,067 citations

Book
Vladimir Vapnik
01 Jan 1995
TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

40,147 citations


Cites background from "Approximation by superpositions of ..."

  • ...In 1989 Cybenko proved that using a superposition of sigmoid functions (neurons) one can approximate any smooth function (Cybenko, 1989)....

    [...]

Book
18 Nov 2016
TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Book
01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.

37,989 citations


Cites background from "Approximation by superpositions of ..."

  • ...However an ANN with a single hidden layer having a large enough finite number of sigmoid units can approximate any continuous function on a compact region of the network’s input space to any degree of accuracy (Cybenko, 1989)....

    [...]

Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations

References
Journal ArticleDOI
TL;DR: It is rigorously established that standard multilayer feedforward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy, provided sufficiently many hidden units are available.
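
A squashing function in this reference is a non-decreasing function with limit 0 at minus infinity and limit 1 at plus infinity, so continuous sigmoids are a special case. The following is a small numerical illustration of the density claim, not code from either paper; the target function, the random ridge directions, and the least-squares fit of only the outer coefficients are arbitrary choices made here:

# Illustrative sketch: fitting sums of sigmoids to a continuous function on [0, 1]
# and reporting the worst-case (sup-norm) error as the number of units grows.
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = np.linspace(0.0, 1.0, 512)
f = np.sin(4 * np.pi * x) + 0.5 * x          # an arbitrary continuous target

for N in (2, 8, 32, 128):
    # Random inner weights/offsets; only the outer coefficients alpha are fitted,
    # i.e. a least-squares fit of sum_j alpha_j * sigmoid(w_j * x + b_j) to f.
    w = rng.normal(0.0, 20.0, size=N)
    b = rng.uniform(-20.0, 20.0, size=N)
    Phi = sigmoid(np.outer(x, w) + b)        # design matrix, shape (512, N)
    alpha, *_ = np.linalg.lstsq(Phi, f, rcond=None)
    sup_err = np.max(np.abs(Phi @ alpha - f))
    print(f"N = {N:4d} sigmoid units -> sup-norm error ~ {sup_err:.3f}")

With this crude random-feature fit the error typically, though not necessarily monotonically, shrinks as N grows, which is the qualitative content of the density theorems.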

18,794 citations

Book
01 Jan 1973

14,545 citations

Book
01 Jan 1966
TL;DR: In this book, the Riesz representation theorem is used to describe the regularity properties of Borel measures, and consequences of the Radon-Nikodym theorem are developed, within a broader treatment of measure, integration, and complex analysis.
Abstract: Preface. Prologue: The Exponential Function. Chapter 1, Abstract Integration: Set-theoretic notations and terminology; The concept of measurability; Simple functions; Elementary properties of measures; Arithmetic in [0, ∞]; Integration of positive functions; Integration of complex functions; The role played by sets of measure zero; Exercises. Chapter 2, Positive Borel Measures: Vector spaces; Topological preliminaries; The Riesz representation theorem; Regularity properties of Borel measures; Lebesgue measure; Continuity properties of measurable functions; Exercises. Chapter 3, Lp-Spaces: Convex functions and inequalities; The Lp-spaces; Approximation by continuous functions; Exercises. Chapter 4, Elementary Hilbert Space Theory: Inner products and linear functionals; Orthonormal sets; Trigonometric series; Exercises. Chapter 5, Examples of Banach Space Techniques: Banach spaces; Consequences of Baire's theorem; Fourier series of continuous functions; Fourier coefficients of L1-functions; The Hahn-Banach theorem; An abstract approach to the Poisson integral; Exercises. Chapter 6, Complex Measures: Total variation; Absolute continuity; Consequences of the Radon-Nikodym theorem; Bounded linear functionals on Lp; The Riesz representation theorem; Exercises. Chapter 7, Differentiation: Derivatives of measures; The fundamental theorem of Calculus; Differentiable transformations; Exercises. Chapter 8, Integration on Product Spaces: Measurability on cartesian products; Product measures; The Fubini theorem; Completion of product measures; Convolutions; Distribution functions; Exercises. Chapter 9, Fourier Transforms: Formal properties; The inversion theorem; The Plancherel theorem; The Banach algebra L1; Exercises. Chapter 10, Elementary Properties of Holomorphic Functions: Complex differentiation; Integration over paths; The local Cauchy theorem; The power series representation; The open mapping theorem; The global Cauchy theorem; The calculus of residues; Exercises. Chapter 11, Harmonic Functions: The Cauchy-Riemann equations; The Poisson integral; The mean value property; Boundary behavior of Poisson integrals; Representation theorems; Exercises. Chapter 12, The Maximum Modulus Principle: Introduction; The Schwarz lemma; The Phragmen-Lindelof method; An interpolation theorem; A converse of the maximum modulus theorem; Exercises. Chapter 13, Approximation by Rational Functions: Preparation; Runge's theorem; The Mittag-Leffler theorem; Simply connected regions; Exercises. Chapter 14, Conformal Mapping: Preservation of angles; Linear fractional transformations; Normal families; The Riemann mapping theorem; The class L; Continuity at the boundary; Conformal mapping of an annulus; Exercises. Chapter 15, Zeros of Holomorphic Functions: Infinite Products; The Weierstrass factorization theorem; An interpolation problem; Jensen's formula; Blaschke products; The Muntz-Szasz theorem; Exercises. Chapter 16, Analytic Continuation: Regular points and singular points; Continuation along curves; The monodromy theorem; Construction of a modular function; The Picard theorem; Exercises. Chapter 17, Hp-Spaces: Subharmonic functions; The spaces Hp and N; The theorem of F. and M. Riesz; Factorization theorems; The shift operator; Conjugate functions; Exercises. Chapter 18, Elementary Theory of Banach Algebras: Introduction; The invertible elements; Ideals and homomorphisms; Applications; Exercises. Chapter 19, Holomorphic Fourier Transforms: Introduction; Two theorems of Paley and Wiener; Quasi-analytic classes; The Denjoy-Carleman theorem; Exercises. Chapter 20, Uniform Approximation by Polynomials: Introduction; Some lemmas; Mergelyan's theorem; Exercises. Appendix: Hausdorff's Maximality Theorem. Notes and Comments. Bibliography. List of Special Symbols. Index.

9,642 citations

Journal ArticleDOI
TL;DR: This paper provides an introduction to the field of artificial neural nets by reviewing six important neural net models that can be used for pattern classification and exploring how some existing classification and clustering algorithms can be performed using simple neuron-like components.
Abstract: Artificial neural net models have been studied for many years in the hope of achieving human-like performance in the fields of speech and image recognition. These models are composed of many nonlinear computational elements operating in parallel and arranged in patterns reminiscent of biological neural nets. Computational elements or nodes are connected via weights that are typically adapted during use to improve performance. There has been a recent resurgence in the field of artificial neural nets caused by new net topologies and algorithms, analog VLSI implementation techniques, and the belief that massive parallelism is essential for high performance speech and image recognition. This paper provides an introduction to the field of artificial neural nets by reviewing six important neural net models that can be used for pattern classification. These nets are highly parallel building blocks that illustrate neural net components and design principles and can be used to construct more complex systems. In addition to describing these nets, a major emphasis is placed on exploring how some existing classification and clustering algorithms can be performed using simple neuron-like components. Single-layer nets can implement algorithms required by Gaussian maximum-likelihood classifiers and optimum minimum-error classifiers for binary patterns corrupted by noise. More generally, the decision regions required by any classification algorithm can be generated in a straightforward manner by three-layer feed-forward nets.
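
As a small illustrative check of one claim above (not code from the paper; the class means, covariance, and sample sizes are invented): for two Gaussian classes with a shared covariance and equal priors, the maximum-likelihood decision rule is linear in the input, so a single-layer unit can implement it exactly.

# Illustrative sketch: the Gaussian maximum-likelihood rule for two classes with a
# shared covariance reduces to a single linear unit w.x + c > 0.
import numpy as np

rng = np.random.default_rng(2)
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.8]])
Sigma_inv = np.linalg.inv(Sigma)

# Single-layer weights derived from the Gaussian parameters (equal priors assumed).
w = Sigma_inv @ (mu1 - mu0)
c = -0.5 * (mu1 @ Sigma_inv @ mu1 - mu0 @ Sigma_inv @ mu0)

# Sample from both classes and compare the linear rule with a direct likelihood test.
X = np.vstack([rng.multivariate_normal(mu0, Sigma, size=250),
               rng.multivariate_normal(mu1, Sigma, size=250)])

def log_lik(xpt, mu):
    d = xpt - mu
    return -0.5 * d @ Sigma_inv @ d          # up to a constant shared by both classes

linear_rule = X @ w + c > 0
ml_rule = np.array([log_lik(xpt, mu1) > log_lik(xpt, mu0) for xpt in X])
print("agreement between linear unit and ML rule:", (linear_rule == ml_rule).mean())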

7,798 citations