Journal ArticleDOI

Asymptotic global behavior for stochastic approximation and diffusions with slowly decreasing noise effects: Global minimization via Monte Carlo

01 Mar 1987-Siam Journal on Applied Mathematics (Society for Industrial and Applied Mathematics)-Vol. 47, Iss: 1, pp 169-185
TL;DR: In this article, the authors study the asymptotic behavior of annealing-type algorithms in which the objective function values can only be sampled via Monte Carlo, so that the discrete algorithm is a combination of stochastic approximation and simulated annealing.
Abstract: The asymptotic behavior of the systems $X_{n+1} = X_n + a_n b(X_n, \xi_n) + a_n \sigma(X_n)\psi_n$ and $dy = \bar b(y)\,dt + \sqrt{a(t)}\,\sigma(y)\,dw$ is studied, where $\{\psi_n\}$ is i.i.d. Gaussian, $\{\xi_n\}$ is a (correlated) bounded sequence of random variables and $a_n \approx A_0/\log(A_1 + n)$. Without $\{\xi_n\}$, such algorithms are versions of the “simulated annealing” method for global optimization. When the objective function values can only be sampled via Monte Carlo, the discrete algorithm is a combination of stochastic approximation and simulated annealing. Our forms are appropriate. The $\{\psi_n\}$ are the “annealing” variables, and $\{\xi_n\}$ is the sampling noise. For large $A_0$, a full asymptotic analysis is presented, via the theory of large deviations: Mean escape time (after arbitrary time n) from neighborhoods of stable sets of the algorithm, mean transition times (after arbitrary time n) from a neighborhood of one stable set to another, a...
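As a concrete illustration of the discrete recursion above, here is a minimal numerical sketch, assuming the drift $b(X_n,\xi_n)$ is a noisy Monte Carlo estimate of $-\nabla U(X_n)$; the constants, the double-well test function, and all names are illustrative choices, not taken from the paper.

```python
import numpy as np

def annealed_stochastic_approximation(drift_estimate, x0, n_steps=20000,
                                      A0=0.05, A1=10.0, sigma=1.0, rng=None):
    """Sketch of X_{n+1} = X_n + a_n b(X_n, xi_n) + a_n sigma(X_n) psi_n
    with gain a_n = A0 / log(A1 + n); psi_n is i.i.d. Gaussian "annealing" noise,
    and the Monte Carlo sampling noise xi_n is hidden inside drift_estimate."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for n in range(n_steps):
        a_n = A0 / np.log(A1 + n)             # slowly decreasing gain
        drift = drift_estimate(x, rng)        # b(X_n, xi_n): noisy drift estimate
        psi = rng.standard_normal(x.shape)    # psi_n: Gaussian annealing noise
        x = x + a_n * drift + a_n * sigma * psi
    return x

# Toy example: double-well U(x) = (x^2 - 1)^2 with a noisy gradient estimate;
# for large n the iterate hovers near the global minima at +/-1.
if __name__ == "__main__":
    noisy_neg_grad = lambda x, r: -4.0 * x * (x**2 - 1.0) + 0.1 * r.standard_normal(x.shape)
    print(annealed_stochastic_approximation(noisy_neg_grad, x0=np.array([1.5])))
```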
Citations
Journal ArticleDOI
TL;DR: Concepts and analytical results from the literatures of mathematical statistics, econometrics, systems identification, and optimization theory relevant to the analysis of learning in artificial neural networks are reviewed.
Abstract: The premise of this article is that learning procedures used to train artificial neural networks are inherently statistical techniques. It follows that statistical theory can provide considerable insight into the properties, advantages, and disadvantages of different network learning methods. We review concepts and analytical results from the literatures of mathematical statistics, econometrics, systems identification, and optimization theory relevant to the analysis of learning in artificial neural networks. Because of the considerable variety of available learning procedures and necessary limitations of space, we cannot provide a comprehensive treatment. Our focus is primarily on learning procedures for feedforward networks. However, many of the concepts and issues arising in this framework are also quite broadly relevant to other network learning paradigms. In addition to providing useful insights, the material reviewed here suggests some potentially useful new training methods for artificial neural ne...

969 citations


Cites background or methods from "Asymptotic global behavior for stoc..."

  • ...The results of White (1989b) and of Kuan and White (1989) can be used to construct tests of the irrelevant input hypothesis and the irrelevant hidden unit hypothesis in ways directly analogous to those discussed earlier....

    [...]

  • ...Kushner (1987) has studied a modification of equation 4.7 guaranteed to converge w.p.1 to a global solution to equation 4.1 as n → ∞....

    [...]

Journal ArticleDOI
TL;DR: Artificial neural network models are reviewed and analyzed from an econometric perspective.
Abstract: (1994). Artificial neural networks: an econometric perspective. Econometric Reviews: Vol. 13, No. 1, pp. 1-91.

484 citations

Journal ArticleDOI
Guozhong An
TL;DR: It is shown that input noise and weight noise encourage the neural-network output to be a smooth function of the input or of its weights, respectively, and that in the weak-noise limit, noise added to the output of the network only changes the objective function by a constant and therefore cannot improve generalization.
Abstract: We study the effects of adding noise to the inputs, outputs, weight connections, and weight changes of multilayer feedforward neural networks during backpropagation training. We rigorously derive and analyze the objective functions that are minimized by the noise-affected training processes. We show that input noise and weight noise encourage the neural-network output to be a smooth function of the input or its weights, respectively. In the weak-noise limit, noise added to the output of the neural networks only changes the objective function by a constant. Hence, it cannot improve generalization. Input noise introduces penalty terms in the objective function that are related to, but distinct from, those found in the regularization approaches. Simulations have been performed on a regression and a classification problem to further substantiate our analysis. Input noise is found to be effective in improving the generalization performance for both problems. However, weight noise is found to be effective in improving the generalization performance only for the classification problem. Other forms of noise have practically no effect on generalization.

465 citations
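As a concrete illustration of one of the noise-injection schemes analyzed above, the sketch below trains a tiny single-hidden-layer network by backpropagation with Gaussian noise added to the inputs; the network size, learning rate, noise level, and regression target are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Regression data: all problem details here are illustrative.
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X)

# Single hidden layer with tanh units.
W1 = rng.standard_normal((1, 16)) * 0.5
b1 = np.zeros(16)
W2 = rng.standard_normal((16, 1)) * 0.5
b2 = np.zeros(1)
lr, noise_std = 0.05, 0.1

for epoch in range(2000):
    X_noisy = X + noise_std * rng.standard_normal(X.shape)  # perturb inputs only
    h = np.tanh(X_noisy @ W1 + b1)        # forward pass
    pred = h @ W2 + b2
    err = pred - y                         # squared-error residual
    # backpropagation
    dW2 = h.T @ err / len(X)
    db2 = err.mean(axis=0)
    dh = err @ W2.T * (1 - h**2)
    dW1 = X_noisy.T @ dh / len(X)
    db1 = dh.mean(axis=0)
    # gradient-descent update
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

# Evaluate on clean inputs: input noise acts like a smoothness penalty on the fit.
mse = np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2)
print(f"clean-input MSE after noisy-input training: {mse:.4f}")
```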

Journal ArticleDOI
TL;DR: In this article, the authors investigate the properties of a recursive estimation procedure (the method of "back-propagation") for a class of nonlinear regression models (single hidden-layer feedforward network models) recently developed by cognitive scientists.
Abstract: We investigate the properties of a recursive estimation procedure (the method of “back-propagation”) for a class of nonlinear regression models (single hidden-layer feedforward network models) recently developed by cognitive scientists. The results follow from more general results for a class of recursive m-estimators, obtained using theorems of Ljung (1977) and Walk (1977) for the method of stochastic approximation. Conditions are given ensuring that the back-propagation estimator converges almost surely to a parameter value that locally minimizes expected squared error loss (provided the estimator does not diverge) and that the back-propagation estimator is asymptotically normal when centered at this minimizer. This estimator is shown to be statistically inefficient, and a two-step procedure that has efficiency equivalent to that of nonlinear least squares is proposed. Practical issues are illustrated by a numerical example involving approximation of the Hénon map.

448 citations


Cites background from "Asymptotic global behavior for stoc..."

  • ...Results of Ljung (1977) and Kushner and Clark (1978) provide a basis for proving the consistency and asymptotic normality of back-propagation and related methods with dependent heterogeneous...

    [...]

  • ...Kushner (1987) established convergence to a global optimum with $\psi_n$ Gaussian and a gain sequence proportional to $1/\log(n + 1)$....

    [...]

References
Journal ArticleDOI
13 May 1983-Science
TL;DR: There is a deep and useful connection between statistical mechanics and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters), and a detailed analogy with annealing in solids provides a framework for optimization of very large and complex systems.
Abstract: There is a deep and useful connection between statistical mechanics (the behavior of systems with many degrees of freedom in thermal equilibrium at a finite temperature) and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters). A detailed analogy with annealing in solids provides a framework for optimization of the properties of very large and complex systems. This connection to statistical mechanics exposes new information and provides an unfamiliar perspective on traditional optimization problems and methods.

41,772 citations
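To make the annealing idea concrete, here is a minimal Metropolis-style simulated-annealing sketch applied to a toy spin-glass-like combinatorial problem; the logarithmic cooling schedule and all parameters are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def simulated_annealing(energy, propose, state, n_steps=50000, T0=2.0, rng=None):
    """Metropolis acceptance with slowly decreasing temperature: uphill moves
    of size dE are accepted with probability exp(-dE / T)."""
    rng = np.random.default_rng() if rng is None else rng
    E = energy(state)
    best, best_E = state.copy(), E
    for n in range(n_steps):
        T = T0 / np.log(2.0 + n)              # slow (logarithmic) cooling
        cand = propose(state, rng)
        dE = energy(cand) - E
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            state, E = cand, E + dE
        if E < best_E:
            best, best_E = state.copy(), E
    return best, best_E

# Toy combinatorial problem: choose a +/-1 assignment minimizing a random
# quadratic (spin-glass-like) energy s^T J s.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    J = rng.standard_normal((30, 30))
    J = (J + J.T) / 2.0
    energy = lambda s: float(s @ J @ s)

    def flip_one_spin(s, r):
        c = s.copy()
        i = r.integers(len(c))
        c[i] = -c[i]                          # flip a single spin
        return c

    s0 = rng.choice([-1.0, 1.0], size=30)
    best, best_E = simulated_annealing(energy, flip_one_spin, s0, rng=rng)
    print("best energy found:", best_E)
```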

Journal ArticleDOI
TL;DR: The analogy between images and statistical mechanics systems is made and the analogous operation under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations, creating a highly parallel ``relaxation'' algorithm for MAP estimation.
Abstract: We make an analogy between images and statistical mechanics systems. Pixel gray levels and the presence and orientation of edges are viewed as states of atoms or molecules in a lattice-like physical system. The assignment of an energy function in the physical system determines its Gibbs distribution. Because of the equivalence between the Gibbs distribution and Markov random fields (MRFs), this assignment also determines an MRF image model. The energy function is a more convenient and natural mechanism for embodying picture attributes than are the local characteristics of the MRF. For a range of degradation mechanisms, including blurring, nonlinear deformations, and multiplicative or additive noise, the posterior distribution is an MRF with a structure akin to the image model. By the analogy, the posterior distribution defines another (imaginary) physical system. Gradual temperature reduction in the physical system isolates low energy states (``annealing''), or what is the same thing, the most probable states under the Gibbs distribution. The analogous operation under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations. The result is a highly parallel ``relaxation'' algorithm for MAP estimation. We establish convergence properties of the algorithm and we experiment with some simple pictures, for which good restorations are obtained at low signal-to-noise ratios.

18,761 citations
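Below is a small sketch of the annealed Gibbs-sampling idea described above, applied to MAP restoration of a binary (+/-1) image under an Ising-type prior; the energy, parameters, and temperature schedule are illustrative simplifications, not the paper's exact formulation.

```python
import numpy as np

def annealed_gibbs_denoise(noisy, n_sweeps=30, beta=1.5, lam=1.0, T0=4.0, rng=None):
    """MAP restoration by Gibbs sampling with gradual temperature reduction.
    Posterior energy (up to constants), with y the noisy data:
        U(x) = -beta * sum_<i,j> x_i x_j - lam * sum_i x_i y_i."""
    rng = np.random.default_rng() if rng is None else rng
    x = noisy.copy()
    H, W = x.shape
    for sweep in range(n_sweeps):
        T = T0 / np.log(2.0 + sweep)          # slowly decreasing temperature
        for i in range(H):
            for j in range(W):
                nb = 0.0                      # sum over the 4-neighbourhood
                if i > 0:     nb += x[i - 1, j]
                if i < H - 1: nb += x[i + 1, j]
                if j > 0:     nb += x[i, j - 1]
                if j < W - 1: nb += x[i, j + 1]
                # Gibbs conditional: P(x_ij = +1 | rest) = sigmoid(2(beta*nb + lam*y_ij)/T)
                dE = 2.0 * (beta * nb + lam * noisy[i, j])
                p_plus = 1.0 / (1.0 + np.exp(-dE / T))
                x[i, j] = 1.0 if rng.random() < p_plus else -1.0
    return x

# Usage sketch: a block image corrupted by flipping 20% of its pixels.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = -np.ones((32, 32)); clean[8:24, 8:24] = 1.0
    flips = rng.random(clean.shape) < 0.2
    noisy = np.where(flips, -clean, clean)
    restored = annealed_gibbs_denoise(noisy, rng=rng)
    print("pixel agreement with clean image:", np.mean(restored == clean))
```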

Book
01 Jan 1984
TL;DR: The large deviation problem for empirical distributions of Markov processes is studied in this book and applied to the Wiener sausage problem.
Abstract: Contents: Large Deviations; Cramer's Theorem; Multidimensional Version of Cramer's Theorem; An Infinite Dimensional Example: Brownian Motion; The Ventcel-Freidlin Theory; The Exit Problem; Empirical Distributions; The Large Deviation Problem for Empirical Distributions of Markov Processes; Some Properties of Entropy; Upper Bounds; Lower Bounds; Contraction Principle; Application to the Problem of the Wiener Sausage; The Polaron Problem.

921 citations

Journal ArticleDOI
TL;DR: In this article, the authors seek a global minimum of $U:[0,1]^n \to \mathbb{R}$ by adding slowly decreasing Brownian noise to gradient descent, so that the solution concentrates on the global minima of $U$ as the temperature $T(t)$ decreases to zero.
Abstract: We seek a global minimum of $U:[0,1]^n \to \mathbb{R}$. The solution to $(d/dt)x_t = -\nabla U(x_t)$ will find local minima. The solution to $dx_t = -\nabla U(x_t)\,dt + \sqrt{2T}\,dw_t$, where $w$ is standard (n-dimensional) Brownian motion and the boundaries are reflecting, will concentrate near the global minima of $U$, at least when the “temperature” $T$ is small: the equilibrium distribution for $x_t$ is Gibbs with density $\pi_T(x) \propto \exp\{-U(x)/T\}$. This suggests setting $T = T(t) \downarrow 0$ to find the global minima of $U$. We give conditions on $U(x)$ and $T(t)$ such that the solution to $dx_t = -\nabla U(x_t)\,dt + \sqrt{2T}\,dw_t$ converges weakly to a distribution concentrated on the global minima of $U$.

305 citations
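The annealed diffusion above can be illustrated with a crude Euler-Maruyama discretization; the potential, step size, temperature schedule, and the clipping used in place of true reflecting boundaries are all illustrative choices, not the paper's conditions.

```python
import numpy as np

def annealed_langevin(grad_U, x0, T_of_t, dt=1e-3, n_steps=100000, rng=None):
    """Euler-Maruyama sketch of  dx_t = -grad U(x_t) dt + sqrt(2 T(t)) dw_t
    on [0, 1]^n; clipping is only a crude stand-in for reflecting boundaries."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for n in range(n_steps):
        T = T_of_t(n * dt)                    # decreasing "temperature"
        x = x - grad_U(x) * dt + np.sqrt(2.0 * T * dt) * rng.standard_normal(x.shape)
        x = np.clip(x, 0.0, 1.0)
    return x

# Toy example: a double-well potential on [0, 1] with global minima near 0.25 and 0.75.
if __name__ == "__main__":
    grad_U = lambda x: 200.0 * (x - 0.25) * (x - 0.75) * (2.0 * x - 1.0)
    T_of_t = lambda t: 0.2 / np.log(np.e + t)  # slow logarithmic cooling
    print(annealed_langevin(grad_U, np.array([0.95]), T_of_t))
```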

Journal ArticleDOI
TL;DR: In this article, the averaging principle for stochastic differential equations is used to describe the behavior of the system over large time intervals, and the probability of large deviations from the averaged system is analyzed.
Abstract: Contents: Introduction; § 1. Null approximation and normal deviations; § 2. Large deviations from the averaged system; § 3. Large deviations. Continuation; § 4. Moderate deviations; § 5. The behaviour of the system over large time intervals; § 6. Examples. Remarks; § 7. The averaging principle for stochastic differential equations; § 8. Inequalities for the probabilities of large deviations; References.

94 citations