Book Chapter
Generalization Error of Linear Neural Networks in Unidentifiable Cases
Kenji Fukumizu
pp. 51–62
TLDR
It is shown that the expectation of the generalization error in the unidentifiable cases is larger than what is given by the usual asymptotic theory, and depends on the rank of the target function.
Abstract
The statistical asymptotic theory is often used in theoretical results in computational and statistical learning theory. It describes the limiting distribution of the maximum likelihood estimator (MLE) as a normal distribution. However, in layered models such as neural networks, the regularity condition of the asymptotic theory is not necessarily satisfied. The true parameter is not identifiable if the target function can be realized by a network of smaller size than the model. Little has been known about the behavior of the MLE in these cases for neural networks. In this paper, we analyze the expectation of the generalization error of three-layer linear neural networks and elucidate a strange behavior in unidentifiable cases. We show that the expectation of the generalization error in the unidentifiable cases is larger than what is given by the usual asymptotic theory, and depends on the rank of the target function.
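As a minimal illustration of the setting in the abstract (my own sketch, not the paper's experiment): a three-layer linear network computes a rank-constrained linear map f(x) = BAx, and with Gaussian noise its MLE can be obtained by least squares followed by SVD truncation (reduced-rank regression). When the true map has rank smaller than the number of hidden units, the model is unidentifiable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions: input L, hidden H, output M.  A three-layer linear network
# computes f(x) = B A x, i.e. a linear map of rank at most H.
L, H, M = 10, 5, 10
n_train, n_test = 200, 10_000
sigma = 0.1  # observation noise standard deviation

def fit_rank_h(X, Y, h):
    """MLE of a rank-h linear map Y ~ X C: unconstrained least squares
    followed by SVD truncation to rank h (reduced-rank regression with
    isotropic Gaussian noise)."""
    C_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U, s, Vt = np.linalg.svd(C_ls, full_matrices=False)
    s[h:] = 0.0
    return U @ np.diag(s) @ Vt

def gen_error(true_rank):
    """Excess mean squared prediction error of the rank-H MLE when the
    target map has the given rank (realizable by the model)."""
    C0 = (rng.standard_normal((L, true_rank)) @
          rng.standard_normal((true_rank, M)))
    X = rng.standard_normal((n_train, L))
    Y = X @ C0 + sigma * rng.standard_normal((n_train, M))
    C_hat = fit_rank_h(X, Y, H)
    Xt = rng.standard_normal((n_test, L))
    return np.mean((Xt @ C_hat - Xt @ C0) ** 2)

# true_rank < H makes the true parameter unidentifiable; the paper shows
# that the expected generalization error then differs from the value the
# regular asymptotic theory would predict.
errs = {r: np.mean([gen_error(r) for _ in range(20)]) for r in (2, 5)}
```

The constants (dimensions, noise level, number of trials) are arbitrary choices for the sketch; the paper's analysis is of the expectation itself, not of any particular simulation.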
Citations
Journal Article
Algebraic Analysis for Nonidentifiable Learning Machines
TL;DR: It is rigorously proved that the Bayesian stochastic complexity, or the free energy, is asymptotically equal to λ1 log n − (m1 − 1) log log n + constant, where n is the number of training samples and λ1 and m1 are the rational number and the natural number which are determined as the birational invariant values of the singularities in the parameter space.
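Written out, the asymptotic expansion of the stochastic complexity summarized above is (in Watanabe's standard notation; the regular-case values quoted afterward are a standard fact, not stated in this summary):

```latex
F(n) = \lambda_1 \log n - (m_1 - 1) \log\log n + O(1)
```

Here λ1 is a rational number and m1 a natural number, both birational invariants of the singularities of the parameter set. For a regular model with d parameters, λ1 = d/2 and m1 = 1, recovering the usual (d/2) log n term of BIC; singular models typically have λ1 < d/2, i.e. a smaller effective complexity.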
Proceedings Article
Statistical Performance of Convex Tensor Decomposition
TL;DR: It is shown that, under some conditions, the mean squared error of the convex method scales linearly with a quantity the authors call the normalized rank of the true tensor, which naturally extends the analysis of convex low-rank matrix estimation to tensors.
Journal Article
Singularities Affect Dynamics of Learning in Neuromanifolds
TL;DR: An overview is given of the phenomena caused by the singularities of statistical manifolds related to multilayer perceptrons and Gaussian mixtures, and the natural gradient method is shown to perform well because it takes the singular geometrical structure into account.
Book
Machine Learning: ECML 2000
TL;DR: This talk describes how information about the search process can be taken into account when evaluating hypotheses, and how the expected generalization error of a hypothesis is computed as a function of the search steps leading to it.
Journal Article
Likelihood ratio of unidentifiable models and multilayer neural networks
TL;DR: The behavior of the maximum likelihood estimator (MLE) in the case that the true parameter cannot be identified uniquely is discussed, and the likelihood ratio is proved to have a larger order if the true function is given by a smaller model.
References
Book
Perturbation theory for linear operators
TL;DR: The monograph by T. Kato is an excellent reference work on the theory of linear operators in Banach and Hilbert spaces, and is thoroughly worthwhile both for graduate students in functional analysis and for researchers in perturbation, spectral, and scattering theory.
Journal Article
The Strong Limits of Random Matrix Spectra for Sample Matrices of Independent Elements
TL;DR: In this paper, the authors prove almost-sure convergence of the empirical measure of the normalized singular values of increasing rectangular submatrices of an infinite random matrix of independent elements, where the matrix elements are required to have uniformly bounded central (2 + ε)-th moments, and the same means and variances within a row.
Journal Article
Learning in linear neural networks: a survey
Pierre Baldi, Kurt Hornik, et al.
TL;DR: Most of the known results on linear networks, including backpropagation learning and the structure of the error function landscape, the temporal evolution of generalization, and unsupervised learning algorithms and their properties are surveyed.
Book
Multivariate reduced-rank regression
Journal Article
A regularity condition of the information matrix of a multilayer perceptron network
TL;DR: This paper proves rigorously that the Fisher information matrix of a three-layer perceptron network is positive definite if and only if the network is irreducible.
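A small numerical sketch of this irreducibility criterion (my own illustration; the one-input, one-output tanh network and all parameter values below are assumptions, not taken from the paper): duplicating a hidden unit makes the network reducible, and the empirical Fisher information matrix then becomes singular, while distinct hidden units with nonzero output weights give a positive definite matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

def fisher(u, b, v, xs):
    """Empirical Fisher information of f(x) = sum_j v_j * tanh(u_j x + b_j)
    under unit-variance Gaussian noise: (1/n) J^T J, where J is the Jacobian
    of f with respect to the parameters (u, b, v)."""
    X = np.outer(xs, u) + b          # (n, H) pre-activations
    T = np.tanh(X)
    S = 1.0 - T ** 2                 # derivative of tanh
    J = np.hstack([S * v * xs[:, None],  # d f / d u_j
                   S * v,                # d f / d b_j
                   T])                   # d f / d v_j
    return J.T @ J / len(xs)

xs = rng.standard_normal(500)

# Irreducible network: distinct hidden units, nonzero output weights.
F_irr = fisher(np.array([1.5, -0.6]), np.array([0.5, -1.0]),
               np.array([0.8, 1.1]), xs)

# Reducible network: two identical hidden units -> duplicated Jacobian
# columns, hence a singular Fisher matrix.
F_red = fisher(np.array([1.0, 1.0]), np.array([0.3, 0.3]),
               np.array([0.8, 1.1]), xs)

eig_irr = np.linalg.eigvalsh(F_irr).min()
eig_red = np.linalg.eigvalsh(F_red).min()
```

In the reducible case the columns of J corresponding to the two duplicated units coincide (up to the output-weight factor), so the smallest eigenvalue is zero up to rounding; in the irreducible case it stays bounded away from zero, matching the "positive definite iff irreducible" statement.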