Book ChapterDOI

Generalization Error of Linear Neural Networks in Unidentifiable Cases

TL;DR: It is shown that the expectation of the generalization error in the unidentifiable cases is larger than what is given by the usual asymptotic theory, and dependent on the rank of the target function.
Abstract
The statistical asymptotic theory is often used in theoretical results in computational and statistical learning theory. It describes the limiting distribution of the maximum likelihood estimator (MLE) as a normal distribution. However, in layered models such as neural networks, the regularity condition of the asymptotic theory is not necessarily satisfied. The true parameter is not identifiable if the target function can be realized by a network smaller than the model. Little has been known about the behavior of the MLE in these cases for neural networks. In this paper, we analyze the expectation of the generalization error of three-layer linear neural networks, and elucidate a strange behavior in unidentifiable cases. We show that the expectation of the generalization error in the unidentifiable cases is larger than what is given by the usual asymptotic theory, and is dependent on the rank of the target function.
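The unidentifiability described above can be seen directly in a three-layer linear network, which computes y = B A x. A minimal NumPy sketch (an illustration of the general phenomenon, not code from the paper; all variable names are hypothetical): for any invertible matrix G acting on the hidden layer, the parameters (B G⁻¹, G A) realize exactly the same input-output function, so the parameter cannot be identified from data alone.

```python
import numpy as np

# A three-layer linear network computes y = B @ A @ x.
# For any invertible hidden-layer transform G, the pair
# (B @ inv(G), G @ A) realizes the same function as (B, A):
# the parameterization is unidentifiable.

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 3, 2

A = rng.standard_normal((d_hidden, d_in))   # input-to-hidden weights
B = rng.standard_normal((d_out, d_hidden))  # hidden-to-output weights

G = rng.standard_normal((d_hidden, d_hidden))  # invertible almost surely
A2 = G @ A
B2 = B @ np.linalg.inv(G)

x = rng.standard_normal(d_in)
y1 = B @ A @ x
y2 = B2 @ A2 @ x
print(np.allclose(y1, y2))  # True: distinct parameters, identical function
```

If, in addition, the target map has rank smaller than d_hidden, the set of parameters realizing it is an even larger singular set, which is the regime the paper's generalization-error analysis addresses.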


Citations
Journal ArticleDOI

Algebraic Analysis for Nonidentifiable Learning Machines

TL;DR: It is rigorously proved that the Bayesian stochastic complexity, or free energy, is asymptotically equal to λ1 log n − (m1 − 1) log log n + constant, where n is the number of training samples and λ1 and m1 are a rational number and a natural number determined as birational invariants of the singularities in the parameter space.
Proceedings Article

Statistical Performance of Convex Tensor Decomposition

TL;DR: It is shown that, under some conditions, the mean squared error of the convex method scales linearly with a quantity the authors call the normalized rank of the true tensor, which naturally extends the analysis of convex low-rank matrix estimation to tensors.
Journal ArticleDOI

Singularities Affect Dynamics of Learning in Neuromanifolds

TL;DR: An overview of the phenomena caused by the singularities of statistical manifolds related to multilayer perceptrons and gaussian mixtures is given and the natural gradient method is shown to perform well because it takes the singular geometrical structure into account.
BookDOI

Machine Learning: ECML 2000

TL;DR: This talk describes how information about the search process can be taken into account when evaluating hypotheses, and how the expected generalization error of a hypothesis is computed as a function of the search steps leading to it.
Journal ArticleDOI

Likelihood ratio of unidentifiable models and multilayer neural networks

TL;DR: The behavior of the maximum likelihood estimator (MLE) when the true parameter cannot be identified uniquely is discussed, and the likelihood ratio is proved to have a larger order when the true function is given by a smaller model.
References
Book

Perturbation theory for linear operators

Tosio Kato
TL;DR: The monograph by T. Kato is an excellent reference in the theory of linear operators in Banach and Hilbert spaces, thoroughly worthwhile both for graduate students in functional analysis and for researchers in perturbation, spectral, and scattering theory.
Journal ArticleDOI

The Strong Limits of Random Matrix Spectra for Sample Matrices of Independent Elements

TL;DR: In this paper, the authors prove almost-sure convergence of the empirical measure of the normalized singular values of increasing rectangular submatrices of an infinite random matrix of independent elements, where the matrix elements are required to have uniformly bounded (2 + ε)th central moments and the same means and variances within a row.
Journal ArticleDOI

Learning in linear neural networks: a survey

TL;DR: Most of the known results on linear networks, including backpropagation learning and the structure of the error function landscape, the temporal evolution of generalization, and unsupervised learning algorithms and their properties are surveyed.
Book

Multivariate reduced-rank regression

Neil H. Timm
Journal ArticleDOI

A regularity condition of the information matrix of a multilayer perceptron network

TL;DR: This paper proves rigorously that the Fisher information matrix of a three-layer perceptron network is positive definite if and only if the network is irreducible.