
Showing papers on "Generalization published in 1992"


Proceedings ArticleDOI
01 Jul 1992
TL;DR: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented, applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions.
Abstract: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.

11,211 citations
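
A minimal sketch of the margin-maximization idea, using scikit-learn's SVC as a stand-in for the paper's solver; the library, dataset, and hyperparameters below are illustrative assumptions, not the authors' setup:

```python
# Minimal sketch of margin maximization with a linear kernel, using
# scikit-learn's SVC (not the paper's original solver).  The "supporting
# patterns" of the abstract correspond to SVC's support vectors.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.2, random_state=0)

clf = SVC(kernel="linear", C=1e6)   # large C approximates a hard margin
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)    # geometric margin between the two bounding hyperplanes

print("number of supporting patterns:", len(clf.support_))
print("margin width:", margin)
# The decision function is a linear combination of the supporting patterns:
# f(x) = sum_i alpha_i * y_i * <x_i, x> + b
```

With a nearly hard margin, the classifier is determined by the supporting patterns alone, which is the sparsity the abstract describes.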


Journal ArticleDOI
TL;DR: The conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate.

5,834 citations
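
A hedged sketch of the stacked-generalization recipe; the level-0 and level-1 learners and the dataset are arbitrary choices for illustration. Level-0 models are trained on folds, and their out-of-fold predictions become the inputs of a level-1 model:

```python
# Sketch of stacked generalization: level-0 learners are trained on folds,
# their out-of-fold predictions form the inputs of a level-1 learner.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)
level0 = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=4), KNeighborsRegressor(5)]

# Build the level-1 training set from out-of-fold predictions.
meta_X = np.zeros((len(y), len(level0)))
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    for j, model in enumerate(level0):
        model.fit(X[train_idx], y[train_idx])
        meta_X[test_idx, j] = model.predict(X[test_idx])

level1 = Ridge(alpha=1.0)            # the generalizer of the generalizers
# Rough indication only; a fully nested evaluation would rebuild meta_X inside each outer fold.
score = cross_val_score(level1, meta_X, y, cv=5).mean()
print("cross-validated R^2 of the stacked model:", round(score, 3))
```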


Journal ArticleDOI
TL;DR: A generalization of the numerical renormalization-group procedure used first by Wilson for the Kondo problem is presented and it is shown that this formulation is optimal in a certain sense.
Abstract: A generalization of the numerical renormalization-group procedure used first by Wilson for the Kondo problem is presented. It is shown that this formulation is optimal in a certain sense. As a demonstration of the effectiveness of this approach, results from numerical real-space renormalization-group calculations for Heisenberg chains are presented.

5,625 citations
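
The central step of the generalization, truncating a block basis to the dominant eigenvectors of its reduced density matrix, can be sketched numerically; the full sweeping algorithm and the Heisenberg-chain results of the paper are not reproduced here:

```python
# Core step of the density-matrix renormalization group: truncate a block's
# basis to the m dominant eigenvectors of its reduced density matrix.
# Sketch for an 8-site spin-1/2 Heisenberg chain split into two 4-site halves.
import numpy as np
from functools import reduce

sx = np.array([[0, 1], [1, 0]]) / 2.0
sy = np.array([[0, -1j], [1j, 0]]) / 2.0
sz = np.array([[1, 0], [0, -1]]) / 2.0
I2 = np.eye(2)

def site_op(op, i, L):
    """Embed a single-site operator at site i of an L-site chain."""
    ops = [I2] * L
    ops[i] = op
    return reduce(np.kron, ops)

L = 8
H = sum(site_op(s, i, L) @ site_op(s, i + 1, L)
        for i in range(L - 1) for s in (sx, sy, sz))

evals, evecs = np.linalg.eigh(H)
ground = evecs[:, 0]                                 # ground state of the full chain

psi = ground.reshape(2 ** (L // 2), 2 ** (L // 2))   # block x environment
rho = psi @ psi.conj().T                             # reduced density matrix of the block
w = np.linalg.eigvalsh(rho)[::-1]                    # eigenvalues, descending

m = 8                                                # number of block states kept
print("truncation error with m = 8:", 1.0 - w[:m].sum())
```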



Journal ArticleDOI
TL;DR: A generalization of Allen's interval-based approach to temporal reasoning is presented; the notion of 'conceptual neighborhood' of qualitative relations between events is central to the approach, which uses semi-intervals rather than intervals as the basic units of knowledge.

701 citations


Journal ArticleDOI
TL;DR: It is shown that for smooth networks, i.e., those with continuously varying weights and smooth transfer functions, the generalization curve asymptotically obeys an inverse power law, while for nonsmooth networks other behaviors can appear, depending on the nature of the nonlinearities as well as the realizability of the rule.
Abstract: Learning from examples in feedforward neural networks is studied within a statistical-mechanical framework. Training is assumed to be stochastic, leading to a Gibbs distribution of networks characterized by a temperature parameter T. Learning of realizable rules as well as of unrealizable rules is considered. In the latter case, the target rule cannot be perfectly realized by a network of the given architecture. Two useful approximate theories of learning from examples are studied: the high-temperature limit and the annealed approximation. Exact treatment of the quenched disorder generated by the random sampling of the examples leads to the use of the replica theory. Of primary interest is the generalization curve, namely, the average generalization error $\epsilon_g$ versus the number of examples P used for training. The theory implies that, for a reduction in $\epsilon_g$ that remains finite in the large-N limit, P should generally scale as $\alpha N$, where N is the number of independently adjustable weights in the network. We show that for smooth networks, i.e., those with continuously varying weights and smooth transfer functions, the generalization curve asymptotically obeys an inverse power law. In contrast, for nonsmooth networks other behaviors can appear, depending on the nature of the nonlinearities as well as the realizability of the rule. In particular, a discontinuous learning transition from a state of poor to a state of perfect generalization can occur in nonsmooth networks learning realizable rules. We illustrate both gradual and continuous learning with a detailed analytical and numerical study of several single-layer perceptron models. Comparing with the exact replica theory of perceptron learning, we find that for realizable rules the high-temperature and annealed theories provide very good approximations to the generalization performance. Assuming this to hold for multilayer networks as well, we propose a classification of possible asymptotic forms of learning curves in general realizable models. For unrealizable rules we find that the above approximations fail in general to predict correctly the shapes of the generalization curves. Another indication of the important role of quenched disorder for unrealizable rules is that the generalization error is not necessarily a monotonically increasing function of temperature. Also, unrealizable rules can possess genuine spin-glass phases indicative of degenerate minima separated by high barriers.

461 citations
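
A numerical companion to the notion of a generalization curve, assuming the simplest setting: a Hebbian student imitating a teacher perceptron. This is not the Gibbs ensemble or replica calculation of the paper, just an illustrative estimate of epsilon_g versus alpha = P/N:

```python
# Numerical estimate of a generalization curve epsilon_g(alpha) for a
# Hebbian student imitating a teacher perceptron (a simple realizable rule).
# For perceptrons the generalization error is arccos(R)/pi, with R the
# normalized overlap between student and teacher weight vectors.
import numpy as np

rng = np.random.default_rng(1)
N = 200                                     # number of adjustable weights

def hebb_learning_curve(alphas, trials=20):
    curve = []
    for a in alphas:
        P = int(a * N)                      # number of training examples
        errs = []
        for _ in range(trials):
            teacher = rng.normal(size=N)
            X = rng.normal(size=(P, N))
            y = np.sign(X @ teacher)
            student = (y[:, None] * X).sum(axis=0)          # Hebb rule
            R = student @ teacher / (np.linalg.norm(student) * np.linalg.norm(teacher))
            errs.append(np.arccos(R) / np.pi)
        curve.append(np.mean(errs))
    return curve

alphas = [0.5, 1, 2, 4, 8, 16]
for a, e in zip(alphas, hebb_learning_curve(alphas)):
    print(f"alpha = {a:>4}:  epsilon_g ~ {e:.3f}")
```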


Journal ArticleDOI
TL;DR: It is proved that AC-5, in conjunction with node consistency, provides a decision procedure for these constraints running in time $O(ed)$ and has an important application in constraint logic programming over finite domains.

450 citations
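
AC-5 itself is a generic algorithm parameterized by the constraint class. As a hedged, simpler illustration of arc-consistency filtering, here is AC-3 over finite domains; the specializations that give the $O(ed)$ bound for functional, anti-functional, and monotonic constraints are not reproduced:

```python
# Hedged illustration: AC-3 arc-consistency filtering over finite domains
# (a simpler relative of AC-5, without its constraint-class specializations).
from collections import deque

def ac3(domains, constraints):
    """domains: {var: set of values}; constraints: {(x, y): predicate(vx, vy)}."""
    # Treat each binary constraint as a directed arc in both directions.
    arcs = {(x, y): c for (x, y), c in constraints.items()}
    arcs.update({(y, x): (lambda vy, vx, c=c: c(vx, vy)) for (x, y), c in constraints.items()})
    queue = deque(arcs)
    while queue:
        x, y = queue.popleft()
        # Remove every value of x that has no support in the domain of y.
        removed = {vx for vx in domains[x]
                   if not any(arcs[(x, y)](vx, vy) for vy in domains[y])}
        if removed:
            domains[x] -= removed
            queue.extend((z, x) for (z, w) in arcs if w == x and z != y)
    return domains

# Example: x < y < z over 1..4 leaves only the arc-consistent values.
doms = {v: set(range(1, 5)) for v in "xyz"}
cons = {("x", "y"): lambda a, b: a < b, ("y", "z"): lambda a, b: a < b}
print(ac3(doms, cons))   # {'x': {1, 2}, 'y': {2, 3}, 'z': {3, 4}}
```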


Journal ArticleDOI
TL;DR: In this article, the assumptions of conventional effective-mass theory, especially that of continuity of the envelope function at an abrupt interface, are reviewed critically, making the need for a fresh approach apparent.
Abstract: The assumptions of conventional effective-mass theory, especially the one of continuity of the envelope function at an abrupt interface, are reviewed critically so that the need for a fresh approach becomes apparent. A new envelope-function method, developed by the author over the past few years, is reviewed. This new method is based on both a generalization and a novel application to microstructures of the Luttinger-Kohn envelope-function expansion. The differences between this new method and the conventional envelope-function method are emphasized. An alternative derivation of the new envelope-function equations, which are exact, to that already published is provided. A new and improved derivation of the author's effective-mass equation is given, in which the differences in the zone-centre eigenstates of the constituent crystals are taken into account. The cause of the kinks in the conventional effective-mass envelope function, at abrupt effective-mass changes, is identified.

375 citations


Book
01 Jan 1992

311 citations


Book ChapterDOI
01 Jun 1992
TL;DR: The scheme is viewed as a generalization of the inference methods of classical time-series analysis in the sense that it allows description of non-linear, multivariate dynamic systems with complex conditional independence structures.
Abstract: A computational scheme for reasoning about dynamic systems using (causal) probabilistic networks is presented. The scheme is based on the framework of Lauritzen and Spiegelhalter (1988), and may be viewed as a generalization of the inference methods of classical time-series analysis in the sense that it allows description of non-linear, multivariate dynamic systems with complex conditional independence structures. Further, the scheme provides a method for efficient backward smoothing and possibilities for efficient, approximate forecasting methods. The scheme has been implemented on top of the HUGIN shell.

217 citations
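
A much-simplified illustration, assuming a single discrete state variable per time slice (an HMM). The paper's junction-tree machinery for multivariate slices is not reproduced, but the forward-filtering/backward-smoothing pattern is the same:

```python
# Hedged, much-simplified illustration: forward filtering and backward
# smoothing for a single discrete hidden chain (one state variable per time
# slice).  General multivariate slices would require junction-tree propagation.
import numpy as np

A = np.array([[0.9, 0.1],      # state transition matrix P(x_t | x_{t-1})
              [0.2, 0.8]])
B = np.array([[0.8, 0.2],      # observation matrix P(y_t | x_t)
              [0.3, 0.7]])
prior = np.array([0.5, 0.5])
obs = [0, 0, 1, 1, 0]          # observed symbols y_1..y_T

# Forward filtering: alpha_t(x) = P(x_t = x | y_1..y_t)
alphas = []
a = prior * B[:, obs[0]]
alphas.append(a / a.sum())
for y in obs[1:]:
    a = (alphas[-1] @ A) * B[:, y]
    alphas.append(a / a.sum())

# Backward smoothing: gamma_t(x) = P(x_t = x | y_1..y_T)
gammas = [alphas[-1]]
for t in range(len(obs) - 2, -1, -1):
    joint = alphas[t][:, None] * A              # shape (x_t, x_{t+1})
    cond = joint / joint.sum(axis=0)            # P(x_t | x_{t+1}, y_1..y_t)
    gammas.insert(0, cond @ gammas[0])
print(np.round(np.array(gammas), 3))
```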


Proceedings Article
30 Nov 1992
TL;DR: It is shown that even high-order polynomial classifiers in high dimensional spaces can be trained with a small amount of training data and yet generalize better than classifiers with a smaller VC-dimension.
Abstract: Large VC-dimension classifiers can learn difficult tasks, but are usually impractical because they generalize well only if they are trained with huge quantities of data. In this paper we show that even high-order polynomial classifiers in high dimensional spaces can be trained with a small amount of training data and yet generalize better than classifiers with a smaller VC-dimension. This is achieved with a maximum margin algorithm (the Generalized Portrait). The technique is applicable to a wide variety of classifiers, including Perceptrons, polynomial classifiers (sigma-pi unit networks) and Radial Basis Functions. The effective number of parameters is adjusted automatically by the training algorithm to match the complexity of the problem. It is shown to equal the number of training patterns that are closest to the decision boundary (supporting patterns). Bounds on the generalization error and the speed of convergence of the algorithm are given. Experimental results on handwritten digit recognition demonstrate good generalization compared to other algorithms.
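
A hedged stand-in for the experiment: a high-degree polynomial classifier trained by a maximum-margin solver (scikit-learn's SVC here, on the small built-in digits set rather than the original handwritten-digit data):

```python
# Hedged stand-in for the paper's experiment: a high-order polynomial
# classifier trained with a maximum-margin solver (scikit-learn's SVC),
# evaluated on the small built-in digits set rather than the original data.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X / 16.0, y, train_size=500, random_state=0)    # deliberately small training set

clf = SVC(kernel="poly", degree=4, C=10.0)           # high-capacity polynomial classifier
clf.fit(X_train, y_train)

print("training patterns :", len(X_train))
print("supporting patterns:", clf.support_vectors_.shape[0])
print("test accuracy      :", round(clf.score(X_test, y_test), 3))
```

Despite the enormous VC-dimension of a degree-4 polynomial in this input space, the margin-maximizing solution depends only on the supporting patterns, which is the point of the abstract.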

Journal ArticleDOI
TL;DR: Two schemes for simplifying quantifier alternation, called Skolemization and raising, are presented and various optimizations on the general unification search problem are discussed.

Journal ArticleDOI
TL;DR: This paper summarizes the results of a retrospective review of generalization in the context of social skills research with preschool children and reveals some differences concerning the practices employed by studies within each group.
Abstract: This paper summarizes the results of a retrospective review of generalization in the context of social skills research with preschool children. A review of studies from 22 journals (1976 to 1990) that assessed generalization as part of social interaction research provided information concerning the prevalence of studies that have assessed generalization, common practices concerning the production and assessment of generalization, and the overall success of obtaining generalization and maintenance of social behaviors. A comparison of the most and least successful studies, with respect to generalization, revealed some differences concerning the practices employed by studies within each group. Differences differentially related to the production of generalization are discussed and recommendations are provided to guide and support future research efforts.

Journal ArticleDOI
TL;DR: A new set of algorithms for locally adaptive line generalization, based on the so-called natural principle of objective generalization, is described and compared with benchmarks based on both manual cartographic procedures and a standard method found in many geographical information systems.
Abstract: This article describes a new set of algorithms for locally adaptive line generalization based on the so-called natural principle of objective generalization. The drawbacks of existing methods of line generalization are briefly discussed and the algorithms described. The performance of these new methods is compared with benchmarks based on both manual cartographic procedures and a standard method found in many geographical information systems.
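
The article's own locally adaptive algorithms are not spelled out in the abstract; the kind of "standard method found in many geographical information systems" that it benchmarks against is typically a Douglas-Peucker routine, sketched here as an assumption:

```python
# Hedged sketch of the Douglas-Peucker routine, the kind of standard GIS
# line-generalization method used as a benchmark in the article (the article's
# own locally adaptive algorithms are not shown).
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    if (ax, ay) == (bx, by):
        return math.hypot(px - ax, py - ay)
    num = abs((by - ay) * px - (bx - ax) * py + bx * ay - by * ax)
    return num / math.hypot(bx - ax, by - ay)

def douglas_peucker(points, tolerance):
    """Recursively keep the point farthest from the chord if it exceeds tolerance."""
    if len(points) < 3:
        return list(points)
    dists = [perpendicular_distance(p, points[0], points[-1]) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i - 1] <= tolerance:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: i + 1], tolerance)
    right = douglas_peucker(points[i:], tolerance)
    return left[:-1] + right          # avoid duplicating the split point

line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(douglas_peucker(line, tolerance=0.5))
```

Because the tolerance is fixed for the whole line, this benchmark is not locally adaptive, which is exactly the drawback the article's methods address.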

01 Jan 1992
TL;DR: This paper studies the generalization performance of backpropagation learning on a syllabification task, in the context of connectionism and natural language processing.
Abstract: Published as: Daelemans, W. M. P., & Bosch, A. P. J. (1992). Generalization performance of backpropagation learning on a syllabification task. In M. F. J. Drossaers & A. Nijholt (Eds.), Connectionism and natural language processing: Proceedings of the Third Twente Workshop on Language Technology, TWLT3, Enschede, May 12-13, 1992 (organized by Project Parlevink) (Vol. 3, pp. 27-38). (Memoranda Informatica; Vol. 3, No. 92-64). University of Twente, Department of Computer Science.

Journal ArticleDOI
TL;DR: An algorithm is presented for finding, in a multicriteria network, Pareto-optimal paths, one for each efficient objective vector; it generalizes an earlier algorithm for the bicriterion case.
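
A hedged sketch of the general idea, not the paper's specific algorithm: a label-correcting search that keeps, at every node, the set of cost vectors not dominated by any other label:

```python
# Hedged sketch of multicriteria path search: a label-correcting loop that
# keeps, at each node, every cost vector not dominated by another label
# (not the specific algorithm of the paper).  Two objectives are assumed here.
from collections import deque

def dominates(a, b):
    """a dominates b if a is componentwise <= b and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def pareto_paths(graph, source, target):
    """graph: {u: [(v, (c1, c2)), ...]}.  Returns nondominated cost vectors at target."""
    labels = {u: set() for u in graph}           # nondominated cost vectors per node
    labels[source].add((0, 0))
    queue = deque([(source, (0, 0))])
    while queue:
        u, cost = queue.popleft()
        if cost not in labels[u]:                # label was pruned after being queued
            continue
        for v, edge_cost in graph.get(u, []):
            new = tuple(c + e for c, e in zip(cost, edge_cost))
            if any(dominates(old, new) or old == new for old in labels[v]):
                continue
            labels[v] = {old for old in labels[v] if not dominates(new, old)}
            labels[v].add(new)
            queue.append((v, new))
    return sorted(labels[target])

# Two objectives, e.g. length and cost: both s->a->t and s->b->t are Pareto-optimal.
graph = {"s": [("a", (1, 4)), ("b", (3, 1))],
         "a": [("t", (1, 4))],
         "b": [("t", (3, 1))],
         "t": []}
print(pareto_paths(graph, "s", "t"))    # [(2, 8), (6, 2)]
```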

Journal ArticleDOI
TL;DR: In this paper, a vector-valued variational principle is presented, using a general concept of ∊-efficiency and a nonconvex separation theorem.
Abstract: This paper presents a vector-valued variational principle by using a general concept of ∊-efficiency and a nonconvex separation theorem.

Journal ArticleDOI
TL;DR: It is shown that a weight decay of the same size as the variance of the noise on the teacher improves the generalization and suppresses the overfitting, and that weight noise and output noise act similarly above the transition at alpha = 1.
Abstract: The authors study the evolution of the generalization ability of a simple linear perceptron with N inputs which learns to imitate a 'teacher perceptron'. The system is trained on p = alpha N example inputs drawn from some distribution and the generalization ability is measured by the average agreement with the teacher on test examples drawn from the same distribution. The dynamics may be solved analytically and exhibits a phase transition from imperfect to perfect generalization at alpha = 1, when there are no errors (static noise) in the training examples. If the examples are produced by an erroneous teacher, overfitting is observed, i.e. the generalization error starts to increase after a finite time of training. It is shown that a weight decay of the same size as the variance of the noise (errors) on the teacher improves the generalization and suppresses the overfitting. The generalization error as a function of time is calculated numerically for various values of the parameters. Finally dynamic noise in the training is considered. White noise on the input corresponds on average to a weight decay, and can thus improve generalization, whereas white noise on the weights or the output degrades generalization. Generalization is particularly sensitive to noise on the weights (for alpha < 1), where it makes the error constantly increase with time, but this effect is also shown to be damped by a weight decay. Weight noise and output noise act similarly above the transition at alpha = 1.
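
A small numerical companion, with normalizations that differ from the paper, so only the qualitative picture carries over: without weight decay the test error of a linear student passes through a minimum and then rises (overfitting), and a suitable weight decay suppresses this:

```python
# Hedged numerical companion: a linear student learning a noisy linear
# "teacher" with alpha = 2.  Without weight decay the test error passes
# through a minimum and then rises (overfitting); a nonzero weight decay
# (ridge term) suppresses this.  Normalizations differ from the paper.
import numpy as np

rng = np.random.default_rng(0)
N, P, sigma = 100, 200, 0.5                      # weights, examples, noise level

teacher = rng.normal(size=N)
X = rng.normal(size=(P, N)) / np.sqrt(N)
y = X @ teacher + sigma * rng.normal(size=P)     # erroneous teacher outputs
X_test = rng.normal(size=(5000, N)) / np.sqrt(N)
y_test = X_test @ teacher                        # noise-free targets for testing

def test_error(w):
    return np.mean((X_test @ w - y_test) ** 2)

# Gradient descent without weight decay: the error rises again after its minimum.
w, lr, errs = np.zeros(N), 0.1, []
for _ in range(5000):
    w -= lr * (X.T @ (X @ w - y))
    errs.append(test_error(w))
print(f"no decay : min error {min(errs):.4f}, error after long training {errs[-1]:.4f}")

# Ridge (weight-decay) solutions: some nonzero decay beats no decay.
for lam in (0.0, 0.1, 0.25, 1.0):
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ y)
    print(f"decay {lam:4.2f}: error {test_error(w_ridge):.4f}")
```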

Journal ArticleDOI
TL;DR: In this article, a generalization of the standard normalized quadratic form has been proposed, which can provide a local second-order approximation while maintaining the correct curvature globally.
Abstract: In this paper, the authors propose and estimate a system of producer output supply and input demand functions that generalizes the standard normalized quadratic form. The generalization adds either linear or quadratic splines in a time (or technical change) variable, yet retains the main attractive property of the normalized quadratic, which is that it can provide a local second order approximation while maintaining the correct curvature globally. However, the generalization has additional desirable approximation properties with respect to the splined variable and, thus, permits a more flexible treatment of technical change than is provided by standard flexible functional forms. Copyright 1992 by Economics Department of the University of Pennsylvania and the Osaka University Institute of Social and Economic Research Association.

Journal ArticleDOI
TL;DR: In this article, a new learning algorithm for the one-layer perceptron is presented, which aims to maximize the generalization gain per example by maximizing the expected stability of the example in the teacher perceptron.
Abstract: A new learning algorithm for the one-layer perceptron is presented. It aims to maximize the generalization gain per example. Analytical results are obtained for the case of single presentation of each example. The weight attached to a Hebbian term is a function of the expected stability of the example in the teacher perceptron. This leads to upper bounds for the generalization ability. This scheme can be iterated and the results of numerical simulations show that it converges, within errors, to the theoretical optimal generalization ability of the Bayes algorithm. Analytical and numerical results for an algorithm with maximized generalization in the learning strategy with selection of examples are obtained and it is proved that, as expected, orthogonal selection is optimal. Exponential decay of the generalization error is obtained for the single presentation of selected examples.

Book ChapterDOI
01 Jan 1992
TL;DR: An improved version of a self-organizing network model, proposed at ICANN-91 and since applied to various problems, is described; the improvements are the generalization of the model to arbitrary dimension and the introduction of a local estimate of the probability density.
Abstract: In this paper an improved version of a self-organizing network model is described which was proposed at ICANN-91 [3] and has since been applied to various problems [1,2,5]. The improvements presented here are the generalization of the model to arbitrary dimension and the introduction of a local estimate of the probability density. The latter leads to a very clear distinction between necessary and superfluous neurons with respect to modeling a given probability distribution. This makes it possible to automatically generate network structures that are nearly optimally suited for the distribution at hand.
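
A simplified growing-network sketch in the spirit of the model; the simplex topology of arbitrary dimension and the local probability-density estimate described in the abstract are omitted, and all parameters are illustrative:

```python
# Simplified sketch of a growing self-organizing network: units adapt toward
# the inputs, accumulate a local signal counter, and a new unit is inserted
# periodically next to the busiest one.  The paper's model additionally keeps
# a k-dimensional simplex topology and a local density estimate, omitted here.
import numpy as np

rng = np.random.default_rng(0)

def ring_data(n):
    """Toy input distribution: points on a noisy ring."""
    theta = rng.uniform(0, 2 * np.pi, n)
    return np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(n, 2))

units = rng.normal(size=(2, 2))          # start with two units in R^2
counters = np.zeros(2)
eps_winner, eps_others, insert_every = 0.1, 0.002, 200

for t, x in enumerate(ring_data(5000), start=1):
    d = np.linalg.norm(units - x, axis=1)
    winner = int(np.argmin(d))
    counters[winner] += 1.0
    units[winner] += eps_winner * (x - units[winner])        # move winner toward input
    units += eps_others * (x - units)                        # weak global adaptation
    counters *= 0.995                                        # forget old signals
    if t % insert_every == 0:                                # grow the network
        busiest = int(np.argmax(counters))
        nearest = int(np.argsort(np.linalg.norm(units - units[busiest], axis=1))[1])
        units = np.vstack([units, 0.5 * (units[busiest] + units[nearest])])
        counters = np.append(counters, counters[busiest] / 2.0)
        counters[busiest] /= 2.0

print("final number of units:", len(units))
```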

Journal ArticleDOI
TL;DR: In this article, the authors question the independence assumption by theoretically integrating situational variables into the validity generalization estimation process, which is based on the assumption that the effects of statistical artifacts on validities are independent of the effects on situational moderators.
Abstract: A primary objective of validity generalization analysis is to decompose the between-situation variance in validities into (a) variance attributable to between-situation differences in statistical artifacts and (b) variance attributable to between-situation differences in (unidentified) situational moderators. This process is based on the assumption that the effects of statistical artifacts on validities are independent of the effects of situational moderators on validities. The present article seeks to question the independence assumption by theoretically integrating situational variables into the validity generalization estimation process.
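
A bare-bones sketch of the variance decomposition the article builds on, with hypothetical numbers: the observed between-study variance of validities minus the variance expected from sampling error alone. The article's point, that situational moderators should be integrated into this step rather than assumed independent of the artifacts, is not modeled:

```python
# Bare-bones sketch of the variance decomposition underlying validity
# generalization: observed variance of validities minus the variance expected
# from sampling error alone.  Numbers are hypothetical.
import numpy as np

r = np.array([0.10, 0.45, 0.15, 0.50, 0.28, 0.05])   # observed validities per study
n = np.array([  68,  120,   85,   45,  150,   95])   # sample size per study

r_bar = np.average(r, weights=n)                      # sample-size-weighted mean validity
observed_var = np.average((r - r_bar) ** 2, weights=n)

# Expected sampling-error variance of a correlation (classic approximation).
sampling_var = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)

residual_var = max(observed_var - sampling_var, 0.0)  # attributed to (situational) moderators
print(f"mean validity          : {r_bar:.3f}")
print(f"observed variance      : {observed_var:.4f}")
print(f"sampling-error variance: {sampling_var:.4f}")
print(f"residual variance      : {residual_var:.4f}")
print(f"percent variance due to sampling error: {100 * sampling_var / observed_var:.1f}%")
```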

Book ChapterDOI
01 Jun 1992
TL;DR: The fundamental updating process in the transferable belief model is related to the concept of specialization and can be described by a specialization matrix, and it is shown that Dempster's rule of conditioning corresponds essentially to the least committed specialization.
Abstract: The fundamental updating process in the transferable belief model is related to the concept of specialization and can be described by a specialization matrix. The degree of belief in the truth of a proposition is a degree of justified support. The Principle of Minimal Commitment implies that one should never give more support to the truth of a proposition than justified. We show that Dempster's rule of conditioning corresponds essentially to the least committed specialization, and that Dempster's rule of combination results essentially from commutativity requirements. The concept of generalization, dual to the concept of specialization, is described.
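
A hedged sketch of the unnormalized rule of conditioning on mass functions: each focal set transfers its mass to its intersection with the evidence, and mass that lands on the empty set is kept rather than renormalized away, as in the transferable belief model. The specialization-matrix formulation of the paper is not reproduced:

```python
# Hedged sketch: unnormalized Dempster conditioning on mass functions
# (the transferable belief model keeps the mass transferred to the empty set
# instead of renormalizing).  Focal sets are modelled as frozensets.

def condition(m, b):
    """Transfer the mass of each focal set C to C & b (unnormalized conditioning)."""
    out = {}
    for c, mass in m.items():
        a = c & b
        out[a] = out.get(a, 0.0) + mass
    return out

frame = frozenset({"x", "y", "z"})
m = {frozenset({"x"}): 0.3,
     frozenset({"x", "y"}): 0.4,
     frame: 0.3}                       # mass on the whole frame = unassigned belief

b = frozenset({"y", "z"})              # evidence: the truth lies in {y, z}
m_cond = condition(m, b)
for focal, mass in sorted(m_cond.items(), key=lambda kv: -kv[1]):
    print(set(focal) or "{} (conflict)", mass)
```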

Book ChapterDOI
TL;DR: In this article, the minimality and realization theory for discrete time-varying finite dimensional linear systems with time-varying state spaces is developed; the results appear as a natural generalization of the corresponding theory for the time-independent case.
Abstract: The minimality and realization theory is developed for discrete time-varying finite dimensional linear systems with time-varying state spaces. The results appear as a natural generalization of the corresponding theory for the time-independent case. Special attention is paid to periodical systems. The case when the state space dimensions do not change in time is re-examined.

Journal ArticleDOI
01 Jan 1992
TL;DR: The project described in this article had two primary objectives: to design a strategy for terrain generalization that is adaptive to different terrain types, scales, and map purposes, and to implement and evaluate some components of this approach to assess its potential.
Abstract: The project described in this article had two primary objectives: to design a strategy for terrain generalization that is adaptive to different terrain types, scales, and map purposes, and to implement and evaluate some components of this approach to assess its potential. The strategy includes three different generalization methods: a global filtering procedure, a selective (iterative) filtering method, and a heuristic approach based on the generalization of the terrain's structure lines. For a given generalization problem that is constrained by the terrain character, map objective, scale, graphic limits, and data quality, the appropriate technique is selected through structure and process recognition procedures. Some of the key components of the strategy have been implemented and some experiments were conducted. Other parts were covered by proposing models that could serve as implementation guidelines. Our work was intended to break ground for future research. Recommendations for appropriate parameter se...

Journal ArticleDOI
TL;DR: It is found that, in some cases, the average generalization of neural networks trained on a variety of simple functions is significantly better than the VC bound: the approach to perfect performance is exponential in the number of examples m, rather than the 1/m result of the bound.
Abstract: We describe a series of numerical experiments that measure the average generalization capability of neural networks trained on a variety of simple functions. These experiments are designed to test the relationship between average generalization performance and the worst-case bounds obtained from formal learning theory using the Vapnik-Chervonenkis (VC) dimension (Blumer et al. 1989; Haussler et al. 1990). Recent statistical learning theories (Tishby et al. 1989; Schwartz et al. 1990) suggest that surpassing these bounds might be possible if the spectrum of possible generalizations has a “gap” near perfect performance. We indeed find that, in some cases, the average generalization is significantly better than the VC bound: the approach to perfect performance is exponential in the number of examples m, rather than the 1/m result of the bound. However, in these cases, we have not found evidence of the gap predicted by the above statistical theories. In other cases, we do find the 1/m behavior of the VC bound...

Journal ArticleDOI
TL;DR: In this paper, the notion of p-capacity for a reversible Markov operator on a general measure space was defined and it was shown that uniform estimates for the ratio of capacity and measure are equivalent to certain imbedding theorems for the Orlicz and Dirichlet norms.
Abstract: We define the notion of p-capacity for a reversible Markov operator on a general measure space and prove that uniform estimates for the ratio of capacity and measure are equivalent to certain imbedding theorems for the Orlicz and Dirichlet norms. As a corollary we get results on connections between embedding theorems and isoperimetric properties for general Markov operators and, particularly, a generalization of the Kesten theorem on the spectral radius of random walks on amenable groups for the case of arbitrary graphs with non-finitely supported transition probabilities.


Book ChapterDOI
Henry Kautz, Bart Selman
12 Jul 1992
TL;DR: It is proved that unless NP ⊆ non-uniform P, not all theories have small Horn least-upper-bound approximations.
Abstract: Knowledge compilation speeds inference by creating tractable approximations of a knowledge base, but this advantage is lost if the approximations are too large. We show how learning concept generalizations can allow for a more compact representation of the tractable theory. We also give a general induction rule for generating such concept generalizations. Finally, we prove that unless NP ⊆ non-uniform P, not all theories have small Horn least-upper-bound approximations.