
Showing papers on "Generalization published in 1997"


Book
01 Jan 1997
TL;DR: In this book, the authors consider the effects of an external field (or weight) on the minimum energy problem and provide a unified approach to seemingly different problems in constructive analysis, such as the asymptotic analysis of orthogonal polynomials, the limiting behavior of weighted Fekete points, the existence and construction of fast decreasing polynomials, the numerical conformal mapping of simply and doubly connected domains, the generalization of the Weierstrass approximation theorem to varying weights, and the determination of convergence rates for best approximating rational functions.
Abstract: This treatment of potential theory emphasizes the effects of an external field (or weight) on the minimum energy problem. Several important aspects of the external field problem (and its extension to signed measures) justify its special attention. The most striking is that it provides a unified approach to seemingly different problems in constructive analysis. These include the asymptotic analysis of orthogonal polynomials; the limiting behavior of weighted Fekete points; the existence and construction of fast decreasing polynomials; the numerical conformal mapping of simply and doubly connected domains; the generalization of the Weierstrass approximation theorem to varying weights; and the determination of convergence rates for best approximating rational functions.
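For reference, the central object here is the weighted energy functional; the following is the standard formulation of the external field problem with weight w = e^{-Q}, stated from general knowledge of the subject rather than quoted from the book.

```latex
% Weighted (external field) energy of a probability measure \mu supported
% on a closed set \Sigma, with admissible weight w = e^{-Q}:
I_w(\mu) \;=\; \iint \log\frac{1}{|x-y|}\, d\mu(x)\, d\mu(y) \;+\; 2\int Q \, d\mu .
% The minimum energy problem asks for the measure \mu_w minimizing I_w.
```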

1,560 citations


Journal ArticleDOI
Sherif Hashem
TL;DR: This paper extends the idea of optimal linear combinations (OLCs) of neural networks and presents two algorithms for selecting the component networks for the combination to improve the generalization ability of OLCs, and demonstrates significant improvements in model accuracy.
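A minimal sketch of the optimal-linear-combination idea: given the outputs of several trained networks on held-out data, fit the combination weights by least squares. The function name and data below are illustrative, not Hashem's component-selection algorithms.

```python
import numpy as np

def optimal_linear_combination(preds, targets):
    """Least-squares weights for combining component-network outputs.

    preds:   (n_samples, n_networks) held-out predictions
    targets: (n_samples,) held-out targets
    Returns the weight vector alpha minimizing ||preds @ alpha - targets||^2.
    """
    alpha, *_ = np.linalg.lstsq(preds, targets, rcond=None)
    return alpha

# Hypothetical usage: combine three trained networks on held-out data.
rng = np.random.default_rng(0)
truth = rng.normal(size=200)
preds = np.column_stack([truth + 0.3 * rng.normal(size=200) for _ in range(3)])
alpha = optimal_linear_combination(preds, truth)
combined = preds @ alpha  # typically lower MSE than any single component
```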

513 citations


Journal ArticleDOI
TL;DR: This paper gives a characterization of learnability in the probabilistic concept model, solving an open problem posed by Kearns and Schapire, and shows that the accuracy parameter plays a crucial role in determining the effective complexity of the learner's hypothesis class.
Abstract: Learnability in Valiant's PAC learning model has been shown to be strongly related to the existence of uniform laws of large numbers. These laws define a distribution-free convergence property of means to expectations uniformly over classes of random variables. Classes of real-valued functions enjoying such a property are also known as uniform Glivenko-Cantelli classes. In this paper, we prove, through a generalization of Sauer's lemma that may be interesting in its own right, a new characterization of uniform Glivenko-Cantelli classes. Our characterization yields Dudley, Giné, and Zinn's previous characterization as a corollary. Furthermore, it is the first based on a simple combinatorial quantity generalizing the Vapnik-Chervonenkis dimension. We apply this result to obtain the weakest combinatorial condition known to imply PAC learnability in the statistical regression (or “agnostic”) framework. Furthermore, we find a characterization of learnability in the probabilistic concept model, solving an open problem posed by Kearns and Schapire. These results show that the accuracy parameter plays a crucial role in determining the effective complexity of the learner's hypothesis class.
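For orientation, the classical Sauer's lemma that the paper generalizes, stated in its standard form (the paper's combinatorial quantity extends the VC dimension to real-valued classes):

```latex
% Classical Sauer's lemma: if \mathcal{H} \subseteq \{0,1\}^X has
% VC dimension d, its growth function is polynomially bounded:
\Pi_{\mathcal{H}}(m) \;\le\; \sum_{i=0}^{d} \binom{m}{i} \;=\; O(m^{d}).
```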

398 citations


Journal ArticleDOI
TL;DR: A statistical theory for overtraining is proposed and it is shown that the asymptotic gain in the generalization error is small if the authors perform early stopping, even if they have access to the optimal stopping time.
Abstract: A statistical theory for overtraining is proposed. The analysis treats general realizable stochastic neural networks, trained with Kullback-Leibler divergence in the asymptotic case of a large number of training examples. It is shown that the asymptotic gain in the generalization error is small if we perform early stopping, even if we have access to the optimal stopping time. Based on cross-validated stopping, we consider the ratio in which the examples should be divided into training and cross-validation sets in order to obtain optimum performance. Although cross-validated early stopping is useless in the asymptotic region, it surely decreases the generalization error in the nonasymptotic region. Our large-scale simulations done on a CM5 are in good agreement with our analytical findings.
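A minimal sketch of cross-validated early stopping on a linear student trained by gradient descent, with r the training/validation split ratio that the paper analyzes; all hyperparameter values here are illustrative assumptions.

```python
import numpy as np

def early_stopping_fit(X, y, r=0.7, lr=0.01, patience=20, max_epochs=2000, seed=0):
    """Gradient-descent linear regression with cross-validated early stopping.

    r is the fraction of examples used for training; the rest form the
    validation set, used only to pick the stopping time.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr = int(r * len(X))
    tr, va = idx[:n_tr], idx[n_tr:]
    w = np.zeros(X.shape[1])
    best_w, best_loss, wait = w.copy(), np.inf, 0
    for _ in range(max_epochs):
        grad = X[tr].T @ (X[tr] @ w - y[tr]) / len(tr)
        w -= lr * grad
        val_loss = np.mean((X[va] @ w - y[va]) ** 2)
        if val_loss < best_loss:
            best_w, best_loss, wait = w.copy(), val_loss, 0
        else:
            wait += 1
            if wait >= patience:   # stop once validation loss stops improving
                break
    return best_w
```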

350 citations


Book
02 Jan 1997
TL;DR: In this book, the authors introduce integer flows and their basic properties, covering nowhere-zero 4-flows, 3-flows, and k-flows, along with faithful cycle covers, cycle double covers, and shortest cycle covers.
Abstract: Introduction to integer flows.- Basic properties of integer flows.- Nowhere-zero 4-flows.- Nowhere-zero 3-flows.- Nowhere-zero k-flows.- Faithful cycle covers.- Cycle double covers.- Shortest cycle covers.- Generalization and unification.- Compatible decompositions.- Appendices: fundamental theories, hints for exercises, terminology.

323 citations


Journal ArticleDOI
TL;DR: A complex-valued version of the back-propagation algorithm (called 'Complex-BP'), which can be applied to multi-layered neural networks whose weights, threshold values, input and output signals are all complex numbers, is presented.
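A hedged illustration of the core ingredient: the complex-valued gradient step for a single linear unit (the Wirtinger gradient of the squared error). Complex-BP chains such steps through multiple layers with complex activations; this sketch is not the paper's full multi-layer algorithm.

```python
import numpy as np

def complex_lms(X, t, lr=0.01, epochs=200):
    """Minimal complex-valued gradient descent for one linear 'neuron'.

    X: (n, d) complex inputs, t: (n,) complex targets. The update follows
    the Wirtinger gradient of sum |X w - t|^2 with respect to conj(w).
    """
    w = np.zeros(X.shape[1], dtype=complex)
    for _ in range(epochs):
        e = X @ w - t                        # complex residual
        w -= lr * (X.conj().T @ e) / len(t)  # gradient w.r.t. conj(w)
    return w
```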

309 citations


Journal ArticleDOI
TL;DR: In this paper, the authors generalize a classical result of T. Kato on the existence of global solutions to the Navier-Stokes system in C([0,∞); L^3(R^3)).
Abstract: We generalize a classical result of T. Kato on the existence of global solutions to the Navier-Stokes system in C([0,∞); L^3(R^3)). More precisely, we show that if the initial data are sufficiently oscillating, in a suitable Besov space, then Kato's solution exists globally. As a corollary to this result, we obtain a theory of existence of self-similar solutions for the Navier-Stokes equations.

283 citations


Journal ArticleDOI
TL;DR: In this paper, an explicit travelling solitary wave solution to a compound KdV-Burgers equation is obtained using an automated method, and a two-dimensional generalization is discussed.

278 citations


Book
01 Jan 1997
TL;DR: This book presents Vapnik-Chervonenkis and Pollard (pseudo-) dimensions, models of learning based on uniform convergence of empirical means, applications to neural networks and control systems, and some open problems.
Abstract: Contents: Preface.- Introduction.- Preliminaries.- Problem Formulations.- Vapnik-Chervonenkis and Pollard (Pseudo-) Dimensions.- Uniform Convergence of Empirical Means.- Learning Under a Fixed Probability Measure.- Distribution-Free Learning.- Learning Under an Intermediate Family of Probabilities.- Alternate Models of Learning.- Applications to Neural Networks.- Applications to Control Systems.- Some Open Problems.

255 citations


Journal ArticleDOI
Suresh Sethi, Feng Cheng
TL;DR: In this paper, a generalization of classical inventory models (with fixed ordering costs) that exhibit (s, S) policies is presented, in which the distribution of demands in successive periods depends on a Markov chain; the model is further extended to include realistic features such as no-ordering periods and storage and service level constraints.
Abstract: This paper is concerned with a generalization of classical inventory models (with fixed ordering costs) that exhibit (s, S) policies. In our model, the distribution of demands in successive periods is dependent on a Markov chain. The model includes the case of cyclic or seasonal demand. The model is further extended to incorporate some other realistic features such as no ordering periods and storage and service level constraints. Both finite and infinite horizon nonstationary problems are considered. We show that (s, S) policies are also optimal for the generalized model as well as its extensions.
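A small simulation sketch of an (s, S) policy under Markov-modulated demand. The cost parameters, demand distributions, and two-state chain below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def simulate_sS(s, S, demand_dists, P, periods=10000,
                K=10.0, c=1.0, h=1.0, b=5.0, seed=0):
    """Average cost of an (s, S) policy with Markov-modulated demand.

    demand_dists[i] samples one period's demand in chain state i;
    P is the state transition matrix. Costs: K fixed order, c per unit,
    h holding, b backlog (all illustrative).
    """
    rng = np.random.default_rng(seed)
    state, inv, cost = 0, S, 0.0
    for _ in range(periods):
        if inv <= s:                       # order up to S
            cost += K + c * (S - inv)
            inv = S
        inv -= demand_dists[state](rng)    # demand depends on chain state
        cost += h * max(inv, 0) + b * max(-inv, 0)
        state = rng.choice(len(P), p=P[state])
    return cost / periods

# Hypothetical two-state (low/high season) demand:
P = np.array([[0.9, 0.1], [0.2, 0.8]])
dists = [lambda r: r.poisson(3), lambda r: r.poisson(8)]
avg_cost = simulate_sS(s=5, S=20, demand_dists=dists, P=P)
```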

233 citations


Proceedings Article
08 Jul 1997
TL;DR: Three intuitive noise-tolerant algorithms that can be used to prune instances from the training set are presented, and the algorithm that achieves the highest reduction in storage also results in the highest generalization accuracy of the three methods.
Abstract: The nearest neighbor algorithm and its derivatives are often quite successful at learning a concept from a training set and providing good generalization on subsequent input vectors. However, these techniques often retain the entire training set in memory, resulting in large memory requirements and slow execution speed, as well as a sensitivity to noise. This paper provides a discussion of issues related to reducing the number of instances retained in memory while maintaining (and sometimes improving) generalization accuracy, and mentions algorithms other researchers have used to address this problem. It presents three intuitive noise-tolerant algorithms that can be used to prune instances from the training set. In experiments on 29 applications, the algorithm that achieves the highest reduction in storage also results in the highest generalization accuracy of the three methods.
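One simple noise-tolerant pruning rule in the spirit of the paper, sketched here as Wilson-style editing (drop instances misclassified by their k nearest neighbours); the paper's three algorithms are more elaborate, so treat this as an illustrative baseline.

```python
import numpy as np
from collections import Counter

def prune_noisy_instances(X, y, k=3):
    """Keep only instances whose k nearest neighbours agree with their label.

    A classic editing rule that removes noisy or borderline instances,
    shrinking storage for nearest-neighbour classification.
    """
    X, y = np.asarray(X, float), np.asarray(y)
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                        # exclude the instance itself
        nn = np.argsort(d)[:k]
        majority = Counter(y[nn]).most_common(1)[0][0]
        if majority == y[i]:
            keep.append(i)
    return X[keep], y[keep]
```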

Journal ArticleDOI
TL;DR: It is shown that, using a general definition of possibility measures and a generalization of Sugeno's fuzzy integral (the semi-normed fuzzy integral, or possibility integral), a unified and consistent account can be given of many of the possibilistic results extant in the literature.
Abstract: In this paper, I provide the basis for a measure- and integral-theoretic formulation of possibility theory. It is shown that, using a general definition of possibility measures, and a generalization of Sugeno's fuzzy integral (the semi-normed fuzzy integral, or possibility integral), a unified and consistent account can be given of many of the possibilistic results extant in the literature. The striking formal analogy between this treatment of possibility theory, using possibility integrals, and Kolmogorov's measure-theoretic formulation of probability theory, using Lebesgue integrals, is explored and exploited. I introduce and study possibilistic and fuzzy variables as possibilistic counterparts of stochastic and real stochastic variables respectively, and develop the notion of a possibility distribution for these variables. The almost everywhere equality and dominance of fuzzy variables is defined and studied. The proof is given for a Radon-Nikodym-like theorem in possibility theory. Following t...
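In standard notation (a sketch from the general literature, not necessarily the paper's exact definitions), a possibility measure preserves suprema, and the Sugeno integral that the semi-normed integral generalizes reads:

```latex
% Possibility measure: supremum-preserving set function,
\Pi\Big(\bigcup_{i} A_i\Big) \;=\; \sup_i \Pi(A_i);
% Sugeno's fuzzy integral of h over A w.r.t. \Pi (the paper's semi-normed
% variant replaces the minimum by a triangular seminorm):
\int_A h \;=\; \sup_{\alpha \in [0,1]} \min\big(\alpha,\; \Pi(A \cap \{h \ge \alpha\})\big).
```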

Proceedings Article
23 Aug 1997
TL;DR: This paper addresses two crucial issues which have been considered to be a `black art' in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input.
Abstract: Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a `black art' in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We demonstrate the effectiveness of stacked generalization for combining three different types of learning algorithms.
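A minimal stacking sketch consistent with one common resolution of the two questions: cross-validated class probabilities from the level-0 models serve as attributes for a simple level-1 generalizer. The specific learners below are illustrative assumptions, not necessarily the paper's choices.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

def stack_fit_predict(X_train, y_train, X_test):
    """Two-level stacking: level-0 class probabilities -> level-1 model."""
    level0 = [DecisionTreeClassifier(random_state=0),
              GaussianNB(),
              KNeighborsClassifier()]
    # Cross-validated probabilities avoid leaking training labels upward.
    meta_train = np.hstack([
        cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")
        for m in level0])
    meta_model = LogisticRegression(max_iter=1000).fit(meta_train, y_train)
    for m in level0:                        # refit level-0 on all data
        m.fit(X_train, y_train)
    meta_test = np.hstack([m.predict_proba(X_test) for m in level0])
    return meta_model.predict(meta_test)
```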


Journal ArticleDOI
TL;DR: Simulations focus on the task of discovering "algorithmically simple" neural networks with low Kolmogorov complexity and high generalization capability and it is demonstrated that the method can lead to generalization results unmatchable by previous neural network algorithms.

Journal ArticleDOI
TL;DR: In this paper, a generalization of non-commutative geometry and gauge theories based on ternary Z3-graded structures is proposed, where all products of two entities are left free, the only constraining relations being imposed on ternary products.
Abstract: We propose a generalization of non-commutative geometry and gauge theories based on ternary Z3-graded structures. In the new algebraic structures we define, all products of two entities are left free, the only constraining relations being imposed on ternary products. These relations reflect the action of the Z3-group, which may be either trivial, i.e., abc=bca=cab, generalizing the usual commutativity, or non-trivial, i.e., abc=j bca, with j = e^(2πi/3). The usual Z2-graded structures such as Grassmann, Lie, and Clifford algebras are generalized to the Z3-graded case. Some suggestions concerning the eventual use of these new structures in the physics of elementary particles and fields are presented.
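Written out, the ternary relations from the abstract, with j a primitive cube root of unity:

```latex
j = e^{2\pi i/3}, \qquad j^{3} = 1, \qquad 1 + j + j^{2} = 0,
% trivial and non-trivial Z_3 cyclicity of ternary products:
abc = bca = cab \quad\text{or}\quad abc = j\,bca = j^{2}\,cab .
```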

Journal ArticleDOI
TL;DR: The generalization performance achieved by a simple model ensemble of linear students is calculated exactly in the thermodynamic limit of a large number of input components and shows a surprisingly rich behavior.
Abstract: Within the context of learning a rule from examples, we study the general characteristics of learning with ensembles. The generalization performance achieved by a simple model ensemble of linear students is calculated exactly in the thermodynamic limit of a large number of input components and shows a surprisingly rich behavior. Our main findings are the following. For learning in large ensembles, it is advantageous to use underregularized students, which actually overfit the training data. Globally optimal generalization performance can be obtained by choosing the training set sizes of the students optimally. For smaller ensembles, optimization of the ensemble weights can yield significant improvements in ensemble generalization performance, in particular if the individual students are subject to noise in the training process. Choosing students with a wide range of regularization parameters makes this improvement robust against changes in the unknown level of corruption of the training data.
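A small numerical sketch of the setting: an ensemble of weakly regularized ("underregularized") ridge students, each trained on its own random subset of the examples, with predictions averaged. The values of lam and frac are illustrative, not taken from the paper's analysis.

```python
import numpy as np

def ensemble_of_ridge_students(X, y, X_test, n_students=10,
                               lam=1e-4, frac=0.6, seed=0):
    """Average the predictions of weakly regularized linear students,
    each fit on a random subset of the examples; averaging suppresses
    the variance that each overfitting student incurs alone."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    preds = []
    for _ in range(n_students):
        idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
        A = X[idx].T @ X[idx] + lam * np.eye(d)   # near-unregularized ridge
        w = np.linalg.solve(A, X[idx].T @ y[idx])
        preds.append(X_test @ w)
    return np.mean(preds, axis=0)
```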

Journal ArticleDOI
TL;DR: In this article, a generalized form of vectorial equilibria is proposed and, using an abstract monotonicity condition, an existence result is demonstrated.
Abstract: A generalized form of vectorial equilibria is proposed, and, using an abstract monotonicity condition, an existence result is demonstrated.

Journal ArticleDOI
TL;DR: In this article, the authors present a survey of recent results, scattered in a series of papers that have appeared during the past five years, whose common denominator has been the use of cubic relations in various algebraic structures.
Abstract: We present a survey of recent results, scattered in a series of papers that have appeared during the past five years, whose common denominator has been the use of cubic relations in various algebraic structures. Cubic (or ternary) relations can represent different symmetries with respect to the permutation group S3, or its cyclic subgroup Z3. Also ordinary or ternary algebras can be divided into different classes with respect to their symmetry properties. We pay special attention to the non-associative ternary algebra of 3-forms (or cubic matrices), and Z3-graded matrix algebras. We also discuss the Z3-graded generalization of Grassmann algebras and their realization in generalized exterior differential forms dξ and d^2ξ, with d^3ξ = 0. A new type of gauge theory based on this differential calculus is presented. Finally, a ternary generalization of Clifford algebras is introduced, and an analogue of Dirac's equation is discussed, which can be diagonalized only after taking the cube of the Z3-graded generalization of Dirac's operator. A possibility of using these ideas for the description of quark fields is suggested and discussed in the last section.

Journal ArticleDOI
TL;DR: The class of weight inequalities is introduced, which is needed to describe the knapsack polyhedron when the weights of the items lie in certain intervals, and the properties of lifted minimal cover inequalities are extended to this general class of inequalities.
Abstract: This paper deals with the 0/1 knapsack polytope. In particular, we introduce the class of weight inequalities. This class of inequalities is needed to describe the knapsack polyhedron when the weights of the items lie in certain intervals. A generalization of weight inequalities yields the so-called "weight-reduction principle" and the class of extended weight inequalities. The latter class of inequalities includes minimal cover and (l,k)-configuration inequalities. The properties of lifted minimal cover inequalities are extended to this general class of inequalities.
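For reference, the knapsack set and the minimal cover inequalities that the extended weight inequalities generalize (standard definitions, not quoted from the paper):

```latex
% 0/1 knapsack set and a minimal cover inequality:
K = \{\, x \in \{0,1\}^n : \textstyle\sum_{j=1}^{n} a_j x_j \le b \,\},
\qquad
\sum_{j \in C} x_j \;\le\; |C| - 1
\quad\text{for any cover } C \ \big(\textstyle\sum_{j \in C} a_j > b\big).
```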

BookDOI
01 Jan 1997
TL;DR: In this article, the authors propose a method for contextualizing actors' behaviour based on description and inductive modelling of a specific aspect of this context: the relational pattern or "structure" of the social setting in which action is observed, which requires collecting specific data on the relationships and exchanges between all the members, and analyzing these data using specific procedures.
Abstract: Part of sociologists' work is to contextualize individual and collective behaviour (Silverman and Gubrium 1994). Contextualization has both a substantive and a methodological dimension. Substantively, it means identifying specific constraints put on some members' behaviour and specific opportunities offered to them and to others. Methodologically, it is a necessary step for comparative analysis and for appropriate generalization of results. Network analysis is an efficient way of contextualizing actors' behaviour, based on description and inductive modelling of a specific aspect of this context: the relational pattern, or ‘structure’, of the social setting in which action is observed. It requires collecting specific data on the relationships and exchanges between all the members, and analysing these data using specific procedures. In fact, it can be seen as a systematic and formalized version of a kind of analysis that sociologists and ethnographers have always done intuitively: collecting information...

Journal ArticleDOI
TL;DR: A standard combinatorial problem is to estimate the number of coupons, drawn at random, needed to complete a collection of all possible m types; two computational paradigms are shown to be well suited for this type of problem.
Abstract: A standard combinatorial problem is to estimate the number (T) of coupons, drawn at random, needed to complete a collection of all possible m types. Generalizations of this problem have found many engineering applications. The usefulness of the model is hampered by the difficulties in obtaining numerical results for moments or distributions. We show two computational paradigms that are well suited for this type of problem: one, following Flajolet et al. [21], is the calculus of generating functions over regular languages. We use it to provide relatively efficient answers to several questions about the sampling process – we show it is possible to compute arbitrarily accurate approximations for quantities such as E[T], in a time which is linear in m for any type distribution, while an exponential time is required for exact calculation. It also leads to a proof of a long-standing folk-theorem, concerning the extremality of uniform reference probabilities. The second method is a generalization of the Poisson...
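The flavor of such computations can be checked against the classical Poissonization integral for E[T], valid for an arbitrary type distribution; the formula is standard, the code below merely illustrative.

```python
import numpy as np
from scipy.integrate import quad

def expected_draws(p):
    """E[T] for the coupon collector with type probabilities p, via the
    classical Poissonization integral
    E[T] = integral_0^inf (1 - prod_i (1 - exp(-p_i * t))) dt."""
    p = np.asarray(p, dtype=float)
    integrand = lambda t: 1.0 - np.prod(1.0 - np.exp(-p * t))
    val, _ = quad(integrand, 0, np.inf, limit=200)
    return val

# Uniform case sanity check: E[T] = m * H_m (harmonic number).
m = 10
print(expected_draws(np.full(m, 1.0 / m)))            # ~ 29.29
print(m * sum(1.0 / k for k in range(1, m + 1)))      # same value
```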

Journal ArticleDOI
TL;DR: This analysis demonstrates that the effects of generalization cannot be ignored when studying the evolution of symmetry preferences and symmetric signals.
Abstract: Biological displays are often symmetrical, and there is growing evidence that receivers are sensitive to these symmetries. One explanation for the evolution of such sensitivity is that symmetry reflects the quality of the signaller. An alternative is that the sensitivity may arise as a by-product of general properties of biological recognition systems. In line with the latter idea, simulations of the recognition process based on simple, artificial neural networks have suggested that generalization can give rise to preferences for particular symmetrical stimuli. However, it is not clear from these studies exactly how the preferences emerge, and to what extent the results are relevant to biological recognition systems. Here, we employ a different class of recognition models (gradient interaction models) to demonstrate more clearly how generalization can generate a preference for symmetrical variants of a display. We also point out that the predictions of the gradient interaction and network-based models regarding the effects of generalization closely match the results from empirical studies of stimulus control. Our analysis demonstrates that the effects of generalization cannot be ignored when studying the evolution of symmetry preferences and symmetric signals.

Journal ArticleDOI
Abstract: Base Class R

Journal ArticleDOI
TL;DR: In this article, a generalization of the Poincare sphere method is proposed to represent pure states of a three-level quantum system in a convenient geometrical manner, based on the properties of the group SU(3) and its generators in the defining representation.
Abstract: We describe a recently developed generalization of the Poincare sphere method, to represent pure states of a three-level quantum system in a convenient geometrical manner. The construction depends on the properties of the group SU(3) and its generators in the defining representation, and uses geometrical objects and operations in an eight-dimensional real Euclidean space. This construction is then used to develop a generalization of the well known Pancharatnam geometric phase formula, for evolution of a three-level system along a geodesic triangle in state space.
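In commonly used notation (a sketch of the ingredients under standard conventions, not the paper's exact formulas): a pure three-level state maps to an eight-vector via the Gell-Mann matrices, and the phase for a geodesic triangle is the Bargmann-invariant phase.

```latex
% Pure state |\psi\rangle of a three-level system, represented by an
% eight-vector built from the Gell-Mann matrices \lambda_a:
n_a = \langle\psi|\lambda_a|\psi\rangle, \qquad a = 1,\dots,8;
% geometric phase for a geodesic triangle with vertices
% |\psi_1\rangle, |\psi_2\rangle, |\psi_3\rangle (Bargmann invariant):
\Phi = -\arg\big(\langle\psi_1|\psi_2\rangle\,
       \langle\psi_2|\psi_3\rangle\,\langle\psi_3|\psi_1\rangle\big).
```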

Book ChapterDOI
Vladimir Vapnik
08 Oct 1997
TL;DR: The general idea of the Support Vector method is described and theorems demonstrating that the generalization ability of the SV method is based on factors which classical statistics do not take into account are presented.
Abstract: The Support Vector (SV) method is a new general method of function estimation which does not depend explicitly on the dimensionality of input space. It was applied for pattern recognition, regression estimation, and density estimation problems, as well as for problems of solving linear operator equations. In this article we describe the general idea of the SV method and present theorems demonstrating that the generalization ability of the SV method is based on factors which classical statistics do not take into account. We also describe the SV method for density estimation in a set of functions defined by a mixture of an infinite number of Gaussians.
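A hedged, minimal illustration of the SV method for pattern recognition using scikit-learn (an implementation choice for this listing, not part of the article): capacity is controlled through the margin, via C and the kernel, rather than through the input dimensionality.

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic 50-dimensional problem: the SVM's effective complexity is set
# by the margin, not by the 50 input dimensions.
X, y = make_classification(n_samples=400, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te), "support vectors per class:", clf.n_support_)
```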

Journal ArticleDOI
TL;DR: In this paper, a natural "elliptic" generalization of the classical polylogarithm is introduced, and the properties of these functions and their relationship with Eisenstein series are studied.
Abstract: In this article we introduce a natural ‘elliptic’ generalization of the classical polylogarithms, study the properties of these functions and their relations with Eisenstein series.
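For reference, the classical polylogarithm being generalized:

```latex
\mathrm{Li}_s(z) \;=\; \sum_{k=1}^{\infty} \frac{z^{k}}{k^{s}}, \qquad |z| < 1 .
```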

Journal ArticleDOI
TL;DR: It is shown that neural networks which use continuous activation functions have VC dimension at least as large as the square of the number of weights w, which settles a long-standing open question.

Journal ArticleDOI
Jacob Feldman
TL;DR: The Genericity Constraint dictates that among all the models on the lattice that apply, the observer should choose the one in which the observed object is generic, which is simply the lowest in the partial order.

Journal ArticleDOI
TL;DR: In this article, symbolic computation with the generalized tanh method is used to derive new soliton-like solutions for a (2+1)-dimensional generalization of the shallow water wave equations.
Abstract: We report that symbolic computation with the generalized tanh method leads to new soliton-like solutions for a (2+1)-dimensional generalization of the shallow water wave equations.
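For orientation, the standard tanh-method ansatz (the paper's generalized variant may differ in its details):

```latex
% Travelling-wave solutions sought as a polynomial in T = tanh(k(x + y - ct)):
u(x, y, t) \;=\; \sum_{i=0}^{N} a_i\, T^{\,i},
% where N is fixed by balancing the highest-order linear and nonlinear
% terms, and the a_i, k, c follow from equating coefficients of powers of T.
```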