
Showing papers on "Generalization" published in 2000


Proceedings Article
01 Jan 2000
TL;DR: An on-line recursive algorithm for training support vector machines, one vector at a time, is presented and interpretation of decremental unlearning in feature space sheds light on the relationship between generalization and geometry of the data.
Abstract: An on-line recursive algorithm for training support vector machines, one vector at a time, is presented. Adiabatic increments retain the Kuhn-Tucker conditions on all previously seen training data, in a number of steps each computed analytically. The incremental procedure is reversible, and decremental "unlearning" offers an efficient method to exactly evaluate leave-one-out generalization performance. Interpretation of decremental unlearning in feature space sheds light on the relationship between generalization and geometry of the data.
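For orientation, the Kuhn-Tucker conditions that the adiabatic increments preserve are the standard ones of the soft-margin SVM dual. The notation below is assumed for illustration rather than quoted from the paper.

```latex
% Margin function of training point i in the soft-margin SVM dual:
%   g_i = y_i f(x_i) - 1, with f(x) = \sum_j \alpha_j y_j K(x_j, x) + b
\[
  g_i \;=\; \sum_{j} Q_{ij}\,\alpha_j + y_i b - 1,
  \qquad Q_{ij} = y_i y_j K(x_i, x_j),
\]
% Kuhn-Tucker conditions maintained for every previously seen point:
\[
  g_i > 0 \;\Rightarrow\; \alpha_i = 0, \qquad
  g_i = 0 \;\Rightarrow\; 0 \le \alpha_i \le C, \qquad
  g_i < 0 \;\Rightarrow\; \alpha_i = C .
\]
```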

1,319 citations


Journal ArticleDOI
TL;DR: The notion of iISS generalizes the concept of finite gain when using an integral norm on inputs but supremum norms of states, in that sense generalizing the linear "H²" theory.
Abstract: The notion of input-to-state stability (ISS) is now recognized as a central concept in nonlinear systems analysis. It provides a nonlinear generalization of finite gains with respect to supremum norms and also of finite L² gains. It plays a central role in recursive design, coprime factorizations, controllers for nonminimum phase systems, and many other areas. In this paper, a newer notion, that of integral input-to-state stability (iISS), is studied. The notion of iISS generalizes the concept of finite gain when using an integral norm on inputs but supremum norms of states, in that sense generalizing the linear "H²" theory. It allows one to quantify sensitivity even in the presence of certain forms of nonlinear resonance. We obtain several necessary and sufficient characterizations of the iISS property, expressed in terms of dissipation inequalities and other alternative and nontrivial characterizations.
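The two notions contrasted here can be written as trajectory estimates for a system x' = f(x, u). The comparison-function notation (β of class KL, γ of class K, α of class K-infinity) is standard and is assumed here rather than quoted from the paper.

```latex
% Input-to-state stability (ISS): gain measured in the supremum norm of the input
\[
  |x(t)| \;\le\; \beta\big(|x(0)|, t\big) + \gamma\big(\|u\|_{\infty}\big)
\]
% Integral ISS (iISS): an integral norm of the input, a supremum-type bound on the state
\[
  \alpha\big(|x(t)|\big) \;\le\; \beta\big(|x(0)|, t\big) + \int_{0}^{t} \gamma\big(|u(s)|\big)\,ds
\]
```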

639 citations


Proceedings ArticleDOI
29 Jun 2000
TL;DR: Without any computation-intensive resampling, the new estimators developed here are computationally much more efficient than cross-validation or bootstrapping and address the special performance measures needed for evaluating text classifiers.

408 citations


Journal ArticleDOI
TL;DR: EP has better recognition performance than PCA (eigenfaces) and better generalization abilities than the Fisher linear discriminant (Fisherfaces).
Abstract: Introduces evolutionary pursuit (EP) as an adaptive representation method for image encoding and classification. In analogy to projection pursuit, EP seeks to learn an optimal basis for the dual purpose of data compression and pattern classification. It should increase the generalization ability of the learning machine as a result of seeking the trade-off between minimizing the empirical risk encountered during training and narrowing the confidence interval for reducing the guaranteed risk during testing. It therefore implements strategies characteristic of GA for searching the space of possible solutions to determine the optimal basis. It projects the original data into a lower dimensional whitened principal component analysis (PCA) space. Directed random rotations of the basis vectors in this space are searched by GA where evolution is driven by a fitness function defined by performance accuracy (empirical risk) and class separation (confidence interval). Accuracy indicates the extent to which learning has been successful, while separation gives an indication of expected fitness. The method has been tested on face recognition using a greedy search algorithm. To assess both accuracy and generalization capability, the data includes for each subject images acquired at different times or under different illumination conditions. EP has better recognition performance than PCA (eigenfaces) and better generalization abilities than the Fisher linear discriminant (Fisherfaces).

343 citations


Journal ArticleDOI
TL;DR: The notion of invariance has been used to resolve a number of dilemmas that arise in standard treatments of explanatory generalizations in the special sciences as mentioned in this paper, such as whether or not a generalization can be used to explain.
Abstract: This paper describes an alternative to the common view that explanation in the special sciences involves subsumption under laws. According to this alternative, whether or not a generalization can be used to explain has to do with whether it is invariant rather than with whether it is lawful. A generalization is invariant if it is stable or robust in the sense that it would continue to hold under a relevant class of changes. Unlike lawfulness, invariance comes in degrees and has other features that are well suited to capture the characteristics of explanatory generalizations in the special sciences. For example, a generalization can be invariant even if it has exceptions or holds only over a limited spatio-temporal interval. The notion of invariance can be used to resolve a number of dilemmas that arise in standard treatments of explanatory generalizations in the special sciences.

263 citations


Journal ArticleDOI
TL;DR: In this paper, a generalization of the Bernstein polynomials is proposed, in which the approximated function is evaluated at points spaced in geometric progression instead of the equal spacing of the original polynomial.
Abstract: This paper is concerned with a generalization of the Bernstein polynomials in which the approximated function is evaluated at points spaced in geometric progression instead of the equal spacing of the original polynomials.
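A standard construction of this kind, shown here only for orientation and not quoted from the paper, evaluates the function at ratios of q-integers, whose increments form a geometric progression.

```latex
% q-integers: [r] = 1 + q + q^2 + ... + q^{r-1}, so the nodes [r]/[n] are
% spaced in geometric progression; q = 1 recovers the classical case.
\[
  B_n(f;x) \;=\; \sum_{r=0}^{n} f\!\left(\frac{[r]}{[n]}\right)
  \begin{bmatrix} n \\ r \end{bmatrix} x^{r} \prod_{s=0}^{n-r-1}\left(1 - q^{s} x\right),
  \qquad
  \begin{bmatrix} n \\ r \end{bmatrix} = \frac{[n]!}{[r]!\,[n-r]!} .
\]
```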

258 citations


Journal ArticleDOI
TL;DR: The purpose of this article is to formalize the generalization criterion method for model comparison, which has the potential to provide powerful comparisons of complex and nonnested models that may also differ in terms of numbers of parameters.

246 citations


Proceedings Article
29 Jun 2000
TL;DR: This work reviews the ALERGIA algorithm and explains why its generalization criterion, a state merging operation, is purely local, and presents an alternative approach, the MDI algorithm, in which the solution is a probabilistic automaton that trades off minimal divergence from the training sample and minimal size.
Abstract: Probabilistic DFA inference is the problem of inducing a stochastic regular grammar from a positive sample of an unknown language. The ALERGIA algorithm is one of the most successful approaches to this problem. In the present work we review this algorithm and explain why its generalization criterion, a state merging operation, is purely local. This characteristic leads to the conclusion that there is no explicit way to bound the divergence between the distribution defined by the solution and the training set distribution (that is, to control globally the generalization from the training sample). In this paper we present an alternative approach, the MDI algorithm, in which the solution is a probabilistic automaton that trades off minimal divergence from the training sample and minimal size. An efficient computation of the Kullback-Leibler divergence between two probabilistic DFAs is described, from which the new learning criterion is derived. Empirical results in the domain of language model construction for a travel information task show that the MDI algorithm significantly outperforms ALERGIA.
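The global trade-off the MDI criterion makes can be pictured with a small sketch. The PDFA encoding, helper names, and the exact form of the acceptance test below are simplifying assumptions for illustration, not the authors' implementation.

```python
# A merge is kept only when the loss in sample log-likelihood (the increase in
# empirical divergence) per state removed stays below a threshold alpha.
# Assumed encoding of a deterministic PDFA:
#   transitions[state][symbol] = (next_state, probability)
#   final_prob[state]          = end-of-string probability

import math

def log_likelihood(sample, transitions, final_prob, start_state=0):
    """Total log-probability the PDFA assigns to a list of strings."""
    total = 0.0
    for word in sample:
        state, logp = start_state, 0.0
        for symbol in word:
            state, prob = transitions[state][symbol]
            logp += math.log(prob)
        total += logp + math.log(final_prob[state])
    return total

def mdi_accepts_merge(sample, pdfa_before, pdfa_after, alpha):
    """pdfa_* are (transitions, final_prob, num_states) triples."""
    ll_before = log_likelihood(sample, pdfa_before[0], pdfa_before[1])
    ll_after = log_likelihood(sample, pdfa_after[0], pdfa_after[1])
    delta_divergence = (ll_before - ll_after) / max(len(sample), 1)
    delta_size = pdfa_before[2] - pdfa_after[2]   # states saved by the merge
    return delta_size > 0 and delta_divergence / delta_size < alpha
```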

169 citations


01 Jan 2000
TL;DR: The paper presents solutions for generalization problems using least squares adjustment theory, a well known general framework to determine unknown parameters based on given observations, and demonstrates the validity of this approach to the simplification of building ground plans and the displacement of arbitrary cartographic objects.
Abstract: The paper presents solutions for generalization problems using least squares adjustment theory. This concept allows for the introduction of several observations in terms of constraints and for a holistic solution of all these - possibly contrary and competing - constraints. Two examples are used to demonstrate the validity of this approach: the simplification of building ground plans and the displacement of arbitrary cartographic objects. Each approach is verified with several examples; furthermore, the integration of these different approaches is presented, in terms of the fusion of cadastral and topographic data. Least Squares Adjustment theory (LSA) is a well known general framework to determine unknown parameters based on given observations. This optimization technique is well founded in mathematics, operations research, and in geodesy. This general concept allows for the integration of different constraints in order to solve an overall, complex problem. This paper proposes to use adjustment theory for generalization. One problem is the set-up of the constraints for the various generalization tasks. The generalization of building ground plans is formulated in terms of a model-based approach, the problem being the determination of the model. In this case it is derived by the application of some rules. The second generalization operation treated with LSA is displacement: different objects have to be displayed on a map - for reasons of legibility certain constraints have to be satisfied, e.g. minimal object sizes and minimal object distances have to be enforced. LSA offers a straightforward framework to introduce different kinds of these constraints. In one step, all these constraints are solved simultaneously, resulting in one optimized solution with the feature that all residuals are distributed evenly among all the observations. Besides this result, quality parameters indicate how well the initial constraints have been satisfied. The paper is organized as follows: after a review of related work, the simplification of building ground plans using a model-based approach is presented, together with some examples showing the possibilities and the deficiencies. Then the approach for displacement based on least squares adjustment is shown, giving both the theoretical background, and explanatory examples. The integration of the two approaches is demonstrated with the example of the fusion of cadastral information with topographic information. Finally, a summary concludes the paper.
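A minimal toy sketch of the displacement idea (not the paper's implementation): positional constraints and minimum-distance constraints enter as weighted observation equations and are solved together by iterated linear least squares. The weights, names, and Gauss-Newton style linearization below are assumptions of this sketch.

```python
import numpy as np

def displace(points, d_min=1.0, w_pos=1.0, w_dist=10.0, iterations=10):
    """Move 2D points as little as possible while pushing close pairs toward d_min apart."""
    x = np.asarray(points, dtype=float)          # current coordinates, shape (n, 2)
    orig = x.copy()
    n = len(x)
    for _ in range(iterations):
        rows, resid = [], []
        # Observation equations: each coordinate should stay near its original value.
        for i in range(n):
            for k in range(2):
                row = np.zeros(2 * n)
                row[2 * i + k] = w_pos
                rows.append(row)
                resid.append(w_pos * (orig[i, k] - x[i, k]))
        # Observation equations: pairs closer than d_min should be d_min apart (linearized).
        for i in range(n):
            for j in range(i + 1, n):
                diff = x[i] - x[j]
                d = np.linalg.norm(diff)
                if 1e-9 < d < d_min:
                    g = diff / d                 # gradient of the distance w.r.t. x_i
                    row = np.zeros(2 * n)
                    row[2 * i:2 * i + 2] = w_dist * g
                    row[2 * j:2 * j + 2] = -w_dist * g
                    rows.append(row)
                    resid.append(w_dist * (d_min - d))
        dx, *_ = np.linalg.lstsq(np.array(rows), np.array(resid), rcond=None)
        x += dx.reshape(n, 2)
    return x

# Example: the first two points are only 0.3 apart and get pushed apart,
# while the isolated third point barely moves.
print(displace([[0.0, 0.0], [0.3, 0.0], [5.0, 5.0]]))
```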

154 citations


Journal ArticleDOI
TL;DR: In this article, a canonical form is found for pure states of a general multipartite system, in which the constraints on the coordinates (with respect to a factorizable orthonormal basis) are simply that certain ones vanish and certain others are real.
Abstract: We find a canonical form for pure states of a general multipartite system, in which the constraints on the coordinates (with respect to a factorizable orthonormal basis) are simply that certain ones vanish and certain others are real. For identical particles they are invariant under permutations of the particles. As an application, we find the dimension of the generic local equivalence class.

147 citations


Proceedings ArticleDOI
01 May 2000
TL;DR: It is shown that an immediate generalization of the Abelian case solution to the non-Abelian case does not efficiently solve Graph Isomorphism.
Abstract: The Hidden Subgroup Problem is the foundation of many quantum algorithms. An efficient solution is known for the problem over Abelian groups and this was used in Simon's algorithm and Shor's Factoring and Discrete Log algorithms. The non-Abelian case is open; an efficient solution would give rise to an efficient quantum algorithm for Graph Isomorphism. We fully analyze a natural generalization of the Abelian case solution to the non-Abelian case, and give an efficient solution to the problem for normal subgroups. We show, however, that this immediate generalization of the Abelian algorithm does not efficiently solve Graph Isomorphism.


Proceedings Article
30 Jul 2000
TL;DR: This work presents translations of several well-known reasoning tasks from the area of nonmonotonic reasoning into QBFs, and compares their implementation in the prototype system QUIP with established NMR provers.
Abstract: We consider the compilation of different reasoning tasks into the evaluation problem of quantified boolean formulas (QBFs) as an approach to develop prototype reasoning systems useful for, e.g., experimental purposes. Such a method is a natural generalization of a similar technique applied to NP-problems and has been recently proposed by other researchers. More specifically, we present translations of several well-known reasoning tasks from the area of nonmonotonic reasoning into QBFs, and compare their implementation in the prototype system QUIP with established NMR provers. The results show reasonable performance, and document that the QBF approach is an attractive tool for rapid prototyping of experimental knowledge-representation systems.

Proceedings Article
01 Jan 2000
TL;DR: This work presents a novel way of obtaining PAC-style bounds on the generalization error of learning algorithms, explicitly using their stability properties, and demonstrates that regularization networks possess the required stability property.
Abstract: We present a novel way of obtaining PAC-style bounds on the generalization error of learning algorithms, explicitly using their stability properties. A stable learner is one for which the learned solution does not change much with small changes in the training set. The bounds we obtain do not depend on any measure of the complexity of the hypothesis space (e.g. VC dimension) but rather depend on how the learning algorithm searches this space, and can thus be applied even when the VC dimension is infinite. We demonstrate that regularization networks possess the required stability property and apply our method to obtain new bounds on their generalization performance.
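The central definition and the shape of the resulting bound can be stated compactly. The constants below follow the usual uniform-stability formulation and are schematic rather than quoted from the paper.

```latex
% A learning algorithm A is uniformly beta-stable if removing any single
% training example changes its loss at any test point z by at most beta:
\[
  \big| \ell\big(A_S, z\big) - \ell\big(A_{S^{\setminus i}}, z\big) \big| \;\le\; \beta
  \qquad \text{for all } S,\ i,\ z .
\]
% For losses bounded by M and m training examples, stability yields a bound
% of the form (with probability at least 1 - \delta over the sample):
\[
  R(A_S) \;\le\; \widehat{R}_{\mathrm{emp}}(A_S) \;+\; 2\beta
  \;+\; \big(4m\beta + M\big)\sqrt{\frac{\ln(1/\delta)}{2m}} .
\]
```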

Journal ArticleDOI
TL;DR: An algorithmic procedure is developed for the random expansion of a given training set to combat overfitting and improve the generalization ability of backpropagation trained multilayer perceptrons (MLPs).
Abstract: An algorithmic procedure is developed for the random expansion of a given training set to combat overfitting and improve the generalization ability of backpropagation trained multilayer perceptrons (MLPs). The training set is K-means clustered and locally most entropic colored Gaussian joint input-output probability density function estimates are formed per cluster. The number of clusters is chosen such that the resulting overall colored Gaussian mixture exhibits minimum differential entropy upon global cross-validated shaping. Numerical studies on real data and synthetic data examples drawn from the literature illustrate and support these theoretical developments.
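A rough sketch of the expansion step under simplifying assumptions: scikit-learn's KMeans is used, and the entropy-based choice of the number of clusters and the cross-validated shaping step are omitted. Names and defaults are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def expand_training_set(X, y, n_clusters=5, n_new=1000, reg=1e-6, seed=0):
    """Sample extra (input, target) pairs from per-cluster full-covariance Gaussians."""
    rng = np.random.default_rng(seed)
    joint = np.hstack([X, y.reshape(len(y), -1)])          # joint input-output vectors
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(joint)
    weights = np.bincount(labels, minlength=n_clusters) / len(joint)
    counts = rng.multinomial(n_new, weights)                # how many samples per cluster
    new_rows = []
    for k, count in enumerate(counts):
        cluster = joint[labels == k]
        mean = cluster.mean(axis=0)
        cov = np.cov(cluster, rowvar=False) + reg * np.eye(joint.shape[1])
        new_rows.append(rng.multivariate_normal(mean, cov, size=count))
    synthetic = np.vstack(new_rows)
    d = X.shape[1]
    return synthetic[:, :d], synthetic[:, d:]               # synthetic inputs and targets
```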

Proceedings Article
14 Aug 2000
TL;DR: It is shown that from any logic used in place of sets of attributes, one can derive a contextualized logic that takes the formal context into account and is isomorphic to the concept lattice.
Abstract: We propose a generalization of Formal Concept Analysis (FCA) in which sets of attributes are replaced by expressions of an almost arbitrary logic. We prove that all FCA can be reconstructed on this basis. We show that from any logic that is used in place of sets of attributes can be derived a contextualized logic that takes into account the formal context and that is isomorphic to the concept lattice. We then justify the generalization of FCA compared with existing extensions and in the perspective of its application to information systems.

Book
01 Jan 2000
TL;DR: In this article, the authors establish a complete characterization of tight frames, and particularly of orthonormal wavelets, for an arbitrary dilation factor a>1, that are generated by a family of finitely many functions in L2 := L2(R).
Abstract: The objective of this paper is to establish a complete characterization of tight frames, and particularly of orthonormal wavelets, for an arbitrary dilation factor a>1, that are generated by a family of finitely many functions in L2 := L2(R). This is a generalization of the fundamental work of G. Weiss and his colleagues who considered only integer dilations. As an application, we give an example of tight frames generated by one single L2 function for an arbitrary dilation a>1 that possess "good" time-frequency localization. As another application, we also show that there does not exist an orthonormal wavelet with good time-frequency localization when the dilation factor a>1 is irrational such that a^j remains irrational for any positive integer j. This answers a question in Daubechies' Ten Lectures book for almost all irrational dilation factors. Other applications include a generalization of the notion of s-elementary wavelets of Dai and Larson to s-elementary wavelet families with arbitrary dilation factors a>1. Generalization to dual frames is also discussed in this paper.
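For reference, the tight-frame property being characterized, written for a dilation factor a > 1 and finitely many generators (the notation is assumed, not quoted from the paper):

```latex
\[
  \psi^{\ell}_{j,k}(x) \;=\; a^{j/2}\, \psi^{\ell}\!\big(a^{j}x - k\big),
  \qquad j,k \in \mathbb{Z},\quad \ell = 1,\dots,L,
\]
\[
  \sum_{\ell=1}^{L} \sum_{j,k \in \mathbb{Z}} \big|\langle f, \psi^{\ell}_{j,k}\rangle\big|^{2}
  \;=\; \|f\|_{L^{2}}^{2}
  \qquad \text{for all } f \in L^{2}(\mathbb{R}),
\]
% orthonormal wavelets are the special case in which the family is also orthonormal.
```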

Proceedings ArticleDOI
26 Mar 2000
TL;DR: It is argued that, even with large statistics, the dimensionality of the PCA subspace necessary for adequate representation of the identity information in relatively tightly cropped faces is in the 400-700 range, and it is shown that a dimensionality in the range of 200 is inadequate.
Abstract: A low-dimensional representation of sensory signals is the key to solving many of the computational problems encountered in high-level vision. Principal component analysis (PCA) has been used in the past to derive such compact representations for the object class of human faces. Here, with an interpretation of PCA as a probabilistic model, we employ two objective criteria to study its generalization properties in the context of large frontal-pose face databases. We find that the eigenfaces, the eigenspectrum, and the generalization depend strongly on the ensemble composition and size, with statistics for populations as large as 5500, still not stationary. Further, the assumption of mirror symmetry of the ensemble improves the quality of the results substantially in the low-statistics regime, and is also essential in the high-statistics regime. We employ a perceptual criterion and argue that, even with large statistics, the dimensionality of the PCA subspace necessary for adequate representation of the identity information in relatively tightly cropped faces is in the 400-700 range, and we show that a dimensionality of 200 is inadequate. Finally, we discuss some of the shortcomings of PCA and suggest possible solutions.
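A toy illustration of two ingredients discussed above, PCA on vectorized faces and mirror-symmetry augmentation of the ensemble; this is a generic sketch, not the authors' pipeline, and all names are illustrative.

```python
import numpy as np

def eigenfaces(images, n_components=200, use_mirror_symmetry=True):
    """images: array of shape (n_faces, height, width). Returns (mean, basis, spectrum)."""
    data = images.astype(float)
    if use_mirror_symmetry:
        # Enforce mirror symmetry of the ensemble by adding horizontally flipped copies.
        data = np.concatenate([data, data[:, :, ::-1]], axis=0)
    flat = data.reshape(len(data), -1)
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Rows of vt are the eigenfaces; squared singular values give the eigenspectrum.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components], (s ** 2) / len(flat)

# Usage: project a face onto the basis to get its low-dimensional code.
# mean, basis, spectrum = eigenfaces(face_array, n_components=400)
# code = basis @ (new_face.reshape(-1) - mean)
```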

Journal ArticleDOI
TL;DR: A new learning scheme for improving generalization of multilayer perceptrons using a multi-objective optimization approach to balance between the error of the training data and the norm of network weight vectors to avoid overfitting.
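Schematically, the two objectives being balanced are the training error and the weight norm. One common scalarization is shown below for orientation; the paper's actual multi-objective formulation may differ.

```latex
\[
  \min_{\mathbf{w}} \;\Big( E(\mathbf{w}),\; \|\mathbf{w}\|^{2} \Big),
  \qquad E(\mathbf{w}) \;=\; \sum_{i=1}^{m} \big\| y_i - f(x_i; \mathbf{w}) \big\|^{2},
\]
% a familiar scalarization of this trade-off is the weight-decay objective
\[
  \min_{\mathbf{w}} \; E(\mathbf{w}) + \lambda \,\|\mathbf{w}\|^{2} .
\]
```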

Journal ArticleDOI
TL;DR: In this paper, the authors consider coherent imprecise probability assessments on finite families of conditional events and study the problem of their extension to the case of probabilistic assessments. They adopt a generalized definition of coherence, called g-coherence, which is based on a suitable generalization of the coherence principle of de Finetti.

Journal ArticleDOI
TL;DR: In this article, a generalization of Fueter's theorem is discussed, which states that whenever f(xo,x) is holomorphic in x 0 +x, then it satisfies DOf = 0, D = O +t0 +j81 2 + kD, 3 being the Fueter operator.
Abstract: In this paper we discuss a generalization of Fueter's theorem which states that whenever f(xo,x) is holomorphic in x 0 +x, then it satisfies DOf = 0, D = O +t0 +j81 2 + kD,, 3 being the Fueter operator.

Journal ArticleDOI
TL;DR: In this article, a central limit theorem for a triangular array of m-dependent random variables is presented, where m may tend to infinity with the row index at a certain rate.

Journal ArticleDOI
01 Dec 2000-Fractals
TL;DR: In this paper, a commutative generalization of complex numbers called bicomplex numbers is used to introduce bicomplex dynamics, and a generalized version of the Mandelbrot set for the quadratic polynomial w^2 + c is shown to be connected.
Abstract: We use a commutative generalization of complex numbers called bicomplex numbers to introduce bicomplex dynamics. In particular, we give a generalization of the Mandelbrot set and of the "filled-Julia" sets in dimensions three and four. Also, we establish that our version of the Mandelbrot set with quadratic polynomial in bicomplex numbers of the form w^2 + c is identically the set of points where the associated generalized "filled-Julia" set is connected. Moreover, we prove that our generalized Mandelbrot set of dimension four is connected.
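A small self-contained sketch of iterating w → w^2 + c over bicomplex numbers, represented as pairs of complex numbers. The radius-2 escape test is the usual heuristic and, like the rest of the code, is an assumption of this sketch rather than a statement from the paper.

```python
def bc_mul(a, b):
    """Product of bicomplex numbers a = (a1, a2), b = (b1, b2), i.e. a1 + a2*j with j*j = -1."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def bc_abs(a):
    """Euclidean norm of a bicomplex number viewed as a point in R^4."""
    return (abs(a[0]) ** 2 + abs(a[1]) ** 2) ** 0.5

def in_generalized_mandelbrot(c, max_iter=100, escape_radius=2.0):
    """Heuristically test whether the orbit of 0 under w -> w^2 + c stays bounded."""
    w = (0j, 0j)
    for _ in range(max_iter):
        w = bc_mul(w, w)
        w = (w[0] + c[0], w[1] + c[1])
        if bc_abs(w) > escape_radius:
            return False
    return True

# Example: c = -1 (zero j-part) lies in the classical Mandelbrot set,
# and hence in this bicomplex generalization.
print(in_generalized_mandelbrot((-1 + 0j, 0j)))
```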

Journal ArticleDOI
TL;DR: In this paper, a generalization of the stability of the Pexider equation has been shown in the spirit of Hyers, Ulam, Rassias, and Gavruta.

Journal ArticleDOI
Kevin Cowtan1
TL;DR: Two special cases, where structure factors are independent and where electron-density values are independent, are examined, related to the new likelihood-based framework of Terwilliger for employing structural information which was previously exploited by means of conventional density-modification calculations.
Abstract: A general multivariate quadratic function of the structure factors is constructed and transformed to obtain a quadratic function of the continuous electron density. Two special cases, where structure factors are independent and where electron-density values are independent, are examined. These results are related to the new likelihood-based framework of Terwilliger [Terwilliger (1999), Acta Cryst. D55, pp. 1863–1871] for employing structural information which was previously exploited by means of conventional density-modification calculations. The treatment here involves different assumptions and highlights new features of Terwilliger's calculation. The generalized quadratic construction allows the generation of cross terms relating all reflections and electron densities. Other applications of this approach are considered.

Journal ArticleDOI
TL;DR: In this paper, a modified notion called multitopic set is introduced, which is a mild generalization of the same-named notion introduced by Joachim Lambek in 1969, inspired by the concept of opetopic set introduced by John C. Baez and James Dolan.


Journal ArticleDOI
TL;DR: The cross-modal generalization effects of training complex sentence comprehension and complex sentence production were examined in 4 individuals with agrammatic Broca's aphasia who showed difficulty comprehending and producing complex, noncanonical sentences.
Abstract: The cross-modal generalization effects of training complex sentence comprehension and complex sentence production were examined in 4 individuals with agrammatic Broca's aphasia who showed difficulty comprehending and producing complex, noncanonical sentences. Object-cleft and passive sentences were selected for treatment because the two are linguistically distinct, relying on wh- and NP movement, respectively (Chomsky, 1986). Two participants received comprehension training, and 2 received production training using linguistic specific treatment (LST). LST takes participants through a series of steps that emphasize the verb and verb argument structure, as well as the linguistic movement required to derive target sentences. A single-subject multiple-baseline design across behaviors was used to measure acquisition and generalization within and across sentence types, as well as cross-modal generalization (i.e., from comprehension to production and vice versa) and generalization to discourse. Results indicated that both treatment methods were effective for training comprehension and production of target sentences and that comprehension treatment resulted in generalization to spoken and written sentence production. Sentence production treatment generalized to written sentence production only; generalization to comprehension did not occur. Across sentence types generalization also did not occur, as predicted, and the effects of treatment on discourse were inconsistent across participants. These data are discussed with regard to models of normal sentence comprehension and production.

Proceedings Article
28 Jun 2000
TL;DR: The following SRM bound was proved in [McA98] and, for completeness, is proved again in Section 2. It states that with probability 1 − δ over the sample S the authors have the following.
Abstract: The problem of over-fitting is central to both the theory and practice of machine learning. Intuitively, one over-fits by using too many parameters in the concept, e.g., fitting an nth order polynomial to n data points. One under-fits by using too few parameters, e.g., fitting a linear curve to clearly quadratic data. The fundamental question is how many parameters, or what concept size, should one allow for a given amount of training data. A standard theoretical approach is to prove a bound on generalization error as a function of the training error and the concept size (or VC dimension). One can then select a concept minimizing this bound, i.e., optimizing a certain tradeoff, as expressed in the bound, between training error and concept size. Bounds on generalization error that express a tradeoff between the training error and the size of the concept are often called structural risk minimization (SRM) formulas. A variety of SRM bounds have been proved in the literature [Vap82]. The following SRM bound was proved in [McA98] and, for completeness, is proved again in Section 2. It states that with probability 1 − δ over the sample S we have the following.
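A representative bound of this SRM type, in its generic Occam-style form (the exact statement and constants in [McA98] differ in detail), is:

```latex
% For concepts c with description length |c| in bits, training error
% \hat{\varepsilon}(c) on m examples, with probability at least 1 - \delta
% over the sample, simultaneously for all c:
\[
  \varepsilon(c) \;\le\; \hat{\varepsilon}(c)
  \;+\; \sqrt{\frac{|c|\,\ln 2 \;+\; \ln\frac{1}{\delta}}{2m}} .
\]
```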

Journal ArticleDOI
TL;DR: In this article, a simple inductive method for the analysis of the convergence of cluster expansions (Taylor expansions, Mayer expansions) for the partition functions of polymer models is presented. And a simple proof of the Dobrushin-Kotecký-Preiss criterion and a generalization usable for situations where a successive expansion of the partition function has to be used.
Abstract: We explain a simple inductive method for the analysis of the convergence of cluster expansions (Taylor expansions, Mayer expansions) for the partition functions of polymer models. We give a very simple proof of the Dobrushin–Kotecký–Preiss criterion and formulate a generalization usable for situations where a successive expansion of the partition function has to be used.
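For orientation, the objects involved are the polymer partition function and a convergence condition of Kotecký–Preiss type; the notation below is the standard one and is not quoted from the paper.

```latex
% Partition function of a polymer model: sum over finite families of pairwise
% compatible polymers, each polymer gamma carrying an activity w(gamma).
\[
  Z \;=\; \sum_{\{\gamma_1,\dots,\gamma_n\}\ \mathrm{compatible}} \;\prod_{i=1}^{n} w(\gamma_i).
\]
% Kotecky-Preiss type criterion: the cluster expansion of log Z converges if
% there is a function a(gamma) >= 0 such that, for every polymer gamma,
\[
  \sum_{\gamma' \nsim \gamma} |w(\gamma')|\, e^{a(\gamma')} \;\le\; a(\gamma).
\]
```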