
Showing papers on "Generalization published in 1998"


01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.
Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.
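
For reference, the book's central consistency result can be stated schematically (a paraphrase, not a quotation from the text): empirical risk minimization is consistent if and only if the empirical risks converge to the expected risks uniformly (one-sidedly) over the function class.

```latex
% Paraphrase of the key theorem: ERM is consistent iff one-sided
% uniform convergence of empirical risk to expected risk holds:
\lim_{m \to \infty} \Pr\!\Big\{ \sup_{\alpha \in \Lambda}
  \big( R(\alpha) - R_{\mathrm{emp}}(\alpha) \big) > \varepsilon \Big\} = 0
  \qquad \forall \varepsilon > 0
```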

26,531 citations


Journal ArticleDOI
TL;DR: Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights.
Abstract: Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus A³√((log n)/m) (ignoring log A and log m factors), where m is the number of training patterns. This may explain the generalization performance of neural networks, particularly when the number of training examples is considerably smaller than the number of weights. It also supports heuristics (such as weight decay and early stopping) that attempt to keep the weights small during training. The proof techniques appear to be useful for the analysis of other pattern classifiers: when the input domain is a totally bounded metric space, we use the same approach to give upper bounds on misclassification probability for classifiers with decision boundaries that are far from the training examples.
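
Written out, the bound sketched in the abstract has the following schematic form (constants and the log A, log m factors suppressed, as noted above):

```latex
% Schematic form of the bound: m training patterns, input dimension n,
% per-unit weight magnitudes bounded by A:
\Pr\{\text{misclassification}\} \;\le\;
  \hat{\varepsilon}_{\mathrm{train}}
  \;+\; O\!\left( A^{3} \sqrt{\tfrac{\log n}{m}} \right)
```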

1,234 citations


Journal ArticleDOI
TL;DR: A result is presented that allows one to trade off errors on the training sample against improved generalization performance, and a more general result in terms of "luckiness" functions, which provides a quite general way for exploiting serendipitous simplicity in observed data to obtain better prediction accuracy from small training sets.
Abstract: The paper introduces some generalizations of Vapnik's (1982) method of structural risk minimization (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to trade off errors on the training sample against improved generalization performance. It then considers the more general case when the hierarchy of classes is chosen in response to the data. A result is presented on the generalization performance of classifiers with a "large margin". This theoretically explains the impressive generalization performance of the maximal margin hyperplane algorithm of Vapnik and co-workers (which is the basis for their support vector machines). The paper concludes with a more general result in terms of "luckiness" functions, which provides a quite general way for exploiting serendipitous simplicity in observed data to obtain better prediction accuracy from small training sets. Four examples are given of such functions, including the Vapnik-Chervonenkis (1971) dimension measured on the sample.
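
For readers unfamiliar with the term, the "margin" of a separating hyperplane (w, b) on a labeled sample {(x_i, y_i)} is the quantity that the maximal margin algorithm maximizes:

```latex
% Geometric margin of a hyperplane (w, b); the maximal margin
% hyperplane is the one maximizing this quantity:
\gamma \;=\; \min_{i}\; \frac{y_i \big( \langle w, x_i \rangle + b \big)}{\lVert w \rVert}
```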

589 citations


Journal ArticleDOI
TL;DR: In this paper, reliability generalization characterizes the typical reliability of scores for a given test across studies, the amount of variability in reliability coefficients for given measures, and the sources of variability of reliability coefficients across studies.
Abstract: Because tests are not reliable, it is important to explore score reliability in virtually all studies. The present article proposes and illustrates a new method-reliability generalization-that can be used in a meta-analysis application similar to validity generalization. Reliability generalization characterizes (a) the typical reliability of scores for a given test across studies, (b) the amount of variability in reliability coefficients for given measures, and (c) the sources of variability in reliability coefficients across studies. The use of reliability generalization is illustrated here by analyzing 87 reliability coefficients reported for the two scales of the Bem Sex Role Inventory (BSRI).
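
As a minimal sketch of the meta-analytic bookkeeping involved (hypothetical data and study feature; this is not the authors' code), one can characterize the typical reliability, its variability, and a candidate source of that variability:

```python
import numpy as np

# Hypothetical reliability coefficients (e.g., Cronbach's alpha) from
# several studies, plus one coded study feature (0 = student sample,
# 1 = community sample).
alphas = np.array([0.86, 0.80, 0.91, 0.75, 0.88, 0.82, 0.79, 0.90])
feature = np.array([0, 0, 1, 1, 0, 1, 1, 0])

print("typical reliability:", alphas.mean())   # (a)
print("variability:", alphas.var(ddof=1))      # (b)

# (c) sources of variability: least-squares regression of the
# coefficients on the coded study feature.
X = np.column_stack([np.ones_like(feature), feature])
coef, *_ = np.linalg.lstsq(X, alphas, rcond=None)
print("intercept, feature effect:", coef)
```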

380 citations


Journal ArticleDOI
TL;DR: In this paper, a definition of weak ω-categories based on a higher-order generalization of the apparatus of operads is presented; weak ω-categories are defined as algebras of certain higher operads.

316 citations


15 Oct 1998
TL;DR: The convergence of the backpropagation algorithm with respect to a) the complexity of the required function approximation, b) the size of the network in relation to the size required for an optimal solution, and c) the degree of noise in the training data is investigated.
Abstract: One of the most important aspects of any machine learning paradigm is how it scales according to problem size and complexity. Using a task with known optimal training error, and a pre-specified maximum number of training updates, we investigate the convergence of the backpropagation algorithm with respect to a) the complexity of the required function approximation, b) the size of the network in relation to the size required for an optimal solution, and c) the degree of noise in the training data. In general, for a) the solution found is worse when the function to be approximated is more complex, for b) oversized networks can result in lower training and generalization error in certain cases, and for c) the use of committee or ensemble techniques can be more beneficial as the level of noise in the training data is increased. For the experiments we performed, we do not obtain the optimal solution in any case. We further support the observation that larger networks can produce better training and generalization error using a face recognition example where a network with many more parameters than training points generalizes better than smaller networks.
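
A toy version of this experimental protocol (my own sketch, not the authors' setup) fixes a budget of training updates and compares training against generalization error for networks of different sizes on a noisy target:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Noisy data from a known target function (the noise level plays role c).
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.1, size=200)
X_test = rng.uniform(-1, 1, size=(1000, 1))
y_test = np.sin(3 * X_test[:, 0])

# Vary network size (role b) under a fixed budget of training updates.
for hidden in (2, 10, 100):
    net = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=500,
                       random_state=0)
    net.fit(X, y)
    print(hidden,
          mean_squared_error(y, net.predict(X)),            # training error
          mean_squared_error(y_test, net.predict(X_test)))  # generalization
```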

271 citations


01 Jan 1998
TL;DR: Generalization of the Steiger-Lind root mean square error of approximation fit indexes and interval estimation procedure to models based on multiple independent samples is discussed; an approach that seems both reasonable and workable is suggested, and caution is given against one that definitely seems inappropriate.
Abstract: Generalization of the Steiger-Lind root mean square error of approximation fit indexes and interval estimation procedure to models based on multiple independent samples is discussed. In this article, we suggest an approach that seems both reasonable and workable, and caution against one that definitely seems inappropriate.
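
For context, the single-sample Steiger-Lind index being generalized has the standard form below (χ² is the model test statistic, df its degrees of freedom, N the sample size):

```latex
% Steiger-Lind RMSEA for a single sample:
\mathrm{RMSEA} \;=\; \sqrt{ \max\!\left( \frac{\chi^{2} - df}{df\,(N-1)},\; 0 \right) }
```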

253 citations


Journal ArticleDOI
TL;DR: In this article, a matrix variate generalization of the power exponential distribution family is proposed, which can be useful in generalizing statistical procedures in multivariate analysis and in designing robust alternatives to them.
Abstract: This paper proposes a matrix variate generalization of the power exponential distribution family, which can be useful in generalizing statistical procedures in multivariate analysis and in designing robust alternatives to them. An example is added to show an application of the generalization.
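
For orientation, the multivariate (vector) power exponential density being generalized has, up to its normalizing constant k, the form below; the matrix variate version replaces the quadratic form with a trace expression (my paraphrase of the construction, not the paper's exact statement):

```latex
% Multivariate power exponential density; beta = 1 recovers the Gaussian:
f(x) \;=\; k \,\lvert \Sigma \rvert^{-1/2}
  \exp\!\Big\{ -\tfrac{1}{2} \big[ (x-\mu)^{\top} \Sigma^{-1} (x-\mu) \big]^{\beta} \Big\}
```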

250 citations


Journal ArticleDOI
TL;DR: In this paper, a novel approach using an artificial neural network was used to develop a model for analyzing the relationship between the Global Radiation (GR) and climatological variables, and to predict GR for locations not covered by the model's training data.

242 citations


Journal ArticleDOI
TL;DR: In this paper, the authors extended the Poisson model of games with population uncertainty by allowing that expected population sizes and players' utility functions may depend on an unknown state of the world.
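
In the Poisson model the number of players is Poisson distributed; the extension described here lets the expected population size (and utilities) depend on an unknown state θ (schematic statement in my notation):

```latex
% Population size in a Poisson game, with the expected size
% \lambda(\theta) depending on an unknown state \theta:
\Pr\{\,k \text{ players} \mid \theta\,\} \;=\;
  e^{-\lambda(\theta)}\, \frac{\lambda(\theta)^{k}}{k!}
```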

199 citations


Journal Article
Hang Li, Naoki Abe
TL;DR: In this paper, the problem of generalizing values of a case frame slot for a verb is viewed as that of estimating a conditional probability distribution over a partition of words, and a new generalization method based on the minimum description length (MDL) principle is proposed.
Abstract: A new method for automatically acquiring case frame patterns from large corpora is proposed. In particular, the problem of generalizing values of a case frame slot for a verb is viewed as that of estimating a conditional probability distribution over a partition of words, and a new generalization method based on the Minimum Description Length (MDL) principle is proposed. For the sake of efficiency, the proposed method makes use of an existing thesaurus and restricts its attention to those partitions that are present as "cuts" in the thesaurus tree, thus reducing the generalization problem to that of estimating a "tree cut model" of the thesaurus tree. An efficient algorithm is given, which provably obtains the optimal tree cut model for the given frequency data of a case slot, in the sense of MDL. Case frame patterns obtained by the method were used to resolve PP-attachment ambiguity. Experimental results indicate that the proposed method improves upon or is at least comparable with existing methods.
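
A minimal brute-force sketch of the MDL cut selection (the paper gives an efficient recursive algorithm; this toy version simply enumerates every cut of a small hypothetical thesaurus):

```python
import math

# Toy thesaurus: internal node = (name, [children]), leaf = (name, freq).
TREE = ("ENTITY", [
    ("ANIMAL", [("bird", 4), ("cat", 3), ("dog", 3)]),
    ("ARTIFACT", [("car", 0), ("plane", 1), ("boat", 0)]),
])

def freq(node):
    return sum(freq(c) for c in node[1]) if isinstance(node[1], list) else node[1]

def n_leaves(node):
    return sum(n_leaves(c) for c in node[1]) if isinstance(node[1], list) else 1

def cuts(node):
    """All cuts of the subtree rooted at node."""
    yield [node]                      # the node itself as a single class ...
    if isinstance(node[1], list):     # ... or any combination of child cuts
        partial = [[]]
        for child in node[1]:
            partial = [p + c for p in partial for c in cuts(child)]
        yield from partial

def description_length(cut, n_total):
    param = (len(cut) - 1) / 2 * math.log2(n_total)   # parameter DL
    data = 0.0                                        # data DL
    for node in cut:
        f = freq(node)
        if f > 0:   # probability spread uniformly over a class's words
            data -= f * math.log2(f / n_total / n_leaves(node))
    return param + data

n_total = freq(TREE)
best = min(cuts(TREE), key=lambda c: description_length(c, n_total))
print([name for name, _ in best])     # e.g. ['ANIMAL', 'ARTIFACT']
```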

Journal ArticleDOI
TL;DR: A method for incrementally updating approximations of a concept is presented; it uses the inductive learning algorithm LERS, based on rough set theory, to implement a quasi-incremental algorithm for learning classification rules from very large databases generalized by dynamic conceptual hierarchies provided by users.
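
The rough-set machinery underlying LERS can be illustrated generically (this is a textbook construction, not the LERS implementation): objects that are indiscernible on the chosen attributes form equivalence classes, and a concept is bracketed by its lower and upper approximations.

```python
from collections import defaultdict

# Toy decision table (hypothetical): object id -> attribute values.
objects = {
    1: ("high", "yes"), 2: ("high", "yes"), 3: ("low", "no"),
    4: ("low", "no"),   5: ("high", "no"),
}
concept = {1, 2, 3}   # objects belonging to the target concept

# Equivalence classes of the indiscernibility relation.
classes = defaultdict(set)
for obj, values in objects.items():
    classes[values].add(obj)

lower, upper = set(), set()
for c in classes.values():
    if c <= concept:      # class wholly inside the concept
        lower |= c
    if c & concept:       # class overlapping the concept
        upper |= c

print(lower)  # {1, 2}        -- certainly in the concept
print(upper)  # {1, 2, 3, 4}  -- possibly in the concept
```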

Proceedings Article
01 Dec 1998
TL;DR: This work investigates the problem of learning a classification task on data represented in terms of their pairwise proximities, which does not refer to an explicit feature representation of the data items and is thus more general than the standard approach of using Euclidean feature vectors.
Abstract: We investigate the problem of learning a classification task on data represented in terms of their pairwise proximities. This representation does not refer to an explicit feature representation of the data items and is thus more general than the standard approach of using Euclidean feature vectors, from which pairwise proximities can always be calculated. Our first approach is based on a combined linear embedding and classification procedure resulting in an extension of the Optimal Hyperplane algorithm to pseudo-Euclidean data. As an alternative we present another approach based on a linear threshold model in the proximity values themselves, which is optimized using Structural Risk Minimization. We show that prior knowledge about the problem can be incorporated by the choice of distance measures and examine different metrics with respect to their generalization. Finally, the algorithms are successfully applied to protein structure data and to data from the cat's cerebral cortex. They show better performance than K-nearest-neighbor classification.
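
A stripped-down sketch of the first approach's flavor (my own simplification, not the authors' algorithm): embed a symmetric distance matrix with the classical (Torgerson) construction, keep the leading components, and train a linear classifier in the embedded space.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical data: two classes in 2-D, but the learner only ever
# sees the pairwise-distance matrix D, never the coordinates.
X = rng.normal(size=(60, 2)) + np.repeat([[0, 0], [3, 3]], 30, axis=0)
y = np.repeat([0, 1], 30)
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)

# Classical (Torgerson) embedding of the squared distances.
n = len(D)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
w, V = np.linalg.eigh(B)
top = np.argsort(w)[::-1][:2]                    # leading components
Z = V[:, top] * np.sqrt(np.maximum(w[top], 0))   # embedded coordinates

clf = LinearSVC().fit(Z, y)                      # linear classifier
print("training accuracy:", clf.score(Z, y))
```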

Journal ArticleDOI
TL;DR: It is claimed that it is always possible to classify objects according to the existence dependency relationship, thus removing the necessity for the Part Of relation and other kinds of associations between object types.
Abstract: In object-oriented conceptual modeling, the generalization/specialization hierarchy and the whole/part relationship are prevalent classification schemes for object types. This paper presents an object-oriented conceptual model where, in the end, object types are classified according to two relationships only: existence dependency and generalization/specialization. Existence dependency captures some of the interesting semantics that are usually associated with the concept of aggregation (also called composition or Part Of relation), but in contrast with the latter concept, the semantics of existence dependency are very precise and its use clear cut. The key advantages of classifying object types according to existence dependency are the simplicity of the concept, its absolute unambiguity, and the fact that it enables checking conceptual schemes for semantic integrity and consistency. We first define the notion of existence dependency and claim that it is always possible to classify objects according to this relationship, thus removing the necessity for the Part Of relation and other kinds of associations between object types. The second claim of this paper is that existence dependency is the key to semantic integrity checking to a level unknown to current object-oriented analysis methods. In other words: existence dependency allows us to track and solve inconsistencies in an object-oriented conceptual schema.
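
As a toy illustration of the semantics (a hypothetical example, not taken from the paper): an order line is existence dependent on its order, so it can only come into existence through its order and refers to it for its whole life.

```python
class Order:
    """Master object; its order lines are existence dependent on it."""
    def __init__(self, number):
        self.number = number
        self._lines = []

    def add_line(self, product, qty):
        line = OrderLine(self, product, qty)   # lines created via the order
        self._lines.append(line)
        return line


class OrderLine:
    """Dependent object: cannot exist without an Order."""
    def __init__(self, order, product, qty):
        if order is None:
            raise ValueError("an OrderLine cannot exist without an Order")
        self.order = order      # fixed for the line's entire lifetime
        self.product = product
        self.qty = qty


order = Order(42)
line = order.add_line("widget", 3)
print(line.order.number)   # 42
```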

Journal ArticleDOI
01 Jan 1998
TL;DR: The aim of this paper is to observe school-case solutions available in standard cartographic books and to replicate them automatically, preserving the overall line structure through bends that are mathematically defined according to size, shape, and context.
Abstract: Many solutions for line generalizations have already been proposed. Most of them, however, are geometric solutions, not cartographic ones. The position we take in this paper is to observe school-case solutions available in standard cartographic books and try to replicate those automatically. A central criterion guiding the process of cartographic generalization is line structure, which itself can be decomposed into a series of line bends. Hence our solution is to preserve the overall structure with line bends which are mathematically defined according to size, shape, and context. Rules are subsequently applied using operators such as elimination, combination, and exaggeration. The algorithms that were used are both procedural and knowledge based. Various experiments were conducted on physical and political geographic lines, and we show the graphical results so that readers may visually assess them. Further research to improve the present solutions is discussed, particularly options for avoiding conflicts ...
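
A minimal sketch of the bend-based view (my own simplification; the paper's operators are considerably richer): split a polyline into bends at changes of turning direction, measure each bend's size, and eliminate bends below a threshold.

```python
import numpy as np

def generalize_line(pts, min_detour):
    """Drop bends (runs of same-sign turns) whose detour is small."""
    pts = np.asarray(pts, float)
    a, b, c = pts[:-2], pts[1:-1], pts[2:]
    cross = ((b[:, 0] - a[:, 0]) * (c[:, 1] - b[:, 1])
             - (b[:, 1] - a[:, 1]) * (c[:, 0] - b[:, 0]))
    signs = np.sign(cross)          # turn direction at each interior vertex

    keep = [pts[0]]
    i = 0
    while i < len(signs):
        j = i
        while j + 1 < len(signs) and signs[j + 1] == signs[i]:
            j += 1                  # interior vertices i..j form one bend
        seg = pts[i:j + 3]
        # Bend size: how far the path detours beyond its chord.
        detour = (np.linalg.norm(np.diff(seg, axis=0), axis=1).sum()
                  - np.linalg.norm(seg[-1] - seg[0]))
        if detour >= min_detour:    # elimination operator: drop small bends
            keep.extend(pts[i + 1:j + 2])
        i = j + 1
    keep.append(pts[-1])
    return np.array(keep)

# A small zig is smoothed away; the large bend survives.
line = [(0, 0), (1, 0.1), (2, 0), (3, 0), (4, 3), (5, 0), (6, 0)]
print(generalize_line(line, min_detour=0.5))
```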

Journal ArticleDOI
TL;DR: In this paper, a generalization of the circular and hyperbolic functions is proposed, based on the Tsallis statistics and on a consistent generalisation of the Euler formula, and some properties of the presently proposed q-trigonometry are then investigated.
Abstract: A generalization of the circular and hyperbolic functions is proposed, based on the Tsallis statistics and on a consistent generalization of the Euler formula. Some properties of the presently proposed q-trigonometry are then investigated. The generalized functions are exact solutions of a nonlinear oscillator. Original circular and hyperbolic relations are recovered as the limiting case.
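
Concretely, the construction rests on the q-exponential of Tsallis statistics together with a generalized Euler formula; the ordinary circular functions are recovered in the limit q → 1:

```latex
% q-exponential and the generalized Euler formula:
e_q(x) = \big[ 1 + (1-q)\,x \big]^{\frac{1}{1-q}}, \qquad
e_q(ix) = \cos_q(x) + i \sin_q(x),
\\[4pt]
\cos_q(x) = \frac{e_q(ix) + e_q(-ix)}{2}, \qquad
\sin_q(x) = \frac{e_q(ix) - e_q(-ix)}{2i}
```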

Journal ArticleDOI
TL;DR: It is proved that universal fault-tolerant computation is possible with any higher-dimensional stabilizer code for prime d, and the theory of fault-tolerant computations using such codes is discussed.
Abstract: Instead of a quantum computer where the fundamental units are 2-dimensional qubits, we can consider a quantum computer made up of d-dimensional systems. There is a straightforward generalization of the class of stabilizer codes to d-dimensional systems, and I will discuss the theory of fault-tolerant computation using such codes. I prove that universal fault-tolerant computation is possible with any higher-dimensional stabilizer code for prime d.
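
The d-dimensional generalization is built from the generalized Pauli operators, which generate the group from which stabilizer codes are constructed (standard definitions, with ω a primitive d-th root of unity):

```latex
% Generalized Pauli operators on a d-dimensional system
% (arithmetic mod d, \omega = e^{2\pi i/d}):
X \lvert j \rangle = \lvert j+1 \bmod d \rangle, \qquad
Z \lvert j \rangle = \omega^{j} \lvert j \rangle, \qquad
ZX = \omega\, XZ
```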

Journal ArticleDOI
TL;DR: This paper shows that generalization occurs only when some studied items are systematically marked and that the process consists of two components, one of which involves making a link between the phonological markers and the indicators (e.g., definite and indefinite articles) of subclass membership.

Journal ArticleDOI
TL;DR: This article elaborates global control for partial deduction, using the concept of a characteristic tree, which encapsulates specialization behavior rather than syntactic structure, to guide generalization and polyvariance, and shows how this can be done in a correct and elegant way.
Abstract: Given a program and some input data, partial deduction computes a specialized program handling any remaining input more efficiently. However, controlling the process well is a rather difficult problem. In this article, we elaborate global control for partial deduction: for which atoms, among possibly infinitely many, should specialized relations be produced, meanwhile guaranteeing correctness as well as termination? Our work is based on two ingredients. First, we use the concept of a characteristic tree, encapsulating specialization behavior rather than syntactic structure, to guide generalization and polyvariance, and we show how this can be done in a correct and elegant way. Second, we structure combinations of atoms and associated characteristic trees in global trees registering “causal” relationships among such pairs. This allows us to spot looming nontermination and perform proper generalization in order to avert the danger, without having to impose a depth bound on characteristic trees. The practical relevance and benefits of the work are illustrated through extensive experiments. Finally, a similar approach may improve upon current (on-line) control strategies for program transformation in general such as (positive) supercompilation of functional programs. It also seems valuable in the context of abstract interpretation to handle infinite domains of infinite height with more precision.
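
Generalization in this setting is typically computed as a most specific generalization (msg) of atoms; the following is a minimal sketch of first-order anti-unification (a generic textbook construction, not the article's control algorithm):

```python
import itertools

_fresh = itertools.count()

def msg(s, t, table=None):
    """Most specific generalization of two terms.

    Representation: constants are strings, compound terms are tuples
    (functor, arg1, ..., argn)."""
    if table is None:
        table = {}
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(msg(a, b, table) for a, b in zip(s[1:], t[1:]))
    # Mismatched subterms generalize to a variable; the same pair of
    # subterms is always mapped to the same variable.
    if (s, t) not in table:
        table[(s, t)] = "X%d" % next(_fresh)
    return table[(s, t)]

# msg( p(f(a), a), p(f(b), b) )  ==>  p(f(X0), X0)
print(msg(("p", ("f", "a"), "a"), ("p", ("f", "b"), "b")))
```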

Journal ArticleDOI
TL;DR: The authors showed 9- and 11-month-old infants events in which animal and vehicle properties were demonstrated, such as drinking from a cup or giving a ride, and tested imitation and generalization of these properties. Infants generalized the properties broadly to both typical and novel exemplars within the appropriate domain, and rarely to exemplars from the inappropriate domain.
Abstract: Using little models, we showed 9- and 11-month-old infants events in which animal or vehicle properties were demonstrated, such as a dog drinking from a cup or a car giving a ride. The infants were tested on imitation of these properties on the same exemplars as used for the modeling, on generalization to other exemplars from the same domain, and on generalization to exemplars from an inappropriate domain. Infants generalized the properties broadly to both typical and novel exemplars within the appropriate domain, and rarely to exemplars from the inappropriate domain. It is concluded that at least by 9 months infants have formed global concepts of animals and vehicles that control the way infants learn the characteristic properties of these classes.

Book
01 Dec 1998
TL;DR: In this paper, the results of 30 years of investigation by the author into the creation of a new theory on statistical analysis of observations, based on the principle of random arrays of random vectors and matrices of increasing dimensions, are described.
Abstract: This book contains the results of 30 years of investigation by the author into the creation of a new theory of statistical analysis of observations, based on the principle of random arrays of random vectors and matrices of increasing dimensions. It describes limit phenomena of sequences of random observations, which occupy a central place in the theory of random matrices. This is the first book to explore statistical analysis of random arrays, and it provides the necessary tools for such analysis. This book is a natural generalization of multidimensional statistical analysis and aims to provide its readers with new, improved estimators for this analysis. The book consists of 14 chapters and opens with the theory of sample random matrices of fixed dimension, which makes it possible to cover not only the problems of multidimensional statistical analysis, but also some important problems of mechanics, physics and economics. The second chapter deals with all 50 known canonical equations of the new statistical analysis, which form the basis for finding new and improved statistical estimators. Chapters 3-5 contain detailed proofs of the three main laws of the theory of sample random matrices. In chapters 6-10, detailed, strong proofs of the Circular and Elliptic Laws and their generalizations are given. In chapters 11-13, the convergence rates of spectral functions are given for the practical application of new estimators, and important questions of random matrix physics are considered. The final chapter contains 54 new statistical estimators, which generalize the main estimators of statistical analysis.

Book ChapterDOI
01 Oct 1998
TL;DR: A neural network's ability to generalize from examples is estimated using ideas from statistical mechanics, and a variety of learning problems that can be treated exactly by the replica method of statistical physics are introduced.
Abstract: We estimate a neural network’s ability to generalize from examples using ideas from statistical mechanics. We discuss the connection between this approach and other powerful concepts from mathematical statistics, computer science, and information theory that are useful in explaining the performance of such machines. For the simplest network, the perceptron, we introduce a variety of learning problems that can be treated exactly by the replica method of statistical physics.
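
For the perceptron learning a teacher rule from random examples, the generalization error in this framework has a simple geometric expression in terms of the overlap R between the student and teacher weight vectors (a standard result in this literature):

```latex
% Perceptron generalization error as a function of the normalized
% student-teacher overlap R:
\epsilon_g \;=\; \frac{1}{\pi}\arccos(R), \qquad
R \;=\; \frac{\mathbf{w}\cdot\mathbf{w}^{*}}{\lVert\mathbf{w}\rVert\,\lVert\mathbf{w}^{*}\rVert}
```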

Journal ArticleDOI
TL;DR: In this paper, the authors studied the optimal cutting strategy for an ongoing forest, using stochastic impulse control, and showed how Faustmann's formula can be generalized to growing forests.
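
For context, the deterministic Faustmann formula being generalized values bare land under an infinite sequence of identical rotations (standard form: p timber price, f(T) volume at rotation age T, c replanting cost, r the discount rate):

```latex
% Deterministic Faustmann land value, maximized over the rotation age T:
V(T) \;=\; \frac{p\,f(T)\,e^{-rT} - c}{1 - e^{-rT}}
```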

Journal ArticleDOI
TL;DR: The differentiability of the Wiener process as a sesquilinear form on a dense domain in the Hilbert space of square-integrable functions over Wiener space is shown and is extended to the quantum context, providing a basis for a corresponding generalization of the Ito theory of stochastic integration.
Abstract: A natural explanation for extreme irregularities in the evolution of prices in financial markets is provided by quantum effects. The lack of simultaneous observability of relevant variables and the interference of attempted observation with the values of these variables represent such effects. These characteristics have been noted by traders and economists and appear intrinsic to market dynamics. This explanation is explored here in terms of a corresponding generalization of the Wiener process and its role in the Black–Scholes–Merton theory. The differentiability of the Wiener process as a sesquilinear form on a dense domain in the Hilbert space of square-integrable functions over Wiener space is shown and is extended to the quantum context. This provides a basis for a corresponding generalization of the Ito theory of stochastic integration. An extension of the Black–Scholes option pricing formula to the quantum context is deduced.
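
As a reference point, the classical Black–Scholes European call price that the paper extends to the quantum context is (standard notation: S spot price, K strike, r riskless rate, σ volatility, T maturity, N the standard normal cdf):

```latex
% Classical Black-Scholes call price:
C = S\,N(d_1) - K e^{-rT} N(d_2), \qquad
d_{1,2} = \frac{\ln(S/K) + (r \pm \sigma^{2}/2)\,T}{\sigma\sqrt{T}}
```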

Journal ArticleDOI
TL;DR: In this article, the authors prove convergence of global, bounded, and smooth solutions of the wave equation with linear dissipation and analytic nonlinearity, and a generalization and examples of applications are given at the end of the paper.
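
Schematically, the problem studied has the form below (my paraphrase, with f analytic), and the result asserts that every global, bounded, smooth solution converges to an equilibrium as t → ∞:

```latex
% Wave equation with linear dissipation and analytic nonlinearity:
u_{tt} + u_{t} - \Delta u + f(u) = 0 \ \ \text{in } \Omega, \qquad
u = 0 \ \ \text{on } \partial\Omega
```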

Book ChapterDOI
TL;DR: It is shown that evolutionary algorithms are able to converge to the set of minimal elements in finite time with probability one, provided that the search space is finite, the time-invariant variation operator is associated with a positive transition probability function and that the selection operator obeys the so-called ‘elite preservation strategy.’
Abstract: The task of finding minimal elements of a partially ordered set is a generalization of the task of finding the global minimum of a real-valued function or of finding Pareto-optimal points of a multicriteria optimization problem. It is shown that evolutionary algorithms are able to converge to the set of minimal elements in finite time with probability one, provided that the search space is finite, the time-invariant variation operator is associated with a positive transition probability function and that the selection operator obeys the so-called ‘elite preservation strategy.’
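
A minimal sketch of such an elitist EA on a finite search space with a Pareto partial order (a toy illustration of the setting; the chapter proves convergence guarantees rather than prescribing an implementation):

```python
import random

random.seed(0)

# Finite search space: bit strings of length 4, with two hypothetical
# objectives to minimize; these induce a partial (Pareto) order.
def objectives(x):
    ones = sum(x)
    return (ones, abs(2 - ones))

def dominates(a, b):
    fa, fb = objectives(a), objectives(b)
    return all(u <= v for u, v in zip(fa, fb)) and fa != fb

def vary(x):
    # Positive transition probability: any point can reach any other,
    # since each bit flips independently with probability 1/4.
    return tuple(b ^ (random.random() < 0.25) for b in x)

# Elite-preserving archive of mutually non-dominated points.
archive = [tuple(random.randint(0, 1) for _ in range(4))]
for _ in range(1000):
    child = vary(random.choice(archive))
    if not any(dominates(a, child) for a in archive):
        archive = [a for a in archive if not dominates(child, a)]
        if child not in archive:
            archive.append(child)

print(sorted({objectives(a) for a in archive}))  # approaches the minimal elements
```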

Journal ArticleDOI
01 Mar 1998
TL;DR: The study shows that a set of sophisticated generalization operators can be constructed for generalization of complex data objects, a dimension-based class generalization mechanism can be developed for object cube construction, and sophisticated rule formation methods can be developed for extraction of different kinds of knowledge from data.
Abstract: Data mining is the discovery of knowledge and useful information from the large amounts of data stored in databases. With the increasing popularity of object-oriented database systems in advanced database applications, it is important to study the data mining methods for object-oriented databases because mining knowledge from such databases may improve understanding, organization, and utilization of the data stored there. In this paper, issues on generalization-based data mining in object-oriented databases are investigated in three aspects: (1) generalization of complex objects, (2) class-based generalization, and (3) extraction of different kinds of rules. An object cube model is proposed for class-based generalization, on-line analytical processing, and data mining. The study shows that (i) a set of sophisticated generalization operators can be constructed for generalization of complex data objects, (ii) a dimension-based class generalization mechanism can be developed for object cube construction, and (iii) sophisticated rule formation methods can be developed for extraction of different kinds of knowledge from data, including characteristic rules, discriminant rules, association rules, and classification rules. Furthermore, the application of such discovered knowledge may substantially enhance the power and flexibility of browsing databases, organizing databases and querying data and knowledge in object-oriented databases.
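
The flavor of generalization-based mining can be conveyed with classic attribute-oriented induction (a simplified generic sketch, not the paper's object cube model): climb each attribute's concept hierarchy until few distinct values remain, then merge identical tuples into counted rules.

```python
from collections import Counter

# Hypothetical concept hierarchies: value -> more general concept.
HIERARCHY = {
    "city": {"Vancouver": "Canada", "Toronto": "Canada",
             "Seattle": "USA", "Boston": "USA"},
    "major": {"math": "science", "physics": "science",
              "history": "arts", "music": "arts"},
}
THRESHOLD = 2   # max distinct values allowed per attribute

def generalize(rows, attr, i):
    # Assumes each climb strictly reduces the number of distinct values.
    while len({r[i] for r in rows}) > THRESHOLD:
        rows = [r[:i] + (HIERARCHY[attr].get(r[i], r[i]),) + r[i + 1:]
                for r in rows]
    return rows

rows = [("Vancouver", "math"), ("Toronto", "physics"),
        ("Seattle", "history"), ("Boston", "music"),
        ("Vancouver", "physics")]
for i, attr in enumerate(["city", "major"]):
    rows = generalize(rows, attr, i)

# Merged tuples with counts, read as characteristic rules.
print(Counter(rows))   # e.g. ('Canada', 'science'): 3, ('USA', 'arts'): 2
```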

Journal ArticleDOI
TL;DR: A statistically based methodology for the design of neural networks when the dimension d of the network input is comparable to the size n of the training set, illustrated in detail in the context of short-term forecasting of the demand for electric power from an electric utility.
Abstract: We introduce a statistically based methodology for the design of neural networks when the dimension d of the network input is comparable to the size n of the training set. If one proceeds straightforwardly, then one is committed to a network of complexity exceeding n. The result will be good performance on the training set but poor generalization performance when the network is presented with new data. To avoid this we need to select carefully the network architecture, including control over the input variables. Our approach to selecting a network architecture first selects a subset of input variables (features) using the nonparametric statistical process of difference-based variance estimation and then selects a simple network architecture using projection pursuit regression (PPR) ideas combined with the statistical idea of slicing inverse regression (SIR). The resulting network, which is then retrained without regard to the PPR/SIR determined parameters, is one of moderate complexity (number of parameters significantly less than n) whose performance on the training set can be expected to generalize well. The application of this methodology is illustrated in detail in the context of short-term forecasting of the demand for electric power from an electric utility.
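
A compact sketch of the SIR step alone (generic sliced inverse regression, not the authors' full PPR/SIR pipeline): standardize the inputs, slice by the response, and take leading eigenvectors of the covariance of slice means as candidate input directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y depends on x only through one direction b.
n, d = 500, 8
X = rng.normal(size=(n, d))
b = np.zeros(d); b[0] = 1.0
y = np.sin(X @ b) + 0.1 * rng.normal(size=n)

# 1. Whiten X.
mu = X.mean(axis=0)
L = np.linalg.cholesky(np.linalg.inv(np.cov(X, rowvar=False)))
Z = (X - mu) @ L

# 2. Slice by the sorted response and average Z within slices.
slices = np.array_split(np.argsort(y), 10)
means = np.array([Z[s].mean(axis=0) for s in slices])
weights = np.array([len(s) for s in slices]) / n

# 3. Leading eigenvector of the weighted covariance of slice means,
#    mapped back to the original coordinates.
M = (means * weights[:, None]).T @ means
vals, vecs = np.linalg.eigh(M)
beta = L @ vecs[:, -1]
print(np.round(beta / np.linalg.norm(beta), 2))   # ≈ ±(1, 0, ..., 0)
```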

Journal ArticleDOI
TL;DR: Tests of whether viewpoint-specific representations for some members of a class facilitate the recognition of other members of that class support the hypothesis that image-based representations are viewpoint dependent, but that these representations generalize across members of perceptually defined classes.