
Showing papers on "Generalization published in 1996"


Journal ArticleDOI
26 Feb 1996
TL;DR: The data cube operator as discussed by the authors generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers.
Abstract: Data analysis applications typically aggregate data across many dimensions looking for unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional answers. Applications need the N-dimensional generalization of these operators. The paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The cube treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. Aggregation points are represented by an "infinite value": ALL, so the point (ALL,ALL,...,ALL, sum(*)) represents the global sum of all items. Each ALL value actually represents the set of values contributing to that aggregation.
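
For intuition, here is a minimal pandas sketch of the cube idea: aggregate over every subset of the dimension columns and mark each collapsed dimension with ALL. The function name, column names, and toy data are illustrative assumptions, not part of the paper.

```python
# A minimal sketch of the CUBE idea in pandas (not the paper's SQL operator):
# aggregate the measure over every subset of the dimension columns and mark
# each collapsed dimension with the special value "ALL".
from itertools import combinations
import pandas as pd

def data_cube(df, dims, measure):
    pieces = []
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            if kept:
                g = df.groupby(list(kept), as_index=False)[measure].sum()
            else:
                g = pd.DataFrame({measure: [df[measure].sum()]})
            for d in dims:
                if d not in kept:
                    g[d] = "ALL"            # collapsed dimension
            pieces.append(g[list(dims) + [measure]])
    return pd.concat(pieces, ignore_index=True)

sales = pd.DataFrame({
    "model": ["chevy", "chevy", "ford"],
    "year":  [1994, 1995, 1994],
    "units": [10, 20, 5],
})
print(data_cube(sales, ["model", "year"], "units"))
# the row (ALL, ALL, 35) is the global sum of all items
```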

2,308 citations


Journal ArticleDOI
Guozhong An
TL;DR: It is shown that input noise and weight noise encourage the neural-network output to be a smooth function of the input or its weights, respectively, and that in the weak-noise limit, noise added to the output of the neural network only changes the objective function by a constant, so it cannot improve generalization.
Abstract: We study the effects of adding noise to the inputs, outputs, weight connections, and weight changes of multilayer feedforward neural networks during backpropagation training. We rigorously derive and analyze the objective functions that are minimized by the noise-affected training processes. We show that input noise and weight noise encourage the neural-network output to be a smooth function of the input or its weights, respectively. In the weak-noise limit, noise added to the output of the neural networks only changes the objective function by a constant. Hence, it cannot improve generalization. Input noise introduces penalty terms in the objective function that are related to, but distinct from, those found in the regularization approaches. Simulations have been performed on a regression and a classification problem to further substantiate our analysis. Input noise is found to be effective in improving the generalization performance for both problems. However, weight noise is found to be effective in improving the generalization performance only for the classification problem. Other forms of noise have practically no effect on generalization.
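
As a loose illustration of the input-noise setting analyzed above, here is a minimal numpy sketch of a training loop in which fresh Gaussian noise is added to the inputs at every presentation. The network size, toy data, learning rate, and noise level are arbitrary choices, not taken from the paper.

```python
# Minimal numpy sketch: backprop on a 1-hidden-layer regression net with
# Gaussian noise added to the inputs each time they are presented.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X) + 0.05 * rng.standard_normal(X.shape)

W1 = rng.standard_normal((1, 16)) * 0.5
b1 = np.zeros(16)
W2 = rng.standard_normal((16, 1)) * 0.5
b2 = np.zeros(1)
lr, sigma = 0.05, 0.1                       # learning rate, input-noise std

for epoch in range(500):
    Xn = X + sigma * rng.standard_normal(X.shape)   # fresh input noise each pass
    H = np.tanh(Xn @ W1 + b1)
    pred = H @ W2 + b2
    err = pred - y                                   # dL/dpred for 0.5*MSE
    # backpropagate through the noisy forward pass
    gW2 = H.T @ err / len(X);  gb2 = err.mean(0)
    dH = (err @ W2.T) * (1 - H ** 2)
    gW1 = Xn.T @ dH / len(X);  gb1 = dH.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
```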

465 citations


Journal ArticleDOI
TL;DR: In this paper, an extension of the concept of quantiles in multidimensions that uses the geometry of multivariate data clouds has been considered, based on blending and generalization of the key ideas used in the construction of spatial median and regression quantiles, both of which have been extensively studied in the literature.
Abstract: An extension of the concept of quantiles in multidimensions that uses the geometry of multivariate data clouds has been considered. The approach is based on blending as well as generalization of the key ideas used in the construction of spatial median and regression quantiles, both of which have been extensively studied in the literature. These geometric quantiles are potentially useful in constructing trimmed multivariate means as well as many other L estimates of multivariate location, and they lead to a directional notion of central and extreme points in a multidimensional setup. Such quantiles can be defined as meaningful and natural objects even in infinite-dimensional Hilbert and Banach spaces, and they yield an effective generalization of quantile regression in multiresponse linear model problems. Desirable equivariance properties are shown to hold for these multivariate quantiles, and issues related to their computation for data in finite-dimensional spaces are discussed. n^{1/2}-consistenc...
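
A hedged sketch of how such a geometric quantile can be computed numerically, assuming the standard formulation in which the quantile indexed by a vector u with ||u|| < 1 minimizes the sum of ||x_i − q|| + ⟨u, x_i − q⟩ over q; the data and the choice of u are purely illustrative.

```python
# Sketch of a geometric (spatial) quantile: for ||u|| < 1, minimize
#   sum_i ( ||x_i - q|| + <u, x_i - q> )  over q.
# u = 0 gives the spatial median.
import numpy as np
from scipy.optimize import minimize

def geometric_quantile(X, u):
    def objective(q):
        diffs = X - q
        return np.sum(np.linalg.norm(diffs, axis=1) + diffs @ u)
    q0 = X.mean(axis=0)                      # start from the coordinatewise mean
    return minimize(objective, q0, method="Nelder-Mead").x

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
print(geometric_quantile(X, np.zeros(2)))         # spatial median
print(geometric_quantile(X, np.array([0.8, 0])))  # an "extreme" quantile along x1
```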

411 citations


Book ChapterDOI
16 Jul 1996
TL;DR: This work presents a method of incorporating prior knowledge about transformation invariances by applying transformations to support vectors, the training examples most critical for determining the classification boundary.
Abstract: Developed only recently, support vector learning machines achieve high generalization ability by minimizing a bound on the expected test error; however, so far there existed no way of adding knowledge about invariances of a classification problem at hand. We present a method of incorporating prior knowledge about transformation invariances by applying transformations to support vectors, the training examples most critical for determining the classification boundary.
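
A hedged scikit-learn sketch of the idea: fit an SVM, apply an assumed invariance transformation (here, one-pixel horizontal shifts of small images) to the support vectors only, and retrain on the augmented set. The dataset, kernel parameters, and shift transform are placeholders, not the paper's exact setup.

```python
# Sketch of the virtual-support-vector idea with scikit-learn: train an SVM,
# transform only its support vectors with a presumed invariance (toy pixel
# shifts that wrap around), then retrain on the union of original and
# transformed examples.
import numpy as np
from sklearn import datasets, svm

digits = datasets.load_digits()
X, y = digits.data, digits.target            # 8x8 images flattened to 64 dims

clf = svm.SVC(kernel="rbf", gamma=0.001, C=10.0).fit(X, y)

def shift(imgs, d):
    """Shift flattened 8x8 images d pixels horizontally (toy invariance)."""
    return np.roll(imgs.reshape(-1, 8, 8), d, axis=2).reshape(-1, 64)

sv, sv_y = clf.support_vectors_, y[clf.support_]
X_aug = np.vstack([X, shift(sv, 1), shift(sv, -1)])
y_aug = np.concatenate([y, sv_y, sv_y])

clf_vsv = svm.SVC(kernel="rbf", gamma=0.001, C=10.0).fit(X_aug, y_aug)
```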

315 citations


Proceedings Article
03 Dec 1996
TL;DR: This paper shows that if a large neural network is used for a pattern classification problem, and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights.
Abstract: This paper shows that if a large neural network is used for a pattern classification problem, and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. More specifically, consider an l-layer feed-forward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A. The misclassification probability converges to an error estimate (that is closely related to squared error on the training set) at rate O((cA)^{l(l+1)/2} √((log n)/m)), ignoring log factors, where m is the number of training patterns, n is the input dimension, and c is a constant. This may explain the generalization performance of neural networks, particularly when the number of training examples is considerably smaller than the number of weights. It also supports heuristics (such as weight decay and early stopping) that attempt to keep the weights small during training.

279 citations


Journal ArticleDOI
TL;DR: The results of the only paper published so far on translation invariance are revised and generalized by allowing inputs and outputs to take not only zero but also negative values, broadening the field of application of the DEA methodology.
Abstract: In this paper, we undertake a revision and a generalization of the results contained in the only paper published so far on the matter of translation invariance, by allowing inputs and outputs to take not only zero but also negative values. This broadens the field of application of the DEA methodology.

231 citations


Journal ArticleDOI
TL;DR: Contrary to the traditional view, it is found that infants this age have generalized the properties of drinking and sleeping throughout the animal domain, and the properties of "being keyed" and "giving a ride" throughout the vehicle domain.

227 citations


Journal ArticleDOI
TL;DR: The authors develop a self-contained theory for linear estimation in Krein spaces based on simple concepts such as projections and matrix factorizations and leads to an interesting connection between Krein space projection and the recursive computation of the stationary points of certain second-order (or quadratic) forms.
Abstract: The authors develop a self-contained theory for linear estimation in Krein spaces. The derivation is based on simple concepts such as projections and matrix factorizations and leads to an interesting connection between Krein space projection and the recursive computation of the stationary points of certain second-order (or quadratic) forms. The authors use the innovations process to obtain a general recursive linear estimation algorithm. When specialized to a state-space structure, the algorithm yields a Krein space generalization of the celebrated Kalman filter with applications in several areas such as H∞ filtering and control, game problems, risk sensitive control, and adaptive filtering.
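
The Krein-space recursions specialize to the familiar Kalman filter in the Hilbert-space case; as a point of reference, here is a minimal numpy sketch of that special case. The state-space matrices are arbitrary toy choices, not from the paper.

```python
# Minimal numpy sketch of the standard Kalman filter, i.e. the Hilbert-space
# special case that the Krein-space recursions above generalize.
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])               # measurement matrix
Q = 0.01 * np.eye(2)                     # process-noise covariance
R = np.array([[0.25]])                   # measurement-noise covariance

def kalman_step(x, P, z):
    # predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # innovations update
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

x, P = np.zeros(2), np.eye(2)
for z in [np.array([0.9]), np.array([2.1]), np.array([3.0])]:
    x, P = kalman_step(x, P, z)
```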

208 citations


Journal ArticleDOI
TL;DR: The step-by-step approach suggested here provides structured guidance to validators of educational assessments, depicting assessment as a chain of eight linked stages; the chain model suggests that validity is limited by the weakest link, and that efforts to make other links particularly strong may be wasteful or even harmful.
Abstract: Validity is the most important quality of an assessment, but its evaluation is often neglected. The step‐by‐step approach suggested here provides structured guidance to validators of educational assessments. Assessment is depicted as a chain of eight linked stages: administration, scoring, aggregation, generalization, extrapolation, evaluation, decision and impact. Evaluating validity requires careful consideration of threats to validity associated with each link. Several threats are described and exemplified for each link. These sets of threats are intended to be illustrative rather than comprehensive. The chain model suggests that validity is limited by the weakest link, and that efforts to make other links particularly strong may be wasteful or even harmful. The chain model and list of threats is also shown to be valuable when planning assessments.

208 citations


Journal ArticleDOI
TL;DR: Advantages of this approach include its inherent ability for one-class generalization, freedom from characterizing the non-target class, and the ability to form closed decision boundaries for multi-modal classes that are more complex than hyperspheres without requiring inversion of large matrices.

206 citations


01 Jan 1996
Abstract: A new family of orientation parameters derived from the Euler parameters is presented. They are found by a general stereographic projection of the Euler parameter constraint surface, a four-dimensional unit sphere, onto a three-dimensional hyperplane. The resulting set of three stereographic parameters have a low degree polynomial non-linearity in the corresponding kinematic equations and direction cosine matrix parameterization. The stereographic parameters are not unique, but have a corresponding set of “shadow” parameters. These “shadow” parameters are distinct, yet represent the same physical orientation. Using the original stereographic parameters combined with judicious switching to their shadow set, it is possible to describe any rotation without encountering a singularity. The symmetric stereographic parameters are nonsingular for up to a principal rotation of ±360°. The asymmetric stereographic parameters are well suited for describing the kinematics of spinning bodies, since they only go singular when oriented at a specific angle about a specific axis. A globally regular and stable control law using symmetric stereographic parameters is presented which can bring a spinning body to rest in any desired orientation without backtracking the motion.
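
A small numpy sketch of one symmetric stereographic projection of the Euler parameters (the choice that yields the modified Rodrigues parameters) together with its shadow set; switching sets when the parameter norm exceeds one keeps the attitude description nonsingular. The example rotation is an arbitrary illustration, and this is only an assumed concrete instance of the family described above.

```python
# Sketch: symmetric stereographic parameters (modified Rodrigues parameters)
# from a unit quaternion (Euler parameters), plus their "shadow" set.
import numpy as np

def mrp_from_quaternion(beta):
    """beta = (beta0, beta1, beta2, beta3), a unit quaternion."""
    beta = np.asarray(beta, dtype=float)
    beta = beta / np.linalg.norm(beta)
    return beta[1:] / (1.0 + beta[0])          # singular only at +/-360 deg

def shadow_set(sigma):
    """Alternate parameters describing the same physical orientation."""
    return -sigma / np.dot(sigma, sigma)

theta = np.deg2rad(250.0)                       # rotation angle about the z-axis
beta = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])
sigma = mrp_from_quaternion(beta)
if np.dot(sigma, sigma) > 1.0:                  # switch before the singularity
    sigma = shadow_set(sigma)
```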

Journal ArticleDOI
TL;DR: MacKay's Bayesian framework for backpropagation is a practical and powerful means to improve the generalization ability of neural networks and is applied in the prediction of fat content in minced meat from near infrared spectra.
Abstract: MacKay's (1992) Bayesian framework for backpropagation is a practical and powerful means to improve the generalization ability of neural networks. It is based on a Gaussian approximation to the posterior weight distribution. The framework is extended, reviewed, and demonstrated in a pedagogical way. The notation is simplified using the ordinary weight decay parameter, and a detailed and explicit procedure for adjusting several weight decay parameters is given. Bayesian backprop is applied in the prediction of fat content in minced meat from near infrared spectra. It outperforms "early stopping" as well as quadratic regression. The evidence of a committee of differently trained networks is computed, and the corresponding improved generalization is verified. The error bars on the predictions of the fat content are computed. There are three contributors: The random noise, the uncertainty in the weights, and the deviation among the committee members. The Bayesian framework is compared to Moody's GPE (1992). Finally, MacKay and Neal's automatic relevance determination, in which the weight decay parameters depend on the input number, is applied to the data with improved results.
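
As a rough illustration of the evidence-style re-estimation of a weight decay parameter, here is a hedged sketch for a linear-in-parameters model, where the Hessian needed by the Gaussian approximation is exact; a neural network would use an (approximate) Hessian of the training error instead. The basis functions, data, and initial hyperparameter values are invented.

```python
# Hedged sketch of evidence-framework re-estimation of weight decay alpha and
# noise precision beta, shown for an RBF regression model with exact Hessian.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 60)
t = np.sin(3 * x) + 0.1 * rng.standard_normal(60)
Phi = np.vstack([np.exp(-(x - c) ** 2 / 0.1) for c in np.linspace(-1, 1, 12)]).T

alpha, beta = 1.0, 50.0                       # weight decay and noise precision
for _ in range(30):
    A = beta * Phi.T @ Phi + alpha * np.eye(Phi.shape[1])   # posterior Hessian
    w = beta * np.linalg.solve(A, Phi.T @ t)                # posterior mean weights
    lam = beta * np.linalg.eigvalsh(Phi.T @ Phi)
    gamma = np.sum(lam / (lam + alpha))                     # effective no. of parameters
    E_W = 0.5 * w @ w
    E_D = 0.5 * np.sum((t - Phi @ w) ** 2)
    alpha = gamma / (2 * E_W)                               # re-estimate weight decay
    beta = (len(t) - gamma) / (2 * E_D)                     # re-estimate noise level
```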

Journal ArticleDOI
TL;DR: A pragmatic framework for comparisons between various methods is described, and a detailed comparison study comprising several thousand individual experiments is presented, which provides some insights on applicability of various methods.
Abstract: The problem of estimating an unknown function from a finite number of noisy data points has fundamental importance for many applications. This problem has been studied in statistics, applied mathematics, engineering, artificial intelligence, and, more recently, in the fields of artificial neural networks, fuzzy systems, and genetic optimization. In spite of many papers describing individual methods, very little is known about the comparative predictive (generalization) performance of various methods. We discuss subjective and objective factors contributing to the difficult problem of meaningful comparisons. We also describe a pragmatic framework for comparisons between various methods, and present a detailed comparison study comprising several thousand individual experiments. Our approach to comparisons is biased toward general (nonexpert) users. Our study uses six representative methods described using a common taxonomy. Comparisons performed on artificial data sets provide some insights on applicability of various methods. No single method proved to be the best, since a method's performance depends significantly on the type of the target function, and on the properties of training data.

Journal ArticleDOI
TL;DR: This article shows that the generalization error can be decomposed into two terms: the approximation error, due to the insufficient representational capacity of a finite sized network, and the estimation error,due to insufficient information about the target function because of the finite number of samples.
Abstract: Feedforward networks together with their training algorithms are a class of regression techniques that can be used to learn to perform some task from a set of examples. The question of generalization of network performance from a finite training set to unseen data is clearly of crucial importance. In this article we first show that the generalization error can be decomposed into two terms: the approximation error, due to the insufficient representational capacity of a finite sized network, and the estimation error, due to insufficient information about the target function because of the finite number of samples. We then consider the problem of learning functions belonging to certain Sobolev spaces with gaussian radial basis functions. Using the above-mentioned decomposition we bound the generalization error in terms of the number of basis functions and number of examples. While the bound that we derive is specific for radial basis functions, a number of observations deriving from it apply to any approximation technique. Our result also sheds light on ways to choose an appropriate network architecture for a particular problem and the kinds of problems that can be effectively solved with finite resources, i.e., with a finite number of parameters and finite amounts of data.
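
Schematically, and with notation assumed here for illustration (f_0 the target function, f_n the best network with n basis functions, and \hat f_{n,m} the network estimated from m samples), the decomposition reads

\[
\|f_0 - \hat f_{n,m}\|^2 \;\le\; 2\,\underbrace{\|f_0 - f_n\|^2}_{\text{approximation error}} \;+\; 2\,\underbrace{\|f_n - \hat f_{n,m}\|^2}_{\text{estimation error}},
\]

and the paper bounds the two terms for gaussian radial basis functions in terms of the number of basis functions and the number of examples, respectively.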

Journal ArticleDOI
TL;DR: This work applies the criterion of minimal mutual information to the real world problem of electrical motor fault detection treated as a novelty detection task, and generalizes to nonlinear transformations by only demanding perfect transmission of information.
Abstract: According to Barlow (1989), feature extraction can be understood as finding a statistically independent representation of the probability distribution underlying the measured signals. The search for a statistically independent representation can be formulated by the criterion of minimal mutual information, which reduces to decorrelation in the case of gaussian distributions. If nongaussian distributions are to be considered, minimal mutual information is the appropriate generalization of decorrelation as used in linear Principal Component Analyses (PCA). We also generalize to nonlinear transformations by only demanding perfect transmission of information. This leads to a general class of nonlinear transformations, namely symplectic maps. Conservation of information allows us to consider only the statistics of single coordinates. The resulting factorial representation of the joint probability distribution gives a density estimation. We apply this concept to the real world problem of electrical motor fault detection treated as a novelty detection task.


Book ChapterDOI
01 Jan 1996
TL;DR: In this paper, a teaching experiment using generalization activities is presented, and two generalizing activities are described in some detail, looking at the behavior of adults in the experimental group in the light of research results of high school students on tests and interviews involving the same activities.
Abstract: Considering algebra as a culture, this chapter looks at the introduction of algebra as an initiation process where generalization activities can be extremely effective. After a reflection on my own immersion into algebra and the evolution of attitudes toward the teaching of algebra, a teaching experiment using generalization activities is presented. Two generalizing activities are described in some detail, looking at the behavior of adults in the experimental group in the light of research results of high school students on tests and interviews involving the same activities. The paper concludes with a “cultural” reflection on the teaching experiment and a more general consideration of the role of generalization in the introduction of algebra.

Journal ArticleDOI
TL;DR: Theoretical results show that applying a controlled amount of noise during training may improve convergence and generalization performance, and it is predicted that best overall performance can be achieved by injecting additive noise at each time step.
Abstract: Concerns the effect of noise on the performance of feedforward neural nets. We introduce and analyze various methods of injecting synaptic noise into dynamically driven recurrent nets during training. Theoretical results show that applying a controlled amount of noise during training may improve convergence and generalization performance. We analyze the effects of various noise parameters and predict that best overall performance can be achieved by injecting additive noise at each time step. Noise contributes a second-order gradient term to the error function which can be viewed as an anticipatory agent to aid convergence. This term appears to find promising regions of weight space in the beginning stages of training when the training error is large and should improve convergence on error surfaces with local minima. The first-order term is a regularization term that can improve generalization. Specifically, it can encourage internal representations where the state nodes operate in the saturated regions of the sigmoid discriminant function. While this effect can improve performance on automata inference problems with binary inputs and target outputs, it is unclear what effect it will have on other types of problems. To substantiate these predictions, we present simulations on learning the dual parity grammar from temporal strings for all noise models, and present simulations on learning a randomly generated six-state grammar using the predicted best noise model.

Journal ArticleDOI
TL;DR: This study proposes two analytical approximations of the optimum threshold of clade selection to interpret (i.e., reduce) the bootstrap tree and investigates error measures that stem from a generalization of Robinson and Foulds’ (1981) distance, used to quantify the divergence between the true phylogeny and the estimated trees.
Abstract: In this study we address the problem of interpreting a bootstrap tree. The main issue is choosing the threshold of clade selection in order to separate reliable clades from unreliable ones, depending on their bootstrap proportion. This threshold depends on the chosen error measure. We investigate error measures that stem from a generalization of Robinson and Foulds’ (1981) distance, used to quantify the divergence between the true phylogeny and the estimated trees. We propose two analytical approximations of the optimum threshold of clade selection to interpret (i.e., reduce) the bootstrap tree. We performed extensive simulations along the lines of Kuhner and Felsenstein (1994) using the neighbor-joining and the maximum-parsimony methods. These simulations show that our approximations cause only small losses in quality when compared to the optimum threshold resulting from empirical observation. Next, we measured the error reduction achieved when estimating the true phylogeny by the properly reduced bootstrap tree rather than by the complete original tree, obtained with a classical tree-building method. Our simulations on short sequences show that an error reduction of 39% is achieved with the parsimony method and an error reduction of 33% is achieved with the distance method when the error is measured with the standard Robinson and Foulds distance. The observed error reduction is shown to originate from an important decrease in Type I error (wrong inferences), while Type II error (omitted correct clades) is only slightly increased. Greater error reduction is achieved when shorter sequences are used, and when more importance is given to Type I error than to Type II error. To investigate the causes of error from another point of view, we propose a general decomposition of the error expectation in two terms of bias, and one of variance. Results for these terms show that no fundamental bias is introduced by the bootstrap process, the only source of bias being structural (lack of resolution). Moreover, the variance in the estimations is greatly reduced, providing another explanation for the better results of the reduced bootstrap tree compared with the original tree estimate.
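
As a toy illustration of the thresholding step only (not of the paper's error analysis), reducing a bootstrap tree amounts to keeping the clades whose bootstrap proportion exceeds the chosen threshold. The clades and proportions below are invented.

```python
# Toy sketch of "reducing" a bootstrap tree: keep only clades whose bootstrap
# proportion exceeds a chosen threshold. Clades are frozensets of taxon labels.
clades = {
    frozenset({"A", "B"}): 0.97,
    frozenset({"A", "B", "C"}): 0.62,
    frozenset({"D", "E"}): 0.41,
}
threshold = 0.5     # the paper studies how this value should be chosen
reduced = {clade: p for clade, p in clades.items() if p >= threshold}
print(sorted(reduced.items(), key=lambda kv: -kv[1]))
```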

Journal ArticleDOI
TL;DR: In this paper, the authors established sharp capacitary estimates for Carnot-Carathéodory rings associated to a system of vector fields of Hörmander type, which are instrumental to the study of the local behavior of singular solutions of a wide class of nonlinear subelliptic equations.
Abstract: We establish sharp capacitary estimates for Carnot-Carathéodory rings associated to a system of vector fields of Hörmander type. Such estimates are instrumental to the study of the local behavior of singular solutions of a wide class of nonlinear subelliptic equations. One of the main results is a generalization of fundamental estimates obtained independently by Sánchez-Calle and by Nagel, Stein and Wainger.


Journal ArticleDOI
R.M. de Jong
TL;DR: The consistent model specification test proposed by Bierens is generalized to the framework of time series, and a simulation procedure capable of establishing asymptotically valid critical values for such a test is described.


Journal ArticleDOI
TL;DR: The algorithm does not use any Big-M initial point and achieves O(√(nL))-iteration complexity, where n and L are the number of variables and the length of the data of the LP problem, and it detects LP infeasibility based on a provable criterion.
Abstract: We present a simplification and generalization of the recent homogeneous and self-dual linear programming (LP) algorithm. The algorithm does not use any Big-M initial point and achieves O(√(nL))-iteration complexity, where n and L are the number of variables and the length of the data of the LP problem. It also detects LP infeasibility based on a provable criterion. Its preliminary implementation with a simple predictor and corrector technique results in an efficient computer code in practice. In contrast to other interior-point methods, our code solves NETLIB problems, feasible or infeasible, starting simply from x = e (primal variables), y = 0 (dual variables), z = e (dual slack variables), where e is the vector of all ones. We describe our computational experience in solving these problems, and compare our results with OB1.60, a state-of-the-art implementation of interior-point algorithms.

Journal ArticleDOI
TL;DR: In this paper, a travelling solitary wave solution to the generalized KdV equation is presented, and its two-dimensional generalization is discussed in terms of a travelling solitary wave.


Journal ArticleDOI
TL;DR: The main contribution of the paper is the development of Algorithm GenCom (Generalization for Commonality extraction) that makes use of concept generalization to effectively derive many meaningful commonalities that cannot be found otherwise.
Abstract: Studies two spatial knowledge discovery problems involving proximity relationships between clusters and features. The first problem is: given a cluster of points, how can we efficiently find features (represented as polygons) that are closest to the majority of points in the cluster? We measure proximity in an aggregate sense due to the nonuniform distribution of points in a cluster (e.g. houses on a map), and the different shapes and sizes of features (e.g. natural or man-made geographic features). The second problem is: given n clusters of points, how can we extract the aggregate proximity commonalities (i.e. features) that apply to most, if not all, of the n clusters? Regarding the first problem, the main contribution of the paper is the development of Algorithm CRH (Circle, Rectangle and Hull), which uses geometric approximations (i.e. encompassing circles, isothetic rectangles and convex hulls) to filter and select features. The highly scalable and incremental Algorithm CRH can examine over 50,000 features and their spatial relationships with a given cluster in approximately one second of CPU time. Regarding the second problem, the key contribution is the development of Algorithm GenCom (Generalization for Commonality extraction) that makes use of concept generalization to effectively derive many meaningful commonalities that cannot be found otherwise.
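
A hedged shapely sketch of the filter-and-rank idea: screen candidate feature polygons with progressively tighter shapes built around the cluster (an encompassing circle, the isothetic rectangle, the convex hull), then rank the survivors by their average distance to the cluster points. The coordinates, buffer sizes, and feature polygons are invented, and this is only a loose illustration of the multi-filter approach, not the paper's Algorithm CRH.

```python
# Loose sketch of circle/rectangle/hull filtering plus aggregate-proximity
# ranking of candidate feature polygons around a cluster of points.
import numpy as np
from shapely.geometry import MultiPoint, Point, box

rng = np.random.default_rng(2)
cluster = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(200, 2))      # e.g. houses
features = [box(4.0, 4.0, 4.5, 4.5), box(8.0, 8.0, 9.0, 9.0), box(5.5, 5.0, 6.0, 5.4)]

pts = MultiPoint([tuple(p) for p in cluster])
circle = pts.centroid.buffer(1.5)            # encompassing circle (coarse filter)
rect = box(*pts.bounds)                      # isothetic rectangle (finer filter)
hull = pts.convex_hull                       # convex hull (finest filter)

survivors = [f for f in features
             if circle.intersects(f)
             and rect.buffer(0.5).intersects(f)
             and hull.buffer(0.5).intersects(f)]
ranked = sorted(survivors,
                key=lambda f: np.mean([f.distance(Point(p)) for p in cluster]))
print([f.bounds for f in ranked])            # closest-in-aggregate features first
```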

Journal ArticleDOI
TL;DR: In this paper, the central elements of the universal enveloping algebra U(gl(n)) which are called quantum immanants are considered and expressed in terms of generators and differential operators on the space of matrices.
Abstract: We consider some remarkable central elements of the universal enveloping algebra U(gl(n)) which we call quantum immanants. We express them in terms of the generators $E_{ij}$ of U(gl(n)) and as differential operators on the space of matrices. These expressions are a direct generalization of the classical Capelli identities. They result in many nontrivial properties of quantum immanants.

Posted Content
TL;DR: In this article, the central elements of the universal enveloping algebra of the general linear algebra which are called quantum immanants are considered and expressed in terms of generators and differential operators on the space of matrices.
Abstract: We consider remarkable central elements of the universal enveloping algebra of the general linear algebra which we call quantum immanants. We express them in terms of generators $E_{ij}$ and as differential operators on the space of matrices. These expressions are a direct generalization of the classical Capelli identities. They result in many nontrivial properties of quantum immanants.

Posted Content
TL;DR: In this paper, the authors studied the optimal cutting strategy for an ongoing forest, using stochastic impulse control, and showed how Faustmann's formula can be generalized to growing forests.
Abstract: In the present paper we study the optimal cutting strategy { τ 1 , τ 2 , …} for an ongoing forest. By using stochastic impulse control we show how Faustmann's formula can be generalized to stochastic growing forests. The paper extends and clarifies previous studies by Miller and Voltaire (1983), Clarke and Reed (1989), and Reed and Clarke (1990).
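
For reference, the deterministic Faustmann formula being generalized can be written (with notation assumed here: Q(T) the timber volume at rotation age T, p the timber price, C the regeneration cost, and r the discount rate) as

\[
V \;=\; \max_{T>0}\; \frac{p\,Q(T)\,e^{-rT} - C}{1 - e^{-rT}},
\]

the land value of an infinite sequence of identical rotations; the paper replaces the single fixed rotation age by an optimal sequence of stopping times {τ₁, τ₂, …} for a stochastically growing forest.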