
Showing papers on "Generalization" published in 1989


Book
01 Jul 1989
TL;DR: CBR tends to be a good approach for rich, complex domains in which there are myriad ways to generalize a case; at first glance it may seem similar to the rule-induction algorithms of machine learning, but it delays generalization until the target problem is known.
Abstract: Case-based reasoning, broadly construed, is the process of solving new problems based on the solutions of similar past problems. An auto mechanic who fixes an engine by recalling another car that exhibited similar symptoms is using case-based reasoning. A lawyer who advocates a particular outcome in a trial based on legal precedents is using case-based reasoning. It has been argued that case-based reasoning is not only a powerful method for computer reasoning, but also a pervasive behavior in everyday human problem solving. Case-based reasoning (CBR) has been formalized as a four-step process:

1. Retrieve: Given a target problem, retrieve cases from memory that are relevant to solving it. A case consists of a problem, its solution, and, typically, annotations about how the solution was derived. For example, suppose Fred wants to prepare blueberry pancakes. Being a novice cook, the most relevant experience he can recall is one in which he successfully made plain pancakes. The procedure he followed for making the plain pancakes, together with justifications for decisions made along the way, constitutes Fred's retrieved case.

2. Reuse: Map the solution from the previous case to the target problem. This may involve adapting the solution as needed to fit the new situation. In the pancake example, Fred must adapt his retrieved solution to include the addition of blueberries.

3. Revise: Having mapped the previous solution to the target situation, test the new solution in the real world (or a simulation) and, if necessary, revise. Suppose Fred adapted his pancake solution by adding blueberries to the batter. After mixing, he discovers that the batter has turned blue -- an undesired effect. This suggests the following revision: delay the addition of blueberries until after the batter has been ladled into the pan.

4. Retain: After the solution has been successfully adapted to the target problem, store the resulting experience as a new case in memory. Fred, accordingly, records his newfound procedure for making blueberry pancakes, thereby enriching his set of stored experiences, and better preparing him for future pancake-making demands.

At first glance, CBR may seem similar to the rule-induction algorithms of machine learning. Like a rule-induction algorithm, CBR starts with a set of cases or training examples; it forms generalizations of these examples, albeit implicit ones, by identifying commonalities between a retrieved case and the target problem. For instance, when Fred mapped his procedure for plain pancakes to blueberry pancakes, he decided to use the same basic batter and frying method, thus implicitly generalizing the set of situations under which the batter and frying method can be used. The key difference, however, between the implicit generalization in CBR and the generalization in rule induction lies in when the generalization is made. A rule-induction algorithm draws its generalizations from a set of training examples before the target problem is even known; that is, it performs eager generalization. For instance, if a rule-induction algorithm were given recipes for plain pancakes, Dutch apple pancakes, and banana pancakes as its training examples, it would have to derive, at training time, a set of general rules for making all types of pancakes. It would not be until testing time that it would be given, say, the task of cooking blueberry pancakes.
The difficulty for the rule-induction algorithm is in anticipating the different directions in which it should attempt to generalize its training examples. This is in contrast to CBR, which delays (implicit) generalization of its cases until testing time -- a strategy of lazy generalization. In the pancake example, CBR has already been given the target problem of cooking blueberry pancakes; thus it can generalize its cases exactly as needed to cover this situation. CBR therefore tends to be a good approach for rich, complex domains in which there are myriad ways to generalize a case.

1,458 citations
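The four-step cycle described in the abstract above maps naturally onto code. Below is a minimal Python sketch of that retrieve/reuse/revise/retain loop; the `Case` structure and the `similarity`, `adapt`, `works`, and `revise` callables are hypothetical placeholders supplied by the caller, not the formulation of any particular CBR system.

```python
# Minimal sketch of the retrieve/reuse/revise/retain cycle described above.
# The case structure and the similarity/adapt/works/revise callables are
# illustrative placeholders, not any specific system's formulation.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Case:
    problem: dict          # e.g. {"dish": "plain pancakes"}
    solution: Any          # e.g. a list of preparation steps
    notes: str = ""        # annotations about how the solution was derived

def solve_with_cbr(target: dict,
                   case_base: list[Case],
                   similarity: Callable[[dict, dict], float],
                   adapt: Callable[[Case, dict], Any],
                   works: Callable[[Any, dict], bool],
                   revise: Callable[[Any, dict], Any]) -> Any:
    # 1. Retrieve: pick the stored case most similar to the target problem.
    retrieved = max(case_base, key=lambda c: similarity(c.problem, target))

    # 2. Reuse: map (adapt) the old solution to the new situation.
    candidate = adapt(retrieved, target)

    # 3. Revise: test the adapted solution and repair it if it fails.
    if not works(candidate, target):
        candidate = revise(candidate, target)

    # 4. Retain: store the new experience for future problems.
    case_base.append(Case(problem=target, solution=candidate,
                          notes=f"adapted from {retrieved.problem}"))
    return candidate
```

The lazy-generalization flavour shows up in step 2: nothing is generalized until a concrete `target` is in hand.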



Proceedings Article
20 Aug 1989
TL;DR: A general framework for defining nonmonotonic systems based on the notion of preferred maximal consistent subsets of the premises is presented, which subsumes David Poole's THEORIST approach to default reasoning as a particular instance.
Abstract: We present a general framework for defining nonmonotonic systems based on the notion of preferred maximal consistent subsets of the premises. This framework subsumes David Poole's THEORIST approach to default reasoning as a particular instance. A disadvantage of THEORIST is that it does not allow priorities between defaults to be represented adequately (as distinct from blocking defaults in specific situations). We therefore propose two generalizations of Poole's system: in the first generalization, several layers of possible hypotheses representing different degrees of reliability are introduced. In the second, further generalization, a partial ordering between premises is used to distinguish between more and less reliable formulas. In both approaches a formula is provable from a theory if it is possible to construct a consistent argument for it based on the most reliable hypotheses. This allows for a simple representation of priorities between defaults.

486 citations
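The layered-reliability idea in the abstract above can be sketched schematically. The snippet below is only an illustration under the assumption that `consistent` and `entails` are supplied externally (for instance by a propositional SAT solver); it is not Poole's system or the authors' exact construction, and it ignores the choice among alternative maximal consistent subsets.

```python
# Schematic sketch: build an argument from the most reliable hypotheses first.
# `consistent(formulas)` and `entails(formulas, goal)` are assumed helpers
# (e.g. backed by a SAT solver); they are not defined here.

def provable(facts, hypothesis_layers, goal, consistent, entails):
    """facts: indefeasible premises; hypothesis_layers: lists of defaults
    ordered from most to least reliable; goal: the formula to prove."""
    argument = set(facts)
    for layer in hypothesis_layers:            # most reliable layer first
        for h in layer:
            if consistent(argument | {h}):     # keep a default only if the
                argument.add(h)                # argument stays consistent
    return entails(argument, goal)
```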


Journal ArticleDOI
TL;DR: The generalization problem is defined, various approaches to generalization are summarized, the credit assignment problem is identified, and the problem of measuring generalizability, together with some solutions, is presented.
Abstract: (1989). A Generalization of . The College Mathematics Journal: Vol. 20, No. 5, pp. 416-418.

288 citations


Proceedings Article
01 Jan 1989
TL;DR: An empirical study of the relation of the number of parameters (weights) in a feedforward net to generalization performance is reported, together with the application of cross-validation techniques to prevent overfitting.
Abstract: We have done an empirical study of the relation of the number of parameters (weights) in a feedforward net to generalization performance. Two experiments are reported. In one, we use simulated data sets with well-controlled parameters, such as the signal-to-noise ratio of continuous-valued data. In the second, we train the network on vector-quantized mel cepstra from real speech samples. In each case, we use back-propagation to train the feedforward net to discriminate in a multiple class pattern classification problem. We report the results of these studies, and show the application of cross-validation techniques to prevent overfitting.

272 citations
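The overfitting-control technique named in the abstract above can be illustrated with a hold-out validation set, the simple form of cross-validation commonly used for early stopping. The NumPy sketch below trains a deliberately over-parameterized one-hidden-layer net on toy data (a stand-in for the study's simulated and speech data, not a reproduction of it) and keeps the weights with the lowest validation error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: noisy binary labels from a random linear rule.
X = rng.normal(size=(400, 10))
y = (X @ rng.normal(size=10) + 0.5 * rng.normal(size=400) > 0).astype(float)
X_tr, y_tr, X_va, y_va = X[:300], y[:300], X[300:], y[300:]

def forward(W, X):
    h = np.tanh(X @ W[0])
    p = 1.0 / (1.0 + np.exp(-(h @ W[1])))        # sigmoid output unit
    return h, p

def mse(W, X, y):
    return float(np.mean((forward(W, X)[1].ravel() - y) ** 2))

# Deliberately over-parameterized: 10 inputs, 20 hidden units.
W = [rng.normal(scale=0.1, size=(10, 20)), rng.normal(scale=0.1, size=(20, 1))]
best_W, best_va, patience = [w.copy() for w in W], np.inf, 0

for epoch in range(5000):
    h, p = forward(W, X_tr)
    # Backpropagation for the squared-error loss.
    g_out = (p - y_tr[:, None]) * p * (1.0 - p)      # output-layer delta
    g_hid = (g_out @ W[1].T) * (1.0 - h ** 2)        # hidden-layer delta
    W[1] -= 0.5 * (h.T @ g_out) / len(X_tr)
    W[0] -= 0.5 * (X_tr.T @ g_hid) / len(X_tr)

    va = mse(W, X_va, y_va)
    if va < best_va:                 # validation error still improving
        best_W, best_va, patience = [w.copy() for w in W], va, 0
    else:                            # otherwise count towards early stopping
        patience += 1
        if patience > 100:
            break

print("train MSE:", mse(best_W, X_tr, y_tr), "validation MSE:", best_va)
```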


Proceedings Article
20 Aug 1989
TL;DR: A definition of feature construction in concept learning is presented, and a framework for its study is offered based on four aspects: detection, selection, generalization, and evaluation.
Abstract: Selective induction techniques perform poorly when the features are inappropriate for the target concept. One solution is to have the learning system construct new features automatically; unfortunately, feature construction is a difficult and poorly understood problem. In this paper we present a definition of feature construction in concept learning, and offer a framework for its study based on four aspects: detection, selection, generalization, and evaluation. This framework is used in the analysis of existing learning systems and as the basis for the design of a new system, CITRE. CITRE performs feature construction using decision trees and simple domain knowledge as constructive biases. Initial results on a set of spatial-dependent problems suggest the importance of domain knowledge and feature generalization, i.e., constructive induction.

217 citations
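As a toy illustration of the detect/select/evaluate loop named in the framework above (the feature-generalization aspect and CITRE's decision-tree machinery are omitted), the sketch below constructs conjunctions of existing Boolean features and keeps those that separate the classes better than either parent feature. The scoring rule and data layout are placeholders.

```python
# Toy constructive-induction loop: detect candidate feature pairs, construct
# their conjunctions, and keep those that match the labels better than either
# parent feature. An illustration only, not CITRE's decision-tree method.
from itertools import combinations

def score(col, labels):
    # Fraction of examples where the Boolean feature value equals the label.
    return sum(int(f == y) for f, y in zip(col, labels)) / len(labels)

def construct_features(examples, labels):
    """examples: list of dicts of Boolean features; labels: list of 0/1 ints."""
    names = sorted(examples[0])
    new = {}
    for a, b in combinations(names, 2):                      # detection
        col = [int(ex[a] and ex[b]) for ex in examples]      # construction
        base = max(score([int(ex[a]) for ex in examples], labels),
                   score([int(ex[b]) for ex in examples], labels))
        if score(col, labels) > base:                        # selection/evaluation
            new[f"{a}_and_{b}"] = col
    return new
```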


Journal ArticleDOI
TL;DR: In this paper, a unified approach to the asymptotic theory of alternative test criteria for testing parametric restrictions is provided; the discussion develops within a general framework that distinguishes whether or not the fitting function is asymptotically optimal, and allows the null and alternative hypotheses to be only approximations of the true model.
Abstract: In the context of covariance structure analysis, a unified approach to the asymptotic theory of alternative test criteria for testing parametric restrictions is provided. The discussion develops within a general framework that distinguishes whether or not the fitting function is asymptotically optimal, and allows the null and alternative hypotheses to be only approximations of the true model. Also, the equivalent of the information matrix, and the asymptotic covariance matrix of the vector of summary statistics, are allowed to be singular. When the fitting function is not asymptotically optimal, test statistics which have asymptotically a chi-square distribution are developed as a natural generalization of more classical ones. Issues relevant for power analysis, and the asymptotic theory of a testing related statistic, are also investigated.

209 citations


Journal ArticleDOI
TL;DR: In this paper, an analog of the spectral analysis of time series is developed for data in general spaces, applied to data from an election in which 5738 people rank ordered five candidates.
Abstract: An analog of the spectral analysis of time series is developed for data in general spaces. This is applied to data from an election in which 5738 people rank ordered five candidates. Group theoretic considerations offer an analysis of variance like decomposition which seems natural and fruitful. A variety of inferential tools are suggested. The spectral ideas are then extended to general homogeneous spaces such as the sphere.

197 citations



Proceedings ArticleDOI
Tishby, Levin, Solla
01 Jan 1989
TL;DR: The problem of learning a general input-output relation using a layered neural network is discussed in a statistical framework and the authors arrive at a Gibbs distribution on a canonical ensemble of networks with the same architecture.
Abstract: The problem of learning a general input-output relation using a layered neural network is discussed in a statistical framework. By imposing the consistency condition that the error minimization be equivalent to a likelihood maximization for training the network, the authors arrive at a Gibbs distribution on a canonical ensemble of networks with the same architecture. This statistical description enables them to evaluate the probability of a correct prediction of an independent example, after training the network on a given training set. The prediction probability is highly correlated with the generalization ability of the network, as measured outside the training set. This suggests a general and practical criterion for training layered networks by minimizing prediction errors. The authors demonstrate the utility of this criterion for selecting the optimal architecture in the continuity problem. As a theoretical application of the statistical formalism, they discuss the question of learning curves and estimate the sufficient training size needed for correct generalization, in a simple example.

167 citations
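The canonical-ensemble picture in the abstract above assigns each network, identified with its weight vector w and training error E(w), a Gibbs probability. The display below is the generic form of such an ensemble, given for orientation only; it is not transcribed from the paper.

$$
P(\mathbf{w}) \;=\; \frac{1}{Z}\,\exp\!\big(-\beta\,E_{\mathrm{train}}(\mathbf{w})\big),
\qquad
Z \;=\; \int \! d\mathbf{w}\;\exp\!\big(-\beta\,E_{\mathrm{train}}(\mathbf{w})\big),
$$

where β plays the role of an inverse temperature, and the probability of a correct prediction on an independent example is then an average over this ensemble.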


Journal ArticleDOI
TL;DR: In this article, the authors present a model of another type of cartographic line, namely boundaries between categories or classes which occur over contiguous regions of geographic space, and focus their attention on 'natural' area-class data sets such as soil maps.
Abstract: Appropriate generalization methods for geographic data must depend upon the kind of feature being generalized. Most research on cartographic line generalization has concentrated on linear features such as coastlines, rivers and roads; however, methods for the generalization of such linear geographic features may not be appropriate for the generalization of other types of cartographic lines. In this paper, we present a model of another type of cartographic line, namely boundaries between categories or classes which occur over contiguous regions of geographic space. We focus our attention on 'natural' area-class data sets such as soil maps. In the model, such boundary lines are far more similar (mathematically) to elevation contours than they are to coastlines and rivers. Appropriate generalization methods may involve construction of surfaces representing probability of class membership, generalization or smoothing of such surfaces and 'contouring' the probabilities to find boundaries.

01 Jan 1989
TL;DR: This paper addresses the question of cartographic generalization in a digital environment by presenting a logical framework of the digital generalization process, which includes a consideration of the intrinsic objectives of why we generalize, an assessment of the situations which indicate when to generalize, and an understanding of how to generalize using spatial and attribute transformations.
Abstract: A key aspect of the mapping process cartographic generalization plays a vital role in assessing the overall utility of both computer-assisted map production systems and geographic information systems. Within the digital environment, a significant, if not the dominant, control on the graphic output is the role and effect of cartographic generalization. Unfortunately, there exists a paucity of research that addresses digital generalization in a holistic manner, looking at the interrelationships between the conditions that indicate a need for its application, the objectives or goals of the process, as well as the specific spatial and attribute transformations required to effect the changes. Given the necessary conditions for generalization in the digital domain, the display of both vector and raster data is, in part, a direct result of the application of such transformations, of their interactions between one another, and of the specific tolerances required. How then should cartographic generalization be embodied in a digital environment? This paper will address that question by presenting a logical framework of the digital generalization process which includes: a consideration of the intrinsic objectives of why we generalize; an assessment of the situations which indicate when to generalize; and an understanding of how to generalize using spatial and attribute transformations. In a recent publication, the authors examined the first of these three components. This paper focuses on the latter two areas: to examine the underlying conditions or situations when we need to generalize, and the spatial and attribute transformations that are employed to effect the changes.

Journal ArticleDOI
TL;DR: An empirical study evaluates three methods for solving the problem of identifying a correct concept definition from positive examples such that the concept is some specialization of a target concept defined by a domain theory, and concludes that the new method, induction over explanations (IOE), avoids the shortcomings of the two existing methods.
Abstract: This paper formalizes a new learning-from-examples problem: identifying a correct concept definition from positive examples such that the concept is some specialization of a target concept defined by a domain theory. It describes an empirical study that evaluates three methods for solving this problem: explanation-based generalization (EBG), multiple example explanation-based generalization (mEBG), and a new method, induction over explanations (IOE). The study demonstrates that the two existing methods (EBG and mEBG) exhibit two shortcomings: (a) they rarely identify the correct definition, and (b) they are brittle in that their success depends greatly on the choice of encoding of the domain theory rules. The study demonstrates that the new method, IOE, does not exhibit these shortcomings. This method applies the domain theory to construct explanations from multiple training examples as in mEBG, but forms the concept definition by employing a similarity-based generalization policy over the explanations. IOE has the advantage that an explicit domain theory can be exploited to aid the learning process, the dependence on the initial encoding of the domain theory is significantly reduced, and the correct concepts can be learned from few examples. The study evaluates the methods in the context of an implemented system, called Wyl2, which learns a variety of concepts in chess including “skewer” and “knight-fork.”


Journal ArticleDOI
TL;DR: The formulation of this generalization includes a unified presentation of the optimality conditions, the Lagrangian multipliers, and the resizing and scaling algorithms in terms of the sensitivity derivatives of the constraint and objective functions.
Abstract: This paper presents a generalization of what is frequently referred to in the literature as the optimality criteria approach in structural optimization. This generalization includes a unified presentation of the optimality conditions, the Lagrangian multipliers, and the resizing and scaling algorithms in terms of the sensitivity derivatives of the constraint and objective functions. The by-product of this generalization is the derivation of a set of simple nondimensional parameters which provides significant insight into the behavior of the structure as well as the optimization algorithm. A number of important issues, such as, active and passive variables, constraints and three types of linking are discussed in the context of the present derivation of the optimality criteria approach. The formulation as presented in this paper brings multidisciplinary optimization within the purview of this extremely efficient optimality criteria approach.
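For orientation, a textbook rendering of the optimality criteria idea summarized above (generic notation, not the paper's): stationarity of the Lagrangian ties the sensitivity derivatives of the objective f and the active constraints g_k together, and the condition is typically enforced by a fixed-point resizing rule such as

$$
\frac{\partial f}{\partial x_j} + \sum_k \lambda_k \frac{\partial g_k}{\partial x_j} = 0
\quad\Longrightarrow\quad
x_j^{(\nu+1)} = x_j^{(\nu)} \left( \frac{-\sum_k \lambda_k\, \partial g_k/\partial x_j}{\partial f/\partial x_j} \right)^{1/\eta},
$$

where the λ_k are Lagrange multipliers and η is a relaxation (damping) exponent; scaling and the handling of passive variables are layered on top of this recurrence.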

Journal ArticleDOI
TL;DR: A generalization of Dotson's fixed-point theorem and its application to invariant approximation are presented in this paper.

Journal ArticleDOI
TL;DR: A conceptual model, based on a sequential set of five procedures, or transformations, for the processing of linear digital data is proposed, with emphasis on the geometric interaction of simplification and smoothing algorithms.
Abstract: The cartographic generalization of vector data in digital format involves six distinct processes, including simplification, smoothing, enhancement, displacement, merging, and omission. Although the research agenda has addressed each of the six elements individually, little consideration has been given to the geometric interaction of the components. This paper proposes a conceptual model, based on a sequential set of five procedures – or transformations – for the processing of linear digital data. The geometric interaction of simplification and smoothing algorithms is especially emphasized. The first process involves cleaning the digital file, whereby digitizing errors and duplicate coordinate pairs are eliminated. This is followed by a simple smoothing–normally based on weighted-averaging–designed to eliminate the 'gridding' constraints of the digitizing tablet or other encoding device. A third manipulation involves what is called database simplification where a sequential approach (such as Lang toleranci...
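The first two transformations in the sequence described above (cleaning the digital file, then a simple weighted-average smoothing) are easy to sketch; the snippet below uses placeholder tolerances and weights and makes no claim about the paper's specific algorithms.

```python
# Minimal sketch of the first two transformations described above:
# cleaning (drop duplicate coordinate pairs) and a simple three-point
# weighted-average smoothing. Weights and tolerances are placeholders.

def clean(points, tol=1e-9):
    """Remove consecutive duplicate coordinate pairs from a digitized line."""
    out = [points[0]]
    for x, y in points[1:]:
        px, py = out[-1]
        if abs(x - px) > tol or abs(y - py) > tol:
            out.append((x, y))
    return out

def smooth(points, w=(0.25, 0.5, 0.25)):
    """Weighted moving average over interior vertices; endpoints are kept."""
    out = [points[0]]
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        out.append((w[0]*x0 + w[1]*x1 + w[2]*x2,
                    w[0]*y0 + w[1]*y1 + w[2]*y2))
    out.append(points[-1])
    return out
```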

Book ChapterDOI
01 Jan 1989
TL;DR: A brief review of the history of the star-triangle equation and its generalization to the chiral Potts model can be found in this paper, where the authors discuss how the recent solutions in terms of higher-genus Riemann surfaces emerge.
Abstract: After a brief review of the history of the star-triangle equation, we shall illustrate its importance with a few results for the two-dimensional Ising model and its generalization to the chiral Potts model. We shall discuss how the recent solutions in terms of higher-genus Riemann surfaces emerge. We shall finish with some further results for the quantities of interest in these new models. More related work is presented in the talks by Profs. Baxter and McCoy.

Journal ArticleDOI
TL;DR: A class of random-search algorithms is considered that generalizes the deterministic gradient algorithm by replacing the gradient direction with the direction of a random vector uniformly distributed on the unit hypersphere.
Abstract: The paper deals with a class of random-search-algorithms representing a generalization of the deterministic gradient algorithm in such a way that the gradient direction is replaced by the direction of a random vector uniformly distributed on the unit hypersphere. It is shown that under weak assumptions on the objective function the linear convergence rate of the gradient method can be transferred to these stochastic algorithms.
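The scheme described in the abstract is direct to sketch: replace the gradient direction by a random unit direction and accept only improving steps. In the snippet below, trying both +d and -d and the step-shrinking rule are placeholder choices, not part of the paper's analysis.

```python
import numpy as np

def random_direction_search(f, x0, step=0.5, shrink=0.95, iters=2000, seed=0):
    """Gradient-free descent: draw a random unit direction each iteration and
    keep the move only if it lowers f."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(iters):
        d = rng.normal(size=x.shape)
        d /= np.linalg.norm(d)              # uniform on the unit hypersphere
        for cand in (x + step * d, x - step * d):
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                break
        else:
            step *= shrink                  # no improvement: reduce the step
    return x, fx

# Example: minimize a simple quadratic.
x_min, f_min = random_direction_search(lambda v: np.sum((v - 3.0) ** 2),
                                        x0=np.zeros(5))
```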

Journal ArticleDOI
TL;DR: The primary goal of this paper is to show how second derivative information can be used in an effective way in structural optimization problems; a primal-dual approach is employed, which can be interpreted as a sequential quadratic programming method.
Abstract: The primary goal of this paper is to show how second derivative information can be used in an effective way in structural optimization problems. The basic idea is to generate this information at the expense of only one more ‘virtual load case’ in the sensitivity analysis part of the finite element code. To achieve this goal a primal–dual approach is employed, which can also be interpreted as a sequential quadratic programming method. Another objective is to relate the proposed method to the well known family of approximation concepts techniques, where the primary optimization problem is transformed into a sequence of non-linear explicit subproblems. When restricted to diagonal second derivatives, the new approach can be viewed as a recursive convex programming method, similar to the ‘Convex Linearization’ method (CONLIN), and to its recent generalization, the ‘Method of Moving Asymptotes’ (MMA). This new method has been successfully tested on simple problems that can be solved in closed form, as well as on sizing optimization of trusses. In all cases the method converges faster than CONLIN, MMA or other approximation techniques based on reciprocal variables.
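For orientation, the kind of separable second-order model alluded to above, written generically (a plain diagonal quadratic approximation, not the paper's exact subproblem): each design cycle replaces the objective (and, analogously, the constraints) by

$$
\tilde f(\mathbf{x}) = f(\mathbf{x}^{(k)})
+ \sum_j \left.\frac{\partial f}{\partial x_j}\right|_{\mathbf{x}^{(k)}} \big(x_j - x_j^{(k)}\big)
+ \frac{1}{2}\sum_j h_j \big(x_j - x_j^{(k)}\big)^2,
$$

with h_j ≥ 0 diagonal curvature estimates, and solves the resulting explicit convex subproblem in place of the original structural problem, in the spirit of CONLIN- and MMA-type approximation concepts.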


Proceedings Article
01 Jan 1989
TL;DR: This paper describes a neural network algorithm called complementary reinforcement back-propagation (CRBP), and reports simulation results on problems designed to offer differing opportunities for generalization.
Abstract: In associative reinforcement learning, an environment generates input vectors, a learning system generates possible output vectors, and a reinforcement function computes feedback signals from the input-output pairs. The task is to discover and remember input-output pairs that generate rewards. Especially difficult cases occur when rewards are rare, since the expected time for any algorithm can grow exponentially with the size of the problem. Nonetheless, if a reinforcement function possesses regularities, and a learning algorithm exploits them, learning time can be reduced below that of non-generalizing algorithms. This paper describes a neural network algorithm called complementary reinforcement back-propagation (CRBP), and reports simulation results on problems designed to offer differing opportunities for generalization.
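The setting in the abstract (an environment emits input vectors, the learner emits output vectors, and a reinforcement function scores the pair) can be written as a short loop. The learner below is a generic reward-modulated stochastic unit used only to make the interface concrete; it is not the CRBP algorithm, and the `reinforce` callable is supplied by the caller.

```python
import numpy as np

def associative_rl(reinforce, n_inputs, n_outputs, episodes=5000, lr=0.1,
                   seed=0):
    """Generic associative reinforcement learning loop: learn which output
    bits earn reward for which inputs. Placeholder learner, not CRBP."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_inputs, n_outputs))          # preference weights
    for _ in range(episodes):
        x = rng.integers(0, 2, size=n_inputs)    # environment: input vector
        p = 1.0 / (1.0 + np.exp(-(x @ W)))       # output bit probabilities
        y = (rng.random(n_outputs) < p).astype(float)   # stochastic output
        r = reinforce(x, y)                      # feedback from the pair
        # Reward-modulated update: push towards y on reward, away otherwise.
        W += lr * (1.0 if r > 0 else -0.2) * np.outer(x, y - p)
    return W
```

For example, `associative_rl(lambda x, y: float((y == x[:3]).all()), n_inputs=6, n_outputs=3)` rewards outputs that copy the first three input bits.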


Journal ArticleDOI
TL;DR: In this paper, the free-energy expansion is given for a generalized system consisting of an arbitrary number of interacting magnetic contributions, as needed for a discussion of the type of magnetic transition in quasi-binary RE-Co2 compounds such as TbxHo1−xCo2.
Abstract: Inoue and Shimizu [1] give the free-energy expansion for a system consisting of a set of identical local (rare-earth) magnetic moments interacting with an itinerant electron subsystem (Co2). In this paper the free-energy expansion is given for a generalized system consisting of an arbitrary number of interacting magnetic contributions. This generalization is necessary for a discussion of the type of magnetic transition in quasi-binary RE-Co2 compounds such as TbxHo1−xCo2.
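A generic Landau-type expansion of the kind referred to above, with one set of coefficients per magnetic contribution and bilinear couplings between contributions, looks like the following; the coefficients and coupling structure of the paper itself are not reproduced here.

$$
F(M_1,\ldots,M_n) \;=\; \sum_i \Big(\tfrac{1}{2}a_i M_i^{2} + \tfrac{1}{4}b_i M_i^{4} + \tfrac{1}{6}c_i M_i^{6}\Big)
\;-\; \tfrac{1}{2}\sum_{i\neq j} J_{ij}\,M_i M_j ,
$$

where the M_i are the magnetizations of the interacting subsystems (for example, rare-earth sublattices and the itinerant Co subsystem); in Landau theory the signs of the quartic terms, as renormalized by the couplings, decide whether the transition is first or second order.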

Proceedings ArticleDOI
C. H. Pedersen
01 Sep 1989
TL;DR: It is argued that support of generalization in addition to specialization will improve class reusability and make it possible to create super-classes for already existing classes, thereby enabling exclusion of methods and creation of classes that describe commonalities among already existing ones.
Abstract: The arrangement of classes in a specialization hierarchy has proved to be a useful abstraction mechanism in class-based object oriented programming languages. The success of the mechanism is based on the high degree of code reuse that is offered, along with simple type conformance rules. The opposite of specialization is generalization. We will argue that support of generalization in addition to specialization will improve class reusability. A language that only supports specialization requires the class hierarchy to be constructed in a top down fashion. Support for generalization will make it possible to create super-classes for already existing classes, thereby enabling exclusion of methods and creation of classes that describe commonalities among already existing ones. We will show how generalization can coexist with specialization in class-based object oriented programming languages. Furthermore, we will verify that this can be achieved without changing the simple conformance rules or introducing new problems with name conflicts.
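The idea can be illustrated in present-day Python, with the caveat that this is not the language mechanism the paper proposes: a super-class describing the commonality of two already existing, unmodified classes is introduced after the fact, and the existing classes are registered under it.

```python
# Illustration of generalization in Python (not the paper's language design):
# a super-class capturing the commonality of two already existing, unmodified
# classes is introduced after the fact.
from abc import ABC, abstractmethod

class Circle:                       # pre-existing class
    def __init__(self, r):
        self.r = r
    def area(self):
        return 3.141592653589793 * self.r ** 2

class Square:                       # pre-existing class
    def __init__(self, s):
        self.s = s
    def area(self):
        return self.s ** 2

class Shape(ABC):                   # generalization created afterwards
    @abstractmethod
    def area(self): ...

# Register the existing classes as (virtual) subclasses of the new super-class.
Shape.register(Circle)
Shape.register(Square)

shapes = [Circle(1.0), Square(2.0)]
assert all(isinstance(s, Shape) for s in shapes)
total_area = sum(s.area() for s in shapes)
```

A language with first-class generalization, as argued for above, would additionally allow methods to be excluded when forming the super-class, which plain subclass registration cannot express.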

Proceedings ArticleDOI
23 May 1989
TL;DR: An analog of the Baum-Eagon inequality for rational functions makes it possible to use an E-M (expectation-maximization) algorithm for maximizing these functions.
Abstract: The well-known Baum-Eagon (1967) inequality provides an effective iterative scheme for finding a local maximum for homogeneous polynomials with positive coefficients over a domain of probability values. However, in a large class of statistical problems, such as those arising in speech recognition based on hidden Markov models, it was found that estimation of parameters via some other criteria that use conditional likelihood, mutual information, or the recently introduced H-criteria can give better results than maximum-likelihood estimation. These problems require finding maxima for rational functions over domains of probability values, and an analog of the Baum-Eagon inequality for rational functions makes it possible to use an E-M (expectation-maximization) algorithm for maximizing these functions. The authors describe this extension.
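For reference, the classical Baum-Eagon growth transform for a homogeneous polynomial P with nonnegative coefficients, defined over probability vectors x = (x_{ij}) with Σ_j x_{ij} = 1, is shown below; the paper's contribution is an analogous transform for rational functions, whose exact construction is not reproduced here.

$$
\bar{x}_{ij} \;=\; \frac{x_{ij}\, \partial P/\partial x_{ij}\,(x)}
{\sum_{k} x_{ik}\, \partial P/\partial x_{ik}\,(x)},
$$

and the inequality guarantees P(\bar{x}) ≥ P(x), which is what makes the iterative (E-M-style) maximization possible.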

Journal ArticleDOI
TL;DR: In this paper, a new unbiased method for correcting the variance due to sampling error is presented.
Abstract: A new unbiased method is presented for correcting the variance due to sampling error.


Journal ArticleDOI
15 Apr 1989 - EPL
TL;DR: The Hebb solution for the perceptron realization of an arbitrary linearly separable Boolean function defined on the hypercube of dimension N is investigated and the learning and generalization rates in the N → ∞ limit are calculated.
Abstract: We investigate the Hebb solution for the perceptron realization of an arbitrary linearly separable Boolean function defined on the hypercube of dimension N. We calculate the learning and generalization rates in the N → ∞ limit. They can be expressed analytically as functions of α = P/N, where P is the number of learned patterns.
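A small numerical sketch of the setup above: a teacher perceptron defines a linearly separable Boolean function on the hypercube, a student is trained with the Hebb rule on P = αN patterns, and the generalization rate is estimated on fresh patterns. Finite N only approximates the N → ∞ results, and the particular values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200                                   # input dimension
teacher = rng.normal(size=N)              # defines a linearly separable function

def hebb_generalization(alpha, n_test=2000):
    P = int(alpha * N)                                 # number of learned patterns
    X = rng.choice([-1.0, 1.0], size=(P, N))           # training patterns on the hypercube
    y = np.sign(X @ teacher)                           # teacher labels
    w = (y[:, None] * X).sum(axis=0)                   # Hebb rule: w = sum_mu y^mu x^mu
    Xt = rng.choice([-1.0, 1.0], size=(n_test, N))     # fresh test patterns
    return np.mean(np.sign(Xt @ w) == np.sign(Xt @ teacher))

for alpha in (0.5, 1.0, 2.0, 5.0):
    print(alpha, hebb_generalization(alpha))
```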

Journal ArticleDOI
TL;DR: This work proposes new rules of conditioning motivated by the work of Dubois and Prade, and shows how Jeffrey's generalization of Bayes' rule of conditioning can be reinterpreted in terms of the theory of belief functions.
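For context, Jeffrey's rule of conditioning mentioned in the TL;DR: when evidence shifts the probabilities of a partition {E_i} to new values Q(E_i) instead of making a single E_i certain, the updated probability of an event A is

$$
P_{\text{new}}(A) \;=\; \sum_i P(A \mid E_i)\, Q(E_i),
$$

which reduces to ordinary Bayesian conditioning when Q(E_k) = 1 for one cell. The paper's reinterpretation of this rule in terms of belief functions is not reproduced here.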