
Showing papers on "Generalization published in 1991"


Journal ArticleDOI
TL;DR: In this article, a generalization of the coefficient of determination R2 to general regression models is discussed, and a modification of an earlier definition to allow for discrete models is proposed.
Abstract: A generalization of the coefficient of determination R2 to general regression models is discussed. A modification of an earlier definition to allow for discrete models is proposed.
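
As a concrete illustration of the kind of likelihood-based generalization of R2 discussed here, the sketch below computes a Cox-Snell style ratio rescaled by its maximum attainable value for a logistic regression fit. The data, model and function names are assumptions made for the example, not taken from the paper.

```python
# Hedged sketch: a likelihood-based generalization of R^2 computed from the
# null-model and fitted-model log-likelihoods (Cox-Snell style ratio, rescaled
# by its maximum attainable value so a perfect model approaches 1).
import numpy as np
from scipy.optimize import minimize

def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood for a logistic regression model."""
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

def generalized_r2(X, y):
    n = len(y)
    X0 = np.ones((n, 1))                      # null model: intercept only
    b0 = minimize(lambda b: -log_likelihood(b, X0, y), np.zeros(1)).x
    ll_null = log_likelihood(b0, X0, y)
    Xf = np.column_stack([np.ones(n), X])     # full model with intercept
    bf = minimize(lambda b: -log_likelihood(b, Xf, y), np.zeros(Xf.shape[1])).x
    ll_full = log_likelihood(bf, Xf, y)
    r2 = 1.0 - np.exp((2.0 / n) * (ll_null - ll_full))   # Cox-Snell form
    r2_max = 1.0 - np.exp((2.0 / n) * ll_null)           # its upper bound
    return r2 / r2_max                                   # rescaled to [0, 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)
print(f"generalized R^2: {generalized_r2(X, y):.3f}")
```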

5,085 citations


Proceedings Article
02 Dec 1991
TL;DR: It is proven that a weight decay has two effects in a linear network, and it is shown how to extend these results to networks with hidden layers and non-linear units.
Abstract: It has been observed in numerical simulations that a weight decay can improve generalization in a feed-forward neural network. This paper explains why. It is proven that a weight decay has two effects in a linear network. First, it suppresses any irrelevant components of the weight vector by choosing the smallest vector that solves the learning problem. Second, if the size is chosen right, a weight decay can suppress some of the effects of static noise on the targets, which improves generalization quite a lot. It is then shown how to extend these results to networks with hidden layers and non-linear units. Finally the theory is confirmed by some numerical simulations using the data from NetTalk.
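
For the linear-network case the abstract describes, training with weight decay has the familiar ridge-regression closed form, which the following numpy sketch uses to show how the decay parameter shrinks the weight vector and suppresses the effect of target noise. Data dimensions, noise level and the lambda values are illustrative assumptions.

```python
# Minimal sketch of the linear case: with weight decay, the trained linear map
# is the ridge solution w(lambda) = (X^T X + lambda I)^{-1} X^T y, which
# shrinks components of w that carry little signal.
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.0, 0.5]               # only a few relevant components
y = X @ w_true + 0.3 * rng.normal(size=n)   # static noise on the targets

def ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

for lam in [0.0, 1.0, 10.0]:
    w = ridge(X, y, lam)
    X_test = rng.normal(size=(1000, d))
    test_err = np.mean((X_test @ w - X_test @ w_true) ** 2)
    print(f"lambda={lam:5.1f}  |w|={np.linalg.norm(w):.3f}  test MSE={test_err:.4f}")
```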

1,569 citations


Journal ArticleDOI
TL;DR: Fractional statistics is reformulated as a generalization of the Pauli exclusion principle, and a definition independent of the dimension of space is obtained, which is used to classify spinons in gapless spin-1/2 antiferromagnetic chains as semions.
Abstract: The concept of ``fractional statistics'' is reformulated as a generalization of the Pauli exclusion principle, and a definition independent of the dimension of space is obtained. When applied to the vortexlike quasiparticles of the fractional quantum Hall effect, it gives the same result as that based on the braid-group. It is also used to classify spinons in gapless spin-1/2 antiferromagnetic chains as semions. An extensive one-particle Hilbert-space dimension is essential, limiting fractional statistics of this type to topological excitations confined to the interior of condensed matter. The new definition does not apply to ``anyon gas'' models as currently formulated: A possible resolution of this difficulty is proposed.
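
A hedged restatement of the generalized exclusion idea in its commonly quoted form (an assumption about the standard presentation, not a quotation from this abstract): the dimension of the one-particle Hilbert space available to a species shrinks linearly as particles are added, with the statistics parameter interpolating between bosons and fermions.

```latex
% Generalized (fractional) exclusion statistics in its commonly quoted form:
% adding \Delta N_\beta particles of species \beta reduces the dimension
% d_\alpha of the one-particle Hilbert space available to species \alpha.
\Delta d_\alpha \;=\; -\sum_\beta g_{\alpha\beta}\,\Delta N_\beta,
\qquad
g_{\alpha\beta} = 0 \ \text{(bosons)}, \qquad
g_{\alpha\beta} = \delta_{\alpha\beta} \ \text{(fermions)}.
```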

830 citations


Proceedings Article
14 Jul 1991
TL;DR: It is shown that any learning algorithm implementing the MIN-FEATURES bias requires Θ(1/ε ln 1/δ + 1/ε [2^p + p ln n]) training examples to guarantee PAC-learning a concept having p relevant features out of n available features, and suggests that training data should be preprocessed to remove irrelevant features before being given to ID3 or FRINGE.
Abstract: In many domains, an appropriate inductive bias is the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This paper defines and studies this bias. First, it is shown that any learning algorithm implementing the MIN-FEATURES bias requires Θ(1/ε ln 1/δ + 1/ε [2^p + p ln n]) training examples to guarantee PAC-learning a concept having p relevant features out of n available features. This bound is only logarithmic in the number of irrelevant features. The paper also presents a quasi-polynomial time algorithm, FOCUS, which implements MIN-FEATURES. Experimental studies are presented that compare FOCUS to the ID3 and FRINGE algorithms. These experiments show that, contrary to expectations, these algorithms do not implement good approximations of MIN-FEATURES. The coverage, sample complexity, and generalization performance of FOCUS are substantially better than those of either ID3 or FRINGE on learning problems where the MIN-FEATURES bias is appropriate. This suggests that, in practical applications, training data should be preprocessed to remove irrelevant features before being given to ID3 or FRINGE.
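
The following toy sketch illustrates the MIN-FEATURES bias itself: search feature subsets in order of increasing size and keep the smallest one on which the training sample remains consistent. It is a brute-force illustration of the bias, not a faithful reimplementation of FOCUS, and the example concept is an assumption.

```python
# Toy sketch of the MIN-FEATURES bias: return the smallest feature subset on
# which no two training examples agree on the subset yet disagree on the label.
# Brute force and exponential in the subset size; for illustration only.
from itertools import combinations

def consistent(examples, subset):
    """True if no two examples collide on `subset` with different labels."""
    seen = {}
    for x, label in examples:
        key = tuple(x[i] for i in subset)
        if seen.setdefault(key, label) != label:
            return False
    return True

def min_features(examples, n_features):
    for k in range(n_features + 1):
        for subset in combinations(range(n_features), k):
            if consistent(examples, subset):
                return subset
    return tuple(range(n_features))

# Target concept: x0 AND x1, with three irrelevant features x2..x4.
examples = [((a, b, c, d, e), a & b)
            for a in (0, 1) for b in (0, 1)
            for c in (0, 1) for d in (0, 1) for e in (0, 1)]
print(min_features(examples, 5))   # expected: (0, 1)
```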

716 citations


Journal ArticleDOI
TL;DR: It is shown that a modification to the error functional allows smoothing to be introduced explicitly without significantly affecting the speed of training.
Abstract: An important feature of radial basis function neural networks is the existence of a fast, linear learning algorithm in a network capable of representing complex nonlinear mappings. Satisfactory generalization in these networks requires that the network mapping be sufficiently smooth. We show that a modification to the error functional allows smoothing to be introduced explicitly without significantly affecting the speed of training. A simple example is used to demonstrate the resulting improvement in the generalization properties of the network.
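
The sketch below shows the general setup the abstract relies on, with a plain weight-norm penalty standing in for the smoothing functional: the RBF output weights still come from a single regularized linear solve, so the extra term does not change the cost of training. Centres, widths and penalty values are illustrative assumptions.

```python
# Sketch: RBF regression whose output weights come from one regularized linear
# solve. A simple ridge penalty stands in here for the smoothing term added to
# the error functional; centres, width and lambda are arbitrary choices.
import numpy as np

def design_matrix(x, centres, width):
    return np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2 * width ** 2))

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=x.size)

centres = np.linspace(0, 1, 15)
Phi = design_matrix(x, centres, width=0.08)

for lam in [0.0, 1e-3, 1e-1]:
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(centres)), Phi.T @ y)
    x_test = np.linspace(0, 1, 200)
    y_hat = design_matrix(x_test, centres, 0.08) @ w
    err = np.mean((y_hat - np.sin(2 * np.pi * x_test)) ** 2)
    print(f"lambda={lam:g}  test MSE vs true curve = {err:.4f}")
```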

325 citations


Book
01 Jan 1991
TL;DR: Part 1 Rule base organization: design considerations for a rule based system; conceptual frameworks for geographical knowledge; knowledge engineering for generalization. Part 2 Data modelling issues: suitable representation schema for geographic information; knowledge classification and organization; object modelling and phenomenon-based generalization.
Abstract: Part 1 Rule base organization: design considerations for a rule based system; conceptual frameworks for geographical knowledge; knowledge engineering for generalization. Part 2 Data modelling issues: suitable representation schema for geographic information; knowledge classification and organization; object modelling and phenomenon-based generalization. Part 3 Formulation of rules: constraints on rule formation; rule selection for small scale map generalizations; a rule for describing feature geometry; amplified intelligence and rule based systems. Part 4 Computational and representational issues: role of interpolation in feature displacement; parallel software and computation; integration and evaluation of map generalization.

289 citations


Proceedings Article
24 Aug 1991
TL;DR: This paper describes the input generalization problem (whereby the system must generalize to produce similar actions in similar situations) and an implemented solution, the G algorithm, which is based on recursive splitting of the state space based on statistical measures of differences in reinforcements received.
Abstract: Delayed reinforcement learning is an attractive framework for the unsupervised learning of action policies for autonomous agents. Some existing delayed reinforcement learning techniques have shown promise in simple domains. However, a number of hurdles must be passed before they are applicable to realistic problems. This paper describes one such difficulty, the input generalization problem (whereby the system must generalize to produce similar actions in similar situations), and an implemented solution, the G algorithm. This algorithm is based on recursive splitting of the state space based on statistical measures of differences in reinforcements received. Connectionist backpropagation has previously been used for input generalization in reinforcement learning. We compare the two techniques analytically and empirically. The G algorithm's sound statistical basis makes it easy to predict when it should and should not work, whereas the behavior of back-propagation is unpredictable. We found that a previous successful use of backpropagation can be explained by the linearity of the application domain. We found that in another domain, G reliably found the optimal policy, whereas none of a set of runs of backpropagation with many combinations of parameters did.
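
A heavily simplified sketch of the splitting idea is given below: experience is grouped by input bits and a bit is chosen for splitting when the reinforcements observed for its two values differ by a statistically significant margin (a two-sample t statistic here). Immediate rewards replace delayed returns to keep the example short; the threshold and names are assumptions rather than details from the paper.

```python
# Heavily simplified sketch: pick the input bit whose two values lead to the
# most statistically different mean reinforcement (two-sample t statistic).
# Immediate rewards stand in for delayed returns; threshold is an assumption.
import math
import random

def t_statistic(a, b):
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        return 0.0
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    denom = math.sqrt(va / na + vb / nb) or 1e-12
    return abs(ma - mb) / denom

def split_bit(experience, n_bits, threshold=3.0):
    """Return the bit most worth splitting on, or None if none is significant."""
    best_bit, best_t = None, threshold
    for bit in range(n_bits):
        zeros = [r for s, r in experience if s[bit] == 0]
        ones = [r for s, r in experience if s[bit] == 1]
        t = t_statistic(zeros, ones)
        if t > best_t:
            best_bit, best_t = bit, t
    return best_bit

# Reinforcement depends only on bit 0; bits 1 and 2 are irrelevant.
random.seed(0)
experience = []
for _ in range(200):
    s = tuple(random.randint(0, 1) for _ in range(3))
    r = random.gauss(1.0 if s[0] else 0.0, 0.1)
    experience.append((s, r))
print(split_bit(experience, 3))   # expected: 0
```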

272 citations


Journal ArticleDOI
TL;DR: In this article, a path is a sequence of points P_0 P_1 ... P_m, m ≥ 0, where each P_i is a lattice point (that is, a point with integer coordinates) and P_(i+1), i ≥ 0, is obtained by stepping one unit east or one unit north of P_i.
Abstract: Probably the most prominent among the special integers that arise in combinatorial contexts are the binomial coefficients C(n, k). These have many uses and, often, fascinating interpretations [9]. We would like to stress one particular interpretation in terms of paths on the integral lattice in the coordinate plane, and discuss the celebrated ballot problem using this interpretation. A path is a sequence of points P_0 P_1 ... P_m, m ≥ 0, where each P_i is a lattice point (that is, a point with integer coordinates) and P_(i+1), i ≥ 0, is obtained by stepping one unit east or one unit north of P_i. We say that this is a path from P to Q if P_0 = P and P_m = Q. It is now easy to count the number of paths.
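
The path-counting fact alluded to at the end of the abstract is easy to check numerically: the number of east/north paths from (0, 0) to (a, b) is C(a + b, a). The grid sizes below are arbitrary choices for the check.

```python
# Quick check of the path-counting interpretation: the number of east/north
# lattice paths from (0, 0) to (a, b) equals the binomial coefficient
# C(a + b, a). The dynamic-programming count is compared against the
# closed form.
from math import comb

def count_paths(a, b):
    """Count monotone lattice paths from (0, 0) to (a, b) by recurrence."""
    grid = [[1] * (b + 1) for _ in range(a + 1)]
    for i in range(1, a + 1):
        for j in range(1, b + 1):
            grid[i][j] = grid[i - 1][j] + grid[i][j - 1]
    return grid[a][b]

for a, b in [(3, 2), (5, 5), (10, 4)]:
    assert count_paths(a, b) == comb(a + b, a)
    print(f"paths to ({a}, {b}) = {count_paths(a, b)}")
```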

262 citations


Journal ArticleDOI
Marlon Núñez1
TL;DR: The algorithm presented in this paper tries to generate more logical and understandable decision trees than those generated by ID3-like algorithms; it executes various types of generalization and at the same time reduces the classification cost by means of background knowledge.
Abstract: At present, algorithms of the ID3 family are not based on background knowledge. For that reason, most of the time they are neither logical nor understandable to experts. These algorithms cannot perform different types of generalization as others can do (Michalski, 1983; Kodratoff, 1983), nor can they reduce the cost of classifications. The algorithm presented in this paper tries to generate more logical and understandable decision trees than those generated by ID3-like algorithms; it executes various types of generalization and at the same time reduces the classification cost by means of background knowledge. The background knowledge contains the ISA hierarchy and the measurement cost associated with each attribute. The user can define the degrees of economy and generalization. These data will influence directly the quantity of search that the algorithm must undertake. This algorithm, which is an attribute version of the EG2 method (Nunez, 1988a, 1988b), has been implemented and the results appear in this paper comparing them with other methods.
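
The sketch below illustrates cost-sensitive attribute selection in the spirit usually attributed to EG2, trading information gain against measurement cost with an exponent w that sets the degree of economy. The exact criterion, the data and the parameter values are assumptions for illustration, not a verified reproduction of the paper's algorithm.

```python
# Hedged sketch of cost-sensitive attribute selection: the information gain of
# each attribute is traded off against its measurement cost via a criterion of
# the form (2^gain - 1) / (cost + 1)^w, with w in [0, 1] setting the economy.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    base, n, by_value = entropy(labels), len(labels), {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(label)
    return base - sum(len(sub) / n * entropy(sub) for sub in by_value.values())

def select_attribute(rows, labels, costs, w=0.5):
    def criterion(attr):
        gain = info_gain(rows, labels, attr)
        return (2 ** gain - 1) / ((costs[attr] + 1) ** w)
    return max(range(len(costs)), key=criterion)

# Attribute 0 is highly informative but expensive; attribute 1 is cheaper.
rows = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 0), (0, 1)]
labels = [0, 0, 1, 1, 1, 0]
costs = [50.0, 1.0]
print(select_attribute(rows, labels, costs, w=0.0))  # cost ignored: picks 0
print(select_attribute(rows, labels, costs, w=1.0))  # full economy: picks 1
```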

208 citations


Journal ArticleDOI
TL;DR: It is argued that the previous method of solving for x, based on the extension principle and regular fuzzy arithmetic, should be abandoned since it too often fails to produce a solution.

206 citations


Journal ArticleDOI
TL;DR: It is demonstrated that the two methods for bounding the overall properties of nonlinear composites generate precisely the same information, and hence that differences noted by Ponte Castaneda arise from comparing optimal bounds obtained from the new procedure with sub-optimal bounds obtaining from the original one.
Abstract: A new method for bounding the overall properties of nonlinear composites, proposed by Ponte Castaneda (J. Mech. Phys. Solids 39, 45, 1991), is compared with an older prescription based on a generalization to nonlinear behaviour of the Hashin-Shtrikman procedure. It is demonstrated that the two methods generate precisely the same information, and hence that differences noted by Ponte Castaneda arise from comparing optimal bounds obtained from the new procedure with sub-optimal bounds obtained from the original one. The relative advantages of either procedure are discussed.

Journal ArticleDOI
TL;DR: In this paper, the dipole intensity function and the time-constant density of RC one-port networks are introduced for the identification and synthesis of distributed RC networks, and the results can also be applied directly for inductance-resistance networks.
Abstract: Representations of infinite distributed RC one-ports are described. Two functions are introduced: the dipole intensity function (as the generalization of pole-zero pattern) and the time-constant density (as the generalization of the discrete time-constant set of a lumped network). Relations between these representations and the complex impedance are presented. These representations can be regarded as the generalization of the descriptions commonly used in the theory of lumped networks. The representations offer possibilities for the identification and for the synthesis of distributed RC networks. Although the representations were introduced for the case of RC networks, the results can also be applied directly for inductance-resistance networks. The use of the new representations is demonstrated by some examples.
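
As a rough sketch of what a time-constant density representation looks like (stated as the standard form for RC impedances, an assumption rather than a quotation from the paper), the discrete set of time constants of a lumped network is replaced by a density under an integral:

```latex
% A lumped RC one-port impedance is a finite sum over time constants; the
% distributed generalization replaces the discrete set {tau_i} by a
% time-constant density r(tau) (standard form, assumed):
Z(s) \;=\; R_\infty + \sum_i \frac{R_i}{1 + s\tau_i}
\quad\longrightarrow\quad
Z(s) \;=\; R_\infty + \int_0^\infty \frac{r(\tau)}{1 + s\tau}\, d\tau .
```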

01 Jan 1991
TL;DR: Weighted caching is a generalization of paging in which the cost to evict an item depends on the item as mentioned in this paper, and it is studied as a restriction of the well-known k-server problem.
Abstract: Weighted caching is a generalization of paging in which the cost to evict an item depends on the item. We study both of these problems as restrictions of the well-known k-server problem, which involves moving servers in a graph in response to requests so as to minimize the distance traveled.
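
To make the cost model concrete, the toy simulation below charges the weight of the evicted item on every miss. The eviction rule used (evict the cheapest cached item) is only a naive baseline for illustration; it is not the competitive algorithm analyzed in this line of work.

```python
# Toy simulation of the weighted-caching cost model: on a miss, some cached
# item is evicted and the cost charged is that item's weight. The evict-the-
# cheapest rule is a naive baseline, not the paper's algorithm.
def weighted_caching_cost(requests, weights, k):
    cache, total = set(), 0.0
    for item in requests:
        if item in cache:
            continue                      # hit: no cost
        if len(cache) >= k:
            victim = min(cache, key=lambda x: weights[x])
            cache.remove(victim)
            total += weights[victim]      # pay the victim's eviction cost
        cache.add(item)
    return total

weights = {"a": 1.0, "b": 1.0, "c": 10.0}
requests = ["a", "b", "c", "a", "b", "c", "a", "b"]
print(weighted_caching_cost(requests, weights, k=2))
```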

Proceedings ArticleDOI
John Moody1
30 Sep 1991
TL;DR: The author proposes a new estimate of generalization performance for nonlinear learning systems called the generalized prediction error (GPE) which is based upon the notion of the effective number of parameters p/sub eff/( lambda ).
Abstract: The author proposes a new estimate of generalization performance for nonlinear learning systems called the generalized prediction error (GPE), which is based upon the notion of the effective number of parameters p_eff(λ). GPE does not require the use of a test set or computationally intensive cross validation and generalizes previously proposed model selection criteria (such as GCV, FPE, AIC, and PSE) in that it is formulated to include biased, nonlinear models (such as back propagation networks) which may incorporate weight decay or other regularizers. The effective number of parameters p_eff(λ) depends upon the amount of bias and smoothness (as determined by the regularization parameter λ) in the model, but generally differs from the number of weights p. Construction of an optimal architecture thus requires not just finding the weights w*_λ which minimize the training function U(λ, w) but also the λ which minimizes GPE(λ).
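
For the special case of linear models with weight decay, the effective number of parameters reduces to the trace of the smoother matrix, which the sketch below computes together with a GPE-style estimate of the form training error + 2 sigma^2 p_eff / n. This is the form such criteria commonly take in the linear case and is meant as an illustration, not the paper's general nonlinear formula.

```python
# Hedged sketch for ridge (weight-decay) regression, where
# p_eff(lambda) = tr[X (X^T X + lambda I)^{-1} X^T] and a GPE-style estimate is
# training error + 2 * sigma^2 * p_eff / n. Data and lambdas are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n, d = 60, 15
X = rng.normal(size=(n, d))
w_true = np.concatenate([rng.normal(size=3), np.zeros(d - 3)])
sigma = 0.5
y = X @ w_true + sigma * rng.normal(size=n)

for lam in [0.01, 1.0, 10.0, 100.0]:
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)   # smoother matrix
    y_hat = S @ y
    p_eff = np.trace(S)                                       # effective parameters
    train_mse = np.mean((y - y_hat) ** 2)
    gpe = train_mse + 2 * sigma**2 * p_eff / n
    print(f"lambda={lam:7.2f}  p_eff={p_eff:5.2f}  train={train_mse:.3f}  GPE~{gpe:.3f}")
```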

Journal ArticleDOI
TL;DR: The concept of schema and the Schema Theorem are interpreted from a new perspective, which allows GAs to be regarded as a constrained random walk, and offers a view which is amenable to generalization.

Journal Article
TL;DR: The process of formal definition in advanced mathematics actually consists of two distinct complementary processes: the first is the abstraction of specific properties of one or more mathematical objects to form the basis of the definition of the new abstract mathematical object and the second is the process of construction of the abstract concept through logical deduction from the definition as discussed by the authors.
Abstract: An abstraction process occurs when the subject focuses attention on specific properties of a given object and then considers these properties in isolation from the original. This might be done, for example, to understand the essence of a certain phenomenon, perhaps later to be able to apply the same theory in other cases to which it applies. Such application of an abstract theory would be a case of reconstructive generalization, because the abstracted properties are reconstructions of the original properties, now applied to a broader domain. However, note that once the reconstructive generalization has occurred, it may then be possible to extend the range of examples to which the arguments apply through the simpler process of expansive generalization. For instance, when the group properties are extracted from various contexts to give the axioms for a group, this must be followed by the reconstruction of other properties (such as uniqueness of identity and of inverses) from the axioms. This leads to the construction of an abstract group concept which is a reconstructive generalization of various familiar examples of groups. When this abstract construction has been made, further applications of group theory to other contexts (usually performed by specialization from the abstract concept) are now expansive generalizations of the original ideas.
The case of definition: the process of formal definition in advanced mathematics actually consists of two distinct complementary processes. One is the abstraction of specific properties of one or more mathematical objects to form the basis of the definition of the new abstract mathematical object. The other is the process of construction of the abstract concept through logical deduction from the definition. The first of these processes we will call formal abstraction, in that it abstracts the form of the new concept through the selection of generative properties of one or more specific situations; for example, abstracting the vector-space axioms from the space of directed-line segments alone, or from what is noticed to be common to this space and the space of polynomials. This formal abstraction historically took many generations, but is now a preferred method of progress in building mathematical theories. The student rarely sees this part of the process. Instead (s)he is presented with the definition in terms of carefully selected properties as a fait accompli. When presented with the definition, the student is faced with the naming of the concept and the statement of a small number of properties or axioms. But the definition is more than a naming. It is the selection of generative properties suitable for deductive construction of the abstract concept. The abstract concept which satisfies only those properties that may be deduced from the definition and no others requires a massive reconstruction. Its construction is guided by the properties which hold in the original mathematical concepts from which it was abstracted, but judgement of the truth of these properties must be suspended until they are deduced from the definition. For the novice this is liable to cause great confusion at the time. The newly constructed abstract object will then generalize the properties embodied in the definition, because any properties that may be deduced from them will be part of it. Because of the difficulties involved in the construction process, this is a reconstructive generalization. Occasionally the process leads to a newly constructed abstract object whose properties apply only to the original domain, and not to a more general domain: for instance, the formal abstraction of the notion of a complete ordered field from the real numbers, or the abstraction of the group concept from groups of transformations. Up to isomorphism there is only one complete ordered field, and Cayley's theorem shows that every abstract group is isomorphic to a group of transformations. In these cases the process leads to an abstract concept which does not extend the class of possible embodiments. We include these instances within the same theoretical framework for, though they fail to generalize the notion to a broader class of examples, they very much change the nature of the concept in question. The formal abstraction process, coupled with the construction of the formal concept, when achieved, leads to a mental object that is easier for the expert to manipulate mentally, because the precise properties of the concept have been abstracted and can lead to precise general proofs based on these properties. Formal abstraction leading to mathematical definitions usually serves two purposes which are particularly attractive to the expert mathematician: (a) any arguments valid for the abstracted properties apply to all other instances where the abstracted properties hold, so (provided that there are other instances) the arguments are more general; (b) once the abstraction is made, by concentrating on the abstracted properties and ignoring all others, the abstraction should involve less cognitive strain. These two factors make a formal abstraction a powerful tool for the expert, yet, because of the cognitive reconstruction involved, they may cause great difficulty for the learner.

Journal ArticleDOI
TL;DR: In this article, the risk-sensitive maximum principle for optimal stochastic control derived by the author in an earlier work (Systems & Control Letters, vol. 15, 1990) is restated.
Abstract: The risk-sensitive maximum principle for optimal stochastic control derived by the author in an earlier work (Systems & Control Letters, vol. 15, 1990) is restated. This is an immediate generalization of the classic Pontryagin principle, to which it reduces in the deterministic case, and is expressed immediately in terms of observables. It is derived on the assumption that the criterion function is the exponential of an additive cost function, and is exact under linear-quadratic Gaussian assumptions, but is otherwise valid as a large deviation approximation. The principle is extended to the case of imperfect state observation after preliminary establishment of a certainty-equivalence principle. The derivation yields as byproduct a large-deviation version of the updating equation for nonlinear filtering. The development is heuristic. It is believed that the mathematical arguments given are the essential ones, and provide a self-contained treatment at this level.

Proceedings ArticleDOI
J. Utans1, J. Moody1
09 Oct 1991
TL;DR: The authors propose the prediction risk as a measure of the generalization ability of multi-layer perceptron networks and use it to select the optimal network architecture.
Abstract: The notion of generalization can be defined precisely as the prediction risk, the expected performance of an estimator on new observations. The authors propose the prediction risk as a measure of the generalization ability of multi-layer perceptron networks and use it to select the optimal network architecture. The prediction risk must be estimated from the available data. The authors approximate the prediction risk by v-fold cross-validation and asymptotic estimates of generalized cross-validation or H. Akaike's (1970) final prediction error. They apply the technique to the problem of predicting corporate bond ratings. This problem is very attractive as a case study, since it is characterized by the limited availability of the data and by the lack of complete a priori information that could be used to impose a structure to the network architecture.
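
The v-fold cross-validation estimate of the prediction risk has a simple generic form, sketched below for an ordinary least-squares estimator: average the squared prediction error over held-out folds. The model, data and choice of v = 5 are illustrative assumptions.

```python
# Sketch of estimating the prediction risk by v-fold cross-validation for a
# simple least-squares estimator: average squared error over held-out folds.
import numpy as np

def cv_prediction_risk(X, y, v=5, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, v)
    errors = []
    for k in range(v):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(v) if j != k])
        w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errors.append(np.mean((X[test] @ w - y[test]) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 0.5]) + 0.3 * rng.normal(size=100)
print(f"estimated prediction risk: {cv_prediction_risk(X, y):.4f}")
```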

Book
08 Nov 1991
TL;DR: New bounds; the initial generalization; to be a logical term; semantics from the ground up; ways of branching quantifiers; a new conception of logic.
Abstract: New bounds; the initial generalization; to be a logical term; semantics from the ground up; ways of branching quantifiers; a new conception of logic.

Journal ArticleDOI
TL;DR: It is shown that approximations to the generalization error of the Bayes optimal algorithm can be achieved by learning algorithms that use a two-layer neural net to learn a perceptron.
Abstract: The generalization error of the Bayes optimal classification algorithm when learning a perceptron from noise-free random training examples is calculated exactly using methods of statistical mechanics. It is shown that if an assumption of replica symmetry is made, then, in the thermodynamic limit, the error of the Bayes optimal algorithm is less than the error of a canonical stochastic learning algorithm, by a factor approaching √2 as the ratio of the number of training examples to perceptron weights grows. In addition, it is shown that approximations to the generalization error of the Bayes optimal algorithm can be achieved by learning algorithms that use a two-layer neural net to learn a perceptron.

Proceedings Article
02 Dec 1991
TL;DR: The prediction risk is proposed as a measure of the generalization ability of multi-layer perceptron networks and used to select an optimal network architecture from a set of possible architectures and a heuristic search strategy is proposed to explore the space of possible architecture.
Abstract: The notion of generalization ability can be defined precisely as the prediction risk, the expected performance of an estimator in predicting new observations. In this paper, we propose the prediction risk as a measure of the generalization ability of multi-layer perceptron networks and use it to select an optimal network architecture from a set of possible architectures. We also propose a heuristic search strategy to explore the space of possible architectures. The prediction risk is estimated from the available data; here we estimate the prediction risk by v-fold cross-validation and by asymptotic approximations of generalized cross-validation or Akaike's final prediction error. We apply the technique to the problem of predicting corporate bond ratings. This problem is very attractive as a case study, since it is characterized by the limited availability of the data and by the lack of a complete a priori model which could be used to impose a structure to the network architecture.

Journal ArticleDOI
TL;DR: In this article, it was shown that the R-matrix which intertwines two n-by-N^(n−1)-state cyclic L-operators related with a generalization of the U_q(sl(n)) algebra can be considered as a Boltzmann weight of a four-spin box for a lattice model with two-spin interaction, just as the R-matrix of the checkerboard chiral Potts model.
Abstract: We show that the R-matrix which intertwines two n-by-N^(n−1)-state cyclic L-operators related with a generalization of the U_q(sl(n)) algebra can be considered as a Boltzmann weight of a four-spin box for a lattice model with two-spin interaction, just as the R-matrix of the checkerboard chiral Potts model. The rapidity variables lie on the algebraic curve of genus g = N^(2(n−1))((n−1)N − n) + 1 defined by 2n−3 independent moduli. This curve is a natural generalization of the curve which appeared in the chiral Potts model. Factorization properties of the L-operator and its connection to the SOS models are also discussed.

Journal ArticleDOI
TL;DR: Results suggest that a large and representative training sample may be the single, most important factor in achieving high recognition accuracy in hand-printed character recognition systems, and benefits of reducing the number of net connections are discussed.
Abstract: We report on results of training backpropagation nets with samples of hand-printed digits scanned off of bank checks and hand-printed letters interactively entered into a computer through a stylus digitizer. Generalization results are reported as a function of training set size and network capacity. Given a large training set, and a net with sufficient capacity to achieve high performance on the training set, nets typically achieved error rates of 4-5% at a 0% reject rate and 1-2% at a 10% reject rate. The topology and capacity of the system, as measured by the number of connections in the net, have surprisingly little effect on generalization. For those developing hand-printed character recognition systems, these results suggest that a large and representative training sample may be the single, most important factor in achieving high recognition accuracy. Benefits of reducing the number of net connections, other than improving generalization, are discussed.

Book
01 Jun 1991
TL;DR: The results show that a network architecture evolved by the genetic algorithm performs better than a large network using backpropagation learning alone when the criterion is correct generalization from a set of examples.
Abstract: Neural networks are known to exhibit emergent behaviors, but it is often far from easy to exploit these properties for desired ends such as effective machine learning. We demonstrate that a genetic algorithm is capable of discovering how to exploit the abilities of one type of network learning, backpropagation in feedforward networks. Our results show that a network architecture evolved by the genetic algorithm performs better than a large network using backpropagation learning alone when the criterion is correct generalization from a set of examples. This is potentially a powerful method for design of neural networks-design by evolution.

Journal ArticleDOI
TL;DR: A technique for constructing neural network architectures with better ability to generalize is presented under the name Ockham's Razor: several networks are trained and then pruned by removing connections one by one and retraining, resulting in perfect generalization.
Abstract: A technique for constructing neural network architectures with better ability to generalize is presented under the name Ockham's Razor: several networks are trained and then pruned by removing connections one by one and retraining. The networks which achieve fewest connections generalize best. The method is tested on a classification of bit strings (the contiguity problem): the optimal architecture emerges, resulting in perfect generalization. The internal representation of the network changes substantially during the retraining, and this distinguishes the method from previous pruning studies.
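
A compressed sketch of the prune-and-retrain loop is given below, applied to a single-layer model rather than a multilayer network so that it stays short: train, remove one connection, retrain under the resulting mask, and stop when pruning starts to hurt performance on the training data. All hyperparameters and the choice of which connection to remove are assumptions for illustration.

```python
# Simplified prune-and-retrain loop on a single-layer logistic model: remove
# the smallest-magnitude active weight, retrain under the mask, and stop when
# training accuracy drops. Hyperparameters are illustrative assumptions.
import numpy as np

def train(X, y, mask, epochs=2000, lr=0.1):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * (X.T @ (p - y)) / len(y)
        w *= mask                      # pruned connections stay at zero
    return w

def accuracy(X, y, w):
    return np.mean(((X @ w) > 0).astype(float) == y)

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 10))
y = (X[:, 0] - X[:, 1] > 0).astype(float)   # only two inputs matter

mask = np.ones(10)
w = train(X, y, mask)
while mask.sum() > 1:
    active = np.where(mask == 1)[0]
    candidate = active[np.argmin(np.abs(w[active]))]   # weakest connection
    trial_mask = mask.copy()
    trial_mask[candidate] = 0
    trial_w = train(X, y, trial_mask)
    if accuracy(X, y, trial_w) < accuracy(X, y, w):
        break                           # pruning now hurts: stop
    mask, w = trial_mask, trial_w
print("connections kept:", int(mask.sum()))
```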

Journal ArticleDOI
TL;DR: In this paper, the authors generalize the α-cuts of two-place functions defined by Zadeh's extension principle to the case of extended two-place functions via a sup-t-norm convolution.


Journal ArticleDOI
Xiao-Qiang Zhao1
TL;DR: In this paper, the authors considered an n-species Lotka-Volterra periodic competition system and obtained sufficient conditions for the ultimate boundedness of solutions and the existence and global attractivity of a positive periodic solution.
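
For reference, a standard form of the n-species periodic Lotka-Volterra competition system is sketched below (an assumption about the usual setting of such results, not a quotation from the paper), with growth rates and interaction coefficients that are continuous, positive and share a common period.

```latex
% Standard n-species periodic Lotka-Volterra competition system (assumed
% standard form): b_i and a_ij are continuous, positive and omega-periodic.
\frac{dx_i}{dt} = x_i(t)\Bigl(b_i(t) - \sum_{j=1}^{n} a_{ij}(t)\,x_j(t)\Bigr),
\qquad i = 1,\dots,n,
\qquad b_i(t+\omega)=b_i(t),\quad a_{ij}(t+\omega)=a_{ij}(t).
```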

Journal ArticleDOI
TL;DR: This paper gives several characterizations of the solution set of convex programs; the subgradients attaining the minimum principle are explicitly characterized, and this characterization is shown to be independent of any solution.

Proceedings ArticleDOI
H. Drucker1, Y. Le Cun1
08 Jul 1991
TL;DR: It is shown that a training algorithm termed double back-propagation improves generalization by simultaneously minimizing the normal energy term found in back-propagation and an additional energy term that is related to the sum of the squares of the input derivatives (gradients).
Abstract: One test of a training algorithm is how well the algorithm generalizes from the training data to the test data. It is shown that a training algorithm termed double back-propagation improves generalization by simultaneously minimizing the normal energy term found in back-propagation and an additional energy term that is related to the sum of the squares of the input derivatives (gradients). In normal back-propagation training, minimizing the energy function tends to push the input gradient to zero. However, this is not always possible. Double back-propagation explicitly pushes the input gradients to zero, making the minimum broader, and increases the generalization on the test data. The authors show the improvement over normal back-propagation on four candidate architectures with a training set of 320 handwritten numbers and a test set of size 180.
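
For a linear model the input gradient of the per-example error has a closed form, which makes the combined objective easy to write down; the sketch below minimizes it with a numerical gradient to stay dependency-free. Network shape, data and the penalty weight are illustrative assumptions, and the real method applies the idea inside back-propagation rather than by numerical differentiation.

```python
# Hedged sketch of the double back-propagation objective for a linear model:
# for E = 0.5 * ||W x - t||^2 the input gradient is W^T (W x - t), and the
# extra penalty is the sum of its squared entries. The combined objective is
# minimized with a numerical gradient to keep the sketch short.
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(64, 8))
T = X @ rng.normal(size=(8, 3)) + 0.1 * rng.normal(size=(64, 3))

def objective(W_flat, lam=0.1):
    W = W_flat.reshape(3, 8)
    R = X @ W.T - T                        # residuals, shape (64, 3)
    data_term = 0.5 * np.mean(np.sum(R ** 2, axis=1))
    input_grads = R @ W                    # dE/dx for each example, shape (64, 8)
    penalty = 0.5 * np.mean(np.sum(input_grads ** 2, axis=1))
    return data_term + lam * penalty

def numerical_gradient(f, w, eps=1e-5):
    g = np.zeros_like(w)
    for i in range(w.size):
        d = np.zeros_like(w)
        d[i] = eps
        g[i] = (f(w + d) - f(w - d)) / (2 * eps)
    return g

W = rng.normal(size=24) * 0.1
for step in range(300):
    W -= 0.3 * numerical_gradient(objective, W)
print(f"final objective: {objective(W):.4f}")
```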