# Showing papers in "IEEE Transactions on Neural Networks in 1992"

••

TL;DR: Using the Stone-Weierstrass theorem, it is proved that linear combinations of the fuzzy basis functions are capable of uniformly approximating any real continuous function on a compact set to arbitrary accuracy.

Abstract: Fuzzy systems are represented as series expansions of fuzzy basis functions which are algebraic superpositions of fuzzy membership functions. Using the Stone-Weierstrass theorem, it is proved that linear combinations of the fuzzy basis functions are capable of uniformly approximating any real continuous function on a compact set to arbitrary accuracy. Based on the fuzzy basis function representations, an orthogonal least-squares (OLS) learning algorithm is developed for designing fuzzy systems based on given input-output pairs; then, the OLS algorithm is used to select significant fuzzy basis functions which are used to construct the final fuzzy system. The fuzzy basis function expansion is used to approximate a controller for the nonlinear ball and beam system, and the simulation results show that the control performance is improved by incorporating some common-sense fuzzy control rules.
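The fuzzy basis function expansion lends itself to a short numerical sketch. The example below is illustrative only: ordinary least squares stands in for the paper's OLS subset-selection step, and the Gaussian memberships, centers, and width are assumptions made for the demonstration.

```python
import numpy as np

def fuzzy_basis(x, centers, width):
    """Fuzzy basis functions: Gaussian memberships normalized so the
    basis functions sum to one at every input (Wang-Mendel form)."""
    mu = np.exp(-((x[:, None] - centers[None, :]) / width) ** 2)
    return mu / mu.sum(axis=1, keepdims=True)

# Uniformly approximate sin(x) on a compact interval with a linear
# combination of fuzzy basis functions; least squares fits the coefficients.
x = np.linspace(-np.pi, np.pi, 200)
y = np.sin(x)
centers = np.linspace(-np.pi, np.pi, 15)
P = fuzzy_basis(x, centers, width=0.5)
theta, *_ = np.linalg.lstsq(P, y, rcond=None)
err = np.max(np.abs(P @ theta - y))   # uniform (sup-norm) approximation error
```

With a modest number of basis functions the sup-norm error on the interval is already small, in line with the universal-approximation claim.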

2,575 citations

••

TL;DR: A direct adaptive tracking control architecture is proposed and evaluated for a class of continuous-time nonlinear dynamic systems for which an explicit linear parameterization of the uncertainty in the dynamics is either unknown or impossible.

Abstract: A direct adaptive tracking control architecture is proposed and evaluated for a class of continuous-time nonlinear dynamic systems for which an explicit linear parameterization of the uncertainty in the dynamics is either unknown or impossible. The architecture uses a network of Gaussian radial basis functions to adaptively compensate for the plant nonlinearities. Under mild assumptions about the degree of smoothness exhibited by the nonlinear functions, the algorithm is proven to be globally stable, with tracking errors converging to a neighborhood of zero. A constructive procedure is detailed, which directly translates the assumed smoothness properties of the nonlinearities involved into a specification of the network required to represent the plant to a chosen degree of accuracy. A stable weight adjustment mechanism is determined using Lyapunov theory. The network construction and performance of the resulting controller are illustrated through simulations with example systems.
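A rough sketch of the idea, under stated assumptions that are not from the paper itself (a scalar plant, a quadratic nonlinearity, Euler integration, and gains chosen purely for illustration): an RBF network term in the control law adaptively cancels the unknown plant nonlinearity, with a Lyapunov-style weight update.

```python
import numpy as np

def rbf(x, centers, width=0.8):
    """Gaussian radial basis functions evaluated at a scalar state x."""
    return np.exp(-((x - centers) / width) ** 2)

# Scalar plant x_dot = f(x) + u, with f treated as unknown by the controller.
# The term W_hat @ phi adaptively compensates for f; the weight update is the
# Lyapunov-style law W_hat_dot = gamma * e * phi.
centers = np.linspace(-2.0, 2.0, 11)
W_hat = np.zeros_like(centers)
f = lambda x: x ** 2                  # illustrative "unknown" nonlinearity
k, gamma, dt = 5.0, 20.0, 1e-3        # feedback gain, adaptation gain, step
x, errs = 0.5, []
for i in range(20_000):
    t = i * dt
    xd, xd_dot = np.sin(t), np.cos(t)       # reference trajectory
    e = x - xd
    phi = rbf(x, centers)
    u = -k * e + xd_dot - W_hat @ phi       # feedback + network compensation
    W_hat = W_hat + dt * gamma * e * phi    # adaptive weight update
    x = x + dt * (f(x) + u)                 # Euler step of the plant
    errs.append(abs(e))
final_err = max(errs[-2000:])   # tracking error settles near zero
```

As the abstract states, the tracking error converges to a neighborhood of zero rather than to zero exactly; the size of that neighborhood shrinks as the network covers the visited states more accurately.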

2,254 citations

••

TL;DR: The fuzzy ARTMAP system is compared with Salzberg's NGE systems and with Simpson's FMMC system, and its performance is evaluated in relation to benchmark backpropagation and genetic algorithm systems.

Abstract: A neural network architecture is introduced for incremental supervised learning of recognition categories and multidimensional maps in response to arbitrary sequences of analog or binary input vectors, which may represent fuzzy or crisp sets of features. The architecture, called fuzzy ARTMAP, achieves a synthesis of fuzzy logic and adaptive resonance theory (ART) neural networks by exploiting a close formal similarity between the computations of fuzzy subsethood and ART category choice, resonance, and learning. Four classes of simulation illustrated fuzzy ARTMAP performance in relation to benchmark backpropagation and genetic algorithm systems. These simulations include finding points inside versus outside a circle, learning to tell two spirals apart, incremental approximation of a piecewise-continuous function, and a letter recognition database. The fuzzy ARTMAP system is also compared with Salzberg's NGE systems and with Simpson's FMMC system.

2,096 citations

••

[...]

TL;DR: A wavelet network concept, which is based on wavelet transform theory, is proposed as an alternative to feedforward neural networks for approximating arbitrary nonlinear functions.

Abstract: A wavelet network concept, which is based on wavelet transform theory, is proposed as an alternative to feedforward neural networks for approximating arbitrary nonlinear functions. The basic idea is to replace the neurons by 'wavelons', i.e., computing units obtained by cascading an affine transform and a multidimensional wavelet. Then these affine transforms and the synaptic weights must be identified from possibly noise corrupted input/output data. An algorithm of backpropagation type is proposed for wavelet network training, and experimental results are reported.

2,031 citations

••

TL;DR: A fuzzy neural network model based on the multilayer perceptron, using the backpropagation algorithm, and capable of fuzzy classification of patterns is described, and the results are compared with those of the conventional MLP, the Bayes classifier, and other related models.

Abstract: A fuzzy neural network model based on the multilayer perceptron, using the backpropagation algorithm, and capable of fuzzy classification of patterns is described. The input vector consists of membership values to linguistic properties while the output vector is defined in terms of fuzzy class membership values. This allows efficient modeling of fuzzy uncertain patterns with appropriate weights being assigned to the backpropagated errors depending upon the membership values at the corresponding outputs. During training, the learning rate is gradually decreased in discrete steps until the network converges to a minimum error solution. The effectiveness of the algorithm is demonstrated on a speech recognition problem. The results are compared with those of the conventional MLP, the Bayes classifier, and other related models.

1,031 citations

••

TL;DR: The generalized approximate-reasoning-based intelligent control (GARIC) architecture learns and tunes a fuzzy logic controller even when only weak reinforcement is available; introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; and learns to produce real-valued control actions.

Abstract: A method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. It is shown that the generalized approximate-reasoning-based intelligent control (GARIC) architecture learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.

987 citations

••

TL;DR: A generalized control strategy that enhances fuzzy controllers with self-learning capability for achieving prescribed control objectives in a near-optimal manner is presented and the inverted pendulum system is employed as a testbed to demonstrate the effectiveness of the proposed control scheme and the robustness of the acquired fuzzy controller.

Abstract: A generalized control strategy that enhances fuzzy controllers with self-learning capability for achieving prescribed control objectives in a near-optimal manner is presented. This methodology, termed temporal backpropagation, is model-sensitive in the sense that it can deal with plants that can be represented in a piecewise-differentiable format, such as difference equations, neural networks, GMDH structures, and fuzzy models. Regardless of the numbers of inputs and outputs of the plants under consideration, the proposed approach can either refine the fuzzy if-then rules of human experts or automatically derive the fuzzy if-then rules if human experts are not available. The inverted pendulum system is employed as a testbed to demonstrate the effectiveness of the proposed control scheme and the robustness of the acquired fuzzy controller.

915 citations

••

TL;DR: A fuzzy modeling method using fuzzy neural networks with the backpropagation algorithm is presented that can identify the fuzzy model of a nonlinear system automatically.

Abstract: A fuzzy modeling method using fuzzy neural networks with the backpropagation algorithm is presented. The method can identify the fuzzy model of a nonlinear system automatically. The feasibility of the method is examined using simple numerical data.

894 citations

••

TL;DR: The fuzzy min-max classifier neural network implementation is explained, the learning and recall algorithms are outlined, and several examples of operation demonstrate the strong qualities of this new neural network classifier.

Abstract: A supervised learning neural network classifier that utilizes fuzzy sets as pattern classes is described. Each fuzzy set is an aggregate (union) of fuzzy set hyperboxes. A fuzzy set hyperbox is an n-dimensional box defined by a min point and a max point with a corresponding membership function. The min-max points are determined using the fuzzy min-max learning algorithm, an expansion-contraction process that can learn nonlinear class boundaries in a single pass through the data and provides the ability to incorporate new and refine existing classes without retraining. The use of a fuzzy set approach to pattern classification inherently provides a degree of membership information that is extremely useful in higher-level decision making. The relationship between fuzzy sets and pattern classification is described. The fuzzy min-max classifier neural network implementation is explained, the learning and recall algorithms are outlined, and several examples of operation demonstrate the strong qualities of this new neural network classifier.
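The hyperbox membership function can be sketched directly. The formula below is the commonly cited min-max membership with a sensitivity parameter gamma; the specific box coordinates and gamma value are illustrative assumptions, not values from the paper.

```python
import numpy as np

def hyperbox_membership(x, v, w, gamma=4.0):
    """Simpson-style min-max membership: 1 inside the box [v, w], decaying
    with distance outside it at a rate set by the sensitivity gamma."""
    above = np.maximum(0, 1 - np.maximum(0, gamma * np.minimum(1, x - w)))
    below = np.maximum(0, 1 - np.maximum(0, gamma * np.minimum(1, v - x)))
    return float((above + below).sum() / (2 * len(v)))

# A 2D hyperbox with min point v and max point w (values are illustrative).
v, w = np.array([0.2, 0.2]), np.array([0.4, 0.5])
inside = hyperbox_membership(np.array([0.3, 0.3]), v, w)   # full membership
nearby = hyperbox_membership(np.array([0.5, 0.3]), v, w)   # partial
far = hyperbox_membership(np.array([0.9, 0.9]), v, w)      # lower still
```

Points inside the box get membership 1; membership falls off smoothly with distance outside, which is what gives the classifier its graded degree-of-membership output.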

723 citations

••

TL;DR: For a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, inconsistency in rating among experts was observed, with fuzzy c-means approaches being slightly preferred over feedforward cascade correlation results.

Abstract: Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms, and a supervised computational neural network. Initial clinical results are presented on normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. For a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed, with fuzzy c-means approaches being slightly preferred over feedforward cascade correlation results. Various facets of both approaches, such as supervised versus unsupervised learning, time complexity, and utility for the diagnostic process, are compared.
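For reference, the literal fuzzy c-means iteration mentioned above can be sketched in a few lines. This is generic FCM with fuzzifier m = 2 on synthetic 2D data, not the approximate variant or the MR-specific pipeline from the paper.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Plain (literal) fuzzy c-means: alternate centroid and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                # memberships sum to 1
    for _ in range(iters):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]     # fuzzy-weighted centers
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        w = d ** (-2.0 / (m - 1.0))
        U = w / w.sum(axis=1, keepdims=True)         # standard FCM update
    return U, V

# Two well-separated blobs: memberships become near-crisp and the centers
# land on the blob means (synthetic data, for illustration only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (30, 2)), rng.normal(3.0, 0.1, (30, 2))])
U, V = fuzzy_c_means(X)
```

On overlapping tissue classes such as tumor/edema boundaries, the memberships stay genuinely fuzzy rather than near-crisp, which is the property the segmentation study exploits.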

636 citations

••

TL;DR: It is not proved that the introduction of additive noise to the training vectors always improves network generalization, but the analysis suggests mathematically justified rules for choosing the characteristics of noise if additive noise is used in training.

Abstract: The possibility of improving the generalization capability of a neural network by introducing additive noise to the training samples is discussed. The network considered is a feedforward layered neural network trained with the back-propagation algorithm. Back-propagation training is viewed as nonlinear least-squares regression and the additive noise is interpreted as generating a kernel estimate of the probability density that describes the training vector distribution. Two specific application types are considered: pattern classifier networks and estimation of a nonstochastic mapping from data corrupted by measurement errors. It is not proved that the introduction of additive noise to the training vectors always improves network generalization. However, the analysis suggests mathematically justified rules for choosing the characteristics of noise if additive noise is used in training. Results of mathematical statistics are used to establish various asymptotic consistency results for the proposed method. Numerical simulations support the applicability of the training method.
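The additive-noise idea reduces to a simple augmentation step. In the sketch below the helper name and the choice to keep targets noise-free are assumptions made for illustration; the point is that the noise standard deviation plays the role of the kernel width in the density-estimate interpretation.

```python
import numpy as np

def jitter_training_set(X, y, noise_std, copies=10, seed=0):
    """Additive-noise augmentation (hypothetical helper): replicate each
    training vector with zero-mean Gaussian noise on the inputs. In the
    kernel-estimate view, noise_std is the kernel width of the implied
    density estimate of the training-input distribution."""
    rng = np.random.default_rng(seed)
    Xj = np.repeat(X, copies, axis=0)
    Xj = Xj + rng.normal(0.0, noise_std, Xj.shape)
    yj = np.repeat(y, copies)      # targets stay noise-free
    return Xj, yj

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 4.0])
Xj, yj = jitter_training_set(X, y, noise_std=0.1)
```

Training on the jittered set is then ordinary back-propagation; the paper's contribution is the analysis of when and how to choose the noise characteristics, not the augmentation mechanics themselves.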

••

TL;DR: It is shown that a topographic product P, first introduced in nonlinear dynamics, is an appropriate measure of the preservation or violation of neighborhood relations and it is found that a 3D output space seems to be optimally suited to the data.

Abstract: It is shown that a topographic product P, first introduced in nonlinear dynamics, is an appropriate measure of the preservation or violation of neighborhood relations. It is sensitive to large-scale violations of the neighborhood ordering, but does not account for neighborhood ordering distortions caused by varying areal magnification factors. A vanishing value of the topographic product indicates a perfect neighborhood preservation; negative (positive) values indicate a too small (too large) output space dimensionality. In a simple example of maps from a 2D input space onto 1D, 2D, and 3D output spaces, it is demonstrated how the topographic product picks the correct output space dimensionality. In a second example, 19D speech data are mapped onto various output spaces and it is found that a 3D output space (instead of 2D) seems to be optimally suited to the data. This is in agreement with a recent speech recognition experiment on the same data set.

••

TL;DR: An empirical study of the effects of limited precision in cascade-correlation networks on three different learning problems is presented and techniques for dynamic rescaling and probabilistic rounding that allow reliable convergence down to 7 bits of precision or less are introduced.

Abstract: A key question in the design of specialized hardware for simulation of neural networks is whether fixed-point arithmetic of limited numerical precision can be used with existing learning algorithms. An empirical study of the effects of limited precision in cascade-correlation networks on three different learning problems is presented. It is shown that learning can fail abruptly as the precision of network weights or weight-update calculations is reduced below a certain level, typically about 13 bits including the sign. Techniques for dynamic rescaling and probabilistic rounding that allow reliable convergence down to 7 bits of precision or less, with only a small and gradual reduction in the quality of the solutions, are introduced.
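Probabilistic rounding itself is easy to sketch. The fixed-point format and parameter names below are illustrative assumptions, not the paper's exact hardware model; the key property is that rounding up or down with probability proportional to proximity makes the quantization error zero-mean.

```python
import numpy as np

def stochastic_round(x, bits, scale=1.0, rng=None):
    """Probabilistic rounding to a signed fixed-point grid with `bits` bits
    (sign included): round up with probability equal to the fractional part,
    so the rounding error is zero-mean rather than systematically biased."""
    rng = np.random.default_rng() if rng is None else rng
    step = scale / 2 ** (bits - 1)                  # quantization step
    q = x / step
    lo = np.floor(q)
    q = lo + (rng.random(np.shape(x)) < (q - lo))   # stochastic up/down
    return np.clip(q, -(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * step

rng = np.random.default_rng(0)
r = stochastic_round(np.full(100_000, 0.3), bits=7, rng=rng)
bias = abs(r.mean() - 0.3)   # near zero on average, unlike truncation
```

Zero-mean rounding is what lets weight updates smaller than one quantization step still accumulate in expectation, which is why learning keeps converging at 7 bits.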

••

TL;DR: The fuzzy systems performed well until over 50% of their fuzzy-associative-memory (FAM) rules were removed, and they also performed well when the key FAM equilibration rule was replaced with destructive, or 'sabotage', rules.

Abstract: Fuzzy control systems and neural-network control systems for backing up a simulated truck, and truck-and-trailer, to a loading dock in a parking lot are presented. The supervised backpropagation learning algorithm trained the neural network systems. The robustness of the neural systems was tested by removing random subsets of training data in learning sequences. The neural systems performed well but required extensive computation for training. The fuzzy systems performed well until over 50% of their fuzzy-associative-memory (FAM) rules were removed. They also performed well when the key FAM equilibration rule was replaced with destructive, or 'sabotage', rules. Unsupervised differential competitive learning (DCL) and product-space clustering adaptively generated FAM rules from training data. The original fuzzy control systems and neural control systems generated trajectory data. The DCL system rapidly recovered the underlying FAM rules. Product-space clustering converted the neural truck systems into structured sets of FAM rules that approximated the neural system's behavior.

••

Bell Labs

TL;DR: It is shown that double backpropagation, as compared to backpropagation, creates weights that are smaller, thereby causing the output of the neurons to spend more time in the linear region.

Abstract: In order to generalize from a training set to a test set, it is desirable that small changes in the input space of a pattern do not change the output components. This can be done by forcing this behavior as part of the training algorithm. This is done in double backpropagation by forming an energy function that is the sum of the normal energy term found in backpropagation and an additional term that is a function of the Jacobian. Significant improvement is shown with different architectures and different test sets, especially with architectures that had previously been shown to have very good performance when trained using backpropagation. It is shown that double backpropagation, as compared to backpropagation, creates weights that are smaller, thereby causing the output of the neurons to spend more time in the linear region.

••

TL;DR: It is shown that using gradient descent with direct approximation of the gradient instead of back-propagation is more economical for parallel analog implementations and is suitable for multilayer recurrent networks as well.

Abstract: Previous work on analog VLSI implementation of multilayer perceptrons with on-chip learning has mainly targeted the implementation of algorithms such as back-propagation. Although back-propagation is efficient, its implementation in analog VLSI requires excessive computational hardware. It is shown that using gradient descent with direct approximation of the gradient instead of back-propagation is more economical for parallel analog implementations. It is shown that this technique (which is called 'weight perturbation') is suitable for multilayer recurrent networks as well. A discrete level analog implementation showing the training of an XOR network as an example is presented.
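The weight-perturbation estimate is just a forward difference per weight. The paper targets analog hardware, so the sketch below captures only the numerical idea on a toy quadratic loss (the loss and all parameter values are assumptions for illustration).

```python
import numpy as np

def weight_perturbation_step(w, loss, delta=1e-4, lr=0.1):
    """One descent step with gradients *measured* by perturbation: nudge each
    weight by delta and use (loss(w + delta e_i) - loss(w)) / delta in place
    of a backpropagated derivative."""
    base = loss(w)
    grad = np.zeros_like(w)
    for i in range(len(w)):
        w[i] += delta
        grad[i] = (loss(w) - base) / delta   # forward-difference estimate
        w[i] -= delta
    return w - lr * grad

# Minimize a tiny quadratic "training error" without any backpropagation.
target = np.array([1.0, -2.0])
loss = lambda w: float(np.sum((w - target) ** 2))
w = np.zeros(2)
for _ in range(200):
    w = weight_perturbation_step(w, loss)
```

The appeal in analog VLSI is that only forward evaluations of the network are needed, at the cost of one extra evaluation per weight per step.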

••

TL;DR: A novel network called the validity index network (VI net), derived from radial basis function networks, fits functions and calculates confidence intervals for its predictions, indicating local regions of poor fit and extrapolation.

Abstract: A novel network called the validity index network (VI net) is presented. The VI net, derived from radial basis function networks, fits functions and calculates confidence intervals for its predictions, indicating local regions of poor fit and extrapolation.

••

TL;DR: The authors describe experiments using a genetic algorithm for feature selection in the context of neural network classifiers, specifically, counterpropagation networks, and propose a method called training set sampling, which selects feature sets that are as good as and occasionally better for counterpropagation than those chosen by an evaluation that uses the entire training set.

Abstract: The authors describe experiments using a genetic algorithm for feature selection in the context of neural network classifiers, specifically, counterpropagation networks. They present the novel techniques used in the application of genetic algorithms. First, the genetic algorithm is configured to use an approximate evaluation in order to reduce significantly the computation required. In particular, though the desired classifiers are counterpropagation networks, they use a nearest-neighbor classifier to evaluate feature sets and show that the features selected by this method are effective in the context of counterpropagation networks. Second, a method called training set sampling, in which only a portion of the training set is used on any given evaluation, is proposed. Computational savings can be made using this method, i.e., evaluations can be made over an order of magnitude faster. This method selects feature sets that are as good as and occasionally better for counterpropagation than those chosen by an evaluation that uses the entire training set.

••

TL;DR: The performance of these networks at recognizing typed and handwritten numerals independently of their position, size, and orientation is compared with and found superior to the performance of a layered feedforward network to which image features extracted by the method of moments are presented as input.

Abstract: The classification and recognition of two-dimensional patterns independently of their position, orientation, and size by using high-order networks are discussed. A method is introduced for reducing and controlling the number of weights of a third-order network used for invariant pattern recognition. The method leads to economical networks that exhibit high recognition rates for translated, rotated, and scaled, as well as locally distorted, patterns. The performance of these networks at recognizing typed and handwritten numerals independently of their position, size, and orientation is compared with and found superior to the performance of a layered feedforward network to which image features extracted by the method of moments are presented as input.

••

TL;DR: In the approach described, the ANN outputs constitute the sequence of observation vectors for the HMM, and an algorithm is proposed for global optimization of all the parameters.

Abstract: The integration of multilayered and recurrent artificial neural networks (ANNs) with hidden Markov models (HMMs) is addressed. ANNs are suitable for approximating functions that compute new acoustic parameters, whereas HMMs have been proven successful at modeling the temporal structure of the speech signal. In the approach described, the ANN outputs constitute the sequence of observation vectors for the HMM. An algorithm is proposed for global optimization of all the parameters. Results on speaker-independent recognition experiments using this integrated ANN-HMM system on the TIMIT continuous speech database are reported.

••

TL;DR: A method of initialization is introduced and shown to decrease the possibility of local minima occurring on various test problems and suggests sensible ways of choosing the weights from which the training process is initiated.

Abstract: The training of neural net classifiers is often hampered by the occurrence of local minima, which results in the attainment of inferior classification performance. It has been shown that the occurrence of local minima in the criterion function is often related to specific patterns of defects in the classifier. In particular, three main causes for local minima were identified. Such an understanding of the physical correlates of local minima suggests sensible ways of choosing the weights from which the training process is initiated. A method of initialization is introduced and shown to decrease the possibility of local minima occurring on various test problems.

••

TL;DR: Various techniques of optimizing criterion functions to train neural-net classifiers are investigated and it is found that the stochastic technique is preferable on problems with large training sets and that the convergence rates of the variable metric and conjugate gradient techniques are similar.

Abstract: Various techniques of optimizing criterion functions to train neural-net classifiers are investigated. These techniques include three standard deterministic techniques (variable metric, conjugate gradient, and steepest descent), and a new stochastic technique. It is found that the stochastic technique is preferable on problems with large training sets and that the convergence rates of the variable metric and conjugate gradient techniques are similar.

••

TL;DR: Using the new theory of information geometry, a natural invariant Riemannian metric and a dual pair of affine connections on the Boltzmann neural network manifold are established and the meaning of geometrical structures is elucidated from the stochastic and the statistical point of view.

Abstract: A Boltzmann machine is a network of stochastic neurons. The set of all the Boltzmann machines with a fixed topology forms a geometric manifold of high dimension, where modifiable synaptic weights of connections play the role of a coordinate system to specify networks. A learning trajectory, for example, is a curve in this manifold. It is important to study the geometry of the neural manifold, rather than the behavior of a single network, in order to know the capabilities and limitations of neural networks of a fixed topology. Using the new theory of information geometry, a natural invariant Riemannian metric and a dual pair of affine connections on the Boltzmann neural network manifold are established. The meaning of geometrical structures is elucidated from the stochastic and the statistical point of view. This leads to a natural modification of the Boltzmann machine learning rule.

••

TL;DR: It is shown that, contrary to what might have been expected from the well-known representation theorems, three-layer nets are not sufficient for stabilization, but four-layer nets are enough, assuming that threshold processors are used.

Abstract: The representational capabilities of one-hidden-layer and two-hidden-layer nets consisting of feedforward interconnections of linear threshold units are compared. It is remarked that for certain problems two hidden layers are required, contrary to what might be in principle expected from the known approximation theorems. The differences are not based on numerical accuracy or number of units needed, nor on capabilities for feature extraction, but rather on a much more basic classification into direct and inverse problems. The former correspond to the approximation of continuous functions, while the latter are concerned with approximating one-sided inverses of continuous functions, and are often encountered in the context of inverse kinematics determination or in control questions. A general result is given showing that nonlinear control systems can be stabilized using two hidden layers, but not, in general, using just one.

••

TL;DR: Several generalizations of the fuzzy c-shells (FCS) algorithm are presented for characterizing and detecting clusters that are hyperellipsoidal shells and show that the AFCS algorithm requires less memory than the HT-based methods, and it is at least an order of magnitude faster than theHT approach.

Abstract: Several generalizations of the fuzzy c-shells (FCS) algorithm are presented for characterizing and detecting clusters that are hyperellipsoidal shells. An earlier generalization, the adaptive fuzzy c-shells (AFCS) algorithm, is examined in detail and is found to have global convergence problems when the shapes to be detected are partial. New formulations are considered wherein the norm inducing matrix in the distance metric is unconstrained in contrast to the AFCS algorithm. The resulting algorithm, called the AFCS-U algorithm, performs better for partial shapes. Another formulation based on the second-order quadrics equation is considered. These algorithms can detect ellipses and circles in 2D data. They are compared with the Hough transform (HT)-based methods for ellipse detection. Existing HT-based methods for ellipse detection are evaluated, and a multistage method incorporating the good features of all the methods is used for comparison. Numerical examples of real image data show that the AFCS algorithm requires less memory than the HT-based methods, and it is at least an order of magnitude faster than the HT approach.

••

TL;DR: It is shown that the feedforward network (FFN) pattern learning rule is a first-order approximation of the FFN-batch learning rule, and is valid for nonlinear activation networks provided the learning rate is small.

Abstract: Four types of neural net learning rules are discussed for dynamic system identification. It is shown that the feedforward network (FFN) pattern learning rule is a first-order approximation of the FFN-batch learning rule. As a result, pattern learning is valid for nonlinear activation networks provided the learning rate is small. For recurrent types of networks (RecNs), RecN-pattern learning is different from RecN-batch learning. However, the difference can be controlled by using small learning rates. While RecN-batch learning is strict in a mathematical sense, RecN-pattern learning is simple to implement and can be implemented in a real-time manner. Simulation results agree very well with the theorems derived. It is shown by simulation that for system identification problems, recurrent networks are less sensitive to noise.

••

TL;DR: It is shown that neural network classifiers with single-layer training can be applied efficiently to complex real-world classification problems such as the recognition of handwritten digits and provided appropriate data representations and learning rules are used, performance comparable to that obtained by more complex networks can be achieved.

Abstract: It is shown that neural network classifiers with single-layer training can be applied efficiently to complex real-world classification problems such as the recognition of handwritten digits. The STEPNET procedure, which decomposes the problem into simpler subproblems which can be solved by linear separators, is introduced. Provided appropriate data representations and learning rules are used, performance comparable to that obtained by more complex networks can be achieved. Results from two different databases are presented: a European database comprising 8700 isolated digits and a zip code database from the US Postal Service comprising 9000 segmented digits. A hardware implementation of the classifier is briefly described.

••

TL;DR: The network proposed by M.P. Kennedy and L.O. Chua is justified from the viewpoint of optimization theory and the technique is extended to solve optimization problems, such as the least-squares problem.

Abstract: Neural networks for linear and quadratic programming are analyzed. The network proposed by M.P. Kennedy and L.O. Chua (IEEE Trans. Circuits Syst., vol.35, pp.554-562, May 1988) is justified from the viewpoint of optimization theory and the technique is extended to solve optimization problems, such as the least-squares problem. For quadratic programming, the network converges either to an equilibrium or to an exact solution, depending on whether the problem has constraints or not. The results also suggest an analytical approach to solve the linear system Bx=b without calculating the matrix inverse. The results are directly applicable to optimization problems with C/sup 2/ convex objective functions and linear constraints. The dynamics and applicability of the networks are demonstrated by simulation. The distance between the equilibria of the networks and the problem solutions can be controlled by the appropriate choice of a network parameter.
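The remark about solving Bx=b without calculating a matrix inverse can be sketched as a gradient flow on the least-squares energy; the particular matrix, step size, and Euler discretization below are illustrative assumptions rather than the paper's circuit model.

```python
import numpy as np

# Solve B x = b without computing an inverse: follow the gradient flow of
# E(x) = ||B x - b||^2 / 2, i.e. x_dot = -B^T (B x - b), with Euler steps.
B = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0])
x = np.zeros(2)
dt = 0.01                    # must satisfy dt < 2 / lambda_max(B^T B)
for _ in range(5000):
    x = x + dt * (-B.T @ (B @ x - b))
residual = np.linalg.norm(B @ x - b)   # flow settles at the exact solution
```

In the analog-network view, this iteration is the discretization of the circuit dynamics, and the equilibrium of the flow is the least-squares solution of the system.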

••

TL;DR: A neural pattern recognition system which is insensitive to rotation of input pattern by various degrees is proposed and was used in a rotation-invariant coin recognition problem to distinguish between a 500 yen coin and a 500 won coin.

Abstract: In pattern recognition, it is often necessary to deal with the problem of classifying transformed patterns. A neural pattern recognition system which is insensitive to rotation of the input pattern by various degrees is proposed. The system consists of a fixed invariance network with many slabs and a trainable multilayered network. The system was used in a rotation-invariant coin recognition problem to distinguish between a 500 yen coin and a 500 won coin. The results show that the approach works well for variable rotation pattern recognition.

••

TL;DR: A new approach to the fuzzy c spherical shells algorithm is presented, which uses a cluster validity measure to identify good clusters, merges all compatible clusters, and eliminates spurious clusters to achieve the final results.

Abstract: The fuzzy c spherical shells (FCSS) algorithm is specially designed to search for clusters that can be described by circular arcs or, generally, by shells of hyperspheres. A new approach to the FCSS algorithm is presented. This algorithm is computationally and implementationally simpler than other clustering algorithms that have been suggested for this purpose. An unsupervised variant of the algorithm, which automatically finds the optimum number of clusters, is also described: it uses a cluster validity measure to identify good clusters, merges all compatible clusters, and eliminates spurious clusters to achieve the final results. Experimental results on several data sets are presented.