
Showing papers in "IEEE Transactions on Neural Networks in 1997"


Journal ArticleDOI
TL;DR: A hybrid neural-network for human face recognition which compares favourably with other methods and analyzes the computational complexity and discusses how new classes could be added to the trained recognizer.
Abstract: We present a hybrid neural-network for human face recognition which compares favourably with other methods. The system combines local image sampling, a self-organizing map (SOM) neural network, and a convolutional neural network. The SOM provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides partial invariance to translation, rotation, scale, and deformation. The convolutional network extracts successively larger features in a hierarchical set of layers. We present results using the Karhunen-Loeve transform in place of the SOM, and a multilayer perceptron (MLP) in place of the convolutional network for comparison. We use a database of 400 images of 40 individuals which contains quite a high degree of variability in expression, pose, and facial details. We analyze the computational complexity and discuss how new classes could be added to the trained recognizer.
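A minimal sketch of the front end described above, with illustrative patch and map sizes rather than the paper's exact settings: overlapping windows are sampled from the image, a small self-organizing map quantizes them, and each patch is replaced by its best-matching unit's map coordinates before the convolutional stage.

```python
# Local image sampling + SOM quantization (illustrative sizes, not the paper's).
import numpy as np

def extract_patches(img, size=5, step=4):
    """Slide a size x size window over the image and return flattened patches."""
    h, w = img.shape
    return np.array([img[r:r+size, c:c+size].ravel()
                     for r in range(0, h - size + 1, step)
                     for c in range(0, w - size + 1, step)])

def train_som(data, rows=8, cols=8, iters=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal 2-D self-organizing map; returns the weight grid."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(rows, cols, data.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), -1)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        d = ((w - x) ** 2).sum(-1)
        bmu = np.unravel_index(d.argmin(), d.shape)        # best-matching unit
        lr = lr0 * (1 - t / iters)
        sigma = sigma0 * (1 - t / iters) + 0.5
        h = np.exp(-((grid - bmu) ** 2).sum(-1) / (2 * sigma ** 2))
        w += lr * h[..., None] * (x - w)                   # neighborhood update
    return w

img = np.random.rand(112, 92)        # stand-in for one ORL-sized face image
patches = extract_patches(img)
som = train_som(patches)
# Each patch is replaced by the map coordinates of its best-matching unit,
# giving the dimensionality-reduced representation fed to the conv. network.
coords = np.array([np.unravel_index(((som - p) ** 2).sum(-1).argmin(), som.shape[:2])
                   for p in patches])
```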

2,954 citations


Journal ArticleDOI
TL;DR: V. Vapnik's book "The Nature of Statistical Learning Theory".
Abstract: Not available; the record indexed here is V. Vapnik's book "The Nature of Statistical Learning Theory".

2,716 citations


Journal ArticleDOI
TL;DR: A comment noting that Example 2 in the paper by Narendra and Parthasarathy (ibid., vol. 1, pp. 4-27, 1990) has a third equilibrium state at the point (0.5, 0.5).
Abstract: Referring to the paper by Narendra and Parthasarathy (ibid., vol. 1, pp. 4-27, 1990), it is noted that Example 2 (p. 15) has a third equilibrium state corresponding to the point (0.5, 0.5).
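A quick numerical check of the comment, assuming the Example 2 plant given in the referenced paper, y(k+1) = y(k)y(k-1)[y(k)+2.5] / [1 + y(k)^2 + y(k-1)^2] + u(k): with u = 0, an equilibrium satisfies y = f(y, y).

```python
# Fixed points of the assumed Example 2 plant with zero input.
def f(y1, y0, u=0.0):
    return y1 * y0 * (y1 + 2.5) / (1.0 + y1**2 + y0**2) + u

for y in (0.0, 2.0, 0.5):      # the two known equilibria, plus the third
    print(y, f(y, y))          # f(y, y) == y at each point, incl. (0.5, 0.5)
```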

1,528 citations


Journal ArticleDOI
TL;DR: In this paper, the authors discuss a variety of adaptive critic designs (ACDs) for neurocontrol, which are suitable for learning in noisy, nonlinear, and nonstationary environments. They have common roots as generalizations of dynamic programming for neural reinforcement learning approaches.
Abstract: We discuss a variety of adaptive critic designs (ACDs) for neurocontrol. These are suitable for learning in noisy, nonlinear, and nonstationary environments. They have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Our discussion of these origins leads to an explanation of three design families: heuristic dynamic programming (HDP), dual heuristic programming (DHP), and globalized dual heuristic programming (GDHP). The main emphasis is on DHP and GDHP as advanced ACDs. We suggest two new modifications of the original GDHP design that are currently the only working implementations of GDHP. They promise to be useful for many engineering applications in the areas of optimization and optimal control. Based on one of these modifications, we present a unified approach to all ACDs. This leads to a generalized training procedure for ACDs.
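For orientation, a toy sketch of the simplest family named above, an HDP critic update; the linear critic, quadratic utility, and sample transition are illustrative assumptions, not the paper's designs.

```python
# Toy HDP critic: J(x) ~ w . phi(x) is trained toward the Bellman target
# U(x,u) + gamma * J(x').
import numpy as np

gamma = 0.95
w = np.zeros(3)                                # critic weights
phi = lambda x: np.array([1.0, x, x * x])      # critic features

def critic_update(x, u, x_next, lr=0.01):
    global w
    U = x * x + 0.1 * u * u                    # instantaneous utility (cost)
    target = U + gamma * w @ phi(x_next)       # Bellman target
    w += lr * (target - w @ phi(x)) * phi(x)   # move J(x) toward the target

critic_update(x=1.0, u=-0.2, x_next=0.7)       # one transition of a toy plant
```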

1,109 citations


Journal ArticleDOI
TL;DR: The experimental results show that EPNet can produce very compact ANNs with good generalization ability in comparison with other algorithms, and has been tested on a number of benchmark problems in machine learning and ANNs.
Abstract: This paper presents a new evolutionary system, i.e., EPNet, for evolving artificial neural networks (ANNs). The evolutionary algorithm used in EPNet is based on Fogel's evolutionary programming (EP). Unlike most previous studies on evolving ANNs, this paper puts its emphasis on evolving ANN behaviors. Five mutation operators proposed in EPNet reflect such an emphasis on evolving behaviors. Close behavioral links between parents and their offspring are maintained by various mutations, such as partial training and node splitting. EPNet evolves ANN architectures and connection weights (including biases) simultaneously in order to reduce the noise in fitness evaluation. The parsimony of evolved ANNs is encouraged by preferring node/connection deletion to addition. EPNet has been tested on a number of benchmark problems in machine learning and ANNs, such as the parity problem, the medical diagnosis problems, the Australian credit card assessment problem, and the Mackey-Glass time series prediction problem. The experimental results show that EPNet can produce very compact ANNs with good generalization ability in comparison with other algorithms.
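A simplified reading of the acceptance loop implied above (not EPNet's exact procedure): deletion mutations are tried before additions, and offspring are partially trained so their behavior stays close to the parent's.

```python
# Schematic mutation ordering with parsimony preference; 'ops' lists deletion
# operators before addition operators, all supplied by the caller.
def mutate(net, ops, train_partially, improved):
    """Return the first partially trained offspring that improves on 'net'."""
    for op in ops:
        child = op(net)
        train_partially(child)       # maintain behavioral parent-child link
        if improved(child, net):
            return child             # accept the first successful mutation
    return net                       # no mutation helped; keep the parent
```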

891 citations


Journal ArticleDOI
TL;DR: Algorithms for wavelet network construction are proposed for the purpose of nonparametric regression estimation, with particular attention paid to sparse training data so that problems of large dimension can be better handled.
Abstract: Wavelet networks are a class of neural networks consisting of wavelets. In this paper, algorithms for wavelet network construction are proposed for the purpose of nonparametric regression estimation. Particular attention is paid to sparse training data so that problems of large dimension can be better handled. A numerical example on nonlinear system identification is presented for illustration.
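A toy sketch of a wavelet network for one-dimensional regression, assuming a Mexican-hat mother wavelet and a small dyadic grid of dilations and translations; the paper's construction algorithms select such wavelets far more carefully, especially for sparse, high-dimensional data.

```python
# 1-D wavelet network fit by least squares; grid and wavelet are assumptions.
import numpy as np

def mexhat(t):
    return (1 - t**2) * np.exp(-t**2 / 2)      # "Mexican hat" mother wavelet

grid = [(2.0**j, k) for j in range(3) for k in range(-4, 5)]   # dilations/shifts

def design(x):
    return np.column_stack([mexhat(a * x - k) for a, k in grid])

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 60)                     # sparse training sample
y = np.sin(2 * x) + 0.05 * rng.normal(size=60)
w, *_ = np.linalg.lstsq(design(x), y, rcond=None)   # output-layer weights
y_hat = design(np.linspace(-3, 3, 200)) @ w         # fitted regression curve
```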

760 citations


Journal ArticleDOI
TL;DR: A self-organized neural network performing two tasks: vector quantization of the submanifold in the data set (input space) and nonlinear projection of these quantizing vectors toward an output space, providing a revealing unfolding of the submanifold.
Abstract: We present a new strategy called "curvilinear component analysis" (CCA) for dimensionality reduction and representation of multidimensional data sets. The principle of CCA is a self-organized neural network performing two tasks: vector quantization (VQ) of the submanifold in the data set (input space); and nonlinear projection (P) of these quantizing vectors toward an output space, providing a revealing unfolding of the submanifold. After learning, the network has the ability to continuously map any new point from one space into another: forward mapping of new points in the input space, or backward mapping of an arbitrary position in the output space.
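A batch-gradient toy version of the projection step, assuming a step-function weighting F with neighborhood radius lam; the paper's algorithm is an online, self-organizing variant of the same idea.

```python
# Toy batch version of CCA's nonlinear projection; F is a step function.
import numpy as np

def cca_step(X_hd, X_ld, lam=1.0, lr=0.05):
    """One pass over E = 1/2 sum_ij (Xij - Yij)^2 F(Yij); updates X_ld in place."""
    n = len(X_hd)
    for i in range(n):
        d_hd = np.linalg.norm(X_hd - X_hd[i], axis=1)   # input-space distances
        d_ld = np.linalg.norm(X_ld - X_ld[i], axis=1)   # output-space distances
        mask = (d_ld < lam) & (np.arange(n) != i)       # F: keep local pairs only
        dirs = (X_ld[mask] - X_ld[i]) / (d_ld[mask, None] + 1e-9)
        X_ld[mask] += lr * (d_hd - d_ld)[mask, None] * dirs   # match distances
    return X_ld

rng = np.random.default_rng(0)
hd = rng.normal(size=(100, 5))           # data (or VQ prototypes) in input space
ld = 0.1 * rng.normal(size=(100, 2))     # initial 2-D projection
for _ in range(50):
    ld = cca_step(hd, ld)                # unfolded representation of the data
```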

721 citations


Journal ArticleDOI
TL;DR: The paper demonstrates a successful application of PDBNN to face recognition on two public (FERET and ORL) and one in-house (SCR) databases, and elaborates experimental results on all three, including recognition accuracies as well as false rejection and false acceptance rates.
Abstract: This paper proposes a face recognition system based on probabilistic decision-based neural networks (PDBNN). With technological advances in microelectronics and vision systems, high-performance automatic biometric recognition techniques are now becoming economically feasible. Among all biometric identification methods, face recognition has attracted much attention in recent years because it has the potential to be the most nonintrusive and user-friendly. The PDBNN face recognition system consists of three modules: first, a face detector finds the location of a human face in an image; then an eye localizer determines the positions of both eyes in order to generate meaningful feature vectors. The facial region proposed contains eyebrows, eyes, and nose, but excludes the mouth (eyeglasses are allowed). Lastly, the third module is a face recognizer. The PDBNN can be effectively applied to all three modules. It adopts a hierarchical network structure with nonlinear basis functions and a competitive credit-assignment scheme. The paper demonstrates a successful application of PDBNN to face recognition on two public (FERET and ORL) and one in-house (SCR) databases. Regarding performance, experimental results on the three databases, including recognition accuracies as well as false rejection and false acceptance rates, are elaborated. As to processing speed, the whole recognition process (including PDBNN processing for eye localization, feature extraction, and classification) takes approximately one second on a Sparc10, without using a hardware accelerator or coprocessor.

637 citations


Journal ArticleDOI
TL;DR: It is shown that neural networks can, in fact, represent and classify structured patterns and all the supervised networks developed for the classification of sequences can, on the whole, be generalized to structures.
Abstract: Standard neural networks and statistical methods are usually believed to be inadequate when dealing with complex structures because of their feature-based approach. In fact, feature-based approaches usually fail to give satisfactory solutions because of their sensitivity to the a priori selection of the features and their inability to represent specific information about the relationships among the components of the structures. However, we show that neural networks can, in fact, represent and classify structured patterns. The key idea underpinning our approach is the use of the so-called "generalized recursive neuron," which is essentially a generalization to structures of a recurrent neuron. By using generalized recursive neurons, all the supervised networks developed for the classification of sequences, such as backpropagation through time networks, real-time recurrent networks, simple recurrent networks, recurrent cascade correlation networks, and neural trees can, on the whole, be generalized to structures. The results obtained by some of the above networks (with generalized recursive neurons) on the classification of logic terms are presented.
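A minimal sketch of a generalized recursive neuron layer, with hypothetical state, label, and arity sizes: a node's code is computed from its label and the recursively computed codes of its children, yielding a fixed-size vector for any tree.

```python
# Bottom-up encoding of a labeled tree by a generalized recursive neuron.
import numpy as np

STATE, LABEL, ARITY = 4, 3, 2            # hypothetical sizes
rng = np.random.default_rng(0)
W = rng.normal(scale=0.3, size=(STATE, LABEL))          # label weights
U = rng.normal(scale=0.3, size=(ARITY, STATE, STATE))   # one matrix per child slot

def encode(tree):
    """tree = (label_vector, [child_trees]); returns a fixed-size state vector."""
    label, children = tree
    s = W @ label
    for slot, child in enumerate(children):
        s = s + U[slot] @ encode(child)  # recursive (structural) connections
    return np.tanh(s)

leaf = (np.array([1.0, 0.0, 0.0]), [])
term = (np.array([0.0, 1.0, 0.0]), [leaf, (np.array([0.0, 0.0, 1.0]), [leaf])])
code = encode(term)   # feed this vector to any standard classifier
```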

569 citations


Journal ArticleDOI
TL;DR: This interpretation of neural networks is built with fuzzy rules using a new fuzzy logic operator which is defined after introducing the concept of f-duality and offers an automated knowledge acquisition procedure.
Abstract: Artificial neural networks are efficient computing models which have shown their strengths in solving hard problems in artificial intelligence. They have also been shown to be universal approximators. Nevertheless, a major criticism is that they are black boxes, since no satisfactory explanation of their behavior has been offered. In this paper, we provide such an interpretation of neural networks so that they will no longer be seen as black boxes. This is stated after establishing the equality between a certain class of neural nets and fuzzy rule-based systems. This interpretation is built with fuzzy rules using a new fuzzy logic operator which is defined after introducing the concept of f-duality. In addition, this interpretation offers an automated knowledge acquisition procedure.

488 citations


Journal ArticleDOI
TL;DR: A case is made in this paper that such approximate input-output models warrant a detailed study in their own right in view of their mathematical tractability as well as their success in simulation studies.
Abstract: The NARMA model is an exact representation of the input-output behavior of finite-dimensional nonlinear discrete-time dynamical systems in a neighborhood of the equilibrium state. However, it is not convenient for purposes of adaptive control using neural networks due to its nonlinear dependence on the control input. Hence, quite often, approximate methods are used for realizing the neural controllers to overcome computational complexity. In this paper, we introduce two classes of models which are approximations to the NARMA model, and which are linear in the control input. The latter fact substantially simplifies both the theoretical analysis as well as the practical implementation of the controller. Extensive simulation studies have shown that the neural controllers designed using the proposed approximate models perform very well, and in many cases even better than an approximate controller designed using the exact NARMA model. In view of their mathematical tractability as well as their success in simulation studies, a case is made in this paper that such approximate input-output models warrant a detailed study in their own right.
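As one concrete instance, a model that is linear in the control (often written NARMA-L2, shown here schematically with placeholder approximators f_net and g_net) lets the control law be read off algebraically instead of inverting the network iteratively.

```python
# Placeholder approximators f_net/g_net map past outputs/inputs to scalars.
def narma_l2_control(y_ref, past, f_net, g_net, eps=1e-6):
    """Solve f(past) + g(past) * u = y_ref algebraically for the control u."""
    g = g_net(past)
    return (y_ref - f_net(past)) / (g if abs(g) > eps else eps)
```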

Journal ArticleDOI
TL;DR: A survey of constructive algorithms for structure learning in feed-forward neural networks for regression problems can be found in this paper, where the authors formulate the whole problem as a state-space search, with special emphasis on the search strategy.
Abstract: In this survey paper, we review the constructive algorithms for structure learning in feedforward neural networks for regression problems. The basic idea is to start with a small network, then add hidden units and weights incrementally until a satisfactory solution is found. By formulating the whole problem as a state-space search, we first describe the general issues in constructive algorithms, with special emphasis on the search strategy. A taxonomy, based on the differences in the state transition mapping, the training algorithm, and the network architecture, is then presented.

Journal ArticleDOI
TL;DR: This paper proposes neural structures related to multilayer feedforward networks for performing complete independent component analysis (ICA) and modifies previous nonlinear PCA type algorithms so that their separation capabilities are greatly improved.
Abstract: Independent component analysis (ICA) is a recently developed, useful extension of standard principal component analysis (PCA). The ICA model is utilized mainly in blind separation of unknown source signals from their linear mixtures. In this application only the source signals which correspond to the coefficients of the ICA expansion are of interest. In this paper, we propose neural structures related to multilayer feedforward networks for performing complete ICA. The basic ICA network consists of whitening, separation, and basis vector estimation layers. It can be used for both blind source separation and estimation of the basis vectors of ICA. We consider learning algorithms for each layer, and modify our previous nonlinear PCA type algorithms so that their separation capabilities are greatly improved. The proposed class of networks yields good results in test examples with both artificial and real-world data.
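A compact sketch of the two trainable layers, with a tanh nonlinearity chosen purely for illustration (the paper analyzes several choices) and a synthetic two-source mixture as input.

```python
# Whitening layer followed by a separation layer with a nonlinear PCA-type rule.
import numpy as np

def whiten(X):
    """Whitening layer: zero mean, identity covariance (rows are samples)."""
    Xc = X - X.mean(0)
    d, E = np.linalg.eigh(np.cov(Xc.T))
    return Xc @ E @ np.diag(d ** -0.5) @ E.T

def separate(V, lr=0.01, epochs=50, seed=0):
    """Separation layer trained sample-by-sample."""
    rng = np.random.default_rng(seed)
    W = np.linalg.qr(rng.normal(size=(V.shape[1], V.shape[1])))[0]
    for _ in range(epochs):
        for v in V:
            g = np.tanh(W @ v)
            W += lr * np.outer(g, v - W.T @ g)   # nonlinear PCA subspace rule
    return W

t = np.linspace(0, 60, 500)
S = np.c_[np.sin(0.7 * t), np.sign(np.sin(1.3 * t))]   # two sources
X = S @ np.array([[1.0, 0.6], [0.4, 1.0]])              # linear mixture
V = whiten(X)
Y = V @ separate(V).T     # recovered sources, up to order and scale
```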

Journal ArticleDOI
TL;DR: This paper proposes the use of a three-layer feedforward neural network to select those input attributes that are most useful for discriminating classes in a given set of input patterns.
Abstract: Feature selection is an integral part of most learning algorithms. Due to the existence of irrelevant and redundant attributes, by selecting only the relevant attributes of the data, higher predictive accuracy can be expected from a machine learning method. In this paper, we propose the use of a three-layer feedforward neural network to select those input attributes that are most useful for discriminating classes in a given set of input patterns. A network pruning algorithm is the foundation of the proposed algorithm. By adding a penalty term to the error function of the network, redundant network connections can be distinguished from those relevant ones by their small weights when the network training process has been completed. A simple criterion to remove an attribute based on the accuracy rate of the network is developed. The network is retrained after removal of an attribute, and the selection process is repeated until no attribute meets the criterion for removal. Our experimental results suggest that the proposed method works very well on a wide variety of classification problems.
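A schematic version of the selection loop, where train and accuracy stand in for penalized network training and evaluation, and the tolerance tol is an illustrative removal criterion rather than the paper's exact rule.

```python
# Greedy backward attribute elimination driven by network accuracy.
import numpy as np

def select_attributes(X, y, train, accuracy, tol=0.01):
    feats = list(range(X.shape[1]))
    net = train(X[:, feats], y)                 # penalized training
    base = accuracy(net, X[:, feats], y)
    while len(feats) > 1:
        # Try each attribute; keep the removal that hurts accuracy least.
        trials = []
        for f in feats:
            rest = [g for g in feats if g != f]
            cand = train(X[:, rest], y)         # retrain without attribute f
            trials.append((accuracy(cand, X[:, rest], y), f, cand))
        acc, f, cand = max(trials)
        if acc < base - tol:
            break                               # no attribute meets the criterion
        feats.remove(f)
        base, net = acc, cand
    return feats, net
```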

Journal ArticleDOI
TL;DR: A proof is given showing that a three-layered feedforward network with N-1 hidden units can realize any N input-target relations exactly, and a four-layered network is constructed that gives any N input-target relations with a negligibly small error using only (N/2)+3 hidden units.
Abstract: Neural-network theorems state that only when there are infinitely many hidden units is a four-layered feedforward neural network equivalent to a three-layered feedforward neural network. In actual applications, however, the use of infinitely many hidden units is impractical. Therefore, studies should focus on the capabilities of a neural network with a finite number of hidden units. In this paper, a proof is given showing that a three-layered feedforward network with N-1 hidden units can give any N input-target relations exactly. Based on results of the proof, a four-layered network is constructed and is found to give any N input-target relations with a negligibly small error using only (N/2)+3 hidden units. This shows that a four-layered feedforward network is superior to a three-layered feedforward network in terms of the number of parameters needed for the training data.
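The three-layer half of the result can be checked numerically with a standard steep-sigmoid construction (an illustrative choice; the paper's own proof differs in detail): N-1 hidden units plus an output bias reproduce N scalar input-target pairs exactly.

```python
# N targets matched exactly by N-1 sigmoid hidden units (plus output bias).
import numpy as np

rng = np.random.default_rng(0)
N = 8
x = np.linspace(0.0, 1.0, N)               # N distinct scalar inputs
t = rng.normal(size=N)                     # arbitrary targets

a = 50.0                                   # steep sigmoids act like steps
c = (x[:-1] + x[1:]) / 2                   # N-1 thresholds between the inputs
H = 1 / (1 + np.exp(-a * (x[:, None] - c[None, :])))   # N x (N-1) hidden outputs
A = np.c_[H, np.ones(N)]                   # append the output bias column
w = np.linalg.solve(A, t)                  # exact output weights
print(np.abs(A @ w - t).max())             # ~1e-13: all N relations reproduced
```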

Journal ArticleDOI
TL;DR: Two new methods for modeling the manifolds of digitized images of handwritten digits are described, one grounded in principal components analysis and the other in factor analysis, both based on locally linear low-dimensional approximations to the underlying data manifold.
Abstract: This paper describes two new methods for modeling the manifolds of digitized images of handwritten digits. The models allow a priori information about the structure of the manifolds to be combined with empirical data. Accurate modeling of the manifolds allows digits to be discriminated using the relative probability densities under the alternative models. One of the methods is grounded in principal components analysis, the other in factor analysis. Both methods are based on locally linear low-dimensional approximations to the underlying data manifold. Links with other methods that model the manifold are discussed.

Journal ArticleDOI
TL;DR: A statistical theory for overtraining is proposed and it is shown that the asymptotic gain in the generalization error is small if the authors perform early stopping, even if they have access to the optimal stopping time.
Abstract: A statistical theory for overtraining is proposed. The analysis treats general realizable stochastic neural networks, trained with Kullback-Leibler divergence in the asymptotic case of a large number of training examples. It is shown that the asymptotic gain in the generalization error is small if we perform early stopping, even if we have access to the optimal stopping time. Based on cross-validation stopping, we consider the ratio in which the examples should be divided into training and cross-validation sets in order to obtain optimum performance. Although cross-validated early stopping is useless in the asymptotic region, it surely decreases the generalization error in the nonasymptotic region. Our large-scale simulations done on a CM5 are in good agreement with our analytical findings.

Journal ArticleDOI
TL;DR: This paper introduces a precise definition of topology preservation and proposes a tool for measuring it, the topographic function, and demonstrates the power of this tool for various examples of data manifolds.
Abstract: The neighborhood preservation of self-organizing feature maps like the Kohonen map is an important property which is exploited in many applications. However, if a dimensional conflict arises this property is lost. Various qualitative and quantitative approaches are known for measuring the degree of topology preservation. They are based on using the locations of the synaptic weight vectors. These approaches, however, may fail in case of nonlinear data manifolds. To overcome this problem, in this paper we present an approach which uses what we call the induced receptive fields for determining the degree of topology preservation. We first introduce a precise definition of topology preservation and then propose a tool for measuring it, the topographic function. The topographic function vanishes if and only if the map is topology preserving. We demonstrate the power of this tool for various examples of data manifolds.

Journal ArticleDOI
TL;DR: The proposed hybrid learning scheme provides a framework for incorporating existing algorithms in the training of GRBF networks, which include unsupervised algorithms for clustering and learning vector quantization, as well as learning algorithms for training single-layer linear neural networks.
Abstract: This paper proposes a framework for constructing and training radial basis function (RBF) neural networks. The proposed growing radial basis function (GRBF) network begins with a small number of prototypes, which determine the locations of radial basis functions. In the process of training, the GRBF network grows by splitting one of the prototypes at each growing cycle. Two splitting criteria are proposed to determine which prototype to split in each growing cycle. The proposed hybrid learning scheme provides a framework for incorporating existing algorithms in the training of GRBF networks. These include unsupervised algorithms for clustering and learning vector quantization, as well as learning algorithms for training single-layer linear neural networks. A supervised learning scheme based on the minimization of the localized class-conditional variance is also proposed and tested. GRBF neural networks are evaluated and tested on a variety of data sets with very satisfactory results.
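One possible reading of the growing step, heavily simplified: split the prototype whose local data show the largest spread. The variance-based criterion below is only illustrative; the paper proposes two specific splitting criteria.

```python
# Illustrative growing step: split the prototype with the largest local spread.
import numpy as np

def split_worst(prototypes, X, assign, rng):
    """assign[i] = index of the prototype nearest to sample X[i]."""
    spreads = [X[assign == k].var() if np.any(assign == k) else 0.0
               for k in range(len(prototypes))]
    k = int(np.argmax(spreads))                        # prototype to split
    eps = 0.01 * rng.normal(size=prototypes[k].shape)  # small perturbation
    return np.vstack([np.delete(prototypes, k, axis=0),
                      prototypes[k] + eps,
                      prototypes[k] - eps])            # grown by one prototype
```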

Journal ArticleDOI
TL;DR: A new pruning method is developed, based on the idea of iteratively eliminating units and adjusting the remaining weights in such a way that the network performance does not worsen over the entire training set.
Abstract: The problem of determining the proper size of an artificial neural network is recognized to be crucial, especially for its practical implications in such important issues as learning and generalization. One popular approach for tackling this problem is commonly known as pruning and it consists of training a larger than necessary network and then removing unnecessary weights/nodes. In this paper, a new pruning method is developed, based on the idea of iteratively eliminating units and adjusting the remaining weights in such a way that the network performance does not worsen over the entire training set. The pruning problem is formulated in terms of solving a system of linear equations, and a very efficient conjugate gradient algorithm is used for solving it, in the least-squares sense. The algorithm also provides a simple criterion for choosing the units to be removed, which has proved to work well in practice. The results obtained over various test problems demonstrate the effectiveness of the proposed approach.
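The core linear-algebra step can be sketched directly from the abstract: removing hidden unit k and correcting the remaining output weights is a least-squares problem over the training set (solved here with lstsq for brevity; the paper uses a conjugate gradient method).

```python
# Remove one hidden unit while preserving the network's outputs.
import numpy as np

def prune_unit(H, v, k):
    """H: hidden outputs over the training set (n_samples x n_hidden),
    v: hidden-to-output weights, k: index of the unit to remove."""
    rest = [j for j in range(H.shape[1]) if j != k]
    # Find corrections d so that H[:, rest] @ (v[rest] + d) ~= H @ v.
    d, *_ = np.linalg.lstsq(H[:, rest], H[:, k] * v[k], rcond=None)
    return rest, v[rest] + d

# The residual of this fit also ranks candidate units: remove the unit whose
# contribution can be compensated with the smallest error over the training set.
```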

Journal ArticleDOI
TL;DR: A load forecasting system known as ANNSTLF (artificial neural-network short-term load forecaster) which has received wide acceptance by the electric utility industry and presently is being used by 32 utilities across the USA and Canada is described.
Abstract: A key component of the daily operation and planning activities of an electric utility is short-term load forecasting, i.e., the prediction of hourly loads (demand) for the next hour to several days out. The accuracy of such forecasts has significant economic impact for the utility. This paper describes a load forecasting system known as ANNSTLF (artificial neural-network short-term load forecaster) which has received wide acceptance by the electric utility industry and presently is being used by 32 utilities across the USA and Canada. ANNSTLF can consider the effect of temperature and relative humidity on the load. Besides its load forecasting engine, ANNSTLF contains forecasters that can generate the hourly temperature and relative humidity forecasts needed by the system. ANNSTLF is based on a multiple-ANN strategy that captures various trends in the data. Both the first and the second generation of the load forecasting engine are discussed and compared. The building block of the forecasters is a multilayer perceptron trained with the error backpropagation learning rule. An adaptive scheme is employed to adjust the ANN weights during online forecasting. The forecasting models are site independent and only the number of hidden layer nodes of the ANNs needs to be adjusted for a new database. The results of testing the system on data from ten different utilities are reported.

Journal ArticleDOI
TL;DR: The aim is to derive a class of objective functions the computation of which and the corresponding weight updates can be done in O(N) time, where N is the number of training patterns.
Abstract: In this paper, we study a number of objective functions for training new hidden units in constructive algorithms for multilayer feedforward networks. The aim is to derive a class of objective functions the computation of which and the corresponding weight updates can be done in O(N) time, where N is the number of training patterns. Moreover, even though input weight freezing is applied during the process for computational efficiency, the convergence property of the constructive algorithms using these objective functions is still preserved. We also propose a few computational tricks that can be used to improve the optimization of the objective functions under practical situations. Their relative performance in a set of two-dimensional regression problems is also discussed.

Journal ArticleDOI
TL;DR: This paper presents an online system for fraud detection of credit card operations based on a neural classifier that is fully operational and currently handles more than 12 million operations per year with very satisfactory results.
Abstract: This paper presents an online system for fraud detection of credit card operations based on a neural classifier. Since it is installed in a transactional hub for operation distribution, and not on a card-issuing institution, it acts solely on the information of the operation to be rated and of its immediate previous history, and not on historic databases of past cardholder activities. Among the main characteristics of credit card traffic are the great imbalance between proper and fraudulent operations, and a great degree of mixing between both. To ensure proper model construction, a nonlinear version of Fisher's discriminant analysis, which adequately separates a good proportion of fraudulent operations away from operations closer to normal traffic, has been used. The system is fully operational and currently handles more than 12 million operations per year with very satisfactory results.
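For intuition, a sketch of a Fisher discriminant computed on nonlinearly expanded features, one generic way to realize a "nonlinear version of Fisher's discriminant analysis"; the paper's exact construction is not detailed in the abstract, and the data here are synthetic.

```python
# Fisher discriminant on quadratically expanded features; data are synthetic.
import numpy as np

def fisher_direction(A, B):
    """w maximizing between-class vs. within-class scatter: Sw^-1 (mA - mB)."""
    Sw = np.cov(A.T) + np.cov(B.T)
    return np.linalg.solve(Sw, A.mean(0) - B.mean(0))

expand = lambda X: np.c_[X, X**2, X[:, :1] * X[:, 1:2]]   # nonlinear features
rng = np.random.default_rng(0)
legit = rng.normal(0.0, 1.0, (500, 2))     # proper operations (majority)
fraud = rng.normal(0.0, 2.0, (25, 2))      # rare, heavily mixed-in class
w = fisher_direction(expand(legit), expand(fraud))
scores = expand(fraud) @ w                 # threshold this score to flag fraud
```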

Journal ArticleDOI
TL;DR: A case study of neural-network modeling techniques developed for the EMERALD system, which modeled a subset of modules representing over seven million lines of code from a very large telecommunications software system and found the neural-network model had better predictive accuracy.
Abstract: Society relies on telecommunications to such an extent that telecommunications software must have high reliability. Enhanced measurement for early risk assessment of latent defects (EMERALD) is a joint project of Nortel and Bell Canada for improving the reliability of telecommunications software products. This paper reports a case study of neural-network modeling techniques developed for the EMERALD system. The resulting neural network is currently in the prototype testing phase at Nortel. Neural-network models can be used to identify fault-prone modules for extra attention early in development, and thus reduce the risk of operational problems with those modules. We modeled a subset of modules representing over seven million lines of code from a very large telecommunications software system. The set consisted of those modules reused with changes from the previous release. The dependent variable was membership in the class of fault-prone modules. The independent variables were principal components of nine measures of software design attributes. We compared the neural-network model with a nonparametric discriminant model and found the neural-network model had better predictive accuracy.

Journal ArticleDOI
TL;DR: It is proved that the degree of population diversity converges to zero with probability one so that the search ability of a GA decreases and premature convergence occurs.
Abstract: In this paper, a concept of degree of population diversity is introduced to quantitatively characterize and theoretically analyze the problem of premature convergence in genetic algorithms (GAs) within the framework of Markov chains. Under the assumption that the mutation probability is zero, the search ability of a GA is discussed. It is proved that the degree of population diversity converges to zero with probability one so that the search ability of a GA decreases and premature convergence occurs. Moreover, an explicit formula for the conditional probability of allele loss at a certain bit position is established to show the relationships between premature convergence and the GA parameters, such as population size, mutation probability, and some population statistics. The formula also partly answers the question of where a GA most likely converges. The theoretical results are all supported by the simulation experiments.
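One simple operational reading of the concept (illustrative, not the paper's formal definition): track how many bit positions still carry both alleles.

```python
# Count positions where the binary population has not yet converged.
import numpy as np

def diversity(pop):
    """pop: (pop_size x chromosome_len) array of 0/1 genes."""
    p = pop.mean(axis=0)                  # allele-1 frequency per position
    lost = (p == 0) | (p == 1)            # positions where an allele is lost
    return int((~lost).sum()), lost

rng = np.random.default_rng(0)
pop = rng.integers(0, 2, size=(20, 16))
n_diverse, lost = diversity(pop)
# With mutation probability zero, selection and crossover can only shrink
# n_diverse over the generations -- the premature-convergence mechanism.
```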

Journal ArticleDOI
Sung-Bae Cho
TL;DR: Three sophisticated neural-network classifiers to solve complex pattern recognition problems: multiple multilayer perceptron (MLP) classifiers, hidden Markov model (HMM)/MLP hybrid classifier, and structure-adaptive self-organizing map (SOM) classifier are presented.
Abstract: Artificial neural networks have been recognized as a powerful tool for pattern classification problems, but a number of researchers have also suggested that straightforward neural-network approaches to pattern recognition are largely inadequate for difficult problems such as handwritten numeral recognition. In this paper, we present three sophisticated neural-network classifiers to solve complex pattern recognition problems: a multiple multilayer perceptron (MLP) classifier, a hidden Markov model (HMM)/MLP hybrid classifier, and a structure-adaptive self-organizing map (SOM) classifier. In order to verify the superiority of the proposed classifiers, experiments were performed with the unconstrained handwritten numeral database of Concordia University, Montreal, Canada. The three methods produced recognition rates of 97.35%, 96.55%, and 96.05%, respectively, which are better than those of several previous methods reported in the literature on the same database.

Journal ArticleDOI
TL;DR: The experimental results obtained with the proposed approach show that the PCNN outperforms both a conjugate-gradient backpropagation neural network and conventional statistical methods in terms of overall classification accuracy of test data.
Abstract: A new type of a neural-network architecture, the parallel consensual neural network (PCNN), is introduced and applied in classification/data fusion of multisource remote sensing and geographic data. The PCNN architecture is based on statistical consensus theory and involves using stage neural networks with transformed input data. The input data are transformed several times and the different transformed data are used as if they were independent inputs. The independent inputs are first classified using the stage neural networks. The output responses from the stage networks are then weighted and combined to make a consensual decision. In this paper, optimization methods are used in order to weight the outputs from the stage networks. Two approaches are proposed to compute the data transforms for the PCNN, one for binary data and another for analog data. The analog approach uses wavelet packets. The experimental results obtained with the proposed approach show that the PCNN outperforms both a conjugate-gradient backpropagation neural network and conventional statistical methods in terms of overall classification accuracy of test data.

Journal ArticleDOI
TL;DR: A number of alternative ways to deal with the problem of variable selection, how to use model misspecification tests, and approaches to predictive neural modeling which are more in tune with the requirements for modeling financial data series are described.
Abstract: Neural networks have shown considerable success in modeling financial data series. However, a major weakness of neural modeling is the lack of established procedures for performing tests for misspecified models, and tests of statistical significance for the various parameters that have been estimated. This is a serious disadvantage in applications where there is a strong culture for testing not only the predictive power of a model or the sensitivity of the dependent variable to changes in the inputs but also the statistical significance of the finding at a specified level of confidence. Rarely is this more important than in the case of financial engineering, where the data generating processes are dominantly stochastic and only partially deterministic. Partly a tutorial, partly a review, this paper describes a collection of typical applications in options pricing, cointegration, the term structure of interest rates, and models of investor behavior which highlight these weaknesses, and proposes and evaluates a number of solutions. We describe a number of alternative ways to deal with the problem of variable selection, show how to use model misspecification tests, deploy a novel approach based on cointegration to deal with the problem of nonstationarity, and generally describe approaches to predictive neural modeling that are more in tune with the requirements for modeling financial data series.

Journal ArticleDOI
TL;DR: Experimental results presented here show that QNNs are capable of recognizing structures in data, a property that conventional FFNNs with sigmoidal hidden units lack.
Abstract: This paper introduces quantum neural networks (QNNs), a class of feedforward neural networks (FFNNs) inherently capable of estimating the structure of a feature space in the form of fuzzy sets. The hidden units of these networks develop quantized representations of the sample information provided by the training data set in various graded levels of certainty. Unlike other approaches attempting to merge fuzzy logic and neural networks, QNNs can be used in pattern classification problems without any restricting assumptions such as the availability of a priori knowledge or desired membership profile, convexity of classes, a limited number of classes, etc. Experimental results presented here show that QNNs are capable of recognizing structures in data, a property that conventional FFNNs with sigmoidal hidden units lack.
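A sketch of the characteristic ingredient, a hidden unit whose transfer function superposes several shifted sigmoids ("quantum levels"), giving a graded, staircase-like response; the level offsets and steepness below are illustrative assumptions.

```python
# Quantum neuron: activation is an average of shifted sigmoids.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def quantum_unit(x, w, thetas, beta=10.0):
    """Superposition of sigmoids shifted by the quantum levels in 'thetas'."""
    net = np.dot(w, x)
    return np.mean([sigmoid(beta * (net - th)) for th in thetas])

x = np.array([0.4, 0.7])
w = np.array([1.0, -0.5])
print(quantum_unit(x, w, thetas=[-0.5, 0.0, 0.5]))   # graded, multilevel response
```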

Journal ArticleDOI
TL;DR: This publication aims at determining the optimal variance (or range) for the initial weights and biases, which is the principal parameter of random initialization methods for both types of neural networks.
Abstract: Proper initialization is one of the most important prerequisites for fast convergence of feedforward neural networks like high-order and multilayer perceptrons. This publication aims at determining the optimal variance (or range) for the initial weights and biases, which is the principal parameter of random initialization methods for both types of neural networks. An overview of random weight initialization methods for multilayer perceptrons is presented. These methods are extensively tested using eight real-world benchmark data sets and a broad range of initial weight variances by means of more than 30,000 simulations, with the aim of finding the best weight initialization method for multilayer perceptrons. For high-order networks, a large number of experiments (more than 200,000 simulations) was performed, using three weight distributions, three activation functions, several network orders, and the same eight data sets. The results of these experiments are compared to weight initialization techniques for multilayer perceptrons, which leads to the proposal of a suitable initialization method for high-order perceptrons. The conclusions on the initialization methods for both types of networks are justified by sufficiently small confidence intervals of the mean convergence times.
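The experimental variable being scanned can be made concrete: draw initial weights uniformly from [-r, r] (variance r^2/3) and compare convergence across candidate values of r; the layer sizes and ranges below are illustrative.

```python
# Scan the initial-weight range r; variance of U(-r, r) is r^2 / 3.
import numpy as np

def init_uniform(shape, r, rng):
    return rng.uniform(-r, r, size=shape)

rng = np.random.default_rng(0)
for r in (0.05, 0.2, 0.5, 1.0, 2.0):       # candidate ranges
    W1 = init_uniform((8, 4), r, rng)      # input-to-hidden weights
    W2 = init_uniform((4, 1), r, rng)      # hidden-to-output weights
    # ... train the network from (W1, W2) and record convergence time ...
    print(f"range {r}: variance {r**2 / 3:.4f}")
```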