
Showing papers in "IEEE Transactions on Neural Networks in 2004"


Journal ArticleDOI
TL;DR: The biological plausibility and computational efficiency of some of the most useful models of spiking and bursting neurons are discussed and their applicability to large-scale simulations of cortical neural networks is compared.
Abstract: We discuss the biological plausibility and computational efficiency of some of the most useful models of spiking and bursting neurons. We compare their applicability to large-scale simulations of cortical neural networks.
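Among the models compared is Izhikevich's own two-variable "simple model," often cited as a good trade-off between biological plausibility and computational cost. A minimal Euler simulation is sketched below; the regular-spiking parameters are standard, but the input current, time step, and duration are illustrative choices, not values from the paper:

```python
import numpy as np

def izhikevich(I=10.0, T=1000.0, dt=0.5, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Euler simulation of the Izhikevich simple model (regular-spiking parameters)."""
    v, u = c, b * c                      # membrane potential and recovery variable
    spike_times = []
    for step in range(int(T / dt)):
        if v >= 30.0:                    # spike cutoff: reset v, bump u
            spike_times.append(step * dt)
            v, u = c, u + d
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
    return np.array(spike_times)

spike_times = izhikevich()               # tonic spiking under constant input
```

With these parameters the neuron fires tonically; varying (a, b, c, d) reproduces the bursting and chattering regimes the paper discusses.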

2,396 citations


Journal ArticleDOI
TL;DR: Adapt neural control schemes are proposed for two classes of uncertain multi-input/multi-output (MIMO) nonlinear systems in block-triangular forms that avoid the controller singularity problem completely without using projection algorithms.
Abstract: In this paper, adaptive neural control schemes are proposed for two classes of uncertain multi-input/multi-output (MIMO) nonlinear systems in block-triangular forms. The MIMO systems consist of interconnected subsystems, with couplings in the forms of unknown nonlinearities and/or parametric uncertainties in the input matrices, as well as in the system interconnections without any bounding restrictions. Using the block-triangular structure properties, the stability analyses of the closed-loop MIMO systems are shown in a nested iterative manner for all the states. By exploiting the special properties of the affine terms of the two classes of MIMO systems, the developed neural control schemes avoid the controller singularity problem completely without using projection algorithms. Semiglobal uniform ultimate boundedness (SGUUB) of all the signals in the closed-loop of MIMO nonlinear systems is achieved. The outputs of the systems are proven to converge to a small neighborhood of the desired trajectories. The control performance of the closed-loop system is guaranteed by suitably choosing the design parameters. The proposed schemes offer systematic design procedures for the control of the two classes of uncertain MIMO nonlinear systems. Simulation results are presented to show the effectiveness of the approach.

771 citations


Journal ArticleDOI
TL;DR: In this article, the authors address the problem of finding the pre-image of a feature vector in the feature space induced by a kernel, which is of central importance in some kernel applications, such as using kernel principal component analysis (PCA) for image denoising.
Abstract: In this paper, we address the problem of finding the pre-image of a feature vector in the feature space induced by a kernel. This is of central importance in some kernel applications, such as using kernel principal component analysis (PCA) for image denoising. Unlike the traditional method, which relies on nonlinear optimization, our proposed method directly finds the location of the pre-image based on distance constraints in the feature space. It is noniterative, involves only linear algebra and does not suffer from numerical instability or local minimum problems. Evaluations on performing kernel PCA and kernel clustering on the USPS data set show much improved performance.
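For the Gaussian kernel, the traditional approach the paper improves on is a fixed-point pre-image iteration. The sketch below illustrates kernel PCA denoising with that iterative baseline on toy data; the kernel width, number of components, and the omission of some centering offsets are all simplifications:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: a tight 2-D cluster; one far-off "noisy" point to denoise.
X = rng.normal(0.0, 0.3, size=(60, 2))
x_noisy = np.array([1.5, -1.2])
sigma2 = 1.0                                  # Gaussian kernel width (assumed)

def kmat(A, B):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma2))

n = len(X)
K = kmat(X, X)
H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
lam, A = np.linalg.eigh(H @ K @ H)
lam, A = lam[::-1], A[:, ::-1]                # descending eigenvalues
q = 5                                         # retained principal components
alpha = A[:, :q] / np.sqrt(np.maximum(lam[:q], 1e-12))

beta = alpha.T @ (H @ (kmat(x_noisy[None, :], X).ravel() - K.mean(axis=0)))
gamma = alpha @ beta                          # expansion weights of the projection
# (some centering offsets are dropped for brevity in this sketch)

z = X.mean(axis=0).copy()                     # fixed-point pre-image iteration
for _ in range(100):
    w = gamma * np.exp(-((X - z) ** 2).sum(axis=1) / (2.0 * sigma2))
    z = (w[:, None] * X).sum(axis=0) / (w.sum() + 1e-12)
```

The paper's contribution replaces this iteration with a direct, noniterative solve from feature-space distance constraints.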

414 citations


Journal ArticleDOI
TL;DR: Two different backstepping neural network (NN) control approaches are presented for a class of affine nonlinear systems in the strict-feedback form with unknown nonlinearities and the controller singularity problem is avoided perfectly in both approaches.
Abstract: In this paper, two different backstepping neural network (NN) control approaches are presented for a class of affine nonlinear systems in the strict-feedback form with unknown nonlinearities. By a special design scheme, the controller singularity problem is avoided perfectly in both approaches. Furthermore, the closed loop signals are guaranteed to be semiglobally uniformly ultimately bounded and the outputs of the system are proved to converge to a small neighborhood of the desired trajectory. The control performances of the closed-loop systems can be shaped as desired by suitably choosing the design parameters. Simulation results obtained demonstrate the effectiveness of the approaches proposed. The differences observed between the inputs of the two controllers are analyzed briefly.

404 citations


Journal ArticleDOI
TL;DR: This work proposes a novel system for voiced speech segregation that segregates resolved and unresolved harmonics differently, and it yields substantially better performance, especially for the high-frequency part of speech.
Abstract: Segregating speech from one monaural recording has proven to be very challenging. Monaural segregation of voiced speech has been studied in previous systems that incorporate auditory scene analysis principles. A major problem for these systems is their inability to deal with the high-frequency part of speech. Psychoacoustic evidence suggests that different perceptual mechanisms are involved in handling resolved and unresolved harmonics. We propose a novel system for voiced speech segregation that segregates resolved and unresolved harmonics differently. For resolved harmonics, the system generates segments based on temporal continuity and cross-channel correlation, and groups them according to their periodicities. For unresolved harmonics, it generates segments based on common amplitude modulation (AM) in addition to temporal continuity and groups them according to AM rates. Underlying the segregation process is a pitch contour that is first estimated from speech segregated according to dominant pitch and then adjusted according to psychoacoustic constraints. Our system is systematically evaluated and compared with previous systems, and it yields substantially better performance, especially for the high-frequency part of speech.

394 citations


Journal ArticleDOI
Volker Roth
TL;DR: This paper presents a different class of kernel regressors that effectively overcome the above problems, and presents a highly efficient algorithm with guaranteed global convergence that defines a unified framework for sparse regression models in the very rich class of IRLS models.

Abstract: In the last few years, the support vector machine (SVM) method has motivated new interest in kernel regression techniques. Although the SVM has been shown to exhibit excellent generalization properties in many experiments, it suffers from several drawbacks, both of a theoretical and a technical nature: the absence of probabilistic outputs, the restriction to Mercer kernels, and the steep growth of the number of support vectors with increasing size of the training set. In this paper, we present a different class of kernel regressors that effectively overcome the above problems. We call this approach generalized LASSO regression. It has a clear probabilistic interpretation, can handle learning sets that are corrupted by outliers, produces extremely sparse solutions, and is capable of dealing with large-scale problems. For regression functionals which can be modeled as iteratively reweighted least-squares (IRLS) problems, we present a highly efficient algorithm with guaranteed global convergence. This defines a unified framework for sparse regression models in the very rich class of IRLS models, including various types of robust regression models and logistic regression. Performance studies for many standard benchmark datasets effectively demonstrate the advantages of this model over related approaches.
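As a concrete instance of the IRLS view, the plain LASSO objective ||y - Xb||^2 + lam*||b||_1 can be solved by repeatedly solving a reweighted ridge problem. The data, regularisation strength, and iteration count below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]                 # sparse ground truth
y = X @ beta_true + 0.1 * rng.normal(size=n)

def irls_lasso(X, y, lam=1.0, n_iter=50, eps=1e-8):
    """Solve the LASSO by iteratively reweighted ridge regression:
    the l1 penalty lam*|b_j| is rewritten as lam * b_j^2 / |b_j|."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # warm start at least squares
    for _ in range(n_iter):
        D = np.diag(1.0 / (np.abs(beta) + eps))  # current reweighting
        beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return beta

beta_hat = irls_lasso(X, y)
```

The irrelevant coefficients are driven essentially to zero while the three true coefficients are recovered with only mild shrinkage.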

281 citations


Journal ArticleDOI
TL;DR: An efficient face recognition scheme which has two features: representation of face images by two-dimensional wavelet subband coefficients and recognition by a modular, personalised classification method based on kernel associative memory models.
Abstract: In this paper, we propose an efficient face recognition scheme which has two features: 1) representation of face images by two-dimensional (2D) wavelet subband coefficients and 2) recognition by a modular, personalised classification method based on kernel associative memory models. Compared to PCA projections and low resolution "thumb-nail" image representations, wavelet subband coefficients can efficiently capture substantial facial features while keeping computational complexity low. As there are usually very limited samples, we constructed an associative memory (AM) model for each person and proposed to improve the performance of AM models by kernel methods. Specifically, we first applied kernel transforms to each possible training pair of face samples and then mapped the high-dimensional feature space back to input space. Our scheme using modular autoassociative memory for face recognition is inspired by the same motivation as using autoencoders for optical character recognition (OCR), for which the advantages have been proven. By associative memory, all the prototypical faces of one particular person are used to reconstruct themselves and the reconstruction error for a probe face image is used to decide if the probe face is from the corresponding person. We carried out extensive experiments on three standard face recognition datasets, the FERET data, the XM2VTS data, and the ORL data. Detailed comparisons with earlier published results are provided and our proposed scheme offers better recognition accuracy on all of the face datasets.
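The classification-by-reconstruction idea can be illustrated with plain linear autoassociative memories (the paper's kernelisation and wavelet features are omitted here): build one memory per person as a projector onto that person's training vectors and classify a probe by minimum reconstruction error. The toy clusters below merely stand in for face vectors:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for face vectors: each "person" is a tight cluster in R^20.
def make_class(center, n=5):
    return center + 0.1 * rng.normal(size=(n, 20))

centers = [rng.normal(size=20) for _ in range(3)]
train = [make_class(c) for c in centers]      # one training matrix per person

def recon_error(M, x):
    """Reconstruction error after projecting x onto span of M's rows."""
    P = M.T @ np.linalg.pinv(M.T)             # projector onto that subspace
    return np.linalg.norm(x - P @ x)

probe = centers[1] + 0.1 * rng.normal(size=20)
errors = [recon_error(M, probe) for M in train]
pred = int(np.argmin(errors))                 # person with smallest error
```

A probe near person 1's cluster is reconstructed far better by that person's memory than by the others.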

268 citations


Journal ArticleDOI
TL;DR: Under various mild conditions, the proposed general projection neural network is shown to be globally convergent, globally asymptotically stable, and globally exponentially stable.
Abstract: Recently, a projection neural network for solving monotone variational inequalities and constrained optimization problems was developed. In this paper, we propose a general projection neural network for solving a wider class of variational inequalities and related optimization problems. In addition to its simple structure and low complexity, the proposed neural network includes existing neural networks for optimization, such as the projection neural network, the primal-dual neural network, and the dual neural network, as special cases. Under various mild conditions, the proposed general projection neural network is shown to be globally convergent, globally asymptotically stable, and globally exponentially stable. Furthermore, several improved stability criteria on two special cases of the general projection neural network are obtained under weaker conditions. Simulation results demonstrate the effectiveness and characteristics of the proposed neural network.

254 citations


Journal ArticleDOI
V. Singh1
TL;DR: A novel linear matrix inequality (LMI)-based criterion for the global asymptotic stability and uniqueness of the equilibrium point of a class of delayed cellular neural networks (CNNs) is presented and turns out to be a generalization and improvement over some previous criteria.
Abstract: A novel linear matrix inequality (LMI)-based criterion for the global asymptotic stability and uniqueness of the equilibrium point of a class of delayed cellular neural networks (CNNs) is presented. The criterion turns out to be a generalization and improvement over some previous criteria.

216 citations


Journal ArticleDOI
TL;DR: A new class of probabilistic neural networks (PNNs) working in nonstationary environments is proposed, and definitions of optimality of PNNs in time-varying environments are presented for the first time in the literature.
Abstract: In this paper, we propose a new class of probabilistic neural networks (PNNs) working in a nonstationary environment. The novelty is summarized as follows: 1) We formulate the problem of pattern classification in a nonstationary environment as a prediction problem and design a probabilistic neural network to classify patterns having time-varying probability distributions. We note that the problem of pattern classification in the nonstationary case is closely connected with the problem of prediction because, on the basis of a learning sequence of length n, a pattern at moment n+k, k ≥ 1, should be classified. 2) We present, for the first time in the literature, definitions of optimality of PNNs in a time-varying environment. Moreover, we prove that our PNNs asymptotically approach the Bayes-optimal (time-varying) decision surface. 3) We investigate the speed of convergence of constructed PNNs. 4) We design in detail PNNs based on Parzen kernels and multivariate Hermite series.
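The stationary building block of such networks is the Parzen-window density estimate; a minimal two-class Parzen classifier is sketched below (the paper's time-varying weighting and Hermite-series variants are omitted, and the data and bandwidth are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
# Two Gaussian classes (a stationary toy case).
X0 = rng.normal(-2.0, 1.0, size=(200, 1))
X1 = rng.normal(+2.0, 1.0, size=(200, 1))

def parzen(x, data, h=0.5):
    """Parzen-window density estimate with a Gaussian kernel of bandwidth h."""
    u = (x - data) / h
    return np.exp(-0.5 * u ** 2).sum() / (len(data) * h * np.sqrt(2 * np.pi))

def classify(x):
    # Equal priors: pick the class with the higher estimated density.
    return int(parzen(x, X1) > parzen(x, X0))

pred_neg, pred_pos = classify(-1.8), classify(1.8)
```

Points near each class centre are assigned to that class; the paper's contribution is making such estimates track distributions that drift over time.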

211 citations


Journal ArticleDOI
TL;DR: Experimental results show that the proposed neural connectionism approaches, with respect to the nonneural ones, are more efficient and feasible in finding the arbitrary roots of arbitrary polynomials.
Abstract: This paper proposes a constructive approach for finding arbitrary (real or complex) roots of arbitrary (real or complex) polynomials by a multilayer perceptron network (MLPN) using a constrained learning algorithm (CLA), which encodes the a priori information of constraint relations between root moments and coefficients of a polynomial into the usual BP algorithm (BPA). Moreover, the root moment method (RMM) is also simplified into a recursive version so that the computational complexity can be further decreased, which allows the roots of higher-order polynomials to be found readily. In addition, an adaptive learning parameter for the CLA is also proposed in this paper, and an initial weight selection method is also given. Finally, several experimental results show that our proposed neural connectionism approaches, with respect to the nonneural ones, are more efficient and feasible in finding the arbitrary roots of arbitrary polynomials.
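For context, the standard nonneural baseline such methods are compared against computes roots as eigenvalues of the companion matrix, which is what `numpy.roots` does:

```python
import numpy as np

# Companion-matrix baseline: polynomial roots as eigenvalues of the
# companion matrix of the coefficient vector (highest degree first).
coeffs = [1.0, -6.0, 11.0, -6.0]       # x^3 - 6x^2 + 11x - 6 = (x-1)(x-2)(x-3)
roots = np.sort(np.roots(coeffs).real)
```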

Journal ArticleDOI
TL;DR: The wavelet-based image fusion procedure is improved by applying the discrete wavelet frame transform (DWFT), which yields a translation-invariant signal representation, together with support vector machines (SVMs).
Abstract: Many vision-related processing tasks, such as edge detection, image segmentation and stereo matching, can be performed more easily when all objects in the scene are in good focus. However, in practice, this may not always be feasible as optical lenses, especially those with long focal lengths, only have a limited depth of field. One common approach to recover an everywhere-in-focus image is to use wavelet-based image fusion. First, several source images with different focuses of the same scene are taken and processed with the discrete wavelet transform (DWT). Among these wavelet decompositions, the wavelet coefficient with the largest magnitude is selected at each pixel location. Finally, the fused image can be recovered by performing the inverse DWT. In this paper, we improve this fusion procedure by applying the discrete wavelet frame transform (DWFT) and the support vector machines (SVM). Unlike DWT, DWFT yields a translation-invariant signal representation. Using features extracted from the DWFT coefficients, an SVM is trained to select the source image that has the best focus at each pixel location, and the corresponding DWFT coefficients are then incorporated into the composite wavelet representation. Experimental results show that the proposed method outperforms the traditional approach both visually and quantitatively.
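The classic max-magnitude fusion rule described above can be sketched with a one-level Haar transform standing in for the DWT/DWFT, and with the rule itself standing in for the trained SVM selector. The two synthetic "source images" below are sharp on complementary halves:

```python
import numpy as np

rng = np.random.default_rng(4)
detail = rng.normal(size=(64, 64))          # the fully in-focus "scene"
imgA = detail.copy(); imgA[:, 32:] = 0.0    # sharp on the left half only
imgB = detail.copy(); imgB[:, :32] = 0.0    # sharp on the right half only

def haar2(x):
    """One-level 2-D Haar transform: returns (LL, LH, HL, HH) subbands."""
    a = (x[0::2] + x[1::2]) / 2.0
    d = (x[0::2] - x[1::2]) / 2.0
    return ((a[:, 0::2] + a[:, 1::2]) / 2.0, (a[:, 0::2] - a[:, 1::2]) / 2.0,
            (d[:, 0::2] + d[:, 1::2]) / 2.0, (d[:, 0::2] - d[:, 1::2]) / 2.0)

def ihaar2(ll, lh, hl, hh):
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d = np.empty_like(a)
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2], x[1::2] = a + d, a - d
    return x

# Maximum-magnitude fusion rule applied band by band.
fused_bands = [np.where(np.abs(u) >= np.abs(v), u, v)
               for u, v in zip(haar2(imgA), haar2(imgB))]
fused = ihaar2(*fused_bands)
```

Because each half-image carries the larger coefficients over its own support, the fused result recovers the full scene.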

Journal ArticleDOI
TL;DR: A novel independent component analysis algorithm is introduced that is truly blind to the particular underlying distribution of the mixed signals; in Monte Carlo simulations it consistently outperformed all state-of-the-art ICA methods.
Abstract: In this paper, we introduce a novel independent component analysis (ICA) algorithm, which is truly blind to the particular underlying distribution of the mixed signals. Using a nonparametric kernel density estimation technique, the algorithm performs simultaneously the estimation of the unknown probability density functions of the source signals and the estimation of the unmixing matrix. Following the proposed approach, the blind signal separation framework can be posed as a nonlinear optimization problem, where a closed form expression of the cost function is available, and only the elements of the unmixing matrix appear as unknowns. We conducted a series of Monte Carlo simulations, involving linear mixtures of various source signals with different statistical characteristics and sample sizes. The new algorithm not only consistently outperformed all state-of-the-art ICA methods, but also demonstrated the following properties: 1) Only a flexible model, capable of learning the source statistics, can consistently achieve an accurate separation of all the mixed signals. 2) Adopting a suitably designed optimization framework, it is possible to derive a flexible ICA algorithm that matches the stability and convergence properties of conventional algorithms. 3) A nonparametric approach does not necessarily require large sample sizes in order to outperform methods with fixed or partially adaptive contrast functions.

Journal ArticleDOI
TL;DR: The authors discuss delayed Cohen-Grossberg neural network models and investigate the global exponential stability of their equilibrium points.
Abstract: The authors discuss delayed Cohen-Grossberg neural network models and investigate the global exponential stability of the equilibrium points of these systems. A set of sufficient conditions ensuring robust global exponential convergence of the Cohen-Grossberg neural networks with time delays is given.

Journal ArticleDOI
TL;DR: This paper proposes a neuro-fuzzy scheme for designing a classifier along with feature selection, a four-layered feed-forward network for realizing a fuzzy rule-based classifier.
Abstract: Most methods of classification either ignore feature analysis or do it in a separate phase, offline prior to the main classification task. This paper proposes a neuro-fuzzy scheme for designing a classifier along with feature selection. It is a four-layered feed-forward network for realizing a fuzzy rule-based classifier. The network is trained by error backpropagation in three phases. In the first phase, the network learns the important features and the classification rules. In the subsequent phases, the network is pruned to an "optimal" architecture that represents an "optimal" set of rules. Pruning is found to drastically reduce the size of the network without degrading the performance. The pruned network is further tuned to improve performance. The rules learned by the network can be easily read from the network. The system is tested on both synthetic and real data sets and found to perform quite well.

Journal ArticleDOI
TL;DR: This paper elaborates upon the claim that clustering in the recurrent layer of recurrent neural networks (RNNs) reflects meaningful information processing states even prior to training.
Abstract: In this paper, we elaborate upon the claim that clustering in the recurrent layer of recurrent neural networks (RNNs) reflects meaningful information processing states even prior to training. By concentrating on activation clusters in RNNs, while not throwing away the continuous state space network dynamics, we extract predictive models that we call neural prediction machines (NPMs). When RNNs with sigmoid activation functions are initialized with small weights (a common technique in the RNN community), the clusters of recurrent activations emerging prior to training are indeed meaningful and correspond to Markov prediction contexts. In this case, the extracted NPMs correspond to a class of Markov models, called variable memory length Markov models (VLMMs). In order to appreciate how much information has really been induced during the training, the RNN performance should always be compared with that of VLMMs and NPMs extracted before training as the "null" base models. Our arguments are supported by experiments on a chaotic symbolic sequence and a context-free language with a deep recursive structure.

Journal ArticleDOI
TL;DR: This paper presents a neuromorphic analog very large scale integration (VLSI) circuit that contains a feedforward network of silicon neurons with STDP synapses and shows that the chip can detect and amplify hierarchical spike-timing synchrony structures embedded in noisy spike trains.
Abstract: Spike-timing dependent synaptic plasticity (STDP) is a form of plasticity driven by precise spike-timing differences between presynaptic and postsynaptic spikes. Thus, the learning rules underlying STDP are suitable for learning neuronal temporal phenomena such as spike-timing synchrony. It is well known that weight-independent STDP creates unstable learning processes resulting in balanced bimodal weight distributions. In this paper, we present a neuromorphic analog very large scale integration (VLSI) circuit that contains a feedforward network of silicon neurons with STDP synapses. The learning rule implemented can be tuned to have a moderate level of weight dependence. This helps stabilise the learning process and still generates binary weight distributions. From on-chip learning experiments we show that the chip can detect and amplify hierarchical spike-timing synchrony structures embedded in noisy spike trains. The weight distributions of the network emerging from learning are bimodal.
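The pair-based STDP window underlying such circuits can be written down directly; the amplitudes and time constant below are typical illustrative values, not those measured from the chip:

```python
import numpy as np

def stdp_dw(dt, A_plus=0.01, A_minus=0.012, tau=20.0):
    """Weight change for a spike-time difference dt = t_post - t_pre (ms):
    potentiation when the presynaptic spike precedes the postsynaptic one,
    depression otherwise, both decaying exponentially with |dt|."""
    if dt >= 0:
        return A_plus * np.exp(-dt / tau)
    return -A_minus * np.exp(dt / tau)

dw_causal = stdp_dw(+10.0)    # pre before post: potentiation
dw_acausal = stdp_dw(-10.0)   # post before pre: depression
```

Weight dependence, as tuned on the chip, scales these updates with the current weight to stabilise learning.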

Journal ArticleDOI
TL;DR: The scalar equation approach to Boolean network models is further developed and then applied to two interesting biological models and gives immediate information about both cycle and transient structure of the network.
Abstract: One way of coping with the complexity of biological systems is to use the simplest possible models which are able to reproduce at least some nontrivial features of reality. Although two-valued Boolean models have a long history in technology, it is perhaps somewhat surprising that they can also represent important features of living organisms. In this paper, the scalar equation approach to Boolean network models is further developed and then applied to two interesting biological models. In particular, a linear reduced scalar equation is derived from a more rudimentary nonlinear scalar equation. This simpler, but higher order, two term equation gives immediate information about both cycle and transient structure of the network.
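The notions of cycle and transient structure are easy to make concrete on a toy synchronous Boolean network (the update rules below are arbitrary illustrations; the paper's scalar-equation reduction is not reproduced here):

```python
from itertools import product

# A 3-node Boolean network; all nodes update synchronously from the
# current global state.
def step(s):
    a, b, c = s
    return (b & c, a | c, a ^ b)

def cycle_length(s, ):
    """Iterate from state s until a state repeats; return the cycle length."""
    seen = {}
    t = 0
    while s not in seen:
        seen[s] = t
        s = step(s)
        t += 1
    return t - seen[s]

# Exhaustive map of the attractor structure over all 2^3 states.
cycle_lengths = {s: cycle_length(s) for s in product((0, 1), repeat=3)}
```

Here (0,0,0) is a fixed point (cycle of length 1), while (1,1,1) falls, after a transient, onto a 2-cycle between (0,1,0) and (0,0,1).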

Journal ArticleDOI
TL;DR: Experimental results on simulated and real-world data sets indicate that the approach works well even on large data sets, and has the advantages of Bayesian methods for model adaptation and error bars of its predictions.
Abstract: In this paper, we use a unified loss function, called the soft insensitive loss function, for Bayesian support vector regression. We follow standard Gaussian processes for regression to set up the Bayesian framework, in which the unified loss function is used in the likelihood evaluation. Under this framework, the maximum a posteriori estimate of the function values corresponds to the solution of an extended support vector regression problem. The overall approach has the merits of support vector regression such as convex quadratic programming and sparsity in solution representation. It also has the advantages of Bayesian methods for model adaptation and error bars of its predictions. Experimental results on simulated and real-world data sets indicate that the approach works well even on large data sets.
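The standard GP regression machinery the paper builds on already yields predictive means and error bars; below is a numpy sketch with a plain Gaussian likelihood in place of the soft insensitive loss (kernel width and noise level are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)
# Noisy observations of sin(x) on [0, 2*pi].
X = np.linspace(0.0, 2.0 * np.pi, 30)
y = np.sin(X) + 0.1 * rng.normal(size=30)

def k(a, b, ell=1.0):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

noise = 0.01                                  # observation noise variance
K = k(X, X) + noise * np.eye(len(X))
Xs = np.linspace(0.0, 2.0 * np.pi, 100)       # test inputs
Ks = k(Xs, X)
mean = Ks @ np.linalg.solve(K, y)             # predictive mean
var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))  # error bars
```

The paper's framework keeps this Bayesian structure but swaps the Gaussian likelihood for the soft insensitive loss, recovering SVR-style sparsity.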

Journal ArticleDOI
TL;DR: A new decoding function is introduced that combines the margins through an estimate of their class conditional probabilities, which can be used to tune kernel hyperparameters and empirical evaluations on model selection indicate that the bound leads to good estimates of kernel parameters.
Abstract: We study the problem of multiclass classification within the framework of error correcting output codes (ECOC) using margin-based binary classifiers. Specifically, we address two important open problems in this context: decoding and model selection. The decoding problem concerns how to map the outputs of the classifiers into class codewords. In this paper we introduce a new decoding function that combines the margins through an estimate of their class conditional probabilities. Concerning model selection, we present new theoretical results bounding the leave-one-out (LOO) error of ECOC of kernel machines, which can be used to tune kernel hyperparameters. We report experiments using support vector machines as the base binary classifiers, showing the advantage of the proposed decoding function over other functions of the margin commonly used in practice. Moreover, our empirical evaluations on model selection indicate that the bound leads to good estimates of kernel parameters.
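Decoding can be made concrete with a small code matrix: given the margins of the binary classifiers, loss-based decoding picks the codeword with the smallest total margin loss (the exponential loss below is a simple stand-in for the paper's probability-based combination), while Hamming decoding uses only the signs:

```python
import numpy as np

# Toy ECOC setup: 4 classes encoded by 6 binary dichotomies (matrix assumed).
M = np.array([[+1, +1, +1, -1, -1, -1],
              [+1, -1, -1, +1, +1, -1],
              [-1, +1, -1, +1, -1, +1],
              [-1, -1, +1, -1, +1, +1]])

def decode_loss(margins, beta=1.0):
    """Pick the class whose codeword minimises the summed exp-loss of margins."""
    losses = np.exp(-beta * margins[None, :] * M).sum(axis=1)
    return int(np.argmin(losses))

def decode_hamming(margins):
    """Hamming decoding: only the signs of the margins are used."""
    return int(np.argmin((np.sign(margins)[None, :] != M).sum(axis=1)))

# Margins from 6 hypothetical binary classifiers, agreeing with class 2's row.
margins = np.array([-0.9, 0.8, -0.2, 0.7, -0.6, 0.5])
pred = decode_loss(margins)
```

Unlike Hamming decoding, the loss-based rule also exploits the confidence carried by the margin magnitudes.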

Journal ArticleDOI
TL;DR: It is demonstrated that the estimation errors decrease as the SOM training proceeds, allowing the VQTAM scheme to be understood as a self-supervised gradient-based error reduction method.
Abstract: In this paper, we introduce a general modeling technique, called vector-quantized temporal associative memory (VQTAM), which uses Kohonen's self-organizing map (SOM) as an alternative to multilayer perceptron (MLP) and radial basis function (RBF) neural models for dynamical system identification and control. We demonstrate that the estimation errors decrease as the SOM training proceeds, allowing the VQTAM scheme to be understood as a self-supervised gradient-based error reduction method. The performance of the proposed approach is evaluated on a variety of complex tasks, namely: i) time series prediction; ii) identification of SISO/MIMO systems; and iii) nonlinear predictive control. For all tasks, the simulation results produced by the SOM are as accurate as those produced by the MLP network, and better than those produced by the RBF network. The SOM has also been shown to be less sensitive to weight initialization than MLP networks. We conclude the paper by discussing the main properties of the VQTAM and their relationships to other well established methods for dynamical system identification. We also suggest directions for further work.
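The core VQTAM idea, quantise joint input-output vectors, find the winner on the input part, and read the prediction off the output part, can be sketched with plain winner-take-all vector quantisation (no map topology or neighbourhood function, unlike a full SOM; the grid initialisation and learning schedule are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
# Learn y = sin(u) from samples by quantising joint [u, y] vectors.
u = rng.uniform(-np.pi, np.pi, size=500)
data = np.column_stack([u, np.sin(u)])

# Prototypes: input parts on a regular grid, output parts learned online.
W = np.column_stack([np.linspace(-np.pi, np.pi, 30), np.zeros(30)])
lr = 0.2
for epoch in range(20):
    for x in data:
        i = np.argmin(np.abs(W[:, 0] - x[0]))   # winner chosen on the input part
        W[i] += lr * (x - W[i])                 # move the whole joint prototype
    lr *= 0.8                                   # annealed learning rate

def predict(u_new):
    i = np.argmin(np.abs(W[:, 0] - u_new))
    return W[i, 1]                              # read the output part

err = max(abs(predict(v) - np.sin(v)) for v in np.linspace(-3.0, 3.0, 50))
```

The prediction is piecewise constant, with accuracy set by the number of prototypes; the full VQTAM inherits the SOM's topology-preserving training on top of this scheme.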

Journal ArticleDOI
TL;DR: Using the local inhibition, conditions for nondivergence are derived, which not only guarantee nondivergence, but also allow for the existence of multiequilibrium points.
Abstract: This paper studies the multistability of a class of discrete-time recurrent neural networks with unsaturating piecewise linear activation functions. It addresses the nondivergence, global attractivity, and complete stability of the networks. Using the local inhibition, conditions for nondivergence are derived, which not only guarantee nondivergence, but also allow for the existence of multiequilibrium points. Under these nondivergence conditions, global attractive compact sets are obtained. Complete stability is studied via constructing novel energy functions and using the well-known Cauchy Convergence Principle. Examples and simulation results are used to illustrate the theory.
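Multistability under an unsaturating piecewise-linear (ReLU-type) activation is easy to demonstrate: with local self-excitation and mutual inhibition (the weights below are chosen purely for illustration), different initial conditions settle at different equilibria:

```python
import numpy as np

# Discrete-time recurrent network x(k+1) = max(0, W x(k) + b) with an
# unsaturating piecewise-linear activation.
W = np.array([[0.5, -1.0],
              [-1.0, 0.5]])      # self-excitation, mutual inhibition
b = np.array([1.0, 1.0])

def run(x0, steps=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = np.maximum(0.0, W @ x + b)
    return x

xa = run([1.0, 0.0])   # converges to one equilibrium, (2, 0)
xb = run([0.0, 1.0])   # converges to a different one, (0, 2)
```

Both (2, 0) and (0, 2) satisfy x = max(0, Wx + b), so the network is nondivergent yet not globally convergent to a single point, which is exactly the regime the paper analyses.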

Journal ArticleDOI
TL;DR: A general overview and unification of several information theoretic criteria for the extraction of a single independent component is presented and tools that extend these criteria to allow the simultaneous blind extraction of subsets with an arbitrary number of independent components are presented.
Abstract: This paper reports a study on the problem of the blind simultaneous extraction of specific groups of independent components from a linear mixture. This paper first presents a general overview and unification of several information theoretic criteria for the extraction of a single independent component. Then, our contribution fills the theoretical gap that exists between extraction and separation by presenting tools that extend these criteria to allow the simultaneous blind extraction of subsets with an arbitrary number of independent components. In addition, we analyze a family of learning algorithms based on Stiefel manifolds and the natural gradient ascent, present the nonlinear optimal activations (score) functions, and provide new or extended local stability conditions. Finally, we illustrate the performance and features of the proposed approach by computer-simulation experiments.

Journal ArticleDOI
TL;DR: A discriminative learning algorithm to optimize the parameters of MQDF with the aim of improving the classification accuracy while preserving the superior noncharacter resistance is proposed, and is justified in handwritten digit recognition and numeral string recognition.
Abstract: In character string recognition integrating segmentation and classification, high classification accuracy and resistance to noncharacters are desired of the underlying classifier. In a previous evaluation study, the modified quadratic discriminant function (MQDF) proposed by Kimura et al. was shown to be superior in noncharacter resistance but inferior in classification accuracy to neural networks. This paper proposes a discriminative learning algorithm to optimize the parameters of MQDF with the aim of improving the classification accuracy while preserving the superior noncharacter resistance. We refer to the resulting classifier as discriminative learning QDF (DLQDF). The parameters of DLQDF adhere to the structure of MQDF under the Gaussian density assumption and are optimized under the minimum classification error (MCE) criterion. The promise of DLQDF is justified in handwritten digit recognition and numeral string recognition, where the performance of DLQDF is comparable or superior to that of neural classifiers. The results are also competitive with the best ones reported in the literature.

Journal ArticleDOI
TL;DR: This paper shows that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model, and in fact, it can be considered as a batch-mode version of (normalized) competitive learning.
Abstract: Competitive learning mechanisms for clustering, in general, suffer from poor performance for very high-dimensional (>1000) data because of "curse of dimensionality" effects. In applications such as document clustering, it is customary to normalize the high-dimensional input vectors to unit length, and it is sometimes also desirable to obtain balanced clusters, i.e., clusters of comparable sizes. The spherical kmeans (spkmeans) algorithm, which normalizes the cluster centers as well as the inputs, has been successfully used to cluster normalized text documents in 2000+ dimensional space. Unfortunately, like regular kmeans and its soft expectation-maximization-based version, spkmeans tends to generate extremely imbalanced clusters in high-dimensional spaces when the desired number of clusters is large (tens or more). This paper first shows that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model, and in fact, it can be considered as a batch-mode version of (normalized) competitive learning. The proposed generative model is then adapted in a principled way to yield three frequency-sensitive competitive learning variants that are applicable to static data and produce high-quality, well-balanced clusters for high-dimensional data. Like kmeans, each iteration is linear in the number of data points and in the number of clusters for all three algorithms. A frequency-sensitive algorithm to cluster streaming data is also proposed. Experimental results on clustering of high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques.
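A minimal spkmeans implementation shows the two ingredients the derivation rests on: cosine-similarity assignment and renormalised cluster means. Deterministic initialisation and synthetic unit vectors are used here for simplicity; a real run would use random restarts:

```python
import numpy as np

rng = np.random.default_rng(6)

def normalize(X):
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Toy "documents": unit vectors scattered around 3 random directions in R^50.
dirs = normalize(rng.normal(size=(3, 50)))
X = normalize(np.vstack([d + 0.1 * rng.normal(size=(100, 50)) for d in dirs]))

def spkmeans(X, k, n_iter=20):
    """Spherical k-means: cosine-similarity assignment, renormalised means."""
    C = X[:: len(X) // k][:k].copy()            # deterministic init (sketch only)
    for _ in range(n_iter):
        labels = np.argmax(X @ C.T, axis=1)     # nearest center by cosine
        for j in range(k):
            members = X[labels == j]
            if len(members):
                s = members.sum(axis=0)
                C[j] = s / np.linalg.norm(s)    # mean re-projected to the sphere
    return labels, C

labels, C = spkmeans(X, 3)
```

The paper's frequency-sensitive variants modify the assignment step to penalise over-full clusters, which is what restores balance when k is large.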

Journal ArticleDOI
TL;DR: This work proposes the use of a "nonnegative principal component analysis (nonnegative PCA)" algorithm, which is a special case of the nonlinear PCA algorithm, but with a rectification nonlinearity, and conjecture that this algorithm will find such nonnegative well-grounded independent sources, under reasonable initial conditions.
Abstract: We consider the task of independent component analysis when the independent sources are known to be nonnegative and well-grounded, so that they have a nonzero probability density function (pdf) in the region of zero. We propose the use of a "nonnegative principal component analysis (nonnegative PCA)" algorithm, which is a special case of the nonlinear PCA algorithm, but with a rectification nonlinearity, and we conjecture that this algorithm will find such nonnegative well-grounded independent sources, under reasonable initial conditions. While the algorithm has proved difficult to analyze in the general case, we give some analytical results that are consistent with this conjecture and some numerical simulations that illustrate its operation.

Journal ArticleDOI
TL;DR: It is shown that the proposed neural network is stable in the sense of Lyapunov and can converge to an exact optimal solution of the original problem.
Abstract: In this paper, we present a neural network for solving the nonlinear convex programming problem in real time by means of the projection method. The main idea is to convert the convex programming problem into a variational inequality problem. Then a dynamical system and a convex energy function are constructed for the resulting variational inequality problem. It is shown that the proposed neural network is stable in the sense of Lyapunov and can converge to an exact optimal solution of the original problem. Compared with existing neural networks for solving the nonlinear convex programming problem, the proposed neural network requires no Lipschitz condition, has no adjustable parameters, and has a simple structure. The validity and transient behavior of the proposed neural network are demonstrated by some simulation results.
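The projection-method idea behind such networks can be illustrated with a forward-Euler discretization of the dynamics dx/dt = P(x - alpha*F(x)) - x, where P projects onto the feasible set and F is the gradient map. This is a generic sketch on a toy box-constrained quadratic program, with step sizes and the example problem chosen by us:

```python
import numpy as np

def projection_network(F, proj, x0, alpha=0.5, dt=0.01, steps=5000):
    """Euler-discretized projection dynamics dx/dt = proj(x - alpha*F(x)) - x.
    An equilibrium x = proj(x - alpha*F(x)) solves the variational inequality."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x += dt * (proj(x - alpha * F(x)) - x)
    return x

# toy problem: minimize f(x) = (x1 - 2)^2 + (x2 + 1)^2 over the box [0, 1]^2
F = lambda x: 2.0 * (x - np.array([2.0, -1.0]))   # gradient of f
proj = lambda z: np.clip(z, 0.0, 1.0)             # projection onto the box
x_star = projection_network(F, proj, x0=[0.5, 0.5])
# the trajectory settles at (1, 0), the constrained minimizer
```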

Journal ArticleDOI
TL;DR: This work presents theoretical and simulation evidence that lone noisy threshold and continuous neurons exhibit the SR effect in terms of the mutual information between random input and output sequences, and a new statistically robust learning law can find this entropy-optimal noise level.
Abstract: Noise can improve how memoryless neurons process signals and maximize their throughput information. Such favorable use of noise is the so-called "stochastic resonance" or SR effect at the level of threshold neurons and continuous neurons. This work presents theoretical and simulation evidence that 1) lone noisy threshold and continuous neurons exhibit the SR effect in terms of the mutual information between random input and output sequences, 2) a new statistically robust learning law can find this entropy-optimal noise level, and 3) the adaptive SR effect is robust against highly impulsive noise with infinite variance. Histograms estimate the relevant probability density functions at each learning iteration. A theorem shows that almost all noise probability density functions produce some SR effect in threshold neurons even if the noise is impulsive and has infinite variance. The optimal noise level in threshold neurons also behaves nonlinearly as the input signal amplitude increases. Simulations further show that the SR effect persists for several sigmoidal neurons and for Gaussian radial-basis-function neurons.
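The threshold-neuron SR effect can be demonstrated numerically: the mutual information between random input bits and the thresholded output is larger at a moderate noise level than at nearly no noise or very strong noise. In this sketch the signal amplitudes, threshold, and Gaussian noise levels are our choices, and the paper's adaptive learning law and impulsive-noise results are not reproduced:

```python
import numpy as np

def mutual_info_binary(x, y):
    """Empirical mutual information (bits) between two binary sequences."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pab = np.mean((x == a) & (y == b))
            pa, pb = np.mean(x == a), np.mean(y == b)
            if pab > 0:
                mi += pab * np.log2(pab / (pa * pb))
    return mi

rng = np.random.default_rng(0)
n = 200_000
s = rng.integers(0, 2, n)                  # random input bits
signal = np.where(s == 1, 0.4, -0.4)       # subthreshold signal amplitudes
theta = 1.0                                # threshold above both signal levels

mi_at = {}
for sigma in (1e-6, 0.6, 5.0):
    y = (signal + sigma * rng.standard_normal(n) > theta).astype(int)
    mi_at[sigma] = mutual_info_binary(s, y)
# SR signature: moderate noise conveys more information than almost none or too much
```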

Journal ArticleDOI
TL;DR: The generalized regression neural networks (GRNN) studied in this paper are able to follow changes of the best model, i.e., time-varying regression functions; convergence of the GRNN is proven based on general learning theorems presented in Section IV.
Abstract: The current state of knowledge regarding nonstationary processes is significantly poorer than in the case of stationary signals. In many applications, signals are treated as stationary only because this makes them easier to analyze; in fact, they are nonstationary. Nonstationary processes are undoubtedly more difficult to analyze, and their diversity makes the application of universal tools impossible. In this paper we propose a new class of generalized regression neural networks working in a nonstationary environment. The generalized regression neural networks (GRNN) studied in this paper are able to follow changes of the best model, i.e., time-varying regression functions. The novelty is summarized as follows: 1) We present adaptive GRNN tracking time-varying regression functions. 2) We prove convergence of the GRNN based on general learning theorems presented in Section IV. 3) We design in detail special GRNN based on the Parzen and orthogonal series kernels. In each case we specify conditions ensuring convergence of the GRNN to the best models described by the regression function. 4) We investigate the speed of convergence of the GRNN and compare the performance of specific structures based on the Parzen kernel and orthogonal series kernel. 5) We study various nonstationarities (multiplicative, additive, "scale change," "movable argument") and design in each case the GRNN based on the Parzen kernel and orthogonal series kernel.
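For reference, the static Parzen-kernel GRNN underlying the paper's adaptive variants is the Nadaraya-Watson estimator: a kernel-weighted average of observed outputs. The sketch below is this static baseline only (bandwidth and toy data are our choices); the paper's contribution, tracking time-varying regression functions, is not reproduced:

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, h=0.1):
    """GRNN / Nadaraya-Watson regression with a Gaussian Parzen kernel:
    prediction is the kernel-weighted average of training outputs."""
    d = x_query[:, None] - x_train[None, :]
    w = np.exp(-0.5 * (d / h) ** 2)            # kernel weights
    return (w @ y_train) / w.sum(axis=1)       # weighted average of outputs

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 3000)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(3000)
xq = np.array([0.25, 0.75])
yhat = grnn_predict(x, y, xq, h=0.05)
# estimates approach sin(2*pi*0.25) = 1 and sin(2*pi*0.75) = -1
```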

Journal ArticleDOI
TL;DR: This paper presents feature selection algorithms for multilayer perceptrons (MLPs) and multiclass support vector machines (SVMs), using mutual information between class labels and classifier outputs, as an objective function.
Abstract: This paper presents feature selection algorithms for multilayer perceptrons (MLPs) and multiclass support vector machines (SVMs), using mutual information between class labels and classifier outputs as an objective function. This objective function involves inexpensive computation of information measures only on discrete variables; provides immunity to prior class probabilities; and brackets the probability of error of the classifier. The maximum output information (MOI) algorithms employ this function for feature subset selection by greedy elimination and directed search. The output of the MOI algorithms is a feature subset of user-defined size and an associated trained classifier (MLP/SVM). These algorithms compare favorably with a number of other methods in terms of performance on various artificial and real-world data sets.
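The greedy-elimination scheme can be sketched as follows: repeatedly drop the feature whose removal best preserves the mutual information between class labels and classifier outputs. This sketch substitutes a simple nearest-centroid classifier for the paper's trained MLP/SVM, and all names and the toy data are ours:

```python
import numpy as np

def mutual_info(a, b):
    """Empirical mutual information (bits) between two discrete sequences."""
    mi = 0.0
    for u in np.unique(a):
        for v in np.unique(b):
            p = np.mean((a == u) & (b == v))
            if p > 0:
                mi += p * np.log2(p / (np.mean(a == u) * np.mean(b == v)))
    return mi

def predict(X, y, feats):
    """Nearest-centroid classifier on a feature subset
    (a stand-in for the trained MLP/SVM used in the paper)."""
    Xs = X[:, feats]
    cents = np.array([Xs[y == c].mean(axis=0) for c in np.unique(y)])
    d = ((Xs[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d, axis=1)

def moi_greedy_eliminate(X, y, target_size):
    """Greedy backward elimination maximizing output mutual information."""
    feats = list(range(X.shape[1]))
    while len(feats) > target_size:
        scores = {f: mutual_info(y, predict(X, y, [g for g in feats if g != f]))
                  for f in feats}
        feats.remove(max(scores, key=scores.get))  # drop the least informative feature
    return feats

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 400)
X = np.c_[y + 0.3 * rng.standard_normal(400),      # informative feature 0
          rng.standard_normal((400, 2))]           # two pure-noise features
chosen = moi_greedy_eliminate(X, y, target_size=1)
```

On this toy data the procedure retains the single informative feature, since removing either noise feature barely changes the output mutual information while removing feature 0 collapses it.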