# Showing papers in "Neural Processing Letters in 1999"

••

TL;DR: A least squares version of support vector machine (SVM) classifiers whose solution follows from solving a set of linear equations, instead of the quadratic programming required for classical SVMs.

Abstract: In this letter we discuss a least squares version for support vector machine (SVM) classifiers. Due to equality type constraints in the formulation, the solution follows from solving a set of linear equations, instead of quadratic programming for classical SVMs. The approach is illustrated on a two-spiral benchmark classification problem.

8,811 citations
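The abstract above does not spell out the linear system, so the following is only a minimal sketch under assumptions: it uses the commonly cited LS-SVM form, in which the bias `b` and multipliers `alpha` solve `[[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]` with `Omega[i][j] = y_i * y_j * K(x_i, x_j)`. The RBF kernel, the values of `gamma` and `sigma`, and all function names are illustrative choices, not taken from the paper.

```python
import math

def rbf(u, v, sigma=1.0):
    # Gaussian (RBF) kernel; the kernel choice is an assumption of this sketch.
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, rhs):
    # Gaussian elimination with partial pivoting on the augmented matrix.
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    # Build the (n+1) x (n+1) system and solve it for [b, alpha_1..alpha_n].
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            w = y[i] * y[j] * rbf(X[i], X[j], sigma)
            row.append(w + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]

def lssvm_predict(x, X, y, b, alpha, sigma=1.0):
    # Sign of the kernel expansion plus bias.
    s = sum(a * yi * rbf(x, xi, sigma) for a, yi, xi in zip(alpha, y, X))
    return 1 if s + b >= 0 else -1
```

Training here amounts to one call to a linear solver rather than an iterative quadratic programming routine, which is the contrast the abstract draws.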

••

TL;DR: It is shown that the fast fixed-point algorithm is closely connected to maximum likelihood estimation as well, and modifications of the algorithm maximize the likelihood without constraints.

Abstract: The author previously introduced a fast fixed-point algorithm for independent component analysis. The algorithm was derived from objective functions motivated by projection pursuit. In this paper, it is shown that the algorithm is closely connected to maximum likelihood estimation as well. The basic fixed-point algorithm maximizes the likelihood under the constraint of decorrelation, if the score function is used as the nonlinearity. Modifications of the algorithm maximize the likelihood without constraints.

300 citations

••

TL;DR: The Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ) algorithms are constructed in this work for variable-length and warped feature sequences and good results have been obtained in speaker-independent speech recognition.

Abstract: The Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ) algorithms are constructed in this work for variable-length and warped feature sequences. The novelty is to associate an entire feature vector sequence, instead of a single feature vector, as a model with each SOM node. Dynamic time warping is used to obtain time-normalized distances between sequences with different lengths. Starting with random initialization, ordered feature sequence maps then ensue, and Learning Vector Quantization can be used to fine tune the prototype sequences for optimal class separation. The resulting SOM models, the prototype sequences, can then be used for the recognition as well as synthesis of patterns. Good results have been obtained in speaker-independent speech recognition.

170 citations
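As a rough illustration of the dynamic time warping step mentioned in the abstract above, the sketch below computes the cumulative alignment cost between two feature-vector sequences of different lengths. The Euclidean local distance, the step pattern, and the function name `dtw_distance` are assumptions; the abstract does not say how the distance is time-normalized, so no normalization is applied here.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Cumulative DTW cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i items of seq_a
    # with the first j items of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])  # local Euclidean distance
            cost[i][j] = d + min(cost[i - 1][j],       # repeat item of seq_b
                                 cost[i][j - 1],       # repeat item of seq_a
                                 cost[i - 1][j - 1])   # advance both
    return cost[n][m]
```

A distance of this kind could stand in for the usual vector distance when searching for the best-matching SOM node for an input sequence.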

••

TL;DR: SVM can be seen as a way to ‘sparsify’ Fisher's Linear Discriminant in order to obtain the most generalizing classification from the training set.

Abstract: We show that the orientation and location of the separating hyperplane for 2-class supervised pattern classification obtained by the Support Vector Machine (SVM) proposed by Vapnik and his colleagues is equivalent to the solution obtained by Fisher's Linear Discriminant on the set of Support Vectors. In other words, SVM can be seen as a way to 'sparsify' Fisher's Linear Discriminant in order to obtain the most generalizing classification from the training set.

64 citations

••

KAIST

TL;DR: A new algorithm to train feed-forward neural networks for non-linear input-to-output mappings with small incomplete data in arbitrary distributions results in much better recognition accuracy for test data.

Abstract: A new algorithm is developed to train feed-forward neural networks for non-linear input-to-output mappings with small incomplete data in arbitrary distributions. The developed Training-EStimation-Training (TEST) algorithm consists of 3 steps, i.e., (1) training with the complete portion of the training data set, (2) estimation of the missing attributes with the trained neural networks, and (3) re-training the neural networks with the whole data set. Error back propagation is still applicable to estimate the missing attributes. Unlike other training methods with missing data, it does not assume data distribution models which may not be appropriate for small training data. The developed TEST algorithm is first tested on the Iris benchmark data. By randomly removing some attributes from the complete data set and estimating the values later, the accuracy of the TEST algorithm is demonstrated. Then it is applied to the Diabetes benchmark data, of which about 50% contains missing attributes. Compared with other existing algorithms, the proposed TEST algorithm results in much better recognition accuracy for test data.

46 citations

••

TL;DR: A novel neural model made up of two self-organizing map nets – one on top of the other – is introduced and analysed experimentally; it makes effective use of context information, which enables it to perform sequence classification and discrimination efficiently.

Abstract: A novel neural model made up of two self-organizing maps nets – one on top of the other – is introduced and analysed experimentally. The model makes effective use of context information, and that enables it to perform sequence classification and discrimination efficiently. It was successfully applied to real sequences, taken from the third voice of the sixteenth four-part fugue in G minor of the Well-Tempered Clavier (vol. I) of J.S. Bach. The model has an application in domains which require pattern recognition, or more specifically, which demand the recognition of either a set of sequences of vectors in time or sub-sequences into a unique and large sequence of vectors in time.

38 citations

••

TL;DR: The new back-propagation algorithm changes the derivative of the activation function so as to magnify the backward propagated error signal, so that convergence is accelerated and local minima can be escaped.

Abstract: The conventional back-propagation algorithm is basically a gradient-descent method and suffers from local minima and slow convergence. A new generalized back-propagation algorithm which can effectively speed up the convergence rate and reduce the chance of being trapped in local minima is introduced. The new algorithm changes the derivative of the activation function so as to magnify the backward propagated error signal; thus the convergence rate can be accelerated and local minima can be escaped. In this letter, we also investigate the convergence of the generalized back-propagation algorithm with constant learning rate. The weight sequences in the generalized back-propagation algorithm can be approximated by a certain ordinary differential equation (ODE). When the learning rate tends to zero, the interpolated weight sequences of generalized back-propagation converge weakly to the solution of the associated ODE.

37 citations

••

TL;DR: A method is proposed for constructing salient features from a set of features given as input to a feedforward neural network used for supervised learning that is applied to classification problems leading to improved generalization ability.

Abstract: A method is proposed for constructing salient features from a set of features that are given as input to a feedforward neural network used for supervised learning. Combinations of the original features are formed that maximize the sensitivity of the network's outputs with respect to variations of its inputs. The method exhibits some similarity to Principal Component Analysis, but also takes into account the supervised character of the learning task. It is applied to classification problems, leading to improved generalization ability originating from the alleviation of the curse of dimensionality problem.

31 citations

••

TL;DR: A neural learning-based crowd estimation system for surveillance in complex scenes at the platform of underground stations is presented and promising experimental results were obtained in terms of estimation accuracy and real-time response capability to alert the operators automatically.

Abstract: A neural learning-based crowd estimation system for surveillance in complex scenes at the platform of underground stations is presented. Estimation is carried out by extracting a set of significant features from the sequences of images. Feature indices are modeled by the neural networks to estimate the crowd density. The learning phase is based on our proposed hybrid algorithms which are capable of providing the global search characteristic and fast convergence speed. Promising experimental results were obtained in terms of estimation accuracy and real-time response capability to alert the operators automatically.

30 citations

••

TL;DR: Although no distance function over the input data is definable, it is still possible to implement the self-organizing map (SOM) process using evolutionary-learning operations, and an order that complies with the ‘functional similarity’ of the models can be seen to emerge.

Abstract: Although no distance function over the input data is definable, it is still possible to implement the self-organizing map (SOM) process using evolutionary-learning operations. The process can be made to converge more rapidly when the probabilistic trials of conventional evolutionary learning are replaced by averaging using the so-called Batch Map version of the self-organizing map. Although no other condition or metric than a fitness function between the input samples and the models is assumed, an order in the map that complies with the 'functional similarity' of the models can be seen to emerge. There exist two modes of use of this new principle: representation of nonmetric input data distributions by models that may have variable structures, and fast generation of evolutionary cycles that resemble those defined by the genetic algorithms. The spatial order in the array of models can be utilized for finding more uniform variations, such as crossings between functionally similar models.

29 citations

••

TL;DR: In the first approach, information from the derivative of the fuzzy system is used to regularize the neural network learning, whereas in the second approach the fuzzy rules are used as a catalyst to increase the learning speed.

Abstract: The incorporation of prior knowledge into neural networks can improve neural network learning in several respects, for example, a faster learning speed and better generalization ability. However, neural network learning is data driven and there is no general way to exploit knowledge which is not in the form of data input-output pairs. In this paper, we propose two approaches for incorporating knowledge into neural networks from fuzzy rules. These fuzzy rules are generated based on expert knowledge or intuition. In the first approach, information from the derivative of the fuzzy system is used to regularize the neural network learning, whereas in the second approach the fuzzy rules are used as a catalyst. Simulation studies show that both approaches increase the learning speed significantly.

••

TL;DR: A CFHNN is proposed which integrates a Compensated Fuzzy C-Means model into the learning scheme and updating strategies of the Hopfield neural network, and shows promising results in comparison with FCM and PFCM methods.

Abstract: Hopfield neural networks are well known for cluster analysis with an unsupervised learning scheme. This class of networks is a set of heuristic procedures that suffers from several problems such as not guaranteed convergence and output depending on the sequence of input data. In this paper, a Compensated Fuzzy Hopfield Neural Network (CFHNN) is proposed which integrates a Compensated Fuzzy C-Means (CFCM) model into the learning scheme and updating strategies of the Hopfield neural network. The CFCM, modified from Penalized Fuzzy C-Means algorithm (PFCM), is embedded into a Hopfield net to avoid the NP-hard problem and to speed up the convergence rate for the clustering procedure. The proposed network also avoids determining values for the weighting factors in the energy function. In addition, its training scheme enables the network to learn more rapidly and more effectively than FCM and PFCM. In experimental results, the CFHNN method shows promising results in comparison with FCM and PFCM methods.

••

TL;DR: Both the Grassberger–Procaccia and the Takens' method were applied, yielding similar values for the correlation dimension, hence for the model order, and appropriately structured neural nets for short-term prediction were designed.

Abstract: In this paper we present an application of dynamics reconstruction techniques to model order estimation. Both the Grassberger–Procaccia and the Takens' method were applied, yielding similar values for the correlation dimension, hence for the model order. Based on this model order, appropriately structured neural nets for short-term prediction were designed. Satisfactory experimental results were obtained in one-hour-ahead electrical load forecasting on a six-month benchmark from an electric utility in the USA.

••

TL;DR: The validity of the method in handling both pattern classification and time series problems is demonstrated, using a weighted mixture of a finite number of Gaussian kernels whose parameters and weights are estimated iteratively from the input samples using the Maximum Likelihood procedure.

Abstract: We address the problem of estimating an unknown probability density function from a sequence of input samples. We approximate the input density with a weighted mixture of a finite number of Gaussian kernels whose parameters and weights we estimate iteratively from the input samples using the Maximum Likelihood (ML) procedure. In order to decide on the correct total number of kernels we employ simple statistical tests involving the mean, variance, and the kurtosis, or fourth moment, of a particular kernel. We demonstrate the validity of our method in handling both pattern classification (stationary) and time series (nonstationary) problems.

••

[...]

TL;DR: It is shown that a tiled floor provides continuous stabilizing visual information against stumbling and falling while walking.

Abstract: It is shown that a tiled floor provides continuous stabilizing visual information against stumbling and falling while walking. A steady walk, eyes at a nearly constant height, maintains a certain level of net optical flow in a tile's image. A stumble generates a disturbance in the net flow, which is fed back as a corrective signal to the limbs. This may explain why people with certain motoric disorders, such as those associated with Parkinson's disease, appear to be more comfortable walking on tiled floors than on untiled ones.

••

Fudan University

TL;DR: It is shown that the conditions given in [1] are sufficient but not necessary for the global asymptotic stability of the equilibrium of a class of delay differential equations, which is proved to hold under weaker conditions.

Abstract: In this paper, we point out that the conditions given in [1] are sufficient but not necessary for the global asymptotic stability of the equilibrium of a class of delay differential equations. Instead, we prove that under weaker conditions, the equilibrium is still globally asymptotically stable.

••

TL;DR: A modified Monte Carlo technique, the so-called Importance Sampling (IS) technique, is considered, and some topics are developed, such as optimal and suboptimal IS probability density functions, control parameters and new algorithms for the minimization of the estimator error.

Abstract: Often, Neural Networks are involved in binary detectors of communication, radar or sonar systems. The design phase of a neural network detector usually requires the application of Monte Carlo trials in order to estimate some performance parameters.
The classical Monte Carlo method is suitable to estimate high event probabilities (higher than 0.01), but not suitable to estimate very low event probabilities (say, 10^-5 or less). For estimations of very low false alarm probabilities (or error probabilities), a modified Monte Carlo technique, the so-called Importance Sampling (IS) technique, is considered in this paper; some topics are developed, such as optimal and suboptimal IS probability density functions (biasing density functions), control parameters and new algorithms for the minimization of the estimator error.
The main novelty of this paper is the application of an efficient IS technique on neural networks, drastically reducing the number of patterns required for testing events of low probability. As a practical application, the IS technique is applied to a neural detector on a radar (or sonar) system.
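To make the contrast with classical Monte Carlo concrete, here is a toy sketch of importance sampling for a very low event probability. It is not the paper's neural detector: the event is simply a standard normal variable exceeding a threshold, and the biasing density is a unit-variance Gaussian shifted to the threshold, both of which are assumptions chosen for illustration, as are the function names.

```python
import math
import random

def tail_prob_is(threshold, n_samples=20000, seed=0):
    """Estimate P(X > threshold), X ~ N(0, 1), by importance sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Draw from the biasing density N(threshold, 1), under which the
        # rare event occurs about half of the time.
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Likelihood ratio phi(x) / phi(x - threshold).
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples

def tail_prob_exact(threshold):
    # Closed-form Gaussian tail probability, used only for comparison.
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))
```

For a threshold of 4 the true probability is of the order of 10^-5, so a plain Monte Carlo run with the same number of samples would usually observe no events at all, whereas the weighted estimate stays close to the exact value.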

••

TL;DR: In the new procedure, the amplified signal received by a neuron from other neurons is treated as the target value for its activation (output) value, and the activation of a neuron is updated directly based on the difference between its current activation and the received target value.

Abstract: When solving an optimization problem with a Hopfield network, a solution is obtained after the network is relaxed to an equilibrium state. The relaxation process is an important step in achieving a solution. In this paper, a new procedure for the relaxation process is proposed. In the new procedure, the amplified signal received by a neuron from other neurons is treated as the target value for its activation (output) value. The activation of a neuron is updated directly based on the difference between its current activation and the received target value, without using the updating of the input value as an intermediate step. A relaxation rate is applied to control the updating scale for a smooth relaxation process. The new procedure is evaluated and compared with the original procedure in the Hopfield network through simulations based on 200 randomly generated instances of the 10-city traveling salesman problem. The new procedure reduces the error rate by 34.6% and increases the percentage of valid tours by 194.6% as compared with the original procedure.
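One possible reading of the relaxation procedure described above can be sketched as follows. The sigmoid amplifier, the synchronous update order, the parameter values, and the name `relax` are all assumptions; the abstract only states that each activation moves toward the amplified received signal, scaled by a relaxation rate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relax(weights, bias, v, gain=5.0, rate=0.2, steps=200):
    """Move each activation toward its target by a fraction `rate` per step."""
    v = list(v)
    n = len(v)
    for _ in range(steps):
        # Target for neuron i: amplified signal received from the other neurons.
        targets = [
            sigmoid(gain * (sum(weights[i][j] * v[j] for j in range(n)) + bias[i]))
            for i in range(n)
        ]
        # Direct activation update, with no intermediate input-state update.
        v = [vi + rate * (ti - vi) for vi, ti in zip(v, targets)]
    return v
```

With two mutually inhibiting neurons, for example `relax([[0, -2], [-2, 0]], [1.5, 0.5], [0.5, 0.5])`, the neuron with the larger bias settles near 1 and the other near 0.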

••

TL;DR: The direct relation between the mean square error (MSE) and the statistical sensitivity to weight deviations is shown, defining a measure of tolerance based on statistical sensitivity that is called Mean Square Sensitivity (MSS); this allows us to predict accurately the degradation of the MSE when the weight values change and so constitutes a useful parameter for choosing between different configurations of MLPs.

Abstract: The inherent fault tolerance of artificial neural networks (ANNs) is usually assumed, but several authors have claimed that ANNs are not always fault tolerant and have demonstrated the need to evaluate their robustness by quantitative measures. For this purpose, various alternatives have been proposed. In this paper we show the direct relation between the mean square error (MSE) and the statistical sensitivity to weight deviations, defining a measure of tolerance based on statistical sensitivity that we have called Mean Square Sensitivity (MSS); this allows us to predict accurately the degradation of the MSE when the weight values change and so constitutes a useful parameter for choosing between different configurations of MLPs. The experimental results obtained for different MLPs are shown and demonstrate the validity of our model.

••

TL;DR: A new neural network training algorithm that optimises performance in relation to the available memory; it has equivalent properties to Full Memory BFGS optimisation (FM) when there are no restrictions on memory and to FM with periodic reset when memory is limited.

Abstract: A new neural network training algorithm which optimises performance in relation to the available memory is described. Numerically it has equivalent properties to Full Memory BFGS optimisation (FM) when there are no restrictions on memory and to FM with periodic reset when memory is limited. Achievable performance is determined by the ratio between available memory and problem size and accordingly varies between that of the full and memory-less versions of the BFGS algorithm.

••

TL;DR: This article proposes an extension of the existing models by including a developmental phase – a growth process – of the neural network, based on the recursive encoding method for structure optimization of neural networks, applied to the problem domain of time series prediction.

Abstract: The interaction between learning and evolution has elicited much interest particularly among researchers who use evolutionary algorithms for the optimization of neural structures. In this article, we will propose an extension of the existing models by including a developmental phase – a growth process – of the neural network. In this way, we are able to examine the dynamical interaction between genetic information and information learned during development. Several measures are proposed to quantitatively examine the benefits and the effects of such an overlap between learning and evolution. The proposed model, which is based on the recursive encoding method for structure optimization of neural networks, is applied to the problem domain of time series prediction. Furthermore, comments are made on problem domains which associate growing networks (size) during development with problems of increasing complexity.

••

TL;DR: This paper deals with the design and implementation of a neural network-based self-tuning controller that takes advantage of the learning ability of neural networks and uses them in place of an identifier in the conventional self-tuner scheme.

Abstract: This paper deals with the design and implementation of a neural network-based self-tuning controller. The structure of the controller is based on using a neural network, or a set of them, as a self-tuner for a controller. The intention of this approach is to take advantage of the ability to learn of the neural networks and to use them in place of an identifier in the conventional self-tuner scheme. The work is divided into two main parts. The first one is dedicated to the design of the self-controller. And the second is an application of the algorithm on a nonlinear system: an overhead crane. Some simulations were carried out to verify the efficiency of the self-tuner and then a real-time implementation on a scale prototype was performed.

••

TL;DR: A new initialization scheme is proposed based on the search for an explicit approximate solution to the problem of mapping between pattern and target to avoid local minima, reduce training time, obtain a better generalization and estimate the network's size.

Abstract: This paper concerns the initialization problem of the training algorithm in Neural Networks. We focus herein on backpropagation networks with one hidden layer. The initialization of the weights is crucial; if the network is incorrectly initialized, it converges to local minima. The classical random initialization therefore appears as a very poor solution. If we were to consider the Taylor development of the mapping problem and the nonlinearity of sigmoids, the improvements could be very significant. We propose a new initialization scheme based on the search for an explicit approximate solution to the problem of mapping between pattern and target. Simulation results are presented which show that these original initializations avoid local minima, reduce training time, obtain a better generalization and estimate the network's size.

••

TL;DR: A methodology based on self-organizing feature maps and indexing techniques for time and memory efficient neural network training and classification of large volumes of remotely sensed data is presented and shows dramatic improvement of the classification time of the k-nearest neighbors statistical classifier.

Abstract: A methodology based on self-organizing feature maps and indexing techniques for time and memory efficient neural network training and classification of large volumes of remotely sensed data is presented. Results on land-cover classification of multispectral satellite images using two popular neural models show orders of magnitude of speedup with respect to both training and classification times. The generality of the proposed methodology is demonstrated with a dramatic improvement of the classification time of the k-nearest neighbors statistical classifier.

••

TL;DR: The approximation properties of the RBF neural networks are investigated and a new approach is proposed, which is based on approximations with orthogonal combinations of functions, which can be used to design efficient neural networks.

Abstract: The approximation properties of the RBF neural networks are investigated in this paper. A new approach is proposed, which is based on approximations with orthogonal combinations of functions. An orthogonalization framework is presented for the Gaussian basis functions. It is shown how to use this framework to design efficient neural networks. Using this method we can estimate the necessary number of the hidden nodes, and we can evaluate how appropriate the use of the Gaussian RBF networks is for the approximation of a given function.

••

TL;DR: This paper describes a method in which, by modifying the constraints imposed on the weights in HONNs, the performance of a HONN with respect to distortion can be improved considerably.

Abstract: The higher order neural network (HONN) was proved to be able to realize invariant object recognition. By taking the relationship between input units into account, HONNs are superior to other neural models in invariant pattern recognition. However, there are two main problems preventing HONNs from practical applications. One is the combinatorial increase of weight number, that is, as input size increases, the number of weights in a HONN increases exponentially. The other problem is sensitivity to distortion and noise. In this paper, we describe a method in which, by modifying the constraints imposed on the weights in HONNs, the performance of a HONN with respect to distortion can be improved considerably.

••

TL;DR: The results immediately raise the question of why a perceptron with a continuous activation function may fail to recognize linear separability and how to remedy this failure.

Abstract: Recently it was pointed out that a well-known benchmark data set, the sonar target data, is in fact linearly separable. This comes as something of a surprise, since earlier studies involving delta rule trained perceptrons did not achieve the separation of the training data. These results immediately raise the question of why a perceptron with a continuous activation function may fail to recognize linear separability and how to remedy this failure. The study of these issues directly leads to a performance comparison of a wide variety of different perceptron training procedures on real world data.

••

TL;DR: The bounds given here allow the user to choose the appropriate size of a neural network such that: (i) the given classification problem can be solved, and (ii) the network architecture is not oversized.

Abstract: This paper presents a constructive approach to estimating the size of a neural network necessary to solve a given classification problem. The results are derived using an information entropy approach in the context of limited precision integer weights. Such weights are particularly suited for hardware implementations since the area they occupy is limited, and the computations performed with them can be efficiently implemented in hardware. The considerations presented use an information entropy perspective and calculate lower bounds on the number of bits needed in order to solve a given classification problem. These bounds are obtained by approximating the classification hypervolumes with the volumes of several regular (i.e., highly symmetric) n-dimensional bodies. The bounds given here allow the user to choose the appropriate size of a neural network such that: (i) the given classification problem can be solved, and (ii) the network architecture is not oversized. All considerations presented take into account the restrictive case of limited precision integer weights, and therefore can be directly applied when designing VLSI implementations of neural networks.

••

TL;DR: The weighted fan-in activation function may be replaced by a distance function between the inputs and the weights, offering a natural generalization of the standard MLP model.

Abstract: Multilayer Perceptrons (MLPs) use scalar products to compute weighted activation of neurons, providing decision borders using combinations of soft hyperplanes. The weighted fan-in activation function may be replaced by a distance function between the inputs and the weights, offering a natural generalization of the standard MLP model. Non-Euclidean distance functions may also be introduced by normalization of the input vectors into an extended feature space. Both approaches influence the shapes of decision borders dramatically. An illustrative example showing these changes is provided.
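The difference between the two activation types can be illustrated with a small sketch. The abstract does not give the exact functional form, so the distance-based unit below, a sigmoid of a threshold `theta` minus the Euclidean distance to the weight vector, is only one plausible choice; both function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fan_in_neuron(x, w, b):
    # Standard MLP unit: scalar product plus bias, giving a soft hyperplane.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def distance_neuron(x, w, theta):
    # Distance-based unit (assumed form): the output depends only on how far
    # x lies from w, so the decision border is a soft sphere around w.
    return sigmoid(theta - math.dist(x, w))
```

The fan-in unit responds identically along any line parallel to its hyperplane, while the distance unit responds identically on any sphere centred on `w`, which is one way to see why the decision borders change shape.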