
Showing papers on "Artificial neural network published in 1992"


Journal ArticleDOI
TL;DR: It is suggested that current-generation feedforward neural networks are largely inadequate for difficult problems in machine perception and machine learning, regardless of parallel-versus-serial hardware or other implementation issues.
Abstract: Feedforward neural networks trained by error backpropagation are examples of nonparametric regression estimators. We present a tutorial on nonparametric inference and its relation to neural networks, and we use the statistical viewpoint to highlight strengths and weaknesses of neural models. We illustrate the main points with some recognition experiments involving artificial data as well as handwritten numerals. In way of conclusion, we suggest that current-generation feedforward neural networks are largely inadequate for difficult problems in machine perception and machine learning, regardless of parallel-versus-serial hardware or other implementation issues. Furthermore, we suggest that the fundamental challenges in neural modeling are about representation rather than learning per se. This last point is supported by additional experiments with handwritten numerals.

3,492 citations


Journal ArticleDOI
TL;DR: In this paper, the authors focus on the promise of artificial neural networks in the realm of modelling, identification and control of nonlinear systems and explore the links between the fields of control science and neural networks.

1,721 citations


Journal ArticleDOI
TL;DR: Alcove selectively attends to relevant stimulus dimensions, can account for a form of base-rate neglect, does not suffer catastrophic forgetting, and can exhibit 3-stage learning of high-frequency exceptions to rules, whereas such effects are not easily accounted for by models using other combinations of representation and learning method.
Abstract: ALCOVE (attention learning covering map) is a connectionist model of category learning that incorporates an exemplar-based representation (Medin & Schaffer, 1978; Nosofsky, 1986) with error-driven learning (Gluck & Bower, 1988; Rumelhart, Hinton, & Williams, 1986). ALCOVE selectively attends to relevant stimulus dimensions, is sensitive to correlated dimensions, can account for a form of base-rate neglect, does not suffer catastrophic forgetting, and can exhibit 3-stage (U-shaped) learning of high-frequency exceptions to rules, whereas such effects are not easily accounted for by models using other combinations of representation and learning method.

1,574 citations
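ALCOVE's exemplar-node activation (an exponential-decay similarity gated by learned attention weights) can be sketched as follows. This is a hedged reconstruction from the model's published form; the function name, constants, and the example stimuli are illustrative, not taken from the paper.

```python
import numpy as np

def exemplar_activations(stimulus, exemplars, attention, c=1.0, q=1, r=1):
    """Activation of ALCOVE's exemplar (hidden) nodes for one stimulus:
    a_j = exp(-c * (sum_i alpha_i * |x_i - h_ji|^r)^(q/r)).
    With q = r = 1 this is exponential-decay similarity over an
    attention-weighted city-block distance."""
    dist = np.sum(attention * np.abs(exemplars - stimulus) ** r, axis=1) ** (q / r)
    return np.exp(-c * dist)

# Three stored exemplars in a 2-D psychological space (made-up values).
exemplars = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
attention = np.array([0.5, 0.5])  # learned dimension-attention weights
acts = exemplar_activations(np.array([0.0, 0.0]), exemplars, attention)
```

Shrinking an attention weight flattens the distance along that dimension, which is how the model comes to ignore irrelevant stimulus dimensions.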


Journal ArticleDOI
TL;DR: Research aimed at correcting words in text has focused on three progressively more difficult problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent word correction; this article surveys documented findings on spelling error patterns and techniques for all three.
Abstract: Research aimed at correcting words in text has focused on three progressively more difficult problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent word correction. In response to the first problem, efficient pattern-matching and n-gram analysis techniques have been developed for detecting strings that do not appear in a given word list. In response to the second problem, a variety of general and application-specific spelling correction techniques have been developed. Some of them were based on detailed studies of spelling error patterns. In response to the third problem, a few experiments using natural-language-processing tools or statistical-language models have been carried out. This article surveys documented findings on spelling error patterns, provides descriptions of various nonword detection and isolated-word error correction techniques, reviews the state of the art of context-dependent word correction techniques, and discusses research issues related to all three areas of automatic error correction in text.

1,417 citations
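The second problem, isolated-word error correction, is classically done by picking the word-list entry at minimum edit distance from the nonword. A minimal sketch (the tiny lexicon and the alphabetical tie-breaking rule are illustrative, not from the survey):

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def correct(word, lexicon):
    """Nonword detection plus isolated-word correction: if the word is
    not in the lexicon, return the nearest entry (ties broken
    alphabetically)."""
    if word in lexicon:
        return word
    return min(sorted(lexicon), key=lambda w: edit_distance(word, w))

lexicon = {"neural", "network", "pattern"}
```

A transposition such as "netwrok" costs two substitutions under this metric; some correctors add transposition as a fourth unit-cost operation (Damerau's variant).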


Book
01 Jan 1992
TL;DR: The Computational Brain addresses the foundational ideas of the emerging field of computational neuroscience, examines a diverse range of neural network models, and considers future directions of the field.
Abstract: From the Publisher: How do groups of neurons interact to enable the organism to see, decide, and move appropriately? What are the principles whereby networks of neurons represent and compute? These are the central questions probed by The Computational Brain. Churchland and Sejnowski address the foundational ideas of the emerging field of computational neuroscience, examine a diverse range of neural network models, and consider future directions of the field. The Computational Brain is the first unified and broadly accessible book to bring together computational concepts and behavioral data within a neurobiological framework. Computer models constrained by neurobiological data can help reveal how networks of neurons subserve perception and behavior--how their physical interactions can yield global results in perception and behavior, and how their physical properties are used to code information and compute solutions. The Computational Brain focuses mainly on three domains: visual perception, learning and memory, and sensorimotor integration. Examples of recent computer models in these domains are discussed in detail, highlighting strengths and weaknesses, and extracting principles applicable to other domains. Churchland and Sejnowski show how both abstract models and neurobiologically realistic models can have useful roles in computational neuroscience, and they predict the coevolution of models and experiments at many levels of organization, from the neuron to the system. The Computational Brain addresses a broad audience: neuroscientists, computer scientists, cognitive scientists, and philosophers. It is written for both the expert and novice. A basic overview of neuroscience and computational theory is provided, followed by a study of some of the most recent and sophisticated modeling work in the context of relevant neurobiological research. Technical terms are clearly explained in the text, and definitions are provided in an extensive glossary. 
The appendix contains a precis of

1,389 citations


Journal ArticleDOI
TL;DR: First- and second-order optimization methods for learning in feedforward neural networks are reviewed to illustrate the main characteristics of the different methods and their mutual relations.
Abstract: On-line first-order backpropagation is sufficiently fast and effective for many large-scale classification problems but for very high precision mappings, batch processing may be the method of choice. This paper reviews first- and second-order optimization methods for learning in feedforward neural networks. The viewpoint is that of optimization: many methods can be cast in the language of optimization techniques, allowing the transfer to neural nets of detailed results about computational complexity and safety procedures to ensure convergence and to avoid numerical problems. The review is not intended to deliver detailed prescriptions for the most appropriate methods in specific applications, but to illustrate the main characteristics of the different methods and their mutual relations.

1,218 citations
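The first-order versus second-order contrast the review draws can be illustrated on a quadratic model of the error surface, where the Newton update solves the problem in one step while a fixed-step gradient update does not. The matrix, target, and step size below are made up for illustration:

```python
import numpy as np

# Quadratic model of the error surface: E(w) = 0.5 w'Aw - b'w,
# with gradient g(w) = Aw - b and constant Hessian H = A. A stands in
# for the curvature a second-order method estimates (Gauss-Newton, BFGS).
A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([3.0, 1.0])
w0 = np.zeros(2)

def grad(w):
    return A @ w - b

# First-order: one fixed-step gradient-descent update.
eta = 0.1
w_gd = w0 - eta * grad(w0)

# Second-order: the Newton update w - H^{-1} g minimizes the quadratic exactly.
w_newton = w0 - np.linalg.solve(A, grad(w0))
```

On a non-quadratic network error surface the Hessian changes with w, so second-order methods pay for their faster local convergence with curvature estimation and safeguards, which is the trade-off the review examines.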


Journal ArticleDOI
TL;DR: Empirical results show that neural nets are a promising method of evaluating bank conditions in terms of predictive accuracy, adaptability, and robustness.
Abstract: This paper introduces a neural-net approach to perform discriminant analysis in business research. A neural net represents a nonlinear discriminant function as a pattern of connections between its processing units. Using bank default data, the neural-net approach is compared with linear classifier, logistic regression, kNN, and ID3. Empirical results show that neural nets are a promising method of evaluating bank conditions in terms of predictive accuracy, adaptability, and robustness. Limitations of using neural nets as a general modeling tool are also discussed.

1,141 citations


Journal ArticleDOI
TL;DR: A fuzzy neural network model based on the multilayer perceptron, using the backpropagation algorithm, and capable of fuzzy classification of patterns is described, and the results are compared with those of the conventional MLP, the Bayes classifier, and other related models.
Abstract: A fuzzy neural network model based on the multilayer perceptron, using the backpropagation algorithm, and capable of fuzzy classification of patterns is described. The input vector consists of membership values to linguistic properties while the output vector is defined in terms of fuzzy class membership values. This allows efficient modeling of fuzzy uncertain patterns with appropriate weights being assigned to the backpropagated errors depending upon the membership values at the corresponding outputs. During training, the learning rate is gradually decreased in discrete steps until the network converges to a minimum error solution. The effectiveness of the algorithm is demonstrated on a speech recognition problem. The results are compared with those of the conventional MLP, the Bayes classifier, and other related models.

1,031 citations
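Two of the training details above can be sketched directly: the stepwise learning-rate decrease, and error terms scaled by output memberships. The exact weighting the paper uses is not reproduced here, so the membership-scaled form below is a hypothetical stand-in:

```python
def stepped_rate(epoch, eta0=0.5, factor=0.5, step=10):
    """Learning rate held constant within a step and cut by `factor`
    every `step` epochs -- a discrete decrease like the one described."""
    return eta0 * factor ** (epoch // step)

def weighted_errors(target_memberships, outputs):
    """Hypothetical membership weighting: each backpropagated output
    error (t - o) is scaled by the target class membership t, so
    uncertain (low-membership) patterns contribute less to learning."""
    return [t * (t - o) for t, o in zip(target_memberships, outputs)]
```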


Journal ArticleDOI
TL;DR: In this article, a generalization of the PAC learning model based on statistical decision theory is described, where the learner receives randomly drawn examples, each example consisting of an instance x in X and an outcome y in Y, and tries to find a hypothesis h : X → A, where h in H, that specifies the appropriate action a in A to take for each instance x, in order to minimize the expectation of a loss l(y,a).
Abstract: We describe a generalization of the PAC learning model that is based on statistical decision theory. In this model the learner receives randomly drawn examples, each example consisting of an instance x in X and an outcome y in Y, and tries to find a hypothesis h : X → A, where h in H, that specifies the appropriate action a in A to take for each instance x, in order to minimize the expectation of a loss l(y,a). Here X, Y, and A are arbitrary sets, l is a real-valued function, and examples are generated according to an arbitrary joint distribution on X times Y. Special cases include the problem of learning a function from X into Y, the problem of learning the conditional probability distribution on Y given X (regression), and the problem of learning a distribution on X (density estimation). We give theorems on the uniform convergence of empirical loss estimates to true expected loss rates for certain hypothesis spaces H, and show how this implies learnability with bounded sample size, disregarding computational complexity. As an application, we give distribution-independent upper bounds on the sample size needed for learning with feedforward neural networks. Our theorems use a generalized notion of VC dimension that applies to classes of real-valued functions, adapted from Pollard's work, and a notion of *capacity* and *metric dimension* for classes of functions that map into a bounded metric space. (Supersedes 89-30 and 90-52.) [Also in "Information and Computation", Vol. 100, No.1, September 1992]

1,025 citations
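The decision-theoretic setup above can be restated in notation. This is a reconstruction from the abstract's own symbols; the sample size m is introduced here only to write the empirical average:

```latex
% Examples (x, y) are drawn i.i.d. from an arbitrary joint distribution D
% on X \times Y; the learner picks a hypothesis h : X \to A from H.
\[
  L(h) \;=\; \mathbb{E}_{(x,y) \sim D}\big[\, \ell(y,\, h(x)) \,\big],
  \qquad
  \hat{L}_m(h) \;=\; \frac{1}{m} \sum_{i=1}^{m} \ell\big(y_i,\, h(x_i)\big).
\]
% The paper's uniform-convergence theorems bound
% \sup_{h \in H} |\hat{L}_m(h) - L(h)|, which yields learnability with a
% bounded sample size independent of the distribution D.
```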


Journal ArticleDOI
TL;DR: The generalized approximate-reasoning-based intelligent control (GARIC) architecture learns and tunes a fuzzy logic controller even when only weak reinforcement is available; introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; and learns to produce real-valued control actions.
Abstract: A method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. It is shown that the generalized approximate-reasoning-based intelligent control (GARIC) architecture: learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.

987 citations


Journal ArticleDOI
TL;DR: A fuzzy modeling method using fuzzy neural networks with the backpropagation algorithm is presented that can identify the fuzzy model of a nonlinear system automatically.
Abstract: A fuzzy modeling method using fuzzy neural networks with the backpropagation algorithm is presented. The method can identify the fuzzy model of a nonlinear system automatically. The feasibility of the method is examined using simple numerical data.

Journal ArticleDOI
01 Jul 1992
TL;DR: Both template matching and structure analysis approaches to R&D are considered and it is noted that the two approaches are coming closer and tending to merge.
Abstract: Research and development of OCR systems are considered from a historical point of view. The historical development of commercial systems is included. Both template matching and structure analysis approaches to R&D are considered. It is noted that the two approaches are coming closer and tending to merge. Commercial products are divided into three generations, for each of which some representative OCR systems are chosen and described in some detail. Some comments are made on recent techniques applied to OCR, such as expert systems and neural networks, and some open problems are indicated. The authors' views and hopes regarding future trends are presented.

27 Oct 1992
TL;DR: Experimental results show that the ensemble method dramatically improves neural network performance on difficult real-world optical character recognition tasks.
Abstract: This paper presents a general theoretical framework for ensemble methods of constructing significantly improved regression estimates. Given a population of regression estimators, the authors construct a hybrid estimator that is as good or better in the mean square error sense than any estimator in the population. They argue that the ensemble method presented has several properties: (1) it efficiently uses all the networks of a population -- none of the networks need to be discarded; (2) it efficiently uses all of the available data for training without over-fitting; (3) it inherently performs regularization by smoothing in functional space, which helps to avoid over-fitting; (4) it utilizes local minima to construct improved estimates whereas other neural network algorithms are hindered by local minima; (5) it is ideally suited for parallel computation; (6) it leads to a very useful and natural measure of the number of distinct estimators in a population; and (7) the optimal parameters of the ensemble estimator are given in closed form. Experimental results show that the ensemble method dramatically improves neural network performance on difficult real-world optical character recognition tasks.
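The basic ensemble effect can be demonstrated with a uniform average over a population of noisy regression estimates (the paper additionally derives optimal combination weights in closed form, which this sketch omits; the synthetic target and noise levels are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# A target function sampled at test points, plus a population of imperfect
# "regression estimators": here, the target corrupted by independent noise,
# standing in for networks trained from different initial weights.
x = np.linspace(0.0, 1.0, 200)
target = np.sin(2 * np.pi * x)
estimators = [target + rng.normal(0.0, 0.3, x.size) for _ in range(10)]

def mse(f):
    return np.mean((f - target) ** 2)

# Uniform ensemble average of the population.
ensemble = np.mean(estimators, axis=0)
avg_individual_mse = np.mean([mse(f) for f in estimators])
```

By convexity of the squared error, the ensemble's MSE can never exceed the population's average MSE; when the estimators' errors are roughly independent, it is dramatically lower.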

Journal ArticleDOI
TL;DR: The Stochastic Gradient Ascent neural network is proposed and shown to be closely related to the Generalized Hebbian Algorithm (GHA), and the SGA behaves better for extracting the less dominant eigenvectors.

Journal ArticleDOI
TL;DR: In this article, a hybrid neural network-first principles modeling scheme is developed and used to model a fedbatch bioreactor, which combines a partial first principles model, which incorporates the available prior knowledge about the process being modeled, with a neural network which serves as an estimator of unmeasuredprocess parameters that are difficult to model from first principles.
Abstract: A hybrid neural network-first principles modeling scheme is developed and used to model a fedbatch bioreactor. The hybrid model combines a partial first principles model, which incorporates the available prior knowledge about the process being modeled, with a neural network which serves as an estimator of unmeasured process parameters that are difficult to model from first principles. This hybrid model has better properties than standard “black-box” neural network models in that it is able to interpolate and extrapolate much more accurately, is easier to analyze and interpret, and requires significantly fewer training examples. Two alternative state and parameter estimation strategies, extended Kalman filtering and NLP optimization, are also considered. When no a priori known model of the unobserved process parameters is available, the hybrid network model gives better estimates of the parameters, when compared to these methods. By providing a model of these unmeasured parameters, the hybrid network can also make predictions and hence can be used for process optimization. These results apply both when full and partial state measurements are available, but in the latter case a state reconstruction method must be used for the first principles component of the hybrid model.
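The hybrid idea can be sketched on a minimal fed-batch biomass balance: the mass balance is the first-principles part, and the hard-to-model specific growth rate is the term the paper estimates with a neural network. Here a Monod expression stands in for that trained network, and all constants are hypothetical:

```python
# Hybrid first-principles / black-box sketch:
#   dX/dt = mu(S) * X - (F/V) * X
# where X is biomass, S substrate, F/V the dilution rate.

def mu_estimator(S, mu_max=0.4, Ks=0.5):
    """Stand-in for the neural-net estimate of the specific growth
    rate; a Monod form with made-up constants."""
    return mu_max * S / (Ks + S)

def simulate(X0=0.1, S=2.0, F_over_V=0.05, dt=0.01, steps=1000):
    """Euler integration of the hybrid model, with substrate held
    constant for simplicity."""
    X = X0
    for _ in range(steps):
        X += dt * (mu_estimator(S) * X - F_over_V * X)
    return X

X_final = simulate()
```

Swapping a trained network in for `mu_estimator` leaves the mass-balance structure (and hence interpretability and extrapolation behavior) intact, which is the point of the hybrid scheme.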

Journal ArticleDOI
TL;DR: A system architecture and a network computational approach compatible with the goal of devising a general-purpose artificial neural network computer are described and the functionalities of supervised learning and optimization are illustrated.
Abstract: A system architecture and a network computational approach compatible with the goal of devising a general-purpose artificial neural network computer are described. The functionalities of supervised learning and optimization are illustrated, and cluster analysis and associative recall are briefly mentioned.

Journal ArticleDOI
TL;DR: From a direct proof of the universal approximation capabilities of perceptron type networks with two hidden layers, estimates of numbers of hidden units are derived based on properties of the function being approximated and the accuracy of its approximation.

Journal ArticleDOI
TL;DR: A more complicated penalty term is proposed in which the distribution of weight values is modeled as a mixture of multiple gaussians, which allows the parameters of the mixture model to adapt at the same time as the network learns.
Abstract: One way of simplifying neural networks so they generalize better is to add an extra term to the error function that will penalize complexity. Simple versions of this approach include penalizing the sum of the squares of the weights or penalizing the number of nonzero weights. We propose a more complicated penalty term in which the distribution of weight values is modeled as a mixture of multiple gaussians. A set of weights is simple if the weights have high probability density under the mixture model. This can be achieved by clustering the weights into subsets with the weights in each cluster having very similar values. Since we do not know the appropriate means or variances of the clusters in advance, we allow the parameters of the mixture model to adapt at the same time as the network learns. Simulations on two different problems demonstrate that this complexity term is more effective than previous complexity terms.
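The penalty described above is the negative log-likelihood of the weights under an adaptive Gaussian mixture. A minimal sketch of evaluating that penalty (the mixture parameters and weight vectors below are illustrative; in training they would adapt jointly with the weights):

```python
import math

def mixture_penalty(weights, pis, mus, sigmas):
    """Complexity penalty: negative log-likelihood of the weights under
    a mixture of Gaussians. Weights are cheap when they cluster near a
    mixture centre with small variance."""
    total = 0.0
    for w in weights:
        density = sum(
            pi * math.exp(-((w - mu) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
            for pi, mu, s in zip(pis, mus, sigmas)
        )
        total -= math.log(density)
    return total

# Two components: a narrow cluster at zero and a broad catch-all.
pis, mus, sigmas = [0.7, 0.3], [0.0, 0.0], [0.05, 1.0]
clustered = [0.01, -0.02, 0.0, 0.015]   # weights sharing similar values
spread = [0.9, -1.3, 2.1, -0.7]         # weights with no shared structure

p_clustered = mixture_penalty(clustered, pis, mus, sigmas)
p_spread = mixture_penalty(spread, pis, mus, sigmas)
```

Tightly clustered weights incur a much smaller penalty than spread-out ones, which is what drives the soft weight-sharing the abstract describes.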

Journal ArticleDOI
TL;DR: A neural network is developed to forecast rainfall intensity fields in space and time using a three-layer learning network with input, hidden, and output layers and is shown to perform well when a relatively large number of hidden nodes are utilized.

Journal ArticleDOI
TL;DR: A theoretical framework for backpropagation (BP) is proposed; it is proven in particular that convergence holds if the classes are linearly separable, and in that case experiments indicate that multilayered neural networks (MLNs) exceed perceptrons in generalization to new examples.
Abstract: The authors propose a theoretical framework for backpropagation (BP) in order to identify some of its limitations as a general learning procedure and the reasons for its success in several experiments on pattern recognition. The first important conclusion is that examples can be found in which BP gets stuck in local minima. A simple example in which BP can get stuck during gradient descent without having learned the entire training set is presented. This example guarantees the existence of a solution with null cost. Some conditions on the network architecture and the learning environment that ensure the convergence of the BP algorithm are proposed. It is proven in particular that the convergence holds if the classes are linearly separable. In this case, the experience gained in several experiments shows that multilayered neural networks (MLNs) exceed perceptrons in generalization to new examples.

Journal ArticleDOI
TL;DR: For a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, inconsistency in rating among experts was observed, with fuzzy c-means approaches being slightly preferred over feedforward cascade correlation results.
Abstract: Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms, and a supervised computational neural network. Initial clinical results are presented on normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. For a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed, with fuzzy c-means approaches being slightly preferred over feedforward cascade correlation results. Various facets of both approaches, such as supervised versus unsupervised learning, time complexity, and utility for the diagnostic process, are compared.
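The unsupervised half of the comparison, fuzzy c-means, alternates between soft membership updates and centre updates. A minimal sketch on synthetic 1-D "tissue intensity" data (the data, cluster count, and iteration budget are made up; real MR segmentation runs on multispectral voxel vectors):

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=50, seed=0):
    """Plain fuzzy c-means: alternately update the soft membership
    matrix U (n x c) and the cluster centres V. m > 1 controls
    fuzziness; rows of U always sum to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]             # centres
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        # u_ik = d_ik^{-2/(m-1)} / sum_j d_jk^{-2/(m-1)}
        U = 1.0 / (d ** (2 / (m - 1)) * np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
    return U, V

# Two well-separated synthetic intensity clusters.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
U, V = fuzzy_c_means(X)
```

The soft memberships are what get "synthetically colored" for display; hardening U by taking the maximum per row yields a conventional segmentation.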

Journal ArticleDOI
TL;DR: This paper addresses the issues related to the identification of nonlinear discrete-time dynamic systems using neural networks with particular attention to the connections between existing techniques for nonlinear systems identification and some aspects of neural network methodology.
Abstract: Many real-world systems exhibit complex nonlinear characteristics and cannot be treated satisfactorily using linear systems theory. A neural network which has the ability to learn sophisticated nonlinear relationships provides an ideal means of modelling complicated nonlinear systems. This paper addresses the issues related to the identification of nonlinear discrete-time dynamic systems using neural networks. Three network architectures, namely the multi-layer perceptron, the radial basis function network and the functional-link network, are presented and several learning or identification algorithms are derived. Advantages and disadvantages of these structures are discussed and illustrated using simulated and real data. Particular attention is given to the connections between existing techniques for nonlinear systems identification and some aspects of neural network methodology, and this demonstrates that certain techniques employed in the neural network context have long been developed by the control e...

DissertationDOI
01 Jan 1992
TL;DR: The Bayesian framework for model comparison and regularisation is demonstrated by studying interpolation and classification problems modelled with both linear and non-linear models, and it is shown that the careful incorporation of error bar information into a classifier's predictions yields improved performance.
Abstract: The Bayesian framework for model comparison and regularisation is demonstrated by studying interpolation and classification problems modelled with both linear and non-linear models. This framework quantitatively embodies 'Occam's razor'. Over-complex and under-regularised models are automatically inferred to be less probable, even though their flexibility allows them to fit the data better. When applied to 'neural networks', the Bayesian framework makes possible (1) objective comparison of solutions using alternative network architectures; (2) objective stopping rules for network pruning or growing procedures; (3) objective choice of type of weight decay terms (or regularisers); (4) on-line techniques for optimising weight decay (or regularisation constant) magnitude; (5) a measure of the effective number of well-determined parameters in a model; (6) quantified estimates of the error bars on network parameters and on network output. In the case of classification models, it is shown that the careful incorporation of error bar information into a classifier's predictions yields improved performance. Comparisons of the inferences of the Bayesian framework with more traditional cross-validation methods help detect poor underlying assumptions in learning models. The relationship of the Bayesian learning framework to 'active learning' is examined. Objective functions are discussed which measure the expected informativeness of candidate data measurements, in the context of both interpolation and classification problems. The concepts and methods described in this thesis are quite general and will be applicable to other data modelling problems whether they involve regression, classification or density estimation.

Proceedings ArticleDOI
06 Jun 1992
TL;DR: An overview of this body of literature is provided, drawing out common themes and, where possible, the emerging wisdom about what seems to work and what does not.
Abstract: Various schemes for combining genetic algorithms and neural networks have been proposed and tested in recent years, but the literature is scattered among a variety of journals, proceedings and technical reports. Activity in this area is clearly increasing. The authors provide an overview of this body of literature drawing out common themes and providing, where possible, the emerging wisdom about what seems to work and what does not.

Journal ArticleDOI
TL;DR: It is demonstrated that continuous-time recurrent neural networks are a viable mechanism for adaptive agent control and that the genetic algorithm can be used to evolve effective neural controllers.
Abstract: We would like the behavior of the artificial agents that we construct to be as well-adapted to their environments as natural animals are to theirs. Unfortunately, designing controllers with these properties is a very difficult task. In this article, we demonstrate that continuous-time recurrent neural networks are a viable mechanism for adaptive agent control and that the genetic algorithm can be used to evolve effective neural controllers. A significant advantage of this approach is that one need specify only a measure of an agent's overall performance rather than the precise motor output trajectories by which it is achieved. By manipulating the performance evaluation, one can place selective pressure on the development of controllers with desired properties. Several novel controllers have been evolved, including a chemotaxis controller that switches between different strategies depending on environmental conditions, and a locomotion controller that takes advantage of sensory feedback if available but th...

Journal ArticleDOI
TL;DR: A neural network model that processes input data consisting of financial ratios is developed to predict the financial health of thrift institutions and its ability to discriminate between healthy and failed institutions is compared to a traditional statistical model.
Abstract: A neural network model that processes input data consisting of financial ratios is developed to predict the financial health of thrift institutions. The network's ability to discriminate between healthy and failed institutions is compared to a traditional statistical model. The differences and similarities in the two modelling approaches are discussed. The neural network, which uses the same financial data, requires fewer assumptions, achieves a higher degree of prediction accuracy, and is more robust.

Proceedings ArticleDOI
04 May 1992
TL;DR: The authors introduce a neural network component for modeling a user's behavior within an intrusion detection system, and suggest a time-series approach to add broader scope to the model.
Abstract: An approach toward user behavior modeling that takes advantage of the properties of neural algorithms is described, and results obtained on preliminary testing of the approach are presented. The basis of the approach is the IDES (Intruder Detection Expert System), which has two components: an expert system looking for evidence of attacks on known vulnerabilities of the system, and a statistical model of the behavior of a user on the computer system under surveillance. This model learns the habits a user has when working with the computer, and raises warnings when the current behavior is not consistent with the previously learned patterns. The authors suggest the time series approach to add broader scope to the model. They therefore feel the need for alternative techniques and introduce a neural network component for modeling a user's behavior within the intrusion detection system.

Journal ArticleDOI
TL;DR: In this paper, an artificial neural network (ANN) method is applied to forecast the short-term load for a large power system, where the load has two distinct patterns: weekday and weekend-day patterns.
Abstract: An artificial neural network (ANN) method is applied to forecast the short-term load for a large power system. The load has two distinct patterns: weekday and weekend-day patterns. The weekend-day pattern includes Saturday, Sunday, and Monday loads. A nonlinear load model is proposed and several structures of an ANN for short-term load forecasting were tested. Inputs to the ANN are past loads and the output of the ANN is the load forecast for a given day. The network with one or two hidden layers was tested with various combinations of neurons, and results are compared in terms of forecasting error. The neural network, when grouped into different load patterns, gives a good load forecast.

Proceedings ArticleDOI
30 Aug 1992
TL;DR: It is shown that a large fraction of the parameters (the weights of neural networks) are of less importance and do not need to be measured with high accuracy and therefore the reported experiments seem to be more realistic from a classical point of view.
Abstract: In the field of neural network research a number of experiments described seem to be in contradiction with the classical pattern recognition or statistical estimation theory. The authors attempt to give some experimental understanding why this could be possible by showing that a large fraction of the parameters (the weights of neural networks) are of less importance and do not need to be measured with high accuracy. The remaining part is capable to implement the desired classifier and because this is only a small fraction of the total number of weights, the reported experiments seem to be more realistic from a classical point of view. >