
Showing papers in "Neural Computation in 1998"


Journal ArticleDOI
TL;DR: A new method for performing a nonlinear form of principal component analysis by the use of integral operator kernel functions is proposed and experimental results on polynomial feature extraction for pattern recognition are presented.
Abstract: A new method for performing a nonlinear form of principal component analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map—for instance, the space of all possible five-pixel products in 16 × 16 images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.

8,175 citations
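
A minimal NumPy sketch of the kernel-PCA recipe the abstract describes (kernel matrix, centering in feature space, eigendecomposition, projection). The polynomial degree and toy data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def kernel_pca(X, n_components=2, degree=5):
    """Project the training points onto the leading nonlinear principal components."""
    n = X.shape[0]
    K = (X @ X.T + 1.0) ** degree                       # polynomial kernel matrix
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one          # center the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)               # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas = eigvecs[:, idx] / np.sqrt(np.maximum(eigvals[idx], 1e-12))  # normalize coefficients
    return Kc @ alphas                                   # projections of the training points

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
print(kernel_pca(X, n_components=3).shape)              # (100, 3)
```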


Journal ArticleDOI
TL;DR: This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task and measures the power (ability to detect algorithm differences when they do exist) of these tests.
Abstract: This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (type I error). Two widely used statistical tests are shown to have high probability of type I error in certain situations and should never be used: a test for the difference of two proportions and a paired-differences t test based on taking several random train-test splits. A third test, a paired-differences t test based on 10-fold cross-validation, exhibits somewhat elevated probability of type I error. A fourth test, McNemar's test, is shown to have low type I error. The fifth test is a new test, 5 × 2 cv, based on five iterations of twofold cross-validation. Experiments show that this test also has acceptable type I error. The article also measures the power (ability to detect algorithm differences when they do exist)...

3,356 citations
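
A hedged sketch of the 5 × 2 cv procedure using scikit-learn utilities: five replications of twofold cross-validation, with the statistic t = p_1^(1) / sqrt((1/5) * sum_i s_i^2) compared against a t distribution with 5 degrees of freedom. The two classifiers and the synthetic data are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def five_by_two_cv_t(clf_a, clf_b, X, y, seed=0):
    variances, first_diff = [], None
    for i in range(5):                                   # five replications
        skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed + i)
        p = []
        for train, test in skf.split(X, y):              # two folds per replication
            err_a = 1.0 - clf_a.fit(X[train], y[train]).score(X[test], y[test])
            err_b = 1.0 - clf_b.fit(X[train], y[train]).score(X[test], y[test])
            p.append(err_a - err_b)
        if first_diff is None:
            first_diff = p[0]                            # p_1^(1) in the numerator
        mean_p = sum(p) / 2.0
        variances.append((p[0] - mean_p) ** 2 + (p[1] - mean_p) ** 2)
    t = first_diff / np.sqrt(np.mean(variances))         # sqrt((1/5) * sum_i s_i^2)
    return t, 2 * stats.t.sf(abs(t), df=5)               # two-sided p-value, 5 dof

X, y = make_classification(n_samples=600, random_state=0)
t, p_value = five_by_two_cv_t(LogisticRegression(max_iter=1000), DecisionTreeClassifier(), X, y)
print(f"t = {t:.3f}, p = {p_value:.3f}")
```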


Journal ArticleDOI
TL;DR: In this paper, the authors used information geometry to calculate the natural gradients in the parameter space of perceptrons, the space of matrices (for blind source separation), and the spaces of linear dynamical systems for blind source deconvolution, and proved that Fisher efficient online learning has asymptotically the same performance as the optimal batch estimation of parameters.
Abstract: When a parameter space has a certain underlying structure, the ordinary gradient of a function does not represent its steepest direction, but the natural gradient does. Information geometry is used for calculating the natural gradients in the parameter space of perceptrons, the space of matrices (for blind source separation), and the space of linear dynamical systems (for blind source deconvolution). The dynamical behavior of natural gradient online learning is analyzed and is proved to be Fisher efficient, implying that it has asymptotically the same performance as the optimal batch estimation of parameters. This suggests that the plateau phenomenon, which appears in the backpropagation learning algorithm of multilayer perceptrons, might disappear or might not be so serious when the natural gradient is used. An adaptive method of updating the learning rate is proposed and analyzed.

2,504 citations
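
A minimal sketch of a natural-gradient step, w ← w − η F⁻¹∇L, for a toy logistic-regression model; the Fisher matrix is estimated empirically from per-example gradients, and the damping term, learning rate, and data are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=200) > 0).astype(float)

w = np.zeros(3)
eta, damping = 0.5, 1e-3
for step in range(100):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    per_example_grads = (p - y)[:, None] * X            # gradient of -log likelihood per example
    grad = per_example_grads.mean(axis=0)
    fisher = per_example_grads.T @ per_example_grads / len(y) + damping * np.eye(3)
    w -= eta * np.linalg.solve(fisher, grad)             # natural-gradient step: F^{-1} grad
print(w)
```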


Journal ArticleDOI
TL;DR: A form of nonlinear latent variable model called the generative topographic mapping, for which the parameters of the model can be determined using the expectation-maximization algorithm, is introduced.
Abstract: Latent variable models represent the probability density of data in a space of several dimensions in terms of a smaller number of latent, or hidden, variables. A familiar example is factor analysis, which is based on a linear transformation between the latent space and the data space. In this paper we introduce a form of nonlinear latent variable model called the Generative Topographic Mapping (GTM), for which the parameters of the model can be determined using the EM algorithm. GTM provides a principled alternative to the widely used Self-Organizing Map (SOM) of Kohonen (1982), and overcomes most of the significant limitations of the SOM. We demonstrate the performance of the GTM algorithm on a toy problem and on simulated data from flow diagnostics for a multi-phase oil pipeline.

1,469 citations


Journal ArticleDOI
TL;DR: A unified phenomenological model is proposed that allows computation of the postsynaptic current generated by both types of synapses when driven by an arbitrary pattern of action potential activity in a presynaptic population and allows for derivation of mean-field equations, which govern the activity of large, interconnected networks.
Abstract: Transmission across neocortical synapses depends on the frequency of presynaptic activity (Thomson & Deuchars, 1994). Interpyramidal synapses in layer V exhibit fast depression of synaptic transmission, while other types of synapses exhibit facilitation of transmission. To study the role of dynamic synapses in network computation, we propose a unified phenomenological model that allows computation of the postsynaptic current generated by both types of synapses when driven by an arbitrary pattern of action potential (AP) activity in a presynaptic population. Using this formalism, we analyze different regimes of synaptic transmission and demonstrate that dynamic synapses transmit different aspects of the presynaptic activity depending on the average presynaptic frequency. The model also allows for derivation of mean-field equations, which govern the activity of large, interconnected networks. We show that the dynamics of synaptic transmission results in complex sets of regular and irregular regimes of network activity.

892 citations
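
A rough sketch of a phenomenological depressing/facilitating synapse of the kind the abstract describes, tracking a resource variable x and a utilization variable u at each presynaptic spike. The particular variant used here (u relaxing back to U between spikes) and all parameter values are assumptions, not the paper's.

```python
import numpy as np

def synaptic_response(spike_times, U=0.5, tau_rec=800.0, tau_facil=0.0, A=1.0):
    """Per-spike synaptic efficacy A*u*x for a train of presynaptic spike times (ms)."""
    x, u, last_t = 1.0, U, None
    amplitudes = []
    for t in spike_times:
        if last_t is not None:
            dt = t - last_t
            x = 1.0 - (1.0 - x) * np.exp(-dt / tau_rec)        # resources recover toward 1
            if tau_facil > 0.0:
                u = U + (u - U) * np.exp(-dt / tau_facil)      # utilization relaxes toward U
        if tau_facil > 0.0:
            u = u + U * (1.0 - u)                               # facilitation increment at the spike
        amplitudes.append(A * u * x)                            # postsynaptic efficacy of this spike
        x = x * (1.0 - u)                                       # resources consumed by this spike
        last_t = t
    return amplitudes

# Depressing synapse (tau_facil = 0) driven at 20 Hz: amplitudes decay toward a steady state.
print(np.round(synaptic_response(np.arange(0.0, 500.0, 50.0)), 3))
```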


Journal ArticleDOI
TL;DR: The chaotic nature of the balanced state of this network model is revealed by showing that the evolution of the microscopic state of the network is extremely sensitive to small deviations in its initial conditions.
Abstract: The nature and origin of the temporal irregularity in the electrical activity of cortical neurons in vivo are not well understood. We consider the hypothesis that this irregularity is due to a balance of excitatory and inhibitory currents into the cortical cells. We study a network model with excitatory and inhibitory populations of simple binary units. The internal feedback is mediated by relatively large synaptic strengths, so that the magnitude of the total excitatory and inhibitory feedback is much larger than the neuronal threshold. The connectivity is random and sparse. The mean number of connections per unit is large, though small compared to the total number of cells in the network. The network also receives a large, temporally regular input from external sources. We present an analytical solution of the mean-field theory of this model, which is exact in the limit of large network size. This theory reveals a new cooperative stationary state of large networks, which we term a balanced state. In thi...

759 citations
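
A rough simulation sketch of the kind of network the abstract describes: two populations of binary units, sparse random connectivity with K inputs per cell, synaptic strengths of order 1/sqrt(K), and a strong external drive of order sqrt(K). The specific sizes, couplings, and update scheme below are assumptions chosen only to illustrate the construction.

```python
import numpy as np

rng = np.random.default_rng(1)
NE = NI = 1000                    # excitatory and inhibitory population sizes (assumed)
K = 100                           # mean number of inputs per neuron
J = {"EE": 1.0, "EI": -2.0, "IE": 1.0, "II": -1.8}   # coupling strengths (assumed)
theta = 1.0                       # firing threshold
m0 = 0.3                          # external drive per unit, times sqrt(K)

def sparse_weights(n_post, n_pre, j):
    mask = rng.random((n_post, n_pre)) < K / n_pre     # sparse random connectivity
    return (j / np.sqrt(K)) * mask                     # strengths scale as 1/sqrt(K)

W_EE = sparse_weights(NE, NE, J["EE"]); W_EI = sparse_weights(NE, NI, J["EI"])
W_IE = sparse_weights(NI, NE, J["IE"]); W_II = sparse_weights(NI, NI, J["II"])

sE = (rng.random(NE) < 0.1).astype(float)
sI = (rng.random(NI) < 0.1).astype(float)
ext_E, ext_I = m0 * np.sqrt(K), 0.8 * m0 * np.sqrt(K)  # strong, temporally regular input

for step in range(200):                                 # asynchronous-style random updates
    iE = rng.integers(NE, size=NE // 10)
    sE[iE] = (W_EE[iE] @ sE + W_EI[iE] @ sI + ext_E > theta).astype(float)
    iI = rng.integers(NI, size=NI // 10)
    sI[iI] = (W_IE[iI] @ sE + W_II[iI] @ sI + ext_I > theta).astype(float)

print("mean activity  E:", sE.mean(), " I:", sI.mean())
```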


Journal ArticleDOI
TL;DR: A constructive, incremental learning system for regression problems that models data by means of spatially localized linear models that can allocate resources as needed while dealing with the bias-variance dilemma in a principled way is introduced.
Abstract: We introduce a constructive, incremental learning system for regression problems that models data by means of spatially localized linear models. In contrast to other approaches, the size and shape of the receptive field of each locally linear model, as well as the parameters of the locally linear model itself, are learned independently, that is, without the need for competition or any other kind of communication. Independent learning is accomplished by incrementally minimizing a weighted local cross-validation error. As a result, we obtain a learning system that can allocate resources as needed while dealing with the bias-variance dilemma in a principled way. The spatial localization of the linear models increases robustness toward negative interference. Our learning system can be interpreted as a nonparametric adaptive bandwidth smoother, as a mixture of experts where the experts are trained in isolation, and as a learning system that profits from combining independent expert knowledge on the same problem. This article illustrates the potential learning capabilities of purely local learning and offers an interesting and powerful approach to learning with receptive fields.

577 citations
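
Not the paper's incremental, locally cross-validated algorithm; just a static sketch of prediction with spatially localized linear models: each Gaussian receptive field fits a weighted linear model, and predictions are blended by the fields' activations. The centers, bandwidth, and data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(x) + 0.1 * rng.normal(size=x.size)

centers = np.linspace(0.0, 10.0, 15)      # receptive-field centers (fixed here, for illustration)
h = 0.8                                   # receptive-field bandwidth

def predict(xq):
    num = den = 0.0
    for c in centers:
        w = np.exp(-0.5 * ((x - c) / h) ** 2)               # training weights for this field
        sw = np.sqrt(w)
        A = np.vstack([np.ones_like(x), x - c]).T * sw[:, None]
        beta, *_ = np.linalg.lstsq(A, sw * y, rcond=None)   # weighted local linear fit
        yk = beta[0] + beta[1] * (xq - c)                    # this field's prediction at xq
        wq = np.exp(-0.5 * ((xq - c) / h) ** 2)              # field activation at the query point
        num += wq * yk
        den += wq
    return num / den

print(predict(3.0), np.sin(3.0))
```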


Journal ArticleDOI
TL;DR: If the data are noiseless, the modified version of basis pursuit denoising proposed in this article is equivalent to SVM in the following sense: if applied to the same data set, the two techniques give the same solution, which is obtained by solving the same quadratic programming problem.
Abstract: In the first part of this paper we show a similarity between the principle of Structural Risk Minimization (SRM) (Vapnik, 1982) and the idea of Sparse Approximation, as defined in Chen, Donoho, and Saunders (1995) and Olshausen and Field (1996). Then we focus on two specific (approximate) implementations of SRM and Sparse Approximation, which have been used to solve the problem of function approximation. For SRM we consider the Support Vector Machine technique proposed by V. Vapnik and his team at AT&T Bell Labs, and for Sparse Approximation we consider a modification of the Basis Pursuit De-Noising algorithm proposed by Chen, Donoho, and Saunders (1995). We show that, under certain conditions, these two techniques are equivalent: they give the same solution and they require the solution of the same quadratic programming problem.

538 citations


Journal ArticleDOI
Rahul Sarpeshkar
TL;DR: The results suggest that it is likely that the brain computes in a hybrid fashion and that an underappreciated and important reason for the efficiency of the human brain, which consumes only 12 W, is the hybrid and distributed nature of its architecture.
Abstract: We review the pros and cons of analog and digital computation. We propose that computation that is most efficient in its use of resources is neither analog computation nor digital computation but, rather, a mixture of the two forms. For maximum efficiency, the information and information-processing resources of the hybrid form must be distributed over many wires, with an optimal signal-to-noise ratio per wire. Our results suggest that it is likely that the brain computes in a hybrid fashion and that an underappreciated and important reason for the efficiency of the human brain, which consumes only 12 W, is the hybrid and distributed nature of its architecture.

495 citations


Journal ArticleDOI
TL;DR: It is suggested that the noise inherent in the operation of ion channels enables neurons to act as smart encoders and channel stochasticity should be considered in realistic models of neurons.
Abstract: The firing reliability and precision of an isopotential membrane patch consisting of a realistically large number of ion channels is investigated using a stochastic Hodgkin-Huxley (HH) model. In sharp contrast to the deterministic HH model, the biophysically inspired stochastic model reproduces qualitatively the different reliability and precision characteristics of spike firing in response to DC and fluctuating current input in neocortical neurons, as reported by Mainen & Sejnowski (1995). For DC inputs, spike timing is highly unreliable; the reliability and precision are significantly increased for fluctuating current input. This behavior is critically determined by the relatively small number of excitable channels that are opened near threshold for spike firing rather than by the total number of channels that exist in the membrane patch. Channel fluctuations, together with the inherent bistability in the HH equations, give rise to three additional experimentally observed phenomena: subthreshold oscillations in the membrane voltage for DC input, "spontaneous" spikes for subthreshold inputs, and "missing" spikes for suprathreshold inputs. We suggest that the noise inherent in the operation of ion channels enables neurons to act as "smart" encoders. Slowly varying, uncorrelated inputs are coded with low reliability and accuracy and, hence, the information about such inputs is encoded almost exclusively by the spike rate. On the other hand, correlated presynaptic activity produces sharp fluctuations in the input to the postsynaptic cell, which are then encoded with high reliability and accuracy. In this case, information about the input exists in the exact timing of the spikes. We conclude that channel stochasticity should be considered in realistic models of neurons.

460 citations


Journal ArticleDOI
TL;DR: In this article, the authors present a model of contour integration that uses only known V1 elements, operations, and connec-tional relations, but they do not discuss how to integrate contours in V1.
Abstract: Experimental observations suggest that contour integration may take place in V1. However, there has yet to be a model of contour integration that uses only known V1 elements, operations, and connec...

Journal ArticleDOI
TL;DR: It is shown that in the context of population coding, the mutual information between the activity of a large array of neurons and a stimulus to which the neurons are tuned is naturally related to the Fisher information.
Abstract: In the context of parameter estimation and model selection, it is only quite recently that a direct link between the Fisher information and information-theoretic quantities has been exhibited. We give an interpretation of this link within the standard framework of information theory. We show that in the context of population coding, the mutual information between the activity of a large array of neurons and a stimulus to which the neurons are tuned is naturally related to the Fisher information. In the light of this result, we consider the optimization of the tuning curve parameters in the case of neurons responding to a stimulus represented by an angular variable.
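
A small numerical companion to the result: for an array of independent Poisson neurons with tuning curves f_i(theta), the Fisher information is I_F(theta) = sum_i f_i'(theta)^2 / f_i(theta), and 1/sqrt(I_F) bounds the standard deviation of any unbiased estimator (Cramér-Rao). The tuning-curve shape and parameters below are assumptions.

```python
import numpy as np

N = 64                                    # number of neurons in the array
prefs = np.linspace(-np.pi, np.pi, N, endpoint=False)   # preferred angles
rmax, kappa = 20.0, 2.0                   # peak rate and tuning sharpness (assumed)

def rates(theta):
    return rmax * np.exp(kappa * (np.cos(theta - prefs) - 1.0))   # circular tuning curves

def fisher_info(theta, eps=1e-4):
    df = (rates(theta + eps) - rates(theta - eps)) / (2 * eps)    # numerical derivative
    return np.sum(df ** 2 / rates(theta))

theta0 = 0.3
print("Fisher information:", fisher_info(theta0))
print("Cramer-Rao bound on estimator std:", 1.0 / np.sqrt(fisher_info(theta0)))
```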

Journal ArticleDOI
TL;DR: It is suggested that the hippocampus plays two roles that allow rodents to solve the hidden-platform water maze: self-localization and route replay, and these two mechanisms can coexist and form a basis for memory consolidation.
Abstract: We suggest that the hippocampus plays two roles that allow rodents to solve the hidden-platform water maze: self-localization and route replay. When an animal explores an environment such as the water maze, the combination of place fields and correlational (Hebbian) long-term potentiation produces a weight matrix in the CA3 recurrent collaterals such that cells with overlapping place fields are more strongly interconnected than cells with nonoverlapping fields. When combined with global inhibition, this forms an attractor with coherent representations of position as stable states. When biased by local view information, this allows the animal to determine its position relative to the goal when it returns to the environment. We call this self-localization. When an animal traces specific routes within an environment, the weights in the CA3 recurrent collaterals become asymmetric. We show that this stores these routes in the recurrent collaterals. When primed with noise in the absence of sensory input, a coherent representation of position still forms in the CA3 population, but then that representation drifts, retracing a route. We show that these two mechanisms can coexist and form a basis for memory consolidation, explaining the anterograde and limited retrograde amnesia seen following hippocampal lesions.

Journal ArticleDOI
TL;DR: It is shown that negative feedback applied to highly nonlinear frequency-current (F-I) curves results in an effective linearization, provided the adaptation is slow compared to other processes and the unadapted F-I curve is highly nonlinear.
Abstract: We show that negative feedback to highly nonlinear frequency-current (F-I) curves results in an effective linearization. (By highly nonlinear we mean that the slope at threshold is infinite or very steep.) We then apply this to a specific model for spiking neurons and show that the details of the adaptation mechanism do not affect the results. The crucial points are that the adaptation is slow compared to other processes and the unadapted F-I curve is highly nonlinear.
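
A numerical illustration of the claim, assuming a square-root unadapted F-I curve (infinite slope at threshold) and a slow subtractive adaptation variable with gain g; the steady-state adapted rate solves f = F(I − g·f). The particular curve and parameter values are assumptions.

```python
import numpy as np

def F(I, I_th=1.0):
    return np.sqrt(np.maximum(I - I_th, 0.0))    # unadapted F-I curve: very steep at threshold

def adapted_rate(I, g=10.0, steps=5000, lam=0.01):
    a = 0.0
    for _ in range(steps):
        a += lam * (F(I - g * a) - a)            # slow relaxation of the adaptation variable
    return a

for I in np.linspace(1.0, 6.0, 6):
    print(f"I = {I:.1f}   unadapted rate = {F(I):.3f}   adapted rate = {adapted_rate(I):.3f}")
# The unadapted rates follow a square root; the adapted rates increase by nearly equal steps.
```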

Journal ArticleDOI
TL;DR: This work shows how a nonlinear recurrent network can be used to perform estimation in a near-optimal way while keeping the estimate in a coarse code format, and suggests that lateral connections in the cortex may be involved in cleaning up uncorrelated noise among neurons representing similar variables.
Abstract: Coarse codes are widely used throughout the brain to encode sensory and motor variables. Methods designed to interpret these codes, such as population vector analysis, are either inefficient (the variance of the estimate is much larger than the smallest possible variance) or biologically implausible, like maximum likelihood. Moreover, these methods attempt to compute a scalar or vector estimate of the encoded variable. Neurons are faced with a similar estimation problem. They must read out the responses of the presynaptic neurons, but, by contrast, they typically encode the variable with a further population code rather than as a scalar. We show how a nonlinear recurrent network can be used to perform estimation in a near-optimal way while keeping the estimate in a coarse code format. This work suggests that lateral connections in the cortex may be involved in cleaning up uncorrelated noise among neurons representing similar variables.

Journal ArticleDOI
TL;DR: The results suggest that the high CV values such as those observed in cortical spike trains are an intrinsic characteristic of type I membranes driven to firing by random inputs, in contrast to neural oscillators or neurons exhibiting type II excitability should produce regular spike trains.
Abstract: We propose a biophysical mechanism for the high interspike interval variability observed in cortical spike trains. The key lies in the nonlinear dynamics of cortical spike generation, which are consistent with type I membranes where saddle-node dynamics underlie excitability (Rinzel & Ermentrout, 1989). We present a canonical model for type I membranes, the θ-neuron. The θ-neuron is a phase model whose dynamics reflect salient features of type I membranes. This model generates spike trains with coefficient of variation (CV) above 0.6 when brought to firing by noisy inputs. This happens because the timing of spikes for a type I excitable cell is exquisitely sensitive to the amplitude of the suprathreshold stimulus pulses. A noisy input current, giving random amplitude “kicks” to the cell, evokes highly irregular firing across a wide range of firing rates; an intrinsically oscillating cell gives regular spike trains. We corroborate the results with simulations of the Morris-Lecar (M-L) neural model with random synaptic inputs: type I M-L yields high CVs. When this model is modified to have type II dynamics (periodicity arises via a Hopf bifurcation), however, it gives regular spike trains (CV below 0.3). Our results suggest that the high CV values such as those observed in cortical spike trains are an intrinsic characteristic of type I membranes driven to firing by “random” inputs. In contrast, neural oscillators or neurons exhibiting type II excitability should produce regular spike trains.
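
A sketch of the canonical type I phase model (often written dθ/dt = (1 − cos θ) + (1 + cos θ)·I, with a spike when θ crosses π) driven by noisy input in the excitable regime, reporting the interspike-interval CV. The drift, noise level, and time step are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 0.001, 1000.0
beta, sigma = -0.05, 1.0          # beta < 0: excitable (no deterministic firing); noise drives spikes

theta, t, spikes = -np.pi / 2, 0.0, []
while t < T:
    I = beta + sigma * rng.normal() / np.sqrt(dt)                     # noisy input current
    theta += dt * ((1.0 - np.cos(theta)) + (1.0 + np.cos(theta)) * I)  # Euler-Maruyama step
    if theta > np.pi:                                                  # spike: phase passes pi
        spikes.append(t)
        theta -= 2.0 * np.pi
    t += dt

isi = np.diff(spikes)
print(f"{len(spikes)} spikes, ISI CV = {isi.std() / isi.mean():.2f}")
```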

Journal ArticleDOI
TL;DR: It is shown that the decision surface can be written as the sum of two orthogonal terms, the first depending on only the margin vectors (which are SVs lying on the margin), the second proportional to the regularization parameter for almost all values of the parameter.
Abstract: Support Vector Machines (SVMs) perform pattern recognition between two point classes by finding a decision surface determined by certain points of the training set, termed Support Vectors (SV). This surface, which in some feature space of possibly infinite dimension can be regarded as a hyperplane, is obtained from the solution of a problem of quadratic programming that depends on a regularization parameter. In this paper we study some mathematical properties of support vectors and show that the decision surface can be written as the sum of two orthogonal terms, the first depending only on the margin vectors (which are SVs lying on the margin), the second proportional to the regularization parameter. For almost all values of the parameter, this enables us to predict how the decision surface varies for small parameter changes. In the special but important case of feature space of finite dimension m, we also show that there are at most m+1 margin vectors and observe that m+1 SVs are usually sufficient to fully determine the decision surface. For relatively small m this latter result leads to a consistent reduction of the SV number.
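
A hedged scikit-learn illustration of the terminology: after fitting a soft-margin SVM, the margin vectors are the support vectors lying exactly on the margin, i.e. those whose dual coefficients satisfy 0 < alpha_i < C. The data set and C value are assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)
C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

alphas = np.abs(clf.dual_coef_.ravel())          # |alpha_i| for each support vector
on_margin = alphas < C - 1e-8                    # margin vectors: strictly inside the box constraint
print("support vectors:", len(clf.support_))
print("margin vectors :", int(on_margin.sum()))
# For a linear kernel in R^2 (m = 2), the abstract's result bounds the number of
# margin vectors by m+1 = 3.
```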

Journal ArticleDOI
TL;DR: This work presents a new approximate learning algorithm for Boltzmann machines, based on mean-field theory and the linear response theorem, that is close to the optimal solutions and gives a significant improvement when correlations play a significant role.
Abstract: The learning process in Boltzmann machines is computationally very expensive. The computational complexity of the exact algorithm is exponential in the number of neurons. We present a new approximate learning algorithm for Boltzmann machines, based on mean-field theory and the linear response theorem. The computational complexity of the algorithm is cubic in the number of neurons. In the absence of hidden units, we show how the weights can be directly computed from the fixed-point equation of the learning rules. Thus, in this case we do not need to use a gradient descent procedure for the learning process. We show that the solutions of this method are close to the optimal solutions and give a significant improvement when correlations play a significant role. Finally, we apply the method to a pattern completion task and show good performance for networks up to 100 neurons.
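
A sketch of the no-hidden-units case the abstract mentions: for ±1 units, combining the mean-field equations with the linear response theorem yields the weights directly from the data statistics, w_ij = −(C⁻¹)_ij for i ≠ j and theta_i = atanh(m_i) − sum_j w_ij m_j, where m are the means and C the connected correlations. The random binary data below are an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
S = np.where(rng.random((5000, 8)) < 0.5, 1.0, -1.0)             # samples of 8 units in {-1, +1}
S[:, 1] = np.where(rng.random(5000) < 0.9, S[:, 0], -S[:, 0])    # correlate units 0 and 1

m = S.mean(axis=0)                            # means <s_i>
C = np.cov(S, rowvar=False)                   # connected correlations <s_i s_j> - m_i m_j
C_inv = np.linalg.inv(C)

W = -C_inv                                    # linear response: w_ij = -(C^{-1})_ij for i != j
np.fill_diagonal(W, 0.0)                      # no self-couplings
theta = np.arctanh(m) - W @ m                 # thresholds from the mean-field fixed point

print("largest coupling between units:", np.unravel_index(np.abs(W).argmax(), W.shape))
```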

Journal ArticleDOI
TL;DR: It is shown that very small time steps are required to reproduce correctly the synchronization properties of large networks of integrate-and-fire neurons when the differential system describing their dynamics is integrated with the standard Euler or second-order Runge-Kutta algorithms.
Abstract: It is shown that very small time steps are required to reproduce correctly the synchronization properties of large networks of integrate-and-fire neurons when the differential system describing their dynamics is integrated with the standard Euler or second-order Runge-Kutta algorithms. The reason for that behavior is analyzed, and a simple improvement of these algorithms is proposed.
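
A sketch of the kind of correction the abstract refers to, for a leaky integrate-and-fire neuron driven by constant suprathreshold current: rather than resetting at the end of the Euler step in which V crosses threshold, linearly interpolate the crossing time within the step and integrate from reset for the remaining fraction. Parameter values are illustrative assumptions.

```python
import numpy as np

tau, v_th, v_reset, v_rest, I = 20.0, 1.0, 0.0, 0.0, 1.2   # ms and normalized voltage units

def mean_isi(dt, T=2000.0, interpolate=True):
    v, t, spikes = v_rest, 0.0, []
    while t < T:
        v_new = v + dt * (-(v - v_rest) + I) / tau           # forward Euler step
        if v_new >= v_th:
            frac = (v_th - v) / (v_new - v)                  # fraction of the step to threshold
            if interpolate:
                spikes.append(t + frac * dt)
                # integrate from the reset value for the remainder of the step
                v = v_reset + (1.0 - frac) * dt * (-(v_reset - v_rest) + I) / tau
            else:
                spikes.append(t + dt)                        # naive scheme: spike at end of step
                v = v_reset
        else:
            v = v_new
        t += dt
    return np.diff(spikes).mean()

exact = tau * np.log(I / (I - v_th))                         # exact ISI for constant input
for dt in (1.0, 0.1, 0.01):
    print(f"dt={dt:5.2f}  naive ISI={mean_isi(dt, interpolate=False):.3f}  "
          f"interpolated ISI={mean_isi(dt, interpolate=True):.3f}  exact={exact:.3f}")
```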

Journal ArticleDOI
TL;DR: For neural networks with a wide class of weight priors, it can be shown that in the limit of an infinite number of hidden units, the prior over functions tends to a gaussian process.
Abstract: For neural networks with a wide class of weight priors, it can be shown that in the limit of an infinite number of hidden units, the prior over functions tends to a gaussian process. In this article, analytic forms are derived for the covariance function of the gaussian processes corresponding to networks with sigmoidal and gaussian hidden units. This allows predictions to be made efficiently using networks with an infinite number of hidden units and shows, somewhat paradoxically, that it may be easier to carry out Bayesian prediction with infinite networks rather than finite ones.
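
A GP-regression sketch using the covariance function that corresponds to an infinite network of erf (sigmoidal) hidden units, k(x, x') = (2/pi) * arcsin( 2 u·S·u' / sqrt((1 + 2 u·S·u)(1 + 2 u'·S·u')) ) with u = (1, x). The prior scales in S, the noise level, and the toy data are assumptions.

```python
import numpy as np

def erf_kernel(X1, X2, sigma0=2.0, sigma_w=5.0):
    U1 = np.hstack([np.ones((len(X1), 1)), X1])          # augment inputs with a bias component
    U2 = np.hstack([np.ones((len(X2), 1)), X2])
    S = np.diag([sigma0**2] + [sigma_w**2] * X1.shape[1])  # prior covariance of input weights
    num = 2.0 * U1 @ S @ U2.T
    d1 = 1.0 + 2.0 * np.einsum("ij,jk,ik->i", U1, S, U1)
    d2 = 1.0 + 2.0 * np.einsum("ij,jk,ik->i", U2, S, U2)
    return (2.0 / np.pi) * np.arcsin(num / np.sqrt(np.outer(d1, d2)))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (30, 1))
y = np.tanh(X[:, 0]) + 0.05 * rng.normal(size=30)
Xs = np.linspace(-4, 4, 5).reshape(-1, 1)

noise = 0.05 ** 2
K = erf_kernel(X, X) + noise * np.eye(len(X))
Ks = erf_kernel(Xs, X)
mean = Ks @ np.linalg.solve(K, y)                        # GP posterior mean at the test inputs
print(np.round(mean, 3))
```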

Journal ArticleDOI
TL;DR: It is shown that memory performance is maximized if synapses are first overgrown and then pruned following optimal minimal-value deletion, which leads to interesting insights concerning childhood amnesia.
Abstract: Research with humans and primates shows that the developmental course of the brain involves synaptic overgrowth followed by marked selective pruning. Previous explanations have suggested that this ...

Journal ArticleDOI
TL;DR: A model based on the hypothesis that alternation can be generated by competition between top-down cortical explanations for the inputs, rather than by direct competition between the inputs is built.
Abstract: Binocular rivalry is the alternating percept that can result when the two eyes see different scenes. Recent psychophysical evidence supports the notion that some aspects of binocular rivalry bear functional similarities to other bistable percepts. We build a model based on the hypothesis (Logothetis & Schall, 1989; Leopold & Logothetis, 1996; Logothetis, Leopold, & Sheinberg, 1996) that alternation can be generated by competition between top-down cortical explanations for the inputs, rather than by direct competition between the inputs. Recent neurophysiological evidence shows that some binocular neurons are modulated with the changing percept; others are not, even if they are selective between the stimuli presented to the eyes. We extend our model to a hierarchy to address these effects.

Journal ArticleDOI
TL;DR: Focusing on the cortical left-right symmetry, this work derives a bimodal description of the brain activity that is connected to behavioral dynamics and makes predictions of global features of brain dynamics during coordination tasks and test these against experimental magnetoencephalogram results.
Abstract: For the paradigmatic case of bimanual coordination, we review levels of organization of behavioral dynamics and present a description in terms of modes of behavior. We briefly review a recently developed model of spatiotemporal brain activity that is based on short- and long-range connectivity of neural ensembles. This model is specified for the case of motor and sensorimotor units embedded in the neural sheet. Focusing on the cortical left-right symmetry, we derive a bimodal description of the brain activity that is connected to behavioral dynamics. We make predictions of global features of brain dynamics during coordination tasks and test these against experimental magnetoencephalogram (MEG) results. A key feature of our approach is that phenomenological laws at the behavioral level can be connected to a field-theoretical description of cortical dynamics.

Journal ArticleDOI
TL;DR: A very efficient Markov chain Monte Carlo scheme is suggested for inference and prediction with fixed-architecture feedforward neural networks and extended to the variable architecture case, providing a data-driven procedure to identify sensible architectures.
Abstract: Stemming from work by Buntine and Weigend (1991) and MacKay (1992), there is a growing interest in Bayesian analysis of neural network models. Although conceptually simple, this problem is computationally involved. We suggest a very efficient Markov chain Monte Carlo scheme for inference and prediction with fixed-architecture feedforward neural networks. The scheme is then extended to the variable architecture case, providing a data-driven procedure to identify sensible architectures.
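
Not the paper's scheme (which is designed to be far more efficient and to handle variable architectures); just a generic random-walk Metropolis sketch over the weights of a fixed one-hidden-layer network, to show the flavor of Bayesian prediction by averaging over posterior weight samples. The prior and noise scales, step size, and data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 40).reshape(-1, 1)
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

H = 5                                        # hidden units (fixed architecture)
n_params = 3 * H + 1                         # input->hidden (w, b) and hidden->output (v, c)

def forward(theta, X):
    w, b = theta[:H], theta[H:2 * H]                     # input-to-hidden weights and biases
    v, c = theta[2 * H:3 * H], theta[3 * H]              # hidden-to-output weights and bias
    return np.tanh(X @ w[None, :] + b) @ v + c

def log_post(theta, noise=0.1, prior_sd=3.0):
    resid = y - forward(theta, X)
    return -0.5 * np.sum(resid**2) / noise**2 - 0.5 * np.sum(theta**2) / prior_sd**2

theta = rng.normal(size=n_params)
lp = log_post(theta)
samples = []
for it in range(20000):
    prop = theta + 0.05 * rng.normal(size=n_params)      # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:              # Metropolis accept/reject
        theta, lp = prop, lp_prop
    if it > 10000 and it % 50 == 0:                      # keep thinned samples after burn-in
        samples.append(theta.copy())

preds = np.mean([forward(s, X) for s in samples], axis=0)   # posterior-mean prediction
print("RMSE of posterior-mean fit:", np.sqrt(np.mean((preds - y) ** 2)))
```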

Journal ArticleDOI
TL;DR: A novel family of unsupervised learning algorithms for blind separation of mixed and convolved sources is derived based on formulating the separation problem as a learning task of a spatiotemporal generative model, whose parameters are adapted iteratively to minimize suitable error functions, thus ensuring stability of the algorithms.
Abstract: We derive a novel family of unsupervised learning algorithms for blind separation of mixed and convolved sources. Our approach is based on formulating the separation problem as a learning task of a spatiotemporal generative model, whose parameters are adapted iteratively to minimize suitable error functions, thus ensuring stability of the algorithms. The resulting learning rules achieve separation by exploiting high-order spatiotemporal statistics of the mixture data. Different rules are obtained by learning generative models in the frequency and time domains, whereas a hybrid frequency-time model leads to the best performance. These algorithms generalize independent component analysis to the case of convolutive mixtures and exhibit superior performance on instantaneous mixtures. An extension of the relative-gradient concept to the spatiotemporal case leads to fast and efficient learning rules with equivariant properties. Our approach can incorporate information about the mixing situation when available, ...

Journal ArticleDOI
TL;DR: It is demonstrated that the spike frequency adaptation seen in many pyramidal cells plays a subtle but important role in the dynamics of cortical networks.
Abstract: Oscillations in many regions of the cortex have common temporal characteristics with dominant frequencies centered around the 40 Hz (gamma) frequency range and the 5–10 Hz (theta) frequency range. Experimental results also reveal spatially synchronous oscillations, which are stimulus dependent (Gray, Konig, Engel, & Singer, 1989; Engel, Konig, Kreiter, Schillen, & Singer, 1992). This rhythmic activity suggests that the coherence of neural populations is a crucial feature of cortical dynamics (Gray, 1994). Using both simulations and a theoretical coupled oscillator approach, we demonstrate that the spike frequency adaptation seen in many pyramidal cells plays a subtle but important role in the dynamics of cortical networks. Without adaptation, excitatory connections among model pyramidal cells are desynchronizing. However, the slow processes associated with adaptation encourage stable synchronous behavior.

Journal ArticleDOI
TL;DR: The gaussian mixture model of Pearson is employed in deriving a closed-form generic score function for strictly subgaussian sources to provide a computationally simple yet powerful algorithm for performing independent component analysis on arbitrary mixtures of nongaussian sources.
Abstract: This article develops an extended independent component analysis algorithm for mixtures of arbitrary subgaussian and supergaussian sources. The gaussian mixture model of Pearson is employed in deriving a closed-form generic score function for strictly subgaussian sources. This is combined with the score function for a unimodal supergaussian density to provide a computationally simple yet powerful algorithm for performing independent component analysis on arbitrary mixtures of nongaussian sources.
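
Not the paper's Pearson-mixture score function; a closely related "extended infomax"-style natural-gradient ICA sketch that switches the nonlinearity per source according to an estimated kurtosis sign (+1 supergaussian, −1 subgaussian). The sources, mixing matrix, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
S = np.vstack([rng.laplace(size=n),                                 # supergaussian source
               rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)])       # subgaussian source
A = np.array([[1.0, 0.6], [0.4, 1.0]])                              # mixing matrix (assumed)
X = A @ S

W = np.eye(2)
eta, batch = 0.01, 200
for epoch in range(50):
    for i in range(0, n, batch):
        u = W @ X[:, i:i + batch]
        # sign of the (excess) kurtosis decides the nonlinearity for each output
        k = np.sign(np.mean(u**4, axis=1) - 3 * np.mean(u**2, axis=1) ** 2)
        grad = (np.eye(2)
                - (k[:, None] * np.tanh(u)) @ u.T / batch
                - u @ u.T / batch)
        W += eta * grad @ W                                          # natural-gradient update
print("W @ A (should be close to a scaled permutation):")
print(np.round(W @ A, 2))
```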

Journal ArticleDOI
TL;DR: How coincidence detection depends on the shape of the postsynaptic response function, the number of synapses, and the input statistics is shown, and it is demonstrated that there is an optimal threshold.
Abstract: How does a neuron vary its mean output firing rate if the input changes from random to oscillatory coherent but noisy activity? What are the critical parameters of the neuronal dynamics and input statistics? To answer these questions, we investigate the coincidence-detection properties of an integrate-and-fire neuron. We derive an expression indicating how coincidence detection depends on neuronal parameters. Specifically, we show how coincidence detection depends on the shape of the postsynaptic response function, the number of synapses, and the input statistics, and we demonstrate that there is an optimal threshold. Our considerations can be used to predict from neuronal parameters whether and to what extent a neuron can act as a coincidence detector and thus can convert a temporal code into a rate code.

Journal ArticleDOI
TL;DR: This review illustrates, with a large number of modeling studies, the specific computations performed by neuromodulation in the context of various neural models of invertebrate and vertebrate preparations.
Abstract: Computational modeling of neural substrates provides an excellent theoretical framework for the understanding of the computational roles of neuromodulation. In this review, we illustrate, with a large number of modeling studies, the specific computations performed by neuromodulation in the context of various neural models of invertebrate and vertebrate preparations. We base our characterization of neuromodulations on their computational and functional roles rather than on anatomical or chemical criteria. We review the main framework in which neuromodulation has been studied theoretically (central pattern generation and oscillations, sensory processing, memory and information integration). Finally, we present a detailed mathematical overview of how neuromodulation has been implemented at the single cell and network levels in modeling studies. Overall, neuromodulation is found to increase and control computational complexity.

Journal ArticleDOI
TL;DR: A simple, unsupervised neural network algorithm that uses only the co-occurring patterns of lip motion and sound signals from a human speaker to learn separate visual and auditory speech classifiers that perform comparably to supervised networks.
Abstract: Humans and other animals learn to form complex categories without receiving a target output, or teaching signal, with each input pattern. In contrast, most computer algorithms that emulate such performance assume the brain is provided with the correct output at the neuronal level or require grossly unphysiological methods of information propagation. Natural environments do not contain explicit labeling signals, but they do contain important information in the form of temporal correlations between sensations to different sensory modalities, and humans are affected by this correlational structure (Howells, 1944; McGurk & MacDonald, 1976; MacDonald & McGurk, 1978; Zellner & Kautz, 1990; Durgin & Proffitt, 1996). In this article we describe a simple, unsupervised neural network algorithm that also uses this natural structure. Using only the co-occurring patterns of lip motion and sound signals from a human speaker, the network learns separate visual and auditory speech classifiers that perform comparably to supervised networks.