
Showing papers in "Neural Computation in 1994"


Journal ArticleDOI
TL;DR: An Expectation-Maximization (EM) algorithm for adjusting the parameters of the tree-structured architecture for supervised learning and an on-line learning algorithm in which the parameters are updated incrementally.
Abstract: We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
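
The EM algorithm alternates an E-step, in which each expert's posterior responsibility for every training case is computed, with an M-step, in which the expert and gating GLIMs are refit under those responsibilities. Below is a minimal sketch for a one-level (non-hierarchical) mixture of two linear-Gaussian experts with a softmax gate; the toy data, the simple gradient refit of the gate, and all names are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy piecewise-linear data: two regimes, so two experts should suffice.
X = rng.uniform(-1, 1, size=(400, 1))
y = np.where(X[:, 0] < 0, 2.0 * X[:, 0] + 1.0, -3.0 * X[:, 0] + 1.0) + 0.05 * rng.standard_normal(400)

K = 2                                          # number of experts
Phi = np.hstack([X, np.ones((len(X), 1))])     # inputs with a bias term
V = rng.standard_normal((K, 2)) * 0.1          # gating parameters (softmax over experts)
W = rng.standard_normal((K, 2)) * 0.1          # expert regression weights
sigma2 = np.full(K, 1.0)                       # expert noise variances

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for it in range(50):
    # E-step: posterior responsibility h[n, k] of expert k for data point n.
    g = softmax(Phi @ V.T)                     # gating probabilities
    mu = Phi @ W.T                             # expert predictions
    lik = np.exp(-(y[:, None] - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    h = g * lik
    h /= h.sum(axis=1, keepdims=True)

    # M-step: weighted least squares per expert, gradient refit of the gate.
    for k in range(K):
        R = np.diag(h[:, k])
        W[k] = np.linalg.solve(Phi.T @ R @ Phi + 1e-6 * np.eye(2), Phi.T @ (h[:, k] * y))
        resid = y - Phi @ W[k]
        sigma2[k] = (h[:, k] * resid ** 2).sum() / h[:, k].sum()
    for _ in range(20):                        # a few gradient steps stand in for the IRLS fit
        g = softmax(Phi @ V.T)
        V += 0.5 / len(X) * (h - g).T @ Phi

print("expert weights:\n", W)
```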

2,418 citations


Journal ArticleDOI
TL;DR: It is proposed that compact coding schemes are insufficient to account for the receptive field properties of cells in the mammalian visual pathway and suggested that natural scenes, to a first approximation, can be considered as a sum of self-similar local functions (the inverse of a wavelet).
Abstract: A number of recent attempts have been made to describe early sensory coding in terms of a general information processing strategy. In this paper, two strategies are contrasted. Both strategies take advantage of the redundancy in the environment to produce more effective representations. The first is described as a "compact" coding scheme. A compact code performs a transform that allows the input to be represented with a reduced number of vectors (cells) with minimal RMS error. This approach has recently become popular in the neural network literature and is related to a process called Principal Components Analysis (PCA). A number of recent papers have suggested that the optimal compact code for representing natural scenes will have units with receptive field profiles much like those found in the retina and primary visual cortex. However, in this paper, it is proposed that compact coding schemes are insufficient to account for the receptive field properties of cells in the mammalian visual pathway. In contrast, it is proposed that the visual system is near to optimal in representing natural scenes only if optimality is defined in terms of "sparse distributed" coding. In a sparse distributed code, all cells in the code have an equal response probability across the class of images but have a low response probability for any single image. In such a code, the dimensionality is not reduced. Rather, the redundancy of the input is transformed into the redundancy of the firing pattern of cells. It is proposed that the signature for a sparse code is found in the fourth moment of the response distribution (i.e., the kurtosis). In measurements with 55 calibrated natural scenes, the kurtosis was found to peak when the bandwidths of the visual code matched those of cells in the mammalian visual cortex. Codes resembling "wavelet transforms" are proposed to be effective because the response histograms of such codes are sparse (i.e., show high kurtosis) when presented with natural scenes. It is proposed that the structure of the image that allows sparse coding is found in the phase spectrum of the image. It is suggested that natural scenes, to a first approximation, can be considered as a sum of self-similar local functions (the inverse of a wavelet). Possible reasons for why sensory systems would evolve toward sparse coding are presented.
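
The paper's proposed signature of a sparse code is a high fourth moment (kurtosis) of the response distribution. The snippet below just computes that statistic; the Gaussian-versus-Laplacian comparison is an illustrative stand-in for the responses of visual-code filters to calibrated natural scenes, not data from the paper.

```python
import numpy as np

def kurtosis(r):
    # Sample kurtosis: fourth moment about the mean divided by the squared
    # variance. A Gaussian gives ~3; heavy-tailed (sparse) response
    # distributions give larger values.
    r = np.asarray(r, dtype=float)
    m = r.mean()
    return ((r - m) ** 4).mean() / (((r - m) ** 2).mean() ** 2)

rng = np.random.default_rng(1)
dense_like = rng.standard_normal(100_000)    # Gaussian-distributed responses: not sparse
sparse_like = rng.laplace(size=100_000)      # heavy-tailed responses: sparse

print("kurtosis (dense-like):  %.2f" % kurtosis(dense_like))   # ~3
print("kurtosis (sparse-like): %.2f" % kurtosis(sparse_like))  # ~6
```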

990 citations


Journal ArticleDOI
Gerald Tesauro
TL;DR: The latest version of TD-Gammon is now estimated to play at a strong master level that is extremely close to the world's best human players.
Abstract: TD-Gammon is a neural network that is able to teach itself to play backgammon solely by playing against itself and learning from the results, based on the TD(λ) reinforcement learning algorithm (Sutton 1988). Despite starting from random initial weights (and hence random initial strategy), TD-Gammon achieves a surprisingly strong level of play. With zero knowledge built in at the start of learning (i.e., given only a "raw" description of the board state), the network learns to play at a strong intermediate level. Furthermore, when a set of hand-crafted features is added to the network's input representation, the result is a truly staggering level of performance: the latest version of TD-Gammon is now estimated to play at a strong master level that is extremely close to the world's best human players.
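
TD-Gammon's weight updates follow the TD(λ) rule of Sutton (1988): prediction errors between successive positions are backed up through eligibility traces. A generic tabular sketch of the update is shown below; the random-walk environment and step sizes are illustrative choices, not details of TD-Gammon itself.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, alpha, lam, gamma = 7, 0.1, 0.8, 1.0
V = np.zeros(n_states)          # value estimates; states 0 and 6 are terminal

for episode in range(2000):
    e = np.zeros(n_states)      # eligibility traces
    s = 3                       # start in the middle of a simple random walk
    while s not in (0, n_states - 1):
        s_next = s + rng.choice([-1, 1])
        reward = 1.0 if s_next == n_states - 1 else 0.0
        # TD error between successive predictions, as in TD(lambda).
        delta = reward + gamma * V[s_next] - V[s]
        e[s] += 1.0             # accumulate the trace for the visited state
        V += alpha * delta * e  # every state is updated through its trace
        e *= gamma * lam
        s = s_next

print(np.round(V[1:-1], 2))     # should approach [1/6, 2/6, ..., 5/6]
```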

852 citations


Journal ArticleDOI
TL;DR: This work derives a technique that directly calculates Hv, where v is an arbitrary vector, and shows that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating any need to calculate the full Hessian.
Abstract: Just storing the Hessian H (the matrix of second derivatives ∂²E/∂wi∂wj of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. To calculate Hv, we first define a differential operator Rv{f(w)} = (∂/∂r) f(w + rv)|r=0, note that Rv{∇w} = Hv and Rv{w} = v, and then apply Rv{·} to the equations used to compute ∇w. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to a one-pass gradient calculation algorithm (backpropagation), a relaxation gradient calculation algorithm (recurrent backpropagation), and two stochastic gradient calculation algorithms (Boltzmann machines and weight perturbation). Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating any need to calculate the full Hessian.
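
Because Rv{∇wE} = Hv, a Hessian-vector product costs roughly one extra gradient evaluation. The sketch below conveys the idea with a central finite difference of the gradient on a toy quadratic error surface where the exact Hessian is known for checking; the paper's R{·} technique computes the same quantity exactly and stably rather than by differencing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy error surface E(w) = 0.5 w^T A w - b^T w, chosen so that the exact
# Hessian (A) is known and the approximation below can be verified.
n = 5
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)
b = rng.standard_normal(n)

def grad_E(w):
    return A @ w - b

w = rng.standard_normal(n)
v = rng.standard_normal(n)

# Finite-difference version of the R-operator idea:
#   Hv = d/dr grad_E(w + r v) at r = 0  ~  [grad_E(w + r v) - grad_E(w - r v)] / (2 r)
r = 1e-5
Hv_approx = (grad_E(w + r * v) - grad_E(w - r * v)) / (2 * r)
Hv_exact = A @ v

print(np.max(np.abs(Hv_approx - Hv_exact)))   # tiny: two gradient calls suffice
```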

700 citations


Journal ArticleDOI
TL;DR: This paper shows that reasonable biophysical assumptions about synaptic transmission allow the equations for a simple kinetic synapse model to be solved analytically, yielding a mechanism that preserves the advantages of kinetic models while being as fast to compute as a single α-function.
Abstract: Synaptic conductance time courses are commonly modeled by the α-function gsyn(t) = [(t − t0)/τ] exp[−(t − t0)/τ], where gsyn is the synaptic conductance and t0 is the time of transmitter release. This function peaks at a value of 1/e at t = t0 + τ, and decays exponentially with a time constant of τ. When multiple events occur in succession at a single synapse, the total conductance at any time is a sum of such waveforms calculated over the individual event times. There are several drawbacks to this method. First, the relationship to actual synaptic conductances is based only on an approximate correspondence of the time course of the waveform to physiological recordings of the postsynaptic response, rather than on plausible biophysical mechanisms. Second, summation of multiple waveforms can be cumbersome, since each event time must be stored in a queue for the duration of the waveform and necessitates calculation of an additional exponential during this period (but see Srinivasan and Chiel 1993). Third, there is no natural provision for saturation of the conductance. An alternative to the use of stereotyped waveforms is to compute synaptic conductances directly using a kinetic model (Perkel et al. 1981). This approach allows a more realistic biophysical representation and is consistent with the formalism used to describe the conductances of other ion channels. However, solution of the associated differential equations generally requires computationally expensive numerical integration. In this paper we show that reasonable biophysical assumptions about synaptic transmission allow the equations for a simple kinetic synapse model to be solved analytically. This yields a mechanism that preserves the advantages of kinetic models while being as fast to compute as a single α-function. Moreover, this mechanism accounts implicitly for saturation of the conductance.
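
For concreteness, here is a sketch of the kind of closed-form update such a kinetic scheme permits, assuming a two-state (closed/open) receptor model driven by a square transmitter pulse; the rate constants and pulse parameters below are placeholders, not values from the paper.

```python
import numpy as np

# Two-state receptor scheme: closed + transmitter <-> open, with rates alpha, beta.
# If transmitter [T] is a square pulse, dr/dt = alpha*[T]*(1 - r) - beta*r has a
# closed-form solution, so r can be advanced per event with no numerical integration.
alpha, beta = 1.1, 0.19      # placeholder binding/unbinding rates (per ms, assumed)
T_max, pulse = 1.0, 1.0      # transmitter amplitude and pulse duration (ms)

def advance(r0, dt, transmitter_on):
    if transmitter_on:
        r_inf = alpha * T_max / (alpha * T_max + beta)
        tau_r = 1.0 / (alpha * T_max + beta)
        return r_inf + (r0 - r_inf) * np.exp(-dt / tau_r)
    return r0 * np.exp(-beta * dt)   # free decay between release events

# Fraction of open receptors across three presynaptic spikes 10 ms apart:
r, t, trace = 0.0, 0.0, []
for spike_time in (5.0, 15.0, 25.0):
    r = advance(r, spike_time - t, transmitter_on=False)
    r = advance(r, pulse, transmitter_on=True)      # saturation is implicit: r <= r_inf
    t = spike_time + pulse
    trace.append((t, round(r, 3)))

print(trace)
```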

491 citations


Journal ArticleDOI
TL;DR: These results may be used to understand constraints both over output cells and over input cells, and a variety of rules that can implement constrained dynamics are discussed.
Abstract: Models of unsupervised, correlation-based (Hebbian) synaptic plasticity are typically unstable: either all synapses grow until each reaches the maximum allowed strength, or all synapses decay to zero strength. A common method of avoiding these outcomes is to use a constraint that conserves or limits the total synaptic strength over a cell. We study the dynamic effects of such constraints. Two methods of enforcing a constraint are distinguished, multiplicative and subtractive. For otherwise linear learning rules, multiplicative enforcement of a constraint results in dynamics that converge to the principal eigenvector of the operator determining unconstrained synaptic development. Subtractive enforcement, in contrast, typically leads to a final state in which almost all synaptic strengths reach either the maximum or minimum allowed value. This final state is often dominated by weight configurations other than the principal eigenvector of the unconstrained operator. Multiplicative enforcement yields a "graded" receptive field in which most mutually correlated inputs are represented, whereas subtractive enforcement yields a receptive field that is "sharpened" to a subset of maximally correlated inputs. If two equivalent input populations (e.g., two eyes) innervate a common target, multiplicative enforcement prevents their segregation (ocular dominance segregation) when the two populations are weakly correlated; whereas subtractive enforcement allows segregation under these circumstances. These results may be used to understand constraints both over output cells and over input cells. A variety of rules that can implement constrained dynamics are discussed.
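
The two enforcement schemes are easy to contrast in simulation: after each Hebbian step the weight vector is either rescaled (multiplicative) or shifted by a uniform amount (subtractive) so that the total synaptic strength stays constant, with weights clipped to [0, wmax]. The correlation matrix, learning rate, and other settings below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated input ensemble: nearby inputs are positively correlated.
n = 20
C = np.array([[np.exp(-abs(i - j) / 4.0) for j in range(n)] for i in range(n)])

def run(enforcement, steps=4000, lr=0.01, w_max=1.0):
    w = rng.uniform(0.4, 0.6, n)
    total = w.sum()                      # conserved total synaptic strength
    for _ in range(steps):
        w += lr * C @ w                  # linear Hebbian growth, dw/dt = C w
        if enforcement == "multiplicative":
            w *= total / w.sum()         # rescale all weights by the same factor
        else:
            w -= (w.sum() - total) / n   # subtract the same amount from every weight
        w = np.clip(w, 0.0, w_max)
    return w

w_mult = run("multiplicative")
w_sub = run("subtractive")
print("multiplicative (graded):   ", np.round(w_mult, 2))
print("subtractive (saturated):   ", np.round(w_sub, 2))
```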

476 citations


Journal ArticleDOI
TL;DR: A method for measuring the capacity of learning machines is described, based on fitting a theoretically derived function to empirical measurements of the maximal difference between the error rates on two separate data sets of varying sizes.
Abstract: A method for measuring the capacity of learning machines is described. The method is based on fitting a theoretically derived function to empirical measurements of the maximal difference between the error rates on two separate data sets of varying sizes. Experimental measurements of the capacity of various types of linear classifiers are presented.

378 citations


Journal ArticleDOI
Harris Drucker, Corinna Cortes, Lawrence D. Jackel, Yann LeCun, Vladimir Vapnik
TL;DR: A surprising result is shown for the original boosting algorithm: namely, that as the training set size increases, the training error decreases until it asymptotes to the test error rate.
Abstract: We compare the performance of three types of neural network-based ensemble techniques to that of a single neural network. The ensemble algorithms are two versions of boosting and committees of neural networks trained independently. For each of the four algorithms, we experimentally determine the test and training error curves in an optical character recognition (OCR) problem as both a function of training set size and computational cost using three architectures. We show that a single machine is best for small training set size while for large training set size some version of boosting is best. However, for a given computational cost, boosting is always best. Furthermore, we show a surprising result for the original boosting algorithm: namely, that as the training set size increases, the training error decreases until it asymptotes to the test error rate. This has potential implications in the search for better training algorithms.

360 citations


Journal ArticleDOI
TL;DR: This review considers the input-output behavior of neurons with dendritic trees, with an emphasis on questions of information processing.
Abstract: This review considers the input-output behavior of neurons with dendritic trees, with an emphasis on questions of information processing. The parts of this review are (1) a brief history of ideas about dendritic trees, (2) a review of the complex electrophysiology of dendritic neurons, (3) an overview of conceptual tools used in dendritic modeling studies, including the cable equation and compartmental modeling techniques, and (4) a review of modeling studies that have addressed various issues relevant to dendritic information processing.

307 citations


Journal ArticleDOI
TL;DR: It is shown that a network with synapses that have two stable states can dynamically learn with optimal storage efficiency, be a palimpsest, and maintain its (associative) memory for an indefinitely long time provided the coding level is low and depression is equilibrated against potentiation.
Abstract: We discuss the long term maintenance of acquired memory in synaptic connections of a perpetually learning electronic device. This is effected by ascribing to each synapse a finite number of stable states in which it can remain for indefinitely long periods. Learning uncorrelated stimuli is expressed as a stochastic process produced by the neural activities on the synapses. In several interesting cases the stochastic process can be analyzed in detail, leading to a clarification of the performance of the network, as an associative memory, during the process of uninterrupted learning. The stochastic nature of the process and the existence of an asymptotic distribution for the synaptic values in the network imply generically that the memory is a palimpsest but capacity is as low as log N for a network of N neurons. The only way we find for avoiding this tight constraint is to allow the parameters governing the learning process (the coding level of the stimuli, the transition probabilities for potentiation and depression, and the number of stable synaptic levels) to depend on the number of neurons. It is shown that a network with synapses that have two stable states can dynamically learn with optimal storage efficiency, be a palimpsest, and maintain its (associative) memory for an indefinitely long time provided the coding level is low and depression is equilibrated against potentiation. We suggest that an option so easily implementable in material devices would not have been overlooked by biology. Finally we discuss the stochastic learning on synapses with variable number of stable synaptic states.

297 citations


Journal ArticleDOI
TL;DR: In this paper, the authors describe a neural network model based on the synaptic organization of the cerebellum that can generate timed responses in the range of tens of milliseconds to seconds, and demonstrate that a network based on cerebellar circuitry can learn appropriately timed responses by encoding time as the population vector of granule cell activity.
Abstract: Substantial evidence has established that the cerebellum plays an important role in the generation of movements. An important aspect of motor output is its timing in relation to external stimuli or to other components of a movement. Previous studies suggest that the cerebellum plays a role in the timing of movements. Here we describe a neural network model based on the synaptic organization of the cerebellum that can generate timed responses in the range of tens of milliseconds to seconds. In contrast to previous models, temporal coding emerges from the dynamics of the cerebellar circuitry and depends neither on conduction delays, arrays of elements with different time constants, nor populations of elements oscillating at different frequencies. Instead, time is extracted from the instantaneous granule cell population vector. The subset of active granule cells is time-varying due to the granule-Golgi-granule cell negative feedback. We demonstrate that the population vector of simulated granule cell activity exhibits dynamic, nonperiodic trajectories in response to a periodic input. With time encoded in this manner, the output of the network at a particular interval following the onset of a stimulus can be altered selectively by changing the strength of granule-to-Purkinje cell connections for those granule cells that are active during the target time window. The memory of the reinforcement at that interval is subsequently expressed as a change in Purkinje cell activity that is appropriately timed with respect to stimulus onset. Thus, the present model demonstrates that a network based on cerebellar circuitry can learn appropriately timed responses by encoding time as the population vector of granule cell activity.

Journal ArticleDOI
TL;DR: By defining a probabilistic model of the waveform, the probability of both the form and number of spike shapes can be quantified and this framework is used to obtain an efficient algorithm for the decomposition of arbitrarily complex overlap sequences.
Abstract: Identifying and classifying action potential shapes in extracellular neural waveforms have long been the subject of research, and although several algorithms for this purpose have been successfully applied, their use has been limited by some outstanding problems. The first is how to determine shapes of the action potentials in the waveform and, second, how to decide how many shapes are distinct. A harder problem is that action potentials frequently overlap making difficult both the determination of the shapes and the classification of the spikes. In this report, a solution to each of these problems is obtained by applying Bayesian probability theory. By defining a probabilistic model of the waveform, the probability of both the form and number of spike shapes can be quantified. In addition, this framework is used to obtain an efficient algorithm for the decomposition of arbitrarily complex overlap sequences. This algorithm can extract many times more information than previous methods and facilitates the extracellular investigation of neuronal classes and of interactions within neuronal circuits.

Journal ArticleDOI
TL;DR: A model for biological stereo vision based on known receptive field profiles of binocular cells in the visual cortex is proposed and the first demonstration that these cells could effectively solve random dot stereograms is provided.
Abstract: Many models for stereo disparity computation have been proposed, but few can be said to be truly biological. There is also a rich literature devoted to physiological studies of stereopsis. Cells sensitive to binocular disparity have been found in the visual cortex, but it is not clear whether these cells could be used to compute disparity maps from stereograms. Here we propose a model for biological stereo vision based on known receptive field profiles of binocular cells in the visual cortex and provide the first demonstration that these cells could effectively solve random dot stereograms. Our model also allows a natural integration of stereo vision and motion detection. This may help explain the existence of units tuned to both disparity and motion in the visual cortex.

Journal ArticleDOI
TL;DR: A robust method for novelty detection is developed, which aims to minimize the number of heuristically chosen thresholds in the novelty decision process by growing a gaussian mixture model to form a representation of a training set of normal system states.
Abstract: The detection of novel or abnormal input vectors is of importance in many monitoring tasks, such as fault detection in complex systems and detection of abnormal patterns in medical diagnostics. We have developed a robust method for novelty detection, which aims to minimize the number of heuristically chosen thresholds in the novelty decision process. We achieve this by growing a gaussian mixture model to form a representation of a training set of "normal" system states. When previously unseen data are to be screened for novelty we use the same threshold as was used during training to define a novelty decision boundary. We show on a sample problem of medical signal processing that this method is capable of providing robust novelty decision boundaries and apply the technique to the detection of epileptic seizures within a data record.
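
The approach amounts to fitting a mixture density to "normal" training data and flagging test points whose likelihood under that density falls below a threshold fixed during training. The sketch below uses scikit-learn's GaussianMixture with a fixed number of components rather than the paper's growing procedure, and the threshold choice (a low percentile of training log-likelihoods) is an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# "Normal" training data: two clusters standing in for known system states.
normal = np.vstack([rng.normal([0, 0], 0.5, (500, 2)),
                    rng.normal([4, 4], 0.5, (500, 2))])

# NOTE: a fixed 2-component mixture is used here; the paper grows the mixture.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(normal)

# Novelty threshold fixed on the training set (assumed: 1st percentile of log-density).
threshold = np.percentile(gmm.score_samples(normal), 1.0)

test = np.array([[0.2, -0.1],    # typical
                 [4.1, 3.8],     # typical
                 [2.0, 2.0],     # between clusters -> novel
                 [9.0, -5.0]])   # far away -> novel
is_novel = gmm.score_samples(test) < threshold
print(is_novel)                  # expected: [False False  True  True]
```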

Journal ArticleDOI
TL;DR: This work extends Barron's results to feedforward networks with possibly nonsigmoid activation functions approximating mappings and their derivatives simultaneously, showing that the approximation error decreases at rates as fast as n^(-1/2), where n is the number of hidden units.
Abstract: Recently Barron (1993) has given rates for hidden layer feedforward networks with sigmoid activation functions approximating a class of functions satisfying a certain smoothness condition. These rates do not depend on the dimension of the input space. We extend Barron's results to feedforward networks with possibly nonsigmoid activation functions approximating mappings and their derivatives simultaneously. Our conditions are similar but not identical to Barron's, but we obtain the same rates of approximation, showing that the approximation error decreases at rates as fast as n^(-1/2), where n is the number of hidden units. The dimension of the input space appears only in the constants of our bounds.

Journal ArticleDOI
TL;DR: The method of averaging and a detailed bifurcation calculation are used to reduce a system of synaptically coupled neurons to a Hopfield type continuous time neural network and show how to derive a new type of squashing function whose properties are directly related to the detailed ionic mechanisms of the membrane.
Abstract: The method of averaging and a detailed bifurcation calculation are used to reduce a system of synaptically coupled neurons to a Hopfield type continuous time neural network. Due to some special properties of the bifurcation, explicit averaging is not required and the reduction becomes a simple algebraic problem. The resultant calculations show one how to derive a new type of "squashing function" whose properties are directly related to the detailed ionic mechanisms of the membrane. Frequency encoding as opposed to amplitude encoding emerges in a natural fashion from the theory. The full system and the reduced system are numerically compared.

Journal ArticleDOI
TL;DR: Bayesian methods are used to analyze some of the properties of a special type of Markov chain and derive the theory of self-supervision, in which the higher layers of a multilayer network supervise the lower layers, even though overall there is no external teacher.
Abstract: In this paper Bayesian methods are used to analyze some of the properties of a special type of Markov chain. The forward transitions through the chain are followed by inverse transitions (using Bayes' theorem) backward through a copy of the same chain; this will be called a folded Markov chain. If an appropriately defined Euclidean error (between the original input and its "reconstruction" via Bayes' theorem) is minimized with respect to the choice of Markov chain transition probabilities, then the familiar theories of both vector quantizers and self-organizing maps emerge. This approach is also used to derive the theory of self-supervision, in which the higher layers of a multilayer network supervise the lower layers, even though overall there is no external teacher.

Journal ArticleDOI
TL;DR: The δ-test, a general method that establishes functional dependencies given a sequence of measurements by calculating conditional probabilities from vector component distances, is presented and illustrated on synthetic time series with different time-lag dependencies and noise levels.
Abstract: We present a general method, the δ-test, which establishes functional dependencies given a sequence of measurements. The approach is based on calculating conditional probabilities from vector component distances. Imposing the requirement of continuity of the underlying function, the obtained values of the conditional probabilities carry information on the embedding dimension and variable dependencies. The power of the method is illustrated on synthetic time-series with different time-lag dependencies and noise levels and on the sunspot data. The virtue of the method for preprocessing data in the context of feedforward neural networks is demonstrated. Also, its applicability for tracking residual errors in output units is stressed.

Journal ArticleDOI
TL;DR: A simple learning algorithm for Hidden Markov Models (HMMs) is presented together with a number of variations, proved to be exact or approximate gradient optimization algorithms with respect to likelihood, log-likelihood, or cross-entropy functions, and as such are usually convergent.
Abstract: A simple learning algorithm for Hidden Markov Models (HMMs) is presented together with a number of variations. Unlike other classical algorithms such as the Baum-Welch algorithm, the algorithms described are smooth and can be used on-line (after each example presentation) or in batch mode, with or without the usual Viterbi most likely path approximation. The algorithms have simple expressions that result from using a normalized-exponential representation for the HMM parameters. All the algorithms presented are proved to be exact or approximate gradient optimization algorithms with respect to likelihood, log-likelihood, or cross-entropy functions, and as such are usually convergent. These algorithms can also be cast in the more general EM (Expectation-Maximization) framework where they can be viewed as exact or approximate GEM (Generalized Expectation-Maximization) algorithms. The mathematical properties of the algorithms are derived in the appendix.
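
The key device is to write the transition and emission probabilities as normalized exponentials (softmax) of unconstrained parameters, so that gradient updates on the log-likelihood automatically preserve positivity and normalization. A small sketch for a discrete-output HMM follows; the forward recursion and the numerical gradient are standard textbook pieces used here only to illustrate the parameterization, not the paper's specific smooth update rules.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def log_likelihood(theta, obs, n_states=2, n_symbols=2):
    # Normalized-exponential (softmax) parameterization: unconstrained theta
    # maps row by row onto a valid transition matrix A and emission matrix B.
    wA, wB = np.split(theta, [n_states * n_states])
    A = softmax(wA.reshape(n_states, n_states))
    B = softmax(wB.reshape(n_states, n_symbols))
    alpha = np.full(n_states, 1.0 / n_states) * B[:, obs[0]]   # uniform initial state
    log_p = 0.0
    for o in obs[1:]:                      # scaled forward recursion
        c = alpha.sum(); log_p += np.log(c); alpha /= c
        alpha = (alpha @ A) * B[:, o]
    return log_p + np.log(alpha.sum())

rng = np.random.default_rng(0)
obs = np.array([0, 0, 1, 1, 0, 1, 1, 1, 0, 0] * 5)   # toy observation sequence
theta = 0.1 * rng.standard_normal(2 * 2 + 2 * 2)

# Gradient ascent on the log-likelihood; the gradient is estimated numerically
# here for brevity, whereas the paper derives exact smooth update rules.
for step in range(300):
    g = np.zeros_like(theta); eps = 1e-5
    for i in range(len(theta)):
        d = np.zeros_like(theta); d[i] = eps
        g[i] = (log_likelihood(theta + d, obs) - log_likelihood(theta - d, obs)) / (2 * eps)
    theta += 0.01 * g

print("final log-likelihood:", round(log_likelihood(theta, obs), 3))
```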

Journal ArticleDOI
TL;DR: Close connections are demonstrated between mean field theory methods and other approaches, in particular, barrier function and interior point methods, for obtaining approximate solutions to optimization problems.
Abstract: In recent years there has been significant interest in adapting techniques from statistical physics, in particular mean field theory, to provide deterministic heuristic algorithms for obtaining approximate solutions to optimization problems. Although these algorithms have been shown experimentally to be successful there has been little theoretical analysis of them. In this paper we demonstrate connections between mean field theory methods and other approaches, in particular, barrier function and interior point methods. As an explicit example, we summarize our work on the linear assignment problem. In this previous work we defined a number of algorithms, including deterministic annealing, for solving the assignment problem. We proved convergence, gave bounds on the convergence times, and showed relations to other optimization algorithms.
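
One standard deterministic-annealing scheme for the linear assignment problem, closely related to the mean field approach discussed here, exponentiates the benefit matrix at a temperature T, projects onto doubly stochastic matrices by alternating row and column normalization, and slowly lowers T. The sketch below implements that generic scheme; it is not necessarily the authors' exact algorithm, and the problem size, cooling schedule, and parameter values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear assignment: choose a permutation maximizing the total benefit B[i, perm[i]].
n = 6
B = rng.random((n, n))

def anneal_assignment(B, T0=1.0, T_min=0.02, cooling=0.9, norm_iters=50):
    T = T0
    M = np.ones_like(B) / B.shape[0]
    while T > T_min:
        M = np.exp(B / T)                        # mean-field / Gibbs weights at temperature T
        for _ in range(norm_iters):              # project onto doubly stochastic matrices
            M /= M.sum(axis=1, keepdims=True)
            M /= M.sum(axis=0, keepdims=True)
        T *= cooling                             # lower the temperature gradually
    return M.argmax(axis=1)                      # read off the (near-)permutation

perm = anneal_assignment(B)
print("assignment:", perm, " benefit:", round(B[np.arange(n), perm].sum(), 3))
```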

Journal ArticleDOI
TL;DR: This paper presents a learning algorithm for neural networks, called Alopex, which uses local correlations between changes in individual weights and changes in the global error measure, and shows that learning times are comparable to those for standard gradient descent methods.
Abstract: We present a learning algorithm for neural networks, called Alopex. Instead of error gradient, Alopex uses local correlations between changes in individual weights and changes in the global error measure. The algorithm does not make any assumptions about transfer functions of individual neurons, and does not explicitly depend on the functional form of the error measure. Hence, it can be used in networks with arbitrary transfer functions and for minimizing a large class of error measures. The learning algorithm is the same for feedforward and recurrent networks. All the weights in a network are updated simultaneously, using only local computations. This allows complete parallelization of the algorithm. The algorithm is stochastic and it uses a “temperature” parameter in a manner similar to that in simulated annealing. A heuristic “annealing schedule” is presented that is effective in finding global minima of error surfaces. In this paper, we report extensive simulation studies illustrating these advantages and show that learning times are comparable to those for standard gradient descent methods. Feedforward networks trained with Alopex are used to solve the MONK's problems and symmetry problems. Recurrent networks trained with the same algorithm are used for solving temporal XOR problems. Scaling properties of the algorithm are demonstrated using encoder problems of different sizes and advantages of appropriate error measures are illustrated using a variety of problems.
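
Alopex perturbs every weight simultaneously and uses the correlation between each weight's recent change and the change in the global error to bias the next perturbation, with an annealed "temperature" controlling the randomness. The sketch below applies this idea to a small least-squares problem; the step size and the simple geometric annealing schedule are illustrative assumptions, not the heuristic schedule from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: recover w_true by minimizing squared error, using no gradients.
X = rng.standard_normal((200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true

def error(w):
    return np.mean((X @ w - y) ** 2)

w = np.zeros(5)
delta = 0.01                                   # fixed step magnitude for every weight
prev_step = delta * rng.choice([-1, 1], size=5)
prev_E, T = error(w), 1.0

for it in range(20000):
    w_new = w + prev_step
    E_new = error(w_new)
    # Correlation of each weight's change with the change in global error:
    # a move that lowered E is repeated with high probability, and vice versa.
    corr = prev_step * (E_new - prev_E)
    p_positive = 1.0 / (1.0 + np.exp(corr / T))          # temperature-controlled
    step = delta * np.where(rng.random(5) < p_positive, 1.0, -1.0)
    w, prev_E, prev_step = w_new, E_new, step
    T = max(1e-4, T * 0.9995)                            # simple annealing (assumed)

print("recovered:", np.round(w, 2), " error:", round(error(w), 4))
```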

Journal ArticleDOI
TL;DR: The role of synaptic synchronization is investigated for the leaky integrate-and-fire neuron as well as for a biophysically and anatomically detailed compartmental model of a cortical pyramidal cell to find that if the number of excitatory inputs, N, is on the same order as the number of fully synchronized inputs necessary to trigger a single action potential, Nt, synchronization always increases the firing rate.
Abstract: It is commonly assumed that temporal synchronization of excitatory synaptic inputs onto a single neuron increases its firing rate. We investigate here the role of synaptic synchronization for the leaky integrate-and-fire neuron as well as for a biophysically and anatomically detailed compartmental model of a cortical pyramidal cell. We find that if the number of excitatory inputs, N, is on the same order as the number of fully synchronized inputs necessary to trigger a single action potential, Nt, synchronization always increases the firing rate (for both constant and Poisson-distributed input). However, for large values of N compared to Nt, "overcrowding" occurs and temporal synchronization is detrimental to firing frequency. This behavior is caused by the conflicting influence of the low-pass nature of the passive dendritic membrane on the one hand and the refractory period on the other. If both temporal synchronization as well as the fraction of synchronized inputs (Murthy and Fetz 1993) is varied, synchronization is only advantageous if either N or the average input frequency, fin, are small enough.
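
The basic effect can be reproduced with a leaky integrate-and-fire unit receiving N excitatory inputs whose spikes are either scattered or grouped into synchronous volleys. The parameters below (membrane time constant, threshold, synaptic weight) are generic illustrative values, not those of the paper's compartmental model; with these values N is several times larger than the number of synchronous inputs needed to reach threshold, so the run lands in the "overcrowding" regime in which synchrony lowers the output rate.

```python
import numpy as np

rng = np.random.default_rng(0)

def lif_rate(n_inputs, rate_hz, sync_fraction, t_sim=2.0, dt=1e-4):
    # Leaky integrate-and-fire: tau dV/dt = -V + I_syn, spike and reset when V > theta.
    tau, theta, w, t_ref = 0.02, 1.0, 0.05, 0.002   # illustrative values
    n_sync = int(sync_fraction * n_inputs)
    V, spikes, refr = 0.0, 0, 0.0
    for _ in range(int(t_sim / dt)):
        # Independent Poisson inputs plus a synchronized group that fires as one volley.
        indep = rng.random(n_inputs - n_sync) < rate_hz * dt
        volley = (rng.random() < rate_hz * dt) * n_sync
        drive = w * (indep.sum() + volley)
        if refr > 0:                 # absolute refractory period after an output spike
            refr -= dt
            continue
        V += dt / tau * (-V) + drive
        if V > theta:
            spikes += 1; V = 0.0; refr = t_ref
    return spikes / t_sim

for frac in (0.0, 0.25, 0.5):
    print(f"synchronized fraction {frac:.2f}: {lif_rate(100, 20, frac):.1f} Hz")
```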

Journal ArticleDOI
TL;DR: There are strong relationships between approaches to optimization and learning based on statistical physics or mixtures of experts, and the EM algorithm can be interpreted as converging either to a local maximum of the mixtures model or to a saddle point solution to the statistical physics system.
Abstract: We show that there are strong relationships between approaches to optimization and learning based on statistical physics or mixtures of experts. In particular, the EM algorithm can be interpreted as converging either to a local maximum of the mixtures model or to a saddle point solution to the statistical physics system. An advantage of the statistical physics approach is that it naturally gives rise to a heuristic continuation method, deterministic annealing, for finding good solutions.

Journal ArticleDOI
TL;DR: The authors showed that the existence of a population vector constitutes only weak support for the explicit use of a particular coordinate representation by motor cortex, and that such a vector can always be found given a very general set of assumptions.
Abstract: Recent evidence of population coding in motor cortex has led some researchers to claim that certain variables such as hand direction or force may be coded within a Cartesian coordinate system with respect to extrapersonal space. These claims are based on the ability to predict the rectangular coordinates of hand movement direction using a "population vector" computed from multiple cells' firing rates. I show here that such a population vector can always be found given a very general set of assumptions. Therefore the existence of a population vector constitutes only weak support for the explicit use of a particular coordinate representation by motor cortex.
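
The population-vector construction weights each cell's preferred direction by its firing rate and sums; the point of the paper is that such a vector can be recovered under very general tuning assumptions. The sketch below builds the standard cosine-tuned example; the tuning model, baseline, and rates are illustrative, not taken from any data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# A population of cells with random preferred directions and cosine tuning.
n_cells = 200
preferred = rng.uniform(0, 2 * np.pi, n_cells)
baseline, modulation = 20.0, 15.0                    # illustrative firing-rate parameters

def rates(movement_angle):
    return baseline + modulation * np.cos(movement_angle - preferred)

def population_vector(r):
    # Weight each cell's preferred-direction unit vector by its (baseline-subtracted) rate.
    w = r - baseline
    return np.array([np.sum(w * np.cos(preferred)), np.sum(w * np.sin(preferred))])

true_angle = 0.8                                     # radians
pv = population_vector(rates(true_angle))
print("true direction:", true_angle,
      " decoded:", round(float(np.arctan2(pv[1], pv[0])), 3))
```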

Journal ArticleDOI
TL;DR: It is shown that high variability can be obtained due to the amplification of correlated fluctuations in a recurrent network, and the inhibitory and excitatory feedback connections, which cause hotspots of neural activity to form within the network, are investigated.
Abstract: We investigate a model for neural activity in a two-dimensional sheet of leaky integrate-and-fire neurons with feedback connectivity consisting of local excitation and surround inhibition. Each neuron receives stochastic input from an external source, independent in space and time. As recently suggested by Softky and Koch (1992, 1993), independent stochastic input alone cannot explain the high interspike interval variability exhibited by cortical neurons in behaving monkeys. We show that high variability can be obtained due to the amplification of correlated fluctuations in a recurrent network. Furthermore, the cross-correlation functions have a dual structure, with a sharp peak on top of a much broader hill. This is due to the inhibitory and excitatory feedback connections, which cause "hotspots" of neural activity to form within the network. These localized patterns of excitation appear as clusters or stripes that coalesce, disintegrate, or fluctuate in size while simultaneously moving in a random walk constrained by the interaction with other clusters. The synaptic current impinging upon a single neuron shows large fluctuations at many time scales, leading to a large coefficient of variation (CV) for the interspike interval statistics. The power spectrum associated with single units shows a 1/f decay for small frequencies and is flat at higher frequencies, while the power spectrum of the spiking activity averaged over many cells---equivalent to the local field potential---shows no 1/f decay but a prominent peak around 40 Hz, in agreement with data recorded from cat and monkey cortex (Gray et al. 1990; Eckhorn et al. 1993). Firing rates exhibit self-similarity between 20 and 800 msec, resulting in 1/f-like noise, consistent with the fractal nature of neural spike trains (Teich 1992).

Journal ArticleDOI
TL;DR: The hypothesis is explored that linear cortical neurons are concerned with building a particular type of representation of the visual world, one that not only preserves the information and the efficiency achieved by the retina but also preserves spatial relationships in the input, both in the plane of vision and in the depth dimension.
Abstract: We explore the hypothesis that linear cortical neurons are concerned with building a particular type of representation of the visual world---one that not only preserves the information and the efficiency achieved by the retina, but in addition preserves spatial relationships in the input---both in the plane of vision and in the depth dimension. Focusing on the linear cortical cells, we classify all transforms having these properties. They are given by representations of the scaling and translation group and turn out to be labeled by rational numbers '(p + q)/p' (p, q integers). Any given (p, q) predicts a set of receptive fields that comes at different spatial locations and scales (sizes) with a bandwidth of log2 [(p + q)/p] octaves and, most interestingly, with a diversity of 'q' cell varieties. The bandwidth affects the trade-off between preservation of planar and depth relations and, we think, should be selected to match structures in natural scenes. For bandwidths between 1 and 2 octaves, which are the ones we feel provide the best matching, we find for each scale a minimum of two distinct cell types that reside next to each other and in phase quadrature, that is, differ by 90° in the phases of their receptive fields, as are found in the cortex; in special cases they resemble the "even-symmetric" and "odd-symmetric" simple cells. An interesting consequence of the representations presented here is that the pattern of activation in the cells in response to a translation or scaling of an object remains the same but merely shifts its locus from one group of cells to another. This work also provides a new understanding of color coding changes from the retina to the cortex.

Journal ArticleDOI
TL;DR: Synchrony alone could increase the cell's firing rate when the product of the number of inputs, the average input frequency, and the synaptic strength was below a critical value, but could reduce the firing rate for higher values; effectiveness of synaptic transmission, measured by the peak area of cross-correlations between input and output spikes, increased with increasing synchrony.
Abstract: For a model cortical neuron with three active conductances, we studied the dependence of the firing rate on the degree of synchrony in its synaptic inputs. The effect of synchrony was determined as a function of three parameters: number of inputs, average input frequency, and the synaptic strength (maximal unitary conductance change). Synchrony alone could increase the cell's firing rate when the product of these three parameters was below a critical value. But for higher values of the three parameters, synchrony could reduce firing rate. Instantaneous responses to time-varying input firing rates were close to predictions from steady-state responses when input synchrony was high, but fell below steady-state responses when input synchrony was low. Effectiveness of synaptic transmission, measured by the peak area of cross-correlations between input and output spikes, increased with increasing synchrony.

Journal ArticleDOI
TL;DR: It is shown that there are important differences in the representation formed depending on whether the constraint is enforced by dividing each weight by the same amount or subtracting a fixed amount from each weight.
Abstract: The effect of different kinds of weight normalization on the outcome of a simple competitive learning rule is analyzed. It is shown that there are important differences in the representation formed depending on whether the constraint is enforced by dividing each weight by the same amount ("divisive enforcement") or subtracting a fixed amount from each weight ("subtractive enforcement"). For the divisive cases weight vectors spread out over the space so as to evenly represent "typical" inputs, whereas for the subtractive cases the weight vectors tend to the axes of the space, so as to represent "extreme" inputs. The consequences of these differences are examined.
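
The contrast can be seen in a few lines of simulation: a winner-take-all unit moves its weights toward the current input, and the constraint that the weights sum to a constant is then enforced either by division or by subtraction. The input distribution, number of units, and learning rate below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-negative input vectors (e.g., intensities) drawn from a broad distribution.
def sample_input():
    return rng.dirichlet(np.ones(4)) * 4.0          # sums to 4, all components >= 0

def train(enforcement, n_units=3, steps=20000, lr=0.05, total=1.0):
    W = rng.uniform(0.2, 0.3, (n_units, 4))
    for _ in range(steps):
        x = sample_input()
        win = np.argmax(W @ x)                      # winner-take-all competition
        W[win] += lr * x                            # simple competitive (Hebbian) update
        if enforcement == "divisive":
            W[win] *= total / W[win].sum()          # rescale: weights track "typical" inputs
        else:
            W[win] -= (W[win].sum() - total) / 4    # subtract: weights drift toward the axes
            W[win] = np.clip(W[win], 0.0, None)
    return W

print("divisive:\n", np.round(train("divisive"), 2))
print("subtractive:\n", np.round(train("subtractive"), 2))
```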

Journal ArticleDOI
TL;DR: A neural network model of translation-invariant object recognition that incorporates features of the neural circuitry of IMHV, and exhibits behavior qualitatively similar to a range of findings in the filial imprinting paradigm is developed.
Abstract: Using neural and behavioral constraints from a relatively simple biological visual system, we evaluate the mechanism and behavioral implications of a model of invariant object recognition. Evidence from a variety of methods suggests that a localized portion of the domestic chick brain, the intermediate and medial hyperstriatum ventrale (IMHV), is critical for object recognition. We have developed a neural network model of translation-invariant object recognition that incorporates features of the neural circuitry of IMHV, and exhibits behavior qualitatively similar to a range of findings in the filial imprinting paradigm. We derive several counter-intuitive behavioral predictions that depend critically upon the biologically derived features of the model. In particular, we propose that the recurrent excitatory and lateral inhibitory circuitry in the model, and observed in IMHV, produces hysteresis on the activation state of the units in the model and the principal excitatory neurons in IMHV. Hysteresis, when combined with a simple Hebbian covariance learning mechanism, has been shown in this and earlier work (Foldiak 1991; O'Reilly and McClelland 1992) to produce translation-invariant visual representations. The hysteresis and learning rule are responsible for a sensitive period phenomenon in the network, and for a series of novel temporal blending phenomena. These effects are empirically testable. Further, physiological and anatomical features of mammalian visual cortex support a hysteresis-based mechanism, arguing for the generality of the algorithm.

Journal ArticleDOI
TL;DR: A rigorous mapping of quadratic 0-1 programming problems with linear equality and inequality constraints onto optimizing neural networks is described, and a modification of the tabu learning technique is presented as a more coherent approach to general problem solving with such networks.
Abstract: The often disappointing performance of optimizing neural networks can be partly attributed to the rather ad hoc manner in which problems are mapped onto them for solution. In this paper a rigorous mapping is described for quadratic 0-1 programming problems with linear equality and inequality constraints, this being the most general class of problem such networks can solve. The problem's constraints define a polyhedron P containing all the valid solution points, and the mapping guarantees strict confinement of the network's state vector to P. However, forcing convergence to a 0-1 point within P is shown to be generally intractable, rendering the Hopfield and similar models inapplicable to the vast majority of problems. A modification of the tabu learning technique is presented as a more coherent approach to general problem solving with neural networks. When tested on a collection of knapsack problems, the modified dynamics produced some very encouraging results.