
Showing papers in "Neural Computation in 2001"


Journal ArticleDOI
TL;DR: In this paper, the authors propose a method to estimate a function f that is positive on S and negative on the complement of S. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space.
Abstract: Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.
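
As a concrete illustration, this one-class support vector algorithm is available as scikit-learn's OneClassSVM. The snippet below is a minimal usage sketch with made-up data and parameter values; nu plays the role of the a priori specified fraction of points allowed to fall outside S.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))       # unlabeled data drawn from P
X_out = rng.normal(size=(20, 2)) + 4.0    # points that should fall outside S

# nu upper-bounds the fraction of training points outside S and
# lower-bounds the fraction of support vectors in the kernel expansion.
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(X_train)

print(clf.predict(X_train)[:10])  # mostly +1: inside the estimated region S
print(clf.predict(X_out)[:10])    # mostly -1: the complement of S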

4,397 citations


Journal ArticleDOI
TL;DR: Using clues from the KKT conditions for the dual problem, two threshold parameters are employed to derive modifications of SMO that perform significantly faster than the original SMO on all benchmark data sets tried.
Abstract: This article points out an important source of inefficiency in Platt's sequential minimal optimization (SMO) algorithm that is caused by the use of a single threshold value. Using clues from the KKT conditions for the dual problem, two threshold parameters are employed to derive modifications of SMO. These modified algorithms perform significantly faster than the original SMO on all benchmark data sets tried.
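
The two-threshold check at the core of these modifications can be sketched in a few lines, assuming a trained dual state (all names below are illustrative): instead of a single b, thresholds b_up and b_low are computed from the appropriate index sets, and optimality holds when b_low <= b_up + 2*tau.

import numpy as np

def two_thresholds(alpha, y, K, C, tau=1e-3):
    # F_i = f(x_i) - y_i, written without a bias term
    F = K @ (alpha * y) - y
    # index sets from the KKT conditions for the dual problem
    up = ((y > 0) & (alpha < C)) | ((y < 0) & (alpha > 0))
    low = ((y > 0) & (alpha > 0)) | ((y < 0) & (alpha < C))
    b_up, b_low = F[up].min(), F[low].max()
    return b_up, b_low, b_low <= b_up + 2 * tau  # True when (near-)optimal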

1,814 citations


Journal ArticleDOI
TL;DR: This work suggests a two-stage separation process: a priori selection of a possibly overcomplete signal dictionary in which the sources are assumed to be sparsely representable, followed by unmixing the sources by exploiting their sparse representability.
Abstract: The blind source separation problem is to extract the underlying source signals from a set of linear mixtures, where the mixing matrix is unknown. This situation is common in acoustics, radio, medical signal and image processing, hyperspectral imaging, and other areas. We suggest a two-stage separation process: a priori selection of a possibly overcomplete signal dictionary (for instance, a wavelet frame or a learned dictionary) in which the sources are assumed to be sparsely representable, followed by unmixing the sources by exploiting their sparse representability. We consider the general case of more sources than mixtures, but also derive a more efficient algorithm in the case of a nonovercomplete dictionary and an equal number of sources and mixtures. Experiments with artificial signals and musical sounds demonstrate significantly better separation than other known techniques.
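
A rough sketch of the two-stage process for the square case (as many mixtures as sources), with a wavelet frame as the dictionary; pywt and scikit-learn's FastICA are stand-ins for the paper's dictionary choice and sparsity-based unmixing objective, not the authors' code.

import numpy as np
import pywt
from sklearn.decomposition import FastICA

def separate(mixtures, wavelet="db4", level=4):
    # Stage 1: sparse representation of each mixture channel
    C = np.stack([np.concatenate(pywt.wavedec(x, wavelet, level=level))
                  for x in mixtures])
    # Stage 2: estimate the unmixing matrix in the sparse domain
    ica = FastICA(n_components=C.shape[0], random_state=0).fit(C.T)
    W = ica.components_            # estimated unmixing matrix
    return W @ mixtures            # recovered sources (centering ignored)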

829 citations


Journal ArticleDOI
TL;DR: Simulations of an object manipulation task prove that the MOSAIC architecture can learn to manipulate multiple objects and switch between them appropriately, and that the model generalizes to novel objects whose dynamics lie within the polyhedra of already learned dynamics.
Abstract: Humans demonstrate a remarkable ability to generate accurate and appropriate motor behavior under many different and often uncertain environmental conditions. We previously proposed a new modular architecture, the modular selection and identification for control (MOSAIC) model, for motor learning and control based on multiple pairs of forward (predictor) and inverse (controller) models. The architecture simultaneously learns the multiple inverse models necessary for control as well as how to select the set of inverse models appropriate for a given environment. It combines both feedforward and feedback sensorimotor information so that the controllers can be selected both prior to movement and subsequently during movement. This article extends and evaluates the MOSAIC architecture in the following respects. First, the learning in the architecture was implemented by both the original gradient-descent method and the expectation-maximization (EM) algorithm. Unlike gradient descent, the newly derived EM algorithm is robust to the initial starting conditions and learning parameters. Second, simulations of an object manipulation task prove that the architecture can learn to manipulate multiple objects and switch between them appropriately. Moreover, after learning, the model shows generalization to novel objects whose dynamics lie within the polyhedra of already learned dynamics. Finally, when each of the dynamics is associated with a particular object shape, the model is able to select the appropriate controller before movement execution. When presented with a novel shape-dynamic pairing, inappropriate activation of modules is observed followed by on-line correction.
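
The selection mechanism at the heart of the architecture can be caricatured in a few lines: responsibilities are a softmax over forward-model prediction errors and gate both the blended motor command and learning. This is a toy sketch with illustrative names and a fixed sigma, not the article's full implementation.

import numpy as np

def mosaic_step(x_next, predictions, commands, sigma=0.1):
    # predictions: each forward model's predicted next state
    # commands:    each paired inverse model's proposed motor command
    err2 = (x_next - predictions) ** 2
    resp = np.exp(-err2 / (2 * sigma ** 2))  # modules that predict well...
    resp /= resp.sum()                       # ...take responsibility
    u = resp @ commands                      # responsibility-weighted command
    return u, resp                           # resp also scales each module's update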

732 citations


Journal ArticleDOI
TL;DR: A measure is introduced for the distance between two spike trains that has a time constant as a parameter and can be used to determine the intrinsic noise of a neuron.
Abstract: The discrimination between two spike trains is a fundamental problem for both experimentalists and the nervous system itself. We introduce a measure for the distance between two spike trains. The distance has a time constant as a parameter. Depending on this parameter, the distance interpolates between a coincidence detector and a rate difference counter. The dependence of the distance on noise is studied with an integrate-and-fire model. For an intermediate range of the time constants, the distance depends linearly on the noise. This property can be used to determine the intrinsic noise of a neuron.
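
A minimal sketch of such a distance, assuming the standard construction: filter each train with an exponential kernel of time constant tau and integrate the squared difference of the two traces. Grid resolution and normalization are illustrative.

import numpy as np

def spike_distance(train1, train2, tau=0.01, dt=1e-4, T=1.0):
    grid = np.arange(0.0, T, dt)
    def filtered(spikes):
        f = np.zeros_like(grid)
        for s in spikes:
            m = grid >= s
            f[m] += np.exp(-(grid[m] - s) / tau)   # exponential trace per spike
        return f
    d = filtered(train1) - filtered(train2)
    return np.sqrt(np.sum(d ** 2) * dt / tau)

# Small tau acts like a coincidence detector; large tau approaches a
# rate-difference counter, as described above.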

569 citations


Journal ArticleDOI
TL;DR: This work evaluates the performances of two rate codes based on the spike count and the mean interspike interval, and compares the results with a rank order code, where the first ganglion cells to emit a spike are given a maximal weight.
Abstract: It is often supposed that the messages sent to the visual cortex by the retinal ganglion cells are encoded by the mean firing rates observed on spike trains generated with a Poisson process. Using an information transmission approach, we evaluate the performances of two such codes, one based on the spike count and the other on the mean interspike interval, and compare the results with a rank order code, where the first ganglion cells to emit a spike are given a maximal weight. Our results show that the rate codes are far from optimal for fast information transmission and that the temporal structure of the spike train can be efficiently used to maximize the information transfer rate under conditions where each cell needs to fire only one spike.
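
The rank order weighting can be illustrated in a toy form: cells are ranked by the time of their first spike, and weights fall off with rank so that the earliest cells dominate. The geometric decay factor is an arbitrary illustrative choice, not a value from the paper.

import numpy as np

def rank_order_weights(first_spike_times, decay=0.8):
    order = np.argsort(first_spike_times)        # earliest spike gets rank 0
    w = np.empty_like(first_spike_times)
    w[order] = decay ** np.arange(len(first_spike_times))
    return w                                     # maximal weight for the first cell

print(rank_order_weights(np.array([0.012, 0.003, 0.030, 0.007])))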

545 citations


Journal ArticleDOI
TL;DR: A linear decomposition is obtained into approximately independent components, where the dependence of two components is approximated by the proximity of the components in the topographic representation.
Abstract: In ordinary independent component analysis, the components are assumed to be completely independent, and they do not necessarily have any meaningful order relationships. In practice, however, the estimated "independent" components are often not at all independent. We propose that this residual dependence structure could be used to define a topographic order for the components. In particular, a distance between two components could be defined using their higher-order correlations, and this distance could be used to create a topographic representation. Thus, we obtain a linear decomposition into approximately independent components, where the dependence of two components is approximated by the proximity of the components in the topographic representation.

505 citations


Journal ArticleDOI
TL;DR: In this paper, the authors define predictive information Ipred(T) as the mutual information between the past and the future of a time series, and show that the divergent part of the predictive information provides the unique measure for the complexity of the dynamics underlying the time series.
Abstract: We define predictive information Ipred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: Ipred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then Ipred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of Ipred(T) provides the unique measure for the complexity of dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.
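
For a discrete time series, a naive plug-in estimator conveys the definition (the article's treatment of the large-T limits is far more careful); the word length and estimator below are illustrative only.

import numpy as np
from collections import Counter

def predictive_information(x, T=3):
    # x: a sequence of discrete symbols; past/future words of length T
    pairs = [(tuple(x[i - T:i]), tuple(x[i:i + T]))
             for i in range(T, len(x) - T + 1)]
    n = len(pairs)
    joint = Counter(pairs)
    past = Counter(p for p, _ in pairs)
    future = Counter(f for _, f in pairs)
    return sum((c / n) * np.log2(c * n / (past[p] * future[f]))
               for (p, f), c in joint.items())   # bits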

485 citations


Journal ArticleDOI
TL;DR: A decomposition method for ν-SVM is proposed that is competitive with existing methods for C-SVM, and it is shown that the two formulations are in general different problems with the same optimal solution set.
Abstract: The ν-support vector machine (ν-SVM) for classification proposed by Scholkopf, Smola, Williamson, and Bartlett (2000) has the advantage of using a parameter ν for controlling the number of support vectors. In this article, we investigate the relation between ν-SVM and C-SVM in detail. We show that in general they are two different problems with the same optimal solution set. Hence, we may expect that many numerical aspects of solving them are similar. However, compared to regular C-SVM, the formulation of ν-SVM is more complicated, so up to now there have been no effective methods for solving large-scale ν-SVM. We propose a decomposition method for ν-SVM that is competitive with existing methods for C-SVM. We also discuss the behavior of ν-SVM through some numerical experiments.
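
The ν parameterization is exposed, for example, by scikit-learn's NuSVC (whose backend descends from the authors' LIBSVM library). The snippet below merely illustrates the advertised property that ν lower-bounds the fraction of support vectors; data and parameters are made up.

import numpy as np
from sklearn.svm import NuSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)

for nu in (0.1, 0.3, 0.5):
    clf = NuSVC(nu=nu, kernel="rbf", gamma="scale").fit(X, y)
    frac_sv = clf.n_support_.sum() / len(X)
    print(nu, frac_sv)   # fraction of support vectors stays >= nu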

461 citations


Journal ArticleDOI
TL;DR: By combining sequential model selection procedures, the online VB method provides a fully online learning method with a model selection mechanism and was able to adapt the model structure to dynamic environments.
Abstract: The Bayesian framework provides a principled way of model selection. This framework estimates a probability distribution over an ensemble of models, and the prediction is done by averaging over the ensemble of models. Accordingly, the uncertainty of the models is taken into account, and complex models with more degrees of freedom are penalized. However, integration over model parameters is often intractable, and some approximation scheme is needed. Recently, a powerful approximation scheme, called the variational Bayes (VB) method, has been proposed. This approach defines the free energy for a trial probability distribution, which approximates a joint posterior probability distribution over model parameters and hidden variables. The exact maximization of the free energy gives the true posterior distribution. The VB method uses factorized trial distributions. The integration over model parameters can be done analytically, and an iterative expectation-maximization-like algorithm, whose convergence is guaranteed, is derived. In this article, we derive an online version of the VB algorithm and prove its convergence by showing that it is a stochastic approximation for finding the maximum of the free energy. By combining sequential model selection procedures, the online VB method provides a fully online learning method with a model selection mechanism. In preliminary experiments using synthetic data, the online VB method was able to adapt the model structure to dynamic environments.
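
The stochastic-approximation flavor of the online update can be seen in a toy conjugate example: discount the running expected sufficient statistics, mix in the new observation, and re-derive the posterior hyperparameters at each step. The model, step-size schedule, and effective-sample-size ramp below are illustrative stand-ins, not the paper's update equations.

import numpy as np

rng = np.random.default_rng(0)
mu0, tau0, sigma2 = 0.0, 1.0, 1.0    # prior mean/precision; known variance
s_mean, n_eff = 0.0, 0.0
for t, x in enumerate(rng.normal(2.0, 1.0, size=2000), start=1):
    eta = t ** -0.6                          # slowly decaying step size
    s_mean = (1 - eta) * s_mean + eta * x    # discounted sufficient statistic
    n_eff = (1 - eta) * n_eff + eta * t      # effective sample size ramp
    prec = tau0 + n_eff / sigma2             # approximate posterior precision
    mean = (tau0 * mu0 + n_eff * s_mean / sigma2) / prec
print(round(mean, 1))                        # should approach the true mean, 2.0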

415 citations


Journal ArticleDOI
TL;DR: In this paper, it was shown that in contrast to continuous processes, the variance of the estimators cannot be reduced by smoothing beyond a scale set by the number of point events in the interval.
Abstract: The spectrum and coherency are useful quantities for characterizing the temporal correlations and functional relations within and between point processes. This article begins with a review of these quantities, their interpretation, and how they may be estimated. A discussion of how to assess the statistical significance of features in these measures is included. In addition, new work is presented that builds on the framework established in the review section. This work investigates how the estimates and their error bars are modified by finite sample sizes. Finite sample corrections are derived based on a doubly stochastic inhomogeneous Poisson process model in which the rate functions are drawn from a low-variance gaussian process. It is found that in contrast to continuous processes, the variance of the estimators cannot be reduced by smoothing beyond a scale set by the number of point events in the interval. Alternatively, the degrees of freedom of the estimators can be thought of as bounded from above by the expected number of point events in the interval. Further new work describing and illustrating a method for detecting the presence of a line in a point process spectrum is also presented, corresponding to the detection of a periodic modulation of the underlying rate. This work demonstrates that a known statistical test, applicable to continuous processes, applies with little modification to point process spectra and is of utility in studying a point process driven by a continuous stimulus. Although the material discussed is of general applicability to point processes, attention will be confined to sequences of neuronal action potentials (spike trains), the motivation for this work.
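
A bare-bones version of the kind of estimator reviewed here: bin the spike train, remove the mean rate, and average direct spectral estimates over DPSS (Slepian) tapers. Bin width, time-bandwidth parameter NW, and taper count K are illustrative.

import numpy as np
from scipy.signal.windows import dpss

def multitaper_spike_spectrum(spike_times, T=10.0, dt=1e-3, NW=4, K=7):
    counts, _ = np.histogram(spike_times, bins=int(T / dt), range=(0, T))
    x = counts - counts.mean()                 # remove the DC (mean-rate) line
    tapers = dpss(len(x), NW, K)               # (K, N) Slepian tapers
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    freqs = np.fft.rfftfreq(len(x), dt)
    # averaging over tapers trades variance for bandwidth; as noted above,
    # the degrees of freedom are ultimately bounded by the spike count
    return freqs, spectra.mean(axis=0) * dt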

Journal ArticleDOI
TL;DR: In this article, the authors introduce a resampling-based method for validating the results of clustering analysis, together with a figure of merit that measures the stability of clustering solutions against resampling.
Abstract: We introduce a method for validation of results obtained by clustering analysis of data. The method is based on resampling the available data. A figure of merit that measures the stability of clustering solutions against resampling is introduced. Clusters that are stable against resampling give rise to local maxima of this figure of merit. This is presented first for a one-dimensional data set, for which an analytic approximation for the figure of merit is derived and compared with numerical measurements. Next, the applicability of the method is demonstrated for higher-dimensional data, including gene microarray expression data.
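
A rough sketch of the procedure, with k-means standing in for the clustering algorithm (the method itself is generic): the figure of merit below scores how well cluster co-membership survives resampling, and stable solutions should appear as local maxima.

import numpy as np
from sklearn.cluster import KMeans

def stability(X, k, n_resamples=20, frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    full = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    scores = []
    for _ in range(n_resamples):
        idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
        sub = KMeans(n_clusters=k, n_init=10).fit_predict(X[idx])
        same_full = full[idx][:, None] == full[idx][None, :]
        same_sub = sub[:, None] == sub[None, :]
        scores.append((same_full == same_sub).mean())  # co-membership agreement
    return float(np.mean(scores))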

Journal ArticleDOI
TL;DR: It is shown that a bump solution can exist in a spiking network provided the neurons fire asynchronously within the bump, and that the activity profile matches that of a corresponding population rate model.
Abstract: We examine the existence and stability of spatially localized "bumps" of neuronal activity in a network of spiking neurons. Bumps have been proposed in mechanisms of visual orientation tuning, the rat head direction system, and working memory. We show that a bump solution can exist in a spiking network provided the neurons fire asynchronously within the bump. We consider a parameter regime where the bump solution is bistable with an all-off state and can be initiated with a transient excitatory stimulus. We show that the activity profile matches that of a corresponding population rate model. The bump in a spiking network can lose stability through partial synchronization to either a traveling wave or the all-off state. This can occur if the synaptic timescale is too fast through a dynamical effect or if a transient excitatory pulse is applied to the network. A bump can thus be activated and deactivated with excitatory inputs that may have physiological relevance.

Journal ArticleDOI
TL;DR: This work proposes a simple model, which is to assume that the time at which a neuron fires is determined probabilistically by, and only by, two quantities: the experimental clock time and the elapsed time since the previous spike.
Abstract: Poisson processes usually provide adequate descriptions of the irregularity in neuron spike times after pooling the data across large numbers of trials, as is done in constructing the peristimulus time histogram. When probabilities are needed to describe the behavior of neurons within individual trials, however, Poisson process models are often inadequate. In principle, an explicit formula gives the probability density of a single spike train in great generality, but without additional assumptions, the firing-rate intensity function appearing in that formula cannot be estimated. We propose a simple solution to this problem, which is to assume that the time at which a neuron fires is determined probabilistically by, and only by, two quantities: the experimental clock time and the elapsed time since the previous spike. We show that this model can be fitted with standard methods and software and that it may be used successfully to fit neuronal data.
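
Under this assumption the log-likelihood of a spike train takes the usual point-process form, with the intensity evaluated at the pair (clock time, elapsed time since the last spike). The sketch below assumes a hypothetical vectorized function intensity(t, u); the paper fits such a function with standard spline-based software.

import numpy as np

def imi_loglik(spikes, intensity, T, dt=1e-3):
    spikes = np.asarray(spikes)              # assumed sorted and nonempty
    prev = np.concatenate(([0.0], spikes[:-1]))  # first spike measured from onset
    point_term = np.sum(np.log(intensity(spikes, spikes - prev)))
    # integral of the intensity over [0, T], approximated on a grid
    grid = np.arange(dt / 2, T, dt)
    k = np.searchsorted(spikes, grid, side="right")
    u = grid - np.where(k > 0, spikes[np.maximum(k - 1, 0)], 0.0)
    return point_term - np.sum(intensity(grid, u)) * dt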

Journal ArticleDOI
TL;DR: It is shown, through an analysis of some standard models, that the M-current adaptation alters the mechanism for repetitive firing, while the afterhyperpolarization adaptation works via shunting the incoming synapses.
Abstract: There are several different biophysical mechanisms for spike frequency adaptation observed in recordings from cortical neurons. The two most commonly used in modeling studies are a calcium-dependent potassium current Iahp and a slow voltage-dependent potassium current, Im. We show that both of these have strong effects on the synchronization properties of excitatorily coupled neurons. Furthermore, we show that the reasons for these effects are different. We show, through an analysis of some standard models, that the M-current adaptation alters the mechanism for repetitive firing, while the afterhyperpolarization adaptation works via shunting the incoming synapses. This latter mechanism applies with a network that has recurrent inhibition. The shunting behavior is captured in a simple two-variable reduced model that arises near certain types of bifurcations. A one-dimensional map is derived from the simplified model.
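
A toy integrate-and-fire caricature of the two mechanisms, with a spike-triggered, calcium-like afterhyperpolarization conductance and a slow voltage-dependent M-like conductance; all parameters are illustrative, and this sketch is not the article's analysis.

import numpy as np

def adapting_lif(I_ext=2.0, T=1.0, dt=1e-4):
    v, a, m = 0.0, 0.0, 0.0
    g_ahp, g_m, tau_ahp, tau_m = 2.0, 2.0, 0.1, 0.1
    spikes = []
    for t in np.arange(0.0, T, dt):
        m += dt * (max(v, 0.0) - m) / tau_m      # slow M-like activation
        a -= dt * a / tau_ahp                    # Ca-gated conductance decays
        v += dt * (-v + I_ext - (g_ahp * a + g_m * m) * v)
        if v >= 1.0:                             # threshold crossing
            spikes.append(t)
            v = 0.0                              # reset
            a += 1.0                             # spike-triggered Ca influx
    return np.array(spikes)  # interspike intervals lengthen as adaptation builds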

Journal Article
TL;DR: In this article, the authors exploit the property of the sources to have a sparse representation in a corresponding signal dictionary, which can consist of wavelets, wavelet packets, etc., or be obtained by learning from a given family of signals.
Abstract: The blind source separation problem is to extract the underlying source signals from a set of their linear mixtures, where the mixing matrix is unknown. This situation is common, e.g., in acoustics, radio, and medical signal processing. We exploit the property of the sources to have a sparse representation in a corresponding signal dictionary. Such a dictionary may consist of wavelets, wavelet packets, etc., or be obtained by learning from a given family of signals. Starting from the maximum a posteriori framework, which is applicable to the case of more sources than mixtures, we derive a few other categories of objective functions, which provide faster and more robust computations, when there are an equal number of sources and mixtures. Our experiments with artificial signals and with musical sounds demonstrate significantly better separation than other known techniques.

Journal ArticleDOI
TL;DR: Using a biophysical model of a cortical neuron, it is shown that a temporal difference rule used in conjunction with dendritic backpropagating action potentials reproduces the temporally asymmetric window of Hebbian plasticity observed physiologically.
Abstract: A spike-timing-dependent Hebbian mechanism governs the plasticity of recurrent excitatory synapses in the neocortex: synapses that are activated a few milliseconds before a postsynaptic spike are potentiated, while those that are activated a few milliseconds after are depressed. We show that such a mechanism can implement a form of temporal difference learning for prediction of input sequences. Using a biophysical model of a cortical neuron, we show that a temporal difference rule used in conjunction with dendritic backpropagating action potentials reproduces the temporally asymmetric window of Hebbian plasticity observed physiologically. Furthermore, the size and shape of the window vary with the distance of the synapse from the soma. Using a simple example, we show how a spike-timing-based temporal difference learning rule can allow a network of neocortical neurons to predict an input a few milliseconds before the input's expected arrival.
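
The temporally asymmetric window referred to above is commonly summarized as a pair of exponentials; the sketch below uses that standard textbook form with illustrative amplitudes and time constants (in the article, the window's size and shape emerge from the biophysical model rather than being posited).

import numpy as np

def stdp_window(dt_ms, A_plus=0.010, A_minus=0.012,
                tau_plus=20.0, tau_minus=20.0):
    """Weight change for dt = t_post - t_pre, in milliseconds."""
    dt_ms = np.asarray(dt_ms, dtype=float)
    return np.where(dt_ms > 0,
                    A_plus * np.exp(-dt_ms / tau_plus),    # pre before post: LTP
                    -A_minus * np.exp(dt_ms / tau_minus))  # post before pre: LTD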

Journal ArticleDOI
TL;DR: The proposed spike- based synaptic learning algorithm provides a general framework for regulating neurotransmitter release probability by modifying the probability of vesicle discharge as a function of the relative timing of spikes in the pre- and postsynaptic neurons.
Abstract: The precise times of occurrence of individual pre- and postsynaptic action potentials are known to play a key role in the modification of synaptic efficacy. Based on stimulation protocols of two synaptically connected neurons, we infer an algorithm that reproduces the experimental data by modifying the probability of vesicle discharge as a function of the relative timing of spikes in the pre- and postsynaptic neurons. The primary feature of this algorithm is an asymmetry with respect to the direction of synaptic modification depending on whether the presynaptic spikes precede or follow the postsynaptic spike. Specifically, if the presynaptic spike occurs up to 50 ms before the postsynaptic spike, the probability of vesicle discharge is upregulated, while the probability of vesicle discharge is downregulated if the presynaptic spike occurs up to 50 ms after the postsynaptic spike. When neurons fire irregularly with Poisson spike trains at constant mean firing rates, the probability of vesicle discharge converges toward a characteristic value determined by the pre- and postsynaptic firing rates. On the other hand, if the mean rates of the Poisson spike trains slowly change with time, our algorithm predicts modifications in the probability of release that generalize Hebbian and Bienenstock-Cooper-Munro rules. We conclude that the proposed spike-based synaptic learning algorithm provides a general framework for regulating neurotransmitter release probability.

Journal ArticleDOI
TL;DR: A very simple batch learning algorithm is developed for semiblind extraction of a desired source signal with temporal structure from linear mixtures, and it is shown that a priori information about the autocorrelation function of the primary sources can be used to extract the desired signals.
Abstract: In this work we develop a very simple batch learning algorithm for semiblind extraction of a desired source signal with temporal structure from linear mixtures. Although we use the concept of sequential blind extraction of sources and independent component analysis, we do not carry out the extraction in a completely blind manner; neither do we assume that sources are statistically independent. In fact, we show that the a priori information about the autocorrelation function of primary sources can be used to extract the desired signals (sources of interest) from their linear mixtures. Extensive computer simulations and real data application experiments confirm the validity and high performance of the proposed algorithm.

Journal ArticleDOI
TL;DR: It is rigorously proved that the Bayesian stochastic complexity or the free energy is asymptotically equal to λ1 log n - (m1 - 1) log log n + constant, where n is the number of training samples and λ1 and m1 are the rational number and the natural number that are determined as the birational invariant values of the singularities in the parameter space.
Abstract: This article clarifies the relation between the learning curve and the algebraic geometrical structure of a nonidentifiable learning machine such as a multilayer neural network whose true parameter set is an analytic set with singular points. By using a concept in algebraic analysis, we rigorously prove that the Bayesian stochastic complexity or the free energy is asymptotically equal to λ1 log n - (m1 - 1) log log n + constant, where n is the number of training samples and λ1 and m1 are the rational number and the natural number, which are determined as the birational invariant values of the singularities in the parameter space. We also show an algorithm to calculate λ1 and m1 based on the resolution of singularities in algebraic geometry. In regular statistical models, 2λ1 is equal to the number of parameters and m1 = 1, whereas in nonregular models, such as multilayer networks, 2λ1 is not larger than the number of parameters and m1 ≥ 1. Since the increase of the stochastic complexity is equal to the learning curve or the generalization error, the nonidentifiable learning machines are better models than the regular ones if Bayesian ensemble learning is applied.
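
In display form, the main result reads (a transcription of the statement above, with d the number of parameters):

\[
  F(n) \;=\; \lambda_1 \log n \;-\; (m_1 - 1) \log\log n \;+\; O(1),
\]
\[
  \text{regular models: } 2\lambda_1 = d,\; m_1 = 1;
  \qquad
  \text{singular models: } 2\lambda_1 \le d,\; m_1 \ge 1.
\]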

Journal ArticleDOI
TL;DR: It is shown that this property can be used to recover source signals from a set of linear mixtures of those signals by finding an un-mixing matrix that maximizes a measure of temporal predictability for each recovered signal.
Abstract: A measure of temporal predictability is defined and used to separate linear mixtures of signals. Given any set of statistically independent source signals, it is conjectured here that a linear mixture of those signals has the following property: the temporal predictability of any signal mixture is less than (or equal to) that of any of its component source signals. It is shown that this property can be used to recover source signals from a set of linear mixtures of those signals by finding an unmixing matrix that maximizes a measure of temporal predictability for each recovered signal. This matrix is obtained as the solution to a generalized eigenvalue problem; such problems have scaling characteristics of O(N^3), where N is the number of signal mixtures. In contrast to independent component analysis, the temporal predictability method requires minimal assumptions regarding the probability density functions of source signals. It is demonstrated that the method can separate signal mixtures in which each mixture is a linear combination of source signals with supergaussian, subgaussian, and gaussian probability density functions, as well as mixtures of voices and music.
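
A compact sketch of the method: predictability is measured as the ratio of long-term to short-term variance of a recovered signal, so all unmixing vectors come out of a single generalized eigenvalue problem (solved below with scipy.linalg.eigh). The exponential smoother and half-life values are illustrative choices.

import numpy as np
from scipy.linalg import eigh

def smooth(x, half_life):
    lam = 2.0 ** (-1.0 / half_life)
    out = np.zeros_like(x)
    for t in range(1, x.shape[-1]):
        out[..., t] = lam * out[..., t - 1] + (1 - lam) * x[..., t]
    return out

def unmix(X, h_short=2, h_long=900):
    # X: (n_mixtures, n_samples)
    U = X - smooth(X, h_short)   # short-term fluctuation ("surprise")
    V = X - smooth(X, h_long)    # long-term fluctuation
    # maximize w' V V' w / w' U U' w  ->  generalized eigenproblem
    _, W = eigh(V @ V.T, U @ U.T)
    return W.T @ X               # last row: most predictable component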

Journal ArticleDOI
TL;DR: A new architecture for adaptively integrating different cues in a self-organized manner called Democratic Integration is proposed, in which discordant cues are quickly suppressed and recalibrated, while cues having been consistent with the result in the recent past are given a higher weight in the future.
Abstract: Sensory integration or sensor fusion - the integration of information from different modalities, cues, or sensors - is among the most fundamental problems of perception in biological and artificial systems. We propose a new architecture for adaptively integrating different cues in a self-organized manner. In Democratic Integration different cues agree on a result, and each cue adapts toward the result agreed on. In particular, discordant cues are quickly suppressed and recalibrated, while cues having been consistent with the result in the recent past are given a higher weight in the future. The architecture is tested in a face tracking scenario. Experiments show its robustness with respect to sudden changes in the environment as long as the changes disrupt only a minority of cues at the same time, although all cues may be disrupted at one time or another.
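
Schematically, one step of the loop looks as follows, assuming each cue produces a two-dimensional saliency map; the quality measure, names, and adaptation rate are illustrative simplifications of the architecture.

import numpy as np

def integrate_step(cue_maps, weights, eta=0.05):
    cue_maps = np.asarray(cue_maps)                   # (n_cues, H, W)
    fused = np.tensordot(weights, cue_maps, axes=1)   # weighted sum of cues
    target = np.unravel_index(np.argmax(fused), fused.shape)
    quality = np.array([m[target] for m in cue_maps]) # agreement with result
    quality /= quality.sum() + 1e-12
    weights = (1 - eta) * weights + eta * quality     # discordant cues decay
    return fused, weights / weights.sum()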

Journal ArticleDOI
TL;DR: An expectation-maximization algorithm for learning sparse and overcomplete data representations is presented, which exploits a variational approximation to a range of heavy-tailed distributions whose limit is the Laplacian.
Abstract: An expectation-maximization algorithm for learning sparse and overcomplete data representations is presented. The proposed algorithm exploits a variational approximation to a range of heavy-tailed distributions whose limit is the Laplacian. A rigorous lower bound on the sparse prior distribution is derived, which enables the analytic marginalization of a lower bound on the data likelihood. This lower bound enables the development of an expectation-maximization algorithm for learning the overcomplete basis vectors and inferring the most probable basis coefficients.

Journal ArticleDOI
TL;DR: Results of this analysis show that rate-place predictions for frequency discrimination are inconsistent with human performance in the dependence on frequency for high frequencies and that there is significant temporal information in the AN up to at least 10 kHz.
Abstract: A method for calculating psychophysical performance limits based on stochastic neural responses is introduced and compared to previous analytical methods for evaluating auditory discrimination of tone frequency and level. The method uses signal detection theory and a computational model for a population of auditory nerve (AN) fiber responses. The use of computational models allows predictions to be made over a wider parameter range and with more complete descriptions of AN responses than in analytical models. Performance based on AN discharge times (all-information) is compared to performance based only on discharge counts (rate-place). After the method is verified over the range of parameters for which previous analytical models are applicable, the parameter space is then extended. For example, a computational model of AN activity that extends to high frequencies is used to explore the common belief that rate-place information is responsible for frequency encoding at high frequencies due to the rolloff in AN phase locking above 2 kHz. This rolloff is thought to eliminate temporal information at high frequencies. Contrary to this belief, results of this analysis show that rate-place predictions for frequency discrimination are inconsistent with human performance in the dependence on frequency for high frequencies and that there is significant temporal information in the AN up to at least 10 kHz. In fact, the all-information predictions match the functional dependence of human performance on frequency, although optimal performance is much better than human performance. The use of computational AN models in this study provides new constraints on hypotheses of neural encoding of frequency in the auditory system; however, the method is limited to simple tasks with deterministic stimuli. A companion article in this issue ("Evaluating Auditory Performance Limits: II") describes an extension of this approach to more complex tasks that include random variation of one parameter, for example, random-level variation, which is often used in psychophysics to test neural encoding hypotheses.

Journal ArticleDOI
TL;DR: It is shown that plasticity can lead to an intrinsic stabilization of the mean firing rate of the postsynaptic neuron, and that a strict distinction between Hebbian and anti-Hebbian rules is questionable for spike-based learning, since learning is driven by correlations on the timescale of the learning window.
Abstract: We study analytically a model of long-term synaptic plasticity where synaptic changes are triggered by presynaptic spikes, postsynaptic spikes, and the time differences between presynaptic and postsynaptic spikes. The changes due to correlated input and output spikes are quantified by means of a learning window. We show that plasticity can lead to an intrinsic stabilization of the mean firing rate of the postsynaptic neuron. Subtractive normalization of the synaptic weights (summed over all presynaptic inputs converging on a postsynaptic neuron) follows if, in addition, the mean input rates and the mean input correlations are identical at all synapses. If the integral over the learning window is positive, firing-rate stabilization requires a non-Hebbian component, whereas such a component is not needed if the integral of the learning window is negative. A negative integral corresponds to anti-Hebbian learning in a model with slowly varying firing rates. For spike-based learning, a strict distinction between Hebbian and anti-Hebbian rules is questionable since learning is driven by correlations on the timescale of the learning window. The correlations between presynaptic and postsynaptic firing are evaluated for a piecewise-linear Poisson model and for a noisy spiking neuron model with refractoriness. While a negative integral over the learning window leads to intrinsic rate stabilization, the positive part of the learning window picks up spatial and temporal correlations in the input.

Journal ArticleDOI
TL;DR: This study demonstrates that the prediction signal of the TD model reproduces characteristics of cortical and striatal anticipatory neural activity, and suggests that tonic anticipatory activities may reflect prediction signals that are involved in the processing of dopamine neuron activity.
Abstract: Anticipatory neural activity preceding behaviorally important events has been reported in cortex, striatum, and midbrain dopamine neurons. Whereas dopamine neurons are phasically activated by reward-predictive stimuli, anticipatory activity of cortical and striatal neurons is increased during delay periods before important events. Characteristics of dopamine neuron activity resemble those of the prediction error signal of the temporal difference (TD) model of Pavlovian learning (Sutton & Barto, 1990). This study demonstrates that the prediction signal of the TD model reproduces characteristics of cortical and striatal anticipatory neural activity. This finding suggests that tonic anticipatory activities may reflect prediction signals that are involved in the processing of dopamine neuron activity.
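
A minimal TD model with a serial-compound stimulus representation reproduces the key point: after learning, the prediction signal ramps up during the delay before the reward, resembling the tonic anticipatory activity described above. Parameters are illustrative.

import numpy as np

T_steps, reward_t, gamma, alpha = 20, 15, 0.98, 0.1
x = np.eye(T_steps)              # serial compound: one feature per time step
w = np.zeros(T_steps)
for _ in range(200):             # repeated Pavlovian trials
    for t in range(T_steps - 1):
        r = 1.0 if t + 1 == reward_t else 0.0
        delta = r + gamma * (w @ x[t + 1]) - w @ x[t]  # TD (prediction) error
        w += alpha * delta * x[t]
print(np.round(x @ w, 2))        # prediction ramps up toward the reward time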

Journal ArticleDOI
TL;DR: A generalization of projection pursuit for time series, that is, signals with time structure, is introduced, and a simple approximation of coding length is derived that takes into account both the nongaussianity and the autocorrelations of the time series.
Abstract: A generalization of projection pursuit for time series, that is, signals with time structure, is introduced. The goal is to find projections of time series that have interesting structure, defined using criteria related to Kolmogoroff complexity or coding length. Interesting signals are those that can be coded with a short code length. We derive a simple approximation of coding length that takes into account both the nongaussianity and the autocorrelations of the time series. Also, we derive a simple algorithm for its approximate optimization. The resulting method is closely related to blind separation of nongaussian, time-dependent source signals.

Journal ArticleDOI
TL;DR: It is shown that networks of tonically firing adapting excitatory neurons can evolve to a state where the neurons burst in a synchronized manner, and the mechanism leading to this burst activity is analyzed in a network of integrate-and-fire neurons with spike adaptation.
Abstract: We study the emergence of synchronized burst activity in networks of neurons with spike adaptation. We show that networks of tonically firing adapting excitatory neurons can evolve to a state where the neurons burst in a synchronized manner. The mechanism leading to this burst activity is analyzed in a network of integrate-and-fire neurons with spike adaptation. The dependence of this state on the different network parameters is investigated, and it is shown that this mechanism is robust against inhomogeneities, sparseness of the connectivity, and noise. In networks of two populations, one excitatory and one inhibitory, we show that decreasing the inhibitory feedback can cause the network to switch from a tonically active, asynchronous state to the synchronized bursting state. Finally, we show that the same mechanism also causes synchronized burst activity in networks of more realistic conductance-based model neurons.

Journal ArticleDOI
TL;DR: Simulations using the Leabra algorithm show that cognitive neuroscience models that incorporate the core mechanistic principles of interactivity, inhibitory competition, and error-driven and Hebbian learning satisfy a wider range of biological, psychological, and computational constraints than models employing a subset of these principles.
Abstract: Computational models in cognitive neuroscience should ideally use biological properties and powerful computational principles to produce behavior consistent with psychological findings. Error-driven backpropagation is computationally powerful and has proven useful for modeling a range of psychological data but is not biologically plausible. Several approaches to implementing backpropagation in a biologically plausible fashion converge on the idea of using bidirectional activation propagation in interactive networks to convey error signals. This article demonstrates two main points about these error-driven interactive networks: (1) they generalize poorly due to attractor dynamics that interfere with the network's ability to produce novel combinatorial representations systematically in response to novel inputs, and (2) this generalization problem can be remedied by adding two widely used mechanistic principles, inhibitory competition and Hebbian learning, that can be independently motivated for a variety of biological, psychological, and computational reasons. Simulations using the Leabra algorithm, which combines the generalized recirculation (GeneRec), biologically plausible, error-driven learning algorithm with inhibitory competition and Hebbian learning, show that these mechanisms can result in good generalization in interactive networks. These results support the general conclusion that cognitive neuroscience models that incorporate the core mechanistic principles of interactivity, inhibitory competition, and error-driven and Hebbian learning satisfy a wider range of biological, psychological, and computational constraints than models employing a subset of these principles.

Journal ArticleDOI
TL;DR: In this article, the information contained in the spike occurrence times of a population of neurons can be broken up into a series of terms, each reflecting something about potential coding mechanisms, and a transition between two coding regimes, depending on the size of the relevant observation timescale.
Abstract: We demonstrate that the information contained in the spike occurrence times of a population of neurons can be broken up into a series of terms, each reflecting something about potential coding mechanisms. This is possible in the coding regime in which few spikes are emitted in the relevant time window. This approach allows us to study the additional information contributed by spike timing beyond that present in the spike counts and to examine the contributions to the whole information of different statistical properties of spike trains, such as firing rates and correlation functions. It thus forms the basis for a new quantitative procedure for analyzing simultaneous multiple neuron recordings and provides theoretical constraints on neural coding strategies. We find a transition between two coding regimes, depending on the size of the relevant observation timescale. For time windows shorter than the timescale of the stimulus-induced response fluctuations, there exists a spike count coding phase, in which the purely temporal information is of third order in time. For time windows much longer than the characteristic timescale, there can be additional timing information of first order, leading to a temporal coding phase in which timing information may affect the instantaneous information rate. In this new framework, we study the relative contributions of the dynamic firing rate and correlation variables to the full temporal information, the interaction of signal and noise correlations in temporal coding, synergy between spikes and between cells, and the effect of refractoriness. We illustrate the utility of the technique by analyzing a few cells from the rat barrel cortex.