
Showing papers in "Neural Computation in 2011"


Journal ArticleDOI
TL;DR: This letter describes algorithms for nonnegative matrix factorization (NMF) with the β-divergence, a family of cost functions parameterized by a single shape parameter β that takes the Euclidean distance, the Kullback-Leibler divergence, and the Itakura-Saito divergence as special cases.
Abstract: This letter describes algorithms for nonnegative matrix factorization (NMF) with the β-divergence (β-NMF). The β-divergence is a family of cost functions parameterized by a single shape parameter β that takes the Euclidean distance, the Kullback-Leibler divergence, and the Itakura-Saito divergence as special cases (β = 2, 1, 0 respectively). The proposed algorithms are based on a surrogate auxiliary function (a local majorization of the criterion function). We first describe a majorization-minimization algorithm that leads to multiplicative updates, which differ from standard heuristic multiplicative updates by a β-dependent power exponent. The monotonicity of the heuristic algorithm can, however, be proven for β ∈ (0, 1) using the proposed auxiliary function. Then we introduce the concept of the majorization-equalization (ME) algorithm, which produces updates that move along constant level sets of the auxiliary function and lead to larger steps than MM. Simulations on synthetic and real data illustrate the faster convergence of the ME approach. The letter also describes how the proposed algorithms can be adapted to two common variants of NMF: penalized NMF (when a penalty function of the factors is added to the criterion function) and convex NMF (when the dictionary is assumed to belong to a known subspace).

846 citations
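For concreteness, here is a minimal NumPy sketch of the β-dependent multiplicative MM updates the letter describes, using the standard exponent γ(β) (equal to 1/(2 − β) for β < 1, to 1 for 1 ≤ β ≤ 2, and to 1/(β − 1) for β > 2); function names, initialization, and iteration counts are illustrative, not the authors' reference implementation.

```python
import numpy as np

def gamma(beta):
    # MM exponent: the beta-dependent power that distinguishes the MM
    # updates from the standard heuristic multiplicative updates.
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta <= 2:
        return 1.0
    return 1.0 / (beta - 1.0)

def beta_nmf_mm(V, rank, beta=1.0, n_iter=200, eps=1e-12):
    """Majorization-minimization NMF for the beta-divergence (sketch)."""
    m, n = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    g = gamma(beta)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (WH ** (beta - 2) * V) / (W.T @ WH ** (beta - 1))) ** g
        WH = W @ H + eps
        W *= ((WH ** (beta - 2) * V) @ H.T / (WH ** (beta - 1) @ H.T)) ** g
    return W, H

# beta = 2, 1, 0 recover Euclidean, KL, and Itakura-Saito NMF respectively.
W, H = beta_nmf_mm(np.random.rand(40, 60), rank=5, beta=0.5)
```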


Journal ArticleDOI
TL;DR: A proper probabilistic model for the denoising autoencoder technique is defined, which makes it possible in principle to sample from such models or rank examples by their energy, and a different way to apply score matching that is related to learning to denoise and does not require computing second derivatives is suggested.
Abstract: Denoising autoencoders have been previously shown to be competitive alternatives to restricted Boltzmann machines for unsupervised pretraining of each layer of a deep architecture. We show that a simple denoising autoencoder training criterion is equivalent to matching the score (with respect to the data) of a specific energy-based model to that of a nonparametric Parzen density estimator of the data. This yields several useful insights. It defines a proper probabilistic model for the denoising autoencoder technique, which makes it possible in principle to sample from such models or rank examples by their energy. It suggests a different way to apply score matching that is related to learning to denoise and does not require computing second derivatives. It justifies the use of tied weights between the encoder and decoder and suggests ways to extend the success of denoising autoencoders to a larger family of energy-based models.

779 citations
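The training criterion in question can be sketched in a few lines: corrupt the input with gaussian noise, encode with weights W, decode with the tied transpose, and regress onto the clean input. The code below is a schematic, assuming a sigmoid encoder and a linear decoder; it is not the authors' code.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dae_epoch(X, W, b, c, sigma=0.3, lr=0.01):
    """One epoch of denoising autoencoder training (illustrative sketch).

    Tied weights: the encoder uses W, the decoder uses W.T, as justified
    by the score-matching interpretation in the letter.
    """
    rng = np.random.default_rng(0)
    for x in X:
        x_tilde = x + sigma * rng.standard_normal(x.shape)  # gaussian corruption
        h = sigmoid(W @ x_tilde + b)                        # encoder
        x_hat = W.T @ h + c                                 # linear decoder, tied weights
        err = x_hat - x                                     # reconstruct the *clean* input
        # Backpropagation of the squared reconstruction error.
        dh = (W @ err) * h * (1 - h)
        W -= lr * (np.outer(h, err) + np.outer(dh, x_tilde))
        b -= lr * dh
        c -= lr * err
    return W, b, c
```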


Journal ArticleDOI
TL;DR: Adaptive machine learning methods to eliminate offline calibration are investigated, the performance of 11 volunteers in a BCI based on the modulation of sensorimotor rhythms is analyzed, and an adaptation scheme that individually guides the user is presented.
Abstract: Brain-computer interfaces (BCIs) allow users to control a computer application by brain activity as acquired (e.g., by EEG). In our classic machine learning approach to BCIs, the participants undertake a calibration measurement without feedback to acquire data to train the BCI system. After the training, the user can control a BCI and improve the operation through some type of feedback. However, not all BCI users are able to perform sufficiently well during feedback operation. In fact, a nonnegligible portion of participants (estimated 15%-30%) cannot control the system (a BCI illiteracy problem, generic to all motor-imagery-based BCIs). We hypothesize that one main difficulty for a BCI user is the transition from offline calibration to online feedback. In this work, we investigate adaptive machine learning methods to eliminate offline calibration and analyze the performance of 11 volunteers in a BCI based on the modulation of sensorimotor rhythms. We present an adaptation scheme that individually guides the user. It starts with a subject-independent classifier that evolves to a subject-optimized state-of-the-art classifier within one session while the user interacts continuously. These initial runs use supervised techniques for robust coadaptive learning of user and machine. Subsequent runs use unsupervised adaptation to track the features' drift during the session and provide an unbiased measure of BCI performance. Using this approach, without any offline calibration, six users, including one novice, obtained good performance after 3 to 6 minutes of adaptation. More important, this novel guided learning also allows participants with BCI illiteracy to gain significant control with the BCI in less than 60 minutes. In addition, one volunteer without a sensorimotor idle rhythm peak at the beginning of the BCI experiment developed it during the course of the session and used voluntary modulation of its amplitude to control the feedback application.

194 citations
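The published adaptation scheme has several stages (subject-independent initialization, supervised coadaptation, unsupervised tracking); the fragment below only illustrates the flavor of the unsupervised stage, re-centering a fixed linear classifier with an exponentially weighted estimate of the drifting feature mean. The class and its parameters are hypothetical simplifications, not the letter's actual update rules.

```python
import numpy as np

class AdaptiveLinearBCI:
    """Linear classifier whose bias tracks nonstationary features (sketch).

    Illustrative only: the pooled feature mean is updated with an
    exponential window and the decision function is recentered, so slow
    drifts in the sensorimotor-rhythm features do not bias the output.
    The letter's full scheme is considerably richer than this.
    """
    def __init__(self, w, bias, eta=0.05):
        self.w, self.bias, self.eta = w, bias, eta
        self.mu = np.zeros_like(w)   # running pooled feature mean

    def classify(self, x):
        # Unsupervised update: no labels are needed to track the drift.
        self.mu = (1 - self.eta) * self.mu + self.eta * x
        return np.sign(self.w @ (x - self.mu) + self.bias)
```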


Journal ArticleDOI
TL;DR: Several decoding methods based on point-process neural encoding models, or forward models that predict spike responses to stimuli, are developed, which allow efficient maximum-likelihood model fitting and stimulus decoding.
Abstract: One of the central problems in systems neuroscience is to understand how neural spike trains convey sensory information. Decoding methods, which provide an explicit means for reading out the information contained in neural spike responses, offer a powerful set of tools for studying the neural coding problem. Here we develop several decoding methods based on point-process neural encoding models, or forward models that predict spike responses to stimuli. These models have concave log-likelihood functions, which allow efficient maximum-likelihood model fitting and stimulus decoding. We present several applications of the encoding model framework to the problem of decoding stimulus information from population spike responses: (1) a tractable algorithm for computing the maximum a posteriori (MAP) estimate of the stimulus, the most probable stimulus to have generated an observed single-or multiple-neuron spike train response, given some prior distribution over the stimulus; (2) a gaussian approximation to the posterior stimulus distribution that can be used to quantify the fidelity with which various stimulus features are encoded; (3) an efficient method for estimating the mutual information between the stimulus and the spike trains emitted by a neural population; and (4) a framework for the detection of change-point times (the time at which the stimulus undergoes a change in mean or variance) by marginalizing over the posterior stimulus distribution. We provide several examples illustrating the performance of these estimators with simulated and real neural data.

162 citations
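Because the point-process log likelihood is concave in the stimulus, MAP decoding reduces to smooth convex optimization. A minimal sketch, assuming an exponential nonlinearity and a gaussian prior (variable names illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def map_decode(y, K, Cinv, dt=0.001):
    """MAP stimulus estimate under a point-process encoding model (sketch).

    y    : (T,) spike counts per time bin
    K    : (T, d) linear filter mapping the stimulus to the log rate
    Cinv : (d, d) inverse prior covariance (zero-mean gaussian prior)
    """
    def neg_log_post(s):
        lam = np.exp(K @ s)                    # conditional intensity
        ll = y @ (K @ s) - dt * lam.sum()      # point-process log likelihood
        return -(ll - 0.5 * s @ Cinv @ s)      # add gaussian log prior

    def grad(s):
        lam = np.exp(K @ s)
        return -(K.T @ (y - dt * lam) - Cinv @ s)

    d = K.shape[1]
    res = minimize(neg_log_post, np.zeros(d), jac=grad, method="L-BFGS-B")
    return res.x
```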


Journal ArticleDOI
TL;DR: The proposed Bayesian regression self-training method for updating the parameters of an unscented Kalman filter decoder uses the decoder's output to periodically update its neuronal tuning model in a Bayesian linear regression, and significantly improved the accuracy of offline reconstructions.
Abstract: Brain-machine interfaces (BMIs) transform the activity of neurons recorded in motor areas of the brain into movements of external actuators. Representation of movements by neuronal populations varies over time, during both voluntary limb movements and movements controlled through BMIs, due to motor learning, neuronal plasticity, and instability in recordings. To ensure accurate BMI performance over long time spans, BMI decoders must adapt to these changes. We propose the Bayesian regression self-training method for updating the parameters of an unscented Kalman filter decoder. This novel paradigm uses the decoder's output to periodically update its neuronal tuning model in a Bayesian linear regression. We use two previously known statistical formulations of Bayesian linear regression: a joint formulation, which allows fast and exact inference, and a factorized formulation, which allows the addition and temporary omission of neurons from updates but requires approximate variational inference. To evaluate these methods, we performed offline reconstructions and closed-loop experiments with rhesus monkeys implanted cortically with microwire electrodes. Offline reconstructions used data recorded in areas M1, S1, PMd, SMA, and PP of three monkeys while they controlled a cursor using a handheld joystick. The Bayesian regression self-training updates significantly improved the accuracy of offline reconstructions compared to the same decoder without updates. We performed 11 sessions of real-time, closed-loop experiments with a monkey implanted in areas M1 and S1. These sessions spanned 29 days. The monkey controlled the cursor using the decoder with and without updates. The updates maintained control accuracy and did not require information about monkey hand movements, assumptions about desired movements, or knowledge of the intended movement goals as training signals. These results indicate that Bayesian regression self-training can maintain BMI control accuracy over long periods, making clinical neuroprosthetics more viable.

119 citations
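The joint formulation mentioned above is ordinary conjugate Bayesian linear regression, which admits exact closed-form updates. Below is a sketch of one such update, with the decoder's own output standing in for the regressors; the closed-loop bookkeeping and the factorized variational variant are omitted, and all names are illustrative.

```python
import numpy as np

def bayes_tuning_update(X, Y, M0, P0, noise_var=1.0):
    """Conjugate Bayesian linear regression update of a tuning model (sketch).

    X : (T, d) decoder outputs, used as regressors for self-training
    Y : (T, n) recorded neural activity
    M0, P0 : prior mean (d, n) and prior precision (d, d) of the tuning matrix
    Returns the posterior mean and precision, which can replace the
    observation model of the unscented Kalman filter decoder.
    """
    P = P0 + X.T @ X / noise_var                             # posterior precision
    M = np.linalg.solve(P, P0 @ M0 + X.T @ Y / noise_var)    # posterior mean
    return M, P
```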


Journal ArticleDOI
TL;DR: A new multiple linear regression model using regularized correntropy is proposed to improve the robustness of the classical mean square error (MSE) criterion, which is sensitive to outliers, together with a novel half-quadratic algorithm for solving the resulting nonlinear optimization problem.
Abstract: This letter proposes a new multiple linear regression model using regularized correntropy for robust pattern recognition. First, we motivate the use of correntropy to improve the robustness of the classical mean square error (MSE) criterion that is sensitive to outliers. Then an l1 regularization scheme is imposed on the correntropy to learn robust and sparse representations. Based on the half-quadratic optimization technique, we propose a novel algorithm to solve the nonlinear optimization problem. Second, we develop a new correntropy-based classifier based on the learned regularization scheme for robust object recognition. Extensive experiments over several applications confirm that the correntropy-based l1 regularization can improve recognition accuracy and receiver operating characteristic curves under noise corruption and occlusion.

114 citations
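A half-quadratic solver of this kind alternates between computing gaussian (Welsch) weights from the residuals and solving an l1-regularized weighted least-squares problem. In the sketch below the sample weights are folded into the design by row rescaling so a stock lasso solver can be reused; this is a schematic reconstruction, not the authors' algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso

def correntropy_lasso(X, y, sigma=1.0, lam=0.01, n_outer=10):
    """Half-quadratic solver for l1-regularized correntropy regression (sketch)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_outer):
        r = y - X @ w
        q = np.exp(-r**2 / (2 * sigma**2))   # auxiliary half-quadratic weights
        sq = np.sqrt(q)[:, None]
        # The weighted quadratic term is folded into the design by row
        # rescaling, so an ordinary lasso solves the inner problem.
        model = Lasso(alpha=lam, fit_intercept=False).fit(sq * X, sq.ravel() * y)
        w = model.coef_
    return w
```

Outliers receive weights q near zero, so they are effectively ignored by the inner least-squares fit; this is how correntropy buys robustness over plain MSE.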


Journal ArticleDOI
TL;DR: This letter proposes a general framework for studying neural mass models defined by ordinary differential equations, and establishes an important relation, similar to a dictionary, between their behaviors and normal and pathological, especially epileptic, cortical patterns of activity.
Abstract: In this letter, we propose a general framework for studying neural mass models defined by ordinary differential equations. By studying the bifurcations of the solutions to these equations and their sensitivity to noise, we establish an important relation, similar to a dictionary, between their behaviors and normal and pathological, especially epileptic, cortical patterns of activity. We then apply this framework to the analysis of two models that feature most phenomena of interest, the Jansen and Rit model, and the slightly more complex model recently proposed by Wendling and Chauvel. This model-based approach allows us to test various neurophysiological hypotheses on the origin of pathological cortical behaviors and investigate the effect of medication. We also study the effects of the stochastic nature of the inputs, which gives us clues about the origins of such important phenomena as interictal spikes, interictal bursts, and fast onset activity that are of particular relevance in epilepsy.

113 citations
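As a point of reference, the Jansen and Rit model analyzed in the letter is a six-dimensional ODE system driven by noisy input. The following Euler-integration sketch uses parameter values that are standard in the literature; the treatment of the stochastic input is deliberately naive and the setup is illustrative only.

```python
import numpy as np

# Standard Jansen-Rit parameters (values commonly used in the literature).
A, B = 3.25, 22.0            # excitatory / inhibitory synaptic gains (mV)
a, b = 100.0, 50.0           # inverse synaptic time constants (1/s)
C = 135.0
C1, C2, C3, C4 = C, 0.8 * C, 0.25 * C, 0.25 * C
e0, v0, r = 2.5, 6.0, 0.56

def S(v):
    """Sigmoidal potential-to-firing-rate function."""
    return 2 * e0 / (1 + np.exp(r * (v0 - v)))

def jansen_rit(T=5.0, dt=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    y = np.zeros(6)
    out = []
    for _ in range(int(T / dt)):
        p = 120 + 30 * rng.standard_normal()   # noisy afferent input (Hz)
        y0, y1, y2, y3, y4, y5 = y
        dy = np.array([
            y3,
            y4,
            y5,
            A * a * S(y1 - y2) - 2 * a * y3 - a**2 * y0,
            A * a * (p + C2 * S(C1 * y0)) - 2 * a * y4 - a**2 * y1,
            B * b * C4 * S(C3 * y0) - 2 * b * y5 - b**2 * y2,
        ])
        y = y + dt * dy                        # naive Euler step
        out.append(y[1] - y[2])                # EEG-like output signal
    return np.array(out)
```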


Journal ArticleDOI
TL;DR: This work introduces a basic model, the masked RBM, which explicitly models occlusion boundaries in image patches by factoring the appearance of any patch region from its shape, and proposes a generative model of larger images using a field of such RBMs.
Abstract: Computer vision has grown tremendously in the past two decades. Despite all efforts, existing attempts at matching parts of the human visual system's extraordinary ability to understand visual scenes lack either scope or power. By combining the advantages of general low-level generative models and powerful layer-based and hierarchical models, this work aims at being a first step toward richer, more flexible models of images. After comparing various types of restricted Boltzmann machines (RBMs) able to model continuous-valued data, we introduce our basic model, the masked RBM, which explicitly models occlusion boundaries in image patches by factoring the appearance of any patch region from its shape. We then propose a generative model of larger images using a field of such RBMs. Finally, we discuss how masked RBMs could be stacked to form a deep model able to generate more complicated structures and suitable for various tasks such as segmentation or object recognition.

106 citations


Journal ArticleDOI
TL;DR: It is shown that any distribution on the set of binary vectors of length n can be arbitrarily well approximated by an RBM with k − 1 hidden units, where k is the minimal number of pairs of binary vectors differing in only one entry whose union contains the support set, and this confirms a conjecture presented in Le Roux and Bengio (2010).
Abstract: We improve recently published results about the resources of restricted Boltzmann machines (RBMs) and deep belief networks (DBNs) required to make them universal approximators. We show that any distribution p on the set {0, 1}^n of binary vectors of length n can be arbitrarily well approximated by an RBM with k − 1 hidden units, where k is the minimal number of pairs of binary vectors differing in only one entry such that their union contains the support set of p. In important cases this number is half the cardinality of the support set of p (the number given in Le Roux & Bengio, 2008). We construct a DBN with 2^n / (2(n − b)), b ∼ log n, hidden layers of width n that is capable of approximating any distribution on {0, 1}^n arbitrarily well. This confirms a conjecture presented in Le Roux and Bengio (2010).

98 citations


Journal ArticleDOI
TL;DR: This work extends FPCD using an idea borrowed from herding in order to obtain a pure sampling algorithm, called the rates-FPCD sampler; this sampler can improve the model as more samples are collected, since it optimizes a lower bound on the log likelihood of the training data.
Abstract: Two recently proposed learning algorithms, herding and fast persistent contrastive divergence (FPCD), share the following interesting characteristic: they exploit changes in the model parameters while sampling in order to escape modes and mix better during the sampling process that is part of the learning algorithm. We justify such approaches as ways to escape modes while keeping approximately the same asymptotic distribution of the Markov chain. In that spirit, we extend FPCD using an idea borrowed from herding in order to obtain a pure sampling algorithm, which we call the rates-FPCD sampler. Interestingly, this sampler can improve the model as we collect more samples, since it optimizes a lower bound on the log likelihood of the training data. We provide empirical evidence that this new algorithm displays substantially better and more robust mixing than Gibbs sampling.

89 citations
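One plausible reading of the sampler, sketched below for a bias-free binary RBM, is that the data sufficient statistics are precomputed once as "rates," and fast weights are then nudged toward the difference between these rates and the chain's own statistics, pushing the chain away from modes it has already visited. This is an interpretive simplification for illustration, not the paper's exact procedure.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def rates_fpcd_sampler(W, X_train, n_samples=100, eta=0.01, decay=0.95, seed=0):
    """Sketch of a rates-FPCD-style sampler for a binary RBM (no biases)."""
    rng = np.random.default_rng(seed)
    H_data = sigmoid(X_train @ W)                 # E[h | v] on the training data
    rates = X_train.T @ H_data / len(X_train)    # data statistics ("rates")
    F = np.zeros_like(W)                          # fast weights
    v = rng.integers(0, 2, W.shape[0]).astype(float)
    samples = []
    for _ in range(n_samples):
        ph = sigmoid(v @ (W + F))                 # Gibbs step, slow + fast weights
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid((W + F) @ h)
        v = (rng.random(pv.shape) < pv).astype(float)
        # Fast weights chase (rates - sample statistics), then decay.
        F = decay * F + eta * (rates - np.outer(v, ph))
        samples.append(v.copy())
    return np.array(samples)
```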


Journal ArticleDOI
TL;DR: This article deduces the mathematical fundamentals for their utilization in gradient-based online vector quantization algorithms, which bear on the generalized derivatives of the divergences known as Fréchet derivatives in functional analysis and which reduce in finite-dimensional problems to partial derivatives in a natural way.
Abstract: Supervised and unsupervised vector quantization methods for classification and clustering traditionally use dissimilarities, frequently taken as Euclidean distances. In this article, we investigate the applicability of divergences instead, focusing on online learning. We deduce the mathematical fundamentals for their utilization in gradient-based online vector quantization algorithms. This bears on the generalized derivatives of the divergences known as Fréchet derivatives in functional analysis, which reduce in finite-dimensional problems to partial derivatives in a natural way. We demonstrate the application of this methodology for widely applied supervised and unsupervised online vector quantization schemes, including self-organizing maps, neural gas, and learning vector quantization. Additionally, principles for hyperparameter optimization and relevance learning for parameterized divergences in the case of supervised vector quantization are given to achieve improved classification accuracy.
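As a concrete instance of the framework, consider online winner-take-all vector quantization with the generalized Kullback-Leibler divergence: in finite dimensions the Fréchet derivative reduces to the partial derivative ∂D(x‖w)/∂w = 1 − x/w, which gives the prototype update used in this sketch (strictly positive data assumed; names illustrative).

```python
import numpy as np

def gkl(x, w):
    """Generalized Kullback-Leibler divergence D(x || w), x and w positive."""
    return np.sum(x * np.log(x / w) - x + w)

def online_vq_gkl(X, n_proto=4, eta=0.05, n_epochs=20, seed=0):
    """Winner-take-all vector quantization driven by the GKL divergence (sketch)."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_proto, replace=False)].copy()
    for _ in range(n_epochs):
        for x in X[rng.permutation(len(X))]:
            k = np.argmin([gkl(x, w) for w in W])   # best-matching prototype
            # Gradient step: the partial derivative of D(x || w) is 1 - x/w.
            W[k] -= eta * (1 - x / W[k])
            W[k] = np.maximum(W[k], 1e-9)           # keep prototypes positive
    return W
```

With the Euclidean distance in place of gkl, the same loop recovers the classical online VQ update w ← w + η(x − w).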

Journal ArticleDOI
TL;DR: This work compares several Markov chain Monte Carlo (MCMC) algorithms that allow for the calculation of general Bayesian estimators involving posterior expectations (conditional on model parameters) and addresses the application of MCMC methods for extracting nonmarginal properties of the posterior distribution.
Abstract: Stimulus reconstruction or decoding methods provide an important tool for understanding how sensory and motor information is represented in neural activity. We discuss Bayesian decoding methods based on an encoding generalized linear model (GLM) that accurately describes how stimuli are transformed into the spike trains of a group of neurons. The form of the GLM likelihood ensures that the posterior distribution over the stimuli that caused an observed set of spike trains is log concave so long as the prior is. This allows the maximum a posteriori (MAP) stimulus estimate to be obtained using efficient optimization algorithms. Unfortunately, the MAP estimate can have a relatively large average error when the posterior is highly nongaussian. Here we compare several Markov chain Monte Carlo (MCMC) algorithms that allow for the calculation of general Bayesian estimators involving posterior expectations (conditional on model parameters). An efficient version of the hybrid Monte Carlo (HMC) algorithm was significantly superior to other MCMC methods for gaussian priors. When the prior distribution has sharp edges and corners, on the other hand, the “hit-and-run” algorithm performed better than other MCMC methods. Using these algorithms, we show that for this latter class of priors, the posterior mean estimate can have a considerably lower average error than MAP, whereas for gaussian priors, the two estimators have roughly equal efficiency. We also address the application of MCMC methods for extracting nonmarginal properties of the posterior distribution. For example, by using MCMC to calculate the mutual information between the stimulus and response, we verify the validity of a computationally efficient Laplace approximation to this quantity for gaussian priors in a wide range of model parameters; this makes direct model-based computation of the mutual information tractable even in the case of large observed neural populations, where methods based on binning the spike train fail. Finally, we consider the effect of uncertainty in the GLM parameters on the posterior estimators.
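For reference, here is a bare-bones leapfrog HMC sampler of the kind compared in the study, written for any log-concave target supplied as a log density and its gradient; step size and trajectory length are left untuned, and all names are illustrative.

```python
import numpy as np

def hmc_sample(logp, grad, x0, n_samples=1000, eps=0.1, n_leap=20, seed=0):
    """Basic hybrid/Hamiltonian Monte Carlo with leapfrog integration (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)             # resample momentum
        x_new = x.copy()
        p_new = p + 0.5 * eps * grad(x)              # initial half step
        for i in range(n_leap):
            x_new = x_new + eps * p_new              # full position step
            if i < n_leap - 1:
                p_new = p_new + eps * grad(x_new)    # full momentum step
        p_new = p_new + 0.5 * eps * grad(x_new)      # final half step
        # Metropolis accept/reject using the change in the Hamiltonian.
        dH = (logp(x_new) - 0.5 * p_new @ p_new) - (logp(x) - 0.5 * p @ p)
        if np.log(rng.random()) < dH:
            x = x_new
        samples.append(x.copy())
    return np.array(samples)
```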

Journal ArticleDOI
TL;DR: It is shown that, when an actor-critic model of reinforcement learning is integrated with a model assuming that the cortico-basal-ganglia circuit implements a statistically optimal decision-making procedure, the actor-critic model converges to the weights required for optimal decision making when biologically realistic limits on synaptic weights are introduced.
Abstract: This article seeks to integrate two sets of theories describing action selection in the basal ganglia: reinforcement learning theories describing learning which actions to select to maximize reward and decision-making theories proposing that the basal ganglia selects actions on the basis of sensory evidence accumulated in the cortex. In particular, we present a model that integrates the actor-critic model of reinforcement learning and a model assuming that the cortico-basal-ganglia circuit implements a statistically optimal decision-making procedure. The values of cortico-striatal weights required for optimal decision making in our model differ from those provided by standard reinforcement learning models. Nevertheless, we show that an actor-critic model converges to the weights required for optimal decision making when biologically realistic limits on synaptic weights are introduced. We also describe the model's predictions concerning reaction times and neural responses during learning, and we discuss directions required for further integration of reinforcement learning and optimal decision-making theories.
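Schematically, the learning rule is a standard TD actor-critic update followed by clamping the weights to a nonnegative bounded range, which is the kind of biologically realistic limit the convergence result relies on. A hypothetical single-step sketch (features, bounds, and learning rates are placeholders, not the article's values):

```python
import numpy as np

def actor_critic_step(w_critic, W_actor, phi_s, phi_s_next, r, a,
                      alpha=0.1, gamma_d=0.95, w_max=1.0):
    """One TD actor-critic update with bounded synaptic weights (sketch).

    phi_s, phi_s_next : feature vectors for the current / next state
    a : index of the action taken; r : reward received
    The clipping to [0, w_max] mimics biologically realistic limits on
    cortico-striatal synaptic weights.
    """
    delta = r + gamma_d * w_critic @ phi_s_next - w_critic @ phi_s  # TD error
    w_critic = np.clip(w_critic + alpha * delta * phi_s, 0.0, w_max)
    W_actor[a] = np.clip(W_actor[a] + alpha * delta * phi_s, 0.0, w_max)
    return w_critic, W_actor, delta
```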

Journal ArticleDOI
TL;DR: A simple expression for a lower bound of Fisher information is derived for a network of recurrently connected spiking neurons that have been driven to a noise-perturbed steady state and offers substantial insight into the sources of information degradation across successive layers of a neural network.
Abstract: A simple expression for a lower bound of Fisher information is derived for a network of recurrently connected spiking neurons that have been driven to a noise-perturbed steady state. We call this lower bound linear Fisher information, as it corresponds to the Fisher information that can be recovered by a locally optimal linear estimator. Unlike recent similar calculations, the approach used here includes the effects of nonlinear gain functions and correlated input noise and yields a surprisingly simple and intuitive expression that offers substantial insight into the sources of information degradation across successive layers of a neural network. Here, this expression is used to (1) compute the optimal (i.e., information-maximizing) firing rate of a neuron, (2) demonstrate why sharpening tuning curves by either thresholding or the action of recurrent connectivity is generally a bad idea, (3) show how a single cortical expansion is sufficient to instantiate a redundant population code that can propagate across multiple cortical layers with minimal information loss, and (4) show that optimal recurrent connectivity strongly depends on the covariance structure of the inputs to the network.
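The bound in question, linear Fisher information, has the closed form I_lin(s) = f′(s)ᵀ Σ(s)⁻¹ f′(s), where f′ is the derivative of the mean population response with respect to the stimulus and Σ is the response covariance. In code (example values illustrative):

```python
import numpy as np

def linear_fisher_information(df, Sigma):
    """Linear Fisher information I = f'(s)^T Sigma^{-1} f'(s) (sketch).

    df    : (N,) derivative of the mean population response w.r.t. the stimulus
    Sigma : (N, N) response covariance at the same stimulus value
    """
    return df @ np.linalg.solve(Sigma, df)

# Example: 100 neurons with weakly correlated noise.
rng = np.random.default_rng(0)
df = rng.standard_normal(100)
Sigma = 0.1 * np.ones((100, 100)) + 0.9 * np.eye(100)
print(linear_fisher_information(df, Sigma))
```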

Journal ArticleDOI
TL;DR: It is proved that these P systems with extended rules (several spikes can be produced by a rule) are equivalent to register machines, and if the number of spikes present in the system is bounded, the power of time-free SN P systems falls, and in this case, a characterization of semilinear sets of natural numbers is obtained.
Abstract: Different biological processes take different times to be completed, which can also be influenced by many environmental factors. In this work, a realistic definition of nonsynchronized spiking neural P systems (SN P systems, for short) is considered: during the work of an SN P system, the execution times of spiking rules cannot be known exactly (i.e., they are arbitrary). In order to establish robust systems against the environmental factors, a special class of SN P systems, called time-free SN P systems, is introduced, which always produce the same computation result independent of the execution times of the rules. The universality of time-free SN P systems is investigated. It is proved that these P systems with extended rules (several spikes can be produced by a rule) are equivalent to register machines. However, if the number of spikes present in the system is bounded, then the power of time-free SN P systems falls, and in this case, a characterization of semilinear sets of natural numbers is obtained.

Journal ArticleDOI
TL;DR: The method offers a route toward a high-level task configuration language for neuromorphic VLSI systems and permits seamless integration of software simulations with hardware emulations and intertranslatability between the parameters of abstract neuronal models and their emulation counterparts.
Abstract: An increasing number of research groups are developing custom hybrid analog/digital very large scale integration (VLSI) chips and systems that implement hundreds to thousands of spiking neurons with biophysically realistic dynamics, with the intention of emulating brainlike real-world behavior in hardware and robotic systems rather than simply simulating their performance on general-purpose digital computers. Although the electronic engineering aspects of these emulation systems are proceeding well, progress toward the actual emulation of brainlike tasks is restricted by the lack of suitable high-level configuration methods of the kind that have already been developed over many decades for simulations on general-purpose computers. The key difficulty is that the dynamics of the CMOS electronic analogs are determined by transistor biases that do not map simply to the parameter types and values used in typical abstract mathematical models of neurons and their networks. Here we provide a general method for resolving this difficulty. We describe a parameter mapping technique that permits an automatic configuration of VLSI neural networks so that their electronic emulation conforms to a higher-level neuronal simulation. We show that the neurons configured by our method exhibit spike timing statistics and temporal dynamics that are the same as those observed in the software simulated neurons and, in particular, that the key parameters of recurrent VLSI neural networks (e.g., implementing soft winner-take-all) can be precisely tuned. The proposed method permits seamless integration of software simulations with hardware emulations and intertranslatability between the parameters of abstract neuronal models and their emulation counterparts. Most important, our method offers a route toward a high-level task configuration language for neuromorphic VLSI systems.

Journal ArticleDOI
TL;DR: A framework for estimating state-dependent neural response properties from spike train data is developed and a simple reformulation of the state space of the underlying Markov chain allows a hybrid half-multistate, half-histogram model that may be more appropriate for capturing the complexity of certain data sets than either a simple HMM or a simple peristimulus time histogram model alone.
Abstract: Given recent experimental results suggesting that neural circuits may evolve through multiple firing states, we develop a framework for estimating state-dependent neural response properties from spike train data. We modify the traditional hidden Markov model (HMM) framework to incorporate stimulus-driven, non-Poisson point-process observations. For maximal flexibility, we allow external, time-varying stimuli and the neurons' own spike histories to drive both the spiking behavior in each state and the transitioning behavior between states. We employ an appropriately modified expectation-maximization algorithm to estimate the model parameters. The expectation step is solved by the standard forward-backward algorithm for HMMs. The maximization step reduces to a set of separable concave optimization problems if the model is restricted slightly. We first test our algorithm on simulated data and are able to fully recover the parameters used to generate the data and accurately recapitulate the sequence of hidden states. We then apply our algorithm to a recently published data set in which the observed neuronal ensembles displayed multistate behavior and show that inclusion of spike history information significantly improves the fit of the model. Additionally, we show that a simple reformulation of the state space of the underlying Markov chain allows us to implement a hybrid half-multistate, half-histogram model that may be more appropriate for capturing the complexity of certain data sets than either a simple HMM or a simple peristimulus time histogram model alone.

Journal ArticleDOI
TL;DR: It is shown that LEMs are closely related to slow feature analysis (SFA), a biologically inspired, unsupervised learning algorithm originally designed for learning invariant visual representations, and SFA can be interpreted as a function approximation of LEMs, where the topological neighborhoods required for LEMs are implicitly defined by the temporal structure of the data.
Abstract: The past decade has seen a rise of interest in Laplacian eigenmaps (LEMs) for nonlinear dimensionality reduction. LEMs have been used in spectral clustering, in semisupervised learning, and for providing efficient state representations for reinforcement learning. Here, we show that LEMs are closely related to slow feature analysis (SFA), a biologically inspired, unsupervised learning algorithm originally designed for learning invariant visual representations. We show that SFA can be interpreted as a function approximation of LEMs, where the topological neighborhoods required for LEMs are implicitly defined by the temporal structure of the data. Based on this relation, we propose a generalization of SFA to arbitrary neighborhood relations and demonstrate its applicability for spectral clustering. Finally, we review previous work with the goal of providing a unifying view on SFA and LEMs.
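The SFA side of this relation is easy to state algorithmically: whiten the signal, then take the directions in which the temporal derivative has least variance. A minimal linear SFA sketch, in which temporal adjacency plays the role of the LEM neighborhood graph:

```python
import numpy as np

def linear_sfa(X, n_out=2):
    """Linear slow feature analysis (sketch).

    X : (T, d) time series. Returns the projection onto the n_out slowest
    directions: whiten, then take the eigenvectors of the covariance of
    the temporal differences with the *smallest* eigenvalues.
    """
    Xc = X - X.mean(axis=0)
    # Whitening via an eigendecomposition of the covariance.
    C = np.cov(Xc.T)
    evals, E = np.linalg.eigh(C)
    Wh = E / np.sqrt(evals)          # columns scaled to unit variance
    Z = Xc @ Wh
    dZ = np.diff(Z, axis=0)          # temporal differences
    Cd = np.cov(dZ.T)
    dd, U = np.linalg.eigh(Cd)       # eigh sorts eigenvalues ascending
    W = Wh @ U[:, :n_out]            # slowest directions first
    return Xc @ W, W
```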

Journal ArticleDOI
TL;DR: A general expression for a nonparametric empirical Bayes least squares (NEBLS) estimator is developed, which expresses the optimal least squares estimator in terms of the measurement density, with no explicit reference to the unknown (prior) density.
Abstract: Selection of an optimal estimator typically relies on either supervised training samples (pairs of measurements and their associated true values) or a prior probability model for the true values. Here, we consider the problem of obtaining a least squares estimator given a measurement process with known statistics (i.e., a likelihood function) and a set of unsupervised measurements, each arising from a corresponding true value drawn randomly from an unknown distribution. We develop a general expression for a nonparametric empirical Bayes least squares (NEBLS) estimator, which expresses the optimal least squares estimator in terms of the measurement density, with no explicit reference to the unknown (prior) density. We study the conditions under which such estimators exist and derive specific forms for a variety of different measurement processes. We further show that each of these NEBLS estimators may be used to express the mean squared estimation error as an expectation over the measurement density alone, thus generalizing Stein's unbiased risk estimator (SURE), which provides such an expression for the additive gaussian noise case. This error expression may then be optimized over noisy measurement samples, in the absence of supervised training data, yielding a generalized SURE-optimized parametric least squares (SURE2PLS) estimator. In the special case of a linear parameterization (i.e., a sum of nonlinear kernel functions), the objective function is quadratic, and we derive an incremental form for learning this estimator from data. We also show that combining the NEBLS form with its corresponding generalized SURE expression produces a generalization of the score-matching procedure for parametric density estimation. Finally, we have implemented several examples of such estimators, and we show that their performance is comparable to their optimal Bayesian or supervised regression counterparts for moderate to large amounts of data.

Journal ArticleDOI
TL;DR: A model that extracts object identity, position, and rotation angles is proposed, and its behavior is demonstrated on complex three-dimensional objects under translation and rotation in depth on a homogeneous background.
Abstract: Primates are very good at recognizing objects independent of viewing angle or retinal position, and they outperform existing computer vision systems by far. But invariant object recognition is only one prerequisite for successful interaction with the environment. An animal also needs to assess an object's position and relative rotational angle. We propose here a model that is able to extract object identity, position, and rotation angles. We demonstrate the model behavior on complex three-dimensional objects under translation and rotation in depth on a homogeneous background. A similar model has previously been shown to extract hippocampal spatial codes from quasi-natural videos. The framework for mathematical analysis of this earlier application carries over to the scenario of invariant object recognition. Thus, the simulation results can be explained analytically even for the complex high-dimensional data we employed.

Journal ArticleDOI
TL;DR: This work approximates the regular cortical architecture as many interconnected cooperative-competitive modules so that by properly understanding the behavior of this small computational module, one can reason systematically about the stability and convergence of very large networks composed of these modules.
Abstract: The neocortex has a remarkably uniform neuronal organization, suggesting that common principles of processing are employed throughout its extent. In particular, the patterns of connectivity observed in the superficial layers of the visual cortex are consistent with the recurrent excitation and inhibitory feedback required for cooperative-competitive circuits such as the soft winner-take-all (WTA). WTA circuits offer interesting computational properties such as selective amplification, signal restoration, and decision making. But these properties depend on the signal gain derived from positive feedback, and so there is a critical trade-off between providing feedback strong enough to support the sophisticated computations while maintaining overall circuit stability. The issue of stability is all the more intriguing when one considers that the WTAs are expected to be densely distributed through the superficial layers and that they are at least partially interconnected. We consider how to reason about stability in very large distributed networks of such circuits. We approach this problem by approximating the regular cortical architecture as many interconnected cooperative-competitive modules. We demonstrate that by properly understanding the behavior of this small computational module, one can reason about the stability and convergence of very large networks composed of these modules. We obtain parameter ranges in which the WTA circuit operates in a high-gain regime, is stable, and can be aggregated arbitrarily to form large, stable networks. We use nonlinear contraction theory to establish conditions for stability in the fully nonlinear case and verify these solutions using numerical simulations. The derived bounds allow modes of operation in which the WTA network is multistable and exhibits state-dependent persistent activities. Our approach is sufficiently general to reason systematically about the stability of any network, biological or technological, composed of networks of small modules that express competition through shared inhibition.

Journal ArticleDOI
TL;DR: This work represents input spike trains as point processes, with each input spike eliciting a finite postsynaptic response, and derives several new results that provide intuitive insights into the fundamental mechanisms that modulate the transfer of spiking correlations.
Abstract: Correlations between neuronal spike trains affect network dynamics and population coding. Overlapping afferent populations and correlations between presynaptic spike trains introduce correlations between the inputs to downstream cells. To understand network activity and population coding, it is therefore important to understand how these input correlations are transferred to output correlations. Recent studies have addressed this question in the limit of many inputs with infinitesimal postsynaptic response amplitudes, where the total input can be approximated by gaussian noise. In contrast, we address the problem of correlation transfer by representing input spike trains as point processes, with each input spike eliciting a finite postsynaptic response. This approach allows us to naturally model synaptic noise and recurrent coupling and to treat excitatory and inhibitory inputs separately. We derive several new results that provide intuitive insights into the fundamental mechanisms that modulate the transfer of spiking correlations.

Journal ArticleDOI
TL;DR: A set of algorithms to simulate large spiking neural networks efficiently with high-level languages using vector-based operations is described; these algorithms constitute the core of Brian, a spiking neural network simulator written in the Python language.
Abstract: High-level languages (Matlab, Python) are popular in neuroscience because they are flexible and accelerate development. However, for simulating spiking neural networks, the cost of interpretation is a bottleneck. We describe a set of algorithms to simulate large spiking neural networks efficiently with high-level languages using vector-based operations. These algorithms constitute the core of Brian, a spiking neural network simulator written in the Python language. Vectorized simulation makes it possible to combine the flexibility of high-level languages with the computational efficiency usually associated with compiled languages.
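The essence of the vectorized approach is to keep all neuron state in arrays and advance the whole population with array operations, handling threshold crossings with boolean masks. The toy leaky integrate-and-fire network below illustrates the idea in plain NumPy; it is in the spirit of Brian's algorithms, not Brian's actual code.

```python
import numpy as np

def simulate_lif(W, T=1.0, dt=1e-4, tau=0.02, v_th=1.0, v_reset=0.0, I_ext=1.1):
    """Vectorized leaky integrate-and-fire network (illustrative sketch).

    W : (N, N) synaptic weight matrix; W[i, j] is the jump in neuron i's
    potential when neuron j spikes. The whole population is updated with
    array operations; there is no per-neuron Python loop.
    """
    N = W.shape[0]
    v = np.random.default_rng(0).uniform(0, v_th, N)
    spikes = []
    for step in range(int(T / dt)):
        v += dt / tau * (I_ext - v)           # leaky integration, all neurons at once
        fired = v >= v_th                     # boolean spike mask
        if fired.any():
            v += W[:, fired].sum(axis=1)      # propagate all spikes in one vector op
            v[fired] = v_reset                # reset the neurons that fired
            spikes.append((step * dt, np.flatnonzero(fired)))
    return spikes
```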

Journal ArticleDOI
TL;DR: A theoretical study of a general two-stage computational method that may help to significantly reduce the number of stimuli needed to obtain an accurate mathematical description of nonlinear neural responses.
Abstract: The stimulus-response relationship of many sensory neurons is nonlinear, but fully quantifying this relationship by a complex nonlinear model may require too much data to be experimentally tractable. Here we present a theoretical study of a general two-stage computational method that may help to significantly reduce the number of stimuli needed to obtain an accurate mathematical description of nonlinear neural responses. Our method of active data collection first adaptively generates stimuli that are optimal for estimating the parameters of competing nonlinear models and then uses these estimates to generate stimuli online that are optimal for discriminating these models. We applied our method to simple hierarchical circuit models, including nonlinear networks built on the spatiotemporal or spectral-temporal receptive fields, and confirmed that collecting data using our two-stage adaptive algorithm was far more effective for estimating and comparing competing nonlinear sensory processing models than standard nonadaptive methods using random stimuli.

Journal ArticleDOI
TL;DR: It is demonstrated that theoretical spike-field coherence for a broad class of spiking models depends on the expected rate of spiking activity; intensity field coherence, by contrast, is a rate-independent measure and a candidate on which to base the appropriate statistical inference of spike-field synchrony.
Abstract: The coherence between neural spike trains and local-field potential recordings, called spike-field coherence, is of key importance in many neuroscience studies. In this work, aside from questions of estimator performance, we demonstrate that theoretical spike-field coherence for a broad class of spiking models depends on the expected rate of spiking. This rate dependence confounds the phase locking of spike events to field-potential oscillations with overall neuron activity and is demonstrated analytically, for a large class of stochastic models, and in simulation. Finally, the relationship between the spike-field coherence and the intensity field coherence is detailed analytically. This latter quantity is independent of neuron firing rate and, under commonly found conditions, is proportional to the probability that a neuron spikes at a specific phase of field oscillation. Hence, intensity field coherence is a rate-independent measure and a candidate on which to base the appropriate statistical inference of spike-field synchrony.

Journal ArticleDOI
TL;DR: A tuning example of a fast spiking neuron is presented, which reproduces the frequency-current characteristics of the reference data, as well as the membrane voltage behavior, in order to interconnect neuromimetic chips as neural networks, with specific cellular properties, for future theoretical studies in neuroscience.
Abstract: We propose a new estimation method for the characterization of the Hodgkin-Huxley formalism. This method is an alternative technique to the classical estimation methods associated with voltage clamp measurements. It uses voltage clamp type recordings, but is based on the differential evolution algorithm. The parameters of an ionic channel are estimated simultaneously, such that the usual approximations of classical methods are avoided and all the parameters of the model, including the time constant, can be correctly optimized. In a second step, this new estimation technique is applied to the automated tuning of neuromimetic analog integrated circuits designed by our research group. We present a tuning example of a fast spiking neuron, which reproduces the frequency-current characteristics of the reference data, as well as the membrane voltage behavior. The final goal of this tuning is to interconnect neuromimetic chips as neural networks, with specific cellular properties, for future theoretical studies in neuroscience.

Journal ArticleDOI
TL;DR: A new upper bound for the bias of the k-step CD is derived, which depends on k, the number of variables in the RBM, and the maximum change in energy that can be produced by changing a single variable.
Abstract: Optimization based on k-step contrastive divergence (CD) has become a common way to train restricted Boltzmann machines (RBMs). The k-step CD is a biased estimator of the log-likelihood gradient relying on Gibbs sampling. We derive a new upper bound for this bias. Its magnitude depends on k, the number of variables in the RBM, and the maximum change in energy that can be produced by changing a single variable. The latter reflects the dependence on the absolute values of the RBM parameters. The magnitude of the bias is also affected by the distance in variation between the modeled distribution and the starting distribution of the Gibbs chain.
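For reference, the estimator whose bias is being bounded can be sketched as follows for a bias-free binary RBM: a positive phase at the data, k steps of Gibbs sampling, and a negative phase at the chain's end point (illustrative sketch, not the authors' code):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd_k_gradient(W, v0, k=1, seed=0):
    """k-step contrastive divergence estimate of the log-likelihood
    gradient w.r.t. the weights of a binary RBM (biases omitted)."""
    rng = np.random.default_rng(seed)
    ph0 = sigmoid(v0 @ W)                     # positive phase, E[h | v0]
    v = v0.copy()
    for _ in range(k):                        # k steps of Gibbs sampling
        h = (rng.random(W.shape[1]) < sigmoid(v @ W)).astype(float)
        v = (rng.random(W.shape[0]) < sigmoid(W @ h)).astype(float)
    phk = sigmoid(v @ W)                      # negative phase after k steps
    # The gap between this estimate and the true gradient is the bias the
    # letter bounds in terms of k, the number of units, and max |dE|.
    return np.outer(v0, ph0) - np.outer(v, phk)
```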

Journal ArticleDOI
TL;DR: A penalized likelihood estimator is proposed to address the difficulty of parameter estimation for gaussian mixture models with high dimensionality because of the large number of parameters that need to be estimated.
Abstract: Finite gaussian mixture models are widely used in statistics thanks to their great flexibility. However, parameter estimation for gaussian mixture models with high dimensionality can be challenging because of the large number of parameters that need to be estimated. In this letter, we propose a penalized likelihood estimator to address this difficulty. The l1-type penalty we impose on the inverse covariance matrices encourages sparsity on its entries and therefore helps to reduce the effective dimensionality of the problem. We show that the proposed estimate can be efficiently computed using an expectation-maximization algorithm. To illustrate the practical merits of the proposed method, we consider its applications in model-based clustering and mixture discriminant analysis. Numerical experiments with both simulated and real data show that the new method is a valuable tool for high-dimensional data analysis.
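A schematic EM loop of this kind alternates a standard E-step with an M-step in which each component's weighted covariance is passed through a graphical lasso, producing sparse precision matrices. The sketch below leans on scipy and scikit-learn building blocks and is an illustration of the idea, not the authors' estimator.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.covariance import graphical_lasso

def penalized_gmm_em(X, K=2, alpha=0.1, n_iter=25, seed=0):
    """EM for a gaussian mixture with l1-penalized precisions (sketch)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(n, K, replace=False)].astype(float)
    Sig = np.array([np.cov(X.T) + 0.1 * np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: component responsibilities for every sample.
        R = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mu[k], Sig[k]) for k in range(K)
        ])
        R /= R.sum(axis=1, keepdims=True)
        # M-step: weighted means, then a graphical lasso on each covariance
        # so the estimated precision matrices come out sparse.
        Nk = R.sum(axis=0)
        pi = Nk / n
        for k in range(K):
            mu[k] = R[:, k] @ X / Nk[k]
            Xc = X - mu[k]
            emp = (R[:, k, None] * Xc).T @ Xc / Nk[k]
            Sig[k], _ = graphical_lasso(emp, alpha=alpha)
    return pi, mu, Sig
```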

Journal ArticleDOI
TL;DR: Improved spike train measures can be successfully used for fitting neuron models to experimental data, for comparing spike trains, and for classifying spike train data; it is demonstrated that when similarity measures are used for fitting mathematical models, all previous methods systematically underestimate the noise.
Abstract: Multiple measures have been developed to quantify the similarity between two spike trains. These measures have been used for the quantification of the mismatch between neuron models and experiments as well as for the classification of neuronal responses in neuroprosthetic devices and electrophysiological experiments. Frequently only a few spike trains are available in each class. We derive analytical expressions for the small-sample bias present when comparing estimators of the time-dependent firing intensity. We then exploit analogies between the comparison of firing intensities and previously used spike train metrics and show that improved spike train measures can be successfully used for fitting neuron models to experimental data, for comparing spike trains, and for classifying spike train data. In classification tasks, the improved similarity measures can increase the recovered information. We demonstrate that when similarity measures are used for fitting mathematical models, all previous methods systematically underestimate the noise. Finally, we show a striking implication of this deterministic bias by reevaluating the results of the single-neuron prediction challenge.

Journal ArticleDOI
TL;DR: In this article, a computational model that highlights the role of the basal ganglia (BG) in generating simple reaching movements is presented; the model is cast within the reinforcement learning (RL) framework, with correspondence between RL components and neuroanatomy.
Abstract: We present a computational model that highlights the role of the basal ganglia (BG) in generating simple reaching movements. The model is cast within the reinforcement learning (RL) framework with correspondence between RL components and neuroanatomy as follows: dopamine signal of substantia nigra pars compacta as the temporal difference error, striatum as the substrate for the critic, and the motor cortex as the actor. A key feature of this neurobiological interpretation is our hypothesis that the indirect pathway is the explorer. Chaotic activity, originating from the indirect pathway part of the model, drives the wandering, exploratory movements of the arm. Thus, the direct pathway subserves exploitation, while the indirect pathway subserves exploration. The motor cortex becomes more and more independent of the corrective influence of BG as training progresses. Reaching trajectories show diminishing variability with training. Reaching movements associated with Parkinson's disease (PD) are simulated by reducing dopamine and degrading the complexity of indirect pathway dynamics by switching it from chaotic to periodic behavior. Under the simulated PD conditions, the arm exhibits PD motor symptoms like tremor, bradykinesia, and undershooting. The model echoes the notion that PD is a dynamical disease.