
Showing papers in "Neural Computation in 2010"


Journal ArticleDOI
TL;DR: Good old online backpropagation for plain multilayer perceptrons yields a very low 0.35% error rate on the MNIST handwritten digits benchmark.
Abstract: Good old online backpropagation for plain multilayer perceptrons yields a very low 0.35% error rate on the MNIST handwritten digits benchmark. All we need to achieve this best result so far are many hidden layers, many neurons per layer, numerous deformed training images to avoid overfitting, and graphics cards to greatly speed up learning.
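For readers who want to see the recipe in code, the following is a minimal numpy sketch of a plain deep multilayer perceptron trained by online backpropagation. The layer widths, learning rate, and stand-in random data are illustrative assumptions; the paper's elastic image deformations and GPU acceleration are omitted.

```python
# Minimal sketch of online backpropagation for a plain deep MLP.
# Toy random data stands in for MNIST; image deformations and GPU training
# from the paper are omitted, and all sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
sizes = [784, 500, 300, 10]                  # illustrative layer widths
W = [rng.normal(0, 0.05, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    acts = [x]
    for i, (Wi, bi) in enumerate(zip(W, b)):
        z = acts[-1] @ Wi + bi
        if i < len(W) - 1:
            acts.append(np.tanh(z))
        else:                                # softmax output layer
            e = np.exp(z - z.max())
            acts.append(e / e.sum())
    return acts

def online_step(x, y, lr=0.01):
    acts = forward(x)
    delta = acts[-1].copy()
    delta[y] -= 1.0                          # cross-entropy gradient at the output
    for i in reversed(range(len(W))):
        gW, gb = np.outer(acts[i], delta), delta
        if i > 0:
            delta = (W[i] @ delta) * (1 - acts[i] ** 2)   # backprop through tanh
        W[i] -= lr * gW
        b[i] -= lr * gb

for _ in range(1000):                        # one pass of online updates
    x, y = rng.random(784), int(rng.integers(10))
    online_step(x, y)
print("done; output probabilities sum to", forward(rng.random(784))[-1].sum())
```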

1,016 citations


Journal ArticleDOI
TL;DR: A model of supervised learning for biologically plausible neurons is presented that enables spiking neurons to reproduce arbitrary template spike patterns in response to given synaptic stimuli even in the presence of various sources of noise and shows that the learning rule can also be used for decision-making tasks.
Abstract: Learning from instructions or demonstrations is a fundamental property of our brain necessary to acquire new knowledge and develop novel skills or behavioral patterns. This type of learning is thought to be involved in most of our daily routines. Although the concept of instruction-based learning has been studied for several decades, the exact neural mechanisms implementing this process remain unrevealed. One of the central questions in this regard is, How do neurons learn to reproduce template signals (instructions) encoded in precisely timed sequences of spikes? Here we present a model of supervised learning for biologically plausible neurons that addresses this question. In a set of experiments, we demonstrate that our approach enables us to train spiking neurons to reproduce arbitrary template spike patterns in response to given synaptic stimuli even in the presence of various sources of noise. We show that the learning rule can also be used for decision-making tasks. Neurons can be trained to classify categories of input signals based on only a temporal configuration of spikes. The decision is communicated by emitting precisely timed spike trains associated with given input categories. Trained neurons can perform the classification task correctly even if stimuli and corresponding decision times are temporally separated and the relevant information is consequently highly overlapped by the ongoing neural activity. Finally, we demonstrate that neurons can be trained to reproduce sequences of spikes with a controllable time shift with respect to target templates. A reproduced signal can follow or even precede the targets. This surprising result points out that spiking neurons can potentially be applied to forecast the behavior (firing times) of other reference neurons or networks.
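The abstract does not spell out the learning rule, so the following numpy sketch uses a ReSuMe-style update as a stand-in: weights change whenever a desired or actual output spike occurs, in proportion to a presynaptic eligibility trace plus a small non-associative term. The leaky integrate-and-fire neuron and all parameter values are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of a ReSuMe-style supervised rule nudging an LIF neuron toward a
# target spike train; all constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
T, dt, n_in = 500, 1.0, 100                   # time steps (ms), step size, synapses
inputs = rng.random((T, n_in)) < 0.02         # fixed Poisson-like presynaptic spikes
target = np.zeros(T, bool); target[[100, 250, 400]] = True
w = rng.random(n_in) * 0.3
lr, a, tau_trace, tau_m, v_th = 0.01, 0.05, 10.0, 10.0, 1.0

for epoch in range(200):
    v, trace = 0.0, np.zeros(n_in)
    out = np.zeros(T, bool)
    for t in range(T):
        trace += (-trace / tau_trace) * dt + inputs[t]   # presynaptic eligibility trace
        v += (-v / tau_m) * dt + w @ inputs[t]           # leaky integrate-and-fire
        if v >= v_th:
            out[t], v = True, 0.0
        err = float(target[t]) - float(out[t])           # +1: missed spike, -1: extra spike
        w += lr * err * (a + trace)                      # ReSuMe-style weight update

print("target spike times:", np.flatnonzero(target))
print("output spike times:", np.flatnonzero(out))
```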

526 citations


Journal ArticleDOI
TL;DR: This work presents a machine learning approach to computing an affinity graph using a convolutional network (CN) trained using ground truth provided by human experts and shows that the CN affinity graph can be paired with any standard partitioning algorithm and improves segmentation accuracy significantly compared to standard hand-designed affinity functions.
Abstract: Many image segmentation algorithms first generate an affinity graph and then partition it. We present a machine learning approach to computing an affinity graph using a convolutional network (CN) trained using ground truth provided by human experts. The CN affinity graph can be paired with any standard partitioning algorithm and improves segmentation accuracy significantly compared to standard hand-designed affinity functions. We apply our algorithm to the challenging 3D segmentation problem of reconstructing neuronal processes from volumetric electron microscopy (EM) and show that we are able to learn a good affinity graph directly from the raw EM images. Further, we show that our affinity graph improves the segmentation accuracy of both simple and sophisticated graph partitioning algorithms. In contrast to previous work, we do not rely on prior knowledge in the form of hand-designed image features or image preprocessing. Thus, we expect our algorithm to generalize effectively to arbitrary image types.

404 citations


Journal ArticleDOI
TL;DR: A low-rank approximation to the three-way interaction tensor, expressed as a sum of factors that are each a three-way outer product, allows efficient learning of transformations between larger image patches; the learning of optimal filter pairs is demonstrated on various synthetic and real image sequences.
Abstract: To allow the hidden units of a restricted Boltzmann machine to model the transformation between two successive images, Memisevic and Hinton (2007) introduced three-way multiplicative interactions that use the intensity of a pixel in the first image as a multiplicative gain on a learned, symmetric weight between a pixel in the second image and a hidden unit. This creates cubically many parameters, which form a three-dimensional interaction tensor. We describe a low-rank approximation to this interaction tensor that uses a sum of factors, each of which is a three-way outer product. This approximation allows efficient learning of transformations between larger image patches. Since each factor can be viewed as an image filter, the model as a whole learns optimal filter pairs for efficiently representing transformations. We demonstrate the learning of optimal filter pairs from various synthetic and real image sequences. We also show how learning about image transformations allows the model to perform a simple visual analogy task, and we show how a completely unsupervised network trained on transformations perceives multiple motions of transparent dot patterns in the same way as humans.

263 citations


Journal ArticleDOI
TL;DR: This work analyzes operating regimes in the Willshaw model in which structural plasticity can compress the network structure and push performance to the theoretical benchmark and introduces fair measures for information-theoretic capacity in associative memory that also provide a theoretical benchmark.
Abstract: Neural associative networks with plastic synapses have been proposed as computational models of brain functions and also for applications such as pattern recognition and information retrieval. To guide biological models and optimize technical applications, several definitions of memory capacity have been used to measure the efficiency of associative memory. Here we explain why the currently used performance measures bias the comparison between models and cannot serve as a theoretical benchmark. We introduce fair measures for information-theoretic capacity in associative memory that also provide a theoretical benchmark. In neural networks, two types of manipulating synapses can be discerned: synaptic plasticity, the change in strength of existing synapses, and structural plasticity, the creation and pruning of synapses. One of the new types of memory capacity we introduce permits quantifying how structural plasticity can increase the network efficiency by compressing the network structure, for example, by pruning unused synapses. Specifically, we analyze operating regimes in the Willshaw model in which structural plasticity can compress the network structure and push performance to the theoretical benchmark. The amount C of information stored in each synapse can scale with the logarithm of the network size rather than being constant, as in classical Willshaw and Hopfield nets (C ≤ ln 2 ≈ 0.7). Further, the review contains novel technical material: a capacity analysis of the Willshaw model that rigorously controls for the level of retrieval quality, an analysis for memories with a nonconstant number of active units (where C ≤ 1/(e ln 2) ≈ 0.53), and the analysis of the computational complexity of associative memories with and without network compression.
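As context for the capacity discussion, here is a minimal numpy sketch of the classical Willshaw net with binary clipped synapses (autoassociative variant): patterns are stored by OR-ing outer products and recalled by thresholding on the number of active cue units. Sizes and sparseness are illustrative; the paper's capacity measures and structural compression are not reproduced.

```python
# Classical Willshaw associative memory with clipped (0/1) synapses.
import numpy as np

rng = np.random.default_rng(2)
n, k, M = 1000, 10, 500                      # units, active units per pattern, patterns
patterns = np.zeros((M, n), dtype=np.uint8)
for mu in range(M):
    patterns[mu, rng.choice(n, k, replace=False)] = 1

W = np.zeros((n, n), dtype=np.uint8)
for p in patterns:                           # Hebbian storage, clipped at 1
    W |= np.outer(p, p)

def recall(cue):
    dendritic = W @ cue
    return (dendritic >= cue.sum()).astype(np.uint8)   # Willshaw threshold rule

errors = sum(np.sum(recall(p) != p) for p in patterns)
print(f"fraction of potentiated synapses: {W.mean():.2f}")
print(f"total retrieval errors over all patterns: {errors}")
```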

181 citations


Journal ArticleDOI
TL;DR: It is proved that deep but narrow feedforward neural networks with sigmoidal units can represent any Boolean expression.
Abstract: Deep belief networks (DBN) are generative models with many layers of hidden causal variables, recently introduced by Hinton, Osindero, and Teh (2006), along with a greedy layer-wise unsupervised learning algorithm. Building on Le Roux and Bengio (2008) and Sutskever and Hinton (2008), we show that deep but narrow generative networks do not require more parameters than shallow ones to achieve universal approximation. Exploiting the proof technique, we prove that deep but narrow feedforward neural networks with sigmoidal units can represent any Boolean expression.

170 citations


Journal ArticleDOI
TL;DR: This letter shows that when geometry is introduced to evolved ANNs through the hypercube-based neuroevolution of augmenting topologies algorithm, they begin to acquire characteristics that indeed are reminiscent of biological brains.
Abstract: Looking to nature as inspiration, for at least the past 25 years, researchers in the field of neuroevolution (NE) have developed evolutionary algorithms designed specifically to evolve artificial neural networks (ANNs). Yet the ANNs evolved through NE algorithms lack the distinctive characteristics of biological brains, perhaps explaining why NE is not yet a mainstream subject of neural computation. Motivated by this gap, this letter shows that when geometry is introduced to evolved ANNs through the hypercube-based neuroevolution of augmenting topologies algorithm, they begin to acquire characteristics that indeed are reminiscent of biological brains. That is, if the neurons in evolved ANNs are situated at locations in space (i.e., if they are given coordinates), then, as experiments in evolving checkers-playing ANNs in this letter show, topographic maps with symmetries and regularities can evolve spontaneously. The ability to evolve such maps is shown in this letter to provide an important advantage in generalization. In fact, the evolved maps are sufficiently informative that their analysis yields the novel insight that the geometry of the connectivity patterns of more general players is significantly smoother and more contiguous than less general ones. Thus, the results reveal a correlation between generality and smoothness in connectivity patterns. They also hint at the intriguing possibility that as NE matures as a field, its algorithms can evolve ANNs of increasing relevance to those who study neural computation in general.

166 citations


Journal ArticleDOI
TL;DR: Investigating the influence of the network connectivity (parameterized by the neuron in-degree) on a family of network models that interpolates between analog and binary networks reveals that the phase transition between ordered and chaotic network behavior of binary circuits qualitatively differs from the one in analog circuits, leading to decreased computational performance observed in binary circuits that are densely connected.
Abstract: Reservoir computing (RC) systems are powerful models for online computations on input sequences. They consist of a memoryless readout neuron that is trained on top of a randomly connected recurrent neural network. RC systems are commonly used in two flavors: with analog or binary (spiking) neurons in the recurrent circuits. Previous work indicated a fundamental difference in the behavior of these two implementations of the RC idea. The performance of an RC system built from binary neurons seems to depend strongly on the network connectivity structure. In networks of analog neurons, such clear dependency has not been observed. In this letter, we address this apparent dichotomy by investigating the influence of the network connectivity (parameterized by the neuron in-degree) on a family of network models that interpolates between analog and binary networks. Our analyses are based on a novel estimation of the Lyapunov exponent of the network dynamics with the help of branching process theory, rank measures that estimate the kernel quality and generalization capabilities of recurrent networks, and a novel mean field predictor for computational performance. These analyses reveal that the phase transition between ordered and chaotic network behavior of binary circuits qualitatively differs from the one in analog circuits, leading to differences in the integration of information over short and long timescales. This explains the decreased computational performance observed in binary circuits that are densely connected. The mean field predictor is also used to bound the memory function of recurrent circuits of binary neurons.
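A rough feel for the analog-versus-binary comparison can be had from the following numpy sketch: a random recurrent network with fixed in-degree K drives either tanh or sign units, and a ridge-regression readout is trained to recall the input from a few steps back. The network size, gain, and memory task are illustrative assumptions; the letter's Lyapunov estimates, rank measures, and mean field predictor are not reproduced.

```python
# Sketch: analog (tanh) vs binary (sign) reservoir with fixed in-degree K,
# evaluated on a simple delayed-recall task with a ridge-regression readout.
import numpy as np

rng = np.random.default_rng(3)
N, K, T, delay = 200, 10, 2000, 5

def make_W(gain=1.0):
    W = np.zeros((N, N))
    for i in range(N):                       # exactly K incoming connections per neuron
        idx = rng.choice(N, K, replace=False)
        W[i, idx] = rng.normal(0, gain / np.sqrt(K), K)
    return W

def run(binary):
    W, w_in = make_W(), rng.normal(0, 1.0, N)
    u = rng.choice([-1.0, 1.0], T)           # random binary input stream
    x, X = np.zeros(N), np.zeros((T, N))
    f = np.sign if binary else np.tanh
    for t in range(T):
        x = f(W @ x + w_in * u[t])
        X[t] = x
    Xd, y = X[delay:], u[:-delay]            # target: the input delay steps back
    w_out = np.linalg.solve(Xd.T @ Xd + 1e-2 * np.eye(N), Xd.T @ y)
    return np.corrcoef(Xd @ w_out, y)[0, 1]

print("analog reservoir recall corr:", round(run(False), 3))
print("binary reservoir recall corr:", round(run(True), 3))
```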

161 citations


Journal ArticleDOI
TL;DR: In this article, a generalized activity model for neural networks is proposed, which includes higher-order statistics like correlations between firing, and it is shown in an example of an all-to-all connected network how their system of generalized activity equations captures phenomena missed by the mean field rate equations alone.
Abstract: Population rate or activity equations are the foundation of a common approach to modeling for neural networks. These equations provide mean field dynamics for the firing rate or activity of neurons within a network given some connectivity. The shortcoming of these equations is that they take into account only the average firing rate, while leaving out higher-order statistics like correlations between firing. A stochastic theory of neural networks that includes statistics at all orders was recently formulated. We describe how this theory yields a systematic extension to population rate equations by introducing equations for correlations and appropriate coupling terms. Each level of the approximation yields closed equations; they depend only on the mean and specific correlations of interest, without an ad hoc criterion for doing so. We show in an example of an all-to-all connected network how our system of generalized activity equations captures phenomena missed by the mean field rate equations alone.

156 citations


Journal ArticleDOI
TL;DR: This letter shows that the approach can effectively identify change-points in both toy and real data sets with complex hazard rates and how it can be used as an ideal-observer model for human and animal behavior when faced with rapidly changing inputs.
Abstract: Change-point models are generative models of time-varying data in which the underlying generative parameters undergo discontinuous changes at different points in time known as change points. Change-points often represent important events in the underlying processes, like a change in brain state reflected in EEG data or a change in the value of a company reflected in its stock price. However, change-points can be difficult to identify in noisy data streams. Previous attempts to identify change-points online using Bayesian inference relied on specifying in advance the rate at which they occur, called the hazard rate (h). This approach leads to predictions that can depend strongly on the choice of h and is unable to deal optimally with systems in which h is not constant in time. In this letter, we overcome these limitations by developing a hierarchical extension to earlier models. This approach allows h itself to be inferred from the data, which in turn helps to identify when change-points occur. We show that our approach can effectively identify change-points in both toy and real data sets with complex hazard rates and how it can be used as an ideal-observer model for human and animal behavior when faced with rapidly changing inputs.
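For orientation, the sketch below implements the standard online change-point recursion with a fixed, known hazard rate h for Gaussian observations, which is the baseline the letter improves on. The hierarchical layer that infers h itself is not implemented here, and the observation model and parameters are illustrative assumptions.

```python
# Online Bayesian change-point detection with a fixed hazard rate h
# (run-length posterior recursion for Gaussian data with known variance).
import numpy as np

def bocpd(x, h=0.01, mu0=0.0, kappa0=1.0, sigma2=1.0):
    T = len(x)
    R = np.zeros((T + 1, T + 1)); R[0, 0] = 1.0          # run-length posterior
    mu, kappa = np.array([mu0]), np.array([kappa0])      # per-run-length statistics
    for t in range(T):
        pred_var = sigma2 * (1.0 + 1.0 / kappa)          # predictive variance per run length
        pred = np.exp(-0.5 * (x[t] - mu) ** 2 / pred_var) / np.sqrt(2 * np.pi * pred_var)
        R[t + 1, 1: t + 2] = R[t, : t + 1] * pred * (1 - h)   # run grows
        R[t + 1, 0] = np.sum(R[t, : t + 1] * pred * h)        # change point
        R[t + 1] /= R[t + 1].sum()
        mu = np.concatenate(([mu0], (kappa * mu + x[t]) / (kappa + 1)))
        kappa = np.concatenate(([kappa0], kappa + 1))
    return R

rng = np.random.default_rng(4)
data = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
R = bocpd(data)
print("most likely run length at t = 150:", R[150].argmax())   # expect about 50
```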

153 citations


Journal ArticleDOI
TL;DR: A utility function based on mutual information is used, three intuitive interpretations of the utility function are given in terms of Bayesian posterior estimates, and a simple application to an experiment on memory retention is offered as a proof of concept.
Abstract: Discriminating among competing statistical models is a pressing issue for many experimentalists in the field of cognitive science. Resolving this issue begins with designing maximally informative experiments. To this end, the problem to be solved in adaptive design optimization is identifying experimental designs under which one can infer the underlying model in the fewest possible steps. When the models under consideration are nonlinear, as is often the case in cognitive science, this problem can be impossible to solve analytically without simplifying assumptions. However, as we show in this letter, a full solution can be found numerically with the help of a Bayesian computational trick derived from the statistics literature, which recasts the problem as a probability density simulation in which the optimal design is the mode of the density. We use a utility function based on mutual information and give three intuitive interpretations of the utility function in terms of Bayesian posterior estimates. As a proof of concept, we offer a simple example application to an experiment on memory retention.
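The mutual-information utility can be estimated by brute force, as in the following sketch that discriminates two toy memory-retention models (exponential versus power forgetting) over a grid of candidate retention intervals. The paper's density-simulation trick for locating the optimal design is not reproduced; the models, priors, and Monte Carlo sizes are illustrative assumptions.

```python
# Monte Carlo estimate of the mutual-information design utility for
# discriminating two retention models; everything here is illustrative.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(5)
n_trials = 20
designs = np.arange(1, 41, 2)                # candidate retention intervals t

def p_recall(model, theta, t):
    a, b = theta
    p = a * np.exp(-b * t) if model == 0 else a * (1 + t) ** (-b)
    return np.clip(p, 1e-6, 1 - 1e-6)

def utility(t, n_outer=500, n_inner=200):
    # estimate I(model identity ; recall count) at retention interval t
    total = 0.0
    for _ in range(n_outer):
        m = rng.integers(2)
        theta = (rng.uniform(0.5, 1.0), rng.uniform(0.05, 0.5))
        y = rng.binomial(n_trials, p_recall(m, theta, t))
        marg = []
        for mm in (0, 1):                    # crude marginal likelihood per model
            a_s = rng.uniform(0.5, 1.0, n_inner)
            b_s = rng.uniform(0.05, 0.5, n_inner)
            marg.append(np.mean(binom.pmf(y, n_trials, p_recall(mm, (a_s, b_s), t))))
        total += np.log(marg[m] / (marg[0] + marg[1])) - np.log(0.5)
    return total / n_outer

best = max(designs, key=utility)
print("most informative retention interval:", best)
```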

Journal ArticleDOI
TL;DR: It is proved that integers suffice for computing all Turing computable sets of numbers in both the generative and the accepting modes, and a characterization of the family of semilinear sets of numbers is obtained.
Abstract: A variant of spiking neural P systems with positive or negative weights on synapses is introduced, where the rules of a neuron fire when the potential of that neuron equals a given value. The involved values (weights, firing thresholds, potential consumed by each rule) can be real (computable) numbers, rational numbers, integers, and natural numbers. The power of the obtained systems is investigated. For instance, it is proved that integers (very restricted: 1, -1 for weights, 1 and 2 for firing thresholds, and as parameters in the rules) suffice for computing all Turing computable sets of numbers in both the generative and the accepting modes. When only natural numbers are used, a characterization of the family of semilinear sets of numbers is obtained. It is shown that spiking neural P systems with weights can efficiently solve computationally hard problems in a nondeterministic way. Some open problems and suggestions for further research are formulated.

Journal ArticleDOI
TL;DR: Relational topographic maps are introduced as an extension of relational clustering algorithms, which offer prototype-based representations of dissimilarity data, to incorporate neighborhood structure and are equivalent to the standard techniques if a Euclidean embedding exists, while preventing the need to explicitly compute such an embedding.
Abstract: Topographic maps such as the self-organizing map (SOM) or neural gas (NG) constitute powerful data mining techniques that allow simultaneously clustering data and inferring their topological structure, such that additional features, for example, browsing, become available. Both methods have been introduced for vectorial data sets; they require a classical feature encoding of information. Often data are available in the form of pairwise distances only, such as arise from a kernel matrix, a graph, or some general dissimilarity measure. In such cases, NG and SOM cannot be applied directly. In this article, we introduce relational topographic maps as an extension of relational clustering algorithms, which offer prototype-based representations of dissimilarity data, to incorporate neighborhood structure. These methods are equivalent to the standard (vectorial) techniques if a Euclidean embedding exists, while preventing the need to explicitly compute such an embedding. Extending these techniques for the general case of non-Euclidean dissimilarities makes possible an interpretation of relational clustering as clustering in pseudo-Euclidean space. We compare the methods to well-known clustering methods for proximity data based on deterministic annealing and discuss how far convergence can be guaranteed in the general case. Relational clustering is quadratic in the number of data points, which makes the algorithms infeasible for huge data sets. We propose an approximate patch version of relational clustering that runs in linear time. The effectiveness of the methods is demonstrated in a number of examples.
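A compact numpy sketch of batch relational neural gas conveys the main idea: prototypes are kept implicitly as convex combinations of the data points, so every update uses only the pairwise squared-dissimilarity matrix. The annealing schedule and toy data are illustrative assumptions, not the article's exact algorithm.

```python
# Batch relational neural gas: prototypes are coefficient vectors alpha over the
# data, and distances are computed from the dissimilarity matrix D alone.
import numpy as np

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(c, 0.3, (50, 2)) for c in ((0, 0), (3, 0), (0, 3))])
n = len(X)
D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)   # pretend only D is given

K, epochs = 3, 30
alpha = rng.dirichlet(np.ones(n), size=K)                  # prototype coefficients

for epoch in range(epochs):
    lam = 2.0 * (0.1 / 2.0) ** (epoch / (epochs - 1))      # annealed neighborhood range
    # squared "distance" of every data point to every relational prototype:
    # d2[j, i] = (alpha_j D)_i - 0.5 * alpha_j D alpha_j^T
    d2 = alpha @ D - 0.5 * np.sum(alpha * (alpha @ D), axis=1)[:, None]
    ranks = np.argsort(np.argsort(d2, axis=0), axis=0)     # rank of each prototype per point
    h = np.exp(-ranks / lam)
    alpha = h / h.sum(axis=1, keepdims=True)               # batch coefficient update

d2 = alpha @ D - 0.5 * np.sum(alpha * (alpha @ D), axis=1)[:, None]
print("cluster sizes:", np.bincount(np.argmin(d2, axis=0)))
```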

Journal ArticleDOI
TL;DR: It is shown that by using the hierarchical generative model, this letter can obtain good-quality reconstructions of visual images of handwritten digits presented during an fMRI scanning session.
Abstract: Recent research has shown that reconstruction of perceived images based on hemodynamic response as measured with functional magnetic resonance imaging (fMRI) is starting to become feasible. In this letter, we explore reconstruction based on a learned hierarchy of features by employing a hierarchical generative model that consists of conditional restricted Boltzmann machines. In an unsupervised phase, we learn a hierarchy of features from data, and in a supervised phase, we learn how brain activity predicts the states of those features. Reconstruction is achieved by sampling from the model, conditioned on brain activity. We show that by using the hierarchical generative model, we can obtain good-quality reconstructions of visual images of handwritten digits presented during an fMRI scanning session.

Journal ArticleDOI
TL;DR: It is demonstrated that DBNs can infer the underlying nonlinear and time-varying causal interactions between these neurons and can discriminate between mono- and polysynaptic links between them under certain constraints governing their putative connectivity.
Abstract: Coordination among cortical neurons is believed to be a key element in mediating many high-level cortical processes such as perception, attention, learning, and memory formation. Inferring the structure of the neural circuitry underlying this coordination is important to characterize the highly nonlinear, time-varying interactions between cortical neurons in the presence of complex stimuli. In this work, we investigate the applicability of dynamic Bayesian networks (DBNs) in inferring the effective connectivity between spiking cortical neurons from their observed spike trains. We demonstrate that DBNs can infer the underlying nonlinear and time-varying causal interactions between these neurons and can discriminate between mono- and polysynaptic links between them under certain constraints governing their putative connectivity. We analyzed conditionally Poisson spike train data mimicking spiking activity of cortical networks of small and moderately large size. The performance was assessed and compared to other methods under systematic variations of the network structure to mimic a wide range of responses typically observed in the cortex. Results demonstrate the utility of DBN in inferring the effective connectivity in cortical networks.

Journal ArticleDOI
TL;DR: A broader class of models consisting of two partially correlated neuronal integrators with arbitrarily time-varying decision boundaries that allow a natural description of confidence is introduced.
Abstract: Diffusion models have become essential for describing the performance and statistics of reaction times in human decision making. Despite their success, it is not known how to evaluate decision confidence from them. I introduce a broader class of models consisting of two partially correlated neuronal integrators with arbitrarily time-varying decision boundaries that allow a natural description of confidence. The dependence of decision confidence on the state of the losing integrator, decision time, time-varying boundaries, and correlations is analytically described. The marginal confidence is computed for the half-anticorrelated case using the exact solution of the diffusion process with constant boundaries and compared to that of the independent and completely anticorrelated cases.
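The following simulation sketch illustrates the idea rather than the paper's analytic results: two (anti)correlated accumulators race to a bound, and confidence is read from the state of the losing integrator at decision time. The confidence mapping and all parameters are illustrative assumptions.

```python
# Race between two correlated diffusing integrators; confidence is taken from
# the losing integrator's state at the moment the winner hits the bound.
import numpy as np

rng = np.random.default_rng(6)

def trial(drift=0.1, rho=-0.5, bound=1.0, dt=0.01, sigma=1.0):
    cov = sigma ** 2 * dt * np.array([[1, rho], [rho, 1]])
    L = np.linalg.cholesky(cov)
    x, t = np.zeros(2), 0.0
    while x.max() < bound:
        x += np.array([drift, -drift]) * dt + L @ rng.normal(size=2)
        t += dt
    winner = int(np.argmax(x))
    confidence = bound - x[1 - winner]       # larger gap to the loser -> higher confidence
    return winner, t, confidence

results = [trial() for _ in range(1000)]
correct = np.array([w == 0 for w, _, _ in results])
conf = np.array([c for _, _, c in results])
print("accuracy:", correct.mean())
print("mean confidence | correct:", conf[correct].mean())
print("mean confidence | error  :", conf[~correct].mean())
```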

Journal ArticleDOI
TL;DR: This study investigates the tracking dynamics of continuous attractor neural networks (CANNs) by using the wave functions of the quantum harmonic oscillator as the basis and develops a perturbation approach that utilizes the dominating movement of the network's stationary states in the state space.
Abstract: Understanding how the dynamics of a neural network is shaped by the network structure and, consequently, how the network structure facilitates the functions implemented by the neural system is at the core of using mathematical models to elucidate brain functions. This study investigates the tracking dynamics of continuous attractor neural networks (CANNs). Due to the translational invariance of neuronal recurrent interactions, CANNs can hold a continuous family of stationary states. They form a continuous manifold in which the neural system is neutrally stable. We systematically explore how this property facilitates the tracking performance of a CANN, which is believed to have clear correspondence with brain functions. By using the wave functions of the quantum harmonic oscillator as the basis, we demonstrate how the dynamics of a CANN is decomposed into different motion modes, corresponding to distortions in the amplitude, position, width, or skewness of the network state. We then develop a perturbation approach that utilizes the dominating movement of the network's stationary states in the state space. This method allows us to approximate the network dynamics up to an arbitrary accuracy depending on the order of perturbation used. We quantify the distortions of a gaussian bump during tracking and study their effects on tracking performance. Results are obtained on the maximum speed for a moving stimulus to be trackable and the reaction time for the network to catch up with an abrupt change in the stimulus.
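A minimal numpy sketch of a 1D continuous attractor network tracking a slowly moving stimulus is given below, with translation-invariant Gaussian coupling and divisive global inhibition. The rate dynamics follow a standard CANN formulation, but the parameter values are illustrative assumptions and none of the perturbative analysis is reproduced.

```python
# 1D continuous attractor network (ring) tracking a slowly drifting stimulus.
import numpy as np

N, a, k, tau, dt = 200, 0.5, 1.0, 1.0, 0.1
x = np.linspace(-np.pi, np.pi, N, endpoint=False)
d = np.abs(x[:, None] - x[None, :])
d = np.minimum(d, 2 * np.pi - d)                       # periodic distance
J = np.exp(-d ** 2 / (2 * a ** 2)) / (np.sqrt(2 * np.pi) * a)

def dist(y):                                           # periodic distance to zero
    return np.minimum(np.abs(y), 2 * np.pi - np.abs(y))

u = np.exp(-x ** 2 / (2 * a ** 2))                     # initial bump at 0
for t in range(3000):
    stim = 0.001 * t                                   # slowly moving stimulus position
    r = np.maximum(u, 0) ** 2
    r /= 1.0 + k * r.sum()                             # divisive global inhibition
    I_ext = 0.5 * np.exp(-dist(x - stim) ** 2 / (4 * a ** 2))
    u += dt / tau * (-u + J @ r + I_ext)

print("final stimulus position:", round(0.001 * 2999, 3))
print("final bump peak position:", round(float(x[np.argmax(u)]), 3))
```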

Journal ArticleDOI
John Hertz1
TL;DR: Examination of the correlation functions of synaptic currents reveals that after these bursts of firing, balance is restored within a few milliseconds by a rapid increase in inhibitory synaptic conductance.
Abstract: Neuronal firing correlations are studied using simulations of a simple network model for a cortical column in a high-conductance state with dynamically balanced excitation and inhibition. Although correlations between individual pairs of neurons exhibit considerable heterogeneity, population averages show systematic behavior. When the network is in a stationary state, the average correlations are generically small: correlation coefficients are of order 1/N, where N is the number of neurons in the network. However, when the input to the network varies strongly in time, much larger values are found. In this situation, the network is out of balance, and the synaptic conductance is low at the times when the strongest firing occurs. Examination of the correlation functions of synaptic currents reveals, however, that after these bursts, balance is restored within a few milliseconds by a rapid increase in inhibitory synaptic conductance. These findings suggest an extension of the notion of the balanced state to include balanced fluctuations of synaptic currents, with a characteristic timescale of a few milliseconds.

Journal ArticleDOI
Richard F. Lyon1, Martin Rehn1, Samy Bengio1, Thomas C. Walters1, Gal Chechik1 
TL;DR: A machine-vision method is adapted, the passive-aggressive model for image retrieval (PAMIR), which efficiently learns a linear mapping from a very large sparse feature space to a large query-term space and shows a significant advantage for the auditory models over vector-quantized MFCCs.
Abstract: To create systems that understand the sounds that humans are exposed to in everyday life, we need to represent sounds with features that can discriminate among many different sound classes. Here, we use a sound-ranking framework to quantitatively evaluate such representations in a large-scale task. We have adapted a machine-vision method, the passive-aggressive model for image retrieval (PAMIR), which efficiently learns a linear mapping from a very large sparse feature space to a large query-term space. Using this approach, we compare different auditory front ends and different ways of extracting sparse features from high-dimensional auditory images. We tested auditory models that use an adaptive pole-zero filter cascade (PZFC) auditory filter bank and sparse-code feature extraction from stabilized auditory images with multiple vector quantizers. In addition to auditory image models, we compare a family of more conventional mel-frequency cepstral coefficient (MFCC) front ends. The experimental results show a significant advantage for the auditory models over vector-quantized MFCCs. When thousands of sound files with a query vocabulary of thousands of words were ranked, the best precision at top-1 was 73% and the average precision was 35%, reflecting an 18% improvement over the best competing MFCC front end.

Journal ArticleDOI
TL;DR: It is proposed that replication (with mutation) of patterns of neuronal activity can occur within the brain using known neurophysiological processes, and evolutionary algorithms implemented by neuronal circuits can play a role in cognition.
Abstract: We propose that replication (with mutation) of patterns of neuronal activity can occur within the brain using known neurophysiological processes. Thereby evolutionary algorithms implemented by neuronal circuits can play a role in cognition. Replication of structured neuronal representations is assumed in several cognitive architectures. Replicators overcome some limitations of selectionist models of neuronal search. Hebbian learning is combined with replication to structure exploration on the basis of associations learned in the past. Neuromodulatory gating of sets of bistable neurons allows patterns of activation to be copied with mutation. If the probability of copying a set is related to the utility of that set, then an evolutionary algorithm can be implemented at rapid timescales in the brain. Populations of neuronal replicators can undertake a more rapid and stable search than can be achieved by serial modification of a single solution. Hebbian learning added to neuronal replication allows a powerful structuring of variability capable of learning the location of a global optimum from multiple previously visited local optima. Replication of solutions can solve the problem of catastrophic forgetting in the stability-plasticity dilemma. In short, neuronal replication is essential to explain several features of flexible cognition. Predictions are made for the experimental validation of the neuronal replicator hypothesis.

Journal ArticleDOI
TL;DR: It is shown that psychophysically fitted image representation in V1 has appealing statistical properties, for example, approximate PDF factorization and substantial mutual information reduction, even though no statistical information is used to fit the V1 model.
Abstract: The conventional approach in computational neuroscience in favor of the efficient coding hypothesis goes from image statistics to perception. It has been argued that the behavior of the early stages of biological visual processing (e.g., spatial frequency analyzers and their nonlinearities) may be obtained from image samples and the efficient coding hypothesis using no psychophysical or physiological information. In this work we address the same issue in the opposite direction: from perception to image statistics. We show that psychophysically fitted image representation in V1 has appealing statistical properties, for example, approximate PDF factorization and substantial mutual information reduction, even though no statistical information is used to fit the V1 model. These results are complementary evidence in favor of the efficient coding hypothesis.

Journal ArticleDOI
TL;DR: A new family of positive-definite kernels for large margin classification in support vector machines (SVMs) are introduced and it is found that on some problems, these SVMs yield state-of-the-art results, beating not only other SVMs but also deep belief nets.
Abstract: We introduce a new family of positive-definite kernels for large margin classification in support vector machines (SVMs). These kernels mimic the computation in large neural networks with one layer of hidden units. We also show how to derive new kernels, by recursive composition, that may be viewed as mapping their inputs through a series of nonlinear feature spaces. These recursively derived kernels mimic the computation in deep networks with multiple hidden layers. We evaluate SVMs with these kernels on problems designed to illustrate the advantages of deep architectures. Compared to previous benchmarks, we find that on some problems, these SVMs yield state-of-the-art results, beating not only other SVMs but also deep belief nets.
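A sketch of the idea with a precomputed-kernel SVM follows. The specific closed form used below is a degree-1 arc-cosine-style kernel mimicking one layer of threshold units, composed recursively to mimic depth; treat the exact formula and the toy data as assumptions rather than the paper's definitive construction.

```python
# Kernel mimicking a one-hidden-layer threshold network, composed recursively,
# used with an SVM on a precomputed Gram matrix.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_moons

def deep_kernel(X, Y, layers=3):
    nx = np.linalg.norm(X, axis=1)[:, None]
    ny = np.linalg.norm(Y, axis=1)[None, :]
    Kxy = X @ Y.T
    # note: this degree-1 form leaves k(x, x) = ||x||^2 unchanged across layers,
    # so the norms can stay fixed while the cross terms are recomposed
    for _ in range(layers):
        cos = np.clip(Kxy / (nx * ny + 1e-12), -1.0, 1.0)
        theta = np.arccos(cos)
        Kxy = nx * ny * (np.sin(theta) + (np.pi - theta) * cos) / np.pi
    return Kxy

X, y = make_moons(500, noise=0.2, random_state=0)
Xtr, ytr, Xte, yte = X[:400], y[:400], X[400:], y[400:]
clf = SVC(kernel="precomputed", C=1.0).fit(deep_kernel(Xtr, Xtr), ytr)
print("test accuracy:", clf.score(deep_kernel(Xte, Xtr), yte))
```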

Journal ArticleDOI
TL;DR: By quantitatively assessing the efficiency of the neural representation during learning, a cooperative homeostasis mechanism is derived that optimally tunes the competition between neurons within the sparse coding algorithm.
Abstract: Neurons in the input layer of primary visual cortex in primates develop edge-like receptive fields. One approach to understanding the emergence of this response is to state that neural activity has to efficiently represent sensory data with respect to the statistics of natural scenes. Furthermore, it is believed that such an efficient coding is achieved using a competition across neurons so as to generate a sparse representation, that is, where a relatively small number of neurons are simultaneously active. Indeed, different models of sparse coding, coupled with Hebbian learning and homeostasis, have been proposed that successfully match the observed emergent response. However, the specific role of homeostasis in learning such sparse representations is still largely unknown. By quantitatively assessing the efficiency of the neural representation during learning, we derive a cooperative homeostasis mechanism that optimally tunes the competition between neurons within the sparse coding algorithm. We apply this homeostasis while learning small patches taken from natural images and compare its efficiency with state-of-the-art algorithms. Results show that while different sparse coding algorithms give similar coding results, the homeostasis provides an optimal balance for the representation of natural images within the population of neurons. Competition in sparse coding is optimized when it is fair. By contributing to optimizing statistical competition across neurons, homeostasis is crucial in providing a more efficient solution to the emergence of independent components.
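The sketch below shows one way such a homeostatic mechanism can be wired into a sparse coding loop: atoms are selected by matching pursuit, updated with a Hebbian rule, and a per-atom gain is adapted so that all atoms end up being selected about equally often. The patch generator, update rules, and constants are illustrative assumptions, not the paper's exact algorithm.

```python
# Matching-pursuit sparse coding with a homeostatic gain that equalizes how
# often each dictionary atom is selected ("fair" competition).
import numpy as np

rng = np.random.default_rng(10)
n_atoms, dim, n_steps, n_active = 50, 64, 5000, 5
Phi = rng.normal(size=(n_atoms, dim))
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)
gain = np.ones(n_atoms)
usage = np.full(n_atoms, 1.0 / n_atoms)

for step in range(n_steps):
    x = rng.normal(size=dim)                 # stand-in for a whitened image patch
    x /= np.linalg.norm(x)
    residual, chosen = x.copy(), []
    for _ in range(n_active):                # matching pursuit with gated selection
        c = gain * (Phi @ residual)
        k = int(np.argmax(np.abs(c)))
        a = Phi[k] @ residual
        residual -= a * Phi[k]
        chosen.append((k, a))
    for k, a in chosen:                      # Hebbian dictionary update
        Phi[k] += 0.01 * a * residual
        Phi[k] /= np.linalg.norm(Phi[k])
    counts = np.bincount([k for k, _ in chosen], minlength=n_atoms) / n_active
    usage = 0.99 * usage + 0.01 * counts     # running estimate of selection rate
    gain *= np.exp(-0.05 * (usage - 1.0 / n_atoms) * n_atoms)   # homeostasis

print("selection rates (min, max):", usage.min().round(4), usage.max().round(4))
```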

Journal ArticleDOI
TL;DR: It is demonstrated that finite temporal resolution of discrete time models prevents their rescaled ISIs from being exponentially distributed, and a discrete time version of the time-rescaling theorem is proved that analytically corrects for the effects of finite resolution.
Abstract: One approach for understanding the encoding of information by spike trains is to fit statistical models and then test their goodness of fit. The time-rescaling theorem provides a goodness-of-fit test consistent with the point process nature of spike trains. The interspike intervals (ISIs) are rescaled (as a function of the model's spike probability) to be independent and exponentially distributed if the model is accurate. A Kolmogorov-Smirnov (KS) test between the rescaled ISIs and the exponential distribution is then used to check goodness of fit. This rescaling relies on assumptions of continuously defined time and instantaneous events. However, spikes have finite width, and statistical models of spike trains almost always discretize time into bins. Here we demonstrate that finite temporal resolution of discrete time models prevents their rescaled ISIs from being exponentially distributed. Poor goodness of fit may be erroneously indicated even if the model is exactly correct. We present two adaptations of the time-rescaling theorem to discrete time models. In the first we propose that instead of assuming the rescaled times to be exponential, the reference distribution be estimated through direct simulation by the fitted model. In the second, we prove a discrete time version of the time-rescaling theorem that analytically corrects for the effects of finite resolution. This allows us to define a rescaled time that is exponentially distributed, even at arbitrary temporal discretizations. We demonstrate the efficacy of both techniques by fitting generalized linear models to both simulated spike trains and spike trains recorded experimentally in monkey V1 cortex. Both techniques give nearly identical results, reducing the false-positive rate of the KS test and greatly increasing the reliability of model evaluation based on the time-rescaling theorem.
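The problem and the first proposed fix can be seen in a few lines: spikes are drawn as Bernoulli events with per-bin probability p_t, ISIs are rescaled by the accumulated -log(1 - p_t), and goodness of fit is checked both against the exponential distribution (biased at coarse bins) and against a reference distribution simulated directly from the same fitted model. Rates and bin size are illustrative assumptions; the analytic discrete-time correction is not shown.

```python
# Discrete-time rescaling of ISIs: naive KS test vs exponential is biased at
# coarse bins even for the true model; a simulated reference distribution fixes it.
import numpy as np
from scipy.stats import kstest, ks_2samp, expon

rng = np.random.default_rng(7)
T, dt = 20000, 0.005                              # bins, 5 ms bins (coarse)
rate = 20 * (1 + 0.5 * np.sin(2 * np.pi * np.arange(T) * dt))   # Hz
p = 1 - np.exp(-rate * dt)                        # per-bin spike probability

def rescaled_isis(spike_bins):
    q = -np.log(1 - p)                            # integrated intensity per bin
    s = np.cumsum(q)[spike_bins]                  # rescaled spike times
    return np.diff(s)

data_spikes = np.flatnonzero(rng.random(T) < p)
taus = rescaled_isis(data_spikes)

# (1) naive test against the exponential distribution
print("KS vs exponential:", kstest(taus, expon.cdf).pvalue)

# (2) reference distribution from direct simulation of the fitted model
sim = np.concatenate([rescaled_isis(np.flatnonzero(rng.random(T) < p))
                      for _ in range(50)])
print("KS vs simulated reference:", ks_2samp(taus, sim).pvalue)
```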

Journal ArticleDOI
TL;DR: In this paper, causal state models (CSMs), the minimal hidden Markov models or stochastic automata capable of generating statistically identical time series, are used to quantify both the generalizable structure and the idiosyncratic randomness of the spike train.
Abstract: Neurons perform computations, and convey the results of those computations through the statistical structure of their output spike trains. Here we present a practical method, grounded in the information-theoretic analysis of prediction, for inferring a minimal representation of that structure and for characterizing its complexity. Starting from spike trains, our approach finds their causal state models (CSMs), the minimal hidden Markov models or stochastic automata capable of generating statistically identical time series. We then use these CSMs to objectively quantify both the generalizable structure and the idiosyncratic randomness of the spike train. Specifically, we show that the expected algorithmic information content (the information needed to describe the spike train exactly) can be split into three parts describing (1) the time-invariant structure (complexity) of the minimal spike-generating process, which describes the spike train statistically; (2) the randomness (internal entropy rate) of the minimal spike-generating process; and (3) a residual pure noise term not described by the minimal spike-generating process. We use CSMs to approximate each of these quantities. The CSMs are inferred nonparametrically from the data, making only mild regularity assumptions, via the causal state splitting reconstruction algorithm. The methods presented here complement more traditional spike train analyses by describing not only spiking probability and spike train entropy, but also the complexity of a spike train's structure. We demonstrate our approach using both simulated spike trains and experimental data recorded in rat barrel cortex during vibrissa stimulation.

Journal ArticleDOI
TL;DR: This work quantified how changes in neurotransmitter release probability (p) modulated information transmission of a cerebellar granule cell and showed that the spatiotemporal characteristics of the inputs determine the effect of p on neurotransmission, thus permitting the selection of distinctive preferred stimuli for different p values.
Abstract: A nerve cell receives multiple inputs from upstream neurons by way of its synapses. Neuron processing functions are thus influenced by changes in the biophysical properties of the synapse, such as long-term potentiation (LTP) or depression (LTD). This observation has opened new perspectives on the biophysical basis of learning and memory, but its quantitative impact on the information transmission of a neuron remains partially elucidated. One major obstacle is the high dimensionality of the neuronal input-output space, which makes it unfeasible to perform a thorough computational analysis of a neuron with multiple synaptic inputs. In this work, information theory was employed to characterize the information transmission of a cerebellar granule cell over a region of its excitatory input space following synaptic changes. Granule cells have a small dendritic tree (on average, they receive only four mossy fiber afferents), which greatly bounds the input combinatorial space, reducing the complexity of information-theoretic calculations. Numerical simulations and LTP experiments quantified how changes in neurotransmitter release probability (p) modulated information transmission of a cerebellar granule cell. Numerical simulations showed that p shaped the neurotransmission landscape in unexpected ways. As p increased, the optimality of the information transmission of most stimuli did not increase strictly monotonically; instead it reached a plateau at intermediate p levels. Furthermore, our results showed that the spatiotemporal characteristics of the inputs determine the effect of p on neurotransmission, thus permitting the selection of distinctive preferred stimuli for different p values. These selective mechanisms may have important consequences on the encoding of cerebellar mossy fiber inputs and the plasticity and computation at the next circuit stage, including the parallel fiber--Purkinje cell synapses.

Journal ArticleDOI
TL;DR: A neural network to simulate the visual-tactile representation of the peripersonal space around the right and left hands is developed, able to mimic the responses characteristic of right-brain-damaged patients with left tactile extinction.
Abstract: Neurophysiological and behavioral studies suggest that the peripersonal space is represented in a multisensory fashion by integrating stimuli of different modalities. We developed a neural network to simulate the visual-tactile representation of the peripersonal space around the right and left hands. The model is composed of two networks (one per hemisphere), each with three areas of neurons: two are unimodal (visual and tactile) and communicate by synaptic connections with a third downstream multimodal (visual-tactile) area. The hemispheres are interconnected by inhibitory synapses. We applied a combination of analytic and computer simulation techniques. The analytic approach requires some simplifying assumptions and approximations (linearization and a reduced number of neurons) and is used to investigate network stability as a function of parameter values, providing some emergent properties. These are then tested and extended by computer simulations of a more complex nonlinear network that does not rely on the previous simplifications. With basal parameter values, the extended network reproduces several in vivo phenomena: multisensory coding of peripersonal space, reinforcement of unisensory perception by multimodal stimulation, and coexistence of simultaneous right- and left-hand representations in bilateral stimulation. By reducing the strength of the synapses from the right tactile neurons, the network is able to mimic the responses characteristic of right-brain-damaged patients with left tactile extinction: perception of unilateral left tactile stimulation, cross-modal extinction, and cross-modal facilitation in bilateral stimulation. Finally, a variety of sensitivity analyses on some key parameters was performed to shed light on the contribution of single model components in network behavior. The model may help us understand the neural circuitry underlying peripersonal space representation and identify its alterations explaining neurological deficits. In perspective, it could help in interpreting results of psychophysical and behavioral trials and clarifying the neural correlates of multisensory-based rehabilitation procedures.

Journal ArticleDOI
TL;DR: This work derives a closed-form solution for estimating the phase coupling parameters from observed phase statistics and derives a regularized solution to the estimation and shows that the resulting procedure improves performance when only a limited amount of data is available.
Abstract: Coupled oscillators are prevalent throughout the physical world. Dynamical system formulations of weakly coupled oscillator systems have proven effective at capturing the properties of real-world systems and are compelling models of neural systems. However, these formulations usually deal with the forward problem: simulating a system from known coupling parameters. Here we provide a solution to the inverse problem: determining the coupling parameters from measurements. Starting from the dynamic equations of a system of symmetrically coupled phase oscillators, given by a nonlinear Langevin equation, we derive the corresponding equilibrium distribution. This formulation leads us to the maximum entropy distribution that captures pairwise phase relationships. To solve the inverse problem for this distribution, we derive a closed-form solution for estimating the phase coupling parameters from observed phase statistics. Through simulations, we show that the algorithm performs well in high dimensions (d = 100) and in cases with limited data (as few as 100 samples per dimension). In addition, we derive a regularized solution to the estimation and show that the resulting procedure improves performance when only a limited amount of data is available. Because the distribution serves as the unique maximum entropy solution for pairwise phase statistics, phase coupling estimation can be broadly applied in any situation where phase measurements are made. Under the physical interpretation, the model may be used for inferring coupling relationships within cortical networks.
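The inverse problem can be illustrated with a naive moment-based estimator standing in for the paper's closed-form and regularized solutions: simulate the Langevin dynamics of a small network with pairwise sinusoidal coupling, then estimate each coupling from the circular moments of observed phase differences. The network, coupling values, and estimator are illustrative assumptions.

```python
# Coupled phase oscillators: simulate Langevin dynamics, then recover pairwise
# coupling strengths from the mean resultant length of phase differences.
import numpy as np

rng = np.random.default_rng(8)
d, T, dt = 5, 100000, 0.01
K = np.zeros((d, d))
K[0, 1] = K[1, 0] = 1.5                  # one strongly coupled pair
K[2, 3] = K[3, 2] = 0.8                  # one weakly coupled pair

theta = rng.uniform(0, 2 * np.pi, d)
samples = np.empty((T, d))
for t in range(T):
    drift = np.sum(K * np.sin(theta[None, :] - theta[:, None]), axis=1)
    theta = (theta + drift * dt + np.sqrt(2 * dt) * rng.normal(size=d)) % (2 * np.pi)
    samples[t] = theta

def kappa_hat(dphi):
    R = np.abs(np.mean(np.exp(1j * dphi)))   # mean resultant length
    return R * (2 - R ** 2) / (1 - R ** 2)   # standard von Mises approximation

for i, j in [(0, 1), (2, 3), (0, 4)]:
    est = kappa_hat(samples[:, i] - samples[:, j])
    print(f"pair ({i},{j}): true coupling {K[i, j]:.1f}, estimated {est:.2f}")
```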

Journal ArticleDOI
TL;DR: Ten methods of classifying fMRI volumes are compared by applying them to data from a longitudinal study of stroke recovery; the best overall performers were adaptive quadratic discriminant, support vector machines with RBF kernels, and generatively trained pairs of RBMs.
Abstract: We compare 10 methods of classifying fMRI volumes by applying them to data from a longitudinal study of stroke recovery: adaptive Fisher's linear and quadratic discriminant; gaussian naive Bayes; support vector machines with linear, quadratic, and radial basis function (RBF) kernels; logistic regression; two novel methods based on pairs of restricted Boltzmann machines (RBM); and K-nearest neighbors. All methods were tested on three binary classification tasks, and their out-of-sample classification accuracies are compared. The relative performance of the methods varies considerably across subjects and classification tasks. The best overall performers were adaptive quadratic discriminant, support vector machines with RBF kernels, and generatively trained pairs of RBMs.
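The comparison protocol is easy to reproduce in outline with scikit-learn, as sketched below on synthetic high-dimensional, small-sample data standing in for fMRI volumes. The paper's RBM-pair classifiers are not included, and all settings are illustrative assumptions.

```python
# Cross-validated comparison of off-the-shelf classifiers on synthetic
# "few volumes, many voxels" data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=120, n_features=500, n_informative=20,
                           random_state=0)
models = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA (regularized)": QuadraticDiscriminantAnalysis(reg_param=0.5),
    "Gaussian naive Bayes": GaussianNB(),
    "linear SVM": SVC(kernel="linear"),
    "quadratic SVM": SVC(kernel="poly", degree=2),
    "RBF SVM": SVC(kernel="rbf"),
    "logistic regression": LogisticRegression(max_iter=2000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:22s} out-of-sample accuracy: {acc:.2f}")
```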

Journal ArticleDOI
TL;DR: A uniform probabilistic framework to separate convolutive mixtures of acoustic signals using independent vector analysis (IVA), which is based on a joint distribution for the frequency components originating from the same source and is capable of preventing permutation disorder.
Abstract: Convolutive mixtures of signals, which are common in acoustic environments, can be difficult to separate into their component sources. Here we present a uniform probabilistic framework to separate convolutive mixtures of acoustic signals using independent vector analysis (IVA), which is based on a joint distribution for the frequency components originating from the same source and is capable of preventing permutation disorder. Different gaussian mixture models (GMM) served as source priors, in contrast to the original IVA model, where all sources were modeled by identical multivariate Laplacian distributions. This flexible source prior enabled the IVA model to separate different types of signals. Three classes of models were derived and tested: noiseless IVA, online IVA, and noisy IVA. In the IVA model without sensor noise, the unmixing matrices were efficiently estimated by the expectation maximization (EM) algorithm. An online EM algorithm was derived for the online IVA algorithm to track the movement of the sources and separate them under nonstationary conditions. The noisy IVA model included the sensor noise and combined denoising with separation. An EM algorithm was developed that found the model parameters and separated the sources simultaneously. These algorithms were applied to separate mixtures of speech and music. Performance as measured by the signal-to-interference ratio (SIR) was substantial for all three models.