Showing papers in "Neural Computation in 2017"

PDF

Open Access

Journal Article•DOI•

Deep convolutional neural networks for image classification: A comprehensive review

[...]

Waseem Rawat¹, Zenghui Wang¹•Institutions (1)

01 Sep 2017-Neural Computation

TL;DR: This review, which focuses on the application of CNNs to image classification tasks, covers their development, from their predecessors up to recent state-of-the-art deep learning systems.

...read moreread less

Abstract: Convolutional neural networks CNNs have been applied to visual tasks since the late 1980s. However, despite a few scattered applications, they were dormant until the mid-2000s when developments in computing power and the advent of large amounts of labeled data, supplemented by improved algorithms, contributed to their advancement and brought them to the forefront of a neural network renaissance that has seen rapid progression since 2012. In this review, which focuses on the application of CNNs to image classification tasks, we cover their development, from their predecessors up to recent state-of-the-art deep learning systems. Along the way, we analyze 1 their early successes, 2 their role in the deep learning renaissance, 3 selected symbolic works that have contributed to their recent popularity, and 4 several improvement attempts by reviewing contributions and challenges of over 300 publications. We also introduce some of their current trends and remaining challenges.

...read moreread less

2,366 citations

Journal Article•DOI•

Active inference: A process theory

[...]

Karl J. Friston¹, Thomas H. B. FitzGerald¹, Francesco Rigoli¹, Philipp Schwartenbeck¹, Giovanni Pezzulo² - Show less +1 more•Institutions (2)

Wellcome Trust Centre for Neuroimaging¹, National Research Council²

01 Jan 2017-Neural Computation

TL;DR: The fact that a gradient descent appears to be a valid description of neuronal activity means that variational free energy is a Lyapunov function for neuronal dynamics, which therefore conform to Hamilton’s principle of least action.

...read moreread less

Abstract: This article describes a process theory based on active inference and belief propagation. Starting from the premise that all neuronal processing and action selection can be explained by maximizing Bayesian model evidence-or minimizing variational free energy-we ask whether neuronal responses can be described as a gradient descent on variational free energy. Using a standard Markov decision process generative model, we derive the neuronal dynamics implicit in this description and reproduce a remarkable range of well-characterized neuronal phenomena. These include repetition suppression, mismatch negativity, violation responses, place-cell activity, phase precession, theta sequences, theta-gamma coupling, evidence accumulation, race-to-bound dynamics, and transfer of dopamine responses. Furthermore, the approximately Bayes' optimal behavior prescribed by these dynamics has a degree of face validity, providing a formal explanation for reward seeking, context learning, and epistemic foraging. Technically, the fact that a gradient descent appears to be a valid description of neuronal activity means that variational free energy is a Lyapunov function for neuronal dynamics, which therefore conform to Hamilton's principle of least action.

...read moreread less

708 citations

Journal Article•DOI•

Active Inference, Curiosity and Insight

[...]

Karl J. Friston¹, Marco Lin¹, Chris D. Frith¹, Giovanni Pezzulo², J. Allan Hobson¹, Sasha Ondobaka¹ - Show less +2 more•Institutions (2)

Wellcome Trust Centre for Neuroimaging¹, National Research Council²

11 Sep 2017-Neural Computation

TL;DR: This article uses simulations of abstract rule learning and approximate Bayesian inference to show that minimizing (expected) variational free energy leads to active sampling of novel contingencies and closes explanatory gaps in generative models of the world, thereby reducing uncertainty and satisfying curiosity.

...read moreread less

Abstract: This article offers a formal account of curiosity and insight in terms of active (Bayesian) inference. It deals with the dual problem of inferring states of the world and learning its statistical structure. In contrast to current trends in machine learning (e.g., deep learning), we focus on how people attain insight and understanding using just a handful of observations, which are solicited through curious behavior. We use simulations of abstract rule learning and approximate Bayesian inference to show that minimizing (expected) variational free energy leads to active sampling of novel contingencies. This epistemic behavior closes explanatory gaps in generative models of the world, thereby reducing uncertainty and satisfying curiosity. We then move from epistemic learning to model selection or structure learning to show how abductive processes emerge when agents test plausible hypotheses about symmetries (i.e., invariances or rules) in their generative models. The ensuing Bayesian model reduction evinces ...

...read moreread less

248 citations

Journal Article•DOI•

An Approximation of the Error Backpropagation Algorithm in a Predictive Coding Network with Local Hebbian Synaptic Plasticity.

[...]

James C.R. Whittington¹, Rafal Bogacz¹•Institutions (1)

John Radcliffe Hospital¹

14 Apr 2017-Neural Computation

TL;DR: It is shown that a network developed in the predictive coding framework can efficiently perform supervised learning fully autonomously, employing only simple local Hebbian plasticity.

...read moreread less

Abstract: To efficiently learn from feedback, cortical networks need to update synaptic weights on multiple levels of cortical hierarchy. An effective and well-known algorithm for computing such changes in synaptic weights is the error backpropagation algorithm. However, in this algorithm, the change in synaptic weights is a complex function of weights and activities of neurons not directly connected with the synapse being modified, whereas the changes in biological synapses are determined only by the activity of presynaptic and postsynaptic neurons. Several models have been proposed that approximate the backpropagation algorithm with local synaptic plasticity, but these models require complex external control over the network or relatively complex plasticity rules. Here we show that a network developed in the predictive coding framework can efficiently perform supervised learning fully autonomously, employing only simple local Hebbian plasticity. Furthermore, for certain parameters, the weight change in the predictive coding model converges to that of the backpropagation algorithm. This suggests that it is possible for cortical networks with simple Hebbian synaptic plasticity to implement efficient learning algorithms in which synapses in areas on multiple levels of hierarchy are modified to minimize the error on the output.

...read moreread less

173 citations

Journal Article•DOI•

The deterministic information bottleneck

[...]

DJ Strouse¹, David J. Schwab²•Institutions (2)

Princeton University¹, Northwestern University²

01 Jun 2017-Neural Computation

TL;DR: The deterministic information bottleneck DIB (DIB) as mentioned in this paper is an alternative formulation that replaces mutual information with entropy, which they argue better captures this notion of compression and argues that the solution to the DIB problem turns out to be a deterministic encoder, or hard clustering, as opposed to the stochastic encoder or soft clustering.

...read moreread less

Abstract: Lossy compression and clustering fundamentally involve a decision about which features are relevant and which are not. The information bottleneck method IB by Tishby, Pereira, and Bialek 1999 formalized this notion as an information-theoretic optimization problem and proposed an optimal trade-off between throwing away as many bits as possible and selectively keeping those that are most important. In the IB, compression is measured by mutual information. Here, we introduce an alternative formulation that replaces mutual information with entropy, which we call the deterministic information bottleneck DIB and argue better captures this notion of compression. As suggested by its name, the solution to the DIB problem turns out to be a deterministic encoder, or hard clustering, as opposed to the stochastic encoder, or soft clustering, that is optimal under the IB. We compare the IB and DIB on synthetic data, showing that the IB and DIB perform similarly in terms of the IB cost function, but that the DIB significantly outperforms the IB in terms of the DIB cost function. We also empirically find that the DIB offers a considerable gain in computational efficiency over the IB, over a range of convergence parameters. Our derivation of the DIB also suggests a method for continuously interpolating between the soft clustering of the IB and the hard clustering of the DIB.

...read moreread less

100 citations

Journal Article•DOI•

Variational Latent Gaussian Process for Recovering Single-Trial Dynamics from Population Spike Trains.

[...]

Yuan Zhao¹, Il Memming Park¹•Institutions (1)

Stony Brook University¹

14 Apr 2017-Neural Computation

TL;DR: The variational latent gaussian process (vLGP) is proposed, a practical and efficient inference method that combines a generative model with a history-dependent point process observation, together with a smoothness prior on the latent trajectories to reveal hidden neural dynamics from large-scale neural recordings.

...read moreread less

Abstract: When governed by underlying low-dimensional dynamics, the interdependence of simultaneously recorded populations of neurons can be explained by a small number of shared factors, or a low-dimensional trajectory. Recovering these latent trajectories, particularly from single-trial population recordings, may help us understand the dynamics that drive neural computation. However, due to the biophysical constraints and noise in the spike trains, inferring trajectories from data is a challenging statistical problem in general. Here, we propose a practical and efficient inference method, the variational latent gaussian process (vLGP). The vLGP combines a generative model with a history-dependent point process observation, together with a smoothness prior on the latent trajectories. The vLGP improves on earlier methods for recovering latent trajectories, which assume either observation models inappropriate for point processes or linear dynamics. We compare and validate vLGP on both simulated data sets and populat...

...read moreread less

99 citations

Journal Article•DOI•

Stdp-compatible approximation of backpropagation in an energy-based model

[...]

Yoshua Bengio¹, Thomas Mesnard², Asja Fischer³, Saizheng Zhang⁴, Yuhuai Wu⁵ - Show less +1 more•Institutions (5)

Canadian Institute for Advanced Research¹, École Normale Supérieure², University of Bonn³, Université de Montréal⁴, University of Toronto⁵

01 Mar 2017-Neural Computation

TL;DR: Simulations and a theoretical argument suggest that this rate-based update rule is consistent with those associated with spike-timing-dependent plasticity, and could be an element of a theory for explaining how brains perform credit assignment in deep hierarchies as efficiently as backpropagation does.

...read moreread less

Abstract: We show that Langevin Markov chain Monte Carlo inference in an energy-based model with latent variables has the property that the early steps of inference, starting from a stationary point, correspond to propagating error gradients into internal layers, similar to backpropagation. The backpropagated error is with respect to output units that have received an outside driving force pushing them away from the stationary point. Backpropagated error gradients correspond to temporal derivatives with respect to the activation of hidden units. These lead to a weight update proportional to the product of the presynaptic firing rate and the temporal rate of change of the postsynaptic firing rate. Simulations and a theoretical argument suggest that this rate-based update rule is consistent with those associated with spike-timing-dependent plasticity. The ideas presented in this article could be an element of a theory for explaining how brains perform credit assignment in deep hierarchies as efficiently as backpropagation does, with neural computation corresponding to both approximate inference in continuous-valued latent variables and error backpropagation, at the same time.

...read moreread less

88 citations

Journal Article•DOI•

Deep learning with dynamic spiking neurons and fixed feedback weights

[...]

Arash Samadi¹, Timothy P. Lillicrap², Douglas B. Tweed³•Institutions (3)

University of Toronto¹, Google², York University³

01 Mar 2017-Neural Computation

TL;DR: These problems can be solved by two simple devices: learning rules can approximate dynamic input-output relations with piecewise-smooth functions, and a variation on the feedback alignment algorithm can train deep networks without having to coordinate forward and feedback synapses.

...read moreread less

Abstract: Recent work in computer science has shown the power of deep learning driven by the backpropagation algorithm in networks of artificial neurons. But real neurons in the brain are different from most of these artificial ones in at least three crucial ways: they emit spikes rather than graded outputs, their inputs and outputs are related dynamically rather than by piecewise-smooth functions, and they have no known way to coordinate arrays of synapses in separate forward and feedback pathways so that they change simultaneously and identically, as they do in backpropagation. Given these differences, it is unlikely that current deep learning algorithms can operate in the brain, but we that show these problems can be solved by two simple devices: learning rules can approximate dynamic input-output relations with piecewise-smooth functions, and a variation on the feedback alignment algorithm can train deep networks without having to coordinate forward and feedback synapses. Our results also show that deep spiking networks learn much better if each neuron computes an intracellular teaching signal that reflects that cell's nonlinearity. With this mechanism, networks of spiking neurons show useful learning in synapses at least nine layers upstream from the output cells and perform well compared to other spiking networks in the literature on the MNIST digit recognition task.

...read moreread less

69 citations

Journal Article•DOI•

Capturing the Dynamical Repertoire of Single Neurons with Generalized Linear Models.

[...]

Alison I. Weber¹, Jonathan W. Pillow²•Institutions (2)

University of Washington¹, Princeton University²

28 Nov 2017-Neural Computation

TL;DR: It is shown that Poisson GLMs can reproduce a comprehensive suite of canonical neural response behaviors, including tonic and phasic spiking, bursting, spike rate adaptation, type I and type II excitation, and two forms of bistability.

...read moreread less

Abstract: A key problem in computational neuroscience is to find simple, tractable models that are nevertheless flexible enough to capture the response properties of real neurons. Here we examine the capabil...

...read moreread less

68 citations

Journal Article•DOI•

Learning Simpler Language Models with the Differential State Framework.

[...]

Alexander G. Ororbia¹, Tomas Mikolov², David Reitter¹•Institutions (2)

Penn State College of Information Sciences and Technology¹, Facebook²

28 Nov 2017-Neural Computation

TL;DR: Learning useful information across long time lags is a critical and difficult problem for temporal neural models in tasks such as language modeling as discussed by the authors, and existing architectures that address the issue ar...

...read moreread less

Abstract: Learning useful information across long time lags is a critical and difficult problem for temporal neural models in tasks such as language modeling. Existing architectures that address the issue ar...

...read moreread less

67 citations

Journal Article•DOI•

Comparison of different generalizations of clustering coefficient and local efficiency for weighted undirected graphs

[...]

Yu Wang¹, Eshwar Gorakhnath Ghumare¹, Rik Vandenberghe¹, Patrick Dupont¹•Institutions (1)

Katholieke Universiteit Leuven¹

01 Feb 2017-Neural Computation

TL;DR: The best generalization of the clustering coefficient is , defined in Miyajima and Sakuragawa (2014), while the best generalizations of the local efficiency is , proposed in this letter.

...read moreread less

Abstract: Binary undirected graphs are well established, but when these graphs are constructed, often a threshold is applied to a parameter describing the connection between two nodes. Therefore, the use of weighted graphs is more appropriate. In this work, we focus on weighted undirected graphs. This implies that we have to incorporate edge weights in the graph measures, which require generalizations of common graph metrics. After reviewing existing generalizations of the clustering coefficient and the local efficiency, we proposed new generalizations for these graph measures. To be able to compare different generalizations, a number of essential and useful properties were defined that ideally should be satisfied. We applied the generalizations to two real-world networks of different sizes. As a result, we found that not all existing generalizations satisfy all essential properties. Furthermore, we determined the best generalization for the clustering coefficient and local efficiency based on their properties and the performance when applied to two networks. We found that the best generalization of the clustering coefficient is , defined in Miyajima and Sakuragawa 2014, while the best generalization of the local efficiency is , proposed in this letter. Depending on the application and the relative importance of sensitivity and robustness to noise, other generalizations may be selected on the basis of the properties investigated in this letter.

...read moreread less

Journal Article•DOI•

Deep Restricted Kernel Machines Using Conjugate Feature Duality.

[...]

Johan A. K. Suykens¹•Institutions (1)

Katholieke Universiteit Leuven¹

17 Jul 2017-Neural Computation

TL;DR: A theory of deep restricted kernel machines offering new foundations for deep learning with kernel machines, which includes least squares support vector machines for classification and regression, kernel principal component analysis (PCA), matrix singular value decomposition, and Parzen-type models is proposed.

...read moreread less

Abstract: The aim of this letter is to propose a theory of deep restricted kernel machines offering new foundations for deep learning with kernel machines. From the viewpoint of deep learning, it is partially related to restricted Boltzmann machines, which are characterized by visible and hidden units in a bipartite graph without hidden-to-hidden connections and deep learning extensions as deep belief networks and deep Boltzmann machines. From the viewpoint of kernel machines, it includes least squares support vector machines for classification and regression, kernel principal component analysis (PCA), matrix singular value decomposition, and Parzen-type models. A key element is to first characterize these kernel machines in terms of so-called conjugate feature duality, yielding a representation with visible and hidden units. It is shown how this is related to the energy form in restricted Boltzmann machines, with continuous variables in a nonprobabilistic setting. In this new framework of so-called restricted kernel machine (RKM) representations, the dual variables correspond to hidden features. Deep RKM are obtained by coupling the RKMs. The method is illustrated for deep RKM, consisting of three levels with a least squares support vector machine regression level and two kernel PCA levels. In its primal form also deep feedforward neural networks can be trained within this framework.

...read moreread less

Journal Article•DOI•

Evidence accumulation and change rate inference in dynamic environments

[...]

Adrian E. Radillo¹, Alan Veliz-Cuba², Krešimir Josić³, Zachary P. Kilpatrick⁴•Institutions (4)

University of Houston¹, University of Dayton², Rice University³, University of Colorado Denver⁴

01 Jun 2017-Neural Computation

TL;DR: An ideal observer model capable of inferring the present state of the environment along with its rate of change is developed, and this model can be used to infer the environmental state and change rate with accuracy comparable to the ideal observer.

...read moreread less

Abstract: In a constantly changing world, animals must account for environmental volatility when making decisions. To appropriately discount older, irrelevant information, they need to learn the rate at which the environment changes. We develop an ideal observer model capable of inferring the present state of the environment along with its rate of change. Key to this computation is an update of the posterior probability of all possible change point counts. This computation can be challenging, as the number of possibilities grows rapidly with time. However, we show how the computations can be simplified in the continuum limit by a moment closure approximation. The resulting low-dimensional system can be used to infer the environmental state and change rate with accuracy comparable to the ideal observer. The approximate computations can be performed by a neural network model via a rate-correlation-based plasticity rule. We thus show how optimal observers accumulate evidence in changing environments and map this computation to reduced models that perform inference using plausible neural mechanisms.

...read moreread less

Journal Article•DOI•

An Initialization Method Based on Hybrid Distance for k-Means Algorithm

[...]

Jie Yang¹, Yan Ma¹, Xiangfen Zhang¹, Li Shunbao¹, Zhang Yuping¹ - Show less +1 more•Institutions (1)

Shanghai Normal University¹

17 Oct 2017-Neural Computation

TL;DR: An algorithm for selecting initial cluster centers that can dynamically adjust the weighting parameter is proposed and a new internal clustering validation measure, the clustering validate index based on the neighbors (CVN), which can be exploited to select the optimal result among multiple clustering results.

...read moreread less

Abstract: The traditional [Formula: see text]-means algorithm has been widely used as a simple and efficient clustering method. However, the performance of this algorithm is highly dependent on the selection of initial cluster centers. Therefore, the method adopted for choosing initial cluster centers is extremely important. In this letter, we redefine the density of points according to the number of its neighbors, as well as the distance between points and their neighbors. In addition, we define a new distance measure that considers both Euclidean distance and density. Based on that, we propose an algorithm for selecting initial cluster centers that can dynamically adjust the weighting parameter. Furthermore, we propose a new internal clustering validation measure, the clustering validation index based on the neighbors (CVN), which can be exploited to select the optimal result among multiple clustering results. Experimental results show that the proposed algorithm outperforms existing initialization methods on real-world data sets and demonstrates the adaptability of the proposed algorithm to data sets with various characteristics.

...read moreread less

Journal Article•DOI•

Blind Nonnegative Source Separation Using Biological Neural Networks.

[...]

Cengiz Pehlevan, Sreyas Mohan¹, Dmitri B. Chklovskii•Institutions (1)

Indian Institute of Technology Madras¹

17 Oct 2017-Neural Computation

TL;DR: In this paper, the authors formulate blind nonnegative source separation as a similarity matching problem and derive neural networks from the similarity matching objective, where synaptic weights in their networks are updated according to biologically plausible local learning rules.

...read moreread less

Abstract: Blind source separation—the extraction of independent sources from a mixture—is an important problem for both artificial and natural signal processing Here, we address a special case of this problem when sources (but not the mixing matrix) are known to be nonnegative—for example, due to the physical nature of the sources We search for the solution to this problem that can be implemented using biologically plausible neural networks Specifically, we consider the online setting where the data set is streamed to a neural network The novelty of our approach is that we formulate blind nonnegative source separation as a similarity matching problem and derive neural networks from the similarity matching objective Importantly, synaptic weights in our networks are updated according to biologically plausible local learning rules

...read moreread less

Journal Article•DOI•

Olfactory Recognition Based on EEG Gamma-Band Activity.

[...]

Onder Aydemir¹•Institutions (1)

Karadeniz Technical University¹

22 May 2017-Neural Computation

TL;DR: The results prove that the proposed continuous wavelet transform–based feature extraction method has great potential to classify the EEG signals recorded during smelling of the present odors.

...read moreread less

Abstract: There are various kinds of brain monitoring techniques, including local field potential, near-infrared spectroscopy, magnetic resonance imaging (MRI), positron emission tomography, functional MRI, electroencephalography (EEG), and magnetoencephalography. Among those techniques, EEG is the most widely used one due to its portability, low setup cost, and noninvasiveness. Apart from other advantages, EEG signals also help to evaluate the ability of the smelling organ. In such studies, EEG signals, which are recorded during smelling, are analyzed to determine the subject lacks any smelling ability or to measure the response of the brain. The main idea of this study is to show the emotional difference in EEG signals during perception of valerian, lotus flower, cheese, and rosewater odors by the EEG gamma wave. The proposed method was applied to the EEG signals, which were taken from five healthy subjects in the conditions of eyes open and eyes closed at the Swiss Federal Institute of Technology. In order to represent the signals, we extracted features from the gamma band of the EEG trials by continuous wavelet transform with the selection of Morlet as a wavelet function. Then the [Formula: see text]-nearest neighbor algorithm was implemented as the classifier for recognizing the EEG trials as valerian, lotus flower, cheese, and rosewater. We achieved an average classification accuracy rate of 87.50% with the 4.3 standard deviation value for the subjects in eyes-open condition and an average classification accuracy rate of 94.12% with the 2.9 standard deviation value for the subjects in eyes-closed condition. The results prove that the proposed continuous wavelet transform-based feature extraction method has great potential to classify the EEG signals recorded during smelling of the present odors. It has been also established that gamma-band activity of the brain is highly associated with olfaction.

...read moreread less

Journal Article•DOI•

Dopamine, inference, and uncertainty

[...]

Samuel J. Gershman¹•Institutions (1)

Harvard University¹

01 Dec 2017-Neural Computation

TL;DR: It is postulated that orbitofrontal cortex transforms the stimulus representation through recurrent dynamics, such that a simple error-driven learning rule operating on the transformed representation can implement the Bayesian reinforcement learning update.

...read moreread less

Abstract: The hypothesis that the phasic dopamine response reports a reward prediction error has become deeply entrenched. However, dopamine neurons exhibit several notable deviations from this hypothesis. A coherent explanation for these deviations can be obtained by analyzing the dopamine response in terms of Bayesian reinforcement learning. The key idea is that prediction errors are modulated by probabilistic beliefs about the relationship between cues and outcomes, updated through Bayesian inference. This account can explain dopamine responses to inferred value in sensory preconditioning, the effects of cue preexposure latent inhibition, and adaptive coding of prediction errors when rewards vary across orders of magnitude. We further postulate that orbitofrontal cortex transforms the stimulus representation through recurrent dynamics, such that a simple error-driven learning rule operating on the transformed representation can implement the Bayesian reinforcement learning update.

...read moreread less

Journal Article•DOI•

Interpretation of the precision matrix and its application in estimating sparse brain connectivity during sleep spindles from human electrocorticography recordings

[...]

Anup Das¹, Aaron L. Sampson¹, Claudia Lainscsek¹, Lyle Muller¹, Wutu Lin¹, John Doyle², Sydney S. Cash³, Eric Halgren⁴, Terrence J. Sejnowski¹ - Show less +5 more•Institutions (4)

Salk Institute for Biological Studies¹, California Institute of Technology², Harvard University³, University of California, San Diego⁴

01 Mar 2017-Neural Computation

TL;DR: The application of the SRPM method for estimating brain connectivity during stage 2 sleep spindles from human electrocorticography recordings using an electrode array is demonstrated and the recovery of the connectivity structure using theSRPM method can be explained by energy models using the Boltzmann distribution.

...read moreread less

Abstract: The correlation method from brain imaging has been used to estimate functional connectivity in the human brain. However, brain regions might show very high correlation even when the two regions are not directly connected due to the strong interaction of the two regions with common input from a third region. One previously proposed solution to this problem is to use a sparse regularized inverse covariance matrix or precision matrix SRPM assuming that the connectivity structure is sparse. This method yields partial correlations to measure strong direct interactions between pairs of regions while simultaneously removing the influence of the rest of the regions, thus identifying regions that are conditionally independent. To test our methods, we first demonstrated conditions under which the SRPM method could indeed find the true physical connection between a pair of nodes for a spring-mass example and an RC circuit example. The recovery of the connectivity structure using the SRPM method can be explained by energy models using the Boltzmann distribution. We then demonstrated the application of the SRPM method for estimating brain connectivity during stage 2 sleep spindles from human electrocorticography ECoG recordings using an electrode array. The ECoG recordings that we analyzed were from a 32-year-old male patient with long-standing pharmaco-resistant left temporal lobe complex partial epilepsy. Sleep spindles were automatically detected using delay differential analysis and then analyzed with SRPM and the Louvain method for community detection. We found spatially localized brain networks within and between neighboring cortical areas during spindles, in contrast to the case when sleep spindles were not present.

...read moreread less

Journal Article•DOI•

Parameter Identifiability in Statistical Machine Learning: A Review

[...]

Zhi-Yong Ran¹, Bao-Gang Hu²•Institutions (2)

Chongqing University of Posts and Telecommunications¹, Chinese Academy of Sciences²

14 Apr 2017-Neural Computation

TL;DR: This review examines the relevance of parameter identifiability for statistical models used in machine learning and addresses several issues of identifiable closely related to machine learning, showing the advantages and disadvantages of state-of-the-art research and demonstrating recent progress.

...read moreread less

Abstract: This review examines the relevance of parameter identifiability for statistical models used in machine learning. In addition to defining main concepts, we address several issues of identifiability closely related to machine learning, showing the advantages and disadvantages of state-of-the-art research and demonstrating recent progress. First, we review criteria for determining the parameter structure of models from the literature. This has three related issues: parameter identifiability, parameter redundancy, and reparameterization. Second, we review the deep influence of identifiability on various aspects of machine learning from theoretical and application viewpoints. In addition to illustrating the utility and influence of identifiability, we emphasize the interplay among identifiability theory, machine learning, mathematical statistics, information theory, optimization theory, information geometry, Riemann geometry, symbolic computation, Bayesian inference, algebraic geometry, and others. Finally, we present a new perspective together with the associated challenges.

...read moreread less

Journal Article•DOI•

Rat Prefrontal Cortex Inactivations during Decision Making Are Explained by Bistable Attractor Dynamics.

[...]

Alex T. Piet¹, Jeffrey C. Erlich², Charles D. Kopec¹, Carlos D. Brody³•Institutions (3)

Princeton University¹, New York University Shanghai², Howard Hughes Medical Institute³

17 Oct 2017-Neural Computation

TL;DR: This memory model naturally accounts for optogenetic perturbations of FOF in the same task and correctly predicts a memory-duration-dependent deficit caused by silencing FOF by fitting several models to test whether integration, categorization, or decision memory could account for it.

...read moreread less

Abstract: Two-node attractor networks are flexible models for neural activity during decision making. Depending on the network configuration, these networks can model distinct aspects of decisions including evidence integration, evidence categorization, and decision memory. Here, we use attractor networks to model recent causal perturbations of the frontal orienting fields (FOF) in rat cortex during a perceptual decision-making task (Erlich, Brunton, Duan, Hanks, & Brody, 2015 ). We focus on a striking feature of the perturbation results. Pharmacological silencing of the FOF resulted in a stimulus-independent bias. We fit several models to test whether integration, categorization, or decision memory could account for this bias and found that only the memory configuration successfully accounts for it. This memory model naturally accounts for optogenetic perturbations of FOF in the same task and correctly predicts a memory-duration-dependent deficit caused by silencing FOF in a different task. Our results provide mechanistic support for a "postcategorization" memory role of the FOF in upcoming choices.

...read moreread less

Journal Article•DOI•

Support vector algorithms for optimizing the partial area under the roc curve

[...]

Harikrishna Narasimhan¹, Shivani Agarwal²•Institutions (2)

Harvard University¹, University of Pennsylvania²

01 Jul 2017-Neural Computation

TL;DR: In this paper, support vector algorithms for directly optimizing the partial area under the ROC curve between any two false-positive rates are developed based on minimizing a suitable proxy or surrogate objective for the partial AUC error.

...read moreread less

Abstract: The area under the ROC curve AUC is a widely used performance measure in machine learning. Increasingly, however, in several applications, ranging from ranking to biometric screening to medicine, performance is measured not in terms of the full area under the ROC curve but in terms of the partial area under the ROC curve between two false-positive rates. In this letter, we develop support vector algorithms for directly optimizing the partial AUC between any two false-positive rates. Our methods are based on minimizing a suitable proxy or surrogate objective for the partial AUC error. In the case of the full AUC, one can readily construct and optimize convex surrogates by expressing the performance measure as a summation of pairwise terms. The partial AUC, on the other hand, does not admit such a simple decomposable structure, making it more challenging to design and optimize tight convex surrogates for this measure. Our approach builds on the structural SVM framework of Joachims 2005 to design convex surrogates for partial AUC and solves the resulting optimization problem using a cutting plane solver. Unlike the full AUC, where the combinatorial optimization needed in each iteration of the cutting plane solver can be decomposed and solved efficiently, the corresponding problem for the partial AUC is harder to decompose. One of our main contributions is a polynomial time algorithm for solving the combinatorial optimization problem associated with partial AUC. We also develop an approach for optimizing a tighter nonconvex hinge loss-based surrogate for the partial AUC using difference-of-convex programming. Our experiments on a variety of real-world and benchmark tasks confirm the efficacy of the proposed methods.

...read moreread less

Journal Article•DOI•

Multistability of delayed recurrent neural networks with mexican hat activation functions

[...]

Peng Liu¹, Zhigang Zeng¹, Jun Wang²•Institutions (2)

Huazhong University of Science and Technology¹, City University of Hong Kong²

01 Feb 2017-Neural Computation

TL;DR: This letter studies the multistability analysis of delayed recurrent neural networks with Mexican hat activation function and improves and extends the existing stability results in the literature.

...read moreread less

Abstract: This letter studies the multistability analysis of delayed recurrent neural networks with Mexican hat activation function. Some sufficient conditions are obtained to ensure that an -dimensional recurrent neural network can have equilibrium points with , and of them are locally exponentially stable. Furthermore, the attraction basins of these stable equilibrium points are estimated. We show that the attraction basins of these stable equilibrium points can be larger than their originally partitioned subsets. The results of this letter improve and extend the existing stability results in the literature. Finally, a numerical example containing different cases is given to illustrate the theoretical results.

...read moreread less

Journal Article•DOI•

Rank Selection in Nonnegative Matrix Factorization using Minimum Description Length.

[...]

Steven Squires¹, Adam Prügel-Bennett¹, Mahesan Niranjan¹•Institutions (1)

University of Southampton¹

31 May 2017-Neural Computation

TL;DR: A mechanism for selecting the subspace size by using a minimum description length technique is proposed and it is demonstrated that the technique provides plausible estimates for real data as well as accurately predicting the known size of synthetic data.

...read moreread less

Abstract: Nonnegative matrix factorization (NMF) is primarily a linear dimensionality reduction technique that factorizes a nonnegative data matrix into two smaller nonnegative matrices: one that represents the basis of the new subspace and the second that holds the coefficients of all the data points in that new space. In principle, the nonnegativity constraint forces the representation to be sparse and parts based. Instead of extracting holistic features from the data, real parts are extracted that should be significantly easier to interpret and analyze. The size of the new subspace selects how many features will be extracted from the data. An effective choice should minimize the noise while extracting the key features. We propose a mechanism for selecting the subspace size by using a minimum description length technique. We demonstrate that our technique provides plausible estimates for real data as well as accurately predicting the known size of synthetic data. We provide an implementation of our code in a Matlab format.

...read moreread less

Journal Article•DOI•

The population tracking model: A simple, scalable statistical model for neural population data

[...]

Cian O'Donnell¹, J. Tiago Gonçalves², Nick Whiteley³, Carlos Portera-Cailliau², Terrence J. Sejnowski⁴ - Show less +1 more•Institutions (4)

Salk Institute for Biological Studies¹, University of California, Los Angeles², University of Bristol³, University of California, San Diego⁴

01 Jan 2017-Neural Computation

TL;DR: A new statistical method for characterizing neural population activity that requires semi-independent fitting of only as many parameters as the square of the number of neurons, requiring drastically smaller data sets and minimal computation time is introduced.

...read moreread less

Abstract: Our understanding of neural population coding has been limited by a lack of analysis methods to characterize spiking data from large populations. The biggest challenge comes from the fact that the number of possible network activity patterns scales exponentially with the number of neurons recorded . Here we introduce a new statistical method for characterizing neural population activity that requires semi-independent fitting of only as many parameters as the square of the number of neurons, requiring drastically smaller data sets and minimal computation time. The model works by matching the population rate the number of neurons synchronously active and the probability that each individual neuron fires given the population rate. We found that this model can accurately fit synthetic data from up to 1000 neurons. We also found that the model could rapidly decode visual stimuli from neural population data from macaque primary visual cortex about 65i¾ ms after stimulus onset. Finally, we used the model to estimate the entropy of neural population activity in developing mouse somatosensory cortex and, surprisingly, found that it first increases, and then decreases during development. This statistical model opens new options for interrogating neural population data and can bolster the use of modern large-scale in vivo Ca and voltage imaging tools.

...read moreread less

Journal Article•DOI•

Time series decomposition into oscillation components and phase estimation

[...]

Takeru Matsuda¹, Fumiyasu Komaki²•Institutions (2)

University of Tokyo¹, RIKEN Brain Science Institute²

01 Feb 2017-Neural Computation

TL;DR: The proposed method for decomposing time series into oscillation components using state-space models based on the concept of random frequency modulation succeeds in extracting intermittent oscillations like ripples and detecting the phase reset phenomena.

...read moreread less

Abstract: Many time series are naturally considered as a superposition of several oscillation components. For example, electroencephalogram EEG time series include oscillation components such as alpha, beta, and gamma. We propose a method for decomposing time series into such oscillation components using state-space models. Based on the concept of random frequency modulation, gaussian linear state-space models for oscillation components are developed. In this model, the frequency of an oscillator fluctuates by noise. Time series decomposition is accomplished by this model like the Bayesian seasonal adjustment method. Since the model parameters are estimated from data by the empirical Bayes' method, the amplitudes and the frequencies of oscillation components are determined in a data-driven manner. Also, the appropriate number of oscillation components is determined with the Akaike information criterion AIC. In this way, the proposed method provides a natural decomposition of the given time series into oscillation components. In neuroscience, the phase of neural time series plays an important role in neural information processing. The proposed method can be used to estimate the phase of each oscillation component and has several advantages over a conventional method based on the Hilbert transform. Thus, the proposed method enables an investigation of the phase dynamics of time series. Numerical results show that the proposed method succeeds in extracting intermittent oscillations like ripples and detecting the phase reset phenomena. We apply the proposed method to real data from various fields such as astronomy, ecology, tidology, and neuroscience.

...read moreread less

Journal Article•DOI•

Unifying adversarial training algorithms with data gradient regularization

[...]

Alexander G. Ororbia¹, Daniel Kifer¹, C. Lee Giles¹•Institutions (1)

Pennsylvania State University¹

01 Apr 2017-Neural Computation

TL;DR: The proposed DataGrad framework, which can be viewed as a deep extension of the layerwise contractive autoencoder penalty, cleanly simplifies prior work and easily allows extensions such as adversarial training with multitask cues.

...read moreread less

Abstract: Many previous proposals for adversarial training of deep neural nets have included directly modifying the gradient, training on a mix of original and adversarial examples, using contractive penalties, and approximately optimizing constrained adversarial objective functions. In this article, we show that these proposals are actually all instances of optimizing a general, regularized objective we call DataGrad. Our proposed DataGrad framework, which can be viewed as a deep extension of the layerwise contractive autoencoder penalty, cleanly simplifies prior work and easily allows extensions such as adversarial training with multitask cues. In our experiments, we find that the deep gradient regularization of DataGrad which also has L1 and L2 flavors of regularization outperforms alternative forms of regularization, including classical L1, L2, and multitask, on both the original data set and adversarial sets. Furthermore, we find that combining multitask optimization with DataGrad adversarial training results in the most robust performance.

...read moreread less

Journal Article•DOI•

Fast, accurate localization of epileptic seizure onset zones based on detection of high-frequency oscillations using improved wavelet transform and matching pursuit methods

[...]

Min Wu¹, Ting Wan¹, Xiongbo Wan¹, Yuxiao Du², Jinhua She¹ - Show less +1 more•Institutions (2)

China University of Geosciences (Wuhan)¹, Guangdong University of Technology²

01 Jan 2017-Neural Computation

TL;DR: The improvement of two methods of detecting high-frequency oscillations (HFOs) and their use to localize epileptic seizure onset zones (SOZs) are described and it is shown that the improved WT method provides high specificity and quick localization and the improved MP method providing high sensitivity.

...read moreread less

Abstract: This letter describes the improvement of two methods of detecting high-frequency oscillations HFOs and their use to localize epileptic seizure onset zones SOZs. The wavelet transform WT method was improved by combining the complex Morlet WT with Shannon entropy to enhance the temporal-frequency resolution during HFO detection. And the matching pursuit MP method was improved by combining it with an adaptive genetic algorithm to improve the speed and accuracy of the calculations for HFO detection. The HFOs detected by these two methods were used to localize SOZs in five patients. A comparison shows that the improved WT method provides high specificity and quick localization and that the improved MP method provides high sensitivity.

...read moreread less

Journal Article•DOI•

Multisensory bayesian inference depends on synapse maturation during training: Theoretical analysis and neural modeling implementation

[...]

Mauro Ursino¹, Cristiano Cuppini¹, Elisa Magosso¹•Institutions (1)

University of Bologna¹

01 Mar 2017-Neural Computation

TL;DR: The model explains the ventriloquism illusion and, looking at the activity in the multimodal neurons, explains the automatic reweighting of auditory and visual inputs on a trial-by-trial basis, according to the reliability of the individual cues.

...read moreread less

Abstract: Recent theoretical and experimental studies suggest that in multisensory conditions, the brain performs a near-optimal Bayesian estimate of external events, giving more weight to the more reliable stimuli. However, the neural mechanisms responsible for this behavior, and its progressive maturation in a multisensory environment, are still insufficiently understood. The aim of this letter is to analyze this problem with a neural network model of audiovisual integration, based on probabilistic population coding-the idea that a population of neurons can encode probability functions to perform Bayesian inference. The model consists of two chains of unisensory neurons auditory and visual topologically organized. They receive the corresponding input through a plastic receptive field and reciprocally exchange plastic cross-modal synapses, which encode the spatial co-occurrence of visual-auditory inputs. A third chain of multisensory neurons performs a simple sum of auditory and visual excitations. The work includes a theoretical part and a computer simulation study. We show how a simple rule for synapse learning consisting of Hebbian reinforcement and a decay term can be used during training to shrink the receptive fields and encode the unisensory likelihood functions. Hence, after training, each unisensory area realizes a maximum likelihood estimate of stimulus position auditory or visual. In cross-modal conditions, the same learning rule can encode information on prior probability into the cross-modal synapses. Computer simulations confirm the theoretical results and show that the proposed network can realize a maximum likelihood estimate of auditory or visual positions in unimodal conditions and a Bayesian estimate, with moderate deviations from optimality, in cross-modal conditions. Furthermore, the model explains the ventriloquism illusion and, looking at the activity in the multimodal neurons, explains the automatic reweighting of auditory and visual inputs on a trial-by-trial basis, according to the reliability of the individual cues.

...read moreread less

Journal Article•DOI•

A Robust Regression Framework with Laplace Kernel-Induced Loss.

[...]

Liming Yang¹, Zhuo Ren¹, Yidan Wang¹, Hongwei Dong¹•Institutions (1)

China Agricultural University¹

17 Oct 2017-Neural Computation

TL;DR: A robust regression framework with nonconvex loss function based on the Laplace kernel-induced loss (LK-loss) and a continuous optimization method is developed to solve the problems.

...read moreread less

Abstract: This work proposes a robust regression framework with nonconvex loss function. Two regression formulations are presented based on the Laplace kernel-induced loss (LK-loss). Moreover, we illustrate that the LK-loss function is a nice approximation for the zero-norm. However, nonconvexity of the LK-loss makes it difficult to optimize. A continuous optimization method is developed to solve the proposed framework. The problems are formulated as DC (difference of convex functions) programming. The corresponding DC algorithms (DCAs) converge linearly. Furthermore, the proposed algorithms are applied directly to determine the hardness of licorice seeds using near-infrared spectral data with noisy input. Experiments in eight spectral regions show that the proposed methods improve generalization compared with the traditional support vector regressions (SVR), especially in high-frequency regions. Experiments on several benchmark data sets demonstrate that the proposed methods achieve better results than the traditional regression methods in most of data sets we have considered.

...read moreread less

Journal Article•DOI•

Robust Averaging of Covariances for EEG Recordings Classification in Motor Imagery Brain-Computer Interfaces.

[...]

Takashi Uehara¹, Matteo Sartori², Toshihisa Tanaka¹, Simone Fiori²•Institutions (2)

Tokyo University of Agriculture and Technology¹, Marche Polytechnic University²

22 May 2017-Neural Computation

TL;DR: Algorithm to average sample covariance matrices (SCMs) for the selection of the reference matrix in tangent space mapping (TSM)–based MI-BCI and the use of geometric medians and trimmed averages as robust estimators are discussed and tested.

...read moreread less

Abstract: The estimation of covariance matrices is of prime importance to analyze the distribution of multivariate signals. In motor imagery–based brain-computer interfaces (MI-BCI), covariance matrices play a central role in the extraction of features from recorded electroencephalograms (EEGs); therefore, correctly estimating covariance is crucial for EEG classification. This letter discusses algorithms to average sample covariance matrices (SCMs) for the selection of the reference matrix in tangent space mapping (TSM)–based MI-BCI. Tangent space mapping is a powerful method of feature extraction and strongly depends on the selection of a reference covariance matrix. In general, the observed signals may include outliers; therefore, taking the geometric mean of SCMs as the reference matrix may not be the best choice. In order to deal with the effects of outliers, robust estimators have to be used. In particular, we discuss and test the use of geometric medians and trimmed averages (defined on the basis of several m...

...read moreread less