
Showing papers in "Neural Computation in 2016"


Journal ArticleDOI
TL;DR: In this paper, a hierarchical temporal memory (HTM) sequence memory model is proposed to handle branching temporal sequences by maintaining multiple predictions until there is sufficient disambiguating evidence.
Abstract: The ability to recognize and predict temporal sequences of sensory inputs is vital for survival in natural environments. Based on many known properties of cortical neurons, hierarchical temporal memory (HTM) sequence memory recently has been proposed as a theoretical framework for sequence learning in the cortex. In this letter, we analyze properties of HTM sequence memory and apply it to sequence learning and prediction problems with streaming data. We show the model is able to continuously learn a large number of variable-order temporal sequences using an unsupervised Hebbian-like learning rule. The sparse temporal codes formed by the model can robustly handle branching temporal sequences by maintaining multiple predictions until there is sufficient disambiguating evidence. We compare the HTM sequence memory with other sequence learning algorithms, including statistical methods (autoregressive integrated moving average), feedforward neural networks (time delay neural network and online sequential extreme learning machine), and recurrent neural networks (long short-term memory and echo state networks), on sequence prediction problems with both artificial and real-world data. The HTM model achieves comparable accuracy to other state-of-the-art algorithms. The model also exhibits properties that are critical for sequence learning, including continuous online learning, the ability to handle multiple predictions and branching sequences with high-order statistics, robustness to sensor noise and fault tolerance, and good performance without task-specific hyperparameter tuning. Therefore, the HTM sequence memory not only advances our understanding of how the brain may solve the sequence learning problem but is also applicable to real-world sequence learning problems from continuous data streams.
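
The sketch below is a toy illustration of one property highlighted above, not the HTM algorithm itself: a variable-order predictor that stores every successor observed after each context, so a branching sequence yields multiple simultaneous predictions until a longer, disambiguating context is available. The example sequences and orders are arbitrary.

```python
from collections import defaultdict

# Toy illustration only (not the HTM algorithm): a variable-order predictor that
# keeps every successor seen after a context, so branching sequences yield
# multiple predictions until a longer context disambiguates them.
def train(sequences, order):
    model = defaultdict(set)
    for seq in sequences:
        for i in range(order, len(seq)):
            model[tuple(seq[i - order:i])].add(seq[i])
    return model

def predict(model, context, order):
    return model.get(tuple(context[-order:]), set())

seqs = ["ABCD", "XBCY"]                    # the shared subsequence "BC" branches
m1 = train(seqs, order=1)
m3 = train(seqs, order=3)
print(predict(m1, "C", order=1))           # {'D', 'Y'}: both branches are kept
print(predict(m3, "ABC", order=3))         # {'D'}: high-order context disambiguates
```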

187 citations


Journal ArticleDOI
TL;DR: An extensive literature on EEG-based MI brain connectivity analysis of healthy subjects is reviewed, and the brain connectomes during left and right hand, feet, and tongue MI movements are discussed.
Abstract: Recent research has reached a consensus on the feasibility of motor imagery brain-computer interface (MI-BCI) for different applications, especially in stroke rehabilitation. Most MI-BCI systems rely on temporal, spectral, and spatial features of single channels to distinguish different MI patterns. However, no successful communication has been established for a completely locked-in subject. To provide more useful and informative features, it has been recommended to take into account the relationships among electroencephalographic (EEG) sensor/source signals in the form of brain connectivity as an efficient tool of neuroscience. In this review, we briefly report the challenges and limitations of conventional MI-BCIs. Brain connectivity analysis, particularly functional and effective, has been described as one of the most promising approaches for improving MI-BCI performance. An extensive literature on EEG-based MI brain connectivity analysis of healthy subjects is reviewed. We subsequently discuss the brain connectomes during left and right hand, feet, and tongue MI movements. Moreover, key components involved in brain connectivity analysis that considerably affect the results are explained. Finally, possible technical shortcomings that may have influenced the results in previous research are addressed and suggestions are provided.
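
As a small, hypothetical example of the kind of functional connectivity analysis the review surveys, the sketch below computes a channel-by-channel connectivity matrix from (here, random placeholder) multichannel EEG using magnitude-squared coherence averaged over an assumed mu band (8-13 Hz); the channel count, sampling rate, and band are illustrative choices, not values from the review.

```python
import numpy as np
from scipy.signal import coherence

fs, n_channels, n_samples = 250, 8, 5000
eeg = np.random.randn(n_channels, n_samples)       # stand-in for real MI-EEG epochs

conn = np.zeros((n_channels, n_channels))
for i in range(n_channels):
    for j in range(n_channels):
        f, cxy = coherence(eeg[i], eeg[j], fs=fs, nperseg=512)
        band = (f >= 8) & (f <= 13)
        conn[i, j] = cxy[band].mean()              # mean coherence in the mu band
print(conn.round(2))
```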

159 citations


Journal ArticleDOI
TL;DR: This paper proposes an AE-based approach, the correlational neural network (CorrNet), that explicitly maximizes correlation among the views when projected to the common subspace.
Abstract: Common representation learning (CRL), wherein different descriptions or views of the data are embedded in a common subspace, has been receiving a lot of attention recently. Two popular paradigms here are canonical correlation analysis (CCA)-based approaches and autoencoder (AE)-based approaches. CCA-based approaches learn a joint representation by maximizing correlation of the views when projected to the common subspace. AE-based methods learn a common representation by minimizing the error of reconstructing the two views. Each of these approaches has its own advantages and disadvantages. For example, while CCA-based approaches outperform AE-based approaches for the task of transfer learning, they are not as scalable as the latter. In this work, we propose an AE-based approach, the correlational neural network (CorrNet), that explicitly maximizes correlation among the views when projected to the common subspace. Through a series of experiments, we demonstrate that the proposed CorrNet is better than AE and CCA with respect to its ability to learn correlated common representations. We employ CorrNet for several cross-language tasks and show that the representations learned using it perform better than the ones learned using other state-of-the-art approaches.
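
The following is a minimal sketch of the structure of a CorrNet-style objective under simplifying assumptions (linear encoders and decoders, a single hidden layer, squared reconstruction error, and an overall rather than per-dimension correlation term): self- and cross-reconstruction costs minus a weighted correlation between the two views' projections. It only evaluates the objective for random parameters; training would minimize it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dx, dy, k = 100, 20, 15, 5
X, Y = rng.standard_normal((n, dx)), rng.standard_normal((n, dy))
Wx, Wy = rng.standard_normal((dx, k)), rng.standard_normal((dy, k))   # encoders
Vx, Vy = rng.standard_normal((k, dx)), rng.standard_normal((k, dy))   # decoders

def corrnet_loss(X, Y, Wx, Wy, Vx, Vy, lam=1.0):
    Hx, Hy, H = X @ Wx, Y @ Wy, X @ Wx + Y @ Wy      # per-view and joint codes
    # Self- and cross-reconstruction errors of both views.
    recon = (np.mean((X - H @ Vx) ** 2) + np.mean((Y - H @ Vy) ** 2)
             + np.mean((X - Hy @ Vx) ** 2) + np.mean((Y - Hx @ Vy) ** 2))
    # Correlation between the two views' hidden representations.
    Hx_c, Hy_c = Hx - Hx.mean(0), Hy - Hy.mean(0)
    corr = np.sum(Hx_c * Hy_c) / (np.linalg.norm(Hx_c) * np.linalg.norm(Hy_c))
    return recon - lam * corr                        # training would minimize this

print(corrnet_loss(X, Y, Wx, Wy, Vx, Vy))
```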

136 citations


Journal ArticleDOI
TL;DR: The results lay the foundations for a mathematically rigorous treatment of decision confidence that can lead to a common framework for understanding confidence across different research domains, from human and animal behavior to neural representations.
Abstract: Decision confidence is a forecast about the probability that a decision will be correct. From a statistical perspective, decision confidence can be defined as the Bayesian posterior probability that the chosen option is correct based on the evidence contributing to it. Here, we used this formal definition as a starting point to develop a normative statistical framework for decision confidence. Our goal was to make general predictions that do not depend on the structure of the noise or a specific algorithm for estimating confidence. We analytically proved several interrelations between statistical decision confidence and observable decision measures, such as evidence discriminability, choice, and accuracy. These interrelationships specify necessary signatures of decision confidence in terms of externally quantifiable variables that can be empirically tested. Our results lay the foundations for a mathematically rigorous treatment of decision confidence that can lead to a common framework for understanding confidence across different research domains, from human and animal behavior to neural representations.
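
A short simulation, consistent with the statistical definition above, makes the predicted signatures concrete: a simulated observer receives Gaussian-corrupted evidence about a binary stimulus, chooses the more probable option, and reports the posterior probability of being correct; binning trials by reported confidence then recovers the matching accuracy. The noise model and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 200_000, 1.0
stim = rng.choice([-1.0, 1.0], size=n)               # true category
evidence = stim + sigma * rng.standard_normal(n)     # noisy percept

# Posterior P(stim = +1 | evidence) for equal priors and Gaussian noise; the
# reported confidence is the posterior probability of the chosen option.
p_plus = 1.0 / (1.0 + np.exp(-2.0 * evidence / sigma**2))
choice = np.where(p_plus >= 0.5, 1.0, -1.0)
confidence = np.maximum(p_plus, 1.0 - p_plus)
correct = choice == stim

for lo in np.arange(0.5, 1.0, 0.1):
    sel = (confidence >= lo) & (confidence < lo + 0.1)
    if sel.any():
        print(f"confidence {lo:.1f}-{lo + 0.1:.1f}: accuracy {correct[sel].mean():.3f}")
```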

88 citations


Journal ArticleDOI
TL;DR: In this article, a complex-valued convolutional network (convnet) implements the repeated application of the following composition of three operations, recursively applying the composition to an input vector of nonnegative real numbers: (1) convolution with complex-valued vectors, (2) taking the absolute value of every entry of the resulting vectors, and (3) local averaging.
Abstract: A complex-valued convolutional network (convnet) implements the repeated application of the following composition of three operations, recursively applying the composition to an input vector of nonnegative real numbers: (1) convolution with complex-valued vectors, followed by (2) taking the absolute value of every entry of the resulting vectors, followed by (3) local averaging. For processing real-valued random vectors, complex-valued convnets can be viewed as data-driven multiscale windowed power spectra, data-driven multiscale windowed absolute spectra, data-driven multiwavelet absolute values, or, in their most general configuration, data-driven nonlinear multiwavelet packets. Indeed, complex-valued convnets can calculate multiscale windowed spectra when the convnet filters are windowed complex-valued exponentials. Standard real-valued convnets, using rectified linear units (ReLUs), sigmoidal (e.g., logistic or tanh) nonlinearities, or max pooling, for example, do not obviously exhibit the same exact correspondence with data-driven wavelets, whereas for complex-valued convnets the correspondence is much more than just a vague analogy. Courtesy of the exact correspondence, the remarkably rich and rigorous body of mathematical analysis for wavelets applies directly to complex-valued convnets.
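
The sketch below walks through the three-operation composition for a one-dimensional signal with hand-picked windowed complex exponential filters, in which case the output is a windowed absolute spectrum as described above; a trained complex-valued convnet would instead learn the filters from data. Filter length, frequencies, and pooling width are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)                        # placeholder input signal
win_len, freqs, pool = 64, [0.05, 0.1, 0.2], 32      # assumed filter/pooling settings
window = np.hanning(win_len)

feature_maps = []
for f in freqs:
    filt = window * np.exp(2j * np.pi * f * np.arange(win_len))  # (1) complex filter
    y = np.abs(np.convolve(x, filt, mode="valid"))               # (2) absolute value
    y = y[: (len(y) // pool) * pool]
    feature_maps.append(y.reshape(-1, pool).mean(axis=1))        # (3) local averaging
print(np.stack(feature_maps).shape)        # one windowed-spectrum map per frequency
```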

81 citations


Journal ArticleDOI
TL;DR: This letter explores empirically the effect of the no free lunch (NFL) theorem on some popular machine learning classification techniques over real-world data sets.
Abstract: A sizable amount of research has been done to improve the mechanisms for knowledge extraction such as machine learning classification or regression. Quite unintuitively, the no free lunch (NFL) theorem states that all optimization problem strategies perform equally well when averaged over all possible problems. This fact seems to clash with the effort put forth toward better algorithms. This letter explores empirically the effect of the NFL theorem on some popular machine learning classification techniques over real-world data sets.
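
A hedged sketch of the kind of empirical comparison the letter describes (the specific data sets and classifiers below are illustrative choices, not those of the paper): the same classifiers are cross-validated over several real-world data sets so that their relative rankings can be compared across problems.

```python
from sklearn.datasets import load_breast_cancer, load_iris, load_wine
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

datasets = {"iris": load_iris(), "wine": load_wine(), "breast_cancer": load_breast_cancer()}
models = {"tree": DecisionTreeClassifier(random_state=0),
          "knn": KNeighborsClassifier(),
          "naive_bayes": GaussianNB()}

# Cross-validate every classifier on every data set; rankings typically change
# from one data set to the next, which is the empirical question at issue.
for dname, data in datasets.items():
    for mname, model in models.items():
        scores = cross_val_score(model, data.data, data.target, cv=5)
        print(f"{dname:14s} {mname:12s} accuracy = {scores.mean():.3f}")
```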

74 citations


Journal ArticleDOI
TL;DR: It is proved that the global stabilization of BCNs under SDSFC is equivalent to that by piecewise constant control (PCC).
Abstract: In this letter, we investigate the sampled-data state feedback control (SDSFC) problem of Boolean control networks (BCNs). Some necessary and sufficient conditions are obtained for the global stabilization of BCNs by SDSFC. Different from conventional state feedback controls, new phenomena are observed in the study of SDSFC. Based on the controllability matrix, we derive some necessary and sufficient conditions under which the trajectories of BCNs can be stabilized to a fixed point by piecewise constant control (PCC). It is proved that the global stabilization of BCNs under SDSFC is equivalent to that by PCC. Moreover, algorithms are given to construct the sampled-data state feedback controllers. Numerical examples are given to illustrate the efficiency of the obtained results.
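
The toy simulation below illustrates only what sampled-data state feedback means operationally, not the letter's algebraic analysis: the feedback law reads the state at sampling instants, and the resulting control is held constant between samples, that is, it is piecewise constant. The two-node network, feedback law, and sampling period are arbitrary assumptions and are not designed to be stabilizing.

```python
# Two-node Boolean control network (toy dynamics and feedback law, chosen only to
# show the mechanics of sampled-data feedback, not a stabilizing design).
def step(x, u):
    x1, x2 = x
    return (x2 and u, x1 or u)

def feedback(x):
    return not (x[0] and x[1])

x, tau = (False, True), 3            # initial state and sampling period
u = feedback(x)
for t in range(12):
    if t % tau == 0:                 # the control is refreshed only at sampling instants
        u = feedback(x)
    x = step(x, u)                   # between samples u is held: piecewise constant control
    print(t, x, u)
```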

74 citations


Journal ArticleDOI
TL;DR: A new Lyapunov function method yields sufficient conditions for the existence, uniqueness, and global exponential stability of almost periodic solutions of memristor-based neural networks with distributed leakage delays, and the results are applied to prove the existence and stability of periodic solutions for the delayed network with periodic coefficients.
Abstract: In this letter, we deal with a class of memristor-based neural networks with distributed leakage delays. By applying a new Lyapunov function method, we obtain some sufficient conditions that ensure the existence, uniqueness, and global exponential stability of almost periodic solutions of the neural networks. We then apply these results to prove the existence and stability of periodic solutions for this delayed neural network with periodic coefficients. We provide an example to illustrate the effectiveness of the theoretical results. Our results are completely new and complement the previous studies by Chen, Zeng, and Jiang (2014) and Jiang, Zeng, and Chen (2015).

70 citations


Journal ArticleDOI
Victor Solo
TL;DR: In this article, Granger causality and its associated frequency-domain strength measures (GEMs), due to Geweke, provide a framework for formulating and analyzing the direction and strength of interactions between nodes in a network.
Abstract: The recent interest in the dynamics of networks and the advent, across a range of applications, of measuring modalities that operate on different temporal scales have put the spotlight on some significant gaps in the theory of multivariate time series. Fundamental to the description of network dynamics is the direction of interaction between nodes, accompanied by a measure of the strength of such interactions. Granger causality and its associated frequency-domain strength measures (GEMs), due to Geweke, provide a framework for the formulation and analysis of these issues. In pursuing this setup, three significant unresolved issues emerge. First, computing GEMs involves computing submodels of vector time series models, for which reliable methods do not exist. Second, the impact of filtering on GEMs has never been definitively established. Third, the impact of downsampling on GEMs has never been established. In this work, using state-space methods, we resolve all these issues and illustrate the results with some simulations. Our analysis is motivated by some problems in fMRI brain imaging, to which we apply it, but it is of general applicability.
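
For readers unfamiliar with the time-domain starting point, the sketch below estimates a simple lag-1 Granger causality measure for two synthetic series by comparing residual variances of restricted and full autoregressions; it does not compute Geweke's frequency-domain GEMs or address the submodel, filtering, or downsampling issues resolved in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
y = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.1 * rng.standard_normal()  # y drives x

def resid_var(target, regressors):
    beta, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    return np.var(target - regressors @ beta)

restricted = np.column_stack([x[:-1]])          # x regressed on its own past only
full = np.column_stack([x[:-1], y[:-1]])        # ... plus y's past
gc_y_to_x = np.log(resid_var(x[1:], restricted) / resid_var(x[1:], full))
print("Granger causality y -> x:", round(gc_y_to_x, 3))   # clearly positive here
```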

65 citations


Journal ArticleDOI
TL;DR: The results show that using different reference schemes can have drastic effects on phase differences and PSD slopes, and these measures therefore must be interpreted carefully to gain insights about network properties.
Abstract: Brain signals are often analyzed in the spectral domain, where the power spectral density (PSD) and phase differences and consistency can reveal important information about the network. However, for proper interpretation, it is important to know whether these measures depend on stimulus/behavioral conditions or the reference scheme used to analyze data. We recorded local field potential (LFP) from an array of microelectrodes chronically implanted in area V1 of monkeys under different stimulus/behavioral conditions and computed PSD slopes, coherence, and phase difference between LFPs as a function of frequency and interelectrode distance while using four reference schemes: single wire, average, bipolar, and current source density. PSD slopes were dependent on reference scheme at low frequencies (below 200 Hz) but became invariant at higher frequencies. Average phase differences between sites also depended critically on referencing, switching from 0 degrees for single-wire to 180 degrees for average reference. Results were consistent across different stimulus/behavioral conditions. We were able to account for these results based on the coherence profile across sites and properties of the spectral estimator. Our results show that using different reference schemes can have drastic effects on phase differences and PSD slopes, and these measures therefore must be interpreted carefully to gain insights about network properties.
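
As a minimal illustration of two of the reference schemes compared above, the sketch below re-references a placeholder multichannel recording to an average reference and to a bipolar scheme between adjacent channels; any spectral measure computed afterward can differ across these schemes, which is the point of the comparison. Channel count and data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
lfp = rng.standard_normal((16, 10_000))           # channels x samples, single-wire referenced

avg_ref = lfp - lfp.mean(axis=0, keepdims=True)   # average reference
bipolar = lfp[1:, :] - lfp[:-1, :]                # bipolar between adjacent channels

# PSD slopes, coherence, or phase differences computed from avg_ref or bipolar can
# differ from the single-wire result, which is the comparison made in the abstract.
print(avg_ref.shape, bipolar.shape)
```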

62 citations


Journal ArticleDOI
TL;DR: A signature of efficient coding is derived expressed as the correspondence between the population Fisher information and the distribution of the stimulus variable, which is more general than previously proposed solutions that rely on specific assumptions about the neural tuning characteristics.
Abstract: Fisher information is generally believed to represent a lower bound on mutual information (Brunel & Nadal, 1998), a result that is frequently used in the assessment of neural coding efficiency. However, we demonstrate that the relation between these two quantities is more nuanced than previously thought. For example, we find that in the small noise regime, Fisher information actually provides an upper bound on mutual information. Generally our results show that it is more appropriate to consider Fisher information as an approximation rather than a bound on mutual information. We analytically derive the correspondence between the two quantities and the conditions under which the approximation is good. Our results have implications for neural coding theories and the link between neural population coding and psychophysically measurable behavior. Specifically, they allow us to formulate the efficient coding problem of maximizing mutual information between a stimulus variable and the response of a neural population in terms of Fisher information. We derive a signature of efficient coding expressed as the correspondence between the population Fisher information and the distribution of the stimulus variable. The signature is more general than previously proposed solutions that rely on specific assumptions about the neural tuning characteristics. We demonstrate that it can explain measured tuning characteristics of cortical neural populations that do not agree with previous models of efficient coding.
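
The numerical check below illustrates the comparison for a single neuron with an assumed sigmoidal tuning curve, Gaussian noise, and a standard normal stimulus prior: it evaluates the Fisher-information-based expression I_F = H(S) + (1/2) E[log(J(S)/(2*pi*e))] and the mutual information obtained by direct numerical integration, for a small and a larger noise level. All modeling choices are illustrative, not taken from the letter.

```python
import numpy as np

s = np.linspace(-5, 5, 2001)                     # stimulus grid
ds = s[1] - s[0]
p_s = np.exp(-0.5 * s**2) / np.sqrt(2 * np.pi)   # assumed standard normal prior
f = 1.0 / (1.0 + np.exp(-s))                     # assumed sigmoidal tuning curve
fprime = f * (1.0 - f)

r = np.linspace(-3, 4, 2001)                     # response grid
dr = r[1] - r[0]

for sigma in (0.01, 0.3):
    J = fprime**2 / sigma**2                     # Fisher information J(s)
    H_s = -np.sum(p_s * np.log(p_s)) * ds
    I_fisher = H_s + 0.5 * np.sum(p_s * np.log(J / (2 * np.pi * np.e))) * ds

    # Direct mutual information: I = H(R) - H(R|S) by numerical integration.
    lik = np.exp(-0.5 * ((r[:, None] - f[None, :]) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    p_r = lik @ p_s * ds
    H_r = -np.sum(p_r * np.log(p_r + 1e-300)) * dr
    H_r_given_s = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
    print(f"sigma={sigma}: I_Fisher={I_fisher:.3f}, I_direct={H_r - H_r_given_s:.3f}")
```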

Journal ArticleDOI
TL;DR: This work constructs algorithmically a smooth, sigmoidal, almost monotone activation function providing approximation to an arbitrary continuous function within any degree of accuracy.
Abstract: The possibility of approximating a continuous function on a compact subset of the real line by a feedforward single hidden layer neural network with a sigmoidal activation function has been studied in many papers. Such networks can approximate an arbitrary continuous function provided that an unlimited number of neurons in a hidden layer is permitted. In this note, we consider constructive approximation on any finite interval of the real axis by neural networks with only one neuron in the hidden layer. We construct algorithmically a smooth, sigmoidal, almost monotone activation function providing approximation to an arbitrary continuous function within any degree of accuracy. This algorithm is implemented in a computer program, which computes the value of the constructed activation function at any reasonable point of the real axis.

Journal ArticleDOI
TL;DR: In this article, the authors derived a dimensionality-dependent generalization bound for k-dimensional coding schemes by bounding the covering number of the loss function class induced by the reconstruction error.
Abstract: The k-dimensional coding schemes refer to a collection of methods that attempt to represent data using a set of representative k-dimensional vectors and include nonnegative matrix factorization, dictionary learning, sparse coding, k-means clustering, and vector quantization as special cases. Previous generalization bounds for the reconstruction error of the k-dimensional coding schemes are mainly dimensionality-independent. A major advantage of these bounds is that they can be used to analyze the generalization error when data are mapped into an infinite- or high-dimensional feature space. However, many applications use finite-dimensional data features. Can we obtain dimensionality-dependent generalization bounds for k-dimensional coding schemes that are tighter than dimensionality-independent bounds when data are in a finite-dimensional feature space? Yes. In this letter, we address this problem and derive a dimensionality-dependent generalization bound for k-dimensional coding schemes by bounding the covering number of the loss function class induced by the reconstruction error. The bound is of order , where m is the dimension of the features, k is the number of columns in the linear implementation of the coding schemes, and n is the sample size, when n is finite and when n is infinite. We show that our bound can be tighter than previous results because it avoids inducing the worst-case upper bound on k of the loss function. The proposed generalization bound is also applied to some specific coding schemes to demonstrate that the dimensionality-dependent bound is an indispensable complement to the dimensionality-independent generalization bounds.

Journal ArticleDOI
TL;DR: This article studies the connections between edge configuration and dynamics in a simple oriented network composed of two interconnected cliques (representative of brain feedback regulatory circuitry) and suggests potential applications to understanding synaptic restructuring in learning networks and the effects of network configuration on function of regulatory neural circuits.
Abstract: Recent studies have been using graph-theoretical approaches to model complex networks (such as social, infrastructural, or biological networks) and how their hardwired circuitry relates to their dynamic evolution in time. Understanding how configuration reflects on the coupled behavior in a system of dynamic nodes can be of great importance, for example, in the context of how the brain connectome affects brain function. However, the effect of connectivity patterns on network dynamics is far from being fully understood. We study the connections between edge configuration and dynamics in a simple oriented network composed of two interconnected cliques, representative of brain feedback regulatory circuitry. In this article our main goal is to study the spectra of the graph adjacency and Laplacian matrices, with a focus on three aspects in particular: (1) the sensitivity and robustness of the spectrum in response to varying the intra- and intermodular edge density, (2) the effects on the spectrum of perturbing the edge configuration while keeping the densities fixed, and (3) the effects of increasing the network size. We study some tractable aspects analytically, then simulate more general results numerically, thus aiming to motivate and explain our further work on the effect of these patterns on the network's temporal dynamics and phase transitions. We discuss the implications of such results for modeling brain connectomics. We suggest potential applications to understanding synaptic restructuring in learning networks and the effects of network configuration on the function of regulatory neural circuits.
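
A small numerical sketch of the setup described above: two cliques joined by a few intermodular edges, with the adjacency and Laplacian spectra computed directly. The graph here is undirected for simplicity, whereas the article studies an oriented network; clique size and the number of cross edges are arbitrary.

```python
import numpy as np

n, cross_edges = 10, 3                         # clique size and number of intermodular edges
A = np.zeros((2 * n, 2 * n))
A[:n, :n] = 1 - np.eye(n)                      # first clique
A[n:, n:] = 1 - np.eye(n)                      # second clique
for k in range(cross_edges):                   # a few intermodular links
    A[k, n + k] = A[n + k, k] = 1

L = np.diag(A.sum(axis=1)) - A                 # graph Laplacian
print("adjacency spectrum:", np.round(np.linalg.eigvalsh(A), 3))
print("Laplacian spectrum:", np.round(np.linalg.eigvalsh(L), 3))
```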

Journal ArticleDOI
TL;DR: A mathematical construction for the restricted Boltzmann machine (RBM) that does not require specifying the number of hidden units is presented, and the hidden layer size is adaptive and can grow during training.
Abstract: We present a mathematical construction for the restricted Boltzmann machine (RBM) that does not require specifying the number of hidden units. In fact, the hidden layer size is adaptive and can grow during training. This is obtained by first extending the RBM to be sensitive to the ordering of its hidden units. Then, with a carefully chosen definition of the energy function, we show that the limit of infinitely many hidden units is well defined. As with the RBM, approximate maximum likelihood training can be performed, resulting in an algorithm that naturally and adaptively adds trained hidden units during learning. We empirically study the behavior of this infinite RBM, showing that its performance is competitive to that of the RBM, while not requiring the tuning of a hidden layer size.

Journal ArticleDOI
TL;DR: The current use of the NEURON simulator with message passing interface (MPI) for simulation in the domain of moderately large networks on commonly available high-performance computers (HPCs) is described and the basic layout of such simulations is discussed.
Abstract: Large multiscale neuronal network simulations are of increasing value as more big data are gathered about brain wiring and organization under the auspices of a current major research initiative, such as Brain Research through Advancing Innovative Neurotechnologies. The development of these models requires new simulation technologies. We describe here the current use of the NEURON simulator with message passing interface (MPI) for simulation in the domain of moderately large networks on commonly available high-performance computers (HPCs). We discuss the basic layout of such simulations, including the methods of simulation setup, the run-time spike-passing paradigm, and postsimulation data storage and data management approaches. Using the Neuroscience Gateway, a portal for computational neuroscience that provides access to large HPCs, we benchmark simulations of neuronal networks of different sizes (500-100,000 cells), using different numbers of nodes (1-256). We compare three types of networks, composed of either Izhikevich integrate-and-fire (I&F) neurons, single-compartment Hodgkin-Huxley (HH) cells, or a hybrid network with half of each. Results show simulation run time increased approximately linearly with network size and decreased almost linearly with the number of nodes. Networks with I&F neurons were faster than HH networks, although differences were small since all tested cells were point neurons with a single compartment.
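
The sketch below shows, in generic MPI terms (mpi4py rather than NEURON's ParallelContext API), the round-robin cell-to-rank layout and end-of-interval spike exchange that such simulations rely on; the cell count and spike placeholders are assumptions for illustration.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
n_cells = 1000
my_gids = list(range(rank, n_cells, size))       # round-robin distribution of cells to ranks

# Placeholder for advancing this rank's cells over one integration interval.
local_spikes = [(gid, 0.5 * gid) for gid in my_gids[:2]]   # (cell gid, spike time) stand-ins

# Spike exchange among all ranks at the end of the interval.
all_spikes = [sp for chunk in comm.allgather(local_spikes) for sp in chunk]
if rank == 0:
    print(f"{size} rank(s), {len(all_spikes)} spikes exchanged this interval")
```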

Journal ArticleDOI
TL;DR: A general theorem is established that guarantees the almost sure convergence for the last iterate of OPERA without any assumptions on the underlying distribution and an interesting property for a family of widely used kernels in the setting of pairwise learning is established.
Abstract: Pairwise learning usually refers to a learning task that involves a loss function depending on pairs of examples, among which the most notable ones are bipartite ranking, metric learning, and AUC maximization. In this letter we study an online algorithm for pairwise learning with a least-square loss function in an unconstrained setting of a reproducing kernel Hilbert space (RKHS) that we refer to as the Online Pairwise lEaRning Algorithm (OPERA). In contrast to existing works (Kar, Sriperumbudur, Jain, & Karnick, 2013; Wang, Khardon, Pechyony, & Jones, 2012), which require that the iterates are restricted to a bounded domain or the loss function is strongly convex, OPERA is associated with a non-strongly convex objective function and learns the target function in an unconstrained RKHS. Specifically, we establish a general theorem that guarantees the almost sure convergence for the last iterate of OPERA without any assumptions on the underlying distribution. Explicit convergence rates are derived under the condition of polynomially decaying step sizes. We also establish an interesting property for a family of widely used kernels in the setting of pairwise learning and illustrate the convergence results using such kernels. Our methodology mainly depends on the characterization of RKHSs using their associated integral operators and probability inequalities for random variables with values in a Hilbert space.

Journal ArticleDOI
Kun Zhan, Jicai Teng, Jinhui Shi, Qiaoqiao Li, Mingying Wang
TL;DR: A feature-linking model (FLM) that uses the timing of spikes to encode information and the first spiking time of FLM is applied to image enhancement, and the processing mechanisms are consistent with the human visual system.
Abstract: Inspired by gamma-band oscillations and other neurobiological discoveries, neural networks research shifts the emphasis toward temporal coding, which uses explicit times at which spikes occur as an essential dimension in neural representations. We present a feature-linking model (FLM) that uses the timing of spikes to encode information. The first spiking time of the FLM is applied to image enhancement, and the processing mechanisms are consistent with the human visual system. The enhancement algorithm boosts the details while preserving the information of the input image. Experiments demonstrate that the proposed method is effective.

Journal ArticleDOI
TL;DR: This study proposes an iteratively reweighted type algorithm and provides a constructive proof of its convergence to a stationary point and shows that in each iteration, it is a quadratic programming problem in its dual space and can be solved by using state-of-the-art methods.
Abstract: This letter addresses the robustness problem when learning a large margin classifier in the presence of label noise. In our study, we achieve this purpose by proposing robustified large margin support vector machines. The robustness of the proposed robust support vector classifiers (RSVC), which is interpreted from a weighted viewpoint in this work, is due to the use of nonconvex classification losses. Besides the robustness, we also show that the proposed RSVC is simultaneously smooth, which again benefits from using smooth classification losses. The idea of proposing RSVC comes from M-estimation in statistics since the proposed robust and smooth classification losses can be taken as one-sided cost functions in robust statistics. Its Fisher consistency property and generalization ability are also investigated. Besides the robustness and smoothness, another nice property of RSVC lies in the fact that its solution can be obtained by solving weighted squared hinge loss-based support vector machine problems iteratively. We further show that in each iteration, it is a quadratic programming problem in its dual space and can be solved by using state-of-the-art methods. We thus propose an iteratively reweighted type algorithm and provide a constructive proof of its convergence to a stationary point. Effectiveness of the proposed classifiers is verified on both artificial and real data sets.

Journal ArticleDOI
TL;DR: This work proposes an approach for learning latent directed polytrees as long as there exists an appropriately defined discrepancy measure between the observed nodes and proves that the approach is consistent for learning minimal latent directed trees.
Abstract: We propose an approach for learning latent directed polytrees as long as there exists an appropriately defined discrepancy measure between the observed nodes. Specifically, we use our approach for learning directed information polytrees where samples are available from only a subset of processes. Directed information trees are a new type of probabilistic graphical models that represent the causal dynamics among a set of random processes in a stochastic system. We prove that the approach is consistent for learning minimal latent directed trees. We analyze the sample complexity of the learning task when the empirical estimator of mutual information is used as the discrepancy measure.

Journal ArticleDOI
TL;DR: A Hermite polynomial-based functional link artificial neural network (FLANN) is proposed here to solve the Van der Pol-Duffing oscillator equation, and the results reveal that this method is reliable and can be applied to other nonlinear problems too.
Abstract: A Hermite polynomial-based functional link artificial neural network (FLANN) is proposed here to solve the Van der Pol-Duffing oscillator equation. A single-layer Hermite neural network (HeNN) model is used, where a hidden layer is replaced by an expansion block of the input pattern using Hermite orthogonal polynomials. A feedforward neural network model with the unsupervised error backpropagation principle is used for modifying the network parameters and minimizing the computed error function. The Van der Pol-Duffing and Duffing oscillator equations may not be solved exactly. Here, approximate solutions of these types of equations have been obtained by applying the HeNN model for the first time. Three mathematical example problems and two real-life application problems of the Van der Pol-Duffing oscillator equation, extracting the features of an early mechanical failure signal and weak signal detection, are solved using the proposed HeNN method. HeNN approximate solutions have been compared with results obtained by the well-known Runge-Kutta method. Computed results are depicted in terms of graphs. After training the HeNN model, we may use it as a black box to get numerical results at any arbitrary point in the domain. Thus, the proposed HeNN method is efficient. The results reveal that this method is reliable and can be applied to other nonlinear problems too.

Journal ArticleDOI
TL;DR: Two complex Zhang neural network (ZNN) models for computing the Drazin inverse of an arbitrary time-varying complex square matrix are presented, and theoretical results of convergence analysis show the desirable properties of the proposed complex-valued ZNN models.
Abstract: Two complex Zhang neural network (ZNN) models for computing the Drazin inverse of an arbitrary time-varying complex square matrix are presented. The design of these neural networks is based on corresponding matrix-valued error functions arising from the limit representations of the Drazin inverse. Two types of activation functions, appropriate for handling complex matrices, are exploited to develop each of these networks. Theoretical results of convergence analysis are presented to show the desirable properties of the proposed complex-valued ZNN models. Numerical results further demonstrate the effectiveness of the proposed models.

Journal ArticleDOI
TL;DR: In this article, the authors show that a recent technology developed by IBM can be leveraged to realize neuromorphic circuits that operate as classifiers of complex real-world stimuli, and they also show that the energy consumption of these architectures, realized on the IBM chip, is typically two or more orders of magnitude lower than that of conventional digital machines implementing classifiers with comparable performance.
Abstract: Neuromorphic engineering combines the architectural and computational principles of systems neuroscience with semiconductor electronics, with the aim of building efficient and compact devices that mimic the synaptic and neural machinery of the brain. The energy consumptions promised by neuromorphic engineering are extremely low, comparable to those of the nervous system. Until now, however, the neuromorphic approach has been restricted to relatively simple circuits and specialized functions, thereby obfuscating a direct comparison of their energy consumption to that used by conventional von Neumann digital machines solving real-world tasks. Here we show that a recent technology developed by IBM can be leveraged to realize neuromorphic circuits that operate as classifiers of complex real-world stimuli. Specifically, we provide a set of general prescriptions to enable the practical implementation of neural architectures that compete with state-of-the-art classifiers. We also show that the energy consumption of these architectures, realized on the IBM chip, is typically two or more orders of magnitude lower than that of conventional digital machines implementing classifiers with comparable performance. Moreover, the spike-based dynamics display a trade-off between integration time and accuracy, which naturally translates into algorithms that can be flexibly deployed for either fast and approximate classifications, or more accurate classifications at the mere expense of longer running times and higher energy costs. This work finally proves that the neuromorphic approach can be efficiently used in real-world applications and has significant advantages over conventional digital devices when energy consumption is considered.

Journal ArticleDOI
TL;DR: It is shown that the stability of KENReg leads to generalization, and its sparseness confidence can be derived from generalization.
Abstract: Kernelized elastic net regularization (KENReg) is a kernelization of the well-known elastic net regularization (Zou & Hastie, 2005). The kernel in KENReg is not required to be a Mercer kernel since it learns from a kernelized dictionary in the coefficient space. Feng, Yang, Zhao, Lv, and Suykens (2014) showed that KENReg has some nice properties including stability, sparseness, and generalization. In this letter, we continue our study on KENReg by conducting a refined learning theory analysis. This letter makes the following three main contributions. First, we present refined error analysis on the generalization performance of KENReg. The main difficulty of analyzing the generalization error of KENReg lies in characterizing the population version of its empirical target function. We overcome this by introducing a weighted Banach space associated with the elastic net regularization. We are then able to conduct elaborated learning theory analysis and obtain fast convergence rates under proper complexity and regularity assumptions. Second, we study the sparse recovery problem in KENReg with fixed design and show that the kernelization may improve the sparse recovery ability compared to the classical elastic net regularization. Finally, we discuss the interplay among different properties of KENReg that include sparseness, stability, and generalization. We show that the stability of KENReg leads to generalization, and its sparseness confidence can be derived from generalization. Moreover, KENReg is stable and can be simultaneously sparse, which makes it attractive theoretically and practically.
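
A rough sketch of the KENReg idea under simplifying assumptions: build a kernelized dictionary (the columns of a kernel matrix, which need not come from a Mercer kernel) and fit an elastic-net-regularized coefficient vector over it with an off-the-shelf solver. The data, kernel, and regularization strengths below are placeholders, and this is not the estimator analyzed in the letter.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

K = rbf_kernel(X, X, gamma=1.0)                      # kernelized dictionary (placeholder kernel)
model = ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=10_000)
model.fit(K, y)                                      # coefficients live in the coefficient space
print("nonzero coefficients:", int(np.sum(model.coef_ != 0)), "of", K.shape[1])
```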

Journal ArticleDOI
TL;DR: This study shows that visual spatial pooling can be learned in a much simpler way using strong dimension reduction based on principal component analysis and demonstrates that pooling of model V1 simple cells learned in this way, even with nonlinearities other than squaring, can reproduce standard tuning properties of V1 complex cells.
Abstract: In visual modeling, invariance properties of visual cells are often explained by a pooling mechanism, in which outputs of neurons with similar selectivities to some stimulus parameters are integrated so as to gain some extent of invariance to other parameters. For example, the classical energy model of phase-invariant V1 complex cells pools model simple cells preferring similar orientation but different phases. Prior studies, such as independent subspace analysis, have shown that phase-invariance properties of V1 complex cells can be learned from spatial statistics of natural inputs. However, those previous approaches assumed a squaring nonlinearity on the neural outputs to capture energy correlation; such nonlinearity is arguably unnatural from a neurobiological viewpoint but hard to change due to its tight integration into their formalisms. Moreover, they used somewhat complicated objective functions requiring expensive computations for optimization. In this study, we show that visual spatial pooling can be learned in a much simpler way using strong dimension reduction based on principal component analysis. This approach learns to ignore a large part of detailed spatial structure of the input and thereby estimates a linear pooling matrix. Using this framework, we demonstrate that pooling of model V1 simple cells learned in this way, even with nonlinearities other than squaring, can reproduce standard tuning properties of V1 complex cells. For further understanding, we analyze several variants of the pooling model and argue that a reasonable pooling can generally be obtained from any kind of linear transformation that retains several of the first principal components and suppresses the remaining ones. In particular, we show how the classic Wiener filtering theory leads to one such variant.
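
The mechanics of the proposed pooling can be sketched in a few lines: given a matrix of model simple-cell outputs, a strong PCA reduction supplies a linear pooling matrix, and the pooled responses are the projections onto the retained components. The simple-cell responses below are random placeholders rather than responses of Gabor-like filters to natural images, so this shows only the procedure, not the learned complex-cell properties.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_stimuli, n_simple, n_complex = 5000, 100, 10
simple_responses = rng.standard_normal((n_stimuli, n_simple))   # placeholder simple-cell outputs

pca = PCA(n_components=n_complex)
pca.fit(simple_responses)
pooling_matrix = pca.components_                   # (n_complex, n_simple) linear pooling
complex_responses = simple_responses @ pooling_matrix.T
print(pooling_matrix.shape, complex_responses.shape)
```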

Journal ArticleDOI
TL;DR: This work analytically derived the optimal tuning curve of a single neuron encoding a one-dimensional stimulus with an arbitrary input distribution and shows how the result can be generalized to a class of neural populations by introducing the concept of a meta–tuning curve.
Abstract: The efficient coding hypothesis assumes that biological sensory systems use neural codes that are optimized to best possibly represent the stimuli that occur in their environment. Most common models use information-theoretic measures, whereas alternative formulations propose incorporating downstream decoding performance. Here we provide a systematic evaluation of different optimality criteria using a parametric formulation of the efficient coding problem based on the reconstruction error of the maximum likelihood decoder. This parametric family includes both the information maximization criterion and squared decoding error as special cases. We analytically derived the optimal tuning curve of a single neuron encoding a one-dimensional stimulus with an arbitrary input distribution. We show how the result can be generalized to a class of neural populations by introducing the concept of a meta-tuning curve. The predictions of our framework are tested against previously measured characteristics of some early visual systems found in biology. We find solutions that correspond to low values of , suggesting that across different animal models, neural representations in the early visual pathways optimize similar criteria about natural stimuli that are relatively close to the information maximization criterion.
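
As a concrete anchor for the information-maximization end of the parametric family mentioned above, the sketch below computes the classic histogram-equalization solution for a single neuron with a monotonic tuning curve and a fixed response range, where the optimal tuning curve is proportional to the cumulative distribution of the stimulus. The stimulus prior and response range are illustrative assumptions.

```python
import numpy as np

s = np.linspace(-4, 4, 1000)
p_s = np.exp(-0.5 * s**2) / np.sqrt(2 * np.pi)     # assumed stimulus prior
cdf = np.cumsum(p_s)
cdf /= cdf[-1]

r_max = 50.0                                       # assumed response range (spikes/s)
tuning_curve = r_max * cdf                         # steepest where stimuli are most common
print(tuning_curve[::200].round(1))
```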

Journal ArticleDOI
TL;DR: This article proposes to infer the causal direction by comparing the distance correlation between P(X) and P(Y|X) with the distance correlation between P(Y) and P(X|Y), and to infer that X causes Y if the former dependence coefficient is smaller.
Abstract: In this article, we deal with the problem of inferring causal directions when the data are on a discrete domain. By considering the distribution of the cause and the conditional distribution mapping cause to effect as independent random variables, we propose to infer the causal direction by comparing the distance correlation between P(X) and P(Y|X) with the distance correlation between P(Y) and P(X|Y). We infer that X causes Y if the dependence coefficient between P(X) and P(Y|X) is smaller. Experiments are performed to show the performance of the proposed method.
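
The sketch below follows my reading of the decision rule in the abstract for synthetic discrete data: estimate P(X) and the rows of P(Y|X) from counts, treat them as paired samples indexed by the values of X, compute their distance correlation, repeat in the reverse direction, and prefer the direction with the smaller dependence. The data and the exact construction of the samples are assumptions, not the paper's estimator.

```python
import numpy as np

def dcor(a, b):
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    A = np.sqrt(((a[:, None, :] - a[None, :, :]) ** 2).sum(-1))
    B = np.sqrt(((b[:, None, :] - b[None, :, :]) ** 2).sum(-1))
    A = A - A.mean(0) - A.mean(1)[:, None] + A.mean()   # double centering
    B = B - B.mean(0) - B.mean(1)[:, None] + B.mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max((A * B).mean(), 0.0) / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)
x = rng.integers(0, 4, size=5000)                 # candidate cause with 4 states
y = (x + rng.integers(0, 2, size=5000)) % 3       # candidate effect with 3 states

def direction_score(cause, effect):
    cvals, evals = np.unique(cause), np.unique(effect)
    p_cause = np.array([(cause == v).mean() for v in cvals])
    p_eff_given = np.array([[np.mean(effect[cause == v] == w) for w in evals] for v in cvals])
    return dcor(p_cause, p_eff_given)

score_xy, score_yx = direction_score(x, y), direction_score(y, x)
print("dcor(P(X), P(Y|X)) =", round(score_xy, 3))
print("dcor(P(Y), P(X|Y)) =", round(score_yx, 3))
print("inferred direction:", "X -> Y" if score_xy < score_yx else "Y -> X")
```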

Journal ArticleDOI
TL;DR: Two linear recurrent neural networks for generating outer inverses with prescribed range and null space are defined and the conditions that ensure stability of the proposed neural network are presented.
Abstract: Two linear recurrent neural networks for generating outer inverses with prescribed range and null space are defined. Each of the proposed recurrent neural networks is based on the matrix-valued differential equation, a generalization of dynamic equations proposed earlier for the nonsingular matrix inversion, the Moore-Penrose inversion, as well as the Drazin inversion, under the condition of zero initial state. The application of the first approach is conditioned by the properties of the spectrum of a certain matrix; the second approach eliminates this drawback, though at the cost of increasing the number of matrix operations. The cases corresponding to the most common generalized inverses are defined. The conditions that ensure stability of the proposed neural network are presented. Illustrative examples present the results of numerical simulations.

Journal ArticleDOI
TL;DR: It is shown that the phenomenon of phase precession of neurons in the hippocampus and ventral striatum corresponds to the cognitive act of future prediction, and that the proposed mechanism results in Weber-Fechner spacing for the representation of both past (memory) and future (prediction) timelines.
Abstract: Predicting the timing and order of future events is an essential feature of cognition in higher life forms. We propose a neural mechanism to nondestructively translate the current state of spatiotemporal memory into the future, so as to construct an ordered set of future predictions almost instantaneously. We hypothesize that within each cycle of hippocampal theta oscillations, the memory state is swept through a range of translations to yield an ordered set of future predictions through modulations in synaptic connections. Theoretically, we operationalize critical neurobiological findings from hippocampal physiology in terms of neural network equations representing spatiotemporal memory. Combined with constraints based on physical principles requiring scale invariance and coherence in translation across memory nodes, the proposition results in Weber-Fechner spacing for the representation of both the past (memory) and future (prediction) timelines. We show that the phenomenon of phase precession of neurons in the hippocampus and ventral striatum corresponds to the cognitive act of future prediction.

Journal ArticleDOI
TL;DR: In this paper, the authors theoretically and experimentally investigate tensor-based regression and classification, derive an excess risk bound for each tensor norm, and demonstrate the superiority of tensor-based learning methods over vector- and matrix-based learning methods.
Abstract: We theoretically and experimentally investigate tensor-based regression and classification. Our focus is regularization with various tensor norms, including the overlapped trace norm, the latent trace norm, and the scaled latent trace norm. We first give dual optimization methods using the alternating direction method of multipliers, which is computationally efficient when the number of training samples is moderate. We then theoretically derive an excess risk bound for each tensor norm and clarify their behavior. Finally, we perform extensive experiments using simulated and real data and demonstrate the superiority of tensor-based learning methods over vector- and matrix-based learning methods.
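
As a small illustration of one of the regularizers named above, the sketch below evaluates the overlapped trace norm of a tensor, that is, the sum of the nuclear norms of its mode unfoldings; the tensor size is arbitrary, and the latent and scaled latent variants are not implemented here.

```python
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def overlapped_trace_norm(T):
    # Sum of nuclear norms (sums of singular values) of all mode unfoldings.
    return sum(np.linalg.norm(unfold(T, m), ord="nuc") for m in range(T.ndim))

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 5, 6))                 # arbitrary example tensor
print(round(overlapped_trace_norm(W), 3))
```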