# Showing papers in "Neural Computation in 1997"

••

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Abstract: Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O. 1. Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, back propagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.

72,897 citations

••

TL;DR: A novel fast algorithm for independent component analysis is introduced, which can be used for blind source separation and feature extraction, and the convergence speed is shown to be cubic.

Abstract: We introduce a novel fast algorithm for independent component analysis, which can be used for blind source separation and feature extraction. We show how a neural network learning rule can be transformed into a fixedpoint iteration, which provides an algorithm that is very simple, does not depend on any user-defined parameters, and is fast to converge to the most accurate solution allowed by the data. The algorithm finds, one at a time, all nongaussian independent components, regardless of their probability distributions. The computations can be performed in either batch mode or a semiadaptive manner. The convergence of the algorithm is rigorously proved, and the convergence speed is shown to be cubic. Some comparisons to gradient-based algorithms are made, showing that the new algorithm is usually 10 to 100 times faster, sometimes giving the solution in just a few iterations.

3,215 citations

••

Yale University

^{1}TL;DR: This work presents the basic ideas that would help informed users make the most efficient use of NEURON, the powerful and flexible environment for implementing models of individual neurons and small networks of neurons.

Abstract: The moment-to-moment processing of information by the nervous system involves the propagation and interaction of electrical and chemical signals that are distributed in space and time. Biologically realistic modeling is needed to test hypotheses about the mechanisms that govern these signals and how nervous system function emerges from the operation of these mechanisms. The NEURON simulation program provides a powerful and flexible environment for implementing such models of individual neurons and small networks of neurons. It is particularly useful when membrane potential is nonuniform and membrane currents are complex. We present the basic ideas that would help informed users make the most efficient use of NEURON.

2,617 citations

••

TL;DR: A new approach to shape recognition based on a virtually infinite family of binary features (queries) of the image data, designed to accommodate prior information about shape invariance and regularity, and a comparison with artificial neural networks methods is presented.

Abstract: We explore a new approach to shape recognition based on a virtually infinite family of binary features (queries) of the image data, designed to accommodate prior information about shape invariance and regularity. Each query corresponds to a spatial arrangement of several local topographic codes (or tags), which are in themselves too primitive and common to be informative about shape. All the discriminating power derives from relative angles and distances among the tags. The important attributes of the queries are a natural partial ordering corresponding to increasing structure and complexity; semi-invariance, meaning that most shapes of a given class will answer the same way to two queries that are successive in the ordering; and stability, since the queries are not based on distinguished points and substructures. No classifier based on the full feature set can be evaluated, and it is impossible to determine a priori which arrangements are informative. Our approach is to select informative features and build tree classifiers at the same time by inductive learning. In effect, each tree provides an approximation to the full posterior where the features chosen depend on the branch that is traversed. Due to the number and nature of the queries, standard decision tree construction based on a fixed-length feature vector is not feasible. Instead we entertain only a small random sample of queries at each node, constrain their complexity to increase with tree depth, and grow multiple trees. The terminal nodes are labeled by estimates of the corresponding posterior distribution over shape classes. An image is classified by sending it down every tree and aggregating the resulting distributions. The method is applied to classifying handwritten digits and synthetic linear and nonlinear deformations of three hundred L AT E X symbols. Stateof-the-art error rates are achieved on the National Institute of Standards and Technology database of digits. The principal goal of the experiments on L AT E X symbols is to analyze invariance, generalization error and related issues, and a comparison with artificial neural networks methods is presented in this context.

1,214 citations

••

TL;DR: A local linear approach to dimension reduction that provides accurate representations and is fast to compute is developed and it is shown that the local linear techniques outperform neural network implementations.

Abstract: Reducing or eliminating statistical redundancy between the components of high-dimensional vector data enables a lower-dimensional representation without significant loss of information. Recognizing the limitations of principal component analysis (PCA), researchers in the statistics and neural network communities have developed nonlinear extensions of PCA. This article develops a local linear approach to dimension reduction that provides accurate representations and is fast to compute. We exercise the algorithms on speech and image data, and compare performance with PCA and with neural network implementations of nonlinear PCA. We find that both nonlinear techniques can provide more accurate representations than PCA and show that the local linear techniques outperform neural network implementations.

702 citations

••

TL;DR: The algorithm combines the growth criterion of the resource-allocating network of Platt (1991) with a pruning strategy based on the relative contribution of each hidden unit to the overall network output to lead toward a minimal topology for the RBFNN.

Abstract: This article presents a sequential learning algorithm for function approximation and time-series prediction using a minimal radial basis function neural network (RBFNN). The algorithm combines the growth criterion of the resource-allocating network (RAN) of Platt (1991) with a pruning strategy based on the relative contribution of each hidden unit to the overall network output. The resulting network leads toward a minimal topology for the RBFNN. The performance of the algorithm is compared with RAN and the enhanced RAN algorithm of Kadirkamanathan and Niranjan (1993) for the following benchmark problems: (1) hearta from the benchmark problems database PROBEN1, (2) Hermite polynomial, and (3) Mackey-Glass chaotic time series. For these problems, the proposed algorithm is shown to realize RBFNNs with far fewer hidden neurons with better or same accuracy.

538 citations

••

TL;DR: The minimax entropy principle is applied to texture modeling, where a novel Markov random field model, called FRAME, is derived, and encouraging results are obtained in experiments on a variety of texture images.

Abstract: This article proposes a general theory and methodology, called the minimax entropy principle, for building statistical models for images (or signals) in a variety of applications. This principle consists of two parts. The first is the maximum entropy principle for feature binding (or fusion): for a given set of observed feature statistics, a distribution can be built to bind these feature statistics together by maximizing the entropy over all distributions that reproduce them. The second part is the minimum entropy principle for feature selection: among all plausible sets of feature statistics, we choose the set whose maximum entropy distribution has the minimum entropy. Computational and inferential issues in both parts are addressed; in particular, a feature pursuit procedure is proposed for approximately selecting the optimal set of features. The minimax entropy principle is then corrected by considering the sample variation in the observed feature statistics, and an information criterion for feature pursuit is derived. The minimax entropy principle is applied to texture modeling, where a novel Markov random field (MRF) model, called FRAME (filter, random field, and minimax entropy), is derived, and encouraging results are obtained in experiments on a variety of texture images. The relationship between our theory and the mechanisms of neural computation is also discussed.

477 citations

••

TL;DR: It is concluded that in the light of the vast hardware resources available in the ventral stream of the primate visual system relative to those exercised here, the appealingly simple feature-space conjecture remains worthy of serious consideration as a neurobiological model.

Abstract: Severe architectural and timing constraints within the primate visual system support the conjecture that the early phase of object recognition in the brain is based on a feedforward feature-extraction hierarchy. To assess the plausibility of this conjecture in an engineering context, a difficult three-dimensional object recognition domain was developed to challenge a pure feedforward, receptive-field-based recognition model called SEEMORE. SEEMORE is based on 102 viewpoint-invariant nonlinear filters that as a group are sensitive to contour, texture, and color cues. The visual domains consists of 100 real objects of many different types, including rigid (shovel), nonrigid (telephone cord), and statistical (maple leaf cluster) objects and photographs of complex scenes. Objects were individually presented in color video images under normal room lighting conditions. Based on 12 to 36 training views, SEEMORE was required to recognize unnormalized test views of objects that could vary in position, orientation in the image plane and in depth, and scale (factor of 2); for nonrigid objects, recognition was also tested under gross shape deformations. Correct classification performance on a test set consisting of 600 novel object views was 97 percent (chance was 1 percent) and was comparable for the subset of 15 nonrigid objects. Performance was also measured under a variety of image degradation conditions, including partial occlusion, limited clutter, color shift, and additive noise. Generalization behavior and classification errors illustrated the emergence of several striking natural shape categories that are not explicitly encoded in the dimensions of the feature space. It is concluded that in the light of the vast hardware resources available in the ventral stream of the primate visual system relative to those exercised here, the appealingly simple feature-space conjecture remains worthy of serious consideration as a neurobiological model.

371 citations

••

TL;DR: An algorithm and representation-level theory of illusory contour shape and salience that relies on the fact that the probability that a particle following a random walk will pass through a given position and orientation on a path joining two boundary fragments can be computed directly as the product of two vector-field convolutions.

Abstract: We describe an algorithm- and representation-level theory of illusory contour shape and salience. Unlike previous theories, our model is derived from a single assumption: that the prior probability distribution of boundary completion shape can be modeled by a random walk in a lattice whose points are positions and orientations in the image plane (i.e., the space that one can reasonably assume is represented by neurons of the mammalian visual cortex). Our model does not employ numerical relaxation or other explicit minimization, but instead relies on the fact that the probability that a particle following a random walk will pass through a given position and orientation on a path joining two boundary fragments can be computed directly as the product of two vector-field convolutions. We show that for the random walk we define, the maximum likelihood paths are curves of least energy, that is, on average, random walks follow paths commonly assumed to model the shape of illusory contours. A computer model is demonstrated on numerous illusory contour stimuli from the literature.

357 citations

••

TL;DR: It is argued that LEGION provides a novel and effective framework for image segmentation and figure-ground segregation and exhibits a natural capacity in segmenting images.

Abstract: We study image segmentation on the basis of locally excitatory, globally inhibitory oscillator networks (LEGION), whereby the phases of oscillators encode the binding of pixels. We introduce a lateral potential for each oscillator so that only oscillators with strong connections from their neighborhood can develop high potentials. Based on the concept of the lateral potential, a solution to remove noisy regions in an image is proposed for LEGION, so that it suppresses the oscillators corresponding to noisy regions but without affecting those corresponding to major regions. We show that the resulting oscillator network separates an image into several major regions, plus a background consisting of all noisy regions, and we illustrate network properties by computer simulation. The network exhibits a natural capacity in segmenting images. The oscillatory dynamics leads to a computer algorithm, which is applied successfully to segmenting real gray-level images. A number of issues regarding biological plausibility and perceptual organization are discussed. We argue that LEGION provides a novel and effective framework for image segmentation and figure-ground segregation.

356 citations

••

TL;DR: The results show that the description of a neuron as a threshold element can indeed be justified and the four-dimensional neuron model of Hodgkin and Huxley as a concrete example is studied.

Abstract: It is generally believed that a neuron is a threshold element that fires when some variable u reaches a threshold. Here we pursue the question of whether this picture can be justified and study the four-dimensional neuron model of Hodgkin and Huxley as a concrete example. The model is approximated by a response kernel expansion in terms of a single variable, the membrane voltage. The first-order term is linear in the input and its kernel has the typical form of an elementary postsynaptic potential. Higher-order kernels take care of nonlinear interactions between input spikes. In contrast to the standard Volterra expansion, the kernels depend on the firing time of the most recent output spike. In particular, a zero-order kernel that describes the shape of the spike and the typical after-potential is included. Our model neuron fires if the membrane voltage, given by the truncated response kernel expansion, crosses a threshold. The threshold model is tested on a spike train generated by the Hodgkin-Huxley model with a stochastic input current. We find that the threshold model predicts 90 percent of the spikes correctly. Our results show that, to good approximation, the description of a neuron as a threshold element can indeed be justified.

••

TL;DR: A hierarchical network model of visual recognition that explains experimental observations regarding neural responses in both free viewing and fixating conditions by using a form of the extended Kalman filter as given by the minimum description length (MDL) principle is described.

Abstract: Recent neurophysiological experiments appear to indicate that the responses of visual cortical neurons in a monkey freely viewing a natural scene can sometimes differ substantially from those obtained when the same image subregions are flashed during a conventional fixation task. These new findings attain significance from the fact that neurophysiological research in the past has been based predominantly on cell recordings obtained during fixation tasks, under the assumption that these data would be useful in predicting responses in more general situations. We describe a hierarchical model of visual memory that reconciles the two differing experimental results mentioned above by predicting neural responses in both fixating and free-viewing conditions. The model dynamically combines input-driven bottom-up signals with expectation-driven top-down signals to achieve optimal estimation of current state using a Kalman filter based framework. The architecture of the model posits a role for the reciprocal connections between adjoining visual cortical areas in determining neural response properties.

••

TL;DR: It is shown by computer simulation that the convergence of the stochastic descent algorithms is improved by using the natural gradient and the adaptively estimated cumulants.

Abstract: There are two major approaches for blind separation: maximum entropy (ME) and minimum mutual information (MMI). Both can be implemented by the stochastic gradient descent method for obtaining the demixing matrix. The MI is the contrast function for blind separation; the entropy is not. To justify the ME, the relation between ME and MMI is first elucidated by calculating the first derivative of the entropy and proving that the mean subtraction is necessary in applying the ME and at the solution points determined by the MI, the ME will not update the demixing matrix in the directions of increasing the cross-talking. Second, the natural gradient instead of the ordinary gradient is introduced to obtain efficient algorithms, because the parameter space is a Riemannian space consisting of matrices. The mutual information is calculated by applying the Gram-Charlier expansion to approximate probability density functions of the outputs. Finally, we propose an efficient learning algorithm that incorporates with an adaptive method of estimating the unknown cumulants. It is shown by computer simulation that the convergence of the stochastic descent algorithms is improved by using the natural gradient and the adaptively estimated cumulants.

••

TL;DR: It is shown that the well-known forward-backward and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs and the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures.

Abstract: Graphical techniques for modeling the dependencies of randomvariables have been explored in a variety of different areas includingstatistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics.Formalisms for manipulating these models have been developedrelatively independently in these research communities. In this paper weexplore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independencenetworks (PINs). The paper contains a self-contained review of the basic principles of PINs.It is shown that the well-known forward-backward (F-B) and Viterbialgorithms for HMMs are special cases of more general inference algorithms forarbitrary PINs. Furthermore, the existence of inference and estimationalgorithms for more general graphical models provides a set of analysistools for HMM practitioners who wish to explore a richer class of HMMstructures.Examples of relatively complex models to handle sensorfusion and coarticulationin speech recognitionare introduced and treated within the graphical model framework toillustrate the advantages of the general approach.

••

TL;DR: It is shown that shunting inhibition actually has a subtractive effect on the firing rate in most circumstances, and regulating a cell's passive membrane conductance via massive feedback is not an adequate mechanism for normalizing or scaling its output.

Abstract: Shunting inhibition, a conductance increase with a reversal potential close to the resting potential of the cell, has been shown to have a divisive effect on subthreshold excitatory postsynaptic potential amplitudes It has therefore been assumed to have the same divisive effect on firing rates We show that shunting inhibition actually has a subtractive effecton the firing rate in most circumstances Averaged over several interspike intervals, the spiking mechanism effectively clamps the somatic membrane potential to a value significantly above the resting potential, so that the current through the shunting conductance is approximately independent of the firing rate This leads to a subtractive rather than a divisive effect In addition, at distal synapses, shunting inhibition will also have an approximately subtractive effect if the excitatory conductance is not small compared to the inhibitory conductance Therefore regulating a cell's passive membrane conductance—for instance, via massive feedback—is not an adequate mechanism for normalizing or scaling its output

••

TL;DR: This work matches a simple integrate-and-fire model to the experimentally measured integrative properties of cortical regular spiking cells, leading to an intuitive picture of neuronal integration that unifies the seemingly contradictory and random walk pictures that have been proposed.

Abstract: To understand the interspike interval (ISI) variability displayed by visual cortical neurons (Softky & Koch, 1993), it is critical to examine the dynamics of their neuronal integration, as well as the variability in their synaptic input current. Most previous models have focused on the latter factor. We match a simple integrate-and-fire model to the experimentally measured integrative properties of cortical regular spiking cells (McCormick, Connors, Lighthall, & Prince, 1985). After setting RC parameters, the post-spike voltage reset is set to match experimental measurements of neuronal gain (obtained from in vitro plots of firing frequency versus injected current). Examination of the resulting model leads to an intuitive picture of neuronal integration that unifies the seemingly contradictory 1/square root of N and random walk pictures that have previously been proposed. When ISIs are dominated by postspike recovery, 1/square root of N arguments hold and spiking is regular; after the "memory" of the last spike becomes negligible, spike threshold crossing is caused by input variance around a steady state and spiking is Poisson. In integrate-and-fire neurons matched to cortical cell physiology, steady-state behavior is predominant, and ISIs are highly variable at all physiological firing rates and for a wide range of inhibitory and excitatory inputs.

••

TL;DR: It is shown that networks of relatively realistic mathematical models for biological neurons in principle can simulate arbitrary feedforward sigmoidal neural nets in a way that has previously not been considered and are universal approximators in the sense that they can approximate with regard to temporal coding any given continuous function of several variables.

Abstract: We show that networks of relatively realistic mathematical models for biological neurons in principle can simulate arbitrary feedforward sigmoidal neural nets in a way that has previously not been considered. This new approach is based on temporal coding by single spikes (respectively by the timing of synchronous firing in pools of neurons) rather than on the traditional interpretation of analog variables in terms of firing rates. The resulting new simulation is substantially faster and hence more consistent with experimental results about the maximal speed of information processing in cortical neural systems. As a consequence we can show that networks of noisy spiking neurons are “universal approximators” in the sense that they can approximate with regard to temporal coding any given continuous function of several variables. This result holds for a fairly large class of schemes for coding analog variables by firing times of spiking neurons. This new proposal for the possible organization of computations ...

••

TL;DR: A precise understanding of how Occam's razor, the principle that simpler models should be preferred until the data justify more complex models, is automatically embodied by probability theory is arrived at.

Abstract: The task of parametric model selection is cast in terms of a statistical mechanics on the space of probability distributions. Using the techniques of low-temperature expansions, I arrive at a systematic series for the Bayesian posterior probability of a model family that significantly extends known results in the literature. In particular, I arrive at a precise understanding of how Occam’s razor, the principle that simpler models should be preferred until the data justify more complex models, is automatically embodied by probability theory. These results require a measure on the space of model parameters and I derive and discuss an interpretation of Jeffreys’ prior distribution as a uniform prior over the distributions indexed by a family. Finally, I derive a theoretical index of the complexity of a parametric family relative to some true distribution that I call the razor of the model. The form of the razor immediately suggests several interesting questions in the theory of learning that can be studied using the techniques of statistical mechanics.

••

TL;DR: The no-free-lunch theorems have sparked heated debate in the computational learning community and a broader class of cross-validation is considered, when used more strictly, can yield the expected results on simple examples.

Abstract: The “no-free-lunch” theorems (Wolpert & Macready, 1995) have sparked heated debate in the computational learning community. A recent communication (Zhu & Rohwer, 1996) attempts to demonstrate the inefficiency of cross-validation on a simple problem. We elaborate on this result by considering a broader class of cross-validation. When used more strictly, cross-validation can yield the expected results on simple examples.

••

TL;DR: It is shown circumstances under which applying the RPP is guaranteed to increase the mean return, even though it can make large changes in the values of the parameters.

Abstract: We discuss Hinton's (1989) relative payoff procedure (RPP), a static reinforcement learning algorithm whose foundation is not stochastic gradient ascent. We show circumstances under which applying the RPP is guaranteed to increase the mean return, even though it can make large changes in the values of the parameters. The proof is based on a mapping between the RPP and a form of the expectation-maximization procedure of Dempster, Laird, and Rubin (1977).

••

TL;DR: The effectiveness of this penalty function for pruning feedforward neural network by weight elimination is tested on three well-known problems: the contiguity problem, the parity problems, and the monks problems.

Abstract: This article proposes the use of a penalty function for pruning feedforward neural network by weight elimination. The penalty function proposed consists of two terms. The first term is to discourage the use of unnecessary connections, and the second term is to prevent the weights of the connections from taking excessively large values. Simple criteria for eliminating weights from the network are also given. The effectiveness of this penalty function is tested on three well-known problems: the contiguity problem, the parity problems, and the monks problems. The resulting pruned networks obtained for many of these problems have fewer connections than previously reported in the literature.

••

Tokai University

^{1}TL;DR: It is suggested that the simple network model provides a mathematical basis for understanding neural selection mechanisms and the strength of lateral inhibition relative to that of self-inhibition is crucial for determining the steady states of the network among three qualitatively different types of behavior.

Abstract: A neuroecological equation of the Lotka-Volterra type for mean firing rate is derived from the conventional membrane dynamics of a neural network with lateral inhibition and self-inhibition. Neural selection mechanisms employed by the competitive neural network receiving external input sare studied with analytic and numerical calculations. A remarkable finding is that the strength of lateral inhibition relative to that of self-inhibition is crucial for determining the steady states of the network among three qualitatively different types of behavior. Equal strength of both types of inhibitory connections leads the network to the well-known winner-take-all behavior. If, however, the lateral inhibition is weaker than the self-inhibition, a certain number of neurons are activated in the steady states or the number of winners is in general more than one (the winners-share-all behavior). On the other hand, if the self-inhibition is weaker than the lateral one, only one neuron is activated, but the winner is no...

••

TL;DR: In this article, a Potts spin is assigned to each data point and an interaction between neighboring points, whose strength is a decreasing function of the distance between the neighbors, is introduced.

Abstract: We present a new approach to clustering, based on the physical properties of an inhomogeneous ferromagnet. No assumption is made regarding the underlying distribution of the data. We assign a Potts spin to each data point and introduce an interaction between neighboring points, whose strength is a decreasing function of the distance between the neighbors. This magnetic system exhibits three phases. At very low temperatures, it is completely ordered; all spins are aligned. At very high temperatures, the system does not exhibit any ordering, and in an intermediate regime, clusters of relatively strongly coupled spins become ordered, whereas different clusters remain uncorrelated. This intermediate phase is identified by a jump in the order parameters. The spin-spin correlation function is used to partition the spins and the corresponding data points into clusters. We demonstrate on three synthetic and three real data sets how the method works. Detailed comparison to the performance of other techniques clearly indicates the relative success of our method.

••

TL;DR: An algorithm for extracting rules from a standard three-layer feedforward neural network is proposed that can extract reasonably compact rule sets that have high predictive accuracy rates and is shown to work using real-world data arising from molecular biology and signal processing.

Abstract: An algorithm for extracting rules from a standard three-layer feedforward neural network is proposed. The trained network is first pruned not only to remove redundant connections in the network but, more important, to detect the relevant inputs. The algorithm generates rules from the pruned network by considering only a small number of activation values at the hidden units. If the number of inputs connected to a hidden unit is sufficiently small, then rules that describe how each of its activation values is obtained can be readily generated. Otherwise the hidden unit will be split and treated as output units, with each output unit corresponding to an activation value. A hidden layer is inserted and a new subnetwork is formed, trained, and pruned. This process is repeated until every hidden unit in the network has a relatively small number of input units connected to it. Examples on how the proposed algorithm works are shown using real-world data arising from molecular biology and signal processing. Our results show that for these complex problems, the algorithm can extract reasonably compact rule sets that have high predictive accuracy rates.

••

TL;DR: The adaptive-subspace self-organizing map (ASSOM) is a modular neural network architecture, the modules of which learn to identify input patterns subject to some simple transformations.

Abstract: The adaptive-subspace self-organizing map (ASSOM) is a modular neural network architecture, the modules of which learn to identify input patterns subject to some simple transformations. The learnin...

••

TL;DR: A self-organizing neural network model is presented for the simultaneous and cooperative development of topographic receptive fields and lateral interactions in cortical maps and explains why lateral connection patterns closely follow receptive field properties such as ocular dominance.

Abstract: A self-organizing neural network model for the simultaneous development of topographic receptive fields and lateral interactions in cortical maps is presented. Both afferent and lateral connections adapt by the same Hebbian mechanism in a purely local and unsupervised learning process. Afferent input weights of each neuron self-organize into hill-shaped profiles, receptive fields organize topographically across the network, and unique lateral interaction profiles develop for each neuron. The model suggests that precise cortical maps develop only if the initial receptive fields are topographically ordered or if they cover the whole receptive surface. It demonstrates how patterned lateral connections develop based on correlated activity, and explains why lateral connection patterns closely follow receptive field properties such as ocular dominance. The model predicts a dual role for lateral connections: to support self-organization of receptive fields, and to represent low-level Gestalt knowledge acquired during development of the cortex.

••

IBM

^{1}TL;DR: In this article, the authors present several additive corrections to the conventional quadratic loss bias-plus-variance formula, which is appropriate for measuring full generalization error over a test set rather than (as with conventional bias plus variance) error at a single point.

Abstract: This article presents several additive corrections to the conventional quadratic loss bias-plus-variance formula. One of these corrections is appropriate when both the target is not fixed (as in Bayesian analysis) and training sets are averaged over (as in the conventional bias plus variance formula). Another additive correction casts conventional fixed-trainingset Bayesian analysis directly in terms of bias plus variance. Another correction is appropriate for measuring full generalization error over a test set rather than (as with conventional bias plus variance) error at a single point. Yet another correction can help explain the recent counterintuitive bias-variance decomposition of Friedman for zero-one loss. After presenting these corrections, this article discusses some other loss function-specific aspects of supervised learning. In particular, there is a discussion of the fact that if the loss function is a metric (e.g., zero-one loss), then there is bound on the change in generalization error acco...

••

TL;DR: When the injected noise is gaussian, noise injection is naturally connected to the action of the heat kernel, and the connection between noise injection and heat kernel also enables controlling the fluctuations of the random perturbed cost function.

Abstract: Noise injection consists of adding noise to the inputs during neural network training. Experimental results suggest that it might improve the generalization ability of the resulting neural network. A justification of this improvement remains elusive: describing analytically the average perturbed cost function is difficult, and controlling the fluctuations of the random perturbed cost function is hard. Hence, recent papers suggest replacing the random perturbed cost by a (deterministic) Taylor approximation of the average perturbed cost function. This article takes a different stance: when the injected noise is gaussian, noise injection is naturally connected to the action of the heat kernel. This provides indications on the relevance domain of traditional Taylor expansions and shows the dependence of the quality of Taylor approximations on global smoothness properties of neural networks under consideration. The connection between noise injection and heat kernel also enables controlling the fluctuations of...

••

Laval University

^{1}TL;DR: A conductance-based model of Na+ and K+ currents underlying action potential generation is introduced by simplifying the quantitative model of Hodgkin and Huxley (HH), which generates action potentials very similar to the HH model but are computationally faster.

Abstract: A conductance-based model of Na+ and K+ currents underlying action potential generation is introduced by simplifying the quantitative model of Hodgkin and Huxley (HH). If the time course of rate co...