
Showing papers on "Dimensionality reduction published in 2003"


Journal ArticleDOI
TL;DR: In this article, the authors proposed a geometrically motivated algorithm for representing high-dimensional data, based on the correspondence between the graph Laplacian, the Laplace Beltrami operator on the manifold and the connections to the heat equation.
Abstract: One of the central problems in machine learning and pattern recognition is to develop appropriate representations for complex data. We consider the problem of constructing a representation for data lying on a low-dimensional manifold embedded in a high-dimensional space. Drawing on the correspondence between the graph Laplacian, the Laplace Beltrami operator on the manifold, and the connections to the heat equation, we propose a geometrically motivated algorithm for representing the high-dimensional data. The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality-preserving properties and a natural connection to clustering. Some potential applications and illustrative examples are discussed.
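The construction described above (a nearest-neighbor graph with heat-kernel weights, then the bottom eigenvectors of a generalized eigenproblem involving the graph Laplacian) can be sketched in a few lines. The following is a minimal NumPy/SciPy sketch under assumed defaults (kNN graph, heat-kernel width t), not the authors' reference implementation:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def laplacian_eigenmap(X, n_neighbors=10, t=1.0, n_components=2):
    """Minimal sketch: kNN graph with heat-kernel weights, graph Laplacian L = D - W,
    then the generalized eigenproblem L y = lambda D y; the bottom nonzero
    eigenvectors give the low-dimensional coordinates."""
    n = X.shape[0]
    sq = cdist(X, X, 'sqeuclidean')
    W = np.zeros((n, n))
    idx = np.argsort(sq, axis=1)[:, 1:n_neighbors + 1]   # nearest neighbors, skipping self
    for i in range(n):
        W[i, idx[i]] = np.exp(-sq[i, idx[i]] / t)
    W = np.maximum(W, W.T)                               # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W
    vals, vecs = eigh(L, D)                              # ascending eigenvalues
    return vecs[:, 1:n_components + 1]                   # drop the constant eigenvector
```

scikit-learn's SpectralEmbedding implements the same idea with more careful handling of disconnected graphs and sparse eigensolvers.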

7,210 citations


Proceedings Article
09 Dec 2003
TL;DR: These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set by finding the optimal linear approximations to the eigenfunctions of the Laplace Beltrami operator on the manifold.
Abstract: Many problems in information processing involve some form of dimensionality reduction. In this paper, we introduce Locality Preserving Projections (LPP). These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set. LPP should be seen as an alternative to Principal Component Analysis (PCA) – a classical linear technique that projects the data along the directions of maximal variance. When the high dimensional data lies on a low dimensional manifold embedded in the ambient space, the Locality Preserving Projections are obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace Beltrami operator on the manifold. As a result, LPP shares many of the data representation properties of nonlinear techniques such as Laplacian Eigenmaps or Locally Linear Embedding. Yet LPP is linear and more crucially is defined everywhere in ambient space rather than just on the training data points. This is borne out by illustrative examples on some high dimensional data sets.
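With the same graph construction, LPP replaces the nonlinear embedding by a linear projection obtained from a generalized eigenproblem on the data matrix. A minimal sketch, assuming rows of X are samples and adding a small ridge term for numerical stability (an assumption, not part of the paper):

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def lpp(X, n_neighbors=10, t=1.0, n_components=2):
    """Minimal LPP sketch: build the graph Laplacian as in Laplacian Eigenmaps, then solve
    X^T L X a = lambda X^T D X a and keep the eigenvectors with the smallest eigenvalues."""
    n = X.shape[0]
    sq = cdist(X, X, 'sqeuclidean')
    W = np.zeros((n, n))
    idx = np.argsort(sq, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        W[i, idx[i]] = np.exp(-sq[i, idx[i]] / t)
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    L = D - W
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-8 * np.eye(X.shape[1])  # small ridge in case B is singular (assumption)
    vals, vecs = eigh(A, B)                      # ascending eigenvalues
    P = vecs[:, :n_components]                   # linear projective map, defined everywhere
    return X @ P, P
```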

4,318 citations


Proceedings Article
21 Aug 2003
TL;DR: A novel concept, predominant correlation, is introduced, and a fast filter method is proposed which can identify relevant features as well as redundancy among relevant features without pairwise correlation analysis.
Abstract: Feature selection, as a preprocessing step to machine learning, is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase in the dimensionality of data poses a severe challenge to many existing feature selection methods with respect to efficiency and effectiveness. In this work, we introduce a novel concept, predominant correlation, and propose a fast filter method which can identify relevant features as well as redundancy among relevant features without pairwise correlation analysis. The efficiency and effectiveness of our method are demonstrated through extensive comparisons with other methods using real-world data of high dimensionality.
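A simplified sketch of the filter idea on discrete (e.g., pre-binned) features, assuming symmetrical uncertainty as the correlation measure; the paper's exact procedure and thresholds may differ:

```python
import numpy as np

def entropy(x):
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetrical_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a normalized correlation measure."""
    hx, hy = entropy(x), entropy(y)
    pairs = np.stack([x, y], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    hxy = -np.sum(p * np.log2(p))
    return 2.0 * (hx + hy - hxy) / (hx + hy) if hx + hy > 0 else 0.0

def fast_filter(X, y, delta=0.0):
    """Rank features by SU with the class, then drop any feature whose correlation with an
    already-kept feature exceeds its correlation with the class (predominant-correlation idea)."""
    su_class = np.array([symmetrical_uncertainty(X[:, j], y) for j in range(X.shape[1])])
    order = [j for j in np.argsort(-su_class) if su_class[j] > delta]
    selected = []
    for j in order:
        if all(symmetrical_uncertainty(X[:, j], X[:, k]) < su_class[j] for k in selected):
            selected.append(j)
    return selected
```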

2,251 citations


Journal ArticleDOI
TL;DR: Locally linear embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data, is described and several extensions that enhance its performance are discussed.
Abstract: The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation. Here we describe locally linear embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data. The data, assumed to be sampled from an underlying manifold, are mapped into a single global coordinate system of lower dimensionality. The mapping is derived from the symmetries of locally linear reconstructions, and the actual computation of the embedding reduces to a sparse eigenvalue problem. Notably, the optimizations in LLE---though capable of generating highly nonlinear embeddings---are simple to implement, and they do not involve local minima. In this paper, we describe the implementation of the algorithm in detail and discuss several extensions that enhance its performance. We present results of the algorithm applied to data sampled from known manifolds, as well as to collections of images of faces, lips, and handwritten digits. These examples are used to provide extensive illustrations of the algorithm's performance---both successes and failures---and to relate the algorithm to previous and ongoing work in nonlinear dimensionality reduction.
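For experimentation, LLE is available off the shelf; a minimal usage example with scikit-learn's implementation (parameters and data are placeholders, not the paper's settings):

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

X = np.random.rand(500, 20)                        # placeholder data; rows are samples
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Y = lle.fit_transform(X)                           # neighborhood-preserving 2-D embedding
```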

1,614 citations


Proceedings Article
09 Dec 2003
TL;DR: A unified framework for extending Local Linear Embedding, Isomap, Laplacian Eigenmaps, Multi-Dimensional Scaling as well as for Spectral Clustering is provided.
Abstract: Several unsupervised learning algorithms based on an eigendecomposition provide either an embedding or a clustering only for given training points, with no straightforward extension for out-of-sample examples short of recomputing eigenvectors. This paper provides a unified framework for extending Local Linear Embedding (LLE), Isomap, Laplacian Eigenmaps, Multi-Dimensional Scaling (for dimensionality reduction) as well as for Spectral Clustering. This framework is based on seeing these algorithms as learning eigenfunctions of a data-dependent kernel. Numerical experiments show that the generalizations performed have a level of error comparable to the variability of the embedding algorithms due to the choice of training data.
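The out-of-sample extension can be summarized by the Nystrom formula: a new point is embedded by projecting its kernel values onto the training eigenvectors. A rough sketch that omits the kernel centering and the data-dependent eigenfunction normalization discussed in the paper:

```python
import numpy as np

def nystrom_embed(K_train, K_new, n_components=2):
    """K_train: (n, n) kernel matrix on training points; K_new: (m, n) kernel values between
    new points and training points. Training points embed as sqrt(lambda_k) * v_k; a new point
    embeds as K_new @ v_k / sqrt(lambda_k), which reproduces the training embedding exactly
    when the new point coincides with a training point."""
    vals, vecs = np.linalg.eigh(K_train)
    order = np.argsort(vals)[::-1][:n_components]     # top eigenpairs (assumed positive)
    lam, V = vals[order], vecs[:, order]
    train_embed = V * np.sqrt(lam)
    new_embed = (K_new @ V) / np.sqrt(lam)
    return train_embed, new_embed
```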

1,072 citations


Journal ArticleDOI
TL;DR: The algorithm, which is based on dimensionality reduction and partial Voronoi diagram construction, can be used for computing the DT for a wide class of distance functions, including the L_p and chamfer metrics.
Abstract: A sequential algorithm is presented for computing the exact Euclidean distance transform (DT) of a k-dimensional binary image in time linear in the total number of voxels N. The algorithm, which is based on dimensionality reduction and partial Voronoi diagram construction, can be used for computing the DT for a wide class of distance functions, including the L_p and chamfer metrics. At each dimension level, the DT is computed by constructing the intersection of the Voronoi diagram whose sites are the feature voxels with each row of the image. This construction is performed efficiently by using the DT in the next lower dimension. The correctness and linear time complexity are demonstrated analytically and verified experimentally. The algorithm may be of practical value since it is relatively simple and easy to implement and it is relatively fast (not only does it run in O(N) time but the time constant is small). A simple modification of the algorithm computes the weighted Euclidean DT, which is useful for images with anisotropic voxel dimensions. A parallel version of the algorithm runs in O(N/p) time with p processors.
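The dimension-by-dimension idea can be illustrated with a closely related separable scheme, a Felzenszwalb-Huttenlocher-style 1-D lower-envelope pass applied along each axis in turn, rather than the paper's partial-Voronoi construction. A sketch for the squared Euclidean DT of a k-dimensional binary image:

```python
import numpy as np

INF = 1e20   # large finite value standing in for "no feature yet"

def dt1d(f):
    """1-D squared-distance transform of sampled costs f via the lower envelope of parabolas."""
    n = len(f)
    d = np.empty(n)
    v = np.zeros(n, dtype=int)   # parabola locations in the lower envelope
    z = np.empty(n + 1)          # boundaries between parabolas
    k = 0
    z[0], z[1] = -INF, INF
    for q in range(1, n):
        s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        while s <= z[k]:
            k -= 1
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        k += 1
        v[k] = q
        z[k], z[k + 1] = s, INF
    k = 0
    for q in range(n):
        while z[k + 1] < q:
            k += 1
        d[q] = (q - v[k]) ** 2 + f[v[k]]
    return d

def squared_edt(binary):
    """Squared Euclidean DT of a k-d binary image: start with 0 at feature voxels and INF
    elsewhere, then apply the 1-D transform along each axis (the dimensionality-reduction idea)."""
    f = np.where(binary, 0.0, INF)
    for axis in range(f.ndim):
        f = np.apply_along_axis(dt1d, axis, f)
    return f
```

Taking np.sqrt of the result gives the Euclidean DT itself.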

907 citations


Journal ArticleDOI
TL;DR: An empirical study is conducted to examine the pros and cons of these search methods, to give some guidelines on choosing a search method, and to compare the classifier error rates before and after feature selection.

846 citations


Journal ArticleDOI
TL;DR: The algorithm for feature selection is based on an application of a rough set method to the result of principal components analysis (PCA) used for feature projection and reduction.

801 citations


Journal ArticleDOI
TL;DR: The experiments show that SVM with feature extraction using PCA, KPCA, or ICA performs better than SVM without feature extraction; among the three methods, KPCA-based feature extraction gives the best performance, followed by ICA.

524 citations


Journal Article
TL;DR: The method constructs a series of sparse linear SVMs to generate linear models that can generalize well, and uses a subset of nonzero weighted variables found by the linear models to produce a final nonlinear model.
Abstract: We describe a methodology for performing variable ranking and selection using support vector machines (SVMs). The method constructs a series of sparse linear SVMs to generate linear models that can generalize well, and uses a subset of nonzero weighted variables found by the linear models to produce a final nonlinear model. The method exploits the fact that a linear SVM (no kernels) with l1-norm regularization inherently performs variable selection as a side-effect of minimizing capacity of the SVM model. The distribution of the linear model weights provides a mechanism for ranking and interpreting the effects of variables. Starplots are used to visualize the magnitude and variance of the weights for each variable. We illustrate the effectiveness of the methodology on synthetic data, benchmark problems, and challenging regression problems in drug design. This method can dramatically reduce the number of variables and outperforms SVMs trained using all attributes and using the attributes selected according to correlation coefficients. The visualization of the resulting models is useful for understanding the role of underlying variables.
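A hedged sketch of the variable-ranking step, using scikit-learn's l1-penalized linear SVM in place of the paper's sparse linear SVM formulation; the data here are synthetic placeholders:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))                            # placeholder data
y = (X[:, 0] + 0.5 * X[:, 3] + 0.1 * rng.standard_normal(200) > 0).astype(int)

Xs = StandardScaler().fit_transform(X)                        # put variables on a common scale
svm = LinearSVC(penalty='l1', dual=False, C=0.1, max_iter=5000).fit(Xs, y)
weights = np.abs(svm.coef_).ravel()
ranking = np.argsort(-weights)                                # rank variables by |weight|
selected = np.flatnonzero(weights > 1e-6)                     # nonzero-weight variables
# A nonlinear model (e.g., a kernel SVM) would then be trained on X[:, selected].
```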

481 citations


Proceedings ArticleDOI
18 Jun 2003
TL;DR: A dimensionality reduction algorithm that enables subspace analysis within the multilinear framework, based on a tensor decomposition known as the N-mode SVD, the natural extension to tensors of the conventional matrix singular value decomposition (SVD).
Abstract: Multilinear algebra, the algebra of higher-order tensors, offers a potent mathematical framework for analyzing ensembles of images resulting from the interaction of any number of underlying factors. We present a dimensionality reduction algorithm that enables subspace analysis within the multilinear framework. This N-mode orthogonal iteration algorithm is based on a tensor decomposition known as the N-mode SVD, the natural extension to tensors of the conventional matrix singular value decomposition (SVD). We demonstrate the power of multilinear subspace analysis in the context of facial image ensembles, where the relevant factors include different faces, expressions, viewpoints, and illuminations. In prior work we showed that our multilinear representation, called TensorFaces, yields superior facial recognition rates relative to standard, linear (PCA/eigenfaces) approaches. We demonstrate factor-specific dimensionality reduction of facial image ensembles. For example, we can suppress illumination effects (shadows, highlights) while preserving detailed facial features, yielding a low perceptual error.
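A minimal sketch of the truncated N-mode SVD the algorithm builds on: a per-mode basis from the SVD of each unfolding, plus a core tensor obtained by projecting onto those bases. The paper's N-mode orthogonal iteration then refines these bases, which is omitted here:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: the chosen mode becomes the rows; all other modes are flattened."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def n_mode_svd(T, ranks):
    """Truncated N-mode SVD: mode-n basis U_n from the SVD of the mode-n unfolding,
    core tensor obtained by projecting each mode onto its basis. Factor-specific
    dimensionality reduction corresponds to truncating the rank of the chosen mode."""
    Us = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        Us.append(U[:, :r])
    core = T.copy()
    for mode, U in enumerate(Us):
        core = np.moveaxis(np.tensordot(U.T, core, axes=(1, mode)), 0, mode)
    return core, Us
```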

Journal ArticleDOI
TL;DR: The experimental results indicate that the classification accuracy is increased significantly under parallel feature fusion and also demonstrate that the developed parallel fusion is more effective than the classical serial feature fusion.

Journal ArticleDOI
TL;DR: An abstract framework for integrating multiple feature spaces in the k-means clustering algorithm is presented and the effectiveness of feature weighting in clustering on several different application domains is demonstrated.
Abstract: Data sets with multiple, heterogeneous feature spaces occur frequently. We present an abstract framework for integrating multiple feature spaces in the k-means clustering algorithm. Our main ideas are (i) to represent each data object as a tuple of multiple feature vectors, (ii) to assign a suitable (and possibly different) distortion measure to each feature space, (iii) to combine distortions on different feature spaces, in a convex fashion, by assigning (possibly) different relative weights to each, (iv) for a fixed weighting, to cluster using the proposed convex k-means algorithm, and (v) to determine the optimal feature weighting to be the one that yields the clustering that simultaneously minimizes the average within-cluster dispersion and maximizes the average between-cluster dispersion along all the feature spaces. Using precision/recall evaluations and known ground truth classifications, we empirically demonstrate the effectiveness of feature weighting in clustering on several different application domains.
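A rough sketch of the fixed-weight case: each object is a tuple of feature vectors (one per feature space), the distortion is a convex combination of per-space squared distances, and a standard k-means loop alternates assignments and centroid updates. The search over weightings described above is not shown:

```python
import numpy as np

def convex_kmeans(views, weights, k, n_iter=50, seed=0):
    """`views` is a list of (n, d_m) arrays, one feature space per view; `weights` are
    nonnegative and sum to one. Distortion(x, c) = sum_m w_m * ||x_m - c_m||^2."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    labels = rng.integers(0, k, size=n)
    for _ in range(n_iter):
        centers = [np.vstack([V[labels == c].mean(axis=0) if np.any(labels == c)
                              else V[rng.integers(0, n)]          # reseed empty clusters
                              for c in range(k)]) for V in views]
        dist = sum(w * ((V[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
                   for w, V, C in zip(weights, views, centers))   # convex combination
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```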

Proceedings ArticleDOI
02 Nov 2003
TL;DR: This paper investigates different cross-modal association methods using the linear correlation model, and introduces a novel method for cross-modal association called Cross-modal Factor Analysis (CFA), which shows several advantages in analysis performance and feature usage.
Abstract: Multimodal information processing has received considerable attention in recent years. The focus of existing research in this area has been predominantly on the use of fusion technology. In this paper, we suggest that cross-modal association can provide a new set of powerful solutions in this area. We investigate different cross-modal association methods using the linear correlation model. We also introduce a novel method for cross-modal association called Cross-modal Factor Analysis (CFA). Our earlier work on Latent Semantic Indexing (LSI) is extended for applications that use off-line supervised training. As a promising research direction and practical application of cross-modal association, cross-modal information retrieval, where queries from one modality are used to search for content in another modality using low-level features, is then discussed in detail. Different association methods are tested and compared using the proposed cross-modal retrieval system. All these methods achieve significant dimensionality reduction. Among them, CFA gives the best retrieval performance. Finally, this paper addresses the use of cross-modal association to detect talking heads. The CFA method achieves 91.1% detection accuracy, while LSI and Canonical Correlation Analysis (CCA) achieve 66.1% and 73.9% accuracy, respectively. As shown by experiments, cross-modal association provides many useful benefits, such as robust noise resistance and effective feature selection. Compared to CCA and LSI, the proposed CFA shows several advantages in analysis performance and feature usage. Its capability in feature selection and noise resistance also makes CFA a promising tool for many multimedia analysis applications.

Journal ArticleDOI
01 Sep 2003
TL;DR: A computer-aided diagnostic (CAD) system for the classification of hepatic lesions from computed tomography (CT) images is presented and shows that genetic algorithms result in lower dimension feature vectors and improved classification performance.
Abstract: In this paper, a computer-aided diagnostic (CAD) system for the classification of hepatic lesions from computed tomography (CT) images is presented. Regions of interest (ROIs) taken from nonenhanced CT images of normal liver, hepatic cysts, hemangiomas, and hepatocellular carcinomas have been used as input to the system. The proposed system consists of two modules: the feature extraction and the classification modules. The feature extraction module calculates the average gray level and 48 texture characteristics, which are derived from the spatial gray-level co-occurrence matrices, obtained from the ROIs. The classifier module consists of three sequentially placed feed-forward neural networks (NNs). The first NN classifies into normal or pathological liver regions. The pathological liver regions are characterized by the second NN as cyst or "other disease". The third NN classifies "other disease" into hemangioma or hepatocellular carcinoma. Three feature selection techniques have been applied to each individual NN: the sequential forward selection, the sequential floating forward selection, and a genetic algorithm for feature selection. The comparative study of the above dimensionality reduction methods shows that genetic algorithms result in lower dimension feature vectors and improved classification performance.

Proceedings ArticleDOI
24 Aug 2003
TL;DR: It is found that the random projection approach predictively underperforms PCA, but its computational advantages may make it attractive for certain applications.
Abstract: Dimensionality reduction via Random Projections has attracted considerable attention in recent years. The approach has interesting theoretical underpinnings and offers computational advantages. In this paper we report a number of experiments to evaluate Random Projections in the context of inductive supervised learning. In particular, we compare Random Projections and PCA on a number of different datasets and using different machine learning methods. While we find that the random projection approach predictively underperforms PCA, its computational advantages may make it attractive for certain applications.
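A minimal sketch of the two reductions compared in the experiments, using scikit-learn; the paper's datasets and learners are not reproduced here. Note that the random projection only draws a random matrix, while PCA must be fit to the data, which is the source of the computational advantage mentioned above:

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.decomposition import PCA

X = np.random.randn(1000, 500)                                   # placeholder data
X_rp = GaussianRandomProjection(n_components=50, random_state=0).fit_transform(X)
X_pca = PCA(n_components=50).fit_transform(X)
# Any downstream classifier can then be trained on X_rp and X_pca and compared.
```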

Journal ArticleDOI
TL;DR: Functional relationships predicted by the new analysis are compared with those predicted using standard approaches; validation using bioinformatic databases suggests predictions using the new approach may be up to twice as accurate as some conventional approaches.
Abstract: The availability of parallel, high-throughput biological experiments that simultaneously monitor thousands of cellular observables provides an opportunity for investigating cellular behavior in a highly quantitative manner at multiple levels of resolution. One challenge to more fully exploit new experimental advances is the need to develop algorithms to provide an analysis at each of the relevant levels of detail. Here, the data analysis method non-negative matrix factorization (NMF) has been applied to the analysis of gene array experiments. Whereas current algorithms identify relationships on the basis of large-scale similarity between expression patterns, NMF is a recently developed machine learning technique capable of recognizing similarity between subportions of the data corresponding to localized features in expression space. A large data set consisting of 300 genome-wide expression measurements of yeast was used as sample data to illustrate the performance of the new approach. Local features detected are shown to map well to functional cellular subsystems. Functional relationships predicted by the new analysis are compared with those predicted using standard approaches; validation using bioinformatic databases suggests predictions using the new approach may be up to twice as accurate as some conventional approaches.
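A minimal sketch of NMF with the standard multiplicative updates (Lee-Seung, Frobenius objective); the paper's gene-array analysis adds rank selection and interpretation of the resulting local features, which is not shown:

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (genes x experiments) as V ~ W H with W, H >= 0.
    Columns of W act as local expression features; rows of H give their activation."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # multiplicative updates keep entries nonnegative
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```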

Journal ArticleDOI
TL;DR: A feature extraction method is presented that utilizes an error estimation equation based on the Bhattacharyya distance; the classification errors in the transformed feature space, estimated with this equation, serve as the criterion for feature extraction.

Journal ArticleDOI
TL;DR: The minimum classification error (MCE) training algorithm (which was originally proposed for optimizing classifiers) is investigated for feature extraction, and a generalized MCE (GMCE) training algorithm is proposed to mend the shortcomings of the MCE training algorithm.

Journal ArticleDOI
TL;DR: It is shown that automatic wavelet reduction yields better or comparable classification accuracy for hyperspectral data, while achieving substantial computational savings.
Abstract: Hyperspectral imagery provides richer information about materials than multispectral imagery. The new, larger data volumes from hyperspectral sensors present a challenge for traditional processing techniques. For example, the identification of each ground surface pixel by its corresponding spectral signature is still difficult because of the immense volume of data. Conventional classification methods may not be used without dimension reduction preprocessing. This is due to the curse of dimensionality, which refers to the fact that the sample size needed to estimate a function of several variables to a given degree of accuracy grows exponentially with the number of variables. Principal component analysis (PCA) has been the technique of choice for dimension reduction. However, PCA is computationally expensive and does not eliminate anomalies that can be seen at one arbitrary band. Spectral data reduction using automatic wavelet decomposition could be useful, because it preserves the distinctions among spectral signatures, is computed in an automatic fashion, and can filter data anomalies. This is due to the intrinsic properties of wavelet transforms, which preserve high- and low-frequency features, therefore preserving peaks and valleys found in typical spectra. Compared to PCA, for the same level of data reduction, we show that automatic wavelet reduction yields better or comparable classification accuracy for hyperspectral data, while achieving substantial computational savings.
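A rough sketch of band reduction by a 1-D wavelet transform along the spectral axis, keeping only the coarse approximation coefficients; the paper's method selects the decomposition level automatically, which is not shown (PyWavelets and a fixed db3 wavelet are assumptions):

```python
import numpy as np
import pywt

def wavelet_reduce(cube, wavelet='db3', level=3):
    """Reduce the spectral dimension of a hyperspectral cube (rows, cols, bands) by a
    level-`level` DWT along each pixel's spectrum, keeping only approximation coefficients."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands)
    coeffs = pywt.wavedec(pixels, wavelet, level=level, axis=1)
    return coeffs[0].reshape(rows, cols, -1)       # coarse approximation per pixel
```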

Journal ArticleDOI
TL;DR: This paper shows that all clustering methods which are invariant under additive shifts of the pairwise proximities can be reformulated as grouping problems in Euclidean spaces, with complete preservation of the cluster structure in the embedding space.
Abstract: For several major applications of data analysis, objects are often not represented as feature vectors in a vector space, but rather by a matrix gathering pairwise proximities. Such pairwise data often violates metricity and, therefore, cannot be naturally embedded in a vector space. Concerning the problem of unsupervised structure detection or clustering, in this paper, a new embedding method for pairwise data into Euclidean vector spaces is introduced. We show that all clustering methods, which are invariant under additive shifts of the pairwise proximities, can be reformulated as grouping problems in Euclidean spaces. The most prominent property of this constant shift embedding framework is the complete preservation of the cluster structure in the embedding space. Restating pairwise clustering problems in vector spaces has several important consequences, such as the statistical description of the clusters by way of cluster prototypes, the generic extension of the grouping procedure to a discriminative prediction rule, and the applicability of standard preprocessing methods like denoising or dimensionality reduction.
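A rough sketch of the constant-shift idea under these definitions: center the pairwise dissimilarities, shift the spectrum so the resulting similarity matrix is positive semidefinite, and read off MDS-style coordinates. The paper derives the exact shift and its cluster-preservation guarantee, which this sketch does not reproduce:

```python
import numpy as np

def constant_shift_embedding(D, n_components=2):
    """D is a symmetric (n, n) matrix of pairwise dissimilarities (possibly nonmetric)."""
    n = D.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n
    S = -0.5 * Q @ D @ Q                       # centered similarity matrix
    vals, vecs = np.linalg.eigh(S)
    shifted = vals - vals.min()                # constant shift makes the spectrum nonnegative
    order = np.argsort(shifted)[::-1][:n_components]
    return vecs[:, order] * np.sqrt(shifted[order])   # classical-MDS-style coordinates
```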

Journal ArticleDOI
Heiko Wersing, Edgar Körner
TL;DR: This work proposes a feedforward model for recognition that shares components like weight sharing, pooling stages, and competitive nonlinearities with earlier approaches but focuses on new methods for learning optimal feature-detecting cells in intermediate stages of the hierarchical network.
Abstract: There is an ongoing debate over the capabilities of hierarchical neural feedforward architectures for performing real-world invariant object recognition. Although a variety of hierarchical models exists, appropriate supervised and unsupervised learning methods are still an issue of intense research. We propose a feedforward model for recognition that shares components like weight sharing, pooling stages, and competitive nonlinearities with earlier approaches but focuses on new methods for learning optimal feature-detecting cells in intermediate stages of the hierarchical network. We show that principles of sparse coding, which were previously mostly applied to the initial feature detection stages, can also be employed to obtain optimized intermediate complex features. We suggest a new approach to optimize the learning of sparse features under the constraints of a weight-sharing or convolutional architecture that uses pooling operations to achieve gradual invariance in the feature hierarchy. The approach explicitly enforces symmetry constraints like translation invariance on the feature set. This leads to a dimension reduction in the search space of optimal features and allows determining more efficiently the basis representatives, which achieve a sparse decomposition of the input. We analyze the quality of the learned feature representation by investigating the recognition performance of the resulting hierarchical network on object and face databases. We show that a hierarchy with features learned on a single object data set can also be applied to face recognition without parameter changes and is competitive with other recent machine learning recognition approaches. To investigate the effect of the interplay between sparse coding and processing nonlinearities, we also consider alternative feedforward pooling nonlinearities such as presynaptic maximum selection and sum-of-squares integration. The comparison shows that a combination of strong competitive nonlinearities with sparse coding offers the best recognition performance in the difficult scenario of segmentation-free recognition in cluttered surround. We demonstrate that for both learning and recognition, a precise segmentation of the objects is not necessary.

Journal ArticleDOI
TL;DR: An adaptive dimension reduction method for generalized semi-parametric regression models is used that allows one to solve the 'curse of dimensionality' problem arising in the context of expression data.
Abstract: Motivation: One particular application of microarray data is to uncover the molecular variation among cancers. One feature of microarray studies is the fact that the number n of samples collected is relatively small compared to the number p of genes per sample, which is usually in the thousands. In statistical terms, this very large number of predictors compared to a small number of samples or observations makes the classification problem difficult. An efficient way to solve this problem is by using dimension reduction statistical techniques in conjunction with nonparametric discriminant procedures. Results: We view the classification problem as a regression problem with few observations and many predictor variables. We use an adaptive dimension reduction method for generalized semi-parametric regression models that allows us to solve the ‘curse of dimensionality’ problem arising in the context of expression data. The predictive performance of the resulting classification rule is illustrated on two well-known data sets in the microarray literature: the leukemia data, which is known to contain classes that are easily ‘separable’, and the colon data set. Availability: Software that implements the procedures on which this paper focuses is freely available at http:

Journal ArticleDOI
TL;DR: A comparison between NMF, WNMF and the well-known principal component analysis (PCA) in the context of image patch classification has been carried out and it is claimed that all three techniques can be combined in a common and unique classifier.

Journal ArticleDOI
TL;DR: This work adapt and extend the discriminant analysis projection used in pattern recognition and shows that by using the generalized singular value decomposition (GSVD), it can achieve the same goal regardless of the relative dimensions of the term-document matrix.
Abstract: In today's vector space information retrieval systems, dimension reduction is imperative for efficiently manipulating the massive quantity of data. To be useful, this lower-dimensional representation must be a good approximation of the full document set. To that end, we adapt and extend the discriminant analysis projection used in pattern recognition. This projection preserves cluster structure by maximizing the scatter between clusters while minimizing the scatter within clusters. A common limitation of trace optimization in discriminant analysis is that one of the scatter matrices must be nonsingular, which restricts its application to document sets in which the number of terms does not exceed the number of documents. We show that by using the generalized singular value decomposition (GSVD), we can achieve the same goal regardless of the relative dimensions of the term-document matrix. In addition, applying the GSVD allows us to avoid the explicit formation of the scatter matrices in favor of working directly with the data matrix, thus improving the numerical properties of the approach. Finally, we present experimental results that confirm the effectiveness of our approach.

Journal ArticleDOI
TL;DR: Stochastic proximity embedding is introduced, a novel self‐organizing algorithm for producing meaningful underlying dimensions from proximity data that scales linearly with respect to sample size, and can be applied to very large data sets that are intractable by conventional embedding procedures.
Abstract: We introduce stochastic proximity embedding (SPE), a novel self-organizing algorithm for producing meaningful underlying dimensions from proximity data. SPE attempts to generate low-dimensional Euclidean embeddings that best preserve the similarities between a set of related observations. The method starts with an initial configuration, and iteratively refines it by repeatedly selecting pairs of objects at random, and adjusting their coordinates so that their distances on the map match more closely their respective proximities. The magnitude of these adjustments is controlled by a learning rate parameter, which decreases during the course of the simulation to avoid oscillatory behavior. Unlike classical multidimensional scaling (MDS) and nonlinear mapping (NLM), SPE scales linearly with respect to sample size, and can be applied to very large data sets that are intractable by conventional embedding procedures. The method is programmatically simple, robust, and convergent, and can be applied to a wide range of scientific problems involving exploratory data analysis and visualization.
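The update rule described above is simple enough to sketch directly; the parameter values here are illustrative, not the paper's:

```python
import numpy as np

def spe(D, n_components=2, n_steps=100000, lr0=1.0, eps=1e-8, seed=0):
    """Stochastic proximity embedding sketch: D is an (n, n) matrix of target proximities.
    Repeatedly pick a random pair and nudge both points so their map distance moves toward
    the target; the learning rate decays over the run to avoid oscillation."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    Y = rng.random((n, n_components))
    for step in range(n_steps):
        lr = lr0 * (1.0 - step / n_steps)          # decaying learning rate
        i, j = rng.integers(0, n, size=2)
        if i == j:
            continue
        diff = Y[i] - Y[j]
        dij = np.linalg.norm(diff) + eps
        delta = lr * 0.5 * (D[i, j] - dij) / dij * diff
        Y[i] += delta
        Y[j] -= delta
    return Y
```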

Journal ArticleDOI
TL;DR: In this paper, kTDSEP, a kernel-based algorithm for nonlinear blind source separation (BSS) using temporal information, is proposed; the algorithm first requires the data to be (implicitly) mapped to a high (possibly infinite)-dimensional kernel feature space.
Abstract: We propose kTDSEP, a kernel-based algorithm for nonlinear blind source separation (BSS). It combines complementary research fields: kernel feature spaces and BSS using temporal information. This yields an efficient algorithm for nonlinear BSS with invertible nonlinearity. Key assumptions are that the kernel feature space is chosen rich enough to approximate the nonlinearity and that signals of interest contain temporal information. Both assumptions are fulfilled for a wide set of real-world applications. The algorithm works as follows: First, the data are (implicitly) mapped to a high (possibly infinite)-dimensional kernel feature space. In practice, however, the data form a smaller submanifold in feature space-- even smaller than the number of training data points--a fact that has already been used by, for example, reduced set techniques for support vector machines. We propose to adapt to this effective dimension as a preprocessing step and to construct an orthonormal basis of this submanifold. The latter dimension-reduction step is essential for making the subsequent application of BSS methods computationally and numerically tractable. In the reduced space, we use a BSS algorithm that is based on second-order temporal decorrelation. Finally, we propose a selection procedure to obtain the original sources from the extracted nonlinear components automatically.Experiments demonstrate the excellent performance and efficiency of our kTDSEP algorithm for several problems of nonlinear BSS and for more than two sources.

DOI
26 May 2003
TL;DR: A new approach to handling high dimensional data, named Visual Hierarchical Dimension Reduction (VHDR), that not only generates lower dimensional spaces that are meaningful to users, but also allows user interactions in most steps of the process.
Abstract: Traditional visualization techniques for multidimensional data sets, such as parallel coordinates, glyphs, and scatterplot matrices, do not scale well to high numbers of dimensions. A common approach to solving this problem is dimensionality reduction. Existing dimensionality reduction techniques usually generate lower dimensional spaces that have little intuitive meaning to users and allow little user interaction. In this paper we propose a new approach to handling high dimensional data, named Visual Hierarchical Dimension Reduction (VHDR), that addresses these drawbacks. VHDR not only generates lower dimensional spaces that are meaningful to users, but also allows user interactions in most steps of the process. In VHDR, dimensions are grouped into a hierarchy, and lower dimensional spaces are constructed using clusters of the hierarchy. We have implemented the VHDR approach into XmdvTool, and extended several traditional multidimensional visualization methods to convey dimension cluster characteristics when visualizing the data set in lower dimensional spaces. Our case study of applying VHDR to a real data set supports our belief that this approach is effective in supporting the exploration of high dimensional data sets.

Journal ArticleDOI
TL;DR: In this article, the Hodgkin-Huxley (HH) model is used to analyze the features in the stimulus that trigger a spike, explicitly eliminating the effects of interactions between spikes.
Abstract: A spiking neuron "computes" by transforming a complex dynamical input into a train of action potentials, or spikes. The computation performed by the neuron can be formulated as dimensional reduction, or feature detection, followed by a nonlinear decision function over the low-dimensional space. Generalizations of the reverse correlation technique with white noise input provide a numerical strategy for extracting the relevant low-dimensional features from experimental data, and information theory can be used to evaluate the quality of the low-dimensional approximation. We apply these methods to analyze the simplest biophysically realistic model neuron, the Hodgkin-Huxley (HH) model, using this system to illustrate the general methodological issues. We focus on the features in the stimulus that trigger a spike, explicitly eliminating the effects of interactions between spikes. One can approximate this triggering "feature space" as a two-dimensional linear subspace in the high-dimensional space of input histories, capturing in this way a substantial fraction of the mutual information between inputs and spike time. We find that an even better approximation, however, is to describe the relevant subspace as two dimensional but curved; in this way, we can capture 90% of the mutual information even at high time resolution. Our analysis provides a new understanding of the computational properties of the HH model. While it is common to approximate neural behavior as "integrate and fire," the HH model is not an integrator nor is it well described by a single threshold.
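A minimal sketch of the reverse-correlation recipe (spike-triggered average plus the leading eigenvectors of the spike-triggered covariance difference); the information-theoretic evaluation of the subspace and the curved-subspace refinement described above are not shown:

```python
import numpy as np

def spike_triggered_subspace(stimulus, spike_idx, window=100, n_features=2):
    """stimulus: 1-D white-noise input; spike_idx: spike-time indices. Returns the
    spike-triggered average and the leading modes of the covariance difference."""
    hist = np.array([stimulus[t - window:t] for t in spike_idx if t >= window])
    prior = np.array([stimulus[t - window:t] for t in range(window, len(stimulus))])
    sta = hist.mean(axis=0)                                     # spike-triggered average
    dC = np.cov(hist, rowvar=False) - np.cov(prior, rowvar=False)
    vals, vecs = np.linalg.eigh(dC)
    order = np.argsort(np.abs(vals))[::-1][:n_features]         # largest-|eigenvalue| modes
    return sta, vecs[:, order]
```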

Journal ArticleDOI
TL;DR: A number of methods have been proposed in the last decade to overcome the limitation of LDA on small sample sizes, and these methods, as applied to face recognition, can be roughly grouped into three categories.