
Showing papers on "Dimensionality reduction published in 2009"


01 Jan 2009
TL;DR: The results of the experiments reveal that nonlinear techniques perform well on selected artificial tasks, but that this strong performance does not necessarily extend to real-world tasks.
Abstract: In recent years, a variety of nonlinear dimensionality reduction techniques have been proposed that aim to address the limitations of traditional techniques such as PCA and classical scaling. The paper presents a review and systematic comparison of these techniques. The performances of the nonlinear techniques are investigated on artificial and natural tasks. The results of the experiments reveal that nonlinear techniques perform well on selected artificial tasks, but that this strong performance does not necessarily extend to real-world tasks. The paper explains these results by identifying weaknesses of current nonlinear techniques, and suggests how the performance of nonlinear dimensionality reduction techniques may be improved.

2,141 citations


Proceedings ArticleDOI
14 Jun 2009
TL;DR: In this article, the authors provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability, and demonstrate the feasibility of this approach with experimental results for a new use case.
Abstract: Empirical evidence suggests that hashing is an effective strategy for dimensionality reduction and practical nonparametric estimation. In this paper we provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability. We demonstrate the feasibility of this approach with experimental results for a new use case --- multitask learning with hundreds of thousands of tasks.
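
The hashing trick analyzed in this paper can be sketched in a few lines. Below is a minimal, hedged illustration of signed feature hashing; the output dimension m, the md5-based hash, and the toy token counts are assumptions for the example, not the authors' exact construction.

```python
import hashlib
import numpy as np

def hash_features(token_counts, m=2**18):
    """Map a {token: count} dictionary into an m-dimensional hashed vector."""
    x = np.zeros(m)
    for token, count in token_counts.items():
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        index = h % m                            # hash bucket for this token
        sign = 1 if (h >> 1) % 2 == 0 else -1    # signed hashing keeps inner products roughly unbiased
        x[index] += sign * count
    return x

x = hash_features({"dimensionality": 2, "reduction": 1, "hashing": 3})
```

Roughly speaking, collisions between unrelated tokens are the "interaction" whose effect the paper's tail bounds show to be negligible with high probability.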

955 citations


Journal ArticleDOI
TL;DR: The Gini importance of the random forest provided superior means for measuring feature relevance on spectral data, but – on an optimal subset of features – the regularized classifiers might be preferable over the random forest classifier, in spite of their limitation to model linear dependencies only.
Abstract: Regularized regression methods such as principal component or partial least squares regression perform well in learning tasks on high dimensional spectral data, but cannot explicitly eliminate irrelevant features. The random forest classifier with its associated Gini feature importance, on the other hand, allows for an explicit feature elimination, but may not be optimally adapted to spectral data due to the topology of its constituent classification trees which are based on orthogonal splits in feature space. We propose to combine the best of both approaches, and evaluate the joint use of a feature selection based on a recursive feature elimination using the Gini importance of random forests together with regularized classification methods on spectral data sets from medical diagnostics, chemotaxonomy, biomedical analytics, food science, and synthetically modified spectral data. Here, a feature selection using the Gini feature importance with a regularized classification by discriminant partial least squares regression performed as well as or better than a filtering according to different univariate statistical tests, or using regression coefficients in a backward feature elimination. It outperformed the direct application of the random forest classifier, or the direct application of the regularized classifiers on the full set of features. The Gini importance of the random forest provided superior means for measuring feature relevance on spectral data, but – on an optimal subset of features – the regularized classifiers might be preferable over the random forest classifier, in spite of their limitation to model linear dependencies only. A feature selection based on Gini importance, however, may precede a regularized linear classification to identify this optimal subset of features, and to earn a double benefit of both dimensionality reduction and the elimination of noise from the classification task.
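
The two-stage scheme described above can be sketched as follows: recursive feature elimination driven by the random forest's Gini importance, followed by a regularized (discriminant PLS) classifier on the surviving features. This is a hedged sketch using scikit-learn; the parameter values and the thresholding of the PLS output are assumptions, not the settings used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_decomposition import PLSRegression

def gini_rfe(X, y, n_keep=50, drop_frac=0.2):
    """Recursively drop the least Gini-important features (mean decrease in impurity)."""
    kept = np.arange(X.shape[1])
    while len(kept) > n_keep:
        rf = RandomForestClassifier(n_estimators=200, random_state=0)
        rf.fit(X[:, kept], y)
        order = np.argsort(rf.feature_importances_)   # least important first
        n_drop = max(1, int(drop_frac * len(kept)))
        kept = kept[np.sort(order[n_drop:])]          # keep the more important features
    return kept

# Discriminant PLS on the selected features: regress a 0/1 class indicator on
# X[:, kept] and threshold the prediction (illustrative, not the paper's exact setup).
# kept = gini_rfe(X_train, y_train)
# pls = PLSRegression(n_components=5).fit(X_train[:, kept], y_train)
# y_pred = (pls.predict(X_test[:, kept]).ravel() > 0.5).astype(int)
```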

726 citations


Journal ArticleDOI
TL;DR: Preliminary experimental results show that the third criterion is a potential discriminative subspace selection method, which significantly reduces the class separation problem compared with the linear dimensionality reduction step in FLDA and its several representative extensions.
Abstract: Subspace selection approaches are powerful tools in pattern classification and data visualization. One of the most important subspace approaches is the linear dimensionality reduction step in Fisher's linear discriminant analysis (FLDA), which has been successfully employed in many fields such as biometrics, bioinformatics, and multimedia information management. However, the linear dimensionality reduction step in FLDA has a critical drawback: for a classification task with c classes, if the dimension of the projected subspace is strictly lower than c - 1, the projection to a subspace tends to merge those classes that are close together in the original feature space. If separate classes are sampled from Gaussian distributions, all with identical covariance matrices, then the linear dimensionality reduction step in FLDA maximizes the mean value of the Kullback-Leibler (KL) divergences between different classes. Based on this viewpoint, the geometric mean for subspace selection is studied in this paper. Three criteria are analyzed: 1) maximization of the geometric mean of the KL divergences, 2) maximization of the geometric mean of the normalized KL divergences, and 3) the combination of 1 and 2. Preliminary experimental results based on synthetic data, the UCI Machine Learning Repository, and handwritten digits show that the third criterion is a potential discriminative subspace selection method, which significantly reduces the class separation problem compared with the linear dimensionality reduction step in FLDA and its several representative extensions.

581 citations


Proceedings ArticleDOI
01 Sep 2009
TL;DR: This paper describes a human detection method that augments widely used edge-based features with texture and color information, providing us with a much richer descriptor set, and is shown to outperform state-of-the-art techniques on three varied datasets.
Abstract: Significant research has been devoted to detecting people in images and videos. In this paper we describe a human detection method that augments widely used edge-based features with texture and color information, providing us with a much richer descriptor set. This augmentation results in an extremely high-dimensional feature space (more than 170,000 dimensions). In such high-dimensional spaces, classical machine learning algorithms such as SVMs are nearly intractable with respect to training. Furthermore, the number of training samples is much smaller than the dimensionality of the feature space, by at least an order of magnitude. Finally, the extraction of features from a densely sampled grid structure leads to a high degree of multicollinearity. To circumvent these data characteristics, we employ Partial Least Squares (PLS) analysis, an efficient dimensionality reduction technique, one which preserves significant discriminative information, to project the data onto a much lower dimensional subspace (20 dimensions, reduced from the original 170,000). Our human detection system, employing PLS analysis over the enriched descriptor set, is shown to outperform state-of-the-art techniques on three varied datasets including the popular INRIA pedestrian dataset, the low-resolution gray-scale DaimlerChrysler pedestrian dataset, and the ETHZ pedestrian dataset consisting of full-length videos of crowded scenes.
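
The dimensionality-reduction step described above can be sketched with scikit-learn's PLS implementation: project the very high-dimensional descriptors onto a small number of PLS latent dimensions before training a conventional classifier. The choice of 20 components follows the text; the logistic-regression stage is an assumption for illustration, not the classifier used in the paper.

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression

def pls_reduce_and_train(X_train, y_train, n_components=20):
    """Project high-dimensional descriptors onto PLS latent dimensions, then classify."""
    pls = PLSRegression(n_components=n_components)
    pls.fit(X_train, y_train)                  # y_train: 1 = person, 0 = background
    Z_train = pls.transform(X_train)           # shape (n_samples, n_components)
    clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
    return pls, clf

# Scoring a detection window x (1-D descriptor vector):
# score = clf.predict_proba(pls.transform(x.reshape(1, -1)))[0, 1]
```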

536 citations


Journal ArticleDOI
TL;DR: Three new approaches to fuzzy-rough feature selection based on fuzzy similarity relations are proposed, in which a fuzzy extension to crisp discernibility matrices is utilized; initial experimentation shows that the methods greatly reduce dimensionality while preserving classification accuracy.
Abstract: There has been great interest in developing methodologies that are capable of dealing with imprecision and uncertainty. The large amount of research currently being carried out in fuzzy and rough sets is representative of this. Many deep relationships have been established, and recent studies have concluded as to the complementary nature of the two methodologies. Therefore, it is desirable to extend and hybridize the underlying concepts to deal with additional aspects of data imperfection. Such developments offer a high degree of flexibility and provide robust solutions and advanced tools for data analysis. Fuzzy-rough set-based feature selection (FS) has been shown to be highly useful at reducing data dimensionality but possesses several problems that render it ineffective for large datasets. This paper proposes three new approaches to fuzzy-rough FS based on fuzzy similarity relations. In particular, a fuzzy extension to crisp discernibility matrices is proposed and utilized. Initial experimentation shows that the methods greatly reduce dimensionality while preserving classification accuracy.

521 citations


Journal ArticleDOI
TL;DR: A new approach for nonadaptive dimensionality reduction of manifold-modeled data is proposed, demonstrating that a small number of random linear projections can preserve key information about a manifold-modeled signal.
Abstract: We propose a new approach for nonadaptive dimensionality reduction of manifold-modeled data, demonstrating that a small number of random linear projections can preserve key information about a manifold-modeled signal. We center our analysis on the effect of a random linear projection operator Φ: ℝ^N → ℝ^M, M < N.
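
A small numerical illustration of such a random linear projection Φ: ℝ^N → ℝ^M is given below, checking how well pairwise distances between samples of a low-dimensional manifold embedded in ℝ^N are preserved; the swiss-roll data, N = 200, and M = 15 are assumptions for the example, not the paper's experimental setup.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.datasets import make_swiss_roll

N, M = 200, 15                                           # ambient and projected dimensions
X3, _ = make_swiss_roll(n_samples=300, random_state=0)   # 2-D manifold sampled in 3-D
X = np.hstack([X3, np.zeros((300, N - 3))])              # embed the samples in R^N
Phi = np.random.default_rng(0).normal(size=(M, N)) / np.sqrt(M)
Y = X @ Phi.T                                            # random projection to R^M

ratios = pdist(Y) / pdist(X)                             # pairwise-distance distortion
print(ratios.min(), ratios.max())                        # ratios concentrate around 1
```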

488 citations


Journal ArticleDOI
TL;DR: This paper proposes a method called Mlnb which adapts the traditional naive Bayes classifiers to deal with multi-label instances and achieves comparable performance to other well-established multi-label learning algorithms.

433 citations


Proceedings Article
15 Apr 2009
TL;DR: In this paper, a new unsupervised dimensionality reduction technique, called parametric t-SNE, was proposed, which learns a parametric mapping between the high-dimensional data space and the low-dimensional latent space.
Abstract: The paper presents a new unsupervised dimensionality reduction technique, called parametric t-SNE, that learns a parametric mapping between the high-dimensional data space and the low-dimensional latent space. Parametric t-SNE learns the parametric mapping in such a way that the local structure of the data is preserved as well as possible in the latent space. We evaluate the performance of parametric t-SNE in experiments on three datasets, in which we compare it to the performance of two other unsupervised parametric dimensionality reduction techniques. The results of experiments illustrate the strong performance of parametric t-SNE, in particular, in learning settings in which the dimensionality of the latent space is relatively low.
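
For reference, the sketch below spells out the t-SNE objective that the parametric mapping is trained to minimize: the KL divergence between Gaussian similarities in the data space and Student-t similarities in the latent space. The neural-network mapping itself is omitted, and the per-point perplexity calibration is simplified to a fixed bandwidth; both are assumptions for brevity rather than the paper's formulation.

```python
import numpy as np

def tsne_kl(X_high, Y_low, sigma=1.0, eps=1e-12):
    """KL(P || Q) between Gaussian data-space similarities and Student-t latent-space similarities."""
    def sq_dists(Z):
        s = (Z ** 2).sum(axis=1)
        return s[:, None] + s[None, :] - 2 * Z @ Z.T

    P = np.exp(-sq_dists(X_high) / (2 * sigma ** 2))   # fixed-bandwidth Gaussian affinities (simplified)
    np.fill_diagonal(P, 0.0)
    P /= P.sum()

    Q = 1.0 / (1.0 + sq_dists(Y_low))                  # Student-t (one degree of freedom) affinities
    np.fill_diagonal(Q, 0.0)
    Q /= Q.sum()

    return float(np.sum(P * np.log((P + eps) / (Q + eps))))
```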

411 citations


Journal ArticleDOI
TL;DR: A new dimensionality reduction algorithm is developed, termed discriminative locality alignment (DLA), by imposing discriminative information in the part optimization stage, and thorough empirical studies demonstrate the effectiveness of DLA compared with representative dimensionality reduction algorithms.
Abstract: Spectral analysis-based dimensionality reduction algorithms are important and have been popularly applied in data mining and computer vision applications. To date many algorithms have been developed, e.g., principal component analysis, locally linear embedding, Laplacian eigenmaps, and local tangent space alignment. All of these algorithms have been designed intuitively and pragmatically, i.e., on the basis of the experience and knowledge of experts for their own purposes. Therefore, it will be more informative to provide a systematic framework for understanding the common properties and intrinsic difference in different algorithms. In this paper, we propose such a framework, named "patch alignment," which consists of two stages: part optimization and whole alignment. The framework reveals that (1) algorithms are intrinsically different in the patch optimization stage and (2) all algorithms share an almost identical whole alignment stage. As an application of this framework, we develop a new dimensionality reduction algorithm, termed discriminative locality alignment (DLA), by imposing discriminative information in the part optimization stage. DLA can (1) attack the distribution nonlinearity of measurements; (2) preserve the discriminative ability; and (3) avoid the small-sample-size problem. Thorough empirical studies demonstrate the effectiveness of DLA compared with representative dimensionality reduction algorithms.

390 citations


Journal ArticleDOI
TL;DR: This work presents a novel feature selection algorithm based on ant colony optimization, which is inspired by the observation of real ants searching for the shortest paths to food sources, and shows the superiority of the proposed algorithm on the Reuters-21578 dataset.
Abstract: Feature selection and feature extraction are the most important steps in classification systems. Feature selection is commonly used to reduce the dimensionality of datasets with tens or hundreds of thousands of features, which would be impossible to process further. One of the problems in which feature selection is essential is text categorization. A major problem of text categorization is the high dimensionality of the feature space; therefore, feature selection is the most important step in text categorization. At present there are many methods to deal with text feature selection. To improve the performance of text categorization, we present a novel feature selection algorithm that is based on ant colony optimization. The ant colony optimization algorithm is inspired by observations of real ants searching for the shortest paths to food sources. The proposed algorithm is easy to implement and, because it uses a simple classifier, its computational complexity is very low. The performance of the proposed algorithm is compared to that of a genetic algorithm, information gain, and CHI on the task of feature selection on the Reuters-21578 dataset. Simulation results on the Reuters-21578 dataset show the superiority of the proposed algorithm.

Journal ArticleDOI
TL;DR: This work considers a Bayesian approach to nonlinear inverse problems in which the unknown quantity is a spatial or temporal field, endowed with a hierarchical Gaussian process prior, and introduces truncated Karhunen-Loeve expansions, based on the prior distribution, to efficiently parameterize the unknown field.

Journal ArticleDOI
01 Jan 2009
TL;DR: Hybridizations of rough sets with fuzzy sets, neural networks, and metaheuristic algorithms are reviewed, and the performance of the algorithms is discussed in connection with classification.
Abstract: Rough set theory is a mathematical tool for dealing with uncertainty and vagueness in decision systems, and it has been applied successfully in many fields. It is used to identify the reduct set of the set of all attributes of the decision system. The reduct set is used as a preprocessing technique for classification of the decision system in order to bring out the potential patterns, association rules, or knowledge through data mining techniques. Several researchers have contributed a variety of algorithms for computing the reduct sets by considering different cases such as inconsistency, missing attribute values, and multiple decision attributes of the decision system. This paper focuses on a review of the techniques for dimensionality reduction under the rough set theory environment. Further, hybridizations of rough sets with fuzzy sets, neural networks, and metaheuristic algorithms are also reviewed. The performance of the algorithms is discussed in connection with classification.

Journal ArticleDOI
TL;DR: A theoretical overview of the global optimum solution to the TR problem via the equivalent trace difference problem is proposed, and eigenvalue perturbation theory is introduced to derive an efficient algorithm based on the Newton-Raphson method.
Abstract: Dimensionality reduction is an important issue in many machine learning and pattern recognition applications, and the trace ratio (TR) problem is an optimization problem involved in many dimensionality reduction algorithms. Conventionally, the solution is approximated via generalized eigenvalue decomposition due to the difficulty of the original problem. However, prior works have indicated that it is more reasonable to solve it directly than via the conventional way. In this brief, we propose a theoretical overview of the global optimum solution to the TR problem via the equivalent trace difference problem. Eigenvalue perturbation theory is introduced to derive an efficient algorithm based on the Newton-Raphson method. Theoretical issues on the convergence and efficiency of our algorithm compared with prior literature are proposed, and are further supported by extensive empirical results.
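
The trace-ratio problem discussed above, max_W tr(WᵀAW)/tr(WᵀBW) subject to WᵀW = I, is commonly solved through the equivalent trace-difference problem max_W tr(Wᵀ(A − λB)W). The sketch below shows the standard fixed-point iteration on λ rather than the Newton-Raphson variant derived in the paper; the tolerance and iteration cap are assumptions.

```python
import numpy as np

def trace_ratio(A, B, d, n_iter=100, tol=1e-10):
    """max_W tr(W'AW)/tr(W'BW) with W'W = I, via the equivalent trace-difference problem."""
    lam = 0.0
    W = None
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(A - lam * B)   # eigendecomposition of the trace-difference matrix
        W = vecs[:, -d:]                           # top-d eigenvectors
        new_lam = np.trace(W.T @ A @ W) / np.trace(W.T @ B @ W)
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return W, lam
```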

Journal ArticleDOI
TL;DR: Simple criteria are proposed that quantify two aspects of embedding quality, namely its overall quality and its tendency to favor intrusions or extrusions; these criteria are applied to several recent dimensionality reduction methods.

Journal ArticleDOI
TL;DR: In this paper, the authors consider a spiked covariance model in which a base matrix is perturbed by adding a k-sparse maximal eigenvector, and analyze two computationally tractable methods for recovering the support set of this maximal eigenvector, as follows: (a) a simple diagonal thresholding method, which transitions from success to failure as a function of the rescaled sample size θ_dia(n, p, k) = n/[k² log(p − k)]; and (b) a more sophisticated semidefinite programming (SDP) relaxation.
Abstract: Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the “large p, small n” setting, in which the problem dimension p is comparable to or larger than the sample size n. This paper studies PCA in this high-dimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say, with at most k nonzero components. We consider a spiked covariance model in which a base matrix is perturbed by adding a k-sparse maximal eigenvector, and we analyze two computationally tractable methods for recovering the support set of this maximal eigenvector, as follows: (a) a simple diagonal thresholding method, which transitions from success to failure as a function of the rescaled sample size θ_dia(n, p, k) = n/[k² log(p − k)]; and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the rescaled sample size θ_sdp(n, p, k) = n/[k log(p − k)] is larger than a critical threshold. In addition, we prove that no method, including the best method which has exponential-time complexity, can succeed in recovering the support if the order parameter θ_sdp(n, p, k) is below a threshold. Our results thus highlight an interesting trade-off between computational and statistical efficiency in high-dimensional inference.
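
The simpler of the two methods analyzed above, diagonal thresholding, admits a very short sketch: estimate the support of the sparse leading eigenvector by keeping the coordinates with the largest sample variances. Selecting exactly the top k diagonal entries is a simplification of the thresholding rule and an assumption here.

```python
import numpy as np

def diagonal_thresholding(X, k):
    """Estimate the support of a k-sparse leading eigenvector from data X of shape (n, p)."""
    S = np.cov(X, rowvar=False)              # sample covariance matrix
    variances = np.diag(S)
    support = np.argsort(variances)[-k:]     # coordinates with the largest sample variance
    return np.sort(support)
```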

Journal ArticleDOI
TL;DR: This work investigates the asymptotic behavior of the Principal Component (PC) directions and shows that if the first few eigenvalues of a population covariance matrix are large enough compared to the others, then the corresponding estimated PC directions are consistent or converge to the appropriate subspace (subspace consistency) and most other PC directions are strongly inconsistent.
Abstract: Principal Component Analysis (PCA) is an important tool of dimension reduction especially when the dimension (or the number of variables) is very high. Asymptotic studies where the sample size is fixed, and the dimension grows [i.e., High Dimension, Low Sample Size (HDLSS)] are becoming increasingly relevant. We investigate the asymptotic behavior of the Principal Component (PC) directions. HDLSS asymptotics are used to study consistency, strong inconsistency and subspace consistency. We show that if the first few eigenvalues of a population covariance matrix are large enough compared to the others, then the corresponding estimated PC directions are consistent or converge to the appropriate subspace (subspace consistency) and most other PC directions are strongly inconsistent. Broad sets of sufficient conditions for each of these cases are specified and the main theorem gives a catalogue of possible combinations. In preparation for these results, we show that the geometric representation of HDLSS data holds under general conditions, which includes a ρ-mixing condition and a broad range of sphericity measures of the covariance matrix.

Journal ArticleDOI
TL;DR: This paper investigates feature subset selection for dimensionality reduction in machine learning and finds that GRASP and Tabu Search obtain significantly better results than the other methods.

Journal ArticleDOI
TL;DR: This work applies the force paradigm to create localized versions of MDS stress functions with a tuning parameter to adjust the strength of nonlocal repulsive forces and solves the problem of tuning parameter selection with a meta-criterion that measures how well the sets of K-nearest neighbors agree between the data and the embedding.
Abstract: In the past decade there has been a resurgence of interest in nonlinear dimension reduction. Among new proposals are “Local Linear Embedding,” “Isomap,” and Kernel Principal Components Analysis which all construct global low-dimensional embeddings from local affine or metric information. We introduce a competing method called “Local Multidimensional Scaling” (LMDS). Like LLE, Isomap, and KPCA, LMDS constructs its global embedding from local information, but it uses instead a combination of MDS and “force-directed” graph drawing. We apply the force paradigm to create localized versions of MDS stress functions with a tuning parameter to adjust the strength of nonlocal repulsive forces. We solve the problem of tuning parameter selection with a meta-criterion that measures how well the sets of K-nearest neighbors agree between the data and the embedding. Tuned LMDS seems to be able to outperform MDS, PCA, LLE, Isomap, and KPCA, as illustrated with two well-known image datasets. The meta-criterion can also be ...
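
The meta-criterion used for tuning-parameter selection can be sketched directly: measure how well the K-nearest-neighbor sets agree between the original data and the embedding. The value of K and the use of scikit-learn's NearestNeighbors are assumptions for the example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_agreement(X, Y, K=10):
    """Average overlap of K-nearest-neighbor sets in the data X and in the embedding Y (same row order)."""
    idx_x = NearestNeighbors(n_neighbors=K + 1).fit(X).kneighbors(X, return_distance=False)[:, 1:]
    idx_y = NearestNeighbors(n_neighbors=K + 1).fit(Y).kneighbors(Y, return_distance=False)[:, 1:]
    overlaps = [len(set(a) & set(b)) / K for a, b in zip(idx_x, idx_y)]
    return float(np.mean(overlaps))
```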

Proceedings ArticleDOI
20 Jun 2009
TL;DR: A label sparse coding based subspace learning algorithm is derived to effectively harness multi-label information for dimensionality reduction and propagate the multi-labels of the training images to the query image with the sparse l1 reconstruction coefficients.
Abstract: In this paper, we present a multi-label sparse coding framework for feature extraction and classification within the context of automatic image annotation. First, each image is encoded into a so-called supervector, derived from the universal Gaussian Mixture Models on orderless image patches. Then, a label sparse coding based subspace learning algorithm is derived to effectively harness multi-label information for dimensionality reduction. Finally, the sparse coding method for multi-label data is proposed to propagate the multi-labels of the training images to the query image with the sparse l1 reconstruction coefficients. Extensive image annotation experiments on the Corel5k and Corel30k databases both show the superior performance of the proposed multi-label sparse coding framework over the state-of-the-art algorithms.
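
The label-propagation step described above can be sketched as an ℓ1-regularized reconstruction: express the query image's supervector as a sparse combination of training supervectors and propagate the training labels with the resulting coefficients. The Lasso solver, the non-negativity constraint, and the value of alpha are assumptions, not the exact formulation in the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def propagate_labels(X_train, Y_train, x_query, alpha=0.01):
    """X_train: (n, d) training supervectors, Y_train: (n, L) binary label matrix, x_query: (d,)."""
    lasso = Lasso(alpha=alpha, positive=True, max_iter=10000)   # non-negativity is an extra assumption
    lasso.fit(X_train.T, x_query)            # reconstruct x_query ~ X_train.T @ w with sparse w
    w = lasso.coef_                          # one reconstruction coefficient per training image
    scores = w @ Y_train                     # weighted vote over the L candidate labels
    return scores / (w.sum() + 1e-12)
```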

01 Jan 2009
TL;DR: A survey of several techniques for dimension reduction is given, including principal component analysis, projection pursuit and projection pursuit regression, principal curves, and methods based on topologically continuous maps, such as Kohonen’s maps or the generalised topographic mapping.
Abstract: The problem of dimension reduction is introduced as a way to overcome the curse of the dimensionality when dealing with vector data in high-dimensional spaces and as a modelling tool for such data. It is defined as the search for a low-dimensional manifold that embeds the high-dimensional data. A classification of dimension reduction problems is proposed. A survey of several techniques for dimension reduction is given, including principal component analysis, projection pursuit and projection pursuit regression, principal curves and methods based on topologically continuous maps, such as Kohonen’s maps or the generalised topographic mapping. Neural network implementations for several of these techniques are also reviewed, such as the projection pursuit learning network and the BCM neuron with an objective function. Several appendices complement the mathematical treatment of the main text.

Journal ArticleDOI
10 Jun 2009
TL;DR: This work has developed a system that visualizes the results of principal component analysis using multiple coordinated views and a rich set of user interactions to support analysis of multivariate datasets through extensive interaction with the PCA output.
Abstract: Principal Component Analysis (PCA) is a widely used mathematical technique in many fields for factor and trend analysis, dimension reduction, etc. However, it is often considered to be a "black box" operation whose results are difficult to interpret and sometimes counter-intuitive to the user. In order to assist the user in better understanding and utilizing PCA, we have developed a system that visualizes the results of principal component analysis using multiple coordinated views and a rich set of user interactions. Our design philosophy is to support analysis of multivariate datasets through extensive interaction with the PCA output. To demonstrate the usefulness of our system, we performed a comparative user study with a known commercial system, SAS/INSIGHT's Interactive Data Exploration. Participants in our study solved a number of high-level analysis tasks with each interface and rated the systems on ease of learning and usefulness. Based on the participants' accuracy, speed, and qualitative feedback, we observe that our system helps users to better understand relationships between the data and the calculated eigenspace, which allows the participants to more accurately analyze the data. User feedback suggests that the interactivity and transparency of our system are the key strengths of our approach.

Proceedings ArticleDOI
TL;DR: A large number of hyperspectral detection algorithms have been developed and used over the last two decades as mentioned in this paper, some of which are based on highly sophisticated mathematical models and methods; others are derived using intuition and simple geometrical concepts.
Abstract: A large number of hyperspectral detection algorithms have been developed and used over the last two decades. Some algorithms are based on highly sophisticated mathematical models and methods; others are derived using intuition and simple geometrical concepts. The purpose of this paper is threefold. First, we discuss the key issues involved in the design and evaluation of detection algorithms for hyperspectral imaging data. Second, we present a critical review of existing detection algorithms for practical hyperspectral imaging applications. Finally, we argue that the "apparent" superiority of sophisticated algorithms with simulated data or in laboratory conditions, does not necessarily translate to superiority in real-world applications. A large number of hyperspectral detection algorithms have been developed and used over the last two decades. A partial list includes the classical matched filter, RX anomaly detector, orthogonal subspace projector, adaptive cosine estimator, finite target matched filter, mixture-tuned matched filter, subspace detectors, kernel matched subspace detectors, and joint subspace detectors. In addition, different methods for dimensionality reduction, background clutter modeling, endmember selection, and the choice between radiance versus reflectance domain processing multiply the number of detection algorithms yet further. New algorithms, new variants of existing algorithms, and new implementations of existing methods appear all the time. Furthermore, a large number of papers have been published in attempts to establish the relative superiority of these detectors. The main argument of this paper is that if we take into account important aspects of real hyperspectral imaging problems, proper use of simple detectors, like the matched filter and the adaptive cosine estimator, may provide acceptable performance for practically relevant applications. More specifically this paper has the following objectives. (a) Discuss how the at-sensor radiance physics-based signal model and the low-rank properties of background covariance matrix lead to a parsimonious taxonomy of most widely used detection algorithms. (b) Explain how the limited amount of background data with respect to the high dimensionality of the feature space limit the performance of detection algorithms that are optimum on theoretical grounds. (c) Argue that any small performance gains attained by more sophisticated detectors are irrelevant in practical applications because of the limitations and the uncertainties about many aspects of the situation in which the detector will be deployed. (d) Draw distinction between detectors which require substantial input of expertise and detectors which can be applied automatically with little external input of expertise. (e) Try to answer the question: Is there a best hyperspectral detection algorithm?
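
Two of the "simple detectors" highlighted above, the matched filter and the adaptive cosine estimator (ACE), reduce to short closed-form expressions once background statistics are estimated from the scene. The sketch below is a generic textbook form, not the paper's specific implementation; the diagonal regularization of the covariance is an assumption for numerical stability.

```python
import numpy as np

def matched_filter(x, s, mu, Sigma_inv):
    """Matched-filter score for pixel x, target signature s, background mean mu and inverse covariance."""
    xs, ss = x - mu, s - mu
    return (ss @ Sigma_inv @ xs) / (ss @ Sigma_inv @ ss)

def ace(x, s, mu, Sigma_inv):
    """Adaptive cosine estimator: squared cosine of the angle in the whitened space."""
    xs, ss = x - mu, s - mu
    return (ss @ Sigma_inv @ xs) ** 2 / ((ss @ Sigma_inv @ ss) * (xs @ Sigma_inv @ xs))

# Background statistics from scene pixels X of shape (n_pixels, n_bands):
# mu = X.mean(axis=0)
# Sigma_inv = np.linalg.inv(np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1]))
```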

Journal ArticleDOI
TL;DR: A novel algorithm is presented in this paper, which can be applied on a small-sized data set with a high number of features and outperform the commonly used Principal Component Analysis (PCA)/Multi-Dimensional Scaling (MDS) methods, and the more recently developed ISOMap dimensionality reduction method.
Abstract: Emotional expression and understanding are normal instincts of human beings, but automatic emotion recognition from speech without reference to any language or linguistic information remains an open problem. The limited size of existing emotional speech datasets and their relatively high dimensionality have outstripped many dimensionality reduction and feature selection algorithms. This paper focuses on the data preprocessing techniques which aim to extract the most effective acoustic features to improve the performance of emotion recognition. A novel algorithm is presented in this paper, which can be applied on a small-sized data set with a high number of features. The presented algorithm integrates the advantages of a decision tree method and the random forest ensemble. Experiment results on a series of Chinese emotional speech data sets indicate that the presented algorithm can achieve improved results on emotion recognition, and outperform the commonly used Principal Component Analysis (PCA)/Multi-Dimensional Scaling (MDS) methods, and the more recently developed ISOMap dimensionality reduction method.

Journal ArticleDOI
TL;DR: CPPCA constitutes a fundamental departure from traditional PCA in that it permits its excellent dimensionality-reduction and compression performance to be realized in a light-encoder/heavy-decoder system architecture.
Abstract: Principal component analysis (PCA) is often central to dimensionality reduction and compression in many applications, yet its data-dependent nature as a transform computed via expensive eigendecomposition often hinders its use in severely resource-constrained settings such as satellite-borne sensors. A process is presented that effectively shifts the computational burden of PCA from the resource-constrained encoder to a presumably more capable base-station decoder. The proposed approach, compressive-projection PCA (CPPCA), is driven by projections at the sensor onto lower-dimensional subspaces chosen at random, while the CPPCA decoder, given only these random projections, recovers not only the coefficients associated with the PCA transform, but also an approximation to the PCA transform basis itself. An analysis is presented that extends existing Rayleigh-Ritz theory to the special case of highly eccentric distributions; this analysis in turn motivates a reconstruction process at the CPPCA decoder that consists of a novel eigenvector reconstruction based on a convex-set optimization driven by Ritz vectors within the projected subspaces. As such, CPPCA constitutes a fundamental departure from traditional PCA in that it permits its excellent dimensionality-reduction and compression performance to be realized in a light-encoder/heavy-decoder system architecture. In experimental results, CPPCA outperforms a multiple-vector variant of compressed sensing for the reconstruction of hyperspectral data.

Proceedings ArticleDOI
01 Sep 2009
TL;DR: A new image representation to capture both the appearance and spatial information for image classification applications is proposed and it is justified that the traditional histogram representation and the spatial pyramid matching are special cases of the hierarchical Gaussianization.
Abstract: In this paper, we propose a new image representation to capture both the appearance and spatial information for image classification applications. First, we model the feature vectors, from the whole corpus, from each image and at each individual patch, in a Bayesian hierarchical framework using mixtures of Gaussians. After such a hierarchical Gaussianization, each image is represented by a Gaussian mixture model (GMM) for its appearance, and several Gaussian maps for its spatial layout. Then we extract the appearance information from the GMM parameters, and the spatial information from global and local statistics over Gaussian maps. Finally, we employ a supervised dimension reduction technique called DAP (discriminant attribute projection) to remove noise directions and to further enhance the discriminating power of our representation. We justify that the traditional histogram representation and the spatial pyramid matching are special cases of our hierarchical Gaussianization. We compare our new representation with other approaches in scene classification, object recognition and face recognition, and our performance ranks among the top in all three tasks.
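
A hedged sketch of the appearance part of this representation: adapt a corpus-level ("universal") GMM to a single image's patch descriptors and stack the adapted component means into a supervector. The MAP-style mean adaptation with relevance factor r is a common simplification and an assumption here, not the paper's exact Bayesian hierarchical formulation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_supervector(patches, ubm, r=16.0):
    """Adapt the means of a fitted universal GMM (ubm) to one image's patch descriptors."""
    resp = ubm.predict_proba(patches)                 # (n_patches, K) component posteriors
    n_k = resp.sum(axis=0)                            # soft counts per component
    weighted_sum = resp.T @ patches                   # (K, d) responsibility-weighted sums
    alpha = (n_k / (n_k + r))[:, None]                # data/prior trade-off (relevance factor r)
    means = alpha * weighted_sum / np.maximum(n_k, 1e-8)[:, None] + (1 - alpha) * ubm.means_
    return means.ravel()                              # concatenated supervector

# Universal model fitted on patches pooled from the whole corpus:
# ubm = GaussianMixture(n_components=64, covariance_type="diag", random_state=0).fit(all_patches)
```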

Journal ArticleDOI
TL;DR: A system for dimensionality reduction is introduced that combines user-defined quality metrics using weight functions to preserve as many important structures as possible, and that provides enhancement of diverse structures by supplying a range of automatic variable orderings.
Abstract: Multivariate data sets including hundreds of variables are increasingly common in many application areas. Most multivariate visualization techniques are unable to display such data effectively, and a common approach is to employ dimensionality reduction prior to visualization. Most existing dimensionality reduction systems focus on preserving one or a few significant structures in data. For many analysis tasks, however, several types of structures can be of high significance and the importance of a certain structure compared to the importance of another is often task-dependent. This paper introduces a system for dimensionality reduction by combining user-defined quality metrics using weight functions to preserve as many important structures as possible. The system aims at effective visualization and exploration of structures within large multivariate data sets and provides enhancement of diverse structures by supplying a range of automatic variable orderings. Furthermore it enables a quality-guided reduction of variables through an interactive display facilitating investigation of trade-offs between loss of structure and the number of variables to keep. The generality and interactivity of the system is demonstrated through a case scenario.

Journal ArticleDOI
TL;DR: The proposed feature selection method, KFFS, produces very promising results compared to F-score feature selection, and removes irrelevant or redundant features from the high-dimensional input feature space.
Abstract: In this paper, we have proposed a new feature selection method called kernel F-score feature selection (KFFS), used as a pre-processing step in the classification of medical datasets. KFFS consists of two phases. In the first phase, the input spaces (features) of medical datasets are transformed to a kernel space by means of Linear (Lin) or Radial Basis Function (RBF) kernel functions. In this way, the medical datasets are mapped to a high-dimensional feature space. In the second phase, the F-score values of the medical datasets in this high-dimensional feature space are calculated using the F-score formula, and then the mean value of the calculated F-scores is computed. If the F-score value of any feature in the medical datasets is bigger than this mean value, that feature is selected; otherwise, that feature is removed from the feature space. Thanks to the KFFS method, the irrelevant or redundant features are removed from the high-dimensional input feature space. The purpose of using kernel functions is to transform non-linearly separable medical datasets into a linearly separable feature space. In this study, we have used the heart disease dataset, the SPECT (Single Photon Emission Computed Tomography) images dataset, and the Escherichia coli Promoter Gene Sequence dataset taken from the UCI (University of California, Irvine) machine learning database to test the performance of the KFFS method. As classification algorithms, Least Square Support Vector Machine (LS-SVM) and Levenberg-Marquardt Artificial Neural Network have been used. As shown in the obtained results, the proposed feature selection method KFFS produced very promising results compared to F-score feature selection.
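
The selection rule described above is easy to make concrete. The sketch below computes the standard two-class F-score for every feature and keeps the features whose score exceeds the mean score; the kernel transformation phase is omitted, which is a simplification for illustration.

```python
import numpy as np

def f_scores(X, y):
    """Standard F-score of each feature for binary labels y in {0, 1}."""
    X_pos, X_neg = X[y == 1], X[y == 0]
    num = (X_pos.mean(axis=0) - X.mean(axis=0)) ** 2 + (X_neg.mean(axis=0) - X.mean(axis=0)) ** 2
    den = X_pos.var(axis=0, ddof=1) + X_neg.var(axis=0, ddof=1)
    return num / (den + 1e-12)

def kffs_select(X, y):
    """Keep the features whose F-score exceeds the mean F-score."""
    scores = f_scores(X, y)
    return np.where(scores > scores.mean())[0]
```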

Journal ArticleDOI
TL;DR: This paper proposes a novel semi-supervised orthogonal discriminant analysis via label propagation that propagates the label information from the labeled data to the unlabeled data through a specially designed label propagation, and thus the distribution of the unlabeled data can be explored more effectively to learn a better subspace.