Showing papers on "Dimensionality reduction published in 2009"

PDF

Open Access

Dimensionality Reduction: A Comparative Review

[...]

Laurens van der Maaten, Eric O. Postma¹, Jaap van den Herik²•Institutions (2)

Tilburg University¹, Maastricht University²

01 Jan 2009

TL;DR: The results of the experiments reveal that nonlinear techniques perform well on selected artificial tasks, but that this strong performance does not necessarily extend to real-world tasks.

...read moreread less

Abstract: In recent years, a variety of nonlinear dimensionality reduction techniques have been proposed that aim to address the limitations of traditional techniques such as PCA and classical scaling. The paper presents a review and systematic comparison of these techniques. The performances of the nonlinear techniques are investigated on artificial and natural tasks. The results of the experiments reveal that nonlinear techniques perform well on selected artificial tasks, but that this strong performance does not necessarily extend to real-world tasks. The paper explains these results by identifying weaknesses of current nonlinear techniques, and suggests how the performance of nonlinear dimensionality reduction techniques may be improved.

...read moreread less

2,141 citations

Proceedings Article•DOI•

Feature hashing for large scale multitask learning

[...]

Kilian Q. Weinberger¹, Anirban Dasgupta¹, John Langford¹, Alexander J. Smola¹, Josh Attenberg¹ - Show less +1 more•Institutions (1)

Yahoo!¹

14 Jun 2009

TL;DR: In this article, the authors provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability, and demonstrate the feasibility of this approach with experimental results for a new use case.

...read moreread less

Abstract: Empirical evidence suggests that hashing is an effective strategy for dimensionality reduction and practical nonparametric estimation. In this paper we provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability. We demonstrate the feasibility of this approach with experimental results for a new use case --- multitask learning with hundreds of thousands of tasks.

...read moreread less

955 citations

Journal Article•DOI•

A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

[...]

Bjoern H. Menze¹, B. Michael Kelm¹, Ralf Masuch, Uwe Himmelreich², Peter Bachert³, Wolfgang Petrich⁴, Wolfgang Petrich⁵, Fred A. Hamprecht¹, Fred A. Hamprecht⁴ - Show less +5 more•Institutions (5)

Interdisciplinary Center for Scientific Computing¹, Katholieke Universiteit Leuven², German Cancer Research Center³, Heidelberg University⁴, Hoffmann-La Roche⁵

10 Jul 2009-BMC Bioinformatics

TL;DR: The Gini importance of the random forest provided superior means for measuring feature relevance on spectral data, but – on an optimal subset of features – the regularized classifiers might be preferable over the random Forest classifier, in spite of their limitation to model linear dependencies only.

...read moreread less

Abstract: Regularized regression methods such as principal component or partial least squares regression perform well in learning tasks on high dimensional spectral data, but cannot explicitly eliminate irrelevant features. The random forest classifier with its associated Gini feature importance, on the other hand, allows for an explicit feature elimination, but may not be optimally adapted to spectral data due to the topology of its constituent classification trees which are based on orthogonal splits in feature space. We propose to combine the best of both approaches, and evaluated the joint use of a feature selection based on a recursive feature elimination using the Gini importance of random forests' together with regularized classification methods on spectral data sets from medical diagnostics, chemotaxonomy, biomedical analytics, food science, and synthetically modified spectral data. Here, a feature selection using the Gini feature importance with a regularized classification by discriminant partial least squares regression performed as well as or better than a filtering according to different univariate statistical tests, or using regression coefficients in a backward feature elimination. It outperformed the direct application of the random forest classifier, or the direct application of the regularized classifiers on the full set of features. The Gini importance of the random forest provided superior means for measuring feature relevance on spectral data, but – on an optimal subset of features – the regularized classifiers might be preferable over the random forest classifier, in spite of their limitation to model linear dependencies only. A feature selection based on Gini importance, however, may precede a regularized linear classification to identify this optimal subset of features, and to earn a double benefit of both dimensionality reduction and the elimination of noise from the classification task.

...read moreread less

726 citations

Journal Article•DOI•

Geometric Mean for Subspace Selection

[...]

Dacheng Tao¹, Xuelong Li², Xindong Wu³, Stephen J. Maybank¹•Institutions (3)

Birkbeck, University of London¹, University of London², University of Vermont³

01 Feb 2009-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Preliminary experimental results show that the third criterion is a potential discriminative subspace selection method, which significantly reduces the class separation problem in comparing with the linear dimensionality reduction step in FLDA and its several representative extensions.

...read moreread less

Abstract: Subspace selection approaches are powerful tools in pattern classification and data visualization. One of the most important subspace approaches is the linear dimensionality reduction step in the Fisher's linear discriminant analysis (FLDA), which has been successfully employed in many fields such as biometrics, bioinformatics, and multimedia information management. However, the linear dimensionality reduction step in FLDA has a critical drawback: for a classification task with c classes, if the dimension of the projected subspace is strictly lower than c - 1, the projection to a subspace tends to merge those classes, which are close together in the original feature space. If separate classes are sampled from Gaussian distributions, all with identical covariance matrices, then the linear dimensionality reduction step in FLDA maximizes the mean value of the Kullback-Leibler (KL) divergences between different classes. Based on this viewpoint, the geometric mean for subspace selection is studied in this paper. Three criteria are analyzed: 1) maximization of the geometric mean of the KL divergences, 2) maximization of the geometric mean of the normalized KL divergences, and 3) the combination of 1 and 2. Preliminary experimental results based on synthetic data, UCI Machine Learning Repository, and handwriting digits show that the third criterion is a potential discriminative subspace selection method, which significantly reduces the class separation problem in comparing with the linear dimensionality reduction step in FLDA and its several representative extensions.

...read moreread less

581 citations

Proceedings Article•DOI•

Human detection using partial least squares analysis

[...]

William Robson Schwartz¹, Aniruddha Kembhavi¹, David Harwood¹, Larry S. Davis¹•Institutions (1)

University of Maryland, College Park¹

01 Sep 2009

TL;DR: This paper describes a human detection method that augments widely used edge-based features with texture and color information, providing us with a much richer descriptor set, and is shown to outperform state-of-the-art techniques on three varied datasets.

...read moreread less

Abstract: Significant research has been devoted to detecting people in images and videos. In this paper we describe a human detection method that augments widely used edge-based features with texture and color information, providing us with a much richer descriptor set. This augmentation results in an extremely high-dimensional feature space (more than 170,000 dimensions). In such high-dimensional spaces, classical machine learning algorithms such as SVMs are nearly intractable with respect to training. Furthermore, the number of training samples is much smaller than the dimensionality of the feature space, by at least an order of magnitude. Finally, the extraction of features from a densely sampled grid structure leads to a high degree of multicollinearity. To circumvent these data characteristics, we employ Partial Least Squares (PLS) analysis, an efficient dimensionality reduction technique, one which preserves significant discriminative information, to project the data onto a much lower dimensional subspace (20 dimensions, reduced from the original 170,000). Our human detection system, employing PLS analysis over the enriched descriptor set, is shown to outperform state-of-the-art techniques on three varied datasets including the popular INRIA pedestrian dataset, the low-resolution gray-scale DaimlerChrysler pedestrian dataset, and the ETHZ pedestrian dataset consisting of full-length videos of crowded scenes.

...read moreread less

536 citations

Journal Article•DOI•

New Approaches to Fuzzy-Rough Feature Selection

[...]

Richard Jensen¹, Qiang Shen¹•Institutions (1)

Aberystwyth University¹

01 Aug 2009-IEEE Transactions on Fuzzy Systems

TL;DR: Three new approaches to fuzzy-rough FS-based on fuzzy similarity relations based on crisp discernibility matrices are proposed and utilized and initial experimentation shows that the methods greatly reduce dimensionality while preserving classification accuracy.

...read moreread less

Abstract: There has been great interest in developing methodologies that are capable of dealing with imprecision and uncertainty. The large amount of research currently being carried out in fuzzy and rough sets is representative of this. Many deep relationships have been established, and recent studies have concluded as to the complementary nature of the two methodologies. Therefore, it is desirable to extend and hybridize the underlying concepts to deal with additional aspects of data imperfection. Such developments offer a high degree of flexibility and provide robust solutions and advanced tools for data analysis. Fuzzy-rough set-based feature (FS) selection has been shown to be highly useful at reducing data dimensionality but possesses several problems that render it ineffective for large datasets. This paper proposes three new approaches to fuzzy-rough FS-based on fuzzy similarity relations. In particular, a fuzzy extension to crisp discernibility matrices is proposed and utilized. Initial experimentation shows that the methods greatly reduce dimensionality while preserving classification accuracy.

...read moreread less

521 citations

Journal Article•DOI•

Random Projections of Smooth Manifolds

[...]

Richard G. Baraniuk¹, Michael B. Wakin²•Institutions (2)

Rice University¹, University of Michigan²

23 Jan 2009-Foundations of Computational Mathematics

TL;DR: A new approach for nonadaptive dimensionality reduction of manifold-modeled data is proposed, demonstrating that a small number of random linear projections can preserve key information about a manifold- modeled signal.

...read moreread less

Abstract: We propose a new approach for nonadaptive dimensionality reduction of manifold-modeled data, demonstrating that a small number of random linear projections can preserve key information about a manifold-modeled signal. We center our analysis on the effect of a random linear projection operator Φ:ℝ N →ℝM , M

...read moreread less

488 citations

Journal Article•DOI•

Feature selection for multi-label naive Bayes classification

[...]

Min-Ling Zhang¹, José M. Peña², Víctor Robles²•Institutions (2)

Nanjing University¹, Technical University of Madrid²

01 Sep 2009-Information Sciences

TL;DR: This paper proposes a method called Mlnb which adapts the traditional naive Bayes classifiers to deal with multi-label instances and achieves comparable performance to other well-established multi- label learning algorithms.

...read moreread less

433 citations

Proceedings Article•

Learning a Parametric Embedding by Preserving Local Structure

[...]

Laurens van der Maaten

15 Apr 2009

TL;DR: In this paper, a new unsupervised dimensionality reduction technique, called parametric t-SNE, was proposed, which learns a parametric mapping between the high-dimensional data space and the low-dimensional latent space.

...read moreread less

Abstract: The paper presents a new unsupervised dimensionality reduction technique, called parametric t-SNE, that learns a parametric mapping between the high-dimensional data space and the low-dimensional latent space. Parametric t-SNE learns the parametric mapping in such a way that the local structure of the data is preserved as well as possible in the latent space. We evaluate the performance of parametric t-SNE in experiments on three datasets, in which we compare it to the performance of two other unsupervised parametric dimensionality reduction techniques. The results of experiments illustrate the strong performance of parametric t-SNE, in particular, in learning settings in which the dimensionality of the latent space is relatively low.

...read moreread less

411 citations

Journal Article•DOI•

Patch Alignment for Dimensionality Reduction

[...]

Tianhao Zhang¹, Dacheng Tao², Xuelong Li³, Jie Yang¹•Institutions (3)

Shanghai Jiao Tong University¹, Hong Kong Polytechnic University², Birkbeck, University of London³

01 Sep 2009-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A new dimensionality reduction algorithm is developed, termed discrim inative locality alignment (DLA), by imposing discriminative information in the part optimization stage, and thorough empirical studies demonstrate the effectiveness of DLA compared with representative dimensionality Reduction algorithms.

...read moreread less

Abstract: Spectral analysis-based dimensionality reduction algorithms are important and have been popularly applied in data mining and computer vision applications. To date many algorithms have been developed, e.g., principal component analysis, locally linear embedding, Laplacian eigenmaps, and local tangent space alignment. All of these algorithms have been designed intuitively and pragmatically, i.e., on the basis of the experience and knowledge of experts for their own purposes. Therefore, it will be more informative to provide a systematic framework for understanding the common properties and intrinsic difference in different algorithms. In this paper, we propose such a framework, named "patch alignment,rdquo which consists of two stages: part optimization and whole alignment. The framework reveals that (1) algorithms are intrinsically different in the patch optimization stage and (2) all algorithms share an almost identical whole alignment stage. As an application of this framework, we develop a new dimensionality reduction algorithm, termed discriminative locality alignment (DLA), by imposing discriminative information in the part optimization stage. DLA can (1) attack the distribution nonlinearity of measurements; (2) preserve the discriminative ability; and (3) avoid the small-sample-size problem. Thorough empirical studies demonstrate the effectiveness of DLA compared with representative dimensionality reduction algorithms.

...read moreread less

390 citations

Journal Article•DOI•

Text feature selection using ant colony optimization

[...]

Mehdi Hosseinzadeh Aghdam¹, Nasser Ghasem-Aghaee¹, Mohammad Ehsan Basiri¹•Institutions (1)

University of Isfahan¹

01 Apr 2009-Expert Systems With Applications

TL;DR: This work presents a novel feature selection algorithm that is based on ant colony optimization that is inspired by observation on real ants in their search for the shortest paths to food sources and shows the superiority of the proposed algorithm on Reuters-21578 dataset.

...read moreread less

Abstract: Feature selection and feature extraction are the most important steps in classification systems. Feature selection is commonly used to reduce dimensionality of datasets with tens or hundreds of thousands of features which would be impossible to process further. One of the problems in which feature selection is essential is text categorization. A major problem of text categorization is the high dimensionality of the feature space; therefore, feature selection is the most important step in text categorization. At present there are many methods to deal with text feature selection. To improve the performance of text categorization, we present a novel feature selection algorithm that is based on ant colony optimization. Ant colony optimization algorithm is inspired by observation on real ants in their search for the shortest paths to food sources. Proposed algorithm is easily implemented and because of use of a simple classifier in that, its computational complexity is very low. The performance of proposed algorithm is compared to the performance of genetic algorithm, information gain and CHI on the task of feature selection in Reuters-21578 dataset. Simulation results on Reuters-21578 dataset show the superiority of the proposed algorithm.

...read moreread less

Journal Article•DOI•

Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems

[...]

Youssef M. Marzouk¹, Habib N. Najm²•Institutions (2)

Massachusetts Institute of Technology¹, Sandia National Laboratories²

01 Apr 2009-Journal of Computational Physics

TL;DR: This work considers a Bayesian approach to nonlinear inverse problems in which the unknown quantity is a spatial or temporal field, endowed with a hierarchical Gaussian process prior, and introduces truncated Karhunen-Loeve expansions, based on the prior distribution, to efficiently parameterize the unknown field.

...read moreread less

Journal Article•DOI•

Review: Dimensionality reduction based on rough set theory: A review

[...]

K. Thangavel¹, A. Pethalakshmi²•Institutions (2)

Periyar University¹, Mother Teresa Women's University²

01 Jan 2009

TL;DR: The rough sets hybridization with fuzzy sets, neural network and metaheuristic algorithms have been reviewed and the performance analysis of the algorithms has been discussed in connection with the classification.

...read moreread less

Abstract: A rough set theory is a new mathematical tool to deal with uncertainty and vagueness of decision system and it has been applied successfully in all the fields. It is used to identify the reduct set of the set of all attributes of the decision system. The reduct set is used as preprocessing technique for classification of the decision system in order to bring out the potential patterns or association rules or knowledge through data mining techniques. Several researchers have contributed variety of algorithms for computing the reduct sets by considering different cases like inconsistency, missing attribute values and multiple decision attributes of the decision system. This paper focuses on the review of the techniques for dimensionality reduction under rough set theory environment. Further, the rough sets hybridization with fuzzy sets, neural network and metaheuristic algorithms have also been reviewed. The performance analysis of the algorithms has been discussed in connection with the classification.

...read moreread less

Journal Article•DOI•

Trace Ratio Problem Revisited

[...]

Yangqing Jia¹, Feiping Nie¹, Changshui Zhang¹•Institutions (1)

Tsinghua University¹

01 Apr 2009-IEEE Transactions on Neural Networks

TL;DR: A theoretical overview of the global optimum solution to the TR problem via the equivalent trace difference problem is proposed, and Eigenvalue perturbation theory is introduced to derive an efficient algorithm based on the Newton-Raphson method.

...read moreread less

Abstract: Dimensionality reduction is an important issue in many machine learning and pattern recognition applications, and the trace ratio (TR) problem is an optimization problem involved in many dimensionality reduction algorithms. Conventionally, the solution is approximated via generalized eigenvalue decomposition due to the difficulty of the original problem. However, prior works have indicated that it is more reasonable to solve it directly than via the conventional way. In this brief, we propose a theoretical overview of the global optimum solution to the TR problem via the equivalent trace difference problem. Eigenvalue perturbation theory is introduced to derive an efficient algorithm based on the Newton-Raphson method. Theoretical issues on the convergence and efficiency of our algorithm compared with prior literature are proposed, and are further supported by extensive empirical results.

...read moreread less

Journal Article•DOI•

Quality assessment of dimensionality reduction: Rank-based criteria

[...]

John Aldo Lee¹, Michel Verleysen¹•Institutions (1)

Université catholique de Louvain¹

01 Mar 2009-Neurocomputing

TL;DR: Simple criteria are proposed, which quantify two aspects of the embedding quality, namely its overall quality and its tendency to favor intrusions or extrusions, which are applied to several recent dimensionality reduction methods.

...read moreread less

Journal Article•DOI•

High-dimensional analysis of semidefinite relaxations for sparse principal components

[...]

Arash A. Amini, Martin J. Wainwright

01 Oct 2009-Annals of Statistics

TL;DR: In this paper, the authors consider a spiked covariance model in which a base matrix is perturbed by adding a k-sparse maximal eigenvector, and analyze two computationally tractable methods for recovering the support set of this maximal eigvector, as follows: (a) a simple diagonal thresholding method, which transitions from success to failure as a function of the rescaled sample size θdia(n, p, k)=n/[k2log(p−k)]; and (b) a more sophisticated semidefinite programming

...read moreread less

Abstract: Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the “large p, small n” setting, in which the problem dimension p is comparable to or larger than the sample size n. This paper studies PCA in this high-dimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say, with at most k nonzero components. We consider a spiked covariance model in which a base matrix is perturbed by adding a k-sparse maximal eigenvector, and we analyze two computationally tractable methods for recovering the support set of this maximal eigenvector, as follows: (a) a simple diagonal thresholding method, which transitions from success to failure as a function of the rescaled sample size θdia(n, p, k)=n/[k2log(p−k)]; and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the rescaled sample size θsdp(n, p, k)=n/[klog(p−k)] is larger than a critical threshold. In addition, we prove that no method, including the best method which has exponential-time complexity, can succeed in recovering the support if the order parameter θsdp(n, p, k) is below a threshold. Our results thus highlight an interesting trade-off between computational and statistical efficiency in high-dimensional inference.

...read moreread less

Journal Article•DOI•

Pca consistency in high dimension, low sample size context

[...]

Sungkyu Jung¹, James Stephen Marron•Institutions (1)

University of North Carolina at Chapel Hill¹

01 Dec 2009-Annals of Statistics

TL;DR: In this paper, the authors investigate the asymptotic behavior of the Principal Component (PC) directions in HDLSS data and show that if the first few eigenvalues of a population covariance matrix are large enough compared to the others, then the corresponding estimated PC directions are consistent or converge to the appropriate subspace (subspace consistency).

...read moreread less

Abstract: Principal Component Analysis (PCA) is an important tool of dimension reduction especially when the dimension (or the number of variables) is very high. Asymptotic studies where the sample size is fixed, and the dimension grows (i.e. High Dimension, Low Sample Size (HDLSS)) are becoming increasingly relevant. We investigate the asymptotic behavior of the Principal Component (PC) directions. HDLSS asymptotics are used to study consistency, strong inconsistency and subspace consistency. We show that if the first few eigenvalues of a population covariance matrix are large enough compared to the others, then the corresponding estimated PC directions are consistent or converge to the appropriate subspace (subspace consistency) and most other PC directions are strongly inconsistent. Broad sets of sucient conditions for each of these cases are specified and the main theorem gives a catalogue of possible combinations. In preparation for these results, we show that the geometric representation of HDLSS data holds under general conditions, which includes a mixing condition and a broad range of sphericity measures of the covariance matrix.

...read moreread less

Journal Article•DOI•

PCA consistency in high dimension, low sample size context

[...]

Sungkyu Jung¹, James Stephen Marron•Institutions (1)

University of North Carolina at Chapel Hill¹

19 Nov 2009-arXiv: Statistics Theory

TL;DR: This work investigates the asymptotic behavior of the Principal Component (PC) directions and shows that if the first few eigenvalues of a population covariance matrix are large enough compared to the others, then the corresponding estimated PC directions are consistent or converge to the appropriate subspace (subspace consistency) and most otherPC directions are strongly inconsistent.

...read moreread less

Abstract: Principal Component Analysis (PCA) is an important tool of dimension reduction especially when the dimension (or the number of variables) is very high. Asymptotic studies where the sample size is fixed, and the dimension grows [i.e., High Dimension, Low Sample Size (HDLSS)] are becoming increasingly relevant. We investigate the asymptotic behavior of the Principal Component (PC) directions. HDLSS asymptotics are used to study consistency, strong inconsistency and subspace consistency. We show that if the first few eigenvalues of a population covariance matrix are large enough compared to the others, then the corresponding estimated PC directions are consistent or converge to the appropriate subspace (subspace consistency) and most other PC directions are strongly inconsistent. Broad sets of sufficient conditions for each of these cases are specified and the main theorem gives a catalogue of possible combinations. In preparation for these results, we show that the geometric representation of HDLSS data holds under general conditions, which includes a $\rho$-mixing condition and a broad range of sphericity measures of the covariance matrix.

...read moreread less

Journal Article•DOI•

Different metaheuristic strategies to solve the feature selection problem

[...]

Silvia Casado Yusta¹•Institutions (1)

University of Burgos¹

01 Apr 2009-Pattern Recognition Letters

TL;DR: This paper investigates feature subset selection for dimensionality reduction in machine learning and finds that GRASP and Tabu Search obtain significantly better results than the other methods.

...read moreread less

Journal Article•DOI•

Local Multidimensional Scaling for Nonlinear Dimension Reduction, Graph Drawing, and Proximity Analysis

[...]

Lisha Chen, Andreas Buja

01 Mar 2009-Journal of the American Statistical Association

TL;DR: This work applies the force paradigm to create localized versions of MDS stress functions with a tuning parameter to adjust the strength of nonlocal repulsive forces and solves the problem of tuning parameter selection with a meta-criterion that measures how well the sets of K-nearest neighbors agree between the data and the embedding.

...read moreread less

Abstract: In the past decade there has been a resurgence of interest in nonlinear dimension reduction. Among new proposals are “Local Linear Embedding,” “Isomap,” and Kernel Principal Components Analysis which all construct global low-dimensional embeddings from local affine or metric information. We introduce a competing method called “Local Multidimensional Scaling” (LMDS). Like LLE, Isomap, and KPCA, LMDS constructs its global embedding from local information, but it uses instead a combination of MDS and “force-directed” graph drawing. We apply the force paradigm to create localized versions of MDS stress functions with a tuning parameter to adjust the strength of nonlocal repulsive forces. We solve the problem of tuning parameter selection with a meta-criterion that measures how well the sets of K-nearest neighbors agree between the data and the embedding. Tuned LMDS seems to be able to outperform MDS, PCA, LLE, Isomap, and KPCA, as illustrated with two well-known image datasets. The meta-criterion can also be ...

...read moreread less

Proceedings Article•DOI•

Multi-label sparse coding for automatic image annotation

[...]

Changhu Wang¹, Shuicheng Yan², Lei Zhang³, Hong-Jiang Zhang⁴•Institutions (4)

University of Science and Technology of China¹, National University of Singapore², Microsoft³, Advanced Technology Center⁴

20 Jun 2009

TL;DR: A label sparse coding based subspace learning algorithm is derived to effectively harness multi-label information for dimensionality reduction and propagate the multi-labels of the training images to the query image with the sparse l1 reconstruction coefficients.

...read moreread less

Abstract: In this paper, we present a multi-label sparse coding framework for feature extraction and classification within the context of automatic image annotation. First, each image is encoded into a so-called supervector, derived from the universal Gaussian Mixture Models on orderless image patches. Then, a label sparse coding based subspace learning algorithm is derived to effectively harness multi-label information for dimensionality reduction. Finally, the sparse coding method for multi-label data is proposed to propagate the multi-labels of the training images to the query image with the sparse l1 reconstruction coefficients. Extensive image annotation experiments on the Corel5k and Corel30k databases both show the superior performance of the proposed multi-label sparse coding framework over the state-of-the-art algorithms.

...read moreread less

A Review of Dimension Reduction Techniques

[...]

Miguel Á. Carreira-Perpiñán¹•Institutions (1)

University of Sheffield¹

01 Jan 2009

TL;DR: A survey of several techniques for dimension reduction, including principal component analysis, projection pursuit and projection pursuit regression, principal curves and methods based on topologically continuous maps, such as Kohonen’s maps or the generalised topographic mapping are given.

...read moreread less

Abstract: The problem of dimension reduction is introduced as a way to overcome the curse of the dimensionality when dealing with vector data in high-dimensional spaces and as a modelling tool for such data. It is defined as the search for a low-dimensional manifold that embeds the high-dimensional data. A classification of dimension reduction problems is proposed. A survey of several techniques for dimension reduction is given, including principal component analysis, projection pursuit and projection pursuit regression, principal curves and methods based on topologically continuous maps, such as Kohonen’s maps or the generalised topographic mapping. Neural network implementations for several of these techniques are also reviewed, such as the projection pursuit learning network and the BCM neuron with an objective function. Several appendices complement the mathematical treatment of the main text.

...read moreread less

Journal Article•DOI•

iPCA: an interactive system for PCA-based visual analytics

[...]

Dong Hyun Jeong¹, Caroline Ziemkiewicz¹, Brian Fisher², William Ribarsky¹, Remco Chang¹ - Show less +1 more•Institutions (2)

University of North Carolina at Charlotte¹, Simon Fraser University²

10 Jun 2009

TL;DR: This work has developed a system that visualizes the results of principal component analysis using multiple coordinated views and a rich set of user interactions to support analysis of multivariate datasets through extensive interaction with the PCA output.

...read moreread less

Abstract: Principle Component Analysis (PCA) is a widely used mathematical technique in many fields for factor and trend analysis, dimension reduction, etc. However, it is often considered to be a "black box" operation whose results are difficult to interpret and sometimes counter-intuitive to the user. In order to assist the user in better understanding and utilizing PCA, we have developed a system that visualizes the results of principal component analysis using multiple coordinated views and a rich set of user interactions. Our design philosophy is to support analysis of multivariate datasets through extensive interaction with the PCA output. To demonstrate the usefulness of our system, we performed a comparative user study with a known commercial system, SAS/INSIGHT's Interactive Data Exploration. Participants in our study solved a number of high-level analysis tasks with each interface and rated the systems on ease of learning and usefulness. Based on the participants' accuracy, speed, and qualitative feedback, we observe that our system helps users to better understand relationships between the data and the calculated eigenspace, which allows the participants to more accurately analyze the data. User feedback suggests that the interactivity and transparency of our system are the key strengths of our approach.

...read moreread less

Proceedings Article•DOI•

Is there a best hyperspectral detection algorithm

[...]

Dimitris G. Manolakis¹, Ronald B. Lockwood², Thomas W. Cooley², J. Jacobson•Institutions (2)

Massachusetts Institute of Technology¹, Air Force Research Laboratory²

27 Apr 2009-Proceedings of SPIE

TL;DR: A large number of hyperspectral detection algorithms have been developed and used over the last two decades as mentioned in this paper, some of which are based on highly sophisticated mathematical models and methods; others are derived using intuition and simple geometrical concepts.

...read moreread less

Abstract: A large number of hyperspectral detection algorithms have been developed and used over the last two decades. Some algorithms are based on highly sophisticated mathematical models and methods; others are derived using intuition and simple geometrical concepts. The purpose of this paper is threefold. First, we discuss the key issues involved in the design and evaluation of detection algorithms for hyperspectral imaging data. Second, we present a critical review of existing detection algorithms for practical hyperspectral imaging applications. Finally, we argue that the "apparent" superiority of sophisticated algorithms with simulated data or in laboratory conditions, does not necessarily translate to superiority in real-world applications. A large number of hyperspectral detection algorithms have been developed and used over the last two decades. A partial list includes the classical matched filter, RX anomaly detector, orthogonal subspace projector, adaptive cosine estimator, finite target matched filter, mixture-tuned matched filter, subspace detectors, kernel matched subspace detectors, and joint subspace detectors. In addition, different methods for dimensionality reduction, background clutter modeling, endmem- ber selection, and the choice between radiance versus reflectance domain processing multiply the number of detection algorithms yet further. New algorithms, new variants of existing algorithms, and new implementations of existing meth- ods appear all the time. Furthermore, a large number of papers have been published in attempts to establish the relative superiority of these detectors. The main argument of this paper is that if we take into account important aspects of real hyperspectral imaging problems, proper use of simple detectors, like the matched filter and the adaptive cosine estimator, may provide acceptable performance for practically relevant applications. More specifically this paper has the following objectives. (a) Discuss how the at-sensor radiance physics-based signal model and the low-rank properties of background covariance matrix lead to a parsimonious taxonomy of most widely used detection algorithms. (b) Explain how the limited amount of background data with respect to the high dimensionality of the feature space limit the performance of detection algorithms that are optimum on theoretical grounds. (c) Argue that any small performance gains attained by more sophis- ticated detectors are irrelevant in practical applications because of the limitations and the uncertainties about many aspects of the situation in which the detector will be deployed. (d) Draw distinction between detectors which require substantial input of expertise and detectors which can be applied automatically with little external input of expertise. (e) Try to answer the question: Is there a best hyperspectral detection algorithm?

...read moreread less

Journal Article•DOI•

Acoustic feature selection for automatic emotion recognition from speech

[...]

Jia Rong¹, Gang Li¹, Yi-Ping Phoebe Chen¹•Institutions (1)

Deakin University¹

01 May 2009-Information Processing and Management

TL;DR: A novel algorithm is presented in this paper, which can be applied on a small sized data set with a high number of features and outperform the commonly used Principle Component Analysis (PCA)/Multi-Dimensional Scaling (MDS) methods, and the more recently developed ISOMap dimensionality reduction method.

...read moreread less

Abstract: Emotional expression and understanding are normal instincts of human beings, but automatical emotion recognition from speech without referring any language or linguistic information remains an unclosed problem. The limited size of existing emotional data samples, and the relative higher dimensionality have outstripped many dimensionality reduction and feature selection algorithms. This paper focuses on the data preprocessing techniques which aim to extract the most effective acoustic features to improve the performance of the emotion recognition. A novel algorithm is presented in this paper, which can be applied on a small sized data set with a high number of features. The presented algorithm integrates the advantages from a decision tree method and the random forest ensemble. Experiment results on a series of Chinese emotional speech data sets indicate that the presented algorithm can achieve improved results on emotional recognition, and outperform the commonly used Principle Component Analysis (PCA)/Multi-Dimensional Scaling (MDS) methods, and the more recently developed ISOMap dimensionality reduction method.

...read moreread less

Journal Article•DOI•

Compressive-Projection Principal Component Analysis

[...]

James E. Fowler¹•Institutions (1)

Mississippi State University¹

01 Oct 2009-IEEE Transactions on Image Processing

TL;DR: CPPCA constitutes a fundamental departure from traditional PCA in that it permits its excellent dimensionality-reduction and compression performance to be realized in an light-encoder/heavy-decoder system architecture.

...read moreread less

Abstract: Principal component analysis (PCA) is often central to dimensionality reduction and compression in many applications, yet its data-dependent nature as a transform computed via expensive eigendecomposition often hinders its use in severely resource-constrained settings such as satellite-borne sensors. A process is presented that effectively shifts the computational burden of PCA from the resource-constrained encoder to a presumably more capable base-station decoder. The proposed approach, compressive-projection PCA (CPPCA), is driven by projections at the sensor onto lower-dimensional subspaces chosen at random, while the CPPCA decoder, given only these random projections, recovers not only the coefficients associated with the PCA transform, but also an approximation to the PCA transform basis itself. An analysis is presented that extends existing Rayleigh-Ritz theory to the special case of highly eccentric distributions; this analysis in turn motivates a reconstruction process at the CPPCA decoder that consists of a novel eigenvector reconstruction based on a convex-set optimization driven by Ritz vectors within the projected subspaces. As such, CPPCA constitutes a fundamental departure from traditional PCA in that it permits its excellent dimensionality-reduction and compression performance to be realized in an light-encoder/heavy-decoder system architecture. In experimental results, CPPCA outperforms a multiple-vector variant of compressed sensing for the reconstruction of hyperspectral data.

...read moreread less

Proceedings Article•DOI•

Hierarchical Gaussianization for image classification

[...]

Xi Zhou¹, Na Cui¹, Zhen Li¹, Feng Liang¹, Thomas S. Huang¹ - Show less +1 more•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Sep 2009

TL;DR: A new image representation to capture both the appearance and spatial information for image classification applications is proposed and it is justified that the traditional histogram representation and the spatial pyramid matching are special cases of the hierarchical Gaussianization.

...read moreread less

Abstract: In this paper, we propose a new image representation to capture both the appearance and spatial information for image classification applications First, we model the feature vectors, from the whole corpus, from each image and at each individual patch, in a Bayesian hierarchical framework using mixtures of Gaussians After such a hierarchical Gaussianization, each image is represented by a Gaussian mixture model (GMM) for its appearance, and several Gaussian maps for its spatial layout Then we extract the appearance information from the GMM parameters, and the spatial information from global and local statistics over Gaussian maps Finally, we employ a supervised dimension reduction technique called DAP (discriminant attribute projection) to remove noise directions and to further enhance the discriminating power of our representation We justify that the traditional histogram representation and the spatial pyramid matching are special cases of our hierarchical Gaussianization We compare our new representation with other approaches in scene classification, object recognition and face recognition, and our performance ranks among the top in all three tasks

...read moreread less

Journal Article•DOI•

Interactive Dimensionality Reduction Through User-defined Combinations of Quality Metrics

[...]

Sara Johansson¹, Jimmy Johansson¹•Institutions (1)

Linköping University¹

01 Nov 2009-IEEE Transactions on Visualization and Computer Graphics

TL;DR: A system for dimensionality reduction by combining user-defined quality metrics using weight functions to preserve as many important structures as possible and provides enhancement of diverse structures by supplying a range of automatic variable orderings is introduced.

...read moreread less

Abstract: Multivariate data sets including hundreds of variables are increasingly common in many application areas. Most multivariate visualization techniques are unable to display such data effectively, and a common approach is to employ dimensionality reduction prior to visualization. Most existing dimensionality reduction systems focus on preserving one or a few significant structures in data. For many analysis tasks, however, several types of structures can be of high significance and the importance of a certain structure compared to the importance of another is often task-dependent. This paper introduces a system for dimensionality reduction by combining user-defined quality metrics using weight functions to preserve as many important structures as possible. The system aims at effective visualization and exploration of structures within large multivariate data sets and provides enhancement of diverse structures by supplying a range of automatic variable orderings. Furthermore it enables a quality-guided reduction of variables through an interactive display facilitating investigation of trade-offs between loss of structure and the number of variables to keep. The generality and interactivity of the system is demonstrated through a case scenario.

...read moreread less

Journal Article•DOI•

A new feature selection method on classification of medical datasets: Kernel F-score feature selection

[...]

Kemal Polat¹, Salih Güneş¹•Institutions (1)

Selçuk University¹

01 Sep 2009-Expert Systems With Applications

TL;DR: The proposed feature selection method called KFFS is produced very promising results compared to F-score feature selection, and the irrelevant or redundant features are removed from high dimensional input feature space.

...read moreread less

Abstract: In this paper, we have proposed a new feature selection method called kernel F-score feature selection (KFFS) used as pre-processing step in the classification of medical datasets. KFFS consists of two phases. In the first phase, input spaces (features) of medical datasets have been transformed to kernel space by means of Linear (Lin) or Radial Basis Function (RBF) kernel functions. By this way, the dimensions of medical datasets have increased to high dimension feature space. In the second phase, the F-score values of medical datasets with high dimensional feature space have been calculated using F-score formula. And then the mean value of calculated F-scores has been computed. If the F-score value of any feature in medical datasets is bigger than this mean value, that feature will be selected. Otherwise, that feature is removed from feature space. Thanks to KFFS method, the irrelevant or redundant features are removed from high dimensional input feature space. The cause of using kernel functions transforms from non-linearly separable medical dataset to a linearly separable feature space. In this study, we have used the heart disease dataset, SPECT (Single Photon Emission Computed Tomography) images dataset, and Escherichia coli Promoter Gene Sequence dataset taken from UCI (University California, Irvine) machine learning database to test the performance of KFFS method. As classification algorithms, Least Square Support Vector Machine (LS-SVM) and Levenberg-Marquardt Artificial Neural Network have been used. As shown in the obtained results, the proposed feature selection method called KFFS is produced very promising results compared to F-score feature selection.

...read moreread less

Journal Article•DOI•

Semi-supervised orthogonal discriminant analysis via label propagation

[...]

Feiping Nie¹, Shiming Xiang², Yangqing Jia¹, Changshui Zhang¹•Institutions (2)

Tsinghua University¹, Chinese Academy of Sciences²

01 Nov 2009-Pattern Recognition

TL;DR: This paper proposes a novel semi-supervised orthogonal discriminant analysis via label propagation that propagates the label information from the labeled data to the unlabeled data through a specially designed label propagation, and thus the distribution of the unl labeled data can be explored more effectively to learn a better subspace.

...read moreread less

Collapse