
Showing papers on "Dimensionality reduction published in 2012"


Journal ArticleDOI
TL;DR: This paper first presents and evaluates different ways of aggregating local image descriptors into a vector and shows that the Fisher kernel achieves better performance than the reference bag-of-visual words approach for any given vector dimension.
Abstract: This paper addresses the problem of large-scale image search. Three constraints have to be taken into account: search accuracy, efficiency, and memory usage. We first present and evaluate different ways of aggregating local image descriptors into a vector and show that the Fisher kernel achieves better performance than the reference bag-of-visual words approach for any given vector dimension. We then jointly optimize dimensionality reduction and indexing in order to obtain a precise vector comparison as well as a compact representation. The evaluation shows that the image representation can be reduced to a few dozen bytes while preserving high accuracy. Searching a 100 million image data set takes about 250 ms on one processor core.
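As a rough illustration of the aggregation-plus-reduction pipeline described above, the sketch below builds a simplified Fisher vector (using only the gradient with respect to the GMM means) and compresses it with PCA. It assumes scikit-learn and NumPy, uses synthetic stand-ins for local descriptors, and is not the authors' implementation.

```python
# Minimal sketch (not the authors' code): aggregate local descriptors into a
# simplified Fisher vector (gradient w.r.t. the GMM means only), then compress
# the per-image vectors with PCA.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(5000, 64))   # stand-in for SIFT-like local descriptors
gmm = GaussianMixture(n_components=16, covariance_type="diag", random_state=0)
gmm.fit(train_descriptors)

def fisher_vector(descriptors, gmm):
    """Gradient of the log-likelihood w.r.t. the GMM means (simplified FV)."""
    q = gmm.predict_proba(descriptors)                       # (N, K) soft assignments
    diff = (descriptors[:, None, :] - gmm.means_[None]) / np.sqrt(gmm.covariances_)[None]
    grad_mu = (q[:, :, None] * diff).sum(axis=0)             # (K, D)
    grad_mu /= descriptors.shape[0] * np.sqrt(gmm.weights_)[:, None]
    fv = grad_mu.ravel()
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                   # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)                 # L2 normalization

# One Fisher vector per image, then PCA down to a compact code.
images = [rng.normal(size=(300, 64)) for _ in range(50)]
fvs = np.vstack([fisher_vector(d, gmm) for d in images])
codes = PCA(n_components=32).fit_transform(fvs)              # "a few dozen" dimensions
```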

1,649 citations


Journal ArticleDOI
TL;DR: The proposed framework employs local Fisher's discriminant analysis to reduce the dimensionality of the data while preserving its multimodal structure; a subsequent Gaussian mixture model or support vector machine then provides effective classification of the reduced-dimension multimodal data.
Abstract: Hyperspectral imagery typically provides a wealth of information captured in a wide range of the electromagnetic spectrum for each pixel in the image; however, when used in statistical pattern-classification tasks, the resulting high-dimensional feature spaces often tend to result in ill-conditioned formulations. Popular dimensionality-reduction techniques such as principal component analysis, linear discriminant analysis, and their variants typically assume a Gaussian distribution. The quadratic maximum-likelihood classifier commonly employed for hyperspectral analysis also assumes single-Gaussian class-conditional distributions. Departing from this single-Gaussian assumption, a classification paradigm designed to exploit the rich statistical structure of the data is proposed. The proposed framework employs local Fisher's discriminant analysis to reduce the dimensionality of the data while preserving its multimodal structure, while a subsequent Gaussian mixture model or support vector machine provides effective classification of the reduced-dimension multimodal data. Experimental results on several different multiple-class hyperspectral-classification tasks demonstrate that the proposed approach significantly outperforms several traditional alternatives.
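The overall pipeline (reduce dimensionality, then classify with a per-class Gaussian mixture) can be sketched as below. Since scikit-learn ships no local Fisher's discriminant analysis, ordinary LDA stands in for the reduction step, so this illustrates only the structure of the approach, not the proposed method itself.

```python
# Illustrative sketch of the pipeline shape only: reduce dimensionality, then
# classify with one Gaussian mixture per class. Plain LDA stands in for local
# Fisher's discriminant analysis, and the data are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1200, n_features=100, n_informative=20,
                           n_classes=5, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

reducer = LinearDiscriminantAnalysis(n_components=4).fit(Xtr, ytr)
Ztr, Zte = reducer.transform(Xtr), reducer.transform(Xte)

# Fit one small Gaussian mixture per class in the reduced space.
mixtures = {c: GaussianMixture(n_components=3, covariance_type="full",
                               random_state=0).fit(Ztr[ytr == c])
            for c in np.unique(ytr)}
log_lik = np.column_stack([mixtures[c].score_samples(Zte) for c in sorted(mixtures)])
pred = np.array(sorted(mixtures))[log_lik.argmax(axis=1)]
print("accuracy:", (pred == yte).mean())
```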

408 citations


Journal ArticleDOI
TL;DR: A novel nearest neighbor-based feature weighting algorithm, which learns a feature weighting vector by maximizing the expected leave-one-out classification accuracy with a regularization term, is proposed.
Abstract: Feature selection is of considerable importance in data mining and machine learning, especially for high dimensional data. In this paper, we propose a novel nearest neighbor-based feature weighting algorithm, which learns a feature weighting vector by maximizing the expected leave-one-out classification accuracy with a regularization term. The algorithm makes no parametric assumptions about the distribution of the data and scales naturally to multiclass problems. Experiments conducted on artificial and real data sets demonstrate that the proposed algorithm is largely insensitive to the increase in the number of irrelevant features and performs better than the state-of-the-art methods in most cases.

401 citations


Book ChapterDOI
07 Oct 2012
TL;DR: In this paper, improvements to the phases of PCA-based dimensionality reduction are proposed for short image vector representations, and quantization artifacts are alleviated through a joint dimensionality reduction of multiple vocabularies.
Abstract: The paper addresses large scale image retrieval with short vector representations. We study dimensionality reduction by Principal Component Analysis (PCA) and propose improvements to its different phases. We show and explicitly exploit relations between i) mean subtraction and the negative evidence, i.e., a visual word that is mutually missing in two descriptions being compared, and ii) the axis de-correlation and the co-occurrences phenomenon. Finally, we propose an effective way to alleviate the quantization artifacts through a joint dimensionality reduction of multiple vocabularies. The proposed techniques are simple, yet significantly and consistently improve over the state of the art on compact image representations. Complementary experiments in image classification show that the methods are generally applicable.
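A minimal sketch of the PCA phases the paper revisits (mean subtraction, projection, and axis de-correlation via whitening) applied to L2-normalized bag-of-words vectors, assuming NumPy; the joint multi-vocabulary reduction itself is not reproduced here.

```python
# Minimal sketch (not the authors' implementation): mean subtraction,
# projection, and whitening of L2-normalized bag-of-words vectors.
import numpy as np

rng = np.random.default_rng(0)
bow = rng.poisson(0.3, size=(2000, 4096)).astype(float)      # toy visual-word histograms
bow /= np.linalg.norm(bow, axis=1, keepdims=True) + 1e-12

mean = bow.mean(axis=0)                                       # mean subtraction
Xc = bow - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

d = 128                                                       # short output vectors
proj = Vt[:d].T / (S[:d] + 1e-12)                             # whitening: divide by singular values

def encode(x):
    z = (x - mean) @ proj
    return z / (np.linalg.norm(z) + 1e-12)                    # re-normalize after projection

codes = np.vstack([encode(v) for v in bow[:5]])
```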

390 citations


Journal ArticleDOI
TL;DR: In this paper, an efficient convex optimization-based algorithm that is called outlier pursuit is presented, which under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace and identifies the corrupted points.
Abstract: Singular-value decomposition (SVD) [and principal component analysis (PCA)] is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA, such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm that we call outlier pursuit, which under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation is of paramount interest in bioinformatics, financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization; however, our results, setup, and approach necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself. In any problem where one seeks to recover a structure rather than the exact initial matrices, techniques developed thus far relying on certificates of optimality will fail. We present an important extension of these methods, which allows the treatment of such problems.
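For intuition, here is a toy ADMM-style solver for the kind of decomposition outlier pursuit targets: a low-rank part penalized by the nuclear norm plus a column-sparse part penalized by the sum of column norms. The parameter values and the solver below are illustrative choices, not the paper's algorithm or its guarantees.

```python
# Illustrative ADMM-style sketch: split M into a low-rank part L (nuclear norm)
# and a column-sparse part C (sum of column l2 norms). Toy solver only.
import numpy as np

def svt(A, tau):
    """Singular value thresholding: prox of the nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def column_shrink(A, tau):
    """Prox of the sum of column l2 norms (encourages column sparsity)."""
    norms = np.linalg.norm(A, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return A * scale

def outlier_pursuit(M, lam=0.3, mu=1.0, iters=200):
    L = np.zeros_like(M); C = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(iters):
        L = svt(M - C + Y / mu, 1.0 / mu)
        C = column_shrink(M - L + Y / mu, lam / mu)
        Y = Y + mu * (M - L - C)
    return L, C

rng = np.random.default_rng(0)
M = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 200))      # rank-5 data
M[:, :10] = rng.normal(scale=5.0, size=(50, 10))              # 10 corrupted columns
L, C = outlier_pursuit(M)
print("columns flagged as outliers:", np.where(np.linalg.norm(C, axis=0) > 1e-3)[0][:10])
```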

388 citations


Journal Article
TL;DR: This work introduces a framework for feature selection based on dependence maximization between the selected features and the labels of an estimation problem, using the Hilbert-Schmidt Independence Criterion, and shows that a number of existing feature selectors are special cases of this framework.
Abstract: We introduce a framework for feature selection based on dependence maximization between the selected features and the labels of an estimation problem, using the Hilbert-Schmidt Independence Criterion. The key idea is that good features should be highly dependent on the labels. Our approach leads to a greedy procedure for feature selection. We show that a number of existing feature selectors are special cases of this framework. Experiments on both artificial and real-world data show that our feature selector works well in practice.
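A compact sketch of the dependence-maximization idea using the Hilbert-Schmidt Independence Criterion: score candidate feature subsets by the HSIC between their kernel matrix and the label kernel, and greedily drop the feature whose removal preserves the most dependence. The kernel choices and the backward-elimination variant below are assumptions for illustration.

```python
# HSIC-based feature scoring sketch: good features are highly dependent on the
# labels. Greedy backward elimination drops the feature whose removal keeps
# the HSIC of the remaining subset largest.
import numpy as np

def rbf_kernel(X, gamma=None):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    if gamma is None:
        med = np.median(sq[sq > 0])
        gamma = 1.0 / (med + 1e-12)
    return np.exp(-gamma * sq)

def hsic(K, L):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(float)               # only features 0 and 3 matter
L = rbf_kernel(y[:, None])                                     # label kernel

features = list(range(X.shape[1]))
while len(features) > 2:                                       # keep the 2 "best" features
    scores = {f: hsic(rbf_kernel(X[:, [g for g in features if g != f]]), L)
              for f in features}
    features.remove(max(scores, key=scores.get))               # removal hurts dependence least
print("selected features:", sorted(features))
```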

360 citations


Proceedings ArticleDOI
01 Apr 2012
TL;DR: A novel subspace search method is proposed that selects high contrast subspaces for density-based outlier ranking, together with a first measure for the contrast of subspaces, to enhance the quality of traditional outlier rankings.
Abstract: Outlier mining is a major task in data analysis. Outliers are objects that highly deviate from regular objects in their local neighborhood. Density-based outlier ranking methods score each object based on its degree of deviation. In many applications, these ranking methods degenerate to random listings due to low contrast between outliers and regular objects. Outliers do not show up in the scattered full space, they are hidden in multiple high contrast subspace projections of the data. Measuring the contrast of such subspaces for outlier rankings is an open research challenge. In this work, we propose a novel subspace search method that selects high contrast subspaces for density-based outlier ranking. It is designed as pre-processing step to outlier ranking algorithms. It searches for high contrast subspaces with a significant amount of conditional dependence among the subspace dimensions. With our approach, we propose a first measure for the contrast of subspaces. Thus, we enhance the quality of traditional outlier rankings by computing outlier scores in high contrast projections only. The evaluation on real and synthetic data shows that our approach outperforms traditional dimensionality reduction techniques, naive random projections as well as state-of-the-art subspace search techniques and provides enhanced quality for outlier ranking.
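The contrast measure itself is not spelled out in the abstract; the following is a simplified Monte-Carlo sketch of the general idea (compare an attribute's marginal distribution with its distribution inside random slices of the other attributes, using a two-sample test), with SciPy's Kolmogorov-Smirnov test standing in for whatever statistic the paper actually uses.

```python
# Simplified sketch of the subspace-contrast idea (not the paper's exact
# estimator): a large average deviation between an attribute's marginal
# distribution and its distribution inside random slices of the other
# attributes indicates high contrast (strong dependence) in that subspace.
import numpy as np
from scipy.stats import ks_2samp

def subspace_contrast(X, dims, n_draws=50, slice_frac=0.3, rng=None):
    rng = rng or np.random.default_rng(0)
    devs = []
    for _ in range(n_draws):
        target = rng.choice(dims)
        others = [d for d in dims if d != target]
        mask = np.ones(len(X), dtype=bool)
        for d in others:                                   # random hyper-rectangle slice
            lo = rng.uniform(0, 1 - slice_frac)
            q_lo, q_hi = np.quantile(X[:, d], [lo, lo + slice_frac])
            mask &= (X[:, d] >= q_lo) & (X[:, d] <= q_hi)
        if mask.sum() > 10:
            devs.append(ks_2samp(X[:, target], X[mask, target]).statistic)
    return float(np.mean(devs)) if devs else 0.0

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=2000)            # dims 0 and 1 are dependent
print(subspace_contrast(X, [0, 1]), ">", subspace_contrast(X, [2, 3]))
```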

353 citations


Journal ArticleDOI
TL;DR: In this paper, a convex program based on regularized maximum-likelihood was proposed for model selection in the latent-variable Gaussian graphical model setting, with the conditional statistics of the observed variables conditioned on the latent variables being specified by a graphical model.
Abstract: Suppose we observe samples of a subset of a collection of random variables. No additional information is provided about the number of latent variables, nor of the relationship between the latent and observed variables. Is it possible to discover the number of latent components, and to learn a statistical model over the entire collection of variables? We address this question in the setting in which the latent and observed variables are jointly Gaussian, with the conditional statistics of the observed variables conditioned on the latent variables being specified by a graphical model. As a first step we give natural conditions under which such latent-variable Gaussian graphical models are identifiable given marginal statistics of only the observed variables. Essentially these conditions require that the conditional graphical model among the observed variables is sparse, while the effect of the latent variables is “spread out” over most of the observed variables. Next we propose a tractable convex program based on regularized maximum-likelihood for model selection in this latent-variable setting; the regularizer uses both the $\ell_{1}$ norm and the nuclear norm. Our modeling framework can be viewed as a combination of dimensionality reduction (to identify latent variables) and graphical modeling (to capture remaining statistical structure not attributable to the latent variables), and it consistently estimates both the number of latent components and the conditional graphical model structure among the observed variables. These results are applicable in the high-dimensional setting in which the number of latent/observed variables grows with the number of samples of the observed variables. The geometric properties of the algebraic varieties of sparse matrices and of low-rank matrices play an important role in our analysis.
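In standard notation (with \(\hat{\Sigma}_O\) the sample covariance of the observed variables), the regularized maximum-likelihood program described above is usually written roughly as follows; treat this as a paraphrase of the abstract rather than a quotation of the paper's exact formulation.

```latex
\min_{S,\,L}\;\; \operatorname{tr}\!\big((S-L)\,\hat{\Sigma}_O\big) \;-\; \log\det(S-L)
\;+\; \lambda_n\big(\gamma\,\|S\|_{1} + \operatorname{tr}(L)\big)
\qquad \text{s.t.}\quad S-L \succ 0,\;\; L \succeq 0,
```

where S estimates the sparse conditional concentration matrix among the observed variables and L the low-rank effect of the latent variables; the trace of L serves as the convex surrogate for rank, just as the l1 norm does for sparsity.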

338 citations


Journal ArticleDOI
TL;DR: Theoretically, it is shown that constrained L0 likelihood and its computational surrogate are optimal in that they achieve feature selection consistency and sharp parameter estimation, under one necessary condition required for any method to be selection consistent and to achieve sharp parameter estimation.
Abstract: In high-dimensional data analysis, feature selection becomes one effective means for dimension reduction, which proceeds with parameter estimation. Concerning accuracy of selection and estimation, we study nonconvex constrained and regularized likelihoods in the presence of nuisance parameters. Theoretically, we show that constrained L0 likelihood and its computational surrogate are optimal in that they achieve feature selection consistency and sharp parameter estimation, under one necessary condition required for any method to be selection consistent and to achieve sharp parameter estimation. It permits up to exponentially many candidate features. Computationally, we develop difference convex methods to implement the computational surrogate through prime and dual subproblems. These results establish a central role of L0 constrained and regularized likelihoods in feature selection and parameter estimation involving selection. As applications of the general method and theory, we perform feature selection...

282 citations


Book
01 Feb 2012
TL;DR: This book offers a comprehensive introduction to various density ratio estimators, including methods via density estimation, moment matching, probabilistic classification, density fitting, and density ratio fitting, and describes how these can be applied to machine learning.
Abstract: Machine learning is an interdisciplinary field of science and engineering that studies mathematical theories and practical applications of systems that learn. This book introduces theories, methods, and applications of density ratio estimation, which is a newly emerging paradigm in the machine learning community. Various machine learning problems such as non-stationarity adaptation, outlier detection, dimensionality reduction, independent component analysis, clustering, classification, and conditional density estimation can be systematically solved via the estimation of probability density ratios. The authors offer a comprehensive introduction of various density ratio estimators including methods via density estimation, moment matching, probabilistic classification, density fitting, and density ratio fitting as well as describing how these can be applied to machine learning. The book also provides mathematical theories for density ratio estimation including parametric and non-parametric convergence analysis and numerical stability analysis to complete the first and definitive treatment of the entire framework of density ratio estimation in machine learning.
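As a small taste of one estimator family the book covers, the sketch below estimates a density ratio via probabilistic classification: a logistic regression separates numerator from denominator samples, and Bayes' rule converts its posterior into a ratio estimate. The data and model choices are illustrative.

```python
# Density-ratio estimation via probabilistic classification: train a classifier
# to separate numerator samples (y=1) from denominator samples (y=0); then
# r(x) = p_nu(x)/p_de(x) ≈ (n_de/n_nu) * P(y=1|x)/P(y=0|x).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_nu = rng.normal(loc=0.5, size=(1000, 1))     # samples from the numerator density
x_de = rng.normal(loc=0.0, size=(1500, 1))     # samples from the denominator density

X = np.vstack([x_nu, x_de])
y = np.concatenate([np.ones(len(x_nu)), np.zeros(len(x_de))])
clf = LogisticRegression().fit(X, y)

def density_ratio(x):
    p = clf.predict_proba(x)[:, 1]
    return (len(x_de) / len(x_nu)) * p / (1.0 - p + 1e-12)

grid = np.linspace(-3, 3, 7).reshape(-1, 1)
true_ratio = np.exp(0.5 * grid - 0.125).ravel()     # analytic ratio of the two Gaussians
print(np.round(density_ratio(grid), 2), np.round(true_ratio, 2))
```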

280 citations


Journal ArticleDOI
TL;DR: The authors proposed a two-stage refitted procedure via a data splitting technique, called refitted cross-validation, to attenuate the influence of irrelevant variables with high spurious correlations.
Abstract: Variance estimation is a fundamental problem in statistical modelling. In ultrahigh dimensional linear regression where the dimensionality is much larger than the sample size, traditional variance estimation techniques are not applicable. Recent advances in variable selection in ultrahigh dimensional linear regression make this problem accessible. One of the major problems in ultrahigh dimensional regression is the high spurious correlation between the unobserved realized noise and some of the predictors. As a result, the realized noises are actually predicted when extra irrelevant variables are selected, leading to a serious underestimate of the noise level. We propose a two-stage refitted procedure via a data splitting technique, called refitted cross-validation, to attenuate the influence of irrelevant variables with high spurious correlations. Our asymptotic results show that the resulting procedure performs as well as the oracle estimator, which knows in advance the mean regression function. The simulation studies lend further support to our theoretical claims. The naive two-stage estimator and the plug-in one-stage estimators using the lasso and smoothly clipped absolute deviation are also studied and compared. Their performance can be improved by the proposed refitted cross-validation method.
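A sketch of the refitted cross-validation recipe as described in the abstract, with scikit-learn's LassoCV standing in for the variable-selection stage; details such as the selector and the degrees-of-freedom correction are assumptions made for illustration.

```python
# Two-stage refitted cross-validation sketch (not the authors' code): select
# variables on one half of the data, refit by least squares on the other half
# to estimate the noise variance, then swap the halves and average.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p, sigma = 200, 1000, 2.0
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:5] = 3.0
y = X @ beta + sigma * rng.normal(size=n)

def refit_variance(X_sel, y_sel, X_fit, y_fit):
    support = np.flatnonzero(LassoCV(cv=5, random_state=0).fit(X_sel, y_sel).coef_)
    if support.size == 0:
        return y_fit.var()
    ols = LinearRegression().fit(X_fit[:, support], y_fit)
    resid = y_fit - ols.predict(X_fit[:, support])
    return resid @ resid / (len(y_fit) - support.size - 1)    # minus 1 for the intercept

X1, X2, y1, y2 = train_test_split(X, y, test_size=0.5, random_state=0)
sigma2_hat = 0.5 * (refit_variance(X1, y1, X2, y2) + refit_variance(X2, y2, X1, y1))
print("estimated sigma^2:", round(sigma2_hat, 2), " true:", sigma ** 2)
```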

Journal ArticleDOI
TL;DR: This study proposes a novel filter based probabilistic feature selection method, namely distinguishing feature selector (DFS), for text classification that is compared with well-known filter approaches including chi square, information gain, Gini index and deviation from Poisson distribution.
Abstract: High dimensionality of the feature space is one of the most important concerns in text classification problems due to processing time and accuracy considerations. Selection of distinctive features is therefore essential for text classification. This study proposes a novel filter based probabilistic feature selection method, namely distinguishing feature selector (DFS), for text classification. The proposed method is compared with well-known filter approaches including chi square, information gain, Gini index and deviation from Poisson distribution. The comparison is carried out for different datasets, classification algorithms, and success measures. Experimental results explicitly indicate that DFS offers a competitive performance with respect to the abovementioned approaches in terms of classification accuracy, dimension reduction rate and processing time.

Journal ArticleDOI
TL;DR: In this paper, algorithms are developed that address the adaptive selection of local neighborhood sizes when imposing a connectivity structure on the given set of high-dimensional data points, and the adaptive bias reduction in the local low-dimensional embedding by accounting for variations in the curvature of the manifold as well as its interplay with the sampling density of the data set.
Abstract: Manifold learning algorithms seek to find a low-dimensional parameterization of high-dimensional data. They heavily rely on the notion of what can be considered as local, how accurately the manifold can be approximated locally, and, last but not least, how the local structures can be patched together to produce the global parameterization. In this paper, we develop algorithms that address two key issues in manifold learning: 1) the adaptive selection of the local neighborhood sizes when imposing a connectivity structure on the given set of high-dimensional data points and 2) the adaptive bias reduction in the local low-dimensional embedding by accounting for the variations in the curvature of the manifold as well as its interplay with the sampling density of the data set. We demonstrate the effectiveness of our methods for improving the performance of manifold learning algorithms using both synthetic and real-world data sets.

Journal ArticleDOI
TL;DR: A novel contour-based shape descriptor is proposed, called the multiscale distance matrix, to capture the shape geometry while being invariant to translation, rotation, scaling, and bilateral symmetry, and is therefore fast and suitable for real-time applications.
Abstract: In this brief, we propose a novel contour-based shape descriptor, called the multiscale distance matrix, to capture the shape geometry while being invariant to translation, rotation, scaling, and bilateral symmetry. The descriptor is further combined with a dimensionality reduction to improve its discriminative power. The proposed method avoids the time-consuming pointwise matching encountered in most of the previously used shape recognition algorithms. It is therefore fast and suitable for real-time applications. We applied the proposed method to the task of plant leaf recognition with experiments on two data sets, the Swedish Leaf data set and the ICL Leaf data set. The experimental results clearly demonstrate the effectiveness and efficiency of the proposed descriptor.

Journal ArticleDOI
TL;DR: An additional technique makes the feature descriptor robust to rotation, and the efficiency of the algorithm is validated: it is about 30 times faster than those based on Gabor filters.
Abstract: A good feature descriptor is desired to be discriminative, robust, and computationally inexpensive in terms of both time and storage requirements. In the domain of face recognition, these properties allow the system to quickly deliver high recognition rates to the end user. Motivated by the recent feature descriptor called Patterns of Oriented Edge Magnitudes (POEM), which balances the three concerns, this paper aims at enhancing its performance with respect to all these criteria. To this end, we first optimize the parameters of POEM and then apply the whitened principal-component-analysis dimensionality reduction technique to get a more compact, robust, and discriminative descriptor. For face recognition, the efficiency of our algorithm is demonstrated by strong results obtained on both constrained (Face Recognition Technology, FERET) and unconstrained (Labeled Faces in the Wild, LFW) data sets, together with its low complexity. Impressively, our algorithm is about 30 times faster than those based on Gabor filters. Furthermore, by proposing an additional technique that makes our descriptor robust to rotation, we validate its efficiency for the task of image matching.

Journal ArticleDOI
TL;DR: It is demonstrated that the AA model is relevant for feature extraction and dimensionality reduction for a large variety of machine learning problems taken from computer vision, neuroimaging, chemistry, text mining and collaborative filtering, leading to highly interpretable representations of the dynamics in the data.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed IGO-methods significantly outperform popular methods such as Gabor features and Local Binary Patterns and achieve state-of-the-art performance for difficult problems such as illumination and occlusion-robust face recognition.
Abstract: We introduce the notion of subspace learning from image gradient orientations for appearance-based object recognition. As image data are typically noisy and noise is substantially different from Gaussian, traditional subspace learning from pixel intensities very often fails to estimate reliably the low-dimensional subspace of a given data population. We show that replacing pixel intensities with gradient orientations and the l2 norm with a cosine-based distance measure offers, to some extent, a remedy to this problem. Within this framework, which we coin Image Gradient Orientations (IGO) subspace learning, we first formulate and study the properties of Principal Component Analysis of image gradient orientations (IGO-PCA). We then show its connection to previously proposed robust PCA techniques both theoretically and experimentally. Finally, we derive a number of other popular subspace learning techniques, namely, Linear Discriminant Analysis (LDA), Locally Linear Embedding (LLE), and Laplacian Eigenmaps (LE). Experimental results show that our algorithms significantly outperform popular methods such as Gabor features and Local Binary Patterns and achieve state-of-the-art performance for difficult problems such as illumination and occlusion-robust face recognition. In addition to this, the proposed IGO-methods require the eigendecomposition of simple covariance matrices and are as computationally efficient as their corresponding l2 norm intensity-based counterparts. Matlab code for the methods presented in this paper can be found at http://ibug.doc.ic.ac.uk/resources.
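One way to read the IGO construction in code: map gradient orientations to unit complex numbers and take principal components of the resulting complex vectors. This is an interpretation for illustration only (the authors provide their own Matlab code at the URL above), assuming images arrive as a NumPy array.

```python
# Rough sketch of the IGO idea (an interpretation, not the authors' released
# code): replace pixel intensities by gradient orientations, embed each
# orientation as e^{j*phi}, and compute principal components of the complex vectors.
import numpy as np

def igo_features(images):
    """images: array of shape (n, H, W) -> complex matrix of shape (n, H*W)."""
    gy, gx = np.gradient(images.astype(float), axis=(1, 2))
    phi = np.arctan2(gy, gx)
    return np.exp(1j * phi).reshape(len(images), -1)

def igo_pca(images, n_components):
    Z = igo_features(images)
    mean = Z.mean(axis=0, keepdims=True)
    # SVD of the centered complex data matrix; rows of Vh span the principal subspace.
    _, _, Vh = np.linalg.svd(Z - mean, full_matrices=False)
    return mean, Vh[:n_components].conj().T          # (H*W, n_components) complex basis

rng = np.random.default_rng(0)
faces = rng.random(size=(40, 32, 32))                # stand-in for face images
mean, basis = igo_pca(faces, n_components=10)
embedding = (igo_features(faces) - mean) @ basis     # low-dimensional complex embedding
print(embedding.shape)
```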

Journal ArticleDOI
TL;DR: This work uses l1,∞ regularization to select the dictionary from the data and shows how to relax the restriction-to-X constraint by initializing an alternating minimization approach with the solution of the convex model, obtaining a dictionary close to but not necessarily in X.
Abstract: A collaborative convex framework for factoring a data matrix X into a nonnegative product AS, with a sparse coefficient matrix S, is proposed. We restrict the columns of the dictionary matrix A to coincide with certain columns of the data matrix X, thereby guaranteeing a physically meaningful dictionary and dimensionality reduction. We use l1,∞ regularization to select the dictionary from the data and show that this leads to an exact convex relaxation of l0 in the case of distinct noise-free data. We also show how to relax the restriction-to-X constraint by initializing an alternating minimization approach with the solution of the convex model, obtaining a dictionary close to but not necessarily in X. We focus on applications of the proposed framework to hyperspectral endmember and abundance identification and also show an application to blind source separation of nuclear magnetic resonance data.

Journal ArticleDOI
TL;DR: The LS-WKRRR formulation of CA methods has several benefits: it provides a clean connection between many CA techniques and an intuitive framework to understand normalization factors, overcomes the small sample size problem, and provides a framework to easily extend CA methods.
Abstract: Over the last century, Component Analysis (CA) methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Canonical Correlation Analysis (CCA), Locality Preserving Projections (LPP), and Spectral Clustering (SC) have been extensively used as a feature extraction step for modeling, classification, visualization, and clustering. CA techniques are appealing because many can be formulated as eigen-problems, offering great potential for learning linear and nonlinear representations of data in closed-form. However, the eigen-formulation often conceals important analytic and computational drawbacks of CA techniques, such as solving generalized eigen-problems with rank deficient matrices (e.g., small sample size problem), lacking intuitive interpretation of normalization factors, and understanding commonalities and differences between CA methods. This paper proposes a unified least-squares framework to formulate many CA methods. We show how PCA, LDA, CCA, LPP, SC, and their kernel and regularized extensions correspond to a particular instance of least-squares weighted kernel reduced rank regression (LS-WKRRR). The LS-WKRRR formulation of CA methods has several benefits: 1) it provides a clean connection between many CA techniques and an intuitive framework to understand normalization factors; 2) it yields efficient numerical schemes to solve CA techniques; 3) it overcomes the small sample size problem; 4) it provides a framework to easily extend CA methods. We derive weighted generalizations of PCA, LDA, SC, and CCA, and several new CA techniques.

Journal ArticleDOI
TL;DR: This work presents a novel depth video-based translation and scaling invariant human activity recognition (HAR) system utilizing R transformation of depth silhouettes, and demonstrates that the proposed method is robust, reliable, and efficient in recognizing the daily human activities.
Abstract: Video-based human activity recognition systems have potential contributions to various applications such as smart homes and healthcare services. In this work, we present a novel depth video-based translation and scaling invariant human activity recognition (HAR) system utilizing R transformation of depth silhouettes. To perform HAR in indoor settings, an invariant HAR method is critical to freely perform activities anywhere in a camera view without translation and scaling problems of human body silhouettes. We obtain such invariant features via R transformation on depth silhouettes. Furthermore, in R transforming depth silhouettes, shape information of human body reflected in depth values is encoded into the features. In R transformation, 2D feature maps are computed first through Radon transform of each depth silhouette, followed by computing a 1D feature profile through R transform to get the translation and scaling invariant features. Then, we apply Principal Component Analysis (PCA) for dimension reduction and Linear Discriminant Analysis (LDA) to make the features more prominent, compact and robust. Finally, Hidden Markov Models (HMMs) are used to train and recognize different human activities. Our proposed system shows superior recognition rate over the conventional approaches, reaching a mean recognition rate of 93.16% for six typical human activities, whereas the conventional PC- and IC-based depth silhouettes achieved only 74.83% and 86.33% respectively, and the binary silhouette-based R transformation approach achieved 67.08%. Our experimental results show that the proposed method is robust, reliable, and efficient in recognizing the daily human activities.
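The R-transform feature step can be sketched as follows, assuming the depth silhouette is available as a 2-D NumPy array and scikit-image's Radon transform is installed; the subsequent PCA/LDA/HMM stages are omitted.

```python
# R-transform feature sketch: Radon transform per angle, sum of squared
# projections over the radial axis, then normalization for scale invariance.
import numpy as np
from skimage.transform import radon

def r_transform(silhouette, n_angles=180):
    theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    sinogram = radon(silhouette.astype(float), theta=theta)    # shape (n_rho, n_angles)
    profile = (sinogram ** 2).sum(axis=0)                      # R(theta) = sum_rho Rf(rho, theta)^2
    return profile / (profile.max() + 1e-12)                   # scaling invariance

# Toy silhouette: a filled rectangle; translating it leaves the profile unchanged.
sil = np.zeros((128, 128)); sil[40:90, 50:80] = 1.0
feat = r_transform(sil)
print(feat.shape)                                              # 1-D, translation/scale invariant
```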

Journal ArticleDOI
TL;DR: The mathematical basis of the classification algorithms used for decoding fMRI signals, such as support vector machines (SVMs), is described, along with the workflow of processing steps required for MVPA, such as feature selection, dimensionality reduction, cross-validation, and classifier performance estimation based on receiver operating characteristic (ROC) curves.
Abstract: Functional magnetic resonance imaging (fMRI) exploits blood-oxygen-level-dependent (BOLD) contrasts to map neural activity associated with a variety of brain functions including sensory processing, motor control, and cognitive and emotional functions. The general linear model (GLM) approach is used to reveal task-related brain areas by searching for linear correlations between the fMRI time course and a reference model. One of the limitations of the GLM approach is the assumption that the covariance across neighbouring voxels is not informative about the cognitive function under examination. Multivoxel pattern analysis (MVPA) represents a promising technique that is currently exploited to investigate the information contained in distributed patterns of neural activity to infer the functional role of brain areas and networks. MVPA is considered as a supervised classification problem where a classifier attempts to capture the relationships between spatial pattern of fMRI activity and experimental conditions. In this paper, we review MVPA and describe the mathematical basis of the classification algorithms used for decoding fMRI signals, such as support vector machines (SVMs). In addition, we describe the workflow of processing steps required for MVPA such as feature selection, dimensionality reduction, cross-validation, and classifier performance estimation based on receiver operating characteristic (ROC) curves.
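The workflow in the last sentence maps naturally onto a scikit-learn pipeline; the sketch below uses synthetic stand-ins for voxel patterns and binary condition labels, and the specific components (univariate F-test selection, PCA, linear SVM) are illustrative choices rather than the review's prescription.

```python
# Minimal MVPA-style decoding pipeline: feature selection, dimensionality
# reduction, a linear SVM, cross-validation, and ROC-based performance estimation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score, StratifiedKFold

X, y = make_classification(n_samples=120, n_features=5000, n_informative=40,
                           random_state=0)                      # "voxels" and conditions

pipeline = Pipeline([
    ("select", SelectKBest(f_classif, k=500)),   # univariate feature selection
    ("reduce", PCA(n_components=20)),            # dimensionality reduction
    ("clf", SVC(kernel="linear")),               # decoder
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
print("cross-validated ROC AUC: %.2f +/- %.2f" % (auc.mean(), auc.std()))
```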

Journal ArticleDOI
TL;DR: The generalized discriminant analysis (GerDA) proposed in this paper uses nonlinear transformations that are learnt by DNNs in a semisupervised fashion and displays excellent performance on real-world recognition and detection tasks, such as handwritten digit recognition and face detection.
Abstract: We present an approach to feature extraction that is a generalization of the classical linear discriminant analysis (LDA) on the basis of deep neural networks (DNNs). As for LDA, discriminative features generated from independent Gaussian class conditionals are assumed. This modeling has the advantages that the intrinsic dimensionality of the feature space is bounded by the number of classes and that the optimal discriminant function is linear. Unfortunately, linear transformations are insufficient to extract optimal discriminative features from arbitrarily distributed raw measurements. The generalized discriminant analysis (GerDA) proposed in this paper uses nonlinear transformations that are learnt by DNNs in a semisupervised fashion. We show that the feature extraction based on our approach displays excellent performance on real-world recognition and detection tasks, such as handwritten digit recognition and face detection. In a series of experiments, we evaluate GerDA features with respect to dimensionality reduction, visualization, classification, and detection. Moreover, we show that GerDA DNNs can preprocess truly high-dimensional input data to low-dimensional representations that facilitate accurate predictions even if simple linear predictors or measures of similarity are used.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the bands selected by the proposed approach on the whole data (containing noise bands) could achieve higher overall classification accuracies than those selected by other state-of-the-art feature selection techniques on the manual-band-removal (MBR) data, even better than the bands identified by the proposed approach on the MBR data.
Abstract: The rich information available in hyperspectral imagery has provided significant opportunities for material classification and identification. Due to the problem of the "curse of dimensionality" (called the Hughes phenomenon) posed by the high number of spectral channels along with small amounts of labeled training samples, dimensionality reduction is a necessary preprocessing step for hyperspectral data. Generally, in order to improve the classification accuracy, noise bands generated by various sources (primarily the sensor and the atmosphere) are often manually removed in advance. However, the removal of these bands may discard some important discriminative information, eventually degrading the classification accuracy. In this paper, we propose a new strategy to automatically select bands without manual band removal. Firstly, wavelet shrinkage is applied to denoise the spatial images of the whole data cube. Then affinity propagation, which is a recently proposed feature selection approach, is used to choose representative bands from the noise-reduced data. Experimental results on three real hyperspectral data sets collected by two different sensors demonstrate that the bands selected by our approach on the whole data (containing noise bands) could achieve higher overall classification accuracies than those selected by other state-of-the-art feature selection techniques on the manual-band-removal (MBR) data, and even better than the bands identified by our approach on the MBR data, indicating that the removed "noise" bands are valuable for hyperspectral classification and should not be eliminated.
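The band-selection stage can be sketched as below (the wavelet-shrinkage denoising step is omitted), assuming the hyperspectral cube is a (rows, cols, bands) NumPy array; using band-to-band correlation as the similarity for affinity propagation is an assumption made here for illustration.

```python
# Band selection sketch: affinity propagation on a band-to-band similarity
# matrix; the exemplars returned by the clustering are the selected bands.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
pixels = rng.random(size=(2500, 5))                      # 5 underlying "materials"
mixing = rng.random(size=(5, 100))                       # 100 spectral bands
cube = (pixels @ mixing + 0.05 * rng.normal(size=(2500, 100))).reshape(50, 50, 100)

bands = cube.reshape(-1, cube.shape[-1]).T               # (n_bands, n_pixels)
similarity = np.corrcoef(bands)                          # band-to-band correlation

ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(similarity)
selected = ap.cluster_centers_indices_                   # exemplar (representative) bands
print("selected bands:", selected)
```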

Journal ArticleDOI
TL;DR: Analysis of principal nested spheres provides an intuitive and flexible decomposition of the high-dimensional sphere, and an interesting special case of the analysis results in finding principal geodesics, similar to those from previous approaches to manifold principal component analysis.
Abstract: A general framework for a novel non-geodesic decomposition of high-dimensional spheres or high-dimensional shape spaces for planar landmarks is discussed. The decomposition, principal nested spheres, leads to a sequence of submanifolds with decreasing intrinsic dimensions, which can be interpreted as an analogue of principal component analysis. In a number of real datasets, an apparent one-dimensional mode of variation curving through more than one geodesic component is captured in the one-dimensional component of principal nested spheres. While analysis of principal nested spheres provides an intuitive and flexible decomposition of the high-dimensional sphere, an interesting special case of the analysis results in finding principal geodesics, similar to those from previous approaches to manifold principal component analysis. An adaptation of our method to Kendall’s shape space is discussed, and a computational algorithm for fitting principal nested spheres is proposed. The result provides a coordinate system to visualize the data structure and an intuitive summary of principal modes of variation, as exemplified by several datasets.

Book
01 Jul 2012
TL;DR: The book stresses the recently developed nonlinear methods and introduces the applications of dimensionality reduction in many areas, such as face recognition, image segmentation, data classification, data visualization, and hyperspectral imagery data analysis.
Abstract: "Geometric Structure of High-Dimensional Data and Dimensionality Reduction" adopts data geometry as a framework to address various methods of dimensionality reduction. In addition to the introduction to well-known linear methods, the book moreover stresses the recently developed nonlinear methods and introduces the applications of dimensionality reduction in many areas, such as face recognition, image segmentation, data classification, data visualization, and hyperspectral imagery data analysis. Numerous tables and graphs are included to illustrate the ideas, effects, and shortcomings of the methods. MATLAB code of all dimensionality reduction algorithms is provided to aid the readers with the implementations on computers. The book will be useful for mathematicians, statisticians, computer scientists, and data analysts. It is also a valuable handbook for other practitioners who have a basic background in mathematics, statistics and/or computer algorithms, like internet search engine designers, physicists, geologists, electronic engineers, and economists. Jianzhong Wang is a Professor of Mathematics at Sam Houston State University, U.S.A.

Book ChapterDOI
12 Sep 2012
TL;DR: This chapter presents similarity measures and dimensionality reduction techniques for time series data mining.
Abstract: Similarity Measures and Dimensionality Reduction Techniques for Time Series Data Mining (© 2012 Cassisi et al., licensee InTech; open-access chapter distributed under the Creative Commons Attribution License).

Journal ArticleDOI
TL;DR: A method that combines signals from many brain regions observed in functional Magnetic Resonance Imaging to predict the subject's behavior during a scanning session yields higher prediction accuracy than standard voxel-based approaches and infers an explicit weighting of the regions involved in the regression or classification task.

Journal ArticleDOI
Jianbo Yu
TL;DR: The experimental results demonstrate that the proposed LGPCA-based monitoring method effectively captures meaningful information hidden in the observations and shows superior process monitoring performance compared to those regular monitoring methods.

Journal ArticleDOI
TL;DR: The proposed semi-supervised feature analysis framework learns a classifier for different applications by selecting the discriminating features closely related to the semantic concepts; an efficient iterative algorithm with fast convergence makes it applicable to practical applications.
Abstract: In this paper, we propose a novel semi-supervised feature analysis framework for multimedia data understanding and apply it to three different applications: image annotation, video concept detection and 3-D motion data analysis. Our method is built upon two advancements of the state of the art: (1) l2,1-norm regularized feature selection, which can jointly select the most relevant features from all the data points. This feature selection approach was shown to be robust and efficient in the literature as it considers the correlation between different features jointly when conducting feature selection; (2) manifold learning, which analyzes the feature space by exploiting both labeled and unlabeled data. It is a widely used technique to extend many algorithms to semi-supervised scenarios for its capability of leveraging the manifold structure of multimedia data. The proposed method is able to learn a classifier for different applications by selecting the discriminating features closely related to the semantic concepts. The objective function of our method is non-smooth and difficult to solve, so we design an efficient iterative algorithm with fast convergence, thus making it applicable to practical applications. Extensive experiments on image annotation, video concept detection and 3-D motion data analysis are performed on different real-world data sets to demonstrate the effectiveness of our algorithm.
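The l2,1-regularized selection component in point (1) is commonly written as a least-squares problem of the following form (notation assumed here rather than quoted from the paper):

```latex
\min_{W}\;\; \|X^{\top} W - Y\|_{F}^{2} \;+\; \lambda \|W\|_{2,1},
\qquad \|W\|_{2,1} \;=\; \sum_{i=1}^{d} \Big(\sum_{j} W_{ij}^{2}\Big)^{1/2},
```

where X is the d x n data matrix and Y the n x c label (or predicted-label) matrix; the row-wise l2,1 norm drives entire rows of W to zero, so features are selected jointly across all data points and classes.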

Proceedings ArticleDOI
25 Mar 2012
TL;DR: In this article, the authors used variable-length units to represent acoustic events at the supra-frame level, in order to benefit from finer temporal alignments when deriving the acoustic prototypes.
Abstract: In recent work, we introduced Latent Perceptual Mapping (LPM) [1], a new framework for acoustic modeling suitable for template-like speech recognition. The basic idea is to leverage a reduced dimensionality description of the observations to derive acoustic prototypes that are closely aligned with perceived acoustic events. Our initial work adopted a bag-of-frames strategy to represent relevant acoustic information within speech segments. In this paper, we extend this approach by better integrating temporal information into the LPM feature extraction. Specifically, we use variable-length units to represent acoustic events at the supra-frame level, in order to benefit from finer temporal alignments when deriving the acoustic prototypes. The outcome can be viewed as a generalization of both conventional template-based approaches and recently proposed sparse representation solutions. This extension is experimentally validated on a context-independent phoneme classification task using the TIMIT corpus.