
Showing papers on "Dimensionality reduction published in 2013"


Proceedings ArticleDOI
01 Dec 2013
TL;DR: JDA aims to jointly adapt both the marginal and conditional distributions in a principled dimensionality reduction procedure, constructing a new feature representation that is effective and robust under substantial distribution differences.
Abstract: Transfer learning is established as an effective technology in computer vision for leveraging rich labeled data in the source domain to build an accurate classifier for the target domain. However, most prior methods have not simultaneously reduced the difference in both the marginal distribution and conditional distribution between domains. In this paper, we put forward a novel transfer learning approach, referred to as Joint Distribution Adaptation (JDA). Specifically, JDA aims to jointly adapt both the marginal distribution and conditional distribution in a principled dimensionality reduction procedure, and constructs a new feature representation that is effective and robust under substantial distribution differences. Extensive experiments verify that JDA can significantly outperform several state-of-the-art methods on four types of cross-domain image classification problems.
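For intuition, here is a minimal numpy sketch of the marginal-distribution half of a JDA-style adaptation: minimize the maximum mean discrepancy (MMD) between the projected domains via a generalized eigenproblem. The function name and the regularizer `lam` are illustrative assumptions, not the authors' reference implementation, and the class-conditional MMD terms built from pseudo-labels are omitted for brevity.

```python
import numpy as np
from scipy.linalg import eigh

def jda_projection(Xs, Xt, k=20, lam=1.0):
    """Align source Xs and target Xt (rows = samples) in a k-dim subspace."""
    X = np.vstack([Xs, Xt]).T              # d x n data matrix, columns are samples
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    # MMD vector: matches the empirical means of the two domains
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    M = np.outer(e, e)                     # marginal MMD matrix
    H = np.eye(n) - np.ones((n, n)) / n    # centering matrix (variance constraint)
    A = X @ M @ X.T + lam * np.eye(X.shape[0])
    B = X @ H @ X.T + 1e-6 * np.eye(X.shape[0])
    # small generalized eigenvalues = directions with small domain discrepancy
    vals, vecs = eigh(A, B)
    W = vecs[:, :k]                        # d x k projection
    return Xs @ W, Xt @ W
```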

1,542 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: A novel approach to the pedestrian re-identification problem that uses metric learning to improve on state-of-the-art performance on standard public datasets; the method is an effective way to process observations comprising multiple shots, and it is non-iterative, so computation times are relatively modest.
Abstract: Metric learning methods, for person re-identification, estimate a scaling for distances in a vector space that is optimized for picking out observations of the same individual. This paper presents a novel approach to the pedestrian re-identification problem that uses metric learning to improve on state-of-the-art performance on standard public datasets. Very high dimensional features are extracted from the source color image. A first processing stage performs unsupervised PCA dimensionality reduction, constrained to maintain the redundancy in color-space representation. A second stage further reduces the dimensionality, using a Local Fisher Discriminant Analysis defined by a training set. A regularization step is introduced to avoid singular matrices during this stage. The experiments conducted on three publicly available datasets confirm that the proposed method outperforms the state of the art, including all other known metric learning methods. Furthermore, the method is an effective way to process observations comprising multiple shots, and it is non-iterative: the computation times are relatively modest. Finally, a novel statistic is derived to characterize the Match Characteristic: the normalized entropy reduction can be used to define the 'Proportion of Uncertainty Removed' (PUR). This measure is invariant to test set size and provides an intuitive indication of performance.
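As a rough sketch of the two-stage reduction described above, scikit-learn's `LinearDiscriminantAnalysis` with shrinkage can stand in for the paper's regularized Local Fisher Discriminant Analysis; the high-dimensional color features themselves are assumed given (random data here).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 1000))   # stand-in for high-dimensional color features
y = rng.integers(0, 20, size=400)  # person identities

Z = PCA(n_components=100).fit_transform(X)  # stage 1: unsupervised reduction
# stage 2: supervised reduction; shrinkage regularizes the scatter matrices,
# playing the role of the paper's regularization step
lda = LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto").fit(Z, y)
Z2 = lda.transform(Z)
# Euclidean distances in Z2 approximate the learned metric for ranking matches
```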

607 citations


Journal ArticleDOI
TL;DR: Five beat classes of arrhythmia, as recommended by the Association for the Advancement of Medical Instrumentation (AAMI), were analyzed, and dimensionality-reduced features were fed to Support Vector Machine, neural network, and probabilistic neural network (PNN) classifiers for automated diagnosis.

586 citations


Journal ArticleDOI
TL;DR: A new family of tensor regression models is proposed that efficiently exploits the special structure of tensor covariates, reducing the ultrahigh dimensionality of high-throughput imaging data to a manageable level for efficient estimation and prediction.
Abstract: Classical regression methods treat covariates as a vector and estimate a corresponding vector of regression coefficients. Modern applications in medical imaging generate covariates of more complex form, such as multidimensional arrays (tensors). Traditional statistical and computational methods are proving insufficient for analysis of these high-throughput data due to their ultrahigh dimensionality as well as complex structure. In this article, we propose a new family of tensor regression models that efficiently exploit the special structure of tensor covariates. Under this framework, ultrahigh dimensionality is reduced to a manageable level, resulting in efficient estimation and prediction. A fast and highly scalable estimation algorithm is proposed for maximum likelihood estimation, and its associated asymptotic properties are studied. Effectiveness of the new methods is demonstrated on both synthetic and real MRI imaging data. Supplementary materials for this article are available online.
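The following toy sketch conveys the core computational idea for the simplest case: a rank-1 matrix (2-D tensor) coefficient fit by alternating least squares. The paper's framework is far more general (higher ranks and orders, GLM links), and all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
p1, p2, n = 8, 8, 500
b1_true, b2_true = rng.normal(size=p1), rng.normal(size=p2)
M = rng.normal(size=(n, p1, p2))   # matrix-valued covariates (e.g., images)
y = np.einsum("nij,i,j->n", M, b1_true, b2_true) + 0.1 * rng.normal(size=n)

b1, b2 = np.ones(p1), np.ones(p2)
for _ in range(50):                            # alternating least squares
    Z1 = np.einsum("nij,j->ni", M, b2)         # design matrix for b1 given b2
    b1 = np.linalg.lstsq(Z1, y, rcond=None)[0]
    Z2 = np.einsum("nij,i->nj", M, b1)         # design matrix for b2 given b1
    b2 = np.linalg.lstsq(Z2, y, rcond=None)[0]
# outer(b1, b2) estimates p1*p2 coefficients with only p1 + p2 parameters
```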

425 citations


Journal ArticleDOI
TL;DR: This article provides a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature, split into three nonmutually exclusive classes consisting of best subset selection methods, projection techniques and regularization.
Abstract: Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions. As the practical implementation of ABC requires computations based on vectors of summary statistics, rather than full data sets, a central question is how to derive low-dimensional summary statistics from the observed data with minimal loss of information. In this article we provide a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature. The methods are split into three nonmutually exclusive classes consisting of best subset selection methods, projection techniques and regularization. In addition, we introduce two new methods of dimension reduction. The first is a best subset selection method based on Akaike and Bayesian information criteria, and the second uses ridge regression as a regularization procedure. We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets.
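As a hedged illustration of the regularization class, the snippet below ridge-regresses parameters on raw summaries from pilot simulations and uses the fitted linear predictor as a one-dimensional summary statistic; the data and names are purely illustrative, and the paper's procedure involves further choices (e.g., penalty selection) not shown here.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
theta = rng.normal(size=(5000, 1))     # pilot parameter draws
# 30 noisy candidate summaries, each weakly informative about theta
S = np.hstack([theta + 0.5 * rng.normal(size=(5000, 1)) for _ in range(30)])

proj = Ridge(alpha=10.0).fit(S, theta) # regularized projection of summaries
s_reduced = proj.predict(S)            # one low-dimensional summary per draw
# ABC would now compare s_reduced for observed vs. simulated data
```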

393 citations


Posted Content
Sanjoy Dasgupta
TL;DR: Theoretical results identifying random projection as a promising dimensionality reduction technique for learning mixtures of Gaussians are summarized and illustrated by a wide variety of experiments on synthetic and real data.
Abstract: Recent theoretical work has identified random projection as a promising dimensionality reduction technique for learning mixtures of Gaussians. Here we summarize these results and illustrate them by a wide variety of experiments on synthetic and real data.
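A quick scikit-learn illustration of the idea (synthetic data; all settings are arbitrary): project a high-dimensional Gaussian mixture to a low dimension at random, then fit the mixture there.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
centers = rng.normal(scale=5.0, size=(3, 500))  # 3 separated means in R^500
X = np.vstack([c + rng.normal(size=(200, 500)) for c in centers])

Z = GaussianRandomProjection(n_components=20, random_state=0).fit_transform(X)
gmm = GaussianMixture(n_components=3, random_state=0).fit(Z)
labels = gmm.predict(Z)   # the cluster structure survives the projection
```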

329 citations


Journal ArticleDOI
TL;DR: A new iterative thresholding approach is proposed for estimating principal subspaces in the setting where the leading eigenvectors are sparse; it recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings.
Abstract: Principal component analysis (PCA) is a classical dimension reduction method which projects data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. However, it behaves poorly when the number of features $p$ is comparable to, or even much larger than, the sample size $n$. In this paper, we propose a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse. Under a spiked covariance model, we find that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings. Simulated examples also demonstrate its competitive performance.
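A bare-bones sketch of the mechanics for a single sparse leading eigenvector, assuming a power step followed by hard thresholding; the paper's procedure handles whole subspaces with guarantees under the spiked model, which this toy version does not attempt.

```python
import numpy as np

def sparse_leading_eigvec(X, thresh=0.1, iters=100, seed=0):
    S = np.cov(X, rowvar=False)        # p x p sample covariance
    v = np.random.default_rng(seed).normal(size=S.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = S @ v                      # power iteration step
        v[np.abs(v) < thresh * np.abs(v).max()] = 0.0  # hard thresholding
        v /= np.linalg.norm(v)
    return v
```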

298 citations


Journal ArticleDOI
TL;DR: It is demonstrated in this paper that PCCA+ always delivers an optimal fuzzy clustering for nearly uncoupled, not necessarily reversible, Markov chains with transition states.
Abstract: Given a row-stochastic matrix describing pairwise similarities between data objects, spectral clustering makes use of the eigenvectors of this matrix to perform dimensionality reduction for clustering in fewer dimensions. One example from this class of algorithms is the Robust Perron Cluster Analysis (PCCA+), which delivers a fuzzy clustering. Originally developed for clustering the state space of Markov chains, the method became popular as a versatile tool for general data classification problems. The robustness of PCCA+, however, cannot be explained by previous perturbation results, because the matrices in typical applications do not comply with the two main requirements: reversibility and nearly decomposability. We therefore demonstrate in this paper that PCCA+ always delivers an optimal fuzzy clustering for nearly uncoupled, not necessarily reversible, Markov chains with transition states.

288 citations


Journal ArticleDOI
TL;DR: A tensor organization scheme is defined for representing a pixel's spectral-spatial feature, and tensor discriminative locality alignment (TDLA) is developed for removing redundant information for subsequent classification.
Abstract: In this paper, we propose a method for the dimensionality reduction (DR) of spectral-spatial features in hyperspectral images (HSIs), under the umbrella of multilinear algebra, i.e., the algebra of tensors. The proposed approach is a tensor extension of conventional supervised manifold-learning-based DR. In particular, we define a tensor organization scheme for representing a pixel's spectral-spatial feature and develop tensor discriminative locality alignment (TDLA) for removing redundant information for subsequent classification. The optimal solution of TDLA is obtained by alternately optimizing each mode of the input tensors. The methods are tested on three public real HSI data sets collected by the Hyperspectral Digital Imagery Collection Experiment, the Reflective Optics System Imaging Spectrometer, and the Airborne Visible/Infrared Imaging Spectrometer. The classification results show significant improvements in classification accuracy while using a small number of features.

283 citations


Journal ArticleDOI
TL;DR: A novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model is presented to predict protein-protein interactions using only protein sequence information.
Abstract: Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although large amounts of PPI data for different species have been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions using only protein sequence information. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequence information. Focusing on dimension reduction, an effective feature extraction method, PCA, was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machines removes the dependence of results on initial random weights and improves the prediction performance. When applied to the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at a precision of 87.59%. Extensive experiments were performed to compare our method with the state-of-the-art technique, the Support Vector Machine (SVM). Experimental results demonstrate that the proposed PCA-EELM outperforms the SVM method under 5-fold cross-validation. Moreover, PCA-EELM runs faster than the PCA-SVM based method. Consequently, the proposed approach can be considered a promising and powerful new tool for predicting PPIs, with excellent performance and less time.
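A compact sketch of the PCA-EELM pipeline on synthetic data: PCA for dimension reduction, then an ensemble of extreme learning machines (random hidden layer, least-squares readout) combined by majority vote. Hidden sizes, ensemble size, and all names are illustrative assumptions rather than the authors' settings.

```python
import numpy as np
from sklearn.decomposition import PCA

def elm_train(X, y, hidden=100, rng=None):
    W = rng.normal(size=(X.shape[1], hidden))  # random, untrained input weights
    H = np.tanh(X @ W)                         # hidden-layer activations
    T = np.eye(2)[y]                           # one-hot targets (binary case)
    beta = np.linalg.pinv(H) @ T               # least-squares output weights
    return W, beta

def elm_predict(X, W, beta):
    return np.argmax(np.tanh(X @ W) @ beta, axis=1)

rng = np.random.default_rng(4)
X = rng.normal(size=(600, 400))
y = rng.integers(0, 2, size=600)               # interacting / non-interacting
Z = PCA(n_components=50).fit_transform(X)      # discriminative reduced features
models = [elm_train(Z, y, rng=rng) for _ in range(15)]
votes = np.stack([elm_predict(Z, W, b) for W, b in models])
pred = (votes.mean(axis=0) > 0.5).astype(int)  # majority vote of the ensemble
```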

275 citations


Journal ArticleDOI
TL;DR: This work proposes multi-label feature selection methods that use the filter approach and employ ReliefF and Information Gain to measure the goodness of features.

Journal ArticleDOI
TL;DR: A new multi-task feature selection algorithm is proposed and applied to multimedia (e.g., video and image) analysis; it uses the common knowledge of multiple tasks as supplementary information to facilitate decision making.
Abstract: While much progress has been made on multi-task classification and subspace learning, multi-task feature selection has long been largely unaddressed. In this paper, we propose a new multi-task feature selection algorithm and apply it to multimedia (e.g., video and image) analysis. Instead of evaluating the importance of each feature individually, our algorithm selects features in a batch mode, by which the feature correlation is considered. While feature selection has received much research attention, less effort has been made on improving the performance of feature selection by leveraging the shared knowledge from multiple related tasks. Our algorithm builds upon the assumption that different related tasks have common structures. Multiple feature selection functions of different tasks are simultaneously learned in a joint framework, which enables our algorithm to utilize the common knowledge of multiple tasks as supplementary information to facilitate decision making. An efficient iterative algorithm with guaranteed convergence is proposed for the optimization. Experiments on different databases have demonstrated the effectiveness of the proposed algorithm.

Journal ArticleDOI
TL;DR: A dimensionality reduction method that fits SRC well is presented; it maximizes the ratio of between-class reconstruction residual to within-class reconstruction residual in the projected space, and thus enables SRC to achieve better performance.
Abstract: The sparse representation-based classifier (SRC) has been developed and shows great potential for real-world face recognition. This paper presents a dimensionality reduction method that fits SRC well. Since SRC adopts a class reconstruction residual-based decision rule, we use this rule as a criterion to steer the design of a feature extraction method. The method is thus called SRC steered discriminative projection (SRC-DP). SRC-DP maximizes the ratio of between-class reconstruction residual to within-class reconstruction residual in the projected space, and thus enables SRC to achieve better performance. SRC-DP provides low-dimensional representations of human faces to make the SRC-based face recognition system more efficient. Experiments are conducted on the AR, extended Yale B, and PIE face image databases, and the results demonstrate that the proposed method is more effective than other feature extraction methods based on SRC.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper advances descriptor-based face recognition by suggesting a novel usage of descriptors to form an over-complete representation, and by proposing a new metric learning pipeline within the same/not-same framework.
Abstract: This paper advances descriptor-based face recognition by suggesting a novel usage of descriptors to form an over-complete representation, and by proposing a new metric learning pipeline within the same/not-same framework. First, the Over-Complete Local Binary Patterns (OCLBP) face representation scheme is introduced as a multi-scale modified version of the Local Binary Patterns (LBP) scheme. Second, we propose an efficient matrix-vector multiplication-based recognition system. The system is based on Linear Discriminant Analysis (LDA) coupled with Within Class Covariance Normalization (WCCN). This is further extended to the unsupervised case by proposing an unsupervised variant of WCCN. Lastly, we introduce Diffusion Maps (DM) for non-linear dimensionality reduction as an alternative to the Whitened Principal Component Analysis (WPCA) method which is often used in face recognition. We evaluate the proposed framework on the LFW face recognition dataset under the restricted, unrestricted and unsupervised protocols. In all three cases we achieve very competitive results.
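Of the pipeline's components, Within Class Covariance Normalization is the easiest to convey in a few lines: whiten the (e.g., LDA-projected) feature space by the inverse within-class covariance so that directions of high intra-person variability are down-weighted. A minimal sketch, with an assumed ridge term for invertibility:

```python
import numpy as np

def wccn(X, y, ridge=1e-3):
    d = X.shape[1]
    Sw = np.zeros((d, d))
    for c in np.unique(y):                     # pool per-identity covariances
        Xc = X[y == c] - X[y == c].mean(axis=0)
        Sw += Xc.T @ Xc
    Sw = Sw / len(X) + ridge * np.eye(d)       # regularized within-class covariance
    L = np.linalg.cholesky(np.linalg.inv(Sw))  # whitening transform
    return X @ L                               # normalized features for scoring
```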

Journal ArticleDOI
TL;DR: This paper proposes a simple but effective robust LDA version based on L1-norm maximization, which learns a set of local optimal projection vectors by maximizing the ratio of the L1-norm-based between-class dispersion to the L1-norm-based within-class dispersion.
Abstract: Linear discriminant analysis (LDA) is a well-known dimensionality reduction technique, which is widely used for many purposes. However, conventional LDA is sensitive to outliers because its objective function is based on the distance criterion using L2-norm. This paper proposes a simple but effective robust LDA version based on L1-norm maximization, which learns a set of local optimal projection vectors by maximizing the ratio of the L1-norm-based between-class dispersion to the L1-norm-based within-class dispersion. The proposed method is theoretically proved to be feasible and robust to outliers, while overcoming the singularity problem of the within-class scatter matrix in conventional LDA. Experiments on artificial datasets, standard classification datasets and three popular image databases demonstrate the efficacy of the proposed method.

Journal ArticleDOI
TL;DR: The underlying idea is to design an optimal projection matrix, which preserves the local neighborhood information inferred from unlabeled samples, while simultaneously maximizing the class discrimination of the data inferred from the labeled samples.
Abstract: We propose a novel semisupervised local discriminant analysis method for feature extraction in hyperspectral remote sensing imagery, with improved performance in both ill-posed and poor-posed conditions. The proposed method combines unsupervised methods (local linear feature extraction) and a supervised method (linear discriminant analysis) in a novel framework without any free parameters. The underlying idea is to design an optimal projection matrix, which preserves the local neighborhood information inferred from unlabeled samples, while simultaneously maximizing the class discrimination of the data inferred from the labeled samples. Experimental results on four real hyperspectral images demonstrate that the proposed method compares favorably with conventional feature extraction methods.

Journal ArticleDOI
TL;DR: This paper combines distance metric learning and dimensionality reduction to better explore the connections between facial features and age labels, and presents an age-oriented local regression to capture the complicated facial aging process for age determination.

Journal ArticleDOI
TL;DR: In this paper, the authors investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output.
Abstract: The principal components analysis (PCA) algorithm is a standard tool for identifying good low-dimensional approximations to high-dimensional data. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method.
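For orientation, here is a sketch of the input-perturbation baseline discussed in this literature (add symmetric noise to the second-moment matrix, then eigendecompose), not the authors' utility-optimized mechanism; the noise scale is schematic rather than calibrated to a privacy budget.

```python
import numpy as np

def noisy_pca(X, k, noise_scale=0.1, seed=0):
    n, d = X.shape
    A = (X.T @ X) / n                      # sample second-moment matrix
    E = np.random.default_rng(seed).normal(scale=noise_scale, size=(d, d))
    A_noisy = A + (E + E.T) / 2            # symmetrize so eigenvalues stay real
    vals, vecs = np.linalg.eigh(A_noisy)
    return vecs[:, np.argsort(vals)[::-1][:k]]  # top-k noisy directions
```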

Posted Content
TL;DR: The Manopt toolbox, available at www.manopt.org, is a user-friendly, documented piece of software dedicated to simplifying experimentation with state-of-the-art Riemannian optimization algorithms, aimed particularly at lowering the entry barrier.
Abstract: Optimization on manifolds is a rapidly developing branch of nonlinear optimization. Its focus is on problems where the smooth geometry of the search space can be leveraged to design efficient numerical algorithms. In particular, optimization on manifolds is well-suited to deal with rank and orthogonality constraints. Such structured constraints appear pervasively in machine learning applications, including low-rank matrix completion, sensor network localization, camera network registration, independent component analysis, metric learning, dimensionality reduction and so on. The Manopt toolbox, available at www.manopt.org, is a user-friendly, documented piece of software dedicated to simplifying experimentation with state-of-the-art Riemannian optimization algorithms. We aim particularly at reaching practitioners outside our field.
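Manopt itself is a MATLAB toolbox; the numpy sketch below only conveys the core loop of Riemannian optimization that such toolboxes automate: compute a Euclidean gradient, project it onto the tangent space of the constraint manifold (here the unit sphere), take a step, and retract back onto the manifold.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(50, 50))
A = A + A.T                            # maximize the Rayleigh quotient x^T A x
x = rng.normal(size=50)
x /= np.linalg.norm(x)
for _ in range(500):
    g = 2 * A @ x                      # Euclidean gradient
    g_tan = g - (g @ x) * x            # projection onto the sphere's tangent space
    x = x + 0.01 * g_tan               # fixed-step ascent (illustrative)
    x /= np.linalg.norm(x)             # retraction: back onto the sphere
# x now approximates the leading eigenvector of A
```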

Book ChapterDOI
01 Jan 2013
TL;DR: This chapter discusses another popular data mining algorithm that can be used for supervised or unsupervised learning, Linear Discriminant Analysis, and presents the robust counterpart scheme originally proposed by Kim and Boyd.
Abstract: In this chapter we discuss another popular data mining algorithm that can be used for supervised or unsupervised learning. Linear Discriminant Analysis (LDA) was proposed by R. Fisher in 1936. It consists of finding the projection hyperplane that minimizes the intraclass variance and maximizes the distance between the projected means of the classes. As with PCA, these two objectives can be addressed by solving an eigenvalue problem, with the corresponding eigenvector defining the hyperplane of interest. This hyperplane can be used for classification, for dimensionality reduction, and for interpreting the importance of the given features. In the first part of the chapter we discuss the generic formulation of LDA, whereas in the second we present the robust counterpart scheme originally proposed by Kim and Boyd. We also discuss the nonlinear extension of LDA through the kernel transformation.
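A minimal numpy sketch of the eigenvalue formulation the chapter describes: the discriminant directions solve the generalized eigenproblem S_b w = lambda S_w w, with a small ridge term (an assumption here) guarding against a singular within-class scatter matrix.

```python
import numpy as np
from scipy.linalg import eigh

def lda_directions(X, y, k=1, ridge=1e-6):
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                # within-class scatter
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)   # between-class scatter
    vals, vecs = eigh(Sb, Sw + ridge * np.eye(d))    # generalized eigenproblem
    return vecs[:, np.argsort(vals)[::-1][:k]]       # top discriminant directions
```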

Journal ArticleDOI
TL;DR: A novel algorithm is proposed in which the artificial ants traverse a directed graph with only O(2n) arcs; it incorporates classification performance and feature set size into the heuristic guidance, and selects a feature set of small size and high classification accuracy.

Book
04 Jun 2013
TL;DR: This book is devoted to a novel approach to dimensionality reduction based on the well-known nearest neighbor method, a powerful classification and regression approach, and compares various optimization approaches, from evolutionary to swarm-based heuristics.
Abstract: This book is devoted to a novel approach to dimensionality reduction based on the well-known nearest neighbor method, a powerful classification and regression approach. It starts with an introduction to machine learning concepts and a real-world application from the energy domain. Then, unsupervised nearest neighbors (UNN) is introduced as an efficient iterative method for dimensionality reduction. Various UNN models are developed step by step, reaching from a simple iterative strategy for discrete latent spaces to a stochastic kernel-based algorithm for learning submanifolds with independent parameterizations. Extensions that allow the embedding of incomplete and noisy patterns are introduced. Various optimization approaches are compared, from evolutionary to swarm-based heuristics. Experimental comparisons to related methodologies, taking into account artificial test data sets and also real-world data, demonstrate the behavior of UNN in practical scenarios. The book contains numerous color figures to illustrate the introduced concepts and to highlight the experimental results.

Journal ArticleDOI
TL;DR: Experimental results on various types of datasets show that the proposed STDR outperforms the state-of-the-art algorithms in terms of k-means clustering performance, and the objective function of the proposed model is theoretically guaranteed to converge to the global optimum.

Posted Content
TL;DR: In this article, a nonparametric independence screening (NIS) method is proposed to select variables by ranking a measure of the nonparametric marginal contributions of each covariate given the exposure variable.
Abstract: The varying-coefficient model is an important nonparametric statistical model that allows us to examine how the effects of covariates vary with exposure variables. When the number of covariates is large, the issue of variable selection arises. In this paper, we propose and investigate marginal nonparametric screening methods to screen variables in ultra-high dimensional sparse varying-coefficient models. The proposed nonparametric independence screening (NIS) selects variables by ranking a measure of the nonparametric marginal contributions of each covariate given the exposure variable. The sure independent screening property is established under some mild technical conditions when the dimensionality is of nonpolynomial order, and the dimensionality reduction of NIS is quantified. To enhance practical utility and finite sample performance, two data-driven iterative NIS methods are proposed for selecting thresholding parameters and variables: conditional permutation and greedy methods, resulting in Conditional-INIS and Greedy-INIS. The effectiveness and flexibility of the proposed methods are further illustrated by simulation studies and real data applications.
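In simplified form, marginal screening ranks each covariate by the size of its fitted marginal contribution. The sketch below uses a cubic polynomial fit of the response on each covariate alone; the paper's NIS uses B-spline regressions conditional on the exposure variable, which is omitted here for brevity, and all names are illustrative.

```python
import numpy as np

def marginal_screen(X, y, degree=3):
    scores = []
    for j in range(X.shape[1]):
        coeffs = np.polyfit(X[:, j], y, degree)   # marginal nonparametric fit
        fitted = np.polyval(coeffs, X[:, j])
        scores.append(np.mean(fitted ** 2))       # magnitude of the contribution
    return np.argsort(scores)[::-1]               # covariates, most relevant first

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 50))
y = X[:, 0] ** 2 - X[:, 3] + 0.1 * rng.normal(size=300)
top = marginal_screen(X, y)[:5]                   # should surface columns 0 and 3
```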

Journal ArticleDOI
TL;DR: In this article, the authors propose an alternative approach that involves linear projection of all the data points onto a lower-dimensional subspace, and demonstrate the superiority of this approach from a theoretical perspective and through simulated and real data examples.
Abstract: Gaussian processes are widely used in nonparametric regression, classification and spatiotemporal modelling, facilitated in part by a rich literature on their theoretical properties. However, one of their practical limitations is expensive computation, typically on the order of n^3, where n is the number of data points, in performing the necessary matrix inversions. For large datasets, storage and processing also lead to computational bottlenecks, and numerical stability of the estimates and predicted values degrades with increasing n. Various methods have been proposed to address these problems, including predictive processes in spatial data analysis and the subset-of-regressors technique in machine learning. The idea underlying these approaches is to use a subset of the data, but this raises questions concerning sensitivity to the choice of subset and limitations in estimating fine-scale structure in regions that are not well covered by the subset. Motivated by the literature on compressive sensing, we propose an alternative approach that involves linear projection of all the data points onto a lower-dimensional subspace. We demonstrate the superiority of this approach from a theoretical perspective and through simulated and real data examples.
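A rough sketch of the projection idea on synthetic data: compress n observations into m << n random linear combinations, then do exact GP algebra in the compressed space. For clarity this toy version still forms the full n x n kernel matrix, which a practical implementation would avoid; kernel and noise settings are illustrative.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

rng = np.random.default_rng(6)
n, m = 2000, 100
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

Phi = rng.normal(size=(m, n)) / np.sqrt(m)     # random projection of data points
K = rbf(X, X) + 0.01 * np.eye(n)               # kernel matrix plus noise
C = Phi @ K @ Phi.T                            # m x m compressed covariance
alpha = Phi.T @ np.linalg.solve(C, Phi @ y)    # solve costs O(m^3), not O(n^3)
Xstar = np.linspace(-3, 3, 50)[:, None]
mean = rbf(Xstar, X) @ alpha                   # predictive mean at test inputs
```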

Journal ArticleDOI
TL;DR: A novel multiview dimensionality reduction method for scene classification that takes both intraclass and interclass geometries into consideration and achieves the best performance in scene classification among the compared methods.

Proceedings ArticleDOI
19 Oct 2013
TL;DR: This work proposes a new multi-label feature selection algorithm, RF-ML, by extending the single-label ReliefF algorithm; RF-ML takes into account the effect of interacting attributes and deals directly with multi-label data without any data transformation.
Abstract: The feature selection process aims to select a subset of relevant features to be used in model construction, reducing data dimensionality by removing irrelevant and redundant features. Although effective feature selection methods to support single-label learning abound, this is not the case for multi-label learning. Furthermore, most of the multi-label feature selection methods proposed initially transform the multi-label data to single-label data, to which a traditional feature selection method is then applied. However, applying single-label feature selection methods after transforming the data can hinder exploring label dependence, an important issue in multi-label learning. This work proposes a new multi-label feature selection algorithm, RF-ML, by extending the single-label ReliefF algorithm. RF-ML, unlike strictly univariate measures for feature ranking, takes into account the effect of interacting attributes and deals directly with multi-label data without any data transformation. Using synthetic datasets, the proposed algorithm is experimentally compared to the ReliefF algorithm, in which the multi-label data has been previously transformed to single-label data using two well-known data transformation approaches. Results show that the proposed algorithm stands out by ranking the relevant features as the best ones more often.
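For reference, here is a condensed sketch of the single-label ReliefF-style scoring that RF-ML extends: features are rewarded for separating a sample from its nearest miss and penalized for distance to its nearest hit. This is a simplification (one nearest neighbor per class, Manhattan distance) with illustrative names.

```python
import numpy as np

def relieff_scores(X, y, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for i in rng.integers(0, len(X), size=n_iter):
        d = np.abs(X - X[i]).sum(axis=1)           # Manhattan distances to X[i]
        d[i] = np.inf                              # exclude the sample itself
        hit = np.where(y == y[i], d, np.inf).argmin()   # nearest same-class
        miss = np.where(y != y[i], d, np.inf).argmin()  # nearest other-class
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter                              # higher score = more relevant
```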

Journal ArticleDOI
TL;DR: This work proposes a high-dimensional robust principal component analysis algorithm that is efficient, robust to contaminated points, and easily kernelizable; it achieves maximal robustness.
Abstract: Principal component analysis plays a central role in statistics, engineering, and science. Because of the prevalence of corrupted data in real-world applications, much research has focused on developing robust algorithms. Perhaps surprisingly, these algorithms are unequipped (indeed, unable) to deal with outliers in the high-dimensional setting where the number of observations is of the same magnitude as the number of variables of each observation, and the dataset contains some (arbitrarily) corrupted observations. We propose a high-dimensional robust principal component analysis algorithm that is efficient, robust to contaminated points, and easily kernelizable. In particular, our algorithm achieves maximal robustness: it has a breakdown point of 50% (the best possible), while all existing algorithms have a breakdown point of zero. Moreover, our algorithm recovers the optimal solution exactly in the case where the number of corrupted points grows sublinearly in the dimension.

Journal ArticleDOI
Zhizhao Feng, Meng Yang, Lei Zhang, Yan Liu, David Zhang
TL;DR: The proposed algorithm is evaluated on benchmark face databases in comparison with existing linear representation based methods, and the results show that the joint learning improves the FR rate, particularly when the number of training samples per class is small.

Journal ArticleDOI
TL;DR: This approach can use only ECG signals to recognize driving stress conditions effectively, with very good recognition performance; the combination of KBCS, LDA, and PCA achieves satisfactory recognition rates for the features generated by both trend-based and parameter-based methods.