
Showing papers on "Dimensionality reduction published in 2010"


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This work proposes a simple yet efficient way of aggregating local image descriptors into a vector of limited dimension, which can be viewed as a simplification of the Fisher kernel representation, and shows how to jointly optimize the dimension reduction and the indexing algorithm.
Abstract: We address the problem of image search on a very large scale, where three constraints have to be considered jointly: the accuracy of the search, its efficiency, and the memory usage of the representation. We first propose a simple yet efficient way of aggregating local image descriptors into a vector of limited dimension, which can be viewed as a simplification of the Fisher kernel representation. We then show how to jointly optimize the dimension reduction and the indexing algorithm, so that it best preserves the quality of vector comparison. The evaluation shows that our approach significantly outperforms the state of the art: the search accuracy is comparable to the bag-of-features approach for an image representation that fits in 20 bytes. Searching a 10 million image dataset takes about 50ms.
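The entry gives no code; as a rough illustration of the pipeline described (aggregate local descriptors into a single residual vector, then reduce its dimension before indexing), here is a minimal NumPy sketch. The codebook size, descriptor dimension, and use of plain PCA are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def aggregate_residuals(descriptors, codebook):
    """Aggregate local descriptors into one vector by summing residuals
    to the nearest codeword (a VLAD-style encoding)."""
    k, d = codebook.shape
    # nearest codeword for each descriptor
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    v = np.zeros((k, d))
    for i, c in enumerate(assign):
        v[c] += descriptors[i] - codebook[c]
    v = v.ravel()
    return v / (np.linalg.norm(v) + 1e-12)   # L2-normalize

# toy usage: 500 local descriptors of dim 64, codebook of 16 centroids (all assumed sizes)
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 64))
image_vec = aggregate_residuals(rng.normal(size=(500, 64)), codebook)   # 16*64 = 1024-D

# dimensionality reduction: project a set of image vectors with PCA before indexing
X = np.stack([aggregate_residuals(rng.normal(size=(500, 64)), codebook)
              for _ in range(100)])
Xc = X - X.mean(0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:64].T   # keep 64 dimensions
```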

2,782 citations


Proceedings ArticleDOI
25 Jul 2010
TL;DR: Inspired by recent developments in manifold learning and L1-regularized models for subset selection, a new approach called Multi-Cluster Feature Selection (MCFS) is proposed for unsupervised feature selection, which selects features such that the multi-cluster structure of the data is best preserved.
Abstract: In many data analysis tasks, one is often confronted with very high dimensional data. Feature selection techniques are designed to find the relevant feature subset of the original features which can facilitate clustering, classification and retrieval. In this paper, we consider the feature selection problem in the unsupervised learning scenario, which is particularly difficult due to the absence of class labels that would guide the search for relevant information. The feature selection problem is essentially a combinatorial optimization problem which is computationally expensive. Traditional unsupervised feature selection methods address this issue by selecting the top ranked features based on certain scores computed independently for each feature. These approaches neglect the possible correlation between different features and thus cannot produce an optimal feature subset. Inspired by recent developments in manifold learning and L1-regularized models for subset selection, we propose in this paper a new approach, called Multi-Cluster Feature Selection (MCFS), for unsupervised feature selection. Specifically, we select those features such that the multi-cluster structure of the data can be best preserved. The corresponding optimization problem can be efficiently solved since it only involves a sparse eigen-problem and an L1-regularized least squares problem. Extensive experimental results over various real-life data sets have demonstrated the superiority of the proposed algorithm.
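A minimal sketch of the MCFS recipe as summarized above (spectral embedding of the sample graph, an L1/LARS regression per embedding dimension, then a per-feature score), built from scikit-learn pieces; the parameter values and helper name are illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.linear_model import Lars

def mcfs_scores(X, n_clusters=5, n_selected_per_eigvec=20, n_neighbors=5):
    """Score features in the spirit of MCFS: embed the data spectrally,
    regress each embedding dimension on the features with a sparse (LARS)
    model, and keep the largest absolute coefficient per feature."""
    # step 1: spectral embedding of the kNN graph (one dimension per cluster)
    Y = SpectralEmbedding(n_components=n_clusters,
                          affinity="nearest_neighbors",
                          n_neighbors=n_neighbors).fit_transform(X)
    # step 2: sparse regression of each eigenvector onto the features
    coefs = []
    for k in range(n_clusters):
        model = Lars(n_nonzero_coefs=n_selected_per_eigvec).fit(X, Y[:, k])
        coefs.append(np.abs(model.coef_))
    # step 3: MCFS score = max |coefficient| over embedding dimensions
    return np.max(np.stack(coefs), axis=0)

# usage: keep the features with the highest scores
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
scores = mcfs_scores(X)
selected = np.argsort(scores)[::-1][:10]
```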

975 citations


Journal Article
TL;DR: In this paper, a brief account of the recent developments of theory, methods, and implementations for high-dimensional variable selection is presented, with emphasis on independence screening and two-scale methods.
Abstract: High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. Questions of what limits of dimensionality such methods can handle, what role penalty functions play, and what statistical properties they possess are rapidly driving the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods.
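Since the review emphasizes independence screening followed by penalized likelihood, here is a small two-stage sketch (marginal-correlation screening, then a Lasso on the surviving features); the cutoff, helper name, and toy data are assumptions for demonstration only.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def sis_then_lasso(X, y, keep=50):
    """Two-scale strategy in the spirit of sure independence screening:
    first screen features by absolute marginal correlation with y,
    then fit a penalized regression on the surviving features."""
    Xs = (X - X.mean(0)) / X.std(0)
    ys = y - y.mean()
    marginal = np.abs(Xs.T @ ys) / len(y)      # componentwise correlations
    kept = np.argsort(marginal)[::-1][:keep]   # screening step
    model = LassoCV(cv=5).fit(X[:, kept], y)   # second-stage penalized selection
    selected = kept[model.coef_ != 0]
    return selected, model

# toy example: 100 samples, 1000 features, 5 of them truly relevant
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1000))
beta = np.zeros(1000); beta[:5] = 3.0
y = X @ beta + rng.normal(size=100)
selected, model = sis_then_lasso(X, y)
```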

892 citations


Journal ArticleDOI
TL;DR: A new unsupervised DR method, sparsity preserving projections (SPP), aims to preserve the sparse reconstructive relationship of the data, which is achieved by minimizing an L1 regularization-related objective function.
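Only the TL;DR is shown for this entry; the following NumPy/scikit-learn sketch follows that description loosely: each sample is sparsely reconstructed from the others with an L1 penalty, and projections that preserve those weights are found via a generalized eigenproblem. The regularization value, ridge term, and toy data are assumptions, not the authors' settings.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.linear_model import Lasso

def spp(D, n_components=2, alpha=0.05):
    """Sketch of sparsity preserving projections: sparsely reconstruct each
    sample from the others (L1-regularized), then find projections that
    best preserve those reconstruction weights."""
    n, d = D.shape
    S = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        # x_i ~ sum_j s_ij x_j with an L1 penalty on the weights s_i
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(D[others].T, D[i])
        S[i, others] = lasso.coef_
    Sb = S + S.T - S.T @ S
    A = D.T @ Sb @ D
    B = D.T @ D + 1e-6 * np.eye(d)          # small ridge for numerical stability
    evals, evecs = eigh(A, B)               # generalized eigenproblem
    W = evecs[:, np.argsort(evals)[::-1][:n_components]]
    return W                                # project new data with X @ W

rng = np.random.default_rng(0)
D = rng.normal(size=(80, 20))
W = spp(D)
embedded = D @ W
```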

765 citations


Proceedings Article
06 Dec 2010
TL;DR: In this paper, an efficient convex optimization-based algorithm called Outlier Pursuit is presented, which under mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the corrupted points.
Abstract: Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm, called Outlier Pursuit, which under mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation is of paramount interest in bioinformatics and financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization; however, our results, setup, and approach necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself.
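A hedged CVXPY sketch of the kind of convex program described above: a nuclear norm encourages a low-rank term and a sum of column norms encourages a column-sparse (whole-point) corruption term. The penalty weight, threshold, and toy data are illustrative, not the paper's settings.

```python
import numpy as np
import cvxpy as cp

def outlier_pursuit(M, lam=0.3):
    """Decompose M into a low-rank part L (nuclear norm) plus a
    column-sparse part C (sum of column l2 norms); lam is illustrative."""
    L = cp.Variable(M.shape)
    C = cp.Variable(M.shape)
    objective = cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.norm(C, 2, axis=0)))
    prob = cp.Problem(objective, [L + C == M])
    prob.solve()
    return L.value, C.value

# toy data: a rank-2 matrix with 5 completely corrupted columns (points)
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 40))
M = A.copy()
M[:, :5] = rng.normal(scale=5.0, size=(30, 5))
L_hat, C_hat = outlier_pursuit(M)
outliers = np.where(np.linalg.norm(C_hat, axis=0) > 1e-3)[0]
```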

590 citations


Journal Article
TL;DR: This paper explores a new aspect of the dimensionality curse, referred to as hubness, that affects the distribution of k-occurrences: the number of times a point appears among the k nearest neighbors of other points in a data set, which becomes considerably skewed as dimensionality increases.
Abstract: Different aspects of the curse of dimensionality are known to present serious challenges to various machine-learning methods and tasks. This paper explores a new aspect of the dimensionality curse, referred to as hubness, that affects the distribution of k-occurrences: the number of times a point appears among the k nearest neighbors of other points in a data set. Through theoretical and empirical analysis involving synthetic and real data sets we show that under commonly used assumptions this distribution becomes considerably skewed as dimensionality increases, causing the emergence of hubs, that is, points with very high k-occurrences which effectively represent "popular" nearest neighbors. We examine the origins of this phenomenon, showing that it is an inherent property of data distributions in high-dimensional vector space, discuss its interaction with dimensionality reduction, and explore its influence on a wide range of machine-learning tasks directly or indirectly based on measuring distances, belonging to supervised, semi-supervised, and unsupervised learning families.
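The skewness of the k-occurrence distribution is easy to observe directly; the short script below (scikit-learn nearest neighbors plus SciPy's sample skewness) is an illustrative experiment on i.i.d. Gaussian data, not the paper's exact protocol.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.stats import skew

def k_occurrence_skewness(X, k=10):
    """Compute N_k(x), the number of times each point appears among the
    k nearest neighbors of other points, and the skewness of its
    distribution (large positive skew indicates the emergence of hubs)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    neighbors = idx[:, 1:]                 # drop the self-neighbor
    counts = np.bincount(neighbors.ravel(), minlength=len(X))
    return skew(counts), counts

# the skew grows with the dimensionality of i.i.d. Gaussian data
rng = np.random.default_rng(0)
for d in (3, 20, 100):
    s, _ = k_occurrence_skewness(rng.normal(size=(2000, d)))
    print(f"dim={d:4d}  skewness of N_k = {s:.2f}")
```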

560 citations


Journal ArticleDOI
TL;DR: This work presents a hybrid algorithm, SAGA, that combines simulated annealing's ability to avoid becoming trapped in local minima with the very high convergence rate of the genetic-algorithm crossover operator, the strong local search ability of greedy algorithms, and the high computational efficiency of generalized regression neural networks.

554 citations


Journal ArticleDOI
01 Dec 2010
TL;DR: A new spectral-embedding algorithm, namely multiview spectral embedding (MSE), which encodes different features in different ways to achieve a physically meaningful embedding and explores the complementary properties of different views.
Abstract: In computer vision and multimedia search, it is common to use multiple features from different views to represent an object. For example, to well characterize a natural scene image, it is essential to find a set of visual features to represent its color, texture, and shape information and encode each feature into a vector. Therefore, we have a set of vectors in different spaces to represent the image. Conventional spectral-embedding algorithms cannot deal with such data directly, so we have to concatenate these vectors together as a new vector. This concatenation is not physically meaningful because each feature has a specific statistical property. Therefore, we develop a new spectral-embedding algorithm, namely, multiview spectral embedding (MSE), which can encode different features in different ways, to achieve a physically meaningful embedding. In particular, MSE finds a low-dimensional embedding wherein the distribution of each view is sufficiently smooth, and MSE explores the complementary property of different views. Because there is no closed-form solution for MSE, we derive an alternating optimization-based iterative algorithm to obtain the low-dimensional embedding. Empirical evaluations based on the applications of image retrieval, video annotation, and document clustering demonstrate the effectiveness of the proposed approach.

495 citations


Journal ArticleDOI
TL;DR: Classification accuracies of 97% and 98% have been obtained by FP-ANN and k-NN, respectively, which shows that the proposed technique is robust and effective compared with other recent work.

482 citations


Journal ArticleDOI
TL;DR: A unified manifold learning framework for semi-supervised and unsupervised dimension reduction that employs a simple but effective linear regression function to map new data points and models the mismatch between h(X) and F.
Abstract: We propose a unified manifold learning framework for semi-supervised and unsupervised dimension reduction by employing a simple but effective linear regression function to map the new data points. For semi-supervised dimension reduction, we aim to find the optimal prediction labels F for all the training samples X, the linear regression function h(X), and the regression residue F0 = F - h(X) simultaneously. Our new objective function integrates two terms related to label fitness and manifold smoothness as well as a flexible penalty term defined on the residue F0. Our semi-supervised learning framework, referred to as flexible manifold embedding (FME), can effectively utilize label information from labeled data as well as a manifold structure from both labeled and unlabeled data. By modeling the mismatch between h(X) and F, we show that FME relaxes the hard linear constraint F = h(X) in manifold regularization (MR), making it better able to cope with data sampled from a nonlinear manifold. In addition, we propose a simplified version (referred to as FME/U) for unsupervised dimension reduction. We also show that our proposed framework provides a unified view to explain and understand many semi-supervised, supervised and unsupervised dimension reduction techniques. Comprehensive experiments on several benchmark databases demonstrate the significant improvement over existing dimension reduction algorithms.

435 citations


Journal ArticleDOI
TL;DR: This paper presents a family of subspace learning algorithms based on a new form of regularization, which transfers the knowledge gained in training samples to testing samples, and minimizes the Bregman divergence between the distribution of training samples and that of testing samples in the selected subspace.
Abstract: Regularization principles [31] lead to approximation schemes for various learning problems, e.g., the regularization of the norm in a reproducing kernel Hilbert space for ill-posed problems. In this paper, we present a family of subspace learning algorithms based on a new form of regularization, which transfers the knowledge gained in training samples to testing samples. In particular, the new regularization minimizes the Bregman divergence between the distribution of training samples and that of testing samples in the selected subspace, so it boosts the performance when training and testing samples are not independent and identically distributed. To test the effectiveness of the proposed regularization, we introduce it to popular subspace learning algorithms, e.g., principal components analysis (PCA) for cross-domain face modeling; and Fisher's linear discriminant analysis (FLDA), locality preserving projections (LPP), marginal Fisher's analysis (MFA), and discriminative locality alignment (DLA) for cross-domain face recognition and text categorization. Finally, we present experimental evidence on both face image data sets and text data sets, suggesting that the proposed Bregman divergence-based regularization is effective to deal with cross-domain learning problems.

Journal ArticleDOI
TL;DR: In experiments with Hyperion and AVIRIS hyperspectral data, the proposed SLML-WkNN performed better than ULML-kNN and SLML-kNN, and the highest accuracies were obtained using weights provided by supervised LTSA and LLE.
Abstract: Approaches to combine local manifold learning (LML) and the k-nearest-neighbor (kNN) classifier are investigated for hyperspectral image classification. Based on supervised LML (SLML) and kNN, a new SLML-weighted kNN (SLML-WkNN) classifier is proposed. This method is appealing as it does not require dimensionality reduction and only depends on the weights provided by the kernel function of the specific ML method. Performance of the proposed classifier is compared to that of unsupervised LML (ULML) and SLML for dimensionality reduction in conjunction with the kNN (ULML-kNN and SLML-kNN). Three LML methods, locally linear embedding (LLE), local tangent space alignment (LTSA), and Laplacian eigenmaps, are investigated with these classifiers. In experiments with Hyperion and AVIRIS hyperspectral data, the proposed SLML-WkNN performed better than ULML-kNN and SLML-kNN, and the highest accuracies were obtained using weights provided by supervised LTSA and LLE.

Journal Article
TL;DR: A rigorous definition for a specific visualization task is given, resulting in quantifiable goodness measures and new visualization methods, and it is shown empirically that the unsupervised version outperforms existing unsupervised dimensionality reduction methods in the visualization task, and the supervised version outperforms existing supervised methods.
Abstract: Nonlinear dimensionality reduction methods are often used to visualize high-dimensional data, although the existing methods have been designed for other related tasks such as manifold learning. It has been difficult to assess the quality of visualizations since the task has not been well-defined. We give a rigorous definition for a specific visualization task, resulting in quantifiable goodness measures and new visualization methods. The task is information retrieval given the visualization: to find similar data based on the similarities shown on the display. The fundamental tradeoff between precision and recall of information retrieval can then be quantified in visualizations as well. The user needs to give the relative cost of missing similar points vs. retrieving dissimilar points, after which the total cost can be measured. We then introduce a new method NeRV (neighbor retrieval visualizer) which produces an optimal visualization by minimizing the cost. We further derive a variant for supervised visualization; class information is taken rigorously into account when computing the similarity relationships. We show empirically that the unsupervised version outperforms existing unsupervised dimensionality reduction methods in the visualization task, and the supervised version outperforms existing supervised methods.

Journal ArticleDOI
TL;DR: Zhang et al. propose a multilabel dimensionality reduction method, MDDM, with two kinds of projection strategies, attempting to project the original data into a lower-dimensional feature space that maximizes the dependence between the original feature description and the associated class labels.
Abstract: Multilabel learning deals with data associated with multiple labels simultaneously. Like other data mining and machine learning tasks, multilabel learning also suffers from the curse of dimensionality. Dimensionality reduction has been studied for many years; however, multilabel dimensionality reduction remains almost untouched. In this article, we propose a multilabel dimensionality reduction method, MDDM, with two kinds of projection strategies, attempting to project the original data into a lower-dimensional feature space maximizing the dependence between the original feature description and the associated class labels. Based on the Hilbert-Schmidt Independence Criterion, we derive an eigen-decomposition problem which enables the dimensionality reduction process to be efficient. Experiments validate the performance of MDDM.
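A compact sketch of the HSIC-style projection the abstract describes: with a linear label kernel and a centering matrix, the projection directions are the leading eigenvectors of X H K_y H X^T. The uncorrelated-projection variant shown here and the toy sizes are assumptions, not the authors' exact formulation.

```python
import numpy as np

def mddm_projection(D, Y, n_components=10):
    """Find directions P maximizing tr(P^T X H K_y H X^T P), i.e., the
    dependence between projected features and a label kernel K_y = Y Y^T,
    subject to P^T P = I (rows of D are samples, rows of Y are label vectors)."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Ky = Y @ Y.T                              # linear kernel on label vectors
    M = D.T @ H @ Ky @ H @ D                  # symmetric, size d x d
    evals, evecs = np.linalg.eigh(M)
    P = evecs[:, np.argsort(evals)[::-1][:n_components]]
    return P                                  # reduce data with D @ P

# toy multilabel data: 300 samples, 100 features, 5 labels
rng = np.random.default_rng(0)
D = rng.normal(size=(300, 100))
Y = (rng.random((300, 5)) < 0.3).astype(float)
P = mddm_projection(D, Y, n_components=10)
D_low = D @ P
```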

Proceedings Article
31 Mar 2010
TL;DR: In this article, a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction is introduced, which can automatically select the dimensionality of the nonlinear latent space.
Abstract: We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs.

Journal ArticleDOI
TL;DR: A new appearance-based feature descriptor, the local directional pattern (LDP), is presented to represent facial geometry and analyze its performance in expression recognition, showing the superiority of the LDP descriptor over other appearance-based feature descriptors.
Abstract: Automatic facial expression recognition has many potential applications in different areas of human-computer interaction. However, they are not yet fully realized due to the lack of an effective facial feature descriptor. In this paper, we present a new appearance-based feature descriptor, the local directional pattern (LDP), to represent facial geometry and analyze its performance in expression recognition. An LDP feature is obtained by computing the edge response values in 8 directions at each pixel and encoding them into an 8-bit binary number using the relative strength of these edge responses. The LDP descriptor, a distribution of LDP codes within an image or image patch, is used to describe each expression image. The effectiveness of dimensionality reduction techniques, such as principal component analysis and AdaBoost, is also analyzed in terms of computational cost saving and classification accuracy. Two well-known machine learning methods, template matching and support vector machine, are used for classification using the Cohn-Kanade and Japanese female facial expression databases. Better classification accuracy shows the superiority of the LDP descriptor over other appearance-based feature descriptors.
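To make the encoding concrete, here is a small NumPy/SciPy sketch of an LDP-style descriptor: eight Kirsch edge responses per pixel, the k strongest marked as 1, the 8 bits read as an integer code, and a 256-bin histogram used as the descriptor. The mask ordering and k=3 are common choices assumed here, not taken from the paper.

```python
import numpy as np
from scipy.ndimage import convolve

# the eight Kirsch edge-response masks (E, NE, N, NW, W, SW, S, SE)
KIRSCH = np.array([
    [[-3, -3, 5], [-3, 0, 5], [-3, -3, 5]],   # E
    [[-3, 5, 5], [-3, 0, 5], [-3, -3, -3]],   # NE
    [[5, 5, 5], [-3, 0, -3], [-3, -3, -3]],   # N
    [[5, 5, -3], [5, 0, -3], [-3, -3, -3]],   # NW
    [[5, -3, -3], [5, 0, -3], [5, -3, -3]],   # W
    [[-3, -3, -3], [5, 0, -3], [5, 5, -3]],   # SW
    [[-3, -3, -3], [-3, 0, -3], [5, 5, 5]],   # S
    [[-3, -3, -3], [-3, 0, 5], [-3, 5, 5]],   # SE
], dtype=float)

def ldp_histogram(img, k=3):
    """Per pixel, set the k strongest of the eight absolute edge responses
    to 1, read the 8 bits as an integer code, and return the normalized
    256-bin histogram of codes as the image descriptor."""
    responses = np.abs(np.stack([convolve(img.astype(float), m) for m in KIRSCH]))
    # rank of each direction's response at every pixel (0 = weakest)
    order = responses.argsort(axis=0).argsort(axis=0)
    bits = (order >= 8 - k).astype(np.uint8)          # top-k directions -> 1
    weights = (2 ** np.arange(8)).reshape(8, 1, 1)
    codes = (bits * weights).sum(axis=0)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
descriptor = ldp_histogram(rng.random((64, 64)))      # 256-D; can then be reduced by PCA
```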

Proceedings ArticleDOI
12 Nov 2010
TL;DR: The analysis clearly shows that PCA has the potential to perform feature selection and is able to select a number of important individual components from all the feature components, and the devised algorithm is not only consistent with the nature of PCA but also computationally efficient.
Abstract: Principal component analysis (PCA) has been widely applied in the area of computer science. It is well known that PCA is a popular transform method and the transform result is not directly related to a sole feature component of the original sample. However, in this paper, we try to apply principal components analysis (PCA) to feature selection. The proposed method well addresses the feature selection issue from a viewpoint of numerical analysis. The analysis clearly shows that PCA has the potential to perform feature selection and is able to select a number of important individual components from all the feature components. Our method assumes that different feature components of original samples have different effects on the feature extraction result and exploits the eigenvectors of the covariance matrix of PCA to evaluate the significance of each feature component of the original sample. When evaluating the significance of the feature components, the proposed method takes a number of eigenvectors into account. Then it uses a reasonable scheme to perform feature selection. The devised algorithm is not only consistent with the nature of PCA but also computationally efficient. The experimental results on face recognition show that the proposed method greatly reduces the dimensionality of the original samples without decreasing recognition accuracy.
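The abstract does not pin down the exact scoring rule, so the sketch below shows one plausible reading: weight each feature's absolute loadings on several leading eigenvectors by the corresponding eigenvalues and keep the top-scoring components. Treat the weighting scheme as an assumption.

```python
import numpy as np

def pca_feature_scores(X, n_eigvecs=10):
    """Score each original feature component by how strongly it loads on
    the leading eigenvectors of the covariance matrix, weighting each
    eigenvector by its eigenvalue (an assumed, illustrative scheme)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1][:n_eigvecs]
    leading_vals, leading_vecs = evals[order], evecs[:, order]
    # significance of feature j: sum_k lambda_k * |v_kj|
    return np.abs(leading_vecs) @ leading_vals

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
scores = pca_feature_scores(X)
selected = np.argsort(scores)[::-1][:16]   # keep the 16 highest-scoring features
X_selected = X[:, selected]
```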

Book
20 Jul 2010
TL;DR: A tutorial overview of several geometric methods for dimension reduction by dividing the methods into projective methods and methods that model the manifold on which the data lies.
Abstract: We give a tutorial overview of several geometric methods for dimension reduction. We divide the methods into projective methods and methods that model the manifold on which the data lies. For projective methods, we review projection pursuit, principal component analysis (PCA), kernel PCA, probabilistic PCA, canonical correlation analysis, oriented PCA, and several techniques for sufficient dimension reduction. For the manifold methods, we review multidimensional scaling (MDS), landmark MDS, Isomap, locally linear embedding, and Laplacian eigenmaps.
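As a companion to the tutorial's taxonomy, the following scikit-learn snippet runs one projective method and several manifold methods on a toy swiss-roll dataset; the dataset and parameter choices are illustrative only.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import MDS, Isomap, LocallyLinearEmbedding, SpectralEmbedding

# projective methods (PCA, kernel PCA) versus manifold methods on a toy manifold
X, color = make_swiss_roll(n_samples=600, random_state=0)

embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "kernel PCA": KernelPCA(n_components=2, kernel="rbf", gamma=0.05).fit_transform(X),
    "MDS": MDS(n_components=2, n_init=1, random_state=0).fit_transform(X),
    "Isomap": Isomap(n_components=2, n_neighbors=10).fit_transform(X),
    "LLE": LocallyLinearEmbedding(n_components=2, n_neighbors=10).fit_transform(X),
    "Laplacian eigenmaps": SpectralEmbedding(n_components=2, n_neighbors=10).fit_transform(X),
}
for name, Y in embeddings.items():
    print(f"{name:20s} -> embedding of shape {Y.shape}")
```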

Proceedings Article
11 Jul 2010
TL;DR: This work proposes a novel spectral feature selection algorithm to handle feature redundancy, adopting an embedded model derived from a formulation based on a sparse multi-output regression with an L2,1-norm constraint.
Abstract: Spectral feature selection identifies relevant features by measuring their capability of preserving sample similarity. It provides a powerful framework for both supervised and unsupervised feature selection, and has been proven to be effective in many real-world applications. One common drawback associated with most existing spectral feature selection algorithms is that they evaluate features individually and cannot identify redundant features. Since redundant features can have significant adverse effect on learning performance, it is necessary to address this limitation for spectral feature selection. To this end, we propose a novel spectral feature selection algorithm to handle feature redundancy, adopting an embedded model. The algorithm is derived from a formulation based on a sparse multi-output regression with an L2,1-norm constraint. We conduct theoretical analysis on the properties of its optimal solutions, paving the way for designing an efficient path-following solver. Extensive experiments show that the proposed algorithm can do well in both selecting relevant features and removing redundancy.
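A minimal sketch of the embedded model described above: regress the spectral embedding of the sample-similarity graph onto the features under an L2,1 row-sparsity penalty (solved here by simple iteratively reweighted least squares rather than the paper's path-following solver), and rank features by the row norms of W. Parameter values are assumptions.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

def l21_regression(X, Y, lam=1.0, n_iter=50, eps=1e-8):
    """Solve min_W ||X W - Y||_F^2 + lam * ||W||_{2,1} by iteratively
    reweighted least squares; rows of W driven to (near) zero correspond
    to dropped features, so redundancy is handled jointly."""
    n, d = X.shape
    D = np.eye(d)
    for _ in range(n_iter):
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * row_norms + eps))
    return W

# use the spectral embedding of the sample-similarity graph as the regression target
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))
Y = SpectralEmbedding(n_components=5, n_neighbors=10).fit_transform(X)
W = l21_regression(X, Y, lam=5.0)
scores = np.linalg.norm(W, axis=1)          # row norm = feature relevance
selected = np.argsort(scores)[::-1][:10]
```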

Journal ArticleDOI
TL;DR: This paper proposes a semi-supervised dimensionality reduction method which preserves the global structure of unlabeled samples in addition to separating labeled samples in different classes from each other and shows the usefulness of SELF through experiments with benchmark and real-world document classification datasets.
Abstract: When only a small number of labeled samples are available, supervised dimensionality reduction methods tend to perform poorly because of overfitting. In such cases, unlabeled samples could be useful in improving the performance. In this paper, we propose a semi-supervised dimensionality reduction method which preserves the global structure of unlabeled samples in addition to separating labeled samples in different classes from each other. The proposed method, which we call SEmi-supervised Local Fisher discriminant analysis (SELF), has an analytic form of the globally optimal solution and it can be computed based on eigen-decomposition. We show the usefulness of SELF through experiments with benchmark and real-world document classification datasets.

Posted Content
TL;DR: It is shown that parameters of a Gaussian mixture distribution with fixed number of components can be learned using a sample whose size is polynomial in dimension and all other parameters.
Abstract: The question of polynomial learnability of probability distributions, particularly Gaussian mixture distributions, has recently received significant attention in theoretical computer science and machine learning. However, despite major progress, the general question of polynomial learnability of Gaussian mixture distributions still remained open. The current work resolves the question of polynomial learnability for Gaussian mixtures in high dimension with an arbitrary fixed number of components. The result on learning Gaussian mixtures relies on an analysis of distributions belonging to what we call "polynomial families" in low dimension. These families are characterized by their moments being polynomial in parameters and include almost all common probability distributions as well as their mixtures and products. Using tools from real algebraic geometry, we show that parameters of any distribution belonging to such a family can be learned in polynomial time and using a polynomial number of sample points. The result on learning polynomial families is quite general and is of independent interest. To estimate parameters of a Gaussian mixture distribution in high dimensions, we provide a deterministic algorithm for dimensionality reduction. This allows us to reduce learning a high-dimensional mixture to a polynomial number of parameter estimations in low dimension. Combining this reduction with the results on polynomial families yields our result on learning arbitrary Gaussian mixtures in high dimensions.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed approach is able to select feature subsets with better classification performance than the SFFS method, and the integrated feature pre-selection mechanism helps to alleviate over-fitting and reduces the chance of converging to a locally optimal solution.

Journal ArticleDOI
TL;DR: This work proposes algorithms for feature extraction and classification based on orthogonal or nonnegative tensor (multi-array) decompositions, and higher order (multilinear) discriminant analysis (HODA), whereby input data are considered as tensors instead of more conventional vector or matrix representations.
Abstract: Feature extraction and selection are key factors in model reduction, classification and pattern recognition problems. This is especially important for input data with large dimensions such as brain recording or multiview images, where appropriate feature extraction is a prerequisite to classification. To ensure that the reduced dataset contains maximum information about input data we propose algorithms for feature extraction and classification. This is achieved based on orthogonal or nonnegative tensor (multi-array) decompositions, and higher order (multilinear) discriminant analysis (HODA), whereby input data are considered as tensors instead of more conventional vector or matrix representations. The developed algorithms are verified on benchmark datasets, using constraints imposed on tensors and/or factor matrices such as orthogonality and nonnegativity.
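As a simple stand-in for the tensor-based feature extraction described above, the sketch below computes a truncated higher-order SVD with plain NumPy: one orthogonal factor per data mode and a small projected core flattened into a feature vector per sample. It is not the authors' HODA algorithm, imposes no nonnegativity constraint, and uses illustrative ranks and data sizes.

```python
import numpy as np

def unfold(T, mode):
    """Matricize tensor T along the given mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def multilinear_features(T, ranks):
    """Estimate one orthogonal factor per data mode (all modes except the
    first, which indexes samples) from the leading left singular vectors
    of the mode unfoldings, then project the tensor onto them and flatten
    the compressed core into one feature vector per sample."""
    modes = range(1, T.ndim)
    factors = []
    for mode, r in zip(modes, ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in zip(modes, factors):
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core.reshape(T.shape[0], -1), factors

rng = np.random.default_rng(0)
T = rng.normal(size=(20, 32, 64))            # 20 trials x 32 channels x 64 time points
features, factors = multilinear_features(T, ranks=(8, 10))
print(features.shape)                        # (20, 80): one 80-D feature vector per trial
```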

Journal ArticleDOI
TL;DR: Thorough empirical studies based on the USC scene dataset demonstrate that the proposed framework improves classification rates by about 100% (relative) and training speed by a factor of 60 across different sites, compared with the gist descriptor proposed by Siagian and Itti in 2007.
Abstract: Biologically inspired feature (BIF) and its variations have been demonstrated to be effective and efficient for scene classification. It is unreasonable to measure the dissimilarity between two BIFs based on their Euclidean distance. This is because BIFs are extrinsically very high dimensional and intrinsically low dimensional, i.e., BIFs are sampled from a low-dimensional manifold and embedded in a high-dimensional space. Therefore, it is essential to find the intrinsic structure of a set of BIFs, obtain a suitable mapping to implement the dimensionality reduction, and measure the dissimilarity between two BIFs in the low-dimensional space based on their Euclidean distance. In this paper, we study the manifold constructed by a set of BIFs utilized for scene classification, form a new dimensionality reduction algorithm that preserves both the intra-BIF geometry and the inter-BIF discriminative information, termed Discriminative and Geometry Preserving Projections (DGPP), and construct a new framework for scene classification. In this framework, we represent an image based on a new BIF, which combines the intensity channel, the color channel, and the C1 unit of a color image; then we project the high-dimensional BIF to a low-dimensional space based on DGPP; and, finally, we conduct the classification based on the multiclass support vector machine (SVM). Thorough empirical studies based on the USC scene dataset demonstrate that the proposed framework improves classification rates by about 100% (relative) and training speed by a factor of 60 across different sites, compared with the gist descriptor proposed by Siagian and Itti in 2007.

Journal ArticleDOI
TL;DR: The proposed method includes extraction of a surrogate model that mimics key characteristics of a full process model, followed by testing and implementation of a pragmatic uncertainty analysis technique, called null‐space Monte Carlo (NSMC), that merges the strengths of gradient‐based search and parameter dimensionality reduction.
Abstract: Highly parameterized and CPU-intensive groundwater models are increasingly being used to understand and predict flow and transport through aquifers. Despite their frequent use, these models pose significant challenges for parameter estimation and predictive uncertainty analysis algorithms, particularly global methods which usually require very large numbers of forward runs. Here we present a general methodology for parameter estimation and uncertainty analysis that can be utilized in these situations. Our proposed method includes extraction of a surrogate model that mimics key characteristics of a full process model, followed by testing and implementation of a pragmatic uncertainty analysis technique, called null-space Monte Carlo (NSMC), that merges the strengths of gradient-based search and parameter dimensionality reduction. As part of the surrogate model analysis, the results of NSMC are compared with a formal Bayesian approach using the DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm. Such a comparison has never been accomplished before, especially in the context of high parameter dimensionality. Despite the highly nonlinear nature of the inverse problem, the existence of multiple local minima, and the relatively large parameter dimensionality, both methods performed well and results compare favorably with each other. Experiences gained from the surrogate model analysis are then transferred to calibrate the full highly parameterized and CPU intensive groundwater model and to explore predictive uncertainty of predictions made by that model. The methodology presented here is generally applicable to any highly parameterized and CPU-intensive environmental model, where efficient methods such as NSMC provide the only practical means for conducting predictive uncertainty analysis.

Journal ArticleDOI
29 Apr 2010
TL;DR: In this article, the authors compare and contrast from a geometric perspective a number of low-dimensional signal models that support stable information-preserving dimensionality reduction, including sparse and compressible signal models for deterministic and random signals.
Abstract: We compare and contrast from a geometric perspective a number of low-dimensional signal models that support stable information-preserving dimensionality reduction. We consider sparse and compressible signal models for deterministic and random signals, structured sparse and compressible signal models, point clouds, and manifold signal models. Each model has a particular geometrical structure that enables signal information to be stably preserved via a simple linear and nonadaptive projection to a much lower dimensional space; in each case the projection dimension is independent of the signal's ambient dimension at best or grows logarithmically with it at worst. As a bonus, we point out a common misconception related to probabilistic compressible signal models, namely, by showing that the oft-used generalized Gaussian and Laplacian models do not support stable linear dimensionality reduction.
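A tiny experiment illustrating the kind of stable, nonadaptive linear dimensionality reduction discussed above: a random Gaussian projection approximately preserves the pairwise distances of a point cloud even when the projection dimension is far below the ambient dimension. The sizes and helper names are arbitrary.

```python
import numpy as np

def random_projection(X, m, seed=0):
    """Project onto m random Gaussian directions scaled by 1/sqrt(m); for a
    point cloud, pairwise distances are approximately preserved when m grows
    only logarithmically with the number of points."""
    rng = np.random.default_rng(seed)
    Phi = rng.normal(size=(m, X.shape[1])) / np.sqrt(m)
    return X @ Phi.T

def pairwise_distances(A):
    """Dense Euclidean distance matrix of the rows of A."""
    sq = (A ** 2).sum(1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * A @ A.T, 0))

# measure distortion of pairwise distances for a small point cloud
rng = np.random.default_rng(1)
X = random_projection.__defaults__ and rng.normal(size=(200, 10000))  # ambient dimension 10,000
Y = random_projection(X, m=300)
dX, dY = pairwise_distances(X), pairwise_distances(Y)
mask = ~np.eye(len(X), dtype=bool)
ratios = dY[mask] / dX[mask]
print(f"distance ratios stay in [{ratios.min():.2f}, {ratios.max():.2f}]")
```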

Journal ArticleDOI
TL;DR: Simulations for a space-time interference suppression application with a direct-sequence code-division multiple-access (DS-CDMA) system show that the proposed scheme outperforms the state-of-the-art reduced-rank schemes in convergence and tracking at a comparable complexity.
Abstract: This paper presents novel adaptive space-time reduced-rank interference-suppression least squares (LS) algorithms based on a joint iterative optimization of parameter vectors. The proposed space-time reduced-rank scheme consists of a joint iterative optimization of a projection matrix that performs dimensionality reduction and an adaptive reduced-rank parameter vector that yields the symbol estimates. The proposed techniques do not require singular value decomposition (SVD) and automatically find the best set of basis for reduced-rank processing. We present LS expressions for the design of the projection matrix and the reduced-rank parameter vector, and we conduct an analysis of the convergence properties of the LS algorithms. We then develop recursive LS (RLS) adaptive algorithms for their computationally efficient estimation and an algorithm that automatically adjusts the rank of the proposed scheme. A convexity analysis of the LS algorithms is carried out along with the development of a proof of convergence for the proposed algorithms. Simulations for a space-time interference suppression application with a direct-sequence code-division multiple-access (DS-CDMA) system show that the proposed scheme outperforms the state-of-the-art reduced-rank schemes in convergence and tracking at a comparable complexity.

Journal ArticleDOI
TL;DR: A novel approach for LR face recognition without any SR preprocessing based on coupled mappings, which significantly improves the recognition performance and is inspired by locality preserving methods for dimensionality reduction.
Abstract: Practical face recognition systems are sometimes confronted with low-resolution face images. Traditional two-step methods solve this problem through employing super-resolution (SR). However, these methods usually have limited performance because the target of SR is not absolutely consistent with that of face recognition. Moreover, time-consuming sophisticated SR algorithms are not suitable for real-time applications. To avoid these limitations, we propose a novel approach for LR face recognition without any SR preprocessing. Our method, based on coupled mappings (CMs), projects the face images with different resolutions into a unified feature space which favors the task of classification. These CMs are learned through optimizing the objective function to minimize the difference between the correspondences (i.e., a low-resolution image and its high-resolution counterpart). Inspired by locality preserving methods for dimensionality reduction, we introduce a penalty weighting matrix into our objective function. Our method significantly improves the recognition performance. Finally, we conduct experiments on publicly available databases to verify the efficacy of our algorithm.

Journal ArticleDOI
TL;DR: This work develops sparse versions of two recently proposed PLS-based classification methods using sparse partial least squares (SPLS), and shows that incorporation of SPLS into a generalized linear model (GLM) framework provides higher sensitivity in variable selection for multicategory classification with unbalanced sample sizes between classes.
Abstract: Partial least squares (PLS) is a well known dimension reduction method which has been recently adapted for high dimensional classification problems in genome biology. We develop sparse versions of two recently proposed PLS-based classification methods using sparse partial least squares (SPLS). These sparse versions aim to achieve variable selection and dimension reduction simultaneously. We consider both binary and multicategory classification. We provide analytical and simulation-based insights about the variable selection properties of these approaches and benchmark them on well known publicly available datasets that involve tumor classification with high dimensional gene expression data. We show that incorporation of SPLS into a generalized linear model (GLM) framework provides higher sensitivity in variable selection for multicategory classification with unbalanced sample sizes between classes. As the sample size increases, the two-stage approach provides comparable sensitivity with better specificity in variable selection. In binary classification and multicategory classification with balanced sample sizes, the two-stage approach provides comparable variable selection and prediction accuracy as the GLM version and is computationally more efficient.

Journal ArticleDOI
TL;DR: The modeling framework can be viewed as a combination of dimensionality reduction and graphical modeling (to capture remaining statistical structure not attributable to the latent variables) and it consistently estimates both the number of latent components and the conditional graphical model structure among the observed variables.
Abstract: Suppose we observe samples of a subset of a collection of random variables. No additional information is provided about the number of latent variables, nor of the relationship between the latent and observed variables. Is it possible to discover the number of latent components, and to learn a statistical model over the entire collection of variables? We address this question in the setting in which the latent and observed variables are jointly Gaussian, with the conditional statistics of the observed variables conditioned on the latent variables being specified by a graphical model. As a first step we give natural conditions under which such latent-variable Gaussian graphical models are identifiable given marginal statistics of only the observed variables. Essentially these conditions require that the conditional graphical model among the observed variables is sparse, while the effect of the latent variables is "spread out" over most of the observed variables. Next we propose a tractable convex program based on regularized maximum-likelihood for model selection in this latent-variable setting; the regularizer uses both the $\ell_1$ norm and the nuclear norm. Our modeling framework can be viewed as a combination of dimensionality reduction (to identify latent variables) and graphical modeling (to capture remaining statistical structure not attributable to the latent variables), and it consistently estimates both the number of latent components and the conditional graphical model structure among the observed variables. These results are applicable in the high-dimensional setting in which the number of latent/observed variables grows with the number of samples of the observed variables. The geometric properties of the algebraic varieties of sparse matrices and of low-rank matrices play an important role in our analysis.
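A hedged CVXPY sketch of the regularized maximum-likelihood program described above: the precision matrix of the observed variables is split into a sparse part S (the conditional graphical model) minus a positive semidefinite low-rank part L (the effect of the latent variables), with an l1 penalty on S and a trace penalty on L. The penalty weights and toy data are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

def latent_variable_ggm(Sigma, lam=0.1, gamma=2.0):
    """Estimate (S, L) with marginal precision K = S - L by minimizing the
    Gaussian negative log-likelihood plus lam * (gamma * ||S||_1 + tr(L));
    lam and gamma are illustrative, not tuned."""
    p = Sigma.shape[0]
    S = cp.Variable((p, p), symmetric=True)
    L = cp.Variable((p, p), PSD=True)
    K = S - L                                   # candidate marginal precision matrix
    objective = cp.Minimize(-cp.log_det(K) + cp.trace(Sigma @ K)
                            + lam * (gamma * cp.sum(cp.abs(S)) + cp.trace(L)))
    prob = cp.Problem(objective)
    prob.solve()
    return S.value, L.value

# toy example: 15 observed variables driven partly by 2 hidden factors
rng = np.random.default_rng(0)
B = rng.normal(size=(15, 2))
cov = np.eye(15) + B @ B.T
data = rng.multivariate_normal(np.zeros(15), cov, size=500)
Sigma_hat = np.cov(data, rowvar=False)
S_hat, L_hat = latent_variable_ggm(Sigma_hat)
est_num_latent = np.sum(np.linalg.eigvalsh(L_hat) > 1e-3)   # rank of L ~ number of latent variables
```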