
Showing papers on "Dimensionality reduction published in 2011"


Journal ArticleDOI
TL;DR: This work proposes a novel dimensionality reduction framework for reducing the distance between domains in a latent space for domain adaptation and proposes both unsupervised and semisupervised feature extraction approaches, which can dramatically reduce the distance between domain distributions by projecting data onto the learned transfer components.
Abstract: Domain adaptation allows knowledge from a source domain to be transferred to a different but related target domain. Intuitively, discovering a good feature representation across domains is crucial. In this paper, we first propose to find such a representation through a new learning method, transfer component analysis (TCA), for domain adaptation. TCA tries to learn some transfer components across domains in a reproducing kernel Hilbert space using the maximum mean discrepancy. In the subspace spanned by these transfer components, data properties are preserved and data distributions in different domains are close to each other. As a result, with the new representations in this subspace, we can apply standard machine learning methods to train classifiers or regression models in the source domain for use in the target domain. Furthermore, in order to uncover the knowledge hidden in the relations between the data labels from the source and target domains, we extend TCA in a semisupervised learning setting, which encodes label information into transfer components learning. We call this extension semisupervised TCA. The main contribution of our work is that we propose a novel dimensionality reduction framework for reducing the distance between domains in a latent space for domain adaptation. We propose both unsupervised and semisupervised feature extraction approaches, which can dramatically reduce the distance between domain distributions by projecting data onto the learned transfer components. Finally, our approach can handle large datasets and naturally leads to out-of-sample generalization. The effectiveness and efficiency of our approach are verified by experiments on five toy datasets and two real-world applications: cross-domain indoor WiFi localization and cross-domain text classification.
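
The central quantity here is the maximum mean discrepancy. As a minimal illustration of what TCA minimizes (a sketch with a Gaussian kernel; the function names and data are ours, not the paper's), the biased empirical MMD between two samples can be computed in a few lines:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and B."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def mmd2(Xs, Xt, gamma=1.0):
    """Biased empirical squared MMD between source and target samples."""
    return (rbf_kernel(Xs, Xs, gamma).mean()
            + rbf_kernel(Xt, Xt, gamma).mean()
            - 2 * rbf_kernel(Xs, Xt, gamma).mean())

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, (100, 5))   # source domain sample
Xt = rng.normal(0.5, 1.0, (120, 5))   # shifted target domain sample
print(mmd2(Xs, Xt))                   # clearly positive: distributions differ
```

TCA then searches for a projection of the combined data under which this discrepancy is small while data variance is preserved.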

3,195 citations


Journal Article
TL;DR: MULAN is a Java library for learning from multi-label data that offers a variety of classification, ranking, thresholding and dimensionality reduction algorithms, as well as algorithms for learning from hierarchically structured labels.
Abstract: MULAN is a Java library for learning from multi-label data. It offers a variety of classification, ranking, thresholding and dimensionality reduction algorithms, as well as algorithms for learning from hierarchically structured labels. In addition, it contains an evaluation framework that calculates a rich variety of performance measures.

709 citations


Journal Article
TL;DR: Methods for learning dictionaries that are appropriate for the representation of given classes of signals and multisensor data are described and dimensionality reduction based on dictionary representation can be extended to address specific tasks such as data analysis or classification.
Abstract: We describe methods for learning dictionaries that are appropriate for the representation of given classes of signals and multisensor data. We further show that dimensionality reduction based on dictionary representation can be extended to address specific tasks such as data analysis or classification when the learning includes a class separability criterion in the objective function. The benefits of dictionary learning clearly show that a proper understanding of causes underlying the sensed world is key to task-specific representation of relevant information in high-dimensional data sets.
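
A hedged illustration of the general recipe (learn a dictionary, then use the sparse codes as a reduced representation), using scikit-learn's DictionaryLearning as a stand-in for the authors' method; the data and parameters below are arbitrary:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))          # 200 signals of dimension 64

# Learn 32 atoms with sparse codes; the codes serve as a reduced,
# task-oriented representation of the signals.
dl = DictionaryLearning(n_components=32, alpha=1.0, max_iter=100,
                        transform_algorithm="lasso_lars", random_state=0)
codes = dl.fit_transform(X)             # (200, 32) sparse coefficients
D = dl.components_                      # (32, 64) dictionary atoms
print(codes.shape, D.shape)
```

Adding a class-separability term to the objective, as the paper does, requires a custom solver and is not shown here.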

705 citations


Journal ArticleDOI
TL;DR: A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework and has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets.
Abstract: Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. A simple extension of a sparse PLS exploratory approach, sparse PLS discriminant analysis (sPLS-DA), is proposed to perform variable selection in a multiclass classification framework. sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.
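
The core mechanism of sparse PLS is a penalized SVD: soft-thresholding the loading vector of M = X^T Y so that only a few variables get nonzero weight. The sketch below is a one-component illustration of that mechanism (our simplification; the mixOmics implementation is in R and additionally handles deflation, multiple components, and tuning):

```python
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def spls_component(X, Y, lam, n_iter=50):
    """First sparse PLS loading via a penalized SVD of M = X^T Y."""
    M = X.T @ Y
    u = np.linalg.svd(M, full_matrices=False)[0][:, 0]   # warm start
    for _ in range(n_iter):
        v = M.T @ u
        v /= np.linalg.norm(v)
        u = soft_threshold(M @ v, lam)
        u /= max(np.linalg.norm(u), 1e-12)
    return u      # nonzero entries are the selected variables

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))              # 50 samples, 200 "genes"
y = rng.integers(0, 3, 50)
Y = np.eye(3)[y]                            # dummy-coded classes
X[:, :5] += 4.0 * Y[:, [0]]                 # variables 0-4 mark class 0
u = spls_component(X - X.mean(0), Y - Y.mean(0), lam=15.0)
print(np.flatnonzero(u))                    # expected: (mostly) 0..4
```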

672 citations


Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper proposes to construct the dictionary by using both observed and unobserved, hidden data, and shows that the effects of the hidden data can be approximately recovered by solving a nuclear norm minimization problem, which is convex and can be solved efficiently.
Abstract: Low-Rank Representation (LRR) [16, 17] is an effective method for exploring the multiple subspace structures of data. Usually, the observed data matrix itself is chosen as the dictionary, which is a key aspect of LRR. However, such a strategy may degrade performance, especially when the observations are insufficient and/or grossly corrupted. In this paper we therefore propose to construct the dictionary by using both observed and unobserved, hidden data. We show that the effects of the hidden data can be approximately recovered by solving a nuclear norm minimization problem, which is convex and can be solved efficiently. The formulation of the proposed method, called Latent Low-Rank Representation (LatLRR), seamlessly integrates subspace segmentation and feature extraction into a unified framework, and thus provides us with a solution for both subspace segmentation and feature extraction. As a subspace segmentation algorithm, LatLRR is an enhanced version of LRR and outperforms the state-of-the-art algorithms. Being an unsupervised feature extraction algorithm, LatLRR is able to robustly extract salient features from corrupted data, and thus can work much better than the benchmark that utilizes the original data vectors as features for classification. Compared to dimension reduction based methods, LatLRR is more robust to noise.
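
The workhorse of nuclear norm minimization is the singular value thresholding (SVT) operator, the proximal map of the nuclear norm. A minimal sketch (solvers like the one in the paper wrap this operator in an augmented-Lagrangian loop; the example data are ours):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: returns the minimizer of
    0.5 * ||X - M||_F^2 + tau * ||X||_* by soft-thresholding the
    singular values of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

rng = np.random.default_rng(0)
L = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 80))  # rank-3 signal
M = L + 0.1 * rng.normal(size=(50, 80))                  # noisy observation
X = svt(M, tau=2.0)
print(np.linalg.matrix_rank(X))   # small: thresholding removes noise directions
```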

656 citations


Proceedings ArticleDOI
16 Jul 2011
TL;DR: In this paper, a joint framework for unsupervised feature selection is proposed to select the most discriminative feature subset from the whole feature set in batch mode, where the class label of input data can be predicted by a linear classifier.
Abstract: Compared with supervised learning for feature selection, it is much more difficult to select the discriminative features in unsupervised learning due to the lack of label information. Traditional unsupervised feature selection algorithms usually select the features which best preserve the data distribution, e.g., manifold structure, of the whole feature set. Under the assumption that the class label of input data can be predicted by a linear classifier, we incorporate discriminative analysis and l2,1-norm minimization into a joint framework for unsupervised feature selection. Different from existing unsupervised feature selection algorithms, our algorithm selects the most discriminative feature subset from the whole feature set in batch mode. Extensive experiments on different data types demonstrate the effectiveness of our algorithm.
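
The row-sparsity mechanism can be shown in isolation: penalizing the l2,1-norm of a projection matrix W (the sum of its row norms) drives entire rows to zero, and the surviving row norms rank the features. The sketch below solves a supervised least-squares surrogate with an iteratively reweighted solver, a standard device for this penalty; the paper's actual objective is unsupervised and more elaborate:

```python
import numpy as np

def l21_regression(X, Y, gamma, n_iter=100):
    """min_W ||XW - Y||_F^2 + gamma * ||W||_{2,1} via iteratively
    reweighted least squares; ||W||_{2,1} is the sum of row norms."""
    d = X.shape[1]
    D = np.eye(d)
    for _ in range(n_iter):
        W = np.linalg.solve(X.T @ X + gamma * D, X.T @ Y)
        D = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + 1e-8))
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))
y = (X[:, :3].sum(axis=1) > 0).astype(int)   # only features 0-2 matter
Y = np.eye(2)[y]
W = l21_regression(X, Y, gamma=5.0)
scores = np.linalg.norm(W, axis=1)           # row norm = feature relevance
print(np.argsort(scores)[::-1][:5])          # features 0-2 rank on top
```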

613 citations


Journal ArticleDOI
TL;DR: This work proposes sparse discriminant analysis, a method for performing linear discriminant analysis with a sparseness criterion imposed such that classification and feature selection are performed simultaneously in the high-dimensional setting.
Abstract: We consider the problem of performing interpretable classification in the high-dimensional setting, in which the number of features is very large and the number of observations is limited. This setting has been studied extensively in the chemometrics literature, and more recently has become commonplace in biological and medical applications. In this setting, a traditional approach involves performing feature selection before classification. We propose sparse discriminant analysis, a method for performing linear discriminant analysis with a sparseness criterion imposed such that classification and feature selection are performed simultaneously. Sparse discriminant analysis is based on the optimal scoring interpretation of linear discriminant analysis, and can be extended to perform sparse discrimination via mixtures of Gaussians if boundaries between classes are nonlinear or if subgroups are present within each class. Our proposal also provides low-dimensional views of the discriminative directions.
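
The optimal-scoring view alternates a sparse regression step with a score update. Below is a bare-bones, one-direction sketch with an l1 penalty via scikit-learn's Lasso (our simplification; the authors use an elastic-net penalty and handle multiple, mutually orthogonal components):

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_optimal_scoring(X, Y, alpha=0.05, n_iter=20, seed=0):
    """One sparse discriminative direction via the optimal-scoring
    view of LDA: alternate a sparse regression step with an update
    of the class score vector theta."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=Y.shape[1])
    theta /= np.linalg.norm(theta)
    lasso = Lasso(alpha=alpha, fit_intercept=False)
    for _ in range(n_iter):
        beta = lasso.fit(X, Y @ theta).coef_      # sparse regression step
        s = Y.T @ (X @ beta)                      # score-vector update
        theta = s / np.linalg.norm(s)
    return beta

rng = np.random.default_rng(1)
n, p = 60, 300                                    # p >> n, as in the paper
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, p))
X[:, :4] += 2.0 * (y == 1)[:, None]               # 4 informative features
Y = np.eye(2)[y]
beta = sparse_optimal_scoring(X - X.mean(0), Y)
print(np.flatnonzero(beta))                       # mostly features 0-3
```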

565 citations


Journal ArticleDOI
TL;DR: A set of building blocks for constructing descriptors which can be combined together and jointly optimized so as to minimize the error of a nearest-neighbor classifier are described.
Abstract: In this paper, we explore methods for learning local image descriptors from training data. We describe a set of building blocks for constructing descriptors which can be combined together and jointly optimized so as to minimize the error of a nearest-neighbor classifier. We consider both linear and nonlinear transforms with dimensionality reduction, and make use of discriminant learning techniques such as Linear Discriminant Analysis (LDA) and Powell minimization to solve for the parameters. Using these techniques, we obtain descriptors that exceed state-of-the-art performance with low dimensionality. In addition to new experiments and recommendations for descriptor learning, we are also making available a new and realistic ground truth data set based on multiview stereo data.

520 citations


Proceedings ArticleDOI
27 Aug 2011
TL;DR: In this paper, a new language identification system is presented based on the total variability approach previously developed in the field of speaker identification and various techniques are employed to extract the most salient features in the lower dimensional i-vector space.
Abstract: In this paper, a new language identification system is presented based on the total variability approach previously developed in the field of speaker identification. Various techniques are employed to extract the most salient features in the lower dimensional i-vector space and the system developed results in excellent performance on the 2009 LRE evaluation set without the need for any post-processing or backend techniques. Additional performance gains are observed when the system is combined with other acoustic systems.

438 citations


Journal ArticleDOI
Harun Uğuz
TL;DR: Two-stage feature selection and feature extraction is used to improve the performance of text categorization and the proposed model is able to achieve high categorization effectiveness as measured by precision, recall and F-measure.
Abstract: Text categorization is widely used when organizing documents in a digital form. Due to the increasing number of documents in digital form, automated text categorization has become more promising in the last ten years. A major problem of text categorization is its large number of features. Most of those are irrelevant noise that can mislead the classifier. Therefore, feature selection is often used in text categorization to reduce the dimensionality of the feature space and to improve performance. In this study, two-stage feature selection and feature extraction is used to improve the performance of text categorization. In the first stage, each term within the documents is ranked by its importance for classification using the information gain (IG) method. In the second stage, genetic algorithm (GA) and principal component analysis (PCA) feature selection and feature extraction methods are applied separately to the terms which are ranked in decreasing order of importance, and a dimension reduction is carried out. Thereby, during text categorization, terms of less importance are ignored, and feature selection and extraction methods are applied to the terms of highest importance; thus, the computational time and complexity of categorization are reduced. To evaluate the effectiveness of dimension reduction methods on our proposed model, experiments are conducted using the k-nearest neighbour (KNN) and C4.5 decision tree algorithms on the Reuters-21578 and Classic3 dataset collections for text categorization. The experimental results show that the proposed model is able to achieve high categorization effectiveness as measured by precision, recall and F-measure.
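
A hedged sketch of the same two-stage pipeline on toy data, with scikit-learn's mutual information ranking standing in for information gain and a PCA-plus-KNN second stage; data, split, and parameters are arbitrary:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(300, 500)).astype(float)   # toy term counts
y = rng.integers(0, 3, 300)
X[y == 1, :20] += 3                                    # class-1 marker terms
X[y == 2, 20:40] += 3                                  # class-2 marker terms

# Stage 1: rank terms by informativeness and keep the best 100.
mi = mutual_info_classif(X, y, random_state=0)
top = np.argsort(mi)[::-1][:100]

# Stage 2: PCA on the surviving terms, then a KNN classifier.
clf = Pipeline([("pca", PCA(n_components=20)),
                ("knn", KNeighborsClassifier(n_neighbors=5))])
clf.fit(X[:200][:, top], y[:200])
print("test accuracy:", clf.score(X[200:][:, top], y[200:]))
```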

431 citations


Journal ArticleDOI
TL;DR: The central issues of MSL are discussed, including establishing the foundations of the field via multilinear projections, formulating a unifying MSL framework for systematic treatment of the problem, and examining the algorithmic aspects of typical MSL solutions.

Journal ArticleDOI
TL;DR: In this article, an improved analysis of a structured dimension reduction map called the subsampled randomized Hadamard transform is presented, and the new proof is much simpler than previous approaches, and it offers optimal constants in the estimate on the number of dimensions required for the embedding.
Abstract: This paper presents an improved analysis of a structured dimension-reduction map called the subsampled randomized Hadamard transform. This argument demonstrates that the map preserves the Euclidean geometry of an entire subspace of vectors. The new proof is much simpler than previous approaches, and it offers — for the first time — optimal constants in the estimate on the number of dimensions required for the embedding.
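
The transform itself is easy to state: flip signs randomly (D), mix with a normalized Hadamard matrix (H), then subsample and rescale k coordinates. A direct, dense (non-fast) sketch:

```python
import numpy as np
from scipy.linalg import hadamard

def srht(X, k, rng):
    """Subsampled randomized Hadamard transform of the rows of X.
    The number of columns n must be a power of two; returns an
    (m, k) sketch that approximately preserves Euclidean geometry."""
    m, n = X.shape
    D = rng.choice([-1.0, 1.0], size=n)          # random sign flips
    H = hadamard(n) / np.sqrt(n)                 # orthonormal Hadamard
    idx = rng.choice(n, size=k, replace=False)   # uniform coordinate sample
    return np.sqrt(n / k) * (X * D) @ H[:, idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1024))
Y = srht(X, k=256, rng=rng)
print(np.linalg.norm(X, axis=1)[:3])   # norms before ...
print(np.linalg.norm(Y, axis=1)[:3])   # ... and after: small distortion
```

A practical implementation would replace the dense Hadamard matrix with an O(n log n) fast Walsh-Hadamard transform.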

Proceedings ArticleDOI
Jorge Sanchez, Florent Perronnin
20 Jun 2011
TL;DR: This work reports results on two large databases — ImageNet and a dataset of 1M Flickr images — showing that it can reduce the storage of the authors' signatures by a factor 64 to 128 with little loss in accuracy and integrating the decompression in the classifier learning yields an efficient and scalable training algorithm.
Abstract: We address image classification on a large-scale, i.e. when a large number of images and classes are involved. First, we study classification accuracy as a function of the image signature dimensionality and the training set size. We show experimentally that the larger the training set, the higher the impact of the dimensionality on the accuracy. In other words, high-dimensional signatures are important to obtain state-of-the-art results on large datasets. Second, we tackle the problem of data compression on very large signatures (on the order of 10^5 dimensions) using two lossy compression strategies: a dimensionality reduction technique known as the hash kernel and an encoding technique based on product quantizers. We explain how the gain in storage can be traded against a loss in accuracy and/or an increase in CPU cost. We report results on two large databases — ImageNet and a dataset of 1M Flickr images — showing that we can reduce the storage of our signatures by a factor 64 to 128 with little loss in accuracy. Integrating the decompression in the classifier learning yields an efficient and scalable training algorithm. On ILSVRC2010 we report a 74.3% accuracy at top-5, which corresponds to a 2.5% absolute improvement with respect to the state-of-the-art. On a subset of 10K classes of ImageNet we report a top-1 accuracy of 16.7%, a relative improvement of 160% with respect to the state-of-the-art.
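
The "hash kernel" ingredient is the signed hashing trick: each coordinate is assigned a random bucket and a random sign, and coordinates are summed within buckets. A minimal sketch (dimensions are arbitrary; the product-quantization strategy the paper pairs with it is not shown):

```python
import numpy as np

def hash_kernel(X, k, seed=0):
    """Signed hashing trick: map the n-dimensional rows of X down to
    k buckets; norms and inner products are preserved in expectation."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    bucket = rng.integers(0, k, size=n)       # h: coordinate -> bucket
    sign = rng.choice([-1.0, 1.0], size=n)    # xi: coordinate -> sign
    out = np.zeros((k, X.shape[0]))
    np.add.at(out, bucket, (X * sign).T)      # accumulate within buckets
    return out.T

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 100_000))             # ~1e5-dim signatures
Xh = hash_kernel(X, k=1024)
print(np.linalg.norm(X[0]), np.linalg.norm(Xh[0]))   # approximately equal
```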

Journal ArticleDOI
TL;DR: It is shown that under a mild condition which tends to hold for high-dimensional data, CCA in the multilabel case can be formulated as a least-squares problem, and several CCA extensions are proposed, including the sparse CCA formulation based on l1-norm regularization.
Abstract: Canonical Correlation Analysis (CCA) is a well-known technique for finding the correlations between two sets of multidimensional variables. It projects both sets of variables onto a lower-dimensional space in which they are maximally correlated. CCA is commonly applied for supervised dimensionality reduction in which the two sets of variables are derived from the data and the class labels, respectively. It is well-known that CCA can be formulated as a least-squares problem in the binary class case. However, the extension to the more general setting remains unclear. In this paper, we show that under a mild condition which tends to hold for high-dimensional data, CCA in the multilabel case can be formulated as a least-squares problem. Based on this equivalence relationship, efficient algorithms for solving least-squares problems can be applied to scale CCA to very large data sets. In addition, we propose several CCA extensions, including the sparse CCA formulation based on l1-norm regularization. We further extend the least-squares formulation to partial least squares. In addition, we show that the CCA projection for one set of variables is independent of the regularization on the other set of multidimensional variables, providing new insights on the effect of regularization on CCA. We have conducted experiments using benchmark data sets. Experiments on multilabel data sets confirm the established equivalence relationships. Results also demonstrate the effectiveness and efficiency of the proposed CCA extensions.
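
A small illustration of the setting (not a proof of the equivalence, which requires the paper's rank condition): CCA between high-dimensional data and a label indicator matrix, alongside the plain least-squares computation that the paper shows can replace it at scale:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, d, c = 60, 200, 3                        # high-dimensional: d >> n
y = rng.integers(0, c, n)
X = rng.normal(size=(n, d)) + 2.0 * np.eye(c)[y] @ rng.normal(size=(c, d))
Y = np.eye(c)[y]                            # label indicator matrix

# CCA between data and labels acts as supervised dimensionality reduction.
cca = CCA(n_components=c - 1)
Xc, Yc = cca.fit_transform(X, Y)
print(Xc.shape)                             # (60, 2) discriminative scores

# The scalable alternative discussed above: an ordinary least-squares fit
# against the centered indicator matrix, solvable with standard solvers.
W = np.linalg.lstsq(X - X.mean(0), Y - Y.mean(0), rcond=None)[0]
print(((X - X.mean(0)) @ W).shape)          # least-squares scores
```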

Journal ArticleDOI
TL;DR: A new human face recognition algorithm based on bidirectional two-dimensional principal component analysis (B2DPCA) and the extreme learning machine (ELM), in which the subband exhibiting the maximum standard deviation is dimensionally reduced using an improved dimensionality reduction technique.

Journal ArticleDOI
TL;DR: This work proposes "supervised principal component analysis (supervised PCA)", a generalization of PCA that is uniquely effective for regression and classification problems with high-dimensional input data and shows significant improvement over other supervised approaches both in accuracy and computational efficiency.

Proceedings Article
12 Dec 2011
TL;DR: An algorithm called Sparse Manifold Clustering and Embedding (SMCE) for simultaneous clustering and dimensionality reduction of data lying in multiple nonlinear manifolds finds a small neighborhood around each data point and connects each point to its neighbors with appropriate weights.
Abstract: We propose an algorithm called Sparse Manifold Clustering and Embedding (SMCE) for simultaneous clustering and dimensionality reduction of data lying in multiple nonlinear manifolds. Similar to most dimensionality reduction methods, SMCE finds a small neighborhood around each data point and connects each point to its neighbors with appropriate weights. The key difference is that SMCE finds both the neighbors and the weights automatically. This is done by solving a sparse optimization problem, which encourages selecting nearby points that lie in the same manifold and approximately span a low-dimensional affine subspace. The optimal solution encodes information that can be used for clustering and dimensionality reduction using spectral clustering and embedding. Moreover, the size of the optimal neighborhood of a data point, which can be different for different points, provides an estimate of the dimension of the manifold to which the point belongs. Experiments demonstrate that our method can effectively handle multiple manifolds that are very close to each other, manifolds with non-uniform sampling and holes, as well as estimate the intrinsic dimensions of the manifolds.

Journal ArticleDOI
TL;DR: It is proposed that when dimensionality reduction is performed on trajectory data one should think of the resultant embedding as a quickly sketched set of directions rather than a road map, because some features of the free-energy surface are inherently high-dimensional.
Abstract: A new scheme, sketch-map, for obtaining a low-dimensional representation of the region of phase space explored during an enhanced dynamics simulation is proposed. We show evidence, from an examination of the distribution of pairwise distances between frames, that some features of the free-energy surface are inherently high-dimensional. This makes dimensionality reduction problematic because the data does not satisfy the assumptions made in conventional manifold learning algorithms. We therefore propose that when dimensionality reduction is performed on trajectory data one should think of the resultant embedding as a quickly sketched set of directions rather than a road map. In other words, the embedding tells one about the connectivity between states but does not provide the vectors that correspond to the slow degrees of freedom. This realization informs the development of sketch-map, which endeavors to reproduce the proximity information from the high-dimensional description in a space of lower dimensionality even when a faithful embedding is not possible.

Journal ArticleDOI
TL;DR: Under a spiked covariance model, a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse is proposed, and it is found that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings.
Abstract: Principal component analysis (PCA) is a classical dimension reduction method which projects data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. However, it behaves poorly when the number of features p is comparable to, or even much larger than, the sample size n. In this paper, we propose a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse. Under a spiked covariance model, we find that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings. Simulated examples also demonstrate its competitive performance.
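
The idea can be caricatured in a few lines: power iteration on the sample covariance, hard-thresholding the iterate to its k largest coordinates at every step. This is a simplified cousin of the paper's procedure, run here on synthetic spiked data:

```python
import numpy as np

def thresholded_power(S, k, n_iter=100, seed=0):
    """Approximate k-sparse leading eigenvector of S: power iteration
    with hard thresholding to the k largest-magnitude coordinates."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=S.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = S @ v
        small = np.argsort(np.abs(v))[:-k]    # all but the k largest
        v[small] = 0.0
        v /= np.linalg.norm(v)
    return v

# Spiked covariance model: one sparse spike plus isotropic noise.
rng = np.random.default_rng(1)
p, n, k = 200, 80, 5
u = np.zeros(p)
u[:k] = 1.0 / np.sqrt(k)                      # sparse leading eigenvector
X = rng.normal(size=(n, p)) + 3.0 * rng.normal(size=(n, 1)) * u
S = X.T @ X / n                               # sample covariance
v = thresholded_power(S, k)
print(np.flatnonzero(v), abs(v @ u))          # support ~ {0..4}, overlap ~ 1
```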

Journal ArticleDOI
TL;DR: A new supervised band-selection algorithm that uses the known class signatures only without examining the original bands or the need of class training samples is proposed, which can complete the task much faster than traditional methods that test bands or band combinations.
Abstract: Band selection is often applied to reduce the dimensionality of hyperspectral imagery. When the desired object information is known, it can be achieved by finding the bands that contain the most object information. It is expected that these bands can provide an overall satisfactory detection and classification performance. In this letter, we propose a new supervised band-selection algorithm that uses the known class signatures only without examining the original bands or the need of class training samples. Thus, it can complete the task much faster than traditional methods that test bands or band combinations. The experimental result shows that our approach can generally yield better results than other popular supervised band-selection methods in the literature.

Journal ArticleDOI
TL;DR: The proposed approach generalizes the framework of multiple kernel learning for dimensionality reduction, and distinguishes itself with the following three main contributions: first, the method provides the convenience of using diverse image descriptors to describe useful characteristics of various aspects about the underlying data, and consequently improves their effectiveness.
Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. The resulting data representations are typically high-dimensional and assume diverse forms. Hence, finding a way of transforming them into a unified space of lower dimension generally facilitates the underlying tasks such as object recognition or clustering. To this end, the proposed approach (termed MKL-DR) generalizes the framework of multiple kernel learning for dimensionality reduction, and distinguishes itself with the following three main contributions: First, our method provides the convenience of using diverse image descriptors to describe useful characteristics of various aspects of the underlying data. Second, it extends a broad set of existing dimensionality reduction techniques to consider multiple kernel learning, and consequently improves their effectiveness. Third, by focusing on the techniques pertaining to dimensionality reduction, the formulation introduces a new class of applications with the multiple kernel learning framework to address not only the supervised learning problems but also the unsupervised and semi-supervised ones.

Journal ArticleDOI
TL;DR: A novel algorithm, Pareto corner search evolutionary algorithm (PCSEA), is introduced in this paper, which searches for the corners of the Pareto front instead of searching for the complete Pareto front to identify the relevant objectives.
Abstract: Many-objective optimization refers to optimization problems containing a large number of objectives, typically more than four. Non-dominance is an inadequate strategy for convergence to the Pareto front for such problems, as almost all solutions in the population become non-dominated, resulting in loss of convergence pressure. However, for some problems, it may be possible to generate the Pareto front using only a few of the objectives, rendering the rest of the objectives redundant. Such problems may be reducible to a manageable number of relevant objectives, which can be optimized using conventional multiobjective evolutionary algorithms (MOEAs). For dimensionality reduction, most proposals in the literature rely on analysis of a representative set of solutions obtained by running a conventional MOEA for a large number of generations, which is computationally expensive. A novel algorithm, Pareto corner search evolutionary algorithm (PCSEA), is introduced in this paper, which searches for the corners of the Pareto front instead of searching for the complete Pareto front. The solutions obtained using PCSEA are then used for dimensionality reduction to identify the relevant objectives. The potential of the proposed approach is demonstrated by studying its performance on a set of benchmark test problems and two engineering examples. While the preliminary results obtained using PCSEA are promising, there are a number of areas that need further investigation. This paper provides a number of useful insights into dimensionality reduction and, in particular, highlights some of the roadblocks that need to be cleared for future development of algorithms attempting to use a few selected solutions for identifying relevant objectives.

Proceedings ArticleDOI
20 Jun 2011
TL;DR: Experimental results on a very large database show that the KPLS is significantly better than the popular SVM method, and outperforms the state-of-the-art approaches in human age estimation.
Abstract: Human age estimation has recently become an active research topic in computer vision and pattern recognition, because of many potential applications in reality. In this paper we propose to use the kernel partial least squares (KPLS) regression for age estimation. The KPLS (or linear PLS) method has several advantages over previous approaches: (1) the KPLS can reduce feature dimensionality and learn the aging function simultaneously in a single learning framework, instead of performing each task separately using different techniques; (2) the KPLS can find a small number of latent variables, e.g., 20, to project thousands of features into a very low-dimensional subspace, which may have great impact on real-time applications; and (3) the KPLS regression has an output vector that can contain multiple labels, so that several related problems, e.g., age estimation, gender classification, and ethnicity estimation can be solved altogether. This is the first time that the kernel PLS method is introduced and applied to solve a regression problem in computer vision with high accuracy. Experimental results on a very large database show that the KPLS is significantly better than the popular SVM method, and outperforms the state-of-the-art approaches in human age estimation.

Journal Article
TL;DR: A novel theoretical understanding of principal curves and surfaces, practical algorithms as general purpose machine learning tools, and applications of these algorithms to several practical problems are presented.
Abstract: Principal curves are defined as self-consistent smooth curves passing through the middle of the data, and they have been used in many applications of machine learning as a generalization, dimensionality reduction and a feature extraction tool. We redefine principal curves and surfaces in terms of the gradient and the Hessian of the probability density estimate. This provides a geometric understanding of the principal curves and surfaces, as well as a unifying view for clustering, principal curve fitting and manifold learning by regarding those as principal manifolds of different intrinsic dimensionalities. The theory does not impose any particular density estimation method and can be used with any density estimator that gives continuous first and second derivatives. Therefore, we first present our principal curve/surface definition without assuming any particular density estimation method. Afterwards, we develop practical algorithms for the commonly used kernel density estimation (KDE) and Gaussian mixture models (GMM). Results of these algorithms are presented on notional data sets as well as in real applications with comparisons to other approaches in the principal curve literature. All in all, we present a novel theoretical understanding of principal curves and surfaces, practical algorithms as general purpose machine learning tools, and applications of these algorithms to several practical problems.

Proceedings ArticleDOI
16 Jul 2011
TL;DR: This paper reformulates the subspace learning problem and uses the L2,1-norm on the projection matrix to achieve row-sparsity, which leads to selecting relevant features and learning a transformation simultaneously.
Abstract: Dimensionality reduction is a very important topic in machine learning. It can be generally classified into two categories: feature selection and subspace learning. In the past decades, many methods have been proposed for dimensionality reduction. However, most of these works study feature selection and subspace learning independently. In this paper, we present a framework for joint feature selection and subspace learning. We reformulate the subspace learning problem and use the L2,1-norm on the projection matrix to achieve row-sparsity, which leads to selecting relevant features and learning a transformation simultaneously. We discuss two situations of the proposed framework, and present their optimization algorithms. Experiments on benchmark face recognition data sets illustrate that the proposed framework overwhelmingly outperforms state-of-the-art methods.

BookDOI
20 Dec 2011
TL;DR: Comprehensive in its coverage, this pioneering work explores this novel modality from algorithm creation to successful implementation of manifold learning, offering examples of applications in medical, biometrics, multimedia, and computer vision.
Abstract: Trained to extract actionable information from large volumes of high-dimensional data, engineers and scientists often have trouble isolating meaningful low-dimensional structures hidden in their high-dimensional observations. Manifold learning, a groundbreaking technique designed to tackle these issues of dimensionality reduction, finds widespread application in machine learning, neural networks, pattern recognition, image processing, and computer vision. Filling a void in the literature, Manifold Learning Theory and Applications incorporates state-of-the-art techniques in manifold learning with a solid theoretical and practical treatment of the subject. Comprehensive in its coverage, this pioneering work explores this novel modality from algorithm creation to successful implementation, offering examples of applications in the medical, biometrics, multimedia, and computer vision domains. Emphasizing implementation, it highlights the various permutations of manifold learning in industry, including manifold optimization, large-scale manifold learning, semidefinite programming for embedding, manifold models for signal acquisition, compression and processing, and multiscale manifold learning. Beginning with an introduction to manifold learning theories and applications, the book includes discussions on the relevance to nonlinear dimensionality reduction, clustering, graph-based subspace learning, spectral learning and embedding, extensions, and multi-manifold modeling. It synthesizes cross-domain knowledge for interdisciplinary instruction and offers a rich set of specialized topics contributed by expert professionals and researchers from a variety of fields. Finally, the book discusses specific algorithms and methodologies using case studies to apply manifold learning for real-world problems.

Journal ArticleDOI
01 Feb 2011
TL;DR: Spectral Regression Kernel Discriminant Analysis is presented, which casts discriminant analysis into a regression framework, facilitating both efficient computation and the use of regularization techniques.
Abstract: Linear discriminant analysis (LDA) has been a popular method for dimensionality reduction, which preserves class separability. The projection vectors are commonly obtained by maximizing the between-class covariance and simultaneously minimizing the within-class covariance. LDA can be performed either in the original input space or in the reproducing kernel Hilbert space (RKHS) into which data points are mapped, which leads to kernel discriminant analysis (KDA). When the data are highly nonlinearly distributed, KDA can achieve better performance than LDA. However, computing the projective functions in KDA involves eigen-decomposition of the kernel matrix, which is very expensive when a large number of training samples exist. In this paper, we present a new algorithm for kernel discriminant analysis, called Spectral Regression Kernel Discriminant Analysis (SRKDA). By using spectral graph analysis, SRKDA casts discriminant analysis into a regression framework, which facilitates both efficient computation and the use of regularization techniques. Specifically, SRKDA only needs to solve a set of regularized regression problems, and there is no eigenvector computation involved, which greatly reduces the computational cost. The new formulation makes it very easy to develop an incremental version of the algorithm, which can fully utilize the computational results of the existing training samples. Moreover, it is easy to produce sparse projections (Sparse KDA) with an L1-norm regularizer. Extensive experiments on spoken letter, handwritten digit image and face image data demonstrate the effectiveness and efficiency of the proposed algorithm.
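
The computational trick is replacing the kernel eigenproblem with regularized regressions against class-derived responses. A simplified sketch of that idea (we merely center the class indicators and drop a redundant column; the paper derives and orthogonalizes the responses properly):

```python
import numpy as np

def spectral_regression_kda(K, y, delta=0.01):
    """Regularized kernel regression against class-indicator responses,
    the core device of SRKDA; no eigendecomposition is needed."""
    n = K.shape[0]
    T = np.stack([(y == c).astype(float) for c in np.unique(y)], axis=1)
    T = (T - T.mean(0))[:, :-1]          # c-1 centered responses
    return np.linalg.solve(K + delta * np.eye(n), T)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(4, 1, (40, 2))])
y = np.repeat([0, 1], 40)
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-0.5 * d2)                    # RBF kernel matrix
A = spectral_regression_kda(K, y)
Z = K @ A                                # 1-D discriminant embedding
print(Z[:40].mean(), Z[40:].mean())      # the two classes separate
```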

Journal ArticleDOI
TL;DR: The proposed DEFS is used to search for optimal subsets of features in datasets with varying dimensionality and is then utilized to aid in the selection of Wavelet Packet Transform best basis for classification problems, thus acting as a part of a feature extraction process.
Abstract: One of the fundamental motivations for feature selection is to overcome the curse of dimensionality problem. This paper presents a novel feature selection method utilizing a combination of differential evolution (DE) optimization method and a proposed repair mechanism based on feature distribution measures. The new method, abbreviated as DEFS, utilizes the DE float number optimizer in the combinatorial optimization problem of feature selection. In order to make the solutions generated by the float-optimizer suitable for feature selection, a roulette wheel structure is constructed and supplied with the probabilities of features distribution. These probabilities are constructed during iterations by identifying the features that contribute to the most promising solutions. The proposed DEFS is used to search for optimal subsets of features in datasets with varying dimensionality. It is then utilized to aid in the selection of Wavelet Packet Transform (WPT) best basis for classification problems, thus acting as a part of a feature extraction process. Practical results indicate the significance of the proposed method in comparison with other feature selection methods.
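
A generic stand-in showing the "float optimizer for a combinatorial subset problem" idea with SciPy's differential evolution; this omits the paper's roulette-wheel repair mechanism, and the decoding of floats into a top-k subset is our choice, not DEFS's:

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 20))
y = (X[:, 0] + X[:, 3] - X[:, 7] > 0).astype(int)   # features 0, 3, 7 matter
k = 3                                               # subset size to select

def fitness(w):
    # Decode a float vector into a feature subset (the k largest weights)
    # and score it with cross-validated KNN accuracy; DE minimizes.
    subset = np.argsort(w)[-k:]
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, subset], y, cv=3).mean()
    return -acc

res = differential_evolution(fitness, bounds=[(0, 1)] * X.shape[1],
                             maxiter=20, seed=0, polish=False)
print(np.sort(np.argsort(res.x)[-k:]))              # typically [0 3 7]
```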

Journal ArticleDOI
TL;DR: By using a series of equivalent transformations, it is shown that MEN is equivalent to the lasso penalized least squares problem, and thus LARS is adopted to obtain the optimal sparse solution of MEN.
Abstract: It is difficult to find the optimal sparse solution of a manifold learning based dimensionality reduction algorithm. The lasso or the elastic net penalized manifold learning based dimensionality reduction is not directly a lasso penalized least squares problem, and thus the least angle regression (LARS) (Efron et al., Ann Stat 32(2):407–499, 2004), one of the most popular algorithms in sparse learning, cannot be applied. Therefore, most current approaches take indirect ways or have strict settings, which can be inconvenient for applications. In this paper, we propose the manifold elastic net, or MEN for short. MEN incorporates the merits of both manifold learning based dimensionality reduction and sparse learning based dimensionality reduction. By using a series of equivalent transformations, we show that MEN is equivalent to the lasso penalized least squares problem, and thus LARS is adopted to obtain the optimal sparse solution of MEN. In particular, MEN has the following advantages for subsequent classification: (1) the local geometry of samples is well preserved for low dimensional data representation, (2) both the margin maximization and the classification error minimization are considered for sparse projection calculation, (3) the projection matrix of MEN improves the parsimony in computation, (4) the elastic net penalty reduces the over-fitting problem, and (5) the projection matrix of MEN can be interpreted psychologically and physiologically. Experimental evidence on face recognition over various popular datasets suggests that MEN is superior to top-level dimensionality reduction algorithms.

Journal ArticleDOI
TL;DR: All the eigenvalue problems solved in the context of explicit linear projections can be viewed as the projected analogues of the nonlinear or implicit projections, including kernels as a means of unifying linear and nonlinear methods.
Abstract: This paper gives an overview of the eigenvalue problems encountered in areas of data mining that are related to dimension reduction. Given some input high-dimensional data, the goal of dimension reduction is to map them to a low-dimensional space such that certain properties of the original data are preserved. Optimizing these properties among the reduced data can typically be posed as a trace optimization problem that leads to an eigenvalue problem. There is a rich variety of such problems, and the goal of this paper is to unravel relations between them as well as to discuss effective solution techniques. First, we make a distinction between projective methods that determine an explicit linear mapping from the high-dimensional space to the low-dimensional space, and nonlinear methods where the mapping between the two is nonlinear and implicit. Then, we show that all of the eigenvalue problems solved in the context of explicit linear projections can be viewed as the projected analogues of the nonlinear or implicit projections. We also discuss kernels as a means of unifying linear and nonlinear methods and revisit some of the equivalences between methods established in this way. Finally, we provide some illustrative examples to showcase the behavior and the particular characteristics of the various dimension reduction techniques on real-world data sets.
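
The template behind most of the methods surveyed fits in a few lines: maximize a trace subject to an orthogonality constraint, solved by an eigendecomposition, with the generalized variant covering methods such as LDA. A sketch with PCA as the instance (the data and function names are ours):

```python
import numpy as np
from scipy.linalg import eigh

def trace_maximize(A, k):
    """argmax_{V^T V = I} tr(V^T A V): the top-k eigenvectors of A."""
    w, V = np.linalg.eigh(A)                  # eigenvalues ascending
    return V[:, ::-1][:, :k]

def trace_maximize_generalized(A, B, k):
    """argmax tr(V^T A V) s.t. V^T B V = I, a generalized symmetric
    eigenproblem (covers e.g. LDA with between/within-class scatter)."""
    w, V = eigh(A, B)
    return V[:, ::-1][:, :k]

# PCA as an instance: A is the sample covariance matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))
Xc = X - X.mean(axis=0)
V = trace_maximize(Xc.T @ Xc / len(Xc), k=2)
print((Xc @ V).shape)                         # the 2-D PCA embedding
```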