
Showing papers on "Dimensionality reduction published in 2011"


Journal ArticleDOI
TL;DR: This work proposes a novel dimensionality reduction framework for reducing the distance between domains in a latent space for domain adaptation and proposes both unsupervised and semisupervised feature extraction approaches, which can dramatically reduce the distance between domain distributions by projecting data onto the learned transfer components.
Abstract: Domain adaptation allows knowledge from a source domain to be transferred to a different but related target domain. Intuitively, discovering a good feature representation across domains is crucial. In this paper, we first propose to find such a representation through a new learning method, transfer component analysis (TCA), for domain adaptation. TCA tries to learn some transfer components across domains in a reproducing kernel Hilbert space using the maximum mean discrepancy. In the subspace spanned by these transfer components, data properties are preserved and data distributions in different domains are close to each other. As a result, with the new representations in this subspace, we can apply standard machine learning methods to train classifiers or regression models in the source domain for use in the target domain. Furthermore, in order to uncover the knowledge hidden in the relations between the data labels from the source and target domains, we extend TCA in a semisupervised learning setting, which encodes label information into transfer components learning. We call this extension semisupervised TCA. The main contribution of our work is that we propose a novel dimensionality reduction framework for reducing the distance between domains in a latent space for domain adaptation. We propose both unsupervised and semisupervised feature extraction approaches, which can dramatically reduce the distance between domain distributions by projecting data onto the learned transfer components. Finally, our approach can handle large datasets and naturally leads to out-of-sample generalization. The effectiveness and efficiency of our approach are verified by experiments on five toy datasets and two real-world applications: cross-domain indoor WiFi localization and cross-domain text classification.
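
The central quantity here is the maximum mean discrepancy. As a minimal illustration of what TCA minimizes (a sketch with a Gaussian kernel; the function names and data are ours, not the paper's), the biased empirical MMD between two samples can be computed in a few lines:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and B."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def mmd2(Xs, Xt, gamma=1.0):
    """Biased empirical squared MMD between source and target samples."""
    return (rbf_kernel(Xs, Xs, gamma).mean()
            + rbf_kernel(Xt, Xt, gamma).mean()
            - 2 * rbf_kernel(Xs, Xt, gamma).mean())

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, (100, 5))   # source domain sample
Xt = rng.normal(0.5, 1.0, (120, 5))   # shifted target domain sample
print(mmd2(Xs, Xt))                   # clearly positive: distributions differ
```

TCA then searches for a projection of the combined data under which this discrepancy is small while data variance is preserved.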

3,195 citations


Journal Article
TL;DR: MULAN is a Java library for learning from multi-label data that offers a variety of classification, ranking, thresholding and dimensionality reduction algorithms, as well as algorithms for learning from hierarchically structured labels.
Abstract: MULAN is a Java library for learning from multi-label data. It offers a variety of classification, ranking, thresholding and dimensionality reduction algorithms, as well as algorithms for learning from hierarchically structured labels. In addition, it contains an evaluation framework that calculates a rich variety of performance measures.

709 citations


Journal Article
TL;DR: Methods for learning dictionaries that are appropriate for the representation of given classes of signals and multisensor data are described and dimensionality reduction based on dictionary representation can be extended to address specific tasks such as data analysis or classification.
Abstract: We describe methods for learning dictionaries that are appropriate for the representation of given classes of signals and multisensor data. We further show that dimensionality reduction based on dictionary representation can be extended to address specific tasks such as data analysis or classification when the learning includes a class separability criterion in the objective function. The benefits of dictionary learning clearly show that a proper understanding of causes underlying the sensed world is key to task-specific representation of relevant information in high-dimensional data sets.
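
A hedged illustration of the general recipe (learn a dictionary, then use the sparse codes as a reduced representation), using scikit-learn's DictionaryLearning as a stand-in for the authors' method; the data and parameters below are arbitrary:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))          # 200 signals of dimension 64

# Learn 32 atoms with sparse codes; the codes serve as a reduced,
# task-oriented representation of the signals.
dl = DictionaryLearning(n_components=32, alpha=1.0, max_iter=100,
                        transform_algorithm="lasso_lars", random_state=0)
codes = dl.fit_transform(X)             # (200, 32) sparse coefficients
D = dl.components_                      # (32, 64) dictionary atoms
print(codes.shape, D.shape)
```

Adding a class-separability term to the objective, as the paper does, requires a custom solver and is not shown here.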

705 citations


Journal ArticleDOI
TL;DR: A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework and has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets.
Abstract: Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. A simple extension of a sparse PLS exploratory approach, sparse PLS discriminant analysis (sPLS-DA), is proposed to perform variable selection in a multiclass classification framework. sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.
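
The core mechanism of sparse PLS is a penalized SVD: soft-thresholding the loading vector of M = X^T Y so that only a few variables get nonzero weight. The sketch below is a one-component illustration of that mechanism (our simplification; the mixOmics implementation is in R and additionally handles deflation, multiple components, and tuning):

```python
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def spls_component(X, Y, lam, n_iter=50):
    """First sparse PLS loading via a penalized SVD of M = X^T Y."""
    M = X.T @ Y
    u = np.linalg.svd(M, full_matrices=False)[0][:, 0]   # warm start
    for _ in range(n_iter):
        v = M.T @ u
        v /= np.linalg.norm(v)
        u = soft_threshold(M @ v, lam)
        u /= max(np.linalg.norm(u), 1e-12)
    return u      # nonzero entries are the selected variables

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))              # 50 samples, 200 "genes"
y = rng.integers(0, 3, 50)
Y = np.eye(3)[y]                            # dummy-coded classes
X[:, :5] += 4.0 * Y[:, [0]]                 # variables 0-4 mark class 0
u = spls_component(X - X.mean(0), Y - Y.mean(0), lam=15.0)
print(np.flatnonzero(u))                    # expected: (mostly) 0..4
```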

672 citations


Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper proposes to construct the dictionary by using both observed and unobserved, hidden data, and shows that the effects of the hidden data can be approximately recovered by solving a nuclear norm minimization problem, which is convex and can be solved efficiently.
Abstract: Low-Rank Representation (LRR) [16, 17] is an effective method for exploring the multiple subspace structures of data. Usually, the observed data matrix itself is chosen as the dictionary, which is a key aspect of LRR. However, such a strategy may degrade performance, especially when the observations are insufficient and/or grossly corrupted. In this paper we therefore propose to construct the dictionary by using both observed and unobserved, hidden data. We show that the effects of the hidden data can be approximately recovered by solving a nuclear norm minimization problem, which is convex and can be solved efficiently. The formulation of the proposed method, called Latent Low-Rank Representation (LatLRR), seamlessly integrates subspace segmentation and feature extraction into a unified framework, and thus provides us with a solution for both subspace segmentation and feature extraction. As a subspace segmentation algorithm, LatLRR is an enhanced version of LRR and outperforms the state-of-the-art algorithms. Being an unsupervised feature extraction algorithm, LatLRR is able to robustly extract salient features from corrupted data, and thus can work much better than the benchmark that utilizes the original data vectors as features for classification. Compared to dimension reduction based methods, LatLRR is more robust to noise.
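
The workhorse of nuclear norm minimization is the singular value thresholding (SVT) operator, the proximal map of the nuclear norm. A minimal sketch (solvers like the one in the paper wrap this operator in an augmented-Lagrangian loop; the example data are ours):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: returns the minimizer of
    0.5 * ||X - M||_F^2 + tau * ||X||_* by soft-thresholding the
    singular values of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

rng = np.random.default_rng(0)
L = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 80))  # rank-3 signal
M = L + 0.1 * rng.normal(size=(50, 80))                  # noisy observation
X = svt(M, tau=2.0)
print(np.linalg.matrix_rank(X))   # small: thresholding removes noise directions
```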

656 citations


Proceedings ArticleDOI
16 Jul 2011
TL;DR: In this paper, a joint framework for unsupervised feature selection is proposed to select the most discriminative feature subset from the whole feature set in batch mode, where the class label of input data can be predicted by a linear classifier.
Abstract: Compared with supervised learning for feature selection, it is much more difficult to select the discriminative features in unsupervised learning due to the lack of label information. Traditional unsupervised feature selection algorithms usually select the features which best preserve the data distribution, e.g., manifold structure, of the whole feature set. Under the assumption that the class label of input data can be predicted by a linear classifier, we incorporate discriminative analysis and l2,1-norm minimization into a joint framework for unsupervised feature selection. Different from existing unsupervised feature selection algorithms, our algorithm selects the most discriminative feature subset from the whole feature set in batch mode. Extensive experiments on different data types demonstrate the effectiveness of our algorithm.
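
The row-sparsity mechanism can be shown in isolation: penalizing the l2,1-norm of a projection matrix W (the sum of its row norms) drives entire rows to zero, and the surviving row norms rank the features. The sketch below solves a supervised least-squares surrogate with an iteratively reweighted solver, a standard device for this penalty; the paper's actual objective is unsupervised and more elaborate:

```python
import numpy as np

def l21_regression(X, Y, gamma, n_iter=100):
    """min_W ||XW - Y||_F^2 + gamma * ||W||_{2,1} via iteratively
    reweighted least squares; ||W||_{2,1} is the sum of row norms."""
    d = X.shape[1]
    D = np.eye(d)
    for _ in range(n_iter):
        W = np.linalg.solve(X.T @ X + gamma * D, X.T @ Y)
        D = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + 1e-8))
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))
y = (X[:, :3].sum(axis=1) > 0).astype(int)   # only features 0-2 matter
Y = np.eye(2)[y]
W = l21_regression(X, Y, gamma=5.0)
scores = np.linalg.norm(W, axis=1)           # row norm = feature relevance
print(np.argsort(scores)[::-1][:5])          # features 0-2 rank on top
```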

613 citations


Journal ArticleDOI
TL;DR: This work proposes sparse discriminant analysis, a method for performing linear discriminant analysis with a sparseness criterion imposed such that classification and feature selection are performed simultaneously in the high-dimensional setting.
Abstract: We consider the problem of performing interpretable classification in the high-dimensional setting, in which the number of features is very large and the number of observations is limited. This setting has been studied extensively in the chemometrics literature, and more recently has become commonplace in biological and medical applications. In this setting, a traditional approach involves performing feature selection before classification. We propose sparse discriminant analysis, a method for performing linear discriminant analysis with a sparseness criterion imposed such that classification and feature selection are performed simultaneously. Sparse discriminant analysis is based on the optimal scoring interpretation of linear discriminant analysis, and can be extended to perform sparse discrimination via mixtures of Gaussians if boundaries between classes are nonlinear or if subgroups are present within each class. Our proposal also provides low-dimensional views of the discriminative directions.
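
The optimal-scoring view alternates a sparse regression step with a score update. Below is a bare-bones, one-direction sketch with an l1 penalty via scikit-learn's Lasso (our simplification; the authors use an elastic-net penalty and handle multiple, mutually orthogonal components):

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_optimal_scoring(X, Y, alpha=0.05, n_iter=20, seed=0):
    """One sparse discriminative direction via the optimal-scoring
    view of LDA: alternate a sparse regression step with an update
    of the class score vector theta."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=Y.shape[1])
    theta /= np.linalg.norm(theta)
    lasso = Lasso(alpha=alpha, fit_intercept=False)
    for _ in range(n_iter):
        beta = lasso.fit(X, Y @ theta).coef_      # sparse regression step
        s = Y.T @ (X @ beta)                      # score-vector update
        theta = s / np.linalg.norm(s)
    return beta

rng = np.random.default_rng(1)
n, p = 60, 300                                    # p >> n, as in the paper
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, p))
X[:, :4] += 2.0 * (y == 1)[:, None]               # 4 informative features
Y = np.eye(2)[y]
beta = sparse_optimal_scoring(X - X.mean(0), Y)
print(np.flatnonzero(beta))                       # mostly features 0-3
```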

565 citations


Journal ArticleDOI
TL;DR: A set of building blocks for constructing descriptors which can be combined together and jointly optimized so as to minimize the error of a nearest-neighbor classifier are described.
Abstract: In this paper, we explore methods for learning local image descriptors from training data. We describe a set of building blocks for constructing descriptors which can be combined together and jointly optimized so as to minimize the error of a nearest-neighbor classifier. We consider both linear and nonlinear transforms with dimensionality reduction, and make use of discriminant learning techniques such as Linear Discriminant Analysis (LDA) and Powell minimization to solve for the parameters. Using these techniques, we obtain descriptors that exceed state-of-the-art performance with low dimensionality. In addition to new experiments and recommendations for descriptor learning, we are also making available a new and realistic ground truth data set based on multiview stereo data.

520 citations


Proceedings ArticleDOI
27 Aug 2011
TL;DR: In this paper, a new language identification system is presented based on the total variability approach previously developed in the field of speaker identification and various techniques are employed to extract the most salient features in the lower dimensional i-vector space.
Abstract: In this paper, a new language identification system is presented based on the total variability approach previously developed in the field of speaker identification. Various techniques are employed to extract the most salient features in the lower dimensional i-vector space and the system developed results in excellent performance on the 2009 LRE evaluation set without the need for any post-processing or backend techniques. Additional performance gains are observed when the system is combined with other acoustic systems.

438 citations


Journal ArticleDOI
Harun Uğuz
TL;DR: Two-stage feature selection and feature extraction is used to improve the performance of text categorization and the proposed model is able to achieve high categorization effectiveness as measured by precision, recall and F-measure.
Abstract: Text categorization is widely used when organizing documents in a digital form. Due to the increasing number of documents in digital form, automated text categorization has become more promising in the last ten years. A major problem of text categorization is its large number of features. Most of those are irrelevant noise that can mislead the classifier. Therefore, feature selection is often used in text categorization to reduce the dimensionality of the feature space and to improve performance. In this study, two-stage feature selection and feature extraction is used to improve the performance of text categorization. In the first stage, each term within the documents is ranked by its importance for classification using the information gain (IG) method. In the second stage, genetic algorithm (GA) and principal component analysis (PCA) feature selection and feature extraction methods are applied separately to the terms which are ranked in decreasing order of importance, and a dimension reduction is carried out. Thereby, during text categorization, terms of less importance are ignored, and feature selection and extraction methods are applied to the terms of highest importance; thus, the computational time and complexity of categorization are reduced. To evaluate the effectiveness of dimension reduction methods on our proposed model, experiments are conducted using the k-nearest neighbour (KNN) and C4.5 decision tree algorithms on the Reuters-21578 and Classic3 dataset collections for text categorization. The experimental results show that the proposed model is able to achieve high categorization effectiveness as measured by precision, recall and F-measure.
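
A hedged sketch of the same two-stage pipeline on toy data, with scikit-learn's mutual information ranking standing in for information gain and a PCA-plus-KNN second stage; data, split, and parameters are arbitrary:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(300, 500)).astype(float)   # toy term counts
y = rng.integers(0, 3, 300)
X[y == 1, :20] += 3                                    # class-1 marker terms
X[y == 2, 20:40] += 3                                  # class-2 marker terms

# Stage 1: rank terms by informativeness and keep the best 100.
mi = mutual_info_classif(X, y, random_state=0)
top = np.argsort(mi)[::-1][:100]

# Stage 2: PCA on the surviving terms, then a KNN classifier.
clf = Pipeline([("pca", PCA(n_components=20)),
                ("knn", KNeighborsClassifier(n_neighbors=5))])
clf.fit(X[:200][:, top], y[:200])
print("test accuracy:", clf.score(X[200:][:, top], y[200:]))
```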

431 citations


Journal ArticleDOI
TL;DR: The central issues of MSL are discussed, including establishing the foundations of the field via multilinear projections, formulating a unifying MSL framework for systematic treatment of the problem, and examining the algorithmic aspects of typical MSL solutions.

Journal ArticleDOI
TL;DR: In this article, an improved analysis of a structured dimension reduction map called the subsampled randomized Hadamard transform is presented, and the new proof is much simpler than previous approaches, and it offers optimal constants in the estimate on the number of dimensions required for the embedding.
Abstract: This paper presents an improved analysis of a structured dimension-reduction map called the subsampled randomized Hadamard transform. This argument demonstrates that the map preserves the Euclidean geometry of an entire subspace of vectors. The new proof is much simpler than previous approaches, and it offers — for the first time — optimal constants in the estimate on the number of dimensions required for the embedding.
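
The transform itself is easy to state: flip signs randomly (D), mix with a normalized Hadamard matrix (H), then subsample and rescale k coordinates. A direct, dense (non-fast) sketch:

```python
import numpy as np
from scipy.linalg import hadamard

def srht(X, k, rng):
    """Subsampled randomized Hadamard transform of the rows of X.
    The number of columns n must be a power of two; returns an
    (m, k) sketch that approximately preserves Euclidean geometry."""
    m, n = X.shape
    D = rng.choice([-1.0, 1.0], size=n)          # random sign flips
    H = hadamard(n) / np.sqrt(n)                 # orthonormal Hadamard
    idx = rng.choice(n, size=k, replace=False)   # uniform coordinate sample
    return np.sqrt(n / k) * (X * D) @ H[:, idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1024))
Y = srht(X, k=256, rng=rng)
print(np.linalg.norm(X, axis=1)[:3])   # norms before ...
print(np.linalg.norm(Y, axis=1)[:3])   # ... and after: small distortion
```

A practical implementation would replace the dense Hadamard matrix with an O(n log n) fast Walsh-Hadamard transform.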

Proceedings ArticleDOI
Jorge Sanchez, Florent Perronnin
20 Jun 2011
TL;DR: This work reports results on two large databases — ImageNet and a dataset of 1M Flickr images — showing that it can reduce the storage of the authors' signatures by a factor 64 to 128 with little loss in accuracy and integrating the decompression in the classifier learning yields an efficient and scalable training algorithm.
Abstract: We address image classification on a large-scale, i.e. when a large number of images and classes are involved. First, we study classification accuracy as a function of the image signature dimensionality and the training set size. We show experimentally that the larger the training set, the higher the impact of the dimensionality on the accuracy. In other words, high-dimensional signatures are important to obtain state-of-the-art results on large datasets. Second, we tackle the problem of data compression on very large signatures (on the order of 10^5 dimensions) using two lossy compression strategies: a dimensionality reduction technique known as the hash kernel and an encoding technique based on product quantizers. We explain how the gain in storage can be traded against a loss in accuracy and/or an increase in CPU cost. We report results on two large databases — ImageNet and a dataset of 1M Flickr images — showing that we can reduce the storage of our signatures by a factor 64 to 128 with little loss in accuracy. Integrating the decompression in the classifier learning yields an efficient and scalable training algorithm. On ILSVRC2010 we report a 74.3% accuracy at top-5, which corresponds to a 2.5% absolute improvement with respect to the state-of-the-art. On a subset of 10K classes of ImageNet we report a top-1 accuracy of 16.7%, a relative improvement of 160% with respect to the state-of-the-art.
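
The "hash kernel" ingredient is the signed hashing trick: each coordinate is assigned a random bucket and a random sign, and coordinates are summed within buckets. A minimal sketch (dimensions are arbitrary; the product-quantization strategy the paper pairs with it is not shown):

```python
import numpy as np

def hash_kernel(X, k, seed=0):
    """Signed hashing trick: map the n-dimensional rows of X down to
    k buckets; norms and inner products are preserved in expectation."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    bucket = rng.integers(0, k, size=n)       # h: coordinate -> bucket
    sign = rng.choice([-1.0, 1.0], size=n)    # xi: coordinate -> sign
    out = np.zeros((k, X.shape[0]))
    np.add.at(out, bucket, (X * sign).T)      # accumulate within buckets
    return out.T

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 100_000))             # ~1e5-dim signatures
Xh = hash_kernel(X, k=1024)
print(np.linalg.norm(X[0]), np.linalg.norm(Xh[0]))   # approximately equal
```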

Journal ArticleDOI
TL;DR: It is shown that under a mild condition which tends to hold for high-dimensional data, CCA in the multilabel case can be formulated as a least-squares problem, and several CCA extensions are proposed, including the sparse CCA formulation based on l1-norm regularization.
Abstract: Canonical Correlation Analysis (CCA) is a well-known technique for finding the correlations between two sets of multidimensional variables. It projects both sets of variables onto a lower-dimensional space in which they are maximally correlated. CCA is commonly applied for supervised dimensionality reduction in which the two sets of variables are derived from the data and the class labels, respectively. It is well-known that CCA can be formulated as a least-squares problem in the binary class case. However, the extension to the more general setting remains unclear. In this paper, we show that under a mild condition which tends to hold for high-dimensional data, CCA in the multilabel case can be formulated as a least-squares problem. Based on this equivalence relationship, efficient algorithms for solving least-squares problems can be applied to scale CCA to very large data sets. In addition, we propose several CCA extensions, including the sparse CCA formulation based on l1-norm regularization. We further extend the least-squares formulation to partial least squares. In addition, we show that the CCA projection for one set of variables is independent of the regularization on the other set of multidimensional variables, providing new insights on the effect of regularization on CCA. We have conducted experiments using benchmark data sets. Experiments on multilabel data sets confirm the established equivalence relationships. Results also demonstrate the effectiveness and efficiency of the proposed CCA extensions.
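
A small illustration of the setting (not a proof of the equivalence, which requires the paper's rank condition): CCA between high-dimensional data and a label indicator matrix, alongside the plain least-squares computation that the paper shows can replace it at scale:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, d, c = 60, 200, 3                        # high-dimensional: d >> n
y = rng.integers(0, c, n)
X = rng.normal(size=(n, d)) + 2.0 * np.eye(c)[y] @ rng.normal(size=(c, d))
Y = np.eye(c)[y]                            # label indicator matrix

# CCA between data and labels acts as supervised dimensionality reduction.
cca = CCA(n_components=c - 1)
Xc, Yc = cca.fit_transform(X, Y)
print(Xc.shape)                             # (60, 2) discriminative scores

# The scalable alternative discussed above: an ordinary least-squares fit
# against the centered indicator matrix, solvable with standard solvers.
W = np.linalg.lstsq(X - X.mean(0), Y - Y.mean(0), rcond=None)[0]
print(((X - X.mean(0)) @ W).shape)          # least-squares scores
```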

Journal ArticleDOI
TL;DR: A new human face recognition algorithm based on bidirectional two-dimensional principal component analysis (B2DPCA) and the extreme learning machine (ELM), in which the subband exhibiting the maximum standard deviation is dimensionally reduced using an improved dimensionality reduction technique.

Journal ArticleDOI
TL;DR: This work proposes "supervised principal component analysis (supervised PCA)", a generalization of PCA that is uniquely effective for regression and classification problems with high-dimensional input data and shows significant improvement over other supervised approaches both in accuracy and computational efficiency.

Proceedings Article
12 Dec 2011
TL;DR: An algorithm called Sparse Manifold Clustering and Embedding (SMCE) for simultaneous clustering and dimensionality reduction of data lying in multiple nonlinear manifolds finds a small neighborhood around each data point and connects each point to its neighbors with appropriate weights.
Abstract: We propose an algorithm called Sparse Manifold Clustering and Embedding (SMCE) for simultaneous clustering and dimensionality reduction of data lying in multiple nonlinear manifolds. Similar to most dimensionality reduction methods, SMCE finds a small neighborhood around each data point and connects each point to its neighbors with appropriate weights. The key difference is that SMCE finds both the neighbors and the weights automatically. This is done by solving a sparse optimization problem, which encourages selecting nearby points that lie in the same manifold and approximately span a low-dimensional affine subspace. The optimal solution encodes information that can be used for clustering and dimensionality reduction using spectral clustering and embedding. Moreover, the size of the optimal neighborhood of a data point, which can be different for different points, provides an estimate of the dimension of the manifold to which the point belongs. Experiments demonstrate that our method can effectively handle multiple manifolds that are very close to each other, manifolds with non-uniform sampling and holes, as well as estimate the intrinsic dimensions of the manifolds.

Journal ArticleDOI
TL;DR: It is proposed that when dimensionality reduction is performed on trajectory data one should think of the resultant embedding as a quickly sketched set of directions rather than a road map, because some features of the free-energy surface are inherently high-dimensional.
Abstract: A new scheme, sketch-map, for obtaining a low-dimensional representation of the region of phase space explored during an enhanced dynamics simulation is proposed. We show evidence, from an examination of the distribution of pairwise distances between frames, that some features of the free-energy surface are inherently high-dimensional. This makes dimensionality reduction problematic because the data does not satisfy the assumptions made in conventional manifold learning algorithms. We therefore propose that when dimensionality reduction is performed on trajectory data one should think of the resultant embedding as a quickly sketched set of directions rather than a road map. In other words, the embedding tells one about the connectivity between states but does not provide the vectors that correspond to the slow degrees of freedom. This realization informs the development of sketch-map, which endeavors to reproduce the proximity information from the high-dimensional description in a space of lower dimensionality even when a faithful embedding is not possible.

Journal ArticleDOI
TL;DR: Under a spiked covariance model, a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse is proposed, and it is found that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings.
Abstract: Principal component analysis (PCA) is a classical dimension reduction method which projects data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. However, it behaves poorly when the number of features p is comparable to, or even much larger than, the sample size n. In this paper, we propose a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse. Under a spiked covariance model, we find that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings. Simulated examples also demonstrate its competitive performance.
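
The idea can be caricatured in a few lines: power iteration on the sample covariance, hard-thresholding the iterate to its k largest coordinates at every step. This is a simplified cousin of the paper's procedure, run here on synthetic spiked data:

```python
import numpy as np

def thresholded_power(S, k, n_iter=100, seed=0):
    """Approximate k-sparse leading eigenvector of S: power iteration
    with hard thresholding to the k largest-magnitude coordinates."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=S.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = S @ v
        small = np.argsort(np.abs(v))[:-k]    # all but the k largest
        v[small] = 0.0
        v /= np.linalg.norm(v)
    return v

# Spiked covariance model: one sparse spike plus isotropic noise.
rng = np.random.default_rng(1)
p, n, k = 200, 80, 5
u = np.zeros(p)
u[:k] = 1.0 / np.sqrt(k)                      # sparse leading eigenvector
X = rng.normal(size=(n, p)) + 3.0 * rng.normal(size=(n, 1)) * u
S = X.T @ X / n                               # sample covariance
v = thresholded_power(S, k)
print(np.flatnonzero(v), abs(v @ u))          # support ~ {0..4}, overlap ~ 1
```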

Journal ArticleDOI
TL;DR: A new supervised band-selection algorithm that uses the known class signatures only without examining the original bands or the need of class training samples is proposed, which can complete the task much faster than traditional methods that test bands or band combinations.
Abstract: Band selection is often applied to reduce the dimensionality of hyperspectral imagery. When the desired object information is known, it can be achieved by finding the bands that contain the most object information. It is expected that these bands can provide an overall satisfactory detection and classification performance. In this letter, we propose a new supervised band-selection algorithm that uses the known class signatures only without examining the original bands or the need of class training samples. Thus, it can complete the task much faster than traditional methods that test bands or band combinations. The experimental result shows that our approach can generally yield better results than other popular supervised band-selection methods in the literature.

Journal ArticleDOI
TL;DR: The proposed approach generalizes the framework of multiple kernel learning for dimensionality reduction, and distinguishes itself with the following three main contributions: first, the method provides the convenience of using diverse image descriptors to describe useful characteristics of various aspects about the underlying data, and consequently improves their effectiveness.
Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. The resulting data representations are typically high-dimensional and assume diverse forms. Hence, finding a way of transforming them into a unified space of lower dimension generally facilitates the underlying tasks such as object recognition or clustering. To this end, the proposed approach (termed MKL-DR) generalizes the framework of multiple kernel learning for dimensionality reduction, and distinguishes itself with the following three main contributions: First, our method provides the convenience of using diverse image descriptors to describe useful characteristics of various aspects of the underlying data. Second, it extends a broad set of existing dimensionality reduction techniques to consider multiple kernel learning, and consequently improves their effectiveness. Third, by focusing on the techniques pertaining to dimensionality reduction, the formulation introduces a new class of applications with the multiple kernel learning framework to address not only the supervised learning problems but also the unsupervised and semi-supervised ones.

Journal ArticleDOI
TL;DR: A novel algorithm, Pareto corner search evolutionary algorithm (PCSEA), is introduced in this paper, which searches for the corners of the Pareto front instead of searching for the complete Pareto front to identify the relevant objectives.
Abstract: Many-objective optimization refers to optimization problems containing a large number of objectives, typically more than four. Non-dominance is an inadequate strategy for convergence to the Pareto front for such problems, as almost all solutions in the population become non-dominated, resulting in loss of convergence pressure. However, for some problems, it may be possible to generate the Pareto front using only a few of the objectives, rendering the rest of the objectives redundant. Such problems may be reducible to a manageable number of relevant objectives, which can be optimized using conventional multiobjective evolutionary algorithms (MOEAs). For dimensionality reduction, most proposals in the literature rely on analysis of a representative set of solutions obtained by running a conventional MOEA for a large number of generations, which is computationally expensive. A novel algorithm, Pareto corner search evolutionary algorithm (PCSEA), is introduced in this paper, which searches for the corners of the Pareto front instead of searching for the complete Pareto front. The solutions obtained using PCSEA are then used for dimensionality reduction to identify the relevant objectives. The potential of the proposed approach is demonstrated by studying its performance on a set of benchmark test problems and two engineering examples. While the preliminary results obtained using PCSEA are promising, there are a number of areas that need further investigation. This paper provides a number of useful insights into dimensionality reduction and, in particular, highlights some of the roadblocks that need to be cleared for future development of algorithms attempting to use a few selected solutions for identifying relevant objectives.

Proceedings ArticleDOI
20 Jun 2011
TL;DR: Experimental results on a very large database show that the KPLS is significantly better than the popular SVM method, and outperforms the state-of-the-art approaches in human age estimation.
Abstract: Human age estimation has recently become an active research topic in computer vision and pattern recognition, because of many potential applications in reality. In this paper we propose to use the kernel partial least squares (KPLS) regression for age estimation. The KPLS (or linear PLS) method has several advantages over previous approaches: (1) the KPLS can reduce feature dimensionality and learn the aging function simultaneously in a single learning framework, instead of performing each task separately using different techniques; (2) the KPLS can find a small number of latent variables, e.g., 20, to project thousands of features into a very low-dimensional subspace, which may have great impact on real-time applications; and (3) the KPLS regression has an output vector that can contain multiple labels, so that several related problems, e.g., age estimation, gender classification, and ethnicity estimation can be solved altogether. This is the first time that the kernel PLS method is introduced and applied to solve a regression problem in computer vision with high accuracy. Experimental results on a very large database show that the KPLS is significantly better than the popular SVM method, and outperforms the state-of-the-art approaches in human age estimation.

Journal Article
TL;DR: A novel theoretical understanding of principal curves and surfaces, practical algorithms as general purpose machine learning tools, and applications of these algorithms to several practical problems are presented.
Abstract: Principal curves are defined as self-consistent smooth curves passing through the middle of the data, and they have been used in many applications of machine learning as a generalization, dimensionality reduction and a feature extraction tool. We redefine principal curves and surfaces in terms of the gradient and the Hessian of the probability density estimate. This provides a geometric understanding of the principal curves and surfaces, as well as a unifying view for clustering, principal curve fitting and manifold learning by regarding those as principal manifolds of different intrinsic dimensionalities. The theory does not impose any particular density estimation method and can be used with any density estimator that gives continuous first and second derivatives. Therefore, we first present our principal curve/surface definition without assuming any particular density estimation method. Afterwards, we develop practical algorithms for the commonly used kernel density estimation (KDE) and Gaussian mixture models (GMM). Results of these algorithms are presented on notional data sets as well as in real applications with comparisons to other approaches in the principal curve literature. All in all, we present a novel theoretical understanding of principal curves and surfaces, practical algorithms as general purpose machine learning tools, and applications of these algorithms to several practical problems.

Proceedings ArticleDOI
16 Jul 2011
TL;DR: This paper reformulates the subspace learning problem and uses the L2,1-norm on the projection matrix to achieve row-sparsity, which leads to selecting relevant features and learning a transformation simultaneously.
Abstract: Dimensionality reduction is a very important topic in machine learning. It can be generally classified into two categories: feature selection and subspace learning. In the past decades, many methods have been proposed for dimensionality reduction. However, most of these works study feature selection and subspace learning independently. In this paper, we present a framework for joint feature selection and subspace learning. We reformulate the subspace learning problem and use the L2,1-norm on the projection matrix to achieve row-sparsity, which leads to selecting relevant features and learning a transformation simultaneously. We discuss two situations of the proposed framework, and present their optimization algorithms. Experiments on benchmark face recognition data sets illustrate that the proposed framework overwhelmingly outperforms state-of-the-art methods.

BookDOI
20 Dec 2011
TL;DR: Comprehensive in its coverage, this pioneering work explores this novel modality from algorithm creation to successful implementation of manifold learning, offering examples of applications in medical, biometrics, multimedia, and computer vision.
Abstract: Trained to extract actionable information from large volumes of high-dimensional data, engineers and scientists often have trouble isolating meaningful low-dimensional structures hidden in their high-dimensional observations. Manifold learning, a groundbreaking technique designed to tackle these issues of dimensionality reduction, finds widespread application in machine learning, neural networks, pattern recognition, image processing, and computer vision. Filling a void in the literature, Manifold Learning Theory and Applications incorporates state-of-the-art techniques in manifold learning with a solid theoretical and practical treatment of the subject. Comprehensive in its coverage, this pioneering work explores this novel modality from algorithm creation to successful implementation, offering examples of applications in the medical, biometrics, multimedia, and computer vision domains. Emphasizing implementation, it highlights the various permutations of manifold learning in industry, including manifold optimization, large-scale manifold learning, semidefinite programming for embedding, manifold models for signal acquisition, compression and processing, and multiscale manifold learning. Beginning with an introduction to manifold learning theories and applications, the book includes discussions on the relevance to nonlinear dimensionality reduction, clustering, graph-based subspace learning, spectral learning and embedding, extensions, and multi-manifold modeling. It synthesizes cross-domain knowledge for interdisciplinary instruction and offers a rich set of specialized topics contributed by expert professionals and researchers from a variety of fields. Finally, the book discusses specific algorithms and methodologies using case studies to apply manifold learning for real-world problems.

Journal ArticleDOI
01 Feb 2011
TL;DR: Spectral Regression Kernel Discriminant Analysis is presented, which casts discriminant analysis into a regression framework, facilitating both efficient computation and the use of regularization techniques.
Abstract: Linear discriminant analysis (LDA) has been a popular method for dimensionality reduction, which preserves class separability. The projection vectors are commonly obtained by maximizing the between-class covariance and simultaneously minimizing the within-class covariance. LDA can be performed either in the original input space or in the reproducing kernel Hilbert space (RKHS) into which data points are mapped, which leads to kernel discriminant analysis (KDA). When the data are highly nonlinearly distributed, KDA can achieve better performance than LDA. However, computing the projective functions in KDA involves eigen-decomposition of the kernel matrix, which is very expensive when a large number of training samples exist. In this paper, we present a new algorithm for kernel discriminant analysis, called Spectral Regression Kernel Discriminant Analysis (SRKDA). By using spectral graph analysis, SRKDA casts discriminant analysis into a regression framework, which facilitates both efficient computation and the use of regularization techniques. Specifically, SRKDA only needs to solve a set of regularized regression problems, and there is no eigenvector computation involved, which greatly reduces the computational cost. The new formulation makes it very easy to develop an incremental version of the algorithm, which can fully utilize the computational results of the existing training samples. Moreover, it is easy to produce sparse projections (Sparse KDA) with an L1-norm regularizer. Extensive experiments on spoken letter, handwritten digit image and face image data demonstrate the effectiveness and efficiency of the proposed algorithm.
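
The computational trick is replacing the kernel eigenproblem with regularized regressions against class-derived responses. A simplified sketch of that idea (we merely center the class indicators and drop a redundant column; the paper derives and orthogonalizes the responses properly):

```python
import numpy as np

def spectral_regression_kda(K, y, delta=0.01):
    """Regularized kernel regression against class-indicator responses,
    the core device of SRKDA; no eigendecomposition is needed."""
    n = K.shape[0]
    T = np.stack([(y == c).astype(float) for c in np.unique(y)], axis=1)
    T = (T - T.mean(0))[:, :-1]          # c-1 centered responses
    return np.linalg.solve(K + delta * np.eye(n), T)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(4, 1, (40, 2))])
y = np.repeat([0, 1], 40)
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-0.5 * d2)                    # RBF kernel matrix
A = spectral_regression_kda(K, y)
Z = K @ A                                # 1-D discriminant embedding
print(Z[:40].mean(), Z[40:].mean())      # the two classes separate
```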

Journal ArticleDOI
TL;DR: The proposed DEFS is used to search for optimal subsets of features in datasets with varying dimensionality and is then utilized to aid in the selection of Wavelet Packet Transform best basis for classification problems, thus acting as a part of a feature extraction process.
Abstract: One of the fundamental motivations for feature selection is to overcome the curse of dimensionality problem. This paper presents a novel feature selection method utilizing a combination of differential evolution (DE) optimization method and a proposed repair mechanism based on feature distribution measures. The new method, abbreviated as DEFS, utilizes the DE float number optimizer in the combinatorial optimization problem of feature selection. In order to make the solutions generated by the float-optimizer suitable for feature selection, a roulette wheel structure is constructed and supplied with the probabilities of features distribution. These probabilities are constructed during iterations by identifying the features that contribute to the most promising solutions. The proposed DEFS is used to search for optimal subsets of features in datasets with varying dimensionality. It is then utilized to aid in the selection of Wavelet Packet Transform (WPT) best basis for classification problems, thus acting as a part of a feature extraction process. Practical results indicate the significance of the proposed method in comparison with other feature selection methods.
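
A generic stand-in showing the "float optimizer for a combinatorial subset problem" idea with SciPy's differential evolution; this omits the paper's roulette-wheel repair mechanism, and the decoding of floats into a top-k subset is our choice, not DEFS's:

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 20))
y = (X[:, 0] + X[:, 3] - X[:, 7] > 0).astype(int)   # features 0, 3, 7 matter
k = 3                                               # subset size to select

def fitness(w):
    # Decode a float vector into a feature subset (the k largest weights)
    # and score it with cross-validated KNN accuracy; DE minimizes.
    subset = np.argsort(w)[-k:]
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, subset], y, cv=3).mean()
    return -acc

res = differential_evolution(fitness, bounds=[(0, 1)] * X.shape[1],
                             maxiter=20, seed=0, polish=False)
print(np.sort(np.argsort(res.x)[-k:]))              # typically [0 3 7]
```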

Journal ArticleDOI
TL;DR: By using a series of equivalent transformations, it is shown that MEN is equivalent to the lasso penalized least squares problem, and thus LARS is adopted to obtain the optimal sparse solution of MEN.
Abstract: It is difficult to find the optimal sparse solution of a manifold learning based dimensionality reduction algorithm. The lasso or the elastic net penalized manifold learning based dimensionality reduction is not directly a lasso penalized least squares problem, and thus the least angle regression (LARS) (Efron et al., Ann Stat 32(2):407–499, 2004), one of the most popular algorithms in sparse learning, cannot be applied. Therefore, most current approaches take indirect ways or have strict settings, which can be inconvenient for applications. In this paper, we propose the manifold elastic net, or MEN for short. MEN incorporates the merits of both manifold learning based dimensionality reduction and sparse learning based dimensionality reduction. By using a series of equivalent transformations, we show that MEN is equivalent to the lasso penalized least squares problem, and thus LARS is adopted to obtain the optimal sparse solution of MEN. In particular, MEN has the following advantages for subsequent classification: (1) the local geometry of samples is well preserved for low dimensional data representation, (2) both the margin maximization and the classification error minimization are considered for sparse projection calculation, (3) the projection matrix of MEN improves the parsimony in computation, (4) the elastic net penalty reduces the over-fitting problem, and (5) the projection matrix of MEN can be interpreted psychologically and physiologically. Experimental evidence on face recognition over various popular datasets suggests that MEN is superior to top-level dimensionality reduction algorithms.

Journal ArticleDOI
TL;DR: All the eigenvalue problems solved in the context of explicit linear projections can be viewed as the projected analogues of the nonlinear or implicit projections, including kernels as a means of unifying linear and nonlinear methods.
Abstract: This paper gives an overview of the eigenvalue problems encountered in areas of data mining that are related to dimension reduction. Given some input high-dimensional data, the goal of dimension reduction is to map them to a low-dimensional space such that certain properties of the original data are preserved. Optimizing these properties among the reduced data can typically be posed as a trace optimization problem that leads to an eigenvalue problem. There is a rich variety of such problems, and the goal of this paper is to unravel relations between them as well as to discuss effective solution techniques. First, we make a distinction between projective methods that determine an explicit linear mapping from the high-dimensional space to the low-dimensional space, and nonlinear methods where the mapping between the two is nonlinear and implicit. Then, we show that all of the eigenvalue problems solved in the context of explicit linear projections can be viewed as the projected analogues of the nonlinear or implicit projections. We also discuss kernels as a means of unifying linear and nonlinear methods and revisit some of the equivalences between methods established in this way. Finally, we provide some illustrative examples to showcase the behavior and the particular characteristics of the various dimension reduction techniques on real-world data sets.
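
The template behind most of the methods surveyed fits in a few lines: maximize a trace subject to an orthogonality constraint, solved by an eigendecomposition, with the generalized variant covering methods such as LDA. A sketch with PCA as the instance (the data and function names are ours):

```python
import numpy as np
from scipy.linalg import eigh

def trace_maximize(A, k):
    """argmax_{V^T V = I} tr(V^T A V): the top-k eigenvectors of A."""
    w, V = np.linalg.eigh(A)                  # eigenvalues ascending
    return V[:, ::-1][:, :k]

def trace_maximize_generalized(A, B, k):
    """argmax tr(V^T A V) s.t. V^T B V = I, a generalized symmetric
    eigenproblem (covers e.g. LDA with between/within-class scatter)."""
    w, V = eigh(A, B)
    return V[:, ::-1][:, :k]

# PCA as an instance: A is the sample covariance matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))
Xc = X - X.mean(axis=0)
V = trace_maximize(Xc.T @ Xc / len(Xc), k=2)
print((Xc @ V).shape)                         # the 2-D PCA embedding
```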