
Showing papers on "Dimensionality reduction published in 2008"


Journal Article
TL;DR: A new technique called t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map; it is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.
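As a minimal usage sketch, a later open-source reimplementation of t-SNE in scikit-learn can reproduce the basic workflow described above; the digits dataset and parameter values below are illustrative choices, not the paper's experimental setup.

```python
# Minimal sketch: embed 64-dimensional digit images in 2-D with scikit-learn's
# t-SNE implementation (a later reimplementation, not the authors' original code).
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)        # 1797 samples, 64 dimensions
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
X_2d = tsne.fit_transform(X)               # each datapoint gets a 2-D map location
print(X_2d.shape)                          # (1797, 2)
```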

30,124 citations


Journal ArticleDOI
TL;DR: In this article, the authors introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size.
Abstract: Summary. Variable selection plays an important role in high dimensional statistical modelling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, accuracy of estimation and computational cost are two top concerns. Recently, Candes and Tao have proposed the Dantzig selector using L1-regularization and showed that it achieves the ideal risk up to a logarithmic factor log (p). Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh as the factor log (p) can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, correlation learning is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, iterative sure independence screening is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be accomplished by a well-developed method such as smoothly clipped absolute deviation, the Dantzig selector, lasso or adaptive lasso. The connections between these penalized least squares methods are also elucidated.
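The screening step itself is simple to sketch: rank predictors by marginal correlation with the response and keep the top d of them. The NumPy sketch below illustrates that idea under the common choice d ≈ n/log(n); the function name and synthetic data are not from the paper.

```python
# Rough sketch of correlation-based sure independence screening (SIS):
# rank features by absolute marginal correlation with y and keep the top d.
import numpy as np

def sis(X, y, d=None):
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))                  # a commonly used screening size
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    omega = np.abs(Xs.T @ ys) / n               # componentwise marginal correlations
    return np.argsort(omega)[::-1][:d]          # indices of the d top-ranked features

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5000))            # p much larger than n
y = X[:, 0] - 2 * X[:, 1] + rng.standard_normal(200)
print(sis(X, y, d=20))                          # should retain features 0 and 1
```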

2,204 citations


Book
17 Jul 2008
TL;DR: The purpose of the book is to summarize clear facts and ideas about well-known methods as well as recent developments in the topic of nonlinear dimensionality reduction, including the spectral embedding framework that encompasses many of the recently developed methods.
Abstract: Methods of dimensionality reduction provide a way to understand and visualize the structure of complex data sets. Traditional methods like principal component analysis and classical metric multidimensional scaling suffer from being based on linear models. Until recently, very few methods were able to reduce the data dimensionality in a nonlinear way. However, since the late nineties, many new methods have been developed and nonlinear dimensionality reduction, also called manifold learning, has become a hot topic. New advances that account for this rapid growth are, e.g., the use of graphs to represent the manifold topology, and the use of new metrics like the geodesic distance. In addition, new optimization schemes, based on kernel techniques and spectral decomposition, have led to spectral embedding, which encompasses many of the recently developed methods. This book describes existing and advanced methods to reduce the dimensionality of numerical databases. For each method, the description starts from intuitive ideas, develops the necessary mathematical details, and ends by outlining the algorithmic implementation. Methods are compared with each other with the help of different illustrative examples. The purpose of the book is to summarize clear facts and ideas about well-known methods as well as recent developments in the topic of nonlinear dimensionality reduction. With this goal in mind, methods are all described from a unifying point of view, in order to highlight their respective strengths and shortcomings. The book is primarily intended for statisticians, computer scientists and data analysts. It is also accessible to other practitioners having a basic background in statistics and/or computational learning, like psychologists (in psychometry) and economists.

1,435 citations


Journal ArticleDOI
TL;DR: This paper introduces a new minimum mean square error-based approach to infer the signal subspace in hyperspectral imagery, which is eigen decomposition based, unsupervised, and fully automatic.
Abstract: Signal subspace identification is a crucial first step in many hyperspectral processing algorithms such as target detection, change detection, classification, and unmixing. The identification of this subspace enables a correct dimensionality reduction, yielding gains in algorithm performance and complexity and in data storage. This paper introduces a new minimum mean square error-based approach to infer the signal subspace in hyperspectral imagery. The method, which is termed hyperspectral signal identification by minimum error, is eigen decomposition based, unsupervised, and fully automatic (i.e., it does not depend on any tuning parameters). It first estimates the signal and noise correlation matrices and then selects the subset of eigenvalues that best represents the signal subspace in the least squared error sense. State-of-the-art performance of the proposed method is illustrated by using simulated and real hyperspectral images.

1,154 citations


Journal ArticleDOI
TL;DR: An approach has been proposed which is based on using several principal components from the hyperspectral data to build morphological profiles, which can be used all together in one extended morphological profile for classification of urban structures.
Abstract: A method is proposed for the classification of urban hyperspectral data with high spatial resolution. The approach is an extension of previous approaches and uses both the spatial and spectral information for classification. One previous approach is based on using several principal components (PCs) from the hyperspectral data and building several morphological profiles (MPs). These profiles can be used all together in one extended MP. A shortcoming of that approach is that it was primarily designed for classification of urban structures and it does not fully utilize the spectral information in the data. Similarly, the commonly used pixelwise classification of hyperspectral data is solely based on the spectral content and lacks information on the structure of the features in the image. The proposed method overcomes these problems and is based on the fusion of the morphological information and the original hyperspectral data, i.e., the two vectors of attributes are concatenated into one feature vector. After a reduction of the dimensionality, the final classification is achieved by using a support vector machine classifier. The proposed approach is tested in experiments on ROSIS data from urban areas. Significant improvements are achieved in terms of accuracies when compared to results obtained for approaches based on the use of MPs based on PCs only and conventional spectral classification. For instance, with one data set, the overall accuracy is increased from 79% to 83% without any feature reduction and to 87% with feature reduction. The proposed approach also shows excellent results with a limited training set.
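The fusion idea can be sketched schematically: compute a few principal components, build simple morphological profiles on them, concatenate the profiles with the original spectra, and train an SVM. The sketch below uses synthetic data and plain grayscale opening/closing rather than the reconstruction-based operators and ROSIS imagery used in the paper, so it only illustrates the data flow.

```python
# Schematic sketch: spectral + morphological feature fusion followed by an SVM.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from skimage.morphology import opening, closing, disk

rng = np.random.default_rng(0)
cube = rng.random((50, 50, 30))                     # rows x cols x bands (synthetic)
labels = rng.integers(0, 3, size=(50, 50))          # synthetic class map

flat = cube.reshape(-1, cube.shape[2])
pcs = PCA(n_components=2).fit_transform(flat).reshape(50, 50, 2)

profiles = []
for k in range(pcs.shape[2]):                       # morphological profile per PC
    band = pcs[..., k]
    for r in (1, 2, 3):                             # increasing structuring elements
        profiles.append(opening(band, disk(r)))
        profiles.append(closing(band, disk(r)))
morph = np.stack(profiles, axis=-1).reshape(-1, len(profiles))

features = np.hstack([flat, morph])                 # concatenated feature vector
clf = SVC(kernel="rbf").fit(features, labels.ravel())
print(clf.score(features, labels.ravel()))
```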

1,092 citations


Journal ArticleDOI
TL;DR: It is shown that even without a fully optimized design, an MPCA-based gait recognition module achieves highly competitive performance and compares favorably to the state-of-the-art gait recognizers.
Abstract: This paper introduces a multilinear principal component analysis (MPCA) framework for tensor object feature extraction. Objects of interest in many computer vision and pattern recognition applications, such as 2D/3D images and video sequences are naturally described as tensors or multilinear arrays. The proposed framework performs feature extraction by determining a multilinear projection that captures most of the original tensorial input variation. The solution is iterative in nature and it proceeds by decomposing the original problem to a series of multiple projection subproblems. As part of this work, methods for subspace dimensionality determination are proposed and analyzed. It is shown that the MPCA framework discussed in this work supplants existing heterogeneous solutions such as the classical principal component analysis (PCA) and its 2D variant (2D PCA). Finally, a tensor object recognition system is proposed with the introduction of a discriminative tensor feature selection mechanism and a novel classification strategy, and applied to the problem of gait recognition. Results presented here indicate MPCA's utility as a feature extraction tool. It is shown that even without a fully optimized design, an MPCA-based gait recognition module achieves highly competitive performance and compares favorably to the state-of-the-art gait recognizers.
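The multilinear projection idea can be illustrated with a single pass of mode-wise eigendecompositions on a synthetic set of third-order tensors. This is only an initialization-style step of an MPCA-like procedure, without the iterative refinement or the subspace dimensionality determination the paper proposes; all names and sizes below are made up for illustration.

```python
# Toy sketch: project a set of 20x30x5 tensors to 5x8x3 with one pass of
# mode-wise eigendecompositions (no convergence loop, not the authors' MPCA code).
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_multiply(T, U, mode):
    return np.moveaxis(np.tensordot(U, T, axes=(1, mode)), 0, mode)

rng = np.random.default_rng(0)
samples = rng.random((100, 20, 30, 5))           # 100 tensor objects of size 20x30x5
centered = samples - samples.mean(axis=0)
ranks = (5, 8, 3)                                # target multilinear dimensions

factors = []
for mode in range(3):                            # tensor modes (axis 0 indexes samples)
    scatter = sum(unfold(x, mode) @ unfold(x, mode).T for x in centered)
    eigvals, eigvecs = np.linalg.eigh(scatter)
    factors.append(eigvecs[:, ::-1][:, :ranks[mode]])   # leading eigenvectors

projected = centered
for mode, U in enumerate(factors):
    projected = mode_multiply(projected, U.T, mode + 1)  # shift by 1 for sample axis
print(projected.shape)                           # (100, 5, 8, 3)
```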

856 citations


Journal ArticleDOI
TL;DR: This work introduces a novel vocabulary using dense color SIFT descriptors and investigates the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM).
Abstract: We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent "topics" using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently, training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual word representation, in all cases, using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos.
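The pipeline described above can be sketched end to end: quantize local descriptors into a visual vocabulary, form bag-of-words histograms, reduce them to a topic representation, and train a discriminative classifier. scikit-learn ships no pLSA, so NMF with a KL loss stands in for the topic model, and random vectors stand in for dense color SIFT descriptors; treat this purely as a schematic.

```python
# Schematic: visual words -> bag-of-words -> latent topics -> SVM classifier.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_images, descr_per_image, vocab_size, n_topics = 60, 200, 100, 10
descriptors = [rng.random((descr_per_image, 128)) for _ in range(n_images)]  # fake SIFT
labels = rng.integers(0, 4, n_images)                                        # fake scene labels

codebook = KMeans(n_clusters=vocab_size, n_init=5, random_state=0).fit(np.vstack(descriptors))
bow = np.array([np.bincount(codebook.predict(d), minlength=vocab_size) for d in descriptors])

topic_model = NMF(n_components=n_topics, beta_loss="kullback-leibler",
                  solver="mu", max_iter=400, random_state=0)
z = topic_model.fit_transform(bow.astype(float))   # per-image topic vector

clf = SVC(kernel="rbf").fit(z, labels)             # classify in the reduced topic space
print(clf.score(z, labels))
```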

778 citations


Journal ArticleDOI
TL;DR: Sparse PCA via regularized SVD (sPCA-rSVD) provides a uniform treatment of both classical multivariate data and high-dimension-low-sample-size (HDLSS) data, and comparisons suggest that sPCA-rSVD provides competitive results.

730 citations


Proceedings Article
13 Jul 2008
TL;DR: A new dimensionality reduction method is proposed to find a latent space that minimizes the distance between the distributions of the data in different domains; this latent space can be treated as a bridge for transferring knowledge from the source domain to the target domain.
Abstract: Transfer learning addresses the problem of how to utilize plenty of labeled data in a source domain to solve related but different problems in a target domain, even when the training and testing problems have different distributions or features. In this paper, we consider transfer learning via dimensionality reduction. To solve this problem, we learn a low-dimensional latent feature space where the distributions between the source domain data and the target domain data are the same or close to each other. Onto this latent feature space, we project the data in related domains where we can apply standard learning algorithms to train classification or regression models. Thus, the latent feature space can be treated as a bridge of transferring knowledge from the source domain to the target domain. The main contribution of our work is that we propose a new dimensionality reduction method to find a latent space, which minimizes the distance between distributions of the data in different domains in a latent space. The effectiveness of our approach to transfer learning is verified by experiments in two real world applications: indoor WiFi localization and binary text classification.

640 citations


Book ChapterDOI
15 Sep 2008
TL;DR: Ensemble feature selection techniques are shown to hold great promise for high-dimensional domains with small sample sizes and to provide more robust feature subsets than a single feature selection technique.
Abstract: Robustness or stability of feature selection techniques is a topic of recent interest, and is an important issue when selected feature subsets are subsequently analysed by domain experts to gain more insight into the problem modelled. In this work, we investigate the use of ensemble feature selection techniques, where multiple feature selection methods are combined to yield more robust results. We show that these techniques show great promise for high-dimensional domains with small sample sizes, and provide more robust feature subsets than a single feature selection technique. In addition, we also investigate the effect of ensemble feature selection techniques on classification performance, giving rise to a new model selection strategy.
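A minimal sketch of the ensemble idea, assuming a univariate F-score ranker and simple rank averaging over bootstrap resamples (the specific rankers and aggregation rules studied in the chapter may differ):

```python
# Sketch: aggregate feature rankings from bootstrap resamples for more stable selection.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

X, y = make_classification(n_samples=60, n_features=500, n_informative=10, random_state=0)
rng = np.random.default_rng(0)

rank_sum = np.zeros(X.shape[1])
n_bootstraps = 30
for _ in range(n_bootstraps):
    idx = rng.integers(0, len(y), len(y))           # bootstrap resample
    scores, _ = f_classif(X[idx], y[idx])
    rank_sum += np.argsort(np.argsort(-scores))     # per-resample rank (0 = best)

ensemble_ranking = np.argsort(rank_sum)             # features ordered by average rank
print(ensemble_ranking[:10])                        # a more robust top-10 subset
```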

587 citations


Journal ArticleDOI
08 Dec 2008
TL;DR: In this article, Gaussian-process factor analysis (GPFA) is proposed to unify the smoothing and dimensionality reduction operations in a common probabilistic framework, and is applied to the activity of 61 neurons recorded simultaneously in macaque premotor and motor cortices.
Abstract: We consider the problem of extracting smooth, low-dimensional neural trajectories that summarize the activity recorded simultaneously from tens to hundreds of neurons on individual experimental trials. Current methods for extracting neural trajectories involve a two-stage process: the data are first "denoised" by smoothing over time, then a static dimensionality reduction technique is applied. We first describe extensions of the two-stage methods that allow the degree of smoothing to be chosen in a principled way, and account for spiking variability that may vary both across neurons and across time. We then present a novel method for extracting neural trajectories, Gaussian-process factor analysis (GPFA), which unifies the smoothing and dimensionality reduction operations in a common probabilistic framework. We applied these methods to the activity of 61 neurons recorded simultaneously in macaque premotor and motor cortices during reach planning and execution. By adopting a goodness-of-fit metric that measures how well the activity of each neuron can be predicted by all other recorded neurons, we found that GPFA provided a better characterization of the population activity than the two-stage methods.
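GPFA itself is not reproduced here, but the two-stage baseline the paper describes (smooth over time, then apply a static dimensionality reduction) is easy to sketch on synthetic spike counts; the kernel width, latent dimensionality, and data generation below are illustrative.

```python
# Sketch of the two-stage baseline: temporal smoothing followed by factor analysis.
import numpy as np
from scipy.ndimage import gaussian_filter1d
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_neurons, n_bins = 61, 200
latent = 0.1 * np.cumsum(rng.standard_normal((3, n_bins)), axis=1)   # slow latent drift
rates = np.exp(1.0 + rng.standard_normal((n_neurons, 3)) @ latent)   # per-neuron firing rates
counts = rng.poisson(rates)                                          # neurons x time bins

smoothed = gaussian_filter1d(counts.astype(float), sigma=5, axis=1)  # stage 1: denoise
trajectory = FactorAnalysis(n_components=3).fit_transform(smoothed.T)  # stage 2: reduce
print(trajectory.shape)                                              # (n_bins, 3) neural trajectory
```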

Journal ArticleDOI
TL;DR: A novel and efficient approach to dense image registration, which does not require a derivative of the employed cost function, is introduced; efficient linear programming using primal-dual principles is considered to recover the lowest potential of the cost function.

Journal ArticleDOI
TL;DR: By using spectral graph analysis, SRDA casts discriminant analysis into a regression framework that facilitates both efficient computation and the use of regularization techniques; no eigenvector computation is involved, which is a huge saving of both time and memory.
Abstract: Linear Discriminant Analysis (LDA) has been a popular method for extracting features that preserve class separability. The projection functions of LDA are commonly obtained by maximizing the between-class covariance and simultaneously minimizing the within-class covariance. It has been widely used in many fields of information processing, such as machine learning, data mining, information retrieval, and pattern recognition. However, the computation of LDA involves dense matrix eigendecomposition, which can be computationally expensive in both time and memory. Specifically, LDA has O(mnt + t³) time complexity and requires O(mn + mt + nt) memory, where m is the number of samples, n is the number of features, and t = min(m, n). When both m and n are large, it is infeasible to apply LDA. In this paper, we propose a novel algorithm for discriminant analysis, called Spectral Regression Discriminant Analysis (SRDA). By using spectral graph analysis, SRDA casts discriminant analysis into a regression framework that facilitates both efficient computation and the use of regularization techniques. Specifically, SRDA only needs to solve a set of regularized least squares problems, and there is no eigenvector computation involved, which is a huge saving of both time and memory. Our theoretical analysis shows that SRDA can be computed with O(mn) time and O(ms) memory, where s (s ≤ n) is the average number of nonzero features in each sample. Extensive experimental results on four real-world data sets demonstrate the effectiveness and efficiency of our algorithm.
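The core trick, replacing the LDA eigenproblem with a handful of regularized least-squares fits against class-derived target vectors, can be sketched as follows. The target construction here (class indicators orthogonalized against the constant vector) is a simplified reading of the spectral-regression recipe, not the authors' implementation.

```python
# Sketch: discriminant directions via regularized regression instead of eigendecomposition.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Ridge

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

indicators = np.column_stack([(y == c).astype(float) for c in classes])
Q, _ = np.linalg.qr(np.hstack([np.ones((len(y), 1)), indicators]))
targets = Q[:, 1:classes.size]                      # c - 1 class-derived response vectors

W = np.column_stack([
    Ridge(alpha=1.0).fit(X, targets[:, k]).coef_    # one regularized LS problem per target
    for k in range(targets.shape[1])
])
print((X @ W).shape)                                # (150, 2): embedded data, no eigensolver
```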

Proceedings Article
08 Dec 2008
TL;DR: This paper presents DiscLDA, a discriminative variation on Latent Dirichlet Allocation in which a class-dependent linear transformation is introduced on the topic mixture proportions, and obtains a supervised dimensionality reduction algorithm that uncovers the latent structure in a document collection while preserving predictive power for the task of classification.
Abstract: Probabilistic topic models have become popular as methods for dimensionality reduction in collections of text documents or images. These models are usually treated as generative models and trained using maximum likelihood or Bayesian methods. In this paper, we discuss an alternative: a discriminative framework in which we assume that supervised side information is present, and in which we wish to take that side information into account in finding a reduced dimensionality representation. Specifically, we present DiscLDA, a discriminative variation on Latent Dirichlet Allocation (LDA) in which a class-dependent linear transformation is introduced on the topic mixture proportions. This parameter is estimated by maximizing the conditional likelihood. By using the transformed topic mixture proportions as a new representation of documents, we obtain a supervised dimensionality reduction algorithm that uncovers the latent structure in a document collection while preserving predictive power for the task of classification. We compare the predictive power of the latent structure of DiscLDA with unsupervised LDA on the 20 Newsgroups document classification task and show how our model can identify shared topics across classes as well as class-dependent topics.

Proceedings Article
13 Jul 2008
TL;DR: A novel algorithm is proposed to efficiently find the global optimal feature subset such that the subset-level score is maximized, and extensive experiments demonstrate the effectiveness of the proposed algorithm in comparison with the traditional methods for feature selection.
Abstract: Fisher score and Laplacian score are two popular feature selection algorithms, both of which belong to the general graph-based feature selection framework. In this framework, a feature subset is selected based on the corresponding score (subset-level score), which is calculated in a trace ratio form. Since the number of all possible feature subsets is very huge, it is often prohibitively expensive in computational cost to search in a brute force manner for the feature subset with the maximum subset-level score. Instead of calculating the scores of all the feature subsets, traditional methods calculate the score for each feature, and then select the leading features based on the rank of these feature-level scores. However, selecting the feature subset based on the feature-level score cannot guarantee the optimum of the subset-level score. In this paper, we directly optimize the subset-level score, and propose a novel algorithm to efficiently find the global optimal feature subset such that the subset-level score is maximized. Extensive experiments demonstrate the effectiveness of our proposed algorithm in comparison with the traditional methods for feature selection.
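The subset-level optimization admits a compact iterative sketch: with per-feature between-class (b) and within-class (w) Fisher quantities, alternately refresh the trace-ratio value lambda on the current subset and re-select the top-k features by b_i - lambda * w_i. The code below follows that idea in simplified form; the names and synthetic data are illustrative.

```python
# Sketch of iterative subset-level (trace ratio) feature selection with Fisher terms.
import numpy as np
from sklearn.datasets import make_classification

def fisher_terms(X, y):
    overall = X.mean(axis=0)
    b = np.zeros(X.shape[1]); w = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        b += len(Xc) * (Xc.mean(axis=0) - overall) ** 2      # between-class scatter per feature
        w += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)       # within-class scatter per feature
    return b, w

def trace_ratio_select(X, y, k, n_iter=20):
    b, w = fisher_terms(X, y)
    selected = np.argsort(-b / (w + 1e-12))[:k]              # start from feature-level scores
    for _ in range(n_iter):
        lam = b[selected].sum() / w[selected].sum()          # current subset-level score
        new = np.argsort(-(b - lam * w))[:k]                 # re-select against lambda
        if set(new) == set(selected):
            break
        selected = new
    return selected

X, y = make_classification(n_samples=200, n_features=100, n_informative=8, random_state=0)
print(sorted(trace_ratio_select(X, y, k=10)))
```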

Journal ArticleDOI
TL;DR: Experiments comparing the proposed approach with some other popular subspace methods on the FERET, ORL, AR, and GT databases show that the method consistently outperforms others.
Abstract: This work proposes a subspace approach that regularizes and extracts eigenfeatures from the face image. Eigenspace of the within-class scatter matrix is decomposed into three subspaces: a reliable subspace spanned mainly by the facial variation, an unstable subspace due to noise and finite number of training samples, and a null subspace. Eigenfeatures are regularized differently in these three subspaces based on an eigenspectrum model to alleviate problems of instability, overfitting, or poor generalization. This also enables the discriminant evaluation to be performed in the whole space. Feature extraction or dimensionality reduction occurs only at the final stage after the discriminant assessment. These efforts facilitate a discriminative and a stable low-dimensional feature representation of the face image. Experiments comparing the proposed approach with some other popular subspace methods on the FERET, ORL, AR, and GT databases show that our method consistently outperforms others.

Journal ArticleDOI
01 Apr 2008
TL;DR: This work proposes a new manifold learning technique called discriminant locally linear embedding (DLLE), in which the local geometric properties within each class are preserved according to the locally linear embedding (LLE) criterion, and the separability between different classes is enforced by maximizing margins between point pairs on different classes.
Abstract: Graph-embedding along with its linearization and kernelization provides a general framework that unifies most traditional dimensionality reduction algorithms. From this framework, we propose a new manifold learning technique called discriminant locally linear embedding (DLLE), in which the local geometric properties within each class are preserved according to the locally linear embedding (LLE) criterion, and the separability between different classes is enforced by maximizing margins between point pairs on different classes. To deal with the out-of-sample problem in visual recognition with vector input, the linear version of DLLE, i.e., linearization of DLLE (DLLE/L), is directly proposed through the graph-embedding framework. Moreover, we propose its multilinear version, i.e., tensorization of DLLE, for the out-of-sample problem with high-order tensor input. Based on DLLE, a procedure for gait recognition is described. We conduct comprehensive experiments on both gait and face recognition, and observe that: 1) DLLE along with its linearization and tensorization outperforms the related versions of linear discriminant analysis, and DLLE/L demonstrates greater effectiveness than the linearization of LLE; 2) algorithms based on tensor representations are generally superior to linear algorithms when dealing with intrinsically high-order data; and 3) for human gait recognition, DLLE/L generally obtains higher accuracy than state-of-the-art gait recognition algorithms on the standard University of South Florida gait database.

Journal ArticleDOI
TL;DR: A novel method, referred to as LRTAdr, is proposed, which performs both spatial lower rank approximation and spectral dimensionality reduction (DR), jointly achieving denoising and DR in hyperspectral image analysis.
Abstract: In hyperspectral image (HSI) analysis, classification requires spectral dimensionality reduction (DR). While common DR methods use linear algebra, we propose a multilinear algebra method to jointly achieve denoising and DR. Multilinear tools consider HSI data as a whole by jointly processing the spatial and spectral ways. The lower rank-(K1, K2, K3) tensor approximation [LRTA-(K1, K2, K3)] was successfully applied to denoise multiway data such as color images. First, we demonstrate that the LRTA-(K1, K2, K3) performs well as a denoising preprocessing step to improve classification results. Then, we propose a novel method, referred to as LRTAdr-(K1, K2, D3), which performs both spatial lower rank approximation and spectral DR. The classification algorithm Spectral Angle Mapper is applied to the output of the following three DR and noise reduction methods to compare their efficiency: the proposed LRTAdr-(K1, K2, D3), PCAdr, and PCAdr associated with Wiener filtering or soft shrinkage of wavelet transform coefficients.

Journal ArticleDOI
TL;DR: A general framework, incremental tensor analysis (ITA), is introduced; it efficiently computes a compact summary for high-order and high-dimensional data and also reveals the hidden correlations.
Abstract: How do we find patterns in author-keyword associations, evolving over time? Or in data cubes (tensors), with product-branch-customer sales information? And more generally, how to summarize high-order data cubes (tensors)? How to incrementally update these patterns over time? Matrix decompositions, like principal component analysis (PCA) and variants, are invaluable tools for mining, dimensionality reduction, feature selection, and rule identification in numerous settings like streaming data, text, graphs, social networks, and many more. However, they have only two orders (i.e., matrices, like author and keyword in the previous example). We propose to envision such higher-order data as tensors, and tap the vast literature on the topic. However, these methods do not necessarily scale up, let alone operate on semi-infinite streams. Thus, we introduce a general framework, incremental tensor analysis (ITA), which efficiently computes a compact summary for high-order and high-dimensional data, and also reveals the hidden correlations. Three variants of ITA are presented: (1) dynamic tensor analysis (DTA); (2) streaming tensor analysis (STA); and (3) window-based tensor analysis (WTA). In particular, we explore several fundamental design trade-offs such as space efficiency, computational cost, approximation accuracy, time dependency, and model complexity. We implement all our methods and apply them in several real settings, such as network anomaly detection, multiway latent semantic indexing on citation networks, and correlation study on sensor measurements. Our empirical studies show that the proposed methods are fast and accurate and that they find interesting patterns and outliers on the real datasets.

Journal ArticleDOI
TL;DR: The theoretical analysis of the effects of PCA on the discrimination power of the projected subspace is presented from a general pattern classification perspective for two possible scenarios: when PCA is used as a simple dimensionality reduction tool and when it is used to recondition an ill-posed LDA formulation.
Abstract: Dimensionality reduction is a necessity in most hyperspectral imaging applications. Tradeoffs exist between unsupervised statistical methods, which are typically based on principal components analysis (PCA), and supervised ones, which are often based on Fisher's linear discriminant analysis (LDA), and proponents for each approach exist in the remote sensing community. Recently, a combined approach known as subspace LDA has been proposed, where PCA is employed to recondition ill-posed LDA formulations. The key idea behind this approach is to use a PCA transformation as a preprocessor to discard the null space of rank-deficient scatter matrices, so that LDA can be applied on this reconditioned space. Thus, in theory, the subspace LDA technique benefits from the advantages of both methods. In this letter, we present a theoretical analysis of the effects (often ill effects) of PCA on the discrimination power of the projected subspace. The theoretical analysis is presented from a general pattern classification perspective for two possible scenarios: (1) when PCA is used as a simple dimensionality reduction tool and (2) when it is used to recondition an ill-posed LDA formulation. We also provide experimental evidence of the ineffectiveness of both scenarios for hyperspectral target recognition applications.
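The subspace LDA construction analyzed above is straightforward to express as a PCA-then-LDA pipeline; the synthetic data and the number of retained components below are illustrative, not the letter's hyperspectral experiments.

```python
# Sketch: PCA discards the scatter-matrix null space, then LDA runs in the reduced space.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=120, n_features=200, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

subspace_lda = make_pipeline(PCA(n_components=50), LinearDiscriminantAnalysis())
subspace_lda.fit(X, y)
print(subspace_lda.score(X, y))
```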

01 Dec 2008
TL;DR: In this paper, a Bayesian approach to nonlinear inverse problems in which the unknown quantity is a spatial or temporal field, endowed with a hierarchical Gaussian process prior, is proposed, where truncated Karhunen-Loeve expansions are introduced to efficiently parameterize the unknown field and specify a stochastic forward problem whose solution captures that of the deterministic forward model over the support of the prior.
Abstract: We consider a Bayesian approach to nonlinear inverse problems in which the unknown quantity is a spatial or temporal field, endowed with a hierarchical Gaussian process prior. Computational challenges in this construction arise from the need for repeated evaluations of the forward model (e.g., in the context of Markov chain Monte Carlo) and are compounded by high dimensionality of the posterior. We address these challenges by introducing truncated Karhunen-Loeve expansions, based on the prior distribution, to efficiently parameterize the unknown field and to specify a stochastic forward problem whose solution captures that of the deterministic forward model over the support of the prior. We seek a solution of this problem using Galerkin projection on a polynomial chaos basis, and use the solution to construct a reduced-dimensionality surrogate posterior density that is inexpensive to evaluate. We demonstrate the formulation on a transient diffusion equation with prescribed source terms, inferring the spatially-varying diffusivity of the medium from limited and noisy data.
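The dimensionality-reduction step, a truncated Karhunen-Loeve expansion of a Gaussian-process prior, can be sketched on a 1-D grid; the squared-exponential covariance, length scale, and truncation level are illustrative choices rather than the paper's setup.

```python
# Sketch: sample a 200-point random field from only 10 Karhunen-Loeve coefficients.
import numpy as np

grid = np.linspace(0.0, 1.0, 200)
length_scale = 0.2
cov = np.exp(-0.5 * (grid[:, None] - grid[None, :]) ** 2 / length_scale ** 2)

eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]                      # sort modes by decreasing energy
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 10                                                 # truncation level
xi = np.random.default_rng(0).standard_normal(k)       # low-dimensional coordinates
field = eigvecs[:, :k] @ (np.sqrt(eigvals[:k]) * xi)   # one prior realization of the field
print(field.shape)                                     # (200,) field parameterized by 10 numbers
```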

Proceedings ArticleDOI
05 Jul 2008
TL;DR: Novel applications of the approach, including cross-lingual information retrieval and transfer learning in Markov decision processes, are presented, with results showing useful knowledge transfer from one domain to another.
Abstract: In this paper we introduce a novel approach to manifold alignment, based on Procrustes analysis. Our approach differs from "semi-supervised alignment" in that it results in a mapping that is defined everywhere - when used with a suitable dimensionality reduction method - rather than just on the training data points. We describe and evaluate our approach both theoretically and experimentally, providing results showing useful knowledge transfer from one domain to another. Novel applications of our method including cross-lingual information retrieval and transfer learning in Markov decision processes are presented.
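The central alignment step can be sketched with SciPy's orthogonal Procrustes solver: reduce each domain with the same dimensionality reduction, then rotate one embedding onto the other using points known to correspond. The synthetic domains and correspondences below are illustrative.

```python
# Sketch: align two PCA embeddings with orthogonal Procrustes on paired anchor points.
import numpy as np
from scipy.linalg import orthogonal_procrustes
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
shared = rng.random((100, 3))                         # latent structure common to both domains
domain_a = shared @ rng.random((3, 20)) + 0.01 * rng.standard_normal((100, 20))
domain_b = shared @ rng.random((3, 30)) + 0.01 * rng.standard_normal((100, 30))

emb_a = PCA(n_components=3).fit_transform(domain_a)
emb_b = PCA(n_components=3).fit_transform(domain_b)

anchors = np.arange(30)                               # indices of known correspondences
R, _ = orthogonal_procrustes(emb_b[anchors], emb_a[anchors])
aligned_b = emb_b @ R                                 # domain B mapped into A's embedding space
print(np.linalg.norm(aligned_b[anchors] - emb_a[anchors]))
```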

Proceedings Article
01 Oct 2008
TL;DR: The proposed model introduces the concept of a mapping function to make the different patterns from different pattern spaces comparable, so that an optimal pattern can be learned from the multiple patterns of multiple representations.
Abstract: Multiple view data, which have multiple representations from different feature spaces or graph spaces, arise in various data mining applications such as information retrieval, bioinformatics and social network analysis. Since different representations could have very different statistical properties, how to learn a consensus pattern from multiple representations is a challenging problem. In this paper, we propose a general model for multiple view unsupervised learning. The proposed model introduces the concept of a mapping function to make the different patterns from different pattern spaces comparable and hence an optimal pattern can be learned from the multiple patterns of multiple representations. Under this model, we formulate two specific models for two important cases of unsupervised learning, clustering and spectral dimensionality reduction; we derive an iterative algorithm for multiple view clustering, and a simple algorithm providing a global optimum to multiple spectral dimensionality reduction. We also extend the proposed model and algorithms to evolutionary clustering and unsupervised learning with side information. Empirical evaluations on both synthetic and real data sets demonstrate the effectiveness of the proposed model and algorithms.

01 Jan 2008
TL;DR: The results show that the classification accuracy based on PCA is highly sensitive to the type of data and that the variance captured by the principal components is not necessarily a vital indicator for the classification performance.
Abstract: Dimensionality reduction and feature subset selection are two techniques for reducing the attribute space of a feature set, which is an important component of both supervised and unsupervised classification or regression problems. While in feature subset selection a subset of the original attributes is extracted, dimensionality reduction in general produces linear combinations of the original attribute set. In this paper we investigate the relationship between several attribute space reduction techniques and the resulting classification accuracy for two very different application areas. On the one hand, we consider e-mail filtering, where the feature space contains various properties of e-mail messages, and on the other hand, we consider drug discovery problems, where quantitative representations of molecular structures are encoded in terms of information-preserving descriptor values. Subsets of the original attributes constructed by filter and wrapper techniques as well as subsets of linear combinations of the original attributes constructed by three different variants of principal component analysis (PCA) are compared in terms of the classification performance achieved with various machine learning algorithms as well as in terms of runtime performance. We successively reduce the size of the attribute sets and investigate the changes in the classification results. Moreover, we explore the relationship between the variance captured in the linear combinations within PCA and the resulting classification accuracy. The results show that the classification accuracy based on PCA is highly sensitive to the type of data and that the variance captured by the principal components is not necessarily a vital indicator for the classification performance.
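The kind of comparison reported above can be sketched generically: train the same classifier on (a) a filter-selected subset of the original attributes and (b) the leading principal components, at matched sizes. The data set, classifier, and sizes below are illustrative stand-ins for the e-mail and drug-discovery data used in the paper.

```python
# Sketch: cross-validated accuracy for feature selection vs. PCA at matched dimensionalities.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=300, n_features=200, n_informative=15, random_state=0)

for k in (5, 10, 20, 50):
    selection = make_pipeline(SelectKBest(f_classif, k=k), GaussianNB())
    pca = make_pipeline(PCA(n_components=k), GaussianNB())
    print(k,
          round(cross_val_score(selection, X, y, cv=5).mean(), 3),
          round(cross_val_score(pca, X, y, cv=5).mean(), 3))
```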

Journal ArticleDOI
TL;DR: Experimental results on Yale and CMU PIE face databases convince us that the proposed method provides a better representation of the class information and obtains much higher recognition accuracies.

Journal ArticleDOI
TL;DR: This work implements an enhanced hybrid classification method utilizing the naive Bayes approach and the support vector machine (SVM); it shows a significant reduction in training time compared to the Lsquare method and a significant improvement in classification accuracy compared to pure naive Bayes systems and TF-IDF/SVM hybrids.
Abstract: This work implements an enhanced hybrid classification method through the utilization of the naive Bayes approach and the support vector machine (SVM). In this project, the Bayes formula was used to vectorize (as opposed to classify) a document according to a probability distribution reflecting the probable categories that the document may belong to. The Bayes formula gives a range of probabilities to which the document can be assigned according to a predetermined set of topics (categories) such as those found in the "20 Newsgroups" data set for instance. Using this probability distribution as the vectors to represent the document, the SVM can then be used to classify the documents on a multidimensional level. The effects of an inadvertent dimensionality reduction caused by classifying using only the highest probability using the naive Bayes classifier can be overcome using the SVM by employing all the probability values associated with every category for each document. This method can be used for any data set and shows a significant reduction in training time as compared to the Lsquare method and significant improvement in the classification accuracy when compared to pure naive Bayes systems and also the TF-IDF/SVM hybrids.
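The hybrid can be sketched directly with scikit-learn, assuming the 20 Newsgroups corpus can be downloaded: a multinomial naive Bayes model turns each document into a short vector of class-membership probabilities, and an SVM classifies those vectors. The category subset and vectorizer settings are illustrative.

```python
# Sketch: naive Bayes probability vectors as a low-dimensional input to an SVM.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

categories = ["sci.space", "rec.autos", "talk.politics.misc"]   # illustrative subset
train = fetch_20newsgroups(subset="train", categories=categories)
test = fetch_20newsgroups(subset="test", categories=categories)

vec = CountVectorizer(max_features=20000)
Xtr, Xte = vec.fit_transform(train.data), vec.transform(test.data)

nb = MultinomialNB().fit(Xtr, train.target)
svm = SVC(kernel="rbf").fit(nb.predict_proba(Xtr), train.target)  # probabilities as features
print(svm.score(nb.predict_proba(Xte), test.target))
```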

Journal ArticleDOI
01 Feb 2008
TL;DR: Experimental results show that the proposed GSVD-ILDA algorithm gives the same performance as the LDA/GSVD with much smaller computational complexity, and also gives better classification performance than the other recently proposed ILDA algorithms.
Abstract: Dimensionality reduction methods have been successfully employed for face recognition. Among the various dimensionality reduction algorithms, linear (Fisher) discriminant analysis (LDA) is one of the popular supervised dimensionality reduction methods, and many LDA-based face recognition algorithms/systems have been reported in the last decade. However, the LDA-based face recognition systems suffer from the scalability problem. To overcome this limitation, an incremental approach is a natural solution. The main difficulty in developing the incremental LDA (ILDA) is to handle the inverse of the within-class scatter matrix. In this paper, based on the generalized singular value decomposition LDA (LDA/GSVD), we develop a new ILDA algorithm called GSVD-ILDA. Different from the existing techniques in which the new projection matrix is found in a restricted subspace, the proposed GSVD-ILDA determines the projection matrix in full space. Extensive experiments are performed to compare the proposed GSVD-ILDA with the LDA/GSVD as well as the existing ILDA methods using the Face Recognition Technology (FERET) face database and the Carnegie Mellon University Pose, Illumination, and Expression (CMU PIE) face database. Experimental results show that the proposed GSVD-ILDA algorithm gives the same performance as the LDA/GSVD with much smaller computational complexity. The experimental results also show that the proposed GSVD-ILDA gives better classification performance than the other recently proposed ILDA algorithms.

Journal ArticleDOI
TL;DR: This work proposes a novel semisupervised method for dimensionality reduction called Maximum Margin Projection (MMP), which aims at maximizing the margin between positive and negative examples at each local neighborhood.
Abstract: One of the fundamental problems in Content-Based Image Retrieval (CBIR) has been the gap between low-level visual features and high-level semantic concepts. To narrow down this gap, relevance feedback is introduced into image retrieval. With the user-provided information, a classifier can be learned to distinguish between positive and negative examples. However, in real-world applications, the number of user feedbacks is usually too small compared to the dimensionality of the image space. In order to cope with the high dimensionality, we propose a novel semisupervised method for dimensionality reduction called Maximum Margin Projection (MMP). MMP aims at maximizing the margin between positive and negative examples at each local neighborhood. Different from traditional dimensionality reduction algorithms such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), which effectively see only the global euclidean structure, MMP is designed for discovering the local manifold structure. Therefore, MMP is likely to be more suitable for image retrieval, where nearest neighbor search is usually involved. After projecting the images into a lower dimensional subspace, the relevant images get closer to the query image; thus, the retrieval performance can be enhanced. The experimental results on Corel image database demonstrate the effectiveness of our proposed algorithm.

Journal ArticleDOI
TL;DR: A class separability criterion is developed in a high-dimensional kernel space, and feature selection is performed by the maximization of this criterion, which is applied to a variety of selection modes with different search strategies.
Abstract: Classification can often benefit from efficient feature selection. However, the presence of linearly nonseparable data, quick response requirement, small sample problem and noisy features makes the feature selection quite challenging. In this work, a class separability criterion is developed in a high-dimensional kernel space, and feature selection is performed by the maximization of this criterion. To make this feature selection approach work, the issues of automatic kernel parameter tuning, the numerical stability, and the regularization for multi-parameter optimization are addressed. Theoretical analysis uncovers the relationship of this criterion to the radius-margin bound of the SVMs, the KFDA, and the kernel alignment criterion, providing more insight on using this criterion for feature selection. This criterion is applied to a variety of selection modes with different search strategies. Extensive experimental study demonstrates its efficiency in delivering fast and robust feature selection.