Showing papers on "Dimensionality reduction published in 2005"

PDF

Open Access

Reference Entry•DOI•

[...]

15 Oct 2005

TL;DR: Principal component analysis (PCA) as discussed by the authors replaces the p original variables by a smaller number, q, of derived variables, the principal components, which are linear combinations of the original variables.

...read moreread less

Abstract: When large multivariate datasets are analyzed, it is often desirable to reduce their dimensionality. Principal component analysis is one technique for doing this. It replaces the p original variables by a smaller number, q, of derived variables, the principal components, which are linear combinations of the original variables. Often, it is possible to retain most of the variability in the original variables with q very much smaller than p. Despite its apparent simplicity, principal component analysis has a number of subtleties, and it has many uses and extensions. A number of choices associated with the technique are briefly discussed, namely, covariance or correlation, how many components, and different normalization constraints, as well as confusion with factor analysis. Various uses and extensions are outlined. Keywords: dimension reduction; factor analysis; multivariate analysis; variance maximization

...read moreread less

14,773 citations

Proceedings Article•

Laplacian Score for Feature Selection

[...]

Xiaofei He¹, Deng Cai², Partha Niyogi¹•Institutions (2)

University of Chicago¹, University of Illinois at Urbana–Champaign²

05 Dec 2005

TL;DR: This paper proposes a "filter" method for feature selection which is independent of any learning algorithm, based on the observation that, in many real world classification problems, data from the same class are often close to each other.

...read moreread less

Abstract: In supervised learning scenarios, feature selection has been studied widely in the literature. Selecting features in unsupervised learning scenarios is a much harder problem, due to the absence of class labels that would guide the search for relevant information. And, almost all of previous unsupervised feature selection methods are "wrapper" techniques that require a learning algorithm to evaluate the candidate feature subsets. In this paper, we propose a "filter" method for feature selection which is independent of any learning algorithm. Our method can be performed in either supervised or unsupervised fashion. The proposed method is based on the observation that, in many real world classification problems, data from the same class are often close to each other. The importance of a feature is evaluated by its power of locality preserving, or, Laplacian Score. We compare our method with data variance (unsupervised) and Fisher score (supervised) on two data sets. Experimental results demonstrate the effectiveness and efficiency of our algorithm.

...read moreread less

1,817 citations

Journal Article•DOI•

Generalized principal component analysis (GPCA)

[...]

René Vidal¹, Yi Ma, S. Shankar Sastry•Institutions (1)

Johns Hopkins University¹

01 Dec 2005-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: An algebro-geometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points and applications of GPCA to computer vision problems such as face clustering, temporal video segmentation, and 3D motion segmentation from point correspondences in multiple affine views are presented.

...read moreread less

Abstract: This paper presents an algebro-geometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points. We represent the subspaces with a set of homogeneous polynomials whose degree is the number of subspaces and whose derivatives at a data point give normal vectors to the subspace passing through the point. When the number of subspaces is known, we show that these polynomials can be estimated linearly from data; hence, subspace segmentation is reduced to classifying one point per subspace. We select these points optimally from the data set by minimizing certain distance function, thus dealing automatically with moderate noise in the data. A basis for the complement of each subspace is then recovered by applying standard PCA to the collection of derivatives (normal vectors). Extensions of GPCA that deal with data in a high-dimensional space and with an unknown number of subspaces are also presented. Our experiments on low-dimensional data show that GPCA outperforms existing algebraic algorithms based on polynomial factorization and provides a good initialization to iterative techniques such as k-subspaces and expectation maximization. We also present applications of GPCA to computer vision problems such as face clustering, temporal video segmentation, and 3D motion segmentation from point correspondences in multiple affine views.

...read moreread less

1,162 citations

Journal Article•DOI•

Document clustering using locality preserving indexing

[...]

Deng Cai¹, Xiaofei He², Jiawei Han•Institutions (2)

University of Illinois at Urbana–Champaign¹, University of Chicago²

01 Dec 2005-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A novel document clustering method which aims to cluster the documents into different semantic classes by using locality preserving indexing (LPI), an unsupervised approximation of the supervised linear discriminant analysis (LDA) method, which gives the intuitive motivation of the method.

...read moreread less

Abstract: We propose a novel document clustering method which aims to cluster the documents into different semantic classes. The document space is generally of high dimensionality and clustering in such a high dimensional space is often infeasible due to the curse of dimensionality. By using locality preserving indexing (LPI), the documents can be projected into a lower-dimensional semantic space in which the documents related to the same semantics are close to each other. Different from previous document clustering methods based on latent semantic indexing (LSI) or nonnegative matrix factorization (NMF), our method tries to discover both the geometric and discriminating structures of the document space. Theoretical analysis of our method shows that LPI is an unsupervised approximation of the supervised linear discriminant analysis (LDA) method, which gives the intuitive motivation of our method. Extensive experimental evaluations are performed on the Reuters-21578 and TDT2 data sets.

...read moreread less

707 citations

Journal Article•

Learning a Mahalanobis Metric from Equivalence Constraints

[...]

Aharon Bar-Hillel, Tomer Hertz, Noam Shental, Daphna Weinshall

01 Dec 2005-Journal of Machine Learning Research

TL;DR: This work presents the Relevant Component Analysis algorithm, which is a simple and efficient algorithm for learning a Mahalanobis metric, and shows that RCA is the solution of an interesting optimization problem, founded on an information theoretic basis.

...read moreread less

Abstract: Many learning algorithms use a metric defined over the input space as a principal tool, and their performance critically depends on the quality of this metric. We address the problem of learning metrics using side-information in the form of equivalence constraints. Unlike labels, we demonstrate that this type of side-information can sometimes be automatically obtained without the need of human intervention. We show how such side-information can be used to modify the representation of the data, leading to improved clustering and classification.Specifically, we present the Relevant Component Analysis (RCA) algorithm, which is a simple and efficient algorithm for learning a Mahalanobis metric. We show that RCA is the solution of an interesting optimization problem, founded on an information theoretic basis. If dimensionality reduction is allowed within RCA, we show that it is optimally accomplished by a version of Fisher's linear discriminant that uses constraints. Moreover, under certain Gaussian assumptions, RCA can be viewed as a Maximum Likelihood estimation of the within class covariance matrix. We conclude with extensive empirical evaluations of RCA, showing its advantage over alternative methods.

...read moreread less

569 citations

Journal Article•DOI•

Incremental Online Learning in High Dimensions

[...]

Sethu Vijayakumar¹, Aaron D'Souza², Stefan Schaal²•Institutions (2)

University of Edinburgh¹, University of Southern California²

01 Dec 2005-Neural Computation

TL;DR: Locally weighted projection regression is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.

...read moreread less

Abstract: Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based on only local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of—possibly redundant—inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.

...read moreread less

564 citations

Posted Content•

Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck operators

[...]

Boaz Nadler, Stephane Lafon, Ronald R. Coifman, Ioannis G. Kevrekidis

06 Jun 2005-arXiv: Numerical Analysis

TL;DR: In this paper, a diffusion-based probabilistic interpretation of spectral clustering and dimensionality reduction algorithms that use the eigenvectors of the normalized graph Laplacian is presented.

...read moreread less

Abstract: This paper presents a diffusion based probabilistic interpretation of spectral clustering and dimensionality reduction algorithms that use the eigenvectors of the normalized graph Laplacian. Given the pairwise adjacency matrix of all points, we define a diffusion distance between any two data points and show that the low dimensional representation of the data by the first few eigenvectors of the corresponding Markov matrix is optimal under a certain mean squared error criterion. Furthermore, assuming that data points are random samples from a density $p(\x) = e^{-U(\x)}$ we identify these eigenvectors as discrete approximations of eigenfunctions of a Fokker-Planck operator in a potential $2U(\x)$ with reflecting boundary conditions. Finally, applying known results regarding the eigenvalues and eigenfunctions of the continuous Fokker-Planck operator, we provide a mathematical justification for the success of spectral clustering and dimensional reduction algorithms based on these first few eigenvectors. This analysis elucidates, in terms of the characteristics of diffusion processes, many empirical findings regarding spectral clustering algorithms.

...read moreread less

427 citations

Proceedings Article•DOI•

Graph embedding: a general framework for dimensionality reduction

[...]

Shuicheng Yan¹, Dong Xu², Benyu Zhang³, Hong-Jiang Zhang³•Institutions (3)

The Chinese University of Hong Kong¹, University of Science and Technology of China², Microsoft³

20 Jun 2005

TL;DR: A new supervised algorithm, Marginal Fisher Analysis (MFA), is proposed, for dimensionality reduction by designing two graphs that characterize the intra-class compactness and inter-class separability, respectively.

...read moreread less

Abstract: In the last decades, a large family of algorithms - supervised or unsupervised; stemming from statistic or geometry theory - have been proposed to provide different solutions to the problem of dimensionality reduction. In this paper, beyond the different motivations of these algorithms, we propose a general framework, graph embedding along with its linearization and kernelization, which in theory reveals the underlying objective shared by most previous algorithms. It presents a unified perspective to understand these algorithms; that is, each algorithm can be considered as the direct graph embedding or its linear/kernel extension of some specific graph characterizing certain statistic or geometry property of a data set. Furthermore, this framework is a general platform to develop new algorithm for dimensionality reduction. To this end, we propose a new supervised algorithm, Marginal Fisher Analysis (MFA), for dimensionality reduction by designing two graphs that characterize the intra-class compactness and inter-class separability, respectively. MFA measures the intra-class compactness with the distance between each data point and its neighboring points of the same class, and measures the inter-class separability with the class margins; thus it overcomes the limitations of traditional Linear Discriminant Analysis algorithm in terms of data distribution assumptions and available projection directions. The toy problem on artificial data and the real face recognition experiments both show the superiority of our proposed MFA in comparison to LDA.

...read moreread less

404 citations

Journal Article•DOI•

Supervised nonlinear dimensionality reduction for visualization and classification

[...]

Xin Geng¹, De-Chuan Zhan¹, Zhi-Hua Zhou¹•Institutions (1)

Nanjing University¹

01 Dec 2005

TL;DR: The results reveal that S-Isomap excels compared to Isomap and WeightedIso in classification, and it is highly competitive with those well-known classification methods.

...read moreread less

Abstract: When performing visualization and classification, people often confront the problem of dimensionality reduction. Isomap is one of the most promising nonlinear dimensionality reduction techniques. However, when Isomap is applied to real-world data, it shows some limitations, such as being sensitive to noise. In this paper, an improved version of Isomap, namely S-Isomap, is proposed. S-Isomap utilizes class information to guide the procedure of nonlinear dimensionality reduction. Such a kind of procedure is called supervised nonlinear dimensionality reduction. In S-Isomap, the neighborhood graph of the input data is constructed according to a certain kind of dissimilarity between data points, which is specially designed to integrate the class information. The dissimilarity has several good properties which help to discover the true neighborhood of the data and, thus, makes S-Isomap a robust technique for both visualization and classification, especially for real-world problems. In the visualization experiments, S-Isomap is compared with Isomap, LLE, and WeightedIso. The results show that S-Isomap performs the best. In the classification experiments, S-Isomap is used as a preprocess of classification and compared with Isomap, WeightedIso, as well as some other well-established classification methods, including the K-nearest neighbor classifier, BP neural network, J4.8 decision tree, and SVM. The results reveal that S-Isomap excels compared to Isomap and WeightedIso in classification, and it is highly competitive with those well-known classification methods.

...read moreread less

394 citations

Journal Article•DOI•

Real-Time subspace integration for St. Venant-Kirchhoff deformable models

[...]

Jernej Barbič¹, Doug L. James¹•Institutions (1)

Carnegie Mellon University¹

01 Jul 2005

TL;DR: An approach for fast subspace integration of reduced-coordinate nonlinear deformable models that is suitable for interactive applications in computer graphics and haptics, and presents two useful approaches for generating low-dimensional subspace bases: modal derivatives and an interactive sketching technique.

...read moreread less

Abstract: In this paper, we present an approach for fast subspace integration of reduced-coordinate nonlinear deformable models that is suitable for interactive applications in computer graphics and haptics. Our approach exploits dimensional model reduction to build reduced-coordinate deformable models for objects with complex geometry. We exploit the fact that model reduction on large deformation models with linear materials (as commonly used in graphics) result in internal force models that are simply cubic polynomials in reduced coordinates. Coefficients of these polynomials can be precomputed, for efficient runtime evaluation. This allows simulation of nonlinear dynamics using fast implicit Newmark subspace integrators, with subspace integration costs independent of geometric complexity. We present two useful approaches for generating low-dimensional subspace bases: modal derivatives and an interactive sketching technique. Mass-scaled principal component analysis (mass-PCA) is suggested for dimensionality reduction. Finally, several examples are given from computer animation to illustrate high performance, including force-feedback haptic rendering of a complicated object undergoing large deformations.

...read moreread less

381 citations

Journal Article•

Characterization of a Family of Algorithms for Generalized Discriminant Analysis on Undersampled Problems

[...]

Jieping Ye

01 Dec 2005-Journal of Machine Learning Research

TL;DR: A generalized discriminant analysis based on a new optimization criterion that extends the optimization criteria of the classical Linear Discriminant Analysis (LDA) when the scatter matrices are singular is presented.

...read moreread less

Abstract: A generalized discriminant analysis based on a new optimization criterion is presented. The criterion extends the optimization criteria of the classical Linear Discriminant Analysis (LDA) when the scatter matrices are singular. An efficient algorithm for the new optimization problem is presented.The solutions to the proposed criterion form a family of algorithms for generalized LDA, which can be characterized in a closed form. We study two specific algorithms, namely Uncorrelated LDA (ULDA) and Orthogonal LDA (OLDA). ULDA was previously proposed for feature extraction and dimension reduction, whereas OLDA is a novel algorithm proposed in this paper. The features in the reduced space of ULDA are uncorrelated, while the discriminant vectors of OLDA are orthogonal to each other. We have conducted a comparative study on a variety of real-world data sets to evaluate ULDA and OLDA in terms of classification accuracy.

...read moreread less

Proceedings Article•

Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators

[...]

Boaz Nadler¹, Stephane Lafon¹, Ioannis G. Kevrekidis², Ronald R. Coifman¹•Institutions (2)

Yale University¹, Princeton University²

05 Dec 2005

TL;DR: A diffusion based probabilistic interpretation of spectral clustering and dimensionality reduction algorithms that use the eigenvectors of the normalized graph Laplacian is presented.

...read moreread less

Abstract: This paper presents a diffusion based probabilistic interpretation of spectral clustering and dimensionality reduction algorithms that use the eigenvectors of the normalized graph Laplacian. Given the pairwise adjacency matrix of all points, we define a diffusion distance between any two data points and show that the low dimensional representation of the data by the first few eigenvectors of the corresponding Markov matrix is optimal under a certain mean squared error criterion. Furthermore, assuming that data points are random samples from a density p(x) = e-U(x) we identify these eigenvectors as discrete approximations of eigenfunctions of a Fokker-Planck operator in a potential 2U(x) with reflecting boundary conditions. Finally, applying known results regarding the eigenvalues and eigenfunctions of the continuous Fokker-Planck operator, we provide a mathematical justification for the success of spectral clustering and dimensional reduction algorithms based on these first few eigenvectors. This analysis elucidates, in terms of the characteristics of diffusion processes, many empirical findings regarding spectral clustering algorithms.

...read moreread less

Journal Article•DOI•

Locally linear discriminant analysis for multimodally distributed classes for face recognition with a single model image

[...]

Tae-Kyun Kim¹, J. Kittler•Institutions (1)

Samsung¹

01 Mar 2005-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A novel gradient-based learning algorithm is proposed for finding the optimal set of local linear bases for multiclass nonlinear discrimination and it is computationally highly efficient as compared to GDA.

...read moreread less

Abstract: We present a novel method of nonlinear discriminant analysis involving a set of locally linear transformations called "Locally Linear Discriminant Analysis" (LLDA). The underlying idea is that global nonlinear data structures are locally linear and local structures can be linearly aligned. Input vectors are projected into each local feature space by linear transformations found to yield locally linearly transformed classes that maximize the between-class covariance while minimizing the within-class covariance. In face recognition, linear discriminant analysis (LIDA) has been widely adopted owing to its efficiency, but it does not capture nonlinear manifolds of faces which exhibit pose variations. Conventional nonlinear classification methods based on kernels such as generalized discriminant analysis (GDA) and support vector machine (SVM) have been developed to overcome the shortcomings of the linear method, but they have the drawback of high computational cost of classification and overfitting. Our method is for multiclass nonlinear discrimination and it is computationally highly efficient as compared to GDA. The method does not suffer from overfitting by virtue of the linear base structure of the solution. A novel gradient-based learning algorithm is proposed for finding the optimal set of local linear bases. The optimization does not exhibit a local-maxima problem. The transformation functions facilitate robust face recognition in a low-dimensional subspace, under pose variations, using a single model image. The classification results are given for both synthetic and real face data.

...read moreread less

Journal Article•DOI•

A two-stage linear discriminant analysis via QR-decomposition

[...]

Jieping Ye¹, Qi Li•Institutions (1)

University of Minnesota¹

01 Jun 2005-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper proposes a two-stage LDA method, namely LDA/QR, which aims to overcome the singularity problems of classical LDA, while achieving efficiency and scalability simultaneously.

...read moreread less

Abstract: Linear discriminant analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications involving high-dimensional data, such as image and text classification. An intrinsic limitation of classical LDA is the so-called singularity problems; that is, it fails when all scatter matrices are singular. Many LDA extensions were proposed in the past to overcome the singularity problems. Among these extensions, PCA+LDA, a two-stage method, received relatively more attention. In PCA+LDA, the LDA stage is preceded by an intermediate dimension reduction stage using principal component analysis (PCA). Most previous LDA extensions are computationally expensive, and not scalable, due to the use of singular value decomposition or generalized singular value decomposition. In this paper, we propose a two-stage LDA method, namely LDA/QR, which aims to overcome the singularity problems of classical LDA, while achieving efficiency and scalability simultaneously. The key difference between LDA/QR and PCA+LDA lies in the first stage, where LDA/QR applies QR decomposition to a small matrix involving the class centroids, while PCA+LDA applies PCA to the total scatter matrix involving all training data points. We further justify the proposed algorithm by showing the relationship among LDA/QR and previous LDA methods. Extensive experiments on face images and text documents are presented to show the effectiveness of the proposed algorithm.

...read moreread less

Journal Article•DOI•

Contour regression: A general approach to dimension reduction

[...]

Bing Li, Hongyuan Zha¹, Francesca Chiaromonte•Institutions (1)

Pennsylvania State University¹

01 Aug 2005-Annals of Statistics

TL;DR: A novel approach to sufficient dimension reduction in regression, based on estimating contour directions of small variation in the response, which proves robust to departures from ellipticity and establishes population properties for both SCR and GCR and asymptotic properties for SCR.

...read moreread less

Abstract: We propose a novel approach to sufficient dimension reduction in regression, based on estimating contour directions of small variation in the response. These directions span the orthogonal complement of the minimal space relevant for the regression and can be extracted according to two measures of variation in the response, leading to simple and general contour regression (SCR and GCR) methodology. In comparison with existing sufficient dimension reduction techniques, this contour-based methodology guarantees exhaustive estimation of the central subspace under ellipticity of the predictor distribution and mild additional assumptions, while maintaining $\sqrt{n}$-consistency and computational ease. Moreover, it proves robust to departures from ellipticity. We establish population properties for both SCR and GCR, and asymptotic properties for SCR. Simulations to compare performance with that of standard techniques such as ordinary least squares, sliced inverse regression, principal Hessian directions and sliced average variance estimation confirm the advantages anticipated by the theoretical analyses. We demonstrate the use of contour-based methods on a data set concerning soil evaporation.

...read moreread less

Journal Article•DOI•

Contour regression: A general approach to dimension reduction

[...]

Bing Li, Hongyuan Zha¹, Francesca Chiaromonte•Institutions (1)

Pennsylvania State University¹

16 Aug 2005-arXiv: Statistics Theory

TL;DR: In this paper, the authors proposed a contour-based approach to sufficient dimension reduction in regression, based on estimating contour directions of small variation in the response, leading to simple and general contour regression (SCR and GCR) methodology.

...read moreread less

Abstract: We propose a novel approach to sufficient dimension reduction in regression, based on estimating contour directions of small variation in the response. These directions span the orthogonal complement of the minimal space relevant for the regression and can be extracted according to two measures of variation in the response, leading to simple and general contour regression (SCR and GCR) methodology. In comparison with existing sufficient dimension reduction techniques, this contour-based methodology guarantees exhaustive estimation of the central subspace under ellipticity of the predictor distribution and mild additional assumptions, while maintaining \sqrtn-consistency and computational ease. Moreover, it proves robust to departures from ellipticity. We establish population properties for both SCR and GCR, and asymptotic properties for SCR. Simulations to compare performance with that of standard techniques such as ordinary least squares, sliced inverse regression, principal Hessian directions and sliced average variance estimation confirm the advantages anticipated by the theoretical analyses. We demonstrate the use of contour-based methods on a data set concerning soil evaporation.

...read moreread less

Journal Article•DOI•

On the impact of PCA dimension reduction for hyperspectral detection of difficult targets

[...]

M.D. Farrell¹, Russell M. Mersereau¹•Institutions (1)

Georgia Institute of Technology¹

18 Apr 2005-IEEE Geoscience and Remote Sensing Letters

TL;DR: The popular principal components transform [aka. principal components analysis (PCA)] is used to explore the impact that dimension reduction has on adaptive detection of difficult targets in both the reflective and emissive regimes.

...read moreread less

Abstract: Due to constraints both at the sensor and on the ground, dimension reduction is a common preprocessing step performed on many hyperspectral imaging datasets. However, this transformation is not necessarily done with the ultimate data exploitation task in mind-for example, target detection or ground cover classification. Indeed, theoretically speaking it is possible that a lossy operation such as dimension reduction might have a negative impact on detection performance. This notion is investigated experimentally using real-world hyperspectral imaging data. The popular principal components transform [aka. principal components analysis (PCA)] is used to explore the impact that dimension reduction has on adaptive detection of difficult targets in both the reflective and emissive regimes. Using seven state-of-the-art algorithms, it is shown that in many cases PCA can have a minimal impact on the detection statistic value for a target that is spectrally similar to the background against which it is sought.

...read moreread less

Proceedings Article•DOI•

Multi-label informed latent semantic indexing

[...]

Kai Yu¹, Shipeng Yu², Volker Tresp¹•Institutions (2)

Siemens¹, Ludwig Maximilian University of Munich²

15 Aug 2005

TL;DR: This paper introduces the multi-label informed latent semantic indexing (MLSI) algorithm which preserves the information of inputs and meanwhile captures the correlations between the multiple outputs and incorporates the human-annotated category information.

...read moreread less

Abstract: Latent semantic indexing (LSI) is a well-known unsupervised approach for dimensionality reduction in information retrieval. However if the output information (i.e. category labels) is available, it is often beneficial to derive the indexing not only based on the inputs but also on the target values in the training data set. This is of particular importance in applications with multiple labels, in which each document can belong to several categories simultaneously. In this paper we introduce the multi-label informed latent semantic indexing (MLSI) algorithm which preserves the information of inputs and meanwhile captures the correlations between the multiple outputs. The recovered "latent semantics" thus incorporate the human-annotated category information and can be used to greatly improve the prediction accuracy. Empirical study based on two data sets, Reuters-21578 and RCV1, demonstrates very encouraging results.

...read moreread less

Journal Article•DOI•

Evolving feature selection

[...]

Huan Liu¹, Edward R. Dougherty², Jennifer G. Dy³, Kari Torkkola⁴, Eugene Tuv⁵, Hanchuan Peng⁶, Chris Ding⁶, Fuhui Long⁶, Michael E. Berens⁷, Lance Parsons¹, Zheng Zhao¹, Lei Yu⁸, George Forman⁹ - Show less +9 more•Institutions (9)

Arizona State University¹, Texas A&M University², Northeastern University³, Motorola⁴, Intel⁵, Lawrence Berkeley National Laboratory⁶, Translational Genomics Research Institute⁷, Binghamton University⁸, Hewlett-Packard⁹

01 Nov 2005-IEEE Intelligent Systems

TL;DR: This article considers feature-selection overfitting with small-sample classifier design; feature selection for unlabeled data; variable selection using ensemble methods; minimum redundancy-maximum relevance feature selection; and biological relevance infeature selection for microarray data.

...read moreread less

Abstract: Data preprocessing is an indispensable step in effective data analysis. It prepares data for data mining and machine learning, which aim to turn data into business intelligence or knowledge. Feature selection is a preprocessing technique commonly used on high-dimensional data. Feature selection studies how to select a subset or list of attributes or variables that are used to construct models describing data. Its purposes include reducing dimensionality, removing irrelevant and redundant features, reducing the amount of data needed for learning, improving algorithms' predictive accuracy, and increasing the constructed models' comprehensibility. This article considers feature-selection overfitting with small-sample classifier design; feature selection for unlabeled data; variable selection using ensemble methods; minimum redundancy-maximum relevance feature selection; and biological relevance in feature selection for microarray data.

...read moreread less

Journal Article•DOI•

On fuzzy-rough sets approach to feature selection

[...]

Rajen B. Bhatt¹, M. Gopal¹•Institutions (1)

Indian Institute of Technology Delhi¹

15 May 2005-Pattern Recognition Letters

TL;DR: It is shown that the fuzzy-rough set attribute reduction algorithm is not convergent on many real datasets due to its poorly designed termination criteria; and the computational complexity of the algorithm increases exponentially with increase in the number of input variables and in multiplication with the size of data patterns.

...read moreread less

Proceedings Article•DOI•

Discriminant analysis with tensor representation

[...]

Shuicheng Yan¹, Dong Xu², Qiang Yang³, Lei Zhang⁴, Xiaoou Tang¹, Hong-Jiang Zhang⁴ - Show less +2 more•Institutions (4)

The Chinese University of Hong Kong¹, University of Science and Technology of China², Hong Kong University of Science and Technology³, Microsoft⁴

20 Jun 2005

TL;DR: This paper proposes a discriminant tensor criterion (DTC), whereby multiple interrelated lower-dimensional discriminative subspaces are derived for feature selection and an algorithm discriminant analysis with tensor representation (DATER), which has the potential to outperform the traditional subspace learning algorithms, especially in the small sample size cases.

...read moreread less

Abstract: In this paper, we present a novel approach to solving the supervised dimensionality reduction problem by encoding an image object as a general tensor of 2nd or higher order. First, we propose a discriminant tensor criterion (DTC), whereby multiple interrelated lower-dimensional discriminative subspaces are derived for feature selection. Then, a novel approach called k-mode cluster-based discriminant analysis is presented to iteratively learn these subspaces by unfolding the tensor along different tensor dimensions. We call this algorithm discriminant analysis with tensor representation (DATER), which has the following characteristics: 1) multiple interrelated subspaces can collaborate to discriminate different classes; 2) for classification problems involving higher-order tensors, the DATER algorithm can avoid the curse of dimensionality dilemma and overcome the small sample size problem; and 3) the computational cost in the learning stage is reduced to a large extent owing to the reduced data dimensions in generalized eigenvalue decomposition. We provide extensive experiments by encoding face images as 2nd or 3rd order tensors to demonstrate that the proposed DATER algorithm based on higher order tensors has the potential to outperform the traditional subspace learning algorithms, especially in the small sample size cases.

...read moreread less

Proceedings Article•DOI•

Stability of feature selection algorithms

[...]

Alexandros Kalousis¹, Julien Prados¹, Melanie Hilario¹•Institutions (1)

Geneva College¹

27 Nov 2005

TL;DR: This study quantifies the sensitivity of feature selection algorithms to variations in the training set by assessing the stability of the feature preferences that they express in the form of weights-scores, ranks, or a selected feature subset.

...read moreread less

Abstract: With the proliferation of extremely high-dimensional data, feature selection algorithms have become indispensable components of the learning process. Strangely, despite extensive work on the stability of learning algorithms, the stability of feature selection algorithms has been relatively neglected. This study is an attempt to fill that gap by quantifying the sensitivity of feature selection algorithms to variations in the training set. We assess the stability of feature selection algorithms based on the stability of the feature preferences that they express in the form of weights-scores, ranks, or a selected feature subset. We examine a number of measures to quantify the stability of feature preferences and propose an empirical way to estimate them. We perform a series of experiments with several feature selection algorithms on a set of proteomics datasets. The experiments allow us to explore the merits of each stability measure and create stability profiles of the feature selection algorithms. Finally we show how stability profiles can support the choice of a feature selection algorithm.

...read moreread less

Proceedings Article•

Distributed Estimation Using Reduced Dimensionality Sensor Observations

[...]

Ioannis D. Schizas¹, Georgios B. Giannakis¹, Zhi-Quan Luo¹•Institutions (1)

University of Minnesota¹

01 Dec 2005

TL;DR: This work derives linear estimators of stationary random signals based on reduced-dimensionality observations collected at distributed sensors and communicated to a fusion center over wireless links with closed-form mean-square error (MSE) optimal estimators along with coordinate descent suboptimal alternatives that guarantee convergence at least to a stationary point.

...read moreread less

Abstract: We derive linear estimators of stationary random signals based on reduced-dimensionality observations collected at distributed sensors and communicated to a fusion center over wireless links. Dimensionality reduction compresses sensor data to meet low-power and bandwidth constraints, while linearity in compression and estimation are well motivated by the limited computing capabilities wireless sensor networks are envisioned to operate with, and by the desire to estimate random signals from observations with unknown probability density functions. In the absence of fading and fusion center noise (ideal links), we cast this intertwined compression-estimation problem in a canonical cor- relation analysis framework and derive closed-form mean-square error (MSE) optimal estimators along with coordinate descent suboptimal alternatives that guarantee convergence at least to a stationary point. Likewise, we develop estimators based on reduced-dimensionality sensor observations in the presence of fading and additive noise at the fusion center (nonideal links). Performance analysis and corroborating simulations demonstrate the merits of the novel distributed estimators relative to existing alternatives.

...read moreread less

Proceedings Article•DOI•

Face recognition experiments with random projection

[...]

Navin Goel¹, George Bebis¹, Ara V. Nefian²•Institutions (2)

University of Nevada, Reno¹, Intel²

28 Mar 2005

TL;DR: The experimental results illustrate that although RP represents faces in a random, low-dimensional subspace, its overall performance is comparable to that of PCA while having lower computational requirements and being data independent.

...read moreread less

Abstract: There has been a strong trend lately in face processing research away from geometric models towards appearance models. Appearance-based methods employ dimensionality reduction to represent faces more compactly in a low-dimensional subspace which is found by optimizing certain criteria. The most popular appearance-based method is the method of eigenfaces that uses Principal Component Analysis (PCA) to represent faces in a low-dimensional subspace spanned by the eigenvectors of the covariance matrix of the data corresponding to the largest eigenvalues (i.e., directions of maximum variance). Recently, Random Projection (RP) has emerged as a powerful method for dimensionality reduction. It represents a computationally simple and efficient method that preserves the structure of the data without introducing significant distortion. Despite its simplicity, RP has promising theoretical properties that make it an attractive tool for dimensionality reduction. Our focus in this paper is on investigating the feasibility of RP for face recognition. In this context, we have performed a large number of experiments using three popular face databases and comparisons using PCA. Our experimental results illustrate that although RP represents faces in a random, low-dimensional subspace, its overall performance is comparable to that of PCA while having lower computational requirements and being data independent.

...read moreread less

Proceedings Article•

Nonlinear Dimensionality Reduction by Semidefinite Programming and Kernel Matrix Factorization.

[...]

Kilian Q. Weinberger¹, Benjamin Packer¹, Lawrence K. Saul¹•Institutions (1)

University of Pennsylvania¹

01 Jan 2005

TL;DR: It is shown that the full kernel matrix can be very well approximated by a product of smaller matrices, which leads to order-of-magnitude reductions in computation time and makes it possible to study much larger problems in manifold learning.

...read moreread less

Abstract: We describe an algorithm for nonlinear dimensionality reduction based on semidefinite programming and kernel matrix factorization. The algorithm learns a kernel matrix for high dimensional data that lies on or near a low dimensional manifold. In earlier work, the kernel matrix was learned by maximizing the variance in feature space while preserving the distances and angles between nearest neighbors. In this paper, adapting recent ideas from semi-supervised learning on graphs, we show that the full kernel matrix can be very well approximated by a product of smaller matrices. Representing the kernel matrix in this way, we can reformulate the semidefinite program in terms of a much smaller submatrix of inner products between randomly chosen landmarks. The new framework leads to order-of-magnitude reductions in computation time and makes it possible to study much larger problems in manifold learning.

...read moreread less

Journal Article•DOI•

Applications of support vector machines to cancer classification with microarray data.

[...]

Feng Chu¹, Lipo Wang¹•Institutions (1)

Nanyang Technological University¹

01 Dec 2005-International Journal of Neural Systems

TL;DR: The support vector machine (SVM) is used for cancer classification with microarray data and is able to obtain the same classification accuracy but with much fewer features compared to other published results.

...read moreread less

Abstract: Microarray gene expression data usually have a large number of dimensions, e.g., over ten thousand genes, and a small number of samples, e.g., a few tens of patients. In this paper, we use the support vector machine (SVM) for cancer classification with microarray data. Dimensionality reduction methods, such as principal components analysis (PCA), class-separability measure, Fisher ratio, and t-test, are used for gene selection. A voting scheme is then employed to do multi-group classification by k(k - 1) binary SVMs. We are able to obtain the same classification accuracy but with much fewer features compared to other published results.

...read moreread less

Journal Article•DOI•

Feature selection and nearest centroid classification for protein mass spectrometry

[...]

Ilya Levner¹•Institutions (1)

University of Alberta¹

23 Mar 2005-BMC Bioinformatics

TL;DR: This study tested a number of popular feature selection methods using the nearest centroid classifier and found that several reportedly state-of-the-art algorithms in fact perform rather poorly when tested via stratified cross-validation, providing clear evidence that algorithm evaluation should be performed on several data sets using a consistent cross- validation procedure in order for the conclusions to be statistically sound.

...read moreread less

Abstract: The use of mass spectrometry as a proteomics tool is poised to revolutionize early disease diagnosis and biomarker identification. Unfortunately, before standard supervised classification algorithms can be employed, the "curse of dimensionality" needs to be solved. Due to the sheer amount of information contained within the mass spectra, most standard machine learning techniques cannot be directly applied. Instead, feature selection techniques are used to first reduce the dimensionality of the input space and thus enable the subsequent use of classification algorithms. This paper examines feature selection techniques for proteomic mass spectrometry. This study examines the performance of the nearest centroid classifier coupled with the following feature selection algorithms. Student-t test, Kolmogorov-Smirnov test, and the P-test are univariate statistics used for filter-based feature ranking. From the wrapper approaches we tested sequential forward selection and a modified version of sequential backward selection. Embedded approaches included shrunken nearest centroid and a novel version of boosting based feature selection we developed. In addition, we tested several dimensionality reduction approaches, namely principal component analysis and principal component analysis coupled with linear discriminant analysis. To fairly assess each algorithm, evaluation was done using stratified cross validation with an internal leave-one-out cross-validation loop for automated feature selection. Comprehensive experiments, conducted on five popular cancer data sets, revealed that the less advocated sequential forward selection and boosted feature selection algorithms produce the most consistent results across all data sets. In contrast, the state-of-the-art performance reported on isolated data sets for several of the studied algorithms, does not hold across all data sets. This study tested a number of popular feature selection methods using the nearest centroid classifier and found that several reportedly state-of-the-art algorithms in fact perform rather poorly when tested via stratified cross-validation. The revealed inconsistencies provide clear evidence that algorithm evaluation should be performed on several data sets using a consistent (i.e., non-randomized, stratified) cross-validation procedure in order for the conclusions to be statistically sound.

...read moreread less

Journal Article•DOI•

Ovarian cancer identification based on dimensionality reduction for high-throughput mass spectrometry data

[...]

J. S. Yu¹, Stefano Ongarello², R. Fiedler², Xue-wen Chen³, Gianna Toffolo⁴, Claudio Cobelli⁴, Zlatko Trajanoski² - Show less +3 more•Institutions (4)

Peking University¹, Graz University of Technology², University of Kansas³, University of Padua⁴

15 May 2005-Bioinformatics

TL;DR: A novel method for dimensionality reduction and tested on a published ovarian high-resolution SELDI-TOF dataset, using a four-step strategy for data preprocessing based on binning, Kolmogorov-Smirnov test, restriction of coefficient of variation and wavelet analysis.

...read moreread less

Abstract: Motivation: High-throughput and high-resolution mass spectrometry instruments are increasingly used for disease classification and therapeutic guidance. However, the analysis of immense amount of data poses considerable challenges. We have therefore developed a novel method for dimensionality reduction and tested on a published ovarian high-resolution SELDI-TOF dataset. Results: We have developed a four-step strategy for data preprocessing based on: (1) binning, (2) Kolmogorov--Smirnov test, (3) restriction of coefficient of variation and (4) wavelet analysis. Subsequently, support vector machines were used for classification. The developed method achieves an average sensitivity of 97.38% (sd = 0.0125) and an average specificity of 93.30% (sd = 0.0174) in 1000 independent k-fold cross-validations, where k = 2, ..., 10. Availability: The software is available for academic and non-commercial institutions. Contact: zlatko.trajanoski@tugraz.at

...read moreread less

Journal Article•DOI•

Optimal dimensionality reduction of sensor data in multisensor estimation fusion

[...]

Yunmin Zhu¹, Enbin Song¹, Jie Zhou¹, Zhisheng You¹•Institutions (1)

Sichuan University¹

01 May 2005-IEEE Transactions on Signal Processing

TL;DR: This paper will answer the above questions by using the matrix decomposition, pseudo-inverse, and eigenvalue techniques.

...read moreread less

Abstract: When there exists the limitation of communication bandwidth between sensors and a fusion center, one needs to optimally precompress sensor outputs-sensor observations or estimates before the sensors' transmission in order to obtain a constrained optimal estimation at the fusion center in terms of the linear minimum error variance criterion, or when an allowed performance loss constraint exists, one needs to design the minimum dimension of sensor data. This paper will answer the above questions by using the matrix decomposition, pseudo-inverse, and eigenvalue techniques.

...read moreread less

Journal Article•DOI•

IDR/QR: an incremental dimension reduction algorithm via QR decomposition

[...]

Jieping Ye¹, Qi Li, Hui Xiong, Haesun Park¹, Ravi Janardan, Vipin Kumar - Show less +2 more•Institutions (1)

University of Minnesota¹

01 Sep 2005-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper proposes an LDA-based incremental dimension reduction algorithm, called IDR/QR, which applies QR decomposition rather than SVD, which does not require the whole data matrix in main memory, which is desirable for large data sets.

...read moreread less

Abstract: Dimension reduction is a critical data preprocessing step for many database and data mining applications, such as efficient storage and retrieval of high-dimensional data. In the literature, a well-known dimension reduction algorithm is linear discriminant analysis (LDA). The common aspect of previously proposed LDA-based algorithms is the use of singular value decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there has been little work on designing incremental LDA algorithms that can efficiently incorporate new data items as they become available. In this paper, we propose an LDA-based incremental dimension reduction algorithm, called IDR/QR, which applies QR decomposition rather than SVD. Unlike other LDA-based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification error rate on the reduced dimensional space. Our experiments on several real-world data sets reveal that the classification error rate achieved by the IDR/QR algorithm is very close to the best possible one achieved by other LDA-based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are inserted dynamically.

...read moreread less

Collapse