
Showing papers on "Dimensionality reduction published in 2001"


Journal ArticleDOI
TL;DR: This work introduces a new dimensionality reduction technique called Piecewise Aggregate Approximation (PAA) and compares it theoretically and empirically to the other techniques, demonstrating its superiority.
Abstract: The problem of similarity search in large time series databases has attracted much attention recently. It is a non-trivial problem because of the inherent high dimensionality of the data. The most promising solutions involve first performing dimensionality reduction on the data, and then indexing the reduced data with a spatial access method. Three major dimensionality reduction techniques have been proposed: Singular Value Decomposition (SVD), the Discrete Fourier transform (DFT), and more recently the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Piecewise Aggregate Approximation (PAA). We theoretically and empirically compare it to the other techniques and demonstrate its superiority. In addition to being competitive with or faster than the other methods, our approach has numerous other advantages. It is simple to understand and to implement, it allows more flexible distance measures, including weighted Euclidean queries, and the index can be built in linear time.
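The averaging step behind PAA is short enough to sketch. The following is a minimal illustration only, not the paper's indexing machinery; it assumes the series length is an exact multiple of the segment count, and `n_segments` is an arbitrary illustrative parameter.

```python
# Minimal sketch of Piecewise Aggregate Approximation (PAA), assuming a 1-D
# series whose length is an exact multiple of the number of segments.
import numpy as np

def paa(series, n_segments):
    """Reduce a time series to n_segments values by averaging equal-length frames."""
    x = np.asarray(series, dtype=float)
    frame = len(x) // n_segments              # assumes len(x) % n_segments == 0
    return x[: frame * n_segments].reshape(n_segments, frame).mean(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ts = np.cumsum(rng.standard_normal(256))  # toy random-walk series
    print(paa(ts, 8))                         # 8-dimensional representation
```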

1,550 citations


Proceedings ArticleDOI
26 Aug 2001
TL;DR: It is shown that projecting the data onto a random lower-dimensional subspace yields results comparable to conventional dimensionality reduction methods such as principal component analysis: the similarity of data vectors is preserved well under random projection.
Abstract: Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite nicely; however, empirical results are sparse. We present experimental results on using random projection as a dimensionality reduction tool in a number of cases, where the high dimensionality of the data would otherwise lead to burdensome computations. Our application areas are the processing of both noisy and noiseless images, and information retrieval in text documents. We show that projecting the data onto a random lower-dimensional subspace yields results comparable to conventional dimensionality reduction methods such as principal component analysis: the similarity of data vectors is preserved well under random projection. However, using random projections is computationally significantly less expensive than using, e.g., principal component analysis. We also show experimentally that using a sparse random matrix gives additional computational savings in random projection.
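A minimal sketch of the construction the abstract refers to, using an Achlioptas-style sparse random matrix with entries in {+√3, 0, −√3}; the data shapes and the target dimension are illustrative, not the authors' experimental setup.

```python
# Hedged sketch of random projection with a sparse random matrix.
import numpy as np

def sparse_random_projection(X, k, seed=0):
    """Project rows of X (n x d) onto a random k-dimensional subspace."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Entries are +1, 0, -1 with probabilities 1/6, 2/3, 1/6, scaled by sqrt(3).
    R = rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1/6, 2/3, 1/6]) * np.sqrt(3)
    return X @ R / np.sqrt(k)                 # scaling keeps distances roughly unchanged

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 1000))      # 100 vectors in 1000 dimensions
    Y = sparse_random_projection(X, 50)
    # Pairwise distances are approximately preserved:
    i, j = 3, 7
    print(np.linalg.norm(X[i] - X[j]), np.linalg.norm(Y[i] - Y[j]))
```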

1,470 citations


Proceedings ArticleDOI
01 May 2001
TL;DR: This work introduces a new dimensionality reduction technique called Adaptive Piecewise Constant Approximation (APCA), shows how APCA can be indexed using a multidimensional index structure, and proposes two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching.
Abstract: Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier transform (DFT), and the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Adaptive Piecewise Constant Approximation (APCA). While previous techniques (e.g., SVD, DFT and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant value segments of varying lengths such that their individual reconstruction errors are minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower bounding Euclidean distance approximation, and a non-lower bounding, but very tight, Euclidean distance approximation, and show how they can support fast exact searching, and even faster approximate searching on the same index structure. We theoretically and empirically compare APCA to all the other techniques and demonstrate its superiority.
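For intuition only, here is a naive greedy construction of an adaptive piecewise-constant approximation: start with one segment per point and repeatedly merge the adjacent pair whose merge adds the least squared reconstruction error. The paper describes a much faster construction together with the index structures and lower-bounding distances, none of which is reproduced here.

```python
# Illustrative greedy sketch of an adaptive piecewise-constant approximation.
import numpy as np

def apca_greedy(series, n_segments):
    """Approximate `series` by n_segments constant segments of varying length."""
    x = np.asarray(series, dtype=float)
    segs = [[v] for v in x]                   # one segment per point to start
    def cost(seg):
        a = np.asarray(seg)
        return float(((a - a.mean()) ** 2).sum())
    while len(segs) > n_segments:
        # Merge the adjacent pair that increases squared error the least.
        costs = [cost(segs[i] + segs[i + 1]) - cost(segs[i]) - cost(segs[i + 1])
                 for i in range(len(segs) - 1)]
        i = int(np.argmin(costs))
        segs[i:i + 2] = [segs[i] + segs[i + 1]]
    # Each segment is summarised by (mean value, right endpoint index).
    ends = np.cumsum([len(s) for s in segs]) - 1
    return [(float(np.mean(s)), int(e)) for s, e in zip(segs, ends)]

if __name__ == "__main__":
    ts = np.concatenate([np.zeros(20), np.full(5, 3.0), np.full(15, -1.0)])
    print(apca_greedy(ts, 3))                 # recovers the three constant regions
```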

849 citations


Proceedings Article
03 Jan 2001
TL;DR: This paper draws on ideas from the exponential family, generalized linear models, and Bregman distances to give a generalization of PCA to loss functions that are argued to be better suited to other data types.
Abstract: Principal component analysis (PCA) is a commonly applied technique for dimensionality reduction. PCA implicitly minimizes a squared loss function, which may be inappropriate for data that is not real-valued, such as binary-valued data. This paper draws on ideas from the exponential family, generalized linear models, and Bregman distances to give a generalization of PCA to loss functions that we argue are better suited to other data types. We describe algorithms for minimizing the loss functions, and give examples on simulated data.
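The generalization can be stated compactly: ordinary PCA fits a low-rank parameter matrix under squared loss, and the proposed family swaps the squared loss for an exponential-family negative log-likelihood (equivalently, a Bregman distance). The Bernoulli case below is one illustrative instance; the notation Θ = UVᵀ is ours, not necessarily the paper's.

```latex
% Notation is illustrative: \Theta = UV^\top is the rank-\ell parameter matrix.
% Ordinary PCA (squared loss):
\min_{U,V} \sum_{i,j} \bigl(x_{ij} - \theta_{ij}\bigr)^2 ,
\qquad \Theta = UV^\top .
% Bernoulli instance of the exponential-family generalization ("logistic PCA"),
% i.e. the negative Bernoulli log-likelihood with natural parameter \theta_{ij}:
\min_{U,V} \sum_{i,j} \Bigl[\log\bigl(1 + e^{\theta_{ij}}\bigr) - x_{ij}\,\theta_{ij}\Bigr] ,
\qquad \Theta = UV^\top .
```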

506 citations


Journal ArticleDOI
TL;DR: By extracting uncorrelated discriminant features, face recognition can be performed with higher accuracy on mosaic images of resolution lower than 16×16; it is suggested that the optimal face image resolution is the resolution m × n for which the dimensionality N = mn of the original image vector space is larger than, and close to, the number of known-face classes.

383 citations


01 Jan 2001
TL;DR: Locally linear embedding is described, an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data and attempts to discover nonlinear structure in high-dimensional data by exploiting the local symmetries of linear reconstructions.
Abstract: Many problems in information processing involve some form of dimensionality reduction. Here we describe locally linear embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data. LLE attempts to discover nonlinear structure in high dimensional data by exploiting the local symmetries of linear reconstructions. Notably, LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations, though capable of generating highly nonlinear embeddings, do not involve local minima. We illustrate the method on images of lips used in audiovisual speech synthesis.
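A compact numpy sketch of the three LLE steps (find neighbours, solve for reconstruction weights, take the bottom eigenvectors of (I−W)ᵀ(I−W)) is given below; the neighbour count, output dimension and regularization constant are illustrative choices, and the efficiency tricks of the authors' implementation are omitted.

```python
# Minimal numpy sketch of locally linear embedding (LLE).
import numpy as np

def lle(X, n_neighbors=10, d_out=2, reg=1e-3):
    n = X.shape[0]
    # 1. k nearest neighbours of each point (excluding the point itself).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    # 2. Reconstruction weights: each x_i as an affine combination of its neighbours.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                              # centre neighbours on x_i
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(n_neighbors)       # regularize for stability
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()                        # weights sum to one
    # 3. Embedding: bottom eigenvectors of M = (I - W)^T (I - W),
    #    skipping the constant eigenvector.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d_out + 1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.uniform(0, 3 * np.pi, 400)                     # toy curled-up curve
    X = np.c_[t * np.cos(t), rng.uniform(0, 5, 400), t * np.sin(t)]
    Y = lle(X, n_neighbors=12, d_out=2)
    print(Y.shape)                                          # (400, 2)
```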

259 citations


Proceedings ArticleDOI
01 Dec 2001
TL;DR: It is found that for regression the tensor-rank coding, as a dimensionality reduction technique, significantly outperforms other techniques like PCA.
Abstract: Given a collection of images (matrices) representing a "class" of objects, we present a method for extracting the commonalities of the image space directly from the matrix representations (rather than from the vectorized representation which one would normally do in a PCA approach, for example). The general idea is to consider the collection of matrices as a tensor and to look for an approximation of its tensor-rank. The tensor-rank approximation is designed such that the SVD decomposition emerges in the special case where all the input matrices are the repetition of a single matrix. We evaluate the coding technique both in terms of regression, i.e., the efficiency of the technique for functional approximation, and classification. We find that for regression the tensor-rank coding, as a dimensionality reduction technique, significantly outperforms other techniques like PCA. As for classification, the tensor-rank coding is at its best when the number of training examples is very small.

231 citations


Journal ArticleDOI
TL;DR: Similarity search in large time series databases has attracted much research interest recently, as discussed by the authors; however, it is a difficult problem because of the typically high dimensionality of the data, and the most promisi...
Abstract: Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data. The most promisi...

226 citations


Journal ArticleDOI
TL;DR: On the human signal detection task, the superiority of Kernel PCA feature extraction over linear PCA is reported and de-noising of the original data by the appropriate selection of various nonlinear principal components is demonstrated.
Abstract: In this paper, we propose the application of the Kernel Principal Component Analysis (PCA) technique for feature selection in a high-dimensional feature space, where input variables are mapped by a Gaussian kernel. The extracted features are employed in the regression problems of chaotic Mackey–Glass time-series prediction in a noisy environment and estimating human signal detection performance from brain event-related potentials elicited by task relevant signals. We compared results obtained using either Kernel PCA or linear PCA as data preprocessing steps. On the human signal detection task, we report the superiority of Kernel PCA feature extraction over linear PCA. Similar to linear PCA, we demonstrate de-noising of the original data by the appropriate selection of various nonlinear principal components. The theoretical relation and experimental comparison of Kernel Principal Components Regression, Kernel Ridge Regression and ε-insensitive Support Vector Regression is also provided.
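For reference, a bare-bones version of the feature-extraction step with a Gaussian kernel might look as follows; the kernel width and number of components are placeholders, not the settings used in the experiments.

```python
# Sketch of kernel PCA feature extraction with a Gaussian (RBF) kernel.
import numpy as np

def kernel_pca(X, n_components=5, gamma=0.1):
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))  # Gaussian kernel
    # Centre the kernel matrix in feature space.
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]        # largest eigenvalues first
    vals, vecs = vals[idx], vecs[:, idx]
    # Projections of the training points onto the nonlinear principal components.
    return Kc @ vecs / np.sqrt(np.maximum(vals, 1e-12))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    features = kernel_pca(X, n_components=5, gamma=0.05)
    print(features.shape)                               # (200, 5) -> input to a regressor
```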

216 citations


01 Jan 2001
TL;DR: This chapter contains sections titled: Introduction, Latent Variable Models andPCA, Probabilistic PCA, Mixtures of Probabilistically Principal Component Analyzers, Local Linear Dimensionality Reduction, Density Modeling, Conclusions.
Abstract: This chapter contains sections titled: Introduction, Latent Variable Models and PCA, Probabilistic PCA, Mixtures of Probabilistic Principal Component Analyzers, Local Linear Dimensionality Reduction, Density Modeling, Conclusions, Appendix A: Maximum Likelihood PCA, Appendix B: Optimal Least-Squares Reconstruction, Appendix C: EM for Mixtures of Probabilistic PCA, Acknowledgments, References

211 citations


Journal ArticleDOI
TL;DR: The key to this approach is to regard the signals as curves in the continuum and employ a functional data-analytic method for dimension reduction, based on the FDA technique for principal coordinates analysis, which has the advantage of providing a signal approximation that is best possible, in an L2 sense, for a given dimension.
Abstract: Motivated by specific problems involving radar-range profiles, we suggest techniques for real-time discrimination in the context of signal analysis. The key to our approach is to regard the signals as curves in the continuum and employ a functional data-analytic (FDA) method for dimension reduction, based on the FDA technique for principal coordinates analysis. This has the advantage, relative to competing methods such as canonical variates analysis, of providing a signal approximation that is best possible, in an L2 sense, for a given dimension. As a result, it produces particularly good discrimination. We explore the use of both nonparametric and Gaussian-based discriminators applied to the dimension-reduced data.

Journal ArticleDOI
TL;DR: In this paper, a permutation test is suggested as a means of determining dimension, and examples are given throughout the discussion, which can be viewed as pre-processors, aiding the analyst's understanding of the data and the choice of a final classifier.
Abstract: Summary This paper discusses visualization methods for discriminant analysis. It does not address numerical methods for classification per se, but rather focuses on graphical methods that can be viewed as pre-processors, aiding the analyst’s understanding of the data and the choice of a final classifier. The methods are adaptations of recent results in dimension reduction for regression, including sliced inverse regression and sliced average variance estimation. A permutation test is suggested as a means of determining dimension, and examples are given throughout the discussion.
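One of the regression-based dimension reduction methods adapted here, sliced inverse regression, is simple enough to sketch: standardize the predictors, slice on the response, and take the leading eigenvectors of the covariance of the slice means. The slice count and toy data below are illustrative only.

```python
# Compact sketch of sliced inverse regression (SIR).
import numpy as np

def sir_directions(X, y, n_slices=8, n_dirs=2):
    n, d = X.shape
    # Standardize X (whitening transform from the inverse covariance).
    mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    L = np.linalg.cholesky(np.linalg.inv(cov))
    Z = (X - mu) @ L
    # Slice on y and collect weighted outer products of the slice means.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    M = np.zeros((d, d))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M, mapped back to the original coordinates.
    vals, vecs = np.linalg.eigh(M)
    B = L @ vecs[:, ::-1][:, :n_dirs]
    return B          # columns span the estimated dimension-reduction subspace

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 6))
    y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(500)
    print(sir_directions(X, y, n_slices=10, n_dirs=2))
```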

Proceedings ArticleDOI
07 Jul 2001
TL;DR: This work addresses the feature selection problem with a three-step algorithm whose first step uses a variation of the well-known Relief algorithm to remove irrelevance; the combination is shown to be more effective than standard feature selection algorithms for large data sets with many irrelevant and redundant features.
Abstract: The number of features that can be computed over an image is, for practical purposes, limitless. Unfortunately, the number of features that can be computed and exploited by most computer vision systems is considerably smaller. As a result, it is important to develop techniques for selecting features from very large data sets that include many irrelevant or redundant features. This work addresses the feature selection problem by proposing a three-step algorithm. The first step uses a variation of the well-known Relief algorithm to remove irrelevance; the second step clusters features using K-means to remove redundancy; and the third step is a standard combinatorial feature selection algorithm. This three-step combination is shown to be more effective than standard feature selection algorithms for large data sets with lots of irrelevant and redundant features. It is also shown to be no worse than standard techniques for data sets that do not have these properties. Finally, we show a third experiment in which a data set with 4096 features is reduced to 5% of its original size with very little information loss.
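A hedged sketch of the first two steps, with a basic Relief-style relevance score and K-means over feature columns, is shown below; the relevance threshold, cluster count and scikit-learn usage are illustrative, and the final combinatorial step is left to any standard wrapper selector.

```python
# Sketch: Relief-style relevance scoring, then K-means over features for redundancy.
import numpy as np
from sklearn.cluster import KMeans

def relief_scores(X, y):
    """Binary-class Relief: reward features that separate nearest hit/miss."""
    n, d = X.shape
    w = np.zeros(d)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    for i in range(n):
        hit = np.argmin(np.where(y == y[i], D[i], np.inf))    # nearest same-class point
        miss = np.argmin(np.where(y != y[i], D[i], np.inf))   # nearest other-class point
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n

def select_features(X, y, n_clusters=10, relevance_quantile=0.5):
    # Step 1: drop irrelevant features (low Relief score).
    w = relief_scores(X, y)
    keep = np.where(w >= np.quantile(w, relevance_quantile))[0]
    # Step 2: cluster the remaining features (as column vectors) and keep the
    # most relevant feature from each cluster to remove redundancy.
    F = X[:, keep].T                                           # one row per feature
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(F)
    chosen = [keep[labels == c][np.argmax(w[keep][labels == c])]
              for c in range(n_clusters)]
    return sorted(int(f) for f in chosen)                      # step 3 (wrapper) omitted

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 200)
    informative = y[:, None] + 0.3 * rng.standard_normal((200, 3))
    X = np.hstack([informative, rng.standard_normal((200, 47))])  # 47 noise features
    print(select_features(X, y, n_clusters=5))                 # should favour columns 0-2
```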

Journal ArticleDOI
TL;DR: In this article, the authors proposed a method of effective dimension reduction for a multi-index model which is based on iterative improvement of the family of average derivative estimates, and showed that in the case when the effective dimension m of the index space does not exceed 3, this space can be estimated with the rate n^{-1/2} under mild assumptions on the model.
Abstract: We propose a new method of effective dimension reduction for a multi-index model which is based on iterative improvement of the family of average derivative estimates. The procedure is computationally straightforward and does not require any prior information about the structure of the underlying model. We show that in the case when the effective dimension m of the index space does not exceed 3, this space can be estimated with the rate n^{-1/2} under rather mild assumptions on the model.

Proceedings Article
03 Jan 2001
TL;DR: A variant of LLE that can simultaneously group the data and calculate local embedding of each group is studied, and an estimate for the upper bound on the intrinsic dimension of the data set is obtained automatically.
Abstract: Locally Linear Embedding (LLE) is an elegant nonlinear dimensionality-reduction technique recently introduced by Roweis and Saul [2]. It fails when the data is divided into separate groups. We study a variant of LLE that can simultaneously group the data and calculate local embedding of each group. An estimate for the upper bound on the intrinsic dimension of the data set is obtained automatically.

Journal ArticleDOI
TL;DR: By comparing several feature selection methods, this work demonstrates how phenotypic classes can be predicted by combining feature selection and discriminant analysis and shows that the right dimension reduction strategy is of crucial importance for the classification performance.
Abstract: Molecular portraits, such as mRNA expression or DNA methylation patterns, have been shown to be strongly correlated with phenotypical parameters. These molecular patterns can be revealed routinely on a genomic scale. However, class prediction based on these patterns is an under-determined problem, due to the extreme high dimensionality of the data compared to the usually small number of available samples. This makes a reduction of the data dimensionality necessary. Here we demonstrate how phenotypic classes can be predicted by combining feature selection and discriminant analysis. By comparing several feature selection methods we show that the right dimension reduction strategy is of crucial importance for the classification performance. The techniques are demonstrated by methylation pattern based discrimination between acute lymphoblastic leukemia and acute myeloid leukemia.

Journal ArticleDOI
TL;DR: A unified covariance model is introduced that implements the probabilistic principal surface (PPS), and it is shown in two different comparisons that the PPS outperforms the GTM under identical parameter settings.
Abstract: Principal curves and surfaces are nonlinear generalizations of principal components and subspaces, respectively. They can provide insightful summary of high-dimensional data not typically attainable by classical linear methods. Solutions to several problems, such as proof of existence and convergence, faced by the original principal curve formulation have been proposed in the past few years. Nevertheless, these solutions are not generally extensible to principal surfaces, the mere computation of which presents a formidable obstacle. Consequently, relatively few studies of principal surfaces are available. We previously (2000) proposed the probabilistic principal surface (PPS) to address a number of issues associated with current principal surface algorithms. PPS uses a manifold oriented covariance noise model, based on the generative topographical mapping (GTM), which can be viewed as a parametric formulation of Kohonen's self-organizing map. Building on the PPS, we introduce a unified covariance model that implements PPS (0 < α < 1) by varying the clamping parameter α. Then, we comprehensively evaluate the empirical performance of PPS, GTM, and the manifold-aligned GTM on three popular benchmark data sets. It is shown in two different comparisons that the PPS outperforms the GTM under identical parameter settings. Convergence of the PPS is found to be identical to that of the GTM and the computational overhead incurred by the PPS decreases to 40 percent or less for more complex manifolds. These results show that the generalized PPS provides a flexible and effective way of obtaining principal surfaces.

01 Jan 2001
TL;DR: It is shown how the "curse of dimensionality" and the "empty space phenomenon" can be taken into account in the design of neural network algorithms, and how non-linear dimension reduction techniques can be used to circumvent the problem.
Abstract: Observations from real-world problems are often high-dimensional vectors, i.e. made up of many variables. Learning methods, including artificial neural networks, often have difficulty handling a relatively small number of high-dimensional data points. In this paper, we show how concepts gained from our intuition on 2- and 3-dimensional data can be misleading when used in high-dimensional settings. We then show how the "curse of dimensionality" and the "empty space phenomenon" can be taken into account in the design of neural network algorithms, and how non-linear dimension reduction techniques can be used to circumvent the problem. We conclude with an illustrative example of this last method on the forecasting of financial time series.

Journal ArticleDOI
TL;DR: The numerical experiments show the ability of rough sets to select a reduced set of pattern features (minimizing the pattern size) while providing better generalization of neural-network texture classifiers.

Journal ArticleDOI
TL;DR: It is proved that the classical optimal discriminant vectors are equivalent to UODV, which can be used to extract (L−1) uncorrelated discriminant features for L-class problems without losing any discriminant information in the sense of the Fisher discriminant criterion function.

Proceedings ArticleDOI
Charu C. Aggarwal1
01 May 2001
TL;DR: An intuitive model of the effects of dimensionality reduction on arbitrary high dimensional problems is provided, and it is demonstrated that by making simple changes to the implementation details of dimensionality reduction techniques, one can considerably improve the quality of similarity search.
Abstract: The dimensionality curse has profound effects on the effectiveness of high-dimensional similarity indexing from the performance perspective. One of the well-known techniques for improving the indexing performance is the method of dimensionality reduction. In this technique, the data is transformed to a lower dimensional space by finding a new axis-system in which most of the data variance is preserved in a few dimensions. This reduction may also have a positive effect on the quality of similarity for certain data domains such as text. For other domains, it may lead to loss of information and degradation of search quality. Recent research indicates that the improvement for the text domain is caused by the reinforcement of the semantic concepts in the data. In this paper, we provide an intuitive model of the effects of dimensionality reduction on arbitrary high dimensional problems. We provide an effective diagnosis of the causality behind the qualitative effects of dimensionality reduction on a given data set. The analysis suggests that these effects are very data dependent. Our analysis also indicates that currently accepted techniques of picking the reduction which results in the least loss of information are useful for maximizing precision and recall, but are not necessarily optimum from a qualitative perspective. We demonstrate that by making simple changes to the implementation details of dimensionality reduction techniques, we can considerably improve the quality of similarity search.

Book ChapterDOI
02 Jul 2001
TL;DR: In this paper, the authors provide a systematic study of input decimation on synthetic data sets and analyze how the interaction between correlation and performance in base classifiers affects ensemble performance, and propose a method that decouples the classifiers by training them with different subsets of the input features.
Abstract: Using an ensemble of classifiers instead of a single classifier has been shown to improve generalization performance in many machine learning problems [4, 16]. However, the extent of such improvement depends greatly on the amount of correlation among the errors of the base classifiers [1,14]. As such, reducing those correlations while keeping the base classifiers' performance levels high is a promising research topic. In this paper, we describe input decimation, a method that decouples the base classifiers by training them with different subsets of the input features. In past work [15], we showed the theoretical benefits of input decimation and presented its application to a handful of real data sets. In this paper, we provide a systematic study of input decimation on synthetic data sets and analyze how the interaction between correlation and performance in base classifiers affects ensemble performance.

Journal ArticleDOI
TL;DR: In this paper, the problem of identifying subsets of variables that best approximate the full set of variables or their first few principal components is considered, thus stressing dimensionality reduction in terms of the original variables rather than the derived variables.
Abstract: Principal component analysis is widely used in the analysis of multivariate data in the agricultural, biological, and environmental sciences. The first few principal components (PCs) of a set of variables are derived variables with optimal properties in terms of approximating the original variables. This paper considers the problem of identifying subsets of variables that best approximate the full set of variables or their first few PCs, thus stressing dimensionality reduction in terms of the original variables rather than in terms of derived variables (PCs) whose definition requires all the original variables. Criteria for selecting variables are often ill defined and may produce inappropriate subsets. Indicators of the performance of different subsets of the variables are discussed and two criteria are defined. These criteria are used in stepwise selection-type algorithms to choose good subsets. Examples are given that show, among other things, that the selection of variable subsets should not be based only on the PC loadings of the variables.

Proceedings ArticleDOI
09 Jul 2001
TL;DR: This paper shows the interest of ICA as a tool for unsupervised analysis of hyperspectral images; by using higher order statistics it leads to independent components, a stronger statistical assumption that reveals interesting features in the usually non-Gaussian hyperspectral data sets.
Abstract: Independent component analysis (ICA) is a multivariate data analysis process that has been widely studied in recent years in the signal processing community for blind source separation. This paper shows the interest of ICA as a tool for unsupervised analysis of hyperspectral images. The commonly used principal component analysis (PCA) is the mean-square-optimal projection for Gaussian data, leading to uncorrelated components by using second order statistics. ICA instead uses higher order statistics and leads to independent components, a stronger statistical assumption that reveals interesting features in the usually non-Gaussian hyperspectral data sets.
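In practice the analysis the abstract describes amounts to unfolding the image cube into a pixel-by-band matrix and extracting components; a minimal sketch using scikit-learn's FastICA follows, with the cube shape and component count invented purely for illustration.

```python
# Minimal sketch: PCA vs. ICA on an unfolded hyperspectral cube.
import numpy as np
from sklearn.decomposition import PCA, FastICA

# Toy cube: rows x cols pixels, each with `bands` spectral measurements.
rows, cols, bands = 64, 64, 100
rng = np.random.default_rng(0)
cube = rng.random((rows, cols, bands))

X = cube.reshape(-1, bands)                  # one spectrum per pixel

# PCA: uncorrelated components (second-order statistics only).
pcs = PCA(n_components=5).fit_transform(X)

# ICA: statistically independent components (higher-order statistics).
ics = FastICA(n_components=5, random_state=0).fit_transform(X)

# Each column can be reshaped back into a component "image" for inspection.
component_images = ics.T.reshape(5, rows, cols)
print(pcs.shape, ics.shape, component_images.shape)
```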

Journal ArticleDOI
TL;DR: MKL is shown to outperform KL when the data distribution is far from a multidimensional Gaussian and to better cope with large sets of patterns, which could cause a severe performance drop in KL.
Abstract: This work introduces the multispace Karhunen-Loeve (MKL) as a new approach to unsupervised dimensionality reduction for pattern representation and classification. The training set is automatically partitioned into disjoint subsets, according to an optimality criterion; each subset then determines a different KL subspace which is specialized in representing a particular group of patterns. The extension of the classical KL operators and the definition of ad hoc distances allow MKL to be effectively used where KL is commonly employed. The limits of the standard KL transform are pointed out, in particular, MKL is shown to outperform KL when the data distribution is far from a multidimensional Gaussian and to better cope with large sets of patterns, which could cause a severe performance drop in KL.

Journal ArticleDOI
TL;DR: In this paper, feature space theory is introduced as a mathematical foundation for feature related concepts and techniques in data mining.
Abstract: In data mining, an important task in classification and prediction includes feature construction, feature description, feature selection, feature relevance analysis and feature reduction. In this paper, feature space theory is introduced as a mathematical foundation for feature related concepts and techniques in data mining.

Proceedings ArticleDOI
01 Oct 2001
TL;DR: A novel method for extracting features for the class of images represented by the positive images provided by subjective RF is proposed, using Principal Component Analysis (PCA) to reduce both noise contained in the original image features and dimensionality of feature spaces.
Abstract: In the past few years, relevance feedback (RF) has been used as an effective solution for content-based image retrieval (CBIR). Although effective, the RF-CBIR framework does not address the issue of feature extraction for dimension reduction and noise reduction. In this paper, we propose a novel method for extracting features for the class of images represented by the positive images provided by subjective RF. Principal Component Analysis (PCA) is used to reduce both noise contained in the original image features and dimensionality of feature spaces. The method increases the retrieval speed and reduces the memory significantly without sacrificing the retrieval accuracy.

Journal ArticleDOI
TL;DR: A novel enhancement for unsupervised learning of conditional Gaussian networks that benefits from feature selection based on the assumption that in the absence of labels reflecting the cluster membership of each case of the database, those features that exhibit low correlation with the rest of the features can be considered irrelevant for the learning process.
Abstract: This paper introduces a novel enhancement for unsupervised learning of conditional Gaussian networks that benefits from feature selection. Our proposal is based on the assumption that, in the absence of labels reflecting the cluster membership of each case of the database, those features that exhibit low correlation with the rest of the features can be considered irrelevant for the learning process. Thus, we suggest performing this process using only the relevant features. Then, every irrelevant feature is added to the learned model to obtain an explanatory model for the original database which is our primary goal. A simple and, thus, efficient measure to assess the relevance of the features for the learning process is presented. Additionally, the form of this measure allows us to calculate a relevance threshold to automatically identify the relevant features. The experimental results reported for synthetic and real-world databases show the ability of our proposal to distinguish between relevant and irrelevant features and to accelerate learning, while still obtaining good explanatory models for the original database.

Journal ArticleDOI
TL;DR: This article presents a family of algorithms that combine nonlinear mapping techniques with neural networks, and make possible the scaling of very large data sets that are intractable with conventional methodologies.
Abstract: Multidimensional scaling (MDS) is a collection of statistical techniques that attempt to embed a set of patterns described by means of a dissimilarity matrix into a low-dimensional display plane in a way that preserves their original pairwise interrelationships as closely as possible. Unfortunately, current MDS algorithms are notoriously slow, and their use is limited to small data sets. In this article, we present a family of algorithms that combine nonlinear mapping techniques with neural networks, and make possible the scaling of very large data sets that are intractable with conventional methodologies. The method employs a nonlinear mapping algorithm to project a small random sample, and then "learns" the underlying transform using one or more multilayer perceptrons. The distinct advantage of this approach is that it captures the nonlinear mapping relationship in an explicit function, and allows the scaling of additional patterns as they become available, without the need to reconstruct the entire map. A novel encoding scheme is described, allowing this methodology to be used with a wide variety of input data representations and similarity functions. The potential of the algorithm is illustrated in the analysis of two combinatorial libraries and an ensemble of molecular conformations. The method is particularly useful for extracting low-dimensional Cartesian coordinate vectors from large binary spaces, such as those encountered in the analysis of large chemical data sets. © 2001 John Wiley & Sons, Inc. J Comput Chem 22: 488-500, 2001
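The overall scheme (embed a small random sample with a slow nonlinear mapping, then train a feed-forward network to reproduce that mapping) can be sketched as below; the use of scikit-learn's MDS and MLPRegressor, the sample size and the network architecture are stand-ins for the nonlinear mapping algorithm and multilayer perceptrons described in the article.

```python
# Hedged sketch: learn a nonlinear map from a small sample, apply it to everything.
import numpy as np
from sklearn.manifold import MDS
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((5000, 32))                         # large high-dimensional data set

# 1. Nonlinear map (here metric MDS) on a small random sample only.
sample = rng.choice(len(X), size=300, replace=False)
Y_sample = MDS(n_components=2, random_state=0).fit_transform(X[sample])

# 2. Learn the sample's input -> 2-D mapping with a multilayer perceptron.
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(X[sample], Y_sample)

# 3. Project the full data set (and any future patterns) with the learned map,
#    without re-running MDS.
Y_all = net.predict(X)
print(Y_all.shape)                                  # (5000, 2)
```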

Proceedings ArticleDOI
07 Oct 2001
TL;DR: Non-negative matrix factorization (NMF) is used for dimensionality reduction of the vector space model, where matrices decomposed by NMF only contain non-negative values, the original data are represented by only additive, not subtractive, combinations of the basis vectors.
Abstract: The vector space model (VSM) is a conventional information retrieval model, which represents a document collection by a term-by-document matrix. Since term-by-document matrices are usually high-dimensional and sparse, they are susceptible to noise and are also difficult to capture the underlying semantic structure. Additionally, the storage and processing of such matrices places great demands on computing resources. Dimensionality reduction is a way to overcome these problems. Principal component analysis (PCA) and singular value decomposition (SVD) are popular techniques for dimensionality reduction based on matrix decomposition, however they contain both positive and negative values in the decomposed matrices. In the work described here, we use non-negative matrix factorization (NMF) for dimensionality reduction of the vector space model. Since matrices decomposed by NMF only contain non-negative values, the original data are represented by only additive, not subtractive, combinations of the basis vectors. This characteristic of parts-based representation is appealing because it reflects the intuitive notion of combining parts to form a whole. Also NMF computation is based on the simple iterative algorithm, it is therefore advantageous for applications involving large matrices. Using the MEDLINE collection, we experimentally showed that NMF offers great improvement over the vector space model.