
Showing papers on "Dimensionality reduction published in 2001"


Journal ArticleDOI
TL;DR: This work introduces a new dimensionality reduction technique called Piecewise Aggregate Approximation (PAA) and compares it theoretically and empirically to the other techniques, demonstrating its superiority.
Abstract: The problem of similarity search in large time series databases has attracted much attention recently. It is a non-trivial problem because of the inherent high dimensionality of the data. The most promising solutions involve first performing dimensionality reduction on the data, and then indexing the reduced data with a spatial access method. Three major dimensionality reduction techniques have been proposed: Singular Value Decomposition (SVD), the Discrete Fourier transform (DFT), and more recently the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Piecewise Aggregate Approximation (PAA). We theoretically and empirically compare it to the other techniques and demonstrate its superiority. In addition to being competitive with or faster than the other methods, our approach has numerous other advantages. It is simple to understand and to implement, it allows more flexible distance measures, including weighted Euclidean queries, and the index can be built in linear time.
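The averaging step behind PAA is short enough to sketch. The following is a minimal illustration only, not the paper's indexing machinery; it assumes the series length is an exact multiple of the segment count, and `n_segments` is an arbitrary illustrative parameter.

```python
# Minimal sketch of Piecewise Aggregate Approximation (PAA), assuming a 1-D
# series whose length is an exact multiple of the number of segments.
import numpy as np

def paa(series, n_segments):
    """Reduce a time series to n_segments values by averaging equal-length frames."""
    x = np.asarray(series, dtype=float)
    frame = len(x) // n_segments              # assumes len(x) % n_segments == 0
    return x[: frame * n_segments].reshape(n_segments, frame).mean(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ts = np.cumsum(rng.standard_normal(256))  # toy random-walk series
    print(paa(ts, 8))                         # 8-dimensional representation
```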

1,550 citations


Proceedings ArticleDOI
26 Aug 2001
TL;DR: It is shown that projecting the data onto a random lower-dimensional subspace yields results comparable to conventional dimensionality reduction methods such as principal component analysis: the similarity of data vectors is preserved well under random projection.
Abstract: Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite nicely; however, empirical results are sparse. We present experimental results on using random projection as a dimensionality reduction tool in a number of cases, where the high dimensionality of the data would otherwise lead to burdensome computations. Our application areas are the processing of both noisy and noiseless images, and information retrieval in text documents. We show that projecting the data onto a random lower-dimensional subspace yields results comparable to conventional dimensionality reduction methods such as principal component analysis: the similarity of data vectors is preserved well under random projection. However, using random projections is computationally significantly less expensive than using, e.g., principal component analysis. We also show experimentally that using a sparse random matrix gives additional computational savings in random projection.
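A minimal sketch of the construction the abstract refers to, using an Achlioptas-style sparse random matrix with entries in {+√3, 0, −√3}; the data shapes and the target dimension are illustrative, not the authors' experimental setup.

```python
# Hedged sketch of random projection with a sparse random matrix.
import numpy as np

def sparse_random_projection(X, k, seed=0):
    """Project rows of X (n x d) onto a random k-dimensional subspace."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Entries are +1, 0, -1 with probabilities 1/6, 2/3, 1/6, scaled by sqrt(3).
    R = rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1/6, 2/3, 1/6]) * np.sqrt(3)
    return X @ R / np.sqrt(k)                 # scaling keeps distances roughly unchanged

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 1000))      # 100 vectors in 1000 dimensions
    Y = sparse_random_projection(X, 50)
    # Pairwise distances are approximately preserved:
    i, j = 3, 7
    print(np.linalg.norm(X[i] - X[j]), np.linalg.norm(Y[i] - Y[j]))
```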

1,470 citations


Proceedings ArticleDOI
01 May 2001
TL;DR: This work introduces a new dimensionality reduction technique called Adaptive Piecewise Constant Approximation (APCA), shows how APCA can be indexed using a multidimensional index structure, and proposes two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching.
Abstract: Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier transform (DFT), and the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Adaptive Piecewise Constant Approximation (APCA). While previous techniques (e.g., SVD, DFT and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant value segments of varying lengths such that their individual reconstruction errors are minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower bounding Euclidean distance approximation, and a non-lower bounding, but very tight, Euclidean distance approximation, and show how they can support fast exact searching, and even faster approximate searching on the same index structure. We theoretically and empirically compare APCA to all the other techniques and demonstrate its superiority.
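For intuition only, here is a naive greedy construction of an adaptive piecewise-constant approximation: start with one segment per point and repeatedly merge the adjacent pair whose merge adds the least squared reconstruction error. The paper describes a much faster construction together with the index structures and lower-bounding distances, none of which is reproduced here.

```python
# Illustrative greedy sketch of an adaptive piecewise-constant approximation.
import numpy as np

def apca_greedy(series, n_segments):
    """Approximate `series` by n_segments constant segments of varying length."""
    x = np.asarray(series, dtype=float)
    segs = [[v] for v in x]                   # one segment per point to start
    def cost(seg):
        a = np.asarray(seg)
        return float(((a - a.mean()) ** 2).sum())
    while len(segs) > n_segments:
        # Merge the adjacent pair that increases squared error the least.
        costs = [cost(segs[i] + segs[i + 1]) - cost(segs[i]) - cost(segs[i + 1])
                 for i in range(len(segs) - 1)]
        i = int(np.argmin(costs))
        segs[i:i + 2] = [segs[i] + segs[i + 1]]
    # Each segment is summarised by (mean value, right endpoint index).
    ends = np.cumsum([len(s) for s in segs]) - 1
    return [(float(np.mean(s)), int(e)) for s, e in zip(segs, ends)]

if __name__ == "__main__":
    ts = np.concatenate([np.zeros(20), np.full(5, 3.0), np.full(15, -1.0)])
    print(apca_greedy(ts, 3))                 # recovers the three constant regions
```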

849 citations


Proceedings Article
03 Jan 2001
TL;DR: This paper draws on ideas from the exponential family, generalized linear models, and Bregman distances to give a generalization of PCA to loss functions that are argued to be better suited to other data types.
Abstract: Principal component analysis (PCA) is a commonly applied technique for dimensionality reduction. PCA implicitly minimizes a squared loss function, which may be inappropriate for data that is not real-valued, such as binary-valued data. This paper draws on ideas from the exponential family, generalized linear models, and Bregman distances to give a generalization of PCA to loss functions that we argue are better suited to other data types. We describe algorithms for minimizing the loss functions, and give examples on simulated data.
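The generalization can be stated compactly: ordinary PCA fits a low-rank parameter matrix under squared loss, and the proposed family swaps the squared loss for an exponential-family negative log-likelihood (equivalently, a Bregman distance). The Bernoulli case below is one illustrative instance; the notation Θ = UVᵀ is ours, not necessarily the paper's.

```latex
% Notation is illustrative: \Theta = UV^\top is the rank-\ell parameter matrix.
% Ordinary PCA (squared loss):
\min_{U,V} \sum_{i,j} \bigl(x_{ij} - \theta_{ij}\bigr)^2 ,
\qquad \Theta = UV^\top .
% Bernoulli instance of the exponential-family generalization ("logistic PCA"),
% i.e. the negative Bernoulli log-likelihood with natural parameter \theta_{ij}:
\min_{U,V} \sum_{i,j} \Bigl[\log\bigl(1 + e^{\theta_{ij}}\bigr) - x_{ij}\,\theta_{ij}\Bigr] ,
\qquad \Theta = UV^\top .
```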

506 citations


Journal ArticleDOI
TL;DR: By extracting uncorrelated discriminant features, face recognition can be performed with higher accuracy on mosaic images of resolution lower than 16×16; it is suggested that the optimal face image resolution is the resolution m × n for which the dimensionality N = mn of the original image vector space is larger than, and close to, the number of known-face classes.

383 citations


01 Jan 2001
TL;DR: Locally linear embedding is described, an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data and attempts to discover nonlinear structure in high-dimensional data by exploiting the local symmetries of linear reconstructions.
Abstract: Many problems in information processing involve some form of dimensionality reduction. Here we describe locally linear embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data. LLE attempts to discover nonlinear structure in high dimensional data by exploiting the local symmetries of linear reconstructions. Notably, LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations, though capable of generating highly nonlinear embeddings, do not involve local minima. We illustrate the method on images of lips used in audiovisual speech synthesis.
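A compact numpy sketch of the three LLE steps (find neighbours, solve for reconstruction weights, take the bottom eigenvectors of (I−W)ᵀ(I−W)) is given below; the neighbour count, output dimension and regularization constant are illustrative choices, and the efficiency tricks of the authors' implementation are omitted.

```python
# Minimal numpy sketch of locally linear embedding (LLE).
import numpy as np

def lle(X, n_neighbors=10, d_out=2, reg=1e-3):
    n = X.shape[0]
    # 1. k nearest neighbours of each point (excluding the point itself).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    # 2. Reconstruction weights: each x_i as an affine combination of its neighbours.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                              # centre neighbours on x_i
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(n_neighbors)       # regularize for stability
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()                        # weights sum to one
    # 3. Embedding: bottom eigenvectors of M = (I - W)^T (I - W),
    #    skipping the constant eigenvector.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d_out + 1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.uniform(0, 3 * np.pi, 400)                     # toy curled-up curve
    X = np.c_[t * np.cos(t), rng.uniform(0, 5, 400), t * np.sin(t)]
    Y = lle(X, n_neighbors=12, d_out=2)
    print(Y.shape)                                          # (400, 2)
```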

259 citations


Proceedings ArticleDOI
01 Dec 2001
TL;DR: It is found that for regression the tensor-rank coding, as a dimensionality reduction technique, significantly outperforms other techniques like PCA.
Abstract: Given a collection of images (matrices) representing a "class" of objects, we present a method for extracting the commonalities of the image space directly from the matrix representations (rather than from the vectorized representation which one would normally do in a PCA approach, for example). The general idea is to consider the collection of matrices as a tensor and to look for an approximation of its tensor-rank. The tensor-rank approximation is designed such that the SVD decomposition emerges in the special case where all the input matrices are the repetition of a single matrix. We evaluate the coding technique both in terms of regression, i.e., the efficiency of the technique for functional approximation, and classification. We find that for regression the tensor-rank coding, as a dimensionality reduction technique, significantly outperforms other techniques like PCA. As for classification, the tensor-rank coding is at its best when the number of training examples is very small.

231 citations


Journal ArticleDOI
TL;DR: Similarity search in large time series databases has attracted much research interest recently, as discussed by the authors; however, it is a difficult problem because of the typically high dimensionality of the data, and the most promisi...
Abstract: Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data. The most promisi...

226 citations


Journal ArticleDOI
TL;DR: On the human signal detection task, the superiority of Kernel PCA feature extraction over linear PCA is reported and de-noising of the original data by the appropriate selection of various nonlinear principal components is demonstrated.
Abstract: In this paper, we propose the application of the Kernel Principal Component Analysis (PCA) technique for feature selection in a high-dimensional feature space, where input variables are mapped by a Gaussian kernel. The extracted features are employed in the regression problems of chaotic Mackey–Glass time-series prediction in a noisy environment and estimating human signal detection performance from brain event-related potentials elicited by task relevant signals. We compared results obtained using either Kernel PCA or linear PCA as data preprocessing steps. On the human signal detection task, we report the superiority of Kernel PCA feature extraction over linear PCA. Similar to linear PCA, we demonstrate de-noising of the original data by the appropriate selection of various nonlinear principal components. The theoretical relation and experimental comparison of Kernel Principal Components Regression, Kernel Ridge Regression and ε-insensitive Support Vector Regression is also provided.
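For reference, a bare-bones version of the feature-extraction step with a Gaussian kernel might look as follows; the kernel width and number of components are placeholders, not the settings used in the experiments.

```python
# Sketch of kernel PCA feature extraction with a Gaussian (RBF) kernel.
import numpy as np

def kernel_pca(X, n_components=5, gamma=0.1):
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))  # Gaussian kernel
    # Centre the kernel matrix in feature space.
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]        # largest eigenvalues first
    vals, vecs = vals[idx], vecs[:, idx]
    # Projections of the training points onto the nonlinear principal components.
    return Kc @ vecs / np.sqrt(np.maximum(vals, 1e-12))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    features = kernel_pca(X, n_components=5, gamma=0.05)
    print(features.shape)                               # (200, 5) -> input to a regressor
```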

216 citations


01 Jan 2001
TL;DR: This chapter contains sections titled: Introduction, Latent Variable Models andPCA, Probabilistic PCA, Mixtures of Probabilistically Principal Component Analyzers, Local Linear Dimensionality Reduction, Density Modeling, Conclusions.
Abstract: This chapter contains sections titled: Introduction, Latent Variable Models and PCA, Probabilistic PCA, Mixtures of Probabilistic Principal Component Analyzers, Local Linear Dimensionality Reduction, Density Modeling, Conclusions, Appendix A: Maximum Likelihood PCA, Appendix B: Optimal Least-Squares Reconstruction, Appendix C: EM for Mixtures of Probabilistic PCA, Acknowledgments, References

211 citations


Journal ArticleDOI
TL;DR: The key to this approach is to regard the signals as curves in the continuum and employ a functional data-analytic method for dimension reduction, based on the FDA technique for principal coordinates analysis, which has the advantage of providing a signal approximation that is best possible, in an L2 sense, for a given dimension.
Abstract: Motivated by specific problems involving radar-range profiles, we suggest techniques for real-time discrimination in the context of signal analysis. The key to our approach is to regard the signals as curves in the continuum and employ a functional data-analytic (FDA) method for dimension reduction, based on the FDA technique for principal coordinates analysis. This has the advantage, relative to competing methods such as canonical variates analysis, of providing a signal approximation that is best possible, in an L2 sense, for a given dimension. As a result, it produces particularly good discrimination. We explore the use of both nonparametric and Gaussian-based discriminators applied to the dimension-reduced data.

Journal ArticleDOI
TL;DR: In this paper, a permutation test is suggested as a means of determining dimension, and examples are given throughout the discussion, which can be viewed as pre-processors, aiding the analyst's understanding of the data and the choice of a final classifier.
Abstract: Summary This paper discusses visualization methods for discriminant analysis. It does not address numerical methods for classification per se, but rather focuses on graphical methods that can be viewed as pre-processors, aiding the analyst’s understanding of the data and the choice of a final classifier. The methods are adaptations of recent results in dimension reduction for regression, including sliced inverse regression and sliced average variance estimation. A permutation test is suggested as a means of determining dimension, and examples are given throughout the discussion.
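One of the regression-based dimension reduction methods adapted here, sliced inverse regression, is simple enough to sketch: standardize the predictors, slice on the response, and take the leading eigenvectors of the covariance of the slice means. The slice count and toy data below are illustrative only.

```python
# Compact sketch of sliced inverse regression (SIR).
import numpy as np

def sir_directions(X, y, n_slices=8, n_dirs=2):
    n, d = X.shape
    # Standardize X (whitening transform from the inverse covariance).
    mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    L = np.linalg.cholesky(np.linalg.inv(cov))
    Z = (X - mu) @ L
    # Slice on y and collect weighted outer products of the slice means.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    M = np.zeros((d, d))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M, mapped back to the original coordinates.
    vals, vecs = np.linalg.eigh(M)
    B = L @ vecs[:, ::-1][:, :n_dirs]
    return B          # columns span the estimated dimension-reduction subspace

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 6))
    y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(500)
    print(sir_directions(X, y, n_slices=10, n_dirs=2))
```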

Proceedings ArticleDOI
07 Jul 2001
TL;DR: This work addresses the feature selection problem with a three-step algorithm whose first step uses a variation of the well-known Relief algorithm to remove irrelevance; the combination is shown to be more effective than standard feature selection algorithms for large data sets with many irrelevant and redundant features.
Abstract: The number of features that can be computed over an image is, for practical purposes, limitless. Unfortunately, the number of features that can be computed and exploited by most computer vision systems is considerably smaller. As a result, it is important to develop techniques for selecting features from very large data sets that include many irrelevant or redundant features. This work addresses the feature selection problem by proposing a three-step algorithm. The first step uses a variation of the well-known Relief algorithm to remove irrelevance; the second step clusters features using K-means to remove redundancy; and the third step is a standard combinatorial feature selection algorithm. This three-step combination is shown to be more effective than standard feature selection algorithms for large data sets with lots of irrelevant and redundant features. It is also shown to be no worse than standard techniques for data sets that do not have these properties. Finally, we show a third experiment in which a data set with 4096 features is reduced to 5% of its original size with very little information loss.
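A hedged sketch of the first two steps, with a basic Relief-style relevance score and K-means over feature columns, is shown below; the relevance threshold, cluster count and scikit-learn usage are illustrative, and the final combinatorial step is left to any standard wrapper selector.

```python
# Sketch: Relief-style relevance scoring, then K-means over features for redundancy.
import numpy as np
from sklearn.cluster import KMeans

def relief_scores(X, y):
    """Binary-class Relief: reward features that separate nearest hit/miss."""
    n, d = X.shape
    w = np.zeros(d)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    for i in range(n):
        hit = np.argmin(np.where(y == y[i], D[i], np.inf))    # nearest same-class point
        miss = np.argmin(np.where(y != y[i], D[i], np.inf))   # nearest other-class point
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n

def select_features(X, y, n_clusters=10, relevance_quantile=0.5):
    # Step 1: drop irrelevant features (low Relief score).
    w = relief_scores(X, y)
    keep = np.where(w >= np.quantile(w, relevance_quantile))[0]
    # Step 2: cluster the remaining features (as column vectors) and keep the
    # most relevant feature from each cluster to remove redundancy.
    F = X[:, keep].T                                           # one row per feature
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(F)
    chosen = [keep[labels == c][np.argmax(w[keep][labels == c])]
              for c in range(n_clusters)]
    return sorted(int(f) for f in chosen)                      # step 3 (wrapper) omitted

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 200)
    informative = y[:, None] + 0.3 * rng.standard_normal((200, 3))
    X = np.hstack([informative, rng.standard_normal((200, 47))])  # 47 noise features
    print(select_features(X, y, n_clusters=5))                 # should favour columns 0-2
```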

Journal ArticleDOI
TL;DR: In this article, the authors proposed a method of effective dimension reduction for a multi-index model which is based on iterative improvement of the family of average derivative estimates, and showed that in the case when the effective dimension m of the index space does not exceed 3, this space can be estimated with the rate n^{-1/2} under mild assumptions on the model.
Abstract: We propose a new method of effective dimension reduction for a multi-index model which is based on iterative improvement of the family of average derivative estimates. The procedure is computationally straightforward and does not require any prior information about the structure of the underlying model. We show that in the case when the effective dimension m of the index space does not exceed 3, this space can be estimated with the rate n^{-1/2} under rather mild assumptions on the model.

Proceedings Article
03 Jan 2001
TL;DR: A variant of LLE that can simultaneously group the data and calculate local embedding of each group is studied, and an estimate for the upper bound on the intrinsic dimension of the data set is obtained automatically.
Abstract: Locally Linear Embedding (LLE) is an elegant nonlinear dimensionality-reduction technique recently introduced by Roweis and Saul [2]. It fails when the data is divided into separate groups. We study a variant of LLE that can simultaneously group the data and calculate local embedding of each group. An estimate for the upper bound on the intrinsic dimension of the data set is obtained automatically.

Journal ArticleDOI
TL;DR: By comparing several feature selection methods, this work demonstrates how phenotypic classes can be predicted by combining feature selection and discriminant analysis and shows that the right dimension reduction strategy is of crucial importance for the classification performance.
Abstract: Molecular portraits, such as mRNA expression or DNA methylation patterns, have been shown to be strongly correlated with phenotypical parameters. These molecular patterns can be revealed routinely on a genomic scale. However, class prediction based on these patterns is an under-determined problem, due to the extreme high dimensionality of the data compared to the usually small number of available samples. This makes a reduction of the data dimensionality necessary. Here we demonstrate how phenotypic classes can be predicted by combining feature selection and discriminant analysis. By comparing several feature selection methods we show that the right dimension reduction strategy is of crucial importance for the classification performance. The techniques are demonstrated by methylation pattern based discrimination between acute lymphoblastic leukemia and acute myeloid leukemia.

Journal ArticleDOI
TL;DR: A unified covariance model is introduced that implements the probabilistic principal surface (PPS), and it is shown in two different comparisons that the PPS outperforms the GTM under identical parameter settings.
Abstract: Principal curves and surfaces are nonlinear generalizations of principal components and subspaces, respectively. They can provide insightful summary of high-dimensional data not typically attainable by classical linear methods. Solutions to several problems, such as proof of existence and convergence, faced by the original principal curve formulation have been proposed in the past few years. Nevertheless, these solutions are not generally extensible to principal surfaces, the mere computation of which presents a formidable obstacle. Consequently, relatively few studies of principal surfaces are available. We previously (2000) proposed the probabilistic principal surface (PPS) to address a number of issues associated with current principal surface algorithms. PPS uses a manifold oriented covariance noise model, based on the generative topographical mapping (GTM), which can be viewed as a parametric formulation of Kohonen's self-organizing map. Building on the PPS, we introduce a unified covariance model that implements PPS (0 < α < 1) by varying the clamping parameter α. Then, we comprehensively evaluate the empirical performance of PPS, GTM, and the manifold-aligned GTM on three popular benchmark data sets. It is shown in two different comparisons that the PPS outperforms the GTM under identical parameter settings. Convergence of the PPS is found to be identical to that of the GTM and the computational overhead incurred by the PPS decreases to 40 percent or less for more complex manifolds. These results show that the generalized PPS provides a flexible and effective way of obtaining principal surfaces.

01 Jan 2001
TL;DR: It is shown how the "curse of dimensionality" and the "empty space phenomenon" can be taken into account in the design of neural network algorithms, and how non-linear dimension reduction techniques can be used to circumvent the problem.
Abstract: Observations from real-world problems are often high-dimensional vectors, i.e. made up of many variables. Learning methods, including artificial neural networks, often have difficulty handling a relatively small number of high-dimensional data points. In this paper, we show how concepts gained from our intuition on 2- and 3-dimensional data can be misleading when used in high-dimensional settings. We then show how the "curse of dimensionality" and the "empty space phenomenon" can be taken into account in the design of neural network algorithms, and how non-linear dimension reduction techniques can be used to circumvent the problem. We conclude with an illustrative example of this last method on the forecasting of financial time series.

Journal ArticleDOI
TL;DR: The numerical experiments show the ability of rough sets to select a reduced set of pattern features (minimizing the pattern size) while providing better generalization of neural-network texture classifiers.

Journal ArticleDOI
TL;DR: It is proved that the classical optimal discriminant vectors are equivalent to UODV, which can be used to extract (L−1) uncorrelated discriminant features for L-class problems without losing any discriminant information in the sense of the Fisher discriminant criterion function.

Proceedings ArticleDOI
Charu C. Aggarwal1
01 May 2001
TL;DR: An intuitive model of the effects of dimensionality reduction on arbitrary high dimensional problems is provided, and it is demonstrated that by making simple changes to the implementation details of dimensionality reduction techniques, one can considerably improve the quality of similarity search.
Abstract: The dimensionality curse has profound effects on the effectiveness of high-dimensional similarity indexing from the performance perspective. One of the well-known techniques for improving the indexing performance is the method of dimensionality reduction. In this technique, the data is transformed to a lower dimensional space by finding a new axis-system in which most of the data variance is preserved in a few dimensions. This reduction may also have a positive effect on the quality of similarity for certain data domains such as text. For other domains, it may lead to loss of information and degradation of search quality. Recent research indicates that the improvement for the text domain is caused by the reinforcement of the semantic concepts in the data. In this paper, we provide an intuitive model of the effects of dimensionality reduction on arbitrary high dimensional problems. We provide an effective diagnosis of the causality behind the qualitative effects of dimensionality reduction on a given data set. The analysis suggests that these effects are very data dependent. Our analysis also indicates that currently accepted techniques of picking the reduction which results in the least loss of information are useful for maximizing precision and recall, but are not necessarily optimum from a qualitative perspective. We demonstrate that by making simple changes to the implementation details of dimensionality reduction techniques, we can considerably improve the quality of similarity search.

Book ChapterDOI
02 Jul 2001
TL;DR: In this paper, the authors provide a systematic study of input decimation on synthetic data sets and analyze how the interaction between correlation and performance in base classifiers affects ensemble performance, and propose a method that decouples the classifiers by training them with different subsets of the input features.
Abstract: Using an ensemble of classifiers instead of a single classifier has been shown to improve generalization performance in many machine learning problems [4, 16]. However, the extent of such improvement depends greatly on the amount of correlation among the errors of the base classifiers [1,14]. As such, reducing those correlations while keeping the base classifiers' performance levels high is a promising research topic. In this paper, we describe input decimation, a method that decouples the base classifiers by training them with different subsets of the input features. In past work [15], we showed the theoretical benefits of input decimation and presented its application to a handful of real data sets. In this paper, we provide a systematic study of input decimation on synthetic data sets and analyze how the interaction between correlation and performance in base classifiers affects ensemble performance.

Journal ArticleDOI
TL;DR: In this paper, the problem of identifying subsets of variables that best approximate the full set of variables or their first few principal components is considered, thus stressing dimensionality reduction in terms of the original variables rather than the derived variables.
Abstract: Principal component analysis is widely used in the analysis of multivariate data in the agricultural, biological, and environmental sciences. The first few principal components (PCs) of a set of variables are derived variables with optimal properties in terms of approximating the original variables. This paper considers the problem of identifying subsets of variables that best approximate the full set of variables or their first few PCs, thus stressing dimensionality reduction in terms of the original variables rather than in terms of derived variables (PCs) whose definition requires all the original variables. Criteria for selecting variables are often ill defined and may produce inappropriate subsets. Indicators of the performance of different subsets of the variables are discussed and two criteria are defined. These criteria are used in stepwise selection-type algorithms to choose good subsets. Examples are given that show, among other things, that the selection of variable subsets should not be based only on the PC loadings of the variables.

Proceedings ArticleDOI
09 Jul 2001
TL;DR: This paper shows the interest of ICA as a tool for unsupervised analysis of hyperspectral images; by using higher order statistics it leads to independent components, a stronger statistical assumption that reveals interesting features in the usually non-Gaussian hyperspectral data sets.
Abstract: Independent component analysis (ICA) is a multivariate data analysis process that has been widely studied in recent years in the signal processing community for blind source separation. This paper shows the interest of ICA as a tool for unsupervised analysis of hyperspectral images. The commonly used principal component analysis (PCA) is the mean-square-optimal projection for Gaussian data, leading to uncorrelated components by using second order statistics. ICA instead uses higher order statistics and leads to independent components, a stronger statistical assumption that reveals interesting features in the usually non-Gaussian hyperspectral data sets.
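In practice the analysis the abstract describes amounts to unfolding the image cube into a pixel-by-band matrix and extracting components; a minimal sketch using scikit-learn's FastICA follows, with the cube shape and component count invented purely for illustration.

```python
# Minimal sketch: PCA vs. ICA on an unfolded hyperspectral cube.
import numpy as np
from sklearn.decomposition import PCA, FastICA

# Toy cube: rows x cols pixels, each with `bands` spectral measurements.
rows, cols, bands = 64, 64, 100
rng = np.random.default_rng(0)
cube = rng.random((rows, cols, bands))

X = cube.reshape(-1, bands)                  # one spectrum per pixel

# PCA: uncorrelated components (second-order statistics only).
pcs = PCA(n_components=5).fit_transform(X)

# ICA: statistically independent components (higher-order statistics).
ics = FastICA(n_components=5, random_state=0).fit_transform(X)

# Each column can be reshaped back into a component "image" for inspection.
component_images = ics.T.reshape(5, rows, cols)
print(pcs.shape, ics.shape, component_images.shape)
```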

Journal ArticleDOI
TL;DR: MKL is shown to outperform KL when the data distribution is far from a multidimensional Gaussian and to better cope with large sets of patterns, which could cause a severe performance drop in KL.
Abstract: This work introduces the multispace Karhunen-Loeve (MKL) as a new approach to unsupervised dimensionality reduction for pattern representation and classification. The training set is automatically partitioned into disjoint subsets, according to an optimality criterion; each subset then determines a different KL subspace which is specialized in representing a particular group of patterns. The extension of the classical KL operators and the definition of ad hoc distances allow MKL to be effectively used where KL is commonly employed. The limits of the standard KL transform are pointed out, in particular, MKL is shown to outperform KL when the data distribution is far from a multidimensional Gaussian and to better cope with large sets of patterns, which could cause a severe performance drop in KL.

Journal ArticleDOI
TL;DR: In this paper, feature space theory is introduced as a mathematical foundation for feature related concepts and techniques in data mining.
Abstract: In data mining, an important task in classification and prediction includes feature construction, feature description, feature selection, feature relevance analysis and feature reduction. In this paper, feature space theory is introduced as a mathematical foundation for feature related concepts and techniques in data mining.

Proceedings ArticleDOI
01 Oct 2001
TL;DR: A novel method for extracting features for the class of images represented by the positive images provided by subjective RF is proposed, using Principal Component Analysis (PCA) to reduce both noise contained in the original image features and dimensionality of feature spaces.
Abstract: In the past few years, relevance feedback (RF) has been used as an effective solution for content-based image retrieval (CBIR). Although effective, the RF-CBIR framework does not address the issue of feature extraction for dimension reduction and noise reduction. In this paper, we propose a novel method for extracting features for the class of images represented by the positive images provided by subjective RF. Principal Component Analysis (PCA) is used to reduce both noise contained in the original image features and dimensionality of feature spaces. The method increases the retrieval speed and reduces the memory significantly without sacrificing the retrieval accuracy.

Journal ArticleDOI
TL;DR: A novel enhancement for unsupervised learning of conditional Gaussian networks that benefits from feature selection based on the assumption that in the absence of labels reflecting the cluster membership of each case of the database, those features that exhibit low correlation with the rest of the features can be considered irrelevant for the learning process.
Abstract: This paper introduces a novel enhancement for unsupervised learning of conditional Gaussian networks that benefits from feature selection. Our proposal is based on the assumption that, in the absence of labels reflecting the cluster membership of each case of the database, those features that exhibit low correlation with the rest of the features can be considered irrelevant for the learning process. Thus, we suggest performing this process using only the relevant features. Then, every irrelevant feature is added to the learned model to obtain an explanatory model for the original database which is our primary goal. A simple and, thus, efficient measure to assess the relevance of the features for the learning process is presented. Additionally, the form of this measure allows us to calculate a relevance threshold to automatically identify the relevant features. The experimental results reported for synthetic and real-world databases show the ability of our proposal to distinguish between relevant and irrelevant features and to accelerate learning, while still obtaining good explanatory models for the original database.

Journal ArticleDOI
TL;DR: This article presents a family of algorithms that combine nonlinear mapping techniques with neural networks, and make possible the scaling of very large data sets that are intractable with conventional methodologies.
Abstract: Multidimensional scaling (MDS) is a collection of statistical techniques that attempt to embed a set of patterns described by means of a dissimilarity matrix into a low-dimensional display plane in a way that preserves their original pairwise interrelationships as closely as possible. Unfortunately, current MDS algorithms are notoriously slow, and their use is limited to small data sets. In this article, we present a family of algorithms that combine nonlinear mapping techniques with neural networks, and make possible the scaling of very large data sets that are intractable with conventional methodologies. The method employs a nonlinear mapping algorithm to project a small random sample, and then "learns" the underlying transform using one or more multilayer perceptrons. The distinct advantage of this approach is that it captures the nonlinear mapping relationship in an explicit function, and allows the scaling of additional patterns as they become available, without the need to reconstruct the entire map. A novel encoding scheme is described, allowing this methodology to be used with a wide variety of input data representations and similarity functions. The potential of the algorithm is illustrated in the analysis of two combinatorial libraries and an ensemble of molecular conformations. The method is particularly useful for extracting low-dimensional Cartesian coordinate vectors from large binary spaces, such as those encountered in the analysis of large chemical data sets. © 2001 John Wiley & Sons, Inc. J Comput Chem 22: 488-500, 2001
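The overall scheme (embed a small random sample with a slow nonlinear mapping, then train a feed-forward network to reproduce that mapping) can be sketched as below; the use of scikit-learn's MDS and MLPRegressor, the sample size and the network architecture are stand-ins for the nonlinear mapping algorithm and multilayer perceptrons described in the article.

```python
# Hedged sketch: learn a nonlinear map from a small sample, apply it to everything.
import numpy as np
from sklearn.manifold import MDS
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((5000, 32))                         # large high-dimensional data set

# 1. Nonlinear map (here metric MDS) on a small random sample only.
sample = rng.choice(len(X), size=300, replace=False)
Y_sample = MDS(n_components=2, random_state=0).fit_transform(X[sample])

# 2. Learn the sample's input -> 2-D mapping with a multilayer perceptron.
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(X[sample], Y_sample)

# 3. Project the full data set (and any future patterns) with the learned map,
#    without re-running MDS.
Y_all = net.predict(X)
print(Y_all.shape)                                  # (5000, 2)
```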

Proceedings ArticleDOI
07 Oct 2001
TL;DR: Non-negative matrix factorization (NMF) is used for dimensionality reduction of the vector space model, where matrices decomposed by NMF only contain non-negative values, the original data are represented by only additive, not subtractive, combinations of the basis vectors.
Abstract: The vector space model (VSM) is a conventional information retrieval model, which represents a document collection by a term-by-document matrix. Since term-by-document matrices are usually high-dimensional and sparse, they are susceptible to noise and are also difficult to capture the underlying semantic structure. Additionally, the storage and processing of such matrices places great demands on computing resources. Dimensionality reduction is a way to overcome these problems. Principal component analysis (PCA) and singular value decomposition (SVD) are popular techniques for dimensionality reduction based on matrix decomposition, however they contain both positive and negative values in the decomposed matrices. In the work described here, we use non-negative matrix factorization (NMF) for dimensionality reduction of the vector space model. Since matrices decomposed by NMF only contain non-negative values, the original data are represented by only additive, not subtractive, combinations of the basis vectors. This characteristic of parts-based representation is appealing because it reflects the intuitive notion of combining parts to form a whole. Also NMF computation is based on the simple iterative algorithm, it is therefore advantageous for applications involving large matrices. Using the MEDLINE collection, we experimentally showed that NMF offers great improvement over the vector space model.