
Showing papers on "Dimensionality reduction published in 1993"


Journal ArticleDOI
TL;DR: In this article, the curse of dimensionality and dimension reduction is discussed in the context of multivariate data representation and geometrical properties of multi-dimensional data, including Histograms and Kernel Density Estimators.
Abstract: Representation and Geometry of Multivariate Data. Nonparametric Estimation Criteria. Histograms: Theory and Practice. Frequency Polygons. Averaged Shifted Histograms. Kernel Density Estimators. The Curse of Dimensionality and Dimension Reduction. Nonparametric Regression and Additive Models. Special Topics. Appendices. Indexes.

3,007 citations


Proceedings Article
01 Jun 1993
TL;DR: This paper summarizes work on an approach that combines feature selection and data classification using Genetic Algorithms combined with a K-nearest neighbor algorithm to optimize classification by searching for an optimal feature weighting, essentially warping the feature space to coalesce individuals within groups and to separate groups from one another.
Abstract: This paper summarizes work on an approach that combines feature selection and data classification using Genetic Algorithms. First, it describes our use of Genetic Algorithms combined with a K-nearest neighbor algorithm to optimize classification by searching for an optimal feature weighting, essentially warping the feature space to coalesce individuals within groups and to separate groups from one another. This approach has proven especially useful with large data sets where standard feature selection techniques are computationally expensive. Second, it describes our implementation of the approach in a parallel processing environment, giving nearly linear speed-up in processing time. Third, it summarizes our present results in using the technique to discover the relative importance of features in large biological test sets. Finally, it indicates areas for future research. 1 The Problem We live in the age of information where data is plentiful, to the extent that we are typically unable to process all of it usefully. Computer science has been challenged to discover approaches that can sort through the mountains of data available and discover the essential features needed to answer a specific question. These approaches must be able to process large quantities of data, in reasonable time and in the presence of "noisy" data, i.e., irrelevant or erroneous data. Consider a typical example in biology. Researchers in the Center for Microbial Ecology (CME) have selected soil samples from three environments found in agriculture. The environments were: near the roots of a crop (rhizosphere), away from the influence of the crop roots (non-rhizosphere), and from a fallow field (crop residue). The CME researchers wished to investigate whether samples from those three environments could be distinguished. In particular, they wanted to see if diversity decreased in the rhizosphere as a result of the symbiotic relationship between the roots and its near-neighbor microbes, and if so in what ways. Their first experiments used the Biolog test as the discriminator. Biolog consists of a plate of 96 wells, with a different substrate in each well. These substrates (various sugars, amino acids and other nutrients) are assimilated by some microbes and not by others. If the microbial sample processes the substrate in the well, that well changes color, which can be recorded photometrically. Thus large numbers of samples can be processed and characterized based on the substrates they can assimilate. The CME researchers applied the Biolog test to 3 sets of 100 samples …
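As an illustration of the approach described above (not the authors' implementation), the sketch below evolves a per-feature weight vector with a simple genetic algorithm and scores each candidate by the leave-one-out accuracy of a k-nearest-neighbor classifier in the warped feature space; the GA settings and toy data are assumptions.

```python
import numpy as np

def knn_accuracy(X, y, weights, k=3):
    """Leave-one-out accuracy of a k-NN classifier in the weighted feature space."""
    Xw = X * weights                              # warp the feature space
    d = np.linalg.norm(Xw[:, None, :] - Xw[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                   # exclude each point from its own neighbors
    idx = np.argsort(d, axis=1)[:, :k]
    pred = np.array([np.bincount(v).argmax() for v in y[idx]])
    return (pred == y).mean()

def ga_feature_weighting(X, y, pop=30, gens=40, rng=np.random.default_rng(0)):
    """Evolve a real-valued weight vector that maximizes k-NN accuracy."""
    n_feat = X.shape[1]
    P = rng.random((pop, n_feat))
    for _ in range(gens):
        fitness = np.array([knn_accuracy(X, y, w) for w in P])
        parents = P[np.argsort(fitness)[-pop // 2:]]           # truncation selection
        children = []
        while len(children) < pop - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_feat)                       # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child += rng.normal(0, 0.1, n_feat) * (rng.random(n_feat) < 0.2)  # mutation
            children.append(np.clip(child, 0, None))
        P = np.vstack([parents, np.array(children)])
    fitness = np.array([knn_accuracy(X, y, w) for w in P])
    return P[fitness.argmax()]

# toy usage: two informative features, three noise features
rng = np.random.default_rng(1)
X = np.hstack([rng.normal(0, 1, (60, 2)) + np.repeat([[0, 0], [3, 3], [0, 3]], 20, 0),
               rng.normal(0, 1, (60, 3))])
y = np.repeat([0, 1, 2], 20)
print(ga_feature_weighting(X, y))   # informative features receive the larger weights
```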

297 citations


Proceedings Article
01 Jun 1993
TL;DR: This work applies Genetic Programming to the development of a processing tree for the classification of features extracted from images: measurements from a set of input nodes are weighted and combined through linear and nonlinear operations to form an output response.
Abstract: We apply Genetic Programming (GP) to the development of a processing tree for the classification of features extracted from images: measurements from a set of input nodes are weighted and combined through linear and nonlinear operations to form an output response. No constraints are placed upon size, shape, or order of processing within the network. This network is used to classify feature vectors extracted from IR imagery into target/nontarget categories using a database of 2000 training samples. Performance is tested against a separate database of 7000 samples. This represents a significant scaling up from the problems to which GP has been applied to date. Two experiments are performed: in the first set, we input classical "statistical" image features and minimize misclassification of target and non-target samples. In the second set of experiments, GP is allowed to form its own feature set from primitive intensity measurements. For purposes of comparison, the same training and test sets are used to train two other adaptive classifier systems, the binary tree classifier and the Backpropagation neural network. The GP network achieves higher performance with reduced computational requirements. The contributions of GP "schemata," or subtrees, to the performance of generated trees are examined.
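A minimal sketch of the processing-tree idea: a tree of linear and nonlinear operations over input measurements is evaluated to produce an output response, which is thresholded into target/non-target decisions. The tree representation, operator set, and toy data are illustrative assumptions, not the paper's system.

```python
import numpy as np

# A processing tree is a nested tuple: ('op', left, right), a feature index, or a constant.
OPS = {'+': np.add, '-': np.subtract, '*': np.multiply, 'max': np.maximum}

def evaluate(tree, X):
    """Apply a processing tree to every row of X, returning one response per sample."""
    if isinstance(tree, int):                      # terminal: feature index
        return X[:, tree]
    if isinstance(tree, float):                    # terminal: constant
        return np.full(X.shape[0], tree)
    op, *args = tree
    if op == 'tanh':                               # unary nonlinearity
        return np.tanh(evaluate(args[0], X))
    return OPS[op](evaluate(args[0], X), evaluate(args[1], X))

def misclassification(tree, X, y):
    """Threshold the tree's output at 0 to get target / non-target decisions."""
    pred = (evaluate(tree, X) > 0).astype(int)
    return (pred != y).mean()

# toy usage: the tree tanh((x0 * x1) - 0.5) as a target/non-target discriminant
tree = ('tanh', ('-', ('*', 0, 1), 0.5))
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] * X[:, 1] > 0.5).astype(int)
print("error rate:", misclassification(tree, X, y))
```

A full GP system would add random tree generation, crossover of subtrees, and selection on this misclassification fitness; only the evaluation step is shown here.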

263 citations


Proceedings ArticleDOI
08 Nov 1993
TL;DR: Results are presented which suggest that genetic algorithms can be used to increase the robustness of feature selection algorithms without a significant decrease in computational efficiency.
Abstract: Selecting a set of features which is optimal for a given task is a problem which plays an important role in a wide variety of contexts including pattern recognition, adaptive control and machine learning. Experience with traditional feature selection algorithms in the domain of machine learning leads to an appreciation for their computational efficiency and a concern for their brittleness. The authors describe an alternative approach to feature selection which uses genetic algorithms as the primary search component. Results are presented which suggest that genetic algorithms can be used to increase the robustness of feature selection algorithms without a significant decrease in computational efficiency.

178 citations


Proceedings Article
29 Nov 1993
TL;DR: Experiments with speech and image data indicate that the local linear algorithm produces encodings with lower distortion than those built by five layer auto-associative networks.
Abstract: We present a fast algorithm for non-linear dimension reduction. The algorithm builds a local linear model of the data by merging PCA with clustering based on a new distortion measure. Experiments with speech and image data indicate that the local linear algorithm produces encodings with lower distortion than those built by five layer auto-associative networks. The local linear algorithm is also more than an order of magnitude faster to train.
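A rough sketch of a local linear model in this spirit: cluster the data with plain k-means, fit a separate PCA in each cluster, and report the mean squared reconstruction distortion of the resulting encodings. The paper couples clustering to a reconstruction-based distortion measure, which this simplified version does not.

```python
import numpy as np

def local_linear_encode(X, n_clusters=8, n_components=2, iters=20, rng=np.random.default_rng(0)):
    """Cluster the data, then fit a local PCA in each cluster and measure the
    reconstruction distortion of the low-dimensional encodings."""
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(iters):                                    # plain k-means
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(0)
    models, distortion = [], 0.0
    for k in range(n_clusters):
        Xk = X[labels == k]
        mu = Xk.mean(0)
        _, _, Vt = np.linalg.svd(Xk - mu, full_matrices=False)
        basis = Vt[:n_components]                             # local principal directions
        codes = (Xk - mu) @ basis.T                           # low-dimensional encoding
        recon = codes @ basis + mu
        distortion += ((Xk - recon) ** 2).sum()
        models.append((mu, basis))
    return models, distortion / len(X)

# toy usage: a noisy 1-D curve embedded in 3-D
rng = np.random.default_rng(1)
t = rng.uniform(0, 4 * np.pi, 1000)
X = np.c_[np.cos(t), np.sin(t), 0.3 * t] + 0.05 * rng.normal(size=(1000, 3))
_, d = local_linear_encode(X, n_components=1)
print("mean squared reconstruction distortion:", d)
```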

115 citations


Journal ArticleDOI
TL;DR: A transformation matrix is derived that makes it possible to theoretically attain the full-dimension Cramer-Rao bound in the reduced space as well; the problem of estimating the parameters of sinusoidal signals from noisy data is addressed as a direct application of the results derived herein.

81 citations


Journal ArticleDOI
TL;DR: The dimension of the frame feature vectors, and hence the number of model parameters, were greatly reduced without a significant loss of recognition performance.

41 citations


Proceedings ArticleDOI
20 Oct 1993
TL;DR: In this paper, the Greedy (2, 1) feature selection algorithm has been shown to be a practical means of selecting bands for hyperspectral image data and has theoretical advantages over the Forward Sequential algorithm.
Abstract: SUMMARY AND CONCLUSIONS: Band selection has been shown here and elsewhere to be a practical method of data reduction for hyperspectral image data. Moreover, band selection has a number of advantages over linear band combining for reducing the dimensionality of high-dimensional data. Band selection eliminates the requirement that all bands be measured before data dimensionality is reduced. Bands that are uninformative about pixel classification need not be measured or communicated. Band sets can be tailored to specific classification goals (classes, error rates, etc.). Band selection reduces data link requirements, yet retains a tunable capability to collect as many bands as required for a specific application. Feature selection algorithms developed for statistical pattern classifier design can be used to perform band selection. The Greedy (2, 1) feature selection algorithm has been shown to be a practical means of selecting bands. In addition this algorithm has theoretical advantages over the Forward Sequential algorithm, making it the method of choice for hyperspectral applications.
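A generic "plus l, take away r" selection loop (here l = 2, r = 1) with a simple Fisher-style separability score gives the flavor of greedy band selection; the scoring function, toy data, and stopping rule are assumptions rather than the paper's exact Greedy (2, 1) procedure.

```python
import numpy as np

def plus_l_take_r(score, n_bands, target, l=2, r=1):
    """Sequential selection: repeatedly add the l best bands, then drop the r bands
    whose removal hurts the score the least, until `target` bands remain.
    `score(subset)` returns a class-separability measure (larger is better)."""
    selected = []
    while len(selected) < target:
        for _ in range(l):                                    # forward steps
            remaining = [b for b in range(n_bands) if b not in selected]
            best = max(remaining, key=lambda b: score(selected + [b]))
            selected.append(best)
        for _ in range(r):                                    # backward step
            if len(selected) > 1:
                drop = max(selected, key=lambda b: score([s for s in selected if s != b]))
                selected.remove(drop)
    return selected[:target]

def fisher_score(X, y):
    """Between-class scatter over within-class scatter on the chosen bands."""
    def score(bands):
        Xs = X[:, bands]
        overall = Xs.mean(0)
        sb = sw = 0.0
        for c in np.unique(y):
            Xc = Xs[y == c]
            sb += len(Xc) * ((Xc.mean(0) - overall) ** 2).sum()
            sw += ((Xc - Xc.mean(0)) ** 2).sum()
        return sb / (sw + 1e-12)
    return score

# toy usage: 10 bands, only bands 2 and 7 carry class information
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 10))
X[:, 2] += 2 * y
X[:, 7] -= 2 * y
print(plus_l_take_r(fisher_score(X, y), n_bands=10, target=2))
```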

38 citations


Journal ArticleDOI
A. Saha, Chuan-Lin Wu, Dun-Sung Tang
TL;DR: The authors derive some key properties of RBF networks that provide suitable grounds for implementing efficient search strategies for nonconvex optimization within the same framework.
Abstract: This paper concerns neural network approaches to function approximation and optimization using linear superposition of Gaussians (or what are popularly known as radial basis function (RBF) networks). The problem of function approximation is one of estimating an underlying function f, given samples of the form (y_i, x_i), i = 1, 2, ..., n, with y_i = f(x_i). When the dimension of the input is high and the number of samples small, estimation of the function becomes difficult due to the sparsity of samples in local regions. The authors find that this problem of high dimensionality can be overcome to some extent by using linear transformations of the input in the Gaussian kernels. Such transformations induce intrinsic dimension reduction, and can be exploited for identifying key factors of the input and for the phase space reconstruction of dynamical systems, without explicitly computing the dimension and delay. They present a generalization that uses multiple linear projections onto scalars and successive RBF networks (MLPRBF) that estimate the function based on these scalar values. They derive some key properties of RBF networks that provide suitable grounds for implementing efficient search strategies for nonconvex optimization within the same framework.
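A minimal sketch of placing a linear transformation inside the Gaussian kernels: the RBF distances are computed on a projection z = Px of the input, and the output weights are fitted by ridge-regularized least squares. The projection here is assumed known; the paper is concerned with learning such transformations.

```python
import numpy as np

def rbf_projected_design(X, centers, P, width=1.0):
    """Gaussian RBF features computed on a linear projection z = P x of the input,
    so the kernels measure distance in a reduced space rather than the full input space."""
    Z = X @ P.T                          # project inputs to a lower-dimensional space
    C = centers @ P.T                    # project the centers the same way
    d2 = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

def fit_rbf(X, y, centers, P, width=1.0, ridge=1e-6):
    """Least-squares fit of the output weights for fixed centers and projection."""
    Phi = rbf_projected_design(X, centers, P, width)
    return np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]), Phi.T @ y)

# toy usage: f depends only on a 1-D projection of a 5-D input
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (300, 5))
true_dir = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
y = np.sin(3 * X @ true_dir)
P = true_dir[None, :]                                  # projection assumed known here
centers = X[rng.choice(len(X), 20, replace=False)]
w = fit_rbf(X, y, centers, P, width=0.3)
pred = rbf_projected_design(X, centers, P, 0.3) @ w
print("training RMSE:", np.sqrt(((pred - y) ** 2).mean()))
```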

33 citations


Journal ArticleDOI
TL;DR: A methodology that works on the notion of reduced dimensionality in the choice of a small set of site-relevant variables is presented, and it is contended that this methodology could incorporate model simplicity and site specificity into current estimation models.

33 citations


Journal ArticleDOI
TL;DR: The Kohonen map is introduced, which orders its neurons according to topological features of the data it is trained on; it can therefore be called a topology-preserving feature map and can be used to solve general visualization problems by mapping data into a lower-dimensional representation.
Abstract: This paper describes the application of self-organizing neural networks to the analysis and visualization of multidimensional data sets. First, a mathematical description of cluster analysis, dimensionality reduction, and topological ordering is given, treating these methods as problems of discrete optimization. Then the Kohonen map is introduced, which orders its neurons according to topological features of the data sets it is trained with. For this reason, it can also be called a topology-preserving feature map.
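A compact sketch of the Kohonen map used as a visualization tool: each training step pulls the best-matching unit and its grid neighbors toward a sample, and the final grid coordinates of each sample's best-matching unit serve as its low-dimensional representation. Grid size, learning-rate schedule, and data are illustrative assumptions.

```python
import numpy as np

def train_som(X, grid=(10, 10), iters=2000, lr0=0.5, sigma0=3.0, rng=np.random.default_rng(0)):
    """Train a 2-D Kohonen map: neighboring grid units end up representing nearby
    regions of the data (the topology-preserving property used for visualization)."""
    rows, cols = grid
    coords = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    W = X[rng.choice(len(X), rows * cols)]                 # init weights from data
    for t in range(iters):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(1))             # best-matching unit
        lr = lr0 * (1 - t / iters)                         # decaying learning rate
        sigma = sigma0 * (1 - t / iters) + 0.5             # shrinking neighborhood
        h = np.exp(-((coords - coords[bmu]) ** 2).sum(1) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)                     # neighborhood update
    return W, coords

# toy usage: map 3-D points onto the 2-D grid, then read off each point's grid position
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
W, coords = train_som(X)
positions = coords[np.argmin(((X[:, None] - W[None]) ** 2).sum(-1), axis=1)]
print(positions[:5])   # 2-D coordinates usable for plotting / visualization
```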

Journal ArticleDOI
TL;DR: The dimensionality of the feature vectors is reduced by applying the principles of the Karhunen-Loeve transform to the feature images both locally and globally, choosing the resulting basis vectors that are closest to the classical KL basis vectors.

Journal ArticleDOI
TL;DR: In this paper, the problem of determining the minimum dimension necessary for quadratic discrimination in normal populations with heterogeneous covariance matrices was considered and some asymptotic chi-squared tests were obtained.

Journal ArticleDOI
TL;DR: An approach to the automatic segmentation and tissue classification of anatomical objects from magnetic resonance imaging data sets using artificial neural networks is described; it is based on a two-pass method comprising unsupervised cluster analysis, dimensionality reduction, and visualization of the texture features by means of nonlinear topographic mappings.
Abstract: The following paper describes a new approach for the automatic segmentation and tissue classification of anatomical objects such as brain tumors from magnetic resonance imaging (MRI) data sets using artificial neural networks. These segmentations serve as an input for 3D–reconstruction algorithms. Since MR images require a careful interpretation of the underlying physics and parameters, we first give the reader a tutorial style introduction to the physical basics of MR technology. Secondly, we describe our approach that is based on a two–pass method including non–supervised cluster analysis, dimensionality reduction and visualization of the texture features by means of nonlinear topographic mappings. An additional classification of the MR data set can be obtained using a post–processing technique to approximate the Bayes decision boundaries. Interactions between the user and the network allow an optimization of the results. For fast 3D–reconstructions, we use a modified marching cubes algorithm but our scheme can easily serve as a preprocessor for any kind of volume renderer. The applications we present in our paper aim at the automatic extraction and fast reconstruction of brain tumors for surgery and therapy planning. We use the neural networks on pathological data sets and show how the method generalizes to physically comparable data sets.

Journal ArticleDOI
TL;DR: The reduced-rank growth curve (RRGC) model presented in this paper extends the reduced-rank regression model; it estimates discriminant functions via maximum likelihood and gives a procedure for determining dimensionality.
Abstract: Summary: This paper presents a method of discriminant analysis especially suited to longitudinal data. The approach is in the spirit of canonical variate analysis (CVA) and is similarly intended to reduce the dimensionality of multivariate data while retaining information about group differences. A drawback of CVA is that it does not take advantage of special structures that may be anticipated in certain types of data. For longitudinal data, it is often appropriate to specify a growth curve structure (as given, for example, in the model of Potthoff & Roy, 1964). The present paper focuses on this growth curve structure, utilizing it in a model-based approach to discriminant analysis. For this purpose the paper presents an extension of the reduced-rank regression model, referred to as the reduced-rank growth curve (RRGC) model. It estimates discriminant functions via maximum likelihood and gives a procedure for determining dimensionality. This methodology is exploratory only, and is illustrated by a well-known dataset from Grizzle & Allen (1969).

Proceedings ArticleDOI
25 Oct 1993
TL;DR: In this article, a modification of Kohonen's self-organizing feature map algorithm that extracts vectors in q-space from data in p-space is given, and three methods are empirically compared.
Abstract: We discuss topological preservation under feature extraction transformations. Transformations that preserve the order of all distances in any neighborhood of vectors in p-space are defined as metric topology preserving (MTP) transformations. We give a necessary and sufficient condition for this property in terms of Spearman's rank correlation coefficient. A modification of Kohonen's self-organizing feature map algorithm that extracts vectors in q-space from data in p-space is given. Three methods are empirically compared: principal components analysis; Sammon's algorithm; and our extension of the self-organizing feature map algorithm. Our MTP index shows that the first two methods preserve distance ranks on six data sets much more effectively than extended SOFM.
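A simplified, global version of a metric topology preservation index: the Spearman rank correlation between all pairwise distances before and after projection (the paper states its condition per neighborhood of vectors). The PCA projections used for comparison are just an example.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation between two vectors (Pearson correlation of the ranks)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]

def mtp_index(X_high, X_low):
    """Rank correlation between pairwise distances in the original and reduced spaces:
    1.0 means all distance ranks are preserved by the mapping."""
    iu = np.triu_indices(len(X_high), k=1)
    d_high = np.linalg.norm(X_high[:, None] - X_high[None], axis=-1)[iu]
    d_low = np.linalg.norm(X_low[:, None] - X_low[None], axis=-1)[iu]
    return spearman(d_high, d_low)

# toy usage: compare PCA projections to 2 and to 1 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ np.diag([3, 2, 1, 0.3, 0.1])
Xc = X - X.mean(0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
print("2-D:", mtp_index(Xc, Xc @ Vt[:2].T))
print("1-D:", mtp_index(Xc, Xc @ Vt[:1].T))
```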

Journal ArticleDOI
01 Sep 1993
TL;DR: This correspondence is concerned with presenting a methodology for characterizing ultrasonic transducers using pattern recognition techniques, and it was found that the K-means was the most successful classification algorithm.
Abstract: This correspondence is concerned with presenting a methodology for characterizing ultrasonic transducers using pattern recognition techniques. An apparatus was developed to collect focus, frequency spectrum, impulse response, and diameter parameters. Six different pattern recognition techniques were applied to classify 83 different transducers. These include: K-means, minimum distance, perceptron, potential function, cosine measure, and Bayes' classifier. Moreover, two dimensionality reduction techniques, the K-L transform and the Fisher multiple discriminant, were applied to reduce the feature space. It was found that the K-means was the most successful classification algorithm. Very close behind in performance was the combination of the K-L transform with the minimum distance classifier; the latter reduced the feature space by 44 percent with only 6 percent more misclassification error than K-means. The potential function scheme did not converge to a solution even when allowed 60 times the running time required by the other algorithms. Results of the classification, dimensionality reduction, comparison and validation are presented to highlight the advantages and limitations of the investigated techniques.
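A small sketch of the K-L transform followed by a minimum-distance classifier, the second-best combination reported above; the reduced dimensionality, toy data, and evaluation on the training set are illustrative assumptions.

```python
import numpy as np

def kl_transform(X, n_components):
    """Karhunen-Loeve (principal component) transform fitted on training data."""
    mu = X.mean(0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components]

def min_distance_classify(X_train, y_train, X_test, mu, W):
    """Minimum-distance classifier in the reduced space: assign each test vector
    to the class whose mean is nearest."""
    Ztr, Zte = (X_train - mu) @ W.T, (X_test - mu) @ W.T
    classes = np.unique(y_train)
    means = np.array([Ztr[y_train == c].mean(0) for c in classes])
    d = ((Zte[:, None] - means[None]) ** 2).sum(-1)
    return classes[d.argmin(1)]

# toy usage: 10-D features, reduced to 3 dimensions before classification
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 50)
X = rng.normal(size=(150, 10)) + np.repeat(rng.normal(0, 2, (3, 10)), 50, axis=0)
mu, W = kl_transform(X, 3)
pred = min_distance_classify(X, y, X, mu, W)
print("training error:", (pred != y).mean())
```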

Proceedings ArticleDOI
25 Oct 1993
TL;DR: An efficient speaker-normalization method based on the mapping of two self-organizing feature maps is developed and can broadly be applied as a front end to all kinds of VQ-based recognition systems.
Abstract: An efficient speaker-normalization method based on the mapping of two self-organizing feature maps is developed. The normalization system consists of a reference map trained on the reference speaker's feature space and a test speaker's map generated by a special topology-maintaining retraining of the reference map. The retraining procedure is called 'forced competitive learning' (FCL). It allows for a 1:1 exchange of the feature vectors represented by the neurons of the reference map for those of the test map in the operation phase. Pilot tests on a 33-word (including the 10 digits) database have been performed employing a simple HMM isolated-word recognizer. The evaluation was based on speaker-dependent recognition and has shown an average adaptation efficiency of ρ = 0.90. By using topology-preserving feature maps, the proposed method can broadly be applied as a front end to all kinds of VQ-based recognition systems.

Book ChapterDOI
02 Jan 1993
TL;DR: All in all, search algorithms constitute the motor which drives information retrieval.
Abstract: Search algorithms underpin astronomical databases, and may be called upon for the processing of (suitably coded) textual data. They may be required in conjunction with the use of dimensionality reduction approaches such as the factor space approach described in chapter 3, or latent semantic indexing (Deerwester et al., 1990). Efficient search algorithms can be the building blocks of data reorganization approaches using clustering (see section 4.8 below). All in all, search algorithms constitute the motor which drives information retrieval.

25 May 1993
TL;DR: Learning rules are proposed for a two-layered, laterally connected network whose weight vectors converge to the eigenvectors belonging to the M largest eigenvalues of the input covariance matrix; a detailed analysis of the rules' properties, their use for feature extraction in pattern recognition applications, and other learning architectures for principal component extraction are identified as directions for further research.
Abstract: In this paper, learning rules for a two-layered network consisting of N input units and M output units, with full connections between the two layers and full lateral connections between the output units, are proposed. The learning rules extract the principal components from a given input data set, i.e. the weight vectors of the network converge to the eigenvectors belonging to the M largest eigenvalues of the covariance matrix of the input. Simulation results are presented to illustrate the convergence behaviour of the network. Among the issues for further research are a detailed mathematical analysis of the properties of the learning rules, the use of the network for feature extraction in pattern recognition applications, and an investigation of other learning architectures for principal component extraction.
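For comparison, a standard online principal-component learning rule (Sanger's generalized Hebbian algorithm) is sketched below; it is a conventional stand-in rather than the paper's specific lateral-connection rules, and the learning rate and toy data are assumptions.

```python
import numpy as np

def sanger_pca(X, n_components=2, lr=0.01, epochs=50, rng=np.random.default_rng(0)):
    """Sanger's generalized Hebbian rule: an online network whose M weight vectors
    converge toward the eigenvectors of the M largest eigenvalues of the input
    covariance matrix."""
    n, d = X.shape
    W = rng.normal(0, 0.1, (n_components, d))
    Xc = X - X.mean(0)
    for _ in range(epochs):
        for x in Xc[rng.permutation(n)]:
            y = W @ x
            # dW_ij = lr * y_i * (x_j - sum_{k<=i} y_k W_kj)
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

# toy usage: compare against eigenvectors obtained from an SVD
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4)) @ np.diag([3.0, 2.0, 0.5, 0.1])
W = sanger_pca(X, n_components=2)
_, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
print(np.abs(W @ Vt[:2].T))   # close to the identity (up to sign) if converged
```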

Proceedings Article
29 Nov 1993
TL;DR: A number of learning rules that can be used to train unsupervised parallel feature extraction systems are described, and it is shown that one system learns the principal components of the correlation matrix.
Abstract: We describe a number of learning rules that can be used to train unsupervised parallel feature extraction systems. The learning rules are derived using gradient ascent of a quality function. We consider a number of quality functions that are rational functions of higher order moments of the extracted feature values. We show that one system learns the principal components of the correlation matrix. Principal component analysis systems are usually not optimal feature extractors for classification. Therefore we design quality functions which produce feature vectors that support unsupervised classification. The properties of the different systems are compared with the help of different artificially designed datasets and a database consisting of all Munsell color spectra.

25 Oct 1993
TL;DR: This work uses feature selection algorithms to determine which combination of bands has the lowest probability of pixel misclassification, and achieves dimensionality reduction by selecting bands.
Abstract: Hyperspectral images have many bands requiring significant computational power for machine interpretation. During image pre-processing, regions of interest that warrant full examination need to be identified quickly. One technique for speeding up the processing is to use only a small subset of bands to determine the 'interesting' regions. The problem addressed here is how to determine the fewest bands required to achieve a specified performance goal for pixel classification. The band selection problem has been addressed previously by Chen et al., Ghassemian et al., Henderson et al., and Kim et al. Some popular techniques for reducing the dimensionality of a feature space, such as principal components analysis, reduce dimensionality by computing new features that are linear combinations of the original features. However, such approaches require measuring and processing all the available bands before the dimensionality is reduced. Our approach, adapted from previous multidimensional signal analysis research, is simpler and achieves dimensionality reduction by selecting bands. Feature selection algorithms are used to determine which combination of bands has the lowest probability of pixel misclassification. Two elements required by this approach are a choice of objective function and a choice of search strategy.

Proceedings ArticleDOI
27 Apr 1993
TL;DR: A novel feature selection algorithm is presented which outperforms the well-known SFS and SBS algorithms for large-scale problems by choosing a subset of the original measurements that are closest to the space spanned by the extracted (transformed) features.
Abstract: A novel feature selection algorithm is presented which outperforms the well-known SFS (sequential forward selection) and SBS (sequential backward selection) algorithms for large-scale problems. The approach utilizes the solution to the similar problem of large-scale feature extraction by choosing a subset of the original measurements that are closest to the space spanned by the extracted (transformed) features. The authors develop a computationally efficient Frobenius subspace distance metric for the subspace comparisons, which reduces the complexity from order (N choose k) to order N^3 operations. Finally, sufficient conditions for optimality of the algorithm are presented that demonstrate the relationship between the feature extraction and the feature selection solutions.
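The selection criterion can be illustrated with a Frobenius distance between the projection matrices of two subspaces; the brute-force enumeration below is the naive (N choose k) search that the paper's algorithm avoids, so treat it as a demonstration of the metric only, on assumed toy data.

```python
import numpy as np
from itertools import combinations

def subspace_distance(A, B):
    """Frobenius distance between the subspaces spanned by the columns of A and B,
    measured via their orthogonal projection matrices."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return np.linalg.norm(Qa @ Qa.T - Qb @ Qb.T)

def select_features(X, k):
    """Pick the k original measurements whose span is closest to the span of the
    top-k principal directions (the extracted, transformed features)."""
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    target = Vt[:k].T
    best = min(combinations(range(X.shape[1]), k),
               key=lambda s: subspace_distance(np.eye(X.shape[1])[:, list(s)], target))
    return list(best)

# toy usage: 6 measurements, 2 of which dominate the variance
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6)) * np.array([0.2, 3.0, 0.2, 0.2, 2.5, 0.2])
print(select_features(X, 2))     # expected: measurements 1 and 4
```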

Proceedings ArticleDOI
16 Aug 1993
TL;DR: The speed of learning, the robustness of recognition, the automatic feature competition and automatic feature extraction, and other characteristics of this novel one-step learning system are discussed in detail.
Abstract: Supervised learning in a one-layered, hard-limited perceptron can be formulated into a set of linear inequalities containing the unknown weight coefficients. Solving these inequalities under a generalized separability condition by a noniterative method then achieves the goal of supervised learning. The speed of learning, the robustness of recognition, the automatic feature competition and automatic feature extraction, and other characteristics of this novel one-step learning system are then discussed in detail.
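One way to make the inequality formulation concrete (not necessarily the paper's noniterative method) is to solve the system y_i (w·x_i + b) > 0 by least squares against fixed-margin targets and then check the hard-limited decisions; the margin and toy data are assumptions.

```python
import numpy as np

def one_step_perceptron(X, y, margin=1.0):
    """Noniterative training sketch: the hard-limited perceptron must satisfy the
    linear inequalities y_i (w.x_i + b) > 0. Here they are solved by least squares
    against +/- margin targets (a pseudoinverse shortcut), then the hard limiter
    is applied to verify that every inequality holds."""
    A = np.hstack([X, np.ones((len(X), 1))])     # absorb the bias into the weights
    t = np.where(y == 1, margin, -margin)        # target side of each inequality
    wb, *_ = np.linalg.lstsq(A, t, rcond=None)
    w, b = wb[:-1], wb[-1]
    satisfied = np.all(np.sign(A @ wb) == np.sign(t))
    return w, b, satisfied

# toy usage: linearly separable data in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2, -2], 1, (50, 2)), rng.normal([2, 2], 1, (50, 2))])
y = np.repeat([0, 1], 50)
w, b, ok = one_step_perceptron(X, y)
print("all inequalities satisfied:", ok)
```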

ReportDOI
11 May 1993
TL;DR: This work addressed applications of the dimension reduction principle in adaptive beamforming, spectral estimation, and detection problems, and demonstrated that dimension reduction is most profitably used in applications where relatively short data records are available or fast response time is required.
Abstract: : The problem studied concerns reducing the dimension of data by mapping it through rectangular matrix transformations before application of signal processing algorithms. Our work addressed applications of this principle in adaptive beamforming, spectral estimation, and detection problems. While dimension reduction often leads to dramatic reductions in the computational burden of the signal processing algorithm, it can also introduce significant asymptotic performance losses if the transformation is not chosen properly. We choose dimension reducing transformations to optimize performance criteria associated with the problem of interest. Our results indicate that dramatic reductions in dimension can be achieved with relatively small asymptotic performance losses using these design procedures. Performance analyses demonstrate that dimension reduction is most profitably used in applications where relatively short data records are available or fast response time is required. In these cases dimension reduction actually improves performance. Partially adaptive beamforming, Minimum variance spectrum analysis, Adaptive detection, Beamspace processing, Dimension reduction.