
Showing papers on "Dimensionality reduction published in 1996"


Proceedings Article
03 Jul 1996
TL;DR: An efficient algorithm for feature selection which computes an approximation to the optimal feature selection criterion is given, showing that the algorithm effectively handles datasets with a very large number of features.
Abstract: In this paper, we examine a method for feature subset selection based on Information Theory. Initially, a framework for defining the theoretically optimal, but computationally intractable, method for feature subset selection is presented. We show that our goal should be to eliminate a feature if it gives us little or no additional information beyond that subsumed by the remaining features. In particular, this will be the case for both irrelevant and redundant features. We then give an efficient algorithm for feature selection which computes an approximation to the optimal feature selection criterion. The conditions under which the approximate algorithm is successful are examined. Empirical results are given on a number of data sets, showing that the algorithm effectively handles datasets with a very large number of features.

1,713 citations
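The paper's exact criterion is based on Markov blankets and conditional cross-entropy; as a rough, hedged illustration of the underlying idea (drop a feature whose class information is already subsumed by the remaining features), the sketch below removes a discrete feature whenever its conditional mutual information with the class, given some single kept feature, is near zero. The function names, the single-feature conditioning, and the threshold are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def mutual_info(x, y):
    """I(X;Y) in nats for discrete 1-D integer arrays."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(x == xv) * np.mean(y == yv)))
    return mi

def backward_eliminate(X, y, eps=1e-3):
    """Drop feature j whenever some kept feature k leaves it with (almost)
    no extra class information, i.e. I(X_j; y | X_k) is approximately 0."""
    keep = list(range(X.shape[1]))
    changed = True
    while changed:
        changed = False
        for j in list(keep):
            for k in keep:
                if k == j:
                    continue
                # encode the pair (X_j, X_k) as a single discrete symbol
                joint = X[:, j] * (X[:, k].max() + 1) + X[:, k]
                cond_mi = mutual_info(joint, y) - mutual_info(X[:, k], y)
                if cond_mi < eps:
                    keep.remove(j)
                    changed = True
                    break
    return keep
```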


21 May 1996
TL;DR: This work presents an exact Expectation-Maximization algorithm for determining the parameters of this mixture of factor analyzers which concurrently performs clustering and dimensionality reduction, and can be thought of as a reduced dimension mixture of Gaussians.
Abstract: Factor analysis, a statistical method for modeling the covariance structure of high dimensional data using a small number of latent variables, can be extended by allowing different local factor models in different regions of the input space. This results in a model which concurrently performs clustering and dimensionality reduction, and can be thought of as a reduced dimension mixture of Gaussians. We present an exact Expectation-Maximization algorithm for fitting the parameters of this mixture of factor analyzers.

705 citations
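The EM updates themselves are given in the paper; the snippet below is only a hedged sketch of the model being fitted, i.e. sampling from a mixture of factor analyzers (a reduced-dimension mixture of Gaussians) in which component j generates x = Lambda_j z + mu_j + eps with diagonal noise. All dimensions and parameter values are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, J, N = 10, 2, 3, 500              # observed dim, latent dim, components, samples
pis = rng.dirichlet(np.ones(J))         # mixing proportions
mus = rng.normal(size=(J, D))           # component means
Lambdas = rng.normal(size=(J, D, d))    # factor loading matrices, one per component
psi = 0.1 * np.ones(D)                  # diagonal noise variances (shared here)

comp = rng.choice(J, size=N, p=pis)     # pick a local factor model per sample
Z = rng.normal(size=(N, d))             # latent factors
noise = rng.normal(scale=np.sqrt(psi), size=(N, D))
X = (Lambdas[comp] @ Z[:, :, None]).squeeze(-1) + mus[comp] + noise
```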


Journal ArticleDOI
TL;DR: A methodological framework is developed and algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture are developed.
Abstract: We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.

527 citations
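As a toy illustration (not the paper's Tetris controller or its error bounds), the sketch below runs approximate value iteration on a random MDP with a feature-based linear architecture V(s) ≈ θᵀφ(s): each iteration computes a Bellman backup and projects it back onto the feature space by least squares. As the paper's counter-example warns, such iterations are not guaranteed to converge in general.

```python
import numpy as np

rng = np.random.default_rng(1)
S, A, K = 50, 4, 6                          # states, actions, number of features
P = rng.dirichlet(np.ones(S), size=(S, A))  # transition probabilities P[s, a, s']
R = rng.normal(size=(S, A))                 # one-step rewards
Phi = rng.normal(size=(S, K))               # feature matrix; row s is phi(s)
gamma, theta = 0.95, np.zeros(K)

for _ in range(100):
    V = Phi @ theta                         # compact value estimate
    Q = R + gamma * (P @ V)                 # Bellman backup, shape (S, A)
    targets = Q.max(axis=1)                 # greedy one-step lookahead values
    # project the backed-up values onto the feature-based architecture
    theta, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
```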


Proceedings ArticleDOI
18 Jun 1996
TL;DR: An algorithm to automatically construct detectors for arbitrary parametric features is proposed and the results of detailed experiments are presented which demonstrate the robustness of feature detection and the accuracy of parameter estimation.
Abstract: We propose an algorithm to automatically construct feature detectors for arbitrary parametric features. To obtain a high level of robustness we advocate the use of realistic multi-parameter feature models and incorporate optical and sensing effects. Each feature is represented as a densely sampled parametric manifold in a low dimensional subspace of a Hilbert space. During detection, the brightness distribution around each image pixel is projected into the subspace. If the projection lies sufficiently close to the feature manifold, the feature is detected and the location of the closest manifold point yields the feature parameters. The concepts of parameter reduction by normalization, dimension reduction, pattern rejection, and heuristic search are all employed to achieve the required efficiency. By applying the algorithm to appropriate parametric feature models, detectors have been constructed for five features, namely, step edge, roof edge, line, corner, and circular disc. Detailed experiments are reported on the robustness of detection and the accuracy of parameter estimation.

139 citations
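A hedged, one-parameter toy version of the detection scheme described above: densely sample a parametric step-edge model, keep a low-dimensional PCA subspace, and detect by projecting a patch and checking its distance to the nearest sampled manifold point. The patch model, subspace dimension, and tolerance are illustrative assumptions, not the paper's feature models.

```python
import numpy as np

def step_patch(theta, size=9):
    """Synthetic step-edge patch at orientation theta (assumed toy model)."""
    y, x = np.mgrid[-1:1:size*1j, -1:1:size*1j]
    return (np.cos(theta) * x + np.sin(theta) * y > 0).astype(float).ravel()

thetas = np.linspace(0, np.pi, 180, endpoint=False)
M = np.stack([step_patch(t) for t in thetas])        # densely sampled manifold
mean = M.mean(axis=0)
_, _, Vt = np.linalg.svd(M - mean, full_matrices=False)
B = Vt[:8]                                           # 8-D subspace basis
manifold = (M - mean) @ B.T                          # manifold coordinates

def detect(patch, tol=1.0):
    c = (patch.ravel() - mean) @ B.T                 # project the image patch
    d = np.linalg.norm(manifold - c, axis=1)
    j = d.argmin()
    return thetas[j] if d[j] < tol else None         # parameter of closest point

print(detect(step_patch(thetas[50])))                # recovers an orientation near thetas[50]
```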


Proceedings Article
04 Jun 1996
TL;DR: A probabilistic wrapper model is proposed as an alternative to exhaustive search and heuristic approaches; it aims to avoid local minima and exhaustive search and can be used to improve the predictive accuracy of an induction algorithm.
Abstract: Feature selection is defined as the problem of finding a minimum set of M features with which an inductive algorithm achieves the highest predictive accuracy on data described by the original N features, where M ≤ N. A probabilistic wrapper model is proposed as another method besides exhaustive search and the heuristic approach. The aim of this model is to avoid local minima and exhaustive search. The highest predictive accuracy is the criterion in the search for the smallest M. Analysis and experiments show that this model can effectively find relevant features and remove irrelevant ones in the context of improving the predictive accuracy of an induction algorithm. It is simple and straightforward, and provides fast solutions while searching for the optimal one. Applications of such a model, its future work and some related issues are also discussed.

138 citations
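A minimal sketch of a probabilistic (random-search) wrapper in the spirit of the abstract, assuming a decision-tree classifier, 5-fold cross-validated accuracy as the criterion, and smaller-subset tie-breaking; these choices are assumptions, not the authors' exact settings.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def probabilistic_wrapper(X, y, n_trials=200, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    best_acc, best_subset = -np.inf, list(range(n_features))
    for _ in range(n_trials):
        mask = rng.random(n_features) < 0.5          # random candidate subset
        subset = np.flatnonzero(mask)
        if subset.size == 0:
            continue
        acc = cross_val_score(DecisionTreeClassifier(random_state=0),
                              X[:, subset], y, cv=5).mean()
        # prefer higher accuracy; break ties with the smaller subset
        if acc > best_acc or (acc == best_acc and subset.size < len(best_subset)):
            best_acc, best_subset = acc, list(subset)
    return best_subset, best_acc
```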


Proceedings ArticleDOI
25 Aug 1996
TL;DR: The results show that the sequential forward floating selection (SFFS) algorithm, proposed by Pudil et al. (1994), dominates the other algorithms tested, and illustrates the dangers of using feature selection in small sample size situations.
Abstract: A large number of algorithms have been proposed for doing feature subset selection. The goal of this paper is to evaluate the quality of the feature subsets generated by the various algorithms, and also to compare their computational requirements. Our results show that the sequential forward floating selection (SFFS) algorithm, proposed by Pudil et al. (1994), dominates the other algorithms tested. This paper also illustrates the dangers of using feature selection in small sample size situations. It gives the results of applying feature selection to land use classification of SAR satellite images using four different texture models. Pooling features derived from different texture models, followed by feature selection, results in a substantial improvement in the classification accuracy. Application of feature selection to classification of handprinted characters illustrates the value of feature selection in reducing the number of features needed for classifier design.

121 citations
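For reference, a simplified sketch of sequential forward floating selection (the winning algorithm above): greedy inclusion, followed by conditional exclusion whenever dropping a feature beats the best subset previously recorded at that size. J is any subset criterion (e.g. cross-validated accuracy); bookkeeping details of the original Pudil et al. formulation are trimmed for brevity.

```python
def sffs(J, n_features, target_size):
    """Simplified sequential forward floating selection over feature indices."""
    current, best_of_size = [], {}
    while len(current) < target_size:
        # inclusion: add the single feature that helps the criterion most
        f = max((f for f in range(n_features) if f not in current),
                key=lambda f: J(current + [f]))
        current = current + [f]
        best_of_size[len(current)] = max(best_of_size.get(len(current), float("-inf")),
                                         J(current))
        # conditional exclusion: drop features while that beats the best
        # subset previously found at the smaller size
        while len(current) > 2:
            g = max(current, key=lambda g: J([x for x in current if x != g]))
            reduced = [x for x in current if x != g]
            if J(reduced) > best_of_size.get(len(reduced), float("-inf")):
                current = reduced
                best_of_size[len(current)] = J(current)
            else:
                break
    return current
```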


Patent
Masaki Souma, Kenji Nagao
12 Dec 1996
TL;DR: In this paper, a feature extraction system for statistically analyzing a set of samples of feature vectors to calculate a feature being an index for a pattern identification, which is capable of identifying confusing data with a high robustness.
Abstract: A feature extraction system for statistically analyzing a set of samples of feature vectors to calculate a feature being an index for a pattern identification, which is capable of identifying confusing data with a high robustness. In this system, a storage section stores a feature vector inputted through an input section and a neighborhood vector selection section selects a specific feature vector from the feature vectors existing in the storage section. The specific feature is a neighborhood vector close in distance to the feature vector stored in the storage section. Further, the system is equipped with a feature vector space production section for outputting a partial vector space. The partial vector space is made to maximize the local scattering property of the feature vector when the feature vector is orthogonally projected to that space.

105 citations


Patent
29 Mar 1996
TL;DR: In this patent, local image sampling, a self-organizing map neural network, and a hybrid convolutional neural network are combined for object recognition, providing dimensionality reduction and partial invariance to translation, rotation, scale, and deformation.
Abstract: A hybrid neural network system for object recognition exhibiting local image sampling, a self-organizing map neural network, and a hybrid convolutional neural network. The self-organizing map provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the hybrid convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. The hybrid convolutional network extracts successively larger features in a hierarchical set of layers. Alternative embodiments using the Karhunen-Loeve transform in place of the self-organizing map, and a multi-layer perceptron in place of the convolutional network are described.

90 citations


Journal ArticleDOI
TL;DR: A selection of classifiers and a selection of dimensionality-reducing techniques are applied to the discrimination of seagrass spectral data, and the results indicate a promising future for wavelets in discriminant analysis and for the recently introduced flexible and penalized discriminant analysis.

83 citations


Proceedings ArticleDOI
12 Nov 1996
TL;DR: It is shown that for even moderately large databases (in fact, only 1856 texture images), these approaches do not scale well for exact retrieval, but as a browsing tool, these dimensionality reduction techniques hold much promise.
Abstract: The management of large image databases poses several interesting and challenging problems. These problems range from ingesting the data and extracting meta-data to the efficient storage and retrieval of the data. Of particular interest are the retrieval methods and user interactions with an image database during browsing. In image databases, the response to a given query is not an exact well-defined set, rather, the user poses a query and expects a set of responses that should contain many possible candidates from which the user chooses the answer set. We first present the browsing model in Alexandria, a digital library for maps and satellite images. Designed for content-based retrieval, the relevant information in an image is encoded in the form of a multi-dimensional feature vector. Various techniques have been previously proposed for the efficient retrieval of such vectors by reducing the dimensionality of such vectors. We show that for even moderately large databases (in fact, only 1856 texture images), these approaches do not scale well for exact retrieval. However, as a browsing tool, these dimensionality reduction techniques hold much promise.

72 citations


Book ChapterDOI
01 Jan 1996
TL;DR: The new approach selects a subset of features that maximizes predictive accuracy prior to the network learning phase, and generates networks that are computationally simpler to evaluate and display predictive accuracy comparable to that of Bayesian networks which model all attributes.
Abstract: This paper introduces a novel enhancement for learning Bayesian networks with a bias for small, high-predictive-accuracy networks. The new approach selects a subset of features that maximizes predictive accuracy prior to the network learning phase. We examine explicitly the effects of two aspects of the algorithm, feature selection and node ordering. Our approach generates networks that are computationally simpler to evaluate and display predictive accuracy comparable to that of Bayesian networks which model all attributes.

03 Oct 1996
TL;DR: Local models or Gaussian mixture models can be efficient tools for dimension reduction, exploratory data analysis, feature extraction, classification and regression, and proposed algorithms for regularizing them are presented.
Abstract: In this dissertation, we present local linear models for dimension reduction and Gaussian mixture models for classification and regression. When the data has different structure in different parts of the input space, fitting one global model can be slow and inaccurate. Simple learning models can quickly learn the structure of the data in small (local) regions. Thus, local learning techniques can offer us faster and more accurate model fitting. Gaussian mixture models form a soft local model of the data; data points belong to all "local" regions (Gaussians) at once with differing degrees of membership. Thus, mixture models blend together the different (local) models. We show that local linear dimension reduction approximates maximum likelihood signal extraction for a mixture-of-Gaussians signal-plus-noise model. The thesis of this document is that "local learning models can perform efficient (fast and accurate) data processing". We propose local linear dimension reduction algorithms which partition the input space and build separate low dimensional coordinate systems in disjoint regions of the input space. We compare the local linear models with a global linear model (principal components analysis) and a global non-linear model (five-layered auto-associative neural networks). For speech and image data, the local linear models incur about half the error of the global models while training nearly an order of magnitude faster than the neural networks. Under certain conditions, the local linear models are related to a mixture-of-Gaussians data model. Motivated by the relation between local linear dimension reduction and Gaussian mixture models, we present Gaussian mixture models for classification and regression and propose algorithms for regularizing them. Our results with speech phoneme classification and some benchmark regression tasks indicate that the mixture models perform comparably with a global model (neural networks). To summarize, local models or Gaussian mixture models can be efficient tools for dimension reduction, exploratory data analysis, feature extraction, classification and regression.
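A hedged sketch of the "local linear" idea described above: partition the input space (k-means here, one possible choice) and fit a separate PCA in each region; the region count and target dimension are arbitrary assumptions rather than the dissertation's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def local_pca(X, n_regions=5, n_components=2):
    """Partition the input space, then build a low-dim coordinate system per region."""
    labels = KMeans(n_clusters=n_regions, n_init=10, random_state=0).fit_predict(X)
    models, codes = {}, np.zeros((X.shape[0], n_components))
    for r in range(n_regions):
        idx = labels == r
        models[r] = PCA(n_components=n_components).fit(X[idx])
        codes[idx] = models[r].transform(X[idx])     # local low-dim coordinates
    return labels, models, codes
```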

Journal ArticleDOI
TL;DR: The results suggest that PCA/FIT is a useful way to pretreat the data used as input to a neural network (NN), and that univariate feature selection followed by PCA somewhat reduces the size of the NN structure for some data sets.

Proceedings Article
01 Jan 1996
TL;DR: This paper applies two statistical learning algorithms, the Linear Least Squares Fit (LLSF) mapping and a Nearest Neighbor classifier named ExpNet, to a large collection of MEDLINE documents, and both LLSF and ExpNet successfully scaled to this very large problem.
Abstract: Whether or not high accuracy classification methods can be scaled to large applications is crucial for the ultimate usefulness of such methods in text categorization. This paper applies two statistical learning algorithms, the Linear Least Squares Fit (LLSF) mapping and a Nearest Neighbor classifier named ExpNet, to a large collection of MEDLINE documents. With the use of suitable dimensionality reduction techniques and efficient algorithms, both LLSF and ExpNet successfully scaled to this very large problem with a result significantly outperforming word-matching and other automatic learning methods applied to the same corpus.
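A minimal sketch of an LLSF-style mapping, assuming term-frequency document vectors and 0/1 category indicators; the dimensionality reduction and thresholding steps used in the paper are omitted.

```python
import numpy as np

def llsf_fit(D, C):
    """D: (n_docs, n_terms) term matrix, C: (n_docs, n_categories) 0/1 labels.
    Returns W mapping term vectors to category scores by least squares."""
    W, *_ = np.linalg.lstsq(D, C, rcond=None)
    return W

def llsf_rank(W, d):
    """Rank categories for a single document vector d by descending score."""
    return np.argsort(-(d @ W))
```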

Journal ArticleDOI
TL;DR: A method that reduces data vertically and horizontally, keeps the discriminating power of the original data, and paves the way for extracting concepts from the raw data is introduced.
Abstract: The existence of numeric data and large numbers of records in a database present a challenging task in terms of explicit concepts extraction from the raw data. The paper introduces a method that reduces data vertically and horizontally, keeps the discriminating power of the original data, and paves the way for extracting concepts. The method is based on discretization (vertical reduction) and feature selection (horizontal reduction). The experimental results show that (a) the data can be effectively reduced by the proposed method; (b) the predictive accuracy of a classifier (C4.5) can be improved after data and dimensionality reduction; and (c) the classification rules learned are simpler.
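An illustrative sketch of the two reductions described: quantile discretization of numeric columns (vertical reduction) followed by keeping only the features most associated with the class (horizontal reduction). The bin and feature counts are arbitrary, and a chi-square score stands in for the paper's selection criterion.

```python
import numpy as np
from sklearn.feature_selection import chi2

def reduce_data(X, y, n_bins=5, n_keep=10):
    # vertical reduction: discretize each numeric column into quantile bins
    edges = [np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
             for j in range(X.shape[1])]
    Xd = np.column_stack([np.digitize(X[:, j], edges[j]) for j in range(X.shape[1])])
    # horizontal reduction: keep the features most associated with the class
    scores, _ = chi2(Xd, y)
    keep = np.argsort(-scores)[:n_keep]
    return Xd[:, keep], keep
```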

Proceedings ArticleDOI
18 Jun 1996
TL;DR: This work presents a hybrid neural network solution which is capable of rapid classification, requires only fast, approximate normalization and preprocessing, and consistently exhibits better classification performance than the eigenfaces approach on the database.
Abstract: Faces represent complex, multidimensional, meaningful visual stimuli and developing a computational model for face recognition is difficult. We present a hybrid neural network solution which compares favorably with other methods. The system combines local image sampling, a self-organizing map neural network, and a convolutional neural network. The self-organizing map provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. The method is capable of rapid classification, requires only fast, approximate normalization and preprocessing, and consistently exhibits better classification performance than the eigenfaces approach on the database considered as the number of images per person in the training database is varied from 1 to 5. With 5 images per person the proposed method and eigenfaces result in 3.8% and 10.5% error respectively. The recognizer provides a measure of confidence in its output and classification error approaches zero when rejecting as few as 10% of the examples. We use a database of 400 images of 40 individuals which contains quite a high degree of variability in expression, pose, and facial details.

Proceedings Article
02 Aug 1996
TL;DR: ADHOC (Automatic Discoverer of Higher-Order Correlation), an algorithm that combines the advantages of both filter and feedback models to enhance the understanding of the given data and to increase the efficiency of the feature selection process is introduced.
Abstract: This paper introduces ADHOC (Automatic Discoverer of Higher-Order Correlation), an algorithm that combines the advantages of both filter and feedback models to enhance the understanding of the given data and to increase the efficiency of the feature selection process. ADHOC partitions the observed features into a number of groups, called factors, that reflect the major dimensions of the phenomenon under consideration. The set of learned factors defines the starting point of the search for the best-performing feature subset. A genetic algorithm is used to explore the feature space originated by the factors and to determine the set of most informative feature configurations. The feature subset evaluation function is the performance of the induction algorithm. This approach offers three main advantages: (i) the likelihood of selecting well-performing features grows; (ii) the complexity of search diminishes consistently; (iii) the possibility of selecting a bad feature subset due to overfitting problems decreases. Extensive experiments on real-world data have been conducted to demonstrate the effectiveness of ADHOC as a data reduction technique as well as a feature selection method.

Proceedings ArticleDOI
25 Aug 1996
TL;DR: The experiments reveal that Sammon's mapping, the multilayer perceptron (MLP) and the principal component analysis (PCA) based feature extractors yield similar classification performance, and the PCA based initialization affords better human chromosome classification performance even when using a few eigenvectors.
Abstract: Feature extraction for exploratory data projection aims for data visualization by a projection of a high-dimensional space onto a two- or three-dimensional space, while feature extraction for classification generally requires more than two or three features. We study the extraction of more than three features, using a neural network (NN) implementation of Sammon's mapping to be applied for classification. The experiments reveal that Sammon's mapping, the multilayer perceptron (MLP) and the principal component analysis (PCA) based feature extractors yield similar classification performance. We investigate random- and PCA-based initializations of Sammon's mapping. When the PCA is applied to initialize Sammon's projection, only one experiment is required and only a fraction of the training period is needed to achieve performance comparable with that of the random initialization. Furthermore, the PCA-based initialization affords better human chromosome classification performance even when using only a few eigenvectors.
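A hedged sketch of Sammon's mapping with either random or PCA initialization, minimizing the Sammon stress numerically with SciPy; the optimizer and its settings are illustrative assumptions, not the NN implementation studied in the paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def sammon_stress(flat_Y, d_orig, n, k):
    """Sammon stress between original distances d_orig and low-dim distances."""
    d_low = pdist(flat_Y.reshape(n, k))
    return np.sum((d_orig - d_low) ** 2 / np.maximum(d_orig, 1e-12)) / d_orig.sum()

def sammon(X, k=2, init="pca"):
    n = X.shape[0]
    d_orig = pdist(X)
    if init == "pca":                            # PCA initialization
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        Y0 = Xc @ Vt[:k].T
    else:                                        # random initialization
        Y0 = np.random.default_rng(0).normal(size=(n, k))
    # numerical gradient; adequate for small data sets in this sketch
    res = minimize(sammon_stress, Y0.ravel(), args=(d_orig, n, k), method="L-BFGS-B")
    return res.x.reshape(n, k), res.fun
```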

Proceedings ArticleDOI
25 Aug 1996
TL;DR: These feature extractors consider general dependencies between features and class labels, as opposed to well-known linear methods such as PCA, which does not consider class labels, and LDA, which uses only simple low-order dependencies.
Abstract: This paper presents and evaluates two linear feature extractors based on mutual information. These feature extractors consider general dependencies between features and class labels, as opposed to well-known linear methods such as PCA, which does not consider class labels, and LDA, which uses only simple low-order dependencies. As evidenced by several simulations on high-dimensional data sets, the proposed techniques provide superior feature extraction and better dimensionality reduction while having similar computational requirements.
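As a toy stand-in (not the authors' optimization), the sketch below scores random projection directions by the mutual information between the discretized projection and the class label and keeps the best one; the direction count and binning scheme are arbitrary assumptions.

```python
import numpy as np

def mi_discrete(a, y):
    """Plug-in mutual information for two discrete 1-D arrays, in nats."""
    mi = 0.0
    for av in np.unique(a):
        for yv in np.unique(y):
            p = np.mean((a == av) & (y == yv))
            if p > 0:
                mi += p * np.log(p / (np.mean(a == av) * np.mean(y == yv)))
    return mi

def best_mi_direction(X, y, n_dirs=500, n_bins=10, seed=0):
    rng = np.random.default_rng(seed)
    best_w, best_mi = None, -np.inf
    for _ in range(n_dirs):
        w = rng.normal(size=X.shape[1])
        w /= np.linalg.norm(w)
        z = X @ w                                   # 1-D projection of the data
        bins = np.quantile(z, np.linspace(0, 1, n_bins + 1)[1:-1])
        mi = mi_discrete(np.digitize(z, bins), y)   # MI of projection with class
        if mi > best_mi:
            best_w, best_mi = w, mi
    return best_w, best_mi
```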

Proceedings ArticleDOI
03 Jun 1996
TL;DR: These feature extractors consider general dependencies between features and class labels, as opposed to statistical techniques such as PCA, which does not consider class labels, and LDA, which uses only simple first-order dependencies.
Abstract: Presents and evaluates two linear feature extractors based on mutual information. These feature extractors consider general dependencies between features and class labels, as opposed to statistical techniques such as PCA, which does not consider class labels, and LDA, which uses only simple first-order dependencies. As evidenced by several simulations on high-dimensional data sets, the proposed techniques provide superior feature extraction and better dimensionality reduction while having similar computational requirements.

Journal ArticleDOI
TL;DR: This paper develops a formal statistical test for the 'scree plot'; a special case of this test is the classical test for equality of eigenvalues, which has been suggested in several texts as the criterion to decide the number of principal components to retain.
Abstract: Principal component analysis and factor analysis are the most widely used tools for dimension reduction in data analysis. Both methods require some good criterion to judge the number of dimensions to be kept. The classical method focuses on testing the equality of eigenvalues. As real data hardly have this property, practitioners turn to some ad hoc criterion in judging the dimensionality of their data. One such popular method, the ‘scree test’ or ‘scree plot’ as described in many texts and statistical programs, is based on the trend in the eigenvalues of the sample covariance (correlation) matrix. The principal components or common factors corresponding to eigenvalues which exhibit a slow linear decrease are discarded in further data analysis. This paper develops a formal statistical test for the ‘scree plot’. A special case of this test is the classical test for equality of eigenvalues which has been suggested in several texts as the criterion to decide the number of principal components to retain. Comparisons between equality of eigenvalues and the slow linear decrease in eigenvalues on some classical examples support the hypothesis of slow linear decrease. A physical background to such a phenomenon is also suggested.
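Not the paper's formal test, just a numerical illustration of the quantity it studies: the ordered eigenvalues of a sample correlation matrix, whose tail is inspected for a slow, roughly linear decrease (the "scree"), plus a crude elbow heuristic as one informal alternative.

```python
import numpy as np

def scree_eigenvalues(X):
    """Eigenvalues of the sample correlation matrix, largest first."""
    R = np.corrcoef(X, rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]

def naive_elbow(eigs):
    """Crude heuristic: number of components up to the single largest
    drop in the ordered eigenvalue sequence."""
    drops = -np.diff(eigs)
    return int(np.argmax(drops) + 1)
```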


Book ChapterDOI
23 Oct 1996
TL;DR: A case study of data preprocessing for a hybrid genetic algorithm shows that the elimination of irrelevant features can substantially improve the efficiency of learning and cost-sensitive feature elimination can be effective for reducing costs of induced hypotheses.
Abstract: This study is concerned with whether it is possible to detect what information contained in the training data and background knowledge is relevant for solving the learning problem, and whether irrelevant information can be eliminated in preprocessing before starting the learning process. A case study of data preprocessing for a hybrid genetic algorithm shows that the elimination of irrelevant features can substantially improve the efficiency of learning. In addition, cost-sensitive feature elimination can be effective for reducing costs of induced hypotheses.

Journal ArticleDOI
TL;DR: In this article, a Fourier transform (FT) was used as a tool to reduce the number of variables in pattern recognition of NIR data, and five procedures were designed to select the FT coefficients used as input to a regularized discriminant analysis classifier.

Journal ArticleDOI
TL;DR: In this paper, an approach to non-linear principal component analysis (NPCA) has been developed by combining the essential properties of linear PCA: the least-squares approximation property and the structure-preservation property.

Journal ArticleDOI
TL;DR: It is shown that internal representations of neural networks do not yield unique feature values but can provide the basis for facilitating a number of useful information management tasks, such as memorization, categorization, discovery, associative recall and others.
Abstract: The subject matter of this paper is one of long-standing interest to the Pattern Recognition and Artificial Intelligence research communities, namely that of "feature extraction" for facilitating the task of classification or various other tasks. We show that internal representations of neural networks do not yield unique feature values but can provide the basis for facilitating a number of useful information management tasks, such as memorization, categorization, discovery, associative recall and others. These matters are illustrated with three sets of data, one of a benchmark nature, another of the nature of real-world sensor data, and a third set consisting of semiconductor crystal structure parameters.

Proceedings ArticleDOI
25 Aug 1996
TL;DR: The results of the experiment show that the FKL provides the richest features in discriminating power for the limited-class problem when compared with other techniques, including canonical discriminant analysis, principal component analysis, and the orthonormal discriminant vector (ODV) method.
Abstract: The applicability of canonical discriminant analysis to a limited-class problem is restricted because the number of extracted features cannot equal or exceed the number of classes. In order to remove this restriction, a new feature extraction technique, FKL, is proposed and is tested in a handwritten numeral recognition experiment. While canonical discriminant analysis maximizes the variance ratio (F-ratio), and principal component analysis (K-L expansion) minimizes the mean square error of dimension reduction, the FKL optimizes both the F-ratio and the mean square error simultaneously. The results of the experiment show that the FKL provides the richest features in discriminating power for the limited-class problem when compared with other techniques, including canonical discriminant analysis, principal component analysis, and the orthonormal discriminant vector (ODV) method.

Journal ArticleDOI
TL;DR: An unsupervised learning network is developed by incorporating the idea of non-linear mapping (NLM) into a backpropagation (BP) algorithm, which makes the BP learning algorithms more competent for many supervised and unsupervised learning tasks provided that an appropriate criterion has been designed.
Abstract: An unsupervised learning network is developed by incorporating the idea of non-linear mapping (NLM) into a backpropagation (BP) algorithm. This network performs the learning process by iteratively adjusting its network parameters to minimize an appropriate criterion using a generalized BP (GBP) algorithm. This generalization makes the BP learning algorithms more competent for many supervised and unsupervised learning tasks provided that an appropriate criterion has been designed. Results of numerical simulation and real data show that the proposed technique is a promising approach to visualize multidimensional clusters by mapping the multidimensional data to a perceivable low-dimensional space.

Proceedings ArticleDOI
03 Jun 1996
TL;DR: A feature extraction algorithm based on self-organising maps that can be interpreted as a non-orthogonal basis spanning the space of the input vectors is presented.
Abstract: A feature extraction algorithm based on self-organising maps is presented. The converged feature map can be interpreted as a non-orthogonal basis spanning the space of the input vectors. The new algorithm can be shown to be a generalization of the generalised Hebbian algorithm (GHA).
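For reference, a hedged sketch of the plain generalized Hebbian algorithm (GHA) that the proposed SOM-based extractor is said to generalize; the learning rate, number of epochs, and initialization are arbitrary choices.

```python
import numpy as np

def gha(X, k=3, lr=1e-3, epochs=20, seed=0):
    """Sanger's generalized Hebbian algorithm: the rows of W converge (in order)
    to the leading principal components of the centered data."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    W = rng.normal(scale=0.1, size=(k, X.shape[1]))
    for _ in range(epochs):
        for x in Xc:
            y = W @ x
            # Sanger's rule: Hebbian term minus projections onto earlier outputs
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W
```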

Proceedings ArticleDOI
28 Jan 1996
TL;DR: Two feature extraction algorithms, the minimum entropy method and the Karhunen-Loève expansion, have been studied to examine their intraset clustering and interset class dispersion.
Abstract: This paper compares two feature extraction techniques for neural network classifiers. The techniques evaluated are used for the dynamic security assessment of power systems. The feature extraction methods are used to map the observation vectors from the measurement space into a lower dimension feature space. The patterns in the feature space can then be utilized to train a neural network (NN) classifier. The NN classifier is used to classify a given power system into either a "secure" or "insecure" class. The feature vectors not only represent a reduction in dimensionality, but also lead to an improvement in class dispersion, and hence to a better classification. Two feature extraction algorithms, the minimum entropy method and the Karhunen-Loève expansion, have been studied to examine their intraset clustering and interset class dispersion. A NN pattern classifier system is developed to illustrate the feasibility of classifying any given operating condition into either a secure or insecure class. Security assessment data from two utility power systems are used to test the proposed techniques.