
Showing papers on "Dimensionality reduction published in 1998"


01 Jan 1998
TL;DR: This thesis addresses the problem of feature selection for machine learning through a correlation based approach with CFS (Correlation based Feature Selection), an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy.
Abstract: A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set. Feature selection degraded machine learning performance in cases where some features were eliminated which were highly predictive of very small areas of the instance space. Further experiments compared CFS with a wrapper, a well-known approach to feature selection that employs the target learning algorithm to evaluate feature sets. In many cases CFS gave comparable results to the wrapper, and in general, outperformed the wrapper on small datasets. CFS executes many times faster than the wrapper, which allows it to scale to larger datasets. Two methods of extending CFS to handle feature interaction are presented and experimentally evaluated. The first considers pairs of features and the second incorporates feature weights calculated by the RELIEF algorithm. Experiments on artificial domains showed that both methods were able to identify interacting features. On natural domains, the pairwise method gave more reliable results than using weights provided by RELIEF.
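As an illustration of the kind of search CFS performs, the following is a minimal NumPy sketch of a correlation-based merit score combined with greedy forward selection. It is a hedged approximation: Pearson correlation stands in for the symmetrical-uncertainty measure the thesis uses, the search is plain forward selection rather than the thesis's heuristic search, and the function names and toy data are hypothetical.

```python
# Illustrative sketch of a CFS-style merit score with greedy forward search.
# Pearson correlation stands in for the thesis's symmetrical-uncertainty measure.
import numpy as np

def merit(X, y, subset):
    """Merit = k*mean(|r_cf|) / sqrt(k + k*(k-1)*mean(|r_ff|))."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(X, y):
    remaining, selected, best = list(range(X.shape[1])), [], -np.inf
    while remaining:
        score, j = max((merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break
        best, selected = score, selected + [j]
        remaining.remove(j)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
print(cfs_forward(X, y))   # typically picks the two relevant columns
```

The merit formula rewards subsets whose features correlate strongly with the class while remaining mutually uncorrelated, which is exactly the stated hypothesis.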

3,533 citations


Proceedings Article
01 Dec 1998
TL;DR: This work presents ideas for finding approximate pre-images, focusing on Gaussian kernels, and shows experimental results using these pre-images in data reconstruction and de-noising on toy examples as well as on real world data.
Abstract: Kernel PCA as a nonlinear feature extractor has proven powerful as a preprocessing step for classification algorithms. But it can also be considered as a natural generalization of linear principal component analysis. This gives rise to the question how to use nonlinear features for data compression, reconstruction, and de-noising, applications common in linear PCA. This is a nontrivial task, as the results provided by kernel PCA live in some high dimensional feature space and need not have pre-images in input space. This work presents ideas for finding approximate pre-images, focusing on Gaussian kernels, and shows experimental results using these pre-images in data reconstruction and de-noising on toy examples as well as on real world data.
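For Gaussian kernels, the pre-image search can be illustrated with a fixed-point style iteration in which a candidate input point is repeatedly replaced by a kernel-weighted average of the training points. The sketch below is only indicative of that idea: the expansion coefficients `gamma`, the initialisation, and the toy data are assumptions, not the paper's exact procedure.

```python
# Fixed-point style pre-image sketch for a Gaussian (RBF) kernel. The
# coefficients `gamma` (one per training point) are assumed given, e.g. from
# projecting onto leading kernel principal components.
import numpy as np

def gaussian_preimage(X, gamma, sigma, n_iter=100, tol=1e-8):
    """Approximate z such that Phi(z) ~ sum_i gamma_i * Phi(x_i)."""
    z = X[np.argmax(np.abs(gamma))].copy()           # crude initialisation
    for _ in range(n_iter):
        w = gamma * np.exp(-np.sum((X - z) ** 2, axis=1) / (2 * sigma ** 2))
        if np.abs(w.sum()) < 1e-12:                  # degenerate step
            break
        z_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

# Toy usage with hypothetical uniform coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
gamma = np.full(50, 1.0 / 50)
print(gaussian_preimage(X, gamma, sigma=1.0))
```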

1,031 citations


01 Jan 1998
TL;DR: A new feature selection algorithm is described that uses a correlation based heuristic to determine the “goodness” of feature subsets, and its effectiveness is evaluated with three common machine learning algorithms.
Abstract: Machine learning algorithms automatically extract knowledge from machine readable information. Unfortunately, their success is usually dependent on the quality of the data that they operate on. If the data is inadequate, or contains extraneous and irrelevant information, machine learning algorithms may produce less accurate and less understandable results, or may fail to discover anything of use at all. Feature subset selectors are algorithms that attempt to identify and remove as much irrelevant and redundant information as possible prior to learning. Feature subset selection can result in enhanced performance, a reduced hypothesis search space, and, in some cases, reduced storage requirements. This paper describes a new feature selection algorithm that uses a correlation based heuristic to determine the “goodness” of feature subsets, and evaluates its effectiveness with three common machine learning algorithms. Experiments using a number of standard machine learning data sets are presented. Feature subset selection gave significant improvement for all three algorithms.

515 citations


Proceedings ArticleDOI
04 May 1998
TL;DR: It is demonstrated that the document classification accuracy obtained after the dimensionality has been reduced using a random mapping method will be almost as good as the original accuracy if the final dimensionality is sufficiently large.
Abstract: When the data vectors are high-dimensional it is computationally infeasible to use data analysis or pattern recognition algorithms which repeatedly compute similarities or distances in the original data space. It is therefore necessary to reduce the dimensionality before, for example, clustering the data. If the dimensionality is very high, like in the WEBSOM method which organizes textual document collections on a self-organizing map, then even the commonly used dimensionality reduction methods like the principal component analysis may be too costly. It is demonstrated that the document classification accuracy obtained after the dimensionality has been reduced using a random mapping method will be almost as good as the original accuracy if the final dimensionality is sufficiently large (about 100 out of 6000). In fact, it can be shown that the inner product (similarity) between the mapped vectors follows closely the inner product of the original vectors.
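The core operation, multiplying high-dimensional document vectors by a random matrix and checking that inner products are approximately preserved, is easy to sketch. The snippet below uses illustrative sizes and a Gaussian random matrix; it is not the WEBSOM implementation.

```python
# Minimal random-mapping sketch: project sparse term-count vectors with a
# random matrix and compare inner products before and after.
import numpy as np

rng = np.random.default_rng(0)
n_docs, d_orig, d_new = 200, 6000, 100

X = (rng.random((n_docs, d_orig)) < 0.01).astype(float)   # sparse "term counts"
R = rng.normal(size=(d_orig, d_new)) / np.sqrt(d_new)      # random mapping
Y = X @ R

i, j = 0, 1
print("original inner product :", X[i] @ X[j])
print("projected inner product:", Y[i] @ Y[j])
```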

434 citations


Patent
27 Oct 1998
TL;DR: An improved multidimensional data indexing technique that generates compact indexes such that most or all of the index can reside in main memory at any time is presented in this article. The technique can be effective even in the presence of variables which are not highly correlated.
Abstract: An improved multidimensional data indexing technique that generates compact indexes such that most or all of the index can reside in main memory at any time. During the clustering and dimensionality reduction, clustering information and dimensionality reduction information are generated for use in a subsequent search phase. The indexing technique can be effective even in the presence of variables which are not highly correlated. Other features provide for efficiently performing exact and nearest neighbor searches using the clustering information and dimensionality reduction information. One example of the dimensionality reduction uses a singular value decomposition technique. The method can also be recursively applied to each of the reduced-dimensionality clusters. The dimensionality reduction can also be applied to the entire database as a first step of the index generation.

390 citations


Proceedings ArticleDOI
01 Jun 1998
TL;DR: This paper proposes novel techniques for performing SVD-based dimensionality reduction in dynamic databases, including a technique that recomputes the SVD transform from aggregate data in the existing index rather than from the entire data.
Abstract: Databases are increasingly being used to store multi-media objects such as maps, images, audio and video. Storage and retrieval of these objects is accomplished using multi-dimensional index structures such as R*-trees and SS-trees. As dimensionality increases, query performance in these index structures degrades. This phenomenon, generally referred to as the dimensionality curse, can be circumvented by reducing the dimensionality of the data. Such a reduction is however accompanied by a loss of precision of query results. Current techniques such as QBIC use SVD transform-based dimensionality reduction to ensure high query precision. The drawback of this approach is that SVD is expensive to compute, and therefore not readily applicable to dynamic databases. In this paper, we propose novel techniques for performing SVD-based dimensionality reduction in dynamic databases. When the data distribution changes considerably so as to degrade query precision, we recompute the SVD transform and incorporate it in the existing index structure. For recomputing the SVD-transform, we propose a novel technique that uses aggregate data from the existing index rather than the entire data. This technique reduces the SVD-computation time without compromising query precision. We then explore efficient ways to incorporate the recomputed SVD-transform in the existing index structure without degrading subsequent query response times. These techniques reduce the computation time by a factor of 20 in experiments on color and texture image vectors. The error due to approximate computation of SVD is less than 10%.
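One way to picture the aggregate-based recomputation is to build the transform from per-cluster summaries instead of every stored vector. The sketch below uses cluster centroids and point counts as the aggregates; the paper's exact statistics, index integration, and update policy are not reproduced.

```python
# Hedged sketch: recompute a reduction transform from aggregate (per-cluster)
# statistics rather than from every stored vector.
import numpy as np

def transform_from_aggregates(centroids, counts):
    """SVD of count-weighted centroids as a cheap stand-in for full-data SVD."""
    counts = np.asarray(counts, dtype=float)
    w = np.sqrt(counts)[:, None]
    mean = (centroids * counts[:, None]).sum(axis=0) / counts.sum()
    _, _, Vt = np.linalg.svd(w * (centroids - mean), full_matrices=False)
    return mean, Vt                        # project with (x - mean) @ Vt[:k].T

rng = np.random.default_rng(2)
X = rng.normal(size=(10000, 16)) @ rng.normal(size=(16, 16))
labels = rng.integers(0, 50, size=len(X))
centroids = np.array([X[labels == c].mean(axis=0) for c in range(50)])
counts = np.bincount(labels, minlength=50)
mean, Vt = transform_from_aggregates(centroids, counts)
print(((X - mean) @ Vt[:4].T).shape)       # reduced to 4 dimensions
```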

316 citations


Journal ArticleDOI
TL;DR: Experimental results suggest that the probabilistic algorithm is effective in obtaining optimal/suboptimal feature subsets and its incremental version expedites feature selection further when the number of patterns is large and can scale up without sacrificing the quality of selected features.
Abstract: Feature selection is a problem of finding relevant features. When the number of features of a dataset is large and its number of patterns is huge, an effective method of feature selection can help in dimensionality reduction. An incremental probabilistic algorithm is designed and implemented as an alternative to the exhaustive and heuristic approaches. Theoretical analysis is given to support the idea of the probabilistic algorithm in finding an optimal or near-optimal subset of features. Experimental results suggest that (1) the probabilistic algorithm is effective in obtaining optimal/suboptimal feature subsets; (2) its incremental version expedites feature selection further when the number of patterns is large and can scale up without sacrificing the quality of selected features.
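The flavour of such a probabilistic search can be sketched as repeated random subset draws, keeping a candidate only when it is smaller than the current best and still consistent with the class labels. The inconsistency measure, threshold, and data below are illustrative stand-ins, not the paper's exact algorithm or its incremental version.

```python
# Generic sketch of a probabilistic (random-search) feature selector: draw
# random subsets and keep the smallest one whose inconsistency rate stays
# below a threshold.
import numpy as np
from collections import defaultdict

def inconsistency_rate(X, y, subset):
    groups = defaultdict(list)
    for row, label in zip(X[:, subset], y):
        groups[tuple(row)].append(label)
    bad = sum(len(v) - max(np.bincount(v)) for v in groups.values())
    return bad / len(y)

def probabilistic_select(X, y, threshold=0.0, n_trials=500, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    best = list(range(d))
    for _ in range(n_trials):
        size = rng.integers(1, len(best) + 1)
        cand = sorted(rng.choice(d, size=size, replace=False))
        if len(cand) < len(best) and inconsistency_rate(X, y, cand) <= threshold:
            best = cand
    return best

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 8))
y = X[:, 0] ^ X[:, 2]                      # only features 0 and 2 matter
print(probabilistic_select(X, y))          # usually returns [0, 2]
```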

200 citations


Book ChapterDOI
21 Apr 1998
TL;DR: Experimental comparison given on real-world data collected from Web users shows that characteristics of the problem domain and machine learning algorithm should be considered when feature scoring measure is selected.
Abstract: This paper describes several known and some new methods for feature subset selection on large text data. Experimental comparison given on real-world data collected from Web users shows that characteristics of the problem domain and machine learning algorithm should be considered when a feature scoring measure is selected. Our problem domain consists of hyperlinks given in the form of small documents represented with word vectors. In our learning experiments the naive Bayesian classifier was used on text data. The best performance was achieved by the feature selection methods based on the feature scoring measure called Odds ratio that is known from information retrieval.
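For a two-class text problem, an odds-ratio style score per word can be computed directly from class-conditional word probabilities. The sketch below assumes binary occurrence vectors and Laplace smoothing; it illustrates the scoring idea rather than the paper's exact experimental setup.

```python
# Odds-ratio feature scoring sketch for binary text classification.
import numpy as np

def odds_ratio_scores(X, y):
    """X: (n_docs, n_words) 0/1 matrix, y: 0/1 labels. Higher = more indicative of class 1."""
    pos, neg = X[y == 1], X[y == 0]
    p_pos = (pos.sum(axis=0) + 1) / (len(pos) + 2)   # P(word | positive), smoothed
    p_neg = (neg.sum(axis=0) + 1) / (len(neg) + 2)   # P(word | negative), smoothed
    return np.log(p_pos * (1 - p_neg)) - np.log((1 - p_pos) * p_neg)

rng = np.random.default_rng(0)
X = (rng.random((100, 20)) < 0.2).astype(int)
y = (X[:, 3] | X[:, 7])                   # words 3 and 7 indicate the class
scores = odds_ratio_scores(X, y)
print(np.argsort(scores)[::-1][:5])       # top-scoring word indices
```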

197 citations


Journal ArticleDOI
TL;DR: In this paper, a general theory on rates of convergence of the least-squares projection estimate in multiple regression is developed and applied to the functional ANOVA model, where the regression function is modeled as a specified sum of a constant term, main effects (functions of one variable) and selected interaction terms (functions of two or more variables).
Abstract: A general theory on rates of convergence of the least-squares projection estimate in multiple regression is developed. The theory is applied to the functional ANOVA model, where the multivariate regression function is modeled as a specified sum of a constant term, main effects (functions of one variable) and selected interaction terms (functions of two or more variables). The least-squares projection is onto an approximating space constructed from arbitrary linear spaces of functions and their tensor products respecting the assumed ANOVA structure of the regression function. The linear spaces that serve as building blocks can be any of the ones commonly used in practice: polynomials, trigonometric polynomials, splines, wavelets and finite elements. The rate of convergence result that is obtained reinforces the intuition that low-order ANOVA modeling can achieve dimension reduction and thus overcome the curse of dimensionality. Moreover, the components of the projection estimate in an appropriately defined ANOVA decomposition provide consistent estimates of the corresponding components of the regression function. When the regression function does not satisfy the assumed ANOVA form, the projection estimate converges to its best approximation of that form.
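In standard notation, the low-order functional ANOVA form referred to here and the least-squares projection onto an approximating space G can be written as follows; this is a generic statement of the model, not the paper's exact assumptions on G.

```latex
% Functional ANOVA form of the regression function (truncated at two-way terms)
f(x_1,\dots,x_d) \;=\; c \;+\; \sum_{j} f_j(x_j) \;+\; \sum_{j<k} f_{jk}(x_j,x_k),
\qquad
% Least-squares projection of the responses onto the approximating space G
\hat f \;=\; \operatorname*{arg\,min}_{g \in G}\; \sum_{i=1}^{n} \bigl(Y_i - g(X_i)\bigr)^2 .
```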

185 citations


Patent
04 Sep 1998
TL;DR: In this article, a set of speaker dependent models or adapted models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a sets of supervectors, one per speaker.
Abstract: A set of speaker dependent models or adapted models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Dimensionality reduction is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space based on a maximum likelihood estimation. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. The adapted model may then be further adapted via MAP, MLLR, MLED or the like. The eigenvoice technique may be applied to MLLR transformation matrices or the like; Bayesian estimation performed in eigenspace uses prior knowledge about speaker space density to refine the estimate about the location of a new speaker in eigenspace.
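A hedged sketch of the eigenvoice idea: stack each training speaker's model parameters into a supervector, find an eigenvoice basis by PCA, then place a new speaker in that space. A plain least-squares projection of the new supervector stands in for the maximum-likelihood estimation described above, and the dimensions and data are illustrative.

```python
# Eigenvoice-style sketch: supervectors -> PCA basis -> project a new speaker.
import numpy as np

rng = np.random.default_rng(0)
n_speakers, dim, n_eigenvoices = 120, 400, 10

S = rng.normal(size=(n_speakers, dim))            # supervectors, one per speaker
mean = S.mean(axis=0)
_, _, Vt = np.linalg.svd(S - mean, full_matrices=False)
E = Vt[:n_eigenvoices]                            # eigenvoice basis (rows)

new_supervector = rng.normal(size=dim)            # built from adaptation data
coeffs = E @ (new_supervector - mean)             # position in eigenvoice space
adapted = mean + coeffs @ E                       # adapted model parameters
print(coeffs.shape, adapted.shape)
```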

178 citations


Journal ArticleDOI
01 May 1998
TL;DR: A fast and simple algorithm for approximately calculating the principal components (PCs) of a dataset and so reducing its dimensionality is described; it shows a fast convergence rate compared with other methods and robustness to the reordering of the samples.
Abstract: A fast and simple algorithm for approximately calculating the principal components (PCs) of a dataset and so reducing its dimensionality is described. This Simple Principal Components Analysis (SPCA) method was used for dimensionality reduction of two high-dimensional image databases, one of handwritten digits and one of handwritten Japanese characters. It was tested and compared with other techniques. On both databases SPCA shows a fast convergence rate compared with other methods and robustness to the reordering of the samples.
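A Hebbian-style pass over the samples, with deflation between components, conveys the spirit of approximating PCs without an explicit covariance eigendecomposition. The update rule, pass count, and toy data below are assumptions for illustration and are not necessarily the exact SPCA procedure.

```python
# Hebbian-style sketch of approximate leading principal components.
import numpy as np

def approx_pcs(X, n_components=2, n_passes=5):
    X = X - X.mean(axis=0)
    comps = []
    for _ in range(n_components):
        p = X[0] / np.linalg.norm(X[0])
        for _ in range(n_passes):
            p = sum((x @ p) * x for x in X)       # accumulate Hebbian updates
            p /= np.linalg.norm(p)
        comps.append(p)
        X = X - np.outer(X @ p, p)                # deflate before the next PC
    return np.array(comps)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
print(approx_pcs(X, n_components=1))              # close to the dominant axis (up to sign)
```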

Proceedings ArticleDOI
16 Aug 1998
TL;DR: Two enhanced Fisher linear discriminant models (EFM) are introduced in order to improve the generalization ability of the standard FLD based classifiers such as Fisherfaces, and experimental data shows that the EFM models outperform the standard FLD based methods.
Abstract: We introduce two enhanced Fisher linear discriminant (FLD) models (EFM) in order to improve the generalization ability of the standard FLD based classifiers such as Fisherfaces. Similar to Fisherfaces, both EFM models first apply principal component analysis (PCA) for dimensionality reduction before proceeding with FLD type of analysis. EFM-1 implements the dimensionality reduction with the goal to balance between the need that the selected eigenvalues account for most of the spectral energy of the raw data and the requirement that the eigenvalues of the within-class scatter matrix in the reduced PCA subspace are not too small. EFM-2 implements the dimensionality reduction as Fisherfaces do. It proceeds with the whitening of the within-class scatter matrix in the reduced PCA subspace and then chooses a small set of features (corresponding to the eigenvectors of the within-class scatter matrix) so that the smaller trailing eigenvalues are not included in further computation of the between-class scatter matrix. Experimental data using a large set of faces (1,107 images drawn from 369 subjects, including duplicates acquired at a later time under different illumination) from the FERET database shows that the EFM models outperform the standard FLD based methods.
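The shared PCA-then-FLD backbone of these models can be sketched as follows. The EFM-specific eigenvalue balancing and whitening choices are deliberately left out, so this is closer to a generic Fisherfaces-style pipeline than to EFM-1 or EFM-2; the function name and toy data are hypothetical.

```python
# Minimal PCA-then-FLD (Fisherfaces-style) pipeline sketch.
import numpy as np

def pca_fld(X, y, n_pca, n_fld):
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:n_pca].T                              # PCA projection
    Z = (X - mean) @ P
    classes = np.unique(y)
    mu = Z.mean(axis=0)
    Sw = sum(np.cov(Z[y == c].T, bias=True) * (y == c).sum() for c in classes)
    Sb = sum((y == c).sum() * np.outer(Z[y == c].mean(axis=0) - mu,
                                       Z[y == c].mean(axis=0) - mu)
             for c in classes)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1][:n_fld]
    W = evecs.real[:, order]                      # discriminant directions
    return lambda Xnew: ((Xnew - mean) @ P) @ W

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 30)), rng.normal(1.5, 1, (50, 30))])
y = np.array([0] * 50 + [1] * 50)
project = pca_fld(X, y, n_pca=10, n_fld=1)
print(project(X[:3]))
```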

Book ChapterDOI
02 Sep 1998
TL;DR: This work has shown that the reconstruction of patterns from their largest nonlinear principal components, a technique which is common practice in linear principal component analysis, can be performed using a kernel without explicitly working in F.
Abstract: Algorithms based on Mercer kernels construct their solutions in terms of expansions in a high-dimensional feature space F. Previous work has shown that all algorithms which can be formulated in terms of dot products in F can be performed using a kernel without explicitly working in F. The list of such algorithms includes support vector machines and nonlinear kernel principal component extraction. So far, however, it did not include the reconstruction of patterns from their largest nonlinear principal components, a technique which is common practice in linear principal component analysis.

Journal ArticleDOI
TL;DR: In this article, the authors proposed an algorithm to automatically construct detectors for arbitrary parametric features, including edges, lines, corners, and junctions, by using realistic multi-parameter feature models and incorporating optical and sensing effects.
Abstract: Most visual features are parametric in nature, including edges, lines, corners, and junctions. We propose an algorithm to automatically construct detectors for arbitrary parametric features. To maximize robustness we use realistic multi-parameter feature models and incorporate optical and sensing effects. Each feature is represented as a densely sampled parametric manifold in a low dimensional subspace of a Hilbert space. During detection, the vector of intensity values in a window about each pixel in the image is projected into the subspace. If the projection lies sufficiently close to the feature manifold, the feature is detected and the location of the closest manifold point yields the feature parameters. The concepts of parameter reduction by normalization, dimension reduction, pattern rejection, and heuristic search are all employed to achieve the required efficiency. Detectors have been constructed for five features, namely, step edge (five parameters), roof edge (five parameters), line (six parameters), corner (five parameters), and circular disc (six parameters). The results of detailed experiments are presented which demonstrate the robustness of feature detection and the accuracy of parameter estimation.
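The detection scheme, projecting a window into a low-dimensional subspace and matching it against a densely sampled parameter manifold, can be illustrated with a one-parameter oriented step edge. The window size, sampling density, and subspace dimension below are arbitrary choices for the sketch, not the paper's models.

```python
# Sketch: detect a parametric feature by nearest point on a sampled manifold
# in a low-dimensional subspace. A 1-parameter oriented step edge stands in
# for the richer multi-parameter feature models.
import numpy as np

W = 8                                              # window size (W x W)
def step_edge(theta):
    """Unit-norm, zero-mean W x W step edge perpendicular to direction theta."""
    yy, xx = np.mgrid[0:W, 0:W] - (W - 1) / 2.0
    img = (np.cos(theta) * xx + np.sin(theta) * yy > 0).astype(float)
    img -= img.mean()
    return img / np.linalg.norm(img)

thetas = np.linspace(0, np.pi, 64, endpoint=False)
M = np.array([step_edge(t).ravel() for t in thetas])     # sampled manifold
_, _, Vt = np.linalg.svd(M, full_matrices=False)
P = Vt[:6]                                         # low-dimensional subspace
Mp = M @ P.T                                       # manifold in the subspace

window = step_edge(0.7) + 0.05 * np.random.default_rng(0).normal(size=(W, W))
v = window.ravel()
v -= v.mean()
v /= np.linalg.norm(v)
proj = P @ v
i = np.argmin(np.linalg.norm(Mp - proj, axis=1))
print("estimated orientation:", thetas[i],
      "distance to manifold:", np.linalg.norm(Mp[i] - proj))
```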

Journal ArticleDOI
TL;DR: In this review, linear and nonlinear multivariate methods are described and illustrated with examples related both to the segmentation of microanalytical maps and to the study of variability in the images of unit cells in high‐resolution transmission electron microscopy.
Abstract: Multivariate data sets are now produced in several types of microscopy. Multivariate statistical methods are necessary in order to extract the useful information contained in such (image or spectrum) series. In this review, linear and nonlinear multivariate methods are described and illustrated with examples related both to the segmentation of microanalytical maps and to the study of variability in the images of unit cells in high-resolution transmission electron microscopy. Concerning linear multivariate statistical analysis, emphasis is put on the need to go beyond the classical orthogonal decomposition already routinely performed through principal components analysis or correspondence analysis. It is shown that oblique analysis is often necessary when quantitative results are expected. Concerning nonlinear multivariate analysis, several methods are first described for performing the mapping of data from a high-dimensional space to a space of lower dimensionality. Then, automatic classification methods are described. These methods, which range from classical methods (hard and fuzzy C-means) to neural networks through clustering methods which do not make assumptions concerning the shape of classes, can be used for multivariate image segmentation and image classification and averaging.

Journal ArticleDOI
TL;DR: In this paper, a real-time Gabor wavelet projection was implemented using a Datacube MaxVideo 250 whilst an alternative system for real-time pose estimation used only standard PC hardware.
Abstract: Methods were investigated for estimating the poses of human faces undergoing large rotations in depth. Dimensionality reduction using principal components analysis enabled pose changes to be visualised as manifolds in low-dimensional subspaces and provided a useful mechanism for investigating these changes. Appearance-based matching using Gabor wavelets was developed for real-time face tracking and pose estimation. A real-time Gabor wavelet projection was implemented using a Datacube MaxVideo 250 whilst an alternative system for real-time pose estimation used only standard PC hardware.

Journal ArticleDOI
TL;DR: Algorithms for multiscale basis selection and feature extraction for pattern classification problems are presented and have been tested for classification and segmentation of one-dimensional radar signals and two-dimensional texture and document images.
Abstract: Algorithms for multiscale basis selection and feature extraction for pattern classification problems are presented. The basis selection algorithm is based on class separability measures rather than energy or entropy. At each level the "accumulated" tree-structured class separabilities obtained from the tree which includes a parent node and the one which includes its children are compared. The decomposition of the node (or subband) is performed (creating the children), if it provides larger combined separability. The suggested feature extraction algorithm focuses on dimensionality reduction of a multiscale feature space subject to maximum preservation of information useful for classification. At each level of decomposition, an optimal linear transform that preserves class separabilities and results in a reduced dimensional feature space is obtained. Classification and feature extraction is then performed at each scale and resulting "soft decisions" obtained for each area are integrated across scales. The suggested algorithms have been tested for classification and segmentation of one-dimensional (1-D) radar signals and two-dimensional (2-D) texture and document images. The same idea can be used for other tree structured local basis, e.g., local trigonometric basis functions, and even for nonorthogonal, redundant and composite basis dictionaries.

Journal ArticleDOI
TL;DR: The present work shows the great potential of GAs for feature selection (dimensionality reduction) problems, tested on a practical pattern recognition problem which consisted of the discrimination between four seed species by artificial vision.
Abstract: Genetic algorithms (GAs) are efficient search methods based on the paradigm of natural selection and population genetics. A simple GA was applied for selecting the optimal feature subset among an initial feature set of larger size. The performances were tested on a practical pattern recognition problem, which consisted of the discrimination between four seed species (two cultivated and two adventitious seed species) by artificial vision. A set of 73 features, describing size, shape and texture, were extracted from colour images in order to characterise each seed. The goal of the GA was to select the best subset of features which gave the highest classification rates when using the nearest neighbour as a classification method. The selected features were represented by binary chromosomes which had 73 elements. The number of selected features was directly related to the probability of initialisation of the population at the first generation of the GA. When this probability was fixed to 0.1, the GA selected about five features. The classification performances increased with the number of generations. For example, 6.25% of the seeds were misclassified by using five features at generation 140, whereas another subset of the same size led to 3% misclassification at generation 400. The present work shows the great potential of GAs for feature selection (dimensionality reduction) problems. 1998 SCI. (J Sci Food Agric 76, 77-86 (1998)). Key words: feature selection; genetic algorithm; seed; colour image analysis; classification; discrimination
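A bare-bones version of GA-based feature selection with binary chromosomes and a nearest-neighbour fitness looks like the sketch below; the population size, mutation rate, generation count, and toy data are illustrative, not the settings used in the study.

```python
# GA feature-selection sketch: binary chromosomes mark selected features,
# fitness is leave-one-out 1-NN accuracy on the selected columns.
import numpy as np

def knn_accuracy(X, y, mask):
    if not mask.any():
        return 0.0
    Z = X[:, mask]
    D = np.linalg.norm(Z[:, None] - Z[None, :], axis=2)
    np.fill_diagonal(D, np.inf)                   # leave-one-out 1-NN
    return float((y[D.argmin(axis=1)] == y).mean())

def ga_select(X, y, pop_size=20, n_gen=15, p_init=0.1, p_mut=0.02, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pop = rng.random((pop_size, d)) < p_init      # binary chromosomes
    for _ in range(n_gen):
        fit = np.array([knn_accuracy(X, y, ind) for ind in pop])
        probs = fit / fit.sum() if fit.sum() else None
        parents = pop[rng.choice(pop_size, size=pop_size, p=probs)]
        cut = rng.integers(1, d, size=pop_size)   # one-point crossover
        children = np.array([np.concatenate([parents[i, :c],
                                             parents[(i + 1) % pop_size, c:]])
                             for i, c in enumerate(cut)])
        pop = children ^ (rng.random((pop_size, d)) < p_mut)   # mutation
        fit = np.array([knn_accuracy(X, y, ind) for ind in pop])
    return pop[fit.argmax()]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 15))
y = (X[:, 0] + X[:, 5] > 0).astype(int)
print(np.flatnonzero(ga_select(X, y)))            # tends to include 0 and 5
```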

Journal ArticleDOI
TL;DR: This article provides an overview of the methods and techniques for statistical pattern recognition that, based on the user's level of knowledge of a problem, can reduce the problem's dimensionality.
Abstract: Choosing the best method for feature selection depends on the extent of a-priori knowledge of the problem. We present two basic approaches. One involves computationally effective floating-search methods; the other trades off the requirement for a-priori information for the requirement of sufficient data to represent the distributions involved. We've developed methods for statistical pattern recognition that, based on the user's level of knowledge of a problem, can reduce the problem's dimensionality. We believe that these methods can enrich the methodology of subset selection for other fields of AI. This article provides an overview of our methods and techniques, focusing on the basic principles and their potential use.
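The floating-search idea, alternating a forward inclusion step with conditional backward exclusions that are accepted only when they beat the best subset seen at that size, can be sketched with a pluggable criterion J. The leave-one-out 1-NN criterion and data here are placeholders, not the article's experimental setup.

```python
# Compact sequential forward floating selection (SFFS) sketch.
import numpy as np

def loo_1nn(X, y, subset):
    """Leave-one-out 1-NN accuracy on the given feature subset (placeholder J)."""
    Z = X[:, subset]
    D = np.linalg.norm(Z[:, None] - Z[None, :], axis=2)
    np.fill_diagonal(D, np.inf)
    return float((y[D.argmin(axis=1)] == y).mean())

def sffs(X, y, J, target_size):
    selected, remaining = [], list(range(X.shape[1]))
    best_by_size = {}                      # best criterion value seen per subset size
    while len(selected) < target_size:
        # inclusion: add the single most significant feature
        f = max(remaining, key=lambda g: J(X, y, selected + [g]))
        selected.append(f)
        remaining.remove(f)
        k = len(selected)
        best_by_size[k] = max(best_by_size.get(k, -np.inf), J(X, y, selected))
        # conditional exclusion: drop a feature only if the reduced set beats
        # the best subset of that size found so far
        while len(selected) > 2:
            worst = max(selected, key=lambda g: J(X, y, [h for h in selected if h != g]))
            reduced = [h for h in selected if h != worst]
            score = J(X, y, reduced)
            if score <= best_by_size.get(len(reduced), -np.inf):
                break
            selected = reduced
            remaining.append(worst)
            best_by_size[len(reduced)] = score
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 12))
y = (X[:, 1] - X[:, 4] > 0).astype(int)
print(sffs(X, y, loo_1nn, target_size=3))   # usually contains features 1 and 4
```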

Journal ArticleDOI
TL;DR: Four different non-linear techniques are studied: multidimensional scaling, Sammon's mapping, self-organizing maps and auto-associative feedforward networks, all presented in the same framework of optimization.

Proceedings Article
01 Dec 1998
TL;DR: This work proposes an alternative and novel technique that produces sparse representations constructed from sets of highly-related words that significantly improves retrieval performance, is efficient to compute and shares properties with the optimal linear projection operator and the independent components of documents.
Abstract: The task in text retrieval is to find the subset of a collection of documents relevant to a user's information request, usually expressed as a set of words. Classically, documents and queries are represented as vectors of word counts. In its simplest form, relevance is defined to be the dot product between a document and a query vector-a measure of the number of common terms. A central difficulty in text retrieval is that the presence or absence of a word is not sufficient to determine relevance to a query. Linear dimensionality reduction has been proposed as a technique for extracting underlying structure from the document collection. In some domains (such as vision) dimensionality reduction reduces computational complexity. In text retrieval it is more often used to improve retrieval performance. We propose an alternative and novel technique that produces sparse representations constructed from sets of highly-related words. Documents and queries are represented by their distance to these sets, and relevance is measured by the number of common clusters. This technique significantly improves retrieval performance, is efficient to compute and shares properties with the optimal linear projection operator and the independent components of documents.

Journal ArticleDOI
TL;DR: Two neural-net-based methods for structure preserving dimensionality reduction using Kohonen's self-organizing feature map (SOFM) and Sammon's method are proposed.
Abstract: We propose two neural net based methods for structure preserving dimensionality reduction. Method 1 selects a small representative sample and applies Sammon's method to project it. This projected data set is then used to train a multilayer perceptron (MLP). Method 2 uses Kohonen's self-organizing feature map to generate a small set of prototypes which is then projected by Sammon's method. This projected data set is then used to train an MLP. Both schemes are quite effective in terms of computation time and quality of output, and both outperform methods of Jain and Mao (1992, 1995) on the data sets tried.
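Sammon's stress and a plain gradient-descent minimisation of it, the projection step both schemes rely on, can be sketched as below; the follow-up MLP (or SOFM prototype) stage is omitted, and the step size and iteration count are illustrative.

```python
# Sammon's mapping sketch: minimise Sammon stress by gradient descent.
import numpy as np

def sammon(X, n_iter=500, lr=0.3, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    Dx = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    Dx[Dx == 0] = 1e-12
    c = Dx[np.triu_indices(n, 1)].sum()            # normalising constant
    Y = rng.normal(scale=1e-2, size=(n, 2))        # 2-D projection
    for _ in range(n_iter):
        Dy = np.linalg.norm(Y[:, None] - Y[None, :], axis=2)
        np.fill_diagonal(Dy, 1.0)                  # avoid division by zero
        W = (Dx - Dy) / (Dx * Dy)
        np.fill_diagonal(W, 0.0)
        grad = (-2.0 / c) * (W[:, :, None] * (Y[:, None] - Y[None, :])).sum(axis=1)
        Y -= lr * grad
    return Y

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
print(sammon(X).shape)                             # (60, 2) structure-preserving map
```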

15 Oct 1998
TL;DR: This work presents a hybrid neural network solution which is capable of rapid classification, requires only fast, approximate normalization and preprocessing, and consistently exhibits better classification performance than the eigenfaces approach on the database.
Abstract: Faces represent complex, multidimensional, meaningful visual stimuli and developing a computational model for face recognition is difficult (Turk and Pentland, 1991). We present a hybrid neural network solution which compares favorably with other methods. The system combines local image sampling, a self-organizing map neural network, and a convolutional neural network. The self-organizing map provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. The convolutional network extracts successively larger features in a hierarchical set of layers. We present results using the Karhunen-Loève transform in place of the self-organizing map, and a multilayer perceptron in place of the convolutional network. The Karhunen-Loève transform performs almost as well (5.3% error versus 3.8%). The multilayer perceptron performs very poorly (40% error versus 3.8%). The method is capable of rapid classification, requires only fast, approximate normalization and preprocessing, and consistently exhibits better classification performance than the eigenfaces approach (Turk and Pentland, 1991) on the database considered as the number of images per person in the training database is varied from 1 to 5. With 5 images per person the proposed method and eigenfaces result in 3.8% and 10.5% error respectively. The recognizer provides a measure of confidence in its output and classification error approaches zero when rejecting as few as 10% of the examples. We use a database of 400 images of 40 individuals which contains quite a high degree of variability in expression, pose, and facial details. We analyze computational complexity and discuss how new classes could be added to the trained recognizer.

Proceedings ArticleDOI
06 Jul 1998
TL;DR: In this paper, the authors presented a heuristic algorithm to solve the dimension reduction problem using subset selection as a matrix approximation problem and showed that the selected bands are contained in a space that is almost aligned with the first few principal components.
Abstract: Presents the formulation of the dimension reduction problem using subset selection as a matrix approximation problem. A heuristic algorithm to solve this problem is presented. Numerical results using LANDSAT and AVIRIS images show that the selected bands are contained in a space that is almost aligned with the first few principal components.

Proceedings ArticleDOI
29 Oct 1998
TL;DR: It is shown that feature sets based upon the short-time Fourier transform, the wavelet transform, and the wavelet packet transform provide an effective representation for classification, provided that they are subject to dimensionality reduction by principal components analysis.
Abstract: An accurate and computationally efficient means of classifying myoelectric signal (MES) patterns has been the subject of considerable research effort in recent years. Effective feature extraction is crucial to reliable classification and, in the quest to improve the accuracy of transient MES pattern classification, many forms of signal representation have been suggested. It is shown that feature sets based upon the short-time Fourier transform, the wavelet transform, and the wavelet packet transform provide an effective representation for classification, provided that they are subject to dimensionality reduction by principal components analysis.

Proceedings ArticleDOI
04 Oct 1998
TL;DR: A Bayesian framework for face recognition is introduced which unifies popular methods such as the eigenfaces and Fisherfaces and can generate two novel probabilistic reasoning models (PRM) with enhanced performance.
Abstract: This paper introduces a Bayesian framework for face recognition which unifies popular methods such as the eigenfaces and Fisherfaces and can generate two novel probabilistic reasoning models (PRM) with enhanced performance. The Bayesian framework first applies principal component analysis (PCA) for dimensionality reduction with the resulting image representation enjoying noise reduction and enhanced generalization abilities for classification tasks. Following data compression, the Bayes classifier which yields the minimum error when the underlying probability density functions (PDF) are known, carries out the recognition in the reduced PCA subspace using the maximum a posteriori (MAP) rule, which is the optimal criterion for classification because it measures class separability. The PRM models are described within this unified Bayesian framework and shown to yield better performance against both the eigenfaces and Fisherfaces methods.

Proceedings ArticleDOI
07 Dec 1998
TL;DR: This paper investigates the use of Fisher-Rao (1965) linear discriminant analysis (LDA) as a means of visual feature extraction for hidden Markov model based automatic speechreading, consistently outperforming principal component analysis and discrete wavelet transform based visual features.
Abstract: This paper investigates the use of Fisher-Rao (1965) linear discriminant analysis (LDA) as a means of visual feature extraction for hidden Markov model based automatic speechreading. For every video frame, a three-dimensional region of interest containing the speaker's mouth over a sequence of adjacent frames is lexicographically arranged into a data vector. Such vectors are then projected onto the space of the most discriminant "eigensequences", estimated by means of LDA on a training set of image sequence vectors, labeled from a set of a-priori chosen classes. The resulting projections, as well as their first and second derivatives over time, are used as features for automatic speechreading. The proposed method is applied to single-speaker, multi-speaker, and speaker-independent visual-only recognition tasks, consistently outperforming principal component analysis and discrete wavelet transform based visual features. Specific issues relevant to LDA are also discussed, namely, class selection, automatic data class labelling, and dimensionality reduction prior to LDA.

Proceedings ArticleDOI
TL;DR: An improved unbiased algorithm for determining principal curves in high dimensional spaces is presented, and a novel applications of principal curve to feature extraction and pattern classification--the Principal Curve Feature Extractor (PCFE) and the Principal Curve Classifier (PCC) are proposed.
Abstract: We present an improved unbiased algorithm for determining principal curves in high dimensional spaces, and then propose two novel applications of principal curve to feature extraction and pattern classification--the Principal Curve Feature Extractor (PCFE) and the Principal Curve Classifier (PCC). The PCFE extracts features from a subset of principal curves computed via the principal components of the input data. With its flexible partitioning choice and non- parametric nature, the PCFE is capable of modeling nonlinear data effectively. The PCC is a general non-parametric classification method that involves computing a principal curve template for each class during the training phase. In the test or application phase, an unlabeled data point is assigned the class label of the nearest principal curve template. PCC performs well for non-gaussian distributed data and data with low local intrinsic dimensionality. Experiments comparing the PCC to established classification methods are performed on selected benchmarks from the UC Irvine machine learning database and the PROBEN1 benchmark dataset, to highlight situations where PCC is advantageous for feature extraction, data characterization, and classification.

Journal ArticleDOI
TL;DR: A method based on linear discriminant analysis (LDA) is introduced that detects principal components which can be used for discrimination, leading to data sets of reduced dimensionality but similar classification accuracy.
Abstract: The study focuses on the problems of dimensionality reduction by means of principal component analysis (PCA) in the context of single-trial EEG data classification (i.e. discriminating between imagined left- and right-hand movement). The principal components with the highest variance, however, do not necessarily carry the greatest information to enable a discrimination between classes. An EEG data set is presented where principal components with high variance cannot be used for discrimination. In addition, a method based on linear discriminant analysis (LDA), is introduced that detects principal components which can be used for discrimination, leading to data sets of reduced dimensionality but similar classification accuracy.

Journal ArticleDOI
TL;DR: Two fabric classification methods, the neural network and dimensionality reduction, are proposed to automatically classify fabrics based on measured hand properties, which are independent and reinforce each other.
Abstract: Fabric classification plays an important role in the textile industry. In this paper, two fabric classification methods, the neural network and dimensionality reduction, are proposed to automatical...