
Showing papers on "Dimensionality reduction published in 1997"


Journal ArticleDOI
TL;DR: A hybrid neural network for human face recognition that compares favourably with other methods; the paper analyzes the computational complexity and discusses how new classes could be added to the trained recognizer.
Abstract: We present a hybrid neural network for human face recognition which compares favourably with other methods. The system combines local image sampling, a self-organizing map (SOM) neural network, and a convolutional neural network. The SOM provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides partial invariance to translation, rotation, scale, and deformation. The convolutional network extracts successively larger features in a hierarchical set of layers. We present results using the Karhunen-Loeve transform in place of the SOM, and a multilayer perceptron (MLP) in place of the convolutional network for comparison. We use a database of 400 images of 40 individuals which contains quite a high degree of variability in expression, pose, and facial details. We analyze the computational complexity and discuss how new classes could be added to the trained recognizer.
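
As a concrete illustration of the SOM stage, the following is a minimal numpy sketch of training a self-organizing map on flattened image patches; the 2-D map coordinate of each patch's best-matching unit serves as the reduced representation. It is a generic SOM, not the authors' implementation, and the grid size, learning-rate schedule, and patch dimensionality are arbitrary choices for the example.

```python
import numpy as np

def train_som(patches, grid=(8, 8), iters=5000, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal SOM: maps d-dimensional patches onto a 2-D grid of codebook vectors."""
    rng = np.random.default_rng(seed)
    n, d = patches.shape
    gx, gy = grid
    codebook = patches[rng.integers(0, n, gx * gy)].astype(float)     # initialize from random patches
    coords = np.array([(i, j) for i in range(gx) for j in range(gy)], dtype=float)
    for t in range(iters):
        x = patches[rng.integers(n)]
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))            # best-matching unit
        lr = lr0 * np.exp(-t / iters)                                 # decaying learning rate
        sigma = sigma0 * np.exp(-t / iters)                           # shrinking neighbourhood width
        h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        codebook += lr * h[:, None] * (x - codebook)
    return codebook, coords

# usage idea: quantize 5x5 image patches (flattened to 25-D) onto an 8x8 map, then feed
# the 2-D coordinates of each patch's best-matching unit to the downstream classifier.
```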

2,954 citations


Book ChapterDOI
08 Oct 1997
TL;DR: A new method for performing a nonlinear form of Principal Component Analysis by the use of integral operator kernel functions is proposed and experimental results on polynomial feature extraction for pattern recognition are presented.
Abstract: A new method for performing a nonlinear form of Principal Component Analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map; for instance, the space of all possible d-pixel products in images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.
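
For readers who want to see the kernel trick spelled out, here is a minimal numpy sketch of kernel PCA with a polynomial kernel: build the kernel matrix, centre it in feature space, eigendecompose, and project. The kernel degree and the toy data are placeholders, not values from the paper.

```python
import numpy as np

def kernel_pca(X, n_components=2, degree=3):
    """Kernel PCA with a polynomial kernel k(x, y) = (x.y + 1)^degree."""
    n = X.shape[0]
    K = (X @ X.T + 1.0) ** degree
    # centre the kernel matrix in feature space
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas = eigvecs[:, idx] / np.sqrt(eigvals[idx])   # normalize the expansion coefficients
    return Kc @ alphas                                 # nonlinear principal components of the training set

X = np.random.RandomState(0).randn(100, 5)
Z = kernel_pca(X, n_components=2)
```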

2,223 citations


Journal ArticleDOI
TL;DR: A self-organized neural network performing two tasks: vector quantization of the submanifold in the data set (input space) and nonlinear projection of these quantizing vectors toward an output space, providing a revealing unfolding of the submanifold.
Abstract: We present a new strategy called "curvilinear component analysis" (CCA) for dimensionality reduction and representation of multidimensional data sets. The principle of CCA is a self-organized neural network performing two tasks: vector quantization (VQ) of the submanifold in the data set (input space); and nonlinear projection (P) of these quantizing vectors toward an output space, providing a revealing unfolding of the submanifold. After learning, the network has the ability to continuously map any new point from one space into another: forward mapping of new points in the input space, or backward mapping of an arbitrary position in the output space.
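
The projection step of CCA is usually written as a stochastic update that matches output-space distances to input-space distances for pairs closer than a shrinking radius. A rough numpy sketch under that reading follows; the radius and learning-rate schedules are arbitrary, and this is not the authors' exact network.

```python
import numpy as np

def cca_project(X, dim=2, iters=3000, lr=0.3, seed=0):
    """Simplified CCA: pull/push output points so their distances to a random anchor
    match the input-space distances, ignoring pairs beyond a shrinking radius."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    DX = np.linalg.norm(X[:, None] - X[None, :], axis=2)   # input-space distances
    lam0 = DX.max()
    Y = rng.normal(scale=1e-2, size=(n, dim))              # random initial projection
    for t in range(iters):
        i = rng.integers(n)
        dy = np.linalg.norm(Y - Y[i], axis=1)
        dy[i] = 1.0                                        # avoid division by zero
        lam = lam0 * 0.01 ** (t / iters)                   # exponentially shrinking radius
        a = lr * (1.0 - t / iters)                         # decaying step size
        F = (dy <= lam).astype(float)                      # neighbourhood weighting F(Y_ij, lambda)
        step = a * F * (DX[i] - dy) / dy
        step[i] = 0.0
        Y += step[:, None] * (Y - Y[i])
    return Y
```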

721 citations


Journal ArticleDOI
TL;DR: A local linear approach to dimension reduction that provides accurate representations and is fast to compute is developed and it is shown that the local linear techniques outperform neural network implementations.
Abstract: Reducing or eliminating statistical redundancy between the components of high-dimensional vector data enables a lower-dimensional representation without significant loss of information. Recognizing the limitations of principal component analysis (PCA), researchers in the statistics and neural network communities have developed nonlinear extensions of PCA. This article develops a local linear approach to dimension reduction that provides accurate representations and is fast to compute. We exercise the algorithms on speech and image data, and compare performance with PCA and with neural network implementations of nonlinear PCA. We find that both nonlinear techniques can provide more accurate representations than PCA and show that the local linear techniques outperform neural network implementations.
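
One common way to realize "local linear" dimension reduction is to vector-quantize the data and fit a separate PCA in each cell. The numpy sketch below illustrates that construction; it is a generic cluster-then-PCA recipe, not necessarily the authors' exact algorithm.

```python
import numpy as np

def local_pca(X, n_clusters=8, n_components=2, iters=50, seed=0):
    """Partition the data with k-means, then fit a separate PCA in each cell.
    Assumes no cluster becomes empty (restart with another seed if it does)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(iters):                                   # plain k-means
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == k].mean(0) for k in range(n_clusters)])
    models = []
    for k in range(n_clusters):                              # local PCA per cell
        Xk = X[labels == k] - centers[k]
        _, _, Vt = np.linalg.svd(Xk, full_matrices=False)
        models.append((centers[k], Vt[:n_components]))       # (local mean, local basis)
    return models

def encode(x, models):
    """Project a point onto the basis of the nearest local model."""
    k = np.argmin([np.linalg.norm(x - mean) for mean, _ in models])
    mean, basis = models[k]
    return k, basis @ (x - mean)
```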

702 citations


Journal ArticleDOI
TL;DR: A deterministic annealing approach to pairwise clustering is described which shares the robustness properties of maximum entropy inference and the resulting Gibbs probability distributions are estimated by mean-field approximation.
Abstract: Partitioning a data set and extracting hidden structure from the data arises in different application areas of pattern recognition, speech and image processing. Pairwise data clustering is a combinatorial optimization method for data grouping which extracts hidden structure from proximity data. We describe a deterministic annealing approach to pairwise clustering which shares the robustness properties of maximum entropy inference. The resulting Gibbs probability distributions are estimated by mean-field approximation. A new structure-preserving algorithm to cluster dissimilarity data and to simultaneously embed these data in a Euclidean vector space is discussed which can be used for dimensionality reduction and data visualization. The suggested embedding algorithm which outperforms conventional approaches has been implemented to analyze dissimilarity data from protein analysis and from linguistics. The algorithm for pairwise data clustering is used to segment textured images.

524 citations


Proceedings Article
01 Dec 1997
TL;DR: The isometric feature mapping procedure, or isomap, is able to reliably recover low-dimensional nonlinear structure in realistic perceptual data sets, such as a manifold of face images, where conventional global mapping methods find only local minima.
Abstract: Nonlinear dimensionality reduction is formulated here as the problem of trying to find a Euclidean feature-space embedding of a set of observations that preserves as closely as possible their intrinsic metric structure - the distances between points on the observation manifold as measured along geodesic paths. Our isometric feature mapping procedure, or isomap, is able to reliably recover low-dimensional nonlinear structure in realistic perceptual data sets, such as a manifold of face images, where conventional global mapping methods find only local minima. The recovered map provides a canonical set of globally meaningful features, which allows perceptual transformations such as interpolation, extrapolation, and analogy - highly nonlinear transformations in the original observation space - to be computed with simple linear operations in feature space.
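
The three isomap stages described here (neighbourhood graph, graph geodesics, metric embedding) can be sketched compactly with numpy and scipy. The neighbourhood size and output dimension below are free parameters of the example, and the sketch assumes the neighbourhood graph is connected.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbors=8, n_components=2):
    """Sketch of isomap: kNN graph -> graph geodesics -> classical MDS."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    G = np.full((n, n), np.inf)                          # inf marks "no edge"
    nn = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        G[i, nn[i]] = D[i, nn[i]]
    G = np.minimum(G, G.T)                               # symmetrize the neighbourhood graph
    geo = shortest_path(G, method='D', directed=False)   # approximate geodesic distances
    # classical MDS on the squared geodesic distances
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (geo ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))
```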

304 citations


Proceedings Article
01 Jan 1997
TL;DR: This paper describes a feature subset selector that uses a correlation-based heuristic to evaluate the worth of features, and assesses its effectiveness with three common ML algorithms: a decision tree inducer, a naive Bayes classifier, and an instance-based learner.
Abstract: Recent work has shown that feature subset selection can have a positive effect on the performance of machine learning algorithms. Some algorithms can be slowed, or their performance degraded, by features that are irrelevant or redundant to the learning task. Feature subset selection, then, is a method for enhancing the performance of learning algorithms, reducing the hypothesis search space, and, in some cases, reducing the storage requirement. This paper describes a feature subset selector that uses a correlation-based heuristic to evaluate the worth of features, and assesses its effectiveness with three common ML algorithms: a decision tree inducer (C4.5), a naive Bayes classifier, and an instance-based learner (IB1). Experiments using a number of standard data sets drawn from real and artificial domains are presented. Feature subset selection gave significant improvement for all three algorithms, and C4.5 generated smaller decision trees.
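
A rough sketch of the correlation-based idea: score a candidate subset by average feature-class correlation penalized by average feature-feature intercorrelation, and grow the subset greedily. Pearson correlation is used below purely for simplicity; the paper's selector works with a discretized, information-theoretic correlation measure, so treat this as an approximation of the scheme.

```python
import numpy as np

def merit(X, y, subset):
    """CFS-style merit: k*r_cf / sqrt(k + k(k-1)*r_ff), using |Pearson r| as the correlation."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(X, y):
    """Greedy forward selection on the merit score (y assumed numerically coded)."""
    remaining, chosen, best = list(range(X.shape[1])), [], -np.inf
    while remaining:
        score, f = max((merit(X, y, chosen + [f]), f) for f in remaining)
        if score <= best:
            break
        best, chosen = score, chosen + [f]
        remaining.remove(f)
    return chosen
```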

286 citations


01 Jan 1997
TL;DR: This thesis proposes an alternate architecture that goes beyond the basilar-membrane model, with which auditory features can be computed in real time, and presents a unified framework for the problem of dimension reduction and HMM parameter estimation by modeling the original features with a reduced-rank HMM.
Abstract: Biologically motivated feature extraction algorithms have been found to provide significantly robust performance in speech recognition systems, in the presence of channel and noise degradation, when compared to standard features such as mel-cepstrum coefficients. However, auditory feature extraction is computationally expensive, and makes these features useless for real-time speech recognition systems. In this thesis, I investigate the use of low power techniques and custom analog VLSI for auditory feature extraction. I first investigated the basilar-membrane model and the hair-cell model chips that were designed by Liu (Liu, 1992). I performed speech recognition experiments to evaluate how well these chips would perform as a front-end to a speech recognizer. Based on the experience gained by these experiments, I propose an alternate architecture that goes beyond the basilar-membrane model, with which auditory features can be computed in real time. These chips have been designed and tested, and consume only a few milliwatts of power as compared to general-purpose digital machines that consume several watts. I have also investigated Linear Discriminant Analysis (LDA) for dimension reduction of auditory features. Researchers have used Fisher-Rao linear discriminant analysis (LDA) to reduce the feature dimension. They model the low-dimensional features obtained from LDA as the outputs of a Markov process with hidden states (HMM). I present a unified framework for the problem of dimension reduction and HMM parameter estimation by modeling the original features with a reduced-rank HMM. This re-formulation also leads to a generalization of LDA that is consistent with the heteroscedastic state models used in HMM, and gives better performance when tested on a digit recognition task.
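
The Fisher-Rao LDA step mentioned above reduces to a generalized eigenproblem between the between-class and within-class scatter matrices. A minimal sketch of that standard step follows (plain LDA, not the thesis's reduced-rank HMM generalization).

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(X, y, n_components):
    """Fisher-Rao LDA: directions maximizing between-class over within-class scatter."""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += 1e-6 * np.eye(d)                        # small ridge in case Sw is singular
    eigvals, eigvecs = eigh(Sb, Sw)               # generalized eigenproblem Sb v = lambda Sw v
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]
    return X @ W, W
```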

199 citations


Patent
31 Oct 1997
TL;DR: An improved multidimensional data indexing technique that generates compact indexes such that most or all of the index can reside in main memory at any time is presented in this article; the technique can be effective even in the presence of variables which are not highly correlated.
Abstract: An improved multidimensional data indexing technique that generates compact indexes such that most or all of the index can reside in main memory at any time. During the clustering and dimensionality reduction, clustering information and dimensionality reduction information are generated for use in a subsequent search phase. The indexing technique can be effective even in the presence of variables which are not highly correlated. Other features provide for efficiently performing exact and nearest neighbor searches using the clustering information and dimensionality reduction information. One example of the dimensionality reduction uses a singular value decomposition technique. The method can also be recursively applied to each of the reduced-dimensionality clusters. The dimensionality reduction also can be applied to the entire database as a first step of the index generation.
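
A toy sketch of the cluster-then-reduce idea in the claim: cluster the data, keep a low-rank SVD basis per cluster as the "dimensionality reduction information", and search only the cluster whose centre is nearest to the query. This illustrates the general technique only; it is not the patented index structure.

```python
import numpy as np

def build_index(X, n_clusters=16, n_dims=8, iters=30, seed=0):
    """Cluster the data, then keep a low-rank SVD basis and reduced coordinates per cluster."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(iters):                                   # plain k-means
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == k].mean(0) for k in range(n_clusters)])
    index = []
    for k in range(n_clusters):
        Xk = X[labels == k] - centers[k]
        _, _, Vt = np.linalg.svd(Xk, full_matrices=False)
        basis = Vt[:n_dims]                                  # dimensionality reduction information
        index.append((centers[k], basis, Xk @ basis.T, np.where(labels == k)[0]))
    return index

def approx_nearest(q, index):
    """Approximate nearest neighbour: search only the cluster closest to the query."""
    k = np.argmin([np.linalg.norm(q - c) for c, _, _, _ in index])
    c, basis, coords, ids = index[k]
    qc = (q - c) @ basis.T
    return ids[np.argmin(((coords - qc) ** 2).sum(axis=1))]
```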

185 citations


Journal ArticleDOI
TL;DR: It is demonstrated that the cross-spectral metric results in a low-dimensional detector which provides nearly optimal performance when the noise covariance is known, closely approximating the performance of the matched filter.
Abstract: This work extends the recently introduced cross-spectral metric for subspace selection and dimensionality reduction to partially adaptive space-time sensor array processing. A general methodology is developed for the analysis of reduced-dimension detection tests with known and unknown covariance. It is demonstrated that the cross-spectral metric results in a low-dimensional detector which provides nearly optimal performance when the noise covariance is known. It is also shown that this metric allows the dimensionality of the detector to be reduced below the dimension of the noise subspace eigenstructure without significant loss. This attribute provides robustness in the subspace selection process to achieve reduced-dimensional target detection. Finally, it is demonstrated that the cross-spectral subspace reduced-dimension detector can outperform the full-dimension detector when the noise covariance is unknown, closely approximating the performance of the matched filter.

181 citations


Proceedings ArticleDOI
03 Nov 1997
TL;DR: This paper proposes an entropy measure for ranking features, and conducts extensive experiments to show that the method is able to find the important features; it also compares well with a similar feature ranking method (Relief) that, unlike this method, requires class information.
Abstract: Dimensionality reduction is an important problem for efficient handling of large databases. Many feature selection methods exist for supervised data having class information. Little work has been done for dimensionality reduction of unsupervised data in which class information is not available. Principal component analysis (PCA) is often used. However, PCA creates new features. It is difficult to obtain intuitive understanding of the data using the new features only. We are concerned with the problem of determining and choosing the important original features for unsupervised data. Our method is based on the observation that removing an irrelevant feature from the feature set may not change the underlying concept of the data, but not so otherwise. We propose an entropy measure for ranking features, and conduct extensive experiments to show that our method is able to find the important features. Also it compares well with a similar feature ranking method (Relief) that requires class information unlike our method.
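
One common reading of such an entropy ranking is sketched below: similarities are derived from pairwise distances, an entropy is computed over the similarity matrix, and a feature is ranked by how much the entropy changes when it is left out. The similarity function and ranking rule here are assumptions for the illustration and may differ from the paper's exact definitions.

```python
import numpy as np

def dataset_entropy(X):
    """Entropy of pairwise similarities S_ij = exp(-alpha * d_ij)."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    d = D[np.triu_indices(len(X), k=1)]
    alpha = -np.log(0.5) / d.mean()                    # similarity 0.5 at the average distance
    S = np.clip(np.exp(-alpha * d), 1e-12, 1 - 1e-12)
    return -np.sum(S * np.log(S) + (1 - S) * np.log(1 - S))

def rank_features(X):
    """Rank features by the entropy change caused by removing each one (larger change = more important)."""
    base = dataset_entropy(X)
    deltas = [abs(dataset_entropy(np.delete(X, f, axis=1)) - base) for f in range(X.shape[1])]
    return np.argsort(deltas)[::-1]                    # most important first
```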

Proceedings Article
01 Jan 1997
TL;DR: This work studies three mistake-driven learning algorithms for a typical task of this nature -- text categorization and presents an algorithm, a variation of Littlestone's Winnow, which performs significantly better than any other algorithm tested on this task using a similar feature set.
Abstract: Learning problems in the text processing domain often map the text to a space whose dimensions are the measured features of the text, e.g., its words. Three characteristic properties of this domain are (a) very high dimensionality, (b) both the learned concepts and the instances reside very sparsely in the feature space, and (c) a high variation in the number of active features in an instance. In this work we study three mistake-driven learning algorithms for a typical task of this nature -- text categorization. We argue that these algorithms -- which categorize documents by learning a linear separator in the feature space -- have a few properties that make them ideal for this domain. We then show that a quantum leap in performance is achieved when we further modify the algorithms to better address some of the specific characteristics of the domain. In particular, we demonstrate (1) how variation in document length can be tolerated by either normalizing feature weights or by using negative weights, (2) the positive effect of applying a threshold range in training, (3) alternatives in considering feature frequency, and (4) the benefits of discarding features while training. Overall, we present an algorithm, a variation of Littlestone's Winnow, which performs significantly better than any other algorithm tested on this task using a similar feature set.
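
For reference, the core of the Winnow family that the paper builds on is a multiplicative update over the active features of each document. The sketch below is plain positive Winnow with illustrative defaults for the promotion factor and threshold, not the paper's modified variant.

```python
import numpy as np

def train_winnow(docs, labels, n_features, alpha=1.5, epochs=5):
    """Positive Winnow: multiplicative updates on the active features of each document.
    docs: list of integer arrays of active feature indices; labels: +1 / -1."""
    w = np.ones(n_features)
    theta = float(n_features)                 # classic fixed threshold
    for _ in range(epochs):
        for idx, y in zip(docs, labels):
            pred = 1 if w[idx].sum() >= theta else -1
            if pred != y:
                # promote active features on false negatives, demote on false positives
                w[idx] *= alpha if y == 1 else 1.0 / alpha
    return w, theta

def predict(w, theta, idx):
    return 1 if w[idx].sum() >= theta else -1
```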

Proceedings ArticleDOI
13 Apr 1997
TL;DR: This paper presents a genetic algorithm for feature selection which improves previous results presented in the literature for genetic-based feature selection; it is independent of a specific learning algorithm and requires less CPU time to reach a relevant subset of features.
Abstract: The goal of the feature selection process is, given a dataset described by n attributes (features), to find the minimum number m of relevant attributes which describe the data as well as the original set of attributes do. Genetic algorithms have been used to implement feature selection algorithms. Previous algorithms presented in the literature used the predictive accuracy of a specific learning algorithm as the fitness function to maximize over the space of possible feature subsets. Such an approach to feature selection requires a large amount of CPU time to reach a good solution on large datasets. This paper presents a genetic algorithm for feature selection which improves previous results presented in the literature for genetic-based feature selection. It is independent of a specific learning algorithm and requires less CPU time to reach a relevant subset of features. Reported experiments show that the proposed algorithm is at least ten times faster than a standard genetic algorithm for feature selection without a loss of predictive accuracy when a learning algorithm is applied to reduced data.

Journal ArticleDOI
TL;DR: Experimental results show that state-dependent transformation on mel-warped DFT features is superior in performance to the mel-frequency cepstral coefficients (MFCC's); an error rate reduction of 15% is obtained on a standard 39-class TIMIT phone classification task, in comparison with the conventional MCE-trained HMM using MFCC's that have not been subject to optimization during training.
Abstract: In the study reported in this paper, we investigate interactions of front-end feature extraction and back-end classification techniques in hidden Markov model-based (HMM-based) speech recognition. The proposed model focuses on dimensionality reduction of the mel-warped discrete Fourier transform (DFT) feature space subject to maximal preservation of speech classification information, and aims at finding an optimal linear transformation on the mel-warped DFT according to the minimum classification error (MCE) criterion. This linear transformation, along with the HMM parameters, are automatically trained using the gradient descent method to minimize a measure of overall empirical error counts. A further generalization of the model allows integration of the discriminatively derived state-dependent transformation with the construction of dynamic feature parameters. Experimental results show that state-dependent transformation on mel-warped DFT features is superior in performance to the mel-frequency cepstral coefficients (MFCC's). An error rate reduction of 15% is obtained on a standard 39-class TIMIT phone classification task, in comparison with the conventional MCE-trained HMM using MFCC's that have not been subject to optimization during training.

Journal ArticleDOI
TL;DR: Four PCA algorithms, namely NIPALS, the power method (POWER), singular value decomposition (SVD) and eigenvalue decomposition (EVD), and their kernel versions are systematically applied to three NIR data sets from the pharmaceutical industry.
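
As a pointer to what NIPALS computes, here is a minimal numpy sketch of NIPALS-style PCA: extract one score/loading pair at a time by alternating updates, then deflate the data matrix. The convergence tolerance and iteration cap are arbitrary choices for the example.

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-8, max_iter=500):
    """NIPALS PCA: iteratively extract score/loading pairs and deflate the data matrix."""
    X = X - X.mean(axis=0)
    scores, loadings = [], []
    for _ in range(n_components):
        t = X[:, np.argmax(X.var(axis=0))].copy()       # start from the highest-variance column
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)                       # loading estimate
            p /= np.linalg.norm(p)
            t_new = X @ p                               # score estimate
            if np.linalg.norm(t_new - t) < tol:
                t = t_new
                break
            t = t_new
        X = X - np.outer(t, p)                          # deflate
        scores.append(t)
        loadings.append(p)
    return np.array(scores).T, np.array(loadings).T
```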

Proceedings Article
01 Dec 1997
TL;DR: This paper examines several techniques for local dimensionality reduction in the context of locally weighted linear regression and finds that locally weighted partial least squares regression offers the best average results, thus outperforming even factor analysis, the theoretically most appealing of the candidate techniques.
Abstract: If globally high dimensional data has locally only low dimensional distributions, it is advantageous to perform a local dimensionality reduction before further processing the data. In this paper we examine several techniques for local dimensionality reduction in the context of locally weighted linear regression. As possible candidates, we derive local versions of factor analysis regression, principal component regression, principal component regression on joint distributions, and partial least squares regression. After outlining the statistical bases of these methods, we perform Monte Carlo simulations to evaluate their robustness with respect to violations of their statistical assumptions. One surprising outcome is that locally weighted partial least squares regression offers the best average results, thus outperforming even factor analysis, the theoretically most appealing of our candidate techniques.
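
To make the setting concrete, the sketch below performs locally weighted regression with a local dimensionality reduction at the query point. Plain weighted PCA stands in for the factor-analysis and partial-least-squares variants the paper actually compares, so this illustrates the setting rather than the winning method.

```python
import numpy as np

def lwr_predict(Xtr, ytr, xq, bandwidth=1.0, n_dims=2, ridge=1e-6):
    """Locally weighted regression: weight training points by distance to the query,
    reduce dimension locally (weighted PCA here), then fit a weighted linear model."""
    w = np.exp(-np.sum((Xtr - xq) ** 2, axis=1) / (2 * bandwidth ** 2))
    mean = (w[:, None] * Xtr).sum(0) / w.sum()
    Xc = Xtr - mean
    # local dimensionality reduction: principal directions of the weighted data
    _, _, Vt = np.linalg.svd(np.sqrt(w)[:, None] * Xc, full_matrices=False)
    P = Vt[:n_dims]
    Z = np.hstack([Xc @ P.T, np.ones((len(Xc), 1))])     # reduced inputs plus intercept
    W = np.diag(w)
    beta = np.linalg.solve(Z.T @ W @ Z + ridge * np.eye(Z.shape[1]), Z.T @ W @ ytr)
    zq = np.hstack([(xq - mean) @ P.T, 1.0])
    return zq @ beta
```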

01 Jan 1997
TL;DR: The masking GA/knn feature selection method can efficiently examine noisy, complex, and high-dimensionality datasets to find combinations of features which classify the data more accurately and result in equivalent or better classification accuracy using fewer features.
Abstract: Statistical pattern recognition techniques classify objects in terms of a representative set of features. The selection of features to measure and include can have a significant effect on the cost and accuracy of an automated classifier. Our previous research has shown that a hybrid between a k-nearest-neighbors (knn) classifier and a genetic algorithm (GA) can reduce the size of the feature set used by a classifier, while simultaneously weighting the remaining features to allow greater classification accuracy. Here we describe an extension to this approach which further enhances feature selection through the simultaneous optimization of feature weights and selection of key features by including a masking vector on the GA chromosome. We present the results of our masking GA/knn feature selection method on two important problems from biochemistry and medicine: identification of functional water molecules bound to protein surfaces, and diagnosis of thyroid deficiency. By allowing the GA to explore the effect of eliminating a feature from the classification without losing weight knowledge learned about the feature, the masking GA/knn can efficiently examine noisy, complex, and high-dimensionality datasets to find combinations of features which classify the data more accurately. In both biomedical applications, this technique resulted in equivalent or better classification accuracy using fewer features.
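
A rough sketch of the chromosome encoding described above, real-valued feature weights plus a binary mask, scored here with leave-one-out 1-NN accuracy. The GA operators, population size, and mutation rates below are placeholders, not the authors' configuration.

```python
import numpy as np

def knn_accuracy(X, y, weights, mask):
    """Leave-one-out 1-NN accuracy with features scaled by weight * mask (X, y numpy arrays)."""
    Z = X * (weights * mask)
    D = np.linalg.norm(Z[:, None] - Z[None, :], axis=2)
    np.fill_diagonal(D, np.inf)
    return np.mean(y[np.argmin(D, axis=1)] == y)

def masking_ga(X, y, pop=30, gens=40, seed=0):
    """Toy GA over chromosomes (weights, mask); keeps the best half, mutates copies into the rest."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.random((pop, d))                         # weight part of each chromosome
    M = rng.integers(0, 2, (pop, d))                 # mask part of each chromosome
    for _ in range(gens):
        fit = np.array([knn_accuracy(X, y, W[i], M[i]) for i in range(pop)])
        order = np.argsort(fit)[::-1]
        W, M = W[order], M[order]
        for i in range(pop // 2, pop):               # replace worst half with mutated elites
            j = rng.integers(pop // 2)
            W[i] = np.clip(W[j] + rng.normal(scale=0.1, size=d), 0, 1)
            M[i] = np.where(rng.random(d) < 0.05, 1 - M[j], M[j])
    best = np.argmax([knn_accuracy(X, y, W[i], M[i]) for i in range(pop)])
    return W[best], M[best]
```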

Journal ArticleDOI
TL;DR: Studies with a wide spectrum of synthetic data sets and a real data set indicate that the discrimination quality of these criteria can be improved by the proposed method, called removal of classification structure.
Abstract: A new method for two-class linear discriminant analysis, called "removal of classification structure", is proposed. Its novelty lies in the transformation of the data along an identified discriminant direction into data without discriminant information and iteration to obtain the next discriminant direction. It is free to search for discriminant directions oblique to each other and ensures that the informative directions already found will not be chosen again at a later stage. The efficacy of the method is examined for two discriminant criteria. Studies with a wide spectrum of synthetic data sets and a real data set indicate that the discrimination quality of these criteria can be improved by the proposed method.

Proceedings ArticleDOI
26 Oct 1997
TL;DR: A variant of the non-linear PCA algorithm described by Webb is investigated and its usefulness in the database retrieval problem is investigated; in an experiment using an aerial photo database, the feature vector length was reduced by a factor of 10 without significantly reducing the retrieval performance.
Abstract: There has been much interest recently in image content based retrieval, with applications to digital libraries and image database accessing. One approach to this problem is to base retrieval from the database upon feature vectors which characterize the image texture. Since feature vectors are often high dimensional, multi-dimensional scaling, or non-linear principal components analysis (PCA) may be useful in reducing feature vector size, and therefore computation time. We have investigated a variant of the non-linear PCA algorithm described by Webb (see Pattern Recognition, vol.28, no.6, p.753-9, 1995) and its usefulness in the database retrieval problem. The results are quite impressive; in an experiment using an aerial photo database, the feature vector length was reduced by a factor of 10 without significantly reducing the retrieval performance.

Proceedings ArticleDOI
04 Nov 1997
TL;DR: A multivariate measure of feature consistency is described, together with a feature selection algorithm based on Johnson's (1974) algorithm for set covering, which outperforms earlier methods using a similar amount of time.
Abstract: In pattern classification, features are used to define classes. Feature selection is a preprocessing process that searches for an "optimal" subset of features. The class separability is normally used as the basic feature selection criterion. Instead of maximizing the class separability, as in the literature, this work adopts a criterion aiming to maintain the discriminating power of the data describing its classes. In other words, the problem is formalized as that of finding the smallest set of features that is "consistent" in describing classes. We describe a multivariate measure of feature consistency. The new feature selection algorithm is based on Johnson's (1974) algorithm for set covering. Johnson's analysis implies that this algorithm runs in polynomial time, and outputs a consistent feature set whose size is within a log factor of the best possible. Our experiments show that its performance in practice is much better than this, and that it outperforms earlier methods using a similar amount of time.
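
The set-covering view can be made concrete as follows: every pair of differently labelled instances is an element to cover, a feature covers a pair if it distinguishes the two instances, and features are chosen greedily until all pairs are covered. A small sketch of that reduction for discrete-valued features (a Johnson-style greedy cover, not the authors' exact implementation):

```python
import numpy as np

def consistency_select(X, y):
    """Greedy set cover: choose features until every pair of differently labelled
    instances differs on at least one chosen feature."""
    n, d = X.shape
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if y[i] != y[j]]
    uncovered = set(range(len(pairs)))
    chosen = []
    while uncovered:
        # feature that distinguishes the most still-uncovered pairs
        gains = [sum(1 for p in uncovered if X[pairs[p][0], f] != X[pairs[p][1], f])
                 for f in range(d)]
        f = int(np.argmax(gains))
        if gains[f] == 0:                           # the data itself is inconsistent; stop
            break
        chosen.append(f)
        uncovered = {p for p in uncovered if X[pairs[p][0], f] == X[pairs[p][1], f]}
    return chosen
```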

Proceedings ArticleDOI
10 Jun 1997
TL;DR: A learning algorithm, locally adaptive subspace regression, is derived that combines a dynamically growing local dimensionality reduction as a preprocessing step with a nonparametric learning technique, locally weighted regression, exploiting the observation that data distributions from physical movement systems are locally low dimensional and dense.
Abstract: Incremental learning of sensorimotor transformations in high dimensional spaces is one of the basic prerequisites for the success of autonomous robot devices as well as biological movement systems. So far, due to sparsity of data in high dimensional spaces, learning in such settings requires a significant amount of prior knowledge about the learning task, usually provided by a human expert. In this paper we suggest a partial revision of this view. Based on empirical studies, it can be observed that, despite being globally high dimensional and sparse, data distributions from physical movement systems are locally low dimensional and dense. Under this assumption, we derive a learning algorithm, locally adaptive subspace regression, that exploits this property by combining a dynamically growing local dimensionality reduction as a preprocessing step with a nonparametric learning technique, locally weighted regression. The usefulness of the algorithm and the validity of its assumptions are illustrated for a synthetic data set and data of the inverse dynamics of an actual 7 degree-of-freedom anthropomorphic robot arm.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a discriminative subspace model, DSM(q), which combines the ideas of dimension reduction and constraints on the parameter space, thus substantially reducing the number of parameters to be estimated.
Abstract: For a p-variate normal random vector measured in two populations, we propose a method of discrimination under the constraint that all differences between the two populations occur in a subspace of dimension q < p. This method of classification is based on the discrimination subspace model, denoted by DSM(q), and is intermediate between linear and quadratic discrimination. It combines the ideas of dimension reduction and constraints on the parameter space, thus substantially reducing the number of parameters to be estimated. The maximum likelihood estimators of the model are presented, and the performance of DSM(q) versus quadratic and linear discrimination is assessed via simulation. It is generally shown that discrimination based on DSM(q) consistently yields noticeably lower expected actual error rates relative to the traditional methods. The method is illustrated with a real data example and is compared to linear and quadratic discrimination using a leave-one-out method. The example confirms t...

Patent
Mazin G. Rahim, Lawrence K. Saul
17 Nov 1997
TL;DR: In this article, factor analysis is used to model acoustic correlation in automatic speech recognition by introducing a small number of parameters to model the covariance structure of a speech signal, which are estimated by an Expectation Maximization (EM) technique that can be embedded in the training procedures for the HMMs, and then further adjusted using Minimum Classification Error (MCE) training, which demonstrates better discrimination and produces more accurate recognition models.
Abstract: Hidden Markov models (HMMs) rely on high-dimensional feature vectors to summarize the short-time properties of speech; correlations between features can arise when the speech signal is non-stationary or corrupted by noise. These correlations are modeled using factor analysis, a statistical method for dimensionality reduction. Factor analysis is used to model acoustic correlation in automatic speech recognition by introducing a small number of parameters to model the covariance structure of a speech signal. The parameters are estimated by an Expectation Maximization (EM) technique that can be embedded in the training procedures for the HMMs, and then further adjusted using Minimum Classification Error (MCE) training, which demonstrates better discrimination and produces more accurate recognition models.
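
The dimensionality-reduction component here is ordinary factor analysis of the feature covariance: each observation is modelled as x = Lambda z + eps with a low-dimensional latent z and diagonal noise. A compact EM sketch for that model follows (standard factor analysis only, without the HMM embedding or the MCE adjustment described in the patent).

```python
import numpy as np

def factor_analysis(X, n_factors, iters=100, seed=0):
    """EM for the factor analysis model x = Lambda z + eps, with eps ~ N(0, diag(Psi))."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)
    n, d = X.shape
    S = X.T @ X / n                                   # sample covariance
    L = rng.normal(scale=0.1, size=(d, n_factors))
    Psi = np.diag(S).copy()
    for _ in range(iters):
        # E-step: posterior statistics of the latent factors
        G = np.linalg.inv(np.eye(n_factors) + (L.T / Psi) @ L)   # posterior covariance of z
        B = (G @ L.T) / Psi                                      # E[z|x] = B x
        Ez = X @ B.T
        Ezz = n * G + Ez.T @ Ez
        # M-step: update loadings and the diagonal noise
        L = (X.T @ Ez) @ np.linalg.inv(Ezz)
        Psi = np.maximum(np.diag(S - (L @ (Ez.T @ X)) / n), 1e-6)
    return L, Psi
```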

Proceedings ArticleDOI
01 May 1997
TL;DR: In this paper, an intermediate feature representation based on B-rep entities will be addressed for a feature conversion architecture and a constraint-based feature extraction method is advocated for the feature conversion problem.
Abstract: A feature-based CAD system must support multiple feature representations in the concurrent engineering environment in which multiple engineering groups work on the same product simultaneously. Users must be able to define customized features interactively, since different users need different feature sets. On the other hand, to share high-level information stored in features among different engineering groups, the features must be standardized to be compatible among highly customized feature sets. To resolve these two conflicting requirements, a feature conversion system architecture including an interactive feature extraction module has been developed and implemented. Sharing the same basis of fundamental features, multiple feature representations can be maintained in the system. In this paper, an intermediate feature representation based on B-rep entities will be addressed for a feature conversion architecture. For the one-to-one mapping cases, a feature shape code matching mechanism is used, and a constraint-based feature extraction method is advocated for the feature conversion problem.

Proceedings ArticleDOI
23 Mar 1997
TL;DR: A high-performance neural network that in addition to predicting stock market direction, allows the user to visualize the relationship between current conditions and previous conditions that led to similar predictions.
Abstract: When a neural network makes a financial prediction, the user may benefit from knowing which previous time periods are illustrative of the current time period. The authors describe a high-performance neural network that in addition to predicting stock market direction, allows the user to visualize the relationship between current conditions and previous conditions that led to similar predictions. Visualization is accomplished by forming a gated multi-expert network using funnel-shaped multilayer dimensionality reduction networks. The neck of the funnel is a two-neuron layer that displays the training data and the decision boundaries in a two-dimensional space. This architecture facilitates a) interactive design of the decision functions and b) explanation of the relevance of past decisions from the training set to the current decision. They describe a stock market prediction system whose design incorporates a visual neural network for prediction, wavelet transforms and tapped delay lines for feature extraction, and a genetic algorithm for feature selection. This system shows that the visual neural network provides the low error rates (i.e., accurate predictions) of multi-expert networks along with the visual explanatory power of nonlinear dimensionality reduction.

Journal ArticleDOI
TL;DR: For a projection pursuit regression type function with smooth functional components that are either additive or multiplicative, in the presence of or without interactions, the rates of convergence to the true parameter depend on Kolmogorov's entropy of the assumed model, as discussed by the authors.
Abstract: $L_1$-optimal minimum distance estimators are provided for a projection pursuit regression type function with smooth functional components that are either additive or multiplicative, in the presence of or without interactions. The obtained rates of convergence of the estimate to the true parameter depend on Kolmogorov's entropy of the assumed model and confirm Stone's heuristic dimensionality reduction principle. Rates of convergence are also obtained for the error in estimating the derivatives of a regression type function.

Journal ArticleDOI
TL;DR: A new approach to reduced-dimension mapping of multi-dimensional pattern data is described which consists of learning a mapping from the original pattern space to a reduced-dimension space in an unsupervised non-linear manner but with the constraint that the overall variance in the representation of the data be conserved.

01 Jan 1997
TL;DR: This paper aims to motivate research in this area and to present some results on windowing techniques, which have so far been neglected in ILP research.
Abstract: The recent rise of Knowledge Discovery in Databases (KDD) has underlined the need for machine learning algorithms to be able to tackle large-scale applications that are currently beyond their scope. One way to address this problem is to use techniques for reducing the dimensionality of the learning problem by reducing the hypothesis space and/or reducing the example space. While research in machine learning has devoted considerable attention to such techniques, they have so far been neglected in ILP research. The purpose of this paper is to motivate research in this area and to present some results on windowing techniques.
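
A generic sketch of the windowing loop referred to above: train on a small window of examples, add a batch of misclassified examples, and retrain until the learner is consistent with the rest of the data. The decision-tree learner and the window sizes are illustrative stand-ins, not tied to the paper's ILP setting.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def windowed_training(X, y, initial=200, max_add=100, max_rounds=10, seed=0):
    """Windowing: learn from a subset, repeatedly add misclassified examples and retrain."""
    rng = np.random.default_rng(seed)
    window = list(rng.choice(len(X), size=min(initial, len(X)), replace=False))
    clf = DecisionTreeClassifier(random_state=0).fit(X[window], y[window])
    for _ in range(max_rounds):
        outside = np.setdiff1d(np.arange(len(X)), window)
        if len(outside) == 0:
            break
        wrong = outside[clf.predict(X[outside]) != y[outside]]
        if len(wrong) == 0:                      # consistent with all remaining examples
            break
        window += list(wrong[:max_add])          # grow the window with a batch of errors
        clf = DecisionTreeClassifier(random_state=0).fit(X[window], y[window])
    return clf, window
```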

Proceedings ArticleDOI
09 Jun 1997
TL;DR: This work compares neural networks and statistical methods used to identify birds by their songs using backpropagation learning in two-layer perceptrons, as well as methods from multivariate statistics including quadratic discriminant analysis.
Abstract: We compare neural networks and statistical methods used to identify birds by their songs. Six birds native to Manitoba were chosen which exhibited overlapping characteristics in terms of frequency content, song components and length of songs. Songs from multiple individuals in each species were employed. These songs were analyzed using backpropagation learning in two-layer perceptrons, as well as methods from multivariate statistics including quadratic discriminant analysis. Preprocessing methods included linear predictive coding and windowed Fourier transforms. Generalization performance ranged from 82% to 93% correct identification, with the lower figures corresponding to smaller networks that employed more preprocessing for dimensionality reduction. Computational requirements were significantly reduced in the latter case.

Proceedings ArticleDOI
21 Jul 1997
TL;DR: The purpose of this paper is to explain how the dimensionality of the ICA-model can algebraically be reduced to the true number of sources in higher-order-only schemes.
Abstract: Most algebraic methods for independent component analysis (ICA) consist of a second-order and a higher-order stage. The former can be considered as a classical principal component analysis (PCA), with a three-fold goal: (a) reduction of the parameter set of unknowns to the manifold of orthogonal matrices, (b) standardization of the unknown source signals to mutually uncorrelated unit-variance signals, and (c) determination of the number of sources. In the higher-order stage the remaining unknown orthogonal factor is determined by imposing statistical independence on the source estimates. Like all correlation-based techniques, this set-up has the disadvantage that it is affected by additive Gaussian noise. However it is possible to solve the problem, in a way that is conceptually blind to additive Gaussian noise, by resorting only to higher-order cumulants. The purpose of this paper is to explain how the dimensionality of the ICA-model can algebraically be reduced to the true number of sources in higher-order-only schemes.