
Showing papers on "Dimensionality reduction published in 1997"


Journal ArticleDOI
TL;DR: A hybrid neural network for human face recognition that compares favourably with other methods; the paper analyzes the computational complexity and discusses how new classes could be added to the trained recognizer.
Abstract: We present a hybrid neural network for human face recognition which compares favourably with other methods. The system combines local image sampling, a self-organizing map (SOM) neural network, and a convolutional neural network. The SOM provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides partial invariance to translation, rotation, scale, and deformation. The convolutional network extracts successively larger features in a hierarchical set of layers. We present results using the Karhunen-Loeve transform in place of the SOM, and a multilayer perceptron (MLP) in place of the convolutional network for comparison. We use a database of 400 images of 40 individuals which contains quite a high degree of variability in expression, pose, and facial details. We analyze the computational complexity and discuss how new classes could be added to the trained recognizer.
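
As a concrete illustration of the SOM stage, the following is a minimal numpy sketch of training a self-organizing map on flattened image patches; the 2-D map coordinate of each patch's best-matching unit serves as the reduced representation. It is a generic SOM, not the authors' implementation, and the grid size, learning-rate schedule, and patch dimensionality are arbitrary choices for the example.

```python
import numpy as np

def train_som(patches, grid=(8, 8), iters=5000, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal SOM: maps d-dimensional patches onto a 2-D grid of codebook vectors."""
    rng = np.random.default_rng(seed)
    n, d = patches.shape
    gx, gy = grid
    codebook = patches[rng.integers(0, n, gx * gy)].astype(float)     # initialize from random patches
    coords = np.array([(i, j) for i in range(gx) for j in range(gy)], dtype=float)
    for t in range(iters):
        x = patches[rng.integers(n)]
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))            # best-matching unit
        lr = lr0 * np.exp(-t / iters)                                 # decaying learning rate
        sigma = sigma0 * np.exp(-t / iters)                           # shrinking neighbourhood width
        h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        codebook += lr * h[:, None] * (x - codebook)
    return codebook, coords

# usage idea: quantize 5x5 image patches (flattened to 25-D) onto an 8x8 map, then feed
# the 2-D coordinates of each patch's best-matching unit to the downstream classifier.
```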

2,954 citations


Book ChapterDOI
08 Oct 1997
TL;DR: A new method for performing a nonlinear form of Principal Component Analysis by the use of integral operator kernel functions is proposed and experimental results on polynomial feature extraction for pattern recognition are presented.
Abstract: A new method for performing a nonlinear form of Principal Component Analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map; for instance, the space of all possible d-pixel products in images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.
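
For readers who want to see the kernel trick spelled out, here is a minimal numpy sketch of kernel PCA with a polynomial kernel: build the kernel matrix, centre it in feature space, eigendecompose, and project. The kernel degree and the toy data are placeholders, not values from the paper.

```python
import numpy as np

def kernel_pca(X, n_components=2, degree=3):
    """Kernel PCA with a polynomial kernel k(x, y) = (x.y + 1)^degree."""
    n = X.shape[0]
    K = (X @ X.T + 1.0) ** degree
    # centre the kernel matrix in feature space
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas = eigvecs[:, idx] / np.sqrt(eigvals[idx])   # normalize the expansion coefficients
    return Kc @ alphas                                 # nonlinear principal components of the training set

X = np.random.RandomState(0).randn(100, 5)
Z = kernel_pca(X, n_components=2)
```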

2,223 citations


Journal ArticleDOI
TL;DR: A self-organized neural network performing two tasks: vector quantization of the submanifold in the data set (input space) and nonlinear projection of these quantizing vectors toward an output space, providing a revealing unfolding of the submanifold.
Abstract: We present a new strategy called "curvilinear component analysis" (CCA) for dimensionality reduction and representation of multidimensional data sets. The principle of CCA is a self-organized neural network performing two tasks: vector quantization (VQ) of the submanifold in the data set (input space); and nonlinear projection (P) of these quantizing vectors toward an output space, providing a revealing unfolding of the submanifold. After learning, the network has the ability to continuously map any new point from one space into another: forward mapping of new points in the input space, or backward mapping of an arbitrary position in the output space.
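
The projection step of CCA is usually written as a stochastic update that matches output-space distances to input-space distances for pairs closer than a shrinking radius. A rough numpy sketch under that reading follows; the radius and learning-rate schedules are arbitrary, and this is not the authors' exact network.

```python
import numpy as np

def cca_project(X, dim=2, iters=3000, lr=0.3, seed=0):
    """Simplified CCA: pull/push output points so their distances to a random anchor
    match the input-space distances, ignoring pairs beyond a shrinking radius."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    DX = np.linalg.norm(X[:, None] - X[None, :], axis=2)   # input-space distances
    lam0 = DX.max()
    Y = rng.normal(scale=1e-2, size=(n, dim))              # random initial projection
    for t in range(iters):
        i = rng.integers(n)
        dy = np.linalg.norm(Y - Y[i], axis=1)
        dy[i] = 1.0                                        # avoid division by zero
        lam = lam0 * 0.01 ** (t / iters)                   # exponentially shrinking radius
        a = lr * (1.0 - t / iters)                         # decaying step size
        F = (dy <= lam).astype(float)                      # neighbourhood weighting F(Y_ij, lambda)
        step = a * F * (DX[i] - dy) / dy
        step[i] = 0.0
        Y += step[:, None] * (Y - Y[i])
    return Y
```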

721 citations


Journal ArticleDOI
TL;DR: A local linear approach to dimension reduction that provides accurate representations and is fast to compute is developed and it is shown that the local linear techniques outperform neural network implementations.
Abstract: Reducing or eliminating statistical redundancy between the components of high-dimensional vector data enables a lower-dimensional representation without significant loss of information. Recognizing the limitations of principal component analysis (PCA), researchers in the statistics and neural network communities have developed nonlinear extensions of PCA. This article develops a local linear approach to dimension reduction that provides accurate representations and is fast to compute. We exercise the algorithms on speech and image data, and compare performance with PCA and with neural network implementations of nonlinear PCA. We find that both nonlinear techniques can provide more accurate representations than PCA and show that the local linear techniques outperform neural network implementations.
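
One common way to realize "local linear" dimension reduction is to vector-quantize the data and fit a separate PCA in each cell. The numpy sketch below illustrates that construction; it is a generic cluster-then-PCA recipe, not necessarily the authors' exact algorithm.

```python
import numpy as np

def local_pca(X, n_clusters=8, n_components=2, iters=50, seed=0):
    """Partition the data with k-means, then fit a separate PCA in each cell.
    Assumes no cluster becomes empty (restart with another seed if it does)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(iters):                                   # plain k-means
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == k].mean(0) for k in range(n_clusters)])
    models = []
    for k in range(n_clusters):                              # local PCA per cell
        Xk = X[labels == k] - centers[k]
        _, _, Vt = np.linalg.svd(Xk, full_matrices=False)
        models.append((centers[k], Vt[:n_components]))       # (local mean, local basis)
    return models

def encode(x, models):
    """Project a point onto the basis of the nearest local model."""
    k = np.argmin([np.linalg.norm(x - mean) for mean, _ in models])
    mean, basis = models[k]
    return k, basis @ (x - mean)
```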

702 citations


Journal ArticleDOI
TL;DR: A deterministic annealing approach to pairwise clustering is described which shares the robustness properties of maximum entropy inference and the resulting Gibbs probability distributions are estimated by mean-field approximation.
Abstract: Partitioning a data set and extracting hidden structure from the data arises in different application areas of pattern recognition, speech and image processing. Pairwise data clustering is a combinatorial optimization method for data grouping which extracts hidden structure from proximity data. We describe a deterministic annealing approach to pairwise clustering which shares the robustness properties of maximum entropy inference. The resulting Gibbs probability distributions are estimated by mean-field approximation. A new structure-preserving algorithm to cluster dissimilarity data and to simultaneously embed these data in a Euclidean vector space is discussed which can be used for dimensionality reduction and data visualization. The suggested embedding algorithm which outperforms conventional approaches has been implemented to analyze dissimilarity data from protein analysis and from linguistics. The algorithm for pairwise data clustering is used to segment textured images.

524 citations


Proceedings Article
01 Dec 1997
TL;DR: The isometric feature mapping procedure, or isomap, is able to reliably recover low-dimensional nonlinear structure in realistic perceptual data sets, such as a manifold of face images, where conventional global mapping methods find only local minima.
Abstract: Nonlinear dimensionality reduction is formulated here as the problem of trying to find a Euclidean feature-space embedding of a set of observations that preserves as closely as possible their intrinsic metric structure - the distances between points on the observation manifold as measured along geodesic paths. Our isometric feature mapping procedure, or isomap, is able to reliably recover low-dimensional nonlinear structure in realistic perceptual data sets, such as a manifold of face images, where conventional global mapping methods find only local minima. The recovered map provides a canonical set of globally meaningful features, which allows perceptual transformations such as interpolation, extrapolation, and analogy - highly nonlinear transformations in the original observation space - to be computed with simple linear operations in feature space.
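
The three isomap stages described here (neighbourhood graph, graph geodesics, metric embedding) can be sketched compactly with numpy and scipy. The neighbourhood size and output dimension below are free parameters of the example, and the sketch assumes the neighbourhood graph is connected.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbors=8, n_components=2):
    """Sketch of isomap: kNN graph -> graph geodesics -> classical MDS."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    G = np.full((n, n), np.inf)                          # inf marks "no edge"
    nn = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        G[i, nn[i]] = D[i, nn[i]]
    G = np.minimum(G, G.T)                               # symmetrize the neighbourhood graph
    geo = shortest_path(G, method='D', directed=False)   # approximate geodesic distances
    # classical MDS on the squared geodesic distances
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (geo ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))
```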

304 citations


Proceedings Article
01 Jan 1997
TL;DR: This paper describes a feature subset selector that uses a correlation-based heuristic to evaluate the worth of features, and assesses its effectiveness with three common ML algorithms: a decision tree inducer, a naive Bayes classifier, and an instance-based learner.
Abstract: Recent work has shown that feature subset selection can have a positive effect on the performance of machine learning algorithms. Some algorithms can be slowed, or their performance degraded, by features that are irrelevant or redundant to the learning task. Feature subset selection, then, is a method for enhancing the performance of learning algorithms, reducing the hypothesis search space, and, in some cases, reducing the storage requirement. This paper describes a feature subset selector that uses a correlation-based heuristic to evaluate the worth of features, and assesses its effectiveness with three common ML algorithms: a decision tree inducer (C4.5), a naive Bayes classifier, and an instance-based learner (IB1). Experiments using a number of standard data sets drawn from real and artificial domains are presented. Feature subset selection gave significant improvement for all three algorithms, and C4.5 generated smaller decision trees.
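
A rough sketch of the correlation-based idea: score a candidate subset by average feature-class correlation penalized by average feature-feature intercorrelation, and grow the subset greedily. Pearson correlation is used below purely for simplicity; the paper's selector works with a discretized, information-theoretic correlation measure, so treat this as an approximation of the scheme.

```python
import numpy as np

def merit(X, y, subset):
    """CFS-style merit: k*r_cf / sqrt(k + k(k-1)*r_ff), using |Pearson r| as the correlation."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(X, y):
    """Greedy forward selection on the merit score (y assumed numerically coded)."""
    remaining, chosen, best = list(range(X.shape[1])), [], -np.inf
    while remaining:
        score, f = max((merit(X, y, chosen + [f]), f) for f in remaining)
        if score <= best:
            break
        best, chosen = score, chosen + [f]
        remaining.remove(f)
    return chosen
```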

286 citations


01 Jan 1997
TL;DR: This thesis proposes an alternate architecture that goes beyond the basilar-membrane model, with which auditory features can be computed in real time, and presents a unified framework for the problem of dimension reduction and HMM parameter estimation by modeling the original features with a reduced-rank HMM.
Abstract: Biologically motivated feature extraction algorithms have been found to provide significantly robust performance in speech recognition systems, in the presence of channel and noise degradation, when compared to standard features such as mel-cepstrum coefficients. However, auditory feature extraction is computationally expensive, and makes these features useless for real-time speech recognition systems. In this thesis, I investigate the use of low power techniques and custom analog VLSI for auditory feature extraction. I first investigated the basilar-membrane model and the hair-cell model chips that were designed by Liu (Liu, 1992). I performed speech recognition experiments to evaluate how well these chips would perform as a front-end to a speech recognizer. Based on the experience gained by these experiments, I propose an alternate architecture that goes beyond the basilar-membrane model, with which auditory features can be computed in real time. These chips have been designed and tested, and consume only a few milliwatts of power as compared to general-purpose digital machines that consume several watts. I have also investigated Linear Discriminant Analysis (LDA) for dimension reduction of auditory features. Researchers have used Fisher-Rao linear discriminant analysis (LDA) to reduce the feature dimension. They model the low-dimensional features obtained from LDA as the outputs of a Markov process with hidden states (HMM). I present a unified framework for the problem of dimension reduction and HMM parameter estimation by modeling the original features with a reduced-rank HMM. This re-formulation also leads to a generalization of LDA that is consistent with the heteroscedastic state models used in HMM, and gives better performance when tested on a digit recognition task.
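
The Fisher-Rao LDA step mentioned above reduces to a generalized eigenproblem between the between-class and within-class scatter matrices. A minimal sketch of that standard step follows (plain LDA, not the thesis's reduced-rank HMM generalization).

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(X, y, n_components):
    """Fisher-Rao LDA: directions maximizing between-class over within-class scatter."""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += 1e-6 * np.eye(d)                        # small ridge in case Sw is singular
    eigvals, eigvecs = eigh(Sb, Sw)               # generalized eigenproblem Sb v = lambda Sw v
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]
    return X @ W, W
```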

199 citations


Patent
31 Oct 1997
TL;DR: An improved multidimensional data indexing technique that generates compact indexes such that most or all of the index can reside in main memory at any time is presented in this article; the technique can be effective even in the presence of variables which are not highly correlated.
Abstract: An improved multidimensional data indexing technique that generates compact indexes such that most or all of the index can reside in main memory at any time. During the clustering and dimensionality reduction, clustering information and dimensionality reduction information are generated for use in a subsequent search phase. The indexing technique can be effective even in the presence of variables which are not highly correlated. Other features provide for efficiently performing exact and nearest neighbor searches using the clustering information and dimensionality reduction information. One example of the dimensionality reduction uses a singular value decomposition technique. The method can also be recursively applied to each of the reduced-dimensionality clusters. The dimensionality reduction also can be applied to the entire database as a first step of the index generation.
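
A toy sketch of the cluster-then-reduce idea in the claim: cluster the data, keep a low-rank SVD basis per cluster as the "dimensionality reduction information", and search only the cluster whose centre is nearest to the query. This illustrates the general technique only; it is not the patented index structure.

```python
import numpy as np

def build_index(X, n_clusters=16, n_dims=8, iters=30, seed=0):
    """Cluster the data, then keep a low-rank SVD basis and reduced coordinates per cluster."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(iters):                                   # plain k-means
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == k].mean(0) for k in range(n_clusters)])
    index = []
    for k in range(n_clusters):
        Xk = X[labels == k] - centers[k]
        _, _, Vt = np.linalg.svd(Xk, full_matrices=False)
        basis = Vt[:n_dims]                                  # dimensionality reduction information
        index.append((centers[k], basis, Xk @ basis.T, np.where(labels == k)[0]))
    return index

def approx_nearest(q, index):
    """Approximate nearest neighbour: search only the cluster closest to the query."""
    k = np.argmin([np.linalg.norm(q - c) for c, _, _, _ in index])
    c, basis, coords, ids = index[k]
    qc = (q - c) @ basis.T
    return ids[np.argmin(((coords - qc) ** 2).sum(axis=1))]
```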

185 citations


Journal ArticleDOI
TL;DR: It is demonstrated that the cross-spectral metric results in a low-dimensional detector which provides nearly optimal performance when the noise covariance is known, closely approximating the performance of the matched filter.
Abstract: This work extends the recently introduced cross-spectral metric for subspace selection and dimensionality reduction to partially adaptive space-time sensor array processing. A general methodology is developed for the analysis of reduced-dimension detection tests with known and unknown covariance. It is demonstrated that the cross-spectral metric results in a low-dimensional detector which provides nearly optimal performance when the noise covariance is known. It is also shown that this metric allows the dimensionality of the detector to be reduced below the dimension of the noise subspace eigenstructure without significant loss. This attribute provides robustness in the subspace selection process to achieve reduced-dimensional target detection. Finally, it is demonstrated that the cross-spectral subspace reduced-dimension detector can outperform the full-dimension detector when the noise covariance is unknown, closely approximating the performance of the matched filter.

181 citations


Proceedings ArticleDOI
03 Nov 1997
TL;DR: This paper proposes an entropy measure for ranking features, and conducts extensive experiments to show that the method is able to find the important features; it also compares well with a similar feature ranking method (Relief) that, unlike this method, requires class information.
Abstract: Dimensionality reduction is an important problem for efficient handling of large databases. Many feature selection methods exist for supervised data having class information. Little work has been done for dimensionality reduction of unsupervised data in which class information is not available. Principal component analysis (PCA) is often used. However, PCA creates new features. It is difficult to obtain intuitive understanding of the data using the new features only. We are concerned with the problem of determining and choosing the important original features for unsupervised data. Our method is based on the observation that removing an irrelevant feature from the feature set may not change the underlying concept of the data, but not so otherwise. We propose an entropy measure for ranking features, and conduct extensive experiments to show that our method is able to find the important features. Also it compares well with a similar feature ranking method (Relief) that requires class information unlike our method.
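
One common reading of such an entropy ranking is sketched below: similarities are derived from pairwise distances, an entropy is computed over the similarity matrix, and a feature is ranked by how much the entropy changes when it is left out. The similarity function and ranking rule here are assumptions for the illustration and may differ from the paper's exact definitions.

```python
import numpy as np

def dataset_entropy(X):
    """Entropy of pairwise similarities S_ij = exp(-alpha * d_ij)."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    d = D[np.triu_indices(len(X), k=1)]
    alpha = -np.log(0.5) / d.mean()                    # similarity 0.5 at the average distance
    S = np.clip(np.exp(-alpha * d), 1e-12, 1 - 1e-12)
    return -np.sum(S * np.log(S) + (1 - S) * np.log(1 - S))

def rank_features(X):
    """Rank features by the entropy change caused by removing each one (larger change = more important)."""
    base = dataset_entropy(X)
    deltas = [abs(dataset_entropy(np.delete(X, f, axis=1)) - base) for f in range(X.shape[1])]
    return np.argsort(deltas)[::-1]                    # most important first
```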

Proceedings Article
01 Jan 1997
TL;DR: This work studies three mistake-driven learning algorithms for a typical task of this nature -- text categorization and presents an algorithm, a variation of Littlestone's Winnow, which performs significantly better than any other algorithm tested on this task using a similar feature set.
Abstract: Learning problems in the text processing domain often map the text to a space whose dimensions are the measured features of the text, e.g., its words. Three characteristic properties of this domain are (a) very high dimensionality, (b) both the learned concepts and the instances reside very sparsely in the feature space, and (c) a high variation in the number of active features in an instance. In this work we study three mistake-driven learning algorithms for a typical task of this nature -- text categorization. We argue that these algorithms -- which categorize documents by learning a linear separator in the feature space -- have a few properties that make them ideal for this domain. We then show that a quantum leap in performance is achieved when we further modify the algorithms to better address some of the specific characteristics of the domain. In particular, we demonstrate (1) how variation in document length can be tolerated by either normalizing feature weights or by using negative weights, (2) the positive effect of applying a threshold range in training, (3) alternatives in considering feature frequency, and (4) the benefits of discarding features while training. Overall, we present an algorithm, a variation of Littlestone's Winnow, which performs significantly better than any other algorithm tested on this task using a similar feature set.
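
For reference, the core of the Winnow family that the paper builds on is a multiplicative update over the active features of each document. The sketch below is plain positive Winnow with illustrative defaults for the promotion factor and threshold, not the paper's modified variant.

```python
import numpy as np

def train_winnow(docs, labels, n_features, alpha=1.5, epochs=5):
    """Positive Winnow: multiplicative updates on the active features of each document.
    docs: list of integer arrays of active feature indices; labels: +1 / -1."""
    w = np.ones(n_features)
    theta = float(n_features)                 # classic fixed threshold
    for _ in range(epochs):
        for idx, y in zip(docs, labels):
            pred = 1 if w[idx].sum() >= theta else -1
            if pred != y:
                # promote active features on false negatives, demote on false positives
                w[idx] *= alpha if y == 1 else 1.0 / alpha
    return w, theta

def predict(w, theta, idx):
    return 1 if w[idx].sum() >= theta else -1
```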

Proceedings ArticleDOI
13 Apr 1997
TL;DR: This paper presents a genetic algorithm for feature selection which improves previous results presented in the literature for genetic-based feature selection; it is independent of a specific learning algorithm and requires less CPU time to reach a relevant subset of features.
Abstract: The goal of the feature selection process is, given a dataset described by n attributes (features), to find the minimum number m of relevant attributes which describe the data as well as the original set of attributes do. Genetic algorithms have been used to implement feature selection algorithms. Previous algorithms presented in the literature used the predictive accuracy of a specific learning algorithm as the fitness function to maximize over the space of possible feature subsets. Such an approach to feature selection requires a large amount of CPU time to reach a good solution on large datasets. This paper presents a genetic algorithm for feature selection which improves previous results presented in the literature for genetic-based feature selection. It is independent of a specific learning algorithm and requires less CPU time to reach a relevant subset of features. Reported experiments show that the proposed algorithm is at least ten times faster than a standard genetic algorithm for feature selection without a loss of predictive accuracy when a learning algorithm is applied to reduced data.

Journal ArticleDOI
TL;DR: Experimental results show that state-dependent transformation on mel-warped DFT features is superior in performance to the mel-frequency cepstral coefficients (MFCC's); an error rate reduction of 15% is obtained on a standard 39-class TIMIT phone classification task, in comparison with the conventional MCE-trained HMM using MFCC's that have not been subject to optimization during training.
Abstract: In the study reported in this paper, we investigate interactions of front-end feature extraction and back-end classification techniques in hidden Markov model-based (HMM-based) speech recognition. The proposed model focuses on dimensionality reduction of the mel-warped discrete Fourier transform (DFT) feature space subject to maximal preservation of speech classification information, and aims at finding an optimal linear transformation on the mel-warped DFT according to the minimum classification error (MCE) criterion. This linear transformation, along with the HMM parameters, are automatically trained using the gradient descent method to minimize a measure of overall empirical error counts. A further generalization of the model allows integration of the discriminatively derived state-dependent transformation with the construction of dynamic feature parameters. Experimental results show that state-dependent transformation on mel-warped DFT features is superior in performance to the mel-frequency cepstral coefficients (MFCC's). An error rate reduction of 15% is obtained on a standard 39-class TIMIT phone classification task, in comparison with the conventional MCE-trained HMM using MFCC's that have not been subject to optimization during training.

Journal ArticleDOI
TL;DR: Four PCA algorithms, namely NIPALS, the power method (POWER), singular value decomposition (SVD) and eigenvalue decomposition (EVD), and their kernel versions are systematically applied to three NIR data sets from the pharmaceutical industry.
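
As a pointer to what NIPALS computes, here is a minimal numpy sketch of NIPALS-style PCA: extract one score/loading pair at a time by alternating updates, then deflate the data matrix. The convergence tolerance and iteration cap are arbitrary choices for the example.

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-8, max_iter=500):
    """NIPALS PCA: iteratively extract score/loading pairs and deflate the data matrix."""
    X = X - X.mean(axis=0)
    scores, loadings = [], []
    for _ in range(n_components):
        t = X[:, np.argmax(X.var(axis=0))].copy()       # start from the highest-variance column
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)                       # loading estimate
            p /= np.linalg.norm(p)
            t_new = X @ p                               # score estimate
            if np.linalg.norm(t_new - t) < tol:
                t = t_new
                break
            t = t_new
        X = X - np.outer(t, p)                          # deflate
        scores.append(t)
        loadings.append(p)
    return np.array(scores).T, np.array(loadings).T
```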

Proceedings Article
01 Dec 1997
TL;DR: This paper examines several techniques for local dimensionality reduction in the context of locally weighted linear regression and finds that locally weighted partial least squares regression offers the best average results, thus outperforming even factor analysis, the theoretically most appealing of the candidate techniques.
Abstract: If globally high dimensional data has locally only low dimensional distributions, it is advantageous to perform a local dimensionality reduction before further processing the data. In this paper we examine several techniques for local dimensionality reduction in the context of locally weighted linear regression. As possible candidates, we derive local versions of factor analysis regression, principal component regression, principal component regression on joint distributions, and partial least squares regression. After outlining the statistical bases of these methods, we perform Monte Carlo simulations to evaluate their robustness with respect to violations of their statistical assumptions. One surprising outcome is that locally weighted partial least squares regression offers the best average results, thus outperforming even factor analysis, the theoretically most appealing of our candidate techniques.
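
To make the setting concrete, the sketch below performs locally weighted regression with a local dimensionality reduction at the query point. Plain weighted PCA stands in for the factor-analysis and partial-least-squares variants the paper actually compares, so this illustrates the setting rather than the winning method.

```python
import numpy as np

def lwr_predict(Xtr, ytr, xq, bandwidth=1.0, n_dims=2, ridge=1e-6):
    """Locally weighted regression: weight training points by distance to the query,
    reduce dimension locally (weighted PCA here), then fit a weighted linear model."""
    w = np.exp(-np.sum((Xtr - xq) ** 2, axis=1) / (2 * bandwidth ** 2))
    mean = (w[:, None] * Xtr).sum(0) / w.sum()
    Xc = Xtr - mean
    # local dimensionality reduction: principal directions of the weighted data
    _, _, Vt = np.linalg.svd(np.sqrt(w)[:, None] * Xc, full_matrices=False)
    P = Vt[:n_dims]
    Z = np.hstack([Xc @ P.T, np.ones((len(Xc), 1))])     # reduced inputs plus intercept
    W = np.diag(w)
    beta = np.linalg.solve(Z.T @ W @ Z + ridge * np.eye(Z.shape[1]), Z.T @ W @ ytr)
    zq = np.hstack([(xq - mean) @ P.T, 1.0])
    return zq @ beta
```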

01 Jan 1997
TL;DR: The masking GA/knn feature selection method can efficiently examine noisy, complex, and high-dimensionality datasets to find combinations of features which classify the data more accurately and result in equivalent or better classification accuracy using fewer features.
Abstract: Statistical pattern recognition techniques classify objects in terms of a representative set of features. The selection of features to measure and include can have a significant effect on the cost and accuracy of an automated classifier. Our previous research has shown that a hybrid between a k-nearest-neighbors (knn) classifier and a genetic algorithm (GA) can reduce the size of the feature set used by a classifier, while simultaneously weighting the remaining features to allow greater classification accuracy. Here we describe an extension to this approach which further enhances feature selection through the simultaneous optimization of feature weights and selection of key features by including a masking vector on the GA chromosome. We present the results of our masking GA/knn feature selection method on two important problems from biochemistry and medicine: identification of functional water molecules bound to protein surfaces, and diagnosis of thyroid deficiency. By allowing the GA to explore the effect of eliminating a feature from the classification without losing weight knowledge learned about the feature, the masking GA/knn can efficiently examine noisy, complex, and high-dimensionality datasets to find combinations of features which classify the data more accurately. In both biomedical applications, this technique resulted in equivalent or better classification accuracy using fewer features.
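
A rough sketch of the chromosome encoding described above, real-valued feature weights plus a binary mask, scored here with leave-one-out 1-NN accuracy. The GA operators, population size, and mutation rates below are placeholders, not the authors' configuration.

```python
import numpy as np

def knn_accuracy(X, y, weights, mask):
    """Leave-one-out 1-NN accuracy with features scaled by weight * mask (X, y numpy arrays)."""
    Z = X * (weights * mask)
    D = np.linalg.norm(Z[:, None] - Z[None, :], axis=2)
    np.fill_diagonal(D, np.inf)
    return np.mean(y[np.argmin(D, axis=1)] == y)

def masking_ga(X, y, pop=30, gens=40, seed=0):
    """Toy GA over chromosomes (weights, mask); keeps the best half, mutates copies into the rest."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.random((pop, d))                         # weight part of each chromosome
    M = rng.integers(0, 2, (pop, d))                 # mask part of each chromosome
    for _ in range(gens):
        fit = np.array([knn_accuracy(X, y, W[i], M[i]) for i in range(pop)])
        order = np.argsort(fit)[::-1]
        W, M = W[order], M[order]
        for i in range(pop // 2, pop):               # replace worst half with mutated elites
            j = rng.integers(pop // 2)
            W[i] = np.clip(W[j] + rng.normal(scale=0.1, size=d), 0, 1)
            M[i] = np.where(rng.random(d) < 0.05, 1 - M[j], M[j])
    best = np.argmax([knn_accuracy(X, y, W[i], M[i]) for i in range(pop)])
    return W[best], M[best]
```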

Journal ArticleDOI
TL;DR: Studies with a wide spectrum of synthetic data sets and a real data set indicate that the discrimination quality of these criteria can be improved by the proposed method, called removal of classification structure.
Abstract: A new method for two-class linear discriminant analysis, called "removal of classification structure", is proposed. Its novelty lies in the transformation of the data along an identified discriminant direction into data without discriminant information and iteration to obtain the next discriminant direction. It is free to search for discriminant directions oblique to each other and ensures that the informative directions already found will not be chosen again at a later stage. The efficacy of the method is examined for two discriminant criteria. Studies with a wide spectrum of synthetic data sets and a real data set indicate that the discrimination quality of these criteria can be improved by the proposed method.

Proceedings ArticleDOI
26 Oct 1997
TL;DR: A variant of the non-linear PCA algorithm described by Webb is investigated and its usefulness in the database retrieval problem is investigated; in an experiment using an aerial photo database, the feature vector length was reduced by a factor of 10 without significantly reducing the retrieval performance.
Abstract: There has been much interest recently in image content based retrieval, with applications to digital libraries and image database accessing. One approach to this problem is to base retrieval from the database upon feature vectors which characterize the image texture. Since feature vectors are often high dimensional, multi-dimensional scaling, or non-linear principal components analysis (PCA) may be useful in reducing feature vector size, and therefore computation time. We have investigated a variant of the non-linear PCA algorithm described by Webb (see Pattern Recognition, vol.28, no.6, p.753-9, 1995) and its usefulness in the database retrieval problem. The results are quite impressive; in an experiment using an aerial photo database, the feature vector length was reduced by a factor of 10 without significantly reducing the retrieval performance.

Proceedings ArticleDOI
04 Nov 1997
TL;DR: A multivariate measure of feature consistency is described, together with a feature selection algorithm based on Johnson's (1974) algorithm for set covering, which outperforms earlier methods using a similar amount of time.
Abstract: In pattern classification, features are used to define classes. Feature selection is a preprocessing process that searches for an "optimal" subset of features. The class separability is normally used as the basic feature selection criterion. Instead of maximizing the class separability, as in the literature, this work adopts a criterion aiming to maintain the discriminating power of the data describing its classes. In other words, the problem is formalized as that of finding the smallest set of features that is "consistent" in describing classes. We describe a multivariate measure of feature consistency. The new feature selection algorithm is based on Johnson's (1974) algorithm for set covering. Johnson's analysis implies that this algorithm runs in polynomial time, and outputs a consistent feature set whose size is within a log factor of the best possible. Our experiments show that its performance in practice is much better than this, and that it outperforms earlier methods using a similar amount of time.
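
The set-covering view can be made concrete as follows: every pair of differently labelled instances is an element to cover, a feature covers a pair if it distinguishes the two instances, and features are chosen greedily until all pairs are covered. A small sketch of that reduction for discrete-valued features (a Johnson-style greedy cover, not the authors' exact implementation):

```python
import numpy as np

def consistency_select(X, y):
    """Greedy set cover: choose features until every pair of differently labelled
    instances differs on at least one chosen feature."""
    n, d = X.shape
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if y[i] != y[j]]
    uncovered = set(range(len(pairs)))
    chosen = []
    while uncovered:
        # feature that distinguishes the most still-uncovered pairs
        gains = [sum(1 for p in uncovered if X[pairs[p][0], f] != X[pairs[p][1], f])
                 for f in range(d)]
        f = int(np.argmax(gains))
        if gains[f] == 0:                           # the data itself is inconsistent; stop
            break
        chosen.append(f)
        uncovered = {p for p in uncovered if X[pairs[p][0], f] == X[pairs[p][1], f]}
    return chosen
```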

Proceedings ArticleDOI
10 Jun 1997
TL;DR: A learning algorithm, locally adaptive subspace regression, is derived that combines a dynamically growing local dimensionality reduction as a preprocessing step with a nonparametric learning technique, locally weighted regression, exploiting the observation that data distributions from physical movement systems are locally low dimensional and dense.
Abstract: Incremental learning of sensorimotor transformations in high dimensional spaces is one of the basic prerequisites for the success of autonomous robot devices as well as biological movement systems. So far, due to sparsity of data in high dimensional spaces, learning in such settings requires a significant amount of prior knowledge about the learning task, usually provided by a human expert. In this paper we suggest a partial revision of this view. Based on empirical studies, it can be observed that, despite being globally high dimensional and sparse, data distributions from physical movement systems are locally low dimensional and dense. Under this assumption, we derive a learning algorithm, locally adaptive subspace regression, that exploits this property by combining a dynamically growing local dimensionality reduction as a preprocessing step with a nonparametric learning technique, locally weighted regression. The usefulness of the algorithm and the validity of its assumptions are illustrated for a synthetic data set and data of the inverse dynamics of an actual 7 degree-of-freedom anthropomorphic robot arm.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a discriminative subspace model, DSM(q), which combines the ideas of dimension reduction and constraints on the parameter space, thus substantially reducing the number of parameters to be estimated.
Abstract: For a p-variate normal random vector measured in two populations, we propose a method of discrimination under the constraint that all differences between the two populations occur in a subspace of dimension q < p. This method of classification is based on the discrimination subspace model, denoted by DSM(q), and is intermediate between linear and quadratic discrimination. It combines the ideas of dimension reduction and constraints on the parameter space, thus substantially reducing the number of parameters to be estimated. The maximum likelihood estimators of the model are presented, and the performance of DSM(q) versus quadratic and linear discrimination is assessed via simulation. It is generally shown that discrimination based on DSM(q) consistently yields noticeably lower expected actual error rates relative to the traditional methods. The method is illustrated with a real data example and is compared to linear and quadratic discrimination using a leave-one-out method. The example confirms t...

Patent
Mazin G. Rahim, Lawrence K. Saul
17 Nov 1997
TL;DR: In this article, factor analysis is used to model acoustic correlation in automatic speech recognition by introducing a small number of parameters to model the covariance structure of a speech signal, which are estimated by an Expectation Maximization (EM) technique that can be embedded in the training procedures for the HMMs, and then further adjusted using Minimum Classification Error (MCE) training, which demonstrates better discrimination and produces more accurate recognition models.
Abstract: Hidden Markov models (HMMs) rely on high-dimensional feature vectors to summarize the short-time properties of speech; correlations between features can arise when the speech signal is non-stationary or corrupted by noise. These correlations are modeled using factor analysis, a statistical method for dimensionality reduction. Factor analysis is used to model acoustic correlation in automatic speech recognition by introducing a small number of parameters to model the covariance structure of a speech signal. The parameters are estimated by an Expectation Maximization (EM) technique that can be embedded in the training procedures for the HMMs, and then further adjusted using Minimum Classification Error (MCE) training, which demonstrates better discrimination and produces more accurate recognition models.
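
The dimensionality-reduction component here is ordinary factor analysis of the feature covariance: each observation is modelled as x = Lambda z + eps with a low-dimensional latent z and diagonal noise. A compact EM sketch for that model follows (standard factor analysis only, without the HMM embedding or the MCE adjustment described in the patent).

```python
import numpy as np

def factor_analysis(X, n_factors, iters=100, seed=0):
    """EM for the factor analysis model x = Lambda z + eps, with eps ~ N(0, diag(Psi))."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)
    n, d = X.shape
    S = X.T @ X / n                                   # sample covariance
    L = rng.normal(scale=0.1, size=(d, n_factors))
    Psi = np.diag(S).copy()
    for _ in range(iters):
        # E-step: posterior statistics of the latent factors
        G = np.linalg.inv(np.eye(n_factors) + (L.T / Psi) @ L)   # posterior covariance of z
        B = (G @ L.T) / Psi                                      # E[z|x] = B x
        Ez = X @ B.T
        Ezz = n * G + Ez.T @ Ez
        # M-step: update loadings and the diagonal noise
        L = (X.T @ Ez) @ np.linalg.inv(Ezz)
        Psi = np.maximum(np.diag(S - (L @ (Ez.T @ X)) / n), 1e-6)
    return L, Psi
```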

Proceedings ArticleDOI
01 May 1997
TL;DR: In this paper, an intermediate feature representation based on B-rep entities will be addressed for a feature conversion architecture and a constraint-based feature extraction method is advocated for the feature conversion problem.
Abstract: A feature-based CAD system must support multiple feature representations in the concurrent engineering environment in which multiple engineering groups work on the same product simultaneously. Users must be able to define customized features interactively, since different users need different feature sets. On the other hand, to share high-level information stored in features among different engineering groups, the features must be standardized to be compatible among highly customized feature sets. To resolve these two conflicting requirements, a feature conversion system architecture including an interactive feature extraction module has been developed and implemented. Sharing the same basis of fundamental features, multiple feature representations can be maintained in the system. In this paper, an intermediate feature representation based on B-rep entities will be addressed for a feature conversion architecture. For the one-to-one mapping cases, a feature shape code matching mechanism is used, and a constraint-based feature extraction method is advocated for the feature conversion problem.

Proceedings ArticleDOI
23 Mar 1997
TL;DR: A high-performance neural network that in addition to predicting stock market direction, allows the user to visualize the relationship between current conditions and previous conditions that led to similar predictions.
Abstract: When a neural network makes a financial prediction, the user may benefit from knowing which previous time periods are illustrative of the current time period. The authors describe a high-performance neural network that in addition to predicting stock market direction, allows the user to visualize the relationship between current conditions and previous conditions that led to similar predictions. Visualization is accomplished by forming a gated multi-expert network using funnel-shaped multilayer dimensionality reduction networks. The neck of the funnel is a two-neuron layer that displays the training data and the decision boundaries in a two-dimensional space. This architecture facilitates a) interactive design of the decision functions and b) explanation of the relevance of past decisions from the training set to the current decision. They describe a stock market prediction system whose design incorporates a visual neural network for prediction, wavelet transforms and tapped delay lines for feature extraction, and a genetic algorithm for feature selection. This system shows that the visual neural network provides the low error rates (i.e., accurate predictions) of multi-expert networks along with the visual explanatory power of nonlinear dimensionality reduction.

Journal ArticleDOI
TL;DR: For a projection pursuit regression type function with smooth functional components that are either additive or multiplicative, in the presence of or without interactions, the rates of convergence to the true parameter depend on Kolmogorov's entropy of the assumed model, as discussed by the authors.
Abstract: $L_1$-optimal minimum distance estimators are provided for a projection pursuit regression type function with smooth functional components that are either additive or multiplicative, in the presence of or without interactions. The obtained rates of convergence of the estimate to the true parameter depend on Kolmogorov's entropy of the assumed model and confirm Stone's heuristic dimensionality reduction principle. Rates of convergence are also obtained for the error in estimating the derivatives of a regression type function.

Journal ArticleDOI
TL;DR: A new approach to reduced-dimension mapping of multi-dimensional pattern data is described which consists of learning a mapping from the original pattern space to a reduced-dimension space in an unsupervised non-linear manner but with the constraint that the overall variance in the representation of the data be conserved.

01 Jan 1997
TL;DR: This paper aims to motivate research in this area and to present some results on windowing techniques, which have so far been neglected in ILP research.
Abstract: The recent rise of Knowledge Discovery in Databases (KDD) has underlined the need for machine learning algorithms to be able to tackle large-scale applications that are currently beyond their scope. One way to address this problem is to use techniques for reducing the dimensionality of the learning problem by reducing the hypothesis space and/or reducing the example space. While research in machine learning has devoted considerable attention to such techniques, they have so far been neglected in ILP research. The purpose of this paper is to motivate research in this area and to present some results on windowing techniques.
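
A generic sketch of the windowing loop referred to above: train on a small window of examples, add a batch of misclassified examples, and retrain until the learner is consistent with the rest of the data. The decision-tree learner and the window sizes are illustrative stand-ins, not tied to the paper's ILP setting.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def windowed_training(X, y, initial=200, max_add=100, max_rounds=10, seed=0):
    """Windowing: learn from a subset, repeatedly add misclassified examples and retrain."""
    rng = np.random.default_rng(seed)
    window = list(rng.choice(len(X), size=min(initial, len(X)), replace=False))
    clf = DecisionTreeClassifier(random_state=0).fit(X[window], y[window])
    for _ in range(max_rounds):
        outside = np.setdiff1d(np.arange(len(X)), window)
        if len(outside) == 0:
            break
        wrong = outside[clf.predict(X[outside]) != y[outside]]
        if len(wrong) == 0:                      # consistent with all remaining examples
            break
        window += list(wrong[:max_add])          # grow the window with a batch of errors
        clf = DecisionTreeClassifier(random_state=0).fit(X[window], y[window])
    return clf, window
```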

Proceedings ArticleDOI
09 Jun 1997
TL;DR: This work compares neural networks and statistical methods used to identify birds by their songs using backpropagation learning in two-layer perceptrons, as well as methods from multivariate statistics including quadratic discriminant analysis.
Abstract: We compare neural networks and statistical methods used to identify birds by their songs. Six birds native to Manitoba were chosen which exhibited overlapping characteristics in terms of frequency content, song components and length of songs. Songs from multiple individuals in each species were employed. These songs were analyzed using backpropagation learning in two-layer perceptrons, as well as methods from multivariate statistics including quadratic discriminant analysis. Preprocessing methods included linear predictive coding and windowed Fourier transforms. Generalization performance ranged from 82% to 93% correct identification, with the lower figures corresponding to smaller networks that employed more preprocessing for dimensionality reduction. Computational requirements were significantly reduced in the latter case.

Proceedings ArticleDOI
21 Jul 1997
TL;DR: The purpose of this paper is to explain how the dimensionality of the ICA-model can algebraically be reduced to the true number of sources in higher-order-only schemes.
Abstract: Most algebraic methods for independent component analysis (ICA) consist of a second-order and a higher-order stage. The former can be considered as a classical principal component analysis (PCA), with a three-fold goal: (a) reduction of the parameter set of unknowns to the manifold of orthogonal matrices, (b) standardization of the unknown source signals to mutually uncorrelated unit-variance signals, and (c) determination of the number of sources. In the higher-order stage the remaining unknown orthogonal factor is determined by imposing statistical independence on the source estimates. Like all correlation-based techniques, this set-up has the disadvantage that it is affected by additive Gaussian noise. However it is possible to solve the problem, in a way that is conceptually blind to additive Gaussian noise, by resorting only to higher-order cumulants. The purpose of this paper is to explain how the dimensionality of the ICA-model can algebraically be reduced to the true number of sources in higher-order-only schemes.