scispace - formally typeset
Search or ask a question

Showing papers on "Feature extraction published in 2004"


Journal ArticleDOI
TL;DR: A new technique coined two-dimensional principal component analysis (2DPCA) is developed for image representation that is based on 2D image matrices rather than 1D vectors so the image matrix does not need to be transformed into a vector prior to feature extraction.
Abstract: In this paper, a new technique coined two-dimensional principal component analysis (2DPCA) is developed for image representation. As opposed to PCA, 2DPCA is based on 2D image matrices rather than 1D vectors so the image matrix does not need to be transformed into a vector prior to feature extraction. Instead, an image covariance matrix is constructed directly using the original image matrices, and its eigenvectors are derived for image feature extraction. To test 2DPCA and evaluate its performance, a series of experiments were performed on three face image databases: ORL, AR, and Yale face databases. The recognition rate across all trials was higher using 2DPCA than PCA. The experimental results also indicated that the extraction of image features is computationally more efficient using 2DPCA than PCA.

3,439 citations


Journal ArticleDOI
TL;DR: A method is presented for automated segmentation of vessels in two-dimensional color images of the retina based on extraction of image ridges, which coincide approximately with vessel centerlines, which is compared with two recently published rule-based methods.
Abstract: A method is presented for automated segmentation of vessels in two-dimensional color images of the retina. This method can be used in computer analyses of retinal images, e.g., in automated screening for diabetic retinopathy. The system is based on extraction of image ridges, which coincide approximately with vessel centerlines. The ridges are used to compose primitives in the form of line elements. With the line elements an image is partitioned into patches by assigning each image pixel to the closest line element. Every line element constitutes a local coordinate frame for its corresponding patch. For every pixel, feature vectors are computed that make use of properties of the patches and the line elements. The feature vectors are classified using a kNN-classifier and sequential forward feature selection. The algorithm was tested on a database consisting of 40 manually labeled images. The method achieves an area under the receiver operating characteristic curve of 0.952. The method is compared with two recently published rule-based methods of Hoover et al. and Jiang et al. . The results show that our method is significantly better than the two rule-based methods (p<0.01). The accuracy of our method is 0.944 versus 0.947 for a second observer.

3,416 citations


Proceedings ArticleDOI
27 Jun 2004
TL;DR: This paper examines (and improves upon) the local image descriptor used by SIFT, and demonstrates that the PCA-based local descriptors are more distinctive, more robust to image deformations, and more compact than the standard SIFT representation.
Abstract: Stable local feature detection and representation is a fundamental component of many image registration and object recognition algorithms. Mikolajczyk and Schmid (June 2003) recently evaluated a variety of approaches and identified the SIFT [D. G. Lowe, 1999] algorithm as being the most resistant to common image deformations. This paper examines (and improves upon) the local image descriptor used by SIFT. Like SIFT, our descriptors encode the salient aspects of the image gradient in the feature point's neighborhood; however, instead of using SIFT's smoothed weighted histograms, we apply principal components analysis (PCA) to the normalized gradient patch. Our experiments demonstrate that the PCA-based local descriptors are more distinctive, more robust to image deformations, and more compact than the standard SIFT representation. We also present results showing that using these descriptors in an image retrieval application results in increased accuracy and faster matching.

3,325 citations


Proceedings ArticleDOI
23 Aug 2004
TL;DR: This paper construct video representations in terms of local space-time features and integrate such representations with SVM classification schemes for recognition and presents the presented results of action recognition.
Abstract: Local space-time features capture local events in video and can be adapted to the size, the frequency and the velocity of moving patterns. In this paper, we demonstrate how such features can be used for recognizing complex motion patterns. We construct video representations in terms of local space-time features and integrate such representations with SVM classification schemes for recognition. For the purpose of evaluation we introduce a new video database containing 2391 sequences of six human actions performed by 25 people in four different scenarios. The presented results of action recognition justify the proposed method and demonstrate its advantage compared to other relative approaches for action recognition.

3,238 citations


Book ChapterDOI
11 May 2004
TL;DR: A novel approach to face recognition which considers both shape and texture information to represent face images and the simplicity of the proposed method allows for very fast feature extraction.
Abstract: In this work, we present a novel approach to face recognition which considers both shape and texture information to represent face images. The face area is first divided into small regions from which Local Binary Pattern (LBP) histograms are extracted and concatenated into a single, spatially enhanced feature histogram efficiently representing the face image. The recognition is performed using a nearest neighbour classifier in the computed feature space with Chi square as a dissimilarity measure. Extensive experiments clearly show the superiority of the proposed scheme over all considered methods (PCA, Bayesian Intra/extrapersonal Classifier and Elastic Bunch Graph Matching) on FERET tests which include testing the robustness of the method against different facial expressions, lighting and aging of the subjects. In addition to its efficiency, the simplicity of the proposed method allows for very fast feature extraction.

2,191 citations


Proceedings ArticleDOI
19 Jul 2004
TL;DR: This paper proposes a novel method for solving single-image super-resolution problems, given a low-resolution image as input, and recovers its high-resolution counterpart using a set of training examples, inspired by recent manifold teaming methods.
Abstract: In this paper, we propose a novel method for solving single-image super-resolution problems. Given a low-resolution image as input, we recover its high-resolution counterpart using a set of training examples. While this formulation resembles other learning-based methods for super-resolution, our method has been inspired by recent manifold teaming methods, particularly locally linear embedding (LLE). Specifically, small image patches in the lowand high-resolution images form manifolds with similar local geometry in two distinct feature spaces. As in LLE, local geometry is characterized by how a feature vector corresponding to a patch can be reconstructed by its neighbors in the feature space. Besides using the training image pairs to estimate the high-resolution embedding, we also enforce local compatibility and smoothness constraints between patches in the target high-resolution image through overlapping. Experiments show that our method is very flexible and gives good empirical results.

1,951 citations


Proceedings ArticleDOI
27 Jun 2004
TL;DR: A real-time version of the system was implemented that can detect and classify objects in natural scenes at around 10 frames per second and proved impractical, while convolutional nets yielded 16/7% error.
Abstract: We assess the applicability of several popular learning methods for the problem of recognizing generic visual categories with invariance to pose, lighting, and surrounding clutter. A large dataset comprising stereo image pairs of 50 uniform-colored toys under 36 azimuths, 9 elevations, and 6 lighting conditions was collected (for a total of 194,400 individual images). The objects were 10 instances of 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. Five instances of each category were used for training, and the other five for testing. Low-resolution grayscale images of the objects with various amounts of variability and surrounding clutter were used for training and testing. Nearest neighbor methods, support vector machines, and convolutional networks, operating on raw pixels or on PCA-derived features were tested. Test error rates for unseen object instances placed on uniform backgrounds were around 13% for SVM and 7% for convolutional nets. On a segmentation/recognition task with highly cluttered images, SVM proved impractical, while convolutional nets yielded 16/7% error. A real-time version of the system was implemented that can detect and classify objects in natural scenes at around 10 frames per second.

1,509 citations


Journal ArticleDOI
TL;DR: This tutorial performs a synthesis between the multiscale-decomposition-based image approach, the ARSIS concept, and a multisensor scheme based on wavelet decomposition, i.e. a multiresolution image fusion approach.

1,187 citations


Journal ArticleDOI
TL;DR: Quantitative evaluation and comparison show that the proposed Bayesian framework for foreground object detection in complex environments provides much improved results.
Abstract: This paper addresses the problem of background modeling for foreground object detection in complex environments. A Bayesian framework that incorporates spectral, spatial, and temporal features to characterize the background appearance is proposed. Under this framework, the background is represented by the most significant and frequent features, i.e., the principal features , at each pixel. A Bayes decision rule is derived for background and foreground classification based on the statistics of principal features. Principal feature representation for both the static and dynamic background pixels is investigated. A novel learning method is proposed to adapt to both gradual and sudden "once-off" background changes. The convergence of the learning process is analyzed and a formula to select a proper learning rate is derived. Under the proposed framework, a novel algorithm for detecting foreground objects from complex environments is then established. It consists of change detection, change classification, foreground segmentation, and background maintenance. Experiments were conducted on image sequences containing targets of interest in a variety of environments, e.g., offices, public buildings, subway stations, campuses, parking lots, airports, and sidewalks. Good results of foreground detection were obtained. Quantitative evaluation and comparison with the existing method show that the proposed method provides much improved results.

1,120 citations


Journal ArticleDOI
TL;DR: The basic idea is that local sharp variation points, denoting the appearing or vanishing of an important image structure, are utilized to represent the characteristics of the iris.
Abstract: Unlike other biometrics such as fingerprints and face, the distinct aspect of iris comes from randomly distributed features. This leads to its high reliability for personal identification, and at the same time, the difficulty in effectively representing such details in an image. This paper describes an efficient algorithm for iris recognition by characterizing key local variations. The basic idea is that local sharp variation points, denoting the appearing or vanishing of an important image structure, are utilized to represent the characteristics of the iris. The whole procedure of feature extraction includes two steps: 1) a set of one-dimensional intensity signals is constructed to effectively characterize the most important information of the original two-dimensional image; 2) using a particular class of wavelets, a position sequence of local sharp variation points in such signals is recorded as features. We also present a fast matching scheme based on exclusive OR operation to compute the similarity between a pair of position sequences. Experimental results on 2 255 iris images show that the performance of the proposed method is encouraging and comparable to the best iris recognition algorithm found in the current literature.

999 citations


Journal ArticleDOI
TL;DR: A physiological signal-based emotion recognition system is reported, developed to operate as a user-independent system, based on physiological signal databases obtained from multiple subjects, and consisted of preprocessing, feature extraction and pattern classification stages.
Abstract: A physiological signal-based emotion recognition system is reported. The system was developed to operate as a user-independent system, based on physiological signal databases obtained from multiple subjects. The input signals were electrocardiogram, skin temperature variation and electrodermal activity, all of which were acquired without much discomfort from the body surface, and can reflect the influence of emotion on the autonomic nervous system. The system consisted of preprocessing, feature extraction and pattern classification stages. Preprocessing and feature extraction methods were devised so that emotion-specific characteristics could be extracted from short-segment signals. Although the features were carefully extracted, their distribution formed a classification problem, with large overlap among clusters and large variance within clusters. A support vector machine was adopted as a pattern classifier to resolve this difficulty. Correct-classification ratios for 50 subjects were 78.4% and 61.8%, for the recognition of three and four categories, respectively.

Journal ArticleDOI
TL;DR: Experiments revealed that the proposed hybrid GA is superior to both a simple GA and sequential search algorithms, and showed better convergence properties compared to the classical GAs.
Abstract: This paper proposes a novel hybrid genetic algorithm for feature selection. Local search operations are devised and embedded in hybrid GAs to fine-tune the search. The operations are parameterized in terms of their fine-tuning power, and their effectiveness and timing requirements are analyzed and compared. The hybridization technique produces two desirable effects: a significant improvement in the final performance and the acquisition of subset-size control. The hybrid GAs showed better convergence properties compared to the classical GAs. A method of performing rigorous timing analysis was developed, in order to compare the timing requirement of the conventional and the proposed algorithms. Experiments performed with various standard data sets revealed that the proposed hybrid GA is superior to both a simple GA and sequential search algorithms.

Journal ArticleDOI
01 Oct 2004
TL;DR: Experimental results show that the proposed method achieves robust pattern extraction, and the equal error rate was 0.145% in personal identification.
Abstract: We propose a method of personal identification based on finger-vein patterns. An image of a finger captured under infrared light contains not only the vein pattern but also irregular shading produced by the various thicknesses of the finger bones and muscles. The proposed method extracts the finger-vein pattern from the unclear image by using line tracking that starts from various positions. Experimental results show that it achieves robust pattern extraction, and the equal error rate was 0.145% in personal identification.

Proceedings Article
01 Dec 2004
TL;DR: 2DLDA, a novel LDA algorithm, which stands for 2-Dimensional Linear Discriminant Analysis, overcomes the singularity problem implicitly, while achieving efficiency and the combination of 2DLDA and classical LDA, namely 2 DLDA+LDA, is studied.
Abstract: Linear Discriminant Analysis (LDA) is a well-known scheme for feature extraction and dimension reduction. It has been used widely in many applications involving high-dimensional data, such as face recognition and image retrieval. An intrinsic limitation of classical LDA is the so-called singularity problem, that is, it fails when all scatter matrices are singular. A well-known approach to deal with the singularity problem is to apply an intermediate dimension reduction stage using Principal Component Analysis (PCA) before LDA. The algorithm, called PCA+LDA, is used widely in face recognition. However, PCA+LDA has high costs in time and space, due to the need for an eigen-decomposition involving the scatter matrices. In this paper, we propose a novel LDA algorithm, namely 2DLDA, which stands for 2-Dimensional Linear Discriminant Analysis. 2DLDA overcomes the singularity problem implicitly, while achieving efficiency. The key difference between 2DLDA and classical LDA lies in the model for data representation. Classical LDA works with vectorized representations of data, while the 2DLDA algorithm works with data in matrix representation. To further reduce the dimension by 2DLDA, the combination of 2DLDA and classical LDA, namely 2DLDA+LDA, is studied, where LDA is preceded by 2DLDA. The proposed algorithms are applied on face recognition and compared with PCA+LDA. Experiments show that 2DLDA and 2DLDA+LDA achieve competitive recognition accuracy, while being much more efficient.

Journal ArticleDOI
TL;DR: The proposed monitoring method was applied to fault detection and identification in both a simple multivariate process and the simulation benchmark of the biological wastewater treatment process, which is characterized by a variety of fault sources with non-Gaussian characteristics.

Journal ArticleDOI
TL;DR: A comparative study of standard endmember extraction algorithms using a custom-designed quantitative and comparative framework that involves both the spectral and spatial information indicates that endmember selection and subsequent mixed-pixel interpretation by a linear mixture model are more successful when methods combining spatial and spectral information are applied.
Abstract: Linear spectral unmixing is a commonly accepted approach to mixed-pixel classification in hyperspectral imagery. This approach involves two steps. First, to find spectrally unique signatures of pure ground components, usually known as endmembers, and, second, to express mixed pixels as linear combinations of endmember materials. Over the past years, several algorithms have been developed for autonomous and supervised endmember extraction from hyperspectral data. Due to a lack of commonly accepted data and quantitative approaches to substantiate new algorithms, available methods have not been rigorously compared by using a unified scheme. In this paper, we present a comparative study of standard endmember extraction algorithms using a custom-designed quantitative and comparative framework that involves both the spectral and spatial information. The algorithms considered in this study represent substantially different design choices. A database formed by simulated and real hyperspectral data collected by the Airborne Visible and Infrared Imaging Spectrometer (AVIRIS) is used to investigate the impact of noise, mixture complexity, and use of radiance/reflectance data on algorithm performance. The results obtained indicate that endmember selection and subsequent mixed-pixel interpretation by a linear mixture model are more successful when methods combining spatial and spectral information are applied.

Journal ArticleDOI
TL;DR: This paper reviews those techniques that preserve the underlying semantics of the data, using crisp and fuzzy rough set-based methodologies, and several approaches to feature selection based on rough set theory are experimentally compared.
Abstract: Semantics-preserving dimensionality reduction refers to the problem of selecting those input features that are most predictive of a given outcome; a problem encountered in many areas such as machine learning, pattern recognition, and signal processing. This has found successful application in tasks that involve data sets containing huge numbers of features (in the order of tens of thousands), which would be impossible to process further. Recent examples include text processing and Web content classification. One of the many successful applications of rough set theory has been to this feature selection area. This paper reviews those techniques that preserve the underlying semantics of the data, using crisp and fuzzy rough set-based methodologies. Several approaches to feature selection based on rough set theory are experimentally compared. Additionally, a new area in feature selection, feature grouping, is highlighted and a rough set-based feature grouping technique is detailed.

Proceedings ArticleDOI
27 Jun 2004
TL;DR: Empirically to what extent pure bottom-up attention can extract useful information about the location, size and shape of objects from images and how this information can be utilized to enable unsupervised learning of Objects from unlabeled images is investigated.
Abstract: A key problem in learning multiple objects from unlabeled images is that it is a priori impossible to tell which part of the image corresponds to each individual object, and which part is irrelevant clutter which is not associated to the objects. We investigate empirically to what extent pure bottom-up attention can extract useful information about the location, size and shape of objects from images and demonstrate how this information can be utilized to enable unsupervised learning of objects from unlabeled images. Our experiments demonstrate that the proposed approach to using bottom-up attention is indeed useful for a variety of applications.

Journal ArticleDOI
TL;DR: It is shown that an efficient face detection system does not require any costly local preprocessing before classification of image areas, and provides very high detection rate with a particularly low level of false positives, demonstrated on difficult test sets, without requiring the use of multiple networks for handling difficult cases.
Abstract: In this paper, we present a novel face detection approach based on a convolutional neural architecture, designed to robustly detect highly variable face patterns, rotated up to /spl plusmn/20 degrees in image plane and turned up to /spl plusmn/60 degrees, in complex real world images. The proposed system automatically synthesizes simple problem-specific feature extractors from a training set of face and nonface patterns, without making any assumptions or using any hand-made design concerning the features to extract or the areas of the face pattern to analyze. The face detection procedure acts like a pipeline of simple convolution and subsampling modules that treat the raw input image as a whole. We therefore show that an efficient face detection system does not require any costly local preprocessing before classification of image areas. The proposed scheme provides very high detection rate with a particularly low level of false positives, demonstrated on difficult test sets, without requiring the use of multiple networks for handling difficult cases. We present extensive experimental results illustrating the efficiency of the proposed approach on difficult test sets and including an in-depth sensitivity analysis with respect to the degrees of variability of the face patterns.

Journal ArticleDOI
TL;DR: A view-based approach to recognize humans from their gait by employing a hidden Markov model (HMM) and the statistical nature of the HMM lends overall robustness to representation and recognition.
Abstract: We propose a view-based approach to recognize humans from their gait. Two different image features have been considered: the width of the outer contour of the binarized silhouette of the walking person and the entire binary silhouette itself. To obtain the observation vector from the image features, we employ two different methods. In the first method, referred to as the indirect approach, the high-dimensional image feature is transformed to a lower dimensional space by generating what we call the frame to exemplar (FED) distance. The FED vector captures both structural and dynamic traits of each individual. For compact and effective gait representation and recognition, the gait information in the FED vector sequences is captured in a hidden Markov model (HMM). In the second method, referred to as the direct approach, we work with the feature vector directly (as opposed to computing the FED) and train an HMM. We estimate the HMM parameters (specifically the observation probability B) based on the distance between the exemplars and the image features. In this way, we avoid learning high-dimensional probability density functions. The statistical nature of the HMM lends overall robustness to representation and recognition. The performance of the methods is illustrated using several databases.

Proceedings ArticleDOI
23 Aug 2004
TL;DR: A new method for extracting features from palmprints using the competitive coding scheme and angular matching and the execution time for the whole process of verification, including preprocessing, feature extraction and final matching is 1s.
Abstract: There is increasing interest in the development of reliable, rapid and non-intrusive security control systems. Among the many approaches, biometrics such as palmprints provide highly effective automatic mechanisms for use in personal identification. This paper presents a new method for extracting features from palmprints using the competitive coding scheme and angular matching. The competitive coding scheme uses multiple 2-D Gabor filters to extract orientation information from palm lines. This information is then stored in a feature vector called the competitive code. The angular matching with an effective implementation is then defined for comparing the proposed codes, which can make over 9,000 comparisons within 1s. In our testing database of 7,752 palmprint samples from 386 palms, we can achieve a high genuine acceptance rate of 98.4% and a low false acceptance rate of 3/spl times/10/sup -6/%. The execution time for the whole process of verification, including preprocessing, feature extraction and final matching, is 1s.

Journal ArticleDOI
TL;DR: The task of classifying the types of moving vehicles in a distributed, wireless sensor network is investigated and a data set that consists of 820 MByte raw time series data, 70 MByte of preprocessed, extracted spectral feature vectors, and baseline classification results using the maximum likelihood classifier is compiled.

Proceedings ArticleDOI
27 Jun 2004
TL;DR: It is proved that an efficient, globally optimal algorithm exists for the co- embedding problem and an important sub-family of correspondence functions can be reduced to co-embedding prototypes and segments to N-D Euclidean space.
Abstract: We present an unsupervised technique for detecting unusual activity in a large video set using many simple features. No complex activity models and no supervised feature selections are used. We divide the video into equal length segments and classify the extracted features into prototypes, from which a prototype-segment co-occurrence matrix is computed. Motivated by a similar problem in document-keyword analysis, we seek a correspondence relationship between prototypes and video segments which satisfies the transitive closure constraint. We show that an important sub-family of correspondence functions can be reduced to co-embedding prototypes and segments to N-D Euclidean space. We prove that an efficient, globally optimal algorithm exists for the co-embedding problem. Experiments on various real-life videos have validated our approach.

Journal ArticleDOI
TL;DR: Recursive Feature Elimination and Zero-Norm Optimization which are based on the training of support vector machines (SVM) can provide more accurate solutions than standard filter methods for feature selection for EEG channels.
Abstract: Designing a brain computer interface (BCI) system one can choose from a variety of features that may be useful for classifying brain activity during a mental task. For the special case of classifying electroencephalogram (EEG) signals we propose the usage of the state of the art feature selection algorithms Recursive Feature Elimination and Zero-Norm Optimization which are based on the training of support vector machines (SVM) . These algorithms can provide more accurate solutions than standard filter methods for feature selection . We adapt the methods for the purpose of selecting EEG channels. For a motor imagery paradigm we show that the number of used channels can be reduced significantly without increasing the classification error. The resulting best channels agree well with the expected underlying cortical activity patterns during the mental tasks. Furthermore we show how time dependent task specific information can be visualized.

Journal ArticleDOI
TL;DR: A study is presented to compare the performance of gear fault detection using artificial neural networks (ANNs) and support vector machines (SMVs) and for most of the cases considered, the classification accuracy of SVM is better than ANN, without GA.

Proceedings ArticleDOI
27 Jun 2004
TL;DR: A dual-space LDA approach for face recognition is proposed to take full advantage of the discriminative information in the face space and outperforms existing LDA approaches.
Abstract: Linear discriminant analysis (LDA) is a popular feature extraction technique for face recognition. However, it often suffers from the small sample size problem when dealing with the high dimensional face data. Some approaches have been proposed to overcome this problem, but they are often unstable and have to discard some discriminative information. In this paper, a dual-space LDA approach for face recognition is proposed to take full advantage of the discriminative information in the face space. Based on a probabilistic visual model, the eigenvalue spectrum in the null space of within-class scatter matrix is estimated, and discriminant analysis is simultaneously applied in the principal and null subspaces of the within-class scatter matrix. The two sets of discriminative features are then combined for recognition. It outperforms existing LDA approaches.

Journal ArticleDOI
TL;DR: This work proposes a general approach for the design of 2D feature detectors from a class of steerable functions based on the optimization of a Canny-like criterion that yields operators that have a better orientation selectivity than the classical gradient or Hessian-based detectors.
Abstract: We propose a general approach for the design of 2D feature detectors from a class of steerable functions based on the optimization of a Canny-like criterion. In contrast with previous computational designs, our approach is truly 2D and provides filters that have closed-form expressions. It also yields operators that have a better orientation selectivity than the classical gradient or Hessian-based detectors. We illustrate the method with the design of operators for edge and ridge detection. We present some experimental results that demonstrate the performance improvement of these new feature detectors. We propose computationally efficient local optimization algorithms for the estimation of feature orientation. We also introduce the notion of shape-adaptable feature detection and use it for the detection of image corners.

Journal ArticleDOI
TL;DR: The new method provides greater weight to samples near the expected decision boundary, which tends to provide for increased classification accuracy and to reduce the effect of the singularity problem.
Abstract: In this paper, a new nonparametric feature extraction method is proposed for high-dimensional multiclass pattern recognition problems. It is based on a nonparametric extension of scatter matrices. There are at least two advantages to using the proposed nonparametric scatter matrices. First, they are generally of full rank. This provides the ability to specify the number of extracted features desired and to reduce the effect of the singularity problem. This is in contrast to parametric discriminant analysis, which usually only can extract L-1 (number of classes minus one) features. In a real situation, this may not be enough. Second, the nonparametric nature of scatter matrices reduces the effects of outliers and works well even for nonnormal datasets. The new method provides greater weight to samples near the expected decision boundary. This tends to provide for increased classification accuracy.

Proceedings ArticleDOI
17 May 2004
TL;DR: A novel approach to the combination of acoustic features and language information for a most robust automatic recognition of a speaker's emotion by applying belief network based spotting for emotional key-phrases is introduced.
Abstract: In this paper we introduce a novel approach to the combination of acoustic features and language information for a most robust automatic recognition of a speaker's emotion. Seven discrete emotional states are classified throughout the work. Firstly a model for the recognition of emotion by acoustic features is presented. The derived features of the signal-, pitch-, energy, and spectral contours are ranked by their quantitative contribution to the estimation of an emotion. Several different classification methods including linear classifiers, Gaussian mixture models, neural nets, and support vector machines are compared by their performance within this task. Secondly an approach to emotion recognition by the spoken content is introduced applying belief network based spotting for emotional key-phrases. Finally the two information sources are integrated in a soft decision fusion by using a neural net. The gain is evaluated and compared to other advances. Two emotional speech corpora used for training and evaluation are described in detail and the results achieved applying the propagated novel advance to speaker emotion recognition are presented and discussed.

Journal ArticleDOI
TL;DR: This paper first model face difference with three components: intrinsic difference, transformation difference, and noise, and builds a unified framework by using this face difference model and a detailed subspace analysis on the three components.
Abstract: PCA, LDA, and Bayesian analysis are the three most representative subspace face recognition approaches. In this paper, we show that they can be unified under the same framework. We first model face difference with three components: intrinsic difference, transformation difference, and noise. A unified framework is then constructed by using this face difference model and a detailed subspace analysis on the three components. We explain the inherent relationship among different subspace methods and their unique contributions to the extraction of discriminating information from the face difference. Based on the framework, a unified subspace analysis method is developed using PCA, Bayes, and LDA as three steps. A 3D parameter space is constructed using the three subspace dimensions as axes. Searching through this parameter space, we achieve better recognition performance than standard subspace methods.