
Showing papers by Yücel Yemez published in 2004


Journal Article
TL;DR: A robust and accurate system for 3D reconstruction of real objects with high-resolution shape and texture is presented, together with a texture mapping strategy based on surface particles that adequately addresses photography-related problems such as inhomogeneous lighting, highlights and occlusion.

75 citations


Proceedings Article
24 Oct 2004
TL;DR: This paper addresses the selection of the best lip motion features for biometric open-set speaker identification using a probabilistic measure that maximizes the ratio of intra-class to inter-class probabilities.
Abstract: This paper addresses the selection of the best lip motion features for biometric open-set speaker identification. The best features are those that yield the highest discrimination among individual speakers in a population. We first detect the face region in each video frame. The lip region in each frame is then segmented after successive face regions are registered by global motion compensation. The initial lip feature vector consists of the 2D-DCT coefficients of the optical flow vectors within the lip region at each frame. The discriminant analysis has two stages. In the first stage, the most discriminative features are selected from the full set of DCT coefficients of a single lip motion frame using a probabilistic measure that maximizes the ratio of intra-class to inter-class probabilities. In the second stage, the resulting discriminative feature vectors are interpolated and concatenated over a neighborhood of each time instant, then further analyzed by LDA to reduce dimension, this time taking temporal discrimination information into account. Experimental results of the HMM-based speaker identification system are included to demonstrate its performance.
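The second stage described above (interpolation and concatenation of per-frame feature vectors within a temporal neighborhood, before LDA) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the integer upsampling factor, linear interpolation, the symmetric window half-width `w`, and edge clamping are all assumptions, and the names `interpolate` and `stack_context` are hypothetical. The LDA projection that follows is omitted.

```python
from typing import List

Frames = List[List[float]]


def interpolate(frames: Frames, factor: int) -> Frames:
    """Linearly upsample a feature sequence by an integer factor (assumed rate change)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out: Frames = []
    for a, b in zip(frames, frames[1:]):
        for s in range(factor):
            t = s / factor
            # Blend the two neighbouring frames element by element.
            out.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    if frames:
        out.append(list(frames[-1]))  # keep the final original frame
    return out


def stack_context(frames: Frames, w: int) -> Frames:
    """Concatenate each frame with its w left and w right neighbours (edges clamped)."""
    n = len(frames)
    stacked: Frames = []
    for t in range(n):
        vec: List[float] = []
        for k in range(t - w, t + w + 1):
            # Clamp out-of-range indices to the first/last frame.
            vec.extend(frames[min(max(k, 0), n - 1)])
        stacked.append(vec)
    return stacked


# Usage: three 1-D frames, upsampled x2, then stacked with a +/-1 frame window.
frames = [[0.0], [2.0], [4.0]]
up = interpolate(frames, 2)   # [[0.0], [1.0], [2.0], [3.0], [4.0]]
ctx = stack_context(up, 1)
print(ctx[0])                 # [0.0, 0.0, 1.0]
```

Clamping repeats the first and last frames at sequence boundaries, so every time instant receives a stacked vector of the same length, which a subsequent LDA projection would require.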

22 citations


Proceedings Article
28 Apr 2004
TL;DR: Experimental results indicate that the resulting discriminative feature vector with reduced dimension improves identification performance.
Abstract: The paper addresses the selection of the best lip motion features for biometric open-set speaker identification. The best features are those that yield the highest discrimination among individual speakers in a population. We first detect the face region in each video frame. The lip region in each frame is then segmented after successive face regions are registered by global motion compensation. The initial lip feature vector consists of the 2D-DCT coefficients of the optical flow vectors within the lip region at each frame. We propose to select the most discriminative features from the full set of transform coefficients using a probabilistic measure that maximizes the ratio of intra-class to inter-class probabilities. The resulting discriminative feature vector with reduced dimension is expected to maximize identification performance. Experimental results indicate that this reduced-dimension discriminative feature vector improves identification performance.
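The initial feature extraction described above can be illustrated with a direct 2D DCT-II. This sketch rests on assumptions: it takes the horizontal and vertical optical-flow components over the segmented lip region as two equally sized 2-D arrays (the abstract does not say how the flow vectors are arranged), applies an orthonormal DCT-II to each, and flattens all coefficients into one vector. `dct2` and `lip_motion_vector` are hypothetical names, and a production system would use a fast DCT rather than this O(M^2 N^2) loop.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct2(block: Matrix) -> Matrix:
    """Orthonormal 2-D DCT-II of an M x N block, computed directly (O(M^2 N^2))."""
    m, n = len(block), len(block[0])
    out = [[0.0] * n for _ in range(m)]
    for u in range(m):
        au = math.sqrt((1.0 if u == 0 else 2.0) / m)  # row normalisation
        for v in range(n):
            av = math.sqrt((1.0 if v == 0 else 2.0) / n)  # column normalisation
            s = 0.0
            for x in range(m):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * m))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * n)))
            out[u][v] = au * av * s
    return out


def lip_motion_vector(flow_x: Matrix, flow_y: Matrix) -> List[float]:
    """Flatten the DCT coefficients of the horizontal and vertical flow components."""
    vec: List[float] = []
    for component in (flow_x, flow_y):
        for row in dct2(component):
            vec.extend(row)
    return vec


# Usage: a uniform rightward flow over a 4x4 lip region.
fx = [[1.0] * 4 for _ in range(4)]
fy = [[0.0] * 4 for _ in range(4)]
v = lip_motion_vector(fx, fy)
print(round(v[0], 6), len(v))  # 4.0 32
```

For uniform motion, all energy lands in the DC coefficient; because the transform is orthonormal, the total energy of the coefficients equals that of the flow field, which keeps coefficient magnitudes comparable across region sizes.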

2 citations


Proceedings Article
01 Jan 2004
TL;DR: This paper addresses the selection of the best lip motion features for biometric open-set speaker identification using a probabilistic measure that maximizes the ratio of intra-class to inter-class probabilities; the resulting discriminative feature vector with reduced dimension is expected to maximize identification performance.
Abstract: This paper addresses the selection of the best lip motion features for biometric open-set speaker identification. The best features are those that yield the highest discrimination among individual speakers in a population. We first detect the face region in each video frame. The lip region in each frame is then segmented after successive face regions are registered by global motion compensation. The initial lip feature vector consists of the 2D-DCT coefficients of the optical flow vectors within the lip region at each frame. We propose to select the most discriminative features from the full set of transform coefficients using a probabilistic measure that maximizes the ratio of intra-class to inter-class probabilities. The resulting discriminative feature vector with reduced dimension is expected to maximize identification performance. Experimental results are also included to demonstrate the performance.
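One plausible reading of the probabilistic selection measure is sketched below; it is not the paper's exact formulation. It assumes a 1-D Gaussian per speaker for each feature, scores each feature by the mean log ratio of the intra-class likelihood (the sample's own speaker) to the average inter-class likelihood (all other speakers), and keeps the `d` highest-scoring features. The helpers `discrimination_score` and `select_features` and the smoothing constant `eps` are hypothetical.

```python
import math
from statistics import mean, pvariance
from typing import Dict, List

# Training data: speaker id -> list of per-frame feature vectors (e.g. DCT coefficients).
Samples = Dict[str, List[List[float]]]


def gaussian_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def discrimination_score(samples: Samples, j: int, eps: float = 1e-6) -> float:
    """Mean log ratio of intra-class to inter-class likelihood for feature j.

    Assumes one 1-D Gaussian per speaker and at least two speakers.
    """
    params = {}
    for spk, vecs in samples.items():
        vals = [vec[j] for vec in vecs]
        params[spk] = (mean(vals), pvariance(vals) + eps)  # eps avoids zero variance
    total, count = 0.0, 0
    for spk, vecs in samples.items():
        others = [p for other, p in params.items() if other != spk]
        for vec in vecs:
            intra = gaussian_pdf(vec[j], *params[spk])
            inter = mean(gaussian_pdf(vec[j], mu, var) for mu, var in others)
            total += math.log((intra + eps) / (inter + eps))
            count += 1
    return total / count


def select_features(samples: Samples, d: int) -> List[int]:
    """Indices of the d features with the highest discrimination score."""
    n_features = len(next(iter(samples.values()))[0])
    ranked = sorted(range(n_features),
                    key=lambda j: discrimination_score(samples, j),
                    reverse=True)
    return sorted(ranked[:d])


# Usage: feature 0 separates the two speakers, feature 1 does not.
train = {
    "A": [[0.0, 1.0], [0.2, -1.0], [-0.1, 0.5]],
    "B": [[10.0, 0.9], [10.3, -0.8], [9.8, 0.4]],
}
print(select_features(train, 1))  # [0]
```

Scoring each coefficient independently keeps selection cheap, but it ignores correlations between coefficients; the abstract of the two-stage variant handles the remaining redundancy with LDA afterwards.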

1 citation


Proceedings Article
04 Oct 2004
TL;DR: A new adaptive cascade rule is proposed that favors reliable modality combinations through a cascade of classifiers; it is more robust in the presence of unreliable modalities and outperforms the hard-level max rule and the soft-level weighted summation rule.
Abstract: We present a multimodal open-set speaker identification system that integrates information from the audio, face and lip motion modalities. For fusion of multiple modalities, we propose a new adaptive cascade rule that favors reliable modality combinations through a cascade of classifiers. The order of the classifiers in the cascade is determined adaptively based on the reliability of each modality combination. A novel reliability measure, tailored to the open-set speaker identification problem, is also proposed to assess the accept or reject decisions of a classifier. The proposed adaptive rule is more robust in the presence of unreliable modalities and outperforms the hard-level max rule and the soft-level weighted summation rule, provided that the employed reliability measure effectively assesses classifier decisions. Experimental results supporting this assertion are provided.
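The adaptive cascade rule can be sketched roughly as follows. The abstract specifies neither the reliability measure nor the exact cascade logic, so this is a hypothetical illustration: each modality-combination classifier is assumed to supply an ordering score (the reliability of the combination) and a per-decision reliability. The cascade visits classifiers in descending ordering score and accepts the first decision whose reliability reaches a threshold. If no decision is reliable enough, it rejects the claim by returning `None`. `Decision`, `adaptive_cascade`, and the threshold values are illustrative names, not the authors'.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Decision:
    modalities: str          # hypothetical label, e.g. "audio+lip"
    speaker: Optional[str]   # claimed identity, or None if the classifier rejects
    order_score: float       # assumed reliability of the modality combination
    reliability: float       # assumed reliability of this particular decision


def adaptive_cascade(decisions: List[Decision], threshold: float) -> Optional[str]:
    """Visit classifiers from the most to the least reliable combination and
    return the first decision whose own reliability reaches the threshold;
    reject (None) if no decision is reliable enough."""
    for d in sorted(decisions, key=lambda dec: dec.order_score, reverse=True):
        if d.reliability >= threshold:
            return d.speaker
    return None


# Usage with made-up scores for three modality combinations.
decisions = [
    Decision("audio", "spk3", order_score=0.4, reliability=0.2),
    Decision("audio+lip", "spk1", order_score=0.9, reliability=0.8),
    Decision("face", "spk2", order_score=0.6, reliability=0.9),
]
print(adaptive_cascade(decisions, 0.5))   # spk1
print(adaptive_cascade(decisions, 0.85))  # spk2 (audio+lip falls through)
print(adaptive_cascade(decisions, 0.95))  # None (open-set reject)
```

Separating the ordering score from the per-decision reliability is what lets an unreliable top-ranked combination fall through to the next stage; with a single score, the cascade would collapse to picking the most reliable classifier.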