
Showing papers by "Yücel Yemez" published in 2003


Proceedings Article
06 Jul 2003
TL;DR: A bimodal audio-visual speaker identification system that exploits not only the temporal and spatial correlations existing in the speech and video signals of a speaker, but also the cross-correlation between these two modalities.
Abstract: In this paper we present a bimodal audio-visual speaker identification system. The objective is to improve recognition performance over conventional unimodal schemes. The proposed system exploits not only the temporal and spatial correlations existing in the speech and video signals of a speaker, but also the cross-correlation between these two modalities. Lip images extracted from each video frame are projected onto an eigenspace. The resulting eigenlip coefficients are interpolated to match the frame rate of the speech signal and fused with the mel-frequency cepstral coefficients (MFCC) of the corresponding speech signal. The joint feature vectors are used to train and test a hidden Markov model (HMM) based identification system. Experimental results are included to demonstrate the system's performance.
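The feature-level fusion described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the eigenspace is obtained here by plain PCA (via SVD) on flattened lip crops, the interpolation is linear, and the frame rates (25 video frames/s, 100 MFCC frames/s), crop size, and number of eigenlip coefficients are hypothetical choices. The MFCC matrix is synthetic placeholder data, and the function names (`fit_eigenlips`, `project`, `upsample`, `fuse`) are illustrative only.

```python
import numpy as np

def fit_eigenlips(train_lips, n_components):
    """PCA on flattened lip images; returns the mean image and the top principal axes."""
    X = train_lips.reshape(len(train_lips), -1).astype(float)
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal axes ("eigenlips"), sorted by variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(lips, mean, basis):
    """Eigenlip coefficients: one row of n_components values per video frame."""
    X = lips.reshape(len(lips), -1).astype(float)
    return (X - mean) @ basis.T

def upsample(features, src_rate, dst_rate, n_dst):
    """Linearly interpolate each coefficient track from src_rate to dst_rate (frames/s).
    Linear interpolation is an assumption; np.interp holds end values outside the source span."""
    t_src = np.arange(len(features)) / src_rate
    t_dst = np.arange(n_dst) / dst_rate
    return np.stack([np.interp(t_dst, t_src, features[:, k])
                     for k in range(features.shape[1])], axis=1)

def fuse(mfcc, visual):
    """Feature-level fusion: concatenate the audio and visual vectors of each speech frame."""
    return np.concatenate([mfcc, visual], axis=1)

# Synthetic demo data (hypothetical sizes and rates).
rng = np.random.default_rng(0)
lips = rng.random((50, 16, 24))       # 2 s of 16x24 lip crops at 25 frames/s
mean, basis = fit_eigenlips(lips, n_components=10)
coeffs = project(lips, mean, basis)   # shape (50, 10)
mfcc = rng.random((200, 13))          # 2 s of 13-dim MFCC vectors at 100 frames/s
visual = upsample(coeffs, src_rate=25.0, dst_rate=100.0, n_dst=len(mfcc))
joint = fuse(mfcc, visual)
print(joint.shape)                    # (200, 23)
```

Concatenation is the simplest form of feature-level fusion; the joint vectors, one per speech frame, are what an HMM-based identifier would then model.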

47 citations


Proceedings Article
24 Nov 2003
TL;DR: The proposed system decomposes the information in a video stream into three components (speech, face texture, and lip motion); lip-motion features are fused with speech features to train and test a Hidden Markov Model (HMM) based identification system, and face texture is added through decision fusion.
Abstract: In this paper we present a multimodal audio-visual speaker identification system. The objective is to improve recognition performance over conventional unimodal schemes. The proposed system decomposes the information in a video stream into three components: speech, face texture, and lip motion. Lip motion between successive frames is first computed as optical flow vectors and then encoded as a feature vector in a magnitude-direction histogram domain. The feature vectors obtained along the whole stream are then interpolated to match the frame rate of the speech signal and fused with the mel-frequency cepstral coefficients (MFCC) of the corresponding speech signal. The resulting joint feature vectors are used to train and test a Hidden Markov Model (HMM) based identification system. Face texture images are treated separately in the eigenface domain and integrated into the system through decision fusion. Experimental results are included to demonstrate the system's performance.
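Two steps in this abstract are easy to misread: the magnitude-direction histogram and the decision-fusion stage. The Python sketch below shows one plausible reading of each; it is not the authors' code. The bin counts, the magnitude cap, and the min-max score normalization are hypothetical; the flow field (`dx`, `dy`) is assumed to come from any dense optical-flow estimator; and the weighted-sum rule in `fuse_decisions` is an illustrative stand-in for whatever fusion rule the paper actually uses.

```python
import numpy as np

def flow_histogram(dx, dy, n_mag=4, n_dir=8, max_mag=4.0):
    """Encode one dense flow field (per-pixel dx, dy) as a normalized
    magnitude-direction histogram, flattened into a feature vector."""
    mag = np.minimum(np.hypot(dx, dy).ravel(), max_mag)  # clip large motions into the top bin
    ang = np.arctan2(dy, dx).ravel() % (2 * np.pi)       # direction in [0, 2*pi)
    hist, _, _ = np.histogram2d(mag, ang, bins=[n_mag, n_dir],
                                range=[[0.0, max_mag], [0.0, 2 * np.pi]])
    return (hist / hist.sum()).ravel()                   # length n_mag * n_dir

def fuse_decisions(av_scores, face_scores, w=0.7):
    """Hypothetical decision fusion: min-max normalize each classifier's
    per-speaker scores (higher = better), take a weighted sum, return the winner."""
    def norm(s):
        s = np.asarray(s, dtype=float)
        return (s - s.min()) / (s.max() - s.min() + 1e-12)
    combined = w * norm(av_scores) + (1.0 - w) * norm(face_scores)
    return int(np.argmax(combined))

# Uniform rightward motion of 1.5 pixels/frame over an 8x8 lip region.
feat = flow_histogram(np.full((8, 8), 1.5), np.zeros((8, 8)))
print(feat.argmax())   # 8: magnitude bin 1, direction bin 0

# Audio-visual HMM log-likelihoods and eigenface similarities for three enrolled speakers.
print(fuse_decisions([-100.0, -90.0, -95.0], [0.2, 0.1, 0.9], w=0.7))  # 1
print(fuse_decisions([-100.0, -90.0, -95.0], [0.2, 0.1, 0.9], w=0.3))  # 2
```

The weight `w` controls how much the fused audio-visual stream dominates the face classifier; the two calls above show the decision flipping as that weight changes.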

15 citations


Journal Article
TL;DR: A multilevel representation scheme adapted to storage, progressive transmission, and rendering of dense data sampled on the surface of real objects, using surface particles associated with a hierarchical space partitioning based on an octree.
Abstract: We present a multilevel representation scheme adapted to storage, progressive transmission, and rendering of dense data sampled on the surface of real objects. Geometry and object attributes, such as color and normals, are encoded in terms of surface particles associated with a hierarchical space partitioning based on an octree. Appropriate ordering of the surface particles yields a compact multilevel representation that is no larger than the uniresolution model corresponding to the highest level of detail. This compact representation can be progressively decoded by the viewer and transformed by a fast direct triangulation technique into a sequence of triangle meshes with increasing levels of detail. The representation requires approximately 5 bits per particle (2.5 bits per triangle) to encode the basic geometrical structure. The vertex positions can then be refined by means of additional precision bits, resulting in 5 to 9 bits per triangle for representing a 12-bit quantized geometry. The proposed representation scheme is demonstrated with the surface data of various real objects.
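The octree-based multilevel idea can be illustrated with a generic breadth-first occupancy code: every occupied node stores an 8-bit mask of its occupied children, and a decoder that stops after reading k levels recovers the occupied cells at a resolution of 2^k per axis. This is a minimal sketch assuming integer-quantized sample positions; it is not the paper's particle ordering or bit budget, and it spends a full 8 bits per occupied node with no entropy coding. The function names are hypothetical.

```python
import numpy as np

def encode_octree(points, depth):
    """Breadth-first octree occupancy code (generic sketch, not the paper's scheme).

    points: (N, 3) integer coordinates in [0, 2**depth).
    Returns one list per level holding the 8-bit child-occupancy mask of every
    occupied node at that level, in sorted node order.
    """
    pts = np.unique(np.asarray(points, dtype=np.int64), axis=0)
    weights = np.array([4, 2, 1])                       # octant = 4*x_bit + 2*y_bit + z_bit
    levels = []
    for level in range(depth):
        shift = depth - level
        parents = pts >> shift                          # occupied node at this level
        octant = ((pts >> (shift - 1)) & 1) @ weights   # child of that node holding the point
        masks = {}
        for p, o in zip(map(tuple, parents), octant):
            masks[p] = masks.get(p, 0) | (1 << int(o))
        levels.append([masks[k] for k in sorted(masks)])
    return levels

def decode_octree(levels, upto):
    """Occupied cells (integer coordinates in [0, 2**upto)) after reading `upto` levels."""
    cells = [(0, 0, 0)]
    for masks in levels[:upto]:
        children = []
        for (x, y, z), m in zip(cells, masks):
            for o in range(8):
                if (m >> o) & 1:
                    children.append((2 * x + ((o >> 2) & 1),
                                     2 * y + ((o >> 1) & 1),
                                     2 * z + (o & 1)))
        cells = sorted(children)   # same node order the encoder used for the next level
    return cells

pts = [[0, 0, 0], [7, 7, 7], [7, 6, 7]]   # toy 3-bit quantized samples
levels = encode_octree(pts, depth=3)
print(levels[0])                 # [129]: root has children 0 and 7
print(decode_octree(levels, 1))  # coarse: [(0, 0, 0), (1, 1, 1)]
print(decode_octree(levels, 3))  # full: [(0, 0, 0), (7, 6, 7), (7, 7, 7)]
```

Decoding only the first levels yields a coarse version of the same point set, which is what makes progressive transmission possible. As for the triangle figures quoted in the abstract, a closed genus-0 triangle mesh with V vertices has F = 2V − 4 faces by Euler's formula. If each surface particle becomes one mesh vertex (an assumption about the triangulation), a cost of about 5 bits per particle therefore corresponds to roughly 2.5 bits per triangle.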

10 citations