Showing papers by "Yücel Yemez" published in 2003

Proceedings Article

Joint audio-video processing for biometric speaker identification

Alper Kanak¹, Engin Erzin¹, Yücel Yemez¹, A.M. Tekalp¹•Institutions (1)

Koç University¹

06 Jul 2003

TL;DR: A bimodal audio-visual speaker identification system that exploits not only the temporal and spatial correlations existing in the speech and video signals of a speaker, but also the cross-correlation between these two modalities.

Abstract: In this paper we present a bimodal audio-visual speaker identification system. The objective is to improve the recognition performance over conventional unimodal schemes. The proposed system exploits not only the temporal and spatial correlations existing in speech and video signals of a speaker, but also the cross-correlation between these two modalities. Lip images extracted for each video frame are transformed onto an eigenspace. The obtained eigenlip coefficients are interpolated to match the rate of the speech signal and fused with mel frequency cepstral coefficients (MFCC) of the corresponding speech signal. The resulting joint feature vectors are used to train and test a hidden Markov model (HMM) based identification system. Experimental results are also included for demonstration of the system performance.
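
The rate-matching and fusion step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `fuse_features`, the frame rates, the feature dimensions, and the choice of linear interpolation are all assumptions, since the abstract does not specify them.

```python
import numpy as np

def fuse_features(lip_coeffs, mfcc, video_rate, audio_rate):
    """Upsample per-video-frame lip features to the audio feature rate
    and concatenate them with the MFCC vectors.

    lip_coeffs: (n_video_frames, n_lip) array, e.g. eigenlip coefficients.
    mfcc:       (n_audio_frames, n_mfcc) array.
    Rates are in feature vectors per second (assumed values, not from the paper).
    """
    t_video = np.arange(lip_coeffs.shape[0]) / video_rate
    t_audio = np.arange(mfcc.shape[0]) / audio_rate
    # np.interp is 1-D, so interpolate each coefficient track separately.
    # Audio frames after the last video frame reuse the last video value.
    lip_up = np.column_stack([
        np.interp(t_audio, t_video, lip_coeffs[:, k])
        for k in range(lip_coeffs.shape[1])
    ])
    # Joint feature vector: audio features followed by visual features.
    return np.hstack([mfcc, lip_up])

# Hypothetical example: 3 video frames at 25 Hz, 9 audio frames at 100 Hz.
lip = np.array([[0.0, 0.0], [4.0, 8.0], [8.0, 16.0]])
mfcc = np.zeros((9, 13))
joint = fuse_features(lip, mfcc, video_rate=25.0, audio_rate=100.0)
```

Each row of `joint` would then serve as one observation vector for an HMM-based classifier.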

47 citations

Proceedings Article

Multimodal speaker identification with audio-video processing

Yücel Yemez¹, Alper Kanak¹, Engin Erzin¹, A.M. Tekalp¹•Institutions (1)

Koç University¹

24 Nov 2003

TL;DR: The proposed system decomposes the information existing in a video stream into three components: speech, face texture and lip motion. Lip motion and speech features are fused to train and test a Hidden Markov Model (HMM) based identification system, while face texture is integrated through decision fusion.

Abstract: In this paper we present a multimodal audio-visual speaker identification system. The objective is to improve the recognition performance over conventional unimodal schemes. The proposed system decomposes the information existing in a video stream into three components: speech, face texture and lip motion. Lip motion between successive frames is first computed in terms of optical flow vectors and then encoded as a feature vector in a magnitude direction histogram domain. The feature vectors obtained along the whole stream are then interpolated to match the rate of the speech signal and fused with mel frequency cepstral coefficients (MFCC) of the corresponding speech signal. The resulting joint feature vectors are used to train and test a Hidden Markov Model (HMM) based identification system. Face texture images are treated separately in eigenface domain and integrated to the system through decision-fusion. Experimental results are also included for demonstration of the system performance.
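
The encoding of optical flow vectors in a magnitude-direction histogram domain can be sketched as below. This is only an illustration under stated assumptions: the function name `flow_histogram`, the bin counts, the magnitude cap `max_mag`, and the normalization are hypothetical choices, and the flow field is taken as given rather than computed.

```python
import numpy as np

def flow_histogram(flow, n_mag_bins=4, n_dir_bins=8, max_mag=8.0):
    """Encode a dense flow field as a joint magnitude/direction histogram.

    flow: (H, W, 2) array holding (dx, dy) per pixel of the lip region.
    Returns a normalized feature vector of length n_mag_bins * n_dir_bins.
    """
    dx = flow[..., 0].ravel()
    dy = flow[..., 1].ravel()
    # Clip large motions into the last magnitude bin.
    mag = np.clip(np.hypot(dx, dy), 0.0, max_mag)
    # Map direction to [0, 2*pi).
    ang = np.mod(np.arctan2(dy, dx), 2.0 * np.pi)
    hist, _, _ = np.histogram2d(
        mag, ang,
        bins=[n_mag_bins, n_dir_bins],
        range=[[0.0, max_mag], [0.0, 2.0 * np.pi]],
    )
    total = hist.sum()
    if total > 0:
        hist = hist / total  # make the feature independent of region size
    return hist.ravel()

# Hypothetical 2x2 flow field with a single moving pixel.
flow = np.zeros((2, 2, 2))
flow[0, 0] = (3.0, 0.0)
h = flow_histogram(flow)
```

One such vector per pair of successive frames gives the lip-motion feature stream that is later interpolated to the speech feature rate.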

15 citations

Journal Article

Multilevel representation and transmission of real objects with progressive octree particles

Yücel Yemez¹, Francis Schmitt•Institutions (1)

Koç University¹

01 Oct 2003 - IEEE Transactions on Visualization and Computer Graphics

TL;DR: A multilevel representation scheme adapted to storage, progressive transmission, and rendering of dense data sampled on the surface of real objects using surface particles associated to a hierarchical space partitioning based on an octree.

Abstract: We present a multilevel representation scheme adapted to storage, progressive transmission, and rendering of dense data sampled on the surface of real objects. Geometry and object attributes, such as color and normal, are encoded in terms of surface particles associated to a hierarchical space partitioning based on an octree. Appropriate ordering of surface particles results in a compact multilevel representation without increasing the size of the uniresolution model corresponding to the highest level of detail. This compact representation can progressively be decoded by the viewer and transformed by a fast direct triangulation technique into a sequence of triangle meshes with increasing levels of detail. The representation requires approximately 5 bits per particle (2.5 bits per triangle) to encode the basic geometrical structure. The vertex positions can then be refined by means of additional precision bits, resulting in 5 to 9 bits per triangle for representing a 12-bit quantized geometry. The proposed representation scheme is demonstrated with the surface data of various real objects.
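
The idea of ordering octree data level by level so that a viewer can decode it progressively can be illustrated with a plain child-occupancy code. This is a generic textbook-style sketch, not the particle ordering of the paper: the function name `octree_occupancy` and the one-byte-per-node mask are assumptions, and this naive code spends 8 bits per occupied node rather than the roughly 5 bits per particle reported in the abstract.

```python
import numpy as np

def octree_occupancy(points, depth):
    """Breadth-first octree occupancy codes for quantized surface points.

    points: (N, 3) integer coordinates in [0, 2**depth).
    Returns one list per level; each entry is an 8-bit mask telling which
    of a node's eight children contain at least one point. Nodes within a
    level are listed in sorted (x, y, z) order of their cell coordinates.
    """
    pts = np.asarray(points, dtype=np.int64)
    levels = []
    for level in range(depth):
        shift = depth - level - 1
        cells = pts >> shift          # child cells at this refinement step
        parents = cells >> 1          # the nodes being subdivided
        octant = ((cells[:, 0] & 1) << 2) | ((cells[:, 1] & 1) << 1) | (cells[:, 2] & 1)
        masks = {}
        for parent, o in zip(map(tuple, parents), octant):
            masks[parent] = masks.get(parent, 0) | (1 << int(o))
        levels.append([masks[key] for key in sorted(masks)])
    return levels

# Two points at opposite corners of a 4x4x4 grid (depth 2).
codes = octree_occupancy([(0, 0, 0), (3, 3, 3)], depth=2)
```

Decoding only the first k lists reconstructs the surface at a resolution of 2**k cells per axis, and each further list refines it, which is the progressive behavior the abstract describes.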

10 citations