
Showing papers by "Hazim Kemal Ekenel published in 2011"


Proceedings ArticleDOI
20 Jun 2011
TL;DR: A common framework for real-time action unit detection and emotion recognition is presented that was developed for the emotion recognition and action unit detection sub-challenges of the FG 2011 Facial Expression Recognition and Analysis Challenge.
Abstract: In this paper, we present a common framework for real-time action unit detection and emotion recognition that we have developed for the emotion recognition and action unit detection sub-challenges of the FG 2011 Facial Expression Recognition and Analysis Challenge. For these tasks we employed a local appearance-based face representation approach using the discrete cosine transform, which has been shown to be very effective and robust for face recognition. Using these features, we trained multiple one-versus-all support vector machine classifiers corresponding to the individual classes of the specific task. With this framework we achieve 24.2% and 7.6% absolute improvement over the overall baseline results on the emotion recognition and action unit detection sub-challenges, respectively.
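The method pairs block-wise DCT features with one-versus-all classifiers. A minimal sketch of the two pieces, assuming a per-block 2-D DCT with low-frequency coefficients kept in zig-zag order and a linear stand-in for the SVM decision (function names and the linear scoring are illustrative, not the authors' code):

```python
import math

def dct_1d(v):
    """Orthonormal 1-D DCT-II of a list of floats."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """2-D DCT-II of a square block (list of rows), applied separably."""
    n = len(block)
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[i][j] for i in range(n)]) for j in range(n)]
    # transpose back so coeffs[u][v] indexes (vertical, horizontal) frequency
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def zigzag_features(block, n_coeffs=5):
    """Keep the lowest-frequency DCT coefficients in zig-zag order, DC first."""
    n = len(block)
    coeffs = dct_2d(block)
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[1] if (p[0] + p[1]) % 2 else p[0]))
    return [coeffs[i][j] for i, j in order[:n_coeffs]]

def one_vs_all_predict(feats, classifiers):
    """classifiers: {label: (weights, bias)}; choose the highest-scoring class."""
    scores = {c: sum(w * f for w, f in zip(wb[0], feats)) + wb[1]
              for c, wb in classifiers.items()}
    return max(scores, key=scores.get)
```

For a uniform block, only the DC coefficient is nonzero, which is the usual sanity check for a DCT feature extractor.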

30 citations


Journal ArticleDOI
TL;DR: The proposed system employs a face tracker that can track faces up to full profile views and provides temporal association of the face images in the video, so that instead of using single images for query or target, whole tracks can be used.
Abstract: In this paper, we present a system for person re-identification in TV series. In the context of video retrieval, person re-identification refers to the task where a user clicks on a person in a video frame and the system then finds other occurrences of the same person in the same or different videos. The main characteristic of this scenario is that no previously collected training data is available, so no person-specific models can be trained in advance. Additionally, the query data is limited to the image that the user clicks on. These conditions pose a great challenge to the re-identification system, which has to find the same person in other shots despite large variations in the person's appearance. In this study, facial appearance is used as the re-identification cue, since, in contrast to surveillance-oriented re-identification studies, the person can have different clothing in different shots. In order to increase the amount of available face data, the proposed system employs a face tracker that can track faces up to full profile views. This makes it possible to use a profile face image as the query image and also to retrieve images with non-frontal poses. It also provides temporal association of the face images in the video, so that instead of using single images for query or target, whole tracks can be used. A fast and robust face recognition algorithm is used to find matching faces. If the match result is highly confident, our system adds the matching face track to the query set. Finally, if the user is not satisfied with the number of returned results, the system can present a small number of candidate face images and let the user confirm the ones that belong to the queried person. These features help to increase the variation in the query set, making it possible to retrieve results with different poses, illumination conditions, etc. The system is extensively evaluated on two episodes of the TV series Coupling, showing very promising results.
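The query-expansion step, where highly confident matches are added back to the query set, can be sketched as follows. The similarity function, thresholds, and track representation are placeholders; the paper's actual matcher is a face recognition algorithm over tracked faces:

```python
def re_identify(query_tracks, gallery_tracks, similarity, accept_thr, expand_thr):
    """Rank gallery face tracks against a query set; matches above
    expand_thr are folded back into the query set, so later gallery
    tracks can match poses/illuminations absent from the original query.
    similarity(a, b) returns a score in [0, 1]."""
    query = list(query_tracks)
    results = []
    for track in gallery_tracks:
        score = max(similarity(q, track) for q in query)
        if score >= accept_thr:
            results.append((track, score))
            if score >= expand_thr:
                query.append(track)  # highly confident match: enrich the query set
    return sorted(results, key=lambda r: r[1], reverse=True)
```

The expansion matters because a gallery track that is too dissimilar to the original click may still be close to an already-accepted track.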

26 citations


Proceedings ArticleDOI
01 Nov 2011
TL;DR: This work proposes a framework for simultaneously detecting the presence of multiple facial action units using kernel partial least square regression (KPLS), which has the advantage of being easily extensible to learn more face related labels, while at the same time being computationally efficient.
Abstract: In this work, we propose a framework for simultaneously detecting the presence of multiple facial action units using kernel partial least squares regression (KPLS). This method has the advantage of being easily extensible to learn more face-related labels, while at the same time being computationally efficient. We compare the approach to linear and non-linear support vector machines (SVM) and evaluate its performance on the extended Cohn-Kanade (CK+) dataset and the GEneva Multimodal Emotion Portrayals (GEMEP-FERA) dataset, as well as across databases. It is shown that KPLS achieves around 2% absolute improvement over the SVM-based approach in terms of the two-alternative forced choice (2AFC) score when trained on CK+ and tested on CK+ and GEMEP-FERA. It achieves around 6% absolute improvement over the SVM-based approach when trained on GEMEP-FERA and tested on CK+. We also show that KPLS handles non-additive AU combinations better than SVM-based approaches trained to detect single AUs only.
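The 2AFC score used for evaluation is the probability that a randomly chosen positive sample outscores a randomly chosen negative one, with ties counted as half; this is equivalent to the ROC AUC. A direct, if quadratic, computation:

```python
def two_afc(pos_scores, neg_scores):
    """Two-alternative forced choice (2AFC) score: P(random positive
    outscores random negative), ties counted as 0.5.  Equals ROC AUC."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

A sort-based implementation brings this to O(n log n) for large score lists, but the nested loop above states the definition most plainly.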

23 citations


Proceedings ArticleDOI
01 Jan 2011
TL;DR: This paper presents a new discriminative face model based on boosting pseudo census transform features, considered to be less sensitive to illumination changes, which yields a more robust alignment algorithm.
Abstract: Face alignment using a deformable face model has attracted broad interest in recent years for its wide range of applications in facial analysis. Previous work has shown that discriminative deformable models have better generalization capacity than generative models [8, 9]. In this paper, we present a new discriminative face model based on boosting pseudo census transform features. This feature is considered to be less sensitive to illumination changes, which yields a more robust alignment algorithm. The alignment is based on maximizing the score of a boosted strong classifier, which indicates whether the current alignment is correct or incorrect. The proposed approach has been evaluated extensively on several databases. The experimental results show that our approach generalizes better on unseen data than the Haar feature-based approach. Moreover, its training procedure is much faster due to the low dimensionality of the configuration space of the proposed feature.
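Census-transform-style features encode local ordinal structure, which survives monotonic illumination changes. The paper's exact pseudo census transform definition is not reproduced here; the sketch below shows one plausible census-style variant (comparison against the patch mean with a small tolerance is an assumption, in the spirit of the modified census transform):

```python
def pseudo_census(patch, tol=2):
    """Census-style 9-bit code for a 3x3 patch (list of 3 rows): each
    pixel is compared against the patch mean; the tolerance makes the
    code less sensitive to noise.  Adding a constant brightness offset
    to every pixel leaves the code unchanged."""
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / 9.0
    code = 0
    for p in pixels:
        code = (code << 1) | (1 if p > mean + tol else 0)
    return code
```

A boosted strong classifier would then score an alignment hypothesis by summing lookup-table weights indexed by such codes at chosen landmark positions.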

7 citations


Proceedings ArticleDOI
29 Dec 2011
TL;DR: A real-time face detector based on Gabor features is described and an efficient discrete encoding method for the Gabor feature vector is proposed that enables us to use a computationally efficient multi-stage classifier based on boosting and winnowing.
Abstract: We describe a real-time face detector based on Gabor features. While Gabor features often lead to improved performance, they are often avoided because they are perceived as computationally expensive. We address this in two ways. First, we propose an efficient discrete encoding method for the Gabor feature vector. This enables us to use a computationally efficient multi-stage classifier based on boosting and winnowing. Second, we accelerate the remaining expensive computations using the parallelism provided by graphics processing units (GPUs). With these innovations, the resulting detector runs at 16.8 fps on 640 × 480 images on a PC equipped with an i5 CPU and a GTX 465 graphics card.
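The point of a discrete encoding is that once each real-valued Gabor coefficient is quantized to a few levels, the boosted multi-stage classifier can score a feature with a single table lookup instead of floating-point arithmetic. A hedged sketch, where the bin count, calibration range, and table layout are assumptions rather than the paper's scheme:

```python
def quantize_features(feats, lo, hi, n_bins=8):
    """Map each real-valued Gabor coefficient to one of n_bins discrete
    levels over a calibrated range [lo, hi]."""
    span = hi - lo
    codes = []
    for f in feats:
        f = min(max(f, lo), hi)           # clamp to the calibrated range
        b = int((f - lo) / span * n_bins)
        codes.append(min(b, n_bins - 1))  # value hi falls in the top bin
    return codes

def lut_score(codes, tables):
    """Sum per-feature lookup-table weights (as learned by boosting)."""
    return sum(tables[i][c] for i, c in enumerate(codes))
```

A cascade would evaluate such lookup sums stage by stage, rejecting non-face windows early.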

6 citations


Proceedings ArticleDOI
20 Apr 2011
TL;DR: Magic Mirror is a face swapping tool that replaces the user's face with the face of a famous person selected from a database, via a user interface that enables selection of the replacement face and directly reflects the changed appearance.
Abstract: Magic Mirror is a face swapping tool that replaces the user's face with a selected famous person's face from a database. The system interacts with the user via a user interface which enables the selection of the replacement face and directly reflects the changed appearance. First, we apply a face detection mechanism to locate the face in the frame coming from the capturing device. Then, we feed the detection result to an active appearance model to get the exact shape of the face. Using the extracted information, we replace the user's face with the selected target face. We display the output after some post-processing for color and lighting adjustments.
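The color and lighting post-processing can be approximated by matching the per-channel statistics of the swapped-in face to the surrounding frame. The exact adjustment used by Magic Mirror is not specified in the abstract, so this mean/std color transfer is only an illustrative stand-in:

```python
def channel_stats(pixels, ch):
    """Mean and standard deviation of one color channel."""
    vals = [p[ch] for p in pixels]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var ** 0.5

def color_transfer(src_pixels, ref_pixels):
    """Shift and scale the swapped-in face region (src) so each channel's
    mean and std match the surrounding frame (ref).  Pixels are (r, g, b)
    tuples; output values are clamped to [0, 255]."""
    s = [channel_stats(src_pixels, ch) for ch in range(3)]
    r = [channel_stats(ref_pixels, ch) for ch in range(3)]
    out = []
    for p in src_pixels:
        q = []
        for ch in range(3):
            ms, ss = s[ch]
            mr, sr = r[ch]
            scale = sr / ss if ss > 1e-6 else 1.0
            q.append(min(255, max(0, round((p[ch] - ms) * scale + mr))))
        out.append(tuple(q))
    return out
```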

5 citations


15 Nov 2011
TL;DR: The Quaero group is a consortium of French and German organizations working on multimedia indexing and retrieval; LIG and KIT participated in the semantic indexing task, LIG helped organize that task, and the multimedia event detection task was addressed with a system derived from their generic concept-indexing system for videos.
Abstract: The Quaero group is a consortium of French and German organizations working on Multimedia Indexing and Retrieval. LIG and KIT participated in the semantic indexing task, and LIG participated in the organization of this task. LIG also participated in the multimedia event detection task. This paper describes these participations. For the semantic indexing task, our approach uses a six-stage processing pipeline for computing scores for the likelihood that a video shot contains a target concept. These scores are then used to produce a ranked list of the images or shots that are most likely to contain the target concept. The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. We used a number of different descriptors and a hierarchical fusion strategy. We also used conceptual feedback by adding a vector of classification scores to the pool of descriptors. The best Quaero run has a Mean Inferred Average Precision of 0.1529, which ranked us 3rd out of 19 participants. We participated in the multimedia event detection task with a system derived from the generic one we have for general-purpose concept indexing in videos, treating the target events as concepts. Detection scores on videos are produced from the scores on shots.

4 citations


Proceedings ArticleDOI
TL;DR: An appearance-based multimodal gesture recognition framework is presented, which combines different groups of features, such as facial expression features and hand motion features, extracted from image frames captured by a single web camera.
Abstract: The use of gesture as a natural interface plays a vital role in achieving intelligent Human Computer Interaction (HCI). Human gestures include different components of visual actions, such as motion of the hands, facial expression, and torso, to convey meaning. So far, in the field of gesture recognition, most previous work has focused on the manual component of gestures. In this paper, we present an appearance-based multimodal gesture recognition framework, which combines different groups of features, such as facial expression features and hand motion features, extracted from image frames captured by a single web camera. We consider 12 classes of human gestures with facial expressions conveying neutral, negative, and positive meanings, drawn from American Sign Language (ASL). We combine the features at two levels by employing two fusion strategies. At the feature level, an early feature combination is performed by concatenating and weighting the different feature groups, and LDA is used to choose the most discriminative elements by projecting the features onto a discriminative expression space. The second strategy is applied at the decision level: weighted decisions from the single modalities are fused in a later stage. A condensation-based algorithm is adopted for classification. We collected a data set with three to seven recording sessions and conducted experiments with the combination techniques. Experimental results showed that facial analysis improves hand gesture recognition and that decision-level fusion performs better than feature-level fusion.
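The decision-level strategy, which the experiments favored, weights the per-class scores produced by each modality and picks the best fused class. A minimal sketch (the dictionary layout and weights are illustrative, not the paper's implementation):

```python
def decision_fusion(modality_scores, weights):
    """Late (decision-level) fusion: each modality supplies per-class
    scores; the fused decision is the class with the highest weighted sum.
    modality_scores: {modality: {cls: score}}, weights: {modality: w}."""
    fused = {}
    for m, scores in modality_scores.items():
        for cls, s in scores.items():
            fused[cls] = fused.get(cls, 0.0) + weights[m] * s
    return max(fused, key=fused.get), fused
```

Note how a strong facial-expression cue can overturn the hand modality's top choice, which is exactly the behavior that lets facial analysis improve hand gesture recognition.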

1 citation


Proceedings Article
01 Jan 2011
TL;DR: A system which automatically detects a list of important targets such as anchor speakers or active politicians in broadcast news videos and achieves a very high precision with a reasonable recall rate is proposed.
Abstract: Automatic face identification in multimedia archives such as broadcast news videos is useful for indexing or retrieving documents based on important persons that appear in the video. In this paper, we propose a system which automatically detects a list of important targets such as anchor speakers or active politicians in broadcast news videos. This involves several steps including detecting faces in various conditions, associating faces to tracks and identifying whether a face track contains certain faces defined in a watch list. We evaluated this system on a database, which contains about 36 hours of broadcast news videos. Experiments show that our system achieves a very high precision with a reasonable recall rate.
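Track-level identification against a watch list can be sketched as averaging per-frame similarity scores over the face track and rejecting weak matches as unknown; the averaging and the "unknown" fallback are assumptions about the method, but the rejection threshold is what trades recall for the high precision the paper reports:

```python
def identify_track(frame_scores, watch_list, accept_thr):
    """Average each watch-list identity's per-frame similarity over the
    whole face track; return 'unknown' unless the best identity clears
    the acceptance threshold (raising it favors precision over recall).
    frame_scores: list of {person: similarity} dicts, one per frame."""
    best, best_score = "unknown", accept_thr
    for person in watch_list:
        avg = sum(f[person] for f in frame_scores) / len(frame_scores)
        if avg > best_score:
            best, best_score = person, avg
    return best
```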

1 citation


01 Jan 2011
TL;DR: This evaluation assesses the content-based system's performance on the diversified content of the blip.tv web-video corpus, which is described in detail in [5].
Abstract: In this paper, we run our content-based video genre classification system on the MediaEval evaluation corpus. Our system is based on several low-level audio-visual cues, as well as cognitive and structural information. The purpose of this evaluation is to assess our content-based system's performance on the diversified content of the blip.tv web-video corpus, which is described in detail in [5].

1 citation




Proceedings ArticleDOI
20 Apr 2011
TL;DR: In this paper, an initial study on an IMDB plug-in for cast identification in movies is presented: the user clicks on the face of a person of interest, the system detects and tracks the face to obtain a face sequence, and the sequence is matched against face image sets collected from the web.
Abstract: In this paper, we present an initial study on an IMDB plug-in for cast identification in movies. In the system, training face images are collected using Google image search. While watching a movie, the user clicks on the face of the person he is interested in to acquire information. Afterwards, the system first tries to detect close-to-frontal faces; if it cannot find any, it runs a profile face detector. The detected face is then tracked backwards and forwards within the shot, and in this way a face sequence is obtained. Matching is performed between the face sequence extracted from the movie and the face image sets collected from the web. IMDB page links for the three closest persons resulting from the matching process are then presented to the user. In this study, we addressed the following three points: matching between a face sequence and face image sets, the effect of automatically collected noisy training images from the web on performance, and, finally, the performance effect of utilizing prior information from the cast list and performing the classification within a limited number of classes. Experiments have shown that matching between a face sequence and face image sets is a difficult problem.
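Matching a face sequence against per-actor web image sets is a set-to-set comparison; one common choice is the minimum pairwise distance between the two sets. The sketch below (the distance function and feature representation are placeholders, and the paper does not specify its set distance) returns the three closest cast members, e.g. to link their IMDB pages:

```python
def rank_cast(track_feats, cast_sets, distance):
    """Match an extracted face sequence against per-actor web image sets.
    The track-to-set distance is the smallest pairwise distance between a
    track frame and a gallery image; the three closest actors are returned.
    cast_sets: {actor: [feature, ...]}."""
    dists = []
    for actor, image_feats in cast_sets.items():
        d = min(distance(t, g) for t in track_feats for g in image_feats)
        dists.append((d, actor))
    return [actor for _, actor in sorted(dists)[:3]]
```

Restricting `cast_sets` to the movie's known cast list is exactly the prior-information experiment the paper describes: it shrinks the candidate pool before matching.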