
Showing papers by "Emmanuel Dellandréa published in 2011"


Journal ArticleDOI
01 Oct 2011
TL;DR: This paper proposes a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). The approach relies on a statistical model, called the 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark.
Abstract: Three-dimensional face landmarking aims at automatically localizing facial landmarks and has a wide range of applications (e.g., face recognition, face tracking, and facial expression analysis). Existing methods assume neutral facial expressions and unoccluded faces. In this paper, we propose a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). Our approach relies on a statistical model, called the 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark. Based on this model, we further propose an occlusion classifier and a fitting algorithm. Results from experiments on three publicly available 3-D face databases (FRGC, BU-3DFE, and Bosphorus) demonstrate the effectiveness of our approach, in terms of landmarking accuracy and robustness, in the presence of expressions and occlusions.

72 citations


Book ChapterDOI
09 Oct 2011
TL;DR: Two textual features are developed to capture the emotional meaning of text: one is based on the semantic distance matrix between the text and an emotional dictionary, and the other carries the valence and arousal meanings of words.
Abstract: Many images carry strong emotional semantics. In recent years, several investigations have attempted to automatically identify the emotions induced in viewers looking at images, based on low-level image properties. Since these features can only capture the image atmosphere, they may fail when the emotional semantics are carried by objects. Additional information is therefore needed, and we propose in this paper to make use of textual information describing the image, such as tags. We have thus developed two textual features to capture the emotional meaning of text: one is based on the semantic distance matrix between the text and an emotional dictionary, and the other carries the valence and arousal meanings of words. Experiments have been conducted on two datasets to evaluate visual and textual features and their fusion. The results show that our textual features can improve the classification accuracy of affective images.
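As a rough illustration of the second textual feature described above, the following is a minimal Python sketch of computing valence and arousal scores from image tags, assuming an ANEW-style lexicon mapping words to (valence, arousal) ratings; the lexicon entries and tag list are illustrative placeholders, not the paper's actual data or code.

```python
# Hypothetical sketch: a valence/arousal feature built from image tags,
# given an ANEW-style lexicon loaded as {word: (valence, arousal)}.
def affect_feature(tags, lexicon):
    """Average valence and arousal over the tags covered by the lexicon."""
    scores = [lexicon[t] for t in tags if t in lexicon]
    if not scores:
        return (0.0, 0.0)  # fallback when no tag is covered by the lexicon
    valence = sum(v for v, _ in scores) / len(scores)
    arousal = sum(a for _, a in scores) / len(scores)
    return (valence, arousal)

# Example with made-up lexicon entries (ANEW ratings lie on a 1-9 scale).
anew_like = {"sunset": (7.2, 4.5), "war": (2.0, 7.3), "beach": (8.0, 5.5)}
print(affect_feature(["sunset", "beach", "sky"], anew_like))  # -> (7.6, 5.0)
```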

43 citations


19 Sep 2011
TL;DR: In this article, two kinds of textual features are proposed to extract semantic meaning from the text associated with images: one is based on a semantic distance matrix between the text and a semantic dictionary, and the other carries valence and arousal meanings by making use of the Affective Norms for English Words (ANEW) dataset.
Abstract: In this paper, we focus on one of the ImageCLEF tasks in which the LIRIS-Imagine research group participated: visual concept detection and annotation. For this task, we first propose two kinds of textual features to extract semantic meaning from the text associated with images: one is based on a semantic distance matrix between the text and a semantic dictionary, and the other carries valence and arousal meanings by making use of the Affective Norms for English Words (ANEW) dataset. Meanwhile, we investigate the efficiency of different visual features, including color, texture, shape, and high-level features, and we test four fusion methods (min, max, mean, and score) to combine the various features and improve performance. The results show that the combination of our textual features and visual features can improve performance significantly.
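The min, max, and mean fusion rules mentioned above are simple late-fusion operators over per-feature classifier scores; a hedged Python sketch follows (the "score" fusion method is not detailed in the abstract, so only the first three are shown).

```python
# Late fusion of per-feature confidence scores for each concept.
import numpy as np

def fuse(scores, method="mean"):
    """scores: array of shape (n_feature_channels, n_concepts)."""
    scores = np.asarray(scores, dtype=float)
    if method == "min":
        return scores.min(axis=0)
    if method == "max":
        return scores.max(axis=0)
    if method == "mean":
        return scores.mean(axis=0)
    raise ValueError(f"unknown fusion method: {method}")

# Example: three feature channels (e.g., color, texture, text) scoring four concepts.
per_feature = [[0.2, 0.8, 0.4, 0.1],
               [0.3, 0.6, 0.5, 0.2],
               [0.1, 0.9, 0.3, 0.3]]
print(fuse(per_feature, "mean"))  # -> [0.2, 0.767, 0.4, 0.2]
```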

14 citations


Proceedings Article
05 Mar 2011
TL;DR: A new approach is proposed that combines different classifiers based on the Dempster-Shafer theory of evidence, which has the ability to handle ambiguous and uncertain knowledge such as the properties of emotions.
Abstract: Recognition of emotional semantics in images is a new and very challenging research direction that is gaining increasing attention in the research community. As it is an emerging topic, publications remain relatively rare and numerous issues need to be addressed. In this paper, we investigate the efficiency of different types of features, including low-level features and the proposed semantic features, for the classification of emotional semantics in images. Moreover, we propose a new approach that combines different classifiers based on the Dempster-Shafer theory of evidence, which has the ability to handle ambiguous and uncertain knowledge such as the properties of emotions. Experiments conducted on the International Affective Picture System (IAPS) image database, a stimulus set frequently used in emotion psychology research, demonstrate that the proposed approach can achieve promising results.
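For readers unfamiliar with evidence theory, classifier combination of this kind relies on Dempster's rule of combination; the following is a minimal, generic Python sketch of that rule (not the paper's implementation), where focal elements are subsets of a small frame of emotion classes.

```python
# Dempster's rule of combination for two basic belief assignments (mass functions).
# Focal elements are represented as frozensets over the frame of discernment.
def dempster_combine(m1, m2):
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # mass assigned to incompatible hypotheses
    if conflict >= 1.0:
        raise ValueError("total conflict: the two mass functions cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# Two classifiers expressing belief over a toy frame {positive, negative}.
m1 = {frozenset({"pos"}): 0.6, frozenset({"pos", "neg"}): 0.4}
m2 = {frozenset({"pos"}): 0.5, frozenset({"neg"}): 0.3, frozenset({"pos", "neg"}): 0.2}
print(dempster_combine(m1, m2))
```

The normalization by 1 - K, where K is the conflict mass, is what lets the combined evidence remain a valid mass function even when the classifiers partially disagree.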

11 citations


Journal ArticleDOI
TL;DR: This paper proposes an automatically elaborated hierarchical classification scheme (ACS), driven by an evidence theory-based embedded feature-selection scheme (ESFS), for application-dependent emotion recognition.
Abstract: Current machine-based techniques for vocal emotion recognition only consider a finite number of clearly labeled emotional classes, whereas the kinds of emotional classes and their number are typically application dependent. Previous studies have shown that multistage classification schemes, because of the ambiguous nature of affect classes, help to improve emotion classification accuracy. However, these multistage classification schemes were manually elaborated by taking into account the underlying emotional classes to be discriminated. In this paper, we propose an automatically elaborated hierarchical classification scheme (ACS), driven by an evidence theory-based embedded feature-selection scheme (ESFS), for application-dependent emotion recognition. Evaluated on the Berlin dataset with 68 features and six emotion states, this automatically elaborated hierarchical classifier (ACS) proved effective, achieving a 71.38% classification accuracy compared to the 71.52% achieved by our previous dimensional-model-driven, but still manually elaborated, multistage classifier (DEC). On the DES dataset with five emotion states, our ACS achieved a 76.74% recognition rate compared to the 81.22% accuracy of a manually elaborated multistage classification scheme (DEC).
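As a very loose illustration of what a multistage (hierarchical) classification scheme looks like, here is a hedged Python sketch that recursively splits the remaining emotion classes into two groups and routes samples with a binary SVM; the split criterion used here (k-means on class centroids) is an arbitrary illustrative choice, not the ESFS-driven construction proposed in the paper.

```python
# Hypothetical hierarchical classifier: a binary tree over emotion classes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

class Node:
    def __init__(self, classes):
        self.classes, self.clf, self.children = list(classes), None, None

def build(X, y, classes):
    node = Node(classes)
    if len(classes) == 1:
        return node
    # Split classes into two groups by clustering their centroids (illustrative criterion).
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(centroids)
    left = [c for c, g in zip(classes, groups) if g == 0]
    right = [c for c, g in zip(classes, groups) if g == 1]
    mask = np.isin(y, classes)
    node.clf = SVC().fit(X[mask], np.isin(y[mask], left).astype(int))  # label 1 -> left group
    node.children = (build(X, y, right), build(X, y, left))
    return node

def predict(node, x):
    while len(node.classes) > 1:
        node = node.children[int(node.clf.predict(x.reshape(1, -1))[0])]
    return node.classes[0]

# Synthetic example: 68-dimensional features and six emotion states.
rng = np.random.default_rng(0)
X, y = rng.random((300, 68)), rng.integers(0, 6, size=300)
tree = build(X, y, classes=list(range(6)))
print(predict(tree, X[0]))
```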

4 citations


Proceedings ArticleDOI
29 Aug 2011
TL;DR: The results show that the approach outperforms a purely reconstructive sparse representation, which indicates that adding a discriminative term when constructing the sparse representation is better suited to the categorization purpose.
Abstract: Sparse representation was originally used in signal processing as a powerful tool for acquiring, representing, and compressing high-dimensional signals. Recently, motivated by the great successes it has achieved, it has become a hot research topic in computer vision and pattern recognition. In this paper, we propose to adapt sparse representation to the problem of Visual Object Categorization, which aims at predicting whether one or several objects of some given categories are present in an image. We have thus elaborated a reconstructive and discriminative sparse representation of images, which integrates a discriminative term, such as a Fisher discriminative measure or the output of an SVM classifier, into the standard sparse representation objective function in order to learn a reconstructive and discriminative dictionary. Experiments carried out on the SIMPLIcity image dataset show that our reconstructive and discriminative approach yields a clear improvement in classification accuracy compared to a standard SVM using image features as input. Moreover, the results show that our approach outperforms a purely reconstructive sparse representation, which indicates that adding a discriminative term when constructing the sparse representation is better suited to the categorization purpose.
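The abstract does not give the exact objective, but a hedged sketch of the kind of formulation it describes, written in generic notation, is:

```latex
\min_{D,\,A}\; \|X - DA\|_F^2 \;+\; \lambda \|A\|_1 \;+\; \gamma\, \mathcal{L}_{\mathrm{disc}}(A, y)
```

where X holds the image features, D is the learned dictionary, A the sparse codes, the first two terms form the standard reconstructive sparse coding objective, and L_disc(A, y) stands for a discriminative term on the codes (for example, a Fisher-style between/within-class criterion or an SVM loss); lambda and gamma are assumed trade-off weights.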

2 citations


Proceedings Article
01 Jan 2011
TL;DR: It is shown how OMNIA can be used for simple, efficient, and intuitive asset search in the context of graphic design applications.
Abstract: This paper describes OMNIA, a system and interface for searching multimodal image collections. OMNIA includes a set of tools that allow the user to retrieve assets using different features. The tools are based on extracting different types of asset features, namely content, aesthetics, and emotion. Visual features are used to retrieve assets with each of these tools; in addition, text-based features can be used to retrieve image assets based on content. Different datasets are used in OMNIA, and retrieved assets are displayed in a way that facilitates user navigation. It is shown how OMNIA can be used for simple, efficient, and intuitive asset search in the context of graphic design applications.

2 citations


Proceedings ArticleDOI
06 Dec 2011
TL;DR: A statistical AU space is proposed for the purpose of AU interpretation, and similarity scores from previously proposed statistical feature models are used to define the coordinates of an expression displayed in a facial scan.
Abstract: A commonly accepted postulate is that facial expression recognition (FER) can be carried out by interpreting facial action units (AUs) through high-level decision-making rules. Meanwhile, most studies on AU-based FER simply detect AUs and do not map their AU detection results to expressions. In this paper, we propose to build a statistical AU space for the purpose of AU interpretation. Similarity scores from previously proposed statistical feature models are used to define the coordinates of an expression displayed in a facial scan. These scores are then fed to an SVM classifier to interpret the expression as one of the six universal emotions. The preliminary results demonstrate the potential effectiveness of applying the AU space to FER through AU interpretation.
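A minimal sketch of the final step described above (feeding AU-space coordinates to an SVM) is given below; the number of AUs, the score values, and the labels are random placeholders rather than outputs of the statistical feature models.

```python
# Hypothetical example: classifying AU-space coordinates into six universal emotions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_scans, n_aus = 120, 16                  # assume 16 AU similarity scores per facial scan
X = rng.random((n_scans, n_aus))          # stand-in for similarity scores from the feature models
y = rng.integers(0, 6, size=n_scans)      # anger, disgust, fear, happiness, sadness, surprise

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X[:5]))                 # expressions predicted from AU-space coordinates
```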

1 citation


01 Jan 2011
TL;DR: This paper summarizes the approach submitted to the Semantic Indexing (SIN) task in TRECVID 2011, which adopts the bag-of-features method to transform the original visual and audio features into histogram features using a pre-trained codebook.
Abstract: This is the first time that our team has participated in TRECVID. This paper summarizes our approach submitted to the Semantic Indexing (SIN) task in TRECVID 2011. Our approach adopts the bag-of-features method to transform the original visual and audio features into histogram features using a pre-trained codebook. After feature transformation, one-versus-others SVMs with a chi-square kernel are trained. In the decision step, the averaged probability is calculated as the final score to rank shots. Under this framework, we tested four visual features (dense-grid SIFT, color SIFT, OLBPC, and DAISY) together with one audio feature consisting of MFCCs with delta and acceleration coefficients. Our audio-visual combination model achieves the best results in terms of mean xinfAP. In addition, considering the huge amount of data this year, we employed several speedup strategies such as GPU-accelerated k-means clustering and the homogeneous kernel map. These efforts rank us 12th out of 19 teams in the full run and 13th out of 27 teams in the light run.
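The pipeline shape described above (bag-of-features histograms followed by a chi-square-kernel SVM) can be sketched roughly as below; the codebook, descriptors, and labels are random placeholders, and the real system additionally averages probabilities across several feature channels.

```python
# Hedged sketch: bag-of-features encoding + chi-square kernel SVM for one concept.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

def bag_of_features(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest codeword; return a normalized histogram."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    hist = np.bincount(d2.argmin(axis=1), minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

rng = np.random.default_rng(0)
codebook = rng.random((64, 128))        # 64 codewords for SIFT-like 128-D descriptors
X = np.stack([bag_of_features(rng.random((200, 128)), codebook) for _ in range(50)])
y = rng.integers(0, 2, size=50)         # one-versus-others labels for a single concept

svm = SVC(kernel=chi2_kernel, probability=True).fit(X, y)
print(svm.predict_proba(X[:3])[:, 1])   # per-shot concept scores, to be averaged across feature channels
```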

1 citation