Showing papers by "Emmanuel Dellandréa published in 2016"


Proceedings ArticleDOI
27 Jun 2016
TL;DR: Strong evidence is found that visual similarity and semantic relatedness are complementary for the task, and when combined notably improve detection, achieving state-of-the-art detection performance in a semi-supervised setting.
Abstract: Deep CNN-based object detection systems have achieved remarkable success on several large-scale object detection benchmarks. However, training such detectors requires a large number of labeled bounding boxes, which are more difficult to obtain than image-level annotations. Previous work addresses this issue by transforming image-level classifiers into object detectors. This is done by modeling the differences between the two on categories with both image-level and bounding box annotations, and transferring this information to convert classifiers into detectors for categories without bounding box annotations. We improve on this previous work by incorporating knowledge about object similarities from the visual and semantic domains during the transfer process. The intuition behind our proposed method is that visually and semantically similar categories should share more transferable properties than dissimilar ones; e.g., a better cat detector results from transferring the differences between a dog classifier and a dog detector than from transferring those of the violin class. Experimental results on the challenging ILSVRC2013 detection dataset demonstrate that each of our proposed object-similarity-based knowledge transfer methods outperforms the baseline methods. We found strong evidence that visual similarity and semantic relatedness are complementary for the task and, when combined, notably improve detection, achieving state-of-the-art detection performance in a semi-supervised setting.

178 citations
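
As a rough illustration of the transfer idea described in the abstract above, the sketch below combines visual similarity and semantic relatedness to weight the classifier-to-detector differences learned on fully annotated source categories before applying them to a weakly labeled target category. This is not the authors' implementation; the linear mixing of the two similarity cues, the weight shapes, and the function name are assumptions.

```python
import numpy as np

def transfer_detector_weights(w_cls_target, w_cls_src, w_det_src,
                              vis_sim, sem_sim, alpha=0.5):
    """Hypothetical similarity-weighted classifier-to-detector transfer.

    w_cls_target : (d,)   image-level classifier weights of the target category
    w_cls_src    : (k, d) classifier weights of source categories (with boxes)
    w_det_src    : (k, d) detector weights of the same source categories
    vis_sim      : (k,)   visual similarity of the target to each source category
    sem_sim      : (k,)   semantic relatedness of the target to each source category
    alpha        : mixing weight between the two similarity cues (assumed)
    """
    # Combine the two complementary similarity cues into one set of weights.
    sim = alpha * vis_sim + (1.0 - alpha) * sem_sim
    sim = sim / sim.sum()                  # normalise to a convex combination

    # Classifier-to-detector differences observed on the source categories.
    deltas = w_det_src - w_cls_src         # (k, d)

    # Transfer a similarity-weighted average of those differences to the target.
    return w_cls_target + sim @ deltas
```

Under this assumed formulation, setting alpha to 0 or 1 would use only one cue, while intermediate values correspond to the combined setting the abstract reports as most effective.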


Proceedings Article
23 Aug 2016
TL;DR: Since 2010, ImageCLEF has run a scalable image annotation task to promote research into annotating images using noisy web page data, and this year's edition offers interesting insights into these very relevant challenges.
Abstract: Since 2010, ImageCLEF has run a scalable image annotation task to promote research into the annotation of images using noisy web page data. It aims to develop techniques that allow computers to describe images reliably, localise different concepts depicted in them, and generate descriptions of the scenes. The primary goal of the challenge is to encourage creative ideas for using web page data to improve image annotation. Three subtasks and two pilot teaser tasks were available to participants; all tasks use a single mixed-modality data source of 510,123 web page items for both training and test. The dataset included raw images, textual features obtained from the web pages on which the images appeared, as well as extracted visual features. The dataset was formed by querying popular image search engines on the Web. For the main subtasks, the development and test sets were both taken from the "training set". For the teaser tasks, 200,000 web page items were reserved for testing, and a separate development set was provided. The 251 concepts were chosen to be visual objects that are localizable and useful for generating textual descriptions of the visual content of images, and were mined from the texts of our extensive database of image-webpage pairs. This year seven groups participated in the task, submitting over 50 runs across all subtasks, and all participants also provided working notes papers. In general, the groups' performance is impressive across the tasks, and there are interesting insights into these very relevant challenges.

32 citations


Journal ArticleDOI
TL;DR: This paper proposes an automatic emotion annotation solution on 2.5-D facial data collected from RGB-D cameras, consisting of a facial landmarking method and a facial expression recognition (FER) method, which have achieved satisfactory results on three publicly accessible facial databases.
Abstract: People with low vision, Alzheimer’s disease, and autism spectrum disorder experience difficulties in perceiving or interpreting facial expressions of emotion in their social lives. Though automatic facial expression recognition (FER) methods on 2-D videos have been extensively investigated, their performance is constrained by challenges in head pose and lighting conditions. The shape information in 3-D facial data can reduce or even overcome these challenges. However, the high cost of 3-D cameras prevents their widespread use. Fortunately, 2.5-D facial data from emerging portable RGB-D cameras provide a good balance in this dilemma. In this paper, we propose an automatic emotion annotation solution on 2.5-D facial data collected from RGB-D cameras. The solution consists of a facial landmarking method and an FER method. Specifically, we propose building a deformable partial face model and fitting it to a 2.5-D face to localize facial landmarks automatically. For FER, a novel action unit (AU) space-based method is proposed: facial features are extracted using the landmarks and represented as coordinates in the AU space, which are then classified into facial expressions. Evaluated on three publicly accessible facial databases, namely the EURECOM, FRGC, and Bosphorus databases, the proposed facial landmarking and expression recognition methods achieve satisfactory results. Possible real-world applications using our algorithms are also discussed.

29 citations
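
The pipeline described in the abstract above (landmark localisation on 2.5-D data, followed by an AU-space representation and expression classification) could be organised roughly as in the sketch below. The callable interfaces, the pairwise-distance features, and the linear AU projection are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def recognise_expression(depth_face, landmark_model, au_basis, classifier):
    """Hypothetical two-stage pipeline: landmarking, then AU-space FER.
    All callables and array shapes are assumptions for illustration."""
    # 1. Fit the deformable partial face model to the 2.5-D face
    #    to localise facial landmarks automatically.
    landmarks = landmark_model.fit(depth_face)            # (n_landmarks, 3)

    # 2. Extract geometric features from the landmark configuration
    #    (here, simply the pairwise distances as a stand-in feature).
    diffs = landmarks[:, None, :] - landmarks[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    features = dists[np.triu_indices(len(landmarks), 1)]  # upper triangle

    # 3. Project the features into the action-unit (AU) space; each
    #    coordinate is interpreted as an AU activation.
    au_coordinates = au_basis @ features                   # (n_aus,)

    # 4. Classify the AU-space coordinates into a facial expression label.
    return classifier.predict(au_coordinates[None, :])[0]
```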


22 Nov 2016
TL;DR: The MediaEval 2018 “Emotional Impact of Movies” task focuses on creating systems that automatically predict the emotional impact that video content will have on viewers, in terms of valence, arousal and fear.
Abstract: This paper provides a description of the MediaEval 2018 “Emotional Impact of Movies” task. It continues to build on last year’s edition, integrating the feedback of previous participants. The goal is to create systems that automatically predict the emotional impact that video content will have on viewers, in terms of valence, arousal and fear. Here we provide a description of the use case, task challenges, dataset and ground truth, task run requirements, and evaluation metrics.

25 citations
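
For a sense of what a participant system for this task might look like, the sketch below trains a naive baseline that regresses valence and arousal and classifies fear from per-clip features. The feature representation and the choice of Ridge and LogisticRegression models are assumptions; the task paper itself only defines the use case, data, ground truth and evaluation metrics.

```python
import numpy as np
from sklearn.linear_model import Ridge, LogisticRegression

def train_emotion_baseline(clip_features, valence, arousal, fear):
    """clip_features: (n_clips, d) audio-visual features per movie clip (assumed).
    valence, arousal: continuous annotations; fear: binary labels."""
    return {
        "valence": Ridge(alpha=1.0).fit(clip_features, valence),
        "arousal": Ridge(alpha=1.0).fit(clip_features, arousal),
        "fear":    LogisticRegression(max_iter=1000).fit(clip_features, fear),
    }

def predict_emotional_impact(models, clip_features):
    """Predict the three target dimensions for new clips."""
    return {
        "valence": models["valence"].predict(clip_features),
        "arousal": models["arousal"].predict(clip_features),
        "fear":    models["fear"].predict_proba(clip_features)[:, 1],
    }
```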



Patent
02 Feb 2016
TL;DR: In this article, a method for determining a personalized profile related to an emotion is proposed. It consists of: successively rendering items of a set, each associated with a level of compliance to the emotion and ordered according to those levels, until a stop indicator is received in response to the rendering of one of the items; partitioning the set of items into a subset of acceptable items and a subset of rejected items; acquiring at least one ranking value for each rejected item; and determining the personalized profile according to the acquired ranking values.
Abstract: A first method, for determining a personalized profile related to an emotion, comprises: successive rendering of items of a set of items, each associated with a level of compliance to the emotion and ordered according to the levels, until receiving, in response to the rendering of one of the items, a stop indicator; partitioning the set of items into a subset of acceptable items and a subset of rejected items according to the stop indicator; acquiring at least one ranking value for each rejected item; and determining the personalized profile according to the acquired ranking values. A second method, for filtering excerpts of a multimedia content, comprises: determining or receiving an emotional profile of the content; determining or receiving at least one personalized profile resulting from performing the first method; and filtering excerpts from the content, using the emotional profile and the at least one personalized profile.
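
A minimal sketch of the two claimed methods follows, under the assumption that items and excerpts carry numeric emotional levels. The exact partition rule at the stop indicator and the structure of the resulting profile are not specified in the abstract, so they are illustrative here.

```python
def determine_personalized_profile(items, render, stop_received, acquire_ranking):
    """Sketch of the first claimed method; all callables, the dict-based item
    representation, and the profile structure are assumptions.

    items           : items ordered by their level of compliance to the emotion,
                      e.g. [{"id": ..., "level": ...}, ...]
    render          : renders one item to the user
    stop_received   : True if the user sent a stop indicator for that rendering
    acquire_ranking : acquires a ranking value for a rejected item
    """
    acceptable, rejected = list(items), []
    for i, item in enumerate(items):            # successive rendering, in order
        render(item)
        if stop_received(item):                 # stop indicator received
            # Partition according to the stop indicator (assumed: the item that
            # triggered the stop and all following items are rejected).
            acceptable, rejected = items[:i], items[i:]
            break

    rankings = [acquire_ranking(item) for item in rejected]
    # The abstract does not specify the aggregation, so the profile here simply
    # records the highest accepted compliance level and the ranking values.
    max_level = acceptable[-1]["level"] if acceptable else None
    return {"max_acceptable_level": max_level, "rejected_rankings": rankings}


def filter_excerpts(excerpt_levels, personalized_profile):
    """Sketch of the second claimed method: keep excerpts whose level in the
    content's emotional profile does not exceed the user's acceptable level
    (the filtering rule itself is assumed).

    excerpt_levels : dict mapping each excerpt id to its emotional level
                     (the content's emotional profile)
    """
    threshold = personalized_profile["max_acceptable_level"]
    if threshold is None:
        return []
    return [e for e, level in excerpt_levels.items() if level <= threshold]
```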