
Showing papers by "Emmanuel Dellandréa published in 2012"


26 Nov 2012
TL;DR: The IRIM group is a consortium of French teams working on Multimedia Indexing and Retrieval; this paper describes its participation in the TRECVID 2012 semantic indexing and instance search tasks.
Abstract: The IRIM group is a consortium of French teams working on Multimedia Indexing and Retrieval. This paper describes its participation in the TRECVID 2012 semantic indexing and instance search tasks. For the semantic indexing task, our approach uses a six-stage processing pipeline to compute scores for the likelihood of a video shot containing a target concept. These scores are then used to produce a ranked list of the images or shots most likely to contain the target concept. The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. We evaluated a number of different descriptors and tried different fusion strategies. The best IRIM run has a Mean Inferred Average Precision of 0.2378, which ranked us 4th out of 16 participants. For the instance search task, our approach uses two steps. First, individual methods of participants are used to compute the similarity between an example image of an instance and the keyframes of a video clip. Then a two-step fusion method combines these individual results to obtain a score for the likelihood of an instance appearing in a video clip. These scores are used to obtain a ranked list of the clips most likely to contain the queried instance. The best IRIM run has a MAP of 0.1192, which ranked us 29th out of 79 fully automatic runs.

54 citations
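The fusion-and-ranking stage described above can be sketched as a simple weighted late fusion: each descriptor "expert" scores every shot for a concept, and a weighted average of those scores yields the ranked list. This is a minimal illustration with made-up scores and uniform weights, not the paper's six-stage pipeline.

```python
import numpy as np

def fuse_and_rank(expert_scores, weights):
    """Weighted late fusion of per-expert shot scores for one concept.

    expert_scores: (n_experts, n_shots) array-like of concept scores.
    weights: (n_experts,) fusion weights (will be normalized).
    Returns the fused scores and the shot indices ranked best-first.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()              # normalize fusion weights
    fused = weights @ np.asarray(expert_scores)    # (n_shots,) fused scores
    ranking = np.argsort(-fused)                   # best shot first
    return fused, ranking

# Illustrative scores for 3 shots from 2 hypothetical descriptor experts.
scores = [[0.9, 0.2, 0.5],   # e.g. a color-descriptor expert
          [0.7, 0.1, 0.8]]   # e.g. a SIFT-descriptor expert
fused, ranking = fuse_and_rank(scores, [1.0, 1.0])
```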


Book ChapterDOI
07 Oct 2012
TL;DR: A novel multimodal approach automatically predicts the visual concepts of images through an effective fusion of visual and textual features; it relies on a Selective Weighted Late Fusion (SWLF) scheme that learns to select and weight the best experts for each visual concept to be recognized.
Abstract: We propose in this paper a novel multimodal approach to automatically predict the visual concepts of images through an effective fusion of visual and textual features. It relies on a Selective Weighted Late Fusion (SWLF) scheme which, by optimizing an overall Mean interpolated Average Precision (MiAP), learns to automatically select and weight the best experts for each visual concept to be recognized. Experiments were conducted on the MIR Flickr image collection within the ImageCLEF 2011 Photo Annotation challenge. The results have brought to the fore the effectiveness of SWLF: it achieved a MiAP of 43.69 % for the detection of the 99 visual concepts, which ranked 2nd out of the 79 submitted runs, while our new variant of SWLF reaches a MiAP of 43.93 %.

11 citations
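The SWLF idea can be sketched as a greedy per-concept selection: weight each expert by its individual validation performance, then iteratively add the expert whose inclusion most improves the fused score. The sketch below uses a plain average-precision measure and toy data rather than the paper's MiAP optimization; all names are illustrative.

```python
def average_precision(scores, labels):
    """Average precision of a score list against binary labels."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, 1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

def swlf_select(experts, labels, max_experts=3):
    """Greedily select and weight experts for one concept.

    experts: dict name -> per-sample score list on a validation set.
    Each expert is weighted by its individual average precision.
    """
    weights = {n: average_precision(s, labels) for n, s in experts.items()}
    chosen, best_ap = [], 0.0
    while len(chosen) < max_experts:
        best_cand = None
        for name in experts:
            if name in chosen:
                continue
            trial = chosen + [name]
            fused = [sum(weights[n] * experts[n][i] for n in trial)
                     for i in range(len(labels))]
            ap = average_precision(fused, labels)
            if ap > best_ap:                 # keep only improving experts
                best_ap, best_cand = ap, name
        if best_cand is None:
            break
        chosen.append(best_cand)
    return chosen, best_ap

# Toy validation data: one informative expert, one anti-correlated one.
experts = {"good": [0.9, 0.1, 0.8, 0.2], "bad": [0.1, 0.9, 0.2, 0.8]}
chosen, ap = swlf_select(experts, [1, 0, 1, 0])
```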


Proceedings ArticleDOI
27 Jun 2012
TL;DR: A novel music representation is presented that allows gaining an in-depth understanding of the music structure; its principle is to sparsely decompose the music over a basis of elementary audio elements, called musical words, which represent the notes played by various instruments generated through a MIDI synthesizer.
Abstract: Most of the automated music analysis methods available in the literature rely on the representation of the music through a set of low-level audio features related to temporal and frequential properties. Identifying high-level concepts, such as music mood, from this "black-box" representation is particularly challenging. We therefore present in this paper a novel music representation that allows gaining an in-depth understanding of the music structure. Its principle is to sparsely decompose the music over a basis of elementary audio elements, called musical words, which represent the notes played by various instruments generated through a MIDI synthesizer. From this representation, a music feature is also proposed to allow automatic music classification. Experiments conducted on two music datasets have shown the effectiveness of this approach for representing music signals accurately and for efficient classification on the complex problem of music mood classification.

8 citations
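The sparse decomposition over a dictionary of "musical words" can be illustrated with matching pursuit, a standard greedy sparse-coding algorithm; the paper's exact decomposition method may differ. The dictionary here is a trivial orthonormal basis rather than MIDI-synthesized note atoms.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms=2):
    """Greedy sparse decomposition of a signal over a dictionary.

    dictionary: (dim, n_words) matrix with unit-norm columns (atoms).
    Returns the sparse coefficient vector and the final residual.
    """
    residual = signal.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        corr = dictionary.T @ residual           # correlation with each atom
        k = int(np.argmax(np.abs(corr)))         # best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * dictionary[:, k]   # remove its contribution
    return coeffs, residual

# Trivial orthonormal dictionary for illustration only.
D = np.eye(3)
coeffs, residual = matching_pursuit(np.array([2.0, 0.0, 1.0]), D, n_atoms=2)
```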


01 Jan 2012
TL;DR: This paper proposes the Histogram of Textual Concepts (HTC) textual feature to capture the relatedness of semantic concepts, and introduces a Selective Weighted Late Fusion (SWLF) to combine multiple sources of information.
Abstract: In this paper, we present the methods we have proposed and evaluated through the ImageCLEF 2012 Photo Annotation task. More precisely, we have proposed the Histogram of Textual Concepts (HTC) textual feature to capture the relatedness of semantic concepts. In contrast to the term frequency-based text representations mostly used for visual concept detection and annotation, HTC relies on the semantic similarity between the user tags and a concept dictionary. Moreover, a Selective Weighted Late Fusion (SWLF) is introduced to combine multiple sources of information by iteratively selecting and weighting the best features for each concept to be classified. The results have shown…

4 citations
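The HTC feature described above can be sketched as follows: each bin accumulates the semantic relatedness between an image's user tags and one entry of the concept dictionary. The similarity table below is a hypothetical stand-in for a real semantic similarity measure.

```python
# Hypothetical tag-to-concept similarity scores (illustrative values only;
# a real system would use a learned or lexical similarity measure).
SIMILARITY = {
    ("beach", "sea"): 0.8, ("beach", "sky"): 0.4,
    ("sunset", "sea"): 0.3, ("sunset", "sky"): 0.9,
}

def htc_feature(tags, concepts):
    """One accumulated relatedness score per concept-dictionary entry."""
    return [sum(SIMILARITY.get((tag, concept), 0.0) for tag in tags)
            for concept in concepts]

# An image tagged "beach" and "sunset", scored against a 2-concept dictionary.
feature = htc_feature(["beach", "sunset"], ["sea", "sky"])
```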


Journal ArticleDOI
TL;DR: This paper presents two visual features for object recognition: a multi-scale Local Binary Pattern operator extracted from coarse-to-fine image blocks to describe texture structures well, and a line segment feature based on Gestalt-inspired region segmentation and a fast Hough transform to capture accurate geometric information.
Abstract: This paper presents two visual features for object recognition. One is a multi-scale Local Binary Pattern (LBP) operator extracted from coarse-to-fine image blocks to describe texture structures well. The other is a line segment feature based on Gestalt-inspired region segmentation and a fast Hough transform to capture accurate geometric information. Experiments on the SIMPLIcity database and the PASCAL VOC 2007 benchmark show the effectiveness of the line segment feature and a significant accuracy improvement from using fine-level blocks for LBP. Moreover, fusing LBP from different block levels further boosts the performance and outperforms the state-of-the-art SIFT. Both features also prove complementary to SIFT.

4 citations
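The LBP component can be illustrated with the standard 8-neighbour Local Binary Pattern code, on which the paper's multi-scale, block-based variant builds: each pixel is encoded by thresholding its neighbours against it, and a histogram of the codes describes the texture.

```python
def lbp_code(img, y, x):
    """8-bit LBP code for pixel (y, x) of a 2-D list of intensities."""
    center = img[y][x]
    # clockwise neighbour offsets starting at the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= center:   # neighbour at least as bright
            code |= 1 << bit
    return code

# Tiny 3x3 patch: the bottom-right neighbours exceed the center value 5.
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
code = lbp_code(img, 1, 1)
```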




Proceedings ArticleDOI
01 Oct 2012
TL;DR: It is shown that both methods can be concisely reformulated into matrix multiplications, which allows the application of the NVIDIA Compute Unified Device Architecture (CUDA) implementation of the Basic Linear Algebra Subprograms (CUBLAS) and the AMD Core Math Library (ACML), highly optimized matrix operation libraries for GPU and multi-core CPU.
Abstract: K-means clustering and GMM training, as dictionary learning procedures, lie at the heart of many signal processing applications. Increasing data scale requires more efficient ways to perform this process. In this paper, new GPU- and multi-core-CPU-accelerated k-means clustering and GMM training procedures are proposed. We show that both methods can be concisely reformulated into matrix multiplications, which allows the application of the NVIDIA Compute Unified Device Architecture (CUDA) implementation of the Basic Linear Algebra Subprograms (CUBLAS) and the AMD Core Math Library (ACML), highly optimized matrix operation libraries for GPU and multi-core CPU. Experiments on music genre and mood representation and classification have shown that dictionary learning is accelerated by factors of 38.0 for k-means clustering and 209.5 for GMM training, compared with single-threaded CPU execution, while the difference between the average classification accuracies is less than 1%.

1 citation
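The matrix-multiplication reformulation can be sketched for the k-means assignment step: expanding the squared distance as ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2 turns the cross term for all point/centroid pairs into a single matrix multiply, the operation that CUBLAS/ACML accelerate. A minimal NumPy sketch, illustrative rather than the paper's implementation:

```python
import numpy as np

def assign_clusters(X, C):
    """Nearest-centroid assignment via one matrix multiplication.

    X: (n, d) data points; C: (k, d) centroids.
    Returns the index of the closest centroid for each point.
    """
    x_sq = (X ** 2).sum(axis=1, keepdims=True)   # (n, 1) squared norms
    c_sq = (C ** 2).sum(axis=1)                  # (k,) squared norms
    cross = X @ C.T                              # (n, k) cross terms, one GEMM
    d2 = x_sq - 2.0 * cross + c_sq               # (n, k) squared distances
    return d2.argmin(axis=1)

# Two points, two centroids: each point should pick its nearby centroid.
X = np.array([[0.0, 0.0], [10.0, 10.0]])
C = np.array([[0.0, 1.0], [9.0, 9.0]])
labels = assign_clusters(X, C)
```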