scispace - formally typeset
Open AccessJournal ArticleDOI

Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Reads0
Chats0
TLDR
The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost and to be fairly robust to parameter tuning.
Abstract
A probabilistic formulation for semantic image annotation and retrieval is proposed. Annotation and retrieval are posed as classification problems where each class is defined as the group of database images labeled with a common semantic label. It is shown that, by establishing this one-to-one correspondence between semantic labels and semantic classes, a minimum probability of error annotation and retrieval are feasible with algorithms that are 1) conceptually simple, 2) computationally efficient, and 3) do not require prior semantic segmentation of training images. In particular, images are represented as bags of localized feature vectors, a mixture density estimated for each image, and the mixtures associated with all images annotated with a common semantic label pooled into a density estimate for the corresponding semantic class. This pooling is justified by a multiple instance learning argument and performed efficiently with a hierarchical extension of expectation-maximization. The benefits of the supervised formulation over the more complex, and currently popular, joint modeling of semantic label and visual feature distributions are illustrated through theoretical arguments and extensive experiments. The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost. Finally, the proposed method is shown to be fairly robust to parameter tuning

read more

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI

Image-Based Recommendations on Styles and Substitutes

TL;DR: The approach is not based on fine-grained modeling of user annotations but rather on capturing the largest dataset possible and developing a scalable method for uncovering human notions of the visual relationships within.
Proceedings ArticleDOI

A new approach to cross-modal multimedia retrieval

TL;DR: It is shown that accounting for cross-modal correlations and semantic abstraction both improve retrieval accuracy and are shown to outperform state-of-the-art image retrieval systems on a unimodal retrieval task.
Proceedings ArticleDOI

Evaluating bag-of-visual-words representations in scene classification

TL;DR: This study provides an empirical basis for designing visual-word representations that are likely to produce superior classification performance and applies techniques used in text categorization to generate image representations that differ in the dimension, selection, and weighting of visual words.
Proceedings ArticleDOI

TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation

TL;DR: This work proposes TagProp, a discriminatively trained nearest neighbor model that allows the integration of metric learning by directly maximizing the log-likelihood of the tag predictions in the training set, and introduces a word specific sigmoidal modulation of the weighted neighbor tag predictions to boost the recall of rare words.
Journal ArticleDOI

Real-Time Computerized Annotation of Pictures

TL;DR: New optimization and estimation techniques to address two fundamental problems in machine learning are developed, which serve as the basis for the Automatic Linguistic Indexing of Pictures - Real Time (ALIPR) system of fully automatic and high speed annotation for online pictures.
References
More filters
Journal ArticleDOI

Content-based image retrieval at the end of the early years

TL;DR: The working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap are discussed, as well as aspects of system engineering: databases, system architecture, and evaluation.
Journal ArticleDOI

Texture features for browsing and retrieval of image data

TL;DR: Comparisons with other multiresolution texture features using the Brodatz texture database indicate that the Gabor features provide the best pattern retrieval accuracy.
Journal ArticleDOI

Solving the multiple instance problem with axis-parallel rectangles

TL;DR: Three kinds of algorithms that learn axis-parallel rectangles to solve the multiple instance problem are described and compared, giving 89% correct predictions on a musk odor prediction task.
Related Papers (5)