Supervised Learning of Semantic Classes for Image Annotation and Retrieval
TL;DR: The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost and to be fairly robust to parameter tuning.
Abstract:
A probabilistic formulation for semantic image annotation and retrieval is proposed. Annotation and retrieval are posed as classification problems where each class is defined as the group of database images labeled with a common semantic label. It is shown that, by establishing this one-to-one correspondence between semantic labels and semantic classes, minimum-probability-of-error annotation and retrieval are feasible with algorithms that 1) are conceptually simple, 2) are computationally efficient, and 3) do not require prior semantic segmentation of training images. In particular, images are represented as bags of localized feature vectors, a mixture density is estimated for each image, and the mixtures associated with all images annotated with a common semantic label are pooled into a density estimate for the corresponding semantic class. This pooling is justified by a multiple instance learning argument and performed efficiently with a hierarchical extension of expectation-maximization. The benefits of the supervised formulation over the more complex, and currently popular, joint modeling of semantic label and visual feature distributions are illustrated through theoretical arguments and extensive experiments. The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost. Finally, the proposed method is shown to be fairly robust to parameter tuning.
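The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance Gaussian components, it pools per-image mixtures by plain averaging rather than by the hierarchical extension of expectation-maximization the abstract mentions, and it treats the feature vectors of a bag as independent. All function names and the toy numbers are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def log_mixture(x, mixture):
    """Log density of a mixture given as a list of (weight, mean, var)."""
    return logsumexp(
        [math.log(w) + log_gauss_diag(x, mu, var) for w, mu, var in mixture]
    )

def pool_image_mixtures(image_mixtures):
    """Assumption: naive pooling. Average the per-image mixtures of one
    semantic label into a single class mixture (not hierarchical EM)."""
    n = len(image_mixtures)
    return [(w / n, mu, var) for mix in image_mixtures for (w, mu, var) in mix]

def annotate(bag, class_mixtures, class_priors, top_k=1):
    """Rank labels by log prior plus summed log-likelihood of the bag,
    i.e. the Bayes decision rule under the independence assumption."""
    scores = {
        label: math.log(class_priors[label])
        + sum(log_mixture(x, mix) for x in bag)
        for label, mix in class_mixtures.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data: per-image mixtures in a 2-D feature space (hypothetical values).
sky_images = [
    [(1.0, [0.0, 0.0], [1.0, 1.0])],
    [(0.5, [0.5, 0.0], [1.0, 1.0]), (0.5, [-0.5, 0.0], [1.0, 1.0])],
]
grass_images = [[(1.0, [5.0, 5.0], [1.0, 1.0])]]

classes = {
    "sky": pool_image_mixtures(sky_images),
    "grass": pool_image_mixtures(grass_images),
}
priors = {"sky": 0.5, "grass": 0.5}

query_bag = [[0.1, -0.2], [0.3, 0.4]]
print(annotate(query_bag, classes, priors))
```

Averaging keeps every component of every training image, so the class model grows with the number of images; the hierarchical EM step described in the abstract is what would replace `pool_image_mixtures` to obtain a compact class mixture.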
Citations
Proceedings Article
Image-Based Recommendations on Styles and Substitutes
TL;DR: The approach is not based on fine-grained modeling of user annotations but rather on capturing the largest dataset possible and developing a scalable method for uncovering human notions of the visual relationships within.
Proceedings Article
A new approach to cross-modal multimedia retrieval
Nikhil Rasiwasia, Jose Costa Pereira, Emanuele Coviello, Gabriel Doyle, Gert R. G. Lanckriet, Roger Levy, Nuno Vasconcelos
TL;DR: It is shown that accounting for cross-modal correlations and semantic abstraction both improve retrieval accuracy, and the cross-modal model is also shown to outperform state-of-the-art image retrieval systems on a unimodal retrieval task.
Proceedings Article
Evaluating bag-of-visual-words representations in scene classification
TL;DR: This study provides an empirical basis for designing visual-word representations that are likely to produce superior classification performance and applies techniques used in text categorization to generate image representations that differ in the dimension, selection, and weighting of visual words.
Proceedings Article
TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation
TL;DR: This work proposes TagProp, a discriminatively trained nearest neighbor model that allows the integration of metric learning by directly maximizing the log-likelihood of the tag predictions in the training set, and introduces a word specific sigmoidal modulation of the weighted neighbor tag predictions to boost the recall of rare words.
Journal Article
Real-Time Computerized Annotation of Pictures
Jia Li, James Z. Wang
TL;DR: New optimization and estimation techniques to address two fundamental problems in machine learning are developed, which serve as the basis for the Automatic Linguistic Indexing of Pictures - Real Time (ALIPR) system of fully automatic and high speed annotation for online pictures.