Discriminative Models for Multi-Class Object Layout

doi:10.1007/S11263-011-0439-X

Journal Article

Chaitanya Desai, +2 more

International Journal of Computer Vision, Vol. 95, Iss. 1, pp. 1-12, 01 Oct 2011

TLDR

The paper introduces a unified model for multi-class object recognition that casts the problem as a structured prediction task, and shows how to formulate learning as a convex optimization problem.

Abstract:

Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. This allows one to leverage sophisticated machine learning techniques for training classifiers from labeled examples. However, these models are typically trained independently for each class, using positive and negative examples cropped from images. At test time, post-processing steps such as non-maxima suppression (NMS) are required to reconcile multiple detections within and between different classes for each image. Though crucial to good performance on benchmarks, this post-processing is usually defined heuristically.

We introduce a unified model for multi-class object recognition that casts the problem as a structured prediction task. Rather than predicting a binary label for each image window independently, our model simultaneously predicts a structured labeling of the entire image (Fig. 1). The model learns statistics that capture the spatial arrangements of object classes in real images, both in terms of which arrangements to suppress through NMS and which arrangements to favor through spatial co-occurrence statistics.

We formulate parameter estimation in our model as a max-margin learning problem. Given training images with ground-truth object locations, we show how to formulate learning as a convex optimization problem. We employ the cutting plane algorithm of Joachims et al. (Mach. Learn. 2009) to efficiently learn a model from thousands of training images. We show state-of-the-art results on the PASCAL VOC benchmark that indicate the benefits of learning a global model encapsulating the spatial layout of multiple object classes. A preliminary version of this work appeared at ICCV 2009 (Desai et al., IEEE International Conference on Computer Vision, 2009).
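For readers unfamiliar with the post-processing step the abstract refers to, the following is a minimal sketch of conventional greedy NMS, the hand-defined heuristic that the abstract contrasts with a learned model. It is not the method proposed in the paper. The function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box format, and the 0.5 overlap threshold are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept detections, highest score first.

    A detection is kept only if its overlap with every already-kept
    detection is at most iou_threshold (an illustrative default).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping detections and one distant detection.
example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
example_scores = [0.9, 0.8, 0.7]
kept = greedy_nms(example_boxes, example_scores)
```

In this sketch the second box is suppressed because it overlaps the higher-scoring first box, while the third survives. The threshold and the per-class application of the procedure are fixed by hand, which is the kind of heuristic choice the abstract proposes to replace with learned spatial statistics.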
