Showing papers by "Ross Girshick published in 2013"

PDF

Open Access

Posted Content•

Rich feature hierarchies for accurate object detection and semantic segmentation

[...]

Ross Girshick¹, Jeff Donahue¹, Trevor Darrell¹, Jitendra Malik¹•Institutions (1)

11 Nov 2013-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.

...read moreread less

Abstract: Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012---achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at this http URL.

...read moreread less

13,081 citations

Journal Article•DOI•

Efficient Human Pose Estimation from Single Depth Images

[...]

Jamie Shotton¹, Ross Girshick², Andrew Fitzgibbon¹, Toby Sharp¹, Mat Cook¹, Mark J. Finocchio¹, Richard Moore³, Pushmeet Kohli¹, Antonio Criminisi¹, Alex Aben-Athar Kipman¹, Andrew Blake¹ - Show less +7 more•Institutions (3)

Microsoft¹, University of California, Berkeley², Ericsson³

01 Dec 2013-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Two new approaches to human pose estimation are described, both of which can quickly and accurately predict the 3D positions of body joints from a single depth image without using any temporal information.

...read moreread less

Abstract: We describe two new approaches to human pose estimation. Both can quickly and accurately predict the 3D positions of body joints from a single depth image without using any temporal information. The key to both approaches is the use of a large, realistic, and highly varied synthetic set of training images. This allows us to learn models that are largely invariant to factors such as pose, body shape, field-of-view cropping, and clothing. Our first approach employs an intermediate body parts representation, designed so that an accurate per-pixel classification of the parts will localize the joints of the body. The second approach instead directly regresses the positions of body joints. By using simple depth pixel comparison features and parallelizable decision forests, both approaches can run super-real time on consumer hardware. Our evaluation investigates many aspects of our methods, and compares the approaches to each other and to the state of the art. Results on silhouettes suggest broader applicability to other imaging modalities.

...read moreread less

568 citations

Journal Article•DOI•

Visual object detection with deformable part models

[...]

Pedro F. Felzenszwalb¹, Ross Girshick², David McAllester³, Deva Ramanan⁴•Institutions (4)

Brown University¹, University of California, Berkeley², Toyota Technological Institute at Chicago³, University of California, Irvine⁴

01 Sep 2013-Communications of The ACM

TL;DR: A state-of-the-art system for finding objects in cluttered images based on deformable models that represent objects using local part templates and geometric constraints on the locations of parts is described.

...read moreread less

Abstract: We describe a state-of-the-art system for finding objects in cluttered images. Our system is based on deformable models that represent objects using local part templates and geometric constraints on the locations of parts. We reduce object detection to classification with latent variables. The latent variables introduce invariances that make it possible to detect objects with highly variable appearance. We use a generalization of support vector machines to incorporate latent information during training. This has led to a general framework for discriminative training of classifiers with latent variables. Discriminative training benefits from large training datasets. In practice we use an iterative algorithm that alternates between estimating latent values for positive examples and solving a large convex optimization problem. Practical optimization of this large convex problem can be done using active set techniques for adaptive subsampling of the training data.

...read moreread less

57 citations

Proceedings Article•DOI•

Training Deformable Part Models with Decorrelated Features

[...]

Ross Girshick¹, Jitendra Malik¹•Institutions (1)

University of California, Berkeley¹

01 Dec 2013

TL;DR: This paper shows how to train a deformable part model (DPM) fast-typically in less than 20 minutes, or four times faster than the current fastest method-while maintaining high average precision on the PASCAL VOC datasets.

...read moreread less

Abstract: In this paper, we show how to train a deformable part model (DPM) fast-typically in less than 20 minutes, or four times faster than the current fastest method-while maintaining high average precision on the PASCAL VOC datasets. At the core of our approach is "latent LDA," a novel generalization of linear discriminant analysis for learning latent variable models. Unlike latent SVM, latent LDA uses efficient closed-form updates and does not require an expensive search for hard negative examples. Our approach also acts as a springboard for a detailed experimental study of DPM training. We isolate and quantify the impact of key training factors for the first time (e.g., How important are discriminative SVM filters? How important is joint parameter estimation? How many negative images are needed for training?). Our findings yield useful insights for researchers working with Markov random fields and part-based models, and have practical implications for speeding up tasks such as model selection.

...read moreread less

40 citations

Proceedings Article•

Discriminatively Activated Sparselets

[...]

Ross Girshick¹, Hyun Oh Song¹, Trevor Darrell¹•Institutions (1)

University of California, Berkeley¹

16 Jun 2013

TL;DR: This paper describes a new training framework that learns which sparselets to activate in order to optimize a discriminative objective, leading to larger speedup factors with no decrease in task performance, and shows experimental results on object detection and image classification tasks.

...read moreread less

Abstract: Shared representations are highly appealing due to their potential for gains in computational and statistical efficiency. Compressing a shared representation leads to greater computational savings, but can also severely decrease performance on a target task. Recently, sparselets (Song et al., 2012) were introduced as a new shared intermediate representation for multiclass object detection with deformable part models (Felzenszwalb et al., 2010a), showing significant speedup factors, but with a large decrease in task performance. In this paper we describe a new training framework that learns which sparselets to activate in order to optimize a discriminative objective, leading to larger speedup factors with no decrease in task performance. We first reformulate sparselets in a general structured output prediction framework, then analyze when sparselets lead to computational efficiency gains, and lastly show experimental results on object detection and image classification tasks. Our experimental results demonstrate that discriminative activation substantially outperforms the previous reconstructive approach which, together with our structured output prediction formulation, make sparselets broadly applicable and significantly more effective.

...read moreread less

35 citations