Ross Girshick
Researcher at Facebook
Publications - 170
Citations - 336844
Ross Girshick is an academic researcher at Facebook. He has contributed to research on object detection and convolutional neural networks. He has an h-index of 97 and has co-authored 166 publications receiving 231744 citations. Previous affiliations of Ross Girshick include the University of Washington and Carnegie Mellon University.
Papers
From Large-Scale Object Classifiers to Large-Scale Object Detectors: An Adaptation Approach
Judy Hoffman,Sergio Guadarrama,Eric Tzeng,Jeff Donahue,Ross Girshick,Trevor Darrell,Kate Saenko +6 more
TL;DR: This paper proposes a Deep Detection Adaptation (DDA) algorithm that learns the difference between the classification and detection tasks and transfers this knowledge to classifiers for categories without bounding-box-annotated data, turning them into detectors.
Proceedings Article
Discriminative Latent Variable Models for Object Detection
TL;DR: This talk discusses recent work by the author and colleagues on discriminative latent-variable models for object detection, specifically the task of localizing and detecting instances of a generic object category in cluttered real-world images.
Journal ArticleDOI
I1.4: Invited Paper: Indoor Scene Understanding from RGB-D Images
TL;DR: This work aims to align objects in an RGB-D image with 3D models from a library by detecting and segmenting objects, estimating coarse pose with a convolutional neural network, and then inserting the rendered model into the scene.
Proceedings ArticleDOI
Training ASR Models By Generation of Contextual Information
Kritika Singh,Dmytro Okhonko,Jun Liu,Yongqiang Wang,Frank Zhang,Ross Girshick,Sergey Edunov,Fuchun Peng,Yatharth Saraf,Geoffrey Zweig,Abdelrahman Mohamed +10 more
TL;DR: The authors used loosely related contextual information as a surrogate for ground-truth labels to train an encoder-decoder transformer model, achieving an average 20.8% relative WER reduction over a 1000-hour supervised baseline, and an average 13.4% reduction when using only the weakly supervised encoder for CTC fine-tuning.
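For clarity on how a relative WER reduction figure like the 20.8% above is computed, here is a minimal sketch. The baseline and new WER values in the example are hypothetical illustrations, not numbers from the paper:

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative word-error-rate reduction, expressed as a fraction of the baseline."""
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical example: a baseline WER of 15.0% dropping to 11.88%
# corresponds to a 20.8% relative reduction.
print(round(relative_wer_reduction(15.0, 11.88), 3))  # 0.208
```

A relative reduction is reported against the baseline's own error rate, so the same absolute improvement counts for more on a stronger (lower-WER) baseline.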