Rohit Girdhar

Researcher at Facebook

Publications - 46
Citations - 3802

Rohit Girdhar is an academic researcher at Facebook. The author has contributed to research on topics including computer science and object detection. The author has an h-index of 14 and has co-authored 36 publications receiving 2586 citations. Previous affiliations of Rohit Girdhar include the International Institute of Information Technology, Hyderabad, and Carnegie Mellon University.

Papers
Posted Content

Learning a Predictable and Generative Vector Representation for Objects

TL;DR: A novel architecture, the TL-embedding network, is proposed to learn an embedding space with generative and predictable properties, enabling a number of tasks including voxel prediction from 2D images and 3D model retrieval.
Book Chapter

Learning a Predictable and Generative Vector Representation for Objects

TL;DR: The TL-embedding network uses an autoencoder to ensure the representation is generative and a convolutional network to ensure it is predictable; the resulting embedding can be used for voxel prediction from 2D images and 3D model retrieval.
Proceedings Article

Video Action Transformer Network

TL;DR: The Action Transformer uses a Transformer-style architecture to aggregate features from the spatio-temporal context around the person whose actions are being classified. With high-resolution, person-specific, class-agnostic queries, the model spontaneously learns to track individual people and to pick up on semantic context from the actions of others.
Proceedings Article

ActionVLAD: Learning Spatio-Temporal Aggregation for Action Classification

TL;DR: A new video representation for action classification that aggregates local convolutional features across the entire spatio-temporal extent of the video and outperforms other baselines with comparable base architectures on HMDB51, UCF101, and Charades video classification benchmarks.
Posted Content

ActionVLAD: Learning spatio-temporal aggregation for action classification

TL;DR: A two-stream network with learnable spatio-temporal feature aggregation is proposed for action classification; the resulting architecture is end-to-end trainable for whole-video classification.