Ishan Misra
Researcher at Facebook
Publications - 90
Citations - 10759
Ishan Misra is an academic researcher at Facebook. His research focuses on computer science and object detection. He has an h-index of 29 and has co-authored 66 publications receiving 5,410 citations. His previous affiliations include the University of Massachusetts Amherst and Carnegie Mellon University.
Papers
Posted Content
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
TL;DR: This paper proposes SwAV, an online algorithm that takes advantage of contrastive methods without requiring pairwise comparisons to be computed; it uses a swapped prediction mechanism in which the cluster assignment of one view is predicted from the representation of another view.
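The swapped prediction idea can be illustrated with a toy numpy sketch. All names, sizes, and the plain-softmax code computation below are assumptions for illustration; the actual method computes codes online with a Sinkhorn-Knopp equal-partition step.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical setup: two augmented views of a batch, already embedded
# as features, plus K learnable prototype vectors.
B, D, K = 4, 8, 3
z1 = rng.standard_normal((B, D))        # features of view 1
z2 = rng.standard_normal((B, D))        # features of view 2
prototypes = rng.standard_normal((D, K))

# Soft cluster assignments ("codes") per view; a plain softmax stands in
# for the Sinkhorn-Knopp step used in the paper.
q1 = softmax(z1 @ prototypes)
q2 = softmax(z2 @ prototypes)

# Sharper predicted assignments via a temperature.
tau = 0.1
p1 = softmax(z1 @ prototypes / tau)
p2 = softmax(z2 @ prototypes / tau)

# Swapped prediction: predict view 2's code from view 1 and vice versa,
# so samples are never compared pairwise against each other.
loss = -np.mean((q2 * np.log(p1)).sum(axis=1) + (q1 * np.log(p2)).sum(axis=1))
```

Because the loss only compares each sample's views against a fixed set of prototypes, the per-batch cost grows linearly in batch size rather than quadratically as in pairwise contrastive losses.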
Proceedings ArticleDOI
Self-Supervised Learning of Pretext-Invariant Representations
TL;DR: This work develops Pretext-Invariant Representation Learning (PIRL), which learns representations that are invariant to pretext-task transformations, substantially improving the semantic quality of the learned image representations and setting a new state of the art in self-supervised learning from images.
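A pretext-invariance objective of this flavor can be sketched as an NCE-style contrastive loss: the representation of an image and of its pretext-transformed copy should agree, while differing from other images. The vectors, temperature, and loss form below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical representations: an image, a pretext-transformed copy
# (e.g. a jigsaw-permuted version) that should map nearby, and a few
# unrelated images serving as negatives.
z_img = rng.standard_normal(8)
z_trans = z_img + 0.1 * rng.standard_normal(8)
negatives = rng.standard_normal((5, 8))

tau = 0.1  # softmax temperature
pos = np.exp(cosine(z_img, z_trans) / tau)
neg = sum(np.exp(cosine(z_trans, n) / tau) for n in negatives)

# NCE-style loss: small when the transformed copy is closer to its
# source image than to any negative.
loss = -np.log(pos / (pos + neg))
```

Minimizing this pulls an image and its transformed version together in representation space, which is what makes the learned features invariant to the pretext transformation.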
Proceedings ArticleDOI
Cross-Stitch Networks for Multi-task Learning
TL;DR: This paper proposes a cross-stitch unit that combines the activations from multiple networks and can be trained end-to-end to learn an optimal combination of shared and task-specific representations.
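In its minimal two-task form, a cross-stitch unit is a learned 2x2 linear mixture of the corresponding activations of two task networks. The function name and the example values below are hypothetical.

```python
import numpy as np

def cross_stitch(x_a, x_b, alpha):
    """Mix activations of two task networks with a learned 2x2 matrix.

    alpha is trained end-to-end with the networks; near-identity alpha
    keeps the tasks separate, off-diagonal mass shares representations.
    """
    out_a = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    out_b = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return out_a, out_b

# Example activations at some layer of task-A and task-B networks.
x_a = np.array([1.0, 2.0])
x_b = np.array([3.0, 4.0])
alpha = np.array([[0.9, 0.1],
                  [0.1, 0.9]])  # mostly task-specific, lightly shared
out_a, out_b = cross_stitch(x_a, x_b, alpha)
```

Because alpha is just another set of parameters, gradient descent can discover how much sharing helps each task pair instead of that choice being hand-designed.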
Book ChapterDOI
Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification
TL;DR: This paper presents an approach for learning a visual representation from the raw spatiotemporal signals in videos using a convolutional neural network, and shows that the method captures temporally varying information, such as human pose.
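The supervisory signal here is temporal order verification: frame tuples sampled from a video are labeled by whether they appear in the correct temporal order, and a network is trained to tell the two apart. A minimal data-preparation sketch, with hypothetical function and variable names:

```python
import random

def make_triplets(frame_indices, rng):
    """Build one in-order (positive) and one shuffled (negative) frame
    triplet from a video's frame indices, labeled 1 and 0 respectively."""
    a, b, c = sorted(rng.sample(frame_indices, 3))
    positive = ((a, b, c), 1)   # frames in temporal order
    negative = ((b, a, c), 0)   # first two swapped: out of order
    return positive, negative

rng = random.Random(0)
pos, neg = make_triplets(list(range(30)), rng)
```

The labels come for free from the video itself, so no human annotation is needed; solving the ordering task forces the network to notice temporally varying cues like pose.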
Posted Content
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin +7 more
TL;DR: This paper shows that self-supervised learning provides Vision Transformers (ViT) with new properties that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works particularly well, the authors observe that self-supervised ViT features contain explicit information about the semantic segmentation of an image, which does not emerge as clearly with supervised ViTs, nor with convnets.