Proceedings ArticleDOI

Object discovery and representation networks

TLDR
Odin is a self-supervised learning paradigm that discovers meaningful image segmentations without any supervision. It achieves state-of-the-art transfer learning results for object detection and instance segmentation on COCO and for semantic segmentation on PASCAL and Cityscapes, while strongly surpassing supervised pre-training for video segmentation on DAVIS.
Abstract
The promise of self-supervised learning (SSL) is to leverage large amounts of unlabeled data to solve complex tasks. While there has been excellent progress with simple, image-level learning, recent methods have shown the advantage of including knowledge of image structure. However, by introducing hand-crafted image segmentations to define regions of interest, or specialized augmentation strategies, these methods sacrifice the simplicity and generality that make SSL so powerful. Instead, we propose a self-supervised learning paradigm that discovers this image structure by itself. Our method, Odin, couples object discovery and representation networks to discover meaningful image segmentations without any supervision. The resulting learning paradigm is simpler, less brittle, and more general, and achieves state-of-the-art transfer learning results for object detection and instance segmentation on COCO, and semantic segmentation on PASCAL and Cityscapes, while strongly surpassing supervised pre-training for video segmentation on DAVIS.


Citations
Proceedings ArticleDOI

SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos

TL;DR: SAVi++ is introduced, an object-centric video model trained to predict depth signals from a slot-based video representation; it learns emergent object segmentation and tracking from videos in the real-world Waymo Open dataset using sparse depth signals obtained from LiDAR.
Proceedings ArticleDOI

VICRegL: Self-Supervised Learning of Local Visual Features

TL;DR: A new method called VICRegL is proposed that learns good global and local features simultaneously, yielding excellent performance on detection and segmentation tasks while maintaining good performance on classification tasks.
Proceedings ArticleDOI

Bridging the Gap to Real-World Object-Centric Learning

TL;DR: DINOSAUR is the first unsupervised object-centric model that scales to real-world datasets such as COCO and PASCAL VOC, and it shows competitive performance compared to more involved pipelines from the computer vision literature.
Proceedings ArticleDOI

Self-Supervised Visual Representation Learning with Semantic Grouping

TL;DR: This paper proposes contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning, and effectively decomposes complex scenes into semantic groups for feature learning and downstream tasks, including object detection, instance segmentation, and semantic segmentation.
Proceedings ArticleDOI

NeRF-SOS: Any-View Self-supervised Object Segmentation on Complex Scenes

TL;DR: NeRF-SOS explores self-supervised object segmentation using NeRF for complex real-world scenes.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Posted Content

Deep Residual Learning for Image Recognition

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Book ChapterDOI

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.
Proceedings ArticleDOI

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Proceedings ArticleDOI

Feature Pyramid Networks for Object Detection

TL;DR: This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.