Open Access, Proceedings Article

SwiftNet: Real-time Video Object Segmentation

TLDR
SwiftNet compresses spatiotemporal redundancy in matching-based VOS via Pixel-Adaptive Memory (PAM), which adaptively triggers memory updates on frames where objects display noteworthy inter-frame variations.
Abstract
In this work we present SwiftNet for real-time semi-supervised video object segmentation (one-shot VOS), which reports 77.8% $\mathcal{J}\&\mathcal{F}$ and 70 FPS on the DAVIS 2017 validation set, leading all present solutions in overall accuracy and speed. We achieve this by compressing spatiotemporal redundancy in matching-based VOS via Pixel-Adaptive Memory (PAM). Temporally, PAM adaptively triggers memory updates on frames where objects display noteworthy inter-frame variations. Spatially, PAM selectively performs memory update and matching on dynamic pixels while ignoring static ones, significantly reducing redundant computation on segmentation-irrelevant pixels. To promote efficient reference encoding, SwiftNet also introduces a light-aggregation encoder that deploys reversed sub-pixel. We hope SwiftNet can set a strong and efficient baseline for real-time VOS and facilitate its application in mobile vision. The source code of SwiftNet can be found at https://github.com/haochenheheda/SwiftNet.
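The two levels of redundancy compression described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how variation is measured, so the raw frame difference, both thresholds, and all function names below are assumptions made purely for illustration.

```python
# Illustrative sketch only. The frame-difference variation measure, the
# thresholds, and every function name here are assumptions, not details
# taken from the SwiftNet paper or its source code.

def dynamic_pixel_mask(prev_frame, curr_frame, pixel_thresh=0.1):
    """Mark pixels whose absolute change between frames exceeds pixel_thresh."""
    return [
        [abs(c - p) > pixel_thresh for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def should_update_memory(mask, frame_thresh=0.2):
    """Temporal trigger: update only when enough pixels are dynamic."""
    total = sum(len(row) for row in mask)
    dynamic = sum(sum(row) for row in mask)
    return total > 0 and dynamic / total > frame_thresh


def update_memory(memory, curr_frame, mask):
    """Spatial selection: overwrite only the dynamic pixels in memory."""
    return [
        [c if m else old for old, c, m in zip(mem_row, cur_row, mask_row)]
        for mem_row, cur_row, mask_row in zip(memory, curr_frame, mask)
    ]


def process_frame(memory, prev_frame, curr_frame):
    """Return (memory, updated) after considering one new frame."""
    mask = dynamic_pixel_mask(prev_frame, curr_frame)
    if should_update_memory(mask):
        return update_memory(memory, curr_frame, mask), True
    return memory, False
```

In this toy version a frame that barely differs from its predecessor leaves the memory untouched, and a frame that does trigger an update rewrites only the pixels flagged as dynamic. A real system would operate on learned feature maps rather than raw pixel values.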



Citations
Proceedings Article

XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

TL;DR: This work develops a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay for long-term prediction.
Proceedings Article

Recurrent Dynamic Embedding for Video Object Segmentation

TL;DR: This paper proposes a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size, explicitly generated and updated by the proposed Spatio-temporal Aggregation Module (SAM), which exploits the cue of historical information.
Proceedings Article

SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization

TL;DR: A novel Sequential Weighted Expectation-Maximization (SWEM) network that greatly reduces the redundancy of memory features and maintains a fixed number of template features in memory, which ensures stable inference complexity for the VOS system.
Proceedings Article

Learning Quality-aware Dynamic Memory for Video Object Segmentation

TL;DR: This work proposes a QDMN that evaluates the segmentation quality of each frame, allowing the memory bank to selectively store accurately segmented frames, which prevents error accumulation and significantly improves performance.
Journal Article

Occluded Video Instance Segmentation: A Benchmark

TL;DR: In this article, MaskTrack R-CNN and SipMask are used to detect, segment, and track instances in occluded scenes, and a plug-and-play module that performs temporal feature calibration to complement missing object cues caused by occlusion is presented.
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place on the ILSVRC 2015 classification task.
Book Chapter

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.

Automatic differentiation in PyTorch

TL;DR: An automatic differentiation module of PyTorch is described — a library designed to enable rapid research on machine learning models that focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead.
Proceedings Article

Non-local Neural Networks

TL;DR: In this article, the non-local operation computes the response at a position as a weighted sum of the features at all positions, which can be used to capture long-range dependencies.
Proceedings Article

Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network

TL;DR: This paper presents the first convolutional neural network capable of real-time SR of 1080p videos on a single K2 GPU and introduces an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output.
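The rearrangement at the heart of a sub-pixel layer, and its inverse, can be sketched in plain Python. The channel ordering below follows the common "pixel shuffle" convention. Reading the "reversed sub-pixel" named in the SwiftNet abstract as this inverse (space-to-depth) rearrangement is an assumption, and both function names are chosen here for illustration only.

```python
# Illustrative sketch only: nested lists stand in for tensors, and the
# function names are assumptions chosen for this example.

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    assert c_in % (r * r) == 0, "channel count must be divisible by r*r"
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for k in range(w):
                        out[c][y * r + i][k * r + j] = src[y][k]
    return out


def pixel_unshuffle(x, r):
    """Inverse rearrangement: (C, H*r, W*r) into (C*r*r, H, W)."""
    c_in, hr, wr = len(x), len(x[0]), len(x[0][0])
    assert hr % r == 0 and wr % r == 0, "spatial size must be divisible by r"
    h, w = hr // r, wr // r
    out = [[[0] * w for _ in range(h)] for _ in range(c_in * r * r)]
    for c in range(c_in):
        for i in range(r):
            for j in range(r):
                dst = out[c * r * r + i * r + j]
                for y in range(h):
                    for k in range(w):
                        dst[y][k] = x[c][y * r + i][k * r + j]
    return out
```

The forward direction trades channels for spatial resolution, which is how a sub-pixel layer upscales low-resolution feature maps. The inverse trades resolution for channels without discarding any values, which makes it a cheap way to downsample an input before encoding.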