SwiftNet: Real-time Video Object Segmentation

doi:10.1109/CVPR46437.2021.00135

Open Access, Proceedings Article

Haochen Wang, +4 more

- pp 1296-1305

TLDR

SwiftNet compresses spatiotemporal redundancy in matching-based VOS via Pixel-Adaptive Memory (PAM), which adaptively triggers memory updates on frames where objects display noteworthy inter-frame variations.

Abstract:

In this work we present SwiftNet for real-time semi-supervised video object segmentation (one-shot VOS), which reports 77.8% $\mathcal{J}\&\mathcal{F}$ and 70 FPS on the DAVIS 2017 validation set, leading all present solutions in overall accuracy and speed. We achieve this by compressing spatiotemporal redundancy in matching-based VOS via Pixel-Adaptive Memory (PAM). Temporally, PAM adaptively triggers memory updates on frames where objects display noteworthy inter-frame variations. Spatially, PAM selectively performs memory update and match on dynamic pixels while ignoring the static ones, significantly reducing redundant computation wasted on segmentation-irrelevant pixels. To promote efficient reference encoding, SwiftNet also introduces a light-aggregation encoder that deploys reversed sub-pixel. We hope SwiftNet can set a strong and efficient baseline for real-time VOS and facilitate its application in mobile vision. The source code of SwiftNet can be found at https://github.com/haochenheheda/SwiftNet.
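The two gating ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one scalar value per pixel, measures variation as a plain absolute difference against the stored memory, and uses fixed thresholds. The names `dynamic_mask`, `variation_ratio`, `run_adaptive_memory`, `pixel_thresh`, and `frame_thresh` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple

# Hypothetical stand-in: a frame is a grid with one scalar "feature" per pixel.
Frame = List[List[float]]


def dynamic_mask(memory: Frame, frame: Frame, pixel_thresh: float) -> List[List[bool]]:
    """Spatial step: flag pixels that differ from memory by more than pixel_thresh."""
    return [
        [abs(new - old) > pixel_thresh for old, new in zip(mem_row, new_row)]
        for mem_row, new_row in zip(memory, frame)
    ]


def variation_ratio(mask: List[List[bool]]) -> float:
    """Fraction of pixels flagged as dynamic."""
    total = sum(len(row) for row in mask)
    changed = sum(sum(row) for row in mask)
    return changed / total if total else 0.0


def run_adaptive_memory(
    frames: List[Frame], pixel_thresh: float = 0.1, frame_thresh: float = 0.2
) -> Tuple[Frame, List[Tuple[int, List[Tuple[int, int]]]]]:
    """Process frames in order; return the final memory and a log of updates."""
    memory = [row[:] for row in frames[0]]  # the first frame seeds the memory
    updates = []
    for t in range(1, len(frames)):
        mask = dynamic_mask(memory, frames[t], pixel_thresh)
        # Temporal step: skip frames whose overall variation is too small.
        if variation_ratio(mask) <= frame_thresh:
            continue
        # Spatial step: write only the dynamic pixels, leave static ones untouched.
        coords = [(y, x) for y, row in enumerate(mask) for x, dyn in enumerate(row) if dyn]
        for y, x in coords:
            memory[y][x] = frames[t][y][x]
        updates.append((t, coords))
    return memory, updates
```

In this toy setup a frame that barely changes triggers no update at all, and a frame that does trigger one touches only the changed pixels, which is where the savings described in the abstract would come from.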

Citations

Proceedings Article

XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

Ho Kei Cheng, +1 more

TL;DR: This work develops a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay for long-term prediction.

Proceedings Article

Recurrent Dynamic Embedding for Video Object Segmentation

Mingxing Li, +5 more

TL;DR: This paper proposes a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size, which is explicitly generated and updated by the proposed Spatio-temporal Aggregation Module (SAM), exploiting the cue of historical information.

Proceedings Article

SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization

Zhihui Lin, +6 more

TL;DR: A novel Sequential Weighted Expectation-Maximization (SWEM) network that greatly reduces the redundancy of memory features and maintains a fixed number of template features in memory, which keeps the inference complexity of the VOS system stable.

Proceedings Article

Learning Quality-aware Dynamic Memory for Video Object Segmentation

Yong Liu, +6 more

TL;DR: This work proposes QDMN, which evaluates the segmentation quality of each frame so that the memory bank selectively stores accurately segmented frames; this prevents error accumulation and significantly improves performance.

Journal Article

Occluded Video Instance Segmentation: A Benchmark

- 18 Jun 2022 -

International Journal of Computer Vision

TL;DR: In this article, MaskTrack R-CNN and SipMask are used to detect, segment, and track instances in occluded scenes, and a plug-and-play module that performs temporal feature calibration to complement missing object cues caused by occlusion is presented.

References

Proceedings Article

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.

Book Chapter

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing it in the context of the broader question of scene understanding; it gathers images of complex everyday scenes containing common objects in their natural context.

Automatic differentiation in PyTorch

Adam Paszke, +9 more

TL;DR: An automatic differentiation module of PyTorch is described — a library designed to enable rapid research on machine learning models that focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead.

Proceedings Article

Non-local Neural Networks

Xiaolong Wang, +3 more

TL;DR: In this article, the non-local operation computes the response at a position as a weighted sum of the features at all positions, which can be used to capture long-range dependencies.

Proceedings Article

Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network

Wenzhe Shi, +7 more

TL;DR: This paper presents the first convolutional neural network capable of real-time SR of 1080p videos on a single K2 GPU and introduces an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output.
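The sub-pixel layer summarized above rearranges channels into spatial resolution, and the "reversed sub-pixel" mentioned in the SwiftNet abstract plausibly refers to the opposite rearrangement (space-to-depth). That reading is an assumption, not something this page confirms. The sketch below shows only the index shuffling, with no learned convolution, on nested lists laid out as `[channel][row][column]`. The function names `pixel_shuffle` and `pixel_unshuffle` and the channel ordering are illustrative choices for this example.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # layout: [channel][row][column]


def pixel_shuffle(x: Tensor3, r: int) -> Tensor3:
    """Depth-to-space: (C*r*r, H, W) -> (C, H*r, W*r)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    assert c_in % (r * r) == 0, "channel count must be divisible by r*r"
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for col in range(w):
                        out[c][y * r + i][col * r + j] = src[y][col]
    return out


def pixel_unshuffle(x: Tensor3, r: int) -> Tensor3:
    """Space-to-depth, the inverse rearrangement: (C, H*r, W*r) -> (C*r*r, H, W)."""
    c, hr, wr = len(x), len(x[0]), len(x[0][0])
    assert hr % r == 0 and wr % r == 0, "spatial size must be divisible by r"
    h, w = hr // r, wr // r
    out = [[[0.0] * w for _ in range(h)] for _ in range(c * r * r)]
    for ch in range(c):
        for i in range(r):
            for j in range(r):
                dst = out[ch * r * r + i * r + j]
                for y in range(h):
                    for col in range(w):
                        dst[y][col] = x[ch][y * r + i][col * r + j]
    return out
```

Space-to-depth reduces spatial resolution without discarding any values, which is one reason such a rearrangement is attractive for cheap downsampling in an encoder.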

Related Papers (5)

TapLab: A Fast Framework for Semantic Video Segmentation Tapping into Compressed-Domain Knowledge.

Junyi Feng, +6 more

- 30 Mar 2020 -

arXiv: Computer Vision and Pattern Recog...

Computer Vision and Image Understanding

Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence

Dahun Kim, +3 more

Fast and adaptive semantic object extraction from video

Fast Semantic Segmentation on Video Using Block Motion-Based Feature Interpolation

Real-time and accurate object detection in compressed video by long short-term feature aggregation
