SwiftNet: Real-time Video Object Segmentation
Haochen Wang, Xiaolong Jiang, Haibing Ren, Yao Hu, Song Bai
- pp 1296-1305
TLDR
SwiftNet, as discussed by the authors, compresses spatiotemporal redundancy in matching-based VOS via Pixel-Adaptive Memory (PAM), which adaptively triggers memory updates on frames where objects display noteworthy inter-frame variations.
Abstract:
In this work, we present SwiftNet for real-time semi-supervised video object segmentation (one-shot VOS), which reports 77.8% $\mathcal{J}\&\mathcal{F}$ and 70 FPS on the DAVIS 2017 validation dataset, leading all present solutions in overall accuracy and speed performance. We achieve this by elaborately compressing spatiotemporal redundancy in matching-based VOS via Pixel-Adaptive Memory (PAM). Temporally, PAM adaptively triggers memory updates on frames where objects display noteworthy inter-frame variations. Spatially, PAM selectively performs memory update and match on dynamic pixels while ignoring the static ones, significantly reducing redundant computations wasted on segmentation-irrelevant pixels. To promote efficient reference encoding, a light-aggregation encoder is also introduced in SwiftNet, deploying reversed sub-pixel. We hope SwiftNet could set a strong and efficient baseline for real-time VOS and facilitate its application in mobile vision. The source code of SwiftNet can be found at https://github.com/haochenheheda/SwiftNet.
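The two ideas in the abstract, updating memory only when a frame changes enough and only on the pixels that changed, can be illustrated with a small sketch. This is not the authors' implementation: the names `dynamic_pixel_mask`, `should_update_memory` and `space_to_depth`, the plain absolute-difference test on grayscale frames, and both thresholds are assumptions chosen for illustration. The last function shows one common reading of "reversed sub-pixel" as a space-to-depth rearrangement (the inverse of the sub-pixel upscaling in the Shi et al. reference), which is also an assumption here.

```python
from typing import List

Frame = List[List[float]]  # hypothetical grayscale frame, values in [0, 1]


def dynamic_pixel_mask(prev: Frame, curr: Frame, pixel_thresh: float) -> List[List[bool]]:
    """Mark pixels whose absolute intensity change exceeds pixel_thresh (assumed criterion)."""
    return [
        [abs(c - p) > pixel_thresh for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def should_update_memory(mask: List[List[bool]], frame_thresh: float) -> bool:
    """Trigger a memory update when the fraction of dynamic pixels exceeds frame_thresh."""
    total = sum(len(row) for row in mask)
    dynamic = sum(sum(row) for row in mask)  # True counts as 1
    return total > 0 and dynamic / total > frame_thresh


def space_to_depth(image: Frame, r: int) -> List[Frame]:
    """Rearrange an H x W map into r*r channels of size (H/r) x (W/r)."""
    h, w = len(image), len(image[0])
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    return [
        [[image[y * r + dy][x * r + dx] for x in range(w // r)] for y in range(h // r)]
        for dy in range(r)
        for dx in range(r)
    ]


# Toy usage: one of four pixels changes between consecutive frames.
prev_frame = [[0.0, 0.0], [0.0, 0.0]]
curr_frame = [[0.0, 0.9], [0.0, 0.0]]
mask = dynamic_pixel_mask(prev_frame, curr_frame, pixel_thresh=0.5)
update = should_update_memory(mask, frame_thresh=0.2)

# Toy usage: a 2 x 4 map becomes four 1 x 2 channels with no values discarded.
channels = space_to_depth([[1, 2, 3, 4], [5, 6, 7, 8]], r=2)
```

In this toy case only the pixel marked in `mask` would be written to or matched against memory, and the update fires because a quarter of the frame changed. The space-to-depth step reduces spatial resolution while keeping every input value, which is the property that makes it attractive for downsampling a reference frame cheaply.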
Citations
Proceedings Article
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
TL;DR: This work develops a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay for long-term prediction.
Proceedings Article
Recurrent Dynamic Embedding for Video Object Segmentation
TL;DR: This paper proposes a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size, explicitly generated and updated by the proposed Spatio-temporal Aggregation Module (SAM), which exploits the cue of historical information.
Proceedings Article
SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization
TL;DR: A novel Sequential Weighted Expectation-Maximization (SWEM) network greatly reduces the redundancy of memory features and maintains a fixed number of template features in memory, which ensures stable inference complexity for the VOS system.
Proceedings Article
Learning Quality-aware Dynamic Memory for Video Object Segmentation
TL;DR: This work proposes a QDMN to evaluate the segmentation quality of each frame, allowing the memory bank to selectively store accurately segmented frames to prevent the error accumulation problem and significantly improves performance.
Journal Article
Occluded Video Instance Segmentation: A Benchmark
TL;DR: In this article, MaskTrack R-CNN and SipMask are used to detect, segment, and track instances in occluded scenes, and a plug-and-play module that performs temporal feature calibration to complement missing object cues caused by occlusion is presented.
References
Proceedings Article
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Book Chapter
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick
TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.
Automatic differentiation in PyTorch
Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Z. Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, Adam Lerer
TL;DR: An automatic differentiation module of PyTorch is described — a library designed to enable rapid research on machine learning models that focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead.
Proceedings Article
Non-local Neural Networks
TL;DR: In this article, the non-local operation computes the response at a position as a weighted sum of the features at all positions, which can be used to capture long-range dependencies.
Proceedings Article
Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
Wenzhe Shi, Jose Caballero, Ferenc Huszar, Johannes Totz, Andrew Peter Aitken, Rob Bishop, Daniel Rueckert, Zehan Wang
TL;DR: This paper presents the first convolutional neural network capable of real-time SR of 1080p videos on a single K2 GPU and introduces an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output.
Related Papers (5)
Fast Semantic Segmentation on Video Using Block Motion-Based Feature Interpolation
Samvit Jain, Joseph E. Gonzalez