Showing papers in "Computer Vision and Image Understanding" in 2022


Journal Article
TL;DR: This article provides a comprehensive overview of state-of-the-art tracking frameworks, including both deep and non-deep trackers, and presents quantitative and qualitative tracking results of various trackers on five benchmark datasets.

39 citations


Journal Article
TL;DR: Borji et al. describe new dimensions that are becoming important in assessing models (e.g., bias and fairness) and discuss the connection between GAN evaluation and deepfakes.

34 citations


Journal Article
Han Xu, Meiqi Gong, Xin Tian, Jun Huang, Jiayi Ma
TL;DR: Xu et al. proposed a novel method for visible and infrared image fusion by decomposing feature information, which adopts two pairs of encoder-decoder networks to perform feature map extraction and decomposition, respectively.

28 citations


Journal Article
TL;DR: Tan et al. proposed a temporal contrastive learning framework with two novel losses, the local-local and global-local contrastive losses, to improve upon existing contrastive self-supervised video representation learning methods.

26 citations


Journal Article
TL;DR: This article surveys the algorithms used to create deepfakes and the methods proposed in the literature to date to detect them, with extensive discussion of the challenges, research trends, and directions related to deepfake technologies.

19 citations


Journal Article
TL;DR: Despite its simplicity, the proposed weakly supervised object detection method achieves competitive results on a range of publicly available datasets, including paintings, watercolors, cliparts, and comics, and can quickly learn unseen visual categories.

16 citations


Journal Article
TL;DR: Zhang et al. proposed an adaptive temporal modelling block (ATB) that flexibly captures temporal structure for skeleton-based action recognition, and fused the adaptive feature map into the graph convolutional layer to learn better representations.

15 citations


Journal Article
TL;DR: The performance of face recognition systems can be degraded by masks and other facial coverings, which have become prevalent due to the COVID-19 pandemic; as a result, the periocular region of the human face becomes an important biometric cue.

14 citations


Journal Article
TL;DR: Zhang et al. proposed an uncertainty-aware consistency regularization method for cross-domain semantic segmentation, which introduces an uncertainty-guided consistency loss with a dynamic weighting scheme that exploits the latent uncertainty information of the target samples.

12 citations


Journal Article
TL;DR: This paper decomposes the temporal action detection (TAD) pipeline into several essential components (data sampling, backbone design, neck construction, and detection head) and yields an RGB-only baseline that comes very close to state-of-the-art methods using two-stream inputs.

9 citations


Journal Article
TL;DR: The authors present MC-Calib, a toolbox dedicated to calibrating complex synchronized multi-camera systems using an arbitrary number of fiducial-marker-based patterns.

Journal Article
TL;DR: Zhang et al. proposed a frame-level refinement network that adaptively learns a specific topology for each frame and captures long-range dependencies between frames through transformer self-attention.

Journal Article
TL;DR: This work revisits the self-supervised multi-task learning framework for video anomaly detection, proposes several updates to the original method, and modernizes the 3D convolutional backbone by introducing multi-head self-attention modules, inspired by the recent success of vision transformers.

Journal Article
TL;DR: The authors present a checklist for spotting different types of bias during visual dataset collection and discuss existing attempts to collect visual datasets in a bias-aware manner. They conclude that bias discovery and quantification in visual datasets remains an open problem, with room for improvement in both the methods and the range of biases that can be addressed.

Journal Article
TL;DR: Wang et al. proposed 3D-2D structural information fusion (SIF) for 3D object detection on LiDAR-camera systems; the method is based on hand-crafted 3D and 2D descriptors, generates primary structure features, and performs stably in outdoor scenes.

Journal Article
TL;DR: The authors reviewed image datasets that include depth information and grouped them into three categories: scenes/objects, body, and medical. They also gave an overview of the different types of sensors and depth applications, and examined trends and future directions in the use and creation of datasets containing depth data.

Journal Article
TL;DR: Wang et al. proposed a two-stage region-based convolutional neural network for thighbone fracture detection, which achieved an AP of 88.9% and outperformed existing methods.

Journal Article
TL;DR: Widafeng et al. proposed a cross-modality dual attention fusion module, CMDA, that explicitly exchanges spatial-temporal information between the two pathways of two-stream SlowFast networks.

Journal Article
TL;DR: The authors summarize recent work on animal pose estimation from a computer vision perspective, highlight the challenges faced in this field, and provide an in-depth analysis of the persisting obstacles.

Journal Article
TL;DR: Zhang et al. proposed BacklitNet, a saliency-guided backlit image enhancement network for robust and natural restoration of backlit images. It combines a nested U-structure with bilateral grids, which enables it to fully extract multi-scale saliency information and rapidly enhance images of arbitrary resolution.

Journal Article
TL;DR: This paper proposes a global filter importance based adaptive pruning (GFI-AP) method that assigns importance scores to all filters based on how the network learns the input-output mapping of a dataset, so that each filter's score can be compared against those of all other convolutional filters in the network.

Journal Article
TL;DR: Zhang et al. proposed a lightweight network for real-time shadow detection that uses graph convolutional networks to provide extra training pairs, obtaining a complete shadow mask from only a few annotation scribbles.

Journal Article
TL;DR: This article proposes a robust encoder-decoder deep learning network for detecting local changes in video, combining a feature pooling module (FPM) with a ResNet-50 encoder-decoder network.

Journal Article
TL;DR: Zhang et al. proposed a root regression module to estimate absolute root locations in camera coordinates, and a fisheye re-projection module that connects the two branches without using ground-truth camera parameters.

Journal Article
TL;DR: This article proposes an anti-jamming network to improve robustness in less-constrained scenarios, together with a new spatial-temporal map generation mechanism that strengthens spatial and temporal feature representations by equivalent padding of low-quality video frame fragments.

Journal Article
TL;DR: This paper proposes a split-inpaint-fuse network (SIFNet) that separates the corrupted luma and chroma images into two decoupled branches in the coarse stage, then uses a fusion sub-network to fuse the inpainted luma and chroma images into a refined image in the refinement stage.

Journal Article
TL;DR: The authors proposed a cross-modal attention mechanism in which the gating signal from one modality can dynamically activate the most discriminant CNN filters of the other modality.

Journal Article
TL;DR: HSGAN alleviates the mode collapse problem by maintaining a certain distance between the latent codes of the generated data and the real data, with an objective function designed to minimize the f-divergence between the distributions of generated and real data.

Journal Article
TL;DR: A benchmark aimed at studying human behavior in the considered industrial-like scenario is proposed, which demonstrates that the investigated tasks and scenario are challenging for state-of-the-art algorithms.