Showing papers in "Computer Vision and Image Understanding in 2021"


Journal ArticleDOI
TL;DR: This work proposes a novel Spatial-Temporal Transformer network (ST-TR) which models dependencies between joints using the Transformer self-attention operator, outperforming the state-of-the-art on NTU-RGB+D w.r.t.

154 citations


Journal ArticleDOI
TL;DR: A thorough review of existing deep learning based works for 3D pose estimation is provided, the advantages and disadvantages of these methods are summarized, and the commonly-used benchmark datasets are explored for comparison and analysis.

100 citations


Journal ArticleDOI
TL;DR: This paper presents a comprehensive review of image inpainting methods from the past decade, along with the commonly used performance metrics and datasets, and details the strengths and weaknesses of each to provide new insights in the field.

53 citations


Journal ArticleDOI
TL;DR: It is found that multi-scale training helps NNs deal with large blurs, that RNNs outperform CNNs, and that GANs using a perceptual loss function produce artifacts.

46 citations


Journal ArticleDOI
TL;DR: In this article, a review of the leading human pose estimation methods of the past five years is presented, focusing on metrics, benchmarks and method structures, and a taxonomy based on accuracy, speed and robustness is proposed to classify the methods and derive directions for future research.

45 citations


Journal ArticleDOI
TL;DR: This paper proposes an adaptive manipulation traces extraction network (AMTEN), which serves as a pre-processing step to suppress image content and highlight manipulation traces, achieving an average accuracy of up to 98.52%.

41 citations


Journal ArticleDOI
TL;DR: This paper presents a salient instance segmentation method that produces a saliency map with distinct object instance labels for an input image; it achieves satisfactory performance on six public benchmarks for salient region detection as well as on the new dataset for salient instance segmentation.

39 citations


Journal ArticleDOI
TL;DR: In this article, knowledge distillation techniques are applied on the previous model to retain the information about learned classes, whilst updating the current model to learn the new ones, thus preserving high accuracy on previously learned classes.

38 citations


Journal ArticleDOI
TL;DR: This survey focuses on high-level priors, embedded at the loss function level, and categorizes the articles according to the nature of the prior: the object shape, size, topology, and the inter-region constraints.

35 citations


Journal ArticleDOI
TL;DR: The work on adversarial sample detection in forensics has mainly focused on detecting attacks against FR systems in which the learning model is typically used only as a feature extractor, showing that such an approach generalizes to different types of attacks.

30 citations


Journal ArticleDOI
TL;DR: It is highlighted that methods for future prediction from egocentric vision can have a significant impact in a range of applications and that further research efforts should be devoted to the standardisation of tasks and the proposal of datasets considering real-world scenarios such as the ones with an industrial vocation.

Journal ArticleDOI
TL;DR: A simple yet effective approach to nighttime image dehazing using Retinex theory and Taylor series expansion, referred to as ‘RDT’, is proposed and shown to outperform state-of-the-art methods.

Journal ArticleDOI
Tao Wang, Xiaoqin Zhang, Runhua Jiang, Li Zhao, Huiling Chen, Wenhan Luo
TL;DR: A Spatiotemporal Pyramid Network is proposed to dynamically learn different spatiotemporal cues for video deblurring, together with SPGAN, which conducts adversarial discrimination in the gradient space and helps the network produce more realistic sharp video frames.

Journal ArticleDOI
TL;DR: A novel few-shot action recognition framework is proposed that uses long short-term memory following 3D convolutional layers for sequence modeling and alignment; circle loss is introduced to maximize the within-class similarity and minimize the between-class similarity flexibly towards a more definite convergence target.

Journal ArticleDOI
TL;DR: In this paper, a new neural network architecture is proposed to learn the normal behavior in a purely unsupervised fashion, which uses latent code predictions as the anomaly metric and outperforms frame reconstruction-based and prediction-based methods.

Journal ArticleDOI
TL;DR: A new generative model for multi-future trajectory prediction based on Conditional Variational Recurrent Neural Networks (C-VRNNs) is proposed, which operates step-by-step, generating more refined and accurate predictions.

Journal ArticleDOI
TL;DR: “Perceive, Transform, and Act” (PTA) is devised: a fully-attentive VLN architecture that leaves the recurrent approach behind and the first Transformer-like architecture incorporating three different modalities – natural language, images, and low-level actions for the agent control.

Journal ArticleDOI
TL;DR: Based on two coupled neural P (CNP) systems with local topology, a multi-focus image fusion framework in the non-sub-sampled contourlet transform (NSCT) domain is developed, where the two CNP systems are utilized to control the fusion of low-frequency coefficients in the NSCT domain.

Journal ArticleDOI
TL;DR: RemotePulseNet is proposed, a novel 3DCNN architecture that exploits temporally dilated convolutions with increasing dilation rate to drastically increase the receptive field, and its performance is compared with that of recent state-of-the-art pulse estimation methods.

Journal ArticleDOI
TL;DR: A novel multi-scale attention network (MSA-Net) is proposed to fill irregular missing regions, where spatial attention is combined with each scale to highlight the most probably attentive spatial components; it can achieve better results than previous inpainting methods.

Journal ArticleDOI
TL;DR: A task-dependent deep pruning framework based on Fisher’s Linear Discriminant Analysis (LDA) that can be applied to convolutional, fully-connected, and module-based deep network structures, in all cases leveraging the high decorrelation of neuron motifs found in the pre-decision space and cross-layer deconv dependency.

Journal ArticleDOI
TL;DR: In this article, a disjoint multitask learning framework was proposed to combine real and game data in an alternating fashion to obtain an improved action classifier for aerial action recognition.

Journal ArticleDOI
TL;DR: This paper embeds an iterative dehazing model into the generative process of the Cycle-Consistent Adversarial Network (CycleGAN) and develops a detail information-consistency loss that preserves more textural details and color information; this loss is obtained based on the physical features of the hazy image.

Journal ArticleDOI
Can Peng, Kun Zhao, Sam Maksoud, Meng Li, Brian C. Lovell
TL;DR: A novel incremental learning paradigm called Selective and Inter-related Distillation (SID) is proposed, along with a novel evaluation metric to better assess the performance of detectors under incremental learning conditions.

Journal ArticleDOI
TL;DR: In this article, a self-paced algorithm that learns from easy to hard was proposed to translate images with ground-truth labels from the source domain to the target domain using Cycle-GAN.

Journal ArticleDOI
TL;DR: This work proposes a framework to improve the state-of-the-art models of crowd motion prediction by enriching the learning model with the social relationships between pedestrians walking in the crowd, as well as the layout of the environment.

Journal ArticleDOI
TL;DR: This paper proposes using patch-based structure similarity to reconstruct the high-frequency parts, while the low-frequency parts are reconstructed by singular value decomposition (SVD), yielding the high- and low-frequency components respectively.

Journal ArticleDOI
TL;DR: This paper proposes a graph-based framework to learn high-level interactions between people and objects, in both space and time, using self-attention on a multi-layer graph structure that can connect entities from consecutive clips, thus considering long-range spatial and temporal dependencies.

Journal ArticleDOI
TL;DR: This paper introduces an attention mechanism for the blind deblurring NN, including both spatial and channel attention, to effectively handle the significant spatial variations in blurring effects.

Journal ArticleDOI
TL;DR: This article proposes a mixed-initiative framework where both the user and system can be active participants, depending on whose input will be more beneficial for obtaining high-quality search results, and develops a reinforcement learning approach which dynamically decides which of four interaction opportunities to give to the user: drawing a sketch, marking images as relevant or not, providing free-form attribute feedback, or answering attribute-based questions.