Showing papers in "Computer Vision and Image Understanding in 2020"


Journal Article
TL;DR: This work performs a comprehensive quantitative study of the effect of object detection accuracy on overall multi-object tracking (MOT) performance, using the new large-scale University at Albany DETection and tRACking (UA-DETRAC) benchmark dataset.

332 citations


Journal Article
TL;DR: This survey extensively reviews recent deep learning-based 2D and 3D human pose estimation methods published since 2014, summarizes the challenges, main frameworks, benchmark datasets, evaluation metrics, and performance comparisons, and discusses some promising future research directions.

255 citations


Journal Article
TL;DR: This study presents a novel end-to-end partially supervised deep learning approach for video anomaly detection and localization using only normal samples, based on a Gaussian Mixture Variational Autoencoder, which can learn feature representations of the normal samples as a Gaussian Mixture Model trained using deep learning.

123 citations


Journal Article
Xiaoqin Zhang, Tao Wang, Jinxin Wang, Guiying Tang, Li Zhao
TL;DR: Experimental results demonstrate that the proposed Pyramid Channel-based Feature Attention Network (PCFAN) outperforms existing state-of-the-art algorithms on standard benchmark datasets in terms of accuracy, efficiency, and visual effect.

119 citations


Journal Article
Jiayi Ma, Yi Zhou
TL;DR: An image filter based on a fuzzy gradient threshold function and global optimization, termed the gradientlet filter, is proposed from the perspective of luminance and gradient separation; it can remove small gradient textures and noise while maintaining the overall brightness and edge gradients of an image.

52 citations


Journal Article
TL;DR: The 3D Adversarial Autoencoder (3dAAE) presented in this paper is described as a state-of-the-art method for 3D point cloud generation.

51 citations


Journal Article
TL;DR: An extensive comparative analysis of several deep learning-based frameworks for real automatic age estimation (AAE) is presented, demonstrating the high performance of popular CNN frameworks against state-of-the-art AAE methods.

37 citations


Journal Article
TL;DR: It is shown that attackers can successfully fool a face authentication system equipped with a deep learning spoof detection module, by exploiting the vulnerabilities of CNNs to adversarial perturbations.

30 citations


Journal Article
TL;DR: An efficient and trustworthy denoising scheme is proposed that combines the convolutional neural network technique with the traditional variational model to offer interpretable, high-quality reconstructions; it outperforms state-of-the-art interpretable denoising methods.

29 citations


Journal Article
TL;DR: A new boosted and parallel architecture is proposed for video captioning using Long Short-Term Memory (LSTM) networks that considerably improves the accuracy of the generated sentence.

29 citations


Journal Article
TL;DR: The results of ablation studies demonstrate that the proposed multi-branch architecture with attention blocks is effective and essential, and that it offers controllability and interpretability.

Journal Article
TL;DR: A novel Multi-scale Channel Attention guided Network (MCANet) is proposed to address the ghosting problem by using multi-scale blocks consisting of dilated convolution layers to extract informative features.

Journal Article
TL;DR: A double attention model is proposed that combines a sentence-level attention model with a word-level attention model to generate more accurate captions; it outperforms many state-of-the-art image captioning approaches on various evaluation metrics.

Journal Article
TL;DR: An activation energy metric that combines convolutional layer activations to quantify visual complexity is derived and it is demonstrated that, within the context of a category, visually more complex images are also more memorable to human observers.

Journal Article
Haijin Zeng, Xiaozhen Xie, Haojie Cui, Yuan Zhao, Jifeng Ning
TL;DR: This paper combines the advantages of traditional physical restoration models and the denoising convolutional neural networks to introduce the HSI restoration CNN with the low-rank tensor approximation based regularization in the flexible and extensible plug-and-play framework.

Journal Article
TL;DR: In this paper, the authors proposed an entropic optimal transportation (EOP) method to mitigate the effect of inaccurate labels on the performance of deep neural networks in remote sensing image analysis.

Journal Article
TL;DR: A discriminative descriptor matching solution based on Reverse Nearest Neighbor, combined with a memory-based cumulative learning strategy that discards redundant descriptors as time progresses, allows building a comprehensive and cumulative representation of all the visual information observed so far.

Journal Article
TL;DR: A novel end-to-end deep learning-based framework for FPP is introduced that needs no frequency domain filtering or phase unwrapping and directly reconstructs the object’s depth profile from the deformed fringe itself through a multi-resolution similarity assessment convolutional neural network.

Journal Article
TL;DR: It is concluded that adversarial training is beneficial if and only if the reconstruction loss is not too constrained, and that non-adversarial training outperforms (or is on par with) any method trained with a GAN when a constrained reconstruction loss is used in combination with batch normalisation.

Journal Article
TL;DR: This paper considers human cooperative behaviour in front of wearable security cameras and proposes a deep learning-based human cooperation detection pipeline that uses an RNN architecture, with the aim of detecting whether a human is exhibiting adversarial behaviour by trying to avoid the camera.

Journal Article
TL;DR: A new attention network architecture, termed Cascade multi-head ATtention Network (CATNet), is proposed; it constructs video representations with two-level attentions, namely multi-head local self-attentions and relation-based global attentions.

Journal Article
TL;DR: A novel large-scale product image dataset, termed Product-90, is introduced together with a simple yet efficient guidance learning method for training convolutional neural networks (CNNs) with noisy supervision, which achieves performance superior to state-of-the-art methods on these datasets.

Journal Article
TL;DR: A new framework, called Momental Directional Patterns, is presented, taking into account the advantages of filtering and local-feature-based approaches to form effective DT descriptors, motivated by convolutional neural networks.

Journal Article
TL;DR: A network that uses residual blocks with cascading simple blocks is proposed to improve image resolution, together with a novel loss function called detail perception loss, which measures the difference between the wavelet coefficients of the reconstructed image and those of the ground truth.

Journal Article
TL;DR: This work proposes classifier-agnostic saliency map extraction, which finds all parts of the image that any classifier could use, not just one given in advance, and extracts higher quality saliency maps than prior work.

Journal Article
TL;DR: A JPEG simulation network, JSNet, is proposed to reproduce the whole procedure of JPEG lossy compression and restoration, except entropy encoding, as realistically as possible, in order to enhance the robustness of deep learning-based watermarking methods.

Journal Article
TL;DR: UPGen as discussed by the authors is a generic generative model for image-based plant phenotyping that leverages domain randomization to produce widely distributed data samples and models stochastic biological variation.

Journal Article
TL;DR: A deep four-stream convolutional neural network is proposed for person re-identification to overcome the poor generalisation of the traditional triplet loss function, demonstrating promising performance when training and testing sets are from different domains.

Journal Article
TL;DR: In this article, an actor-supervised architecture that exploits the inherent compositionality of actions in terms of actor transformations is proposed to localize actions in videos, which is end-to-end trainable.

Journal Article
TL;DR: A novel method of estimating class saliency maps is proposed, which significantly improves on the method of Simonyan et al. (2014), along with a method for retrieving “good seeds” by predicting the segmentation “easiness” of images based on the consistency of the outputs under different conditions.