Papers published in Computer Vision and Image Understanding in 2020

Open Access

Journal Article

UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking

Longyin Wen, Dawei Du¹, Zhaowei Cai², Zhen Lei, Ming-Ching Chang¹, Honggang Qi³, Jongwoo Lim⁴, Ming-Hsuan Yang⁵, Siwei Lyu¹

University at Albany, SUNY¹, University of California, San Diego², Chinese Academy of Sciences³, Hanyang University⁴, University of California, Merced⁵

01 Apr 2020-Computer Vision and Image Understanding

TL;DR: This work performs a comprehensive quantitative study of the effect of object detection accuracy on overall multi-object tracking (MOT) performance, using the new large-scale University at Albany DETection and tRACking (UA-DETRAC) benchmark dataset.

332 citations

Journal Article

Monocular human pose estimation: A survey of deep learning-based methods

Yucheng Chen¹, Yingli Tian², Mingyi He¹

Northwestern Polytechnical University¹, City University of New York²

01 Mar 2020-Computer Vision and Image Understanding

TL;DR: This survey extensively reviews recent deep learning-based 2D and 3D human pose estimation methods published since 2014; it summarizes the challenges, main frameworks, benchmark datasets, evaluation metrics, and performance comparisons, and discusses promising future research directions.

255 citations

Journal Article

Video Anomaly Detection and Localization via Gaussian Mixture Fully Convolutional Variational Autoencoder

Yaxiang Fan¹², Gongjian Wen², Deren Li³, Shaohua Qiu¹², Martin D. Levine⁴, Fei Xiao¹

Naval University of Engineering¹, National University of Defense Technology², Wuhan University³, McGill University⁴

01 Jun 2020-Computer Vision and Image Understanding

TL;DR: This study presents a novel end-to-end, partially supervised deep learning approach for video anomaly detection and localization that uses only normal samples. It is based on a Gaussian Mixture Variational Autoencoder, which learns feature representations of the normal samples as a Gaussian Mixture Model trained with deep learning.

123 citations
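The TL;DR describes modelling normal samples as a Gaussian Mixture Model. One common way to turn such a model into an anomaly score is the negative log-likelihood of a feature vector under the mixture. The sketch below is a hypothetical illustration, not the authors' network: it assumes diagonal covariances and hand-set mixture parameters standing in for what the paper learns from normal samples, and the function names are made up for this example.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance `var`."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_anomaly_score(x, weights, means, variances):
    """Negative log-likelihood of x under a diagonal-covariance GMM.

    Higher scores mean x looks less like the (normal) data the GMM models.
    """
    log_terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(log_terms)  # log-sum-exp trick for numerical stability
    return -(top + math.log(sum(math.exp(t - top) for t in log_terms)))

# Hypothetical two-component model standing in for learned "normal" features.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
print(gmm_anomaly_score([0.1, 0.0], weights, means, variances))    # low score
print(gmm_anomaly_score([10.0, -10.0], weights, means, variances))  # high score
```

A threshold on this score (also an assumption here) would then separate normal from anomalous samples.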

Journal Article

Pyramid Channel-based Feature Attention Network for image dehazing

Xiaoqin Zhang¹, Tao Wang¹, Jinxin Wang¹, Guiying Tang¹, Li Zhao¹

Wenzhou University¹

01 Aug 2020-Computer Vision and Image Understanding

TL;DR: Experimental results demonstrate that the proposed Pyramid Channel-based Feature Attention Network (PCFAN) outperforms existing state-of-the-art algorithms on standard benchmark datasets in terms of accuracy, efficiency, and visual effect.

119 citations

Journal Article

Infrared and visible image fusion via gradientlet filter

Jiayi Ma¹, Yi Zhou¹

Wuhan University¹

01 Aug 2020-Computer Vision and Image Understanding

TL;DR: The paper proposes an image filter, termed the gradientlet filter, that is based on a fuzzy gradient threshold function and global optimization and is designed from the perspective of luminance and gradient separation; it removes small gradient textures and noise while preserving the overall brightness and edge gradients of an image.

52 citations

Journal Article

Adversarial autoencoders for compact representations of 3D point clouds

Maciej Zamorski¹, Maciej Zięba¹, Piotr Klukowski¹, Rafał Nowak², Karol Kurach³, Wojciech Stokowiec, Tomasz Trzcinski⁴

Wrocław University of Technology¹, University of Wrocław², Google³, Warsaw University of Technology⁴

01 Apr 2020-Computer Vision and Image Understanding

TL;DR: The paper introduces the 3D Adversarial Autoencoder (3dAAE), presented as a state-of-the-art method for 3D point cloud generation.

51 citations

Journal Article

Age estimation from faces using deep learning: A comparative analysis

Alice Othmani¹, Abdul Rahman Taleb¹, Hazem Abdelkawy¹, Abdenour Hadid²

University of Paris¹, University of Oulu²

01 Jul 2020-Computer Vision and Image Understanding

TL;DR: The paper presents an extensive comparative analysis of several deep learning frameworks for real-age estimation and demonstrates that popular CNN frameworks perform well against state-of-the-art automatic age estimation methods.

37 citations

Journal Article

Adversarial examples for replay attacks against CNN-based face recognition with anti-spoofing capability

Bowen Zhang¹, Benedetta Tondi², Mauro Barni²

Xidian University¹, University of Siena²

01 Aug 2020-Computer Vision and Image Understanding

TL;DR: It is shown that attackers can successfully fool a face authentication system equipped with a deep learning spoof detection module, by exploiting the vulnerabilities of CNNs to adversarial perturbations.

30 citations

Journal Article

Learning deep edge prior for image denoising

Yingying Fang¹, Tieyong Zeng²

Hong Kong Baptist University¹, The Chinese University of Hong Kong²

01 Nov 2020-Computer Vision and Image Understanding

TL;DR: An efficient and trustworthy denoising scheme that combines convolutional neural networks with a traditional variational model to offer interpretable, high-quality reconstructions; it outperforms state-of-the-art interpretable denoising methods.

29 citations

Journal Article

Video captioning using boosted and parallel Long Short-Term Memory networks

Masoomeh Nabati¹, Alireza Behrad¹

Shahed University¹

01 Jan 2020-Computer Vision and Image Understanding

TL;DR: A new boosted and parallel architecture using Long Short-Term Memory (LSTM) networks is proposed for video captioning; it considerably improves the accuracy of the generated sentences.

29 citations

Journal Article

MTRNet++: One-stage mask-based scene text eraser

Osman Tursun¹, Simon Denman¹, Rui Zeng¹², Sabesan Sivapalan¹, Sridha Sridharan¹, Clinton Fookes¹

Queensland University of Technology¹, University of Sydney²

01 Dec 2020-Computer Vision and Image Understanding

TL;DR: Ablation studies demonstrate that the proposed multi-branch architecture with attention blocks is effective and essential, and the method demonstrates controllability and interpretability.

Journal Article

Ghost Removal via Channel Attention in Exposure Fusion

Qingsen Yan¹, Bo Wang², Peipei Li³, Xianjun Li³, Ao Zhang³, Qinfeng Shi¹, Zheng You², Yu Zhu³, Jinqiu Sun³, Yanning Zhang³

University of Adelaide¹, Tsinghua University², Northwestern Polytechnical University³

01 Dec 2020-Computer Vision and Image Understanding

TL;DR: A novel Multi-scale Channel Attention guided Network (MCANet) is proposed to address the ghosting problem by using multi-scale blocks consisting of dilated convolution layers to extract informative features.

Journal Article

The synergy of double attention: Combine sentence-level and word-level attention for image captioning

Haiyang Wei¹, Zhixin Li¹, Canlong Zhang¹, Huifang Ma¹²

Guangxi Normal University¹, Northwest Normal University²

01 Dec 2020-Computer Vision and Image Understanding

TL;DR: A double attention model is proposed that combines a sentence-level attention model with a word-level attention model to generate more accurate captions; it outperforms many state-of-the-art image captioning approaches on various evaluation metrics.

Journal Article

Visual complexity analysis using deep intermediate-layer features

Elham Saraee¹, Mona Jalal¹, Margrit Betke¹

Boston University¹

01 Jun 2020-Computer Vision and Image Understanding

TL;DR: An activation energy metric that combines convolutional layer activations to quantify visual complexity is derived and it is demonstrated that, within the context of a category, visually more complex images are also more memorable to human observers.

Journal Article

Hyperspectral image restoration via CNN denoiser prior regularized low-rank tensor recovery

Haijin Zeng¹, Xiaozhen Xie¹, Haojie Cui¹, Yuan Zhao¹, Jifeng Ning¹

Northwest A&F University¹

01 Aug 2020-Computer Vision and Image Understanding

TL;DR: This paper combines the advantages of traditional physical restoration models and denoising convolutional neural networks, introducing hyperspectral image (HSI) restoration with a CNN denoiser and low-rank tensor approximation-based regularization in a flexible and extensible plug-and-play framework.

Journal Article

An Entropic Optimal Transport loss for learning deep neural networks under label noise in remote sensing images

Bharath Bhushan Damodaran¹, Rémi Flamary², Vivien Seguy³, Nicolas Courty¹

University of Southern Brittany¹, Centre national de la recherche scientifique², Kyoto University³

01 Feb 2020-Computer Vision and Image Understanding

TL;DR: The authors propose a loss based on entropic optimal transport to mitigate the effect of inaccurate labels on the performance of deep neural networks in remote sensing image analysis.
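Entropic optimal transport is usually computed with Sinkhorn iterations, which alternately rescale the rows and columns of a Gibbs kernel until the transport plan matches both marginals. The sketch below shows only that generic building block, not the authors' loss: the histograms `a` and `b`, the cost matrix, and the regularization strength `eps` are illustrative assumptions, and how the paper builds its cost from images and labels is not reproduced here.

```python
import math

def sinkhorn(a, b, cost, eps=0.1, n_iters=200):
    """Entropic OT between histograms a (size n) and b (size m).

    Returns the transport plan and its transport cost sum_ij plan_ij * cost_ij.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Rescale rows to match a, then columns to match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    cost_value = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, cost_value

# Toy example: two bins, moving mass between them costs 1.
plan, cost_value = sinkhorn([0.2, 0.8], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]], eps=0.5)
```

Smaller `eps` brings the plan closer to unregularized optimal transport but makes the kernel entries smaller and convergence slower.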

Journal Article

Self-supervised on-line cumulative learning from video streams

Federico Pernici¹, Matteo Bruni¹, Alberto Del Bimbo¹

University of Florence¹

01 Aug 2020-Computer Vision and Image Understanding

TL;DR: A discriminative descriptor-matching solution based on Reverse Nearest Neighbor, combined with a memory-based cumulative learning strategy that discards redundant descriptors as time progresses, builds a comprehensive, cumulative representation of all the past visual information observed so far.
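Reverse Nearest Neighbor (RNN) matching inverts the usual lookup: instead of asking which stored descriptor is closest to a new one, it asks which stored descriptors have a given new descriptor as their own nearest neighbor. The sketch below computes that generic relation with Euclidean distance; the 2D points, the function names, and the idea of flagging queries with no reverse neighbors are illustrative assumptions, not the paper's exact matching or memory-update rules.

```python
import math

def nearest_index(point, candidates):
    """Index of the candidate closest to `point` (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda j: math.dist(point, candidates[j]))

def reverse_nearest_neighbors(memory, queries):
    """For each query index, list the memory descriptors whose nearest query it is."""
    rnn = {q: [] for q in range(len(queries))}
    for i, descriptor in enumerate(memory):
        rnn[nearest_index(descriptor, queries)].append(i)
    return rnn

memory = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]      # stored descriptors
queries = [(0.0, 0.05), (5.0, 5.1), (10.0, 10.0)]  # descriptors from a new frame
print(reverse_nearest_neighbors(memory, queries))
# {0: [0, 1], 1: [2], 2: []}
```

In this toy case query 2 is the nearest neighbor of no stored descriptor, so under RNN matching it stays unmatched.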

Journal Article

End-to-end deep learning-based fringe projection framework for 3D profiling of objects

Rakesh Chowdary Machineni¹, G. E. Spoorthi¹, Krishna Sumanth Vengala¹, Subrahmanyam Gorthi¹, Rama Krishna Sai Subrahmanyam Gorthi¹

Indian Institutes of Technology¹

01 Oct 2020-Computer Vision and Image Understanding

TL;DR: A novel end-to-end deep learning-based framework for fringe projection profilometry (FPP) is introduced that needs neither frequency-domain filtering nor phase unwrapping; it reconstructs the object's depth profile directly from the deformed fringe pattern using a multi-resolution similarity assessment convolutional neural network.

Journal Article

On the benefit of adversarial training for monocular depth estimation

Rick Groenendijk¹, Sezer Karaoglu, Theo Gevers¹, Thomas Mensink¹²

University of Amsterdam¹, Google²

01 Jan 2020-Computer Vision and Image Understanding

TL;DR: It is concluded that adversarial training is beneficial if and only if the reconstruction loss is not too constrained; when a constrained reconstruction loss is used in combination with batch normalisation, non-adversarial training outperforms (or is on par with) any method trained with a GAN.

Journal Article

An attention recurrent model for human cooperation detection

David Freire-Obregón¹, Modesto Castrillón-Santana¹, Paola Barra², Carmen Bisogni², Michele Nappi²

University of Las Palmas de Gran Canaria¹, University of Salerno²

01 Aug 2020-Computer Vision and Image Understanding

TL;DR: This paper considers human cooperative behaviour in front of wearable security cameras and proposes a deep learning pipeline for human cooperation detection, based on an RNN architecture, that aims to detect whether a person is behaving adversarially by trying to avoid the camera.

Journal Article

Cascade multi-head attention networks for action recognition

Jiaze Wang¹², Xiaojiang Peng¹², Yu Qiao¹²

Chinese Academy of Sciences¹, Shenzhen University²

01 Mar 2020-Computer Vision and Image Understanding

TL;DR: A new attention network architecture, termed the Cascade multi-head ATtention Network (CATNet), is proposed; it constructs video representations with two levels of attention, namely multi-head local self-attention and relation-based global attention.
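Multi-head self-attention, one of the two attention levels mentioned in the TL;DR, splits each frame feature into channel groups ("heads") and lets every frame take a softmax-weighted average of all frames within each group. The sketch below shows plain scaled dot-product multi-head self-attention over a short sequence of frame features. It is a simplified illustration, not CATNet: the query, key, and value projections are assumed to be identity maps (real layers learn them), and the frame features are made-up numbers.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_head_self_attention(x, num_heads):
    """Scaled dot-product self-attention over T frame features of size D.

    Each head attends over its own D // num_heads channels. Q, K and V are
    identity projections here (a simplifying assumption).
    """
    t, d = len(x), len(x[0])
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    dh = d // num_heads
    out = [[0.0] * d for _ in range(t)]
    for h in range(num_heads):
        start, end = h * dh, (h + 1) * dh
        xh = [row[start:end] for row in x]  # this head's channels
        for i in range(t):
            scores = [
                sum(q * k for q, k in zip(xh[i], xh[j])) / math.sqrt(dh)
                for j in range(t)
            ]
            weights = softmax(scores)
            for c in range(dh):
                out[i][start + c] = sum(weights[j] * xh[j][c] for j in range(t))
    return out

frames = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5], [1.0, 1.0, 0.0, 0.0]]
attended = multi_head_self_attention(frames, num_heads=2)
```

Because each output channel is a convex combination of the same channel across frames, identical input frames come back unchanged.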

Journal Article

Product image recognition with guidance learning and noisy supervision

Qing Li¹², Xiaojiang Peng², Liangliang Cao³, Wenbin Du², Hao Xing, Yu Qiao², Qiang Peng¹

Southwest Jiaotong University¹, Chinese Academy of Sciences², University of Massachusetts Amherst³

01 Jul 2020-Computer Vision and Image Understanding

TL;DR: A novel large-scale product image dataset, termed Product-90, and a simple yet efficient guidance learning method for training convolutional neural networks (CNNs) with noisy supervision; the method achieves performance superior to state-of-the-art methods on these datasets.

Journal Article

Momental directional patterns for dynamic texture recognition

Thanh Tuan Nguyen, Thanh Phuong Nguyen, Frédéric Bouchara, Xuan Son Nguyen

01 May 2020-Computer Vision and Image Understanding

TL;DR: Motivated by convolutional neural networks, a new framework called Momental Directional Patterns is presented that takes advantage of both filtering-based and local-feature-based approaches to form effective dynamic texture (DT) descriptors.

Journal Article

Residual network with detail perception loss for single image super-resolution

Zhijie Wen¹, Jiawei Guan¹, Tieyong Zeng², Ying Li¹

Shanghai University¹, The Chinese University of Hong Kong²

01 Oct 2020-Computer Vision and Image Understanding

TL;DR: A network that uses residual blocks with cascading simple blocks to improve image resolution is proposed, together with a novel loss function called detail perception loss, which measures the difference between the wavelet coefficients of the reconstructed image and those of the ground truth.
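A loss on wavelet coefficients, as described in the TL;DR, compares two images after transforming them into a coarse approximation band and several detail bands. The sketch below assumes a single-level orthonormal 2D Haar transform and an unweighted mean absolute (L1) difference over all coefficients; the paper's choice of wavelet, number of levels, and band weighting may differ, and the function names are hypothetical.

```python
def haar2d(img):
    """One-level orthonormal 2D Haar transform (height and width must be even).

    Returns four sub-bands: the approximation band followed by three detail bands.
    """
    bands = ([], [], [], [])
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 2)  # approximation (scaled local average)
            rows[1].append((a - b + d - e) / 2)  # left-right difference
            rows[2].append((a + b - d - e) / 2)  # top-bottom difference
            rows[3].append((a - b - d + e) / 2)  # diagonal difference
        for band, row in zip(bands, rows):
            band.append(row)
    return bands

def wavelet_l1_loss(pred, target):
    """Mean absolute difference between the Haar coefficients of two images."""
    total, count = 0.0, 0
    for band_p, band_t in zip(haar2d(pred), haar2d(target)):
        for row_p, row_t in zip(band_p, band_t):
            for x, y in zip(row_p, row_t):
                total += abs(x - y)
                count += 1
    return total / count
```

Weighting the three detail bands more heavily than the approximation band would push such a loss further toward edges and fine texture.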

Journal Article

Classifier-agnostic saliency map extraction

Konrad Zolna¹², Krzysztof J. Geras², Kyunghyun Cho²³⁴

Jagiellonian University¹, New York University², Canadian Institute for Advanced Research³, Facebook⁴

01 Jul 2020-Computer Vision and Image Understanding

TL;DR: This work proposes classifier-agnostic saliency map extraction, which finds all parts of the image that any classifier could use, not just one given in advance, and extracts higher quality saliency maps than prior work.

Journal Article

JSNet: A simulation network of JPEG lossy compression and restoration for robust image watermarking against JPEG attack

Beijing Chen¹, Yunqing Wu¹, Gouenou Coatrieux², Xiao Chen¹, Yuhui Zheng¹

Nanjing University of Information Science and Technology¹, French Institute of Health and Medical Research²

01 Aug 2020-Computer Vision and Image Understanding

TL;DR: A JPEG simulation network, JSNet, is proposed that reproduces the whole JPEG lossy compression and restoration procedure, except entropy encoding, as realistically as possible in order to enhance the robustness of deep learning-based watermarking methods.
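The lossy part of JPEG that such a simulation must reproduce is, per 8×8 block: level shift, 2D DCT, quantization (rounding coefficients divided by a quantization table), dequantization, and inverse DCT; entropy coding is lossless and can be skipped. The sketch below is a plain, non-differentiable round trip on one grayscale block, not JSNet. The flat quantization tables in the usage are illustrative assumptions rather than the standard JPEG tables, and chroma handling is omitted.

```python
import math

def dct_basis(u, x):
    """Orthonormal 1D DCT-II basis value for frequency u at position x (N = 8)."""
    c = math.sqrt(1 / 8) if u == 0 else math.sqrt(2 / 8)
    return c * math.cos((2 * x + 1) * u * math.pi / 16)

def jpeg_block_roundtrip(block, q):
    """Compress and restore one 8x8 block (values 0..255) with quantization table q."""
    shifted = [[p - 128 for p in row] for row in block]  # level shift
    coeffs = [[sum(dct_basis(u, x) * dct_basis(v, y) * shifted[x][y]
                   for x in range(8) for y in range(8))
               for v in range(8)] for u in range(8)]    # forward 2D DCT
    # Quantize then dequantize: the only lossy step.
    deq = [[round(coeffs[u][v] / q[u][v]) * q[u][v] for v in range(8)] for u in range(8)]
    recon = [[sum(dct_basis(u, x) * dct_basis(v, y) * deq[u][v]
                  for u in range(8) for v in range(8)) + 128
              for y in range(8)] for x in range(8)]      # inverse 2D DCT
    return [[min(255, max(0, round(p))) for p in row] for row in recon]

flat_q = [[16] * 8 for _ in range(8)]  # hypothetical flat table
restored = jpeg_block_roundtrip([[100] * 8 for _ in range(8)], flat_q)
```

A very coarse table wipes out all high-frequency content; a 0/255 checkerboard, for example, collapses to a uniform mid-grey block.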

Journal Article

Scalable learning for bridging the species gap in image-based plant phenotyping

Daniel Ward¹, Peyman Moghadam¹

Commonwealth Scientific and Industrial Research Organisation¹

01 Aug 2020-Computer Vision and Image Understanding

TL;DR: UPGen is a generic generative model for image-based plant phenotyping that leverages domain randomization to produce widely distributed data samples and models stochastic biological variation.

Journal Article

Joint identification–verification for person re-identification: A four stream deep learning approach with improved quartet loss function

Amena Khatun¹, Simon Denman¹, Sridha Sridharan¹, Clinton Fookes¹

Queensland University of Technology¹

01 Aug 2020-Computer Vision and Image Understanding

TL;DR: A deep four-stream convolutional neural network is proposed for person re-identification to overcome the poor generalisation of the traditional triplet loss function, demonstrating promising performance when training and testing sets are from different domains.

Journal Article

Guess where? Actor-supervision for spatiotemporal action localization

Victor Escorcia¹, Cuong D. Dao¹, Mihir Jain², Bernard Ghanem¹, Cees G. M. Snoek³

King Abdullah University of Science and Technology¹, Qualcomm², University of Amsterdam³

01 Mar 2020-Computer Vision and Image Understanding

TL;DR: An end-to-end trainable, actor-supervised architecture is proposed for localizing actions in videos; it exploits the inherent compositionality of actions in terms of actor transformations.

Journal Article

Weakly supervised semantic segmentation using distinct class specific saliency maps

Wataru Shimoda¹, Keiji Yanai¹

University of Electro-Communications¹

01 Feb 2020-Computer Vision and Image Understanding

TL;DR: A novel method for estimating class saliency maps that significantly improves on the method of Simonyan et al. (2014), and a method for retrieving “good seeds” by predicting the segmentation “easiness” of images from the consistency of the outputs under different conditions.