
Showing papers on "Feature (computer vision)" published in 2020


Proceedings ArticleDOI
Mingxing Tan, Ruoming Pang, Quoc V. Le
14 Jun 2020
TL;DR: EfficientDet as discussed by the authors combines a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion, with a compound scaling method that uniformly scales the resolution, depth, and width of the backbone, feature network, and box/class prediction networks at the same time.
Abstract: Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion; Second, we propose a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations and EfficientNet backbones, we have developed a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior art across a wide spectrum of resource constraints. In particular, with single-model and single-scale, our EfficientDet-D7 achieves state-of-the-art 52.2 AP on COCO test-dev with 52M parameters and 325B FLOPs, being 4x-9x smaller and using 13x-42x fewer FLOPs than previous detectors.

3,423 citations
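As a concrete illustration of the BiFPN fusion described above, here is a minimal PyTorch sketch of the paper's fast normalized fusion, in which each input feature map gets a learnable non-negative weight; the module and variable names are ours, not the paper's.

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """Weighted fusion of same-shape feature maps:
    out = sum_i(w_i * x_i) / (sum_j w_j + eps), with w_i kept
    non-negative via ReLU, as in BiFPN's fast normalized fusion."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):
        # inputs: list of tensors with identical shape (resized/projected beforehand)
        w = torch.relu(self.weights)
        w = w / (w.sum() + self.eps)
        return sum(wi * xi for wi, xi in zip(w, inputs))

# Example: fuse a top-down upsampled map with a lateral map at one pyramid level.
fuse = FastNormalizedFusion(num_inputs=2)
p_td = torch.randn(1, 64, 32, 32)   # upsampled coarser level
p_in = torch.randn(1, 64, 32, 32)   # lateral input at this level
print(fuse([p_td, p_in]).shape)     # torch.Size([1, 64, 32, 32])
```

In BiFPN this fusion is applied at every node of the bidirectional pyramid, after the inputs have been resized and projected to a common shape.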


Journal ArticleDOI
TL;DR: A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.
Abstract: Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images. Deep learning techniques have emerged as a powerful strategy for learning feature representations directly from data and have led to remarkable breakthroughs in the field of generic object detection. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought about by deep learning techniques. More than 300 research contributions are included in this survey, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics. We finish the survey by identifying promising directions for future research.

1,897 citations


Journal ArticleDOI
TL;DR: A set of 169 radiomics features was standardized, enabling verification and calibration of different radiomics software; most of the standardized features could be excellently reproduced.
Abstract: Background Radiomic features may quantify characteristics present in medical imaging. However, the lack of standardized definitions and validated reference values has hampered clinical use. Purpose To standardize a set of 174 radiomic features. Materials and Methods Radiomic features were assessed in three phases. In phase I, 487 features were derived from the basic set of 174 features. Twenty-five research teams with unique radiomics software implementations computed feature values directly from a digital phantom, without any additional image processing. In phase II, 15 teams computed values for 1347 derived features using a CT image of a patient with lung cancer and predefined image processing configurations. In both phases, consensus among the teams on the validity of tentative reference values was measured through the frequency of the modal value and classified as follows: less than three matches, weak; three to five matches, moderate; six to nine matches, strong; 10 or more matches, very strong. In the final phase (phase III), a public data set of multimodality images (CT, fluorine 18 fluorodeoxyglucose PET, and T1-weighted MRI) from 51 patients with soft-tissue sarcoma was used to prospectively assess reproducibility of standardized features. Results Consensus on reference values was initially weak for 232 of 302 features (76.8%) at phase I and 703 of 1075 features (65.4%) at phase II. At the final iteration, weak consensus remained for only two of 487 features (0.4%) at phase I and 19 of 1347 features (1.4%) at phase II. Strong or better consensus was achieved for 463 of 487 features (95.1%) at phase I and 1220 of 1347 features (90.6%) at phase II. Overall, 169 of 174 features were standardized in the first two phases. In the final validation phase (phase III), most of the 169 standardized features could be excellently reproduced (166 with CT; 164 with PET; and 164 with MRI). Conclusion A set of 169 radiomics features was standardized, which enabled verification and calibration of different radiomics software. © RSNA, 2020 Online supplemental material is available for this article. See also the editorial by Kuhl and Truhn in this issue.

1,563 citations


Proceedings ArticleDOI
04 May 2020
TL;DR: A novel UNet 3+ is proposed, which takes advantage of full-scale skip connections and deep supervision, and can reduce the number of network parameters to improve computational efficiency.
Abstract: Recently, a growing interest has been seen in deep learning-based semantic segmentation. UNet, a deep learning network with an encoder-decoder architecture, is widely used in medical image segmentation. Combining multi-scale features is one of the important factors for accurate segmentation. UNet++ was developed as a modified UNet by designing an architecture with nested and dense skip connections. However, it does not explore sufficient information from full scales and there is still large room for improvement. In this paper, we propose a novel UNet 3+, which takes advantage of full-scale skip connections and deep supervision. The full-scale skip connections incorporate low-level details with high-level semantics from feature maps at different scales, while the deep supervision learns hierarchical representations from the full-scale aggregated feature maps. The proposed method is especially beneficial for organs that appear at varying scales. In addition to accuracy improvements, the proposed UNet 3+ can reduce the network parameters to improve computational efficiency. We further propose a hybrid loss function and devise a classification-guided module to enhance the organ boundary and reduce over-segmentation in non-organ images, yielding more accurate segmentation results. The effectiveness of the proposed method is demonstrated on two datasets. The code is available at: github.com/ZJUGiveLab/UNet-Version

897 citations
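A minimal PyTorch sketch of the full-scale skip connection idea above: one decoder stage gathers feature maps from all scales, resizes them to its own resolution, and fuses them. The paper max-pools shallower scales and bilinearly upsamples deeper ones; this sketch uses bilinear resizing throughout for brevity, and all names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullScaleSkipBlock(nn.Module):
    """One UNet 3+-style decoder stage: features from every scale are
    resized to this stage's resolution, projected to a common channel
    width, concatenated, and fused."""
    def __init__(self, in_channels_list, width=64):
        super().__init__()
        self.projs = nn.ModuleList(
            nn.Conv2d(c, width, 3, padding=1) for c in in_channels_list)
        n = len(in_channels_list) * width
        self.fuse = nn.Sequential(
            nn.Conv2d(n, n, 3, padding=1), nn.BatchNorm2d(n), nn.ReLU(inplace=True))

    def forward(self, feats, out_hw):
        resized = [F.interpolate(p(f), size=out_hw, mode="bilinear",
                                 align_corners=False)
                   for p, f in zip(self.projs, feats)]
        return self.fuse(torch.cat(resized, dim=1))

# Example: fuse encoder maps from four scales at a 64x64 decoder stage.
feats = [torch.randn(1, c, s, s) for c, s in [(32, 128), (64, 64), (128, 32), (256, 16)]]
print(FullScaleSkipBlock([32, 64, 128, 256])(feats, (64, 64)).shape)  # (1, 256, 64, 64)
```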


Proceedings ArticleDOI
14 Jun 2020
Abstract: Deploying convolutional neural networks (CNNs) on embedded devices is difficult due to the limited memory and computation resources. The redundancy in feature maps is an important characteristic of those successful CNNs, but has rarely been investigated in neural architecture design. This paper proposes a novel Ghost module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of cheap linear transformations to generate many ghost feature maps that fully reveal the information underlying the intrinsic features. The proposed Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks. Ghost bottlenecks are designed to stack Ghost modules, and then the lightweight GhostNet can be easily established. Experiments conducted on benchmarks demonstrate that the proposed Ghost module is an impressive alternative to convolution layers in baseline models, and our GhostNet can achieve higher recognition performance (e.g. 75.7% top-1 accuracy) than MobileNetV3 with similar computational cost on the ImageNet ILSVRC-2012 classification dataset. Code is available at https://github.com/huawei-noah/ghostnet.

880 citations
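The Ghost module described above can be sketched in a few lines of PyTorch: a primary convolution produces the intrinsic maps, and a cheap depthwise convolution derives the ghost maps. This assumes depthwise convolution as the cheap linear transformation (as in the reference implementation); the names are ours.

```python
import math
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Primary conv makes a few 'intrinsic' maps; a cheap depthwise conv
    derives the remaining 'ghost' maps from them; both are concatenated."""
    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        init_ch = math.ceil(out_ch / ratio)          # intrinsic maps
        new_ch = init_ch * (ratio - 1)               # ghost maps
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, new_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=init_ch, bias=False),   # depthwise = cheap linear op
            nn.BatchNorm2d(new_ch), nn.ReLU(inplace=True))
        self.out_ch = out_ch

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)[:, :self.out_ch]

print(GhostModule(16, 64)(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```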


Proceedings ArticleDOI
14 Jun 2020
TL;DR: SuperGlue is introduced, a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points; a flexible context aggregation mechanism based on attention enables SuperGlue to reason about the underlying 3D scene and feature assignments jointly.
Abstract: This paper introduces SuperGlue, a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points. Assignments are estimated by solving a differentiable optimal transport problem, whose costs are predicted by a graph neural network. We introduce a flexible context aggregation mechanism based on attention, enabling SuperGlue to reason about the underlying 3D scene and feature assignments jointly. Compared to traditional, hand-designed heuristics, our technique learns priors over geometric transformations and regularities of the 3D world through end-to-end training from image pairs. SuperGlue outperforms other learned approaches and achieves state-of-the-art results on the task of pose estimation in challenging real-world indoor and outdoor environments. The proposed method performs matching in real-time on a modern GPU and can be readily integrated into modern SfM or SLAM systems. The code and trained weights are publicly available at github.com/magicleap/SuperGluePretrainedNetwork.

656 citations
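The differentiable optimal transport step above is typically solved with Sinkhorn iterations; below is a minimal log-domain sketch for a square score matrix. SuperGlue additionally appends a learned "dustbin" row and column for unmatched points, which is omitted here for brevity.

```python
import torch

def log_sinkhorn(scores: torch.Tensor, iters: int = 20) -> torch.Tensor:
    """Log-domain Sinkhorn normalization of a matching score matrix.
    Alternately normalizes rows and columns so the result approaches a
    doubly stochastic assignment; returns log probabilities."""
    log_p = scores
    for _ in range(iters):
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)  # rows
        log_p = log_p - torch.logsumexp(log_p, dim=0, keepdim=True)  # columns
    return log_p

# Example: scores from descriptor similarity between two feature sets.
desc_a = torch.nn.functional.normalize(torch.randn(5, 256), dim=1)
desc_b = torch.nn.functional.normalize(torch.randn(5, 256), dim=1)
P = log_sinkhorn(desc_a @ desc_b.t() / 0.1).exp()
print(P.sum(dim=1))  # each row sums to approximately 1 after convergence
```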


Proceedings ArticleDOI
14 Jun 2020
TL;DR: PointVoxel-RCNN as discussed by the authors combines a 3D voxel convolutional neural network (CNN) with PointNet-based set abstraction to learn more discriminative point cloud features.
Abstract: We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN), for accurate 3D object detection from point clouds. Our proposed method deeply integrates both the 3D voxel Convolutional Neural Network (CNN) and PointNet-based set abstraction to learn more discriminative point cloud features. It takes advantage of the efficient learning and high-quality proposals of the 3D voxel CNN and the flexible receptive fields of the PointNet-based networks. Specifically, the proposed framework summarizes the 3D scene with a 3D voxel CNN into a small set of keypoints via a novel voxel set abstraction module to save follow-up computations and also to encode representative scene features. Given the high-quality 3D proposals generated by the voxel CNN, RoI-grid pooling is proposed to abstract proposal-specific features from the keypoints to the RoI-grid points via keypoint set abstraction. Compared with conventional pooling operations, the RoI-grid feature points encode much richer context information for accurately estimating object confidences and locations. Extensive experiments on both the KITTI dataset and the Waymo Open dataset show that our proposed PV-RCNN surpasses state-of-the-art 3D detection methods by remarkable margins.

538 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: This paper proposes a novel filter pruning method by exploring the High Rank of feature maps (HRank), inspired by the discovery that the average rank of multiple feature maps generated by a single filter is always the same, regardless of the number of image batches CNNs receive.
Abstract: Neural network pruning offers a promising prospect to facilitate deploying deep neural networks on resource-limited devices. However, existing methods are still challenged by the training inefficiency and labor cost in pruning designs, due to missing theoretical guidance on non-salient network components. In this paper, we propose a novel filter pruning method by exploring the High Rank of feature maps (HRank). Our HRank is inspired by the discovery that the average rank of multiple feature maps generated by a single filter is always the same, regardless of the number of image batches CNNs receive. Based on HRank, we develop a method that is mathematically formulated to prune filters with low-rank feature maps. The principle behind our pruning is that low-rank feature maps contain less information, and thus pruned results can be easily reproduced. Besides, we experimentally show that weights with high-rank feature maps contain more important information, such that even when a portion is not updated, very little damage would be done to the model performance. Without introducing any additional constraints, HRank leads to significant improvements over the state of the art in terms of FLOPs and parameter reduction, with similar accuracies. For example, with ResNet-110, we achieve a 58.2%-FLOPs reduction by removing 59.2% of the parameters, with only a small loss of 0.14% in top-1 accuracy on CIFAR-10. With ResNet-50, we achieve a 43.8%-FLOPs reduction by removing 36.7% of the parameters, with only a loss of 1.17% in the top-1 accuracy on ImageNet. The code is available at https://github.com/lmbxmu/HRank.

527 citations
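The rank statistic that drives the pruning above is straightforward to compute; this sketch estimates the average rank of each filter's feature maps over a mini-batch and keeps the highest-ranked filters. It is illustrative only: random activations are full-rank, unlike real feature maps.

```python
import torch

@torch.no_grad()
def average_feature_map_rank(feature_maps: torch.Tensor) -> torch.Tensor:
    """Average matrix rank of each filter's feature maps over a batch.
    feature_maps: (batch, channels, H, W) activations from one conv layer.
    HRank observes this average is stable across batches and prunes the
    filters whose feature maps have the lowest rank."""
    b, c, h, w = feature_maps.shape
    ranks = torch.linalg.matrix_rank(feature_maps.reshape(b * c, h, w).float())
    return ranks.reshape(b, c).float().mean(dim=0)

# Example: keep the 75% of filters with the highest average rank.
acts = torch.randn(8, 16, 32, 32)          # activations for a mini-batch
avg_rank = average_feature_map_rank(acts)
keep = torch.argsort(avg_rank, descending=True)[: int(16 * 0.75)]
print(sorted(keep.tolist()))
```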


Proceedings ArticleDOI
14 Jun 2020
TL;DR: HigherHRNet is presented, a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids; it surpasses all top-down methods on CrowdPose test and achieves a new state-of-the-art result on COCO test-dev, suggesting its robustness in crowded scenes.
Abstract: Bottom-up human pose estimation methods have difficulties in predicting the correct pose for small persons due to challenges in scale variation. In this paper, we present HigherHRNet: a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids. Equipped with multi-resolution supervision for training and multi-resolution aggregation for inference, the proposed approach is able to solve the scale variation challenge in bottom-up multi-person pose estimation and localize keypoints more precisely, especially for small persons. The feature pyramid in HigherHRNet consists of feature map outputs from HRNet and upsampled higher-resolution outputs through a transposed convolution. HigherHRNet outperforms the previous best bottom-up method by 2.5% AP for medium persons on COCO test-dev, showing its effectiveness in handling scale variation. Furthermore, HigherHRNet achieves a new state-of-the-art result on COCO test-dev (70.5% AP) without using refinement or other post-processing techniques, surpassing all existing bottom-up methods. HigherHRNet even surpasses all top-down methods on CrowdPose test (67.6% AP), suggesting its robustness in crowded scenes.

459 citations
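A simplified PyTorch sketch of the feature pyramid idea above: heatmaps are predicted at the backbone resolution and, via a transposed convolution, at twice that resolution, then aggregated by averaging at inference. The actual HigherHRNet concatenates predicted heatmaps with features before the deconvolution; this sketch omits that detail, and the names are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HigherResolutionHead(nn.Module):
    """Predict keypoint heatmaps at two resolutions and average them
    at the higher resolution."""
    def __init__(self, in_ch, num_joints):
        super().__init__()
        self.head_lo = nn.Conv2d(in_ch, num_joints, 1)
        self.deconv = nn.ConvTranspose2d(in_ch, in_ch, 4, stride=2, padding=1)
        self.head_hi = nn.Conv2d(in_ch, num_joints, 1)

    def forward(self, feats):
        hm_lo = self.head_lo(feats)                       # backbone resolution
        hm_hi = self.head_hi(self.deconv(feats))          # 2x resolution
        hm_lo_up = F.interpolate(hm_lo, size=hm_hi.shape[-2:],
                                 mode="bilinear", align_corners=False)
        return (hm_lo_up + hm_hi) / 2                     # aggregated heatmaps

print(HigherResolutionHead(32, 17)(torch.randn(1, 32, 64, 48)).shape)  # (1, 17, 128, 96)
```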


Proceedings ArticleDOI
14 Jun 2020
TL;DR: A framework that combines embedding with activation tensor manipulation to perform high-quality local edits along with global semantic edits on images; its noise optimization can restore high-frequency features in images and thus significantly improves the quality of reconstructed images.
Abstract: We propose Image2StyleGAN++, a flexible image editing framework with many applications. Our framework extends the recent Image2StyleGAN in three ways. First, we introduce noise optimization as a complement to the W+ latent space embedding. Our noise optimization can restore high-frequency features in images and thus significantly improves the quality of reconstructed images, e.g., a large increase in PSNR from 20 dB to 45 dB. Second, we extend the global W+ latent space embedding to enable local embeddings. Third, we combine embedding with activation tensor manipulation to perform high-quality local edits along with global semantic edits on images. Such edits motivate various high-quality image editing applications, e.g., image reconstruction, image inpainting, image crossover, local style transfer, image editing using scribbles, and attribute level feature transfer. Examples of the edited images are shown across the paper for visual inspection.

440 citations


Journal ArticleDOI
TL;DR: With the proposed approach in this study, it is evident that the model can contribute efficiently to the detection of COVID-19 disease.

Posted Content
TL;DR: The key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis; combining this representation with a neural rendering pipeline yields a fast and realistic image synthesis model.
Abstract: Deep generative models allow for photorealistic image synthesis at high resolutions. But for many applications, this is not enough: content creation also needs to be controllable. While several recent works investigate how to disentangle underlying factors of variation in the data, most of them operate in 2D and hence ignore that our world is three-dimensional. Further, only a few works consider the compositional nature of scenes. Our key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis. Representing scenes as compositional generative neural feature fields allows us to disentangle one or multiple objects from the background as well as individual objects' shapes and appearances while learning from unstructured and unposed image collections without any additional supervision. Combining this scene representation with a neural rendering pipeline yields a fast and realistic image synthesis model. As evidenced by our experiments, our model is able to disentangle individual objects and allows for translating and rotating them in the scene as well as changing the camera pose.

Journal ArticleDOI
TL;DR: A new framework, named MedGAN, is proposed for medical image-to-image translation, which operates on the image level in an end-to-end manner and outperforms other existing translation approaches.

Journal ArticleDOI
TL;DR: A deep bilinear model for blind image quality assessment that works for both synthetically and authentically distorted images and achieves state-of-the-art performance on both synthetic and authentic IQA databases is proposed.
Abstract: We propose a deep bilinear model for blind image quality assessment that works for both synthetically and authentically distorted images. Our model constitutes two streams of deep convolutional neural networks (CNNs), specializing in the two distortion scenarios separately. For synthetic distortions, we first pre-train a CNN to classify the distortion type and level of an input image, whose ground truth label is readily available at a large scale. For authentic distortions, we make use of a CNN (VGG-16) pre-trained on the image classification task. The two feature sets are bilinearly pooled into one representation for a final quality prediction. We fine-tune the whole network on the target databases using a variant of stochastic gradient descent. Extensive experimental results show that the proposed model achieves state-of-the-art performance on both synthetic and authentic IQA databases. Furthermore, we verify the generalizability of our method on the large-scale Waterloo Exploration Database, and demonstrate its competitiveness using the group maximum differentiation competition methodology.
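The bilinear pooling step above can be sketched as an outer product of the two streams' channel vectors averaged over spatial locations, followed by the usual signed-square-root and l2 normalization. This follows a generic bilinear-pooling recipe, not the paper's exact code; names are ours.

```python
import torch
import torch.nn.functional as F

def bilinear_pool(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Bilinearly pool two CNN feature maps into one representation:
    outer product of channel vectors averaged over locations, then
    signed-sqrt and l2 normalization."""
    b, ca, h, w = feat_a.shape
    cb = feat_b.shape[1]
    a = feat_a.reshape(b, ca, h * w)
    v = feat_b.reshape(b, cb, h * w)
    pooled = torch.bmm(a, v.transpose(1, 2)) / (h * w)   # (b, ca, cb)
    pooled = pooled.reshape(b, ca * cb)
    pooled = torch.sign(pooled) * torch.sqrt(pooled.abs() + 1e-8)
    return F.normalize(pooled, dim=1)

# Example: fuse the synthetic-distortion and authentic-distortion streams.
x = bilinear_pool(torch.randn(2, 128, 7, 7), torch.randn(2, 512, 7, 7))
print(x.shape)  # torch.Size([2, 65536])
```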

Proceedings ArticleDOI
14 Jun 2020
TL;DR: In this paper, the authors proposed Implicit Feature Networks (IF-Nets), which deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data, retaining the nice properties of recent learned implicit functions.
Abstract: While many works focus on 3D reconstruction from images, in this paper, we focus on 3D shape reconstruction and completion from a variety of 3D inputs, which are deficient in some respect: low and high resolution voxels, sparse and dense point clouds, complete or incomplete. Processing of such 3D inputs is an increasingly important problem as they are the output of 3D scanners, which are becoming more accessible, and are the intermediate output of 3D computer vision algorithms. Recently, learned implicit functions have shown great promise as they produce continuous reconstructions. However, we identified two limitations in reconstruction from 3D inputs: 1) details present in the input data are not retained, and 2) articulated humans are poorly reconstructed. To solve this, we propose Implicit Feature Networks (IF-Nets), which deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data, retaining the nice properties of recent learned implicit functions, but critically they can also retain detail when it is present in the input data, and can reconstruct articulated humans. Our work differs from prior work in two crucial aspects. First, instead of using a single vector to encode a 3D shape, we extract a learnable 3-dimensional multi-scale tensor of deep features, which is aligned with the original Euclidean space embedding the shape. Second, instead of classifying x-y-z point coordinates directly, we classify deep features extracted from the tensor at a continuous query point. We show that this forces our model to make decisions based on global and local shape structure, as opposed to point coordinates, which are arbitrary under Euclidean transformations. Experiments demonstrate that IF-Nets outperform prior work in 3D object reconstruction on ShapeNet, and obtain significantly more accurate 3D human reconstructions. Code and project website are available at https://virtualhumans.mpi-inf.mpg.de/ifnets/.
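The second design choice above, classifying deep features sampled at continuous query points rather than raw coordinates, reduces to trilinear interpolation of the multi-scale feature grids. A minimal PyTorch sketch, with shapes and names of our choosing:

```python
import torch
import torch.nn.functional as F

def query_multiscale_features(grids, points):
    """Sample deep features at continuous query points: trilinearly
    interpolate each multi-scale feature grid at the query locations
    and concatenate the results.
    grids: list of (1, C_k, D, H, W) feature tensors
    points: (N, 3) coordinates in [-1, 1]"""
    p = points.view(1, -1, 1, 1, 3)            # grid_sample expects (B, d, h, w, 3)
    feats = [F.grid_sample(g, p, align_corners=True).view(g.shape[1], -1)
             for g in grids]
    return torch.cat(feats, dim=0).t()         # (N, sum_k C_k)

coarse = torch.randn(1, 32, 8, 8, 8)           # coarse, semantically rich grid
fine = torch.randn(1, 16, 32, 32, 32)          # fine, detail-preserving grid
q = torch.rand(100, 3) * 2 - 1                 # 100 continuous query points
print(query_multiscale_features([coarse, fine], q).shape)  # torch.Size([100, 48])
```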

Journal ArticleDOI
Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, Huizhu Jia
03 Apr 2020
TL;DR: Qin et al. as mentioned in this paper proposed an end-to-end feature fusion attention network (FFA-Net) to directly restore the haze-free image, which consists of three key components, including a Feature Attention (FA) module that combines Channel Attention with a Pixel Attention mechanism, considering that different channel-wise features contain totally different weighted information and haze distribution is uneven across image pixels.
Abstract: In this paper, we propose an end-to-end feature fusion attention network (FFA-Net) to directly restore the haze-free image. The FFA-Net architecture consists of three key components: 1) A novel Feature Attention (FA) module combines a Channel Attention with a Pixel Attention mechanism, considering that different channel-wise features contain totally different weighted information and haze distribution is uneven across image pixels. FA treats different features and pixels unequally, which provides additional flexibility in dealing with different types of information and expands the representational ability of CNNs. 2) A basic block structure consists of Local Residual Learning and Feature Attention; Local Residual Learning allows less important information, such as thin haze regions or low frequencies, to be bypassed through multiple local residual connections, letting the main network architecture focus on more effective information. 3) An attention-based multi-level Feature Fusion (FFA) structure, in which the feature weights are adaptively learned from the Feature Attention (FA) module, giving more weight to important features. This structure can also retain the information of shallow layers and pass it into deep layers. The experimental results demonstrate that our proposed FFA-Net surpasses previous state-of-the-art single image dehazing methods by a very large margin both quantitatively and qualitatively, boosting the best published PSNR metric from 30.23 dB to 36.39 dB on the SOTS indoor test dataset. Code has been made available at GitHub.
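A minimal PyTorch sketch of the FA module described above, with channel attention followed by pixel attention; layer sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    """Channel attention (global pooling + small MLP) followed by pixel
    attention (per-pixel gate), so channels and pixels are treated unequally."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())
        self.pixel = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, 1, 1), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)   # reweight channels
        return x * self.pixel(x)  # reweight spatial positions

print(FeatureAttention(64)(torch.randn(1, 64, 16, 16)).shape)  # (1, 64, 16, 16)
```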

Journal ArticleDOI
TL;DR: A smart healthcare system is proposed for heart disease prediction using ensemble deep learning and feature fusion approaches; it obtains an accuracy of 98.5%, which is higher than existing systems.

Posted Content
TL;DR: This paper proposes Recursive Feature Pyramid, which incorporates extra feedback connections from Feature Pyramid Networks into the bottom-up backbone layers and proposes Switchable Atrous Convolution, which convolves the features with different atrous rates and gathers the results using switch functions.
Abstract: Many modern object detectors demonstrate outstanding performances by using the mechanism of looking and thinking twice. In this paper, we explore this mechanism in the backbone design for object detection. At the macro level, we propose Recursive Feature Pyramid, which incorporates extra feedback connections from Feature Pyramid Networks into the bottom-up backbone layers. At the micro level, we propose Switchable Atrous Convolution, which convolves the features with different atrous rates and gathers the results using switch functions. Combining them results in DetectoRS, which significantly improves the performances of object detection. On COCO test-dev, DetectoRS achieves state-of-the-art 55.7% box AP for object detection, 48.5% mask AP for instance segmentation, and 50.0% PQ for panoptic segmentation. The code is made publicly available.
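A simplified sketch of Switchable Atrous Convolution as described above: the same weights are applied with two atrous rates and blended by a learned per-pixel switch. The actual SAC also adds a learned weight difference for the large rate and global context modules, both omitted here; names are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableAtrousConv(nn.Module):
    """y = S * conv(x, rate=1) + (1 - S) * conv(x, rate=3), where the
    shared 3x3 weights are applied at two dilations and S is a learned
    per-pixel switch in [0, 1]."""
    def __init__(self, ch):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(ch, ch, 3, 3) * 0.02)
        self.switch = nn.Sequential(
            nn.AvgPool2d(5, stride=1, padding=2),
            nn.Conv2d(ch, 1, 1), nn.Sigmoid())

    def forward(self, x):
        s = self.switch(x)
        y1 = F.conv2d(x, self.weight, padding=1, dilation=1)  # small receptive field
        y3 = F.conv2d(x, self.weight, padding=3, dilation=3)  # large receptive field
        return s * y1 + (1 - s) * y3

print(SwitchableAtrousConv(16)(torch.randn(1, 16, 32, 32)).shape)  # (1, 16, 32, 32)
```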

Journal ArticleDOI
TL;DR: This paper aims to introduce a deep learning technique based on the combination of a convolutional neural network (CNN) and long short-term memory (LSTM) to diagnose COVID-19 automatically from X-ray images, which achieved the desired results on the currently available dataset.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work proposes an effective Relation-Aware Global Attention (RGA) module which captures the global structural information for better attention learning and proposes to stack the relations, i.e., its pairwise correlations/affinities with all the feature positions together to learn the attention with a shallow convolutional model.
Abstract: For person re-identification (re-id), attention mechanisms have become attractive as they aim at strengthening discriminative features and suppressing irrelevant ones, which matches well the key of re-id, i.e., discriminative feature learning. Previous approaches typically learn attention using local convolutions, ignoring the mining of knowledge from global structure patterns. Intuitively, the affinities among spatial positions/nodes in the feature map provide clustering-like information and are helpful for inferring semantics and thus attention, especially for person images where the feasible human poses are constrained. In this work, we propose an effective Relation-Aware Global Attention (RGA) module which captures the global structural information for better attention learning. Specifically, for each feature position, in order to compactly grasp the structural information of global scope and local appearance information, we propose to stack the relations, i.e., its pairwise correlations/affinities with all the feature positions (e.g., in raster scan order), and the feature itself together to learn the attention with a shallow convolutional model. Extensive ablation studies demonstrate that our RGA can significantly enhance the feature representation power and help achieve the state-of-the-art performance on several popular benchmarks. The source code is available at https://github.com/microsoft/Relation-Aware-Global-Attention-Networks.
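A condensed PyTorch sketch of the spatial variant of the idea above: pairwise affinities between all positions are stacked (in both directions) together with an embedded feature, and a shallow convolution turns this into a spatial attention map. The real RGA embeds and reduces the relation vectors before scoring; names and sizes here are ours.

```python
import torch
import torch.nn as nn

class RelationAwareGlobalAttention(nn.Module):
    """For each position, stack its affinities with all positions (both
    directions) plus a local feature embedding, then score with a 1x1 conv."""
    def __init__(self, ch, hw, reduction=8):
        super().__init__()
        self.embed = nn.Conv2d(ch, ch // reduction, 1)
        self.theta = nn.Conv2d(ch, ch // reduction, 1)
        self.phi = nn.Conv2d(ch, ch // reduction, 1)
        self.score = nn.Sequential(
            nn.Conv2d(2 * hw + ch // reduction, 1, 1), nn.Sigmoid())

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w
        t = self.theta(x).flatten(2)                      # (b, c', n)
        p = self.phi(x).flatten(2)                        # (b, c', n)
        rel = torch.bmm(t.transpose(1, 2), p)             # (b, n, n) affinities
        rel_in = rel.reshape(b, n, h, w)                  # relations in one direction
        rel_out = rel.transpose(1, 2).reshape(b, n, h, w) # and the other
        att = self.score(torch.cat([rel_in, rel_out, self.embed(x)], dim=1))
        return x * att

print(RelationAwareGlobalAttention(64, hw=16 * 8)(torch.randn(2, 64, 16, 8)).shape)
```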

Journal ArticleDOI
03 Apr 2020
TL;DR: The F3Net is able to segment salient object regions accurately and provide clear local details and outperforms state-of-the-art approaches on six evaluation metrics.
Abstract: Most existing salient object detection models have achieved great progress by aggregating multi-level features extracted from convolutional neural networks. However, because of the different receptive fields of different convolutional layers, there exist big differences between features generated by these layers. Common feature fusion strategies (addition or concatenation) ignore these differences and may cause suboptimal solutions. In this paper, we propose the F3Net to solve the above problem, which mainly consists of a cross feature module (CFM) and a cascaded feedback decoder (CFD) trained by minimizing a new pixel position aware loss (PPA). Specifically, CFM aims to selectively aggregate multi-level features. Different from addition and concatenation, CFM adaptively selects complementary components from input features before fusion, which can effectively avoid introducing too much redundant information that may destroy the original features. Besides, CFD adopts a multi-stage feedback mechanism, where features close to supervision are introduced to the output of previous layers to supplement them and eliminate the differences between features. These refined features go through multiple similar iterations before generating the final saliency maps. Furthermore, different from binary cross entropy, the proposed PPA loss doesn't treat pixels equally; it synthesizes the local structure information of a pixel to guide the network to focus more on local details. Hard pixels from boundaries or error-prone parts are given more attention to emphasize their importance. F3Net is able to segment salient object regions accurately and provide clear local details. Comprehensive experiments on five benchmark datasets demonstrate that F3Net outperforms state-of-the-art approaches on six evaluation metrics. Code will be released at https://github.com/weijun88/F3Net.
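The pixel position aware loss described above is commonly implemented as weighted BCE plus weighted IoU, with per-pixel weights that grow where a pixel differs from its local neighborhood (boundaries and error-prone regions); a sketch following the public implementation's recipe, with names of our choosing:

```python
import torch
import torch.nn.functional as F

def pixel_position_aware_loss(pred, mask):
    """Weighted BCE + weighted IoU. pred: logits (b, 1, h, w);
    mask: binary target (b, 1, h, w). The weight of each pixel grows
    with its deviation from the local average of the ground truth."""
    weit = 1 + 5 * torch.abs(
        F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    pred = torch.sigmoid(pred)
    inter = ((pred * mask) * weit).sum(dim=(2, 3))
    union = ((pred + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

loss = pixel_position_aware_loss(torch.randn(2, 1, 64, 64),
                                 torch.randint(0, 2, (2, 1, 64, 64)).float())
print(loss.item())
```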

Journal ArticleDOI
TL;DR: A new generator and discriminator for a Generative Adversarial Network (GAN) are designed in this paper to generate more discriminant fault samples using a global optimization scheme, solving the problem of unbalanced fault samples.
Abstract: Deep learning can be applied to the field of fault diagnosis for its powerful feature representation capabilities. When the available samples of a certain fault class are very limited, the data are inevitably unbalanced. The fault features extracted from unbalanced data via deep learning are inaccurate, which can lead to a high misclassification rate. To solve this problem, a new generator and discriminator of a Generative Adversarial Network (GAN) are designed in this paper to generate more discriminant fault samples using a global optimization scheme. The generator is designed to generate the fault features extracted from a few fault samples via an Auto Encoder (AE), rather than raw fault data samples. The training of the generator is guided by the fault features and the fault diagnosis error instead of the statistical coincidence of a traditional GAN. The discriminator is designed to filter out unqualified generated samples, in the sense that qualified samples are helpful for more accurate fault diagnosis. Experimental results on rolling bearings verify the effectiveness of the proposed algorithm.

Posted Content
TL;DR: Quantitative and qualitative evaluations on five challenging datasets across six metrics show that PraNet improves the segmentation accuracy significantly and presents a number of advantages in terms of generalizability and real-time segmentation efficiency.
Abstract: Colonoscopy is an effective technique for detecting colorectal polyps, which are highly related to colorectal cancer. In clinical practice, segmenting polyps from colonoscopy images is of great importance since it provides valuable information for diagnosis and surgery. However, accurate polyp segmentation is a challenging task, for two major reasons: (i) the same type of polyps has a diversity of size, color and texture; and (ii) the boundary between a polyp and its surrounding mucosa is not sharp. To address these challenges, we propose a parallel reverse attention network (PraNet) for accurate polyp segmentation in colonoscopy images. Specifically, we first aggregate the features in high-level layers using a parallel partial decoder (PPD). Based on the combined feature, we then generate a global map as the initial guidance area for the following components. In addition, we mine the boundary cues using a reverse attention (RA) module, which is able to establish the relationship between areas and boundary cues. Thanks to the recurrent cooperation mechanism between areas and boundaries, our PraNet is capable of calibrating any misaligned predictions, improving the segmentation accuracy. Quantitative and qualitative evaluations on five challenging datasets across six metrics show that our PraNet improves the segmentation accuracy significantly and presents a number of advantages in terms of generalizability and real-time segmentation efficiency.

Journal ArticleDOI
03 Apr 2020
TL;DR: This work proposes a novel network named GCPANet to effectively integrate low-level appearance features, high-level semantic features, and global context features through progressive context-aware Feature Interweaved Aggregation (FIA) modules and generate the saliency map in a supervised way.
Abstract: Deep convolutional neural networks have achieved competitive performance in salient object detection, in which how to learn effective and comprehensive features plays a critical role. Most of the previous works mainly adopted multiple-level feature integration yet ignored the gap between different features. Besides, there also exists a dilution process of high-level features as they passed on the top-down pathway. To remedy these issues, we propose a novel network named GCPANet to effectively integrate low-level appearance features, high-level semantic features, and global context features through progressive context-aware Feature Interweaved Aggregation (FIA) modules and generate the saliency map in a supervised way. Moreover, a Head Attention (HA) module is used to reduce information redundancy and enhance the top-layer features by leveraging the spatial and channel-wise attention, and the Self Refinement (SR) module is utilized to further refine and heighten the input features. Furthermore, we design the Global Context Flow (GCF) module to generate the global context information at different stages, which aims to learn the relationship among different salient regions and alleviate the dilution effect of high-level features. Experimental results on six benchmark datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.

Posted Content
TL;DR: Pixel-BERT, which aligns semantic connections at the pixel and text level, solves the limitation of task-specific visual representations for vision and language tasks, relieves the cost of bounding box annotations, and overcomes the imbalance between semantic labels in vision tasks and language semantics.
Abstract: We propose Pixel-BERT to align image pixels with text by deep multi-modal transformers that jointly learn visual and language embedding in a unified end-to-end framework. We aim to build a more accurate and thorough connection between image pixels and language semantics directly from image and sentence pairs instead of using region-based image features as in most recent vision and language tasks. Our Pixel-BERT, which aligns semantic connections at the pixel and text level, solves the limitation of task-specific visual representations for vision and language tasks. It also relieves the cost of bounding box annotations and overcomes the imbalance between semantic labels in vision tasks and language semantics. To provide a better representation for downstream tasks, we pre-train a universal end-to-end model with image and sentence pairs from the Visual Genome dataset and the MS-COCO dataset. We propose to use a random pixel sampling mechanism to enhance the robustness of visual representation and to apply the Masked Language Model and Image-Text Matching as pre-training tasks. Extensive experiments on downstream tasks with our pre-trained model show that our approach achieves state-of-the-art results in downstream tasks, including Visual Question Answering (VQA), image-text retrieval, and Natural Language for Visual Reasoning for Real (NLVR). Particularly, we boost the performance of a single model in the VQA task by 2.17 points compared with SOTA under fair comparison.

Book ChapterDOI
04 Oct 2020
TL;DR: Fan et al. as mentioned in this paper proposed a parallel reverse attention network (PraNet) for accurate polyp segmentation in colonoscopy images, which first aggregates the features in high-level layers using a parallel partial decoder (PPD) and then generates a global map as the initial guidance area for the following components.
Abstract: Colonoscopy is an effective technique for detecting colorectal polyps, which are highly related to colorectal cancer. In clinical practice, segmenting polyps from colonoscopy images is of great importance since it provides valuable information for diagnosis and surgery. However, accurate polyp segmentation is a challenging task, for two major reasons: (i) the same type of polyps has a diversity of size, color and texture; and (ii) the boundary between a polyp and its surrounding mucosa is not sharp. To address these challenges, we propose a parallel reverse attention network (PraNet) for accurate polyp segmentation in colonoscopy images. Specifically, we first aggregate the features in high-level layers using a parallel partial decoder (PPD). Based on the combined feature, we then generate a global map as the initial guidance area for the following components. In addition, we mine the boundary cues using the reverse attention (RA) module, which is able to establish the relationship between areas and boundary cues. Thanks to the recurrent cooperation mechanism between areas and boundaries, our PraNet is capable of calibrating some misaligned predictions, improving the segmentation accuracy. Quantitative and qualitative evaluations on five challenging datasets across six metrics show that our PraNet improves the segmentation accuracy significantly and presents a number of advantages in terms of generalizability and real-time segmentation efficiency (~50 fps).
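The reverse attention step described above can be sketched compactly: the sigmoid of the current coarse prediction is inverted to erase already-detected regions, so the side branch focuses on boundaries and missed parts; module names here are ours.

```python
import torch
import torch.nn as nn

class ReverseAttention(nn.Module):
    """Erase the currently predicted region (1 - sigmoid of the coarse map),
    refine the remaining cues, and add them back as a residual."""
    def __init__(self, ch):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, feats, coarse_map):
        att = 1 - torch.sigmoid(coarse_map)       # attend to non-object regions
        residual = self.refine(feats * att)       # boundary / missed-region cues
        return coarse_map + residual              # refined prediction

out = ReverseAttention(32)(torch.randn(1, 32, 44, 44), torch.randn(1, 1, 44, 44))
print(out.shape)  # torch.Size([1, 1, 44, 44])
```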

Journal ArticleDOI
07 Jan 2020-Sensors
TL;DR: This survey reviews some well-known techniques for each approach and gives a taxonomy of their categories; a solid discussion is provided about future directions in terms of techniques to be used for face recognition.
Abstract: Over the past few decades, interest in theories and algorithms for face recognition has been growing rapidly. Video surveillance, criminal identification, building access control, and unmanned and autonomous vehicles are just a few examples of concrete applications that are gaining traction among industries. Various techniques are being developed, including local, holistic, and hybrid approaches, which describe a face image using either a few face image features or the whole set of facial features. The main contribution of this survey is to review some well-known techniques for each approach and to give a taxonomy of their categories. In the paper, a detailed comparison between these techniques is presented by listing the advantages and the disadvantages of their schemes in terms of robustness, accuracy, complexity, and discrimination. One interesting aspect covered in the paper is the databases used for face recognition. An overview of the most commonly used databases, including those of supervised and unsupervised learning, is given. Numerical results of the most interesting techniques are given along with the context of experiments and challenges handled by these techniques. Finally, a solid discussion is given in the paper about future directions in terms of techniques to be used for face recognition.

Journal ArticleDOI
TL;DR: A cheap, fast, and reliable intelligence tool has been provided for COVID-19 infection detection, and the developed model can be used to assist field specialists, physicians, and radiologists in the decision-making process.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a novel unsupervised domain adaptation framework, named synergistic image and feature alignment (SIFA), to effectively adapt a segmentation network to an unlabeled target domain.
Abstract: Unsupervised domain adaptation has increasingly gained interest in medical image computing, aiming to tackle the performance degradation of deep neural networks when being deployed to unseen data with heterogeneous characteristics. In this work, we present a novel unsupervised domain adaptation framework, named Synergistic Image and Feature Alignment (SIFA), to effectively adapt a segmentation network to an unlabeled target domain. Our proposed SIFA conducts synergistic alignment of domains from both image and feature perspectives. In particular, we simultaneously transform the appearance of images across domains and enhance domain-invariance of the extracted features by leveraging adversarial learning in multiple aspects and with a deeply supervised mechanism. The feature encoder is shared between both adaptive perspectives to leverage their mutual benefits via end-to-end learning. We have extensively evaluated our method with cardiac substructure segmentation and abdominal multi-organ segmentation for bidirectional cross-modality adaptation between MRI and CT images. Experimental results on two different tasks demonstrate that our SIFA method is effective in improving segmentation performance on unlabeled target images, and outperforms the state-of-the-art domain adaptation approaches by a large margin.

Posted ContentDOI
20 Jun 2020-medRxiv
TL;DR: This paper aims to introduce a deep learning technique based on the combination of a convolutional neural network (CNN) and long short-term memory (LSTM) to diagnose COVID-19 automatically from X-ray images.
Abstract: Nowadays, automatic disease detection has become a crucial issue in medical science with the rapid growth of population. Coronavirus (COVID-19) has become one of the most severe and acute diseases in very recent times and has spread globally. An automatic disease detection framework assists doctors in the diagnosis of disease, provides exact, consistent, and fast results, and reduces the death rate. Therefore, an automated detection system should be implemented as the fastest diagnostic option to impede the spread of COVID-19. This paper aims to introduce a deep learning technique based on the combination of a convolutional neural network (CNN) and long short-term memory (LSTM) to diagnose COVID-19 automatically from X-ray images. In this system, the CNN is used for deep feature extraction and the LSTM is used for detection using the extracted features. A collection of 421 X-ray images, including 141 images of COVID-19, is used as the dataset in this system. The experimental results show that our proposed system achieves 97% accuracy, 91% specificity, and 100% sensitivity. The system achieved the desired results on a small dataset, which can be further improved when more COVID-19 images become available. The proposed system can assist doctors in diagnosing and treating COVID-19 patients easily.
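A minimal PyTorch sketch of the CNN+LSTM pipeline described above: a small CNN extracts deep features, the feature map is read row by row as a sequence by an LSTM, and the final hidden state is classified. All layer sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    """CNN for deep feature extraction, LSTM for detection from the
    extracted features, as in the pipeline described above."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.lstm = nn.LSTM(input_size=32 * 56, hidden_size=128, batch_first=True)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):                       # x: (b, 1, 224, 224) X-ray
        f = self.cnn(x)                         # (b, 32, 56, 56)
        seq = f.permute(0, 2, 1, 3).flatten(2)  # rows as timesteps: (b, 56, 32*56)
        _, (h, _) = self.lstm(seq)
        return self.fc(h[-1])                   # classify from last hidden state

print(CNNLSTMClassifier()(torch.randn(2, 1, 224, 224)).shape)  # torch.Size([2, 2])
```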