
Showing papers on "Feature extraction published in 2021"


Proceedings ArticleDOI
23 Aug 2021
TL;DR: Liang et al. propose SwinIR, a strong baseline model for image restoration based on the Swin Transformer, consisting of three parts: shallow feature extraction, deep feature extraction, and high-quality image reconstruction.
Abstract: Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14∼0.45 dB, while the total number of parameters can be reduced by up to 67%.
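To make the three-stage layout concrete (shallow convolutional feature extraction, a stack of residual transformer blocks, image reconstruction), here is a minimal PyTorch sketch. It substitutes plain nn.TransformerEncoderLayer blocks for the window-based Swin layers, so it only illustrates the residual wiring described in the abstract, not the actual SwinIR implementation; all layer names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class ResidualTransformerBlock(nn.Module):
    """Stand-in for an RSTB: a few transformer layers plus a residual connection.
    (Plain global self-attention here, not windowed Swin attention.)"""
    def __init__(self, dim=60, depth=2, heads=6):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
             for _ in range(depth)])
        self.conv = nn.Conv2d(dim, dim, 3, padding=1)   # conv at the end of each block

    def forward(self, x):                               # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)           # (B, H*W, C)
        for layer in self.layers:
            tokens = layer(tokens)
        out = tokens.transpose(1, 2).reshape(b, c, h, w)
        return x + self.conv(out)                       # residual connection

class TinySwinIRLike(nn.Module):
    def __init__(self, dim=60, num_blocks=4):
        super().__init__()
        self.shallow = nn.Conv2d(3, dim, 3, padding=1)        # shallow feature extraction
        self.deep = nn.Sequential(*[ResidualTransformerBlock(dim) for _ in range(num_blocks)])
        self.reconstruct = nn.Conv2d(dim, 3, 3, padding=1)    # HQ image reconstruction

    def forward(self, lq):
        shallow = self.shallow(lq)
        deep = self.deep(shallow)
        return self.reconstruct(shallow + deep)               # global residual

x = torch.randn(1, 3, 48, 48)
print(TinySwinIRLike()(x).shape)   # torch.Size([1, 3, 48, 48])
```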

1,064 citations


Journal ArticleDOI
TL;DR: An extensive review of deep learning-based self-supervised general visual feature learning methods for images and videos, which, as a subset of unsupervised learning, learn general image and video features from large-scale unlabeled data without using any human-annotated labels.
Abstract: Large-scale labeled data are generally required to train deep neural networks in order to obtain better performance in visual feature learning from images or videos for computer vision applications. To avoid the extensive cost of collecting and annotating large-scale datasets, self-supervised learning methods, a subset of unsupervised learning methods, are proposed to learn general image and video features from large-scale unlabeled data without using any human-annotated labels. This paper provides an extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos. First, the motivation, general pipeline, and terminologies of this field are described. Then the common deep neural network architectures that are used for self-supervised learning are summarized. Next, the schema and evaluation metrics of self-supervised learning methods are reviewed, followed by the commonly used datasets for images, videos, audio, and 3D data, as well as the existing self-supervised visual feature learning methods. Finally, quantitative performance comparisons of the reviewed methods on benchmark datasets are summarized and discussed for both image and video feature learning. The paper concludes with a set of promising future directions for self-supervised visual feature learning.

876 citations


Journal ArticleDOI
TL;DR: GOT-10k, a large tracking database that offers unprecedentedly wide coverage of common moving objects in the wild, is the first video trajectory dataset to use the semantic hierarchy of WordNet to guide class population, ensuring comprehensive and relatively unbiased coverage of diverse moving objects.
Abstract: We introduce here a large tracking database that offers an unprecedentedly wide coverage of common moving objects in the wild, called GOT-10k. Specifically, GOT-10k is built upon the backbone of the WordNet structure [1] and it populates the majority of over 560 classes of moving objects and 87 motion patterns, magnitudes wider than the most recent similar-scale counterparts [19], [20], [23], [26]. By releasing this large high-diversity database, we aim to provide a unified training and evaluation platform for the development of class-agnostic, generic-purposed short-term trackers. The features of GOT-10k and the contributions of this article are summarized in the following. (1) GOT-10k offers over 10,000 video segments with more than 1.5 million manually labeled bounding boxes, enabling unified training and stable evaluation of deep trackers. (2) GOT-10k is by far the first video trajectory dataset that uses the semantic hierarchy of WordNet to guide class population, which ensures a comprehensive and relatively unbiased coverage of diverse moving objects. (3) For the first time, GOT-10k introduces the one-shot protocol for tracker evaluation, where the training and test classes are zero-overlapped. The protocol avoids biased evaluation results towards familiar objects and it promotes generalization in tracker development. (4) GOT-10k offers additional labels such as motion classes and object visible ratios, facilitating the development of motion-aware and occlusion-aware trackers. (5) We conduct extensive tracking experiments with 39 typical tracking algorithms and their variants on GOT-10k and analyze their results in this paper. (6) Finally, we develop a comprehensive platform for the tracking community that offers full-featured evaluation toolkits, an online evaluation server, and a responsive leaderboard. The annotations of GOT-10k’s test data are kept private to avoid tuning parameters on it.
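The one-shot protocol in point (3) requires the training and test object classes to be disjoint. A tiny, hypothetical sketch of how one could verify such a zero-overlap split for a custom dataset (the class names are made up):

```python
# Hypothetical check that a train/test split follows a GOT-10k-style one-shot
# protocol, i.e. the two sets of object classes are zero-overlapped.
train_classes = {"bicycle", "zebra", "drone", "sailboat"}   # made-up example classes
test_classes = {"hovercraft", "armadillo", "kayak"}

overlap = train_classes & test_classes
assert not overlap, f"one-shot protocol violated, shared classes: {sorted(overlap)}"
print("zero-overlap split: OK")
```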

852 citations


Proceedings ArticleDOI
20 Jun 2021
TL;DR: Zhang et al. present a detailed study of improving visual representations for vision-language (VL) tasks and develop an improved object detection model to provide object-centric representations of images.
Abstract: This paper presents a detailed study of improving visual representations for vision language (VL) tasks and develops an improved object detection model to provide object-centric representations of images. Compared to the most widely used bottom-up and top-down model [2], the new model is bigger, better-designed for VL tasks, and pre-trained on much larger training corpora that combine multiple public annotated object detection datasets. Therefore, it can generate representations of a richer collection of visual objects and concepts. While previous VL research focuses mainly on improving the vision-language fusion model and leaves the object detection model improvement untouched, we show that visual features matter significantly in VL models. In our experiments we feed the visual features generated by the new object detection model into a Transformer-based VL fusion model OSCAR [20], and utilize an improved approach OSCAR+ to pre-train the VL model and fine-tune it on a wide range of downstream VL tasks. Our results show that the new visual features significantly improve the performance across all VL tasks, creating new state-of-the-art results on seven public benchmarks. Code, models and pre-extracted features are released at https://github.com/pzzhang/VinVL.

543 citations


Journal ArticleDOI
TL;DR: A hybrid model using deep and classical machine learning for face mask detection is presented; the SVM classifier achieved 99.64% testing accuracy on RMFD.

540 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: Chen et al. propose TransT, an attention-based feature fusion network that combines the template and search-region features solely using attention, achieving very promising results on six challenging datasets, especially the large-scale LaSOT, TrackingNet, and GOT-10k benchmarks.
Abstract: Correlation plays a critical role in the tracking field, especially in recent popular Siamese-based trackers. The correlation operation is a simple fusion manner to consider the similarity between the template and the search region. However, the correlation operation itself is a local linear matching process, leading to the loss of semantic information and making it easy to fall into a local optimum, which may be the bottleneck of designing high-accuracy tracking algorithms. Is there any better feature fusion method than correlation? To address this issue, inspired by Transformer, this work presents a novel attention-based feature fusion network, which effectively combines the template and search region features solely using attention. Specifically, the proposed method includes an ego-context augment module based on self-attention and a cross-feature augment module based on cross-attention. Finally, we present a Transformer tracking (named TransT) method based on the Siamese-like feature extraction backbone, the designed attention-based fusion mechanism, and the classification and regression head. Experiments show that our TransT achieves very promising results on six challenging datasets, especially on the large-scale LaSOT, TrackingNet, and GOT-10k benchmarks. Our tracker runs at approximately 50 fps on GPU. Code and models are available at https://github.com/chenxin-dlut/TransT.
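A rough sketch of the fusion idea described above, using nn.MultiheadAttention for one self-attention ("ego-context augment") step followed by one cross-attention ("cross-feature augment") step; dimensions and module names are illustrative and this is not the released TransT code.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Illustrative template/search fusion with self- and cross-attention."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, search, template):           # (B, Ns, C), (B, Nt, C)
        # ego-context augment: search tokens attend to themselves
        s, _ = self.self_attn(search, search, search)
        search = self.norm1(search + s)
        # cross-feature augment: search tokens attend to template tokens
        c, _ = self.cross_attn(search, template, template)
        return self.norm2(search + c)

fusion = AttentionFusion()
search = torch.randn(2, 32 * 32, 256)      # flattened search-region features
template = torch.randn(2, 16 * 16, 256)    # flattened template features
print(fusion(search, template).shape)      # torch.Size([2, 1024, 256])
```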

528 citations


Journal ArticleDOI
TL;DR: Zhang et al. propose a residual dense block (RDB) to extract abundant local features via densely connected convolutional layers; the RDB further allows direct connections from the state of the preceding RDB to all layers of the current RDB, leading to a contiguous memory mechanism.
Abstract: Recently, deep convolutional neural network (CNN) has achieved great success for image restoration (IR) and provided hierarchical features at the same time. However, most deep CNN based IR models do not make full use of the hierarchical features from the original low-quality images, thereby resulting in relatively low performance. In this work, we propose a novel and efficient residual dense network (RDN) to address this problem in IR, by making a better tradeoff between efficiency and effectiveness in exploiting the hierarchical features from all the convolutional layers. Specifically, we propose the residual dense block (RDB) to extract abundant local features via densely connected convolutional layers. RDB further allows direct connections from the state of the preceding RDB to all the layers of the current RDB, leading to a contiguous memory mechanism. To adaptively learn more effective features from preceding and current local features and stabilize the training of the wider network, we propose local feature fusion in RDB. After fully obtaining dense local features, we use global feature fusion to jointly and adaptively learn global hierarchical features in a holistic way. We demonstrate the effectiveness of RDN with several representative IR applications: single image super-resolution, Gaussian image denoising, image compression artifact reduction, and image deblurring. Experiments on benchmark and real-world datasets show that our RDN achieves favorable performance against state-of-the-art methods for each IR task quantitatively and visually.
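A compact PyTorch sketch of a residual dense block in the spirit described above: densely connected convolutions, 1×1 local feature fusion, and a local residual connection. The layer count and growth rate are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Densely connected convs + 1x1 local feature fusion + local residual."""
    def __init__(self, channels=64, growth=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels + i * growth, growth, 3, padding=1),
                nn.ReLU(inplace=True)))
        # local feature fusion: squeeze all concatenated features back to `channels`
        self.fuse = nn.Conv2d(channels + num_layers * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))   # dense connections
        return x + self.fuse(torch.cat(feats, dim=1))      # local residual learning

block = ResidualDenseBlock()
print(block(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])
```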

519 citations


Journal ArticleDOI
TL;DR: Results showed the deep approaches to be quite efficient when compared to the local texture descriptors in the detection of COVID-19 based on chest X-ray images.
Abstract: COVID-19 is a novel virus that causes infection in both the upper respiratory tract and the lungs. The numbers of cases and deaths have increased on a daily basis on the scale of a global pandemic. Chest X-ray images have proven useful for monitoring various lung diseases and have recently been used to monitor the COVID-19 disease. In this paper, deep-learning-based approaches, namely deep feature extraction, fine-tuning of pretrained convolutional neural networks (CNN), and end-to-end training of a developed CNN model, have been used in order to classify COVID-19 and normal (healthy) chest X-ray images. For deep feature extraction, pretrained deep CNN models (ResNet18, ResNet50, ResNet101, VGG16, and VGG19) were used. For classification of the deep features, the Support Vector Machines (SVM) classifier was used with various kernel functions, namely Linear, Quadratic, Cubic, and Gaussian. The aforementioned pretrained deep CNN models were also used for the fine-tuning procedure. A new CNN model is proposed in this study with end-to-end training. A dataset containing 180 COVID-19 and 200 normal (healthy) chest X-ray images was used in the study's experimentation. Classification accuracy was used as the performance measurement of the study. The experimental works reveal that deep learning shows potential in the detection of COVID-19 based on chest X-ray images. The deep features extracted from the ResNet50 model and SVM classifier with the Linear kernel function produced a 94.7% accuracy score, which was the highest among all the obtained results. The achievement of the fine-tuned ResNet50 model was found to be 92.6%, whilst end-to-end training of the developed CNN model produced a 91.6% result. Various local texture descriptors and SVM classifications were also used for performance comparison with alternative deep approaches; the results showed the deep approaches to be quite efficient when compared to the local texture descriptors in the detection of COVID-19 based on chest X-ray images.
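The deep-feature pipeline described here (a pretrained CNN as a fixed feature extractor followed by a linear-kernel SVM) can be sketched with torchvision and scikit-learn as below. The image and label arrays are random placeholders, and the weights argument assumes torchvision >= 0.13; this is an illustration of the pipeline, not the paper's code.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import SVC

# Pretrained ResNet50 with the classification head removed -> 2048-d deep features.
# (weights="IMAGENET1K_V1" assumes torchvision >= 0.13; older versions use pretrained=True.)
backbone = models.resnet50(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()
backbone.eval()

def extract_features(images):                 # images: (N, 3, 224, 224) tensor
    with torch.no_grad():
        return backbone(images).numpy()       # (N, 2048)

# Placeholder data standing in for preprocessed chest X-ray batches and labels.
images = torch.randn(8, 3, 224, 224)
labels = [0, 1, 0, 1, 1, 0, 1, 0]

features = extract_features(images)
clf = SVC(kernel="linear").fit(features, labels)   # linear-kernel SVM classifier
print(clf.predict(features[:2]))
```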

460 citations


Journal ArticleDOI
TL;DR: An Attention-based Bidirectional CNN-RNN Deep Model (ABCDM) is proposed for sentiment polarity detection, achieving state-of-the-art results on both long-review and short-tweet polarity classification.

385 citations


Journal ArticleDOI
TL;DR: A comprehensive review of the recent developments on deep face recognition can be found in this paper, covering broad topics on algorithm designs, databases, protocols, and application scenes, as well as the technical challenges and several promising directions.

353 citations


Journal ArticleDOI
TL;DR: A convolutional autoencoder deep learning framework that supports unsupervised image feature learning for lung nodules from unlabeled data, requiring only a small amount of labeled data for efficient feature learning.
Abstract: At present, computed tomography (CT) is widely used to assist disease diagnosis. In particular, computer aided diagnosis (CAD) based on artificial intelligence (AI) has recently exhibited its importance in intelligent healthcare. However, it is a great challenge to establish an adequate labeled dataset for CT analysis assistance, due to privacy and security issues. Therefore, this paper proposes a convolutional autoencoder deep learning framework to support unsupervised image feature learning for lung nodules from unlabeled data, which only needs a small amount of labeled data for efficient feature learning. Comprehensive experiments show that the proposed scheme is superior to other approaches, which effectively solves the intrinsic labor-intensive problem of manual image labeling. Moreover, it verifies that the proposed convolutional autoencoder approach can be extended for similarity measurement of lung nodule images. In particular, the features extracted through unsupervised learning are also applicable in other related scenarios.
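A minimal convolutional autoencoder of the kind described, trained to reconstruct unlabeled CT patches so that the encoder output can later serve as an unsupervised feature vector; patch size and channel widths are arbitrary.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                  # 1x64x64 -> 32x16x16
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(                  # 32x16x16 -> 1x64x64
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvAutoencoder()
patches = torch.rand(4, 1, 64, 64)                      # unlabeled nodule patches (placeholder)
loss = nn.functional.mse_loss(model(patches), patches)  # reconstruction objective
loss.backward()
features = model.encoder(patches).flatten(1)            # unsupervised features for downstream use
print(features.shape)                                   # torch.Size([4, 8192])
```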

Proceedings ArticleDOI
17 Mar 2021
TL;DR: The authors revisit feature pyramid networks (FPN) for one-stage detectors and point out that the success of FPN is due to its divide-and-conquer solution to the optimization problem in object detection rather than multi-scale feature fusion.
Abstract: This paper revisits feature pyramid networks (FPN) for one-stage detectors and points out that the success of FPN is due to its divide-and-conquer solution to the optimization problem in object detection rather than multi-scale feature fusion. From the perspective of optimization, we introduce an alternative way to address the problem instead of adopting the complex feature pyramids - utilizing only one-level feature for detection. Based on the simple and efficient solution, we present You Only Look One-level Feature (YOLOF). In our method, two key components, Dilated Encoder and Uniform Matching, are proposed and bring considerable improvements. Extensive experiments on the COCO benchmark prove the effectiveness of the proposed model. Our YOLOF achieves comparable results with its feature pyramids counterpart RetinaNet while being 2.5× faster. Without transformer layers, YOLOF can match the performance of DETR in a single-level feature manner with 7× fewer training epochs. Code is available at https://github.com/megvii-model/YOLOF.
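The Dilated Encoder idea (enlarging the receptive field of a single C5 feature map with stacked dilated residual blocks) can be sketched roughly as follows; the channel sizes and dilation rates are illustrative rather than the released YOLOF configuration.

```python
import torch
import torch.nn as nn

class DilatedBottleneck(nn.Module):
    """Residual bottleneck whose 3x3 conv uses a given dilation rate."""
    def __init__(self, channels=512, mid=128, dilation=2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        return x + self.block(x)

class DilatedEncoderSketch(nn.Module):
    """Project C5, then stack dilated residual blocks to grow the receptive field."""
    def __init__(self, in_channels=2048, channels=512, dilations=(2, 4, 6, 8)):
        super().__init__()
        self.project = nn.Sequential(
            nn.Conv2d(in_channels, channels, 1),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.blocks = nn.Sequential(*[DilatedBottleneck(channels, dilation=d) for d in dilations])

    def forward(self, c5):
        return self.blocks(self.project(c5))

c5 = torch.randn(1, 2048, 25, 25)            # single-level backbone feature
print(DilatedEncoderSketch()(c5).shape)      # torch.Size([1, 512, 25, 25])
```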

Journal ArticleDOI
TL;DR: The weighted double-margin contrastive loss is proposed to address the imbalanced-sample problem in change detection, i.e., unchanged samples are much more abundant than changed samples, which is one of the main reasons for pseudochanges.
Abstract: Change detection is a basic task of remote sensing image processing. The research objective is to identify the change information of interest and filter out the irrelevant change information as interference factors. Recently, the rise of deep learning has provided new tools for change detection, which have yielded impressive results. However, the available methods focus mainly on the difference information between multitemporal remote sensing images and lack robustness to pseudochange information. To overcome the lack of robustness of current methods to pseudochanges, in this article, we propose a new method, namely, dual attentive fully convolutional Siamese networks, for change detection in high-resolution images. Through the dual attention mechanism, long-range dependencies are captured to obtain more discriminant feature representations and enhance the recognition performance of the model. Moreover, imbalanced samples are a serious problem in change detection, i.e., unchanged samples are much more abundant than changed samples, which is one of the main reasons for pseudochanges. We propose the weighted double-margin contrastive loss to address this problem by punishing attention to unchanged feature pairs and increasing attention to changed feature pairs. The experimental results of our method on the change detection dataset and the building change detection dataset demonstrate that, compared with other baseline methods, the proposed method realizes maximum improvements of 2.9% and 4.2%, respectively, in the F1 score. Our PyTorch implementation is available at https://github.com/lehaifeng/DASNet.
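One plausible reading of the weighted double-margin contrastive loss is sketched below: unchanged pairs are penalized only beyond a small margin, changed pairs only below a large margin, and per-class weights counter the imbalance. The margins and weights are placeholder values, not the ones used in DASNet.

```python
import torch

def weighted_double_margin_contrastive(dist, label, m1=0.3, m2=2.2,
                                        w_unchanged=0.1, w_changed=1.0):
    """Illustrative double-margin contrastive loss on per-pixel feature distances.

    dist:  (N,) Euclidean distances between bitemporal feature pairs
    label: (N,) 0 = unchanged pair, 1 = changed pair
    Unchanged pairs are only penalised when their distance exceeds m1,
    changed pairs only when their distance falls below m2; the two class
    weights down-weight the (far more numerous) unchanged pairs.
    """
    unchanged = (1 - label) * torch.clamp(dist - m1, min=0.0) ** 2
    changed = label * torch.clamp(m2 - dist, min=0.0) ** 2
    return (w_unchanged * unchanged + w_changed * changed).mean()

dist = torch.rand(10) * 3
label = torch.randint(0, 2, (10,)).float()
print(weighted_double_margin_contrastive(dist, label))
```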

Journal ArticleDOI
TL;DR: Compared to other state-of-the-art segmentation networks, this model yields better segmentation performance, increasing the accuracy of the predictions while reducing the standard deviation, which demonstrates the efficiency of the approach to generate precise and reliable automatic segmentations of medical images.
Abstract: Even though convolutional neural networks (CNNs) are driving progress in medical image segmentation, standard models still have some drawbacks. First, the use of multi-scale approaches, i.e., encoder-decoder architectures, leads to a redundant use of information, where similar low-level features are extracted multiple times at multiple scales. Second, long-range feature dependencies are not efficiently modeled, resulting in non-optimal discriminative feature representations associated with each semantic class. In this paper we attempt to overcome these limitations with the proposed architecture, by capturing richer contextual dependencies based on the use of guided self-attention mechanisms. This approach is able to integrate local features with their corresponding global dependencies, as well as highlight interdependent channel maps in an adaptive manner. Further, the additional loss between different modules guides the attention mechanisms to neglect irrelevant information and focus on more discriminant regions of the image by emphasizing relevant feature associations. We evaluate the proposed model in the context of semantic segmentation on three different datasets: abdominal organs, cardiovascular structures and brain tumors. A series of ablation experiments support the importance of these attention modules in the proposed architecture. In addition, compared to other state-of-the-art segmentation networks our model yields better segmentation performance, increasing the accuracy of the predictions while reducing the standard deviation. This demonstrates the efficiency of our approach to generate precise and reliable automatic segmentations of medical images. Our code is made publicly available at: https://github.com/sinAshish/Multi-Scale-Attention .

Journal ArticleDOI
TL;DR: The objective of this paper is to annotate and localize medical face mask objects in real-life images to improve the object detection process; it is concluded that the Adam optimizer achieved the highest average precision of 81% as a detector.

Journal ArticleDOI
TL;DR: In this study, the EfficientNet deep learning architecture was proposed for plant leaf disease classification, and the performance of this model was compared with other state-of-the-art deep learning models.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed SNUNet-CD method improves greatly on many evaluation criteria and has a better tradeoff between accuracy and calculation amount than other state-of-the-art (SOTA) change detection methods.
Abstract: Change detection is an important task in remote sensing (RS) image analysis. It is widely used in natural disaster monitoring and assessment, land resource planning, and other fields. As a pixel-to-pixel prediction task, change detection is sensitive to the utilization of the original position information. Recent change detection methods always focus on the extraction of deep change semantic features but ignore the importance of shallow-layer information containing high-resolution and fine-grained features; this often leads to uncertainty for the pixels at the edge of the changed target and to missed detection of small targets. In this letter, we propose a densely connected Siamese network for change detection, namely SNUNet-CD (the combination of Siamese network and NestedUNet). SNUNet-CD alleviates the loss of localization information in the deep layers of the neural network through compact information transmission between encoder and decoder, and between decoder and decoder. In addition, an Ensemble Channel Attention Module (ECAM) is proposed for deep supervision. Through ECAM, the most representative features of different semantic levels can be refined and used for the final classification. Experimental results show that our method improves greatly on many evaluation criteria and has a better tradeoff between accuracy and calculation amount than other state-of-the-art (SOTA) change detection methods.
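One rough reading of ECAM, sketched below, is a channel-attention gate applied to the concatenated decoder outputs of the different semantic levels before the final classifier; the module below is a generic SE-style gate for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: global pooling -> MLP -> per-channel gates."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                               # (B, C, H, W)
        gate = self.mlp(x.mean(dim=(2, 3)))             # (B, C)
        return x * gate[:, :, None, None]

# Concatenate the decoder outputs from several semantic levels and reweight
# their channels before the final classifier (a rough reading of ECAM).
levels = [torch.randn(1, 32, 64, 64) for _ in range(4)]
stacked = torch.cat(levels, dim=1)                      # (1, 128, 64, 64)
attended = ChannelAttention(128)(stacked)
classifier = nn.Conv2d(128, 2, kernel_size=1)           # change / no-change map
print(classifier(attended).shape)                       # torch.Size([1, 2, 64, 64])
```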

Journal ArticleDOI
TL;DR: This work proposes an approach for semi-supervised semantic segmentation that learns from limited pixel-wise annotated samples while exploiting additional annotation-free images, and achieves significant improvement over existing methods, especially when trained with very few labeled samples.
Abstract: The ability to understand visual information from limited labeled data is an important aspect of machine learning. While image-level classification has been extensively studied in a semi-supervised setting, dense pixel-level classification with limited data has only drawn attention recently. In this work, we propose an approach for semi-supervised semantic segmentation that learns from limited pixel-wise annotated samples while exploiting additional annotation-free images. The proposed approach relies on adversarial training with a feature matching loss to learn from unlabeled images. It uses two network branches that link semi-supervised classification with semi-supervised segmentation including self-training. The dual-branch approach reduces both the low-level and the high-level artifacts typical when training with few labels. The approach attains significant improvement over existing methods, especially when trained with very few labeled samples. On several standard benchmarks—PASCAL VOC 2012, PASCAL-Context, and Cityscapes—the approach achieves new state-of-the-art in semi-supervised learning.
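The feature matching loss mentioned above can be illustrated with a generic formulation that matches the mean intermediate discriminator features of labeled ground truth and of segmentation predictions; this is a common form of the loss, not necessarily the paper's exact definition.

```python
import torch
import torch.nn as nn

def feature_matching_loss(disc_features_real, disc_features_fake):
    """Match the mean intermediate discriminator features of real and generated inputs.
    A generic formulation often used in adversarial semi-supervised training."""
    return nn.functional.l1_loss(disc_features_fake.mean(dim=0),
                                 disc_features_real.mean(dim=0).detach())

real_feats = torch.randn(8, 256)   # discriminator features for labeled ground truth
fake_feats = torch.randn(8, 256)   # discriminator features for segmentation predictions
print(feature_matching_loss(real_feats, fake_feats))
```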

Proceedings ArticleDOI
27 Apr 2021
TL;DR: A novel and efficient structure named Short-Term Dense Concatenate network (STDC network) is proposed by removing structural redundancy: the dimensions of feature maps are gradually reduced and their aggregation is used for image representation, which forms the basic module of the STDC network.
Abstract: BiSeNet [28], [27] has proved to be a popular two-stream network for real-time segmentation. However, its principle of adding an extra path to encode spatial information is time-consuming, and the backbones borrowed from pretrained tasks, e.g., image classification, may be inefficient for image segmentation due to the deficiency of task-specific design. To handle these problems, we propose a novel and efficient structure named Short-Term Dense Concatenate network (STDC network) by removing structure redundancy. Specifically, we gradually reduce the dimension of feature maps and use their aggregation for image representation, which forms the basic module of the STDC network. In the decoder, we propose a Detail Aggregation module by integrating the learning of spatial information into low-level layers in a single-stream manner. Finally, the low-level features and deep features are fused to predict the final segmentation results. Extensive experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness of our method by achieving a promising trade-off between segmentation accuracy and inference speed. On Cityscapes, we achieve 71.9% mIoU on the test set with a speed of 250.4 FPS on an NVIDIA GTX 1080Ti, which is 45.2% faster than the latest methods, and achieve 76.8% mIoU with 97.0 FPS while inferring on higher resolution images. Code is available at https://github.com/MichaelFan01/STDC-Seg.
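A rough sketch of the basic STDC idea: each successive convolution keeps roughly half the previous channel width, and the outputs of all convolutions are concatenated as the block's representation. Channel widths and kernel choices below are illustrative.

```python
import torch
import torch.nn as nn

class STDCModuleSketch(nn.Module):
    """Each successive conv keeps half the previous width; all outputs are concatenated."""
    def __init__(self, in_channels=64, out_channels=256, num_convs=4):
        super().__init__()
        self.convs = nn.ModuleList()
        widths = []
        ch_in = in_channels
        for i in range(num_convs):
            # halve the width each step, keeping the last two equal so widths sum to out_channels
            ch_out = out_channels // (2 ** min(i + 1, num_convs - 1))
            self.convs.append(nn.Sequential(
                nn.Conv2d(ch_in, ch_out, 3 if i > 0 else 1, padding=1 if i > 0 else 0),
                nn.BatchNorm2d(ch_out), nn.ReLU(inplace=True)))
            widths.append(ch_out)
            ch_in = ch_out
        assert sum(widths) == out_channels, widths

    def forward(self, x):
        outs = []
        for conv in self.convs:
            x = conv(x)
            outs.append(x)
        return torch.cat(outs, dim=1)      # aggregation of all scales as the representation

m = STDCModuleSketch()
print(m(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 256, 32, 32])
```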

Proceedings ArticleDOI
20 Jun 2021
TL;DR: In this paper, a multi-attentional deepfake detection network is proposed, which consists of three key components: 1) multiple spatial attention heads that make the network attend to different local parts; 2) a textural feature enhancement block that zooms in on the subtle artifacts in shallow features; 3) aggregation of the low-level textural features and high-level semantic features guided by the attention maps.
Abstract: Face forgery by deepfake is widely spread over the internet and has raised severe societal concerns. Recently, how to detect such forgery contents has become a hot research topic and many deepfake detection methods have been proposed. Most of them model deepfake detection as a vanilla binary classification problem, i.e., first using a backbone network to extract a global feature and then feeding it into a binary classifier (real/fake). But since the difference between the real and fake images in this task is often subtle and local, we argue this vanilla solution is not optimal. In this paper, we instead formulate deepfake detection as a fine-grained classification problem and propose a new multi-attentional deepfake detection network. Specifically, it consists of three key components: 1) multiple spatial attention heads to make the network attend to different local parts; 2) a textural feature enhancement block to zoom in on the subtle artifacts in shallow features; 3) aggregation of the low-level textural features and high-level semantic features guided by the attention maps. Moreover, to address the learning difficulty of this network, we further introduce a new regional independence loss and an attention-guided data augmentation strategy. Through extensive experiments on different datasets, we demonstrate the superiority of our method over the vanilla binary classifier counterparts, and achieve state-of-the-art performance. The models will be released at https://github.com/yoctta/multiple-attention.

Journal ArticleDOI
TL;DR: The proposed Deep Affinity Network (DAN) learns compact, yet comprehensive features of pre-detected objects at several levels of abstraction, and performs exhaustive pairing permutations of those features in any two frames to infer object affinities.
Abstract: Multiple Object Tracking (MOT) plays an important role in solving many fundamental problems in video analysis and computer vision. Most MOT methods employ two steps: Object Detection and Data Association. The first step detects objects of interest in every frame of a video, and the second establishes correspondence between the detected objects in different frames to obtain their tracks. Object detection has made tremendous progress in the last few years due to deep learning. However, data association for tracking still relies on hand crafted constraints such as appearance, motion, spatial proximity, grouping etc. to compute affinities between the objects in different frames. In this paper, we harness the power of deep learning for data association in tracking by jointly modeling object appearances and their affinities between different frames in an end-to-end fashion. The proposed Deep Affinity Network (DAN) learns compact, yet comprehensive features of pre-detected objects at several levels of abstraction, and performs exhaustive pairing permutations of those features in any two frames to infer object affinities. DAN also accounts for multiple objects appearing and disappearing between video frames. We exploit the resulting efficient affinity computations to associate objects in the current frame deep into the previous frames for reliable on-line tracking. Our technique is evaluated on popular multiple object tracking challenges MOT15, MOT17 and UA-DETRAC. Comprehensive benchmarking under twelve evaluation metrics demonstrates that our approach is among the best performing techniques on the leader board for these challenges. The open source implementation of our work is available at https://github.com/shijieS/SST.git .
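The exhaustive pairing step can be illustrated as follows: given per-object feature vectors from two frames, every cross-frame pair is formed and scored by a small network, yielding an affinity matrix. This is a simplified stand-in for DAN's multi-level feature pairing, with made-up dimensions.

```python
import torch
import torch.nn as nn

# Illustrative affinity step: concatenate every cross-frame feature pair and
# score it with a small MLP, producing an (n x m) affinity matrix.
def pairwise_affinity(feats_t, feats_tau, scorer):
    n, c = feats_t.shape
    m, _ = feats_tau.shape
    pairs = torch.cat([feats_t[:, None, :].expand(n, m, c),
                       feats_tau[None, :, :].expand(n, m, c)], dim=-1)   # (n, m, 2C)
    return scorer(pairs).squeeze(-1)                                     # (n, m) affinities

scorer = nn.Sequential(nn.Linear(2 * 128, 64), nn.ReLU(), nn.Linear(64, 1))
feats_t = torch.randn(5, 128)      # detected objects in frame t
feats_tau = torch.randn(7, 128)    # detected objects in an earlier frame
print(pairwise_affinity(feats_t, feats_tau, scorer).shape)   # torch.Size([5, 7])
```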

Journal ArticleDOI
Minghao Zhu, Licheng Jiao, Fang Liu, Shuyuan Yang, Jianing Wang
TL;DR: Zhu et al. propose an end-to-end residual spectral-spatial attention network (RSSAN) for hyperspectral image classification, which takes raw 3-D cubes as input data without additional feature engineering.
Abstract: In the last five years, deep learning has been introduced to tackle hyperspectral image (HSI) classification and has demonstrated good performance. In particular, convolutional neural network (CNN)-based methods for HSI classification have made great progress. However, due to the high dimensionality of HSI and the equal treatment of all bands, the performance of these methods is hampered by learning features from bands that are useless for classification. Moreover, for patchwise CNN models, the equal treatment of spatial information from the pixel-centered neighborhood also hinders the performance of these methods. In this article, we propose an end-to-end residual spectral–spatial attention network (RSSAN) for HSI classification. The RSSAN takes raw 3-D cubes as input data without additional feature engineering. First, a spectral attention module is designed for spectral band selection from the raw input data by emphasizing bands useful for classification and suppressing useless ones. Then, a spatial attention module is designed for the adaptive selection of spatial information by emphasizing pixels from the same class as the center pixel or those that are useful for classification in the pixel-centered neighborhood, and suppressing those from a different class or that are useless. Second, two attention modules are also used in the following CNN for adaptive feature refinement in spectral–spatial feature learning. Third, a sequential spectral–spatial attention module is embedded into a residual block to avoid overfitting and accelerate the training of the proposed model. Experimental studies demonstrate that the RSSAN achieved superior classification accuracy compared with the state of the art on three HSI data sets: Indian Pines (IN), University of Pavia (UP), and Kennedy Space Center (KSC).
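The spectral attention idea (emphasizing useful bands, suppressing useless ones) can be sketched as a squeeze-and-excitation style gate over the band axis; this simplified module is for illustration and is not the RSSAN implementation.

```python
import torch
import torch.nn as nn

class SpectralAttention(nn.Module):
    """Reweight hyperspectral bands: global spatial pooling -> gates over the band axis.
    A simplified, SE-like reading of the spectral attention idea."""
    def __init__(self, bands, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(bands, bands // reduction), nn.ReLU(inplace=True),
            nn.Linear(bands // reduction, bands), nn.Sigmoid())

    def forward(self, cube):                       # (B, bands, H, W) raw 3-D patch
        gate = self.mlp(cube.mean(dim=(2, 3)))     # (B, bands): emphasis per band
        return cube * gate[:, :, None, None]

patch = torch.randn(2, 200, 9, 9)                   # 9x9 neighborhood with 200 bands
print(SpectralAttention(200)(patch).shape)          # torch.Size([2, 200, 9, 9])
```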

Journal ArticleDOI
TL;DR: The proposed deep fuzzy hashing network (DFHN) method combines the fuzzy logic technique and the DNN to learn more effective binary codes, which can leverage fuzzy rules to model the uncertainties underlying the data.
Abstract: Hashing methods for efficient image retrieval aim at learning hash functions that map similar images to semantically correlated binary codes in the Hamming space with similarity well preserved. Traditional hashing methods usually represent image content by hand-crafted features. Deep hashing methods based on deep neural network (DNN) architectures can generate more effective image features and obtain better retrieval performance. However, the underlying data structure is hardly captured by existing DNN models. Moreover, the similarity (either visual or semantic) between pairwise images is ambiguous, even uncertain, to be measured in existing deep hashing methods. In this article, we propose a novel hashing method termed deep fuzzy hashing network (DFHN) to overcome the shortcomings of existing deep hashing approaches. Our DFHN method combines the fuzzy logic technique and the DNN to learn more effective binary codes, which can leverage fuzzy rules to model the uncertainties underlying the data. Derived from fuzzy logic theory, the generalized Hamming distance is devised in the convolutional layers and fully connected layers of our DFHN to model their outputs, which come from an efficient XOR operation on given inputs and weights. Extensive experiments show that our DFHN method obtains competitive retrieval accuracy with highly efficient training speed compared with several state-of-the-art deep hashing approaches on two large-scale image datasets: CIFAR-10 and NUS-WIDE.
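The abstract's generalized Hamming distance comes from a fuzzy XOR; in its common elementwise form it is g(a, b) = a + b - 2ab, which reduces to the ordinary Hamming mismatch on binary inputs. A small sketch, assuming this standard form:

```python
import torch

def generalized_hamming_distance(x, w):
    """Elementwise fuzzy-XOR form of the generalized Hamming distance, g(a, b) = a + b - 2ab.
    For binary inputs it reduces to the ordinary Hamming mismatch indicator."""
    return x + w - 2 * x * w

# On crisp binary vectors the summed distance equals the classic Hamming distance.
a = torch.tensor([1.0, 0.0, 1.0, 1.0])
b = torch.tensor([1.0, 1.0, 0.0, 1.0])
print(generalized_hamming_distance(a, b).sum())   # tensor(2.)

# On soft (fuzzy) activations it gives a graded mismatch score, which is how the
# abstract describes modeling layer outputs from inputs and weights.
print(generalized_hamming_distance(torch.rand(3), torch.rand(3)))
```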

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a supervised multi-view hash model which can enhance the multiview information through neural networks, and the proposed method utilizes an effective view stability evaluation method to actively explore the relationship among views, which will affect the optimization direction of the entire network.
Abstract: Hashing is an efficient method for nearest neighbor search in large-scale data space by embedding high-dimensional feature descriptors into a similarity preserving Hamming space with a low dimension. However, large-scale high-speed retrieval through binary code has a certain degree of reduction in retrieval accuracy compared to traditional retrieval methods. We have noticed that multi-view methods can well preserve the diverse characteristics of data. Therefore, we try to introduce the multi-view deep neural network into the hash learning field, and design an efficient and innovative retrieval model, which has achieved a significant improvement in retrieval performance. In this paper, we propose a supervised multi-view hash model which can enhance the multi-view information through neural networks. This is a completely new hash learning method that combines multi-view and deep learning methods. The proposed method utilizes an effective view stability evaluation method to actively explore the relationship among views, which will affect the optimization direction of the entire network. We have also designed a variety of multi-data fusion methods in the Hamming space to preserve the advantages of both convolution and multi-view. In order to avoid excessive computing resources on the enhancement procedure during retrieval, we set up a separate structure called memory network which participates in training together. The proposed method is systematically evaluated on the CIFAR-10, NUS-WIDE and MS-COCO datasets, and the results show that our method significantly outperforms the state-of-the-art single-view and multi-view hashing methods.

Journal ArticleDOI
TL;DR: CA-Net employs a joint spatial attention module to make the network focus more on the foreground region, and a novel channel attention module to adaptively recalibrate channel-wise feature responses and highlight the most relevant feature channels.
Abstract: Accurate medical image segmentation is essential for diagnosis and treatment planning of diseases. Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance for automatic medical image segmentation. However, they are still challenged by complicated conditions where the segmentation target has large variations of position, shape and scale, and existing CNNs have a poor explainability that limits their application to clinical decisions. In this work, we make extensive use of multiple attentions in a CNN architecture and propose a comprehensive attention-based CNN (CA-Net) for more accurate and explainable medical image segmentation that is aware of the most important spatial positions, channels and scales at the same time. In particular, we first propose a joint spatial attention module to make the network focus more on the foreground region. Then, a novel channel attention module is proposed to adaptively recalibrate channel-wise feature responses and highlight the most relevant feature channels. Also, we propose a scale attention module implicitly emphasizing the most salient feature maps among multiple scales so that the CNN is adaptive to the size of an object. Extensive experiments on skin lesion segmentation from ISIC 2018 and multi-class segmentation of fetal MRI found that our proposed CA-Net significantly improved the average segmentation Dice score from 87.77% to 92.08% for skin lesion, 84.79% to 87.08% for the placenta and 93.20% to 95.88% for the fetal brain respectively compared with U-Net. It reduced the model size to around 15 times smaller with close or even better accuracy compared with state-of-the-art DeepLabv3+. In addition, it has a much higher explainability than existing networks by visualizing the attention weight maps. Our code is available at https://github.com/HiLab-git/CA-Net .

Journal ArticleDOI
28 Apr 2021
TL;DR: In this paper, an attention-based deep learning architecture called AttnSleep was proposed to classify sleep stages using single-channel EEG signals, which leverages a multi-head attention mechanism to capture the temporal dependencies among the extracted features.
Abstract: Automatic sleep stage classification is of great importance to measure sleep quality. In this paper, we propose a novel attention-based deep learning architecture called AttnSleep to classify sleep stages using single-channel EEG signals. This architecture starts with the feature extraction module based on a multi-resolution convolutional neural network (MRCNN) and adaptive feature recalibration (AFR). The MRCNN can extract low- and high-frequency features, and the AFR is able to improve the quality of the extracted features by modeling the inter-dependencies between the features. The second module is the temporal context encoder (TCE) that leverages a multi-head attention mechanism to capture the temporal dependencies among the extracted features. Particularly, the multi-head attention deploys causal convolutions to model the temporal relations in the input features. We evaluate the performance of our proposed AttnSleep model using three public datasets. The results show that our AttnSleep outperforms state-of-the-art techniques in terms of different evaluation metrics. Our source codes, experimental data, and supplementary materials are available at https://github.com/emadeldeen24/AttnSleep .
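The causal convolutions mentioned above can be illustrated with a 1-D convolution padded only on the left, so that the output at time t never depends on future samples; kernel size and channel counts below are arbitrary.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution padded only on the left, so output at time t never sees t+1, t+2, ...
    (the kind of causal convolution the abstract mentions inside the attention module)."""
    def __init__(self, channels, kernel_size=7):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x):                                   # (B, C, T)
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

# e.g. produce causally convolved features for an EEG epoch before attention
feats = torch.randn(4, 64, 80)           # (batch, feature channels, time steps)
print(CausalConv1d(64)(feats).shape)     # torch.Size([4, 64, 80])
```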

Journal ArticleDOI
TL;DR: Li et al. propose Ucolor, an underwater image enhancement network via medium transmission-guided multi-color space embedding, which enriches the diversity of feature representations by incorporating the characteristics of different color spaces into a unified structure.
Abstract: Underwater images suffer from color casts and low contrast due to wavelength- and distance-dependent attenuation and scattering. To solve these two degradation issues, we present an underwater image enhancement network via medium transmission-guided multi-color space embedding, called Ucolor . Concretely, we first propose a multi-color space encoder network, which enriches the diversity of feature representations by incorporating the characteristics of different color spaces into a unified structure. Coupled with an attention mechanism, the most discriminative features extracted from multiple color spaces are adaptively integrated and highlighted. Inspired by underwater imaging physical models, we design a medium transmission (indicating the percentage of the scene radiance reaching the camera)-guided decoder network to enhance the response of network towards quality-degraded regions. As a result, our network can effectively improve the visual quality of underwater images by exploiting multiple color spaces embedding and the advantages of both physical model-based and learning-based methods. Extensive experiments demonstrate that our Ucolor achieves superior performance against state-of-the-art methods in terms of both visual quality and quantitative metrics. The code is publicly available at: https://li-chongyi.github.io/Proj_Ucolor.html .
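One way to read "medium transmission-guided" is to use the transmission map as a spatial weight that amplifies decoder responses in heavily degraded (low-transmission) regions. The sketch below shows only that weighting step, under that assumption; it is not the Ucolor implementation.

```python
import torch
import torch.nn.functional as F

def transmission_guided(features, transmission):
    """Reweight decoder features with the medium transmission map.

    features:     (B, C, H, W) decoder activations
    transmission: (B, 1, h, w) estimated medium transmission in [0, 1]
    Regions with low transmission (strong degradation) get amplified, which is
    one plausible reading of the 'transmission-guided decoder' described above.
    """
    t = F.interpolate(transmission, size=features.shape[-2:], mode="bilinear",
                      align_corners=False)
    return features * (1.0 + (1.0 - t))     # boost responses where transmission is low

feats = torch.randn(1, 64, 128, 128)
t_map = torch.rand(1, 1, 32, 32)                 # coarse transmission estimate (placeholder)
print(transmission_guided(feats, t_map).shape)   # torch.Size([1, 64, 128, 128])
```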

Journal ArticleDOI
TL;DR: In this paper, the authors present the current trends and challenges for the detection of plant leaf disease using deep learning and advanced imaging techniques, and discuss some of the current challenges and problems that need to be resolved.
Abstract: Deep learning is a branch of artificial intelligence. In recent years, with the advantages of automatic learning and feature extraction, it has attracted wide attention from academia and industry. It has been widely used in image and video processing, voice processing, and natural language processing. At the same time, it has also become a research hotspot in the field of agricultural plant protection, such as plant disease recognition and pest range assessment. The application of deep learning in plant disease recognition can avoid the disadvantages caused by the manual selection of disease-spot features, make plant disease feature extraction more objective, and improve research efficiency and the speed of technology transfer. This review presents the research progress of deep learning technology in the field of crop leaf disease identification in recent years. We describe the current trends and challenges for the detection of plant leaf disease using deep learning and advanced imaging techniques, and we hope that this work will be a valuable resource for researchers who study the detection of plant diseases and insect pests. We also discuss some of the current challenges and problems that need to be resolved.

Journal ArticleDOI
TL;DR: This paper reviews the application of machine learning techniques in building load prediction, organized according to the logic of machine learning: performing tasks T, measured by performance P, based on learning from experience E.

Journal ArticleDOI
TL;DR: Experiments demonstrate that the proposed VLSTM model can efficiently cope with imbalance and high-dimensionality issues, and significantly improve the accuracy and reduce the false rate in anomaly detection for IBD according to F1, area under curve (AUC), and false alarm rate (FAR).
Abstract: With the increasing popularity of Industry 4.0, industrial big data (IBD) has become a hotly discussed topic in the digital and intelligent industry field. The security problem in signal processing over large-scale data streams is still a challenging issue in the industrial Internet of Things, especially when dealing with high-dimensional anomaly detection for intelligent industrial applications. In this article, to mitigate the inconsistency between dimensionality reduction and feature retention in imbalanced IBD, we propose a variational long short-term memory (VLSTM) learning model for intelligent anomaly detection based on reconstructed feature representation. An encoder–decoder neural network associated with a variational reparameterization scheme is designed to learn the low-dimensional feature representation from high-dimensional raw data. Three loss functions are defined and quantified to constrain the reconstructed hidden variable into a more explicit and meaningful form. A lightweight estimation network is then fed with the refined feature representation to identify anomalies in IBD. Experiments using a public IBD dataset named UNSW-NB15 demonstrate that the proposed VLSTM model can efficiently cope with imbalance and high-dimensionality issues, and significantly improve the accuracy and reduce the false rate in anomaly detection for IBD according to F1, area under curve (AUC), and false alarm rate (FAR).
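A skeleton of the variational LSTM idea described above: an LSTM encoder, a reparameterized low-dimensional latent, and an LSTM decoder trained with a reconstruction plus KL objective. Layer sizes are arbitrary and this is not the paper's exact VLSTM architecture.

```python
import torch
import torch.nn as nn

class VariationalLSTMSketch(nn.Module):
    """Generic LSTM encoder-decoder with a variational bottleneck (reparameterization trick);
    a skeleton of the idea, not the paper's exact VLSTM architecture."""
    def __init__(self, in_dim=196, hidden=64, latent=16):
        super().__init__()
        self.encoder = nn.LSTM(in_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.decoder = nn.LSTM(latent, hidden, batch_first=True)
        self.out = nn.Linear(hidden, in_dim)

    def forward(self, x):                              # x: (B, T, in_dim)
        h, _ = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        recon, _ = self.decoder(z)
        return self.out(recon), mu, logvar

model = VariationalLSTMSketch()
x = torch.randn(8, 10, 196)                            # windowed high-dimensional records
recon, mu, logvar = model(x)
recon_loss = nn.functional.mse_loss(recon, x)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
print(recon_loss + kl)                                 # training objective sketch
```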