Showing papers in "Pattern Recognition in 2021"
TL;DR: In this article, the authors provide a deeper understanding of adversarial examples in the context of medical images and find that medical DNN models can be more vulnerable to adversarial attacks compared to models for natural images, according to two different viewpoints.
Abstract: Deep neural networks (DNNs) have become popular for medical image analysis tasks like cancer diagnosis and lesion detection. However, a recent study demonstrates that medical deep learning systems can be compromised by carefully-engineered adversarial examples/attacks with small imperceptible perturbations. This raises safety concerns about the deployment of these systems in clinical settings. In this paper, we provide a deeper understanding of adversarial examples in the context of medical images. We find that medical DNN models can be more vulnerable to adversarial attacks compared to models for natural images, according to two different viewpoints. Surprisingly, we also find that medical adversarial attacks can be easily detected, i.e., simple detectors can achieve over 98% detection AUC against state-of-the-art attacks, due to fundamental feature differences compared to normal examples. We believe these findings may be a useful basis to approach the design of more explainable and secure medical deep learning systems.
TL;DR: RIAD randomly removes partial image regions and reconstructs the image from partial inpaintings, thus addressing the drawbacks of auto-encoding methods.
Abstract: Visual anomaly detection addresses the problem of classification or localization of regions in an image that deviate from their normal appearance. A popular approach trains an auto-encoder on anomaly-free images and performs anomaly detection by calculating the difference between the input and the reconstructed image. This approach assumes that the auto-encoder will be unable to accurately reconstruct anomalous regions. But in practice neural networks generalize well even to anomalies and reconstruct them sufficiently well, thus reducing the detection capabilities. Accurate reconstruction is far less likely if the anomaly pixels were not visible to the auto-encoder. We thus cast anomaly detection as a self-supervised reconstruction-by-inpainting problem. Our approach (RIAD) randomly removes partial image regions and reconstructs the image from partial inpaintings, thus addressing the drawbacks of auto-encoding methods. RIAD is extensively evaluated on several benchmarks and sets a new state of the art on a recent highly challenging anomaly detection benchmark.
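A minimal NumPy sketch of the reconstruction-by-inpainting idea, under stated assumptions: the image is split into square cells, the cells are randomly dealt into k disjoint masks, and each pixel of the final reconstruction comes from the pass in which that pixel was hidden. The `mean_inpaint` function is a hypothetical stand-in for RIAD's trained inpainting network, and the cell size, k, and absolute-difference anomaly map are illustrative choices rather than the paper's settings.

```python
import numpy as np

def disjoint_masks(h, w, cell, k, rng):
    """Split an h x w image into cell x cell squares and deal them randomly into
    k masks; the masks are disjoint and together cover the whole image.
    Assumes h and w are multiples of cell."""
    gh, gw = h // cell, w // cell
    masks = np.zeros((k, h, w), dtype=bool)
    for idx, c in enumerate(rng.permutation(gh * gw)):
        r, q = divmod(int(c), gw)
        masks[idx % k, r * cell:(r + 1) * cell, q * cell:(q + 1) * cell] = True
    return masks  # True = region removed, i.e. hidden from the inpainter

def mean_inpaint(image, hidden):
    """Hypothetical stand-in for the trained inpainting network:
    fills hidden pixels with the mean of the visible pixels."""
    out = image.copy()
    out[hidden] = image[~hidden].mean()
    return out

def riad_anomaly_map(image, inpaint, cell=4, k=4, seed=0):
    """Reconstruct each pixel only from a pass in which it was hidden,
    then score anomalies by the absolute reconstruction error."""
    rng = np.random.default_rng(seed)
    masks = disjoint_masks(*image.shape, cell, k, rng)
    recon = np.zeros_like(image)
    for m in masks:
        recon[m] = inpaint(image, m)[m]
    return np.abs(image - recon)
```

Because a pixel is never visible in the pass that reconstructs it, an anomalous patch cannot simply be copied through, which is the drawback of plain auto-encoding that the abstract points to.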
TL;DR: The authors introduce hypergraph convolution and hypergraph attention; hypergraph attention further enhances the capacity of representation learning by leveraging an attention module, and both operators can be applied to diverse applications where non-pairwise relationships are observed.
Abstract: Recently, graph neural networks have attracted great attention and achieved prominent performance in various research fields. Most of those algorithms have assumed pairwise relationships of objects of interest. However, in many real applications, the relationships between objects are of higher order, beyond a pairwise formulation. To efficiently learn deep embeddings on the high-order graph-structured data, we introduce two end-to-end trainable operators to the family of graph neural networks, i.e., hypergraph convolution and hypergraph attention. Whilst hypergraph convolution defines the basic formulation of performing convolution on a hypergraph, hypergraph attention further enhances the capacity of representation learning by leveraging an attention module. With the two operators, a graph neural network is readily extended to a more flexible model and applied to diverse applications where non-pairwise relationships are observed. Extensive experimental results with semi-supervised node classification demonstrate the effectiveness of hypergraph convolution and hypergraph attention.
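The hypergraph convolution operator can be sketched with the commonly used normalized form X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X P, where H is the vertex-hyperedge incidence matrix, W holds hyperedge weights, Dv and De are the vertex and hyperedge degree matrices, and P is a learnable projection. This NumPy sketch assumes that formulation; it omits the nonlinearity and the attention variant (which would learn the incidence values from feature similarities). Every vertex must belong to at least one hyperedge, otherwise its degree is zero.

```python
import numpy as np

def hypergraph_conv(X, H, P, w=None):
    """One hypergraph convolution layer, before the nonlinearity:
        X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X P
    X: (n, d) vertex features
    H: (n, m) incidence matrix, H[i, e] = 1 if vertex i belongs to hyperedge e
    P: (d, d_out) learnable projection
    w: (m,) hyperedge weights, all ones by default (assumed)."""
    m = H.shape[1]
    w = np.ones(m) if w is None else np.asarray(w, dtype=float)
    dv = H @ w            # vertex degree: total weight of incident hyperedges
    de = H.sum(axis=0)    # hyperedge degree: number of vertices in the hyperedge
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    A = dv_inv_sqrt @ H @ np.diag(w / de) @ H.T @ dv_inv_sqrt
    return A @ X @ P
```

With every hyperedge of size two this reduces to a normalized graph convolution, which is why the operator extends ordinary GNNs rather than replacing them.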
TL;DR: An AI system based on deep meta learning is proposed in this research to accelerate analysis of chest X-ray (CXR) images in automatic detection of COVID-19 cases and achieves 95.6% accuracy and AUC of 0.97 in diagnosing COVID-19 from CXR images even with a limited number of training samples.
Abstract: Various AI functionalities such as pattern recognition and prediction can effectively be used to diagnose (recognize) and predict coronavirus disease 2019 (COVID-19) infections and propose a timely response (remedial action) to minimize the spread and impact of the virus. Motivated by this, an AI system based on deep meta learning has been proposed in this research to accelerate analysis of chest X-ray (CXR) images in automatic detection of COVID-19 cases. We present a synergistic approach to integrate contrastive learning with a fine-tuned pre-trained ConvNet encoder to capture unbiased feature representations and leverage a Siamese network for final classification of COVID-19 cases. We validate the effectiveness of our proposed model using two publicly available datasets comprising images from normal, COVID-19 and other pneumonia infected categories. Our model achieves 95.6% accuracy and AUC of 0.97 in diagnosing COVID-19 from CXR images even with a limited number of training samples.
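The abstract does not spell out the loss, so the following is only a sketch assuming the classic margin-based contrastive loss for Siamese pairs, plus nearest-reference classification over the shared encoder's embeddings. The function names and the margin of 1.0 are illustrative; the encoder itself (a pre-trained ConvNet in the paper) is omitted and embeddings are taken as given.

```python
import numpy as np

def contrastive_loss(z1, z2, same, margin=1.0):
    """Margin-based contrastive loss for a batch of Siamese pairs (assumed form).
    z1, z2: (b, d) embeddings from the shared encoder
    same:   (b,) 1 if the two images of a pair share a label, else 0."""
    d = np.linalg.norm(z1 - z2, axis=1)
    pos = same * d ** 2                                  # pull same-class pairs together
    neg = (1 - same) * np.maximum(0.0, margin - d) ** 2  # push other pairs past the margin
    return 0.5 * np.mean(pos + neg)

def siamese_classify(query, refs, ref_labels):
    """Give each query embedding the label of its closest reference embedding."""
    dists = np.linalg.norm(query[:, None, :] - refs[None, :, :], axis=2)
    return ref_labels[np.argmin(dists, axis=1)]
```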
TL;DR: The proposed cross-modality deep feature learning framework can effectively improve the brain tumor segmentation performance when compared with the baseline methods and state-of-the-art methods.
Abstract: Recent advances in machine learning and prevalence of digital medical images have opened up an opportunity to address the challenging brain tumor segmentation (BTS) task by using deep convolutional neural networks. However, unlike the widespread RGB image data, the medical image data used in brain tumor segmentation are relatively scarce in terms of data scale but contain richer information in terms of the modality property. To this end, this paper proposes a novel cross-modality deep feature learning framework to segment brain tumors from the multi-modality MRI data. The core idea is to mine rich patterns across the multi-modality data to make up for the insufficient data scale. The proposed cross-modality deep feature learning framework consists of two learning processes: the cross-modality feature transition (CMFT) process and the cross-modality feature fusion (CMFF) process, which aim at learning rich feature representations by transiting knowledge across different modality data and fusing knowledge from different modality data, respectively. Comprehensive experiments are conducted on the BraTS benchmarks, which show that the proposed cross-modality deep feature learning framework can effectively improve the brain tumor segmentation performance when compared with the baseline methods and state-of-the-art methods.
TL;DR: Extensive computer simulations show better efficiency and flexibility of this end-to-end learning approach to CT image segmentation with image enhancement, compared with state-of-the-art segmentation approaches, namely GraphCut, Medical Image Segmentation (MIS), and Watershed.
Abstract: History shows that an infectious disease such as COVID-19 can stun the world quickly, causing massive losses to health and profoundly affecting the lives of billions of people from both a safety and an economic perspective. The best strategy for controlling the COVID-19 pandemic is early intervention to stop the spread of the disease. In general, Computed Tomography (CT) is used to detect tumors, pneumonia, tuberculosis, emphysema, and other diseases of the lungs and pleura (the membrane covering the lungs). The disadvantages of CT imaging are inferior soft-tissue contrast compared with MRI and, because it is X-ray-based, radiation exposure. Lung CT image segmentation is a necessary initial step for lung image analysis. The challenges facing segmentation algorithms are exacerbated by intensity inhomogeneity, the presence of artifacts, and the closeness in gray level of different soft tissues. The goal of this paper is to design and evaluate a tool for automatic segmentation and measurement of COVID-19 lung infection in chest CT images. Extensive computer simulations show better efficiency and flexibility of this end-to-end learning approach to CT image segmentation with image enhancement, compared with state-of-the-art segmentation approaches, namely GraphCut, Medical Image Segmentation (MIS), and Watershed. Experiments were performed on the COVID-CT-Dataset, containing 275 CT scans that are positive for COVID-19, and on new data acquired from the EL-BAYANE center for Radiology and Medical Imaging. The mean accuracy, sensitivity, F-measure, precision, MCC, Dice, Jaccard, and specificity are 0.98, 0.73, 0.71, 0.73, 0.71, 0.71, 0.57, and 0.99, respectively, which is better than the methods mentioned above. The achieved results show that the proposed approach is more robust, accurate, and straightforward.
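All of the reported statistics can be computed from the pixel-wise confusion counts of a predicted mask against a ground-truth mask. A minimal sketch, assuming binary masks and per-image computation (the paper reports means, presumably over images); the function name is hypothetical. For a single binary mask, the Dice coefficient coincides with the F-measure.

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Pixel-wise metrics for binary masks (True = infected pixel)."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    # Cast counts to float so products in MCC cannot overflow on large images
    tp = float(np.sum(pred & gt)); tn = float(np.sum(~pred & ~gt))
    fp = float(np.sum(pred & ~gt)); fn = float(np.sum(~pred & gt))
    sens = tp / (tp + fn)                 # sensitivity (recall)
    spec = tn / (tn + fp)                 # specificity
    prec = tp / (tp + fp)                 # precision
    f_measure = 2 * prec * sens / (prec + sens)
    dice = 2 * tp / (2 * tp + fp + fn)    # equals f_measure for one binary mask
    jaccard = tp / (tp + fp + fn)
    mcc = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    acc = (tp + tn) / pred.size
    return dict(accuracy=acc, sensitivity=sens, specificity=spec, precision=prec,
                f_measure=f_measure, dice=dice, jaccard=jaccard, mcc=mcc)
```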
TL;DR: In this article, a knowledge base graph embedding module is constructed to extend the versatility of knowledge-based VQA (Visual Question Answering) models; it extracts core entities from images and text, maps them to knowledge base entities, extracts the subgraphs closely related to the core entities, and converts the subgraphs into low-dimensional vectors to realize subgraph embedding.
Abstract: In this paper, a knowledge base graph embedding module is constructed to extend the versatility of knowledge-based VQA (Visual Question Answering) models. The knowledge base graph embedding module constructed in this paper extracts core entities from images and text, maps them to knowledge base entities, then extracts the subgraphs closely related to the core entities, and converts the subgraphs into low-dimensional vectors to realize subgraph embedding. In order to achieve good subgraph embedding, we first extract two experimental knowledge bases with rich semantics from DBpedia: DBV and DBA. Based on these two knowledge bases, we select several strong knowledge base embedding models as test models, including SE (structured embedding), SME (semantic matching energy function), and TransE, and use them for link prediction. The results show that there is a clear correspondence between the entities of DBV, which enables excellent node embedding, and that the TransE model achieves good knowledge base embedding, so we build the knowledge base graph embedding module on TransE. We then construct a VQA model (KBSN) based on the knowledge base graph embedding. Experimental results on the VQA2.0 and KB-VQA datasets show that the knowledge base graph embedding module improves accuracy.
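TransE models a relation as a translation in embedding space: a triple (h, r, t) is plausible when h + r lies close to t. A minimal NumPy sketch of the TransE scoring function, the usual margin ranking loss over positive and corrupted triples, and tail-entity ranking for link prediction; the toy embeddings and function names are illustrative, and training (gradient updates, negative sampling) is omitted.

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """Distance ||h + r - t||; smaller means a more plausible triple."""
    return np.linalg.norm(h + r - t, ord=norm, axis=-1)

def margin_loss(pos_scores, neg_scores, margin=1.0):
    """Margin ranking loss: positives should score at least `margin` below negatives."""
    return np.mean(np.maximum(0.0, margin + pos_scores - neg_scores))

def predict_tail(h, r, entity_emb):
    """Link prediction: rank every entity as a candidate tail, most plausible first."""
    return np.argsort(transe_score(h, r, entity_emb))
```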
TL;DR: This paper proposes a novel criterion for CNN pruning inspired by neural network interpretability: the most relevant elements, i.e. weights or filters, are automatically found using their relevance scores obtained from concepts of explainable AI (XAI).
Abstract: The success of convolutional neural networks (CNNs) in various applications is accompanied by a significant increase in computation and parameter storage costs. Recent efforts to reduce these overheads involve pruning and compressing the weights of various layers while at the same time aiming not to sacrifice performance. In this paper, we propose a novel criterion for CNN pruning inspired by neural network interpretability: The most relevant units, i.e. weights or filters, are automatically found using their relevance scores obtained from concepts of explainable AI (XAI). By exploring this idea, we connect the lines of interpretability and model compression research. We show that our proposed method can efficiently prune CNN models in transfer-learning setups in which networks pre-trained on large corpora are adapted to specialized tasks. The method is evaluated on a broad range of computer vision datasets. Notably, our novel criterion is not only competitive with or better than state-of-the-art pruning criteria when successive retraining is performed, but clearly outperforms these previous criteria in the resource-constrained application scenario in which the data of the task to be transferred to is very scarce and one chooses to refrain from fine-tuning. Our method is able to compress the model iteratively while maintaining or even improving accuracy. At the same time, its computational cost is on the order of that of gradient computation, and it is comparatively simple to apply without the need to tune hyperparameters for pruning.
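To make the relevance-based criterion concrete, here is a minimal sketch assuming layer-wise relevance propagation (LRP) with the epsilon rule as the XAI relevance source, applied to a toy one-hidden-layer ReLU network rather than a CNN: the predicted-class logit is propagated back to the hidden units, relevance is summed over a batch of reference samples, and the least relevant units are removed. The network, the function names, and the epsilon value are illustrative assumptions.

```python
import numpy as np

def hidden_relevance(X, W1, b1, W2, b2, eps=1e-6):
    """LRP-epsilon relevance of each hidden unit of a 1-hidden-layer ReLU MLP,
    explaining the predicted-class logit, summed over the batch X: (n, d)."""
    A = np.maximum(0.0, X @ W1 + b1)           # hidden activations (n, h)
    Z = A @ W2 + b2                            # output logits (n, c)
    R_out = np.zeros_like(Z)
    idx = (np.arange(len(X)), Z.argmax(axis=1))
    R_out[idx] = Z[idx]                        # relevance starts at the predicted logit
    denom = Z + eps * np.where(Z >= 0, 1.0, -1.0)   # epsilon stabilizer
    # R_j = a_j * sum_k w_jk * R_k / (z_k + eps * sign(z_k))
    R_hidden = A * ((R_out / denom) @ W2.T)
    return R_hidden.sum(axis=0)

def prune_lowest(W1, b1, W2, relevance, n_prune):
    """Drop the n_prune hidden units with the lowest aggregated relevance."""
    keep = np.sort(np.argsort(relevance)[n_prune:])
    return W1[:, keep], b1[keep], W2[keep, :]
```

With zero output bias and a tiny epsilon, the hidden relevances sum to the explained logits (LRP's conservation property), so the scores are directly comparable across units without a tuned threshold.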
TL;DR: In this paper, the authors present a comprehensive survey that covers various aspects of place recognition from a deep learning perspective and discuss the opportunities and challenges of using deep learning for place recognition.
Abstract: Visual place recognition has attracted widespread research interest in multiple fields such as computer vision and robotics. Recently, researchers have employed advanced deep learning techniques to tackle this problem. While an increasing number of studies have proposed novel place recognition methods based on deep learning, few have provided a whole picture of how and to what extent deep learning has been utilized for this problem. In this paper, by delving into over 200 references, we present a comprehensive survey that covers various aspects of place recognition from a deep learning perspective. We first present a brief introduction to deep learning and discuss its opportunities for recognizing places. After that, we focus on existing approaches built upon convolutional neural networks, including off-the-shelf and specifically designed models as well as novel image representations. We also discuss challenging problems in place recognition and present an extensive review of the corresponding datasets. To explore future directions, we describe open issues and some new tools for this research topic, for instance, generative adversarial networks, semantic scene understanding and multi-modality feature learning. Finally, we draw conclusions.
TL;DR: A novel feature selection algorithm based on bare bones PSO (BBPSO) with mutual information is proposed that can achieve a feature subset with better performance, and is a highly competitive FS algorithm.
Abstract: Feature selection (FS) is an important data processing method in pattern recognition and data mining. Because they do not consider the characteristics of the FS problem itself, the traditional particle update mechanisms and swarm initialization strategies adopted in most particle swarm optimization (PSO) algorithms limit their performance on high-dimensional FS problems. To address this, this paper proposes a novel feature selection algorithm based on bare bones PSO (BBPSO) with mutual information. First, an effective swarm initialization strategy based on label correlation is developed, making full use of the correlation between features and class labels to accelerate the convergence of the swarm. Then, to enhance the exploitation performance of the algorithm, two local search operators, the supplementary operator and the deletion operator, are developed based on feature relevance-redundancy. Furthermore, an adaptive flip mutation operator is designed to help particles jump out of local optima. We apply the proposed algorithm to typical datasets using the K-Nearest Neighbor (K-NN) classifier and compare it with eleven state-of-the-art algorithms: SFS, PTA, SGA, BPSO, PSO(4-2), HPSO-LS, Binary BPSO, NaFA, IBFA, KPLS-mRMR and SMBA-CSFS. The experimental results show that the proposed algorithm can obtain a feature subset with better performance and is a highly competitive FS algorithm.
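Bare bones PSO drops the velocity term: each coordinate of a particle's next position is sampled from a Gaussian centered at the midpoint of its personal best and the global best, with standard deviation equal to their distance. The sketch below shows only that core loop for feature selection, assuming a 0.5 threshold to turn positions into feature masks and a user-supplied fitness function; the paper's label-correlation initialization, relevance-redundancy local search operators, and adaptive flip mutation are not included.

```python
import numpy as np

def bbpso_feature_selection(fitness, n_features, n_particles=20, iters=50, seed=0):
    """Core bare-bones PSO loop that maximizes fitness(mask) over boolean masks.
    A feature is selected when its position coordinate exceeds 0.5 (assumed)."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n_particles, n_features))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p > 0.5) for p in pos])
    g, g_fit = pbest[pbest_fit.argmax()].copy(), pbest_fit.max()
    for _ in range(iters):
        # No velocity: sample around the midpoint of personal and global bests
        pos = rng.normal((pbest + g) / 2, np.abs(pbest - g))
        fit = np.array([fitness(p > 0.5) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        if pbest_fit.max() > g_fit:
            g, g_fit = pbest[pbest_fit.argmax()].copy(), pbest_fit.max()
    return g > 0.5, g_fit
```

In practice the fitness would be, for example, K-NN accuracy on the selected features, possibly penalized by subset size.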
TL;DR: A new deep learning algorithm is proposed for the automated diagnosis of COVID-19, which only requires a few samples for training and uses contrastive learning to train an encoder which can capture expressive feature representations on large and publicly available lung datasets and adopt the prototypical network for classification.
Abstract: The current pandemic, caused by the outbreak of a novel coronavirus (COVID-19) in December 2019, has led to a global emergency that has significantly impacted economies, healthcare systems and personal wellbeing all around the world. Controlling the rapidly evolving disease requires highly sensitive and specific diagnostics. While RT-PCR is the most commonly used, it can take up to eight hours, and requires significant effort from healthcare professionals. As such, there is a critical need for a quick and automatic diagnostic system. Diagnosis from chest CT images is a promising direction. However, current studies are limited by the lack of sufficient training samples, as acquiring annotated CT images is time-consuming. To this end, we propose a new deep learning algorithm for the automated diagnosis of COVID-19, which only requires a few samples for training. Specifically, we use contrastive learning to train an encoder which can capture expressive feature representations on large and publicly available lung datasets and adopt the prototypical network for classification. We validate the efficacy of the proposed model in comparison with other competing methods on two publicly available and annotated COVID-19 CT datasets. Our results demonstrate the superior performance of our model for the accurate diagnosis of COVID-19 based on chest CT images.
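In its standard form, prototypical-network classification represents each class by the mean embedding of its few support examples and classifies a query by a softmax over negative squared Euclidean distances to those prototypes. A NumPy sketch under that assumption, with the contrastively trained encoder omitted and embeddings taken as given; the function names are illustrative.

```python
import numpy as np

def prototypes(support, labels, n_classes):
    """Class prototype = mean embedding of that class's support examples."""
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_classes)])

def proto_predict(query, protos):
    """Class probabilities: softmax over negative squared distances to prototypes."""
    d2 = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)
```

Only the prototypes depend on the labeled CT images, which is why a few support samples per class suffice once the encoder is trained.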
TL;DR: A comprehensive survey of the state-of-the-art methods for imbalanced multi-label classification is provided in this paper, including the characteristics of imbalanced multilabel datasets, evaluation measures and comparative analysis of the proposed methods.
Abstract: Multi-Label Classification (MLC) is an extension of the standard single-label classification where each data instance is associated with several labels simultaneously. MLC has gained much importance in recent years due to its wide range of application domains. However, the class imbalance problem has become an inherent characteristic of many multi-label datasets, where the samples and their corresponding labels are non-uniformly distributed over the data space. The imbalance problem in MLC imposes challenges to multi-label data analytics which can be viewed from three perspectives: imbalance within labels, among labels, and label-sets. In this paper, we review the approaches for handling the imbalance problem in multi-label data by collecting the existing research work. As the first systematic study of approaches addressing the imbalance problem in MLC, this paper provides a comprehensive survey of the state-of-the-art methods for imbalanced MLC, including the characteristics of imbalanced multi-label datasets, evaluation measures and comparative analysis of the proposed methods. The study also discusses important results reported so far in the literature and highlights some of their strengths and limitations to guide future research.
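One common way to quantify imbalance among labels in the multi-label literature is the per-label imbalance ratio IRLbl (count of the most frequent label divided by the count of a given label) and its mean, MeanIR. Whether this survey uses exactly these measures is an assumption; a NumPy sketch over a binary label matrix:

```python
import numpy as np

def imbalance_ratios(Y):
    """Y: (n_samples, n_labels) binary label matrix.
    IRLbl(l) = count of the most frequent label / count of label l;
    MeanIR = mean of IRLbl over labels. Labels that never occur give inf."""
    counts = np.asarray(Y).sum(axis=0).astype(float)
    irlbl = counts.max() / counts
    return irlbl, irlbl.mean()
```

A MeanIR of 1 means all labels are equally frequent; larger values indicate stronger imbalance among labels.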
TL;DR: In this article, the thirty papers accepted for a special issue are grouped into three main categories: explainable deep learning methods, efficient deep learning via model compression and acceleration, and robustness and stability in deep learning.
Abstract: Deep learning has recently achieved great success in many visual recognition tasks. However, deep neural networks (DNNs) are often perceived as black boxes, making their decisions less understandable to humans and prohibiting their use in safety-critical applications. This guest editorial introduces the thirty papers accepted for the Special Issue on Explainable Deep Learning for Efficient and Robust Pattern Recognition. They are grouped into three main categories: explainable deep learning methods, efficient deep learning via model compression and acceleration, and robustness and stability in deep learning. For each of the three topics, a survey of the representative works and latest developments is presented, followed by a brief introduction of the accepted papers belonging to this topic. The special issue should be of high relevance to readers interested in explainable deep learning methods for efficient and robust pattern recognition applications, and it should help promote future research in this field.
TL;DR: This work demonstrates the feasibility of using a novel deep learning-based CAD scheme to efficiently and accurately distinguish COVID-19 from CAP and to localize it, with high accuracy and agreement with radiologists.
Abstract: The COVID-19 outbreak continues to threaten the health and life of people worldwide. It is an immediate priority to develop and test a computer-aided detection (CAD) scheme based on deep learning (DL) to automatically localize and differentiate COVID-19 from community-acquired pneumonia (CAP) on chest X-rays. Therefore, this study aims to develop and test an efficient and accurate deep learning scheme that assists radiologists in automatically recognizing and localizing COVID-19. A retrospective chest X-ray image dataset was collected from open image data and the Xiangya Hospital and divided into a training group and a testing group. The proposed CAD framework is composed of two DL-based steps: the Discrimination-DL and the Localization-DL. The first DL was developed to extract lung features from chest X-ray radiographs for COVID-19 discrimination and trained using 3548 chest X-ray radiographs. The second DL was trained with 406-pixel patches and applied to the recognized X-ray radiographs to localize them and assign them to the left lung, right lung, or both lungs (bipulmonary). X-ray radiographs of CAP and healthy controls were enrolled to evaluate the robustness of the model. Compared with the radiologists' discrimination and localization results, the Discrimination-DL achieved a COVID-19 discrimination accuracy of 98.71%, while the Localization-DL achieved a localization accuracy of 93.03%. This work demonstrates the feasibility of using a novel deep learning-based CAD scheme to efficiently and accurately distinguish COVID-19 from CAP and to localize it, with high accuracy and agreement with radiologists.
TL;DR: This paper presents a deep RVFL network with stacked layers, inspired by the principles of Random Vector Functional Link (RVFL) network, and proposes an ensemble deep network that can be regarded as a marriage of ensemble learning with deep learning.
Abstract: In this paper, we propose deep learning frameworks based on the randomized neural network. Inspired by the principles of Random Vector Functional Link (RVFL) network, we present a deep RVFL network (dRVFL) with stacked layers. The parameters of the hidden layers of the dRVFL are randomly generated within a suitable range and kept fixed while the output weights are computed using the closed-form solution as in a standard RVFL network. We also propose an ensemble deep network (edRVFL) that can be regarded as a marriage of ensemble learning with deep learning. Unlike traditional ensembling approaches that require training several models independently from scratch, edRVFL is obtained by training a single dRVFL network once. Both dRVFL and edRVFL frameworks are generic and can be used with any RVFL variant. To illustrate this, we integrate the deep learning RVFL networks with a recently proposed sparse pre-trained RVFL (SP-RVFL). Experiments on 46 tabular UCI classification datasets and 12 sparse datasets demonstrate that the proposed deep RVFL networks outperform state-of-the-art deep feed-forward neural networks (FNNs).
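A compact NumPy sketch of both frameworks, assuming ReLU hidden units, uniform random weights, and ridge-regularized closed-form output weights beta = (D^T D + lam I)^{-1} D^T Y. In the dRVFL variant each random layer is fed the previous hidden layer and a single output layer sees [X, H_1, ..., H_L]; in the edRVFL variant each layer is fed [X, H_{l-1}], gets its own output layer, and the class scores are averaged. The layer sizes, lam, score averaging, and function names are illustrative choices, not values from the paper.

```python
import numpy as np

def ridge(D, Y, lam):
    """Closed-form output weights: (D^T D + lam * I)^{-1} D^T Y."""
    return np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ Y)

def forward(X, params, ensemble):
    """Random, fixed hidden layers. dRVFL feeds each layer the previous hidden
    layer; edRVFL feeds it [X, previous hidden layer] (assumed)."""
    Hs, inp = [], X
    for W, b in params:
        H = np.maximum(0.0, inp @ W + b)
        Hs.append(H)
        inp = np.hstack([X, H]) if ensemble else H
    return Hs

def fit_deep_rvfl(X, y, n_hidden=50, n_layers=3, lam=1e-2, ensemble=True, seed=0):
    rng = np.random.default_rng(seed)
    params, d = [], X.shape[1]
    for _ in range(n_layers):  # hidden weights are drawn once and never trained
        params.append((rng.uniform(-1, 1, (d, n_hidden)), rng.uniform(0, 1, n_hidden)))
        d = X.shape[1] + n_hidden if ensemble else n_hidden
    Y = np.eye(y.max() + 1)[y]  # one-hot targets
    Hs = forward(X, params, ensemble)
    if ensemble:  # edRVFL: one output layer per hidden layer, on [X, H_l]
        betas = [ridge(np.hstack([X, H]), Y, lam) for H in Hs]
    else:         # dRVFL: one output layer on [X, H_1, ..., H_L]
        betas = [ridge(np.hstack([X] + Hs), Y, lam)]
    return params, betas, ensemble

def predict_deep_rvfl(model, X):
    params, betas, ensemble = model
    Hs = forward(X, params, ensemble)
    if ensemble:
        scores = np.mean([np.hstack([X, H]) @ B for H, B in zip(Hs, betas)], axis=0)
    else:
        scores = np.hstack([X] + Hs) @ betas[0]
    return scores.argmax(axis=1)
```

This shows why edRVFL needs only one training run: all ensemble members share the same random layers, and each member costs just one extra linear solve.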
TL;DR: In this article, a low-cost U-Net (LCU-Net) was proposed for the EM image segmentation task to assist microbiologists in detecting and identifying EMs more effectively.
Abstract: In this paper, we propose a novel Low-cost U-Net (LCU-Net) for the Environmental Microorganism (EM) image segmentation task to assist microbiologists in detecting and identifying EMs more effectively. The LCU-Net is an improved Convolutional Neural Network (CNN) based on U-Net, Inception, and concatenate operations. It addresses the limitation of single receptive field setting and the relatively high memory cost of U-Net. Experimental results show the effectiveness and potential of the proposed LCU-Net in the practical EM image segmentation field.
TL;DR: The experimental results show that the proposed complex-valued denoising CNN performs competitively against existing state-of-the-art real-valued denoising CNNs, with better robustness to possible inconsistencies of noise models between training samples and test images.
Abstract: While complex-valued transforms have been widely used in image processing and have deep connections to biological vision systems, complex-valued convolutional neural networks (CNNs) have seen little application in image recovery. This paper investigates the potential of complex-valued CNNs for image denoising. A CNN is developed for image denoising with its key mathematical operations defined in the complex number field to exploit the merits of complex-valued operations, including the compactness of convolution given by the tensor product of 1D complex-valued filters, the nonlinear activation on phase, and the noise robustness of residual blocks. The experimental results show that the proposed complex-valued denoising CNN performs competitively against existing state-of-the-art real-valued denoising CNNs, with better robustness to possible inconsistencies of noise models between training samples and test images. The results also suggest that complex-valued CNNs provide another promising deep-learning-based approach to image denoising and other image recovery tasks.
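The compactness argument can be illustrated directly: a 2D filter formed as the outer (tensor) product of two 1D complex filters has kh + kw parameters instead of kh * kw, and convolving with it equals two successive 1D convolutions. A NumPy sketch checking that equivalence on complex data; it does not model the paper's phase activation or residual blocks, and the function names are illustrative.

```python
import numpy as np

def conv2d_full(x, k):
    """Naive 'full' 2D convolution; works for complex arrays."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H + kh - 1, W + kw - 1), dtype=np.result_type(x, k))
    for i in range(kh):
        for j in range(kw):
            out[i:i + H, j:j + W] += k[i, j] * x   # out[m, n] += k[i, j] * x[m-i, n-j]
    return out

def separable_conv2d(x, col, row):
    """Convolution with the rank-1 kernel outer(col, row) as two 1D passes."""
    tmp = np.apply_along_axis(lambda r: np.convolve(r, row), 1, x)     # along width
    return np.apply_along_axis(lambda c: np.convolve(c, col), 0, tmp)  # along height
```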
TL;DR: An approximate approach, namely BLOCK-DBSCAN, is proposed for large-scale data; it runs in about O(n log n) expected time and obtains almost the same result as DBSCAN.
Abstract: We analyze the drawbacks of DBSCAN and its variants, and find that the grid technique used in Fast-DBSCAN and ρ-approximate DBSCAN is almost useless in high-dimensional data spaces, because it usually yields considerable redundant distance computations. To tame these problems, two techniques are proposed. The first uses an ε/2-norm ball to identify Inner Core Blocks, within which all points are core points; it is more efficient than the grid technique because it finds more core points at one time. The second is a fast approximate algorithm for judging whether two Inner Core Blocks are density-reachable from each other. In addition, a cover tree is used to accelerate density computations. Based on these three techniques, an approximate approach, namely BLOCK-DBSCAN, is proposed for large-scale data; it runs in about O(n log n) expected time and obtains almost the same result as DBSCAN. BLOCK-DBSCAN has two versions: the L2 version works well for relatively high-dimensional data, and the L∞ version is suitable for high-dimensional data. Experimental results show that BLOCK-DBSCAN is promising and outperforms NQDBSCAN, ρ-approximate DBSCAN and AnyDBC.
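The Inner Core Block test follows from the triangle inequality: any two points within ε/2 of a common center are within ε of each other, so if a ball of radius ε/2 holds at least MinPts points, every point in it is a core point without further neighborhood queries. A brute-force NumPy sketch of that check (L2 case only, no cover tree); the function names are hypothetical.

```python
import numpy as np

def inner_core_block(X, center_idx, eps, min_pts):
    """Indices of points within eps/2 of X[center_idx]. If there are at least
    min_pts of them, all of them are core points; otherwise return None."""
    d = np.linalg.norm(X - X[center_idx], axis=1)
    members = np.flatnonzero(d <= eps / 2)
    return members if len(members) >= min_pts else None

def is_core_bruteforce(X, eps, min_pts):
    """Reference check: a point is core if its eps-neighborhood (itself included)
    holds at least min_pts points."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return (D <= eps).sum(axis=1) >= min_pts
```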
TL;DR: The proposed CenterNet++ method achieves a remarkable accuracy improvement over the baseline CenterNet with a negligible increase in time cost; to alleviate the impact of complex backgrounds, a head enhancement module is proposed to balance foreground and background.
Abstract: Ship detection in SAR images is a challenging task due to two difficulties. (1) Because of the long observation distance, ships in SAR images are small and have low resolution, leading to a high false-negative rate. (2) Because of the complex onshore background, ships are easily confused with other objects of similar appearance. To solve these problems, we propose an effective and stable single-stage detector called CenterNet++. Our model mainly consists of three modules: a feature refinement module, a feature pyramids fusion module, and a head enhancement module. First, to address the small-object detection problem, we design a feature refinement module for extracting multi-scale contextual information. Second, the feature pyramids fusion module is developed to generate more powerful semantic information. Finally, to alleviate the impact of complex backgrounds, the head enhancement module is proposed to balance foreground and background. To prove the effectiveness and robustness of the proposed method, we conduct extensive experiments on three popular SAR image datasets: AIR-SARShip, SSDD, and SAR-Ship. The experimental results show that CenterNet++ reaches state-of-the-art performance on all datasets. In addition, compared with the baseline CenterNet, the proposed method achieves a remarkable accuracy improvement with a negligible increase in time cost.
TL;DR: This paper proposes a graph learning framework to preserve both the local and global structure of data that uses the self-expressiveness of samples to capture the global structure and an adaptive-neighbor approach to respect the local structure.
Abstract: Graphs have become increasingly popular in modeling structures and interactions in a wide variety of problems during the last decade. Graph-based clustering and semi-supervised classification techniques have shown impressive performance. This paper proposes a graph learning framework to preserve both the local and global structure of data. Specifically, our method uses the self-expressiveness of samples to capture the global structure and an adaptive-neighbor approach to respect the local structure. Furthermore, most existing graph-based methods conduct clustering and semi-supervised classification on a graph learned from the original data matrix, which does not have an explicit cluster structure, so they might not achieve optimal performance. By imposing a rank constraint, the learned graph will have exactly c connected components if there are c clusters or classes. As a byproduct of this, graph learning and label inference are jointly and iteratively implemented in a principled way. Theoretically, we show that our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions. Extensive experiments on clustering and semi-supervised classification demonstrate that the proposed method outperforms other state-of-the-art methods.
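The rank constraint rests on a standard spectral graph fact: the number of connected components of a graph with non-negative symmetric affinity S equals the multiplicity of the zero eigenvalue of its Laplacian L = D - S, so requiring rank(L) = n - c yields exactly c components. A NumPy sketch that counts components this way; the function name and tolerance are illustrative.

```python
import numpy as np

def n_connected_components(S, tol=1e-8):
    """Count connected components as the number of (numerically) zero
    eigenvalues of the Laplacian L = D - S of a symmetric affinity S."""
    L = np.diag(S.sum(axis=1)) - S
    eig = np.linalg.eigvalsh(L)   # real eigenvalues of the symmetric L, ascending
    return int(np.sum(eig < tol))
```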
TL;DR: This paper is the first attempt to make a comprehensive review of vision-based lane detection methods, including traditional lane detection methods and related deep learning methods, and points out some directions to be further explored in the future.
Abstract: Lane detection is an application of environmental perception that aims to detect lane areas or lane lines with a camera or lidar. In recent years, gratifying progress has been made in detection accuracy. To the best of our knowledge, this paper is the first attempt to make a comprehensive review of vision-based lane detection methods. First, we introduce the background of lane detection, including traditional lane detection methods and related deep learning methods. Second, we group the existing lane detection methods into two categories: two-step and one-step methods. Building on this categorization, we introduce lane detection methods from two perspectives: (1) network architectures, including classification and object detection-based methods, end-to-end image segmentation-based methods, and some optimization strategies; (2) related loss functions. For each method, its contributions and weaknesses are introduced. Then, a brief comparison of representative methods is presented. Finally, we conclude this survey with some current challenges, such as expensive computation and the lack of generalization, and we point out directions to be further explored in the future, such as semi-supervised learning, meta-learning and neural architecture search.
TL;DR: Wen et al. as discussed by the authors proposed MASTER, a self-attention based scene text recognizer that not only encodes the input-output attention but also learns selfattention which encodes feature-feature and target-target relationships inside the encoder and decoder and owns a great training efficiency because of high training parallelization and a high speed inference because of an efficient memory-cache mechanism.
Abstract: Attention-based scene text recognizers have gained huge success; they leverage a more compact intermediate representation to learn 1D or 2D attention with an RNN-based encoder-decoder architecture. However, such methods suffer from the attention-drift problem, because high similarity among encoded features leads to attention confusion under the RNN-based local attention mechanism. Moreover, RNN-based methods have low efficiency due to poor parallelization. To overcome these problems, we propose MASTER, a self-attention based scene text recognizer that (1) not only encodes the input-output attention but also learns self-attention, which encodes feature-feature and target-target relationships inside the encoder and decoder; (2) learns an intermediate representation that is more powerful and more robust to spatial distortion; and (3) achieves high training efficiency thanks to high training parallelization, and high-speed inference thanks to an efficient memory-cache mechanism. Extensive experiments on various benchmarks demonstrate the superior performance of MASTER on both regular and irregular scene text. Code is available in PyTorch (https://github.com/wenwenyu/MASTER-pytorch) and TensorFlow (https://github.com/jiangxiluning/MASTER-TF).
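The abstract credits MASTER's parallel training to self-attention and its fast inference to a memory-cache mechanism. The sketch below is a generic single-head causal self-attention with a key/value cache, which is one common way such a cache is built; it is an assumption, not MASTER's published code (the linked repositories hold that). `causal_self_attention` and `KVCache` are hypothetical names. The check shows that decoding step by step with cached keys and values reproduces the full parallel computation, so past steps never need to be recomputed at inference time.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the whole sequence, causally masked."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ V

class KVCache:
    """Hypothetical decoding cache: keys/values of past steps are stored, not recomputed."""
    def __init__(self, Wq, Wk, Wv):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv
        self.keys, self.values = [], []

    def step(self, x):
        """Attend from the newest token x to every cached token and itself."""
        self.keys.append(x @ self.Wk)
        self.values.append(x @ self.Wv)
        q = x @ self.Wq
        K, V = np.stack(self.keys), np.stack(self.values)
        return softmax(q @ K.T / np.sqrt(q.shape[-1])) @ V

rng = np.random.default_rng(0)
T, d_model, d_k = 4, 8, 8
X = rng.normal(size=(T, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))

full = causal_self_attention(X, Wq, Wk, Wv)
cache = KVCache(Wq, Wk, Wv)
incremental = np.stack([cache.step(X[t]) for t in range(T)])
print(np.allclose(full, incremental))  # True
```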
TL;DR: TextMountain predicts text center-border probability (TCBP) and text center-direction (TCD); the mountaintop of the TCBP map separates text instances, which cannot easily be achieved with a semantic segmentation map, and its rising direction plans a path to the top for each pixel at the mountain foot in the grouping stage.
Abstract: In this paper, we propose a novel scene text detection method named TextMountain. The key idea of TextMountain is to make full use of border-center information. Unlike previous works that treat center-border as a binary classification problem, we predict text center-border probability (TCBP) and text center-direction (TCD). The TCBP is like a mountain whose top is the text center and whose foot is the text border. The mountaintop can separate text instances, which cannot easily be achieved with a semantic segmentation map, and its rising direction can plan a path to the top for each pixel at the mountain foot in the grouping stage. The TCD helps the TCBP learn better. Our labeling rules do not introduce ambiguity under angle transformations, so the proposed method is robust to multi-oriented text and also handles curved text well. In the inference stage, each pixel at the mountain foot searches for a path to the mountaintop; this search can be completed efficiently in parallel, which makes our method efficient compared with others. Experiments on the MLT, ICDAR2015, RCTW-17 and SCUT-CTW1500 datasets demonstrate that the proposed method achieves better or comparable performance in terms of both accuracy and efficiency. Notably, our method achieves an F-measure of 76.85% on MLT, outperforming previous methods by a large margin. Code will be made available.
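The grouping idea, where every text pixel climbs to its mountaintop and pixels sharing a top form one instance, can be sketched as follows. This is a minimal sequential version under an explicit substitution: the path is found by steepest ascent over the TCBP map among 8-neighbours, whereas the paper uses the predicted TCD to choose the direction and runs the search in parallel. `climb_to_peak`, the 0.1 threshold and the toy map are hypothetical.

```python
import numpy as np

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def climb_to_peak(tcbp, mask):
    """Label each masked pixel by the local maximum of `tcbp` reached via steepest ascent."""
    H, W = tcbp.shape
    labels = -np.ones((H, W), dtype=int)  # -1 marks background
    peak_ids = {}
    for y in range(H):
        for x in range(W):
            if not mask[y, x]:
                continue
            cy, cx = y, x
            while True:
                best = (cy, cx)
                for dy, dx in OFFSETS:
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < H and 0 <= nx < W and mask[ny, nx] and tcbp[ny, nx] > tcbp[best]:
                        best = (ny, nx)
                if best == (cy, cx):
                    break  # no higher neighbour: this is the mountaintop
                cy, cx = best
            labels[y, x] = peak_ids.setdefault((cy, cx), len(peak_ids))
    return labels

# Two text instances separated by a background column (index 5).
tcbp = np.array([
    [0.2, 0.3, 0.4, 0.3, 0.2, 0.0, 0.3, 0.4, 0.2],
    [0.3, 0.5, 0.9, 0.5, 0.3, 0.0, 0.4, 0.8, 0.3],
    [0.2, 0.3, 0.4, 0.3, 0.2, 0.0, 0.2, 0.3, 0.2],
])
mask = tcbp > 0.1
labels = climb_to_peak(tcbp, mask)
print(labels)  # columns 0-4 -> 0, column 5 -> -1, columns 6-8 -> 1
```

Strict ascent guarantees each climb terminates, but pixels on a flat plateau would stop where they are, so this sketch assumes every instance has a unique peak.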
TL;DR: An overview of the existing strategies proposed for MVS is presented, including their advantages and drawbacks, and the generic steps in MVS, such as the pre-processing of video data, feature extraction, and post-processing followed by summary generation, are described.
Abstract: There has been exponential growth in the amount of visual data acquired daily from single- or multi-view surveillance camera networks. This massive amount of data requires efficient mechanisms such as video summarization to ensure that only significant data are reported and redundancy is reduced. Multi-view video summarization (MVS) is a less redundant and more concise way of presenting information from the video content of all the cameras in the form of either keyframes or video segments. This paper presents an overview of the existing strategies proposed for MVS, including their advantages and drawbacks. Our survey covers the generic steps in MVS, such as the pre-processing of video data, feature extraction, and post-processing followed by summary generation. We also describe the datasets that are available for the evaluation of MVS. Finally, we examine the major current issues related to MVS and put forward recommendations for future research.
TL;DR: In this paper, the authors discuss how COVID-19 may interact with the peripheral nervous system to cause pain in the early and late stages of the disease and the implications of this potential neurotropism.
Abstract: SARS-CoV-2 is a novel coronavirus that infects cells through the angiotensin-converting enzyme 2 receptor, aided by proteases that prime the spike protein of the virus to enhance cellular entry. Neuropilin 1 and 2 (NRP1 and NRP2) act as additional viral entry factors. SARS-CoV-2 infection causes COVID-19 disease. There is now strong evidence for neurological impacts of COVID-19, with pain as an important symptom, both in the acute phase of the disease and at later stages that are colloquially referred to as "long COVID." In this narrative review, we discuss how COVID-19 may interact with the peripheral nervous system to cause pain in the early and late stages of the disease. We begin with a review of the state of the science on how viruses cause pain through direct and indirect interactions with nociceptors. We then cover what we currently know about how the unique cytokine profiles of moderate and severe COVID-19 may drive plasticity in nociceptors to promote pain and worsen existing pain states. Finally, we review evidence for direct infection of nociceptors by SARS-CoV-2 and the implications of this potential neurotropism. The state of the science points to multiple potential mechanisms through which COVID-19 could induce changes in nociceptor excitability that would be expected to promote pain, induce neuropathies, and worsen existing pain states.
TL;DR: A new change detection method based on similarity measurement between heterogeneous images is proposed, which avoids the leakage of heterogeneous data and yields more robust change detection results.
Abstract: Change detection in heterogeneous remote sensing images is an important and challenging topic with a wide range of applications, especially in emergencies resulting from natural disasters. However, differences in the imaging mechanisms of heterogeneous sensors make a direct comparison of the images difficult. In this paper, we propose a new change detection method based on similarity measurement between heterogeneous images. The method constructs a graph for each patch based on nonlocal patch similarity to establish a connection between heterogeneous data, and then measures the change level by how much the graph structure of one image still conforms to that of the other image. Because the graph structures are compared in the same domain, the method avoids the leakage of heterogeneous data and yields more robust change detection results. Experiments demonstrate the effectiveness of the proposed nonlocal patch similarity based heterogeneous change detection method.
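The core measurement can be sketched under simplifying assumptions: patches are already flattened into vectors, similarity is plain Euclidean distance, and only the X-to-Y direction is computed. `knn_graph`, `change_score` and the toy data are hypothetical, and the paper's graph construction and final fusion may differ. What the sketch does show is that values from the two sensors are never compared with each other: the neighbour graph comes from image X, and distances are evaluated only within image Y.

```python
import numpy as np

def knn_graph(P, k):
    """For each patch vector (row of P), indices of its k most similar patches, excluding itself."""
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1, kind="stable")[:, :k]

def change_score(PX, PY, k):
    """Change level of each patch: mean distance, measured inside image Y,
    to the neighbours that the same patch had in image X."""
    nbrs = knn_graph(PX, k)            # graph built only from X
    diffs = PY[:, None, :] - PY[nbrs]  # compared only within Y, shape (N, k, D)
    return np.linalg.norm(diffs, axis=-1).mean(axis=1)

# Toy 2-D "patch vectors": two land-cover types in image X.
PX = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
               [1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1]])
PY = 10.0 - 3.0 * PX   # a different sensor: unchanged patches keep their similarity structure
PY[0] = [6.85, 6.85]   # patch 0 changed: in Y it now resembles the second type
scores = change_score(PX, PY, k=2)
print(scores.argmax())  # 0
```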
TL;DR: In this article, a systematic literature search was performed without date limitation in the MEDLINE, Cochrane Library, and Medic databases; specific inclusion criteria were applied, and risk factors present before the onset of chronic symptoms were searched for.
Abstract: Low back pain is the leading cause of years lived with disability. Most people with acute low back pain improve rapidly, but 4% to 25% of patients become chronic. Since the previous systematic reviews on the subject, a large number of new studies have been conducted. The objective of this article was to review the evidence on the prognostic factors behind nonspecific chronic low back pain. A systematic literature search was performed without date limitation in the MEDLINE, Cochrane Library, and Medic databases. Specific inclusion criteria were used, and risk factors present before the onset of chronic symptoms were searched for. Study quality was assessed by 2 independent reviewers. One hundred eleven full articles were read for potential inclusion, and 25 articles met all the inclusion criteria. One study was rated as good quality, 19 studies were rated as fair quality, and 5 articles were rated as poor quality. Higher pain intensity, higher body weight, carrying heavy loads at work, difficult working positions, and depression were the most frequently observed risk factors for chronic low back pain. Maladaptive behavior strategies, general anxiety, functional limitation during the episode, smoking, and particularly physical work were also explicitly predictive of chronicity. According to this systematic review, several prognostic factors from the biomechanical, psychological, and psychosocial points of view are significant for chronicity in low back pain.
TL;DR: A novel semantic segmentation approach based on shared pyramidal representation and fusion of heterogeneous features along the upsampling path is proposed, which is especially effective for dense inference in images with large scale variance due to strong regularization effects induced by feature sharing across the resolution pyramid.
Abstract: Emergence of large datasets and resilience of convolutional models have enabled successful training of very large semantic segmentation models. However, high capacity implies high computational complexity and therefore hinders real-time operation. We therefore study compact architectures which aim at high accuracy in spite of modest capacity. We propose a novel semantic segmentation approach based on shared pyramidal representation and fusion of heterogeneous features along the upsampling path. The proposed pyramidal fusion approach is especially effective for dense inference in images with large scale variance due to strong regularization effects induced by feature sharing across the resolution pyramid. Interpretation of the decision process suggests that our approach succeeds by acting as a large ensemble of relatively simple models, as well as due to large receptive range and strong gradient flow towards early layers. Our best model achieves 76.4% mIoU on Cityscapes test and runs in real time on low-power embedded devices.
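The notion of a shared pyramidal representation, where one set of weights processes every level of an image pyramid and the resulting features are fused at a common resolution, can be sketched as follows. Everything concrete here is a simplifying assumption standing in for the paper's convolutional backbone and its fusion along the upsampling path: the per-pixel linear-plus-ReLU `shared_encoder`, 2x2 average pooling for the pyramid, nearest-neighbour upsampling, and fusion by summation.

```python
import numpy as np

def downsample2(img):
    """Halve an (H, W, C) array with 2x2 average pooling (H and W assumed even)."""
    H, W, C = img.shape
    return img.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

def upsample(feat, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return feat.repeat(factor, axis=0).repeat(factor, axis=1)

def shared_encoder(img, weights):
    """Hypothetical stand-in for the backbone: per-pixel linear map followed by ReLU."""
    return np.maximum(img @ weights, 0.0)

def pyramidal_fusion(img, weights, levels=3):
    """Apply the SAME encoder weights to every pyramid level, then sum at full resolution."""
    fused = np.zeros(img.shape[:2] + (weights.shape[1],))
    level = img
    for s in range(levels):
        fused += upsample(shared_encoder(level, weights), 2 ** s)
        if s + 1 < levels:
            level = downsample2(level)
    return fused

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8, 3))
weights = rng.normal(size=(3, 16))
fused = pyramidal_fusion(img, weights)
print(fused.shape)  # (8, 8, 16)
```

Because the weights are shared, the parameter count of this sketch does not grow with the number of pyramid levels; only the compute does.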
TL;DR: This work proposes to leverage (i) medical knowledge, in particular the taxonomic organization of skin lesions, to develop a hierarchical neural network, and (ii) recent advances in channel and spatial attention modules, which can identify interpretable features and regions in dermoscopy images.
Abstract: Deep neural networks have rapidly become an indispensable tool in many classification applications. However, the inclusion of deep learning methods in medical diagnostic systems has come at the cost of diminishing their explainability. This significantly reduces the safety of a diagnostic system, since the physician is unable to interpret and validate the output. Therefore, in this work we aim to address this major limitation and improve the explainability of a skin cancer diagnostic system. We propose to leverage two sources of information: (i) medical knowledge, in particular the taxonomic organization of skin lesions, which will be used to develop a hierarchical neural network; and (ii) recent advances in channel and spatial attention modules, which can identify interpretable features and regions in dermoscopy images. We demonstrate that the proposed approach achieves competitive results in two dermoscopy data sets (ISIC 2017 and 2018) and provides insightful information about its decisions, thus increasing the safety of the model.
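The abstract does not specify its attention modules; a common pair is CBAM-style channel and spatial attention, and the sketch below assumes that family. Channel weights indicate which feature maps matter, and the spatial map indicates where in the image the features are emphasized, which is the kind of signal the abstract describes as interpretable. The weights are random, the spatial branch is simplified (a 1x1 mix instead of CBAM's 7x7 convolution), and nothing here models the paper's hierarchical, taxonomy-based classifier.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W1, W2):
    """Reweight channels of F (C, H, W) using a shared MLP on average- and max-pooled descriptors."""
    def mlp(v):
        return W2 @ np.maximum(W1 @ v, 0.0)
    weights = sigmoid(mlp(F.mean(axis=(1, 2))) + mlp(F.max(axis=(1, 2))))  # shape (C,)
    return F * weights[:, None, None], weights

def spatial_attention(F, w_avg, w_max, bias=0.0):
    """Reweight locations using channel-average and channel-max maps (simplified 1x1 mix)."""
    amap = sigmoid(w_avg * F.mean(axis=0) + w_max * F.max(axis=0) + bias)  # shape (H, W)
    return F * amap[None, :, :], amap

rng = np.random.default_rng(0)
F = rng.normal(size=(4, 5, 5))  # C=4 feature maps of size 5x5
W1, W2 = rng.normal(size=(2, 4)), rng.normal(size=(4, 2))  # reduction ratio 2
out_c, ch_w = channel_attention(F, W1, W2)
out_s, amap = spatial_attention(out_c, w_avg=1.0, w_max=1.0)
print(ch_w.round(3), amap.shape)
```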
TL;DR: A scale variance minimization (SVMin) method is proposed that introduces supervision in the target domain by imposing a scale-invariance constraint while learning to segment an image and its scale transformation concurrently, achieving superior domain adaptive segmentation performance compared with the state of the art.
Abstract: We focus on unsupervised domain adaptation (UDA) in image segmentation. Existing works address this challenge largely by aligning inter-domain representations, which may lead to over-alignment that impairs the semantic structures of images and, in turn, target-domain segmentation performance. We design a scale variance minimization (SVMin) method that enforces intra-image semantic structure consistency in the target domain. Specifically, SVMin leverages an intrinsic property: simple scale transformations have little effect on the semantic structures of images. It thus introduces supervision in the target domain by imposing a scale-invariance constraint while learning to segment an image and its scale transformation concurrently. Additionally, SVMin is complementary to most existing UDA techniques and can be easily incorporated, yielding consistent performance gains with few extra parameters. Extensive experiments show that our method achieves superior domain adaptive segmentation performance compared with the state of the art. Preliminary studies show that SVMin can be easily adapted to UDA-based image classification.
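A scale-invariance constraint of this kind can be written as a consistency loss on unlabeled target images: segment the image, segment a rescaled copy, and penalize disagreement once both are at the same resolution. The sketch below is minimal and rests on stated assumptions: a toy per-pixel `toy_segmenter` instead of a real network, 2x2 average pooling as the only scale transformation, and a plain L1 penalty. SVMin's actual loss and transformations may differ.

```python
import numpy as np

def avg_pool2(a):
    """Downscale an (H, W, C) array by 2 with 2x2 average pooling (H and W assumed even)."""
    H, W, C = a.shape
    return a.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

def toy_segmenter(img, weights):
    """Hypothetical per-pixel classifier: linear map to class logits, then softmax."""
    logits = img @ weights  # (H, W, n_classes)
    logits = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)

def scale_consistency_loss(img, weights):
    """Mean L1 gap between the downscaled prediction of the image and the prediction
    on the downscaled image; 0 when the segmenter is perfectly scale-consistent."""
    pred_then_scale = avg_pool2(toy_segmenter(img, weights))
    scale_then_pred = toy_segmenter(avg_pool2(img), weights)
    return float(np.abs(pred_then_scale - scale_then_pred).mean())

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8, 3))   # unlabeled target-domain image (toy)
weights = rng.normal(size=(3, 4))  # 4 classes
print(scale_consistency_loss(img, weights))                 # > 0 for this toy model
print(scale_consistency_loss(np.ones((8, 8, 3)), weights))  # ~0 for a constant image
```

In training, such a term would presumably be added to the usual source-domain segmentation loss and minimized with respect to the network weights; that combination is an assumption, not taken from the paper.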