
Showing papers on "Feature vector" published in 2021


Proceedings ArticleDOI
20 Jun 2021
TL;DR: CoordAttention, as presented in this paper, embeds positional information into channel attention so that long-range dependencies are captured along one spatial direction while precise positional information is preserved along the other.
Abstract: Recent studies on mobile network design have demonstrated the remarkable effectiveness of channel attention (e.g., the Squeeze-and-Excitation attention) for lifting model performance, but they generally neglect the positional information, which is important for generating spatially selective attention maps. In this paper, we propose a novel attention mechanism for mobile networks by embedding positional information into channel attention, which we call "coordinate attention". Unlike channel attention that transforms a feature tensor to a single feature vector via 2D global pooling, the coordinate attention factorizes channel attention into two 1D feature encoding processes that aggregate features along the two spatial directions, respectively. In this way, long-range dependencies can be captured along one spatial direction and meanwhile precise positional information can be preserved along the other spatial direction. The resulting feature maps are then encoded separately into a pair of direction-aware and position-sensitive attention maps that can be complementarily applied to the input feature map to augment the representations of the objects of interest. Our coordinate attention is simple and can be flexibly plugged into classic mobile networks, such as MobileNetV2, MobileNeXt, and EfficientNet with nearly no computational overhead. Extensive experiments demonstrate that our coordinate attention is not only beneficial to ImageNet classification but more interestingly, behaves better in down-stream tasks, such as object detection and semantic segmentation. Code is available at https://github.com/Andrew-Qibin/CoordAttention.
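
The two-directional pooling is compact enough to sketch. Below is a minimal PyTorch rendition of the idea, not the authors' released implementation: the module name, the reduction ratio r, and the plain ReLU (the paper uses a hard-swish-style nonlinearity) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Minimal sketch of coordinate attention (illustrative, not the released code)."""
    def __init__(self, channels, r=32):
        super().__init__()
        mid = max(8, channels // r)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)   # paper uses a hard-swish variant
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        # 1D pooling along each spatial direction instead of 2D global pooling
        x_h = x.mean(dim=3, keepdim=True)                       # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (n, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                       # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))   # (n, c, 1, w)
        return x * a_h * a_w    # direction-aware, position-sensitive attention

att = CoordAttention(64)
out = att(torch.randn(2, 64, 32, 32))
```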

1,372 citations


Proceedings ArticleDOI
26 Jan 2021
TL;DR: ProDA, as presented in this paper, aligns the prototypical assignments based on relative feature distances for two different views of the same target, producing a more compact target feature space; distilling the already learned knowledge to a self-supervised pretrained model further boosts performance.
Abstract: Self-training is a competitive approach in domain adaptive segmentation, which trains the network with pseudo labels on the target domain. However, the pseudo labels are inevitably noisy and the target features are dispersed due to the discrepancy between the source and target domains. In this paper, we rely on representative prototypes, the feature centroids of classes, to address these two issues for unsupervised domain adaptation. In particular, we take one step further and exploit the feature distances from prototypes, which provide richer information than the prototypes alone. Specifically, we use these distances to estimate the likelihood of pseudo labels to facilitate online correction in the course of training. Meanwhile, we align the prototypical assignments based on relative feature distances for two different views of the same target, producing a more compact target feature space. Moreover, we find that distilling the already learned knowledge to a self-supervised pretrained model further boosts the performance. Our method shows tremendous performance advantage over state-of-the-art methods. The code is available at https://github.com/microsoft/ProDA.
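
The distance-to-prototype weighting at the heart of the online correction can be illustrated with a small helper. This is a generic sketch, not ProDA's exact estimator; the function name and the temperature tau are assumptions.

```python
import torch

def pseudo_label_weights(feats, prototypes, tau=1.0):
    """Soft likelihood of each class pseudo-label from relative
    feature-to-prototype distances (hypothetical helper)."""
    d = torch.cdist(feats, prototypes)       # (N, K) distances to class centroids
    return torch.softmax(-d / tau, dim=1)    # closer prototype -> higher weight

# toy usage: 4 target features, 3 class prototypes, both 8-D
w = pseudo_label_weights(torch.randn(4, 8), torch.randn(3, 8))
```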

272 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: Patch-NetVLAD as discussed by the authors combines the advantages of both local and global descriptor methods by deriving patch-level features from NetVLAD residuals, which enables aggregation and matching of deep-learned local features defined over the feature-space grid.
Abstract: Visual Place Recognition is a challenging task for robotics and autonomous systems, which must deal with the twin problems of appearance and viewpoint change in an always changing world. This paper introduces Patch-NetVLAD, which provides a novel formulation for combining the advantages of both local and global descriptor methods by deriving patch-level features from NetVLAD residuals. Unlike the fixed spatial neighborhood regime of existing local keypoint features, our method enables aggregation and matching of deep-learned local features defined over the feature-space grid. We further introduce a multi-scale fusion of patch features that have complementary scales (i.e. patch sizes) via an integral feature space and show that the fused features are highly invariant to both condition (season, structure, and illumination) and viewpoint (translation and rotation) changes. Patch-NetVLAD achieves state-of-the-art visual place recognition results in computationally limited scenarios, validated on a range of challenging real-world datasets, including winning the Facebook Mapillary Visual Place Recognition Challenge at ECCV2020. It is also adaptable to user requirements, with a speed-optimised version operating over an order of magnitude faster than the state-of-the-art. By combining superior performance with improved computational efficiency in a configurable framework, Patch-NetVLAD is well suited to enhance both stand-alone place recognition capabilities and the overall performance of SLAM systems.
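
A simplified way to see the patch-level idea: pool a dense per-cell descriptor grid at several complementary patch sizes and L2-normalise each patch descriptor. The real method aggregates NetVLAD residuals through an integral feature space; the plain average pooling and the patch sizes below are stand-in assumptions.

```python
import torch
import torch.nn.functional as F

def patch_descriptors(grid, patch_sizes=(2, 5, 8), stride=1):
    """Pool a dense descriptor grid (1, D, H, W) into L2-normalised
    patch-level descriptors at several complementary patch sizes."""
    out = []
    for p in patch_sizes:
        pooled = F.avg_pool2d(grid, kernel_size=p, stride=stride)  # (1, D, H', W')
        flat = pooled.flatten(2).squeeze(0).t()                    # (H'*W', D)
        out.append(F.normalize(flat, dim=1))
    return out

descs = patch_descriptors(torch.randn(1, 128, 30, 40))
```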

199 citations


Journal ArticleDOI
TL;DR: This article proposes an intelligent fault diagnosis method based on an improved domain adaptation method and shows that the proposed method is effective and applicable in diagnosing faults with domain mismatch.
Abstract: Nowadays, the industrial Internet of Things (IIoT) has been successfully utilized in smart manufacturing. The massive amount of data in IIoT promotes the development of deep learning-based health monitoring for industrial equipment. Since monitoring data for mechanical fault diagnosis collected under different working conditions or on different equipment exhibit domain mismatch, models trained with training data may not work in practical applications. Therefore, it is essential to study fault diagnosis methods with domain adaptation ability. In this article, we propose an intelligent fault diagnosis method based on an improved domain adaptation method. Specifically, two feature extractors concerning feature space distance and domain mismatch are trained using maximum mean discrepancy and domain adversarial training, respectively, to enhance feature representation. Since separate classifiers are trained for the two feature extractors, ensemble learning is further utilized to obtain the final results. Experimental results indicate that the proposed method is effective and applicable in diagnosing faults with domain mismatch.
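
The maximum mean discrepancy term used to train one of the feature extractors has a standard closed-form estimator. The sketch below uses a single-bandwidth Gaussian kernel; the bandwidth sigma and batch shapes are assumptions, not the paper's training code.

```python
import torch

def mmd_rbf(xs, xt, sigma=1.0):
    """Squared maximum mean discrepancy between source and target feature
    batches under a Gaussian kernel (a standard estimator)."""
    k = lambda a, b: torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(xs, xs).mean() + k(xt, xt).mean() - 2 * k(xs, xt).mean()

loss = mmd_rbf(torch.randn(32, 64), torch.randn(32, 64))
```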

183 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: This paper proposes superpixel-guided clustering (SGC) and guided prototype allocation (GPA) modules for multiple prototype extraction and allocation: SGC extracts more representative prototypes by aggregating similar feature vectors, while GPA selects matched prototypes to provide more accurate guidance.
Abstract: Prototype learning is extensively used for few-shot segmentation. Typically, a single prototype is obtained from the support feature by averaging the global object information. However, using one prototype to represent all the information may lead to ambiguities. In this paper, we propose two novel modules, named superpixel-guided clustering (SGC) and guided prototype allocation (GPA), for multiple prototype extraction and allocation. Specifically, SGC is a parameter-free and training-free approach, which extracts more representative prototypes by aggregating similar feature vectors, while GPA is able to select matched prototypes to provide more accurate guidance. By integrating the SGC and GPA together, we propose the Adaptive Superpixel-guided Network (ASGNet), which is a lightweight model and adapts to object scale and shape variation. In addition, our network can easily generalize to k-shot segmentation with substantial improvement and no additional computational cost. In particular, our evaluations on COCO demonstrate that ASGNet surpasses the state-of-the-art method by 5% in 5-shot segmentation.
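
The GPA step can be sketched as a nearest-prototype lookup per query position. This is an illustrative reduction of the module, with hypothetical names and cosine similarity as the matching score:

```python
import torch
import torch.nn.functional as F

def allocate_prototypes(query_feat, prototypes):
    """For every query position, pick the most similar support prototype
    by cosine similarity and tile it back into a guidance map."""
    d, h, w = query_feat.shape
    q = F.normalize(query_feat.flatten(1).t(), dim=1)   # (H*W, D)
    p = F.normalize(prototypes, dim=1)                  # (K, D)
    idx = (q @ p.t()).argmax(dim=1)                     # best prototype per pixel
    return prototypes[idx].t().reshape(d, h, w)         # guidance feature map

guide = allocate_prototypes(torch.randn(64, 13, 13), torch.randn(5, 64))
```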

172 citations


Journal ArticleDOI
TL;DR: A deep translation based change detection network (DTCDN) for optical and SAR images is proposed that utilizes deep context features to separate the unchanged pixels and changed pixels in a supervised CD network.
Abstract: With the development of space-based imaging technology, an ever larger number of images with different modalities and resolutions has become available. Optical images reflect the abundant spectral information and geometric shape of ground objects, but their quality degrades easily in poor atmospheric conditions. Although synthetic aperture radar (SAR) images cannot provide the spectral features of the region of interest (ROI), they can capture all-weather and all-time polarization information. By nature, optical and SAR images encapsulate a wealth of complementary information, which is of great significance for change detection (CD) in poor weather situations. However, due to the difference in the imaging mechanisms of optical and SAR images, it is difficult to conduct CD directly using traditional difference or ratio algorithms. Most recent CD methods introduce image translation to reduce the difference between modalities, but the results are obtained by ordinary algebraic methods and threshold segmentation with limited accuracy. Towards this end, this work proposes a deep translation based change detection network (DTCDN) for optical and SAR images. The deep translation first maps images from one domain (e.g., optical) to another domain (e.g., SAR) through a cyclic structure into the same feature space. With similar characteristics after deep translation, they become comparable. Different from most previous research, the translation results are fed to a supervised CD network that utilizes deep context features to separate unchanged and changed pixels. In the experiments, the proposed DTCDN was tested on four representative datasets from Gloucester, California, and Shuguang village. Compared with state-of-the-art methods, the effectiveness and robustness of the proposed method were confirmed.

166 citations


Journal ArticleDOI
TL;DR: A new feature selection method based on the Dempster–Shafer theory is proposed, which takes into consideration the distribution of features and results in a significant increase in the performance of MI-based BCI systems.
Abstract: The common spatial pattern (CSP) algorithm is a well-recognized spatial filtering method for feature extraction in motor imagery (MI)-based brain–computer interfaces (BCIs). However, due to the influence of nonstationarity in electroencephalography (EEG) and inherent defects of the CSP objective function, the spatial filters and their corresponding features are not necessarily optimal in the feature space used within CSP. In this work, we design a new feature selection method to address this issue by selecting features based on an improved objective function. In particular, improvements are made in suppressing outliers and discovering features with larger interclass distances. Moreover, a fusion algorithm based on the Dempster–Shafer theory is proposed, which takes into consideration the distribution of features. With two competition datasets, we first evaluate the performance of the improved objective functions in terms of classification accuracy, feature distribution, and embeddability. Then, a comparison with other feature selection methods is carried out in both accuracy and computational time. Experimental results show that the proposed methods consume little additional computational cost and result in a significant increase in the performance of MI-based BCI systems.
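
For context, the CSP filters the paper starts from are the solution of a generalized eigenvalue problem over the two class covariance matrices. A minimal NumPy/SciPy sketch of that baseline (the selection and fusion improvements proposed in the paper are not shown):

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(cov_a, cov_b, m=3):
    """Classic CSP: solve cov_a w = lambda (cov_a + cov_b) w and keep the m
    eigenvectors from each end, which maximise variance for one class while
    minimising it for the other."""
    vals, vecs = eigh(cov_a, cov_a + cov_b)
    order = np.argsort(vals)
    return np.hstack([vecs[:, order[:m]], vecs[:, order[-m:]]])

# toy usage with random symmetric positive-definite covariance matrices
a, b = np.random.randn(8, 8), np.random.randn(8, 8)
W = csp_filters(a @ a.T + 1e-3 * np.eye(8), b @ b.T + 1e-3 * np.eye(8))
```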

143 citations


Journal ArticleDOI
TL;DR: A three-stage SSL approach using data augmentation (DA) and metric learning is proposed for intelligent bearing fault diagnosis under limited labeled data; experiments demonstrate that the proposed method performs better in bearing fault diagnosis with limited labeled samples than existing diagnostic methods.

141 citations


Journal ArticleDOI
TL;DR: In this paper, the authors construct a classifier for quantum machine learning and show that no classical learner can classify the data inverse-polynomially better than random guessing, assuming the widely believed hardness of the discrete logarithm problem.
Abstract: Recently, several quantum machine learning algorithms have been proposed that may offer quantum speed-ups over their classical counterparts. Most of these algorithms are either heuristic or assume that data can be accessed quantum-mechanically, making it unclear whether a quantum advantage can be proven without resorting to strong assumptions. Here we construct a classification problem with which we can rigorously show that heuristic quantum kernel methods can provide an end-to-end quantum speed-up with only classical access to data. To prove the quantum speed-up, we construct a family of datasets and show that no classical learner can classify the data inverse-polynomially better than random guessing, assuming the widely believed hardness of the discrete logarithm problem. Furthermore, we construct a family of parameterized unitary circuits, which can be efficiently implemented on a fault-tolerant quantum computer, and use them to map the data samples to a quantum feature space and estimate the kernel entries. The resulting quantum classifier achieves high accuracy and is robust against additive errors in the kernel entries that arise from finite sampling statistics. Many quantum machine learning algorithms have been proposed, but it is typically unknown whether they would outperform classical methods on practical devices. A specially constructed algorithm shows that a formal quantum advantage is possible.
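
On the classical side, once the kernel entries have been estimated from circuit measurements, the classifier is an ordinary kernel machine with a precomputed Gram matrix. The sketch below substitutes a classically computed stand-in kernel for the quantum estimates:

```python
import numpy as np
from sklearn.svm import SVC

# Suppose K_train (n x n) held kernel entries estimated by repeatedly running
# the parameterised circuits; the classical post-processing is an ordinary
# kernel machine. Here a Gaussian kernel stands in for those estimates.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
K_train = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1) ** 2)
y = rng.integers(0, 2, size=40)
clf = SVC(kernel="precomputed").fit(K_train, y)
pred = clf.predict(K_train)   # at test time, pass K(test, train) instead
```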

136 citations


Journal ArticleDOI
TL;DR: This paper proposes an additive angular margin loss (ArcFace), which not only has a clear geometric interpretation but also significantly enhances the discriminative power.
Abstract: Recently, a popular line of research in face recognition is adopting margins in the well-established softmax loss function to maximize class separability. In this paper, we first introduce an Additive Angular Margin Loss (ArcFace), which not only has a clear geometric interpretation but also significantly enhances the discriminative power. Since ArcFace is susceptible to massive label noise, we further propose sub-center ArcFace, in which each class contains K sub-centers and training samples only need to be close to any of the K positive sub-centers. Sub-center ArcFace encourages one dominant sub-class that contains the majority of clean faces and non-dominant sub-classes that include hard or noisy faces. Based on this self-propelled isolation, we boost the performance through automatically purifying raw web faces under massive real-world noise. Besides discriminative feature embedding, we also explore the inverse problem, mapping feature vectors to face images. Without training any additional generator or discriminator, the pre-trained ArcFace model can generate identity-preserved face images for subjects both inside and outside the training data, using only the network gradient and Batch Normalization (BN) priors. Extensive experiments demonstrate that ArcFace can enhance the discriminative feature embedding as well as strengthen the generative face synthesis.
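
The additive angular margin itself is a one-line change to the softmax logits. A minimal sketch, assuming scale s=64 and margin m=0.5 as in the paper's usual settings; the released code adds further numerical-stability details:

```python
import torch
import torch.nn.functional as F

def arcface_logits(emb, weight, labels, s=64.0, m=0.5):
    """Additive angular margin: add m to the angle between an embedding and
    its ground-truth class centre before rescaling by s."""
    cos = F.normalize(emb) @ F.normalize(weight).t()     # cos(theta)
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, weight.size(0)).bool()
    logits = torch.where(target, torch.cos(theta + m), cos) * s
    return logits                                        # feed into cross-entropy

logits = arcface_logits(torch.randn(8, 512), torch.randn(10, 512),
                        torch.randint(0, 10, (8,)))
```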

136 citations


Journal ArticleDOI
TL;DR: In this article, the authors present a review of state-of-the-art DL-based approaches for clustering analysis that are based on representation learning, which they hope to be useful for bioinformatics research.
Abstract: Clustering is central to much data-driven bioinformatics research and serves as a powerful computational method. In particular, clustering helps in analyzing unstructured and high-dimensional data in the form of sequences, expressions, texts and images. Further, clustering is used to gain insights into biological processes at the genomics level; e.g., clustering of gene expressions provides insights into the natural structure inherent in the data, understanding gene functions, cellular processes, subtypes of cells and gene regulation. Subsequently, clustering approaches, including hierarchical, centroid-based, distribution-based, density-based and self-organizing maps, have long been studied and used in classical machine learning settings. In contrast, deep learning (DL)-based representation and feature learning for clustering have not been reviewed or employed extensively. Since the quality of clustering depends not only on the distribution of data points but also on the learned representation, deep neural networks can be effective means of transforming mappings from a high-dimensional data space into a lower-dimensional feature space, leading to improved clustering results. In this paper, we review state-of-the-art DL-based approaches for cluster analysis that are based on representation learning, which we hope will be useful, particularly for bioinformatics research. Further, we explore in detail the training procedures of DL-based clustering algorithms, point out different clustering quality metrics and evaluate several DL-based approaches on three bioinformatics use cases, including bioimaging, cancer genomics and biomedical text mining. We believe this review and the evaluation results will provide valuable insights and serve as a starting point for researchers wanting to apply DL-based unsupervised methods to solve emerging bioinformatics research problems.
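
The simplest of the reviewed recipes, representation learning with an autoencoder followed by clustering in the latent space, fits in a few lines. The layer sizes, toy data, and cluster count are illustrative assumptions:

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Learn a low-dimensional representation, then cluster in the latent space.
enc = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
dec = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 784))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
x = torch.rand(512, 784)                 # toy stand-in for expression/image data
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(dec(enc(x)), x)   # reconstruction objective
    loss.backward()
    opt.step()
labels = KMeans(n_clusters=5, n_init=10).fit_predict(enc(x).detach().numpy())
```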

Proceedings ArticleDOI
22 Mar 2021
TL;DR: Zhang et al. as mentioned in this paper decompose the sample similarity computation into two stages, i.e., the intra-camera and inter-camera computations, respectively, and propose a novel intra-inter camera similarity for pseudo-label generation.
Abstract: Most unsupervised person Re-Identification (Re-ID) works produce pseudo-labels by measuring feature similarity without considering the distribution discrepancy among cameras, leading to degraded accuracy in label computation across cameras. This paper targets this challenge by studying a novel intra-inter camera similarity for pseudo-label generation. We decompose the sample similarity computation into two stages, i.e., the intra-camera and inter-camera computations. The intra-camera computation directly leverages the CNN features for similarity computation within each camera. Pseudo-labels generated on different cameras train the re-id model in a multi-branch network. The second stage considers the classification scores of each sample on different cameras as a new feature vector. This new feature effectively alleviates the distribution discrepancy among cameras and generates more reliable pseudo-labels. We hence train our re-id model in two stages with intra-camera and inter-camera pseudo-labels, respectively. This simple intra-inter camera similarity produces surprisingly good performance on multiple datasets, e.g., achieving rank-1 accuracy of 89.5% on the Market1501 dataset, outperforming recent unsupervised works by 9+%, and is comparable with the latest transfer learning works that leverage extra annotations.
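
The second-stage representation is easy to state in code: each sample's per-camera classification scores are stacked into one new feature vector. The helper name and shapes below are hypothetical:

```python
import numpy as np

def inter_camera_features(score_list):
    """Stack each sample's classification scores from every per-camera
    classifier into one feature vector, which is less sensitive to
    camera-specific style (hypothetical helper)."""
    return np.concatenate(score_list, axis=1)  # (N, sum of per-camera classes)

# toy usage: 100 samples scored by two cameras with 50 and 40 identities
f = inter_camera_features([np.random.rand(100, 50), np.random.rand(100, 40)])
```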

Proceedings ArticleDOI
01 Jun 2021
TL;DR: This paper proposes a feature-space video coding network (FVC) that performs all major operations (i.e., motion estimation, motion compression, motion compensation and residual compression) in the feature space.
Abstract: Learning based video compression attracts increasing attention in the past few years. The previous hybrid coding approaches rely on pixel space operations to reduce spatial and temporal redundancy, which may suffer from inaccurate motion estimation or less effective motion compensation. In this work, we propose a feature-space video coding network (FVC) by performing all major operations (i.e., motion estimation, motion compression, motion compensation and residual compression) in the feature space. Specifically, in the proposed deformable compensation module, we first apply motion estimation in the feature space to produce motion information (i.e., the offset maps), which will be compressed by using the auto-encoder style network. Then we perform motion compensation by using deformable convolution and generate the predicted feature. After that, we compress the residual feature between the feature from the current frame and the predicted feature from our deformable compensation module. For better frame reconstruction, the reference features from multiple previous reconstructed frames are also fused by using the nonlocal attention mechanism in the multi-frame feature fusion module. Comprehensive experimental results demonstrate that the proposed framework achieves the state-of-the-art performance on four benchmark datasets including HEVC, UVG, VTL and MCL-JCV.
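
The deformable compensation step maps directly onto torchvision's deformable convolution: decoded offset maps steer the convolution over reference-frame features to produce the predicted feature. Channel counts and spatial sizes below are illustrative, not the paper's configuration:

```python
import torch
from torchvision.ops import DeformConv2d

# Deformable compensation in feature space (illustrative shapes).
dconv = DeformConv2d(64, 64, kernel_size=3, padding=1)
ref = torch.randn(1, 64, 32, 32)              # reference-frame features
offsets = torch.randn(1, 2 * 3 * 3, 32, 32)   # decoded motion information
pred = dconv(ref, offsets)                    # motion-compensated prediction
residual = torch.randn(1, 64, 32, 32) - pred  # residual feature to compress
```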

Journal ArticleDOI
TL;DR: This work studies the image retrieval problem at the wireless edge, where an edge device captures an image, which is then used to retrieve similar images from an edge server, and proposes two alternative schemes based on digital and analog communications.
Abstract: We study the image retrieval problem at the wireless edge, where an edge device captures an image, which is then used to retrieve similar images from an edge server. These can be images of the same person or a vehicle taken from other cameras at different times and locations. Our goal is to maximize the accuracy of the retrieval task under power and bandwidth constraints over the wireless link. Due to the stringent delay constraint of the underlying application, sending the whole image at a sufficient quality is not possible. We propose two alternative schemes based on digital and analog communications, respectively. In the digital approach, we first propose a deep neural network (DNN) aided retrieval-oriented image compression scheme, whose output bit sequence is transmitted over the channel using conventional channel codes. In the analog joint source and channel coding (JSCC) approach, the feature vectors are directly mapped into channel symbols. We evaluate both schemes on image based re-identification (re-ID) tasks under different channel conditions, including both static and fading channels. We show that the JSCC scheme significantly increases the end-to-end accuracy, speeds up the encoding process, and provides graceful degradation with channel conditions. The proposed architecture is evaluated through extensive simulations on different datasets and channel conditions, as well as through ablation studies.
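
The core of the analog JSCC scheme, mapping feature values directly to power-normalised channel symbols and passing them through a noisy channel, can be sketched as follows (an AWGN-only toy; the function name and SNR handling are assumptions):

```python
import numpy as np

def analog_jscc(features, snr_db, rng=None):
    """Map a real feature vector (even length) to unit-power complex channel
    symbols and add AWGN -- the analog scheme's core idea in miniature."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = features[0::2] + 1j * features[1::2]       # pair values into symbols
    x = x / np.sqrt(np.mean(np.abs(x) ** 2))       # power normalisation
    nvar = 10 ** (-snr_db / 10)                    # noise power for unit signal
    noise = np.sqrt(nvar / 2) * (rng.standard_normal(x.shape)
                                 + 1j * rng.standard_normal(x.shape))
    return x + noise

rx = analog_jscc(np.random.randn(512), snr_db=10)
```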

Journal ArticleDOI
TL;DR: Using semantic representation as input, the model verifies that more accurate results can be obtained by introducing a high-level semantic representation, and shows that it is feasible and effective to introduce high-level, abstract forms of knowledge representation into deep learning tasks.
Abstract: In visual reasoning, the achievements of deep learning have significantly improved the accuracy of results. Image features are primarily used as input to obtain answers. However, image features are too redundant to learn accurate characterizations within a limited complexity and time, whereas human reasoning usually relies on an abstract description of an image that avoids irrelevant details. Inspired by this, a higher-level representation named semantic representation is introduced. In this paper, a detailed visual reasoning model is proposed. This new model contains an image understanding model based on semantic representation, a feature extraction and processing model refined with the watershed and u-distance methods, a feature vector learning model using pyramidal pooling and a residual network, and a question understanding model combining a problem-embedding encoding method with a machine-translation decoding method. The feature vector can better represent the whole image instead of being overly focused on specific characteristics. The model using semantic representation as input verifies that more accurate results can be obtained by introducing a high-level semantic representation. The results also show that it is feasible and effective to introduce high-level, abstract forms of knowledge representation into deep learning tasks. This study lays a theoretical and experimental foundation for introducing different levels of knowledge representation into deep learning in the future.

Journal ArticleDOI
18 Apr 2021-Sensors
TL;DR: In this article, an innovative method called BCAoMID-F (Binarized Common Areas of Maximum Image Differences-Fusion) is proposed to extract features of thermal images of three angle grinders.
Abstract: The paper presents an analysis and classification method to evaluate the working condition of angle grinders by means of infrared (IR) thermography and IR image processing. An innovative method called BCAoMID-F (Binarized Common Areas of Maximum Image Differences—Fusion) is proposed in this paper. This method is used to extract features from thermal images of three angle grinders. The computed features are 1-element or 256-element vectors: each feature vector is the sum of the pixels of matrix V, the PCA of matrix V, or the histogram of matrix V. Three different cases of thermal images were considered: a healthy angle grinder, an angle grinder with 1 blocked air inlet, and an angle grinder with 2 blocked air inlets. The classification of feature vectors was carried out using two classifiers: Support Vector Machine and Nearest Neighbor. The total recognition efficiency for the 3 classes (TRAG) was in the range of 98.5–100%. The presented technique is efficient for fault diagnosis of electrical devices and electric power tools.
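
Two of the three feature-vector options described above are straightforward to reproduce; matrix V and the helper name below are placeholders for the output of the BCAoMID-F image processing chain:

```python
import numpy as np

def thermal_features(V):
    """Feature vectors from the processed image matrix V: a 1-element sum of
    pixels and a 256-element histogram (the PCA of V is the third option
    mentioned in the abstract and is omitted here)."""
    f_sum = np.array([V.sum()])                             # 1-element vector
    f_hist = np.histogram(V, bins=256, range=(0, 256))[0]   # 256-element vector
    return f_sum, f_hist

f1, f256 = thermal_features(np.random.randint(0, 256, (240, 320)))
```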

Proceedings ArticleDOI
01 Jun 2021
TL;DR: NBNet, as discussed in this paper, performs noise reduction by image-adaptive subspace projection, using a non-local attention module (SSA) to explicitly learn basis generation as well as subspace projection; it achieves state-of-the-art PSNR and SSIM with significantly less computational cost.
Abstract: In this paper, we introduce NBNet, a novel framework for image denoising. Unlike previous works, we propose to tackle this challenging problem from a new perspective: noise reduction by image-adaptive projection. Specifically, we propose to train a network that can separate signal and noise by learning a set of reconstruction bases in the feature space. Subsequently, image denoising can be achieved by selecting the corresponding bases of the signal subspace and projecting the input into such a space. Our key insight is that projection can naturally maintain the local structure of the input signal, especially for areas with low light or weak textures. Towards this end, we propose SSA, a non-local attention module we design to explicitly learn the basis generation as well as the subspace projection. We further incorporate SSA into NBNet, a UNet-structured network designed for end-to-end image denoising. We conduct evaluations on benchmarks, including SIDD and DND, and NBNet achieves state-of-the-art performance on PSNR and SSIM with significantly less computational cost.
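
Projection onto a learned signal subspace is ordinary linear algebra once the basis is given. A minimal sketch of the projection step only, with a random stand-in basis (in NBNet the basis comes from the SSA module and everything operates on feature maps rather than flat vectors):

```python
import torch

def subspace_project(x, basis):
    """Least-squares projection of feature vectors x (N, D) onto the signal
    subspace spanned by the columns of basis (D, K): B (B^T B)^-1 B^T x."""
    coef = torch.linalg.solve(basis.t() @ basis, basis.t() @ x.t())  # (K, N)
    return (basis @ coef).t()                                        # (N, D)

clean = subspace_project(torch.randn(100, 64), torch.randn(64, 8))
```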

Proceedings ArticleDOI
20 Jun 2021
TL;DR: This paper proposes a contrastive embedding for generalized zero-shot learning (GZSL), integrating the generation model with the embedding model to yield a hybrid GZSL framework that maps both the real and the synthetic samples produced by the generation model into an embedding space.
Abstract: Generalized zero-shot learning (GZSL) aims to recognize objects from both seen and unseen classes when only labeled examples from seen classes are provided. Recent feature generation methods learn a generative model that can synthesize the missing visual features of unseen classes to mitigate the data-imbalance problem in GZSL. However, the original visual feature space is suboptimal for GZSL classification since it lacks discriminative information. To tackle this issue, we propose to integrate the generation model with the embedding model, yielding a hybrid GZSL framework. The hybrid GZSL approach maps both the real and the synthetic samples produced by the generation model into an embedding space, where we perform the final GZSL classification. Specifically, we propose a contrastive embedding (CE) for our hybrid GZSL framework. The proposed contrastive embedding can leverage not only class-wise supervision but also instance-wise supervision, where the latter is usually neglected by existing GZSL research. We evaluate our proposed hybrid GZSL framework with contrastive embedding, named CE-GZSL, on five benchmark datasets. The results show that our CE-GZSL method can outperform the state of the art by a significant margin on three datasets. Our codes are available on https://github.com/Hanzy1996/CE-GZSL.
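
Instance-wise supervision of this kind is typically an InfoNCE-style objective. The sketch below shows one plausible form, with the temperature tau and single anchor/positive pair as assumptions; it is not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def instance_contrastive_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style instance-wise supervision: pull the anchor embedding
    towards its positive and away from negatives."""
    logits = torch.cat([(anchor * positive).sum().view(1),
                        negatives @ anchor]) / tau
    # the positive sits at index 0, so the "correct class" is 0
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))

loss = instance_contrastive_loss(torch.randn(128), torch.randn(128),
                                 torch.randn(32, 128))
```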

Journal ArticleDOI
TL;DR: This study proposes a latent-factor-analysis-based online sparse-streaming-feature selection algorithm (LOSSA), whose main idea is to apply latent factor analysis to pre-estimate missing data in sparse streaming features before conducting feature selection, thereby addressing the missing-data issue effectively and efficiently.
Abstract: Online streaming feature selection (OSFS) has attracted extensive attention during the past decades. Current approaches commonly assume that the feature space of fixed data instances dynamically increases without any missing data. However, this assumption does not always hold in many real applications. Motivated by this observation, this study aims to implement online feature selection from sparse streaming features, i.e., features flow in one by one with missing data while the instance count remains fixed. To do so, this study proposes a latent-factor-analysis-based online sparse-streaming-feature selection algorithm (LOSSA). Its main idea is to apply latent factor analysis to pre-estimate missing data in sparse streaming features before conducting feature selection, thereby addressing the missing-data issue effectively and efficiently. Theoretical and empirical studies indicate that LOSSA can significantly improve the quality of OSFS when missing data are encountered in target instances.
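
The pre-estimation step can be sketched as plain latent factor analysis: fit a low-rank factorization on the observed entries only, then fill the missing ones from the product. The hyperparameters and update rule below are generic assumptions, not the authors' exact algorithm:

```python
import numpy as np

def lfa_impute(X, observed, k=5, lr=0.02, epochs=100, lam=0.05):
    """Stochastic-gradient latent factor analysis on observed entries only,
    then fill missing entries from the low-rank product (generic sketch)."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    P, Q = rng.normal(0, 0.1, (n, k)), rng.normal(0, 0.1, (d, k))
    rows, cols = np.where(observed)
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            e = X[i, j] - P[i] @ Q[j]               # error on an observed entry
            P[i] += lr * (e * Q[j] - lam * P[i])    # regularised SGD updates
            Q[j] += lr * (e * P[i] - lam * Q[j])
    return np.where(observed, X, P @ Q.T)

X = np.random.rand(20, 10)
mask = np.random.rand(20, 10) > 0.3
X_full = lfa_impute(np.where(mask, X, 0.0), mask)
```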

Journal ArticleDOI
TL;DR: A knowledge mapping-based adversarial domain adaptation (KMADA) method with a discriminator and a feature extractor is proposed to generalize knowledge from the target to the source domain; experiments indicate the superiority of KMADA, which achieves the highest diagnosis accuracy.

Proceedings ArticleDOI
01 Jun 2021
TL;DR: Zhang et al. as mentioned in this paper proposed a self-guided learning approach in which the lost critical information is mined: by making an initial prediction for the annotated support image, the covered and uncovered foreground regions are encoded into primary and auxiliary support vectors using masked GAP, respectively.
Abstract: Few-shot segmentation has been attracting a lot of attention due to its effectiveness to segment unseen object classes with a few annotated samples. Most existing approaches use masked Global Average Pooling (GAP) to encode an annotated support image to a feature vector to facilitate query image segmentation. However, this pipeline unavoidably loses some discriminative information due to the average operation. In this paper, we propose a simple but effective self-guided learning approach, where the lost critical information is mined. Specifically, through making an initial prediction for the annotated support image, the covered and uncovered foreground regions are encoded to the primary and auxiliary support vectors using masked GAP, respectively. By aggregating both primary and auxiliary support vectors, better segmentation performances are obtained on query images. Enlightened by our self-guided module for 1-shot segmentation, we propose a cross-guided module for multiple shot segmentation, where the final mask is fused using predictions from multiple annotated samples with high-quality support vectors contributing more and vice versa. This module improves the final prediction in the inference stage without re-training. Extensive experiments show that our approach achieves new state-of-the-art performances on both PASCAL-5i and COCO-20i datasets. Source code is available at https://github.com/zbf1991/SCL.
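
Masked GAP itself, the operation from which the primary and auxiliary support vectors are built, is a weighted average of the support features over a (possibly partial) foreground mask:

```python
import torch

def masked_gap(feat, mask):
    """Masked global average pooling: average support features (C, H, W)
    over the binary foreground mask (H, W) to get one support vector."""
    w = mask / (mask.sum() + 1e-6)           # normalised foreground weights
    return (feat * w).flatten(1).sum(dim=1)  # (C,)

feat, mask = torch.randn(256, 50, 50), (torch.rand(50, 50) > 0.5).float()
primary = masked_gap(feat, mask)  # auxiliary vector: same call on the
                                  # uncovered-region mask
```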

Proceedings ArticleDOI
20 Jun 2021
TL;DR: Lee et al. as mentioned in this paper proposed a class margin equilibrium (CME) approach to optimize both feature space partition and novel class reconstruction in a systematic way, by using a fully connected layer to decouple localization features and then reserving adequate margin space for novel classes through a simple-yet-effective class margin loss during feature learning.
Abstract: Few-shot object detection has made substantial progress by representing novel class objects using the feature representation learned upon a set of base class objects. However, an implicit contradiction between novel class classification and representation is unfortunately ignored. On the one hand, to achieve accurate novel class classification, the distributions of any two base classes must be far away from each other (max-margin). On the other hand, to precisely represent novel classes, the distributions of base classes should be close to each other to reduce the intra-class distance of novel classes (min-margin). In this paper, we propose a class margin equilibrium (CME) approach, with the aim to optimize both feature space partition and novel class reconstruction in a systematic way. CME first converts the few-shot detection problem to the few-shot classification problem by using a fully connected layer to decouple localization features. CME then reserves adequate margin space for novel classes by introducing a simple-yet-effective class margin loss during feature learning. Finally, CME pursues margin equilibrium by disturbing the features of novel class instances in an adversarial min-max fashion. Experiments on Pascal VOC and MS-COCO datasets show that CME significantly improves upon two baseline detectors (up to 3 ~ 5% on average), achieving state-of-the-art performance. Code is available at https://github.com/BohaoLee/CME.
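
One plausible form of a class margin loss, a hinge that keeps base-class centres at least a margin apart, is sketched below; the exact formulation in CME differs in detail:

```python
import torch

def class_margin_loss(centers, margin=1.0):
    """Hinge on pairwise distances between base-class centres so that every
    pair stays at least `margin` apart (illustrative formulation)."""
    d = torch.cdist(centers, centers)
    off_diag = ~torch.eye(centers.size(0), dtype=torch.bool)
    return torch.relu(margin - d[off_diag]).mean()

loss = class_margin_loss(torch.randn(15, 128))
```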

Journal ArticleDOI
TL;DR: This work presents an MV-UFS model via cross-view local structure preserved diversity and consensus learning, referred to as CvLP-DCL, which exploits the fact that different views represent the same samples, and designs an efficient algorithm to solve the resultant optimization problem.
Abstract: Although demonstrating great success, previous multi-view unsupervised feature selection (MV-UFS) methods often construct a view-specific similarity graph and characterize the local structure of data within each single view. In such a way, the cross-view information can be ignored. In addition, they usually assume that different feature views are projected from a latent feature space, so the diversity of different views cannot be fully captured. In this work, we present an MV-UFS model via cross-view local structure preserved diversity and consensus learning, referred to briefly as CvLP-DCL. In order to exploit both the shared and distinguishing information across different views, we project each view into a label space which consists of a consensus part and a view-specific part, thereby exploiting the fact that different views represent the same samples. Meanwhile, a cross-view similarity graph learning term with matrix-induced regularization is embedded to preserve the local structure of data in the label space. By imposing the l2,1-norm on the feature projection matrices to constrain row sparsity, discriminative features can be selected from different views. An efficient algorithm is designed to solve the resultant optimization problem, and extensive experiments on six public datasets are conducted to validate the effectiveness of the proposed CvLP-DCL.
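
The row-sparsity regulariser is the standard l2,1 norm of a projection matrix, which drives whole rows (i.e., whole features) towards zero:

```python
import torch

def l21_norm(W):
    """Sum of row L2 norms; rows driven to zero correspond to features
    that are dropped from the selection."""
    return W.norm(dim=1).sum()

penalty = l21_norm(torch.randn(100, 20))  # 100 features -> 20-dim label space
```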

Journal ArticleDOI
TL;DR: The proposed method, the so-called deep stacked Laplacian restorer (DSLR), is capable of separately recovering the global illumination and local details from the original input and progressively combining them in the image space, and it outperforms state-of-the-art methods.
Abstract: Various images captured in complicated lighting conditions often suffer from deterioration of the image quality. Such poor quality not only dissatisfies the user expectation but also may lead to a significant performance drop in many applications. In this paper, a novel method for low-light image enhancement is proposed by leveraging useful properties of the Laplacian pyramid both in image and feature spaces. Specifically, the proposed method, the so-called deep stacked Laplacian restorer (DSLR), is capable of separately recovering the global illumination and local details from the original input, and progressively combining them in the image space. Moreover, the Laplacian pyramid defined in the feature space makes such recovering processes more efficient based on abundant connections of higher-order residuals in a multiscale structure. This decomposition-based scheme is fairly desirable for learning the highly nonlinear relation between degraded images and their enhanced results. Experimental results on various datasets demonstrate that the proposed DSLR outperforms state-of-the-art methods. The code and model are publicly available at: https://github.com/SeokjaeLIM/DSLR-release.
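
For reference, a Laplacian pyramid decomposition of the kind DSLR builds (here in image space only, on an image batch; DSLR additionally defines one over feature maps):

```python
import torch
import torch.nn.functional as F

def laplacian_pyramid(img, levels=3):
    """Decompose an image batch (N, C, H, W) into band-pass detail levels
    plus a low-frequency residual."""
    pyr, cur = [], img
    for _ in range(levels - 1):
        down = F.avg_pool2d(cur, 2)
        up = F.interpolate(down, size=cur.shape[-2:], mode="bilinear",
                           align_corners=False)
        pyr.append(cur - up)   # band-pass detail at this scale
        cur = down
    pyr.append(cur)            # coarsest level carries global illumination
    return pyr

levels = laplacian_pyramid(torch.rand(1, 3, 64, 64))
```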

Journal ArticleDOI
TL;DR: This work proposes a distributed sensor-fault detection and diagnosis system based on machine learning algorithms, in which the fault detection block is implemented in the sensor so that output is produced immediately after data collection; results show the efficiency of the proposed fuzzy learning-based model over classic neuro-fuzzy and non-fuzzy learning approaches.

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a two-layer feature selection approach that combines a wrapper and an embedded method in constructing an appropriate subset of predictors to improve the prediction accuracy.
Abstract: Feature selection, as a critical pre-processing step for machine learning, aims at determining representative predictors from a high-dimensional feature space to improve prediction accuracy. However, the increase in feature space dimensionality, compared to the number of observations, poses a severe challenge to many existing feature selection methods with respect to computational efficiency and prediction performance. This paper presents a new two-layer feature selection approach that combines a wrapper and an embedded method in constructing an appropriate subset of predictors. In the first layer of the proposed method, the Genetic Algorithm (GA) has been adopted as a wrapper to search for the optimal subset of predictors, which aims to reduce the number of predictors and the prediction error. As a meta-heuristic approach, GA is selected due to its computational efficiency; however, GAs do not guarantee optimality. To address this issue, a second layer is added to the proposed method to eliminate any remaining redundant or irrelevant predictors and improve the prediction accuracy. Elastic Net (EN) has been selected as the embedded method in the second layer because of its flexibility in adjusting the penalty terms in the regularization process and its time efficiency. This two-layer approach has been applied to a maize genetic dataset from the NAM population, which consists of multiple subsets of data with different ratios of the number of predictors to the number of observations. The numerical results confirm the superiority of the proposed model.
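
The second layer can be sketched with scikit-learn: Elastic Net with cross-validated penalties prunes whatever subset the GA wrapper returned. The helper name and the GA-selected columns below are hypothetical:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

def elastic_net_refine(X, y, ga_selected):
    """Second layer: Elastic Net zeroes the coefficients of redundant or
    irrelevant predictors among the GA-selected columns."""
    cols = np.asarray(ga_selected)
    en = ElasticNetCV(cv=5).fit(X[:, cols], y)
    return cols[np.abs(en.coef_) > 1e-8]   # surviving predictor indices

X, y = np.random.rand(60, 200), np.random.rand(60)
kept = elastic_net_refine(X, y, ga_selected=np.arange(0, 200, 4))
```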

Journal ArticleDOI
TL;DR: This paper proposes a new approach, called the Feature Space Metric-based Meta-learning Model (FSM3), a mixture of general supervised learning and episodic metric meta-learning, to overcome the challenge of few-shot fault diagnosis under multiple limited-data conditions.

Journal ArticleDOI
TL;DR: The proposed machine learning method for detecting viral epidemics by analyzing X-ray and CT images performs well, enabling the diagnosis of COVID-19 to be made quickly and effectively.
Abstract: Necessary screenings must be performed to control the spread of COVID-19 in daily life and to make a preliminary diagnosis of suspicious cases. The long duration of pathological laboratory tests and their sometimes questionable results led researchers to focus on different fields. Fast and accurate diagnoses are essential for effective interventions against COVID-19. The information obtained by using X-ray and Computed Tomography (CT) images is vital for making clinical diagnoses. Therefore, this study aims to develop a machine learning method for the detection of viral epidemics by analyzing X-ray and CT images. In this study, images belonging to six situations, including coronavirus images, are classified using a two-stage data enhancement approach. Since the number of images in the dataset is small and unbalanced, a shallow image augmentation approach was used in the first phase. It is more convenient to analyze these images with hand-crafted feature extraction methods, because the newly created dataset is still insufficient to train a deep architecture. Therefore, the synthetic minority over-sampling technique (SMOTE) algorithm is the second data enhancement step of this study. Finally, the feature vector is reduced in size by using stacked auto-encoder and principal component analysis methods to remove interconnected features in the feature vector. According to the obtained results, the proposed method performs well, especially in making the diagnosis of COVID-19 quickly and effectively. It is also thought to be a source of inspiration for future studies on deficient and unbalanced datasets.
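
The second enhancement stage plus the feature-vector reduction can be sketched with standard libraries; the toy feature matrix, class sizes, and component count are assumptions:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 256))          # toy hand-crafted feature vectors
y = np.array([0] * 100 + [1] * 20)       # deficient, unbalanced classes
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)   # second-stage DA
X_red = PCA(n_components=50).fit_transform(X_bal)         # shrink the vector
```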

Journal ArticleDOI
TL;DR: This work proposes an unsupervised domain adaptation with disentangled representation (DR-UDA) approach to improve the generalization capability of PAD to new scenarios; it shows promising generalization capability in several public-domain face PAD databases.
Abstract: Face presentation attack detection (PAD) is essential for securing the widely used face recognition systems. Most of the existing PAD methods do not generalize well to unseen scenarios because labeled training data of the new domain is usually not available. In light of this, we propose an unsupervised domain adaptation with disentangled representation (DR-UDA) approach to improve the generalization capability of PAD into new scenarios. DR-UDA consists of three modules, i.e., ML-Net, UDA-Net and DR-Net. ML-Net aims to learn a discriminative feature representation using the labeled source domain face images via metric learning. UDA-Net performs unsupervised adversarial domain adaptation in order to optimize the source domain and target domain encoders jointly, and obtain a common feature space shared by both domains. As a result, the source domain PAD model can be effectively transferred to the unlabeled target domain for PAD. DR-Net further disentangles the features irrelevant to specific domains by reconstructing the source and target domain face images from the common feature space. Therefore, DR-UDA can learn a disentangled representation space which is generative for face images in both domains and discriminative for live vs. spoof classification. The proposed approach shows promising generalization capability in several public-domain face PAD databases.

Proceedings ArticleDOI
Gernot Riegler, Vladlen Koltun
01 Jun 2021
TL;DR: The core of Stable View Synthesis (SVS), as discussed by the authors, is view-dependent on-surface feature aggregation, in which directional feature vectors at each 3D point are processed to produce a new feature vector for a ray that maps this point into the new target view.
Abstract: We present Stable View Synthesis (SVS). Given a set of source images depicting a scene from freely distributed viewpoints, SVS synthesizes new views of the scene. The method operates on a geometric scaffold computed via structure-from-motion and multi-view stereo. Each point on this 3D scaffold is associated with view rays and corresponding feature vectors that encode the appearance of this point in the input images. The core of SVS is view-dependent on-surface feature aggregation, in which directional feature vectors at each 3D point are processed to produce a new feature vector for a ray that maps this point into the new target view. The target view is then rendered by a convolutional network from a tensor of features synthesized in this way for all pixels. The method is composed of differentiable modules and is trained end-to-end. It supports spatially-varying view-dependent importance weighting and feature transformation of source images at each point; spatial and temporal stability due to the smooth dependence of on-surface feature aggregation on the target view; and synthesis of view-dependent effects such as specular reflection. Experimental results demonstrate that SVS outperforms state-of-the-art view synthesis methods both quantitatively and qualitatively on three diverse real-world datasets, achieving unprecedented levels of realism in free-viewpoint video of challenging large-scale scenes. Code is available at https://github.com/intel-isl/StableViewSynthesis