
Showing papers on "Feature (machine learning)" published in 2021


Proceedings ArticleDOI
11 Mar 2021
TL;DR: MagFace as discussed by the authors introduces an adaptive mechanism to learn a well-structured within-class feature distribution by pulling easy samples toward class centers while pushing hard samples away, which prevents models from overfitting on noisy low-quality samples and improves face recognition in the wild.
Abstract: The performance of a face recognition system degrades when the variability of the acquired faces increases. Prior work alleviates this issue by either monitoring face quality in pre-processing or predicting the data uncertainty along with the face feature. This paper proposes MagFace, a category of losses that learn a universal feature embedding whose magnitude can measure the quality of the given face. Under the new loss, it can be proven that the magnitude of the feature embedding monotonically increases if the subject is more likely to be recognized. In addition, MagFace introduces an adaptive mechanism to learn a well-structured within-class feature distribution by pulling easy samples toward class centers while pushing hard samples away. This prevents models from overfitting on noisy low-quality samples and improves face recognition in the wild. Extensive experiments conducted on face recognition, quality assessment, and clustering demonstrate its superiority over state-of-the-art methods. The code is available at https://github.com/IrvingMeng/MagFace.
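
The magnitude-aware margin is the core of the loss. Below is a minimal PyTorch sketch of the idea, following the paper's m(a) and g(a) notation; the hyperparameter values and the helper name `magface_logits` are illustrative, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def magface_logits(features, weight, labels, s=64.0,
                   l_a=10.0, u_a=110.0, l_m=0.45, u_m=0.8, lambda_g=35.0):
    """features: (B, D); weight: (C, D) class prototypes; labels: (B,)."""
    a = features.norm(dim=1, keepdim=True).clamp(l_a, u_a)      # feature magnitudes
    m = (u_m - l_m) / (u_a - l_a) * (a - l_a) + l_m             # adaptive margin m(a)
    g = 1.0 / a + a / (u_a ** 2)                                # regularizer g(a)
    cos = F.linear(F.normalize(features), F.normalize(weight))  # cosine similarities
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    one_hot = F.one_hot(labels, cos.size(1)).bool()
    logits = torch.where(one_hot, torch.cos(theta + m), cos)    # margin on true class
    return s * logits, lambda_g * g.mean()

# logits, g_term = magface_logits(features, weight, labels)
# loss = F.cross_entropy(logits, labels) + g_term
```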

268 citations


Posted Content
TL;DR: Experiments on text-to-video retrieval and video question answering on six datasets demonstrate that ClipBERT outperforms (or is on par with) existing methods that exploit full-length videos, suggesting that end-to-end learning with just a few sparsely sampled clips is often more accurate than using densely extracted offline features from full-length videos, proving the proverbial less-is-more principle.
Abstract: The canonical approach to video-and-language learning (e.g., video question answering) dictates a neural model to learn from offline-extracted dense video features from vision models and text features from language models. These feature extractors are trained independently and usually on tasks different from the target domains, rendering these fixed features sub-optimal for downstream tasks. Moreover, due to the high computational overhead of dense video features, it is often difficult (or infeasible) to plug feature extractors directly into existing approaches for easy finetuning. To provide a remedy to this dilemma, we propose a generic framework ClipBERT that enables affordable end-to-end learning for video-and-language tasks, by employing sparse sampling, where only a single or a few sparsely sampled short clips from a video are used at each training step. Experiments on text-to-video retrieval and video question answering on six datasets demonstrate that ClipBERT outperforms (or is on par with) existing methods that exploit full-length videos, suggesting that end-to-end learning with just a few sparsely sampled clips is often more accurate than using densely extracted offline features from full-length videos, proving the proverbial less-is-more principle. Videos in the datasets span considerably different domains and lengths, ranging from 3-second generic-domain GIF videos to 180-second YouTube human activity videos, showing the generalization ability of our approach. Comprehensive ablation studies and thorough analyses are provided to dissect what factors lead to this success. Our code is publicly available at this https URL
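
The sparse-sampling strategy itself is simple to state in code. A minimal sketch, assuming frames are indexed 0..num_frames-1; the function name and defaults are illustrative:

```python
import random

def sample_sparse_clips(num_frames, num_clips=2, clip_len=16):
    """Return frame indices for `num_clips` randomly placed short clips."""
    clips = []
    for _ in range(num_clips):
        start = random.randint(0, max(0, num_frames - clip_len))
        clips.append(list(range(start, min(start + clip_len, num_frames))))
    return clips

# During training, only these clips are decoded and encoded end-to-end;
# at test time, predictions from several such clips are typically averaged.
```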

267 citations


Journal ArticleDOI
TL;DR: This study conceptually and empirically explores the most representative feature extraction algorithms (FEAs), determining the optimal sets of new features and the quality of the various transformed feature spaces in terms of statistical significance and power analysis, and the FEA efficacy in terms of classification accuracy and speed.

229 citations


Journal ArticleDOI
28 Apr 2021
TL;DR: In this paper, an attention-based deep learning architecture called AttnSleep was proposed to classify sleep stages using single-channel EEG signals, which leverages a multi-head attention mechanism to capture the temporal dependencies among the extracted features.
Abstract: Automatic sleep stage classification is of great importance for measuring sleep quality. In this paper, we propose a novel attention-based deep learning architecture called AttnSleep to classify sleep stages using single-channel EEG signals. This architecture starts with a feature extraction module based on a multi-resolution convolutional neural network (MRCNN) and adaptive feature recalibration (AFR). The MRCNN can extract low- and high-frequency features, and the AFR is able to improve the quality of the extracted features by modeling the inter-dependencies between them. The second module is the temporal context encoder (TCE), which leverages a multi-head attention mechanism to capture the temporal dependencies among the extracted features. Particularly, the multi-head attention deploys causal convolutions to model the temporal relations in the input features. We evaluate the performance of our proposed AttnSleep model using three public datasets. The results show that AttnSleep outperforms state-of-the-art techniques in terms of different evaluation metrics. Our source codes, experimental data, and supplementary materials are available at https://github.com/emadeldeen24/AttnSleep.
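
The causal convolutions mentioned above ensure that each time step only sees past context. A minimal PyTorch sketch of such a layer, with illustrative dimensions; in an attention block, queries/keys/values could each be produced this way before the scaled dot-product step:

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution that only looks at past time steps."""
    def __init__(self, channels, kernel_size=7):
        super().__init__()
        self.pad = kernel_size - 1                 # left-pad so no future leaks in
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x):                          # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))    # pad only on the left
        return self.conv(x)
```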

205 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this paper, a two-stage learning approach was proposed to utilize a dynamically expandable representation for more effective incremental concept modeling, where at each incremental step, the previously learned representation was augmented with additional feature dimensions from a new learnable feature extractor.
Abstract: We address the problem of class incremental learning, which is a core step towards achieving adaptive vision intelligence. In particular, we consider the task setting of incremental learning with limited memory and aim to achieve a better stability-plasticity trade-off. To this end, we propose a novel two-stage learning approach that utilizes a dynamically expandable representation for more effective incremental concept modeling. Specifically, at each incremental step, we freeze the previously learned representation and augment it with additional feature dimensions from a new learnable feature extractor. This enables us to integrate new visual concepts while retaining learned knowledge. We dynamically expand the representation according to the complexity of novel concepts by introducing a channel-level mask-based pruning strategy. Moreover, we introduce an auxiliary loss to encourage the model to learn diverse and discriminative features for novel concepts. We conduct extensive experiments on three class incremental learning benchmarks, and our method consistently outperforms other methods by a large margin.
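
The freeze-and-expand step can be summarized in a short PyTorch sketch. Here `make_extractor` is a hypothetical factory for the per-step backbone, and the channel-level masking/pruning strategy is omitted:

```python
import torch
import torch.nn as nn

class ExpandableNet(nn.Module):
    def __init__(self, make_extractor, feat_dim, num_classes):
        super().__init__()
        self.extractors = nn.ModuleList([make_extractor()])
        self.classifier = nn.Linear(feat_dim, num_classes)

    def expand(self, make_extractor, feat_dim, num_classes):
        for p in self.extractors.parameters():
            p.requires_grad_(False)               # freeze old representation
        self.extractors.append(make_extractor())  # new learnable extractor
        self.classifier = nn.Linear(feat_dim * len(self.extractors), num_classes)

    def forward(self, x):
        feats = torch.cat([e(x) for e in self.extractors], dim=1)
        return self.classifier(feats)
```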

196 citations


Journal ArticleDOI
01 Jan 2021
TL;DR: This study proposes a two-stage multiobjective feature-selection method that optimizes the number of features as well as model classification performance, and shows that the proposed model achieved similar classification performance while greatly reducing the cardinality of the feature subset.
Abstract: Many bankruptcy prediction models for small and medium-sized enterprises (SMEs) are built using accounting-based financial ratios. This study proposes a bankruptcy prediction model for SMEs that uses transactional data and payment network–based variables under a scenario where no financial (accounting) data are required. Offline and online test results both confirmed the predictive capability and economic benefit of transactional data–based variables. However, incorporating those features in predictive models produces high dimensional problems, which deteriorates model interpretability and increases feature acquisition costs. Thus, we propose a two-stage multiobjective feature-selection method that optimizes the number of features as well as model classification performance. The results showed that the proposed model achieved similar classification performance while greatly reducing the cardinality of the feature subset. Finally, the feature importance evaluation for features in the optimal subset confirmed the importance of transactional data and payment network-based variables for bankruptcy prediction.

173 citations


Journal ArticleDOI
TL;DR: An AI system based on deep meta learning is proposed in this research to accelerate analysis of chest X-ray (CXR) images in automatic detection of COVID-19 cases and achieves 95.6% accuracy and AUC of 0.97 in diagnosing COVID-19 from CXR images even with a limited number of training samples.

153 citations


Proceedings ArticleDOI
01 Jan 2021
TL;DR: In this article, the authors proposed a Deep Attentive Center Loss (DACL) method to adaptively select a subset of significant feature elements for enhanced discrimination, which integrates an attention mechanism to estimate attention weights correlated with feature importance.
Abstract: Learning discriminative features for Facial Expression Recognition (FER) in the wild using Convolutional Neural Networks (CNNs) is a non-trivial task due to the significant intra-class variations and inter-class similarities. Deep Metric Learning (DML) approaches such as center loss and its variants jointly optimized with softmax loss have been adopted in many FER methods to enhance the discriminative power of learned features in the embedding space. However, equally supervising all features with the metric learning method might include irrelevant features and ultimately degrade the generalization ability of the learning algorithm. We propose a Deep Attentive Center Loss (DACL) method to adaptively select a subset of significant feature elements for enhanced discrimination. The proposed DACL integrates an attention mechanism to estimate attention weights correlated with feature importance using the intermediate spatial feature maps in CNN as context. The estimated weights accommodate the sparse formulation of center loss to selectively achieve intra-class compactness and inter-class separation for the relevant information in the embedding space. An extensive study on two widely used wild FER datasets demonstrates the superiority of the proposed DACL method compared to state-of-the-art methods.
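
The attention-modulated center loss can be sketched compactly. A minimal PyTorch version, under the assumption that an attention network (omitted here) produces per-dimension weights in [0, 1]; in practice `centers` would be a learnable nn.Parameter updated jointly with the loss:

```python
import torch

def attentive_center_loss(features, weights, centers, labels):
    """features, weights: (B, D); centers: (C, D); labels: (B,)."""
    diff = features - centers[labels]          # distance to own class center
    return 0.5 * (weights * diff.pow(2)).sum(dim=1).mean()

# Weights near 0 exclude irrelevant feature dimensions, so only the
# discriminative elements are pulled toward the class center.
```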

137 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this article, Fang et al. proposed an autonomous, bidirectional and iterative ABINet for scene text recognition, which blocks gradient flow between vision and language models to enforce explicitly language modeling.
Abstract: Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from: 1) implicit language modeling; 2) unidirectional feature representation; and 3) a language model with noisy input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet for scene text recognition. Firstly, the autonomous design blocks gradient flow between the vision and language models to enforce explicit language modeling. Secondly, a novel bidirectional cloze network (BCN) is proposed as the language model, based on bidirectional feature representation. Thirdly, we propose an iterative correction scheme for the language model which can effectively alleviate the impact of noisy input. Additionally, based on the ensemble of iterative predictions, we propose a self-training method which can learn from unlabeled images effectively. Extensive experiments indicate that ABINet has superiority on low-quality images and achieves state-of-the-art results on several mainstream benchmarks. Besides, the ABINet trained with ensemble self-training shows promising improvement in realizing human-level recognition. Code is available at https://github.com/FangShancheng/ABINet.
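
The "autonomous" principle amounts to detaching the vision model's predictions before feeding them to the language model. A minimal sketch with illustrative module names, not the authors' exact API:

```python
def recognize(vision_model, language_model, fusion, image):
    vis_logits = vision_model(image)                # visual prediction
    lm_input = vis_logits.softmax(dim=-1).detach()  # block gradient flow into the LM
    lm_logits = language_model(lm_input)            # explicit language modeling
    return fusion(vis_logits, lm_logits)            # fuse both predictions
```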

136 citations


Journal ArticleDOI
22 Mar 2021-Sensors
TL;DR: In this paper, the authors proposed a method for brain tumor classification using an ensemble of deep features and machine learning classifiers, where the top three deep features which perform well on several machine learning classifiers are selected and concatenated as an ensemble of deep features, which is then fed into several machine learning classifiers to predict the final output.
Abstract: Brain tumor classification plays an important role in clinical diagnosis and effective treatment. In this work, we propose a method for brain tumor classification using an ensemble of deep features and machine learning classifiers. In our proposed framework, we adopt the concept of transfer learning and use several pre-trained deep convolutional neural networks to extract deep features from brain magnetic resonance (MR) images. The extracted deep features are then evaluated by several machine learning classifiers. The top three deep features which perform well on several machine learning classifiers are selected and concatenated as an ensemble of deep features, which is then fed into several machine learning classifiers to predict the final output. To evaluate the different kinds of pre-trained models as deep feature extractors, the machine learning classifiers, and the effectiveness of an ensemble of deep features for brain tumor classification, we use three different brain magnetic resonance imaging (MRI) datasets that are openly accessible from the web. Experimental results demonstrate that an ensemble of deep features can help improve performance significantly, and in most cases, a support vector machine (SVM) with radial basis function (RBF) kernel outperforms the other machine learning classifiers, especially for large datasets.
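
A minimal sketch of the extraction-and-classification pipeline, using torchvision backbones and scikit-learn; the specific backbones are illustrative choices, not necessarily the ones used in the paper:

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import SVC

def make_extractor(model):
    return nn.Sequential(*list(model.children())[:-1])  # drop the FC head

extractors = [make_extractor(models.resnet18(weights="DEFAULT")),
              make_extractor(models.resnet50(weights="DEFAULT"))]
for e in extractors:
    e.eval()

@torch.no_grad()
def deep_features(x):                       # x: (B, 3, 224, 224), normalized
    feats = [e(x).flatten(1) for e in extractors]
    return torch.cat(feats, dim=1).numpy()  # concatenated ensemble of features

# clf = SVC(kernel="rbf").fit(deep_features(train_images), train_labels)
```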

128 citations


Journal ArticleDOI
TL;DR: A novel wavelet-driven deep neural network, termed WaveletKernelNet (WKN), is presented, where a continuous wavelet convolutional (CWConv) layer is designed to replace the first convolutional layer of the standard CNN.
Abstract: Convolutional neural network (CNN), with the ability of feature learning and nonlinear mapping, has demonstrated its effectiveness in prognostics and health management (PHM). However, an explanation of the physical meaning of a CNN architecture has rarely been studied. In this article, a novel wavelet-driven deep neural network, termed WaveletKernelNet (WKN), is presented, where a continuous wavelet convolutional (CWConv) layer is designed to replace the first convolutional layer of the standard CNN. This enables the first CWConv layer to discover more meaningful kernels. Furthermore, only the scale parameter and translation parameter are directly learned from raw data at this CWConv layer. This provides a very effective way to obtain a customized kernel bank, specifically tuned for extracting the defect-related impact component embedded in the vibration signal. In addition, three experimental studies using data from laboratory environments are carried out to verify the effectiveness of the proposed method for mechanical fault diagnosis. The experimental results show that the accuracy of the WKNs is higher than that of the standard CNN by more than 10%, which indicates the importance of the designed CWConv layer. Besides, through theoretical analysis and feature map visualization, it is found that the WKNs are interpretable, have fewer parameters, and converge faster within the same number of training epochs.
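
A minimal sketch of a CWConv-style layer in PyTorch, where the kernels are generated from a Morlet mother wavelet and only scale and shift are learnable; constants and initialization are illustrative, not the paper's exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CWConv(nn.Module):
    """First-layer conv whose kernels are sampled Morlet wavelets."""
    def __init__(self, out_channels, kernel_size=63):
        super().__init__()
        self.scale = nn.Parameter(torch.linspace(1.0, 10.0, out_channels))
        self.shift = nn.Parameter(torch.zeros(out_channels))
        self.register_buffer("t", torch.linspace(-1.0, 1.0, kernel_size))

    def forward(self, x):                            # x: (batch, 1, time)
        scale = self.scale.abs() + 0.1               # keep scales positive
        u = (self.t[None, :] - self.shift[:, None]) / scale[:, None]
        kernels = torch.cos(1.75 * u) * torch.exp(-u ** 2 / 2)  # Morlet wavelet
        kernels = kernels / scale[:, None].sqrt()    # energy normalization
        return F.conv1d(x, kernels.unsqueeze(1), padding="same")
```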

Journal ArticleDOI
TL;DR: A novel few-shot learning method named multi-scale metric learning (MSML) is proposed to extract multi-scale features and learn the multi-scale relations between samples for few-shot classification.
Abstract: Few-shot learning in image classification is developed to learn a model that aims to identify unseen classes with only a few training samples for each class. Fewer training samples and new classification tasks make many traditional classification models no longer applicable. In this paper, a novel few-shot learning method named multi-scale metric learning (MSML) is proposed to extract multi-scale features and learn the multi-scale relations between samples for few-shot classification. In the proposed method, a feature pyramid structure is introduced for multi-scale feature embedding, which aims to combine high-level strong semantic features with low-level but abundant visual features. Then a multi-scale relation generation network (MRGN) is developed for hierarchical metric learning, in which high-level features correspond to deeper metric learning while low-level features correspond to lighter metric learning. Moreover, a novel loss function named intra-class and inter-class relation loss (IIRL) is proposed to optimize the proposed deep network, which aims to strengthen the correlation between homogeneous groups of samples and weaken the correlation between heterogeneous groups of samples. Experimental results on miniImageNet and tieredImageNet demonstrate that the proposed method achieves superior performance on few-shot learning problems.

Journal ArticleDOI
TL;DR: The results suggest that self-supervision may pave the way to a wider use of deep learning models on EEG data, and linear classifiers trained on SSL-learned features consistently outperformed purely supervised deep neural networks in low-labeled data regimes while reaching competitive performance when all labels were available.
Abstract: Objective. Supervised learning paradigms are often limited by the amount of labeled data that is available. This phenomenon is particularly problematic in clinically-relevant data, such as electroencephalography (EEG), where labeling can be costly in terms of specialized expertise and human processing time. Consequently, deep learning architectures designed to learn on EEG data have yielded relatively shallow models and performances at best similar to those of traditional feature-based approaches. However, in most situations, unlabeled data is available in abundance. By extracting information from this unlabeled data, it might be possible to reach competitive performance with deep neural networks despite limited access to labels. Approach. We investigated self-supervised learning (SSL), a promising technique for discovering structure in unlabeled data, to learn representations of EEG signals. Specifically, we explored two tasks based on temporal context prediction as well as contrastive predictive coding on two clinically-relevant problems: EEG-based sleep staging and pathology detection. We conducted experiments on two large public datasets with thousands of recordings and performed baseline comparisons with purely supervised and hand-engineered approaches. Main results. Linear classifiers trained on SSL-learned features consistently outperformed purely supervised deep neural networks in low-labeled data regimes while reaching competitive performance when all labels were available. Additionally, the embeddings learned with each method revealed clear latent structures related to physiological and clinical phenomena, such as age effects. Significance. We demonstrate the benefit of SSL approaches on EEG data. Our results suggest that self-supervision may pave the way to a wider use of deep learning models on EEG data.
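
The linear-probe evaluation referred to above is straightforward: a linear classifier is fit on frozen SSL embeddings. A minimal scikit-learn sketch, assuming the encoder has already produced the embedding matrices:

```python
from sklearn.linear_model import LogisticRegression

def linear_probe(z_train, y_train, z_test, y_test):
    """z_*: embeddings from the frozen SSL encoder; y_*: stage/pathology labels."""
    probe = LogisticRegression(max_iter=1000).fit(z_train, y_train)
    return probe.score(z_test, y_test)   # accuracy of the linear classifier
```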

Journal ArticleDOI
TL;DR: The authors have presented a feature-based method for 2D face images, which uses Speeded-Up Robust Features (SURF) and the Scale-Invariant Feature Transform (SIFT) for feature extraction and achieves a maximum recognition accuracy of 99.7%.
Abstract: Face recognition is the process of identifying people through facial images. It has become vital for security and surveillance applications and is required everywhere, including in institutions, organizations, offices, and social places. There are a number of challenges faced in face recognition, which include face pose, age, gender, illumination, and other variable conditions. Another challenge is that the database size for these applications is usually small, so training and recognition become difficult. Face recognition methods can be divided into two major categories: appearance-based methods and feature-based methods. In this paper, the authors have presented a feature-based method for 2D face images. Speeded-Up Robust Features (SURF) and the Scale-Invariant Feature Transform (SIFT) are used for feature extraction. Five public datasets, namely Yale2B, Face 94, M2VTS, ORL, and FERET, are used for the experimental work. Various combinations of SIFT and SURF features with two classification techniques, namely decision tree and random forest, have been evaluated in this work. A maximum recognition accuracy of 99.7% has been reported by the authors with a combination of SIFT (64 components) and SURF (32 components).
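
A minimal sketch of such a feature-based pipeline with OpenCV and scikit-learn. SIFT alone is shown (SURF requires opencv-contrib builds), and the descriptor pooling is an illustrative simplification rather than the paper's exact scheme:

```python
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier

sift = cv2.SIFT_create()

def face_descriptor(gray_img, n_keypoints=32):
    """Pool up to n_keypoints SIFT descriptors into one fixed-length vector."""
    kps, desc = sift.detectAndCompute(gray_img, None)
    if desc is None:                          # no keypoints found
        return np.zeros(128)
    return desc[:n_keypoints].mean(axis=0)    # 128-dim pooled descriptor

# X = np.stack([face_descriptor(img) for img in train_faces])
# clf = RandomForestClassifier(n_estimators=200).fit(X, train_labels)
```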

Journal ArticleDOI
TL;DR: In this paper, the authors performed a systematic comparison of 40 different EDA features using three feature selection methods, Joint Mutual Information (JMI), Conditional Mutual Information Maximization (CMIM), and Double Input Symmetrical Relevance (DISR), and found that approximately the same number of features is required to obtain the optimal accuracy for arousal recognition and valence recognition.
Abstract: Electrodermal activity (EDA) is indicative of psychological processes related to human cognition and emotions. Previous research has studied many methods for extracting EDA features; however, their appropriateness for emotion recognition has been tested using a small number of distinct feature sets and on different, usually small, data sets. In the current research, we reviewed 25 studies and implemented 40 different EDA features across the time, frequency and time-frequency domains on the publicly available AMIGOS dataset. We performed a systematic comparison of these EDA features using three feature selection methods, Joint Mutual Information (JMI), Conditional Mutual Information Maximization (CMIM) and Double Input Symmetrical Relevance (DISR), and machine learning techniques. We found that approximately the same number of features is required to obtain the optimal accuracy for arousal recognition and valence recognition. Also, the subject-dependent classification results were significantly higher than the subject-independent ones for both arousal and valence recognition. Statistical features related to the Mel-Frequency Cepstral Coefficients (MFCC) were explored for the first time for emotion recognition from EDA signals, and they outperformed all other feature groups, including the most commonly used Skin Conductance Response (SCR) related features.
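
A minimal sketch of mutual-information-based feature screening with scikit-learn. Note that this ranks features by relevance to the label only; JMI, CMIM and DISR additionally account for redundancy and complementarity between features:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def top_k_features(X, y, k=10):
    """X: (n_samples, n_features) EDA feature matrix; y: emotion labels."""
    mi = mutual_info_classif(X, y, random_state=0)  # I(feature; label) estimates
    return np.argsort(mi)[::-1][:k]                 # indices of the k most relevant

# selected = top_k_features(eda_features, arousal_labels, k=15)
```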

Proceedings ArticleDOI
TL;DR: In this article, the multi-task learning of lightweight convolutional neural networks is studied for face identification and classification of facial attributes (age, gender, ethnicity) trained on cropped faces without margins.
Abstract: In this paper, the multi-task learning of lightweight convolutional neural networks is studied for face identification and classification of facial attributes (age, gender, ethnicity) trained on cropped faces without margins. The necessity to fine-tune these networks to predict facial expressions is highlighted. Several models are presented based on MobileNet, EfficientNet and RexNet architectures. It was experimentally demonstrated that they lead to near state-of-the-art results in age, gender and race recognition on the UTKFace dataset and emotion classification on the AffectNet dataset. Moreover, it is shown that the usage of the trained models as feature extractors of facial regions in video frames leads to 4.5% higher accuracy than the previously known state-of-the-art single models for the AFEW and the VGAF datasets from the EmotiW challenges. The models and source code are publicly available at this https URL

Journal ArticleDOI
TL;DR: A novel feature-attention-based end-to-end approach for RUL prediction that dynamically gives greater attention weights to more important features during training and outperforms other recent approaches.
Abstract: Deep learning plays an increasingly important role in industrial applications, such as the remaining useful life (RUL) prediction of machines. However, when dealing with multifeature data, most deep learning approaches do not have effective mechanisms to weigh the input features adaptively. In this article, a novel feature-attention-based end-to-end approach is proposed for RUL prediction. First, the proposed feature-attention mechanism is applied directly to the input data, giving greater attention weights to more important features dynamically during training. This helps the model focus more on those critical inputs, and the prediction performance is therefore improved. Next, bidirectional gated recurrent units (BGRU) are used to extract long-term dependencies from the weighted input data, and convolutional neural networks are employed to capture local features from the output sequences of the BGRU. Finally, fully connected networks are used to learn the above-mentioned abstract representations to predict the RUL. The proposed approach is validated in a case study of turbofan engines. The experimental results demonstrate that the proposed approach outperforms other recent approaches.
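
A minimal PyTorch sketch of input feature attention followed by a BGRU; the convolutional stage after the BGRU is omitted, all layer sizes are illustrative, and the exact gating in the paper may differ:

```python
import torch
import torch.nn as nn

class FeatureAttentionRUL(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(n_features, n_features),
                                  nn.Softmax(dim=-1))
        self.bgru = nn.GRU(n_features, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                             # x: (batch, time, n_features)
        weights = self.attn(x.mean(dim=1))            # one weight per input feature
        out, _ = self.bgru(x * weights.unsqueeze(1))  # re-weighted inputs
        return self.head(out[:, -1])                  # RUL estimate
```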

Journal ArticleDOI
TL;DR: A knowledge mapping-based adversarial domain adaptation (KMADA) method with a discriminator and a feature extractor to generalize knowledge from the target to the source domain; experiments indicate the superiority of KMADA, which achieves the highest diagnosis accuracy.

Posted ContentDOI
TL;DR: This paper proposes a reliable method based on discarding the masked region and deep learning-based features in order to address the problem of masked face recognition, and results show high recognition performance.
Abstract: The coronavirus disease (COVID-19) is an unparalleled crisis leading to a huge number of casualties and security problems. In order to reduce the spread of the coronavirus, people often wear masks to protect themselves. This makes face recognition a very difficult task, since certain parts of the face are hidden. A primary focus of researchers during the ongoing coronavirus pandemic is to come up with suggestions to handle this problem through rapid and efficient solutions. In this paper, we propose a reliable method based on occlusion removal and deep learning-based features in order to address the problem of masked face recognition. The first step is to remove the masked face region. Next, we apply three pre-trained deep Convolutional Neural Networks (CNN), namely VGG-16, AlexNet, and ResNet-50, and use them to extract deep features from the obtained regions (mostly eyes and forehead regions). The Bag-of-Features paradigm is then applied to the feature maps of the last convolutional layer in order to quantize them and obtain a lighter representation compared to the fully connected layer of a classical CNN. Finally, a Multilayer Perceptron (MLP) is applied for the classification process. Experimental results on the Real-World-Masked-Face-Dataset show high recognition performance compared to other state-of-the-art methods.
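
The Bag-of-Features quantization step can be sketched briefly: local descriptors taken from the last convolutional layer are assigned to codebook words and pooled into a histogram. A minimal scikit-learn version with hard assignment (the paper's soft-quantization variant would differ):

```python
import numpy as np
from sklearn.cluster import KMeans

def to_descriptors(fmap):
    """fmap: (C, H, W) conv feature map -> (H*W, C) local descriptors."""
    c, h, w = fmap.shape
    return fmap.reshape(c, h * w).T

def bof_histogram(fmap, codebook):
    words = codebook.predict(to_descriptors(fmap))            # quantize
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()                                  # normalized histogram

# codebook = KMeans(n_clusters=64).fit(np.vstack([to_descriptors(f) for f in train_maps]))
# X = np.stack([bof_histogram(f, codebook) for f in train_maps])  # MLP input
```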

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed an efficient deep matrix factorization with review feature learning for the industrial recommender system (EDMF), which extracts the interactive features of a single review by convolutional neural networks with a word attention mechanism.
Abstract: Recommendation accuracy is a fundamental problem for the quality of a recommendation system. In this paper, we propose an efficient deep matrix factorization with review feature learning for the industrial recommender system (EDMF). Two characteristics of a user's review are revealed. First, interactivity between the user and the item, which can also be considered as the former's scoring behavior on the latter, is exploited in a review. Second, the review is only a partial description of the user's preferences for the item, which is revealed as the sparsity property. Specifically, for the first characteristic, EDMF extracts the interactive features of a single review by convolutional neural networks with a word attention mechanism. Subsequently, the L0 norm is leveraged to constrain the review, considering that the review information is a sparse feature, which is the second characteristic. Furthermore, the loss function is constructed by maximum a posteriori estimation theory, where the interactivity and sparsity properties are converted into two prior probability functions. Finally, an alternating minimization algorithm is introduced to optimize the loss functions. Experimental results on several datasets demonstrate that the proposed methods, which show good industrial conversion application prospects, outperform the state-of-the-art methods in terms of effectiveness and efficiency.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that PEN outperforms 14 existing HAR algorithms on these datasets in terms of the F1-score; HARFLS with PEN obtains better recognition results on the WISDM and PAMAP2 datasets, compared with 11 existing federated learning systems with various feature extraction structures.
Abstract: With the rapid growth of mobile devices, wearable sensor-based human activity recognition (HAR) has become one of the hottest topics in the Internet of Things. However, it is challenging for traditional approaches to achieve high recognition accuracy while protecting users’ privacy and sensitive information. To this end, we design a federated learning system for HAR (HARFLS). Based on the FederatedAveraging method, HARFLS enables each user to handle its activity recognition task safely and collectively. However, the recognition accuracy largely depends on the system’s feature extraction ability. To capture sufficient features from HAR data, we design a perceptive extraction network (PEN) as the feature extractor for each user. PEN is mainly composed of a feature network and a relation network. The feature network, based on a convolutional block, is responsible for discovering local features from the HAR data, while the relation network, a combination of long short-term memory (LSTM) and an attention mechanism, focuses on mining global relationships hidden in the data. Four widely used datasets, i.e., WISDM, UCI_HAR 2012, OPPORTUNITY, and PAMAP2, are used for performance evaluation. Experimental results demonstrate that PEN outperforms 14 existing HAR algorithms on these datasets in terms of the F1-score; HARFLS with PEN obtains better recognition results on the WISDM and PAMAP2 datasets, compared with 11 existing federated learning systems with various feature extraction structures.

Journal ArticleDOI
TL;DR: Transfer learning (TL) is proposed to leverage knowledge learned from the source domain for the target domain, utilizing a multiadversarial learning strategy to obtain feature representations that are invariant to the multiple domain shifts and discriminative for the learning goal at the same time.
Abstract: Data-driven fault diagnosis methods are widely investigated when enough supervised samples of the target machine are available to build a reliable model. However, labeled samples from a practically operated machine are usually scarce and difficult to collect. If the model is built on sufficient labeled samples from different source machines, the diagnosis performance will degenerate owing to the domain discrepancy. To solve this issue, in this article, transfer learning (TL) is proposed, leveraging knowledge learned from the source domain for the target domain. While TL methods for fault diagnosis have been actively studied, most of them focus on learning from a single source. Since the labeled samples can come from multiple domains, more general diagnosis knowledge can be learned, which is beneficial to the prediction for the target domain. Therefore, a new TL approach based on multisource domain adaptation is proposed. A multiadversarial learning strategy is utilized to obtain feature representations that are invariant to the multiple domain shifts and discriminative for the learning goal at the same time. Extensive experimental analysis on four different bearing datasets is performed to illustrate the effectiveness and advantage of the proposed method.
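
Multiadversarial training of this kind is typically built on a gradient reversal layer, with one domain discriminator per source domain attached through it. A minimal PyTorch sketch of the reversal layer (a standard construction, not the authors' exact code):

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.view_as(x)                       # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.alpha * grad_output, None     # flip gradients on the way back

def grad_reverse(x, alpha=1.0):
    return GradReverse.apply(x, alpha)

# domain_logits = discriminator(grad_reverse(shared_features))
```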

Journal ArticleDOI
TL;DR: A fine-grained VTC method using a lightweight convolutional neural network with feature optimization and a joint learning strategy combining softmax loss and contrastive-center loss to classify vehicle types is proposed, thereby improving the model’s fine-grained classification ability.
Abstract: Vehicle type classification (VTC) plays an important role in today’s intelligent transportation. Previous VTC systems usually run on a monitoring center’s host machine due to the models’ complexity, which consumes lots of computing resources and yields poor real-time performance. If these systems are deployed to embedded terminals by making the model lightweight while ensuring accuracy, then the problem can be addressed. To this end, we propose a fine-grained VTC method using a lightweight convolutional neural network with feature optimization and a joint learning strategy. Firstly, a lightweight convolutional network with feature optimization (LWCNN-FO) is designed. We use depthwise separable convolution to reduce network parameters. Besides, the SENet module is added to obtain the importance of each feature channel automatically through sample-based self-learning, which can improve recognition accuracy with little growth in network parameters. In addition, considering both between-class similarity and intra-class variance, this paper adopts a joint learning strategy combining softmax loss and contrastive-center loss to classify vehicle types, thereby improving the model’s fine-grained classification ability. We also build a dataset, called Car-159, consisting of 7998 pictures of 159 vehicle types, to evaluate our method. Compared with the state-of-the-art methods, experimental results show that our method can effectively decrease model complexity while maintaining accuracy.
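
A minimal PyTorch sketch of the two building blocks named above, a depthwise separable convolution and an SENet-style recalibration block; channel counts and the reduction ratio are illustrative:

```python
import torch
import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise separable convolution: per-channel 3x3, then 1x1 mixing."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class SEBlock(nn.Module):
    """Squeeze-and-excitation: learn per-channel importance weights."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                       # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))         # squeeze: global average pool
        return x * w[:, :, None, None]          # excite: reweight each channel
```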

Journal ArticleDOI
TL;DR: Tang et al. as mentioned in this paper proposed an attention consistent network (ACNet) based on the Siamese network for remote sensing image scene classification, which unifies the salient regions and compacts/separates the RS images from the same/different semantic categories.
Abstract: Remote sensing (RS) image scene classification is an important research topic in the RS community, which aims to assign semantics to the land covers. Recently, due to the strong behavior of convolutional neural networks (CNN) in feature representation, a growing number of CNN-based classification methods has been proposed for RS images. Although they achieve remarkable performance, there is still some room for improvement. First, apart from the global information, the local features are crucial to distinguish the RS images. The existing networks are good at capturing the global features owing to the CNNs’ hierarchical structure and nonlinear fitting capacity. However, the local features are not always emphasized. Second, to obtain satisfactory classification results, the distances of RS images from the same/different classes should be minimized/maximized. Nevertheless, these key points in pattern classification do not get the attention they deserve. To overcome the limitations mentioned above, we propose a new CNN named attention consistent network (ACNet) based on the Siamese network in this article. First, due to the dual-branch structure of ACNet, the input data are image pairs that are obtained by spatial rotation. This helps our model to fully explore the global features from RS images. Second, we introduce different attention techniques to mine the objects’ information from RS images comprehensively. Third, considering the influence of the spatial rotation and the similarities between RS images, we develop an attention consistent model to unify the salient regions and compact/separate the RS images from the same/different semantic categories. Finally, the classification results can be obtained using the learned features. Three popular RS scene datasets are selected to validate our ACNet. Compared with some existing networks, the proposed method can achieve better performance. The encouraging results illustrate that ACNet is effective for RS image scene classification. The source code of this method can be found at https://github.com/TangXu-Group/Remote-Sensing-Images-Classification/tree/main/GLCnet.

Proceedings ArticleDOI
19 Jun 2021
TL;DR: In this paper, the authors present two new natural world visual classification datasets, iNat2021 and NeWT, with the aim of benchmarking the performance of representation learning algorithms on a suite of challenging natural world binary classification tasks that go beyond standard species classification.
Abstract: Recent progress in self-supervised learning has resulted in models that are capable of extracting rich representations from image collections without requiring any explicit label supervision. However, to date the vast majority of these approaches have restricted themselves to training on standard benchmark datasets such as ImageNet. We argue that fine-grained visual categorization problems, such as plant and animal species classification, provide an informative testbed for self-supervised learning. In order to facilitate progress in this area we present two new natural world visual classification datasets, iNat2021 and NeWT. The former consists of 2.7M images from 10k different species uploaded by users of the citizen science application iNaturalist. We designed the latter, NeWT, in collaboration with domain experts with the aim of benchmarking the performance of representation learning algorithms on a suite of challenging natural world binary classification tasks that go beyond standard species classification. These two new datasets allow us to explore questions related to large-scale representation and transfer learning in the context of fine-grained categories. We provide a comprehensive analysis of feature extractors trained with and without supervision on ImageNet and iNat2021, shedding light on the strengths and weaknesses of different learned features across a diverse set of tasks. We find that features produced by standard supervised methods still outperform those produced by self-supervised approaches such as SimCLR. However, improved self-supervised learning methods are constantly being released and the iNat2021 and NeWT datasets are a valuable resource for tracking their progress.

Journal ArticleDOI
TL;DR: The proposed CNN-SVM system is applied to bearing fault diagnosis; it takes the time-domain diagram of bearing vibration data as the system input and has the advantages of low time consumption, high precision and strong generalization ability.

Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this paper, the authors propose to learn a dynamic alignment, which can effectively highlight both query regions and channels according to different local support information, which is achieved by first dynamically sampling the neighborhood of the feature position conditioned on the input few shot, based on which they further predict a both position-dependent and channel-dependent Dynamic Meta-filter.
Abstract: Few-shot learning (FSL), which aims to recognise new classes by adapting the learned knowledge with extremely limited few-shot (support) examples, remains an important open problem in computer vision. Most of the existing methods for feature alignment in few-shot learning only consider image-level or spatial-level alignment while omitting the channel disparity. Our insight is that these methods would lead to poor adaptation with redundant matching, and leveraging channel-wise adjustment is the key to well adapting the learned knowledge to new classes. Therefore, in this paper, we propose to learn a dynamic alignment, which can effectively highlight both query regions and channels according to different local support information. Specifically, this is achieved by first dynamically sampling the neighbourhood of the feature position conditioned on the input few shot, based on which we further predict a both position-dependent and channel-dependent dynamic meta-filter. The filter is used to align the query feature with position-specific and channel-specific knowledge. Moreover, we adopt Neural Ordinary Differential Equations (ODE) to enable a more accurate control of the alignment. In this sense, our model is able to better capture the fine-grained semantic context of the few-shot example and thus facilitates dynamical knowledge adaptation for few-shot learning. The resulting framework establishes a new state of the art on major few-shot visual recognition benchmarks, including miniImageNet and tieredImageNet.

Journal ArticleDOI
TL;DR: Ten different feature encoding schemes were explored with the goal of capturing key characteristics around 6mA sites, and Meta-i6mA was proposed, which combines the baseline models using a meta-predictor approach and outperforms the existing predictors.
Abstract: DNA N6-methyladenine (6mA) represents an important epigenetic modification, which is responsible for various cellular processes. The accurate identification of 6mA sites is one of the challenging tasks in genome analysis, which leads to an understanding of their biological functions. To date, several species-specific machine learning (ML)-based models have been proposed, but the majority of them did not test their models on other species. Hence, their practical application to other plant species is quite limited. In this study, we explored 10 different feature encoding schemes, with the goal of capturing key characteristics around 6mA sites. We selected five feature encoding schemes based on physicochemical and position-specific information that possess high discriminative capability. The resultant feature sets were inputted to six commonly used ML methods (random forest, support vector machine, extremely randomized tree, logistic regression, naive Bayes and AdaBoost). The Rosaceae genome was employed to train the above classifiers, which generated 30 baseline models. To integrate their individual strengths, Meta-i6mA was proposed, which combines the baseline models using a meta-predictor approach. In extensive independent tests, Meta-i6mA showed high Matthews correlation coefficient values of 0.918, 0.827 and 0.635 on Rosaceae, rice and Arabidopsis thaliana, respectively, and outperformed the existing predictors. We anticipate that Meta-i6mA can be applied across different plant species. Furthermore, we developed an online user-friendly web server, which is available at http://kurata14.bio.kyutech.ac.jp/Meta-i6mA/.
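
The meta-predictor idea maps directly onto stacking. A minimal scikit-learn sketch in which the six base classifiers' probabilities feed a logistic-regression meta-model; the feature encoding step and all hyperparameters are illustrative:

```python
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

base_models = [
    ("rf", RandomForestClassifier()),
    ("ert", ExtraTreesClassifier()),
    ("svm", SVC(probability=True)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
    ("ada", AdaBoostClassifier()),
]
meta = StackingClassifier(estimators=base_models,
                          final_estimator=LogisticRegression(),
                          stack_method="predict_proba")
# meta.fit(X_train_encoded, y_train); meta.predict(X_test_encoded)
```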

Journal ArticleDOI
TL;DR: Experimental results show that the proposed methods can improve the emotion recognition rate on datasets of different sizes and are effective in comparison with studies that used EEG signals to classify human emotions in the two-dimensional space.

Journal ArticleDOI
TL;DR: A new automated deep learning method is proposed for the classification of multiclass brain tumors using a modified genetic algorithm based on metaheuristics and a non-redundant serial-based approach.
Abstract: Multiclass classification of brain tumors is an important area of research in the field of medical imaging. Since accuracy is crucial in the classification, a number of techniques have been introduced by computer vision researchers; however, they still face the issue of low accuracy. In this article, a new automated deep learning method is proposed for the classification of multiclass brain tumors. To realize the proposed method, the pre-trained DenseNet201 deep learning model is fine-tuned and later trained using deep transfer learning on imbalanced data. The features of the trained model are extracted from the average pool layer, which represents the very deep information of each type of tumor. However, the features of this layer are not sufficient for a precise classification; therefore, two techniques for the selection of features are proposed. The first technique is Entropy–Kurtosis-based High Feature Values (EKbHFV) and the second is a modified genetic algorithm (MGA) based on metaheuristics. The selected features of the MGA are further refined by the proposed new threshold function. Finally, both EKbHFV and MGA-based features are fused using a non-redundant serial-based approach and classified using a multiclass SVM cubic classifier. For the experimental process, two datasets, BRATS2018 and BRATS2019, are used without augmentation, achieving an accuracy of more than 95%. The precise comparison of the proposed method with other neural nets shows the significance of this work.
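
A minimal sketch of entropy/kurtosis-based screening in the spirit of EKbHFV; the scoring rule and the keep-ratio threshold here are illustrative guesses, not the paper's exact formulation:

```python
import numpy as np
from scipy.stats import kurtosis, entropy

def ekb_select(X, keep_ratio=0.5):
    """X: (n_samples, n_features) deep feature matrix from the average pool layer."""
    probs = np.abs(X) / (np.abs(X).sum(axis=0, keepdims=True) + 1e-12)
    ent = entropy(probs, axis=0)              # per-feature entropy
    kur = kurtosis(X, axis=0)                 # per-feature kurtosis
    score = ent + kur                         # combined "high feature value" score
    k = int(X.shape[1] * keep_ratio)
    return np.argsort(score)[::-1][:k]        # indices of the selected features
```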