Showing papers on "Feature vector published in 2019"


Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this paper, an implicit field is used to assign a value to each point in 3D space, so that a shape can be extracted as an iso-surface, and a binary classifier is trained to perform this assignment.
Abstract: We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder, called IM-NET, for shape generation, aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. IM-NET is trained to perform this assignment by means of a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value which indicates whether the point is outside the shape or not. By replacing conventional decoders with our implicit decoder for representation learning (via IM-AE) and shape generation (via IM-GAN), we demonstrate superior results for tasks such as generative shape modeling, interpolation, and single-view 3D reconstruction, particularly in terms of visual quality. Code and supplementary material are available at https://github.com/czq142857/implicit-decoder.

1,261 citations
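
A minimal sketch of an implicit decoder in the spirit of IM-NET, written as a PyTorch MLP (layer sizes and activations are illustrative, not the authors' exact architecture): it maps a 3D point plus a shape feature vector to an inside/outside probability, from which a mesh can be extracted as an iso-surface.

```python
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    def __init__(self, feat_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, 1),  # one logit: inside vs. outside
        )

    def forward(self, points, shape_code):
        # points: (B, N, 3); shape_code: (B, feat_dim), broadcast to every point
        code = shape_code.unsqueeze(1).expand(-1, points.size(1), -1)
        return torch.sigmoid(self.net(torch.cat([points, code], dim=-1)))
```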


Journal ArticleDOI
13 Mar 2019-Nature
TL;DR: In this article, two quantum algorithms for machine learning on a superconducting processor are proposed and experimentally implemented, using a variational quantum circuit to classify the data in a way similar to the method of conventional SVMs.
Abstract: Machine learning and quantum computing are two technologies that each have the potential to alter how computation is performed to address previously untenable problems. Kernel methods for machine learning are ubiquitous in pattern recognition, with support vector machines (SVMs) being the best known method for classification problems. However, there are limitations to the successful solution to such classification problems when the feature space becomes large, and the kernel functions become computationally expensive to estimate. A core element in the computational speed-ups enabled by quantum algorithms is the exploitation of an exponentially large quantum state space through controllable entanglement and interference. Here we propose and experimentally implement two quantum algorithms on a superconducting processor. A key component in both methods is the use of the quantum state space as feature space. The use of a quantum-enhanced feature space that is only efficiently accessible on a quantum computer provides a possible path to quantum advantage. The algorithms solve a problem of supervised learning: the construction of a classifier. One method, the quantum variational classifier, uses a variational quantum circuit1,2 to classify the data in a way similar to the method of conventional SVMs. The other method, a quantum kernel estimator, estimates the kernel function on the quantum computer and optimizes a classical SVM. The two methods provide tools for exploring the applications of noisy intermediate-scale quantum computers3 to machine learning.

1,140 citations
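
The kernel-estimator idea can be prototyped classically for small feature maps. The sketch below is an assumption-laden toy, not the paper's superconducting-circuit experiment: it angle-encodes each input as a product quantum state, uses the state fidelity as the kernel, and feeds it to a classical SVM.

```python
import numpy as np
from sklearn.svm import SVC

def state(x):
    # angle-encode each feature into one qubit and take the tensor product
    psi = np.array([1.0])
    for xi in x:
        psi = np.kron(psi, np.array([np.cos(xi), np.sin(xi)]))
    return psi

def quantum_kernel(A, B):
    # fidelity |<phi(a)|phi(b)>|^2 between the encoded states
    return np.array([[abs(state(a) @ state(b)) ** 2 for b in B] for a in A])

X = np.random.rand(40, 2) * np.pi
y = (X[:, 0] > X[:, 1]).astype(int)           # toy labels
clf = SVC(kernel="precomputed").fit(quantum_kernel(X, X), y)
print(clf.predict(quantum_kernel(X[:5], X)))  # kernel rows: (test, train)
```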


Journal ArticleDOI
TL;DR: This Letter interprets the process of encoding inputs in a quantum state as a nonlinear feature map that maps data to quantum Hilbert space and shows how it opens up a new avenue for the design of quantum machine learning algorithms.
Abstract: A basic idea of quantum computing is surprisingly similar to that of kernel methods in machine learning, namely, to efficiently perform computations in an intractably large Hilbert space. In this Letter we explore some theoretical foundations of this link and show how it opens up a new avenue for the design of quantum machine learning algorithms. We interpret the process of encoding inputs in a quantum state as a nonlinear feature map that maps data to quantum Hilbert space. A quantum computer can now analyze the input data in this feature space. Based on this link, we discuss two approaches for building a quantum model for classification. In the first approach, the quantum device estimates inner products of quantum states to compute a classically intractable kernel. The kernel can be fed into any classical kernel method such as a support vector machine. In the second approach, we use a variational quantum circuit as a linear model that classifies data explicitly in Hilbert space. We illustrate these ideas with a feature map based on squeezing in a continuous-variable system, and visualize the working principle with two-dimensional minibenchmark datasets.

852 citations
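
In symbols (our notation, condensed from the Letter's setup), the two approaches share one object, the encoded state |φ(x)⟩:

```latex
% Implicit (kernel) approach: the device estimates state overlaps
\kappa(x, x') = \left| \langle \phi(x') \mid \phi(x) \rangle \right|^2
% Explicit approach: a variational circuit W(\theta) and a measurement M
% realize a linear model directly in Hilbert space
f(x) = \langle \phi(x) \rvert\, W^{\dagger}(\theta)\, M\, W(\theta)\, \lvert \phi(x) \rangle
```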


Proceedings ArticleDOI
15 Jun 2019
TL;DR: An integrated OLTR algorithm is developed that maps an image to a feature space such that visual concepts can easily relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world.
Abstract: Real world data often have a long-tailed and open-ended distribution. A practical recognition system must classify among majority and minority classes, generalize from a few known instances, and acknowledge novelty upon a never-seen instance. We define Open Long-Tailed Recognition (OLTR) as learning from such naturally distributed data and optimizing the classification accuracy over a balanced test set that includes head, tail, and open classes. OLTR must handle imbalanced classification, few-shot learning, and open-set recognition in one integrated algorithm, whereas existing classification approaches focus only on one aspect and perform poorly over the entire class spectrum. The key challenges are how to share visual knowledge between head and tail classes and how to reduce confusion between tail and open classes. We develop an integrated OLTR algorithm that maps an image to a feature space such that visual concepts can easily relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world. Our so-called dynamic meta-embedding combines a direct image feature and an associated memory feature, with the feature norm indicating familiarity with known classes. On three large-scale OLTR datasets we curate from object-centric ImageNet, scene-centric Places, and face-centric MS1M data, our method consistently outperforms the state-of-the-art. Our code, datasets, and models enable future OLTR research and are publicly available at https://liuziwei7.github.io/projects/LongTail.html.

780 citations
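
A rough sketch of a dynamic meta-embedding in this spirit, with class centroids playing the role of the memory and a linear layer as the concept selector (both are our simplifications, not the released implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaEmbedding(nn.Module):
    def __init__(self, dim, n_classes):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(n_classes, dim))  # visual memory
        self.selector = nn.Linear(dim, dim)                         # concept selector

    def forward(self, v_direct):
        # compose a memory feature from the centroids via attention
        attn = F.softmax(v_direct @ self.centroids.t(), dim=1)
        v_memory = attn @ self.centroids
        e = torch.tanh(self.selector(v_direct))   # gate on the memory feature
        # small distance to some centroid = familiar; open-set inputs score low
        dist = torch.cdist(v_direct, self.centroids)
        reachability = 1.0 / (dist.min(dim=1).values + 1e-6)
        return reachability.unsqueeze(1) * (v_direct + e * v_memory)
```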


Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this paper, a shared latent space of image features and class embeddings is learned by modality-specific aligned variational autoencoders, on which they train a softmax classifier.
Abstract: Many approaches in generalized zero-shot learning rely on cross-modal mapping between the image feature space and the class embedding space. As labeled images are expensive, one direction is to augment the dataset by generating either images or image features. However, the former misses fine-grained details and the latter requires learning a mapping associated with class embeddings. In this work, we take feature generation one step further and propose a model where a shared latent space of image features and class embeddings is learned by modality-specific aligned variational autoencoders. This leaves us with the required discriminative information about the image and classes in the latent features, on which we train a softmax classifier. The key to our approach is that we align the distributions learned from images and from side-information to construct latent features that contain the essential multi-modal information associated with unseen classes. We evaluate our learned latent features on several benchmark datasets, i.e. CUB, SUN, AWA1 and AWA2, and establish a new state of the art on generalized zero-shot as well as on few-shot learning. Moreover, our results on ImageNet with various zero-shot splits show that our latent features generalize well in large-scale settings.

421 citations
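
The two alignments the abstract describes can be written compactly. Assuming each modality-specific VAE produces a diagonal Gaussian latent (mu, logvar), a hedged sketch of the losses (function names are ours) is:

```python
import torch

def distribution_alignment(mu_img, logvar_img, mu_cls, logvar_cls):
    # 2-Wasserstein distance between the two diagonal Gaussian latents
    std_img, std_cls = (0.5 * logvar_img).exp(), (0.5 * logvar_cls).exp()
    return ((mu_img - mu_cls) ** 2
            + (std_img - std_cls) ** 2).sum(dim=1).sqrt().mean()

def cross_alignment(dec_img, dec_cls, z_img, z_cls, x_img, x_cls):
    # decode each modality from the *other* modality's latent code
    return ((dec_img(z_cls) - x_img).abs().mean()
            + (dec_cls(z_img) - x_cls).abs().mean())
```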


Journal ArticleDOI
Wu Deng, Rui Yao1, Huimin Zhao, Xinhua Yang1, Guangyu Li1 
01 Apr 2019
TL;DR: The fuzzy information entropy can accurately and more completely extract the characteristics of the vibration signal; the improved PSO algorithm effectively improves the classification accuracy of LS-SVM; and the proposed fault diagnosis method outperforms the other methods examined.
Abstract: To address the problem that most existing fault diagnosis methods cannot effectively recognize early faults in rotating machinery, this paper combines empirical mode decomposition, fuzzy information entropy, an improved particle swarm optimization algorithm, and least squares support vector machines into a novel intelligent diagnosis method, which is applied to diagnosing faults of the motor bearing. In the proposed method, the vibration signal is decomposed into a set of intrinsic mode functions (IMFs) using the empirical mode decomposition method. The fuzzy information entropy values of the IMFs are calculated to reveal the intrinsic characteristics of the vibration signal and are used as feature vectors. Then diversity mutation, neighborhood mutation, learning factor, and inertia weight strategies are applied to the basic particle swarm optimization (PSO) algorithm to obtain an improved PSO algorithm. The improved PSO algorithm is used to optimize the parameters of the least squares support vector machine (LS-SVM) in order to construct an optimal LS-SVM classifier, which is used to classify faults. Finally, the proposed fault diagnosis method is fully evaluated by experiments and comparative studies on a motor bearing. The experimental results indicate that the fuzzy information entropy can accurately and completely extract the characteristics of the vibration signal, that the improved PSO algorithm effectively improves the classification accuracy of the LS-SVM, and that the proposed fault diagnosis method outperforms the other methods examined in this paper and published in the literature. It provides a new method for fault diagnosis of rotating machinery.

365 citations
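
A pipeline sketch of the method's shape, with stand-ins clearly marked: PyEMD performs the empirical mode decomposition, a histogram entropy replaces the fuzzy information entropy, and a grid-searched sklearn SVC replaces the PSO-tuned LS-SVM, so this approximates the idea rather than reproducing the paper's method.

```python
import numpy as np
from PyEMD import EMD
from scipy.stats import entropy
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def feature_vector(signal, n_imfs=5):
    imfs = EMD()(signal)[:n_imfs]   # intrinsic mode functions (n_imfs kept small)
    # entropy of each IMF's amplitude histogram (stand-in for fuzzy entropy)
    return [entropy(np.histogram(imf, bins=32, density=True)[0] + 1e-12)
            for imf in imfs]

X = np.array([feature_vector(np.random.randn(2048)) for _ in range(20)])
y = np.array([0] * 10 + [1] * 10)   # placeholder fault labels
clf = GridSearchCV(SVC(kernel="rbf"), {"C": [1, 10], "gamma": ["scale", 0.1]})
clf.fit(X, y)                       # stand-in for the PSO parameter search
```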


Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed feature selection method effectively reduces the dimensions of the dataset and achieves superior classification accuracy using the selected features.

353 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this paper, a co-attention Siamese network (COSNet) is proposed to address the unsupervised video object segmentation task from a holistic view.
Abstract: We introduce a novel network, called the CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view. We emphasize the importance of the inherent correlation among video frames and incorporate a global co-attention mechanism to further improve state-of-the-art deep-learning-based solutions that primarily focus on learning discriminative foreground representations over appearance and motion in short-term temporal segments. The co-attention layers in our network provide efficient and competent stages for capturing global correlations and scene context by jointly computing and appending co-attention responses into a joint feature space. We train COSNet with pairs of video frames, which naturally augments training data and allows increased learning capacity. During the segmentation stage, the co-attention model encodes useful information by processing multiple reference frames together, which is leveraged to better infer the frequently reappearing, salient foreground objects. We propose a unified and end-to-end trainable framework in which different co-attention variants can be derived for mining the rich context within videos. Our extensive experiments over three large benchmarks show that COSNet outperforms the current alternatives by a large margin. We will publicly release our implementation and models.

341 citations
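
A minimal co-attention block in this spirit (tensor layout and the single learnable affinity matrix W are illustrative simplifications):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.W = nn.Linear(channels, channels, bias=False)  # affinity weight

    def forward(self, feat_a, feat_b):
        # feat_*: (B, C, H, W) -> flatten the spatial grid to (B, HW, C)
        B, C, H, Wd = feat_a.shape
        a = feat_a.flatten(2).transpose(1, 2)
        b = feat_b.flatten(2).transpose(1, 2)
        affinity = self.W(a) @ b.transpose(1, 2)            # (B, HW_a, HW_b)
        attended_a = F.softmax(affinity, dim=2) @ b          # b summarized per a-position
        attended_b = F.softmax(affinity, dim=1).transpose(1, 2) @ a
        out_a = torch.cat([a, attended_a], dim=2)            # append co-attention response
        out_b = torch.cat([b, attended_b], dim=2)
        return (out_a.transpose(1, 2).reshape(B, 2 * C, H, Wd),
                out_b.transpose(1, 2).reshape(B, 2 * C, H, Wd))
```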


Posted Content
TL;DR: Frustum ConvNet (F-ConvNet) as mentioned in this paper aggregates point-wise features as frustum-level feature vectors and arrays these feature vectors as a feature map for its subsequent fully convolutional network (FCN) component, which spatially fuses frustum-level features and supports an end-to-end, continuous estimation of oriented boxes in 3D space.
Abstract: In this work, we propose a novel method termed Frustum ConvNet (F-ConvNet) for amodal 3D object detection from point clouds. Given 2D region proposals in an RGB image, our method first generates a sequence of frustums for each region proposal, and uses the obtained frustums to group local points. F-ConvNet aggregates point-wise features as frustum-level feature vectors, and arrays these feature vectors as a feature map for its subsequent fully convolutional network (FCN) component, which spatially fuses frustum-level features and supports an end-to-end and continuous estimation of oriented boxes in the 3D space. We also propose component variants of F-ConvNet, including an FCN variant that extracts multi-resolution frustum features, and a refined use of F-ConvNet over a reduced 3D space. Careful ablation studies verify the efficacy of these component variants. F-ConvNet assumes no prior knowledge of the working 3D environment and is thus dataset-agnostic. We present experiments on both the indoor SUN-RGBD and outdoor KITTI datasets. F-ConvNet outperforms all existing methods on SUN-RGBD, and at the time of submission it outperforms all published works on the KITTI benchmark. Code has been made available at: this https URL.

313 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this paper, the past frames with object masks form an external memory, and the current frame as the query is segmented using the mask information in the memory, which is densely matched in the feature space, covering all the space-time pixel locations in a feed-forward fashion.
Abstract: We propose a novel solution for semi-supervised video object segmentation. By the nature of the problem, available cues (e.g. video frame(s) with object masks) become richer with the intermediate predictions. However, the existing methods are unable to fully exploit this rich source of information. We resolve the issue by leveraging memory networks and learning to read relevant information from all available sources. In our framework, the past frames with object masks form an external memory, and the current frame, as the query, is segmented using the mask information in the memory. Specifically, the query and the memory are densely matched in the feature space, covering all the space-time pixel locations in a feed-forward fashion. In contrast to previous approaches, the abundant use of the guidance information allows us to better handle challenges such as appearance changes and occlusions. We validate our method on the latest benchmark sets and achieve state-of-the-art performance (overall score of 79.4 on the Youtube-VOS val set, J of 88.7 and 79.2 on the DAVIS 2016/2017 val sets respectively) while maintaining a fast runtime (0.16 second/frame on the DAVIS 2016 val set).

310 citations
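
The dense space-time matching reduces to a single attention read. A stripped-down sketch (key/value encoders and the decoder are omitted; the tensor layout is illustrative):

```python
import torch
import torch.nn.functional as F

def memory_read(q_key, m_key, m_val):
    # q_key: (B, Ck, HW) query keys; m_key: (B, Ck, THW) memory keys over
    # T past frames; m_val: (B, Cv, THW) the corresponding memory values
    attn = F.softmax(torch.einsum('bcq,bcm->bqm', q_key, m_key), dim=2)
    return torch.einsum('bqm,bcm->bcq', attn, m_val)  # (B, Cv, HW) read-out
```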


Journal ArticleDOI
TL;DR: Experimental results confirm the efficiency of the proposed approaches in improving the classification accuracy compared to other wrapper-based algorithms, which proves the ability of the BOA algorithm to search the feature space and select the most informative attributes for classification tasks.
Abstract: In this paper, binary variants of the Butterfly Optimization Algorithm (BOA) are proposed and used to select the optimal feature subset for classification purposes in wrapper mode. BOA is a recently proposed algorithm that has not yet been systematically applied to feature selection problems. BOA can efficiently explore the feature space for an optimal or near-optimal feature subset minimizing a given fitness function. The two proposed binary variants of BOA are applied to select the optimal feature combination that maximizes classification accuracy while minimizing the number of selected features. In these variants, the native BOA is retained, while its continuous steps are squashed by a suitable transfer function and thresholded to binary values. The proposed binary algorithms are compared with five state-of-the-art approaches and four recent high-performing optimization algorithms. A number of assessment indicators are utilized to properly assess and compare the performance of these algorithms over 21 datasets from the UCI repository. The experimental results confirm the efficiency of the proposed approaches in improving the classification accuracy compared to other wrapper-based algorithms, which proves the ability of the BOA algorithm to search the feature space and select the most informative attributes for classification tasks.
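
The binarization step reduces to a few lines; the sketch below assumes an S-shaped (sigmoid) transfer function and stochastic thresholding, one common choice among such binary variants:

```python
import numpy as np

def binarize_step(continuous_step, rng=np.random.default_rng()):
    prob = 1.0 / (1.0 + np.exp(-continuous_step))   # S-shaped transfer function
    return (rng.random(continuous_step.shape) < prob).astype(int)

mask = binarize_step(np.random.randn(21))  # one bit per candidate feature
```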

Proceedings ArticleDOI
15 Jun 2019
TL;DR: 3D-SIS is introduced, a novel neural network architecture for 3D semantic instance segmentation in commodity RGB-D scans that leverages high-resolution RGB input by associating 2D images with the volumetric grid based on the pose alignment of the 3D reconstruction.
Abstract: We introduce 3D-SIS, a novel neural network architecture for 3D semantic instance segmentation in commodity RGB-D scans. The core idea of our method is to jointly learn from both geometric and color signals, thus enabling accurate instance predictions. Rather than operating solely on 2D frames, we observe that most computer vision applications have multi-view RGB-D input available, which we leverage to construct an approach for 3D instance segmentation that effectively fuses these multi-modal inputs. Our network leverages high-resolution RGB input by associating 2D images with the volumetric grid based on the pose alignment of the 3D reconstruction. For each image, we first extract 2D features for each pixel with a series of 2D convolutions; we then backproject the resulting feature vector to the associated voxel in the 3D grid. This combination of 2D and 3D feature learning allows significantly more accurate object detection and instance segmentation than state-of-the-art alternatives. We show results on both synthetic and real-world public benchmarks, achieving an improvement in mAP of over 13 on real-world data.
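
An illustrative back-projection step in this spirit (PyTorch; visibility and depth checks elided): voxel centers are projected into the image with the camera pose and intrinsics, and each voxel gathers the 2D feature at its projected pixel.

```python
import torch

def backproject(feat2d, voxel_centers, K, world2cam):
    # feat2d: (C, H, W) 2D features; voxel_centers: (N, 3) world coordinates;
    # K: (3, 3) intrinsics; world2cam: (4, 4) pose from the reconstruction
    cam = world2cam[:3, :3] @ voxel_centers.t() + world2cam[:3, 3:4]  # (3, N)
    uv = K @ cam                                       # pinhole projection
    u = (uv[0] / uv[2]).long().clamp(0, feat2d.shape[2] - 1)
    v = (uv[1] / uv[2]).long().clamp(0, feat2d.shape[1] - 1)
    return feat2d[:, v, u].t()                         # (N, C) per-voxel features
```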

Posted Content
TL;DR: The results indicate that graph neural networks only perform low-pass filtering on feature vectors and do not have the non-linear manifold learning property, and insights into GCN-based graph neural network design are offered.
Abstract: Graph neural networks have become one of the most important techniques to solve machine learning problems on graph-structured data. Recent work on vertex classification proposed deep and distributed learning models to achieve high performance and scalability. However, we find that the feature vectors of benchmark datasets are already quite informative for the classification task, and the graph structure only provides a means to denoise the data. In this paper, we develop a theoretical framework based on graph signal processing for analyzing graph neural networks. Our results indicate that graph neural networks only perform low-pass filtering on feature vectors and do not have the non-linear manifold learning property. We further investigate their resilience to feature noise and propose some insights on GCN-based graph neural network design.
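
The low-pass claim is easy to see operationally: the only graph-dependent step of a GCN layer multiplies the features by the normalized, self-loop-augmented adjacency, whose spectrum lies in [-1, 1], so repeated application smooths the feature signal. A few NumPy lines make this concrete:

```python
import numpy as np

def low_pass_propagate(A, X, k=2):
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    S = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 A_hat D^-1/2
    for _ in range(k):                              # each hop attenuates high frequencies
        X = S @ X
    return X
```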

Proceedings Article
01 Jan 2019
TL;DR: In this article, the Object Relation Transformer (ORT) is proposed to explicitly incorporate information about the spatial relationships between detected input objects through geometric attention, leading to improvements on all common captioning metrics on the MS-COCO dataset.
Abstract: Image captioning models typically follow an encoder-decoder architecture which uses abstract image feature vectors as input to the encoder. One of the most successful algorithms uses feature vectors extracted from the region proposals obtained from an object detector. In this work we introduce the Object Relation Transformer, that builds upon this approach by explicitly incorporating information about the spatial relationship between input detected objects through geometric attention. Quantitative and qualitative results demonstrate the importance of such geometric attention for image captioning, leading to improvements on all common captioning metrics on the MS-COCO dataset. Code is available at https://github.com/yahoo/object_relation_transformer .
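
A hedged sketch of geometric attention as described: appearance attention logits are shifted by log-weights computed from pairwise box geometry, following the Relation Networks parameterization the paper builds on (geo_mlp is an assumed small module, e.g. nn.Linear(4, 1)):

```python
import torch
import torch.nn.functional as F

def geometric_attention(app_logits, boxes, geo_mlp):
    # app_logits: (N, N) appearance attention; boxes: (N, 4) as (cx, cy, w, h)
    cx, cy, w, h = boxes.unbind(dim=1)
    dx = torch.log(torch.abs(cx[:, None] - cx[None, :]) / w[:, None] + 1e-3)
    dy = torch.log(torch.abs(cy[:, None] - cy[None, :]) / h[:, None] + 1e-3)
    dw = torch.log(w[None, :] / w[:, None])
    dh = torch.log(h[None, :] / h[:, None])
    geo = torch.stack([dx, dy, dw, dh], dim=-1)       # (N, N, 4) relative geometry
    geo_w = F.relu(geo_mlp(geo)).squeeze(-1)          # learned geometric weight
    return F.softmax(app_logits + torch.log(geo_w + 1e-6), dim=1)
```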


Posted Content
TL;DR: Experimental results suggest that the SFA-based approach is able to extract useful motion patterns and improves recognition performance, requires fewer intermediate processing steps while achieving comparable or even better performance, and has good potential for recognizing complex multi-person activities.
Abstract: Slow Feature Analysis (SFA) extracts slowly varying features from a quickly varying input signal. It has been successfully applied to modeling the visual receptive fields of cortical neurons. A large body of experimental results in neuroscience suggests that the temporal slowness principle is a general learning principle in visual perception. In this paper, we introduce the SFA framework to the problem of human action recognition by incorporating discriminative information into SFA learning and considering the spatial relationship of body parts. In particular, we consider four kinds of SFA learning strategies, including the original unsupervised SFA (U-SFA), the supervised SFA (S-SFA), the discriminative SFA (D-SFA), and the spatial discriminative SFA (SD-SFA), to extract slow feature functions from a large number of training cuboids obtained by random sampling in motion boundaries. Afterward, to represent action sequences, the squared first-order temporal derivatives are accumulated over all transformed cuboids into one feature vector, which is termed the Accumulated Squared Derivative (ASD) feature. The ASD feature encodes the statistical distribution of slow features in an action sequence. Finally, a linear support vector machine (SVM) is trained to classify actions represented by ASD features. We conduct extensive experiments, including two sets of control experiments, two sets of large-scale experiments on the KTH and Weizmann databases, and two sets of experiments on the CASIA and UT-interaction databases, to demonstrate the effectiveness of SFA for human action recognition.
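
The core of SFA itself is compact: whiten the signal, then take the directions along which the temporal derivative has the least variance. A minimal linear version (the paper's discriminative and spatial variants build on this):

```python
import numpy as np

def linear_sfa(X, n_components=2):
    # X: (T, D) signal over time
    X = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    W_white = eigvec @ np.diag(1.0 / np.sqrt(eigval + 1e-12)) @ eigvec.T
    Z = X @ W_white                        # whitened signal
    dcov = np.cov(np.diff(Z, axis=0), rowvar=False)
    _, dvec = np.linalg.eigh(dcov)         # ascending derivative variance
    return Z @ dvec[:, :n_components]      # slowest features first
```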

Proceedings ArticleDOI
15 Jun 2019
TL;DR: A center-based feature transfer framework augments the feature space of under-represented subjects using regular subjects that have sufficiently diverse samples; it produces smooth visual interpolations, disentangling the identity of a class from non-identity variations such as pose and lighting.
Abstract: Despite the large volume of face recognition datasets, there is a significant portion of subjects whose samples are insufficient and thus under-represented. Ignoring such a significant portion results in insufficient training data. Training with under-represented data leads to biased classifiers in conventionally-trained deep networks. In this paper, we propose a center-based feature transfer framework to augment the feature space of under-represented subjects from regular subjects that have sufficiently diverse samples. A Gaussian prior on the variance is assumed across all subjects, and the variance from regular subjects is transferred to the under-represented ones. This encourages the under-represented distribution to be closer to the regular distribution. Further, an alternating training regimen is proposed to simultaneously achieve less biased classifiers and a more discriminative feature representation. We conduct an ablative study that mimics under-represented datasets by varying the portion of under-represented classes on the MS-Celeb-1M dataset. Advantageous results on LFW, IJB-A, and MS-Celeb-1M demonstrate the effectiveness of our feature transfer and training strategy compared to both general baselines and state-of-the-art methods. Moreover, our feature transfer produces smooth visual interpolations, demonstrating a disentanglement that preserves the identity of a class while augmenting its feature space with non-identity variations such as pose and lighting.
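
The transfer itself is a two-liner once centers are available; a sketch under the paper's shared-variance assumption (centers here are simple class means):

```python
import numpy as np

def transfer_features(regular_feats, regular_center, ur_center):
    # non-identity variation (pose, lighting, ...) of a data-rich subject
    variation = regular_feats - regular_center
    # graft that variation onto the under-represented subject's center
    return ur_center + variation
```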

Proceedings ArticleDOI
19 May 2019
TL;DR: An assembly code representation learning model is developed that can find and incorporate rich semantic relationships among tokens appearing in assembly code and significantly outperforms existing methods against changes introduced by obfuscation and optimizations.
Abstract: Reverse engineering is a manually intensive but necessary technique for understanding the inner workings of new malware, finding vulnerabilities in existing systems, and detecting patent infringements in released software. An assembly clone search engine facilitates the work of reverse engineers by identifying those duplicated or known parts. However, it is challenging to design a robust clone search engine, since there exist various compiler optimization options and code obfuscation techniques that make logically similar assembly functions appear to be very different. A practical clone search engine relies on a robust vector representation of assembly code. However, the existing clone search approaches, which rely on a manual feature engineering process to form a feature vector for an assembly function, fail to consider the relationships between features and identify those unique patterns that can statistically distinguish assembly functions. To address this problem, we propose to jointly learn the lexical semantic relationships and the vector representation of assembly functions based on assembly code. We have developed an assembly code representation learning model, Asm2Vec. It only needs assembly code as input and does not require any prior knowledge such as the correct mapping between assembly functions. It can find and incorporate rich semantic relationships among tokens appearing in assembly code. We conduct extensive experiments and benchmark the learning model with state-of-the-art static and dynamic clone search approaches. We show that the learned representation is more robust and significantly outperforms existing methods against changes introduced by obfuscation and optimizations.
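
Asm2Vec is a custom PV-DM-style model; as a rough stand-in, gensim's Doc2Vec can embed assembly functions treated as token "documents". This only approximates the idea (it ignores Asm2Vec's control-flow-graph random walks and operand-aware modelling):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

functions = {  # toy assembly functions as token sequences
    "memcpy_like": ["push", "rbp", "mov", "rbp", "rsp", "rep", "movsb", "pop", "ret"],
    "strlen_like": ["xor", "rax", "rax", "cmp", "byte", "jne", "inc", "ret"],
}
docs = [TaggedDocument(tokens, [name]) for name, tokens in functions.items()]
model = Doc2Vec(docs, vector_size=64, min_count=1, epochs=40, dm=1)  # PV-DM
query = model.infer_vector(["push", "rbp", "mov", "rbp", "rsp", "ret"])
print(model.dv.most_similar([query], topn=1))   # nearest known function
```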

Journal ArticleDOI
TL;DR: This paper proposes a generalized framework, named transfer independently together (TIT), which learns multiple transformations, one for each domain (independently), to map data onto a shared latent space, where the domains are well aligned.
Abstract: Currently, unsupervised heterogeneous domain adaptation in a generalized setting, which is the most common scenario in real-world applications, is under-explored. Existing approaches either are limited to special cases or require labeled target samples for training. This paper aims to overcome these limitations by proposing a generalized framework, named transfer independently together (TIT). Specifically, we learn multiple transformations, one for each domain (independently), to map data onto a shared latent space, where the domains are well aligned. The multiple transformations are jointly optimized in a unified framework (together) by an effective formulation. In addition, to learn robust transformations, we further propose a novel landmark selection algorithm to reweight samples, i.e., increase the weight of pivot samples and decrease the weight of outliers. Our landmark selection is based on graph optimization. It focuses on sample geometric relationships rather than sample features. As a result, by abstracting feature vectors to graph vertices, only simple and fast integer arithmetic is involved in our algorithm, instead of matrix operations with floating-point arithmetic as in existing approaches. Finally, we effectively optimize our objective via a dimensionality reduction procedure. TIT is applicable to arbitrary sample dimensionality and does not need labeled target samples for training. Extensive evaluations on several standard benchmarks and large-scale datasets for image classification, text categorization, and text-to-image recognition verify the superiority of our approach.

Posted Content
TL;DR: This paper proposes a self-supervised learning approach for video features that results in significantly improved performance on downstream tasks (such as video classification, captioning and segmentation) compared to existing methods.
Abstract: This paper proposes a self-supervised learning approach for video features that results in significantly improved performance on downstream tasks (such as video classification, captioning and segmentation) compared to existing methods. Our method extends the BERT model for text sequences to the case of sequences of real-valued feature vectors, by replacing the softmax loss with noise contrastive estimation (NCE). We also show how to learn representations from sequences of visual features and sequences of words derived from ASR (automatic speech recognition), and show that such cross-modal training (when possible) helps even more.
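
The softmax-to-NCE swap can be sketched as an InfoNCE-style loss over real-valued feature vectors: the model's prediction for a masked position should score its true feature above in-batch negatives (the temperature and batch layout are our assumptions):

```python
import torch
import torch.nn.functional as F

def nce_loss(predicted, targets, temperature=0.07):
    # predicted, targets: (B, D) L2-normalized feature vectors
    logits = predicted @ targets.t() / temperature     # (B, B) similarities
    labels = torch.arange(predicted.size(0), device=predicted.device)
    return F.cross_entropy(logits, labels)             # positives on the diagonal
```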

Journal ArticleDOI
TL;DR: Multi-layer convolutional neural networks (CNNs) set the new state of the art for predicting neural responses to natural images in primate V1 and deep features learned for object recognition are better explanations for V1 computation than all previous filter bank theories.
Abstract: Despite great efforts over several decades, our best models of primary visual cortex (V1) still predict spiking activity quite poorly when probed with natural stimuli, highlighting our limited understanding of the nonlinear computations in V1. Recently, two approaches based on deep learning have emerged for modeling these nonlinear computations: transfer learning from artificial neural networks trained on object recognition and data-driven convolutional neural network models trained end-to-end on large populations of neurons. Here, we test the ability of both approaches to predict spiking activity in response to natural images in V1 of awake monkeys. We found that the transfer learning approach performed similarly well to the data-driven approach and both outperformed classical linear-nonlinear and wavelet-based feature representations that build on existing theories of V1. Notably, transfer learning using a pre-trained feature space required substantially less experimental time to achieve the same performance. In conclusion, multi-layer convolutional neural networks (CNNs) set the new state of the art for predicting neural responses to natural images in primate V1 and deep features learned for object recognition are better explanations for V1 computation than all previous filter bank theories. This finding strengthens the necessity of V1 models that are multiple nonlinearities away from the image domain and it supports the idea of explaining early visual cortex based on high-level functional goals.
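
The transfer-learning route reduces to a regularized linear readout on frozen pretrained features. A sketch with illustrative choices (the network, layer cut, regularizer strength, and Poisson placeholder responses are all assumptions, not the paper's setup):

```python
import numpy as np
import torch
from torchvision.models import vgg16
from sklearn.linear_model import Ridge

cnn = vgg16(weights="IMAGENET1K_V1").features[:16].eval()  # frozen mid-level layer
with torch.no_grad():
    feats = cnn(torch.randn(100, 3, 64, 64)).flatten(1).numpy()  # stimulus features
responses = np.random.poisson(1.0, (100, 32))   # placeholder spike counts
readout = Ridge(alpha=10.0).fit(feats, responses)
```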

Proceedings ArticleDOI
15 Jun 2019
TL;DR: A Category Traversal Module is introduced that can be inserted as a plug-and-play module into most metric-learning based few-shot learners, identifying task-relevant features based on both intra-class commonality and inter-class uniqueness in the feature space.
Abstract: Few-shot learning is an important area of research. Conceptually, humans are readily able to understand new concepts given just a few examples, while in more pragmatic terms, limited-example training situations are common practice. Recent effective approaches to few-shot learning employ a metric-learning framework to learn a feature similarity comparison between a query (test) example, and the few support (training) examples. However, these approaches treat each support class independently from one another, never looking at the entire task as a whole. Because of this, they are constrained to use a single set of features for all possible test-time tasks, which hinders the ability to distinguish the most relevant dimensions for the task at hand. In this work, we introduce a Category Traversal Module that can be inserted as a plug-and-play module into most metric-learning based few-shot learners. This component traverses across the entire support set at once, identifying task-relevant features based on both intra-class commonality and inter-class uniqueness in the feature space. Incorporating our module improves performance considerably (5%-10% relative) over baseline systems on both miniImageNet and tieredImageNet benchmarks, with overall performance competitive with the most recent state-of-the-art systems.

Journal ArticleDOI
TL;DR: In this article, a novel deep-learning-based approach for one-class transfer learning is presented, in which labeled data from an unrelated task is used for feature learning in one-class classification.
Abstract: We present a novel deep-learning-based approach for one-class transfer learning in which labeled data from an unrelated task is used for feature learning in one-class classification. The proposed method operates on top of a convolutional neural network (CNN) of choice and produces descriptive features while maintaining a low intra-class variance in the feature space for the given class. For this purpose, two loss functions, compactness loss and descriptiveness loss, are proposed, along with a parallel CNN architecture. A template-matching-based framework is introduced to facilitate the testing process. Extensive experiments on publicly available anomaly detection, novelty detection, and mobile active authentication datasets show that the proposed deep one-class (DOC) classification method achieves significant improvements over the state-of-the-art.
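
The compactness loss is the easiest piece to write down; a batch-level sketch (the paper uses a leave-one-out formulation, simplified here to intra-batch variance; the descriptiveness loss would be ordinary cross-entropy on the unrelated labeled task):

```python
import torch

def compactness_loss(feats):
    # feats: (B, D) features of target-class samples
    centered = feats - feats.mean(dim=0, keepdim=True)
    return (centered ** 2).sum(dim=1).mean()   # keep the class tight in feature space
```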

Proceedings ArticleDOI
28 Sep 2019
TL;DR: In this paper, a CNN is trained on small subsets of training images, each mimicking the few-shot setting, and the target object is segmented in the query image by using a cosine similarity between the class feature vector and the query's feature map.
Abstract: This paper is about few-shot segmentation of foreground objects in images. We train a CNN on small subsets of training images, each mimicking the few-shot setting. In each subset, one image serves as the query and the other(s) as support image(s) with ground-truth segmentation. The CNN first extracts feature maps from the query and support images. Then, a class feature vector is computed as an average of the support's feature maps over the known foreground. Finally, the target object is segmented in the query image by using a cosine similarity between the class feature vector and the query's feature map. We make two contributions by: (1) Improving discriminativeness of features so their activations are high on the foreground and low elsewhere; and (2) Boosting inference with an ensemble of experts guided with the gradient of loss incurred when segmenting the support images in testing. Our evaluations on the PASCAL-5i and COCO-20i datasets demonstrate that we significantly outperform existing approaches.
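
The inference rule described above, in tensor form (PyTorch; shapes illustrative): masked average pooling yields the class vector, and cosine similarity against every query location yields the segmentation score map.

```python
import torch
import torch.nn.functional as F

def segment(query_feat, support_feat, support_mask):
    # query_feat, support_feat: (B, C, H, W); support_mask: (B, 1, H, W) in {0, 1}
    class_vec = (support_feat * support_mask).sum(dim=(2, 3)) \
                / support_mask.sum(dim=(2, 3)).clamp(min=1e-6)      # (B, C)
    return F.cosine_similarity(query_feat, class_vec[:, :, None, None], dim=1)
```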

Journal ArticleDOI
TL;DR: The proposed CDSAE framework comprises two stages with different optimization objectives, which can learn discriminative low-dimensional feature mappings and train an effective classifier progressively; it imposes a local Fisher discriminant regularization on each hidden layer of a stacked autoencoder (SAE) to train a discriminative SAE (DSAE).
Abstract: As one of the fundamental research topics in remote sensing image analysis, hyperspectral image (HSI) classification has been extensively studied. However, how to discriminatively learn a low-dimensional feature space, in which the mapped features have small within-class scatter and large between-class separation, is still a challenging problem. To address this issue, this paper proposes an effective framework, named compact and discriminative stacked autoencoder (CDSAE), for HSI classification. The proposed CDSAE framework comprises two stages with different optimization objectives, which can learn discriminative low-dimensional feature mappings and train an effective classifier progressively. First, we impose a local Fisher discriminant regularization on each hidden layer of a stacked autoencoder (SAE) to train a discriminative SAE (DSAE) by minimizing the reconstruction error. This stage learns feature mappings in which pixels from the same land-cover class are mapped as close together as possible and pixels from different land-cover categories are separated by a large margin. Second, we learn an effective classifier and meanwhile update the DSAE with a local Fisher discriminant regularization embedded on top of the feature representations. Moreover, to learn a compact DSAE with as few hidden neurons as possible, we impose a diversity regularization on the hidden neurons of the DSAE to balance feature dimensionality against feature representation capability. The experimental results on three widely used HSI data sets and comprehensive comparisons with existing methods demonstrate that our proposed method is effective.

Proceedings Article
29 Oct 2019
TL;DR: In this paper, the authors adopt a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift and introduce two complementary losses which explicitly regularize the semantic structure of the feature space.
Abstract: Generalization capability to unseen domains is crucial for machine learning models when deploying to real-world conditions. We investigate the challenging problem of domain generalization, i.e., training a model on multi-domain source data such that it can directly generalize to target domains with unknown statistics. We adopt a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift. Further, we introduce two complementary losses which explicitly regularize the semantic structure of the feature space. Globally, we align a derived soft confusion matrix to preserve general knowledge of inter-class relationships. Locally, we promote domain-independent class-specific cohesion and separation of sample features with a metric-learning component. The effectiveness of our method is demonstrated with new state-of-the-art results on two common object recognition benchmarks. Our method also shows consistent improvement on a medical image segmentation task.
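
A schematic meta-train/meta-test update in this model-agnostic spirit (the paper's global and local alignment losses are omitted; torch.func.functional_call runs the model under the adapted parameters):

```python
import torch
from torch.func import functional_call

def meta_step(model, loss_fn, train_batch, test_batch, inner_lr, optimizer):
    x_tr, y_tr = train_batch   # meta-train domains
    x_te, y_te = test_batch    # held-out meta-test domain
    names, params = zip(*model.named_parameters())
    # inner step: adapt to the meta-train domains
    inner_loss = loss_fn(model(x_tr), y_tr)
    grads = torch.autograd.grad(inner_loss, params, create_graph=True)
    adapted = {n: p - inner_lr * g for n, p, g in zip(names, params, grads)}
    # outer step: the adapted model must also work on the unseen domain
    outer_loss = loss_fn(functional_call(model, adapted, (x_te,)), y_te)
    optimizer.zero_grad()
    (inner_loss + outer_loss).backward()
    optimizer.step()
```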

Journal ArticleDOI
TL;DR: The experiments on a public multimodal physiological signal dataset show that the DBN- and FGSVM-based model significantly increases the accuracy of emotion recognition compared to existing state-of-the-art emotion classification techniques.


Journal ArticleDOI
TL;DR: A novel HDA method that can optimize both feature discrepancy and distribution divergence in a unified objective function is proposed, which first learns a new transferable feature space by dictionary-sharing coding, and then aligns the distribution gaps on the new space.
Abstract: In real-world transfer learning tasks, especially in cross-modal applications, the source domain and the target domain often have different features and distributions, which are well known as the heterogeneous domain adaptation (HDA) problem. Yet, existing HDA methods focus on either alleviating the feature discrepancy or mitigating the distribution divergence due to the challenges of HDA. In fact, optimizing one of them can reinforce the other. In this paper, we propose a novel HDA method that can optimize both feature discrepancy and distribution divergence in a unified objective function. Specifically, we present progressive alignment , which first learns a new transferable feature space by dictionary-sharing coding, and then aligns the distribution gaps on the new space. Different from previous HDA methods that are limited to specific scenarios, our approach can handle diverse features with arbitrary dimensions. Extensive experiments on various transfer learning tasks, such as image classification, text categorization, and text-to-image recognition, verify the superiority of our method against several state-of-the-art approaches.

Journal ArticleDOI
TL;DR: A comprehensive review of recent developments in the area of CBIR and image representation is presented, and the main aspects of various image retrieval and image representation models, from low-level feature extraction to recent semantic deep-learning approaches, are analyzed.
Abstract: Multimedia content analysis is applied in different real-world computer vision applications, and digital images constitute a major part of multimedia data. In the last few years, the complexity of multimedia content, especially of images, has grown exponentially, and on a daily basis millions of images are uploaded to different archives such as Twitter, Facebook, and Instagram. Searching for a relevant image in an archive is a challenging research problem for the computer vision research community. Most search engines retrieve images on the basis of traditional text-based approaches that rely on captions and metadata. In the last two decades, extensive research has been reported on content-based image retrieval (CBIR), image classification, and analysis. In CBIR and image classification-based models, high-level image visuals are represented in the form of feature vectors that consist of numerical values. The research shows that there is a significant gap between image feature representation and human visual understanding. For this reason, research in this area has focused on reducing the semantic gap between image feature representation and human visual understanding. In this paper, we aim to present a comprehensive review of recent developments in the area of CBIR and image representation. We analyze the main aspects of various image retrieval and image representation models, from low-level feature extraction to recent semantic deep-learning approaches. The important concepts and major research studies based on CBIR and image representation are discussed in detail, and future research directions are outlined to inspire further research in this area.