Showing papers on "Feature vector published in 2015"


Proceedings ArticleDOI
07 Jun 2015
TL;DR: A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, and achieves state-of-the-art face recognition performance using only 128 bytes per face.
Abstract: Despite significant recent advances in the field of face recognition [10, 14, 15, 17], implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors.
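
Once such a space exists, verification reduces to thresholding the Euclidean distance between two embeddings. A minimal sketch of that step, assuming pre-computed 128-D embeddings; the threshold value is an illustrative placeholder, not the paper's:

    import numpy as np

    def same_face(emb_a, emb_b, threshold=1.1):
        # Two faces match when their embeddings are close; the threshold
        # must be tuned on a validation set (1.1 is a placeholder).
        return np.linalg.norm(emb_a - emb_b) < threshold

Clustering works the same way: the embeddings can be fed directly to any standard algorithm such as k-means or agglomerative clustering.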

8,289 citations


Proceedings ArticleDOI
TL;DR: FaceNet as discussed by the authors uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches, and achieves state-of-the-art face recognition performance using only 128 bytes per face.
Abstract: Despite significant recent advances in the field of face recognition, implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors. Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches. To train, we use triplets of roughly aligned matching / non-matching face patches generated using a novel online triplet mining method. The benefit of our approach is much greater representational efficiency: we achieve state-of-the-art face recognition performance using only 128 bytes per face. On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%. On YouTube Faces DB it achieves 95.12%. Our system cuts the error rate by 30% relative to the best published result on both datasets. We also introduce the concept of harmonic embeddings, and a harmonic triplet loss, which describe different versions of face embeddings (produced by different networks) that are compatible with each other and allow for direct comparison.
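
The triplet loss the abstract describes has a compact form: an anchor must be closer to its matching (positive) patch than to a non-matching (negative) patch by a margin. A hedged numpy sketch, with the margin value chosen for illustration:

    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=0.2):
        # Squared Euclidean distances between L2-normalised embeddings.
        d_pos = np.sum((anchor - positive) ** 2, axis=-1)
        d_neg = np.sum((anchor - negative) ** 2, axis=-1)
        # Hinge: zero loss once the negative is `margin` farther away.
        return np.maximum(0.0, d_pos - d_neg + margin).mean()

The online triplet mining mentioned in the abstract decides which (anchor, positive, negative) triples are fed to this loss during training.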

4,560 citations


Proceedings ArticleDOI
07 Dec 2015
TL;DR: In this paper, the authors propose and compare two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations, and show that networks trained on this unrealistic data still generalize very well to existing datasets such as Sintel and KITTI.
Abstract: Convolutional neural networks (CNNs) have recently been very successful in a variety of computer vision tasks, especially on those linked to recognition. Optical flow estimation has not been among the tasks CNNs succeeded at. In this paper we construct CNNs which are capable of solving the optical flow estimation problem as a supervised learning task. We propose and compare two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations. Since existing ground truth data sets are not sufficiently large to train a CNN, we generate a large synthetic Flying Chairs dataset. We show that networks trained on this unrealistic data still generalize very well to existing datasets such as Sintel and KITTI, achieving competitive accuracy at frame rates of 5 to 10 fps.
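
The layer that correlates feature vectors at different image locations can be sketched as follows; this simplification omits the striding and normalisation of the published layer and assumes feature maps of shape (H, W, C):

    import numpy as np

    def correlation_layer(f1, f2, max_disp=4):
        # For each location in f1, dot-product its feature vector with the
        # vectors at all displacements within max_disp in f2.
        H, W, C = f1.shape
        D = 2 * max_disp + 1
        out = np.zeros((H, W, D * D), dtype=f1.dtype)
        padded = np.pad(f2, ((max_disp, max_disp), (max_disp, max_disp), (0, 0)))
        for i, dy in enumerate(range(-max_disp, max_disp + 1)):
            for j, dx in enumerate(range(-max_disp, max_disp + 1)):
                shifted = padded[max_disp + dy:max_disp + dy + H,
                                 max_disp + dx:max_disp + dx + W]
                out[:, :, i * D + j] = np.sum(f1 * shifted, axis=-1)
        return out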

3,833 citations


Book ChapterDOI
06 Jul 2015
TL;DR: This paper describes a zero-shot learning approach that can be implemented in just one line of code, yet is able to outperform state-of-the-art approaches on standard datasets.
Abstract: Zero-shot learning consists in learning how to recognise new concepts by just having a description of them. Many sophisticated approaches have been proposed to address the challenges this problem comprises. In this paper we describe a zero-shot learning approach that can be implemented in just one line of code, yet is able to outperform state-of-the-art approaches on standard datasets. The approach is based on a more general framework which models the relationships between features, attributes, and classes as a network of two linear layers, where the weights of the top layer are not learned but are given by the environment. We further provide a learning bound on the generalisation error of this kind of approach by casting it as a domain adaptation method. In experiments carried out on three standard real datasets, we found that our approach performs significantly better than the state of the art on all of them, obtaining an improvement of up to 17%.
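
If this is the "embarrassingly simple" formulation, the one line of code is a closed-form solution for the bottom layer's weights given features X, labels Y, and the environment-provided attribute signatures S. A sketch under that assumption, with placeholder regularisers:

    import numpy as np

    def zsl_train(X, Y, S, gamma=1.0, lam=1.0):
        # X: (d, m) features; Y: (m, z) class indicators; S: (a, z) attributes.
        d, a = X.shape[0], S.shape[0]
        return np.linalg.inv(X @ X.T + gamma * np.eye(d)) @ X @ Y @ S.T \
               @ np.linalg.inv(S @ S.T + lam * np.eye(a))

    def zsl_predict(V, x, S_unseen):
        # Score unseen classes through their attribute signatures.
        return int(np.argmax(x @ V @ S_unseen))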

1,192 citations


Proceedings ArticleDOI
07 Dec 2015
TL;DR: A simple yet surprisingly powerful approach for unsupervised learning of CNN that uses hundreds of thousands of unlabeled videos from the web to learn visual representations and designs a Siamese-triplet network with a ranking loss function to train this CNN representation.
Abstract: Is strong supervision necessary for learning a good visual representation? Do we really need millions of semantically-labeled images to train a Convolutional Neural Network (CNN)? In this paper, we present a simple yet surprisingly powerful approach for unsupervised learning of CNNs. Specifically, we use hundreds of thousands of unlabeled videos from the web to learn visual representations. Our key idea is that visual tracking provides the supervision. That is, two patches connected by a track should have a similar visual representation in deep feature space since they probably belong to the same object or object part. We design a Siamese-triplet network with a ranking loss function to train this CNN representation. Without using a single image from ImageNet, just using 100K unlabeled videos and the VOC 2012 dataset, we train an ensemble of unsupervised networks that achieves 52% mAP (no bounding box regression). This performance comes tantalizingly close to its ImageNet-supervised counterpart, an ensemble which achieves a mAP of 54.4%. We also show that our unsupervised network can perform competitively in other tasks such as surface-normal estimation.

826 citations


Proceedings ArticleDOI
10 Dec 2015
TL;DR: Experimental results indicate that the proposed feature selection method, based on a mutual information criterion, is capable of improving the performance of machine learning models in terms of prediction accuracy and reduction in training time.
Abstract: The application of machine learning models such as support vector machines (SVM) and artificial neural networks (ANN) to predicting reservoir properties has been effective in recent years when compared with traditional empirical methods. Despite this, machine learning models suffer in the face of uncertain data, a common characteristic of well log datasets. Sources of uncertainty in well log datasets include missing scale, data interpretation, and measurement error problems. Feature selection aims at selecting a feature subset that is relevant to the predicted property. In this paper a feature selection method based on a mutual information criterion is proposed; its strong point lies in the choice of a threshold based on a statistically sound criterion for the typical greedy feedforward method of feature selection. Experimental results indicate that the proposed method is capable of improving the performance of the machine learning models in terms of prediction accuracy and reduction in training time.
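
A hedged sketch of the filter idea, using scikit-learn's mutual information estimator; the paper's statistically grounded threshold is not reproduced here, so the cut-off below is an arbitrary placeholder:

    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    def select_features(X, y, threshold=0.05):
        mi = mutual_info_regression(X, y)   # one MI score per feature
        keep = np.where(mi > threshold)[0]  # the paper derives this threshold
        return X[:, keep], keep             # statistically; 0.05 is illustrative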

825 citations


Proceedings ArticleDOI
07 Jun 2015
TL;DR: Extensive evaluations on several benchmark image datasets show that the proposed simultaneous feature learning and hash coding pipeline brings substantial improvements over other state-of-the-art supervised or unsupervised hashing methods.
Abstract: Similarity-preserving hashing is a widely-used method for nearest neighbour search in large-scale image retrieval tasks. For most existing hashing methods, an image is first encoded as a vector of hand-engineering visual features, followed by another separate projection or quantization step that generates binary codes. However, such visual feature vectors may not be optimally compatible with the coding process, thus producing sub-optimal hashing codes. In this paper, we propose a deep architecture for supervised hashing, in which images are mapped into binary codes via carefully designed deep neural networks. The pipeline of the proposed deep architecture consists of three building blocks: 1) a sub-network with a stack of convolution layers to produce the effective intermediate image features; 2) a divide-and-encode module to divide the intermediate image features into multiple branches, each encoded into one hash bit; and 3) a triplet ranking loss designed to characterize that one image is more similar to the second image than to the third one. Extensive evaluations on several benchmark image datasets show that the proposed simultaneous feature learning and hash coding pipeline brings substantial improvements over other state-of-the-art supervised or unsupervised hashing methods.
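
Block 2, the divide-and-encode module, is the easiest to sketch: the intermediate feature vector is split into as many slices as there are hash bits, and each slice is squashed to one bit. All projection names below are illustrative:

    import numpy as np

    def divide_and_encode(features, weights, biases):
        # One (weight, bias) pair per hash bit; slice lengths are assumed
        # to match the corresponding projection vectors.
        branches = np.array_split(features, len(weights))
        sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
        acts = np.array([sigmoid(b @ w + c)
                         for b, w, c in zip(branches, weights, biases)])
        return (acts > 0.5).astype(np.uint8)  # the binary hash code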

822 citations


Posted Content
TL;DR: This paper constructs CNNs which are capable of solving the optical flow estimation problem as a supervised learning task, and proposes and compares two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations.
Abstract: Convolutional neural networks (CNNs) have recently been very successful in a variety of computer vision tasks, especially on those linked to recognition. Optical flow estimation has not been among the tasks where CNNs were successful. In this paper we construct appropriate CNNs which are capable of solving the optical flow estimation problem as a supervised learning task. We propose and compare two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations. Since existing ground truth data sets are not sufficiently large to train a CNN, we generate a synthetic Flying Chairs dataset. We show that networks trained on this unrealistic data still generalize very well to existing datasets such as Sintel and KITTI, achieving competitive accuracy at frame rates of 5 to 10 fps.

801 citations


Proceedings ArticleDOI
09 Aug 2015
TL;DR: This paper presents a convolutional neural network architecture for reranking pairs of short texts, where the optimal representation of text pairs and a similarity function to relate them in a supervised way from the available training data are learned.
Abstract: Learning a similarity function between pairs of objects is at the core of learning-to-rank approaches. In information retrieval tasks we typically deal with query-document pairs; in question answering, with question-answer pairs. However, before learning can take place, such pairs need to be mapped from the original space of symbolic words into some feature space encoding various aspects of their relatedness, e.g. lexical, syntactic and semantic. Feature engineering is often a laborious task and may require external knowledge sources that are not always available or difficult to obtain. Recently, deep learning approaches have gained a lot of attention from the research community and industry for their ability to automatically learn optimal feature representations for a given task, while claiming state-of-the-art performance in many tasks in computer vision, speech recognition and natural language processing. In this paper, we present a convolutional neural network architecture for reranking pairs of short texts, where we learn the optimal representation of text pairs and a similarity function to relate them in a supervised way from the available training data. Our network takes only words as input, thus requiring minimal preprocessing. In particular, we consider the task of reranking short text pairs where elements of the pair are sentences. We test our deep learning system on two popular retrieval tasks from TREC: Question Answering and Microblog Retrieval. Our model demonstrates strong performance on the first task, beating previous state-of-the-art systems by about 3% absolute points in both MAP and MRR, and shows comparable results on tweet reranking, while enjoying the benefits of no manual feature engineering and no additional syntactic parsers.
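
At the core of such a reranker is a learned similarity between the query and document representations produced by the convolutional encoders. A common choice, and a plausible reading of the abstract, is a bilinear form sim(q, d) = x_q^T M x_d, with M trained jointly with the rest of the network:

    import numpy as np

    def pair_similarity(x_q, x_d, M):
        # x_q, x_d: sentence embeddings; M: learned similarity matrix.
        return x_q @ M @ x_d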

796 citations


Posted Content
TL;DR: This work revisits both retrieval stages, namely initial search and re-ranking, by employing the same primitive information derived from the CNN, and significantly improves existing CNN-based recognition pipeline.
Abstract: Recently, image representations built upon Convolutional Neural Networks (CNN) have been shown to provide effective descriptors for image search, outperforming pre-CNN features as short-vector representations. Yet such models are not compatible with geometry-aware re-ranking methods and are still outperformed, on some particular object retrieval benchmarks, by traditional image search systems relying on precise descriptor matching, geometric re-ranking, or query expansion. This work revisits both retrieval stages, namely initial search and re-ranking, by employing the same primitive information derived from the CNN. We build compact feature vectors that encode several image regions without the need to feed multiple inputs to the network. Furthermore, we extend integral images to handle max-pooling on convolutional layer activations, allowing us to efficiently localize matching objects. The resulting bounding box is finally used for image re-ranking. As a result, this paper significantly improves the existing CNN-based recognition pipeline: we report for the first time results competing with traditional methods on the challenging Oxford5k and Paris6k datasets.

719 citations


Journal ArticleDOI
TL;DR: This paper surveys state-of-the-art transfer learning algorithms in visual categorization applications, such as object recognition, image classification, and human action recognition, where typical cross-domain problems can be efficiently solved.
Abstract: Regular machine learning and data mining techniques study the training data for future inferences under the major assumption that the future data are within the same feature space or have the same distribution as the training data. However, due to the limited availability of human-labeled training data, training data that stay in the same feature space or have the same distribution as the future data cannot be guaranteed to be sufficient to avoid the over-fitting problem. In real-world applications, apart from data in the target domain, related data in a different domain can also be included to expand the availability of our prior knowledge about the target future data. Transfer learning addresses such cross-domain learning problems by extracting useful information from data in a related domain and transferring it for use in target tasks. In recent years, with transfer learning being applied to visual categorization, some typical problems, e.g., view divergence in action recognition tasks and concept drifting in image classification tasks, can be efficiently solved. In this paper, we survey state-of-the-art transfer learning algorithms in visual categorization applications, such as object recognition, image classification, and human action recognition.

Journal ArticleDOI
TL;DR: This article proposes a much more flexible web server called Pse-in-One, which can, through its 28 different modes, generate nearly all the possible feature vectors for DNA, RNA and protein sequences, and can also generate those feature vectors with the properties defined by users themselves.
Abstract: With the avalanche of biological sequences generated in the post-genomic age, one of the most challenging problems in computational biology is how to effectively formulate the sequence of a biological sample (such as DNA, RNA or protein) with a discrete model or a vector that can effectively reflect its sequence pattern information or capture its key features of concern. Although several web servers and stand-alone tools have been developed to address this problem, all of these tools can only handle one type of sample. Furthermore, the number of their built-in properties is limited, and hence it is often difficult for users to formulate biological sequences according to their desired features or properties. In this article, with a much larger number of built-in properties, we propose a much more flexible web server called Pse-in-One (http://bioinformatics.hitsz.edu.cn/Pse-in-One/), which can, through its 28 different modes, generate nearly all the possible feature vectors for DNA, RNA and protein sequences. In particular, it can also generate feature vectors with properties defined by users themselves. These feature vectors can be easily combined with machine-learning algorithms to develop computational predictors and analysis methods for various tasks in bioinformatics and systems biology. It is anticipated that the Pse-in-One web server will become a very useful tool in computational proteomics, genomics, and biological sequence analysis. Moreover, to maximize users’ convenience, its stand-alone version can also be downloaded from http://bioinformatics.hitsz.edu.cn/Pse-in-One/download/, and run directly on Windows, Linux, Unix and Mac OS.
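
For flavour, the simplest kind of feature vector such a tool can produce is plain k-mer composition; Pse-in-One's 28 modes cover far richer pseudo-component features than this sketch, which assumes a sequence over the given alphabet:

    from itertools import product

    def kmer_composition(seq, k=2, alphabet="ACGT"):
        # Normalised k-mer frequencies for a DNA sequence over ACGT.
        kmers = ["".join(p) for p in product(alphabet, repeat=k)]
        counts = {km: 0 for km in kmers}
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
        total = max(1, len(seq) - k + 1)
        return [counts[km] / total for km in kmers]

    # kmer_composition("ACGTACGT") -> 16-dimensional dinucleotide vector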

Proceedings ArticleDOI
18 May 2015
TL;DR: This work proposes a content-based recommendation system to address both the recommendation quality and the system scalability, and proposes to use a rich feature set to represent users, according to their web browsing history and search queries, using a Deep Learning approach.
Abstract: Recent online services rely heavily on automatic personalization to recommend relevant content to a large number of users. This requires systems to scale promptly to accommodate the stream of new users visiting the online services for the first time. In this work, we propose a content-based recommendation system to address both the recommendation quality and the system scalability. We propose to use a rich feature set to represent users, according to their web browsing history and search queries. We use a Deep Learning approach to map users and items to a latent space where the similarity between users and their preferred items is maximized. We extend the model to jointly learn from features of items from different domains and user features by introducing a multi-view Deep Learning model. We show how to make this rich-feature-based user representation scalable by reducing the dimension of the inputs and the amount of training data. The rich user feature representation allows the model to learn relevant user behavior patterns and give useful recommendations for users who do not have any interaction with the service, given that they have adequate search and browsing history. The combination of different domains into a single model for learning helps improve the recommendation quality across all the domains, as well as yielding a more compact and semantically richer user latent feature vector. We experiment with our approach on three real-world recommendation systems acquired from different sources of Microsoft products: Windows Apps recommendation, News recommendation, and Movie/TV recommendation. Results indicate that our approach is significantly better than the state-of-the-art algorithms (up to 49% enhancement on existing users and 115% enhancement on new users). In addition, experiments on a public data set also indicate the superiority of our method in comparison with traditional generative topic models for modeling cross-domain recommender systems. Scalability analysis shows that our multi-view DNN model can easily scale to encompass millions of users and billions of item entries. Experimental results also confirm that combining features from all domains produces much better performance than building separate models for each domain.
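
The heart of the approach is scoring user-item pairs by similarity in the shared latent space. A heavily simplified sketch, with a single linear map per view standing in for the deep towers (all names illustrative):

    import numpy as np

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    def score_items(user_x, item_xs, W_user, W_item):
        # Project both views into the shared latent space, then rank items
        # for this user by cosine similarity.
        u = np.tanh(W_user @ user_x)
        return [cosine(u, np.tanh(W_item @ x)) for x in item_xs]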

Proceedings ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a triplet ranking loss to characterize that one image is more similar to the second image than to the third one, which achieved state-of-the-art performance.
Abstract: Similarity-preserving hashing is a widely-used method for nearest neighbour search in large-scale image retrieval tasks. For most existing hashing methods, an image is first encoded as a vector of hand-engineering visual features, followed by another separate projection or quantization step that generates binary codes. However, such visual feature vectors may not be optimally compatible with the coding process, thus producing sub-optimal hashing codes. In this paper, we propose a deep architecture for supervised hashing, in which images are mapped into binary codes via carefully designed deep neural networks. The pipeline of the proposed deep architecture consists of three building blocks: 1) a sub-network with a stack of convolution layers to produce the effective intermediate image features; 2) a divide-and-encode module to divide the intermediate image features into multiple branches, each encoded into one hash bit; and 3) a triplet ranking loss designed to characterize that one image is more similar to the second image than to the third one. Extensive evaluations on several benchmark image datasets show that the proposed simultaneous feature learning and hash coding pipeline brings substantial improvements over other state-of-the-art supervised or unsupervised hashing methods.

Posted Content
TL;DR: In this paper, a Siamese-triplet network with a ranking loss function was proposed to train a CNN representation for unsupervised learning of visual representations. But this method requires a large amount of unlabeled videos from the web to train the network.
Abstract: Is strong supervision necessary for learning a good visual representation? Do we really need millions of semantically-labeled images to train a Convolutional Neural Network (CNN)? In this paper, we present a simple yet surprisingly powerful approach for unsupervised learning of CNNs. Specifically, we use hundreds of thousands of unlabeled videos from the web to learn visual representations. Our key idea is that visual tracking provides the supervision. That is, two patches connected by a track should have a similar visual representation in deep feature space since they probably belong to the same object or object part. We design a Siamese-triplet network with a ranking loss function to train this CNN representation. Without using a single image from ImageNet, just using 100K unlabeled videos and the VOC 2012 dataset, we train an ensemble of unsupervised networks that achieves 52% mAP (no bounding box regression). This performance comes tantalizingly close to its ImageNet-supervised counterpart, an ensemble which achieves a mAP of 54.4%. We also show that our unsupervised network can perform competitively in other tasks such as surface-normal estimation.

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a supervised learning framework to generate compact and bit-scalable hashing codes directly from raw images, where they pose hashing learning as a problem of regularized similarity learning.
Abstract: Extracting informative image features and learning effective approximate hashing functions are two crucial steps in image retrieval. Conventional methods often study these two steps separately, e.g., learning hash functions from a predefined hand-crafted feature space. Meanwhile, the bit lengths of output hashing codes are preset in most previous methods, neglecting the significance level of different bits and restricting their practical flexibility. To address these issues, we propose a supervised learning framework to generate compact and bit-scalable hashing codes directly from raw images. We pose hashing learning as a problem of regularized similarity learning. In particular, we organize the training images into a batch of triplet samples, each sample containing two images with the same label and one with a different label. With these triplet samples, we maximize the margin between the matched pairs and the mismatched pairs in the Hamming space. In addition, a regularization term is introduced to enforce the adjacency consistency, i.e., images of similar appearances should have similar codes. The deep convolutional neural network is utilized to train the model in an end-to-end fashion, where discriminative image features and hash functions are simultaneously optimized. Furthermore, each bit of our hashing codes is unequally weighted, so that we can manipulate the code lengths by truncating the insignificant bits. Our framework outperforms the state of the art on public benchmarks of similar image search and also achieves promising results in the application of person re-identification in surveillance. It is also shown that the generated bit-scalable hashing codes well preserve the discriminative powers with shorter code lengths.

Proceedings ArticleDOI
07 Dec 2015
TL;DR: A novel ZSL method is proposed based on unsupervised domain adaptation which uses the target domain class labels' projections in the semantic space to regularise the learned target domain projection thus effectively overcoming the projection domain shift problem.
Abstract: Zero-shot learning (ZSL) can be considered as a special case of transfer learning where the source and target domains have different tasks/label spaces and the target domain is unlabelled, providing little guidance for the knowledge transfer. A ZSL method typically assumes that the two domains share a common semantic representation space, where a visual feature vector extracted from an image/video can be projected/embedded using a projection function. Existing approaches learn the projection function from the source domain and apply it without adaptation to the target domain. They are thus based on naive knowledge transfer and the learned projections are prone to the domain shift problem. In this paper a novel ZSL method is proposed based on unsupervised domain adaptation. Specifically, we formulate a novel regularised sparse coding framework which uses the target domain class labels' projections in the semantic space to regularise the learned target domain projection, thus effectively overcoming the projection domain shift problem. Extensive experiments on four object and action recognition benchmark datasets show that the proposed ZSL method significantly outperforms the state of the art.

Proceedings ArticleDOI
01 Jan 2015
TL;DR: A novel way of extracting features from short texts, based on the activation values of an inner layer of a deep convolutional neural network, is presented and a parallelizable decision-level data fusion method is presented, which is much faster, though slightly less accurate.
Abstract: We present a novel way of extracting features from short texts, based on the activation values of an inner layer of a deep convolutional neural network. We use the extracted features in multimodal sentiment analysis of short video clips representing one sentence each. We use the combined feature vectors of textual, visual, and audio modalities to train a classifier based on multiple kernel learning, which is known to be good at heterogeneous data. We obtain 14% performance improvement over the state of the art and present a parallelizable decision-level data fusion method, which is much faster, though slightly less accurate.

Journal ArticleDOI
TL;DR: A novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in the transductive embedding space that rectifies the projection shift between the auxiliary and target domains, exploits the complementarity of multiple semantic representations, and significantly outperforms existing methods for both zero- shot and N-shot recognition.
Abstract: Most existing zero-shot learning approaches exploit transfer learning via an intermediate semantic representation shared between an annotated auxiliary dataset and a target dataset with different classes and no annotation. A projection from a low-level feature space to the semantic representation space is learned from the auxiliary dataset and applied without adaptation to the target dataset. In this paper we identify two inherent limitations with these approaches. First, due to having disjoint and potentially unrelated classes, the projection functions learned from the auxiliary dataset/domain are biased when applied directly to the target dataset/domain. We call this problem the projection domain shift problem and propose a novel framework, transductive multi-view embedding , to solve it. The second limitation is the prototype sparsity problem which refers to the fact that for each target class, only a single prototype is available for zero-shot learning given a semantic representation. To overcome this problem, a novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in the transductive embedding space. It effectively exploits the complementary information offered by different semantic representations and takes advantage of the manifold structures of multiple representation spaces in a coherent manner. We demonstrate through extensive experiments that the proposed approach (1) rectifies the projection shift between the auxiliary and target domains, (2) exploits the complementarity of multiple semantic representations, (3) significantly outperforms existing methods for both zero-shot and N-shot recognition on three image and video benchmark datasets, and (4) enables novel cross-view annotation tasks.

Proceedings ArticleDOI
07 Jun 2015
TL;DR: A superpixel segmentation algorithm called Linear Spectral Clustering (LSC), which produces compact and uniform superpixels with low computational costs and is able to preserve global properties of images.
Abstract: We present in this paper a superpixel segmentation algorithm called Linear Spectral Clustering (LSC), which produces compact and uniform superpixels with low computational costs. Basically, a normalized cuts formulation of the superpixel segmentation is adopted based on a similarity metric that measures the color similarity and space proximity between image pixels. However, instead of using the traditional eigen-based algorithm, we approximate the similarity metric using a kernel function, leading to an explicit mapping of pixel values and coordinates into a high-dimensional feature space. We revisit the conclusion that by appropriately weighting each point in this feature space, the objective functions of weighted K-means and normalized cuts share the same optimum point. As such, it is possible to optimize the cost function of normalized cuts by iteratively applying simple K-means clustering in the proposed feature space. LSC is of linear computational complexity and high memory efficiency and is able to preserve global properties of images. Experimental results show that LSC performs as well as or better than state-of-the-art superpixel segmentation algorithms in terms of several commonly used evaluation metrics in image segmentation.
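
The mechanics can be sketched by clustering pixels in a joint colour/position feature space; note that LSC's actual mapping is a kernel-induced expansion into a high-dimensional space combined with weighted K-means, whereas the toy below just concatenates weighted coordinates:

    import numpy as np
    from sklearn.cluster import KMeans

    def toy_superpixels(image_lab, n_segments=200, compactness=0.1):
        # image_lab: (H, W, 3) CIELAB image; compactness weights position
        # against colour (value illustrative).
        H, W, _ = image_lab.shape
        ys, xs = np.mgrid[0:H, 0:W]
        pos = compactness * np.stack([ys.ravel(), xs.ravel()], axis=1)
        feats = np.concatenate([image_lab.reshape(-1, 3), pos], axis=1)
        labels = KMeans(n_clusters=n_segments, n_init=1).fit_predict(feats)
        return labels.reshape(H, W)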

Journal ArticleDOI
TL;DR: Lift is proposed, an intuitive yet effective algorithm that constructs features specific to each label by conducting clustering analysis on its positive and negative instances, and then performs training and testing by querying the clustering results.
Abstract: Multi-label learning deals with the problem where each example is represented by a single instance (feature vector) while associated with a set of class labels. Existing approaches learn from multi-label data by manipulating an identical feature set, i.e. the very instance representation of each example is employed in the discrimination processes of all class labels. However, this popular strategy might be suboptimal as each label is supposed to possess specific characteristics of its own. In this paper, another strategy to learn from multi-label data is studied, where label-specific features are exploited to benefit the discrimination of different class labels. Accordingly, an intuitive yet effective algorithm named Lift, i.e. multi-label learning with Label specIfic FeaTures, is proposed. Lift firstly constructs features specific to each label by conducting clustering analysis on its positive and negative instances, and then performs training and testing by querying the clustering results. Comprehensive experiments on a total of 17 benchmark data sets clearly validate the superiority of Lift against other well-established multi-label learning algorithms as well as the effectiveness of label-specific features.
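
A sketch of Lift's feature construction for a single label, following the abstract: cluster the label's positive and negative instances separately, then re-represent every instance by its distances to the resulting centres (the cluster-count ratio is illustrative):

    import numpy as np
    from sklearn.cluster import KMeans

    def lift_features(X, y, ratio=0.1):
        # y: binary indicator for one label; X: (n, d) instances.
        pos, neg = X[y == 1], X[y == 0]
        k = max(1, int(ratio * min(len(pos), len(neg))))
        centres = np.vstack([KMeans(n_clusters=k, n_init=1).fit(pos).cluster_centers_,
                             KMeans(n_clusters=k, n_init=1).fit(neg).cluster_centers_])
        # Distances to all centres become the label-specific features.
        return np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)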

Journal ArticleDOI
TL;DR: This work introduces and evaluates a set of feature vector representations of crystal structures for machine learning (ML) models of formation energies of solids.
Abstract: We introduce and evaluate a set of feature vector representations of crystal structures for machine learning (ML) models of formation energies of solids. ML models of atomization energies of organic molecules …

Journal ArticleDOI
TL;DR: An improved algorithm SVM-RFE + CBR is proposed by incorporating the correlation bias reduction (CBR) strategy into the feature elimination procedure, which outperforms the original SVM-RFE and other typical algorithms.
Abstract: Support vector machine recursive feature elimination (SVM-RFE) is a powerful feature selection algorithm. However, when the candidate feature set contains highly correlated features, the ranking criterion of SVM-RFE will be biased, which would hinder the application of SVM-RFE on gas sensor data. In this paper, the linear and nonlinear SVM-RFE algorithms are studied. After investigating the correlation bias, an improved algorithm SVM-RFE + CBR is proposed by incorporating the correlation bias reduction (CBR) strategy into the feature elimination procedure. Experiments are conducted on a synthetic dataset and two breath analysis datasets, one of which contains temperature modulated sensors. Large and comprehensive sets of transient features are extracted from the sensor responses. The classification accuracy with feature selection proves the efficacy of the proposed SVM-RFE + CBR. It outperforms the original SVM-RFE and other typical algorithms. An ensemble method is further studied to improve the stability of the proposed method. By statistically analyzing the features’ rankings, some knowledge is obtained, which can guide future design of e-noses and feature extraction algorithms.
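
For reference, plain linear SVM-RFE (without the paper's CBR correction) repeatedly trains a linear SVM and discards the feature with the smallest squared weight, the ranking criterion that CBR adjusts when features are correlated. A sketch assuming a binary classification task:

    import numpy as np
    from sklearn.svm import SVC

    def svm_rfe(X, y, n_keep=10):
        remaining = list(range(X.shape[1]))
        while len(remaining) > n_keep:
            # coef_[0] holds the weight vector for a binary linear SVM.
            w = SVC(kernel="linear").fit(X[:, remaining], y).coef_[0]
            remaining.pop(int(np.argmin(w ** 2)))  # drop least useful feature
        return remaining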

Proceedings ArticleDOI
19 Apr 2015
TL;DR: This work proposes a different approach, which, similar to natural language modeling, learns the language of malware spoken through the executed instructions and extracts robust, time domain features.
Abstract: Attackers often create systems that automatically rewrite and reorder their malware to avoid detection. Typical machine learning approaches, which learn a classifier based on a handcrafted feature vector, are not sufficiently robust to such reorderings. We propose a different approach, which, similar to natural language modeling, learns the language of malware spoken through the executed instructions and extracts robust, time domain features. Echo state networks (ESNs) and recurrent neural networks (RNNs) are used for the projection stage that extracts the features. These models are trained in an unsupervised fashion. A standard classifier uses these features to detect malicious files. We explore a few variants of ESNs and RNNs for the projection stage, including Max-Pooling and Half-Frame models which we propose. The best performing hybrid model uses an ESN for the recurrent model, Max-Pooling for non-linear sampling, and logistic regression for the final classification. Compared to the standard trigram of events model, it improves the true positive rate by 98.3% at a false positive rate of 0.1%.
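
A minimal echo state network projection in the spirit of the Max-Pooling variant: a fixed random reservoir is driven by one-hot event codes and the states are max-pooled over time into a feature vector. Sizes, scalings, and the input encoding are illustrative choices, not the paper's:

    import numpy as np

    def esn_features(events, n_reservoir=300, seed=0):
        rng = np.random.default_rng(seed)
        n_events = int(max(events)) + 1
        W_in = rng.uniform(-0.1, 0.1, (n_reservoir, n_events))
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        W *= 0.9 / max(abs(np.linalg.eigvals(W)))   # echo state property
        x = np.zeros(n_reservoir)
        pooled = np.full(n_reservoir, -np.inf)
        for e in events:
            u = np.zeros(n_events); u[e] = 1.0      # one-hot instruction code
            x = np.tanh(W_in @ u + W @ x)           # untrained recurrence
            pooled = np.maximum(pooled, x)          # max-pooling over time
        return pooled  # features for a standard classifier, e.g. logistic regression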

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Nearest Non-Outlier (NNO) as mentioned in this paper is an open-world recognition algorithm that evolves the model efficiently, adding object categories incrementally while detecting outliers and managing open space risk.
Abstract: With the advent of rich classification models and high computational power, visual recognition systems have found many operational applications. Recognition in the real world poses multiple challenges that are not apparent in controlled lab environments. The datasets are dynamic and novel categories must be continuously detected and then added. At prediction time, a trained system has to deal with myriad unseen categories. Operational systems require minimal downtime, even to learn. To handle these operational issues, we present the problem of Open World Recognition and formally define it. We prove that thresholding sums of monotonically decreasing functions of distances in linearly transformed feature space can balance “open space risk” and empirical risk. Our theory extends existing algorithms for open world recognition. We present a protocol for evaluation of open world recognition systems. We present the Nearest Non-Outlier (NNO) algorithm that evolves the model efficiently, adding object categories incrementally while detecting outliers and managing open space risk. We perform experiments on the ImageNet dataset with 1.2M+ images to validate the effectiveness of our method on large-scale visual recognition tasks. NNO consistently yields superior results on open world recognition.
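
The decision rule can be sketched as scoring each known class with a monotonically decreasing function of distance and abstaining when every score vanishes; the real NNO operates in a learned, linearly transformed feature space, so the plain distance below is a simplification:

    import numpy as np

    def nno_predict(x, class_means, tau):
        # tau bounds the open-space risk: beyond it, nothing is claimed.
        scores = tau - np.linalg.norm(class_means - x, axis=1)
        best = int(np.argmax(scores))
        return best if scores[best] > 0 else -1   # -1 = unknown/outlier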

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a deep learning structure composed of a set of elaborately designed convolutional neural networks (CNNs) and a three-layer stacked auto-encoder (SAE).
Abstract: Face images appearing in multimedia applications, e.g., social networks and digital entertainment, usually exhibit dramatic pose, illumination, and expression variations, resulting in considerable performance degradation for traditional face recognition algorithms. This paper proposes a comprehensive deep learning framework to jointly learn face representation using multimodal information. The proposed deep learning structure is composed of a set of elaborately designed convolutional neural networks (CNNs) and a three-layer stacked auto-encoder (SAE). The set of CNNs extracts complementary facial features from multimodal data. Then, the extracted features are concatenated to form a high-dimensional feature vector, whose dimension is compressed by SAE. All of the CNNs are trained using a subset of 9,000 subjects from the publicly available CASIA-WebFace database, which ensures the reproducibility of this work. Using the proposed single CNN architecture and limited training data, 98.43% verification rate is achieved on the LFW database. Benefitting from the complementary information contained in multimodal data, our small ensemble system achieves higher than 99.0% recognition rate on LFW using publicly available training set.

Journal ArticleDOI
TL;DR: A comprehensive deep learning framework to jointly learn face representation using multimodal information is proposed and a small ensemble system achieves higher than 99.0% recognition rate on LFW using publicly available training set.
Abstract: Face images appearing in multimedia applications, e.g., social networks and digital entertainment, usually exhibit dramatic pose, illumination, and expression variations, resulting in considerable performance degradation for traditional face recognition algorithms. This paper proposes a comprehensive deep learning framework to jointly learn face representation using multimodal information. The proposed deep learning structure is composed of a set of elaborately designed convolutional neural networks (CNNs) and a three-layer stacked auto-encoder (SAE). The set of CNNs extracts complementary facial features from multimodal data. Then, the extracted features are concatenated to form a high-dimensional feature vector, whose dimension is compressed by SAE. All the CNNs are trained using a subset of 9,000 subjects from the publicly available CASIA-WebFace database, which ensures the reproducibility of this work. Using the proposed single CNN architecture and limited training data, a 98.43% verification rate is achieved on the LFW database. Benefiting from the complementary information contained in multimodal data, our small ensemble system achieves higher than a 99.0% recognition rate on LFW using a publicly available training set.

Journal ArticleDOI
TL;DR: The main contributions are the strategy of using several different feature models in a single pool, together with feature selection to optimize the fault diagnosis system, and the use of robust performance estimation techniques not usually encountered in an engineering context.
Abstract: Distinct feature extraction methods are simultaneously used to describe bearing faults. This approach produces a large number of heterogeneous features that augment discriminative information but, at the same time, create irrelevant and redundant information. A subsequent feature selection phase filters out the most discriminative features. The feature models are based on the complex envelope spectrum, statistical time- and frequency-domain parameters, and wavelet packet analysis. Feature selection is achieved by conventional search of the feature space by greedy methods. For the final fault diagnosis, the k-nearest neighbor classifier, feedforward net, and support vector machine are used. Performance criteria are the estimated error rate and the area under the receiver operating characteristic curve (AUC-ROC). Experimental results are shown for the Case Western Reserve University Bearing Data. The main contribution of this paper is the strategy to use several different feature models in a single pool, together with feature selection to optimize the fault diagnosis system. Moreover, robust performance estimation techniques usually not encountered in the context of engineering are employed.
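
A compact rendering of that pipeline with scikit-learn, assuming the heterogeneous feature pool has already been extracted upstream and a binary healthy-vs-faulty task (all parameter values illustrative):

    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import roc_auc_score

    def diagnose(X_tr, y_tr, X_te, y_te, n_features=10):
        knn = KNeighborsClassifier(n_neighbors=5)
        sel = SequentialFeatureSelector(knn, n_features_to_select=n_features)
        X_tr_s = sel.fit_transform(X_tr, y_tr)     # greedy forward search
        knn.fit(X_tr_s, y_tr)
        proba = knn.predict_proba(sel.transform(X_te))[:, 1]
        return roc_auc_score(y_te, proba)          # AUC-ROC criterion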

Journal ArticleDOI
TL;DR: This article extended two Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus.
Abstract: Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.

Proceedings ArticleDOI
07 Dec 2015
TL;DR: A feature selection method exploiting the convergence properties of power series of matrices and introducing the concept of infinite feature selection (Inf-FS), which permits the investigation of the importance (relevance and redundancy) of a feature when injected into an arbitrary set of cues.
Abstract: Filter-based feature selection has become crucial in many classification settings, especially object recognition, recently faced with feature learning strategies that originate thousands of cues. In this paper, we propose a feature selection method exploiting the convergence properties of power series of matrices, and introducing the concept of infinite feature selection (Inf-FS). Considering a selection of features as a path among feature distributions and letting these paths tend to an infinite number permits the investigation of the importance (relevance and redundancy) of a feature when injected into an arbitrary set of cues. Ranking the importance individuates candidate features, which turn out to be effective from a classification point of view, as proved by a thorough experimental section. The Inf-FS has been tested on thirteen diverse benchmarks, comparing against filters, embedded methods, and wrappers; in all cases we achieve top performance, notably on the classification tasks of PASCAL VOC 2007-2012.
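
The power-series trick is what makes "infinite" tractable: for a small enough weight alpha, the sum over paths of all lengths through a feature-adjacency graph A collapses to a matrix inverse. A sketch of the ranking step (building A from relevance/redundancy statistics, the paper's other ingredient, is omitted):

    import numpy as np

    def inf_fs_rank(A, alpha=None):
        # S = sum_{l>=1} (alpha*A)^l = (I - alpha*A)^{-1} - I
        n = A.shape[0]
        if alpha is None:
            alpha = 0.9 / max(abs(np.linalg.eigvals(A)))  # ensure convergence
        S = np.linalg.inv(np.eye(n) - alpha * A) - np.eye(n)
        return np.argsort(-S.sum(axis=1))   # most important features first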