
Showing papers on "Feature vector" published in 2016


Proceedings Article
19 Jun 2016
TL;DR: Deep Embedded Clustering (DEC) as discussed by the authors learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective.
Abstract: Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. DEC learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective. Our experimental evaluations on image and text corpora show significant improvement over state-of-the-art methods.
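To make the clustering objective concrete, the following is a minimal NumPy sketch of DEC's soft assignment (Student's t kernel) and sharpened target distribution; the embedded points, cluster centres, and dimensions are toy placeholders, and the actual method optimizes this KL loss jointly with the encoder parameters.

```python
import numpy as np

def soft_assign(z, mu, alpha=1.0):
    """Student's t soft assignment q_ij between embedded points z and cluster centres mu."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)      # squared distances, shape (n, k)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target distribution p_ij used in the KL clustering objective."""
    w = q ** 2 / q.sum(axis=0)            # square assignments, normalise by cluster frequency
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
z  = rng.normal(size=(100, 10))           # embedded points (would come from the encoder)
mu = rng.normal(size=(3, 10))             # cluster centres
q  = soft_assign(z, mu)
p  = target_distribution(q)
kl = np.sum(p * np.log(p / q))            # clustering loss to minimise
print(float(kl))
```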

1,776 citations


Proceedings ArticleDOI
De Cheng, Yihong Gong, Sanping Zhou, Jinjun Wang, Nanning Zheng
27 Jun 2016
TL;DR: A novel multi-channel parts-based convolutional neural network model under the triplet framework for person re-identification that significantly outperforms many state-of-the-art approaches, including both traditional and deep network-based ones, on the challenging i-LIDS, VIPeR, PRID2011 and CUHK01 datasets.
Abstract: Person re-identification across cameras remains a very challenging problem, especially when there are no overlapping fields of view between cameras. In this paper, we present a novel multi-channel parts-based convolutional neural network (CNN) model under the triplet framework for person re-identification. Specifically, the proposed CNN model consists of multiple channels to jointly learn both the global full-body and local body-parts features of the input persons. The CNN model is trained by an improved triplet loss function that serves to pull the instances of the same person closer, and at the same time push the instances belonging to different persons farther from each other in the learned feature space. Extensive comparative evaluations demonstrate that our proposed method significantly outperforms many state-of-the-art approaches, including both traditional and deep network-based ones, on the challenging i-LIDS, VIPeR, PRID2011 and CUHK01 datasets.
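As a rough illustration of the triplet idea described above (not the paper's exact improved loss, which adds further constraints between the distances), here is a minimal NumPy sketch of a margin-based triplet loss over batches of anchor, positive, and negative feature vectors; the data and margin value are placeholders.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull anchor-positive pairs together, push anchor-negative apart."""
    d_ap = np.sum((anchor - positive) ** 2, axis=1)   # distance to same-person instance
    d_an = np.sum((anchor - negative) ** 2, axis=1)   # distance to different-person instance
    return np.maximum(0.0, d_ap - d_an + margin).mean()

rng = np.random.default_rng(0)
anchor, positive, negative = (rng.normal(size=(4, 128)) for _ in range(3))
print(triplet_loss(anchor, positive, negative))
```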

1,265 citations


Proceedings Article
19 Jun 2016
TL;DR: In this article, a semi-supervised learning framework based on graph embeddings is proposed, where given a graph between instances, an embedding for each instance is trained to jointly predict the class label and the neighborhood context in the graph.
Abstract: We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method. In the transductive variant of our method, the class labels are determined by both the learned embeddings and input feature vectors, while in the inductive variant, the embeddings are defined as a parametric function of the feature vectors, so predictions can be made on instances not seen during training. On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models.

1,012 citations


Journal ArticleDOI
TL;DR: Results prove the capability of the proposed binary version of grey wolf optimization (bGWO) to search the feature space for optimal feature combinations regardless of the initialization and the used stochastic operators.

958 citations


Book ChapterDOI
01 Jan 2016
TL;DR: This chapter simplifies the Lagrangian support vector machine approach using process diagrams and data flow diagrams to help readers understand theory and implement it successfully.
Abstract: Support Vector Machine is one of the classical machine learning techniques that can still help solve big data classification problems. Especially, it can help the multidomain applications in a big data environment. However, the support vector machine is mathematically complex and computationally expensive. The main objective of this chapter is to simplify this approach using process diagrams and data flow diagrams to help readers understand theory and implement it successfully. To achieve this objective, the chapter is divided into three parts: (1) modeling of a linear support vector machine; (2) modeling of a nonlinear support vector machine; and (3) Lagrangian support vector machine algorithm and its implementations. The Lagrangian support vector machine with simple examples is also implemented using the R programming platform on Hadoop and non-Hadoop systems.

938 citations


Journal ArticleDOI
TL;DR: A hybrid model where an unsupervised DBN is trained to extract generic underlying features, and a one-class SVM is trained from the features learned by the DBN, which delivers a comparable accuracy with a deep autoencoder and is scalable and computationally efficient.

876 citations


Journal ArticleDOI
TL;DR: This paper proposes a simple and effective scheme for dependency parsing based on bidirectional LSTMs (BiLSTMs), in which feature vectors are constructed by concatenating a few BiLSTM vectors.
Abstract: We present a simple and effective scheme for dependency parsing which is based on bidirectional-LSTMs (BiLSTMs). Each sentence token is associated with a BiLSTM vector representing the token in its sentential context, and feature vectors are constructed by concatenating a few BiLSTM vectors. The BiLSTM is trained jointly with the parser objective, resulting in very effective feature extractors for parsing. We demonstrate the effectiveness of the approach by applying it to a greedy transition-based parser as well as to a globally optimized graph-based parser. The resulting parsers have very simple architectures, and match or surpass the state-of-the-art accuracies on English and Chinese.
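A minimal PyTorch sketch of the feature construction described above: each token gets a BiLSTM vector, and a candidate head/modifier pair is scored from the concatenation of their two vectors. The vocabulary size, dimensions, and the single linear scorer are hypothetical stand-ins for the parser's actual scoring MLP.

```python
import torch
import torch.nn as nn

emb_dim, hid_dim, sent_len = 100, 125, 7
embed  = nn.Embedding(1000, emb_dim)                      # toy vocabulary of 1000 tokens
bilstm = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)

tokens = torch.randint(0, 1000, (1, sent_len))            # one toy sentence
vecs, _ = bilstm(embed(tokens))                           # (1, sent_len, 2 * hid_dim)

head, modifier = 2, 5                                     # candidate dependency arc
feature = torch.cat([vecs[0, head], vecs[0, modifier]])   # concatenated BiLSTM feature vector
score = nn.Linear(feature.numel(), 1)(feature)            # stand-in for the parser's scoring layer
print(feature.shape, score.shape)
```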

702 citations


Journal ArticleDOI
TL;DR: A unique watermark is directly embedded into the encrypted images by the cloud server before the images are sent to the query user, so that when an illegal image copy is found, the query user who distributed it can be traced through watermark extraction.
Abstract: With the increasing importance of images in people’s daily life, content-based image retrieval (CBIR) has been widely studied. Compared with text documents, images consume much more storage space. Hence, its maintenance is considered to be a typical example for cloud storage outsourcing. For privacy-preserving purposes, sensitive images, such as medical and personal images, need to be encrypted before outsourcing, which makes the CBIR technologies in plaintext domain to be unusable. In this paper, we propose a scheme that supports CBIR over encrypted images without leaking the sensitive information to the cloud server. First, feature vectors are extracted to represent the corresponding images. After that, the pre-filter tables are constructed by locality-sensitive hashing to increase search efficiency. Moreover, the feature vectors are protected by the secure kNN algorithm, and image pixels are encrypted by a standard stream cipher. In addition, considering the case that the authorized query users may illegally copy and distribute the retrieved images to someone unauthorized, we propose a watermark-based protocol to deter such illegal distributions. In our watermark-based protocol, a unique watermark is directly embedded into the encrypted images by the cloud server before images are sent to the query user. Hence, when image copy is found, the unlawful query user who distributed the image can be traced by the watermark extraction. The security analysis and the experiments show the security and efficiency of the proposed scheme.
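The sketch below illustrates only the pre-filter-table idea in plaintext: sign-of-random-projection locality-sensitive hashing buckets similar feature vectors together so that search can be restricted to one bucket. The hash family, dimensions, and data are assumptions; the paper additionally protects the feature vectors with the secure kNN algorithm and encrypts pixels with a stream cipher, which are not shown here.

```python
import numpy as np

rng = np.random.default_rng(1)

def lsh_signature(features, planes):
    """Sign-of-projection LSH: nearby feature vectors tend to share the same bucket key."""
    return (features @ planes.T > 0).astype(np.uint8)

dim, n_bits = 128, 16
planes = rng.normal(size=(n_bits, dim))            # shared random hyperplanes
db = rng.normal(size=(1000, dim))                  # feature vectors extracted from images

table = {}                                         # pre-filter table: bucket key -> image ids
for img_id, sig in enumerate(lsh_signature(db, planes)):
    table.setdefault(sig.tobytes(), []).append(img_id)

query = db[42] + 0.01 * rng.normal(size=dim)       # slightly perturbed query vector
candidates = table.get(lsh_signature(query[None], planes)[0].tobytes(), [])
print(42 in candidates, len(candidates))
```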

563 citations


Journal ArticleDOI
TL;DR: This work shows that unsupervised learning techniques can be readily used to identify phases and phase transitions of many-body systems by using principal component analysis to extract relevant low-dimensional representations of the original data and clustering analysis to identify distinct phases in the feature space.
Abstract: Unsupervised learning is a discipline of machine learning which aims at discovering patterns in large data sets or classifying the data into several categories without being trained explicitly. We show that unsupervised learning techniques can be readily used to identify phases and phase transitions of many-body systems. Starting with raw spin configurations of a prototypical Ising model, we use principal component analysis to extract relevant low-dimensional representations of the original data and use clustering analysis to identify distinct phases in the feature space. This approach successfully finds physical concepts such as the order parameter and structure factor to be indicators of a phase transition. We discuss the future prospects of discovering more complex phases and phase transitions using unsupervised learning techniques.
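A minimal scikit-learn sketch of the pipeline described above, using crude synthetic stand-ins for ordered and disordered spin configurations rather than actual Ising samples: PCA extracts a low-dimensional representation, clustering separates the two phases, and the first principal component tracks the magnetisation (the order parameter).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
L = 16                                          # lattice size (hypothetical)

# Toy stand-ins for raw spin configurations: mostly-aligned (ordered) vs random (disordered).
ordered    = np.sign(rng.normal(0.8, 0.5, size=(200, L * L)))
disordered = rng.choice([-1, 1], size=(200, L * L))
X = np.vstack([ordered, disordered])

z = PCA(n_components=2).fit_transform(X)        # low-dimensional representation
labels = KMeans(n_clusters=2, n_init=10).fit_predict(z)   # distinct phases in feature space

# The first principal component correlates with the magnetisation, i.e. the order parameter.
print(np.corrcoef(z[:, 0], X.mean(axis=1))[0, 1], np.bincount(labels))
```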

535 citations


Proceedings Article
02 May 2016
TL;DR: In this article, the authors revisited both retrieval stages, namely initial search and re-ranking, by employing the same primitive information derived from the CNN, and built compact feature vectors that encode several image regions without the need to feed multiple inputs to the network.
Abstract: Recently, image representation built upon Convolutional Neural Network (CNN) has been shown to provide effective descriptors for image search, outperforming pre-CNN features as short-vector representations. Yet such models are not compatible with geometry-aware re-ranking methods and still outperformed, on some particular object retrieval benchmarks, by traditional image search systems relying on precise descriptor matching, geometric re-ranking, or query expansion. This work revisits both retrieval stages, namely initial search and re-ranking, by employing the same primitive information derived from the CNN. We build compact feature vectors that encode several image regions without the need to feed multiple inputs to the network. Furthermore, we extend integral images to handle max-pooling on convolutional layer activations, allowing us to efficiently localize matching objects. The resulting bounding box is finally used for image re-ranking. As a result, this paper significantly improves existing CNN-based recognition pipeline: We report for the first time results competing with traditional methods on the challenging Oxford5k and Paris6k datasets.
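A minimal NumPy sketch of the core encoding step described above: several image regions are described by max-pooling the same convolutional activation map, so no extra forward passes through the network are needed. The activation map, region grid, and aggregation are toy assumptions, and the integral-image speed-up and re-ranking steps are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
fmap = rng.normal(size=(256, 32, 32))           # conv-layer activations: channels x H x W (toy)

def region_vector(fmap, y0, y1, x0, x1):
    """Max-pool activations inside one spatial region into a compact, L2-normalised vector."""
    v = fmap[:, y0:y1, x0:x1].max(axis=(1, 2))
    return v / np.linalg.norm(v)

# Several regions encoded from a single forward pass over the image.
regions = [(0, 16, 0, 16), (0, 16, 16, 32), (16, 32, 0, 16), (16, 32, 16, 32), (0, 32, 0, 32)]
descriptor = np.sum([region_vector(fmap, *r) for r in regions], axis=0)
descriptor /= np.linalg.norm(descriptor)        # compact image-level feature vector
print(descriptor.shape)
```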

527 citations


Proceedings ArticleDOI
01 Dec 2016
TL;DR: A Product-based Neural Network (PNN) with an embedding layer to learn a distributed representation of the categorical data, a product layer to capture interactive patterns between inter-field categories, and further fully connected layers to explore high-order feature interactions.
Abstract: Predicting user responses, such as clicks and conversions, is of great importance and has found its usage in many Web applications including recommender systems, web search and online advertising. The data in those applications is mostly categorical and contains multiple fields; a typical representation is to transform it into a high-dimensional sparse binary feature representation via one-hot encoding. Faced with the extreme sparsity, traditional models may limit their capacity of mining shallow patterns from the data, i.e., low-order feature combinations. Deep models like deep neural networks, on the other hand, cannot be directly applied to the high-dimensional input because of the huge feature space. In this paper, we propose a Product-based Neural Network (PNN) with an embedding layer to learn a distributed representation of the categorical data, a product layer to capture interactive patterns between inter-field categories, and further fully connected layers to explore high-order feature interactions. Our experimental results on two large-scale real-world ad click datasets demonstrate that PNNs consistently outperform the state-of-the-art models on various metrics.
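To illustrate the product layer described above, here is a small NumPy sketch in which each categorical field contributes one embedding vector, pairwise inner products capture inter-field interactions, and the concatenated signals would feed the fully connected layers. The field count, embedding size, and the inner-product variant are assumptions (the paper also studies an outer-product form).

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_fields, emb_dim = 4, 8                         # hypothetical field count and embedding size

# One-hot categorical fields are first mapped to dense embeddings (one vector per field).
field_embeddings = rng.normal(size=(n_fields, emb_dim))

# Inner-product layer: pairwise interactions between field embeddings.
pairwise = np.array([field_embeddings[i] @ field_embeddings[j]
                     for i, j in combinations(range(n_fields), 2)])

# The product signals are concatenated with the embedding signals and fed to the
# fully connected layers that model higher-order feature interactions.
hidden_input = np.concatenate([field_embeddings.ravel(), pairwise])
print(hidden_input.shape)   # (4*8 + 6,) = (38,)
```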

Journal ArticleDOI
TL;DR: The empirical analysis indicates that the utilization of keyword-based representation of text documents in conjunction with ensemble learning can enhance the predictive performance and scalability of text classification schemes, which is of practical importance in the application fields of text classification.
Abstract: Highlights: text classification is a domain with a high-dimensional feature space; extracting keywords as features can be extremely useful in text classification; an empirical analysis of five statistical keyword extraction methods; a comprehensive analysis of classifier and keyword extraction ensembles; for the ACM collection, a classification accuracy of 93.80% with a Bagging ensemble of Random Forest. Automatic keyword extraction is an important research direction in text mining, natural language processing and information retrieval. Keyword extraction enables us to represent text documents in a condensed way. The compact representation of documents can be helpful in several applications, such as automatic indexing, automatic summarization, automatic classification, clustering and filtering. For instance, text classification is a domain with a high-dimensional feature space challenge. Hence, extracting the most important/relevant words about the content of the document and using these keywords as the features can be extremely useful. In this regard, this study examines the predictive performance of five statistical keyword extraction methods (most frequent measure based keyword extraction, term frequency-inverse sentence frequency based keyword extraction, co-occurrence statistical information based keyword extraction, eccentricity-based keyword extraction and the TextRank algorithm) on classification algorithms and ensemble methods for scientific text document classification (categorization). In the study, a comprehensive comparison of base learning algorithms (Naive Bayes, support vector machines, logistic regression and Random Forest) with five widely utilized ensemble methods (AdaBoost, Bagging, Dagging, Random Subspace and Majority Voting) is conducted. To the best of our knowledge, this is the first empirical analysis which evaluates the effectiveness of statistical keyword extraction methods in conjunction with ensemble learning algorithms. The classification schemes are compared in terms of classification accuracy, F-measure and area under curve values. To validate the empirical analysis, a two-way ANOVA test is employed. The experimental analysis indicates that a Bagging ensemble of Random Forest with the most-frequent-based keyword extraction method yields promising results for text classification. For the ACM document collection, the highest average predictive performance (93.80%) is obtained with the most-frequent-based keyword extraction method and a Bagging ensemble of the Random Forest algorithm. In general, Bagging and Random Subspace ensembles of Random Forest yield promising results. The empirical analysis indicates that the utilization of keyword-based representation of text documents in conjunction with ensemble learning can enhance the predictive performance and scalability of text classification schemes, which is of practical importance in the application fields of text classification.
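A compact scikit-learn sketch in the spirit of the best-performing configuration reported above: restricting the vocabulary to the most frequent terms as a rough stand-in for the most-frequent keyword extraction method, and classifying with a Bagging ensemble of Random Forests. The toy corpus and all parameter values are placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Tiny toy corpus; max_features keeps only the most frequent terms, a rough stand-in
# for the "most frequent" keyword extraction studied in the paper.
docs = ["neural network training deep learning",
        "deep learning image classification network",
        "database query index transaction storage",
        "transaction index storage query optimisation"]
labels = [0, 0, 1, 1]

model = make_pipeline(
    CountVectorizer(max_features=20),
    BaggingClassifier(RandomForestClassifier(n_estimators=50), n_estimators=5),
)
model.fit(docs, labels)
print(model.predict(["index storage query", "deep network learning"]))
```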

Book ChapterDOI
17 Oct 2016
TL;DR: RDF2Vec is presented, an approach that uses language modeling approaches for unsupervised feature extraction from sequences of words, and adapts them to RDF graphs, and shows that feature vector representations of general knowledge graphs such as DBpedia and Wikidata can be easily reused for different tasks.
Abstract: Linked Open Data has been recognized as a valuable source for background information in data mining. However, most data mining tools require features in propositional form, i.e., a vector of nominal or numerical features associated with an instance, while Linked Open Data sources are graphs by nature. In this paper, we present RDF2Vec, an approach that uses language modeling approaches for unsupervised feature extraction from sequences of words, and adapts them to RDF graphs. We generate sequences by leveraging local information from graph sub-structures, harvested by Weisfeiler-Lehman Subtree RDF Graph Kernels and graph walks, and learn latent numerical representations of entities in RDF graphs. Our evaluation shows that such vector representations outperform existing techniques for the propositionalization of RDF graphs on a variety of different predictive machine learning tasks, and that feature vector representations of general knowledge graphs such as DBpedia and Wikidata can be easily reused for different tasks.
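A minimal sketch of the RDF2Vec recipe described above on a toy, hand-made graph: random graph walks produce token sequences, which are then embedded with a word2vec-style language model (gensim >= 4 API). The graph, walk depth, and model parameters are assumptions; the paper also generates sequences with Weisfeiler-Lehman subtree kernels, which is not shown here.

```python
import random
from gensim.models import Word2Vec

# Toy RDF-style graph: entity -> list of (predicate, object) edges (hypothetical data).
graph = {
    "dbr:Berlin":  [("dbo:country", "dbr:Germany"), ("dbo:type", "dbr:City")],
    "dbr:Germany": [("dbo:capital", "dbr:Berlin")],
    "dbr:Paris":   [("dbo:country", "dbr:France"), ("dbo:type", "dbr:City")],
    "dbr:France":  [("dbo:capital", "dbr:Paris")],
    "dbr:City":    [],
}

def random_walk(start, depth=4):
    """Walk over the graph, emitting entity and predicate tokens as one 'sentence'."""
    walk, node = [start], start
    for _ in range(depth):
        if not graph.get(node):
            break
        pred, obj = random.choice(graph[node])
        walk += [pred, obj]
        node = obj
    return walk

walks = [random_walk(entity) for entity in graph for _ in range(20)]
# Language-model-style embedding of the walk "sentences".
model = Word2Vec(walks, vector_size=32, window=4, min_count=1, sg=1, epochs=20)
print(model.wv["dbr:Berlin"].shape)
```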

Journal ArticleDOI
TL;DR: This work integrates feature extraction and deep learning with high-throughput quantitative imaging enabled by photonic time stretch, achieving record high accuracy in label-free cell classification.
Abstract: Label-free cell analysis is essential to personalized genomics, cancer diagnostics, and drug development as it avoids adverse effects of staining reagents on cellular viability and cell signaling. However, currently available label-free cell assays mostly rely only on a single feature and lack sufficient differentiation. Also, the sample size analyzed by these assays is limited due to their low throughput. Here, we integrate feature extraction and deep learning with high-throughput quantitative imaging enabled by photonic time stretch, achieving record high accuracy in label-free cell classification. Our system captures quantitative optical phase and intensity images and extracts multiple biophysical features of individual cells. These biophysical measurements form a hyperdimensional feature space in which supervised learning is performed for cell classification. We compare various learning algorithms including artificial neural network, support vector machine, logistic regression, and a novel deep learning pipeline, which adopts global optimization of receiver operating characteristics. As a validation of the enhanced sensitivity and specificity of our system, we show classification of white blood T-cells against colon cancer cells, as well as lipid accumulating algal strains for biofuel production. This system opens up a new path to data-driven phenotypic diagnosis and better understanding of the heterogeneous gene expressions in cells.

Journal ArticleDOI
TL;DR: This letter investigates the suitability and potential of deep convolutional neural networks in supervised classification of polarimetric synthetic aperture radar (POLSAR) images and shows that slant built-up areas, which are conventionally mixed with vegetated areas in polarimetric feature space, can now be successfully distinguished after taking into account spatial features.
Abstract: Deep convolutional neural networks have achieved great success in computer vision and many other areas. They automatically extract translational-invariant spatial features and integrate with neural network-based classifier. This letter investigates the suitability and potential of deep convolutional neural network in supervised classification of polarimetric synthetic aperture radar (POLSAR) images. The multilooked POLSAR data in the format of coherency or covariance matrix is first converted into a normalized 6-D real feature vector. The six-channel real image is then fed into a four-layer convolutional neural network tailored for POLSAR classification. With two cascaded convolutional layers, the designed deep neural network can automatically learn hierarchical polarimetric spatial features from the data. Two experiments are presented using the AIRSAR data of San Francisco, CA, and Flevoland, The Netherlands. Classification result of the San Francisco case shows that slant built-up areas, which are conventionally mixed with vegetated area in polarimetric feature space, can now be successfully distinguished after taking into account spatial features. Quantitative analysis with respect to ground truth information available for the Flevoland test site shows that the proposed method achieves an accuracy of 92.46% in classifying the considered 15 classes. Such results are comparable with the state of the art.
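The sketch below shows one plausible way to turn a 3x3 Hermitian coherency matrix into a normalized 6-D real feature vector (diagonal powers plus off-diagonal magnitudes, divided by the total power). The exact channel definitions and normalisation used in the paper may differ, so this illustrates only the general data-preparation step before the convolutional network.

```python
import numpy as np

def coherency_to_6d(T):
    """One plausible 6-D real representation of a 3x3 Hermitian coherency matrix:
    diagonal powers and off-diagonal magnitudes, normalised by the total power (span).
    The paper's exact normalisation may differ."""
    span = np.real(T[0, 0] + T[1, 1] + T[2, 2])
    feats = np.array([np.real(T[0, 0]), np.real(T[1, 1]), np.real(T[2, 2]),
                      np.abs(T[0, 1]), np.abs(T[0, 2]), np.abs(T[1, 2])])
    return feats / span

# Toy Hermitian positive semi-definite matrix built as A A^H.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
T = A @ A.conj().T
print(coherency_to_6d(T))
```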

Journal ArticleDOI
TL;DR: Binary variants of the ant lion optimizer (ALO) are proposed and used to select the optimal feature subset for classification purposes in wrapper-mode and prove the capability of the proposed binary algorithms to search the feature space for optimal feature combinations regardless of the initialization and the used stochastic operators.

Proceedings Article
Peng Zhou, Zhenyu Qi, Suncong Zheng, Jiaming Xu, Hongyun Bao, Bo Xu
21 Nov 2016
TL;DR: One of the proposed models achieves the highest accuracy on the Stanford Sentiment Treebank binary and fine-grained classification tasks, and also utilizes 2D convolution to sample more meaningful information from the matrix.

Abstract: Recurrent Neural Network (RNN) is one of the most popular architectures used in Natural Language Processing (NLP) tasks because its recurrent structure is very suitable to process variable-length text. RNN can utilize distributed representations of words by first converting the tokens comprising each text into vectors, which form a matrix. This matrix has two dimensions: the time-step dimension and the feature vector dimension. Most existing models usually utilize a one-dimensional (1D) max pooling operation or attention-based operation only on the time-step dimension to obtain a fixed-length vector. However, the features on the feature vector dimension are not mutually independent, and simply applying the 1D pooling operation over the time-step dimension independently may destroy the structure of the feature representation. On the other hand, applying a two-dimensional (2D) pooling operation over the two dimensions may sample more meaningful features for sequence modeling tasks. To integrate the features on both dimensions of the matrix, this paper explores applying a 2D max pooling operation to obtain a fixed-length representation of the text. This paper also utilizes 2D convolution to sample more meaningful information from the matrix. Experiments are conducted on six text classification tasks, including sentiment analysis, question classification, subjectivity classification and newsgroup classification. Compared with the state-of-the-art models, the proposed models achieve excellent performance on 4 out of 6 tasks. Specifically, one of the proposed models achieves the highest accuracy on the Stanford Sentiment Treebank binary classification and fine-grained classification tasks.
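A minimal PyTorch sketch of the idea described above: the BiLSTM output is treated as a single-channel (time-step x feature) matrix, and 2D convolution followed by 2D max pooling samples features along both dimensions. Dimensions are toy values, and obtaining a fixed-length vector across sentences additionally assumes inputs padded to a common length.

```python
import torch
import torch.nn as nn

seq_len, emb_dim, hid = 20, 50, 64                 # hypothetical sizes
rnn = nn.LSTM(emb_dim, hid, bidirectional=True, batch_first=True)

x = torch.randn(1, seq_len, emb_dim)               # one (padded) toy sentence of word vectors
matrix, _ = rnn(x)                                 # (1, seq_len, 2*hid): time-step x feature matrix
matrix = matrix.unsqueeze(1)                       # add a channel dim for 2D ops

conv = nn.Conv2d(1, 8, kernel_size=3)              # 2D convolution over both dimensions
pooled = nn.functional.max_pool2d(conv(matrix), kernel_size=2)   # 2D max pooling
fixed = pooled.flatten(1)                          # fixed-length text representation (given padding)
print(fixed.shape)
```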

Book ChapterDOI
20 Mar 2016
TL;DR: In this article, two novel models using deep neural networks (DNNs) were proposed to automatically learn effective patterns from categorical feature interactions and make predictions of users' ad clicks.
Abstract: Predicting user responses, such as click-through rate and conversion rate, are critical in many web applications including web search, personalised recommendation, and online advertising. Different from continuous raw features that we usually found in the image and audio domains, the input features in web space are always of multi-field and are mostly discrete and categorical while their dependencies are little known. Major user response prediction models have to either limit themselves to linear models or require manually building up high-order combination features. The former loses the ability of exploring feature interactions, while the latter results in a heavy computation in the large feature space. To tackle the issue, we propose two novel models using deep neural networks (DNNs) to automatically learn effective patterns from categorical feature interactions and make predictions of users’ ad clicks. To get our DNNs efficiently work, we propose to leverage three feature transformation methods, i.e., factorisation machines (FMs), restricted Boltzmann machines (RBMs) and denoising auto-encoders (DAEs). This paper presents the structure of our models and their efficient training algorithms. The large-scale experiments with real-world data demonstrate that our methods work better than major state-of-the-art models.
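One of the three feature transformations mentioned above is the factorisation machine; the sketch below computes its second-order interaction term with the usual O(nk) reformulation, 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2], over a toy sparse binary feature vector. The FM-initialised DNN, RBM, and DAE variants from the paper are not shown.

```python
import numpy as np

def fm_interaction(x, V):
    """Second-order factorisation-machine term sum_{i<j} <v_i, v_j> x_i x_j,
    computed in O(n k) with the standard reformulation."""
    s = V.T @ x                      # (k,)  per-factor weighted sums
    t = (V ** 2).T @ (x ** 2)        # (k,)  per-factor squared terms
    return 0.5 * float(np.sum(s ** 2 - t))

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=50).astype(float)    # sparse binary (one-hot style) features
V = 0.1 * rng.normal(size=(50, 8))               # latent factor vectors, one per feature
print(fm_interaction(x, V))
```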

Journal ArticleDOI
TL;DR: A robust optical flow method is applied on micro-expression video clips and the MDMO feature, a ROI-based, normalized statistic feature that considers both local statistic motion information and its spatial location, can achieve better performance than two state-of-the-art baseline features.
Abstract: Micro-expressions are brief facial movements characterized by short duration, involuntariness and low intensity. Recognition of spontaneous facial micro-expressions is a great challenge. In this paper, we propose a simple yet effective Main Directional Mean Optical-flow (MDMO) feature for micro-expression recognition. We apply a robust optical flow method on micro-expression video clips and partition the facial area into regions of interest (ROIs) based partially on action units. The MDMO is a ROI-based, normalized statistic feature that considers both local statistic motion information and its spatial location. One of the significant characteristics of MDMO is that its feature dimension is small. The length of a MDMO feature vector is $36 \times 2=72$ , where $36$ is the number of ROIs. Furthermore, to reduce the influence of noise due to head movements, we propose an optical-flow-driven method to align all frames of a micro-expression video clip. Finally, a SVM classifier with the proposed MDMO feature is adopted for micro-expression recognition. Experimental results on three spontaneous micro-expression databases, namely SMIC, CASME and CASME II, show that the MDMO can achieve better performance than two state-of-the-art baseline features, i.e., LBP-TOP and HOOF.
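A minimal NumPy sketch of the per-ROI statistic described above: flow vectors in one ROI are binned by direction, only the dominant bin is kept, and its mean flow is summarised as (magnitude, angle), giving 36 x 2 = 72 values over all ROIs. The flow vectors here are random placeholders rather than the output of the robust optical-flow and alignment steps.

```python
import numpy as np

def mdmo_roi(flow_vectors, n_bins=8):
    """Main-directional mean optical flow for one ROI: keep only the flow vectors in the
    dominant direction bin and summarise their mean as (magnitude, angle)."""
    angles = np.arctan2(flow_vectors[:, 1], flow_vectors[:, 0])            # [-pi, pi]
    bins = ((angles + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    main_bin = np.bincount(bins, minlength=n_bins).argmax()
    mean_flow = flow_vectors[bins == main_bin].mean(axis=0)
    return np.array([np.hypot(*mean_flow), np.arctan2(mean_flow[1], mean_flow[0])])

rng = np.random.default_rng(0)
n_rois = 36
# Toy flows: 200 flow vectors per ROI (in practice computed by a robust optical-flow method).
feature = np.concatenate([mdmo_roi(rng.normal(size=(200, 2))) for _ in range(n_rois)])
print(feature.shape)   # (72,) = 36 ROIs x (magnitude, angle)
```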

Journal ArticleDOI
TL;DR: This paper discovers that a high-quality visual saliency model can be learned from multiscale features extracted using deep convolutional neural networks (CNNs), which have had many successes in visual recognition tasks.
Abstract: Visual saliency is a fundamental problem in both cognitive and computational sciences, including computer vision. In this paper, we discover that a high-quality visual saliency model can be learned from multiscale features extracted using deep convolutional neural networks (CNNs), which have had many successes in visual recognition tasks. For learning such saliency models, we introduce a neural network architecture, which has fully connected layers on top of CNNs responsible for feature extraction at three different scales. The penultimate layer of our neural network has been confirmed to be a discriminative high-level feature vector for saliency detection, which we call deep contrast feature. To generate a more robust feature, we integrate handcrafted low-level features with our deep contrast feature. To promote further research and evaluation of visual saliency models, we also construct a new large database of 4447 challenging images and their pixelwise saliency annotations. Experimental results demonstrate that our proposed method is capable of achieving the state-of-the-art performance on all public benchmarks, improving the F-measure by 6.12% and 10%, respectively, on the DUT-OMRON data set and our new data set (HKU-IS), and lowering the mean absolute error by 9% and 35.3%, respectively, on these two data sets.

Journal ArticleDOI
TL;DR: In this paper, a discriminant correlation analysis (DCA) is proposed for feature fusion by maximizing the pairwise correlations across the two feature sets and eliminating the between-class correlations and restricting the correlations to be within the classes.
Abstract: Information fusion is a key step in multimodal biometric systems. The fusion of information can occur at different levels of a recognition system, i.e., at the feature level, matching-score level, or decision level. However, feature level fusion is believed to be more effective owing to the fact that a feature set contains richer information about the input biometric data than the matching score or the output decision of a classifier. The goal of feature fusion for recognition is to combine relevant information from two or more feature vectors into a single one with more discriminative power than any of the input feature vectors. In pattern recognition problems, we are also interested in separating the classes. In this paper, we present discriminant correlation analysis (DCA), a feature level fusion technique that incorporates the class associations into the correlation analysis of the feature sets. DCA performs an effective feature fusion by maximizing the pairwise correlations across the two feature sets and, at the same time, eliminating the between-class correlations and restricting the correlations to be within the classes. Our proposed method can be used in pattern recognition applications for fusing the features extracted from multiple modalities or combining different feature vectors extracted from a single modality. It is noteworthy that DCA is the first technique that considers class structure in feature fusion. Moreover, it has a very low computational complexity and it can be employed in real-time applications. Multiple sets of experiments performed on various biometric databases and using different feature extraction techniques, show the effectiveness of our proposed method, which outperforms other state-of-the-art approaches.

Posted Content
TL;DR: In this article, a new CNN+LSTM architecture for camera pose regression for indoor and outdoor scenes is proposed, which makes use of LSTM units on the CNN output, which play the role of a structured dimensionality reduction on the feature vector.
Abstract: In this work we propose a new CNN+LSTM architecture for camera pose regression for indoor and outdoor scenes. CNNs allow us to learn suitable feature representations for localization that are robust against motion blur and illumination changes. We make use of LSTM units on the CNN output, which play the role of a structured dimensionality reduction on the feature vector, leading to drastic improvements in localization performance. We provide extensive quantitative comparison of CNN-based and SIFT-based localization methods, showing the weaknesses and strengths of each. Furthermore, we present a new large-scale indoor dataset with accurate ground truth from a laser scanner. Experimental results on both indoor and outdoor public datasets show our method outperforms existing deep architectures, and can localize images in hard conditions, e.g., in the presence of mostly textureless surfaces, where classic SIFT-based methods fail.

Posted Content
TL;DR: A convolutional spatial transformer to mimic patch normalization in traditional features like SIFT is proposed, which is shown to dramatically boost accuracy for semantic correspondences across intra-class shape variations.
Abstract: We present a deep learning framework for accurate visual correspondences and demonstrate its effectiveness for both geometric and semantic matching, spanning across rigid motions to intra-class shape or appearance variations. In contrast to previous CNN-based approaches that optimize a surrogate patch similarity objective, we use deep metric learning to directly learn a feature space that preserves either geometric or semantic similarity. Our fully convolutional architecture, along with a novel correspondence contrastive loss allows faster training by effective reuse of computations, accurate gradient computation through the use of thousands of examples per image pair and faster testing with $O(n)$ feed forward passes for $n$ keypoints, instead of $O(n^2)$ for typical patch similarity methods. We propose a convolutional spatial transformer to mimic patch normalization in traditional features like SIFT, which is shown to dramatically boost accuracy for semantic correspondences across intra-class shape variations. Extensive experiments on KITTI, PASCAL, and CUB-2011 datasets demonstrate the significant advantages of our features over prior works that use either hand-constructed or learned features.

Posted Content
TL;DR: Wang et al. as mentioned in this paper proposed a Neural Aggregation Network (NAN) for video face recognition, which consists of two attention blocks which adaptively aggregate the feature vectors to form a single feature inside the convex hull spanned by them.
Abstract: This paper presents a Neural Aggregation Network (NAN) for video face recognition. The network takes a face video or face image set of a person with a variable number of face images as its input, and produces a compact, fixed-dimension feature representation for recognition. The whole network is composed of two modules. The feature embedding module is a deep Convolutional Neural Network (CNN) which maps each face image to a feature vector. The aggregation module consists of two attention blocks which adaptively aggregate the feature vectors to form a single feature inside the convex hull spanned by them. Due to the attention mechanism, the aggregation is invariant to the image order. Our NAN is trained with a standard classification or verification loss without any extra supervision signal, and we found that it automatically learns to advocate high-quality face images while repelling low-quality ones such as blurred, occluded and improperly exposed faces. The experiments on IJB-A, YouTube Face, Celebrity-1000 video face recognition benchmarks show that it consistently outperforms naive aggregation methods and achieves the state-of-the-art accuracy.
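To illustrate the aggregation module described above, here is a minimal NumPy sketch of one attention block: softmax scores over the per-frame feature vectors give convex weights, so the aggregated feature lies inside their convex hull and is invariant to frame order. The query vector is a stand-in for the learned attention parameters, and the paper cascades two such blocks.

```python
import numpy as np

def attention_aggregate(features, q):
    """One attention block: softmax scores over per-frame feature vectors produce
    convex weights, so the aggregate stays inside the convex hull of the inputs."""
    scores = features @ q
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ features, w

rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 128))        # per-frame CNN face embeddings (toy)
q = rng.normal(size=128)                   # stand-in for the block's learned query
aggregate, weights = attention_aggregate(frames, q)
print(aggregate.shape, weights.sum())      # (128,) 1.0 -- order-invariant aggregation
```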

Journal ArticleDOI
TL;DR: The experimental results show that unsupervised feature selection algorithms benefit machine learning tasks by improving the performance of clustering.

Posted Content
TL;DR: A novel moderate positive sample mining method is proposed to train a robust CNN for person re-identification, dealing with the problem of large variation, and the learning is improved by a metric weight constraint so that the learned metric has better generalization ability.
Abstract: Person re-identification is challenging due to the large variations of pose, illumination, occlusion and camera view. Owing to these variations, the pedestrian data is distributed as highly-curved manifolds in the feature space, despite the current convolutional neural networks' (CNNs') capability of feature extraction. However, the distribution is unknown, so it is difficult to use the geodesic distance when comparing two samples. In practice, the current deep embedding methods use the Euclidean distance for training and testing. On the other hand, the manifold learning methods suggest using the Euclidean distance in the local range, combined with the graphical relationship between samples, to approximate the geodesic distance. From this point of view, selecting suitable positive (i.e. intra-class) training samples within a local range is critical for training the CNN embedding, especially when the data has large intra-class variations. In this paper, we propose a novel moderate positive sample mining method to train a robust CNN for person re-identification, dealing with the problem of large variation. In addition, we improve the learning by a metric weight constraint, so that the learned metric has a better generalization ability. Experiments show that these two strategies are effective in learning robust deep metrics for person re-identification, and accordingly our deep model significantly outperforms the state-of-the-art methods on several benchmarks of person re-identification. Therefore, the study presented in this paper may be useful in inspiring new designs of deep models for person re-identification.

Proceedings ArticleDOI
07 Mar 2016
TL;DR: This paper claims that hand-crafted histogram features can be complementary to Convolutional Neural Network features and proposes a novel feature extraction model called Feature Fusion Net (FFN) for pedestrian image representation.
Abstract: Feature representation and metric learning are two critical components in person re-identification models. In this paper, we focus on the feature representation and claim that hand-crafted histogram features can be complementary to Convolutional Neural Network (CNN) features. We propose a novel feature extraction model called Feature Fusion Net (FFN) for pedestrian image representation. In FFN, back propagation makes CNN features constrained by the hand-crafted features. Utilizing color histogram features (RGB, HSV, YCbCr, Lab and YIQ) and texture features (multi-scale and multi-orientation Gabor features), we get a new deep feature representation that is more discriminative and compact. Experiments on three challenging datasets (VIPeR, CUHK01, PRID450s) validate the effectiveness of our proposal.

Journal ArticleDOI
Tong Zhang, Wenming Zheng, Zhen Cui, Yuan Zong, Jingwei Yan, Keyu Yan
TL;DR: A novel deep neural network (DNN)-driven feature learning method is proposed and applied to multi-view facial expression recognition (FER) and the experimental results show that the algorithm outperforms the state-of-the-art methods.
Abstract: In this paper, a novel deep neural network (DNN)-driven feature learning method is proposed and applied to multi-view facial expression recognition (FER). In this method, scale invariant feature transform (SIFT) features corresponding to a set of landmark points are first extracted from each facial image. Then, a feature matrix consisting of the extracted SIFT feature vectors is used as input data and sent to a well-designed DNN model for learning optimal discriminative features for expression classification. The proposed DNN model employs several layers to characterize the corresponding relationship between the SIFT feature vectors and their high-level semantic information. By training the DNN model, we are able to learn a set of optimal features that are well suited to classifying facial expressions across different facial views. To evaluate the effectiveness of the proposed method, two nonfrontal facial expression databases, namely BU-3DFE and Multi-PIE, are used, and the experimental results show that our algorithm outperforms the state-of-the-art methods.

Posted Content
TL;DR: The model uses the features from the VGG-19 deep neural network trained to identify objects in images for saliency prediction with no additional fine-tuning and achieves top performance in area under the curve metrics on the MIT300 hold-out benchmark.
Abstract: Here we present DeepGaze II, a model that predicts where people look in images. The model uses the features from the VGG-19 deep neural network trained to identify objects in images. Contrary to other saliency models that use deep features, here we use the VGG features for saliency prediction with no additional fine-tuning (rather, a few readout layers are trained on top of the VGG features to predict saliency). The model is therefore a strong test of transfer learning. After conservative cross-validation, DeepGaze II explains about 87% of the explainable information gain in the patterns of fixations and achieves top performance in area under the curve metrics on the MIT300 hold-out benchmark. These results corroborate the finding from DeepGaze I (which explained 56% of the explainable information gain), that deep features trained on object recognition provide a versatile feature space for performing related visual tasks. We explore the factors that contribute to this success and present several informative image examples. A web service is available to compute model predictions at this http URL.

Proceedings Article
05 Dec 2016
TL;DR: In this paper, the authors investigate an experiential learning paradigm for acquiring an internal model of intuitive physics, by jointly estimating forward and inverse models of dynamics, which can then be used for multi-step decision making.
Abstract: We investigate an experiential learning paradigm for acquiring an internal model of intuitive physics. Our model is evaluated on a real-world robotic manipulation task that requires displacing objects to target locations by poking. The robot gathered over 400 hours of experience by executing more than 100K pokes on different objects. We propose a novel approach based on deep neural networks for modeling the dynamics of robot's interactions directly from images, by jointly estimating forward and inverse models of dynamics. The inverse model objective provides supervision to construct informative visual features, which the forward model can then predict and in turn regularize the feature space for the inverse model. The interplay between these two objectives creates useful, accurate models that can then be used for multi-step decision making. This formulation has the additional benefit that it is possible to learn forward models in an abstract feature space and thus alleviate the need of predicting pixels. Our experiments show that this joint modeling approach outperforms alternative methods.
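A minimal PyTorch sketch of the joint objective described above, with deliberately simplified pieces: a linear encoder stands in for the CNN, the action is treated as continuous with an MSE inverse loss (the paper discretises the poke parameters), and the loss weighting is hypothetical. The point is only to show the forward model predicting next-step features while the inverse model, by predicting the action, shapes that feature space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, act_dim = 32, 4
encoder       = nn.Linear(64 * 64, feat_dim)              # toy image encoder (flattened pixels)
forward_model = nn.Linear(feat_dim + act_dim, feat_dim)   # predicts next-step features
inverse_model = nn.Linear(2 * feat_dim, act_dim)          # predicts the applied action (poke)

img_t  = torch.randn(8, 64 * 64)                          # images before the poke (toy batch)
img_t1 = torch.randn(8, 64 * 64)                          # images after the poke
action = torch.randn(8, act_dim)                          # poke parameters (toy, continuous)

phi_t, phi_t1 = encoder(img_t), encoder(img_t1)
loss_fwd = F.mse_loss(forward_model(torch.cat([phi_t, action], dim=1)), phi_t1.detach())
loss_inv = F.mse_loss(inverse_model(torch.cat([phi_t, phi_t1], dim=1)), action)
loss = loss_inv + 0.5 * loss_fwd                          # joint objective; weighting is hypothetical
loss.backward()
```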