Showing papers in "Pattern Recognition in 2017"
TL;DR: A novel methodology for interpreting generic multilayer neural networks by decomposing the network classification decision into contributions of its input elements by backpropagating the explanations from the output to the input layer is introduced.
Abstract: Nonlinear methods such as Deep Neural Networks (DNNs) are the gold standard for various challenging machine learning problems such as image recognition. Although these methods perform impressively well, they have a significant disadvantage, the lack of transparency, which limits the interpretability of the solution and thus its scope of application in practice. DNNs in particular act as black boxes due to their multilayer nonlinear structure. In this paper we introduce a novel methodology for interpreting generic multilayer neural networks by decomposing the network classification decision into contributions of its input elements. Although our focus is on image classification, the method is applicable to a broad set of input data, learning tasks and network architectures. Our method, called deep Taylor decomposition, efficiently utilizes the structure of the network by backpropagating the explanations from the output to the input layer. We evaluate the proposed method empirically on the MNIST and ILSVRC data sets. Highlights: A novel method to explain nonlinear classification decisions in terms of input variables is introduced. The method is based on Taylor expansions and decomposes the output of a deep neural network in terms of input variables. The resulting deep Taylor decomposition can be applied directly to existing neural networks without retraining. The method is tested on two large-scale neural networks for image classification: BVLC CaffeNet and GoogleNet.
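To make "backpropagating the explanations" concrete, here is a minimal NumPy sketch of the z+ relevance rule, which deep Taylor decomposition yields for ReLU layers whose inputs are nonnegative. It assumes a small bias-free, fully connected ReLU network; the names `zplus_relevance` and `explain` are illustrative, and the sketch does not treat the input (pixel) layer specially as the paper does for bounded input domains, so it is not the authors' implementation.

```python
import numpy as np

def zplus_relevance(x, W, R_out, eps=1e-12):
    """Redistribute the relevance R_out of a bias-free ReLU layer y = max(0, x @ W)
    onto its nonnegative inputs x using the z+ rule (positive weights only)."""
    Wp = np.maximum(W, 0.0)          # keep only positive weights
    z = x @ Wp + eps                 # positive contributions per output neuron
    s = R_out / z                    # relevance per unit of contribution
    return x * (Wp @ s)              # R_i = x_i * sum_j w_ij^+ * R_j / z_j

def explain(x, weights):
    """Run the forward pass, then propagate relevance from the output back to x."""
    activations = [x]
    for W in weights:
        activations.append(np.maximum(0.0, activations[-1] @ W))
    R = activations[-1].copy()       # output relevance = output activation
    for W, a in zip(reversed(weights), reversed(activations[:-1])):
        R = zplus_relevance(a, W, R)
    return R
```

Because inputs, positive weights and output activations are all nonnegative, the resulting input relevances are nonnegative and sum (up to `eps`) to the network output, which is the conservation property a decomposition of the decision should have.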
TL;DR: An analysis of three possible strategies for exploiting the power of existing convolutional neural networks (ConvNets or CNNs) in scenarios different from those they were trained on (full training, fine tuning, and using ConvNets as feature extractors) indicates that fine tuning tends to be the best-performing strategy.
Abstract: We present an analysis of three possible strategies for exploiting the power of existing convolutional neural networks (ConvNets or CNNs) in scenarios different from those they were trained on: full training, fine tuning, and using ConvNets as feature extractors. In many applications, especially remote sensing, it is not feasible to fully design and train a new ConvNet, as this usually requires a considerable amount of labeled data and demands high computational costs. Therefore, it is important to understand how best to use existing ConvNets. We perform experiments with six popular ConvNets using three remote sensing datasets. We also compare ConvNets in each strategy with existing descriptors and with state-of-the-art baselines. Results indicate that fine tuning tends to be the best-performing strategy. In fact, using the features from the fine-tuned ConvNet with a linear SVM obtains the best results. We also achieved state-of-the-art results for the three datasets used.
TL;DR: In this paper, a deep autoencoder-based approach is proposed to identify signal features from low-light images and adaptively brighten images without over-amplifying/saturating the lighter parts in images with high dynamic range.
Abstract: In surveillance, monitoring and tactical reconnaissance, gathering visual information from a dynamic environment and accurately processing such data are essential to making informed decisions and ensuring the success of a mission. Cost-limited camera sensors often cannot capture clear images or videos in poorly lit environments. Many applications aim to enhance brightness and contrast and to reduce noise in the images in an on-board, real-time manner. We propose a deep autoencoder-based approach to identify signal features from low-light images and adaptively brighten images without over-amplifying/saturating the lighter parts in images with a high dynamic range. We show that a variant of the stacked-sparse denoising autoencoder can learn from synthetically darkened and noise-added training examples to adaptively enhance images that were taken in natural low-light environments and/or are hardware-degraded. Results demonstrate the credibility of the approach both visually and by quantitative comparison with various techniques.
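The construction of synthetically darkened, noise-added training examples can be sketched as follows, assuming gamma adjustment for darkening and additive Gaussian noise. The function name and default parameter values are illustrative, not taken from the paper; the autoencoder would then be trained to map the degraded image (or its patches) back to the clean one.

```python
import numpy as np

def synthesize_low_light(img, gamma=3.0, noise_sigma=0.05, rng=None):
    """Build a (degraded, clean) training pair from a clean image with values in [0, 1].
    Darkening uses gamma adjustment (gamma > 1 pushes intensities toward 0);
    degradation adds zero-mean Gaussian noise and clips back to [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    dark = np.power(img, gamma)
    noisy = dark + rng.normal(0.0, noise_sigma, size=img.shape)
    return np.clip(noisy, 0.0, 1.0), img
```

In practice one would likely vary `gamma` and `noise_sigma` per example so the network sees a range of lighting and noise conditions.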
TL;DR: Enhanced skeleton visualization method encodes spatio-temporal skeletons as visual and motion enhanced color images in a compact yet distinctive manner and consistently achieves the highest accuracies on four datasets, including the largest and most challenging NTU RGB+D dataset for skeleton-based action recognition.
Abstract: A sequence-based view invariant transform can effectively cope with view variations. An enhanced skeleton visualization method encodes spatio-temporal skeletons as visual and motion enhanced color images in a compact yet distinctive manner. A multi-stream convolutional neural network fusion model is able to explore complementary properties among different types of enhanced color images. Our method consistently achieves the highest accuracies on four datasets, including the largest and most challenging NTU RGB+D dataset for skeleton-based action recognition. Human action recognition based on skeletons has wide applications in human-computer interaction and intelligent surveillance. However, view variations and noisy data make this task challenging. What's more, effectively representing spatio-temporal skeleton sequences remains an open problem. To address these problems together, this work presents an enhanced skeleton visualization method for view invariant human action recognition. Our method consists of three stages. First, a sequence-based view invariant transform is developed to eliminate the effect of view variations on the spatio-temporal locations of skeleton joints. Second, the transformed skeletons are visualized as a series of color images, which implicitly encode the spatio-temporal information of skeleton joints. Furthermore, visual and motion enhancement methods are applied to the color images to enhance their local patterns. Third, a convolutional neural network-based model is adopted to extract robust and discriminative features from the color images. The final action class scores are generated by decision-level fusion of deep features. Extensive experiments on four challenging datasets consistently demonstrate the superiority of our method.
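For intuition about visualizing skeletons as color images, the sketch below shows only a basic encoding, assuming a sequence stored as frames × joints × 3 coordinates: each joint becomes a row, each frame a column, and the normalized (x, y, z) coordinates become the RGB channels. It leaves out the view invariant transform and the visual and motion enhancements, and the function name is hypothetical.

```python
import numpy as np

def skeleton_to_image(seq):
    """seq: array (T frames, J joints, 3 coords). Returns a uint8 image (J, T, 3)
    where pixel (j, t) holds the normalized (x, y, z) of joint j at frame t as RGB."""
    seq = np.asarray(seq, dtype=float)
    lo = seq.min(axis=(0, 1), keepdims=True)     # per-coordinate minimum over the sequence
    hi = seq.max(axis=(0, 1), keepdims=True)     # per-coordinate maximum over the sequence
    norm = (seq - lo) / np.maximum(hi - lo, 1e-8)
    img = np.round(255 * norm).astype(np.uint8)
    return img.transpose(1, 0, 2)                # (T, J, 3) -> (J, T, 3)
```

The resulting image can be resized to the input size of a standard CNN, which is what makes image classifiers usable on skeleton sequences.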
TL;DR: A simple solution for facial expression recognition is proposed that combines a Convolutional Neural Network with specific image pre-processing steps to extract only expression-specific features from a face image, and that explores the presentation order of the samples during training.
Abstract: Facial expression recognition has been an active research area in the past 10 years, with growing application areas including avatar animation, neuromarketing and sociable robots. The recognition of facial expressions is not an easy problem for machine learning methods, since people can vary significantly in the way they show their expressions. Even images of the same person with the same facial expression can vary in brightness, background and pose, and these variations are emphasized when considering different subjects (because of variations in shape and ethnicity, among others). Although facial expression recognition is widely studied in the literature, few works perform a fair evaluation that avoids mixing subjects between training and testing of the proposed algorithms. Hence, facial expression recognition is still a challenging problem in computer vision. In this work, we propose a simple solution for facial expression recognition that combines a Convolutional Neural Network with specific image pre-processing steps. Convolutional Neural Networks achieve better accuracy with big data. However, there are no publicly available datasets with sufficient data for facial expression recognition with deep architectures. Therefore, to tackle the problem, we apply some pre-processing techniques to extract only expression-specific features from a face image and explore the presentation order of the samples during training. The experiments employed to evaluate our technique were carried out using three widely used public databases (CK+, JAFFE and BU-3DFE). A study of the impact of each image pre-processing operation on the accuracy rate is presented. The proposed method achieves competitive results compared with other facial expression recognition methods (96.76% accuracy on the CK+ database), is fast to train, and allows for real-time facial expression recognition with standard computers.
Highlights: A CNN-based approach for facial expression recognition. A set of pre-processing steps allowing for a simpler CNN architecture. A study of the impact of each pre-processing step on the accuracy. A study for lowering the impact of the sample presentation order during training. High facial expression recognition accuracy (96.76%) with real-time evaluation.
TL;DR: A comprehensive survey of the characteristics which define and differentiate the types of MIL problems is provided, providing insight on how the problem characteristics affect MIL algorithms, recommendations for future benchmarking and promising avenues for research.
Abstract: Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and allows weakly labeled data to be leveraged. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics which define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas is described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight on how the problem characteristics affect MIL algorithms, recommendations for future benchmarking and promising avenues for research. Code is available online at https://github.com/macarbonneau/MILSurvey.
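As one illustration of how a bag label relates to its instances, the standard multiple instance assumption labels a bag positive if at least one of its instances is positive, which is commonly implemented by taking the maximum of instance scores. The sketch assumes an arbitrary instance-level scoring function (`instance_scorer`, hypothetical) and is not one of the 16 surveyed methods.

```python
def bag_score(instance_scores):
    """Standard MI assumption: a bag is as positive as its most positive instance."""
    return max(instance_scores)

def predict_bags(bags, instance_scorer, threshold=0.5):
    """Label each bag 1 if its maximum instance score reaches the threshold, else 0."""
    return [int(bag_score([instance_scorer(x) for x in bag]) >= threshold) for bag in bags]
```

Other assumptions about bag composition call for a different aggregation than the maximum, which is one reason MIL methods behave so differently across data sets.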
TL;DR: A Multi-crop Convolutional Neural Network (MC-CNN) is presented to automatically extract nodule salient information by employing a novel multi-crop pooling strategy which crops different regions from convolutional feature maps and then applies max-pooling a different number of times.
Abstract: We investigate the problem of lung nodule malignancy suspiciousness (the likelihood of nodule malignancy) classification using thoracic Computed Tomography (CT) images. Unlike traditional studies primarily relying on careful nodule segmentation and time-consuming feature extraction, we tackle the more challenging task of directly modeling raw nodule patches and building an end-to-end machine-learning architecture for classifying lung nodule malignancy suspiciousness. We present a Multi-crop Convolutional Neural Network (MC-CNN) to automatically extract nodule salient information by employing a novel multi-crop pooling strategy which crops different regions from convolutional feature maps and then applies max-pooling a different number of times. Extensive experimental results show that the proposed method not only achieves state-of-the-art nodule suspiciousness classification performance, but also effectively characterizes nodule semantic attributes (subtlety and margin) and nodule diameter, which are potentially helpful in modeling nodule malignancy.
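A minimal NumPy sketch of multi-crop pooling, under one plausible reading of the description: centered crops of decreasing size (full map, 1/2, 1/4) are max-pooled a decreasing number of times so that they end at the same spatial size and can be concatenated along the channel axis. The crop sizes, the 2 × 2 pooling, the square-map requirement and the function names are assumptions rather than the paper's exact configuration.

```python
import numpy as np

def max_pool2x2(f):
    """2x2 non-overlapping max pooling on a (C, H, W) map; H and W must be even."""
    c, h, w = f.shape
    return f.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def center_crop(f, size):
    """Take a centered size x size window from every channel."""
    _, h, w = f.shape
    top, left = (h - size) // 2, (w - size) // 2
    return f[:, top:top + size, left:left + size]

def multi_crop_pool(f, levels=3):
    """Crop centered regions of side H, H/2, H/4, ... and max-pool the i-th crop
    (levels - 1 - i) times so all crops share one spatial size, then concatenate."""
    _, h, _ = f.shape
    outputs = []
    for i in range(levels):
        crop = center_crop(f, h // (2 ** i))
        for _ in range(levels - 1 - i):
            crop = max_pool2x2(crop)
        outputs.append(crop)
    return np.concatenate(outputs, axis=0)
```

The smallest crop keeps full resolution around the nodule center, while the larger crops contribute coarser context from the surrounding region.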
TL;DR: A generic computer vision system designed for exploiting trained deep Convolutional Neural Networks as a generic feature extractor and mixing these features with more traditional hand-crafted features is presented, demonstrating the generalizability of the proposed approach.
Abstract: This work presents a generic computer vision system designed for exploiting trained deep Convolutional Neural Networks (CNN) as a generic feature extractor and mixing these features with more traditional hand-crafted features. Such a system is a single structure that can be used for synthesizing a large number of different image classification tasks. Three substructures are proposed for creating the generic computer vision system starting from handcrafted and non-handcrafted features: i) one that remaps the output layer of a trained CNN to classify a different problem using an SVM; ii) a second for exploiting the output of the penultimate layer of a trained CNN as a feature vector to feed an SVM; and iii) a third for merging the output of some deep layers, applying a dimensionality reduction method, and using these features as the input to an SVM. The application of feature transform techniques to reduce the dimensionality of feature sets coming from the deep layers represents one of the main contributions of this paper. Three approaches are used for the non-handcrafted features: deep transfer learning features based on convolutional neural networks (CNN), principal component analysis network (PCAN), and the compact binary descriptor (CBD). For the handcrafted features, a wide variety of state-of-the-art algorithms are considered: Local Ternary Patterns, Local Phase Quantization, Rotation Invariant Co-occurrence Local Binary Patterns, Completed Local Binary Patterns, Rotated local binary pattern image, Globally Rotation Invariant Multi-scale Co-occurrence Local Binary Pattern, and several others. The computer vision system based on the proposed approach was tested on many different datasets, and the strong performance recorded demonstrates the generalizability of the proposed approach. The Wilcoxon signed rank test is used to compare the different methods; moreover, the independence of the different methods is studied using the Q-statistic.
To facilitate replication of our experiments, the MATLAB source code will be available at ( https://www.dropbox.com/s/bguw035yrqz0pwp/ElencoCode.docx?dl=0 ).
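The third substructure (merge the outputs of several deep layers, reduce dimensionality, then feed an SVM) can be sketched as below. PCA is used here purely as an example transform and the helper names are hypothetical; the paper evaluates its own set of feature transforms, and the SVM stage is left out.

```python
import numpy as np

def merge_layers(*layer_features):
    """Concatenate per-sample features (n_samples, d_k) from several deep layers."""
    return np.concatenate(layer_features, axis=1)

def fit_pca(X, k):
    """Return the feature mean and the top-k principal directions of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def transform(X, mean, components):
    """Project centered features onto the retained principal directions."""
    return (X - mean) @ components.T
```

The projection should be fitted on training features only and then applied unchanged to test features before they reach the classifier.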
TL;DR: In this paper, semi-supervised feature selection methods are fully investigated and two taxonomies of these methods are presented based on two different perspectives which represent the hierarchical structure of semi-supervised feature selection methods.
Abstract: Feature selection is a significant task in data mining and machine learning applications; it eliminates irrelevant and redundant features and improves learning performance. In many real-world applications, collecting labeled data is difficult, while abundant unlabeled data are easily accessible. This motivates researchers to develop semi-supervised feature selection methods, which use both labeled and unlabeled data to evaluate feature relevance. However, to date, there is no comprehensive survey covering semi-supervised feature selection methods. In this paper, semi-supervised feature selection methods are fully investigated and two taxonomies of these methods are presented based on two different perspectives which represent the hierarchical structure of semi-supervised feature selection methods. The first perspective is based on the basic taxonomy of feature selection methods and the second one is based on the taxonomy of semi-supervised learning methods. This survey can help researchers obtain a deep background in semi-supervised feature selection methods and choose a proper semi-supervised feature selection method based on their hierarchical structure. A comprehensive survey on semi-supervised feature selection methods is presented. Two categories of these methods are presented from two different perspectives. The hierarchical structure of semi-supervised feature selection methods is given. Advantages and disadvantages of the surveyed methods are presented. Future research directions are presented.
TL;DR: This paper learns useful leaf features directly from the raw representations of input data using Convolutional Neural Networks (CNN), and gains intuition of the chosen features based on a Deconvolutional Network (DN) approach, and gains insights into the design of new hybrid feature extraction models which are able to further improve the discriminative power of plant classification systems.
Abstract: Plant identification systems developed by computer vision researchers have helped botanists to recognize and identify unknown plant species more rapidly. Hitherto, numerous studies have focused on procedures or algorithms that maximize the use of leaf databases for plant predictive modeling, but this results in leaf features which are liable to change with different leaf data and feature extraction techniques. In this paper, we learn useful leaf features directly from the raw representations of input data using Convolutional Neural Networks (CNN), and gain intuition of the chosen features based on a Deconvolutional Network (DN) approach. We report somewhat unexpected results: (1) different orders of venation are the best representative features compared to those of outline shape, and (2) we observe multi-level representation in leaf data, demonstrating the hierarchical transformation of features from lower-level to higher-level abstraction, corresponding to species classes. We show that these findings fit with the hierarchical botanical definitions of leaf characters. Through these findings, we gained insights into the design of new hybrid feature extraction models which are able to further improve the discriminative power of plant classification systems. The source code and models are available at: https://github.com/cs-chan/Deep-Plant
TL;DR: A large-scale performance evaluation for texture classification, empirically assessing forty texture features, including thirty-two of the most promising recent LBP variants and eight non-LBP descriptors based on deep convolutional networks, on thirteen widely used texture datasets.
Abstract: Local Binary Patterns (LBP) have emerged as one of the most prominent and widely studied local texture descriptors. A truly large number of LBP variants have been proposed, to the point that it can become overwhelming to grasp their respective strengths and weaknesses, and there is a need for a comprehensive study of the prominent LBP-related strategies. New types of descriptors based on multistage convolutional networks and deep learning have also emerged. In different papers, the performance of the proposed methods is compared with earlier approaches mainly on some well-known texture datasets, with differing classifiers and testing protocols, and often without using the best sets of parameter values and multiple scales for the comparative methods. Very important aspects such as computational complexity and the effects of poor image quality are often neglected. In this paper, we provide a systematic review of current LBP variants and propose a taxonomy to more clearly group the prominent alternatives. Merits and demerits of the various LBP features and their underlying connections are also analyzed. We perform a large-scale performance evaluation for texture classification, empirically assessing forty texture features, including thirty-two of the most promising recent LBP variants and eight non-LBP descriptors based on deep convolutional networks, on thirteen widely used texture datasets. The experiments are designed to measure their robustness against different classification challenges, including changes in rotation, scale, illumination, viewpoint, number of classes, different types of image degradation, and computational complexity. The best overall performance is obtained for the Median Robust Extended Local Binary Pattern (MRELBP) feature. For textures with very large appearance variations, Fisher vector pooling of deep Convolutional Neural Networks is clearly the best, but at the cost of very high computational complexity.
The sensitivity to image degradations and computational complexity are among the key problems for most of the methods considered. Highlights: A taxonomy and comprehensive survey of LBP variants. Characteristics of, and connections between, LBP variants are provided. A comprehensive experimental evaluation of 32 LBP methods. Comparison of 32 LBP variants with 8 deep ConvNet features. Evaluation of robustness to rotation, illumination, scale and noise changes. Comparison of the computational complexity of forty variants.
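For reference, the original 3 × 3 LBP operator on which the surveyed variants build can be written in a few lines of plain Python. This is the basic operator only (no rotation invariance, uniform patterns, multiple scales, or the MRELBP extensions), and the bit ordering used here is one common convention rather than a fixed standard.

```python
def lbp_codes(img):
    """Basic 3x3 LBP on a 2D list: for each interior pixel, set bit k when the k-th
    neighbor is >= the center, visiting neighbors clockwise from the top-left (bit 0)."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(img), len(img[0])
    codes = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes

def lbp_histogram(img):
    """256-bin histogram of LBP codes: the texture descriptor fed to a classifier."""
    hist = [0] * 256
    for row in lbp_codes(img):
        for code in row:
            hist[code] += 1
    return hist
```

Because each code depends only on the sign of local intensity differences, the descriptor is invariant to monotonic gray-level changes; the variants in the survey mainly differ in how they add robustness to rotation, scale and noise.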
TL;DR: The experimental results on three challenging depth video datasets demonstrate that the proposed online HAR method using the proposed multi-fused features outperforms the state-of-the-art HAR methods in terms of recognition accuracy.
Abstract: Recently developed depth imaging technologies have provided new directions for human activity recognition (HAR) without attaching optical markers or any other motion sensors to human body parts. In this paper, we propose novel multi-fused features for an online human activity recognition (HAR) system that recognizes human activities from continuous sequences of depth maps. The proposed online HAR system segments human depth silhouettes using temporal human motion information and obtains human skeleton joints using spatiotemporal human body information. Then, it extracts the spatiotemporal multi-fused features that concatenate four skeleton joint features and one body shape feature. Skeleton joint features include the torso-based distance feature (DT), the key joint-based distance feature (DK), the spatiotemporal magnitude feature (M) and the spatiotemporal directional angle feature (θ). The body shape feature, called HOG-DDS, represents the projections of the depth differential silhouettes (DDS) between two consecutive frames onto three orthogonal planes in histogram of oriented gradients (HOG) format. The size of the proposed spatiotemporal multi-fused feature is reduced by representing it with a code vector from a codebook generated by vector quantization. Then, the system trains the hidden Markov model (HMM) with the code vectors of the multi-fused features and recognizes the segmented human activity by the forward spotting scheme using the trained HMM-based human activity classifiers. The experimental results on three challenging depth video datasets, namely IM-DailyDepthActivity, MSRAction3D and MSRDailyActivity3D, demonstrate that the proposed online HAR method using the proposed multi-fused features outperforms the state-of-the-art HAR methods in terms of recognition accuracy.
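The codebook and HMM stages can be sketched as follows: each multi-fused feature vector is replaced by the index of its nearest code vector, and each activity model scores the resulting symbol sequence with the scaled forward algorithm. The model parameters, the function names and the simple maximum-likelihood decision are illustrative assumptions; the forward spotting scheme itself is not reproduced, and all probabilities are assumed to keep every scale factor nonzero.

```python
import math

def quantize(feature, codebook):
    """Return the index of the nearest code vector (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, codebook[k])))

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm for a discrete HMM; returns log P(obs | model).
    start[i], trans[i][j] and emit[i][o] are probabilities; obs is a list of symbols."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)               # P(o_t | o_1..o_{t-1})
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

def classify(features, codebook, models):
    """models: dict mapping activity name -> (start, trans, emit); picks the most likely."""
    obs = [quantize(f, codebook) for f in features]
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))
```

Rescaling alpha at every step keeps the recursion numerically stable on long depth sequences, where raw forward probabilities would underflow.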
TL;DR: A novel formulation of the problem that includes knowledge of skilled forgeries from a subset of users in the feature learning process, that aims to capture visual cues that distinguish genuine signatures and forgeries regardless of the user is proposed.
Abstract: We propose formulations for learning features for Offline Signature Verification. A novel method that uses knowledge of forgeries from a subset of users is proposed. Learned features are used to train classifiers for other users (without forgeries). Experiments on GPDS-960 show a large improvement over the state of the art. Results on 3 other datasets show that the features generalize without fine-tuning. Verifying the identity of a person using handwritten signatures is challenging in the presence of skilled forgeries, where a forger has access to a person's signature and deliberately attempts to imitate it. In offline (static) signature verification, the dynamic information of the signature writing process is lost, and it is difficult to design good feature extractors that can distinguish genuine signatures from skilled forgeries. This is reflected in relatively poor performance, with verification errors around 7% in the best systems in the literature. To address the difficulty of obtaining good features and to improve system performance, we propose learning the representations from signature images, in a Writer-Independent format, using Convolutional Neural Networks. In particular, we propose a novel formulation of the problem that includes knowledge of skilled forgeries from a subset of users in the feature learning process and aims to capture visual cues that distinguish genuine signatures from forgeries regardless of the user. Extensive experiments were conducted on four datasets: GPDS, MCYT, CEDAR and the Brazilian PUC-PR dataset. On GPDS-160, we obtained a large improvement in state-of-the-art performance, achieving 1.72% Equal Error Rate, compared to 6.97% in the literature. We also verified that the features generalize beyond the GPDS dataset, surpassing the state-of-the-art performance on the other datasets, without requiring the representation to be fine-tuned to each particular dataset.
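Since results are reported as Equal Error Rate (EER), here is a minimal sketch of how an EER can be estimated from verifier scores by sweeping a single global threshold until the false acceptance rate (forgeries accepted) and false rejection rate (genuine signatures rejected) meet. It assumes higher scores mean "more likely genuine" and returns the mean of FAR and FRR at the closest threshold; the paper's exact evaluation protocol may differ.

```python
def equal_error_rate(genuine, forgery):
    """Estimate the EER from genuine and forgery scores (higher = more likely genuine).
    Every observed score is tried as an acceptance threshold (accept if score >= thr)."""
    best = None
    for thr in sorted(set(genuine) | set(forgery)):
        far = sum(s >= thr for s in forgery) / len(forgery)   # forgeries accepted
        frr = sum(s < thr for s in genuine) / len(genuine)    # genuine rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```

With few samples FAR and FRR rarely cross exactly, so averaging them at the closest threshold is a common approximation; interpolating between thresholds is an alternative.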
TL;DR: The state-of-the-art CCL algorithms presented in the last decade are reviewed, the main strategies and algorithms are explained, their pseudo codes are presented, and experimental results are given in order to bring order to the algorithms.
Abstract: Connected-component labeling (CCL) is indispensable for pattern recognition. Many connected-component labeling algorithms have been proposed. The state-of-the-art CCL algorithms presented in the last decade are reviewed. This article addresses the connected-component labeling problem, which consists of assigning a unique label to all pixels of each connected component (i.e., each object) in a binary image. Connected-component labeling is indispensable for distinguishing different objects in a binary image, and is a prerequisite for image analysis and object recognition in the image. Therefore, connected-component labeling is one of the most important processes for image analysis, image understanding, pattern recognition, and computer vision. In this article, we review state-of-the-art connected-component labeling algorithms presented in the last decade, explain the main strategies and algorithms, present their pseudo codes, and give experimental results in order to bring order to the algorithms. Moreover, we also discuss parallel implementation and hardware implementation of connected-component labeling algorithms and extension to n-D images, and try to indicate future work on the connected-component labeling problem.
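As a baseline for the labeling problem described above, a breadth-first flood fill assigns one label per connected component while scanning the image once. It is a simple reference implementation for checking results, not one of the optimized algorithms the article reviews.

```python
from collections import deque

def label_components(img, connectivity=4):
    """img: 2D list of 0/1. Returns (labels, count): each foreground component gets a
    unique positive label 1, 2, ... in raster-scan order; background stays 0."""
    if connectivity == 4:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:  # 8-connectivity: also include diagonal neighbors
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] and not labels[r][c]:
                count += 1                       # start a new component
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                     # flood-fill everything reachable
                    y, x = queue.popleft()
                    for dy, dx in steps:
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

The choice of connectivity changes the answer: two pixels touching only at a corner form one object under 8-connectivity and two under 4-connectivity.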
TL;DR: In this article, a new adaptation layer is proposed to reduce the mismatch between training and test data on a particular source layer, and the adaptation process can be efficiently and effectively implemented in an unsupervised manner.
Abstract: Recent deep learning based methods have achieved state-of-the-art performance for handwritten Chinese character recognition (HCCR) by learning discriminative representations directly from raw data. Nevertheless, we believe that long and well-investigated domain-specific knowledge should still help to boost the performance of HCCR. By integrating the traditional normalization-cooperated direction-decomposed feature map (directMap) with the deep convolutional neural network (convNet), we obtain new highest accuracies for both online and offline HCCR on the ICDAR-2013 competition database. With this new framework, we eliminate the need for data augmentation and model ensembles, which are widely used in other systems to achieve their best results. This makes our framework efficient and effective for both training and testing. Furthermore, although directMap+convNet can achieve the best results and surpass human-level performance, we show that writer adaptation in this case is still effective. A new adaptation layer is proposed to reduce the mismatch between training and test data on a particular source layer. The adaptation process can be efficiently and effectively implemented in an unsupervised manner. By adding the adaptation layer into the pre-trained convNet, it can adapt to the new handwriting styles of particular writers, and the recognition accuracy can be further improved consistently and significantly. This paper gives an overview and comparison of recent deep learning based approaches for HCCR, and also sets new benchmarks for both online and offline HCCR.
TL;DR: Experimental results indicate that a framework built on CNN and ELM provides competitive performance with a small number of training samples, and that the average accuracy of ELM can be improved by as much as 30.04% while running tens to hundreds of times faster than state-of-the-art classifiers.
Abstract: Spatial features of hyperspectral imagery (HSI) have gained increasing attention in recent years. Considering that a deep convolutional neural network (CNN) can extract a hierarchy of increasingly spatial features, this paper proposes an HSI reconstruction model based on a deep CNN to enhance spatial features. The framework introduces, for the first time, a new spatial-feature-based band selection strategy to define information-rich training labels. Then, a deep CNN is trained on the hyperspectral data to build a model with optimized parameters that is suitable for HSI reconstruction. Finally, the reconstructed image is classified by the efficient extreme learning machine (ELM), which has a very simple structure. Experimental results indicate that the framework built on CNN and ELM provides competitive performance with a small number of training samples. Specifically, by using the reconstructed image, the average accuracy of ELM can be improved by as much as 30.04%, while running tens to hundreds of times faster than state-of-the-art classifiers.
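The extreme learning machine used for the final classification has a very simple structure: a single hidden layer with random, fixed input weights, and output weights solved in closed form by least squares through the Moore-Penrose pseudo-inverse. A minimal NumPy sketch follows; the tanh activation, the hidden-layer size and the class name are assumptions, and the CNN reconstruction step is not shown.

```python
import numpy as np

class ELM:
    """Single-hidden-layer extreme learning machine for classification."""

    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)          # random nonlinear feature map

    def fit(self, X, y):
        self.classes = np.unique(y)
        T = (y[:, None] == self.classes[None, :]).astype(float)   # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # never trained
        self.b = self.rng.normal(size=self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T             # least-squares output weights
        return self

    def predict(self, X):
        return self.classes[np.argmax(self._hidden(X) @ self.beta, axis=1)]
```

Training reduces to one pseudo-inverse with no iterative optimization, which is why an ELM can run much faster than iteratively trained classifiers.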
TL;DR: Understanding basic mechanisms of postoperative pain to identify effective treatment strategies may improve patients' outcome after surgery and point towards useful elements of multimodal analgesia able to reduce opioid consumption, improve pain management, and enhance recovery.
Abstract: Introduction Pain management after surgery continues to be suboptimal; there are several reasons, including lack of translation of results from basic science studies and scientific clinical evidence into clinical practice. Objectives This review presents and discusses basic science findings and scientific evidence generated within the last 2 decades in the field of acute postoperative pain. Methods In the first part of the review, we give an overview of studies that have investigated the pathophysiology of postoperative pain by using rodent models of incisional pain up to July 2016. The second focus of the review lies on treatment recommendations based on guidelines and clinical evidence, e.g., by using the fourth edition of the "Acute Pain Management: Scientific Evidence" of the Australian and New Zealand College of Anaesthetists and Faculty of Pain Medicine. Results Preclinical studies in rodent models characterized responses of primary afferent nociceptors and dorsal horn neurons as one neural basis for pain behavior including resting pain, hyperalgesia, movement-evoked pain or anxiety- and depression-like behaviors after surgery. Furthermore, the role of certain receptors, mediators, and neurotransmitters involved in peripheral and central sensitization after incision were identified; many of these are very specific, relate to some modalities only, and are unique for incisional pain. Future treatment should focus on these targets to develop therapeutic agents that are effective for the treatment of postoperative pain and have few side effects. Furthermore, basic science findings translate well into results from clinical studies. Scientific evidence is able to point towards useful (and less useful) elements of multimodal analgesia able to reduce opioid consumption, improve pain management, and enhance recovery.
Conclusion Understanding basic mechanisms of postoperative pain to identify effective treatment strategies may improve patients' outcome after surgery.
TL;DR: A k-nearest neighbors-based synthetic minority oversampling algorithm, termed SMOM, is proposed to handle multiclass imbalance problems; it can aggressively explore the regions of minority classes by configuring a high value for parameter k without resulting in severe overgeneralization.
Abstract: Multiclass imbalance data learning has attracted increasing interest from the research community. Unfortunately, when facing this problem, which is more challenging than the two-class imbalance case, existing oversampling solutions have shown various deficiencies, such as causing serious overgeneralization or not actively improving the class imbalance in data space. We propose a k-nearest neighbors (k-NN)-based synthetic minority oversampling algorithm, termed SMOM, to handle multiclass imbalance problems. Unlike previous k-NN-based oversampling algorithms, where for any original minority instance the synthetic instances are randomly generated in the directions of its k-nearest neighbors, SMOM assigns a selection weight to each neighbor direction. Neighbor directions that can produce serious overgeneralization are given small selection weights. In this way, SMOM forms a mechanism for avoiding overgeneralization, as the safer neighbor directions are more likely to be selected to yield the synthetic instances. Owing to this, SMOM can aggressively explore the regions of minority classes by configuring a high value for parameter k without resulting in severe overgeneralization. Extensive experiments using 27 real-world data sets demonstrate the effectiveness of our algorithm.
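The weighted choice of neighbor directions can be illustrated with a minimal NumPy sketch. The abstract does not give SMOM's actual weighting rule, so the helper `neighbor_direction_weights` below uses a hypothetical stand-in: a direction is down-weighted when instances of other classes lie close to the midpoint of the segment it spans. The names `oversample` and `radius` are likewise illustrative assumptions, not part of the published algorithm.

```python
import numpy as np

def neighbor_direction_weights(x, neighbors, X_other, radius):
    """Hypothetical selection weights for the directions x -> neighbor.

    A direction is down-weighted when other-class instances lie within `radius`
    of the midpoint of its segment (a stand-in heuristic, not SMOM's weighting).
    """
    weights = np.empty(len(neighbors))
    for j, nb in enumerate(neighbors):
        mid = (x + nb) / 2.0
        crowd = np.sum(np.linalg.norm(X_other - mid, axis=1) < radius)
        weights[j] = 1.0 / (1.0 + crowd)
    return weights / weights.sum()


def oversample(X_min, X_other, k, n_new, radius, seed=0):
    """Generate n_new synthetic minority points along weighted k-NN directions.

    Requires k <= len(X_min) - 1 so that only genuine neighbors are used.
    """
    rng = np.random.default_rng(seed)
    synthetic = np.empty((n_new, X_min.shape[1]))
    for s in range(n_new):
        i = rng.integers(len(X_min))            # pick a minority seed instance
        x = X_min[i]
        dist = np.linalg.norm(X_min - x, axis=1)
        dist[i] = np.inf                        # exclude the instance itself
        neighbors = X_min[np.argsort(dist)[:k]]
        w = neighbor_direction_weights(x, neighbors, X_other, radius)
        j = rng.choice(k, p=w)                  # safer directions are more likely
        gap = rng.random()                      # position along the chosen segment
        synthetic[s] = x + gap * (neighbors[j] - x)
    return synthetic
```

Raising `k` widens the set of candidate directions, while the weights keep most of the sampling on directions that stay clear of other classes; this mirrors, in simplified form, the trade-off the abstract describes.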
TL;DR: Zhang et al. propose a hybrid deep architecture that combines Fisher vectors and deep neural networks to learn non-linear transformations of pedestrian images into a deep space where the data can be linearly separated.
Abstract: Person re-identification seeks a correct match for a person of interest across different camera views among a large number of impostors. It typically involves two procedures: non-linear feature extraction that copes with dramatic appearance changes, and subsequent discriminative analysis that reduces intra-personal variations while enlarging inter-personal differences. In this paper, we introduce a hybrid deep architecture which combines Fisher vectors and deep neural networks to learn non-linear transformations of pedestrian images into a deep space where the data can be linearly separated. The proposed method starts from Fisher vector encoding, which computes a sequence of local feature extraction, aggregation, and encoding. The resulting Fisher vectors are fed into stacked supervised layers to learn a non-linear transformation into a deep space. On top of the deep neural network, Linear Discriminant Analysis (LDA) is imposed so that linearly separable latent representations can be learned in an end-to-end fashion. By optimizing an objective function modified from LDA, the network is forced to produce feature distributions with low variance within the same class and high variance between classes. The objective is essentially derived from the general LDA eigenvalue problem; it allows the network to be trained with Stochastic Gradient Descent and the LDA gradients to be back-propagated to compute Gaussian Mixture Model (GMM) gradients in Fisher vector encoding. For empirical evaluation, we test our approach on four benchmark data sets in person re-identification (VIPeR, CUHK03, CUHK01, and Market-1501). Extensive experiments on these benchmarks show that our method can achieve state-of-the-art results. Highlights: A hybrid architecture that combines Fisher vectors and deep neural networks. End-to-end training with linear discriminant analysis as the objective. Deep features are linearly separable and class separability is maximally preserved.
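The LDA eigenvalue problem that the objective is derived from can be made concrete with a small NumPy sketch. It computes the within-class scatter S_w and between-class scatter S_b of a batch of features and returns the eigenvalues of the generalized problem S_b v = λ S_w v; large eigenvalues indicate directions along which the classes are well separated. The function name, the ridge term `eps`, and the Cholesky route are assumptions of this sketch; the paper's modified objective and its gradient path into the Fisher vector layer are not reproduced.

```python
import numpy as np

def lda_eigenvalues(H, y, eps=1e-3):
    """Eigenvalues of S_b v = lambda * S_w v for features H (n x d) and labels y (n,)."""
    d = H.shape[1]
    mu = H.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Hc = H[y == c]
        mc = Hc.mean(axis=0)
        Dc = Hc - mc
        Sw += Dc.T @ Dc                      # within-class scatter
        diff = (mc - mu)[:, None]
        Sb += len(Hc) * (diff @ diff.T)      # between-class scatter
    Sw += eps * np.eye(d)                    # ridge term so S_w is positive definite
    L = np.linalg.cholesky(Sw)               # S_w = L L^T
    Linv = np.linalg.inv(L)
    M = Linv @ Sb @ Linv.T                   # symmetric, same spectrum as S_w^{-1} S_b
    return np.linalg.eigvalsh(M)             # ascending order
```

A training loss built on this quantity would push these eigenvalues up for the learned features; which eigenvalues enter such a loss is a design choice not specified here.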
TL;DR: A novel region-based model for segmenting objects or structures in images is proposed; it introduces a local similarity factor, based on the local spatial distance within a local window and the local intensity difference, to improve the segmentation results.
Abstract: Image segmentation using a region-based active contour model can be difficult when the noise distribution is unknown. To overcome this problem, this paper proposes a novel region-based model for the segmentation of objects or structures in images by introducing a local similarity factor, which relies on the local spatial distance within a local window and the local intensity difference to improve the segmentation results. Using this local similarity factor, the proposed method can accurately extract the object boundary while guaranteeing a degree of noise robustness. Furthermore, the proposed algorithm completely avoids the pre-processing steps typical of region-based active contour segmentation, resulting in better preservation of image details. Experiments performed on synthetic and real-world images demonstrate that, compared with state-of-the-art algorithms, the proposed algorithm is more efficient and more robust to higher noise levels.
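The abstract does not state the formula of the local similarity factor, so the NumPy sketch below is only a hypothetical illustration of how a spatial-distance weight and an intensity difference could be combined inside a local window: each neighbor's squared deviation from a region intensity estimate `u` is weighted by 1/(1 + spatial distance to the window centre). The function name, the weighting, and the window `radius` are assumptions.

```python
import numpy as np

def local_similarity_factor(img, i, j, u, radius=1):
    """Hypothetical local similarity factor at pixel (i, j).

    Sums, over the neighbors in a (2*radius+1)^2 window (centre excluded), the
    squared difference between the neighbor's intensity and the region estimate u,
    weighted by 1 / (1 + spatial distance to the centre).
    """
    h, w = img.shape
    total = 0.0
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            if di == 0 and dj == 0:
                continue
            r, c = i + di, j + dj
            if 0 <= r < h and 0 <= c < w:      # skip neighbors outside the image
                spatial_weight = 1.0 / (1.0 + np.hypot(di, dj))
                total += spatial_weight * (img[r, c] - u) ** 2
    return total
```

In a region-based active contour model, a term of this kind would typically be evaluated against the inside and outside intensity estimates to form a local data term; that coupling is not shown here.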
TL;DR: A widely used statistical body representation is rebuilt from the largest commercially available scan database and made available to the community, together with robust best-practice solutions for scan alignment that quantitatively lead to the best learned models.
Abstract: Highlights: Expressive 3D human shape models are proposed. The models are learned from the largest available dataset of laser scans. Various template fitting and posture normalization approaches are evaluated. High quality of the learned shape spaces is empirically demonstrated. The proposed models and code for data pre-processing and model fitting are released. Statistical models of 3D human shape and pose learned from scan databases have developed into valuable tools for solving a variety of vision and graphics problems. Unfortunately, most publicly available models are of limited expressiveness, as they were learned on very small databases that hardly reflect the true variety of human body shapes. In this paper, we contribute by rebuilding a widely used statistical body representation from the largest commercially available scan database and making the resulting model available to the community (visit http://humanshape.mpi-inf.mpg.de). As preprocessing several thousand scans for learning the model is a challenge in itself, we contribute robust best-practice solutions for scan alignment that quantitatively lead to the best learned models. We also make implementations of these preprocessing steps publicly available. We extensively evaluate the improved accuracy and generality of our new model and show its improved performance for human body reconstruction from sparse input data.
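Statistical body models of this kind commonly build on a linear shape space. The NumPy sketch below learns a plain PCA shape space from meshes that are assumed to be already registered to a common template (the alignment step the abstract emphasizes is not shown) and maps between shape coefficients and meshes. It is a generic illustration; the specific body representation rebuilt in the paper is not reproduced, and all function names are hypothetical.

```python
import numpy as np

def learn_shape_space(meshes, n_components):
    """PCA shape space from registered meshes of shape (n_scans, n_vertices, 3)."""
    n = meshes.shape[0]
    X = meshes.reshape(n, -1)                 # one flattened mesh per row
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:n_components]                 # orthonormal principal directions
    stddev = s[:n_components] / np.sqrt(max(n - 1, 1))  # per-component std. dev.
    return mean, basis, stddev

def project(mean, basis, mesh):
    """Shape coefficients of a registered mesh."""
    return (mesh.reshape(-1) - mean) @ basis.T

def synthesize(mean, basis, coeffs, n_vertices):
    """Mesh for given shape coefficients."""
    return (mean + coeffs @ basis).reshape(n_vertices, 3)
```

Under a Gaussian assumption, drawing coefficients as `stddev * z` with standard normal `z` yields new shapes; fitting to sparse input would instead solve for the coefficients that best explain the observed points.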
TL;DR: The results show that combining CNNs with adaptive gradient methods leads to state-of-the-art results in unconstrained head pose estimation.
Abstract: Head pose estimation is an old problem that has recently been receiving new attention because of possible applications in human-robot interaction, augmented reality, and driving assistance. However, most existing work has been tested in controlled environments and is not robust enough for real-world applications. To handle these limitations, we propose an approach based on Convolutional Neural Networks (CNNs) supplemented with the most recent techniques adopted from the deep learning community. We evaluate the performance of four architectures on recently released in-the-wild datasets. Moreover, we investigate the use of dropout and adaptive gradient methods, contributing to their ongoing validation. The results show that combining CNNs with adaptive gradient methods leads to state-of-the-art results in unconstrained head pose estimation.
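The abstract does not name the adaptive gradient methods it compares, so as an example the sketch below implements Adam (Kingma and Ba), one widely used member of that family, on a toy quadratic rather than a CNN loss. The function `adam_minimize` and its default learning rate (chosen for the toy problem, larger than the usual 0.001) are assumptions of this illustration.

```python
import numpy as np

def adam_minimize(grad, theta0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a differentiable function given its gradient, using Adam updates."""
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)   # first-moment (mean) estimate
    v = np.zeros_like(theta)   # second-moment (uncentred variance) estimate
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
    return theta
```

The division by `sqrt(v_hat)` is what makes the method adaptive: each parameter's step is rescaled by the running magnitude of its own gradients.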
TL;DR: Watch, Attend and Parse (WAP), a novel end-to-end neural network approach that learns to recognize HMEs in a two-dimensional layout and outputs them as one-dimensional character sequences in LaTeX format, significantly outperformed the state-of-the-art method.
Abstract: Machine recognition of a handwritten mathematical expression (HME) is challenging due to the ambiguities of handwritten symbols and the two-dimensional structure of mathematical expressions. Inspired by recent work in deep learning, we present Watch, Attend and Parse (WAP), a novel end-to-end neural network approach that learns to recognize HMEs in a two-dimensional layout and outputs them as one-dimensional character sequences in LaTeX format. Unlike traditional methods, our proposed model avoids problems that stem from symbol segmentation, and it does not require a predefined expression grammar. Meanwhile, symbol recognition and structural analysis are handled by a watcher and a parser, respectively. As the watcher, we employ a convolutional neural network encoder that takes HME images as input; as the parser, we employ a recurrent neural network decoder equipped with an attention mechanism to generate LaTeX sequences. Moreover, the correspondence between the input expressions and the output LaTeX sequences is learned automatically by the attention mechanism. We validate the proposed approach on a benchmark published by the CROHME international competition. Using the official training dataset, WAP significantly outperformed the state-of-the-art method, with expression recognition accuracies of 46.55% on CROHME 2014 and 44.55% on CROHME 2016.
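The attention step of the parser can be illustrated with a generic additive (Bahdanau-style) attention in NumPy: the watcher's feature grid is flattened into L annotation vectors, each is scored against the current decoder state, and the softmax-normalized scores weight a context vector used to emit the next LaTeX symbol. Parameter names and shapes are assumptions; the exact attention variant used in WAP may differ.

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # numerical stability
    e = np.exp(z)
    return e / e.sum()

def additive_attention(annotations, state, W_a, U_a, v_a):
    """Additive attention over L annotation vectors.

    annotations: (L, D) flattened encoder feature grid
    state:       (n,)   current decoder hidden state
    W_a: (k, n), U_a: (k, D), v_a: (k,) learned parameters
    Returns attention weights (L,) and context vector (D,).
    """
    # e_i = v_a^T tanh(W_a s + U_a a_i) for every annotation a_i
    scores = np.tanh(state @ W_a.T + annotations @ U_a.T) @ v_a
    alpha = softmax(scores)
    context = alpha @ annotations    # weighted sum of annotations
    return alpha, context
```

Because `alpha` is a distribution over grid positions, it can be read as where in the expression image the parser is looking at each decoding step, which is how the input-output correspondence is learned without explicit segmentation.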
TL;DR: In this approach, a wavelet constrained pooling layer is designed to replace conventional pooling in a CNN; the new architecture can suppress noise and better preserves the structures of the learned features, which are crucial to segmentation tasks.
Abstract: A synthetic aperture radar (SAR) imaging system usually observes the Earth's surface, so rich structures exist in SAR images. Convolutional neural networks (CNNs) are good at learning features from raw data automatically, especially structural features. Inspired by this, we propose a novel SAR image segmentation method based on convolutional-wavelet neural networks (CWNN) and a Markov Random Field (MRF). In this approach, a wavelet constrained pooling layer is designed to replace the conventional pooling in CNN. The new architecture can suppress noise and is better at preserving the structures of the learned features, which are crucial to segmentation tasks. CWNN produces the segmentation map by patch-by-patch scanning. The segmentation result of CWNN is combined with two labeling strategies (i.e., a superpixel approach and an MRF approach) to produce the final segmentation map. The superpixel approach is used to enforce smoothness within local regions, whereas the MRF approach is used to preserve the edges and details of the SAR image. Specifically, two segmentation maps are produced: the first is obtained by combining the segmentation map of CWNN with the superpixel approach, and the second is obtained by applying the MRF approach to the original SAR image. Afterwards, these two segmentation maps are fused using the sketch map of the SAR image to produce the final segmentation map. Experiments on texture images demonstrate that CWNN is effective for segmentation tasks. Moreover, experiments on real SAR images show that our approach obtains regions with consistent labeling while preserving edges and details.
Highlights: Convolutional neural networks (CNNs) are good at learning features from raw data automatically. A wavelet constrained pooling layer is designed to replace the conventional pooling. The wavelet pooling is plugged into the original CNN to generate a new network architecture named CWNN. Two labeling strategies (i.e., a superpixel approach and an MRF approach) are used to refine the segmentation results.
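One plausible reading of wavelet constrained pooling is to downsample with a one-level 2D Haar transform and keep only the low-frequency (LL) subband, discarding the high-frequency subbands that carry much of the fine-scale noise. The NumPy sketch below implements that reading; it is an assumption, not the paper's exact layer, and the function name is hypothetical. With orthonormal Haar filters, the LL subband equals twice the 2x2 average pooling.

```python
import numpy as np

def haar_ll_pool(x):
    """Hypothetical wavelet pooling: LL subband of a one-level orthonormal 2D Haar transform.

    x: (H, W) feature map with even H and W; returns an (H/2, W/2) map.
    """
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("feature map height and width must be even")
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    # Low-pass (x0 + x1) / sqrt(2) along rows, then along columns, gives a factor 1/2.
    return (a + b + c + d) / 2.0
```

Like 2x2 pooling, this halves the spatial resolution; unlike max pooling, it is a linear projection onto a wavelet subband, so the discarded detail is exactly the high-frequency content.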
TL;DR: A novel approach called joint sparse principal component analysis (JSPCA) is proposed to jointly select useful features and enhance robustness to outliers and the experimental results demonstrate that the proposed approach is feasible and effective.
Abstract: Principal component analysis (PCA) is widely used in dimensionality reduction. Many variants of PCA have been proposed to improve the robustness of the algorithm. However, the existing methods either cannot select useful features consistently or are still sensitive to outliers, which degrades their classification accuracy. In this paper, a novel approach called joint sparse principal component analysis (JSPCA) is proposed to jointly select useful features and enhance robustness to outliers. In detail, JSPCA relaxes the orthogonality constraint on the transformation matrix, giving it more freedom to jointly select useful features for the low-dimensional representation. JSPCA imposes joint sparse constraints on its objective function, i.e., the ℓ2,1-norm is imposed on both the loss term and the regularization term, to improve the algorithmic robustness. A simple yet effective optimization solution is presented, and theoretical analyses of JSPCA are provided. The experimental results on eight data sets demonstrate that the proposed approach is feasible and effective.
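The ℓ2,1-norm (the sum of the Euclidean norms of a matrix's rows) is what makes the sparsity joint: it tends to drive entire rows of the projection matrix to zero, so a feature is either used by all components or dropped. The NumPy sketch below computes the norm, ranks features by row norm, and evaluates an objective of the form ‖X − P QᵀX‖2,1 + λ‖Q‖2,1, where the loss sums per-sample residual norms (samples are columns of X). That objective is an assumed reconstruction-style formulation consistent with the abstract, not a quotation of JSPCA; the function names are hypothetical and the optimization procedure is not shown.

```python
import numpy as np

def l21_norm(M):
    """ℓ2,1-norm: sum of the Euclidean norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

def select_features(Q, n_keep):
    """Indices of the n_keep features (rows of the d x k matrix Q) with largest row norm."""
    scores = np.linalg.norm(Q, axis=1)
    return np.argsort(scores)[::-1][:n_keep]

def jspca_style_objective(X, P, Q, lam):
    """Assumed objective: per-sample ℓ2,1 reconstruction loss + lam * ||Q||_{2,1}.

    X: (d, n) with samples as columns; P, Q: (d, k).
    """
    E = X - P @ (Q.T @ X)                      # reconstruction residual, (d, n)
    return l21_norm(E.T) + lam * l21_norm(Q)   # rows of E.T are per-sample residuals
```

Summing unsquared residual norms, rather than squared ones, limits the influence of any single outlying sample on the loss, which is the robustness argument the abstract makes.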
TL;DR: A multi-modality classification framework that efficiently exploits the complementarity in multi-modal data is presented; pairwise similarity is calculated for each modality individually using features including regional MRI volumes, voxel-based FDG-PET signal intensities, CSF biomarker measures, and categorical genetic information.
Abstract: Accurate diagnosis of Alzheimer's disease (AD) and its prodromal stage, mild cognitive impairment (MCI), is of great interest to patients and clinicians. Recent studies have demonstrated that multiple neuroimaging and biological measures contain complementary information for diagnosis and prognosis. Classification methods are needed to combine these multiple biomarkers to provide an accurate diagnosis. State-of-the-art approaches calculate a mixed kernel or a similarity matrix by linearly combining kernels or similarities from multiple modalities. However, the complementary information from multi-modal data is not necessarily linearly related. In addition, this linear combination is sensitive to the weights assigned to each modality. In this paper, we present a multi-modality classification framework to efficiently exploit the complementarity in the multi-modal data. First, pairwise similarity is calculated for each modality individually using features including regional MRI volumes, voxel-based FDG-PET signal intensities, CSF biomarker measures, and categorical genetic information. Similarities from multiple modalities are then combined in a nonlinear graph fusion process, which generates a unified graph for final classification. Based on the unified graphs, we achieved a classification area under the receiver operating characteristic curve (AUC) of 98.1% between AD subjects and normal controls (NC), 82.4% between MCI subjects and NC, and 77.9% in a three-way classification, which are significantly better than those obtained using single-modality biomarkers or state-of-the-art linear combination approaches.
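The nonlinear graph fusion step can be illustrated with a simplified cross-diffusion in the spirit of Similarity Network Fusion (SNF): each modality's similarity graph is repeatedly diffused through that modality's sparse k-nearest-neighbor graph using the average of the other modalities' graphs, so that similarities supported by several modalities reinforce each other. This is an assumption about the kind of nonlinear fusion meant, not the paper's exact update; the normalizations are simplified and all names are hypothetical.

```python
import numpy as np

def affinity(F, sigma=1.0):
    """Gaussian similarity between the rows (subjects) of a feature matrix F."""
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def full_kernel(W):
    """Row-normalize, then symmetrize."""
    P = W / W.sum(axis=1, keepdims=True)
    return (P + P.T) / 2.0

def knn_kernel(W, k):
    """Keep each subject's k most similar entries (itself included), row-normalized."""
    S = np.zeros_like(W)
    for i in range(W.shape[0]):
        idx = np.argsort(W[i])[::-1][:k]
        S[i, idx] = W[i, idx] / W[i, idx].sum()
    return S

def fuse_graphs(Ws, k=3, iters=10):
    """Cross-diffuse m >= 2 similarity graphs and average them into one unified graph."""
    m = len(Ws)
    Ps = [full_kernel(W) for W in Ws]
    Ss = [knn_kernel(W, k) for W in Ws]
    for _ in range(iters):
        updated = []
        for v in range(m):
            others = sum(Ps[u] for u in range(m) if u != v) / (m - 1)
            P = Ss[v] @ others @ Ss[v].T
            updated.append((P + P.T) / 2.0)
        Ps = updated  # update all modalities simultaneously
    return sum(Ps) / m
```

The product S · P · Sᵀ is where the nonlinearity comes from: the fused graph depends multiplicatively on several modalities rather than on a weighted sum of them. The unified graph would then feed a graph-based classifier, which is not shown.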
TL;DR: Experimental results on benchmark datasets for unsupervised feature selection show that SCUFS outperforms the state-of-the-art UFS methods and can uncover the underlying multi-subspace structure of data.
Abstract: Unsupervised feature selection (UFS) aims to reduce time complexity and storage burden and to improve the generalization ability of learning machines by removing redundant, irrelevant, and noisy features. Due to the lack of training labels, most existing UFS methods generate pseudo labels by spectral clustering, matrix factorization, or dictionary learning, and convert UFS into a supervised problem. The learned clustering labels reflect the data distribution with respect to classes and are therefore vital to UFS performance. In this paper, we propose a novel subspace clustering guided unsupervised feature selection (SCUFS) method. The clustering labels of the training samples are learned by representation-based subspace clustering, and features that preserve the cluster labels well are selected. SCUFS learns the data distribution well because it uncovers the underlying multi-subspace structure of the data and iteratively learns the similarity matrix and clustering labels. Experimental results on benchmark datasets for unsupervised feature selection show that SCUFS outperforms state-of-the-art UFS methods. Highlights: A novel subspace clustering guided unsupervised feature selection (SCUFS) model is proposed. SCUFS learns a similarity graph by self-representation of samples and can uncover the underlying multi-subspace structure of data. The iterative updating of the similarity graph and pseudo label matrix can learn a more accurate data distribution.
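Building a similarity graph by self-representation of samples can be illustrated with the closed-form least-squares variant, in which each sample is reconstructed from all samples: Z = (XᵀX + λI)⁻¹XᵀX, and the graph is W = (|Z| + |Zᵀ|)/2. This particular regularizer is an assumption chosen because it has a closed form; SCUFS's own model, the clustering that yields pseudo labels, and the feature-scoring step are not shown, and the function name is hypothetical.

```python
import numpy as np

def self_representation_graph(X, lam=0.1):
    """Least-squares self-representation graph.

    X: (d, n) with samples as columns. Solves (X^T X + lam I) Z = X^T X, so each
    sample is expressed as a combination of all samples, then symmetrizes |Z|.
    """
    n = X.shape[1]
    G = X.T @ X                                  # Gram matrix of the samples
    Z = np.linalg.solve(G + lam * np.eye(n), G)  # representation coefficients
    return (np.abs(Z) + np.abs(Z.T)) / 2.0       # symmetric similarity graph
```

When samples come from independent subspaces, the Gram matrix is block diagonal and so is Z, so the graph connects only samples from the same subspace; this is the multi-subspace structure that the clustering step can then recover.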
TL;DR: A novel approach for contactless palmprint identification, namely CR_CompCode, is proposed; it achieves high recognition accuracy with extremely low computational complexity, making it both highly effective and efficient.
Abstract: Highlights: A novel device is designed and developed for capturing contactless palmprint images. A large-scale contactless palmprint image dataset is established. The quality of collected images is analyzed using modern image quality assessment metrics. For contactless palmprint identification, a CR-based approach is proposed, which is highly effective and efficient. Biometric authentication has been found to be an effective method for recognizing a person's identity with high confidence. In this field, the use of the palmprint represents a recent trend. To make palmprint-based recognition systems more user-friendly and sanitary, researchers have been investigating how to design such systems in a contactless manner. Though substantial effort has been devoted to this area, the discriminant power of the contactless palmprint is still not entirely clear, mainly owing to the lack of a public, large-scale, and high-quality benchmark dataset collected using a well-designed device. As an attempt to fill this gap, we first developed a highly user-friendly device for capturing high-quality contactless palmprint images. Then, with the developed device, a large-scale palmprint image dataset was established, comprising 12,000 images collected from 600 different palms in two separate sessions. To the best of our knowledge, it is the largest contactless palmprint image benchmark dataset ever collected. In addition, for the first time, the quality of the collected images is analyzed using modern image quality assessment metrics. Furthermore, for contactless palmprint identification, we propose a novel approach, namely CR_CompCode, which achieves high recognition accuracy with extremely low computational complexity. To make the results fully reproducible, the collected dataset and the related source code are publicly available at http://sse.tongji.edu.cn/linzhang/contactlesspalm/index.htm.
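Assuming "CR" refers to collaborative representation, the classification step can be sketched with the regularized least-squares collaborative representation classifier (CRC-RLS): a probe is coded over all training samples at once with a ridge penalty and assigned to the class whose samples reconstruct it with the smallest regularized residual. The projection matrix depends only on the training set, so it can be precomputed, which is one reason such classifiers are cheap at query time. The sketch works on generic feature vectors; the CompCode feature extraction and the exact way CR_CompCode combines the two are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def crc_projection(X, lam=1e-3):
    """Precomputable projection (X^T X + lam I)^{-1} X^T; X: (d, n), samples as columns."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T)

def crc_classify(P, X, labels, y):
    """Assign probe y (d,) to the class with the smallest regularized residual."""
    rho = P @ y                                  # collaborative code over all samples
    best_class, best_res = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        residual = np.linalg.norm(y - X[:, mask] @ rho[mask])
        res = residual / (np.linalg.norm(rho[mask]) + 1e-12)
        if res < best_res:
            best_class, best_res = c, res
    return best_class
```

Identifying a probe then costs one matrix-vector product plus one residual per class, with no iterative optimization.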
TL;DR: This survey investigates forty-four studies on image-based insect recognition and tries to give a global picture of the scientific bottlenecks and how the problem has been addressed.
Abstract: Entomology has applications in many biological domains (e.g., insect counting as a biodiversity index). To meet a growing biological demand and to compensate for a shrinking workforce, automated entomology has been around for decades. This challenge has been tackled by computer scientists as well as by biologists themselves. This survey investigates forty-four studies on this topic and tries to give a global picture of the scientific bottlenecks and how the problem has been addressed. The studies are examined from the viewpoints of image capture, feature extraction, classification methods, and the tested datasets. Finally, a general discussion is given on questions that may still remain unsolved, such as the image capture conditions required for good recognition performance, the definition of the problem, and whether computer scientists should consider it a problem in its own right or just an instance of a wider image recognition problem. Highlights: Forty-four studies about image-based insect recognition are scrutinized. Each paper is assessed from three perspectives: image capture, feature extraction, and classification. Datasets used in the literature are investigated. A discussion is given in which several questions about the problem are raised.
TL;DR: A general framework for multi-biometric template protection based on homomorphic probabilistic encryption, in which only encrypted data is handled, is proposed; the analysis shows that all requirements described in the ISO/IEC 24745 standard on biometric data protection are met with no accuracy degradation.
Abstract: Highlights: A new framework for multi-biometric template protection based on homomorphic encryption. A thorough evaluation of compliance with ISO/IEC IS 24745 on biometric information protection. A detailed analysis of the complexity overhead. In spite of the advantages of biometrics as an identity verification technology, some concerns have been raised due to the high sensitivity of biometric data: any information leakage poses a severe privacy threat. To address these issues, only protected templates should be stored or exchanged for recognition purposes. In order to improve performance and achieve more secure and privacy-preserving systems, we propose a general framework for multi-biometric template protection based on homomorphic probabilistic encryption, where only encrypted data is handled. Three fusion levels are thoroughly analysed, showing that all requirements described in the ISO/IEC 24745 standard on biometric data protection are met with no accuracy degradation. Furthermore, even though the whole process is carried out in the encrypted domain, no encryptions are necessary during verification, thereby allowing efficient verification that can be deployed in real-time applications. Finally, experiments are carried out on a reproducible research framework. The results show high accuracy rates, reaching EERs as low as 0.12%, with protected templates of 200 KB.
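As an illustration of what homomorphic probabilistic encryption makes possible, the sketch below uses the Paillier cryptosystem, a well-known additively homomorphic and probabilistic scheme; the abstract does not say which scheme the paper uses. It computes the encrypted inner product between an encrypted template and a plaintext probe without ever decrypting the template. The primes are toy values and completely insecure, the helper names are hypothetical, and the surrounding protocol (who holds the key, how the score is compared with a threshold) is not modeled.

```python
import math
import random

def keygen(p, q):
    """Paillier keys with generator g = n + 1 (p, q distinct primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid because g = n + 1
    return n, (n, lam, mu)

def encrypt(n, m):
    """Probabilistic: a fresh random r gives a different ciphertext for the same m."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c):
    n, lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """E(m1) * E(m2) = E(m1 + m2)."""
    return (c1 * c2) % (n * n)

def scale_encrypted(n, c, k):
    """E(m)^k = E(k * m) for a plaintext integer k >= 0."""
    return pow(c, k, n * n)

def encrypted_inner_product(n, enc_template, probe):
    """E(sum_i t_i * x_i) from encrypted t_i and plaintext integers x_i."""
    acc = 1  # 1 is a valid (non-random) encryption of 0
    for c, x in zip(enc_template, probe):
        acc = add_encrypted(n, acc, scale_encrypted(n, c, x))
    return acc

if __name__ == "__main__":
    pub, priv = keygen(1009, 1013)          # toy primes: insecure, illustration only
    enc_template = [encrypt(pub, t) for t in [3, 1, 4]]
    score = encrypted_inner_product(pub, enc_template, [2, 0, 5])
    print(decrypt(priv, score))             # 26 = 3*2 + 1*0 + 4*5
```

Note that the inner-product helper performs no fresh encryptions: it only multiplies and exponentiates existing ciphertexts, which is in the spirit of the abstract's point that verification can avoid encryption cost.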