Showing papers in "Pattern Recognition in 2014"
TL;DR: A fiducial marker system especially suited to camera pose estimation in applications such as augmented reality and robot localization is presented, and an algorithm for generating configurable marker dictionaries following a criterion to maximize the inter-marker distance and the number of bit transitions is proposed.
Abstract: This paper presents a fiducial marker system especially suited to camera pose estimation in applications such as augmented reality and robot localization. Three main contributions are presented. First, we propose an algorithm for generating configurable marker dictionaries (in size and number of bits) following a criterion to maximize the inter-marker distance and the number of bit transitions. In the process, we derive the maximum theoretical inter-marker distance that dictionaries of square binary markers can have. Second, a method for automatically detecting the markers and correcting possible errors is proposed. Third, a solution to the occlusion problem in augmented reality applications is shown. To that aim, multiple markers are combined with an occlusion mask calculated by color segmentation. The experiments conducted show that our proposal obtains dictionaries with higher inter-marker distances and lower false negative rates than state-of-the-art systems, and provides an effective solution to the occlusion problem. Highlights: We propose an algorithm for generating configurable marker dictionaries. We derive the maximum theoretical inter-marker distance. A method for automatically detecting the markers and correcting errors is proposed. A solution to the occlusion problem in augmented reality applications is shown.
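The dictionary criterion above hinges on an inter-marker distance that is invariant to the four 90-degree rotations of a square binary marker. A minimal sketch of how such a distance can be computed (function names are ours, not the paper's):

```python
import numpy as np

def marker_distance(m1, m2):
    """Minimum Hamming distance between marker m1 and the four
    90-degree rotations of marker m2 (both square binary arrays)."""
    return min(int(np.sum(m1 != np.rot90(m2, k))) for k in range(4))

def dictionary_min_distance(markers):
    """Inter-marker distance of a dictionary: the smallest pairwise
    rotation-invariant Hamming distance over all marker pairs."""
    n = len(markers)
    return min(marker_distance(markers[i], markers[j])
               for i in range(n) for j in range(i + 1, n))
```

A generation procedure in this spirit would accept a candidate marker only while the dictionary's minimum distance stays above a chosen threshold.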
TL;DR: This tutorial introduces RBMs from the viewpoint of Markov random fields, starting with the required concepts of undirected graphical models and reviewing the state-of-the-art in training restricted Boltzmann machines from the perspective of graphical models.
Abstract: Restricted Boltzmann machines (RBMs) are probabilistic graphical models that can be interpreted as stochastic neural networks. They have attracted much attention as building blocks for the multi-layer learning systems called deep belief networks, and variants and extensions of RBMs have found application in a wide range of pattern recognition tasks. This tutorial introduces RBMs from the viewpoint of Markov random fields, starting with the required concepts of undirected graphical models. Different learning algorithms for RBMs, including contrastive divergence learning and parallel tempering, are discussed. As sampling from RBMs, and therefore also most of their learning algorithms, is based on Markov chain Monte Carlo (MCMC) methods, an introduction to Markov chains and MCMC techniques is provided. Experiments demonstrate relevant aspects of RBM training. Highlights: We review the state-of-the-art in training restricted Boltzmann machines (RBMs) from the perspective of graphical models. Variants and extensions of RBMs are used in a wide range of pattern recognition tasks. The required background on graphical models and Markov chain Monte Carlo methods is provided. Theoretical and experimental results are presented.
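As a concrete illustration of one of the learning algorithms the tutorial covers, here is a hedged sketch of a single contrastive-divergence (CD-1) update for a binary RBM; biases, momentum, and mini-batching are omitted, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, v0, lr=0.1):
    """One CD-1 step for a binary RBM with weight matrix W
    (visible x hidden), given a batch of visible vectors v0."""
    # positive phase: hidden probabilities and a sample, given the data
    ph0 = sigmoid(v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # one Gibbs step: reconstruct visibles, then recompute hiddens
    pv1 = sigmoid(h0 @ W.T)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W)
    # approximate log-likelihood gradient: data term minus model term
    return W + lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
```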
TL;DR: An extensive review of biometric technology is presented, covering both mono-modal and multi-modal biometric systems along with their architecture and information fusion levels.
Abstract: Identity management through biometrics offers potential advantages over knowledge-based and possession-based methods. A wide variety of biometric modalities have been tested so far, but several factors limit the accuracy of mono-modal biometric systems. Usually, the analysis of multiple modalities offers better accuracy. An extensive review of biometric technology is presented here. Besides mono-modal systems, the article also discusses multi-modal biometric systems along with their architecture and information fusion levels. Along with illustrative evidence, the paper highlights the potential of biometric technology, its market value and its prospects.
TL;DR: This comprehensive study observed that, for some classification problems, the performance contribution of the dynamic selection approach is statistically significant when compared to that of a single classifier, and found evidence of a relation between the observed performance contribution and the complexity of the classification problem.
Abstract: This work presents a literature review of multiple classifier systems based on the dynamic selection of classifiers. First, it briefly reviews some basic concepts and definitions related to such a classification approach and then it presents the state of the art organized according to a proposed taxonomy. In addition, a two-step analysis is applied to the results of the main methods reported in the literature, considering different classification problems. The first step is based on statistical analyses of the significance of these results. The idea is to identify the problems for which a significant contribution can be observed in terms of classification performance by using a dynamic selection approach. The second step, based on data complexity measures, is used to investigate whether or not a relation exists between the possible performance contribution and the complexity of the classification problem. From this comprehensive study, we observed that, for some classification problems, the performance contribution of the dynamic selection approach is statistically significant when compared to that of a single classifier. In addition, we found evidence of a relation between the observed performance contribution and the complexity of the classification problem. These observations allow us to suggest, from the classification problem complexity, that further work should be done to predict whether or not to use a dynamic selection approach.
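To make the idea of dynamic selection concrete, here is a generic sketch in the spirit of local-accuracy-based selection: for each query point, pick the classifier from the pool that performs best on the query's nearest validation neighbors. This illustrates the family of methods surveyed, not any specific one from the review:

```python
import numpy as np

def dynamic_select(classifiers, X_val, y_val, x, k=3):
    """Pick the classifier most accurate on the k validation
    points nearest to the query x (names are ours)."""
    d = np.linalg.norm(X_val - x, axis=1)
    nn = np.argsort(d)[:k]  # indices of the k nearest neighbours
    accs = [np.mean([clf(p) == t for p, t in zip(X_val[nn], y_val[nn])])
            for clf in classifiers]
    return classifiers[int(np.argmax(accs))]
```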
TL;DR: A detailed overview of the state-of-the-art methods for still image-based action recognition is presented, and various high-level cues and low-level features for action analysis in still images are described.
Abstract: Recently, still image-based human action recognition has become an active research topic in computer vision and pattern recognition. It focuses on identifying a person's action or behavior from a single image. Unlike traditional action recognition approaches where videos or image sequences are used, a still image contains no temporal information for action characterization. Thus the prevailing spatiotemporal features for video-based action analysis are not appropriate for still image-based action recognition. It is more challenging to perform still image-based action recognition than the video-based one, given the limited source of information as well as the cluttered background of images collected from the Internet. On the other hand, a large number of still images exist on the Internet. Therefore it is desirable to develop robust and efficient methods for still image-based action recognition to better understand web images for image retrieval or search. Based on the emerging research of recent years, it is time to review the existing approaches to still image-based action recognition and to inspire more efforts to advance the field. We present a detailed overview of the state-of-the-art methods for still image-based action recognition, and categorize and describe various high-level cues and low-level features for action analysis in still images. All related databases are introduced in detail. Finally, we give our views and thoughts for future research.
TL;DR: This paper presents the largest inertial sensor-based gait database in the world, which is made open to the research community, and its application to a statistically reliable performance evaluation for gait-based personal authentication.
Abstract: This paper presents the largest inertial sensor-based gait database in the world, which is made open to the research community, and its application to a statistically reliable performance evaluation for gait-based personal authentication. We construct several datasets for both the accelerometers and gyroscopes of three inertial measurement units and a smartphone worn around the waist of a subject, which include at most 744 subjects (389 males and 355 females) with ages ranging from 2 to 78 years. The database has several advantages: a large number of subjects with a balanced gender ratio, and variations in sensor types, sensor locations, and ground slope conditions. Therefore, we can reliably analyze the dependence of gait authentication performance on a number of factors such as gender, age group, sensor type, ground condition, and sensor location. The results with the latest existing authentication methods provide several insights into these factors. Highlights: We present the world's largest inertial sensor-based gait database to the community. Based on the database, females have better recognition performance than males. People have the best recognition performance in their twenties. An accelerometer has better recognition performance than a gyroscope.
TL;DR: A new retinal vessel segmentation method based on level sets and region growing is proposed, combining the region growing method with a region-based active contour model implemented via level sets; an anisotropic diffusion filter is used to smooth the image and preserve vessel boundaries.
Abstract: Retinal vessels play an important role in the diagnostic procedure of retinopathy. Accurate segmentation of retinal vessels is crucial for pathological analysis. In this paper, we propose a new retinal vessel segmentation method based on level set and region growing. Firstly, a retinal vessel image is preprocessed by the contrast-limited adaptive histogram equalization and a 2D Gabor wavelet to enhance the vessels. Then, an anisotropic diffusion filter is used to smooth the image and preserve vessel boundaries. Finally, the region growing method and a region-based active contour model with level set implementation are applied to extract retinal vessels, and their results are combined to achieve the final segmentation. Comparisons are conducted on the publicly available DRIVE and STARE databases using three different measurements. Experimental results show that the proposed method reaches an average accuracy of 94.77% on the DRIVE database and 95.09% on the STARE database.
TL;DR: The MinMax k-Means algorithm is proposed, a method that assigns weights to the clusters relative to their variance and optimizes a weighted version of the k-Means objective, which limits the emergence of large variance clusters and allows high quality solutions to be systematically uncovered, irrespective of the initialization.
Abstract: Applying k-Means to minimize the sum of the intra-cluster variances is the most popular clustering approach. However, after a bad initialization, poor local optima can be easily obtained. To tackle the initialization problem of k-Means, we propose the MinMax k-Means algorithm, a method that assigns weights to the clusters relative to their variance and optimizes a weighted version of the k-Means objective. Weights are learned together with the cluster assignments, through an iterative procedure. The proposed weighting scheme limits the emergence of large variance clusters and allows high quality solutions to be systematically uncovered, irrespective of the initialization. Experiments verify the effectiveness of our approach and its robustness over bad initializations, as it compares favorably to both k-Means and other methods from the literature that consider the k-Means initialization problem.
TL;DR: The proposed novel structured dictionary learning method achieves better results than the existing sparse representation based face recognition methods, especially in dealing with large region contiguous occlusion and severe illumination variation, while the computational cost is much lower.
Abstract: Sparse representation based classification (SRC) has recently been proposed for robust face recognition. To deal with occlusion, SRC introduces an identity matrix as an occlusion dictionary on the assumption that the occlusion has a sparse representation in this dictionary. However, the results show that SRC's use of this occlusion dictionary is not nearly as robust to large occlusion as it is to random pixel corruption. In addition, the identity matrix renders the expanded dictionary large, which results in expensive computation. In this paper, we present a novel method, namely structured sparse representation based classification (SSRC), for face recognition with occlusion. A novel structured dictionary learning method is proposed to learn an occlusion dictionary from the data instead of an identity matrix. Specifically, a mutual incoherence regularization term is incorporated into the dictionary learning objective function, encouraging the occlusion dictionary to be as independent as possible of the training sample dictionary, so that the occlusion can be sparsely represented by a linear combination of atoms from the learned occlusion dictionary and effectively separated from the occluded face image. The classification can thus be efficiently carried out on the recovered non-occluded face images, and the size of the expanded dictionary is much smaller than that used in SRC. Extensive experiments demonstrate that the proposed method achieves better results than existing sparse representation based face recognition methods, especially in dealing with large contiguous occlusion and severe illumination variation, while the computational cost is much lower.
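One simple numeric reading of the mutual-incoherence regularizer is the squared Frobenius norm of the cross-Gram matrix between the training dictionary and the occlusion dictionary: it is zero when the atoms of the two dictionaries are mutually orthogonal and grows as they become correlated. The function name and exact form are our illustration, not necessarily the paper's formulation:

```python
import numpy as np

def mutual_incoherence(D_train, D_occ):
    """Squared Frobenius norm of the cross-Gram matrix between two
    dictionaries (columns = atoms); smaller = more independent."""
    return float(np.linalg.norm(D_train.T @ D_occ, 'fro') ** 2)
```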
TL;DR: A novel approach is presented for detecting whether the eyes in a given still face image are closed, a problem with wide potential applications in human–computer interface design, facial expression recognition, driver fatigue detection, and so on.
Abstract: In this paper, we present a novel approach to deal with the problem of detecting whether the eyes in a given still face image are closed, which has wide potential applications in human–computer interface design, facial expression recognition, driver fatigue detection, and so on. The approach combines the strength of multiple feature sets to characterize the rich information of eye patches (concerning both local/global shapes and local textures) and to construct the eye state model. To further improve the model's robustness against image noise and scale changes, we propose a new feature descriptor named Multi-scale Histograms of Principal Oriented Gradients (MultiHPOG). The resulting eye closeness detector handles a much wider range of eye appearance caused by expression, lighting, individual identity, and image noise than prior ones. We test our method on real-world eye datasets including the ZJU dataset and a new Closed Eyes in the Wild (CEW) dataset with promising results. In addition, several crucial design considerations that may have significant influence on the performance of a practical eye closeness detection system, including geometric normalization, feature extraction, and classification strategies, are also studied experimentally in this work.
TL;DR: A novel method is proposed for classifying six categories of fluorescence staining patterns of HEp-2 cells, combining the powerful rotation-invariant co-occurrence among adjacent local binary patterns (RIC-LBP) image feature with a linear support vector machine (SVM).
Abstract: This paper proposes a novel method for classifying six categories of patterns of fluorescence staining of a HEp-2 cell. The proposed method is constructed as a combination of the powerful rotation invariant co-occurrence among adjacent local binary pattern (RIC-LBP) image feature and a linear support vector machine (SVM). RIC-LBP provides high descriptive ability and robustness against local rotations of an input cell image. To further deal with global rotation, we synthesize many training images by rotating the original training images and constructing the SVM using both the original and synthesized images. The proposed method has the following advantages: (1) robustness against uniform changes in intensity of an input cell image, (2) invariance under local and global rotation of the image, (3) low computational cost, and (4) easy implementation. The proposed method was demonstrated to be effective through evaluation experiments using the MIVIA HEp-2 images dataset and comparison with typical state-of-the-art methods.
TL;DR: A new shape representation called Bag of Contour Fragments (BCF), inspired by the classical Bag of Words (BoW) model, is developed; it achieves state-of-the-art performance on several well-known shape benchmarks and can be applied to real image classification problems.
Abstract: Shape representation is a fundamental problem in computer vision. Current approaches to shape representation mainly focus on designing low-level shape descriptors which are robust to rotation, scaling and deformation of shapes. In this paper, we focus on mid-level modeling of shape representation. We develop a new shape representation called Bag of Contour Fragments (BCF) inspired by the classical Bag of Words (BoW) model. In BCF, a shape is decomposed into contour fragments, each of which is individually described using a shape descriptor, e.g., the Shape Context descriptor, and encoded into a shape code. Finally, a compact shape representation is built by pooling the shape codes in the shape. Shape classification with BCF only requires an efficient linear SVM classifier. In our experiments, we fully study the characteristics of BCF, show that BCF achieves state-of-the-art performance on several well-known shape benchmarks, and show that it can be applied to real image classification problems. Highlights: A new shape representation is proposed by encoding contour fragments in the shape. The proposed shape representation is compact yet informative. The proposed shape representation is robust to shape deformation and occlusion. We obtain state-of-the-art shape classification performance on several benchmark datasets.
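A toy sketch of the encode-then-pool step described above: each fragment descriptor is assigned to its nearest codeword, and the per-fragment codes are max-pooled into one fixed-length vector per shape. Hard assignment is shown for brevity; the paper uses a soft encoder, so treat the details here as illustrative:

```python
import numpy as np

def encode_and_pool(fragment_descs, codebook):
    """Assign each contour-fragment descriptor to its nearest
    codeword (hard BoW encoding), then max-pool over fragments
    to get one fixed-length vector for the whole shape."""
    codes = np.zeros((len(fragment_descs), len(codebook)))
    for i, f in enumerate(fragment_descs):
        j = np.argmin(((codebook - f) ** 2).sum(1))  # nearest codeword
        codes[i, j] = 1.0
    return codes.max(axis=0)
```

The pooled vector is what a linear SVM would then classify.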
TL;DR: A dynamic subspace detection (DSD) method establishing a multiple-detection framework is proposed; it finds the most suitable pixels for constructing detectors, and targets are further distinguished by fusing all the detection procedures.
Abstract: For hyperspectral target detection, it is usually the case that only part of the target pixels can be used as target signatures, so can we use them to construct the most appropriate background subspace for detecting all the probable targets? In this paper, a dynamic subspace detection (DSD) method which establishes a multiple-detection framework is proposed. In each detection procedure, blocks of pixels are selected by random sampling and a subsequent analysis of the detection performance distribution. Manifold analysis is further used to eliminate probable anomalous pixels and purify the subspace datasets, and the remaining pixels construct the subspace for each detection procedure. The final detection results are then enhanced by fusing the target occurrence frequencies over all the detection procedures. Experiments with both synthetic and real hyperspectral images (HSI) validate the proposed DSD method using several different state-of-the-art methods as the basic detectors. Compared with several other single detectors and multiple-detection methods, the DSD methods yield improved receiver operating characteristic curves and better separability between targets and backgrounds. The DSD methods also perform well with covariance-based detectors, showing their efficiency in selecting covariance information for detection. Highlights: The dynamic subspace detection theory provides a useful background statistics estimate. Our method can find the most suitable pixels to construct detectors. Targets can be further distinguished by a fusion of all the detection procedures.
TL;DR: The comprehensive Arabic offline Handwritten Text database (KHATT), consisting of 1000 handwritten forms written by 1000 distinct writers from different countries, is presented and made freely available to researchers world-wide for research in various handwriting-related problems.
Abstract: A comprehensive Arabic handwritten text database is an essential resource for Arabic handwritten text recognition research. This is especially true due to the lack of such a database for Arabic handwritten text. In this paper, we report our comprehensive Arabic offline Handwritten Text database (KHATT) consisting of 1000 handwritten forms written by 1000 distinct writers from different countries. The forms were scanned at 200, 300, and 600 dpi resolutions. The database contains 2000 randomly selected paragraphs from 46 sources, 2000 minimal-text paragraphs covering all the shapes of Arabic characters, and optionally written paragraphs on open subjects. The 2000 random text paragraphs consist of 9327 lines. The database forms were randomly divided into 70%, 15%, and 15% sets for training, testing, and verification, respectively. This enables researchers to use the database and compare their results. A formal verification procedure is implemented to align the handwritten text with its ground truth at the form, paragraph and line levels. The verified ground truth database contains meta-data describing the written text at the page, paragraph, and line levels in text and XML formats. Tools to extract paragraphs from pages and segment paragraphs into lines are developed. In addition, we present our experimental results on the database using two classifiers, viz. Hidden Markov Models (HMM) and our novel syntactic classifier. The database is made freely available to researchers world-wide for research in various handwriting-related problems such as text recognition, writer identification and verification, forms analysis, pre-processing, and segmentation. Several international research groups/researchers have already acquired the database for use in their research.
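The 70/15/15 partition described above is straightforward to reproduce on a list of form identifiers. This is illustrative only; the official split is the one distributed with the database:

```python
import random

def split_forms(form_ids, seed=0):
    """Randomly partition form ids into 70% training, 15% testing,
    and 15% verification sets (deterministic for a fixed seed)."""
    ids = list(form_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    a, b = int(0.70 * n), int(0.85 * n)
    return ids[:a], ids[a:b], ids[b:]
```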
TL;DR: A graph-based under-sampling strategy is introduced to keep the proximity information, which is robust to outliers, and the weight biases are embedded in the Lagrangian TWSVM formulations, which overcomes the bias phenomenon.
Abstract: In this paper, we propose an efficient weighted Lagrangian twin support vector machine (WLTSVM) for imbalanced data classification, based on using different training points for constructing the two proximal hyperplanes. The main contributions of our WLTSVM are: (1) a graph-based under-sampling strategy is introduced to keep the proximity information, which is robust to outliers; (2) the weight biases are embedded in the Lagrangian TWSVM formulations, which overcomes the bias phenomenon in the original TWSVM for imbalanced data classification; (3) the convergence of the training procedure of the Lagrangian functions is proven; and (4) it is tested and compared with other TWSVMs on synthetic and real datasets to show its feasibility and efficiency for imbalanced data classification.
TL;DR: A novel segmentation algorithm is proposed via a local correntropy-based K-means (LCK) clustering algorithm that is robust to outliers, and is incorporated into the region-based level set segmentation framework.
Abstract: It is still a challenging task to segment real-world images, since they are often distorted by unknown noise and intensity inhomogeneity. To address these problems, we propose a novel segmentation algorithm via local correntropy-based K-means (LCK) clustering. Due to the correntropy criterion, the clustering algorithm can decrease the weights of samples that are far from their clusters. As a result, the LCK-based clustering algorithm is robust to outliers. The proposed LCK clustering algorithm is incorporated into the region-based level set segmentation framework. An iteratively re-weighted algorithm is used to solve the LCK-based level set segmentation method. Extensive experiments on synthetic and real images are provided to evaluate our method, showing significant improvements in both noise sensitivity and segmentation accuracy compared with state-of-the-art approaches.
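The effect of the correntropy criterion, down-weighting samples far from their cluster, can be seen in one dimension with a single cluster centre. This is a toy sketch of the iteratively re-weighted idea, not the paper's level set formulation; the function name and median initialisation are ours:

```python
import numpy as np

def correntropy_mean(x, sigma=1.0, iters=10):
    """Iteratively re-weighted centre estimate: samples far from the
    current centre get exponentially small weights (Gaussian kernel),
    so outliers barely move the estimate. Initialised at the median
    to keep the first weights sensible."""
    c = np.median(x)
    for _ in range(iters):
        w = np.exp(-(x - c) ** 2 / (2 * sigma ** 2))
        c = (w * x).sum() / w.sum()
    return c
```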
TL;DR: This paper presents two sets of features, shape representation and kinematic structure, for human activity recognition using a sequence of RGB-D images, fused using the Multiple Kernel Learning (MKL) technique at the kernel level forhuman activity recognition.
Abstract: This paper presents two sets of features, shape representation and kinematic structure, for human activity recognition using a sequence of RGB-D images. The shape features are extracted using the depth information in the frequency domain via spherical harmonics representation. The other features include the motion of the 3D joint positions (i.e. the end points of the distal limb segments) in the human body. Both sets of features are fused using the Multiple Kernel Learning (MKL) technique at the kernel level for human activity recognition. Our experiments on three publicly available datasets demonstrate that the proposed features are robust for human activity recognition and particularly when there are similarities among the actions.
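Kernel-level fusion, as used above, amounts to forming a convex combination of the per-feature-set Gram matrices before training a single classifier. A minimal sketch with fixed weights (in MKL proper the weights would be learned jointly with the classifier, which is omitted here):

```python
import numpy as np

def combine_kernels(kernels, betas):
    """Combined Gram matrix as a convex combination of per-feature-set
    Gram matrices (e.g., one for shape, one for kinematic features)."""
    betas = np.asarray(betas, float)
    assert np.all(betas >= 0) and abs(betas.sum() - 1.0) < 1e-9
    return sum(b * K for b, K in zip(betas, kernels))
```

The combined matrix stays positive semidefinite because each summand is, which is what makes it a valid kernel for an SVM.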
TL;DR: The final aim of this solution is to provide a building block for a new generation of smartphone applications which need fast and accurate ellipse detection even with limited computational resources.
Abstract: Several papers have addressed ellipse detection as a first step for various computer vision applications, but most of the proposed solutions are too slow to be applied in real time on large images or with limited hardware resources. This paper presents a novel algorithm for fast and effective ellipse detection and demonstrates its superior speed on large and challenging datasets. The proposed algorithm relies on an innovative strategy for selecting arcs which are candidates to form ellipses, and on the use of the Hough transform to estimate parameters in a decomposed space. The final aim of this solution is to provide a building block for a new generation of smartphone applications which need fast and accurate ellipse detection even with limited computational resources.
TL;DR: An automatic procedure to classify legume species from scanned leaves, based only on the analysis of their veins, is presented; using state-of-the-art classifiers, it outperforms human expert classification.
Abstract: In this paper, a procedure for segmenting and classifying scanned legume leaves based only on the analysis of their veins is proposed (leaf shape, size, texture and color are discarded). Three legume species are studied, namely soybean, red beans and white beans. The leaf images are acquired using a standard scanner. The segmentation is performed using the unconstrained hit-or-miss transform and adaptive thresholding. Several morphological features are computed on the segmented venation and classified using four alternative classifiers, namely support vector machines (linear and Gaussian kernels), penalized discriminant analysis and random forests. The performance is compared to that obtained with cleared leaf images, which require a more expensive, time-consuming and delicate acquisition procedure. The results are encouraging, showing that the proposed approach is an effective and more economical alternative that outperforms manual expert recognition. Highlights: We develop an automatic procedure to classify legume species using scanned leaves. The method is based exclusively on the analysis of the leaf venation images. We analyze the advantages over the usage of cleared leaves. Different state-of-the-art classifiers are compared. The proposed method outperforms human expert classification.
TL;DR: This paper proposes dependent binary relevance (DBR) learning, a multi-label classification technique that exploits conditional label dependence, and provides a careful analysis of its relationship to other techniques such as chaining and stacking.
Abstract: Several meta-learning techniques for multi-label classification (MLC), such as chaining and stacking, have already been proposed in the literature, mostly aimed at improving predictive accuracy through the exploitation of label dependencies. In this paper, we propose another technique of that kind, called dependent binary relevance (DBR) learning. DBR combines properties of both chaining and stacking. We provide a careful analysis of the relationship between these and other techniques, specifically focusing on the underlying dependency structure and the type of training data used for model construction. Moreover, we offer an extensive empirical evaluation, in which we compare different techniques on MLC benchmark data. Our experiments provide evidence for the good performance of DBR in terms of several evaluation measures that are commonly used in MLC. Highlights: We propose DBR as a multi-label classifier that exploits conditional label dependence. DBR combines properties of both chaining and stacking learning strategies. We provide a careful analysis of the relationship between these techniques. We study the underlying dependency structure and the type of training data used. Our experiments show the good performance of DBR in terms of several measures.
TL;DR: This paper presents a novel framework for recognizing streamed actions using Motion Capture (MoCap) data, based on histograms of action poses extracted from MoCap data and computed using the Hausdorff distance.
Abstract: Ongoing human action recognition is a challenging problem that has many applications, such as video surveillance, patient monitoring, human-computer interaction, etc. This paper presents a novel framework for recognizing streamed actions using Motion Capture (MoCap) data. Unlike the after-the-fact classification of completed activities, this work aims at achieving early recognition of ongoing activities. The proposed method is time efficient as it is based on histograms of action poses, extracted from MoCap data, that are computed according to the Hausdorff distance. The histograms are then compared with the Bhattacharyya distance and warped by a dynamic time warping process to achieve their optimal alignment. This process, implemented by our dynamic programming-based solution, has the advantage of allowing some stretching flexibility to accommodate possible action length changes. We have shown the success and effectiveness of our solution by testing it on large datasets and comparing it with several state-of-the-art methods. In particular, we were able to achieve excellent recognition rates that outperform many well-known methods. Highlights: Human motion interpretation from Motion Capture systems. Histogram-of-poses analysis. Ongoing activity recognition.
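The comparison and alignment steps above can be sketched directly: the Bhattacharyya distance as the local cost inside a plain dynamic time warping recursion. The paper's streaming/early-recognition machinery is omitted, and names are ours:

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya distance between two normalised histograms."""
    return -np.log(np.sum(np.sqrt(p * q)) + 1e-12)

def dtw(seq_a, seq_b):
    """Classic dynamic time warping over two sequences of pose
    histograms, using the Bhattacharyya distance as local cost;
    returns the total cost of the optimal alignment."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = bhattacharyya(seq_a[i - 1], seq_b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

The min over the three predecessors is what gives the "stretching flexibility" for actions of different lengths.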
TL;DR: A location-variant linear filter for PolSAR speckle reduction is proposed; homogeneous pixels in the filtering area are selected through statistical tests between distributions and used to compute a local mean.
Abstract: This paper presents a technique for reducing speckle in Polarimetric Synthetic Aperture Radar (PolSAR) imagery using nonlocal means and a statistical test based on stochastic divergences. The main objective is to select homogeneous pixels in the filtering area through statistical tests between distributions. This proposal uses the complex Wishart model to describe PolSAR data, but the technique can be extended to other models. The weights of the location-variant linear filter are a function of the p-values of tests which verify the hypothesis that two samples come from the same distribution and, therefore, can be used to compute a local mean. The test stems from the family of (h-φ) divergences which originated in Information Theory. This novel technique is compared with the Boxcar, Refined Lee and IDAN filters. Image quality assessment methods on simulated and real data are employed to validate the performance of this approach. We show that the proposed filter also enhances the polarimetric entropy and preserves the scattering information of the targets. Highlights: A new convolution filter for fully polarimetric SAR imagery which preserves scattering properties. It is built by applying a soft function to the p-values of tests from stochastic divergences. We present both quantitative and qualitative assessments of the filter.
TL;DR: The RSCFCM algorithm is proposed, utilizing the negative log-posterior as the dissimilarity function, introducing a novel factor and integrating the bias field estimation model into the fuzzy objective function, which successfully overcomes the drawbacks of existing FCM-type clustering schemes and EM-type mixture models.
Abstract: Objective Accurate brain tissue segmentation from magnetic resonance (MR) images is an essential step in quantitative brain image analysis, and hence has attracted extensive research attention. However, due to the noise and intensity inhomogeneity present in brain MR images, many segmentation algorithms suffer from limited robustness to outliers, over-smoothed segmentations and limited accuracy for image details. To further improve the accuracy of brain MR image segmentation, a robust spatially constrained fuzzy c-means (RSCFCM) algorithm is proposed in this paper. Method Firstly, a novel spatial factor is proposed to overcome the impact of noise in the images. By incorporating spatial information among neighborhood pixels, the proposed spatial factor is constructed from the posterior and prior probabilities and takes the spatial direction into account. It acts as a linear filter for smoothing and restoring images corrupted by noise; it is therefore fast and easy to implement, and preserves more detail. Secondly, the negative log-posterior is utilized as the dissimilarity function by taking the prior probabilities into account, which further improves the ability to identify the class of each pixel. Finally, to overcome the impact of intensity inhomogeneity, we approximate the bias field at the pixel-by-pixel level using a linear combination of orthogonal polynomials. The fuzzy objective function is then integrated with the bias field estimation model to overcome the intensity inhomogeneity in the image and segment the brain MR images simultaneously. Results To demonstrate the performance of the proposed algorithm on images with and without skull stripping, the first group of experiments is carried out on clinical 3T-weighted brain MR images which contain quite serious intensity inhomogeneity and noise.
Then we quantitatively compare our algorithm to state-of-the-art segmentation approaches using the Jaccard similarity on benchmark images obtained from IBSR and BrainWeb with different levels of noise and intensity inhomogeneity. The comparison results demonstrate that the proposed algorithm produces more accurate segmentations and has a stronger denoising ability, especially in areas with abundant texture and detail. Conclusion In this paper, the RSCFCM algorithm is proposed by utilizing the negative log-posterior as the dissimilarity function, introducing a novel spatial factor and integrating the bias field estimation model into the fuzzy objective function. This algorithm successfully overcomes the drawbacks of existing FCM-type clustering schemes and EM-type mixture models. Our statistical results (mean and standard deviation of the Jaccard similarity for each tissue) on both synthetic and clinical images show that the proposed algorithm can overcome the difficulties caused by noise and bias fields, and improves segmentation accuracy by over 5% compared with several state-of-the-art algorithms.
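For context, the plain fuzzy c-means core that RSCFCM extends (without the proposed spatial factor and bias-field model) can be sketched as follows; the 1-D toy intensities and function name are our own:

```python
import numpy as np

def fcm(x, k=2, m=2.0, iters=50):
    """Plain fuzzy c-means on 1-D intensities: alternate the membership
    update u_ik proportional to d_ik^(-2/(m-1)) and the centroid update
    v_i = sum_k u_ik^m x_k / sum_k u_ik^m."""
    v = np.quantile(x, np.linspace(0.1, 0.9, k))    # spread-out init
    for _ in range(iters):
        d = np.abs(x[None, :] - v[:, None]) + 1e-9  # (k, n) distances
        w = d ** (-2.0 / (m - 1.0))
        u = w / w.sum(axis=0, keepdims=True)        # fuzzy memberships
        um = u ** m
        v = (um @ x) / um.sum(axis=1)               # centroid update
    return v, u

# Two well-separated intensity clusters around 0 and 10.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.1, 50), rng.normal(10.0, 0.1, 50)])
centroids, memberships = fcm(x)
centroids = np.sort(centroids)
```

RSCFCM modifies this objective by replacing the distance with the negative log-posterior, adding the spatial factor, and estimating the bias field jointly.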
TL;DR: This paper presents a computationally efficient 3D face recognition system based on a novel facial signature called Angular Radial Signature (ARS) which is extracted from the semi-rigid region of the face which can handle expression variations.
Abstract: This paper presents a computationally efficient 3D face recognition system based on a novel facial signature called Angular Radial Signature (ARS) which is extracted from the semi-rigid region of the face. Kernel Principal Component Analysis (KPCA) is then used to extract the mid-level features from the extracted ARSs to improve the discriminative power. The mid-level features are then concatenated into a single feature vector and fed into a Support Vector Machine (SVM) to perform face recognition. The proposed approach addresses the expression variation problem by using facial scans with various expressions of different individuals for training. We conducted a number of experiments on the Face Recognition Grand Challenge (FRGC v2.0) and the 3D track of Shape Retrieval Contest (SHREC 2008) datasets, achieving superior recognition performance. Our experimental results show that the proposed system achieves very high Verification Rates (VRs) of 97.8% and 88.5% at a 0.1% False Acceptance Rate (FAR) for the "neutral vs. nonneutral" experiments on the FRGC v2.0 and the SHREC 2008 datasets respectively, and 96.7% for the ROC III experiment of the FRGC v2.0 dataset. Our experiments also demonstrate the computational efficiency of the proposed approach. HighlightsNovel facial Angular Radial Signatures (ARSs) are proposed for 3D face recognition.The Signatures are extracted from the semi-rigid facial regions.A two-stage mapping-based classification strategy is used to perform face recognition.ARSs combined with machine learning techniques can handle expression variations.State-of-the-art performance on two public datasets with high efficiency is achieved.
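A hedged sketch of the mid-level feature extraction and classification stages, KPCA followed by an SVM, is given below using scikit-learn; the random vectors merely stand in for the paper's ARS features, and all parameter values are our own:

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in "ARS" features: 16-D vectors for three synthetic subjects.
# In the paper these would be angular radial signatures sampled from
# the semi-rigid facial region.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(i, 0.3, size=(20, 16)) for i in range(3)])
y = np.repeat([0, 1, 2], 20)

# KPCA lifts the signatures into a more discriminative mid-level space,
# then an SVM performs the final identity decision.
clf = make_pipeline(KernelPCA(n_components=8, kernel="rbf", gamma=0.1),
                    SVC(kernel="linear"))
clf.fit(X, y)
acc = clf.score(X, y)
```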
TL;DR: A new method to increase the diversity of each tree in the forests and thereby improve the overall accuracy of the Random Forests in most cases is proposed.
Abstract: Random Forests receive much attention from researchers because of their excellent performance. As Breiman suggested, the performance of Random Forests depends on the strength of the weak learners in the forests and the diversity among them. However, in the literature, many researchers have only considered pre-processing of the data or post-processing of the Random Forests models. In this paper, we propose a new method to increase the diversity of each tree in the forests and thereby improve the overall accuracy. During the training of each individual tree in the forest, different rotation spaces are concatenated into a higher-dimensional space at the root node. The best split is then searched exhaustively within this higher-dimensional space, and the location of the best split determines which rotation method is used for all subsequent nodes. The performance of the proposed method is evaluated on 42 benchmark data sets from various research fields and compared with the standard Random Forests. The results show that the proposed method improves the performance of the Random Forests in most cases.
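The root-node idea can be sketched as follows: concatenate the original space with a rotated copy and let the tree's root split choose between them. This simplified version uses a single PCA rotation and a plain scikit-learn decision tree; unlike the paper, it does not restrict subsequent nodes to the space chosen at the root:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # oblique class boundary

# Concatenate the original space with a PCA-rotated copy; the tree's
# first (root) split then picks whichever space fits the data best.
X_rot = PCA(n_components=4).fit_transform(X)
X_aug = np.hstack([X, X_rot])

tree = DecisionTreeClassifier(random_state=0).fit(X_aug, y)
root_feature = tree.tree_.feature[0]           # feature chosen at the root
used_rotated = root_feature >= X.shape[1]      # True if from rotated space
```

Oblique boundaries like this one are exactly where a rotated space lets a single axis-aligned split do the work of many.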
TL;DR: A Monte Carlo approach for efficient classifier chains, applied to learning from multi-label and multi-dimensional data, and an empirical cross-fold comparison with PCC and other related methods is presented.
Abstract: Multi-dimensional classification (MDC) is the supervised learning problem where an instance is associated with multiple classes, rather than with a single class, as in traditional classification problems. Since these classes are often strongly correlated, modeling the dependencies between them allows MDC methods to improve their performance - at the expense of an increased computational cost. In this paper we focus on the classifier chains (CC) approach for modeling dependencies, one of the most popular and highest-performing methods for multi-label classification (MLC), a particular case of MDC which involves only binary classes (i.e., labels). The original CC algorithm makes a greedy approximation, and is fast but tends to propagate errors along the chain. Here we present novel Monte Carlo schemes, both for finding a good chain sequence and performing efficient inference. Our algorithms remain tractable for high-dimensional data sets and obtain the best predictive performance across several real data sets. HighlightsA Monte Carlo approach for efficient classifier chains.Applied to learning from multi-label and multi-dimensional data.A theoretical and empirical study of payoff functions in the search space.An empirical cross-fold comparison with PCC and other related methods.
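A toy Monte Carlo search over chain sequences, in the spirit of (but much simpler than) the paper's scheme, can be sketched with scikit-learn's ClassifierChain; the synthetic labels (the second depends on the first) and the holdout scoring are our own:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y1 = (X[:, 0] + X[:, 1] > 0).astype(int)
y2 = (X[:, 2] + y1 - 0.5 > 0).astype(int)      # label 2 depends on label 1
Y = np.column_stack([y1, y2])

# Monte Carlo over chain orders: sample orders, keep the best one as
# measured by mean per-label accuracy on a holdout split.
best_order, best_score = None, -1.0
for _ in range(5):
    order = list(rng.permutation(Y.shape[1]))
    cc = ClassifierChain(LogisticRegression(), order=order)
    cc.fit(X[:150], Y[:150])
    score = (cc.predict(X[150:]) == Y[150:]).mean()
    if score > best_score:
        best_order, best_score = order, score
```

With correlated labels, orders that place the "parent" label earlier in the chain tend to score higher, which is what the sampling exploits.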
TL;DR: This paper proposes a query expansion technique for image search that is faster and more precise than the existing ones and significantly outperforms the visual query expansion state of the art on popular benchmarks.
Abstract: This paper proposes a query expansion technique for image search that is faster and more precise than the existing ones. An enriched representation of the query is obtained by exploiting the binary representation offered by the Hamming Embedding image matching approach: The initial local descriptors are refined by aggregating those of the database, while new descriptors are produced from the images that are deemed relevant. The technique has two computational advantages over other query expansion techniques. First, the size of the enriched representation is comparable to that of the initial query. Second, the technique is effective even without using any geometry, in which case searching a database comprising 105k images typically takes 80 ms on a desktop machine. Overall, our technique significantly outperforms the visual query expansion state of the art on popular benchmarks. It is also the first query expansion technique shown effective on the UKB benchmark, which has few relevant images per query.
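As a simplified stand-in for the Hamming-embedding-based expansion (plain average query expansion over global vectors rather than the paper's local-descriptor aggregation), the enrichment step can be sketched as follows; note the enriched query keeps the size of the original, the first advantage the paper highlights:

```python
import numpy as np

def expand_query(q, db, top_k=3):
    """Average query expansion: rank the database against the query,
    average the query with its top-ranked neighbours, and renormalize.
    The enriched representation has the same size as the original."""
    q = q / np.linalg.norm(q)
    dbn = db / np.linalg.norm(db, axis=1, keepdims=True)
    scores = dbn @ q                            # cosine similarities
    top = dbn[np.argsort(-scores)[:top_k]]      # images deemed relevant
    q_exp = q + top.sum(axis=0)
    return q_exp / np.linalg.norm(q_exp)

rng = np.random.default_rng(0)
db = rng.normal(size=(100, 8))
q = rng.normal(size=8)
q_new = expand_query(q, db)
```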
TL;DR: An effective alignment-free method for constructing cancelable fingerprint templates via curtailed circular convolution, featuring an efficient one-way transform that protects the input binary string so that it cannot be retrieved from the length-reduced, convolved output vector.
Abstract: Fraudulent use of stolen fingerprint data and privacy invasion by tracking individuals unlawfully with shared or stolen fingerprint data justify the significance of fingerprint template protection. Because they require no a priori fingerprint image registration, alignment-free cancelable fingerprint templates do not suffer from inaccurate singular point detection. In this paper, we propose an effective alignment-free method for constructing cancelable fingerprint templates via curtailed circular convolution. The proposed method features an efficient one-way transform, which protects the input binary string such that it cannot be retrieved from the length-reduced, convolved output vector. The transformed template fulfills the requirements of non-invertibility, revocability and diversity for cancelable fingerprint templates. Evaluation of the proposed scheme over FVC2002 DB1, DB2 and DB3 shows that the new method demonstrates satisfactory performance compared to the existing alignment-free cancelable template schemes. HighlightsAlignment-free cancelable fingerprint templates.Non-invertible transform via curtailed circular convolution.Satisfactory matching performance with low equal error rate (EER).Generated cancelable templates have the properties of non-invertibility, revocability and diversity.Non-disclosure of original fingerprint data.
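The transform itself can be sketched in a few lines: circularly convolve the binary input string with a user-specific random key, then keep only a truncated ("curtailed") prefix of the output, which makes the mapping many-to-one and hence hard to invert; revoking a template amounts to issuing a new key. All sizes and names below are illustrative, not the paper's parameters:

```python
import numpy as np

def cancelable_template(bits, key_bits, out_len):
    """Circular convolution of a binary feature string with a random
    binary key, computed via the FFT, curtailed to out_len entries.
    Discarding part of the output leaves the inversion problem
    underdetermined."""
    n = len(bits)
    full = np.real(np.fft.ifft(np.fft.fft(bits, n) * np.fft.fft(key_bits, n)))
    return np.round(full[:out_len]).astype(int)   # integer counts

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, 64)       # binary string from fingerprint features
key = rng.integers(0, 2, 64)        # user-specific revocable key
tmpl = cancelable_template(bits, key, out_len=40)
```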
TL;DR: The experimental results show that the proposed model is very efficient in recognizing six basic emotions while ensuring significant increase in average classification accuracy over radial basis function and multi-layered perceptron.
Abstract: This paper presents a novel emotion recognition model using the system identification approach. A comprehensive data-driven model using an extended Kohonen self-organizing map (KSOM) has been developed whose input is a 26-dimensional facial geometric feature vector comprising eye, lip and eyebrow feature points. The analytical face model using this 26-dimensional geometric feature vector has been effectively used to describe the facial changes due to different expressions. This paper thus includes an automated generation scheme for this geometric facial feature vector. The proposed non-heuristic model has been developed using training data from the MMI facial expression database. The emotion recognition accuracy of the proposed scheme has been compared with radial basis function network, multi-layered perceptron and support vector machine based recognition schemes. The experimental results show that the proposed model is very efficient in recognizing six basic emotions while ensuring a significant increase in average classification accuracy over the radial basis function network and the multi-layered perceptron. It also shows that the average recognition rate of the proposed method is comparatively better than that of the multi-class support vector machine. HighlightsWe propose an emotion recognition model using system identification.A twenty-six-dimensional geometric feature vector is extracted using three different algorithms.Classification using an intermediate Kohonen self-organizing map layer.A comparative study with radial basis function, multi-layer perceptron and support vector machine.Efficient recognition results with significant increase in average recognition accuracy over radial basis function and multi-layer perceptron. Marginal improvement over support vector machine.
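A minimal standard Kohonen self-organizing map, the building block the paper's extended KSOM is based on, can be sketched as follows; the grid size, learning-rate schedule and toy 2-D data are our own choices, not the paper's 26-dimensional setup:

```python
import numpy as np

def train_som(data, n_units=6, iters=200, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal 1-D Kohonen SOM: for each sample, find the best-matching
    unit (BMU) and pull the BMU and its grid neighbours toward the
    sample, with a Gaussian neighbourhood that shrinks over time."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(n_units, data.shape[1]))   # unit weight vectors
    grid = np.arange(n_units)                       # 1-D lattice positions
    for t in range(iters):
        x = data[rng.integers(len(data))]
        bmu = np.argmin(np.linalg.norm(w - x, axis=1))
        lr = lr0 * (1 - t / iters)                  # decaying learning rate
        sigma = sigma0 * (1 - t / iters) + 1e-3     # shrinking neighbourhood
        h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
        w += lr * h[:, None] * (x - w)
    return w

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(-2, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
w = train_som(data)
```

After training, units at opposite ends of the lattice settle near the two data clusters, which is the topology-preserving mapping that makes a SOM layer useful as a classification intermediate.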
TL;DR: The extended factorized approximation method is applied to introduce a single lower-bound to the variational objective function and an analytically tractable estimation solution is derived and the convergence of the proposed algorithm is theoretically guaranteed.
Abstract: In statistical modeling, parameter estimation is an essential and challenging task. Estimation of the parameters in the Dirichlet mixture model (DMM) is analytically intractable due to the integral expressions of the gamma function and its corresponding derivatives. We introduce a Bayesian estimation strategy to estimate the posterior distribution of the parameters in the DMM. By assuming a gamma distribution as the prior on each parameter, we approximate both the prior and the posterior distribution of the parameters with a product of several mutually independent gamma distributions. The extended factorized approximation method is applied to introduce a single lower bound to the variational objective function, and an analytically tractable estimation solution is derived. Moreover, only one function is maximized during the iterations; therefore, the convergence of the proposed algorithm is theoretically guaranteed. On synthesized data, the proposed method shows advantages over the EM-based method and a previously proposed Bayesian estimation method. Its good performance is further demonstrated in two important multimedia signal processing applications.
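To make the model concrete (this illustrates the DMM itself, not the proposed variational algorithm), the responsibilities of a two-component Dirichlet mixture for a point on the simplex can be computed as follows; the component parameters and weights are arbitrary toy values:

```python
import numpy as np
from scipy.stats import dirichlet

# Toy Dirichlet mixture: two components over the 3-simplex.  Computing
# responsibilities (posterior component probabilities) is a step any
# DMM estimator, EM-based or Bayesian, has to perform.
alphas = [np.array([5.0, 2.0, 2.0]), np.array([2.0, 2.0, 5.0])]
weights = np.array([0.5, 0.5])

x = np.array([0.6, 0.2, 0.2])                  # one point on the simplex
lik = np.array([dirichlet.pdf(x, a) for a in alphas])
resp = weights * lik / np.sum(weights * lik)   # responsibilities
```

The point leans toward the first coordinate, so the first component (whose concentration favors that coordinate) takes almost all of the responsibility.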