Showing papers in "Pattern Recognition in 2016"
TL;DR: A hybrid model where an unsupervised DBN is trained to extract generic underlying features, and a one-class SVM is trained from the features learned by the DBN, which delivers a comparable accuracy with a deep autoencoder and is scalable and computationally efficient.
Abstract: High-dimensional problem domains pose significant challenges for anomaly detection. The presence of irrelevant features can conceal the presence of anomalies. This problem, known as the 'curse of dimensionality', is an obstacle for many anomaly detection techniques. Building a robust anomaly detection model for use in high-dimensional spaces requires the combination of an unsupervised feature extractor and an anomaly detector. While one-class support vector machines are effective at producing decision surfaces from well-behaved feature vectors, they can be inefficient at modelling the variation in large, high-dimensional datasets. Architectures such as deep belief networks (DBNs) are a promising technique for learning robust features. We present a hybrid model where an unsupervised DBN is trained to extract generic underlying features, and a one-class SVM is trained from the features learned by the DBN. Since a linear kernel can be substituted for nonlinear ones in our hybrid model without loss of accuracy, our model is scalable and computationally efficient. The experimental results show that our proposed model yields anomaly detection performance comparable to that of a deep autoencoder, while reducing its training and testing time by a factor of 3 and 1000, respectively. Highlights: We use a combination of a one-class SVM and deep learning. In our model linear kernels can be used rather than nonlinear ones. Our model delivers accuracy comparable to that of a deep autoencoder. Our model executes 3 times faster in training and 1000 times faster in testing than a deep autoencoder.
TL;DR: Two Mixed Integer Linear Programming (MILP) approaches to generate configurable square-based fiducial marker dictionaries maximizing their inter-marker distance are proposed.
Abstract: Square-based fiducial markers are one of the most popular approaches for camera pose estimation due to their fast detection and robustness. In order to maximize their error correction capabilities, it is required to use an inner binary codification with a large inter-marker distance. This paper proposes two Mixed Integer Linear Programming (MILP) approaches to generate configurable square-based fiducial marker dictionaries maximizing their inter-marker distance. The first approach guarantees the optimal solution; however, it can only be applied to relatively small dictionaries and numbers of bits, since the computing times are too long for many situations. The second approach is an alternative formulation to obtain suboptimal dictionaries within restricted time, achieving results that still significantly surpass the current state-of-the-art methods. Highlights: The paper proposes two methods to obtain fiducial markers based on the MILP paradigm. The first model guarantees optimality in terms of inter-marker distance. The second model generates suboptimal markers within restricted time. The markers generated allow the correction of a higher number of erroneous bits.
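The quantity being maximized can be illustrated with a small sketch. It assumes, as is common for square markers but not stated in the abstract, that the inter-marker distance is the minimum Hamming distance over all marker pairs and all four 90-degree rotations, including each marker against its own rotations. The function names are hypothetical, and this only evaluates a given dictionary; it is not the MILP formulation.

```python
from itertools import combinations

def rotate90(marker):
    """Rotate a square bit matrix 90 degrees clockwise."""
    return [list(row) for row in zip(*marker[::-1])]

def hamming(a, b):
    """Count differing bits between two equally sized bit matrices."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def rotations(marker):
    """Return the four 90-degree rotations of a marker, starting with itself."""
    out = [marker]
    for _ in range(3):
        out.append(rotate90(out[-1]))
    return out

def inter_marker_distance(dictionary):
    """Minimum Hamming distance over all marker pairs and rotations.

    Each marker is also compared with its own non-trivial rotations, so a
    rotationally symmetric marker yields a distance of 0.
    """
    distances = []
    for a, b in combinations(dictionary, 2):
        distances.extend(hamming(a, rb) for rb in rotations(b))
    for m in dictionary:
        distances.extend(hamming(m, rm) for rm in rotations(m)[1:])
    return min(distances)

# Two toy 2x2 markers (real dictionaries use larger grids).
m1 = [[1, 0], [0, 0]]
m2 = [[1, 1], [1, 0]]
print(inter_marker_distance([m1, m2]))
```

A larger minimum distance lets a detector tolerate more flipped bits before one marker can be mistaken for another, which is why the dictionaries are built to maximize it.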
TL;DR: Experimental results on three sequences demonstrate that the proposed small target detection method can not only suppress background clutter effectively, even under strong noise interference, but also detect targets accurately with a low false alarm rate and high speed.
Abstract: Infrared (IR) small target detection plays an important role in IR guidance systems. In this paper, a biologically inspired method called multiscale patch-based contrast measure (MPCM) is proposed for small IR target detection. MPCM can increase the contrast between target and background, which makes it easy to segment small targets with a simple adaptive thresholding method. Experimental results on three sequences demonstrate that the proposed small target detection method can not only suppress background clutter effectively, even under strong noise interference, but also detect targets accurately with a low false alarm rate and high speed. Highlights: A biologically inspired target enhancement method called multiscale patch-based contrast measure (MPCM) is presented. Based on MPCM, a small IR target detection algorithm is designed. The proposed small target detection method achieves promising detection performance on three real IR image sequences.
TL;DR: This survey highlights motivations and challenges of this very recent research area by presenting technologies and approaches for 3D skeleton-based action classification, and introduces a categorization of the most recent works according to the adopted feature representation.
Abstract: In recent years, there has been a proliferation of works on human action classification from depth sequences. These works generally present methods and/or feature representations for the classification of actions from sequences of 3D locations of human body joints and/or other sources of data, such as depth maps and RGB videos. This survey highlights motivations and challenges of this very recent research area by presenting technologies and approaches for 3D skeleton-based action classification. The work focuses on aspects such as data pre-processing, publicly available benchmarks and commonly used accuracy measurements. Furthermore, this survey introduces a categorization of the most recent works in 3D skeleton-based action classification according to the adopted feature representation. This paper aims at being a starting point for practitioners who wish to approach the study of 3D action classification and gather insights on the main challenges to solve in this emerging field. Highlights: State of the art 3D skeleton-based action classification methods are reviewed. Methods are categorized based on the adopted feature representation. Motivations and challenges for skeleton-based action recognition are highlighted. Data pre-processing, public benchmarks and validation protocols are discussed. Comparison of renowned methods, open problems and future work are presented.
TL;DR: It is proved that the newly-defined entropy meets the common requirement of monotonicity and can equivalently characterize the existing attribute reductions in the fuzzy rough set theory.
Abstract: Feature selection in data with different types of feature values, i.e., heterogeneous or mixed data, is of particular practical importance because such data sets widely exist in the real world. The key issue for feature selection in mixed data is how to properly deal with different types of features or attributes in the data set. Motivated by the fuzzy rough set theory, which allows different fuzzy relations to be defined for different types of attributes to measure the similarity between objects, and in view of the effectiveness of entropy for measuring information uncertainty, we propose in this paper a fuzzy rough set-based information entropy for feature selection in a mixed data set. It is proved that the newly-defined entropy meets the common requirement of monotonicity and can equivalently characterize the existing attribute reductions in the fuzzy rough set theory. Then, a feature selection algorithm is formulated based on the proposed entropy and a filter-wrapper method is suggested to select the best feature subset in terms of classification accuracy. An extensive numerical experiment is further conducted to assess the performance of the feature selection method and the results are satisfactory. Highlights: A novel fuzzy rough set-based information entropy is constructed for mixed data. The proposed entropy can equivalently characterize the existing attribute reductions in the fuzzy rough set theory. A feature selection algorithm is formulated based on the proposed entropy. A filter-wrapper method is suggested to select the best feature subset.
TL;DR: In this article, a comprehensive review of the most commonly used action recognition related RGB-D video datasets, including 27 single-view, 10 multi-view and 7 multi-person datasets, is presented.
Abstract: Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work reported in 2010. Over this period, many benchmark datasets have been created to facilitate the development and evaluation of new algorithms. This raises the question of which dataset to select and how to use it in providing a fair and objective comparative evaluation against state-of-the-art methods. To address this issue, this paper provides a comprehensive review of the most commonly used action recognition related RGB-D video datasets, including 27 single-view datasets, 10 multi-view datasets, and 7 multi-person datasets. The detailed information and analysis of these datasets is a useful resource in guiding insightful selection of datasets for future research. In addition, the issues with current algorithm evaluation vis-a-vis limitations of the available datasets and evaluation protocols are also highlighted, resulting in a number of recommendations for collection of new datasets and use of evaluation protocols. Highlights: A detailed review and in-depth analysis of 44 publicly available RGB-D-based action datasets. Recommendations on the selection of datasets and evaluation protocols for use in future research. Identification of some limitations of these datasets and evaluation protocols. Recommendations on future creation of datasets and use of evaluation protocols.
TL;DR: A comprehensive survey on the recent development and challenges of human detection in the thread of human object descriptors is provided, providing a thorough analysis of the state-of-the-art human detection methods and a guide to the selection of appropriate methods in practical applications.
Abstract: The problem of human detection is to automatically locate people in an image or video sequence and has been actively researched in the past decade. This paper aims to provide a comprehensive survey on the recent development and challenges of human detection. Different from previous surveys, this survey is organised in the thread of human object descriptors. This approach has advantages in providing a thorough analysis of the state-of-the-art human detection methods and a guide to the selection of appropriate methods in practical applications. In addition, challenges such as occlusion and real-time human detection are analysed. The commonly used evaluation of human detection methods such as the datasets, tools, and performance measures are presented and future research directions are highlighted. Highlights: A review on the state-of-the-art of human detection. This review is organised in the thread of human object descriptors. Challenges such as occlusion and real-time human detection are analysed. The commonly used datasets, tools, and performance measures are presented. Open issues and future research directions are highlighted. A guide to the selection of detection methods for applications is provided.
TL;DR: A novel graph-based index structure method is proposed that accelerates the neighbor search operations and is also scalable for high dimensional datasets.
Abstract: Density based clustering methods are proposed for clustering spatial databases with noise. Density Based Spatial Clustering of Applications with Noise (DBSCAN) can discover clusters of arbitrary shape and also handles outliers effectively. DBSCAN obtains clusters by finding the number of points within the specified distance from a given point. It involves computing distances from the given point to all other points in the dataset. The conventional index based methods construct a hierarchical structure over the dataset to speed up the neighbor search operations. The hierarchical index structures fail to scale for datasets of dimensionality above 20. In this paper, we propose a novel graph-based index structure method, Groups, that accelerates the neighbor search operations and is also scalable for high dimensional datasets. Experimental results show that the proposed method improves the speed of DBSCAN by a factor of about 1.5-2.2 on benchmark datasets. The performance of DBSCAN degrades considerably with noise due to unnecessary distance computations introduced by noise points, while the proposed method is robust to noise by pruning out noise points early and eliminating the unnecessary distance computations. The cluster results produced by our method are identical to those of DBSCAN but are obtained at a much faster pace. Highlights: A graph-based index structure is built for speeding up neighbor search operations. No additional inputs are required to build the index structure. The proposed method is scalable for high-dimensional datasets. Handles noise effectively to improve the performance of DBSCAN.
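For orientation, here is a minimal sketch of plain DBSCAN with a brute-force neighbor search. It is the textbook algorithm, not the paper's Groups index. The `region_query` function (a name chosen for this sketch) is the step whose all-points distance scan an index structure is meant to avoid.

```python
import math

NOISE = -1
UNVISITED = None

def region_query(points, i, eps):
    """Brute-force neighbor search: indices of all points within eps of points[i]."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

Every call to `region_query` here costs one distance computation per point in the dataset, including calls made for points that turn out to be noise, which matches the inefficiency the abstract describes.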
TL;DR: Results obtained show that oversampling concrete types of examples may lead to a significant improvement over standard multi-class preprocessing that does not consider the importance of example types.
Abstract: Canonical machine learning algorithms assume that the numbers of objects in the considered classes are roughly similar. However, in many real-life situations the distribution of examples is skewed since the examples of some of the classes appear much more frequently. This poses a difficulty to learning algorithms, as they will be biased towards the majority classes. In recent years many solutions have been proposed to tackle imbalanced classification, yet they mainly concentrate on binary scenarios. Multi-class imbalanced problems are far more difficult as the relationships between the classes are no longer straightforward. Additionally, one should analyze not only the imbalance ratio but also the characteristics of the objects within each class. In this paper we present a study on oversampling for multi-class imbalanced datasets that focuses on the analysis of the class characteristics. We detect subsets of specific examples in each class and fix the oversampling for each of them independently. Thus, we are able to use information about the class structure and boost the more difficult and important objects. We carry out an extensive experimental analysis, which is backed up with statistical analysis, in order to check when the preprocessing of some types of examples within a class may improve on the indiscriminate preprocessing of all the examples in all the classes. The results obtained show that oversampling concrete types of examples may lead to a significant improvement over standard multi-class preprocessing that does not consider the importance of example types. Highlights: A thorough analysis of oversampling for handling multi-class imbalanced datasets. Proposition to detect underlying structures and example types in considered classes. Smart oversampling based on extracted knowledge about imbalance distribution types. In-depth insight into the importance of selecting proper examples for oversampling. Guidelines for designing efficient classifiers for multi-class imbalanced data.
TL;DR: A new feature input space is proposed and an LBP-like descriptor that operates in the local line-geometry space is defined, thus proposing a new image descriptor, local line directional patterns (LLDP).
Abstract: Local binary patterns (LBP) are one of the most important image representations. However, LBPs have not been as successful as other methods in palmprint recognition. Originally, the LBP descriptor methods construct feature vectors in the image intensity space, using pixel intensity differences to encode a local representation of the image. Recently, similar feature descriptors have been proposed which operate in the gradient space instead of the image intensity space, such as local directional patterns (LDP) and local directional number patterns (LDN). In this paper, we propose a new feature input space and define an LBP-like descriptor that operates in the local line-geometry space, thus proposing a new image descriptor, local line directional patterns (LLDP). Moreover, the purpose of this work is to show that different implementations of LLDP descriptors perform competitively in palmprint recognition. We evaluate variations of LLDPs; e.g., the modified finite Radon transform (MFRAT) and the real part of Gabor filters are exploited to extract robust directional palmprint features. As is well known, palm lines are the essential features of a palmprint. We are able to show that the proposed LLDP descriptors are suitable for robust palmprint recognition. Finally, we present a thorough performance comparison among different LBP-like and LLDP image descriptors. Based on experimental results, the proposed feature encoding of LLDPs using directional indexing can achieve better recognition performance than that of bit strings in the Gabor-based implementation of LLDPs. We used four databases for performance comparisons: the Hong Kong Polytechnic University Palmprint Database II, the blue band of the Hong Kong Polytechnic University Multispectral Palmprint Database, the Cross-Sensor palmprint database, and the IIT Delhi touchless palmprint database. Overall, LLDP descriptors achieve a performance that is competitive with or better than other LBP descriptors. Highlights: We propose the LLDP descriptor, which uses the line feature to calculate the code. We show that the line direction index number is better than bit strings for coding. LLDP achieves the best recognition performance among all LBP-structure descriptors. LLDP achieves promising recognition performance on four palmprint databases.
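As background for the intensity-space baseline, the sketch below computes the classic 8-neighbor LBP code and a 256-bin histogram. It is not LLDP. The clockwise neighbor ordering and the `>=` comparison are conventions assumed for this illustration, and the function names are hypothetical.

```python
def lbp_code(image, r, c):
    """8-neighbor LBP code of pixel (r, c): one bit per neighbor that is >= the center."""
    # Neighbor offsets in clockwise order starting at the top-left pixel
    # (an ordering convention chosen for this sketch).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist

patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
print(lbp_code(patch, 1, 1))  # neighbors 9, 7 and 8 exceed the center value 6
```

The descriptors discussed in the abstract keep this encode-then-histogram structure but change the input space: gradient responses for LDP and LDN, and line responses for LLDP.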
TL;DR: A novel algorithm termed Weighted Multi-view Clustering with Feature Selection (WMCFS) that can simultaneously perform multi-view data clustering and feature selection and considers both view weighting and feature weighting.
Abstract: In recent years, combining multiple sources or views of datasets for data clustering has been a popular practice for improving clustering accuracy. As different views are different representations of the same set of instances, we can simultaneously use information from multiple views to improve the clustering results generated by the limited information from a single view. Previous studies mainly focus on the relationships between distinct data views, which yields some improvement over single-view clustering. However, in the case of high-dimensional data, where each view of data is of high dimensionality, feature selection is also a necessity for further improving the clustering results. To overcome this problem, this paper proposes a novel algorithm termed Weighted Multi-view Clustering with Feature Selection (WMCFS) that can simultaneously perform multi-view data clustering and feature selection. Two weighting schemes are designed that respectively weight the views of data points and feature representations in each view, such that the best view and the most representative feature space in each view can be selected for clustering. Experiments conducted on real-world datasets have validated the effectiveness of the proposed method. Highlights: This paper proposes a new multi-view data clustering algorithm. The new method considers both view weighting and feature weighting. An EM-like method is designed to get the local optimum solution. Extensive experiments have been conducted to show the effectiveness.
TL;DR: The credal classification captures well the uncertainty and imprecision of classification, and reduces effectively the rate of misclassifications thanks to the introduction of meta-classes.
Abstract: In the classification of incomplete patterns, the missing values can either play a crucial role in the class determination, or have only little influence (or eventually none) on the classification results, according to the context. We propose a credal classification method for incomplete patterns with adaptive imputation of missing values based on belief function theory. At first, we try to classify the object (incomplete pattern) based only on the available attribute values. As the underlying principle, we assume that the missing information is not crucial for the classification if a specific class for the object can be found using only the available information. In this case, the object is committed to this particular class. However, if the object cannot be classified without ambiguity, it means that the missing values play a main role in achieving an accurate classification. In this case, the missing values will be imputed based on the K-nearest neighbor (K-NN) and Self-Organizing Map (SOM) techniques, and the edited pattern with the imputation is then classified. The (original or edited) pattern is classified according to each training class, and the classification results represented by basic belief assignments are fused with proper combination rules for making the credal classification. The object is allowed to belong with different masses of belief to the specific classes and meta-classes (which are particular disjunctions of several single classes). The credal classification captures well the uncertainty and imprecision of classification, and effectively reduces the rate of misclassifications thanks to the introduction of meta-classes. The effectiveness of the proposed method with respect to other classical methods is demonstrated based on several experiments using artificial and real data sets.
Highlights: Missing values are adaptively imputed in classification according to context. SOM and K-NN are used for the imputation with admissible computation burden. Ensemble classifier is introduced for credal classification. The imprecision of classification can be well captured using belief functions. The proposed method has been tested by artificial and real data sets.
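To make the idea of fusing basic belief assignments (BBAs) concrete, the sketch below implements Dempster's rule, the classical combination rule of belief function theory. The abstract does not say which rules the method uses, so treating Dempster's rule as the combination step is an assumption of this example. A mass placed on a set such as {A, B} plays the role of a meta-class.

```python
def dempster_combine(m1, m2):
    """Combine two basic belief assignments with Dempster's rule.

    Each BBA maps a frozenset of class labels (a focal element) to its mass.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # product mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    # Renormalize so the remaining masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

A, B = frozenset({"A"}), frozenset({"B"})
AB = A | B  # meta-class: "A or B, cannot tell which"

m1 = {A: 0.6, AB: 0.4}
m2 = {B: 0.5, AB: 0.5}
fused = dempster_combine(m1, m2)
for focal, mass in sorted(fused.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(focal), round(mass, 4))
```

The mass left on {A, B} after fusion is what allows an ambiguous object to be assigned to a meta-class instead of being forced into a single, possibly wrong, class.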
TL;DR: This work proposes a novel lane detection method, whereby each lane has two boundaries, and demonstrates the outstanding performance of the method on the challenging dataset of road images compared with state-of-the-art lane-detection methods.
Abstract: With the increase in the number of vehicles, many intelligent systems have been developed to help drivers to drive safely. Lane detection is a crucial element of any driver assistance system. At present, researchers working on lane detection are confronted with several major challenges, such as attaining robustness to inconsistencies in lighting and background clutter. To address these issues in this work, we propose a method named Lane Detection with Two-stage Feature Extraction (LDTFE) to detect lanes, whereby each lane has two boundaries. To enhance robustness, we treat each lane boundary as a collection of small line segments. In our approach, we apply a modified HT (Hough Transform) to extract small line segments of the lane contour, which are then divided into clusters by using the DBSCAN (Density Based Spatial Clustering of Applications with Noise) clustering algorithm. Then, we can identify the lanes by curve fitting. The experimental results demonstrate that our modified HT works better for LDTFE than LSD (Line Segment Detector). Through extensive experiments, we demonstrate the outstanding performance of our method on the challenging dataset of road images compared with state-of-the-art lane-detection methods. Highlights: We proposed a novel lane detection method. Our method regards each lane boundary as a collection of small line segments. We proposed a modified Hough Transform to detect small line segments. Small line segments are clustered based on our proposed similarity measurement. Removing interferential clusters depends on the balance of small line segments.
TL;DR: It is demonstrated that initializing the weights of a convolutional neural network (CNN) classifier based on solutions generated by genetic algorithms (GA) minimizes the classification error.
Abstract: In this paper, an approach for human action recognition using genetic algorithms (GA) and deep convolutional neural networks (CNN) is proposed. We demonstrate that initializing the weights of a convolutional neural network (CNN) classifier based on solutions generated by genetic algorithms (GA) minimizes the classification error. A gradient descent algorithm is used to train the CNN classifiers (to find a local minimum) during fitness evaluations of GA chromosomes. The global search capabilities of genetic algorithms and the local search ability of the gradient descent algorithm are exploited to find a solution that is closer to the global optimum. We show that combining the evidence of classifiers generated using genetic algorithms helps to improve the performance. We demonstrate the efficacy of the proposed classification system for human action recognition on the UCF50 dataset. Highlights: An approach for human action recognition using genetic algorithms (GA) and deep convolutional neural networks (CNN) is proposed. The global and local search capabilities of genetic algorithms and gradient descent algorithms, respectively, are exploited by initializing the CNN classifier with the solutions generated by genetic algorithms and training the classifiers using the gradient descent algorithm for fitness evaluation of GA chromosomes. Also, the evolution of candidate solutions explored by the GA framework is examined. A near accurate recognition performance of 99.98
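The interplay of global and local search can be shown on a toy problem. In the sketch below a one-dimensional non-convex function stands in for the CNN training loss, a single number stands in for the weight vector, and the GA settings (population size, averaging crossover, Gaussian mutation) are all assumptions made for illustration; none of it is taken from the paper.

```python
import random

def loss(x):
    """Toy non-convex loss with two local minima (stand-in for a CNN's training loss)."""
    return x**4 - 3 * x**2 + x

def grad(x):
    """Derivative of the toy loss."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x, steps=100, lr=0.01):
    """Local search: refine one candidate starting point."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def ga_initialised_descent(pop_size=10, generations=5, seed=0):
    """Evolve starting points; fitness is the loss reached after gradient descent."""
    rng = random.Random(seed)
    population = [rng.uniform(-2.0, 2.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness evaluation: train each candidate locally, then rank by final loss.
        ranked = sorted(population, key=lambda x: loss(gradient_descent(x)))
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Averaging crossover plus a small Gaussian mutation.
            children.append(0.5 * (a + b) + rng.gauss(0.0, 0.1))
        population = parents + children
    best_start = min(population, key=lambda x: loss(gradient_descent(x)))
    return gradient_descent(best_start)

local_only = gradient_descent(1.5)   # gets stuck in the shallower minimum
hybrid = ga_initialised_descent()
print(loss(local_only), loss(hybrid))
```

Gradient descent started from 1.5 settles in the shallower basin near x = 1.13, whereas selecting starting points by their post-descent loss steers the search toward the deeper basin near x = -1.30.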
TL;DR: This paper proposed a novel double-orientation code (DOC) scheme to represent the orientation feature of palmprint and designed an effective nonlinear angular matching score to evaluate the similarity between the DOC.
Abstract: Many palmprint authentication approaches have been proposed in recent years. Among them, the orientation based coding approach, in which the dominant orientation features of palmprints are extracted and encoded into bitwise codes, is one of the most promising approaches. The distance between codes created from two palmprint images is calculated in the matching stage. Reliable orientation feature extraction and efficient matching are the two most crucial problems in orientation based coding approaches. However, conventional coding based approaches usually extract only one dominant orientation feature by adopting filters with discrete orientations, which is sensitive to noise and rotation. This paper proposed a novel double-orientation code (DOC) scheme to represent the orientation feature of palmprint and designed an effective nonlinear angular matching score to evaluate the similarity between the DOC. Extensive experiments performed on three types of palmprint databases demonstrate that the proposed approach has excellent performance in comparison with previously proposed state-of-the-art approaches. Highlights: Proposed a novel DOC based method for palmprint identification. A nonlinear matching scheme is used in the coding based method. Double orientations with top-two responses are more robust. DOC is reasonable and reliable for palmprint feature extraction. Three different types of databases are employed in experiments.
TL;DR: A novel method, named complete canonical correlation analysis (C3A), overcomes the shortcomings of CCA when dealing with high-dimensional matrices and naturally overcomes the singularity of the generalized eigenvalue problem in CCA.
Abstract: Canonical correlation analysis (CCA) is a well-known multivariate analysis method for quantifying the correlations between two sets of multidimensional variables. However, for multi-view gait recognition, it is difficult to directly apply CCA to deal with two sets of high-dimensional vectors because of computational complexity. Moreover, in such situations, the eigenmatrix of CCA is usually singular, which makes the direct implementation of the CCA algorithm almost impossible. In practice, PCA or singular value decomposition is employed as a preprocessing step to solve these problems. Nevertheless, this strategy may discard dimensions that contain important discriminative information and correlation information. To overcome the shortcomings of CCA when dealing with two sets of high-dimensional vectors, we develop a novel method, named complete canonical correlation analysis (C3A). In our method, we first reformulate the traditional CCA so that we can avoid computing the inverse of a high-dimensional matrix. With the help of this reformulation, C3A further transforms the singular generalized eigensystem computation of CCA into two stable eigenvalue decomposition problems. Moreover, a feasible and effective method is proposed to alleviate the computational burden of high dimensional matrices for typical gait image data. Experimental results on two benchmark gait databases, CASIA gait database and the challenge USF gait database, demonstrate the effectiveness of the proposed method. Highlights: We overcome the shortcomings of CCA when dealing with high-dimensional matrices. The singularity of the generalized eigenvalue problem in CCA is overcome naturally. The important discriminative information is preserved completely in our algorithm. Our scheme learns stable and complete solutions. The multi-view gait recognition is achieved based on our method.
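For reference, the textbook CCA objective and the generalized eigenvalue problem it leads to are written out below. This is the standard formulation, not the C3A reformulation. The notation is assumed here: C_xx and C_yy are the within-set covariance matrices, C_xy = C_yx^T is the cross-covariance, and w_x, w_y are the projection vectors.

```latex
\rho = \max_{w_x,\, w_y}
  \frac{w_x^{\top} C_{xy}\, w_y}
       {\sqrt{w_x^{\top} C_{xx}\, w_x}\;\sqrt{w_y^{\top} C_{yy}\, w_y}},
\qquad
C_{xy}\, C_{yy}^{-1}\, C_{yx}\, w_x = \rho^{2}\, C_{xx}\, w_x .
```

When the feature dimension exceeds the number of training samples, as is typical for vectorized gait images, C_xx and C_yy are rank-deficient. The inverse in the eigenproblem then does not exist, which is the singularity the abstract refers to.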
TL;DR: By adopting DropSample together with different types of domain-specific knowledge, the accuracy of HCCR can be improved efficiently; the use of domain-specific knowledge to enhance the performance of DCNN, by adding a domain knowledge layer before the traditional CNN, is also investigated.
Abstract: Inspired by the theory of Leitner's learning box from the field of psychology, we propose DropSample, a new method for training deep convolutional neural networks (DCNNs), and apply it to large-scale online handwritten Chinese character recognition (HCCR). According to the principle of DropSample, each training sample is associated with a quota function that is dynamically adjusted on the basis of the classification confidence given by the DCNN softmax output. After a learning iteration, samples with low confidence will have a higher frequency of being selected as training data; in contrast, well-trained and well-recognized samples with very high confidence will have a lower frequency of being involved in the ongoing training and can be gradually eliminated. As a result, the learning process becomes more efficient as it progresses. Furthermore, we investigate the use of domain-specific knowledge to enhance the performance of DCNN by adding a domain knowledge layer before the traditional CNN. By adopting DropSample together with different types of domain-specific knowledge, the accuracy of HCCR can be improved efficiently. Experiments on the CASIA-OLHWDB 1.0, CASIA-OLHWDB 1.1, and ICDAR 2013 online HCCR competition datasets yield outstanding recognition rates of 97.33%, 97.06%, and 97.51% respectively, all of which are significantly better than the previous best results reported in the literature.
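The selection mechanism can be sketched as confidence-driven weighted sampling. The abstract does not give the actual quota function, so the update rule below (halve the quota of very confident samples, reset it otherwise, never drop below a floor) and all thresholds are hypothetical choices made only to illustrate the idea.

```python
import random

def update_quota(quota, confidence, high=0.99, decay=0.5, floor=0.01):
    """Hypothetical quota update.

    Shrink the quota of confidently recognised samples and restore it to 1.0
    for everything else. The floor keeps every sample selectable, which is a
    simplification of the gradual elimination described in the abstract.
    """
    if confidence >= high:
        return max(floor, quota * decay)
    return 1.0

def select_batch(quotas, batch_size, rng):
    """Draw sample indices with probability proportional to their quota."""
    indices = list(range(len(quotas)))
    return rng.choices(indices, weights=quotas, k=batch_size)

rng = random.Random(0)
quotas = [1.0, 1.0, 1.0]
confidences = [0.999, 0.40, 0.995]  # pretend softmax confidences after one iteration
quotas = [update_quota(q, c) for q, c in zip(quotas, confidences)]
print(quotas)                       # the low-confidence sample keeps the largest quota
print(select_batch(quotas, batch_size=8, rng=rng))
```

Under this rule the low-confidence sample at index 1 is drawn about twice as often as either confident sample, so later iterations spend more of their budget on examples the network still gets wrong.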
TL;DR: The Histogram of Oriented Gradient is extended and two new feature descriptors are proposed: Co-occurrence HOG (Co-HOG) and Convolutional Co-HOG (ConvCo-HOG) for accurate recognition of scene texts of different languages.
Abstract: Automatic machine reading of texts in scenes is largely restricted by poor character recognition accuracy. In this paper, we extend the Histogram of Oriented Gradient (HOG) and propose two new feature descriptors: Co-occurrence HOG (Co-HOG) and Convolutional Co-HOG (ConvCo-HOG) for accurate recognition of scene texts of different languages. Compared with HOG, which counts the orientation frequency of each single pixel, Co-HOG encodes more spatial contextual information by capturing the co-occurrence of orientation pairs of neighboring pixels. Additionally, ConvCo-HOG exhaustively extracts Co-HOG features from every possible image patch within a character image for more spatial information. The two features have been evaluated extensively on five scene character datasets of three different languages including three sets in English, one set in Chinese and one set in Bengali. Experiments show that the proposed techniques provide superior scene character recognition accuracy and are capable of recognizing scene texts of different scripts and languages. Highlights: Introduced powerful features Co-HOG and ConvCo-HOG for scene character recognition. Designed a new offset based strategy for dimension reduction of the above features. Developed two new scene character datasets for Chinese and Bengali scripts. Extensive simulations on 5 datasets of 3 scripts show the efficiency of the approach.
TL;DR: The proposed no-reference metric achieves state-of-the-art performance for quality assessment of stereoscopic images, and is even competitive with existing full-reference quality metrics.
Abstract: In this paper, we propose to learn the structures of stereoscopic images with a convolutional neural network (CNN) for no-reference quality assessment. Taking image patches from the stereoscopic images as inputs, the proposed CNN can learn the local structures which are sensitive to human perception and representative for perceptual quality evaluation. By stacking multiple convolution and max-pooling layers together, the learned structures in lower convolution layers can be composed and convolved to higher levels to form a fixed-length representation. A multilayer perceptron (MLP) is further employed to summarize the learned representation to a final value that indicates the perceptual quality of the stereo image patch pair. With different inputs, two different CNNs are designed, namely a one-column CNN with only the image patch from the difference image as input, and a three-column CNN with the image patches from the left-view image, right-view image, and difference image as input. The CNN parameters for stereoscopic images are learned and transferred based on a large number of 2D natural images. With the evaluation on the public LIVE phase-I, LIVE phase-II, and IVC stereoscopic image databases, the proposed no-reference metric achieves state-of-the-art performance for quality assessment of stereoscopic images, and is even competitive with existing full-reference quality metrics.
Highlights: CNNs are employed to learn the local structures for stereoscopic image quality assessment. Two CNNs are designed to learn the image local structures based on different inputs. CNN parameters are pretrained on 2D images and transferred to stereoscopic images. The performances on public databases demonstrate the superiority of the proposed model.
TL;DR: A novel algorithm is proposed to directly extract fingers from salient hand edges and can not only extract extensional fingers but also flexional fingers with high accuracy and the orientation of the gesture can be calculated without the aid of arm direction.
Abstract: This paper presents a high-level hand feature extraction method for real-time gesture recognition. Firstly, the fingers are modelled as cylindrical objects due to their parallel edge feature. Then a novel algorithm is proposed to directly extract fingers from salient hand edges. Considering the hand geometrical characteristics, the hand posture is segmented and described based on the finger positions, palm center location and wrist position. A weighted radial projection algorithm with the origin at the wrist position is applied to localize each finger. The developed system can not only extract extensional fingers but also flexional fingers with high accuracy. Furthermore, hand rotation and finger angle variation have no effect on the algorithm performance. The orientation of the gesture can be calculated without the aid of arm direction and it would not be disturbed by the bare arm area. Experiments have been performed to demonstrate that the proposed method can directly extract high-level hand feature and estimate hand poses in real-time.
TL;DR: A novel anomaly detection framework which integrates motion and appearance cues to detect abnormal objects and behaviors in video, achieving comparable performance to other state-of-the-art anomaly detection techniques.
Abstract: In this paper, we present a novel anomaly detection framework which integrates motion and appearance cues to detect abnormal objects and behaviors in video. For motion anomaly detection, we employ statistical histograms to model the normal motion distributions and propose a notion of "cut-bin" in histograms to distinguish unusual motions. For appearance anomaly detection, we develop a novel scheme based on Support Vector Data Description (SVDD), which obtains a spherically shaped boundary around the normal objects to exclude abnormal objects. The two complementary cues are finally combined to achieve more comprehensive detection results. Experimental results show that the proposed approach can effectively locate abnormal objects in multiple public video scenarios, achieving comparable performance to other state-of-the-art anomaly detection techniques.
Highlights: An algorithm integrating motion and appearance cues for video anomaly detection. Motion model uses the "cut-bin" to detect abnormal motions. Appearance model uses a spherical boundary to exclude unusual objects. Integration of the two cues achieves higher detection rate and fewer false alarms.
TL;DR: This paper provides a multi-view dictionary low-rank regularization term to solve the noise problem, and designs a structural incoherence constraint for multi-View DL, such that redundancy among dictionaries of different views can be reduced.
Abstract: Recently, multi-view dictionary learning (DL) has received much attention. Although some multi-view DL methods have been presented, they suffer from performance degradation when large noise exists in multiple views. In this paper, we propose a novel multi-view DL approach named multi-view low-rank DL (MLDL) for image classification. Specifically, inspired by low-rank matrix recovery theory, we provide a multi-view dictionary low-rank regularization term to solve the noise problem. We further design a structural incoherence constraint for multi-view DL, such that redundancy among dictionaries of different views can be reduced. In addition, to enhance the efficiency of the classification procedure, we design a classification scheme for MLDL, which is based on the idea of collaborative representation based classification. We apply MLDL to face recognition, object classification and digit classification tasks. Experimental results demonstrate the effectiveness and efficiency of the proposed approach.
Highlights: We offer a multi-view low-rank dictionary learning method for image classification. Multi-view dictionary low-rank regularization term is designed to handle noise. Structural incoherence constraint is given to reduce redundancy in dictionaries. Multi-view collaborative representation based classification scheme is provided. Effectiveness and efficiency of our method are demonstrated on four datasets.
TL;DR: In this paper, the cross talk between central pain and opioid actions becomes clearer, as basic science elucidates mechanisms of pain and analgesia, and the use of opioids has become increasingly common for the treatment of persistent pain.
Abstract: Introduction: In the past 2 decades, opioids have been used increasingly for the treatment of persistent pain, and doses have tended to creep up. As basic science elucidates mechanisms of pain and analgesia, the cross talk between central pain and opioid actions becomes clearer. Objectives: We
TL;DR: The proposed FSFOA is validated on several real world datasets and it is compared with some other methods including HGAFS, PSO and SVM-FuzCoc, which shows improvement in classification accuracy of classifiers in some datasets.
Abstract: Feature selection, as a combinatorial optimization problem, is an important preprocessing step in data mining that improves the performance of learning algorithms by removing irrelevant and redundant features. Because evolutionary algorithms are reported to be suitable for optimization tasks, the Forest Optimization Algorithm (FOA), which was initially proposed for continuous search problems, is adapted here for feature selection as a discrete search space problem. As a result, Feature Selection using Forest Optimization Algorithm (FSFOA) is proposed in this article to select the more informative features from the datasets. The proposed FSFOA is validated on several real-world datasets and compared with other methods including HGAFS, PSO and SVM-FuzCoc. The experimental results show that FSFOA can improve the classification accuracy of classifiers on some of the selected datasets. We also compare the dimensionality reduction of the proposed FSFOA with other available methods.
Highlights: FOA is adjusted for solving Feature Selection (FS) as a discrete search problem. The proposed FSFOA is compared with GA, PSO and ACO based methods. We investigated the performance of FSFOA on 11 well-known datasets from UCI. Results show improvement in classification accuracy of classifiers in some datasets.
TL;DR: The condition under which density-based clustering algorithms fail in this scenario is identified and a density-ratio based method to overcome this weakness is proposed, and it is revealed that it can be implemented in two approaches.
Abstract: Density-based clustering algorithms are able to identify clusters of arbitrary shapes and sizes in a dataset which contains noise. It is well-known that most of these algorithms, which use a global density threshold, have difficulty identifying all clusters in a dataset having clusters of greatly varying densities. This paper identifies and analyses the condition under which density-based clustering algorithms fail in this scenario. It proposes a density-ratio based method to overcome this weakness, and reveals that it can be implemented in two approaches. One approach is to modify a density-based clustering algorithm to do density-ratio based clustering by using its density estimator to compute density-ratio. The other approach involves rescaling the given dataset only. An existing density-based clustering algorithm, applied to the rescaled dataset, can find all clusters with varying densities, which would otherwise be impossible had the same algorithm been applied to the unscaled dataset. We provide an empirical evaluation using DBSCAN, OPTICS and SNN to show the effectiveness of these two approaches.
Highlights: Analyse a key weakness of density-based clustering algorithms. Introduce two approaches based on density-ratio to overcome this weakness. ReCon converts an existing density estimator to a density-ratio estimator. ReScale transforms a dataset by an adaptive scaling based on density-ratio. ReCon and ReScale approaches improve three density-based clustering algorithms.
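The idea of turning a density estimator into a density-ratio estimator can be sketched in a few lines. The abstract does not state the estimator, so this sketch assumes a simple neighbour count within a small radius `eps`, divided by the count within a larger reference radius `eta`; the radii, the function names, and the toy data are all illustrative assumptions.

```python
import math


def neighbour_count(points, center, radius):
    """Number of points within `radius` of `center` (Euclidean distance)."""
    return sum(1 for p in points if math.dist(p, center) <= radius)


def density_ratio(points, center, eps, eta):
    """Ratio of the eps-neighbourhood count to the eta-neighbourhood count.

    Because the denominator measures the local surroundings, a point at
    the core of a sparse cluster can score as high as a point at the
    core of a dense cluster, which a single global density threshold
    cannot achieve.
    """
    if not eps < eta:
        raise ValueError("eps must be smaller than eta")
    reference = neighbour_count(points, center, eta)
    if reference == 0:
        return 0.0
    return neighbour_count(points, center, eps) / reference


if __name__ == "__main__":
    # A dense 1-D cluster near 0 and a sparse one near 100 (made-up data).
    dense = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
    sparse = [(100.0,), (110.0,), (120.0,)]
    points = dense + sparse
    print(density_ratio(points, (2.0,), eps=2.5, eta=30.0))
    print(density_ratio(points, (110.0,), eps=12.0, eta=30.0))
```

Thresholding this ratio instead of the raw count is one way to read the first approach in the abstract; the second approach, rescaling the data, is not shown here.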
TL;DR: To speed up the training procedure, an efficient successive overrelaxation (SOR) algorithm is developed for solving the involved quadratic programming problems (QPP) in MLTSVM.
Abstract: The multi-label learning paradigm, which aims at dealing with data associated with potentially multiple labels, has attracted a great deal of attention in the machine intelligence community. In this paper, we propose a novel multi-label twin support vector machine (MLTSVM) for multi-label classification. MLTSVM determines multiple nonparallel hyperplanes to capture the multi-label information embedded in data, which is a useful extension of the twin support vector machine (TWSVM) to multi-label classification. To speed up the training procedure, an efficient successive overrelaxation (SOR) algorithm is developed for solving the involved quadratic programming problems (QPPs) in MLTSVM. Extensive experimental results on both synthetic and real-world multi-label datasets confirm the feasibility and effectiveness of the proposed MLTSVM.
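To make the SOR step concrete, here is a minimal sketch of projected successive overrelaxation on a generic box-constrained QPP of the form min 0.5 * a^T Q a - sum(a) subject to 0 <= a_i <= C. The abstract does not give MLTSVM's exact QPPs, so this problem form, the function name `sor_box_qp`, and the default parameters are assumptions. The sketch expects a symmetric `Q` with a positive diagonal and a relaxation factor `omega` in (0, 2).

```python
def sor_box_qp(Q, C, omega=1.0, tol=1e-8, max_iter=1000):
    """Projected SOR for: min 0.5 * a^T Q a - sum(a), with 0 <= a_i <= C.

    Each coordinate is updated in turn using the latest values of the
    others (Gauss-Seidel style), scaled by omega, and then clipped back
    into the box [0, C].
    """
    n = len(Q)
    alpha = [0.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            # i-th component of the gradient Q @ alpha - 1
            grad_i = sum(Q[i][j] * alpha[j] for j in range(n)) - 1.0
            updated = alpha[i] - omega * grad_i / Q[i][i]
            updated = min(C, max(0.0, updated))
            max_change = max(max_change, abs(updated - alpha[i]))
            alpha[i] = updated
        if max_change < tol:
            break
    return alpha


if __name__ == "__main__":
    Q = [[2.0, 1.0], [1.0, 2.0]]
    print(sor_box_qp(Q, C=10.0))
```

The appeal of this kind of solver is that it touches one variable at a time and never forms or factorizes a large matrix, which is consistent with the abstract's motivation of speeding up training.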
TL;DR: The concept of super-object is introduced, which serves as a compact and adaptive representation for the ensemble data and significantly facilitates the computation and achieves the state-of-the-art performance in effectiveness and efficiency.
Abstract: In this paper, we propose a new ensemble clustering approach termed ensemble clustering using factor graph (ECFG). Compared to the existing approaches, our approach has three main advantages: (1) the cluster number is obtained automatically and need not be specified in advance; (2) the reliability of each base clustering can be estimated in an unsupervised manner and exploited in the consensus process; (3) our approach is efficient for processing ensembles with large data sizes and large ensemble sizes. In this paper, we introduce the concept of super-object, which serves as a compact and adaptive representation for the ensemble data and significantly facilitates the computation. Through the probabilistic formulation, we cast the ensemble clustering problem into a binary linear programming (BLP) problem. The BLP problem is NP-hard. To solve this optimization problem, we propose an efficient solver based on factor graph. The constrained objective function is represented as a factor graph, and max-product belief propagation is utilized to generate a solution that is insensitive to initialization and converges to the neighborhood maximum. Extensive experiments are conducted on multiple real-world datasets, which demonstrate the effectiveness and efficiency of our approach against the state-of-the-art approaches.
Highlights: Introduce the super-object representation to facilitate the consensus process. Probabilistically formulate the ensemble clustering problem into a BLP problem. Propose an efficient solver for the BLP problem based on factor graph. The cluster number of the consensus clustering is estimated automatically. Our method achieves the state-of-the-art performance in effectiveness and efficiency.
TL;DR: A novel method to automatically produce approximately axis-symmetrical virtual face images that is mathematically very tractable and quite easy to implement and verified in comparison with state-of-the-art dictionary learning algorithms.
Abstract: Though most faces are axis-symmetrical objects, few real-world face images are axis-symmetrical. In past years there have been many studies on face recognition, but little attention has been paid to this issue, and few studies have explored and exploited the axis-symmetrical property of faces for face recognition. In this paper, we take the axis-symmetrical nature of faces into consideration and design a framework to produce an approximately axis-symmetrical virtual dictionary for enhancing the accuracy of face recognition. It is noteworthy that the novel algorithm to produce axis-symmetrical virtual face images is mathematically very tractable and quite easy to implement. Extensive experimental results demonstrate the superiority in face recognition of the virtual face images obtained using our method over the original face images. Moreover, experimental results on different databases also show that the proposed method can achieve satisfactory classification accuracy in comparison with state-of-the-art image preprocessing algorithms. The MATLAB code of the proposed method is available at http://www.yongxu.org/lunwen.html.
Highlights: Developed a novel method to automatically produce approximately axis-symmetrical virtual face images. Treated as an effective image preprocessing method. Used as a virtual image dictionary learning method for image classification. Extensive experiments on different face databases show its effectiveness as an image preprocessing algorithm. The strong identification capability of our method is verified in comparison with state-of-the-art dictionary learning algorithms.
TL;DR: The proposed DisCNN achieves state-of-the-art performance on scene, video and document scripts, without requiring any preprocessing such as binarization, segmentation or hand-crafted features.
Abstract: Script identification facilitates many important applications in document/video analysis. This paper investigates a relatively new problem: identifying scripts in natural images. The basic idea is combining deep features and mid-level representations into a globally trainable deep model. Specifically, a set of deep feature maps is first extracted by a pre-trained CNN model from the input images, where the local deep features are densely collected. Then, discriminative clustering is performed to learn a set of discriminative patterns based on such local features. A mid-level representation is obtained by encoding the local features based on the learned discriminative patterns (codebook). Finally, the mid-level representations and the deep features are jointly optimized in a deep network. Benefiting from such a fine-grained classification strategy, the optimized deep model, termed Discriminative Convolutional Neural Network (DisCNN), is capable of effectively revealing the subtle differences among scripts that are difficult to distinguish, e.g. Chinese and Japanese. In addition, a large scale dataset containing 16,291 in-the-wild text images in 13 scripts, namely SIW-13, is created for evaluation. Our method is not limited to identifying text images, and performs effectively on video and document scripts as well, without requiring any preprocessing such as binarization, segmentation or hand-crafted features. The experimental comparisons on datasets including SIW-13, CVSI-2015 and Multi-Script consistently demonstrate that DisCNN is a state-of-the-art approach for script identification.
Highlights: We study a new and important topic: script identification in scene text images. The proposed DisCNN combines deep features and the mid-level representation. DisCNN learns special characteristics of scripts from training data automatically. DisCNN achieves state-of-the-art performances on scene, video and document scripts. A large-scale in-the-wild script identification dataset is proposed.
TL;DR: A blind system identification approach to the design of alignment-free cancelable fingerprint templates to protect the binary string's frequency samples by countering or dissatisfying the identifiability condition so that they cannot be recovered from the output complex vector (transformed template).
Abstract: Cancelable biometrics is an important biometric template protection scheme. With no a priori image pre-alignment, alignment-free cancelable fingerprint templates do not suffer from inaccurate singular point detection. In this paper, we develop a blind system identification approach to the design of alignment-free cancelable fingerprint templates. The binary string, derived from quantized pair-minutiae vectors, is to be secured and its frequency samples act as the input to the proposed algorithm. Motivated by the identifiability of the source signal in blind system identification, we propose to protect the binary string's frequency samples, which are treated as the source input, by countering or dissatisfying the identifiability condition so that they cannot be recovered from the output complex vector (transformed template). The proposed transform is irreversible because when the identifiability condition is not met, non-identification of the source input is theoretically guaranteed in blind system identification. Security, matching performance and resource requirement are the main factors in cancelable template design. With the size of the transform parameter key and transformed templates being moderate, the proposed method suits resource-limited applications, e.g., smartcards, driver license and mobile phones. Evaluation of the proposed method over FVC2002 DB1, DB2 and DB3 shows that the new method exhibits favorable performance compared to state-of-the-art alignment-free cancelable fingerprint templates. 
Highlights: Design of alignment-free cancelable fingerprint templates. Proposed cancelable templates satisfying the requirements of non-invertibility, revocability and diversity. Non-invertible transform to guarantee non-identification of the frequency samples of the binary string. Proposed method suitable for resource-limited applications, e.g., smartcards, driver license and mobile phones. Satisfactory matching performance compared with the existing alignment-free cancelable fingerprint templates.