Showing papers in "Pattern Recognition Letters in 2015"
TL;DR: The relevance of machine learning algorithms that can integrate contextual information into the modelling, as long short-term memory recurrent neural networks do, is investigated for automatically predicting emotion from several (asynchronous) raters in continuous time domains.
Abstract: We study the relevance of context-learning for handling asynchrony of annotation. We unite audiovisual and physiological data for continuous affect analysis. We propose multi-time-resolution feature extraction from multimodal data. The use of context-learning allows the reaction-time delay of raters to be included. Fusion of audiovisual and physiological data performs best on arousal and valence. Automatic emotion recognition systems based on supervised machine learning require reliable annotation of affective behaviours to build useful models. Whereas the dimensional approach is becoming more and more popular for rating affective behaviours in continuous time domains, e.g., arousal and valence, methodologies that take into account the reaction lags of the human raters are still rare. We therefore investigate the relevance of machine learning algorithms able to integrate contextual information into the modelling, as long short-term memory recurrent neural networks do, for automatically predicting emotion from several (asynchronous) raters in continuous time domains, i.e., arousal and valence. Evaluations are performed on the recently proposed RECOLA multimodal database (27 subjects, 5 min of data and six raters for each), which includes audio, video, and physiological (ECG, EDA) data. In fact, studies uniting audiovisual and physiological information are still very rare. Features are extracted with various window sizes for each modality, and performance for automatic emotion prediction is compared for both different neural network architectures and fusion approaches (feature-level/decision-level). The results show that: (i) LSTM networks can deal with the (asynchronous) dependencies found between continuous ratings of emotion with video data, (ii) prediction of emotional valence requires a longer analysis window than arousal, and (iii) decision-level fusion leads to better performance than feature-level fusion.
The best performance (concordance correlation coefficient) for the multimodal emotion prediction is 0.804 for arousal and 0.528 for valence.
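The concordance correlation coefficient used as the performance measure above is a standard agreement metric; a minimal sketch of its textbook definition (Lin, 1989), not the authors' implementation:

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient (Lin, 1989):
    rho_c = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()           # population (biased) variances
    cov = ((x - mx) * (y - my)).mean()  # population covariance
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)
```

Unlike Pearson correlation, CCC also penalizes differences in mean and scale between the prediction and the gold-standard rating trace, which is why it is favoured for continuous affect prediction.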
TL;DR: Experimental results demonstrate that the proposed approach is effective in recognizing leaves with varying texture, shape, size and orientations to an acceptable degree.
Abstract: This paper proposes a novel methodology of characterizing and recognizing plant leaves using a combination of texture and shape features. Texture of the leaf is modeled using a Gabor filter and the gray-level co-occurrence matrix (GLCM), while shape of the leaf is captured using a set of curvelet transform coefficients together with invariant moments. Since these features are in general sensitive to the orientation and scaling of the leaf image, a pre-processing stage prior to feature extraction is applied to make corrections for varying translation, rotation and scaling factors. Efficacy of the proposed methods is studied by using two neural classifiers: a neuro-fuzzy controller (NFC) and a feed-forward back-propagation multi-layered perceptron (MLP) to discriminate between 31 classes of leaves. The features have been applied individually as well as in combination to investigate how recognition accuracies can be improved. Experimental results demonstrate that the proposed approach is effective in recognizing leaves with varying texture, shape, size and orientations to an acceptable degree. A methodology for plant leaf recognition using shape and texture features is proposed. Features are made invariant to scaling and orientation of leaf images. Classification is done using two different types of neural classifiers. The system is tested using both known and unknown classes of leaf images. The system is also designed to handle images with small amounts of deformations.
TL;DR: A new dataset of iris images acquired by mobile devices can support researchers with regard to biometric dimensions of interest including uncontrolled settings, demographics, interoperability, and real-world applications.
Abstract: A new dataset of iris images acquired by mobile devices can support researchers. MICHE-I will assist with developing continuous authentication to counter spoofing. The dataset includes images from different mobile devices, sessions and conditions. We introduce and describe here MICHE-I, a new iris biometric dataset captured under uncontrolled settings using mobile devices. The key features of the MICHE-I dataset are a wide and diverse population of subjects, the use of different mobile devices for iris acquisition, realistic simulation of the acquisition process (including noise), several data capture sessions separated in time, and image annotation using metadata. The aim of the MICHE-I dataset is to make up the starting core of a wider dataset that we plan to collect, with the further aim to address interoperability, both in the sense of matching samples acquired with different devices and of assessing the robustness of algorithms to the use of devices with different characteristics. We discuss throughout the merits of MICHE-I with regard to biometric dimensions of interest, including uncontrolled settings, demographics, interoperability, and real-world applications. We also consider the potential for MICHE-I to assist with developing continuous authentication aimed to counter adversarial spoofing and impersonation, when the bar for uncontrolled settings rises even higher for proper and effective defensive measures.
TL;DR: Empirically evaluate several major information metrics, namely, Hartley entropy, Shannon entropy, Renyi’s entropy, generalized entropy, Kullback–Leibler divergence and generalized information distance measure in their ability to detect both low-rate and high-rate DDoS attacks.
Abstract: Distributed Denial of Service (DDoS) attacks represent a major threat to uninterrupted and efficient Internet service. In this paper, we empirically evaluate several major information metrics, namely, Hartley entropy, Shannon entropy, Renyi’s entropy, generalized entropy, Kullback–Leibler divergence and generalized information distance measure in their ability to detect both low-rate and high-rate DDoS attacks. These metrics can be used to describe characteristics of network traffic data and an appropriate metric facilitates building an effective model to detect both low-rate and high-rate DDoS attacks. We use MIT Lincoln Laboratory, CAIDA and TUIDS DDoS datasets to illustrate the efficiency and effectiveness of each metric for DDoS detection.
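The first few of the information metrics compared above have simple closed forms; a minimal sketch of the textbook definitions of Shannon entropy, Renyi entropy (order 0 gives Hartley entropy) and Kullback-Leibler divergence over a discrete distribution (the paper's "generalized" variants are not reproduced):

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in bits: H(p) = -sum p_i log2 p_i."""
    p = np.asarray(p, float)
    p = p[p > 0]                      # 0*log 0 is taken as 0
    return -np.sum(p * np.log2(p))

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha >= 0, alpha != 1);
    alpha -> 1 recovers Shannon entropy, alpha = 0 gives Hartley entropy."""
    p = np.asarray(p, float)
    p = p[p > 0]
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), assuming q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))
```

In the DDoS-detection setting, `p` would be a normalized histogram of a traffic attribute (e.g., source addresses per time window); a sudden drop in entropy or a jump in divergence from a baseline distribution signals an anomaly.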
TL;DR: A new segmentation scheme is proposed and adapted to smartphone-based visible iris images, approximating the radius of the iris to achieve robust segmentation, and a new feature extraction method based on deep sparse filtering is proposed to obtain robust features for unconstrained iris images.
Abstract: Good biometric performance of iris recognition motivates it to be used for many large-scale security and access control applications. Recent works have identified visible spectrum iris recognition as a viable option with considerable performance. Key advantages of visible spectrum iris recognition include the possibility of iris imaging in on-the-move and at-a-distance scenarios, as compared to fixed-range imaging in near-infrared light. Unconstrained iris imaging captures images with largely varying radii of the iris and pupil. In this work, we propose a new segmentation scheme and adapt it to smartphone-based visible iris images, approximating the radius of the iris to achieve robust segmentation. The proposed technique has shown improved segmentation accuracy of up to 85% with standard OSIRIS v4.1. This work also proposes a new feature extraction method based on deep sparse filtering to obtain robust features for unconstrained iris images. To evaluate the proposed segmentation scheme and feature extraction scheme, we employ a publicly available database and also compose a new iris image database. The newly composed iris image database (VSSIRIS) is acquired using two different smartphones - iPhone 5S and Nokia Lumia 1020 - under mixed illumination with unconstrained conditions in visible spectrum. The biometric performance is benchmarked based on the equal error rate (EER) obtained from various state-of-the-art schemes and the proposed feature extraction scheme. An impressive EER of 1.62% is obtained on our VSSIRIS database, and an average gain of around 2% in EER is obtained on the public database as compared to the well-known state-of-the-art schemes.
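The equal error rate used for benchmarking above is the operating point where the false accept and false reject rates coincide; a minimal brute-force sketch, assuming the convention that higher scores mean better matches:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate EER: sweep a threshold over all observed scores and
    return the point where false accept rate ~ false reject rate."""
    genuine = np.asarray(genuine, float)
    impostor = np.asarray(impostor, float)
    best_gap, eer = 2.0, None
    for t in np.unique(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuine users wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```

Production evaluations typically interpolate the DET curve rather than sweep raw scores, but the sketch conveys the definition behind figures such as the 1.62% EER quoted above.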
TL;DR: The method is based on the bag of words approach, adapted to deal with the specific issues of audio surveillance: the need to recognize both short and long sounds, the presence of a significant noise level and of superimposed background sounds of intensity comparable to the audio events to be detected.
Abstract: The authors propose an audio events detection system tailored to surveillance applications. The method has been tested on a huge and challenging data set, made publicly available. The performance analysis has been done for low SNR values and under various conditions. A comparative analysis with other methods from the literature has been performed. In this paper we propose a novel method for the detection of audio events for surveillance applications. The method is based on the bag of words approach, adapted to deal with the specific issues of audio surveillance: the need to recognize both short and long sounds, the presence of a significant noise level and of superimposed background sounds of intensity comparable to the audio events to be detected. In order to test the proposed method in complex, realistic scenarios, we have built a large, publicly available dataset of audio events. The dataset has allowed us to evaluate the robustness of our method with respect to varying levels of the Signal-to-Noise Ratio; the experimentation has confirmed its applicability under real world conditions, and has shown a significant performance improvement with respect to other methods from the literature.
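At its core, the bag-of-words representation referred to above assigns each frame-level feature vector to its nearest codeword and histograms the assignments over a clip; a generic sketch (codebook learning, e.g. by k-means, and the paper's audio-specific adaptations are omitted):

```python
import numpy as np

def bow_histogram(frames, codebook):
    """Quantize per-frame feature vectors to their nearest codeword and
    return a normalized bag-of-words histogram for the whole clip.
    `frames`: (n_frames, dim); `codebook`: (n_words, dim)."""
    frames = np.asarray(frames, float)
    codebook = np.asarray(codebook, float)
    # squared Euclidean distance of every frame to every codeword
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The resulting fixed-length histogram can then be fed to any standard classifier, which is what makes the representation attractive for sounds of very different durations.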
TL;DR: In this article, a constructivist view of reality and science is sketched, various desirable characteristics of clusterings and various approaches to defining a context-dependent truth are listed, and the impact these ideas can have on the comparison of clustering methods and on the choice of a clustering method and related decisions in practice is discussed.
Abstract: A constructivist view of reality and science is sketched. Context- and aim-dependent characteristics of clusterings are listed. Formal approaches to define true clusters are presented. Researchers need to communicate their cluster concept transparently. Comparisons should show how different methods are good for different aims. Constructivist philosophy and Hasok Chang's active scientific realism are used to argue that the idea of "truth" in cluster analysis depends on the context and the clustering aims. Different characteristics of clusterings are required in different situations. Researchers should be explicit about what requirements and what idea of "true clusters" their research is based on, because clustering becomes scientific not through uniqueness but through transparent and open communication. The idea of "natural kinds" is a human construct, but it highlights the human experience that the reality outside the observer's control seems to make certain distinctions between categories inevitable. Various desirable characteristics of clusterings and various approaches to define a context-dependent truth are listed, and I discuss what impact these ideas can have on the comparison of clustering methods and the choice of a clustering method and related decisions in practice.
TL;DR: The connection of the kernel versions of the ELM classifier with infinite Single-hidden Layer Feedforward Neural networks is discussed and it is shown that the original ELM kernel definition can be adopted for the calculation of theELM kernel matrix for two of the most common activation functions.
Abstract: In this paper, we discuss the connection of the kernel versions of the ELM classifier with infinite Single-hidden Layer Feedforward Neural networks and show that the original ELM kernel definition can be adopted for the calculation of the ELM kernel matrix for two of the most common activation functions, i.e., the RBF and the sigmoid functions. In addition, we show that a low-rank decomposition of the kernel matrix defined on the input training data can be exploited in order to determine an appropriate ELM space for input data mapping. The ELM space determined from this process can be subsequently used for network training using the original ELM formulation. Experimental results indicate that the adoption of the low-rank decomposition-based ELM space determination leads to enhanced performance, compared to the standard choice, i.e., random input weight generation.
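For reference, the kernel ELM solution that the paper builds on has a closed form (Huang et al., 2012): with kernel matrix K on the training data and target matrix T, the output weights are beta = (I/C + K)^{-1} T. A minimal numpy sketch with an RBF kernel; the hyper-parameters C and gamma are illustrative, not the paper's settings:

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d)

def kernel_elm_train(X, T, C=100.0, gamma=1.0):
    """Kernel ELM output weights: beta = (I/C + K)^{-1} T."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(np.eye(len(X)) / C + K, T)

def kernel_elm_predict(Xtest, Xtrain, beta, gamma=1.0):
    """Network output for test points: K(Xtest, Xtrain) @ beta."""
    return rbf_kernel(Xtest, Xtrain, gamma) @ beta
```

With one-hot targets, the predicted class is the argmax of the output row; the regularization term I/C keeps the system well conditioned.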
TL;DR: In this paper, a multi-level thresholding method for unsupervised separation between objects and background from a natural color image using the concept of the minimum cross entropy (MCE) is proposed.
Abstract: We propose a novel multi-level thresholding method for unsupervised separation between objects and background in a natural color image using the concept of minimum cross entropy (MCE). MCE-based thresholding techniques are widely popular for segmenting grayscale images. Color image segmentation is still a challenging field, as it involves a 3-D histogram, unlike the 1-D histogram of grayscale images. The effectiveness of entropy-based multi-level thresholding for color images is yet to be explored, and this paper presents a humble contribution in this context. We have used differential evolution (DE), a simple yet efficient evolutionary algorithm of current interest, to improve the computation time and robustness of the proposed algorithm. The performance of DE is also investigated extensively through comparison with other well-known nature-inspired global optimization techniques like the genetic algorithm (GA), particle swarm optimization (PSO), and artificial bee colony (ABC). The proposed method is evaluated by comparing it with seven other prominent algorithms, both qualitatively and quantitatively, using a well-known benchmark suite – the Berkeley Segmentation Dataset (BSDS300) – with 300 distinct images. Such a comparison reflects the efficiency of our algorithm.
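The minimum cross entropy criterion underlying the method picks the threshold minimizing the cross entropy between the image and its two-level reconstruction by the class means (Li and Lee, 1993); a brute-force single-threshold grayscale sketch (the paper's 3-D color histogram and DE speed-up are not reproduced):

```python
import numpy as np

def mce_threshold(image):
    """Brute-force minimum cross entropy threshold for grayscale data:
    choose t minimizing sum g*log(g/mu1) over pixels below t plus
    sum g*log(g/mu2) over pixels at or above t, where mu1, mu2 are
    the class means. Log terms assume positive gray levels."""
    g = np.asarray(image, float).ravel()
    g = g[g > 0]
    best_t, best_d = None, np.inf
    for t in np.unique(g)[1:]:        # candidate thresholds
        lo, hi = g[g < t], g[g >= t]
        d = (np.sum(lo * np.log(lo / lo.mean()))
             + np.sum(hi * np.log(hi / hi.mean())))
        if d < best_d:
            best_t, best_d = t, d
    return best_t
```

Exhaustive search like this is cheap for one threshold on a 1-D histogram; it is the multi-level, 3-D color case that motivates the paper's use of differential evolution.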
TL;DR: By choosing a proper preprocessing method, fine-tuned by HDBPSO with Hamming distance as a proximity measure, it is possible to find important feature subsets in gene expression data with better and competitive performance.
Abstract: Gene expression data typically contain fewer samples (as each experiment is costly) and thousands of expression values (or features) captured by automatic robotic devices. Feature selection is one of the important and challenging tasks for this kind of data, where many traditional methods have failed and evolutionary-based methods have succeeded. In this study, the initial datasets are preprocessed using a quartile-based fast heuristic technique to reduce the crude domain features which are less relevant in categorizing the samples of either group. Hamming distance is introduced as a proximity measure to update the velocity of particle(s) in a binary PSO framework to select the important feature subsets. The experimental results on three benchmark datasets, namely colon cancer, diffuse large B-cell lymphoma and leukemia data, are evaluated by means of classification accuracies and validity indices as well. Detailed comparative studies are also made to show the superiority and effectiveness of the proposed method. The present study clearly reveals that by choosing a proper preprocessing method, fine-tuned by HDBPSO with Hamming distance as a proximity measure, it is possible to find important feature subsets in gene expression data with better and competitive performance.
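For context, standard binary PSO (Kennedy and Eberhart, 1997) flips bits through a sigmoid transfer function of the velocity; a minimal sketch of one update step, together with the Hamming distance the paper adopts as its proximity measure (the paper's exact Hamming-based velocity update is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def hamming(a, b):
    """Hamming distance between two binary feature masks."""
    a, b = np.asarray(a), np.asarray(b)
    return int(np.sum(a != b))

def bpso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One velocity/position update of standard binary PSO:
    velocities are real-valued, positions are resampled bitwise
    with probability sigmoid(v)."""
    r1, r2 = rng.random(len(x)), rng.random(len(x))
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    prob = 1.0 / (1.0 + np.exp(-v))          # sigmoid transfer function
    x = (rng.random(len(x)) < prob).astype(int)
    return x, v
```

In the feature-selection setting each bit of `x` marks whether a gene is kept, and the fitness of a particle is the classification accuracy obtained with the selected subset.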
TL;DR: This work proposes a modification of a framework originally designed for the task of action recognition and applies it to gait recognition, which allows us to achieve complex representations of gait sequences and thus express efficiently the dynamic characteristics of human walking sequences.
Abstract: Dynamic characteristics of gait are utilized for identity and gender recognition. A new publicly available dataset for gait recognition is presented. Our algorithm can operate extremely well with a small training sample size. The proposed framework follows a biologically inspired human motion analysis. The hierarchy of feature representations results in a high-level description. Gait analysis has gained new impetus over the past few years. This is mostly due to the launch of low-cost depth cameras accompanied with real-time pose estimation algorithms. In this work we focus on the problem of human gait recognition. In particular, we propose a modification of a framework originally designed for the task of action recognition and apply it to gait recognition. The new scheme allows us to achieve complex representations of gait sequences and thus express efficiently the dynamic characteristics of human walking sequences. The representational power of the suggested model is evaluated on a publicly available dataset where we achieved up to 93.29% identification rate, 3.1% EER on the verification task and 99.11% gender recognition rate.
TL;DR: The achievements that have been made in recognition by and in estimation of these parameters are surveyed, describing how these approaches can be used and where they might lead to.
Abstract: Innovation has formed much of the rich history in biometrics. The field of soft biometrics originally aimed to augment the recognition process by fusion of metrics that were sufficient to discriminate populations rather than individuals. This was later refined to use measures that could discriminate individuals, especially using descriptions that can be perceived using human vision and in surveillance imagery. A further branch of this new field concerns approaches to estimate soft biometrics, either using conventional biometric approaches or from images alone. These three strands combine to form what is now known as soft biometrics. We survey the achievements that have been made in recognition by, and in estimation of, these parameters, describing how these approaches can be used and where they might lead. The approaches lead to a new type of recognition, one similar to Bertillonage, which is one of the earliest approaches to human identification.
TL;DR: This paper proposes a new feature descriptor named Local Directional Texture Pattern (LDTP) that is versatile, as it allows us to distinguish a person's expressions and different landscape scenes, and uses Principal Component Analysis to reduce the dimension of the multilevel feature set.
Abstract: We created a micro-pattern that models the information of the principal directions. We discriminate and select the prominent information in each neighborhood. We model texture and structure simultaneously. We created a code that is versatile and works on large and small textures. Deriving an effective image representation is a critical step for a successful automatic image recognition application. In this paper, we propose a new feature descriptor named Local Directional Texture Pattern (LDTP) that is versatile, as it allows us to distinguish a person's expressions and different landscape scenes. In detail, we compute the LDTP feature, at each pixel, by extracting the principal directions of the local neighborhood, and coding the intensity differences on these directions. Consequently, we represent each image as a distribution of LDTP codes. The mixture of structural and contrast information makes our descriptor robust against illumination changes and noise. We also use Principal Component Analysis to reduce the dimension of the multilevel feature set, and test the results on this new descriptor as well.
TL;DR: A new feature selection and weighting method aided by the decomposition-based evolutionary multi-objective algorithm MOEA/D is presented and tested on several practical datasets from well-known data repositories like UCI and LIBSVM to demonstrate the superiority of the proposed algorithm.
Abstract: Selection of a feature subset is a preprocessing step in computational learning, and it serves several purposes, like reducing the dimensionality of a dataset, decreasing the computational time required for classification and enhancing the classification accuracy of a classifier by removing redundant and misleading or erroneous features. This paper presents a new feature selection and weighting method aided by the decomposition-based evolutionary multi-objective algorithm called MOEA/D. The feature vectors are selected and weighted or scaled simultaneously to project the data points into a hyperspace where the distance between data points of non-identical classes is increased, thus making them easier to classify. The inter-class and intra-class distances are simultaneously optimized by using MOEA/D to obtain the optimal features and the scaling factor associated with them. Finally, k-NN (k-Nearest Neighbor) is used to classify the data points having the reduced and weighted feature set. The proposed algorithm is tested with several practical datasets from well-known data repositories like UCI and LIBSVM. The results are compared with those obtained with the state-of-the-art algorithms to demonstrate the superiority of the proposed algorithm. Presents a simultaneous feature selection and weighting method. Use of a penalty to reduce the number of selected features. Use of the very competitive MOEA/D as a core optimizer. Best compromise solution to obtain the best feature selection and weighting vector. Evaluation on UCI and LIBSVM datasets.
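The selection-and-scaling idea above can be illustrated with the classification step alone: each dimension is multiplied by its learned weight before distances are computed, so a zero weight effectively deselects the feature. A sketch with hand-set weights standing in for the MOEA/D output:

```python
import numpy as np

def weighted_knn_predict(Xtrain, ytrain, x, weights, k=3):
    """Classify x by majority vote among its k nearest training points
    in a feature space scaled per dimension by `weights`."""
    Xtrain = np.asarray(Xtrain, float)
    weights = np.asarray(weights, float)
    Xs = Xtrain * weights                       # scale every feature
    xs = np.asarray(x, float) * weights
    d = np.linalg.norm(Xs - xs, axis=1)         # Euclidean distances
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(np.asarray(ytrain)[nearest],
                               return_counts=True)
    return labels[counts.argmax()]
```

With a weight of zero on a noisy dimension, the classifier behaves as if that feature had been removed, which is exactly the coupling between selection and weighting the paper exploits.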
TL;DR: A simple yet effective fusion of descriptors based on texture and local appearance, and a deep learning scheme for accurate age estimation, which together demonstrate state-of-the-art results over previous work.
Abstract: Two novel methods for age estimation, using simple alignment unlike previous works. Fusing local texture/appearance descriptors improves over complex features like BIF. We propose a deep learning scheme to improve current state-of-the-art. Exhaustive validation over large databases, outperforming previous results in the field. The automatic estimation of age from face images is increasingly gaining attention, as it facilitates applications including advanced video surveillance, demographic statistics collection, customer profiling, or search optimization in large databases. Nevertheless, it becomes challenging to estimate age from uncontrollable environments, with insufficient and incomplete training data, dealing with strong person-specificity and high within-range variance. These difficulties have been recently addressed with complex and strongly hand-crafted descriptors, difficult to replicate and compare. This paper presents two novel approaches: first, a simple yet effective fusion of descriptors based on texture and local appearance; and second, a deep learning scheme for accurate age estimation. These methods have been evaluated under a diversity of settings, and the extensive experiments carried out on two large databases (MORPH and FRGC) demonstrate state-of-the-art results over previous work.
TL;DR: The application of well-known iris and periocular recognition strategies based on classical encoding and matching techniques is proposed, and it is demonstrated how they can be combined to overcome the issues associated with mobile environments.
Abstract: Announcement of an iris and periocular dataset, with 10 different mobile setups. Mobile biometric recognition approach based on iris and periocular information. Improvements from a sensor-specific color calibration technique are reported. Biometric recognition feasibility over mobile cross-sensor setups is shown. Preferable mobile setups are pointed out. In recent years, the usage of mobile devices has increased substantially, as have their capabilities and applications. Extending biometric technologies to these gadgets is desirable because it would facilitate biometric recognition almost anytime, anywhere, and by anyone. The present study focuses on biometric recognition in mobile environments using iris and periocular information as the main traits. Our study makes three main contributions, as follows. (1) We demonstrate the utility of an iris and periocular dataset, which contains images acquired with 10 different mobile setups and the corresponding iris segmentation data. This dataset allows us to evaluate iris segmentation and recognition methods, as well as periocular recognition techniques. (2) We report the outcomes of device-specific calibration techniques that compensate for the different color perceptions inherent in each setup. (3) We propose the application of well-known iris and periocular recognition strategies based on classical encoding and matching techniques, as well as demonstrating how they can be combined to overcome the issues associated with mobile environments.
TL;DR: An image retrieval framework that uses affine image moment invariants as descriptors of local image areas is presented; the retrieval results are promising compared with other widely used local descriptors, allowing the proposed framework to serve as a reference point for future image moment local descriptors applied to the general task of content-based image retrieval.
Abstract: A new image descriptor specifically designed for image retrieval tasks is introduced. Evaluation of affine moment invariants in the area of image retrieval. The usage of image chromaticities improves the overall retrieval performance. This paper presents an image retrieval framework that uses affine image moment invariants as descriptors of local image areas. Detailed feature vectors are generated by feeding the produced moments into a Bag-of-Visual-Words representation. Image moment invariants have been selected for their compact representation of image areas as well as due to their ability to remain unchanged under affine image transformations. Three different setups were examined in order to evaluate and discuss the overall approach. The retrieval results are promising compared with other widely used local descriptors, allowing the proposed framework to serve as a reference point for future image moment local descriptors applied to the general task of content-based image retrieval.
TL;DR: A novel approach to automatic detection and tracking of people taking different poses in cluttered and dynamic environments using a single RGB-D camera and a single-pass, progressive refinement framework enables the system to achieve high accuracy at real time.
Abstract: A PEI representation is proposed to alleviate overlapping in the original image domain while preserving information of all pixels. A human plausible candidates locating technique is proposed to quickly reduce the search space of the detector. Two novel features are proposed to characterize human shape and appearance in 3D space. A single-pass, progressive refinement framework enables the system to achieve high accuracy at real time. We propose a novel approach to automatic detection and tracking of people taking different poses in cluttered and dynamic environments using a single RGB-D camera. The original RGB-D pixels are transformed to a novel point ensemble image (PEI), and we demonstrate that human detection and tracking in 3D space can be performed very effectively with this new representation. The detector in the first phase quickly locates physique-wise plausible human candidates, which are then further carefully filtered in a supervised learning and classification second phase. Joint statistics of color and height are computed for data association to generate final 3D motion trajectories of tracked individuals. Qualitative and quantitative experimental results obtained on the publicly available office dataset, mobile camera dataset and the real-world clothing store dataset we created show very promising results.
TL;DR: The image classification experiments show that the simultaneous use of the proposed novel representations and original images can obtain a much higher accuracy than the use of only the original images.
Abstract: To extract salient features from images is significant for image classification. Deformable objects suffer from the problem that a number of pixels may have varying intensities. In other words, pixels at the same positions of training samples and test samples of an object usually have different intensities, which makes it difficult to obtain salient features of images of deformable objects. In this paper, we propose a novel method to address this issue. Our method first produces a new representation of original images that enhances pixels with moderate intensities of the original images and reduces the importance of other pixels. The new representation and original image of the object are complementary in representing the object, so the integration of them is able to improve the accuracy of image classification. The image classification experiments show that the simultaneous use of the proposed novel representations and original images can obtain a much higher accuracy than the use of only the original images. In particular, the incorporation of sparse representation with the proposed method can bring surprising improvement in accuracy. The maximum improvement in the accuracy may be greater than 8%. Moreover, the proposed non-parametric weighted fusion procedure is also attractive. The code of the proposed method is available at http://www.yongxu.org/lunwen.html.
TL;DR: New research questions via new means for socio-behavioral and emotional investigations are raised, and the gathering of new experimental data and theories across a spectrum of research concepts is suggested, in order to develop new psychological and computational approaches crucial for implementing believable and trustable HCI systems which exploit synthetic agents, robots, and sophisticated humanlike interfaces.
Abstract: Emotional expressions. Multimodal communication. Needs and challenges in emotionally believable ICT interfaces. Demand for and delivery so far of sophisticated computational instruments able to recognize, process and store relevant interactional signals, as well as interact with people, displaying suitable autonomous reactions appropriately sensitive to environmental changes, have produced great expectations in Information Communication Technology (ICT). Knowing what an appropriate continuation of an interaction is depends on detecting the addresser's register, and a machine interface unable to assess differences will have difficulty managing interactions. Progress toward understanding and modeling such facets is crucial for implementing behaving Human Computer Interaction (HCI) systems that will simplify user access to future, profitable, remote and nearby social services. This paper raises new research questions via new means for socio-behavioral and emotional investigations, and suggests the gathering of new experimental data and theories across a spectrum of research concepts, in order to develop new psychological and computational approaches crucial for implementing believable and trustable HCI systems which exploit synthetic agents, robots, and sophisticated humanlike interfaces.
TL;DR: The numerical results on some real datasets show that the proposed method outperforms the compared feature weighting methods, and is very competitive compared with other commonly used classifiers such as SVM.
Abstract: The global selection index can be determined from the local selection indexes. The local selection index can be calculated in its own dimension. The prediction function can be factorized. The NB models can be selectively pruned by thresholding the LSIs. Feature selection and weighting work hand-in-hand to improve classification. Feature subset selection is known to improve the text classification performance of various classifiers. The model using the selected features is often regarded as if it had generated the data. By taking its uncertainty into account, the discrimination capabilities can be measured by a global selection index (GSI), which can be used in the prediction function. In this paper, we propose a latent selection augmented naive (LSAN) Bayes classifier. By introducing a latent feature selection indicator, the GSI can be factorized into each local selection index (LSI). Using conjugate priors, the LSI for feature evaluation can be explicitly calculated. Then the feature subset selection models can be pruned by thresholding the LSIs, and the LSAN classifier can be achieved by the product of a small percentage of single-feature model averages. The numerical results on some real datasets show that the proposed method outperforms the compared feature-weighting methods, and is very competitive with some other commonly used classifiers such as SVM.
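The prune-by-thresholding step can be illustrated with a minimal sketch. The paper derives its local selection indexes from conjugate priors; the stand-in score below (class-conditional mean separation) and the threshold value are assumptions chosen only to show the thresholding mechanic.

```python
import numpy as np

def local_selection_index(X, y):
    # Stand-in per-feature score: separation of class-conditional means
    # in units of the feature's standard deviation. The paper's LSIs are
    # computed differently (from conjugate priors); this only mimics the
    # "one score per feature, computed in its own dimension" structure.
    scores = []
    for j in range(X.shape[1]):
        m0 = X[y == 0, j].mean()
        m1 = X[y == 1, j].mean()
        s = X[:, j].std() + 1e-9
        scores.append(abs(m0 - m1) / s)
    return np.array(scores)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
# Only feature 0 is informative for the label.
y = (X[:, 0] + 0.1 * rng.normal(size=100) > 0).astype(int)

lsi = local_selection_index(X, y)
selected = np.where(lsi > 0.5)[0]   # prune features by thresholding the LSIs
```

A classifier would then be built from the product of the surviving single-feature models, as the abstract describes.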
TL;DR: A simple and effective method to generate a classifier of face images, by training a linear classification algorithm on a massive dataset entirely assembled and labelled by automated means, and proposes a general way to generate and exploit massive data without human annotation.
Abstract: We automatically assemble a big dataset to train a face gender classifier. This is formed by 4 million images and over 60,000 features. The resulting system significantly outperforms the previous state of the art without human annotation. This study lends support to the "unreasonable effectiveness of data" conjecture. This study is relevant to computer vision (LBP features, face classification), machine learning (large-scale linear classifiers), and big data. This study can serve as a template for other "web scale" learning tasks. The application of learning algorithms to big datasets has long been identified as an effective way to attack important tasks in pattern recognition, but the generation of large annotated datasets has a significant cost. We present a simple and effective method to generate a classifier of face images by training a linear classification algorithm on a massive dataset entirely assembled and labelled by automated means. In doing so, we perform the largest experiment on face gender recognition so far published, reporting the highest performance yet. Four million images and more than 60,000 features are used to train online classifiers. By using an ensemble of linear classifiers, we achieve an accuracy of 96.86% on the most challenging public database, Labeled Faces in the Wild (LFW), 2.05% higher than the previous best result on the same dataset (Shan, 2012). This result is relevant both for the machine learning community, addressing the role of large datasets, and the computer vision community, providing a way to make high-quality face gender classifiers. Furthermore, we propose a general way to generate and exploit massive data without human annotation. Finally, we demonstrate a simple and effective adaptation of the Pegasos algorithm that makes it more robust.
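For context, the standard Pegasos algorithm the paper adapts is a stochastic sub-gradient solver for the linear SVM objective. The sketch below is the textbook update on toy data, not the paper's robust adaptation (which is not specified in the abstract); all parameter values are illustrative.

```python
import numpy as np

def pegasos(X, y, lam=0.01, epochs=20, seed=0):
    # Textbook Pegasos: at step t, learning rate 1/(lam*t); shrink w and,
    # when the sampled point violates the margin, add eta * y_i * x_i.
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            w = (1 - eta * lam) * w
            if y[i] * (X[i] @ w) < 1:       # margin violation
                w = w + eta * y[i] * X[i]
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]))  # linearly separable toy labels
w = pegasos(X, y)
acc = np.mean(np.sign(X @ w) == y)
```

On separable data like this, the learned hyperplane recovers the generating direction closely, which is why a single pass of a few epochs already classifies most training points correctly.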
TL;DR: A fast and accurate technique to detect printed-iris attacks based on the local binary pattern (LBP) descriptor, making implementation feasible even with the relatively small CPU processing power of a mobile device.
Abstract: We propose to use local binary patterns (LBP) on the image residual for iris liveness detection. A low-complexity, interpolation-free implementation enables use on mobile devices. Performance is promising for both print-based and screen-based attacks. Iris recognition is well suited to authentication on mobile devices, due to its intrinsic security and non-intrusiveness. However, authentication systems can be easily tricked by attacks based on high-quality printing. A liveness detection module is therefore necessary. Here, we propose a fast and accurate technique to detect printed-iris attacks based on the local binary pattern (LBP) descriptor. In order to improve the discrimination ability of LBP and better exploit the image statistics, LBP is performed on a high-pass version of the image obtained with a 3 × 3 integer kernel. In addition, a simplified interpolation-free descriptor is considered, and finally a linear SVM classification scheme is used. The detection performance, measured on standard databases, is extremely promising, despite the very low complexity, which makes implementation feasible even with the relatively small CPU processing power of a mobile device.
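The residual-then-LBP pipeline can be sketched as follows. The specific 3 × 3 integer kernel below (a Laplacian-style high-pass) is an assumption, since the abstract does not name the kernel; the LBP variant shown is the basic 8-neighbour code, and the resulting 256-bin histogram is what would be fed to the linear SVM.

```python
import numpy as np

def lbp_on_residual(img):
    # Step 1: high-pass residual via a 3x3 integer kernel (assumed form).
    k = np.array([[-1, -1, -1],
                  [-1,  8, -1],
                  [-1, -1, -1]])
    h, w = img.shape
    res = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            res[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    # Step 2: basic 8-neighbour LBP code on the residual (no interpolation,
    # matching the spirit of the interpolation-free descriptor).
    codes = np.zeros((res.shape[0] - 2, res.shape[1] - 2), dtype=int)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = res[1:-1, 1:-1]
    for b, (di, dj) in enumerate(offsets):
        nbr = res[1 + di:res.shape[0] - 1 + di, 1 + dj:res.shape[1] - 1 + dj]
        codes += (nbr >= center).astype(int) << b
    # Step 3: 256-bin histogram of codes, the feature vector for the SVM.
    return np.bincount(codes.ravel(), minlength=256)

rng = np.random.default_rng(3)
img = rng.integers(0, 256, size=(16, 16)).astype(float)
hist = lbp_on_residual(img)
```

Working on the residual rather than the raw image suppresses scene content and emphasizes the fine-grained statistics where printing artifacts show up.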
TL;DR: A comprehensive method for iris authentication on mobiles by means of spatial histograms is described and tested, featuring subjects captured indoor and outdoor under controlled and uncontrolled conditions by means of built-in cameras aboard three of the most diffused smartphones/tablets on the market.
Abstract: Iris authentication/recognition on mobile devices is feasible. Spatial histograms can be exploited for iris feature extraction and matching. Performance of iris segmentation/recognition algorithms is strongly affected by capture conditions. Imaging sensor resolution alone does not necessarily result in higher recognition accuracy. The worldwide diffusion of latest-generation mobile devices, namely smartphones and tablets, represents the technological premise to a new wave of applications for which reliable owner identification is becoming a key requirement. This crucial task can be approached by means of biometrics (face, iris or fingerprint), exploiting the high-resolution imaging sensors typically built into this class of devices, possibly resulting in a ubiquitous platform to verify owner identity during any kind of transaction involving the exchange of sensitive data. Among the aforementioned biometrics, iris is known for its inherent invariance and accuracy, though only a few works have explored this topic on mobile devices. In this paper a comprehensive method for iris authentication on mobiles by means of spatial histograms is described. The proposed approach has been tested on the MICHE-I iris dataset, featuring subjects captured indoor and outdoor under controlled and uncontrolled conditions by means of built-in cameras aboard three of the most diffused smartphones/tablets on the market. The experimental results collected provide an interesting insight into the readiness of mobile technology with regard to iris recognition.
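One common reading of "spatial histograms" is a grid of per-block intensity histograms, concatenated into a single descriptor and matched by a simple distance. The grid size, bin count, and L1 matching below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def spatial_histogram(img, grid=(2, 2), bins=8):
    # Divide the image into grid blocks; concatenate normalized per-block
    # intensity histograms so the descriptor keeps coarse spatial layout.
    h, w = img.shape
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = img[i * h // grid[0]:(i + 1) * h // grid[0],
                        j * w // grid[1]:(j + 1) * w // grid[1]]
            hist, _ = np.histogram(block, bins=bins, range=(0, 256))
            feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)

rng = np.random.default_rng(7)
iris_a = rng.integers(0, 256, size=(16, 16))   # toy normalized iris crops
iris_b = rng.integers(0, 256, size=(16, 16))
# Matching: smaller L1 distance between descriptors means a better match.
dist = np.abs(spatial_histogram(iris_a) - spatial_histogram(iris_b)).sum()
```

The block structure is what distinguishes this from a plain global histogram: two images with identical overall intensity statistics but different spatial arrangements get different descriptors.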
TL;DR: This paper presents a fast precise unsupervised iris defects detection method based on the underlying multispectral spatial probabilistic iris textural model and adaptive thresholding applied to demanding high resolution mobile device measurements.
Abstract: Accurate iris defect detection. Unconstrained mobile devices, high-resolution color iris images. Multispectral Markovian spatial texture model. Ranked first among the 97+1 alternative methods in the recent Noisy Iris Challenge Evaluation contest. Promising performance on the very challenging, high-resolution, and highly variable Mobile Iris Challenge Evaluation data. This paper presents a fast, precise, unsupervised iris defect detection method based on an underlying multispectral spatial probabilistic iris textural model and adaptive thresholding, applied to demanding high-resolution mobile device measurements. The accurate detection of iris eyelids and reflections is a prerequisite for accurate iris recognition, in both near-infrared and visible-spectrum measurements. The model adaptively learns its parameters on the iris texture part and subsequently checks for iris reflections using recursive prediction analysis. The method is developed for color eye images from unconstrained mobile devices but was also successfully tested on the UBIRIS v2 eye database. Our method ranked first among the 97+1 alternative methods in the recent Noisy Iris Challenge Evaluation contest on this large color iris database, using the exact contest data and methodology.
TL;DR: This paper presents gait image features based on the information set theory, henceforth these are called gait information image features, and demonstrates the robustness of the proposed features.
Abstract: The information set, which widens fuzzy set theory. A spatiotemporal statistical gait representation, which expresses the statistics of motion patterns. The superiority of our features demonstrated by testing them on changed covariate conditions (i.e. clothing, carrying) and change in speed. The performance of the new features is quantified through measures like cumulative match characteristics (CMC). Human gait, a soft biometric, helps to recognize people by the manner in which they walk. This paper presents gait image features based on information set theory; henceforth these are called gait information image features. The information set stems from a fuzzy set with a view to representing the uncertainty in the information source values using the entropy function. The proposed gait information image (GII) is derived by applying the concept of the information set on the frames in one gait cycle, and two features, named the gait information image with energy feature (GII-EF) and the gait information image with sigmoid feature (GII-SF), are extracted. A nearest neighbor (NN) classifier is applied to identify the gait. The proposed features are tested on the CASIA-B dataset, the SOTON small database with variations in clothing and carrying conditions, and the OU-ISIR Treadmill B database with large variation in clothing conditions. Moreover, experiments are carried out on the OU-ISIR Treadmill A database with slight variation in walking speeds to demonstrate the robustness of the proposed features.
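A heavily hedged sketch of the GII idea: average binary silhouettes over one gait cycle to get a per-pixel membership value, then apply an energy-style and a sigmoid-style function to it. The exponential-entropy form and the sigmoid centring used below are assumptions; the paper's information-set formulation may differ in detail.

```python
import numpy as np

def gait_information_image(frames):
    # frames: stack of binary silhouettes from one gait cycle.
    # Per-pixel mean occupancy acts as a fuzzy membership value in [0, 1].
    mu = frames.mean(axis=0)
    # Assumed energy-style feature: membership scaled by an exponential
    # entropy-gain term (largest for moderate memberships).
    gii_ef = mu * np.exp(1.0 - mu)
    # Assumed sigmoid-style feature: soft thresholding of membership.
    gii_sf = 1.0 / (1.0 + np.exp(-(mu - 0.5)))
    return gii_ef, gii_sf

rng = np.random.default_rng(4)
frames = (rng.random((10, 6, 4)) > 0.5).astype(float)  # 10 toy silhouettes
gii_ef, gii_sf = gait_information_image(frames)
```

Both outputs are single images the same size as a silhouette, which is what makes a simple nearest-neighbour comparison between subjects possible.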
TL;DR: A dynamic Bayesian model for VFOA recognition from head pose is proposed, along with novel gaze models that dynamically and more accurately predict the expected head orientation used for looking in a given gaze target direction.
Abstract: Recognizing the visual focus of attention in the HRI context. Relying on head pose, since eye gaze estimation is often impossible to achieve. Inspired by behavioral models of body, head and gaze dynamics in gaze shifts. Exploiting the robot conversational state as context to reduce recognition ambiguities. Experiments on a public dataset where the robot NAO plays the role of an art guide. The ability to recognize the visual focus of attention (VFOA, i.e. what or whom a person is looking at) of people is important for robots or conversational agents interacting with multiple people, since it plays a key role in turn-taking, engagement and intention monitoring. As eye gaze estimation is often impossible to achieve, most systems currently rely on head pose as an approximation, creating ambiguities since the same head pose can be used to look at different VFOA targets. To address this challenge, we propose a dynamic Bayesian model for VFOA recognition from head pose, in which we make two main contributions. First, taking inspiration from behavioral models describing the relationships between the body, head and gaze orientations involved in gaze shifts, we propose novel gaze models that dynamically and more accurately predict the expected head orientation used for looking in a given gaze target direction. This is a neglected aspect of previous works but is essential for recognition. Secondly, we propose to exploit the robot's conversational state (when it speaks, objects to which it refers) as context to set appropriate priors on candidate VFOA targets and reduce the inherent VFOA ambiguities. Experiments on a public dataset where the humanoid robot NAO plays the role of an art guide and quiz master demonstrate the benefit of the two contributions.
TL;DR: This is the first method that applies a multi-objective approach to data imputation, based on the NSGA-II, which is suitable for mixed-attribute datasets and takes into account information from incomplete instances and the modeling task.
Abstract: The paper proposes a novel multi-objective genetic algorithm for data imputation, called MOGAImp. This is the first method that applies a multi-objective approach to data imputation. MOGAImp presents a good tradeoff between the evaluation measures studied. The results confirm MOGAImp's suitability when conflicting evaluation measures must be balanced. The MOGAImp codification scheme makes it possible to adapt the method to different application domains. A large number of techniques for data analysis have been developed in recent years; however, most of them do not deal satisfactorily with a ubiquitous problem in the area: missing data. In order to mitigate the bias imposed by this problem, several treatment methods have been proposed, most notably data imputation methods, in which imputation is viewed as an optimization problem whose goal is to reduce the bias caused by the absence of information. Most imputation methods, however, are restricted to one type of variable, whether categorical or continuous. To fill this gap, this paper presents the multi-objective genetic algorithm for data imputation called MOGAImp, based on NSGA-II, which is suitable for mixed-attribute datasets and takes into account information from incomplete instances and the modeling task. The algorithm was evaluated on 30 datasets with induced missing values, using five classifiers from three classes (rule induction learning, lazy learning and approximate models), and compared with three techniques presented in the literature. The results obtained confirm that MOGAImp outperforms some well-established missing data treatment methods. Furthermore, the proposed method proved to be flexible, since it is possible to adapt it to different application domains.
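The GA view of imputation treats a candidate solution as one value per missing cell, scored against several (possibly conflicting) objectives. The tiny sketch below shows only that encoding-and-evaluation structure; the two objectives are illustrative stand-ins, not MOGAImp's actual fitness functions, and no NSGA-II search loop is shown.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(20, 3))
mask = rng.random(X.shape) < 0.1           # induce ~10% missing entries
X_miss = np.where(mask, np.nan, X)

def impute(X_miss, candidate):
    # A candidate (the GA chromosome) holds one value per missing cell.
    Xc = X_miss.copy()
    Xc[np.isnan(Xc)] = candidate
    return Xc

def objectives(Xc, X_miss):
    # Two illustrative criteria a multi-objective search might trade off:
    # closeness of imputed data to column means, and distortion of the
    # overall variance relative to the observed entries.
    col_means = np.nanmean(X_miss, axis=0)
    f1 = np.abs(Xc - col_means).mean()
    f2 = abs(Xc.var() - np.nanvar(X_miss))
    return f1, f2

cand = np.zeros(int(np.isnan(X_miss).sum()))   # trivial all-zeros candidate
f1, f2 = objectives(impute(X_miss, cand), X_miss)
```

An NSGA-II run would evolve a population of such candidates and return the Pareto front over the chosen objectives rather than a single imputation.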
TL;DR: Whether a convergence process can be used to solve RAT queries and if frequency of appearance of the test items in language data may influence knowledge association or discovery in solving such problems are studied.
Abstract: A cognitive system (comRAT-C) solving the Remote Associates Test (RAT) is implemented. comRAT-C gives results comparable to human normative data. A hypothesis on human answer preference is quantified, and empirical support is provided. Cognitive difficulty of the RAT correlates with comRAT-C's probability of finding the answer. Discovering the processes and types of knowledge organization involved in the creative process remains a challenge to this day. Human creativity is usually measured by psychological tests, such as the Remote Associates Test (RAT). In this paper, an approach based on a specific type of knowledge organization and processes that enables automatic solving of RAT queries is implemented (comRAT) as part of a more general cognitive theoretical framework for creative problem-solving (CreaCogs). This aims to study: (a) whether a convergence process can be used to solve such queries, and (b) whether the frequency of appearance of the test items in language data may influence knowledge association or discovery in solving such problems. comRAT uses a knowledge base of language data extracted from the Corpus of Contemporary American English. The results obtained are compared to results from empirical tests with humans. In order to explain why some answers might be preferred over others, the frequencies of appearance of the queries and solutions are analyzed. The difficulty encountered by humans when solving RAT queries is expressed in response times and the percentage of participants solving the query, and a significant moderate correlation between human data on query difficulty and the data provided by this approach is obtained.
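The convergence process can be illustrated on a classic RAT item (cottage / swiss / cake → cheese): intersect the associates of the three cue words. The toy knowledge base below is hand-written for illustration; comRAT's actual knowledge base is extracted from corpus data and weighted by frequency, which this sketch omits.

```python
# Toy associative knowledge base (hypothetical entries, not drawn from the
# corpus the paper uses).
associates = {
    "cottage": {"cheese", "house", "garden"},
    "swiss":   {"cheese", "alps", "knife"},
    "cake":    {"cheese", "chocolate", "birthday"},
}

def solve_rat(cue1, cue2, cue3, kb):
    # Convergence: a candidate answer is any word associated with all
    # three cues simultaneously.
    common = kb[cue1] & kb[cue2] & kb[cue3]
    return sorted(common)

answer = solve_rat("cottage", "swiss", "cake", associates)
```

In the full system, when several candidates converge, association frequency is what the paper analyses to explain which answer humans prefer.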
TL;DR: In this article, the authors propose to learn the source subspace that best matches the target subspace while at the same time minimizing a regularized misclassification loss, and provide an alternating optimization technique based on stochastic sub-gradient descent to solve the learning problem.
Abstract: Focus on unsupervised domain adaptation by subspace learning. Highlight the importance of learning the prediction model and the subspace jointly. Propose the JCSL method. Thorough experimental evaluation against existing subspace adaptive methods. Insight into JCSL with analysis of parameters and domain-shift reduction. Domain adaptation aims at adapting the knowledge acquired on a source domain to a new, different but related target domain. Several approaches have been proposed for classification tasks in the unsupervised scenario, where no labeled target data are available. Most of the attention has been dedicated to searching for a new domain-invariant representation, leaving the definition of the prediction function to a second stage. Here we propose to learn both jointly. Specifically, we learn the source subspace that best matches the target subspace while at the same time minimizing a regularized misclassification loss. We provide an alternating optimization technique based on stochastic sub-gradient descent to solve the learning problem, and we demonstrate its performance on several domain adaptation tasks.
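For orientation, the two-stage baseline the paper improves on (subspace alignment: align the source PCA subspace to the target one, then train a classifier separately) can be sketched as below. This is explicitly not JCSL's joint objective; subspace dimension and data are illustrative.

```python
import numpy as np

def subspace_alignment(Xs, Xt, d=2):
    # Two-stage baseline (not the paper's joint JCSL optimization):
    # 1) PCA bases of source and target via SVD of centred data;
    # 2) map the source basis onto the target one with Ps @ (Ps.T @ Pt),
    #    then project both domains.
    Ps = np.linalg.svd(Xs - Xs.mean(0), full_matrices=False)[2][:d].T
    Pt = np.linalg.svd(Xt - Xt.mean(0), full_matrices=False)[2][:d].T
    M = Ps @ (Ps.T @ Pt)        # aligned source basis
    return Xs @ M, Xt @ Pt

rng = np.random.default_rng(5)
Xs = rng.normal(size=(50, 6))          # source domain samples
Xt = rng.normal(size=(40, 6)) + 1.0    # shifted target domain samples
Zs, Zt = subspace_alignment(Xs, Xt)
```

JCSL's argument is that fixing this representation first and fitting the classifier afterwards is suboptimal; learning the subspace and the regularized classifier together lets the misclassification loss shape the representation.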