
Showing papers on "Histogram of oriented gradients published in 2018"


Journal ArticleDOI
TL;DR: The multiple feature fusion approach is robust in dealing with video-based facial expression recognition problems under lab-controlled environments and in the wild, compared with other state-of-the-art methods.
Abstract: Video-based facial expression recognition is a long-standing problem that has attracted growing attention recently. The key to a successful facial expression recognition system is to exploit the potential of audiovisual modalities and to design robust features that effectively characterize the facial appearance and configuration changes caused by facial motions. We propose an effective framework to address this issue. In our study, both visual modalities (face images) and audio modalities (speech) are utilized. A new feature descriptor called Histogram of Oriented Gradients from Three Orthogonal Planes (HOG-TOP) is proposed to extract dynamic textures from video sequences and characterize facial appearance changes, and a new geometric feature derived from the warp transformation of facial landmarks is proposed to capture facial configuration changes. Moreover, the role of audio modalities in recognition is also explored. We apply multiple feature fusion to tackle video-based facial expression recognition under lab-controlled environments and in the wild, respectively. Experiments conducted on the extended Cohn-Kanade (CK+) database and the Acted Facial Expression in the Wild (AFEW) 4.0 database show that our approach is robust compared with other state-of-the-art methods.

176 citations
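The central ingredient of HOG-TOP, orientation histograms computed on the XY, XT, and YT planes of the video volume, can be sketched in a few lines of NumPy. This is a hedged illustration rather than the authors' implementation: the real descriptor's cell layout, block normalization, and bin count are not specified here, and `hog_top` simply averages one normalized histogram per plane.

```python
import numpy as np

def plane_orientation_histogram(plane, n_bins=8):
    """Unsigned gradient-orientation histogram of a 2-D slice,
    weighted by gradient magnitude (one building block of HOG)."""
    gy, gx = np.gradient(plane.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)            # unsigned: [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    s = hist.sum()
    return hist / s if s > 0 else hist

def hog_top(video, n_bins=8):
    """HOG-TOP-style descriptor: concatenate averaged orientation
    histograms from the XY, XT and YT planes of a (T, H, W) volume."""
    t, h, w = video.shape
    xy = np.mean([plane_orientation_histogram(video[i], n_bins) for i in range(t)], axis=0)
    xt = np.mean([plane_orientation_histogram(video[:, j, :], n_bins) for j in range(h)], axis=0)
    yt = np.mean([plane_orientation_histogram(video[:, :, k], n_bins) for k in range(w)], axis=0)
    return np.concatenate([xy, xt, yt])                # 3 * n_bins values
```

For a clip of shape (T, H, W), the result is a single 3 * n_bins vector that mixes spatial texture (the XY plane) with horizontal and vertical motion texture (the XT and YT planes).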


Journal ArticleDOI
TL;DR: A multi-constraint fully convolutional network (MC–FCN) model is proposed to perform end-to-end building segmentation and significantly outperforms the classic FCN method and the adaptive boosting method using features extracted by the histogram of oriented gradients.
Abstract: Automatic building segmentation from aerial imagery is an important and challenging task because of the variety of backgrounds, building textures, and imaging conditions. Currently, research using variants of fully convolutional networks (FCNs) has largely improved the performance of this task. However, pursuing more accurate segmentation results is still critical for further applications such as automatic mapping. In this study, a multi-constraint fully convolutional network (MC-FCN) model is proposed to perform end-to-end building segmentation. Our MC-FCN model consists of a bottom-up/top-down fully convolutional architecture and multiple constraints computed as the binary cross-entropy between each prediction and the corresponding ground truth. Since more constraints are applied to optimize the parameters of the intermediate layers, the multi-scale feature representation of the model is further enhanced, and hence higher performance can be achieved. The experiments on a very-high-resolution aerial image dataset covering 18 km² and more than 17,000 buildings indicate that our method performs well in the building segmentation task. The proposed MC-FCN method significantly outperforms the classic FCN method and the adaptive boosting method using features extracted by the histogram of oriented gradients. Compared with the state-of-the-art U-Net model, MC-FCN gains relative improvements of 3.2% (0.833 vs. 0.807) in Jaccard index and 2.2% (0.893 vs. 0.874) in kappa coefficient at the cost of only a 1.8% increase in model-training time. In addition, the sensitivity analysis demonstrates that constraints at different positions have inconsistent impacts on the performance of the MC-FCN.

157 citations
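The multi-constraint idea, a binary cross-entropy term between each intermediate prediction and the ground truth, can be sketched as a multi-scale loss. The coarse-to-fine resolutions, average-pooled ground truth, and uniform weights below are assumptions for illustration; the paper's exact constraint placement is not reproduced.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy, averaged over pixels."""
    p = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def downsample(mask, factor):
    """Average-pool a square mask by an integer factor."""
    h, w = mask.shape
    return mask.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def multi_constraint_loss(preds, gt, weights=None):
    """Sum of BCE terms between each prediction (assumed given at
    full, 1/2, 1/4 ... resolution) and a matching down-sampled
    ground truth, mimicking constraints on intermediate layers."""
    weights = weights or [1.0] * len(preds)
    return sum(w * bce(p, downsample(gt, 2 ** i))
               for i, (p, w) in enumerate(zip(preds, weights)))
```

Because every scale contributes a gradient signal, the intermediate layers are optimized directly rather than only through the final output, which is the stated motivation for the multi-constraint design.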


Proceedings ArticleDOI
17 May 2018
TL;DR: A study of a technique for human detection from video, the Histograms of Oriented Gradients (HOG), is presented by developing an application that imports video and detects the humans in it.
Abstract: Computer Vision (CV) is currently one of the most popular research topics in the world, because it can support many aspects of daily life and can be applied to a wide range of theories and research problems. Human detection is one of the most popular research topics in Computer Vision. In this paper, we present a study of a technique for human detection from video, the Histograms of Oriented Gradients (HOG), by developing an application to import video and detect the humans in it. We use the HOG algorithm to analyze every frame of the video to find and count people. After analyzing the video from start to end, the program generates a histogram showing the number of detected people versus the playing period of the video. The expected results were obtained, including the detection of people in the video and the generation of a histogram showing when humans appear in the video file.

139 citations
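The per-frame counting pipeline described above (detect people in every frame, then histogram the counts over the playing period) can be sketched independently of the detector. The `detect` callback is a placeholder for a real HOG+SVM sliding-window detector (for example, OpenCV's `HOGDescriptor` with its default people detector); any function returning a list of bounding boxes will do.

```python
def count_per_frame(frames, detect):
    """Run a person detector on every frame and return the number of
    detections per frame.  detect(frame) is assumed to return a list
    of bounding boxes, e.g. from an HOG+SVM sliding-window detector."""
    return [len(detect(f)) for f in frames]

def count_histogram(counts, bin_seconds, fps):
    """Aggregate per-frame counts into per-interval totals: the
    'people detected vs. playing period' histogram from the paper."""
    per_bin = int(bin_seconds * fps)
    return [sum(counts[i:i + per_bin]) for i in range(0, len(counts), per_bin)]
```

With a 25 fps video and `bin_seconds=10`, `count_histogram` yields one bar per 10-second interval, matching the people-versus-playing-period plot the paper describes.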


Journal ArticleDOI
TL;DR: A novel approach for localizing a robot over longer periods of time using only monocular image data and a novel data association approach for matching streams of incoming images to an image sequence stored in a database are presented.
Abstract: Localization is an integral part of reliable robot navigation, and long-term autonomy requires robustness against perceptual changes in the environment during localization. In the context of vision-based localization, such changes can be caused by illumination variations, occlusion, structural development, different weather conditions, and seasons. In this paper, we present a novel approach for localizing a robot over longer periods of time using only monocular image data. We propose a novel data association approach for matching streams of incoming images to an image sequence stored in a database. Our method exploits network flows to leverage sequential information to improve the localization performance and to maintain several possible trajectory hypotheses in parallel. To compare images, we consider a semidense image description based on histogram of oriented gradients features as well as global descriptors from deep convolutional neural networks trained on ImageNet for robust localization. We perform extensive evaluations on a variety of datasets and show that our approach outperforms existing state-of-the-art approaches.

130 citations
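The paper's data association uses network flows over image sequences and maintains several trajectory hypotheses in parallel; as a simplified, hedged stand-in, the sketch below finds a single monotone minimum-cost path through a query-to-database cost matrix by dynamic programming. It captures the same intuition that sequential order constrains the matching, without the multi-hypothesis flow machinery.

```python
import numpy as np

def sequence_localize(cost):
    """Match a stream of query images (rows) against a database
    sequence (columns) with a monotone minimum-cost path.
    cost[i, j] = descriptor distance between query i and database j.
    Returns one database index per query image."""
    n, m = cost.shape
    dp = np.full((n, m), np.inf)
    dp[0] = cost[0]
    for i in range(1, n):
        # stay at j or advance from j-1: enforces sequential order
        best_prev = np.minimum(dp[i - 1],
                               np.concatenate(([np.inf], dp[i - 1, :-1])))
        dp[i] = cost[i] + best_prev
    # backtrack the cheapest monotone path
    path = [int(np.argmin(dp[-1]))]
    for i in range(n - 1, 0, -1):
        j = path[-1]
        prev = j if j == 0 or dp[i - 1, j] <= dp[i - 1, j - 1] else j - 1
        path.append(prev)
    return path[::-1]
```

The cost matrix would be filled with distances between the semidense HOG (or CNN) descriptors of the query and database images.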


Proceedings ArticleDOI
26 Jun 2018
TL;DR: In this paper, a novel unsupervised deep neural network architecture of a feature embedding for visual loop closure is proposed, which is built upon the autoencoder architecture tailored specifically to the problem at hand.
Abstract: Robust efficient loop closure detection is essential for large-scale real-time SLAM. In this paper, we propose a novel unsupervised deep neural network architecture of a feature embedding for visual loop closure that is both reliable and compact. Our model is built upon the autoencoder architecture, tailored specifically to the problem at hand. To train our network, we inflict random noise on our input data as the denoising autoencoder does, but, instead of applying random dropout, we warp images with randomized projective transformations to emulate natural viewpoint changes due to robot motion. Moreover, we utilize the geometric information and illumination invariance provided by histogram of oriented gradients (HOG), forcing the encoder to reconstruct a HOG descriptor instead of the original image. As a result, our trained model extracts features robust to extreme variations in appearance directly from raw images, without the need for labeled training data or environment-specific training. We perform extensive experiments on various challenging datasets, showing that the proposed deep loop-closure model consistently outperforms the state-of-the-art methods in terms of effectiveness and efficiency. Our model is fast and reliable enough to close loops in real time with no dimensionality reduction, and capable of replacing generic off-the-shelf networks in state-of-the-art ConvNet-based loop closure systems.

115 citations
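The training-data construction, corrupting the input with noise and a random projective warp while the reconstruction target is derived from the clean image, can be sketched as follows. The perturbation magnitudes, nearest-neighbour warping, and the `training_pair` packaging are assumptions for illustration; the HOG target computation itself is omitted.

```python
import numpy as np

def random_homography(rng, scale=0.1):
    """Identity homography with small random perturbations, emulating
    natural viewpoint change due to robot motion (a stand-in for the
    paper's randomized projective transformations)."""
    H = np.eye(3)
    H[:2] += rng.uniform(-scale, scale, size=(2, 3))
    return H

def warp(img, H):
    """Inverse-map warp with nearest-neighbour sampling."""
    h, w = img.shape
    out = np.zeros_like(img)
    Hinv = np.linalg.inv(H)
    for r in range(h):
        for c in range(w):
            x, y, z = Hinv @ (c, r, 1.0)
            sc, sr = int(round(x / z)), int(round(y / z))
            if 0 <= sr < h and 0 <= sc < w:
                out[r, c] = img[sr, sc]
    return out

def training_pair(img, rng, noise=0.05):
    """(corrupted input, clean original): the encoder sees the noisy
    warped view, while the loss compares its output with a HOG
    descriptor of the original (HOG computation omitted here)."""
    warped = warp(img, random_homography(rng))
    return warped + rng.normal(0.0, noise, img.shape), img
```

Warping instead of dropout is the key design choice: the corruption the network learns to undo matches the corruption (viewpoint change) it will face at loop-closure time.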


Journal ArticleDOI
TL;DR: Simulation results reveal that the proposed method performs markedly better than existing works across the considered performance measures.
Abstract: License plate recognition (LPR) systems play a vital role in security applications, including road traffic monitoring, street activity monitoring, and identification of potential threats. Numerous methods have been adopted for LPR, but there is still room for a single standard approach that can deal with all sorts of problems, such as light variations, occlusion, and multiple views. The proposed approach is an effort to deal with such conditions by incorporating multiple-feature extraction and fusion. The proposed architecture comprises four primary steps: (i) selection of the luminance channel from the CIE-Lab colour space, (ii) binary segmentation of the selected channel followed by image refinement, (iii) fusion of histogram of oriented gradients (HOG) and geometric features followed by selection of appropriate features using a novel entropy-based method, and (iv) feature classification with a support vector machine (SVM). To validate the proposed approach, different performance measures are considered: false positive rate (FPR), false negative rate (FNR), and accuracy, which reaches up to 99.5%. Simulation results reveal that the proposed method performs markedly better than existing works.

101 citations
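The paper's entropy-based feature selection is described as novel and its details are not given in the abstract; a generic sketch, ranking features by the Shannon entropy of their discretized values and keeping the top k, conveys the general idea of preferring informative features.

```python
import math

def entropy(values, n_bins=10):
    """Shannon entropy of one feature, estimated from a histogram
    of its values; a constant feature has zero entropy."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    counts = [0] * n_bins
    for v in values:
        b = min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[b] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def select_by_entropy(feature_matrix, k):
    """Rank features (columns of a samples-by-features matrix) by
    entropy and return the indices of the k most informative ones."""
    n_feat = len(feature_matrix[0])
    scores = [entropy([row[j] for row in feature_matrix]) for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: -scores[j])[:k]
```

The selected columns of the fused HOG-plus-geometric feature matrix would then be passed to the SVM.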


Proceedings ArticleDOI
20 May 2018
TL;DR: This paper investigates the feasibility of processing surveillance video streams at the network edge for real-time, uninterrupted tracking of moving human objects, and proposes an efficient multi-object tracking algorithm based on Kernelized Correlation Filters.
Abstract: By allowing computation to be performed at the edge of a network, edge computing has been recognized as a promising approach to address some challenges in the cloud computing paradigm, particularly for delay-sensitive and mission-critical applications like real-time surveillance. The prevalence of networked cameras and smart mobile devices enables video analytics at the network edge. However, human object detection and tracking are still conducted at cloud centers, as real-time, online tracking is computationally expensive. In this paper, we investigate the feasibility of processing surveillance video streams at the network edge for real-time, uninterrupted tracking of moving human objects. Moving human detection based on the Histogram of Oriented Gradients (HOG) and a linear Support Vector Machine (SVM) is used for feature extraction and detection, and an efficient multi-object tracking algorithm based on Kernelized Correlation Filters (KCF) is proposed. Implemented and tested on a Raspberry Pi 3, our experimental results are very encouraging and validate the feasibility of the proposed approach toward a real-time surveillance solution at the edge of the network.

86 citations
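KCF proper performs kernelized ridge regression over all circular shifts of the tracked patch; the hedged sketch below shows only the simpler linear closed form at the heart of such correlation-filter trackers, trained and applied entirely in the Fourier domain, which is what makes them cheap enough for a Raspberry Pi.

```python
import numpy as np

def train_filter(patch, target, lam=1e-2):
    """Learn a linear correlation filter in the Fourier domain
    (the MOSSE-style closed form underlying KCF-type trackers):
    H = (G . conj(F)) / (F . conj(F) + lambda)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target)          # desired response, peaked at the object
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def locate(filt, patch):
    """Apply the filter to a new patch and return the peak of the
    correlation response, i.e. the estimated target location."""
    resp = np.real(np.fft.ifft2(filt * np.fft.fft2(patch)))
    return tuple(int(v) for v in np.unravel_index(int(np.argmax(resp)), resp.shape))
```

In a real tracker the `target` is usually a Gaussian centred on the object, the filter is updated online every frame, and (in KCF) HOG channels replace raw pixels.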


Journal ArticleDOI
26 Feb 2018-Sensors
TL;DR: A new PAD method is proposed that uses a combination of deep and handcrafted features extracted from images captured by a visible-light camera sensor, forming hybrid features with stronger discriminative ability than any single type of image feature.
Abstract: Although face recognition systems have wide application, they are vulnerable to presentation attack samples (fake samples). Therefore, a presentation attack detection (PAD) method is required to enhance the security level of face recognition systems. Most previously proposed PAD methods for face recognition systems have focused on handcrafted image features designed with expert knowledge, such as the Gabor filter, local binary pattern (LBP), local ternary pattern (LTP), and histogram of oriented gradients (HOG). As a result, the extracted features reflect limited aspects of the problem, yielding detection accuracy that is low and varies with the characteristics of presentation attack face images. Deep learning methods, developed in the computer vision research community, have proven suitable for automatically training feature extractors that can enhance the ability of handcrafted features. To overcome the limitations of previously proposed PAD methods, we propose a new PAD method that uses a combination of deep and handcrafted features extracted from images captured by a visible-light camera sensor. Our proposed method uses a convolutional neural network (CNN) to extract deep image features and the multi-level local binary pattern (MLBP) method to extract skin detail features from face images to discriminate real and presentation attack face images. By combining the two types of image features, we form a new type of image feature, called hybrid features, which has stronger discriminative ability than either single feature type. Finally, we use a support vector machine (SVM) to classify the image features into the real or presentation attack class. Our experimental results indicate that our proposed method outperforms previous PAD methods by yielding the smallest error rates on the same image databases.

85 citations
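The handcrafted half of the hybrid feature is based on multi-level LBP; as a sketch, here is plain single-radius 8-neighbour LBP with a 256-bin histogram. MLBP would repeat this at several radii or block sizes and concatenate the histograms, which is the "multi-level" part not shown here.

```python
def lbp_code(img, r, c):
    """8-bit local binary pattern code for pixel (r, c): each of the
    8 neighbours contributes one bit, set when it is >= the centre."""
    center = img[r][c]
    nbrs = [img[r-1][c-1], img[r-1][c], img[r-1][c+1], img[r][c+1],
            img[r+1][c+1], img[r+1][c], img[r+1][c-1], img[r][c-1]]
    return sum((1 << i) for i, v in enumerate(nbrs) if v >= center)

def lbp_histogram(img):
    """256-bin LBP histogram over all interior pixels: the kind of
    handcrafted skin-texture descriptor fused with CNN features."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

The hybrid feature would then be the concatenation of this histogram (at several levels) with the CNN embedding, fed to the SVM.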


Journal ArticleDOI
TL;DR: Experimental results using video collected from real-world scenarios show that the proposed method achieves higher detection accuracy and time efficiency than conventional methods, and that it can successfully detect and track multiple vehicle targets in complex urban environments.

81 citations


Journal ArticleDOI
TL;DR: The proposed method distinguishes power quality events based on the Histogram of Oriented Gradients (HOG) and a Support Vector Machine (SVM), and requires less processing time than previous methods even when multiple events occur at the same time.

70 citations


Proceedings ArticleDOI
01 Nov 2018
TL;DR: Experimental evaluations indicate that color may be the most informative feature for this task; RGB yields the best accuracy for most of the classifiers evaluated.
Abstract: Corn is one of the major crops in Indonesia. Disease outbreaks can significantly reduce maize production, causing millions of rupiah in damages. To reduce the risk of crop failure due to disease outbreaks, machine learning methods can be implemented. Naked-eye inspection for plant diseases is usually based on changes in color or the existence of spots or rotten areas on the leaves. Based on these observations, in this paper we investigate several image-processing-based features for disease detection in corn: color features such as RGB; local features such as the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), and Oriented FAST and Rotated BRIEF (ORB); and object-detection features such as the histogram of oriented gradients (HOG). We evaluate the performance of these features with several machine learning algorithms: support vector machines (SVM), decision trees (DT), random forests (RF), and naive Bayes (NB). Our experimental evaluations indicate that color may be the most informative feature for this task. We find that RGB is the feature with the best accuracy for most classifiers we evaluate.

Journal ArticleDOI
TL;DR: This paper proposes an efficient feature extraction and classification algorithm based on a visual saliency model that outperforms the state-of-the-art methods and develops a two-level directed acyclic graph (DAG) support vector metric learning.
Abstract: The performance of a synthetic aperture radar (SAR) automatic target recognition system mainly depends on feature extraction and classification. It is crucial to select discriminative features to train a classifier to achieve the desired performance. In this paper, we propose an efficient feature extraction and classification algorithm based on a visual saliency model. First, an SAR-oriented graph-based visual saliency model is introduced. Second, relying on the ability of our saliency model to highlight the most significant regions, Gabor and histogram of oriented gradients features are extracted from the processed SAR images. Third, in order to obtain more discriminative features, the discrimination correlation analysis algorithm is used for feature fusion and combination. Finally, a two-level directed acyclic graph (DAG) support vector metric learning is developed that seamlessly takes advantage of a two-level DAG, by eliminating weak classifiers, and of a Mahalanobis-distance-based radial basis function kernel, which emphasizes relevant features and reduces the influence of noninformative features. Experiments on real SAR data from the MSTAR database are conducted, and the experimental results demonstrate that the proposed method outperforms the state-of-the-art methods.
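The Mahalanobis-distance-based RBF kernel mentioned above has a compact form; the `gamma` parameterization below is an assumption, and the matrix `M` would come from the paper's metric learning step rather than being hand-set.

```python
import numpy as np

def mahalanobis_rbf(x, y, M, gamma=1.0):
    """Mahalanobis-distance-based RBF kernel,
    k(x, y) = exp(-gamma * (x - y)^T M (x - y)).
    A learned M emphasizes relevant feature dimensions and suppresses
    noninformative ones; M = I recovers the ordinary RBF kernel."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * d @ M @ d))
```

Plugging this kernel into each node of the two-level DAG-SVM gives the classifier the abstract describes.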

Journal ArticleDOI
TL;DR: An efficient and fast facial expression recognition system that outperforms existing methods is presented and a new feature called W_HOG where W indicates discrete wavelet transform and HOG indicates histogram of oriented gradients feature is introduced.
Abstract: Facial expression recognition plays a significant role in human behavior detection. In this study, we present an efficient and fast facial expression recognition system. We introduce a new feature called W_HOG, where W indicates the discrete wavelet transform and HOG indicates the histogram of oriented gradients feature. The proposed framework comprises four stages: (i) face processing, (ii) domain transformation, (iii) feature extraction, and (iv) expression recognition. Face processing is composed of face detection, cropping, and normalization steps. In domain transformation, spatial-domain features are transformed into the frequency domain by applying the discrete wavelet transform (DWT). Feature extraction is performed by retrieving the Histogram of Oriented Gradients (HOG) feature in the DWT domain, which is termed the W_HOG feature. For expression recognition, the W_HOG feature is supplied to a well-designed tree-based multiclass support vector machine (SVM) classifier with a one-versus-all architecture. The proposed system is trained and tested on the benchmark CK+, JAFFE, and Yale facial expression datasets. Experimental results show that the proposed method is effective for facial expression recognition and outperforms existing methods.
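The W_HOG construction, HOG computed in the DWT domain rather than on the raw image, can be sketched with a single-level Haar transform. Computing the orientation histogram on the LL (approximation) band only is an assumption made for brevity; the paper does not state which bands it uses.

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2-D Haar DWT: returns the LL (approximation)
    band plus LH, HL, HH detail bands, each half-size."""
    a = (img[0::2] + img[1::2]) / 2.0          # average row pairs
    d = (img[0::2] - img[1::2]) / 2.0          # difference row pairs
    ll, lh = (a[:, 0::2] + a[:, 1::2]) / 2.0, (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl, hh = (d[:, 0::2] + d[:, 1::2]) / 2.0, (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def w_hog(img, n_bins=9):
    """W_HOG-style feature sketch: a gradient-orientation histogram
    computed on the LL band of the Haar DWT instead of the raw image."""
    ll, _, _, _ = haar_dwt2(np.asarray(img, dtype=float))
    gy, gx = np.gradient(ll)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (hist.sum() + 1e-12)
```

Working at half resolution after the DWT is also what makes the system fast: the HOG stage processes a quarter of the pixels.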

Proceedings ArticleDOI
08 Jul 2018
TL;DR: Three classification models are compared using features extracted with local binary patterns, the histogram of gradients, and a pre-trained deep network, on KIMIA Path960, a publicly available dataset of 960 histopathology images extracted from 20 different tissue scans.
Abstract: Medical image analysis has become a topic under the spotlight in recent years, and there has been significant progress in medical image research concerning the use of machine learning. However, numerous questions and problems still await answers and solutions. In the present study, three classification models are compared using features extracted with local binary patterns, the histogram of gradients, and a pre-trained deep network. Three common classification methods, namely support vector machines, decision trees, and artificial neural networks, are used to classify the feature vectors obtained by the different feature extractors. We use KIMIA Path960, a publicly available dataset of 960 histopathology images extracted from 20 different tissue scans, to test the accuracy of the classification and feature extraction models, specifically for histopathology images. SVM achieves the highest accuracy of 90.52% using local binary patterns as features, which surpasses the accuracy obtained by deep features, namely 81.14%.

Journal ArticleDOI
TL;DR: A novel and efficient method for traffic sign recognition based on combination of complementary and discriminative feature sets, which has shown good complementariness and yielded fast recognition rate and is more adequate for real-time application as well.

Journal ArticleDOI
TL;DR: Video-based surveillance pedestrian detection plays a key role in emerging technologies, such as the Internet of Things and Big Data for smart industries and cities, and is now being automated using deep learning methods known as convolutional neural networks (CNNs).
Abstract: Video-based surveillance pedestrian detection plays a key role in emerging technologies, such as the Internet of Things and Big Data, for use in smart industries and cities. In pedestrian detection, factors such as lighting, object collisions, backgrounds, clothing, and occlusion cause complications and lead to inconsistent classification. To address these problems, enhancements in feature extraction are required, and the features should cover the many variations in pedestrian appearance. Well-known features used for pedestrian detection include the histogram of gradients, the scale-invariant feature transform, and Haar features built to represent boundary-level classifications. Occlusion feature extraction supports the identification of regions involved in pedestrian detection. Classifiers such as support vector machines and random forests are also used to classify pedestrians. All of these feature extraction and pedestrian detection methods are now being automated using deep learning methods known as convolutional neural networks (CNNs). A model is trained by providing positive and negative image data sets, and larger data sets provide more accurate results when a CNN-based approach is used. Additionally, XML-based cascade classifiers are used to detect faces within detected pedestrians.

Journal ArticleDOI
TL;DR: The present study demonstrates the effectiveness of the relatively little-investigated oBIFs as a robust textural descriptor, reporting classification rates of 71%, 76%, and 68% on three gender classification competitions and outperforming the participating systems.
Abstract: Classification of gender from images of handwriting is an interesting research problem in computerized analysis of handwriting. The correlation between handwriting and the gender of the writer can be exploited to develop intelligent systems that assist forensic experts, document examiners, paleographers, psychologists, and neurologists. We propose a handwriting-based gender recognition system that exploits texture as the discriminative attribute between male and female handwriting. The textural information in handwriting is captured using combinations of different configurations of oriented Basic Image Features (oBIFs). oBIFs histograms and oBIFs column histograms extracted from writing samples of male and female handwriting are used to train a Support Vector Machine (SVM) classifier. The system is evaluated on three subsets of the QUWI database of Arabic and English writing samples using the experimental protocols of the ICDAR 2013, ICDAR 2015, and ICFHR 2016 gender classification competitions, reporting classification rates of 71%, 76%, and 68%, respectively, and outperforming the participating systems of these competitions. While textural measures such as local binary patterns, the histogram of oriented gradients, and Gabor filters have remained a popular choice for many expert systems targeting recognition problems, the present study demonstrates the effectiveness of the relatively little-investigated oBIFs as a robust textural descriptor.

Proceedings ArticleDOI
01 May 2018
TL;DR: 3D HOG outperformed state-of-the-art feature representations, namely Local Binary Patterns on Three Orthogonal Planes and Histograms of Oriented Optical Flow, for micro-movement detection.
Abstract: Micro-facial expressions are regarded as an important human behavioural event that can highlight emotional deception. Spotting these movements is difficult for humans and machines; however, research into using computer vision to detect subtle facial expressions is growing in popularity. This paper proposes an individualised baseline micro-movement detection method using a 3D Histogram of Oriented Gradients (3D HOG) temporal difference method. We define a face template consisting of 26 regions based on the Facial Action Coding System (FACS). We extract the temporal features of each region using 3D HOG. Then, we use the chi-square distance to find subtle facial motion in the local regions. Finally, an automatic peak detector is used to detect micro-movements above the proposed adaptive baseline threshold. The performance is validated on two FACS-coded datasets: SAMM and CASME II. This objective method focuses on the movement of the 26 face regions. When comparing with the ground truth, the best results were AUCs of 0.7512 and 0.7261 on SAMM and CASME II, respectively. The results show that 3D HOG outperformed state-of-the-art feature representations, namely Local Binary Patterns on Three Orthogonal Planes and Histograms of Oriented Optical Flow, for micro-movement detection.
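Two of the stated steps, the chi-square distance between per-region feature histograms and thresholded peak detection, are easy to sketch. The mean-plus-k-sigma rule below is a stand-in for the paper's individualised adaptive baseline threshold, whose exact form is not given in the abstract.

```python
def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms, e.g. 3D HOG
    features of the same face region in consecutive frames."""
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))

def detect_peaks(signal, k=1.5):
    """Flag frames whose feature distance rises above an adaptive
    baseline threshold: mean + k * standard deviation of the signal
    (a simplified stand-in for the paper's individualised baseline)."""
    n = len(signal)
    mean = sum(signal) / n
    std = (sum((s - mean) ** 2 for s in signal) / n) ** 0.5
    thr = mean + k * std
    return [i for i, s in enumerate(signal) if s > thr]
```

Per region, the per-frame chi-square distances form the signal; frames the detector flags are candidate micro-movements.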

Proceedings ArticleDOI
06 Jun 2018
TL;DR: A methodology to identify facial emotions using facial landmarks and a random forest classifier; the well-known Extended Cohn-Kanade database is used to train the random forest and to test the accuracy of the system.
Abstract: Human emotions are a universally common mode of interaction, and automated identification of human facial expressions has its own advantages. In this paper, we propose and develop a methodology to identify facial emotions using facial landmarks and a random forest classifier. First, faces are identified in each image using a histogram of oriented gradients with a linear classifier, an image pyramid, and a sliding-window detection scheme. Then facial landmarks are identified using a model trained on the iBUG 300-W dataset. A feature vector is calculated from the identified facial landmarks and normalized to remove facial size variations. The same feature vector is calculated for the neutral pose, and the vector difference is used to identify emotions with the random forest classifier. The well-known Extended Cohn-Kanade database is used to train the random forest and to test the accuracy of the system.
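The abstract does not specify the exact landmark feature, so the sketch below uses a plausible scale-invariant variant: distances from each landmark to the landmark centroid, normalised by their mean, with the neutral-pose difference vector as the classifier input. The paper's actual feature and normalisation may differ.

```python
import math

def landmark_feature(landmarks):
    """Distance of every landmark from the landmark centroid,
    normalised by the mean distance so the vector is invariant
    to face size (a hypothetical feature construction)."""
    cx = sum(x for x, _ in landmarks) / len(landmarks)
    cy = sum(y for _, y in landmarks) / len(landmarks)
    d = [math.hypot(x - cx, y - cy) for x, y in landmarks]
    m = sum(d) / len(d)
    return [v / m for v in d]

def expression_vector(neutral, expressive):
    """Difference between expressive and neutral feature vectors:
    the input the random-forest classifier would receive."""
    return [e - n for n, e in zip(landmark_feature(neutral),
                                  landmark_feature(expressive))]
```

Subtracting the neutral-pose feature cancels person-specific face geometry, so the classifier sees only the deformation caused by the expression.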

Journal ArticleDOI
TL;DR: This paper introduces the “encoded local projections” (ELP) as a new dense-sampling image descriptor for search and classification problems and experiments with three public datasets to comparatively evaluate the performance of ELP histograms.
Abstract: This paper introduces the “encoded local projections” (ELP) as a new dense-sampling image descriptor for search and classification problems. The gradient changes of multiple projections in local windows of gray-level images are encoded to build a histogram that captures spatial projection patterns. Using projections is a conventional technique in both medical imaging and computer vision. Furthermore, powerful dense-sampling methods, such as local binary patterns and the histogram of oriented gradients, are widely used for image classification and recognition. Inspired by many achievements of such existing descriptors, we explore the design of a new class of histogram-based descriptors with particular applications in medical imaging. We experiment with three public datasets (IRMA, Kimia Path24, and CT Emphysema) to comparatively evaluate the performance of ELP histograms. In light of the tremendous success of deep architectures, we also compare the results with deep features generated by pretrained networks. The results are quite encouraging as the ELP descriptor can surpass both conventional and deep descriptors in performance in several experimental settings.
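A much-simplified, hedged reading of the ELP idea, encoding how local-window projections change, might look like this: row and column projections of each window are taken, the sign of each successive change becomes one bit, and the resulting codes are histogrammed densely over the image. The real descriptor encodes gradients of multiple projection directions and uses a different code design.

```python
def projection_code(window):
    """Encode one local window: take its horizontal and vertical
    projections (row and column sums) and turn the sign of each
    successive change into one bit of a code."""
    n = len(window)
    rows = [sum(r) for r in window]
    cols = [sum(window[i][j] for i in range(n)) for j in range(n)]
    bits = [1 if b > a else 0 for a, b in zip(rows, rows[1:])] \
         + [1 if b > a else 0 for a, b in zip(cols, cols[1:])]
    return sum(bit << i for i, bit in enumerate(bits))

def elp_histogram(img, win=3):
    """Histogram of window codes over a dense sampling of the image,
    mirroring the dense-sampling design of LBP/HOG-style descriptors."""
    n_codes = 1 << (2 * (win - 1))
    hist = [0] * n_codes
    h, w = len(img), len(img[0])
    for i in range(h - win + 1):
        for j in range(w - win + 1):
            patch = [row[j:j + win] for row in img[i:i + win]]
            hist[projection_code(patch)] += 1
    return hist
```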

Journal ArticleDOI
TL;DR: A new type of spatial partitioning scheme and a modified pyramid matching kernel based on spatial pyramid matching (SPM) are proposed and a dense histogram of oriented gradients is used as a low-level visual descriptor.

Journal ArticleDOI
TL;DR: An image segmentation technique based on the Histogram of Oriented Gradients (HOG) features that allows recognizing the signals of the basketball referee from videos and achieves an accuracy of 97.5% using Support Vector Machine (SVM) for classification.

Journal ArticleDOI
TL;DR: Zhang et al. present a simple yet effective Boolean-map-based representation that exploits connectivity cues for visual tracking: the target is characterized by a set of Boolean maps generated by uniformly thresholding feature values, which effectively encode multi-scale connectivity cues of the target at different granularities.

Journal ArticleDOI
TL;DR: A static hand gesture recognition system for mobile devices is presented by combining the histogram of oriented gradients (HOG) and local binary pattern (LBP) features, which can accurately detect hand poses.

Journal ArticleDOI
TL;DR: In this paper, a variation of HOG and Gabor filter combination called Histogram of Oriented Texture (HOT) was proposed for classification of mammogram patches as normal-abnormal and benign-malignant.
Abstract: Breast cancer is becoming pervasive with each passing day, so its early detection is a big step toward saving the life of any patient. Mammography is a common tool in breast cancer diagnosis, and the most important step is the classification of mammogram patches as normal versus abnormal and benign versus malignant. The texture of the breast in a mammogram patch plays a significant role in these classifications. We propose a combination of a variant of the Histogram of Oriented Gradients (HOG) and Gabor filters, called the Histogram of Oriented Texture (HOT), that exploits this fact. We also revisit the Pass Band Discrete Cosine Transform (PB-DCT) descriptor, which captures texture information well. Not all features of a mammogram patch may be useful, so we apply a feature selection technique called Discrimination Potentiality (DP). Our resulting descriptors, DP-HOT and DP-PB-DCT, are compared with the standard descriptors. The density of a mammogram patch is important for classification and has not been studied exhaustively. The Image Retrieval in Medical Applications (IRMA) database from RWTH Aachen, Germany, is a standard database that provides mammogram patches, and most researchers have tested their frameworks on only a subset of its patches. We apply our two new descriptors to all images of the IRMA database for density-wise classification and compare with the standard descriptors, achieving higher accuracy than all of the existing standard descriptors (more than 92%).
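A hedged sketch of the Gabor side of HOT: build a small bank of oriented Gabor kernels and summarize the image's response per orientation. Reducing each orientation to a single mean-absolute-response bin is a simplification made here; the actual HOT descriptor builds HOG-style histograms over the filtered images, and the kernel parameters below are illustrative defaults.

```python
import numpy as np

def gabor_kernel(size, theta, lam=4.0, sigma=2.0, gamma=0.5):
    """Real part of a Gabor filter at orientation theta: a Gaussian
    envelope times a cosine carrier of wavelength lam."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2)) \
         * np.cos(2 * np.pi * xr / lam)

def hot_descriptor(img, n_orient=4, size=7):
    """Oriented-texture sketch: filter with a small Gabor bank and
    record the normalised mean absolute response per orientation."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    feats = []
    for k in range(n_orient):
        kern = gabor_kernel(size, theta=np.pi * k / n_orient)
        # 'valid' correlation via an explicit sliding window (no SciPy)
        resp = [np.sum(img[i:i + size, j:j + size] * kern)
                for i in range(h - size + 1) for j in range(w - size + 1)]
        feats.append(float(np.mean(np.abs(resp))))
    total = sum(feats) or 1.0
    return [f / total for f in feats]
```

Texture with a dominant orientation concentrates energy in the matching bin, which is the property the breast-texture classification relies on.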

Journal ArticleDOI
TL;DR: A hybrid approach to facial expression based sentiment analysis is presented that combines local and global features, boosting the performance of the proposed technique on face images containing noise and occlusions.
Abstract: Facial sentiment analysis has been an enthusiastic research area for the last two decades. A fair amount of work has been done by researchers in this field due to its utility in numerous applications, such as facial expression driven knowledge discovery. However, developing an accurate and efficient facial expression recognition system is still a challenging problem. Although many efficient recognition systems have been introduced in the past, the recognition rate is generally not satisfactory due to inherent limitations including lighting, pose variations, noise, and occlusion. In this paper, a hybrid approach to facial expression based sentiment analysis is presented that combines local and global features. Feature extraction is performed by fusing the histogram of oriented gradients (HOG) descriptor with the uniform local ternary pattern (U-LTP) descriptor. These features are extracted from the entire face image rather than from individual facial components such as the eyes, nose, and mouth. The most suitable set of HOG parameters is selected after analyzing them experimentally along with the U-LTP descriptor, boosting the performance of the proposed technique on face images containing noise and occlusions. Face sentiments are classified into seven universal emotional expressions: Happy, Angry, Fear, Disgust, Sad, Surprise, and Neutral. Features extracted via HOG and U-LTP are fused into a single feature vector, which is fed into a multi-class Support Vector Machine classifier for emotion classification. Three types of experiments are conducted on three public facial image databases, JAFFE, MMI, and CK+, to evaluate the recognition rate of the proposed technique; recognition accuracies of 95.71%, 98.20%, and 99.68% are achieved for JAFFE, MMI, and CK+, respectively.
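The local ternary pattern at the core of U-LTP thresholds each neighbour against the centre pixel with a tolerance t, producing a ternary code that is conventionally split into upper and lower binary codes. A minimal sketch of that step (the 3x3 patch, t=5, and the neighbour ordering are illustrative choices; the uniform-pattern grouping and the HOG fusion from the paper are omitted):

```python
def ltp_codes(patch, t=5):
    """Local Ternary Pattern of one 3x3 patch: neighbours more than t
    above the centre map to +1, more than t below to -1, else 0.
    The ternary code is split into the usual upper/lower binary codes."""
    c = patch[1][1]
    # 8 neighbours, clockwise from the top-left corner
    coords = [(0, 0), (0, 1), (0, 2), (1, 2),
              (2, 2), (2, 1), (2, 0), (1, 0)]
    upper = lower = 0
    for bit, (y, x) in enumerate(coords):
        d = patch[y][x] - c
        if d > t:
            upper |= 1 << bit   # +1 neighbours set bits of the upper code
        elif d < -t:
            lower |= 1 << bit   # -1 neighbours set bits of the lower code
    return upper, lower

patch = [[60, 50, 50],
         [50, 50, 40],
         [50, 50, 50]]
print(ltp_codes(patch))  # (1, 8): top-left is >t above, right is >t below
```

Histogramming the upper and lower codes over the image (keeping only the uniform patterns) would then yield the U-LTP feature vector.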

Proceedings ArticleDOI
03 May 2018
TL;DR: The detection results show that, for the same dataset, LBP features outperform the other two feature types with a higher detection rate; a robust detection algorithm combining all three feature descriptors with AdaBoost cascade classification is also proposed.
Abstract: Autonomous vehicles may be the most significant innovation in transportation since the automobile was first invented. Environmental perception plays a pivotal role in the development of self-driving vehicles, which must navigate a complex environment of static and dynamic objects. Dynamic objects such as vehicles and pedestrians must be extracted precisely and robustly in order to estimate their current position and motion and to predict their future position. In this article, the performance of three commonly used object detection approaches, Histogram of Oriented Gradients (HOG), Haar-like features, and Local Binary Pattern (LBP), is investigated and analyzed using a public dataset of camera images. The detection results show that, for the same dataset, LBP features perform better than the other two feature types, with a higher detection rate. Finally, a robust detection algorithm combining all three feature descriptors with AdaBoost cascade classification is proposed.

Journal ArticleDOI
TL;DR: This work presents a multiple pedestrian tracking method for monocular videos captured by a fixed camera in an interacting multiple model (IMM) framework that outperforms four state-of-the-art visual tracking methods using benchmark video databases.
Abstract: We present a multiple pedestrian tracking method for monocular videos captured by a fixed camera in an interacting multiple model (IMM) framework. Our tracking method involves multiple IMM trackers running in parallel, which are tied together by a robust data association component. We investigate two data association strategies that take into account both the target appearance and motion errors. We use a 4D color histogram as the appearance model for each pedestrian returned by a people detector based on histogram of oriented gradients features. Short-term occlusions and false negative errors from the detector are dealt with using a sliding window of video frames, within which tracking persists in the absence of observations. Our method has been evaluated and compared, both qualitatively and quantitatively, with four state-of-the-art visual tracking methods using benchmark video databases. The experiments demonstrate that, on average, our tracking method outperforms these four methods.
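The appearance term of such a data association step can be illustrated with a quantised colour histogram compared via the Bhattacharyya coefficient (the paper uses a 4D colour histogram; this sketch substitutes a plain joint RGB histogram with 4 bins per channel as a simplifying assumption):

```python
import math

def color_histogram(pixels, bins_per_channel=4):
    """Quantise RGB pixels (values 0-255) into a joint colour
    histogram, normalised to sum to 1."""
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        idx = (r * n // 256) * n * n + (g * n // 256) * n + (b * n // 256)
        hist[idx] += 1.0
    total = sum(hist)
    return [v / total for v in hist]

def bhattacharyya(p, q):
    """Similarity of two normalised histograms in [0, 1];
    1 means identical distributions, 0 means disjoint support."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

red  = [(250, 10, 10)] * 20   # uniformly red patch
blue = [(10, 10, 250)] * 20   # uniformly blue patch
h_red, h_blue = color_histogram(red), color_histogram(blue)
print(round(bhattacharyya(h_red, h_red), 3))   # 1.0 (same appearance)
print(round(bhattacharyya(h_red, h_blue), 3))  # 0.0 (disjoint colours)
```

In a tracker, this similarity would be combined with a motion-error term (e.g. a gated distance to the IMM prediction) to score each detection-to-track assignment.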

Proceedings ArticleDOI
08 Jul 2018
TL;DR: In this article, the authors define the iris location problem as the delimitation of the smallest squared window that encompasses the entire iris region and compare the classical and outstanding Daugman iris localization approach with two window based detectors: 1) a sliding window detector based on features from Histogram of Oriented Gradients (HOG) and a linear Support Vector Machines (SVM) classifier; 2) a deep learning based detector fine-tuned from YOLO object detector.
Abstract: The iris is considered the biometric trait with the highest uniqueness. Iris location is an important task for biometric systems, directly affecting the results obtained in specific applications such as iris recognition, spoofing, and contact lens detection, among others. This work defines the iris location problem as the delimitation of the smallest squared window that encompasses the entire iris region. In order to build a benchmark for iris location, we annotate four databases from different biometric applications with squared iris bounding boxes and make them publicly available to the community. Besides these four annotated databases, we include two others from the literature. We perform experiments on these six databases, five obtained with near-infrared sensors and one with a visible-light sensor. We compare the classical and outstanding Daugman iris location approach with two window-based detectors: 1) a sliding window detector based on features from the Histogram of Oriented Gradients (HOG) and a linear Support Vector Machine (SVM) classifier; 2) a deep learning based detector fine-tuned from the YOLO object detector. Experimental results showed that the deep learning based detector outperforms the other ones in terms of accuracy and runtime (GPU version) and should be chosen whenever possible.
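At scoring time, the sliding window detector in option 1) reduces to evaluating a linear function w.x + b over the features of every candidate window and keeping the best. A toy pure-Python sketch (raw pixel intensities stand in for HOG features, and the single-scale scan and uniform weight vector are illustrative assumptions):

```python
def best_window(image, w, b, win=3):
    """Slide a win x win window over a 2D image and return the
    top-left corner with the highest linear score w.x + b."""
    h, wd = len(image), len(image[0])
    best, best_pos = float("-inf"), None
    for y in range(h - win + 1):
        for x in range(wd - win + 1):
            # Flatten the window into a feature vector (stand-in for HOG)
            feats = [image[y + dy][x + dx]
                     for dy in range(win) for dx in range(win)]
            score = sum(wi * fi for wi, fi in zip(w, feats)) + b
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos, best

# Toy 5x5 image with a bright 3x3 blob in its lower-right corner;
# a uniform positive weight vector simply seeks the brightest window.
img = [[0.0] * 5 for _ in range(5)]
for y in range(2, 5):
    for x in range(2, 5):
        img[y][x] = 1.0
pos, score = best_window(img, w=[1.0] * 9, b=0.0)
print(pos, score)  # (2, 2) 9.0
```

A real detector would extract HOG features per window, use SVM-trained weights, scan multiple scales, and apply non-maximum suppression over overlapping high-scoring windows.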