
Showing papers on "Histogram of oriented gradients published in 2015"


Journal ArticleDOI
TL;DR: This paper addresses the problems of contour detection, bottom-up grouping, object detection and semantic segmentation on RGB-D data, and proposes an approach that classifies superpixels into the dominant object categories in the NYUD2 dataset.
Abstract: In this paper, we address the problems of contour detection, bottom-up grouping, object detection and semantic segmentation on RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset (Silberman et al., ECCV, 2012). We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of Arbelaez et al. (TPAMI, 2011) by making effective use of depth information. We show that our system can label each contour with its type (depth, normal or albedo). We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. We train RGB-D object detectors by analyzing and computing histograms of oriented gradients on the depth image and using them with deformable part models (Felzenszwalb et al., TPAMI, 2010). We observe that this simple strategy for training object detectors significantly outperforms more complicated models in the literature. We then turn to the problem of semantic segmentation, for which we propose an approach that classifies superpixels into the dominant object categories in the NYUD2 dataset. We design generic and class-specific features to encode the appearance and geometry of objects. We also show that additional features computed from RGB-D object detectors and scene classifiers further improve semantic segmentation accuracy. In all of these tasks, we report significant improvements over the state-of-the-art.

253 citations
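The detector above rests on a HOG descriptor computed on the depth channel rather than RGB. A minimal sketch of one HOG cell histogram in NumPy (block normalization and the deformable-part-model stage are omitted; the 8×8 cell and 9 unsigned bins are conventional choices, not taken from the paper):

```python
import numpy as np

def hog_cell_histogram(patch, n_bins=9):
    """Unsigned gradient-orientation histogram for one HOG cell.

    Minimal sketch of the HOG idea applied to a depth patch; real
    systems add overlapping blocks and block normalization.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as in classic HOG.
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    # Magnitude-weighted vote of each pixel into its orientation bin.
    return np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)

# Toy "depth image" cell: a vertical depth step produces horizontal
# gradients, so all the mass lands in the 0-degree bin.
depth_cell = np.tile(np.concatenate([np.zeros(4), np.ones(4)]), (8, 1))
hist = hog_cell_histogram(depth_cell)
```

In a full pipeline, cell histograms would be grouped into blocks, L2-normalized, and concatenated into the window descriptor fed to the detector.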


Journal ArticleDOI
TL;DR: The approach for classifying acoustic scenes is based on transforming the audio signal into a time-frequency representation and then extracting relevant features about the shapes and evolutions of time-frequency structures; these features are based on histograms of gradients that are subsequently fed to a multi-class linear support vector machine.
Abstract: This abstract presents our entry to the Detection and Classification of Acoustic Scenes challenge. The approach we propose for classifying acoustic scenes is based on transforming the audio signal into a time–frequency representation and then extracting relevant features about the shapes and evolutions of time–frequency structures. These features are based on histograms of gradients that are subsequently fed to a multi-class linear support vector machine.

206 citations
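The pipeline above — time-frequency image, then gradient-shape features — can be sketched in a few lines. This is a hedged illustration only: the FFT size, hop, and single global histogram are placeholder choices, and the linear-SVM stage is omitted.

```python
import numpy as np

def log_spectrogram(signal, n_fft=64, hop=32):
    """Magnitude spectrogram via a short-time FFT (minimal sketch)."""
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(np.asarray(frames), axis=1)).T  # freq x time
    return np.log1p(spec)

def gradient_orientation_features(spec, n_bins=8):
    """Histogram of gradient orientations over the whole T-F image."""
    gy, gx = np.gradient(spec)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    return np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)

rng = np.random.default_rng(0)
sig = np.sin(2 * np.pi * 0.1 * np.arange(2048)) + 0.1 * rng.standard_normal(2048)
feat = gradient_orientation_features(log_spectrogram(sig))
```

The resulting fixed-length vector is the kind of input a multi-class linear SVM would consume, one vector per audio clip.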


Journal ArticleDOI
TL;DR: A comprehensive study on the application of the histogram of oriented gradients (HOG) descriptor to the FER problem is proposed, highlighting how this powerful technique can be effectively exploited for this purpose.
Abstract: Automatic facial expression recognition (FER) is a topic of growing interest, mainly due to the rapid spread of assistive technology applications, such as human–robot interaction, where robust emotional awareness is a key point for best accomplishing the assistive task. This paper proposes a comprehensive study on the application of the histogram of oriented gradients (HOG) descriptor to the FER problem, highlighting how this powerful technique can be effectively exploited for this purpose. In particular, this paper shows that a proper setting of the HOG parameters can make this descriptor one of the most suitable for characterizing facial expression peculiarities. A large experimental session, which can be divided into three phases, was carried out using a consolidated algorithmic pipeline. The first experimental phase was aimed at proving the suitability of the HOG descriptor for characterizing facial expression traits; to this end, a successful comparison with the most commonly used FER frameworks was carried out. In the second experimental phase, different publicly available facial datasets were used to test the system on images acquired under different conditions (e.g. image resolution, lighting conditions, etc.). As a final phase, a test on continuous data streams was carried out on-line in order to validate the system in real-world operating conditions that simulated a real-time human–machine interaction.

133 citations


Journal ArticleDOI
Shunli Zhang, Xin Yu, Yao Sui, Sicong Zhao, Li Zhang
TL;DR: This paper proposes a novel tracking method via a multi-view learning framework using multiple support vector machines (SVMs) and presents a novel collaborative strategy with an entropy criterion, which is acquired from the confidence distribution of the candidate samples.
Abstract: How to build an accurate and reliable appearance model to improve performance is a crucial problem in object tracking. Since multi-view learning can lead to a more accurate and robust representation of the object, in this paper we propose a novel tracking method via a multi-view learning framework using multiple support vector machines (SVMs). The multi-view SVM tracking method is constructed based on multiple views of features and a novel combination strategy. To realize a comprehensive representation, we select three different types of features, i.e., gray-scale value, histogram of oriented gradients (HOG), and local binary pattern (LBP), to train the corresponding SVMs. These features represent the object from the perspectives of description, detection, and recognition, respectively. In order to realize the combination of the SVMs under the multi-view learning framework, we present a novel collaborative strategy with an entropy criterion, which is acquired from the confidence distribution of the candidate samples. In addition, to learn the changes of the object and the scenario, we propose a novel update scheme based on a subspace evolution strategy. The new scheme can control the model update adaptively and helps to address occlusion problems. We evaluate our approach on several public video sequences, and the experimental results demonstrate that our method is robust and accurate, and can achieve state-of-the-art tracking performance.

117 citations
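The entropy criterion above can be illustrated with a small sketch: each view's classifier scores the candidate samples, the scores are normalized into a distribution, and a view whose distribution is more peaked (lower entropy, i.e. more confident) gets a larger weight. The inverse-entropy weighting below is an assumption standing in for the paper's exact formulation.

```python
import numpy as np

def entropy_weights(score_matrix):
    """Weight each view inversely to the entropy of its confidence
    distribution over candidate samples (hedged sketch of an entropy
    criterion; the paper's exact formula may differ).

    score_matrix: (n_views, n_candidates) nonnegative confidence scores.
    """
    p = score_matrix / score_matrix.sum(axis=1, keepdims=True)
    ent = -np.sum(p * np.log(p + 1e-12), axis=1)   # per-view entropy
    w = 1.0 / (ent + 1e-12)                         # confident -> heavy
    return w / w.sum()

# Three views (e.g. gray, HOG, LBP SVMs) scoring four candidate windows.
scores = np.array([
    [0.90, 0.05, 0.03, 0.02],   # peaked distribution -> confident view
    [0.30, 0.25, 0.25, 0.20],   # flat distribution  -> uncertain view
    [0.60, 0.20, 0.10, 0.10],
])
w = entropy_weights(scores)
fused = w @ (scores / scores.sum(axis=1, keepdims=True))
best = int(np.argmax(fused))    # fused pick of the tracked candidate
```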


Journal ArticleDOI
TL;DR: In this paper, a novel rotationally invariant object detection descriptor was proposed to detect aircraft and cars in remote-sensing images using orientation normalization, feature space mapping, and an elliptic Fourier transform.
Abstract: High-resolution remote-sensing images are widely used for object detection but are affected by various factors. During the detection process, the orientation sensitivity of the image features is crucial to the detection performance. This study presents a novel rotationally invariant object detection descriptor that can address the difficulties with object detection that are caused by different object orientations. We use orientation normalization, feature space mapping, and an elliptic Fourier transform to achieve rotational invariance of the histogram of oriented gradients. Validation experiments indicate that the proposed descriptor is robust to rotation, noise, and compression. We use this novel image descriptor to detect aircraft and cars in remote-sensing images. The results show that the proposed method offers robust rotational invariance in object detection.

105 citations


Proceedings ArticleDOI
07 Jun 2015
TL;DR: A vision based method to automatically determine if a driver is holding a cell phone close to one of his/her ears (thus keeping only one hand on the steering wheel) is presented, and its efficacy is quantitatively demonstrated on challenging Strategic Highway Research Program (SHRP2) face view videos from the head pose validation data.
Abstract: The harmful effects of cell phone usage on driver behavior have been well investigated, and the growing problem has motivated several research efforts aimed at developing automated cell phone usage detection systems. Computer vision based approaches for dealing with this problem have only emerged in recent years. In this paper, we present a vision based method to automatically determine if a driver is holding a cell phone close to one of his/her ears (thus keeping only one hand on the steering wheel) and quantitatively demonstrate the method's efficacy on challenging Strategic Highway Research Program (SHRP2) face view videos from the head pose validation data that was acquired to monitor driver head pose variation under naturalistic driving conditions. To the best of our knowledge, this is the first such evaluation carried out using this relatively new data. Our approach utilizes the Supervised Descent Method (SDM) based facial landmark tracking algorithm to track the locations of facial landmarks in order to extract a crop of the region of interest. Following this, features are extracted from the crop and are classified using previously trained classifiers in order to determine if a driver is holding a cell phone. We adopt a thorough approach and benchmark the performance obtained using raw pixels and Histogram of Oriented Gradients (HOG) features in combination with various classifiers.

105 citations


Journal ArticleDOI
TL;DR: A new traffic sign detection method by integrating color invariants based image segmentation and pyramid histogram of oriented gradients (PHOG) features based shape matching that can robustly detect traffic signs under varying weather, shadow, occlusion and complex background conditions is proposed.

102 citations


Journal ArticleDOI
TL;DR: Experimental results indicate that the proposed method is robust under different complex backgrounds and has a high detection rate with a low false alarm rate.
Abstract: Automatic oil tank detection plays a very important role in remote sensing image processing. To accomplish this task, a hierarchical oil tank detector with deep surrounding features is proposed in this paper. The surrounding features extracted by the deep learning model aim at making the oil tanks easier to recognize, since oil tanks appear as circles and this information alone is not enough to separate targets from the complex background. The proposed method is divided into three modules: 1) candidate selection; 2) feature extraction; and 3) classification. First, a modified ellipse and line segment detector (ELSD) based on gradient orientation is used to select candidates in the image. Afterward, a feature combining local and surrounding information is extracted to represent the target. The histogram of oriented gradients (HOG), which can reliably capture shape information, is extracted to characterize the local patch. For the surrounding area, the convolutional neural network (CNN) trained for the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) contest is applied as a black-box feature extractor to extract rich surrounding features. Then, a linear support vector machine (SVM) is utilized as the classifier to give the final output. Experimental results indicate that the proposed method is robust under different complex backgrounds and has a high detection rate with a low false alarm rate.

95 citations


Journal ArticleDOI
TL;DR: This paper proposes several speed-ups for densely sampled HOG, HOF and MBH descriptors and investigates the trade-off between accuracy and computational efficiency of descriptors in terms of frame sampling rate and type of Optical Flow method.
Abstract: The current state-of-the-art in video classification is based on Bag-of-Words using local visual descriptors. Most commonly these are histogram of oriented gradients (HOG), histogram of optical flow (HOF) and motion boundary histogram (MBH) descriptors. While such an approach is very powerful for classification, it is also computationally expensive. This paper addresses the problem of computational efficiency. Specifically: (1) We propose several speed-ups for densely sampled HOG, HOF and MBH descriptors and release Matlab code; (2) We investigate the trade-off between accuracy and computational efficiency of descriptors in terms of frame sampling rate and type of Optical Flow method; (3) We investigate the trade-off between accuracy and computational efficiency for computing the feature vocabulary, using and comparing most of the commonly adopted vector quantization techniques: k-means, hierarchical k-means, Random Forests, Fisher Vectors and VLAD.

93 citations
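The Bag-of-Words stage compared above (point 3) can be sketched with the simplest of the quantizers, plain k-means: cluster a sample of local descriptors into a vocabulary, then represent each video as a normalized histogram of nearest-word assignments. The tiny sizes below are illustrative only.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd k-means for building a small visual vocabulary."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return centers

def bow_histogram(descriptors, centers):
    """Hard-assign each local descriptor to its nearest visual word."""
    d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    h = np.bincount(d.argmin(1), minlength=len(centers)).astype(float)
    return h / h.sum()

rng = np.random.default_rng(1)
# Fake local descriptors (stand-ins for dense HOG/HOF/MBH vectors).
train = rng.standard_normal((200, 8))
vocab = kmeans(train, k=16)
hist = bow_histogram(rng.standard_normal((50, 8)), vocab)
```

Hierarchical k-means, Fisher Vectors, and VLAD trade this simple hard assignment for cheaper lookup or richer statistics, which is precisely the accuracy/efficiency trade-off the paper measures.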


Journal ArticleDOI
TL;DR: A novel unsupervised learning framework for anomaly detection that learns a probability model taking spatial and temporal contextual information into consideration, and can accurately detect and localize anomalies in surveillance video.
Abstract: Detecting anomalies in surveillance videos, that is, finding events or objects with a low probability of occurrence, is a practical and challenging research topic in the computer vision community. In this paper, we put forward a novel unsupervised learning framework for anomaly detection. At the feature level, we propose a Sparse Semi-nonnegative Matrix Factorization (SSMF) to learn local patterns at each pixel, and a Histogram of Nonnegative Coefficients (HNC) can be constructed as a local feature which is more expressive than previously used features like the Histogram of Oriented Gradients (HOG). At the model level, we learn a probability model which takes the spatial and temporal contextual information into consideration. Our framework is totally unsupervised, requiring no human-labeled training data. With more expressive features and a more sophisticated model, our framework can accurately detect and localize anomalies in surveillance video. We carried out extensive experiments on several benchmark video datasets for anomaly detection, and the results demonstrate the superiority of our framework over state-of-the-art approaches, validating its effectiveness.

88 citations


Journal ArticleDOI
18 Sep 2015-Sensors
TL;DR: This article evaluates vision algorithms as alternatives for detection and distance estimation of mUAVs using Haar-like features, histogram of gradients (HOG) and local binary patterns (LBP) using cascades of boosted classifiers and shows that the cascaded classifiers using HOG train and run faster than the other algorithms.
Abstract: Detection and distance estimation of micro unmanned aerial vehicles (mUAVs) is crucial for (i) the detection of intruder mUAVs in protected environments; (ii) sense and avoid purposes on mUAVs or on other aerial vehicles and (iii) multi-mUAV control scenarios, such as environmental monitoring, surveillance and exploration. In this article, we evaluate vision algorithms as alternatives for detection and distance estimation of mUAVs, since other sensing modalities entail certain limitations on the environment or on the distance. For this purpose, we test Haar-like features, histogram of gradients (HOG) and local binary patterns (LBP) using cascades of boosted classifiers. Cascaded boosted classifiers allow fast processing by performing detection tests at multiple stages, where only candidates passing earlier simple stages are processed at the subsequent, more complex stages. We also integrate a distance estimation method with our system, utilizing geometric cues with support vector regressors. We evaluated each method on indoor and outdoor videos that were collected in a systematic way and also on videos having motion blur. Our experiments show that, using boosted cascaded classifiers with LBP, near real-time detection and distance estimation of mUAVs are possible in about 60 ms indoors (1032 × 778 resolution) and 150 ms outdoors (1280 × 720 resolution) per frame, with a detection rate of 0.96 F-score. However, the cascaded classifiers using Haar-like features lead to better distance estimation since they can position the bounding boxes on mUAVs more accurately. On the other hand, our time analysis yields that the cascaded classifiers using HOG train and run faster than the other algorithms.
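The geometric cue behind the distance estimation above can be illustrated with the pinhole model: a detected bounding box of height h pixels, a focal length f, and a known physical target height H give d = f·H / h. The focal length and mUAV size below are illustrative assumptions; the paper instead trains support vector regressors on such cues.

```python
import numpy as np

# Hedged sketch: distance from detected bounding-box height via the
# pinhole model d = f * H / h. Both constants are assumptions for
# illustration, not values from the paper.
FOCAL_PX = 800.0      # assumed focal length in pixels
UAV_HEIGHT_M = 0.25   # assumed physical height of the mUAV in meters

def estimate_distance(bbox_height_px):
    """Distance in meters from apparent bounding-box height in pixels."""
    return FOCAL_PX * UAV_HEIGHT_M / bbox_height_px

# Halving the apparent height doubles the estimated distance.
heights = np.array([100.0, 50.0, 25.0])
dists = estimate_distance(heights)
```

This also explains the paper's observation that tighter bounding boxes (Haar-like features) yield better distance estimates: any error in h propagates directly into d.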

Journal ArticleDOI
TL;DR: In this article, a new descriptor referred to as Segmental HOG was developed to perform a comprehensive detection of hundreds of glomeruli in images of whole kidney sections; it possesses flexible blocks that can be adaptively fitted to input images in order to acquire robustness for the detection of the glomeruli.
Abstract: The detection of the glomeruli is a key step in the histopathological evaluation of microscopic images of the kidneys. However, the task of automatic detection of the glomeruli poses challenges owing to the differences in their sizes and shapes in renal sections as well as the extensive variations in their intensities due to heterogeneity in immunohistochemistry staining. Although the rectangular histogram of oriented gradients (Rectangular HOG) is a widely recognized powerful descriptor for general object detection, it shows many false positives owing to the aforementioned difficulties in the context of glomeruli detection. A new descriptor referred to as Segmental HOG was developed to perform a comprehensive detection of hundreds of glomeruli in images of whole kidney sections. The new descriptor possesses flexible blocks that can be adaptively fitted to input images in order to acquire robustness for the detection of the glomeruli. Moreover, the novel segmentation technique employed herewith generates high-quality segmentation outputs, and the algorithm is assured to converge to an optimal solution. Consequently, experiments using real-world image data revealed that Segmental HOG achieved significant improvements in detection performance compared to Rectangular HOG. The proposed descriptor for glomeruli detection presents promising results, and it is expected to be useful in pathological evaluation.

Journal ArticleDOI
TL;DR: A new standard Thai handwritten character dataset is provided for comparison of feature extraction techniques and methods and the results show that the local gradient feature descriptors significantly outperform directly using pixel intensities from the images.

Journal ArticleDOI
TL;DR: The spatial pyramid histogram of gradients is extended to the spatio-temporal domain to give 3-dimensional facial features, which are integrated with dense optical flow to give a spatio-temporal descriptor that extracts both the spatial and dynamic motion information of facial expressions.

Journal ArticleDOI
TL;DR: The experimental results demonstrate that the proposed vehicle detection method not only improves detection performance but also reduces computation time.
Abstract: In this paper, a new on-road vehicle detection method is presented. First, a new feature named the Position and Intensity-included Histogram of Oriented Gradients (PIHOG or πHOG) is proposed. Unlike the conventional HOG, πHOG compensates the information loss involved in the construction of a histogram with position information, and it improves the discriminative power using intensity information. Second, a new search space reduction (SSR) method is proposed to speed up the detection and reduce the computational load. The SSR additionally decreases the false positive rate. A variety of classifiers, including support vector machine, extreme learning machine, and k-nearest neighbor, are used to train and classify vehicles using πHOG. The validity of the proposed method is demonstrated by its application to the Caltech, IR, Pittsburgh, and Kitti datasets. The experimental results demonstrate that the proposed vehicle detection method not only improves detection performance but also reduces computation time.
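The πHOG idea — a cell histogram augmented with position and intensity cues — can be sketched as follows. The specific encoding here (magnitude-weighted gradient centroid plus mean intensity appended to the histogram) is an illustrative assumption; the paper's exact construction may differ.

```python
import numpy as np

def pi_hog_cell(patch, n_bins=9):
    """Hedged sketch of a PIHOG-style cell descriptor: the usual
    orientation histogram, augmented with position information (the
    magnitude-weighted centroid of gradient energy) and an intensity
    cue (mean intensity). Not the paper's exact encoding.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    ys, xs = np.mgrid[:patch.shape[0], :patch.shape[1]]
    total = mag.sum() + 1e-12
    centroid = np.array([(ys * mag).sum() / total, (xs * mag).sum() / total])
    return np.concatenate([hist, centroid, [patch.mean()]])

# A cell with a vertical edge: gradient mass sits around column 3.5.
cell = np.zeros((8, 8))
cell[:, 4:] = 1.0
feat = pi_hog_cell(cell)   # 9 orientation bins + (y, x) centroid + intensity
```

Two cells with identical histograms but differently placed edges now produce different descriptors, which is the information loss the paper says plain HOG suffers from.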

Journal ArticleDOI
TL;DR: In this article, a method based on Histogram of Oriented Gradients (HOG) and Support Vector Machine (SVM) was proposed to classify moving objects in crowded traffic scenes.
Abstract: Pedestrians and cyclists are amongst the most vulnerable road users. Pedestrian and cyclist collisions involving motor-vehicles result in high injury and fatality rates for these two modes. Data for pedestrian and cyclist activity at intersections such as volumes, speeds, and space–time trajectories are essential in the field of transportation in general, and road safety in particular. However, automated data collection for these two road user types remains a challenge. Due to the constant change of orientation and appearance of pedestrians and cyclists, detecting and tracking them using video sensors is a difficult task. This is perhaps one of the main reasons why automated data collection methods are more advanced for motorized traffic. This paper presents a method based on Histogram of Oriented Gradients to extract features of an image box containing the tracked object and Support Vector Machine to classify moving objects in crowded traffic scenes. Moving objects are classified into three categories: pedestrians, cyclists, and motor vehicles. The proposed methodology is composed of three steps: (i) detecting and tracking each moving object in video data, (ii) classifying each object according to its appearance in each frame, and (iii) computing the probability of belonging to each class based on both object appearance and speed. For the last step, Bayes’ rule is used to fuse appearance and speed in order to predict the object class. Using video datasets collected in different intersections, the methodology was built and tested. The developed methodology achieved an overall classification accuracy of greater than 88%. However, the classification accuracy varies across modes and is highest for vehicles and lower for pedestrians and cyclists. The applicability of the proposed methodology is illustrated using a simple case study to analyze cyclist–vehicle conflicts at intersections with and without bicycle facilities.
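The last step above, fusing appearance and speed with Bayes' rule, is easy to make concrete: multiply the appearance classifier's class probabilities by a class-conditional speed likelihood and renormalize. The Gaussian speed models below are illustrative assumptions, not the paper's fitted values.

```python
import numpy as np

# Sketch of Bayes-rule fusion of appearance and speed. The per-class
# speed means/spreads (m/s) are assumed for illustration only.
classes = ["pedestrian", "cyclist", "vehicle"]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def fuse(appearance_probs, speed_mps):
    """Posterior over classes given appearance probs and observed speed."""
    speed_lik = np.array([gaussian(speed_mps, 1.4, 0.6),    # walking
                          gaussian(speed_mps, 5.0, 2.0),    # cycling
                          gaussian(speed_mps, 12.0, 5.0)])  # driving
    post = appearance_probs * speed_lik
    return post / post.sum()

# Appearance alone slightly favors "cyclist", but an observed speed of
# 1.2 m/s shifts the posterior toward "pedestrian".
post = fuse(np.array([0.35, 0.40, 0.25]), speed_mps=1.2)
label = classes[int(np.argmax(post))]
```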

Proceedings ArticleDOI
28 Dec 2015
TL;DR: This work proposes the Subband Power Distribution (SPD), computed as the histogram of amplitude values in each frequency band of a spectrogram image, as a feature to capture the occurrences of acoustic events, and compares histograms using the so-called Sinkhorn kernel.
Abstract: Acoustic scene classification is a difficult problem mostly due to the high density of events concurrently occurring in audio scenes. In order to capture the occurrences of these events we propose to use the Subband Power Distribution (SPD) as a feature. We extract it by computing the histogram of amplitude values in each frequency band of a spectrogram image. The SPD allows us to model the density of events in each frequency band. Our method is evaluated on a large acoustic scene dataset using support vector machines. We outperform the previous methods when using the SPD in conjunction with the histogram of gradients. To reach further improvement, we also consider the use of an approximation of the earth mover's distance kernel to compare histograms in a more suitable way. Using the so-called Sinkhorn kernel improves the results on most of the feature configurations. Best performances reach a 92.8% F1 score.
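The SPD feature described above is simple to compute: for each frequency band (row) of the spectrogram image, histogram the amplitude values observed over time. A minimal NumPy sketch (the bin count and shared amplitude range are illustrative choices; the SVM and Sinkhorn-kernel stages are omitted):

```python
import numpy as np

def subband_power_distribution(spec, n_amp_bins=10):
    """Subband Power Distribution: for each frequency band (row) of a
    spectrogram image, histogram the amplitude values over time.
    Returns an (n_bands, n_amp_bins) matrix with rows summing to 1.
    """
    lo, hi = spec.min(), spec.max()   # shared amplitude range across bands
    spd = np.stack([np.histogram(row, bins=n_amp_bins, range=(lo, hi))[0]
                    for row in spec]).astype(float)
    return spd / spd.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
spec = np.abs(rng.standard_normal((16, 200)))  # 16 bands x 200 time frames
spd = subband_power_distribution(spec)
```

Each row is itself a histogram, which is why a histogram-aware kernel such as the Sinkhorn approximation of the earth mover's distance is a natural fit for comparing SPDs.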

Proceedings ArticleDOI
01 Oct 2015
TL;DR: A new method of micro-movement detection by applying Histogram of Oriented Gradients as a feature descriptor on the authors' in-house high-speed video dataset of spontaneous micro facial movements is proposed.
Abstract: Detecting micro-facial movements in a video sequence is the first step in realising a system that can pick out rapid movements automatically as a person is being recorded. This paper proposes a new method of micro-movement detection by applying Histogram of Oriented Gradients as a feature descriptor on our in-house high-speed video dataset of spontaneous micro facial movements. Firstly the algorithm aligns and crops faces for each video using automatic facial point detection and affine transformation. Then a de-noising algorithm is applied to each video before splitting them into blocks where the Histogram of Oriented Gradient features are calculated for each frame in every video block. The Chi-Squared distance measure is then used to calculate dissimilarity in the spatial appearance between frames at a set interval. The final feature vector is calculated after normalisation of the raw distance values and peak detection is applied to 'spot' micro-facial movements. An individualised baseline threshold is used to determine the value a peak must exceed to be classed as a movement. The result is compared with a benchmark algorithm - feature difference analysis techniques for micro-facial movements using Local Binary Patterns. Results indicate the proposed method achieves higher Recall of 0.8429 and F1-measure of 0.7672.
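The spotting logic above — Chi-Squared distance between HOG histograms a fixed interval apart, then thresholded peak detection — can be sketched directly. The mean-plus-k·std threshold below is an assumption standing in for the paper's individualised baseline threshold.

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-12):
    """Chi-squared distance between two normalized HOG histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def spot_movements(frame_hists, interval=1, k=2.0):
    """Distance between frames `interval` apart; flag frames exceeding a
    baseline threshold (mean + k*std here, an assumed stand-in for the
    paper's individualised per-subject baseline)."""
    d = np.array([chi2_distance(frame_hists[i], frame_hists[i + interval])
                  for i in range(len(frame_hists) - interval)])
    thr = d.mean() + k * d.std()
    return d, np.flatnonzero(d > thr)

rng = np.random.default_rng(0)
# 30 frames of near-identical HOG histograms, with one abrupt change.
base = np.full(9, 1 / 9)
hists = np.abs(base + 0.01 * rng.standard_normal((30, 9)))
hists /= hists.sum(axis=1, keepdims=True)
spike = np.full(9, 0.0125)
spike[0] = 0.9
hists[15] = spike                      # injected micro-movement frame
d, peaks = spot_movements(hists)       # distances 14 and 15 both spike
```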

Journal ArticleDOI
TL;DR: A new descriptor referred to as Segmental HOG was developed to perform a comprehensive detection of hundreds of glomeruli in images of whole kidney sections and it is expected to be useful in pathological evaluation.
Abstract: Glomerulus detection is a key step in histopathological evaluation of microscopy images of kidneys. However, the task of automatic detection of glomeruli poses challenges due to the disparity in sizes and shapes of glomeruli in renal sections. Moreover, extensive variations of their intensities due to heterogeneity in immunohistochemistry staining are also encountered. Despite being widely recognized as a powerful descriptor for general object detection, the rectangular histogram of oriented gradients (Rectangular HOG) suffers from many false positives due to the aforementioned difficulties in the context of glomerulus detection. A new descriptor referred to as Segmental HOG is developed to perform a comprehensive detection of hundreds of glomeruli in images of whole kidney sections. The new descriptor possesses flexible blocks that can be adaptively fitted to input images to acquire robustness to deformations of glomeruli. Moreover, the novel segmentation technique employed herewith generates high-quality segmentation outputs, and the algorithm is assured to converge to an optimal solution. Consequently, experiments using real-world image data reveal that Segmental HOG achieves significant improvements in detection performance compared to Rectangular HOG. The proposed descriptor and method for glomeruli detection present promising results and are expected to be useful in pathological evaluation.

Journal ArticleDOI
TL;DR: The authors generate three Depth Motion Maps (DMMs) over the entire video sequence, corresponding to the front, side, and top projection views, and merge features via decision-level fusion, where a soft decision-fusion rule is used to combine the classification outcomes from multiple classifiers, each with an individual set of features.
Abstract: The emerging cost-effective depth sensors have facilitated the action recognition task significantly. In this paper, the authors address the action recognition problem using depth video sequences combining three discriminative features. More specifically, the authors generate three Depth Motion Maps (DMMs) over the entire video sequence corresponding to the front, side, and top projection views. Contourlet-based Histogram of Oriented Gradients (CT-HOG), Local Binary Patterns (LBP), and Edge Oriented Histograms (EOH) are then computed from the DMMs. To merge these features, the authors consider decision-level fusion, where a soft decision-fusion rule, Logarithmic Opinion Pool (LOGP), is used to combine the classification outcomes from multiple classifiers, each with an individual set of features. Experimental results on two datasets reveal that the fusion scheme achieves superior action recognition performance over using each feature individually.

Journal ArticleDOI
TL;DR: A new feature selection method based on the Fisher criterion and genetic optimization is proposed to tackle the CISL recognition problem, and fivefold cross-validation experiments show the proposed FIG method brings better recognition performance than both the full set of original features and any single type of features.
Abstract: Common CT imaging signs of lung diseases (CISLs) are defined as the imaging signs that frequently appear in lung CT images of patients and play important roles in the diagnosis of lung diseases. This paper proposes a new feature selection method based on the Fisher criterion and genetic optimization, called FIG for short, to tackle the CISL recognition problem. In our FIG feature selection method, the Fisher criterion is applied to evaluate feature subsets, based on which a genetic optimization algorithm is developed to find an optimal feature subset from the candidate features. We use the FIG method to select the features for CISL recognition from various types of features, including bag-of-visual-words based on the histogram of oriented gradients, wavelet transform-based features, the local binary pattern, and the CT value histogram. Then, the selected features cooperate with each of five commonly used classifiers, including support vector machine (SVM), Bagging (Bag), Naive Bayes (NB), k-nearest neighbor (k-NN), and AdaBoost (Ada), to classify the regions of interest (ROIs) in lung CT images into the CISL categories. In order to evaluate the proposed feature selection method and CISL recognition approach, we conducted fivefold cross-validation experiments on a set of 511 ROIs captured from real lung CT images. For all the considered classifiers, our FIG method brought better recognition performance than both the full set of original features and any single type of features. We further compared our FIG method with the feature selection method based on classification accuracy rate and genetic optimization (ARG). The advantages of FIG over ARG in computational effectiveness and efficiency are shown through experiments.
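The Fisher criterion used above to score candidate feature subsets is the ratio of between-class scatter to within-class scatter over the selected features. A minimal sketch (the summed single-feature form below is a common variant and an assumption; the paper's exact multivariate criterion and the genetic search are omitted):

```python
import numpy as np

def fisher_score(X, y, subset):
    """Fisher criterion for a candidate feature subset: between-class
    scatter divided by within-class scatter, summed over the selected
    features. A higher score means a more discriminative subset.
    """
    Xs = X[:, subset]
    classes = np.unique(y)
    mu = Xs.mean(axis=0)
    # Between-class scatter: class means pulled apart from the global mean.
    num = sum((y == c).sum() * (Xs[y == c].mean(0) - mu) ** 2 for c in classes)
    # Within-class scatter: spread of samples around their own class mean.
    den = sum(((Xs[y == c] - Xs[y == c].mean(0)) ** 2).sum(0) for c in classes)
    return float((num / (den + 1e-12)).sum())

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.standard_normal((100, 4))
X[y == 1, 0] += 3.0                   # feature 0 separates the classes
good = fisher_score(X, y, [0])        # discriminative subset scores high
bad = fisher_score(X, y, [3])         # pure-noise subset scores near zero
```

A genetic algorithm would then evolve bitmask chromosomes over the candidate features, using this score as the fitness function.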

Proceedings ArticleDOI
23 Aug 2015
TL;DR: Two gradient features for writer's gender, handedness, and age range prediction are introduced: the Histogram of Oriented Gradients and the so-called gradient local binary patterns, an improved gradient feature that incorporates the local binary pattern neighborhood in the gradient calculation.
Abstract: This work introduces two gradient features for writer's gender, handedness, and age range prediction. The first feature is the Histogram of Oriented Gradients, which highlights the distribution of gradient orientations within images. The second feature is the so-called gradient local binary patterns, an improved gradient feature that incorporates the local binary pattern neighborhood in the gradient calculation. The prediction task is achieved using an SVM classifier. Experiments are performed on two corpora of English and Arabic handwritten text. The results obtained in terms of classification accuracy highlight the effectiveness of the proposed features, which outperform the state of the art.

Journal ArticleDOI
TL;DR: Experiments on colour iris datasets, captured by mobile devices and static camera, show that the proposed method achieves an improved performance compared to the individual iris segmentation models and existing algorithms.

Proceedings ArticleDOI
24 Jun 2015
TL;DR: This paper examines the effect of the number of HOG bins on vehicle detection and the symmetric characteristics of the HOG feature of vehicles, and demonstrates a speed-up of the SVM classifier for vehicle detection by about three times while maintaining detection performance.
Abstract: Support Vector Machine (SVM) classifiers with Histogram of Oriented Gradients (HOG) features have become one of the most popular techniques used for vehicle detection in recent years. The computing time of the SVM is a main obstacle to the real-time implementation that is important for Advanced Driver Assistance Systems (ADAS) applications. One of the effective ways to reduce the computational complexity of the SVM is to reduce the dimension of the HOG feature. In this paper, we examine the effect of the number of HOG bins on vehicle detection and the symmetric characteristics of the HOG feature of vehicles. We demonstrate a speed-up of the SVM classifier for vehicle detection by about three times while maintaining detection performance.
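One concrete way dimension reduction via orientation symmetry works (a hedged sketch of the general idea, not necessarily the paper's exact scheme): fold an 18-bin signed-orientation histogram (0–360°) into a 9-bin unsigned one (0–180°), since opposite gradient directions carry the same edge information. Halving the descriptor roughly halves the cost of the linear-SVM dot product.

```python
import numpy as np

def fold_signed_bins(hog_signed):
    """Fold an 18-bin signed-orientation histogram (0-360 deg) into a
    9-bin unsigned one (0-180 deg): opposite directions share a bin,
    so bin i and bin i+9 are summed. Total gradient mass is preserved.
    """
    h = np.asarray(hog_signed, dtype=float)
    return h[:9] + h[9:]

signed = np.arange(18, dtype=float)   # toy signed histogram
unsigned = fold_signed_bins(signed)   # half the dimension, same mass
```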

Journal ArticleDOI
Guo Mu1, Zhang Xinyu1, Li Deyi1, Zhang Tianlei1, An Lifeng1 
TL;DR: A camera-based algorithm for real-time, robust traffic light detection and recognition; with voting schemes, the proposed method provides sufficient accuracy for autonomous vehicles in urban environments.

Journal ArticleDOI
01 Mar 2015
TL;DR: An integrated system is proposed that segments and classifies four classes of moving objects, including pedestrians, cars, motorcycles, and bicycles, from their side-views in a video sequence; comparisons with different well-known classification approaches verify the superiority of the proposed classification method.
Abstract: Highlights: an integrated system that segments and classifies four moving objects; a weight mask proposed to enhance the distinguishing pixels in a segmented object; a new classification feature vector extracted from a wavelet-transformed space; a hierarchical linear support vector machine classification configuration. This paper proposes an integrated system for the segmentation and classification of four moving objects, including pedestrians, cars, motorcycles, and bicycles, from their side-views in a video sequence. Based on the use of an adaptive background in the red-green-blue (RGB) color model, each moving object is segmented with its minimum enclosing rectangle (MER) window by using a histogram-based projection approach or a tracking-based approach. Additionally, a shadow removal technique is applied to the segmented objects to improve the classification performance. For MER windows of different sizes, a window scaling operation followed by an adaptive block-shifting operation is applied to obtain a fixed feature dimension. A weight mask, constructed according to the frequency of occurrence of an object in each position within a square window, is proposed to enhance the distinguishing pixels in the rescaled MER window. To extract classification features, a two-level Haar wavelet transform is applied to the rescaled MER window. The local shape features and the modified histogram of oriented gradients (HOG) are extracted from the level-two and level-one sub-bands, respectively, of the wavelet-transformed space. A hierarchical linear support vector machine classification configuration is proposed to classify the four classes of objects. Six video sequences are used to test the classification performance of the proposed method. The computer processing times of the object segmentation, object tracking, and feature extraction and classification approaches are 79 ms, 211 ms, and 0.01 ms, respectively. Comparisons with different well-known classification approaches verify the superiority of the proposed classification method.
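The Haar wavelet decomposition that produces the sub-bands used above can be illustrated with a single-level NumPy sketch (plain averaging is used instead of orthonormal scaling for readability; the function name and even-dimension assumption are illustrative):

```python
import numpy as np

def haar2d_level1(image):
    """One level of the 2-D Haar wavelet transform.

    Splits a window into approximation (LL) and detail (LH, HL, HH)
    sub-bands; even image dimensions are assumed. Applying it again to
    LL gives the second level used in the paper.
    """
    x = np.asarray(image, dtype=float)
    # Row-wise averages and differences of adjacent pixel pairs.
    lo_r = (x[:, ::2] + x[:, 1::2]) / 2.0
    hi_r = (x[:, ::2] - x[:, 1::2]) / 2.0
    # Column-wise pass on each intermediate band.
    ll = (lo_r[::2, :] + lo_r[1::2, :]) / 2.0
    lh = (lo_r[::2, :] - lo_r[1::2, :]) / 2.0
    hl = (hi_r[::2, :] + hi_r[1::2, :]) / 2.0
    hh = (hi_r[::2, :] - hi_r[1::2, :]) / 2.0
    return ll, lh, hl, hh
```

On a constant window all detail bands are zero, which is why shape and gradient features are read from the detail sub-bands rather than LL.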

Journal ArticleDOI
TL;DR: This work proposes a new optimized rank-1 tensor approximation of the Joint-SSV to obtain compact low-dimensional descriptors that very accurately characterize an action in a video sequence and shows that these descriptor vectors make it possible to recognize actions without explicitly aligning the videos in time in order to compensate for speed of execution or differences in video frame rates.
Abstract: We propose that the dynamics of an action in video data forms a sparse self-similar manifold in the space-time volume, which can be fully characterized by a linear rank decomposition. Inspired by the recurrence plot theory, we introduce the concept of Joint Self-Similarity Volume (Joint-SSV) to model this sparse action manifold, and hence propose a new optimized rank-1 tensor approximation of the Joint-SSV to obtain compact low-dimensional descriptors that very accurately characterize an action in a video sequence. We show that these descriptor vectors make it possible to recognize actions without explicitly aligning the videos in time in order to compensate for speed of execution or differences in video frame rates. Moreover, we show that the proposed method is generic, in the sense that it can be applied using different low-level features, such as silhouettes, tracked points, histogram of oriented gradients, and so forth. Therefore, our method does not necessarily require explicit tracking of features in the space-time volume. Our experimental results on five public data sets demonstrate that our method produces promising results and outperforms many baseline methods.
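The rank-1 idea can be illustrated on a simplified 2-D analogue of the Joint-SSV: build a frame-to-frame self-similarity matrix from per-frame features and keep its leading singular vector as a compact descriptor (a toy sketch using a plain matrix SVD, not the paper's optimized tensor approximation):

```python
import numpy as np

def rank1_self_similarity(frame_features):
    """Rank-1 descriptor from a frame-to-frame self-similarity matrix.

    Toy 2-D stand-in for the Joint-SSV: pairwise Euclidean distances
    between per-frame feature vectors form the self-similarity matrix,
    whose leading singular vector is the best rank-1 factor.
    """
    X = np.asarray(frame_features, dtype=float)
    # Pairwise Euclidean distances between frames.
    diff = X[:, None, :] - X[None, :, :]
    ssm = np.sqrt((diff ** 2).sum(axis=-1))
    # Leading singular vector = best rank-1 approximation factor.
    u, s, vt = np.linalg.svd(ssm)
    d = u[:, 0]
    if d.sum() < 0:          # resolve the SVD sign ambiguity
        d = -d
    return d / np.linalg.norm(d)
```

Because the descriptor is built from self-similarities rather than raw features, any per-frame feature (silhouettes, tracked points, HOG) can be dropped in without changing the rest of the pipeline.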

Journal ArticleDOI
TL;DR: An efficient automated method for facial expression recognition based on the histogram of oriented gradients (HOG) descriptor, achieving a recognition rate higher than those of almost all other single-image- or video-based methods for facial emotion recognition.
Abstract: This article proposes an efficient automated method for facial expression recognition based on the histogram of oriented gradients (HOG) descriptor. This subject-independent method was designed to recognize six prototypical emotions. It recognizes emotions by calculating differences at the level of feature descriptors between a neutral expression and a peak expression of an observed person. The parameters of the HOG descriptor were determined using a genetic algorithm. Support vector machines (SVM) were applied during the recognition phase, whereby one SVM classifier was trained per emotion. Each classifier was trained on difference vectors obtained by subtracting the HOG feature vector of a subject's neutral image from that of the apex-emotion image. The proposed method was tested using a leave-one-subject-out validation strategy on 1232 images of 106 subjects from the Cohn-Kanade database and on 192 images of 10 subjects from the JAFFE database. A mean recognition rate of 95.64 % was obtained on the Cohn-Kanade database, which is higher than the recognition rates of almost all other single-image- or video-based methods for facial emotion recognition.
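The difference-vector scheme with one linear classifier per emotion can be sketched as follows (the weight vectors below stand in for trained SVMs; all names and values are illustrative):

```python
import numpy as np

def expression_difference(neutral_hog, apex_hog):
    """Difference vector between apex- and neutral-expression HOG features."""
    return np.asarray(apex_hog, dtype=float) - np.asarray(neutral_hog, dtype=float)

def one_vs_rest_predict(diff, classifiers):
    """Pick the emotion whose linear classifier scores highest.

    `classifiers` maps emotion name -> (weights, bias); each pair plays
    the role of one per-emotion linear SVM.
    """
    scores = {name: float(np.dot(w, diff) + b)
              for name, (w, b) in classifiers.items()}
    return max(scores, key=scores.get)
```

With toy 2-D classifiers, e.g. `{"happy": ([1, 0], 0), "sad": ([0, 1], 0)}`, a difference vector dominated by the first component is labeled "happy".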

Proceedings ArticleDOI
02 Mar 2015
TL;DR: In this paper, the Pyramid of Histogram of Gradients (PHOG) and Local Binary Patterns (LBP) features were extracted from the active facial patches to form a hybrid feature vector.
Abstract: Facial expression recognition has many potential applications, which has attracted the attention of researchers over the last decade. Feature extraction is one important step in expression analysis that contributes toward fast and accurate expression recognition. This paper presents an approach for combining shape and appearance features to form a hybrid feature vector. We extract the Pyramid of Histogram of Gradients (PHOG) as shape descriptors and Local Binary Patterns (LBP) as appearance features. The proposed framework involves a novel approach of extracting hybrid features from active facial patches, which are located on the face regions that undergo a major change during different expressions. After detection of facial landmarks, the active patches are localized and hybrid features are calculated from these patches. Using small parts of the face instead of the whole face for feature extraction reduces the computational cost and prevents over-fitting of the features during classification. The dimensionality of the features is reduced using linear discriminant analysis, and the reduced features are classified with a support vector machine (SVM). Experimental results on two publicly available databases show promising accuracy in recognizing all expression classes.
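The LBP appearance feature used here can be sketched with a basic 3x3 neighborhood encoding in NumPy (the uniform-pattern and multi-radius refinements common in practice are omitted):

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 local binary pattern codes for the interior pixels.

    Each interior pixel is encoded by comparing its 8 neighbors against
    the center value and packing the comparison bits into an 8-bit code.
    """
    g = np.asarray(gray, dtype=float)
    center = g[1:-1, 1:-1]
    # Neighbor offsets in clockwise order starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= ((neighbor >= center).astype(np.uint8) << bit)
    return code
```

The histogram of these codes over a facial patch is what gets concatenated with the PHOG shape descriptor to form the hybrid feature vector.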

Journal ArticleDOI
13 Apr 2015-Sensors
TL;DR: The experimental evaluation shows that the proposed pedestrian detector with an on-board FIR camera outperforms, in the FIR domain, the state-of-the-art Haar-like Adaboost-cascade, histogram of oriented gradients (HOG)/linear SVM (linSVM) and MultiFtr pedestrian detectors trained on FIR images.
Abstract: One of the main challenges in intelligent vehicles concerns pedestrian detection for driving assistance. Recent experiments have shown that, for pedestrian classification, state-of-the-art descriptors perform better in the far-infrared (FIR) spectrum than in the visible one, even in daytime conditions. In this paper, we propose a pedestrian detector with an on-board FIR camera. Our main contribution is the exploitation of the specific characteristics of FIR images to design a fast, scale-invariant and robust pedestrian detector. Our system consists of three modules, each based on speeded-up robust feature (SURF) matching. The first module generates regions of interest (ROI): although pedestrian shapes in FIR images may vary over large scales, heads usually appear as bright regions, so ROI are detected with a high recall rate using a hierarchical codebook of SURF features located in head regions. The second module performs pedestrian full-body classification using an SVM, which enhances precision at low computational cost. In the third module, we combine the mean shift algorithm with inter-frame scale-invariant SURF feature tracking to enhance the robustness of our system. The experimental evaluation shows that our system outperforms, in the FIR domain, the state-of-the-art Haar-like Adaboost-cascade, histogram of oriented gradients (HOG)/linear SVM (linSVM) and MultiFtr pedestrian detectors trained on FIR images.
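The mean shift step at the heart of the tracking module can be illustrated on 1-D samples (a toy Gaussian-kernel sketch; the parameter names are illustrative and this is not the paper's scale-invariant tracker):

```python
import numpy as np

def mean_shift_1d(points, start, bandwidth=1.0, iters=50):
    """Mean shift mode seeking on 1-D samples.

    Repeatedly moves the estimate to the kernel-weighted mean of the
    samples until it settles on a density mode, which is how mean shift
    locks a tracker onto the nearest target location between frames.
    """
    pts = np.asarray(points, dtype=float)
    x = float(start)
    for _ in range(iters):
        w = np.exp(-((pts - x) ** 2) / (2.0 * bandwidth ** 2))
        x_new = float((w * pts).sum() / w.sum())
        if abs(x_new - x) < 1e-6:
            break
        x = x_new
    return x
```

Started near one cluster of samples, the estimate converges to that cluster's mode rather than the global mean, which is the property the tracker relies on.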