
Showing papers on "Histogram of oriented gradients" published in 2013


Proceedings ArticleDOI
23 Jun 2013
TL;DR: A new descriptor for activity recognition from videos acquired by a depth sensor is presented that better captures the joint shape-motion cues in the depth sequence, and thus outperforms the state-of-the-art on all relevant benchmarks.
Abstract: We present a new descriptor for activity recognition from videos acquired by a depth sensor. Previous descriptors mostly compute shape and motion features independently, thus, they often fail to capture the complex joint shape-motion cues at pixel-level. In contrast, we describe the depth sequence using a histogram capturing the distribution of the surface normal orientation in the 4D space of time, depth, and spatial coordinates. To build the histogram, we create 4D projectors, which quantize the 4D space and represent the possible directions for the 4D normal. We initialize the projectors using the vertices of a regular polychoron. Consequently, we refine the projectors using a discriminative density measure, such that additional projectors are induced in the directions where the 4D normals are more dense and discriminative. Through extensive experiments, we demonstrate that our descriptor better captures the joint shape-motion cues in the depth sequence, and thus outperforms the state-of-the-art on all relevant benchmarks.
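
A very rough sketch of the core idea, assuming a NumPy depth volume and random unit vectors standing in for the polychoron-initialized, density-refined projectors of the paper:

```python
import numpy as np

def normal_orientation_histogram(depth, projectors):
    """Histogram 4D surface-normal orientations of a depth sequence.

    depth      : float array of shape (T, H, W), a depth video d(t, y, x).
    projectors : float array of shape (P, 4), unit vectors quantizing 4D
                 orientation space (stand-ins for the refined projectors).
    """
    # Gradients of the depth surface along time and the two spatial axes.
    dt, dy, dx = np.gradient(depth)
    # 4D normals of the surface z = d(t, y, x): (-dx, -dy, -dt, 1), normalized.
    normals = np.stack([-dx, -dy, -dt, np.ones_like(depth)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    # Soft-assign each normal to the projector directions and accumulate.
    scores = np.maximum(normals.reshape(-1, 4) @ projectors.T, 0.0)
    hist = scores.sum(axis=0)
    return hist / (hist.sum() + 1e-12)

# Toy usage: a random depth clip and 120 random unit projectors.
rng = np.random.default_rng(0)
depth = rng.random((8, 64, 64))
projectors = rng.normal(size=(120, 4))
projectors /= np.linalg.norm(projectors, axis=1, keepdims=True)
print(normal_orientation_histogram(depth, projectors).shape)  # (120,)
```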

978 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: A set of features derived from skeleton tracking of the human body and depth maps for the purpose of action recognition are proposed, and a new descriptor for spatio-temporal feature extraction from color and depth images is introduced.
Abstract: We propose a set of features derived from skeleton tracking of the human body and depth maps for the purpose of action recognition. The descriptors proposed are easy to implement, produce relatively small-sized feature sets, and the multi-class classification scheme is fast and suitable for real-time applications. We intuitively characterize actions using pairwise affinities between view-invariant joint angle features over the performance of an action. Additionally, a new descriptor for spatio-temporal feature extraction from color and depth images is introduced. This descriptor involves an application of a modified histogram of oriented gradients (HOG) algorithm. The application produces a feature set at every frame; these features are collected into a 2D array, to which the same algorithm is then applied again (the approach is termed HOG2). Both feature sets are evaluated in a bag-of-words scheme using a linear SVM, showing state-of-the-art results on public datasets from different domains of human-computer interaction.
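
A rough sketch of the HOG2 idea as described (per-frame HOG descriptors stacked into a 2D array, with the same HOG operator applied again to that array), using scikit-image; the cell size and the 64×64 resizing step are assumptions, not the paper's settings:

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def hog2(frames, cell=(8, 8), block=(2, 2), bins=9):
    """frames: array of grayscale frames (num_frames, H, W) from one action clip."""
    # First pass: one HOG vector per frame.
    per_frame = np.stack([
        hog(f, orientations=bins, pixels_per_cell=cell,
            cells_per_block=block, feature_vector=True)
        for f in frames
    ])                                # shape (num_frames, hog_dim)
    # Treat the stacked descriptors as a 2D "image" of fixed size and
    # run the same HOG operator over it (the "HOG of HOG" step).
    per_frame = resize(per_frame, (64, 64), anti_aliasing=True)
    return hog(per_frame, orientations=bins, pixels_per_cell=cell,
               cells_per_block=block, feature_vector=True)

# Toy usage on a random 20-frame clip of 64x64 frames.
clip = np.random.rand(20, 64, 64)
print(hog2(clip).shape)
```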

338 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper revisits some of the core assumptions in HOG+SVM and shows that by properly designing the feature pooling, feature selection, preprocessing, and training methods, it is possible to reach top quality, at least for pedestrian detection, using a single rigid component.
Abstract: The current state of the art solutions for object detection describe each class by a set of models trained on discovered sub-classes (so called "components"), with each model itself composed of collections of interrelated parts (deformable models). These detectors build upon the now classic Histogram of Oriented Gradients+linear SVM combo. In this paper we revisit some of the core assumptions in HOG+SVM and show that by properly designing the feature pooling, feature selection, preprocessing, and training methods, it is possible to reach top quality, at least for pedestrian detection, using a single rigid component. We provide experiments for a large design space that give insights into the design of classifiers, as well as relevant information for practitioners. Our best detector is fully feed-forward, has a single unified architecture, uses only histograms of oriented gradients and colour information in monocular static images, and improves over 23 other methods on the INRIA, ETH and Caltech-USA datasets, reducing the average miss-rate over HOG+SVM by more than 30%.
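
The HOG + linear SVM combination the paper revisits can be sketched in a few lines with scikit-image and scikit-learn; the 64×128 window and the parameters below follow the common pedestrian-detection setup, not the paper's tuned design:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(windows):
    """windows: array of grayscale crops, each 128x64 (rows x cols)."""
    return np.stack([
        hog(w, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2), block_norm='L2-Hys')
        for w in windows
    ])

# Toy data: random positive (pedestrian) and negative windows.
rng = np.random.default_rng(0)
pos, neg = rng.random((50, 128, 64)), rng.random((50, 128, 64))
X = np.vstack([hog_features(pos), hog_features(neg)])
y = np.r_[np.ones(50), np.zeros(50)]

clf = LinearSVC(C=0.01).fit(X, y)            # a single rigid template
scores = clf.decision_function(hog_features(neg[:5]))
print(scores)                                # detection scores per window
```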

288 citations


Journal ArticleDOI
TL;DR: A computer vision based algorithm for recognizing single actions of earthmoving construction equipment, based on a multiple binary SVM classifier and spatio-temporal features, which outperforms previous algorithms for excavator and truck action recognition.

215 citations


Journal ArticleDOI
TL;DR: Preliminary results on detection of standing workers, excavators and dump trucks with an average accuracy of 98.83%, 82.10%, and 84.88% respectively indicate the applicability of the proposed method for automated activity analysis of workers and equipment from single video cameras.

198 citations


Journal ArticleDOI
TL;DR: A new patch-adaptive sparse approximation (PASA) method is designed with the following main components: minimum discrepancy criteria for sparse-based classification, patch-specific adaptation for discriminative approximation, and feature-space weighting for distance computation.
Abstract: In this paper, we propose a new classification method for five categories of lung tissues in high-resolution computed tomography (HRCT) images, with feature-based image patch approximation. We design two new feature descriptors for higher feature descriptiveness, namely the rotation-invariant Gabor-local binary patterns (RGLBP) texture descriptor and the multi-coordinate histogram of oriented gradients (MCHOG) gradient descriptor. Together with intensity features, each image patch is then labeled based on its feature approximation from reference image patches. A new patch-adaptive sparse approximation (PASA) method is then designed with the following main components: minimum discrepancy criteria for sparse-based classification, patch-specific adaptation for discriminative approximation, and feature-space weighting for distance computation. The patch-wise labelings are then accumulated as probabilistic estimations for region-level classification. The proposed method is evaluated on a publicly available ILD database, showing encouraging performance improvements over the state of the art.

161 citations


Journal ArticleDOI
TL;DR: An object detection framework for remote sensing imagery is developed using a discriminatively trained mixture model; it is mainly composed of two stages, model training and object detection, where multi-scale histogram of oriented gradients (HOG) feature pyramids of all training samples are constructed.
Abstract: Automatically detecting objects with complex appearance and arbitrary orientations in remote sensing imagery (RSI) is a big challenge. To explore a possible solution to the problem, this paper develops an object detection framework using a discriminatively trained mixture model. It is mainly composed of two stages: model training and object detection. In the model training stage, multi-scale histogram of oriented gradients (HOG) feature pyramids of all training samples are constructed. A mixture of multi-scale deformable part-based models is then trained for each object category by training a latent Support Vector Machine (SVM), where each part-based model is composed of a coarse root filter, a set of higher resolution part filters, and a set of deformation models. In the object detection stage, given a test image, its multi-scale HOG feature pyramid is first constructed. Then, object detection is performed by computing and thresholding the response of the mixture model. The quantitative comparisons with state-of-the-art approaches on two datasets demonstrate the effectiveness of the developed framework.
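
A simple sketch of the multi-scale HOG feature pyramid used in both stages, with scikit-image; the scale step and HOG parameters are assumptions:

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import rescale

def hog_pyramid(image, num_levels=5, scale_step=2 ** (-1 / 2)):
    """Return a list of per-level HOG block grids for a grayscale image."""
    pyramid = []
    for level in range(num_levels):
        scaled = rescale(image, scale_step ** level, anti_aliasing=True)
        if min(scaled.shape) < 16:          # stop when cells no longer fit
            break
        # feature_vector=False keeps the block-grid layout, which is what a
        # sliding root/part filter would be correlated against.
        cells = hog(scaled, orientations=9, pixels_per_cell=(8, 8),
                    cells_per_block=(2, 2), feature_vector=False)
        pyramid.append(cells)
    return pyramid

levels = hog_pyramid(np.random.rand(256, 256))
print([lvl.shape for lvl in levels])
```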

151 citations


Journal ArticleDOI
TL;DR: A highly reliable classification scheme was proposed by cascade classifier ensembles with reject option to accommodate the situations where no decision should be made if there exists adequate ambiguity, exhibiting promising potential for real-world applications.
Abstract: Vehicle-type recognition based on images is a challenging task. This paper comparatively studied two feature extraction methods for image description, i.e., the Gabor wavelet transform and the Pyramid Histogram of Oriented Gradients (PHOG). The Gabor transform has been widely adopted to extract image features for various vision tasks. PHOG has the superiority in its description of more discriminating information. A highly reliable classification scheme was proposed by cascade classifier ensembles with reject option to accommodate the situations where no decision should be made if there exists adequate ambiguity. The first ensemble is heterogeneous, consisting of several classifiers, including k-nearest neighbors (kNNs), multiple-layer perceptrons (MLPs), support vector machines (SVMs), and random forest. The classification reliability is further enhanced by a second classifier ensemble, which is composed of a set of base MLPs coordinated by an ensemble metalearning method called rotation forest (RF). For both of the ensembles, the rejection option is accomplished by relating the consensus degree from majority voting to a confidence measure and by abstaining from classifying ambiguous samples if the consensus degree is lower than a threshold. The final class label is assigned by dual majority voting from the two ensembles. Experimental results using more than 600 images covering 21 makes of cars and vans demonstrated the effectiveness of the proposed approach. The cascade ensembles produce consistently reliable results. With a moderate ensemble size of 25 in the second ensemble, the two-stage classification scheme offers 98.65% accuracy with a rejection rate of 2.5%, exhibiting promising potential for real-world applications.
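
The reject option described above (abstain when the majority-vote consensus degree falls below a threshold) reduces to a few lines; the votes here are made up and the ensembles themselves are not reproduced:

```python
import numpy as np

def vote_with_reject(votes, threshold=0.7):
    """votes: 1-D array of class labels predicted by the ensemble members.
    Returns the majority label, or None (reject) if the consensus degree
    (the fraction of members agreeing with the majority) is below threshold."""
    labels, counts = np.unique(votes, return_counts=True)
    consensus = counts.max() / votes.size
    return labels[counts.argmax()] if consensus >= threshold else None

print(vote_with_reject(np.array([2, 2, 2, 2, 1])))   # -> 2 (consensus 0.8)
print(vote_with_reject(np.array([2, 1, 2, 1, 3])))   # -> None (ambiguous sample)
```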

120 citations


Journal ArticleDOI
TL;DR: A HOG-based texture descriptor that uses a partition of the image into overlapping horizontal cells with gradual boundaries, to characterize single-line texts in outdoor scenes and is shown to outperform state-of-the-art text detection systems in two major publicly available databases.

95 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: The Histogram of Oriented Gradients descriptor is used in combination with a Support Vector Machine for classification as the basic method; image data is processed at twice the pixel frequency and blocks are normalized with the L1-Sqrt norm, resulting in efficient resource utilization.
Abstract: This paper focuses on real-time pedestrian detection on Field Programmable Gate Arrays (FPGAs) using the Histograms of Oriented Gradients (HOG) descriptor in combination with a Support Vector Machine (SVM) for classification as a basic method. We propose to process image data at twice the pixel frequency and to normalize blocks with the L1-Sqrt-norm, resulting in an efficient resource utilization. This implementation allows for parallel computation of different scales. Combined with a time-multiplex approach we increase multiscale capabilities beyond resource limitations. We are able to process 64 high resolution images (1920 × 1080 pixels) per second at 18 scales with a latency of less than 150 µs. 1.79 million HOG descriptors and their SVM classifications can be calculated per second and per scale, which outperforms current FPGA implementations by a factor of 4.
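
The L1-Sqrt block normalization mentioned above is attractive in hardware because the L1 norm is just a summation; a NumPy sketch of the normalization step only (not of the FPGA pipeline):

```python
import numpy as np

def l1_sqrt_normalize(block_hist, eps=1e-6):
    """L1-Sqrt normalization of one HOG block: v <- sqrt(v / (||v||_1 + eps)).

    block_hist: concatenated orientation histograms of the cells in one block.
    """
    return np.sqrt(block_hist / (np.abs(block_hist).sum() + eps))

block = np.array([3.0, 1.0, 0.0, 4.0, 2.0])
print(l1_sqrt_normalize(block))
```

scikit-image's hog exposes the same scheme via block_norm='L1-sqrt'.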

92 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: The proposed image features, imposing no specific assumption on the targets, are general enough to be applicable to any kind of image classification task, and exhibit superior performance compared to other existing methods.
Abstract: Image classification methods have been significantly developed in the last decade. Most methods stem from the bag-of-features (BoF) approach and it has recently been extended to a vector aggregation model, such as using Fisher kernels. In this paper, we propose a novel feature extraction method for image classification. Following the BoF approach, a large number of local descriptors are first extracted from an image and the proposed method is built upon the probability density function (p.d.f.) formed by those descriptors. Since the p.d.f. essentially represents the image, we extract the features from the p.d.f. by means of the gradients on the p.d.f. The gradients, especially their orientations, effectively characterize the shape of the p.d.f. from the geometrical viewpoint. We construct the features by the histogram of the oriented p.d.f. gradients via orientation coding followed by aggregation of the orientation codes. The proposed image features, imposing no specific assumption on the targets, are general enough to be applicable to any kind of image classification task. In the experiments on object recognition and scene classification using various datasets, the proposed method exhibits superior performance compared to other existing methods.
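
A loose sketch of the idea of histogramming the orientations of the gradients of the descriptor p.d.f.; here the descriptors are assumed to be projected to 2D, a Gaussian KDE stands in for the p.d.f., and a plain weighted orientation histogram stands in for the paper's orientation coding and aggregation:

```python
import numpy as np
from scipy.stats import gaussian_kde

def pdf_gradient_orientation_hist(descriptors_2d, grid_size=64, bins=16):
    """descriptors_2d: (N, 2) array of local descriptors projected to 2D."""
    kde = gaussian_kde(descriptors_2d.T)
    xs = np.linspace(descriptors_2d[:, 0].min(), descriptors_2d[:, 0].max(), grid_size)
    ys = np.linspace(descriptors_2d[:, 1].min(), descriptors_2d[:, 1].max(), grid_size)
    xx, yy = np.meshgrid(xs, ys)
    density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(grid_size, grid_size)
    gy, gx = np.gradient(density)                       # gradients of the p.d.f.
    angles = np.arctan2(gy, gx).ravel()                 # their orientations
    weights = np.hypot(gx, gy).ravel()                  # weighted by magnitude
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi), weights=weights)
    return hist / (hist.sum() + 1e-12)

rng = np.random.default_rng(0)
print(pdf_gradient_orientation_hist(rng.normal(size=(500, 2))).shape)  # (16,)
```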

Proceedings ArticleDOI
25 Aug 2013
TL;DR: An offline signature verification system is presented that uses three different pseudo-dynamic features, two different classifier training approaches, and two datasets to address one of the most difficult problems of off-line signature verification.
Abstract: We present an offline signature verification system using three different pseudo-dynamic features, two different classifier training approaches, and two datasets. One of the most difficult problems of off-line signature verification is that the signature is just a static image that has lost a lot of useful dynamic information. Three separate pseudo-dynamic features based on gray level are used: local binary pattern (LBP), gray level co-occurrence matrix (GLCM), and histogram of oriented gradients (HOG). The classification is performed using writer-dependent Support Vector Machine (SVM) classifiers and a global Real AdaBoost method, representing two different approaches to training the classifier. In the first mode, each SVM is trained with the feature vectors obtained from the reference signatures of the corresponding user and random forgeries for that signer, while the global AdaBoost classifier is trained using genuine and random forgery signatures of signers that are excluded from the test set. The fusion of all features achieves the best results of 7.66% and 9.94% equal error rate on GPDS and 7.55% and 11.55% equal error rate on CSD, respectively.
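
A sketch of extracting the three gray-level feature families named above (LBP, GLCM, HOG) from a signature image with scikit-image; the parameters are arbitrary and the writer-dependent SVM / Real AdaBoost training protocol is not reproduced:

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern, graycomatrix, graycoprops
# (older scikit-image releases spell these greycomatrix / greycoprops)

def signature_features(gray):
    """gray: 2-D uint8 signature image. Returns one concatenated feature vector."""
    # LBP histogram (uniform patterns, 8 neighbours, radius 1).
    lbp = local_binary_pattern(gray, P=8, R=1, method='uniform')
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    # GLCM statistics at one distance and four orientations.
    glcm = graycomatrix(gray, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    glcm_stats = np.concatenate([graycoprops(glcm, p).ravel()
                                 for p in ('contrast', 'homogeneity', 'energy', 'correlation')])
    # HOG over the (in practice, resized) signature image.
    hog_vec = hog(gray, orientations=9, pixels_per_cell=(16, 16),
                  cells_per_block=(2, 2), feature_vector=True)
    return np.concatenate([lbp_hist, glcm_stats, hog_vec])

print(signature_features((np.random.rand(128, 256) * 255).astype(np.uint8)).shape)
```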

Journal ArticleDOI
TL;DR: This paper combines both local motion and appearance feature in a novel framework to model the temporal dynamics of face and body gesture and proposes a bag of words (BOW) based representation for both MHI-HOG and Image- HOG features.

Journal ArticleDOI
TL;DR: A method for localizing and labeling the lumbar vertebrae and intervertebral discs in mid-sagittal MR image slices is presented, based on a Markov-chain-like graphical model that can scale-invariantly localize discs and vertebrae at the same time, even in the presence of missing structures.
Abstract: This paper presents a method for localizing and labeling the lumbar vertebrae and intervertebral discs in mid-sagittal MR image slices. The approach is based on a Markov-chain-like graphical model of the ordered discs and vertebrae in the lumbar spine. The graphical model is formulated by combining local image features and semiglobal geometrical information. The local image features are extracted from the image by employing the pyramidal histogram of oriented gradients (PHOG) and a novel descriptor that we call the image projection descriptor (IPD). These features are trained with support vector machines (SVM) and each pixel in the target image is locally assigned a score. These local scores are combined with the semiglobal geometrical information, like the distance ratio and angle between the neighboring structures, under the Markov random field (MRF) framework. An exact localization of discs and vertebrae is inferred from the MRF by finding a maximum a posteriori solution efficiently using dynamic programming. As a result of the novel features introduced, our system can scale-invariantly localize discs and vertebrae at the same time, even in the presence of missing structures. The proposed system is tested and validated on a clinical lumbar spine MR image dataset containing 80 subjects, of which 64 have disc- and vertebra-related diseases and abnormalities. The experiments show that our system is successful even in abnormal cases and our results are comparable to the state of the art.
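
The MAP inference over the chain-like model can be sketched as standard dynamic programming (Viterbi) over candidate positions; the unary and pairwise scores below are random placeholders for the SVM local scores and the geometric (distance-ratio/angle) terms:

```python
import numpy as np

def chain_map(unary, pairwise):
    """unary:    (num_structures, num_candidates) local log-scores.
    pairwise: (num_structures - 1, num_candidates, num_candidates)
              geometric compatibility log-scores between neighbouring structures.
    Returns the MAP assignment of one candidate index per structure."""
    n, k = unary.shape
    best = unary[0].copy()
    back = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        scores = best[:, None] + pairwise[i - 1] + unary[i][None, :]
        back[i] = scores.argmax(axis=0)        # best predecessor per candidate
        best = scores.max(axis=0)
    path = [int(best.argmax())]
    for i in range(n - 1, 0, -1):              # backtrack
        path.append(int(back[i, path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
print(chain_map(rng.normal(size=(6, 10)), rng.normal(size=(5, 10, 10))))
```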

Proceedings ArticleDOI
25 Aug 2013
TL;DR: Experiments show that the Co-HOG based technique clearly outperforms state-of-the-art techniques that use HOG, Scale Invariant Feature Transform (SIFT), and Maximally Stable Extremal Regions (MSER).
Abstract: Scene text recognition is a fundamental step in End-to-End applications where traditional optical character recognition (OCR) systems often fail to produce satisfactory results. This paper proposes a technique that uses co-occurrence histogram of oriented gradients (Co-HOG) to recognize the text in scenes. Compared with histogram of oriented gradients (HOG), Co-HOG is a more powerful tool that captures spatial distribution of neighboring orientation pairs instead of just a single gradient orientation. At the same time, it is more efficient compared with HOG and therefore more suitable for real-time applications. The proposed scene text recognition technique is evaluated on ICDAR2003 character dataset and Street View Text (SVT) dataset. Experiments show that the Co-HOG based technique clearly outperforms state-of-the-art techniques that use HOG, Scale Invariant Feature Transform (SIFT), and Maximally Stable Extremal Regions (MSER).
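
A compact sketch of the Co-HOG idea (co-occurrence histograms of quantized gradient-orientation pairs at fixed pixel offsets); the bin count and offsets are arbitrary assumptions rather than the paper's configuration:

```python
import numpy as np

def co_hog(gray, bins=8, offsets=((0, 1), (1, 0), (1, 1), (1, -1))):
    """gray: 2-D float image. Returns concatenated co-occurrence histograms,
    one bins*bins histogram per pixel offset."""
    gy, gx = np.gradient(gray.astype(float))
    ori = np.arctan2(gy, gx)                                # in (-pi, pi]
    q = np.minimum(((ori + np.pi) / (2 * np.pi) * bins).astype(int), bins - 1)
    H, W = q.shape
    hists = []
    for dy, dx in offsets:
        y0, y1 = max(0, -dy), H - max(0, dy)
        x0, x1 = max(0, -dx), W - max(0, dx)
        a = q[y0:y1, x0:x1]                                 # anchor pixel bin
        b = q[y0 + dy:y1 + dy, x0 + dx:x1 + dx]             # neighbour bin at (dy, dx)
        pair = a * bins + b                                 # joint (bin, bin) index
        h = np.bincount(pair.ravel(), minlength=bins * bins)
        hists.append(h / (h.sum() + 1e-12))
    return np.concatenate(hists)

print(co_hog(np.random.rand(32, 32)).shape)                 # (4 * 64,) = (256,)
```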

Journal ArticleDOI
TL;DR: A novel railway track detection and turnout recognition method using HOG (Histogram of Oriented Gradients) features is presented, which was able to correctly extract tracks and recognize turnouts even in very bad illumination conditions and ran fast enough for practical use.
Abstract: Railway track detection and turnout recognition are basic tasks in driver assistance systems, which can determine the regions of interest for detecting obstacles and signals. In this paper, a novel railway track detection and turnout recognition method using HOG (Histogram of Oriented Gradients) features is presented. First, the approach computes HOG features and establishes integral images, and then extracts railway tracks with a region-growing algorithm. Then, based on recognizing the open direction of the turnout, we find the path the train will travel through. Experiments demonstrated that our method was able to correctly extract tracks and recognize turnouts even in very bad illumination conditions and ran fast enough for practical use. In addition, our approach only needs a computer and a cheap camera installed in the railroad vehicle, rather than specialized hardware and equipment.
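
The "HOG features + integral images" step typically amounts to one integral image per orientation channel, so the gradient histogram of any rectangle can be read in constant time; a sketch of that idea (the region-growing track extraction is not shown):

```python
import numpy as np

def orientation_integral_images(gray, bins=9):
    """Return integral images of per-bin gradient magnitude, shape (bins, H+1, W+1)."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), np.pi)                 # unsigned orientation
    q = np.minimum((ori / np.pi * bins).astype(int), bins - 1)
    channels = np.stack([np.where(q == b, mag, 0.0) for b in range(bins)])
    ii = np.zeros((bins, gray.shape[0] + 1, gray.shape[1] + 1))
    ii[:, 1:, 1:] = channels.cumsum(axis=1).cumsum(axis=2)
    return ii

def rect_histogram(ii, top, left, bottom, right):
    """Gradient-orientation histogram of rows [top, bottom) x cols [left, right)."""
    return (ii[:, bottom, right] - ii[:, top, right]
            - ii[:, bottom, left] + ii[:, top, left])

ii = orientation_integral_images(np.random.rand(120, 160))
print(rect_histogram(ii, 10, 20, 42, 52))                   # 9-bin histogram in O(1)
```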

Journal ArticleDOI
04 Sep 2013-Sensors
TL;DR: A novel adaptive projection technique, which is based on a probabilistic formulation of the classifier performance, is introduced in the laser module, which enhances the generalization of the system, while at the same time, increasing the outdoor performance in comparison with current methods.
Abstract: This paper presents a human detection system that can be employed on board a mobile platform for use in autonomous surveillance of large outdoor infrastructures. The prediction is based on the fusion of two detection modules, one for the laser and another for the vision data. In the laser module, a novel feature set that better encapsulates variations due to noise, distance and human pose is proposed. This enhances the generalization of the system, while at the same time, increasing the outdoor performance in comparison with current methods. The vision module uses the combination of the histogram of oriented gradients descriptor and the linear support vector machine classifier. Current approaches use a fixed-size projection to define regions of interest on the image data using the range information from the laser range finder. When applied to small size unmanned ground vehicles, these techniques suffer from misalignment, due to platform vibrations and terrain irregularities. This is effectively addressed in this work by using a novel adaptive projection technique, which is based on a probabilistic formulation of the classifier performance. Finally, a probability calibration step is introduced in order to optimally fuse the information from both modules. Experiments in real world environments demonstrate the robustness of the proposed method.


Proceedings ArticleDOI
26 Jul 2013
TL;DR: This paper uses 2D Scale-invariant feature transform (SIFT) features together with 3D histogram of oriented gradients (HOG) features which are extracted in a pair of RGB and depth images captured synchronously, named SIFT-HOG features, to improve the robustness and accuracy of head pose estimation.
Abstract: In this paper, an approach is presented to estimate the 3D position and orientation of the head from RGB and depth images captured by a commercial Kinect sensor. We use 2D Scale-Invariant Feature Transform (SIFT) features together with 3D histogram of oriented gradients (HOG) features, extracted from a pair of synchronously captured RGB and depth images and named SIFT-HOG features, to improve the robustness and accuracy of head pose estimation. We apply random forests to formulate pose estimation as a regression problem, due to their ability to handle large training data and their high mapping speed. The mean-shift method is then employed to refine the result obtained by the random forests. The experimental results demonstrate that our head pose estimation approach is efficient.

Proceedings ArticleDOI
23 Jul 2013
TL;DR: Improved Histograms of Oriented Gradients features are used to represent the edge information of images; combined with background subtraction, the detector achieves the required accuracy and satisfies the real-time demand.
Abstract: Feature extraction methods are widely used in the object detection procedure. In this paper, improved Histograms of Oriented Gradients features are used to represent the edge information of images. To track in real time, we combine background subtraction with Histograms of Oriented Gradients detection, which achieves the required accuracy and satisfies the real-time demand.
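
A minimal OpenCV sketch of the described combination (background subtraction to gate frames with motion, HOG-based detection for the objects), using OpenCV's stock people detector and a hypothetical input video as stand-ins for the paper's improved HOG features and data:

```python
import cv2

# Background subtractor and the default HOG + linear-SVM people detector.
bg = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16)
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture('input.mp4')          # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)                   # foreground mask from background subtraction
    if cv2.countNonZero(mask) < 500:         # skip frames with no motion (arbitrary threshold)
        continue
    rects, _ = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
    for (x, y, w, h) in rects:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cap.release()
```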

Proceedings ArticleDOI
01 Nov 2013
TL;DR: A real-time obstacle recognition framework designed to alert visually impaired/blind people to the presence of obstacles and to assist them in navigating safely, in indoor and outdoor environments, using a Smartphone device is introduced.
Abstract: In this paper we introduce a real-time obstacle recognition framework designed to alert visually impaired/blind people to the presence of obstacles and to assist them in navigating safely, in indoor and outdoor environments, using a Smartphone device. Static and dynamic objects are detected using interest points selected on an image grid and tracked using the multiscale Lucas-Kanade algorithm. Next, an object classification stage is applied. We incorporate the HOG (Histogram of Oriented Gradients) descriptor into the BoVW (Bag of Visual Words) retrieval framework and demonstrate how this combination may be used for obstacle classification in video streams. Experimental results on various challenging scenes demonstrate that our approach is effective on image sequences with significant camera movement, noise, and low-resolution data, and achieves high accuracy while being computationally efficient.

Proceedings ArticleDOI
04 Jun 2013
TL;DR: By integrating the three techniques, noticeable improvements over the previous state of the art on FDDB are demonstrated at real-time speed, in extensive comparisons with both academic and commercial detectors.
Abstract: We present an effective deformable part model for face detection in the wild. Compared with previous systems on face detection, there are mainly three contributions. The first is an efficient method for calculating histogram of oriented gradients by pre-calculated lookup tables, which only has read and write memory operations and the feature pyramid can be calculated in real-time. The second is a Sparse Constrained Latent Bilinear Model to simultaneously learn the discriminative deformable part model, and reduce the feature dimension by sparse transformations for efficient inference. The third contribution is a deformable part based cascade, where every stage is a deformable part in the discriminatively learned model. By integrating the three techniques, we demonstrate noticeable improvements over the previous state of the art on FDDB at real-time speed, in extensive comparisons with both academic and commercial detectors.
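
The lookup-table trick in the first contribution can be sketched as precomputing, for every possible pair of gradient values (dx, dy), the orientation bin and magnitude, so per-pixel binning becomes table reads; the table range and bin count below are assumptions:

```python
import numpy as np

BINS = 9
# Precompute orientation bin and magnitude for every (dx, dy) in [-255, 255]^2.
dx, dy = np.meshgrid(np.arange(-255, 256), np.arange(-255, 256), indexing='xy')
ANGLE_LUT = np.minimum((np.mod(np.arctan2(dy, dx), np.pi) / np.pi * BINS).astype(int),
                       BINS - 1)                            # shape (511, 511)
MAG_LUT = np.hypot(dx, dy)

def hog_cells_via_lut(gray, cell=8):
    """Per-cell orientation histograms using only LUT reads per pixel."""
    g = gray.astype(np.int32)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]                      # central differences
    gy[1:-1, :] = g[2:, :] - g[:-2, :]
    bins = ANGLE_LUT[gy + 255, gx + 255]                    # table reads per pixel
    mags = MAG_LUT[gy + 255, gx + 255]
    H, W = gray.shape
    cells = np.zeros((H // cell, W // cell, BINS))
    for cy in range(H // cell):
        for cx in range(W // cell):
            b = bins[cy * cell:(cy + 1) * cell, cx * cell:(cx + 1) * cell].ravel()
            m = mags[cy * cell:(cy + 1) * cell, cx * cell:(cx + 1) * cell].ravel()
            cells[cy, cx] = np.bincount(b, weights=m, minlength=BINS)
    return cells

print(hog_cells_via_lut((np.random.rand(64, 64) * 255).astype(np.uint8)).shape)
```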

Journal ArticleDOI
TL;DR: A new method for strawberry detection for use in a strawberry harvesting robot based on a histogram of oriented gradients descriptor associated with a support vector machine (SVM) classifier achieves high detection accuracy (87%) in a reasonable run time, and can appropriately handle slightly overlapping strawberries.

Book ChapterDOI
03 Sep 2013
TL;DR: A variation of the L1-norm dual total variational (TV-L1) optical flow model is proposed with a new illumination-robust data term defined from the histogram of oriented gradients computed from two consecutive frames, which is significantly more robust to illumination changes.
Abstract: The brightness constancy assumption has widely been used in variational optical flow approaches as their basic foundation. Unfortunately, this assumption does not hold when illumination changes or for objects that move into a part of the scene with different brightness conditions. This paper proposes a variation of the L1-norm dual total variational (TV-L1) optical flow model with a new illumination-robust data term defined from the histogram of oriented gradients computed from two consecutive frames. In addition, a weighted non-local term is utilized for denoising the resulting flow field. Experiments with complex textured images belonging to different scenarios show results comparable to state-of-the-art optical flow models, although being significantly more robust to illumination changes.

Proceedings ArticleDOI
26 May 2013
TL;DR: The processor employs a VLSI-oriented HOG algorithm with early classification in Support Vector Machine (SVM) classification, a dual core architecture for parallel feature extraction, and a detection-window-size scalable architecture with a reconfigurable MAC array for processing objects of different shapes.
Abstract: In this paper, a Histogram of Oriented Gradients (HOG) feature extraction accelerator for real-time multiple object detection is presented. The processor employs three techniques: a VLSI-oriented HOG algorithm with early classification in Support Vector Machine (SVM) classification, a dual core architecture for parallel feature extraction, and a detection-window-size scalable architecture with a reconfigurable MAC array for processing objects of different shapes. Early classification reduces the number of computations in SVM classification. The dual core architecture and the detection-window-size scalable architecture enable the processor to operate in several modes: high-speed mode, low-power mode, multiple object detection mode, and multiple shape object detection mode. These techniques expand the processor flexibility required for versatile application. The test chip was fabricated using 65 nm CMOS technology. The proposed architecture is designed to process HDTV resolution video (1920 × 1080 pixels) at 30 frames per second (fps). The performance of this accelerator is demonstrated on a pedestrian detection system.
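
The early-classification idea (abandoning a detection window as soon as the partially accumulated SVM score falls below a stage threshold) can be sketched as follows; the chunk size and per-stage thresholds are naive assumptions rather than the ones used in the accelerator:

```python
import numpy as np

def svm_score_with_early_exit(x, w, bias, partial_thresholds, chunk=64):
    """Accumulate w.x chunk by chunk; stop early if the running score drops
    below the rejection threshold for that stage (clearly a negative window)."""
    score = bias
    for stage, start in enumerate(range(0, len(w), chunk)):
        score += float(np.dot(w[start:start + chunk], x[start:start + chunk]))
        if score < partial_thresholds[stage]:
            return score, True                     # rejected early
    return score, False                            # full evaluation

rng = np.random.default_rng(0)
w, x = rng.normal(size=3780), rng.normal(size=3780)
stages = int(np.ceil(3780 / 64))
# Naive per-stage thresholds (assumption): allow less and less slack per stage.
thresholds = np.linspace(-50.0, 0.0, stages)
print(svm_score_with_early_exit(x, w, bias=-1.0, partial_thresholds=thresholds))
```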

Proceedings ArticleDOI
07 Apr 2013
TL;DR: An approach for automatic recognition and classification of these standard views - namely the Parasternal Long Axis (PLAX) and the Short Axis (SAX) B-mode echocardiograms and the Histogram of Oriented Gradients used as the discriminating feature.
Abstract: When imaging the heart, using a 2D ultrasound probe, different views can manifest depending on the location and angulations of the probe. Some of these views have been labeled as standard views, due to the presentation and ease of assessment of key cardiac structures in them. We present an approach for automatic recognition and classification of these standard views - namely the Parasternal Long Axis (PLAX) and the Short Axis (SAX) B-mode echocardiograms. The Histogram of Oriented Gradients (HOG) used as the discriminating feature encodes the spatial arrangement of edges/gradients in the images. The HOG feature is computed on the pre-scan converted image data in the ultrasound beam space. On a fairly large database of 703 images, with a Support Vector Machine classifier we obtained an accuracy of about 98%.

Journal ArticleDOI
TL;DR: In this article, a driving action dataset was prepared by a side-mounted camera looking at a driver's left profile and the driving actions, including operating the shift lever, talking on a cell phone, eating, and smoking, were decomposed into a number of predefined action primitives.
Abstract: In the field of intelligent transportation systems (ITS), automatic interpretation of a driver’s behavior is an urgent and challenging topic. This paper studies vision-based driving posture recognition in the human action recognition framework. A driving action dataset was prepared by a side-mounted camera looking at a driver’s left profile. The driving actions, including operating the shift lever, talking on a cell phone, eating, and smoking, are first decomposed into a number of predefined action primitives, that is, interaction with the shift lever, operating the shift lever, interaction with the head, and interaction with the dashboard. A global grid-based representation for the action primitives was emphasized, which first generates the silhouette shape from the motion history image, followed by application of the pyramid histogram of oriented gradients (PHOG) for a more discriminating characterization. The random forest (RF) classifier was then exploited to classify the action primitives, together with comparisons to some other commonly applied classifiers such as NN, multilayer perceptron, and support vector machine. Classification accuracy is over 94% for the RF classifier in holdout and cross-validation experiments on the four manually decomposed driving actions.
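
A sketch of the grid-based representation pipeline described (a motion history image from frame differences, then an oriented-gradient descriptor over the resulting silhouette), using a single-level HOG from scikit-image in place of the full PHOG pyramid:

```python
import numpy as np
from skimage.feature import hog

def motion_history_image(frames, tau=20, diff_thresh=0.05):
    """frames: (T, H, W) grayscale clip in [0, 1]. Returns the MHI of the clip."""
    mhi = np.zeros(frames.shape[1:], dtype=float)
    for t in range(1, frames.shape[0]):
        motion = np.abs(frames[t] - frames[t - 1]) > diff_thresh
        mhi = np.where(motion, tau, np.maximum(mhi - 1, 0))   # refresh or decay
    return mhi / tau

def mhi_descriptor(frames):
    mhi = motion_history_image(frames)
    # Stand-in for PHOG: a single-level HOG over the motion-history silhouette.
    return hog(mhi, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

clip = np.random.rand(30, 128, 128)
print(mhi_descriptor(clip).shape)
```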

Proceedings ArticleDOI
05 Nov 2013
TL;DR: This paper proposes an effective traffic sign recognition method using multiple features that have proven effective in computer vision and are computationally efficient, and reports an accuracy of 98.65%.
Abstract: Traffic sign recognition is difficult due to the low resolution of images, illumination variation, and shape distortion. On the public dataset GTSRB, state-of-the-art performance has been obtained by convolutional neural networks (CNNs), which learn discriminative features automatically to achieve high accuracy but suffer from high computation costs in both training and classification. In this paper, we propose an effective traffic sign recognition method using multiple features that have proven effective in computer vision and are computationally efficient. The extracted features are the histogram of oriented gradients (HOG) feature, Gabor filter feature, and local binary pattern (LBP) feature. Using a linear support vector machine (SVM) for classification, each feature yields fairly high accuracy. The combination of the three features shows good complementarity and yields competitively high accuracy. On the GTSRB dataset, our method reports an accuracy of 98.65%.

Journal ArticleDOI
TL;DR: This paper presents a new implementation of the processing operations required in a widely-used pedestrian detection algorithm (the histogram of oriented gradients) when run in various configurations on a heterogeneous platform suitable for use as an embedded system and demonstrates that prioritization of each of these factors can be made by selecting a specific configuration.
Abstract: This paper presents a new implementation, with complete analysis, of the processing operations required in a widely-used pedestrian detection algorithm (the histogram of oriented gradients (HOG) detector) when run in various configurations on a heterogeneous platform suitable for use as an embedded system. The platform consists of field-programmable gate array (FPGA), graphics processing unit (GPU), and central processing unit (CPU) and we detail the advantages of such an image processing system for real-time performance. We thoroughly analyze the consequent tradeoffs made between power consumption, latency and accuracy for each possible configuration. We thus demonstrate that prioritization of each of these factors can be made by selecting a specific configuration. These separate configurations may then be changed dynamically to respond to changing priorities of a real-time system, e.g., on a moving vehicle. We compare the performance of real-time implementations of linear and kernel support vector machines in HOG and evaluate the entire system against the state-of-the-art in real-time person detection. We also show that our FPGA implementation detects pedestrians more accurately than existing implementations, and that a heterogeneous configuration which performs image scaling on the GPU, and histogram extraction and classification on the FPGA, produces a good compromise between power and speed.

Journal ArticleDOI
TL;DR: A sliding window approach based on Histogram of Oriented Gradients (HOG) features is used for Brazilian license plate detection, which consists of scanning the whole image in a multiscale fashion so that the license plate is located precisely.
Abstract: Due to the increasing need for automatic traffic monitoring, vehicle license plate detection is of high interest for automatic toll collection, traffic law enforcement, and parking lot access control, among other applications. In this paper, a sliding window approach based on Histogram of Oriented Gradients (HOG) features is used for Brazilian license plate detection. This approach consists of scanning the whole image in a multiscale fashion so that the license plate is located precisely. The main contribution of this work is a deep study of the best setup for HOG descriptors on the detection of Brazilian license plates, to which HOG had never been applied before. We also demonstrate the reliability of this method, ensured by a recall higher than 98% (with a precision higher than 78%) on a publicly available data set.
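
The multiscale sliding-window scheme can be sketched as scanning a Gaussian image pyramid with a fixed-size HOG window and a trained linear classifier; the window size, stride and scale step are assumptions, and plate_svm is a hypothetical pre-trained classifier (e.g., a scikit-learn LinearSVC fitted on plate/non-plate HOG features):

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import pyramid_gaussian

def sliding_window_detect(gray, plate_svm, win=(40, 120), stride=16,
                          downscale=1.25, score_thresh=0.5):
    """Scan a Gaussian pyramid of `gray` with a HOG window of size `win`
    (rows, cols); return boxes (x, y, w, h) in original-image coordinates."""
    detections = []
    for level, img in enumerate(pyramid_gaussian(gray, downscale=downscale)):
        if img.shape[0] < win[0] or img.shape[1] < win[1]:
            break                                  # window no longer fits
        scale = downscale ** level
        for y in range(0, img.shape[0] - win[0] + 1, stride):
            for x in range(0, img.shape[1] - win[1] + 1, stride):
                patch = img[y:y + win[0], x:x + win[1]]
                feat = hog(patch, orientations=9, pixels_per_cell=(8, 8),
                           cells_per_block=(2, 2)).reshape(1, -1)
                # plate_svm: hypothetical pre-trained linear classifier.
                if plate_svm.decision_function(feat)[0] > score_thresh:
                    detections.append((int(x * scale), int(y * scale),
                                       int(win[1] * scale), int(win[0] * scale)))
    return detections
```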