Combination of Feature Extraction Methods for SVM Pedestrian Detection
I. INTRODUCTION
- This paper describes a comprehensive combination of feature extraction methods for vision-based pedestrian detection in Intelligent Transportation Systems (ITS).
- The use of infrared cameras is quite an expensive option that makes mass production an intractable problem nowadays, especially for the case of stereo vision systems where two cameras are needed.
- Some authors have demonstrated that the recognition of pedestrians by components is more effective than the recognition of the entire body [10], [21].
- For this purpose, several feature extraction methods have been implemented, compared, and combined.
- The implementation and comparative results achieved to date are presented and discussed in Section VI.
II. CANDIDATE SELECTION
- An efficient candidate selection mechanism is a crucial factor in the global performance of the pedestrian detection system.
- In addition, the computation of accurate disparity maps requires fine grain texture images in order to avoid noise generation.
- This implies managing very little information to detect obstacles, which may work well for big object detection, such as vehicles [26], but might not be enough for small thin object detection, such as pedestrians.
- Conversely, the authors propose a candidate selection method based on the direct computation of the 3-D coordinates of relevant points in the scene.
- A major advantage is that outliers can be easily filtered out in 3-D space, which makes the method less sensitive to noise.
A. Three-Dimensional Computation of Relevant Points
- The 3-D representation of relevant points in the scene is computed in two stages.
- Features such as heads, arms, and legs are distinguishable, when visible, and are not heavily affected by different colors or clothes.
- The matching computational cost is further reduced in two ways.
- An increase in the window size causes the performance to degrade due to occlusion regions and smoothing of disparity values across boundaries.
- Finally, an XZ map (bird's eye view of the 3-D scene) is filtered following a neighborhood criterion.
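The neighborhood filtering of the XZ map can be illustrated with a minimal sketch. The summary does not state the exact criterion, so the grid cell size, the 3 × 3 cell neighborhood, and the `min_neighbors` threshold below are assumptions chosen only for illustration, and `filter_xz_points` is a hypothetical helper name.

```python
from collections import Counter

def filter_xz_points(points, cell=0.5, min_neighbors=3):
    """Keep 3-D points whose XZ-map neighbourhood is populated enough.

    points: list of (x, y, z) tuples in metres.
    cell: side of one XZ grid cell in metres (assumed value).
    min_neighbors: minimum number of points required in the 3x3 block
        of cells around a point, counting the point itself (assumed value).
    """
    def cell_of(p):
        # Project onto the ground plane: only x and z index the map.
        return (int(p[0] // cell), int(p[2] // cell))

    counts = Counter(cell_of(p) for p in points)
    kept = []
    for p in points:
        cx, cz = cell_of(p)
        support = sum(counts.get((cx + dx, cz + dz), 0)
                      for dx in (-1, 0, 1) for dz in (-1, 0, 1))
        if support >= min_neighbors:
            kept.append(p)
    return kept
```

Isolated points, which typically come from stereo mismatches, have little support in the bird's eye view and are dropped, while points on the same physical object reinforce one another.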
B. Subtractive Clustering
- Data clustering techniques are related to the partitioning of a data set into several groups in such a way that the similarity within a group is larger than that among groups.
- Objects in the 3-D space are roughly modeled by means of Gaussian functions.
- This point is selected as the cluster center at the current iteration of the algorithm.
- After applying subtractive clustering to a set of input data, each cluster finally represents a candidate.
- 4) Densities are corrected according to (5).
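As a rough illustration of the clustering loop, the sketch below follows the standard subtractive clustering formulation (Gaussian density per point, highest-density point taken as the cluster center, densities corrected around it). The paper's own equations are not reproduced in this summary, so the radii `ra` and `rb`, the stopping ratio, and the function name are assumptions rather than the authors' settings.

```python
import math

def subtractive_clustering(points, ra=1.0, rb=1.5, stop_ratio=0.15, max_clusters=20):
    """Return cluster centers chosen among the input points.

    points: list of coordinate tuples, e.g. (x, z) positions in metres.
    ra: neighbourhood radius used for the density measure (assumed).
    rb: radius used when correcting densities around a center (assumed).
    stop_ratio: stop when the best remaining density falls below this
        fraction of the first peak (assumed).
    """
    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    alpha = 4.0 / ra ** 2
    beta = 4.0 / rb ** 2
    # Density of each point: sum of Gaussian contributions of all points.
    density = [sum(math.exp(-alpha * sqdist(p, q)) for q in points) for p in points]

    centers = []
    first_peak = None
    while density and len(centers) < max_clusters:
        best = max(range(len(points)), key=density.__getitem__)
        peak = density[best]
        if first_peak is None:
            first_peak = peak
        elif peak < stop_ratio * first_peak:
            break
        center = points[best]
        centers.append(center)
        # Correct densities so that points near the new center lose weight.
        density = [d - peak * math.exp(-beta * sqdist(p, center))
                   for p, d in zip(points, density)]
    return centers
```

Each returned center would correspond to one candidate. The number of clusters does not need to be fixed in advance, which suits scenes with an unknown number of obstacles.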
C. Multicandidate (MC) Generation
- In practice, a multiple candidate selection strategy has been implemented.
- Accordingly, several candidates are generated for each candidate cluster by slightly shifting the original candidate bounding box in the u and v axes in the image plane.
- A major benefit derived from the MC approach is the fact that the classification performance of pedestrians at long distance increases.
- Fig. 3 depicts typical images from their test sequences.
- The number below the bounding box represents range.
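A minimal sketch of the MC idea: the original bounding box is shifted slightly along the u and v image axes to produce several hypotheses. The shift values and the 5 × 3 grid are assumptions; they were picked only so that each window yields 15 hypotheses, which matches the reported average of six windows and 90 candidates per frame. `generate_multicandidates` is a hypothetical helper name.

```python
def generate_multicandidates(box, u_shifts=(-4, -2, 0, 2, 4), v_shifts=(-2, 0, 2)):
    """Return shifted copies of a bounding box.

    box: (u, v, width, height) in pixels, (u, v) being the top-left corner.
    u_shifts, v_shifts: pixel offsets applied along each image axis
        (assumed values, not taken from the paper).
    """
    u, v, w, h = box
    return [(u + du, v + dv, w, h) for du in u_shifts for dv in v_shifts]


# Six windows in a frame expand to 90 hypotheses with the assumed grid.
windows = [(100 + 30 * i, 80, 24, 48) for i in range(6)]
hypotheses = [c for win in windows for c in generate_multicandidates(win)]
```

Every hypothesis is then classified on its own, so a pedestrian is accepted as long as at least one of the shifted boxes fits well enough.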
A. Component-Based Approach
- There are some important aspects that need to be addressed when constructing a classifier, such as the global classification structure and the use of single or multiple cascaded classifiers.
- The first decision to make is whether to develop a holistic classifier or to adopt a component-based approach.
- The component-based approach suggests the division of the candidate body into several parts over which features are computed.
- Thus, the first subregion is located in the zone where the head would be.
- This subregion is particularly useful to recognize stationary pedestrians.
B. Combination of Feature Extraction Methods
- The choice of the most appropriate features for pedestrian characterization remains a challenging problem nowadays since recognition performance depends crucially on the features that are used to represent pedestrians.
- There seems then to be an optimal feature extraction method for each candidate subregion.
- The comparison among the results achieved in the four experiments yields the final combination of features used in this paper: head-NTU; arms-Histogram; legs-HON; between-the-legs-NTU.
- The increase in performance due to the use of the proposed optimal combination of feature extraction methods is illustrated in Section VI.
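The per-subregion selection can be written down as a simple lookup, together with one of the features. The sketch below assumes that NTU denotes the texture unit number in its usual definition (each of the eight neighbors of a 3 × 3 patch is coded 0, 1, or 2 by comparison with the center and weighted by a power of 3); the neighbor ordering and all names are assumptions, and the other extractors are not shown.

```python
# Feature chosen for each candidate subregion, as reported above.
FEATURE_BY_SUBREGION = {
    "head": "NTU",
    "arms": "Histogram",
    "legs": "HON",
    "between_the_legs": "NTU",
}

def texture_unit_number(patch):
    """Texture unit number of a 3x3 gray-level patch (list of 3 rows).

    Each neighbor is coded 0 (darker than the center), 1 (equal) or
    2 (brighter) and weighted by 3**i, giving a value in 0..6560.
    The clockwise ordering from the top-left pixel is an assumption.
    """
    center = patch[1][1]
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    ntu = 0
    for i, value in enumerate(neighbors):
        code = 0 if value < center else (1 if value == center else 2)
        ntu += code * 3 ** i
    return ntu
```

A feature vector for a subregion would typically be built from the distribution of these numbers over all 3 × 3 patches in that subregion.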
A. Training Strategy
- The first step in the design of the training strategy is to create representative databases for learning and testing.
- The following considerations must be taken into account when creating the training and test sets.
- The ratio between positive and negative samples has to be set to an appropriate value.
- Daytime and nighttime samples must be separated in order to create multiple specialized classifiers.
- Pedestrians intersecting the vehicle trajectory from the sides are usually easier to recognize since their legs are clearly visible and distinguishable.
B. Classifier Structure
- In the first stage of the classifier, features computed over each individual fixed subregion are fed to the input of individual SVM classifiers.
- Thus, there are six individual SVM classifiers corresponding to the six candidate subregions.
- Two different methods have been tested to carry out this operation.
- The second method that has been tested to implement the second stage of the classifier relies on the use of another SVM classifier.
- Additionally, an optimal kernel selection for the SVM classifiers has been performed.
V. MULTIFRAME VALIDATION AND TRACKING
- Once candidates are validated by the SVM classifier, a tracking stage takes place.
- For this purpose, detection results are temporally accumulated.
- The multiframe validation and tracking algorithm relies on Kalman filter theory to provide spatial estimates of detected pedestrians and Bayesian probability to provide an estimate of pedestrian detection certainty over time.
- A pedestrian entering the pretracking stage must be validated in several iterations before entering the tracking stage.
- Once a precandidate is validated, pretracking stops, and tracking starts.
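The certainty estimate can be sketched as a recursive Bayesian update over frames. This is only an illustration of the principle: the likelihood values, the prior, the validation threshold, and both function names are assumptions, and the Kalman filter that provides the spatial estimates is not shown.

```python
def update_certainty(prior, detected, p_det_given_ped=0.9, p_det_given_bg=0.1):
    """One Bayes update of the probability that a track is a pedestrian.

    prior: probability before the current frame.
    detected: True if the classifier validated the candidate in this frame.
    p_det_given_ped, p_det_given_bg: assumed likelihoods of a positive
        classification for a pedestrian and for background.
    """
    if detected:
        like_ped, like_bg = p_det_given_ped, p_det_given_bg
    else:
        like_ped, like_bg = 1.0 - p_det_given_ped, 1.0 - p_det_given_bg
    numerator = like_ped * prior
    return numerator / (numerator + like_bg * (1.0 - prior))


def pretrack(detections, prior=0.5, validate_at=0.95):
    """Return (frame at which tracking would start, final certainty).

    The frame index is None if the precandidate is never validated.
    """
    p = prior
    for frame, detected in enumerate(detections, start=1):
        p = update_certainty(p, detected)
        if p >= validate_at:
            return frame, p
    return None, p
```

With these assumed values, a single detection is not enough to start tracking, which mirrors the requirement that a pedestrian be validated in several iterations.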
VI. EXPERIMENTAL RESULTS
- The system was implemented on a Pentium IV PC at 2.4 GHz running the Knoppix GNU/Linux Operating System and Libsvm libraries [35].
- Using 320 × 240 pixel images, the complete algorithm runs at an average rate of 20 frames/s, depending on the number of pedestrians being tracked and their position.
- Accordingly, negative samples (nonpedestrian samples) in the training sets were neither randomly nor manually selected.
- In the following sections, the results are compared and assessed using DR under certain FPRs.
- The selection of the FPR value has been made to show performance in representative points where differences between curves can be optimally appreciated.
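To make the evaluation measure concrete, the sketch below reads the detection rate (DR) off a set of classifier scores at a given false positive rate (FPR). It assumes the usual definitions (DR is the fraction of pedestrian samples accepted, FPR is the fraction of nonpedestrian samples accepted); the function name and the sample scores are made up for illustration.

```python
def detection_rate_at_fpr(pos_scores, neg_scores, max_fpr):
    """Return (DR, threshold) at the best operating point with FPR <= max_fpr.

    pos_scores: classifier outputs for pedestrian samples.
    neg_scores: classifier outputs for nonpedestrian samples.
    A sample is accepted when its score is >= threshold.
    """
    best = (0.0, float("inf"))
    # Lowering the threshold can only raise both DR and FPR, so scan from
    # the highest score down and stop once the FPR budget is exceeded.
    for t in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        if fpr > max_fpr:
            break
        dr = sum(s >= t for s in pos_scores) / len(pos_scores)
        best = (dr, t)
    return best


# Illustrative scores only, not data from the paper.
dr, threshold = detection_rate_at_fpr([0.9, 0.8, 0.7, 0.2],
                                      [0.75, 0.3, 0.1, -0.5],
                                      max_fpr=0.25)
```

Comparing extractors at a fixed FPR in this way amounts to comparing their ROC curves along one vertical line.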
A. Holistic versus Component-Based
- A first comparison is made in order to determine which of the holistic and component-based options performs better.
- In particular, the training and test sets were designed to contain 10 000 and 3670 samples, respectively.
- As depicted in Fig. 5, the performance of the holistic approach for all feature extraction methods is largely improved in the component-based approach.
- The Haar Wavelet is again below those figures.
- This shows that breaking the pedestrian into smaller pieces and specifically training the SVM for these pieces reduces the variability and lets the SVM generalize the models much better.
B. Combination of Optimal Features
- These results can further be improved by combining different feature extraction methods for different candidate subregions.
- The best performing features for each subregion are combined in a second classifier instead of applying the same feature extractor to all six subregions.
- The authors used the same training and test sets as in Section VI-A. Fig. 6(a)-(f) shows the ROC curves for each separate subregion after computing the seven predefined features.
- As concluded in Section III-B, the selection of optimal features for each subregion is carried out as follows: head-NTU, arms-Histogram, legs-HON, between-the-legs-NTU.
- These results improve on the performance of Canny's detector, the best performing feature extractor under the conditions of the experiment described in Section VI-A, which exhibits a DR of 95% at an FPR of 2%.
C. Effect of the Second-Stage Classifier
- Another comparison has been studied in order to analyze the influence of the second-stage classifier that combines the information delivered by the six specifically trained SVM models.
- In the first approach, the authors have used a simple-distance criterion (i.e., distance to the hyperplane separating pedestrians from nonpedestrians) that computes the addition of the six first-stage SVM outputs and then decides the classification by setting a threshold.
- Another option has been tested by training a two-stage SVM (2-SVM).
- Once again, the same training and test sets as in Section VI-A were used in this experiment.
- The results achieved to date show that the simple-distance criterion clearly outperforms the 2-SVM classifier, as depicted in Fig. 7(b), where a comparison between both methods is shown when optimal feature extraction methods are applied.
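The simple-distance criterion described above reduces to a sum and a comparison. In the sketch below, the default threshold of 0.0 and the function name are assumptions; the text only says that a threshold is set on the addition of the six first-stage SVM outputs.

```python
def simple_distance_decision(svm_outputs, threshold=0.0):
    """Combine six first-stage SVM outputs by addition and thresholding.

    svm_outputs: signed distances to the separating hyperplane, one per
        candidate subregion.
    threshold: decision threshold on the summed distance (assumed value).
    Returns (is_pedestrian, summed_distance).
    """
    if len(svm_outputs) != 6:
        raise ValueError("expected one output per subregion (six in total)")
    total = sum(svm_outputs)
    return total > threshold, total
```

Because the outputs are added, a strongly negative response in one subregion (for example, an occluded leg) can be compensated by confident positive responses elsewhere.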
D. Effect of Illumination Conditions and Candidate Size
- The need for separate training sets for day, night, and different candidate sizes is analyzed in this section.
- The purpose of this experiment is to analyze the performance of nighttime classification using a global daytime classifier.
- Fig. 8(b) shows that nighttime pedestrian detection is not accurate when training is carried out using daytime samples (DR is between 23% and 70% at an FPR of 10%).
- In the next experiment, three different SVM classifiers were trained using sets DS, DL, and G, respectively.
- The results are illustrated in Fig. 9(a).
E. Effect of Bounding Box Accuracy
- The accuracy exhibited in bounding candidates is limited, and in fact, a multiple-hypothesis generation for each detected candidate is encouraged to boost classifier performance, as described in Section II-C. Fig. 10(a) depicts the performance obtained after testing a set of badly bounded samples using a classifier trained on badly bounded samples.
- All methods exhibit much worse figures since none of the proposed extractors succeed in providing a DR above 83% (for the case of HON, which is the best performing one) at an FPR of 5%.
- The analysis of these results suggests that choosing the optimal feature extraction methods just in terms of DR and FPR can lead, in practice, to a decrease in recognition performance.
- Additionally, an MC generation stage has been developed in order to generate several candidates for each originally selected hypothesis to at least assure some well-fitted candidates that match the samples used for training.
F. Global Performance
- Some of the sequences were acquired in urban environments and others in nonurban areas.
- The purpose of this evaluation is to assess the combined operation of the attention mechanism and the SVM-based classifier, including the MC generation strategy, and a multiframe validation stage using Kalman filtering.
- Similarly, the DR is 93.24% in urban environments, where ten pedestrians were missed by the system.
- Concerning nonurban environments, three pedestrians were missed by the system in 72 min of operation.
- As happens in urban environments, false alarms are caused by real objects.
Frequently Asked Questions (9)
Q2. What is the purpose of the multiframe validation and tracking algorithm?
The multiframe validation and tracking algorithm relies on Kalman filter theory to provide spatial estimates of detected pedestrians and Bayesian probability to provide an estimate of pedestrian detection certainty over time.
Q3. What are the main reasons why the nighttime detection is not considered in this analysis?
Nonilluminated areas have not been considered in this analysis since pedestrian detection would not be possible beyond a few meters (6–8 m), and infrared cameras would be needed.
Q4. How many windows are generated per frame?
On average, the candidate selection mechanism generates six windows per frame, which yields a total of 90 candidates per frame after the MC process.
Q5. What is the first strategy to solve the bounding accuracy effect?
The first one consists of training the classifier with additional badly fitted pedestrians in an attempt to absorb either the extra information due to large bounding boxes containing part of the background or the loss of information due to small bounding boxes in which part of the pedestrian is not visible.
Q6. What is the importance of the optimal selection of discriminant features in a pedestrian detection system?
The optimal selection of discriminant features is an issue of the greatest importance in a pedestrian detection system considering the large variability problem that has to be solved in real scenarios.
Q7. What is the combination of feature extraction methods?
The optimal combination of feature extraction methods eases the learning stage, which makes the classifier less sensitive, in particular, to clothing.
Q8. How many false alarms were produced in the sequences?
Five false alarms occurred in the sequences, which are mainly due to lampposts and trees located by the edge of the road, yielding an average ratio of four false alarms per hour.
Q9. What is the training set for the second stage of the SVM classifier?
A new training set is created by taking as inputs the outputs produced by the six already trained first-stage SVM classifiers (in theory, between −1 and +1) after applying the 15 000 samples contained in DS and taking as outputs the supervised outputs of DS.