
Showing papers on "Histogram of oriented gradients" published in 2009


Proceedings ArticleDOI
20 Jun 2009
TL;DR: A framework for computing low bit-rate feature descriptors with a 20× reduction in bit rate is proposed and it is shown how to efficiently compute distances between descriptors in their compressed representation eliminating the need for decoding.
Abstract: Establishing visual correspondences is an essential component of many computer vision problems, and is often done with robust, local feature-descriptors. Transmission and storage of these descriptors are of critical importance in the context of mobile distributed camera networks and large indexing problems. We propose a framework for computing low bit-rate feature descriptors with a 20× reduction in bit rate. The framework is low complexity and has significant speed-up in the matching stage. We represent gradient histograms as tree structures which can be efficiently compressed. We show how to efficiently compute distances between descriptors in their compressed representation eliminating the need for decoding. We perform a comprehensive performance comparison with SIFT, SURF, and other low bit-rate descriptors and show that our proposed CHoG descriptor outperforms existing schemes.

282 citations
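
As an illustration of matching in the compressed domain, the sketch below quantizes each cell histogram to the index of its nearest codeword and reads distances from a precomputed table, so no descriptor is decoded at match time. The toy `CODEBOOK` and the names `encode` and `compressed_distance` are hypothetical; the example uses plain vector quantization with Euclidean distance as a stand-in and is not the tree-based coding described in the abstract.

```python
import math

# Hypothetical toy codebook: each codeword is a small normalized gradient histogram.
CODEBOOK = [
    (0.7, 0.1, 0.1, 0.1),
    (0.1, 0.7, 0.1, 0.1),
    (0.1, 0.1, 0.7, 0.1),
    (0.1, 0.1, 0.1, 0.7),
    (0.25, 0.25, 0.25, 0.25),
]


def l2(p, q):
    """Euclidean distance between two equal-length histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def quantize(hist):
    """Return the index of the codeword nearest to `hist`."""
    return min(range(len(CODEBOOK)), key=lambda i: l2(hist, CODEBOOK[i]))


# Distances between every pair of codewords, computed once up front.
DIST_TABLE = [[l2(p, q) for q in CODEBOOK] for p in CODEBOOK]


def encode(descriptor):
    """Compress a descriptor (list of cell histograms) to codeword indices."""
    return [quantize(h) for h in descriptor]


def compressed_distance(code_a, code_b):
    """Sum of per-cell table lookups; operates on indices only."""
    return sum(DIST_TABLE[i][j] for i, j in zip(code_a, code_b))
```

With this layout a descriptor costs one small integer per cell, and the matching cost is one table lookup per cell regardless of the histogram length.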


Journal ArticleDOI
TL;DR: Results from extensive tests on both urban and suburban videos indicate that the algorithm can produce a detection rate of more than 90% at the cost of about 10 false alarms/h and perform as fast as the frame rate on a Pentium IV 3.0-GHz personal computer, which demonstrates that the proposed system is feasible for practical applications and enjoys the advantage of low implementation cost.
Abstract: Pedestrian detection is one of the most important components in driver-assistance systems. In this paper, we propose a monocular vision system for real-time pedestrian detection and tracking during nighttime driving with a near-infrared (NIR) camera. Three modules (region-of-interest (ROI) generation, object classification, and tracking) are integrated in a cascade, and each utilizes complementary visual features to distinguish the objects from the cluttered background in the range of 20-80 m. Based on the common fact that the objects appear brighter than the nearby background in nighttime NIR images, efficient ROI generation is done based on the dual-threshold segmentation algorithm. As there is large intraclass variability in the pedestrian class, a tree-structured, two-stage detector is proposed to tackle the problem through training separate classifiers on disjoint subsets of different image sizes and arranging the classifiers based on Haar-like and histogram-of-oriented-gradients (HOG) features in a coarse-to-fine manner. To suppress the false alarms and fill the detection gaps, template-matching-based tracking is adopted, and multiframe validation is used to obtain the final results. Results from extensive tests on both urban and suburban videos indicate that the algorithm can produce a detection rate of more than 90% at the cost of about 10 false alarms/h and perform as fast as the frame rate (30 frames/s) on a Pentium IV 3.0-GHz personal computer, which also demonstrates that the proposed system is feasible for practical applications and enjoys the advantage of low implementation cost.

185 citations


Proceedings ArticleDOI
07 Nov 2009
TL;DR: This paper demonstrated that the PHOG with a significantly shorter vector length could achieve as high a recognition rate as the Gabor features did, and combined these two feature extraction methods and achieved the best smile recognition rate.
Abstract: Recognizing smiles is of much importance for detecting happy moods. Gabor features are conventionally widely applied to facial expression recognition, but the number of Gabor features is usually too large. We proposed to use Pyramid Histogram of Oriented Gradients (PHOG) as the features extracted for smile recognition in this paper. The comparisons between the PHOG and Gabor features using a publicly available dataset demonstrated that the PHOG with a significantly shorter vector length could achieve as high a recognition rate as the Gabor features did. Furthermore, the feature selection conducted by an AdaBoost algorithm was not needed when using the PHOG features. To further improve the recognition performance, we combined these two feature extraction methods and achieved the best smile recognition rate, indicating a good value of the PHOG features for smile recognitions.

122 citations
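
To make the pyramid idea concrete, here is a minimal sketch of a PHOG-style feature: orientation histograms are computed over a 1×1, 2×2, 4×4, ... grid of cells and concatenated. The function names, the `levels` and `n_bins` parameters, and the use of plain central-difference gradients on a list-of-lists grayscale image are assumptions for illustration; some PHOG formulations first extract edge contours, which is omitted here.

```python
import math


def gradients(img):
    """Per-pixel (magnitude, unsigned orientation in [0, pi)) via central differences."""
    h, w = len(img), len(img[0])
    out = [[(0.0, 0.0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Borders are handled by clamping the neighbor index.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            out[y][x] = (math.hypot(gx, gy), math.atan2(gy, gx) % math.pi)
    return out


def phog(img, levels=2, n_bins=8):
    """Concatenate magnitude-weighted orientation histograms over a spatial pyramid."""
    h, w = len(img), len(img[0])
    grad = gradients(img)
    feature = []
    for level in range(levels + 1):
        cells = 2 ** level  # cells per side at this pyramid level
        for cy in range(cells):
            for cx in range(cells):
                hist = [0.0] * n_bins
                for y in range(cy * h // cells, (cy + 1) * h // cells):
                    for x in range(cx * w // cells, (cx + 1) * w // cells):
                        mag, ang = grad[y][x]
                        # The modulo guards against ang rounding up to pi.
                        hist[int(ang / math.pi * n_bins) % n_bins] += mag
                feature.extend(hist)
    total = sum(feature)
    return [v / total for v in feature] if total > 0 else feature
```

With `levels=2` and `n_bins=8` the vector has 8 × (1 + 4 + 16) = 168 entries, which gives a sense of how short such a descriptor can be compared with a large Gabor feature bank.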


Journal ArticleDOI
TL;DR: This work compares several texture-based descriptors for fingerprints and proposes a novel image-based fingerprint matcher based on the minutiae alignment, which dramatically outperforms the other image-based fingerprint matchers proposed in the literature.
Abstract: This paper focuses on the use of image-based techniques in fingerprint verification. A detailed review of the existing literature is provided by classifying existing methods on the basis of their alignment procedure and discussing the most salient approaches and their pros and cons. Even if, at present, the image-based techniques do not achieve performance comparable with that obtained by the best minutiae-based approaches, several good reasons can be listed to support the research on image-based approaches: the possibility of using additional features in combination with minutiae to improve verification performance, and the availability of a fixed-length feature vector, which makes these approaches suitable to be indexed, to be coupled with a learning system, or to be combined with a tokenised random number in a two-factor authentication system (Biohashing). In this work we compare several texture-based descriptors for fingerprints and propose a novel image-based fingerprint matcher based on the minutiae alignment. In this approach, the feature extraction is performed locally on a decomposition of the fingerprint into several overlapping sub-windows considering the following measures: Gabor filter descriptors, invariant local binary patterns and histogram of gradients. Moreover, we propose to perform a supervised selection of a small subset of descriptors, in order to reduce the dimensionality of the feature set and discard the less discriminative features. Extensive experiments conducted over the four FVC2002 fingerprint databases using a blind testing protocol show that the proposed system dramatically outperforms the other image-based fingerprint matchers proposed in the literature. Moreover, a further experiment conducted on a set of images reconstructed from ISO templates shows that, unlike the minutiae-based approaches, our image-based matcher cannot be faked with the sole knowledge of the minutiae position and orientation; at least the original orientation image is required in order to have a chance of performing a successful attack.

74 citations


Proceedings ArticleDOI
01 Sep 2009
TL;DR: A novel body gender dataset covering a large diversity of human body appearance is introduced and it is concluded that the simple linear kernel appears to give the best overall performance.
Abstract: In this paper we focus on building robust image representations for gender classification from full human bodies. We first investigate a number of state-of-the-art image representations with regard to their suitability for gender profiling from static body images. Features include Histogram of Gradients (HOG), spatial pyramid HOG and spatial pyramid bag of words etc. These representations are learnt and combined based on a kernel support vector machine (SVM) classifier. We compare a number of different SVM kernels for this task but conclude that the simple linear kernel appears to give the best overall performance. Our study shows that individual adoption of these representations for gender classification is not as promising as might be expected, given their good performance in the tasks of pedestrian detection on INRIA datasets, and object categorisation on Caltech 101 and Caltech 256 datasets. Our best results, 80% classification accuracy, were achieved from a combination of spatial shape information, captured by HOG, and colour information captured by HSV histogram based features. Additionally, to the best of our knowledge, currently there is no publicly available dataset for full body gender recognition. Hence, we further introduce a novel body gender dataset covering a large diversity of human body appearance.

69 citations


Proceedings ArticleDOI
01 Sep 2009
TL;DR: This paper describes a highly-compact and low power embedded system that can run such vision systems at very high speed, and enables real-time implementations of object detection, object recognition, and vision-based navigation algorithms in small-size robots, micro-UAVs, and hand-held devices.
Abstract: Many recent visual recognition systems can be seen as being composed of multiple layers of convolutional filter banks, interspersed with various types of non-linearities. This includes Convolutional Networks, HMAX-type architectures, as well as systems based on dense SIFT features or Histogram of Gradients. This paper describes a highly-compact and low power embedded system that can run such vision systems at very high speed. A custom board built around a Xilinx Virtex-4 FPGA was built and tested. It measures 70 × 80 mm, and the complete system (FPGA, camera, memory chips, flash) consumes 15 watts at peak, and is capable of more than 4 × 10^9 multiply-accumulate operations per second in real vision applications. This enables real-time implementations of object detection, object recognition, and vision-based navigation algorithms in small-size robots, micro-UAVs, and hand-held devices. Real-time face detection is demonstrated, with speeds of 10 frames per second at VGA resolution.

65 citations


Proceedings ArticleDOI
01 Jan 2009
TL;DR: A novel method is proposed that overcomes limitations in the use of 2D HOG and is more robust than motion silhouettes which are often compromised in real data by variable lighting, camera quality and occlusions from other objects.
Abstract: This paper proposes and demonstrates a novel method for the detection and classification of individual vehicles and pedestrians in urban scenes. In this scenario, shadows, lights and various occlusions compromise the accuracy of foreground segmentation and hence there are challenges with conventional silhouette-based methods. 2D features derived from histograms of oriented gradients (HOG) have been shown to be effective for detecting pedestrians and other objects. However, the appearance of vehicles varies substantially with the viewing angle and local features may be often occluded. In this paper, a novel method is proposed that overcomes limitations in the use of 2D HOG. Full 3D models are used for the object categories to be detected and the feature patches are defined over these models. A calibrated camera allows an affine transform of the observation into a normalised representation from which ‘3DHOG’ features are defined. A variable set of interest points is used in the detection and classification processes, depending on which points in the 3D model are visible. Experiments on real CCTV data of urban scenes demonstrate the proposed method. The 3DHOG feature is compared with features based on FFT and simple histograms. A baseline method using overlap between wire-frame models and motion silhouettes is also included. The results demonstrate that the proposed method achieves comparable performance. In particular, an advantage of the proposed method is that it is more robust than motion silhouettes which are often compromised in real data by variable lighting, camera quality and occlusions from other objects.

63 citations


Proceedings ArticleDOI
08 Dec 2009
TL;DR: A novel descriptor to characterize human action when it is being observed from a far field of view is presented, able to achieve perfect accuracy on two of the datasets, and perform comparably to other methods on the third dataset.
Abstract: In this paper, we present a novel descriptor to characterize human action when it is being observed from a far field of view. Visual cues are usually sparse and vague under this scenario. An action sequence is divided into overlapped spatial-temporal volumes to make reliable and comprehensive use of the observed features. Within each volume, we represent successive poses by time series of Histogram of Oriented Gradients (HOG) and movements by time series of Histogram of Oriented Optical Flow (HOOF). Supervised Principal Component Analysis (SPCA) is applied to seek a subset of discriminantly informative principal components (PCs) to reduce the dimension of histogram vectors without loss of accuracy. The final action descriptor is formed by concatenating sequences of SPCA projected HOG and HOOF features. A Support Vector Machines (SVM) classifier is trained to perform action classification. We evaluated our algorithm by testing it on one normal resolution and two low-resolution datasets, and compared our results with those of other reported methods. By using less than 1/5 the dimension of a full-length descriptor, our method is able to achieve perfect accuracy on two of the datasets, and perform comparably to other methods on the third dataset.

55 citations


Proceedings ArticleDOI
02 Sep 2009
TL;DR: A new HoG (Histogram of Oriented Gradients) tracker for Gesture Recognition is introduced to build HoG trajectory descriptors (representing local motion) which are used for gesture recognition.
Abstract: We introduce a new HoG (Histogram of Oriented Gradients) tracker for Gesture Recognition. Our main contribution is to build HoG trajectory descriptors (representing local motion) which are used for gesture recognition. First, we select for each individual in the scene a set of corner points to determine textured regions in which to compute 2D HoG descriptors. Second, we track these 2D HoG descriptors in order to build temporal HoG descriptors. Lost descriptors are replaced by newly detected ones. Finally, we extract the local motion descriptors to learn offline a set of given gestures. Then, a new video can be classified according to the gesture occurring in the video. Results show that the tracker performs well compared to the KLT tracker [1]. The generated local motion descriptors are validated through gesture learning-classification using the KTH action database [2].

48 citations


Proceedings ArticleDOI
03 Dec 2009
TL;DR: An algorithm for tracking multiple objects through occlusions using the FAST algorithm, which builds a descriptor based on the Histogram of Oriented Gradients (HOG) and tracks feature points using these descriptors.
Abstract: We present an algorithm for tracking multiple objects through occlusions. Firstly, for each detected object we compute feature points using the FAST algorithm [1]. Secondly, for each feature point we build a descriptor based on the Histogram of Oriented Gradients (HOG) [2]. Thirdly, we track feature points using these descriptors. Object tracking is possible even if objects are occluded. If a few objects are merged and detected as a single one, we assign each newly detected feature point in that single object to one of the occluded objects. We apply probabilistic methods for this task, using information from the previous frames such as object size and motion (speed and orientation). We use multi-resolution images to decrease the processing time. Our approach is tested on a synthetic video sequence, the KTH dataset [3] and the CAVIAR dataset [4]. All tests confirm the effectiveness of our approach.

40 citations


Patent
12 Nov 2009
TL;DR: In this article, a method, apparatus and computer program product may also be provided for permitting a compressed representation of a feature descriptor to be compared with a plurality of compressed representations of feature descriptors of respective predefined features.
Abstract: A method, apparatus and computer program product may be provided for generating a plurality of compressed feature descriptors that can be represented by a relatively small number of bits, thereby facilitating transmission and storage of the feature descriptors. A method, apparatus and computer program product may also be provided for permitting a compressed representation of a feature descriptor to be compared with a plurality of compressed representations of feature descriptors of respective predefined features. By permitting the comparison to be performed utilizing compressed representations of feature descriptors, a respective feature descriptor may be identified without having to first decompress the feature descriptor, thereby potentially increasing the efficiency with which feature descriptors may be identified.

Proceedings ArticleDOI
03 Jun 2009
TL;DR: This paper presents a fast Histogram of Oriented Gradients (HOG) based weak classifier that is extremely fast to compute and highly discriminative and is more discriminant on a per feature basis.
Abstract: This paper presents a fast Histogram of Oriented Gradients (HOG) based weak classifier that is extremely fast to compute and highly discriminative. This feature set has been developed in an effort to balance the required processing and memory bandwidth so as to eliminate bottlenecks during run time evaluation. The feature set is the next generation in a series of features based on a novel precomputed image for HOG based features. It contains features which are more balanced in terms of processing and memory requirements than its predecessors, has a larger and richer feature space, and is more discriminant on a per feature basis.

Proceedings ArticleDOI
25 May 2009
TL;DR: This paper reformulates the Histogram of Oriented Gradients algorithm for calculation with a relatively simple instruction set architecture (ISA) and partition the image for parallel processing, resulting in a real-time pedestrian detector with detection accuracy comparable to a double-precision floating-point reference implementation.
Abstract: This paper describes the implementation of a real-time pedestrian detector on a single instruction, multiple data (SIMD), fixed-point digital signal processor (DSP). We reformulate the Histogram of Oriented Gradients algorithm for calculation with a relatively simple instruction set architecture (ISA) and partition the image for parallel processing. Results obtained using an ISA simulator indicate a maximum frame rate above 40fps for 1MPixel images, with a detection accuracy comparable to a double-precision floating-point reference implementation.
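
One way HOG orientation binning can be done without floating-point trigonometry at run time is sketched below: the bin is found by integer cross-product comparisons against precomputed fixed-point boundary directions, and the magnitude is approximated by the L1 norm. The table precision `Q`, the bin count, and the function names are assumptions for illustration; the abstract does not specify how the paper reformulates the algorithm, so this should not be read as its method.

```python
import math

N_BINS = 9   # unsigned orientation bins covering [0, 180) degrees
Q = 14       # fractional bits used for the fixed-point tables (assumed)

# Boundary directions at k * 180 / N_BINS degrees, built once offline.
# The per-pixel functions below use integer arithmetic only.
COS_Q = [round(math.cos(k * math.pi / N_BINS) * (1 << Q)) for k in range(1, N_BINS)]
SIN_Q = [round(math.sin(k * math.pi / N_BINS) * (1 << Q)) for k in range(1, N_BINS)]


def orientation_bin(gx, gy):
    """Unsigned orientation bin of an integer gradient (gx, gy), no atan needed."""
    if gx == 0 and gy == 0:
        return 0  # zero gradient: bin is irrelevant because its weight is zero
    if gy < 0 or (gy == 0 and gx < 0):
        gx, gy = -gx, -gy  # fold the direction into the upper half-plane
    # The cross product sign tells whether the gradient lies at or past a boundary;
    # the number of boundaries passed is the bin index.
    return sum(1 for c, s in zip(COS_Q, SIN_Q) if c * gy - s * gx >= 0)


def magnitude_l1(gx, gy):
    """Cheap integer stand-in for the Euclidean gradient magnitude."""
    return abs(gx) + abs(gy)
```

Each pixel then needs at most `N_BINS - 1` multiply-and-compare steps, which maps naturally onto SIMD lanes; gradients lying almost exactly on a boundary may land in the neighboring bin because of table rounding.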

Proceedings ArticleDOI
08 Dec 2009
TL;DR: Algorithms and image features that can be used to construct a real-time hand detector using the Histogram of Oriented Gradients features in combination with two variations of the AdaBoost algorithm are described.
Abstract: In this paper we describe algorithms and image features that can be used to construct a real-time hand detector. We present our findings using the Histogram of Oriented Gradients (HOG) features in combination with two variations of the AdaBoost algorithm. First, we compare stump and tree weak classifiers. Next, we investigate the influence of a large training database. Furthermore, we compare the performance of HOG against the Haar-like features.

Proceedings ArticleDOI
01 Jan 2009
TL;DR: A multi-view approach is presented to detect frontal and profile poses of people's faces using Histogram of Oriented Gradients (HOG) features, with a K-means clustering technique used to detect faces.
Abstract: Face detection algorithms are widely used in computer vision as they provide fast and reliable results depending on the application domain. A multi-view approach is presented here to detect frontal and profile poses of people's faces using histogram of oriented gradients (HOG) features. A K-means clustering technique is used in a cascade of HOG feature classifiers to detect faces. The evaluation of the algorithm shows similar performance in terms of detection rate to state-of-the-art algorithms. Moreover, unlike state-of-the-art algorithms, our system can be quickly trained before detection is possible. Performance is considerably increased in terms of lower computational cost and lower false detection rate when combined with motion constraints given by moving objects in video sequences. The detected HOG features are integrated within a tracking framework and allow reliable face tracking results in several tested surveillance video sequences.

Book ChapterDOI
23 Sep 2009
TL;DR: This paper presents a second-order HOG feature which attempts to capture second-order properties of object appearance by estimating the pairwise relationships among spatially neighboring components of the HOG feature.
Abstract: Histogram of Oriented Gradients (HOG) is a well-known feature for pedestrian recognition which describes object appearance as local histograms of gradient orientation. However, it is incapable of describing higher-order properties of object appearance. In this paper we present a second-order HOG feature which attempts to capture second-order properties of object appearance by estimating the pairwise relationships among spatially neighboring components of the HOG feature. In our preliminary experiments, we found that using the harmonic-mean or min function to measure the pairwise relationship gives satisfactory results. We demonstrate that the proposed second-order HOG feature can significantly improve on the HOG feature on several pedestrian datasets, and it is also competitive with other second-order features including GLAC and CoHOG.
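
The pairwise idea can be sketched as follows: for every pair of adjacent cells, each bin of one cell's histogram is combined with each bin of its neighbor's histogram using either `min` or the harmonic mean. Which neighbors and which bin pairs the paper actually uses is not stated in the abstract, so the right/down neighborhood, the all-pairs bin combination, and the function names here are assumptions.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two non-negative values; 0 when both are 0."""
    return 2.0 * a * b / (a + b) if a + b > 0 else 0.0


def second_order_hog(cells, pair_fn=min):
    """Pairwise features between spatially adjacent cell histograms.

    cells: 2D grid (list of rows) of per-cell orientation histograms.
    pair_fn: function combining two bin values, e.g. min or harmonic_mean.
    """
    rows, cols = len(cells), len(cells[0])
    feature = []
    for r in range(rows):
        for c in range(cols):
            # Right and down neighbors cover each adjacent pair exactly once.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    h1, h2 = cells[r][c], cells[nr][nc]
                    feature.extend(pair_fn(a, b) for a in h1 for b in h2)
    return feature
```

Both combining functions are large only when both bins are strongly activated, so each output entry responds to the co-occurrence of two orientations in neighboring cells rather than to either one alone.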

Proceedings Article
23 Jul 2009
TL;DR: An original approach for pedestrian detection using the neural network classifier called Concurrent Self-Organizing Maps (CSOM), previously introduced by the first author; it represents a winner-takes-all collection of neural modules.
Abstract: The paper presents an original approach for pedestrian detection using the neural network classifier called Concurrent Self-Organizing Maps (CSOM), previously introduced by the first author; it represents a winner-takes-all collection of neural modules. The algorithm has the following stages: (a) feature selection using one of three candidate techniques: Histogram of Oriented Gradients (HOG), 1D Haar transform, or 2D Haar transform; (b) classification using a CSOM classifier with two concurrent neural modules, where the first module is trained with pedestrian images and the second one is trained with non-pedestrian images. We present the experimental results obtained by computer simulation of our model. For training and testing the neural classifier, we have used the INRIA Person Dataset. The best Total Success Rate (TSR) obtained is 99.7%.

01 Jan 2009
TL;DR: This paper proposes two novel descriptors for training an ensemble using random subspace with a set of support vector machines based on Gabor filters and LBPs and combines these two sets of features with the histogram of gradients to obtain results comparable to using more sophisticated approaches.
Abstract: The most common method for handling human action classification is to determine a common set of optimal features and then apply a machine-learning algorithm to classify them. In this paper we explore combining sets of different features for training an ensemble using random subspace with a set of support vector machines. We propose two novel descriptors for this task domain: one based on Gabor filters and the other based on local binary patterns (LBPs). We then combine these two sets of features with the histogram of gradients. We obtain an accuracy of 97.8% using the 10-class Weizmann dataset and a 100% accuracy rate using the 9-class Weizmann dataset. These results are comparable with the state of the art. By combining sets of relatively simple descriptors it is possible to obtain results comparable to using more sophisticated approaches. Our simpler approach, however, offers the advantage of being less computationally expensive.

Proceedings ArticleDOI
01 Dec 2009
TL;DR: A method of identifying persons who are using wheelchairs in real environments on the basis of HOG features extracted from disparity images using USV to determine the 3-D location of the humans moving within the monitored area, and compute a HOG feature vector from those cut-out images.
Abstract: We propose a method of identifying persons who are using wheelchairs in real environments on the basis of HOG features extracted from disparity images. First, we use USV, which is a stereo vision system for detecting humans, to determine the 3-D location of the humans moving within the monitored area, and cut out the corresponding regions in the disparity image. Then, we compute a HOG feature vector from those cut-out images, divide the results into the two classes of persons in wheelchairs and pedestrians and perform training and distinguishing by SVM. The data dealt with here involved people moving around without restriction in real environments. Also, there was no guarantee that the photographed environment was the same at the time of training data acquisition and at the time of recognition processing. We conducted training and recognition experiments on this method using video data acquired in the laboratory that confirmed a per-frame recognition rate of 99% or higher. Furthermore, after training with the same laboratory data and application to video data of visitors to an assistive products exhibition, a per-frame correct recognition rate of 80% was confirmed.

Proceedings ArticleDOI
06 Nov 2009
TL;DR: A robust pedestrian detection algorithm in low resolution on-board monocular camera image sequences of cluttered scenes by developing a motion based object detection algorithm to detect foreground objects by analyzing horizontal motion vector.
Abstract: In this paper we present a robust pedestrian detection algorithm for low-resolution on-board monocular camera image sequences of cluttered scenes. First, a motion-based object detection algorithm is developed to detect foreground objects by analyzing the horizontal motion vector. A cascade structure of rejection-type classifiers is utilized for our pedestrian detection system. In the initial stage of the cascade, simple rule-based classification techniques are used to separate pedestrians from obvious roadside structural objects, and in the later part of the cascade, a more complex algorithm combining Histogram of Oriented Gradients (HOG) and Support Vector Machine (SVM) based classification techniques is utilized to separate pedestrians from non-pedestrian objects. Finally, the image segments are tracked by our Spatio-Temporal Markov Random Field model (S-T MRF). Results show that our algorithms are promising for pedestrian detection in cluttered scenes.

Dissertation
01 Sep 2009
TL;DR: A novel implementation of a pedestrian detector that runs on commodity graphics hardware and is up to 76 times faster than the authors' CPU-only implementation is produced, and it is shown how one can combine a variety of detection and tracking techniques to robustly handle event detection scenarios such as theft and left-luggage detection.
Abstract: Knowing who people are, where they are, what they are doing, and how they interact with other people and things is valuable from commercial, security, and space utilization perspectives. Video sensors backed by computer vision algorithms are a natural way to gather this data. Unfortunately, key technical issues persist in extracting features and models that are simultaneously efficient to compute and robust to issues such as adverse lighting conditions, distracting background motions, appearance changes over time, and occlusions. In this thesis, we present a set of techniques and model enhancements to better handle these problems, focusing on contributions in four areas. First, we improve background subtraction so it can better handle temporally irregular dynamic textures. This allows us to achieve a 5.5% drop in false positive rate on the Wallflower waving trees video. Secondly, we adapt the Dalal and Triggs Histogram of Oriented Gradients pedestrian detector to work on large-scale scenes with dense crowds and harsh lighting conditions: challenges which prevent us from easily using a background subtraction solution. These scenes contain hundreds of simultaneously visible people. To make using the algorithm computationally feasible, we have produced a novel implementation that runs on commodity graphics hardware and is up to 76 times faster than our CPU-only implementation. We demonstrate the utility of this detector by modeling scene-level activities with a Hierarchical Dirichlet Process. Third, we show how one can improve the quality of pedestrian silhouettes for recognizing individual people. We combine general appearance information from a large population of pedestrians with semi-periodic shape information from individual silhouette sequences. Finally, we show how one can combine a variety of detection and tracking techniques to robustly handle a variety of event detection scenarios such as theft and left-luggage detection.

Proceedings ArticleDOI
07 Dec 2009
TL;DR: A novel scheme where image features are bundled into local groups and achieves a substantial improvement in average precision over the baseline conventional HOG approach, which is less computationally expensive than existing approaches.
Abstract: In this paper we present a novel scheme where image features are bundled into local groups. Specifically, features of Near Infrared (NIR) images extracted by using Histogram of Oriented Gradients (HOG) descriptor and those by our multislit method are bundled into a single descriptor. The method involves first localizing the spatial layout of body parts (head, torso, and legs) in individual frames using multislit structures, and associating these through a series of extracting HOG features. A bundled feature vector describing various types of poses is then constructed and used for detecting the pedestrians. Experiments with a database of NIR images show that our scheme achieves a substantial improvement in average precision over the baseline conventional HOG approach. Detection and recognition performance is less computationally expensive than existing approaches.

Proceedings ArticleDOI
17 Sep 2009
TL;DR: A new framework integrating a Multiple-Stage Histogram of Oriented Gradients based human detector and the Particle Filter Gaussian Process Dynamical Model for multiple targets detection and tracking is proposed.
Abstract: Detection and tracking of a varying number of people is essential in surveillance sensor systems. In real applications, due to varied human appearance and confusors, as well as varied environmental conditions, multiple-target detection and tracking become even more challenging. In this paper, we propose a new framework integrating a Multiple-Stage Histogram of Oriented Gradients (HOG) based human detector and the Particle Filter Gaussian Process Dynamical Model (PFGPDM) for multiple-target detection and tracking. The Multiple-Stage HOG human detector takes advantage of both the HOG feature set and the human motion cues. The detector enables the framework to detect new targets entering the scene as well as to provide potential hypotheses for particle sampling in the PFGPDM. After processing the detection results, the motion of each new target is calculated and projected to the low-dimensional latent space of the GPDM to find the most similar trained motion trajectory. In addition, the particle propagation of existing targets integrates both the motion trajectory prediction in the latent space of GPDM and the hypotheses detected by the HOG human detector. Experimental tests are conducted on the IDIAP data set. The test results demonstrate that the proposed approach can robustly detect and track a varying number of targets with reasonable run-time overhead and performance.

Proceedings ArticleDOI
09 Jun 2009
TL;DR: A hybrid Rao-Blackwellized particle filter that combines two efficient, well-known tracking techniques with an innovative color observation representation method in order to improve the overall tracking performance is proposed.
Abstract: Camera-based supervision is a critical component of patient monitoring in assistive environments. However, visual tracking remains one of the biggest challenges in the area of computer vision, although it has been extensively studied during the previous decades. In this paper we propose a hybrid Rao-Blackwellized particle filter that combines two efficient, well-known tracking techniques with an innovative color observation representation method in order to improve the overall tracking performance. This representation is combined with color and edge representation to obtain improved tracking efficiency. Furthermore, the global edge description template for the edge representation (histogram of oriented gradients) was obtained using a machine learning technique. Initial experiments show that the principle behind the proposed algorithm is sound, yielding good results and thus allowing its adoption as an initial stage for patient behavior recognition.

Patent
01 Apr 2009
TL;DR: An object classification method and system are provided that prevent an input image from being influenced by the rotation of a subject, regardless of differences in image size arising from the distance between the camera and the subject and in the subject's inclination arising from the photographing angle.
Abstract: An object classification method and system are provided that prevent an input image from being influenced by the rotation of a subject, regardless of differences in image size arising from the distance between the camera and the subject and in the subject's inclination arising from the photographing angle. A central axis extracting unit (110) extracts the central axis of an inclined image. A central axis conversion unit (120) converts the extracted central axis so that it coincides with the y-axis. An image vector extraction unit (130) extracts a feature vector by dividing the converted image into a lattice of cells based on the HOG (Histogram of Oriented Gradients) algorithm. An image determination unit (140) determines whether the image shows a person or a vehicle by inputting the extracted feature vector into an LSVM (Linear Support Vector Machine).

Journal Article
TL;DR: The method has a good retrieval performance for images that contain a single background and obvious shape and is invariant to scale, translation and rotation.
Abstract: This paper describes a method for extracting image shape features based on the edge direction histogram. The extracted shape feature is invariant to scale, translation and rotation. The experimental results show that the method has good retrieval performance for images that contain a single background and an obvious shape.
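One way an edge direction histogram can be made insensitive to translation, scale and rotation is sketched below. The abstract does not say how the invariance is obtained, so the normalisation steps here (dividing by the total edge weight, then circularly shifting the dominant bin to the front) are assumptions, and `edge_direction_histogram` is a hypothetical helper name.

```python
import numpy as np

def edge_direction_histogram(image, bins=8, mag_threshold=1e-6):
    """Magnitude-weighted histogram of gradient directions (illustrative only)."""
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)                 # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)                    # edge strength per pixel
    ang = np.arctan2(gy, gx) % (2 * np.pi)    # edge direction in [0, 2*pi)
    mask = mag > mag_threshold                # ignore flat regions
    hist, _ = np.histogram(ang[mask], bins=bins,
                           range=(0, 2 * np.pi), weights=mag[mask])
    total = hist.sum()
    if total > 0:
        hist = hist / total                   # assumed scale normalisation
    # Assumed rotation normalisation: put the dominant direction in bin 0.
    return np.roll(hist, -int(np.argmax(hist)))

# A vertical step edge and the same edge rotated by 90 degrees.
step = np.zeros((16, 16))
step[:, 8:] = 1.0
print(np.allclose(edge_direction_histogram(step),
                  edge_direction_histogram(np.rot90(step))))
```

A histogram pools pixels regardless of position, which is what gives translation invariance. The circular shift is only exact for rotations that are multiples of the bin width; other angles are handled approximately.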

Patent
12 Nov 2009
TL;DR: The method involves comparing a compressed representation of a feature descriptor with a plurality of compressed representations of feature descriptors of respective predefined features, and the respective feature descriptor may be identified without having to first decompress the feature descriptor, thereby potentially increasing the efficiency of identifying feature descriptors.
Abstract: FIELD: information technology. SUBSTANCE: method and apparatus with a computer program product are provided for generating a plurality of compressed feature descriptors that can be represented by a relatively small number of bits, thereby simplifying transmission and storage of the feature descriptors. The method involves comparing a compressed representation of a feature descriptor with a plurality of compressed representations of feature descriptors of respective predefined features, and the respective feature descriptor may be identified without having to first decompress the feature descriptor, thereby potentially increasing the efficiency of identifying feature descriptors. EFFECT: faster and more accurate identification of features. 24 cl, 21 dwg
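The idea of comparing descriptors without decompressing them can be illustrated with a precomputed distance table between code indices. The abstract does not specify the compression scheme, so the tiny codebook, the chunk size of two values, and all function names below are assumptions made purely for illustration.

```python
from itertools import product

# Hypothetical codebook: each code index stands for a small 2-bin histogram.
CODEBOOK = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Distances between every pair of codes, computed once offline.
TABLE = {(i, j): squared_l2(CODEBOOK[i], CODEBOOK[j])
         for i, j in product(range(len(CODEBOOK)), repeat=2)}

def encode(descriptor):
    """Quantise consecutive 2-value chunks to the nearest codebook index."""
    codes = []
    for k in range(0, len(descriptor), 2):
        chunk = descriptor[k:k + 2]
        codes.append(min(range(len(CODEBOOK)),
                         key=lambda i: squared_l2(chunk, CODEBOOK[i])))
    return tuple(codes)

def compressed_distance(codes_a, codes_b):
    """Distance computed from code indices only; nothing is decoded."""
    return sum(TABLE[(a, b)] for a, b in zip(codes_a, codes_b))

def nearest_feature(query_codes, database):
    """Return the name of the stored feature closest to the query codes."""
    return min(database,
               key=lambda name: compressed_distance(query_codes, database[name]))

database = {"corner": encode([0.9, 0.1, 0.1, 0.9]),
            "edge": encode([0.5, 0.5, 0.4, 0.6])}
print(nearest_feature(encode([0.8, 0.2, 0.2, 0.8]), database))
```

Matching then costs one table lookup per chunk instead of a decode followed by a full floating-point comparison, which is where the efficiency gain in such schemes comes from.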

Proceedings ArticleDOI
TL;DR: A person detection algorithm that has been trained for different domains such as urban, rural and wooded scenes is presented, together with a sensor platform that can be mounted on a moving vehicle to collect video data of pedestrians.
Abstract: Semi-autonomous operation of intelligent vehicles may require that such platforms maintain a basic situational awareness with respect to people, other vehicles and their intent. These vehicles should be able to operate safely among people and other vehicles, and be able to perceive threats and respond accordingly. A key requirement is the ability to detect people and vehicles from a moving platform. We have developed one such algorithm using video cameras mounted on the vehicle. Our person detection algorithm models the shape and appearance of the person instead of modeling the background. The algorithm uses histograms of oriented gradients (HOG), which model shape and appearance using image edge histograms. These HOG descriptors are computed on an exhaustive set of image windows, which are then classified as person/non-person using a support vector machine classifier. The image windows are computed using camera calibration, which provides the approximate size of people with respect to their location in the imagery. The algorithm is flexible and has been trained for different domains such as urban, rural and wooded scenes. We have designed a sensor platform that can be mounted on a moving vehicle to collect video data of pedestrians. Using manually annotated ground-truth data, we have evaluated the person detection algorithm in terms of true positive and false positive rates. This paper provides a detailed overview of the algorithm, describes the experiments conducted and reports on algorithmic performance.
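The window-level pipeline described here (a HOG descriptor followed by a linear classifier) can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it omits the overlapping block normalisation of standard HOG, the cell size and bin count are assumed values, and `weights` and `bias` stand in for parameters that would come from a previously trained support vector machine.

```python
import numpy as np

def hog_descriptor(window, cell=8, bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientation."""
    img = np.asarray(window, dtype=float)
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi                  # unsigned orientation in [0, pi)
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    rows, cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell),
                  slice(c * cell, (c + 1) * cell))
            # Each pixel votes for its orientation bin, weighted by edge strength.
            hist[r, c] = np.bincount(bin_idx[sl].ravel(),
                                     weights=mag[sl].ravel(), minlength=bins)
    vec = hist.ravel()
    return vec / (np.linalg.norm(vec) + 1e-9)         # global L2 normalisation

def classify_window(window, weights, bias):
    """Linear decision rule: person if w.x + b > 0 (weights assumed pre-trained)."""
    return float(np.dot(weights, hog_descriptor(window)) + bias) > 0.0

# A 64x32 window yields 8 x 4 cells x 9 bins = 288 values.
window = np.tile(np.arange(32.0), (64, 1))
print(hog_descriptor(window).shape)
```

In a full detector this function would be applied to every candidate window. The abstract indicates that camera calibration is used to choose window sizes, which is not modelled in this sketch.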

Proceedings ArticleDOI
25 Jul 2009
TL;DR: A new approach to object detection based on spatial salience region features is introduced, which can preserve the shape and contour of an object and discriminates between object and non-object classes.
Abstract: This paper addresses the challenging problem of detecting objects in still images. A new approach to object detection based on spatial salience region features is introduced. The features consist of marginal distributions of an image over local and global patches. They can preserve the shape and contour of an object and discriminate between object and non-object classes. There are three main contributions in this paper. First, we extend the histogram of oriented gradients so that it automatically captures compact local and global features of an object by extracting features in salience regions only. Second, we employ feature similarity and the Fisher criterion to measure the discriminability of features and select discriminative features to identify the object. Third, a sparse Bayesian classifier, the relevance vector machine, is trained on the selected features from the target and the surrounding background. The proposed algorithm is tested on public databases and on pictures obtained from surveillance video. Experimental results show that the proposed approach is efficient and accurate in object detection.
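Feature selection with the Fisher criterion can be sketched as below. The abstract does not give the exact formula or how it is combined with feature similarity, so this uses the common two-class form (squared difference of class means over the sum of class variances) as an assumption, and the function names are hypothetical.

```python
from statistics import mean, pvariance

def fisher_score(pos_values, neg_values, eps=1e-12):
    """Two-class Fisher score for one feature: between-class over within-class spread."""
    between = (mean(pos_values) - mean(neg_values)) ** 2
    within = pvariance(pos_values) + pvariance(neg_values)
    return between / (within + eps)           # eps avoids division by zero

def select_features(pos, neg, k):
    """pos/neg are lists of feature vectors; return indices of the k best features."""
    n_features = len(pos[0])
    scores = [fisher_score([v[j] for v in pos], [v[j] for v in neg])
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

# Feature 0 separates the classes; feature 1 has identical class means.
object_samples = [[1.0, 5.0], [1.2, 1.0], [0.8, 3.0]]
background_samples = [[0.0, 4.0], [0.2, 2.0], [-0.2, 3.0]]
print(select_features(object_samples, background_samples, k=1))
```

Features with a high score have class means that are far apart relative to their spread, so keeping only the top-ranked ones shrinks the input to the downstream classifier.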

01 Jan 2009
TL;DR: A novel system is presented to detect and match any objects in a network of uncalibrated fixed and mobile cameras, and outperforms existing work such as scale invariant feature transform (SIFT), or the speeded up robust features (SURF).
Abstract: Most multi-camera systems assume a well-structured environment to detect and match objects across cameras. Cameras need to be fixed and calibrated. In this work, a novel system is presented to detect and match any objects in a network of uncalibrated fixed and mobile cameras. A master-slave system is presented. Objects are detected with the mobile cameras (the slaves) given only their observations from the fixed cameras (the masters). No training stage or training data is used. Detected objects are correctly matched across cameras, leading to a better understanding of the scene. A cascade of dense region descriptors is proposed to describe any object of interest. Various region descriptors are studied, such as color histograms, histograms of oriented gradients, Haar-wavelet responses, and covariance matrices of various features. The proposed approach outperforms existing work such as the scale-invariant feature transform (SIFT) and speeded-up robust features (SURF). Moreover, a sparse scan of the image plane is proposed to reduce the search space of the detection and matching process, approaching real-time performance. The approach is robust to changes in illumination, viewpoint, color distribution and image quality. Partial occlusions are also handled.