
Showing papers on "Object detection published in 2004"


Proceedings ArticleDOI
27 Jun 2004
TL;DR: On a highly cluttered segmentation/recognition task SVM proved impractical, while convolutional nets yielded 16.7% error; a real-time version of the system was implemented that can detect and classify objects in natural scenes at around 10 frames per second.
Abstract: We assess the applicability of several popular learning methods for the problem of recognizing generic visual categories with invariance to pose, lighting, and surrounding clutter. A large dataset comprising stereo image pairs of 50 uniform-colored toys under 36 azimuths, 9 elevations, and 6 lighting conditions was collected (for a total of 194,400 individual images). The objects were 10 instances of 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. Five instances of each category were used for training, and the other five for testing. Low-resolution grayscale images of the objects with various amounts of variability and surrounding clutter were used for training and testing. Nearest neighbor methods, support vector machines, and convolutional networks, operating on raw pixels or on PCA-derived features, were tested. Test error rates for unseen object instances placed on uniform backgrounds were around 13% for SVM and 7% for convolutional nets. On a segmentation/recognition task with highly cluttered images, SVM proved impractical, while convolutional nets yielded 16.7% error. A real-time version of the system was implemented that can detect and classify objects in natural scenes at around 10 frames per second.
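
For a sense of the kind of model compared here, a minimal convolutional net for low-resolution grayscale classification might look like the sketch below; the layer sizes, the 96×96 input, and the 5-way output are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    """A minimal convolutional net for low-resolution grayscale object
    classification, in the spirit of the experiments above (layer sizes
    and input resolution are illustrative assumptions)."""
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(8, 24, kernel_size=5), nn.ReLU(), nn.MaxPool2d(3),
        )
        self.classifier = nn.Linear(24 * 6 * 6, n_classes)

    def forward(self, x):  # x: (batch, 1, 96, 96)
        x = self.features(x)
        return self.classifier(x.flatten(1))

# usage: logits = SmallConvNet()(torch.randn(8, 1, 96, 96))  # -> (8, 5)
```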

1,509 citations


Journal ArticleDOI
TL;DR: Quantitative evaluation and comparison show that the proposed Bayesian framework for foreground object detection in complex environments provides much improved results.
Abstract: This paper addresses the problem of background modeling for foreground object detection in complex environments. A Bayesian framework that incorporates spectral, spatial, and temporal features to characterize the background appearance is proposed. Under this framework, the background is represented by the most significant and frequent features, i.e., the principal features , at each pixel. A Bayes decision rule is derived for background and foreground classification based on the statistics of principal features. Principal feature representation for both the static and dynamic background pixels is investigated. A novel learning method is proposed to adapt to both gradual and sudden "once-off" background changes. The convergence of the learning process is analyzed and a formula to select a proper learning rate is derived. Under the proposed framework, a novel algorithm for detecting foreground objects from complex environments is then established. It consists of change detection, change classification, foreground segmentation, and background maintenance. Experiments were conducted on image sequences containing targets of interest in a variety of environments, e.g., offices, public buildings, subway stations, campuses, parking lots, airports, and sidewalks. Good results of foreground detection were obtained. Quantitative evaluation and comparison with the existing method show that the proposed method provides much improved results.
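
A toy sketch of the per-pixel Bayes decision rule described above, assuming feature values quantized into a small table of "principal" values per pixel; the table size and learning rate below are illustrative, not the paper's values:

```python
import numpy as np

N_FEATURES = 32  # illustrative per-pixel table size

class PixelModel:
    """Per-pixel statistics of quantized principal feature values."""
    def __init__(self):
        self.p_v = np.full(N_FEATURES, 1.0 / N_FEATURES)    # P(v): feature frequency
        self.p_v_b = np.full(N_FEATURES, 1.0 / N_FEATURES)  # P(v | background)
        self.p_b = 0.5                                      # P(background)

    def is_foreground(self, v):
        # Bayes rule: classify as background iff P(b|v) > P(f|v),
        # i.e. 2 * P(v|b) * P(b) > P(v); otherwise foreground.
        return 2.0 * self.p_v_b[v] * self.p_b <= self.p_v[v]

    def update(self, v, is_bg, lr=0.005):
        # gradual adaptation of the statistics (learning rate is illustrative)
        one_hot = np.zeros(N_FEATURES)
        one_hot[v] = 1.0
        self.p_v = (1 - lr) * self.p_v + lr * one_hot
        if is_bg:
            self.p_v_b = (1 - lr) * self.p_v_b + lr * one_hot
        self.p_b = (1 - lr) * self.p_b + lr * (1.0 if is_bg else 0.0)
```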

1,120 citations


Journal ArticleDOI
TL;DR: A learning-based approach to the problem of detecting objects in still, gray-scale images that makes use of a sparse, part-based representation is developed and a critical evaluation of the approach under the proposed standards is presented.
Abstract: We study the problem of detecting objects in still, gray-scale images. Our primary focus is the development of a learning-based approach to the problem that makes use of a sparse, part-based representation. A vocabulary of distinctive object parts is automatically constructed from a set of sample images of the object class of interest; images are then represented using parts from this vocabulary, together with spatial relations observed among the parts. Based on this representation, a learning algorithm is used to automatically learn to detect instances of the object class in new images. The approach can be applied to any object with distinguishable parts in a relatively fixed spatial configuration; it is evaluated here on difficult sets of real-world images containing side views of cars, and is seen to successfully detect objects in varying conditions amidst background clutter and mild occlusion. In evaluating object detection approaches, several important methodological issues arise that have not been satisfactorily addressed in the previous work. A secondary focus of this paper is to highlight these issues, and to develop rigorous evaluation standards for the object detection problem. A critical evaluation of our approach under the proposed standards is presented.

970 citations


Proceedings ArticleDOI
27 Jun 2004
TL;DR: A multi-class boosting procedure (joint boosting) is presented that reduces both the computational and sample complexity, by finding common features that can be shared across the classes.
Abstract: We consider the problem of detecting a large number of different object classes in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, which can be slow and require much training data. We present a multi-class boosting procedure (joint boosting) that reduces both the computational and sample complexity, by finding common features that can be shared across the classes. The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required is observed to scale approximately logarithmically with the number of classes. In addition, we find that the features selected by independently trained classifiers are often specific to the class, whereas the features selected by the jointly trained classifiers are more generic features, such as lines and edges.
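
A toy illustration of the feature-sharing idea, assuming decision stumps over precomputed scalar feature responses. The subset search below is exhaustive for clarity (the paper uses a greedy approximation), and classes outside the sharing subset fall back to their best constant prediction:

```python
import numpy as np
from itertools import combinations

def shared_stump_cost(feat, thresh, weights, labels, subset, n_classes):
    """Total weighted error when the stump (feat > thresh) is shared by the
    classes in `subset`; other classes use their best constant prediction.
    feat: (n,) responses; labels, weights: (n_classes, n), labels in {-1,+1}."""
    pred = np.where(feat > thresh, 1, -1)
    cost = 0.0
    for c in range(n_classes):
        if c in subset:
            cost += np.sum(weights[c] * (pred != labels[c]))
        else:
            cost += min(np.sum(weights[c] * (labels[c] != 1)),
                        np.sum(weights[c] * (labels[c] != -1)))
    return cost

def pick_shared_feature(features, thresholds, weights, labels, n_classes):
    """Search every feature, threshold, and class subset for the cheapest
    shared stump (exponential in n_classes; for illustration only)."""
    best_cost, best = np.inf, None
    for f_idx, feat in enumerate(features):
        for t in thresholds:
            for k in range(1, n_classes + 1):
                for subset in combinations(range(n_classes), k):
                    c = shared_stump_cost(feat, t, weights, labels,
                                          subset, n_classes)
                    if c < best_cost:
                        best_cost, best = c, (f_idx, t, subset)
    return best, best_cost
```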

621 citations


Proceedings ArticleDOI
B. Froba1, A. Ernst1
17 May 2004
TL;DR: Illumination-invariant local structure features for object detection are introduced, computed efficiently with a modified census transform that enhances the original work of Zabih and Woodfill, and combined with an efficient four-stage classifier for rapid detection.
Abstract: Illumination variation is a big problem in object recognition, which usually requires a costly compensation prior to classification. It would be desirable to have an image-to-image transform which uncovers only the structure of an object for efficient matching. In this context the contribution of our work is two-fold. First, we introduce illumination-invariant local structure features for object detection. For an efficient computation we propose a modified census transform which enhances the original work of Zabih and Woodfill. We show some shortcomings of the original transform and how to overcome them with the modified version. Secondly, we introduce an efficient four-stage classifier for rapid detection. Each single stage classifier is a linear classifier which consists of a set of feature lookup-tables. We show that the first stage, which evaluates only 20 features, filters out more than 99% of all background positions. Thus, the classifier structure is much simpler than previously described multi-stage approaches, while having similar capabilities. The combination of illumination-invariant features together with a simple classifier leads to a real-time system on standard computers (60 ms, image size: 288×384, 2 GHz Pentium). Detection results are presented on two commonly used databases in this field, namely the MIT+CMU set of 130 images and the BioID set of 1526 images. We achieve detection rates of more than 90% with a very low false positive rate of 10^-7%. We also provide a demo program that can be found on the Internet: http://www.iis.fraunhofer.de/bv/biometrie/download/.
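
A minimal numpy sketch of the modified census transform as commonly described: each pixel is encoded by comparing its 3×3 neighborhood against the neighborhood mean rather than against the center pixel, yielding a 9-bit index suitable for feature lookup-tables like those mentioned above:

```python
import numpy as np

def modified_census_transform(img):
    """Modified census transform (after Froba & Ernst): each pixel gets a
    9-bit index encoding which pixels of its 3x3 neighborhood exceed the
    neighborhood mean. The original census transform compares against the
    center pixel instead, ignoring the center's own intensity."""
    img = img.astype(np.float32)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint16)
    # the 9 shifted views of the image covering every 3x3 neighborhood
    patches = [img[i:h - 2 + i, j:w - 2 + j] for i in range(3) for j in range(3)]
    mean = sum(patches) / 9.0
    for bit, p in enumerate(patches):
        out |= (p > mean).astype(np.uint16) << bit
    return out  # values in [0, 511], usable as lookup-table indices
```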

534 citations


Proceedings ArticleDOI
14 Jun 2004
TL;DR: The functional and architectural breakdown of a monocular pedestrian detection system is described, together with an approach for single-frame classification based on a novel scheme of breaking down the class variability by repeatedly training a set of relatively simple classifiers on clusters of the training set.
Abstract: We describe the functional and architectural breakdown of a monocular pedestrian detection system. We describe in detail our approach for single-frame classification based on a novel scheme of breaking down the class variability by repeatedly training a set of relatively simple classifiers on clusters of the training set. Single-frame classification performance results and system-level performance figures for daytime conditions are presented, with a discussion of the remaining gap to a production system for daytime, normal-weather conditions.

424 citations


Proceedings ArticleDOI
27 Jun 2004
TL;DR: This work shows that using local edge orientation histograms (EOH) as features can significantly improve performance compared to the standard linear features used in existing systems, and enables learning a system that seems to outperform the state of the art in real-time systems even with a small number of training examples.
Abstract: Face detection systems have recently achieved high detection rates and real-time performance. However, these methods usually rely on a huge training database (around 5,000 positive examples for good performance). While such huge databases may be feasible for building a system that detects a single object, it is obviously problematic for scenarios where multiple objects (or multiple views of a single object) need to be detected. Indeed, even for multi-view face detection the performance of existing systems is far from satisfactory. In this work we focus on the problem of learning to detect objects from a small training database. We show that performance depends crucially on the features that are used to represent the objects. Specifically, we show that using local edge orientation histograms (EOH) as features can significantly improve performance compared to the standard linear features used in existing systems. For frontal faces, local orientation histograms enable state of the art performance using only a few hundred training examples. For profile view faces, local orientation histograms enable learning a system that seems to outperform the state of the art in real-time systems even with a small number of training examples.
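
One plausible reading of EOH features, sketched below: build one integral image per quantized gradient-orientation bin, so the edge-orientation histogram of any rectangle can be read off in constant time. The bin count and binning scheme are assumptions, not necessarily the paper's exact choices:

```python
import numpy as np

def edge_orientation_integrals(img, n_bins=6):
    """Per-bin integral images of gradient magnitude (bin count illustrative)."""
    img = img.astype(np.float32)
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi                    # orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    integrals = []
    for b in range(n_bins):
        channel = np.where(bins == b, mag, 0.0)
        integrals.append(channel.cumsum(0).cumsum(1))   # 2D integral image
    return integrals

def rect_histogram(integrals, top, left, bottom, right):
    """Edge-orientation histogram of a rectangle in O(n_bins) lookups."""
    def rect_sum(ii):
        s = ii[bottom, right]
        if top > 0:
            s -= ii[top - 1, right]
        if left > 0:
            s -= ii[bottom, left - 1]
        if top > 0 and left > 0:
            s += ii[top - 1, left - 1]
        return s
    return np.array([rect_sum(ii) for ii in integrals])
```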

411 citations


Journal ArticleDOI
TL;DR: A trainable object detector achieves reliable and efficient detection of human faces and passenger cars with out-of-plane rotation.
Abstract: In this paper we describe a trainable object detector and its instantiations for detecting faces and cars at any size, location, and pose. To cope with variation in object orientation, the detector uses multiple classifiers, each spanning a different range of orientation. Each of these classifiers determines whether the object is present at a specified size within a fixed-size image window. To find the object at any location and size, these classifiers scan the image exhaustively. Each classifier is based on the statistics of localized parts. Each part is a transform from a subset of wavelet coefficients to a discrete set of values. Such parts are designed to capture various combinations of locality in space, frequency, and orientation. In building each classifier, we gathered the class-conditional statistics of these part values from representative samples of object and non-object images. We trained each classifier to minimize classification error on the training set by using AdaBoost with Confidence-Weighted Predictions (Schapire and Singer, 1999). In detection, each classifier computes the part values within the image window and looks up their associated class-conditional probabilities. The classifier then makes a decision by applying a likelihood ratio test. For efficiency, the classifier evaluates this likelihood ratio in stages. At each stage, the classifier compares the partial likelihood ratio to a threshold and makes a decision about whether to cease evaluation, labeling the input as non-object, or to continue further evaluation. The detector orders these stages of evaluation from a low-resolution to a high-resolution search of the image. Our trainable object detector achieves reliable and efficient detection of human faces and passenger cars with out-of-plane rotation.
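
A schematic of the staged likelihood-ratio evaluation described above; the stage functions and thresholds here are placeholders standing in for the paper's learned part-based stages:

```python
def cascaded_likelihood_ratio(stages, window, final_threshold=0.0,
                              stage_thresholds=None):
    """Staged evaluation of a log-likelihood-ratio test. Each entry of
    `stages` is a function returning the sum, over the parts it evaluates
    on `window`, of log P(part|object) - log P(part|non-object).
    Stage thresholds below are illustrative placeholders."""
    if stage_thresholds is None:
        stage_thresholds = [-2.0] * len(stages)  # assumed values
    llr = 0.0
    for stage, t in zip(stages, stage_thresholds):
        llr += stage(window)
        if llr < t:         # early rejection: label the input as non-object
            return False
    return llr >= final_threshold
```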

399 citations


Proceedings Article
01 Dec 2004
TL;DR: This work introduces Boosted Random Fields (BRFs), which uses Boosting to learn the graph structure and local evidence of a conditional random field (CRF) and applies it to detect stuff and things in office and street scenes.
Abstract: We seek to both detect and segment objects in images. To exploit both local image data as well as contextual information, we introduce Boosted Random Fields (BRFs), which uses Boosting to learn the graph structure and local evidence of a conditional random field (CRF). The graph structure is learned by assembling graph fragments in an additive model. The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. We apply our system to detect stuff and things in office and street scenes.

369 citations


Proceedings ArticleDOI
27 Jun 2004
TL;DR: A method is presented that can track humans in crowded environments with significant and persistent occlusion, by making use of human shape models in addition to camera models, the assumption that humans walk on a plane, and acquired appearance models.
Abstract: Tracking of humans in dynamic scenes has been an important topic of research. Most techniques, however, are limited to situations where humans appear isolated and occlusion is small. Typical methods rely on appearance models that must be acquired when the humans enter the scene and are not occluded. We present a method that can track humans in crowded environments, with significant and persistent occlusion, by making use of human shape models in addition to camera models, the assumption that humans walk on a plane, and acquired appearance models. Experimental results and a quantitative evaluation are included.

355 citations


Proceedings ArticleDOI
27 Jun 2004
TL;DR: This work shows how to combine bottom-up and top-down approaches into a single figure-ground segmentation process that provides accurate delineation of object boundaries that cannot be achieved by either the top-down or bottom-up approach alone.
Abstract: In this work we show how to combine bottom-up and top-down approaches into a single figure-ground segmentation process. This process provides accurate delineation of object boundaries that cannot be achieved by either the top-down or bottom-up approach alone. The top-down approach uses object representation learned from examples to detect an object in a given input image and provide an approximation to its figure-ground segmentation. The bottom-up approach uses image-based criteria to define coherent groups of pixels that are likely to belong together to either the figure or the background part. The combination provides a final segmentation that draws on the relative merits of both approaches: The result is as close as possible to the top-down approximation, but is also constrained by the bottom-up process to be consistent with significant image discontinuities. We construct a global cost function that represents these top-down and bottom-up requirements. We then show how the global minimum of this function can be efficiently found by applying the sum-product algorithm. This algorithm also provides a confidence map that can be used to identify image regions where additional top-down or bottom-up information may further improve the segmentation. Our experiments show that the results derived from the algorithm are superior to results given by a pure top-down or pure bottom-up approach. The scheme has broad applicability, enabling the combined use of a range of existing bottom-up and top-down segmentations.

Proceedings ArticleDOI
27 Jun 2004
TL;DR: The paper investigates the unsupervised learning of a model of activity for a multi-camera surveillance network from a large set of observations, enabling the learning algorithm to establish links between camera views associated with an activity.
Abstract: The paper investigates the unsupervised learning of a model of activity for a multi-camera surveillance network that can be created from a large set of observations. This enables the learning algorithm to establish links between camera views associated with an activity. The learning algorithm operates in a correspondence-free manner, exploiting the statistical consistency of the observation data. The derived model is used to automatically determine the topography of a network of cameras and to provide a means for tracking targets across the "blind" areas of the network. A theoretical justification and experimental validation of the methods are provided.

Proceedings ArticleDOI
28 Sep 2004
TL;DR: A new method is presented for detecting triangular, square and octagonal road signs efficiently and robustly; it uses the symmetric nature of these shapes, together with the pattern of edge orientations exhibited by equiangular polygons with a known number of sides, to establish possible shape centroid locations.
Abstract: A new method is presented for detecting triangular, square and octagonal road signs efficiently and robustly. The method uses the symmetric nature of these shapes, together with the pattern of edge orientations exhibited by equiangular polygons with a known number of sides, to establish possible shape centroid locations in the image. This approach is invariant to in-plane rotation and returns the location and size of the shape detected. Results on still images show a detection rate of over 95%. The method is efficient enough for real-time applications, such as on-board-vehicle sign detection.
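
A simplified sketch of centroid voting in the spirit of this method: each side of an equiangular polygon lies at a fixed perpendicular distance from the centroid, so every strong edge pixel casts votes along its gradient direction into an accumulator, and peaks become centroid candidates for that scale. The thresholds are illustrative, and the full method also uses a line of votes and angle-dependent weighting:

```python
import numpy as np

def centroid_votes(img, radius, grad_thresh=30.0):
    """Vote accumulator for polygon centroids at one candidate radius
    (simplified; thresholds and the single-point vote are illustrative)."""
    img = img.astype(np.float32)
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    acc = np.zeros_like(img)
    ys, xs = np.nonzero(mag > grad_thresh)
    for y, x in zip(ys, xs):
        ux, uy = gx[y, x] / mag[y, x], gy[y, x] / mag[y, x]
        for s in (+1, -1):  # the gradient's sign is unknown: vote both ways
            vy = int(round(y + s * radius * uy))
            vx = int(round(x + s * radius * ux))
            if 0 <= vy < acc.shape[0] and 0 <= vx < acc.shape[1]:
                acc[vy, vx] += 1.0
    return acc  # peaks are candidate shape centroids at this radius
```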

Proceedings ArticleDOI
23 Aug 2004
TL;DR: Summarisation in terms of semantic regions is demonstrated on acted scenes through automatic recovery of the instructions given to the actor, and the use of 'unusual inactivity' detection as a cue for fall detection is also demonstrated.
Abstract: Automatic semantic summarisation of human activity and detection of unusual inactivity are useful goals for a vision system operating in a supportive home environment. Learned models of spatial context are used in conjunction with a tracker to achieve these goals. The tracker uses a coarse ellipse model and a particle filter to cope with cluttered scenes with multiple sources of illumination. Summarisation in terms of semantic regions is demonstrated using acted scenes through automatic recovery of the instructions given to the actor. The use of 'unusual inactivity' detection as a cue for fall detection is also demonstrated.

Book ChapterDOI
11 May 2004
TL;DR: The first stage of a new learning system for object detection and recognition is described, with Boosting as the underlying learning technique; the flexibility to include features from segmented regions and even spatial relationships is a significant step towards generic object recognition.
Abstract: In this paper we describe the first stage of a new learning system for object detection and recognition. For our system we propose Boosting (5) as the underlying learning technique. This allows the use of very diverse sets of visual features in the learning process within a common framework: Boosting, together with a weak hypotheses finder, may choose very inhomogeneous features as most relevant for combination into a final hypothesis. As another advantage, the weak hypotheses finder may search the weak hypotheses space without explicit calculation of all available hypotheses, reducing computation time. This contrasts with the related work of Agarwal and Roth (1), where Winnow was used as learning algorithm and all weak hypotheses were calculated explicitly. In our first empirical evaluation we use four types of local descriptors: two basic ones consisting of a set of gray values and intensity moments, and two high-level descriptors: moment invariants (8) and SIFTs (12). The descriptors are calculated from local patches detected by an interest point operator. The weak hypotheses finder selects one of the local patches and one type of local descriptor and efficiently searches for the most discriminative similarity threshold. This differs from other work on Boosting for object recognition where simple rectangular hypotheses (22) or complex classifiers (20) have been used. In relatively simple images, where the objects are prominent, our approach yields results comparable to the state of the art (3). But we also obtain very good results on more complex images, where the objects are located in arbitrary positions, poses, and scales. These results indicate that our flexible approach, which also allows the inclusion of features from segmented regions and even spatial relationships, takes us a significant step towards generic object recognition.

Proceedings ArticleDOI
24 Aug 2004
TL;DR: This paper proposes a prediction-based energy saving scheme, called PES, to reduce the energy consumption for object tracking under acceptable conditions, and compares PES against the basic schemes proposed in the paper to explore the conditions under which PES is most desired.
Abstract: In order to fully realize the potential of sensor networks, energy awareness should be incorporated into every stage of the network design and operation. In this paper, we address the energy management issue in a sensor network killer application - object tracking sensor networks (OTSNs). Based on the fact that the movements of the tracked objects are sometimes predictable, we propose a prediction-based energy saving scheme, called PES, to reduce the energy consumption for object tracking under acceptable conditions. We compare PES against the basic schemes we proposed in the paper to explore the conditions under which PES is most desired. We also test the effect of some parameters related to the system workload, object moving behavior and sensing operations on PES through extensive simulation. Our results show that PES can save significant energy under various conditions.
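
A minimal sketch of the prediction-based idea: extrapolate the tracked object's next position and wake only nearby sensors. The linear extrapolation and wake radius below are illustrative stand-ins for the prediction heuristics and parameters evaluated in the paper:

```python
import math

def predict_next(positions):
    """Linear motion prediction from the last two observed positions
    (one simple instance of the prediction step in a PES-style scheme)."""
    (x0, y0), (x1, y1) = positions[-2], positions[-1]
    return (x1 + (x1 - x0), y1 + (y1 - y0))

def nodes_to_wake(sensors, predicted, wake_radius):
    """Wake only sensors within `wake_radius` of the predicted location;
    the rest stay asleep, saving energy when the prediction holds."""
    px, py = predicted
    return [name for name, (sx, sy) in sensors.items()
            if math.hypot(sx - px, sy - py) <= wake_radius]

# usage: nodes_to_wake({'a': (4, 2), 'b': (9, 9)},
#                      predict_next([(0, 0), (2, 1)]),
#                      wake_radius=2.5)   # -> ['a']
```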

Journal ArticleDOI
TL;DR: This paper proposes a local intensity normalization method to effectively handle lighting variations, followed by a Gabor transform to obtain local features, and finally a linear discriminant analysis (LDA) method for feature selection.
Abstract: In this paper, we present an approach to automatic detection and recognition of signs from natural scenes, and its application to a sign translation task. The proposed approach embeds multiresolution and multiscale edge detection, adaptive searching, color analysis, and affine rectification in a hierarchical framework for sign detection, with different emphases at each phase to handle the text in different sizes, orientations, color distributions and backgrounds. We use affine rectification to recover deformation of the text regions caused by an inappropriate camera view angle. The procedure can significantly improve text detection rate and optical character recognition (OCR) accuracy. Instead of using binary information for OCR, we extract features from an intensity image directly. We propose a local intensity normalization method to effectively handle lighting variations, followed by a Gabor transform to obtain local features, and finally a linear discriminant analysis (LDA) method for feature selection. We have applied the approach in developing a Chinese sign translation system, which can automatically detect and recognize Chinese signs as input from a camera, and translate the recognized text into English.
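
The local intensity normalization step lends itself to a short sketch: subtract a local mean and divide by a local standard deviation, which removes slowly varying lighting before the Gabor features are extracted. The window size is an assumption, not necessarily the paper's setting:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_intensity_normalize(img, window=15, eps=1e-6):
    """Lighting-robust preprocessing: per-pixel normalization by local
    mean and standard deviation (window size is illustrative)."""
    img = img.astype(np.float32)
    mean = uniform_filter(img, size=window)
    sq_mean = uniform_filter(img * img, size=window)
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))
    return (img - mean) / (std + eps)
```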

Journal ArticleDOI
TL;DR: The problem of separating moving cast shadows from the moving objects in an outdoor environment is addressed with an approach based on a new spatio-temporal albedo test and the dichromatic reflection model, which accounts for both sun and sky illumination.
Abstract: Current moving object detection systems typically detect shadows cast by the moving object as part of the moving object. In this paper, the problem of separating moving cast shadows from the moving objects in an outdoor environment is addressed. Unlike previous work, we present an approach that does not rely on any geometrical assumptions such as camera location and ground surface/object geometry. The approach is based on a new spatio-temporal albedo test and dichromatic reflection model and accounts for both the sun and the sky illuminations. Results are presented for several video sequences representing a variety of ground materials when the shadows are cast on different surface types. These results show that our approach is robust to widely different background and foreground materials, and illuminations.
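
A simplified albedo-ratio shadow test in the spirit of the approach above: a cast shadow darkens the background by a roughly channel-uniform factor, so the per-channel ratio of frame to background should be below 1 and nearly equal across R, G, B. All thresholds here are illustrative; the paper's actual test is spatio-temporal and explicitly models sun and sky illumination:

```python
import numpy as np

def shadow_candidate_mask(frame, background, lo=0.4, hi=0.95, tol=0.1):
    """Flag pixels whose per-channel attenuation versus the background
    model is uniform and moderate, i.e. shadow-like (thresholds assumed)."""
    f = frame.astype(np.float32) + 1.0       # +1 avoids division by zero
    b = background.astype(np.float32) + 1.0
    ratio = f / b                            # per-channel attenuation
    mean_r = ratio.mean(axis=2, keepdims=True)
    uniform = np.all(np.abs(ratio - mean_r) < tol, axis=2)
    darkened = (mean_r[..., 0] > lo) & (mean_r[..., 0] < hi)
    return uniform & darkened
```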

Proceedings ArticleDOI
17 May 2004
TL;DR: A novel, unsupervised approach is presented for training an efficient and robust detector that is capable of not only detecting the presence of human hands within an image but also classifying the hand shape.
Abstract: The ability to detect a person's unconstrained hand in a natural video sequence has applications in sign language, gesture recognition and HCI. This paper presents a novel, unsupervised approach to training an efficient and robust detector which is capable of not only detecting the presence of human hands within an image but also classifying the hand shape. A database of images is first clustered using a k-method clustering algorithm with a distance metric based upon shape context. From this, a tree structure of boosted cascades is constructed. The head of the tree provides a general hand detector, while the individual branches of the tree classify a valid shape as belonging to one of the predetermined clusters exemplified by an indicative hand shape. Preliminary experiments showed that the approach boasts a promising 99.8% success rate on hand detection and 97.4% success at classification. Although we demonstrate the approach within the domain of hand shapes, it is equally applicable to other problems where both detection and classification are required for objects that display high variability in appearance.

Journal ArticleDOI
TL;DR: It is argued that feature selection is an important problem in object detection and demonstrated that genetic algorithms (GAs) provide a simple, general, and powerful framework for selecting good subsets of features, leading to improved detection rates.

Proceedings ArticleDOI
14 Jun 2004
TL;DR: The PROTECTOR system combines pedestrian detection, trajectory estimation, risk assessment and driver warning, and an optimization scheme models the system as a succession of individual modules and finds a good overall parameter setting by combining individual ROCs using a convex-hull technique.
Abstract: This paper presents the results of the first large-scale field tests on vision-based pedestrian protection from a moving vehicle. Our PROTECTOR system combines pedestrian detection, trajectory estimation, risk assessment and driver warning. The paper pursues a "system approach" related to the detection component. An optimization scheme models the system as a succession of individual modules and finds a good overall parameter setting by combining individual ROCs using a convex-hull technique. On the experimental side, we present a methodology for the validation of pedestrian detection performance in an actual vehicle setting. We hope this test methodology will contribute towards the establishment of benchmark testing, enabling this application to mature. We validate the PROTECTOR system using the proposed methodology and present interesting quantitative results based on tens of thousands of images from hours of driving. Although results are promising, more research is needed before such systems can be placed in the hands of ordinary vehicle drivers.
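
The convex-hull step can be pictured as follows: given operating points (false-positive rate, detection rate) from candidate parameter settings, keep only the upper hull, since every point below it is dominated by an interpolation of hull points. The input points in the usage line are made up for illustration:

```python
def roc_upper_hull(points):
    """Upper convex hull of ROC operating points (fp_rate, tp_rate),
    i.e. the non-dominated settings reachable by interpolation."""
    pts = sorted(points)  # sort by FP rate, then TP rate
    hull = []
    for p in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # drop the middle point if it lies on or below the new chord
            if (x2 - x1) * (p[1] - y1) >= (p[0] - x1) * (y2 - y1):
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

# usage: roc_upper_hull([(0.1, 0.6), (0.2, 0.7), (0.3, 0.72), (0.4, 0.9)])
#        -> [(0.1, 0.6), (0.4, 0.9)]
```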

Proceedings ArticleDOI
03 Oct 2004
TL;DR: This paper provides a critical survey of recent vision-based on-road vehicle detection systems appeared in the literature (i.e., the cameras are mounted on the vehicle rather than being static such as in traffic/driveway monitoring systems).
Abstract: As one of the most promising applications of computer vision, vision-based vehicle detection for driver assistance has received considerable attention over the last 15 years. There are at least three reasons for the blooming research in this field: first, the startling losses both in human lives and finance caused by vehicle accidents; second, the availability of feasible technologies accumulated within the last 30 years of computer vision research; and third, the exponential growth of processor speed has paved the way for running computation-intensive video-processing algorithms even on a low-end PC in real time. This paper provides a critical survey of recent vision-based on-road vehicle detection systems that have appeared in the literature (i.e., systems where the cameras are mounted on the vehicle rather than being static, as in traffic/driveway monitoring systems).

Journal ArticleDOI
TL;DR: It is demonstrated that an acceptable, expedient solution of the energy functional is possible through a search of the image-level lines: boundaries of connected components within the level sets obtained by threshold decomposition.
Abstract: We propose a cell detection and tracking solution using image-level sets computed via threshold decomposition. In contrast to existing methods where manual initialization is required to track individual cells, the proposed approach can automatically identify and track multiple cells by exploiting the shape and intensity characteristics of the cells. The capture of the cell boundary is considered as an evolution of a closed curve that maximizes image gradient along the curve enclosing a homogeneous region. An energy functional dependent upon the gradient magnitude along the cell boundary, the region homogeneity within the cell boundary and the spatial overlap of the detected cells is minimized using a variational approach. For tracking between frames, this energy functional is modified considering the spatial and shape consistency of a cell as it moves in the video sequence. The integrated energy functional complements shape-based segmentation with a spatial consistency based tracking technique. We demonstrate that an acceptable, expedient solution of the energy functional is possible through a search of the image-level lines: boundaries of connected components within the level sets obtained by threshold decomposition. The level set analysis can also capture multiple cells in a single frame rather than iteratively computing a single active contour for each individual cell. Results of cell detection using the energy functional approach and the level set approach are presented along with the associated processing time. Results of successful tracking of rolling leukocytes from a number of digital video sequences are reported and compared with the results from a correlation tracking scheme.
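
The threshold-decomposition step is easy to sketch: each threshold yields a binary level set whose connected components are candidate closed boundaries (candidate cells), which the method then scores with its energy functional. The scoring is omitted below and the thresholds are assumed inputs:

```python
from scipy.ndimage import label, find_objects

def level_set_components(img, thresholds):
    """Threshold decomposition: connected components of each binary level
    set are candidate cell regions (energy-based scoring omitted)."""
    candidates = []
    for t in thresholds:
        level_set = img >= t
        labels, _n = label(level_set)            # connected components
        for comp_slice in find_objects(labels):
            candidates.append((t, comp_slice))   # (threshold, bounding slice)
    return candidates
```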

Proceedings ArticleDOI
27 Jun 2004
TL;DR: In this article, the authors introduce a new class of distinguished regions based on detecting the most salient convex local arrangements of contours in the image, which are used in a similar way to the local interest points extracted from gray-level images.
Abstract: We introduce a new class of distinguished regions based on detecting the most salient convex local arrangements of contours in the image. The regions are used in a similar way to the local interest points extracted from gray-level images, but they capture shape rather than texture. Local convexity is characterized by measuring the extent to which the detected image contours support circle or arc-like local structures at each position and scale in the image. Our saliency measure combines two cost functions defined on the tangential edges near the circle: a tangential-gradient energy term, and an entropy term that ensures local support from a wide range of angular positions around the circle. The detected regions are invariant to scale changes and rotations, and robust against clutter, occlusions and spurious edge detections. Experimental results show very good performance for both shape matching and recognition of object categories.

01 Jan 2004
TL;DR: A thorough evaluation clearly demonstrates that the bag of keypoints method is robust to background clutter and produces good categorization accuracy even without exploiting geometric information.
Abstract: We present a novel method for generic visual categorization: the problem of identifying the object content of natural images while generalizing across variations inherent to the object class. This bag of keypoints method is based on vector quantization of affine invariant descriptors of image patches. We propose and compare two alternative implementations using different classifiers: Naive Bayes and SVM. The main advantages of the method are that it is simple, computationally efficient and intrinsically invariant. We present results for classifying nine semantic visual categories and comment on results obtained by Fergus et al. using a different method on the same data set. We obtain excellent results both for multi-class categorization and for object detection. A thorough evaluation clearly demonstrates that our method is robust to background clutter and produces good categorization accuracy even without exploiting geometric information.
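
A compact sketch of a bag-of-keypoints pipeline under common assumptions: plain k-means for the vector quantization and a linear SVM on top (the paper compares Naive Bayes and SVM classifiers, and its descriptor and clustering details may differ):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def build_vocabulary(per_image_descriptors, k=200):
    """Vector-quantize local patch descriptors into a visual vocabulary
    (k and plain k-means are illustrative choices)."""
    return KMeans(n_clusters=k, n_init=10).fit(np.vstack(per_image_descriptors))

def bag_of_keypoints(image_descriptors, vocab):
    """Normalized histogram of visual-word occurrences for one image."""
    words = vocab.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def train_categorizer(per_image_descriptors, labels, k=200):
    """One histogram per image, then a linear classifier on top."""
    vocab = build_vocabulary(per_image_descriptors, k)
    X = np.vstack([bag_of_keypoints(d, vocab) for d in per_image_descriptors])
    return vocab, LinearSVC().fit(X, labels)
```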

Proceedings ArticleDOI
14 Jun 2004
TL;DR: A vehicle detection system using a single camera is described, based on the search for areas with high vertical symmetry in multi-resolution images; symmetry is computed using different-sized boxes centered on all the columns of the interest areas, and bounding boxes that are too large, too small, or too far from the camera are deleted to decrease the number of false positives.
Abstract: This paper describes a vehicle detection system using a single camera. It is based on the search for areas with a high vertical symmetry in multi-resolution images; symmetry is computed using different sized boxes centered on all the columns of the interest areas. All the columns with high symmetry are analyzed to get the width of detected objects. Horizontal edges are examined to find the base of the vehicle in the individuated area. The aim is to find horizontal lines located below an area with sufficient amount of edges. The algorithm deletes all the bounding boxes which are too large, too small, or too far from the camera in order to decrease the number of false positives. All the results found in different interest areas are mixed together and the overlapping bounding boxes are localized and managed in order to delete false positives. The algorithm analyzes images on a frame by frame basis, without any temporal correlation.
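
The core symmetry test can be sketched in a few lines: score a box centered on a candidate column by comparing its left half with the mirrored right half. This single-scale version is illustrative; the actual system scans many box sizes over multi-resolution images and then examines horizontal edges:

```python
import numpy as np

def vertical_symmetry_score(img, col, half_width):
    """Symmetry of a box centered on column `col` of an 8-bit grayscale
    image: 1.0 = perfectly symmetric, 0.0 = maximally asymmetric."""
    left = img[:, col - half_width:col].astype(np.float32)
    right = img[:, col + 1:col + 1 + half_width].astype(np.float32)
    diff = np.abs(left - right[:, ::-1]).mean()   # mirror the right half
    return 1.0 - diff / 255.0

def best_symmetry_columns(img, half_width, top_k=5):
    """Columns with the highest symmetry scores (candidate vehicle axes)."""
    cols = range(half_width, img.shape[1] - half_width - 1)
    scores = [(vertical_symmetry_score(img, c, half_width), c) for c in cols]
    return sorted(scores, reverse=True)[:top_k]
```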

Proceedings ArticleDOI
27 Jun 2004
TL;DR: A 3D morphable model is used to compute 3D face models from three input images of each subject in the training database and the system achieved a recognition rate significantly better than a comparable global face recognition system.
Abstract: We present a system for pose and illumination invariant face recognition that combines two recent advances in the computer vision field: 3D morphable models and component-based recognition. A 3D morphable model is used to compute 3D face models from three input images of each subject in the training database. The 3D models are rendered under varying pose and illumination conditions to build a large set of synthetic images. These images are then used for training a component-based face recognition system. The face recognition module is preceded by a fast hierarchical face detector resulting in a system that can detect and identify faces in video images at about 4 Hz. The system achieved a recognition rate of 88% on a database of 2000 real images of ten people, which is significantly better than a comparable global face recognition system. The results clearly show the potential of the combination of morphable models and component-based recognition towards pose and illumination invariant face recognition.

Proceedings Article
01 Mar 2004
TL;DR: A probabilistic approach for moving object detection from a mobile robot using a single camera in outdoor environments and the positions of moving objects are estimated using an adaptive particle and EM detection.
Abstract: Robust detection of moving objects from a mobile robot is required for safe outdoor navigation, but is not easily achievable since there are two motions involved: the motions of moving objects and the motion of the sensors used to detect the objects. We have experimented with a probabilistic approach for moving object detection from a mobile robot using a single camera in outdoor environments. The ego-motion of the camera is compensated using corresponding feature sets and outlier detection, and the positions of moving objects are estimated using an adaptive particle filter and EM algorithm. The algorithms are implemented and tested on three different robot platforms (robotic helicopter, Segway RMP, and Pioneer2 AT) in an outdoor environment, and the detection results are analyzed.
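
A common concrete instance of the ego-motion compensation step, sketched with OpenCV: track sparse features between frames, reject outliers while fitting a homography with RANSAC, warp the previous frame, and difference. Parameter values are illustrative and this is not necessarily the authors' exact pipeline:

```python
import cv2
import numpy as np

def ego_motion_compensated_diff(prev_gray, curr_gray):
    """Compensate camera ego-motion between two grayscale frames and
    difference them; residual motion marks candidate moving objects."""
    pts_prev = cv2.goodFeaturesToTrack(prev_gray, maxCorners=400,
                                       qualityLevel=0.01, minDistance=8)
    pts_curr, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                      pts_prev, None)
    good = status.ravel() == 1
    # RANSAC homography fit doubles as the outlier-rejection step
    H, _inliers = cv2.findHomography(pts_prev[good], pts_curr[good],
                                     cv2.RANSAC, 3.0)
    h, w = curr_gray.shape
    warped_prev = cv2.warpPerspective(prev_gray, H, (w, h))
    return cv2.absdiff(curr_gray, warped_prev)
```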

Book ChapterDOI
30 Aug 2004
TL;DR: In this paper, object categorization in real-world scenes is addressed: given a novel image, the goal is to recognize and localize unseen-before objects based on their similarity to a learned object category, including the ability to recognize objects at multiple scales.
Abstract: The goal of our work is object categorization in real-world scenes. That is, given a novel image we want to recognize and localize unseen-before objects based on their similarity to a learned object category. For use in a real-world system, it is important that this includes the ability to recognize objects at multiple scales.

Proceedings ArticleDOI
06 Jul 2004
TL;DR: A probabilistic framework is described for detection and modeling of doors from sensor data acquired in corridor environments with mobile robots, which achieves better results than models that capture only behavior or only appearance.
Abstract: We describe a probabilistic framework for detection and modeling of doors from sensor data acquired in corridor environments with mobile robots. The framework captures shape, color, and motion properties of door and wall objects. The probabilistic model is optimized with a version of the expectation maximization algorithm, which segments the environment into door and wall objects and learns their properties. The framework allows the robot to generalize the properties of detected object instances to new object instances. We demonstrate the algorithm on real-world data acquired by a Pioneer robot equipped with a laser range finder and an omni-directional camera. Our results show that our algorithm reliably segments the environment into walls and doors, finding both doors that move and doors that do not move. We show that our approach achieves better results than models that only capture behavior, or only capture appearance.