
Showing papers on "Object-class detection" published in 2005


Proceedings ArticleDOI
15 Oct 2005
TL;DR: It is shown that the direct 3D counterparts to commonly used 2D interest point detectors are inadequate, and an alternative is proposed, and a recognition algorithm based on spatio-temporally windowed data is devised.
Abstract: A common trend in object recognition is to detect and leverage the use of sparse, informative feature points. The use of such features makes the problem more manageable while providing increased robustness to noise and pose variation. In this work we develop an extension of these ideas to the spatio-temporal case. For this purpose, we show that the direct 3D counterparts to commonly used 2D interest point detectors are inadequate, and we propose an alternative. Anchoring off of these interest points, we devise a recognition algorithm based on spatio-temporally windowed data. We present recognition results on a variety of datasets including both human and rodent behavior.
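For readers who want the flavor of such a detector, here is a minimal NumPy/SciPy sketch of one plausible construction: a quadrature pair of 1D temporal Gabor filters applied after spatial Gaussian smoothing, whose squared response peaks at periodic motion. The parameter defaults and function name are illustrative, not the authors' settings.

import numpy as np
from scipy.ndimage import gaussian_filter, convolve1d

def spatio_temporal_response(video, sigma=2.0, tau=3.0):
    # video: (T, H, W) grayscale clip; sigma smooths each frame spatially,
    # tau sets the temporal scale of the quadrature Gabor pair
    omega = 4.0 / tau
    t = np.arange(-int(3 * tau), int(3 * tau) + 1)
    h_ev = -np.cos(2 * np.pi * t * omega) * np.exp(-t ** 2 / tau ** 2)
    h_od = -np.sin(2 * np.pi * t * omega) * np.exp(-t ** 2 / tau ** 2)
    smoothed = np.stack([gaussian_filter(f, sigma) for f in video])
    ev = convolve1d(smoothed, h_ev, axis=0)    # temporal filtering only
    od = convolve1d(smoothed, h_od, axis=0)
    return ev ** 2 + od ** 2   # local maxima are candidate interest points

R = spatio_temporal_response(np.random.rand(40, 64, 64))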

2,699 citations


Proceedings ArticleDOI
20 Jun 2005
TL;DR: The core of the method is the combination of local and global cues via probabilistic top-down segmentation, which allows object hypotheses to be examined and compared with high precision down to the pixel level; qualitative and quantitative results on a large data set confirm that pedestrians are reliably detected even in crowded scenes.
Abstract: In this paper, we address the problem of detecting pedestrians in crowded real-world scenes with severe overlaps. Our basic premise is that this problem is too difficult for any type of model or feature alone. Instead, we present an algorithm that integrates evidence in multiple iterations and from different sources. The core part of our method is the combination of local and global cues via probabilistic top-down segmentation. Altogether, this approach allows examining and comparing object hypotheses with high precision down to the pixel level. Qualitative and quantitative results on a large data set confirm that our method is able to reliably detect pedestrians in crowded scenes, even when they overlap and partially occlude each other. In addition, the flexible nature of our approach allows it to operate on very small training sets.

952 citations


Journal ArticleDOI
TL;DR: An object detection scheme with three innovations over existing approaches: the background is modeled as a single probability density over a joint domain-range representation, temporal persistence is used as a detection criterion, and the posterior function is maximized efficiently by finding the minimum cut of a capacitated graph.
Abstract: Accurate detection of moving objects is an important precursor to stable tracking or recognition. In this paper, we present an object detection scheme that has three innovations over existing approaches. First, the model of the intensities of image pixels as independent random variables is challenged and it is asserted that useful correlation exists in the intensities of spatially proximal pixels. This correlation is exploited to sustain high levels of detection accuracy in the presence of dynamic backgrounds. By using a nonparametric density estimation method over a joint domain-range representation of image pixels, multimodal spatial uncertainties and complex dependencies between the domain (location) and range (color) are directly modeled. We propose a model of the background as a single probability density. Second, temporal persistence is proposed as a detection criterion. Unlike previous approaches to object detection, which detect objects by building adaptive models of only the background, the foreground is also modeled to augment the detection of objects (without explicit tracking), since objects detected in the preceding frame contain substantial evidence for detection in the current frame. Finally, the background and foreground models are used competitively in a MAP-MRF decision framework, stressing spatial context as a condition for detecting interesting objects, and the posterior function is maximized efficiently by finding the minimum cut of a capacitated graph. Experimental validation of the proposed method is performed and presented on a diverse set of dynamic scenes.
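The joint domain-range model can be made concrete with a short sketch: each background observation is a 5-vector (x, y, r, g, b), and a product Gaussian kernel scores how well a new pixel fits the background density. Bandwidths and the foreground threshold below are illustrative assumptions, not the paper's values.

import numpy as np

def background_likelihood(pixel, bg_samples, bandwidth):
    # pixel: 5-vector (x, y, r, g, b); bg_samples: (N, 5) recent
    # background observations; product Gaussian kernel density estimate
    d = (pixel - bg_samples) / bandwidth
    k = np.exp(-0.5 * np.sum(d * d, axis=1))
    return k.mean() / (np.prod(bandwidth) * (2 * np.pi) ** 2.5)

bg = np.column_stack([np.random.rand(1000, 2) * [640, 480],  # locations
                      np.random.rand(1000, 3)])              # colors
h = np.array([8.0, 8.0, 0.05, 0.05, 0.05])   # spatial / color bandwidths
p = background_likelihood(np.array([320, 240, 0.2, 0.3, 0.4]), bg, h)
foreground = p < 1e-6    # low background likelihood: foreground candidate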

685 citations


Proceedings ArticleDOI
17 Oct 2005
TL;DR: A real-time event detector is constructed for each action of interest by learning a cascade of filters based on volumetric features that efficiently scans video sequences in space and time; on a standard database of human activities it achieves performance comparable to a current interest-point-based human activity recognizer.
Abstract: This paper studies the use of volumetric features as an alternative to popular local descriptor approaches for event detection in video sequences. Motivated by the recent success of similar ideas in object detection on static images, we generalize the notion of 2D box features to 3D spatio-temporal volumetric features. This general framework enables us to do real-time video analysis. We construct a real-time event detector for each action of interest by learning a cascade of filters based on volumetric features that efficiently scans video sequences in space and time. This event detector recognizes actions that are traditionally problematic for interest point methods - such as smooth motions where insufficient space-time interest points are available. Our experiments demonstrate that the technique accurately detects actions on real-world sequences and is robust to changes in viewpoint, scale and action speed. We also adapt our technique to the related task of human action classification and confirm that it achieves performance comparable to a current interest point based human activity recognizer on a standard database of human activities.
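The volumetric features rest on a 3D analogue of the integral image, so any spatio-temporal box sum costs eight lookups; a feature is then a difference of adjacent boxes. A minimal NumPy sketch under those assumptions (the exact feature set and cascade training are omitted):

import numpy as np

def integral_video(video):
    # cumulative sums along t, y, x: the 3D analogue of the integral
    # image; the zero border simplifies the lookups below
    iv = video.cumsum(0).cumsum(1).cumsum(2)
    return np.pad(iv, ((1, 0), (1, 0), (1, 0)))

def box_sum(iv, t0, t1, y0, y1, x0, x1):
    # sum of video[t0:t1, y0:y1, x0:x1] by 3D inclusion-exclusion
    return (iv[t1, y1, x1] - iv[t0, y1, x1] - iv[t1, y0, x1] - iv[t1, y1, x0]
            + iv[t0, y0, x1] + iv[t0, y1, x0] + iv[t1, y0, x0] - iv[t0, y0, x0])

v = np.random.rand(32, 48, 48)                      # (T, H, W) clip
iv = integral_video(v)
# one illustrative volumetric feature: difference of two temporal halves
feat = box_sum(iv, 0, 8, 0, 24, 0, 24) - box_sum(iv, 8, 16, 0, 24, 0, 24)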

616 citations


Journal ArticleDOI
TL;DR: In this article, a two-step detection/tracking method is proposed to deal with the nonrigid nature of human appearance on the road, where the detection phase is performed by a support vector machine (SVM) with size-normalized pedestrian candidates and the tracking phase is a combination of Kalman filter prediction and mean shift tracking.
Abstract: This paper presents a method for pedestrian detection and tracking using a single night-vision video camera installed on the vehicle. To deal with the nonrigid nature of human appearance on the road, a two-step detection/tracking method is proposed. The detection phase is performed by a support vector machine (SVM) with size-normalized pedestrian candidates and the tracking phase is a combination of Kalman filter prediction and mean shift tracking. The detection phase is further strengthened by information obtained by a road-detection module that provides key information for pedestrian validation. Experimental comparisons (e.g., grayscale SVM recognition versus binary SVM recognition and entire-body detection versus upper-body detection) have been carried out to illustrate the feasibility of our approach.
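For the tracking phase, here is a minimal constant-velocity Kalman filter of the kind commonly paired with mean shift: the predicted position would centre the mean shift search window, and the mean shift result would feed back as the measurement. Noise levels q and r are illustrative, and mean shift itself is omitted.

import numpy as np

F = np.array([[1, 0, 1, 0],        # constant-velocity model,
              [0, 1, 0, 1],        # state = (x, y, vx, vy)
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)

def predict(state, cov, q=1.0):
    state = F @ state
    cov = F @ cov @ F.T + q * np.eye(4)
    return state, cov

def update(state, cov, meas, r=2.0):
    H = np.eye(2, 4)               # only (x, y) is observed
    S = H @ cov @ H.T + r * np.eye(2)
    K = cov @ H.T @ np.linalg.inv(S)
    state = state + K @ (meas - H @ state)
    cov = (np.eye(4) - K @ H) @ cov
    return state, cov

state, cov = np.array([100.0, 50.0, 0.0, 0.0]), np.eye(4) * 10.0
state, cov = predict(state, cov)                 # centre the search window
state, cov = update(state, cov, np.array([103.0, 51.0]))  # mean shift result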

431 citations


Proceedings ArticleDOI
17 Oct 2005
TL;DR: The major contributions are the application of boosted local contour-based features for object detection in a partially supervised learning framework, and an efficient new boosting procedure for simultaneously selecting features and estimating per-feature parameters.
Abstract: We present a novel categorical object detection scheme that uses only local contour-based features. A two-stage, partially supervised learning architecture is proposed: a rudimentary detector is learned from a very small set of segmented images and applied to a larger training set of un-segmented images; the second stage bootstraps these detections to learn an improved classifier while explicitly training against clutter. The detectors are learned with a boosting algorithm which creates a location-sensitive classifier using a discriminative set of features from a randomly chosen dictionary of contour fragments. We present results that are very competitive with other state-of-the-art object detection schemes and show robustness to object articulations, clutter, and occlusion. Our major contributions are the application of boosted local contour-based features for object detection in a partially supervised learning framework, and an efficient new boosting procedure for simultaneously selecting features and estimating per-feature parameters.

349 citations


Proceedings ArticleDOI
T. Mita, Toshimitsu Kaneko, O. Hori
17 Oct 2005
TL;DR: Experimental results show that the proposed joint Haar-like feature for detecting faces in images yields higher classification performance than Viola and Jones' detector, which uses a single feature for each weak classifier.
Abstract: In this paper, we propose a new distinctive feature, called the joint Haar-like feature, for detecting faces in images. This is based on co-occurrence of multiple Haar-like features. Feature co-occurrence, which captures the structural similarities within the face class, makes it possible to construct an effective classifier. The joint Haar-like feature can be calculated very fast and has robustness against addition of noise and change in illumination. A face detector is learned by stagewise selection of the joint Haar-like features using AdaBoost. A small number of distinctive features achieve both computational efficiency and accuracy. Experimental results with 5,676 face images and 30,000 nonface images show that our detector yields higher classification performance than Viola and Jones' detector, which uses a single feature for each weak classifier. Given the same number of features, our method reduces the error by 37%. Our detector is 2.6 times as fast as Viola and Jones' detector to achieve the same performance.
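The co-occurrence idea admits a compact sketch: binarize several Haar-like responses and pack the bits into a single index into a learned lookup table, so one weak classifier sees a joint pattern rather than a single threshold. The thresholds, parities, and LUT contents below are placeholders; the AdaBoost selection stage is omitted.

import numpy as np

def joint_index(responses, thresholds, parities):
    # binarize each Haar-like response, then pack the bits into one integer
    bits = (parities * (responses - thresholds) > 0).astype(int)
    return int(bits @ (1 << np.arange(bits.size)))

def weak_classifier(responses, thresholds, parities, lut):
    # lut[z]: confidence learned for joint pattern z, e.g. a log-likelihood
    # ratio of face vs. non-face occurrences of that pattern in training
    return lut[joint_index(responses, thresholds, parities)]

resp = np.array([0.7, -0.2, 0.4])      # three Haar-like feature responses
thr, par = np.zeros(3), np.ones(3)     # placeholder parameters
lut = np.random.randn(8)               # 2^3 possible joint patterns
score = weak_classifier(resp, thr, par, lut)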

331 citations


Proceedings ArticleDOI
10 Oct 2005
TL;DR: This paper presents a method for fully automatic detection of 20 facial feature points in images of expressionless faces using Gabor-feature-based boosted classifiers, with GentleBoost templates built from both gray level intensities and Gabor wavelet features.
Abstract: Locating facial feature points in images of faces is an important stage for numerous facial image interpretation tasks. In this paper we present a method for fully automatic detection of 20 facial feature points in images of expressionless faces using Gabor-feature-based boosted classifiers. The method adopts a fast and robust face detection algorithm, an adapted version of the original Viola-Jones face detector. The detected face region is then divided into 20 relevant regions of interest, each of which is examined further to predict the location of the facial feature points. The proposed facial feature point detection method uses individual feature patch templates to detect points in the relevant region of interest. These feature models are GentleBoost templates built from both gray level intensities and Gabor wavelet features. When tested on the Cohn-Kanade database, the method achieved an average recognition rate of 93%.

304 citations


Proceedings ArticleDOI
20 Jun 2005
TL;DR: The impact of eye locations on face recognition accuracy is studied and an automatic technique for eye detection is introduced; the resulting face recognition performance is shown to be comparable to that obtained using manually given eye positions.
Abstract: The accuracy of face alignment affects the performance of a face recognition system. Since face alignment is usually conducted using eye positions, an accurate eye localization algorithm is therefore essential for accurate face recognition. In this paper, we first study the impact of eye locations on face recognition accuracy, and then introduce an automatic technique for eye detection. The performance of our automatic eye detection technique is subsequently validated using the FRGC 1.0 database. The validation shows that our eye detector has an overall 94.5% eye detection rate, with the detected eyes very close to the manually provided eye positions. In addition, the face recognition performance based on the automatic eye detection is shown to be comparable to that achieved using manually given eye positions.

237 citations


Proceedings ArticleDOI
18 Apr 2005
TL;DR: A feature detection system for real-time identification of lines, circles and people's legs from laser range data is developed, and a new method suitable for arc/circle detection is proposed: the Inscribed Angle Variance (IAV).
Abstract: A feature detection system has been developed for real-time identification of lines, circles and people's legs from laser range data. A new method suitable for arc/circle detection is proposed: the Inscribed Angle Variance (IAV). Lines are detected using a recursive line fitting method. The people-leg detection is based on geometrical relations. The system was implemented as a plugin driver in Player, a mobile robot server. Results on real data are presented to verify the effectiveness of the proposed algorithms in indoor environments with moving objects.
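The IAV test admits a compact sketch: by the inscribed angle theorem, every interior point of a circular arc sees the chord between the group's extreme points under the same angle, so a small standard deviation of those angles signals an arc or circle. The acceptance threshold would be tuned in practice; none is hard-coded below.

import numpy as np

def inscribed_angle_stats(points):
    # angle at each interior point subtended by the segment between the
    # group's two extreme points; constant on a circular arc
    p_first, p_last = points[0], points[-1]
    angles = []
    for p in points[1:-1]:
        a, b = p_first - p, p_last - p
        c = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        angles.append(np.arccos(np.clip(c, -1.0, 1.0)))
    angles = np.array(angles)
    return angles.mean(), angles.std()   # low std suggests an arc/circle

theta = np.linspace(0.2, 2.0, 30)        # points sampled from a unit circle
pts = np.c_[np.cos(theta), np.sin(theta)]
mean_angle, std_angle = inscribed_angle_stats(pts)   # std is ~0 here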

205 citations


Book ChapterDOI
TL;DR: The main purpose of this overview is to describe recent 3D face recognition algorithms; 3D models hold more information about the face, such as surface information, that can be used for face recognition or subject discrimination.
Abstract: Much research in face recognition has dealt with the challenge of the great variability in head pose, lighting intensity and direction, facial expression, and aging. The main purpose of this overview is to describe recent 3D face recognition algorithms. In the last few years, more and more 2D face recognition algorithms have been improved and tested on less-than-perfect images. However, 3D models hold more information about the face, such as surface information, that can be used for face recognition or subject discrimination. Another major advantage is that 3D face recognition is pose invariant. A disadvantage of most presented 3D face recognition methods is that they still treat the human face as a rigid object. This means that the methods are not capable of handling facial expressions. Although 2D face recognition still seems to outperform the 3D face recognition methods, it is expected that this will change in the near future.

Proceedings ArticleDOI
31 Aug 2005
TL;DR: Experimental results demonstrate that the proposed approach can efficiently be used as an automatic text detection system that is robust to font size, font color, background complexity and language.
Abstract: In this paper, an algorithm is proposed for detecting text in images and video frames. It proceeds in three steps: edge detection, text candidate detection and text refinement. First, edge detection is applied to obtain four edge maps in the horizontal, vertical, up-right, and up-left directions. Second, features are extracted from the four edge maps to represent the texture property of text, and the k-means algorithm is applied to detect the initial text candidates. Finally, the text areas are identified by empirical rule analysis and refined through projection profile analysis. Experimental results demonstrate that the proposed approach can efficiently be used as an automatic text detection system that is robust to font size, font color, background complexity and language.
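A rough NumPy rendering of the first two steps, assuming simple 3x3 directional kernels and per-block mean edge strength as the texture feature (the paper's exact kernels, block size, and feature definition are not given here):

import numpy as np
from scipy.ndimage import convolve
from scipy.cluster.vq import kmeans2

# illustrative kernels for horizontal, vertical, up-right, up-left edges
KERNELS = [np.array(k, float) for k in (
    [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    [[0, 1, 1], [-1, 0, 1], [-1, -1, 0]],
    [[1, 1, 0], [1, 0, -1], [0, -1, -1]],
)]

def text_candidates(gray, block=16, k=2):
    # mean edge strength per block in four directions, then k-means
    maps = [np.abs(convolve(gray, kern)) for kern in KERNELS]
    h, w = gray.shape
    feats, coords = [], []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            feats.append([m[y:y + block, x:x + block].mean() for m in maps])
            coords.append((y, x))
    _, labels = kmeans2(np.array(feats), k, minit='++')
    return coords, labels   # the denser-edge cluster holds text candidates

coords, labels = text_candidates(np.random.rand(128, 128))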

Proceedings ArticleDOI
20 Jun 2005
TL;DR: The face recognition grand challenge dataset is used to evaluate hierarchical graph matching (HGM), a universal approach to 2D and 3D face recognition; HGM yields the best results presented at the recent FRGC workshop, 2D face recognition is significantly more accurate than 3D face recognition, and the fusion of both modalities leads to a further improvement over the 2D results.
Abstract: The extension of 2D image-based face recognition methods with respect to 3D shape information and the fusion of both modalities is one of the main topics in the recent development of facial recognition. In this paper we discuss different strategies and their expected benefit for the fusion of 2D and 3D face recognition. The face recognition grand challenge (FRGC) provides for the first time ever a public benchmark dataset of a suitable size to evaluate the accuracy of both 2D and 3D face recognition. We use this benchmark to evaluate hierarchical graph matching (HGM), a universal approach to 2D and 3D face recognition, and demonstrate the benefit of different fusion strategies. The results show that HGM yields the best results presented at the recent FRGC workshop, that 2D face recognition is significantly more accurate than 3D face recognition and that the fusion of both modalities leads to a further improvement of the 2D results.

Proceedings ArticleDOI
20 Jun 2005
TL;DR: Two algorithms for detecting face anchor points in the context of face verification are presented, one for frontal images and one for arbitrary pose, demonstrating the challenges in 3D face recognition under arbitrary pose and expression.
Abstract: This paper outlines methods to detect key anchor points in 3D face scanner data. These anchor points can be used to estimate the pose and then match the test image to a 3D face model. We present two algorithms for detecting face anchor points in the context of face verification: one for frontal images and one for arbitrary pose. We achieve 99% success in finding anchor points in frontal images and 86% success in scans with large variations in pose and changes in expression. These results demonstrate the challenges in 3D face recognition under arbitrary pose and expression. We are currently working on robust fitting algorithms to localize the anchor points more precisely for arbitrary pose images.

Proceedings ArticleDOI
20 Jun 2005
TL;DR: Using a non-parametric density estimation method over a joint domain-range representation of image pixels, multi-modal spatial uncertainties and complex dependencies between the domain and range are directly modeled, and temporal persistence is proposed as a detection criterion.
Abstract: Detecting moving objects using stationary cameras is an important precursor to many activity recognition, object recognition and tracking algorithms. In this paper, three innovations are presented over existing approaches. First, the model of the intensities of image pixels as independently distributed random variables is challenged and it is asserted that useful correlation exists in the intensities of spatially proximal pixels. This correlation is exploited to sustain high levels of detection accuracy in the presence of nominal camera motion and dynamic textures. By using a non-parametric density estimation method over a joint domain-range representation of image pixels, multi-modal spatial uncertainties and complex dependencies between the domain (location) and range (color) are directly modeled. Second, temporal persistence is proposed as a detection criterion. Unlike previous approaches to object detection which detect objects by building adaptive models of only the background, the foreground is also modeled to augment the detection of objects (without explicit tracking) since objects detected in a preceding frame contain substantial evidence for detection in a current frame. Third, the background and foreground models are used competitively in a MAP-MRF decision framework, stressing spatial context as a condition of pixel-wise labeling, and the posterior function is maximized efficiently using graph cuts. Experimental validation of the proposed method is presented on a diverse set of dynamic scenes.

01 Jan 2005
TL;DR: In this paper, a framework for image object-based change detection is proposed which breaks down the n-dimensional problem into two main aspects, geometry and thematic content; these can be associated with the question: did a certain classified object change geometrically, class-wise, or both?
Abstract: With the advent of high resolution satellite imagery and airborne digital camera data, approaches that include contextual information are more commonly utilized. One way to include spatial dimensions in image analysis is to identify relatively homogeneous regions and to treat them as objects. Although segmentation is not a new concept, the number of image-segmentation-based applications has recently increased significantly. Concurrently, new methodological challenges arise. Standard change detection and accuracy assessment techniques mainly rely on statistically assessing individual pixels. Such assessments are not satisfactory for image objects, which exhibit shape, boundary, homogeneity or topological information. These additional dimensions of information describing real-world objects have to be assessed in multitemporal object-based image analysis. In this paper, problems associated with multitemporal object recognition are identified and a framework for image object-based change detection is suggested. For simplicity, this framework breaks down the n-dimensional problem into two main aspects, geometry and thematic content. These two aspects can be associated with the following questions: did a certain classified object change geometrically, class-wise, or both? When can we identify an object in one data set as being the same object in another data set? Do we need user-defined or application-specific thresholds for geometric overlap, shape-area relations, centroid movements, etc.? This paper elucidates some specific challenges to change detection of objects and incorporates GIS functionality into image analysis.
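The geometric side of those questions can be phrased over raster object masks, with the overlap threshold being exactly the kind of user-defined parameter the paper asks about. A sketch under those assumptions:

import numpy as np

def object_change(mask_t1, mask_t2, class_t1, class_t2, min_overlap=0.5):
    # masks: boolean rasters of one classified object at two dates;
    # min_overlap is a user-defined threshold, as the paper discusses
    inter = np.logical_and(mask_t1, mask_t2).sum()
    union = np.logical_or(mask_t1, mask_t2).sum()
    iou = inter / union if union else 0.0
    c1 = np.array(np.nonzero(mask_t1)).mean(axis=1)   # centroid (row, col)
    c2 = np.array(np.nonzero(mask_t2)).mean(axis=1)
    return {
        "same_object": iou >= min_overlap,     # identity across dates
        "geometric_change": 0.0 < iou < 1.0,   # any imperfect overlap here;
                                               # shape-area relations would
                                               # refine this in practice
        "class_change": class_t1 != class_t2,
        "centroid_shift": float(np.linalg.norm(c1 - c2)),
    }

m1 = np.zeros((64, 64), bool); m1[10:30, 10:30] = True
m2 = np.zeros((64, 64), bool); m2[14:34, 12:32] = True
report = object_change(m1, m2, "building", "building")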

Proceedings ArticleDOI
17 Oct 2005
TL;DR: This work explores a hybrid generative/discriminative approach using 'Fisher kernels' by Jaakkola and Haussler (1999) which retains most of the desirable properties of generative methods, while increasing the classification performance through a discriminative setting.
Abstract: Learning models for detecting and classifying object categories is a challenging problem in machine vision. While discriminative approaches to learning and classification have, in principle, superior performance, generative approaches provide many useful features, one of which is the ability to naturally establish explicit correspondence between model components and scene features - this, in turn, allows for the handling of missing data and unsupervised learning in clutter. We explore a hybrid generative/discriminative approach using 'Fisher kernels' by Jaakkola and Haussler (1999) which retains most of the desirable properties of generative methods, while increasing the classification performance through a discriminative setting. Furthermore, we demonstrate how this kernel framework can be used to combine different types of features and models into a single classifier. Our experiments, conducted on a number of popular benchmarks, show strong performance improvements over the corresponding generative approach and are competitive with the best results reported in the literature.
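The Fisher-kernel construction is generic: map each example to the gradient of the generative model's log-likelihood with respect to its parameters, then take inner products of those score vectors. A single 1D Gaussian stands in below for the richer generative models used in the paper.

import numpy as np

def fisher_score(x, mu, sigma):
    # gradient of log N(x; mu, sigma^2) with respect to (mu, sigma)
    d_mu = (x - mu) / sigma ** 2
    d_sigma = ((x - mu) ** 2 - sigma ** 2) / sigma ** 3
    return np.array([d_mu, d_sigma])

def fisher_kernel(x1, x2, mu, sigma):
    # inner product of Fisher scores; the information matrix is taken
    # as the identity, a common simplification
    return fisher_score(x1, mu, sigma) @ fisher_score(x2, mu, sigma)

x = np.array([0.3, 1.8, -0.7])
K = np.array([[fisher_kernel(a, b, mu=0.0, sigma=1.0) for b in x] for a in x])

The resulting kernel matrix can then be handed to any discriminative learner such as an SVM, which is where the hybrid's discriminative gains come from.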

Patent
21 Jan 2005
TL;DR: A display arrangement comprises an image display device having two or more sets of images for display, a camera directed towards positions adopted by users viewing the display, and a face detector for detecting human faces in images captured by the camera, the face detector being arranged to detect faces in at least two face categories as discussed by the authors.
Abstract: A display arrangement comprises an image display device having two or more sets of images for display; a camera directed towards positions adopted by users viewing the display; a face detector for detecting human faces in images captured by the camera, the face detector being arranged to detect faces in at least two face categories; and means, responsive to the frequency of detection of categories of faces by the face detector at one or more different periods, for selecting a set of images to be displayed on the image display device at that time of day.

Proceedings ArticleDOI
12 Dec 2005
TL;DR: This paper compares the two recently developed systems ARTag and ARToolkit Plus on their reliability, detection rates, and immunity to lighting and occlusion.
Abstract: Fiducial marker systems are systems of unique patterns and computer vision algorithms that help solve the correspondence problem, automatically finding features in different camera images that belong to the same object point in the world. Fiducial marker systems consist of patterns that are mounted in the environment and automatically detected in digital images using an accompanying detection algorithm, useful for augmented reality (AR), robot navigation, 3D modeling, and other applications. This paper compares the two recently developed systems ARTag and ARToolkit Plus on their reliability, detection rates, and immunity to lighting and occlusion. Processing in fiducial systems is defined in two stages, unique feature detection and verification/identification. The systems are compared with respect to these stages, and experimental results are shown.

Proceedings ArticleDOI
20 Jun 2005
TL;DR: 3D face recognition has lately been attracting ever increasing attention and this paper complements other reviews in the face biometrics area by focusing on the sensor technology, and by detailing the efforts in 3D face modelling and 3D assisted 2D face matching.
Abstract: 3D face recognition has lately been attracting ever increasing attention. In this paper we review the full spectrum of 3D face processing technology, from sensing to recognition. The review covers 3D face modelling, 3D to 3D and 3D to 2D registration, 3D based recognition and 3D assisted 2D based recognition. The fusion of 2D and 3D modalities is also addressed. The paper complements other reviews in the face biometrics area by focusing on the sensor technology, and by detailing the efforts in 3D face modelling and 3D assisted 2D face matching.

Proceedings ArticleDOI
20 Jun 2005
TL;DR: This paper presents an effective method to automatically extract the ROI of the facial surface, which mainly depends on automatic detection of the facial bilateral symmetry plane and localization of the nose tip, and builds a reference plane through the nose tip for calculating relative depth values.
Abstract: This paper addresses 3D face recognition from facial shape. First, we present an effective method to automatically extract the ROI of the facial surface, which mainly depends on automatic detection of the facial bilateral symmetry plane and localization of the nose tip. Then we build a reference plane through the nose tip for calculating the relative depth values. Considering the non-rigid property of the facial surface, the ROI is triangulated and parameterized into an isomorphic 2D planar circle, attempting to preserve the intrinsic geometric properties. At the same time the relative depth values are also mapped. Finally we perform eigenface on the mapped relative depth image. The entire scheme is insensitive to pose variation. The experiment using the FRGC database v1.0 obtains a rank-1 identification score of 95%, which outperforms the PCA baseline method by 4% and demonstrates the effectiveness of our algorithm.

Proceedings ArticleDOI
20 Jun 2005
TL;DR: This paper describes a system for pedestrian detection in stereo infrared images based on three different underlying approaches: warm area detection, edge-based detection, and v-disparity computation.
Abstract: This paper describes a system for pedestrian detection in stereo infrared images. The system is based on three different underlying approaches: warm area detection, edge-based detection, and v-disparity computation. Stereo is also used for computing the distance and size of detected objects. A final validation process is performed using head morphological and thermal characteristics. Neither temporal correlation nor motion cues are used in this processing. The developed system has been implemented on an experimental vehicle equipped with two infrared cameras and preliminarily tested in different situations.

Patent
Dalong Jiang, Hong-Jiang Zhang, Lei Zhang, Shuicheng Yan, Yuxiao Hu
29 Apr 2005
TL;DR: In this article, a method and system is provided for generating 3D images of faces from 2D images, for generating 2D images of the faces under different image conditions from the 3D images, and for recognizing a 2D image of a target face based on the generated 2D images.
Abstract: A method and system for generating 3D images of faces from 2D images, for generating 2D images of the faces under different image conditions from the 3D images, and for recognizing a 2D image of a target face based on the generated 2D images is provided. The recognition system provides a 3D model of a face that includes a 3D image of a standard face under a standard image condition and parameters indicating variations of an individual face from the standard face. To generate the 3D image of a face, the recognition system inputs a 2D image of the face under a standard image condition. The recognition system then calculates parameters that map the points of the 2D image to the corresponding points of a 2D image of the standard face. The recognition system uses these parameters with the 3D model to generate 3D images of the face under different image conditions.

Proceedings ArticleDOI
15 Oct 2005
TL;DR: A novel on-line conservative learning framework for an object detection system that uses reconstructive and discriminative classifiers in an iterative co-training fashion to arrive at increasingly better object detectors.
Abstract: We present a novel on-line conservative learning framework for an object detection system. All algorithms operate in an on-line mode; in particular, we also present a novel on-line AdaBoost method. The basic idea is to start with a very simple object detection system and to exploit a huge amount of unlabeled video data by being very conservative in selecting training examples. The key idea is to use reconstructive and discriminative classifiers in an iterative co-training fashion to arrive at increasingly better object detectors. We demonstrate the framework on a surveillance task where we learn person detectors that are tested on two surveillance video sequences. We start with a simple moving object classifier and proceed with incremental PCA (on shape and appearance) as a reconstructive classifier, which in turn generates a training set for a discriminative on-line AdaBoost classifier.
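A self-contained skeleton of the conservative loop (recon_score and disc_update are placeholder names, not the authors' API): the reconstructive model labels only the detections it is very sure about, and those feed the discriminative on-line booster.

import numpy as np

def conservative_cotrain(patches, recon_score, disc_update, accept=0.95):
    # recon_score / disc_update stand in for the reconstructive model
    # (e.g. incremental PCA) and the on-line boosted classifier
    for patch in patches:
        conf = recon_score(patch)      # high = well explained as "person"
        if conf > accept:
            disc_update(patch, +1)     # confident positive example
        elif conf < 1.0 - accept:
            disc_update(patch, -1)     # confident negative example
        # anything in between stays unlabeled: the conservative part

# toy stand-ins: reconstruction confidence from a fixed 8-component basis
basis = np.linalg.qr(np.random.randn(64, 8))[0]
recon = lambda p: 1.0 - np.linalg.norm(p - basis @ (basis.T @ p)) / np.linalg.norm(p)
updates = []
conservative_cotrain([np.random.rand(64) for _ in range(20)],
                     recon, lambda p, y: updates.append(y))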

Proceedings ArticleDOI
17 Oct 2005
TL;DR: A closely coupled object detection and segmentation algorithm is proposed that enhances both processes in a cooperative and iterative manner, improving both segmentation and detection.
Abstract: We propose a closely coupled object detection and segmentation algorithm for enhancing both processes in a cooperative and iterative manner. Figure-ground segmentation reduces the effect of background clutter on template matching; the matched template provides shape constraints on segmentation. More precisely, we estimate the probability of each pixel belonging to the foreground by a weighted sum of the estimates based on shape and color alone. The weight on the shape-based estimate is related to the probability that a familiar object is present and is updated dynamically so that we enforce shape constraints only where the object is present. Experiments on detecting people in images of cluttered scenes demonstrate that the proposed algorithm improves both segmentation and detection. More accurate object boundaries are extracted; higher object detection rates and lower false alarm rates are achieved than performing the two processes separately or sequentially.
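The per-pixel combination reads directly as a weighted sum; the rule below is a plausible reading of the abstract (weighting the shape cue by the probability that a familiar object is present), not the authors' exact formula.

import numpy as np

def foreground_probability(p_shape, p_color, p_object):
    # weight the shape-based estimate by the probability that a familiar
    # object is present; elsewhere the color model dominates
    return p_object * p_shape + (1.0 - p_object) * p_color

p_shape = np.random.rand(48, 32)   # estimate from the matched template
p_color = np.random.rand(48, 32)   # estimate from the color model alone
p_fg = foreground_probability(p_shape, p_color, p_object=0.8)
mask = p_fg > 0.5                  # figure-ground segmentation

Iterating (segment, re-match the template, update p_object, re-segment) is what couples the two processes.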

Patent
Jan Erik Solem, Fredrik Kahl
11 Aug 2005
TL;DR: In this article, a statistical shape model is used to recover 3D shapes from a 2D representation of the 3D object and compare the recovered 3D shape with known 3D to 2D representations of at least one object of the object class.
Abstract: A method, device, system, and computer program for object recognition of a 3D object of a certain object class using a statistical shape model for recovering 3D shapes from a 2D representation of the 3D object and comparing the recovered 3D shape with known 3D to 2D representations of at least one object of the object class.

Journal ArticleDOI
TL;DR: A probabilistic approach to image orientation detection via confidence-based integration of low-level and semantic cues within a Bayesian framework is developed; it is an attempt to bridge the gap between computer and human vision systems and is applicable to other problems involving semantic scene content understanding.
Abstract: Automatic image orientation detection for natural images is a useful, yet challenging research topic. Humans use scene context and semantic object recognition to identify the correct image orientation. However, it is difficult for a computer to perform the task in the same way because current object recognition algorithms are extremely limited in their scope and robustness. As a result, existing orientation detection methods were built upon low-level vision features such as spatial distributions of color and texture. Discrepant detection rates have been reported for these methods in the literature. We have developed a probabilistic approach to image orientation detection via confidence-based integration of low-level and semantic cues within a Bayesian framework. Our current accuracy is 90 percent for unconstrained consumer photos, impressive given the findings of a psychophysical study conducted recently. The proposed framework is an attempt to bridge the gap between computer and human vision systems and is applicable to other problems involving semantic scene content understanding.

Proceedings ArticleDOI
17 Oct 2005
TL;DR: A new boosting paradigm to achieve detection of events in video that has the capability to improve weak classifiers by allowing them to use previous history in evaluating the current frame and a learning mechanism built into the boosting paradigm which allows event level decisions to be made.
Abstract: This paper contributes a new boosting paradigm to achieve detection of events in video. Previous boosting paradigms in vision focus on single frame detection and do not scale to video events. Thus new concepts need to be introduced to address questions such as determining if an event has occurred, localizing the event, handling the same action performed at different speeds, incorporating previous classifier responses into the current decision, and using temporal consistency of the data to aid detection and recognition. The proposed method has the capability to improve weak classifiers by allowing them to use previous history in evaluating the current frame. A learning mechanism built into the boosting paradigm is also given which allows event-level decisions to be made. This is contrasted with previous work in boosting which uses limited higher level temporal reasoning and essentially makes object detection decisions at the frame level. Our approach makes extensive use of the temporal continuity of video at the classifier and detector levels. We also introduce a relevant set of activity features. Features are evaluated at multiple zoom levels to improve detection. We show results for a system that is able to recognize 11 actions.

Patent
28 Mar 2005
TL;DR: In this paper, the image object is recognized using pose-specific object recognizers that use the outputs of pose-specific object detectors and the fused output of those detectors.
Abstract: Methods for image processing for detecting and recognizing an image object include detecting an image object using pose-specific object detectors, and performing fusion of the outputs from the pose-specific object detectors. The image object is recognized using pose-specific object recognizers that use outputs from the pose-specific object detectors and the fused output of the pose-specific object detectors; and by performing fusion of the outputs of the pose-specific object recognizers to recognize the image object.

Proceedings ArticleDOI
05 Jan 2005
TL;DR: This paper presents a method utilizing the registered 2D color and range image of a face to automatically identify the eyes, nose, and mouth, focusing on the 2D color information so that the algorithm runs as fast as possible.
Abstract: As interest in 3D face recognition increases, so does the importance of the initial alignment problem. In this paper we present a method utilizing the registered 2D color and range image of a face to automatically identify the eyes, nose, and mouth. These features are important for initially aligning faces in both standard 2D and 3D face recognition algorithms. For the algorithm to run as fast as possible, we focus on the 2D color information. This allows the algorithm to run in approximately 4 seconds on a 640×480 image with registered range data. On a database of 1,500 images the algorithm achieved a facial feature detection rate of 99.6%, with 0.4% of the images skipped due to hair obstructing the face.