
Showing papers by "Paul A. Viola published in 2003"


Journal Article
13 Oct 2003
TL;DR: This paper describes a pedestrian detection system that integrates image intensity information with motion information, and is the first to combine both sources of information in a single detector.
Abstract: This paper describes a pedestrian detection system that integrates image intensity information with motion information. We use a detection style algorithm that scans a detector over two consecutive frames of a video sequence. The detector is trained (using AdaBoost) to take advantage of both motion and appearance information to detect a walking person. Past approaches have built detectors based on motion information or detectors based on appearance information, but ours is the first to combine both sources of information in a single detector. The implementation described runs at about 4 frames/second, detects pedestrians at very small scales (as small as 20x15 pixels), and has a very low false positive rate. Our approach builds on the detection work of Viola and Jones. Novel contributions of this paper include: i) development of a representation of image motion which is extremely efficient, and ii) implementation of a state of the art pedestrian detection system which operates on low resolution images under difficult conditions (such as rain and snow).
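The abstract's two ingredients, rectangle features over a frame's intensities and over a frame-difference motion image, can be sketched with integral images as below. This is a minimal illustration under stated assumptions (function names and the single absolute-difference motion image are mine; the paper builds several directional motion images), not the paper's implementation.

```python
import numpy as np

def integral_image(img):
    # Summed-area table: any rectangle sum becomes four lookups,
    # the trick that makes Viola-Jones rectangle features fast.
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, h, w):
    # Sum of img[top:top+h, left:left+w] recovered from the integral image ii.
    total = ii[top + h - 1, left + w - 1]
    if top > 0:
        total -= ii[top - 1, left + w - 1]
    if left > 0:
        total -= ii[top + h - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

def appearance_feature(frame_t, top, left, h, w):
    # Appearance features act on a single frame's intensities.
    return rect_sum(integral_image(frame_t.astype(np.int64)), top, left, h, w)

def motion_feature(frame_t, frame_t1, top, left, h, w):
    # Motion features act on a difference image built from two
    # consecutive frames of the video sequence.
    delta = np.abs(frame_t.astype(np.int64) - frame_t1.astype(np.int64))
    return rect_sum(integral_image(delta), top, left, h, w)
```

AdaBoost then selects a weighted set of such features (from both the intensity and motion images) to form the detector that is scanned over each frame pair.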

2,367 citations


Journal Article
TL;DR: The multi-view detector presented in this paper is a combination of Viola-Jones detectors, each trained on face data taken from a single viewpoint; a monolithic approach, in which a single classifier is trained on all poses, appears to be unlearnable with existing classifiers.
Abstract: This paper extends the face detection framework proposed by Viola and Jones 2001 to handle profile views and rotated faces. As in the work of Rowley et al. 1998 and Schneiderman et al. 2000, we build different detectors for different views of the face. A decision tree is then trained to determine the viewpoint class (such as right profile or rotated 60 degrees) for a given window of the image being examined. This is similar to the approach of Rowley et al. 1998. The appropriate detector for that viewpoint can then be run instead of running all detectors on all windows. This technique yields good results and maintains the speed advantage of the Viola-Jones detector.

1. Introduction

There are a number of techniques that can successfully detect frontal upright faces in a wide variety of images [11, 7, 10, 12, 3, 6]. While the definition of "frontal" and "upright" may vary from system to system, the reality is that many natural images contain rotated or profile faces that are not reliably detected. There are a small number of systems which explicitly address non-frontal, or non-upright, face detection [8, 10, 2]. This paper describes progress toward a system which can detect faces regardless of pose reliably and in real-time. This paper extends the framework proposed by Viola and Jones [12]. This approach is selected because of its computational efficiency and simplicity. One observation which is shared among all previous related work is that a multi-view detector must be carefully constructed by combining a collection of detectors, each trained for a single viewpoint. It appears that a monolithic approach, where a single classifier is trained to detect all poses of a face, is unlearnable with existing classifiers. Our informal experiments lend support to this conclusion, since a classifier trained on all poses appears to be hopelessly inaccurate.

This paper addresses two types of pose variation: non-frontal faces, which are rotated out of the image plane, and non-upright faces, which are rotated in the image plane. In both cases the multi-view detector presented in this paper is a combination of Viola-Jones detectors, each detector trained on face data taken from a single viewpoint. Reliable non-upright face detection was first presented in a paper by Rowley, Baluja and Kanade [8]. They train two neural network classifiers. The first estimates the pose of a face in the detection window. The second is a conventional face detector. Faces are detected in three steps: for each image window the pose of the "face" is first estimated; the pose estimate is then used to de-rotate the image window; the window is then classified by the second detector. For non-face windows, the pose estimate must be considered random. Nevertheless, a rotated non-face should be rejected by the conventional detector. One potential flaw of such a system is that the final detection rate is roughly the product of the correct classification rates of the two classifiers (since the errors of the two classifiers are somewhat independent). One could adopt the Rowley et al. three step approach while replacing the classifiers with those of Viola and Jones. The final system would be more efficient, but not significantly: classification by the Viola-Jones system is so efficient that de-rotation would dominate the computational expense. In principle de-rotation is not strictly necessary, since it should be possible to construct a detector for rotated faces directly. Detection becomes a two stage process: first the pose of the window is estimated, and then one of the rotation-specific detectors is called upon to classify the window. In this paper detection of non-upright faces is handled using the two stage approach. In the first stage the pose of each window is estimated using a decision tree constructed using features like those described by Viola and Jones. In the second stage one of the pose-specific Viola-Jones detectors is used to classify the window. Once pose-specific detectors are trained and available, an alternative detection process can be tested as well. In this case all detectors are evaluated and the union of their detections…
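The two detection strategies the abstract compares, pose-estimate-then-classify versus evaluating every detector and taking the union, can be sketched schematically. The callables below are stand-ins, not the paper's trained decision tree or boosted detectors.

```python
def detect_two_stage(window, estimate_pose, detectors):
    # Stage 1: a pose classifier (a decision tree on rectangle features
    # in the paper) picks the viewpoint class for this window.
    pose = estimate_pose(window)
    # Stage 2: only the detector trained for that pose is evaluated,
    # preserving the speed advantage of the Viola-Jones cascade.
    return detectors[pose](window)

def detect_union(window, detectors):
    # Alternative: evaluate every pose-specific detector on the window
    # and report the union of their detections.
    return any(detector(window) for detector in detectors.values())
```

The two-stage route costs one pose estimate plus one detector per window; the union route costs every detector per window but cannot be misled by a wrong pose estimate.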

746 citations


01 Jan 2003
TL;DR: A new method for face recognition which learns a face similarity measure from example image pairs using a set of computationally efficient “rectangle” features which act on pairs of input images.
Abstract: This paper presents a new method for face recognition which learns a face similarity measure from example image pairs. A set of computationally efficient “rectangle” features are described which act on pairs of input images. The features compare regions within the input images at different locations, scales, and orientations. The AdaBoost algorithm is used to train the face similarity function by selecting features. Given a large face database, the set of face pairs is too large for effective training. We present a sampling procedure which selects a training subset based on the AdaBoost example weights. Finally, we show state of the art results on the FERET set of faces as well as a more challenging set of faces collected at our lab.
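The sampling step the abstract mentions, drawing a manageable training subset with probability proportional to the current AdaBoost example weights, could look roughly like the sketch below. This is an assumed formulation for illustration; the paper's exact procedure may differ.

```python
import numpy as np

def sample_training_subset(example_weights, subset_size, seed=None):
    # Normalize the current AdaBoost weights into a distribution and
    # draw a subset of pair indices without replacement, so hard
    # (high-weight) pairs are more likely to be used in the next round.
    rng = np.random.default_rng(seed)
    p = np.asarray(example_weights, dtype=float)
    p /= p.sum()
    return rng.choice(len(p), size=subset_size, replace=False, p=p)
```

When the weight distribution concentrates on a few pairs, those pairs are selected almost surely, which is how boosting can focus later rounds on the image pairs the current similarity function gets wrong without ever training on the full, intractably large pair set.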

156 citations


Patent
17 Jun 2003
TL;DR: In this article, a linear combination of filters is applied to a detection window in the set of combined images to determine motion and appearance features of the detection window, which are summed to determine a cumulative score.
Abstract: A method detects a moving object in a temporal sequence of images. Images are selected from the temporally ordered sequence of images. A set of functions is applied to the selected images to generate a set of combined images. A linear combination of filters is applied to a detection window in the set of combined images to determine motion and appearance features of the detection window. The motion and appearance features are summed to determine a cumulative score, which enables a classification of the detection window as including the moving object.
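The cumulative-score step of the claimed method can be sketched as a weighted sum of filter responses over the detection window, compared against a threshold. Names and the dot-product filter response are illustrative assumptions; in the related paper the filters are rectangle features selected by AdaBoost.

```python
import numpy as np

def cumulative_score(window_stack, filters, weights):
    # Each filter is applied to the detection window in one of the
    # combined (motion/appearance) images; responses are weighted
    # and summed into a single score.
    return sum(
        w * float(np.sum(img * f))
        for img, f, w in zip(window_stack, filters, weights)
    )

def classify_window(window_stack, filters, weights, threshold):
    # The window is labeled as containing the moving object when the
    # cumulative score clears the threshold.
    return cumulative_score(window_stack, filters, weights) >= threshold
```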

80 citations


Patent
17 Jun 2003
TL;DR: In this article, the orientation of an arbitrary object with respect to the image plane is determined, and one of a plurality of orientation- and object-specific classifiers is selected according to that orientation.
Abstract: A method detects a specific object in an image. The orientation of an arbitrary object with respect to the image plane is determined, and one of a plurality of orientation- and object-specific classifiers is selected according to the orientation. The arbitrary object is then classified as a specific object by the selected orientation- and object-specific classifier.

38 citations