
Showing papers on "Face detection" published in 1998


Journal ArticleDOI
TL;DR: A neural network-based upright frontal face detection system that arbitrates between multiple networks to improve performance over a single network, and a straightforward procedure for aligning positive face examples for training.
Abstract: We present a neural network-based upright frontal face detection system. A retinally connected neural network examines small windows of an image and decides whether each window contains a face. The system arbitrates between multiple networks to improve performance over a single network. We present a straightforward procedure for aligning positive face examples for training. To collect negative examples, we use a bootstrap algorithm, which adds false detections into the training set as training progresses. This eliminates the difficult task of manually selecting nonface training examples, which must be chosen to span the entire space of nonface images. Simple heuristics, such as using the fact that faces rarely overlap in images, can further improve the accuracy. Comparisons with several other state-of-the-art face detection systems are presented, showing that our system has comparable performance in terms of detection and false-positive rates.

4,105 citations
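
The bootstrap idea in this paper generalizes well beyond its original network. Below is a minimal sketch of the negative-mining loop in Python, using scikit-learn's MLPClassifier as a stand-in for the paper's retinally connected network; the 20x20 window size, the `sliding_windows` helper, and the round count are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def sliding_windows(image, size=20, step=10):
    """Yield flattened size x size windows scanned over a grayscale image."""
    h, w = image.shape
    for y in range(0, h - size + 1, step):
        for x in range(0, w - size + 1, step):
            yield image[y:y + size, x:x + size].ravel()

def bootstrap_train(faces, scenery_images, n_rounds=5):
    """faces: (n, 400) aligned face windows; scenery_images: face-free images."""
    rng = np.random.default_rng(0)
    # Seed the negative set with a random sample of non-face windows.
    pool = np.array([w for img in scenery_images for w in sliding_windows(img)])
    negatives = list(rng.choice(pool, size=len(faces)))
    clf = MLPClassifier(hidden_layer_sizes=(30,), max_iter=500)
    for _ in range(n_rounds):
        X = np.vstack([faces, np.array(negatives)])
        y = np.concatenate([np.ones(len(faces)), np.zeros(len(negatives))])
        clf.fit(X, y)
        # Scan face-free scenery: every detection there is a false positive,
        # so add it to the negative set and retrain.
        for img in scenery_images:
            for win in sliding_windows(img):
                if clf.predict(win.reshape(1, -1))[0] == 1:
                    negatives.append(win)
    return clf
```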


Journal ArticleDOI
TL;DR: An example-based learning approach for locating vertical frontal views of human faces in complex scenes and shows empirically that the distance metric adopted for computing difference feature vectors, and the "nonface" clusters included in the distribution-based model, are both critical for the success of the system.
Abstract: We present an example-based learning approach for locating vertical frontal views of human faces in complex scenes. The technique models the distribution of human face patterns by means of a few view-based "face" and "nonface" model clusters. At each image location, a difference feature vector is computed between the local image pattern and the distribution-based model. A trained classifier determines, based on the difference feature vector measurements, whether or not a human face exists at the current image location. We show empirically that the distance metric we adopt for computing difference feature vectors, and the "nonface" clusters we include in our distribution-based model, are both critical for the success of our system.

2,013 citations
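
A condensed sketch of the distribution-based idea follows: model "face" and "nonface" patterns with a few clusters each, describe a candidate window by its vector of distances to every cluster center, and let a trained classifier decide. The cluster count and the plain Euclidean metric here are simplifications; the paper's two-part normalized Mahalanobis-style distance is one of the components it shows to be critical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

def fit_distribution_model(face_windows, nonface_windows, k=6):
    """Both inputs: (n, n_pixels) arrays of normalized window patterns."""
    face_km = KMeans(n_clusters=k, n_init=10).fit(face_windows)
    nonface_km = KMeans(n_clusters=k, n_init=10).fit(nonface_windows)
    centroids = np.vstack([face_km.cluster_centers_,
                           nonface_km.cluster_centers_])

    def difference_features(windows):
        # One distance per cluster -> a 2k-dimensional difference vector.
        return np.linalg.norm(windows[:, None, :] - centroids[None], axis=2)

    X = difference_features(np.vstack([face_windows, nonface_windows]))
    y = np.concatenate([np.ones(len(face_windows)),
                        np.zeros(len(nonface_windows))])
    clf = MLPClassifier(hidden_layer_sizes=(24,), max_iter=500).fit(X, y)
    return clf, difference_features
```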


Proceedings ArticleDOI
04 Jan 1998
TL;DR: A general trainable framework for object detection in static images of cluttered scenes based on a wavelet representation of an object class derived from a statistical analysis of the class instances and a motion-based extension to enhance the performance of the detection algorithm over video sequences is presented.
Abstract: This paper presents a general trainable framework for object detection in static images of cluttered scenes. The detection technique we develop is based on a wavelet representation of an object class derived from a statistical analysis of the class instances. By learning an object class in terms of a subset of an overcomplete dictionary of wavelet basis functions, we derive a compact representation of an object class which is used as an input to a support vector machine classifier. This representation both overcomes the problem of in-class variability and provides a low false detection rate in unconstrained environments. We demonstrate the capabilities of the technique in two domains whose inherent information content differs significantly. The first is face detection; the second is the domain of people, who, in contrast to faces, vary greatly in color, texture, and patterns. Unlike previous approaches, this system learns from examples and does not rely on any a priori (hand-crafted) models or motion-based segmentation. The paper also presents a motion-based extension to enhance the performance of the detection algorithm over video sequences. The results presented here suggest that this architecture may well be quite general.

1,594 citations
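
The pipeline reduces to wavelet features feeding a support vector machine. A toy version is sketched below, with fixed 2x2 Haar-style responses standing in for the learned subset of an overcomplete wavelet dictionary; the quadratic polynomial kernel matches the paper's classifier, while the feature set itself is a simplification.

```python
import numpy as np
from sklearn.svm import SVC

def haar_features(window):
    """window: 2-D array with even dimensions; returns 2x2 wavelet responses."""
    a = window[0::2, 0::2]; b = window[0::2, 1::2]
    c = window[1::2, 0::2]; d = window[1::2, 1::2]
    vertical = (a + c - b - d).ravel()    # responds to vertical edges
    horizontal = (a + b - c - d).ravel()  # responds to horizontal edges
    diagonal = (a + d - b - c).ravel()    # responds to diagonal structure
    return np.concatenate([vertical, horizontal, diagonal])

def train_detector(pos_windows, neg_windows):
    """Inputs are lists of equally sized 2-D grayscale windows."""
    X = np.array([haar_features(w) for w in pos_windows + neg_windows])
    y = np.array([1] * len(pos_windows) + [0] * len(neg_windows))
    return SVC(kernel="poly", degree=2).fit(X, y)
```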


Proceedings ArticleDOI
Gary Bradski
19 Oct 1998
TL;DR: An efficient, new algorithm is described here based on the mean shift algorithm, which robustly finds the mode (peak) of probability distributions within a video scene and is used as an interface for games and graphics.
Abstract: As a step towards a perceptual user interface, an object tracking algorithm is developed and demonstrated tracking human faces. Computer vision algorithms that are intended to form part of a perceptual user interface must be fast and efficient. They must be able to track in real time and yet not absorb a major share of computational resources. An efficient, new algorithm is described here based on the mean shift algorithm. The mean shift algorithm robustly finds the mode (peak) of probability distributions. We first describe histogram based methods of producing object probability distributions. In our case, we want to track the mode of an object's probability distribution within a video scene. Since the probability distribution of the object can change and move dynamically in time, the mean shift algorithm is modified to deal with dynamically changing probability distributions. The modified algorithm is called the Continuously Adaptive Mean Shift (CAMSHIFT) algorithm. CAMSHIFT is then used as an interface for games and graphics.

676 citations
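
CAMSHIFT later shipped with OpenCV as `cv2.CamShift`, so the loop the abstract describes can be sketched directly: build a hue histogram of the tracked region, back-project it onto each frame to get the object probability distribution, and let CamShift adapt the search window. The initial box and histogram thresholds below are assumptions.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
x, y, w, h = 200, 150, 80, 80                      # assumed initial face box
roi = frame[y:y + h, x:x + w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
# Hue histogram of the tracked object (mask out dim/desaturated pixels).
mask = cv2.inRange(hsv_roi, (0, 60, 32), (180, 255, 255))
hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
window = (x, y, w, h)
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Back-project the hue histogram to get a probability image, then let
    # CamShift adapt the window's position and size to the distribution.
    prob = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    box, window = cv2.CamShift(prob, window, criteria)
    pts = cv2.boxPoints(box).astype(np.int32)
    cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
    cv2.imshow("camshift", frame)
    if cv2.waitKey(30) == 27:                      # Esc to quit
        break
```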


Book ChapterDOI
14 Apr 1998
TL;DR: A hybrid classifier using PCA and LDA provides a useful framework for other image recognition tasks as well and demonstrates a significant improvement when principal components rather than original images are fed to the LDA classifier.
Abstract: In this paper we describe a face recognition method based on PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis). The method consists of two steps: first we project the face image from the original vector space to a face subspace via PCA, then we use LDA to obtain the best linear classifier. The basic idea of combining PCA and LDA is to improve the generalization capability of LDA when only a few samples per class are available. Using PCA, we are able to construct a face subspace in which we apply LDA to perform classification. Using the FERET dataset, we demonstrate a significant improvement when principal components rather than original images are fed to the LDA classifier. The hybrid classifier using PCA and LDA provides a useful framework for other image recognition tasks as well.

670 citations
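
The two-step classifier maps naturally onto a scikit-learn pipeline, sketched below; the component count of 50 is an assumption, and the paper's point is precisely that the PCA stage regularizes LDA when each class contributes only a few samples.

```python
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def pca_lda_classifier(X_train, y_train, n_components=50):
    """X_train: (n_samples, n_pixels) face vectors; y_train: identity labels."""
    model = make_pipeline(PCA(n_components=n_components),
                          LinearDiscriminantAnalysis())
    return model.fit(X_train, y_train)
```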


Book ChapterDOI
25 Dec 1998
TL;DR: An eigenspace manifold for the representation and recognition of pose-varying faces is described and a framework is proposed which can be used for both familiar and unfamiliar face recognition.
Abstract: We describe an eigenspace manifold for the representation and recognition of pose-varying faces. The distribution of faces in this manifold allows us to determine theoretical recognition characteristics which are then verified experimentally. Using this manifold a framework is proposed which can be used for both familiar and unfamiliar face recognition. A simple implementation demonstrates the pose dependent nature of the system over the transition from unfamiliar to familiar face recognition. Furthermore we show that multiple test images, whether real or virtual, can be used to augment the recognition process. The results compare favourably with reported human face recognition experiments. Finally, we describe how this framework can be used as a mechanism for characterising faces from video for general purpose recognition.

637 citations


Proceedings ArticleDOI
14 Apr 1998
TL;DR: The authors investigate the use of two types of features extracted from face images for recognizing facial expressions, and it turns out that five to seven hidden units are probably enough to represent the space of facial expressions.
Abstract: The authors investigate the use of two types of features extracted from face images for recognizing facial expressions. The first type is the geometric positions of a set of fiducial points on a face. The second type is a set of multi-scale and multi-orientation Gabor wavelet coefficients extracted from the face image at the fiducial points. They can be used either independently or jointly. The architecture developed is based on a two-layer perceptron. The recognition performance with different types of features has been compared, which shows that Gabor wavelet coefficients are much more powerful than geometric positions. Furthermore, since the first layer of the perceptron actually performs a nonlinear reduction of the dimensionality of the feature space, they have also studied the desired number of hidden units, i.e., the appropriate dimension to represent a facial expression in order to achieve a good recognition rate. It turns out that five to seven hidden units are probably enough to represent the space of facial expressions.

637 citations
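
A sketch of the Gabor-based branch: filter the image with a bank of multi-scale, multi-orientation Gabor kernels, sample the responses at the fiducial points, and feed them to a small perceptron. All kernel parameters below are assumptions, and taking the absolute value of a single real-valued response is a simplification of the complex-magnitude coefficients the paper uses.

```python
import cv2
import numpy as np
from sklearn.neural_network import MLPClassifier

def gabor_bank(wavelengths=(4, 8, 16), orientations=8):
    """Build a multi-scale, multi-orientation bank of Gabor kernels."""
    kernels = []
    for lambd in wavelengths:
        for k in range(orientations):
            theta = np.pi * k / orientations
            kernels.append(cv2.getGaborKernel(
                (31, 31), sigma=lambd / 2.0, theta=theta,
                lambd=lambd, gamma=0.5, psi=0))
    return kernels

def gabor_features(gray, fiducial_points, kernels):
    """Sample |response| at each (x, y) fiducial point for every kernel."""
    img = gray.astype(np.float32)
    responses = [cv2.filter2D(img, -1, k) for k in kernels]
    return np.array([abs(r[y, x]) for r in responses
                     for (x, y) in fiducial_points])

# A small hidden layer mirrors the paper's finding that five to seven
# hidden units suffice; the labels would be expression categories.
expression_net = MLPClassifier(hidden_layer_sizes=(7,), max_iter=1000)
```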


Proceedings ArticleDOI
23 Jun 1998
TL;DR: This paper presents a neural network-based face detection system, which is limited to detecting upright, frontal faces, and presents preliminary results for detecting faces rotated out of the image plane, such as profiles and semi-profiles.
Abstract: In this paper, we present a neural network-based face detection system. Unlike similar systems which are limited to detecting upright, frontal faces, this system detects faces at any degree of rotation in the image plane. The system employs multiple networks; a "router" network first processes each input window to determine its orientation and then uses this information to prepare the window for one or more "detector" networks. We present the training methods for both types of networks. We also perform sensitivity analysis on the networks, and present empirical results on a large test set. Finally, we present preliminary results for detecting faces rotated out of the image plane, such as profiles and semi-profiles.

570 citations
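
The router-detector arrangement can be sketched in a few lines: a regression network estimates each window's in-plane rotation angle, the window is derotated accordingly, and an upright-face detector makes the final call. The `router` and `detector` objects below are assumed to be pre-trained models with scikit-learn-style `predict` methods.

```python
import cv2
import numpy as np

def detect_rotated_face(window, router, detector):
    """window: square grayscale array; router predicts an angle in degrees."""
    angle = float(router.predict(window.reshape(1, -1))[0])
    h, w = window.shape
    # Rotate by the negative of the estimated angle to bring the face upright.
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), -angle, 1.0)
    upright = cv2.warpAffine(window, rot, (w, h))
    return detector.predict(upright.reshape(1, -1))[0] == 1
```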


Proceedings ArticleDOI
14 Apr 1998
TL;DR: A face recognition method using image sequences that forms a subspace from the input sequence and applies the Mutual Subspace Method, in which similarity is defined by the angle between the input subspace and the reference subspaces.
Abstract: We present a face recognition method using image sequences. As input we utilize multiple face images rather than a "single shot", so that the input reflects variation in facial expression and face direction. To identify the face, we essentially form a subspace from the image sequence and apply the Mutual Subspace Method, in which similarity is defined by the angle between the input subspace and the reference subspaces. We demonstrate the effectiveness of the proposed method through several experimental results.

512 citations
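
The Mutual Subspace Method has a compact linear-algebra core: the singular values of the product of two orthonormal bases are the cosines of the principal angles between the subspaces they span. A sketch, with the subspace dimension as an assumption:

```python
import numpy as np

def subspace_basis(frames, dim=5):
    """frames: (n_frames, n_pixels). Returns top `dim` right singular vectors."""
    X = frames - frames.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:dim].T                    # (n_pixels, dim), orthonormal columns

def mutual_subspace_similarity(basis_a, basis_b):
    # Singular values of A^T B are the cosines of the principal angles;
    # the largest one corresponds to the smallest angle between subspaces.
    s = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return s.max() ** 2

def identify(input_frames, reference_bases):
    probe = subspace_basis(input_frames)
    scores = [mutual_subspace_similarity(probe, ref) for ref in reference_bases]
    return int(np.argmax(scores))        # index of the best-matching identity
```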


Journal ArticleDOI
TL;DR: A novel method for the segmentation of faces, extraction of facial features and tracking of the face contour and features over time, using deformable models like snakes is described.
Abstract: The present paper describes a novel method for the segmentation of faces, extraction of facial features and tracking of the face contour and features over time. Robust segmentation of faces out of complex scenes is done based on color and shape information. Additionally, face candidates are verified by searching for facial features in the interior of the face. As interesting facial features we employ eyebrows, eyes, nostrils, mouth and chin. We consider incomplete feature constellations as well. If a face and its features are detected once reliably, we track the face contour and the features over time. Face contour tracking is done by using deformable models like snakes. Facial feature tracking is performed by block matching. The success of our approach was verified by evaluating 38 different color image sequences, containing features such as beards, glasses and changing facial expressions.

334 citations
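
The block-matching step used for feature tracking is straightforward to sketch with OpenCV's normalized cross-correlation; how the search window is chosen around the previous feature position is an assumption left to the caller.

```python
import cv2

def track_feature(prev_patch, search_window):
    """Both arguments are grayscale arrays; returns (x, y) within the window."""
    scores = cv2.matchTemplate(search_window, prev_patch,
                               cv2.TM_CCOEFF_NORMED)
    _, _, _, best = cv2.minMaxLoc(scores)   # location of the correlation peak
    return best
```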


Book ChapterDOI
02 Jun 1998
TL;DR: A simplified model of a deformable object class is introduced and the optimal detector for this model is derived, which is not realizable except under special circumstances (independent part positions).
Abstract: Many object classes, including human faces, can be modeled as a set of characteristic parts arranged in a variable spatial configuration. We introduce a simplified model of a deformable object class and derive the optimal detector for this model. However, the optimal detector is not realizable except under special circumstances (independent part positions). A cousin of the optimal detector is developed which uses “soft” part detectors with a probabilistic description of the spatial arrangement of the parts. Spatial arrangements are modeled probabilistically using shape statistics to achieve invariance to translation, rotation, and scaling. Improved recognition performance over methods based on “hard” part detectors is demonstrated for the problem of face detection in cluttered scenes.

Journal ArticleDOI
TL;DR: An analytic-to-holistic approach which can identify faces at different perspective variations is proposed, and it is shown that this approach can achieve a similar level of performance from different viewing directions of a face.
Abstract: We propose an analytic-to-holistic approach which can identify faces at different perspective variations. The database for the test consists of 40 frontal-view faces. The first step is to locate 15 feature points on a face. A head model is proposed, and the rotation of the face can be estimated using geometrical measurements. The positions of the feature points are adjusted so that their corresponding positions for the frontal view are approximated. These feature points are then compared with the feature points of the faces in a database using a similarity transform. In the second step, we set up windows for the eyes, nose, and mouth. These feature windows are compared with those in the database by correlation. Results show that this approach can achieve a similar level of performance from different viewing directions of a face. Under different perspective variations, the overall recognition rates are over 84 percent and 96 percent for the first and the first three likely matched faces, respectively.

Journal ArticleDOI
TL;DR: An algorithm for detecting human faces and facial features, such as the location of the eyes, nose and mouth, is described, using a supervised pixel-based color classifier and an ellipse model fit to each disjoint skin region.

Proceedings ArticleDOI
23 Jun 1998
TL;DR: This work combines stereo, color and face detection modules into a single robust system, shows an initial application in an interactive, face-responsive display, and discusses the failure modes of each individual module.
Abstract: We present an approach to real-time person tracking in crowded and/or unknown environments using multi-modal integration. We combine stereo, color and face detection modules into a single robust system, and show an initial application in an interactive, face-responsive display. Dense, real-time stereo processing is used to isolate users from other objects and people in the background. Skin-hue classification identifies and tracks likely body parts within the silhouette of a user. Face pattern detection discriminates and localizes the face within the identified body parts. Faces and bodies of users are tracked over several temporal scales: short-term (user stays within the field of view), medium-term (user exits/reenters within minutes), and long-term (user returns after hours or days). Short-term tracking is performed using simple region position and size correspondences, while medium and long-term tracking are based on statistics of user appearance. We discuss the failure modes of each individual module, describe our integration method, and report results with the complete system in trials with thousands of users.

Proceedings ArticleDOI
14 Apr 1998
TL;DR: A computer vision system is developed that automatically recognizes individual action units or action unit combinations in the upper face using hidden Markov models (HMMs) based on the Facial Action Coding System.
Abstract: Automated recognition of facial expression is an important addition to computer vision research because of its relevance to the study of psychological phenomena and the development of human-computer interaction (HCI). We developed a computer vision system that automatically recognizes individual action units or action unit combinations in the upper face using hidden Markov models (HMMs). Our approach to facial expression recognition is based on the Facial Action Coding System (FACS), which separates expressions into upper and lower face action. We use three approaches to extract facial expression information: (1) facial feature point tracking; (2) dense flow tracking with principal component analysis (PCA); and (3) high gradient component detection (i.e. furrow detection). The recognition results of the upper face expressions using feature point tracking, dense flow tracking, and high gradient component detection are 85%, 93% and 85%, respectively.
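
The recognition step can be schematized with the third-party `hmmlearn` package: train one Gaussian HMM per action unit on sequences of extracted features and label a new sequence by maximum log-likelihood. Feature extraction (point tracking, dense flow, furrow detection) is elided here; the state count and feature layout are assumptions.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_au_models(sequences_by_au, n_states=3):
    """sequences_by_au: {au_label: [(T_i, n_features) arrays]}."""
    models = {}
    for au, seqs in sequences_by_au.items():
        X = np.vstack(seqs)                 # concatenated observations
        lengths = [len(s) for s in seqs]    # per-sequence lengths for fitting
        models[au] = GaussianHMM(n_components=n_states).fit(X, lengths)
    return models

def classify(sequence, models):
    """sequence: (T, n_features); returns the most likely action unit label."""
    return max(models, key=lambda au: models[au].score(sequence))
```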

Proceedings ArticleDOI
23 Jun 1998
TL;DR: This method is both an implementation and extension (an extension in that it models cast shadows) of the illumination cone representation proposed in Belhumeur and Kriegman (1996), and the results exceed those of popular existing methods.
Abstract: Due to illumination variability, the same object can appear dramatically different even when viewed in fixed pose. To handle this variability, an object recognition system must employ a representation that is either invariant to, or models, this variability. This paper presents an appearance-based method for modeling the variability due to illumination in the images of objects. The method differs from past appearance-based methods, however, in that a small set of training images is used to generate a representation, the illumination cone, which models the complete set of images of an object with a Lambertian reflectance map under an arbitrary combination of point light sources at infinity. This method is both an implementation and an extension (an extension in that it models cast shadows) of the illumination cone representation proposed in Belhumeur and Kriegman (1996). The method is tested on a database of 660 images of 10 faces, and the results exceed those of popular existing methods.
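
A simplified view of recognition with an illumination cone: since images under nonnegative combinations of light sources stay inside the cone, a probe can be assigned to the identity whose cone reconstructs it best. The sketch below approximates each cone by a matrix of extreme-ray images and uses SciPy's nonnegative least squares for the projection; construction of the cone itself is elided.

```python
import numpy as np
from scipy.optimize import nnls

def distance_to_cone(probe, extreme_rays):
    """probe: (n_pixels,); extreme_rays: (n_pixels, n_rays) matrix."""
    coeffs, residual = nnls(extreme_rays, probe)
    return residual                      # ||probe - extreme_rays @ coeffs||

def recognize(probe, cones_by_identity):
    """cones_by_identity: {name: extreme-ray matrix}; returns best identity."""
    return min(cones_by_identity,
               key=lambda who: distance_to_cone(probe, cones_by_identity[who]))
```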

Journal ArticleDOI
TL;DR: It is argued that modelling person-specific probability densities in a generic face space using mixture models provides a technique applicable to all four face recognition tasks.

Proceedings ArticleDOI
14 Apr 1998
TL;DR: A proposed method to automatically locate a person's face in a given image consisting of a head-and-shoulders view against a complex background scene, using a fast, simple and yet robust algorithm that exploits the spatial distribution characteristics of human skin color.
Abstract: This paper addresses our proposed method to automatically locate a person's face in a given image that consists of a head-and-shoulders view of the person and a complex background scene. The method involves a fast, simple and yet robust algorithm that exploits the spatial distribution characteristics of human skin color. It first uses the chrominance component of the input image to detect pixels with skin color appearance. Then, based on the spatial distribution of the detected skin-color pixels and their corresponding luminance values, the algorithm employs some regularization processes to reinforce regions of skin-color pixels that are more likely to belong to the facial regions and eliminate those that are not. The performance of the face localization algorithm is illustrated by simulation results carried out on various head-and-shoulders test images.
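
A bare-bones version of the chrominance step: convert to YCrCb and threshold the Cr/Cb components, leaving luminance unconstrained, then clean the mask morphologically. The threshold box below is a common rule of thumb, not the paper's exact skin-color model, and the regularization processes are reduced to open/close operations.

```python
import cv2

def skin_mask(bgr_image):
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    # Luminance (Y) is left unconstrained; only Cr/Cb are thresholded.
    mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # drop speckle
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes
```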

Journal ArticleDOI
Thomas Vetter
TL;DR: A new technique is described for synthesizing images of faces from new viewpoints, when only a single 2D image is available, which is interesting for view independent face recognition tasks as well as for image synthesis problems in areas like teleconferencing and virtualized reality.
Abstract: Images formed by a human face change with viewpoint. A new technique is described for synthesizing images of faces from new viewpoints, when only a single 2D image is available. A novel 2D image of a face can be computed without explicitly computing the 3D structure of the head. The technique draws on a single generic 3D model of a human head and on prior knowledge of faces based on example images of other faces seen in different poses. The example images are used to "learn" a pose-invariant shape and texture description of a new face. The 3D model is used to solve the correspondence problem between images showing faces in different poses. The proposed method is interesting for view-independent face recognition tasks as well as for image synthesis problems in areas like teleconferencing and virtualized reality.

Journal ArticleDOI
TL;DR: A pupil detection technique using two light sources and the image difference method is proposed and a method for eliminating the images of the light sources reflected in the glass lens is proposed for users wearing eye glasses.
Abstract: Recently, some video-based eye-gaze detection methods used in eye-slaved support systems for the severely disabled have been studied. In these methods, infrared light was irradiated to an eye, two feature areas (the corneal reflection light and pupil) were detected in the image obtained from a video camera and then the eye-gaze direction was determined by the relative positions between the two. However, there were problems concerning stable pupil detection under various room light conditions. In this paper, methods for precisely detecting the two feature areas are consistently mentioned. First, a pupil detection technique using two light sources and the image difference method is proposed. Second, for users wearing eye glasses, a method for eliminating the images of the light sources reflected in the glass lens is proposed. The effectiveness of these proposed methods is demonstrated by using an imaging board. Finally, the feasibility of implementing hardware for the proposed methods in real time is discussed.
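
The image-difference idea is simple to sketch: with on-axis (coaxial) illumination the pupil retro-reflects and appears bright, with off-axis illumination it appears dark, so subtracting the two synchronized frames leaves the pupil as the dominant blob. The threshold value and blob selection below are assumptions.

```python
import cv2

def find_pupil(bright_frame, dark_frame):
    """Both frames grayscale, captured under alternating light sources."""
    diff = cv2.absdiff(bright_frame, dark_frame)
    _, binary = cv2.threshold(diff, 40, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    pupil = max(contours, key=cv2.contourArea)    # largest blob wins
    (x, y), radius = cv2.minEnclosingCircle(pupil)
    return (x, y), radius
```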

Proceedings ArticleDOI
04 Oct 1998
TL;DR: A novel HMM-based face detection approach using the same feature extraction techniques used for face recognition using the coefficients of the Karhunen-Loeve transform is introduced.
Abstract: The work presented in this paper describes a hidden Markov model (HMM)-based framework for face recognition and face detection. The observation vectors used to characterize the states of the HMM are obtained using the coefficients of the Karhunen-Loeve transform (KLT). The face recognition method presented reduces significantly the computational complexity of previous HMM-based face recognition systems, while slightly improving the recognition rate. Consistent with the HMM model of the face, this paper introduces a novel HMM-based face detection approach using the same feature extraction techniques used for face recognition.

MonographDOI
11 Jun 1998
TL;DR: A monograph presenting a theoretical perspective for understanding face recognition, placing faces in their social and biological context and covering everyday recognition errors, deficits after brain injury, delusional misidentification, and covert face recognition in prosopagnosia.
Abstract: Preface. 1: Finding the mind's construction in the face. 2: Faces in their social and biological context. 3: A theoretical perspective for understanding face recognition. 4: Applicability of the theoretical model. 5: Everyday errors in face recognition. 6: Dissociable deficits after brain injury. 7: Face recognition and face imagery. 8: Accounting for delusional misidentifications. 9: Reduplication of visual stimuli. 10: Recognition and reality. 11: Covert face recognition in prosopagnosia. 12: Covert face recognition without prosopagnosia. 13: Simulating covert recognition. 14: Consciousness. Author index. Subject index

Journal ArticleDOI
TL;DR: A face identification algorithm that automatically processes an unknown image by locating and identifying the face by using matching pursuit filters, which is robust to variations in facial expression, hair style, and the surrounding environment.
Abstract: We present a face identification algorithm that automatically processes an unknown image by locating and identifying the face. The heart of the algorithm is the use of matching pursuit filters. A matching pursuit filter is an adapted wavelet expansion, where the expansion is adapted to both the data and the pattern recognition problem being addressed. For identification, the filters find the features that differentiate among faces, whereas, for detection, the filters encode the similarities among faces. The filters are designed through a simultaneous decomposition of a training set into a two-dimensional (2-D) wavelet expansion. This yields a representation that is explicitly 2-D and encodes information locally. The algorithm uses coarse-to-fine processing to locate a small set of key facial features, which are restricted to the nose and eye regions of the face. The result is an algorithm that is robust to variations in facial expression, hair style, and the surrounding environment. Based on the locations of the facial features, the identification module searches the database for the identity of the unknown face, using matching pursuit filters to make the identification. The algorithm was demonstrated on three sets of images. The first set was images from the FERET database. The second set was infrared and visible images of the same people. This demonstration was done to compare performance on infrared and visible images individually, and on fusing the results from both modalities. The third set was mugshot data from a law enforcement application.
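
The core operation behind a matching pursuit filter is a greedy sparse decomposition of a signal over a wavelet dictionary. A quick illustration with scikit-learn's orthogonal matching pursuit (a close relative of plain matching pursuit) follows; the random dictionary is a placeholder for the paper's adapted two-dimensional wavelet expansion.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
dictionary = rng.standard_normal((400, 1000))      # atoms as unit columns
dictionary /= np.linalg.norm(dictionary, axis=0)
signal = rng.standard_normal(400)                  # stand-in face window

# Greedily pick the 20 atoms that best explain the signal.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=20)
omp.fit(dictionary, signal)
coefficients = omp.coef_                           # sparse expansion weights
```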

Proceedings ArticleDOI
14 Apr 1998
TL;DR: An online facial expression recognition system based on personalized galleries is presented, built on the framework of the PersonSpotter system, which is able to track and detect the face of a person in a live video sequence.
Abstract: An online facial expression recognition system based on personalized galleries is presented. This system is built on the framework of the PersonSpotter system, which is able to track and detect the face of a person in a live video sequence. By utilizing the recognition method of Elastic Graph Matching, the most similar person whose images are stored in the gallery can be found, then the personalized gallery of this person is used to recognize the expression on the probe face. A personalized gallery consists of images of the same person showing different facial expressions. Node weighting and weighted voting in addition to Elastic Graph Matching are applied to identify the expression. The performance achieved by this system shows its great potential.

Proceedings ArticleDOI
12 May 1998
TL;DR: A hybrid real-time face tracker based on both sound and visual cues is presented that is robust to nonlinear source motions, complex backgrounds, varying lighting conditions, and a variety of source-camera depths.
Abstract: A hybrid real-time face tracker based on both sound and visual cues is presented. Initial talker locations are estimated acoustically from microphone array data while precise localization and tracking are derived from image information. A computationally efficient algorithm for face detection via motion analysis is employed to track individual faces at rates up to 30 frames per second. The system is robust to nonlinear source motions, complex backgrounds, varying lighting conditions, and a variety of source-camera depths. While the direct focus of this work is automated video conferencing, the face tracking capability has utility to many multimedia and virtual reality applications.

ReportDOI
01 Jul 1998
TL;DR: This paper presents a real-time implementation of an eye finding algorithm for a foveated active vision system, and finds that the system finds eyes in 94% of a set of behavioral trials, suggesting that alternate means of evaluating behavioral systems are necessary.
Abstract: Eye finding is the first step toward building a machine that can recognize social cues, like eye contact and gaze direction, in a natural context. In this paper, we present a real-time implementation of an eye finding algorithm for a foveated active vision system. The system uses a motion-based prefilter to identify potential face locations. These locations are analyzed for faces with a template-based algorithm developed by Sinha (1996). Detected faces are tracked in real time, and the active vision system saccades to the face using a learned sensorimotor mapping. Once gaze has been centered on the face, a high-resolution image of the eye can be captured from the foveal camera using a self-calibrated peripheral-to-foveal mapping. We also present a performance analysis of Sinha's ratio template algorithm on a standard set of static face images. Although this algorithm performs relatively poorly on static images, this result is a poor indicator of the real-time performance of the behaving system. We find that our system finds eyes in 94% of a set of behavioral trials. We suggest that alternate means of evaluating behavioral systems are necessary.

Proceedings ArticleDOI
04 Oct 1998
TL;DR: This paper presents a statistics-based method for estimating the position and size of a face in a complex background based on robust statistical measurements derived from two one-dimensional histograms obtained by projecting the result of skin color filtering.
Abstract: This paper presents a statistics-based method for estimating the position and size of a face in a complex background. Face position and size are estimated based on robust statistical measurements which are derived from two one-dimensional histograms obtained by projecting the result of skin color filtering. The proposed algorithm also utilizes a linear Kalman filter and a simple nonlinear filter to perform smooth tracking and remove jitter. The algorithm has been implemented and tested under a wide range of real-world conditions. It has consistently provided performance which satisfies the following requirements: (1) it automatically determines the initial position and size of a face and tracks it in a complex background; (2) it is insensitive to partial occlusions and shadows; (3) it is insensitive to face orientation and scale changes; and (4) it is insensitive to changes in lighting conditions. In addition, the algorithm is computationally simple, so it can be executed in real time.
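
A sketch of the projection-based estimate: project the binary skin mask onto each axis, take robust percentile bounds of the two one-dimensional histograms as the face box, and smooth the box center with an OpenCV Kalman filter under a constant-velocity model. The percentile cutoffs stand in for the paper's robust statistical measurements.

```python
import cv2
import numpy as np

def robust_bounds(hist, lo=0.05, hi=0.95):
    """Percentile bounds of a 1-D histogram (robust to stray pixels)."""
    total = hist.sum()
    if total == 0:
        return 0, len(hist)              # no skin pixels detected
    c = np.cumsum(hist) / total
    return int(np.searchsorted(c, lo)), int(np.searchsorted(c, hi))

def face_box_from_mask(mask):
    """mask: binary skin map. Returns (x, y, w, h) from axis projections."""
    x0, x1 = robust_bounds(mask.sum(axis=0).astype(float))
    y0, y1 = robust_bounds(mask.sum(axis=1).astype(float))
    return x0, y0, x1 - x0, y1 - y0

# Constant-velocity Kalman filter smoothing the box center over time.
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                                [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3

def smoothed_center(x, y, w, h):
    kf.predict()
    est = kf.correct(np.array([[x + w / 2], [y + h / 2]], np.float32))
    return float(est[0, 0]), float(est[1, 0])
```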

Proceedings ArticleDOI
01 Sep 1998
TL;DR: A face detection and facial feature extraction in frontal views algorithm based on principles described in [1], extended by considering: (a) the mirror symmetry of the face in the vertical direction and (b) facial biometric analogies depending on the size of the face estimated by the face localization method.
Abstract: Face detection and facial feature extraction are considered to be key requirements in many applications, such as access control systems, model-based video coding, and content-based video browsing and retrieval. Thus, accurate face localization and facial feature extraction are most desirable. A face detection and facial feature extraction algorithm for frontal views is described in this paper. The algorithm is based on principles described in [1] but extends that work by considering: (a) the mirror symmetry of the face in the vertical direction and (b) facial biometric analogies depending on the size of the face estimated by the face localization method. Further improvements have been added to the face localization method to enhance its performance. The proposed algorithm has been applied to frontal views extracted from the European ACTS M2VTS database with very good results.

Proceedings ArticleDOI
14 Apr 1998
TL;DR: A real-time face recognition system which is able to capture, track and recognize a person walking toward a stereo CCD camera, built for real world applications where environmental conditions are specified only roughly or not at all.
Abstract: The authors present a real-time face recognition system which is able to capture, track and recognize a person walking toward a stereo CCD camera. The system is built for real world applications where environmental conditions like illumination, background structure and room architecture are specified only roughly or not at all. The program is implemented on a 4-processor system running UNIX and reaches a recognition speed of 6-8 persons per minute.

Patent
31 Mar 1998
TL;DR: In this paper, the location of a facial region of a frame of a video is detected and the sensitivity information is calculated for each of a plurality of locations within the video based upon the location.
Abstract: A system encodes video by detecting the location of a facial region of a frame of the video. Sensitivity information is calculated for each of a plurality of locations within the video based upon the location of the facial region. The frame is encoded in a manner that provides a substantially uniform apparent quality of the plurality of locations to the viewer when the viewer is observing the facial region of the video.