
Showing papers on "Face detection published in 1999"


Journal ArticleDOI
TL;DR: An efficient and reliable probabilistic metric derived from the Bhattacharyya distance is used to classify the extracted feature vectors into face or nonface areas, using prototype face area vectors acquired in a previous training stage.
Abstract: Detecting and recognizing human faces automatically in digital images strongly enhances content-based video indexing systems. In this paper, a novel scheme for human face detection in color images under nonconstrained scene conditions, such as the presence of a complex background and uncontrolled illumination, is presented. Color clustering and filtering using approximations of the YCbCr and HSV skin color subspaces are applied to the original image, providing quantized skin color regions. A merging stage is then iteratively performed on the set of homogeneous skin color regions in the color quantized image in order to provide a set of potential face areas. Constraints related to the shape and size of faces are applied, and face intensity texture is analyzed by performing a wavelet packet decomposition on each face area candidate in order to detect human faces. The wavelet coefficients of the band-filtered images characterize the face texture, and a set of simple statistical deviations is extracted in order to form compact and meaningful feature vectors. Then, an efficient and reliable probabilistic metric derived from the Bhattacharyya distance is used to classify the extracted feature vectors into face or nonface areas, using prototype face area vectors acquired in a previous training stage.

641 citations
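The color clustering step above relies on approximations of the skin color subspace in YCbCr. A minimal per-pixel sketch of such a test, using an illustrative Cb/Cr box rather than the paper's actual YCbCr and HSV subspace approximations:

```python
def rgb_to_ycbcr(r, g, b):
    # ITU-R BT.601 full-range conversion from 8-bit RGB.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def is_skin(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    # Illustrative chrominance box (a common choice in the skin-detection
    # literature); the paper's exact subspaces are not reproduced here.
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]
```

Luminance is ignored in the test, which is what makes such a filter comparatively tolerant of uncontrolled illumination.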



Proceedings Article
29 Nov 1999
TL;DR: Experimental results on commonly used benchmark data sets of a wide range of face images show that the SNoW-based approach outperforms methods that use neural networks, Bayesian methods, support vector machines and others.
Abstract: A novel learning approach for human face detection using a network of linear units is presented. The SNoW learning architecture is a sparse network of linear functions over a pre-defined or incrementally learned feature space and is specifically tailored for learning in the presence of a very large number of features. A wide range of face images in different poses, with different expressions and under different lighting conditions are used as a training set to capture the variations of human faces. Experimental results on commonly used benchmark data sets of a wide range of face images show that the SNoW-based approach outperforms methods that use neural networks, Bayesian methods, support vector machines and others. Furthermore, learning and evaluation using the SNoW-based method are significantly more efficient than with other methods.

339 citations
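SNoW learns sparse networks of linear units, classically with the multiplicative Winnow update rule. A minimal single-unit sketch over binary features (the feature space, epochs, and parameters below are illustrative, not the paper's configuration):

```python
def winnow_train(examples, n_features, alpha=1.5, threshold=None):
    # examples: list of (set_of_active_feature_ids, label in {0, 1}).
    # Multiplicative promotion/demotion over only the active features,
    # which keeps learning cheap even with very large feature spaces.
    if threshold is None:
        threshold = n_features / 2
    w = [1.0] * n_features
    for _ in range(20):  # fixed number of epochs for this sketch
        for active, label in examples:
            score = sum(w[i] for i in active)
            pred = 1 if score >= threshold else 0
            if pred == label:
                continue
            factor = alpha if label == 1 else 1.0 / alpha
            for i in active:
                w[i] *= factor  # promote on a miss, demote on a false alarm
    return w, threshold

def winnow_predict(w, threshold, active):
    return 1 if sum(w[i] for i in active) >= threshold else 0
```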


Journal ArticleDOI
TL;DR: Name-It, a system that associates faces and names in news videos, takes a multimodal video analysis approach: face sequence extraction and similarity evaluation from videos, name extraction from transcripts, and video-caption recognition.
Abstract: We developed Name-It, a system that associates faces and names in news videos. It processes information from the videos and can infer possible name candidates for a given face or locate a face in news videos by name. To accomplish this task, the system takes a multimodal video analysis approach: face sequence extraction and similarity evaluation from videos, name extraction from transcripts, and video-caption recognition.

311 citations


Journal ArticleDOI
TL;DR: Two fuzzy models are built to describe skin color and hair color; the extracted skin and hair color regions are then compared with prebuilt head-shape models using a fuzzy-theory-based pattern-matching method to detect face candidates.
Abstract: This paper describes a new method to detect faces in color images based on fuzzy theory. We build two fuzzy models to describe the skin color and the hair color, respectively. In these models, we use a perceptually uniform color space to describe the color information, which increases accuracy and stability. We use the two models to extract the skin color regions and the hair color regions, and then compare them with prebuilt head-shape models using a fuzzy-theory-based pattern-matching method to detect face candidates.

286 citations


Proceedings ArticleDOI
01 Sep 1999
TL;DR: Given video footage of a person's face, this work presents new techniques to automatically recover the face position and the facial expression from each frame in the video sequence using a 3D face model fitted to each frame using a continuous optimization technique.
Abstract: Given video footage of a person's face, we present new techniques to automatically recover the face position and the facial expression from each frame in the video sequence. A 3D face model is fitted to each frame using a continuous optimization technique. Our model is based on a set of 3D face models that are linearly combined using 3D morphing. Our method improves on previous techniques by directly fitting a realistic three-dimensional face model and by recovering parameters that can be used directly in an animation system. We also explore many applications, including performance-driven animation (applying the recovered position and expression of the face to a synthetic character to produce an animation that mimics the input video), relighting the face, varying the camera position, and adding facial ornaments such as tattoos and scars.

220 citations


Proceedings ArticleDOI
15 Mar 1999
TL;DR: An embedded hidden Markov model (HMM)-based approach for face detection and recognition that uses an efficient set of observation vectors obtained from the 2D-DCT coefficients that can model the two dimensional data better than the one-dimensional HMM and is computationally less complex than the two-dimensional model.
Abstract: We describe an embedded hidden Markov model (HMM)-based approach for face detection and recognition that uses an efficient set of observation vectors obtained from the 2D-DCT coefficients. The embedded HMM can model two-dimensional data better than the one-dimensional HMM and is computationally less complex than the two-dimensional HMM. This model is appropriate for face images since it exploits an important facial characteristic: frontal faces preserve the same structure of "super states" from top to bottom, and also the same left-to-right structure of "states" inside each of these "super states".

176 citations
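The observation vectors above come from 2D-DCT coefficients of image sampling windows. A naive sketch of extracting a low-frequency DCT observation from one window (window size and the number of retained coefficients are illustrative):

```python
import math

def dct2(block):
    # Naive 2D DCT-II of a square block (O(n^4); fine for small windows).
    n = len(block)
    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out

def observation_vector(block, k=3):
    # Keep the k-by-k low-frequency corner as the per-window observation.
    coeffs = dct2(block)
    return [coeffs[u][v] for u in range(k) for v in range(k)]
```

Low-frequency coefficients carry most of the appearance information, which is why a small observation vector per window suffices.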


Patent
Edwin Ho1, Alison Lennon1
07 Jun 1999
TL;DR: In this article, a method and apparatus for the detection of faces in a digital image captured under a variety of lighting conditions are provided. The pixels are selected by first choosing a color distribution model based on an image capture condition provided with the color digital image; the color of the pixels is then tested using the distribution model to determine those pixels having predominantly skin color.
Abstract: A method and apparatus for the detection of faces in a digital image captured under a variety of lighting conditions are provided. Rather than subjecting an entire image to computationally intensive face detection analysis, the image is subjected to a computationally simple analysis to identify candidate pixels likely to be of skin color. Only those pixels having such color are subjected to computationally intensive face detection analysis. The pixels are selected by first selecting a color distribution model based on an image capture condition provided with the color digital image. Then, the color of the pixels is tested using the distribution model to determine those pixels having predominantly skin color. Preferably, the distribution model is selected from among a number of different distribution models, with each of the distribution models associated with a different set of lighting conditions. In this way, a face detection process may accommodate a variety of lighting conditions associated with the capture of an image using a computationally simple analysis.

176 citations
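The selection step can be sketched as a lookup from capture condition to a per-condition skin color distribution, followed by a per-pixel test. The condition names and Gaussian parameters below are hypothetical, not taken from the patent:

```python
# Hypothetical per-condition skin models in (Cb, Cr); the patent's actual
# distribution models and capture-condition tags are not specified here.
SKIN_MODELS = {
    "daylight":     {"mean": (103.0, 153.0), "std": (12.0, 10.0)},
    "incandescent": {"mean": (110.0, 158.0), "std": (14.0, 12.0)},
    "flash":        {"mean": (100.0, 150.0), "std": (10.0, 9.0)},
}

def is_skin_pixel(cb, cr, capture_condition, max_sigma=2.5):
    # Select the distribution model from the capture condition, then gate
    # the pixel on its per-channel deviation from the model mean.
    model = SKIN_MODELS[capture_condition]
    (mcb, mcr), (scb, scr) = model["mean"], model["std"]
    return abs(cb - mcb) / scb <= max_sigma and abs(cr - mcr) / scr <= max_sigma
```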


Journal ArticleDOI
TL;DR: A gravity-center template-matching search scheme for human face detection that is faster than the traditional search method; it significantly reduces the time consumed in the rough detection of human faces in a mosaic image.

129 citations


Journal ArticleDOI
TL;DR: Results obtained from a testbed used to investigate different codings for automatic face recognition strongly support the suggestion that faces should be considered as lying in a high-dimensional manifold, which is locally linearly approximated by these shapes and textures, possibly with a separate system for local features.
Abstract: We describe results obtained from a testbed used to investigate different codings for automatic face recognition. An eigenface coding of shape-free faces using manually located landmarks was more effective than the corresponding coding of correctly shaped faces. Configuration also proved an effective method of recognition, with rankings given to incorrect matches relatively uncorrelated with those from shape-free faces. Both sets of information combine to improve significantly the performance of either system. The addition of a system, which directly correlated the intensity values of shape-free images, also significantly increased recognition, suggesting extra information was still available. The recognition advantage for shape-free faces reflected and depended upon high-quality representation of the natural facial variation via a disjoint ensemble of shape-free faces; if the ensemble comprised nonfaces, a shape-free disadvantage was induced. Manipulation within the shape-free coding to emphasize distinctive features of the faces, by caricaturing, allowed further increases in performance; this effect was only noticeable when the independent shape-free and configuration coding was used. Taken together, these results strongly support the suggestion that faces should be considered as lying in a high-dimensional manifold, which is locally linearly approximated by these shapes and textures, possibly with a separate system for local features. Principal components analysis is then seen as a convenient tool in this local approximation.

114 citations


Journal ArticleDOI
TL;DR: A novel algorithm for front-view facial contour detection and features extraction is described, which eliminates the ears and neck from the face region by using morphological operations and knowledge about the face structure.

Dissertation
01 Jan 1999
TL;DR: Compared to other methods, this proposed system offers a more flexible framework for face recognition and detection, and can be used more efficiently in scale invariant systems.
Abstract: The use of hidden Markov models (HMM) for faces is motivated by their partial invariance to variations in scaling and by the structure of faces. The most significant facial features of a frontal face include the hair, forehead, eyes, nose and mouth. These features occur in a natural order, from top to bottom, even if the images undergo small rotations in the image plane, and/or rotations in the plane perpendicular to the image plane. Therefore, the image of a face may be modeled using a one-dimensional HMM by assigning each of these regions to a state. The observation vectors are obtained from the DCT or KLT coefficients. A one-dimensional HMM may be generalized, to give it the appearance of a two-dimensional structure, by allowing each state in a one-dimensional HMM to be a HMM. In this way, the HMM consists of a set of super states, along with a set of embedded states. Therefore, this is referred to as an embedded HMM. The super states may then be used to model two-dimensional data along one direction, with the embedded HMM modeling the data along the other direction. Both the standard HMM and the embedded HMM were tested for face recognition and detection. Compared to other methods, our proposed system offers a more flexible framework for face recognition and detection, and can be used more efficiently in scale invariant systems.

Proceedings ArticleDOI
24 Oct 1999
TL;DR: The algorithm works by first detecting areas of skin color, then applying a top-down and a bottom-up analysis to the skin-colored areas; results show that the algorithm is robust and works even for cases where there are objects in the background that have colors similar to the skin.
Abstract: The detection of faces in color images is important for many multimedia applications. It is the first step for face recognition and it can be used for classifying specific shots such as anchorperson and talk show shots. In this paper, we present an algorithm for the detection of faces in color images. The algorithm works by first detecting areas of skin color, then it applies a top-down and a bottom-up analysis to the skin colored areas. The algorithm has been tested on images from news clips and other television programs. The results show that the algorithm is robust and works even for cases where there are objects in the background that have colors similar to the skin.

Patent
Renato Kresch1
11 Jan 1999
TL;DR: In this paper, a face detection system and a method of pre-filtering an input image for face detection utilize a candidate selector that selects candidate regions of the input image that potentially contain a picture of a human face.
Abstract: A face detection system and a method of pre-filtering an input image for face detection utilize a candidate selector that selects candidate regions of the input image that potentially contain a picture of a human face. The candidate selector operates in conjunction with an associated face detector that verifies whether the candidate regions contain a human face. In the preferred embodiment, the candidate selector includes a linear matched filter and a non-linear filter that operate in series to select the candidate regions from the input image. Initially, the linear matched filter performs a linear correlation on the input image using a filtering kernel to derive a correlation image. The linear matched filter selects regions of the input image that have a local maximum in the correlation image and have correlation values greater than a threshold correlation value. Preferably, the linear correlation is performed in the discrete cosine transform (DCT) domain. The non-linear filter then examines contrast values from various sub-regions of the image regions that were selected by the linear matched filter to screen for suitable candidate regions. The filtering kernel used by the linear matched filter is calculated during a training period, or a non-face detecting period, by a filtering-kernel generator. The filtering kernel is preferably computed utilizing a database of training face images that have been averaged and then masked to remove DC, illumination and noise components.
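The linear matched filter stage (correlate, then keep above-threshold local maxima) can be sketched in the spatial domain; the patent prefers performing the correlation in the DCT domain, which is not shown here:

```python
def matched_filter_candidates(image, kernel, threshold):
    # Correlate the image with the kernel, then keep positions that are
    # both above the threshold and local maxima in a 4-neighborhood.
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = h - kh + 1, w - kw + 1
    corr = [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(cw)] for i in range(ch)]
    candidates = []
    for i in range(ch):
        for j in range(cw):
            val = corr[i][j]
            if val <= threshold:
                continue
            neighbors = [corr[a][b]
                         for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                         if 0 <= a < ch and 0 <= b < cw]
            if all(val >= n for n in neighbors):
                candidates.append((i, j))
    return candidates
```

Only the surviving positions would then pass to the non-linear contrast filter and, finally, the full face detector.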

Proceedings ArticleDOI
26 Sep 1999
TL;DR: This work extends SVMs to model the 2D appearance of human faces which undergo nonlinear change across the view sphere and enables simultaneous multi-view face detection and pose estimation at near-frame rate.
Abstract: Support vector machines have shown great potential for learning classification functions that can be applied to object recognition. In this work, we extend SVMs to model the 2D appearance of human faces which undergo nonlinear change across the view sphere. The model enables simultaneous multi-view face detection and pose estimation at near-frame rate.

Patent
Kentaro Toyama1
29 Oct 1999
TL;DR: In this article, a system and method for detecting a face within an image using a relational template over a geometric distribution of a non-intensity image property are described. The method includes performing feature extraction on the image based on an image property (such as edge density), grouping the extracted image feature values into facial regions, and using a relational template to determine whether a face has been detected.
Abstract: The present invention is embodied in a system and method for detecting a face within an image using a relational template over a geometric distribution of a non-intensity image property. In general, the system of the present invention includes a hypothesis module for defining a sub-region in which to search for a face, a feature extraction module for extracting image feature values based on a non-intensity image property, an averaging module for grouping the extracted image feature values into geometrically distributed facial regions, and a relational template module that uses a relational template and the facial regions to determine whether a face has been detected. In a preferred embodiment, the image property used is edge density, although other suitable properties (such as pixel color) may also be used. The method of the present invention includes performing feature extraction on the image based on an image property (such as edge density), grouping the extracted image feature values into facial regions, and using a relational template to determine whether a face has been detected.
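A relational template checks ordinal relations between geometrically distributed facial regions rather than absolute values. A minimal sketch over hypothetical region names and relations (the patent's actual template is not specified here):

```python
def detect_face_relational(region_means):
    # region_means: hypothetical mean edge-density per facial region,
    # as produced by the averaging step described above.
    # The particular relations below are illustrative placeholders.
    relations = [
        ("eyes", ">", "forehead"),
        ("mouth", ">", "cheeks"),
        ("eyes", ">", "cheeks"),
    ]
    for a, _, b in relations:
        if not region_means[a] > region_means[b]:
            return False
    return True
```

Because only relations are tested, the template is insensitive to global changes in the underlying property's scale.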


Journal ArticleDOI
TL;DR: A novel image-based face recognition algorithm that uses a set of random rectilinear line segments of 2D face image views as the underlying image representation, together with a nearest-neighbor classifier as the line matching scheme, which achieves high generalization recognition rates for rotations both in and out of the plane.
Abstract: Much research in human face recognition involves fronto-parallel face images, constrained rotations in and out of the plane, and operates under strict imaging conditions such as controlled illumination and limited facial expressions. Face recognition using multiple views in the viewing sphere is a more difficult task since face rotations out of the imaging plane can introduce occlusion of facial structures. In this paper, we propose a novel image-based face recognition algorithm that uses a set of random rectilinear line segments of 2D face image views as the underlying image representation, together with a nearest-neighbor classifier as the line matching scheme. The combination of 1D line segments exploits the inherent coherence in one or more 2D face image views in the viewing sphere. The algorithm achieves high generalization recognition rates for rotations both in and out of the plane, is robust to scaling, and is computationally efficient. Results show that the classification accuracy of the algorithm is superior compared with benchmark algorithms and is able to recognize test views in quasi-real-time.

Proceedings ArticleDOI
TL;DR: A real-time system for tracking and modeling of faces using an analysis-by-synthesis approach is presented, which can be used to warp the face in the incoming video back to frontal position, and parts of the image can then be subject to eigenspace coding for efficient transmission.
Abstract: A real-time system for tracking and modeling of faces using an analysis-by-synthesis approach is presented. A 3D face model is texture-mapped with a head-on view of the face. Feature points in the face-texture are then selected based on image Hessians. The selected points of the rendered image are tracked in the incoming video using normalized correlation. The result is fed into an extended Kalman filter to recover camera geometry, head pose, and structure from motion. This information is used to rigidly move the face model to render the next image needed for tracking. Every point is tracked from the Kalman filter's estimated position. The variance of each measurement is estimated using a number of factors, including the residual error and the angle between the surface normal and the camera. The estimated head pose can be used to warp the face in the incoming video back to frontal position, and parts of the image can then be subject to eigenspace coding for efficient transmission. The mouth texture is transmitted in this way using 50 bits per frame plus overhead from the person-specific eigenspace. The face tracking system runs at 30 Hz; coding the mouth texture slows it down to 12 Hz.

Proceedings ArticleDOI
23 Jun 1999
TL;DR: It is argued that Bayesian network models are an attractive statistical framework for cue fusion in these applications because they combine a natural mechanism for expressing contextual information with efficient algorithms for learning and inference.
Abstract: The development of user interfaces based on vision and speech requires the solution of a challenging statistical inference problem: The intentions and actions of multiple individuals must be inferred from noisy and ambiguous data. We argue that Bayesian network models are an attractive statistical framework for cue fusion in these applications. Bayes nets combine a natural mechanism for expressing contextual information with efficient algorithms for learning and inference. We illustrate these points through the development of a Bayes net model for detecting when a user is speaking. The model combines four simple vision sensors: face detection, skin color, skin texture, and mouth motion. We present some promising experimental results.
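If the four sensors are treated as conditionally independent given the hypothesis, their fusion reduces to naive-Bayes combination. A sketch with made-up likelihoods (a full Bayes net, as the paper argues for, would additionally model contextual structure between the cues):

```python
def speaking_posterior(cues, likelihoods, prior=0.5):
    # cues: dict of binary sensor outputs, e.g. {"face": 1, "skin": 0, ...}.
    # likelihoods: per-cue (P(cue=1 | speaking), P(cue=1 | not speaking)).
    # Naive-Bayes fusion under a conditional-independence assumption.
    p_s, p_n = prior, 1.0 - prior
    for name, value in cues.items():
        l1, l0 = likelihoods[name]
        p_s *= l1 if value else 1.0 - l1
        p_n *= l0 if value else 1.0 - l0
    return p_s / (p_s + p_n)
```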

Proceedings ArticleDOI
21 Jun 1999
TL;DR: An illumination-based method for synthesizing images of an object under novel viewing conditions that requires as few as three images of the object taken under variable illumination, but from a fixed viewpoint is presented.
Abstract: We present an illumination-based method for synthesizing images of an object under novel viewing conditions. Our method requires as few as three images of the object taken under variable illumination, but from a fixed viewpoint. Unlike multi-view based image synthesis, our method does not require the determination of point or line correspondences. Furthermore, our method is able to synthesize not simply novel viewpoints, but novel illumination conditions as well. We demonstrate the effectiveness of our approach by generating synthetic images of human faces.
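Under a Lambertian reading of the method, each pixel's intensity in the three fixed-viewpoint images is the dot product of an albedo-scaled surface normal with a known light direction, so three images suffice to recover that vector and synthesize the pixel under a novel light. A per-pixel sketch (the light directions and the Lambertian model are assumptions of this illustration):

```python
def solve3(m, rhs):
    # Cramer's rule for a 3x3 linear system m @ x = rhs.
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
              - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
              + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    sol = []
    for c in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][c] = rhs[r]
        sol.append(det(mc) / d)
    return sol

def relight_pixel(intensities, lights, new_light):
    # Lambertian assumption: I_k = b . s_k with b = albedo * surface normal.
    # Recover b from three images taken under lights s_1..s_3, then render
    # the pixel under a novel light direction (clamped at zero for shadow).
    b = solve3(lights, intensities)
    return max(0.0, sum(bi * si for bi, si in zip(b, new_light)))
```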

Proceedings Article
07 Jun 1999
TL;DR: A new log-polar mapping is presented; the main originality of this mapping is its great flexibility, as the range of the logarithmic function used for the mapping can be fully specified (width and position).
Abstract: Space-variant images (images whose resolution changes across the image) supply a powerful image representation for active vision systems. In this article a new log-polar mapping is presented. The main originality of this mapping is its great flexibility. Unlike other approaches the range of the logarithmic function used for the mapping can be fully specified (width and position). In this paper we also propose several efficient algorithms performing basic operations on these images. The originality of the proposed encoding is that log-polar pixels are a fractional part of rectangular pixels. Finally, we present some experimental results describing the use of this mapping in a face detection and tracking application.
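A sketch of a log-polar mapping in which the logarithmic range is fully specified by `rho_min` and `rho_max`, echoing the flexibility the paper emphasizes (the paper's fractional-pixel encoding is not reproduced here):

```python
import math

def to_log_polar(x, y, cx, cy, rho_min, rho_max, n_rings, n_wedges):
    # Map a Cartesian pixel to (ring, wedge) log-polar coordinates.
    # Both the width and the position of the log range are free parameters.
    dx, dy = x - cx, y - cy
    rho = math.hypot(dx, dy)
    if not (rho_min <= rho <= rho_max):
        return None  # outside the mapped annulus
    ring = int(n_rings * (math.log(rho / rho_min)
                          / math.log(rho_max / rho_min)))
    angle = (math.atan2(dy, dx) + 2 * math.pi) % (2 * math.pi)
    wedge = int(angle / (2 * math.pi) * n_wedges)
    return min(ring, n_rings - 1), min(wedge, n_wedges - 1)
```

Resolution falls off logarithmically with eccentricity, which is what makes such images attractive for active vision and tracking.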

01 Jan 1999
TL;DR: A new technique for a faster computation of the activities of the hidden layer units is proposed and has been demonstrated on face detection examples.
Abstract: We propose a new technique for a faster computation of the activities of the hidden layer units. This has been demonstrated on face detection examples.

Journal ArticleDOI
TL;DR: In this paper, a new log-polar mapping is presented, where the range of the logarithmic function used for the mapping can be fully specified (width and position).

Patent
08 Dec 1999
TL;DR: In this paper, the authors propose a device to distinguish and display the faces of persons appearing in a video by detecting faces in the video and then identifying each detected face.
Abstract: PROBLEM TO BE SOLVED: To distinguish and display the faces of persons appearing in a video by detecting the faces from the video and identifying each detected face. SOLUTION: The device, provided with a means for detecting a face from the video and a means for identifying the detected face, detects frames containing faces, extracts a face picture from each such frame, and groups the face pictures of each appearing person so as to extract a representative face picture per person and identify that person's face. In this way, the faces of the persons appearing in the video can be distinguished and displayed.

Proceedings ArticleDOI
23 Jun 1999
TL;DR: It is demonstrated that face recognition can be considerably improved by the analysis of video sequences, and the method presented is widely applicable in many multi-class interpretation problems.
Abstract: We present a quantitative evaluation of an algorithm for model-based face recognition. The algorithm actively learns how individual faces vary through video sequences, providing on-line suppression of confounding factors such as expression, lighting and pose. By actively decoupling sources of image variation, the algorithm provides a framework in which identity evidence can be integrated over a sequence. We demonstrate that face recognition can be considerably improved by the analysis of video sequences. The method presented is widely applicable in many multi-class interpretation problems.

Proceedings ArticleDOI
17 Oct 1999
TL;DR: A system which is able to visually detect human faces, to track them by controlling a robot-head and to pursue a detected person by means of driving movements is presented.
Abstract: We present a system which is able to visually detect human faces, to track them by controlling a robot-head and to pursue a detected person by means of driving movements. The detection is based on a multimodal approach combining color, motion, and contour information. By using a stereo algorithm the position of the person in the scene is determined. Both the path of the person going ahead and a local environment map built by means of range sensor data are used to perform the navigation task. Stationary and dynamic obstacles are avoided during the process of pursuit.

Journal ArticleDOI
TL;DR: A face detection system for an image annotation task that requires detection of faces at multiple scales against a complex background is described; it uses the presence of skin tone pixels coupled with shape and face-specific features to locate faces in images.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: Color information is incorporated into a face detection approach based on principal components analysis (PCA); experiments show that the color information significantly improves the robustness of the detection.
Abstract: We present a face detection algorithm for color images with complex background. We include color information into a face detection approach based on principal components analysis (PCA). A skin color probability image is generated by doing a color analysis and the PCA is performed on this new image instead of the luminance image. Experiments show that color information improves the robustness of the detection significantly.
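The key move is to run the PCA on a skin color probability image instead of the luminance image. A sketch of building such a probability image with an assumed Gaussian skin locus in normalized rg chromaticity (the paper's actual color analysis may differ):

```python
import math

def skin_probability(r, g, b):
    # Illustrative skin likelihood from normalized rg chromaticity with
    # an assumed Gaussian skin-locus mean and spread.
    total = r + g + b
    if total == 0:
        return 0.0
    rn, gn = r / total, g / total
    mr, mg, s = 0.42, 0.31, 0.06  # hypothetical skin-locus parameters
    d2 = ((rn - mr) / s) ** 2 + ((gn - mg) / s) ** 2
    return math.exp(-0.5 * d2)

def probability_image(rgb_image):
    # Replace the luminance image with a skin probability image, the
    # input on which the PCA face search would then be run.
    return [[skin_probability(*px) for px in row] for row in rgb_image]
```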

Proceedings ArticleDOI
27 Sep 1999
TL;DR: This paper describes a method for the detection and tracking of the human face and facial features that is capable of tracking, in every frame, the three main features of a human face.
Abstract: This paper describes a method for the detection and tracking of the human face and facial features. Skin segmentation is learnt from samples of an image. After a moving object is detected, the corresponding area is searched for clusters of pixels with a known distribution. Since we only use the hue (color) component, this process is quite insensitive to illumination changes. The face localization procedure looks for areas in the segmented area which resemble a head. Using simple heuristics, the located head is searched and its centroid is fed back to a camera motion control algorithm which tries to keep the face centered in the image using a pan-tilt camera unit. Furthermore, the system is capable of tracking, in every frame, the three main features of a human face. Since precise eye location is computationally intensive, an eye and mouth locator using fast morphological and linear filters is developed. This allows for frame-by-frame checking, which reduces the probability of tracking a non-basis feature, yielding a higher success ratio. Speed and robustness are the main advantages of this fast facial feature detector.