
Showing papers on "Face detection published in 1996"


Proceedings ArticleDOI
18 Jun 1996
TL;DR: A neural network-based face detection system that arbitrates between multiple networks to improve on the performance of a single network, trained with a bootstrap algorithm that eliminates the difficult task of manually selecting non-face training examples.
Abstract: We present a neural network-based face detection system. A retinally connected neural network examines small windows of an image and decides whether each window contains a face. The system arbitrates between multiple networks to improve performance over a single network. We use a bootstrap algorithm for training the networks, which adds false detections into the training set as training progresses. This eliminates the difficult task of manually selecting non-face training examples, which must be chosen to span the entire space of non-face images. Comparisons with other state-of-the-art face detection systems are presented; our system has better performance in terms of detection and false-positive rates.
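
To make the bootstrap idea concrete, here is a minimal sketch in Python, with scikit-learn's MLPClassifier standing in for the retinally connected network; the 19x19 window size, the training schedule, and the `load_face_windows`/`load_scenery_images` helpers are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

WIN = 19  # window side length (assumed)

def scan_windows(image, step=4):
    """Yield every WIN x WIN window of a grayscale image as a flat vector."""
    h, w = image.shape
    for y in range(0, h - WIN, step):
        for x in range(0, w - WIN, step):
            yield image[y:y + WIN, x:x + WIN].ravel()

faces = load_face_windows()       # hypothetical: (n, WIN*WIN) face examples
scenery = load_scenery_images()   # hypothetical: face-free grayscale images

# Start with a small random non-face set; bootstrapping replaces the need
# to hand-pick non-face examples spanning the whole non-face space.
rng = np.random.default_rng(0)
nonfaces = rng.random((len(faces), WIN * WIN))

clf = MLPClassifier(hidden_layer_sizes=(26,), max_iter=500)
for _ in range(5):                       # a few bootstrap rounds
    X = np.vstack([faces, nonfaces])
    y = np.hstack([np.ones(len(faces)), np.zeros(len(nonfaces))])
    clf.fit(X, y)
    # Collect false detections on face-free scenery and add them to the
    # non-face training set for the next round.
    hard = [w for img in scenery for w in scan_windows(img)
            if clf.predict(w.reshape(1, -1))[0] == 1]
    if not hard:
        break
    nonfaces = np.vstack([nonfaces, np.array(hard)])
```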

301 citations


Journal ArticleDOI
TL;DR: A scanning scheme for face detection in color scenes in which orange-like parts, including the face areas, are enhanced using the I component of the YIQ color system; results showed that this method can effectively locate faces in complex backgrounds.
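
A minimal sketch of the color enhancement this TL;DR describes: the I (in-phase) component of the NTSC YIQ transform responds strongly to orange-like tones, so thresholding it yields a rough face-candidate mask. The threshold value is an assumption.

```python
import numpy as np

def skin_mask_yiq(rgb, i_thresh=20.0):
    """rgb: HxWx3 uint8 image; returns a boolean mask of orange-like pixels."""
    r, g, b = [rgb[..., k].astype(float) for k in range(3)]
    i = 0.596 * r - 0.274 * g - 0.322 * b   # NTSC I component
    return i > i_thresh                     # threshold value is an assumption
```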

256 citations


Dissertation
01 Jan 1996
TL;DR: This thesis presents a learning-based approach for detecting classes of objects and patterns with variable image appearance but highly predictable image boundaries, proposes an active learning formulation for function approximation, and shows that the active example selection strategy learns its target with fewer data samples than random sampling.
Abstract: Object and pattern detection is a classical computer vision problem with many potential applications, ranging from automatic target recognition to image-based industrial inspection tasks in assembly lines. While there have been some successful object and pattern detection systems in the past, most such systems handle only specific rigid objects or patterns that can be accurately described by fixed geometric models or pictorial templates. This thesis presents a learning based approach for detecting classes of objects and patterns with variable image appearance but highly predictable image boundaries. Some examples of such object and pattern classes include human faces, aerial views of structured terrain features like volcanoes, localized material defect signatures in industrial parts, certain tissue anomalies in medical images, and instances of a given digit or character, which may be written or printed in many different styles. The thesis consists of two parts. In part one, we introduce our object and pattern detection approach using a concrete human face detection example. The approach first builds a distribution-based model of the target pattern class in an appropriate feature space to describe the target's variable image appearance. It then learns from examples a similarity measure for matching new patterns against the distribution-based target model. We also discuss some pertinent learning issues, including ideas on virtual example generation and example selection. The approach makes few assumptions about the target pattern class and should therefore be fairly general, as long as the target class has predictable image boundaries. We show that this is indeed the case by demonstrating the technique on two other pattern detection/recognition problems. Because our object and pattern detection approach is very much learning-based, how well a system eventually performs depends heavily on the quality of training examples it receives. The second part of this thesis looks at how one can select high quality examples for function approximation learning tasks. Active learning is an area of research that investigates how a learner can intelligently select future training examples to get better approximation results with less data. We propose an active learning formulation for function approximation, and show for three specific approximation function classes, that the active example selection strategy learns its target with fewer data samples than random sampling. Finally, we simplify the original active learning formulation, and show how it leads to a tractable example selection paradigm, suitable for use in many object and pattern detection problems. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

254 citations


Proceedings ArticleDOI
14 Oct 1996
TL;DR: A new approach to automatic segmentation and tracking of faces in color images that evaluates color and shape information, selecting skin-colored regions with elliptical shape as face hypotheses.
Abstract: The authors present a new approach for automatic segmentation and tracking of faces in color images. Segmentation of faces is performed by evaluating color and shape information. First, skin-like regions are determined based on the color attributes hue and saturation. Then regions with elliptical shape are selected as face hypotheses. They are verified by searching for facial features in their interior. After a face is reliably detected, it is tracked over time. Tracking is realized by using an active contour model. The exterior forces of the snake are defined based on color features. They push or pull snaxels perpendicular to the snake. Results are shown for tracking over an image sequence consisting of 150 frames.
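
A rough sketch of the color-then-shape stage described above, using OpenCV 4; the hue/saturation thresholds and the aspect-ratio test are assumptions, not the paper's values.

```python
import cv2
import numpy as np

def face_hypotheses(bgr):
    """Return fitted ellipses for skin-colored, roughly oval regions."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    skin = cv2.inRange(hsv, (0, 40, 60), (25, 180, 255))   # hue/sat gate
    contours, _ = cv2.findContours(skin, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    hypotheses = []
    for c in contours:
        if len(c) < 5 or cv2.contourArea(c) < 500:  # fitEllipse needs >= 5 pts
            continue
        ell = cv2.fitEllipse(c)
        minor, major = sorted(ell[1])
        if 1.1 < major / max(minor, 1e-6) < 2.0:    # plausibly face-shaped
            hypotheses.append(ell)
    return hypotheses
```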

208 citations


Proceedings ArticleDOI
14 Oct 1996
TL;DR: In this paper, an approach for estimating 3D head orientation in a monocular image sequence is proposed, which employs recently developed image-based parameterized tracking for face and face features to locate the area in which a sub-pixel parameterized shape estimation of the eye's boundary is performed.
Abstract: An approach for estimating 3D head orientation in a monocular image sequence is proposed. The approach employs recently developed image-based parameterized tracking for face and face features to locate the area in which a sub-pixel parameterized shape estimation of the eye's boundary is performed. This involves tracking of five points (four at the eye corners and the fifth at the tip of the nose). The authors describe an approach that relies on the coarse structure of the face to compute orientation relative to the camera plane. Our approach employs projective invariance of the cross-ratios of the eye corners and anthropometric statistics to estimate the head yaw, roll and pitch. Analytical and experimental results are reported.
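
The paper's method rests on cross-ratios and anthropometric statistics; as a much simpler illustrative stand-in, the sketch below estimates roll from the slope of the eye line and yaw from left/right eye-width asymmetry, assuming the four eye corners have already been tracked.

```python
import numpy as np

def coarse_roll_yaw(ol, il, ir, orr):
    """ol, il, ir, orr: outer-left, inner-left, inner-right, outer-right
    eye corners as (x, y) image points. Returns (roll, yaw) in radians."""
    ol, il, ir, orr = map(np.asarray, (ol, il, ir, orr))
    axis = orr - ol                       # line through the eye corners
    roll = np.arctan2(axis[1], axis[0])   # in-plane head rotation
    wl = np.linalg.norm(il - ol)          # apparent left-eye width
    wr = np.linalg.norm(orr - ir)         # apparent right-eye width
    # Foreshortening makes the nearer eye look wider; the asymmetry gives
    # a crude yaw estimate (assumes equal true eye widths).
    yaw = np.arcsin(np.clip((wr - wl) / (wr + wl), -1.0, 1.0))
    return roll, yaw
```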

194 citations


Proceedings ArticleDOI
14 Oct 1996
TL;DR: The recognition results show that both visible and IR imagery perform similarly across algorithms and that fusion of IR and visible imagery is a viable means of enhancing performance beyond that of either acting alone.
Abstract: This paper presents initial results in a study comparing the effectiveness of visible and infra-red (IR) imagery for detecting and recognizing faces in areas where personnel identification is critical (e.g. airports and secure buildings). We compare the effectiveness of visible versus IR imagery by running three face recognition algorithms on a database of images collected for this study. There are both IR and visible images for each person in the database, collected using the same scenarios. We used three very different feature-extraction and decision-making algorithms for our study to ensure that the comparisons would not depend on a particular processing technique. We also present recognition results when visible and infra-red decision metrics are fused. The recognition results show that both visible and IR imagery perform similarly across algorithms and that fusion of IR and visible imagery is a viable means of enhancing performance beyond that of either acting alone. We examine the relative importance of different regions of the face for recognition. We also discuss practical issues of implementation, along with plans for the next phase of the study, face detection in an uncontrolled environment. Preliminary face detection results are presented.

186 citations


Proceedings ArticleDOI
16 Sep 1996
TL;DR: Face localization is performed based on the observation that human faces are characterized by their oval shape and skin color, even under varying lighting conditions; faces are segmented by evaluating shape and color information.
Abstract: Recognition of human faces from still images or image sequences is a research field of fast increasing interest. First, facial regions and facial features like eyes and mouth have to be extracted. In the present paper we propose an approach that copes with the problems of these first two steps. We perform face localization based on the observation that human faces are characterized by their oval shape and skin color, even under varying lighting conditions. To do so, we segment faces by evaluating shape and color (HSV) information. Then face hypotheses are verified by searching for facial features inside the face-like regions. This is done by applying morphological operations and minima localization to intensity images.

179 citations


Proceedings ArticleDOI
18 Jun 1996
TL;DR: Real-time face tracking and pose estimation are demonstrated in an unconstrained office environment with an active foveated camera: the system determines the spatial location of a user's head and guides the camera to obtain foveated images of the face.
Abstract: We demonstrate real-time face tracking and pose estimation in an unconstrained office environment with an active foveated camera. Using vision routines previously implemented for an interactive environment, we determine the spatial location of a user's head and guide an active camera to obtain foveated images of the face. Faces are analyzed using a set of eigenspaces indexed over both pose and world location. Closed loop feedback from the estimated facial location is used to guide the camera when a face is present in the foveated view. Our system can detect the head pose of an unconstrained user in real-time as he or she moves about an open room.
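
A minimal sketch of pose-indexed eigenspaces as the abstract describes: build a PCA subspace per pose from training faces, then assign a new face the pose whose subspace reconstructs it with the lowest error. The subspace dimension is an assumption.

```python
import numpy as np

def build_eigenspace(X, k=10):
    """X: (n_samples, n_pixels) faces of one pose. Returns (mean, basis)."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k]                       # top-k principal directions

def estimate_pose(face, eigenspaces):
    """eigenspaces: dict pose -> (mean, basis); face: flat pixel vector."""
    errs = {}
    for pose, (mu, basis) in eigenspaces.items():
        coeffs = basis @ (face - mu)        # project into the subspace
        recon = mu + basis.T @ coeffs       # reconstruct from the subspace
        errs[pose] = np.linalg.norm(face - recon)
    return min(errs, key=errs.get)          # pose with lowest residual
```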

145 citations


Journal ArticleDOI
TL;DR: An effective automatic face location system is proposed that can locate the face region in a complex background, intended for use as a pre-processor of a practical face recognition system for security.

140 citations


Journal ArticleDOI
TL;DR: This work applies an appearance-based approach to the problem of context-specific gesture interpolation and recognition, and demonstrates real-time systems which perform these tasks.
Abstract: Hand and face gestures are modeled using an appearance-based approach in which patterns are represented as a vector of similarity scores to a set of view models defined in space and time. These view models are learned from examples using unsupervised clustering techniques. A supervised learning paradigm is then used to interpolate view scores into a task-dependent coordinate system appropriate for recognition and control tasks. We apply this analysis to the problem of context-specific gesture interpolation and recognition, and demonstrate real-time systems which perform these tasks.

131 citations


Proceedings ArticleDOI
18 Jun 1996
TL;DR: A new framework for recognizing planar object classes is presented, which is based on local feature detectors and a probabilistic model of the spatial arrangement of the features, and the allowed object deformations are represented through shape statistics, which are learned from examples.
Abstract: We present a new framework for recognizing planar object classes, which is based on local feature detectors and a probabilistic model of the spatial arrangement of the features. The allowed object deformations are represented through shape statistics, which are learned from examples. Instances of an object in an image are detected by finding the appropriate features in the correct spatial configuration. The algorithm is robust with respect to partial occlusion, detector false alarms, and missed features. A 94% success rate was achieved for the problem of locating quasi-frontal views of faces in cluttered scenes.
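
A minimal sketch of the learned shape statistics this abstract describes, under the simplifying assumption of a single Gaussian over stacked feature coordinates: a candidate arrangement of detected features is scored by its Mahalanobis distance to the model.

```python
import numpy as np

def learn_shape_model(configs):
    """configs: (n_examples, 2*n_features) stacked (x, y) coordinates."""
    mu = configs.mean(axis=0)
    cov = np.cov(configs, rowvar=False)
    # Regularize to keep the covariance invertible on small training sets.
    return mu, np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))

def shape_score(candidate, mu, cov_inv):
    """Lower score = more face-like spatial arrangement of the features."""
    d = candidate - mu
    return float(d @ cov_inv @ d)           # squared Mahalanobis distance
```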

Proceedings ArticleDOI
14 Oct 1996
TL;DR: This work uses a variation of the Gabor wavelet transform as a representation framework for investigating face pose measurement; dimensionality reduction using principal components analysis (PCA) enables pose changes to be visualised as manifolds in low-dimensional subspaces and provides a useful mechanism for investigating these changes.
Abstract: Visual perception of faces is invariant under many transformations, perhaps the most problematic of which is pose change (face rotating in depth). We use a variation of Gabor wavelet transform (GWT) as a representation framework for investigating face pose measurement. Dimensionality reduction using principal components analysis (PCA) enables pose changes to be visualised as manifolds in low-dimensional subspaces and provides a useful mechanism for investigating these changes. The effectiveness of measuring face pose with GWT representations was examined using PCA. We discuss our experimental results and draw a few preliminary conclusions.
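
A small sketch of the GWT + PCA pipeline described above (filter-bank parameters and the embedding dimension are assumptions): each face is filtered with a Gabor bank, the response magnitudes are stacked into one vector, and the vectors are projected onto their leading principal components so pose changes can be plotted as a manifold.

```python
import cv2
import numpy as np

def gabor_features(gray, n_orient=4, wavelengths=(4, 8)):
    """Stack response magnitudes of a small Gabor bank into one vector."""
    feats = []
    for lam in wavelengths:
        for k in range(n_orient):
            # (ksize, sigma, theta, lambda, gamma) -- values are assumptions
            kern = cv2.getGaborKernel((21, 21), lam / 2,
                                      np.pi * k / n_orient, lam, 0.5)
            resp = cv2.filter2D(gray.astype(np.float32), -1, kern)
            feats.append(np.abs(resp).ravel())
    return np.concatenate(feats)

def pca_embed(feature_matrix, dims=3):
    """Rows are per-face Gabor vectors; returns rows in PCA coordinates."""
    mu = feature_matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(feature_matrix - mu, full_matrices=False)
    return (feature_matrix - mu) @ vt[:dims].T
```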

Journal ArticleDOI
TL;DR: The proposed method, built on four back-propagation neural networks arranged in a hierarchical structure, successfully detected faces wearing glasses and all faces in images which contained multiple faces.

Patent
08 May 1996
TL;DR: In this article, a face detection system (100) is described that includes an imaging device, a computer having a pattern prototype synthesizer and an image classifier, and an output display device; face and non-face pattern prototypes are synthesized by a network training process using a number of example images.
Abstract: A face detection system (100) includes an imaging device, a computer having a pattern prototype synthesizer and an image classifier, and an output display device. Face and non-face pattern prototypes are synthesized by a network training process using a number of example images. A new applied image is analysed by taking a distance from the applied image to each of the prototypes, and a decision is made as to whether the applied image contains a face based on the distances thus obtained. The system and method may be applied to various other detection tasks.

Proceedings Article
14 Oct 1996
TL;DR: The essence of the system is that the motion tracker is able to focus attention for a face detection network whilst the latter is used to aid the tracking process.
Abstract: Robust tracking and segmentation of faces is a prerequisite for face analysis and recognition. In this paper we describe an approach to this problem which is well suited to surveillance applications with poorly constrained viewing conditions. It integrates motion-based tracking with model based face detection to produce segmented face sequences from complex scenes containing several people. The motion of moving image contours was estimated using temporal convolution and a temporally consistent list of moving objects was maintained. Objects were tracked using Kalman filters. Faces were detected using a neural network. The essence of the system is that the motion tracker is able to focus attention for a face detection network whilst the latter is used to aid the tracking process.
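
For reference, a minimal constant-velocity Kalman filter of the kind used for the object tracking described above (state = [x, y, vx, vy], position-only measurements); the noise levels are placeholder assumptions.

```python
import numpy as np

class Track:
    def __init__(self, x, y):
        self.s = np.array([x, y, 0.0, 0.0])        # state estimate
        self.P = np.eye(4) * 10.0                  # state covariance
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0          # constant-velocity dynamics
        self.H = np.eye(2, 4)                      # we observe position only
        self.Q = np.eye(4) * 0.01                  # process noise (assumed)
        self.R = np.eye(2) * 1.0                   # measurement noise (assumed)

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]                          # predicted position

    def update(self, z):
        y = np.asarray(z) - self.H @ self.s        # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```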

Proceedings ArticleDOI
14 Oct 1996
TL;DR: A face detection framework is proposed that groups image features into meaningful entities using perceptual organization, assigns probabilities to each of them, and reinforces these probabilities using Bayesian reasoning techniques.
Abstract: Present approaches to human face detection have made several assumptions that restrict their ability to be extended to general imaging conditions. We identify that the key factor in a generic and robust system is that of exploiting a large amount of evidence, related and reinforced by model knowledge through a probabilistic framework. In this paper, we propose a face detection framework that groups image features into meaningful entities using perceptual organization, assigns probabilities to each of them, and reinforces these probabilities using Bayesian reasoning techniques. True hypotheses of faces will be reinforced to a high probability. The detection of faces under scale, orientation and viewpoint variations will be examined in a subsequent paper.
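
A tiny illustration of the Bayesian reinforcement this abstract describes: each piece of grouped evidence multiplies the odds of the face hypothesis by its likelihood ratio. The ratios below are made-up examples, not values from the paper.

```python
def reinforce(prior, likelihood_ratios):
    """prior: P(face); likelihood_ratios: P(e|face)/P(e|not face) per cue."""
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr                  # each cue reinforces (or weakens) the odds
    return odds / (1.0 + odds)      # back to a probability

# e.g. a skin-colored, oval, eye-bearing region (illustrative ratios):
p = reinforce(0.01, [5.0, 3.0, 8.0])   # -> about 0.55
```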

Dissertation
03 Oct 1996
TL;DR: The view-based face recognizer, techniques for synthesizing virtual views, and experimental results using real and virtual views in the recognizer are presented.
Abstract: The problem of automatic face recognition is to visually identify a person in an input image. This task is performed by matching the input face against the faces of known people in a database of faces. Most existing work in face recognition has limited the scope of the problem, however, by dealing primarily with frontal views, neutral expressions, and fixed lighting conditions. To help generalize existing face recognition systems, we are looking at the problem of recognizing faces under a range of viewpoints. In particular, we consider two cases of this problem: (i) many example views are available of each person, and (ii) only one view is available per person, perhaps a driver's license or passport photograph. Ideally, we would like to address these two cases using a simple view-based approach, where a person is represented in the database by using a number of views on the viewing sphere. While the view-based approach is consistent with case (i), for case (ii) we need to augment the single real view of each person with synthetic views from other viewpoints, views we call "virtual views". Virtual views are generated using prior knowledge of face rotation, knowledge that is "learned" from images of prototype faces. This prior knowledge is used to effectively rotate in depth the single real view available of each person. In this thesis, I present the view-based face recognizer, techniques for synthesizing virtual views, and experimental results using real and virtual views in the recognizer. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

Proceedings ArticleDOI
14 Oct 1996
TL;DR: The authors introduce a method, based on steerable filters, for the automatic detection of facial features and characteristic anatomical keypoints; it integrates model knowledge to guarantee a consistent interpretation of the abundance of local features.
Abstract: The authors introduce a method for the automatic detection of facial features and characteristic anatomical keypoints. In the application they are aiming at, the anatomical landmarks are used to accurately measure facial features. Their approach is essentially based on a selective search and sequential tracking of characteristic edge and line structures of the facial object to be searched. It integrates model knowledge to guarantee a consistent interpretation of the abundance of local features. The search and the tracking are controlled in each step by interpreting the already derived edge and line information in the context of the whole considered region. For their application, the edge and line detection has to be very precise and flexible. Therefore, they apply a powerful filtering scheme based on steerable filters.
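
A minimal sketch of the steerable-filter property the abstract relies on: the response of a first-derivative-of-Gaussian filter at any orientation is a linear combination of just two basis responses, so edge energy can be "steered" to an orientation of interest without re-filtering.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def steered_edge_response(gray, theta, sigma=2.0):
    """Edge response of a first-derivative Gaussian steered to angle theta."""
    img = gray.astype(float)
    gx = gaussian_filter(img, sigma, order=(0, 1))   # d/dx basis response
    gy = gaussian_filter(img, sigma, order=(1, 0))   # d/dy basis response
    return np.cos(theta) * gx + np.sin(theta) * gy   # steered combination
```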

Proceedings ArticleDOI
14 Oct 1996
TL;DR: A novel algorithm for face detection using decision trees (DT) is proposed, and its generality and feasibility are shown using a database consisting of 2340 face images from the FERET database over a semi-uniform background.
Abstract: The paper proposes a novel algorithm for face detection using decision trees (DT) and shows its generality and feasibility using a database consisting of 2340 face images from the FERET database (corresponding to 817 subjects and including 190 sets of duplicates) over a semi-uniform background. The approach used for face detection involves three main stages: location, cropping, and post-processing. The first stage finds a rough approximation for the possible location of the face box, the second stage refines it, and the last stage decides whether a face is present in the image and, if so, normalizes the face image. The algorithm does not require multiple (scale) templates, and the accuracy achieved is 96%. Accuracy is based on the visual observation that the face box includes both eyes, nose, and mouth, and that the top side of the box is below the hairline. Experiments were also performed to assess the accuracy of the algorithm in rejecting images where no face is present. Using a small database of 25 images with various complex backgrounds, the algorithm failed on two images, for an overall accuracy rate of 92%.

Journal ArticleDOI
TL;DR: The artificial neural network group-based adaptive tolerance (GAT) tree model for translation-invariant face recognition, suitable for use in an airport security system, is introduced.
Abstract: Recent artificial neural network research has focused on simple models, but such models have not been very successful in describing complex systems (such as face recognition). This paper introduces the artificial neural network group-based adaptive tolerance (GAT) tree model for translation-invariant face recognition, suitable for use in an airport security system. GAT trees use a two-stage divide-and-conquer tree-type approach. The first stage determines general properties of the input, such as whether the facial image contains glasses or a beard. The second stage identifies the individual. Face perception classification, detection of front faces with glasses and/or beards, and face recognition results using GAT trees under laboratory conditions are presented. We conclude that the neural network group-based model offers significant improvement over conventional neural network trees for this task.

Proceedings ArticleDOI
Eli Saber, A. M. Tekalp
25 Aug 1996
TL;DR: An algorithm for detecting human faces and subsequently localizing the eyes, nose, and mouth is described and symmetry-based cost functions are introduced to take advantage of the inherent symmetries associated with facial patterns.
Abstract: This paper describes an algorithm for detecting human faces and subsequently localizing the eyes, nose, and mouth. First, we locate the face based on color and shape information. To this effect, a supervised pixel-based color classifier is used to mark all pixels which are within a prespecified distance of "skin color". This color-classification map is then subjected to smoothing, employing either morphological operations or filtering using a Gibbs random field model. The eigenvalues and eigenvectors computed from the spatial covariance matrix are utilized to fit an ellipse to the skin region under analysis. The Hausdorff distance is employed as a means for comparison, yielding a measure of proximity between the shape of the region and the ellipse model. Then, we introduce symmetry-based cost functions to locate the center of the eyes, tip of nose, and center of mouth within the facial segmentation mask. The cost functions are designed to take advantage of the inherent symmetries associated with facial patterns. We demonstrate the performance of our algorithm on a variety of images.
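
A minimal sketch of the covariance-based ellipse fit described above: the eigenvectors of the spatial covariance of the skin pixels give the ellipse axes, and the square roots of the eigenvalues give its extent (the 2-sigma scaling is an assumption).

```python
import numpy as np

def ellipse_from_mask(mask, n_std=2.0):
    """mask: boolean HxW skin map. Returns (center, half-axes, axis vectors)."""
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(float)
    center = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)              # spatial covariance matrix
    evals, evecs = np.linalg.eigh(cov)           # ascending eigenvalues
    axes = n_std * np.sqrt(evals)                # minor, major half-lengths
    return center, axes, evecs
```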

Proceedings ArticleDOI
01 Jan 1996
TL;DR: A dynamic face tracking system based on an integrated motion-based object tracking and model-based face detection framework that produces segmented face sequences from complex scenes with poor viewing conditions in surveillance applications is described.
Abstract: We describe a dynamic face tracking system based on an integrated motion-based object tracking and model-based face detection framework. The motion-based tracker focuses attention for the face detector whilst the latter aids the tracking process. The system produces segmented face sequences from complex scenes with poor viewing conditions in surveillance applications. We also investigate a Gabor wavelet transform as a representation scheme for capturing head rotations in depth. Principal components analysis was used to visualise the manifolds described by pose changes. Qualitative results are given.

Proceedings ArticleDOI
14 Oct 1996
TL;DR: The proposed system works hierarchically, from detecting the position of a human face and its features to extracting contours and feature points, and is confirmed to be very effective and robust for images of faces against complex backgrounds.
Abstract: This paper presents automatic processing of human faces from color images. The system works hierarchically, from detecting the position of the human face and its features (such as eyes, nose, mouth, etc.) to extracting contours and feature points. The position of the human face and its parts are detected from the image by applying the integral projection method, which synthesizes the color information (skin and hair color) and the edge information (intensity and sign). In order to extract the contour lines of face features, we used a multiple active contour model with energy terms based on color information. Facial feature points are determined from the optimized contours. The proposed system is confirmed to be very effective and robust in dealing with images of faces against complex backgrounds.
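
A minimal sketch of the integral projection idea mentioned above: summing a feature map along rows and columns gives two 1-D profiles whose peaks and valleys localize feature bands. The eye-band heuristic below is an illustrative assumption.

```python
import numpy as np

def integral_projections(feature_map):
    """Returns (vertical profile over rows, horizontal profile over cols)."""
    v = feature_map.sum(axis=1)   # one value per row
    h = feature_map.sum(axis=0)   # one value per column
    return v, h

def eye_row(gray_face):
    """Row index of the darkest band in the upper half of a face crop,
    where the eyes typically lie (heuristic, not the paper's exact rule)."""
    v, _ = integral_projections(255.0 - gray_face.astype(float))
    upper = v[: len(v) // 2]
    return int(np.argmax(upper))
```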

Proceedings ArticleDOI
25 Aug 1996
TL;DR: This paper models the face detection problem using information theory and formulates information-based measures for detecting faces by maximizing the feature class separation; the algorithm is empirically compared using multiple test sets.
Abstract: Face detection in complex environments is an unsolved problem which has fundamental importance to face recognition, model based video coding, content based image retrieval, and human computer interaction. In this paper we model the face detection problem using information theory, and formulate information based measures for detecting faces by maximizing the feature class separation. The underlying principle is that search through an image can be viewed as a reduction of uncertainty in the classification of the image. The face detection algorithm is empirically compared using multiple test sets, which include four face databases from three universities.
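
One information-based separation measure in the spirit of this abstract (the specific choice of the Kullback-Leibler divergence between class histograms is an assumption, not necessarily the paper's formulation):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """KL divergence between two (unnormalized) histograms."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def feature_separation(face_vals, nonface_vals, bins=32):
    """Score how well a scalar feature separates face and non-face samples."""
    lo = min(face_vals.min(), nonface_vals.min())
    hi = max(face_vals.max(), nonface_vals.max())
    p, _ = np.histogram(face_vals, bins=bins, range=(lo, hi))
    q, _ = np.histogram(nonface_vals, bins=bins, range=(lo, hi))
    return kl_divergence(p.astype(float), q.astype(float))
```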

Proceedings ArticleDOI
16 Sep 1996
TL;DR: This work proposes an approach to the automatic construction of 3D human face models using a generic face model and several 2D face images and develops a template matching based algorithm to automatically extract all necessary facial features from the front and side profile face images.
Abstract: In order to achieve low bit-rate video coding, model-based coding systems have attracted great interest in visual telecommunications, e.g., videophone and teleconferencing, where human faces are the major part of the scene. The main idea of this approach is to construct a 3D model of the human face. Only the moving parts of the face are analyzed and the motion parameters are transmitted; the original facial expressions can then be synthesized by deforming the face model using the facial motion parameters. We propose an approach to the automatic construction of 3D human face models using a generic face model and several 2D face images. A template-matching-based algorithm is developed to automatically extract all necessary facial features from the front and side profile face images. Then the generic face model is fitted to these feature points by geometric transforms. Finally, texture mapping is performed to achieve realistic results.

Proceedings ArticleDOI
14 Oct 1996
TL;DR: This work has implemented an interface that tracks a person's facial features in real time (30 Hz) and can recognise a large set of gestures ranging from "yes", "no" and "maybe" to winks, blinks and sleeping.
Abstract: People naturally express themselves through facial gestures and expressions. Our goal is to build a facial gesture human-computer interface for use in robot applications. We have implemented an interface that tracks a person's facial features in real time (30 Hz). Our system requires neither special illumination nor facial makeup. By using multiple Kalman filters we accurately predict and robustly track facial features, despite disturbances and rapid movements of the head (including both translational and rotational motion). Since we reliably track the face in real time, we are also able to recognise motion gestures of the face. Our system can recognise a large set of gestures (13), ranging from "yes", "no" and "maybe" to winks, blinks and sleeping.

Proceedings ArticleDOI
14 Oct 1996
TL;DR: A method of locating hypotheses for the positions of faces in an image by using statistical feature detectors to locate candidates for features, then using a statistical model of the shape and orientation of the features to test combinations of such features to find the most plausible.
Abstract: We describe a method of locating hypotheses for the positions of faces in an image. We use statistical feature detectors to locate candidates for features, then use a statistical model of the shape and orientation of the features to test combinations of such features to find the most plausible. The best sets can be used as the initial position of an Active Shape Model, which can then accurately locate the full face.

Proceedings ArticleDOI
25 Aug 1996
TL;DR: Experimental results demonstrate that the proposed scheme can efficiently detect human facial features and is ideal for dealing with the problems caused by bad lighting conditions, skewed face orientation, and even facial expressions.
Abstract: Most of the conventional approaches for facial feature detection use template matching and correlation techniques. These kinds of approaches are very time-consuming and therefore impractical in real-time systems. In this paper, we propose a useful geometrical face model and an efficient facial feature detection scheme. Based on the fact that human faces are constructed in the same geometrical configuration, the proposed scheme can accurately detect facial features, especially the eyes, even when the images have complex backgrounds. Experimental results demonstrate that the proposed scheme can efficiently detect human facial features and is ideal for dealing with the problems caused by bad lighting conditions, skewed face orientation, and even facial expressions.

Proceedings ArticleDOI
E. Petajan, Hans Peter Graf
14 Oct 1996
TL;DR: A face feature acquisition system with robust performance in the presence of extreme lighting variations and moderate variations in pose, which can be used as the basis for visual speech features in an automatic speechreading system.
Abstract: The robust acquisition of facial features needed for visual speech processing is fraught with difficulties which greatly increase the complexity of the machine vision system. This system must extract the inner lip contour from facial images with variations in pose, lighting, and facial hair. This paper describes a face feature acquisition system with robust performance in the presence of extreme lighting variations and moderate variations in pose. Furthermore, system performance is not degraded by facial hair or glasses. To find the position of a face reliably we search the whole image for facial features. These features are then combined and tests are applied, to determine whether any such combination actually belongs to a face. In order to find where the lips are, other features of the face, such as the eyes, must be located as well. Without this information it is difficult to reliably find the mouth in a complex image. Just the mouth by itself is easily missed or other elements in the image can be mistaken for a mouth. If camera position can be constrained to allow the nostrils to be viewed, then nostril tracking is used to both reduce computation and provide additional robustness. Once the nostrils are tracked from frame to frame using a tracking window the mouth area can be isolated and normalized for scale and rotation. A mouth detail analysis procedure is then used to estimate the inner lip contour and teeth and tongue regions. The inner lip contour and head movements are then mapped to synthetic face parameters to generate a graphical talking head synchronized with the original human voice. This information can also be used as the basis for visual speech features in an automatic speechreading system. Similar features were used in our previous automatic speechreading systems.

Proceedings ArticleDOI
14 Oct 1996
TL;DR: This work describes how the cooperation of image and audio processing makes it possible to track a person's face and collect the audio information it produces; detection of regions of interest is coupled with a neural-network-based face detector with a low false-alarm rate to locate and track faces.
Abstract: Both visual and acoustic information provide effective means of telecommunication between persons. In this context, the face is the most important part of the person, both visually and acoustically. We describe how the cooperation of image and audio processing makes it possible to track a person's face and to collect the audio information it produces. We present techniques for detecting regions of interest (e.g., moving regions of skin color), coupled with a neural-network-based face detector with a low false-alarm rate, to locate and track faces. The system is connected to a nine-microphone array that performs adaptive beamforming in real time. Visual and acoustic information from the speaker's face is thus obtained in real time.
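
A minimal sketch of delay-and-sum beamforming of the kind such a microphone array performs: each channel is delayed so that sound from the target direction aligns across microphones, then the channels are averaged. The uniform-linear-array geometry, spacing, and sample rate are assumptions.

```python
import numpy as np

def delay_and_sum(channels, angle, spacing=0.05, fs=16000, c=343.0):
    """channels: (n_mics, n_samples) array; angle: target direction in
    radians from broadside. Integer-sample delays keep the sketch simple."""
    n_mics, n = channels.shape
    out = np.zeros(n)
    for m in range(n_mics):
        # Geometric delay for mic m relative to mic 0, in samples.
        delay = int(round(m * spacing * np.sin(angle) / c * fs))
        out += np.roll(channels[m], -delay)   # align, then accumulate
    return out / n_mics                       # average the aligned channels
```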