
Showing papers on "Object-class detection published in 1998"


Journal ArticleDOI
TL;DR: A neural network-based upright frontal face detection system that arbitrates between multiple networks to improve performance over a single network, and a straightforward procedure for aligning positive face examples for training.
Abstract: We present a neural network-based upright frontal face detection system. A retinally connected neural network examines small windows of an image and decides whether each window contains a face. The system arbitrates between multiple networks to improve performance over a single network. We present a straightforward procedure for aligning positive face examples for training. To collect negative examples, we use a bootstrap algorithm, which adds false detections into the training set as training progresses. This eliminates the difficult task of manually selecting nonface training examples, which must be chosen to span the entire space of nonface images. Simple heuristics, such as using the fact that faces rarely overlap in images, can further improve the accuracy. Comparisons with several other state-of-the-art face detection systems are presented, showing that our system has comparable performance in terms of detection and false-positive rates.

4,105 citations
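The bootstrap idea in this abstract (feeding false detections back into the negative training set) can be sketched with a toy stand-in classifier. The threshold-on-mean-intensity "model" below is purely illustrative, not the paper's neural network:

```python
import numpy as np

def train(pos, neg):
    # Toy stand-in for training: a threshold on mean window intensity,
    # placed halfway between the two class averages.
    return (np.mean([w.mean() for w in pos]) +
            np.mean([w.mean() for w in neg])) / 2.0

def is_face(window, model):
    return window.mean() > model

def bootstrap_negatives(pos, scenery, rounds=5):
    """Bootstrap loop: retrain, scan face-free scenery images, and add
    every false detection to the negative set until none remain."""
    neg = [scenery[0]]                 # seed with one arbitrary non-face
    model = train(pos, neg)
    for _ in range(rounds):
        false_hits = [w for w in scenery if is_face(w, model)]
        if not false_hits:
            break
        neg.extend(false_hits)         # mined hard negatives
        model = train(pos, neg)
    return model, neg
```

Each round mines exactly the hard negatives the current model still accepts, which is why manual selection of non-face examples becomes unnecessary.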


Proceedings ArticleDOI
04 Jan 1998
TL;DR: A general trainable framework for object detection in static images of cluttered scenes based on a wavelet representation of an object class derived from a statistical analysis of the class instances and a motion-based extension to enhance the performance of the detection algorithm over video sequences is presented.
Abstract: This paper presents a general trainable framework for object detection in static images of cluttered scenes. The detection technique we develop is based on a wavelet representation of an object class derived from a statistical analysis of the class instances. By learning an object class in terms of a subset of an overcomplete dictionary of wavelet basis functions, we derive a compact representation of an object class which is used as an input to a support vector machine classifier. This representation both overcomes the problem of in-class variability and provides a low false detection rate in unconstrained environments. We demonstrate the capabilities of the technique in two domains whose inherent information content differs significantly. The first system is face detection and the second is the domain of people, which, in contrast to faces, vary greatly in color, texture, and patterns. Unlike previous approaches, this system learns from examples and does not rely on any a priori (hand-crafted) models or motion-based segmentation. The paper also presents a motion-based extension to enhance the performance of the detection algorithm over video sequences. The results presented here suggest that this architecture may well be quite general.

1,594 citations
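The feature side of this pipeline can be illustrated with simple 2×2 Haar-style wavelet responses, a crude stand-in for the paper's overcomplete wavelet dictionary; in the full system such a feature vector would feed a support vector machine:

```python
import numpy as np

def haar_features(img):
    """Overcomplete 2x2 Haar-wavelet responses (horizontal, vertical,
    diagonal) at every pixel position of a grayscale image."""
    a = img[:-1, :-1]; b = img[:-1, 1:]
    c = img[1:, :-1];  d = img[1:, 1:]
    horiz = (a + b) - (c + d)   # responds to horizontal edges
    vert  = (a + c) - (b + d)   # responds to vertical edges
    diag  = (a + d) - (b + c)   # responds to diagonal structure
    return np.concatenate([horiz.ravel(), vert.ravel(), diag.ravel()])
```

A flat image yields an all-zero vector; a vertical step edge excites only the vertical-response coefficients.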


Proceedings ArticleDOI
23 Jun 1998
TL;DR: An algorithm for object recognition that explicitly models and estimates the posterior probability function P(object | image) in closed form is described; it captures the joint statistics of local appearance and position on the object as well as the statistics of local appearance in the visual world at large.

Abstract: In this paper, we describe an algorithm for object recognition that explicitly models and estimates the posterior probability function, P(object | image). We have chosen a functional form of the posterior probability function that captures the joint statistics of local appearance and position on the object as well as the statistics of local appearance in the visual world at large. We use a discrete representation of local appearance consisting of approximately 10^6 patterns. We compute an estimate of P(object | image) in closed form by counting the frequency of occurrence of these patterns over various sets of training images. We have used this method for detecting human faces from frontal and profile views. The algorithm for frontal views has shown a detection rate of 93.0% with 88 false alarms on a set of 125 images containing 483 faces, combining the MIT test set of Sung and Poggio with the CMU test sets of Rowley, Baluja, and Kanade. The algorithm for detection of profile views has also demonstrated promising results.

435 citations
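The closed-form estimate by frequency counting can be illustrated with a drastically reduced pattern alphabet (the paper uses roughly 10^6 patterns; the quantizer below is a toy). The classifier sums log-likelihood ratios of quantized local patterns:

```python
import numpy as np
from collections import Counter

def quantize(patch, levels=4):
    # Map a small patch to a discrete pattern id (toy stand-in for the
    # paper's large quantized appearance alphabet).
    q = np.clip((patch * levels).astype(int), 0, levels - 1)
    return tuple(q.ravel())

def learn_frequencies(images, patch=2):
    """Count pattern occurrences over a set of training images."""
    counts, total = Counter(), 0
    for img in images:
        for i in range(img.shape[0] - patch + 1):
            for j in range(img.shape[1] - patch + 1):
                counts[quantize(img[i:i+patch, j:j+patch])] += 1
                total += 1
    return counts, total

def log_likelihood_ratio(img, obj_freq, obj_n, bg_freq, bg_n,
                         patch=2, eps=1.0):
    """Sum of log P(pattern|object)/P(pattern|background) over all
    patches; a positive score favours the object class."""
    score = 0.0
    for i in range(img.shape[0] - patch + 1):
        for j in range(img.shape[1] - patch + 1):
            p = quantize(img[i:i+patch, j:j+patch])
            score += np.log((obj_freq[p] + eps) / (obj_n + eps)) \
                   - np.log((bg_freq[p] + eps) / (bg_n + eps))
    return score
```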


01 Jan 1998
TL;DR: A trainable object detection system that automatically learns to detect objects of a certain class in unconstrained scenes and learns the pedestrian model from examples and uses no motion cues.
Abstract: In the near future, we can expect on-board automotive vision systems that inform or alert the driver about pedestrians, track surrounding vehicles, and read street signs. Object detection is fundamental to the success of this type of next-generation vision system. In this paper, we present a trainable object detection system that automatically learns to detect objects of a certain class in unconstrained scenes. We apply our system to the task of pedestrian detection. Unlike previous approaches to pedestrian detection that rely heavily on hand-crafted models and motion information, our system learns the pedestrian model from examples and uses no motion cues. The system can easily be extended to include motion information. We review our previous system, describe a new system that exhibits significantly better performance, provide a comparison between using different combinations of feature sets with classifiers of varying complexity, and describe improvements that increase the system's processing speed by two orders of magnitude.

220 citations


Patent
Shang-Hung Lin1
20 Feb 1998
TL;DR: In this paper, a face recognition system is provided comprising an input process or circuit, such as a video camera for generating an image of a person, and a face detector process, which determines if a face is present in an image.
Abstract: A face recognition system is provided comprising an input process or circuit, such as a video camera for generating an image of a person. A face detector process or circuit determines if a face is present in an image. A face position registration process or circuit determines a position of the face in the image if the face detector process or circuit determines that the face is present. A feature extractor process or circuit is provided for extracting at least two facial features from the face. A voting process or circuit compares the extracted facial features with a database of extracted facial features to identify the face.

150 citations


Proceedings ArticleDOI
Kazunori Onoguchi1
16 Aug 1998
TL;DR: This paper presents a new method for eliminating shadow areas accompanying pedestrian-like moving objects in visual surveillance systems using height information, and shows the effectiveness of the proposed method on a pedestrian crossing.
Abstract: This paper presents a new method for eliminating shadow areas accompanying pedestrian-like moving objects in visual surveillance systems. The proposed method removes the shadow areas using height information, since most of the shadow areas accompanying moving objects are assumed to lie on the road plane. Two cameras are set at arbitrary locations so that their common visual field, which includes the surveillance area, can be used for this purpose. The image obtained from one of the cameras is inversely projected to the road plane, and the projected image on the road plane is transformed to the view from the other camera. Shadows existing on the road plane occupy the same areas in the transformed image and in the image acquired from the other camera, whereas object areas with height above the road plane occupy different areas in these images. Therefore, shadow areas can be removed by a subtraction between these images. An experimental result on a pedestrian crossing showed the effectiveness of the proposed method.

80 citations
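Assuming both views have already been registered to the road plane by the inverse projection described above, the shadow/object split reduces to a per-pixel agreement test within the detected moving region. A hypothetical numpy sketch (array names and tolerance are illustrative):

```python
import numpy as np

def split_shadow(warped_cam1, cam2, moving_mask, tol=0.05):
    """Pixels where the two plane-registered views agree lie on the
    road plane (shadow); disagreement implies height above the plane
    (object). `moving_mask` marks the detected moving region."""
    agree = np.abs(warped_cam1 - cam2) < tol
    return moving_mask & agree, moving_mask & ~agree
```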


Proceedings ArticleDOI
14 Apr 1998
TL;DR: A scale invariant face detection method which combines higher-order local autocorrelation (HLAC) features extracted from a log-polar transformed image with linear discriminant analysis for "face" and "not face" classification is proposed.
Abstract: This paper proposes a scale invariant face detection method which combines higher-order local autocorrelation (HLAC) features extracted from a log-polar transformed image with linear discriminant analysis for "face" and "not face" classification. Since HLAC features of log-polar images are sensitive to shifts of a face, we utilize this property and develop a face detection method. HLAC features extracted from a log-polar image become scale and rotation invariant because scalings and rotations of a face are expressed as shifts in a log-polar image (coordinate). By combining these features with the linear discriminant analysis which is extended to treat "face" and "not face" classes, a scale invariant face detection system can be realized.

77 citations
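The key property here (scalings and rotations of a face becoming shifts in log-polar coordinates) is easy to demonstrate with a nearest-neighbour log-polar resampler. This is an illustrative sketch, not the paper's implementation:

```python
import numpy as np

def log_polar(img, n_r=32, n_theta=32):
    """Nearest-neighbour log-polar resampling about the image centre.
    A scaling of the input becomes a shift along the log-r axis; a
    rotation becomes a shift along the theta axis."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = np.hypot(cy, cx)
    rs = np.exp(np.linspace(0, np.log(r_max), n_r))      # log-spaced radii
    ts = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    out = np.zeros((n_r, n_theta))
    for i, r in enumerate(rs):
        for j, t in enumerate(ts):
            y = int(round(cy + r * np.sin(t)))
            x = int(round(cx + r * np.cos(t)))
            if 0 <= y < h and 0 <= x < w:
                out[i, j] = img[y, x]
    return out
```

Rows of the output correspond to fixed (log) radii, so a radially symmetric pattern produces constant rows regardless of its orientation.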


Proceedings ArticleDOI
14 Apr 1998
TL;DR: A method which utilizes color, local symmetry and geometry information of the human face, based on various models, to identify faces under certain variations; its validity and robustness have been demonstrated.
Abstract: A robust approach to face and facial feature detection must be able to handle variation issues such as changes in imaging conditions, face appearance and image content. Here we present a method which utilizes color, local symmetry and geometry information of the human face based on various models. The algorithm first detects the most likely face regions, or ROIs (Regions-Of-Interest), from the image using a face color model and a face outline model, producing a face color similarity map. Then it performs local symmetry detection within these ROIs to obtain a local symmetry similarity map. These two maps are fused to obtain potential facial feature points. Finally, similarity matching is performed between the fusion map and a face geometry model under affine transformation to identify faces. The output results are the detected faces with confidence values. Experimental results have demonstrated the method's validity and robustness in identifying faces under certain variations.

67 citations


Proceedings ArticleDOI
26 Oct 1998
TL;DR: This work demonstrates an image space collision detection process that allows substantial computational savings during the image space interference test and makes efficient use of the graphics rendering hardware for real time complex object interactions.
Abstract: Object interactions are ubiquitous in interactive computer graphics, 3D object motion simulations, virtual reality and robotics applications. Most collision detection algorithms are based on geometrical object space interference tests. Some algorithms have employed an image space approach to the collision detection problem. We demonstrate an image space collision detection process that allows substantial computational savings during the image space interference test. This approach makes efficient use of the graphics rendering hardware for real time complex object interactions.

62 citations
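One common form of image-space interference test compares per-pixel depth intervals read back from the depth buffer: a collision is witnessed wherever one object's depth falls inside the front/back depth interval of the other. The sketch below assumes such depth maps are already available as arrays (the abstract does not specify this exact formulation):

```python
import numpy as np

def depth_interval_collision(front_a, back_a, depth_b):
    """Per-pixel interference test: a pixel witnesses a collision when
    B's depth lies inside A's [front, back] interval. NaN marks pixels
    an object does not cover."""
    covered = ~np.isnan(front_a) & ~np.isnan(depth_b)
    hit = covered & (depth_b >= front_a) & (depth_b <= back_a)
    return bool(np.any(hit))
```

On graphics hardware the same comparison is done by rendering one object against the depth buffer of the other, which is what makes the test cheap for complex geometry.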


Patent
18 Aug 1998
TL;DR: In this article, a foreign object video detection system consisting of a television camera for producing a digital color image of a work surface, a converter having direct memory access to a computer, color detection and color image processing software, logic for discriminating objects deemed to be aforeign object on the work surface or upon a layer of material on the surface, and input controls for selecting the area of interest for detection and for optimizing the detection technique based upon the manufacturing situation.
Abstract: A foreign object video detection system comprises a television camera for producing a digital color image of a work surface, a converter having direct memory access to a computer, color detection and color image processing software, logic for discriminating objects deemed to be a foreign object on the work surface or upon a layer of material on the work surface, logic for providing appropriate warning to the operator, and input controls for selecting the area of interest for detection and for optimizing the detection technique based upon the manufacturing situation.

55 citations


Proceedings ArticleDOI
16 Aug 1998
TL;DR: A scheme for robust face and eyes detection from an image using the Gaussian steerable filter to search and detect the facial feature (preattentive feature) roughly in an image.
Abstract: Automatic face location in complex scenes is extremely challenging in human face recognition systems. Furthermore, facial feature detection also plays an important role. The paper presents a scheme for robust face and eye detection from an image. The scheme uses the Gaussian steerable filter to roughly search for and detect the facial features (preattentive features) in an image. The face model is investigated to locate the whole face and facial features, such as eyes, nose and mouth. Here, multiple evidences are used in face location and eye detection. One important feature is the structural information of the face, i.e. facial components of certain structure. The other is the symmetry property of the face; here only the frontal face with certain pose variation is considered, which reduces the computation greatly. For facial component detection, some image features and PCA features are used to verify the candidates detected before. Experiments show that the algorithm is robust and fast.

01 Jan 1998
TL;DR: The main task during the internship was to study and implement a neural-network based face detection algorithm for general scenes, which has previously been developed within the IDIAP Computer Vision group, and to deploy a single neural network for face detection running in a sequential manner on a standard workstation.
Abstract: Computerized human face processing (detection, recognition, synthesis) has seen intense research activity during the last few years. Applications involving human face recognition are very broad, with important commercial impact. Human face processing is a difficult and challenging task: the space of different facial patterns is huge. The variability of human faces as well as their similarity, and the influence of other features like beards, glasses, hair, illumination, background etc., make face recognition or face detection difficult to tackle. The main task during the internship was to study and implement a neural-network based face detection algorithm for general scenes, which had previously been developed within the IDIAP Computer Vision group. It also included the study and design of a multi-scale face detection method. A face database and a camera were available for tests and benchmarking. The main constraint was to have a real-time or almost real-time face detection system. This has been achieved. The face detection capability of the employed neural networks was demonstrated on a variety of still images. In addition, we introduced an efficient preprocessing stage and a new post-processing strategy that significantly reduces false detections. This allowed us to deploy a single neural network for face detection running in a sequential manner on a standard workstation.

Proceedings ArticleDOI
04 Oct 1998
TL;DR: An algorithm for detecting wipes using both structural and statistical information is proposed which can effectively detect most wipes used in current TV programs and can be easily extracted from the MPEG stream without full decompression.
Abstract: The detection of transitions between shots in video programs is an important first step in analyzing video content. The wipe is a frequently used transitional form between shots. Wipe detection is more involved than the detection of abrupt and other gradual transitions because a wipe may take various patterns and because of the difficulty in discriminating a wipe from object and camera motion. In this paper, we propose an algorithm for detecting wipes using both structural and statistical information. The algorithm can effectively detect most wipes used in current TV programs. It uses the DC sequence which can be easily extracted from the MPEG stream without full decompression.

Proceedings ArticleDOI
14 Apr 1998
TL;DR: This work proposes a human motion detection method using multiple-viewpoint images, dividing the task into three primitive sub-tasks (position detection, rotation angle detection and body side detection) and addresses the problem of tracking human bodies.
Abstract: We propose a human motion detection method using multiple-viewpoint images. In vision-based human tracking, self-occlusions and human-human occlusions are among the more significant problems. Employing multiple viewpoints and a viewpoint selection mechanism, however, can reduce these problems. The vision system in this case should select the best viewpoints for extracting human motion information; the "best" selections can change among different types of target information. We address the problem of tracking human bodies. We divide the task into three primitive sub-tasks (position detection, rotation angle detection and body side detection). Each sub-task has a different criterion for selecting viewpoints, and the estimation result of one sub-task can help another sub-task. We describe the criteria for accomplishing the individual sub-tasks and the relationships between sub-tasks. We have built an experimental system based on a small number of reliable image features and performed fundamental examinations of the viewpoint selection approach.

Journal ArticleDOI
TL;DR: A real-time method for object detection and tracking in outdoor environments where illumination can be very low and not constant is presented and an extended Kalman filter is applied to track multiple objects entering the scene.
Abstract: A real-time method for object detection and tracking in outdoor environments where illumination can be very low and not constant is presented. A hierarchical (two-level) change detection method is employed to detect moving objects in the scene. At the first level, a focus-of-attention stage is applied to individuate image areas containing moving objects; at the second level, each selected image area is inspected at higher accuracy to improve the detection probability and to obtain an accurate binary reconstruction of the object shape. A background updating procedure is used to adapt the background image to the changes in the scene. Then, an extended Kalman filter is applied to track multiple objects entering the scene. Results are reported on real scenarios in the presence of shadows, occlusions, light reflections, and bad environmental conditions.
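The two-level scheme can be sketched as a cheap block-mean attention test followed by per-pixel thresholding inside the selected blocks only (block size and thresholds below are arbitrary illustrative values, and the real system also updates the background over time):

```python
import numpy as np

def detect_changes(frame, background, block=4, coarse_thr=0.1, fine_thr=0.05):
    """Two-level change detection: a block-mean test selects candidate
    regions (focus of attention); only those blocks are thresholded per
    pixel to recover the object silhouette."""
    diff = np.abs(frame - background)
    h, w = frame.shape
    mask = np.zeros_like(frame, dtype=bool)
    for i in range(0, h, block):
        for j in range(0, w, block):
            blk = diff[i:i+block, j:j+block]
            if blk.mean() > coarse_thr:                  # level 1: attention
                mask[i:i+block, j:j+block] = blk > fine_thr  # level 2: shape
    return mask
```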

Book ChapterDOI
01 Jan 1998
TL;DR: An automatic, real-time face recognition system based on a visual learning technique and its application to face detection in complex background, and accurate facial feature detection/tracking is reported.
Abstract: Two of the most important aspects in the general research framework of face recognition by computer are addressed here: face and facial feature detection, and face recognition — or rather face comparison. The best reported results of the mug-shot face recognition problem are obtained with elastic matching using jets. In this approach, the overall face detection, facial feature localization, and face comparison is carried out in a single step. This paper describes our research progress towards a different approach for face recognition. On the one hand, we describe a visual learning technique and its application to face detection in complex background, and accurate facial feature detection/tracking. On the other hand, a fast algorithm for 2D-template matching is presented as well as its application to face recognition. Finally, we report an automatic, real-time face recognition system.

01 Jan 1998
TL;DR: A shape-based object detection method that uses Distance Transforms that uses multiple features and a template hierarchy that is associated with a coarse-to-fine search over the template transformation parameters is presented.
Abstract: In this paper, the authors present a shape-based object detection method that uses Distance Transforms (DT). Extending previous DT-based matching techniques, this method uses multiple features and a template hierarchy that is associated with a coarse-to-fine search over the template transformation parameters. Results from applying this method to real-time traffic sign detection are reported.
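The core of DT-based (chamfer) matching: precompute the distance transform of the image edge map once, then score any template placement as the mean DT value under its edge points. A brute-force numpy sketch, small-image only (real systems use a linear-time DT plus the template hierarchy described above):

```python
import numpy as np

def distance_transform(edges):
    """Brute-force Euclidean distance transform of a binary edge map:
    distance from every pixel to the nearest edge pixel."""
    ys, xs = np.nonzero(edges)
    pts = np.stack([ys, xs], axis=1).astype(float)
    h, w = edges.shape
    yy, xx = np.mgrid[0:h, 0:w]
    grid = np.stack([yy.ravel(), xx.ravel()], axis=1).astype(float)
    d = np.sqrt(((grid[:, None, :] - pts[None, :, :]) ** 2).sum(-1)).min(1)
    return d.reshape(h, w)

def chamfer_score(dt, template_pts, dy, dx):
    """Mean DT value under the template's edge points placed at offset
    (dy, dx); lower means a better match."""
    ys = template_pts[:, 0] + dy
    xs = template_pts[:, 1] + dx
    return dt[ys, xs].mean()
```

Sliding the template over (dy, dx) and keeping low-scoring placements is the matching step; the coarse-to-fine search prunes most placements early.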

Proceedings ArticleDOI
16 Aug 1998
TL;DR: Six measures for detecting the presence of glasses and combinations of these measures are introduced and results are given for two facial image sets.
Abstract: In this paper we introduce six measures for detecting the presence of glasses. We also investigate combinations of these measures. Results are given for two facial image sets.

Proceedings ArticleDOI
14 Apr 1998
TL;DR: The proposed shape comparison method operates on edge maps and derives holistic similarity measures without the explicit need for point-to-point correspondence, suggesting that the process of face recognition may start at a much earlier stage of visual processing than previously thought.
Abstract: We introduce a novel methodology applicable to face matching and fast screening of large facial databases. The proposed shape comparison method operates on edge maps and derives holistic similarity measures without the explicit need for point-to-point correspondence. While the use of edge images is important to introduce robustness to changes in illumination, the lack of point-to-point matching delivers speed and tolerance to local non-rigid distortions. In particular, we propose a face similarity measure derived as a variant of the Hausdorff distance by introducing the notion of a neighborhood function and associated penalties. Experimental results on a large set of face images demonstrate that our approach produces excellent recognition results even when less than 1% of the original grey scale face image information is stored in the face database (gallery). These results suggest that the process of face recognition may start at a much earlier stage of visual processing than previously thought.
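A sketch of the neighborhood-function idea: each point of one edge set matches its nearest neighbor in the other only within a radius, and unmatched points incur a fixed penalty. The specific penalty form below is an assumption for illustration; the paper's exact variant may differ:

```python
import numpy as np

def directed_hausdorff(a_pts, b_pts, radius=2.0, penalty=5.0):
    """Neighborhood-function variant of the directed Hausdorff distance
    between two point sets of shape (n, 2): nearest-neighbor distances
    are kept only within `radius`; other points cost `penalty`."""
    d = np.sqrt(((a_pts[:, None, :] - b_pts[None, :, :]) ** 2).sum(-1)).min(1)
    d = np.where(d <= radius, d, penalty)   # cap unmatched points
    return d.mean()
```

Averaging (rather than taking the maximum, as in the classical Hausdorff distance) makes the measure tolerant to a few badly matched edge points.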

Proceedings ArticleDOI
04 Oct 1998
TL;DR: The details of the face database such as the development process and the image file formats are described, together with a discussion on some application scenarios, as well as current benchmarking activities.
Abstract: This paper describes a face image database which has been created and developed at Kodak as a common database for direct benchmarking of automatic face detection and recognition algorithms. This consumer application-oriented face image database is composed of two main sub-databases, one for face detection and one for face recognition. The database is intended to be distributed to researchers both inside and outside of Kodak working in face detection and recognition research. The database contains pictures taken using consumer digital cameras, scanned in from a photo scanner, as well as pictures from Kodak Image Magic picture disks. The details of the face database, such as the development process and the image file formats, are described, together with a discussion of some application scenarios, as well as current benchmarking activities.

Journal Article
TL;DR: In this article, a method to analyze moving pictures by directly manipulating the MPEG stream is proposed, which extracts plural moving objects without a decoding process, and tracks them even if they approach and cross.
Abstract: Intelligent transport systems (ITS) have been developed as next generation systems which unite multimedia technology and the roadway. One ITS technology, image processing, for example, the detection and tracking of moving objects (e.g. vehicles), is studied. On the other hand, the coding technique of MPEG (Motion Picture Experts Group) is used as the worldwide standard method of video data compression. Japan Highway Public Corporation (JH) adopted MPEG to transmit picture data from the surveillance system on the roadway to the control center. Therefore, we propose a method to analyze moving pictures by directly manipulating the MPEG stream. Our proposed method extracts plural moving objects from the MPEG stream without a decoding process, and tracks them even if they approach and cross. It is implemented to extract moving cars on the roadway, and its effectiveness is verified.

Proceedings ArticleDOI
16 Aug 1998
TL;DR: The proposed system uses a hierarchical model to control the image segmentation and matching process; it is a multipurpose model for articulated objects that supports the linkage of different features.
Abstract: Detecting persons in images is, in general, still an unsolved problem. We propose an approach which strictly follows the assumption that objects and parts of objects can be detected only if their appearance is known in advance. We consider the detection step as a first step towards the recognition process. The latter contains the reasoning about the movements of a person. The appearance of a 3D object means the description of the mapping of object features into the image. The proposed system uses a hierarchical model to control the image segmentation and matching process. It is a multipurpose model for articulated objects that supports the linkage of different features.

Proceedings ArticleDOI
02 Jan 1998
TL;DR: This work combines stereo, color, and face detection modules into a single robust system, and shows an initial application for an interactive display where the user sees his face distorted into various comic poses in real-time.
Abstract: We present an approach to robust real-time person tracking in crowded and/or unknown environments using multimodal integration. We combine stereo, color, and face detection modules into a single robust system, and show an initial application for an interactive display where the user sees his face distorted into various comic poses in real-time. Stereo processing is used to isolate the figure of a user from other objects and people in the background. Skin-hue classification identifies and tracks likely body parts within the foreground region, and face pattern detection discriminates and localizes the face within the tracked body parts. We discuss the failure modes of these individual components, and report results with the complete system in trials with thousands of users.

Proceedings ArticleDOI
01 Jan 1998
TL;DR: In this paper, possible face region candidates in an image with a complex background are identified by means of the valley features on the eyes, considered to belong to the eyes of a human face if they satisfy the local properties of the eyes.
Abstract: Human face detection is an important capability with a wide range of applications, such as human face recognition, surveillance systems, human-computer interfacing, video-conferencing, etc. In most such applications, the existence of human faces and their corresponding locations must be found. In other words, a reliable and fast method for detecting and locating face regions is of practical importance. In this paper, possible face region candidates in an image with a complex background are identified by means of the valley features on the eyes. These valley features are considered to belong to the eyes of a human face if they satisfy the local properties of the eyes. A pair of these features is matched if their Gabor features are similar; a face region is then formed. Each of the face region candidates is then further verified by matching it to a human face template, and by measuring its symmetry. Experiments show that this approach is fast and reliable.

Proceedings ArticleDOI
16 Aug 1998
TL;DR: An inductive learning-based detection method is described that produces a maximally specific hypothesis consistent with the training data and achieves an 85% detection rate, a false alarm rate of 0.04%, and a detection time of 1 minute for a 320×240 image on a Sun Ultrasparc 1.
Abstract: Presents a learning approach for the face detection problem. The problem can be stated as follows: given an arbitrary black and white, still image, find the location and size of every human face it contains. Numerous applications of automatic face detection have attracted considerable interest in this problem, but no present face detection system is completely satisfactory from the point of view of detection rate, false alarm rate and detection time. We describe an inductive learning-based detection method that produces a maximally specific hypothesis consistent with the training data. Three different sets of features were considered for defining the concept of a human face. The performance achieved is as follows: an 85% detection rate, a false alarm rate of 0.04% of the number of windows analyzed, and a detection time of 1 minute for a 320×240 image on a Sun Ultrasparc 1.

Proceedings ArticleDOI
20 Jan 1998
TL;DR: A method using genetic techniques to learn visual features and a program which combines and integrates the features in non-linear ways is presented and integrated in a face tracking system to generate a variety of new visual perceptual processes.
Abstract: This paper addresses the problem of automatic synthesis of visual detectors. We present a method using genetic techniques to learn visual features and a program which combines and integrates the features in non-linear ways. The method is integrated in a face tracking system to generate a variety of new visual perceptual processes.

Proceedings ArticleDOI
19 Oct 1998
TL;DR: A non-intrusive real-time program detects the face, then eyes and nose of a moving workstation user at a rate of between 10 and 30 Hertz and based on the head pose, it determines where on the display the subject is looking.
Abstract: A non-intrusive real-time program detects the face, then eyes and nose of a moving workstation user at a rate of between 10 and 30 Hertz. Based on the head pose, it determines where on the display the subject is looking. The long term goal is to provide a system for controlling a computer using head movements and gaze direction. The current program creates a base facility for other capabilities such as detecting facial gestures, creating face models, and normalizing for face recognition. A skin color model is used along with geometric knowledge about the face and weak assumptions about the lighting. Good results are reported with various subjects and conditions, including facial hair, 3D motion, and use of eyeglasses.

Proceedings ArticleDOI
16 Aug 1998
TL;DR: A method to automatically detect and locate human face features (eyes and mouth) in a 2D gray level image is presented and uses the first four translation, rotation, and scale moment invariants proposed by Hu (1962).
Abstract: A method to automatically detect and locate human face features (eyes and mouth) in a 2D gray level image is presented. The method uses a genetic algorithm (GA) and an invariant description of the facial features to accomplish the task. The descriptors used are the well-known first four translation, rotation, and scale moment invariants proposed by Hu (1962). In a first step, an image possibly containing a face or a set of faces is divided into small cells of fixed size. For each cell, the ordinary moments are computed. From these quantities, the corresponding Hu invariants are then derived. Human face feature detection and location is thus accomplished by grouping individual cells using a genetic algorithm that fits a specific cost function. The cost function corresponds to the invariant description of a specified face feature (eye or mouth) given in terms of the corresponding gray level values.
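The first four Hu invariants can be computed directly from normalized central moments; a self-contained numpy version (the paper additionally wraps these descriptors in a GA-driven cell-grouping search, which is not shown here):

```python
import numpy as np

def hu_invariants(img):
    """First four Hu (1962) moment invariants of a gray-level image,
    built from normalized central moments; invariant to translation,
    rotation and scale."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    m00 = img.sum()
    cy, cx = (yy * img).sum() / m00, (xx * img).sum() / m00
    y, x = yy - cy, xx - cx          # centred coordinates

    def eta(p, q):                   # normalised central moment
        return ((x ** p) * (y ** q) * img).sum() / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    h3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    h4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    return np.array([h1, h2, h3, h4])
```

On a discrete grid the scale invariance is only approximate (pixelation effects), which is consistent with computing the invariants per fixed-size cell as the paper does.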

Proceedings ArticleDOI
16 Aug 1998
TL;DR: A method to narrow down the search space for scale-invariant human face detection is presented, which uses a dynamic attention map implemented by Ising dynamics; experiments show that the "face" region in the image can be detected faster.
Abstract: We present a method to narrow down the search space for scale-invariant human face detection, which uses a dynamic attention map implemented by Ising dynamics. Combining the proposed method with the scale-invariant face detection method based on higher-order local autocorrelation (HLAC) features of a log-polar image and linear discriminant analysis for "face" and "not face" classification, experiments show that the "face" region in the image can be detected faster.

Proceedings ArticleDOI
23 Jun 1998
TL;DR: This paper presents a methodology for localization of manmade objects in complex scenes by learning multiple feature models in images based on a modular structure consisting of multiple classifiers, each of which solves the problem independently based on its input observations.
Abstract: This paper presents a methodology for localization of manmade objects in complex scenes by learning multiple feature models in images. The methodology is based on a modular structure consisting of multiple classifiers, each of which solves the problem independently based on its input observations. Each classifier module is trained to detect manmade object regions and a higher order decision integrator collects evidence from each of the modules to delineate a final region of interest. The proposed framework is applied to the problem of Automatic Manmade Object Localization/Detection. Results obtained on the detection of vehicles in color visual and infrared imagery are presented in this paper.