
Showing papers by "Takeo Kanade" published in 2001


Journal ArticleDOI
TL;DR: An Automatic Face Analysis (AFA) system analyzes facial expressions based on both permanent and transient facial features in a nearly frontal-view face image sequence; multistate face and facial component models are proposed for tracking and modeling the various facial features.
Abstract: Most automatic expression analysis systems attempt to recognize a small set of prototypic expressions, such as happiness, anger, surprise, and fear. Such prototypic expressions, however, occur rather infrequently. Human emotions and intentions are more often communicated by changes in one or a few discrete facial features. In this paper, we develop an automatic face analysis (AFA) system to analyze facial expressions based on both permanent facial features (brows, eyes, mouth) and transient facial features (deepening of facial furrows) in a nearly frontal-view face image sequence. The AFA system recognizes fine-grained changes in facial expression into action units (AUs) of the Facial Action Coding System (FACS), instead of a few prototypic expressions. Multistate face and facial component models are proposed for tracking and modeling the various facial features, including lips, eyes, brows, cheeks, and furrows. During tracking, detailed parametric descriptions of the facial features are extracted. With these parameters as the inputs, a group of action units (neutral expression, six upper face AUs, and ten lower face AUs) is recognized, whether they occur alone or in combination. The system has achieved average recognition rates of 96.4 percent (95.4 percent if neutral expressions are excluded) for upper face AUs and 96.7 percent (95.6 percent with neutral expressions excluded) for lower face AUs. The generalizability of the system has been tested by using independent image databases collected and FACS-coded for ground-truth by different research teams.

1,773 citations
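The recognition step lends itself to a short sketch: the parametric feature descriptions extracted by the trackers become the input vector of a classifier (the paper trains neural-network recognizers, separately for upper- and lower-face AUs). The minimal illustration below uses hypothetical feature names and random stand-in data; it is not the paper's exact design.

```python
# Minimal sketch: per-frame parametric feature descriptions -> AU labels.
# Feature names and all data are illustrative stand-ins.
import numpy as np
from sklearn.neural_network import MLPClassifier

FEATURES = ["lip_height", "lip_width", "lip_corner_angle",
            "eye_opening", "brow_raise", "nasolabial_furrow"]

rng = np.random.default_rng(0)
# Stand-in training set: one row of tracked feature parameters per frame,
# labels 0 = neutral, 1..6 = upper-face AUs.
X_train = rng.normal(size=(200, len(FEATURES)))
y_train = rng.integers(0, 7, size=200)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X_train, y_train)

X_new = rng.normal(size=(30, len(FEATURES)))   # 30 tracked frames
print(clf.predict(X_new))                      # per-frame AU labels
```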


Journal ArticleDOI
01 Oct 2001
TL;DR: This paper presents an overview of the issues and algorithms involved in creating this semiautonomous, multicamera surveillance system and its potential to improve the situational awareness of security providers and decision makers.
Abstract: The Video Surveillance and Monitoring (VSAM) team at Carnegie Mellon University (CMU) has developed an end-to-end, multicamera surveillance system that allows a single human operator to monitor activities in a cluttered environment using a distributed network of active video sensors. Video understanding algorithms have been developed to automatically detect people and vehicles, seamlessly track them using a network of cooperating active sensors, determine their three-dimensional locations with respect to a geospatial site model, and present this information to a human operator who controls the system through a graphical user interface. The goal is to automatically collect and disseminate real-time information to improve the situational awareness of security providers and decision makers. The feasibility of real-time video surveillance has been demonstrated within a multicamera testbed system developed on the campus of CMU. This paper presents an overview of the issues and algorithms involved in creating this semiautonomous, multicamera surveillance system.

693 citations
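The detection front end of such a system is easy to sketch. Below is a toy version of adaptive background subtraction with connected-component grouping; the thresholds, blending rate, and area filter are illustrative choices, not the VSAM pipeline's actual parameters.

```python
# Toy sketch of the detection stage: background subtraction plus
# connected-component grouping of moving pixels.
import numpy as np
from scipy import ndimage

def detect_moving_blobs(frame, background, alpha=0.05, thresh=25, min_area=50):
    """Return bounding slices of moving regions and the updated background."""
    diff = np.abs(frame.astype(float) - background)
    mask = diff > thresh
    labels, n = ndimage.label(mask)
    areas = ndimage.sum(mask, labels, index=range(1, n + 1))
    blobs = [s for s, a in zip(ndimage.find_objects(labels), areas) if a >= min_area]
    # Blend the frame into the background only where nothing is moving.
    background = np.where(mask, background, (1 - alpha) * background + alpha * frame)
    return blobs, background

rng = np.random.default_rng(0)
background = rng.uniform(0, 30, size=(120, 160))
frame = background + rng.normal(0, 2, size=(120, 160))
frame[40:60, 70:90] += 100                       # a "person" enters the scene
blobs, background = detect_moving_blobs(frame, background)
print(blobs)                                     # one bounding box
```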


Patent
23 Oct 2001
TL;DR: A three-dimensional model of the area of a patient upon which a surgical procedure is to be performed is built using software techniques and used to generate a surgical plan that includes the placement of multifunctional markers.
Abstract: Devices and methods implement computer-aided orthopedic surgical procedures (50) utilizing intra-operative feedback. A three-dimensional model of an area of a patient upon which a surgical procedure is to be performed is modeled using software techniques (52). The software model is used to generate a surgical plan (54), including placement of multifunctional markers. After the markers are placed on the patient, an updated image of the patient is taken and used to calculate a final surgical plan (70) for performing the remainder of the surgical procedure. The imaging, surgical planning, and surgery (56) may all take place remote from each other. The various entities may communicate via an electronic communications network such as the Internet.

411 citations


Patent
06 Apr 2001
TL;DR: In this article, a computer assisted orthopedic surgery planner software for generation of 3D (three dimensional) solid bone models from two or more 2D (two dimensional) X-ray images of a patient's bone is presented.
Abstract: A computer assisted orthopedic surgery planner software for generation of 3D (three dimensional) solid bone models from two or more 2D (two dimensional) X-ray images of a patient's bone. The computer assisted orthopedic surgery planner software reconstructs the bone contours by starting with a 3D template bone and deforming the 3D template bone to substantially match the geometry of the patient's bone. A surgical planner and simulator module of the computer assisted orthopedic surgery planner software generates a simulated surgery plan showing the animation of the bone distraction process, the type and the size of the fixator frame to be mounted on the patient's bone, the frame mounting plan, the osteotomy/corticotomy site location and the day-by-day length adjustment schedule for each fixator strut. All bone models and surgery plans are shown as 3D graphics on a computer screen to provide realistic, pre-surgery guidance to the surgeon. Post-operative surgical data may be fed back into the computer assisted orthopedic surgery planner software to revise the earlier specified bone distraction trajectory in view of any discrepancy between the pre-operative plan data and the actual, post-operative data.

350 citations


Proceedings ArticleDOI
07 Jul 2001
TL;DR: This paper illustrates how careful modeling of the error sources and the various processing steps enables us to accurately estimate the "response function", the inverse mapping from image measurements to scene radiance for a given camera exposure setting.
Abstract: Charge-Coupled Device (CCD) cameras are widely used imaging sensors in computer vision systems. Many photometric algorithms, such as shape from shading, color constancy, and photometric stereo, implicitly assume that the image intensity is proportional to scene radiance. The actual image measurements deviate significantly from this assumption, since the transformation from scene radiance to image intensity is non-linear and depends on various factors: noise sources in the CCD sensor, as well as transformations occurring in the camera such as white balancing, gamma correction, and automatic gain control. This paper illustrates how careful modeling of the error sources and the various processing steps enables us to accurately estimate the "response function", the inverse mapping from image measurements to scene radiance for a given camera exposure setting. It is shown that the estimation algorithm outperforms the calibration procedures known to us in terms of reduced bias and variance. Further, we demonstrate how the error modeling helps us obtain uncertainty estimates of the camera irradiance value. The power of this uncertainty modeling is illustrated by a vision task involving High Dynamic Range image generation followed by change detection. Change can be detected reliably even in situations where the two images (the reference scene image and the current image) are taken several hours apart.

228 citations
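To make the notion of a "response function" concrete, here is a hedged sketch of one way to recover the inverse mapping from two registered images taken at a known exposure ratio: fit a polynomial g so that g(I_long) ≈ ratio · g(I_short). The polynomial model and plain least squares stand in for the paper's full error modeling.

```python
# Fit an inverse response g(I) = sum_k c_k I^k such that
# g(I_long) ~ ratio * g(I_short), with the normalization g(1) = 1.
import numpy as np

def fit_inverse_response(I_long, I_short, ratio, degree=4):
    powers = np.arange(degree + 1)
    A = I_long[:, None] ** powers - ratio * I_short[:, None] ** powers
    A = np.vstack([A, 1e3 * np.ones(degree + 1)])       # weighted g(1) = 1 row
    b = np.concatenate([np.zeros(len(I_long)), [1e3]])
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return lambda I: np.polyval(c[::-1], I)

# Synthetic check: true camera response is a gamma curve, I = E ** (1/2.2),
# so the true inverse response is g(I) = I ** 2.2.
rng = np.random.default_rng(0)
E = rng.uniform(0.05, 0.45, 500)
I_short = np.clip(E ** (1 / 2.2) + rng.normal(0, 0.002, 500), 0, 1)
I_long = np.clip((2.0 * E) ** (1 / 2.2) + rng.normal(0, 0.002, 500), 0, 1)
g = fit_inverse_response(I_long, I_short, ratio=2.0)
print(float(np.max(np.abs(g(I_long) - 2.0 * g(I_short)))))  # small residual
```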


01 Jan 2001
TL;DR: A statistical shape-from-shading model is developed to recover face shape from a single image and to synthesize the same face under new illumination, enabling a simple and fast classifier that was not possible before because of a lack of training data.
Abstract: We propose a model- and exemplar-based approach for face recognition. This problem has been previously tackled using either models or exemplars, with limited success. Our idea uses models to synthesize many more exemplars, which are then used in the learning stage of a face recognition system. To demonstrate this, we develop a statistical shape-from-shading model to recover face shape from a single image, and to synthesize the same face under new illumination. We then use this to build a simple and fast classifier that was not possible before because of a lack of training data.

161 citations
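The exemplar-synthesis step can be sketched once shape is in hand: given per-pixel surface normals and albedo (assumed recovered upstream, e.g., by the paper's statistical shape-from-shading model), a Lambertian render under a new light direction yields a new exemplar. A toy illustration, with a hemisphere standing in for a recovered face shape:

```python
# Lambertian re-rendering: I(x, y) = albedo(x, y) * max(0, n(x, y) . l).
import numpy as np

def relight(normals, albedo, light_dir):
    l = np.asarray(light_dir, float)
    l /= np.linalg.norm(l)
    shading = np.clip(normals @ l, 0.0, None)
    return albedo * shading

H = W = 64
y, x = np.mgrid[-1:1:H * 1j, -1:1:W * 1j]
z = np.sqrt(np.clip(1 - x**2 - y**2, 0, None))
normals = np.dstack([x, y, z])                 # toy geometry (a hemisphere)
albedo = np.ones((H, W))
exemplars = [relight(normals, albedo, d)       # same "face", three lightings
             for d in [(0, 0, 1), (1, 0, 1), (-1, 0, 1)]]
print(len(exemplars), exemplars[0].shape)
```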


Patent
28 Feb 2001
TL;DR: A coarse-to-fine object detection strategy coupled with an exhaustive search across positions and scales yields an efficient and accurate detection scheme; detection proceeds by sampling quantized wavelet coefficients at different image window locations and efficiently looking up pre-computed log-likelihood tables to determine object presence.
Abstract: An object finder program for detecting presence of a 3D object in a 2D image containing a 2D representation of the 3D object. The object finder uses the wavelet transform of the input 2D image for object detection. A pre-selected number of view-based detectors are trained on sample images prior to performing the detection on an unknown image. These detectors then operate on the given input image and compute a quantized wavelet transform for the entire input image. The object detection then proceeds with sampling of the quantized wavelet coefficients at different image window locations on the input image and efficient look-up of pre-computed log-likelihood tables to determine object presence. The object finder's coarse-to-fine object detection strategy coupled with exhaustive object search across different positions and scales results in an efficient and accurate object detection scheme. The object finder detects a 3D object over a wide range of angular variation (e.g., 180 degrees) by combining a small number of detectors, each specialized to a small part of that range.

120 citations
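The decision rule described above reduces to table lookups. The sketch below quantizes one toy Haar subband of a candidate window and sums pre-computed log-likelihood ratios; the table here is a random stand-in for the trained one.

```python
import numpy as np

N_BINS = 8
COEFF_SHAPE = (16, 16)
rng = np.random.default_rng(0)
# log P(bin | object) - log P(bin | clutter) per coefficient position;
# a random stand-in for tables learned from training samples.
LLR_TABLE = rng.normal(size=COEFF_SHAPE + (N_BINS,))

def haar_detail(window):
    """Horizontal detail coefficients of one 2D Haar level (toy subband)."""
    w = window.astype(float)
    return (w[0::2, 0::2] - w[0::2, 1::2] + w[1::2, 0::2] - w[1::2, 1::2]) / 4

def window_score(window):
    coeffs = haar_detail(window)                       # (16, 16) for 32x32 input
    bins = np.clip(((coeffs + 1) / 2 * N_BINS).astype(int), 0, N_BINS - 1)
    i, j = np.indices(COEFF_SHAPE)
    return float(LLR_TABLE[i, j, bins].sum())          # summed log-likelihood ratio

window = rng.uniform(0, 1, (32, 32))
print("object" if window_score(window) > 0.0 else "background")
```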


Proceedings ArticleDOI
08 Dec 2001
TL;DR: This paper presents an approach to reliably extracting layers from images by taking advantage of the fact that homographies induced by planar patches in the scene form a low-dimensional linear subspace.
Abstract: Representing images with layers has many important applications, such as video compression, motion analysis, and 3D scene analysis. This paper presents an approach to reliably extracting layers from images by taking advantage of the fact that homographies induced by planar patches in the scene form a low-dimensional linear subspace. Layers in the input images are mapped into the subspace, where it is proven that they form well-defined clusters that can be reliably identified by a simple mean-shift-based clustering algorithm. Global optimality is achieved since all valid regions are taken into account simultaneously, and noise can be effectively reduced by enforcing the subspace constraint. The experimental results show that good layer descriptions are extracted.

99 citations
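A compact sketch of the subspace idea: flatten each patch's 3x3 homography into a 9-vector, project the collection onto its dominant linear subspace, and cluster with mean shift; patches sharing a cluster belong to one layer. Per-patch homography estimation is assumed done upstream, and the subspace dimension is left as a parameter.

```python
import numpy as np
from sklearn.cluster import MeanShift

def cluster_layers(patch_homographies, dim=4):
    """patch_homographies: (n_patches, 9) rows, one flattened 3x3 H each."""
    H = np.asarray(patch_homographies, float)
    H /= np.linalg.norm(H, axis=1, keepdims=True)     # remove scale ambiguity
    # Project onto the dominant dim-dimensional subspace (noise reduction).
    _, _, Vt = np.linalg.svd(H - H.mean(0), full_matrices=False)
    coords = (H - H.mean(0)) @ Vt[:dim].T
    return MeanShift().fit_predict(coords)            # one label per patch

# Demo: two planes produce two clusters of noisy homography vectors.
rng = np.random.default_rng(0)
H1, H2 = rng.normal(size=9), rng.normal(size=9)
patches = np.vstack([H1 + 0.01 * rng.normal(size=(30, 9)),
                     H2 + 0.01 * rng.normal(size=(30, 9))])
print(cluster_layers(patches, dim=2))
```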


01 Jan 2001
TL;DR: This thesis presents a system for recovering the position and orientation of the target anatomy in 3D space based on iterative comparison of 2D planar radiographs with preoperative CT data: X-ray images acquired at the time of treatment are iteratively compared with synthetic images, known as Digitally Reconstructed Radiographs (DRRs), to estimate the position and orientation of the target anatomy.
Abstract: Recent years have seen exciting advances in Computer Assisted Surgery (CAS). CAS systems are currently in use which provide data to the surgeon, provide passive feedback and motion constraint, and even automate parts of the surgery by manipulating cutters and endoscopic cameras. For most of these systems, accurate registration between the patient's anatomy and the CAS system is crucial: if the position of the surgical target is not known with sufficient accuracy, therapies cannot be applied precisely, and treatment efficacy falls. This thesis presents a system for recovering the position and orientation of the target anatomy in 3D space based on iterative comparison of 2D planar radiographs with preoperative CT data. More specifically, this system uses X-ray images acquired at the time of treatment and iteratively compares them with synthetic images, known as Digitally Reconstructed Radiographs (DRRs), in order to estimate the position and orientation of the target anatomy. An intermediate data representation called a Transgraph is presented. The Transgraph is similar to the Lumigraph, or Light Field, and extends the computer graphics field of image-based rendering to transmission imaging. This representation speeds up computation of DRRs by over an order of magnitude compared to ray-casting techniques, without the use of special graphics hardware. A hardware-based volume rendering technique is also presented. This approach is based on new texture-mapping techniques that enable DRR generation using off-the-shelf, consumer-grade computer graphics hardware. These techniques permit computation of full-resolution (512 x 512) DRRs from 256 x 256 x 256 CT data in roughly 70 ms. The registration system is evaluated for application to frameless stereotactic radiosurgery, and phantom studies are presented demonstrating accuracy comparable to state-of-the-art immobilization systems. Additional phantom studies are presented in which the registration system is used to measure implant orientation following total hip replacement surgery, improving on current practice by a factor of two.

83 citations
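The core loop, iteratively comparing DRRs with the acquired radiograph, can be illustrated in miniature. The toy below uses a parallel-beam DRR (a straight sum through the volume) and a single in-plane rotation parameter; the thesis handles full 6-DOF pose and far faster DRR generation (Transgraph, texture-mapping hardware).

```python
import numpy as np
from scipy import ndimage, optimize

def drr(volume, angle_deg):
    """Parallel-beam DRR: rotate the volume, then integrate along one axis."""
    rot = ndimage.rotate(volume, angle_deg, axes=(0, 1), reshape=False, order=1)
    return rot.sum(axis=0)

ct = np.zeros((48, 48, 48))
ct[16:32, 20:28, 12:36] = 1.0                 # toy "bone"
xray = drr(ct, angle_deg=7.0)                 # the "acquired" radiograph

def cost(angle):
    d = drr(ct, angle[0])
    return -np.corrcoef(d.ravel(), xray.ravel())[0, 1]   # negative correlation

res = optimize.minimize(cost, x0=[0.0], method="Nelder-Mead")
print(res.x)                                  # should recover ~7 degrees
```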


Proceedings ArticleDOI
01 Dec 2001
TL;DR: A Bayesian approach to classifying a color image of an outdoor scene using a likelihood model that factors in the physics of the image formation process, sensor noise distribution, and prior distributions over geometry, material types, and illuminant spectrum parameters.
Abstract: Outdoor scene classification is challenging due to irregular geometry, uncontrolled illumination, and noisy reflectance distributions. This paper discusses a Bayesian approach to classifying a color image of an outdoor scene. A likelihood model factors in the physics of the image formation process, sensor noise distribution, and prior distributions over geometry, material types, and illuminant spectrum parameters. These prior distributions are learned through a training process that uses color observations of planar scene patches over time. An iterative linear algorithm estimates the maximum likelihood reflectance, spectrum, geometry, and object class labels for a new image. Experiments on images taken by outdoor surveillance cameras classify known material types and shadow regions correctly, and flag as outliers material types that were not seen previously.

72 citations
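A schematic of the MAP decision rule follows, with Gaussian class likelihoods standing in for the paper's physics-based image-formation model; the outlier threshold and class statistics are illustrative.

```python
import numpy as np

def classify_patch(rgb, class_means, class_covs, class_priors, outlier_logp=-25.0):
    """MAP class for an observed patch color; -1 flags an outlier."""
    log_posts = []
    for mu, cov, prior in zip(class_means, class_covs, class_priors):
        d = rgb - mu
        log_like = -0.5 * (d @ np.linalg.solve(cov, d)
                           + np.log(np.linalg.det(2 * np.pi * cov)))
        log_posts.append(log_like + np.log(prior))
    best = int(np.argmax(log_posts))
    return best if log_posts[best] > outlier_logp else -1

means = [np.array([0.2, 0.5, 0.2]), np.array([0.5, 0.4, 0.3])]  # e.g. grass, brick
covs = [0.01 * np.eye(3)] * 2
priors = [0.6, 0.4]
print(classify_patch(np.array([0.22, 0.48, 0.21]), means, covs, priors))  # -> 0
```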


Proceedings ArticleDOI
01 Jul 2001
TL;DR: A method for reconstructing multiple motion scenes, i.e., scenes containing multiple moving objects, from uncalibrated views; it first performs a projective reconstruction using a bilinear factorization algorithm and then converts the projective solution to a Euclidean one by enforcing metric constraints.
Abstract: We describe a method for reconstructing multiple motion scenes, i.e., scenes containing multiple moving objects, from uncalibrated views. Assuming that the objects move with constant velocities, the method recovers the scene structure, the trajectories of the moving objects, the camera motion, and the camera intrinsic parameters (except skews) simultaneously. The number of moving objects is detected automatically, without prior motion segmentation. The method is based on a unified geometric representation of the static scene and the moving objects. It first performs a projective reconstruction using a bilinear factorization algorithm and then converts the projective solution to a Euclidean one by enforcing metric constraints. Experimental results on synthetic and real images are presented.
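The bilinear structure at the heart of the method is easiest to see in the simpler static, affine case, sketched below: the centered measurement matrix of tracked points is (close to) rank 3 and splits into motion and shape via SVD. The paper generalizes this to projective cameras and constant-velocity moving points.

```python
import numpy as np

def factorize(W):
    """W: (2F, P) measurement matrix, x rows then y rows per frame."""
    t = W.mean(axis=1, keepdims=True)               # per-frame translation
    U, s, Vt = np.linalg.svd(W - t, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])                   # (2F, 3) motion
    S = np.sqrt(s[:3])[:, None] * Vt[:3]            # (3, P) shape, up to an affine ambiguity
    return M, S, t

rng = np.random.default_rng(0)
S_true = rng.normal(size=(3, 40))                   # 40 3D points
M_true = rng.normal(size=(10, 3))                   # 5 affine cameras (2 rows each)
W = M_true @ S_true + rng.normal(size=(10, 1))      # plus per-frame translations
M, S, t = factorize(W)
print(np.allclose(M @ S + t, W))                    # exact in the noise-free case
```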

Book ChapterDOI
03 Sep 2001
TL;DR: This paper models the segregation of overlapping objects into depth layers using a hierarchical Markov random field (HMRF), and suggests the broader view that clique potentials in MRF models can be used to encode any local decision rules.
Abstract: To segregate overlapping objects into depth layers requires integrating local occlusion cues distributed over the entire image into a global percept. We propose to model this process using a hierarchical Markov random field (HMRF), and suggest the broader view that clique potentials in MRF models can be used to encode any local decision rules. A topology-dependent multiscale hierarchy is used to introduce long-range interaction. The operations within each level are identical across the hierarchy. The clique parameters that encode the relative importance of these decision rules are estimated using an optimization technique called learning from rehearsals, based on 2-object training samples. We find that this model generalizes successfully to 5-object test images and that depth segregation can be completed within two traversals across the hierarchy. This computational framework therefore provides an interesting platform for investigating the interaction of local decision rules and global representations, as well as for reasoning about the rationales underlying some recent psychological and neurophysiological findings related to figure-ground segregation.
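A single-level toy makes the clique-potential view concrete: unary potentials carry local occlusion evidence, a pairwise potential encodes a smoothness rule, and iterated conditional modes (ICM) minimizes the energy. The hierarchical, topology-dependent structure of the actual model is omitted here for brevity.

```python
import numpy as np

def icm(unary, beta=1.0, iters=10):
    """unary: (H, W, L) label costs; beta weighs the pairwise smoothness
    rule; returns an (H, W) map of depth-layer labels."""
    labels = unary.argmin(axis=2)
    H, W, L = unary.shape
    for _ in range(iters):
        for i in range(H):
            for j in range(W):
                nbrs = [labels[a, b]
                        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < H and 0 <= b < W]
                costs = [unary[i, j, l] + beta * sum(l != n for n in nbrs)
                         for l in range(L)]
                labels[i, j] = int(np.argmin(costs))
    return labels

rng = np.random.default_rng(0)
unary = rng.uniform(size=(12, 12, 3))   # 3 candidate depth layers
print(icm(unary, beta=0.8)[:3])
```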

01 Jan 2001
TL;DR: The results of automated facial expression analysis by the CMU/Pittsburgh group are described; an interdisciplinary team of consultants with combined expertise in computer vision and facial analysis will compare the results with those in a separate report submitted by the UCSD group.
Abstract: Two groups were contracted to experiment with coding of FACS (Ekman & Friesen, 1978) action units on a common database. One group is ours at CMU and the University of Pittsburgh, and the other is at UCSD. The database is from Frank and Ekman (1997), who video-recorded an interrogation in which subjects lied or told the truth about a mock crime. Subjects were ethnically diverse, action units occurred during speech, and out-of-plane motion and occlusion from head motion and glasses were common. The video data were originally collected to answer substantive questions in psychology, and represent a substantial challenge to automated AU recognition. This report describes the results of automated facial expression analysis by the CMU/Pittsburgh group. An interdisciplinary team of consultants, who have combined expertise in computer vision and in facial analysis, will compare the results of this report with those in a separate report submitted by the UCSD group.

Proceedings ArticleDOI
08 Dec 2001
TL;DR: This work investigates how scale fixing influences the accuracy of 3D reconstruction and determines what measurement should be made to maximize the shape accuracy.
Abstract: Computer vision techniques can estimate 3D shape from images, but usually only up to a scale factor. The scale factor must be obtained by a physical measurement of the scene or the camera motion. Using gauge theory, we show that how this scale factor is determined can significantly affect the accuracy of the estimated shape. And yet these considerations have been ignored in previous works where 3D shape accuracy is optimized. We investigate how scale fixing influences the accuracy of 3D reconstruction and determine what measurement should be made to maximize the shape accuracy.

Book ChapterDOI
14 Oct 2001
TL;DR: An idealized model of the collection process is used to eliminate outliers from the calibration dataset and also to examine the theoretical accuracy limits of this method.
Abstract: This paper describes a calibration method for determining the physical location of the ultrasound (US) image plane relative to a rigidly attached 3D position sensor. A calibrated US probe can measure the 3D spatial location of anatomic structures relative to a global coordinate system. The calibration is performed by aiming the US probe at a calibration target containing a known point (1 mm diameter sphere) in physical space. This point is repeatedly collected at various locations in the US image plane to produce the calibration dataset. An idealized model of the collection process is used to eliminate outliers from the calibration dataset and also to examine the theoretical accuracy limits of this method. The results demonstrate accurate and robust calibration of the 3D spatial relationship between the US image plane and the 3D position sensor.
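The calibration solve can be sketched as nonlinear least squares: every collection gives a tracker pose and the pixel where the 1 mm sphere appears, and the unknown image-plane-to-sensor transform (plus pixel scales) must map each pixel to the same fixed 3D point. Names, the parametrization, and the solver choice below are illustrative, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(params, R_list, t_list, uv, p_target):
    rvec, t_cal, scale = params[:3], params[3:6], params[6:8]
    R_cal = Rotation.from_rotvec(rvec).as_matrix()
    res = []
    for R_i, t_i, (u, v) in zip(R_list, t_list, uv):
        p_probe = R_cal @ np.array([scale[0] * u, scale[1] * v, 0.0]) + t_cal
        res.append(R_i @ p_probe + t_i - p_target)   # ~0 when calibrated
    return np.concatenate(res)

# Synthetic, noise-free collections for illustration.
rng = np.random.default_rng(0)
true = np.r_[0.2, -0.1, 0.3, 10.0, -5.0, 40.0, 0.4, 0.4]  # rotvec, t_cal, scales
p_target = np.array([100.0, 50.0, 30.0])                  # the known sphere
R_list = [Rotation.from_rotvec(rng.normal(size=3)).as_matrix() for _ in range(20)]
uv = rng.uniform(0, 200, size=(20, 2))
R_cal_true = Rotation.from_rotvec(true[:3]).as_matrix()
t_list = [p_target - R_i @ (R_cal_true @ np.array([true[6] * u, true[7] * v, 0.0])
                            + true[3:6])
          for R_i, (u, v) in zip(R_list, uv)]
fit = least_squares(residuals, x0=np.r_[np.zeros(6), 1.0, 1.0],
                    args=(R_list, t_list, uv, p_target))
print(np.round(fit.x, 3))   # recovers the true parameters
```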

Book ChapterDOI
14 Oct 2001
TL;DR: A set of novel image features is computed to quantify the statistical distributions of approximate bilateral asymmetry of normal and pathological human brains; the most discriminative subset of these features is used for indexing to retrieve medically similar images under a semantic-based image retrieval framework.
Abstract: This paper reports our methodology and initial results on volumetric pathological neuroimage retrieval. A set of novel image features is computed to quantify the statistical distributions of approximate bilateral asymmetry of normal and pathological human brains. We apply a memory-based learning method to find the most discriminative feature subset through image classification according to predefined semantic categories. Finally, this selected feature subset is used as indexing features to retrieve medically similar images under a semantic-based image retrieval framework. Quantitative evaluations are provided.
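A minimal sketch of one bilateral-asymmetry feature: reflect a mid-sagittally aligned slice about the midline, difference it with itself, and summarize the difference map with a few statistics. The paper's actual feature set and learned subset are richer than this.

```python
import numpy as np

def asymmetry_features(slice2d):
    """slice2d: (H, W) intensity image, midline assumed at W/2."""
    diff = slice2d.astype(float) - slice2d[:, ::-1]   # image minus its mirror
    d = np.abs(diff).ravel()
    return np.array([d.mean(), d.std(), np.percentile(d, 95)])

rng = np.random.default_rng(0)
healthy = rng.normal(100, 5, size=(64, 64))
lesion = healthy.copy()
lesion[20:30, 8:20] += 40                 # a one-sided "lesion"
print(asymmetry_features(healthy))        # near-symmetric statistics
print(asymmetry_features(lesion))         # markedly larger asymmetry
```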

Proceedings ArticleDOI
01 Jul 2001
TL;DR: It is shown that stereo computed from the complete light-field is ambiguous if and only if the scene is radiating light of a constant intensity over an extended region.
Abstract: The complete set of measurements that could ever be used by a stereo algorithm is the plenoptic function or light-field. We give a concise characterization of when the light-field of a Lambertian scene uniquely determines its shape, and, conversely, when stereo is inherently ambiguous. We show that stereo computed from the complete light-field is ambiguous if and only if the scene is radiating light of a constant intensity (and color) over an extended region.

01 Jan 2001
TL;DR: In this paper, the authors analyze the reconstruction constraints and show that they provide less and less useful information as the magnification factor increases, and they describe a hallucination algorithm, incorporating the recognition of local features in the low-resolution images, which outperforms existing reconstruction-based algorithms.
Abstract: Super-resolution is usually posed as a reconstruction problem. The low resolution input images are assumed to be noisy, downsampled versions of an unknown super-resolution image that is to be estimated. A common way of inverting the down-sampling process is to write down the reconstruction constraints and then solve them, often adding a smoothness prior to regularize the solution. In this paper, we present two results which both show that there is more to super-resolution than image reconstruction. We first analyze the reconstruction constraints and show that they provide less and less useful information as the magnification factor increases. Afterwards, we describe a “hallucination” algorithm, incorporating the recognition of local features in the low resolution images, which outperforms existing reconstruction-based algorithms.
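The reconstruction constraints can be written as one linear equation per low-resolution pixel, as in the sketch below; a Tikhonov term stands in for the smoothness prior. As the magnification factor f grows, the system becomes increasingly underdetermined, which is the paper's point.

```python
import numpy as np

def build_downsample(h, w, f):
    """Matrix D with x_lr = D @ x_hr for f-times box-filter downsampling."""
    H, W = h * f, w * f
    D = np.zeros((h * w, H * W))
    for i in range(h):
        for j in range(w):
            for di in range(f):
                for dj in range(f):
                    D[i * w + j, (i * f + di) * W + (j * f + dj)] = 1.0 / f**2
    return D

h = w = 8
f = 2                                   # magnification factor
D = build_downsample(h, w, f)
x_lr = np.random.default_rng(0).uniform(size=h * w)
# Regularized least squares: min ||D x - x_lr||^2 + lam ||x||^2
# (a Tikhonov term standing in for the smoothness prior).
lam = 1e-2
x_hr = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ x_lr)
print(np.abs(D @ x_hr - x_lr).max())    # constraints nearly satisfied
```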

01 Jan 2001
TL;DR: This paper combines geometric feature-based and appearance-based approaches in a feature-based system to recognize Facial Action Coding System (FACS) action units (AUs) in a complex database of 606 image sequences from 107 adults of European, African, and Asian ancestry.
Abstract: In facial expression analysis, two principal approaches to extracting facial features are geometric feature-based methods and appearance-based methods such as Gabor filters. In this paper, we combine these approaches in a feature-based system to recognize Facial Action Coding System (FACS) action units (AUs) in a complex database. The geometric facial features (including mouth, eyes, brows, and cheeks) are extracted using multi-state facial component models. After extraction, these features are represented parametrically. The regional facial appearance patterns are captured using a set of multi-scale, multi-orientation Gabor wavelet filters at specific locations. For the upper face, we recognize 8 AUs and the neutral expression. The database consists of 606 image sequences from 107 adults of European, African, and Asian ancestry. AUs occur both alone and in combinations. The average recognition rate is 87.6% using geometric facial features alone, 32% using regional appearance patterns alone, 89.6% combining both features, and 92.7% after refinement. For the lower face, we recognize 13 AUs and the neutral expression. The database consists of 514 image sequences from 180 adults of European, African, and Asian ancestry. AUs occur both alone and in combinations. The average recognition rate is 84.7% using geometric facial features alone, 82% combining both features, and 87.4% after refinement.
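The appearance features are straightforward to sketch: a bank of Gabor filters at several scales and orientations, sampled at fiducial facial points. Parameter values and point locations below are illustrative, not the paper's.

```python
import numpy as np

def gabor_kernel(sigma, theta, lam, size=31):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return (np.exp(-(x**2 + y**2) / (2 * sigma**2))   # isotropic envelope
            * np.cos(2 * np.pi * xr / lam))

bank = [gabor_kernel(sigma=s, theta=t, lam=2 * s)
        for s in (2, 4, 8)                                   # multi-scale
        for t in np.linspace(0, np.pi, 6, endpoint=False)]   # multi-orientation

def appearance_vector(image, points, bank):
    """Filter responses sampled at fiducial points (brow, eye corner, ...)."""
    half = bank[0].shape[0] // 2
    feats = []
    for r, c in points:
        patch = image[r - half:r + half + 1, c - half:c + half + 1]
        feats.extend(float(np.sum(patch * k)) for k in bank)
    return np.array(feats)

img = np.random.default_rng(0).uniform(size=(128, 128))
points = [(40, 40), (40, 88), (80, 64)]                # hypothetical fiducials
print(appearance_vector(img, points, bank).shape)      # 3 points x 18 filters
```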


01 Jan 2001
TL;DR: In this paper, the Cramer-Rao lower bound is extended to include parameter indeterminacies, which can have significant impact on the accuracy of the estimated 3D structure.
Abstract: Parameter indeterminacies are inherent in 3D computer vision. We show in this thesis that how they are treated can have a significant impact on the accuracy of the estimated 3D structure. However, there has not been a general and convenient method available for representing and analyzing the indeterminacies and their effects on accuracy. Consequently, up to the present their effects have usually been ignored in uncertainty modeling research. In this work we develop a gauge-based uncertainty representation for 3D estimation that includes indeterminacies. We represent indeterminacies with orbits in the parameter space and model local linearized parameter indeterminacies as gauge freedoms. Combining this formalism with first-order perturbation theory, we are able to model uncertainties along with parameter indeterminacies. The key to our work is a geometric interpretation of the parameters and gauge freedoms. We solve the problem of how to compare parameter uncertainties despite indeterminacies and added constraints. This permits us to extend the Cramer-Rao lower bound to problems that include parameter indeterminacies. In 3D computer vision the basic quantities that often cannot be recovered include scale, rotation, and translation. We use our method to analyze the local effects of these indeterminacies on the estimated shape, and find all the local gauge freedoms. This enables us to express the uncertainties when additional information is available from measurements that constrain the gauge freedoms. Through analytical and empirical means we gain intuition into the effects of constraining the gauge freedoms, for both general Structure from Motion and stereo shape estimation. We include, in our uncertainty model, measurement errors and feature localization errors. These results, along with our theory, allow us to find optimal constraints on the gauge freedoms that maximize the accuracy of the part of the object we seek to estimate.



Journal Article
TL;DR: This work introduces pre-defined knowledge that each detected blob carries an attribute set consisting of the object's type, action, and interaction; probabilistic relations given by a specific Markov model over these attribute sets are used to estimate activity descriptions.
Abstract: We present a method for estimating the activities of multiple objects detected in video surveillance systems. In most existing video surveillance systems, object detection and classification sometimes produce inaccurate results. In addition, we want to monitor the activities of objects, including interactions between them, over long image sequences. To solve this problem, we introduce pre-defined knowledge that each blob carries an attribute set consisting of the object's type, action, and interaction. Using probabilistic relations given by a specific Markov model over these attribute sets, the activity descriptions are estimated accurately.
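Given per-frame observation scores for each attribute state and a Markov transition model, the most probable attribute sequence follows from standard Viterbi decoding, sketched below with illustrative matrices; this is a generic stand-in for the paper's specific model.

```python
import numpy as np

def viterbi(log_trans, log_obs, log_init):
    """log_trans: (S, S) transition log-probs; log_obs: (T, S) per-frame
    observation log-scores; returns the most probable state path."""
    T, S = log_obs.shape
    score = log_init + log_obs[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans          # prev state x next state
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_obs[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
S, T = 4, 12                                       # 4 attribute states, 12 frames
log_trans = np.log(rng.dirichlet(np.ones(S), size=S))
log_obs = np.log(rng.dirichlet(np.ones(S), size=T))
print(viterbi(log_trans, log_obs, np.log(np.full(S, 1 / S))))
```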

Patent
06 Apr 2001
TL;DR: Computer-assisted orthopedic surgery planning software is presented that enables the generation of 3D (three-dimensional) solid bone models from 2D (two-dimensional) X-ray images of a patient's bone.
Abstract: The invention relates to computer-assisted orthopedic surgery planning software that generates 3D (three-dimensional) solid bone models from 2D (two-dimensional) X-ray images of a patient's bone. The software reconstructs the bone contours by starting with a 3D template bone and deforming it to substantially match the geometry of the patient's bone. A surgical planning and simulation module of the software generates a simulated surgery plan showing the animation of the bone distraction process, the type and size of the fixator frame to be mounted on the patient's bone, the frame mounting plan, the osteotomy/corticotomy site location, and the day-by-day length adjustment schedule for each fixator strut. All bone models and surgery plans are presented as 3D graphics on a computer screen so as to provide the surgeon with realistic pre-surgery guidance. Post-operative surgical data may be fed back into the software to allow revision of the previously specified bone distraction trajectory in view of any discrepancy between the pre-operative plan data and the actual, post-operative data.
Abstract: L'invention se rapporte a un logiciel de planification de chirurgie orthopedique assistee par ordinateur permettant la generation 3D (tridimensionnelle) de modeles d'os solides a partir d'images radiologiques 2D (bidimensionnelles) d'un os de patient. Ce logiciel de planification de chirurgie orthopedique assistee par ordinateur reconstruit les contours de l'os en commencant avec un os modele 3D et en deformant l'os modele 3D de maniere a l'adapter sensiblement a la geometrie de l'os du patient. Un module de simulation et de planification chirurgicale du logiciel de planification de chirurgie orthopedique assistee par ordinateur genere un programme chirurgical simule presentant l'animation du processus de distraction osseuse, le type et la taille de la structure de fixation devant etre fixee sur l'os du patient, le programme de fixation de cette structure, l'emplacement du site d'osteotomie/coricotomie, et les programmes de chirurgie sont presentes sous forme de graphiques 3D sur un ecran d'ordinateur de facon a fournir au chirurgien une assistance preoperatoire realiste. Les donnees chirurgicales postoperatoires peuvent etre reintroduites dans le logiciel de planification de chirurgie orthopedique assistee par ordinateur afin de permettre une revision de la trajectoire de la distraction osseuse specifiee anterieurement en fonction de toute difference entre les donnees du programme preoperatoire et les donnees courantes, postoperatoires.