
Showing papers by "Takeo Kanade published in 2008"


Proceedings Article
01 Sep 2008
TL;DR: The CMU Multi-PIE database contains 337 subjects, imaged under 15 view points and 19 illumination conditions in up to four recording sessions; it was collected to address the shortcomings of the earlier PIE database, which had a limited number of subjects, a single recording session, and only a few expressions captured.
Abstract: A close relationship exists between the advancement of face recognition algorithms and the availability of face databases varying factors that affect facial appearance in a controlled manner. The CMU PIE database has been very influential in advancing research in face recognition across pose and illumination. Despite its success the PIE database has several shortcomings: a limited number of subjects, a single recording session and only few expressions captured. To address these issues we collected the CMU Multi-PIE database. It contains 337 subjects, imaged under 15 view points and 19 illumination conditions in up to four recording sessions. In this paper we introduce the database and describe the recording procedure. We furthermore present results from baseline experiments using PCA and LDA classifiers to highlight similarities and differences between PIE and Multi-PIE.
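As a hedged illustration of the baseline experiments mentioned above, the PCA ("eigenface") pipeline can be sketched as follows. The data here are synthetic stand-ins, not Multi-PIE images, and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a face gallery: 40 "subjects", 64-dim image vectors.
gallery = rng.normal(size=(40, 64))
probe = gallery[7] + 0.05 * rng.normal(size=64)  # noisy image of subject 7

# PCA baseline: center the gallery, take the top-k principal directions.
mean = gallery.mean(axis=0)
X = gallery - mean
_, _, Vt = np.linalg.svd(X, full_matrices=False)
k = 10
basis = Vt[:k]                      # (k, 64) "eigenface" basis

# Project gallery and probe into the subspace; classify by nearest neighbor.
g_proj = X @ basis.T                # (40, k)
p_proj = (probe - mean) @ basis.T   # (k,)
match = int(np.argmin(np.linalg.norm(g_proj - p_proj, axis=1)))
```

An LDA baseline would additionally use per-subject class labels to find discriminative rather than merely descriptive directions.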

1,181 citations


Journal ArticleDOI
TL;DR: This paper presents a fully automated multi-target tracking system that can efficiently cope with these challenges while simultaneously tracking and analyzing thousands of cells observed using time-lapse phase contrast microscopy.

415 citations


Proceedings ArticleDOI
08 Dec 2008
TL;DR: It is shown that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) basis, can be used to compactly describe most real motions.
Abstract: Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes, which have to be estimated anew for each video sequence. In contrast, we propose that the evolving 3D structure be described by a linear combination of basis trajectories. The principal advantage of this approach is that we do not need to estimate any basis vectors during computation. We show that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) basis, can be used to compactly describe most real motions. This results in a significant reduction in unknowns, and corresponding stability in estimation. We report empirical performance, quantitatively using motion capture data, and qualitatively on several video sequences exhibiting nonrigid motions including piece-wise rigid motion, partially nonrigid motion (such as a facial expression), and highly nonrigid motion (such as a person dancing).
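The compaction claim, that a handful of low-frequency DCT atoms suffice to describe a smooth trajectory, is easy to verify numerically. The sketch below is illustrative only and builds an orthonormal DCT-II basis by hand:

```python
import numpy as np

F = 100                                  # number of frames
t = np.arange(F)

# Orthonormal DCT-II basis over time: row k is the k-th frequency atom.
dct = np.array([np.cos(np.pi * k * (2 * t + 1) / (2 * F)) for k in range(F)])
dct[0] /= np.sqrt(2)
dct *= np.sqrt(2.0 / F)                  # now dct @ dct.T == identity

# A smooth 1D trajectory (stand-in for one coordinate of a moving 3D point).
traj = np.sin(2 * np.pi * t / F) + 0.3 * np.cos(4 * np.pi * t / F)

# Keep only the first K low-frequency coefficients and reconstruct.
K = 10
recon = dct[:K].T @ (dct[:K] @ traj)
err = np.linalg.norm(recon - traj) / np.linalg.norm(traj)   # a few percent
```

With K = 10 instead of F = 100 unknowns per trajectory, the reduction in unknowns that the abstract mentions follows directly.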

274 citations


Book ChapterDOI
20 Oct 2008
TL;DR: This paper presents a robust face alignment system that is capable of dealing with exaggerated expressions, large occlusions, and a wide variety of image noises and can effectively recover sufficient shape details from very noisy observations.
Abstract: In this paper, we present a robust face alignment system that is capable of dealing with exaggerated expressions, large occlusions, and a wide variety of image noises. The robustness comes from our shape regularization model, which incorporates a constrained nonlinear shape prior, geometric transformation, and the likelihood of multiple candidate landmarks in a three-layered generative model. The inference algorithm iteratively examines the best candidate positions and updates face shape and pose. This model can effectively recover sufficient shape details from very noisy observations. We demonstrate the performance of this approach on two public domain databases and a large collection of real-world face photographs.

162 citations


Journal ArticleDOI
TL;DR: Experimental results showed that face tracking combining AAMs and CHMs is more pose-robust than AAMs alone, with a 170% higher tracking rate and 115% wider pose coverage.
Abstract: The active appearance models (AAMs) provide detailed descriptive parameters that are useful for various autonomous face analysis problems. However, they are not suitable for robust face tracking across large pose variations, for the following reasons. First, they are suited to tracking the local movements of facial features only within a limited pose range. Second, they use gradient-based optimization techniques for model fitting, so fitting performance is very sensitive to the initial model parameters. Third, when fitting fails, it is difficult to obtain appropriate model parameters to re-initialize them. To alleviate these problems, we propose to combine the active appearance models with cylinder head models (CHMs), where the global head motion parameters obtained from the CHMs are used as cues for the AAM parameters, enabling good fitting or re-initialization. Good AAM parameters for robust face tracking are computed in the following manner. First, we estimate the global motion parameters with the CHM fitting algorithm. Second, we inversely project the previously fitted 2D shape points onto the 3D cylinder surface. Third, we transform the inversely projected shape points by the estimated global motion parameters. Fourth, we project the transformed 3D points onto the input image and compute the AAM parameters from them. Finally, we treat the computed AAM parameters as the initial parameters for fitting. Experimental results showed that face tracking combining AAMs and CHMs is more pose-robust than AAMs alone, with a 170% higher tracking rate and 115% wider pose coverage.
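The re-initialization steps enumerated above can be sketched geometrically. The toy code below assumes an orthographic camera looking down the z-axis and a unit-radius cylinder with a vertical axis; all names and conventions are illustrative, not the paper's.

```python
import numpy as np

# Hypothetical cylinder head model: axis along y, radius r, orthographic
# camera looking down -z. Illustrative conventions only.
r = 1.0

def back_project(pts2d):
    """Inverse-project 2D (x, y) points onto the front of the cylinder."""
    x, y = pts2d[:, 0], pts2d[:, 1]
    z = np.sqrt(np.maximum(r**2 - x**2, 0.0))   # front surface: x^2 + z^2 = r^2
    return np.stack([x, y, z], axis=1)

def project(pts3d):
    """Orthographic projection: drop z."""
    return pts3d[:, :2]

def rot_y(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

# Previously fitted 2D shape points (toy values inside the silhouette).
shape2d = np.array([[0.2, 0.1], [-0.3, 0.4], [0.0, -0.2]])

# Steps from the abstract: back-project, apply the estimated global head
# rotation, then re-project to get the initial AAM shape for the next fit.
pts3d = back_project(shape2d)
theta = np.deg2rad(10.0)                 # global yaw estimated by the CHM
init_shape = project(pts3d @ rot_y(theta).T)
```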

95 citations


Proceedings ArticleDOI
14 May 2008
TL;DR: A machine-learning approach for detecting spatiotemporal mitosis events without image segmentation is presented and it is demonstrated that this approach not only improves tracking performance, but can also independently quantify mitoses and cellular divisions.
Abstract: Clinical translation of stem cell research promises to revolutionize medicine. Challenges remain toward better understanding of stem cell biology and cost-effective strategies for stem cell manufacturing. These challenges call for novel engineering toolsets to study stem cell behaviors and the associated stemness. Towards this goal, we are developing a computer vision based system to automatically and reliably follow the behaviors of individual stem cells in expanding populations. This paper reports on significant progress in our development. In particular, we present a machine-learning approach for detecting spatiotemporal mitosis events without image segmentation. This approach not only improves tracking performance, but can also independently quantify mitoses and cellular divisions. We also employ bilateral filtering to improve cell detection performance. We demonstrate the effectiveness of this system on tracking C2C12 mouse myoblast stem cells.

61 citations


Proceedings ArticleDOI
01 Dec 2008
TL;DR: This work presents several results of the enhancement of low quality nighttime images using denighting, a method that exploits the fact that background images of the same scene have been captured all day long at much higher quality.
Abstract: Nighttime images of a scene from a surveillance camera have lower contrast and higher noise than their corresponding daytime images of the same scene due to low illumination. Denighting is an image enhancement method for improving nighttime images, so that they are closer to those that would have been taken during daytime. The method exploits the fact that background images of the same scene have been captured all day long with a much higher quality. We present several results of the enhancement of low quality nighttime images using denighting.
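One plausible reading of the background-substitution idea, hedged since the paper's actual pipeline is not given here, is to transfer the nighttime foreground changes onto the high-quality daytime background:

```python
import numpy as np

# Toy 8x8 grayscale "scene": a high-quality daytime background, a dark
# nighttime background, and a nighttime frame with a bright foreground blob.
# This is a minimal sketch of the background-substitution idea, not the
# paper's actual algorithm.
rng = np.random.default_rng(1)
day_bg = np.full((8, 8), 180.0)
night_bg = np.full((8, 8), 30.0)
night = night_bg.copy()
night[3:5, 3:5] += 60.0                  # foreground object visible at night

# Transfer the nighttime foreground change onto the daytime background.
enhanced = np.clip(day_bg + (night - night_bg), 0, 255)
```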

54 citations


Patent
11 Sep 2008
TL;DR: In this article, a ground-based information dispatch apparatus captures a blind-spot image, showing a region that is a blind spot with respect to a vehicle driver, and a vehicle-mounted camera captures a forward-view image corresponding to the viewpoint of the driver, together with vehicle position and direction information and camera parameters.
Abstract: A camera of a ground-based information dispatch apparatus captures a blind-spot image, showing a region that is a blind spot with respect to a vehicle driver. A vehicle-mounted camera captures a forward-view image corresponding to the viewpoint of the driver, and the forward-view image is transmitted to the information dispatch apparatus together with vehicle position and direction information and camera parameters. Based on the received information, the blind-spot image is converted to a corresponding image having the viewpoint of the vehicle driver, and the forward-view image and viewpoint-converted blind-spot image are combined to form a synthesized image, which is transmitted to the vehicle.

52 citations


Proceedings ArticleDOI
01 Sep 2008
TL;DR: The proposed method overcomes occlusion and divergence problems, providing fast recovery after an occlusion ends while preventing the divergence that frequently occurs in conventional frame-to-frame tracking methods.
Abstract: This paper presents a robust method for tracking the position and orientation of a head in videos. The proposed method can overcome occlusion and divergence problems. We introduce an online registration technique to detect and register feature points of the head while tracking. A set of point features is registered and updated for each reference pose, serving as a multi-view head detector. The online feature registration rectifies error accumulation and provides fast recovery after an occlusion has ended, while preventing the divergence problem which frequently occurs in conventional frame-to-frame tracking methods. The robustness of the proposed tracker is experimentally shown on video sequences that include occlusions and large pose variations.

50 citations


Journal ArticleDOI
TL;DR: It is shown that constructing a 3D face model using non-rigid structure-from-motion suffers from the Bas-Relief ambiguity and may result in a “scaled” (stretched/compressed) model.
Abstract: Active Appearance Models (AAMs) are generative, parametric models that have been successfully used in the past to model deformable objects such as human faces. The original AAMs formulation was 2D, but they have recently been extended to include a 3D shape model. A variety of single-view algorithms exist for fitting and constructing 3D AAMs but one area that has not been studied is multi-view algorithms. In this paper we present multi-view algorithms for both fitting and constructing 3D AAMs. Fitting an AAM to an image consists of minimizing the error between the input image and the closest model instance; i.e. solving a nonlinear optimization problem. In the first part of the paper we describe an algorithm for fitting a single AAM to multiple images, captured simultaneously by cameras with arbitrary locations, rotations, and response functions. This algorithm uses the scaled orthographic imaging model used by previous authors, and in the process of fitting computes, or calibrates, the scaled orthographic camera matrices. In the second part of the paper we describe an extension of this algorithm to calibrate weak perspective (or full perspective) camera models for each of the cameras. In essence, we use the human face as a (non-rigid) calibration grid. We demonstrate that the performance of this algorithm is roughly comparable to a standard algorithm using a calibration grid. In the third part of the paper, we show how camera calibration improves the performance of AAM fitting. A variety of non-rigid structure-from-motion algorithms, both single-view and multi-view, have been proposed that can be used to construct the corresponding 3D non-rigid shape models of a 2D AAM. In the final part of the paper, we show that constructing a 3D face model using non-rigid structure-from-motion suffers from the Bas-Relief ambiguity and may result in a "scaled" (stretched/compressed) model. 
We outline a robust non-rigid motion-stereo algorithm for calibrated multi-view 3D AAM construction and show how using calibrated multi-view motion-stereo can eliminate the Bas-Relief ambiguity and yield face models with higher 3D fidelity.

46 citations


Proceedings ArticleDOI
19 May 2008
TL;DR: This work presents an easy-to-use calibration method for MEMS inertial sensor units based on the factorization method, originally invented for shape-and-motion recovery in computer vision, applicable to any configuration of more than three sensor elements.
Abstract: We present an easy-to-use calibration method for MEMS inertial sensor units based on the factorization method, which was originally invented for shape-and-motion recovery in computer vision. Our method requires no explicit knowledge of the individual motions applied during the calibration procedure. Instead, a set of motion constraints in the form of inner products is used to factorize sensor measurements into a calibration matrix (representing intrinsic sensor parameters) and a motion matrix (representing acceleration or angular velocity). These motion constraints can be collected quickly from a low-cost calibration apparatus. Our method is not limited to triad configurations but is applicable to any configuration of more than three sensor elements. A redundant configuration has the benefit that all the calibration parameters, including biases, are estimated at once. Simulation and experiments are provided to verify the proposed method.
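The core factorization step can be sketched numerically. The example below is a hedged illustration with synthetic data: a rank-3 SVD splits noise-free measurements into a calibration factor and a motion factor, and, as in shape-and-motion factorization, this leaves a 3x3 gauge ambiguity that the paper's inner-product motion constraints would resolve (omitted here).

```python
import numpy as np

rng = np.random.default_rng(2)

# Ground truth: a 4-element (redundant) accelerometer unit. Each row of
# C_true holds one sensor's sensitivity axis (intrinsics); each column of
# M_true is an unknown acceleration applied during calibration.
C_true = rng.normal(size=(4, 3))
M_true = rng.normal(size=(3, 20))
Y = C_true @ M_true                      # noise-free sensor measurements

# Factorization: rank-3 SVD splits Y into C_hat @ M_hat, up to an
# invertible 3x3 gauge transform fixed later by motion constraints.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
C_hat = U[:, :3] * s[:3]
M_hat = Vt[:3]
resid = np.linalg.norm(C_hat @ M_hat - Y)
```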

Proceedings ArticleDOI
10 Aug 2008
TL;DR: In this article, a projector-camera system is proposed to measure the shape of the naked foot while walking or running, and a characteristic pattern is set on the projector, so that correspondence between the projection pattern and the camera captured image can be solved easily.
Abstract: Recently, techniques for measuring and modeling the human body have been receiving attention, because human models are useful for ergonomic design in manufacturing. We aim to accurately measure the dynamic shape of the human foot in motion (i.e. walking or running). Such measurement is valuable for shoe design and sports analysis. In this paper, a projector-camera system is proposed to measure the shape of the naked foot while walking or running. A characteristic pattern is set on the projector, so that correspondence between the projection pattern and the camera-captured image can be solved easily. Because pattern switching is not required, the system can measure foot shape even when the foot is in motion. The proposed method trades "density of measurement" for "stability of matching", but the reduced density is sufficient for our purpose.

Proceedings ArticleDOI
23 Jun 2008
TL;DR: The explicit application of articulation constraints for estimating the motion of a system of planes is described, relating articulations to the relative homography between planes and showing that for affine cameras, these articulations translate into linear equality constraints on a linear least squares system, yielding accurate and numerically stable estimates of motion.
Abstract: In this paper, we describe the explicit application of articulation constraints for estimating the motion of a system of planes. We relate articulations to the relative homography between planes and show that for affine cameras, these articulations translate into linear equality constraints on a linear least squares system, yielding accurate and numerically stable estimates of motion. The global nature of motion estimation allows us to handle areas where there is limited texture information and areas that leave the field of view. Our results demonstrate the accuracy of the algorithm in a variety of cases such as human body tracking, motion estimation of rigid, piecewise planar scenes and motion estimation of triangulated meshes.
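The numerical core the abstract describes, a linear least squares system with linear equality constraints, can be sketched generically. The constraints below are synthetic placeholders, not actual articulation constraints; the point is only the KKT solve that enforces them exactly.

```python
import numpy as np

rng = np.random.default_rng(3)

# Generic equality-constrained linear least squares:
#   minimize ||A x - b||  subject to  G x = h.
A = rng.normal(size=(30, 6))
b = rng.normal(size=30)
G = rng.normal(size=(2, 6))
h = rng.normal(size=2)

# KKT system: [[A^T A, G^T], [G, 0]] [x; lam] = [A^T b; h].
n, m = A.shape[1], G.shape[0]
K = np.block([[A.T @ A, G.T], [G, np.zeros((m, m))]])
rhs = np.concatenate([A.T @ b, h])
x = np.linalg.solve(K, rhs)[:n]          # constraints hold exactly
```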

Proceedings ArticleDOI
19 May 2008
TL;DR: This paper presents a method for motion estimation that models a multi-camera system of non-overlapping cameras as a spherical imaging system (that is, the cameras share a single optical center).
Abstract: An imaging sensor made of multiple light-weight non-overlapping cameras is an effective sensor for a small unmanned aerial vehicle that has strong payload limitation. This paper presents a method for motion estimation by assuming that such a multi-camera system is a spherical imaging system (that is, the cameras share a single optical center). We derive analytically and empirically a condition for a multi-camera system to be modeled as a spherical camera. Interestingly, not only does the spherical assumption simplify the algorithms and calibration procedure, but also motion estimation based on that assumption becomes more accurate.

Proceedings ArticleDOI
01 Dec 2008
TL;DR: The use of feature co-occurrence, which captures the similarity of appearance, motion, and spatial information within the people class, makes it an effective detector.
Abstract: This paper presents a method for detecting people based on the co-occurrence of appearance and spatiotemporal features. Histograms of oriented gradients (HOG) are used as appearance features, and the results of pixel state analysis are used as spatiotemporal features. The pixel state analysis classifies foreground pixels as either stationary or transient. The appearance and spatiotemporal features are projected into subspaces in order to reduce the dimensions of the vectors by principal component analysis (PCA). The cascade AdaBoost classifier is used to represent the co-occurrence of the appearance and spatiotemporal features. The use of feature co-occurrence, which captures the similarity of appearance, motion, and spatial information within the people class, makes it an effective detector. Experimental results show that the performance of our method is about 29% better than that of the conventional method.
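A single stage of the subspace-projection-plus-boosting idea can be sketched with scikit-learn. The features below are random stand-ins for concatenated HOG and pixel-state vectors, and the full system's cascade structure is omitted; this is an illustration, not the paper's detector.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(4)

# Synthetic stand-ins for 100-dim feature vectors of "person" and
# background windows (well separated on purpose).
pos = rng.normal(loc=1.0, size=(200, 100))
neg = rng.normal(loc=-1.0, size=(200, 100))
X = np.vstack([pos, neg])
y = np.array([1] * 200 + [0] * 200)

# Project into a low-dimensional subspace, then boost weak classifiers on it.
Xp = PCA(n_components=10).fit_transform(X)
clf = AdaBoostClassifier(n_estimators=50).fit(Xp, y)
acc = clf.score(Xp, y)
```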

Proceedings ArticleDOI
01 Sep 2008
TL;DR: An algorithm for sustained tracking of humans that combines frame-to-frame articulated motion estimation with a per-frame body detection algorithm, showing stability and sustained accuracy over thousands of frames.
Abstract: In this paper, we propose an algorithm for sustained tracking of humans, in which we combine frame-to-frame articulated motion estimation with a per-frame body detection algorithm. The proposed approach can automatically recover from tracking error and drift. The frame-to-frame motion estimation algorithm replaces traditional dynamic models within a filtering framework. Stable and accurate per-frame motion is estimated via an image-gradient based algorithm that solves a linear constrained least squares system. The per-frame detector learns the appearance of different body parts and 'sketches' expected gradient maps to detect discriminant pose configurations in images. The resulting online algorithm is computationally efficient and has been widely tested on a large dataset of sequences of drivers in vehicles. It shows stability and sustained accuracy over thousands of frames.

Journal ArticleDOI
TL;DR: This IJCV special issue on Modeling and Representations of Large-Scale 3D Scenes is devoted to the latest research results in this interesting and challenging area.
Abstract: Modeling large urban and historical scenes, both indoors and outdoors, has many applications, such as mapping, surveillance, transportation, development planning, archeology, and architecture. Research on large-scale 3D scene modeling has recently attracted increased attention of both academia and industry, resulting in major research projects. In addition to aerial imagery, both closer-range airborne and ground video/lidar sensors are used for achieving rapid, accurate and realistic modeling. Also critical for modeling large-scale 3D man-made urban or historical scenes is the choice of representations to accurately and properly represent fine structures, textureless regions, sharp depth changes, and occlusions. This IJCV special issue on Modeling and Representations of Large-Scale 3D Scenes is devoted to the latest research results in this interesting and challenging area. After a rigorous IJCV peer-review process, eight papers in four categories are selected:

01 Jan 2008
TL;DR: In this article, a robust model-based 3D tracking system is accelerated by programmable graphics hardware to run online at frame-rate during operation of a humanoid robot and to efficiently auto-initialize.
Abstract: We have accelerated a robust model-based 3D tracking system by programmable graphics hardware to run online at frame-rate during operation of a humanoid robot and to efficiently auto-initialize. The tracker recovers the full 6 degree-of-freedom pose of viewable objects relative to the robot. Leveraging the computational resources of the GPU for perception has enabled us to increase our tracker’s robustness to the significant camera displacement and camera shake typically encountered during humanoid navigation. We have combined our approach with a footstep planner and a controller capable of adaptively adjusting the height of swing leg trajectories. The resulting integrated perception-planning-action system has allowed an HRP-2 humanoid robot to successfully and rapidly localize, approach and climb stairs, as well as to avoid obstacles during walking.

Proceedings ArticleDOI
01 Sep 2008
TL;DR: A method for predicting typical operations performed by vehicle drivers, such as "pushing a navigation button", "adjusting the rear-view mirror", or "opening the console box", before the driver actually reaches the target position.
Abstract: In this paper, we propose a method for predicting typical operations performed by vehicle drivers, such as "pushing a navigation button", "adjusting the rear-view mirror", or "opening the console box", before the driver actually reaches the target position. The prediction method uses the image positions of anatomical landmarks (shoulders, elbows, and wrists) as they move over time. The difference in configurations among operations is modeled by a combination of clustering and discriminant analysis. The proposed method was applied to predict nine frequently executed operations inside a vehicle, running at over 150 frames per second. For five subjects, the method achieved an average prediction accuracy of 90% with a false positive rate of 14% at half the operation duration.
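The discriminant-analysis step can be sketched on synthetic landmark configurations. The clustering stage of the method is omitted, and the data below (three landmarks, three operations) are illustrative placeholders, not driver recordings.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(6)

# Toy stand-in: each sample is the image position of three landmarks
# (shoulder, elbow, wrist -> 6 numbers) observed partway through an
# operation; the label is which of 3 operations is underway.
centers = rng.normal(size=(3, 6)) * 5.0
X = np.vstack([c + rng.normal(size=(100, 6)) for c in centers])
y = np.repeat([0, 1, 2], 100)

# Discriminant analysis separates the operation-specific configurations.
lda = LinearDiscriminantAnalysis().fit(X, y)
acc = lda.score(X, y)
```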

Proceedings ArticleDOI
01 Dec 2008
TL;DR: An approach to navigation planning for humanoid robots that aims to ensure reliable execution by augmenting the planning process to reason about the robot's ability to successfully perceive its environment during operation, generating a metric that quantifies the 'sensability' of the environment in each state given the task to be accomplished.
Abstract: We present an approach to navigation planning for humanoid robots that aims to ensure reliable execution by augmenting the planning process to reason about the robot's ability to successfully perceive its environment during operation. By efficiently simulating the robot's perception system during search, our planner generates a metric, the so-called perceptive capability, that quantifies the 'sensability' of the environment in each state given the task to be accomplished. We have applied our method to the problem of planning robust autonomous walking sequences as performed by an HRP-2 humanoid. A fast GPU-accelerated 3D tracker is used for perception, with a footstep planner incorporating reasoning about the robot's perceptive capability. When combined with a controller capable of adaptively adjusting the height of swing leg trajectories, HRP-2 is able to navigate around obstacles and climb stairs in dynamically changing environments. Reasoning about the future perceptive capability ensures that sensing remains operational throughout the walking sequence and yields higher task success rates than perception-unaware planning.

Book ChapterDOI
06 Sep 2008
TL;DR: An algorithm to correct 3D reconstruction errors of a 3D ultrasound catheter caused by ultrasound image thickness is presented, along with a method to quickly measure the ultrasound image plane's thickness.
Abstract: In this paper we present an algorithm to correct 3D reconstruction errors of a 3D ultrasound catheter caused by ultrasound image thickness. We also provide a method to quickly measure the ultrasound image plane's thickness. With thickness correction, the registration accuracy of a navigation system using 3D ultrasound catheters can be improved by 20%.


Book ChapterDOI
01 Jan 2008
TL;DR: Automated analysis of facial images still finds eyes to be a difficult target, owing to the diversity in the appearance of eyes caused by both structural individuality and eye motion, as shown in Fig. 2.1.
Abstract: Automated analysis of facial images has found eyes still to be a difficult target [90, 96, 97, 125, 215, 221, 229, 230, 248, 360, 509, 692, 709]. The difficulty comes from the diversities in the appearance of eyes due to both structural individuality and motion of eyes, as shown in Fig. 2.1. Past studies have failed to represent these diversities adequately. For example, Tian et al. [616] used a pair of parabolic curves and a circle as a generic eye model, but parabolic curves have too few parameters to represent the complexity of eyelid shape and motion. Statistical models have been deployed to represent such individual differences for the whole eye region [322, 569, 635], but not for subregions, such as the eyelids, due in part to limited variation in training samples.

Proceedings ArticleDOI
07 Jan 2008
TL;DR: This work presents a novel feature representation for categorical object detection that can be complemented by the traditional holistic patch method, thus achieving both efficiency and accuracy.
Abstract: We present a novel feature representation for categorical object detection. Unlike previous approaches that have concentrated on generic interest-point detectors, we construct object-specific features directly from the training images. Our feature is represented by a collection of Flexible Edge Arrangement Templates (FEATs). We propose a two-stage semi-supervised learning approach to feature selection. A subset of frequent templates are first selected from a large template pool. In the second stage, we formulate feature selection as a regression problem and use LASSO method to find the most discriminative templates from the preselected ones. FEATs adaptively capture the image structure and naturally accommodate local shape variations. We show that this feature can be complemented by the traditional holistic patch method, thus achieving both efficiency and accuracy. We evaluate our method on three well-known car datasets, showing performance competitive with existing methods.
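The second-stage idea, posing feature selection as a LASSO regression so that the L1 penalty zeroes out non-discriminative templates, can be sketched in miniature. The data and dimensions below are illustrative, not actual template responses.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)

# 50 candidate template responses per sample, of which only features
# 3, 17, and 41 truly drive the target; LASSO should keep roughly those.
X = rng.normal(size=(200, 50))
w_true = np.zeros(50)
w_true[[3, 17, 41]] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.05 * rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-6)
```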

