Showing papers by Takeo Kanade published in 2002


Journal ArticleDOI
TL;DR: This work derives a sequence of analytical results which show that the reconstruction constraints provide less and less useful information as the magnification factor increases, and proposes a super-resolution algorithm which attempts to recognize local features in the low-resolution images and then enhances their resolution in an appropriate manner.
Abstract: Nearly all super-resolution algorithms are based on the fundamental constraints that the super-resolution image should generate the low-resolution input images when appropriately warped and down-sampled to model the image formation process. (These reconstruction constraints are normally combined with some form of smoothness prior to regularize their solution.) We derive a sequence of analytical results which show that the reconstruction constraints provide less and less useful information as the magnification factor increases. We also validate these results empirically and show that, for large enough magnification factors, any smoothness prior leads to overly smooth results with very little high-frequency content. Next, we propose a super-resolution algorithm that uses a different kind of constraint in addition to the reconstruction constraints. The algorithm attempts to recognize local features in the low-resolution images and then enhances their resolution in an appropriate manner. We call such a super-resolution algorithm a hallucination or recogstruction algorithm. We tried our hallucination algorithm on two different data sets, frontal images of faces and printed Roman text. We obtained significantly better results than existing reconstruction-based algorithms, both qualitatively and in terms of RMS pixel error.

1,418 citations
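As an illustration of what the reconstruction constraints look like in code, here is a minimal sketch (not the authors' implementation): a toy forward model that box-blurs and decimates the high-resolution image, with the estimate recovered by gradient descent on the reconstruction error plus a quadratic smoothness prior. The function names, the alignment-free forward model, and the parameter values are assumptions made for illustration.

```python
import numpy as np

def downsample(hr, factor):
    """Toy forward model: box-blur the high-resolution image and decimate
    (a stand-in for the warp/blur/down-sample chain of the reconstruction constraints)."""
    h, w = hr.shape
    return hr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(lr, factor):
    """Adjoint of the box down-sampling operator (spread each LR residual evenly)."""
    return np.kron(lr, np.ones((factor, factor))) / factor**2

def super_resolve(lr_images, factor, n_iters=500, lam=0.01, step=1.0):
    """Least-squares super-resolution under the reconstruction constraints,
    regularized by a quadratic (Laplacian) smoothness prior.
    lr_images: pre-aligned low-resolution observations."""
    x = np.kron(lr_images[0], np.ones((factor, factor)))        # initial guess
    for _ in range(n_iters):
        grad = np.zeros_like(x)
        for y in lr_images:
            grad += upsample(downsample(x, factor) - y, factor)  # reconstruction term
        lap = (np.roll(x, 1, 0) + np.roll(x, -1, 0) +
               np.roll(x, 1, 1) + np.roll(x, -1, 1) - 4 * x)
        grad -= lam * lap                                        # smoothness prior term
        x -= step * grad
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.random((32, 32))
    factor = 4
    lows = [downsample(truth, factor) + 0.01 * rng.standard_normal((8, 8))
            for _ in range(10)]
    estimate = super_resolve(lows, factor)
    print("RMS pixel error:", np.sqrt(np.mean((estimate - truth) ** 2)))
```

Increasing `factor` in this toy setup leaves more and more high-resolution pixels constrained only by the prior, which mirrors the paper's argument about diminishing information from the reconstruction constraints.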


Proceedings ArticleDOI
20 May 2002
TL;DR: This paper evaluates a Gabor-wavelet-based method to recognize AUs in image sequences of increasing complexity and finds that the best recognition is a rate of 92.7% obtained by combining Gabor wavelets and geometry features.
Abstract: Previous work suggests that Gabor-wavelet-based methods can achieve high sensitivity and specificity for emotion-specified expressions (e.g., happy, sad) and single action units (AUs) of the Facial Action Coding System (FACS). This paper evaluates a Gabor-wavelet-based method to recognize AUs in image sequences of increasing complexity. A recognition rate of 83% is obtained for three single AUs when image sequences contain homogeneous subjects and are without observable head motion. The accuracy of AU recognition decreases to 32% when the number of AUs increases to nine and the image sequences consist of AU combinations, head motion, and non-homogeneous subjects. For comparison, an average recognition rate of 87.6% is achieved for the geometry-feature-based method. The best recognition is a rate of 92.7% obtained by combining Gabor wavelets and geometry features.

256 citations
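As a sketch of the kind of Gabor-wavelet features such a method builds on, the following extracts magnitude responses of a small Gabor filter bank sampled at a few facial landmark points; the kernel parameters, frequencies, and landmark positions are illustrative assumptions, not the paper's settings, and the resulting feature vector could be fed to any classifier.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(frequency, theta, sigma=4.0, size=31):
    """Complex Gabor kernel: a Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.exp(2j * np.pi * frequency * rot)
    return envelope * carrier

def gabor_features(image, points, frequencies=(0.1, 0.2, 0.4), n_orient=4):
    """Magnitude of Gabor responses sampled at the given (row, col) points,
    e.g. facial landmarks around the eyes and brows."""
    feats = []
    for f in frequencies:
        for k in range(n_orient):
            theta = k * np.pi / n_orient
            response = np.abs(fftconvolve(image, gabor_kernel(f, theta), mode="same"))
            feats.extend(response[r, c] for r, c in points)
    return np.array(feats)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    face = rng.random((128, 128))                # placeholder for a face image
    landmarks = [(40, 40), (40, 88), (60, 64)]   # hypothetical eye/brow points
    print(gabor_features(face, landmarks).shape)  # 3 freqs x 4 orientations x 3 points
```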



Proceedings ArticleDOI
20 May 2002
TL;DR: This paper presents a method to recover the full-motion (3 rotations and 3 translations) of the head using a cylindrical model and uses the iteratively re-weighted least squares (IRLS) technique in conjunction with the image gradient to deal with non-rigid motion and occlusion.
Abstract: This paper presents a method to recover the full motion (3 rotations and 3 translations) of the head using a cylindrical model. The robustness of the approach is achieved by a combination of three techniques. First, we use the iteratively re-weighted least squares (IRLS) technique in conjunction with the image gradient to deal with non-rigid motion and occlusion. Second, while tracking, the templates are dynamically updated to diminish the effects of self-occlusion and gradual lighting changes and to keep tracking the head when most of the face is not visible. Third, because the dynamic templates may cause error accumulation, we re-register images to a reference frame when the head pose is close to a reference pose. The performance of the real-time tracking program was evaluated in three separate experiments using image sequences (both synthetic and real) for which ground-truth head motion is known. The real sequences included pitch and yaw as large as 40° and 75°, respectively. The average recovery accuracy of the 3D rotations was found to be about 3°.

150 citations
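The IRLS technique mentioned above can be sketched generically on a linear system A p ≈ b, where p stands in for the six motion parameters and each row for one image-gradient constraint; this is a minimal sketch with Huber-style weights, not the paper's exact formulation.

```python
import numpy as np

def irls(A, b, n_iters=20, delta=1.0):
    """Iteratively re-weighted least squares with Huber-style weights.
    Rows with large residuals (e.g. non-rigid motion, occlusion) are down-weighted."""
    p = np.linalg.lstsq(A, b, rcond=None)[0]                     # ordinary LS start
    for _ in range(n_iters):
        r = A @ p - b
        w = np.where(np.abs(r) <= delta, 1.0, delta / np.abs(r))  # Huber weights
        sw = np.sqrt(w)
        p = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
    return p

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = rng.standard_normal((200, 6))        # 6 parameters: 3 rotations + 3 translations
    p_true = np.array([0.1, -0.2, 0.05, 1.0, -0.5, 0.3])
    b = A @ p_true + 0.01 * rng.standard_normal(200)
    b[:20] += 5.0                             # outliers, e.g. occluded pixels
    print("IRLS estimate:", np.round(irls(A, b), 3))
```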


Proceedings ArticleDOI
10 Dec 2002
TL;DR: A system is described for acquiring multi-view video of a person moving through the environment; a real-time tracking algorithm adjusts the pan, tilt, zoom, and focus parameters of multiple active cameras to keep the moving person centered in each view.
Abstract: A system is described for acquiring multi-view video of a person moving through the environment. A real-time tracking algorithm adjusts the pan, tilt, zoom and focus parameters of multiple active cameras to keep the moving person centered in each view. The output of the system is a set of synchronized, time-stamped video streams, showing the person simultaneously from several viewpoints.

108 citations
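Keeping the person centered can be sketched as converting the target's pixel offset from the image centre into pan/tilt corrections under a pinhole model; a minimal sketch assuming a known focal length in pixels, not the system's actual control loop.

```python
import math

def pan_tilt_correction(target_px, image_size, focal_px):
    """Angular pan/tilt corrections (radians) that would move the tracked person
    from pixel position target_px to the image centre, assuming a pinhole camera
    with focal length focal_px (in pixels)."""
    cx, cy = image_size[0] / 2.0, image_size[1] / 2.0
    dx, dy = target_px[0] - cx, target_px[1] - cy
    pan = math.atan2(dx, focal_px)     # positive: rotate the camera to the right
    tilt = -math.atan2(dy, focal_px)   # positive: rotate the camera upward
    return pan, tilt

if __name__ == "__main__":
    # Person detected at pixel (420, 180) in a 640x480 view, focal length ~800 px.
    pan, tilt = pan_tilt_correction((420, 180), (640, 480), 800.0)
    print(f"pan {math.degrees(pan):.1f} deg, tilt {math.degrees(tilt):.1f} deg")
```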


Proceedings ArticleDOI
26 Jul 2002
TL;DR: This work proposes a fully automatic algorithm for view interpolation of a completely non-rigid dynamic event across both space and time, and uses it to create re-timed slow-motion fly-by movies of dynamic real-world events.
Abstract: We propose a fully automatic algorithm for view interpolation of a completely non-rigid dynamic event across both space and time. The algorithm operates by combining images captured across space to compute voxel models of the scene shape at each time instant, and images captured across time to compute the "scene flow" between the voxel models. The scene-flow is the non-rigid 3D motion of every point in the scene. To interpolate in time, the voxel models are "flowed" using an appropriate multiple of the scene flow and a smooth surface fit to the result. The novel image is then computed by ray-casting to the surface at the intermediate time instant, following the scene flow to the neighboring time instants, projecting into the input images at those times, and finally blending the results. We use our algorithm to create re-timed slow-motion fly-by movies of dynamic real-world events.

96 citations
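The core "flowing" step can be sketched at the level of a point-based scene model: move each point by a fraction of its scene-flow vector and cross-fade the appearance sampled at the two neighbouring time instants. This is a simplified sketch; the paper additionally fits a smooth surface to the flowed voxels and renders novel views by ray-casting.

```python
import numpy as np

def interpolate_shape(points_t, flow_t, colors_t, colors_t1, alpha):
    """Flow a point-based scene model forward by a fraction alpha of the scene flow
    (the non-rigid 3D motion of every point) and cross-fade the appearance sampled
    at the two neighbouring time instants."""
    points_interp = points_t + alpha * flow_t                  # flowed geometry
    colors_interp = (1.0 - alpha) * colors_t + alpha * colors_t1
    return points_interp, colors_interp

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    pts = rng.random((1000, 3))                    # scene points at time t
    flow = 0.05 * rng.standard_normal((1000, 3))   # scene flow from t to t+1
    c0 = rng.random((1000, 3))                     # colours at t
    c1 = rng.random((1000, 3))                     # colours at t+1
    mid_pts, mid_cols = interpolate_shape(pts, flow, c0, c1, alpha=0.5)
    print(mid_pts.shape, mid_cols.shape)
```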


Proceedings ArticleDOI
11 Aug 2002
TL;DR: In this article, a system that detects discrete and important facial actions (e.g., eye blinking) in spontaneously occurring facial behavior with non-frontal pose, moderate out-of-plane head motion, and occlusion was developed.
Abstract: Previous research in automatic facial expression recognition has been limited to recognition of gross expression categories (e.g., joy or anger) in posed facial behavior under well-controlled conditions (e.g., frontal pose and minimal out-of-plane head motion). We developed a system that detects discrete and important facial actions (e.g., eye blinking) in spontaneously occurring facial behavior with non-frontal pose, moderate out-of-plane head motion, and occlusion. The system recovers 3D motion parameters, stabilizes facial regions, extracts motion and appearance information, and recognizes discrete facial actions in spontaneous facial behavior. We tested the system on video data from a 2-person interview. Subjects were ethnically diverse, action units occurred during speech, and out-of-plane motion and occlusion from head motion and glasses were common. The video data were originally collected to answer substantive questions in psychology, and represent a substantial challenge to automated AU recognition. In the analysis of 335 single and multiple blinks and non-blinks, the system achieved 98% accuracy.

86 citations


Patent
12 Feb 2002
TL;DR: In this paper, a plurality of camera systems is positioned relative to a scene such that the camera systems define a gross trajectory; images from the camera systems are transformed to superimpose a secondary induced motion on the gross trajectory and are displayed in sequence corresponding to the positions of the corresponding camera systems along the trajectory.
Abstract: A method and a system of generating a video image sequence. According to one embodiment, the method includes positioning a plurality of camera systems relative to a scene such that the camera systems define a gross trajectory. The method further includes transforming images from the camera systems to superimpose a secondary induced motion on the gross trajectory. Finally, the method includes displaying the transformed images in sequence corresponding to the position of the corresponding camera systems along the gross trajectory.

75 citations



Patent
12 Feb 2002
TL;DR: In this paper, a system of generating an image sequence of an object within a scene is presented, which includes capturing an image (images I1-N) of the object with a plurality of camera systems, wherein the camera systems are positioned around the scene.
Abstract: A method and a system of generating an image sequence of an object within a scene. According to one embodiment, the method includes capturing an image (images I1-N) of the object with a plurality of camera systems, wherein the camera systems are positioned around the scene. Next, the method includes 2D projective transforming certain of the images (I2-N) such that a point of interest in each of the images is at a same position as a point of interest in a first image (I1) from one of the camera systems. The method further includes outputting the transformed images (I2'-N') and the first image (I1) in a sequence corresponding to a positioning of the corresponding camera systems around the scene.

55 citations
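The per-view 2D projective transform can be sketched with an OpenCV homography that maps the point of interest (and its surroundings) onto its position in the reference view; a minimal sketch assuming four hand-picked correspondences per camera, with hypothetical point lists.

```python
import numpy as np
import cv2

def align_to_reference(image, pts, ref_pts, out_size):
    """2D projective transform (homography) that maps the point(s) of interest in
    `image` onto their positions in the reference view, so the object stays fixed
    in the frame as the output sequence cycles through the cameras."""
    H, _ = cv2.findHomography(np.float32(pts), np.float32(ref_pts))
    return cv2.warpPerspective(image, H, out_size)

if __name__ == "__main__":
    img = np.zeros((480, 640, 3), np.uint8)              # placeholder camera image
    # Four hypothetical correspondences around the object of interest.
    pts = [(300, 200), (400, 200), (400, 320), (300, 320)]
    ref_pts = [(280, 180), (380, 185), (385, 305), (275, 300)]
    aligned = align_to_reference(img, pts, ref_pts, (640, 480))
    print(aligned.shape)
```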


Patent
12 Feb 2002
TL;DR: In this article, a system and method for servoing on a moving target within a dynamic scene is described, which includes a master variable pointing camera system and a plurality of slave variable pointing cameras positioned around the scene.
Abstract: A system and method for servoing on a moving target within a dynamic scene. According to one embodiment, the system includes a master variable pointing camera system and a plurality of slave variable pointing camera systems positioned around the scene. The system also includes a master control unit in communication with the master variable pointing camera system. The master control unit is for determining, based on parameters of the master variable pointing camera system, parameters for each of the slave variable pointing camera systems such that, at a point in time, the master variable pointing camera system and the slave variable pointing camera systems are aimed at the target and a size of the target in an image from each of the master variable pointing camera system and the slave variable pointing camera systems is substantially the same. The system also includes a plurality of slave camera control units in communication with the master control unit. The slave camera control units are for controlling at least one of the slave variable pointing camera systems based on the parameters for each of the slave variable pointing camera systems. The system may also include a video image sequence generator in communication with the master control unit and the slave camera control units. The video image sequence generator may generate a video image sequence of the target by outputting an image from certain of the master variable pointing camera system and the slave variable pointing camera systems in sequence according to the position of the master variable pointing camera system and the slave variable pointing camera systems around the scene.
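One way such master-to-slave parameter determination could work is simple pinhole geometry, assuming the target's 3D position is already known: aim each slave's optical axis at the target and scale its focal length with distance so the target's image size matches the master's. This is an illustrative sketch, not the patent's control scheme, and all names and values are hypothetical.

```python
import numpy as np

def slave_parameters(target_xyz, slave_xyz, master_xyz, master_focal):
    """Pan/tilt (radians) aiming a slave camera at the target, plus a focal length
    chosen so the target's image size roughly matches the master's (image size
    scales as focal_length / distance for a pinhole camera)."""
    d = np.asarray(target_xyz) - np.asarray(slave_xyz)
    pan = np.arctan2(d[0], d[1])                       # rotation about the vertical axis
    tilt = np.arctan2(d[2], np.hypot(d[0], d[1]))      # elevation angle
    dist_slave = np.linalg.norm(d)
    dist_master = np.linalg.norm(np.asarray(target_xyz) - np.asarray(master_xyz))
    focal = master_focal * dist_slave / dist_master    # keep the apparent size equal
    return pan, tilt, focal

if __name__ == "__main__":
    pan, tilt, focal = slave_parameters(
        target_xyz=(10.0, 20.0, 1.5),    # e.g. a person on the field
        slave_xyz=(0.0, 0.0, 5.0),
        master_xyz=(40.0, 0.0, 5.0),
        master_focal=1200.0)             # master focal length in pixels
    print(np.degrees(pan), np.degrees(tilt), focal)
```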

Proceedings ArticleDOI
05 Dec 2002
TL;DR: A robust subspace approach to extracting layers from images reliably is presented by taking advantage of the fact that homographies induced by planar patches in the scene form a low dimensional linear subspace, which provides a constraint for detecting outliers in the local measurements, thus making the algorithm robust to outliers.
Abstract: Representing images with layers has many important applications, such as video compression, motion analysis, and 3D scene analysis. The paper presents a robust subspace approach to extracting layers from images reliably by taking advantage of the fact that homographies induced by planar patches in the scene form a low-dimensional linear subspace. Such a subspace provides not only a feature space where layers in the image domain are mapped onto denser and better-defined clusters, but also a constraint for detecting outliers in the local measurements, thus making the algorithm robust to outliers. By enforcing the subspace constraint, spatial and temporal redundancy from multiple frames is simultaneously utilized, and noise can be effectively reduced. Good layer descriptions are shown to be extracted in the experimental results.
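The subspace constraint can be sketched as follows: stack per-patch homography parameter vectors, estimate the dominant linear subspace with an SVD, treat large projection residuals as outliers, and cluster the remaining low-dimensional coordinates into layers. This is a minimal sketch on synthetic data; the subspace rank, the outlier threshold, and the use of k-means are assumptions, not the paper's exact algorithm.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def extract_layers(homographies, rank=3, n_layers=2, outlier_thresh=1.0):
    """homographies: (n_patches, 9) array of per-patch homography parameters.
    Project onto a rank-`rank` linear subspace, reject outliers by their projection
    residual, and cluster the inlier coordinates into layers."""
    H = homographies - homographies.mean(axis=0)
    _, _, Vt = np.linalg.svd(H, full_matrices=False)
    basis = Vt[:rank]                           # dominant subspace
    coords = H @ basis.T                        # low-dimensional coordinates
    residual = np.linalg.norm(H - coords @ basis, axis=1)
    inliers = residual < outlier_thresh         # the subspace constraint as an outlier test
    _, labels = kmeans2(coords[inliers], n_layers, minit="++", seed=0)
    return inliers, labels

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    true_basis = rng.standard_normal((3, 9))           # the low-dimensional subspace
    coords_a = rng.normal([3, 0, 0], 1.0, (30, 3))     # patches on plane/layer A
    coords_b = rng.normal([-3, 0, 0], 1.0, (30, 3))    # patches on plane/layer B
    clean = np.vstack([coords_a, coords_b]) @ true_basis
    clean += 0.05 * rng.standard_normal(clean.shape)   # measurement noise
    bad = 1.5 * rng.standard_normal((5, 9))            # grossly wrong local estimates
    inliers, labels = extract_layers(np.vstack([clean, bad]))
    print(inliers.sum(), "inliers; layer sizes:", np.bincount(labels))
```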

Patent
23 Oct 2002
TL;DR: In this article, a system and method for obtaining video of a moving fixation point within a scene is presented, which includes a control unit and a plurality of non-moving image capturing devices positioned around the scene, wherein the scene is within a field of view of each image capturing device.
Abstract: A system and method for obtaining video of a moving fixation point within a scene. According to one embodiment, the system includes a control unit and a plurality of non-moving image capturing devices positioned around the scene, wherein the scene is within a field of view of each image capturing device. The system also includes a plurality of image generators, wherein each image generator is in communication with one of the image capturing devices, and wherein a first of the image generators is responsive to a command from the control unit. The system also includes a surround-view image sequence generator in communication with each of the image generators and responsive to the command from the control unit for generating a surround-view video sequence of the fixation point within the scene based on output from certain of the image generators.

Journal ArticleDOI
Mei Han1, Takeo Kanade
TL;DR: A factorization-based method recovers Euclidean structure from multiple perspective views with uncalibrated cameras; three normalization algorithms enforce Euclidean constraints on camera calibration parameters to recover the scene structure and the camera calibration simultaneously, assuming zero-skew cameras.
Abstract: Structure from motion (SFM), which is recovering camera motion and scene structure from image sequences, has various applications, such as scene modelling, robot navigation, object recognition and virtual reality. Most previous research on SFM requires the use of intrinsically calibrated cameras. In this paper we describe a factorization-based method to recover Euclidean structure from multiple perspective views with uncalibrated cameras. The method first performs a projective reconstruction using a bilinear factorization algorithm, and then converts the projective solution to a Euclidean one by enforcing metric constraints. The process of updating a projective solution to a full metric one is referred to as normalization in most factorization-based SFM methods. We present three normalization algorithms which enforce Euclidean constraints on camera calibration parameters to recover the scene structure and the camera calibration simultaneously, assuming zero-skew cameras. The first two algorithms are linear, one for the case that only the focal lengths are unknown, and another for the case that the focal lengths and the constant principal point are unknown. The third algorithm is bilinear, dealing with the case that the focal lengths, the principal points and the aspect ratios are all unknown. The results of experiments are presented. Copyright © 2002 John Wiley & Sons, Ltd.
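The bilinear factorization step can be sketched as a rank-4 SVD of the scaled measurement matrix. In this minimal sketch the projective depths are simply initialised to one, whereas a full implementation re-estimates them and iterates, and the subsequent normalization to a metric solution is omitted.

```python
import numpy as np

def projective_factorization(points, depths=None):
    """One pass of bilinear (rank-4) factorization.
    points: (n_frames, n_points, 2) image measurements.
    depths: (n_frames, n_points) projective depths; initialised to 1 here, whereas a
    full implementation re-estimates them from the factorization and iterates."""
    F, N, _ = points.shape
    if depths is None:
        depths = np.ones((F, N))
    # Scaled measurement matrix W (3F x N): each column stacks depth * (x, y, 1).
    homog = np.concatenate([points, np.ones((F, N, 1))], axis=2)      # (F, N, 3)
    W = (depths[:, :, None] * homog).transpose(0, 2, 1).reshape(3 * F, N)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    P = U[:, :4] * s[:4]             # projective motion, (3F x 4)
    X = Vt[:4]                       # projective shape, (4 x N)
    return P.reshape(F, 3, 4), X

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    pts = rng.random((6, 50, 2))     # 6 frames, 50 tracked points (synthetic)
    P, X = projective_factorization(pts)
    print(P.shape, X.shape)          # (6, 3, 4), (4, 50)
```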

Proceedings ArticleDOI
01 Aug 2002
TL;DR: An important aspect of this work derives from the observation that legitimately moving objects in a scene tend to cause much faster intensity transitions than changes due to lighting, meteorological, and diurnal effects.
Abstract: This paper describes a method for detecting multiple overlapping objects from a real-time video stream. Layered detection is based on two processes: pixel analysis and region analysis. Pixel analysis determines whether a pixel is stationary or transient by observing its intensity over time. Region analysis detects regions consisting of stationary pixels corresponding to stopped objects. These regions are registered as layers on the background image, and thus new moving objects passing through these layers can be detected. An important aspect of this work derives from the observation that legitimately moving objects in a scene tend to cause much faster intensity transitions than changes due to lighting, meteorological, and diurnal effects. The resulting system robustly detects objects at an outdoor surveillance site. For 8 hours of video evaluation, a detection rate of 92% was measured, which is higher than that of traditional background subtraction methods.
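The pixel-analysis stage can be sketched by inspecting each pixel's intensity history: a fast recent transition marks a transient (moving) pixel, while a pixel that has changed since the start of the window and then settled marks a stationary (stopped-object) pixel. The window length and thresholds below are made-up values, not those of the deployed system.

```python
import numpy as np

def classify_pixels(frames, motion_thresh=15, stability_thresh=5):
    """frames: (T, H, W) grayscale history for each pixel.
    A pixel is 'transient' if it just jumped quickly (a legitimately moving object
    causes fast intensity transitions), 'stationary' if it changed relative to the
    start of the window and then settled (a stopped object, candidate for a layer)."""
    diff_fast = np.abs(frames[-1].astype(int) - frames[-2].astype(int))
    diff_slow = np.abs(frames[-1].astype(int) - frames[0].astype(int))
    settled = frames[len(frames) // 2:].std(axis=0) < stability_thresh
    transient = diff_fast > motion_thresh
    stationary = (~transient) & (diff_slow > motion_thresh) & settled
    return transient, stationary

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    video = np.full((30, 120, 160), 100, np.uint8)
    video += rng.integers(0, 3, video.shape, dtype=np.uint8)   # sensor noise
    video[15:, 40:60, 40:60] = 200       # object that arrived and stopped
    video[-1, 80:100, 100:120] = 30      # object moving through the last frame
    transient, stationary = classify_pixels(video)
    print("transient pixels:", transient.sum(), "stationary pixels:", stationary.sum())
```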

Book ChapterDOI
01 Jan 2002
TL;DR: This final chapter investigates how much extra information is actually added by having more than one image for super-resolution and proposes a super-resolution algorithm which uses a completely different source of information, in addition to the reconstruction constraints.
Abstract: A variety of super-resolution algorithms have been described in this book. Most of them, however, are based on the same source of information: that the super-resolution image should generate the lower-resolution input images when appropriately warped and down-sampled to model image formation. (This information is usually incorporated into super-resolution algorithms in the form of reconstruction constraints, which are frequently combined with a smoothness prior to regularize their solution.) In this final chapter, we first investigate how much extra information is actually added by having more than one image for super-resolution. In particular, we derive a sequence of analytical results which show that the reconstruction constraints provide far less useful information as the decimation ratio increases. We validate these results empirically and show that, for large enough decimation ratios, any smoothness prior leads to overly smooth results with very little high-frequency content, however many (noiseless) low-resolution input images are used. In the second half of this chapter, we propose a super-resolution algorithm which uses a completely different source of information, in addition to the reconstruction constraints. The algorithm recognizes local “features” in the low-resolution images and then enhances their resolution in an appropriate manner, based on a collection of high- and low-resolution training samples. We call such an algorithm a hallucination algorithm.

01 Jan 2002
TL;DR: This thesis presents a novel sensor, calibration methodology, and synchronization approach for a working terrain sensor prototype, which has proven effective in over 50 modeling flights that produced terrain models accurate to <20 cm in 3D.
Abstract: This thesis develops a novel aerial terrain modeling system. The system is unique since it flies onboard a small autonomous helicopter and senses the structure and color of its surroundings to build accurate 3D terrain models. The system is capable of modeling terrain where current approaches are too expensive, too dangerous, or too difficult. The prototype system is primarily composed of a mechanically aligned laser rangefinder and 1-pixel color camera, viewing the terrain through a common scan mechanism. The merit of this sensing approach is that range and color measurements are inherently collected from an identical terrain location. This thesis presents a novel sensor, calibration methodology, and synchronization approach for a working terrain sensor prototype. The prototype's performance was verified by carrying out a number of real-world mapping missions. These missions range from geological feature modeling in the Arctic for NASA scientists, to mapping an urban building complex for DARPA researchers. The system has proven to be effective in over 50 modeling flights, which produced terrain models accurate to <20cm in 3D.
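The fusion of range and colour from the common scan mechanism can be sketched as converting each range/angle sample to a 3D point in the sensor frame, transforming it by the vehicle pose, and attaching the colour sample; a minimal sketch with illustrative names and values, not the thesis's calibration or synchronization model.

```python
import numpy as np

def scan_to_world(ranges, azimuths, elevations, colors, sensor_pose):
    """Convert synchronized range/colour samples from a common scan mechanism into
    coloured 3D points in the world frame.
    sensor_pose: (R, t) rotation matrix and translation of the sensor, e.g. from the
    helicopter's navigation solution."""
    R, t = sensor_pose
    # Ray directions in the sensor frame from the scan angles.
    x = ranges * np.cos(elevations) * np.cos(azimuths)
    y = ranges * np.cos(elevations) * np.sin(azimuths)
    z = ranges * np.sin(elevations)
    pts_sensor = np.stack([x, y, z], axis=1)
    pts_world = pts_sensor @ R.T + t
    return np.hstack([pts_world, colors])          # (N, 6): XYZ + RGB per sample

if __name__ == "__main__":
    n = 5
    ranges = np.linspace(30.0, 35.0, n)                 # metres to the terrain
    az = np.radians(np.linspace(-20, 20, n))
    el = np.radians(np.full(n, -45.0))                  # looking down at the ground
    rgb = np.tile([0.4, 0.5, 0.2], (n, 1))              # colour from the 1-pixel camera
    pose = (np.eye(3), np.array([0.0, 0.0, 100.0]))     # hovering at 100 m altitude
    print(scan_to_world(ranges, az, el, rgb, pose))
```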

Book ChapterDOI
01 Jan 2002
TL;DR: This work identifies another source of error, called feature localization error, which captures how well a feature corresponds to the true 3D point, rather than how well features correspond over multiple images.
Abstract: Uncertainty modeling in 3D Computer Vision typically relies on propagating the uncertainty of measured feature positions through the modeling equations to obtain the uncertainty of the 3D shape being estimated. It is widely believed that this adequately captures the uncertainties of estimated geometric properties when there are no large errors due to mismatching. However, we identify another source of error which we call feature localization error. This captures how well a feature corresponds to the true 3D point, rather than how well features correspond over multiple images. We model this error as independent of the tracking error, and when combined as part of the total error, we show that it is significant and may even dominate the 3D reconstruction error.
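Because the localization error is modeled as independent of the tracking error, their covariances simply add before being propagated through the reconstruction; the following minimal numeric sketch shows how the extra term can dominate the propagated 3D uncertainty (the Jacobian and noise levels are hypothetical).

```python
import numpy as np

# Covariance of the feature-tracking (matching) error in pixels^2, and of the
# feature-localization error (how far the detected feature is from the true 3D
# point's projection). Modeled as independent, so the covariances simply add.
cov_tracking = np.diag([0.25, 0.25])       # e.g. 0.5-pixel matching noise
cov_localization = np.diag([1.0, 1.0])     # e.g. 1-pixel localization noise
cov_total = cov_tracking + cov_localization

# First-order propagation to the 3D estimate through the Jacobian J of the
# reconstruction with respect to the image measurement (hypothetical values).
J = np.array([[0.02, 0.0],
              [0.0, 0.02],
              [0.05, 0.05]])
print("3D covariance with localization error:\n", J @ cov_total @ J.T)
print("3D covariance, tracking error only:\n", J @ cov_tracking @ J.T)
```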

Journal Article
TL;DR: A vision-based monitoring system which classifies targets (vehicles and humans) based on shape appearance, estimates their colors, and detects special targets, from images of color video cameras set up toward a street.
Abstract: This paper describes a vision-based monitoring system which (1) classifies targets (vehicles and humans) based on shape appearance, (2) estimates their colors, and (3) detects special targets, from images of color video cameras set up toward a street. The categories of targets were classified into {human, sedan, van, truck, mule (golf cart for workers), and others}, and their colors were classified into the groups of {red-orange-yellow, green, blue-lightblue, white-silver-gray, darkblue-darkgray-black, and darkred-darkorange}. For the detection of special targets, the test was carried out setting {FedEx van, UPS van, Police Car} as targets and yielded desirable results. The system tracks the target, independently conducts category classification and color estimation, extracts the result with the largest probability throughout the tracking sequence from each result, and provides the data as the final decision. For classification and special target detection, we cooperatively used a stochastic linear discrimination method (linear discriminant analysis: LDA) and a nonlinear decision rule (K-Nearest Neighbor rule: K-NN).
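The cooperative use of LDA and a K-nearest-neighbour rule can be sketched with scikit-learn: project the features with LDA and classify in the discriminant space with K-NN. The synthetic data below merely stands in for the paper's shape-appearance features and six target categories.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for shape-appearance features of the six target categories
# {human, sedan, van, truck, mule, other}.
X, y = make_classification(n_samples=600, n_features=40, n_informative=10,
                           n_classes=6, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LDA as a class-aware linear projection, K-NN as the nonlinear decision rule.
clf = make_pipeline(LinearDiscriminantAnalysis(n_components=5),
                    KNeighborsClassifier(n_neighbors=5))
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```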

01 Jan 2002
TL;DR: To show how far machine vision has yet to go, a Benchmark 2000 problem is suggested using children's "what is wrong" puzzles, in which defective objects in a line drawing of a scene must be found.
Abstract: We discuss the need for a new series of benchmarks in the vision field, to provide a direct quantitative measure of progress understandable to sponsors of research as well as a guide to practitioners in the field. A first set of benchmarks in two categories is proposed: (1) static scenes containing man-made objects, and (2) static natural/outdoor scenes. The tests are "end-to-end" and involve determining how well a system can identify instances (an item or condition is present or absent) in selected regions of an image. The scoring would be set up so that the automatic setting of adjustable parameters is rewarded and manual tuning is penalized. To show how far machine vision has yet to go, a Benchmark 2000 problem is also suggested using children's "what is wrong" puzzles in which defective objects in a line drawing of a scene must be found.