Showing papers by "Takeo Kanade published in 2007"

PDF

Open Access

Proceedings Article•DOI•

[...]

Yaser Sheikh¹, E.A. Khan², Takeo Kanade¹•Institutions (2)

Carnegie Mellon University¹, University of Central Florida²

26 Dec 2007

TL;DR: A nonparametric mode-seeking algorithm, called medoidshift, based on approximating the local gradient using a weighted estimate of medoids, which automatically computes the number of clusters and the data does not have to be linearly separable.

...read moreread less

Abstract: We present a nonparametric mode-seeking algorithm, called medoidshift, based on approximating the local gradient using a weighted estimate of medoids. Like meanshift, medoidshift clustering automatically computes the number of clusters and the data does not have to be linearly separable. Unlike meanshift, the proposed algorithm does not require the definition of a mean. This property allows medoidshift to find modes even when only a distance measure between samples is defined. In this sense, the relationship between the medoidshift algorithm and the meanshift algorithm is similar to the relationship between the k-medoids and the k-means algorithms. We show that medoidshifts can also be used for incremental clustering of growing datasets by recycling previous computations. We present experimental results using medoidshift for image segmentation, incremental clustering for shot segmentation and clustering on nonlinearly separable data.

...read moreread less

173 citations

Proceedings Article•DOI•

GPU-accelerated real-time 3D tracking for humanoid locomotion and stair climbing

[...]

Philipp Michel¹, J. Chestnut¹, Satoshi Kagami¹, Koichi Nishiwaki¹, James J. Kuffner², Takeo Kanade² - Show less +2 more•Institutions (2)

Carnegie Mellon University¹, National Institute of Advanced Industrial Science and Technology²

10 Dec 2007

TL;DR: A robust model-based three-dimensional tracking system by programmable graphics hardware to operate online at frame-rate during locomotion of a humanoid robot and recovers the full 6 degree-of- freedom pose of viewable objects relative to the robot.

...read moreread less

Abstract: For humanoid robots to fully realize their biped potential in a three-dimensional world and step over, around or onto obstacles such as stairs, appropriate and efficient approaches to execution, planning and perception are required. To this end, we have accelerated a robust model-based three-dimensional tracking system by programmable graphics hardware to operate online at frame-rate during locomotion of a humanoid robot. The tracker recovers the full 6 degree-of- freedom pose of viewable objects relative to the robot. Leveraging the computational resources of the GPU for perception has enabled us to increase our tracker's robustness to the significant camera displacement and camera shake typically encountered during humanoid navigation. We have combined our approach with a footstep planner and a controller capable of adaptively adjusting the height of swing leg trajectories. The resulting integrated perception-planning-action system has allowed an HRP-2 humanoid robot to successfully and rapidly localize, approach and climb stairs, as well as to avoid obstacles during walking.

...read moreread less

138 citations

Proceedings Article•DOI•

Efficient Two-phase 3D Motion Planning for Small Fixed-wing UAVs

[...]

Myung Hwangbo¹, James J. Kuffner¹, Takeo Kanade¹•Institutions (1)

Carnegie Mellon University¹

10 Apr 2007

TL;DR: A new planning heuristic for 3D motions of fixed-wing UAVs based on 2D Dubins curves, along with precomputed sets of motion primitives derived from the vehicle dynamics model are introduced in order to achieve high efficiency.

...read moreread less

Abstract: We present an efficient two-phase approach to motion planning for small fixed-wing unmanned aerial vehicles (UAVs) navigating in complex 3D air slalom environments. A coarse global motion planner first computes a kinematically feasible obstacle-free path in a discretized 3D workspace which roughly satisfies the kinematic constraints of the UAV. Given a coarse global path, a fine local motion planner is used to compute a more accurate trajectory for the UAV at a higher level of detail. The local planner is iterated as the vehicle traverses and refines the global path as needed up to its planning horizon. We also introduce a new planning heuristic for 3D motions of fixed-wing UAVs based on 2D Dubins curves, along with precomputed sets of motion primitives derived from the vehicle dynamics model in order to achieve high efficiency.

...read moreread less

117 citations

Journal Article•DOI•

Quasiconvex Optimization for Robust Geometric Reconstruction

[...]

Qifa Ke¹, Takeo Kanade²•Institutions (2)

Ricoh¹, Carnegie Mellon University²

01 Oct 2007-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper presents a novel quasiconvex optimization framework in which the geometric reconstruction problems are formulated as a small number of small-scale convex programs that are readily solvable and provides an intuitive method to handle directional uncertainties and outliers in measurements.

...read moreread less

Abstract: Geometric reconstruction problems in computer vision are often solved by minimizing a cost function that combines the reprojection errors in the 2D images. In this paper, we show that, for various geometric reconstruction problems, their reprojection error functions share a common and quasiconvex formulation. Based on the quasiconvexity, we present a novel quasiconvex optimization framework in which the geometric reconstruction problems are formulated as a small number of small-scale convex programs that are readily solvable. Our final reconstruction algorithm is simple and has intuitive geometric interpretation. In contrast to existing local minimization approaches, our algorithm is deterministic and guarantees a predefined accuracy of the minimization result. The quasiconvexity also provides an intuitive method to handle directional uncertainties and outliers in measurements. For a large-scale problem or in a situation where computational resources are constrained, we provide an efficient approximation that gives a good upper bound (but not global minimum) on the reconstruction error. We demonstrate the effectiveness of our algorithm by experiments on both synthetic and real data.

...read moreread less

111 citations

Spatio-Temporal Frequency Analysis for Removing Rain and Snow from Videos

[...]

Peter Barnum¹, Takeo Kanade¹, Srinivasa G. Narasimhan¹•Institutions (1)

Carnegie Mellon University¹

14 Oct 2007

TL;DR: A physical model of raindrops and snowflakes is derived that is combined with the statistical properties of rain and snow, to determine how they effect the spatio-temporal frequencies of an image sequence, and the effectiveness of removal is shown.

...read moreread less

Abstract: Capturing good videos outdoors can be challenging due to harsh lighting, unpredictable scene changes, and most relevant to this work, dynamic weather. Particulate weather, such as rain and snow, creates complex flickering effects that are irritating to people and confusing to vision algorithms. Although each raindrop or snowflake only affects a small number of pixels, collections of them have predictable global spatio-temporal properties. In this paper, we formulate a model of these global dynamic weather frequencies. To begin, we derive a physical model of raindrops and snowflakes that is used to determine the general shape and brightness of a single streak. This streak model is combined with the statistical properties of rain and snow, to determine how they effect the spatio-temporal frequencies of an image sequence. Once detected, these frequencies can then be suppressed. At a small scale, many things appear the same as rain and snow, but by treating them as global phenomena, we achieve better performance than with just a local analysis. We show the effectiveness of removal on a variety of complex video sequences.

...read moreread less

85 citations

Book Chapter•DOI•

Cell population tracking and lineage construction with spatiotemporal context

[...]

Kang Li¹, Mei Chen², Takeo Kanade¹•Institutions (2)

Carnegie Mellon University¹, Intel²

29 Oct 2007

TL;DR: The system performs tracking by cycling through frame-by-frame track compilation and spatiotemporal track linking, combining the power of two tracking paradigms, outperforming previous work by up to 8%.

...read moreread less

Abstract: Automated visual-tracking of cell populations in vitro using phase contrast time-lapse microscopy is vital for quantitative, systematic and high-throughput measurements of cell behaviors. These measurements include the spatiotemporal quantification of migration, mitosis, apoptosis, and cell lineage. This paper presents an automated cell tracking system that can simultaneously track and analyze thousands of cells. The system performs tracking by cycling through frame-by-frame track compilation and spatiotemporal track linking, combining the power of two tracking paradigms. We applied the system to a range of cell populations including adult stem cells. The system achieved tracking accuracies in the range of 83.8%-92.5%, outperforming previous work by up to 8%.

...read moreread less

63 citations

Journal Article•DOI•

A Unified Gradient-Based Approach for Combining ASM into AAM

[...]

Jaewon Sung¹, Takeo Kanade², Daijin Kim¹•Institutions (2)

Pohang University of Science and Technology¹, Carnegie Mellon University²

01 Nov 2007-International Journal of Computer Vision

TL;DR: A new fitting method is proposed that combines the objective functions of both ASM and AAM into a single objective function in a gradient based optimization framework and improves the performance of facial expression recognition significantly.

...read moreread less

Abstract: Active Appearance Model (AAM) framework is a very useful method that can fit the shape and appearance model to the input image for various image analysis and synthesis problems. However, since the goal of the AAM fitting algorithm is to minimize the residual error between the model appearance and the input image, it often fails to accurately converge to the landmark points of the input image. To alleviate this weakness, we have combined Active Shape Models (ASM) into AAMs, in which ASMs try to find correct landmark points using the local profile model. Since the original objective function of the ASM search is not appropriate for combining these methods, we derive a gradient based iterative method by modifying the objective function of the ASM search. Then, we propose a new fitting method that combines the objective functions of both ASM and AAM into a single objective function in a gradient based optimization framework. Experimental results show that the proposed fitting method reduces the average fitting error when compared with existing fitting methods such as ASM, AAM, and Texture Constrained-ASM (TC-ASM) and improves the performance of facial expression recognition significantly.

...read moreread less

46 citations

Proceedings Article•DOI•

Locomotion among dynamic obstacles for the honda ASIMO

[...]

Joel Chestnutt¹, Philipp Michel¹, James J. Kuffner¹, Takeo Kanade¹•Institutions (1)

Carnegie Mellon University¹

10 Dec 2007

TL;DR: A vision system is used to predict the velocities of objects in the scene, allowing ASIMO to safely navigate autonomously through a dynamic environment, allowing the robot to successfully circumnavigate the moving obstacles.

...read moreread less

Abstract: We have equipped a Honda ASIMO humanoid with the ability to navigate autonomously in obstacle-filled environments. In addition to finding its way through known, fixed obstacle configurations, the planning system can reason about the future state of the world to locomote through challenging environments when the obstacle motions can be inferred from observation. This video presents work using a vision system to predict the velocities of objects in the scene, allowing ASIMO to safely navigate autonomously through a dynamic environment. Neither obstacle positions nor velocities are known at the start of the trial, but are estimated online as the robot walks. The planner constantly adjusts the footstep path with the latest estimates of ASIMO's position and the obstacle trajectories, allowing the robot to successfully circumnavigate the moving obstacles.

...read moreread less

42 citations

Journal Article•DOI•

The Asymmetry of Image Registration and Its Application to Face Tracking

[...]

Goksel Dedeoglu¹, Takeo Kanade¹, Simon Baker¹•Institutions (1)

Carnegie Mellon University¹

01 May 2007-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A novel, "resolution-aware fitting" (RAF) algorithm that respects the asymmetry and incorporates an explicit model of the blur caused by the camera's sensing elements into the fitting formulation and Experimental results show that RAF significantly improves the estimation accuracy of both shape and appearance parameters when fitting to low-resolution data.

...read moreread less

Abstract: Most image registration problems are formulated in an asymmetric fashion. Given a pair of images, one is implicitly or explicitly regarded as a template and warped onto the other to match as well as possible. In this paper, we focus on this seemingly arbitrary choice of the roles and reveal how it may lead to biased warp estimates in the presence of relative scaling. We present a principled way of selecting the template and explain why only the correct asymmetric form, with the potential inclusion of a blurring step, can yield an unbiased estimator. We validate our analysis in the domain of model-based face tracking. We show how the usual active appearance model (AAM) formulation overlooks the asymmetry issue, causing the fitting accuracy to degrade quickly when the observed objects are smaller than their model. We formulate a novel, "resolution-aware fitting" (RAF) algorithm that respects the asymmetry and incorporates an explicit model of the blur caused by the camera's sensing elements into the fitting formulation. We compare the RAF algorithm against a state-of-the-art tracker across a variety of resolutions and AAM complexity levels. Experimental results show that RAF significantly improves the estimation accuracy of both shape and appearance parameters when fitting to low-resolution data. Recognizing and accounting for the asymmetry of image registration leads to tangible accuracy improvements in analyzing low-resolution imagery

...read moreread less

42 citations

CareMedia: Automated Video and Sensor Analysis for Geriatric Care

[...]

Alexander G. Hauptmann, Takeo Kanade, Scott Stevens, Ashok Bharucha

01 Jan 2007

TL;DR: An algorithm for dining activity analysis in a nursing home is described and a hidden Markov model is proposed to characterize different stages in dining activities with certain temporal order, which could be successful in assisting caregivers in assessments of resident's activity levels over time.

...read moreread less

Abstract: In the absence of objective, reliable assessment and outcomes measurement methodologies in a nursing home, effectiveness of behavioral and pharmacological interventions cannot be determined. Pervasive technology holds the promise of developing objective, real-time, continuous assessment and outcomes measurement methodologies that were previously unfeasible. Such technologies can contribute greatly to a deeper understanding of the activity and behavior patterns of individual residents, and the physical, environmental and psychosocial correlates of these patterns. Bharucha, A., Allin, S. and Stevens, S., “CareMedia: Towards Automated Behavior Analysis in the Nursing Home Setting,” in The International Psychogeriatric Association Eleventh International Conference, Aug. 17-22, 2003. Several hours of surveillance-type video were captured in a nursing home. The task of data reduction and extraction of highlevel activity information was approached through both automated and manual techniques. For the manual encoding, 4 undergraduate students were trained by a geriatric psychiatrist to code the data frame-by-frame. A computer interface allowed coders to annotate behaviors of interest, as well as physical pose and ambulatory status. Behaviors of interest were identified with the CohenMansfield Agitation Inventory and grouped into 4 sub-categories: CareMedia Carnegie Mellon University 8 CareMedia: Automated Video and Sensor Analysis for Geriatric Care March 2003 Annual Progress Report physically aggressive, physically non-aggressive, verbally aggressive, and verbally non-aggressive. These manual encodings are currently forming the development of automated techniques at Carnegie Mellon University to extract information relevant to the detection of anomalous and disruptive physical activities. This includes automated tracking and extraction of navigational patterns. Gao, J., Hauptmann, A.G., Barucha, A. and Wactlar, H.D., “Dining Activity Analysis Using Hidden Markov Models,” accepted to The 17th International Conference on Pattern Recognition (ICPR’04), Cambridge, United Kingdom, Aug. 23-26, 2004. Abstract: We describe an algorithm for dining activity analysis in a nursing home. Based on several features, including motion vectors and distance between moving regions in the subspace of an individual person, a hidden Markov model is proposed to characterize different stages in dining activities with certain temporal order. Using HMM model, we are able to identify the start (and ending) of individual dining events with high accuracy and low false positive rate. This approach could be successful in assisting caregivers in assessments of resident's activity levels over time. We describe an algorithm for dining activity analysis in a nursing home. Based on several features, including motion vectors and distance between moving regions in the subspace of an individual person, a hidden Markov model is proposed to characterize different stages in dining activities with certain temporal order. Using HMM model, we are able to identify the start (and ending) of individual dining events with high accuracy and low false positive rate. This approach could be successful in assisting caregivers in assessments of resident's activity levels over time. Gao, J., Hauptmann, A.G. and Wactlar, H.D., “Combining Motion Segmentation with Tracking for Activity Analysis,” submitted to The Sixth International Conference on Automatic Face and Gesture Recognition (FG’04), Seoul, Korea, May 17-19, 2004. Abstract: We explore a novel motion feature as the appropriate basis for classifying or describing a number of fine motor human activities. Our approach not only estimates motion directions and magnitudes in different image regions, but also provides accurate segmentation of moving regions. Through a combination of motion segmentation and region tracking techniques, while filtering for temporal consistency, we achieve a balance between accuracy and reliability of motion feature extraction. To identify specific activities, we characterize the dominant directions of relative motions. Experimental results show that this approach to motion feature analysis could be successful in assisting caregivers at a nursing home in assessments of patient's activity levels over time. We explore a novel motion feature as the appropriate basis for classifying or describing a number of fine motor human activities. Our approach not only estimates motion directions and magnitudes in different image regions, but also provides accurate segmentation of moving regions. Through a combination of motion segmentation and region tracking techniques, while filtering for temporal consistency, we achieve a balance between accuracy and reliability of motion feature extraction. To identify specific activities, we characterize the dominant directions of relative motions. Experimental results show that this approach to motion feature analysis could be successful in assisting caregivers at a nursing home in assessments of patient's activity levels over time. Hauptmann, A.G., Gao, J., Yan, R., Qi, Y., Yang, J., and Wactlar, H.D., “Aiding Geriatric Patients and Caregivers through Automated Analysis of Nursing Home Observations,” to be published in IEEE Pervasive Computing, April-June special issue: Pervasive Computing for Successful Aging. Abstract: Through pervasive activity monitoring in a skilled nursing facility, a continuous audio and video record is captured. Through pervasive activity monitoring in a skilled nursing facility, a continuous audio and video record is captured. CareMedia Carnegie Mellon University 9 CareMedia: Automated Video and Sensor Analysis for Geriatric Care March 2003 Annual Progress Report Our CareMedia Project research analyzes this video information by automatically tracking people, assisting in efficiently labeling individuals, and characterizing selected activities and actions. Special emphasis is given to detecting eating activity in the dining hall and to personal hygiene. Through this work, the video record is transformed into an information asset that can provide geriatric care specialists with greater insights and evaluation of behavioral problems for the elderly. Evaluations of the effectiveness of analyzing such a large video record illustrate the feasibility of our approach. Hauptmann, A.G., Jin, R. and Wactlar, H.D., “Data Analysis for a Multimedia Library, in Text and Speech-Triggered Information Access,” Renals, S and Grefenstette, G. (eds)., Springer, Berlin, pp. 6-37, 2003. Abstract: This book section describes the indexing, search and retrieval of various combinations of audio, video, text and image media and the automated content processing that enables it. The intent is to provide a framework for data analysis in multimedia digital libraries. The introduction briefly distinguishes the digital from traditional libraries and touches on the specific issues important to searching the content of multimedia libraries. The second section introduces the Informedia Digital Video Library as an example of a multimedia library, including a quick tour of the functionality. The next section discusses the processing of audio and image information, as it relates to a multimedia library. Section four illustrates the interplay between audio and video information using a video information retrieval experiment as an example. Section five discusses the exporting and sharing of metadata in a digital library using MPEG-7. Finally, section 6 provides one vision of a future digital library, where all personal memory can be recorded and accessed. This book section describes the indexing, search and retrieval of various combinations of audio, video, text and image media and the automated content processing that enables it. The intent is to provide a framework for data analysis in multimedia digital libraries. The introduction briefly distinguishes the digital from traditional libraries and touches on the specific issues important to searching the content of multimedia libraries. The second section introduces the Informedia Digital Video Library as an example of a multimedia library, including a quick tour of the functionality. The next section discusses the processing of audio and image information, as it relates to a multimedia library. Section four illustrates the interplay between audio and video information using a video information retrieval experiment as an example. Section five discusses the exporting and sharing of metadata in a digital library using MPEG-7. Finally, section 6 provides one vision of a future digital library, where all personal memory can be recorded and accessed. Jin, R., Hauptmann, A., Carbonell, J., Si, L., Liu, Y., “A New Boosting Algorithm Using Input Dependent Regularizer,” 20th International Conference on Machine Learning (ICML'03), Washington, DC, August 21-24, 2003. Abstract: AdaBoost has proved to be an effective method to improve the performance of base classifiers both theoretically and empirically. However, previous studies have shown that AdaBoost might suffer from the overfitting problem, especially for noisy data. In addition, most current work on boosting assumes that the combination weights are fixed constants and therefore does not take particular input patterns into consideration. In this paper, we present a new boosting algorithm, “WeightBoost”, which tries to solve these two problems by introducing an input-dependent regularization factor to the combination weight. Similarly to AdaBoost, we derive a learning procedure for WeightBoost, which AdaBoost has proved to be an effective method to improve the performance of base classifiers both theoretically and empirically. However, previous studies have shown that AdaBoost might suffer from the overfitting problem, especially for noisy data. In addition, most current work on boosting assumes that the combination weights are fixed constants and therefore does not take particular input p

...read moreread less

38 citations

Cell Population Tracking and Lineage Construction Using Multiple-Model Dynamics Filters and Spatiotemporal Optimization

[...]

Kang Li¹, Takeo Kanade•Institutions (1)

Carnegie Mellon University¹

01 Jan 2007

TL;DR: An automated cell tracking system that can simultaneously track and analyze thousands of cells by cycling through frame-by-frame track compilation and spatiotemporal track linking is presented, combining the power of two tracking paradigms.

...read moreread less

Abstract: Automated visual-tracking of cell populations in vitro using phase contrast time-lapse microscopy is vital for quantitative, systematic and high-throughput measurements of cell behaviors. These measurements include the spatiotemporal quantification of migration, mitosis, apoptosis, and cell lineage. This paper presents an automated cell tracking system that can simultaneously track and analyze thousands of cells. The system performs tracking by cycling through frame-by-frame track compilation and spatiotemporal track linking, combining the power of two tracking paradigms. We applied the system to a range of cell populations including adult stem cells. The system achieved tracking accuracies in the range of 85.9%– 92.5%, outperforming previous work by up to 9%. The proposed tracking methodology is valuable for tissue engineering, stem cell research, drug discovery and development, and related areas.

...read moreread less

Book Chapter•DOI•

Image segmentation using iterated graph cuts based on multi-scale smoothing

[...]

Tomoyuki Nagahashi¹, Hironobu Fujiyoshi¹, Takeo Kanade²•Institutions (2)

Chubu University¹, Carnegie Mellon University²

18 Nov 2007

TL;DR: The proposed method can segment the regions of an object with a stepwise process from global to local segmentation by iterating the graph-cuts process with Gaussian smoothing using different values for the standard deviation.

...read moreread less

Abstract: We present a novel approach to image segmentation using iterated Graph Cuts based on multi-scale smoothing. We compute the prior probability obtained by the likelihood from a color histogram and a distance transform using the segmentation results from graph cuts in the previous process, and set the probability as the t-link of the graph for the next process. The proposed method can segment the regions of an object with a stepwise process from global to local segmentation by iterating the graph-cuts process with Gaussian smoothing using different values for the standard deviation. We demonstrate that we can obtain 4.7% better segmentation than that with the conventional approach.

...read moreread less

Journal Article•DOI•

Virtualized Reality: Perspectives on 4D Digitization of Dynamic Events

[...]

Takeo Kanade¹, P. J. Narayanan²•Institutions (2)

Carnegie Mellon University¹, International Institute of Information Technology²

01 May 2007-IEEE Computer Graphics and Applications

TL;DR: The virtualized reality system serves as an example on the general problem of digitizing dynamic events and the details of the system's details are presented from a historical perspective.

...read moreread less

Abstract: Digitally recording dynamic events, such as sporting events, for experiencing in a spatio-temporally distant and arbitrary setting requires 4D capture: three dimensions for their geometry and appearance over the fourth dimension of time. Today's computer vision techniques make 4D capture possible. The virtualized reality system serves as an example on the general problem of digitizing dynamic events. In this article, we present the virtualized reality system's details from a historical perspective

...read moreread less

Proceedings Article•DOI•

Filtered Component Analysis to Increase Robustness to Local Minima in Appearance Models

[...]

F. De la Torre¹, Alvaro Collet¹, M. Quero¹, Jeffrey F. Cohn¹, Takeo Kanade¹ - Show less +1 more•Institutions (1)

Carnegie Mellon University¹

17 Jun 2007

TL;DR: FCA learns an optimal set of filters with which to build a multi-band representation of the object, and was found to be more robust than either grayscale or Gabor filters to problems of local minima.

...read moreread less

Abstract: Appearance models (AM) are commonly used to model appearance and shape variation of objects in images. In particular, they have proven useful to detection, tracking, and synthesis of people's faces from video. While AM have numerous advantages relative to alternative approaches, they have at least two important drawbacks. First, they are especially prone to local minima in fitting; this problem becomes increasingly problematic as the number of parameters to estimate grows. Second, often few if any of the local minima correspond to the correct location of the model error. To address these problems, we propose filtered component analysis (FCA), an extension of traditional principal component analysis (PCA). FCA learns an optimal set of filters with which to build a multi-band representation of the object. FCA representations were found to be more robust than either grayscale or Gabor filters to problems of local minima. The effectiveness and robustness of the proposed algorithm is demonstrated in both synthetic and real data.

...read moreread less

Proceedings Article•DOI•

Learning GMRF Structures for Spatial Priors

[...]

Lie Gu¹, Eric P. Xing¹, Takeo Kanade¹•Institutions (1)

Carnegie Mellon University¹

17 Jun 2007

TL;DR: The goal of this paper is to find sparse and representative spatial priors that can be applied to part-based object localization and it is shown that the learned graph captures meaningful geometrical variations with significantly sparser structure and leads to better parts localization results.

...read moreread less

Abstract: The goal of this paper is to find sparse and representative spatial priors that can be applied to part-based object localization. Assuming a GMRF prior over part configurations, we construct the graph structure of the prior by regressing the position of each part on all other parts, and selecting the neighboring edges using a Lasso-based method. This approach produces a prior structure which is not only sparse, but also faithful to the spatial dependencies that are observed in training data. We evaluate the representation power of the learned prior structure in two ways: first is drawing samples from the prior, and comparing them with the samples produced by the GMRF priors of other structures; second is comparing the results when applying different priors to a facial components localization task. We show that the learned graph captures meaningful geometrical variations with significantly sparser structure and leads to better parts localization results.

...read moreread less

Proceedings Article•DOI•

Coplanar Shadowgrams for Acquiring Visual Hulls of Intricate Objects

[...]

Shuntaro Yamazaki, Srinivasa G. Narasimhan¹, Simon Baker², Takeo Kanade³•Institutions (3)

Carnegie Mellon University¹, Microsoft², National Institute of Advanced Industrial Science and Technology³

26 Dec 2007

TL;DR: A practical approach to SFS is presented using a novel technique called coplanar shadowgram imaging, that allows us to use dozens to even hundreds of views for visual hull reconstruction, and yields novel geometric properties that are not possible in traditional multi-view camera- based imaging systems.

...read moreread less

Abstract: Acquiring 3D models of intricate objects (like tree branches, bicycles and insects) is a hard problem due to severe self-occlusions, repeated thin structures and surface discontinuities. In theory, a shape-from-silhouettes (SFS) approach can overcome these difficulties and use many views to reconstruct visual hulls that are close to the actual shapes. In practice, however, SFS is highly sensitive to errors in silhouette contours and the calibration of the imaging system, and therefore not suitable for obtaining reliable shapes with a large number of views. We present a practical approach to SFS using a novel technique called coplanar shadowgram imaging, that allows us to use dozens to even hundreds of views for visual hull reconstruction. Here, a point light source is moved around an object and the shadows (silhouettes) cast onto a single background plane are observed. We characterize this imaging system in terms of image projection, reconstruction ambiguity, epipolar geometry, and shape and source recovery. The coplanarity of the shadowgrams yields novel geometric properties that are not possible in traditional multi-view camera- based imaging systems. These properties allow us to derive a robust and automatic algorithm to recover the visual hull of an object and the 3D positions of light source simultaneously, regardless of the complexity of the object. We demonstrate the acquisition of several intricate shapes with severe occlusions and thin structures, using 50 to 120 views.

...read moreread less

Proceedings Article•DOI•

Overlay what Humanoid Robot Perceives and Thinks to the Real-world by Mixed Reality System

[...]

Kazuhiko Kobayashi¹, Koichi Nishiwaki², Shinji Uchiyama¹, Hiroyuki Yamamoto¹, Satoshi Kagami², Takeo Kanade² - Show less +2 more•Institutions (2)

Canon Inc.¹, National Institute of Advanced Industrial Science and Technology²

13 Nov 2007

TL;DR: A novel environment for robot development is presented, in which intermediate results of the system are overlaid on physical space using mixed reality technology, which gives a human-robot interface that shows the robot internal state intuitively, not only in development, but also in operation.

...read moreread less

Abstract: One of the problems in developing a humanoid robot is caused by the fact that intermediate results, such as what the robot perceives the environment, and how it plans its moving path are hard to be observed online in the physical environment. What developers can see is only the behavior. Therefore, they usually investigate logged data afterwards, to analyze how well each component worked, or which component was wrong in the total system. In this paper, we present a novel environment for robot development, in which intermediate results of the system are overlaid on physical space using mixed reality technology. Real-time observation enables the developers to see intuitively, in what situation the specific intermediate results are generated, and to understand how results of a component affected the total system. This feature makes the development efficient and precise. This environment also gives a human-robot interface that shows the robot internal state intuitively, not only in development, but also in operation.

...read moreread less

Journal Article•

Object Type Classification Using Structure-based Feature Representation

[...]

Tomoyuki Nagahashi¹, Hironobu Fujiyoshi¹, Takeo Kanade²•Institutions (2)

Chubu University¹, Carnegie Mellon University²

01 Jan 2007-Journal of Machine Vision and Applications

TL;DR: This work proposes a method to distinguish object types using structure-based features described by a Gaussian mixture model, and demonstrates that it can obtain higher classification performance when using both conventional and structure- based features together than when using either alone.

...read moreread less

Abstract: Current feature-based object type classification methods information of texture and shape based information derived from image patches. Generally, input features, such as the aspect ratio, are derived from rough characteristics of the entire object. However, we derive input features from a parts-based representation of the object. We propose a method to distinguish object types using structure-based features described by a Gaussian mixture model. This approach uses Gaussian fitting onto foreground pixels detected by background subtraction to segment an image patch into several sub-regions, each of which is related to a physical part of the object. The object is modeled as a graph, where the nodes contain SIFT(Scale Invariant Feature Transform) information obtained from the corresponding segmented regions, and the edges contain information on distance between two connected regions. By calculating the distance between the reference and input graphs, we can use a k-NN-based classifier to classify an object as one of the following: single human, human group, bike, or vehicle. We demonstrate that we can obtain higher classification performance when using both conventional and structure-based features together than when using either alone.

...read moreread less

Proceedings Article•DOI•

Automated Individualization of Deformable Eye Region Model and Its Application to Eye Motion Analysis

[...]

Tsuyoshi Moriyama, Takeo Kanade¹•Institutions (1)

Carnegie Mellon University¹

17 Jun 2007

TL;DR: Experimental results show the proposed method of automated individualization of eye region model gives as nice initialization as manual labor and allows comparative tracking results for a comprehensive set of eye motions.

...read moreread less

Abstract: This paper proposes a method of automated individualization of eye region model. The eye region model has been proposed in past research that parameterizes both the structure and the motion of the eye region. Without any prior knowledge, one can never determine a given appearance of eye region to be either neutral to any expression, i.e., the inherent structure of the eye region, or the result of motion by a facial expression. The past method manually individualized the model with respect to the structure parameters in the initial frame and tracks the motion parameters automatically across the rest of the image sequence, assuming the initial frame contains only neutral faces. Under the same assumption, we automatically determine the structure parameters for the given eye region image. We train active appearance models (AAMs) for parameterizing the variance of individuality. The system projects a given eye region image onto the low dimensional subspace spanned by the AAM and retrieves the structure parameters of the nearest training sample and initializes the eye region model using them. The AAMs are trained in the subregions, i.e., the upper eyelid region, the palpebral fissure (the eye aperture) region, and the lower eyelid region, respectively. It enables each AAM to effectively represent fine structures. Experimental results show the proposed method gives as nice initialization as manual labor and allows comparative tracking results for a comprehensive set of eye motions.

...read moreread less

Book Chapter•DOI•

Combined object detection and segmentation by using space-time patches

[...]

Yasuhiro Murai¹, Hironobu Fujiyoshi¹, Takeo Kanade²•Institutions (2)

Chubu University¹, Carnegie Mellon University²

18 Nov 2007

TL;DR: Experimental results show that object detection is more accurate with the method than with the conventional method, which is only based on appearance features.

...read moreread less

Abstract: This paper presents a method for classifying the direction of movement and for segmenting objects simultaneously using features of space-time patches. Our approach uses vector quantization to classify the direction of movement of an object and to estimate its centroid by referring to a codebook of the space-time patch feature, which is generated from multiple learning samples. We segmented the objects' regions based on the probability calculated from the mask images of the learning samples by using the estimated centroid of the object. Even though occlusions occur when multiple objects overlap in different directions of movement, our method detects objects individually because their direction of movement is classified. Experimental results show that object detection is more accurate with our method than with the conventional method, which is only based on appearance features.

...read moreread less

Proceedings Article•

Simultaneous registration and clustering for temporal segmentation of facial gestures from video

[...]

Fernando De la Torre, Joan Campoy, Jeffrey F. Cohn, Takeo Kanade

01 Jan 2007

TL;DR: Ethanol and/or acetaldehyde are produced by reacting at elevated temperature and pressure methanol with synthesis gas in the presence of a catalyst comprising a metal complex in which the metal is a metal of Group VIII of the Periodic Table other than iron, palladium and platinum and the ligand is derived from cyclopentadiene or a substituted cyclopents.

...read moreread less

Abstract: Ethanol and/or acetaldehyde are produced by reacting at elevated temperature and pressure methanol with synthesis gas in the presence of a catalyst comprising a metal complex in which the metal is a metal of Group VIII of the Periodic Table other than iron, palladium and platinum and the ligand is derived from cyclopentadiene or a substituted cyclopentadiene and a promoter which is an iodide or a bromide. Optionally there is also added as a co-promoter a compound of formula X(A) (B) (C) wherein X is nitrogen, phosphorus, arsenic, antimony or bismuth and A, B and C are individually C1 to C20 monovalent hydrocarbyl groups which are free from aliphatic carbon-carbon unsaturation and are bound to the X atom by a carbon/X bond, or X is phosphorus, arsenic, antimony or bismuth and any two of A, B and C together form an organic divalent cyclic ring system bonded to the X atom, or X is nitrogen and all of A, B and C together form an organic trivalent cyclic ring system bonded to the X atom, e.g. triphenylphosphine.

...read moreread less

Vision-Based Control of Agile, Autonomous Micro Air Vehicles and Small UAVs in Urban Environments

[...]

Rick Lind, Peter Ifju, Warren E. Dixon, Andrew J. Kurdila, Robert Sharpley, Takeo Kanade - Show less +2 more

01 Dec 2007

TL;DR: In this article, the authors investigated technologies to enable autonomous flight of agile vehicles in urban environments and developed technologies related to vision-based feedback for control, including feature-point tracking, state estimation, scene reconstruction, robustness to camera calibration, daisy-chaining navigation, mapping, path planning, and feedback characterization.

...read moreread less

Abstract: : This project investigated technologies to enable autonomous flight of agile vehicles in urban environments. Specifically, technologies were developed that related to vision-based feedback for control. The mission profile under consideration was a single vehicle carrying a video camera while flying below the rooftops of a city with no additional sensors or pre-existing map information. As such, substantial progress was made in the areas of feature-point tracking, state estimation, scene reconstruction, robustness to camera calibration, daisy-chaining navigation, mapping, path planning, and feedback characterization. An integrated approach was used that (be used on multi-disciplinary analysis for decision making and control commands. The project spiraled the maturation of technologies from theory to simulation to flight testing. The simulation relied upon a hardware-in-the-loop simulation facility that allowed the physical camera to record measurements from high-fidelity graphics and thus consider the effects of distortion and nonlinearities.

...read moreread less

Proceedings Article•

Robust appearance matching with filtered component analysis

[...]

Fernando De la Torre, Alvaro Collet, Jeffrey F. Cohn, Takeo Kanade

01 Jan 2007

TL;DR: A process for preserving green colored plant tissues and in particular coniferous needles, holly and low fiber leaves such as mosses, lichens and ferns in which selected leaves are immersed in a solution to permanently retain and biologically fix the green color of the leaves.

...read moreread less

Abstract: A process for preserving green colored plant tissues and in particular coniferous needles, holly and low fiber leaves such as mosses, lichens and ferns in which selected leaves are immersed in a solution comprising (by volume) 35-45% water, 20-30% 2-propanol, 5-12% propionic acid, 5-10% sulphurous acid, 5-10% formalin, 2.5-5% formic acid, 1-5% ethylene glycol, and optionally minor amounts of compounds selected from the group consisting of cupric sulphate, cupric chloride, 20-20-20 fertilizer, citric acid, DBE, magnesium sulphate, acetic acid, cupric acetate, cupric nitrate, sodium phosphate, sodium sulfite, butylated hydroxytolulene and glycerol, for a sufficient time to exchange the naturally occurring water in the tissues with the "chemical water" of the solution and thereby permanently retain and biologically fix the green color of the leaves.

...read moreread less