
Showing papers by "Takeo Kanade published in 2003"


Proceedings ArticleDOI
18 Jun 2003
TL;DR: An iterative algorithm is proposed to solve the simultaneous assignment and alignment problem that arises when extending shape-from-silhouette from rigidly moving objects to dynamic articulated objects.
Abstract: Shape-from-silhouette (SFS), also known as visual hull (VH) construction, is a popular 3D reconstruction method, which estimates the shape of an object from multiple silhouette images. The original SFS formulation assumes that all of the silhouette images are captured either at the same time or while the object is static. This assumption is violated when the object moves or changes shape. Hence the use of SFS with moving objects has been restricted to treating each time instant sequentially and independently. Recently we have successfully extended the traditional SFS formulation to refine the shape of a rigidly moving object over time. We further extend SFS to apply to dynamic articulated objects. Given silhouettes of a moving articulated object, the process of recovering the shape and motion requires two steps: (1) correctly segmenting (points on the boundary of) the silhouettes to each articulated part of the object, (2) estimating the motion of each individual part using the segmented silhouette. In this paper, we propose an iterative algorithm to solve this simultaneous assignment and alignment problem. Once we have estimated the shape and motion of each part of the object, the articulation points between each pair of rigid parts are obtained by solving a simple motion constraint between the connected parts. To validate our algorithm, we first apply it to segment the different body parts and estimate the joint positions of a person. The acquired kinematic (shape and joint) information is then used to track the motion of the person in new video sequences.
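To make the assign-then-align iteration concrete, here is a minimal sketch of the kind of loop the abstract describes, alternating between assigning point correspondences to parts and re-fitting each part's rigid motion. The helper names (`fit_rigid`, `assign_and_align`) and the use of explicit 3D point correspondences are illustrative assumptions; the paper works with silhouette boundary points and bounding-edge geometry rather than pre-matched points.

```python
import numpy as np

def fit_rigid(src, dst):
    # Least-squares rigid transform (Kabsch) mapping src onto dst (N x 3).
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflections
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s

def assign_and_align(src, dst, n_parts, n_iters=20, seed=0):
    # Alternate between (1) assigning each point pair to the part whose
    # current motion explains it best and (2) re-fitting each part's motion.
    labels = np.random.default_rng(seed).integers(n_parts, size=len(src))
    for _ in range(n_iters):
        motions = []
        for k in range(n_parts):
            sel = labels == k
            motions.append(fit_rigid(src[sel], dst[sel]) if sel.sum() >= 3
                           else (np.eye(3), np.zeros(3)))
        residuals = np.stack([np.linalg.norm(src @ R.T + t - dst, axis=1)
                              for R, t in motions], axis=1)
        labels = residuals.argmin(axis=1)   # reassign points to parts
    return labels, motions
```

Per the abstract, articulation points would then be recovered by imposing a motion constraint between the estimated motions of connected parts.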

433 citations


Journal ArticleDOI
TL;DR: The method successfully stabilized face and eye images, allowing 98% accuracy in automatic blink recognition, and was used as part of a facial expression analysis system intended for use with spontaneous facial behavior, in which moderate head motion is common.
Abstract: This paper presents a method to recover the full-motion (3 rotations and 3 translations) of the head from an input video using a cylindrical head model. Given an initial reference template of the head image and the corresponding head pose, the head model is created and full head motion is recovered automatically. The robustness of the approach is achieved by a combination of three techniques. First, we use the iteratively re-weighted least squares (IRLS) technique in conjunction with the image gradient to accommodate non-rigid motion and occlusion. Second, while tracking, the templates are dynamically updated to diminish the effects of self-occlusion and gradual lighting changes and to maintain accurate tracking even when the face moves out of view of the camera. Third, to minimize error accumulation inherent in the use of dynamic templates, we re-register images to a reference template whenever head pose is close to that in the template. The performance of the method, which runs in real time, was evaluated in three separate experiments using image sequences (both synthetic and real) for which ground truth head motion was known. The real sequences included pitch and yaw as large as 40° and 75°, respectively. The average recovery accuracy of the 3D rotations was about 3°. In a further test, the method was used as part of a facial expression analysis system intended for use with spontaneous facial behavior in which moderate head motion is common. Image data consisted of 1-minute of video from each of 10 subjects while engaged in a 2-person interview. The method successfully stabilized face and eye images allowing for 98% accuracy in automatic blink recognition.
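As a rough illustration of the IRLS ingredient, the sketch below shows a generic robust update of a motion increment from image-gradient Jacobians and residuals. The Huber-style weight function and all names are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def huber_weights(residuals, k=1.345):
    # Robust weights: residuals beyond k are down-weighted, so pixels that
    # violate the rigid-motion model (expressions, occlusion) count less.
    a = np.abs(residuals)
    w = np.ones_like(a)
    w[a > k] = k / a[a > k]
    return w

def irls_step(J, r, n_iters=10):
    # Solve for a motion increment dp minimizing sum_i w_i (J_i dp - r_i)^2,
    # re-estimating the weights from the residuals at each iteration.
    w = np.ones(len(r))
    for _ in range(n_iters):
        JtW = (J * w[:, None]).T
        dp = np.linalg.solve(JtW @ J, JtW @ r)   # weighted normal equations
        w = huber_weights(r - J @ dp)
    return dp
```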

218 citations


Proceedings ArticleDOI
02 Nov 2003
TL;DR: The Distant Human Identification system is a master-slave, real-time surveillance system designed to acquire biometric imagery of humans at a distance; it can detect and track moving people at distances up to 50 meters, within a 60° field of regard.
Abstract: The Distant Human Identification (DHID) system is a master-slave, real-time surveillance system designed to acquire biometric imagery of humans at a distance. A stationary wide-field-of-view master camera is used to monitor an environment at a distance. When the master camera detects a moving person, a narrow-field-of-view slave camera is commanded to turn in that direction, acquire the target human, and track them while recording zoomed-in images. These zoomed-in views provide meaningful biometric imagery of the distant humans, who are not recognizable in the master view. Based on the lenses we currently use, the system can detect and track moving people at distances up to 50 meters, within a 60° field of regard.

187 citations


Proceedings ArticleDOI
18 Jun 2003
TL;DR: An algorithm is proposed to improve the shape approximation by combining multiple silhouette images captured across time: the rigid motion between the visual hulls formed at different time instants is first estimated, and the hulls are then combined to get a tighter bound on the object's shape.
Abstract: Visual hull (VH) construction from silhouette images is a popular method of shape estimation. The method, also known as shape-from-silhouette (SFS), is used in many applications such as non-invasive 3D model acquisition, obstacle avoidance, and more recently human motion tracking and analysis. One of the limitations of SFS, however, is that the approximated shape can be very coarse when there are only a few cameras. In this paper, we propose an algorithm to improve the shape approximation by combining multiple silhouette images captured across time. The improvement is achieved by first estimating the rigid motion between the visual hulls formed at different time instants (visual hull alignment) and then combining them (visual hull refinement) to get a tighter bound on the object's shape. Our algorithm first constructs a representation of the VHs called the bounding edge representation. Utilizing a fundamental property of visual hulls, which states that each bounding edge must touch the object at at least one point, we use multi-view stereo to extract points called colored surface points (CSP) on the surface of the object. These CSPs are then used in a 3D image alignment algorithm to find the 6 DOF rigid motion between two visual hulls. Once the rigid motion across time is known, all of the silhouette images are treated as being captured at the same time instant and the shape of the object is refined. We validate our algorithm on both synthetic and real data and compare it with space carving.
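Once the rigid motions are known, the refinement step amounts to intersecting occupancy across all silhouettes as if they were captured simultaneously. A voxel-based sketch of that idea follows; the paper itself uses a bounding-edge representation rather than voxels, so treat this helper (with its assumed 3x4 camera matrices `P` and per-frame rigid motions) as an assumption-laden simplification.

```python
import numpy as np

def refine_visual_hull(voxels, silhouettes, cameras, motions):
    # After alignment, silhouettes from every time instant act like extra
    # cameras: a voxel survives only if it projects inside all of them.
    keep = np.ones(len(voxels), dtype=bool)
    for sil, P, (R, t) in zip(silhouettes, cameras, motions):
        pts = voxels @ R.T + t                     # move voxels into this frame's pose (convention assumed)
        ph = np.hstack([pts, np.ones((len(pts), 1))]) @ P.T
        uv = (ph[:, :2] / ph[:, 2:3]).round().astype(int)
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < sil.shape[1]) \
               & (uv[:, 1] >= 0) & (uv[:, 1] < sil.shape[0])
        hit = np.zeros(len(voxels), dtype=bool)
        hit[inside] = sil[uv[inside, 1], uv[inside, 0]] > 0
        keep &= hit                                # carve away voxels outside any silhouette
    return voxels[keep]
```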

165 citations


Patent
02 May 2003
TL;DR: A method and system for providing control: a workpiece (5126) containing a target shape and a cutting tool (5122) are provided, 3-D images associated with each are acquired, the target shape is identified within the workpiece image, the workpiece and the cutting tool are registered with their respective images and tracked, and the tracking data are transformed to determine the relationship between workpiece and cutting tool, on which basis a control is provided to the cutting tool.
Abstract: A method and system for providing control that include providing a workpiece (5126) that includes a target shape, providing a cutting tool (5122), providing a 3-D image associated with the workpiece (5126), identifying the target shape within the workpiece image, providing a 3-D image associated with the cutting tool, registering the workpiece (5126) with the workpiece image, registering the cutting tool (5122) with the cutting tool image, tracking at least one of the workpiece (5126) and the cutting tool (5122), transforming the tracking data based on the image coordinates to determine a relationship between the workpiece and the cutting tool, and, based on the relationship, providing a control to the cutting tool (5122). In one embodiment, the workpiece image can be represented as volume pixels (voxels) that can be classified and/or reclassified based on target shape, waste, and/or workpiece (5126).

160 citations


Journal ArticleDOI
TL;DR: Ultrasound-based registration eliminates the need for physical contact with the bone surface as in point-based registration, enabling development of the next generation of minimally invasive surgical procedures.
Abstract: Objective: To allow non-invasive registration of the bone surface for computer-assisted surgery (CAS), this investigation reports the development and evaluation of intraoperative registration using 2D ultrasound (US) images. This approach employs automatic segmentation of the bone surface reflection from US images tagged with the 3D position to enable the application of CAS to minimally invasive procedures. Methods: The US-based registration method was evaluated in comparison to point-based registration, which is the predominant method in current clinical use. The absolute accuracy of the US-based registration was determined using a phantom pelvis, with fiducial registration providing the ground truth. The relative accuracy was determined by an intraoperative study comparing the US registration to the point-based registration obtained as part of the HipNav experimental protocol. Results: The phantom pelvis study demonstrated equivalent accuracy between point- and US-based registration under in vitro conditions. In the intraoperative study, the US-based registration was sufficiently consistent with the point-based registration to warrant larger-scale clinical trials of this non-invasive registration method. Conclusion: Ultrasound-based registration eliminates the need for physical contact with the bone surface as in point-based registration. As a result, non-invasive registration could fully unlock the potential of computer-assisted surgery, enabling development of the next generation of minimally invasive surgical procedures. Comp Aid Surg 8:1-16 (2003). ©2003 CAS Journal, LLC

126 citations


Proceedings ArticleDOI
08 Dec 2003
TL;DR: An ultrasonic tagging system developed for robustly observing human activity in a living area using ultrasonic transmitter tags with unique identifiers is shown to be able to track the three-dimensional motion of tagged objects in real time with high accuracy, resolution and robustness to occlusion.
Abstract: This paper describes an ultrasonic tagging system developed for robustly observing human activity in a living area. Using ultrasonic transmitter tags with unique identifiers, the system is shown through experimental application to be able to track the three-dimensional motion of tagged objects in real time with high accuracy, resolution and robustness to occlusion. The use of an ultrasonic system is desirable because of its low cost and use of commercial components, and the proposed system achieves high accuracy and robustness through the use of many redundant sensors. The system employs multilateration to locate tagged objects using one of two estimation algorithms, a least-squares optimization method or a random sample consensus method.
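The least-squares option can be illustrated by the standard linearization of multilateration, sketched below under the assumption of synchronized time-of-flight ranges; the paper's random sample consensus variant would wrap a solver like this in random sampling of receiver subsets.

```python
import numpy as np

def multilaterate(receivers, dists):
    # Linear least-squares multilateration: subtracting the range equation
    # of receiver 0 from the others cancels the unknown |x|^2 term, leaving
    # a linear system 2 (r_i - r_0) . x = d_0^2 - d_i^2 + |r_i|^2 - |r_0|^2.
    r0, d0 = receivers[0], dists[0]
    A = 2.0 * (receivers[1:] - r0)
    b = (d0**2 - dists[1:]**2
         + np.sum(receivers[1:]**2, axis=1) - np.sum(r0**2))
    return np.linalg.lstsq(A, b, rcond=None)[0]
```

Given at least four receivers in general position (more in practice, for redundancy against occlusion), this returns the 3D tag position.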

126 citations


Proceedings ArticleDOI
16 Jul 2003
TL;DR: Using a large facial image database, the CMU PIE database, a probabilistic model of how facial features change as the pose changes is developed; it achieves a better recognition rate than conventional face recognition methods over a much larger range of pose.
Abstract: Current automatic facial recognition systems are not robust against changes in illumination, pose, facial expression and occlusion. In this paper, we propose a probabilistic algorithm for face recognition that addresses the problem of pose change by taking into account the pose difference between probe and gallery images. By using a large facial image database called the CMU PIE database, which contains images of the same set of people taken from many different angles, we have developed a probabilistic model of how facial features change as the pose changes. This model enables us to make our face recognition system more robust to pose changes in the probe image. The experimental results show that this approach achieves a better recognition rate than conventional face recognition methods over a much larger range of pose. For example, when the gallery contains only images of a frontal face and the pose of the probe image varies, the recognition rate remains within a 10% difference until the probe pose differs by more than 45 degrees, whereas the recognition rate of a PCA-based method begins to drop at a difference as small as 10 degrees, and that of a representative commercial system at 30 degrees.
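As a sketch of what a pose-conditioned probabilistic match might look like, the snippet below scores a probe/gallery pair under a Gaussian model of feature change indexed by the pose pair. The Gaussian form, per-feature independence, and all names are my assumptions; the paper learns its own model of feature change from the PIE database.

```python
import numpy as np

def match_log_likelihood(probe, gallery, pose_pair, models):
    # models[(gallery_pose, probe_pose)] holds per-feature mean/variance of
    # feature differences learned offline from a database such as CMU PIE.
    mu, var = models[pose_pair]
    d = probe - gallery
    return -0.5 * np.sum((d - mu) ** 2 / var + np.log(2 * np.pi * var))

def recognize(probe, probe_pose, gallery, gallery_pose, models):
    # Identity = gallery entry with the highest pose-conditioned likelihood.
    scores = [match_log_likelihood(probe, g, (gallery_pose, probe_pose), models)
              for g in gallery]
    return int(np.argmax(scores))
```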

115 citations


Proceedings ArticleDOI
18 Jun 2003
TL;DR: A robust method to solve two coupled problems that appear in visual navigation, ground-layer detection and vehicle ego-motion estimation, by virtually rotating the camera to a downward-looking pose, which eliminates the ambiguity between rotational and translational ego-motion parameters and improves the condition of the Hessian matrix in the direct motion estimation process.
Abstract: This paper presents a robust method to solve two coupled problems, ground-layer detection and vehicle ego-motion estimation, which appear in visual navigation. We virtually rotate the camera to the downward-looking pose in order to exploit the fact that the vehicle motion is roughly constrained to be planar motion on the ground. This camera geometry transformation together with the planar motion constraint will: 1) eliminate the ambiguity between rotational and translational ego-motion parameters, and 2) improve the Hessian matrix condition in the direct motion estimation process. The virtual downward-looking camera enables us to estimate the planar ego-motions even for small image patches. Such local measurements are then combined together, by a robust weighting scheme based on both ground plane geometry and motion compensated intensity residuals, for a global ego-motion estimation and ground plane detection. We demonstrate the effectiveness of our method by experiments on both synthetic and real data.
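The virtual rotation itself is a depth-independent warp: a pure camera rotation R induces the homography H = K R K⁻¹ on the image. A minimal sketch, assuming OpenCV and a known intrinsic matrix K:

```python
import numpy as np
import cv2

def virtual_downward_view(image, K, R):
    # A pure camera rotation R induces the homography H = K R K^-1 on the
    # image, independent of scene depth, so the downward-looking view can
    # be synthesized by a single perspective warp.
    H = K @ R @ np.linalg.inv(K)
    return cv2.warpPerspective(image, H, (image.shape[1], image.shape[0]))
```

With the image re-rendered as if looking straight down, planar ego-motion on the ground reduces to a 2D rigid motion of the warped ground patches, which is what makes small-patch estimation well conditioned.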

107 citations


01 Jan 2003
TL;DR: This paper proposes computing linear subspaces with the L1 norm metric, presents two algorithms to optimize it, the weighted median algorithm and the quadratic programming algorithm, and shows that the L1 metric is robust to outliers and can handle missing data.
Abstract: Linear subspaces have many important applications in computer vision, such as structure from motion, motion estimation, layer extraction, object recognition, and object tracking. The Singular Value Decomposition (SVD) algorithm is a standard technique to compute the subspace from the input data. The SVD algorithm, however, is sensitive to outliers because it uses the L2 norm metric, and it cannot handle missing data either. In this paper, we propose using the L1 norm metric to compute the subspace. We show that it is robust to outliers and can handle missing data. We present two algorithms to optimize the L1 norm metric: the weighted median algorithm and the quadratic programming algorithm. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of Carnegie Mellon University or the U.S. Government.
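The weighted median algorithm is easiest to see in the rank-1 case: for a fixed basis vector, the L1-optimal coefficient of each data column is a weighted median of elementwise ratios. The sketch below shows that step; alternating it between basis and coefficients (and skipping missing entries) gives the flavor of the full algorithm. Function names are illustrative.

```python
import numpy as np

def weighted_median(values, weights):
    # Minimizer of sum_i w_i |x - v_i|: the value where the cumulative
    # weight first reaches half of the total.
    order = np.argsort(values)
    v, w = values[order], weights[order]
    c = np.cumsum(w)
    return v[np.searchsorted(c, 0.5 * c[-1])]

def l1_coefficient(basis, column):
    # For a fixed basis vector b, the coefficient c minimizing
    #   sum_i |y_i - c b_i| = sum_i |b_i| * |y_i / b_i - c|
    # is a weighted median of the ratios y_i / b_i with weights |b_i|.
    nz = np.abs(basis) > 1e-12          # missing entries can simply be skipped
    return weighted_median(column[nz] / basis[nz], np.abs(basis[nz]))
```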

100 citations


Proceedings ArticleDOI
01 Jan 2003
TL;DR: A gain-scheduled H∞ loop-shaping controller was implemented and flight tested on the Carnegie Mellon University (CMU) Yamaha R-50 robotic helicopter; the modeling/control approach is believed to quickly deliver controllers that exploit the full dynamic capabilities of the airframe and thus are ready to be used by higher-level navigation systems for complex autonomous missions.
Abstract: To successfully accomplish complex future missions in civilian and military scenarios, robotic helicopters need controllers that exploit their full dynamic capabilities. The absence of high-fidelity simulation models has prevented the use of well-established multivariable control techniques for the design of high-bandwidth full-flight-envelope control systems. Existing model-based controllers are of low bandwidth and cover only small portions of the vehicle's flight envelope. In this paper we present the results of the synergistic use of high-fidelity integrated modeling strategies, robust multivariable control techniques, and classical gain scheduling for the rapid and reliable design of high-bandwidth full-flight-envelope controllers for robotic helicopters. We implemented and flight tested a gain-scheduled H∞ loop-shaping controller on the Carnegie Mellon University (CMU) Yamaha R-50 robotic helicopter. During the flight tests, the CMU R-50 flew several high-speed maneuvers. We believe that our modeling/control approach quickly delivers controllers that exploit the full dynamic capabilities of the airframe and thus are ready to be used by higher-level navigation systems for complex autonomous missions.

Journal ArticleDOI
TL;DR: An appearance-based virtual view generation method that allows viewers to fly through a real dynamic scene: arbitrary views are generated by interpolating two original camera views near the given viewpoint, with dense correspondences obtained from reliable and comprehensive 3D scene geometry extracted from many views.
Abstract: We present an appearance-based virtual view generation method that allows viewers to fly through a real dynamic scene. The scene is captured by multiple synchronized cameras. Arbitrary views are generated by interpolating two original camera views near the given viewpoint. The quality of the generated synthetic view is determined by the precision, consistency and density of correspondences between the two images. Most previous work that uses interpolation extracts the correspondences from these two images alone. However, not only is it difficult to do so reliably (the task requires a good stereo algorithm), but the two images alone sometimes do not have enough information, due to problems such as occlusion. Instead, we take advantage of the fact that we have many views, from which we can extract much more reliable and comprehensive 3D geometry of the scene as a 3D model. Dense and precise correspondences between the two images, to be used for interpolation, are obtained using this constructed 3D model.

Proceedings ArticleDOI
04 Jun 2003
TL;DR: A gain-scheduled H∞ loop-shaping controller was implemented and flight tested on the Carnegie Mellon University (CMU) Yamaha R-50 robotic helicopter, the first of its kind to be flight tested on a helicopter (manned or unmanned).
Abstract: Complex future missions in civilian and military scenarios will require robotic helicopters to have controllers that exploit their full dynamic capabilities. The absence of high-fidelity simulation models has prevented the use of well-established multivariable control techniques for the design of high-bandwidth full-flight-envelope control systems. Existing model-based controllers are of low bandwidth and cover only small portions of the vehicle's flight envelope. In this paper we present the results of the synergistic use of high-fidelity integrated modeling strategies, robust multivariable control techniques, and classical gain scheduling for the rapid and reliable design of a high-bandwidth full-flight-envelope controller for robotic helicopters. We implemented and flight tested a gain-scheduled H∞ loop-shaping controller on the Carnegie Mellon University (CMU) Yamaha R-50 robotic helicopter. This gain-scheduled H∞ loop-shaping controller is the first of its kind to be flight tested on a helicopter (manned or unmanned). During the flight tests, the CMU R-50 flew moderate to high-speed maneuvers. We believe that our modeling/control approach delivers controllers that exploit the full dynamic capability of the airframe and thus are ready to be used by higher-level navigation systems for complex autonomous missions.

Proceedings ArticleDOI
09 Dec 2003
TL;DR: Image-based tracking control of an aerial blimp is proposed for surveillance systems, in which a single onboard camera, without any other sensors, is used both to control the blimp and to generate a reference trajectory from image information.
Abstract: Image-based tracking control of an aerial blimp is proposed for surveillance systems. The controller is designed using backstepping techniques for underactuated systems. In our framework, only one camera on the blimp is used, without other sensors, both to control the blimp and to generate a reference trajectory from image information. After a commander sets a target, the blimp flies around it automatically. This image-based approach reduces the need to utilize values estimated by a structure-from-motion algorithm. In this research an aerial blimp is used for indoor experiments, and image-based control is compared with position-based control.

Journal ArticleDOI
TL;DR: A novel two-dimensional scaled prismatic model (SPM) for figure registration is described that has fewer singularity problems than 3D kinematic models and does not require detailed knowledge of the 3D kinematics.
Abstract: Three-dimensional (3D) kinematic models are widely used in video-based figure tracking. We show that these models can suffer from singularities when motion is directed along the viewing axis of a single camera. The single camera case is important because it arises in many interesting applications, such as motion capture from movie footage, video surveillance, and vision-based user-interfaces. We describe a novel two-dimensional scaled prismatic model (SPM) for figure registration. In contrast to 3D kinematic models, the SPM has fewer singularity problems and does not require detailed knowledge of the 3D kinematics. We fully characterize the singularities in the SPM and demonstrate tracking through singularities using synthetic and real examples. We demonstrate the application of our model to motion capture from movies. Fred Astaire is tracked in a clip from the film “Shall We Dance”. We also present the use of monocular hand tracking in a 3D user-interface. These results demonstrate the benefits of the SPM ...
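A scaled prismatic link is just a 2D rotation plus a stretch along the link axis, so foreshortening that a 3D model would explain with depth is absorbed by the length parameter. A minimal sketch of the forward kinematics, with assumed conventions and names:

```python
import numpy as np

def spm_points(base, angles, lengths):
    # Scaled prismatic model: a 2D kinematic chain in the image plane where
    # each link rotates (angle, relative to its parent) and scales along its
    # axis (length). Returns the chain's joint positions in image coordinates.
    pts = [np.asarray(base, dtype=float)]
    theta = 0.0
    for th, d in zip(angles, lengths):
        theta += th                     # accumulate rotation down the chain
        pts.append(pts[-1] + d * np.array([np.cos(theta), np.sin(theta)]))
    return np.array(pts)
```

Registration would then adjust the angle and length parameters so the projected chain matches the figure in each frame, which is where the SPM's better singularity behavior pays off.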

Journal ArticleDOI
TL;DR: A reconstruction method for multiple motion scenes, which are scenes containing multiple moving objects, from uncalibrated views, that first performs a projective reconstruction using a bilinear factorization algorithm and converts the projective solution to a Euclidean one by enforcing metric constraints.
Abstract: In this paper, we describe a reconstruction method for multiple motion scenes, which are scenes containing multiple moving objects, from uncalibrated views. Assuming that the objects are moving with constant velocities, the method recovers the scene structure, the trajectories of the moving objects, the camera motion, and the camera intrinsic parameters (except skews) simultaneously. We focus on the case where the cameras have unknown and varying focal lengths while the other intrinsic parameters are known. The number of the moving objects is automatically detected without prior motion segmentation. The method is based on a unified geometrical representation of the static scene and the moving objects. It first performs a projective reconstruction using a bilinear factorization algorithm and then converts the projective solution to a Euclidean one by enforcing metric constraints. Experimental results on synthetic and real images are presented.
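The bilinear factorization step can be sketched as a truncated SVD that splits the measurement matrix into motion and structure factors. This shows only the generic rank-constrained step; the paper's representation augments structure with constant-velocity object trajectories and then upgrades the projective solution using metric constraints.

```python
import numpy as np

def factorize(W, rank):
    # Bilinear factorization: split the measurement matrix W into a motion
    # factor M and a structure factor S with W ~= M @ S, via truncated SVD.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sq = np.sqrt(s[:rank])
    M = U[:, :rank] * sq          # camera/motion factor
    S = (Vt[:rank].T * sq).T      # scene/trajectory factor
    return M, S
```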

Journal ArticleDOI
01 Oct 2003
TL;DR: A method is developed to reconstruct the specimen's optical properties over a three-dimensional (3-D) volume; the method is a nonlinear optimization that uses hierarchical representations of the specimen and data.
Abstract: Differential interference contrast (DIC) microscopy is a powerful visualization tool used to study live biological cells. Its use, however, has been limited to qualitative observations. The inherent nonlinear relationship between the object properties and the image intensity makes quantitative analysis difficult. Toward quantitatively measuring optical properties of objects from DIC images, we develop a method to reconstruct the specimen's optical properties over a three-dimensional (3-D) volume. The method is a nonlinear optimization which uses hierarchical representations of the specimen and data. As a necessary tool, we have developed and validated a computational model for the DIC image formation process. We test our algorithm by reconstructing the optical properties of known specimens.

01 Jan 2003
TL;DR: This thesis proposes a fast testing/projection algorithm for voxel-based SFS algorithms and develops an algorithm to align Visual Hulls over time using stereo and an important property of the Shape-From-Silhouette principle.
Abstract: The abilities to build precise human kinematic models and to perform accurate human motion tracking are essential in a wide variety of applications. Due to the complexity of human bodies and the problem of self-occlusion, modeling and tracking humans using cameras are challenging tasks. In this thesis, we develop algorithms to perform these two tasks based on the shape estimation method Shape-From-Silhouette (SFS), which constructs a shape estimate (known as the Visual Hull) of an object using its silhouette images. In the first half of this thesis we extend the traditional SFS algorithm so that it can be used effectively for human-related applications. To perform SFS in real time, we propose a fast testing/projection algorithm for voxel-based SFS algorithms. Moreover, we combine silhouette information over time to effectively increase the number of cameras (and hence reconstruction details) for SFS without physically adding new cameras. We first propose a new Visual Hull representation called Bounding Edges. We then analyze the ambiguity problem of aligning two Visual Hulls. Based on the analysis, we develop an algorithm to align Visual Hulls over time using stereo and an important property of the Shape-From-Silhouette principle. This temporal SFS algorithm combines both geometric constraints and photometric consistency to align Colored Surface Points of the object extracted from the silhouette and color images. Once the Visual Hulls are aligned, they are refined by compensating for the motion of the object. The algorithm is developed for both rigid and articulated objects. In the second half of this thesis we show how the improved SFS algorithms are used to perform the tasks of human modeling and motion tracking. First we build a system to acquire human kinematic models consisting of precise shape and joint locations. Once the kinematic models are built, they are used to track the motion of the person in new video sequences. The tracking algorithm is based on the Visual Hull alignment idea used in the temporal SFS algorithms. Finally we demonstrate how the kinematic model and the tracked motion data can be used for image-based rendering and motion transfer between two people.

01 Jan 2003
TL;DR: This paper summarizes work and understanding on volumetric pathological neuroimage retrieval under the framework of classification-driven feature selection, aimed at improved image-indexing feature discriminating power as well as reduced computational cost during on-line pathological neuroimage retrieval.
Abstract: This paper summarizes our work and our understanding on volumetric pathological neuroimage retrieval under the framework of classification-driven feature selection. The main effort concerns off-line image feature space reduction for improved image indexing feature discriminating power as well as reduced computational cost during on-line pathological neuroimage retrieval. Keywords: 3D image, feature selection, brain asymmetry, midsagittal plane, image classification, indexing, retrieval

Proceedings ArticleDOI
03 Dec 2003
TL;DR: An autonomous blimp for a surveillance system, circling a specified target with only one camera, is designed; an extension of the Lucas-Kanade algorithm detects and tracks features under rotation and scaling, and a simplified structure-from-motion algorithm improves the accuracy of state estimation.
Abstract: An autonomous blimp for a surveillance system, which circles a specified target with only one camera, is designed in this paper. For this purpose, an extension of the Lucas-Kanade algorithm for detecting and tracking features under rotation and scaling is provided, and a simplified structure-from-motion algorithm is applied to improve the accuracy of state estimation. A tracking controller is designed for the blimp, which is an underactuated system. The desired path of the blimp is also generated from image information about the target. The blimp flies around the target automatically after a commander sets it. Experiments are performed indoors using an aerial blimp.
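Extending Lucas-Kanade beyond translation means tracking under a richer warp. Below is a sketch of a similarity warp (translation, scale, rotation), one natural way to handle the rotation and scaling the abstract mentions; this parameterization is an assumption, not necessarily the paper's exact formulation.

```python
import numpy as np

def similarity_warp(points, p):
    # p = (tx, ty, s, theta): scaled rotation plus translation. A
    # Lucas-Kanade-style tracker would iterate Gauss-Newton updates of p so
    # the warped template matches the current image despite rotation/scaling.
    tx, ty, s, th = p
    R = s * np.array([[np.cos(th), -np.sin(th)],
                      [np.sin(th),  np.cos(th)]])
    return points @ R.T + np.array([tx, ty])
```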

Journal ArticleDOI
TL;DR: It is shown that stereo computed from the light-field is ambiguous if and only if the scene is radiating light of a constant intensity (and color, etc.) over an extended region.
Abstract: The complete set of measurements that could ever be used by a passive 3D vision algorithm is the plenoptic function or light-field. We give a concise characterization of when the light-field of a Lambertian scene uniquely determines its shape and, conversely, when the shape is inherently ambiguous. In particular, we show that stereo computed from the light-field is ambiguous if and only if the scene is radiating light of a constant intensity (and color, etc.) over an extended region.

Journal ArticleDOI
TL;DR: A system was developed that detects eye blinking in spontaneously occurring facial behavior measured with a nonfrontal pose, moderate out-of-plane head motion, and occlusion.
Abstract: Previous research in automatic facial expression recognition has been limited to recognition of gross expression categories (e.g., joy or anger) in posed facial behavior under well-controlled conditions (e.g., frontal pose and minimal out-of-plane head motion). We have developed a system that detects a discrete and important facial action (e.g., eye blinking) in spontaneously occurring facial behavior that has been measured with a nonfrontal pose, moderate out-of-plane head motion, and occlusion. The system recovers three-dimensional motion parameters, stabilizes facial regions, extracts motion and appearance information, and recognizes discrete facial actions in spontaneous facial behavior. We tested the system on video data from a two-person interview. The 10 subjects were ethnically diverse, action units occurred during speech, and out-of-plane motion and occlusion from head motion and glasses were common. The video data were originally collected to answer substantive questions in psychology and represent a substantial challenge to automated action unit recognition. In analysis of blinks, the system achieved 98% accuracy.

Proceedings ArticleDOI
16 Jul 2003
TL;DR: This paper describes research efforts aimed at understanding human walking functions: using a motion capture system, force plates, and distributed force sensors, the walking motions of both a human and the humanoid H7 were captured and compared.
Abstract: This paper describes our research efforts aimed at understanding human walking functions. Using a motion capture system, force plates, and distributed force sensors, the walking motions of both a human and the humanoid H7 were captured. Experimental results are shown. Human and H7 walking are compared in terms of the following points: 1) ZMP trajectories, 2) torso movement, 3) free-leg trajectories, 4) joint angle usage, and 5) joint torque usage. Furthermore, application to the humanoid robot is discussed.
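For the ZMP comparison, the measured quantity on the human side comes from the force plates: the center of pressure, which coincides with the ZMP during flat-foot contact. A sketch of the standard force-plate formula follows; sign conventions and the surface offset d vary between plate models, so treat these as assumptions rather than the paper's exact setup.

```python
def zmp_from_force_plate(F, M, d=0.0):
    # Center of pressure (= ZMP during flat contact) on the plate surface,
    # from force F = (Fx, Fy, Fz) and moment M = (Mx, My, Mz) measured about
    # the sensor origin; d is the surface height above that origin.
    Fx, Fy, Fz = F
    Mx, My, Mz = M
    px = (-My - Fx * d) / Fz
    py = ( Mx - Fy * d) / Fz
    return px, py
```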

01 Jan 2003
TL;DR: This dissertation discusses methods for efficiently approximating conditional probabilities in large domains by maximizing the entropy of the distribution given a set of constraints and develops two algorithms, the inverse probability method and recurrent linear network, for maximizing Renyi's quadratic entropy without bounds.
Abstract: In this dissertation we discuss methods for efficiently approximating conditional probabilities in large domains by maximizing the entropy of the distribution given a set of constraints. The constraints are constructed from conditional probabilities, typically of low-order, that can be accurately computed from the training data. By appropriately choosing the constraints, maximum entropy methods can balance the tradeoffs in errors due to bias and variance. Standard maximum entropy techniques are too computationally inefficient for use in large domains in which the set of variables that are being conditioned upon varies. Instead of using the standard measure of entropy first proposed by Shannon, we use a measure that lies within the family of Renyi's entropies. If we allow our probability estimates to occasionally lie outside the range from 0 to 1, we can efficiently maximize Renyi's quadratic entropy relative to the constraints using a set of linear equations. We develop two algorithms, the inverse probability method and recurrent linear network, for maximizing Renyi's quadratic entropy without bounds. The algorithms produce identical results. However, depending on the type of problem, one method may be more computationally efficient than the other. We also propose an extension to the algorithms for partially enforcing the constraints based on our confidence in them. Our algorithms are tested on several applications including: collaborative filtering, image retrieval and language modeling.

01 Jan 2003
TL;DR: A robust distance minimization approach to solving point-sampled vision problems based on correlating kernels centered at point-samples, a technique called kernel correlation, which enforces smoothness on point samples from all views, not just within a single view.
Abstract: Range sensors, such as laser range finders and stereo vision systems, return point-samples of a scene. Typical point-sampled vision problems include registration, regularization and merging. We introduce a robust distance minimization approach to solving these three classes of problems. The approach is based on correlating kernels centered at point-samples, a technique we call kernel correlation. Kernel correlation is an affinity measure, and it contains an M-estimator mechanism for distance minimization. Kernel correlation is also an entropy measure of the point set configuration: maximizing kernel correlation implies enforcing a compact point set. The effectiveness of kernel correlation is evaluated on the three classes of problems. First, the kernel correlation based registration method is shown to be efficient, accurate and robust, and its performance is compared with the iterative closest point (ICP) algorithm. Second, kernel correlation is adopted as an object-space regularizer in the stereo vision problem. Kernel correlation is discontinuity preserving and can usually be applied at large scales, resulting in a smooth appearance of the estimated model. The performance of the algorithm is evaluated both quantitatively and qualitatively. Finally, kernel correlation plays a point-sample merging role in a multiple-view stereo algorithm. Kernel correlation enforces smoothness on point samples from all views, not just within a single view. As a result we can put both the photo-consistency and the model-merging constraints into a single energy function. Convincing reconstruction results are demonstrated.
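The core quantity is easy to write down: kernel correlation of two point sets is the sum of pairwise kernel affinities, and registration maximizes it over a transform of one set. A minimal sketch with a Gaussian kernel (the specific kernel choice and names here are assumptions):

```python
import numpy as np

def kernel_correlation(X, Y, sigma=1.0):
    # Sum of Gaussian affinities over all cross pairs of points. Unlike
    # ICP's hard closest-point assignment, every pair contributes, with
    # distant outlier pairs contributing almost nothing -- the M-estimator
    # behavior mentioned in the abstract.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2)).sum()
```

Registration would then maximize `kernel_correlation(transform(X, p), Y)` over the transform parameters p.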

Proceedings ArticleDOI
17 Nov 2003
TL;DR: A position estimation method is presented for multiple ultrasonic emitters that are activated simultaneously, based on a newly developed position estimation algorithm.
Abstract: This paper presents a position estimation method for multiple ultrasonic emitters which are activated simultaneously. Locations of ultrasonic emitters are generally calculated from distance data obtained at receivers, but the emitters must be activated one after another to avoid crosstalk. Thus, the sampling rate for each emitter decreases as the number of emitters increases. The authors solved this problem by activating multiple emitters simultaneously and applying a newly developed position estimation algorithm. This paper presents the theoretical analysis and the results of simulations and experiments.

01 Jan 2003
TL;DR: A proof is presented to show that the subspace approach is guaranteed to significantly increase layer discriminability, due to its ability to simultaneously utilize spatial and temporal constraints in the video.
Abstract: A layer is a 2D sub-image inside which pixels share a common apparent motion of some 3D scene plane. Representing videos with layers has many important applications, such as video compression, 3D scene and motion analysis, object detection and tracking, and vehicle navigation. Extracting layers from videos involves solving three highly intertwined subproblems: (1) segment the image into sub-regions (layers); (2) estimate the 2D motion of each layer; and (3) determine the number of layers. These three sub-problems are highly coupled, making layer extraction a very challenging problem. Existing approaches to layer extraction are limited by (1) requiring good initial segmentation and/or strong assumptions about the scene, (2) being unable to fully and simultaneously utilize the spatial and temporal constraints in video, and/or (3) unstable clustering in high-dimensional space. This thesis presents a subspace approach to layer extraction that does not have the above limitations. We first show that the homographies induced by the planar patches in the scene form a linear subspace whose dimension is as low as two or three in many applications. We then formulate the layer extraction problem as clustering in such a low-dimensional subspace. Each layer in the input images forms a well-defined cluster in the subspace, and a simple mean-shift-based clustering algorithm can reliably identify the clusters and thus the layers. A proof is presented to show that the subspace approach is guaranteed to significantly increase layer discriminability, due to its ability to simultaneously utilize the spatial and temporal constraints in the video. We present a detailed robust algorithm for layer extraction using subspaces, as well as experimental results on a variety of real image sequences.


Proceedings ArticleDOI
22 Oct 2003
TL;DR: In this paper, a multi-lateration algorithm for multiplexed ultrasonic emitters is proposed to locate multiple emitters at a time by activating them simultaneously, and the algorithm implemented in a test system worked perfectly for the experiments conducted in a real environment.
Abstract: This paper presents a new multi-lateration algorithm developed by the authors for multiplexed ultrasonic emitters. Our ultrasonic sensor system had the problem that the sampling rate for each emitter had to be decreased to avoid crosstalk as the number of emitters increased. We analyzed the phenomena geometrically and mathematically, and developed a new multi-lateration algorithm that enables our ultrasonic sensor system to locate multiple emitters at a time by activating them simultaneously. Simulation results supported the effectiveness of our proposed algorithm, and the algorithm implemented in our test system worked perfectly for the experiments conducted in a real environment.

Book ChapterDOI
01 Jan 2003
TL;DR: This work has developed a system that tracks a person in real-time and adjusts the pan, tilt, zoom and focus of each camera to acquire synchronized multi-view video of a person moving through the scene.
Abstract: For applications in human identification, activity recognition, 3D reconstruction, entertainment and sports, it is often desirable to capture a set of synchronized video sequences of a person from multiple camera viewpoints (see Figure 8-1). One way to achieve this is to set up a ring of cameras all statically aimed at a single point in space, and to have an actor perform at this fixation point while the video footage is shot. This is the method used to create spectacular special effects in the movie The Matrix, where playing back frames from a single time step, across all cameras, yields the appearance of freezing the action in time while a virtual camera flies around the scene. However, in surveillance or sports applications it is not possible to predict beforehand the precise location where an interesting activity will occur, and therefore it is necessary to dynamically adjust the fixation point of multiple camera views. We have developed a system that tracks a person in real-time and adjusts the pan, tilt, zoom and focus of each camera to acquire synchronized multi-view video of a person moving through the scene.