
Showing papers presented at "Workshop on Applications of Computer Vision in 2005"


Proceedings ArticleDOI
05 Jan 2005
TL;DR: The key contributions of this empirical study are to demonstrate that a model trained in this manner can achieve results comparable to a model trained in the traditional manner using a much larger set of fully labeled data, and that a training data selection metric that is defined independently of the detector greatly outperforms a selection metric based on the detection confidence generated by the detector.
Abstract: The construction of appearance-based object detection systems is time-consuming and difficult because a large number of training examples must be collected and manually labeled in order to capture variations in object appearance. Semi-supervised training is a means for reducing the effort needed to prepare the training set by training the model with a small number of fully labeled examples and an additional set of unlabeled or weakly labeled examples. In this work we present a semi-supervised approach to training object detection systems based on self-training. We implement our approach as a wrapper around the training process of an existing object detector and present empirical results. The key contributions of this empirical study are to demonstrate that a model trained in this manner can achieve results comparable to a model trained in the traditional manner using a much larger set of fully labeled data, and that a training data selection metric that is defined independently of the detector greatly outperforms a selection metric based on the detection confidence generated by the detector.

767 citations
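To make the wrapper idea concrete, here is a minimal self-training sketch in Python. It is an illustration under assumptions, not the paper's implementation: the detector is a stand-in classifier, and independent_selection_score is a hypothetical placeholder for the detector-independent selection metric the study advocates.

```python
# Minimal self-training wrapper sketch (assumptions noted above).
import numpy as np
from sklearn.linear_model import LogisticRegression

def independent_selection_score(X):
    # Hypothetical stand-in for a selection metric defined independently
    # of the detector; here, a simple density-style score.
    return -np.linalg.norm(X - X.mean(axis=0), axis=1)

def self_train(X_lab, y_lab, X_unlab, rounds=5, per_round=20):
    model = LogisticRegression(max_iter=1000)
    pool = np.asarray(X_unlab)
    for _ in range(rounds):
        model.fit(X_lab, y_lab)
        if len(pool) == 0:
            break
        # Rank the unlabeled pool by the independent metric,
        # NOT by the detector's own confidence.
        order = np.argsort(independent_selection_score(pool))[::-1]
        picked, pool = pool[order[:per_round]], pool[order[per_round:]]
        X_lab = np.vstack([X_lab, picked])
        y_lab = np.concatenate([y_lab, model.predict(picked)])
    return model
```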


Proceedings ArticleDOI
05 Jan 2005
TL;DR: A two-stage template-based method to detect people in widely varying thermal imagery using a generalized template and an AdaBoosted ensemble classifier using automatically tuned filters to test the hypothesized person locations.
Abstract: We present a two-stage template-based method to detect people in widely varying thermal imagery. The approach initially performs a fast screening procedure using a generalized template to locate potential person locations. Next an AdaBoosted ensemble classifier using automatically tuned filters is employed to test the hypothesized person locations. We demonstrate and evaluate the approach using a challenging dataset of thermal imagery.

307 citations


Proceedings ArticleDOI
05 Jan 2005
TL;DR: This paper proposes a measure that addresses the above concerns and has desirable properties such as accommodation of labeling errors at segment boundaries, region sensitive refinement, and compensation for differences in segment ambiguity between images.
Abstract: Quantitative evaluation and comparison of image segmentation algorithms is now feasible owing to the recent availability of collections of hand-labeled images. However, little attention has been paid to the design of measures to compare one segmentation result to one or more manual segmentations of the same image. Existing measures in the statistics and computer vision literature suffer either from intolerance to labeling refinement, making them unsuitable for image segmentation, or from the existence of degenerate cases, making the process of training algorithms using these measures prone to failure. This paper surveys previous work on measures of similarity and illustrates scenarios where they are applicable for performance evaluation in computer vision. For the image segmentation problem, we propose a measure that addresses the above concerns and has desirable properties such as accommodation of labeling errors at segment boundaries, region sensitive refinement, and compensation for differences in segment ambiguity between images.

186 citations


Proceedings ArticleDOI
05 Jan 2005
TL;DR: A two-step ICP (Iterative Closest Point) algorithm for matching 3D ears is introduced and results on a dataset of 30 subjects with 3D ear images are presented to demonstrate the effectiveness of the approach.
Abstract: The ear is a relatively stable biometric that is invariant from childhood to early old age (8 to 70). It is not affected by facial expressions, cosmetics, or eyeglasses. In this paper, we introduce a two-step ICP (Iterative Closest Point) algorithm for matching 3D ears. In the first step, the helix of the ear in 3D images is detected. The ICP algorithm is run to find the initial rigid transformation to align a model ear helix with the test ear helix. In the second step, the initial transformation is applied to selected locations of model ears and the ICP algorithm iteratively refines the transformation to bring model ears and the test ear into best alignment. The root mean square (RMS) registration error is used as the matching error criterion. The model ear with the minimum RMS error is declared the recognized ear. Experimental results on a dataset of 30 subjects with 3D ear images are presented to demonstrate the effectiveness of the approach.

137 citations
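For illustration, a single-stage ICP sketch with the RMS matching criterion described above; the paper's actual method is two-step (helix-based initialization, then refinement), and the point-cloud format and iteration count here are assumptions.

```python
# Illustrative ICP matching with an RMS-error criterion (not the paper's
# two-step implementation); point clouds are (N, 3) numpy arrays.
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    # Least-squares rotation R and translation t with R @ src + t ~= dst.
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cd - R @ cs

def icp_rms(model_pts, test_pts, iters=30):
    tree = cKDTree(test_pts)
    src = model_pts.copy()
    for _ in range(iters):
        _, idx = tree.query(src)              # closest-point correspondences
        R, t = best_rigid_transform(src, test_pts[idx])
        src = src @ R.T + t                   # apply the refined transform
    dist, _ = tree.query(src)
    return np.sqrt(np.mean(dist ** 2))        # RMS registration error

def recognize(test_scan, gallery):
    # The gallery model with the minimum RMS error is declared the match.
    return min(gallery, key=lambda name: icp_rms(gallery[name], test_scan))
```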


Proceedings ArticleDOI
05 Jan 2005
TL;DR: The proposed method lengthens the period of time during which a human or vehicle can navigate in GPS-deprived environments by contributing stochastic epipolar constraints over a broad baseline in time and space.
Abstract: This paper describes a new method to improve inertial navigation using feature-based constraints from one or more video cameras. The proposed method lengthens the period of time during which a human or vehicle can navigate in GPS-deprived environments. Our approach integrates well with existing navigation systems, because we invoke general sensor models that represent a wide range of available hardware. The inertial model includes errors in bias, scale, and random walk. Any purely projective camera and tracking algorithm may be used, as long as the tracking output can be expressed as ray vectors extending from known locations on the sensor body. A modified linear Kalman filter performs the data fusion. Unlike traditional SLAM, our state vector contains only inertial sensor errors related to position. This choice allows uncertainty to be properly represented by a covariance matrix. We do not augment the state with feature coordinates. Instead, image data contributes stochastic epipolar constraints over a broad baseline in time and space, resulting in improved observability of the IMU error states. The constraints lead to a relative residual and associated relative covariance, defined partly by the state history. Navigation results are presented using high-quality synthetic data and real fisheye imagery.

127 citations
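As a worked illustration of the epipolar constraint the abstract relies on: two ray vectors observing the same feature from two poses should satisfy r2ᵀ E r1 = 0 with E = [t]× R. A minimal sketch, assuming unit rays and a known relative pose; the filter machinery around it is omitted.

```python
# Epipolar-constraint residual sketch; in the paper, this kind of residual
# (with an associated covariance) drives the Kalman filter update.
import numpy as np

def skew(t):
    # Cross-product matrix [t]_x such that skew(t) @ v == np.cross(t, v).
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def epipolar_residual(ray1, ray2, R, t):
    # ray1, ray2: unit rays to the same feature from poses 1 and 2;
    # R, t: rotation and translation of pose 2 relative to pose 1.
    E = skew(t) @ R                 # essential matrix
    return float(ray2 @ E @ ray1)   # zero for a perfectly consistent pair
```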


Proceedings ArticleDOI
Yingli Tian1, Arun Hampapur1
05 Jan 2005
TL;DR: The effectiveness of the proposed algorithm in robustly detecting salient motion is demonstrated for a variety of real environments with distracting motions such as lighting changes, swaying branches, rippling water, waterfalls, and fountains.
Abstract: Moving object detection is very important for video surveillance. In many environments, motion may be either interesting (salient) motion (e.g., a person) or uninteresting motion (e.g., swaying branches). In this paper, we propose a new real-time algorithm to detect salient motion in complex environments by combining temporal difference imaging and a temporal filtered motion field. We assume that the object with salient motion moves in a consistent direction for a period of time. No prior knowledge about object size and shape is necessary. Compared to background subtraction methods, our method does not need to learn the background model from hundreds of images and can handle quick image variations, e.g., a light being turned on or off. The average speed of our method is about 50 fps on 160x120 images on a 1 GHz Pentium III machine. The effectiveness of the proposed algorithm in robustly detecting salient motion is demonstrated for a variety of real environments with distracting motions such as lighting changes, swaying branches, rippling water, waterfalls, and fountains.

120 citations
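A rough sketch of the direction-consistency idea from the abstract, using signed frame differences as a crude stand-in for the temporally filtered motion field; the window length and thresholds are assumptions.

```python
# Flag pixels that change strongly AND change in a consistent direction
# over the recent window (a proxy for consistent motion direction).
import numpy as np

def salient_motion_mask(frames, win=10, diff_thresh=15.0, consist=0.8):
    # frames: list of grayscale images as float (H, W) arrays.
    diffs = [frames[i + 1] - frames[i] for i in range(len(frames) - 1)]
    recent = np.stack(diffs[-win:])
    changed = (np.abs(recent) > diff_thresh).mean(axis=0) > 0.5
    # Consistency: the signed mean is close to the mean magnitude only
    # when the per-frame changes share one sign.
    signed = recent.mean(axis=0)
    mag = np.abs(recent).mean(axis=0) + 1e-6
    consistent = np.abs(signed) / mag > consist
    return changed & consistent
```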


Proceedings ArticleDOI
05 Jan 2005
TL;DR: A novel method is proposed to generate plausible video sequences after removing relatively large objects from the original videos, in which a motion layer segmentation method is applied and a set of synthesized layers is generated.
Abstract: This paper proposes a novel method to generate plausible video sequences after removing relatively large objects from the original videos. In order to maintain temporal coherence among the frames, a motion layer segmentation method is applied. Then, a set of synthesized layers is generated by applying motion compensation and a region completion algorithm. Finally, a new video, in which the selected object is removed, is plausibly rendered given the synthesized layers and the motion parameters. A number of example videos are shown in the results to demonstrate the effectiveness of our method.

105 citations


Proceedings ArticleDOI
05 Jan 2005
TL;DR: A face surface matching framework that takes into account both rigid and non-rigid variations to match a 2.5D face image to a 3D face model is proposed, reducing the number of matching errors.
Abstract: Current two-dimensional image based face recognition systems encounter difficulties with large facial appearance variations due to the pose, illumination and expression changes. Utilizing 3D information of human faces is promising to handle the pose and lighting variations. While the 3D shape of a face does not change due to head pose (rigid) and lighting changes, it is not invariant to the non-rigid facial movement and evolution, such as expressions and aging effect. We propose a face surface matching framework to take into account both rigid and non-rigid variations to match a 2.5D face image to a 3D face model. The rigid registration is achieved by a modified Iterative Closest Point (ICP) algorithm. The thin plate spline (TPS) model is applied to estimate the deformation displacement vector field, which is used to represent the non-rigid deformation. For the purpose of face matching, the non-rigid deformations from different sources are identified, which is formulated as a two-class classification problem: intra-subject deformation vs. inter-subject deformation. The deformation classification results are integrated with the matching distances to make the final decision. Experimental results on a database containing 100 3D face models and 98 2.5D scans with smiling expression show that the number of errors is reduced from 28 to 18.

102 citations


Proceedings ArticleDOI
05 Jan 2005
TL;DR: A face recognition system that utilizes three-dimensional shape information to make the system more robust to arbitrary view, lighting, and facial appearance is developed and the results show the feasibility of the proposed matching scheme.
Abstract: The performance of face recognition systems that use two-dimensional images depends on consistent conditions w.r.t. lighting, pose, and facial appearance. We are developing a face recognition system that utilizes three-dimensional shape information to make the system more robust to arbitrary view, lighting, and facial appearance. For each subject, a 3D face model is constructed by integrating several 2.5D face scans from different viewpoints. A 2.5D scan is composed of one range image along with a registered 2D color image. The recognition engine consists of two components, surface matching and appearance-based matching. The surface matching component is based on a modified Iterative Closest Point (ICP) algorithm. The candidate list used for appearance matching is dynamically generated based on the output of the surface matching component, which reduces the complexity of the appearance-based matching stage. The 3D model in the gallery is used to synthesize new appearance samples with pose and illumination variations that are used for discriminant subspace analysis. The weighted sum rule is applied to combine the two matching components. A hierarchical matching structure is designed to further improve the system performance in both accuracy and efficiency. Experimental results are given for matching a database of 100 3D face models with 598 2.5D independent test scans acquired in different pose and lighting conditions, and with some smiling expression. The results show the feasibility of the proposed matching scheme.

98 citations


Proceedings ArticleDOI
05 Jan 2005
TL;DR: This work proposes a novel characterization of dynamic textures for the recognition problem, and describes a simple matching algorithm based on multiresolution histograms that measures the difference between two sequences.
Abstract: Dynamic textures are sequences of images of moving scenes that exhibit certain stationarity properties in time, for example, sea waves, smoke, foliage, and whirlwinds. This work proposes a novel characterization of dynamic textures and poses the problem of recognizing them. A method based on spatio-temporal multiresolution histograms of velocity and acceleration fields is presented. The spatio-temporal multiresolution histogram has many desirable properties, including simple computation, spatial efficiency, robustness to noise, and the ability to encode spatio-temporal dynamic information, which can reliably capture and represent the motion properties of different image sequences. Velocity and acceleration fields of image sequences at different spatio-temporal resolutions are accurately estimated by the structure tensor method. We describe a simple matching algorithm based on multiresolution histograms, which measures the difference between two sequences.

86 citations
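A small sketch of the multiresolution-histogram descriptor and L1 matching, assuming 2D fields (e.g., one velocity component per frame); pyramid depth, bin count, and value range are assumptions.

```python
# Build per-level histograms of a field over a Gaussian pyramid, then
# compare two sequences by the L1 distance between averaged descriptors.
import numpy as np
from scipy.ndimage import gaussian_filter

def multires_histogram(field, levels=4, bins=16, rng=(-5.0, 5.0)):
    hists, cur = [], np.asarray(field, dtype=float)
    for _ in range(levels):
        h, _ = np.histogram(cur, bins=bins, range=rng, density=True)
        hists.append(h)
        cur = gaussian_filter(cur, sigma=1.0)[::2, ::2]  # blur + subsample
    return np.concatenate(hists)

def sequence_distance(fields_a, fields_b):
    ha = np.mean([multires_histogram(f) for f in fields_a], axis=0)
    hb = np.mean([multires_histogram(f) for f in fields_b], axis=0)
    return float(np.abs(ha - hb).sum())  # L1 difference between sequences
```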


Proceedings ArticleDOI
Andrew W. Senior1, Arun Hampapur1, Max Lu1
05 Jan 2005
TL;DR: A novel method to automatically calibrate between multiple cameras, estimating the homography between the cameras in a home position, together with the effects of pan and tilt controls and the expected height of a person in the image is described.
Abstract: This paper describes a system for automatically acquiring high-resolution images by steering a pan-tilt-zoom camera at targets detected in a fixed camera view. The system uses a novel method to automatically calibrate between multiple cameras, estimating the homography between the cameras in a home position, together with the effects of pan and tilt controls and the expected height of a person in the image. These calibrations are chained together to steer a slave camera. In addition we describe a simple manual calibration scheme.
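To illustrate the calibration chaining, a minimal steering sketch: a target detected in the fixed view is mapped through the inter-camera homography into the slave's home view, and the image offset is converted to pan/tilt angles. The pinhole-style angle conversion and all parameters are assumptions, not the paper's exact model.

```python
# Map a fixed-view detection into slave pan/tilt commands via a chained
# homography; H comes from calibration, focal_px from the slave camera.
import numpy as np

def steer(target_xy, H, center_xy, focal_px):
    p = H @ np.array([target_xy[0], target_xy[1], 1.0])
    u, v = p[0] / p[2], p[1] / p[2]          # target in slave home view
    pan = np.degrees(np.arctan2(u - center_xy[0], focal_px))
    tilt = np.degrees(np.arctan2(v - center_xy[1], focal_px))
    return pan, tilt
```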

Proceedings ArticleDOI
05 Jan 2005
TL;DR: It is shown that the MRF approach produces more accurate and visually appealing silhouettes that are less prone to noise and background camouflaging effects than traditional per-pixel based methods.
Abstract: Many video surveillance and identification applications need to find moving objects in the field of view of a stationary camera. A popular method for obtaining these silhouettes is through the process of background subtraction. We present a novel method for comparing image frames to the model of the stationary background that exploits the spatial and temporal dependencies that objects in motion impose on their images. We achieve this through the development and use of Markov random fields of binary segmentation variates. We show that the MRF approach produces more accurate and visually appealing silhouettes that are less prone to noise and background camouflaging effects than traditional per-pixel based methods. Results include visual examination of silhouettes, comparisons against hand-segmented data, and an analysis of the effects of various silhouette extraction techniques on gait recognition performance.
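A compact sketch of the MRF idea using iterated conditional modes (ICM) over binary labels; the paper's exact inference scheme is not restated here, so the Ising weight and sweep count are assumptions.

```python
# ICM for MRF-regularized background subtraction: each pixel's label is
# chosen to minimize a data term (negative log-likelihood) plus an Ising
# smoothness term that rewards agreement with its 4-neighbors.
import numpy as np

def icm_silhouette(fg_loglik, bg_loglik, beta=1.5, sweeps=5):
    labels = (fg_loglik > bg_loglik).astype(int)   # per-pixel initialization
    H, W = labels.shape
    for _ in range(sweeps):
        for y in range(H):
            for x in range(W):
                nbrs = [labels[yy, xx]
                        for yy, xx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                        if 0 <= yy < H and 0 <= xx < W]
                energies = [
                    -bg_loglik[y, x] - beta * nbrs.count(0),  # background
                    -fg_loglik[y, x] - beta * nbrs.count(1),  # foreground
                ]
                labels[y, x] = int(np.argmin(energies))
    return labels
```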

Proceedings ArticleDOI
05 Jan 2005
TL;DR: A technique for automatic identification of plankton using a variety of features and classification methods including ensembles is presented, expecting that upon completion, the system will become a useful tool for marine biologists to assess the health of the world's oceans.
Abstract: Earth's oceans are a soup of living micro-organisms known as plankton. As the foundation of the food chain for marine life, plankton are also an integral component of the global carbon cycle which regulates the planet's temperature. In this paper, we present a technique for automatic identification of plankton using a variety of features and classification methods including ensembles. The images were obtained in situ by an instrument known as the flow cytometer and microscope (FlowCAM), which detects particles from a stream of water siphoned directly from the ocean. The images are necessarily of limited resolution, making their identification a rather difficult challenge. We expect that upon completion, our system will become a useful tool for marine biologists to assess the health of the world's oceans.

Proceedings ArticleDOI
05 Jan 2005
TL;DR: The proposed framework provides translation-invariant recognition of gestures, a desirable property for many HCI systems, and allows multiple candidate feature vectors to be extracted at each time step.
Abstract: A method for the simultaneous localization and recognition of dynamic hand gestures is proposed. At the core of this method is a dynamic space-time warping (DSTW) algorithm that aligns a pair of query and model gestures in both space and time. For every frame of the query sequence, feature detectors generate multiple hand region candidates. Dynamic programming is then used to compute both a global matching cost, which is used to recognize the query gesture, and a warping path, which aligns the query and model sequences in time, and also finds the best hand candidate region in every query frame. The proposed framework includes translation invariant recognition of gestures, a desirable property for many HCI systems. The performance of the approach is evaluated on a dataset of hand signed digits gestured by people wearing short sleeve shirts, in front of a background containing other non-hand skin-colored objects. The algorithm simultaneously localizes the gesturing hand and recognizes the hand-signed digit. Although DSTW is illustrated in a gesture recognition setting, the proposed algorithm is a general method for matching time series that allows for multiple candidate feature vectors to be extracted at each time step.
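A compact sketch of the DSTW recurrence under simplifying assumptions (Euclidean local cost, unconstrained transitions between candidates); the paper also recovers the warping path, which is omitted here.

```python
# DSTW: DTW extended so each query frame has K candidate feature vectors;
# the recurrence minimizes over warping moves AND candidate choices.
import numpy as np

def dstw_cost(model, query_cands):
    # model: (n, d) model sequence; query_cands: (m, K, d) candidates.
    n, m = model.shape[0], query_cands.shape[0]
    local = np.linalg.norm(query_cands[None] - model[:, None, None], axis=3)
    D = np.full(local.shape, np.inf)        # (n, m, K) cumulative costs
    D[0, 0] = local[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = np.inf
            for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                if pi >= 0 and pj >= 0:
                    best = min(best, D[pi, pj].min())
            D[i, j] = local[i, j] + best
    return D[n - 1, m - 1].min()            # global matching cost
```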

Proceedings ArticleDOI
05 Jan 2005
TL;DR: The SVM approach with normalized image patches provides detection and localization performance closest to that of human labelers and is shown to be substantially superior to boundary-based approaches such as the Hough transform.
Abstract: Machine learning techniques have shown considerable promise for visual inspection tasks such as locating human faces in cluttered scenes. In this paper, we examine the utility of such techniques for the scientifically-important problem of detecting and cataloging impact craters in planetary images gathered by spacecraft. Various supervised learning algorithms, including ensemble methods (bagging and AdaBoost with feed-forward neural networks as base learners), support vector machines (SVM), and continuously-scalable template models (CSTM), are employed to derive crater detectors from ground-truthed images. The resulting detectors are evaluated on a challenging set of Viking Orbiter images of Mars containing roughly one thousand craters. The SVM approach with normalized image patches provides detection and localization performance closest to that of human labelers and is shown to be substantially superior to boundary-based approaches such as the Hough transform.

Proceedings ArticleDOI
05 Jan 2005
TL;DR: A system that detects independently moving objects from a mobile platform in real time using a calibrated stereo camera and an efficient three-point algorithm in a RANSAC framework for outlier detection is described.
Abstract: We describe a system that detects independently moving objects from a mobile platform in real time using a calibrated stereo camera. Interest points are first detected and tracked through the images. These tracks are used to obtain the motion of the platform by using an efficient three-point algorithm in a RANSAC framework for outlier detection. We use a formulation based on disparity space for our inlier computation. In the disparity space, two disparity images of a rigid object are related by a homography that depends on the object's Euclidean rigid motion. We use the homography obtained from the camera motion to detect the independently moving objects from the disparity maps obtained by an efficient stereo algorithm. Our system is able to reliably detect the independently moving objects at 16 Hz for a 320 x 240 stereo image sequence using a standard laptop computer.
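The RANSAC outer loop the system relies on is standard; below is a generic skeleton for motion-hypothesis estimation, with fit and residual as caller-supplied stand-ins (e.g., a three-point motion solver and a disparity-space error), not the paper's code.

```python
# Generic RANSAC: sample minimal sets, fit a hypothesis, score inliers,
# keep the best, then refit on its inlier set.
import numpy as np

def ransac(data, fit, residual, sample_size=3, iters=200, thresh=1.0,
           seed=0):
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    best_inliers = np.zeros(len(data), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(data), size=sample_size, replace=False)
        model = fit(data[idx])
        inliers = residual(model, data) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if not best_inliers.any():
        return None, best_inliers
    return fit(data[best_inliers]), best_inliers   # refit on all inliers
```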

Proceedings ArticleDOI
05 Jan 2005
TL;DR: A temporal filtering framework for hand tracking is proposed that can initialize and reset itself without human intervention, and can automatically identify video trajectories of unambiguous hand motion, and detect frames where tracking becomes ambiguous because of occlusions or overlaps.
Abstract: In gesture and sign language video sequences, hand motion tends to be rapid, and hands frequently appear in front of each other or in front of the face. Thus, hand location is often ambiguous, and naive color-based hand tracking is insufficient. To improve tracking accuracy, some methods employ a prediction-update framework, but such methods require careful initialization of model parameters, and tend to drift and lose track in extended sequences. In this paper, a temporal filtering framework for hand tracking is proposed that can initialize and reset itself without human intervention. In each frame, simple features like color and motion residue are exploited to identify multiple candidate hand locations. The temporal filter then uses the Viterbi algorithm to select among the candidates from frame to frame. The resulting tracking system can automatically identify video trajectories of unambiguous hand motion, and detect frames where tracking becomes ambiguous because of occlusions or overlaps. Experiments on video sequences of several hundred frames in duration demonstrate the system's ability to track hands robustly, to detect and handle tracking ambiguities, and to extract the trajectories of unambiguous hand motion.
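A minimal sketch of the Viterbi selection step described above: per-frame candidate locations with detection costs, a motion-smoothness transition cost, and a backtracked optimal path. The cost definitions and weight are assumptions.

```python
# Viterbi over per-frame hand candidates: pick the globally cheapest
# temporally smooth sequence of candidate locations.
import numpy as np

def viterbi_track(candidates, det_costs, motion_weight=0.1):
    # candidates: list of (K_t, 2) arrays; det_costs: list of (K_t,) arrays.
    cost, back = det_costs[0].astype(float), []
    for t in range(1, len(candidates)):
        jump = np.linalg.norm(
            candidates[t][:, None] - candidates[t - 1][None], axis=2)
        total = cost[None, :] + motion_weight * jump   # (K_t, K_{t-1})
        back.append(total.argmin(axis=1))
        cost = det_costs[t] + total.min(axis=1)
    path = [int(cost.argmin())]
    for b in reversed(back):                  # backtrack best predecessors
        path.append(int(b[path[-1]]))
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)]
```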

Proceedings ArticleDOI
05 Jan 2005
TL;DR: A novel method to temporally synchronize multiple stationary video cameras with overlapping views that suffices for all variants of the synchronization problem exposed by the theoretical dissertation and does not require the trajectory correspondence problem to be solved a priori.
Abstract: In this work, we present a formalization of the video synchronization problem that exposes new variants of the problem that have been left unexplored to date. We also present a novel method to temporally synchronize multiple stationary video cameras with overlapping views that: 1) does not rely on certain scene properties, 2) suffices for all variants of the synchronization problem exposed by the theoretical dissertation, and 3) does not require the trajectory correspondence problem to be solved a priori. The method uses a two-stage approach that first approximates the synchronization by tracking moving objects and identifying inflection points. The method then proceeds to refine the estimate using a consensus-based matching heuristic to find moving features that best agree with the pre-computed camera geometries from stationary image features. By using the fundamental matrix and the trifocal tensor in the second refinement step we are able to improve the estimation of the first step and handle a broader range of input scenarios and camera conditions.

Proceedings ArticleDOI
05 Jan 2005
TL;DR: This paper presents a method utilizing the registered 2D color and range image of a face to automatically identify the eyes, nose, and mouth, focusing on the 2D color information so the algorithm runs as fast as possible.
Abstract: As interest in 3D face recognition increases, the importance of the initial alignment problem does as well. In this paper we present a method utilizing the registered 2D color and range image of a face to automatically identify the eyes, nose, and mouth. These features are important for initially aligning faces in both standard 2D and 3D face recognition algorithms. For our algorithm to run as fast as possible, we focus on the 2D color information. This allows the algorithm to run in approximately 4 seconds on a 640×480 image with registered range data. On a database of 1,500 images the algorithm achieved a facial feature detection rate of 99.6%, with 0.4% of the images skipped due to hair obstructing the face.

Proceedings ArticleDOI
05 Jan 2005
TL;DR: This paper investigates an unsupervised hypothesis testing method for learning the characteristics of objects passing unobserved from one observed location to another that is robust to non-stationary traffic processes that result from traffic lights, vehicle grouping, and other non-linear vehicle-vehicle interactions.
Abstract: As tracking systems become more effective at reliably tracking multiple objects over extended periods of time within single camera views and across overlapping camera views, increasing attention is being focused on tracking objects through periods where they are not observed. This paper investigates an unsupervised hypothesis testing method for learning the characteristics of objects passing unobserved from one observed location to another. This method not only reliably determines whether objects predictably pass from one location to another without performing explicit correspondence, but it approximates the likelihood of those transitions. It is robust to non-stationary traffic processes that result from traffic lights, vehicle grouping, and other non-linear vehicle-vehicle interactions. Synthetic data allows us to test and verify our results for complex traffic situations over multiple city blocks and contrast it with previous approaches.

Proceedings ArticleDOI
Rui Li1, Stan Sclaroff1
05 Jan 2005
TL;DR: In this article, a multi-scale method along with a novel adaptive smoothing technique is used to gain a regularized solution, which preserves discontinuities and prevents over-regularization.
Abstract: Scene flow methods estimate the three-dimensional motion field for points in the world, using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation approaches. This paper describes an alternative formulation for dense scene flow estimation that provides convincing results using only two cameras by fusing stereo and optical flow estimation into a single coherent framework. To handle the aperture problems inherent in the estimation task, a multi-scale method along with a novel adaptive smoothing technique is used to gain a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization - two problems commonly associated with basic multi-scale approaches. Internally, the framework generates probability distributions for optical flow and disparity. Taking into account the uncertainty in the intermediate stages allows for more reliable estimation of the 3D scene flow than standard stereo and optical flow methods allow. Experiments with synthetic and real test data demonstrate the effectiveness of the approach.

Proceedings ArticleDOI
05 Jan 2005
TL;DR: All the distances between the given postmortem radiographs and the antemortem radiographs that provide candidate identities are combined to establish the identity of the subject associated with the postmortem radiographs.
Abstract: Dental biometrics utilizes the evidence revealed by dental radiographs for human identification. This evidence includes the tooth contours, the relative positions of neighboring teeth, and the shapes of the dental work (e.g., crowns, fillings and bridges). The proposed system has two main stages: feature extraction, and matching. The feature extraction stage uses anisotropic diffusion to enhance the images and a mixture of Gaussians model to segment the dental work. The matching stage has three sequential steps: shape registration, computation of image similarity, and subject identification. In shape registration, we align the tooth contours and obtain the distance between them. A second method based on overlapped areas is used to match the dental work. The distance between the shapes of the teeth and the distance between the shapes of the dental work are then combined using likelihood estimates to improve the retrieval accuracy. At the second step, the correspondence of teeth between two given images is established. A distance measure based on this correspondence is then used to represent the similarity between the two images. Finally, the distances are used to infer the subject's identity.

Proceedings ArticleDOI
05 Jan 2005
TL;DR: A video tracking system that tracks and analyzes the behavioral pattern of users in a public space and has obtained important statistical measurements about users' behavior, which can be used to evaluate architectural design in terms of human spatial behavior and model the behavior ofusers in public spaces.
Abstract: The paper describes a video tracking system that tracks and analyzes the behavioral pattern of users in a public space. We have obtained important statistical measurements about users' behavior, which can be used to evaluate architectural design in terms of human spatial behavior and model the behavior of users in public spaces. Previously, such measurements could only be obtained through costly manual processes, e.g., behavioral mapping and time-lapse filming with human examiners. Our system has automated the process of analyzing the behavior of users. The system consists of a head detector for detecting people in each single frame of the video and data association for tracking people through frames. We compared the results obtained using our system with those obtained by manual counting, for a small data set, and found the results to be fairly accurate. We then applied the system to a large-scale data set and obtained substantial statistical measurements of parameters such as the total number of users who entered the space, the total number of users who sat by a fountain, the time that each spent by the fountain, etc. These statistics allow fundamental rethinking of the way people use a public space. This research is a novel application of computer vision in evaluating architectural design in terms of human behavior.

Proceedings ArticleDOI
05 Jan 2005
TL;DR: This paper describes a stereo-based tree traversability algorithm implemented and tested on a robotic vehicle under the DARPA PerceptOR program, and results from the daytime for short baseline (9 cm) and wide baseline (30 cm) stereo are presented.
Abstract: Autonomous off-road navigation through forested areas is particularly challenging when there exists a mixture of densely distributed thin and thick trees. To make progress through a dense forest, the robot must decide which trees it can push over and which trees it must circumvent. This paper describes a stereo-based tree traversability algorithm implemented and tested on a robotic vehicle under the DARPA PerceptOR program. Edge detection is applied to the left view of the stereo pair to extract long and vertical edge contours. A search step matches anti-parallel line pairs that correspond to the boundaries of individual trees. Stereo ranging is performed and the range data within trunk fragments are averaged. The diameter of each tree is then estimated, based on the average range to the tree, the focal length of the camera, and the distance in pixels between matched contour lines. We use the estimated tree diameters to construct a tree traversability image used in generating a terrain map. In stationary experiments, the average error in estimating the diameter of thirty mature tree trunks (having diameters ranging from 10-65 cm and a distance from the cameras ranging from 2.5-30 meters) was less than 5 cm. Tree traversability results from the daytime for short baseline (9 cm) and wide baseline (30 cm) stereo are presented. Results from nighttime using wide baseline (33.5 cm) thermal infrared stereo are also presented.
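The diameter estimate in the abstract is simple pinhole-camera arithmetic, shown below with illustrative numbers (the function and values are assumptions, not the paper's code).

```python
# Similar triangles: trunk width in meters = range * pixel separation /
# focal length, with focal length expressed in pixels.
def trunk_diameter_m(range_m, pixel_separation, focal_length_px):
    return range_m * pixel_separation / focal_length_px

# Example: a trunk 10 m away whose contour lines are 30 px apart, with an
# 800 px focal length: 10 * 30 / 800 = 0.375 m (about 37 cm).
print(trunk_diameter_m(10.0, 30, 800))
```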

Proceedings ArticleDOI
05 Jan 2005
TL;DR: Rather than learning and storing feature representations separately for each object, this work creates a finite set of representative features and share these features within and between different object models to achieve fast recognition of a large number of different objects.
Abstract: We present a framework for learning object representations for fast recognition of a large number of different objects. Rather than learning and storing feature representations separately for each object, we create a finite set of representative features and share these features within and between different object models. In contrast to traditional recognition methods that scale linearly with the number of objects, the shared features can be exploited by bottom-up search algorithms which require a constant number of feature comparisons for any number of objects. We demonstrate the feasibility of this approach on a novel database of 50 everyday objects in cluttered real-world scenes. Using Gabor wavelet-response features extracted only at corner points, our system achieves good recognition results despite substantial occlusion and background clutter.

Proceedings ArticleDOI
05 Jan 2005
TL;DR: This work addresses the tracking problem by modeling the appearance and motion of moving regions, and defines a spatio-temporal Joint Probability Data Association Filter (JPDAF) for integrating multiple cues.
Abstract: We present an approach for persistent tracking of moving objects observed by non-overlapping and moving cameras. Our approach robustly recovers the geometry of non-overlapping views using a moving camera that pans across the scene. We address the tracking problem by modeling the appearance and motion of the moving regions. The appearance of the detected blobs is described by multiple spatial distributions models of blobs' colors and edges. This representation is invariant to 2D rigid and scale transformation. It provides a rich description of the detected regions, and produces an efficient blob similarity measure for tracking. The motion model is obtained using a Kalman Filter (KF) process, which predicts the position of the moving objects while taking into account the camera motion. Tracking is performed by the maximization of a joint probability model combining objects' appearance and motion. The novelty of our approach consists in defining a spatio-temporal Joint Probability Data Association Filter (JPDAF) for integrating multiple cues. The proposed method tracks a large number of moving people with partial and total occlusions and provides automatic handoff of tracked objects. We demonstrate the performance of the system on several real video surveillance sequences.

Proceedings ArticleDOI
05 Jan 2005
TL;DR: This work presents a Fourier-based approach that estimates large translations, scalings, and rotations using the pseudopolar (PP) Fourier transform to achieve substantially improved approximations of the polar and log-polar Fourier transforms of an image.
Abstract: One of the major challenges related to image registration is the estimation of large motions without prior knowledge. This paper presents a Fourier-based approach that estimates large translation, scale, and rotation motions. The algorithm uses the pseudo-polar transform to achieve substantially improved approximations of the polar and log-polar Fourier transforms of an image. Thus, rotation and scale changes are reduced to translations, which are estimated using phase correlation. By utilizing the pseudo-polar grid we increase the performance (accuracy, speed, robustness) of the registration algorithms. Scales up to 4 and arbitrary rotation angles can be robustly recovered, compared to a maximum scaling of 2 recovered by the current state-of-the-art algorithms. The algorithm utilizes only 1D FFT calculations whose overall complexity is significantly lower than prior works. Experimental results demonstrate the applicability of these algorithms.
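The final step of the method, translation estimation by phase correlation, is easy to sketch; the pseudo-polar reduction of rotation and scale to translation is omitted, and the integer-shift version below is an illustration only.

```python
# Phase correlation: whiten the cross-power spectrum to keep only phase;
# its inverse FFT peaks at the integer shift between the two images.
import numpy as np

def phase_correlation(a, b):
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = np.conj(Fa) * Fb
    cross /= np.abs(cross) + 1e-12           # normalize: phase only
    corr = np.abs(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    H, W = a.shape
    if dy > H // 2: dy -= H                  # unwrap negative shifts
    if dx > W // 2: dx -= W
    return dy, dx                            # shift of b relative to a
```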

Proceedings ArticleDOI
05 Jan 2005
TL;DR: Improvements to the popular scale invariant feature transform (SIFT) are suggested which incorporate local object boundary information and the resulting feature detection and descriptor creation processes are invariant to changes in background.
Abstract: Current feature-based object recognition methods use information derived from local image patches. For robustness, features are engineered for invariance to various transformations, such as rotation, scaling, or affine warping. When patches overlap object boundaries, however, errors in both detection and matching will almost certainly occur due to inclusion of unwanted background pixels. This is common in real images, which often contain significant background clutter, objects which are not heavily textured, or objects which occupy a relatively small portion of the image. We suggest improvements to the popular scale invariant feature transform (SIFT) which incorporate local object boundary information. The resulting feature detection and descriptor creation processes are invariant to changes in background. We call this method the background and scale invariant feature transform (BSIFT). We demonstrate BSIFT's superior performance in feature detection and matching on synthetic and natural images.

Proceedings ArticleDOI
05 Jan 2005
TL;DR: An accurate vision-based position tracking system which is significantly more robust and reliable over a wide range of environments than existing approaches and nonlinear optimization of the camera position during tracking gives accuracy comparable with full bundle adjustment but at significantly reduced cost.
Abstract: This paper describes an accurate vision-based position tracking system which is significantly more robust and reliable over a wide range of environments than existing approaches. Based on fiducial detection for robustness, we show how a machine-learning approach allows the development of significantly more reliable fiducial detection than has previously been demonstrated. We calibrate fiducial positions using a structure-from-motion solver. We then show how nonlinear optimization of the camera position during tracking gives accuracy comparable with full bundle adjustment but at significantly reduced cost.

Proceedings ArticleDOI
05 Jan 2005
TL;DR: A framework that combines visual human motion tracking with RFID based object tracking is proposed that enables the accurate estimation of high-level interactions between people and objects for application domains such as retail, home-care, workplace-safety, manufacturing and others.
Abstract: Computer vision-based articulated human motion tracking is attractive for many applications since it allows unobtrusive and passive estimation of people's activities. Although much progress has been made on human-only tracking, the visual tracking of people that interact with objects such as tools, products, packages, and devices is considerably more challenging. The wide variety of objects, their varying visual appearance, and their varying (and often small) size makes a vision-based understanding of person-object interactions very difficult. To alleviate this problem for at least some application domains, we propose a framework that combines visual human motion tracking with RFID based object tracking. We customized commonly available RFID technology to obtain orientation estimates of objects in the field of RFID emitter coils. The resulting fusion of visual human motion tracking and RFID-based object tracking enables the accurate estimation of high-level interactions between people and objects for application domains such as retail, home-care, workplace-safety, manufacturing and others.