Showing papers in "IEEE Transactions on Pattern Analysis and Machine Intelligence in 2003"


Journal ArticleDOI
TL;DR: A new approach toward target representation and localization, the central component in visual tracking of nonrigid objects, is proposed, which employs a metric derived from the Bhattacharyya coefficient as similarity measure, and uses the mean shift procedure to perform the optimization.
Abstract: A new approach toward target representation and localization, the central component in visual tracking of nonrigid objects, is proposed. The feature histogram-based target representations are regularized by spatial masking with an isotropic kernel. The masking induces spatially-smooth similarity functions suitable for gradient-based optimization, hence, the target localization problem can be formulated using the basin of attraction of the local maxima. We employ a metric derived from the Bhattacharyya coefficient as similarity measure, and use the mean shift procedure to perform the optimization. In the presented tracking examples, the new method successfully coped with camera motion, partial occlusions, clutter, and target scale variations. Integration with motion filters and data association techniques is also discussed. We describe only a few of the potential applications: exploitation of background information, Kalman tracking using motion models, and face tracking.

4,996 citations
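
The similarity measure named in the abstract above can be illustrated with a short sketch. It assumes two histograms over the same bins and uses the commonly stated form d = sqrt(1 - rho) for the metric derived from the Bhattacharyya coefficient rho; the function names are illustrative, and the kernel weighting and mean shift iterations of the tracker are not shown.

```python
import math

def bhattacharyya_coefficient(p, q):
    """Bhattacharyya coefficient of two histograms defined over the same bins."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    sp, sq = float(sum(p)), float(sum(q))
    # Normalize each histogram to sum to 1, then sum sqrt(p_u * q_u) over bins.
    return sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))

def bhattacharyya_distance(p, q):
    """Distance derived from the coefficient (assumed form: sqrt(1 - rho))."""
    rho = bhattacharyya_coefficient(p, q)
    # Clamp at zero to absorb tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - rho))
```

Identical histograms give a coefficient of 1 (distance 0), and histograms with no overlapping bins give a coefficient of 0 (distance 1).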


Journal ArticleDOI
TL;DR: This paper presents a method for face recognition across variations in pose, ranging from frontal to profile views, and across a wide range of illuminations, including cast shadows and specular reflections, using computer graphics.
Abstract: This paper presents a method for face recognition across variations in pose, ranging from frontal to profile views, and across a wide range of illuminations, including cast shadows and specular reflections. To account for these variations, the algorithm simulates the process of image formation in 3D space, using computer graphics, and it estimates 3D shape and texture of faces from single images. The estimate is achieved by fitting a statistical, morphable model of 3D faces to images. The model is learned from a set of textured 3D scans of heads. We describe the construction of the morphable model, an algorithm to fit the model to images, and a framework for face identification. In this framework, faces are represented by model parameters for 3D shape and texture. We present results obtained with 4,488 images from the publicly available CMU-PIE database and 1,940 images from the FERET database.

2,187 citations


Journal ArticleDOI
TL;DR: In the Fall of 2000, a database of more than 40,000 facial images of 68 people was collected using the Carnegie Mellon University 3D Room, imaging each person across 13 different poses, under 43 different illumination conditions, and with four different expressions.
Abstract: In the Fall of 2000, we collected a database of more than 40,000 facial images of 68 people. Using the Carnegie Mellon University 3D Room, we imaged each person across 13 different poses, under 43 different illumination conditions, and with four different expressions. We call this the CMU pose, illumination, and expression (PIE) database. We describe the imaging hardware, the collection procedure, the organization of the images, several possible uses, and how to obtain the database.

1,880 citations


Journal ArticleDOI
TL;DR: It is proved that the set of all Lambertian reflectance functions (the mapping from surface normals to intensities) obtained with arbitrary distant light sources lies close to a 9D linear subspace, implying that, in general, the set of images of a convex Lambertian object obtained under a wide variety of lighting conditions can be approximated accurately by a low-dimensional linear subspace, explaining prior empirical results.
Abstract: We prove that the set of all Lambertian reflectance functions (the mapping from surface normals to intensities) obtained with arbitrary distant light sources lies close to a 9D linear subspace. This implies that, in general, the set of images of a convex Lambertian object obtained under a wide variety of lighting conditions can be approximated accurately by a low-dimensional linear subspace, explaining prior empirical results. We also provide a simple analytic characterization of this linear space. We obtain these results by representing lighting using spherical harmonics and describing the effects of Lambertian materials as the analog of a convolution. These results allow us to construct algorithms for object recognition based on linear methods as well as algorithms that use convex optimization to enforce nonnegative lighting functions. We also show a simple way to enforce nonnegative lighting when the images of an object lie near a 4D linear space. We apply these algorithms to perform face recognition by finding the 3D model that best matches a 2D query image.

1,634 citations


Journal ArticleDOI
TL;DR: A general-purpose method is proposed that combines statistical assumptions with the object-level knowledge of moving objects, apparent objects (ghosts), and shadows acquired in the processing of the previous frames to improve object segmentation and background update.
Abstract: Background subtraction methods are widely exploited for moving object detection in videos in many applications, such as traffic monitoring, human motion capture, and video surveillance. How to correctly and efficiently model and update the background model and how to deal with shadows are two of the most distinguishing and challenging aspects of such approaches. The article proposes a general-purpose method that combines statistical assumptions with the object-level knowledge of moving objects, apparent objects (ghosts), and shadows acquired in the processing of the previous frames. Pixels belonging to moving objects, ghosts, and shadows are processed differently in order to supply an object-based selective update. The proposed approach exploits color information for both background subtraction and shadow detection to improve object segmentation and background update. The approach proves fast, flexible, and precise in terms of both pixel accuracy and reactivity to background changes.

1,521 citations


Journal ArticleDOI
TL;DR: The system consists of a novel device for online palmprint image acquisition and an efficient algorithm for fast palmprint recognition, and a robust image coordinate system is defined to facilitate image alignment for feature extraction.
Abstract: Biometrics-based personal identification is regarded as an effective method for automatically recognizing, with a high confidence, a person's identity. This paper presents a new biometric approach to online personal identification using palmprint technology. In contrast to the existing methods, our online palmprint identification system employs low-resolution palmprint images to achieve effective personal identification. The system consists of two parts: a novel device for online palmprint image acquisition and an efficient algorithm for fast palmprint recognition. A robust image coordinate system is defined to facilitate image alignment for feature extraction. In addition, a 2D Gabor phase encoding scheme is proposed for palmprint feature extraction and representation. The experimental results demonstrate the feasibility of the proposed system.

1,416 citations


Journal ArticleDOI
TL;DR: A physics-based model is presented that describes the appearances of scenes in uniform bad weather conditions and a fast algorithm to restore scene contrast, which is effective under a wide range of weather conditions including haze, mist, fog, and conditions arising due to other aerosols.
Abstract: Images of outdoor scenes captured in bad weather suffer from poor contrast. Under bad weather conditions, the light reaching a camera is severely scattered by the atmosphere. The resulting decay in contrast varies across the scene and is exponential in the depths of scene points. Therefore, traditional space invariant image processing techniques are not sufficient to remove weather effects from images. We present a physics-based model that describes the appearances of scenes in uniform bad weather conditions. Changes in intensities of scene points under different weather conditions provide simple constraints to detect depth discontinuities in the scene and also to compute scene structure. Then, a fast algorithm to restore scene contrast is presented. In contrast to previous techniques, our weather removal algorithm does not require any a priori scene structure, distributions of scene reflectances, or detailed knowledge about the particular weather condition. All the methods described in this paper are effective under a wide range of weather conditions including haze, mist, fog, and conditions arising due to other aerosols. Further, our methods can be applied to gray scale, RGB color, multispectral and even IR images. We also extend our techniques to restore contrast of scenes with moving objects, captured using a video camera.

1,393 citations
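
The exponential contrast decay described in the abstract above can be made concrete with a small sketch. It assumes the commonly used attenuation-plus-airlight form, observed = clear * exp(-beta * depth) + airlight * (1 - exp(-beta * depth)); the symbol names and the helper functions are illustrative assumptions, and the toy inversion takes depth and the scattering coefficient as known, whereas the paper's algorithm is stated not to require a priori scene structure.

```python
import math

def observe(clear, depth, beta, airlight):
    """Intensity seen through a uniform scattering medium (assumed model)."""
    t = math.exp(-beta * depth)  # fraction of scene light that survives
    return clear * t + airlight * (1.0 - t)

def restore(observed, depth, beta, airlight):
    """Invert the assumed model when depth and beta are known."""
    t = math.exp(-beta * depth)
    return (observed - airlight * (1.0 - t)) / t
```

Under this model, the intensity difference between two scene points at the same depth shrinks by the factor exp(-beta * depth), which is why a single space-invariant contrast stretch cannot undo the effect across a scene with varying depth.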


Journal ArticleDOI
TL;DR: This work reviews recent advances in computational stereo, focusing primarily on three important topics: correspondence methods, methods for occlusion, and real-time implementations.
Abstract: Extraction of three-dimensional structure of a scene from stereo images is a problem that has been studied by the computer vision community for decades. Early work focused on the fundamentals of image correspondence and stereo geometry. Stereo research has matured significantly throughout the years and many advances in computational stereo continue to be made, allowing stereo to be applied to new and more demanding problems. We review recent advances in computational stereo, focusing primarily on three important topics: correspondence methods, methods for occlusion, and real-time implementations. Throughout, we present tables that summarize and draw distinctions among key ideas and approaches. Where available, we provide comparative analyses and we make suggestions for analyses yet to be done.

1,274 citations


Journal ArticleDOI
TL;DR: This paper formulates the stereo matching problem as a Markov network and solves it using Bayesian belief propagation to obtain the maximum a posteriori (MAP) estimation in the Markov network.
Abstract: In this paper, we formulate the stereo matching problem as a Markov network and solve it using Bayesian belief propagation. The stereo Markov network consists of three coupled Markov random fields that model the following: a smooth field for depth/disparity, a line process for depth discontinuity, and a binary process for occlusion. After eliminating the line process and the binary process by introducing two robust functions, we apply the belief propagation algorithm to obtain the maximum a posteriori (MAP) estimation in the Markov network. Other low-level visual cues (e.g., image segmentation) can also be easily incorporated in our stereo model to obtain better stereo results. Experiments demonstrate that our methods are comparable to the state-of-the-art stereo algorithms for many test cases.

1,272 citations


Journal ArticleDOI
TL;DR: A simple but efficient gait recognition algorithm using spatial-temporal silhouette analysis is proposed that implicitly captures the structural and transitional characteristics of gait.
Abstract: Human identification at a distance has recently gained growing interest from computer vision researchers. Gait recognition aims essentially to address this problem by identifying people based on the way they walk. In this paper, a simple but efficient gait recognition algorithm using spatial-temporal silhouette analysis is proposed. For each image sequence, a background subtraction algorithm and a simple correspondence procedure are first used to segment and track the moving silhouettes of a walking figure. Then, eigenspace transformation based on principal component analysis (PCA) is applied to time-varying distance signals derived from a sequence of silhouette images to reduce the dimensionality of the input feature space. Supervised pattern classification techniques are finally performed in the lower-dimensional eigenspace for recognition. This method implicitly captures the structural and transitional characteristics of gait. Extensive experimental results on outdoor image sequences demonstrate that the proposed algorithm has an encouraging recognition performance with relatively low computational cost.

1,183 citations
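
The eigenspace transformation step mentioned in the abstract above can be sketched as plain PCA. This is a minimal illustration, assuming each row of the input is one fixed-length distance signal; the function names are hypothetical, and silhouette extraction and the classifier are not shown.

```python
import numpy as np

def fit_eigenspace(samples, k):
    """Return (mean, basis) where basis holds the top-k principal directions.

    samples: array of shape (n, d), one feature vector per row.
    basis:   array of shape (d, k) with orthonormal columns.
    """
    X = np.asarray(samples, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T

def project(samples, mean, basis):
    """Map feature vectors into the lower-dimensional eigenspace."""
    return (np.asarray(samples, dtype=float) - mean) @ basis
```

Classification would then operate on the k-dimensional projections rather than on the raw signals.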


Journal ArticleDOI
TL;DR: This paper implemented and tested the ALIP (Automatic Linguistic Indexing of Pictures) system on a photographic image database of 600 different concepts, each with about 40 training images and demonstrated the good accuracy of the system and its high potential in linguistic indexing of photographic images.
Abstract: Automatic linguistic indexing of pictures is an important but highly challenging problem for researchers in computer vision and content-based image retrieval. In this paper, we introduce a statistical modeling approach to this problem. Categorized images are used to train a dictionary of hundreds of statistical models each representing a concept. Images of any given concept are regarded as instances of a stochastic process that characterizes the concept. To measure the extent of association between an image and the textual description of a concept, the likelihood of the occurrence of the image based on the characterizing stochastic process is computed. A high likelihood indicates a strong association. In our experimental implementation, we focus on a particular group of stochastic processes, that is, the two-dimensional multiresolution hidden Markov models (2D MHMMs). We implemented and tested our ALIP (Automatic Linguistic Indexing of Pictures) system on a photographic image database of 600 different concepts, each with about 40 training images. The system is evaluated quantitatively using more than 4,600 images outside the training database and compared with a random annotation scheme. Experiments have demonstrated the good accuracy of the system and its high potential in linguistic indexing of photographic images.

Journal ArticleDOI
TL;DR: A framework is proposed for learning robust, adaptive appearance models for motion-based tracking of natural objects, providing robustness in the face of image outliers while adapting to natural changes in appearance such as those due to facial expressions or variations in 3D pose.
Abstract: We propose a framework for learning robust, adaptive, appearance models to be used for motion-based tracking of natural objects. The model adapts to slowly changing appearance, and it maintains a natural measure of the stability of the observed image structure during tracking. By identifying stable properties of appearance, we can weight them more heavily for motion estimation, while less stable properties can be proportionately downweighted. The appearance model involves a mixture of stable image structure, learned over long time courses, along with two-frame motion information and an outlier process. An online EM-algorithm is used to adapt the appearance model parameters over time. An implementation of this approach is developed for an appearance model based on the filter responses from a steerable pyramid. This model is used in a motion-based tracking algorithm to provide robustness in the face of image outliers, such as those caused by occlusions, while adapting to natural changes in appearance such as those due to facial expressions or variations in 3D pose.

Journal ArticleDOI
TL;DR: A bank of spatial filters, whose kernels are suitable for iris recognition, is used to capture local characteristics of the iris so as to produce discriminating texture features and results show that the proposed method has an encouraging performance.
Abstract: With an increasing emphasis on security, automated personal identification based on biometrics has been receiving extensive attention over the past decade. Iris recognition, as an emerging biometric recognition approach, is becoming a very active topic in both research and practical applications. In general, a typical iris recognition system includes iris imaging, iris liveness detection, and recognition. This paper focuses on the last issue and describes a new scheme for iris recognition from an image sequence. We first assess the quality of each image in the input sequence and select a clear iris image from such a sequence for subsequent recognition. A bank of spatial filters, whose kernels are suitable for iris recognition, is then used to capture local characteristics of the iris so as to produce discriminating texture features. Experimental results show that the proposed method has an encouraging performance. In particular, a comparative study of existing methods for iris recognition is conducted on an iris image database including 2,255 sequences from 213 subjects. Conclusions based on such a comparison using a nonparametric statistical method (the bootstrap) provide useful information for further research.

Journal ArticleDOI
TL;DR: The algorithm, which is based on dimensionality reduction and partial Voronoi diagram construction, can be used for computing the DT for a wide class of distance functions, including the L_p and chamfer metrics.
Abstract: A sequential algorithm is presented for computing the exact Euclidean distance transform (DT) of a k-dimensional binary image in time linear in the total number of voxels N. The algorithm, which is based on dimensionality reduction and partial Voronoi diagram construction, can be used for computing the DT for a wide class of distance functions, including the L_p and chamfer metrics. At each dimension level, the DT is computed by constructing the intersection of the Voronoi diagram whose sites are the feature voxels with each row of the image. This construction is performed efficiently by using the DT in the next lower dimension. The correctness and linear time complexity are demonstrated analytically and verified experimentally. The algorithm may be of practical value since it is relatively simple and easy to implement and it is relatively fast (not only does it run in O(N) time but the time constant is small). A simple modification of the algorithm computes the weighted Euclidean DT, which is useful for images with anisotropic voxel dimensions. A parallel version of the algorithm runs in O(N/p) time with p processors.
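
To make the quantity being computed concrete, the following is a brute-force reference for the exact (optionally weighted) Euclidean distance transform of a 2D binary image. It is not the linear-time algorithm from the abstract: it compares every pixel with every feature pixel, so it is only suitable for checking a faster implementation on small inputs. The function name and the `spacing` parameter are illustrative assumptions.

```python
import math

def brute_force_edt(image, spacing=(1.0, 1.0)):
    """Distance from each pixel to the nearest feature pixel (value 1).

    image:   2D list of 0/1 values.
    spacing: (row, column) pixel size, for anisotropic (weighted) distances.
    """
    features = [(r, c)
                for r, row in enumerate(image)
                for c, v in enumerate(row) if v]
    if not features:
        raise ValueError("image contains no feature pixels")
    sr, sc = spacing
    # Quadratic-time definition: minimum over all feature pixels.
    return [[min(math.hypot((r - fr) * sr, (c - fc) * sc)
                 for fr, fc in features)
             for c in range(len(row))]
            for r, row in enumerate(image)]
```

Passing unequal values in `spacing` gives the weighted variant that the abstract mentions for anisotropic voxel dimensions.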

Journal ArticleDOI
TL;DR: This work uses Wu-Ritt's zero decomposition algorithm to give a complete triangular decomposition for the P3P equation system, and gives some pure geometric criteria for the number of real physical solutions.
Abstract: We use two approaches to solve the perspective-three-point (P3P) problem: the algebraic approach and the geometric approach. In the algebraic approach, we use Wu-Ritt's zero decomposition algorithm to give a complete triangular decomposition for the P3P equation system. This decomposition provides the first complete analytical solution to the P3P problem. We also give a complete solution classification for the P3P equation system, i.e., we give explicit criteria for the P3P problem to have one, two, three, and four solutions. Combining the analytical solutions with the criteria, we provide an algorithm, CASSC, which may be used to find complete and robust numerical solutions to the P3P problem. In the geometric approach, we give some pure geometric criteria for the number of real physical solutions.

Journal ArticleDOI
TL;DR: This paper organizes contributions reported in the literature in four classes, two statistical and two deterministic, and presents a comparative empirical evaluation of representative algorithms selected from these four classes.
Abstract: Moving shadows need careful consideration in the development of robust dynamic scene analysis systems. Moving shadow detection is critical for accurate object detection in video streams since shadow points are often misclassified as object points, causing errors in segmentation and tracking. Many algorithms have been proposed in the literature that deal with shadows. However, a comparative evaluation of the existing approaches is still lacking. In this paper, we present a comprehensive survey of moving shadow detection approaches. We organize contributions reported in the literature in four classes: two of them are statistical and two are deterministic. We also present a comparative empirical evaluation of representative algorithms selected from these four classes. Novel quantitative (detection and discrimination rate) and qualitative metrics (scene and object independence, flexibility to shadow situations, and robustness to noise) are proposed to evaluate these classes of algorithms on a benchmark suite of indoor and outdoor video sequences. These video sequences and associated "ground-truth" data are made available at http://cvrr.ucsd.edu/aton/shadow to allow for others in the community to experiment with new algorithms and metrics.

Journal ArticleDOI
TL;DR: A general framework of adaptive local thresholding based on a verification-based multithreshold probing scheme is proposed, regarded as knowledge-guided adaptive thresholding, in contrast to most algorithms known from the literature.
Abstract: In this paper, we propose a general framework of adaptive local thresholding based on a verification-based multithreshold probing scheme. Object hypotheses are generated by binarization using hypothetic thresholds and accepted/rejected by a verification procedure. The application-dependent verification procedure can be designed to fully utilize all relevant information about the objects of interest. In this sense, our approach is regarded as knowledge-guided adaptive thresholding, in contrast to most algorithms known from the literature. We apply our general framework to detect vessels in retinal images. An experimental evaluation demonstrates superior performance over global thresholding and a vessel detection method recently reported in the literature. Due to its simplicity and general nature, our novel approach is expected to be applicable to a variety of other applications.

Journal ArticleDOI
TL;DR: A new transform is presented that utilizes local radial symmetry to highlight points of interest within a scene and is seen to offer equal or superior performance to contemporary techniques at a relatively low computational cost.
Abstract: A new transform is presented that utilizes local radial symmetry to highlight points of interest within a scene. Its low computational complexity and fast runtimes make this method well-suited for real-time vision applications. The performance of the transform is demonstrated on a wide variety of images and compared with leading techniques from the literature. Both as a facial feature detector and as a generic region of interest detector the new transform is seen to offer equal or superior performance to contemporary techniques at a relatively low computational cost. A real-time implementation of the transform is presented running at over 60 frames per second on a standard Pentium III PC.

Journal ArticleDOI
TL;DR: A method to solve exactly a first order Markov random field optimization problem in more generality than was previously possible is introduced, which maps the problem into a minimum-cut problem for a directed graph, for which a globally optimal solution can be found in polynomial time.
Abstract: We introduce a method to solve exactly a first order Markov random field optimization problem in more generality than was previously possible. The MRF has a prior term that is convex in terms of a linearly ordered label set. The method maps the problem into a minimum-cut problem for a directed graph, for which a globally optimal solution can be found in polynomial time. The convexity of the prior function in the energy is shown to be necessary and sufficient for the applicability of the method.

Journal ArticleDOI
TL;DR: It is found that recognition performance is not significantly different between the face and the ear, for example, 70.5 percent versus 71.6 percent in one experiment, and that multimodal recognition using both the ear and face results in statistically significant improvement over either individual biometric.
Abstract: Researchers have suggested that the ear may have advantages over the face for biometric recognition. Our previous experiments with ear and face recognition, using the standard principal component analysis approach, showed lower recognition performance using ear images. We report results of similar experiments on larger data sets that are more rigorously controlled for relative quality of face and ear images. We find that recognition performance is not significantly different between the face and the ear, for example, 70.5 percent versus 71.6 percent, respectively, in one experiment. We also find that multimodal recognition using both the ear and face results in statistically significant improvement over either individual biometric, for example, 90.9 percent in the analogous experiment.

Journal ArticleDOI
TL;DR: A Bayesian approach to supervised learning, which leads to sparse solutions; that is, in which irrelevant parameters are automatically set exactly to zero, and involves no tuning or adjustment of sparseness-controlling hyperparameters.
Abstract: The goal of supervised learning is to infer a functional mapping based on a set of training examples. To achieve good generalization, it is necessary to control the "complexity" of the learned function. In Bayesian approaches, this is done by adopting a prior for the parameters of the function being learned. We propose a Bayesian approach to supervised learning, which leads to sparse solutions; that is, in which irrelevant parameters are automatically set exactly to zero. Other ways to obtain sparse classifiers (such as Laplacian priors, support vector machines) involve (hyper)parameters which control the degree of sparseness of the resulting classifiers; these parameters have to be somehow adjusted/estimated from the training data. In contrast, our approach does not involve any (hyper)parameters to be adjusted or estimated. This is achieved by a hierarchical-Bayes interpretation of the Laplacian prior, which is then modified by the adoption of a Jeffreys' noninformative hyperprior. Implementation is carried out by an expectation-maximization (EM) algorithm. Experiments with several benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms SVMs and performs competitively with the best alternative techniques, although it involves no tuning or adjustment of sparseness-controlling hyperparameters.

Journal ArticleDOI
TL;DR: This work presents a new class of geometric deformable models designed using a novel topology-preserving level set method, which achieves topology preservation by applying the simple point concept from digital topology.
Abstract: Active contour and surface models, also known as deformable models, are powerful image segmentation techniques. Geometric deformable models implemented using level set methods have advantages over parametric models due to their intrinsic behavior, parameterization independence, and ease of implementation. However, a long-claimed advantage of geometric deformable models (the ability to automatically handle topology changes) turns out to be a liability in applications where the object to be segmented has a known topology that must be preserved. We present a new class of geometric deformable models designed using a novel topology-preserving level set method, which achieves topology preservation by applying the simple point concept from digital topology. These new models maintain the other advantages of standard geometric deformable models including subpixel accuracy and production of nonintersecting curves or surfaces. Moreover, since the topology-preserving constraint is enforced efficiently through local computations, the resulting algorithm incurs only nominal computational overhead over standard geometric deformable models. Several experiments on simulated and real data are provided to demonstrate the performance of this new deformable model algorithm.

Journal ArticleDOI
TL;DR: This work presents a method to construct a bending invariant signature for isometric surfaces, an embedding of the geometric structure of the surface in a small dimensional Euclidean space in which geodesic distances are approximated by Euclidean ones.
Abstract: Isometric surfaces share the same geometric structure, also known as the "first fundamental form." For example, all possible bendings of a given surface that includes all length preserving deformations without tearing or stretching the surface are considered to be isometric. We present a method to construct a bending invariant signature for such surfaces. This invariant representation is an embedding of the geometric structure of the surface in a small dimensional Euclidean space in which geodesic distances are approximated by Euclidean ones. The bending invariant representation is constructed by first measuring the intergeodesic distances between uniformly distributed points on the surface. Next, a multidimensional scaling technique is applied to extract coordinates in a finite dimensional Euclidean space in which geodesic distances are replaced by Euclidean ones. Applying this transform to various surfaces with similar geodesic structures (first fundamental form) maps them into similar signature surfaces. We thereby translate the problem of matching nonrigid objects in various postures into a simpler problem of matching rigid objects. As an example, we show a simple surface classification method that uses our bending invariant signatures.
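
The embedding step described in the abstract above can be sketched with classical multidimensional scaling. This is an illustration under assumptions: the input is taken to be a precomputed symmetric matrix of pairwise (geodesic) distances, classical MDS is used as one representative scaling technique rather than necessarily the variant used in the paper, and the function name is hypothetical.

```python
import numpy as np

def classical_mds(dist, dim):
    """Embed n points in `dim` dimensions from an (n, n) distance matrix."""
    D = np.asarray(dist, dtype=float)
    n = D.shape[0]
    # Double-center the squared distances to obtain a Gram-like matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    evals, evecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:dim]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    lam = np.clip(evals[idx], 0.0, None)
    return evecs[:, idx] * np.sqrt(lam)
```

Because the output depends only on the distance matrix, any two surfaces with the same pairwise geodesic distances map to the same point set up to a rigid motion or reflection, which is the sense in which the signature is bending invariant.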

Journal ArticleDOI
TL;DR: A fast incremental principal component analysis (IPCA) algorithm, called candid covariance-free IPCA (CCIPCA), is introduced to compute the principal components of a sequence of samples incrementally without estimating the covariance matrix (hence covariance-free).
Abstract: Appearance-based image analysis techniques require fast computation of principal components of high-dimensional image vectors. We introduce a fast incremental principal component analysis (IPCA) algorithm, called candid covariance-free IPCA (CCIPCA), used to compute the principal components of a sequence of samples incrementally without estimating the covariance matrix (so covariance-free). The new method is motivated by the concept of statistical efficiency (the estimate has the smallest variance given the observed data). To do this, it keeps the scale of observations and computes the mean of observations incrementally, which is an efficient estimate for some well known distributions (e.g., Gaussian), although the highest possible efficiency is not guaranteed in our case because of unknown sample distribution. The method is for real-time applications and, thus, it does not allow iterations. It converges very fast for high-dimensional image vectors. Some links between IPCA and the development of the cerebral cortex are also discussed.
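
A covariance-free incremental update of the kind the abstract describes can be sketched for the first principal component only. The update rule below is the commonly cited basic form and should be checked against the paper; the sketch assumes the samples are already zero-mean, and it omits the amnesic weighting and the residual step used for higher-order components. The function name is hypothetical.

```python
import numpy as np

def ccipca_first_component(samples):
    """Incrementally estimate the first principal direction.

    samples: iterable of zero-mean vectors (assumption), seen one at a time.
    Returns a vector whose direction estimates the leading eigenvector and
    whose norm estimates the corresponding eigenvalue.
    """
    v = None
    for n, u in enumerate(samples, start=1):
        u = np.asarray(u, dtype=float)
        if v is None:
            v = u.copy()  # initialize with the first sample
            continue
        # Running average of u * (u . v / ||v||); no covariance matrix is
        # ever formed and each sample is visited exactly once.
        v = ((n - 1) / n) * v + (1.0 / n) * u * (u @ v) / np.linalg.norm(v)
    return v
```

Each step costs time linear in the vector dimension, which is what makes this style of update attractive for high-dimensional image vectors.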

Journal ArticleDOI
TL;DR: The combination of CAMSHIFT and SVMs produces both robust and efficient text detection, as time-consuming texture analyses for less relevant pixels are restricted, leaving only a small part of the input image to be texture-analyzed.
Abstract: The current paper presents a novel texture-based method for detecting texts in images. A support vector machine (SVM) is used to analyze the textural properties of texts. No external texture feature extraction module is used, but rather the intensities of the raw pixels that make up the textural pattern are fed directly to the SVM, which works well even in high-dimensional spaces. Next, text regions are identified by applying a continuously adaptive mean shift algorithm (CAMSHIFT) to the results of the texture analysis. The combination of CAMSHIFT and SVMs produces both robust and efficient text detection, as time-consuming texture analyses for less relevant pixels are restricted, leaving only a small part of the input image to be texture-analyzed.

Journal ArticleDOI
TL;DR: A general framework is presented which allows for a novel set of linear solutions to the pose estimation problem for both n points and n lines and compares the results to two other recent linear algorithms, as well as to iterative approaches.
Abstract: Estimation of camera pose from an image of n points or lines with known correspondence is a thoroughly studied problem in computer vision. Most solutions are iterative and depend on nonlinear optimization of some geometric constraint, either on the world coordinates or on the projections to the image plane. For real-time applications, we are interested in linear or closed-form solutions free of initialization. We present a general framework which allows for a novel set of linear solutions to the pose estimation problem for both n points and n lines. We then analyze the sensitivity of our solutions to image noise and show that the sensitivity analysis can be used as a conservative predictor of error for our algorithms. We present a number of simulations which compare our results to two other recent linear algorithms, as well as to iterative approaches. We conclude with tests on real imagery in an augmented reality setup.

Journal ArticleDOI
TL;DR: It is shown that, if the FOV lines are known, it is possible to disambiguate between multiple possibilities for correspondence, and once these lines are initialized, the homography between the views can also be recovered.
Abstract: We address the issue of tracking moving objects in an environment covered by multiple uncalibrated cameras with overlapping fields of view, typical of most surveillance setups. In such a scenario, it is essential to establish correspondence between tracks of the same object, seen in different cameras, to recover complete information about the object. We call this the problem of consistent labeling of objects when seen in multiple cameras. We employ a novel approach of finding the limits of field of view (FOV) of each camera as visible in the other cameras. We show that, if the FOV lines are known, it is possible to disambiguate between multiple possibilities for correspondence. We present a method to automatically recover these lines by observing motion in the environment. Furthermore, once these lines are initialized, the homography between the views can also be recovered. We present results on indoor and outdoor sequences containing persons and vehicles.

Journal ArticleDOI
TL;DR: The optimal correspondence is found by an efficient dynamic-programming method both for aligning pairs of curve segments and pairs of closed curves, and is effective in the presence of a variety of transformations of the curve.
Abstract: We present a novel approach to finding a correspondence (alignment) between two curves. The correspondence is based on a notion of an alignment curve which treats both curves symmetrically. We then define a similarity metric based on the alignment curve using two intrinsic properties of the curve, namely, length and curvature. The optimal correspondence is found by an efficient dynamic-programming method both for aligning pairs of curve segments and pairs of closed curves, and is effective in the presence of a variety of transformations of the curve. Finally, the correspondence is shown in application to handwritten character recognition, prototype formation, and object recognition, and is potentially useful in other applications such as registration and tracking.

Journal ArticleDOI
TL;DR: This paper develops a reliable algorithm which takes into account the stability of local bandwidth estimates across scales, and demonstrates that, within the large sample approximation, the local covariance is estimated by the matrix that maximizes the magnitude of the normalized mean shift vector.
Abstract: The analysis of a feature space that exhibits multiscale patterns often requires kernel estimation techniques with locally adaptive bandwidths, such as the variable-bandwidth mean shift. Proper selection of the kernel bandwidth is, however, a critical step for superior space analysis and partitioning. This paper presents a mean shift-based approach for local bandwidth selection in the multimodal, multivariate case. The method is based on a fundamental property of normal distributions regarding the bias of the normalized density gradient. This paper demonstrates that, within the large sample approximation, the local covariance is estimated by the matrix that maximizes the magnitude of the normalized mean shift vector. Using this property, the paper develops a reliable algorithm which takes into account the stability of local bandwidth estimates across scales. The validity of the theoretical results is proven in various space partitioning experiments involving the variable-bandwidth mean shift.
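For orientation, the basic mean shift iteration that this work builds on can be sketched as below. This is only the plain fixed-bandwidth procedure with a Gaussian kernel, not the paper's bandwidth-selection algorithm or the variable-bandwidth variant. The function name `mean_shift_mode` and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def mean_shift_mode(points, start, bandwidth, max_iter=200, tol=1e-6):
    """Follow the mean shift vector from `start` to a nearby density mode.

    Assumptions: isotropic Gaussian kernel with one fixed, user-chosen
    bandwidth (the paper instead selects the bandwidth locally).
    """
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        sq_dist = ((points - x) ** 2).sum(axis=1)
        weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
        # Kernel-weighted mean of the data; the mean shift vector is shifted - x.
        shifted = weights @ points / weights.sum()
        if np.linalg.norm(shifted - x) < tol:
            return shifted
        x = shifted
    return x

# Toy example: one Gaussian cluster centered at (3, -1).
rng = np.random.default_rng(0)
cluster = rng.normal(loc=[3.0, -1.0], scale=0.5, size=(500, 2))
mode = mean_shift_mode(cluster, start=[2.0, 0.0], bandwidth=1.0)
```

The result depends on the `bandwidth` argument, which is the sensitivity that motivates data-driven bandwidth selection.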